Gravitational wave lensing beyond general relativity: birefringence, echoes and shadows

Gravitational waves (GW), as light, are gravitationally lensed by intervening matter, deflecting their trajectories, delaying their arrival and occasionally producing multiple images. In theories beyond general relativity (GR), new gravitational degrees of freedom add an extra layer of complexity and richness to GW lensing. We develop a formalism to compute GW propagation beyond GR over general space-times, including kinetic interactions with new fields. Our framework relies on identifying the dynamical propagation eigenstates (linear combinations of the metric and additional fields) at leading order in a short-wave expansion. We determine these eigenstates and the conditions under which they acquire a different propagation speed around a lens. Differences in speed between eigenstates cause birefringence phenomena, including time delays between the metric polarizations (orthogonal superpositions of $h_+,h_\times$) observable without an electromagnetic counterpart. In particular, GW echoes are produced when the accumulated delay is larger than the signal's duration, while shorter time delays produce a scrambling of the wave-form. We also describe the formation of GW shadows as non-propagating metric components are sourced by the background of the additional fields around the lens. As an example, we apply our methodology to quartic Horndeski theories with Vainshtein screening and show that birefringence effects probe a region of the parameter space complementary to the constraints from the multi-messenger event GW170817. In the future, identified strongly lensed GWs and binary black holes merging near dense environments, such as active galactic nuclei, will fulfill the potential of these novel tests of gravity.

The detection of gravitational wave (GW) signals from black-hole and neutron-star mergers provides a direct probe of Einstein's general relativity (GR) and fundamental properties of gravity. These tests have far reaching implications for cosmology, probing the accelerated expansion of the universe and dark energy models in a manner complementary to "traditional" observations based on electromagnetic (EM) radiation [1]. Observations are sensitive to how GWs are emitted and detected, as well as their propagation through the universe. GW emission and detection occurs in small scales by cosmological standards, in dense regions and near massive objects. In contrast, propagation can occur over vastly different regimes, and allows small effects to compound over very large distances.
GW propagation beyond GR is fairly well understood in the averaged cosmological space-time, described by the Friedmann-Robertson-Walker (FRW) metric. GWs are well described by linear perturbations due to the small amplitude of GWs away from the source. The high degree of symmetry of FRW solutions ensures the decoupling of scalar, vector and tensor perturbations, automatically isolating the propagating degrees of freedom, with deviations from GR represented by a handful of terms in the propagation equation. These facts greatly simplify the study of GWs, making it tractable even for highly complex theories beyond GR.
Corrections to FRW GW propagation have been well studied and have provided some of the most powerful tests of gravitational theories. Such is the case of the anomalous GW speed, measured to a precision |c g − c| O(10 −15 ) [2] with the binary neutron star merger GW170817. This measurement poses a phenomenal challenge to a broad class of dark energy theories [3][4][5][6], well beyond next-generation cosmological observations [7]. Other tests such as GW damping [8] are limited by precision in the luminosity distance measurement and the population of standard sirens, with the weak constraints from GW170817 [9] expected to improve in the future [10][11][12][13][14]. In addition, FRW GW propagation can be used to constrain interactions with additional cosmological fields [15] such as tensor [16] or multiple vec-tor fields [17], but only when the additional fields have a tensor structure. Despite of these achievements and prospects, tests of the propagation of GWs over FRW are intrinsically limited in probing gravity theories by the same simplifications that made them tractable in the first place.
Lensing of GWs offers important opportunities to test GR in at least three distinct ways. 1) In minimally coupled theories, lensing of electromagnetic radiation only probes the solution of the metric. In contrast, lensing of GWs tests the gravitational sector directly, including the fundamental degrees of freedom, their properties and interactions. 2) New propagating degrees of freedom are in most cases isolated by the FRW symmetries: even the simplest gravitational lenses break these symmetries and introduce new interactions with new gravitational fields (e.g. scalars). 3) Finally, beyond FRW effects can introduce new scales and affect the gravitational polarizations (+, ×) differently, providing signatures that do not require an electromagnetic (EM) counterpart. This enables tests from black-hole (BH) binaries, applicable to more events and at higher redshift. Specific examples of these features are explored in this work.
The well studied and rich phenomenology of gravitational lensing highlights the importance of understanding GW propagation beyond FRW in testing GR. Phenomena ranging from galaxy shape distortions, to multiple imaged sources, to the integrated Sachs-Wolfe effect are nowadays routinely used to probe dark energy and gravity. As detections of lensed GWs will become increasingly likely [18][19][20], modelling GW propagation beyond the FRW approximation will become critical to fully use rapidly growing catalogues of GW events to explore the intervening matter and its gravitational effects. As we will discuss here, theories beyond GR extend the range of gravitational lensing phenomena even further.

A. Summary for the busy reader
In this work we study the lensing of GWs beyond GR. We develop a general framework to study the GW propagation over general space-times, identify novel effects and forecast constraints on specific gravity theories. Our main results can be summarized as follow: • Core concept: over general space-times different gravitational degrees of freedom mix while they propagate. Each propagation eigenstate is a linear superposition of different polarizations that evolves independently. Eigenstates with different speeds cause GW birefringence. Non-propagating modes can also be sourced inducing GW shadows. We present our formalism in sections II and III.
• Novel phenomena: at leading order, the main observables are time delays between propagation eigenstates and with respect to light. Delays larger than the GW signal produce echoes. Time delays shorter than the signal cause interference patterns, scrambling the wave-form. We investigate these phenomena in section IV, where we also discuss the observational prospects. Particularly interesting events for these tests correspond to identified strongly-lensed multiple images and binary black holes merging close to a super-massive black hole.
• An example, screening in Horndeski: a natural arena for these lensing modifications are gravity theories with screened environments. We obtain the propagation eigenstates of Horndeski gravity over general space-times in section V. We then study the lensing time delays induced by Vainshtein screening in section VI.
• Detection prospects: These novel lensing effects could be critical to test gravity theories beyond GR. For our simple quartic Horndeski example theory we already find large sectors of the parameter that could be constrained beyond GW170817. Dedicated analyses could be applied to past and future LIGO-Virgo data. These birefringence tests do not require electromagnetic counterparts.
A schematic diagram of the effects of lensing beyond GR is presented in Fig. 1. A GW traveling near the lens splits in the different propagation eigenstates. If the modified gravity theory and background configuration around the lens is such that the eigenstates have different speeds, the overall GW signal could split into sub-packets after crossing the lens potential leading to echoes in the detector. If the time delay between the eigenstates is shorter than the duration of the signal, there will be interference effects producing a scrambling of the detected signal.

II. THE PROBLEM: A GENERAL THEORY FOR GRAVITATIONAL RADIATION
For any given gravity theory, the propagation of GWs can be determined from the equations of motion (EoM) for the linearized perturbations, which are obtained expanding around the background metric For concreteness, we will focus our discussion to metric theories of gravity with an additional scalar field, although our arguments could be easily extended to other types and number of fields. Expanding similarly the scalar field around the background solution the evolution of GWs h µν and the additional gravitational degree of freedom ϕ will follow a set of coupled equations where each of the differential operators depend on background quantities. Therefore, the propagation could be modified with respect to GR by (i) new interaction terms leading to D hh = D GR , (ii) the mixing with ϕ and (iii) the modification of the effective background in which GWs propagate. Any of these modifications makes solving the propagation of GWs significantly more complicated than in GR. The essence of the problem will be identifying the propagation eigenstates which diagonalize the EoM. In general, we will encounter two main obstacles with respect to the standard approach: fixing the gauge (section II A) and identifying the radiative degrees of freedom (DoF) (section II B). We will also introduce the short-wave expansion (section II C).

A. Gauge fixing
The richer structure of the propagation equations beyond GR affects the gauge fixing procedure. In synthesis, one can always fix the transverse gauge but not simultaneously set the traceless condition Imposing the traceless condition throughout relies on h obeying the same equation as the residual gauge, which is not true in general beyond GR. A gauge transformation is a diffeomorphism x µ → x µ + ξ µ that preserves the form of the background metric g µν . It acts on the metric perturbation as where derivatives and contractions involve the background metric g µν . We will start with the transverse condition (5), defined relative to g µν . 1 The transverse condition transforms as where the Ricci tensor of the background metric stems from a re-arrangement of covariant derivatives. The transverse condition is imposed by ξ µ (x) satisfying The above choice does not completely fix the gauge, as any additional transformation x µ → x µ + ζ µ (x) will preserve the transverse condition if This equation fixes the time evolution of the residual gauge.
Let us now investigate whether we can eliminate the trace of the metric h using the residual gauge. Using the trace of Eq. (7), eliminating the trace requires Although at some initial time we can always fix the amplitude of ζ µ to satisfy this condition, Eq. (11) will only be preserved if the trace has the same causal structure that ζ µ . This problem occurs in GR in the presence of sources (R µν = 0) and the trace cannot be eliminated globally. However, the difference beyond GR is that one cannot even fix the trace locally, because h will be subject to a different differential operator.

B. Identifying the radiative degrees of freedom
The presence of additional fields complicates the identification of the propagating degrees of freedom. On the one hand, the background field mixes the metric perturbations in new ways. In the case of a scalar field this is achieved with their derivatives, for example φ µ φ ν h µν or φ µν h µν . On the other hand, the extra perturbations have their evolution coupled with h µν . This means that the decomposition in radiative and non-radiative DoF will be background dependent and in general not possible in a covariant language. Moreover, the new interaction terms could source the non-radiative modes even in vacuum. Thus, we have to keep track of all the constraints and propagation equations.
In a local region of space-time, we are in the limit of linearized gravity and can decompose the 10 metric perturbations around flat space as 2 where Φ is a scalar (1DoF), w i is a vector (3DoF), s ij is a traceless tensor (5DoF) and Ψ is a scalar (1DoF). As discussed before, some of these perturbations are nonphysical and can be removed fixing the gauge. Under a gauge transformation, the above perturbations change as We can always set the spatial transverse gauge ∂ i s ij = 0 as We can also use ξ 0 to set Φ = 0 or the vector components to be transverse ∂ i w i = 0. These choices do not exploit the residual gauge freedom, but will be enough for our purposes.
In the spatial transverse gauge s ij contains the two transverse-traceless polarizations h + and h × . In this language, the fact that the background scalar mixes the tensor modes translates into Φ, Ψ and w i not being set to zero by the constraint equations. In general, the nonradiative DoF will be sourced by both s ij and ϕ, which themselves mix during the propagation.
C. Short-wave approximation As a working hypothesis we will consider that the wave-length of the GWs is small compared to the typical spatial variation of the background fields. That is, we will make a short-wave or WKB approximation [21], expanding the metric perturbation as and the scalar wave where we have introduced a set of amplitudes A (n) , a phase θ and a small dimensionless parameter . 3 The short-wave expansion leads naturally to the wavevector definition from the gradient of the phase. The leading order observables will be the phase evolution and propagation eigenstates, which are determined by the second derivative operators. In other words, we will be solving the mixing in the kinetic terms. Next to leading order contributions will introduce corrections to the amplitude and further mixings. We leave their analysis for future work. At leading order in derivatives, solving the propagation entails diagonalizing an 11 × 11 matrix where D ab is a matrix of second order differential operators and V b is a vector containing the 10 metric h µν plus the scalar degrees of freedom ϕ. Fortunately, as we discussed in section II B, locally we can reduce this to a 3 × 3 problem. We will generically refer to the propagation eigenstates as H J with J = 1, 2, 3. Moreover, we defineM, the mixing matrix changing from the basis of interaction eigenstates (h × , h + , ϕ) to the basis of propagation eigenstates (H 1 , H 2 , H 3 ): 3 is used for book-keeping only and can be set to one when the different orders in the calculation have been collected.
In addition, we will focus in the regime where the stationary phase approximation holds, that is, when the time delay between the lensed images is larger than the duration of the signal. A hard limit on the stationary phase approximation is the onset of diffraction and wave effects [22], which occurs when the multiple images interfere or the wavelength of the GW λ gw is of the order of the Schwarzschild radius of the lens r s = 2GM L /c 2 . For a compact binary this can be translated into where f gw is the frequency of the GW. In the band of ground-based detectors, wave optics is only relevant for lenses M L 100 − 1000M . At lower frequencies (e.g. LISA and other space-borne GW detectors) diffraction effects are produced by heavier lenses.

III. GW LENSING BEYOND GENERAL RELATIVITY
From the previous section we learned that over general backgrounds GW degrees of freedom mix during the propagation. Therefore, the first step to study lensing beyond GR is to identify the propagation eigenstates. In section III A we will use an example theory to identify propagation eigenstates as a combination of different polarizations, travelling at different speeds. This speed difference leads to birefringence (polarization-dependent deflection and time delays), which are discussed in section III B. The observational consequences will be discussed later, in section IV.

A. Propagation Eigenstates
In order to build intuition about kinetic mixing, let us consider a particular example. We will keep the discussion general for the moment and later show how this example materializes in a concrete class of scalar-tensor theories (see section V). Let us further assume that we have already solved the constraint equations and we are left with h + , h × and ϕ. At leading order, the equations for the propagating modes can then be written schematically as 4 where the coefficients of the kinetic matrixD can be read off by, in general, comparing with the covariant equations. In Fourier space and normalizing the fields canonically, we have (26) where k 2 = ω 2 − c 2 m k 2 (the factor k 2 indicates the mixing vanishes, on shell, for modes propagating at the speed of light) and M φ controls the mixing between the tensor and scalar modes. For solutions to exist the determinant of the kinetic matrix det(D) = G hh (G hh G ss − M 2 ϕ k 2 ) needs to be non-zero.
The propagation eigenfrequencies of the system are given by the characteristic equation det(D − λ i 1) = 0 and choosing ω so that λ i (ω i ) = 0, or equivalently In the absence of mixing (M φ = 0), the propagation of each mode is determined by the standard dispersion relations (25), which allows a non-luminal speed for scalars and tensors. The propagation eigenmodes can be obtained by solving (the second equality enforces the on-shell relation λ i (ω i ) = 0). In other words, the propagation eigenstates can be defined through the mixing matrixM that relates them to the interaction eigenstates, where the rows are precisely the eigenvectors v i . Note that because the equations of motion (24) define a symmetric matrix, the matrix of eigenvectors is orthogonal and we can simply invert this mapping by h =M T H. It is useful to define the phase speeds as where the directional dependence onk has been omitted. We will study the case in which the GW speed is not modified before presenting the general calculation.

Equal speed case c h = cm
In the case in which the GW speed c h equals the mixing speed c m the eigenvalue equation simplifies considerably: One can then check that the eigenmodes propagating with speed c correspond to the two metric polarizations. The third eigenmode is a combination of the scalar and metric perturbation v 3 it reduces to the scalar polarization when M φ /∆c 2 hs → 0. One should note that in this definition it has been assumed c 2 h > c 2 s , otherwise v 2 , v 3 are swapped.
Two quantities will be specially relevant in the following discussion: ∆c 2 10 ≡ c 2 1 − c 2 , the speed difference between the pure-metric eigenstates and electromagnetic signals; and ∆c 2 21 ≡ c 2 2 − c 2 1 , the difference between the mostly-metric and pure-metric eigenstates. In the limit of small mixing the second one can be expressed as A difference in the propagation speed between the first two propagation eigenstates leads to a polarization dependent propagation in the interaction basis. In other words, there could be birefringence in the detected GW signals. Therefore, we will generically refer to differences in the propagation with respect to light as multi-messenger, while the differences among propagation eigenstates will be referred as birefringent.
B. Birefringence, GW deflection and time delays There are 4 signals whose propagation can be studied at leading order in GW lensing beyond GR: electromagnetic radiation (or standard model particles) traveling at speed c 0 ≡ c and 3 propagation eigenstates traveling at speeds c 1 , c 2 , c 3 , which depend on the interaction basis speeds c h , c s , c m and the mixing M φ . A gravitational lens will imprint a deflection and time delay, which might differ between each signal. In addition lensing will (de-)magnify the images and introduce a characteristic phase shift for images that cross caustics [23,24]. Here we will discuss deflection angles briefly, before focusing on the implications of time delays. In the following we will assume sources and lenses in the geometric optics limit, where the wavelength of the GW is much smaller than the Schwarzschild radius of the source λ gw r s = 2GM L /c 2 . One should note that in general there will be two types of effects in modified gravity: an anomalous speed effect due to the modified effective metrics in which each eigenstate propagates and a universal effect due to the modified Newtonian potentials stemming from Φ, Ψ, whose relationship with the matter distribution might differ via modified Poisson equations. The anomalous speed effect will affect the deflection angle and time delays of each propagation eigenstate differently (e.g. birefringence). The universal effect is the same for all polarizations and ultra-relativistic matter signal due to the equivalence principle. Traditional lensing analyses in modified gravity have focused on the universal effect, searching for deviations in the gravitational potentials Φ = Ψ (see e.g. [25]). Here we focus on the novel effects due to the anomalous speed of the propagation eigenstates.

Deflection angle
Let us consider the deflection of a ray/signal propagating in the u direction. The eikonal equation for the phase of the propagation eigenmode I, cf Ref. [23,Eq. 3.15], readṡ wherek α is a derivative w.r.t. the affine parameter and the second equality assumes a static metric and canonical normalization (i.e. g µν I k µ k ν = −ω 2 + c 2 I ( x,k)| k| 2 and using the fact that k, x are independent variables). Expanding on small deviations around the unperturbed trajectory k α = k where the integral is obtained in the Born approximation by evaluating Eq. (47) on the unperturbed trajectory x α ≈ x α (0) and specializing to a spherical lens. We have defined the propagation directionk ∝û, the radial distance r 2 (u) = u 2 + b 2 and the gradient perpendicular to the propagation direction ∇ ⊥ (note that we can always define k (0) · k (1) = 0). The geometry of the problem is summarized in Fig. 2.
Equation (48) can be used to compute the deflection angle for light and ultrarelativistic particles with minimal coupling to the metric. In that case, the effective velocity induced by the perturbed potential Φ and Ψ (using the previously mention canonical normalization) leads to the standard expression in terms of the metric potential In the case of GR sourced by non-relativistic matter Φ = Ψ and one recovers the standard result α GR ≈ 2 ∇ ⊥ Φdu. In theories without GW birefringence all eigenstates are deflected by α 0 . Birefringence will cause the deflection angle between two eigenstates I, J to differ by and vanishes in the limit of equal speed as expected.
Typical GR deflection angles are small, on the scale of θ E ∼ arcsec M/10 12 M for strongly lensed cosmological sources. These deflections are hard to resolve even for the most precise optical telescopes. GW detectors have rather low angular resolution that is many orders of magnitude lower than what it would be required to detect a different incoming direction for different polarizations (although there are ambitious projects for high resolution GW astronomy in the next decades [26]). On the other hand, GW detectors have excellent time resolution, making time delays between gravitational polarizations a much more robust observable.

Time delays
There are three independent time delays that a given lens can imprint on the observables, ∆t 01 , ∆t 12 , ∆t 23 . Each time delay will be the sum of a Shapiro term (difference in speeds locally) and a geometric contribution (difference in travel distance): where we used the Born approximation discussed above (recall that the propagation speed will in general depend on the position as well as the propagation direction of the signal). Let us now discuss how the deflection angle (51) leads to the geometric time delay. Assuming a single lens and spherical symmetry, each propagation eigenstate obeys its own lens equation where β is the angular position of the source (equal for all polarizations) and θ I are the apparent position of the source for each polarization I (source and lens plane, respectively) cf. Fig. 3. We have defined also α I =α I D LS /D S . Here D L , D S , D LS are, respectively, the angular diameter distances to the lens, source, and between the lens and the source. In the case of multiple lenses one should substitute the source with the previous lens. The geometric time delay due to the different angles (assuming c I ≈ c over the trajectory) between two propagation eigenstates can be computed following the standard approach and a bit of trigonometry (see e.g. Ref. [23, section 4.3]). We obtain where z L is the redshift of the lens. The order of magnitude of the delay will be determined by the dilated Schwarzschild diameter crossing time As a rule of thumb, one can use that t M 10(M Lz /1M ) µs, i.e. the delay is ∼months, days and minutes for lenses with M Lz = 10 12 M , 10 10 M , 10 7 M , respectively.
In these units the geometrical time delay can be written as where the angles α I = α I /θ E are now normalized in units of the Einstein ring of a point lens Assuming that the difference between the deflection angles of the different eigenstates ∆α IJ is small compared to the light deflection angle α 0 , Eq. (50), ∆α IJ α 0 , we find where ∆t geo 0 is the geometrical time delay induced by the gravitational potential on a wave propagating at the speed of light. This quantity depends on the distance to the source, to the lens and the mass of the lens. For a point lens it is given by From this expression it is explicit that the time delay is subject to the geometry of the lens-source. The time delay will be maximal when the lens is at intermediate distances between the source and observer. The multi-messenger and polarization time delays, Eqs. (52,54) constitute the most promising observables of birefringence. Their exact values depend on the effective background metric for the GWs, through the theory parameters, lens properties and the configuration of additional fields around the lens. We will now turn to the general phenomenological consequences of birefringence and its observability (section IV). In section VI we will study a specific example of a theory with Vainshtein screening, with a detailed modelling of gravitational lenses.

IV. PHENOMENOLOGY AND OBSERVATIONAL PROSPECTS
Let us now analyze the broad phenomenological consequences of birefringence. We will start in section IV A by describing the observational regimes for different values of the time delay, with a discussion of the single and multi-lens case. In section IV B we will then discuss special lensing configurations, focusing on a source near a super-massive black hole. We will address the interplay between birefringence and multiple images due to strong lensing in section IV C. Finally, section IV D addresses the probability of detecting GW birefringence, along with current and forecasted constraints.

A. Observational Regimes: Scrambling & Echoes
There are three important scales when discussing tests of GW lensing birefringence for a given event and detector network: the time resolution, the duration of the GW signal and the timescale of the observing run. Three distinct observational regimes can be established, depending of how the time delay between the propagation eigenstates ∆t IJ relates to these scales.
The sensitivity to ∆t IJ will be determined by modelling as well as experimental uncertainties. For the delay between EM and gravitational signals, the error ∆t 0I is likely dominated by assumptions about the EM counterpart. For example, when the gamma ray is emitted after a binary neutron star merger. In contrast, the GW emission can be modeled accurately, e.g. using a post-Newtonian expansion or numerical relativity. Thus, delays between gravitational polarizations are mostly limited by the time resolution of the instrument, which will be of order 5 or ∼ ms for current ground detectors (LIGO/Virgo). Finally, we note that the emission of scalar polarizations is suppressed in many theories (due to screening mechanisms), which might make the scalar (or mostly-scalar) polarization very hard to detect, precluding a measurement of ∆t 3I . In the following we will focus mostly on the time-delay between the pure metric and mostly-metric polarizations ∆t 12 .
The duration of the signal T g reflects how long a detector is sensitive to a given event. Depending on the mass, compact binary coalescence observed by ground detectors can last from less than a second (black hole binaries) to over a minute (neutron star binaries). Continuous signals such as rotating neutron stars (ground detectors) or mHz compact binaries (LISA) can in principle be detected as well. In those cases T g is limited by the duration of the observational campaign T obs . Here we will assume continuous observation up to T obs : a more realistic analysis should account for the detector's duty cycle (the fact that detection are regularly interrupted for several reasons) when ∆t ij ∈ (T g , T obs ).
The following situations are possible: • Signal scrambling: if σ tg |∆t 12 | T g the signal is observed as a single event and time delay(s) between different eigenstates distort the waveform.
• Signal splitting/GW echoes: if T g |∆t 12 | T obs the signal is split and each eigenstate will be observed as a separate event. The orbital parameters of different events will be related (e.g. orbital inclination/orientation), and it may be possible to associate different echoes from the same underlying event.
• Single polarized signals: if |∆t 12 | T obs , only one instance of each event can be observed. This leads to an excess of edge-on signals, relative to the expectation of random orientations. 6 One should note that the first two effects are analogous to strong lensing where multiple images can be produced and might interfere if their time delay is of the order of the signal duration (sometimes called microlensing regime). However, we stress that these are completely different effects in origin and are also governed by different physical quantities (we will comment more on those 5 One can sharpen this estimate easily using a noise curve with h i → h i e f ∆t 12 applied to each polarization, see below. 6 Due to duty cycle/interruptions of the detector, a fraction of echoes are missed even if Tg |∆t 12 | T obs , leading to an excess of edge-on events. Given that source binaries are randomly inclined, knowing the antenna patter of the detector and having a large statistical sample may allow to discriminate this effect. differences below). The scrambling and echoes are thus independent of strong lensing and would apply to each multiple image if present. Moreover, with a large network of detectors one could distinguish the different polarizations further distinguishing the two effects.
In addition, multiple lenses along the line of sight will contribute a separate time delay. Misalignment between lenses causes a difference in propagation eigenstates for each subsequent lens (e.g. different angle φ). In this situation, each lens causes a separate scrambling or splitting of the signal. Let us first discuss the single lens case and then comment on the effect of multiple lenses.

Single lens
To better understand the effects of birefringence, let us consider the effect of a single lens on a head-on GW event, i.e.L ·n ≡ cos ι = 1 (this will be generalized later). In this case the ×, + polarization are emitted with equal amplitude, and one can define the basis so that they are proportional to metric components of the 1, 2 propagation eigenstates (i.e. rotating the coordinates so the azimuthal angle is φ = 0). In this case the signal after crossing the region where modify gravity effects are relevant is approximately given by where the ellipsis represent GW shadows, including those of additional polarizations. This relationship assumes that the amplitudes are approximately equal in the interaction and propagation basis, and that the mixing with the scalar mode is subdominant. While the exact relationship requires solving the GW propagation at subleading order, ω −1 , one can assume that the corrections are small, given the large frequency of GWs. This implies that the relative amplitude is unchanged in the propagation so that h + ∼ h 2 and h × ∼ h 1 . We are also not taking into account standard lensing effects (e.g. magnifications and phase shifts). All these assumptions could be generalized, but for pedagogical purposes we restrict the derivation to the simplest example. One should note too that these assumptions hold for GWs on FRW, where effects on the amplitude (α M ) are much harder to detect than effects on the phase (α T , m 2 g ). The strain on a given detector is then where A I is the detector's response for a given polarization, given the source's position in the sky. Figure  4 shows the effect of the time delay for a binary black hole signal, both on each polarization and as seen in one detector. The scrambling regime |∆t 12 | < T g is characterized by a time modulation of the amplitude, caused by the interference between the signals, as well as two distinct imprints from the merger, separated by ∆t 12 . In the splitting regime two copies of the signal are detected with a delay ∆t 12 and amplitudes given by the detector's response to each polarization. Multiple detectors provides further means to characterize the signal via different response functions, time delays, etc...
For this example we have considered an unlensed, nonspinning, equal-mass binary. However, some of these effects could be degenerate with binary parameters in more general systems. For example spinning, asymmetric binaries are known to introduce modulations in the waveform. Similarly, strongly lensed multiple GWs produce multiple images that might have short time delays for certain lenses. Nonetheless, with a network of detectors one could use the polarization information to break this degeneracies. For instance, if one expects the amplitude difference between the echoes be produced by the projection on the detector's antenna pattern of each eigenstate, one could use the information on the sky localization to constrain this possibility. If both polarizations can be detected independently, the degeneracy can be completely broken.

Multiple lenses
Multiple lenses can cause further scrambling and splitting of a GW source. Considering spherical lenses and treating their effects as independent, the relationship between the signal at the source and the detector can be approximated as Here h d,s is the vector of amplitudes in Fourier space in the interaction eigenstates at the detector/source.M is the mixing matrix introduced in (29), which relates the interaction h I , I ∈ (×, +, ϕ) and propagation H J , J ∈ (1, 2, 3) eigenstates. Here we have also introduced the delay matrix which encompasses the phase evolution of the propagation eigenstates (note that an overall factor e iωt1 has been factored out to express the results in terms of time delays). The subscript L denotes that the quantities depend on the lens properties (mass, mass distribution) and its configuration relative to the line of sight (impact parameter b, azimuthal angle φ). Schematically, equation (63) is telling us that if a GW crosses a region near a lens, the GW propagation will be determined by the propagation eigenstates, possibly leading to time delays among them. Therefore, after crossing the first lens the initial GW wavepacket could be split in separate packets for each H I . Then, if another lens is on the line of sight, each GW packet will subdivide again since the eignestates of the second lens will be in general different from the first one. In principle this process can be iterated for as many lenses are in the GW trajectory. A possible observational signature of these multiple splittings would be a significant reduction in the GW amplitude since for random orientations of the lenses the projection into the eigenstates at each lens will reduce the overall amplitude of the detected signals. Of course, the key question is how probable is to have this multiple encounters. We touch on the lens probabilities in section IV D.
Before moving on, we remind the reader that equation (63) is only valid at leading order and does not take into account the modifications of the amplitudes of the propagation eigenstates. In general both the mixing matrix and eigenfrequencies depend on the spatial coordinates. This means that there would be spatially dependent corrections to the amplitudes of H. This next to leading order corrections can be computed solving at higher order in the short-wave expansion. As previously alluded, we leave this analysis for future work.

B. Source near the lens
A particular interesting source-lens configuration happens when the GW source is very close to the lens. In that case, the GW will inevitably travel in a region where the background fields are relevant and more likely to enhance birefringence effects. Due to this particular geometry, the total time delay will be dominated by the Shapiro part, since the the geometrical time delay scales with the source-lens distance D LS .
A realization of this setup will occur if a binary black hole (BBH) merge near the disk of an active galactic nucleus (AGN) (see e.g. [27]). There, compact objects are expected to accumulate in specific regions of the accretion disk, the so-called migration traps, at around 20 − 300 r s [28]. A schematic representation of this type of systems is given in Fig. 5, where the impact parameter of the binary b is smaller than the typical scale r where modified gravity backgrounds become relevant. We remind the reader that this scale does not have to be related with the scale of strong lensing.
Recently, a possible EM counterpart to the the heavi-est BBH detected so far, GW190521 [29], was announced in [30]. The interpretation of this coincident EM-GW event was that the BBH mergered within the disk of an AGN: the large kick after the merger would have produced the flare. The mass of the SMBH was estimated to be ∼ 1 − 10 × 10 8 M , meaning that the binary might have merger at only 0.0002 − 0.03pc of the SMBH. Such short distance to the lens would make this event a great candidate to test modifications of gravity. It is to be noted, however, GW190521 is also the furthest event so far with the largest localization volume, making the clear association of a counterpart more difficult. In any case, if this BBH formation channel constitutes a significant fraction of the observed events, one could use this population to very efficiently constrain the GW lensing effects beyond GR discussed here. Moreover, LISA could also see the inspiral of ∼ 5 − 10 events of this class during a 4 year mission (see e.g. Fig. 2 of [31]). A multi-band observation together with an identification of the flare after merger would make this type of BBH system a truly unique laboratory of the theory of gravity.
The opposite scenario of a lens near the observer is also promising to probe birefringence. One possibility is to correlate the maps of nearby gravitational lenses with sky localizations of GW events: for instance, events located behind galactic plane could be used to test theories predicting a sizeable time delay by Milky Way galaxies.
These are examples of unusual lensing setups leading to observable consequences in theories with GW birefringence. In contrast, for standard lensing configurations observable effects are predominantly caused by intervening lenses. In the remainder of the section we will focus on intervening lenses.

C. Strong vs. weak lensing & multiple images
Lensing effects depend on the source-lens geometry and can be classified into strong and weak lensing depending on whether multiple images form or not. These standard multiple images are in addition to possible echoes/splitting caused by birefringence. In particular, a point lens is characterized by an Einstein ring radius where the Einstein angle θ E was given in (57). Whenever the impact parameter of the source is of the order or smaller than the Einstein radius, b r E , we are in the regime of strong lensing and multiple images of the same GW could be produced by the lens. In the case of having different propagation eigenstates, multiple images of each H I will be produced. In the opposite limit b r E we are in the regime of weak lensing where only one image can be detected. Note that the modify gravity lensing effects are a priori independent of the "standard" lensing regimes. Depending of the theory there could be large modifications even in weak lensing. This was schematically depicted in figure 1, where the scale of modify gravity r does not correspond to r E .
For example, a GW travelling near a point mass lens will form two images with positive (+) and negative (-) parity for each propagation eigenstate. For angular positions β 1, we can quantify the dimensionless time delay T ± = t ± /t M between the two images analytically [32] where we have defined the source angle in units of the Einstein radius y = β/θ E and the images positions This tells us that for source angles of the order of the Einstein radius, y ∼ 1, the delay between the images will be of the order of the characteristic lensing time scale t M , which for lenses of ∼ 10 10 M corresponds to a delay ∼ 1 day. If the impact parameter is much smaller than r E , y 1, the delay simplifies to T − − T + ∼ y, which implies that it will be parametrically smaller than t M . This means that for certain theories and lens-source geometry it is possible that there is a degeneracy between the delay of multiple images and the delay between the echos of different eigenstates.
The interplay between strong lensing and the anomalous speed lensing effects beyond GR will depend on the relation between the Einstein radius and the typical scale where modify gravity effect are relevant. For example, for modified gravity theories with an screening mechanism that we will study in section VI, the relevant scale to compare will be the Vainsthein radius r V . In the regime of weak lensing, when b θ E , only one image is detectable with a negligible magnification |µ I | 1/2 1. This was our assumption for Fig. 4, where we computed the echoes and scrambling assuming only one image.
Strong lensing probabilities have been discussed in the context of advanced LIGO-Virgo extensively [18][19][20] with rates ranging between 1 every 100 or 1000 events depending on the source population and lens assumptions. For LISA, it has been shown that a few strongly lensed GW from SMBH binaries could be observed [33], although the result is highly dependent on the modeling of the population of SMBHs.

D. Lensing probabilities
Let us now estimate the probabilities of observing GW birefringence by randomly distributed lenses. We will consider two generic dependences with the lens mass, proportional to 1) the Einstein radius and 2) a physical radius with a power-law dependence on the lens mass. We will use these simple models to compare with current GW data (assuming non-detection) and estimate the sensitivity of future observations. The probability of observing an event with a given property X (e.g. a time delay) is [34] where the optical depth is Here δΩ is an element of solid angle, dV = δΩD 2 L dz (1+z )H(z ) is the physical volume element given a solid angle δΩ, n(z ) is the physical density of lenses and σ X is the angular cross section. We will assume all lenses have equal mass and dilute as matter, with physical number density The lens mass distribution and other properties can be included straightforwardly in Eq. (69). Note that the prefactor can be written as Herer is the mean separation between lenses if the universe's critical density was distributed in objects of mass M L . Incidentally,r coincides with the Vainshtein radius for the theory studied in section VI for parameters The angular cross section σ X represents the area around a lens for which a propagation effect X is observable, where we take that i.e. effects are detectable for angular deviations ≤ σ X away from a lens. This form assumes spherical symmetry and that the effects are easier to detect closer to the lens, as it is expected for example from modify gravity screening backgrounds. If the effect X becomes undetectable for a smaller angle θ 0 (e.g. transitioning from the scrambling to the echoes regime) then the cross section would be σ X = π(θ 2 X − θ 2 0 ) instead. We will analyze two simple cases for θ X .
As a first case, let us assume detectability at a fraction of the Einstein radius where α X depends on the theory, but not on redshift or lens mass. The optical depth then reads and is independent of mass, which is a known property of lensing probabilities for point-like lenses and sources.
Mass dependence often arises from more detailed modeling, e.g. finite source size [35] or extended lenses producing multiple detectable images [34]. For comparison, let us also consider detectability below a given impact parameter around the lens where n characterizes the mass dependence and R X is detectable radius for typical galactic lenses, which depends only on the parameters of the theory. The optical depth is then (76) This dependence is general enough to include scalings like the Einstein radius (n = 1/2, but without the redshift dependence, cf. Eq. 74), the Schwarzschild radius (n = 1, as in theories with scalar hair) or the Vainshtein radius (n = 1/3, as in massive gravity or Horndeski theories cf. section VI). In the rest of this section we will assume that all the mass is effectively in lenses of 10 12 M . However, note that for n < 1/2 the contribution of lighter lenses can be significantly enhanced, cf. Vainshtein radius scaling in section VI, Eq. (152).
The dependence in the source redshift differs between both cases, as shown in the right panel of figure 6 for a ΛCDM expansion history. The dimensionless integral in the Einstein scaling case (74) is an order of magnitude smaller than in the physical scaling (76) for z 1. For a source at z s = 1, the integral in τ E X is ∼ 0.14, while the integral in τ ph X is ∼ 1.68. The difference at z 1 can be absorbed into the redefinition of the scale R X , but even in that case τ ph X is much larger at low redshift due to the projection effect, factor 1/D L . The physical scaling optical depth is favoured also at high redshift z 5, and it might be probed by LISA massive BH binaries [37].
The cross section models (74,76) can be used to derive constraints from existing GW catalogues. The detection probability distribution is governed by Poisson statistics where k is the number of birefringence detections and the rate (i.e. the mean of the distribution) is given by the total optical depth summed over the i = 1, · · · , N events in a catalogue.
The optical depth of each event is evaluated on the mean redshift inferred from the luminosity distance for simplicity (uncertainty of the recovered redshift can be included). Current constraints can be derived assuming non-detection (k=0) in the GWTC-1 of LIGO/Virgo O1+O2 sources [38]. The total optical depth (78) evaluated on the models (74,76) can be translated via Poisson statistics into for Ω L = 0.3, h = 0.7. We note that this limit is subject to a detailed analysis of the waveforms in GWTC-1, to confidently exclude birefringence effects. As we will see, future observing runs and next generation detectors can increase these bounds significantly.
In order to estimate the potential of future GW observations we consider the predicted total optical depth where the differential event rate is given by Here λ collectively determines all additional source properties (besides redshift), P det is the detection probability, R(z, λ) is the event rate (per comoving volume) and 3 is the comoving volume factor (physical volume times density factor). Equation (78) is recovered setting dN dz = i δ(z − z i ). The predicted total optical depth (80) can be used to estimate how future surveys can improve existing bounds (79). We take as a reference model of sources a population of BBHs consistent with GWTC-1 [36]. Specifically we take a power-law distribution of primary masses p(m 1 ) ∼ m −1.6 1 between 5 and 45M with a redshift evolution of the merger rate following the star formation rate [39,Eq. 15] normalized to R 0 = 30 yr −1 Gpc −3 . We set the detection threshold at a signal-to-noise ratio of 8 for a single detector. These predictions applied to LIGO O2 sensitivity are in good agreement with the results from GWTC-1, Eq. (79). Figure 6 shows the expected bounds on α X , R X after a year of observation with advanced LIGO design sensitivity (aLIGO) and Cosmic Explorer (CE) third-generation technology, together with the current bounds (79). The horizontal axis indicates the expected year when these projections could be achieved. In particular, aLIGO design sensitivity is expected to be achieved during the next observing run O4. Current constraints can be expected to improve an order of magnitude by O4, and two orders of magnitude after one year of Cosmic Explorer and other 3rd generation ground-based detectors. Note that bounds on the total cross section are quadratic in α X , R X , so the actual sensitivity increases by ∼ 2, 4 orders of magnitude, respectively.
The framework introduced in this section applies exclusively to a homogeneous and random distribution of lenses. It is important to note that in certain situations the location of the lens relative to the source might not be random and thus these results may vastly underestimate the probabilities. Examples include when the lens is near the observer (GW events located behind the Galactic Center) or when sources forms very close to the lens (stellar mass BH binaries in the vicinity of a massive black hole) as discussed in section IV B.

V. PROPAGATION EIGENSTATES IN HORNDESKI THEORIES
As a particular set up, we will concentrate in gravity theories adding just one extra propagating degree of freedom w.r.t. GR. We will restrict to those scalar-tensor theories with covariant second order EoM. This naturally leads us to Horndeski's gravity [40], whose action reads [41] with This theory has four free function G i of the filed φ and its first derivatives −2X = φ µ φ µ .
We will divide the analysis of this large class of theories in two. First, we will consider the subclass of theories in which the causal structure of the propagating tensor modes is determined by the background metric. Thus, in these luminal theories the phase evolution of GWs is equal to that of light (section V A). Then, we will consider non-luminal theories in which the tensor modes have a different causal structure (section V B).
The causal structure of GWs in Horndeski gravity over general space-times in the absence of scalar waves has been studied in [42]. For the subset of luminal theories, the propagation without scalar waves was investigated in [43], while a geometric optics framework including ϕ was developed in [44]. In addition, there has been large efforts to study the GW propagation over cosmological and BH spacetimes. We refer the interested reader to the recent review [1].
Since the GW and scalar wave evolution will in general depend on the propagation direction for an anisotropic background, it is useful to decompose the spatial components of the background tensors in terms of the directions parallel and perpendicular to the propagation trajectory of the GW, defined by the wave vector k i . Specifically, we decompose the spatial gradient of the scalar background as so that in the transverse gauge These identities will be handy in what comes next.

A. Luminal theories
Since the evolution equations are coupled in general (even at leading order in derivatives), the first step is to diagonalize them. Depending of the complexity of the theory, the diagonalization can be done covariantly. Indeed, we will see in this section that this is the case for those Horndeski theories with a luminal GW propagation speed.
Before that, it is useful to recall the case of GR, where one also needs to diagonalize the propagation in order to obtain a wave equation for each polarization. Although in GR there is no additional scalar field, we can effectively treat the trace as an additional degree of freedom. Starting from Einstein's equations, one can see that the tensor EoM of the linear perturbations, include a mixing with the trace at leading order in derivative, where O (∇h) captures terms linear or lower order in derivatives. The way to diagonalize these equations is to redefine the tensor perturbation tō (well-known as trace-reversed metric perturbations [21]). In this way, after fixing the transverse gauge on the new perturbations ∇ νh µν = 0, one recovers the standard wave equation Note that, at face value, this equation is telling us that the propagation eigenstates of GR are a combination of the tensor perturbations and its trace. In vacuum we can always fix the trace to zero (so that h µν =h µν ), but in the presence of matter its value has to be computed. The fact that in GR only the TT perturbations are non-zero in vacuum can also be easily derived solving the constraint equations. In particular, the 00 Einstein equation tell us that Ψ = 0, the 0j that w i = 0 and the spatial trace that Φ = 0. We are left then with the ij equations which lead to only two independent equations for h + and h × .
Horndeski theories with a luminal GW speed will share with GR the structure of the second order differential operator acting on the tensor perturbations. Such operator corresponds tō The fact that this operator contains the wave operator plus longitudinal terms makes the GW-cone and lightcone equal, and thus c g = c in the absence of ϕ [42].
In the following we will generalize this procedure to gravitational theories with luminal GW propagation: generalized Brans-Dicke, kinetic gravity braiding and the union of both.

Generalized Brans-Dicke
A pedagogical exercise is to consider a generalized Brans-Dicke type scalar-tensor theory described by an action which introduces a direct coupling between the scalar field and the second derivatives of the metric through G 4 (φ). At leading order in derivatives, the metric EoM for the linear perturbations are given bȳ where for convenience we have already introduced the trace-reversed metric and the differential operator (91). Thus, there is a mixing of the perturbations, which also occurs in the scalar EoM (see appendix B for more details). We can decouple both equations by introducing a new tensor perturbatioñ combining both the trace-reversed and scalar perturbations, which is a well known result in the literature (see e.g. [45]). After applying the transverse gauge condition on the new field ∇ µh µν = 0, the EoM simplify to where G αβ s is the effective metric for the scalar perturbations Therefore, the propagating eigenstates are a combination of the original metric and the scalar perturbations. At this order in derivatives and in the absence of sources, the scalar waves will only be present if they are initially emitted. Moreover, because ϕ multiplies g µν , the scalar perturbation will generically contribute to the trace of the tensor perturbations. We can see this explicitly when solving the constraint equations for the non-radiative DoF, obtaining Thus, in Brans-Dicke type theories, the scalar perturbation excites the scalar polarizations of the metric leaving an additional pattern in the GW detector [46,47]. 7 Noticeably, in this theory there is no mixing of the radiative tensorial DoF h +,× with the scalar ϕ (G +,×s = 0), so h +,× become directly the propagation eigenstates traveling at the speed of light.

Kinetic Gravity Braiding
Similarly, we can also diagonalize the propagation equations of Kinetic Gravity Braiding (KGB), a cubic Horndeski theory with a direct coupling between the derivatives of the metric and the scalar field through G 3 (X). Note that for simplicity we have fixed G 4 = const, although one could easily add a scalar field dependence like in the previous section. Because of this cubic coupling, the metric EoM display a mixing of the scalar and tensor perturbations, At this order of derivatives, we can diagonalize the equations by changing variables to 8 As in the case of Brans-Dicke, once we apply the transverse condition to the new tensor perturbationh µν , the EoM reduce schematically to Eq. (95-96) (see details on the form of the effective metric for the scalar field perturbations in appendix B). Accordingly, the main difference between KGB and Brans-Dicke is that the propagation eigentensor involves the scalar perturbation via the gradients of its background field. In other words, depending on the background, the scalar mode could contribute to other polarizations different from the trace. We can see this excitation of non-TT DoF directly by solving the constraint equations. For example, if the scalar background has only temporal components, φ µ = φ 0 δ 0 µ , the non-radiative DoF read 7 It is to be noted an analogous sourcing of the gravitational (nonradiative) potentials occurs over cosmological backgrounds, see e.g. Ref.
[48, Eq. 3.17-3.21] in the limit k H. 8 To the best of our knowledge, this metric perturbation diagionalizing KGB equations is novel in the literature.
and h + , × propagate independently of ϕ. On the opposite regime, if φ µ = (0, φ i ), we obtain that Moreover, for the radiative DoF, we find that the mixing with the scalar has the same causal structure that the tensor modes, We are then in the c h = c m case discussed in section III A 1, meaning that both h +,× will be propagating eigenstates moving at the speed of light. On the other hand, the scalar eigenstate will be a combination of the original scalar ϕ and the tensor modes h +,×

Luminal Horndeski gravity
Altogether, the most general luminal Horndeski theory would be a combination of the previous cases (107) The dependence in φ in G 2 and G 3 does not affect the diagonalization of the leading derivative terms in the EoM. Because we are solving for the linear perturbations, the EoM can be diagonalized by a linear combination of the previous field redefinitions, i.e.

B. Non-luminal theories
As we increase the order of derivatives of the couplings between the metric and the scalar, we enter on the realm of non-luminal Horndeski theories: theories in which the second order differential operator acting on h µν no longer corresponds to the one of GR,D αβ µν , Eq. (91). This induces a different causal structure in the effective GW metric compared to the one that EM waves are sensitive to, leading to c g = c [42], even in the absence of scalar perturbations ϕ. These theories involve higher order Horndeski functions with derivative dependence G 4 (X) and G 5 (φ, X).
Moreover, in this class of theories, the same couplings that produce an anomalous propagation speed induce a background dependent polarization mixing. Specifically, this mixing can be seen in the EoM from the contraction of perturbed Riemann tensors with first φ µ or second derivatives φ µν of the scalar field. Therefore, depending on the scalar field profile the polarizations of the metric may change as they propagate. In practice, this makes the analysis of the propagating DoF difficult in a covariant approach.

Quartic theories
A good example representing this phenomenology is a shift-symmetric quartic Horndeski theory where we have added a generalized kinetic term for the scalar. The leading derivative EoM for the tensor and scalar perturbations are then and where δR µανβ is a second order differential operator constructed by a linear combination of the perturbations of the Riemann tensor, C αβ µν and E αβ µν are background tensors made of second derivatives of the scalar profile and G αβ s is the effective metric for the scalar perturbations which depends on K X and G 4,X (see full definitions in appendix B). It is precisely the presence of δR µανβ which induces the non-luminal propagation. Note also that either G 4,X = 0 or G 4,XX = 0 triggers the mixing of the perturbations in both equations.
In the following we will concentrate in the simplest theory producing this effect, a quartic theory linear in X. 9 It is clear from the equations (110-111) that the dimensionless coupling controlling the mixing is whereX = X/M 2 Pl and we have introduced G s , which quantifies the value of |G αβ s |, to ensure canonical normalization of the scalar field. In other words, if G s is large, the scalar perturbations decouple from the GW evolution.
We now identify the propagation eigenstates of the quartic theory using two methods: a) perturbative solution for small mixing and b) diagonalization based on a local 3+1 splitting.
a. Perturbative solutions for Υ 1 In order to gain some intuition, we will consider first situations in which the GW-scalar mixing is small, Υ 1, so we can make a perturbative expansion of the propagation equations. Thus we expand the full solution as solving order by order iteratively. Accordingly, at leading order (LO), we have to solve simply where we have already applied the transverse condition ∇ µh (0) µν = 0. Therefore, at LO, the equations decouple and we can fix the TT gauge,h (0) = 0. As a consequence, if there is no initial scalar wave ϕ (0) (t e ), it will remain zero along the propagation. One can also see that whilē h At next-to-leading order (NLO), the mixing terms arise in the equations where we have set ∇ µh (1) µν = 0. Note that, sinceh µν is the only non-zero term from −2G 4,X δR µανβ indicates that the perturbations of the Riemann tensors are w.r.t. the zero-th order tensor perturbationh (0) µν . Consequently, the NLO equations tell us that ϕ (1) is only sourced if φ µν T T = 0. Moreover, one can also see that, when there is no initial scalar wave, the second term of the tensor equation (117) acts to modify the GW propagation speed. This can be shown explicitly by solvingh (1) µν with its Green function and noting how the propagator of the total solutionh NLO µν =h (0) µν +h (1) µν is modified. In the opposite situation, when ϕ (0) (t e ) = 0, the different propagation speed of the scalar wave introducing a dephasing in the mixing. Note however that even in the absence of an initial scalar wave,h µν is not necessarily TT. At next order (NNLO), the equations contain all their possible terms, so they are valid for any n > 1 (again ∇ µh (n) µν = 0). b. Local, general solution in the 3+1 splitting Although the general solution when the mixing is dominant, Υ ∼ 1, is not analytically tractable, we can obtain general solution in a local region of space-time where linearized gravity applies. This is equivalent to going to Riemann normal coordinates. We have to solve the evolution and constraint equations for the 11 DoF of the problem, s ij , ϕ, Ψ, w i and Φ (see Eq. 12). As before, we will work in the spatially transverse gauge, ∂ i s ij = ∂ i w i = 0, which it is always possible to choose. Moreover, for clarity in the equations, we will restrict to a static, spatially dependent scalar field background, φ µ = (0, φ i (x)). Additional details on the equations for this derivation are given in appendix C.
Let us focus for the moment on the case of a quartic Horndeski theory in the absence of scalar perturbations. In that case the leading derivative EoM are given by (110). Thus, essentially, we need to compute the different components of δG µν and δR µανβ φ α φ β . For reference, one should remember that in GR there is only δG µν present. As in GR, the 00-equation, provides a constrain equation for Ψ. The difference is that Ψ is sourced by φ i φ j s ij , even in vacuum.
We can proceed similarly for the other equations. For the the 0j-equations, we obtain the constraint equation for w j as In the GR limit we recover the case that w i is sourced by Ψ and consequently it vanishes in vacuum. Here, the new features are the couplings to the backgrounds as well as the dependence on s ij . Next we move to the trace of the ij-equations which yields an equation for the last non-propagating perturbation Φ, i.e In the GR limit Φ is only sourced by Ψ. Therefore, for the same reason as before, in vacuum both vanish. However, for quartic Horndeski Φ is sourced by Ψ, w j and s ij .
In conclusion, we have solved Ψ, Φ and w j in the transverse gauge (∂ i s ij = 0 and ∂ j w j = 0) in terms of s ij , which are the two transverse-traceless components. We denominate these non-radiative, non-zero perturbations GW shadows. We can obtain the equations for s ij plugging in these solutions for the non-propagating perturbations in the spatial tensor equations, cf. (C13-C14).
In order to take into account the scalar perturbation ϕ we have to both incorporate the new terms in the tensor equations and include the scalar EoM. Because we are expanding over flat space, the second derivatives of the scalar background are purely spatial φ 0µ = 0. We also make the further assumption that G 4XX = G 4XXX = 0. Then, the new contribution to the tensor equations is For the 00-equation, we have Then, Ψ can be solved in terms of s ij and ϕ. For the 0j-equations we add Similarly, w j can be solved in terms of s ij and ϕ once Ψ is substituted. For the ij-equations This allow us to compute the spatial trace (128) From this last equation we can solve Φ in terms of s ij and ϕ. Finally, we also have the scalar equation Once we solve the constraints, we end up with two independent equations from the ij-equations plus the scalar EoM for three DoF, h + , h × and ϕ. Therefore, we have solved the constraint equations. For simplicity we present the equations at linear order in G 4X , where they follow the structure of section III A with coefficients Non-linear terms modify the mixing coefficientsĜ ×s and G +s but preserveĜ hh . In this way we can solve the propagation diagonalizing the EoM as described in section III A. In the absence of mixing, the propagation speeds for the tensor modes is which also coincides with the speed of the tensorial propagation eigenstate c 1 = c h . On the other hand, the scalar speed without mixing reads in the limit where G 4XX = G 4XXX = 0 (a more general expression can be derived from the full equations in [53]). One should note that inhomogeneous GW speed (134) generalizes the result of [42,55] where ϕ was set to 0 and a TT-gauge was assumed without solving the constraint equations. This result agrees with the radial and angular speed obtained from the calculation of the small-scale perturbations around a BH in Horndeski gravity [56] and generalizes that result to arbitrary propagation direction. Finally, we have to remember that although nonpropagating, Ψ, Φ and w j cannot be set to zero. At leading order in G 4X they read (assuming propagation in z direction) This suggests that all the tensor polarizations will be excited and that the fact that there are only 3DoF can be seen from the correlations among the different polarizations. GW detectors can in principle detect these GW shadows.

Quintic theories
Quintic Horndeski theories also feature GW-scalar mixing at leading order in derivatives that cannot be diagonalized covariantly. In fact, the interactions have an increased level of complexity. In addition to the operators in (110-111), there will be, for example, contractions of the perturbations of the Riemann tensor with second derivatives of the scalar background φ µν . Because the scalar second derivative tensor could have different projections into the GW polarizations e + µν and e × µν , propagation effects are subject to polarization dependence. In particular, even in the absence of scalar waves, it is possible for the GW speed to depend on the polarization in a generic quintic Horndeski model. For instance, operators like would introduce such a birefrengent effect. An interesting exception is scalar Gauss-Bonnet gravity (sGB) [57], where due to the symmetry of the theory the tensor speed does not depend on the polarization [58]. This theory is the described by the Lagrangian where GB = R 2 − 4R ab R ab + R abcd R abcd is the Gauss-Bonnet invariant. After a bit of calculus, one can show that in the absence of scalar waves, the leading order equations for sGB are the same that for a quartic theory if one replaces Then, locally and at leading order, one obtains the propagation velocity which is the same for both polarizations. It is to be noted that here φ uu corresponds to the projection of the second derivatives of the scalar field background in the direction of propagation. Therefore, the novelty in the propagation speed of GWs in sGB compared to a quartic Horndeski theory is precisely this dependence in the second derivatives. We can go one step further and compute the mixing of the GWs with scalar waves at leading order in derivative in a vacuum solution (R B = R B µν =0). The EoM would look like where δG and δR µανβ are the perturbations of the Einstein and Riemann tensor respectively defined in appendix B. From these equations we can see that the main difference of the mixing in sGB and quartic Horndeski is that in the former the mixing is through the curvature background while in the latter this happens through the scalar field background. We leave the analysis of the detectability of the mixing of GWs and scalar waves in sGB for future work.

VI. PROBING GW PROPAGATION IN SCREENED REGIONS
In this section we will present detailed GW lensing predictions for a concrete Horndeski theory featuring Vainshtein screening. We first introduce the theory Lagrangian and parameters, as well as some quantities of interest. In section VI A we present the local solutions of the scalar field around spherical lenses, including screening phenomena. Section VI B briefly describes the cosmological behavior and limits imposed by compatibility with the GW speed on the cosmological background following GW170817. In section VI C we present detailed predictions for the multi-messenger and birefringent time delays for point-like lenses. Section VI D explores the emergence of GW shadows for signals propagating in a screened region. Finally, Section VI E discusses the prospects to further probe Horndeski theories using GW lensing and birefringence.
To exemplify this modified GW propagation due to screening, let us come back to a quartic Horndeski theory (see section V B 1). We will consider a linear coupling to the curvature of the form where the shift-symmetric quartic theory L shift-sym is given by (109) in which the free functions G i depend only on the derivatives of the scalar. This linear coupling can be thought as the leading order term of an exponential coupling e p 4φ φ/M Pl , which in the Einstein frame corresponds to a linear coupling to the trace of the energymomentum tensor. For concreteness, we will consider a polynomial expansion in the Horndeski parameters Note that we are measuring the field in Planck unitsφ = φ/M Pl so thatX = X/M 2 Pl . Each of these terms have an associated energy scale Λ n which determines the length scale at which non-linearities become relevant. We can define the non-linear length scale associated to the quartic theory, and associated to the scalar kinetic interaction. Here r s = 2GM is the Schwarzschild radius.

A. Local background
Screening mechanisms suppress fifth forces around massive objects, so that GR holds in their vicinity. This is achieved in different ways depending on the underlying theory [59], but typically it is caused by a particular background configuration preventing the propagation of scalar modes (fifth forces). These backgrounds can be induced by the local matter density or curvature profile, depending on the screening mechanism. Screened environments are natural set-ups for GW lensing beyond GR, since they introduce non-trivial background profiles around massive objects that could modify the GW propagation. GW lensing effects beyond GR are thus expected to be different for different types of screening mechanisms.
For the quartic theory under consideration, screening is caused by non-linear derivative self-interactions of the scalar field. Screening becomes effective within a scale known as the Vainshtein radius: (assuming p 4X n = 1 in the last equality). Whenever the coupling to matter p 4φ is of order one, the non-linear scale (150) corresponds to the Vainshein radius. The linearized field equation is valid for r r V : in that unscreened region the scalar field mediates a force ∼ p 2 4φ times that of gravity.
It will be convenient to measure distances in units of the non-linear scale of the quartic theory:r = r/r 4 . In this units, following [60], we can obtain the screening background from the dimensionless quantity x(r), whose algebraic equation for this theory is given by 10 whereM (r) accounts for the mass enclosed in a sphere of radius r, i.e.M (r) ≡ 4πM −1 r 0 (−T t t )r 2 dr. To isolate the dependence on the source mass distribution, we make the definition This is a convenient rewriting of the local scalar field background because all the dependence in the lens mass is isolated in the prefactorr s , while ∂φ/∂r is a profile depending only on the parameters of the theory. For example, outside of the source but well within the screening radius, r r V , the profile becomes constant and far from the source we recover the decay with the inverse square distance (156) 10 To link with the notation of [60] one can set µ = β = 0, ξ = p 4φ , η = 2c 2 , α = −p 4X and ν = p 4XX , as well as A(r) =M (r)/r 3 . As it is evident from the above equation, p 4φ indeed weights the coupling to matter. One should note that by differentiating (153) along the radial direction one can also obtain an algebraic equation for ∂ 2φ /∂r 2 as a function of the theory parameters and ∂φ/∂r. This is useful for instance to compute the second derivative background limit within the screening region.
We have seen previously that the coupling of the scalar perturbation to the tensorial radiative modes is supported by the second derivatives of the scalar background (recall equation 111). For a radial scalar configuration the second order partial derivatives read using r = x 2 + y 2 + z 2 . The projection to spherical coordinates (x = r sin(θ) cos(φ), y = r sin(θ) sin(φ), z = r cos(θ)) yields the following projections (see Fig. 2) Applying this general projection to the quartic theory, we can then read from the entries of the mixing matrix in eqs. (130-133) to obtain the scalar-tensor mixing coefficients 11 11 Note that these mixing coefficients directly connect with the simplified notation of equation (26) This will be the quantity determining how much mixing between polarizations there is when crossing a screened region, which is controlled mostly by the ratio r s /r 4 . Note that whenever the kinetic screening dominates over the Vainshtein mechanism, r 2 r 4 , the tensor-scalar mixing Υ will be suppressed.
The spatial dependence of the mixing modulus is shown in the left panel of Fig. 7 for a quartic theory with a standard scalar kinetic term (p 2X = 1 and p 2XX = 0). One should note that the linear theory (p 4X n = 0 for n = 1) represented in solid lines is independent of the non-linear scale r 4 represented in the color bar. This can be seen directly fixing N = 1 in the general formula for |Υ| given in (162). On the contrary, for a quadratic in X quartic theory (like Covariant Galileons), the mixing is sensitive to the non-linear scale and highly suppressed because if screening is efficient we are always in the regime r s r 4 . Taking into account the polarization information, in the right panel of Fig. 7 we present the spatial dependence of the + polarization mixing for a GW propagating in theẑ-direction. The mixing is larger perpendicular to the propagation direction.
Interestingly, the quartic theory linear in X can be mapped (see e.g. [54]) to an Einstein-Hilbert action plus a modified gravity term L M G ∼ G µν φ µ φ ν . A theory which has been studied in [61]. While theories with n > 1 suppress |Υ| they produce larger screening regions once other constraints are imposed, which may overcome the reduced mixing |Υ|. However, including n > 1 requires additional terms in the equations for GW propagation (for details see Eq. B21), difficulting the analysis. We will focus on the n = 1 theory for this first study, leaving the general case for future work.

B. Cosmological background
Before moving towards the GW lensing observables and their detectability, it is important to consider which region of the parameter space is still viable given present data. In particular, the almost simultaneous arrival of GW and EM radiation from the binary neutron-star merger GW170817 sets the most stringent constraints on the cosmological solutions of the quartic theory under consideration.
In the theory under consideration (147), the cosmological evolution of the scalar field velocity is given approximately byφ This relation is exact for an exponential coupling G 4 = exp(p 4φ φ) in a matter-dominated universe (cf. Ref. [62,Eq. 42]), but reduces to the theory at hand when p 4φ 1 and the contributions of G 4,X are negligible compared to the G 2 terms. It also provides a good order-of-magnitude description for the late universe in the presence of a cosmological constant. If the canonical kinetic term dominates, the scalar is cosmologically unscreened and its velocity readsφ whereasφ ∼ p 4φ HΛ 2 2 /2p 2 1/3 if the non-canonical term dominates (φ Λ 2 / √ p 2 ), corresponding to the cosmologically screened regime. In what follows we will assume the unscreened solution (164). For a quartic theory (149) the cosmological change in GW speed at z ∼ 0 reads where the last equality assumes the unscreened cosmological solution (164). Assuming that only one among the p 4X n coefficients is non-zero where we have set c 4Xn = 1, as it is redundant with Λ 4 in this case. Then the GW170817 constraint |α T | 10 −15 gives a relation between the two theory parameters. For a quartic theory linear in X (n = 1), this bound can be satisfied for sufficiently small matter couplings compared to the scale of the theory, i.e.
The suppression of modified gravity effects ensures that the approximations used to estimate the cosmological evolution (164) remain valid if Λ 4 H 0 . If Λ 4 H 0 , quartic Horndeski gravity breaks down as an effective field theory at energy scales comparable to GWs with typical LIGO/Virgo frequencies [63].
As we will see below, GW lensing and birefringence effects can extend constraints based on the GW speed over the cosmological background, Eq. (167). The reason behind it is that the scalar field gradient sourced by a massive lens is significantly larger than the cosmological time variation φ /φ 1 in a region much larger than the Vainshtein radius. We will use this fact to approximateφ ∼ 0 to compute GW propagation, i.e. considering static lenses hereafter.

C. Time delays
One of the main GW propagation observables is the time delay between different propagation eigenstates and with respect to a possible EM counterpart. For that, we need to compute first the propagation speeds. Assuming that the GW propagates in theû direction, using Eq. (134), we obtain the propagation speed of the tensor modes in the absence of scalar waves which tends to the speed of light when r s r 4 and/or θ → π/2. This velocity also corresponds to the one of the purely tensorial propagation eigenstates, c 1 = c h , when there is mixing. In the absence of mixing, the scalar speed is given by Eq. (135) to arrive at which tends to the speed of light when r 2 r 4 or p 2XX → 0.
When there is mixing but Υ is small, the speeds of the propagation eigenstates are where we have defined the difference in the speed w.r.t. the speed of light ∆c 2 i = c 2 i − c 2 and among different eigenstates ∆c 2 IJ = c 2 I − c 2 J . The dots refer to higher order terms in the expansion in |Υ|.
We can now compute the associated time delays between different signals. As discussed in section III B, there will be two contributions: the Shapiro and the geometrical time delay. We discuss them separately before  commenting on time delays between multiple images in strong lensing.

Shapiro time delay
The Shapiro time delay between the tensorial eigenstate and an EM counterpart, in the limit of small velocity difference ∆c 2 h /c 2 1, is where, again, u is the GW propagation direction. On the other hand, the difference between the two mostly tensorial polarizations, in the limit of ∆c 2 21 /c 2 h 1, is We see then that for a small mixing |Υ| 1 the time delay between the mostly tensorial eigenstates will be suppressed compared to the time delay of the fastest mode and the speed of light. We can observe this directly in Fig. 8, where we present the difference in the speed and associated time delays. Now, because the multi-messenger time delay ∆t 10 scales with the scalar background ∂φ/∂r, which becomes constant in the inner screened region, the delay saturates at impact parameters smaller than the Vainshtein radius. We find this precisely in the blue line of Fig. 9. On the hand, for the tensorial polarization delays, because the delay is also proportional to |Υ| 2 sin 4 θ, the delay increases as a function of the impact parameter. This is shown in the red line. We then conclude that for impact parameters much smaller than the screening radius the delay between the tensorial eigenstates ∆t 21 becomes more constraining that the multi-messenger delay ∆t 10 . Nonetheless, such close encounters are less probable (cf. section IV D).
Going to the particular quartic theory studied in this section, we can see that the multi-messenger time delay scales as Since the scalar field profile decays rapidly outside of the screened region, determined by r V ∼ p 1/3 4φ r 4 , the order of magnitude of the delay will be given essentially by r V times the ratios (r s /r 4 ) n (∂φ/∂r) 2n , where we can find the scaling of the scalar background in (155). For a theory with G 4 linear in X we can compute the order of magnitude of the maximum time delay The time delay thus increases with the coupling to matter p 4φ and the lens mass. We could also integrate analytically for u, b r V , since we know the solution of ∂φ/∂r, to obtain where the integration is performed from −u to u. This order of magnitude calculation can be compared with the explicit calculation that we present in the left panel of Fig. 10 as a function of the parameter space p 4φ and Λ 4 for a super-massive black hole (modeled as a point lens) of mass 10 10 M . We emphasize that our results can be easily adjusted to other masses. It is important to note though that for larger masses (galactic order of magnitude) one would expect the mass to be distributed in a halo, so that the point lens approximation is broken. Introducing a realistic mass distribution would reduce the mass contained in the inner screened region, reducing the induced time delay. We will elaborate more on this later in section VI E. Finally, let us mention that for the multi-messenger time-delay ∆t 10 there would be an astrophysical uncertainty of order 1 − 10 seconds. This means that in practice one can only rule out modify gravity theories with larger delays.
We can also compute explicitly the time delay between the polarization eigenstates. Starting from (174) and noting that for our fiducial theory c s = c (so that ∆c 2 hs = ∆c 2 h ), we find This means that ∆t 21 will be suppressed with respect to ∆t h . From Fig. 9 we see that ∆t 21 increases inversely proportional to the impact parameter b. We can also compute the time delay analytically close to the lens which in this case will dominate the overall integral since the major part of the delay is accumulated close to the lens. Nonetheless, one should remember that smaller values of b are less probable to occur. For that reason we fix b = r V to compute the time delays in the right panel of Fig. 10. Thanks to the ∆t 21 ∼ 1/b scaling, this plot can easily be adapted to other choices of the impact parameter. One should remember that the detectability of the birefringence time delay is only limited by the time resolution of GW detectors that can be considered to be ∼ms.

Geometrical time delay
In order to compute the geometrical time delay (54), we first need to obtain the deflection angle associated to each propagation eigenstate, which can be obtained from their propagation speed (48). In particular, the deflection angle between the tensor propagation eigenmode and the where in the second line we have specialized for a quartic theory linear in X. This corresponds to the blue line in Fig. 11, where one can see that inside the screened region the deflection angle difference decreases with the impact parameter.
We can similarly compute the deflection angle between the two mostly tensorial propagation eigenstates This corresponds to the red line in Fig. 11. In this case, the deflection angle is dominated by the behavior close to the lens. This can be approximated analytical solving the integral in the limit u, b r V . For our fiducial theory we obtain From this expression the most important feature is that it scales with the inverse of the square of the impact parameter. This growth is faster than the typical 1/b induced by a gravitational potential, as can be seen comparing with the black solid line in Fig. 11. One can see though that for impact parameters of the order of the Vainshtein radius the difference in the deflection angle is small compared to the effect of the point mass potential.
Using (54) we can translate the difference in the deflection angles into the geometrical time delay. As we have seen in Fig. 11, the difference in the deflection angle will be much smaller than the deflection angle induced by the gravitational potential (except very close to the lens where the difference reduces). Therefore we can use the approximate expression for the geometrical time delay given in equation (58) that makes use of this hierarchy in the order of magnitude of the deflection angles.
The mass of the lens and its relative location in the line of sight determine the relative importance of the Shapiro and geometric contributions to the total time delay. In Fig. 12 we present the ratio of both time delays as a function of the lens redshift. The geometrical time delay dominates for lenses halfway to the source, while Shapiro dominates when z L → 0, z S . With fixed b ∼ r V , this is driven by the proportionality with the universal deflection angle α 0 ∝ r 2 E /b in Eq. (58), as r 2 E ∝ D L D LS is reduced when the lens is near the source or the observer.
The Shapiro-to-geometric contribution depends differently on lens mass in multi-messenger and birefringent delays (different line styles, left panel of Fig. 12). The Shapiro contribution to ∆t 21 is reduced with increasing mass, while the opposite is true for ∆t 10 , independently of the lens redshift. The mass dependence can be understood from the right panel of Fig. 12, where we present how the Vainshtein radius compares to the Einstein radius as a function of the scale of the theory for fixed α T = 10 −16 . For Λ 4 = H 0 , as chosen in the left panel, a lens with 10 10 M will have the Vainshtein radius well within the strong lensing region, while a 10 5 M lens will have r V > r E . Whenever the impact parameter is smaller than r E , the geometrical time delay will be large. On the other hand, the multi-messenger Shapiro delay scales with the Vainshtein radius and decreases faster than the geometrical one when the lens mass is reduced. Finally, the birefringent Shapiro delay is mostly accumulated near the lens and thus is less affected by the reduction of the lens mass than the analogous geometrical delay.

Multiple image time delays
As introduced in section IV C, in the regime of strong lensing there will be multiple images with an associated delay between them. At the same time, each of this images will be subject to the effects of Shapiro and geometrical time delay for the propagation eigenstates. It is therefore appropriate to ask how this multiple image time delays compare to the delay between the propagation eigenstates.
For the example screening theory that we are considering here, we have seen in the right hand plot of figure 12 that indeed the Vainshtein radius falls inside the Einstein radius for a sector of the parameter space. At small impact parameters, the time delay between the images scales as ∆t ± ∼ t M · b/r E while the Shapiro time delay between the mostly tensor polarization scales as ∆t 21 ∼ t M · r V /b. Therefore, depending on the value of b, the time delay between the images could be larger than the one between the eigenstates or viceversa. Note that in terms of statistics ∆t ± > ∆t 21 is much more probable, as it corresponds to larger impact parameters. Therefore, if a pair of GWs were identified as strongly lensed images of the same event, this would be a perfect candidate to look for additional lensing effects due to screening.

D. Polarization mixing and GW shadows
At leading order in the GW propagation, the other main observable is the appearance of additional polarizations beyond the transverse-traceless tensorial modes. We have shown that there are two ways in which extra polarization can arise: (i) by a direct mixing between the tensor-scalar perturbations through Υ and (ii) by background profiles inducing non-radiative polarizations, what we have called GW shadows.
For the propagating scalar polarizations, we can quantify the coupling of the tensor-scalar mixing from the mostly-tensor eigenvector. In particular, the third entry v 23 in the mixing matrix (29) informs us of the amplitude of the scalar mode that would be generated even if initially there is no scalar mode. In the top panel of Fig. 13 we plot this tensor-scalar mixing. For a linear quartic theory with a standard scalar field kinetic term, the mixing simplifies to v 23 = −|Υ| sin 2 θ .
We can observe that the amplitude of the scalar perturbation can only be sizable near the lens. With respect to the shadows polarizations, we can take as an example the non-propagating polarization Ψ. In particular, we consider a + polarized GW propagating in the z direction. Then, the amplitude is given by where the approximate equality accounts for the fact that we are neglecting the contribution from ϕ (that as we have just seen is small if initially ϕ is not sourced). Because of the sin 2 θ proportionality, the amplitude of Ψ evolves similarly to v 23 , as shown in the lower panel of Fig. 13, however the amplitude is many orders of magnitude smaller. Therefore, for this class of theories compatible with GW170817 detecting GW shadows seems out of reach.

E. Observational prospects
To conclude this section we will discuss the observational prospects of detecting these novel lensing effects beyond GR. The first question would be in which systems these effects would be relevant. In our calculations we have done two important assumptions: we have worked at leading order in geometric optics and modeled the lenses as point masses. Both effects limit the lenses available to test quartic Horndeski theories. We will comment on the implications of these assumptions and then discuss the potential of GW birefringence to probe our example theory.
Working in the geometric optics regime imposes a lower limit on the frequency of GWs for which the short-wave expansion (18,19) applies. The exact limit depends on the background solution around the lens and theoryspecific lower-order corrections to the propagation equations. Even neglecting beyond GR corrections, the frequency range is restricted by the diffraction limit in GR, below which lensing magnification becomes very inefficient. The GR diffraction limit, Eq. (23), corresponding to GW wavelengths larger than or comparable to the Schwarzschild radius of the lens and is shown in figure 15 for f ∼ Hz & kHz.
The diffraction limit excludes stellar-mass lenses to test birefringence using a short-wave expansion. This would be excellent lens candidates, as most stellar objects can be considered point-like, i.e. their sizes are much smaller than their Vainshtein radii, even for theories compatible with GW170817, cf. figure 15. Note that the validity of geometric optics is a limit on the framework, indicating the need of a wave-optics description. In particular, it does not mean that birefringence or time delays cease to exist. If a description similar to the wave-optics integral is valid at low frequencies (Eq. 3 of [32]), the birefringence time delay should leave an imprint on the waveforms, even if gravitational magnification is negligible. The point-mass assumption is a good description for impact parameters larger than the lens size. Because birefringence is suppressed beyond r V , ideal lenses should be smaller than their own Vainshtein radii. Effects on general lenses can be computed given their mass distribution. By Gauss' theorem, the scalar field profile around a spherically symmetric lens is sourced by the enclosed mass at a given radius, i.e.M (r) in Eq. (153). We will model extended sources as truncated singular isothermal spheres (SIS) and ρ SIS = 0 for R > R SIS . The truncation at R SIS ensures a finite total mass, but does not affect the results for low impact parameter. The SIS profile is widely used as a model for simple gravitational lenses. Note that the matter density diverges at the center. 12 The reduced enclosed mass at low radii flattens the derivatives of the scalar field, lowering the time delays in extended lenses. Figure 14 shows the Shapiro time delays for two lenses with 10 10 M : one with a point-like distribution and another one with a SIS profile truncated at R SIS ≈ 250kpc. The delay between gravitational polarizations ∆t 12 is more affected than the multi-messenger time delay ∆t 01 due to the different dependence on the scalar field derivatives, including the shear via Υ. This reduces both the slope and the amplitude of ∆t 12 .
The maximum time delay in finite lenses occurs at a parameter impact smaller than the nominal Vainshtein radius. The reason is that only the total mass with a radius r contributes to the scalar field profile. This motivates the definition of an effective Vainshtein radius r eff where the dependence on theory parameters has been omitted. For the truncated SIS, the mass dependence M ∝ r results in r eff For a point-lens the 12 A regular value of the central density will suppress beyond-GR effects near the center. In the case of a homogeneous density ρ(r) ∼ const, Υ ∼ 0 and birefringence effects vanish entirely.   The requirement of lenses being smaller than their effective Vainshtein radii limits the type of objects (or portions thereof) that can contribute significant time delays. Figure 15 shows the sizes and masses of known astronomical objects that could act as lenses. These are, in order of increasing mass, the Sun, a large star η Car A, a dense globular cluster M75, the massive black hole Sgr* in the center of the milky way, the very dense dwarf galaxy M85-HCC1, the super-massive black hole in M87, the Milky Way halo and the Galaxy Cluster "El Gordo". The mass profiles of extended objects have been extrapolated inward assuming a SIS distribution (185) using the total mass and size (or outward using the central density in the case of M75). This extrapolation suggests that some portion of extended lenses will be within its own effective Vainshtein radius, at least for theories compatible with GW170817 with low Λ 4 .
Super-massive black holes (SMBHs) appear as the optimal lenses to further constrain quartic Horndeski theories. But because black hole solutions have vanishing Ricci curvature, SMBHs would not source the field via the conformal coupling (147) in the specific theory under consideration. SMBHs could still provide an effective lens if they lead to the accumulation of dark matter around the black hole with sufficient density. In such scenarios, a "dark matter spike" could encompass a mass comparable to that of the central black hole in a very small central region of radius r ∼ 0.1(M SMBH /10 6 M )pc [64], sourcing the scalar field profile at the level to render the lens efficient. A coupling to the SMBHs may be induced by the cosmological evolution of the field, as it has been shown to occur for cubic Galileons (unconstrained by GW170817) [65][66][67].
We summarize the parameter space of the quartic Horndeski theory that could be constrained with lensing time delays in figure 16. As an order of magnitude estimate, we consider testable multi-messenger time delays ∆t 10 > 1s and delays between the polarizations ∆t 21 > 1ms. It is clear from the plot that a large new sector of the parameter space p 4φ , Λ 4 could be probed beyond current constraints from GW170817. For reference we also highlight the parameter space in which the scale of the effective field theory cutoff is smaller than LIGO frequencies [63]. Moreover, one can also see that the birefringent Shapiro time delay can constrain a larger portion of the theory than the multi-messenger delay, as can be seen comparing the orange and blue shaded regions respectively.
We also include the geometrical time delay induced by the modified deflection angle with dashed and dasheddotted lines for the multi-messenger and birefrengent delays respectively. We have chosen the redshift of the lens to give the maximum delay and, in this case, it can be more constraining than the Shapiro delay. One should note that while the geometrical time delay is subject to the lens-source-observer geometry, the Shapiro delay only cares about how close to the lens the GW passes. This means that for example if the lens is very close to the observer or source, the Shapiro time delay will dominate over the geometrical. This is interesting because from the sky localization of the source we can then ask for instance whether the GW has traveled close to the center of the Milky Way or Andromeda and quantify what would be the associated Shapiro delay. The possibility of detecting Shapiro time delays via birefringence allows novel tests of GR via GW lensing. One such possibility is the case of a binary merging in the environment of a SMBH, discussed in section IV B. If it turns out that there is non-negligible population of BBHs merging near an AGN, these would be ideal sources to constrain this type of modify gravity theories. For example, if the EM flare associated to GW190521 [29] was confirmed as an indication of this type of systems [30], this would imply that the binary would have merge very close to the SMBH, around 20 − 300 r s [28]. In this interpretation the mass of the SMBH near GW190521 would be ∼ 10 8 M so that the BBH would be located at ∼ 0.2 − 3 · 10 −3 (M L /10 8 M ) pc. For reference, we include in figure 16 with dotted lines where the Vainshtein radius is placed in the (p 4φ , Λ 4 ) parameter space for r V /r s = 10 3 , 10 4 , 10 5 . Therefore, BBHs merging in the accretion disk of an AGN would probe the whole birefringent Shapiro time delay parameter space region in orange.

VII. CONCLUSIONS AND PROSPECTS
Gravitational lensing of gravitational waves (GWs) is sensitive to the propagation of GWs around massive objects and cosmic structures. Gravity theories beyond general relativity (GR) modify the GW propagation by altering the background on which GW propagate and introducing mixing among different polarizations. A theory for GW propagation should unify known propagation effects on FRW backgrounds with new interac-tions between different gravitational degrees of freedom around gravitational lenses, incorporating and generalizing the phenomenology of gravitational lensing. Formulating such a theoretical framework poses a significant challenge.
Here we have analyzed the propagation of gravitational radiation beyond GR in general space-times. We have first developed a model-independent framework and then applied it to Horndeski scalar-tensor theories, without assuming that GWs propagate at the speed of light. We addressed the mixing between different gravitational polarizations induced by lenses that locally break homogeneity and isotropy, working at leading order in derivatives. This approach allowed us to study the causal structure and thus the arrival time of different signals. It also provides a measure of the mixing between different radiative degrees of freedom, but not the corrections to their amplitudes.
Our main conclusions can be summarized as follow: • Simplifications that allow the study of GWs in GR can not be generalized beyond. The traceless gauge can only be set as an initial condition. Nonradiative degrees of freedom, sourced by the GWs via constraint equations, become GW shadows.
• GW propagation is best described using propagation eigenstates H A , which differs from the interaction basis (h µν , ϕ) in a space-dependent manner.
Breaking the symmetry of the background is necessary for the H A 's to mix the metric and scalar fields in Lorentz invariant theories.
• Propagation eigenstates can travel at different speeds, c 1 , c 2 , c 3 . These depend on the theory and background solutions through the speeds for the metric, scalar & mixing term (c h , c s , c m ), and the mixing amplitude M φ .
• Gravitational lenses act like prisms, splitting the propagation eigenstates H I according to their speed. Differences in local speed and deflection angles contribute to lensing-induced time delays between H 1 , H 2 , H 3 and possible EM counterparts.
• The most promising novel observable is the birefringent time delay between the two mostly-metric polarizations H 1 = h × , H 2 ∼ h + (no EM counterpart needed). A lensed GW signal can interfere with itself, causing a scrambling of the wave-form, or be split into echoes from the same event.
• GW birefringence provides novel tests of theories with screening mechanisms. We present detailed predictions for a quartic Horndeski theory, showing how GW lensing effects have the potential to probe regions of the parameter space beyond the stringent limits set by GW170817.
This work is a first step in developing a theory for GW propagation beyond GR including additional polarizations, and exploring the phenomenology of polarization mixing. Future work needs to include the evolution of the amplitude (perhaps at lower WKB orders) to derive complete predictions that can be tested against GW data.
The first obvious step at the theoretical level is extending the computation to next-to-leading order in the short-wave expansion and beyond. The full geometric optics framework is needed to reliably compute the amplitude and explore new effects that persist in the high-frequency limit. Additional, post-geometric optics corrections are frequency-dependent and could be very constraining, even if they're suppressed by inverse powers of the frequency. As argued in the text, birefringence may persist in the wave optics limit (at least for f 1/r s , 1/r V ): a complete treatment will allow new lenses to be used to test beyond GR theories, including stellar mass objects as lenses for LIGO/Virgo sources.
Another future direction is to link the general framework developed in section III to other theories of gravity and place constraints on them. The example quartic Horndeski theory we have considered in section VI is already very constrained by GW170817, so further constraints require extreme lenses. However, in theories with multiple fields or Lorentz violation the cosmological/homogeneous deviation in the GW speed could be suppressed, allowing GW birefringence effects to place stringent constraints. As we have discussed in section V B 2, constraints may be derived also for theories with scalar hair, like scalar Gauss-Bonnet.
Future analyses should also test these novel beyond GR lensing effects against GW data. Under the assumptions outlined in section IV birefringence predicts a very simple modification of the waveform, depending only on two parameters per lens. In the scrambling regime, the predictions can be tested against available GW data, including degeneracies with source parameters. Tests in the "echoe" regime, which splits the signal in two, are more subtle and rely on either on pairing events with related properties (similarly to searches for strongly lensed signals) or on an excess of edge on sources (if signals are lost). A robust statistical framework is needed to carry those tests, as well as to use them to further constrain theories of gravity.
The nature of birefringence beyond GR allows new opportunities with respect to "traditional" lensing studies. For instance, lenses near either the source or the observer (with very small Einstein radii) have a decent chance to produce birefringence through the Shapiro time delay. Correlating signals with maps of known nearby lenses may allow to refine constraints substantially (e.g. signals coming through the galaxy plane). Moreover, if a fraction of the population of binary black holes merge in the disk of active galactic nuclei, as it is suggested by the possible EM counterpart of GW190521 [29] discussed in [30], these systems would be ideal for these tests. Similarly, identified strongly lensed GW pairs would be valuable probes since in that case it is also guaranteed that the GW has traveled close to the lens.
Altogether, gravitational lensing of GWs has the potential to become a fruitful laboratory in which to test gravity. It benefits from the precision of tests of the GW propagation, while avoids the necessity of identifying EM counterparts which limits the reach of tests of the cosmological GW propagation. Future GW observing runs will provide enough events for these and other novel lensing effects to be probed. This works represents a first step towards understanding the rich phenomenology of GW lensing beyond GR.
2. if the friction differs Γ = Γ X , fixing X is a good approximation only in a region |∆x µ | ∆Γ −1 , which is determined by the non-metricity K, but independent of the physical frequency.
3. if the mass term differs m 2 = m 2 X the fixing is good in a region |∆x| · ∆m 2 /| k| 1, which becomes arbitrarily large at higher frequencies.
Note that these conditions do not take into account the failure of the constant background assumption, which is independent of the GW frequency. Matter sources will also make it impossible to set h X = 0 (just as in GR).
While fixingc = c X can be done in general (this is reason for defining an ATG), doing so introduces friction and and curvature terms that limit the validity of h X ≈ 0 (cases 2,3). While the mass condition (case 3) might be unimportant for sufficiently large frequencies, the friction condition (case 2) imposes a frequency-independent limit to the condition h X ≈ 0. Depending on the difference betweenḡ µν ,g µν , this region may or not be large enough for the residual ATG to afford a valuable simplification.

Kinetic Gravity Braiding
We can follow a similar approach for Kinetic Gravity Braiding, the cubic Horndeski theory defined in Eq. (99). The metric EoM of this theory are given by G 4 G µν + 1 2 G 3,X φ φ µ φ ν − G 3,X φ α φ α(µ φ ν) + 1 2 g µν G 3,X φ µ φ µν φ ν = 0 (B11) and the scalar EoM follows Again, the leading derivative part of the EoM for the perturbations can be written in matrix form as where we have introduced the scalar term From here we se again that the scalar perturbation terms in the metric equation can be reabsorbed in a field redefinition as given by (101). Then applying the transverse condition one fully diagonalizes the leading derivative interactions.

(B21)
If we restrict to a theory where G 4XX = G 4XXX = 0, then we do not have to consider E αβ Therefore, as in GR, the 00-equation tell us that the spatial trace Ψ follows a Poisson-like equation where the source are the components s ij modulated by the background. This implies that for this theory only s ij contains propagating DoF. Note that in the case of having a scalar perturbation present this conclusion would not change.
In the transverse gauge, for quartic Horndeski in vacuum and ϕ = 0, we can write the solution of the 00equation, presented in Eq. (??, as We apply a similar strategy to the other equations. To simplify let us fix ∂ i w i = 0. For the the 0j-equations, the relevant terms are Therefore we obtain the constraint equation for w j presented in equation 122.
Next we move to the ij-equations. The two parts are and With all these calculations we can compute the trace of the ij-equations which determine the evolution of Φ given in equation 123.