Vainshtein Screening in Bimetric Cosmology

We demonstrate that the instabilities in linear cosmological perturbations in bimetric theory are the manifestation of the non-linear Vainshtein mechanism on an FRW background. The spin-2 mass serves as the cosmological Vainshtein scale in this case. This allows us to quantitatively address early universe cosmology. In particular, in a global analysis, we study data from the cosmic microwave background radiation and local measurements of the Hubble flow. We show that bimetric cosmology resolves the discrepancy in the local and early-time measurements of the Hubble scale via an effective phantom dark energy component.


I. Introduction
Modifications and extensions to the theory of general relativity (GR) are highly restricted [1,2]. One possible direction is the addition of new degrees of freedom to the massless spin-2 field in a consistent manner. Ghostfree bimetric theory describes a massless and a massive spin-2 field with fully non-linear (self-)interactions [3][4][5][6][7][8][9][10][11][12][13][14][15]. It contains both general relativity and massive gravity in certain parameter limits in which the massive or the massless field decouples, respectively. As such, bimetric theory fills a gap in the list of consistent field theories for massive and massless particles with spin up to 2 and represents an extended gravitational theory with a rich phenomenology.
Bimetric theory can address several open questions in modern cosmology. The theory gives rise to selfaccelerating solutions where the interaction energy between the massive and massless spin-2 field acts as dynamical dark energy [16][17][18][19][20][21][22]. The modification of the gravitational potential (as compared to GR) affects the required Dark Matter abundance from galactic to galaxy cluster scales [23]. Also, the massive spin-2 field itself serves as a candidate for Dark Matter [24][25][26]. Despite these successes, the stability of perturbations around the cosmological background [21,22,[27][28][29][30][31] still poses an open problem 1 . At early times, certain perturbations become large, rendering linear perturbation theory invalid.
In static systems with spherical symmetry an analogous behavior occurs [7,[32][33][34]: Non-linear effects in the perturbations are relevant and render the linear approxi- * mlueben@mpp.mpg.de † angnissm@mpp.mpg.de ‡ smirnov.9@osu.edu 1 Even though linear perturbation theory breaks down, the instabilities can be pushed to arbitrary early times such that the FRW background is stable at all energy scales below, e.g., the cutoff of the theory [31]. This is precisely the GR-limit of the theory. mation invalid. This is the well-known Vainshtein screening mechanism [7] that restores GR in spacetime regions where the energy density is large (compared to the spin-2 mass). In massive gravity, this behavior is crucial because it cures the so-called vDVZ discontinuity [5,6]).
In this paper, we argue that the growth of linear perturbations around the cosmological background is a manifestation of the Vainshtein mechanism. We give a physical argument for the cosmological version of this mechanism and find that the spin-2 mass sets the energy scale at which non-linearities become important. We further back up this argument by studying the evolution of metric functions on the background and the perturbative levels. It then becomes clear that the universe transitions from a Vainshtein screened period at early times into a late time de Sitter phase.
We use our result to address a highly debated problem of the ΛCDM model, the H 0 -tension. As has been discussed at length in the literature [35][36][37][38][39][40][41][42][43][44][45][46][47][48], the CMB observations tend to predict a lower Hubble scale than local measurements. We demonstrate that this tension can be resolved in a minimal realization of bimetric theory with only two more parameters than the ΛCDM-model. These are the mass of the spin-2 field and its coupling constant to ordinary matter. We perform a fit to the CMB, Cepheid and supernova Type Ia (SN Ia) observables and demonstrate that the tension with the current data is alleviated. Instead, all observables are within 1σ error intervals. Finally, we comment on the compatibility with observations of baryon-acoustic-oscillations (BAOs).

II. Review of bimetric theory
In this section, we review some technical aspects of the bimetric theory. Readers familiar with bimetric theory might want to jump directly to the next section. For a review on bimetric theory we refer to Ref. [49].

A. Action and equations of motion
The action of bimetric theory with standard model matter minimally coupled to one metric is [15] where R(g) and R(f ) are the Ricci scalars of the metric tensors g µν and f µν , respectively. The parameter m g is the Planck mass of g µν and α parametrizes the ratio to the Planck mass of f µν . The bimetric potential, whose structure is entirely fixed by the absence of ghost instabilities, reads in terms of the elementary symmetric polynomials e n . The interaction parameters β n have mass-dimension 2 in our parametrization. Matter fields, collectively denoted by Φ, minimally couple to the metric g µν via the generic matter Lagrangian L m .
Varying the action (1) with respect to the metric tensors yields two sets of modified Einstein equations, where G g µν and G f µν are the Einstein tensors of g µν and f µν , respectively. The variations of the potential yield where the matrices Y λ (n)ν (S) are sums of different powers and contractions of the matrix g −1 f , see, e.g., Ref. [11]. Since matter minimally couples to g µν , the stress-energy tensor for g µν is For usual matter (invariant under diffeomorphisms), stress-energy is conserved where ∇ µ is the covariant derivative compatible with g µν . Due to the Bianchi identity ∇ µ G g µν = 0, this implies an on-shell constraint on the potential, referred to as Bianchi constraint.

B. Proportional background
The theory has a well-defined mass spectrum only around proportional backgrounds where the two metrics are related as f µν = c 2 g µν for some non-vanishing constant c [15,50]. The solution is an Einstein background, R µν (f ) = R µν (g) = Λg µν , with a cosmological constant given by The equality of the first and second line is required for the consistency of the vacuum solution. It is a quartic polynomial in c and each root corresponds to a different Einstein solution to the bimetric field equations. The Fierz-Pauli mass of the massive spin-2 mode that propagates on the proportional background is given by in terms of the bimetric parameters. For later use, let us introduce the short-hand notationᾱ = αc. In the rest of the paper, we will refer toᾱ, m FP , and Λ as the physical parameters of a solution [51].

C. FRW solutions
To describe the universe on large scales, we assume spacetime to be homogeneous and isotropic according to the cosmological principle. Both metrics take on FRW form [16], where we fixed the time-reparametrization invariance such that we work in conformal time η of the g-metric.
The scale factors a(η) and b(η) of the metrics g µν and f µν are functions of time only. The f -metric lapse (1+µ) parameterizes the relative twist between the coordinate systems of the two metrics. In this sense, it is similar to a Stückelberg field since it would be shifted by time reparametrizations of the metric f µν . From now on, we suppress the η-dependence. For convenience, let us introduce the Hubble rate and the ratio of the scale factors as where the dot represents derivative w.r.t. conformal time.
The conformal and physical Hubble rates are related via H = aH. We assume the universe to be filled with a perfect fluid with stress-energy tensor, where ρ m is the energy density and p m the pressure of the fluid. They are related via the linear equation of state w m = p m /ρ m . u µ is the 4-velocity of the fluid. The conservation eq. (8) leads to the continuity equation, which is solved by ρ m = ρ m,0 a −3(1+wm) , where ρ m,0 is an integration constant. The physical scale factor is related to the redshift as a = (1 + z) −1 . The Bianchi constraint on the dynamical branch 2 is solved by where we have introduced the derivative w.r.t. e-folds, = d/d ln a. On the dynamical branch, the modified Friedmann equations for g µν and f µν read, Combining both Friedmann equations yields a quartic polynomial for y, that can be thought of determining y as a function of ρ m . The polynomial has in general up to four real-valued roots, of which only one (the so-called finite branch) is physical [20,22]. On this branch, the ratio of scale factors evolves from y = 0 in the early universe (where ρ m = ∞), to a finite constant value y = c in the asymptotic future (where ρ m = 0, i.e. de Sitter space). Taking the derivative w.r.t. e-folds and using the conservation eq. (17), we can express y as a function of y as [18] where ρ m is a function of y via eq. (21). In terms of the dynamical mass parameter 2 Another solution to the Bianchi constraint is the algebraic branch, where the ratio of the scale factors is fixed, y = const. This branch was studied in the literature and found to be pathological, see, e.g., Refs. [22,27,52].
we can rewrite eq. (22) as In the following discussion we always assume an expansion history on the finite branch 3 implying m 2 eff > 2H 2 [53].

III. Vainshtein screening
The Vainshtein mechanism was discovered in systems, where due to a locally increasing gravitational field, GR is restored by non-linear interactions [7,32]. We translate the analysis to the case where the gravitational field varies in time. By studying the cosmological background and linear perturbations we identify the energy scale at which non-linearities become important. Due to our analogy this can be interpreted as a cosmological Vainshtein mechanism.

A. The standard Vainshtein mechanism
On the technical level, the Vainshtein screening effect is caused by the strong coupling of the longitudinal helicity-0 mode of the massive spin-2 field.
In the case of a point source in bimetric theory, the critical radius below which the non-linearities of the fields become strong enough is given by [33], Here, the Schwarzschild radius for a source of mass M is given by For a central, point-like source (or on scales much larger than the massive object itself), the induced gravitational potential is then given by [25,33,54,55], Inside the Vainshtein sphere, i.e. when r r V , the gravitational potential is the same as in GR with Planck mass m g . Outside the Vainshtein sphere, the massive spin-2 field propagates, contributing an attractive Yukawa term to the potential. On scales much larger than the Compton wavelength, r m −1 FP , the Yukawa term is suppressed and the potential coincides with the one in GR, however with a different Planck mass given by m Pl = √ 1 +ᾱ 2 m g , compared to the one inside the Vainshtein sphere. In this sense, there are two different scales on which GR is recovered, but with different effective Planck masses. Note that we neglect an asymptotic cosmological constant in eq. (27).
Given a Schwarzschild geometry, on a technical level, the crucial indicator for the transition from the Vainshtein screened regime to the massive gravity regime, is the radial, relative metric twist µ(r) [32,55]. This function drops off quickly outside the Vainshtein regime and can be used as a small parameter in perturbation theory. Inside the Vainshtein regime µ(r) becomes large and the perturbative expansion breaks down. On the non-linear level, however, in this regime, the GR solution is recovered. We will show in the following, that the same behavior is present on the FRW background and use the temporal metric twist function µ(η), introduced above, to study when it becomes non-linear.
For a source of finite size, the Vainshtein mechanism can be at work, once the radius within which the matter is concentrated is smaller than the Vainshtein radius itself. This indicates that there exists a minimal density for the Vainshtein mechanism. In the next section, we will elaborate on this observation and apply its logic to cosmology.

B. The Vainshtein mechanism in cosmology
In this section, we will explore possible implications of the standard Vainshtein mechanism for the Universe filled with a homogeneous energy density.
In particular, using properties of the solution for a static, spherically symmetric system around a compact source, we would like to answer the following question: What is the critical density ρ c of a homogeneous mass distribution, for which the entire mass lies inside its own Vainshtein radius? We can only give a very rough estimate for this value since the standard expression for the Vainshtein radius is derived assuming that its value is larger than the radius of the source. In the following, we simply extend its definition to all possible configurations.
Let us thus consider a spherical, constant mass or energy distribution ρ(r) = const. This distribution can be infinitely extended with radius R = ∞ or have a large but finite radius R; for our argument below, this will make no difference. The mass M (r) enclosed within a distance r < R from the center of the mass distribution is, The Schwarzschild radius corresponding to this enclosed mass is, The Vainshtein radius corresponding to the mass M (r) is, The mass within the radius r fits precisely inside its own Vainshtein radius, when r = r V . This gives the value for the critical density, For less dense systems with ρ < ρ c , the Vainshtein radius is smaller than the radius that encloses its corresponding mass. For denser systems with ρ > ρ c , the mass enclosed by a radius r lies entirely inside its own Vainshtein radius.
We can now apply this result to cosmology, assuming that time evolution will not invalidate the arguments. Our reasoning suggests that there exists a critical value ρ c for the homogeneous energy density ρ(t) in the Universe, given by (31). Since ρ(t) is time-dependent and increases with redshift, there will be a point in time when it passes the value ρ c . Using Friedmann's equation, with ρ being the total energy density of the universe, this translates into a critical value for the Hubble rate, Moving backward in time, we reach the point t c at which H(t) passes the value H c . At this time, the energy density takes the value ρ c , for which its corresponding mass lies entirely inside its own Vainshtein radius. For a homogenous matter distribution like in cosmology, the Vainshtein radius scales linearly with radius, r V ∼ r. Therefore, if the energy density of the universe is below its critical value, ρ < ρ c (or equivalently H < m FP ), each Hubble patch is outside the Vainshtein regime. Analogously, for larger energy densities, ρ > ρ c , and hence at early times, each Hubble patch lies within its Vainshtein region. Even though we apply a concept derived in a spherically symmetric setup to a homogenous energy distribution, the equations do not single out a preferred point in space. This is consistent with translational invariance of FRW; our analogy does not spoil the cosmological principle.
Instead, this allows for another, rather curious interpretation. In the expanding universe, let us treat the Big Bang singularity as a central source and measure the distance to it by the inverse Hubble scale H −1 . From this perspective, the profile of the energy density distribution is ρ ∼ H 2 ∼ r −2 according to Friedmann's equation. The corresponding Schwarzschild radius is r S = H −1 . Calculating the Vainshtein radius in this setup yields which of course leads to the same critical Hubble rate H c derived above as this is just the special case of the above analysis for r = H −1 .
Vainshtein r�gime Since the universe expands faster than its Vainshtein sphere, it unavoidably becomes larger than this sphere. That happens at the critical Hubble rate H c = m FP .
In fig. 1, we show the different scalings of the Vainshtein radius and of the size of the observable universe as functions of the Hubble rate H. The cosmological Vainshtein radius scales as r V ∼ H −1/3 while the size of the observable universe scales as H −1 . Therefore, at early times when the Hubble rate is large, the Vainshtein radius is larger than the size of a Hubble patch. At the critical value set by the spin-2 mass, the former surpasses the latter, and at late times a Hubble patch is larger than its Vainshtein sphere.
In fig. 2, the evolution of the universe is schematically depicted, and can be interpreted as follows. The Big-Bang singularity serves as an analog of the central source in spherically symmetric systems. Moving away from the source, i.e. letting time pass, the evolution is expected to be governed by equations of motion equivalent to GR because we are inside the Vainshtein sphere. When the Hubble rate becomes comparable to the critical value, the universe exits the Vainshtein regime and enters a transition phase where the evolution is governed by the bimetric field equations. Asymptotically in the future, the universe approaches de Sitter space. In this sense, also at late times, GR is effectively recovered.
We emphasize again that we made simplifying assumptions about the validity of the expression for the Vainshtein radius 4 . Thus our arguments here can at most 4 Note that the standard derivation of the Vainshtein mechanism is restricted to scales r which are much smaller than the Compton wavelength of the massive spin-2 field, r m −1 FP . When considering the entire Universe, the Vainshtein radius is r V,crit = m −1 FP and thus violates this condition. give very rough estimates for the critical values of cosmological quantities. Nevertheless, these approximate values are supported by results in cosmological perturbation theory, see section III D. Moreover, we observe interesting behavior of the solutions already at the background level, as we demonstrate in the next section.

C. The spin-2 mass in background cosmology
In this section, we demonstrate at the level of Friedmann's eq. (19) that the background cosmology indeed transitions from the GR to the bimetric phase exactly at the energy scale m FP . In particular, we study the evolution of the Stückelberg field µ.
Let us start by analyzing the asymptotic behavior of µ. At late times, the matter-energy density ρ m vanishes implying that y vanishes and the Stückelberg field is small, cf. eq. (18). At early times y approaches zero and the energy density diverges as ρ m /m 2 g ∼ β 1 /(α 2 y). The Stückelberg field approaches the constant value, which is O(1). The Stückelberg field µ hence transitions from a large value at early times to a small value in the asymptotic future. This behavior is analogous to the effect in the Schwarzschild geometry, where µ asymptotes to a constant value when approaching the source.
In the following, we study the transition era in more detail. In order to give explicit numbers and plots, we focus on the β 0 β 1 β 4 -model defined by setting β 2 = β 3 = 0. For details on the precise expressions in this model, we refer to the appendix A. In fig. 3, we plot the Stückelberg field as a function of redshift z in the β 0 β 1 β 4 -model for three exemplary cases with different mixing anglesᾱ and spin-2 masses m FP as displayed in the caption. The vertical lines represent the critical redshift at H = m FP . Qualitatively we find that the Stückelberg field µ indeed starts to deviate from its asymptotic value as soon as the Hubble rate is of the order of m FP . We have explicitly checked this for various examples and find that the behavior is completely generic 5 . At the energy scale, m FP the Stückelberg field becomes small, entering the linear regime for growing z.
In fig. 4 we show the Hubble rate as function of redshift, for several exemplary values. Again, the Hubble rate starts to deviate from the energy scale set by ρ m /m 2 g when H ∼ m FP , as indicated by the vertical lines. However, the deviations are suppressed byᾱ. Therefore, the green (ᾱ = 0.01) and blue (ᾱ = 0.5) line almost coincide. For the red line,ᾱ = 2 is not small and the transition between the two phases can be seen directly from the plot. Since the parameters that lead to the red line imply β 0 < 0, the energy contribution from the bimetric 5 Note that for the case represented by the red line µ develops a peak. This happens only for models with β 0 < 0 as we checked explicitly. For submodels with β 0 = 0, µ does not develop a peak. Instead, the Stückelberg field for these submodels always decreases monotonically in time. Since this behavior is already captured by the blue and green examples, we do not demonstrate that explicitly here. Let us only note that for all these submodels, µ starts to deviate from its asymptotic value 3(1 + wm) as soon as the Hubble rate falls below the spin-2 mass. potential is negative for sufficiently large redshift. This implies that the Hubble rate is smaller than the energy scale set by ρ m /m 2 g . We now estimate at which scale the transition occurs. To identify the critical Hubble rate we would extremize the curvature of µ. However, this results in a polynomial of high degree that in general is not solvable analytically. Instead, we choose to extremize y as a bookkeeper, while keeping in mind that this might result in a parametrically different value for the critical Hubble rate compared to our discussion in section III B. When y = 0, y is at its maximum and thus the metric functions are changing most dramatically at this point. Therefore, y = 0 gives us a good hint for the energy scale of the transition.
No analytical solutions could be found for the polynomial y = 0 for the β 0 β 1 β 4 -model due to its high degree. Instead, we find the roots of y = 0 for the two submodels, β 0 = 0 and β 4 = 0. In both cases, the results are very lengthy. However, expanding aroundᾱ 1 (i.e. in the GR limit of the models) we get the remarkably simple result as the critical Hubble rate at which y = 0 for both submodels 6 , up to an O(1) factor.  In the limit of smallᾱ we find the same expression for the critical Hubble rate as we found with the Vainshtein analogy. Away from the limit, the critical Hubble rate does not only depend on m FP but also onᾱ and Λ. This can be seen in fig. 3 because the inflection point y = 0 and the critical redshift are not exactly aligned. Our analogy with the local Vainshtein mechanism, hence, gives a rough estimate of the critical Hubble rate. The remarkable feature however is, that in the limitᾱ 1, the value of the critical Hubble rate at which the Stückelberg field becomes non-linear is set only by m FP and becomes independent ofᾱ and Λ. In fig. 5, we show a schematic overview of the different regimes in bimetric cosmology.
In this entire discussion, we used the roots of y as a bookkeeper in order to identify when the Stückelberg field µ changes most dramatically. While this might appear somewhat arbitrary at first, in the next section we see that the roots play an important role in the stability of scalar perturbations around FRW background as well.

D. Linear scalar perturbations and Vainshtein
In this section, we connect our previous results to cosmological perturbations around the FRW background in bimetric theory. Their analysis received a lot of attention in the literature [21,22,[28][29][30][31]56] and here we will summarize and rephrase the conclusions. It was found that the linear scalar perturbations in the WKB approximation are unstable on subhorizon scales during early times (for an expansion history on the finite branch). More precisely, they are stable as long as the dynamical bound [22], is satisfied. For models with β 2 = β 3 = 0 this is equivalent to y < 0 [22]. Hence, the linear perturbations become unstable exactly when the background changes most quickly. In other words, exactly when the Stückelberg field becomes large, linear perturbation theory breaks down. For the β 0 β 1 -and β 1 β 4 -model we already identified the energy scale at which y = 0. For completeness, we also compute at which energy scale the dynamical bound in eq. (37) is violated for the remaining two parameter models, β 1 β 2 and β 1 β 3 . Again, the expression for the critical Hubble rate is too long to display it here, but in the limitᾱ 1 we find H * m FP (up to an O(1) factor) 7 as well.
The expansion history and the scalar perturbations are sensitive to the energy scale set by m FP as we have explicitly demonstrated for all the two-parameter models. When the Hubble rate is of the order of the spin-2 mass, H ∼ m FP , the Stückelberg field µ becomes non-linear and the linear perturbations start to grow exponentially. At the same energy scale, the universe becomes smaller than its own Vainshtein radius. Combining these results suggests that the Vainshtein mechanism is active also on a time-dependent background like the FRW and with spatially extended matter sources. Our analysis strongly suggests that the scalar perturbations become large as an artifact of the Vainshtein mechanism. A full calculation including non-linearities in the perturbations, just as for the local Vainshtein mechanism, should yield the GR result. However, it is beyond the scope of this work to prove that in a general manner.
To show that the non-linearities are such that they restore GR at the perturbative level, one would need to solve the full equations of motion. First attempts exist in the literature already justifying this claim. The authors of Ref. [56] solved the perturbation equations nonlinearly for early times with several simplifying assumptions 8 . Their analysis identifies the spin-2 mass m FP as the scale at which the Vainshtein mechanism kicks in and restores GR (for a generic model). In Ref. [57], on the other hand, the equations of motion were solved for an inhomogeneous mass distribution non-linearly. Again, no instabilities were found. Both these results show that the instabilities are indeed an artifact of the linear approximation and that the Vainshtein mechanism is active also on a time-dependent background with a spatially extended source. In a different setting, the traditional Vainshtein mechanism was used to investigate the effect of early time instabilities on structure formation, see Ref. [58]. 7 For the β 1 β 2 -model, H * was already computed in Ref. [31] in the limit α 1, but not in terms of the physical parameters. They did not interpret their result as the spin-2 mass. 8 They analyzed spherically symmetric perturbations on subhorizon scales and on length scales smaller than the Compton wavelength m −1 FP . Furthermore, they assumed the background to be proportional,ḡµν = y 2f µν , with a time-dependent conformal factor y. This is only a solution to the equations of motion when both metrics couple to their own matter sector, which are proportional on-shell.

E. Effective time-variation of the Planck mass
For a general model, the two modified Friedmann equations for g µν and f µν that depend on the ratio of the scale factors y and the matter energy density ρ m can be combined into a single modified Friedmann equation. The Hubble rate is a function of the energy density ρ m only [16], where F is a function with a model-dependent form. On the finite branch, the early universe is characterized by y → 0 and ρ m → ∞. Hence, from the Friedmann eq. (19) it follows that during early times the contribution from the bimetric potential to the Friedmann equation becomes subdominant. Indeed, for every model, the Friedmann eq. (38) can be expanded as for early times. Therefore, we can interpret the parameter m g as the cosmological Planck mass that is relevant during early times. The parameter m g is the Planck mass when the universe is in its Vainshtein phase. For the local Vainshtein mechanism we find exactly the same behavior, cf. eq. (27). At late times, the situation is different. The bimetric potential yields the leading contribution to the Friedmann equation. Asymptotically, the universe enters a de Sitter phase because the effect of the bimetric potential reduces to an effective cosmological constant, cf. eq. (10). Hence, also at late times GR is restored. Expanding the Friedmann eq. (38) around the de Sitter point yields [25] This expansion holds true for any (sub)model 9 . Also for late times, i.e. when the universe is in its de Sitter phase, we can extract an effective Planck mass, which differs from m g and is given by For a large Fierz-Pauli mass, m 2 FP Λ, this expression reduces to m dS m Pl = √ 1 +ᾱ 2 m g , for those models where all three physical quantities are independent. The term within the brackets is manifestly larger than unity, when the parameters satisfy the Higuchi bound [59]. 9 Note again that for models with less than three free interaction parameters βn the physical parameters are not all independent of each other [51].
Hence, is always holds that m dS > m g for any consistent set of parameters. Let us summarize. There are three different effective Planck masses in bimetric theory. The parameter m g is the Planck mass of the physical metric g µν and as such measures the coupling strength of ordinary matter to g µν . As we have seen, it measures the strength of gravity within the Vainshtein sphere of spherically symmetric systems, cf. eq. (27), as well as in the early universe on cosmological scales, cf. eq. (38). This is another indication that the Vainshtein mechanism is active in the early universe. We can interpret m g as the screened Planck mass.
On unscreened scales, we identified two different effective Planck masses. In spherically symmetric systems, the parameter m Pl plays the role of a Planck mass, strictly speaking on scales much larger than the Compton wavelength of the massive spin-2 field. On cosmological scales in the late universe, the parameter m dS quantifies how strongly matter affects the dynamics of the metric. Hence, on unscreened scales, different systems give rise to different Planck masses. However, on scales much larger than the Compton wavelength also a non-zero cosmological constant has to be taken into account. To compare the unscreened Planck masses under equal assumptions, we have to expand 10 m dS for m 2 FP Λ. In that case we indeed find m dS = m Pl .
Moreover, in the GR-limitᾱ 1 all three effective Planck masses coincide. Hence, in the GR-limit of bimetric theory, there is only a unique Planck mass as in GR. Note that both m Pl and m dS are always larger than m g , while the parametric relation between m Pl and m dS is not fixed.
Let us dwell on the cosmological Planck masses a bit longer. We can interpret m g and m dS as the asymptotic values of a time-varying Planck mass in the following sense. Let us write the bimetric Friedmann equation as 3H 2 = Λ + ρ m /m 2 c with a time-dependent function m c . The following definition is implied, We interpret the quantity m c is a cosmological Planck mass with an apparent time-dependence. It interpolates between the two asymptotic values m g and m dS . In fig. 6, we plot the effective cosmological Planck mass, in the β 0 β 1 β 4 -model, as a function of redshift z. We use the same parameter values as before. The explicit expression can be found in eq. (A5) in the appendix. For the case whereᾱ is not small, we see that the effective Planck mass changes its value at the critical Hubble rate 10 Again, for models with less than three free interaction parameters βn, one has to be more careful. E.g., for the β 0 β 1 -model we have to set m 2 FP = Λ, which automatically impliesᾱ = 0, in order to arrive at m Pl = m dS . H * = m FP . The Vainshtein mechanism again manifests itself as the cosmological Planck mass deviates from the screened value m g after the Hubble rate falls below its critical value.

IV. Impact on the H0 tension
Despite the huge success of General Relativity describing gravitational systems on many different scales to enormous precision and in particular of the ΛCDM, latest data challenge the Standard Model of Cosmology. Local observations of the Hubble flow are in good agreement with each other and constrain the value of the Hubble rate today to h = 0.7324 ± 0.0174 [35,45] where is the normalized Hubble rate today. In the ΛCDM model, CMB data from Planck constraints however favors a value h = 0.6781 ± 0.0092 [39] and represents a deviation of ∼ 3.4σ. While local measurements are quite sensitive to systematics [47,48,60], the constraints from CMB measurements highly depend on the gravitational model [39]. Indeed, CMB data alone favors a Dark Energy component with a phantom equation of state, w DE < −1 [61].

A. Effective phantom dark energy
Models with a phantom equation of state should be treated with caution. Phantom energy-momentum violates the dominant energy condition [62] and causes a future spacetime singularity (Big Rip) [63][64][65][66]. For simple models that build on a single field (as e.g. quintessence) a phantom equation of state implies the presence of a lowenergy ghost [67]. There are ways to get around these issues. The Big Rip can be avoided if the equation of state varies in time and approaches −1 sufficiently fast [68][69][70]. Ghost condensation can stabilize the vacuum [71].
In contrast to these simple realizations, bimetric theory incorporates an effective phantom equation of state naturally. Friedmann's eq. (19) can be written where we defined the density of Dark Energy that is due to the potential energy of the interacting spin-2 fields as The corresponding equation of state of the energy density ρ DE is given by A cosmic expansion history on the finite branch implies m 2 eff > 2H 2 and hence for ρ DE > 0 the equation of state is phantom, w DE < −1 [22]. Note that the equation of state varies in time. In the asymptotic future (when ρ m → 0) the equation of state approaches w DE → −1 and the effect of the dynamical Dark Energy reduces to a cosmological constant. Thus, the Big-Rip singularity can be avoided in bimetric theory. At early times the asymptotic value of w DE is either −1 (for models with β 0 = 0) or −2 − w m (for models with β 0 = 0).
The effective Dark Energy component violates the null energy condition (NEC) [72] allowing for a phantom equation of state while the sum of potential energy and matter stress-energy satisfies the NEC. In bimetric theory, this does not imply the presence of a ghost mode due to the nontrivial interactions between the different modes that give rise to the effective phantom dark energy. On the other hand, the NEC violation manifests itself in linear perturbation theory as the gradient instability. However, as we argued earlier, it appears to be an artifact of the calculation and higher-order terms have to be taken into account due to the Vainshtein mechanism. It would be interesting to study the connection to ghost condensation.
In the following fit to data, we restrict ourselves to the study of the β 0 β 1 β 4 -model. In appendix A we collect the equations that we need for the analysis. It is the simplest minimal bimetric model where all three physical parameters are independent of each other. Already the authors of Ref. [45] studied phantom dark energy as a possible resolution to the H 0 -tension. Additionally in. 11 In particular, the bimetric β 0 β 1 -and β 1 β 2 -model were used as concrete models. They conclude that these models are driven into their GR-limits and hence do not resolve the H 0 -tension. However, for these two-parameter-models of bimetric theory, not all the physical parameters are independent and the setup is too restricted. In particular, the fact that at small redshifts the equation of state is forced to be w DE −1 leads to a small mixing parameter α. As a consequence, the equation of state is forced to be close to −1 at all redshifts. This is not the case for models where the three physical parameters are free as we will demonstrate in this section with the β 0 β 1 β 4 -model. We find that solutions exist which feature w DE −1 at small redshifts but deviate from that value significantly at intermediate redshifts.

B. Parametrization and scanning strategy
To treat cosmological observables of late and early times on somewhat equal footings, we do the following. We approximate Friedmann's equation at late times by the Hubble law including the deceleration parameter q = −(1 +Ḣ/H 2 ) as Observations of Cepheid variables constrain the Hubble rate parameter today to be h = 0.7324 ± 0.0174 [35]. The quoted value is derived from the combination of four independent Cepheid observables, which we will use to constrain the local Hubble rate. Furthermore, we use observations of Tye Ia supernovae to constrain the deceleration parameter q = −0.41 ± 0.1 [74,75]. We consider supernova redshifts in the interval 0.15 < z < 0.62, such that the description in terms of the deceleration parameter is still valid. This approach projects a large number of SN-Ia observables on one coarse grained parameter. We choose this description, to have a similar number of local and global observables, as we will discuss shortly. One crucial point of the present analysis is the following. We assume that the physics that controls the inhomogeneities of the CMB in bimetric theory is identical to GR. This assumption is justified since the Vainshtein mechanism ensures that GR is restored at early times at the background level. For our analysis, we further assume, that small perturbations around this GR background are well described by the standard CMB perturbation theory.
The essence of the CMB physics, can be well captured by the following coarse-graining method, suggested in 11 In Ref. [73], the inverse distance ladder method is discussed with the goal of testing the parameter region in bimetric theory, relevant for the H 0 tension.
Ref. [45]. At the core of the analysis, there are only three physical observables. Two of which are, so called, shift parameters based on the comoving angular distance to the last scattering surface given that Ω K = 0 , where z * is the redshift at which decoupling happens, and the sound horizon The physically constrained combinations are • the angular distance normalized to the Hubble horizon at decoupling R = √ Ω m h 2 D A (z * ), • and the principle multipole number l A = π D A (z * ) rs(z * ) . The third parameter is the energy density of baryons at decoupling Ω b h 2 .

C. Results
In Tab. I, we show the best-fit values, one sigma intervals and the χ 2 values of the ΛCDM and the β 0 β 1 β 4 models. As expected, the ΛCDM model fit is poor. With a best-fit value of χ 2 ≈ 12 and given the four degrees of freedom, this corresponds to a ∼ 3σ tension of the global fit. The error intervals are derived from projections on the one dimensional subspaces of the likelihood function.
In contrast to this, when the fit is performed in the β 0 β 1 β 4 model, the fit is improved and χ 2 ≈ 1.3. Given that we have two additional fit-parameters and thus two degrees of freedom, this corresponds to an excellent fit value, indicating that all observables are within the 1σ error range. Another measure of the improvement is the ∆χ 2 /dof, which in this case is ∼ 5. We conclude that the given data-set strongly favors the β 0 β 1 β 4 model.
The χ 2 functions and 95% confidence intervals of H 0 , around the best-fit points of the ΛCDM model and the β 0 β 1 β 4 -realization of bimetric theory. We have three fitted CMB observables, the deceleration parameter and four Cepheid measurements, that are fitted with a four parameter fit, resulting in four degrees of freedom for the ΛCDM-fit. And consequently two degrees of freedom for the β 0 β 1 β 4 -fit. The fit improvement is substantial, with ∆χ 2 /dof ≈ 5. = 0.0429 ± 0.005. In both cases Ω γ is negligibly small for late time observations. In Fig. 7, we show the χ 2 as a function of H 0 for the ΛCDM and the β 0 β 1 β 4 models. It shows, that the alternative time evolution of the Hubble rate in bimetric theory can accommodate the CMB observables, and a larger H 0 value today, than the ΛCDM scenario. Note that β 0 > 0 and hence ρ DE > 0 at the best-fit point due to eq. (A1a). Thus the equation of state is always phantom. The favored Fierz-Pauli masses in the considered best-fit intervals are m FP ≈ 4 · 10 −33 − 7 · 10 −33 eV and m FP ≈ 5 · 10 −31 − 7 · 10 −31 eV, both are consistent with cluster lensing [23] and other [76] constraints.
In fig. 8 case where the spin-2 mass is two orders of magnitude larger, the phantom era is at larger redshifts z ∼ 10 and z ∼ O(100). This temporary phantom behavior allows to obtain a good fit to CMB data despite a higher value of H 0 than in ΛCDM. In particular in the case of the larger spin-2 mass, the equation of state is within the experimental bounds from all supernova and BAO observables at redshifts z 1.5, see Ref. [45]. The general behavior is the following. The spin-2 mass controls at which redshifts the equation of state significantly deviates from −1, while the mixing parameterᾱ controls the deviation. To be precise, it is the value of β 0 that controls the deviation. If its value is close to zero, the equation of state significantly deviates from −1. On the other hand, if β 0 is positive and far away from zero, the equation of state is close to −1 always 12 . Note that in order to achieve a value β 0 close to zero requires a non-zero mixing parameterᾱ, cf. eq. (A1a). In the GRlimitᾱ 1 the phantom era is absent. The freedom to allow for large spin-2 masses and thus shifting the phantom behavior to larger z while keepingᾱ finite to yield a significant phantom era, is not possible in the more restricted two parameter models. This is the reason why Ref. [45] finds that low redshift data disfavor the phantom behavior enforcing w DE −1.
In fig. 9 we show the ratio between the Hubble rates of the β 0 β 1 β 4 -model to the Hubble rate in the ΛCDMmodel, evaluated at the two exemplary best-fit points. For both examples there are two regimes where the relative values change. The regime at high redshifts is because matter-radiation-equality is slightly shifted. The second regime is due to the shifted dark energy-matterequality and the phantom equation of state. It eventually increases the value of the Hubble rate today. In the asymptotic future, the Hubble rate approaches the value of the cosmological constant, which is higher in the β 0 β 1 β 4 -model as compared to the ΛCDM-model, cf. table I. Hence it is a combination of different aspects that increases the value of H 0 in the global fit: a larger value of the asymptotic cosmological constant Λ, a delayed expansion history, and an effective phantom equation of state.

D. Baryon-Acoustic-Oscillations
The same physical scale of the CMB perturbations is imprinted in the matter power spectrum and is accessible to us in the data of several surveys, measuring galaxy distributions at different redshifts. The useful oblique parameter relevant for the computation of the matter distribution observable is the ratio of the sound horizon r s (z d ) and the spherical average of the angular scale and the redshift separation and z d is the drag epoch, the redshift at which the baryons are released from the Compton drag of the photons.
In Fig. 10 we show, following [45], the measurements of d z by 6dFGS [77]: z eff = 0.106, SDSS [78]:z eff = 0.15, BOSS [79]: z eff = 0.32 and z eff = 0.57. We compare the observations with the predictions by the ΛCDM and the β 0 β 1 β 4 model. We find that the predictions of the bimetric β 0 β 1 β 4 -model at its best-fit point are indistinguishable from the predictions of our standard cosmology at the current experimental precision and given the current redshift sensitivity to the matter power spectrum. Overall, the fit to the baryon-acoustic-oscillation (BAO) data shows some degree of tension in both models, however, underestimated systematic errors could contribute to that. In near future the DESI instrument [80] will provide a new dataset of BAO observations at multiple redshifts and even more advanced experiments will push to larger redshifts [81]. A big advantage of DESI will be, that BAO data will be obtained at multiple redshifts with the same instrument, thus avoiding the problem of different systematic errors among the instruments. With the newly collected data, this question should be re-examined.

V. Summary and discussion
In the first part of this paper, we discussed the signatures of the Vainshtein mechanism in cosmology. In particular, we find that the Stückelberg field becomes large and linear perturbations start to grow exactly at the time when the universe becomes smaller than its own Vainshtein radius when looking back in time from today. The corresponding critical Hubble rate is For times when the Hubble rate is larger than the critical value, non-linearities have to be taken into account. We assumed that the non-linearities are such that the massive mode becomes strongly coupled and can be integrated out to restore GR. This allows us to study complex phenomena, such as the formation of the CMB.
Under this assumption (that CMB physics is the same as in GR due to the Vainshtein mechanism), we perform a global fit to the CMB observables and the local measurements on Cepheid variables and SNIa observations of the cosmic acceleration. We find that due to the phantom equation of state of the effective dark energy component the reported tension between the local and CMB determination of the Hubble scale can be resolved.
We discussed other observations at intermediate scales, in particular, the BAOs and concluded that at the moment our framework can not be distinguished from ΛCDM given the present data. However, in the near future, a strong improvement in sensitivity and systematic uncertainty will likely change this situation.
When discussing the significance of the Hubble tension, a word of caution is in order. So far we have taken the local determinations of the Hubble rate at phase value with the quoted uncertainties, which are reported to be at the 2 − 3% level. However, the distance determination with the Cepheid observables is known to be subject to systematic errors, which are hard to control, see for example Ref. [82].
One important effect is the so-called blending effect and is based on the fact that the spatial resolution of our instruments gets worse with distance. Thus observations of Cepheids, that are further away are more likely to pick up light from unresolved background sources. This leads to systematically larger luminosities for more distant objects. This effect tends to increase the reconstructed local Hubble rate, which is consistent with the sign of the observed discrepancy. Taking the blending effect into account would increase the uncertainty to the ∼ 5% level [83,84]. Even though this would not resolve the Hubble tension, without proper control of the systematic error, we can not make strong statements about the true statistical significance of this anomaly.
On the other hand, if the H 0 -tension is real and statistically significant, bimetric theory appears as an experimentally favored, consistent, and theoretically wellmotivated alternative to GR. Whether data eventually favors bimetric theory also over other gravitational theories, remains an open question. A next step would be a global fit to all the current data also including high-redshift supernovae and BAOs and allowing for large spin-2 mass, say up to the CMB scale. This will presumably provide a lower bound on the spin-2 mass. Another direction is a non-linear study of the cosmological perturbations and of the Vainshtein mechanism. Within an entirely bimetric framework, the constraints from the CMB should be derived to explicitly check our assumption a posteriori. Also, cosmic structure formation should be addressed entirely within bimetric theory as it probes redshifts where the phantom era occurs.
Throughout the paper we used the β 0 β 1 β 4 -model that is defined by setting β 2 = β 3 = 0 to provide an explicit example. In this appendix, we report the exact expressions and discuss some features of the model. For details, we refer to [51].
First, let us find the relation between the interaction and physical parameters. The background eq. (10) is a cubic polynomial in c and gives rise to up the three real-valued roots. Each root describes a vacuum of the β 0 β 1 β 4 -model with different spin-2 mass, mixing angle and cosmological constant. However as discussed in Ref. [51], only one of the vacua is physical. Therefore, the vacuum eqs. (10) and (12) imply the following unique relation between the interaction parameters and the physical parameters, The physical parameters are not completely free but have to satisfy the Higuchi bound, m 2 FP > 3Λ/2, to ensure unitarity [59]. In the following, we still express all equations in terms of the interaction parameters β n for brevity but they should be understood as being functions of the physical parameters. Furthermore, we rescaleȳ = αy for brevity.
Setting β 2 = β 3 = 0, eq. (21) reduces tō whereβ n = α −n β n for brevity. This polynomial has up to three real-valued roots that yield y as a function of ρ m . For a given set of parameters, only one of these solutions corresponds to the finite branch. Since the expressions are quite lengthy and not enlightening, we do not show them explicitly here. The finite branch solution must satisfy 0 ≤ȳ ≤ᾱ which allows picking the finite branch solution numerically. Hence, we can express µ = y y (A3) either analytically (but lengthy) or numerically as a function of matter-energy density ρ m and consequently as a function of redshift z only. We used that for producing the exemplary plots in fig. 3. Next, we want to find the Hubble rate as a function of redshift z only. For the β 0 β 1 β 4 -model the Hubble rate reads where the interaction parametersβ n are understood as functions of the physical parameters andȳ as the finite branch solution to eq. (A2) (either analytically or numerically). We used the result for drawing the plot in fig. 4 and for the data analysis in section IV. The effective time-varying cosmological Planck mass as a function of redshift is again to lengthy to display. Instead, we give the expression in terms ofȳ, where we used that ρ m ,ȳ, and ln a all serve as time coordinate. Whenȳ is understood as the finite branch solution, this equation gives m 2 c as a function of redshift z only. This is shown in fig. 6.
Let us collect some more details for the data analysis with the β 0 β 1 β 4 -model. Evaluating Friedmann's equation today yields a relation among the parameters of the model, where the subscript 0 indicates the value of the quantity at present time. We used this relation to eliminate Ω m,0 in terms of the other parameters.
In the data analysis, we constrained the deceleration parameter q that is derived from Friedmann's equation. We can express the definition more explicit in terms of bimetric parameters as whereȳ =ẏ/H is given by eq. (22) and Again, withȳ understood as the finite branch solution, q is a function of redshift only. For the data analysis, we used the constraints on q 0 (i.e. at z = 0) to find the favored values of the physical parameters. Note that eq. (A8) holds for any bimetric (sub)model.