Conformal symmetry and nonlinear extensions of nonlocal gravity

We study two nonlinear extensions of the nonlocal $R\,\Box^{-2}R$ gravity theory. We extend this theory in two different ways suggested by conformal symmetry, either replacing $\Box^{-2}$ with $(-\Box + R/6)^{-2}$, which is the operator that enters the action for a conformally-coupled scalar field, or replacing $\Box^{-2}$ with the inverse of the Paneitz operator, which is a four-derivative operator that enters in the effective action induced by the conformal anomaly. We show that the former modification gives an interesting and viable cosmological model, with a dark energy equation of state today $w_{\rm DE}\simeq -1.01$, which very closely mimics $\Lambda$CDM and evolves asymptotically into a de Sitter solution. The model based on the Paneitz operator seems instead excluded by the comparison with observations. We also review some issues about the causality of nonlocal theories, and we point out that these nonlocal models can be modified so to nicely interpolate between Starobinski inflation in the primordial universe and accelerated expansion in the recent epoch.


Introduction
Much work has been recently devoted to the study of infrared (IR) modifications of General Relativity (GR), with the aim of producing viable cosmological models displaying selfaccelerating solutions even in the absence of a cosmological constant (see e.g. [1] for a recent review). In this context, our group has developed a program aiming at exploring the effect of nonlocal modifications of gravity. While at the fundamental level QFT is local, at an effective level nonlocalities are commonly generated. This can happen both classically, when one integrates out some degrees of freedom to obtain an effective action for the remaining degrees of freedom, and at the quantum level, because of massless or light particles running into quantum loops. In principle this could generate nonlocal terms depending, e.g., on the inverse d'Alembertian 2 −1 . This operator becomes relevant in the IR, and is therefore potentially relevant in cosmology.
A nonlocal quantum effective action produces nonlocal equations of motion for the vacuum expectation values of the quantum fields. The relevant quantities, for cosmological applications, are the in-in vacuum expectation values. The corresponding equations of motion, which are obtained using the Schwinger-Keldysh formalism, depend on the inverse d'Alembertian defined with the retarded Green's function, and are therefore automatically causal. 1 There are two aspects in the problem of developing a nonlocal IR modification of GR. First, at the purely phenomenological level, one must identify models which work well, i.e. have a viable background evolution at the cosmological level, have well-behaved cosmological perturbations, and fit the observations, to the extent that they can compete with ΛCDM. Second, one must identify the specific mechanism that produces these nonlocalities from a fundamental local theory. It is quite natural to begin this program from the first part. Indeed, it is highly non-trivial to construct IR modifications of GR that are cosmologically viable, as has been learned from experience with the DGP model [2][3][4][5][6][7][8][9], the dRGT theory of massive gravity [10][11][12][13][14][15][16], Hassan-Rosen bigravity [17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35][36], or nonlocal models of the Deser-Woodard type [37][38][39][40]. Indeed, none of these attempts has yet produced a viable competitor to ΛCDM. We can therefore hope that the condition of producing a cosmologically viable model will be sufficiently restrictive to select a limited range of nonlocal models. In turn, this might give precious hints for their derivation from a fundamental local theory.
The aim of this paper is to explore some possibly well-motivated nonlinear extensions of the nonlocal models that have been recently proposed by our group, in order to contribute to charting the territory of possible viable nonlocal models. The paper is organized as follows. In Section 2 we put our work into context, giving an overview of the different possibility that have been explored to date, and we will justify our choice of the class of models that deserve to be further investigated. In Sections 3 and 4 we will examine two particularly interesting nonlinear extensions of the simplest viable nonlocal model. We present our conclusions in Section 5. We also take this opportunity to review, in App. A, some issues about the causality of nonlocal theories, that occasionally generate some confusion. We use the Misner, Thorne and Wheeler (MTW) conventions for the metric, Riemann tensor, etc., so in particular our signature is (−, +, +, +), and we set = c = 1. A prime will denote the derivative with respect to x ≡ log a, where a is the scale factor in FRW.

An overview of nonlocal models
The class of nonlocal models that we investigate here are characterized by the fact that the nonlocal terms are associated to an explicit mass scale m (and are therefore different from the nonlocal models studied in [37][38][39][40] as well from those discussed in [41][42][43]). The original inspiration came from the degravitation idea [44][45][46], in which Einstein equations were modified phenomenologically into However, eq. (2.1) has the problem that the energy-momentum tensor is no longer automatically conserved, since in curved space the covariant derivative ∇ µ does not commute with the covariant d'Alembertian 2, and therefore does not commute with 2 −1 either. One can however observe that any symmetric tensor S µν can be decomposed as where S T µν is the transverse part of the tensor, that satisfies ∇ µ S T µν = 0. It can be proven that this decomposition is valid in a generic curved space-time [47,48]. The extraction of the transverse part of a tensor is itself a nonlocal operation. For instance in flat space, where ∇ µ → ∂ µ , applying to both sides of eq. (2.2) ∂ µ and ∂ µ ∂ ν , it is easy to show that the inversion of eq. (2.2) is In a generic curved spacetime there is no such a simple formula. In any case, technically the easiest way to handle these models is to put them in a local form with the help of auxiliary fields, see below. Using the possibility of extracting the transverse part of a tensor, in [49] it was proposed to modify eq. (2.1) into so that energy-momentum conservation ∇ µ T µν = 0 is automatically ensured. In [50,51] it was however found that the cosmological evolution that follows from this model is unstable, already at the background level. We will review below how such instabilities can in principle emerge in these nonlocal models. In any case, this adds the model (2.4) to the long list of IR modifications of GR that did not make it.
The first successful nonlocal model was then proposed in [50], observing that the instability is specific to the form of the 2 −1 operator on a tensor such as R µν or G µν , and does not appear when 2 −1 is applied to a scalar, such as the Ricci scalar R. Thus, in [50] it was proposed a model based on the nonlocal equation where the factor 1/3 provides a convenient normalization for the new mass parameter m. This model is quite interesting phenomenologically. It has no van Dam-Veltman-Zakharov discontinuity, and smoothly reduces to GR in the limit m → 0. For m = O(H 0 ), as will be required by cosmology, it therefore passes without difficulty all solar-system and laboratory constraints [50,52]. 2 At the cosmological level, its background evolution is stable during RD and MD and has a self-accelerating solution, i.e. the nonlocal term behaves as an effective dark energy density [50,51]. This produces a realistic background FRW evolution, without the need of introducing a cosmological constant. Its cosmological perturbations are well-behaved, both in the scalar [54] and in the tensor sector [55]. The study of the effect of its cosmological perturbations shows that the predictions of the model are consistent with CMB, supernovae, BAO and structure formation data [54,56,57]. The cosmological perturbations have then been implemented in a Boltzmann code in [58]. This allowed us to perform Bayesian parameter estimation and a detailed quantitative comparison with ΛCDM, that shows that the model fits the data at a level which is statistically indistinguishable from ΛCDM. 3 Having passed all these tests the model deserves a name, and we have dubbed it the "RT" model, where R stands for the Ricci scalar and T for the extraction of the transverse part. A closed form for the action corresponding to eq. (2.5) is currently not known. This model is however closely related to another nonlocal model, subsequently proposed in [59], and defined by the action where m Pl is the reduced Planck mass, m 2 Pl = 1/(8πG), and µ ≡ m 2 /6. Indeed, if we compute the equations of motion from eq. (2.6) and we linearize them over Minkowski space, we find the same equations of motion obtained by linearizing eq. (2.5). However, at the full nonlinear level, or linearizing over a background different from Minkowski, the two models are different. Also the model (2.6) works very well, both at the background level [59] and at the level of perturbations [54]. Again, the perturbations of this model have been implemented in a Boltzmann code in [58], and compared to observations using the 2013 Planck data. It was found that the model fits again well the data, even if not as well as ΛCDM or the RT model, although at the level of the analysis of [58] the difference was not statistically very significant. We will call the model defined by eq. (2.6) the "RR" model. Further work on the RR and RT models has been presented in [60][61][62][63][64][65][66][67].
Of course, as often happens in model building, there are in principle infinite choices for the specific form of the nonlocal model. We have therefore attempted to chart this large unexplored territory, considering some particularly natural extensions of these models. At the level of models defined by the equations of motion using the extraction of the transverse part, we have seen that the RT model (2.5), where (g µν 2 −1 R) T enters, is viable, while a model where appears (2 −1 G µν ) T , or equivalently where appears (2 −1 R µν ) T , is not. These models are naturally written down at the level of equations of motions, but are not easily written in terms of actions. Turning to models defined at the level of the action, one can observe that a basis for the curvature-square terms is provided by R 2 µνρσ , R 2 µν and R 2 . However, for cosmological applications it is more convenient to trade the Riemann tensor R µνρσ for the Weyl tensor C ρσµν . Thus, a natural generalization of the RR model is given by where µ 1 , µ 2 and µ 3 are independent parameters with dimension of squared mass. In [55] is has however been found that the term R µν 2 −2 R µν is ruled out since it gives again instabilities at the background level. 4 The Weyl-square term instead does not contribute to the background evolution, since the Weyl tensor vanishes in FRW, and it also has wellbehaved scalar perturbations. However, its tensor perturbations are unstable [55], which again rules out this term. 5 These results show that the condition of obtaining a viable cosmological model is indeed a powerful requirement, which allows us to eliminate most of the possible choices. In practice, at least within the space of theories that we have explored, we find that only models constructed uniquely with the Ricci scalar work. At a finer level of resolution, using again the Boltzmann code modified for nonlocal theories, in [53] we have repeated the comparison with observations using the 2015 Planck data (which were not yet publicly available when [58] appeared) as well as with an extended set of BAO and structure formation data. A Bayesian model comparison between ΛCDM, the RT and RR models has then been performed. In this improved analysis, ΛCDM and the RT models still both fit the data very well, and are statistically indistinguishable. In contrast the RR model, while by itself still fits the data at a fully acceptable level, in a Bayesian model comparison with ΛCDM or with the RT model is now significantly disfavored.
In a sense, the RT model can be considered as a nonlinear extension of the RR model, since the two models become the same when linearized over Minkowski. An action for the RT model would probably include further nonlinear terms, such as higher powers of the curvature associated to higher powers of 2 −1 . Since the data seem to point toward the importance of these nonlinear terms, it is natural to ask whether other nonlinear extensions of the RR theory are cosmologically viable. Once again, it is not possible to explore the most general form of these extensions. However, symmetries are often a powerful guide for model building. In particular, conformal symmetry naturally appears at high energies, or in the presence of massless particles. In the physical and mathematical literature, there 4 This result is analogous to the one found in [68], where it was shown that a term R µν 2 −1 Rµν also produces instabilities in the cosmological evolution. Observe that the latter term is rather of the Deser-Woodard type, i.e. of the form R µν f (2 −1 Rµν ), with a dimensionless function f and no explicit mass scale m. However, in both cases the instability is ultimately due to the form of the 2 −1 operator on the tensor Rµν , as also in the model (2.4). 5 Unless one takes a model for the early Universe that generates a totally negligible amount of primordial tensor perturbations, so there are no primordial tensor perturbations that will be amplified by the subsequent unstable evolution. Within the standard inflationary paradigm, even for models with extremely low values of the tensor-to-scalar ratio, the Weyl term is ruled out.
are two notable extensions of the 2 or of the 2 2 operator, related to conformal symmetry. The first is obtained replacing This is the operator that appears in the action of a conformally coupled scalar field in D = 4 space-time dimensions. Another interesting option is to replace directly 2 2 with the Paneitz operator This operator was independently discovered in a mathematical context, as well as in physics in the context of conformal supergravity [69], and there is a large body of mathematical literature on it. In physics ∆ −1 4 appears in particular in the nonlocal anomalyinduced effective action in four dimensions (see e.g. [70] for review, and eq. (A.7) below). Just as the operator (2.8), ∆ 4 only depends on the conformal structure of the space-time. 6 The first nonlinear extension of the RR model that we will consider is then defined by the action We will call it the "conformal RR" model. The second model that we will investigate is defined by In both case, µ ≡ m 2 /6. If the model (2.10) should work, this could give a hint that the fundamental theory behind the effective nonlocal model involves a conformally-coupled scalar field. The model (2.11) would rather point toward a role of the anomaly-induced effective action in the derivation of the nonlocal theory, possibly along the lines recently discussed in [71].
To assess whether a model is cosmologically viable we will study its cosmology at the background level, verifying if it has a stable self-accelerating background solution. If this is the case, one should then in principle study its cosmological perturbations, implement them in a Boltzmann code, perform parameter estimation of the model, and finally compare with the data. The latter part is of course very laborious. However, an approximate but much simpler criterium for the viability of the model is given by its prediction for the equation of state (EOS) of DE. Dynamical DE models are often investigated using the so-called wCDM model, in which the EOS parameter is taken to be a constant in time, with a value w which, rather than being fixed to −1 as in ΛCDM, is taken as a new fitting parameter. Actually, once a dynamical DE is considered, there is no reason a priori why w should be constant in time, and a more general phenomenological parametrization is obtained in the so-called (w 0 , w a ) model, where near the recent epoch w(a) is written, as a function of the scale factor a, as (2.12) 6 The Paneitz operator was also considered in the context of the Deser-Woodard class of nonlocal models in [37], where was considered the possibility of adding to the Ricci scalar in the action a term R∆ −1 4 R 2 which, on dimensional ground, does not require the introduction of a mass scale.
Of course, both in wCDM and in the (w 0 , w a ) parametrization no DE perturbations are included, so these parametrizations do not exactly capture all features of a specific dynamical model, such as the RR and RT nonlocal models or of their generalizations. Indeed, these specific nonlocal models also have a given structure of cosmological perturbations, which differs ΛCDM, and which also affect the parameter estimation in these models. Thus, to perform a quantitative Bayesian comparison between the performance of a nonlocal model with that of ΛCDM, there is no alternative to the full analysis, as done for the RT and RR models in [53,58]. However, to have a first estimate of whether a model is viable, we can just compare the value of w(a) obtained from the nonlocal model with the limits on w 0 or on (w 0 , w a ) obtained comparing wCDM or the (w 0 , w a ) model to the data, which has been done in the 2015 Planck dark energy paper [72]. Indeed, experience with the RT and RR model shows that this criterium gives quite reasonable results. In particular, for the RT model, one finds [50] w 0 −1.04, w a −0.02, 7 while for the RR model w 0 −1.14, w a = 0.08 [59]. This suggests that the RT model produces deviations, with respect to ΛCDM, of order of a few percent, while the RR model should produce larger deviations. Indeed, in the full Boltzmann code analysis we found for instance that, in structure formation, the RT model shows deviation from ΛCDM at the level of about 2%, while the RR model shows deviations that, depending on the observable, can be up to 8% [53,58]. Similarly, when performing parameter estimation from CMB, SNe and BAO, the results for the RT model are quite close to that of ΛCDM, while the RR model shows larger departures from ΛCDM. For instance, the best-fit values for H 0 from Planck 2015 temperature and polarization data, plus the set of BAO and SNa data considered in [53] are H 0 = 67.67 for ΛCDM, H 0 = 68.76 for the RT model and H 0 = 70.44 for the RR model.
In [53], performing the Bayesian comparison between the model, we found that the RR model, which has w 0 −1.14, is disfavored, while the RT model, with a value w 0 −1.04 closer to the ΛCDM value w = −1, is fully consistent with the observations, and fit the data in a way statistically equivalent to ΛCDM. These results are fully consistent with those obtained in the 2015 Planck dark energy paper [72] for the generic wCDM or (w 0 , w a ) parametrizations. This gives us a first guidance into the typical values that of w 0 that a nonlocal model should have, to be consistent with the observation. Of course, for a model that passes this first test, in the end a full analysis will be necessary, especially if we want to compare its performances to that of ΛCDM.

The conformal RR model
We first consider the model defined by Writing the action in terms of a generic value of ξ can be useful at the mathematical level, to investigate the dependence on ξ. However, physically, beside ξ = 0 there is only one special value, which is the conformal case ξ = 1/6. To compare with the observational data we will only be interested in the case ξ = 1/6, which gives a sharp and physicallymotivated prediction. Furthermore, if we keep ξ as a free parameter and consider ξ = 1/6, we are no longer protected by conformal symmetry, and nothing forbids to add to −2 also a mass term, which would lead to a second extra free parameter. In this sense, the model (3.1) with ξ = 1/6 is privileged also with respect to the RR model, which has ξ = 0, or the RT model. The model (3.1), which is a natural extension of the model (2.6), was already studied in [65], closely following the analysis of the RT and RR models in [50,59].
Since however the most interesting case ξ = 1/6 was not specifically investigated in [65], we will repeat below part of this analysis, and the relevant numerical integration, and we will work out the prediction for the DE equation of state in this case. Following a technique introduced in [73] in the context of the Deser-Woodard model, and already used in [50,59], we write the action in a local form introducing two Lagrange multipliers λ 1 and λ 2 and two auxiliary scalar fields S, U , as (3. 2) The variation with respect to the Lagrange multipliers enforces the equations The variations with respect to S and U give λ 1 = µS and λ 2 = µU . The equations of motions obtained performing the variation with respect to the metric give (adding also the matter action) where in agreement with [65]. For ξ = 0, eqs. (3.3), (3.4) and (3.7) reduce to that given in [59].

Cosmological equations
We now specialize to a FRW metric. The computation is a straightforward generalization of that performed in [59]. We parametrize the time evolution using the variable x = log a, we denote df /df = f and we introduce the notations where Ω M and Ω R are the density fractions of matter and radiation today, respectively. From the (00) component of eq. (3.7) we get the Friedman equation which, in these dimensionless variables, reads This form is the most convenient for the numerical integration. For studying the stability of the system and for identifying the effective dark energy, it is convenient to trade V for Equation (3.13) shows that there is an effective dark energy term, with ρ DE = ρ 0 γY , so the DE density fraction is The fundamental equations for the variables h(x), U (x) and W (x) are eqs. (3.13), (3.14), (3.11) and (3.16).

Stability of the background solution
An important aspect for the viability of a cosmological model is the stability of the background solution. A full analysis of the stability requires the study of linear cosmological perturbations, and we will report on it in a future work. However, already at the background level that we are considering in this paper, a stringent test is possible [50,51,59]. Indeed, the auxiliary fields U and W obey the inhomogeneous differential equations (3.11) and (3.16). The general solution will be a superposition of a particular solution of the inhomogeneous equation and the most general solution of the associated homogeneous equations. The latter can be easily obtained analytically whenever we are deep in a given era, so that ζ(x) becomes approximately a constant ζ 0 . In particular ζ 0 = {−2, −3/2, 0} in RD, MD and de Sitter (dS), respectively. Then the solutions of the homogeneous equations associated to eqs. (3.11) and (3.16) have the general form U hom = e α ± x and W hom = e β ± x . If at least one among the four coefficients α ± , β ± is positive, either in RD or in MD, there will be at least one growing mode. Of course, at the background level, one can in principle choose initial conditions such that U hom = W hom = 0. However this is a fine-tuning, and any spatially-homogeneous perturbation δU (t), δW (t) will move the system away from this point. Then, for a generic perturbation the growing mode will unavoidably be excited, and the background solution will be destabilized. In other words, if the homogenous equations for U and W have growing modes, we automatically know that, when we will study linear cosmological perturbations, the variables δU (k, t) and δW (k, t) will show instability already in the spatially-homogeneous limit k → 0. The absence of growing modes for the homogeneous solutions is therefore a necessary (but certainly in general not sufficient) condition for the stability at the level of linear cosmological perturbations, and the homogeneous solutions must be stable both in RD and in MD. 8 It is indeed this stability criterium that ruled out the model (2.4), while the RR and RT models passed this test [50,51,59]. For the model (3.1), specializing directly to the physically interesting case ξ = 1/6, the homogeneous equation for U , with ζ = ζ 0 constant, reads The corresponding solutions are U = e α ± x with α + = −1 and α − = −(2 + ζ 0 ), which are never positive in RD, MD or de Sitter. Similarly, the solutions of the homogeneous equation for W are W = e β ± x with β + = −1 + 2ζ 0 and β − = −2 + ζ 0 , which again are both negative, in all three eras. Therefore, there is no instability in the background evolution, as also observed in [65].

Solution for the background evolution
For the numerical integration we use eqs. (3.10)-(3.12). As in [54], we observe that in eqs. (3.11) and (3.12) appears ζ = h /h. However, h can be computed explicitly taking the derivative of eq. (3.10). The resulting expression contains V and U , which can be eliminated using again eqs. (3.11) and (3.12). The result is given by The value of γ is tuned so to obtain the desired value of Ω M (i.e., choosing a value of Ω M and tuning γ requiring that the solution of the numerical integration satisfies h(x = 0) = 1, 8 In principle stability in the dS era is not mandatory because one might imagine that, at the large energy scales corresponding to primordial inflation, the nonlocal models are modified. Indeed, one could imagine that effective actions such as (2.6) might be valid only in the low energy limit, and could be modified at the large energies corresponding to inflationary scales. We will give an interesting example of this sort in eq. (5.1) below. In the numerical solution of the equations, we always start the integration deep in RD. However, if the model is already stable even in de Sitter, this is certainly a positive feature.
Observe that the RT model is only stable in RD and MD [50,51], while the RR model is stable in dS, RD and MD [59]. We will find below that also the model (3.1) with ξ = 1/6 is stable in all three eras, see also [65] for ξ generic. (recall that the prime is the derivative with respect to x), so The results of the numerical integration are shown in Figs. 1 and 2. We see from Fig. 1 that, asymptotically, the Hubble parameter and the DE density go to a constant. The model therefore goes asymptotically to de Sitter, just as ΛCDM, as also observed in [65]. This result is easily understood analytically. Rewriting eqs. (3.10)-(3.12) in terms of cosmic time t and of the Hubble parameter H(t) =ȧ/a, we get Ü + 3HU + 12ξ 2 H 2 U = 12H 2 ,

(3.24)
S + 3HṠ + 12ξ 2 H 2 S = U , (3.25) whereḟ ≡ ∂f /∂t. We now consider the asymptotic regime where DE dominates (see the top-right panel of Fig. 1), so the term (8πG/3)ρ on the right-hand side of eq. (3.23) is negligible compared to the effective DE term, and in this limit we look for a solution with H constant. In this case, for ξ = 0, the solution of eq. (3.24) is In particular, for ξ = 1/6, asymptotically H = m, and therefore h = 3γ 1/2 . This perfectly agrees with the asymptotic value of h that we find from the numerical integration, shown in the bottom panel of Fig. 1. Observe that the case ξ = 0 is quite different from the RR model, which corresponds to ξ = 0 and therefore does not have this solution. Indeed, in the RR model h(x) grows indefinitely, even in the DE dominated era, see Fig. 1 of ref. [59]. Alternatively, we can understand the emergence of a de Sitter solution directly from the action, observing that in a regime of constant (and non-vanishing) R, the operator (−2 + ξR) −1 acting on R reduces to (ξR) −1 . Then the nonlocal term in the action (3.1) reduces to a cosmological constant Λ = m 2 /(12ξ 2 ), leading to a de Sitter era with H 2 = (1/3)Λ, in agreement with eq. (3.27).
The DE equation of state is shown in Fig. 2, and we see that it is on the phantom side, as for the RR and RT models. This is a consequence of eq. (3.21), together with the fact that in these three models the DE density vanishes in RD and then grows monotonically, so ρ DE > 0 and ρ DE > 0, which implies 1 + w DE < 0. Among the three plots in Fig. 2, the one against x shows w DE on the largest time interval. The matter-radiation equilibrium is around x −8.1, so this plot displays the whole MD era, the present DE dominated era and is extended into the future, x > 0. The plot as a function of redshift z focus on the more recent past epoch 0 ≤ z < ∼ 10. This is still a range broader than that on which such a DE can play a relevant role (observe that, going backward in time, this DE decreases, rather than being a constant as in ΛCDM, so in the past it becomes even more and more irrelevant than a cosmological constant). Note that z = 10 corresponds to x −2.4. Observing that the vertical scale has been expanded, we see that the dependence of w DE on z in this epoch is very mild, and w DE is very close to −1. Indeed, w DE (z = 0) −1.012. The bottom panel in Fig. 2 shows that, in the recent epoch, the standard linear fit (2.12) is appropriate. Fitting our numerical result to eq. (2.12) for −1 < x < 0, we get Actually, Fig. 2 shows that a fit linear in z is appropriate on a broader range, so we also perform a fit of the form w(z) = w 0 + w z z . (3.29) In this case, the fit is excellent in the whole range 0 < z < 10 (which corresponds to −2.4 < x < 0), and we get Since also the perturbations in the dark energy sector are proportional to 1 + w DE , we expect that this model produces deviations from ΛCDM at the level of about 1%, which are not detectable with present observations. A more detailed quantitative analysis will require the computation of the cosmological perturbations and their implementations in a Boltzmann code, as performed in [53,54,58] for the RR and RT nonlocal model. We hope to report on this in the future. However, from the discussion at the end of Section 2, it is clear that the model is consistent with observations, and we expect that its predictions will be intermediate between that of ΛCDM and that of the RT model.

Non-local action with the Paneitz operator
We next consider the action (2.11). The operator ∆ 4 has a number of interesting mathematical properties. In particular, if two metrics g µν andḡ µν are related by a conformal factor, g µν = e φḡ µν , then In this sense, it is the analogous of the Laplacian in two dimensions, and indeed it also appears in the four-dimensional quantum effective action for the conformal anomaly. We rewrite S ∆ 4 introducing an auxiliary field S and a Lagrange multiplier ξ, The variation with respect to ξ enforces i.e. S = ∆ −1 4 R. In covariant form, the equations of motions are As always, the Friedmann equation is then obtained from the (00) component of this equation. This covariant result is the necessary starting point when performing perturbation theory. Given the large number of terms in eq. (4.4), for obtaining the equations determining the background evolution, the fastest route is actually to work directly in FRW. Given the conformal properties of the ∆ 4 operator, to derive the cosmological equations it is convenient to use conformal time, so ds 2 = a 2 (−dη 2 + dx 2 ). We also use the notation H = ∂ η a/a. We will write explicitly the derivative with respect to η, reserving again the prime for d/dx, with x = ln a. Equation (4.1) allows us to compute immediately the expression of ∆ 4 on a FRW metric. Using conformal time, g µν = a 2 (η)η µν . Then eq. (4.1) shows that, on a scalar S, a 4 ∆ 4 = ∂ 4 η i.e.
To compute the equations of motion in FRW we however need to keep also the lapse function, considering a metric of the form The explicit computations shows that on this metric where ∂ η n ≡ (∂ η N )/N , i.e. n = log N . In the above expression we have neglected all terms that contains products of more than one derivative of n, e.g (∂ η n) 2 , ∂ η n∂ 2 η n, etc.
Indeed, to derive the Friedman equation we must take the variation of the action wrt to the lapse function N and then set N = 1, so ∂ η n = 0. Then, all terms with products of more than one derivative of n give zero in the variation. In the limit N = 1 this expression correctly reduces to eq. (4.5). This result can also be derived more elegantly defining a new conformal timeη from N dη = dη. Then ds 2 = a 2 (−dη 2 + dx 2 ) is again conformal, and therefore a 4 ∆ 4 = ∂ 4 /∂η 4 . Since ∂/∂η = (dη/dη)∂/∂η = (1/N )∂/∂η, we get Computing the derivatives and retaining only terms with at most one derivative of N gives back eq. (4.7). This also makes it clear why all terms involving H 'miracolously' cancel in a brute-force computation of eq. (4.7) from eq. (2.9). 9 Restricting to time-dependent fields in the metric (4.6), the action (2.11) becomes  The variation with respect to N , at N = 1, can be obtained setting directly ξ = −µS in eq. (4.10), i.e. using as a Lagrangian (4.13) The variation wrt to N gives (adding also the matter action) (4.14) where, after the variation, we set N = 1. This gives the modified Friedman equation, For instance, the term ∂ηS in eq. (4.7), beside being multiplied by ∂ 3 η n, in the intermediate steps of the computation is also multiplied by terms H 2 , H∂ηH, H 2 ∂ηn and ∂ηH∂ηn, and similarly for the terms ∂ 2 η S and ∂ 3 η S. 10 Notes that this is not true on a generic background. In FRW, with N = 1, in the term proportional to ξ in eq. (4.10) only ξ∂ 4 η S, survives, which can be trivially integrated by parts to give S∂ 4 η ξ. However, in a general background ξ∆4S does not integrate by parts to S∆4ξ. We have checked that this result agrees with that obtained directly from the (00) component of the covariant equation of motion (4.4). We now introduce and, in eq. (4.15), we express ∂ 2 η S and ∂ 3 η S in terms of U and ∂ η U . After defining, as in Sect. 3 In terms of the these dimensionless variables the equation ∂ 2 η S = a 2 U , which follows from eq. (4.16), reads while eq. (4.11), expressed in terms of U , reads U + (5 + ζ)U + (6 + 2ζ)U = 6(2 + ζ) . where quite similar to eqs. (31) and (33) of [59] (or (3.2) and (3.5) of [54]), except that U is replaced by U + 2U . We see that γY plays the role of the effective dark energy fraction, Ω DE (x) = γY , as in eq. (3.15). Equation (4.18) is replaced by These are given by W = e β + x and W = e β − x with β + = 2ζ 0 and β − = −1 + ζ 0 . Again, β − is negative in all three eras, while β + is negative in RD and MD and vanishes, corresponding to a constant solution, in dS. Thus, there is no growing mode and the cosmological evolution is stable. We now integrate eqs. (4.17)-(4.19) numerically. As before, we need to compute explicitly ζ in terms of U, V, U and V . This gives To assess whether this prediction for w DE (z) is consistent with cosmological observations, a full analysis of the cosmological perturbations, as well as the corresponding parameter estimation in the R∆ −1 4 R model, is in principle necessary. However, the rightbottom panel of Fig. 4 shows that, at least at the level of background evolution, near the present epoch a linear fit of the form (2.12) is appropriate. Fitting our results to eq. (2.12) we get w 0 −1.31 , w a = 0.49 . Comparing the values in eq. (4.26) with Fig. 4 of the 2015 Planck dark energy paper [72], or comparing directly our Fig. 4 (upper-right panel) for w(z) with Fig. 5 of [72] we see that, if one combines Planck CMB data with BAO, SNe and H 0 measurements, the predictions of the ∆ 4 model are excluded at more than 95% c.l. (and possibly, extrapolating from the contours of the figures in [72], at about 99% c.l.). Thus, unless the inclusion of cosmological perturbations changes substantially the picture, the ∆ 4 model is basically excluded by the data.

Conclusions and further directions
In this paper we have continued our exploration of the landscape of possible viable nonlocal IR modifications of GR. From our previous works, we have been lead to focus on terms in the action in which a nonlocal operator is sandwiched between two Ricci scalars. The simplest option, the RR model (2.6) where R2 −2 R appears in the action, by itself fits well the data. However, once we use the most recent 2015 Planck data, it is significantly disfavored compared to ΛCDM and to the RT nonlocal model (2.5) [53]. In contrast, the RT model and ΛCDM fit the data equally well. Since the RT model is in a sense a nonlinear extension of the RR model, we were lead to examine other possible forms of such nonlinear extensions. Of course in principle there is an infinity of choices, as often happens in model building. Symmetries are however a powerful guiding principle in model building, and we have then chosen to explore two possibilities that might be indications of an underlying conformal symmetry. We have found that the first model, defined by the action (3.1), works very well, while a model constructed with the Paneitz operator, eq. (2.11), seems ruled out. The model (3.1) seems indeed to enjoy several positive features, even compared to the other nonlocal model that works well, i.e. the RT model. Indeed, it is defined in terms of a relatively simple action. As we have discussed, its cosmological evolution is stable both in an early de Sitter inflationary era, as well as in the subsequent RD and MD epochs, while the RT model is only stable in RD and MD, and must therefore be eventually embedded in some high-energy modification at the inflationary scale. The specific form of the nonlocal term is protected by conformal symmetry, which gives a motivation for discarding the possibility of adding also a mass term, −2 → −2+m 2 , which is otherwise in principle possible for the RR and RT models. At the level of comparison with the data, a more detailed study of its cosmological perturbations is necessary to make quantitative assessment, and a Bayesian comparison with ΛCDM. However, from the background evolution, we see that the deviation of this model from ΛCDM are very tiny, at the level of about 1%. We expect that, once the full apparatus of computing the cosmological perturbations and implementing them in a modified Boltzmann code will be developed, its predictions will be intermediate between those of ΛCDM and those of the RT model, and possibly near the limit of resolution of future missions such as Euclid.
It is also interesting to observe that nonlocal models featuring R 2 terms in the action, which can explain DE in the recent epoch, can be naturally connected with Starobinski inflation at high energies. For instance, one could generalize eq. (2.10) into At sufficiently high energies or curvatures we have R(Λ 4 S /2 2 )R R 2 . 11 In this regime the nonlocal term is much smaller than the local one, and we recover the Starobinski model. This means that, at E ∼ M , we have the standard Starobinski inflationary phase. After reheating, when the energy and curvature drop below the scale M , the local R 2 term becomes negligible with respect to the Einstein-Hilbert term. Finally, in the recent epoch, say z < 10, the nonlocal term gradually starts to become important, and the evolution is the one obtained from the nonlocal model (3.1). Thus, an effective action such as (5.1) nicely interpolates between inflation in the primordial Universe and accelerated expansion in the recent epoch. It would be interesting to understand if such a model could reflect a renormalization-group flow in a gravity theory with R 2 term, as tentatively discussed in [71].
equations of motion. The simplest example, discussed in [61], is given by a nonlocal term in the action of a scalar field, of the form dxφ2 −1 φ, where φ is some scalar field, and 2 −1 is defined with respect to some Green's function G(x; x ). Taking the variation with respect to φ(x) we get We see that the variation of the action automatically symmetrizes the Green's function. It is therefore impossible to obtain in this way a retarded Green's function in the equations of motion, since G ret (x; x ) is not symmetric under x ↔ x ; rather G ret (x ; x) = G adv (x; x ). So, even if we start from a retarded Green's function in the action, we get a combination of retarded and advanced Green's function in the equation of motion. The same unavoidably happens if we formally take the variation of a nonlocal gravity action such as that of the RR model [50,61,75]. This is indeed one of the reason why the action of a fundamental field theory must be local. However quantum effective actions can, and in fact almost unavoidably are, nonlocal. Indeed, a nonlocal quantum effective action just describes, in coordinate space, the running of coupling constants that is more commonly described in momentum space. For instance, at one-loop level the running of the electric charge in QED can be described in coordinate space by the one-loop quantum effective action [76] (see also [77,78]) Here µ is the renormalization scale, e(µ) is the renormalized charge at the scale µ and, for a single massless fermion, β 0 = 1/(12π 2 ). The logarithm of the d'Alembertian can be defined for instance from Another particularly famous example of nonlocal quantum effective action is the Polyakov action in D = 2 space-time dimensions, The Polyakov action is the quantum effective action obtained by integrating out the massless matter fields in 2-dimensional gravity coupled to matter, and can also be obtained integrating the conformal (or trace) anomaly. In D = 2 the conformal anomaly takes the form (see e.g. [79]) where N = N S +N F is the total number of massless scalar and Dirac fermion fields. 12 The energy-momentum tensor obtained taking the variation of S P has indeed a trace equal to the right-hand side of eq. (A.6). Thus, eq. (A.5) can be added to the classical Einstein-Hilbert action (which in D = 2 is just a topological invariant), to provide an effective action which takes into account the quantum effect due to loops of massless particles. Similarly, in D = 4, starting from gravity coupled to massless matter fields and integrating out the massless matter fields, one obtains the nonlocal action (see e.g. [70,79,80] for pedagogical introductions) where E is the Gauss-Bonnet term, C 2 = C µνρσ C µνρσ is the square of the Weyl tensor, ∆ 4 is the Paneitz operator (2.9), and the coefficients b, b depends on the number of scalar, vector and tensor massless fields integrated out. Again, the corresponding energy-momentum tensor reproduces the conformal anomaly, and the action (A.7) can equivalently be obtained integrating the anomaly. Thus, eq. (A.7) is called the anomaly-induced effective action.
Of course, if one would naively take the variation of these nonlocal actions for deriving the corresponding Euler-Lagrange equations, one would find again acausal equations of motions, with symmetrized Green's function. However, it is obvious that this cannot be a sign of a genuine physical acausality, since these nonlocal effective actions are just a way to express, with an action that can be used at tree level, the result of a one-loop quantum computation in fundamental theories, like QED or gravity with massless matter fields, which are local and causal. The resolution of this apparent paradox is that the equations of motion derived from the quantum effective action are no longer equation for the classical fields involved, say a classical field φ or the classical metric g µν . Rather, they are the equations of motion obeyed by the vacuum expectation values of the corresponding operators, 0|φ|0 or 0|ĝ µν |0 . However, now we must specify whether we consider the in-in or the in-out expectation values, i.e. 0 in |φ|0 in or 0 out |φ|0 in .
The classical equation for the in-out expectation values correspond to a diagrammatic expansion of the usual Feynman path integral. Then one finds that the in-out expectation values indeed obey nonlocal and acausal equations of motion, where the nonlocal operators, such as 2 −1 , are defined with the Feynman Green's function. Of course, there is nothing wrong with it. The in-out matrix element are not observable quantities, but just auxiliary objects which enter in intermediate steps in the computation of scattering amplitudes. Furthermore, even if an operator such asφ orĝ µν is hermitean, its in-out matrix element are complex. In particular, this makes it impossible to interpret 0 out |ĝ µν |0 in as an effective metric. The in-out matrix elements do not have to obey causal equations (and indeed the Feynman propagator, which is acausal, enters everywhere in QFT computations). In contrast, the in-in matrix elements are the semiclassical quantities that are in principle observables (and are real, if the corresponding operators are hermitean). The equations of motion for the in-in expectation values correspond to a diagrammatic expansion of the Schwinger-Keldysh path integral, which automatically provides nonlocal but causal equations [81,82], involving only the retarded propagator.
Thus, nonlocal actions such as (2.6) or (2.10), interpreted as quantum effective actions, correspond to causal theories, exactly as, for instance, the one-loop effective action for QED (A.2) or the Polyakov action (A.5).
Non-local but causal equations can also emerge from a purely classical averaging procedure, when one separates the dynamics of a system into a long-wavelength and a shortwavelength part. Here the mechanism ensuring causality is even simpler. Suppose that a system has a degree of freedom φ that we wish to eliminate, coupled to a set of degrees of freedoms, that we denote collectively as Ψ, that we wish to retain. Classically, φ could satisfy an equation, say, of the form 2φ = j(Ψ). This equation is then solved as φ = 2 −1 ret j(Ψ), where the retarded propagator is selected by causality, as always in such classical computations. This solutions is then re-injected in the equations for the remaining degrees of freedom Ψ, which were coupled to φ, e.g. 2Ψ = f (φ). As a consequence, Ψ now satisfies nonlocal but causal equations. An example of this form, in the context of cosmological perturbation theory, is discussed in [83]. If this is the mechanism behind nonlocal equations such as (2.5), then again the causality of the equations is automatically assured. In this case actions such as (2.6) or (3.1) can be interpreted simply as convenient tools for generating nonlocal equations of motions that are automatically covariant. However, in this case the fundamental quantity would be the equation of motion, rather than the action, in which 2 −1 is then taken to be the retarded propagator.