Theoretical priors in scalar-tensor cosmologies: Shift-symmetric Horndeski models

Attempts at constraining theories of late time accelerated expansion often assume broad priors for the parameters in their phenomenological description. Focusing on shift-symmetric scalar-tensor theories with standard gravitational wave speed, we show how a more careful analysis of their dynamical evolution leads to much narrower priors. In doing so, we propose a simple and accurate parametrisation of these theories, capturing the redshift dependence of the equation of state, $w(z)$, and the kinetic braiding parameter, $\alpha_{\rm B}(z)$, with only two parameters each, and derive their statistical distribution (a.k.a. theoretical priors) that fit the cosmology of the underlying model. We have considered two versions of the shift-symmetric model, one where the energy density of dark energy is given solely by the scalar field, and another where it also has a contribution from the cosmological constant. By including current data, we show how theoretical priors can be used to improve constraints by up to an order of magnitude. Moreover, we show that shift-symmetric theories without a cosmological constant are observationally viable. We work up to quartic order in first derivatives of the scalar in the action and our results suggest this truncation is a good approximation to more general shift-symmetric theories. This work establishes an actionable link between phenomenological parameterisations and Lagrangian-based theories, the two main approaches to test cosmological gravity and cosmic acceleration.


I. INTRODUCTION
There is some hope that the evidence of accelerated expansion [1][2][3][4][5][6] is an indication that new physics is at play on cosmological scales. Thus, by characterising the evolution of the Universe in detail [7][8][9][10][11], it should be possible to measure and constrain physical parameters that capture this novel behaviour. Typically, the new physics associated with these parameters involves new fields, a notable example of which is the scalar field, φ . Indeed, shortly after the accelerated expansion was discovered, quintessence -a scalar field whose dynamics is dominated by its potential energy -was proposed [12][13][14][15] (see also [16,17] for reviews). The impact of the scalar field can be neatly encapsulated in terms of one free function, its equation of state, w(a), given by where a is the scale factor and P φ (ρ φ ) are the pressure (energy density) of the scalar field. Quintessence is part of a much larger class of theoriesscalar-tensor gravity (see [18][19][20][21] for a review on scalar-tensor theories of gravity) -which involves a host of possible couplings of the scalar field, both with itself and the metric. The Horndeski family of models [22][23][24], which leads to second order equations of motion, can be further generalised to what seems like an infinite tower of possible theories [25,26]. In principle, it should be possible to constrain such theories with observations, pinning down the fundamental parameters that enter the action. However, given the generality of the construction, the prospects are daunting.
It turns out that it is possible to completely characterise a broad class of scalar-tensor on cosmological scales in terms of a handful of time dependent functions, α X (a) (as well as w(a)) [27,28] where, in the case of Horndeski gravity, X ∈ {M, K, B, T } each associated to a particular physical feature of the underlying action [27]. A particular Horndeski model can be associated with a choice of w and α X . In this way, the exercise of constraining scalar-tensor gravity, reduces to finding constraints on these free functions. There have been a number of attempts at constraining these functions but current uncertainties are at around the 10 to 50% level [29][30][31][32][33][34][35][36][37][38][39][40] (see also related forecasts [41,42]).
The typical approach for models that use phenomenological functions such as w(a) and the α X (a) is to assume a parametric form for their evolution and constrain its parameters. The favoured model for w is the Chevallier-Polarski-Linder (CPL) parametrisation, expansion in terms of the scale factor with coefficients w 0 and w a [43,44]. There exist a number of wellmotivated parametrisations of α X (a) that assume these functions scale in some way with the fractional density parameter of dark energy (DE), Ω DE , or the scale factor, a (see e.g [27,29,33,42,[45][46][47][48][49]). However, what is often overlooked by making such a choice is that there are underlying physical models which may limit the ranges (and behaviours) of these functions. One way of putting this is that the underlying physical model will impose quite strict physical priors on these functions and these should be taken into account when undertaking parameter constraints with cosmological data. This situation is entirely analogous to what happens when constraining inflationary models. While it is the norm to find constraint on the spectral index, n, and the tensor to scalar ratio, r, each class of inflationary models singles out very specific (often one dimensional) locii in the (n,r) plane [50][51][52][53][54].
There have been a number of studies where the evolution of the DE equation of state has been reconstructed nonparametricaly (in redshift bins) using data [55][56][57][58][59], as well as ones where the impact of theoretical priors on the parameters of quintessence and more general scalar-tensor theories was considered and used to introduce correlations and minimise the number of these parameters [60][61][62][63][64]. Using a different and complementary approach, we have tackled this problem of physical priors in the case of thawing quintessence where, remarkably, we could construct an analytic prior for w(a) [65]. By parametrising it as we found that if {w 0 , w a } were chosen to fit the observables, those could be reproduced with the accuracy required by nextgeneration surveys up to recombination. Furthermore, the prior, P, was factorisable, P[w 0 , w a ] = P[w a |w 0 ]P[w 0 ] and the shape of P was such that it was not collinear with current constraints on {w 0 , w a } and thus, if incorporated could reduce the uncertainties in w by up to an order of magnitude. Emboldened by what we have found in the case of thawing quintessence, we now wish to generalise this approach to more general scalar-tensor theories. From the outset, it is a somewhat challenging task to construct a multidimensional probability distribution function for w and α X . We have therefore established a more modest goal and focused on a subclass of theories that are shift-symmetric, i.e. theories which are invariant under a scalar field transformation of the form where C is a constant. Such theories are, in a sense we will make more precise below, well-defined and natural. In this case, the theory is completely determined by w(a), α B (a) and α K (a); however, it is well known that α K (a) is unconstrained by observations [29], so we are seeking a prior distribution function for w(a) and α B (a). As we will see, exploring this restricted set of scalar tensor models already sheds light on the hurdles we need to tackle in the general case. Note that, motivated by recent observations [66][67][68] and associated theoretical bounds [69][70][71][72], in the above we have implicitly required that the speed of gravitational waves is luminal.
Outline: In Section II we outline the theoretical aspects of and motivation for the shift-symmetric Horndeski model that we focus on here. In Section III we justify the choice of physical priors we impose on the theory. In Section IV we describe the approximation scheme we use here and explain how we evaluated the required accuracy. Further, in Section V we present the constructed prior functions on w and α B . In Section VI we combine these priors with a set of cosmological data. Finally in Section VII we discuss our findings.

II. SHIFT-SYMMETRIC SCALAR-TENSOR GRAVITY
Consider as a starting point, the Horndeski action [22][23][24]: where L m captures the matter Lagrangian, with all matter fields ψ M minimally coupled to g µν (in other words, we are in the Jordan frame), and where Here X ≡ − 1 2 ∇ µ φ ∇ µ φ , covariant derivatives on φ are denoted by indices, so e.g. φ ;µ ν ≡ ∇ µ ∇ ν φ , and similarly we use a shorthand for partial derivatives wrt. X, e.g. G 4X = ∂ G 4 /∂ X. The Horndeski action describes the most general Lorentz invariant, local action in four dimensions, featuring a scalar field on top of the metric and having at most second-order equations of motion on any background. Even if the final aim, beyond the scope of this paper, is to investigate the impact of physical priors for this action in full generality, in this paper we focus on a simpler scenario: shift-symmetric Horndeski theories. This subset of theories is also known as 'weakly broken Galileons' [73], since the shift symmetry ensures that radiative corrections are parametrically suppressed around (quasi) de Sitter backgrounds, reminiscent of non-renormalisation theorems for Galileons [74,75]. 1 By focusing on this subset of solutions we are therefore already implicitly ensuring that a theoretical prior requiring the radiative stability of the theory is satisfied. 2 As we are ultimately interested in investigating concrete cosmological observables for shift-symmetric Horndeski theories (and the effect theoretical priors have on them), we need to choose a concrete parametrisation of the (in principle infinite) freedom inherent in the G i functions. As a concrete illustration we therefore focus on the following subset of theories 1 Although see [34,76] for examples of shift-symmetry breaking theories that maintain this property. 2 By this we mean radiative stability of the Horndeski scalar interactions considered here. We have nothing new to say about the old cosmological constant problem.
Here the reduced Planck mass is M 2 P = 1/8πG and conventionally Λ 4 2 = M 2 P H 2 0 , Λ 3 3 = M P H 2 0 , ensuring all the above interactions can give O(1) contributions to the cosmological background evolution today. The choice for G 4 and G 5 is dictated by constraints on the speed of gravitational waves [66-68, 78, 79] -see [69][70][71][72] and references therein for why this implies the above restrictions on the G i , at least as long as the cosmological Horndeski theory is valid up to energy scales of Λ 3 [80]. For G 2,3 we keep the first two orders in X, where the c 01 and d 01 terms capture the Galileon symmetric contributions, while the c 02 and d 02 capture the lowest order (in X) shift-symmetric corrections to this. 3 This will afford us with a fairly minimal, yet suitably rich testbed in which to investigate the effect of theoretical priors on shift-symmetric Horndeski theories. Note that, for simplicity, we have excluded the (equally shift-symmetric) tadpole term c 10 φ in our test case, Eq.(9).
The shift-symmetric model has been explored previously in Refs. [81] and [82], where the authors put cosmological constraints on the parameters of the model, defined in Eq. (9), and on the parameters of the shift-symmetric generalisation of the Cubic Covariant Galileon model, respectively.
We will be considering a homogeneous and isotropic cosmological (FRW) background solution, ds 2 = −dt 2 + a 2 (t)(dx) 2 , populated by matter, radiation and the dark energy scalar φ . The Friedmann equations then are where H ≡ȧ/a as usual, ρ tot = ρ m + ρ r + ρ φ and p tot = p r + p φ (subscripts refer to matter, radiation and dark energy, respectively). For Eq. (9), ρ DE and p DE then satisfy Note that in the case where we include a cosmological constant Λ, described in more detail below, the dark energy density, ρ DE , and pressure, P DE , will have a contribution from Λ in addition to φ . However, in both cases we take w(a) = P φ /ρ φ . The background scalar equation of motion can be written in terms of a conserved current [27] aṡ 3 If higher order terms in X/Λ 4 2 are suppressed (while terms such as ( φ ) n /Λ 3n 3 are not), then this will fully capture the leading order terms as well as next-to-leading-order corrections for a generic G 2,3 . If higher-order terms are not suppressed and e.g. all powers of X/Λ 4 2 equally contribute to G 2,3 , this is not the case. A truncation like Eq. (9) is therefore not generically valid, but instead it should be viewed as a specific illustrative example of a shift-symmetric Horndeski theory. where There are a few key points to note about the background evolution. First of all, we have that Eq. (12) implies that there is a tracker solution as J ∝ a −3 → 0 as a grows. This greatly simplifies the dynamics and, as we will reiterate further down, the priors we need to assume on the various ingredients of this model. Second, we will consider two versions of this theory. In the first version the scalar field is entirely responsible for the late time acceleration and thus there is no explicit cosmological constant, Λ (or a constant term V 0 in the scalar field potential); we will dub this the Λ = 0 self-accelerating version. 4 The Λ = 0 version is, in some sense, the more interesting as it can be invoked as an alternative to cosmological constant driven acceleration. But we also have experience from other theories that self-accelerating solutions are more tightly constrained and potentially easier to rule out (for example in the case of GDP gravity [83][84][85][86]). This means that the dark energy density is solely given in terms of the energy density associated to the scalar field: A key aspect of self-accelerating solutions is that they require "negative kinetic energy" G 2 < 0, at least in the class of theories under consideration [77]. For shift-symmetric Horndeski theories up to cubic term (Kinetic Gravity Braiding), the energy density can be written as [77] where the latest limit corresponds to the tracker solution. Because G 2 is even inφ , ρ φ > 0 requires that at least one of c 01 , c 02 to be negative (the tracker condition J = 0 on Eq. (13) might impose further constraints on the relative signs). We Distributions of the parameters ofφ 2 0 /Λ 2 2 andφ 0 /Λ 3 3 , where φ 0 is the amplitude of the scalar field today. The fact that they are lower than 1 means that higher order terms in our expansion of the Lagrangian, Eq. (9), should be suppressed unless large values of the coefficients {c i j , d i j } were chosen. Therefore, this could be seen as a posteriori justification of our ansatz, Eq. (9). will find that generically c 01 < 0, i.e. the "wrong" sign of the standard kinetic term, Fig. 1. This means that Minkowski space withφ = 0 is not a stable solution of these models nor can we apply the usual battery of consistency conditions that have been developed in the standard vacuum (see discussion in the next section). 5 Another interesting feature of the self-accelerating solutions is illustrated in Fig. 2. There we can see that bothφ 2 0 /Λ 4 2 andφ 0 /Λ 3 3 are smaller than unity. This is encouraging in that it provides a posteriori justification for our ansatz, Eq. (9): the higher order terms omitted in Eq. (9) scale with higher powers ofφ 2 0 /Λ 4 2 andφ 0 /Λ 3 3 . So if these higher powers are indeed suppressed, then omitting higher order terms in the first place is consistent. This is also related to the above discussion of the sign of c 01 . If higher order terms with coefficients c 0i and i > 1 are increasingly suppressed, then obtaining a positive scalar energy density, Eq. (14), with positive c 01 becomes very challenging. Note, however, that the suppression illustrated in Fig. 2 is rather mild and can easily be compensated for by coefficients c i j and d i j that are somewhat larger than unity. Fig. 1 shows that this is in fact the case for the lower order interactions in our ansatz, Eq. (9), so we emphasise that our findings here are certainly not conclusive evidence that the higher order interactions omitted cannot yield O(1) contributions to the scalar energy density or the background and perturbative evolutions in general.
The second variant that we will consider does include Λ; we will dub it the Λ = 0 version. In this case the signs of c 01 , c 02 are less restricted by requiring the scalar field to dominate the expansion, Eq. (14). If we were to restrict ourselves to c 01 > 0 (which we do not here) we would be looking at what is conventionally dubbed the normal branch. In the cubic Galileon limit (c 02 , d 02 = 0) Ω φ > 0 requires c 01 < 0, in agreement with Eq. (14). Normal-branch Galileons (c 01 > 0) are driven towards a trivial tracker withφ → 0, ρ φ → 0 unless shift-symmetry is broken [87]. We will not fix a sign of c 02 to be able to capture more general behaviour in the Λ = 0 case. Note that the cosmological constant is allowed and does not break shift symmetry. Here the dark energy density is the sum of the energy density associated to the scalar field and the cosmological constant: As we will focus on large scale observables, we are particularly interested in linearised perturbations around the cosmological background solution described above. The freedom in the dynamics of such perturbations for a general Horndeski theory, as specified in Eqs. (4)- (8), is controlled by just four functions α X of time with X ∈ {K, B, M, T }. For the general form of these α X see [27]. In the shift-symmetric subset of theories we are considering here, with G 4X = 0 = G 5X , we find that the effective Planck mass seen by linear perturbations is simply M P (and hence has no time-dependence), while the speed of gravitational waves c GW = 1 by construction. We are therefore left with only two non-trivial α X controlling linear perturbations, namely where all functions are evaluated at the background level. Upon substituting Eq. (9) into Eq. (15), it is then straightforward to express these two α X in terms of the c i j , d i j in Eq. (9) and the background degrees of freedom, a and φ .

III. ESTABLISHING PHYSICAL PRIORS
It has been well established that cosmological observables are insensitive to α K [29], a direct manifestation of the fact that α K drops out in the quasi-static limit (which applies to the vast majority of observable scales at late times) at leading order [42]. The challenge, then, is to construct physical priors for w and α B . There are a number of steps in working towards this goal, the first one of which is to map out the space of possible histories for the scalar field φ and the metric g αβ . In fact, as we saw in the previous section, w and α B are completely determined in terms of φ (t) and a(t) so we will only have to focus on the evolution of the background in these theories.
We then have a number of parameters which need to be chosen. The standard cosmological parameters will be included in the analysis, whether we work with the scalar field action directly or we work with the parametrised form, in terms of w and α B ; therefore, we will not be specially concerned with the choice of their priors; indeed we will consider a standard range such as Ω cdm ∈ [0.15, 0.35] and H 0 ∈ [60, 80] km s −1 Mpc −1 , which ensures our findings will be compatible with current constraints of these parameters, while not too broad to explore values that are already ruled (e.g. H 0 = 0). We then have the parameters in the action which we have distilled down to {c 01 , c 02 , d 01 , d 02 } and {Λ 2 .Λ 3 }. Two dimensionless c i j , d i j can be absorbed into the Λ i . In In contrast, in the case of slicing, one varies the all three parameters simultaneously and keeps only the sets that give ∑ Ω i = 1. As can be seen above, the tuning method is subject to projection effects.
our concrete implementation, however, we find it more practical to follow a different (yet physically equivalent) prescription and fix Λ 4 2 ≡ M 2 P H 2 0 and Λ 3 3 ≡ M P H 2 0 , varying only the coefficients {c 01 , c 02 , d 02 }. In order to decouple the effects of H 0 on the coefficients and avoid possible inconsistencies due to the way we choose to sample our parameters, we set this normalisation H 0 to a fiducial value. We use the fact that we can set d 01 = −1 due to the normalization of the field [88]. 6 The physics of the model does not change depending on which of the c 0i (up to its sign) and d 0i parameters one chooses to fix, and hence our priors would be unaffected by this choice.
And finally, we must also consider the initial conditions of the scalar field, φ i andφ i . As we have seen in the previous section, shift-symmetric theories come endowed with tracking behaviour. This means that irrespective of the initial condition, the field will (quite rapidly) evolve towards a universal solution which is uniquely determined in terms of the coupling constants of the theory. And, because the theory is shiftsymmetric, the result is completely independent of φ i . This means that the prior will also be completely independent of φ i andφ i .
With regards to {c 01 , c 02 , d 02 }, it makes sense to consider uniform, uncorrelated priors over a fixed range; as with all uniform priors, one needs to define hard limits to their ranges. One may expect that naturalness criteria suggest one should only vary these dimensionless constants within a range of O(1) (around 0). However, note that the shift-symmetric nature of the theories at hand means that radiative corrections to the c 0i and d 0i are parametrically suppressed [73], more specifically these corrections scale as [71,73]. So considerably wider prior ranges can be explored without running into naturalness issues. We have explored different choices for the ranges of these parameters and have found that, once we allow them to vary within a range of O(2), the final results are unchanged. We also check that the constraints with data are consistent with these bounds, and indeed we find distributions that are well within the range of O(2). This confirms that this is a wide enough range so that our results are not biased by the bounds we have chosen, while at the same time we exclude regions of space that are ruled out by data. We use such an extended range in all our subsequent results.
There is a further complication, however, which is that we are interested in cosmologies which are reasonably close to the one we observe, i.e. one in which Ω r + Ω m + Ω DE = 1 (note that our definition of Ω DE differs between the cases with and without Λ); one can loosen this statement and say that we do not want Ω DE 0 or Ω DE 1. This immediately imposes additional restrictions on {c 01 , c 02 , d 02 }. In other words, one can see such a restriction due to Ω DE as a deformed slab cutting through {c 01 , c 02 , d 02 }, picking out a lower dimensional space. Projecting such a cut onto each of the {c 01 , c 02 , d 02 } will naturally lead to non-uniform 1-D priors.
One might think that an alternative approach is to solve for one of the {c 01 , c 02 , d 02 } for a fixed range of Ω DE and indeed it is possible to do so using a well established shooting method. Unfortunately the resulting combined priors depend heavily on which of the constants one chooses to solve for. One can understand this if one takes two examples. In one case, one assumes a uniform prior for {c 02 , d 02 } and solves for c 01 . The resulting distribution c 01 will not, generally be uniform. Alternatively one might consider a uniform prior for {c 01 , d 02 } and solves for c 02 . Now the prior on c 01 will be uniform while the prior on c 02 will not be uniform. We illustrate this in Fig. 3. This is not surprising as this approach effectively introduces a non-linear correction to the measure which is highly dependent on the constant one is solving for. Thus we have opted to use original approach -to sample all the parameters and then project down the constraint slice (or slab) 7 .
A comment is in order about imposing priors related to the validity of the underlying theory itself. Firstly, these come in the form of stability priors. We have already alluded to radiative stability above and we will complement this by requiring the absence of ghost and gradient instabilities for our cosmological solution, using the implementation of [89]. Note that these instabilities directly manifest themselves in the effective (low-energy and classical) theory we are considering, i.e. Eqs. (4)-(9). 8 Secondly, there are priors not directly linked to any easily recognisable sickness in the low-energy theory, but instead to ensuring that this low-energy theory can be embedded in a sensible UV completion. These bounds turn out to be powerful, even if the UV completion is not known. In this context we will focus on so-called positivity bounds, requiring that the underlying fundamental theory (and hence the UV completion as well) is consistent with a "standard" Wilsonian field theory description -one in which Lorentz invariance, unitarity (well-defined probabilities), analyticity (causality) and polynomial boundedness (locality) are respected. These basic principles turn out to be sufficient in order to derive a variety of additional constraints on the low energy parameters of the theory, in our case encoded in the c i j and d i j -see [90][91][92][93][94][95][96][97][98][99][100][101][102][103][104][105][106][107][108] for constraints directly applicable to our present scalar-tensor context. The simplest such bounds can be derived via considering tree-level 2 → 2 scattering on a flat (Minkowski) background. For general Horndeski theories the resulting bounds are presented in [101]. Specialised to Eq. (9), these reduce tō where a bar denotes that the function is evaluated on a flat background ( φ = 0) and constraints are subsequently ported to cosmological backgrounds. While the second bound is trivially satisfied, the first imposes a non-trivial constraint. However, and crucial to the results of this paper, these bounds will turn out to not be applicable here. This is because we will find that, for our ansatz, Eq. (9), cosmological constraints push c 01 to be overwhelmingly negative. While this condition is consistent with obtaining healthy solutions on cosmological backgrounds, around a flat (Minkowski) space-time it renders φ into a ghost. But the existence of a well-defined and ghost-free Minkowski solution is an essential ingredient for the derivation of the above positivity bounds. So, at least for our specific ansatz, Eq. (9), we will not be able to identify regions of parameter space here, where observational constraints are satisfied and where we can consistently apply the above positivity bounds -for a more detailed discussion see [109].
IV. APPROXIMATING THE TIME DEPENDENCE OF w AND α B We now proceed to determine the best way to parametrise the time dependence of w and α B . We recall that, in the case of thawing quintessence, we found that w = w 0 + w a (1 − a) was an excellent approximation to the equation of state; this was not the case for tracking quintessence. On the left panel of Fig. 4 we plot the two typical shapes of the evolution of w for the shift-symmetric model that we consider; although, on the face of it, the true curve and the fit do not seem to agree particularly well, we find that w = w 0 + w a (1 − a) approximates the evolution of equation of state well enough in a sense that will be clear soon.
A natural first choice for the time-dependence of α B was the commonly assumed scaling with fractional density of DE, Ω DE , however, we found this parametrisation to only provide a good fit to a small fraction of the models we calculated. We found similar results for a proportionality with the scale factor, a. A Taylor expansion in terms of either a or Ω DE worked better than a single constant factor, however, in order to reach our desired error for a the majority of models, it was necessary to include at least seven coefficients in the expansion. We had similar success with other parametrisations, such as inverse power law, binomial expansion, exponential power law and others. Finally, we note that in a simplified version of our model, i.e. the cubic galileon, 9 its exact time dependence is α B ∝ H −4 . The tracker solution, J = 0 in Eq. (13), provides a solution for the scalar field evolutionφ ∝ H −1 , which can be substituted into Eq. (15) to get the expected result. It is also possible to prove that this time-dependence approximately holds for a more general case too, i.e. c 02 = 0 and d 02 = 0. Therefore we expect that a function of (H 0 /H) 4 should fit the evolution of α B in the shift-symmetric case. We find the following function to fit the true models extremely well, whereα B and m are constant parameters. On the right of Fig. 4 we plot two typical representative α B and the lines that fit to those given the function that we chose. As will be discussed in more detail in the following section V this parametrisation fits incredibly well (to less than 1% error) more than 98% of the large set of randomly generated models.
In the spirit of [65], we now want to find the set of parameters {w 0 , w a ,α B , m} that reproduce the Hubble rate (H), the angular diameter distance (D A ) and the growth factor ( f = d ln δ m /d ln a), 9 The cubic galileon model is equivalent to a special case of the shiftsymmetric we consider here with c 02 = d 02 = 0. where computed from the Lagrangian with parameters (c 01 , c 02 , d 01 , d 02 ) with the accuracy required by nextgeneration surveys; i.e. 1% at z < 10 [7,8,10,110]. and 0.3% at recombination for D A [5], for 99% of the models. In order to find {w 0 , w a ,α B , m} we minimise where O(w 0 , w a ,α B , m) z and O(c 01 , c 02 , d 01 , d 02 ) z are the observables at redshift z computed with the parametrisations of w and α B and the exact from the full evolution of the field equations for a shift-symmetric model given by the set of parameters {c 01 , c 02 , d 01 , d 02 }, respectively. The variable σ O z weights each point so that we can require different precision depending on the variable and redshift. For instance, we set σ O z = 10 −3 for all observables at z < 10 and σ D A (z rec ) = 10 −4 at recombination for the angular diameter distance. The software used to make these fits is a modified version of RU-FIAN [65] and can be found at https://gitlab.com/ dinatraykova/horndeski-priors.
Let us emphasise that, with this approach, we do not choose the set {w 0 , w a ,α B , m} that best fit the equation of state and α B curves obtained from the Lagrangian, which are not observable quantities. Instead, we minimise the error in the background evolution H and D A , and f for the linear perturbations. In this sense, allowing {w 0 , w a ,α B , m} to differ from their best fit values with respect to the exact w and α B , we find the set of parameters that minimise the error on the observables. It is important to note that, in comparison with quintessence, the equation of f , Eq. (20), has the source term modified as which introduces an extra dependency on α B in the source term and means that a more precise fit to α B would be required to get a good fit to the observables than is the case for w.
In Fig. 5 we show the distributions for {w 0 , w a ,α B , m} set of parameters for the shift-symmetric model with and without Λ recovered by minimising the error on the observables, as detailed here; correlations between these variables will become apparent as we construct a complete model for the priors in the next section.
In Fig. 6, we present a summary diagram of the method explained in this Section that allows us to derive the approximate time dependent functions that describe the shift-symmetric model, w(a) and α B (a), and find the best fit coefficients {w 0 , w a ,α B , m} that better reproduce the observable quantities obtained from the evolution of the Lagrangian {c 0i , d 0i }.

V. RESULTS
We now have a robust process for determining {w 0 , w a ,α B , m} for each choice of the physical priors: minimising Eq. (21) allows us to find the set of parameters that reproduce the observables H, D A and f with the accuracy needed by next-generation surveys. The next step is to obtain the probability distribution that will be used as theoretical priors for the shift-symmetric Horndeski models. For that, we sample 30,000 random models with parameters {c 01 , c 02 , d 01 , d 02 } and store their corresponding observables at specific redshifts (100 points at z < 10 and at z rec , in the case of D A ). After minimising Eq. (21) for each of this set of 30,000 observables, we end up with having 30,000 {w 0 , w a ,α B , m} that can be used to build our theoretical priors. We obtain the observables quantities for each realisation using hi class [89,111], an extension to the Boltzmann code CLASS [112] that solves the cosmological equations for a broad range of sub-sets of the Horndeski class of theories.
Let us note that the choice of 30,000 samples and 100 points at z < 10 is just a matter of computational efficiency and has no physical insights. We checked that after 30,000 samples the probability distributions had already converged and increasing its size to 100,000 does not alter the results. In addition, for each set of {w 0 , w a ,α B , m} obtained minimising Eq. (21) with 100 points for each observable below z = 10 and D A at z rec , we saw that the new observables satisfy the requirement of having  an error below 1% at z < 10 and below 0.3% for D A (z rec ) for the 99% of the cases. This can be seen in Fig. 7.
The probability distributions for the parametrised shiftsymmetric Hordenski models can be seen in Fig. 5, which shows mild correlations between different parameters. We note that these correlations do not have the usual elliptical shape that one expects for a multivariate Gaussian. Clearly there is a non-linear correction that must be taken into account in the next steps.
If we are to construct theoretical priors that can be used in a Markov chain Monte Carlo (MCMC), we need to find a sufficiently good approximation of the probability distribution that allows us to recover the same distribution of the parameters and the observables when sampling from it. We do this in two steps. We first transform to a new set of parameters X 1 =α B , which effectively Gaussianise the distributions (Fig. 8).
We find that a multivariate normal distribution fits the distribution of {X 1 , X 2 , X 3 , X 4 } and that, once transformed back to {w 0 , w a ,α B , m} recovers the correlations between the variables to a very good approximation; this can be seen in Fig. 9. Of course, a crucial test is to see the impact on the observables and whether one is able to recover the correct distribution for those. We do so in Fig. 10 where we compare the observables obtained by integrating the field equations of motion to the ones obtained by sampling from the distribution. Note that FIG. 8. Approximate fit to the probability distribution of the {X 1 , X 2 , X 3 , X 4 } for the Λ = 0 model. In black, the contours from the original set of parameters transformed by Eq. (23). In green, those obtained from a multivariate Gaussian distribution. As one can see, their differences are small and have little effect on the original parameters {w 0 , w a ,α B , m} (Fig. 9), the observables (Fig. 10) and, therefore, in an MCMC.
we only show the distances at z = 1 and at recombination, as they have the largest differences, yet these are still small and should have little impact on the posterior distributions of the parameters when combined with data.
Using the Gaussianised distribution we can construct an analytic model and calculate the probability density in the transformed parameter basis,

(26)
We use this to infer the a priori likelihood of a given sample used in the combined analysis with data in Section VI.
We might want to compare our current results with those of our previous work [65]. There, we found that the phenomenology of different highly dimensional theories is well described by the usual w 0 -w a parametrisation. In that case, we went from having many parameters in the Lagrangian to just two, accurately accounting for their cosmologies. In this paper, however, we start from a Lagrangian with three parameters (after fixing d 01 = −1) and end up with four parameters to describe its phenomenology. However, this phenomenological parametrisation is still advantageous: it does not require solving the field equations, allows to clearly split the background and linear perturbations effects and shows that both w and α B are simpler than one would a priori think. We expect this to also be the case in other more general theories.

VI. COMPARISON WITH CURRENT DATA
Here we present the constraints on α B and w from current cosmological data and compare and combine these with our theoretical priors. To do this we use a combination of cosmic microwave background (CMB), Baryon Acoustic Oscilations (BAO), Redshift Space Distorsions (RSD) and Supernovae Type IA (SN Ia) data.
From Planck 2018 [5,113,114], we use the auto-and crosscorrelations of the temperature (T ) and polarisation (E) fluctuations of the Cosmic Microwave Background, together with measurements of the lensing potential; i.e. the likelihood from the high-l temperature auto-correlation (TT), temperature and polarisation cross-correlation (TE), and the polarisation autocorrelation (EE) spectra (at l ≥ 30), the low-l (2 ≤ l < 30)) TT and EE likelihoods and the lensing likelihood (with temperature and polarisation lensing reconstruction) in the multipole range l = 8 − 400.
Additionally we use the BAO and RSD measurements from BOSS DR12 [6], as well as BAO from the 6dFGS survey [115]. The BAO measurements are of the Hubble rate, H and the angular diameter distance D A , while RSD measures the growth rate of the universe through f (z)σ 8 (z). We use the full covariance between the f (z)σ 8 (z) measurements at different redshifts and the BAO measurements of H(z) and D A (z) from BOSS. However, we do not consider the correlation between the BOSS and 6dF measurements as those cover different areas of the sky and thus any such correlation would be negligible.
Finally, we also include the Pantheon SNe Ia sample [116], which combines the Pan-STARRS1 Medium Deep Survey with ones from the SDSS, SNLS, and various low-redshift and HST samples, 1048 SNe Ia in total in the redshift range 0.01 < z < 2.3. We also note that, throughout, we assume that the cross-correlation between the different datasets is negligible.
We built our prior likelihood in the Gaussianised basis {X 1 , X 2 , X 3 , X 4 } and, in order to be consistent, we use the same basis for the sampling in all cases, from which we then con-vert the resulting distributions back to {w 0 , w a ,α B , m}. We sample through the parameter set {X 1 , X 2 , X 3 , X 4 } together with the standard cosmological parameters in a Markov Chain Monte Carlo (MCMC) with MontePython [117,118] using the Metropolis-Hastings algorithm [119,120]. We do not consider any prior bounds on the standard cosmological parameters in the MCMC run (apart from τ > 0.004), but just start from a known good fit point from Planck for the set {Ω cdm,0 , Ω b,0 , H 0 , A s , n s , τ}. For the X i we set the following ranges, In the case of Ω Λ = 0 we also set Ω Λ ∈ (0, 1). Using the Gelman-Rubin convergence criterion [121] we require R − 1 < 0.02. The contour plots were produced using Get-Dist [122]. In order to obtain the combined constraints from data and the theoretical priors, we implemented Eqs. (24)- (26) in MontePython as a new likelihood module 10 . In the analysis with data only we assume uniform priors on the Comparison between data and the prior distributions for the two shift-symmetric variants we have considered, Λ = 0 (lower left corner) and Λ = 0 (upper right). The grey filled contours show the distributions of the parameters when constrained with 'data' alone, the green or blue filled contours show our 'priors' for the Λ = 0 and Λ = 0 cases respectively, and the purple dashed contours are from the combined analysis. Note that the 'prior' likelihood is built in the Gaussianised {X i } basis, as discussed in Sections V and VI and takes into account the underlying physical properties of the model (not to be confused with the flat uncorrelated priors we put on the parameters in the 'data' runs). We have only plotted the 1D pdf's for the case Λ = 0 for clarity and again we have marked the line where α B = 2 (see Appendix A). This plot demonstrates how combining data with theoretical priors can result in much tighter constraints on some of the parameters of the model.
We present the results in Fig. 11 . Let us note that Ω Λ ∼ 0.7 in the data run implies that data seems to prefer a universe mainly filled with a cosmological constant and only a small contribution of the scalar field to DE. Λ = 0 data-only constraints are much broader than any of the others, making it difficult to read. If we focus only on the bottom left corner of Fig. 11  (Λ = 0), we note that although the data and the priors appear to be equally as constraining forα B , w 0 and w a , the contours are misaligned and only overlap away from their respective centres (i.e. regions of high probability). Nevertheless they are statistically consistent and combining the two results in tighter bounds on these parameters. Further, data alone does not provide a very strong bound on m, which makes including the prior likelihood crucial in constraining the time evolution of α B . Looking at the combined constraints (dashed line) one may expect that the contours should lie in between the data and priors alone. However, due to the high dimensionality of the problem and the correlations between different parameters, the 2D and 1D projections of the distributions from the combined analysis can end up off the centre of the distributions of the data and the priors runs alone. In the case of w 0 this effect can be seen quite clearly, where the combined histogram appears to be to the right of both the data and priors alone. This is not surprising looking at the 2D contours of w 0 and the other three parameters, where we see that the data and priors contours overlap only at their edges, which could result in such a shift in the 1D projections of one or more parameter.
In the top right corner of Fig. 11 we show the contours for the case where we include a contribution of Λ to the DE density (Λ = 0). In this case we find that while data alone (grey solid contours) constrains theα B and m parameters well, the distributions of w 0 and w a are very wide, compared to the priors (blue solid contours) and the combined constraints are almost fully driven by our priors. We note that these distributions cover almost the full range that we have set for these parameters in the data run. The distributions of Ω Λ are shownα  I. Best fit and confidence limits of w 0 and w a for the data set CMB+BAO+RSD+SN, the theoretical priors and the combined analysis for the shift-symmetric models both with and without Λ, (Λ = 0 and Λ = 0). Note that we have not written the means and errors for w 0 and w a from the data run in the Λ = 0 case, as data is not constraining on these; the errors are determined by the ranges we have set and the mean values are consistent with ΛCDM. This is related to the fact that, in this work, w is defined as the scalar field equation of state and that data seems to prefer a negligible contribution of the scalar field to the DE density with the majority coming from Λ (Fig. 12).
in Fig. 12 from the sampling with data (grey line) and the ones recovered when deriving the priors (green line). This plot shows that the data prefers the majority of the DE density contribution to come from Λ, as it is consistent with Ω Λ ∼ 0.7. This leaves only a very small portion of Ω DE to come from the scalar field φ . Our definition of w(a), Eq. (11) includes only the contribution of the scalar field, so we can expect that in a case where Ω Λ dominates the DE density, it would not be possible to get a strong constraint on the equation of state parameters of the field, w 0 and w a .
To emphasise the benefit of including theoretical priors into the likelihood analysis in constraining these models, in Table I we present the parameter ranges for the {w 0 , w a ,α B , m} set from the likelihood analysis with uniform priors and with theoretical priors. In the Λ = 0 case we find that for m there is a significant improvement in the error after including our theoretical priors, compared to the constraints with data using uniform uncorrelated priors (from ±1.6 to ±0.4). For the other three parameters ranges from the data run with uncorrelated priors and our derived correlated ones are comparable but combining them still results in slight improvement of the errors. In the case of Λ = 0, we see that there is a similar improvement on the constraint of m as we find in the Λ = 0 case (from ±1.4 to ±0.4). However, as we saw from the contours in Fig. 11, w 0 and w a cannot be constrained with data using the uniform priors on the parameter set due data preferring Ω Λ to be the dominant contribution to the DE density. The addition of the theoretical priors in this case is, therefore, crucial as the only way to fully constrain the parameter space.

VII. DISCUSSION
In this paper we have taken a further step towards constructing a set of physical priors for Horndeski theories of gravity. Building on the experience of constructing such a prior for thawing quintessence, we have focussed on a physically well motivated subset of Horndeski gravity: shift-symmetric theories with standard speed of gravitational waves. While these theories are less general than the full Horndeski space of theories, they are more general than the much studied Galileon scalar-tensor theories.
Working with shift-symmetric theories has allowed us to explore a situation in which one needs more than just the equation of state, w, to fully characterise its behaviour on cosmological scales. For such theories one needs to also include an accurate model for the "braiding" parameter, α B . We have done so, constructing a prior distribution function, P for four constant parameters defined in Eqs. (2) and (17). Remarkably, and very much like in the case of thawing quintessence, we have come up with a simple analytical form for P which can be easily deployed in future cosmological parameter analysis.
We have learnt a number of lessons from focusing on shiftsymmetric theories which give us a sense of the challenge of tackling more general Horndeski theories. For a start, the theories we have looked at here are endowed with a tracking behaviour which eliminates the need to pin down a prior for initial conditions. This will not be true in general for full Horndeski theories.
We have had to face the problem of sampling over a multi-dimensional space of parameters (in this case {c 01 , c 02 , d 01 , d 02 }) which is subjected to some form of constraint. The way one implements the constraint can greatly affect the prior distribution function. For example, explicitly solving the constraint can bias the resulting prior, depending on which of the parameters one is solving for. We have argued that one should sample over all parameters and exclude points which lie outside the constraint sub-region. This is, nevertheless, a computationally costly approach to the problem which will become far more severe, the more general the theory one is looking at.
With an appropriate algorithm for sampling over {c 01 , c 02 , d 01 , d 02 }, we have proposed a functional form for the phenomenological parameters, w and α B . We have found that the usual form for w is still remarkably effective while, building on our knowledge of Galileons, we have come up with a suitably simple form for α B , if we choose the parameters by minimising the error on the observables (a crucial aspect of this approach). The latter insight is useful and points to the fact that, in general, the α X parameters may have a simple functional form in the more general theory. This means that a reanalysis of current cosmological data may lead to far tighter constraints than have until now been found.
An important step has been to find non-linear transformations that, to some extent, "Gaussianises" the distributions of our parameters. Such a transformation has been remarkably effective allowing us to determine, rather more easily than one would naively expect, an analytic expression for the prior. Again, one would expect this approach to be useful when looking at more general theories.
An interesting aspect of the theories we have focused onshift-symmmetric theories -is that they are, in some sense, viable and complete. In other words, Including terms ∝ X 2 in G 2 , G 3 give viable generalisations of the cubic Galileon (Fig.  11), even with Λ = 0. In this modelφ 2 /Λ 4 2 0.2 (Fig. 2), suggesting that higher order corrections are subdominant and can be neglected.
An important aspect, which from our understanding has been somewhat unexplored, is that c 01 < 0 is more generic than just for the Covariant Galileon. This is important, since this regime disconnects these theories from the Minkowski solution and constraints derived for that solution. Note that there may be other solutions where, for example, higher orders in X n contribute. In that situation, the constraints on c 01 may be markedly different.
Note that, while we have considered and taken into account a number of theoretical priors and observational constraints throughout this paper, these are of course not complete and one may wish to add additional priors/constraints to this analysis in the future. One such example to highlight are constraints from dark energy-gravitational wave interactions, specifically related to dark energy (gradient) instabilities that can be induced by gravitational wave sources such as massive binaries [123]. Requiring the absence of these instabilities in general can be used to significantly tighten cosmological parameter constraints [38]. In the specific shift-symmetric context of the theories considered here, avoiding such instabilities amounts to requiring |α B | 10 −2 . This effectively renders the cubic Horndeski interactions we have considered into an afterthought for cosmology. We will leave a more detailed investigation of this and other additional priors in the context of shift-symmetric theories for future research.
Finally, our brief comparison with current data shows that this theory is a viable, self-accelerating model of the Universe: the physical priors are consistent with the cosmological constraints. This is somewhat promising given the dearth of theoretically viable models of self-acceleration which are currently compatible with cosmological data. A thorough analysis of shift-symmetric cosmologies, along the lines of what has been undertaken in [124] will allow us to assess if such shift-symmetric gravity is a credible contender for the late time acceleration of the Universe. done with a modified version of RUFIAN 11 [65], the MCMC were run with MontePython [117,118] and the contour plots were produced using GetDist [122].
Appendix A: Discontinuities for models crossing α B = 2 In the main text we briefly alluded to potential issues associated with crossing α B = 2. As shown in Fig. 4, α B generically starts strongly suppressed at high redshifts and then grows towards redshift zero, typically reaching O(1) values. For a small, yet significant, subset of the models discussed in this paper (see e.g. the distributions shown in Fig. 5) α B eventually grows to be larger than 2. This is important, because crossing the α B = 2 point is associated with a number of discontinuities. This was first noted in [27] and discussed in [129,130]. As a result, evolutions crossing this point have being conservatively excluded in some of the subsequent analyses -see e.g. [33,35,38]. On the other hand, this could be only a gauge discontinuity, as advocated in [130], that can be safely removed. So in this appendix we quickly summarise the issues associated with crossing this point and how we treat models that do so in this paper. Discontinuity in the number of propagating degrees of freedom: Horndeski scalar-tensor models of dark energy generically propagate two scalar degrees of freedom: one directly associated to dark energy and one to matter. Following the approach outlined in [27,129,131] and for concreteness modelling matter as a minimally coupled canonical scalar field ψ M with Lagrangian L = − 1 2 ∂ µ ψ M ∂ µ ψ M −V (ψ M ), working on a cosmological background and in unitary gauge we find these two independent degrees of freedom can be associated with δ ψ M and Φ (i.e. the scalar metric perturbation of the ii component of the metric). While crossing α B = 2 in the evolution, the following constraint relating these two degrees of freedom emerges where we have assumed that there is no cosmological running of the Planck mass, as is the case for the models considered in this paper. This relation shows that one propagating degree of freedom is eliminated at this point, so only one dynamical degree of freedom remains here. This is alarming, since the number of propagating degrees of freedom therefore changes as we evolve through α B = 2: it is two on either side, but only one remains on the divide itself. This may be an artefact of using perturbation theory, but in any case indicates a potential ill-definedness in the evolution across α B = 2. Giving a definitive answer may require a non-perturbative analysis, which allows to follow the dynamics of the real degree of freedom (and not an approximated version of it), and may allow us to exclude the dangerous situation of being in a strongly coupled 11 Original: https://gitlab.com/carlosggarcia/ horndeski-priors.
This project: https://gitlab.com/ dinatraykova/horndeski-priors  FIG. 13. We show the relative deviation of the CMB temperaturetemperature (top panel) and the matter (bottom panel) power spectra w.r.t. a fiducial model for different values of α B (a = 1) (the corresponding color for each value is shown on the color bar). We chose to pick models with both α B < 2 and α B > 2, to see if some kind of discontinuity could be detected. We notice that the spectra seems continuous and smooth at this point.

regime.
Discontinuity in the evolution equations: For general α B , one can straightforwardly derive the (coupled) evolution equations for δ ψ M and Φ. Using these and taking the limit as α B → 2, one recovers the constraint, Eq. (A1), from the equation of motion for δ ψ M and can then use this constraint to solve for δ ψ M , arriving at a single second order evolution equation for Φ. This reads where we have again assumed that there is no cosmological running of the Planck mass and will also assume that the speed of gravitational waves is precisely the speed of light in what follows -both assumptions are met for the models considered in this paper and violating them would complicate the expressions shown here, although not the qualitative conclusions of this appendix. Now suppose we instead first set α B = 2 in the full quadratic action and then derive the residual evolution equation for the remaining degree of freedom. Again we recover the constraint, Eq. (A1), this time from a Lagrange multiplier in the quadratic action. However, the evolution equation for Φ now instead reads This is identical to Eq. (A2), except for the addition of the last term. While the last term is suppressed with respect to the second last in the sub-horizon limit, this nevertheless again hints at evolutions crossing the α B = 2 point being ill-defined, since there does not seem to be a uniquely defined evolution across this point. However, carrying out the analogous calculation in Newtonian gauge gives the same equations up to terms proportional to α B , which also vanish in this limit. This suggests that the above-mentioned discontinuity in the evolution equations might be a gauge artefact [130].
Summarising, both these issues are alarming and should be investigated in more detail. However, a definitive answer can be given only after further investigation, and this is beyond the scope of this paper. Despite the above issues, it is important to notice -as shown in Fig. 13 -that the CMB and matter spectra do not show any discontinuity when crossing α B = 2. This shows that it is possible to solve this system in such a way that they do not show any observable discontinuity at α B = 2. This corresponds to the second case considered above, Eq. (A3). In addition, given that the majority of the models considered and consistent with current observational constraints never crosses α B = 2, a hard bound at this point should not affect our results qualitatively. While these observations do not resolve the above issues as such, they are nevertheless encouraging and suggest that they may be resolved without invalidating other parts of the analysis. For this reason we here put these issues to one side and do evolve across α B = 2 in this way.