Testing one-loop galaxy bias: cosmological constraints from the power spectrum

We investigate the impact of different assumptions in the modeling of one-loop galaxy bias on the recovery of cosmological parameters, as a follow up of the analysis done in the first paper of the series at fixed cosmology. We use three different synthetic galaxy samples whose clustering properties match the ones of the CMASS and LOWZ catalogues of BOSS and the SDSS Main Galaxy Sample. We investigate the relevance of allowing for either short range non-locality or scale-dependent stochasticity by fitting the real-space galaxy auto power spectrum or the combination of galaxy-galaxy and galaxy-matter power spectrum. From a comparison among the goodness-of-fit ($\chi^2$), unbiasedness of cosmological parameters (FoB), and figure-of-merit (FoM), we find that a four-parameter model (linear, quadratic, cubic non-local bias, and constant shot-noise) with fixed quadratic tidal bias provides a robust modelling choice for the auto power spectrum of the three samples, up to $k_{\rm max}=0.3\,h\,\mathrm{Mpc}^{-1}$ and for an effective volume of $6\,h^{-3}\,\mathrm{Gpc}^3$. Instead, a joint analysis of the two observables fails at larger scales, and a model extension with either higher derivatives or scale-dependent shot-noise is necessary to reach a similar $k_{\rm max}$, with the latter providing the most stable results. These findings are obtained with three, either hybrid or perturbative, prescriptions for the matter power spectrum, \texttt{RESPRESSO}, gRPT and EFT. In all cases, the inclusion of scale-dependent shot-noise increases the range of validity of the model in terms of FoB and $\chi^2$. Interestingly, these model extensions with additional free parameters do not necessarily lead to an increase in the maximally achievable FoM for the cosmological parameters $\left(h,\,\Omega_ch^2,\,A_s\right)$, which are generally consistent to those of the simpler model at smaller $k_{\rm max}$.


I. INTRODUCTION
Over the past decades galaxy redshift surveys have provided a wealth of information on the large-scale distribution of galaxies across the Universe. Clustering measurements of two-point statistics -the galaxy power spectrum P (k) and the two-point correlation function ξ(s) -from large data samples can indeed provide precise measurements about the underlying cosmological model [1][2][3][4].
The inference of cosmological parameters from largescale structure is made intrinsically more difficult by the realisation that galaxies are a biased tracer of the total matter density field [5][6][7][8][9][10][11]. This translates into the necessity of having an accurate description of the relationship between the galaxy and matter density fields, a phenomenon which is commonly referred to as galaxy bias. Even though the latter can be partially understood in a phenomenological way, e.g. using results from Nbody simulations, the complexity of dealing with tracers featuring different morphological properties makes desirable to develop an analytic formulation that is based on a more theoretical background. From this point of view, * pezzotta@mpe.mpg.de perturbative approaches stand as a natural way of describing galaxy bias in a physically motivated way.
The main idea behind this formulation is that the galaxy overdensity δ g can be expressed in terms of a series of operators involving spatial derivatives of the gravitational and velocity potentials. At leading order, this relation is captured by a single multiplicative factor, i.e. δ g = b 1 δ, where δ is the matter density contrast and b 1 is a multiplicative factor called linear bias [6]. Higher-order contributions become progressively more important on mildly non-linear scales, as expected from a sphericallysymmetric gravitational collapse [12,13], in a way that the expression for δ g can be expanded to include higher powers of δ, i.e. δ g = n b n /n! δ n [6,14]. At the same time it has been shown that anisotropies in the process of gravitational collapse are responsible for the generation of non-negligible tidal effects, which also contribute to the local distribution of galaxies [15,16]. This finding followed the realization that the local-in-matter-density bias model had limitations in providing a proper description of the clustering of dark matter halos [17,18] and was leading to incompatible constraints on the quadratic bias b 2 from measurements of the power spectrum and bispectrum [19,20].
The most important aspect of the previously described model is that galaxy bias is treated as a spatially local quantity. However, it is well known that the formation of halos and galaxies is triggered by the gravitational collapse of matter from a spatially finite region, and therefore the local assumption is bound to fail when approaching scales that roughly correspond to the Lagrangian radius R of the host halos. In terms of δ g , this effect can be taken into account by considering not only its dependency on density and tidal fields, but also on functionals of δ [11]. This is equivalent to introducing higher derivatives of the matter density field, that at leading order provide a contribution to δ g of the form R 2 ∇ 2 δ [21][22][23]. A further ingredient to the galaxy-matter relation is represented by stochastic terms, that on large scales behave as an additional contribution to Poissonian shot-noise [24]. Stochasticity is the direct result of small-scale perturbations, which are not correlated over long distances under the assumption of Gaussian initial conditions [25][26][27]. At higher wave modes the halo-exclusion effect [12,28,29] imprints a scale dependence on the stochastic contributions, whose strength is controlled by the Lagrangian radius [30], similarly to higher derivatives. At next-toleading order, this contribution scales as k 2 , and might become relevant even for clustering analyses based on two-point statistics, as shown in [31].
Galaxy bias models based on the one-loop perturbative expansion have been used to extract cosmological constraints from big data collaborations, such as BOSS, using clustering measurements both in configuration [32] and in Fourier space [33][34][35]. More recently, the same data have been re-analysed in a number of works [36][37][38] with novel techniques that nevertheless assume the same biasing scheme for the galaxy-matter relationship. The majority of these analyses assume a different ansatz in terms of the degrees of freedom of the model, by fixing some of the free bias parameters to some physically motivated relations. For example [32] [35] and [38] fix the cubic non-local bias parameter to the Local Lagrangian (LL) relation, while [34] also fix the quadratic non-local parameter. On the other side, [36] set the cubic term to zero and leave the quadratic tidal bias completely free. This paper is part of a set of works [31] [39] in which we explore the impact of different bias modeling choices using a set of three performance metrics, namely the goodness-of-fit, the unbiasedness of sampling parameters, and the merit of the model. To do that, we make use of a set of three different simulated galaxy samples with an effective volume of 6 h −3 Gpc 3 , whose clustering properties and number densities reproduce the ones of three real catalogues, the CMASS and LOWZ samples of BOSS [40][41][42], and the Main Galaxy Sample of SDSS [43]. In order to exclusively concentrate on the modeling of oneloop galaxy bias, we carry out this analysis in real-space removing the impact of redshift-space distortions, which would require an additional modeling layer. In [31] we performed a fixed cosmology analysis by using the measured non-linear matter power spectrum. In that case, we used the linear bias parameter as a proxy for the goodness of our model. Here, we additionally model the impact of matter non-linear evolution and sample over cosmological parameters to determine the impact of one-loop galaxy bias on the recovery of such parameters. In the first part of the paper we use the hybrid perturbative-simulated approach, RESPRESSO, while comparing results for different PT-based dark matter models in a later section.
Our paper is organised as follows. In Section II we summarise the main ingredients of our galaxy bias model including a description of all the terms contributing at one-loop in perturbation theory. This includes also a short review of the three non-linear matter predictions we employ in this work. In Section III we describe the simulated galaxy samples we use to test one-loop galaxy bias, along with fitting procedure, parameter priors and the performance metrics we introduced above. Main results from fit of the auto galaxy power spectrum and the combination between the latter and the galaxy-matter cross power spectrum are given in Section IV. We finally draw our conclusions in Section V.

II. ONE-LOOP PERTURBATION THEORY FOR BIASED TRACERS
The theory describing the evolution of the clustering of biased tracers on mildly non-linear scales is a well established topic (for a review of galaxy bias see [11]) that can be naturally described in the framework of perturbation theory. This section stands as an overview of the most important results obtained at one-loop in standard perturbation theory (SPT) and derived approaches. Notice that for sake of readability we omit all the dependences on redshift z.

A. Galaxy bias expansion
The general perturbative expansion of galaxy bias can be interpreted as a sum of different operators that are functions of the gravitational potential Φ and velocity potential Φ v . Focusing on one-loop contributions, the relationship between biased tracers and the underlying dark matter density field can be described considering terms up to third order in the matter perturbation δ. Following the notation of [44] we can write where the first two terms on the right-hand side are part of the standard local expansion in powers of δ.
In the previous equation G 2 is a Galilean invariant operator, representing the tidal stress tensor generated by the velocity potential Φ v , and it is given by In Fourier space 1 this translates into where θ is the divergence of the matter velocity field, such that Differently from the first two terms of Equation (1), terms involving G 2 incorporate non-local (in matter density δ) contributions that spontaneously arise from the non-linear evolution of the matter density field. The second of these terms is obtained by expressing the nonlinear velocity potential up to second order (i.e. Φ v = Φ v ), and keeping the next-to-leading order correction [45], leading to where ∇ 2 ϕ 1 = −θ is the linear velocity divergence field, and ∇ 2 ϕ 2 = −G 2 (ϕ 1 ) is the next-to-leading order. The net result of adding this higher-order correction is that δ g collects contributions up to third order in the matter perturbation δ.
In addition to the standard expansion, any biased tracer also comes with a physical scale which regulates the importance of higher-derivative operators [8,21,23,27,46], whose leading order scales as ∇ 2 δ. This scale quantifies the size of the region in which galaxy formation occurs, and it is therefore influenced by any short-range gravitational effect and baryonic corrections. For halos, this scale is close to the Lagrangian radius [21,47], while it differs for other kinds of tracers, such as galaxies and quasars, depending on their type.
An important aspect of the previously described biasing scheme is the renormalization of the parameters on which it is based. As a matter of fact, the bare bias parameters listed in Equation (1) are sensitive to the UV cutoff scale used to make one-loop integrals convergent. In particular, the current basis carries a dependence on the variance of the matter density field σ 2 = δ 2 (x) , and more generally to any higher-order correlator δ (n) . This dependence is completely non-physical and can be reabsorbed by means of an adequate renormalization of the bias parameters [48,49]. The new basis (for convention this is denoted without the overscript hat) can be identified using the peak-background split formalism [50], and it is constructed in a way that its components quantify the response of the cosmic mean abundance of tracers to a change in the background density, with no dependence on the one-loop cutoff scale.
Consistently with the naming convention adopted in the context of renormalized perturbation theory [51], these quantities are defined as the ensemble averaged derivatives of δ g with respect to δ L , and, being observables themselves, they are already normalized by construction. For this reason, the multi-point propagators can be identified with the scale-dependent bias parameters in a way that where the H n are the Wiener-Hermite functionals [44,52], and the operator ⊗ is defined as Using this approach, we can define our observables in a fully consistent framework. In this paper we are interested in the galaxy auto power spectrum P gg and galaxymatter cross power spectrum P gm , which are defined as the auto and cross-correlation of the two density fields δ g and δ m , as Substituting Equation (1) in the previous set, we get to the full expressions for the galaxy auto and cross power spectra at one-loop, where P L is the linear matter power spectrum, P mm is the non-linear matter power spectrum, and the one-loop bias corrections read Here F 2 and G 2 are the symmetrised second-order modecoupling kernels [52], and S(k 1 , k 2 ) = k 1 ·k 2 2 − 1 is the Fourier transform of the kernel describing G 2 (Φ v ), as shown in Equation (3). The only two propagator-like contributions are perfectly degenerate with each other, and follow the relation For this reason, in a real analysis, it is common practice to either neglect one of the two tidal field related parameters (e.g. [36]) or to assume perfectly local-inmatter-density relations (see Section II D) to express one of them in terms of lower order local bias parameters (e.g. [32,53]). Since the P b2b2 contribution does not asymptote to 0 in the large-scale limit, we renormalise it as in Equation (14), and absorb the constant low-k amplitude as an additional contribution to the shot noise error, that will be discussed in Section II C.

B. Matter modeling
In this section we discuss the modeling options for the non-linear matter power spectrum P mm in Equations (10) and (11). As a matter of fact an accurate modeling of P mm is essential, as any systematic effect in the description of the matter density field in the range of scales we are considering might lead to invalid interpretations of the galaxy-matter bias relationship.
Differently from [31], where one-loop bias was investigated at fixed cosmology and adopting the measured matter power spectrum as reference, here the main goal is to assess the level of accuracy of our model in terms of cosmological parameters. For this reason, we have to explicitly assume a model for P mm . In the rest of this section we provide a description of the three models we are going to test. In particular we use 1) a refined RPTderived model, based on the preservation of Galilean invariance (dubbed gRPT), 2) an EFT-like approach based on BAO damping and a non-trivial stress tensor, and 3) a mixed approach, RESPRESSO, based on accurate Nbody simulations and a perturbative expansion around the fiducial cosmology.

Standard perturbation theory
The basic assumption of SPT is that dark matter behaves as a perfect pressureless fluid on large enough scales, where matter is not subject to shell crossing as in multi-streaming regions. Under this assumption, and after having expanded the matter density contrast in a Taylor series, i.e. δ = δ (1) + δ (2) + δ (3) + . . ., we find solutions at every order in perturbation as [52] where F n is the n-th order symmetrized kernel describing the non-linear mode coupling between fluctuations at different wavelengths. Moving to two-point statistics, the expansion for the matter power spectrum can be written as where at one-loop the only non-vanishing contributions are It is now well established that a SPT approach like the one described above lead to significant residuals if compared to the output of numerical simulations, even including higher-order corrections [54]. The source of this inaccuracy can be mostly identified in two separate effects, whose description is the subject of the next two sections.
2. BAO damping from large-scale "infrared" modes One of the most acknowledged deviations between oneloop SPT predictions and the matter power spectrum measured from numerical simulations is the shape of the BAO features. Since the characteristic scale of the BAO peak is much larger than the scale at which non-linear contributions become important, one may expect a standard perturbative approach to provide accurate predictions on that scale. However, large-scale bulk motions produce a non-negligible effect on the amplitude of the power spectrum at the BAO scale, the most significant of which is a smearing of the BAO signal due to the largescale relative displacement field [55][56][57][58][59].
These corrections to the matter power spectrum were firstly resummed in the context of RPT [55], and at leading order the net effect is to apply a damping factor to the BAO wiggles. Practically, we can express the matter power spectrum as the sum of a smooth (P nw ) and wiggly (P w ) term [60], The smooth-wiggle split can be realised adopting several different recipes, and throughout this paper we follow the approach described in [61] and [62], where the smooth component is defined as a rescaling of the featureless spectrum firstly defined in [63] to account for broadband difference with the linear power spectrum. At leading order, the damping factor can be calculated assuming the Zel'dovich approximation, and subsequently applied to the wiggly component, so that where is the relative displacement field two-point function at the BAO scale [56]. Here j n is the n-th order spherical Bessel function, k BAO = π/l BAO is the wavemode corresponding to the reference BAO scale l BAO = 110 h −1 Mpc, and k S is the UV limit of integration. To properly account for the resummation of IR modes at any given scale k, one should integrate over all modes q < k, in a way that k S = k S (k). However, pushing the integration to significantly large values of k S would result in the breaking of the range of validity of the pertubative IR expansion. At the same time, it can be shown that the integrand of Equation (24) gives significant contributions only up to k S ∼ 0.2 h Mpc −1 and therefore we restrict the integration to the range [0, 0.2] when computing the value of Σ [36,64].
At next-to-leading order, the IR-resummed matter power spectrum can be written as [64] P NLO (k) = P nw (k) + 1 + k 2 Σ 2 e −k 2 Σ 2 P w (k) where P 1-loop nw is the one-loop matter correction defined in Equation (21) but evaluated using the smooth component P nw rather than the full linear power spectrum P L , and P 1-loop w = P 1-loop − P 1-loop nw .

Small-scale corrections: non-trivial stress tensor
Along with the resummation of infrared modes, we also have to consider the impact of small-scale physics on long wavelength fluctuations. This happens because the original assumption of a perfectly pressureless fluid is bound to fail on non-linear scales, where dark matter experiences shell-crossing in multistreaming regions [52]. Moreover, the effect of baryonic physics, such as galaxy formation, cooling and feedback, also contributes in generating a baryonic pressure that impacts the clustering of dark matter on larger scales.
The net effect of UV scales on dark matter clustering is to generate a non-zero stress tensor [65] whose leading contribution to the matter power spectrum is to add a counter-term of the form [65][66][67] P ctr (k) = −2 c 1 k 2 P L (k).
Here c 1 can be treated as an effective speed of sound, that reflects the influence of short wavelength perturbations, and in particular of the complex physics beyond galaxy formation. Given the poor knowledge about these types of processes, a standard assumption is to treat c 1 as a free parameter (see e.g. [68,69]) and marginalise over it to obtain the posterior distribution of the parameters of interest. By inspection of the individual terms contributing to one-loop formulas in Equations (10) and (11), we can notice that the counter-term defined in Equation (26) is completely degenerate with the leading order higherderivative contribution defined in Section II A, as they both scale as k 2 P L (k). In principle we may remove this degeneracy when jointly fitting the auto and cross galaxy power spectra, as the non-vanishing stress tensor affects only the one-loop matter power spectrum P mm . This results in different imprints on P gg and P gm , In practice, the sensitivity to UV modes might affect in different ways P gg and P gm , leading to inconsistent values of c 1 between the two observables. For this reason in the rest of the paper we will employ two independent free parameters β P and β × P characterising the k 2 P L (k) contributions coming from P gg and P gm , respectively.

Modeling of the non-linear matter power spectrum
In this section we briefly summarise the three different prescriptions we adopt to model the non-linear matter power spectrum P mm throughout the rest of the paper.
The first of such models is based on a perturbativesimulated mixed approach, which revolves around a fiducial high-resolution measurement of the non-linear matter power spectrum from N-body simulations, and a twoloop perturbative expansion in the cosmological parameter space for the response function [70,71]. The latter quantifies the variation of the non-linear power spectrum at scale k induced by a variation of the linear power spectrum at scale q, namely Under the assumption of having a reliable measurement of P mm for a fiducial cosmology θ fid , it is possible to predict the same observable at a generic position θ as The range of validity of this mixed approach becomes progressively less accurate for cosmologies that are far way from θ fid , but this issue can be overcome by employing a multi-step reconstruction starting from the fiducial cosmology.
The RESPRESSO public package [72] makes use of this approach, starting from a fiducial measurements of P mm from a set of high-resolution N-body simulations with the Planck 2015 cosmology [73].
The second model we consider is based on the Effective Field Theory of Large Scale Structure [66,74], and it is close to what was recently used in the full shape analysis of the BOSS DR12 galaxy power spectrum [36]. At oneloop in real-space, this model is based on SPT results, but it also accounts for the effect of IR and UV modes on the evolution of the matter power spectrum, as described in the previous two sections. Namely, we can write a simple expression for the one-loop matter power spectrum, that reads where P NLO (k) and P ctr (k) are defined in Equations (25) and (26), respectively. The third model we consider is based on a particular flavour of Renormalised Perturbation Theory (RPT) [51,55]. In this kind of approach, the non-linear matter power spectrum is typically separated into a component that evolves the initial density contrast independently at each wavelength, called propagator G(k), and a mode-coupling term that accounts for the mixing of scales due to nonlinear evolution, so that we can write In a RPT-based approach (e.g. [75,76]), the propagator is resummed while keeping the mode-coupling term at a fixed order, leading to a breaking of the Galilean invariance (GI) of equal-time correlators [77]. This translates into an unphysical damping of the broadband power, which becomes mostly significant in the UV regime. The RPT flavour we consider here, known as gRPT, effectively attempts to resum the mode-coupling term in a way that is consistent with the resummation of the propagator (see equation 17 of [31] for the explicit formula). At the same time, this approach naturally incorporates IR resummation consistently with what is described in Section II B 3. We notice that, although even in this case the impact of the non-zero stress tensor should be taken into account with the addition of an effective speed of sound, the broadband predicted by gRPT is slightly suppressed with respect to SPT predictions. This would in principle lead to an even smaller UV counter-term, and for this reason we fix c 1 = 0 when modeling P mm with gRPT.
Being partially calibrated on numerical measurements, RESPRESSO provides much more accurate results on nonlinear scales, as it intrinsically incorporates higher-order corrections with respect to the previous two models, which include only one-loop contributions to the matter density field. For this reason, in the first part of this paper we fix the description of dark matter non-linear evolution following this approach, and check the impact of using different matter models only in Section IV D. This choice is also motivated by the analysis performed in [31], that showed how RESPRESSO is the model that most closely reproduces the performances of the true matter power spectrum measured from simulations.

C. Stochasticity
One ingredient that is still missing from Equation (10) is the stochastic contribution to the galaxy power spectrum. As a matter of fact, the previous relations are completely deterministic, and assume a one-to-one correspondence between the distribution of galaxies and the combined effect of the matter density and tidal fields. However, galaxy formation is determined not only by large-scales perturbations, but also by short wavelength modes. Under the assumption of Gaussian initial conditions, these modes are completely uncorrelated from large-scales fluctuations, and give birth to an additional stochastic field ε g which is dependent on the local distribution of matter.
In practice, we can define the stochastic contribution P εgεg to the power spectrum as [11] and add this contribution to the one-loop galaxy power spectrum in Equation (10). As for the modeling of P εgεg , we notice that the relation between the distribution of galaxies and the high-k modes is not exactly local, as it depends on the distribution of matter within a small finite region, similarly to the higher-derivatives of the galaxy density field. For this reason we can write where n is the mean number density of galaxies in the considered volume. The constant contribution N 0 represents deviations from purely Poissonian shot-noise (1/n), while higher-order corrections, of which the leading term scales as k 2 , are generated to account for the short-range non-locality described above. N 0 is expected from the halo exclusion effect [28][29][30] for which two different dark matter halos cannot overlap (the same principle is what drives the effective matter pressure in multi-streaming regions). This implies a deviation from Poissonian shot-noise that can be either positive (super-Poisson) or negative (sub-Poisson), depending on the considered tracer. Sub-Poissonian shot-noise is more expected for central galaxies of massive halos, while super-Poissonian values are more typical of galaxy populations with high satellite fractions [30,78].
Notice that assuming Gaussian initial conditions all cross-correlators of the form ε g δ are null by construction, and therefore Equation (33) is the only stochastic contribution to the galaxy auto power spectrum P gg . In reality, non-linear gravitational evolution introduces a degree of correlation between long and short wavelengths, so that ε g δ = 0 at later times, but all of these contributions are subdominant in the case of the power spectrum, and can thus be neglected.
As anticipated in Section II A, the quadratic term P b2b2 needs to be renormalized in order to provide a null contribution in the low-k limit. For this reason, we subtract from Equation (14) its large-scale asymptote [21,48], defined by and reabsorb it into the constant shot-noise parameter N 0 . Along with P gg it can be shown that also P gm requires an additional stochastic component. In this case, stochasticity is not sourced by ε g , as once again all correlators of the form ε g δ vanish, but rather by the matter density field itself via a new stochastic field ε m . The reason of this is that dark matter ceases to behave as an ideal pressureless fluid on small-scales, where the dynamics of gravitational collapse is subject to shell crossing. This translates into an effective pressure exerted by the matter density field, whose contribution to the galaxymatter cross power spectrum scales as k 2 in the low-k limit [79]. For this reason, it follows [11] where D. Co-evolution relations Although the previously described galaxy bias expansion is complete at one-loop, the large number of free bias parameters makes its applicability to real datasets difficult, particularly when using the information contained only in the two-point statistics (using additional constraints from e.g. the galaxy bispectrum can break some of the degeneracies between parameters). For this reason it is common practice to adopt empirical relations among bias parameters in order to reduce the degrees of freedom of our model.
The most natural expressions are the so-called local Lagrangian relations [21,[80][81][82]. The latter are based on the assumption that galaxies (or more generally any biased tracer of the matter density field) are formed instantaneously at an infinite past time, that galaxy formation is driven exclusively by the local matter density field, and that the number of tracers is conserved after their formation. Under these assumptions, it is possible to describe the subsequent evolution of the galaxy density field under the effect of gravity, which leads to the appearance of higher-order non-local operators even in the presence of a purely local relation at the time of formation.
It is possible to relax the Lagrangian local-in-matterdensity assumption, while still requiring the conservation of the total number of tracers. In this way, we can express the Eulerian non-local parameters γ 2 and γ 21 as a function of the corresponding Lagrangian counterparts, where the subscript L indicates Lagrangian quantities, and the remaining terms in the right-hand sides are the result of gravitational evolution [15,16,44]. Although local Lagrangian relations (i.e. γ 2,L = γ 21,L = 0) have been proven to be much more accurate than simply neglecting non-local terms, recent measurements of non-local bias parameters from a wide range of halo samples showed a slight deviation with respect to observations [83,84]. An alternative approach for the quadratic tidal parameter is based on the excursion set theory [85], for which γ 2 can be predicted as a function of b 1 using a quadratic fit, As shown in Figure 1 of [31], this relation is more accurate than the local Lagrangian approximation for tracers with b 1 1.3. Therefore this relation should apply much better to our datasets, which show a linear bias consistently larger than this value (see Table I). For this reason, we fix γ 2 to Equation (39) throughout the rest of the paper.

III. DATA AND METHODOLOGY
A. Simulated galaxy samples The robustness of our biasing scheme can be assessed only by validating the model over a wide range of tracers, featuring different host halo masses, galaxy bias, and redshifts. For this purpose, we make use of a set of three different synthetic galaxy samples, whose main properties are summarised in Table I. The three catalogues were generated by populating dark matter halos with galaxies using Halo Occupation Distribution (HOD) prescriptions. Since the HOD parameters were calibrated to obtain number densities consistent with the ones of pre-existing real observations, for sake of easiness we label our simulated catalogues with the name of the corresponding data samples. Nevertheless, we remind the reader that here we make use of the full comoving snapshot volume, without considering any selection effect, as the goal is to investigate the range of validity of 1-loop galaxy bias, leaving aside the impact of observational systematics.
The CMASS sample is based on the Minerva simulations [53], which consist in a set of 100 realizations of a (1500 h −1 Mpc) 3 cubic box with periodic boundary conditions. The simulations were run using the public Gadget code [86,87], that regulated the motion of 1000 3 dark matter particles within the aforementioned volume. Initial conditions were set up using the linear power spectrum obtained from CAMB [88] and displacing particles according to second-order Lagrangian Perturbation Theory (2LPT) [89]. The comoving snapshot has a redshift of z = 0.57 and it is meant to reproduce the properties of the BOSS CMASS galaxy sample [90].
The LOWZ (z = 0.342) and MGS (z = 0.132) samples are based on the Oriana and Carmen boxes of the LasDamas N-body simulations [91,92]. These are a set of 40 independent realizations with periodic boundary conditions, with a volume of (2400 h −1 Mpc) 3 and (1000 h −1 Mpc) 3 , and a mass resolution of 4.6 × 10 11 h −1 M and 4.9 × 10 10 h −1 M , respectively. Initial conditions were also set up using 2LPT, but in this case the initial power spectrum is computed using CMB-Fast [93]. The HOD parameters of these synthetic catalogues were selected to reproduce the properties of the BOSS LOWZ and SDSS Main Galaxy Sample (MGS) at M r < −21.
The complete set of cosmological parameters for Minerva and LasDamas is summarised in Table II.

B. Measurements of power spectra and their covariances
We make use of the estimator described in [94], based on fourth order particle assignment scheme and interlacing optimization to reduce the effect of large modes aliasing, to measure the galaxy auto power spectrum P gg and the galaxy-matter cross power spectrum P gm for all the tracers described in Table I. We adopt a linear k binning from k min = k f , where k f = 2π/L is the fundamental frequency of the box of size L, to k max = k Nyq , where k Nyq = πN grid /L is the Nyquist frequency corresponding to a FFT grid of size N grid . We use a linear binning with step ∆ k = k f for CMASS and LOWZ, and ∆ k = 2k f for MGS. Since we model the stochastic contribution to the galaxy auto power spectrum in terms of deviations from Poisson shot-noise, we correct our raw measurements of P gg by subtracting the constant factor P noise = 1/n.
Our final data vector consists in the ensemble average over the number of independent realizations N R (100 for CMASS, 40 for both LOWZ and MGS). First, we define the averaged observable where X ∈ {P gg , P gm }, i is the index running through the k binning, and the superscript n refers to the n-th realization of the considered observable. From this definition we estimate the auto-and cross-covariance matrices as Similarly to what was done in [31] we retain only the diagonal entries of the previously defined covariance matrices. This approximation is justified given the significant low number density of the synthetic samples we consider (actual values are listed in Table I). These numbers translate in a significant shot-noise contribution to the power spectrum, whose main effect is to boost the diagonal entries of the covariance matrix, leading to subdominant off-diagonal terms. Therefore we approximate our covariance matrices with a block-diagonal shape, and consider only the auto-and cross-correlation at the same wavelength (i.e. C ij = 0 for k i = k j ). Additionally, in order to reduce noise due to the limited number of independent realizations, we compare the raw covariance matrix to Gaussian predictions [53] for each k bin, and retain the maximum of the two values.
The error budgets of our galaxy samples are subsequently rescaled in order to match the same effective volume [95,96], defined as where n and V are the mean number density and volume of the considered sample, respectively, and we choose the reference scale to evaluate the effective volume as k = 0.1 h Mpc −1 . In this way the constraining power and the signal-to-noise ratio of the three galaxy catalogues are artificially set to roughly match the same amplitude. We choose to rescale the covariance of both CMASS and MGS to match the effective volume of the LOWZ sample ≈ 6 (Gpc/h) 3 . Table I shows the rescaling factor for the three samples, that is simply defined as the sample-to-LOWZ ratio between the respective effective volumes.

C. Fitting procedure and prior choices
We make use of the large number of independent realizations for each synthetic galaxy sample to define the overall likelihood function as the product of the individual N R likelihoods [31,97]. In this way, under the assumption that the individual likelihoods are well described by a multivariate Gaussian distribution, we can define the overall likelihood function as where X (n) i is the measurement from the n-th realization of the i-th k bin, µ i is the corresponding model prediction, and C ij is the rescaled covariance matrix as described in Sec. III B. With this definition, our likelihood ends up coinciding with the likelihood of the mean of the data with covariance C ij . In practice, there is a constant factor between the two definitions, that depends on the number of independent realizations N R and the number of data points N b . This factor is taken into account when deriving the goodness of fit for each of the tested configurations.
The inference of model parameters is carried out through a least-χ 2 analysis based on a standard Metropolis-Hastings MCMC algorithm. The likelihood incorporates a complete recipe for one-loop galaxy clustering and an interface to CAMB. For each tested configuration of the MCMC, we first run some preliminary chains that are needed to obtain a robust estimate of the parameter covariance, which is essential for the good convergence of the Markov chain in a highly dimensional parameter space. We iterate this process twice, each time specifying the parameter covariance obtained at the previous step, before running the final set of chains. These are terminated as soon as the chains satisfy a standard Gelman-Rubin convergence criterium, i.e. as soon as the between-chain and within-chain variances (we run 8 independent chains for each case, changing the initial random seed) are in agreement within R < 1.1 [98]. After removing the burn-in, these chains are combined into a single object, which we pass to the software package getdist [99] to extract the parameter posteriors.
The models described in Section II embed a large number of free parameters. In this analysis we vary both cosmological and nuisance terms (the latter include bias, shot-noise and matter counter-term). Since we are constraining two-point statistics only via galaxy clustering, we fix both the baryon density parameter Ω b h 2 and the primordial spectral index n s to their fiducial values (listed in Table II), and vary the CDM density parameter Ω c h 2 and the Hubble constant h. We add the primordial scalar amplitude A s , but only when fitting the combination of P gg and P gm , since the fit of the auto galaxy power spectrum alone cannot break the strong degeneracy between A s and b 1 . On the contrary, while both P gg and P gm have the same dependency on A s , they scale differently with the linear bias, i.e. P gg ∝ b 2 1 and P gm ∝ b 1 in the linear limit. We choose a flat prior for the three cosmological parameters that is large enough to fully contain their posterior distribution up to 2σ even for the least constraining configuration. III. Adopted priors for the full list of cosmological and nuisance parameters of the fits. We impose a uniform prior (U) for all the sampling parameters. The scalar amplitude As is not sampled over when fitting only Pgg but it is when adding the additional constraint from Pgm. In all cases the second-order tidal bias γ2 is fixed to the excursion set relation defined in Equation (39 SHOT-NOISE 10] The same choice of uniform priors is chosen for the nuisance parameters. We sample the combination of the bias parameters with the corresponding power of σ 8 in order to avoid strong degeneracies between the one-loop bias expansion and the intrinsic non-linearity of the matter power spectrum P mm , that at first order is well captured by σ 8 . As anticipated in Section II D, we fix γ 2 to Equation (39), and vary all other bias parameters freely. We quote higher-derivatives parameters with respect to an arbitrary fiducial scale k HD = 0.4 h Mpc −1 , while all the shot-noise parameters can be naturally expressed in units of 1/n. Since terms involving N 2 and N × 2 both carry a k 2 -dependency, we also express the latter in units of k HD .
The full list of priors for the model parameters is shown in Table III.

D. Rescaling of the input linear power spectrum
As discussed in Section III, Minerva and LasDamas adopt different Boltzmann solvers to obtain the power spectra for the initial particle displacements. In particular, the Minerva runs assume initial conditions based on CAMB, while the LasDamas simulations make use of the power spectrum computed with CMBFast. Figure 1 shows the fractional deviation between the output of the two codes at the reference cosmology of the LasDamas runs and at the redshift of the MGS sample. This difference is compared to LOWZ and MGS intrinsic signalto-noise ratio, represented by the two shaded areas. The two spectra deviate at the level of ∼ 0.5% in the range of scales that are relevant to this analysis, and this might represent an issue when modeling the galaxy-galaxy and galaxy-matter power spectra. For this reason, when fitting LOWZ and MGS, we employ a rescaling scheme based on the renormalization of the input power spectrum of CAMB by the ratio shown in Figure 1. This approximation is valid as long as the sampled cosmology is not far from the fiducial one, and this becomes a completely legitimate assumption when hitting exactly the true cosmology. However, we claim that even for cosmologies that are slightly off with respect to the fiducial one, this rescaling scheme is already better than directly using the CAMB prediction with no further correction.
In order to perform this rescaling, we first apply the units transformation h Mpc −1 → Mpc −1 . This is meant to assure that the power spectrum ratio is obtained at the same set of scales, avoiding the dependency on the Hubble parameter h [100]. At each step of the Markov chain, the current CAMB prediction is firstly transformed into Mpc −1 units, rescaled using the approach described above, and finally transformed back into h Mpc −1 units.

E. Performance metrics
The validation of one-loop galaxy bias as described in Section II is mostly based on its range of validity. In order to select the maximum mode at which our model stops providing good performances, we run multiple MCMC chains varying k max in the interval (0.1, 0.3) h Mpc −1 with a step of 0.025 h Mpc −1 , leading to 9 different fitting ranges for each configuration. We stop at k max = 0.3 h Mpc −1 as pushing the model to even smaller scales would inevitably require accounting for two-loop contributions in the description of galaxy bias and non-linear matter evolution. Clearly, exploiting the small-scale information contained in the non-linear regime naturally provides a better constraining power for the model parameters, but at the cost of introducing theory systematics in their posterior distribution. As described in [31] (see [62] for a similar approach), we make use of a set of three different performance metrics, whose goal is to provide a quantitative way to compare the posteriors extracted from our Markov chains, and to determine when a particular model no longer gives an accurate description of our datasets.

Figure of bias
The first quantity we are interested in evaluating is the ability of the model to provide unbiased measurements of its free parameters. In particular, in this work we focus mostly on the systematic error of the cosmological parameters: the Hubble constant h, the cold dark matter density Ω c h 2 , and, when considering the joint fit between P gg and P gm , the primordial scalar amplitude A s .
We define the Figure of Bias (FoB) of the considered model for a given parameter set θ as where θ represents the mean of the posterior distribution, θ fid represents the fiducial position, and S is the parameter covariance expressed in matrix form. In this way, we are simply quantifying the relative separation of the measured parameter from its true value in terms of the variance of the posterior distribution. If the FoB is evaluated only for one parameter, then the standard 68%−95% percentile thresholds correspond to a FoB of 1−2, respectively. On the contrary, when more than one parameter is considered to compute the FoB, the two percentiles can be calculated by direct integration of a multivariate Gaussian distribution with the corresponding dimensionality. For the case we consider n = 2 (n = 3) it follows that the 68% − 95% percentiles correspond to a FoB of 1.52 − 2.49 (1.88 − 2.83), respectively.

Goodness-of-fit
Along with finding unbiased constraints on the parameters of interest, we also ask our model to provide a good description of the observables we use in the fit. We quantify the goodness-of-fit in terms of the standard χ 2 extracted from the Markov chains (Equation (43)), but after rescaling it to account for the additional factor η introduced in the covariance of the data (see Section III B). With this rescaling, we can compare the recovered χ 2 to the 68% and 95% percentiles of a χ 2 distribution with N dof degrees of freedom, where Here N R is the number of independent realizations for each galaxy sample, N b is the total number of k bins included in the fit, and N p is the number of free model parameters.

Figure of merit
The final metric we employ is based on the merit of the considered model with respect to the cosmological parameters that are varied in the fit. We define the .
Here S(θ) ≡ S(θ)/θ 2 fid is the block corresponding to the parameter subset θ rescaled by their fiducial values. With this last normalization we are explicitly asking for a relative FoM rather than an absolute quantity, that would be more difficult to compare to the one of other parameters.

IV. TESTING ONE-LOOP GALAXY BIAS
In this section we analyse parameter posteriors extracted from several Markov chains, where we vary both the maximum fitting scale k max and the final expression we adopt to model one-loop galaxy power spectra. The complete list of model configurations is summarised in Table IV, along with the total number of free parameters in each case. Based on results from [31], we decide to employ RESPRESSO to model P mm in this section, and to leave the comparison between different matter models to Section IV D. In this way we can focus exclusively on assessing the validity of one-loop bias in real-space leaving aside both the impact of non-linear matter evolution and redshift-space distortions (that will be the topic of a future paper). For each configuration, we compute the three performance metrics defined in Section III E and show their trends as a function of k max . At the same time, we perform sanity checks on the linear bias b 1 σ 8 and the large-scale shot-noise parameter N 0 , for which we have fiducial values to compare with (see Table I). As anticipated in Section II D, throughout the rest of the paper we fix the second-order non-local bias parameter γ 2 to the excursion-set relation (Equation (39)), in order to break the strong degeneracy between the former and γ 21 . This choice is also based on results from [31], where it was shown how this choice led to stable and overall accurate measurement of the linear bias.
In the next two sections we analyse the posterior derived from fits of the auto galaxy power spectrum P gg alone, and from the combination of the auto and cross galaxy power spectra, P gg + P gm . In the latter case, the additional information contained in P gm allows to break the degeneracy between the linear bias and the amplitude of primordial fluctuations, allowing us to also sample over A s .
In both cases, we adopt a criterion for defining the range of validity of a given model, which is based on a combination of FoB and goodness-of-fit. Following [31], we define the model-breaking statistics as where the subscript percentages correspond to the percentiles of the corresponding distribution (FoB and χ 2 ).
We say that a given model breaks down at a scale k † when the model-breaking statistics assumes a critical value σ crit at k † , i.e.
We arbitrary choose the critical threshold σ crit = 1.5. In this way we would accept models with a maximum FoB one and a half times larger than the value corresponding to the 1-sigma of the FoB distribution, but only under a perfect recovery of the shape of the input dataset. Practically, this case is unrealistic at high k max , as the χ 2 progressively deviates from the number of degrees of freedom because of the weakening of the model. For this reason σ MB receives individual contributions from the FoB and the goodness-of-fit.
A. Validity of one-loop galaxy bias for the auto power spectrum In this section we analyse results from the fits to the auto power spectrum P gg . We fix the description of the non-linear matter power spectrum and test different extensions of one-loop galaxy bias. The standard (STD) model is based on two free cosmological parameters (h, Ω c h 2 ) plus four free nuisance parameters (b 1 σ 8 , b 2 σ 2 8 , γ 21 σ 3 8 , N 0 ). The k 2 -dependent shot-noise (k2N) and higher-derivatives (HD) extensions have one additional free parameter each, i.e. N 2 and β P , respectively.
In Figure 2 we show the three performance metrics extracted from the three different P gg measurements (CMASS, LOWZ and MGS), plotted against the maximum scale k max included in the fit. In this case we show the FoB and FoM corresponding to the combination h, Ω c h 2 ), i.e. to all the cosmological parameters that are varied in the model. The two grey shaded areas in the panels corresponding to FoB and goodness-of-fit mark the 68th and 95th percentiles of the corresponding quantity. As described in Section III E 1, the 68% − 95% FoB limits do not correspond to values of FoB = 1 and FoB = 2, but rather have to be computed integrating a multivariate Gaussian distribution with n = 2 number of dimensions.
For the three galaxy samples, we find that the STD model (blue line) alone is enough to provide a good description of the dataset in terms of goodness-of-fit, as the reduced χ 2 constantly stays within the 68% (dark grey shaded area) of the corresponding χ 2 distribution. At the same time, the combined constraint on the (h, Ω c h 2 ) pair is unbiased, showing a multivariate posterior distribution that is able to capture the fiducial position at typically ∼ 0.5 σ. Adding either an additional k 2 -dependent shot-noise (red line) or the leading higher derivatives correction (green line) does not significantly alter the two metrics, while reducing the amplitude of the corresponding FoM (∼ 10% when adding the stochastic contribution Performance metrics refer to the combinations of h and Ωch 2 , which are the only two free cosmological parameters of the model. The standard (STD), k 2 -dependent shot-noise (k2N), and higher derivatives (HD) models are colour-coded as shown in the legend. In all three cases, the non-linear matter power spectrum is modelled with RESPRESSO, and the second-order tidal bias γ2 is fixed to Equation (39). The two shaded grey regions mark the 68th and 95th percentiles of the FoB and χ 2 distribution. N 2 , ∼ 20 − 30% when adding the higher derivatives parameter β P ). Even for the STD model, for which we expect the model to fail earlier than the other two extensions) the model-breaking scale k † is higher than the maximum scale we probe (k max = 0.3 h Mpc −1 ). However, we decide to stop at this scale without exploring smaller scales, even if the model-breaking criterion were most likely satisfied at higher k max . As a matter of fact it would be hard to separate the effective goodness of the model from the impact of two-loop contributions to galaxy bias.
We notice how, similarly to what was obtained in [31] when focusing on the linear galaxy bias, there is a tendency of the FoM to flatten above an approximate scale of k max = 0.25 h Mpc −1 . This is an indication that pushing the non-linear model to increasingly higher wave modes does not necessarily imply a more stringent measurement of the cosmological parameters, as most of the additional constraining power is absorbed by nuisance parameters such as higher-order galaxy bias contributions.
In Figure 3 we show the dependence on k max of the linear bias parameter b 1 σ 8 and the constant stochastic con-tribution N 0 , for which we have fiducial values obtained exploiting the large-scale limit of the measured galaxy and matter power spectra. Although the extension to either the k2N or the HD model enlarges the size of the errorbars, we notice how the former model is the only one capable of providing a simultaneous unbiased measurements of the two parameters, for the three galaxy samples we are considering. In particular the LOWZ sample shows a clear detection of N 2 , that, if not accounted for, can lead to a significant (> 2-σ) systematic effect on N 0 above k max ∼ 0.25 h Mpc −1 , while partially recovering the true amplitude of P gg with an underestimation of b 1 σ 8 . The only exception is represented by the MGS sample, whose linear bias is constantly overestimated by a ∼ 2% factor for all the three modeling assumptions, and that is consistent with the fiducial value only up to 2-σ.  Table I.

B. Consistency between auto and cross power spectra
In this section we focus on a much more stringent test, corresponding to the simultaneous fit of both the galaxy auto power spectrum P gg and the galaxy-matter cross power spectrum P gm . As a matter of fact, a good accuracy for this combined statistics might become crucial in analyses that aim to exploit the whole information contained in galaxy clustering and galaxy-galaxy weak lensing, as most of the upcoming large observational projects are going to do (3 × 2-point analysis, see e.g. [101][102][103]).
The different scaling of P gg and P gm on the linear bias b 1 leads to the breaking of the strong b 1 − A s degeneracy. Therefore, we additionally sample the scalar amplitude A s , effectively extending by one the number of degrees of freedom of our model. Also in this case, we concentrate on the one-loop biasing scheme, fixing the description of the matter clustering to the output of RESPRESSO. We test the standard (STD) model against the k 2 -dependent shot-noise (k2N) and higher derivatives (HD) templates, for which we introduce two more degrees of freedom (one free parameter for P gg and P gm , each). In summary, the STD model is based on three free cosmological parameters (h , Ω c h 2 , A s ) plus four free nuisance parameters (b 1 σ 8 , b 2 σ 2 8 , γ 21 σ 3 8 , N 0 ). The k2N and HD models have two additional free parameters each, i.e. (N 2 , N × 2 ) and (β P , β × P ), respectively. Note that that the second-order tidal bias γ 2 is not a free parameter, but it is fixed to the excursion-set relation defined in Equation (39) for all the cases we consider.
In Figure 4 we show the three performance metrics in a similar way as we did for the P gg -only fits. In this case both FoM and FoB are computed from the combinations of the three cosmological parameters (h, Ω c h 2 , A s ). We first notice how the use of the STD model is no longer sufficient to provide an accurate recovery of the cosmological parameters on the overall range of scales we are considering. Indeed, the FoB of both CMASS and LOWZ quickly increases above 2-σ at an approximate scale of k max ∼ 0.23 h Mpc −1 and keeps getting worse at higher wave modes. The same trend is observed in panels referring to the goodness-of-fit, as the reduced χ 2 starts to fall outside the 95% confidence region at approximately the same scale cut. According to the model-breaking criterion, the range of validity of the STD model is limited to scales below k ∼ 0.2 h Mpc −1 (this is reflected in the plot by the solid-to-dashed line transition).
Adding k 2 -dependent stochasticity increases both the accuracy in the recovery of the cosmological parameters and the overall likelihood between data vectors and bestfitting model. Indeed, the k2N model shows a FoB which is constantly within 1-σ for the three galaxy samples. Moreover, the goodness-of-fit is significantly improved with respect to the STD model, with the χ 2 measured at the highest k max still being consistent within 1-σ. The combination of FoB and goodness-of-fit can be observed via the model-breaking criterion on the FoM, for which The HD model also provides a better recovery (compared to STD) of the cosmological parameters, within 1-σ from the fiducial position, but it fails earlier than the k2N model to provide the required goodness-of-fit for both CMASS and LOWZ, hinting for a stronger necessity of adding k 2 -dependent stochasticity with respect to short-range non-localities.
In terms of FoM amplitude, the k2N and HD models behave in a mostly similar way, significantly reducing the statistical constraints of the cosmological parameters at any given k max cut. However, we notice how according to the model-breaking criterion both models allow to push the analysis to higher k max with respect to the STD case. In other words, a proper comparison of the merit of the three models should be done considering the individual scale k † after which the model breaks down. Considering the CMASS sample, the STD model provides a FoM consistent with the one obtained from the HD case, and ∼ 30% larger than the one of the k2N model. However, the latter two models have yet to reach the maximum scale k † corresponding to the breaking of the model. A simple exercise of extending the k2N model to slightly larger modes values shows that the latter turns out to be competitive against the STD model at a scale of k max ∼ 0.35 h Mpc −1 . As for LOWZ, the HD case is significantly less constraining than the STD model (∼ 50%), mostly because its range of validity does not run all the way up to k max = 0.30 h Mpc −1 but stops earlier. In contrast, the k2N model outperforms the standard model by a ∼ 10% factor, although also in this case the model breaking scale k † has yet to be reached.
A particular case is represented by MGS, that differently from the previous two samples shows a stable and accurate recovery of cosmological parameters all the way up to k max = 0.3 h Mpc −1 even with the STD model. The reason is most likely related to our choice of employing RESPRESSO to model the non-linear matter evolution. This ultimately provides an accurate description of P mm even at the low redshift of MGS, leaving only the galaxy-matter relation to be modelled. Since MGS features the lowest large-scale bias among the samples we consider, it is likely that the one-loop standard expan- sion is enough to provide both a good χ 2 and an accurate parameter recovery on the full range of scales up to k max = 0.3 h Mpc −1 .
Differently from the fit of the galaxy auto-power spectrum, in this case the FoM for the three different galaxy samples do not show a flattening on the range of scales we are considering. Checking the independent contributions coming from the individual cosmological parameters, we notice how the FoM does actually flatten for both h and A s , whereas it monotonically increases for Ω c h 2 . This is a hint showing that the matter density parameter still benefits of additional informations contained in the mildly non-linear regime, when combining the two galaxy power spectra.
In Figure 5 we show the dependence of the linear bias b 1 σ 8 and the constant shot-noise contribution on k max , similarly to what we showed in Figure 3 for P gg alone. The more stringent requirement of simultaneously fitting P gg and P gm makes the standard model fail in providing accurate measurements of both parameters. In particular, b 1 σ 8 starts to be biased at approximately the same scale for which the model also provides incorrect measurements of the cosmological parameters. Adding either k 2 -dependent shot noise terms or higher derivatives parameters help in recovering the true values of the bias, with a slight preference for the former. More generally the k2N model is the only one that is capable of simultaneously providing unbiased measurements of the cosmological parameters, the linear bias and the large-scale stochastic contribution to the galaxy power spectrum. This is also in agreement with the results obtained in [31] at fixed cosmology.

C. Constraints on stochasticity and higher-derivative parameters
After having focused on the marginalised posteriors of the cosmological parameters, linear bias and constant shot noise, here we try to put internal constraints on both the k 2 -dependent stochastic parameters and the higher derivatives parameters. In particular, we want to check whether these terms can be detected with statistical significance under the assumption of using a reference effective volume like the one described in Section III. Figure 6 shows marginalised constraints for the two sets of parameters (extracted from the P gg + P gm chains) as a function of k max . For consistency check, in the panels referring to parameters that enter the modeling of the galaxy auto power spectrum, N 2 and β P , we also show less stringent constraints extracted from the chains corresponding to the fit of P gg (shaded grey band).
The consistency of the k 2 -dependent shot-noise model is additionally reinforced by the trend observed in the top panels of Figure 6. As a matter of fact, the relation between N 2 /N × 2 and k max can be well described by a straight line with zero slope, this fact intrinsically hinting at a correct behaviour for both parameters, as their profiles remain stable when adding constraints at higher wave modes. Moreover, there is a clear detec- tion of the k 2 -dependent shot-noise parameters, with a statistical significance that further reinforce the impact of adding N 2 and N × 2 in terms of goodness-of-fit. Indeed N 2 and N × 2 are detected at k max = 0.3 h Mpc −1 with a statistical significance of 3.5-σ and 2.5-σ, respectively, for the CMASS sample. With LOWZ the detection becomes even larger, with numbers that are close to 6-σ and 4-σ, respectively. The MGS sample is the one showing the least significant detection of both parameters, with a possible modification of the recovered value of N × 2 at k max = 0.3 h Mpc −1 . However, we remind that the STD model performs surprisingly well on this dataset, and therefore the weaker detection of the two parameters is partially expected.
The higher derivatives parameters also show stable results as a function of k max , with the highest detection represented by β × P for LOWZ. However on these scales the HD model is already broken, as shown in Figure 4. For the other cases, the typical significance of the detection is set to ∼ 2-σ and ∼ 4-σ for β P and β × P , respectively. Differently from [31], we do not observe incompatible results between the marginalised posteriors of N 2 and β P from fits of P gg and P gg + P gm . However in this case we significantly extended the dimensionality of the parameter space leading to weaker constraints on the model parameters, so that the deviation may be hidden by the larger statistical noise.

D. Results for alternative matter models
So far we have fixed the modeling of the non-linear matter power spectrum using a hybrid approach such as RESPRESSO. Another way of proceeding is to employ a perturbative description not only for one-loop bias but also for P mm . In this section, we analyse the impact of a different modeling of P mm inside Equation (10) and (11), and for this purpose we make use of the two additional models described in Section II B 4.
We remind the reader that these models correspond to 1) an EFT-like model based on resummation of IR modes and non-trivial stress tensor that at leading order in real- space can be described by a single counterterm, and 2) a RPT-based approach with a tighter requirement of preserving the Galilean invariance of equal time correlators (gRPT). For the three different matter models, we show results for the STD model and for the extension including higher-order stochastic contributions, as in the previous sections we assessed how this extension is preferred from the three data samples we are considering. Differently from the other matter models, the EFT-based approach requires an additional free parameter, represented by the counterterm c 1 , that is completely degenerate with leading order higher derivatives terms. For this reason, in this case the STD model already includes a higher-derivative correction. The full list of configurations, together with free model parameters, is given in Table IV. Figure 7 shows the performance metrics for the three different dark matter models when fitting the combination of P gg and P gm . In this case, we show the individual contributions to FoM and FoB from the three cosmological parameters, h, Ω c h 2 and A s . In addition, for the sake of simplicity, we identify the model breaking scale k † with the end of the corresponding line in the FoM panels.
We first notice how the recovered goodness-of-fit is severely worsened when adopting gRPT to describe the non-linear matter power spectrum, particularly when considering the STD model. We separately tested the impact of extending this model to include higher derivatives, and while this extension slightly improves the range of validity of the gRPT-based model, it still fails in being accepted up to k max = 0.30 h Mpc −1 . The corresponding k2N model definitely improves the recovered trends but the net result is a worse performance respect to RESPRESSO and EFT. The STD model of EFT breaks at higher values of k max , and this is somewhat expected given the presence of the additional free parameter c 1 . Nevertheless in order to provide a χ 2 consistent within 95% confidence interval at all scales we still have to include additional stochastic terms.
The FoB is overall consistent among the three different parameters, showing a strong bias on the marginalised posterior (> 2-σ at k max > 0.25 h Mpc −1 ) when adopting the STD for either gRPT or RESPRESSO. Similarly to the goodness-of-fit, the STD case combined with EFT provides better results (consistent within 1-σ) since it already incorporates a k 2 P L (k) correction which is absent from the other matter models. The overall interpretation is that all three matter models strongly hint for the necessity of adding k 2 -dependent stochastic contributions, both to produce an accurate description of the joint data vector P gg + P gm and to provide unbiased measurements of the cosmological parameters.
As for the FoM, we notice how its dependency on k max is significantly different between the (h, A s ) pair and Ω c h 2 . For the former, the difference in performance between STD and extended models is tiny (∼ 10%), while, for the latter, adding k 2 -dependent stochasticity (or higher derivatives) results in the suppression of the FoM at a reference scale of k max = 0.2 h Mpc −1 by a factor of ∼ 50% and ∼ 25%, respectively.
Focusing on the maximum FoM achieved by each model in the range of scales we consider, we obtain different results. The usage of gRPT typically leads to the breaking of the model at lower wave modes, k max ∼ 0.2 h Mpc −1 , when using the STD model (slightly larger with the HD extension). Substituting gRPT with RESPRESSO helps in extending the range of validity of the HD model, as shown in Section IV B, while leaving almost untouched the model breaking scale of the STD case. However, this last configuration, i.e. RESPRESSO with the STD biasing scheme systematically results in one of the best performing models in terms of maximally achievable FoM. Adding k 2 -dependent stochastic terms to both matter models extends the range of validity up to k max = 0. After having assessed the impact of different modeling assumptions we can now check the statistical significance of adding the galaxy-matter cross power spectrum to a full shape measurement fit like the one we carried out in this analysis. Indeed, as we already pointed out in Section IV B 3×2-point analysis will be the source of several cosmological constraints in next-generation galaxy surveys. Although in this analysis we are excluding redshiftspace distortions, in order to separate the additional model systematics from the ones of one-loop galaxy bias, it is anyhow interesting to quantify the improvement in FoM when including constraints from P gm .
In Figure 8 we compare the marginalised posterior distribution of the cosmological parameters and linear bias obtained when fitting only P gg and the combination P gg + P gm . For this comparison we model P mm with RESPRESSO, include k 2 -dependent stochastic contributions, and include all scales up to k max = 0.3 h Mpc −1 . We remind that the comparison is not properly fair, as we fix the scalar amplitude A s in the fits of the auto power spectrum alone, while this is treated as a free parameter in the fits of the combined statistics.
The observed trend when including P gm in the fit is a significant reduction of the statistical uncertainty of all parameters. In particular we measure a 1-σ standard deviation for h and Ω c h 2 that is 1.4, 2, and 1.2 times smaller for CMASS, LOWZ and MGS, respectively. On the other hand, the linear bias absorbs most of the additional constraining power, leading to more precise measurements of b 1 σ 8 by a factor 3 for all three samples. FIG. 8. Comparison between the posterior parameter distribution (68% and 95% confidence intervals) obtained by fitting the galaxy auto power spectrum Pgg (blue) and the combination of galaxy-galaxy and galaxy matter power spectrum Pgg + Pgm (red). We show constraints obtained at kmax = 0.3 h Mpc −1 for the cosmological parameters h, Ωch 2 , As, and the linear bias b1σ8. In all cases we assume non-linear dark-matter evolution to be described by RESPRESSO. Dashed solid lines correspond to the fiducial values of the corresponding parameter. Grey solid bands mark the 1-σ and 2-σ error on the fiducial linear bias parameter.

V. CONCLUSIONS
In this work we performed a full-shape analysis of the galaxy power spectrum meant to assess the robustness of one-loop galaxy bias models, extending the investiga-tion carried out in [31] to include also the sampling over cosmological parameters. Since we deal with galaxy clustering in real space, we decided to fix both the baryon density Ω b h 2 and the spectral index n s while leaving as free parameters the cold dark matter density Ω c h 2 , the Hubble constant h, and the scalar amplitude A s . The latter is kept as a free parameter only when considering joint fits to the galaxy-galaxy and galaxy-matter power spectra.
We measured both observables from a set of three different synthetic galaxy samples, whose clustering properties are meant to reproduce the ones of three real data catalogues, i.e. CMASS and LOWZ from BOSS, and MGS from SDSS. We rescaled the statistical uncertainties of each of these samples to match an effective volume of 6 h −1 Gpc 3 (i.e. the one of the LOWZ sample), which is representative of the volume of tomographic bins from next-generation galaxy surveys (e.g. Euclid [104]). The analytical recipes we adopted to model these observables are based on a standard one-loop expansion of the galaxy density field on the matter density field, collecting terms related to both spherically-symmetric gravitational collapse and tidal fields up to third-order in perturbations. In the first part of the paper, we fixed the description of the non-linear matter power spectrum using a hybrid perturbative-simulated approach, i.e. RESPRESSO, which was already shown in [31] to provide the closest behaviour to the true measured matter power spectrum on the overall range of scales and redshifts that we consider. In a later section we explicitly tested the impact of different matter modeling assumptions, by employing an EFT-like model and a RPT-derived model (gRPT). In all the considered cases we fixed the quadratic tidal bias γ 2 using an excursion-set-derived relation, in order to break the strong degeneracy between the latter and the cubic non-local parameter γ 21 .
Our main interest was in understanding the range of validity of the standard one-loop expansion, and if our synthetic galaxy sample would hint for the necessity of adding additional terms to the standard recipe. For this, we considered two extensions of the standard model that take into account either higher-derivatives of the gravitational potential or scale-dependent stochasticity, as expected from short-range non-locality due to galaxy formation and the halo-exclusion effect, respectively.
In order to quantify the goodness of each model we make use of a set of three different performance metrics, which are 1) the Figure of Bias, to assess the level of bias introduced in the recovery of cosmological parameters, 2) the goodness-of-fit, to quantify the likelihood between the input data vector and the best fit model, and 3) the Figure of Merit, to compare the constraining power of each model at any given k max . All of these quantities are easily obtained by post-processing the MCMC chains that we run for each combination of matter modeling, one-loop bias extensions, and maximum mode k max .
Our results can be summarised as follows: (i) When using the fiducial description of the nonlinear matter power spectrum (i.e RESPRESSO), we find that a standard one-loop bias model (linear bias, quadratic bias, cubic non-local bias, constant shot-noise parameter) with fixed quadratic tidal bias can provide a good description of the galaxy power spectrum for all our samples up to k max = 0.3 h Mpc −1 while also returning unbiased value of the cosmological parameter set h, Ω c h 2 .
(ii) Similarly to the fixed cosmology analysis of [31] we notice that when we consider the additional information from the galaxy-matter cross power spectrum, the standard model is no longer capable of providing a good performance on the overall range of scales we consider for all the three samples. In particular, this model breaks at a scale of k max = 0.25 h Mpc −1 and k max = 0.2 h Mpc −1 for CMASS and LOWZ, respectively, while being accepted all the way up to k max = 0.3 h Mpc −1 for MGS. This might be representative of the level of non-linearities in the galaxy-matter relationship, which at first order can be captured by the parameter b 1 σ 8 (that is ∼ 1.52 for LOWZ, ∼ 1.26 for CMASS and ∼ 1.06 for MGS).
(iii) Extending the standard model to account also for the presence of either higher derivatives or scaledependent stochasticity provides a better performance both in terms of Figure of Bias (in this case for the parameter set (h, Ω c h 2 , A s )) and goodnessof-fit. Adding scale-dependent stochastic terms restores the range of validity of the model up to k max = 0.3 h Mpc −1 for both CMASS and LOWZ, while the importance of higher-derivatives is limited by a worse χ 2 for k max 0.2 h Mpc −1 , and slightly worse recovery of the linear bias parameter and the constant shot-noise correction. As an additional check, we verify that the marginalised posterior distribution of both k 2 -dependent terms and higher derivatives parameters is stable as a function of k max . However we notice how the maximally achievable FoM of the extended models is consistent with the one of the standard configuration at its lower k max , leading to equivalent statistical constraints on the cosmological parameters.
(iv) Changing the description of the non-linear matter power spectrum to either EFT or gRPT induces modifications to the range of validity of oneloop galaxy bias models. Overall, we find that, as highlighted in [31], RESPRESSO is the best performing model, followed by EFT and gRPT. The EFT-based model is penalised in terms of Figure  of Merit because of the presence of an additional degree of freedom (two in case of the combined fits), but it compensates this with an extended range of validity. On the contrary, gRPT breaks sooner than the other models, at a typical scale of k max ∼ 0.2 h Mpc −1 , which is consistent with the range of validity observed in direct fits of the matter power spectrum. This indicates that this model would also benefit of the presence of an additional free parameter representing a non-zero stress tensor such as in the case of the EFT-based model, as expected. Although the gRPT-based model breaks earlier than the other two models, its maximally achievable FoM is consistent with the one recovered using RESPRESSO.
(v) The additional constraints coming from the galaxymatter cross power spectrum results in an improved statistical precision on measurements of cosmological parameters. Although we fix the scalar amplitude A s when fitting only the galaxy auto power spectrum, we still find that in the combined case we achieve better constraints on both h and Ω c h 2 by a factor 1.4, 2 and 1.2 for CMASS, LOWZ and MGS, respectively. At the same time we notice how a significant fraction of the additional constraining power can be absorbed by nuisance parameters, such as the linear bias, for which constraints are tighter by a factor of 3 for all the three samples.
In this work we analyse three samples with an effective volume which is rescaled to a value of 6 h −3 Gpc 3 . Although the results we obtained are partially based on this choice, as shown in [31], this volume can be well representative of individual tomographic redshift bins that will be adopted by next-generation galaxy surveys. In addition we explore the additional constraining power brought by adding the galaxy-mass cross power spectrum to galaxy power spectrum fits, which is a relevant study in the context of 3 × 2pt analyses of next-generation imaging surveys. Therefore, these results can provide useful insights when adopting one-loop perturbation theory to describe the relationship between the galaxy and matter density fields in that context [103]. In order to completely concentrate on galaxy bias, we performed this analysis using real-space coordinates, avoiding a further modeling layer for redshift-space distortions. We leave to future works the exploration of this effect, both using the power spectrum alone, and combining it with the galaxy bispectrum.

ACKNOWLEDGMENTS
AP and MC acknowledge support from the Spanish Ministry of Science and Innovation through grants PGC2018-102021-B-100 and ESP2017-89838-C3-1-R, and EU grants LACEGAL 734374 and EWC 776247 with ERDF funds. IEEC is funded by the CERCA program of the Generalitat de Catalunya. AE acknowledges support from the European Research Council (grant num-ber ERC-StG-716532-PUNCA. AGS acknowledges support from the Excellence Cluster ORIGINS, which is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy -EXC-2094 -390783311 AP and AGS would like to thank Daniel Farrow, Jiamin Hou, Martha Lippich and Agne Semenaite for their help and useful discussions. The analysis presented here has been performed on the high-performance computing resources of the Max Planck Computing and Data Facility (MPCDF) in Garching. This research made use of matplotlib, a Python library for publication quality graphics [105].