Modeling Biased Tracers at the Field Level

In this paper we test the perturbative halo bias model at the field level. The advantage of this approach is that any analysis can be done without sample variance if the same initial conditions are used in simulations and perturbation theory calculations. We write the bias expansion in terms of modified bias operators in Eulerian space, designed such that the large bulk flows are automatically resummed and not treated perturbatively. Using these operators, the bias model accurately matches the Eulerian density of halos in N-body simulations. The mean-square model error is close to the Poisson shot noise for a wide range of halo masses and it is rather scale-independent, with scale-dependent corrections becoming relevant at the nonlinear scale. In contrast, for linear bias the mean-square model error can be higher than the Poisson prediction by factors of up to a few on large scales, and it becomes scale dependent already in the linear regime. We show that by weighting simulated halos by their mass, the mean-square error of the model can be further reduced by up to an order of magnitude, or by a factor of two when including $0.2$ dex mass scatter. We also test the Standard Eulerian bias model using the nonlinear matter field measured from simulations and show that it leads to a larger and more scale-dependent model error than the bias expansion based on perturbation theory. These results may be of particular relevance for cosmological inference methods that use a likelihood of the biased tracer at the field level, or for initial condition and BAO reconstruction that requires a precise estimate of the large-scale potential from the biased tracer density.

(Dated: November 28, 2018) In this paper we test the perturbative halo bias model at the field level. The advantage of this approach is that any analysis can be done without sample variance if the same initial conditions are used in simulations and perturbation theory calculations. We write the bias expansion in terms of modified bias operators in Eulerian space, designed such that the large bulk flows are automatically resummed and not treated perturbatively. Using these operators, the bias model accurately matches the Eulerian density of halos in N-body simulations. The mean-square model error is close to the Poisson shot noise for a wide range of halo masses and it is rather scale-independent, with scaledependent corrections becoming relevant at the nonlinear scale. In contrast, for linear bias the mean-square model error can be higher than the Poisson prediction by factors of up to a few on large scales, and it becomes scale dependent already in the linear regime. We show that by weighting simulated halos by their mass, the mean-square error of the model can be further reduced by up to an order of magnitude, or by a factor of two when including 0.2 dex mass scatter. We also test the Standard Eulerian bias model using the nonlinear matter field measured from simulations and show that it leads to a larger and more scale-dependent model error than the bias expansion based on perturbation theory. These results may be of particular relevance for cosmological inference methods that use a likelihood of the biased tracer at the field level, or for initial condition and BAO reconstruction that requires a precise estimate of the large-scale potential from the biased tracer density.
The main motivation for testing the bias expansion at the field level is that it is more stringent than a comparison of summary statistics: A model that predicts the simulated halo or galaxy density correctly for all pixels also predicts all summary statistics correctly; the reverse is generally not true. Also, overfitting, which can be a potential issue when fitting bias predictions for summary statistics, is not a concern, because at the field level all pixels or Fourier mode phases must be fitted, and the fit is not dominated by the Fourier modes with highest signal-to-sample-variance-noise.
A second, closely related motivation is to avoid sample variance. It is difficult to use correlation functions to accurately test the bias expansion, because the correlation functions are subject to sample variance in the simulation volume. That sample variance is not present when comparing prediction and simulation at the field level for the same initial conditions. This simplifies quantifying the accuracy and regime of validity of the bias expansion. Specifically, it enables simulations of moderate volumes with accurate mass and spatial resolution to characterize the bias expansion and its error with a precision corresponding to the cosmic variance of surveys that cover a much larger volume. The model error determined in this way can then inform cosmological analyses of galaxy survey data, for example by predicting the fiducial stochastic model error or shot noise to be included in the likelihood.
A third motivation is that a bias model that works at the field level could turn out useful for other applications. For example, it could serve as a forward model predicting the halo density given a linear initial density, which is one of the ingredients of cosmological inference methods that use a likelihood of the biased tracer at the field level [25][26][27][28][29][30][31][32][33]. Or it could help to optimize initial condition and BAO reconstruction, which requires a precise estimate of the large-scale potential field from the biased tracer density (see [34] and references therein, and [32,[35][36][37][38][39][40][41][42][43][44][45] for more recent developments).
Two major goals of our analysis are to check how well the bias expansion describes the simulated overdensity of dark matter halos, which we will refer to as "true" halo overtensity δ truth h , and to measure the amplitude and the scale dependence of the residual noise. These two questions are tightly related to each other. To illustrate this, let us arXiv:1811.10640v1 [astro-ph.CO] 26 Nov 2018 consider the simplest model with the linear bias b 1 where δ is the nonlinear dark matter field. The stochastic term in this formula must be present, since we do not expect that the relation between dark matter and halos is perfectly deterministic [9][10][11][12][13][14][15][16][17]. The best possible b 1 that describes the halo density field can be found by minimizing the mean-square difference |δ truth h − b 1 δ| 2 , leading to the usual formula If the fields δ truth h and δ share the same initial conditions, the measurement of b 1 (k) can be done without sample variance. Notice that the bias measured in this way is a function of k. One way to argue how well the linear bias model works is to ask up to which scales b 1 (k) is a constant. A significant scale dependence is a sign that higher order corrections must be included.
An equally relevant question is how big an error we make, using the best fit values for bias parameters (in our simple example, b 1 (k)). The power spectrum of this model error, or noise (sometimes also referred to as stochasticity [7,[18][19][20][21][22][23][24]), is for the linear bias model given by where in the last equality we have used Eq. (2). The naive expectation for the large-scale amplitude of P err is that it is close to Poisson noise 1/n ≡ V /N particles , which is the power spectrum obtained when distributing pointlike particles randomly in the simulation volume. However, the amplitude of the noise measured in simulations is larger than 1/n for low-mass halos, and smaller than 1/n for high-mass halos [7,23,24,46,47]. The noise can also have a significant scale dependence, even at relatively large scales. In some cases, the amplitude of the noise on mildly nonlinear scales can differ from the amplitude in the low-k limit even by tens of percent. Large amplitude and large scale dependence, if real, are dangerous, because they can significantly impact the inference of cosmological parameters.
One possible interpretation of these results is that the scale dependence of the noise is due to the higher order terms in the bias expansion. Indeed, in definition (1), the noise field contains operators constructed from matter fields that are not included in the model. Even though one may naively think that the higher order terms are irrelevant at large scales, as we are going to see they can significantly change the behavior of the noise even in the low-k limit. Therefore, a more appropriate relation between dark matter and halos on large scales is [1,33,48] where δ model h [δ] stands for the model based on perturbative bias expansion. 1 The success of the perturbative description can then be rephrased as the question of whether or not including higher orders in perturbation theory leads to a P err (k) that has an amplitude closer to the Poisson noise and no significant scale dependence up to the nonlinear scale. To test whether the noise of the perturbative bias models has these properties, we estimate as the field difference between the true halo density, obtained for example from an N-body simulation, and the perturbation theory prediction,ˆ This model error vanishes on average, ˆ = 0, and its power spectrum, [δ], the model errorˆ in Eqs. (5) and (6) is free from these higher order bias terms. It only contains other higher order bias terms, which are not included in the model, and stochastic noise terms. We are going to show that, as a consequence, the model error power spectrum becomes more flat and has an amplitude closer to the Poisson prediction. This is because the higher order bias operators not included in the model make only small k-dependent contributions to the model error, and k-dependent corrections to the stochastic noise become only relevant on small scales. We will discuss these points in more details throughout the paper.
One technical challenge in predicting the halo overdensity at the field level in perturbation theory is the treatment of large infrared (IR) displacements (bulk flows). Comparisons with simulations are naturally done in Eulerian space in the final conditions, but the large displacements are treated perturbatively in Standard Eulerian perturbation theory. This causes significant decorrelation of the predicted fields and simulations even on perturbative scales. To solve this problem we introduce a bias expansion using a new basis of Eulerian bias operators that fully include the Zel'dovich displacement. We call these operators shifted operators and define them as where q is a Lagrangian coordinate in the initial condition, ψ 1 is the Zel'dovich displacement field, and O(q) is any of the standard bias operators written in Lagrangian coordinates. We will show that the bias expansion defined in this way successfully describes the number density of halos in simulations, without the aforementioned decorrelation. Furthermore, we will show that the correlation functions of the shifted operators are closely related to the standard IR-resummed Eulerian counterparts, making a clear connection to the usual one-loop power spectrum for biased tracers.
Another important technical point is that the bias parameters obtained by minimizing the difference between the true halo density field and the model do not generally correspond to the physical (renormalized) bias parameters measured form the low-k limits of the n-point functions. The reason for this is that the shifted operatorsÕ i depend on small scales and they are not renormalized bias operators. We make the choice of not smoothing the density fields for two reasons. Firstly, our goal is not to merely measure the bias parameters, but to push the bias expansion to its limits and see how high in k we can in principle go, maintaining a good correlation with the halo density field. Secondly, we want to see how much of the halo density field on large scales can be explained by the Fourier modes that are not in the perturbative regime. The main advantage of our approach is that it leads to a lower model error than using the bias parameters defined and measured in the standard way. Given a fixed survey volume, this in principle leads to more powerful measurements of cosmological parameters.
One may be tempted to argue that instead of using an analytical bias model one could directly use a full N-body simulation as the forward model for cosmological parameter inference [30]. This would capture small-scale modes and should therefore reduce the model error. But a critical component of this simulation-based approach is to obtain the halo density from the simulated nonlinear dark matter density. Applying a halo finder to the simulated dark matter density has led to complications because it renders the forward-model non-differentiable [30,32]. A possible solution is to use a neural network or other machine learning techniques to approximate the result of non-differentiable halo finding algorithms with a differentiable model [32]. However, such an approximation typically makes some stochastic error. If an analytical bias model can predict the halo density field from the dark matter field with a similar error, then it may be a potentially simpler and useful alternative to a neural net.
The paper is organized as follows. We introduce the bias model in terms of shifted operators in Section II, where we also describe our method to fit bias parameters at the field level and list other bias models that we will compare against simulations. Section III presents the numerical implementation of the bias model and the N-body simulations. We then compare the model against simulations in position space in Section IV and in Fourier space in Section V, where we analyse the size and scale dependence of the model error, and the size of the bias terms contributing to the bias model. Section VI presents a perturbative description of the transfer functions associated with the shifted bias operators that we use, and perturbative fits of these transfer functions. The rest of the paper consists of two standalone sections. In the first of these two sections, Section VII, we discuss the relation to Standard Eulerian bias and perturbation theory. Specifically, we show in Section VII A that the Standard Eulerian bias expansion fails to describe biased tracers at the field level. In Section VII B we discuss the connection with the usual IR-resummed power spectrum. In the second standalone section, Section VIII, we explore how an extension of our work using massweighted halos can reduce the stochastic noise and therefore the error of the bias model. We end by summarizing the main results in Section IX. All technical details are described in appendices.

II. BIAS MODEL AT THE FIELD LEVEL
The perturbative approach to halo biasing has a long history, and has been well studied both in Eulerian and Lagrangian space (see the review [1] and references therein, including, e.g., [7,[49][50][51][52][53][54][55][56][57]). However, most of the focus has been on the prediction of summary statistics, such as the power spectrum or the bispectrum of biased tracers. In contrast, in this paper we are interested in perturbation theory predictions at the level of realizations. In this section we describe a perturbative bias model that we are going to use to make comparisons with simulations.

A. Bias Expansion in Terms of Shifted Operators
Describing dark matter or biased tracers at the field level is a nontrivial challenge for perturbation theory. For instance, it is well known that the large IR displacements (bulk flows) induced by long modes cannot be treated perturbatively. If they were, the positions of particles computed in perturbation theory would be off by as much as O(10) h −1 Mpc compared to their true values. This means that the density field obtained from N-body simulations and the one computed treating the large IR displacements perturbatively (using the same initial conditions) would be completely uncorrelated on scales smaller than O(10) h −1 Mpc. 2 This is precisely what happens in Standard Eulerian perturbation theory, making it deficient for the description of realizations of dark matter or halo density fields. We will come back to the details of this failure of Standard Eulerian perturbation theory in Section VII A.
On the other hand, in Lagrangian perturbation theory the large IR displacements are naturally taken into account. However, this framework has a different problem. It predicts only the nonlinear displacement field ψ and not the density field δ. Going from one to the other is a nontrivial step. Given that the relation between δ and ψ is very nonlinear, even a very good knowledge of the displacement field up to some scale does not guarantee that the density field will be correct up to the same scale with the same precision [67,68].
In this paper we present one possible perturbative description that circumvents these problems by constructing a bias expansion tailored to describe biased tracers at the field level. We put forward the following requirements: (a) The bias expansion must be perturbative; (b) The bias operators have to be written in Eulerian space, given that we are comparing theoretical predictions and simulations of the final Eulerian density field; (c) The large IR displacements have to be treated non-perturbatively.
Our strategy to achieve all of these goals is to combine the virtues of Eulerian and Lagrangian descriptions into a hybrid scheme. We start with the description of biased tracers in Lagrangian space. The displacement field is then split into the dominant linear contribution and smaller higher order corrections. The nonlinear corrections to ψ are treated perturbatively, while the linear piece is kept in the exponent. In this way, the dominant part of the large displacements can be treated exactly, and the resulting operators once written in Eulerian space are automatically IR-resummed. In the rest of this section we give the details of this construction.
The proto-halo density field at Lagrangian position q is modeled using a bias expansion in the linear Lagrangianspace density δ 1 (q): . . are Lagrangian bias parameters, σ 2 1 is the r.m.s. fluctuation of the linear density field and the tidal operator G 2 (q) is defined as 3 2 It is important to stress that the effect of this decorrelation is much more dramatic at the field level than for the correlation functions. This is due to the general statement that the effects of bulk flows have to cancel in equal time n-point functions [58][59][60][61]. The only exception to this theorem are cases in which there are sharp features in the correlation function, such as the BAO peak. For example, the only effect of large displacements on the power spectrum is to smooth out the BAO wiggles (or spread the BAO peak in real space two-point function) [62][63][64][65][66], while the smooth part of the power spectrum at small scales remains unchanged. 3 The basis of operators at second order (and higher orders) in perturbation theory is not unique. One of the advantages of working with {δ 2 1 , G 2 } is that the auto-power spectrum of G 2 and its cross-spectrum with δ 2 1 vanish in the low-k limit. This simplifies our analysis and helps to disentangle relevant contributions to the shot noise in the low-k limit. For other common choices of the basis operators and their relation to {δ 2 1 , G 2 } see [1].
The representation of this operator in momentum space is given by Notice that in our notation p ≡ d 3 p/(2π) 3 . For the rest of this section we will also use δ h ≡ δ model h . In the bias expansion (8) we kept only terms up to second order in perturbation theory. We will continue to work at this order throughout this section, because it is sufficient for introducing notation and motivating the bias model that we are going to use to make comparisons with simulations. The higher order or higher derivative operators needed for the consistent one-loop calculation can be straightforwardly included. We will come back to this in Section VI.
The bias expansion in Eq. (8) is in Lagrangian space. In order to go to Eulerian space, let us start from Eq. (8) and include the gravitational evolution. The gravitational evolution is encoded in the nonlinear displacement field 4 , such that the Eulerian coordinates x of a halo at the initial position q are given by x = q + ψ(q). The overdensity generated in this way is given by where δ D is the Dirac delta. The Fourier transform of this field in Eulerian space is For simplicity, in this equation and in the rest of the paper we restrict the range of momenta to k = 0, so that the zero modes or mean density do not enter our formulas. The nonlinear displacement from Lagrangian to Eulerian position can be expanded in a perturbative series ψ = ψ 1 + ψ 2 + · · · . At first order, we have the well-known Zel'dovich approximation [69] The second-order displacement can be written as Using the perturbative description of the nonlinear displacement field and expanding the exponent e −ik·ψ(q) in Eq. (13) it is possible to recover the usual Standard Eulerian bias expansion. This procedure also fixes the relation between Lagrangian bias parameters and their Standard Eulerian counterparts. Of course, this is not a surprise, as we expect the two descriptions to agree order by order in perturbation theory.
On the other hand we do not want to expand the full nonlinear displacement. We are going to keep the largest part ψ 1 (q) exponentiated and expand only the higher-order terms. 5 In this way the largest part of the problematic IR displacements is not expanded in perturbation theory. With this in mind, we can rewrite Eq. (13) in the following way where the new contributions come from expanding the second (and higher) order displacement field in the exponent. It is important to stress that at leading order this new term can be expressed through the second order operator G 2 4 We are assuming that the halos are formed in the the initial conditions and displaced by ψ. In reality the evolution is more complicated and in general nonlocal in time. However, it can be shown that these complications can be rewritten such that they only change the values of bias coefficients in perturbative approach to halo clustering (for more details see [55,56]). For this reason we proceed with the simplified picture of halo formation and evolution. 5 Let us define W (k) to be a low-pass filter, compared to the wavelength of a Fourier mode δ 1 (k). For a given wavenumber k, the linear displacement can be split into the long-wavelength and short-wavelength part: The effect of ψ L 1 on the short modes is fixed by the Equivalence Principle. Therefore, strictly speaking, only ψ L 1 should be kept exponentiated and in any perturbative calculation ψ S 1 has to be expanded order by order in perturbation theory. The error in our formulas introduced by keeping the full ψ 1 in the exponent is always higher order in ψ S 1 than terms we calculate. Also, this error is mainly relevant on small scales. In order to keep the formulas simple, we decide not to do the long-short splitting in our calculation.
(see Eq. (15)). Therefore, at second order in perturbation theory, expanding the nonlinear terms in the displacement field ψ(q) only shifts some of the standard Lagrangian bias parameters by a calculable constant. We will give more details about higher order terms in Section VI.
The previous expression motivates us to write down the bias expansion in Eulerian space in terms of shifted operators defined asÕ where O ∈ {1, δ 1 , δ 2 ≡ (δ 2 1 −σ 2 1 ), G 2 , . . .}. 6 We stress again some of the advantages of using an expansion in this basis: (a) The shifted operators are written in Eulerian space and therefore allow for easy comparisons with simulations and quantification of their importance. (b) The large displacement terms ψ 1 (q) are kept resummed, which is crucial for comparisons with simulations at the field level. Notice that this also implies that in this description the BAO wiggles are properly suppressed (the BAO peak is spread). However, the model is still perturbative in small quantities, such as derivatives of the linear displacement δ 1 = −∇ · ψ 1 . The power spectrum calculated using the shifted operators is identical on large scales to the standard 1-loop result with IR-resummation. (c) The shifted operators are easy to generate on a 3-d grid for a given initial condition realization on a 3-d grid, by shifting properly weighted particles from Lagrangian to Eulerian coordinates using the Zel'dovich displacement (see Section III below).
One term in the previous equations that has a somewhat special role is the shift of a uniform density, O = 1. This contribution to δ h (k) is equal to the Zel'dovich density field It is fixed by dynamics and it is not a part of the bias expansion in the usual sense (it has no free parameters). However, the Zel'dovich density δ Z (k) can also be expanded in the basis of shifted operators (see Appendix A), whereG 3 is a cubic operator analogous toG 2 (see Appendix D). In other words, δ Z (k) can be absorbed in the bias expansion by simply changing the bias parameters. Of course, this is just a choice, and there is nothing wrong in keeping δ Z explicitly in the formulas. As we are going to see later, different choices may be more appropriate for different applications. Let us point out that in the formula (19) the displacements ψ 1 (q) are treated exactly. In other words, the exponential e −ik·ψ1(q) is never expanded in ψ 1 (q). The only expansion parameter is the derivative of the displacement, ∇ · ψ 1 (q) = −δ 1 (q), which is a small quantity. 7 This is consistent with the way the shifted operators are defined.
Using the basis of shifted operators (17) we can therefore write the bias expansion of the halo density field in Eulerian coordinates, up to second order in perturbation theory, as This is the main result of this section. Notice that the new bias parameters b i differ from the original Lagrangian biases b L i by a constant. This difference comes from expanding the nonlinear part of the displacement (Eq. (16)) and writing the Zel'dovich density field in terms of shifted operators (Eq. (19)). We give the explicit relation of b i and b L i in Section VI. Equation (20) has a similar structure as the usual Standard Eulerian bias expansion where δ 2 (k) is the Fourier transform of the squared Eulerian density δ 2 (x) (as opposed toδ 2 (k), which is obtained by squaring in Lagrangian coordinates and then transforming to Eulerian coordinates using Eq. (17)). Notice that all fields in Eq. (21) are nonlinear. In contrast, in the expansion (20) all operators are expressed in terms of the linear field δ 1 , which, as we are going to see, is more suitable for describing biased tracers at the field level.
Another virtue of the expansion (20) is that the theoretical calculation of the power spectrum is quite straightforward (see Section VI C). It involves the calculation of the power spectra of shifted operators, which have a familiar form, for instance The expression on the r.h.s. is common in Lagrangian perturbation theory. This connection is not surprising, given that we started our derivation in Lagrangian space. Even though we have come to the definition of the shifted operators using a different motivation, a lot of literature already exists on the power spectrum of biased tracers in Lagrangian perturbation theory. In this paper we are going to use some results presented there. For some recent developments, such as Convolution Lagrangian Effective Field Theory, see for example [57,[70][71][72] and references therein.

B. Promoting Bias Parameters to Transfer Functions
So far we wrote the bias expansion in terms of shifted operators keeping only terms up to second order in perturbation theory. If we want to describe the density field of biased tracers deeper in the nonlinear regime, we have to include higher order terms. For instance, even for the evaluation of the one-loop power spectrum one has to keep all cubic operators. Let us take a closer look at this example whereÕ i 3 is a set of cubic operators and b i 3 are the corresponding bias parameters. At lowest order in perturbation theory the cubic operators correlate only withδ 1 . We can split the cubic operators into parts parallel and orthogonal In this way, allowing for a scale-dependent bias parameter b 1 (k), we can write At one-loop order, the new cubic operators are orthogonal to all other fields. This implies that even the bias expansion up to second order in the fields, with the appropriate b 1 (k), is sufficient to describe the density field with the correct one-loop power spectrum. Allowing for scale-dependent bias parameters effectively allows us to reduce the order in perturbation theory that we need to describe the density field of biased tracers at a given order in perturbation theory.
This example provides motivation to promote all bias parameters to k-dependent functions in order to take into account as much nonlinearity as possible. This expression can be compared to realizations of Nbody simulations. Calculating the operators with the same initial conditions, the sample variance can be canceled [67]. The bias functions can be measured from the condition that the difference between realizations in simulations and theory is minimal. This procedure allows us to ask a very general question: How much of the real halo density field can be described with a few leading-order operators, even beyond the perturbative regime? In a setup this general, a perturbation-theory-inspired model can be considered successful if it leads to small (close to Poisson) and scale-independent mean-square model error.
When fitting the above model to a halo density at the field level, the bias coefficients b i are correlated with each other because the shifted fieldsδ 1 ,δ 2 andG 2 are correlated among themselves (they are defined using the same initial conditions and the same displacement field ψ 1 ). When interpreting the bias parameters, it is useful to change the basis to avoid this correlation. We therefore rotate the shifted operators to mutually orthogonal fields using the Gram-Schmidt algorithm:δ The Gram-Schmidt rotation matrix M ij (k) is M 10 (k) = −Pδ 2δ1 (k)/Pδ 1δ1 (k) etc., and can be computed using a Cholesky decomposition of the 3 × 3 correlation matrix between the three shifted fields {δ 1 ,δ 2 ,G 2 } in every k-bin as described in Appendix C. The bias expansion in this orthogonal basis is then These new bias parameters, or transfer functions, β i (k) are independent from each other. We can therefore add higher-order operators using the same procedure without changing any of the lower-order bias parameters, which is a useful property. In our framework, where transfer functions are determined by minimizing the mean-square model error at the field level, the change of basis, i.e., going from b i to β i , does not change the predicted halo density; it merely provides a more convenient way to interpret the numerical values of bias parameters. Also notice that the first parameter remains unchanged, β 1 (k) = b 1 (k). In Section VI we will present one-loop perturbation theory predictions for β i (k) and compare against measurements of β i (k) from N-body simulations.

C. Relation to Renormalized Bias Parameters
Before we close this section listing all bias models that we use in the paper, we get back to an important point that we have only briefly mentioned in the introduction: The low-k limit of the transfer functions β i (k) does not necessarily approach the values of physical (renormalized) bias parameters. This means that the bias parameters we measure at the field level are not generally expected to be the same as the bias parameters measured from correlation functions of the halo density field. In the terminology of renormalization, what we measure at the field level is closer to "bare" bias parameters. These biases depend on the cutoff scale, or the way the small scales are regulated. For example, as we are going to see, using the linear or the nonlinear matter density field to construct bias operators leads to very different transfer functions in the low-k limit. One easy way to see why this happens is to take a look at the expression for a transfer function obtained using the minimization described above. If we assume that the basis of operators is orthogonal, we can write The power spectrum in the denominator in general involves loops, and therefore it is obviously dependent on how the high-k modes are treated. The usual way to deal with this issue is to renormalize the bias operators, subtracting the cutoff-dependent counterterms [54]. Away from the perturbative regime and at the field level this becomes challenging. Take for example the operator δ 2 . The power spectrum of this operator is constant in the low-k limit. This constant comes from integrating very short scales and can be always absorbed by the free amplitude of the shot noise in the power spectrum. However, this is not possible at the field level. If we add an independent field with constant power spectrum to the model with the hope to fix the problem, it can only give a positive definite contribution to the model error power spectrum, making the model worse.
At this point it is important to clarify the relation to other works (see for example [8,73]) where similar techniques were exploited to measure the physical bias parameters. The idea is that the bias parameters can be measured projecting the halo density field on the basis of bias operators, leading to equations very similar to Eq. (31). One major difference is that the bias operators in [8,73] are constructed from the smoothed density field. The smoothing scale R is chosen to ensure that only the Fourier modes in the perturbative regime contribute and it is typically R ∼ O(10) Mpc at z = 0. 8 In this way it is indeed possible to measure the low-k limit of the transfer functions and rigorously prove that they can be identified with the renormalized bias parameters.
However, this program is somewhat orthogonal to our goals in this paper. We do not necessarily restrict to the perturbative regime k k NL , but we want to test how well we can reproduce the halo density field even around the nonlinear scale. Using the smoothed density field to construct the basis operators would imprint the smoothing scale in all our calculations and lead to significant decorrelation with the halo density field already around k ∼ 0.1 hMpc −1 . In this context, keeping the short scales in the bias operators seems to lead to better results. We therefore do not apply any smoothing to the fields. 9 The price that we have to pay for this choice is that the low-k limit of the transfer functions does not correspond to bias parameters defined in the usual way.
Let us finish by saying that one important exception in this discussion is the linear bias. In this case The low-k limit of this expression coincides with the usual definition of the renormalized linear bias, since the power spectrum in the denominator approaches P 11 (k). Therefore, we do expect to find that b 1 = β 1 (k → 0) is indeed the same as inferred from the power spectrum or separate universe simulations.

D. List of Bias Models
When comparing against simulations we will mostly use the bias expansion in terms of shifted operators described above, but sometimes we will also show comparisons with other bias expansions. The following list provides an overview over all bias models that we will use for the analysis.
• Quadratic bias model: This is our perturbation theory prediction described above for the density field of biased tracers in a realization.
We are going to use this, or the cubic extension described below, as the reference model for comparisons with simulations and with other biasing schemes.
• Linear bias model: We include this model in the analysis to study how the second order terms in Eq. (33) affect results, particularly the amplitude and scale dependence of the model error. The transfer functions in this model approach the usual linear Lagrangian bias parameters on large scales. This is because we have kept the Zel'dovich density δ Z explicitly in the formula. At leading order in perturbation theory there is no reason not to replace δ Z withδ 1 (see Eq. (19)). However, the second order contributions in δ Z , which are fixed by the gravitational evolution and come with fixed coefficients, can be significantly larger than the second order bias contributions (depending on halo mass). Dropping them would then affect the model error (shot noise) of the linear bias model, making it larger and more scale dependent. For this reason we choose to keep δ Z in the formula. In other words, the linear bias model as we choose to write it here is the best possible one-parameter model that we can use in comparisons with realizations. This is a conservative choice because the impact of the additional second order terms in Eq. (33) compared to Eq. (34) is minimized. Even then, as we will see, the second order bias terms will be quite significant.
• Cubic bias model: Another possible modification is to include additional operators in the bias model. Here we include the shifted cubic term δ 3 1 (q), ignoring all other contributions at the same order. Strictly speaking this choice is not consistent with perturbation theory and we should not trust this model on small scales where the two-loop terms become important. However, our motivation to keep δ 3 1 is due to the fact that we want to study the impact of this operator on the amplitude of the shot noise in k → 0 limit. As it turns out, in the basis of cubic operators {δ 3 1 , δ 1 G 2 , G 3 , Γ 3 }, the only operator that has a constant contribution to its auto power spectrum in the largescale limit is δ 3 1 . Therefore, unlike in the case of correlation functions, at the level of realizations it does make sense to add a subset of bias operators at the given order in perturbation theory, as long as they can have a large contribution on very large scales. We find that adding δ 3 1 is most effective when we remove small-scale modes from δ 1 before cubing the field; we therefore apply a smoothing to δ 1 with sharp cutoff at k max = 0.5 hMpc −1 when computing δ 3 1 (none of the other fields are smoothed because their auto-power spectra are less UV sensitive). 10 10 With Gaussian smoothing the model can be improved further for high mass halos, but this typically increases the scale-dependence of the transfer function associated with δ 3 1 . The sensitivity of δ 3 1 on smoothing suggests that a more systematic investigation of the optimal smoothing of this term could improve the bias model. Including the full set of allowed cubic operators can lead to further improvements, but also requires more bias parameters.
• Standard Eulerian bias model: This is the standard expression for the density field of biased tracers using Standard Eulerian bias. This model assumes that we can perfectly model the fully nonlinear dark matter density field δ. In practice, we measure this from N-body simulations, i.e. we use the best Standard Eulerian bias model we could ever hope for. Notice that the second order operators are also evaluated using the nonlinear field and they are orthogonal to each other and δ. We are going to compare both first and second order terms with simulations. We have already discussed some shortcomings of modeling δ with the Standard Eulerian perturbation theory. As we are going to see, using the full nonlinear density field from N-body simulations also has its own problems. We will get back to these issues in Section VII A, where we will also consider possible modifications of this model by smoothing δ or replacing δ by the perturbative dark matter density.
For each of the bias models listed above, we allow the bias parameters or transfer functions β X i (k) to be free functions of wavenumber k. We will measure them from simulations as described in the next section and show that they are smooth functions. On large scales the k-dependence of these functions can be predicted using perturbation theory with a few free parameters. The number of these free parameters is the same as the number of usual bias parameters.

III. NUMERICAL IMPLEMENTATION
To test these bias expansions against simulations we proceed as follows. We first draw a Gaussian linear density from a fiducial linear power spectrum, computed with CAMB [74] for a flat ΛCDM cosmology with Ω m = 0.3075, Ω b h 2 = 0.0223, Ω c h 2 = 0.1188, h = 0.6774, σ 8 = 0.8159, and n s = 0.9667 based on Planck 2015 [75]. Using this linear density, we evaluate each halo bias model on a 3-d grid in Eulerian coordinates, and compare this against the halo density obtained from N-body simulation initialized with the same linear density. We then compute the difference between the model and simulation density, which is free of sample variance and directly measures the error of the bias model in Eulerian coordinates. 11 Before showing the results of this, let us briefly discuss in more detail how the model density and simulations are generated.

A. Halo Bias Model on 3-D Grid in Eulerian Space
To evaluate the linear, quadratic and cubic bias models in Eqs. (33), (34), and (35), we must evaluate the shifted operators (7) on a 3-d grid in Eulerian coordinates. To do this, we generate a uniform catalog with 1536 3 particles located at the vertices q of a regular 1536 3 grid in a 3-d box with side length L = 500 h −1 Mpc, corresponding to a particle separation of ∆q = 0.33 h −1 Mpc. We then displace each particle, q → q + ψ 1 (q), where ψ 1 (q) is the linear displacement in Lagrangian coordinates q from Eq. (14), rescaled linearly to redshift z = 0.6 using the linear growth function D(z). To compute the shifted operator corresponding to O(q) = 1, we paint the displaced particles to a 512 3 grid using the standard cloud-in-cell (CIC) algorithm, so that each grid cell stores the number of nearby particles, weighted by the distance of each particle from the cell center. The corresponding overdensity is the Zel'dovich density δ Z (x) in Eulerian coordinates. Notice that this procedure is the same as when initializing N-body simulations from a regular grid, except that the displacement is evaluated at late time, z = 0.6.
To generate the shifted linear densityδ 1 , we proceed in a similar way. We again start with the uniform catalog of 1536 3 particles, but now assign each particle an artificial mass given by δ 1 (q), rescaled linearly to z = 0.6. (Notice that the density of this catalog is δ 1 (q).) We displace these particles using q → q + ψ 1 (q) as before. To paint the resulting catalog to a grid, we modify the CIC painting scheme such that now each particle contributes to nearby grid cells with the usual CIC distance weight multiplied by the mass of each particle. We sum these masses, without dividing by the number of particles that contribute to each cell, so that nearby particles with equal mass (i.e., particles that originate from a region in Lagrangian space where δ 1 (q) is constant) can cluster and create a density that is larger than the mass of these particles. This ensures that the volume factor given by the determinant of the Jacobian ∂x i /∂q j between Eulerian and Lagrangian coordinate systems is included inδ 1 , and that the mean density remains unchanged. The shifted squared densityδ 2 and shifted tidal fieldG 2 are computed similarly, using δ 2 1 (q) or G 2 (q) for the particle mass.
Next, the fields entering the model are orthogonalized using the Gram-Schmidt procedure in Eq. (27). Details specific to this orthogonalization procedure are described in Appendix C. Finally, we compute all power spectra between these orthogonalized model contributions and the true halo density obtained from an N-body simulation started from the same linear density, get the optimal model transfer functions β i (k) using linear regression (40), and sum up the model contributions weighted by the transfer functions.

B. Phase-Matched N-body Simulations
The phase-matched N-body simulations are generated as follows. Using the same initial linear Gaussian density as above, initial particle positions and velocities at z = 99 are set up using the Zel'dovich approximation for 1536 3 dark matter particles in a L = 500 h −1 Mpc box. These particles are evolved to redshift z = 0.6 using the TreePM N-body code MP-Gadget [76,77], with N mesh = 3072 for the particle-mesh (PM) grid. The code makes about 4200 time steps to reach z = 0.6. The mass of each dark matter particle is 2.94 × 10 9 h −1 M .
In the resulting dark matter snapshot we identify halos using the standard friends-of-friends (FOF) algorithm with linking length of 0.2 using nbodykit [78,79]. We require halos to have at least 25 dark matter particles, corresponding to a minimum halo mass of 7.4 × 10 10 h −1 M ; the heaviest halo weighs about 1.3 × 10 15 h −1 M . We define four halo mass bins with number densities roughly corresponding to different future experiments as indicated in Table I. For each mass bin we compute the halo density on a 512 3 grid using standard CIC painting.
To estimate uncertainties, we generate five independent realizations of the linear density using different random seeds, and generate the bias expansion density and simulations for each of these five realizations. Whenever we compare model and simulations we first compute their difference for each random seed and then average the result over the five realizations, to avoid sample variance.
We will refer to these simulations as the ground truth, and we will ask how well the analytic halo bias expansion can describe them. Of course, the simulations could be made more realistic by populating the halos with galaxies and including redshift space distortions, but we will restrict ourselves to halos in real space in this work.

C. Determining Bias Transfer Functions
To compute the bias transfer functions β i (k) we minimize the mean-square model error defined in Eq. (6), in every k bin. This minimization is meaningful because P err is non-negative and vanishes if and only if the amplitude and phases of all Fourier modes match perfectly, Since all bias expansions that we consider are of the form i.e. linear in the bias transfer functions β i , the minimization of P err (k) in each k bin is equivalent to linear regression or ordinary least squares in each k bin, which gives is the covariance matrix between the model operators O i in a k bin, and O −1 (k) is the inverse of this matrix in that k bin. 12 As described above we orthogonalize these model operators using Gram-Schmidt orthogonalization (27) so that the covariance matrix is diagonal for every k. We then fit these orthogonalized transfer functions using perturbation theory as described in Section VI below.
Similarly to the measured model error, the transfer functions determined in this way avoid sample variance. Related methods have also been used to model the displacement field [67], the nonlinear dark matter density [68,91], or the 21cm radiation from reionization [92]. While one could include regularization or prior terms like i β 2 i in the minimization, we find no need for this if fields are orthogonalized.

IV. SIMULATION RESULTS IN POSITION SPACE
We start the comparison of the bias models against simulations in position space in this section, turning to Fourier space in the subsequent section.
A. Two-Dimensional Slices Fig. 1 shows 2-d slices of the 3-d overdensity of halos δ h (x) in one of the simulations, compared with two of the bias models. This shows that the cubic bias model provides an accurate description of the density contrast of these halos, with minor differences only visible on rather small scales. The linear Standard Eulerian bias provides a less accurate description, but still gets most of the structure on large scales right.
For more massive and less abundant halos, we obtain Fig. 2. The cubic model is less successful for these halos, especially on small scales. For example, the model predicts a large spherical overdensity up from the center of the slice, but this does not exist for these halos in the simulation; in many other regions the model tends to underpredict the peaks of the true halo overdensity. This is even more severe for the linear Standard Eulerian bias model, and for more massive halo populations. On large scales, however, the models work still well, as we will see more clearly when we turn to Fourier space later.

B. One-Point Probability Distribution
To get a more global view of the position-space halo density we estimate its one-point probability distribution by computing the histogram of the halo density for different smoothing scales. Fig. 3    simulations. The variance, skewness and kurtosis of the densities shown in the histograms are listed in Table II for the full simulated and modeled densities, and in Table III for the model error.
The linear Standard Eulerian bias model tends to underpredict troughs and overpredict peaks of the halo density, as shown in Fig. 3. The model error is not Gaussian for any of the shown smoothing scales; in particular its kurtosis is larger than 1 for all smoothing scales.
The cubic model provides a more accurate description of the halo density pdf, as shown in Fig. 4. This emphasizes the importance of using nonlinear bias terms even on rather large scales. Still, the cubic model tends to underpredict the peaks of the true halo density, especially on small scales. This agrees with Fig. 2 where the model also underpredicts    the simulated density in more regions than it overpredicts it (considering only overdense regions that are easiest to pick up by eye). Related to this, the variance, skewness and curtosis of the cubic model halo density are similar to that of the true simulated density, especially for large smoothing scale (see Table II). The model error of the cubic model looks most Gaussian for large smoothing scales, but it is never completely Gaussian, with a skewness of 0.13 and a kurtosis of 0.68 even for R = 10 h −1 Mpc smoothing. Most of this is caused by the tails of the distribution, i.e. by outliers of . Quantifying the non-Gaussianity of the error in more detail, for example by measuring bispectra, would be interesting. In what follows we will only consider the power spectrum of the error however.

V. SIMULATION RESULTS IN FOURIER SPACE
The one-point pdf and histograms shown above quantify the number of pixels where model and simulation density have the same value, but they do not check that the densities agree pixel by pixel and are spatially coherent. To test this, we turn to Fourier space and compute two performance measures quantifying the size of the model error mode by mode: First, in Section V A 1, we compute the model error power spectrum P err (k) = |δ truth for the simulated halos as introduced in the introduction. Second, in Section V A 2, we discuss the cross-correlation coefficient between Fourier modes of the model and simulated (truth) halo density. As we are going to see in Section V A 3, the size of the model error P err and the cross-correlation coefficient r cc are directly related to the amount of cosmological information that can be extracted when using the model to describe a measurement of the halo density. (Also, P err and r cc are closely related to each other by relations given in Appendix B.) Following these results on the size of the model error and the cosmological constraining power, we proceed in Section V B to investigate the scale dependence of the model error, which, if ignored, can lead to biases of cosmological parameter measurements. In particular, we determine the maximum wavenumber k max up to which it is safe to assume a scale-independent model error power spectrum or shot noise.
We end the section by showing how large the contribution from the different bias terms is to the total model as a function of wavenumber, demonstrating the importance of including nonlinear bias terms.
Throughout the section, P model and P truth refer to the halo power spectrum of the model and simulations, respectively. As described in the introduction, our measurements differ quantitatively from previous measurements of stochasticity because we work at the field level and include nonlinear bias terms in the perturbative model.

Model Error Power Spectrum
Fig . 5 shows the broadband power spectra of the four halo mass bins of simulated halos, and the best-fit model for one of the bias models introduced above (the quadratic bias model). The mean-square difference between the simulation and model density, given by the error power spectrum P err (k), is shown in orange. It is rather flat as a function of k, and it deviates from the Poisson prediction by up to a factor of 2, depending on halo mass.
Our goal is to study the amplitude and the scale dependence of the model error in more detail and also for the other halo bias models introduced previously in Section II D. For this purpose we show P err divided by the Poisson prediction 1/n in Fig. 6.
Let us first discuss the low-mass halos, M ≤ 10 12.8 h −1 M . We find that for the linear bias models, the meansquare model error is larger than the Poisson prediction by a factor of a few, and it is rather scale-dependent, even on large scales. In contrast, the mean-square model error of the quadratic bias model deviates only by a few tens of percent from the Poisson prediction, and is rather scale-independent, with some scale dependence only visible at k 0.2 hMpc −1 . This shows that including the quadratic bias termsδ 2 andG 2 reduces the mean-square model error on large scales by a factor of 4 to 6 and reduces its scale dependence.
For more massive halos and clusters, M > 10 12.8 h −1 M , we find that the mean-square model error of the quadratic and cubic bias models is smaller than the Poisson prediction 1/n by up to a factor of two, which is about 30% smaller than the mean-square model error of the linear bias models for these halos. Qualitatively similar sub-Poissonian

S i m u l a t e d h a l o s ( t r u t h )
Q u a d r . b i a s ( m o d e l ) Figure 5. Broadband power spectra of the simulated halo overdensity (solid grey), the best-fit quadratic bias model of that overdensity (dashed grey), and the field-level difference between simulation and model (orange), which represents the meansquare model error or error power spectrum Different panels show different halo mass bins. The amplitude of the model error is larger than the Poisson prediction for low mass halos and smaller for high mass halos, and it is rather scale-independent. The results are averaged over five independent simulations. The uncertainty of Perr estimated from the scatter between these simulations is indicated by the width of the shaded orange region at low k, and it is smaller than that at high k. errors for heavy halos have been reported in the literature before [7,23,24]. This is theoretically expected because of self-exclusion and clustering of halos [23,46,93], which violate the assumption of placing point particles randomly in space (sampling the continuous density uniformly). Although less clearly visible than for the low-mass halos, the model error of the nonlinear bias models is again less scale-dependent than the model error of linear bias, which deviates by tens of percent from a k-independent shot noise at k 0.1 hMpc −1 , The model errors shown in color in Fig. 6 represent the minimum mean-square model error if the transfer functions β i (k) of the bias models are allowed to be free functions of k, obtained using linear regression in each k-bin as described in Section III C above. If we instead restrict the functional form of these transfer functions to a theory prediction by fitting the linear regression transfer functions β i (k) using five k-independent parameters b 1 , c s , b Γ3 , b 2 , and b G2 (see Section VI D below for details), we obtain the black dashed curves in Fig. 6 in the case of the quadratic and cubic bias model (for the latter we fit β 3 (k) with a constant sixth parameter). This more conservative model error is only minimally larger than before, which shows that the transfer functions can be well described with a 5-or 6-parameter fit as we are going to see in more detail in Section VI D below.
For the lowest halo mass bin, shown in the top left panel of Fig. 6, we show two dashed lines corresponding to the quadratic bias model. The difference between them is whether or not the Zel'dovich density δ Z is absorbed in the bias expansion using Eq. (19). The grey dashed curve is obtained keeping δ Z explicitly in the bias expansion as an extra field with the fixed transfer function. In this case the noise is somewhat different with respect to the standard second order bias model, which implies thatG 3 and higher-order terms in the expansion of the Zel'dovich field become important. This is not surprising, since the amplitude of the noise for the lowest halo mass bin is very small and comparable to |G 3 (k)| 2 around k ∼ 0.1 hMpc −1 . Our results suggest that in the limit of very low shot noise it is better to keep δ Z explicitly in the bias expansion because this leads to an error with smaller amplitude and scale dependence. One may wonder whether this is consistent, given that we are anyway neglecting higher order L i n e a r S t d . E u l . b i a s L i n e a r b i a s Cubic bias P hh

Poisson prediction
Quadr. bias halo bias models, divided by the Poisson prediction 1/n. Different panels show different halo mass bins; different colors represent different bias models. For all mass bins the quadratic and cubic bias models have the smallest large-scale model error and the smallest scale dependence. Shaded areas represent the 1σ credibility interval if bias transfer functions are allowed to be free functions of k (the uncertainty is computed as the standard error of the mean estimated from the scatter between the five independent simulations). If we instead fit these transfer functions using five k-independent parameters b1, cs, bΓ 3 , b2, and bG 2 , we obtain the dashed curves for the quadratic and cubic bias models. For the densest halo sample (top left panel), keeping δZ as an extra field in the quadratic model without a transfer function yields the grey dashed curve when fitting the other transfer functions with the theory model. For the two most massive halo samples (lower panels), we include the cubic bias model withδ3(k), which helps to describe these halos. The small suppression of all curves at high k is due to the CIC window used to paint particles to the grid. bias operators. One way to justify keeping the Zel'dovich density field explicitly is to note that the coefficients in the expansion of δ Z in terms of shifted operators are possibly significantly larger than typical Lagrangian bias parameters. It would be interesting to further explore this question. However, in cases with realistic halo masses the difference between the two approaches is very small compared to the amplitude of the shot noise.
Of course, it is a well known result from the literature that nonlinear bias is required to describe summary stastistics such as the galaxy power spectrum or bispectrum on mildly nonlinear scales. For example, analyses of data from the recent SDSS BOSS galaxy survey found that nonlinear bias terms are required to model their measurements [94][95][96][97][98]. It is therefore not surprising that we also find nonlinear bias to be important when comparing at the field level. What is more surprising is thatδ 2 has a nearly constant auto-power spectrum on large scales (see Fig. 13 below), but nevertheless it describes part of the true halo density on large scales, substantially lowering the large-scale model error. As we will find in Section VII A, this is a consequence of working with the shifted operatorδ 2 ; when instead working with the squared nonlinear Eulerian dark matter density, as is done in the Standard Eulerian bias expansion, the resulting field is dominated by UV modes and does consequently not correlate well with the true halo density on large scales.
Overall we have shown in this section that the quadratic bias model performs substantially better than the linear models, because its model error is smaller and less scale-dependent. As we are going to discuss in Section VII A later, the quadratic bias model also performs better than nonlinear Standard Eulerian bias models, because it avoids squaring the nonlinear dark matter density and expanding large bulk flows.
FOF halos at z = 0.6, no mass weighting This demonstrates that the correlation between model density and halo density in simulations is similar to that expected from the Poisson prediction on all scales. In detail, it is somewhat larger than that for low-mass halos and somewhat smaller for high-mass halos, because Perr deviates from 1/n as shown previously. The curves in the upper subpanels coincide with 1 − r 2 cc , and the curves in the lower panel are equal to P model /P truth , because the model transfer functions minimize Perr (see Appendix B). The cubic bias model is only shown for the two heaviest halo samples because they are identical to the quadratic bias model for the other halo samples.

Correlation Coefficient
A related question is how well the model density is correlated with the simulated halo density. This is shown in Fig. 7. For the lightest halos, we find that the model and simulated halo density are more than 75% correlated at k ≤ 1 hMpc −1 , and more than 99.5% correlated at k ≤ 0.08 hMpc −1 for the quadratic bias model, which is similar to the level of correlation expected from Poisson shot noise for the number density of these halos. For the heavier and less abundant halo samples, the cross-correlation coefficient is lower, as expected, because the shot noise is larger. The linear bias models are less correlated with the simulated halo density than the quadratic and cubic bias model are, reflecting their larger model error (this is best seen in the upper panels of Fig. 7 which show 1 − r 2 cc ). Perhaps surprisingly, the quadratic bias model is more than 50% correlated with the simulated halo density for all halo mass bins up to k = 1 hMpc −1 . This implies that even on such small scales, which are expected to be well inside the one-halo term regime, there is significant information about the phases of the linear initial conditions that are used to generate the bias model density. One might think that this is impossible, at least for the most massive halos, because the radius of these halos is larger than 1 h −1 Mpc and modes smaller than the halo radius are virialized, which should destroy any memory of the initial conditions. A possible explanation could be that we know the center of mass positions of these massive halos very accurately (to a fraction of the halo radius), which can probe modes on scales smaller than the radius of these halos. 13 Fig. 7 also shows the mean-square model error divided by the power spectrum of the simulated halos, which represents the fractional mean-square error of the model and coincides with 1 − r 2 cc (see Appendix B). For the quadratic bias model, this fractional mean-square error is less than 1% for the lowest halo mass bin on large scales k ≤ 0.1 hMpc −1 . This means that the rms fluctuations of the model Fourier modes around the truth are less than 10% at k ≤ 0.1 hMpc −1 for these halos. The error increases on smaller scales and for the heavier, less abundant halos, as expected.
In addition to the stochastic model error, the bias model is expected to fail on small scales because of missing 2-loop terms. To get a rough estimate of their size, Fig. 7 also shows the error that the cubic bias model makes when predicting the fully nonlinear dark matter field measured from the N-body simulations, which is essentially free from shot noise (thin solid grey curve). For the densest halo sample this error becomes comparable to the error of the bias model in describing the halo density on very small scales, but at all other scales and for all other halo samples the error of the dark matter model is much smaller. This suggests that the model error is dominated by stochastic noise rather than missing higher order terms in the bias expansion, at least for the heavier halo samples.

Relation to Cosmological Information Content
In the last two subsections we have characterized the size of the model errorˆ = δ truth h − δ model h and the crosscorrelation coefficient r cc between truth and model. But how is this related to the cosmological information one would hope to extract when applying this model to a measurement of the halo density? As we are going to show in this section, the size of the model error discussed above determines the amount of cosmological information one can extract from the halo density relative to the total information one would get with a perfect model.
To see this, let us first write the true halo density as the sum of the model density and the model error, and assume that the model is evaluated for the optimal transfer functions that minimize P err = |ˆ | 2 and enforce δ model hˆ = 0. Since we know how this model density depends on the linear density and therefore on cosmology, we can use it to measure cosmological parameters. In contrast, we do not attempt to use any potential cosmology information of the model errorˆ -otherwise we would include it in the model density. The model error therefore acts as an uncorrelated noise contribution to the field. 14 The size of the model error relative to the size of the true halo density therefore determines how noisy the field is and how much cosmological information we can extract from it. In the last two subsections we have quantified this by comparing the noise power, P err = |ˆ | 2 , against the power of the measurable true density, P truth = |δ truth h | 2 .
To illustrate this more clearly, consider the amplitude A of the model power spectrum, P model (k) → AP model (k), as a proxy for the cosmological information content. How well we can determine A given a measurement of δ truth h modeled with δ model h ? This is given by the Fisher information where we assumed a diagonal Gaussian covariance given the observed power spectrum P truth . We have also assumed that there are no other parameters in the analysis (in practice, one would usually need at least one parameter to describe P err , and marginalizing over this can degrade F AA in Eq. (43) [99]).
If the model was perfect, i.e. P err = 0 and P model = P truth , Eq. (43) would give the optimal amount of information, which is determined by the total number of Fourier modes. For an imperfect model, P model < P truth , the ratio 13 We can tell if the halos are located at positions x and y or x and y + ∆, where ∆ is only limited by the resolution with which we can measure center of mass halo positions and not directly by the radius of the halos, as long as the halos are separated by a few halo radii, which is usually the case given the low number density of very massive halos. 14 In our approach the model errorˆ has two contributions: First, stochastic noise terms, which cannot be predicted given the initial condition Fourier modes on large scales. Second, higher-order bias terms not included in the model, or more specifically, the components of these higher-order bias terms that are orthogonal to any bias term in the model (so they cannot be absorbed by transfer functions; for exampleG ⊥ 3 is part ofˆ for our models). These orthogonal higher-order bias terms do depend on cosmology, but to make use of this we would have to include them in the model. All cosmological information that we extract from an observation of δ truth h is therefore contained in δ model h , and the model errorˆ acts as a noise contribution. Notice that the model error is uncorrelated with the model density, δ model hˆ = 0, because both the stochastic and orthogonal higher-order terms are orthogonal to all terms in δ model h . As a consequence, P truth = P model + Perr. P model /P truth in Eq. (43) determines how much less information one gets per Fourier mode. The square root of this is shown in the lower subpanels of Fig. 7 above. This therefore shows how close the bias expansion gets to keeping the information on the model amplitude A which contains cosmological information; on large scales it typically keeps 99% or more of the cosmological information, but it retains less on smaller scales.
Using the results from Appendix B, Eq. (43) can be rewritten in several ways if transfer functions are chosen such that P err is minimized. For example, This shows that the amount of information on the amplitude A is given by the correlation coefficient between the perturbative bias model (whose cosmology dependence or dependence on A we know) and the true density. We can also rewrite this as The first term in square brackets corresponds to the optimal amount of information; the second term, 1−r 2 cc , represents the fractional amount of information we loose if the perturbative model and truth are not perfectly correlated (if they are perfectly correlated, we loose no information because 1 − r 2 cc = 0; if they are completely uncorrelated we loose 100% of the cosmological information because 1 − r 2 cc = 1). This is indicated by the upper subpanels in Fig. 7 above. This shows that the bias expansion with optimal transfer functions looses 0.2% of the cosmological information at k 0.02 hMpc −1 for the lightest and most abundant halos, while it looses more of that information on smaller scales and for the heavier and less abundant halos.
A third way to rewrite the above formula follows from 1 − r 2 cc = P err /P truth (assuming optimal transfer functions): Similarly to before, the first term in brackets gives the optimal amount of information, and the second term represents the penalty we get if the field level model error P err is large, which is the case if stochastic noise terms or higher order bias terms not included in the model are large.
In practice, when analysing data from a galaxy survey, the noise power spectrum P err is not known a prioriwe only know this for the particular set of halos that we selected from our simulations and compared against the model density, and it is difficult to determine which halos exactly host the galaxies observed by a survey and what noise power spectrum they have. This reflects an important difference between large-scale structure and CMB data analysis: For the CMB, the noise power spectrum is known if the detector noise of the experiment is known, and the noise bias it imprints on CMB auto-power spectra can be subtracted, or it can be avoided by using cross-correlations. In contrast, for large-scale structure, the theoretical model itself has a noise, which imprints a noise bias (P err ) on the measured galaxy power spectrum. Its amplitude, and potential scale dependence, depend on the sample of galaxies under consideration; since they are unknown in general, the amplitude and scale dependence of P err need to be marginalized over and P err cannot simply be subtracted from the measured galaxy power. Our goal in this paper is to characterize the model error and the induced power spectrum noise bias for simulated halos. This can serve as a guide for the expected amplitude and scale dependence of the noise bias of the galaxy power spectrum in a real survey, and, as explained above, it quantifies the amount of cosmological information retained by the bias expansion.

B. Scale Dependence of the Model Error
The above Fisher information represents the inverse variance with which parameters like the model amplitude A can be measured when modeling a measurement of the halo density with the bias expansion. A different question is whether the resulting parameter measurements are also unbiased. This is not determined by the fractional size of the model error or noise relative to the size of the true halo density, but by our ability to describe the expectation value of the true halo power spectrum (or any other observable). For that, we need to parameterize the noise power spectrum P err (k), and ask how accurate that parameterization is, i.e. how well the sum of P model (k) and the parameterized P err (k) matches the observable P truth (k).
A common and simple choice for data analyses is to parameterize the model error with a scale-independent constant, P err (k) = const. This approach is correct if P err (k) is really independent of scale, which is expected theoretically on Cubic bias Linear Std.
Eul. bias Figure 8. Fractional deviation of the mean-squared model error Perr(k) from a constant in k, for the linear Standard Eulerian bias (black) and cubic bias (orange). Including the nonlinear bias terms makes Perr(k) flatter at k 0.1 hMpc −1 , thereby pushing the wavenumber where the deviation of Perr from a constant exceeds ±1% of the halo power spectrum P hh (blue region), or ±5% of the low-k constant (green region), to higher k. As a consequence, including nonlinear bias terms can extend the k-range usable in a data analysis that assumes a k-independent model error or shot noise, although at the price of introducing more free bias parameters. In this figure and the next ones, the linear Standard Eulerian bias model uses a free transfer function β1(k); the cubic model instead uses the 6-parameter theory fit described in the main text, and includes the full δZ field. The low-k constant against which we compare is computed by tripling the width of k bins to ∆k 0.038 hMpc −1 to reduce the noise of the measured Perr, averaging over realizations, and computing the mean of this rebinned Perr(k) at k < 0.15 hMpc −1 . Shaded regions around solid curves represent the 1σ uncertainty estimated from the scatter between the five simulations. scales much larger than the typical size of halos (e.g., [33,48]), and this is indeed what we found for the nonlinear bias models on large scales. But on small scales, the measured model error does depend on scale. If that scale dependence is sufficiently strong, ignoring it can potentially bias cosmological parameter measurements, because the scale dependence of the error could be misinterpreted as a cosmological signal. To account for this, we either need a more general parameterization of P err (k), or we need to exclude from data analyses all small scales k > k max where the scale dependence of the model error is significant. In the next subsections we will investigate the latter approach in more detail, quantifying the scale dependence of the model error and determining the k max up to which it is safe to assume a constant P err (k).

Simulation Results
Let us start by quantifying the scale dependence of the model error. Fig. 8 shows the fractional deviation of the measured model error power spectrum P err (k) from the constant low-k component of P err , for the linear Standard Eulerian (black) and cubic (orange) bias model. For the linear Standard Eulerian model, P err (k) starts to deviate from a constant by more than 5% at k 0.1−0.15 hMpc −1 ; for the cubic model, that only happens at k 0.3−0.4 hMpc −1 for the three low halo mass bins, and at k 0.5 hMpc −1 for the most massive bin. Including the nonlinear bias terms therefore increases the k-range where P err is approximately constant by a factor of more than two.
We can also ask how large this scale dependence of P err is compared to the amplitude of the measured halo power spectrum, [P err (k) − const]/P truth (k). This is shown in Fig. 9. For the linear Standard Eulerian bias model, the scale ±1% of P hh Expanding Z

Quadratic bias
Cubic bias Linear Std.
Eul. bias Figure 9. Deviation of the model error power spectrum Perr(k) from a constant in k, relative to the halo power spectrum P truth , for the linear Standard Eulerian bias (black) and cubic bias model (orange). The blue band shows ±1% of P truth . This shows that by including nonlinear bias terms the deviation of the model error from a constant in k becomes relevant compared to the halo power spectrum at higher k than for linear bias, allowing smaller scales to be included in an analysis that assumes a constant model error power spectrum. The cubic model shown in solid orange includes the full δZ . Expanding δZ gives a slightly larger error at k 0.2 − 0.3 hMpc −1 for the halos in the top left panel (as expected theoretically for such low levels of noise; see Section V A 1), but does not visibly affect the curves for the halos in the other panels. Dropping the cubicδ3 term from the model increases the scale dependence of Perr at k 0.1 hMpc −1 for the halos in the lower left panel, but does not visibly affect Perr for the halos in the other panels. As before, shaded regions around solid curves represent the 1σ uncertainty estimated from the scatter between the five simulations. dependence of P err exceeds 1% of P truth (shown in blue in Fig. 9) again around k 0.1 − 0.15 hMpc −1 . In contrast, the flatter P err of the cubic model exceeds 1% of P truth only at k 0.25 − 0.5 hMpc −1 , depending on halo mass.
These results depend only mildly on details of the quadratic or cubic bias model. Expanding the Zel'dovich density δ Z in the bias model using Eq. (19) only has a visible effect for the lowest halo mass bin where the number density is so large that P err 30 h −3 Mpc 3 at low k, which is sufficiently small that corrections from expanding δ Z become relevant. The cubic termδ 3 only affects the model error of the 10 12.8 − 10 13.8 h −1 M halos; for these halos, the quadratic bias parameter β 2 is close to crossing zero and is smaller than the linear and cubic bias parameters, so that the cubic bias is relatively more important (see also Section V C and Figures 12 and 17 below). Other than that, the scale dependence of the model error with full or expanded δ Z and with or without cubic bias term is rather similar.

Detectability of the Scale Dependence of the Model Error
The scale dependence of the model error is only relevant if it is strong enough to be statistically detectable by galaxy surveys, because in that case it may bias cosmological parameters if unaccounted for. We therefore compute the significance with which any scale dependence of P err could be detected in the halo power spectrum for a survey FOF halos at z = 0.6, no mass weighting covering a volume V survey and using Fourier modes up to k max . This is given by Here, a Gaussian covariance is assumed for the measured halo power spectrum P truth , and the sum is over k bins with width ∆k (the result does not depend on ∆k if the binning is sufficiently fine). As expected, the significance of the scale dependence of the model error is determined by the size of the scale dependence relative to the amplitude of the measured halo power spectrum (shown in Fig. 9), and it increases with the survey volume and with the highest included wavenumber k max , because these determine the number of 3-d Fourier modes. Importantly, this is the bestcase scenario for the model error because we assume all bias parameters to be perfectly known (by matching the field level prediction against the simulations).
We evaluate Eq. (47) for V survey = 1 h −3 Gpc 3 as a function of k max in Fig. 10. This shows that the scale dependence of P err (k) cannot be detected for k max < 0.1 hMpc −1 , but it becomes a 1σ effect around k max 0.1 hMpc −1 for the linear Standard Eulerian bias model and at higher k max for the cubic model.   Table IV. Maximum wavenumber kmax when a scale dependence of the model error can be detected with 1σ in a 10 h −3 Gpc 3 volume (or in a 0.5 h −3 Gpc 3 volume, shown in brackets), for the linear Standard Eulerian and the cubic bias models. The kmax of the cubic model is typically 2 to 3 higher than that of linear Standard Eulerian bias. This can improve measurements of cosmological parameters that affect the galaxy power spectrum at these scales, for example the sum of neutrino masses. The values carry a significant uncertainty due to the noise in our measurement of the model error, as shown in Fig. 11.
as a function of the survey volume. We also summarize this in Table IV for two survey volumes.
For the linear Standard Eulerian bias we find the following. For the lowest halo mass bin, the scale dependence becomes significant for k max 0.1 hMpc −1 in a 10 h −3 Gpc 3 volume or for k max 0.14 hMpc −1 in a 0.5 h −3 Gpc 3 volume. For the highest halo mass bin, the critical k max is similar, while for the intermediate mass halos it is somewhat lower, k max 0.07 − 0.1 hMpc −1 . For the cubic bias model, the model error is much less scale-dependent, and its scale dependence becomes a 1σ effect only at higher k max , typically around k max 0.15 − 0.3 hMpc −1 (see Table IV).
The cubic model therefore extends the k range where an analysis with scale-independent model error may in principle be safe by a factor of 2 to 3 compared to the linear Standard Eulerian bias model. In principle, this could reduce cosmic variance error bars of parameters by a factor of √ To illustrate the resulting increase in constraining power, we consider an idealized example where all parameters are fixed except for the overall amplitude A of the clustering part of the halo power spectrum, and ask how well this amplitude can be measured. For this purpose, bold black lines in Fig. 11 show for what volume and k max the amplitude A can be constrained to 1%, 0.5%, or 0.1% (the lines are obtained similarly to Eq. (47) but summing over (P model /P truth ) 2 ).
For the lowest halo mass bin, using the linear Standard Eulerian bias with a scale-independent model error can safely constrain the amplitude A to 0.5% if the volume is larger than ∼ 3 h −3 Gpc 3 (corresponding to volumes where the black long-dashed curve in Fig. 11 is in the grey shaded area). In contrast, using cubic bias for these halos, the amplitude can be constrained to 0.1% if the volume is larger than ∼ 3 h −3 Gpc 3 , because the model error is less scale-dependent.
For the more massive halos with a more realistic number density, the scale dependence of the error becomes relevant on larger scales, so that the amplitude cannot be constrained as well, typically σ A /A 0.7 − 1% for linear Standard Eulerian bias and σ A /A 0.3 − 0.5% for cubic bias in a 3 h −3 Gpc 3 volume. Therefore, parameter constraints that rely on measuring the halo power spectrum at redshift z = 0.6 with subpercent level precision while assuming a scale-independent model error or shot noise will require nonlinear bias terms, or a rather large volume, V survey > 10 h −3 Gpc 3 . 15 We can also ask what volume and k max are needed to constrain the model amplitude A with a certain precision. Let us pick the halo sample in the upper right panel in Fig. 11. If this is modeled with linear Standard Eulerian bias, a 1% amplitude measurement is possible by using all modes up to k max = 0.1 hMpc −1 in a 1 h −3 Gpc 3 volume. Or one could observe a larger volume, so that there are more 3-d modes per k, and use a lower k max . Using k max > 0.1 hMpc −1 for these halos with the linear Standard Eulerian bias model would require accounting for scale-dependent corrections of P err . Measuring the amplitude to 0.5% in a volume smaller than 10 h −3 Gpc 3 requires a higher k max . When assuming P err = const, this is only possible with the cubic bias model. A 0.1% amplitude measurement is not possible with these halos when assuming P err = const, except maybe for very large volumes.
In reality, the signal one is after may only come from some range of scales, and there may be degeneracies between parameters, so that cosmological parameters will be less well constrained than the simple model amplitude A that we used above. We also emphasize that the k max values reported here are only approximate estimates because of uncertainty in our measurement of P err (k) from only five simulations. This is indicated by thin grey lines in Fig. 11, which show how the critical k max changes when computing the low-k constant part of P err (k) from a different k range.
Eul. bias Figure 11. Relevance of the scale dependence of the model error as in Table IV, but for more survey volumes. For kmax and Vsurvey within shaded regions, it is safe to assume a scale-independent model error, Perr = const. Outside shaded regions, the scale dependence of the model error is detectable with more than 1σ, so that assuming Perr = const can bias cosmological parameters by 1σ or more. For the linear Standard Eulerian bias the threshold is typically at kmax 0.1 − 0.15 hMpc −1 (grey region). In contrast, for the cubic bias model the maximum wavenumber kmax is higher by a factor of 2 to 3 (orange region). For comparison, the bold black lines show what kmax is required for a given volume to measure the amplitude A of the model power spectrum to 1% (solid), 0.5% (dashed), or 0.1% (short-dashed), assuming Gaussian cosmic variance and that all other parameters (six parameters for transfer functions and one for Perr) are perfectly known. In the case of a 10 h −3 Gpc 3 volume, the total number of halos in the four panels isnVsurvey = 430 M, 57 M, 5.6 M, and 0.26 M. All results apply to a single redshift at z = 0.6. Due to noise in our determination of Perr, the shaded regions are somewhat uncertain. This is indicated by the semi-transparent lines, which show results when the k range for determining the constant part of Perr is changed in the same way as in Fig. 10. Alternative estimates for the breakdown of the bias models are shown as solid horizontal lines at the bottom of each panel. These are computed as the value of k where |Perr(k) − const|/P truth (k) first exceeds 0.5% or 1% for the linear Standard Eulerian bias model, or 0.2% and 0.5% for the cubic model (to reduce the impact of noise we use a spline smoothed Perr). For the linear Standard Eulerian bias we assume that the transfer function β1(k) is a free function k, whereas for the cubic bias we use the 6-parameter fit of the transfer functions βi(k) described in the main text. For the cubic model in the lower right panel we use the fitting formula from Fig. 10 because it looks more robust in that case; otherwise we use the full k dependence shown in Fig. 10.
We emphasize that the primary goal of this section was to investigate whether there is any physical scale dependence of the model error when describing simulated halos with the best possible bias expansion based on the shifted operators we include, and how relevant this may be. If such a scale dependence of P err is too strong, it is wrong to account for it with scale-independent rescaling of the shot noise in data analyses, and we have quantified for what scales and volume this is the case. In practice there can be other reasons and systematics that may require a lower k max cutoff in data analyses.

Interpretation of the Scale Dependence of the Model Error
The critical k max values shown in Fig. 11 and Table IV above depend only weakly on the survey volume. The reason for this is that the assumption of a scale-independent model error breaks down abruptly once a critical k max value is exceeded, which is the case because the detectability of a scale dependence of the model error scales strongly with k max , as shown in Fig. 10. This strong scaling with k max can be understood as follows. On scales larger than the typical size R M of a halo, the true stochastic part of the modeling error stoch can be written as [48,56] where d i andd i are k-independent parameters, and 0 (x) is a stochastic noise field with P 0 0 (k) = const ∼ 1/n and On scales larger than the typical size of a halo, k k M ∼ 1/R M , the stochastic contribution to the error power spectrum therefore takes the form [33,48] Even if P 0 0 had some k dependence, expanding P 0 0 (|k − p|) in k/p 1 on large scales would again lead to a constant term and a correction scaling like k 2 . In addition to the stochastic term, the error power spectrum also has a part related to the higher order contributions in the bias model P err (k) = P stoch (k) + P 2−loop (k) + . . . .
Let us briefly discuss the relative size of the different contributions to the noise power spectrum. The k 2 correction to P err is roughly given by In this formula M min is a typical mass of a halo in the lowest mass bin and 1/n min the corresponding shot noise. In the last step we have for simplicity assumed that R Mmin ∼ 1/k NL and that the number density of halos scales like n ∼ M −1 , which is consistent with the halo mass function in the relevant range of masses. On the other hand, the two-loop contribution at z = 0.6 is roughly given by [99] where P (k) is the nonlinear matter power spectrum. Comparing the two contributions we find that the two-loop power spectrum is sub-dominant when For the lowest mass bin the number on the l.h.s. of this inequality is of order O(few). This is expected, as we have already seen that the two-loop contribution has a similar size as the shot noise for the least massive halos. However, for higher halo masses the relevant number quickly becomes much less than one. Even though the linear bias increases with mass, this is not nearly as fast as needed to compensate the strong mass dependence M −5/3 . For this reason it is safe to assume that for the halo masses relevant for future LSS surveys the stochastic k 2 contribution dominates over the two-loop contribution to P err , so that the following relation holds on scales in the perturbative regime: P err (k) − c 1 ∝ k 2 . Then, the SNR 2 for the scale dependence of P err , given by Eq. (47), scales as where the k 3 max factor represents the number of modes, and we assumed P model (k) ∝ k n in the range of wavenumbers that dominates the sum in Eq. (47). Typically, n −1.5 at k 0.2 hMpc −1 , so that we expect SNR 2 ∝ k 3+γ max with γ 7. Indeed, fitting the SNR 2 shown in Fig. 10 with a power law in k max , with dimensionless fitting parameters N and γ, we find that γ 6 − 7 provides an acceptable fit for the cubic model, in agreement with the expected value of γ 7. An exception to this are the halos in the lower left panel of Fig. 10, for which we find γ 3.2; this might be an indication of missing bias terms for these halos, which represent a somewhat special case as we already saw above because the cubic bias term is so important for these halos. Because of the steep scaling with k max , deviations from a scale-independent model error become almost immediately relevant once a critical value of k max is exceeded, without much dependence on the survey volume, as noted above.
One may wonder whether fitting the measured P err (k) with c 1 + c 2 k 2 provides a better fit than fitting P err (k) with just a constant c 1 . As expected from the scaling arguments above, we find indeed that adding the c 2 k 2 term allows to describe the measured P err (k) up to higher k. However, depending on the range of scales and on the mass bin, it is not entirely clear whether this is due to the expected k 2 term or higher-order bias terms that we have ignored in the model and that contribute to P err (k) in a way that might mimic k 2 over a small k range. After all, in the previous analysis we have only kept the leading two-loop term and our estimates break down close to k NL . Additional confusion comes from the fact that including c 2 k 2 to describe P err requires adding another free parameter c 2 , so that the k range where the fit holds gets always extended just because there is more flexibility to fit the measured P err (k). Another problematic aspect of this measurement is that in some cases P err is orders of magnitude smaller than the halo power spectrum so that our determination of P err (k) − c 1 has significant noise because of the small number of simulations used. With more simulations it may be possible to establish the presence of the k 2 term in P err (k) or higher order bias terms more conclusively.
At this stage it is therefore not clear whether adding a k 2 term to the noise or including higher order bias parameters leads to a larger improvement of cosmological constraining power for different ranges of scales and halo mass. We leave this question for future work. At a practical level, it is of course always possible to include a k 2 term in the noise, marginalize over its amplitude, and see if it improves cosmological constraining power. Even if the k 2 term only happens to describe missing two-loop terms (without exploiting their cosmology dependence), it may extend the k range where the model fits the data, potentially improving the cosmological constraining power from the other terms in the model whose cosmology dependence is included.

Relative Size
We have demonstrated that nonlinear bias terms can lead to a substantial reduction of the model error at the field level. But which of the nonlinear bias terms are most important? To answer this, we compute the contributions of the individual bias terms to the complete model power spectrum, for the case of the cubic bias model, which is the most successful model that we have tested. We work with the orthogonalized bias operators so that there are no contributions from cross-spectra between different bias terms. Fig. 12 shows the contribution of each bias term relative to the total halo power spectrum model. As expected, the linear bias term is clearly the most important one, contributing more than 50% in power to the total model at all wavenumbers and for all halo masses, and more than 98% in power on large scales for all halos except the most massive ones.
For the two low-mass halo bins, the second largest contribution is the quadratic bias β 2δ FOF halos at z = 0.6, no mass weighting Figure 12. Fractional size of bias terms contributing to the best-fit cubic bias model. The linear bias term is clearly the dominant contribution for all halo samples on all scales. The next most important term is typically the quadratic bias term δ ⊥ 2 . It typically contributes 10% in power at k 0.1 hMpc −1 . As expected, it contributes less (1 − 10% in power), on larger scales, 0.02 hMpc −1 ≤ k < 0.1 hMpc −1 , while it contributes more (up to 50% in power) on smaller scales, k > 0.1 hMpc −1 . An exception are the halos in the lower left panel, where this term is smaller (as expected for this range of halo mass), and the cubic termδ ⊥ 3 is more important, contributing between 2% and 20% in power. The shifted quadratic tidal bias termG ⊥ 2 is always smaller thanδ ⊥ 2 , but it still contributes more than 1% in power at k 0.1 hMpc −1 . Grey shaded areas represent the standard error of the mean estimated from five independent realizations. We use the model where δZ is expanded and absorbed by bias operators using Eq. (19). Results are averaged over five simulations and the shaded regions represent the 1σ uncertainty estimated from the scatter between them.
20% at k = 0.2 hMpc −1 . The quadratic tidal term and the cubic term contribute about an order of magnitude less in power, but still contribute with more than 1% to the total model power spectrum at k > 0.1 hMpc −1 , especially the quadratic tidal term.
For the more massive 10 12.8 − 10 13.8 h −1 M halos the cubic term is the second largest contribution for all wavenumbers, contributing 2% to the power spectrum on large scales and up to 20% on smaller scales. The quadratic bias contributes less, but it is still between 1% and 4% of the total power spectrum on most scales. The quadratic tidal term is somewhat smaller than that, reaching a maximum contribution of about 2% at k = 0.2 − 0.4 hMpc −1 . This is roughly the halo mass range where we expect the quadratic bias to cross zero while the linear and cubic bias are large, so it is not surprising that the quadratic contribution is smaller than that of the cubic term for these halos.
For the most massive halos, with 10 13.8 − 10 15.1 h −1 M , the quadratic bias term is the second largest contribution at k < 0.6 hMpc −1 , representing 5% to 15% of the total model power spectrum, and the cubic term takes over at higher k. The cubic term is again larger than 2% of the total power spectrum on all scales. This shows that different bias terms dominate on different scales for different halo masses, which is not surprising because the bias parameters change with halo mass. Also, their contribution to the total model power spectrum is often larger than 1%, which supports the finding above that they should be included when modeling the halo power spectrum to 1% or better.
Given the size of the individual bias terms we can also estimate how well bias parameters need to be known to be able to predict the halo power spectrum with a given precision [99]. For example, for the two low-mass halo bins, modeling the halo power spectrum to 1% at k = 0.1 hMpc −1 (k = 0.2 hMpc −1 ) requires β 2 to be known to 10% (5%),  ] FOF halos at z = 0.6, no mass weighting Figure 13. Similar to Fig. 12 but showing the broadband shape of the bias terms contributing to the cubic model.

Absolute Size
To see the broadband scale dependence of the different bias terms contributing to the power spectrum, Fig. 13 shows the absolute power spectra of the bias terms. The power spectra of the quadratic and cubic bias terms,δ ⊥ 2 and δ ⊥ 3 , are mostly constant in k at low k, and become only visibly k-dependent at k 0.1 hMpc −1 . If we were to perform a data analysis using only the halo power spectrum, we would not be able to separate such constant terms from the constant part of the model error or shot noise, i.e. it seems challenging to constrain β 2 and β 3 using measurements of the halo power spectrum at k 0.1 hMpc −1 . This would be a potential concern, because such a data analysis might not be able to identify a reduced model error for the correct bias parameter values. The k dependence of quadratic and cubic bias terms at higher k might come as a rescue, as might a different type of data analysis that uses a likelihood at the field level or higher-order statistics like the bispectrum. Figures 12 and 13 also show that the tidal quadratic bias termG ⊥ 2 contributes more than 1% in power only at k 0.1 hMpc −1 , and it contributes typically a few times less in power than the quadratic bias termδ ⊥ 2 , although both terms are at the same order in perturbation theory. This might just be a consequence of the ordering we use to orthogonalize bias operators. An alternative explanation would be that Lagrangian space proto-halos may be less sensitive to the quadratic tidal field than to the local quadratic bias term, which may also be the reason why the quadratic tidal bias of Lagrangian proto-halos has only recently been identified with simulations [7,8].

VI. TRANSFER FUNCTIONS IN PERTURBATION THEORY
The main purpose of this section is to derive the k-dependence of the transfer functions β i (k) at leading order in perturbation theory. At the level of a single realization this means that we have to keep all relevant cubic operators in the bias expansion that contribute to the one-loop power spectrum. The steps in the derivation are the same as in Section II. In this section we use the simplified notation δ h ≡ δ model h .

A. Bias Expansion Including Cubic Operators
Let us begin with the expression for the halo density field in Lagrangian space, keeping all cubic operators in the bias expansion The explicit formulas for the cubic operators can be found in Appendix D. As before, to go to Eulerian coordinates we have to displace halos using the nonlinear displacement field ψ Our goal is to rewrite this expression in terms of shifted operators To achieve this, we split the displacement field in the halo density Eq. (58) into the linear piece ψ 1 , which is kept in the exponent, and the nonlinear correctionψ ≡ ψ 2 + ψ 3 + · · · , which is expanded. Keeping all terms up to third order, as required for the one-loop power spectrum, we get 16 Notice that we have rewritten k which multipliesψ as a derivative with respect to q. After integration by parts, the derivative can act on ψ 1 , δ L h andψ, so we get The third term in the square brackets can be further simplified. The product ∂ b ψ a 1ψ b starts at third order in perturbation theory and we can neglect δ h . One k a that appears in this term can be written as a derivative with respect to q a . After another integration by parts and keeping terms of order three or less, we find Each term in this formula has the form of a shifted operator (17). We have already mentioned that the shift of the uniform density, i.e. the Zel'dovich density, can be written in terms of bias operators (see Appendix A). Similarly, the other terms {∂ b ψ a 1 ∂ a ψ 2b , ∇ · ψ 2 , ∇ · ψ 3 } can each be expressed as linear combinations of bias operators. We leave the details of this calculation for Appendix D. As a result, the effect of these terms is to effectively change the values of the bias parameters in the original Lagrangian-space bias expansion in δ L h . On the other hand, the last term S 3 ≡ ψ 2 · ∇δ 1 is a shift term of the linear field by ψ 2 , which cannot be expressed in terms of bias operators. It is a new contribution that has to be taken into account separately. However, the coefficient of this term is completely fixed by the equivalence principle-i.e. this term does not come with an extra free parameter.
Putting everything together, the bias model becomes Notice that the local bias parameters remain unchanged in this process (apart from the linear bias b L 1 which as expected increases by 1 when going to Eulerian space). Indeed, this is a general statement that is true at all orders in perturbation theory. The simplest way to see this is to consider the limit of the dark matter density field, when all Lagrangian bias parameters are equal to zero. In this limit, all local operatorsδ n (i.e., shifted δ n (q)) with n ≥ 2 have to drop from the expression for the nonlinear dark matter density field because otherwise the power spectrum would have nonphysical shot noise contributions. The other, nonlocal bias parameters are related to the original Lagrangian ones in the following way:

B. Transfer Functions
In Sections IV and V, we compared the halo density field with the bias model We can use the perturbative bias expansion derived in the previous section, Eq. (63), to predict the shapes of the transfer functions β 1 (k), β 2 (k) and β G2 (k) at one-loop order in perturbation theory as follows.
Let us first consider the cubic operatorsÕ 3 in Eq. (63). These operators can easily be decomposed into the part parallel toδ 1 and the part orthogonal toδ 1 At one-loop level, the orthogonal cubic fields are also effectively orthogonal to second-order operators, because their correlation is higher order in perturbation theory. This means that the fieldsÕ ⊥ 3 (k) do not contribute to the one-loop power spectrum and we can drop them from our expressions. In a similar fashion, we can projectδ 2 andG 2 toδ 1 and make the remaining fields orthogonal to each other. It is then straightforward to compute the transfer functions in terms of the bias parameters: 3 and other bias parameters are given by Eqs. (64)- (67). Indeed, it is easy to check that with these transfer functions the power spectrum of the halo density field in Eq. (68) exactly agrees with the usual one-loop halo power spectrum (up to the fact that the IR resummation is automatically included if one uses shifted operators).
These transfer functions can be further simplified in practice. For example, not all cubic operators contribute to β 1 (k) with independent shapes. At leading order in perturbation theory the only two nontrivial k-dependent terms come fromΓ 3 andS 3 . The other operators give constant contributions, which are degenerate with b 1 . 17 Once this is taken into account, β 1 (k) becomes Notice that b 1 in this formula is different from its starting value due to the degenerate contributions from cubic operators. The new value corresponds to the so called renormalized bias b 1 . The important point is that we kept the same parameter multiplying the operatorS 3 . Even though this may not be obvious from just a few leading orders in perturbation theory, this choice is imposed by the fact thatS 3 comes from the shift of the halo density field by the second order displacement. This term is fixed and has no extra free parameters, even when renormalization is taken into account. Finally, we have to add a k 2 term to the transfer function β 1 (k) with a free coefficient. In analogy with the EFT counterterm for the one-loop matter power spectrum we label this parameter c 2 s even though this counterterm is there to absorb all UV contributions from correlation functions of the form δ 1Õ3 and the bias coefficients from the higher-derivative bias operators such as ∇ 2 δ.
Let us now turn to the second transfer function. This expression can be further simplified. The first step is to write which implies that at one loop δ⊥ 2δ ⊥ 2 = δ 2δ2 because the second term is higher order in perturbation theory. For the same reason, at large scales we can replace δ⊥

2G
⊥ 2 with δ 2G2 . As a result, we can write the transfer function β 2 (k) as follows In the limit k → 0 the numerator of the second term scales like O(k 2 ) while the denominator approaches a constant. Therefore, the second term vanishes on very large scales. Notice that this contribution is not suppressed by loop factors because both numerator and denominator are of the same order in perturbation theory. For this reason, when the transfer functions are measured at not-so-large scales where the scaling O(k 2 ) is not valid, the second term is not necessarily negligible. However, because of the large constant contribution to δ 2δ2 , the second term turns out always to be small enough, given the size of the higher loop corrections that we are neglecting and final error bars with which we determine the bias parameters.
To summarize, we use the following minimal model to fit the k-dependent transfer functions This model has 5 free parameters, the same as the one-loop power spectrum. When we use the cubic bias model, we add one extra parameter, b 3 , which is fitted from the low-k limit of β 3 (k).

C. Power Spectra of Shifted Fields from Theory and on a Grid
To fit the transfer functions with Eq. (76) we need to calculate the power spectra Õ aÕb of shifted operators that enter Eq. (76). As we already mentioned, this calculation is the same as in [57,70], and more details can be found there. Here we summarize only the main steps. Let us start from the definition The strategy to evaluate the expectation value is to bring the operators O a and O b to the exponent and use the cumulant theorem. This can be achieved using the following trick The cumulant theorem reads and since we are interested only in terms at leading order in λ, the final expression for the expectation value of the exponential is given by . Notice that we have truncated the sum. The reason is that at one-loop level the operator O ab can be at most fourth order in perturbation theory and therefore it can be contracted with at most four displacement fields ∆ψ in a connected n-point function. Furthermore, for different terms in transfer functions the operator O ab can be proportional to even or odd powers of initial density fields and some of the correlation functions vanish for Gaussian initial conditions. We will refer to the first non-vanishing term proportional to λ as the leading-order term (LO) and higher order terms in perturbation theory as next-to-leading (NLO) and next-to-next-to-leading orders (NNLO). Let us take a look at the simplest example of δ 1δ1 . In this case only two terms are non-vanishing All connected n-point functions on the r.h.s of this equation involve only linear fields and can be calculated by going to momentum space. The result is a function of q and one can do the Fourier transform to calculate the power spectrum. Fig. 14 shows the power spectrum prediction with and without the NLO corrections. The points represent measurements from a realization of the linear density field shifted by ψ 1 . The agreement when NLO corrections are included is quite good. Notice that in this example the NLO term is of one-loop order and has to be kept for consistency.  Other correlators of interest for transfer functions are those where the shifted linear field is correlated with a shifted second order operator, such as δ 1δ2 or δ 1G2 . These correlation functions would vanish for non-shifted operators in the case of Gaussian initial conditions. Due to the correlations induced by the linear displacement field ψ 1 , they are not zero but the LO terms are roughly of the order of the one-loop corrections. For example, A similar expression can be written for the correlation withG 2 . The correlation functions in the brackets can be again straightforwardly calculated. Notice that the NLO term in this formula is higher order in perturbation theory because it is proportional to the cubic power of the linear power spectrum. Indeed, these NLO corrections are small. In Fig. 15 we compare the theoretical prediction with LO and NLO with the measurement of the cross spectra of shifted fields in a given realization. As we can see, the NLO corrections are indeed small compared to one loop terms. Given that all our analysis is valid only at one-loop level, the NLO terms in these correlation functions can be neglected.
Finally, in a similar way one can calculate the other two correlation functions δ 1Γ3 and δ 1S3 in the transfer function β 1 (k). In this case there are three terms that survive in the cumulant expansion, but the LO term is already of one-loop order and the higher order contributions can be ignored.

D. Fitting the Transfer Functions from Simulations
In the previous sections we have derived the transfer functions in perturbation theory. Working at the one-loop level, for the cubic bias model we got while the other three transfer functions are constant In this section we compare this theoretical prediction with the free transfer functions β i (k) measured from simulations by minimizing the mean-square model error.
for the four mass bins. Treating all k bins as independent and minimizing the power of the model error in each k bin gives the black lines, with uncertainty shown in grey (estimated from the scatter between the five independent simulations). When fitting these transfer functions with the theoretical model described in the text, using six k-independent parameters, we obtain the red and orange lines. These fits either include theoretical errors, effectively restricting the fitting region to low k (red), or they are fitted up to higher k without theoretical errors (orange). The model error is almost the same for the free transfer functions and the smooth theory fits (see dashed lines in Fig. 6 above).
One obvious problem with fitting of the transfer functions is that the perturbation theory expressions are valid only in the low-k limit, while most of the constraining power in the fit comes from the small scales. In order to solve this problem and avoid overfitting, we use the prescription from [99] to consistently include theoretical errors in the estimate of the bias parameters. The covariance matrix used to calculate χ 2 can then be written as a sum of two terms: where k i is the wavenumber in bin i. The first term is related to the noise in the halo power spectrum. For example, in the case of β 1 (k) this contribution in the low-k limit can be estimated as where N sim is the number of simulations, k F the fundamental mode of a simulation box and δk the width of the bins. In practice our model for C noise (k) is based on a fit of the scatter between 5 simulations in the measurement of each of the transfer functions. In this way our estimate of this contribution is valid on all scales and for each β i (k). As we already explained, the second term in the covariance matrix comes from the theoretical uncertainties. Our estimate for the size of the one and two-loop corrections to the transfer functions is and Given that in our perturbative model  Fig. 6 above, is the same with or without δZ in the model, except for the lowest halo mass bin, where the model error is small enough that correction terms from expanding δZ become visible. We add 1 and 1/2 to β1 and βG 2 for easier comparison against bias model without δZ , noting that δ h = δZ + β1δ1 + · · · ≈ (β1 + 1)δ1 + β2δ ⊥ 2 + (βG 2 + 1/2)G ⊥ 2 + β3δ ⊥ 3 .
coherence length of the transfer functions. In other words, this is the typical scale at which the transfer functions vary with k. Given that they are quite smooth, we choose ∆k = 0.2 hMpc −1 . Our fits are not sensitive to the choice of ∆k as long as it remains in a reasonable range of values. We choose k max = 0.5 hMpc −1 for β 1 and k max = 0.2 hMpc −1 for all other transfer functions. Given our theoretical errors, the values of the best fitted bias parameters saturate well before k max .
The result of fitting the transfer functions using the procedure described so far is shown in Fig. 16 (red lines). The k dependence of β 1 (k) is well described at k ≤ 0.3 − 0.4 hMpc −1 when using theoretical errors in the fitting procedure. The constant pieces of the nonlinear bias transfer functions β 2 (k), β G2 (k), and β 3 (k) are in reasonable agreement with those measured from simulations at low k. The typical relative error of the fitted parameters is roughly 1% for b 1 and roughly 10% for all other parameters. As we have shown in Fig. 6 above, the model error changes only minimally when using the theory fits instead of the full transfer functions from simulations for the cubic model, confirming that a perturbative description of the transfer functions is sufficiently accurate for our purposes.
If the Zel'dovich density is kept explicitly and not absorbed by shifted bias operators using Eq. (19), the transfer functions remain unchanged at low k (except for the expected offset of 1 for β 1 and 1/2 for β G2 ), but they change their shape at high k, as shown in Fig. 17. This is because the higher order corrections to Eq. (19) become important at high k. However, the theory prediction for the transfer functions with theoretical errors are flexible enough to capture this difference.
So far we have used the perturbation theory predictions in a rigorous way, accompanied with the appropriate estimate of the neglected higher order corrections. This effectively restricts the range of applicability of perturbative results to k ≤ 0.4 hMpc −1 , which is a bit smaller than the rough estimate for the nonlinear scale k NL at z = 0. 6. In what follows we use the perturbation theory prediction for β 1 (k) in a different way-just as an ansatz for the fitting function. We modify our fitting procedure to remove any theoretical error for β 1 (k) and extend the range of wavenumbers to k max = 0.8 hMpc −1 . The theoretical errors and range of scales remain the same for other transfer functions. In other words, the point of this exercise is to test whether the functional form of β 1 with 5 free parameters has enough freedom to fit the full shape of the measured transfer function. The results are shown in Fig. 16 and Fig. 17 as orange lines. We can see that extending the fit to higher k without accounting for theoretical error improves the fit of β 1 (k) almost up to k 1 hMpc −1 , although in some cases this gives a worse fit of the constant part of the higher order transfer functions, which have a smaller effect on the model error than corrections to β 1 (k).
The best-fit values for the bias parameters are shown in Fig. 18 and in Tables V and VI. The marginalized 1σ uncertainty is typically ∼ 1% for b 1 and ∼ 10% for all other parameters. As expected, the linear bias increases with halo mass, from b 1 0.9 for the lowest halo mass bin to b 1 3.5 for the most massive halos; the local quadratic and cubic bias parameters b 2 and b 3 are negative for low and intermediate mass halos, and become large and positive for the more massive halos. The quadratic tidal bias parameter is positive for low masses and negative for the most massive  halos. These trends broadly agree with theoretical expectations and previous measurements of bias parameters in the literature using different measurement techniques [7,8,73]. However, let us again stress that we expect b 1 to be the only bias parameter that is equal to its renormalized value, measured for example from the power spectrum. The other bias parameters can be different from the values inferred from the correlation functions. The fact that b 2 or b G2 are close to their renormalized values indicates that our prescription for building shifted operators is not very sensitive to very high-k modes. It would be interesting to see if this remains true at higher orders in perturbation theory and we leave this question for future work.
In most cases the best-fit parameters are similar with or without theoretical errors included in the fitting procedure, and with or without absorbing δ Z in the bias operators. An exception are c s and b Γ3 , which are fitted only from the k-dependence of β 1 (k) and which vary significantly between fitting procedures. This is due to a strong degeneracy of these two parameters when fitting only β 1 (k). This degeneracy could be broken by including the shifted Γ 3 operator in the bias expansion on the grid and measuring its transfer function. We do not attempt to do this here, noting that the transfer functions are fitted sufficiently well for our purposes independently of the fitting method.

E. Power Spectrum Model Using Approximate Transfer Functions
In Section V above we focused on the model error power spectrum P err assuming optimal transfer functions that minimize P err (k) in every k bin. This led to several simplifications, including a relation between P err and the crosscorrelation coefficient r cc between model and truth (see Appendix B). How do these results change if we instead use the approximate transfer functions from this section? In particular, what are the implications for modeling of the power spectrum and inference of cosmological parameters? We explore this by calculating the impact of using approximate transfer functions β i = β i + ∆β i , where β i are the optimal transfer functions that minimize P err , and β i are their approximation using the perturbation theory fits (or any other basis of smooth functions).
Let us begin with the power spectrum of the model error. It is easy to show that the new P err is given by where for simplicity we have suppressed the explicit dependence on k. The relative change can be written as where f i is the fraction of the model halo power spectrum P model that comes from the operator O ⊥ i . The typical values of f i are f i ∼ 1 for the shifted linear field and f i ≤ 0.1 for higher order operators (see Fig. 12). Close to the nonlinear scale where the perturbation theory fits break down the error power spectrum is roughly a few times smaller than the halo power spectrum (depending on the halo mass). Therefore, if the transfer functions can be fitted better than O(10%), the change of P err compared to the optimal value is smaller than O(1%).
The situation is different if we are interested in the model for the halo power spectrum. In this case Notice that the leading correction is linear in ∆β i /β i , and the model for the halo power spectrum is therefore more sensitive to the error in the transfer functions. In cosmological parameter inference, in order not to get biased results, we would have to marginalize over the uncertainty ∆P model . In other words, this uncertainty acts like an extra noise. It is then interesting to compare ∆P model and P err and see which one is expected to dominate. It is easy to see that Plugging in typical numbers we can see that this ratio can be easily of order one (or even higher for the low mass halos) if the transfer functions are not approximated to better than O(10%) at all scales of interest for β 1 , or few × O(10%) for other transfer functions. In our perturbation theory fits this is marginally achieved.
It is important to point out that the transfer function fits are dominated by the low-k data, particularly when the theoretical errors are included. This leads to a relatively large variance on the best-fit parameters as explained in FOF halos at z = 0.6, no mass weighting Figure 19. Comparison of the power spectrum of the quadratic bias model against the true halo power spectrum measured in simulations. The orange solid curves assume best-fit transfer functions that minimize Perr(k) in every k bin for each realization.
In contrast, the black dashed curves use the five parameter approximation of the ensemble-averaged transfer functions using perturbation theory as described in Section VI E, designed to fit both the best-fit transfer functions that minimize Perr and the true halo power spectrum. The five parameter fit describes the true halo power spectrum at the 1% to 2% level up to k 0.2 − 0.6 hMpc −1 , depending on halo mass, although the uncertainty, estimated from the scatter between the five independent simulations and shown by shaded regions, is considerable. the previous section. It is then interesting to ask whether in this range it is possible to find a set of bias parameters that gives the correct model for the halo power spectrum. In order to answer this question we fit the measured power spectrum using the same bias model as used for the transfer functions. Importantly, we restrict the range of possible bias parameters to be compatible with the fits of the transfer functions. The results are shown in Fig. 19 where the relative error for the halo power spectrum is plotted as a function of k. The solid orange line corresponds to the model with the optimal transfer functions (minimizing P err ). The dashed black line corresponds to the perturbation theory model with five bias parameters determined to give the best possible fit to the halo power spectrum while still being compatible with the fits of the transfer functions. Only for the lowest mass bin the optimal transfer functions perform better than the perturbation theory fits. In all other cases, the simple fits are sufficient to model the measured halo power spectrum accurately on perturbative scales. For these halos (i.e., for all but the lightest halos), the approximate transfer functions describe the true halo power spectrum slightly better at high k than the optimal transfer functions that minimize P err ; this is expected because the approximate transfer functions are fitted such that P err is close to the minimal value and at the same time the model power spectrum is close to the true halo power spectrum, whereas the optimal transfer functions only minimize P err . 18

VII. RELATION TO STANDARD EULERIAN PERTURBATION THEORY
In this section we discuss how the above results relate to previous approaches and calculations in the literature based on Standard Eulerian bias and Standard Eulerian perturbation theory. Specifically, we will demonstrate in Section VII A that the Eulerian bias expansion fails to describe halos at the field level, and we will discuss the connection with the usual IR-resummation in Standard Eulerian Perturbation Theory in Section VII B.

A. Failure of the Standard Eulerian Bias Expansion at the Field Level
The Standard Eulerian bias model is given by where all operators are evaluated using the nonlinear matter density field δ. In the perturbative approach, δ is calculated using Standard Eulerian perturbation theory. Alternatively, the nonlinear matter density field can be measured from simulations. In this section we are going to show that both these approaches face problems when theoretical predictions are compared to simulations at the level of realizations.

Standard Eulerian Bias Using the Perturbative Matter Density Field
Let us begin by using the perturbative nonlinear matter density field δ as the input for the Standard Eulerian bias model. As in the rest of the paper, we restrict ourselves to operators up to second order and promote bias parameters to k-dependent transfer functions. The model for the halo density field then reads where we explicitly wrote the operators at second order in perturbation theory. For example, the second order density field can be calculated in a realization using a simple convolution where the F 2 kernel is given by Notice that the displacements (the second term in F 2 ) are treated perturbatively. We have already emphasized that this is the reason why Standard Eulerian perturbation theory fails on small scales in describing a realization of the density field of biased tracers. In order to illustrate this point more quantitatively, let us consider a very simple Universe where the true halo density field is exactly given by a linear bias: where accounts for stochasticity. If we fit this halo density field to the model δ linear regression gives Let us compute the r.h.s. of this equation using Standard Eulerian perturbation theory. On large scales we expect β E 1 (k) to be close to b E 1 with corrections of order P loop /P 11 . However, at next-to-leading order, we find that the transfer function is   Figure 20. Left panel: Model error power spectrum for Standard Eulerian bias models, for the lowest halo mass bin. Using the nonlinear dark matter δNL from simulations as the input for the Standard Eulerian bias model (purple) creates a large error on large scales because it involves squaring δNL, which is rather UV-sensitive. Alternatively, using the perturbative dark matter density as the input to the bias model (dark orange) is treating large bulk flows perturbatively, which causes a decorrelation between the model and the true halo density that shows up as a bump in the model error at k 0.1 hMpc −1 . The quadratic model with shifted bias operators (bright orange) avoids both of these issues by squaring the linear density in Lagrangian space, where this operation is less UV sensitive, and then shifting the resulting field to Eulerian space to achieve coherence with the Eulerian-space halo density of the simulations. Right panel: Similar, but with Gaussian smoothing applied to δNL before computing the quadratic bias operators. For larger smoothing scale R, the model error becomes larger because we keep less of the small-scale modes in δ 2 NL that describe the large-scale halo density. Gaussian smoothing does therefore not resolve the above issues of Standard Eulerian bias. In both panels, the width of the shaded regions at low k represents the 1σ uncertainty estimated as the standard error of the mean of the five independent simulations; at high k, the uncertainty is smaller than the width of the curves.
where P 13 is one of the two contributions to the matter power spectrum at one loop P loop ≡ 2P 13 +P 22 [100]. Famously, due to a large contribution from the IR shift terms, P 13 is much larger than P loop [62], and being large and negative causes a significant decay of the transfer function even on scales larger than the nonlinear scale. This decorrelation means that even in the perturbative regime the model fails to predict the halo density field. As a result, the residual noise becomes large and strongly scale-dependent. We find Of course, the residual noise gets corrections from higher-order loop contributions too. However, the P 22 term is already much larger than the naive expectation-the one-loop power spectrum. To conclude, if Standard Eulerian perturbation theory is used to predict the realization of the halo density field, we expect to find a model error which becomes large and strongly scale dependent around the nonlinear scale.
To test this expectation we use the model in Eq. (94) and compare it to simulations. The plot of the power spectrum of the model error normalized to the Poisson prediction is shown in Fig. 20. As we expect, this model works very well at large scales, and in the limit k → 0 the noise is close to the Poisson expectation. However, already around k ∼ 0.1 hMpc −1 the noise becomes scale-dependent and sharply rises. This is due to the decorrelation of the predicted and simulated halo density fields at these scales. In the high k limit, when the transfer functions approach zero, the power spectrum of the model error by definition approaches the halo power spectrum (black dotted curve). This creates a characteristic bump in the noise curve. Notice that the same quadratic model written in terms of shifted operators performs much better and has the constant noise practically all the way to k ∼ 1 hMpc −1 .

Standard Eulerian Bias Using the Matter Density Field from Simulations
One may be tempted to think that a simple way to fix the problem from the previous section is to use the nonlinear matter density field measured in simulations rather than the perturbative prediction. After all, the N-body simulations provide us with the best possible dark matter field that we can hope for. How does the Standard Eulerian bias model work in this case? Let us begin with the linear Standard Eulerian bias. Its model error is shown in Fig. 20 in dark purple. It is a few times larger than the Poisson prediction, especially on very large scales, which is not surprising because we only use a linear bias term. The next step is to include the second order bias operators, that is to use the following model where all operators are evaluated using the nonlinear dark matter field measured from simulations. The resulting model error is shown by the light purple line in Fig. 20. While the error is quite flat, its amplitude is still a few times larger than the Poisson expectation. In particular, in the low-k limit, the error is much larger than for the quadratic bias model based on perturbation theory. This implies that, even on very large scales, the bias model based on the nonlinear matter density field fails to predict the realization of halos.
This observation brings us back to the discussion of bare versus renormalized bias parameters. As we already explained, using different prescriptions for small scale modes leads to different results for transfer functions. The short modes can have a significant impact on the long-wavelength fluctuations through the nonlinear effects. This effect is amplified when using the true nonlinear dark matter field, where the short modes have a large amplitude. In particular, the low-k limit of the halo power spectrum is dominated by the following term where P (k) is the nonlinear matter power spectrum. The integral, related to the variance of the square of the density field, is large and dominated by the short modes. In other words, the quadratic term δ 2 in the bias expansion is producing a large shot noise at large scales. When fitted to simulations at the level of realizations, the minimization procedure will favor very small values for b E 2 to compensate for this large noise. On the other hand, if the optimal b E 2 is chosen to be very small, then the quadratic operator δ 2 essentially does not contribute to the model for the halo density field and consequently the noise of this model is higher than expected.
To confirm that the second order terms are the real cause for the issue, we also use a hybrid model where the field multiplying b E 1 is nonlinear, while the second order operators are calculated using δ 1 The error of this model is shown on the left panel in Fig. 20 in orange. As expected, on large scales this model performs as well as the other perturbative models. On small scales the error is not flat, even though the amplitude of the "bump" is much smaller than before. The residual scale dependence is due to the improper treatment of the large IR displacements in the linear field which enters the second order operators. This is resolved when using the shifted operators (7) for the bias expansion as we do in the other sections of the paper.
The final question that we can ask is how the results look like if the problematic high-k modes are removed from the model. This can be achieved by constructing the bias operators using the smoothed nonlinear density field. The limit of large smoothing scale is particularly important, because in this limit the low-k values of the transfer functions have to match the renormalized bias parameters (this fact was used in [8,73] to infer the values of biases at the field level). What kind of model error do we get in this case? The left panel of Fig. 20 shows the answer. Larger smoothing scales lead to larger model errors in the low-k limit. In other words, in order to explain the halo density field on large scales, it is better to keep the full nonlinear density field than to smooth it out. These results suggest that the description of the halo density field using the renormalized bias parameters and operators is less optimal than the basisÕ i that we use in this paper.
In conclusion, both the Standard Eulerian perturbation theory and the Standard Eulerian bias model have problems when compared to realizations of the halo density field. The perturbative approach expands shift terms which leads to a decorrelation on short scales and a large scale-dependence of the model error around k 0.2 hMpc −1 , while using the nonlinear matter density field from simulations amplifies the effects of very short modes and leads to a large model error even in the low-k limit. Crucially, in both cases, some information from the short scales has to be kept in the model. Smoothing the nonlinear matter field always leads to a larger error.

B. Connection to the IR-resummation in Standard Eulerian Perturbation Theory
So far we have argued that in order to make a perturbative prediction for the realization of the density field of dark matter or biased tracers one has to work with shifted operators. However, at the level of the transfer functions or predictions for the power spectra, only the correlation functions of shifted operators appear. It is then natural to ask how these correlation functions relate to the more familiar counterparts in IR-resummed Standard Eulerian perturbation theory where the large bulk flows are also treated nonperturbatively. This question has been explored previously (see for instance [70]) and in this section we review the main arguments and give some further details. We will begin with the simplest case of dark matter only and then move to biased tracers.

Dark Matter
The nonlinear dark matter field is given by the same expression as δ h where all Lagrangian bias parameters are set to zeroδ The power spectrum of this field up to one-loop order is given bỹ Let us make a few comments about some of the terms in this expression. The kernel of the G 3 operator is such that δ 1 G 3 vanishes. This implies that the cross spectrum of shifted operators δ 1G3 is non-vanishing only at the two-loop order and we can neglect this contribution. The cross spectrum δ 1 [G 2 δ] is proportional to P 11 (k) The corrections to this expression for the shifted fields are of the two-loop order and we will ignore them. In the standard calculation of the one-loop power spectrum for biased tracers this term renormalizes the linear bias b 1 . However, given that in this case we are calculating the power spectrum of the dark matter field, this contribution has to cancel. Indeed, the cancellation is ensured by the contribution fromS 3 . The symmetrized kernel of this operator is such that This implies that the low-k limit of the correlator δ 1S3 is given by This precisely cancels the contribution from δ 1 [G 2 δ] in the power spectrum. Therefore, the nontrivial terms that survive at one-loop order areP whereS new 3 is derived from theS 3 operator by subtracting the constant 4/21 contribution from the kernel. This is the prediction for the one-loop IR-resummed power spectrum from a realization of the shifted fields. Fig. 21 shows the different contributions to the power spectrum. The thin blue line is the power spectrum of the shifted linear field. The thick brown line is the sum of all four terms in the previous equation which represent the one-loop contributions. 19 One interesting point to notice is that the total one-loop contribution is at least an order of magnitude smaller than the leading term in the power spectrum on all scales. This result is not surprising, since the expansion of the nonlinear density field in terms of shifted operators is closely related to the expansion of the nonlinear displacement field in Lagrangian perturbation theory, and it is well known that the one-loop power spectrum of the displacement field is smaller than the linear prediction on all scales.
In what follows we are going to compareP (k) to the usual one-loop IR-resummed power spectrum in Standard Eulerian perturbation theory. Before showing the details let us make some general comments. The shifted power  spectrumP (k) contains all terms of the Standard Eulerian perturbation theory up to one-loop. Therefore, the difference can be only two-loop and higher order contributions. Secondly, the large IR-displacements are resummed iñ P (k) in the same way as in the usual IR-resummation, using the Zel'dovich displacement field ψ 1 . This implies that the BAO wiggles must be suppressed in the same way. Indeed, we are going to show that both these expectations are correct.
Let us begin with a brief summary of how the IR-resummed power spectrum is calculated. The starting point is to split the linear power spectrum in the smooth (non-wiggly) part P nw 11 (k) and the wiggly part that comes from the BAO oscillations P w 11 (k). Algorithms to do this splitting efficiently can be found in [101,102]. The effects of the large displacements exactly cancel in the equal-time correlation functions if the power spectrum is smooth. Therefore, the non-wiggly part of the linear power spectrum can be used to evaluate the loop integrals in the usual way. On the other hand, the BAO wiggles are damped by the large displacements (the BAO peak is broadened in the real space correlation function). For this reason the wiggle part of the one-loop power spectrum evaluated using P w 11 (k) has to be suppressed by the appropriate exponential factor (for more details see [62][63][64][65][66]). The final formula is given by where The parameter λ in Σ 2 λk is usually chosen to be smaller than 1, in order to ensure that the displacements with a given wavenumber affect only the fluctuations on shorter scales. However, in our definition of shifted operators such a condition is not imposed, and for the purposes of the comparison we will use the k-independent Σ 2 ∞ . In a ΛCDM-like cosmology the difference between the two definitions is small. Figure 22 shows the comparison of the one-loop dark matter power spectrum calculated using the shifted operators and the standard formula for the IR-resummation. The agreement between the two is reasonably good. The left panel shows different power spectra normalized to the standard one-loop non-wiggle power spectrum. The thin dashed and solid gray lines are the estimate for the typical relative size of the one-and two-loop corrections respectively at z = 0.6. We can see that the wiggles in the non-IR-resummed one-loop power spectrum are irregular, unlike the case with the IR-resummation. As expected, the difference between the broadband ofP (k) and the Standard Eulerian prediction is of the order of two-loop terms (within a factor of 2). Figure 22 also shows that the wiggles in P IR (k) andP (k) are identical since the relative difference (P (k) − P IR (k))/P nw (k) is smooth (thick blue line).
The other way to see that the wiggles in P IR (k) andP (k) are the same is to look at the correlation function in real space and focus on the BAO peak. This comparison is shown in the right panel of Fig. 22 Figure 22. Comparison of the IR resummation and shifted fields, for the power spectrum (left) and correlation function (right). All curves are evaluated using theory expressions involving the mean linear power spectrum without matching simulated realizations.
functions calculated usingP (k) and P IR (k), labeled byξ(r) and ξ IR (r) respectively, are almost identical. They correctly predict the broadening of the BAO peak, compared to the linear theory prediction ξ 11 (r). As expected, the correlation function that corresponds to the one-loop power spectrum without the IR-resummation ξ(r) has a very irregular peak. For reference we also plot the prediction based on the Zel'dovich power spectrum, ξ Zel (r), which is known to be in good agreement with simulations.
In conclusion, the dark matter one-loop power spectrum calculated with shifted operators is indeed, up to two-loop corrections, identical to the IR-resummed one-loop Standard Eulerian prediction. This remains true for the halo one-loop power spectrum, as we discuss next.

Halos
Let us now turn to the halo density field. Using results from the previous section we can rewrite it as Notice that b 1 multiplies the nonlinear shifted density field. For this reason theS 3 operator is absent from the bias expansion and some bias parameters are modified. This expression is very similar to the Standard Eulerian bias expansion. We have already demonstrated that the power spectrum ofδ is indeed close to the IR-resummed Standard Eulerian one-loop power spectrum. The same is true for the other correlation functions as well.
To see this more explicitly from the definition of shifted operators let us take a look at the auto and cross spectra of operatorsÕ ∈ {δ 2 ,G 2 } as an example. We have argued that, neglecting the two-loop corrections, these spectra can be calculated at leading order in the following way Following the arguments of [63] we are going to show that this expression is identical to the IR-resummed counterpart at the one-loop order. We can first write the correlation function under the integral as a sum of the smooth part and a feature at the BAO scale. Then the integral of the smooth part is dominated by q ∼ 1/k. For this separation the typical size of the exponential factor can be approximated as This approximation follows from the form of the correlation function of the relative displacement field which can be written in the following way and noting that this integral peaks at p ∼ 1/q for a given q. The exponential factor therefore can be always neglected as long as we are in the pertutbative regime where k 3 P11(k) 6π 2 1. In conclusion, the featureless or smooth part of the power spectrum is identical for the shifted and ordinary operators at the one-loop level Õ aÕb Let us now turn to the feature at the BAO scale. In this case the integral in Eq. (114) has support only around q ∼ BAO and the exponent, which is a smooth function of q, can be approximated at its value at q ∼ BAO . This leads to the following expression for the wiggle part of the power spectrum where Σ 2 Λ is exactly given by Eq. (112). We can therefore see that the IR-resummed power and cross spectra of the operators δ 2 and G 2 are indeed the same as the spectra of their shifted counterparts. 20 In this paper we are interested only in δ 2 and G 2 as these are the only bias operators that we keep in the expansion when we compare it to simulations. However, it is important to stress that this derivation holds more generally and the same conclusions apply to any one-loop contribution to the power spectrum of biased tracers.
One important consequence of these results is that all correlators in formulas for the transfer functions can be replaced by the corresponding IR-resummed expressions, which are easier to calculate. In other words, the Standard Eulerian bias expansion gives the same (up to two-loop error) power spectrum as our quadratic bias model based on the shifted operators using the following expressions In these formulas all power spectra are calculated in Standard Eulerian perturbation theory with IR-resummation and all bias parameters are as measured from the transfer functions using the shifted fields.
Let us finish this section by making a comment about measuring the bias parameters from the power spectrum. We have just argued that Standard Eulerian perturbation theory with IR-resummation predicts the correct shape for the nonlinear power spectrum of biased tracers. The measurement of bias parameters then proceeds in the usual way leading to the usual results. On the other hand, we have also argued that the bias parameters with the lowest model error are different from those inferred from the correlation functions. How do we see this difference using the power spectrum? The answer is that measuring the bias parameters minimizing the model error and fitting the power spectrum are two different fitting procedures with different number of fitting parameters. For instance, when fitting the power spectrum it is common to combine all constant k → 0 contributions from the bias operators such as δ 2 with the noise power spectrum. In this way, the bias parameters are measured only from the k-dependence of different contributions. This is possible because all the constant terms are exactly degenerate with the Poisson noise, whose amplitude is fitted simultaneously with other parameters. On the other hand, in the minimization procedure we only fit for the bias parameters and the noise is entirely fixed by their best fit values. In other words, we cannot trade the contributions of different bias operators for the noise. Given that there is one less parameter to fit, the values of biases in the two fitting procedures must be different. Notice that the noise fitted from the power spectrum is always higher than inferred from minimization. This can be of particular relevance when trying to measure cosmological parameters. 20 Notice that the constant low-k contribution to δ 2 δ 2 is the same for the power spectrum of the shifted fields. Let us define P δ 2 δ 2 (0) ≡ δ 2 (k)δ 2 (−k) k→0 . The two-point function of this constant part is proportional to the Dirac delta function δ 2 (q)δ 2 (0) = P δ 2 δ 2 δ D (q). Then we can write

VIII. EXTENSION: HALO MASS WEIGHTING
So far we have only studied the halo number density, and the error or stochastic noise when modeling this with a bias expansion. But it is well known that the stochastic noise can be smaller for the halo mass density, where each halo is weighted by its mass [20][21][22][23][103][104][105][106]. More generally, any weighting of the halos that makes their overdensity more similar to the dark matter overdensity should reduce the stochastic noise relative to the dark matter based bias expansion. In particular, if we could weight each halo by the exact value of the dark matter density in the surrounding region, there would be no difference at all between the weighted halo density and the dark matter density, i.e. the model error would vanish. A related motivation is that the dark matter density satisfies momentum conservation, so that the power spectrum of stochastic effects scales as k 4 on large scales [105,[107][108][109][110] and cannot generate a white noise contribution to the large-scale linear dark matter power spectrum; weighting halos by their mass also imposes approximate momentum conservation, therefore suppressing the k 0 white noise of the halo number density on large scales [20].
In practice, the efficiency of this mass weighting method is of course limited by how well halo mass, or the local value of the dark matter density, can be estimated from observables such as galaxy luminosities, which is limited for example by scatter in the observable-to-halo-mass relation and the fact that halo mass is only a proxy for the local dark matter density. Moreover, on small scales the error of any analytical model for the weighted halo density will never vanish because terms that are not included in the model (e.g., two-loop contributions to the nonlinear dark matter density) would ultimately appear in the measured model error. We defer a more complete analysis of halo mass weighting and nonlinear bias expansions to future work, and discuss only some simple simulation results to get a sense for how it can impact the stochastic model error.
It has been shown that weighting halos with w(M ) = α h + α M M , where M is the halo mass and α h and α M are constants, is a good approximation to the optimal halo mass weighting and can significantly reduce the stochasticity of a linear bias expansion [21,22]. This motivates us to work with a similar mass weighting scheme, but we generalize it by promoting the constants α h and α M to k-dependent transfer functions. In the rest of the section we will describe this mass weighting method in more detail, and then present results from simulations.
We will use the following notation in this section. δ truth h is the true halo number density of a simulation or galaxy survey data; δ M is the true halo mass density of a simulation or galaxy survey data, obtained by weighting each halo by its mass; δ ⊥ M is the component of δ M that is orthogonal to δ truth h ; the weighted or mass-weighted density δ obs h is a linear combination of simulated (observable) halo number and mass density-this combined field is what we regard as the observable; finally, δ model h is the bias model that describes δ obs h . The power spectra of δ truth h , δ ⊥ M , and δ obs h are called P truth , P ⊥ M M , and P obs , respectively.

A. Mass Weighting Method
We will choose the mass weights such that the mean-square error between the weighted halo density (the observable) and the bias expansion (the model) is minimized in every k bin. To achieve this, we first rewrite the weighted halo density δ obs h , where each halo is weighted by w(M ) = α h + α M M , as a linear combination of the measured halo number density δ truth h and the measured halo mass density δ M . Orthogonalizing the latter with respect to the former, so that δ truth h δ ⊥ M = 0, and allowing for k-dependent weights, we have Then, we minimize the mean-square model error simultaneously with respect to the mass weights α µ (k) and the bias parameters β i (k) in every k bin. 21 A trivial but pathological solution is to set all α µ = 0 and β i = 0, which would give P err = 0, but at the same time it would set 21 We will only consider models of the form δ model h = i β iÕi whenever we apply mass weighting.
the observable density δ obs h to zero. To avoid this, we minimize under the constraint that at least one of the mass weighting parameters has to be nonzero, say α h = 0. Then, For any α h = 0, the optimal α M and β i are then found using linear regression analogously to Eq. (40). Afterwards α h can be changed to any nonzero value, which changes the overall normalization of all coefficients. We choose α h such that the power spectrum of the weighted halo density, P obs (k), is equal to the power spectrum of the halo number density in absence of mass weighting, i.e. we impose We therefore add information about halo mass at the field level in such a way that the observable power spectrum remains unchanged. We are going to apply this mass weighting procedure to the simulated halos in the next section.
Note that changing the normalization condition would change power spectra, but it would not affect the ratio P err /P obs or the cross-correlation coefficient between δ obs h and model. Also note that by adding the orthogonalized mass density δ ⊥ M as opposed to the mass density δ M we ensure that adding this field cannot cancel the number density to set δ obs h to zero (the coefficient α M is only turned on when a part of the mass density that is uncorrelated with the number density helps to reduce the model error).
To get a better understanding for how the mass weighting operates, consider a toy model where the true halo number and mass density are Then, where s ≡ α n.o. M /α h and α n.o. M is the weight that would appear when not orthogonalizing, i.e. writing δ obs The mass weighting procedure then amounts to choosing α h and s such that the power spectrum of h + s M is small while keeping the total power spectrum of δ obs h unchanged. If the fractional size of the stochastic error is different between the two fields 22 , this is most efficient when h and M are correlated; in that case we can generally pick s(k) such that the stochastic fields h and M cancel each other mode by mode, without entirely cancelling the signal part involving theÕ i operators. The effectiveness of this is controlled by the correlation coefficient between h and M , and by the fractional size of the stochastic noise relative to the signal. We will get back to this in more detail in Section VIII D below, but first we describe simulation results with mass weighting. Fig. 23 show the mean-square error P err of the best cubic bias model to describe the massweighted halo density δ obs h computed using exact FOF halo masses. For all but the most massive halo bin, this mean-square model error is less than 10% of the Poisson prediction 1/n on large scales; for the most massive halo bin, M ≥ 10 13.8 h −1 M , it is about 25% of the Poisson prediction. 23 Relative to no mass weighting (dark grey in Fig. 23), mass weighting therefore reduces the large-scale mean-square model error by a factor of 17 for the two densest halo populations, M ≥ 10 10.8 h −1 M and M ≥ 10 11.8 h −1 M , by a factor of 7 for the heavier and rarer M ≥ 10 12.8 h −1 M halos, and by a factor of 2 for the very massive M ≥ 10 13.8 h −1 M halos. 22 I.e., if | h | 2 / | i β iÕi | 2 = | M | 2 / | i γ iÕi | 2 , which makes sure that δ truth h and δ M do not just differ by a normalization factor. 23 The mass bins are similar to the last sections, using the same minimum halo masses for the four bins, but we do not impose any maximum halo mass cut for any of the bins (very massive halos are easy to observe and halo mass weighting should appropriately upor down-weight those halos).

Light grey curves in
, extracted from simulations. The model is the cubic bias model as before but without δZ , i.e. δ model h = i∈{1,2,G 2 ,3} βi(k)δ ⊥ i (k). The light grey curve asssumes that the halo mass is known perfectly (as measured by the FOF halo finder), the green curves include a random scatter σM in the halo masses, and the dark grey curve assumes no mass weighting, corresponding to the no-mass-weighting result presented previously in the paper. Transfer functions αµ(k) and βi(k) are optimized as free functions of k, with αµ satisfying the normalization condition (127). At low k, the width of the curves represents the uncertainty of Perr estimated from the scatter between the five independent simulations; at high k, the estimated uncertainty is smaller.
To be more realistic, green curves in Fig. 23 include a log-normal scatter added to the FOF halo masses. 24 We find that for a 0.4 dex (i.e., factor 2.5) mass scatter, mass weighting is not effective and the model error is only marginally reduced compared to using just the halo number density. For 0.2 dex (i.e., 60%) mass scatter, however, the large-scale P err is reduced relative to no mass weighting by a factor of 1.5 − 2 for the three low-and intermediate halo mass bins, and by a factor of 1.3 for the most massive halos. For 0.1 dex (i.e., 26%) mass scatter, the large-scale P err is reduced by a factor of 3 − 5 for the low-and intermediate mass halos, and by a factor of 1.6 for the most massive halos. So if we can determine halo masses with a scatter of ∼ 60% or less, this could reduce P err by factors of a few for halo samples like ours.
What is the scale dependence of the model error after mass weighting? Fig. 23 shows essentially no scale dependence for k 0.1 hMpc −1 , but there is a clear scale dependence at higher k, and this tends to be stronger than the scale dependence of P err (k) without mass weighting. This could be caused by two-loop terms that are missing in the model and therefore contribute to the measured P err (k); after mass weighting, the stochastic noise contribution P 0 0 to P err might be so small that the missing two-loop terms could be the dominant contribution to P err at high k, especially when using a high number density of halos and assuming perfectly known halo mass. Alternatively, the k 2 corrections to P err might be larger after mass weighting. Resolving this question is beyond the scope of this paper. Note that in order to make use of the reduced model error on small scales, one would have to model this increased scale dependence of the model error or modify the bias model or mass weighting scheme to obtain a flatter P err . Fig. 24 shows the cross-correlation coefficient r cc (k) between the mass-weighted halo field δ obs h and the best-fit cubic bias model, and the fractional mean-square model error 1 − r 2 cc . Using exact FOF halo masses with no scatter, the correlation coefficient at k 0.02 hMpc −1 is between 99.995% and 99.9% (1 − r 2 cc between 0.01% and 0.2%) for all but the most massive halo bin. This is a substantial improvement over no mass weighting where the correlation FOF halos at z = 0.6, with mass weighting + αM δ ⊥ M as described in the text, with normalization such that |δ obs h (k)| 2 = P truth (k), where P truth is the measured halo number density power spectrum. Lower panels: Crosscorrelation coefficient between best-fit cubic bias model and mass-weighted halo density from simulations. This shows that halo mass weighting can reduce the model error by more than an order of magnitude on large scales. coefficient at k 0.02 hMpc −1 is between 99.9% to 99.2% (1 − r 2 cc between 0.2% and 1.5%). The correlation decreases on smaller scales and when adding scatter to the halo mass. Similarly to before, a scatter of 0.4 dex is too large for mass weighting to be effective, but with a scatter of 0.2 dex or 0.1 dex the mass weighting can substantially improve the cross-correlation coefficient, exceeding 99.5% (1 − r 2 cc = 1%) on large scales for all but the most massive halo mass bin.

C. Contributions and Mass Weights
To see in more detail how the mass weighting operates, Fig. 25 shows power spectra of the halo number density δ truth h and the halo mass density δ M . They have a similar k dependence, but the mass density has a larger linear bias because heavy halos are up-weighted relative to the number density (and we use the same minimum halo mass cutoff for both densities). In contrast, the power spectrum of the orthogonalized mass density, , is rather independent of scale. To understand this, note that if δ M was equal to the halo number density plus an additional white noise field, δ M = δ truth h + , then δ ⊥ M = and its power spectrum would consequently be flat. We will argue in the next section that we can indeed expect δ ⊥ M to be a combination of the stochastic noise fields of the halo number and mass density, whose power spectrum is again expected to be flat on large scales.
The optimal mass weights α h and α M are shown in Fig. 26 as a function of k for the four halo mass bins. Both weights are rather smooth functions of k. The halo number density weight α h is one on large and small scales, but decreases by 10% to 20% on intermediate scales, 0.1 hMpc −1 k 0.6 hMpc −1 . The weight α M of the halo mass density is typically a few times smaller, and is constant at low k but suppressed at intermediate and small scales,   ] FOF halos at z = 0.6, with mass weighting Figure 25. Power spectra relevant for halo mass weighting. The power spectrum of the halo mass density (solid blue) has a similar shape as that of the halo number density (solid black), but it is more biased relative to the linear density (solid grey), because heavy halos are upweighted. Orthogonalizing the halo mass density with respect to the halo number density leads to a flat power spectrum (dashed blue); this corresponds approximately to a combination of the stochastic noise terms of the halo number density and mass density, so that adding or subtracting it from δ h can cancel part of its stochastic noise. No scatter is added to halo masses (doing so increases the constant, stochastic part).
k 0.2 hMpc −1 . When increasing the mass scatter, α M becomes smaller as expected because the information in the halo mass density is less useful. Fig. 27 shows how much power the halo number density δ truth h and the mass density δ ⊥ M contribute to the combined field. The halo number density δ truth h always dominates. The contribution from the orthogonalized mass density δ ⊥ M typically reaches several tens of percent of the number density power around k 0.3 − 0.4 hMpc −1 , and is approximately constant at k 0.2 hMpc −1 and drops at higher k. This again suggests that δ ⊥ M contains mostly stochastic noise, which partially cancels the stochastic noise of δ truth h , reducing the stochastic noise of the combined field.

D. Toy Model
To better understand the partial cancellation of the stochastic noise in the halo number and mass density, we consider a simple toy model (similarly to [21,103]), where the true halo number and mass density are given by X are stochastic noise terms that are uncorrelated with the model, X δ 1 = 0. In that case, Here and in what follows we ignore terms whose power spectrum is suppressed by a factor of P /(b 2 P 11 ), which is small at low k. Eq. (133) shows that the orthogonal component of the halo mass density is indeed a combination of the stochastic noise fields h and M of the halo number and mass densities. We therefore expect its power spectrum to be flat, which is what we saw in Fig. 25 above.
The fractional noise of the combined field is therefore given by the fractional noise of the low-noise field, times (1 − r 2 hM ), which is small when the two noise fields h and M are very correlated. 26 These arguments are related to the idea of cancelling cosmic variance using two correlated tracers with different biases [111], which was one of the original motivations to study halo mass weighting [20]. In that case, the argument is usually formulated as improving the Fisher information when regarding the relative bias b h /b M between the two tracers as the signal of interest, whereas we computed the reduction of the stochastic noise term when combining the 25 Since I ≥ 1 the denominator in Eq. (136) is always non-negative. 26 Another interesting limit is I = 1. This happens if and only if E M M = E hh , i.e. when the fractional size of the stochastic noise fields is the same, | M | 2 /b 2 M = | h | 2 /b 2 h . In that case, Eq. (136) becomes P obs /P obs = E hh /P 11 × (1 + r hM )/2. If the noise fields are perfectly correlated, r hM = 1, this gives E hh /P 11 . We therefore do not gain anything by combining the fields, which is not surprising because this case corresponds to δ truth h and δ M being identical up to an overall normalization factor, so combining δ h and δ M cannot reduce the fractional stochastic noise. In contrast, if h and M were perfectly anti-correlated, r hM = −1, the combined noise would vanish, which again makes sense because the noise cancels identically when adding the two fields while the signal does not. In practice, we expect the halo number and mass density and their fractional stochastic noise to be different, i.e. E M M = E hh , so that I > 1.
two tracer fields and modeling that combination with a bias expansion. The final result, especially the 1 − r 2 hM scaling as r hM → 1, is similar.
Of course the analytical arguments above assumed a simple toy model without nonlinear bias, and ignored corrections to Eq. (133) that can be important at intermediate and high k. The simulation results, however, include the nonlinear bias terms of the cubic model and show that mass weighting works very efficiently in that case up to rather high k.
In conclusion, we confirm that halo mass weighting is a promising method to suppress the stochastic model error or shot noise, which could yield more powerful cosmological constraints if achievable in practice, but more work is required to characterize the most suitable bias model and its error when mass weighting is applied. 27

IX. SUMMARY
Using a modified basis of operators in Eulerian space, we have constructed a model of biased tracers at the density field level that accounts for bulk flows (large IR displacements) without a perturbative expansion in the displacement. We find that this model is able to describe the halo density obtained from N-body simulations accurately over a wide range in scale and halo mass. Our main findings are as follows.
• To obtain coherent positions of particles in the model and simulations, it is important to use the bias expansion in terms of shifted operators which keep large IR displacements resummed. In contrast, Standard Eulerian bias applied to the dark matter density from Standard Eulerian perturbation theory treats displacements perturbatively, leading to a decorrelation between model and simulations at wavenumbers k 0.2 hMpc −1 . The Standard Eulerian bias expansion based on the fully nonlinear dark matter density (including full displacements) instead suffer from a large constant contribution to the power spectrum of δ 2 m , which generally leads to a large model error even on very large scales. For these reasons, the bias expansion in terms of shifted operators is more suitable for modeling biased tracers at the field level. Its power spectrum agrees with that of the usual IR-resummed Standard Eulerian power spectrum.
• Nonlinear bias operators are important to obtain a model error power spectrum P err (k) ≡ |δ truth h (k) − δ model h (k)| 2 that is comparable to the Poisson prediction, P err (k) = 1/n. For the quadratic and cubic bias models, the amplitude of P err is a few tens of percent higher than the Poisson prediction for M 10 13 h −1 M halos, and about a factor of two smaller than the Poisson prediction for heavier halos. Without the nonlinear bias terms, the model error power spectrum is ∼ 5 times larger for M 10 13 h −1 M halos and 30% for heavier halos, even on very large scales, k < 0.05 hMpc −1 (see Fig. 6).
• The cross-correlation coefficient r cc between the modeled and simulated halo density is roughly consistent with the Poisson shot noise prediction for low and intermediate mass halos, reaching up to r cc = 99.9% on large scales. For heavy, M 10 13 h −1 M halos the correlation is better than expected from the Poisson prediction, remaining above 50% up to k 1 hMpc −1 (see Fig. 7).
• For the simulated halo samples at z = 0.6 that we analysed it would be safe to assume a scale-independent model error or shot noise up to k max 0.13 − 0.3 hMpc −1 , depending on halo mass, when analysing a 10 h −3 Gpc 3 volume, or up to k max 0.18 − 0.37 hMpc −1 when analysing a 0.5 h −3 Gpc 3 volume. Without nonlinear bias terms, these k max values are smaller by a factor of two to three, because the measured model error is much more scale dependent (see Table IV and Fig. 11).
• On small scales, k 0.3 hMpc −1 , the model error depends strongly on scale. This could be due to expected (k/k M ) 2 corrections to the noise on scales comparable to the typical size of a halo, although the uncertainty of our measurements of the model error is too large to test this conclusively. Alternatively this could be caused by additional higher order bias terms that we did not include in the bias expansion.
• The local quadratic bias term contributes typically 10% in power at k 0.1 hMpc −1 (see Fig. 12). Modeling the halo power spectrum to 1% at k = 0.1 hMpc −1 therefore typically requires the quadratic bias parameter β 2 to be known or constrained to better than 10%. For M = 10 12.8 − 10 13.8 h −1 M halos, this contribution 27 In addition to improving the bias model or modeling the scale dependence of Perr, one might also want to employ a more realistic halo mass scatter, noting that mass estimates from the luminosity of massive halos (M 10 12 h −1 M ) can suffer from the flattening of the stellar-to-halo mass relation at high mass, while determining the halo mass of very faint (possibly satellite) galaxies is also challenging. A potential systematic offset in the assumed mean stellar-to-halo mass relation can also have an impact, likely leading to suboptimal weights. Still, it would be interesting to study clustering for different cuts in luminosity or other galaxy and cluster properties (e.g., [112][113][114][115]) to suppress stochasticity.
is smaller because β 2 is much smaller than the linear bias, but for these halos the contribution from the cubic local bias term is 5% in power at k = 0.1 hMpc −1 , so β 3 should be known within 20%.
• The bias transfer functions β i (k) that we use in the bias model can be easily described with a 5-or 6-parameter fit based on theoretical predictions of the expected shape of these transfer functions (see Section VI). The number of free parameters of our model is therefore the same as in usual nonlinear bias expansions (i.e., one parameter b i for each included bias term, and c s ).
• We confirm that the mean-square model error can be suppressed by an order of magnitude when weighting halos by their mass. This is similar to previous findings [20][21][22], although our definitions differ somewhat because we include nonlinear bias terms as part of the model and not the stochasticity. With a mass scatter of 0.2 dex (i.e., 60%) or less, the stochastic mean-square model error can typically be suppressed by a factor of two or more, even for relatively low number density ofn 5 × 10 −4 h 3 Mpc −3 (see Figures 23 and 24).
By demonstrating that dark matter halos in N-body simulations can be modeled accurately at the level of the density field realization we have provided a stringent test of the validity of the bias expansion. This can be useful for several practical applications, including forward-model inferences that rely on a density-level forward model in the likelihood, cosmological analyses of power spectra measured from galaxy survey data where the model error enters as a noise contribution, and the design of initial condition and BAO reconstruction algorithms from biased tracers. We leave these and other possible applications for future work.
where k-arguments are suppressed. This shows that the cross-correlation coefficient r cc between best-fit model and truth is identical to the square-root of the model power spectrum divided by the truth power spectrum, and the minimum mean-square model error P err is proportional to one minus the squared correlation coefficient between best-fit model and truth. This shows that P err and r cc are closely related to each other.
It is important to stress that this is valid only for the best-fit model with the best fit-transfer functions. If we use 5or 6-parameter fits to the transfer function the relation between the model error and the cross-correlation coefficient is not so simple anymore. Let us call M the model that uses β i = β i + δβ i , where β i are the best-fit transfer functions and β i their approximation using the perturbation theory fits. In this case a part of the error is correlated with the model, and therefore P M M = P M T (also see Section VI E).