General and consistent statistics for cosmological observations

This paper focuses on two aspects of the statistics of cosmological observables that are important for the next stages of precision cosmology. First, we note that the theory of reduced angular N -point spectra has only been developed in detail up to the trispectrum case and in a fashion that makes it difﬁcult to go beyond. To ﬁll this gap, here we present a constructive approach that provides a systematic description of reduced angular N -point spectra and their covariance matrices, for arbitrary N . Second, we focus on the common practice in the literature on cosmological observables, which consists in simply discarding a part of the expression, namely, the terms containing ﬁelds evaluated at the observer position. We point out that this is not justiﬁed beyond linear order in perturbation theory, as these terms contribute to all the multipoles of the corresponding spectra and with a magnitude that is of the same order as the rest of the nonlinear corrections. We consider the possibility that the reason for neglecting these terms is a conceptual discomfort when using ensemble averages, which originates in an apparent tension between the ergodic hypothesis and the privileged position of the observer on the light-cone. We clarify this subtle issue by performing a careful derivation of the relation between the theoretical statistical predictions and the observational estimators for all N . We conclude that there is no inconsistency whatsoever in ensemble-averaging ﬁelds at and near the observer position, thus clearing the way for consistent and robust high-precision calculations


I. INTRODUCTION
Upcoming surveys such as Refs.[1][2][3][4][5] will significantly enhance the quantity and quality of the data that we have at our disposal for understanding the universe.This development requires a commensurate effort on the theoretical side for the new observational input to be interpreted correctly.In particular, one needs to consider analytical expressions for cosmological observables beyond the linear order in perturbation theory, as is already clearly reflected by the extensive amount of work on the subject in the case of CMB lensing [6][7][8][9][10][11][12][13][14][15][16][17], galaxy number counts [18][19][20][21][22][23][24][25][26][27][28][29][30][31][32], and cosmological distances and weak lensing [33][34][35][36][37][38][39][40][41].The inclusion of nonlinear corrections is not only required for increasing the precision of the basic quantities of interest, such as the power spectrum of observables, but also for obtaining the leading contributions to higher-order statistics (e.g., bispectrum, trispectrum, etc.).The latter contain crucial information about the early universe and the formation of structure, which will become more accessible thanks to the expected advances in observations.In this paper we will elaborate on two important aspects of the statistics of cosmological observables for the next stages of precision cosmology which have not yet been considered in the literature.
The first issue is the lack of generality in our description of angular higher-order statistics.Indeed, in the literature, the rotationally invariant (or "reduced") angular N-point spectra of a given observable on the sky are defined only up to the trispectrum (N = 4) case [42] and built in a rather opaque way that does not generalize easily to arbitrary N. The N = 3, 4 cases are well studied and measured [43][44][45][46][47] in CMB observations, whittling down the parameter spaces of the inflationary models.It is known, however, that the hierarchy between N-point spectra is not necessarily trivial, e.g., there exist models with a large trispectrum, but negligible bispectrum [48][49][50][51].Therefore, the study of N-point spectra beyond N = 4 could be relevant for models with both negligible bispectrum and trispectrum.In any case, it is always desirable to have a systematic theoretical framework for approaching a given problem, which is what we will present here.This is a transparent construction of the reduced angular N-point spectra, for arbitrary N, their covariance matrices for arbitrary N and N , as well as a flurry of useful equations for manipulating them.The central ingredient in this framework will be the generalization of the "triangular" Wigner 3 − j symbol to a "multilateral" one, leading in particular to a neat diagrammatic representation.Our equations will hold for the idealized case of full sky measurements, but they should not be difficult to generalize to the case of partial covering.Note, finally, that the number of components of the angular spectra increases rapidly with N, meaning that the new higher-order statistics we can now work with are practically impossible to store, let alone manipulate, in their entirety.Nevertheless, our construction is the indispensable prerequisite for handling these objects, as it provides the starting point for research on privileged subsets, limits, shapes and efficient estimators thereof.
The second issue we want to discuss concerns the consistency and accuracy of the theoretical predictions, especially at the nonlinear orders in perturbation theory which now need to be taken into account.The standard approach to analytical computations of cosmological observables neglects certain terms, namely, those containing fields evaluated at the observer point.As we will discuss, for linear order perturbation theory these observer-dependent contributions have no effect on multipoles l 2, but we will argue that they do affect all multipoles at the nonlinear level.This is an important point, because none of the works on nonlinear effects so far  considers these observer-dependent terms.Moreover, at the linear level it has already been established that the consideration of the full expressions, i.e., including the observer-dependent terms, has some important advantages, as it leads to results that are gauge-invariant and respect the equivalence principle in k-space (absence of spurious infrared divergences) [52][53][54][55][56][57][58][59][60][61][62][63].
We identify two possible reasons behind the reticence towards considering these observer-dependent terms in the literature.The first one is rooted in a common misconception that observer-dependent terms are synonymous with "monopole," because they only affect the later in the power spectrum of the linear theory (or the s multipole for spin s observables).Since low multipoles are dominated by cosmic variance, one is then naturally led to ignore observer-dependent terms.We will show that this is an artifact of linear perturbation theory, because observer-dependent terms actually do arise in nonlinear corrections to all multipoles, and there is no reason to believe that these contributions are negligible.The second possible reason for ignoring these terms might be a discomfort of conceptual nature.Simply put, it is not clear whether one can consistently ensemble average them to compute correlation functions and spectra, because, contrary to the case of source points, one does not probe several "observer points" observationally for the ergodic hypothesis to apply.A first shaky aspect of this argument is that the ensemble averages that are considered in the standard approach are statistically homogeneous, so this already wipes out the observationally privileged status of the observer point, independently of whether one chooses to include the observerdependent terms or not.More precisely, we will see that the above objection to the statistical use of observer-dependent terms arises from a misunderstanding of the way the ergodic hypothesis enters the relation between the ensemble averages of the theorist and the geometrical averages of the observer.We will perform a careful derivation of that relation under the standard assumption of statistical homogeneity and isotropy for the stochastic fluctuations, but without any assumptions about which points are included in the ensemble averages.We will then see that the error of this theoretical prediction with respect to the data averages is nothing but cosmic variance.Thus, by carelessly ensemble averaging field products at arbitrary points on the light-cone, independently of how well we can probe them observationally, the theorist is not deviating in any new way from observation other than the fundamental limitation of cosmic variance.Let us also stress that our result is completely general, as we will work with the freshly defined N-point spectra for arbitrary N. The assumption of full sky measurement will not be problematic here, as the point we wish to make is of qualitative nature.Also, we will only consider the spectra of a single observable, but the formalism generalizes straightforwardly in the case of cross-correlations of different observables.
The paper is organized as follows.In Sec.II we discuss some technical aspects regarding cosmological observables and their theoretical statistical treatment.The reader who is familiar with cosmological observables can skip this part, although some of its definitions and equations will be used later on, so it might be useful to take a quick look anyway.In Sec.III we discuss the present situation regarding reduced angular N-point spectra in the literature and then derive our formulas for defining and handling these objects for arbitrary N. Section IV focuses on the issue of observer-dependent terms.We first discuss in more detail the inconsistencies, and potential inaccuracies, that result from ignoring them in the spectra of observables.Then, using our definition of the angular correlation functions and spectra given in the previous section, we derive the relation between the statistical ones of the theorist and the geometrical ones of the observer.This demonstrates that one is allowed to (and one should) take into account the observer-dependent terms inside ensemble averages.In Sec.V we derive the expression for the covariance matrix of two arbitrary reduced angular spectra and determine the asymptotic behavior of cosmic variance for large l values.In Sec.VI we discuss a few subtleties one should be aware of when working with the connected parts of the statistics.Finally, in Sec.VII we conclude.The Appendices contain technical derivations to avoid clogging the main text, as well as the set of Wigner 3 − j symbol identities we will use in this paper.

A. Angle, redshift, and observables
A cosmological observable associated with some localized source of light (galaxy, supernovae, etc.) is a function of two space-time points O(x o ; x s ), the "observer" point x o and the "source" point x s , that are constrained to lie on a common lightlike geodesic, with x s in the past of x o .Thus, for a given observer at x o , the set of possible light source points x s detected through light forms the past light-cone of x o .The numbers x μ s are not observables, as they are ambiguous due to the freedom of performing coordinate transformations.One can of course consider correlation functions of the form O(x o ; x 1 ) . . .O(x o ; x N ) , or the associated spectra in k-space, but these cannot be related to observable information, especially in the presence of nonnegligible inhomogeneity effects.One must therefore work directly with the physically unambiguous relations, i.e., the relations between observables, so one must parametrize the past-light cone at x o in terms of observables.One of them is the incoming photon direction in the sky, n, leading to an angular parametrization of the sky.Thus, instead of working with the "k-space" spectra, we will work with the "l-space" ones, which are directly related to observable quantities.Finally, for the radial parametrization of the light-cone there are several choices, such as the observed redshift z or the luminosity/angular distances D L,A .Here we will consider the former, which is also the most widely used and model independent.Thus, the observer at x o parametrizes her light-cone in terms of the observables z and n, and the physical information lies in the relation between O and (z, n) that is the function O(x o ; z, n).
The quantities z and n are defined with respect to the observer rest-frame, i.e., a tetrad e a at x o , whose time-component e 0 is the observer's 4-velocity, and the source 4-velocity u s satisfying g s (u s , u s ) ≡ −1.More precisely, if k denotes the momentum 4-vector of the photon, then we have where i ∈ {1, 2, 3} is the spatial part of the four-dimensional index a ∈ {0, 1, 2, 3}.Note that n i is the observed angular direction, not the propagation direction, hence the minus sign.These quantities are invariant under space-time coordinate transformations, as any observable should, since a measurement cannot depend on how we parametrize that manifold.In the absence of a tetrad, the only available numbers are the components of k in a coordinate system k μ and the corresponding angles in the spatial part are the ones an observer uses only if the basis ∂ μ is orthonormal, i.e., only if g μν (x o ) = η μν .This is not the case in most of the coordinate systems used in cosmology, which is why one requires a tetrad to obtain the correct angles n.Since this field is defined at the observer position, the corresponding correction of the angles it induces will appear as observer terms in the expressions of observables.
The observer tetrad Eq. ( 1) is defined only up to a Lorentz transformation e a → b a e b , but the boosts alter the observer 4-velocity, so the only ambiguity is the orientation of the spatial frame e i → R j i e j , leading to an SO(3) ambiguity for n, As for the observable O, the theoretical expression must also be a scalar under coordinate transformations [58], but it can be a tensor with respect to the Lorentz index a of the observer tetrad.For instance, ω and n are components of the Lorentz vector which is why they transform nontrivially under Lorentz transformations of e a .Since the boost part here is fixed by the definite observer 4-velocity, we only have to deal with the SO(3) ambiguity Eq. (3).To that end, we express n in terms of observed angles (ϑ, ϕ), n ≡ sin ϑ cos ϕ e 1 + sin ϑ sin ϕ e 2 + cos ϑ e 3 , (6) so that O ≡ O(x o ; z, ϑ, ϕ) becomes a function on the unit sphere S 2 .The latter is a two-dimensional manifold with coordinates ϑ A ∈ {ϑ, ϕ} and with the admissible coordinate transformations being the ones induced by the rotation Eq. ( 3). 1he generic observable will therefore be a tensor field on that manifold O A 1 ...A n .However, in two dimensions any tensor can be locally reduced to scalars and pseudoscalars.For instance, a vector can be decomposed into a scalar v and a pseudoscalar ṽ via while a tensor can be decomposed into two scalars T, t and two pseudo-scalars T , t where S AB , ε AB , and ∇ A are the metric, totally antisymmetric tensor in two-dimensional and covariant derivative on the 2-sphere S 2 , respectively.Further decomposing these (pseudo-)scalars into spherical harmonics leads to the decomposition of O A 1 ...A n into spin-weighted spherical harmonics.
To avoid the introduction of the latter, which complicates unnecessarily the formalism, here we will assume that our observables are the (pseudo-) scalars of the above decomposition (up to possible Laplacians).For instance, in the case of the CMB polarization tensor P AB we would directly work with its "electric" and "magnetic" components ∇ A ∇ B P AB and ε C A ∇ B ∇ C P AB , respectively.Finally, another important observable is the incoming photon frequency so the most general parametrization of light-based observables is a spectral distribution O(x o ; z, n, ω).In the case of observables associated with diffuse sources (e.g., the CMB), i.e., that do not have a particular emission moment and therefore no associated redshift, then we simply have O(x o ; n, ω).For the sake of simplicity, here we will assume that ω is either fixed, or that it is integrated over with some given spectral distribution, as in the case of the CMB temperature for instance.We will therefore work with observable functions of the form O(x o ; z, n), but the inclusion of ω is straightforward and basically enters our equations exactly as the redshift dependence.
A more detailed account of this subsection's content, and in particular of the underlying geometrical constructions, can be found in Ref. [64].

B. Ensemble averaged N-point correlation functions
The theorist works with tensor fields on the full spacetime manifold which we collectively denote by (x).The observable O(x o ; z, n) is an integrodifferential functional of these fields Within cosmological perturbation theory, one splits into a homogeneous and isotropic "background" solution ¯ and a fluctuation φ, where the latter is subject to a gauge ambiguity.This is the manifestation of infinitesimal coordinate transformations in this framework, since the background is kept fixed under these transformation.The observable O(x o ; z, n) should, by definition, be independent of the choice of gauge, so one can fix the latter.Moreover, one can use any constraint equations (e.g., the Poisson equation) to reduce the number of fields down to those carrying degrees of freedom, which we denote by φ a (x), i.e., the "a" index includes both space-time and internal indices.Thus, the φ a (t 0 , x) data at a given time t 0 uniquely determine the ones at any other time t, Here it is assumed that the φ a also contain the momenta/velocities, for fields obeying second-order equations in time.Next, to compare with observations, one needs to consider a statistical ensemble of solutions.The φ a are therefore promoted to stochastic fields with a corresponding probability distribution functional (pdf) associating a probability density to each field solution φ a (x).Since the latter are completely determined by their configuration at some reference time φ a (t 0 , x), it suffices to define that pdf on these field configurations P ≡ P[φ(t 0 )] (dropping the index a and the position x for notational simplicity).One can then define the moments, i.e., the statistical averages of field products, φ a 1 (t 0 , x 1 ) . . .φ a n (t 0 , x n ) := Dφ(t 0 ) P[φ(t 0 )] φ a 1 (t 0 , x 1 ) . . .φ a n (t 0 , x n ), (13) which completely determine the functional P, and with these one can define the field correlation functions (FCF), and use the linearity of the averaging operation to express this as a functional of Eq. ( 13).In particular, P is chosen such that which can be alternatively stated as With the above definitions one can now perform statistical averages of arbitrary functionals of the φ a (x).In particular, the theoretical correlation functions (TCF) of the observables are defined by which are therefore ultimately a functional of ¯ and the FCFs Eq. ( 14).Now note that, although strict homogeneity and isotropy are lost as soon as φ = 0, these notions can be reintroduced at the statistical level by imposing the corresponding symmetries on P[φ(t 0 )].Thus, we require that, if the two configurations φ a (t 0 ) and φ a (t 0 ) are related by an isometry of the background geometry, then P[φ(t 0 )] = P[φ (t 0 )].As a result, the FCFs and TCFs are invariant under isometries, meaning in particular that Gth is independent of the observer position x o , and invariant under a common rotation of its N directions,

III. THE GENERAL THEORY OF REDUCED ANGULAR N-POINT SPECTRA A. The previous implicit approach
Consider some observable on the sky O(n), where for the purposes of the present discussion we can omit the redshift dependence.The quantity that is usually considered in theory is the angular N-point spectrum through ensemble averaging O l 1 m 1 . . .O l N m N , where O lm are the spherical harmonic components.In the standard approach where statistical isotropy is assumed, that N-point spectrum is invariant under rotations, so that not all of its components are independent.Its information can therefore be stored in a rotationally invariant quantity with fewer indices called the "reduced" angular spectrum.Assuming for simplicity here O lm = 0, the lowest-order cases are the angular power spectrum, where C l is the "reduced angular power spectrum," and the angular bispectrum where B l 1 l 2 l 3 is the "reduced angular bispectrum" and the first factor is the Wigner 3 − j symbol.As one could expect from isotropy, C l and B l 1 l 2 l 3 have no m k dependence.The relations Eqs. ( 20) and ( 21) can be inverted and, in particular, allow one to build unbiased estimators for these quantities, i.e., One can therefore work exclusively with the invariant and more compact reduced spectra, both at the theoretical and data level, instead of the full ones.The theory of the reduced angular N-point spectra is well developed in the literature up to the trispectrum case [43][44][45][46][47][65][66][67][68], based on the pioneering work in Ref. [42] which considered the following method.First, one looks for the relation between the full spectrum and its reduced counterpart [e.g., Eqs.(20) and (21)] by imposing on the former the invariance under rotations, where D l,mm is the Wigner matrix, all of them here depending on the same rotation angles.Using the identities of the Wigner matrices and 3 − j symbols, one must then find the most general solution to this equation, which is Eqs.(20) and (21) for the N = 2, 3 cases, respectively.Finally, one must also invert this relation to obtain its estimator [e.g., Eq. ( 22)] and, through that quantity, its explicit definition in terms of ensemble averages of the observables [e.g., Eq. ( 23)].Although this procedure generalizes straightforwardly to N = 4 [42] and beyond, 2 it becomes quickly very difficult in practice, as one has to "guess" the relation between the full and reduced spectrum by inferring it from an increasingly complicated equation.Moreover, even after finding a solution, one must still prove that it is the most general one and then invert it, which is also harder to do for larger values of N. As a result, the generalization of Eq. ( 22) for arbitrary N does not exist in the literature.Finally, with this procedure the reduced spectrum is determined only up to an overall normalization, which can depend on the l i numbers, since only the m i numbers are summed over in the defining Eq. ( 24). 3

B. A new constructive approach
Here we propose a computationally straightforward approach to the problem.Since the defining property behind the notion of reduced spectra is rotational invariance, we will first focus on this geometrical aspect, i.e., before any statistical considerations.We will therefore be constructing the generalization of the full estimators Eq. ( 22), which exclusively depend on observed data, with their statistical counterparts being obtainable by simply taking the ensemble average as in Eq. (23).In fact, we will refer to these estimators as the "observational" reduced spectra, as opposed to the "theoretical" ones Eq. ( 17) which are obtained though statistical ensembles that are observationally unavailable.This is to highlight the fact that the observational reduced spectra have a significance of their own, as they amount to all the rotationally invariant information one can extract out of the observed data.This is the physical information, as it is independent of the artificial SO(3) ambiguity of the observer's spatial reference frame e i in Eq. ( 6). 2 Following this procedure, in Ref. [69] the authors have found the analog of Eqs. ( 20) and ( 21) for the quadrispectrum case N = 5.In Ref. [70] one can find a formal expression for arbitrary N, and also the real space N-point correlation function in terms of the reduced angular spectrum, but without rigorous demonstration.In both of these references, however, the inversion to find the estimators and ensemble-averaged spectra in terms of the observables, i.e., the analogues of Eqs. ( 22) and (23), is not given. 3See, for instance, Refs.[66,70], where alternative normalizations are considered and are essentially related to the l i -dependent factors between the Gaunt and 3 − j coefficients.
We start with the fact that the observer only has access to a single light-cone x o and therefore to the observables O(x o ; z, n).The building blocks for rotationally invariant functionals of O(x o ; z, n) are the average of the products N k=1 O(x o ; z k , n k ) over all possible common rotations of the directions n k , i.e., the average over the SO(3) group, where dR is the Haar measure on SO(3) and is invariant under group multiplication.Thanks to this, if one rotates the n k in Eq. ( 25) with the same rotation R , then this can be reabsorbed in the dummy variable R by the redefinition R → R R, thus leaving the average invariant.To turn Eq. ( 25) into a well-defined integral, we consider an arbitrary orthonormal reference frame e i and parametrize R as where R i (θ ) denotes the matrix corresponding to a rotation around e i with angle θ , so that α, β, γ are the Euler angles, In particular, the inverse matrix is simply The Haar measure dR on SO(3) now reads so the average over SO(3) of some function f (n 1 , . . ., n N ) is given by and thus, the N-point observational correlation functions (OCF) are defined by In particular, the N = 1 case reduces to the average over the sphere, which is shown by picking e 3 = n.For N > 1, three out of the 2N angles in G ob are redundant, since we are free to rotate at will, or equivalently, to choose the reference frame {e i } 3 i=1 arbitrarily.For instance, one can pick (assuming that n 1 and n 2 are not parallel) thus leaving us with a dependence on the 2N − 3 angles ϑ 2 , . . .ϑ N and ϕ 3 , . . ., ϕ N that parametrize {n k } N k=2 in the e i basis Eq. ( 6).Now note that the N-point OCF is a redshift-dependent function on that is symmetric under a common rotation of these N spheres, just as its theoretical counterpart Eq. ( 19).In particular, expressing O inside G ob in terms of the stochastic fields O ≡ O[ ] and using the linearity of averaging, we obtain the identity showing that the G ob are full unbiased estimators of the G th .
The idea now is to decompose such functions in a basis of SO(3)-invariant functions on S N 2 .For N = 2 the only SO(3)invariant combination of the two vectors is and a basis of functions on that interval are the Legendre polynomials P l , so where is the observational two-point spectrum and O lm are the harmonic components of O, Note that we work with the unit-average normalization of the spherical harmonics Eq. (A5), which is the natural one in this context and that we keep the summation over l, m indices implicit for notational simplicity.In the presence of both dummy and free l, m indices, their nature will be inferable unambiguously by looking at both sides of the equation.The m indices will always be clearly associated to some l value and therefore run from −l to l, while the l indices run from s to ∞, where s is the spin of the observable under consideration. 4For N > 2, the decomposition is performed in detail in Appendix A for generic functions f (n 1 , . . ., n N ) and the result is the following: where and and where we have defined the following coefficients: Although the latter appear as a sum over the M k indices, they are actually a single product because a Wigner 3 − j symbol vanishes if the sum of its m k entries is not zero.Also for that reason, on can directly check that these coefficients are nonzero only if N k=1 m k = 0. Equations (39) to (42) are the main result of this section.Note that they can actually also encompass the N = 1, 2 cases if we define the redundant notation and The "harmonic" components of the N-point OCF Eq. ( 40) are the observational N-point spectra which generalize the structures Eq. ( 22) to arbitrary N. In particular, for the N = 3, 4 cases we recover the known results [42], up to a different normalization convention factor (−1) As for the functions Eqs.(41), they form indeed a basis for SO(3)-invariant functions on S N 2 , as shown in Appendix A. They are orthonormal Eq. (A25) and therefore we can invert Eq. ( 39) Observe how the quantities Eqs. ( 40) and ( 41) are explicitly SO(3)-invariant, since they only depend on total angular momentum numbers l k and L k , making a total of 2N − 3 indices.As noticed earlier, this is indeed the number of independent angles present in the corresponding correlation functions.Note also that Eqs. ( 40) and ( 41) are the very same combinations of the observable components and spherical harmonics, respectively.As we will see later on, all of the nontrivial contractions of indices will always be controlled by the coefficients Eq. ( 42) with no extra factors, which comes from the fact that they are orthonormal in the sense of Eq. (A23).Thus, contrary to the implicit method described in the previous Sec.III A, here there exists a normalization of the reduced spectra that appears as naturally privileged for computational convenience.
For the sake of completeness, let us also present the generalization of Eq. ( 23) by reminding that the TCFs Eq. ( 17) are SO(3)-invariant functions on S N 2 [see Eq. ( 19)], so they can also be decomposed in the basis Eq. ( 41).Using Eq. ( 35) and the linearity of averaging we find Finally, to make contact with the implicit method described in Sec.III A, let us also invert Eq. ( 49) to obtain the generalization of Eqs. ( 20) and (21), This is the general solution to the equation of the implicit method Eq. ( 24) for arbitrary N, up to a l k , L k -dependent normalization.This result highlights again the naturalness of the normalization used here for the reduced angular spectra, as all the complicated factors entering any of the above equations always combine to form the multilateral Wigner symbols exactly.Note also that, with our construction, the consideration of the full spectra Eq. ( 50) becomes superfluous, as one no longer requires them to derive the reduced ones as in the implicit method of the previous subsection.

C. Multilateral diagrammatic representation
For N = 3 the coefficients defined in Eq. ( 42) simply reduce to the Wigner 3 − j symbol This quantity contains a {l 1 l 2 l 3 } factor, for which an alternative definition from Eq. (A22) is hence the common name "triangle delta."Consequently, the general coefficient Eq. ( 42) contains a {l 1 . . .l N |L 1 . . .L N−3 } factor, defined in Eq. (A24) as a product of certain { j i , j k , j m } factors.Generalizing the picture laid out in Refs.[42,69] for the N = 4, 5 cases,5 this quantity Eq. (A24) can be interpreted as a "multilateral delta" in the following way.Being a product of N − 2 triangle deltas, it is nonzero only if all of the involved integers l 1 , . . ., l k and L 1 , . . ., L N−3 form their respective triangles.Moreover, with respect to the triangle ordering in Eq. (A24), one of the edges of two neighboring triangles must have the same length L k , so we can picture the triangles as being connected by common edges.The resulting shape is therefore a multilateral with N edges of lengths l 1 , . . ., l N , while the L 1 , . . ., L N−3 integers correspond to the lengths of the N − 3 diagonals connected to the vertex where the edges l 1 and l N join (see Fig. 1).Thus, the multilateral delta is not nonzero for any set of l 1 , . . ., l N that can form a multilateral, but only for those whose diagonals also have integer length.This suggests naming the symbols W ... ... defined in Eq. ( 42) the "multilateral Wigner symbols", with the 3 − j symbols therefore corresponding to the "triangular" case.
From this multilateral picture one can also directly infer that the decomposition Eq. ( 39) is not unique for N > 3, as was already recognized in Ref. [42] for the case N = 4. Indeed, there is an ambiguity in choosing the vertex with respect to which the diagonals are drawn.This ambiguity, modulo symmetries of the 3 − j symbols, corresponds to alternative orderings of the (l k , m k ) pairs and therefore leads us to consider permutations of the multilateral Wigner symbol indices.As shown in detail [42] for the N = 4 case, these are achieved by taking linear combinations with Wigner 6 − j symbols, thus allowing one to relate all possible orderings.

A. The issue
An important limitation in cosmology is that, observationally, we have access only to one single realization of the statistical process under consideration, the universe, and only from one single vantage point.One therefore requires some assumptions to relate the theoretical predictions, which are of statistical nature, to observations.In the standard approach, we consider stochastic small initial fluctuations in a homogeneous and isotropic universe, which evolve under gravity.We also assume that their probability distribution functional is homogeneous and isotropic and that averaging over positions and directions is equivalent to ensemble average, i.e., an ergodic hypothesis.With these assumptions one can effectively treat different parts of the universe as different realizations and relate the theoretical ensemble averages to observational light-cone averages.
The requirement for the ergodic hypothesis to be applicable is the observational availability of a large enough number of source points.Indeed, the common motto in the literature is that the theorist is allowed to ensemble average a given product of fields evaluated at source positions, because the observer probes several of these positions.This viewpoint is perfectly sufficient as long as one is only interested in the lowest-order approximation where a cosmological observable is entirely determined by the value of fields at the source position.In the era of precision cosmology, however, the above approximation is no longer valid, because observations are able to capture several subleading effects which depend on fields evaluated on the whole line of sight from the source to the observer (e.g., weak lensing).As one approaches the observer along its past light-cone, the number of observable points decreases and therefore, one would naively infer, so does the degree of applicability of the ergodic hypothesis.The extreme case is the observer point itself, for which we can only have a single measurement.Are we then allowed to perform ensemble averages of fields at, and in the vicinity of, the observer point, as we implicitly do in Eq. ( 17)?The aim of this section is to answer this question and its conclusion is in the affirmative.
Before demonstrating this statement, however, let us explain in more detail why this question is relevant for precision cosmology.To that end, note first that the typical expression for a cosmological observable O to linear order in perturbation theory is of the form where X o and X s denote field fluctuations evaluated at the observer and source positions, respectively, while X los denotes an integral over fields evaluated along the background line of sight.The quantity that is relevant for comparing with observations is then, e.g., the corresponding two-point correlation function, whose second-order contribution is Now note that the split Eq. ( 53) is not unique, as one can always transfer quantities between the three kinds of contributions by performing integrations by parts.Moreover, since we parametrize O with the observed redshift z and position in the sky n, the function O (1) (z, n) is gauge-invariant, because the full observable O(z, n) has no dependence on coordinates whatsoever [58].The split Eq. ( 53) can then be chosen such that each of the three terms are separately gauge-invariant.In practice, however, the expression Eq. ( 53) often presents itself with a gauge-dependent observer term X o , which can therefore be set to zero with an appropriate choice of gauge, e.g., the synchronous gauge.Nevertheless, the expressions found in the literature are often in longitudinal gauge, in which case As already mentioned, one can consistently avoid the presence of observer terms (gauge-invariant or not) by an appropriate integration by parts, but one then needs to take into account the resulting extra terms in X los and X s that this manipulation introduces.Instead, the standard approach in the literature is to work in the longitudinal gauge and simply discard X o as soon as it appears in the computation.The issue of whether one can ensemble average field values at the observer position therefore never arises in the standard practice.As for the line-of-sight terms X los , they are ensemble averaged just like the source terms X s , which implicitly assumes that ergodicity applies to all points on the light-cone, independently of how close they are to the observer point.
It is now well understood that discarding the observer terms X o as described above leads to several qualitative problems: (1) In most cases the resulting O (1) is not gauge-invariant under gauge transformations at the observer because one discards a gauge-dependent term X o .However, the gauge transformation of the observable only produces observer terms, by construction, which is therefore "ok" in this approach, since one is precisely neglecting such terms to begin with.
(2) Following the construction of Sec.II B, Fourier decomposing the field configurations φ(t 0 ) and using statistical homogeneity to eliminate one of the d 3 k integrals, one obtains where the integrand is a combination of field k-spectra at t 0 , multiplied by growth functions and e i k• x factors evaluated on the light-cone.This integrand is generically divergent in the infrared, a fact which has led to confusion in the literature [71].Moreover, the dependence on the infrared cut-off that has to be introduced to control the divergence in Eq. ( 55) breaks the equivalence principle, as the observable correlation function ends up depending on the uniform gravity mode.
Taking into account X o precisely cancels the divergent terms, thus showing that they cannot be associated with physical effects and that the equivalence principle holds as one would expect [56,60,63]. 6As discussed at the beginning of Sec.II, the integrand in Eq. ( 55) is not itself an observable quantity, but it is interesting to see that the consideration of observer terms preserves the physical intuition in k-space.
(3) The sky parametrization n does not generically correspond to the one of an actual observer. 7Indeed, the terms X o contain corrections that implement the diffeomorphism relating the observer parametrization n to the one induced by the local coordinate system around the observer and therefore to the gauge that is chosen.In longitudinal gauge the coordinate-induced frame in the observer's tangent space is not orthonormal in the presence of vector and tensor modes, as the one of an observer should be, so neglecting X o leads to spurious contributions to the "lensing" of observables in the sky.
Nevertheless, it turns out that there is no significant quantitative problem today in neglecting X o to linear order in perturbation theory.This is because the actual observer approximates the ensemble average in Eq. ( 54) with the SO(3) averages of the data in the sky, denoted by . . .SO(3) in what follows.X o is special with respect to this operation, because it has no angular dependence (if its spin vanishes) and can therefore be factored out.Consequently, the contribution X 2  o SO(3) ≡ X 2 o is a monopole term (for spin s = 0) in the corresponding power spectrum, while the cross-terms are simply zero, e.g., If O is an observable of nonzero spin s (e.g., for the observer velocity, where s = 1, or the CMB polarization, where s = 2), then the X 2 o SO(3) will contribute to the first nontrivial multipole which is the s multipole.But the overwhelming part of the relevant information actually lies in the rest of the spectrum, so ignoring the observer terms of cosmological observables is indeed irrelevant quantitatively. 8he caveat of the previous argumentation is that this is only true to linear order in perturbation theory.Indeed, to second order the solution of the observable will have cross-terms of the schematic form9 implying terms of the form for the next order of the two-point correlation function.Importantly, these contributions will affect all multipoles of the third-order power spectrum with an amplitude that is a priori comparable to the standard terms such as ∼ X 2 s X los , etc.Moreover, similar terms will also appear in higher-order correlation functions.Thus, in the era of precision cosmology one can no longer ignore the observer terms X o , in the usual gauge choices that are considered, meaning that it is relevant to discuss their inclusion inside ensemble averages.

B. Relating theory and observation
We now wish to derive the relation between the observational and theoretical N-point functions of observables Eqs. ( 31) and (17).We already have Eq.( 35), but this is a theoretical relation, as it involves ensemble averaging, and simply states that G ob is a full unbiased estimator of G th .Rather, we are interested in relating the two N-point functions as one does in practice, i.e., we need G ob to be evaluated on a definite field configuration, and this is achieved using the ergodic hypothesis.In particular, we want to see how the latter enters the derivation and how it affects the issue of the single observer point.
We start by recalling Sec.II B and consider the set of three-dimensional field configurations φ a (t 0 ) over which we sum when performing the ensemble average in Eq. ( 13).Given the invariance of P[φ(t 0 )] under the isometry group, it is convenient to partition this ensemble into equivalence classes, where two field configurations are deemed equivalent if they can be related by an isometry.For the sake of simplicity, let us consider here the case of flat background space, so that the isometries form the Euclidean group E := SO(3) R 3 , and let us also use the trivial Cartesian coordinates.Any element of a given equivalence class can be described as an isometry of some fixed representative φa (t 0 ), where R is a rotation matrix, c a translation vector and M b a is the matrix that rotates tensor indices in a.We can therefore split the functional integration in Eq. ( 13) into an integral over the elements of a given class followed by an integral over all possible classes.By the latter we mean an integral over suitably chosen representatives φa (t 0 ) such that the corresponding functional integral is well-defined. 10Since the pdf is constant over all representatives of a given class, the integral in Eq. ( 13) becomes ) 10 The existence of such a splitting of the integration is a nontrivial mathematical assertion, whose proof, if possible, would go beyond the scope of this paper.
where we now integrate only over the set of representatives, and is the average over the Euclidean group action over the field configurations, V is the total volume and dR is the SO(3) Haar measure.As one may expect, the ensemble average therefore contains a purely geometric average over the symmetry group of the pdf.
Apart from subsets of measure zero, the configurations φ(t 0 ) appearing in the integral Eq. ( 60) have a rich spatial dependence.In particular, they are nonperiodic functions which therefore probe a large variety of local field profiles for large enough V .In the V → ∞ limit, which we consider here, the ergodic hypothesis states that this probing is thorough enough to make the Euclidean average in Eq. ( 60) independent of the configuration φ(t 0 ).As a result, that average factorizes out of the integral, thus yielding the relation Now on the right-hand side it is a definite, although generic field configuration that is considered.We can then generalize this manipulation straightforwardly to the case of the FCFs Eq. ( 14), since the Euclidean group action is independent of the time variable t, and therefore also to the case of the TCF Eq. ( 17), where now we act with E directly on the four-dimensional fields φ a (x) instead of the φ a (t 0 , x).Note also that we have not specified the ¯ dependence in Eq. ( 63) for simplicity.The above expression allows us to make contact with the OCFs Eq. ( 31).Let us first redefine the dummy variable c → c − R x o in Eq. ( 63) and let us also define the notation for shifted fields to get We next observe that, for a given value of c, the fields φ a, c are rotated by R −1 around x o in Eq. (65).But rotating the fields around x o is tantamount to rotating the observed angles n in the opposite direction, so the SO(3) average on the fields translates into an OCF Similarly, translating the fields by x o − c can be equivalently expressed as shifting the observer by c − x o , so we finally obtain (after renaming c → x o ), i.e., the average over the action of the Euclidean group on the fields φ a is equivalent to averaging over all observer reference frames, as in G ob , but also over all observer positions.This is simply because we have imposed both statistical isotropy and homogeneity.For instance, the CMB maps one would obtain from another vantage point of the universe x o would be different from the ones we observe on earth today, while the theoretical power spectrum which we calculate is the same for all vantage points.Since the relation between the correlation function and their associated spectra is linear, we have that Eq. ( 67) simply translates into Equations ( 67) and ( 68) are the central result of this section.The important thing to understand from them is that the application of the ergodic hypothesis singles out a definite field configuration on the right-hand side, but it does not single out the position of the actual observer, since all possible such positions are taken into account equally inside an average.Therefore, the applicability of the ergodic hypothesis depends on the representative field configuration which we choose on the right-hand side of Eq. ( 67), completely independently of how this configuration is probed observationally.In particular, as stated in the step in which ergodicity is actually used Eq. ( 62), what we need is that field configuration to have a rich profile, which is a decent assumption to make about the configuration of the actual universe.What we also see, however, is that the use of ergodicity is necessary, but not sufficient to relate theory and observations yet, because the information from several x o that is required in Eqs. ( 67) and ( 68) is not observationally available.Rather, the step which actually singles out the observer point is the approximation of the x o averages in the right hand-sides of Eqs. ( 67) and ( 68) by the value at the actual observer position, (69) and respectively.Approximating an average over some set by a single value within that set is of course the crudest possible estimator of that average.One expects this estimation to be accurate enough only if the values of the considered set exhibit a relatively small dispersion around the average value.But this dispersion in x o space cannot be measured for the very same reason that brought us to this approximation in the first place, i.e., the observational unavailability of data at other observation points.Nevertheless, as we will see in the next subsection, the variance of G ob with respect to the x o dependence is equal to its theoretical analog, the statistical or "cosmic" variance, through the ergodic hypothesis.Thus, the error one makes in Eq. ( 69) or Eq. ( 70) is precisely cosmic variance, which is indeed negligible on small enough scales (see Sec. V B).Equations ( 69) and ( 70) allow to relate theory and observation, as both sides can be computed in their respective domains.Finally, another way of interpreting the result Eq. ( 67) is by combining it with Eq. ( 35) to obtain i.e., the ergodic hypothesis and statistical homogeneity equate the ensemble averaging with the translational averaging of a generic configuration.Thus, under these assumptions, while estimating G th with G ob first appears as a double approximation, i.e., the measurement of a single universe realization and from a single viewpoint, these two turn out to be the very same approximation.

C. Unambiguous covariance matrix and cosmic variance
Equation (71) shows that, assuming ergodicity and statistical homogeneity, ensemble averaging G ob is tantamount to averaging it over x o with a generic configuration of the fields.The last open question is therefore whether these two types of averaging are also equal when it comes to the variance associated with the estimation Eq. ( 69).The statistical covariance matrix is (72) where α N collectively denotes the set {z k , n k } N k=1 for notational simplicity, while the spatial one is The former leads to the notion of cosmic variance, so, if both of them are the same, we would have shown that our indiscriminate use of ensemble averaging leads to no new uncertainties between theory and observation.This is simply achieved by noting that a product of two OCFs can be expressed as a partial SO(3) average of a single OCF, Inserting this in Eqs. ( 72) and ( 73) and using Eqs.( 67) and ( 35) we find that both covariances are equal, Cov spat. erg.
and that Thus, the absolute "1-sigma" error of the estimator Eq. ( 69), is unambiguously cosmic variance.We see that our derivation provides a refined understanding of that notion, which is not usually expressed in cosmology textbooks.Given our statistical assumptions, cosmic variance is the error due to the fact that we observe a single realization of the universe and from a single vantage point x o .In accordance with the discussion of the previous subsection, if either of these two conditions were dropped, then there would be no cosmic variance.On one hand, if we had simultaneous access to the data of a single realization from all possible x o , then the ergodic hypothesis Eq. ( 67) would allow us to match the theoretical predictions exactly. 11On the other hand, if we could observe all possible universe realizations, even from a single viewpoint x o , then we would be able to compute directly the theoretical N-point functions, which are independent of x o .In the first case we are technically setting spat.→ 0, whereas in the second one it is rather stat.→ 0.
11 Here we neglect the fact that this information would also require a time ∼V 1/3 to be collected by a main observer and therefore analyzed.
In the next section we will see that, although cosmic variance is always nonzero in practice, the associated relative uncertainty on the spectra must tend to zero when l k → ∞ as a consequence of statistical isotropy and the central limit theorem.Technically, this corresponds to stat.becoming negligible compared to G th for large enough l k values, with statistical homogeneity and the ergodic hypothesis then implying that so does spat., as we just saw in Eq. ( 75).This is the reason why, in practice, one can still obtain statistical cosmological information even from a single observational point.

A. The covariance matrix of angular N-point spectra
From now on we drop the dependence on t o for notational simplicity.Here we wish to derive the expression for the covariance matrix of the reduced N-point and N -point spectra, i.e., the harmonic analog of Eq. ( 76).To that end, we first note that Cov(t o ; α N ; α N ) is a function on S N 2 × S N 2 that is invariant under independent rotations on each of these two components, so we can decompose each of the corresponding arguments in the basis Eq. ( 41).The harmonic components are then found by projecting twice To express this in terms of the reduced theoretical spectra, we use Eq. ( 76) and decompose the TCFs in the basis Eq. (41).By proceeding exactly as in Appendix A to eliminate the SO(3) average in Eq. ( 76), and using the orthonormality relations Eqs.(A5) and (A23), we find Now the second product of Wigner symbols can be simplified using Eq.(B5) iteratively, then Eq. (B6) and finally Eq. (B1) to find 2. The type of diagram composing the covariance matrix of the N-point and N -point spectra.
We therefore obtain where we have defined the following contraction of multilateral Wigner symbols of equal order which is rotationally invariant, as it does not depend on m k indices.From Eq. (81) we see that the covariance matrix of the N-point and N -point spectra is controlled by the squeezed (N + N )-point spectrum, as depicted in Fig. 2, which arises from gluing together an N-point and an N -point diagram.Therefore, the squeezed (N + N )-point spectrum here is already proportional to {l 1 . . .l N |L 1 . . .L N −3 }, which is why we were able to discard that factor coming from Eq. ( 80).The coefficients defined in Eq. ( 82) form a symmetric square matrix in the space of reduced N-point spectra and in Eq. ( 81) we see that this matrix acts on the N-point component of the squeezed (N + N )-point spectrum.Comparing with the analogous expression for the correlation functions Eq. ( 76), we thus see that the A matrix Eq. (82) implements the operation of partial SO(3) average on invariant functions in harmonic space.Finally, note that one can extend Eq. (81) to the cases N, N = 1, 2 by using a redundant notation analogous to Eqs. ( 43), (44), and (45).

B. Cosmic variance in the large l k limit
Let us consider the case N = N , where the covariance matrix quantifies the typical error of the G th ≈ G ob ( x o ) approximation that is cosmic variance, so that the l-space analog of Eq. (77) reads We now note that this quantity has a simple asymptotic behavior at large l k , which is completely determined by the central limit theorem [72].Indeed, thanks to the assumption of statistical isotropy, the O lm ( x o ) for a given value of l are 2l + 1 independent and equally distributed random variables.Thus, the product N k=1 O l k m k ( x o ) for given l k values consists of O( N k=1 l k ) independent and equally distributed random variables.The observational spectrum component ) then corresponds to a weighted sum of these products over their m k indices Eq. (40).However, since the multilateral Wigner symbol Eq. ( 42) is zero unless N k=1 m k = 0, the sum actually only contains O( N−1 k=1 l k ) independent and equally distributed random variables.Finally, noting that G ob ( x o ) is technically a sample average and G th is its ensemble average, the central limit theorem [72] states that Consequently, the relative cosmic variance tends to zero as l k → ∞.In particular, this means that , without the need to perform the average over x o as in Eq. ( 68), thus justifying the approximation Eq. ( 69) for small enough scales.This is therefore how one can obtain accurate statistical cosmological information from a single vantage point for sufficiently large l k .Following the discussion at the end of Sec.IV, this loss of the x o information in the l k → ∞ limit is a consequence of statistical homogeneity and the ergodic hypothesis.

VI. CONNECTED ANGULAR N-POINT SPECTRA A. Definitions, modified ergodic relation and covariance matrices
Here we focus on the part that is usually of most interest: the connected component of a given correlation function or associated spectrum.This is a nonlinear combination of the full statistics G, so one should expect the linear ergodic relations Eqs. ( 67) and ( 68) to be modified.Here we will illustrate this modification by considering the case of the connected two-, three-, and four-point functions.We start by defining the observable fluctuations, in the observational and theoretical cases, as the deviation of the observable from the respective one-point functions or, in terms of the harmonic components, We note in particular that, by construction, the observational monopole is identically zero, while the theoretical one captures precisely the difference between the two one-point functions The connected two-, three-, and four-point functions can then be defined by where here the star is a placeholder for "ob" and "th," the corresponding averages are respectively . . .SO(3) and . . .and we have omitted the x o dependencies in the "ob" case for notational simplicity.Clearly, the these functions are invariant under a common rotation of their arguments, so we can compute their harmonic components by projecting on the basis Eq. (41).Using the connected analog of Eq. ( 36) and also Eqs. (A15), (B5), (B6), and the 3 − j symbol symmetries, we recover the known expressions for the reduced power spectrum, bispecrum, and trispectrum estimators [42] 12 The theoretical counterparts are then obtained by replacing "ob" → "th" and ensemble averaging, again in accordance with the N = 2, 3, 4 results of Ref. [42].
Let us now see how the linear ergodic relation Eq. ( 68) looks like in terms of the connected parts.First, because of Eq. ( 89), the observational spectra vanish identically whenever at least one of their l k entries is zero, i.e., they have vanishing "monopoles" by construction.In contrast, this is not the case for the theoretical spectra Eq. (90).As a concrete example, let us consider the relation between the connected two-point correlation functions, by expressing them in terms of the full functions and using Eq. ( 67) where Cov(z 1 ; z 2 ) is the covariance matrix of the one-point function with itself [see Eq. ( 73)].Alternatively, in harmonic space, The difference with Eq. ( 68) in the connected case is a monopole term, which is also exactly the multipole for which the observational spectrum vanishes identically C ob 0 (z 1 , z 2 ) ≡ 0. Therefore, a better display of Eq. ( 98) could be erg.
= Cov(z 1 ; z 2 ). (99) Thus, the ergodic relation Eq. ( 68) still holds for the components containing physical information C ob l = 0, while the leftover equation relates the covariance matrix of the one-point function to the monopole of the theoretical power spectrum.As a result, the absolute one-sigma error Eq. ( 77) associated with the one-point function approximation, This picture generalizes to the case of higher N, as one can check by expressing th O lm in terms of ob O lm in the theoretical spectra B th and T th .One finds that the resulting equation splits into two sets of equations: the nonmonopole ones, i.e., the ones where none of the l k is zero and for which Eq. ( 68) still holds, and the ones relating the theoretical monopoles to the covariance matrix and higher order analogues (skewness, kurtosis, etc.) of lower-N spectra.For this reason, from now on all of our equations will hold up to monopole terms, so that we do not have to deal with this subtlety.The next nontrivial case is the trispectrum, where one now subtracts products of two-point functions.For the purposes of our point it suffices to describe it schematically as where the ellipses here denote the other two possible orderings of the two-point function arguments.In complete analogy with the derivation in the N = 2 case Eq. (97), using Eq. ( 67) we find so the difference is again a covariance matrix, but now the one of the two-point function with itself.In harmonic space we then find where we recognize the same structure as in Eq. ( 96) by construction and we have the covariance matrix of the two-point spectra Cov l;l (z 1 , z 2 ; z 3 , z 4 ) erg.
A first important difference with the two-point case Eq. ( 98) is that the extra terms here affect multipoles of arbitrary magnitude, although only a small subset, the one of pairwise equal l k 's.A second important difference is that the components of T ob with pairwise equal l k 's are not identically zero in general, i.e., the equations no longer split as in the monopole case.One must therefore take Eq.(104) as it is and infer that the analog of the single-observer approximation Eq. ( 70) in this case is actually Finally, just as the two-point case Eq. ( 98) with l = 0 provides the cosmic variance of the 1-point function for free, the four-point case Eq. ( 104) with l 1 = l 2 =: l, l 3 = l 4 =: l and L = 0 provides the cosmic variance for the power spectrum.However, here T ob lll l |0 = 0, but rather [use Eq. ( 96) and properties of the 3 − j symbols], so, inserting this in Eq. ( 104), using Eq. ( 106) and isolating the covariance matrix, we find This generalizes the result of Ref. [42], worked out for the CMB where there is no redshift dependence, to the case of generic redshifts along the light-cone.In particular, the cosmic variance of the power spectrum, i.e., the absolute 1-sigma error Eq. (77) in the approximation Finally, for Gaussian statistics and equal redshifts, we recover the well-known result13 Given the discussion of Sec.V B and comparing Eq. (110) with Eq. (84), we find that here the central limit theorem essentially states that the trispectrum decays faster with growing l than the C 2 l terms.

B. Mind the monopole when using relative fluctuations
Until now we have considered the "absolute" fluctuations ob O and th O, but in practice the most convenient ones to work with are the relative ones (if G th (z) = 0), This introduces a nonlinear difference between the observational and theoretical definitions and will affect the covariance matrices of the corresponding spectra.Indeed, consider for instance the case of the relative power spectra which are denoted using tilded letters to distinguish them from the absolute ones C ob l and C th l .The corresponding covariance matrix is given by and one would naively expect this to be simply where here Cov l;l is the covariance matrix of the absolute two-point spectrum Eq. (108).In particular, one would then infer the analog of Eq. ( 111) for the corresponding cosmic variance in the equal redshift Gaussian case, However, as we will now show, Eqs. ( 116) and (117) are actually only approximate, because to obtain them one must wrongly assume that G ob ( x o ; z) = G th (z) or, according to Eq. ( 89), that the monopole of the observable is zero at x o for all z, where the remainder term Rll depends on the monopole δ th O 00 ( x o ; z k ) that comes from converting the G ob ( x o , z k ) into G ob (z k ) and which cannot go through the integral over x o , because of its dependence on that variable.The remainder term is therefore controlled by a combination of monopoles of the theoretical power spectrum, bispectrum and trispectrum which, unlike the extra "mon."terms described in the previous section, contribute here to all l values, not just to l = 0. Thus, it is important to know whether one compares absolute quantities or relative ones, because the corresponding covariance matrices are different and therefore so will be the results of the corresponding Fisher forecasts.

VII. CONCLUSION
In this paper we have addressed two potentially important issues for the next stages of precision cosmology.First, we have developed the general theory of the rotationally invariant ("reduced") angular N-point spectra.The method employed so far [42] is not suited for obtaining expressions for arbitrary N, because it requires solving some increasingly complicated algebraic equation as N grows.As a result, one must work out each N case separately and the present literature contains a detailed description only up to the N = 4 case [42].We have presented an alternative construction which provides all the relevant definitions and relations associated with reduced angular N-point spectra straightforwardly and for all N.This includes the covariance matrix of these spectra for arbitrary N and N , which is controlled by a squeezed (N + N )-point spectrum and leads to a cosmic variance of the N-point spectrum that decays as l (N−1)/2 for large l k numbers.Our construction is based on the introduction of the "multilateral" Wigner symbols, which generalize the Wigner 3 − j symbols of triangles to polygons.With these we have built an orthonormal harmonic basis for N-point correlation functions on the sphere, such that the reduced spectra appear as the coefficients in this basis.We have also discussed a corresponding diagrammatic representation, generalizing the one of Refs.[42,69] for the cases N = 4, 5, which was also considered in Ref. [70] in the flat sky limit.Even though the determination of a generic N-point spectrum might be prohibitive numerically, our general framework allows easily for specialization, e.g., to squeezed spectra with only one l varying while the others are fixed to some low value.
The second important part of the paper consists in motivating and justifying the consideration of observer terms of cosmological observables inside the ensemble averages of correlation functions and spectra.Through a careful derivation of the relation between theoretical predictions ("G th ") and observations ("G ob "), under the assumptions of statistical homogeneity and isotropy and ergodicity, we have shown that no special treatment of the observer point is required whatsoever on the theoretical side.Ensemble averaging field products at this point does not introduce any inconsistencies, nor does it imply any new assumptions or uncertainties.This motivates us to consider the full analytical expressions of cosmological observables in calculations, especially at nonlinear order in perturbations where observer terms are intertwined with source and line-of-sight contributions in a nontrivial way, as we argued in Sec.IV A. Thus, with this conceptual ambiguity being lifted, we can now safely state that a rigorous treatment of cosmological observables should include the observer terms, or at least a check of the magnitude of their contribution in the final result if they are neglected.
We have also taken advantage of the present framework to discuss some subtle aspects of working with the connected part of spectra, as one usually does, when relating theory to observation.When working directly with fluctuations of observables around their average (one-point function) value, as is often the case in cosmology, the difference between full and connected spectra is simply a sum of products of spectra with N > 1.However, here we pointed out that the notion of fluctuation is different in the theoretical and observational cases, since the one-point functions are different (ensemble versus sky average).As a result, the relation between theoretical and observational connected spectra involves extra "monopole" terms.Moreover, if one works with relative, instead of absolute, fluctuations of observables, then one must take into account additional corrections which now affect all multipoles, not just the monopoles.A direct consequence of this fact is that the covariance matrices computed with the relative and absolute fluctuations are not proportional to each other, but have an additional remainder term.Thus, this can lead to different results in the corresponding Fisher analyses, depending on whether one uses absolute or relative observational fluctuations.
Finally, and remarkably, the fact that the observational and theoretical one-point functions are not equal in general might also be relevant in the context of parameter estimation involving CMB data.Indeed, the standard treatment of the CMB temperature is to simply neglect cosmic variance and therefore consider these two quantities as equal T := T = T SO(3) ( x o ).This is justified by the high precision of the average CMB temperature measurement [74] and, most importantly, by the smallness of the associated relative cosmic variance C 0 ∼ 10 −5 , which sets the fundamental uncertainty floor.In principle, however, one should consider T (or ω γ ) as an extra cosmological parameter to be varied in the likelihood analysis.Through a careful Fisher analysis that incorporates the above-mentioned subtleties [75], the impact of an independent T on the other cosmological parameters and their errors is estimated in Ref. [76].
where the P l are the Legendre polynomials, arising from the identity In the case of observable products, we obtain the well-known results Eqs. ( 36) and (37).For the case of most interest N > 2, we need to use iteratively the "Clebsch-Gordan" composition rule, (A21) The latter form a basis for SO(3)-invariant functions S N 2 .Indeed, we have just shown that they generate that space, so we must still show that they are orthogonal under the natural scalar product.Using the identity Eq. (B5), where Here we display all the identities satisfied by the Wigner 3 − j symbols that are used in this paper.