CMB B-mode non-Gaussianity: optimal bispectrum estimator and Fisher forecasts

Upcoming cosmic microwave background (CMB) data can be used to explore harmonic 3-point functions that involve the B-mode component of the CMB polarization signal. We focus on bispectra describing the non-Gaussian correlation of the B-mode field and the CMB temperature anisotropies (T) and/or E-mode polarization, i.e.,, and. Such bispectra probe violations of the tensor consistency relation: the model-independent behavior of cosmological correlation functions that involve a large-wavelength tensor mode (gravitational wave). An observed violation of the tensor consistency relation would exclude a large number of inflation models. We describe a generalization of the Komatsu-Spergel-Wandelt (KSW) bispectrum estimator that allows statistical inference on this type of primordial non-Gaussianity with data of the CMB temperature and polarization anisotropies. The generalized estimator shares its statistical properties with the existing KSW estimator and retains the favorable numerical scaling with angular resolution. In this paper we derive the estimator and present a set of Fisher forecasts. We show how the forecasts scale with various experimental parameters such as lower and upper angular band-limit, relevant for e.g. the upcoming ground-based Simons Observatory experiment and proposed LiteBIRD satellite experiment. We comment on possible contaminants due to secondary cosmological and astrophysical sources.

Upcoming cosmic microwave background (CMB) data can be used to explore harmonic 3-point functions that involve the B-mode component of the CMB polarization signal. We focus on bispectra describing the non-Gaussian correlation of the B-mode field and the CMB temperature anisotropies (T ) and/or E-mode polarization, i.e. T T B , EEB , and T EB . Such bispectra probe violations of the tensor consistency relation: the model-independent behavior of cosmological correlation functions that involve a large-wavelength tensor mode (gravitational wave). An observed violation of the tensor consistency relation would exclude a large number of inflation models. We describe a generalization of the Komatsu-Spergel-Wandelt (KSW) bispectrum estimator that allows statistical inference on this type of primordial non-Gaussianity with data of the CMB temperature and polarization anisotropies. The generalized estimator shares its statistical properties with the existing KSW estimator and retains the favorable numerical scaling with angular resolution. In this paper we derive the estimator and present a set of Fisher forecasts. We show how the forecasts scale with various experimental parameters such as lower and upper angular band-limit, relevant for e.g. the upcoming ground-based Simons Observatory experiment and proposed LiteBIRD satellite experiment. We comment on possible contaminants due to secondary cosmological and astrophysical sources.

I. INTRODUCTION
Inflationary cosmology was proposed [1][2][3] to solve several cosmological puzzles: an early period of accelerated expansion explains the homogeneity, isotropy, and flatness of the Universe, as well as the lack of relic monopoles. One of the great successes of the inflationary paradigm is the production of small density inhomogeneities that grow to create the large-scale structure of the Universe today [4][5][6][7][8]. In addition, tensor modes produced during inflation lead to primordial gravitational waves that are potentially detectable in the polarization of the Cosmic Microwave Background (CMB) [9][10][11][12]. Observations of the CMB provide tests of these predictions of inflation and can serve to distinguish between specific inflationary models.
As yet, CMB observations are consistent with a single slowly rolling scalar field as the inflaton, the field responsible for inflation [13]. For these single-field slow roll (SFSR) models, the fluctuations are described by primordial density fluctuations, which are nearly Gaussian, adiabatic, and nearly scale invariant [2,3]. Gaussianity implies that the 2-point correlation function of the density fluctuations uniquely determines all higher even npoint functions while all odd n-point functions vanish. In principle, inflation could be described by variants other than SFSR that introduce significant non-Gaussianity, such as multi-field inflation; models with non-canonical kinetic terms or non Bunch-Davies vacua [14]. As yet no evidence for primordial non-Gaussianity has been found in the Planck data [15]; hence many of these models have been ruled out. Conversely, evidence for non-Gaussian statistics in upcoming data would imply deviations from SFSR inflation and would provide an informative probe of the inflationary dynamics and the associated highenergy physics [16].
While the usual searches for primordial non-Gaussianity focus on the n-point statistics of scalar fluctuations, in this paper we concentrate on the relatively unexplored observational signatures of non-Gaussian correlations involving tensor fluctuations, as previously discussed by [17]. We propose to extend the search for primordial non-Gaussianity from one that only looks for the 'scalar-scalar-scalar' correlation to one that also searches for the 'scalar-scalar-tensor' correlation [17][18][19][20][21][22][23]: the non-Gaussian correlation between two modes of the primordial scalar perturbation and a mode of the tensor perturbation produced during inflation. To enable this goal, we generalize the statistical inference framework used for primordial non-Gaussianity.
The scalar-scalar-tensor correlation is parameterized in terms of the Fourier coefficients of the curvature (scalar) perturbation ζ [24,25] and the two helicity modes ±2 h k that describe the tensor perturbation [18]: with q = k 1 + k 2 + k 3 . The ±2 F functions depend on the inflationary dynamics and can differ between models. Both ζ k and ±2 h k are described in the early radiationdominated Universe at a time when their comoving wavelength 2π/k (with k ≡ |k|) is larger than the Universe's 'comoving Hubble radius' (aH) −1 in natural units. H(t) and a(t) are the Hubble parameter and the Robinson-Walker scale factor as function of cosmic time. Both types of perturbations are assumed to be 'adiabatic', implying that they do not evolve on these 'super-horizon' scales [26]. Evidence for a nonzero ζζh 3-point function would not only point towards a deviation from SFSR inflation [18], but would potentially rule out the majority of currently formulated models of inflation [21]. The reason for this is a robust consistency relation for the 'squeezed limit': |k 3 | |k 1 | ≈ |k 2 |, of the ζζh correlation. In the squeezed limit, ζζh is completely determined by P ζ (k) and P h (k), the power spectra of ζ k and ±2 h k [18,27]: The relation is independent from the dynamics of scalar fields present during inflation and holds as long as modes of the tensor perturbation become adiabatic directly after reaching a super-horizon scale during inflation [21]. The 'polarization tensors' e ±2 ab with a, b ∈ {1, 2, 3} are two traceless, transverse tensor fields that will be precisely defined later. n s − 1 parameterizes how much P ζ (k) deviates from the scale-invariant form, see Appendix C 1.
The tensor consistency relation in Eq. (2) is powerful because its predictions are falsified if a significant ζζh correlation is detected in the squeezed limit [28,29]. An observed violation of the tensor consistency relation would indicate that inflation is described by a nonstandard variant. For example, the relation is violated by inflationary models with light, nonzero spin fields that do not decay quickly after leaving the horizon [21]. As a consequence, falsification of the tensor consistency relation allows ruling out models that approximately respect the de Sitter isometries [30], except for isometryrespecting models with so-called partially massless spin fields [27,31]. Other inflationary models that cannot be ruled out by falsification are those that weakly break some of the de Sitter isometries and couple the extra spin fields to the resulting preferred spatial slicing [32][33][34]. Furthermore, in models where a subset of the de Sitter isometries is strongly broken, there is no reason for the consistency relation to hold [22,23]. This last class includes models in which the tensor perturbations are produced by additional fields [35,36]. These models generally also make predictions for large tensor non-Gaussianity in different forms than just the squeezed ζζh type [37].
It should be noted that tests for the consistency relation of the squeezed scalar-scalar-scalar (ζζζ) correlation [38,39], that are similar to the tests for the tensor consistency relation, are already underway [15]. The consistency relation for the squeezed ζζζ correlation holds for single-field inflation models [28]. 1 A detection of a significant ζζζ correlation in the squeezed limit would experimentally rule out the validity of the consistency relation and would provide evidence for the presence of more than one time-evolving scalar field during inflation. The tensor consistency relation in Eq. (2) is arguably more general than the ζζζ counterpart as it will, in principle, still hold for models with multiple scalar fields [21,45].
The CMB contains cosmological information both in its temperature anisotropies (T ) and linear polarization. The polarization field can be divided in two components: the parity-even E-mode and parity-odd B-mode field [7,8]. Primordial scalar perturbations source T and E-mode polarization, while primordial tensor perturbations source the T , E-, and B-mode fields. Observational searches using T and E constrain both scalar and tensor perturbations. However, the contributions to T and E from scalars are much larger than those of tensors, and so cosmic variance, due to the limited number of measurable modes, prohibits strong constraints on tensor perturbations with T and E data. The inclusion of B-mode data allows for much tighter constraints on tensor perturbations [46]. Furthermore, unlike T , current B-mode observations are not cosmic-variance limited; hence sensitivity to primordial tensor perturbations can significantly increase with B-mode polarization data [47].
For these reasons, this paper focuses on bispectra, the harmonic equivalent of 3-point correlation functions, that describe how a single B-mode perturbation is correlated to perturbations in the E-mode field or the CMB temperature, i.e the T T B , EEB and T EB bispectra. These correlation are currently unconstrained, but will be within reach of observations by currently operating [48][49][50][51], upcoming [52][53][54][55][56][57] and proposed [47,58] experiments. Since B-modes are sourced by primordial tensor modes, these bispectra directly probe the ζζh correlation. The use of the T T B , EEB , and T EB bispectra avoids much of the scalar-induced cosmic variance that plagues current constraints on the ζζh correlation. 2 These constraints are expected to improve by an order of magnitude with the inclusion of current B-mode data [17,22,47]. CMB constraints on ζζζ, already close to the cosmic-variance limit, will not see such improve-ments. 3 Future constraints on ζζh will benefit from the ongoing, unified experimental effort to collect B-mode data in order to constrain the ratio of the primordial tensor-to-scalar ratio r: Besides the fact that a detection of a roughly scaleinvariant tensor power spectrum P h (k) would provide a strong argument against a range of alternatives to inflation [72][73][74][75], constraints on r are used to differentiate between models of inflation [13]. For slow-roll models, r also provides the energy scale of inflation V 1/4 : V 1/4 ∼ r 1/4 × 10 16 GeV [76]. The upper limit on r is determined by the Bicep2/Keck Array and Planck CMB data to be r 0.002 < 0.064 (at 95% confidence level) [13].
In the case of a non-detection, upcoming B-mode observations have the potential to improve over the current 95% CL upper limit by factors of approximately 10 [52,56] and 30 [47, 58]. Statistical inference on primordial non-Gaussianity is generally done using statistical 'estimators'. Loosely speaking, an estimator is a rule to transform observed data into a statistical estimate of a parameter of interest. Here we concentrate on a CMB bispectrum estimator that transforms CMB data into an estimate of the amplitude of a given bispectrum and, simultaneously, the amplitude of the primordial 3-point function responsible for this bispectrum. There is a complication associated with the ζζh 3-point function that prohibits a straightforward implementation of the standard bispectrum estimator, see Eq. (41) [77][78][79][80][81]. Existing bispectrum estimators rely on a summary statistic of the CMB bispectrum: the so-called reduced bispectrum b 1 2 3 , defined in Sec. III of this paper [82]. Data are usually compared to a version of the reduced bispectrum that is separable (factorizable) in 1 , 2 , and 3 . For data with a large harmonic bandlimit max this separable form reduces the computational scaling of the estimator from O( 5 max ) to O( 3 max ) [77]. The problem is that the (k 1 ) a (k 2 ) b e ±2 ab (k 3 ) term that is present in the ζζh 3-point correlation function results in reduced bispectra that are not separable into 1 , 2 , and 3 [83]. Without a separable form of the reduced bis- 3 While inference on certain standardized types of ζζζ non-Gaussianity will only improve by a factor of approximately two with upcoming CMB data [47], it is possible that more complicated non-Gaussian features would still be hidden in the data. This is especially true for models with oscillating or non-smooth inflationary potentials (see e.g. [60][61][62][63][64]) or models that predict non-Gaussian n-point correlation functions with n > 3 [27,65,66]. In terms of improving constraints on (especially squeezed) ζζζ correlation functions, observables such as galaxy clustering [67,68], 21 cm tomography [69], the crosscorrelation between CMB lensing and galaxy clustering [70] or the cross-correlation between the primary CMB anisotropies and small-scale spectral distortions of the CMB [71] have the potential to significantly improve constraints in the (far) future.
pectrum, inference on ζζh likely becomes an enormous computational challenge. 4 We demonstrate that a numerically efficient estimation of the amplitude of the ζζh 3-point correlation is still possible by making use of the full bispectrum instead of the reduced bispectrum, and propose a generalization of the standard bispectrum estimator (Eq. (58)). This generalization, which can be seen as the main result of this paper, allows for computationally efficient (and statistically optimal) estimation for all ζζh 3-point functions that include the (k 1 ) a (k 2 ) b e ±2 ab (k 3 ) term in the following way: Here it is assumed that f can be expressed as (a sum of terms) separable in the three wave numbers k 1 , k 2 , and k 3 . It is argued how numerical evaluation still scales as O( 3 max ) and how the proposed estimator is exact: it does not rely on lossy data compression or on the flat-sky approximation [17,91].
In Appendix A it is shown how the estimator can be adapted to other non-standard 3-point correlation functions. We derive an estimator for scalar 3-point functions that are sensitive to the presence of higher-spin fields during inflation [65,92,93] and provide estimators for 3-point functions that involve two or three tensor components. 3-point functions with multiple tensor components are relevant for inflation models with peusoscalar-gauge field interactions [37,[94][95][96], models with higher-derivative terms in the inflationary gravitational sector [97] and bimetric gravity models [98].
To illustrate the potential of the generalized estimator for testing the tensor consistency relation we provide a number of Fisher forecasts that represent idealized experimental outcomes. These forecasts demonstrate the min and max dependence of constraints on the amplitude of the squeezed ζζh correlation. The forecasts also show the influence of the lensing B-mode power spectrum, the effects of reionization and the advantage of using both temperature and E-mode data in addition to the B-mode data. We comment on the expected contamination that is associated with B-mode data and the high-resolution data needed for squeezed 3-point functions. In future work the generalized estimator will be 4 Approximate methods that retain some computational efficiency without relying on a separable form do exist (the binned bispectrum estimator [84] and the modal estimator [85][86][87]) and have been successfully applied in the Planck analysis [15, 88,89]. The first constraint on the amplitude of the ζζh 3-point function in [59] was made with a modified [90] version of the modal estimator. Despite the fact that the binned and modal estimators are broadly applicable, they are relatively involved, not strictly statistically optimal and have an unnecessary computational overhead in the case of reduced bispectra that are already in separable form. For inference on such bispectra the dedicated estimator developed in Ref. [77][78][79][80][81] provides a simpler and more efficient solution.
applied to simulated microwave sky data to evaluate the Fisher forecasts. The current paper is organized as follows. We first review the CMB anisotropies, bispectrum and the primordial 3-point correlation function in Sec. II. We then introduce the generalized bispectrum estimator in Sec. III and present Fisher forecasts for the tensor-scalar-scalar bispectrum in Sec. IV. We discuss future work in Sec. V and conclude in Sec. VI.

II. PRELIMINARIES
A. CMB anisotropies The data we consider are spherical harmonic modes of the CMB temperature and linear polarization anisotropies on the celestial sphere. After a brief review of the general properties of the harmonic modes, we will demonstrate the linear relation between the CMB anisotropies and the primordial scalar and tensor perturbations.
The temperature harmonic modes are related to the CMB temperature T measured at positionn ∈ S 2 on the celestial sphere by: where dΩ(n) and Y * m are the differential solid angle and a complex-conjugated spherical harmonic function respectively. See appendix B 1 for a summary of our notation.
The symmetric, traceless tensor field that describes the linearly polarized component of the microwave sky can be decomposed into two (real) fields: Q(n) and U (n). These fields are coordinate-dependent quantities that transform among themselves when the local coordinate basis (the tangent space) on the sphere atn is rotated. For that reason, it is convenient to combine these fields into a complex 'spin-2' field on the sphere: (±2) P , which is defined as follows: Under a right-handed rotation of the local coordinate system around the pointn we then have: where ψ is the angle of rotation. The sign of the exponent is a convention. Instead of directly using (±2) P , we will describe polarization in terms of the harmonic modes of two fields that are scalars under coordinate rotations aroundn: the parity-even E field and the parity-odd B field. The harmonic modes of these two fields: the E-and B-modes, are related to the locally observable field as follows: The spin-weighted spherical harmonics s Y m form a complete and orthonormal basis for spin-s functions on the sphere, analogous to the regular spherical harmonics. See Appendix B 1 for a brief overview. The parity-even E and parity-odd B harmonic modes transform differently under the parity transformation of the underlying spherical coordinates. Under parity, the odd moments of the temperature anisotropies and the E-mode field gain a minus sign. The opposite behavior holds for the B-mode field: To describe the primordial adiabatic scalar perturbations that source the CMB anisotropies, we use the gauge invariant curvature perturbation ζ [24,25]. 5 As the initial adiabatic state is constant on super-horizon scales, we only need to consider the amplitude of ζ on some spacelike hypersurface in the early radiation-dominated era when all Fourier modes of interest were super horizon. The Fourier coefficients of this amplitude at early time t i are given by: wheret = t + δt(x, t) parameterizes weakly perturbed spacelike hypersurfaces relative to comoving coordinates {x, t} of the flat FLRW background. Throughout this work k denotes a 3D comoving wave vector. The primordial tensor perturbation h is the traceless and divergenceless linear perturbation to the flat FLRW metric: with h a a = ∂ a h ab = 0. Instead of using the coordinate basis to describe the tensor perturbation, we use a basis that sits perpendicular to the unit wave vector 5 The invariance under choice of gauge (the choice of constant-time spacelike hypersurfaces and constant-position timelike worldlines) of ζ explains why it can simultaneously be interpreted as e.g. the spatial curvature on hypersurfaces with constant energy density or as the energy density perturbation on spatially flat hypersurfaces [99]. k, spanned by theê (±) unit vectors. 6 On this new basis, the tensor perturbation conveniently reduce to two helicity states with Fourier coefficients given by: The polarization tensors e ±2 are two symmetric, traceless and transverse tensor fields that transform h from the comoving coordinate basis to theê (±) basis. The polarization tensors have the following properties: The tensor perturbation h is gauge invariant (in the same sense as ζ is) [100]. The helicity components (±2) h are scalars under coordinate transformations up to a phase factor depending on the orientation of the basis spanned byê (±) . 7 Let us categorize the stochastic primordial (superhorizon) amplitudes in terms of their helicity λ: Following the notation set by [83], we then write down a compact expression for the observed CMB modes in terms of these helicity-dependent super-horizon amplitudes and a set of rotationally invariant transfer functions T (k): with Z ∈ {ζ (scalar), h (tensor)}, sgn(0) ≡ 0, 0 0 ≡ 1, X ∈ {T, E, B}, and helicity and parity determined by: To relate the basis vectors of the comoving coordinatesê (a) to those of the noncoordiate basis, we introduce a set of 'polarization' vectors: {e + , e − , e 0 }, such thatê (λ) = e a λê (a) with λ ∈ {+, −, 0}. Geometrically, theê (±) basis vectors span the plane perpendicular to the wave vector, whileê (0) points along the wave vector. The three vectors form a complete orthonormal basis. We letê (±) describe states of circular polarization, i.e. the polarization vectors obey (e a ± ) * = e a ∓ . 7 The polarization tensors are defined in terms of the ± polarization vectors as e ab ±2 ≡ √ 2e a ± e b ± . In the Cartesian basis, we may define the polarization vector as e ± = {1, ±i, 0}/ √ 2 for a wave vector aligned with theẑ direction. The addition of a complex phase exp(−iψ) to this definition amounts to an equally suitable basis that is simply rotated around the wave vector. The polarization tensors and helicity components are thus defined up to exp(−2iψ).
Note that by defining ∓2 Y * m in Eq. (16) on the transverse basis spanned byê (±) , we ensure that the a X, m for Z = h are independent of the orientation of this basis. This approach is fully analogous to the decomposition of the spin-2 polarization field in Eq. (8).
The transfer functions T (k) transform the superhorizon amplitudes ζ k and (λ) h k to the CMB radiation and its polarization seen today [8,101]. In short, once the comoving Hubble radius (growing after inflation has ended) becomes larger than the comoving wavelengths of ζ k and (λ) h k , they 'enter the horizon' and start to evolve with time. The scalar perturbations sourced by ζ begin to oscillate under the effects of gravity and photon pressure, resulting in the acoustic oscillations seen in the CMB angular power spectra. The helicity components ±2 h start to propagate through space as the two polarization states of a gravitational wave, virtually decoupled from the other components of the Universe, and decay away with the expansion of space [76,102,103]. As a result, the most prominent difference between the scalar and tensor transfer functions is that the latter result in small values for CMB fluctuations on small ( > 100) angular scales. Small-scale tensor perturbations that entered the horizon before recombination decay significantly before leaving their imprint on the CMB. The transfer functions only depend on the unperturbed background cosmology and are readily available through numerical Einstein-Boltzmann solvers such as CAMB [104,105] or CLASS [106]. 8 The projection onto the celestial sphere is also handled by the transfer functions. See Appendix C 1 for more details on the transfer functions used in this work.
With Eq. (16) we have quantified the relation between the CMB anisotropies and the primordial scalar and tensor fields. The relation reiterates an important point: the primordial scalar fluctuations do not source the parityodd B-mode field at linear order [8]. Higher-order cosmological effects, such as weak lensing by matter along the line of sight [107] or second order time-evolution of the scalar perturbations [108][109][110], create a B-mode signal even in the absence of a primordial tensor contribution. Such effects are not included in the linear transfer functions so their influence has to be described separately. The same is true for signal from astrophysical foregrounds. We will briefly discuss these contributions in Sec. V but will consider them in more detail in a future paper.

B. Bispectrum and the primordial 3-point function
In Sec. II B 1, we summarize general properties of the observable of interest: the CMB bispectrum. As we are interested in bispectra that include a B-mode compo-nent, we explicitly discuss the inclusion of B-mode polarization. In Sec. II B 2 we then introduce the concept of a linearly propagated, or primary, bispectrum: a primordial 3-point correlation function that is evolved to the CMB bispectrum today by the linear transfer functions introduced in Sec. II A. In addition, we describe the primordial ζζh 3-point correlation function in more detail.

General properties of the bispectrum
The bispectrum is defined as the isotropic 3-point correlation function represented in terms of spherical harmonic coefficients. The bispectrum is proportional to the multivariate generalization of the skewness of a probability distribution and thus vanishes for purely Gaussian coefficients.
We can formulate a bispectrum for every combination of the temperature and polarization components X 1 , X 2 , X 3 ∈ {T, E, B}: The a X, ,m are defined in Eq. (5) and Eq. (8). Statistical isotropy constrains the azimuthal dependence such that the bispectrum may always be factored into a Wigner 3j symbol and a factor independent of m 1 , m 2 , and m 3 [111,112]: We will refer to the l.h.s. as the bispectrum, while B on the r.h.s. is the angle-averaged bispectrum. See Appendix B for an overview of the Wigner 3-j symbols. It is possible to construct a parity-invariant bispectrum from three fields regardless of the parity behavior of the individual fields. This means that we can form a parityinvariant bispectrum for all combinations of T , E, and B. This is not the case for the angular power spectrum. 9 From Eq. (9), we see that invariance under parity alone imposes that 1 + 2 + 3 = even for bispectra with an even number of B-mode contributions and 1 + 2 + 3 = odd otherwise; see Table I [17,113].
Taken together with isotropy, which demands that the crosscorrelation is proportional to δ 1 2 δm 1 m 2 , we see that there is no parity-invariant configuration. The BE power spectrum vanishes by extension. I. Factor gained after a parity transformation P :n → −n, for bispectra B 1 2 3 m 1 m 2 m 3 ,X 1 X 2 X 3 grouped by X1, X2, X3 polarization indices.
Isotropy forces the 1 + 2 + 3 = even components of bispectra to be real while the 1 + 2 + 3 = odd parts are purely imaginary. This constraint can be deduced from the condition for isotropy in Eq. (18) and the reality condition of the harmonic coefficients: which holds because the underlying X = {T, E, B} fields are real-valued. The combination of these two conditions together with the reality of the 3-j symbols then implies: which, through the property of the 3-j symbol in Eq. (B11), means that complex-conjugating the bispectrum results in the following behavior: Note that the Wigner 3-j symbol vanishes for m 1 + m 2 + m 3 = 0. Clearly, the above relation also holds for the angle-averaged bispectrum B 1 2 3 . We have suppressed the X indices as the above holds for all combinations of polarization indices. See Table II for an overview of the geometric constraints on parity-invariant, isotropic bispectra. The fact that the bispectra of interest here: T T B , T EB , and EEB , are purely imaginary is a consequence of the complex representation of the spherical harmonics that we use. Expressed in terms of the Stokes parameters Q and U , the corresponding 3-point correlations would be real-valued and thus observable. before, the linearly propagated bispectrum is formed by time-evolving a primordial 3-point correlation function to the CMB bispectrum today using the linear transfer functions introduced in Sec. II A. We then introduce the standard scalar-only (ζζζ) primordial 3-point correlation function as well as our main focus: the ζζh 3-point correlation function. Let us parameterize the super-horizon 3-point correlation function, the object we are ultimately interested in, as a helicity-dependent quantity using the amplitudes introduced in Eq. (15): where the helicity λ is 0 for scalar perturbations and ±2 for tensor perturbations. We can then, using Eq. (16), form the linearly propagated bispectrum [114]: Note that the three Z indices of the bispectrum in Eq. (21) may each be either ζ or h. We now consider the symmetries of the primordial 3point function. The assumed translational invariance of the process generating the primordial fluctuations implies momentum conservation in Fourier space: What remains now is to consider certain expressions for the helicity-dependent (λ1,λ2,λ3) F (k 1 , k 2 , k 3 ) functions. In a regular analysis, these functions would be given by the model under consideration. Here we are more interested in classes of models, so we use general parameterizations.
For the scalar-only (ζζζ) 3-point function, isotropy demands that F only depends on scalar products of the three wave vectors: the individual amplitudes and k 1 ·k 2 , k 1 · k 3 , and k 2 · k 3 . F cannot depend on a pseudoscalar like k 1 · (k 2 × k 3 ) in case of a parity-invariant 3-point correlation function. For simplicity, we use the following template: where f is generally referred to as the shape of the bispectrum. We will make use of this standard ζζζ template to introduce the reader to existing estimation techniques later in this paper.
For the ζζh case, we use the following parameterization: Recall that roman indices denote three-dimensional spatial comoving coordinates; they are summed over when repeated. Note that (00+2) F and (00−2) F correspond to two independent 3-point functions; by denoting the shape function f (ζζh) independent of helicity, we however implicitly assume parity invariance. The class of ζζh 3-point functions described by Eq. (24) include those predicted by SFSR inflation [18]. The amplitude of the ζζh 3-point function will be too small to be observable with CMB data in the SFSR case. More importantly, the template in Eq. (24) also applies to the majority of mentioned models that violate the tensor consistency relation in Eq. (2) and thus potentially produce an observable signal [19][20][21][22][23]. We may therefore use Eq. (24) as the basis for inference on such models.
To gain intuition for the characteristics of the ζζh template, it is useful to realize that the delta function in Eq. (22) imposes that k 1 + k 2 + k 3 = 0, i.e. the 3-point function is defined on triangular configurations of the three wave vectors. The f (ζζh) (k 1 , k 2 , k 3 ) part of the ζζh template thus assigns a weight to each triangle based on the lengths of the three sides. While these weights completely determine the 3-point function in the ζζζ case, the ζζh case requires that two more aspects are taken into account. First, the ζζh 3-point function is always suppressed in triangular configurations wherein the wave vector of the h k Fourier mode is roughly (anti)parallel to the wave vector(s) of one or both of the scalar modes. This suppression is not due to the f (ζζh) weight function but is a consequence of the nature of the polarization tensors. Their transverse property demands that (k) a e ±2 ab (k ) vanishes ask becomes equal tok . We thus see a suppression whenk 3 aligns withk 2 and/ork 1 in Eq. (24). Secondly, the transverse traceless behavior of the ζζh 3-point function is also reflected in its helicity dependence.
We have demonstrated how the CMB is affected by a nonzero primordial 3-point correlation function through the bispectrum. We have also introduced the ζζh 3-point correlation function in Eq. (24). The bulk of this work will focus on this 3-point function. Note that the estimation technique that will be presented in the following sections is, in principle, also applicable to other types of 3-point functions. For conciseness, the discussion of some other templates (including the SFSR scalar-tensor-tensor and tensor-tensor-tensor 3-point functions [18]) is placed in Appendix A.

III. ESTIMATOR
This section is organized as follows. We first introduce the general form of the bispectrum estimator in III A. In III B, we then summarize the existing numerically efficient implementation of the estimator and Sec. III C we present our new work: the generalization of the fast implementation to the ζζh case.

A. General bispectrum estimation
We summarize the properties of the now standard CMB bispectrum estimation method [78,82,115]: a parametric search for the amplitudes of theoretically motivated bispectrum templates using an estimator that consists of a cubic and a linear statistic. This method has been the basis for the Planck non-Gaussianity analysis [15]. A derivation of the estimator and discussion of its properties can be found in Appendix D.
The estimator yields an estimate of the overall (dimensionless) amplitude f NL ∈ R of a bispectrum. We thus parameterize the bispectrum of interest as: where B 1 ≡ B(f NL = 1) is a fixed theoretical template with suppressed and m indices.
In searches for primordial non-Gaussianity, the template B 1 is given by a normalized version of the linearly propagated bispectrum in Eq. (21). The linear nature implies that the f NL parameter corresponds to the overall amplitude of the primordial 3-point correlation function (−λ1−λ2−λ3) B in Eq. (21). In principle, the amplitudes of several templates can be jointly estimated (see Appendix D). Here we only need the single parameter variant.
The estimator for f NL is given by: where X ∈ {T, E, B}. The data: a X, m , only enter in inverse-covariance-weighted form: Here C −1 is the inverse of the block matrix: Each element is defined as: with X, X ∈ {T, E, B}. This covariance matrix includes both the signal and noise covariance and is therefore generally not diagonal. The estimating procedure considers the covariances as fixed and known a priori. Intuitively, the first and second line, the 'cubic term', in Eq. (26) serve as a matched filter that correlates the observed bispectrum with the theoretical template B 1 . The terms linear in the data (first times third line) are usually jointly referred to as the 'linear term' and effectively serve to counter the estimator variance induced by the anisotropic parts of the covariance matrix [78,81]. Only the cubic part of the estimator is needed in cases where the covariance matrix in Eq. (28) is rotationally invariant. 10 With weakly anisotropic covariance, the linear term can be neglected for non-squeezed bispectrum templates and/or analyses without large-scale ( 100) data [91].
The normalization of the estimator is given by the following (dimensionless) number: Note that I 0 is completely independent from the observed data.
The estimator is often referred to as 'optimal'. The word 'optimal' refers to the fact that, in the appropriate limit, the estimator yields an unbiased point estimate of f NL with variance given by the inverse of the model's Fisher information on f NL . It should be noted that this behavior is strictly true only in the limit where all non-Gaussian signal vanishes, this includes f NL → 0. The expression in Eq. (30) becomes equal to the Fisher information on f NL in this limit. In Appendix D we specify the likelihood function of the data to make the above statements more precise.
The estimator in Eq. (26) is well-suited to estimate upper limits on f NL . When a weak non-Gaussian signal is present, the estimator is still usable, but one has to be wary of biases and non-optimal variance [116,117]. This is especially relevant for B-mode data contaminated by Galactic signal or high-resolution data with relatively strong non-Gaussian contributions from e.g. weak lensing. See the discussion in Sec. V for more details.
We end this summary with a practical note on the inverse covariance matrix C −1 that appears in the linear term and normalization of the estimator. This matrix is typically approximated by a Monte Carlo average over inverse-covariance-weighted Gaussian a X, m with the same signal covariance, noise covariance, masking etc. as the data, i.e. as if drawn from the distribution specified by Eq. (28): 10 One can check that the rotational invariance of the bispectrum forces the linear term to be proportional to the (unobservable) CMB monopole perturbation when the covariance matrix is rotationally invariant [115].
which converges because: where ( . . . ) denotes the ensemble average over the multivariate N (0, C) distribution. The need for this complication arises because the high-dimensional covariance matrix is usually too dense for regular matrix operations. 11 On the other hand, it is generally possible to evaluate Eq. (27) with iterative methods.

B. Fast bispectrum estimation
In this section we motivate the need for an efficient way to evaluate the estimator in Eq. (26) and review the standard method to do so: the KSW estimator. When used to estimate the amplitude of primordial 3-point functions, the KSW estimator applies to the ζζζ correlation but not to our main interest: the ζζh correlation. We will introduce the generalized version of the KSW estimator that can be used for ζζh in the next section.
The number of numerical operations needed to evaluate the estimator in Eq. (26) quickly grows to enormous sizes as the resolution of the data, i.e. max , increases. Even when the costs of computing C −1 a are ignored, direct evaluation of the estimator in Eq. (26) will asymptotically scale as O( 6 max ). The isotropy of the bispectrum may be used to reduce this scaling to O( 5 max ) by, for instance, fixing m 3 = −(m 1 + m 2 ), but this scaling is still unmanageable.
To avoid the O( 5 max ) scaling, bispectrum estimation generally focuses on separable bispectrum templates to reduce the scaling to O( 3 max ) (albeit possibly with a relatively large prefactor). The most straightforward implementation of this idea is formulated by Komatsu, Spergel and Wandelt [77], in what we will refer to as the KSW estimator. See Ref. [118] for technical details and Ref. [80,81] for a generalization that uses E-mode data in addition to T data.
Simply put, the KSW estimator exploits the idea that for a hypothetical bispectrum template: the sum in Eq. (26) can be factored into three independent parts, thereby reducing the scaling to O( 2 max ). Of course, this hypothetical bispectrum template is not suitable, as it is not rotationally invariant. The decomposition in Eq. (18) forbids isotropic templates that are explicitly factored like this. In reality, the KSW approach therefore uses a slightly modified version of the above decomposition. The numerical advantage is largely maintained with the modified version. 11 The signal and noise covariance matrices are typically sparse in the spherical harmonic and (pixel) coordinate basis respectively. The combined matrix is then sparse in neither of the two bases.
The modification comes in the form of the Gaunt integral expression. It allows the rotationally invariant part of the product of three (spin-weighted) spherical harmonics to be expressed in terms of Wigner 3-j symbols. The general expression can be found in Eq. (B10). Here we only need the following version: with J 000 1 2 3 given by: Now consider the reduced bispectrum [82]: where B 1 2 3 is the angle-averaged bispectrum. We have suppressed the polarization indices for simplicity. Note that the reduced bispectrum is only defined for 1 + 2 + The crucial insight is that isotropy does not constrain the reduced bispectrum in any way. Given a reduced bispectrum that is separable in N fact sets of functions as: we may thus express the bispectrum as: 12 Restricting to 1 + 2 + 3 = even does not introduce a loss of generality for the parity-invariant T T T , T T E , T EE , and EEE angle-averaged bispectra that are usually considered (see Table II), but, as was shown in the previous section, angle-averaged bispectra can in general be nonzero for 1 + 2 + 3 = odd.
We will refer to bispectra that can be written as the above expression as 'locally separable'. This name refers to the fact that the integrand of the angular integral is separable in ( 1 , m 1 ), ( 2 , m 2 ), and ( 3 , m 3 ). We conclude that, while factored bispectra as in Eq. (33) are forbidden, isotropy allows a locally separable template like Eq. (39). As we will see, the alluded computational benefits are largely retained for such templates. It is important to note that up to now we have not assumed that the bispectra are sourced by primordial 3-point functions.
With regards to primordial non-Gaussianity, the above construction is only useful when actual theoretical bispectrum templates can be reduced to the form of Eq. (38). Fortunately, this is the case for a large class of linearly propagated bispectra sourced by the ζζζ correlation. For such bispectra, the condition in Eq. (38) is met when the shape of the 3-point function in Eq. (23) is separable in k: Note that the N fact parameter in Eq. (38) is linearly related to N prim ; the constant of proportionality between N prim and N fact will be detailed later in Sec. III C 4. The local shape in Eq. (C7) is an example of a separable shape template. The equilateral and orthogonal shape templates used in the Planck analysis [15] have been specifically derived to be separable [78,119]. Going back to the general, not-necessarily primordially sourced case, it remains to be demonstrated how separable reduced bispectra actually lead to a reduction in computational cost. To see this, we insert Eq. (39) into Eq. (26) and write down the cubic part of the resulting expression: The A functionals yield spin-0 fields on the sphere given by the inverse covariance-weighted data, weighted by the factors of the reduced bispectrum (f , g , and h , see Eq. (38)). For example: Note that we have reintroduced the polarization indices and assume they only run over X ∈ {T, E} here. Evaluating Eq. (41) does not quite scale like O( 2 max ) as one might expect but as O(N fact 3 max ). Simply put, the scaling is determined by the O( 3 max ) scaling of the recursive algorithms needed to compute the spherical harmonics which have to be recomputed N fact times. 13 This is still an significant improvement over the general O( 5 max ) scaling.
The Monte Carlo expression for the linear term in Eq. (26) becomes equal to: The two additional terms denoted by 'cyclic' are obtained by cyclic permutations of f (i) , g (i) , and h (i) . Evaluation of this linear term scales as O(N sim N fact 3 max ), where 100 N sim 1000 iterations are typically needed for a sufficiently accurate estimate [118].
The estimator normalization I 0 in Eq. (30) is evaluated by a Monte Carlo estimate. We omit the details of this aspect of the estimation procedure, and only mention the two methods that are used in practical applications. The most straightforward estimate of I 0 is given by the variance of the unnormalized estimator applied to an ensemble of simulated Gaussian data effectively drawn from the distribution specified by Eq. (28). A similar but slightly more involved Monte Carlo procedure is described in [118]. This second method is shown to converge for smaller ensembles than the first method.
In summary, a primordial ζζζ 3-point correlation function described by a separable shape function will source a separable reduced bispectrum. We have established that the separability of the reduced bispectrum allows the use of the KSW estimator, see Eq. (41). Finally, the KSW estimator is a prescription that alleviates the scaling of the estimator in Eq. (26) from O( 5 max ) to a more manageable O(N fact 3 max ).

Overview
We now turn to the situation for the ζζh 3-point correlation function. We explain why the standard KSW estimator, derived in Sec. III B, does not apply to this type of correlation. We then come to the main new result of this paper: we introduce an alternative approach that allows the construction of an efficient estimator for the ζζh correlation.
Recall that for the ζζζ correlation the necessary condition for a separable reduced bispectrum is given by Eq. (40): a separable shape function. Unlike the ζζζ 3-point correlation function, the ζζh correlation is not uniquely specified by a shape function. It turns out that when the reduced bispectrum for the ζζh template in Eq. (24) is computed, the result is non-separable in 1 , 2 , and 3 [59]. This holds true even when the f (ζζh) shape function in Eq. (24) is separable in k 1 , k 2 , and k 3 , which means that the responsible piece is the angular term: Despite the angular dependence, this term is a scalar under spatial coordinate transformations. The term provides a weight and complex phase to each {k 1 ,k 2 } configuration relative to the wave vector of the tensor perturbation but has no preference for a global orientation of the three wave vectors. The associated CMB bispectrum is therefore isotropic and has a trivial dependence on its m 1 , m 2 , and m 3 azimuthal numbers, given by Eq. (18). With the azimuthal numbers constrained by isotropy, the geometrical coupling between the wave vectors in Eq. (44) can then only manifest itself in an explicit coupling between the 1 , 2 , and 3 multipole orders, which in turn prevents the reduced bispectrum to be separable. Without a separable reduced bispectrum we cannot construct the KSW estimator for the ζζh template by simply inserting the factors of the reduced bispectrum into Eq. (41). To derive a generalized KSW estimator for this template, let us observe that each term in the sum over spatial indices in Eq. (44) is factored in the three wave vectors. Of course, unlike the summed expression, the individual terms are not 3-scalars; the decomposition is coordinate dependent. By itself, each term can be interpreted as a homogeneous but anisotropic 3-point function. Homogeneity is still preserved by the overall delta function in Eq. (22). 3-point functions of this form result in anisotropic bispectra 14 that are locally separable in the sense of Eq. (39). The anisotropic expressions differ from the isotropic one in Eq. (39) by the f , g , and h factors; they gain a dependence on m in addition to .
Roughly speaking, we thus exchange isotropy for separability. The estimates of the amplitudes of the anisotropic terms combine into an estimate of the amplitude of the original isotropic template. The trade-off is that several anisotropic templates have to be considered for one isotropic template. Constructing analogues of the cubic and linear estimator terms in Eq. (41) and Eq. (43) for an anisotropic template will turn out to be rather straightforward. The generalizations of the A functionals in Eq. (42) will transform the data in an anisotropic manner, but note that this operation does not scale differently than the regular isotropic transformation. The overall scaling of the estimator with max will thus be 14 The bispectrum is isotropic by definition so an anisotropic bispectrum should be understood as a shorthand for a harmonic 3-point function that does not obey Eq. (18).
unchanged. The number of anisotropic terms needed for 3-point functions of the type in Eq. (44) turns out to be only five. The amount of extra computations compared to the ζζζ estimator is thus rather insignificant. Guided by the rough arguments provided in this section, we now turn to the actual derivation of the proposed estimator. We will first derive the expression for the linearly propagated bispectrum for the ζζh 3-point function and demonstrate how it is indeed given by a sum of anisotropic bispectra. We will then construct the actual estimator.

Full bispectrum for the scalar-scalar-tensor template
In this section we derive the linearly propagated bispectrum for the ζζh 3-point correlation function. As mentioned in the previous section, we require an expression for the full bispectrum instead of the angle-averaged or reduced bispectrum.
The general expression for the linearly propagated bispectrum in Eq. (21) is most easily evaluated by separating the integrals over the three wave vectors in angular and radial integrals. In order to do so we need to rewrite the delta function that imposes momentum conservation in Eq. (22). Additionally, we express all angular terms of the 3-point function as spin-weighed spherical harmonics in order to simplify the angular integrals.
We start with the delta function. We make use of the plane wave expansion in terms of spherical harmonics and spherical Bessel functions: with k = kk and x = rn. The unit vectorn represents the direction of the line of sight from the origin of the comoving coordinate system (our location). Using this expansion we decompose the delta function into radial and angular parts: See Appendix B 3 for details. Although the integral over n is given by the Gaunt integral expression in Eq. (B10), it will turn out to be important to leave the expression factorizable in L 1 , L 2 , and L 3 so we do not solve the angular integral. We then move on to the angular part of the ζζh template in Eq. (24). As discussed, this part is already expressed as a sum of factorized terms, so we leave it in its uncontracted form. However, we express the unit vectors and polarization tensor in terms of spherical harmonics. In a general coordinate system, not necessary aligned with k 1 , k 2 , or k 3 , the two unit vectors in Eq. (24) are decomposed into dipole ( = 1) moments with a longitudinal (m = 0) and two solenoidal (m = ±1) modes, while the polarization tensor is decomposed into quadrupole ( = 2) moments with longitudinal (m = 0), solenoidal (m = ±1) and transverse (m = ±2) modes. To retain the correct transformation properties, the quadrupole moment is expressed in terms of spin-±2 spherical harmonics on the plane perpendicular tok 3 . As the 45 resulting combinations have to sum to a 3-scalar, each combination has to be weighted by the appropriate Wigner 3-j symbol. The resulting expression is given by [83]: The selection rules of the 3-j symbol limit the azimuthal modes to only nine combinations: those that obey m a + m b + M = 0.
We may now use Eq. (46) and Eq. (47) to decompose the ζζh 3-point function in radial and angular parts, resulting in the following expression: As is required for the KSW estimator, we assume that the shape function f (ζζh) is separable, i.e. it obeys: The N prim sets of f , g, and h functions depend on the model under investigation so we leave them unspecified.
We have now gathered all ingredients to form the lin-early propagated CMB bispectrum for the ζζh 3-point correlation function. We do so by combining Eq. (48) and Eq. (49) and inserting the result into Eq. (21). Because we have separated the 3-point function in radial and angular parts, the expression neatly factors into six independent integrals. We evaluate the angular integrals using the generalized Gaunt integral relation in Eq. (B10). The resulting contribution to the CMB bispectrum is then as follows: Here we have, as a shorthand, defined the following set of functionals for all Z ∈ {ζ, h}, X ∈ {T, E, B}: The T (k) transfer functions were introduced in Eq. (16). To evaluate the sum over the tensor helicities we have made use of the following relation: which reflects that the f (ζζh) shape function in Eq. (48) is helicity-independent. Recall that x 3 ∈ {0, 1} indicates whether the X 3 CMB field is parity-even or parity-odd.
The J symbols are defined in Eq. (B9).
The expression for the bispectrum in Eq. (50) is a bit verbose, but this expanded form will make it easier to construct the estimator in the next section. The expression shows how the bispectrum can be separated into factors that only depend on 1 , 2 , or 3 . Of course, the expression, taken as a whole, ought to be isotropic. This may be checked by summing over all az- As expected, the resulting expression reduces to the isotropic form in Eq. (18) but yields a non-separable angle-averaged/reduced bispectrum.
Each term in the sum over m a , m b , and M in Eq. (50) describes an anisotropic bispectrum. Each of these bispectra is 'locally' separable in the sense of Eq. (39). The integral over the comoving radial coordinate r in Eq. (50) may be replaced with a weighted sum over N quad integration points. Combined with the N prim terms in the primordial shape function there will then be N fact = N prim N quad locally separable terms.
The allowed combinations of L 1 , L 2 , and L 3 per ( 1 , 2 , 3 ) triplet in Eq. (50) are quite limited; depending on the polarization indices of the bispectrum only eight or twelve combinations are allowed [83]. Recall that the capital L's arise from the expansion of the delta function in Eq. (46). The specific values can be found by systematically going over the 3-j symbols, including the ones hidden in the J symbols (see Eq. (B9)). First, note that J 000 1L1 1 and J 000 1L2 2 require L 1 + 1 and L 2 + 2 to be odd. The triangle conditions of the 3-j symbols in the second and third line then enforce L 1 = | 1 ± 1| and L 2 = | 2 ± 1|. The term in square brackets in the fourth line forces L 3 + 3 to be even when x 3 = 0 or odd when x 3 = 1. The triangle condition of the 3-j symbol in the fourth line then requires We have derived the linearly propagated bispectrum for the ζζh 3-point correlation function: a crucial ingredient for the derivation of the estimator. The resulting 15 First express the angular integral over Y L 1 M 1 , Y L 2 M 2 , and Y L 3 M 3 in terms of the Gaunt integral and then sum over the five 3-j symbols that depend on azimuthal numbers using Eq. (B17) [83].
bispectrum is given in Eq. (50). We have showed that the bispectrum can be viewed as a sum of anisotropic bispectra. As a sanity check of the derivation one may verify that the bispectrum holds up to the general constraints due to parity invariance that were formulated in Sec. II B 1. For polarization triplets X 1 , X 2 , X 3 with even parity, i.e. X 3 = B, the bispectrum is real and nonzero when 1 + 2 + 3 = even. On the other hand, when X 3 = B (so x 3 = 1), the bispectrum becomes purely imaginary and nonzero only for 1 + 2 + 3 = odd.

K functionals
Before constructing the estimator it is instructive to take a more detailed look at the K ,L functionals defined in Eq. (51). They will become an important part of the estimator. We thus have a brief digression in which we illustrate the role of the functionals in Eq. (50). Readers that are more interested in the actual estimator may skip this section.
The K's are a straightforward generalization of the α (r) and β (r) functions introduced in the KSW description for the local model [77]. 16 In the original KSW description the K's serve to transform the factors of the 3-point function to the factors of the reduced bispectrum, i.e. f (k) → K[f ] = f . For the ζζh estimator, the K's still serve to transform factors of the 3-point function into factors of the bispectrum. The difference is that, as can be seen in Eq. (50), the factors of the 3-point function now each require multiple transformations to account for their non-scalar nature.
Let us first focus on the K functionals that are relevant for regular ζζζ non-Gaussianity estimation: the K's with L = and transfer functions for Z = ζ. It is convenient to consider a constant input function f (k) = 1, the resulting functions are equal to the α X (r) functions defined in Ref. [80]: where X ∈ {T, E} because of the ζ transfer function.
The α X (r) functions have a special interpretation: they serve as the transfer functions in coordinate space instead of Fourier space. Eq. (52) is an inverse Fourier transform (i.e. inverse spherical Hankel transform) of the transfer function T (k) and it is true that the observable CMB harmonic modes sourced by ζ may be expressed as follows [79]: for X ∈ {T, E}. Here ζ m (r) are the spherical harmonic coefficients of the same initial amplitude of the curvature perturbation as in Eq. (10) but now decomposed on spherical shells around the origin of the comoving coordinate system: Recall thatt(x, t i ) denotes a spacelike hypersurface in the early radiation-dominated era. The solid lines in Fig. 1 show α X (r) for X = T and X = E as function of comoving radius on the initial spatial hypersurface. The lines show how ζ m (r) contributes to a X, m for = 60 over a range of comoving radii around 14000 Mpc. In terms of the conformal time along the path of a radially traveling photon (∆τ = r/c), this range of r is roughly centred around the epoch of recombination. Another response at r ≈ 9000 Mpc corresponds to the rescattering of CMB photons at reionization. Finally, at r 3000 Mpc there is a slowly rising response as r approaches zero for X = T and 150 that corresponds to the late-time integrated Sachs-Wolfe (ISW) effect.
The fact that K (ζ) [1] yields the radial transfer functions provides a physical reason why the K functionals result in functions that are highly localized in r. During bispectrum estimation the integral over r has to be evaluated as efficiently as possible; the localized nature of the radial functions is thus highly beneficial. We will now see how and why the radial functions used for the ζζh bispectrum differ from the ones used for regular scalarsourced bispectra. These new functions will turn out to be slightly less localized in r, but the difference is minor.
Eq. (53) must hold because the harmonic modes of the curvature perturbation on spherical shells ζ m (r) in Eq. (54) are related to the harmonic modes of the Fourier representation of ζ through the following simple relation: Here the ζ m (k) are the coefficients of the spherical harmonic decomposition of the angular part of ζ k in Eq. (10): One can check that Eq. (53) holds by inserting Eq. (56) and Eq. (55) into Eq. (16) and making use of the orthonormality of the spherical harmonics.
In turn, Eq. (55) is valid because ζ is a 3-scalar, it has no intrinsic angular dependence. The projection from the Fourier basis to a basis of spherical shells at comoving radii r is thus completely determined by the 'orbital' angular momentum of the field, i.e. the projection is determined by the plane wave decomposition of the 3D Fourier basis functions in Eq. (45). Simply put: projecting a Fourier mode of a 3-scalar to an angular mode with multipole order and azimuthal mode m sitting on a shell at radius r only requires transformations involving j and Y m . Inserting Eq. (55) into Eq. (56) demonstrates this behavior.
For fields that are not 3-scalars, a relation like Eq. (55) will not hold. In these cases, the coupling between the intrinsic angular dependence of the field and that of the plane wave contributes to the projection. The exact expressions for these 'total angular momentum' projection operators may be found in Ref. [120][121][122]. We will use the general properties of these operators to gain a better understanding of the role of the second multipole index of the K ,L functionals.
In the above we argued that the projection of a single Fourier mode, i.e. a plane wave, to an angular mode with multipole order and azimuthal mode m sitting on a shell at radius r will only involve j and Y m . The same projection for an intrinsically dipole-like ( = 1) field that is modulated by a plane wave will involve operators constructed out of j ±1 and Y ±1m . Two distinct projections exist in this case: one for the longitudinal (m = 0) component of the dipole-like-field and one for the solenoidal (m = ±1) components [120]. Similarly, the projection of an intrinsically quadrupole-like ( = 2) field modulated by a plane wave will involve j , Y m ; j ±1 , Y ±1m ; and j ±2 , Y ±2m . Again, there are distinct projections for the longitudinal (m = 0), solenoidal (m = ±1), and transverse (m = ±2) components of the field. This time, a projection using and ±2 only contributes to the parityeven component of the resulting field; the ±1 projections only contribute to the parity-odd component [120].
Having gained this intuition, it is now understood why only the terms with L 1 = | 1 ± 1| and L 2 = | 2 ± 1| contribute in the second and third line of Eq. (50) respectively. Each of the two lines describes how a dipole moment constructed out of one of the two unit wave vectors in the 3-point function template in Eq. (44) is projected to a set of angular modes on spherical shells at radius r. The prefactor given to K ,L Y LM in the second and third line of Eq. (50) will change depending on whether the longitudinal mode (e.g. m a = 0) or the solenoidal modes (e.g. m a = ±1) are projected.
The K functionals with L = ± 1 differ substantially from the L = variants used in the ζζζ KSW estimator. This is especially true for low ( 500) multipole orders. We plot the K[1] , ±1 functions next to the regular radial transfer functions in Fig. 1 to illustrate this. Note that for 500 the functions with L = ± 1 converge to the shape of those with L = although there remains a small phase shift in r regardless of .  Fig. 2. The plotted range again roughly corresponds to the recombination era. Not shown is another small response that corresponds to the reionization era. There is no equivalent for the late-time ISW effect. In Fig. 3 we plot the same functions but for L = { ± 1}. These functions are used to project the quadrupole moment of the 3-point function to the CMB B-mode field.
The small aliasing effects seen in Fig. 2 and Fig. 3 are purely numerical; both j L and T in Eq. (51) oscillate rapidly with k. The integral thus requires a large number of k integration points to completely converge for each value of r. We have verified that the bispectrum and the results in Sec. IV are not sensitive to these numerical artifacts.
The point of this section was to explain the role of the K ,L functionals present in the ζζh bispectrum in Eq. (50). As illustrated in the figures, the functionals with = L, i.e. the ones needed for the ζζh bispectrum, differ substantially from the = L functionals that are used for the standard ζζζ bispectrum.

Estimator
Using Eq. (50), the expression for the ζζh bispectrum, we now write down the estimator for the amplitude of this bispectrum template. For simplicity we start by neglecting the linear term in the estimator in Eq. (26) and focus on the cubic term.
The expression for the bispectrum in Eq. (50) is sourced by the ζζh template. The order matters, the observed CMB bispectrum is also sourced by the 3-point functions with permuted ζ and h indices. However, it will be convenient to keep ignoring the ζhζ and hζζ contributions for now and start by constructing the estimator for the ζζh template only. We thus divide the (cubic part of) the estimator in three parts: and start with the first term on the r.h.s. We reap the benefits of our work in the previous sections; the cubic estimator is simply constructed by inserting the expression for the bispectrum, Eq. (50), into the general expression for the estimator in Eq. (26) and keeping the terms cubic in the data. Let us again stress that this result is only achieved through the use of the full bispectrum as opposed to the angle-averaged or reduced bispectrum. The resulting expression for the cubic part of the estimator, and the main result of this paper, is given by: We have again made use of the shape template in Eq. (49). The two extra terms are cyclic permutations of f (i) , g (i) , h (i) . The six permutations of the shape function in Eq. (50) thus reduce to three. This is possible due to the invariance under simultaneous interchange of f (i) , g (i) and m a , m b in Eq. (58) or, more physically, the indistinguishability of the two scalar components of the 3-point function.
The similarity of Eq. (58) to the standard KSW estimator in Eq. (41) is evident. There are, however, two clear differences between the expressions. The most important difference is the anisotropy in the dependence on m a , m b , and M ; as a reminder, in order to construct the equivalent of a KSW estimator, we need to construct pieces of the bispectrum separable in l 1 , l 2 , and l 3 , and this can only be done at the expense of introducing several anisotropic templates. As discussed previously, the estimates of the amplitudes of the anisotropic terms combine into an estimate of the amplitude of the original isotropic template. The anisotropy appears in the Wigner 3-j symbol and the m a , m b , and M indices of the generalized A functionals in Eq. (58). We will discuss the meaning of the m a , m b , and M indices and the 3-j symbol in more detail in the remainder of this section, but in short: each (m a , m b , M ) triplet corresponds to a combination of the longitudinal, solenoidal and/or transverse angular modes of the contracted angular term, see Eq (44), that is present in the ζζh 3-point function. For a given (m a , m b , M ) triplet, Eq. (58) estimates the contribution from the corresponding combination of angular modes to the data; the 3-j symbols then provides a relative weight to each contribution when all are summed into the final estimatef ζζh NL,cubic . The second difference between Eq. (58) and Eq. (41) is the absence of the radial integral over comoving radius r in Eq. (41). This difference is mainly notational. As explained in the previous section (Sec. III C 3), the radial integral is also present for standard ζζζ estimation because it is part of the transformation between factors of the 3-point function to the factors of the reduced bispectrum, e.g. f (k) → f . The radial integral could be implicitly included in Eq. (41) by letting the sum over i run over N fact = N r N prim values, with N r denoting the number of quadrature point used to evaluate the radial integral and N prim defined in Eq. (40).
Before coming to the computational scaling of the estimator, let us focus our attention to the generalized A functionals in Eq. (58). For a given input function f (k), A (Z) (S,n) [f ] returns a scalar field on a spherical shell at comoving radial coordinate r. The S index denotes whether the associated factor of the 3-point function is a monopole (S = 0), dipole (S = 1) or quadrupole (S = 2) source. The n index tells us whether we are considering the longitudinal (n = 0), solenoidal (n = ±1) or transverse (n = ±2) part of the source. From Eq. (58) we see that for the ζζh bispectrum we only need the S = 1 functionals for the Z = ζ part and the S = 2 functionals for the Z = h part.
At each radial coordinate r we may decompose the A functionals in terms of spherical harmonics: The resulting harmonic modes are given by linear transformations of the inverse-covariance-weighted data. Based on the primordial index Z, we identify two cases: Note that for the Z = ζ case, the sum over X only runs over {T, E}, while for Z = h it runs over {T, E, B}. The parity behavior associated with a given polarization index X is denoted by x. The K functionals are defined in Eq. (51). The data are filtered by the different K functionals in an anisotropic manner depending on the value of n. For example, the (A (S,2) ) LM modes are sourced by the m = −(M + 2) modes of the data.
The inverse spherical harmonic transformation needed to evaluate Eq. (59) scales as O( 3 max ) and will in reality determine the overall scaling of the estimator evaluation. One might worry that the sums over and m needed to construct the harmonic coefficients in Eq. (60) will contribute significantly to the computational scaling. This is not the case, as the selection rules of the Wigner 3-j symbols forbid most values of and m. Only ∈ L ± 1 and m = −(M + n) are needed to compute A To compute the angular integral in Eq. (58), the pixelization scheme used for the A[f ] fields (or 'maps') must support harmonic band-limits given by the sum of the band-limits of the three individual maps (see Eq. (B8)). In reality, the A (ζ) maps will be likely band-limited by the instrumental beam or noise covariance. On the other hand, the A (h) maps only contain information on large ( 200) scales; the tensor transfer functions suppress all information in the data on smaller scales. Smallscale tensor perturbations produced by an approximately scale-invariant process are inaccessible through the primary anisotropies. Unlike scalar perturbations, smallscale tensor perturbations decay away with cosmic expansion before recombination.
It is instructive to take a closer look at how the symmetries of the spherical harmonics and the 3-j symbols relate the harmonic coefficients of the A functionals with (S, n) to those with (S, −n). This relation can be used to approximately half the number of inverse harmonic transformations needed to evaluate Eq. (58). Assuming that the input function f (k) is real-valued, the coefficients transform as follows under complex conjugation: The m a = m b = M = 0 case is unique, the other four combinations in Eq. (63) are related to the remaining four combinations by a factor (−1). The 3-j symbol in Eq. (58) does not change if this minus sign is added to its lower indices. For the A maps, Eq. (62) tells us that the addition of a minus sign to the n index is equivalent to complex conjugation. For the products of A maps the following thus holds: Note that we have suppressed the f (i) , g (i) , and h (i) input functions to the A's. Eq. (64) implies that, instead of computing nine products, one can only calculate five products of complex A maps and discard the imaginary parts to evaluate Eq. (58). The fact that only five out of nine terms are needed can be understood from the original expression for the angular term. Starting with the nine terms in the sum over a and b in Eq. (44), the symmetry under simultaneous exchange of a, b and ζ k1 , ζ k2 removes three degrees of freedom. The vanishing trace of the polarization tensor removes the fourth. It is easy to see that the two additional estimator terms with permuted indices in Eq.
After deriving the cubic part of the estimator, the linear term is obtained in an analogous way. It can be found by inserting the bispectrum in Eq. (50) into Eq. (26) and keeping the terms linear in the data: We again assume an input shape function parameterized by Eq. (49). The eight additional permutations in Eq. (66) are those constructed by cyclic permutations of f (i) , g (i) , h (i) and by varying which pair of A's sits in the MC brackets. Similar to the total cubic term, it may be checked that including the two cyclic permutations of ζζh simply amounts to:f tot NL,lin = 3f ζζh NL,lin .
Finally, the normalization of the estimator I 0 may be estimated by simply applying the unnormalized estimator to an ensemble of simulated data. Given the expressions for the cubic and linear term presented here, the efficient algorithm from Ref. [118] for the estimation of the normalization can also be used for this type of bispectrum. We omit the details of this implementation.
This concludes the derivation of the estimator for the ζζh 3-point function. The resulting expression is given in Eq. (58). In Appendix A we show how one would repeat this effort for several more involved 3-point functions.

IV. FISHER FORECASTS
We forecast the expected uncertainty on an upper limit on the amplitude of a squeezed ζζh 3-point correlation function. We illustrate the constraining power of current and upcoming CMB experiments, and demonstrate how the upper limit depends on certain instrumental effects. We expand on previous forecasts in Ref. [17,22] by taking into account the dependence on the lower harmonic band-limit of the data, the addition of E-mode data and the extra variance induced by weak lensing. In a future paper we apply the derived estimator to a set of map-based simulations to better judge the effects of foreground contamination, non-trivial noise covariances and secondary non-Gaussian contamination. In this light, the forecasts presented here should be considered as a baseline for more realistic forecasts.

A. Procedure
Before presenting the results from the Fisher forecasts, this section specifies the exact parameterization of the ζζh 3-point function. We also explain the assumed experimental setup and the numerical implementation of forecast calculation.
We parameterize the k-dependent part of the ζζh template in Eq. (24) as follows: A s represents the amplitude of the curvature perturbation (see Appendix C 1). We imagine an analysis that looks for a deviation from the tensor consistency relation by placing an upper limit on the amplitude of the squeezed 3-point function; we thus use the standard local shape as f (k 1 , k 2 , k 3 ) template as a generic squeezed shape template. See Eq. (C7) for the precise expression. The local shape differs slightly from the SFSR shape template [18] used in Ref. [17,22,59]. However, the two templates give almost equal weight to squeezed configurations with a large-wavelength tensor perturbation. Given that the tensor perturbation only sources CMB anisotropies on large angular scales, we may, for all practical purposes, consider the shapes as equal here. This is reflected in the results we obtain: our forecasts agree with those in Ref. [17,22] when parameters overlap. 17 For simplicity, we only consider the T T B , EEB , and T EB bispectra in the forecasts. We thus do not take into account the information contained in the T T T , T T E , T EE , and EEE bispectra. The main justification for this choice is the associated extra cosmic variance due to the lack of a B-mode component. Additionally, it should be noted that the squeezed T T T bispectrum is expected to be relatively strongly contaminated by secondary non-Gaussian signal [123]. It is expected to be of limited use for our purpose; see the discussion in Sec. V.
We use the inverse Fisher information I 0 as an estimate for the estimator variance. The 1σ upper limits that we will quote are simply given by 1/ √ I 0 . We calculate the Fisher information in the limit of no non-Gaussian signal contribution, i.e. we use Eq. (30). We further simplify the situation by assuming isotropic signal and noise covariances. The resulting diagonal covariance matrices, together with the orthonormality relation of the Wigner 3-j symbols in Eq. (B14) allow the Fisher information to be expressed in terms of angle-averaged bispectra. The 17 The definition in Eq. (68) differs from the one used in Ref. [17,31] by a factor √ r: f here NL = √ rf there NL , where r is the tensor-toscalar ratio. To compare our results to those in Ref. [22], use f here NL = (λsst ) there . effects from incomplete sky coverage are treated in a simplified manner by taking into account an increase in estimator variance proportional to the observed fraction of the sky (f sky ). Given this trivial scaling, we assume f sky = 1 in all of the following. Finally, we use the lensed version of the CMB power spectra, but neglect the non-Gaussian aspects of CMB lensing. See the discussion in Sec. V D.
As explained in Sec. III C 2, we may obtain the angle-averaged version of the ζζh bispectrum by summing over the m a , m b , M , M 1 , M 2 , and M 3 indices in Eq. (50) and inserting the resulting bispectrum into Eq. (18). This will yield the expression first derived in Ref. [83]. The first term in Eq. (70) for the primordial shape in Eq. (49) is given by: The other two terms in Eq. (70) are obtained by permuting the ζ and h indices. The five permuted terms in Eq. (71) refer to permutations of the f (i) , g (i) and h (i) functions. The K functionals were introduced in Eq. (51). The evaluation of Eq. (69) has an overall O( 3 max ) scaling. The computation is feasible because the K functionals in Eq. (71) can be precomputed. However, for high band-limits (e.g. max = 5000 used below) the procedure is unwieldy. This is especially true when multiple choices for the inverse signal+noise covariance matrix C −1 are to be explored. The computation of multiple Wigner 9-j symbols at every valid ( 1 , 2 , 3 ) triplet exacerbates the situation compared to the Fisher information for a ζζζ bispectrum.
To get around the computational complexity of Eq. (69), we split the problem in two parts: We first store a sparsely sampled representation of Eq. (71). We then 18 Note that the angle-averaged bispectrum used in Eq. (69) is only invariant under cyclic permutations of 1 , 2 , and 3 . For odd permutations, it picks up a factor (−1) 1 + 2 + 3 . Although we consider the 1 + 2 + 3 = odd case here, the factors (−1) cancel in the expression for the Fisher information, so we may still use the 1/∆ 1 2 3 simplification.
interpolate this representation over all multipole orders when the sums over 1 , 2 , and 3 are performed. This approach results in an insignificant reduction in accuracy but reduces evaluation time significantly. Computing I 0 with max = 5000 takes roughly 30 CPU minutes. The method is effective because the smoothness of the primordial templates and transfer functions (in k and respectively) translate into an angle-averaged bispectrum that is rather smooth with 1 , 2 , and 3 . 19 The sparse sampling is determined by the following binning scheme: ∆ = 1 for ≤ 50, ∆ = 4 for 50 < ≤ 200, ∆ = 12 for 200 < ≤ 500, ∆ = 24 for 500 < ≤ 2000, and finally ∆ = 40 for > 2000. This binning scheme is used for the 1 , 2 , and 3 dimensions. In each resulting three-dimensional bin, a single valid sample (depending on the parity and triangle constraints) is selected. The angle-averaged bispectrum for each (X 1 , X 2 , X 3 ) polarization tuple is then calculated over all selected samples. The integral over r in Eq. (71) is evaluated using the trapezoidal rule with 500 integration points that span 0 ≤ r ≤ 18000. Most points are placed around regions corresponding to the reionization and recombination eras. With some effort, we expect that the number of r samples can be reduced by a factor of 10. The resulting sparse, angle-averaged bispectra are compact enough to be saved to disk. Finally, to evaluate Eq. (69) the sparse representations are interpolated over all valid multipole combinations using a threedimensional linear interpolation scheme. The result is weighed by the (unbinned) inverse covariance matrices in Eq. (69).
The above algorithm is implemented in a publicly available Python code library. 20 The code makes heavy use of the scientific SciPy and NumPy libraries. 21 performance-critical steps are compiled to optimized machine code at runtime by Numba: a just-in-time Python compiler [124]. The Wigner symbols are evaluated using the WIGXJPF library [125]. The radiation transfer functions and CMB power spectra are computed using CAMB. Finally, every step of the code has been written with the Message Passing Interface (MPI) standard in mind; computing in parallel on distributed memory systems is therefore possible. The code should be relatively easily adaptable to other (smooth) bispectrum templates. The repository also contains the necessary scripts to reproduce the results in the following section.
In summary, we use the Fisher information to forecast the expected upper-limits on the amplitude of the squeezed ζζh 3-point function. The exact form of the ζζh correlation is specified in Eq. (24) and Eq. (68) with the standard local shape template for f (k 1 , k 2 , k 3 ).

B. Results
The results presented in this section fall into three categories. We first study how the expected upper-limits on the ζζh amplitude vary as function of upper and lower angular band limits. Second, we explore how advantageous it is to use both T and E-mode data together with the B-mode data. Finally, we investigate the deterioration of the upper-limits due to gravitational lensing.
We start by exploring how the lower angular band-limit of the B-mode data affects the constraining power. The flat-sky forecasts in Ref. [17] did not probe this regime. The lowest achievable lower band-limit B min is one of the main distinctions between ground-based and satellite CMB experiments. The atmosphere prohibits measurements over large angular scales. Current B-mode data from ground-based observatories reach B min ≈ 50. Polarization modulation techniques, such as spinning halfwave plates, might allow future efforts to reach an effective B min ≈ 30 [52]. Without atmospheric contamination satellite missions can in principle reach B min = 2. In reality, it remains to be seen if uncertainty on systematic instrumental effects and Galactic foregrounds will allow such a challenging measurement to be made. A more conservative estimate for a satellite (or balloon-borne) experiment would be B min ≈ 20. In Fig. 4 we show the achievable 1σ upper limits on f tot NL as function of overall band-limit max and lower bandlimit B min . There is no contribution from instrumental noise, the only source of uncertainty is the cosmic variance induced by the Gaussian components of ζ and h. The lensing contribution to the B power spectrum is assumed to be 'delensed' to only 10% of the ΛCDM amplitude (A BB lens = 0.1). It is clear that as long as the Gaussian contribution to h is neglected, i.e. r = 0, the upper limits strongly benefit from a low B min . Scattering at reionization significantly contributes to the 20 B-mode components of the bispectrum for r < 0.001. The lensing contribution to B is essentially negligible at such large angular scales, so the low-B-mode data become a highly sensitive probe of the squeezed bispectrum. When r = 0, the additional cosmic variance induced by h quickly closes this window, even though there still remains a significant dependence on B min for r = 0. We find that for r ≥ 10 −2 , the 1σ upper limits conform rather well to the max (log( B max / B min )) 1/2 scaling conjectured in Ref. [21]. Here max refers to the band-limit of the T and E-mode data, while B max refers to the band-limit of the B-mode data. The scaling fits well when B max ≈ 150: roughly the maximum multipole order that contains usable information on the primordial tensor perturbation for a 90% delensed B-mode power spectrum. The curves in the two panels in Fig. 4 that have r < 10 −2 do not fit the scaling: the relatively strong contributions from reionization and lensing are not captured by the analytic relation.
The relative importance of the low-B-mode data also grows when the lensing contribution to the B power spectrum is increased. This behavior is depicted in Fig. 5. As we move from no lensing BB contribution to the full ΛCDM amplitude, the low-B-mode data becomes more relevant. This is a simple consequence of the shape of the lensing contribution relative to the bispectrum. The dominant lensing contribution to the estimator variance, i.e. the B lensing power spectrum, is roughly constant with on large-scales while the T T B , EEB and T EB bispectra peak at configurations with large-scale B-mode components.
Note that the lower band-limit used for the T and E data is set at = 2 for all results presented in this section. The rational behind this choice is that the WMAP and Planck data already provide cosmic-variance limited data for T and E on large angular scales. Note that this is not strictly true for the E-mode data. Current 30 E-mode data is systematic limited [126,127]. We have checked that by conservatively removing the ≤ 30 Emode data the curves do not visibly change.
We now focus on the individual and combined con- Here, the f tot NL parameter is the amplitude of the ζζh 3-point function with a local shape function. The lines in each panel correspond to lower band-limits B min of the B-mode data. The vast improvement due to low-multipole B-mode data seen in the upper-left panel is caused by the contribution from reionization to the bispectrum. When the tensor power is increased (the other three panels) the scaling with B min becomes more regular: the contribution from reionization gets suppressed by the B-mode power spectrum. Still, the low multipole orders contain a significant amount of information on the 3-point function. These results take into account the Fisher information in the T T B , T EB , and EEB CMB bispectra. The lensing contribution to the B-mode power spectrum is assumed to be 'delensed' to only 10% of the ΛCDM amplitude (A BB lens = 0.1).
tribution of the T T B , T EB and EEB bispectra. In Ref. [17] only the T T B bispectrum was taken into account. Ref. [22] additionally calculated the Fisher information associated with the EEB bispectrum. In Fig. 6 we demonstrate how combining the information in T and E (in addition to B) yields much better results than the Fisher information of the individual cases would suggest. This effect is also seen in ζζζ non-Gausianity estimation and can be traced back to the fact that the T and E transfer fuctions for ζ are out of phase [79]. The same is true for the radial transfer functions we use, see Fig. 1. This effect holds up under slightly more realistic circumstances: by adding 4 µK-arcmin white noise to the T harmonic modes and 4 √ 2 µK-arcmin to the E and B harmonic modes, we see the same behavior.
Finally, we investigate the relation between lensing amplitude and instrumental noise level. As mentioned before, the lensing signal serves as a cosmic variance con- Here, the f tot NL parameter is the amplitude of the ζζh 3-point function with a local shape function. The lines in each panel correspond to lower band-limits B min of the B-mode data. As the lensing contribution to the B-mode power spectrum A BB lens is increased from the upper-left panel to the lower-right panel, upper limits worsen and become more dependent on the lowmultipole B-mode data. These limits take into account the Fisher information in the T T B , T EB , and EEB CMB bispectra. The tensor contribution to the CMB power spectra is sourced by an r = 0.001 primordial tensor power spectrum.
tribution to the estimator variance. The lensing contribution to the T and E-mode power spectra provides a relatively minor contribution, while the contribution from the lensed B power spectrum is significant. Fortunately, the lensing contribution to the B-mode field is not entirely irreducible: with knowledge of the lensing potential, the lensing contribution can be reduced, or 'delensed' [128]. In Fig. 7 we show upper limits as function of instrumental B-mode noise for the case of only 10% lensing contribution to the B-mode power spectrum (A BB lens = 0.1) and for the full lensing contribution. The instrumental B-mode noise ranges from 50 to 0.3 µK-arcmin. To put this in context: the upper value roughly corresponds to the noise level in the Planck data. The Simons Observatory [52] and LiteBIRD [58] experiments aim to achieve a B-mode noise level of approximately 3 µK-arcmin, while the CMB-S4 proposal [47] aims for approximately 1 µK-arcmin. From the figure it becomes clear that the lensing BB contribution starts to dominate over the instrumental noise for noise amplitudes below 5 µK-arcmin. This is unsurprising given that the large-scale B-mode lensing contribution is well Combined constraints (i.e. from the Fisher information in the T T B , T EB , and EEB bispectra) (solid) are significantly stronger than those obtained from a naive addition of the B + T and B + E Fisher information. This effect holds when (white) noise is added to the data; the left panel shows the noiseless case, while in the right panel 4 (4 √ 2) µK-arcmin noise is added to the T (E, B) harmonic modes. For these noise levels, the T (E) data are cosmic-variance limited up to ≈ 4000 (2500). For data with higher band-limits ( max) the constraints saturate due to the noise. The addition of white noise to the B-mode data is responsible for the overall upward shift of the curves in the right panel. Note that the lower harmonic band-limit of the B-mode data is set to = 2 for this figure.
approximated by 5 µK-arcmin white noise [129]. We can thus infer that for the Simons Observatory or LiteBIRD experiments the gain from B-mode delensing would be noticeable but relatively minor, while an experiment like CMB-S4 would need at least a factor of 10 of delensing in the B-mode power to make use of the potential of the instrumental sensitivity.
In summary, the forecasts demonstrate that the statistical improvement with angular band-limit roughly follows the expected behavior for a squeezed 3-point function, with the exception that a low min for the B-mode data is more advantageous than one would naively expect. Constraints benefit significantly from the simultaneous use of T and E-mode data. Lastly, future experiments will need to delens their B-mode data significantly to keep improving upper-limits. It should be noted that these conclusions will likely differ for shapes that are not squeezed.

V. DISCUSSION
Generally, we expect two effects that will influence our ability to measure primordial non-Gaussianity. The first effect is a bias in the estimated amplitude of the primordial signal, i.e. a mismatch between the true ampli- For B-mode data that is 'delensed' to only 10% of the ΛCDM amplitude (A BB lens = 0.1) (left panel), the constraining power saturates roughly below 1 µK-arcmin. Without delensing, the constraints already start to saturate below 5 µK-arcmin. This behavior is essentially independent of the noise level of the T and E data: the same curve is seen regardless of whether 1 ( √ 2) (solid) or 10 (10 √ 2) (dotted) µK-arcmin noise is added to the T (E) data. Note that the harmonic band-limit of the data is set to max = 5000.
tude and the expectation value of the estimate, due to other non-Gaussian signal that mimics the primordial signal. Non-primordial, non-Gaussian signal is for example caused by secondary extragalactic sources and Galactic foregrounds. In some cases, these biases may be subtracted from the estimate or captured by a joint estimate, see e.g. the lensing-ISW bias in the Planck analysis [15]. A second, more irreducible effect comes from the fact that non-Gaussian signal, primordial or secondary, will contribute to the estimator variance. When this contribution exceeds the contribution from (cosmic) variance from the Gaussian CMB component and detector noise, simulations of the responsible non-Gaussian signal are needed order to accurately characterize the estimator variance. While we will leave a detailed discussion of both effects to a future publication, here we provide a brief discussion. We focus on contaminants for squeezed bispectra with one large-scale B-mode component, as such bispectra will provide the largest constraining power for the primordial ζζh 3-point function.

A. Polarized Galactic foregrounds
The large-scale polarization B-(and to lesser extent) E-mode fields are dominated by Galactic emission: at low frequencies by synchrotron radiation and at higher frequencies by polarized dust emission [130]. Because the primordial B-mode signature is expected at large angular scales ( 100), inference on the tensor-to-scalar ratio r relies heavily on multi-frequency data to break the degeneracy between foreground and CMB power. Similarly, inference on the bispectra we are interested in would require uncontaminated large-scale B-mode data.
One would naively expect that component-separated B-mode data suitable for constraints on r is also suitable for constraints on bispectra with B-mode components. However, there is an extra complication for bispectrum inference: residual anisotropic or non-Gaussian correlations between foreground B and foreground T or E signal. Residual correlations of this type might not be important for a power spectrum analysis but will bias a bispectrum analysis. Unfortunately, it is quite natural to expect Galactic signal to source a squeezed bispectrum: small-wavelength foreground power in a given direction is likely not independent from the foreground signal on larger wavelengths in the same direction. The question is thus whether multi-frequency cleaning of the data will suppress such correlations enough.
Characterization of the non-Gaussian aspects of the polarized Galactic signal is relatively unexplored at this point. Early results obtained from the Planck data in Ref. [131] suggest that there are indeed significant squeezed T T B , T EB , and EEB bispectra on large angular scales in the thermal dust component of the Galactic signal. No significant bispectrum is found in the synchrotron emission. Ref. [131] does not find a significant non-Gaussian correlation when foreground-cleaned Planck B-mode data are correlated with the T and/or E components of the Galactic dust. Although this analysis omits the very large angular scales ( ≤ 40), it does suggests that the standard component separation methods sufficiently suppress Galactic foregrounds given the Planck noise level. It should also be noted that in a related study no evidence was found for a dust bispectrum template in the foreground-cleaned Planck temperature data [132]. More investigation is clearly still needed; just like it seems to be the case for inference on r, one would expect foreground uncertainty to be the limiting factor for inference on the ζζh 3-point correlation function.

B. Secondaries sourced by ζ
We now consider non-Galactic secondary non-Gaussian signals that are sourced by the curvature perturbation ζ (as opposed to h). We again focus on squeezed bispectra with a large wavelength B-mode, as such bispectra may bias the inference on the primordial signal.
The most well-studied secondary signal is sourced by the correlation between the late-time ISW effect and the lensing potential [133]. A similar correlation exists between the quadrupole perturbation that sources the polarized reionzation signal and the lensing potential [134]. The ISW effect and the polarization generated at reionization only affect the CMB over large angular scales. On the other hand, the lensing potential modulates small-scale power. The associated bispectra are thus of the squeezed type. The ISW effect only affects the temperature anisotropies, the polarized reionization signal is purely E. This means that although T T B , T EB , EEB bispectra are produced [134,135], the only significant configurations will have large-scale T or E-mode components instead of a B-mode component.
In general, the requirement of a squeezed bispectrum with a large-scale B-mode contribution is highly constraining. There are no obvious (non-Galactic) candidates that preferentially source a B-mode signal on large angular scales. Non-linear effects other than lensing that produce B-mode signal, such as patchy reionization [136] and the polarized Sunyaev Zel'dovich (pSZ) effect [137,138], do so only at relatively small angular scales. Unclustered, extragalactic point sources may be weakly polarized and have a reduced bispectrum that is approximately constant with multipole order [139]. They thus contaminate all bispectra, regardless of shape. However, especially for squeezed models, the point-source bias is found to be negligible: the two types of bispectra can be estimated independently [15,139].

C. Secondaries sourced by h
The ζζh 3-point correlation function is contingent upon the existence of the primordial tensor perturbation h. For completeness, we thus briefly discuss possible secondary non-Gaussian signal sourced by a purely Gaussian tensor perturbation h.
In this case, the most obvious single-B-mode bispectrum candidate will be due to the interplay between two effects: 1) the standard correlation between the lensing potential φ and the ISW and polarized reionization signal, together with 2) the fact that lensing will now convert some of the (Gaussian) primordial B-mode signal to E-mode polarization [140]. In the resulting BEX bispectrum, B is the standard primordial B-mode signal, E is the primordial B-mode signal lensed to an E-mode signal and X is the standard scalar-induced T or E signal. To first order in the lensing potential, the bispectrum should be given by the triangular configurations of C BB C φX . The suppression by r, due to the presence of the primordial B-mode power spectrum, makes this bispectrum lower in amplitude than the standard lensing-ISW bispectrum discussed in the previous section. More importantly however, the fact that the lensing-ISW and lensing-reionization correlation C φX is only nonzero for 100 [134] means that there will be no significant bispectrum configurations with a large-scale B-mode component and two small-scale ( > 100) T and/or E components: the relevant configuration for a bias.
Analogous to the E-mode-lensing correlation in the previous section, the B-mode signal from reionization, present when r = 0, is also correlated to small-scale power through a correlation with the lensed signal. The difference is that isotropy and parity invariance forbid a correlation between B and the regular gradient-type lensing potential. Instead the B-mode signal is correlated to the curl-type lensing potential sourced by the h perturbation [141,142]. Unlike the ζζζ case, there will now exist BXX bispectra, where B is the unlensed Bmode field and X the curl-lensed T or E-mode field (and X ∈ {T, E}). To leading-order we expect such bispectra to be proportional to the triangular configurations of C Bω C XX , where C Bω is the cross-correlation between the curl component of the lensing deflection angle and the reionization B signal. The power spectrum of the tensor-induced ω, i.e. C ωω , is strongly suppressed compared to scalar-induced lensing and decays rapidly for > 2 [141,142]. One would expect similar behavior for the amplitude of C Bω and thus expect that C Bω C XX is negligible. Still, the associated bispectra are maximized in the squeezed limit with a large-scale B-mode, so they should be considered as a potential bias to a primordial signal.
The tensor-induced temperature quadrupole on the last-scattering surface seen by galaxy clusters will source the pSZ effect [143,144]. The resulting small-scale power will be correlated with the primary B-mode field from reionization and will thus source a squeezed BEE bispectrum (among others). The B-mode component is on large angular scales, which means that the bispectrum has the right shape to be a potentially relevant contaminant of the primordial bispectrum.

D. Contributions to the covariance
In the previous three sections we focused on possible biases to the estimator. All discussed effects will also contribute to the covariance of the estimate. Fortunately, in most cases these effects are subdominant to the Gaussian contribution to the covariance, given by the inverse of Eq. (30). However, as we illustrate in Appendix D, the covariance of the estimator receives additional contributions from any connected 4-and 6-point correlation function present in the data. For example: for the ζζζ, temperature-only bispectrum, the connected moments due to lensing will introduce significant additional covariance on small angular scales. The variance due to the connected 4-point function alone is expected to dominate the cosmic-variance induced estimator variance for local-type non-Gaussianity for max 3500 [145] (and hence will be a concern for experiments like Simons Observatory and CMB-S4). The total effect on the estimator covariance will depend on the shape of the primordial bispectra that are estimated: local, or squeezed, shapes will likely be affected the most.
We focus primarily on bispectra with a single B-mode component; in the previous sections we argued that such bispectra are less susceptible to secondary biases. However, this argument does not hold for the variance of the estimator: when lensing is introduced, it is expected that the estimator covariance is affected in a way that is rather similar to the temperature-only case mentioned above. For example, consider the T T B bispectrum; the variance of its estimate will be approximately proportional to the T T T T BB 6-point function. In the noiseless Gaussian case, this 6-point function reduces to terms proportional to C T T C T T C BB . When lensing is introduced, the power spectra are replaced by their lensed versions (which has a large effect on C BB ). However, there should also be a contribution proportional to the connected T T T T 4-point function from lensing. One would expect this contribution to saturate the constraining power for max 3500, just like it does for the temperature-only case mentioned above. For the variance on estimates using the EEB or T EB bispectra a similar argument applies [129]. In other words, we expect that an estimate of the ζζh 3-point function using high-resolution data will have large non-Gaussian contributions to its (co)variance, at least for squeezed bispectrum shapes with a B-mode contribution on large angular scales. 22 Note that this non-Gaussian contribution to the variance is not included in the Fisher forecasts presented in Sec. IV.
In a future study we hope to identify all these contributions to the covariance and estimate their effects on our ability extract the primordial signal. We would like to note that, in principle, secondary biases and non-Gaussian contributions to the covariance from lensing can likely be reduced significantly by delensing [146]. As some of the contributions to the covariance might be hard to compute analytically, applying the developed estimator on a suite of realistically lensed simulations would be an important aspect of such a study.

VI. CONCLUSIONS
The CMB bispectrum sourced by primordial scalartensor interactions is a well-defined observable that can be probed effectively with upcoming CMB polarization data. Inference on these types of primordial interactions probes non-standard early-universe models that are essentially unconstrained by current studies. In addition, inference on the squeezed ζζh 3-point function provides a powerful consistency test of the standard inflationary paradigm.
In this work we derived a numerically efficient and optimal estimator for the amplitude of CMB bispectra sourced by primordial ζζh 3-point correlation functions. We demonstrated that despite the intrinsic geometrical complexity of the bispectrum, an efficient estimator can be formulated, see Eq. (58). There is a limited computational overhead compared to standard ζζζ bispectrum 22 Because the effect should only become dominant for 3500 there should be negligible effect on primordial bispectra with more than one B-mode component and/or shapes that are more equilateral. In these cases, the signal drops sharply for B 200. estimation, see Eq. (41), but the same asymptotic scaling with data resolution is reached. The derived estimator provides complementarity to the more general modal and binned bispectrum estimators [84][85][86][87]90] and should, due to its numerical advantage, be the preferred method for high-resolution data.
We studied the bispectrum sourced by a squeezed ζζh 3-point function in more detail. We presented a set of Fisher forecasts that form a baseline to which more realistic forecasts will be compared in future work. The presented forecasts demonstrate a relatively strong dependence on the size of the largest angular scale accessible in the data. We also demonstrated how constraints from the combination of temperature, E-and B-mode data are significantly better than those only from temperature and B-mode data or only from E-and B-mode data. Finally, we found that the lensing contribution to the B-mode data starts to significantly impact the constraints from experiments like the Simons Observatory and LiteBIRD. For a more futuristic experiment like CMB-S4, delensing of the large-scale B-mode data will be crucial.
Although the Fisher forecasts provide us with a good indication of the ultimate constraining power of future CMB experiments, future forecasts will need to include more realism. This requires applying the estimator directly to simulated sky maps. Besides allowing the characterization of standard complications like non-trivial noise properties and sky cuts, this approach is the appropriate way to study effects that are more specific to e.g. the ζζh bispectrum. Examples of such effects include the incomplete removal of Galactic B-mode signal or non-Gaussian polarized secondary sources. Lensed sky simulations will also allow one to quantify the expected extra estimator variance due to non-Gaussian 4-and 6point correlation functions in the lensed CMB fields, as well as the effects of delensing these fields. Although current data are inconclusive, it seems likely that the eventual limit on future constraints will be from foreground uncertainty on large angular scales and the non-Gaussian lensing contribution on small-scales. Before this point is reached however, the data will contain a large amount of unexplored cosmological information. With an efficient estimator in hand, we should now turn towards map-based simulations to predict the exact amount of information.
In the next decade we will significantly improve our measurements of the CMB polarization field. With this in mind, we should consider interesting science targets beyond the tensor-to-scalar ratio that can provide insight into the early Universe. One of these targets is probing the primordial interactions between scalars and tensors as well as tensor self-interactions. Currently, the most sensitive probe of these interactions comes from including the B-mode field into CMB bispectrum inference. The work presented here is a contribution towards the development of a complete framework to constrain these interactions with upcoming CMB data. For comparison and completeness, we first treat the standard ζζζ template, i.e. a template with no contracted angular term. Assuming a shape template like Eq. (40), the expression for the bispectrum in Eq. (21) simplifies to: Note that the 5 extra terms are permutations of the input functions f , g and h. With this bispectrum, the cubic term of the estimator becomes: which is the standard result [80], but rephrased in our notation. See Eq. (59) and Eq. (60) for the definition of the A (S,n) functionals. In the (S, n) = (0, 0) case used here, the functionals are much less complicated: the 3-j symbols reduce to a delta function which simplifies the expression to: Note that the (−1) factors are not present in the original expression [80]. They do not change the estimator, as only configurations with 1 + 2 + 3 = even contribute. The K functionals are defined in Eq. (51).
b. Scalar-only template with angular dependence of massive spinning particles The second ζζζ template is inspired by the three-point function template derived in Ref. [31]. The template captures the imprint of a massive spin-s field during inflation. Although the template only involves the curvature perturbation, it does include a contracted angular term: P s is a Legendre polynomial of degree s. The five additional permutations are permutations of the three wave vectors. In order to write down the corresponding bispectrum, we expand the Legendre polynomial in terms of spherical harmonics: The bispectrum for a spin-s template then becomes: The five additional terms are obtained by simultaneously permuting f (i) , g (i) , and h (i) with the 1, 2, and 3 indices. The cubic term of the estimator for this bispectrum is given by: The two extra terms are cyclic permutations of f (i) , g (i) , and h (i) .

Scalar-tensor-tensor
To illustrate the situation for a scalar-tensor-tensor 3-point function, we use a template inspired by the SFSR result [18]: The polarization tensors e ±2 are defined in Eq. (13) and Eq. (14), the a and b indices run over the three spatial dimensions. We use the Einstein summation convention. Using the notation from Ref. [83], we may expand the polarization tensors as: The α coefficients obey the following orthogonality relation: Using this relation together with the orthogonality relation of the Wigner 3-j symbols in Eq. (B13), the contraction of two polarization tensors can be expressed as follows: The bispectrum corresponding to the template in Eq. (A10) thus becomes: The cubic part of the estimator is given by: The expressions for the hζh and hhζ parts are derived in an analogous way.

Tensor-tensor-tensor
Finally, we derive the estimator for a tensor-tensor-tensor 3-point function. We again take the SFSR prediction [18] as inspiration for our template: The two extra terms are cyclic permutations of the three wave vectors.
To derive the bispectrum, we need to expand the unit wave vectors in spherical harmonics [83]: The α coefficients obey the relation in Eq. (A12). Together with Eq. (A11), Eq. (B8), and Eq. (B13) we then expand the first angular term in Eq. (A17) as follows: The J symbols are defined in Eq. (B9). The second angular term in Eq. (A17) is expressed in terms of Wigner 6-j symbols by making use of the relation in Eq. (B16), see also Ref. [147]. The resulting expression is: e λ1 ab (k 1 )e λ2 cd (k 2 )e bc λ3 (k 3 )k a It is convenient to separate the corresponding bispectrum into a part sourced by the first angular term and a part sourced by the second term. The first part is given by: The two extra terms are given by cyclic permutations of the f (i) , g (i) , and h (i) input functions together with the 1, 2, and 3 indices. The second part is given by: The cubic estimator is also most easily expressed in two parts. The part corresponding to the first bispectrum, Eq. (A21), is given by: The two extra terms are cyclic permutations of f (i) , g (i) , and h (i) . The second part, corresponding to Eq. (A21) is given by: We have introduced the B and C functionals. They are completely analogous to the A functionals, defined in Eq. (59) and Eq. (60), but slightly differ in their spherical harmonic coefficients: Computing the combination of Eq. (A23) and Eq. (A23) will still asymptotically scale as O( 3 max ). Although more terms have to be computed compared to the previous templates, this computational overhead is easily outweighed by the fact that the h transfer functions impose max ≈ 200. The spin-weighted spherical harmonics (SWSHs) s Y m are generalizations of the standard spherical harmonics Y m . Both types of spherical harmonics are functions on the sphere S 2 . Indeed, one may relate: The relation between the two sets of functions for nonzero s can be found in the literature [148,149].
The SWSHs are conveniently defined on the standard spherical coordinate system by taking the Wigner D-matrices (irreps of the three-dimensional rotation group) parameterized in terms of the Euler angles and fixing the polar axis as follows: With a slight abuse of notation, we usen in the arguments of the spherical harmonics to refer to the θ and φ angles that describe the spherical decomposition of the 3D unit vector, i.e.n = (sin θ cos φ, sin θ sin φ, cos θ). Similarly, we denote the differential solid angle with dΩ(n), i.e. S 2 dΩ(n) ≡ 2π 0 dφ π 0 dθ sin θ. The functions form an orthonormal and complete system for each integer 23 spin weight s: ,m 2. Wigner 3-j, 6-j, and 9-j symbols The Wigner 3-j symbols are real-valued and serve to describe the decomposition of tensor products of SWSHs into direct sums of SWSHs (see Eq. (B8)) (this also holds, in more generality, for irreps of the rotation group like the Wigner-D matrices). The 3-j symbols are closely related to the Clebsch-Gordan coefficients but are normalized such that they are the exact coefficients needed to form a rotationally invariant product of three SWSH coefficients (recall the definition of the angle-averaged bispectrum in Eq. (18)). In the following, we list a limited number of symbol properties; see Ref. [150] for an exhaustive description.
The 3-j symbols pick up a (real) phase factor when the sign of the three 'magnetic' indices is simultaneously changed: The symbols are invariant under cyclic permutations of m 1 , m 2 , and m 3 but pick up a factor (−1) 1 + 2 + 3 for anticyclic permutations. The symbols are only nonzero for m 1 + m 2 + m 3 = 0, | 1 − 2 | ≤ 3 ≤ 1 + 2 , and |m i | ≤ i ∀i ∈ {1, 2, 3}. There are two orthogonality relations: In particular, in the case of equal symbols one has: As mentioned, the Wigner 3-j symbols are used to couple two SWSHs or, equivalently, find the third angular state that combines two SWSHs into a rotationally invariant quantity. In general, there is no unique way to couple three SWSHs; there are two distinct sequences of applying Eq. (B8) to the product. The Wigner 6-j symbol is used to transform between these two possible final angular states [150]: By using one of the orthogonality relations of the 3-j symbols, the 6-j symbol may equivalently be expressed as: The 6-j symbols are invariant under all permutations of their columns and under the simultaneous permutation of upper and lower arguments in two columns. The symbols also obey several triangle conditions that can be deduced from the top rows of each of the 3-j symbols in the above expression. There also exist an orthogonality relation for the 6-j symbols [150]. Finally, the Wigner 9-j symbols are defined to describe the transformation between different couplings of four SWSHs. The symbols may either be expressed in terms of 6-j or 3-j symbols [150]. The latter expression is given by: We extract the radiation transfer functions from CAMB. We normalize the default output from CAMB such that the CMB power/cross spectra are related to the primordial power spectra defined above as: with Z ∈ {ζ, h} and XY ∈ {T T, EE, T E, ET, BB}.

Local 3-point correlation function
The local shape template used in Sec. IV is given by [89]: The template is symmetric under permutations of the three wavenumbers and perfectly scale-invariant (i.e. proportional to k −6 for k 1 = k 2 = k 3 ). If desired, including the scalar or tensor spectral tilt simply amounts to the replacement k → k(k 0 /k) (ns−1)/3 or k → k(k 0 /k) nt/3 , where k 0 is some fiducial pivot scale.
Although it is generally non-trivial to construct an estimator that fulfills this relation for all possible values of θ, the relation suggests a simple recipe for the construction of an estimatorθ that fulfills the Cramér-Rao bound in the case where θ is known:θ where I −1 is the inverse of the Fisher matrix. In reality, θ is unknown. However, an estimator constructed in this way may still be useful for estimates of θ that are close to the assumed value. This is the approach we will take.

CMB bispectrum estimation
We now construct the bispectrum estimator and provide a brief discussion of its statistical properties. We will see that the estimator is unbiased and becomes statistically optimal (saturates the Cramér-Rao bound) in the limit of vanishing non-Gaussianity.

a. Probability density function
It is clear from the previous section that a closed-form expression for the likelihood of the data is required to construct the estimator. However, there exists no such expression when the condition of Gaussian initial perturbations is relaxed. Without a closed-form expression, we thus construct an approximation to the full non-Gaussian likelihood by perturbing around the Gaussian form. The specifics of this perturbation are determined by the connected moments, or cumulants, predicted by the model. Given a characteristic function and its associated probability distribution, one can distinguish between the moments about the origin of the distribution (the n-point correlation functions) and the connected moments about the origin (the cumulants). The connected moments are proportional to the MacLaurin coefficients of the natural logarithm of the characteristic function. The connected moments about the origin are proportional to the MacLaurin coefficients of the characteristic function itself (i.e. without the logarithm). In more practical terms: the moments about the origin, the n-point correlation functions, may be expanded in terms of the connected moments with the help of Wick's theorem [152]. For the mean-zero distributions we are interested in, the first moments of a random field, expressed as a set of spherical harmonic modes {a m }, are expanded as follows: a 1 m1 a 2m2 = a 1m1 a 2m2 c , (D6) a 1 m1 a 2 m2 a 3m3 = a 1m1 a 2m2 a 3m3 c , (D7) a 1m1 a 2 m2 a 3 m3 a 4m4 = a 1m1 a 2m2 a 3m3 a 4m4 c + a 1m1 a 2m2 c a 3m3 a 4m4 c + a 1 m1 a 3m3 c a 2m2 a 4m4 c + a 1m1 a 4m4 c a 2 m2 a 3m3 c , (D8) a 1m1 a 2m2 a 3m3 a 4m4 a 5 m5 = a 1m1 a 2m2 a 3m3 a 4m4 a 5m5 c , (D9) a 1m1 a 2m2 a 3m3 a 4m4 a 5m5 a 6m6 = a 1 m1 a 2m2 a 3m3 a 4m4 a 5m5 a 6m6 c + a 1m1 a 2 m2 a 3 m3 a 4m4 c a 5m5 a 6m6 c + 14 perm. + a 1m1 a 2m2 a 3 m3 c a 4m4 a 5m5 a 6m6 c + 9 perm. + a 1m1 a 2m2 c a 3m3 a 4m4 c a 5m5 a 6m6 c + 14 perm. .
The quantities on the l.h.s. represent the moments and the quantities on the r.h.s. are the connected moments (denoted by . . . c ). For a distribution with a vanishing mean, there is no distinction between the moments and connected moments for n = 2 and n = 3. For n = 4 and higher, we see a distinction. A Gaussian distribution is a distribution for which all connected moments with n > 2 vanish. The approximation to the likelihood of the data we will use is known as the Edgeworth series. More specifically: an Edgeworth expansion around a mean zero multivariate Gaussian distribution. We truncate the series such that the only relevant cumulants are the 2-and 3-point functions. A detailed derivation of this procedure can be found in Ref. [153]. In short: one Taylor expands the non-Gaussian part of a general characteristic function to first order and discards all terms except the third-order moments. Fourier transforming this truncated series together with the unmodified Gaussian part yields the PDF. Although the Edgeworth expansion is an asymptotic series, truncating it to third order does not guarantee a well-defined (i.e. positive and normalized) PDF [154]. However, as long as we are only interested in the weakly non-Gaussian regime, where the third-order moment is subdominant to the second, we assume that these subtleties can be safely ignored.
Representing the likelihood for a measured set of n spherical harmonic modes a = {a X, m } as the truncated Edgeworth series yields [78,115]: Pr(a|C, B) = 1 + 1 6 1, 2, 3 m1,m2,m3 where C and B denote the 2-and 3-point correlation functions of a. The notational shorthand C −1 a is defined in Eq. (27). The above expression is that of a nested model: when B vanishes, we recover the mean zero Gaussian model. The extra terms denoted by 'cyclic' are given by the two cyclic permutations of the three ( , m, X) triplets. It is straightforward to incorporate harmonic modes sourced by a combination of primordial scalar and tensor perturbations in the above description. Consider the following decomposition: a X, m = a Since the noise n X, m is independent from the primordial fields and since all components have zero mean, the most general bispectrum then is expressed as: Inserting Eq. (D13) into Eq. (D11) produces a likelihood for a that takes into account the non-Gaussian correlation between the primordial scalar and tensor fields. The I and J indices run over the dimensions of the parameter vector space. Note thatf NL and f NL can be understood as either scalars or as vectors; in the latter case, the I −1 is the inverse of the d × d Fisher matrix instead of the scalar Fisher information. To identify Pr(a|C, B) in Eq. (D11) with Pr(a|f NL ), we treat the bispectrum as fixed up to a scaling f NL ∈ R d and consider the shape of the bispectrum and the covariance as fixed. More specifically, we assume: where the inner product is defined in the parameter vector space. This expression is a generalization of Eq. (25) that allows the bispectrum to consist of a sum of bispectra each with its own f NL parameter. To construct the estimator we now simply insert Eq. (D15) into the expression for the PDF in Eq. (D11) and insert the result into Eq. (D14). We may expand the logarithm in a power series and neglect all terms but the one that is O(B). This is a valid approach because the second term in the brackets in Eq. (D11) must be 1 in the weak non-Gaussian regime. This yields the estimator constructed by Ref. [78] (which is a refinement to the cubic expression originally introduced in Ref. [82]): − (C −1 ) X1X2 1m1 2m2 (C −1 a) X3 3m3 + cyclic . (D16) Note the use of I −1 0 ≡ I −1 (0) instead of I −1 (f NL ): strictly speaking, the inverse of the Fisher matrix will depend on the parameter vector. This reflects the fact that a true optimal estimator should vary between datasets based on the value of f NL . Of course, such optimality is not possible with the point estimator we use here: f NL is unknown. A true optimal weighting would be achieved with a Bayesian approach in which the likelihood of the data is calculated for each value of f NL . In reality, this re-weighting of the estimator is not important for values of f NL that are of interest [155]. For f NL = 0 the estimator is optimal by construction and the Fisher matrix has a simple analytic solution: The estimatorf NL is a function, or 'statistic', of the data a, so the statistical properties of the estimator may be derived from the likelihood of the data. Here we present a heuristic overview of the statistical properties, given different models for the data. It should be understood that an analytic approach like the one presented here is mainly useful to gain intuition; characterization of the estimator applied to a real dataset requires the use of simulations.
To derive the bias, covariance and higher-order moments of the estimate, we first define what we mean by the pth moment of the estimate: where: and wheref p NL denotes the pth power of the estimate. This notation is understood to generalize to the multivariate case as e.g.
It is then convenient to note that the expression for Pr(a|f NL ) in Eq. (D11) consists of two parts: a regular Gaussian PDF and a second part that consists of a Gaussian PDF times terms cubic and linear in a. This means that we can divide the integral in Eq. (D19) into a purely Gaussian integral ( . . . G ) and another Gaussian integral ( . . . G ) with an integrand that is multiplied with these cubic and linear terms. Since the estimator in Eq. (D16) is an odd function of a, the . . . G integral will always vanish for p = odd. On the other hand, the . . . G part will always vanish for a moment with p = even.
With this knowledge and the likelihood of the data in Eq. (D11), deriving the bias of the estimator comes down to evaluating Eq. (D19) for p = 1. This is an odd moment, so only the . . . G integral has to be evaluated. The result is that f NL = f NL , i.e. the estimate is unbiased regardless of the value of f NL . For the (co)variance of the estimate, i.e. Var(f NL ) ≡ f 2 NL − f NL 2 , we need to additionally evaluate Eq. (D19) for p = 2. Doing so, we find Var(f NL ) = I −1 0 − f 2 NL , which is only equal to the optimal value I −1 (f NL ) when f NL = 0. So we establish that in the limit of f NL → 0, the estimator is unbiased and optimal. In cases where f NL = 0, the estimator is still unbiased 24 but suffers from non-optimal (co)variance [116,117]. This is expected, as the situation does not conform to Eq. (D4) anymore. Finally, note that for f NL = 0, the estimate itself becomes (weakly) non-Gaussian. For instance, there will be a nonzero f 3 NL moment with an O(f NL B 4 1 C −6 I −3 0 ) amplitude. In the above we assumed that the likelihood for the data is described by Eq. (D11). When an additional 3-point function, not parameterized by an f NL parameter, is introduced in the likelihood, the estimator becomes biased. The exact bias depends on the shape of the added 3-point function, see the discussion in Sec. V.
An interesting situation arises when the data are drawn from a distribution with nonzero higher-order connected moments. This situation is not only hypothetical: lensing introduces a significant connected 4-point function, as well as smaller connected 6-, 8-, etc. point functions [129]. To describe the statistical properties of the estimator in the presence of lensing, we thus need to update the likelihood of the data in Eq. (D11) with these nonzero higher-order connected moments. Let us focus on the connected 4-point function, denoted by T . The Edgeworth expansion will now include O(T a 4 /C 4 ), O(T a 2 /C 3 ), and O(T /C 2 ) terms in addition to the O(1), O(B 1 a 3 /C 3 ), and O(B 1 a/C 2 ) terms already present in Eq. (D11). With these additions, the bias of the estimator does not change, but the variance of the estimator receives an O(B 2 1 T C −5 I −2 0 ) contribution. By extension, the addition of connected 6-, 8-or higherpoint functions to the likelihood will also contribute to the estimator variance. The estimate itself will also become non-Gaussian with these additions. For instance: there is an O(B 4 1 T C −8 I −4 0 ) connected 4-point function off NL when a connected 4-point function T is added to the likelihood. Computing a semi-analytic estimate of the additional estimator variance is highly challenging due to the number of elements that make up the higher-order connected moments. See Ref. [91,156] for details on an semi-analytic approach in the flat-sky approximation. We briefly discuss the expected additional lensing-induced estimator variance in Sec. V D.