${\rm S{\scriptsize IM}BIG}$: Cosmological Constraints from the Redshift-Space Galaxy Skew Spectra

Extracting the non-Gaussian information of the cosmic large-scale structure (LSS) is vital in unlocking the full potential of the rich datasets from the upcoming stage-IV galaxy surveys. Galaxy skew spectra serve as efficient beyond-two-point statistics, encapsulating essential bispectrum information with computational efficiency akin to power spectrum analysis. This paper presents the first cosmological constraints from analyzing the full set of redshift-space galaxy skew spectra of the data from the SDSS-III BOSS, accessing cosmological information down to nonlinear scales. Employing the ${\rm S{\scriptsize IM}BIG}$ forward modeling framework and simulation-based inference via normalizing flows, we analyze the CMASS-SGC sub-sample, which constitute approximately 10\% of the full BOSS data. Analyzing the scales up to $k_{\rm max}=0.5 \, {\rm Mpc}^{-1}h$, we find that the skew spectra improve the constraints on $\Omega_{\rm m}, \Omega_{\rm b}, h$, and $n_s$ by 34\%, 35\%, 18\%, 10\%, respectively, compared to constraints from previous ${\rm S{\scriptsize IM}BIG}$ power spectrum multipoles analysis, yielding $\Omega_{\rm m}=0.288^{+0.024}_{-0.034}$, $\Omega_{\rm b}= 0.043^{+0.005}_{-0.007}$, $h=0.759^{+0.104}_{-0.050}$, $n_{\rm s} = 0.918^{+0.041}_{-0.090}$ (at 68\% confidence limit). On the other hand, the constraints on $\sigma_8$ are weaker than from the power spectrum. Including the Big Bang Nucleosynthesis (BBN) prior on baryon density reduces the uncertainty on the Hubble parameter further, achieving $h=0.750^{+0.034}_{-0.032}$, which is a 38\% improvement over the constraint from the power spectrum with the same prior. Compared to the ${\rm S{\scriptsize IM}BIG}$ bispectrum (monopole) analysis, skew spectra offer comparable constraints on larger scales ($k_{\rm max}<0.3\, {\rm Mpc}^{-1}h$) for most parameters except for $\sigma_8$.


INTRODUCTION
The LSS holds valuable clues about the Universe's origin, composition, and evolution, making it a crucial arena for testing fundamental physics.In the coming years, the next-generation galaxy surveys including Dark Energy Spectroscopic Survey (DESI; Aghamousa et al. 2016), Euclid (Laureijs et al. 2011), Nancy Ro-man Space Telescope (Roman; Wang et al. 2022), SPHEREx (Bock & SPHEREx Science Team 2018), and Subaru Prime Focus Spectrograph (PFS; Tamura et al. 2016) will provide a detailed spectroscopic view of the three-dimensional distribution of galaxies across significantly larger cosmic volumes than current surveys.This expanded scope promises a marked reduction in statistical uncertainties in the measured clustering statistics of galaxies, yielding unparalleled constraints on the standard ΛCDM model and opening up exciting new avenues for exploring new physics.
Among various proposed summary statistics, the galaxy skew spectra (Schmittfull et al. 2015;Moradinezhad Dizgah et al. 2020;Dai et al. 2020;Schmittfull & Moradinezhad Dizgah 2021;Chakraborty et al. 2022) are optimal, yet computationally efficient proxy statistics for the galaxy bispectrum.They are constructed from the cross-correlations of the observed galaxy density field with appropriately weighted quadratic galaxy fields.For parameters such as galaxy biases, the growth rate of structure, and the amplitude of primordial non-Gaussianity (PNG), by construction, the skew spectra fully capture the information of the bispectrum in the linear regime (Moradinezhad Dizgah et al. 2020).Despite these encouraging results from Fisher forecasts (Moradinezhad Dizgah et al. 2020;Hou et al. 2023), the applicability of the full redshift space galaxy skew spectra to observational data, in particular down to nonlinear scales, has not been studied.This investigation is the focus of the current article.
The standard approach to cosmological inference from the LSS is based on a Bayesian likelihood analysis and requires: (1) a theoretical model of the observable, (2) an explicit form of the likelihood, and (3) schemes for correcting the observational systematics in the measurements.These requirements can hinder the maximal exploitation of upcoming LSS data.
First, if using perturbation theory (PT) to describe the observables, the limited validity of the perturbative description restricts the theory-based analyses to relatively large scales (k max ≃ 0.25 Mpc −1 h). 1 Despite the considerable recent developments in perturbative description of the LSS (McDonald 2006;Baumann et al. 2012;Carrasco et al. 2012;Senatore 2015;Senatore & Zaldarriaga 2015, 2014;Vlah et al. 2015;Blas et al. 2016;Ivanov & Sibiryakov 2018;Cabass et al. 2023), the PTbased analyses are intrinsically limited to the weakly nonlinear regime.Moreover, apart from the galaxy bispectrum, for some of the powerful recently proposed statistics such as wavelet scattering transforms, a perturbative description is missing altogether.
The second limitation of the standard analysis is that the assumption of a Gaussian likelihood, which relies on the central limit theorem, does not hold on large scales with low signal-to-noise ratio and on small scales where modes are highly correlated.Thus an explicit assumption of the Gaussian likelihood, can potentially bias the parameter constraints (Sellentin & Heavens 2018;Hahn et al. 2019).
Lastly, the standard analysis accounts for observational systematics such as targeting, imaging, and spectroscopic incompleteness by applying weights designed to correct their effects on measured observables (Ross et al. 2012(Ross et al. , 2017;;Kitanidis et al. 2020).However, even at the precision level of current surveys, the weights are insufficient at correcting for effects such as fiber collisions, which can bias the clustering measurements on weakly non-linear scales (Guo et al. 2012;Hahn et al. 2017a;Bianchi et al. 2018).Moreover, it is important to note that these correction techniques are currently optimized only for two-point statistics and their validity for other summary statistics has not been thoroughly tested.
To overcome these challenges, simulation-based inference (SBI; Cranmer et al. 2020;Lueckmann et al. 2021) is emerging as a promising avenue.2Without a strong assumption for the form of the likelihood, SBI enables estimation of the parameter posteriors (or the likelihood function) by leveraging high-fidelity numerical simulations to forward model the observables and employing deep generative models from machine learning for efficient parameter inference.In the context of galaxy clustering, several recent studies have explored the potential of SBI, employing either a set of summary statistics or at the field-level (e.g., de Santi et al. 2023a,b;Tucci & Schmidt 2023;Modi et al. 2023).In terms of application of the SBI to observational data, Hahn et al. (2022Hahn et al. ( , 2023a) ) introduced Simulation-Based Inference of Galaxies (SimBIG), a forward-modeling framework that enables performing SBI on a spectroscopic galaxy sample, employing either summary statistics or at the field-level.The SimBIG forward model capitalizes on high-fidelity N -body simulations, coupled with the state-of-the-art Halo Occupation Distribution (HOD) model to accurately simulate nonlinear galaxy clustering.Notably, SimBIG integrates pertinent observational effects such as survey masks and fiber collisions.In contrast to the common approach of accounting for the fiber collisions via correction weights, SimBIG forward models their effect.In a series of recent papers, SimBIG has been used to obtain cosmological constraints from SDSS-III BOSS CMASS galaxies.By employing several statistics (including the power spectrum Hahn et al. (2022Hahn et al. ( , 2023a)), the bispectrum Hahn et al. (2023), and the wavelet scattering transforms Régaldo-Saint Blancard et al. (2023)), and by performing field-level analysis using convolutions neural networks (Lemos et al. 2023), these works have highlighted the substantial potential of SBI in obtaining constraints that surpass the standard analysis of galaxy clustering (Hahn et al. 2023b).This is the first paper using the SimBIG framework to obtain cosmological constraints from analyzing the full set of redshift-space galaxy skew spectra, accessing information down to nonlinear scales.As part of this analysis, we also present a new estimator for the skew spectra that extends the previously developed estimators for periodic-box simulations (Schmittfull & Moradinezhad Dizgah 2021;Hou et al. 2023) to account for survey mask and angular systematic weights so that it can be used on observations.The structure of the paper is as follows: §2 provides an overview of the galaxy skew spectra and the new estimator for measuring the skew spectra on a realistic galaxy sample.In §3, we briefly describe the observational data used in our analysis, and in §4 we describe the SimBIG pipeline (the forward modelling, the simulation-based inference, and the validation tests).We show the results in §5, including the validation tests on synthetic data and the inferred posterior distribution of cosmological parameters from the BOSS-CMASS sample.Finally, we draw our conclusions in §6.
We provide additional details and complementary results in a series of appendices.In Appendix A, we provide explicit forms of the skew spectra, in Appendix C, we discuss the constraints on HOD parameters, and in Appendix E we investigate the dependence of the inferred constraints on several analysis choices including the outlier removal scheme, the removal of shot noise, the scale cuts, and the smoothing scale.

SUMMARY STATISTICS: GALAXY SKEW SPECTRA
The galaxy skew spectra are efficient proxy statistics for the bispectrum and offer several advantages over the bispectrum; first, they are simple to interpret since they are functions of a single wavenumber and not triangle shapes.Second, the computational cost of measuring them from the data is comparable to that of the power spectrum.While accounting for all bispectrum triangles requires O(N 2 ) operations capturing the full information of the bispectrum using the skew spectra requires O(N log N ) operations, where N = (k max /∆k) 3 is the number of 3D Fourier-space grid points at which the fields are evaluated.Third, for standard Bayesian inference, if estimating the covariance matrices from mocks, the skew spectra requires a significantly smaller number of mocks given the significantly smaller size of the data vector.Lastly, in comparison to the bispectrum Pardede et al. (2022), accounting for the survey window function is expected to be considerably simpler for the skew spectra (Hou & Moradinezhad Dizgah 2024).

Theoretical Construction
The skew spectra are constructed by cross-correlating the observed galaxy density field, δ g , with an appropriately weighted square of it, S n (Schmittfull et al. 2015;Schmittfull & Moradinezhad Dizgah 2021); The quadratic field, S n , is a product of two fields in configuration space (evaluated at the same spacial positions), thus, a convolution in Fourier space; with q ≡ d3 q/(2π) 3 .The fields involved in the convolution are smoothed, δ R g (k) = δ g (k)W R (k), to limit the contribution of small-scale fluctuations to skew spectra.While a top-hat filter in Fourier space amounts to imposing a sharp k-cut on contribution of small-scale modes to the convolution integral, to avoid edge effects in the measurements of the skew spectra, we use a Gaussian smoothing kernel W R (k) = exp(−k 2 R 2 /2).In perturbative analysis of the skew spectra (Schmittfull et al. 2015;Moradinezhad Dizgah et al. 2020;Schmittfull & Moradinezhad Dizgah 2021), smoothing the field (with relatively large R) ensures that only the Fourier modes on the semi-linear regime are included.For SBI performed in this paper, the smoothing serves the purpose of limiting the analysis to scales where the forward model is most trusted. 3 The weights, D n (q, k − q), are constructed such that for a theoretical bispectrum that can be decomposed as a sum of product-separable contributions (in terms of the three Fourier modes forming a triangle), the skew spectra correspond to the maximum likelihood estimators for parameters appearing as overall amplitudes of individual contributions in the sum.Explicit forms of the kernels D n (q, k − q) are given in Appendix A. This implies that, by construction, considering full set of 14 galaxy skew spectra in redshift space (3 in real space) fully capture the information of the bispectrum on large scales for parameters such as galaxy biases, the amplitude of primordial power spectrum and bispectrum, and the growth rate of structure (Schmittfull et al. 2015;Moradinezhad Dizgah et al. 2020). 4For other cosmological parameters (including ΛCDM parameters and the sum of neutrino masses), by performing numerical Fisher forecasts on Quijote simulations (Villaescusa-Navarro et al. 2020), Hou et al. (2023) demonstrated that the skew spectra can considerably improve the constraints from the power spectrum, at the level comparable to the bispectrum.
In addition to the clustering component in Eq. ( 1), each of the skew spectra receives a shot noise contribution, which in the Poisson limit is given by where P g (k) is the full 3D galaxy power spectrum and the kernels J Dn and JDn are given by

Estimators for Survey Data
Previous works have focused on the analysis of the skew spectra on simulated dark matter distribution and halo catalogs in N -body simulations with periodic boundary conditions (Schmittfull & Moradinezhad Dizgah 2021;Hou et al. 2023).In order to analyze the data from BOSS survey, we extend the previous estimators to measure the skew spectra using on Fast Fourier Transforms (FFT) (Schmittfull & Moradinezhad Dizgah 2021;Hou et al. 2023) to incorporate the survey geometry and observational systematics.We thus construct an equivalent of Feldman-Kaiser-Peacock (FKP;Feldman et al. 1994a) estimators for the power spectrum and bispectrum, where the galaxy sample is characterized by n g galaxies at positions x and a randoms catalog with n r objects to incorporate the radial and angular selection functions, Here, n ′ g (x) = w c (x)n g (x) and n ′ r (x) = w c,r (x)n r (x) are the weighted number densities of the observed and random galaxies, and α (≪ 1) is their ratio.As in previous SimBIG analyses, for each observed galaxy, we assign the completeness weight of w c = w sys w noz , where w sys is an angular systematic weight based on stellar density and seeing conditions, and w noz is a redshift failure weight.Since for the fiber collisions, the standard weighting scheme has been shown to be inaccurate (Hahn et al. 2017b), we forward model them in our pipeline instead of using weights as is commonly done in other analyses of BOSS data (for further details, see Hahn et al. 2023a).We have also included the FKP weight w fkp (x) = [1 + n(g)P 0 ] −1 (assuming ) to reduce the variance of the estimator (Feldman et al. 1994b). 5The normalization factor of I 33 in Eq. ( 6) is defined in analogy to that of the bispectrum (Scoccimarro 2015) since both skew spectra and bispectrum have three powers of galaxy density fields, where ng is the mean number density per redshift bin for the observed galaxy galaxies.As a simplifying assumption, we consider a fixed lineof-sight (LoS) direction when constructing the quadratic fields.While in computing the cross-correlation between the linear and the quadratic field the varying LoS is accounted for as in the Yamamoto et al. (2006) estimator.The skew spectra correspond to the monopole of the cross-correlation between the input galaxy field and its square.
For implementation, we used the nbodykit (Hand et al. 2018) 6 .We apply the weights D n (q, k − q) (see Eq. 1) to the first two linear density fields to obtain the quadratic field.We then prepare both the quadratic field and the third linear density field as FKPCatalog objects, an nbodykit-based interface for simultaneous modeling of a data and a random catalog.These two FKPCatalog objects are then passed to the Con-volvedFFTPower module for cross-correlating the quadratic field and the linear field.
In analogy to the power spectrum and the bispectrum FKP estimators, we subtract the Poisson shot noise from the measured skew spectra.The powerspectrum-independent contribution to shot noise in Eq.
(3) is estimated by inserting two unity fields, whereas the kernels that depend on the power spectrum are computed by correlating a smoothed 3D power spectrum P R g (q) = W R (q)P g (q) and a unity field.Due to the survey geometry, the full 3D power spectrum is non-Hermitian, whereas the unity fields are real-valued Hermitian fields and the third dimension of the field is halved after the FFT.In practice, we concatenate two hermitian unity fields to match the dimensionality.
As in the previous SimBIG analysis of the galaxy power spectrum and bispectrum, in addition to the skew spectra, we also include the average galaxy number density, ng .Therefore, in total, the data vector that we pass to the SBI pipeline has 1401 elements, with k min = 0.003Mpc −1 h that corresponds to the fundamental frequency of the BOSS SGC, and ∆k = 0.005 Mpc −1 h. 7 We explore the impact of using a different k min and its implications on our analysis.

BOSS CMASS GALAXY SAMPLE
As in previous SimBIG analyses (Hahn et al. 2022(Hahn et al. , 2023a;;Hahn et al. 2023;Régaldo-Saint Blancard et al. 2023;Lemos et al. 2023;Hahn et al. 2023b), we consider the CMASS galaxy sample (consisting mainly of luminous red galaxies) of BOSS DR12 (Eisenstein et al. 2011;Dawson et al. 2013).Since the SimBIG forward model employs Quijote N -body simulations with the box size of L box = 1 Gpc −1 h to model the nonlinear clustering of dark matter, we only use the data in the southern galactic cap (SGC) and further apply angular cuts of DEC > −6 • and −25 • < RA < 28 • and a radial cut corresponding to redshift range of 0.45 < z < 0.6.The resulting galaxy sample spans ∼ 10% of the full BOSS data and about 50% of the CMASS-SGC sample,with 109,636 galaxies and a number density of ∼ 4.2 × 10 −4 [Mpc −1 h]3 .

SimBIG PIPELINE
SimBIG is a framework for performing SBI to infer cosmological constraints from galaxy clustering data, leveraging high-fidelity simulations to forward model the observations and deep generative models to perform inference by estimating the posterior of model parameters.Fig. 1 shows a schematic sketch of the pipeline.In this section, we describe in more detail the different components of the pipeline, including the forward model, the SBI, and the validation tests.

Forward Model
The current implementation of the SimBIG forward model is specifically tailored to produce galaxy catalogs that match the statistical properties of the BOSS-CMASS sample.This involves four integral components: nonlinear dark matter distributions from N -body simulations, a halo finder to construct halo catalogs, an HOD model to populate halos with galaxies, and survey realism (including fiber collisions and survey geometry).

2020).
The simulations are run with Gadget-III TreePM+SPH code (Springel 2005), evolving 1024 3 dark matter particles in a periodic box of L box = 1 h −1 Gpc on the side.The particles are evolved from redshift z = 127 to z = 0, with the initial condition generated from the second-order Lagrangian Perturbation Theory (2LPT).Then, we construct halo catalogs using the Rockstar halo finder (Behroozi et al. 2012), which identifies halos using phase space information of the dark matter particles.
Figure 1.Schematic view of the SimBIG pipeline; to generate synthetic data that is statistically indistinguishable from BOSS observations, we utilize 2,000 N -body simulations from the Quijote suite run at different cosmologies, apply Rockstar halo finder to identify halos, populate halos with galaxies using an extended (9-parameter) HOD model, and finally apply the survey realism.The resulting 20,000 synthetic mocks are then used to train normalizing flows as neural density estimators to obtain approximate posterior distributions, q ϕ , for cosmological and HOD parameters from given summary statistics x, which in this paper refer to the skew spectra).
To populate halos with galaxies, SimBIG employs the HOD framework (Berlind & Weinberg 2002).We consider a state-of-the-art HOD model (Zheng et al. 2007) that includes the standard parameters for characteristic mass scale for halos to host a central galaxy, log M min , the scatter of halo mass at fixed galaxy luminosity, σ logM , the minimum halo mass for halos to host a satellite galaxy, log M 0 , the characteristic mass scale for halos to host a satellite galaxy, log M 1 , and power-law index for the mass dependence of satellite occupation α.In addition, it also incorporates assembly bias, A bias , concentration bias of satellite, η conc , velocity biases for central, η cen , and satellite, η sat , galaxies.For each halo catalog, we generate 10 galaxy catalog with different values of the HOD parameters.In total, our training dataset consists of 20,000 galaxy mock catalogs.
The final step is to apply the survey realism.This includes imposing the angular footprint of the survey and the veto masks (including masking for bright stars, centerpost, bad fields, and collision priority).We also model the fiber collision effect for galaxy pairs within an angular resolution < 62 ′′ .8In the radial direction, we apply the redshift cut 0.45 < z < 0.6.
In total, the forward model involves 5 cosmological parameters and 9 HOD parameters.We chose uniform prior on all parameters except for the assembly bias, where a Gaussian prior centered at zero was implemented.The exact prior range can be found in Table 1 in Hahn et al. (2023a).The wide range of the HOD pa-rameter priors results in a large variation in the galaxy number density and clustering amplitude.
While the SimBIG pipeline offers a random catalog that is 40 times the size of the real BOSS data sample, to avoid noise induced by the random catalog, we generated an additional random catalog that is 35 times the size of the training set with the highest total galaxy counts.We use the make survey toolkit9 to trim the angular footprint of the random.
In order to improve the training and reduce the dynamic range of the data vector, we preprocess the skew spectra of the training catalogs, which exhibit substantial variation in mean number density and clustering amplitude.We remove outliers and normalize the skew spectra in order to ensure that all features contribute meaningfully (Appendix E).

Simulation-based Inference
The 20,000 forward modeled galaxy mocks are formally samples drawn from the joint probability distribution p(Θ, x) of the model parameters, Θ, and a summary statistic, x, measured from observational data.Using this training dataset, we estimate the full posterior distribution over the parameters conditional on the observations, p(Θ|x).In SimBIG, the SBI is performed using neural density estimation with normalizing flows (NFs) (Tabak & Vanden-Eijnden 2010;Tabak & Turner 2013;Jimenez Rezende & Mohamed 2015), which enable efficient estimation of the posteriors with a limited number of simulations.
NFs are powerful tools for constructing expressive probability distributions describing the data using a simple base probability distribution, π(z), and a chain of trainable smooth bijective transformations (diffeomorphisms), f , to map the base distribution to the target one.The base distribution is often chosen to be a multivariate Gaussian, which is easy to sample from.The transformations f are chosen to have a tractable Jacobian so that the target distribution can be computed from the base distribution via a change of variables.A neural network is trained to obtain the flow (find the transformation f ) that best approximate the posterior, q ϕ (Θ, x) ≈ p(Θ, x), by minimizing the Kullback-Leibler (KL) divergence between p(Θ, x) = p(Θ|x)p(x) and q ϕ (Θ|x)p(x).In practice, this is achieved by maximizing the log-likelihood over the training set.
For the analysis of the skew spectra,10 we use a Masked Autoregressive Flow (MAF; Papamakarios et al. 2017) architecture implemented in the sbi Python package (Tejero-Cantero et al. 2020).11MAF uses Masked Autoencoder for Distribution Estimation (MADE; Germain et al. 2015) as a building block and by stacking several of them combines normalizing flows with an autoregressive design, which is well-suited for estimating conditional probabilities such as the posteriors.We split the data into training and validation sets with a 90/10 split, and use only the 90% for training, leaving the rest for validation test presented in §5.2.1.For the split, we randomly select from 20,000 simulations, irrespective of their cosmologies and HOD parameters.
We perform the optimization using ADAM (Kingma & Ba 2017) and tune the hyperparameters (number of blocks and hidden layers, dropout probability, learning rate, batch size, and number of transformations for the normalizing flows) via experimentation using the Optuna hyperparameter optimization framework Akiba et al. (2019) to obtain the best validation score.The details of the configurations for the Optuna search are provided in table 1.We train 2,771 models for our baseline configuration with R = 5 h −1 Mpc and k max = 0.5 Mpc −1 h.For each additional configurations that we use to study the dependence of our results on various analysis choices, we train 2,000-3,000 models.To ensure stability and robustness of our results against model variations (Lakshminarayanan et al. 2017;Alsing et al. 2019), we ensemble average (with equal weights) the top 10 models in each configuration, q ϕ (Θ|x) = 30 i=1 q i ϕ (Θ|x)/10.Overall, we do not observe a large variability in the top selected models and they provide consistent results.

SimBIG Mock Challenge
As in the previous SimBIG analyses, we perform a mock challenge using additional 2,000 simulated galaxy catalogs to validate the posterior estimates, q ϕ .The test simulations are organized in three datasets as introduced in Hahn et al. (2023a).The first two test sets (TEST0, TEST1) are constructed from the Quijote N -body suite, while the third set (TEST2) uses AbacusSummit Nbody simulations (Maksimova et al. 2021;Garrison et al. 2021).Below we list a few important differences between the three test simulations; • TEST0: 500 galaxy mocks generated with the same SimBIG forward model as the training data, but using a set of 100 Quijote simulations at a fixed fiducial cosmology,12 and the HOD model with parameters sampled from a narrower distribution compared to the prior distribution.There are 5 galaxy catalogs per cosmology with 9-parameter HOD models.
• TEST1: 500 galaxy mocks built from the same Quijote simulation as in TEST0 but applying a different halo finder and a simpler HOD model.Halos are identified using friend-of-friend (FoF; Davis et al. 1985) halo finder 5 galaxy catalogs are constructed per cosmology with 5-parameter HOD models.13 • TEST2: 1,000 mocks built from the AbacusSummit base simulations.The simulations evolve 6, 912 3 particles in a periodic box of L box = 2 h −1 Gpc.We use 25 realizations of base simulations, which have the same cosmology and differ in their phase.14Each simulation is divided into 8 subvolumes to obtain effectively 200 independent boxes.Halos are identified using CompaSO halo finder (Hadzhiyska et al. 2022).For each box, 5 galaxy catalogs are constructed using the standard 5-parameter HOD models.

RESULTS
In this section, we begin by presenting the measured skew spectra of the CMASS-SGC galaxy sample.Subsequently, we detail the conducted validation tests and finally present the first cosmological constraints from the analysis of the full set of galaxy skew spectra on an observational dataset including nonlinear scales.We compare the cosmological and HOD constraints with the previous SimBIG power spectrum (monopole and quadruple) and bispectrum (monopole) results, and describe the impact of a number of analysis choices on the inferred constraints.

Measured Skew Spectra on BOSS-CMASS-SGC Sample
Fig. 2 shows the measured skew spectra (including the clustering and shot noise components), applying smoothing scales of R = 5 h −1 Mpc in red, and R = 10 h −1 Mpc in black.Fig. 3 shows the Poisson shotnoise contribution, consisting of three contributions in Eq. (3), which have not been previously measured.
The shapes of the measured spectra are consistent with those on synthetic halo and galaxy catalogs used in Hou et al. (2023).As identified before, the skew spectra shapes fall into three categories: (i) "constant type" involving square of galaxy density field δ 2 , and characterized by a large-scale peak and a sharp decline towards smaller scales.(ii) "displacement type" involving the operator ∂ i ∂ j ∇ 2 , and distinguished by a positive bump and a zero-crossing feature on large scales for LoS-dependent contributions.(iii) "tidal type", involving the tidal operator S 2 , and marked by a negative dip and anti-correlation with other skew spectra on smaller scales.The skew spectra P Snδ with n > 3 optimally capture the LoS dependence of the bispectrum (see Appendix A for specific forms of the kernels S n ).
On small scales, the skew spectra all approach zero due to smoothing, as shown in Fig. 2. Comparing Fig. 2 and Fig. 3 we also notice that the clustering component is smaller than the Poisson shot noise around k = 0.4 h −1 Mpc, coincide with the averaged inter-particle separation ∆s ∼ 15 h −1 Mpc for the BOSS galaxy sample.

Posterior Validation
Before applying the estimated posterior, q ϕ , on the skew spectra of the observed CMASS galaxies, we perform two validation tests to asses whether q ϕ can robustly infer unbiased posteriors of the cosmological parameters.We describe these tests in this section.

Accuracy Test: Simulation-Based Calibration
Our first test aims to assess the accuracy of the approximate posterior, q ϕ .Although, theoretically, a sufficiently large number of simulations and optimally trained normalizing flows would inherently guarantee this accuracy by design (as we minimized the KL divergence between q ϕ and the true posterior), SimBIG employs a relatively small number of simulations for the size of the data vector and dimensionality of the parameter space.Hence, assessing the accuracy of the estimated posterior is pivotal.To conduct this evaluation, we employ simulation-based calibration (SBC; Talts et al. 2020) on the validation dataset that was excluded from the training of our posterior estimate (see §4.2).
For each test simulation with skew spectra, x test,i , we draw N sample = 10, 000 samples from q ϕ (Θ | x test,i ).We then, calculate the fraction of the parameters that are below the true cosmological parameter of the simulation: 4 shows the distribution of the rank statistics for k max = 0.5 Mpc −1 h and R = 5 h −1 Mpc.In the case of independent samples from the true posterior, we should expect uniformly distributed rank statistics.We see that the rank statistics for all parameters are nearly flat.For Ω m , we see a slight ∩-shape distributed, which is an indication that the estimated posterior is broader than the true posteriors (i.e.underconfident).Therefore, our constraint should be considered conservative.In Appendix E we also show the validation tests for a larger smoothing scale of R = 10 h −1 Mpc and a smaller cutoff of k max = 0.25 Mpc −1 h.

Robustness Test: Mock Challenge
Next, we perform the SimBIG "mock challenge" using the test simulations TEST0, TEST1, and TEST2 described in §4.3.Among the three test simulations, TEST0 is the closest to the training dataset as they share the same N -body simulation, halo finder, and HOD model parameters.In contrast, TEST1 used a different halo finder (FoF, which does not resolve substructure as well as the Rockstar).It includes only standard HOD parameters and assumes that only the halo mass governs the galaxy-halo connection.Lastly, TEST2 is the most different one from the training dataset; It is built from a different N -body simulation (Abacus) with halos identified with a different halo finder (Compaso) and populated with galaxies using a standard 5-parameter HOD model.
To test the robustness of our trained model to changes in the forward model, we compute the mean and the standard deviation of the parameter posteriors for each test simulation and plot the distribution of the difference between the posterior mean ( θi ) and the fiducial value, normalized by the standard deviation σ θi for each of the parameters and show the three test sets in Fig. 5.If the analysis is robust to variation of the forward model, we expect to obtain a statistical consistency for the distributions across the three test sets, while the posterior mean is not expected to be an unbiased estimate of the true parameter.Given the good consistency for the five cosmological parameters for the three test sets, we conclude that our pipeline is robust and can be applied to observational data.

Inferred Posterior on BOSS CMASS-SGC Data
Having validated the trained models on various test simulations, we now move on to applying q ϕ to ob- servational data.To illustrate the importance of extracting non-Gaussian information of the LSS beyond what is encoded on the power spectrum, we first compare the cosmological constraints from the skew spectra with those from previous SimBIG analysis of the power spectrum multipoles (Hahn et al. 2022).Next, we compare the results from the skew spectra with those from previous SimBIG analysis of the bispectrum monopole (Hahn et al. 2023) to compare their information content.Lastly, we discuss the dependence of the inferred constraints on several of the analysis choices including the smoothing scale, the minimum and maximum scale cuts, the shot noise subtraction.Table 1 of Appendix C presents the 1σ uncertainties on cosmological parameters for each of these studies.

Comparison with Power Spectrum Multipoles
For this comparison, we select an optimal scale in both the maximum wavenumber k max = 0.5 Mpc −1 h and a smoothing scale of R = 5 h −1 Mpc.This scale choice should, in principle, allow us to extract information from the galaxy field maximally.We notice that, the k max cut for the skew spectra and the power spectrum (or the bispectrum) is not strictly speaking equivalent.As we will allude below, on the one hand, smoothing of the observed field washes out small-scale information and on the other hand, the convolution of two density fields, mixes small-and large-scale information.
Fig. 6 shows the inferred posteriors on cosmological parameters from the skew spectra (blue) and the power spectrum multipoles (grey).The full posterior distributions including the HOD parameters are shown in Fig. 8 of Appendix C. Without applying the BBN prior, we find the posterior mean and the 68% CF to be Ω m = 0.288 +0.024 −0.034 , Ω b = 0.043 +0.005 −0.007 , h = 0.759 +0.104 −0.050 , n s = 0.918 +0.041 −0.090 , and σ 8 = 0.778 +0.066 −0.093 .Compared to the power spectrum, the skew spectra improve the constraints in terms of the 68% CL by 34%, 35%, and 18% in matter density Ω m , baryon density Ω b , and h, respectively.However, the constraint in σ 8 are weaker than that from the power spectrum.As mentioned earlier, the larger uncertainty on σ 8 can be associated with two facts.First, smoothing the observed field in the skew spectra washes out some information on small scales, which plays a role in constraints on σ 8 .Second, the convolutional nature of skew spectra kernels causes a redistribution of information across various scales.Consequently, the influence of σ 8 is not exclusively related to small scales.Imposing the BBN prior on baryon density using importance sampling with ω b = Ω b /h 2 = 0.02268 ± 0.00038, the skew spectra improve the constraints on matter density Ω m , baryon density Ω b , and h, by 34%, 49%, and 38%, respectively.Despite the reported improvements over the SimBIG power spectrum, it is crucial to recognize two points; first, the inferred posteriors from the power spectrum on most parameters are underconfident, thus, the reported relative improvement by the skew spectra can potentially change with a better-trained model for the SimBIG power spectrum.Second, the skew spectra utilized in this study are not fully optimized for constraining cosmological parameters.Instead, they represent maximum-likelihood estimators of amplitude-like parameters.Consequently, the reported constraints could potentially be enhanced by strategically weighting the quadratic field using either the Fisher information matrix (Heavens et al. 2000) or the score function (Alsing & Wandelt 2018).The anticipated improvement arises from up-weighting the Fourier modes that contribute the most to the constraints on a given cosmological parameter.We defer further investigation into this possibility to future work.

Comparison with Bispectrum Monopole
Given that the construction of skew spectra inevitably mixes scales and washes out the information below the smoothing scale, we compare the skew spectra and the bispectrum using a more conservative choice of scales: applying a smoothing scale of R = 10 h −1 Mpc and a lower cutoff scale of k max = 0.25 Mpc −1 h.The comparison of the constraints from the three statistics at higher k max is presented in Appendix E. Fig. 7 shows the constraints from the skew spectra (blue), the power spectrum multipoles (grey) (Hahn et al. 2022), and the bispectrum monopole (ma- genta) (Hahn et al. 2023).Here the BBN prior is imposed for all analyses.The constraints from the bispectrum monopole are obtained using k max = 0.3 Mpc −1 h, while those from the power spectrum assume k max = 0.25 Mpc −1 h.The three summary statistics provide consistent cosmological constraints.The width of the posteriors from P Sn and B 0 , except for σ 8 , are nearly identical.Again, the reduced constraining ability of the skew spectra on σ 8 can be attributed to its inherent smoothing and convolutional structure, as previously discussed.Without BBN prior, the constraints from the bispectrum monopole on most of the parameters are tighter.We note that since for R = 10 h −1 Mpc the inferred posteriors of all parameters from skew spectra show under-confidence (see Fig. 9), the reported comparison with the bispectrum monopole may be altered with future improvements of SimBIG framework.
Although the above configuration offers a more meaningful comparison between the skew spectra and the bispectrum, we also compare the results of the two statistics for a smaller value of the smoothing scale of R = 5 h −1 Mpc and considering k max = {0.25,0.5} Mpc −1 h.With the reduced smoothing scale, the constraints by the skew spectra is weaker when contrasted with the bispectrum.Additionally, we observed a ∼1σ discrepancy in Ω b between the skew spectra and the bispectrum monopole under these scale cuts.A similar discrepancy in h emerges when applying the BBN prior.The fact that skew spectra and bispectrum results align only with conservative scale cuts suggests model misspecification may be affecting each statistic differently.This could be due to the skew spectra's sensitivity to redshift-space distortions, which are not entirely accounted for in the bispectrum monopole.In particular, the HOD model's inaccuracies in characterizing satellite galaxy kinematics might affect the results from the skew spectra.A thorough understanding of this model misspecification requires more detailed future research.

Constraints on HOD parameters
In addition to the cosmological parameters, we also show the full posterior distribution including the HOD parameters with BBN priors, in Figs. 8 of Appendix C, respectively.Our analysis reveals no significant enhancement in HOD constraints using the skew spectra compared to the power spectrum multipoles and bispectrum monopole.We found that the posterior distribution of σ log M touches the prior range from the right, indicating that we need to extend the HOD simulations to a wider prior. 15However, this expansion is unlikely to substantially affect the posterior of other parameters, considering the weak degeneracies between σ log M and other parameters.Additionally, the degeneracy directions in HOD parameters derived from the skew spectra are similar to those from the power spectrum multipoles, as predicted by Hou et al. (2023).This similarity is likely attributable to both estimators accessing redshift-space information.It is worth noting that among cosmological parameters, σ 8 is the one most degenerate with HOD parameters.Given the limited constraints on HOD parameters from skew spectra, this degeneracy can be partially responsible for the rather weak constraints on σ 8 .More discussion in the implication of posterior distributions of HODs can be found in Appendix C.

Assessment of Alternative Analysis Configurations
To gain further insight into how alternative analysis choices impact the inferred posteriors, we trained additional sets of normalizing flow models for five additional configurations.For each case, we trained 2,000 -3,000 models and considered the ensemble average of the top 10 models with lowest validation loss.All analyses are performed setting the smoothing scale of R = 5 h −1 Mpc and including scales up to k max = 0.5 Mpc −1 h.We summarize our observations here and refer the interested reader to Appendix E for further details and visualizations of the results.We also present the 1σ constraints for these analyses in Table 2 • Removal of the outliers in the training data: we remove training galaxy catalogs with extreme number densities and skew spectra amplitudes before conducting SBI.Removing these outliers changes the constraining power of the inferred posteriors by < 15%.This consistency holds both with and without the BBN prior (see Appendix B for further details on different outlier removal schemes).
• Subtraction of the Poisson shot noise: in the standard analysis based on an explicit form of the likelihood, the Poisson shot noise is subtracted in the estimator of a given summary statistics to reduce the covariance.With our SBI approach, the inferred posteriors with or without shot noise subtraction in the skew spectra estimator yields similar constraints (see Fig. 11).This suggests that the SBI method effectively distinguishes the shot noise and clustering components of the skew spectra.Further validation of this interpretation for galaxy samples of varying number densities is left for a future work.
• Choice of the large-scale cutoff, k min : given that the systematic weights for BOSS data are not welltested on large scales, we investigate the impact of excluding the scales close to the survey's fundamental modes k f from the analysis.We find that setting k min = 0.01 [Mpc −1 h], as done in the standard BOSS analyses (e.g.Beutler et al. (2017)), has only a percent level impact on the overall constraints on ΛCDM parameters.This is somewhat expected since imposing this cutoff only excludes a single k-bin given our binning scheme.
• Choice of the small-scale cutoff, k max : to quantify the dependence of the results on the small-scale cutoff, we experimented with The most significant degradation, especially in Ω m (30%), was observed with a cutoff k < 0.15 Mpc −1 h.
Even with such an aggressive cutoff, the overall constraints remained comparable to those from the power spectrum with k max = 0.5 Mpc −1 h.This conclusion holds with or without BBN priors (see Fig. 11).
• Choice of smoothing scale: we compared the results with the posterior inferred from skew spectra with smoothing scale R = 10 h −1 Mpc.Larger smoothing scales leads to expected degradation in constraints, most notably a 40% decrease in Ω b precision.Other parameters were less affected (see Fig. 12).

DISCUSSIONS AND CONCLUSIONS
This paper presents the first cosmological constraints from analysing the full set of galaxy skew spectra obtained from the BOSS DR12 CMASS-SGC dataset.This work is part of a series of articles by the Sim-BIG collaboration, in which several summary statistics of CMASS galaxy sample were analyzed (Hahn et al. 2023b), including the power spectrum (Hahn et al. 2023a(Hahn et al. , 2022)), the bispectrum (Hahn et al. 2023), the wavelet scattering transform (Régaldo-Saint Blancard et al. 2023), and the field-level analysis using the CNN (Lemos et al. 2023).
In order to apply the skew spectra to observational data from spectroscopic galaxy surveys, we developed a new FFT-based estimators for the skew spectra, akin to power spectrum and bispectrum FKP estimators.This estimator integrates essential components such as the survey mask, systematic weights, and a subtraction of the Poisson shot noise.
We use the SimBIG framework to perform a simulation-based inference using normalizing flows for neural density estimation to obtain cosmological constraints.The SimBIG pipeline incorporates accurate forward modeling of the observed galaxy distribution for the BOSS CMASS-SGC sample, enabling SBI to infer posteriors for cosmological and HOD parameters from summary statistics of choice.Before applying the Sim-BIG framework to CMASS data, as in previous SimBIG analyses, we perform two validation tests to ensure that the inferred posteriors are robust and unbiased.Our validation tests, which include simulation-based calibration and the SimBIG mock challenge, confirms that the inferred posterior is robust and unbiased for both parameters.
Setting k max = 0.5 Mpc −1 h and R = 5 h −1 Mpc, and without introducing any informative priors, the skew spectra enhance constraints on Ω m , Ω b , and h by 34%, 35%, and 18%, respectively, over constraints from the power spectrum multipoles.Imposing the BBN prior, we improve the constraint on h by a factor of 2.3, enabling us to constrain h at 3% level with only 10% of the full BOSS survey.These relative improvement can potentially be modified given the current underconfident posterior distribution inferred from the SimBIG power spectrum.The inferred value of n s tends to be lower than that of Planck2018 (?), while the mean of the posterior of h tends to be higher.We attribute these tendencies to either statistical fluctuation or potential model misspecification on small scales.A more detailed investigation of this matter is reserved for future work.
The skew spectra did not yield a reduction in uncertainties on σ 8 and HOD parameters compared to the power spectrum.This lack of improvement can be attributed to the smoothing and convolutional nature of the skew spectra.The smoothing tends to diminish small-scale modes, and the convolution mixes scales, transferring information from small to larger scales.Consequently, the distinctive information typically associated with σ 8 and HOD parameters, often tied to small scales, becomes dispersed across various scales, complicating interpretation.To validate this intuition, one could explore an alternative parameterization of the galaxy-dark matter relation, such as using a perturbative forward model (Tucci & Schmidt 2023).Additionally, the skew spectra's limited constraining power may be influenced by the low mean number density and volume of the CMASS-SGC galaxy sample.Further investigation into these issues is deferred to future research.
Our results confirm the expectation that on the semilinear scales (setting R = 10 h −1 Mpc and imposing k max = 0.25 Mpc −1 h), the skew spectra capture the majority of the information of the bispectrum.The SimBIG analyses of the bispectrum monopole and the skew spectra yield comparable constraints on all parameters, except for σ 8 , which is better constrained by the bispectrum monopole.Extending the analysis to the smaller scales (setting R = 5 h −1 Mpc and imposing k max = 0.5 Mpc −1 h), we observe the inferred constraints from the skew spectra are weaker relative to the bispectrum.In addition, we report a ∼ 1σ discrepancy in the mean of the posterior of Ω b and h from the bispectrum monopole and skew spectra.This discrepancy can hint at a potential model misspecification on small scales, possibly due to inaccuracies in HOD descriptions.We leave a more in-depth investigation of this issue to future works.
Beyond the main result, we performed a series of tests to explore whether our cosmological constraints are impacted by the removal of outliers from training data, the subtraction of the Poisson shot noise, the choice of large-and small-scale cutoff scales, and the choice of the smoothing scale.We found that removal of the outlier can affect the constraints mildly, less than 15%.Regarding the shot noise subtraction, we observe a marginal impact on the posterior distributions.We found that applying a highly conservative scale cut (k max = 0.15 Mpc −1 h) and using a larger smoothing scale lead to degradation of the constraints on Ω m and Ω b , respectively, leaving the uncertainties on other parameters largely unaffected.
This work can be extended in several directions.Firstly, there is room for improvement in the optimality of the skew spectra.The current kernels defining the quadratic fields were designed for optimal constraint on the amplitude of the primordial power spectrum and bispectrum, the growth rate of structure, and galaxy biases.However, they may not fully capture all the shape information.A future extension can refine the kernels to be "shape-sensitive" and optimally capable of capturing cosmological information.Additionally, the 14 galaxy skew spectra can be broadly classified into three families (see Fig. 2).Consequently, the size of the data vector may be potentially reduced by constructing optimal combinations.
Given that the skew spectra are optimally designed for constraining PNG, another natural direction for future work is to analyze BOSS data to constrain the Gaussianity of initial conditions using the full set of galaxy skew spectra.Within SimBIG framework, this analysis requires incorporating non-Gaussian initial conditions into the forward model and investigating whether the HOD model is affected by PNG.
Combining the power spectrum and the skew spectra should improve the constraints from the two statistics individually.Hence, an investigation of an optimal strategy for effectively combining multiple summary statistics is left for future work.
Lastly, considering the observed limited sensitivity of the skew spectra to HOD parameters in our analysis, an interesting avenue for exploration would be to in-vestigate the constraining power of skew spectra when performing SBI using an alternative forward model e.g., based on perturbation theory Tucci & Schmidt (2023).
While finalizing this draft, a related work, Chen et al. (2024), appeared on arXiv, in which an application of a subset of the skew spectra considered here (neglecting the line-of-sight-dependent quadratic kernels) to BOSS data was presented.In contrast to this work, in which we focused on constraining ΛCDM using SBI, Chen et al. (2024) employed a subset of three skew spectra to constrain primordial non-Gaussianity of equilateral and orthogonal shape using perturbation theory and asssuming a Gaussian likelihood.
B. PRE-PROCESSING OF THE MEASUREMENTS TO REMOVE OUTLIERS The SimBIG forward-modeled galaxy mocks display wide variations in the mean number density and the clustering amplitude due to the broad HOD parameter priors.To enhance the stability of our training process and ensure that all features meaningfully contribute to the neural networks, we preprocess the training dataset, by removing "outlier" mock catalogs.Furthermore, we normalize the measured skew spectra by dividing them by square of the power spectrum to obtain features of order unity.We compared several algorithms for removing the outliers, which we describe below.
We remove the outliers based on the amplitude of the skew spectra, and compare the performance of three algorithms; (1) elimination based on the z-score: this is a simple prescription to remove the data points that lie on the tails of a (Gaussian) distribution.The data points with a z-score (the number of standard deviations from the mean) greater than a threshold are declared to be outliers.(2) Local Outlier Factor (LOF Breunig et al. 2000) is a machine-learningbased anomaly detection algorithm.It hinges on the idea that anomalies will have a significantly lower density of neighbours than on average, indicating that they are relatively more isolated.It assesses the local density of data points by comparing the density of a data point to its neighbours.(3) Isolation Forest (IF Liu et al. 2008) is also machine-learning based; the IsoForest isolates anomalies by randomly partitioning the data and creating isolation trees (a decision tree for anomaly isolation).It exploits the fact that anomalies are fewer and more isolated than normal instances, making them easier to separate within the isolation trees.
In the main analyses presented in this paper, we apply the Local Outlier algorithms using its implementation in sklearn package (Pedregosa et al. 2011).We also tested the impact of not removing the outliers and found that it can change the constraints on the 5 cosmological parameters less than 15%.The results are summarized in Table 2.

C. FULL POSTERIOR DISTRIBUTION: CONSTRAINTS ON THE HOD PARAMETERS
To illustrate the constraining power of different summary statistics (P ℓ , B 0 , S R ) and the degeneracies between the nuisance and cosmological parameters, in Fig. 8, we show the full posterior distributions on model parameters from analyses of the BOSS CMASS-SGC sample.For all three statistics, the BBN prior is imposed, and the small-scale cutoff scale is set to k max = 0.5 Mpc −1 h.For the skew specra, the smoothing scale of R = 5 h −1 Mpc is imposed.
Generally, except for log M 1 (the characteristic halo mass scale to host satellite galaxies), the skew spectra do not improve the constraints on any of the HOD parameters over the power spectrum.The inferred posterior on σ logM (the scatter in the halo mass -galaxy luminosity relation) hits the prior range, indicating that additional mock catalogs sampling a wider prior range are necessary.Similar to σ 8 , the failure of the skew spectra in improving the constraints on the HOD parameters can potentially be associated smoothing of the input field and the convolutional nature of the skew spectra.The former washes out small-scale modes, while the latter mixes different scales and transfers information from small to larger scales.This interplay potentially disperses the information usually linked to σ 8 and the HOD parameters, which are typically associated with small scales, thereby complicating their interpretation.
The comparison between the skew spectra and the bispectrum monopole reveals that their constraining power on HOD parameters is quite similar.Notably, certain HOD parameters, like log M min and η cent , even show enhancements in constraint accuracy.This observation suggests that the velocity information encapsulated by the RSD could be crucial for more effectively constraining HOD parameters.
Worth noting that among cosmological parameters, σ 8 shows the strongest degeneracy with several HOD parameters, especially, log M min , η cent , η sat , which are poorly constraints by the skew spectra.These degeneracies play a part in the limited constraining power of the skew spectra on σ 8 .Therefore, analysis of skew spectra using a different model of dark matter -galaxy relation, e.g., the perturbative biasing description, could potentially provide better constraints on σ 8 .

D. VALIDATION TEST WITH CONSERVATIVE SCALE CUTS
In §5.3.2, we presented a comparison of the skew spectra, the power spectrum multipoles, and the bispectrum monopole for a rather conservative choice of scale cuts (k max = 0.25 Mpc −1 h, R = 10 h −1 Mpc).This section provides the two validation tests to assess the accuracy and robustness of the estimated posteriors from the skew spectra.
Fig. 9 shows the distribution of the rank statistics, which exhibits ∩-shaped distributions for all parameters.This behavior indicates that the estimated posteriors are broader than the true posteriors (i.e.underconfident) and implies that the skew spectra constraints should be considered conservative.Fig.10 shows the distributions of the difference between the estimated posterior means and the fiducial values of cosmological parameters across the three test simulations.The statistical consistency of the distributions across the three test sets demonstrates the robustness of the trained models to details of the forward modeling.

E. DEPENDENCE OF THE CONSTRAINTS ON ANALYSIS CHOICES
As described in §5.3.4,we perform several additional SBI analyses to investigate the dependence on the inferred cosmological constraints on various analysis choices.For each configuration, we re-train 2,000 -3,000 neural network models.We report the 1σ constraints on cosmological parameters for these tests in Table 2.We provide further details on these tests here.
First, we study the impact of shot noise.For this test, we opt for k max = 0.5 Mpc −1 h and R = 5 to enhance the influence at small scales where shot noise effects are non-negligible.The left panel of Fig. 11 shows the comparison with and without shot noise subtraction.We found that the constraints on Ω b and h are identical in the two cases, while the widths of the σ 8 and n s posteriors are mildly affected.For Ω m , we observe a slight shift of the peak of the posterior.Despite these marginal differences, we conclude that the constraints are largely unaffected by the subtraction of shot in the estimator.This implies that in contrast to a conventional likelihood-based inference approach, the SBI approach can effectively isolate the influence of shot noise and extract the cosmological response based on a given summary statistics.However, we note that validating this interpretation requires performing further tests on samples with varying number density and employing different summary statistics .
Second, we investigate the impact of the maximum scale cuts by choosing k max = {0.15,0.25, 0.5} [Mpc −1 h].Here, we set the smoothing scale to R = 5 h −1 Mpc.Due to the convolutional structure of the skew spectra and the smoothing operation, small-scale information is partially transferred to larger scales.As previously shown in the numerical Fisher forecast of Hou et al. (2023), while skew spectra can achieve better constraints at relatively large scales compared to the power spectrum or the bispectrum monopole, the constraining power also saturates faster as we push the k max cuts towards smaller scales.This saturation is also apparent in the fast-dropping power in Fig. 2. As expected, applying an aggressive small-scale cut at k < 0.15 Mpc −1 h degrades the constraints, particularly Ω m by 30%.Nevertheless, when imposing the BBN priors, even for this cutoff, the constraints from skew spectra are still comparable to those from the power spectrum multipoles with a scale cut of k max = 0.5 Mpc −1 h.Third, we the impact of the choice of the smoothing scales by imposing R = 5 h −1 Mpc and R = 10 h −1 Mpc.Fig. 2 shows that the smoothing scale can shift the featured peak of the skew spectra, which could potentially be sensitive to information at different scales.Furthermore, a larger smoothing scale washes out the small-scale fluctuations more significantly.We find that, as expected, the constraints degrade when using a larger smoothing scale.The degradation is particularly visible in Ω b by ∼ 40%.For other parameters, the effect is marginal.

Figure 2 .
Figure2.Measurements of the 14 skew spectra from the subsample of the BOSS DR12-SGC galaxies before the shot noise subtraction.Different colors correspond to two smoothing scales of R = 5 h −1 Mpc (in red) and R = 10 h −1 Mpc (in black).Due to smoothing, skew spectra approach zero at small scales.

Figure 3 .
Figure 3.The three contributions to the Poisson shot noise of individual galaxy skew spectra given in Eq. (3).The smoothing scale is set to R = 5 h −1 Mpc.Grey curves denote the total shot noise contribution.

Figure 4 .Figure 5 .
Figure 4. Accuracy test: shown are the rank statistics of the validation dataset for the configuration R = 5 h −1 Mpc and kmax = 0.5 Mpc −1 h.For most of the parameters, the rank statistics are uniformly distributed.The slight ∩-shape in the Ωm plot is an indication of an under-confident (conservative) error bar.

Figure 6 .
Figure6.Posterior distribution of cosmological parameters from the power spectrum multipoles (grey) and the skew spectra (blue) setting kmax = 0.5 h −1 Mpc for both.The smoothing scale for the skew spectra is set to R = 5 h −1 Mpc.

Figure 7 .
Figure7.Posterior distribution of cosmological parameters from the power spectrum (grey), the skew spectra (blue), and the bispectrum monopole (red) imposing the BBN prior, and setting kmax = 0.25 h −1 Mpc for the first two and kmax = 0.3 Mpc −1 h for the third.The smoothing scale is set to R = 10 h −1 Mpc for the skew spectra.
A17) and the operators O[a, b] that act on arbitrary fields a and b as

Figure 8 .
Figure 8. Posterior distribution of cosmological parameters inferred the skew spectra (blue) and power spectrum (grey) with a cut-off scale kmax = 0.5 h −1 Mpc including the BBN prior and smoothing scale R = 5 h −1 Mpc.

Figure 9 .Figure 10 .
Figure 9. Rank statistics of the validation data for analysis of the skew spectra with R = 10 h −1 Mpc and kmax = 0.25 Mpc −1 h.

,Figure 12 .
Figure 11.Left: Posterior distribution of cosmological parameters with (purple) and without (green) subtracting the Poisson shot noise in skew spectra estimators.Right: Posterior distribution of cosmological parameters for three choices of small-scale cutoff: kmax = 0.15 Mpc −1 h in red, kmax = 0.25 Mpc −1 h in orange, and kmax = 0.5 Mpc −1 h in green.