Fewer Mocks and Less Noise: Reducing the Dimensionality of Cosmological Observables with Subspace Projections

Creating accurate and low-noise covariance matrices represents a formidable challenge in modern-day cosmology. We present a formalism to compress arbitrary observables into a small number of bins by projecting into a subspace that minimizes the log-likelihood error. The lower dimensionality leads to a dramatic reduction in covariance matrix noise, significantly reducing the number of mocks that need to be computed. Given a theory model, a set of priors, and a simple model of the covariance, our method works by using singular value decompositions to construct a basis for the observable that is close to Euclidean; by restricting to the first few basis vectors, we can capture almost all the signal-to-noise in a lower-dimensional subspace. Unlike conventional approaches, the method can be tailored for specific analyses and captures non-linearities that are not present in the Fisher matrix, ensuring that the full likelihood can be reproduced. The procedure is validated with full-shape analyses of power spectra from BOSS DR12 mock catalogs, showing that the 96-bin power spectra can be replaced by 12 subspace coefficients without biasing the output cosmology; this allows for accurate parameter inference using only $\sim 100$ mocks. Such decompositions facilitate accurate testing of power spectrum covariances; for the largest BOSS data chunk, we find that: (a) analytic covariances provide accurate models (with or without trispectrum terms); and (b) using the sample covariance from the MultiDark-Patchy mocks incurs a $\sim 0.5\sigma$ bias in $\Omega_m$, unless the subspace projection is applied. The method is easily extended to higher order statistics; the $\sim 2000$ bin bispectrum can be compressed into only $\sim 10$ coefficients, allowing for accurate analyses using few mocks and without having to increase the bin sizes.


I. INTRODUCTION
Most conventional analyses of cosmological data proceed by measuring a summary statistic, computing a theory model, and comparing the two in a Gaussian likelihood. A key ingredient in this is the inverted covariance matrix (also known as the precision matrix). In the simplest case, this is measured by creating a number of mock datasets, computing the desired statistic in each, then estimating the sample covariance directly. In order to obtain an unbiased precision matrix estimate, a large number of mocks is required [1], and, it has been further shown that noise in the precision matrix leads to stochastic shifts in the best-fit parameters, requiring inflation of the output parameter covariances [2][3][4][5]. To reduce these biases, it is important to use a large number of mocks, though creating such a sample requires considerable computational effort, since the mocks are also required to be accurate. Furthermore, the bias increases with dimensionality; upcoming galaxy surveys such as those of DESI [6], Euclid [7], the Rubin Observatory [8], and the Roman Telescope [9] will provide substantially higher resolution data, with an associated increase in number of bins.
There exists significant literature pertaining to alternative methods for galaxy survey covariance matrix generation. At one end of the scale sit purely theoretical models of the covariance, which, for the galaxy power spectrum and its multipoles, can be computed via perturbation theory (PT) [10-13]. Whilst these are theoretically well motivated, we are fundamentally limited by the applicability of the perturbative model in the non-linear regime, and, without simulations, it is unclear how to set model hyperparameters such as galaxy bias and PT counterterms. A natural extension is therefore semi-analytic models, which combine well-understood theory with free parameters that can be calibrated from a (small) number of simulations. These exist for a range of statistics, including two-point correlation functions [11,14-17], three-point correlation functions [18], power spectra [19,20] and bispectra [21], though the applicability of the model assumptions must be rigorously tested in all circumstances.
An alternative approach is to find different ways of estimating the covariance matrix from simulations, for example by using noise reduction techniques (such as tapering [22], shrinkage [23] and invoking sparsity [24]), calibrating the covariance matrix with inexpensive small-volume simulations [25,26], or expanding the precision matrix around some smooth fiducial model [27]. Of particular interest are schemes that reduce the dimensionality of the data-vector via some form of compression. As the number of bins falls, precision matrix noise becomes less important, allowing one to use fewer mocks to generate the covariance. The question, however, is how to perform this data compression.
In this work, we develop a method to compress a given data-vector by computing an information-maximizing subspace from the theoretical model for our data, using techniques developed for gravitational wave analyses [28]. In practice, we take a large sample of N_bank points in the underlying space of N_param cosmological and nuisance parameters, and compute model data vectors (which we denote the 'template bank') at each point. Using a singular value decomposition of the resulting N_bank x N_bin noise-weighted matrix, we generate a set of basis vectors which are ordered in signal-to-noise and, together, contain all cosmological information. By restricting to the first N_SV of these modes, we capture the dominant contributors to the specific experimental likelihood, in a much lower dimensional space. The analysis then proceeds using the projection of the full data-vector into this subspace. Assuming one has a somewhat accurate initial ansatz for the covariance, this procedure gives the highest signal-to-noise possible for a data vector of fixed dimension using linear transformations.
Ours is certainly not the first work performing analysis along these lines, with other examples including bispectrum compression via approximate eigenvalue decompositions [29] and expansions in polynomial basis functions [30-33], as well as the COSEBI method for cosmic shear observables [34,35]. Perhaps most notable are the MOPED algorithm [36-39] and Karhunen-Loeve methods [40,41], both of which perform linear transformations to compress the data into a set of N_param values which leave the Fisher matrix unchanged. Whilst these have been shown to have utility in cosmological analyses [42,43], they necessarily assume the posterior surface to be Gaussian, i.e. that all information is contained within the Fisher matrix. Further, the compression accounts only for the linear response of the model data vector to parameter changes around a fiducial point in model space, without consideration of whether this is valid across the parameter space spanned by the priors. The subspace method developed herein instead optimizes the log-likelihood itself, ensuring that the χ² values of the original and compressed likelihoods agree up to some fixed limit across the whole prior domain. Notably, this requires using a few more modes than parameters to encapsulate non-linear parameter responses, though the precise number can be robustly set from accuracy considerations. Furthermore, whilst many methods use an approximate covariance of the compressed data-points (e.g., by assuming them to be independent), allowing for analytic posterior sampling, we instead opt to use the full covariance for accuracy, as in Ref. [43].
Whilst our approach is fully general and can be applied to any observable, given a theory model, parameter priors and a fiducial estimate of the covariance, we here consider the analysis of galaxy survey data in redshift-space, using the approach of Refs. [44-49]. By creating a template bank from the theory model, we can generate a subspace of much reduced dimension, which, for a large number of mocks, gives similarly accurate parameter inference to the full likelihood, and, for few mocks, substantially reduces the noise-induced bias.
The paper has the following structure. In Sec. II, we provide a mathematical derivation of the template bank and subspace decomposition, as well as discussing the corresponding subspace likelihoods. Sec. III discusses the expected constraints obtained from the subspace analysis and the effects of noise in the data and covariance. We apply the formalism to the galaxy power spectrum in Sec. IV, before discussing extensions to other analyses in Sec. V and concluding with a summary in Sec. VI. Appendices A & B contain derivations of key results used in Sec. III, with Appendix C presenting additional material related to Sec. IV.

II. SUBSPACE DECOMPOSITION

A. The Template Bank and Euclidean Subspace
A general cosmological model is specified by a set of parameters θ, which can be fundamental, nuisance or systematic. Whilst the arguments below strictly apply to any physical model, this work will principally be concerned with the analysis of galaxy power spectra, via the one-loop Effective Field Theory (EFT) model [44-49]. This carries the parameter vector

θ = {h, ω_cdm, A^{1/2}} ∪ {b_1 A^{1/2}, b_2 A^{1/2}, b_{G2} A^{1/2}, b_4, c_{s,0}, c_{s,2}, P_shot},   (1)

where additional parameters can be added to the first set to more finely probe cosmology. Here, we will use only these three for simplicity, but we note that the method can be arbitrarily extended. Model parameters generate a Riemannian manifold, to which we can associate a vector field of N_bin-dimensional model spectra, P_a(θ), where a specifies the component (k-bin and multipole) of the power spectrum evaluated at θ. For later use, we here consider the power spectra relative to a suitably defined mean spectrum P̄, i.e. δP^{(i)} = P(θ^{(i)}) − P̄, where P̄ will later be set to the mean power spectrum of the template bank. We may define the inner product as the noise-weighted distance from the mean spectrum,

⟨δP^{(i)} | δP^{(j)}⟩ ≡ δP^{(i)T} C^{-1} δP^{(j)},   (2)

where C is some fiducial covariance, encoding the metric. This is motivated by considerations of the χ² of a sample power spectrum P̂ with model P(θ) and covariance C_D, which is simply a squared norm,

χ²(θ) = [P̂ − P(θ)]ᵀ C_D^{-1} [P̂ − P(θ)] ≡ ⟨P̂ − P(θ) | P̂ − P(θ)⟩_D.   (3)

It is convenient to perform a global linear transform that centers and whitens the power spectra, whilst preserving the inner product,

X(θ) = C^{-1/2} [P(θ) − P̄].   (4)

Note that the metric is Euclidean in this set of co-ordinates, i.e.

⟨X^{(i)} | X^{(j)}⟩ = X^{(i)T} X^{(j)}.   (5)

We can write an arbitrary rotated spectrum X(θ) in terms of a set of N_bin orthonormal basis vectors V_α, i.e.

X(θ) = Σ_{α=1}^{N_bin} c_α(θ) V_α,   (6)
with coefficients c_α, where the basis vectors satisfy ⟨V_α | V_β⟩ = δ^K_{αβ}. Hence the inner product of any two (rotated) vectors can be written as a product of basis coefficients,

⟨X(θ^{(i)}) | X(θ^{(j)})⟩ = Σ_α c_α(θ^{(i)}) c_α(θ^{(j)}).   (7)

To compute the basis vectors V_α of Eq. 6, we sample a set of N_bank spectra from the parameter manifold (known as the 'template bank') and execute a singular value decomposition. Practically, this identifies the directions in the vector space which have the largest contributions to the log-likelihood, χ². Given a set of points {θ^{(i)}} on the (bounded) manifold with corresponding spectra {P^{(i)}} (each of dimension N_bin), we first compute the rotated spectra {X^{(i)}} at each point using Eq. 4, setting P̄ to the mean of all template-bank spectra. By stacking the rotated spectra, we can form an N_bank x N_bin matrix X_{ia}, which has the SVD

X_{ia} = Σ_α U_{iα} D_α V_{αa},   (8)

where D is an N_SV x N_SV diagonal matrix of the singular values (SVs), and the matrices U and V (of dimension N_bank x N_SV and N_SV x N_bin respectively) are unitary, projecting from observations to SVs, and from SVs to spectra. By comparison with Eq. 6, we see that the basis coefficients are given by

c_α(θ^{(i)}) = U_{iα} D_α.   (9)

The principal value of this decomposition is that any spectrum can be well approximated by a relatively small number of basis vectors, such that, for N_SV < N_bin,

X(θ) ≈ Σ_{α=1}^{N_SV} c_α(θ) V_α,   (10)

where we assume that the SVs are arranged in decreasing order. This is possible since |c_α| ≤ D_α (as U is unitary), and D_α is small for large α. This corresponds to projecting the spectra into an N_SV-dimensional Euclidean subspace, as we approximate

P(θ) ≈ P̄ + C^{1/2} Σ_{α=1}^{N_SV} c_α(θ) V_α.   (11)

We may fix N_SV by again considering the inner product. The mean squared distance of all N_bank spectra from the mean spectrum P̄ is given by the average χ²,

χ̄² ≡ (1/N_bank) Σ_{i=1}^{N_bank} ⟨X^{(i)} | X^{(i)}⟩ = (1/N_bank) Σ_α D_α²,   (12)

since V and U are unitary. If one uniformly increases the number of template spectra across the manifold-with-boundary by a factor f, the mean χ² (which is a geometric property of the manifold, unrelated to the template bank in the large-N_bank limit) should be invariant, thus D_α ∝ √f; we thus conclude that D_α ∝ √N_bank. Assuming χ² to be a good estimator of the distance between any two spectra in the template bank (and hence any two theory models on the prior-bounded manifold), the value of N_SV may be set by excluding those SVs which, in total, contribute less than some fixed χ²_min to χ̄²; this fixes the optimal number of SVs, N*_SV, to

N*_SV = min{ N_SV : (1/N_bank) Σ_{α>N_SV} D_α² < χ²_min }.   (13)

For a perfect measurement, with non-degenerate parameters, we would additionally require N*_SV ≥ N_param, where N_param is the dimensionality of the manifold. This would ensure that features of the spectra are not lost under the subspace projection, though, in practice, parameter degeneracies limit this.
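As a concrete illustration (not taken from any public pipeline), the following Python sketch implements the whitening of Eq. 4, the SVD of Eq. 8, and the truncation criterion of Eqs. 12 & 13; all array names (`templates` for the N_bank x N_bin bank of model spectra, `C_fid` for the fiducial covariance) are our own.

```python
# A minimal sketch of the template-bank construction (names hypothetical).
import numpy as np

def build_subspace(templates, C_fid, chi2_min=0.1):
    """Whiten an (N_bank, N_bin) template bank, SVD it, truncate the basis."""
    P_bar = templates.mean(axis=0)                  # mean spectrum, P-bar
    w, Q = np.linalg.eigh(C_fid)                    # C = Q diag(w) Q^T
    C_inv_half = Q @ np.diag(w ** -0.5) @ Q.T       # C^{-1/2} for Eq. 4
    X = (templates - P_bar) @ C_inv_half            # rotated spectra, stacked
    U, D, V = np.linalg.svd(X, full_matrices=False) # Eq. 8: X = U D V
    deficit = np.cumsum(D[::-1] ** 2)[::-1] / len(templates)  # Eq. 12 tails
    N_SV = int(np.argmax(deficit < chi2_min))       # Eq. 13 criterion
    return P_bar, C_inv_half, V[:N_SV]              # first N_SV basis vectors

def coefficients(P, P_bar, C_inv_half, V_sub):
    """Subspace coefficients c_alpha = <X|V_alpha> of a spectrum P."""
    return V_sub @ (C_inv_half @ (P - P_bar))
```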
A few comments on the SVD are needed. Firstly, we note that the decomposition of Eq. 8 implies (keeping co-ordinates implicit for clarity)

XᵀX = Vᵀ D² V,   (14)

which is just a diagonalization with eigenvalue matrix D². From this, it is clear that the SVD performs an eigendecomposition (or equivalently, a Principal Component Analysis; PCA) of XᵀX: the covariance matrix of (whitened) model spectra X(θ) across the prior manifold. The first basis vector in V (with largest singular value) therefore identifies the mode of the power spectrum that sources most of the variance between all templates sampled from the model space; the second vector identifies the second largest mode, subject to being orthogonal to the first, and so on. Keeping only the basis vectors corresponding to the largest singular values (i.e. Eq. 11) thus encapsulates the main variation in P(k) seen across the sampled model space. Given that we weight by C^{-1/2}, this is equivalent to finding the modes which contribute most to the log-likelihood, χ². Importantly, this approach is different from a standard eigendecomposition or PCA of the measurement covariance matrix [e.g., 29], which identifies the orthogonal modes of the power spectrum that explain most of the variance of the measurement when sampling over different random realizations of the measurement, rather than different samples from the model space. Unlike our approach, this PCA is agnostic of the model space, and thus cannot exploit the underlying structure, e.g., that cosmological parameters typically change multiple k-bins simultaneously in a smooth manner (implying that neighboring bins are correlated with respect to parameters, even on very large scales). Whilst the approach of covariance PCAs is to discard any information corresponding to modes that are too noisy to be measured well, the effect of using SVDs is instead to discard any information which does not affect our analysis, made possible by sampling only a finite region of theory space. Whilst this may lead to loss of information regarding unsampled parameters, it ensures that the basis vectors do not include much information about parameters for which we have strong prior knowledge.
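This equivalence is easy to verify numerically; a minimal sketch using random stand-in templates:

```python
# Quick check of Eq. 14 (a sketch): the right singular vectors of X
# diagonalize X^T X, the model-space covariance of whitened spectra.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 96))             # stand-in template bank
U, D, V = np.linalg.svd(X, full_matrices=False)
assert np.allclose(V @ (X.T @ X) @ V.T, np.diag(D ** 2))
```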
An additional point of note concerns the choice of parameters, {θ^{(i)}}, with which we generate the template bank. A simple choice would be to draw samples uniformly from each cosmological parameter (subject to some limits), though there is freedom in how to choose the parameters, for example Ω_m or ω_cdm, A_s or log A_s. In the general case, this will have a non-trivial impact on the basis vectors, since it will give different weights to different sections of the manifold from which X is sampled. However, we note that the SVD does not receive any information on the input parameters, only the output spectra; thus, if the locations on the manifold are held fixed, the decomposition is independent of the choice of co-ordinate chart. Furthermore, we expect identical results (on average) when drawing samples from any linear transform of the input parameters (assuming that the manifold boundary remains unchanged). In general, however, we suggest drawing {θ^{(i)}} using the same choices of parameters and priors (or at least a superset of these) as will be used in the later MCMC analysis, such that the basis vectors well represent the case in hand.
Whilst our decomposition is blind to the input parameters, there exist alternative approaches that include such information, for example partial least squares and canonical correlation analyses [51]. In these formalisms, both the input parameters and model data-vector are projected into a latent space; the benefit of this is that it gives the linear combinations of parameters that define the best-constrained features in the data-vector. Whilst this sounds appealing, in our context its interpretation is difficult since the optimal parameter sets are non-trivial combinations of cosmological and nuisance parameters, the latter of which are less meaningful.

B. Application to Observational Likelihoods
Whilst the above is somewhat abstract, it is of use when we consider observational data, since it gives us a rigorous method by which to reduce the dimensionality of the spectra. For a measured power spectrum P̂ and model P(θ), the log-likelihood of parameters θ given data covariance C_D is simply

χ²(θ) = [P̂ − P(θ)]ᵀ C_D^{-1} [P̂ − P(θ)].   (15)

Under the previous rotation by the fiducial covariance C (Eq. 4), this can be written

χ²(θ) = [X̂ − X(θ)]ᵀ C^{1/2} C_D^{-1} C^{1/2} [X̂ − X(θ)].   (16)

Note this is only diagonal if the fiducial covariance matches that of the data. When performing inference from a small number of mocks, the data covariance is not well known, thus it is preferable to use a smooth model for C instead, giving a slightly non-Euclidean space. In terms of the basis vectors of Eq. 6, and using modes only up to N_SV, we obtain

χ̂²(θ) = Σ_{α,β=1}^{N_SV} [ĉ_α − c_α(θ)] [C̃_D^{-1}]_{αβ} [ĉ_β − c_β(θ)],   C̃_D ≡ Ṽ C^{-1/2} C_D C^{-1/2} Ṽᵀ,   (17)

where Ṽ contains the first N_SV rows of V. In the second line, we have noted that this is simply a Gaussian likelihood for the ĉ variables, given covariance C̃_D, which is simply the full covariance C_D projected into the subspace. For C_D = C, it is a unit matrix. Note that χ̂² can be written in terms of the inner product of Eq. 5 as

χ̂²(θ) = ⟨ĉ − c(θ) | ĉ − c(θ)⟩_D,   (18)

where the subscript D indicates that this is with respect to the metric g_{αβ} = [C̃_D^{-1}]_{αβ}, rather than being strictly Euclidean.
Given a set of mocks, we may estimate the projected covariance directly from the measurements of ĉ in each mock.⁶ Since this contains fewer bins than the unprojected covariance, it is less susceptible to bias from covariance matrix noise. The inference proceeds by computing the model power spectrum for each chosen parameter vector, projecting it onto the reduced V_α basis, and computing the likelihood of Eq. 17 using the mock-based subspace covariance C̃_D. MCMC can then be performed using this subspace likelihood.⁷ Subject to a suitably chosen N_SV, we expect the incurred χ² error to be small, and thus the posteriors to be unbiased.
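For concreteness, a sketch of this likelihood evaluation (reusing the `coefficients` helper from the earlier listing; `mock_spectra` and `data` are hypothetical arrays of measured multipoles):

```python
# Sketch of the subspace likelihood (Eq. 17) with a mock-estimated
# projected covariance.
import numpy as np

def subspace_data(mock_spectra, data, P_bar, C_inv_half, V_sub):
    c_mocks = np.array([coefficients(P, P_bar, C_inv_half, V_sub)
                        for P in mock_spectra])
    c_hat = coefficients(data, P_bar, C_inv_half, V_sub)
    C_tilde = np.cov(c_mocks, rowvar=False)  # projected covariance
    return c_hat, np.linalg.inv(C_tilde)     # Hartlap correction: see Sec. IV

def chi2_sub(c_model, c_hat, Psi_tilde):
    d = c_hat - c_model                       # c-hat minus c(theta)
    return d @ Psi_tilde @ d                  # Eq. 17
```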

III. EXPECTED CONSTRAINTS AND THE EFFECTS OF NOISE

A. Noise-Averaged Constraints
It is important to show that the θ estimates obtained in this subspace formalism are unbiased and produce a good estimate of the parameter covariance. To do this, we write the data coefficients as ĉ = c(θ*) + n̂, where θ* are the true parameters and n̂ is some noise vector, assumed to be uncorrelated with the theory. For the simple case of matching true and fiducial P(k) covariances (C = C_D), we have a diagonal subspace metric [C̃_D^{-1}]_{αβ} = δ^K_{αβ}, which implies that each element of n̂ is independent and drawn from a Gaussian of unit variance. In contrast, due to the use of the SVD to define the subspace basis vectors, we expect the magnitude of the c_α(θ*) coefficients to fall strongly with α; we thus expect the signal-to-noise of the modes to fall monotonically, such that the later components are noise dominated and can be justifiably removed.
The subspace log-likelihood is given by Eq. 17, which we may write as another inner product in the space spanned by c(θ),

χ̂²(θ) = ⟨ĉ − c(θ) | ĉ − c(θ)⟩_D.   (19)

The observed mean parameter vector θ̂ is defined by minimizing χ̂²,

∂χ̂²/∂θ_i |_{θ̂} = 0  ⟹  ⟨∂c(θ̂)/∂θ_i | ĉ − c(θ̂)⟩_D = 0,   (20)

where i ∈ {1, ..., N_param} indexes the parameter vector. Averaging over noise, this simply requires c(θ*) − c(θ̂) = 0, implying that θ̂ = θ* (assuming that the mapping θ → c(θ) is injective, which is usually the case in cosmological contexts). Note this is true for any value of N_SV; i.e. the subspace decomposition always gives an unbiased estimate of the parameters, as required.
For the parameter covariance, we start from the Fisher matrix, defined by

F_ij = ⟨∂c/∂θ_i | ∂c/∂θ_j⟩_D − ⟨∂²c/∂θ_i ∂θ_j | ĉ − c(θ)⟩_D.   (21)

Averaging over noise allows us to drop the second term (since θ̂ = θ*). From this, we can see that using fewer than N_bin SVs (i.e. restricting to a subspace) produces a shift in the noise-averaged Fisher matrix,

∆F_ij ≡ F̂_ij − F_ij,   F̂_ij = Σ_{α,β=1}^{N_SV} (∂c_α/∂θ_i) [C̃_D^{-1}]_{αβ} (∂c_β/∂θ_j).   (22)

This generates a corresponding shift in the noise-averaged parameter covariance Φ = F^{-1},

∆Φ_ij ≡ [F̂^{-1} − F^{-1}]_ij.   (23)

As shown in Appendix A, ∆Φ_ij is positive semidefinite; i.e. restricting to a non-trivial subspace can only increase the parameter covariance. Whilst this shift exists, it is expected to be small, since the c_α coefficients rapidly decrease with index α; we thus expect similar behavior for the parameter derivatives, and hence for the contributions to ∆Φ_ij.

Footnote 6: It is worth noting that the subspace coefficients, c, do not have straightforward correspondences with individual cosmological parameters. As an example, consider an analysis measuring only the primordial amplitude. As required by the SVD, the first coefficient dominates, but will have contributions from all nuisance parameters that affect the amplitude, especially those that control the high-k regime, where the error bars are small.

Footnote 7: Note that it is not sufficient to simply use the template bank samples to compute the likelihood posterior surface (as is done in gravitational wave analyses, e.g., Ref. [28]). For a posterior that is compact in the prior space, there are very few template bank samples near the minimum point of the likelihood. In this example, we find a minimum χ² of 300 amongst the template bank samples, versus just three in the MCMC output. This could be reduced by using more restrictive priors on the template bank.
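The claim above, that restricting to a subspace can only inflate the parameter covariance (∆Φ of Eq. 23 positive semidefinite), can be checked numerically with finite differences; a sketch, assuming a hypothetical callable `model_coeffs(theta)` that returns the full coefficient vector c(θ), working in the Euclidean (C = C_D) metric:

```python
# Finite-difference Fisher matrices with and without the higher modes.
import numpy as np

def fisher(model_coeffs, theta0, n_sv, eps=1e-4):
    derivs = []
    for i in range(len(theta0)):
        tp = np.array(theta0, dtype=float); tp[i] += eps
        tm = np.array(theta0, dtype=float); tm[i] -= eps
        derivs.append((model_coeffs(tp) - model_coeffs(tm)) / (2 * eps))
    dc = np.array(derivs)[:, :n_sv]   # keep only the first n_sv modes
    return dc @ dc.T                  # F_ij = sum_alpha d_i c_a  d_j c_a

# Delta Phi = Phi(12 SVs) - Phi(96 SVs) should have eigenvalues >= 0:
# np.linalg.eigvalsh(np.linalg.inv(fisher(f, t0, 12))
#                    - np.linalg.inv(fisher(f, t0, 96)))
```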

B. Noise in the Data Vector
Whilst the above subsection gives the behavior of the subspace likelihood averaged over noise realizations, it remains to consider how the best-fit parameters are biased for individual noise realizations. Assuming the precision matrix C̃_D^{-1} to be known precisely, Eq. B5 of Appendix B shows the shift in the best-fit parameters for a given noise realization n̂ to be

δθ_i = Φ_ij ⟨∂c/∂θ_j | n̂⟩_D,   (24)

adopting the inner product of Eq. 19 and neglecting (subdominant) noise in the parameter covariance, Φ. Since this is a stochastic quantity, it is best understood by computing the shift covariance ⟨δθ_i δθ_j⟩. Using the definition ⟨n̂_α n̂_β⟩ = C̃_{D,αβ}, the average is simply obtained:

⟨δθ_i δθ_j⟩ = Φ_ij.   (25)

As noted in the previous section, Φ_ij increases as N_SV is decreased, thus a reduction in the number of basis vectors necessarily increases the magnitude of best-fit parameter shifts. However, given that c_α(θ*) falls rapidly with increasing index α (due to the SVD), we expect the higher SVs to have only small contributions to the shift covariance.

C. Noise in the Precision Matrix
Of greater importance in this work is noise in the precision matrix, which generally appears when estimating C̃_D^{-1} using a finite number, N_mock, of mocks. As shown in Eq. B8 of Appendix B, a bias δΨ in the precision matrix leads to an additional shift in the best-fit parameters,

δθ_i^{(Ψ)} = Φ_ij (∂c/∂θ_j)ᵀ δΨ n̂.   (26)

Whilst the exact form is unimportant for our purposes here, we note that it depends on the product δΨ n̂; there is no bias from noisy precision matrices if the data are noiseless. For noisy data, this shift is present, though it vanishes on average provided that ⟨δΨ⟩ = ⟨δΨ n̂⟩ = 0. Averaging over statistical noise in both n̂ and δΨ (assuming Gaussian statistics), we obtain the shift covariance from precision matrix noise [2,4],

⟨δθ_i δθ_j⟩_Ψ = B (N_SV − N_param) Φ_ij,   B = (N_mock − N_SV − 2) / [(N_mock − N_SV − 1)(N_mock − N_SV − 4)],   (27)

which includes the Hartlap factor required to debias the inversion of noisy covariance matrices [1]. In the limit of large N_mock ≫ N_SV, this is approximately

⟨δθ_i δθ_j⟩_Ψ ≈ [(N_SV − N_param)/N_mock] Φ_ij.   (28)

It is here that we see the utility of using N_SV < N_bin. This dramatically decreases the parameter bias obtained using small N_mock, and, conversely, allows one to perform equally accurate inference using many fewer mocks, greatly improving the computational efficiency for surveys with a large number of bins. The standard way of accounting for such errors is to inflate the output parameter covariances [5]; using fewer SVs reduces this need, implying that the reported parameter confidence contours will shrink as N_SV decreases.
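To make the gain concrete, a back-of-envelope sketch of the large-N_mock limit of Eq. 28, using the bin counts adopted later in Sec. IV:

```python
# Extra parameter scatter from precision-matrix noise, in units of the
# parameter covariance Phi (the (N_SV - N_param)/N_mock limit of Eq. 28).
n_mock, n_param = 125, 10
for n_sv in (96, 48, 12):
    print(n_sv, (n_sv - n_param) / n_mock)   # 0.688, 0.304, 0.016
# With ~100 mocks, only the compressed (12-coefficient) analysis keeps
# the noise-induced scatter negligible.
```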

D. Choice of Fiducial Covariance Matrix
The choice of the fiducial covariance C used to generate the SVD subspace warrants discussion. For N_SV = N_bin, this is arbitrary, since the mapping from power spectra to coefficient space is invertible; equivalently, χ² is independent of C. In the general case, it is important, since the fiducial covariance defines the mapping onto the lower-dimensional subspace. If C is equal to the power spectrum data covariance C_D, the subspace metric is Euclidean and all c_α coefficients are uncorrelated. Since we define the basis vectors via the SVD, we expect the information content of the basis coefficients to decrease monotonically with the index α; having a diagonal covariance ensures that we do not throw away information arising from correlations between low and high basis coefficients. In general, the subspace metric is defined by the projection (Eq. 17), which becomes more diagonal as C approaches C_D (since V is unitary). For this reason, using a poor estimate of C_D will lead to significant correlations between basis coefficients, and thus reduce the efficacy of the method by introducing a larger shift in the parameter covariance (Eq. 23) as SVs are removed. It is thus desirable to use an input estimate of C which is close to the truth (to allow for efficient subspace projections) and low-noise (to ensure the basis vectors are smooth). It is worth noting, however, that any invertible choice of fiducial covariance gives unbiased parameter estimates in the limit of zero noise; optimizing this simply ensures that the posteriors are less affected by noise for a given (small) number of basis vectors.

IV. APPLICATION TO BOSS DR12
A. Mock Data-sets and Covariances

We apply the formalism of the above section to galaxy power spectra taken from the twelfth data release (DR12) [52] of the Baryon Oscillation Spectroscopic Survey (BOSS), which is part of SDSS-III [53,54]. For simplicity, we use only the largest of the four data-chunks: the high-z North Galactic Cap sample, with mean redshift z = 0.61 and volume V = 2.8 h^{-3} Gpc³. In this work, all mock data are drawn from the publicly available power spectrum multipoles of 2048 MultiDark-Patchy mocks (hereafter Patchy) [55-57], including 48 k-bins in [0.01, 0.25] h Mpc^{-1} for both the monopole and quadrupole moments, as in Refs. [44-46,49]. Two choices of mock data are used: a single Patchy realization, which emulates the BOSS sample, and the mean of 48 realizations, to ensure our analysis is unbiased.
For the covariance matrix, we consider three possible choices: (1) the sample covariance from the Patchy mocks; (2) the analytic covariance presented in Ref. [13]; and (3) a simplified Gaussian (plus shot-noise) covariance computed for the best-fit BOSS cosmology. In the first case, we use either N_mock = 2000 or 125 mocks (excluding those used to define the data-vectors), defining the covariance via the usual formula

cov(P_a, P_b) ≡ C_{D,ab} = [1/(N_mock − 1)] Σ_{i=1}^{N_mock} (P̂_a^{(i)} − P̄_a)(P̂_b^{(i)} − P̄_b),   (30)

where P̂_a^{(i)} is the power spectrum of the i-th mock, subscripts indicate the k-bin and multipole, and the overbar represents the mean over all mocks. When forming χ², we require the inverse covariance, Ψ_D; to ameliorate noise-induced bias, we include the Hartlap factor [1], defined by

Ψ_D = [(N_mock − N_bin − 2)/(N_mock − 1)] C_D^{-1}.

Since the other choices of covariance are analytic, they do not require a Hartlap factor. The first of these uses perturbation theory to compute the off-diagonal trispectrum terms and the contributions from super-survey modes, whilst the diagonal spectrum is estimated from Patchy. In contrast, the second assumes Gaussianity (and thus no off-diagonal terms before window convolution), with the diagonal computed via the one-loop theory model used in this work. Both include the BOSS window function, as discussed in Ref. [13].
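A minimal sketch of this estimator and its debiased inverse (assuming a hypothetical array `P_mocks` of stacked multipole measurements):

```python
# Sample covariance of Eq. 30 and its Hartlap-corrected inverse.
import numpy as np

def precision_from_mocks(P_mocks):
    n_mock, n_bin = P_mocks.shape
    C_D = np.cov(P_mocks, rowvar=False)            # Eq. 30
    hartlap = (n_mock - n_bin - 2) / (n_mock - 1)  # debiasing factor [1]
    return hartlap * np.linalg.inv(C_D)
```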

B. Creation of the Template Bank and Subspace Coefficients
The template bank is generated following the procedure of Sec. II A. We first draw 10⁵ points in the parameter space of Eq. 1, subject to a set of broad priors, some flat and some Gaussian, where N(µ, σ²) indicates a Gaussian prior and all quantities are in h Mpc^{-1}-type units. Note that the flat priors are used only for the generation of the template bank, and are not used in the later MCMC analysis.¹⁰ For each set of parameters, the power spectrum model is computed via one-loop Effective Field Theory, as implemented in the CLASS-PT code [47], then convolved with the BOSS window function. Note that the priors are purposefully very broad; given more restrictive priors, the domain of the parameter manifold can be reduced, leading to fewer required subspace coefficients.
To compute the basis vectors of the subspace discussed in Sec. II A, we first require the fiducial covariance C, which defines the rotated power spectra X (Eq. 4). Whilst setting this equal to the sample covariance would ensure that the subspace metric is diagonal, this is seldom desirable since, unless N_mock is very large, it will lead to noisy basis vectors and hence less optimal subspace decompositions. In general, it is important to use a fiducial covariance that is (a) relatively smooth, and (b) a fair approximation to the true covariance. For this reason, we principally set C equal to the analytic covariance matrix of Ref. [13], as described above.¹¹ Using this, we rotate the template spectra into the X variables and apply the SVD of Eq. 8 to compute the set of basis vectors {V_α}. This further defines the SVs {D_α}, which can be used to set N_SV.¹² From the condition of Eq. 12, we find that the χ² deficit in the subspace likelihood (Eq. 17) from excluding all SVs above some N_SV drops below 0.1 by N_SV ≈ 12 (depending on the exact covariance), thus we expect this to give reasonable results. This corresponds to compressing the data by a factor of eight, and is slightly larger than the number of parameters (10), indicating the effects of non-linearities in the transformation θ → P(θ) (and hence the likelihood) that cannot be well represented as linear across the prior domain. Alternatively, this signifies a break-down of the assumption that all information can be encapsulated in the rank-N_param Fisher matrix. Below, we will also consider N_SV = 48 and the uncompressed power spectrum vector for testing purposes. Had we used more cosmological or nuisance parameters in our model, we would expect N_SV to increase correspondingly.
Given the above Patchy mocks, we generate the corresponding subspace coefficients {ĉ_α^{(i)}} using the first N_SV basis vectors via Eq. 6. These are then used to compute the sample covariance of the coefficients via the standard formula,

C̃_{D,αβ} = [1/(N_mock − 1)] Σ_{i=1}^{N_mock} (ĉ_α^{(i)} − c̄_α)(ĉ_β^{(i)} − c̄_β)

(cf. Eq. 30), which is used in the subspace likelihood (Eq. 17). As for the former covariance, this requires a Hartlap factor to invert; the upside is that the Hartlap factor will be closer to unity here if N_SV < N_bin, shrinking the parameter covariances. Additionally, we will find it useful to perform the analysis using analytic covariances within the likelihood (rather than just as the fiducial covariance); in this case, we can form the subspace data covariance via the projection C̃_D = Ṽ C^{-1/2} C_D C^{-1/2} Ṽᵀ of Eq. 17.

C. Posterior Sampling Methods
With the datasets and basis vectors in hand, posterior surfaces are computed using Markov Chain Monte Carlo (MCMC) methods, here implemented via montepython v3.3 [58,59]. Given the necessity of running MCMC a significant number of times to validate our method, we implement two approaches to expedite the sampling. Firstly, we note that any parameter which enters the power spectrum model P(θ) linearly can be analytically marginalized without loss of information [60,61]. This procedure is described in Appendix C, and simply leads to those parameters being absorbed into a cosmology-dependent covariance matrix. In our case, the nuisance parameters b_4, c_{s,0}, c_{s,2} and P_shot fall into this category, and are thus marginalized over directly. Note that this does not affect the SVD decompositions, since it is applied only in the final MCMC step.
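Since these parameters enter the model linearly, the marginalization reduces to a Gaussian integral; the following is a minimal sketch of the standard result in our own notation (the exact expressions used here are those of Appendix C), for a model linear in a nuisance vector λ with independent Gaussian priors:

```latex
% Model linear in nuisance parameters lambda_a, with priors N(\bar\lambda_a, \sigma_a^2):
%   P(\theta, \lambda) = P_0(\theta) + \sum_a \lambda_a P_a(\theta).
% Marginalizing the Gaussian likelihood over lambda gives another Gaussian,
% with the prior widths absorbed into a cosmology-dependent covariance:
\mathcal{L}(\theta) \propto \left|C_{\rm eff}(\theta)\right|^{-1/2}
  \exp\left[-\frac{1}{2}\,\Delta^{T}(\theta)\,C_{\rm eff}^{-1}(\theta)\,\Delta(\theta)\right],
\qquad
\Delta(\theta) \equiv \hat{P} - P_0(\theta) - \sum_a \bar\lambda_a\,P_a(\theta),
\qquad
C_{\rm eff}(\theta) = C_D + \sum_a \sigma_a^2\,P_a(\theta)\,P_a^{T}(\theta).
```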
Secondly, when computing multiple posteriors at similar locations, it is illogical to rerun the MCMC each time from scratch. Instead, we opt to run the MCMC once, saving the (unprojected) power spectra for each point in parameter space, then use these samples to reconstruct the posterior for a different analysis, e.g., a different choice of N_SV, via standard importance sampling techniques. This is fast (since it does not require a Boltzmann code to be run) and highly accurate for posteriors close to the initial MCMC chain. The latter requirement can be assessed by the effective sample size of the resampled chain [e.g. 62], equal to

N_eff = (Σ_i w_i)² / Σ_i w_i²,

where w_i are the importance sampling weights. For a new posterior, if the effective sample size is too small (≲ 10⁴ samples) when importance sampling from any pre-existing set of spectral samples, we simply create a new MCMC chain for this configuration.

Footnote 10: 10⁵ samples is likely overkill; the change to the dominant basis vectors is negligible when this is reduced to 10⁴ samples.

Footnote 11: Note that having C ≠ C_D does not bias our inference, though it changes the efficiency of our subspace projection. If we use the same number of SVs as bins, the likelihood is independent of our choice of C. Assuming the sample covariance to be fixed, our analysis is robust to inaccuracies in the fiducial covariance.

Footnote 12: See Fig. 6 for a representative plot of the D_α SVs.
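A sketch of this reweighting step (array names hypothetical):

```python
# Importance reweighting of a saved chain. `logL_old` holds the chain's
# original log-likelihoods and `logL_new` those recomputed for the new
# configuration (e.g. a different choice of N_SV).
import numpy as np

def reweight(logL_old, logL_new):
    dlogL = logL_new - logL_old
    w = np.exp(dlogL - dlogL.max())          # stabilized importance weights
    n_eff = w.sum() ** 2 / (w ** 2).sum()    # effective sample size
    return w, n_eff                          # rerun MCMC if n_eff << 1e4
```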

D. Results
Below, we present the results of the subspace analyses of the mock power spectrum data. Unless otherwise specified, we use a single realization of mock data, set the fiducial covariance to that of Ref. [13] (including both Gaussian and non-Gaussian components), and use a sample covariance from 2000 Patchy mocks. In all cases, we plot the derived parameters H_0, Ω_m and σ_8; in the true Patchy cosmology, these take the values 67.77 km s^{-1} Mpc^{-1}, 0.3071 and 0.8288 respectively.

Bias in the Noiseless Limit
The first important check is whether the subspace analysis gives an unbiased estimate of cosmological parameters in the limit of zero noise. Fig. 1 displays the posterior contours when the above analysis is applied to the mean-of-mocks data-set, comparing 96 uncompressed power spectrum bins, 48 SVs and 12 SVs. Comparing the modal parameters to the true Patchy cosmology (from which the mock data is generated) shows excellent agreement in all cases; furthermore the posterior shapes are consistent between all data-sets, with no noticeable increase in the parameter variances due to the subspace projections. This matches the theoretical calculation of Sec. III A, which asserted that the subspace likelihood would give an unbiased estimate of the mean, but a small increase to the parameter covariance; we here confirm that this increase is negligibly small even when using N SV = 12.

Bias from a Single Realization
Next, we consider the analogous results using data from a single Patchy mock, emulating the true observational sample. As discussed in Sec. III B, we expect some parameter shifts as a result of the differing noise properties, since data-vectors with lower N_SV include fewer noise-dominated modes and thus, on average, should be less biased by noise. This dependence is, however, slight, and we expect some variation between realizations. These results are shown in Fig. 2 and confirm our expectations; there is a noticeable shift in parameters (in particular Ω_m) as we move from 96 to 48 to 12 bins in the data-vector. Notably, the noise-induced bias for this sample decreases as we use fewer SVs, which are less affected by noise. In any case, although the noise properties of the subspace likelihoods differ somewhat from those of the true data, the results of the previous subsection confirm that the analysis is not biased as a result.

Bias from Precision Matrix Noise
Perhaps the most important test is whether restricting to a small number of subspace coefficients allows us to use fewer mocks. To test this, we simply run the likelihood analysis with a single mock data-set and two sets of covariance matrices: one generated from 2000 mocks, and one from only 125 (cf. the factor of eight by which we optimally compress the data). These results are shown in the first two data-sets of Fig. 3, and we immediately note large shifts in the best-fit parameters for the uncompressed likelihood analysis, alongside an inflation of the parameter error bars, matching the predictions of Sec. III C. Notably, these shifts dramatically decrease as we compress the data and, for N_SV = 12 (the value suggested by the constraints on ∆χ² in Sec. II A), they are insignificant. This is the main result of this paper: we can obtain unbiased estimates of cosmological parameters with just O(100) mocks when using optimal subspace projections.
Whilst the above discussion correctly shows the bias to the cosmological parameters sourced by precision matrix noise, it does not accurately reflect real analyses, which usually account for this by inflating the parameter error bars. To do this, one rescales the parameter covariances by the constant factor¹³ [5]

m_1 = [1 + B(N_bin − N_param)] / [1 + 2A + B(N_param + 1)],   (35)

where A = 2/[(N_mock − N_bin − 1)(N_mock − N_bin − 4)] and B = (N_mock − N_bin − 2)/[(N_mock − N_bin − 1)(N_mock − N_bin − 4)]. This is more complex than Eq. 27 (which is equal to the numerator of m_1), since noise in the covariance matrix inflates the observed parameter error bars in addition to giving a best-fit parameter shift. When rescaling the observed parameter covariances to account for best-fit variation, this must be accounted for, giving the reduced shift of Eq. 35. The m_1 numerator (equivalently, Eq. 27) gives the necessary parameter covariance inflation relative to the N_mock → ∞ case, which is a good indicator of the efficacy of the subspace decomposition; for N_mock = 125 and N_param = 10, we obtain 1 + B(N_bin − N_param) ≈ 4.3, 1.5, 1.0 for N_bin = 96, 48, 12, whilst m_1 ≈ 3.0, 1.3, 1.0 for the same data. The results including m_1 are shown in the third and fourth data-sets in Fig. 3; we note that the width of the one-dimensional histograms for the N_mock = 125, N_bin = 96 case is inflated by ∼2×, as expected. Following the rescaling, the results for N_mock = 125 and 2000 are broadly consistent, though with a significant loss of precision, or equivalently, effective survey volume.

Footnote 13: It is important to note that we focus only on data-vectors that are not used to build the sample covariance. For this reason, we do not include the M_2 factor used in the mock catalog analyses of Refs. [5,63]. Our m_1 factor is equivalent to M_1² in the notation of Ref. [63].

Changing the Fiducial Covariance
An important hyperparameter is the fiducial covariance, C, used to define the subspace. In Sec. III D, it was stated that, whilst any (invertible) choice of fiducial covariance would lead to an unbiased estimate of cosmology when averaged over noise, noisy estimates, or those far from the true covariance, would lead to greater noise-induced shifts for small numbers of SVs. To test this, Fig. 4 compares the posterior predictions from a single mock power spectrum analyzed with three fiducial covariances: (1) the analytic covariance discussed above (with off-diagonal terms); (2) the diagonal of the Patchy covariance matrix from 2000 mocks; and (3) the full Patchy covariance matrix. In all cases, the data covariance is held constant.
Notably, we observe no significant differences between posteriors when using N_SV = 48, but larger shifts with N_SV = 12. In particular, we note a bias from using the Patchy fiducial covariance; this is attributed to the large off-diagonal noise present therein, which leads to less efficient subspace decompositions. This conclusion is strengthened by the results with the diagonal of the covariance matrix; whilst this does not accurately represent the true covariance (since window effects induce non-trivial mode-coupling, and hence off-diagonal terms are present even if one assumes Gaussianity), it has low noise, and is consistent with the analytic prediction. We thus conclude that it is important to have a relatively smooth estimate of the fiducial covariance (though not necessarily one that matches the true covariance), else the low-SV results will be significantly biased by noise.

Changing the Data Covariance
Can consistent results be obtained from analyses using different data covariances? This question extends beyond subspace analyses, but can be well probed in our formalism, since we can separate out the effects of precision-matrix noise by reducing the number of basis coefficients. In Fig. 5, we display the results from a selection of covariance matrices: the 2000-mock Patchy covariance, the analytic prescription of Ref. [13] (including trispectrum and supersample terms), and an analytic Gaussian covariance. In the conventional analysis using 96 power spectrum bins, there are significant (∼0.5σ) shifts in Ω_m when different covariances are used, especially between Patchy and the analytic prescriptions. Notably, adding the trispectrum terms to the covariance does not appear to induce a significant parameter shift. As N_SV decreases, these differences become small and, by 12 SVs, we report no obvious deviations in the best-fit parameters when using different covariances.
From these results, one may conclude that, for a BOSS-like sample volume and tracer density, the trispectrum terms in the covariance do not alter the output parameter posteriors. Further, there is a bias induced by using the publicly-available Patchy covariance, which disappears as the number of bins is reduced. This can thus be attributed to residual noise in the covariance matrix, and must be carefully taken into account to avoid biasing the output cosmology, for example by using the subspace-based analysis. These conclusions agree with the results of Ref. [64], which analyzed the BOSS data using the perturbation theory covariance matrices [13].

FIG. 4. Results are plotted for the analytic covariance of Ref. [13] (green, full lines), the diagonal of the sample covariance from 2000 Patchy mocks (red, dashed lines), and the full Patchy covariance (blue, dotted lines). We additionally show the posterior from the mean-of-mocks analysis of Fig. 1 (yellow, dot-dashed lines), and note that all likelihoods use the full 2000-mock Patchy covariance in the likelihood. For 48 SVs, there is little difference between choices of fiducial covariance, but some shifts appear for 12 SVs. We attribute this to noise in the SVD basis vectors induced by using noisy fiducial covariances (i.e. Patchy).

V. EXTENSIONS

A. Dependence of the Number of SVs on Priors and the Survey Volume
In Sec. IV B, we found that, for the BOSS sample used in this work, 12 SVs were needed to ensure that the mean subspace χ² was consistent with the usual power spectrum χ² to within χ²_min = 0.1. This is not a general statement, since it depends on the fiducial covariance matrix, the cosmological model, and our choice of priors. In the above, we opted to use broad priors, implying a lack of knowledge of the posterior; tighter priors lead to smaller variation of χ² across the parameter manifold, thus requiring fewer SVs. A simple test of this is to repeat the SVD for power spectrum samples drawn from the posterior rather than the prior template bank. This is simply done, and represents the minimum number of SVs which can be safely used, since it is the number one would obtain if one started with full knowledge of the posterior space. In this case, we find that only 8 subspace coefficients are required, rather than 12. This is fewer than the number of model parameters, indicating that the parameters are partially degenerate.
The required number of SVs also depends on the observational volume. If the survey size is increased by a factor f, we expect the covariance to fall by a factor f, and thus the SVD matrix D_α to scale as f^{1/2}. From Eq. 13, we may mimic this by choosing N_SV such that the cumulative signal-to-noise of the excluded SVs is less than χ²_min/f, rather than χ²_min. For the priors considered herein, increasing the survey volume by a factor of 10 (making it comparable to the DESI volume), whilst making the unrealistic assumption that there is still only one tomographic bin, increases the number of required SVs from 12 to 16. This increase is due to the non-linear parameter dependencies becoming more important (with respect to noise), though we note this is still a small number compared to the 96 power spectrum bins, due to the steepness of the D_α spectrum.
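In practice, this rescaled criterion is a one-line change to the truncation rule; a sketch, reusing the singular values D_α of the template bank:

```python
# Truncation criterion rescaled for survey volume: for a volume increase
# by f, demand that the excluded modes contribute less than chi2_min / f
# (equivalently, f * deficit < chi2_min). `D` holds the template-bank
# singular values; `n_bank` the number of templates.
import numpy as np

def n_sv_for_volume(D, n_bank, f=10.0, chi2_min=0.1):
    deficit = np.cumsum(D[::-1] ** 2)[::-1] / n_bank  # Eq. 12 tail sums
    return int(np.argmax(f * deficit < chi2_min))     # Eq. 13, rescaled
```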

B. Application to Bispectra
FIG. 5. Parameter posteriors using different data covariances in the likelihood: the sample covariance from 2000 Patchy mocks, the analytic prescription of Ref. [13] (red, dashed lines), and an analytic Gaussian covariance (blue, dotted lines), alongside the mean-of-mocks analysis of Fig. 1 (yellow, dot-dashed lines). In all cases, the fiducial covariance (used to define the subspace basis vectors) is set to the analytic covariance of Ref. [13], and we analyze a single mock data set with differing compressions. Using 12 SVs, the posteriors are identical for each covariance, though there are larger noise-induced discrepancies when using greater numbers of bins.

Whilst the main focus of this paper has been the galaxy power spectrum, the subspace decomposition is fully general and can be applied to any observable, given a theory model and a set of priors. Of particular interest is the galaxy bispectrum, since this statistic generically has a large number of bins, which has limited its applicability in previous mock-based approaches. Any formalism that is able to substantially compress the bispectrum, whilst retaining its information content, is thus of great importance, allowing mock-based analyses to take place in reasonable computation times. To test the applicability of our method to the bispectrum, we may simply ask: how many SVs are needed to reproduce the bispectrum likelihood to within ∆χ² = 0.1?
For this forecast, we consider a simplified scenario: a survey with the effective volume and redshift of the largest BOSS data-chunk, but with a periodic-box geometry. We assume the same power spectrum model as above (one-loop effective field theory), keeping the k-binning constant (i.e. 48 k-bins for each of the monopole and quadrupole). For the (redshift-space monopole) bispectrum, we use tree-level theory as in Ref. [66], for k in [0.01, 0.15] h Mpc^{-1}. This gives a total of 2135 bispectrum bins and 96 power spectrum bins. To generate the subspace basis vectors, we also require a fiducial model for the joint covariance (Sec. II A); for this, we assume a Gaussian covariance for both observables (matching that of Ref. [66]), noting that the exact choice of fiducial covariance is of limited importance in the analysis. For simplicity, we assume the power spectrum and bispectrum to be uncorrelated, i.e. that they are observed in separate regions of the sky. By instead inserting a Gaussian model for the cross-covariance [67], we have verified that this assumption does not have a significant impact on the observed χ², or on the number of SVs.
A set of 10⁴ template bank samples is then computed from the prior, and the SVD performed. As in Sec. II A, the output SVs can be used to assess the impact on the prior-averaged χ² of including only a subset of all bins; in Fig. 6, we plot the χ² deficit for analyses using (a) only the power spectrum, (b) only the bispectrum, and (c) their combination. As previously noted, there is a steep dependence of ∆χ² on the number of SVs, with negligible contributions from the higher basis vectors. As before, we take the optimal N_SV to be the minimum number which yields a total (prior-averaged) χ² error below 0.1; this is found to be 12 for the power spectrum, 9 for the bispectrum, and 21 for their combination. Further, any correlations between the observables are expected to reduce the combined number. The efficacy of this method is impressive; we can compress a combined data-vector with 2231 bins into just 21 coefficients (even in the presence of non-linearities in the likelihood), which would allow easy analysis with O(100) mocks.
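Constructing the joint subspace requires only stacking the data vectors and covariances; a sketch under the uncorrelated-probes assumption (names illustrative; `build_subspace` is the helper sketched in Sec. II A):

```python
# Joint power spectrum + bispectrum compression: stack the template banks
# and use a block-diagonal fiducial covariance, matching the assumption
# that the two probes are uncorrelated.
import numpy as np
from scipy.linalg import block_diag

def joint_bank(P_bank, B_bank, C_P, C_B):
    X = np.hstack([P_bank, B_bank])   # (N_bank, 96 + 2135) data vectors
    C_fid = block_diag(C_P, C_B)      # uncorrelated P and B blocks
    return X, C_fid                   # pass to build_subspace(X, C_fid)
```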
A similar analysis is possible for the combination of full-shape analyses, such as that presented herein, with BAO information from reconstructed power spectra (as in Ref. [46]). This is straightforward to test; we simply generate a template bank of the power spectrum multipoles coupled with the Alcock-Paczynski parameters, α = {α_∥, α_⊥}, apply the SVD, and supplement the fiducial covariance with the measured joint covariance of α and the multipoles discussed in Ref. [46]. Here, we find that only a single additional SV is required to accurately approximate the (significantly more constraining) χ².

FIG. 6. Impact on χ² from using a finite number of basis vectors in the likelihood, for mock analyses using the power spectrum, the bispectrum, and the combination of the two probes (as described in the main text). For each case, we plot the difference between the true χ² (from all N_bin bins) and that using N_SV SVs. The horizontal line indicates ∆χ² = 0.1; beyond this point, the sum of all remaining SVs contributes less than 0.1 to the total log-likelihood. For the power spectrum analysis, 12 basis vectors are required to reach this limit (compared to 96 bins), whilst for the bispectrum, only 9 are needed (compared to 2135 bins). In combination, one needs 21 SVs, though this is an upper bound, as the two observables are assumed to be uncorrelated.

VI. SUMMARY
In this work, we have introduced a formalism to reduce the dimensionality of cosmological observables by linearly projecting the observables into χ²-maximizing subspaces. Given only a theoretical model and a set of priors, a set of model predictions can be computed and used to define a quasi-Euclidean metric space via a singular value decomposition. This provides a natural compression of the observables; by restricting to the first N_SV basis vectors of the subspace, the dimensionality of the data is significantly reduced whilst the likelihood remains almost unchanged. Observables such as the power spectrum and bispectrum can be represented with O(10) subspace coefficients, incurring χ² errors of < 0.1 over the broad prior manifold, even in the presence of non-linearities in the likelihood. Whilst the method has some dependence on an assumed fiducial covariance, this does not bias the inference, and only a simple (low-noise) estimate is required.
The principal appeal of this concerns covariance matrices: likelihood analyses are inherently sensitive to the precision matrix noise induced by the use of finite numbers of mocks, which is drastically reduced by data compression. Indeed, we prove that restricting to low-dimensional subspaces significantly reduces the induced parameter biases, whilst keeping the noise-averaged constraints unchanged (except for a slight increase in the parameter covariances, which is shown to be negligible in practice). Moreover, due to the rescaling factors needed to ensure that the estimators are unbiased in the presence of noise, our subspace likelihoods give more precise parameter constraints when the number of mocks is small.
Our method is validated by application to mock BOSS DR12 power spectrum multipoles. In particular, we show that the 96-bin power spectrum data can be robustly compressed to a set of 12 subspace coefficients, allowing for accurate parameter estimation using only 125 mocks. The required number of coefficients has only weak dependence on the prior space; adding maximally restrictive priors would allow the number of coefficients to be reduced to only 8 in our 10-parameter model. Whilst we have assumed a simple ΛCDM model in this work, this is by no means a restriction; the formalism can apply to any extension for which model spectra can be evaluated, and can further include arbitrary parameters describing systematics. Whilst one should recompute the basis vectors given a new cosmological model, this is not a major concern, since creating the template bank requires computing far fewer spectra than are needed for a well-converged MCMC analysis.
It is useful to compare our methodology to the SVD-based techniques used to de-noise the power spectrum and bispectrum covariance matrices in Refs. [68-70]. In this approach, one diagonalizes the covariance matrix and uses only the highest signal-to-noise (S/N) eigenvalues in the likelihood analysis. In contrast to S/N, we use the ∆χ² from variations of all relevant parameters as a criterion to define the subspace projections, which allows us to better capture the eigenmodes that are sensitive to features in the power spectrum shape. Indeed, we have seen that our method can fully extract the information encoded in the BAO wiggles, which contribute very little to the overall S/N, but dominate the distance constraints.
With the addition of just a small number of extra coefficients, the method can also apply to larger volumes (where non-linearities in the likelihood become more important, and thus Fisher-based compressions fail), as well as to more complex statistics, such as bispectra or combinations with BAO constraints. For the BOSS sample analyzed herein, we need only 12 SVs to encapsulate the 96 power spectrum bins; this rises to 16 for a ten-times larger survey (like DESI), and only a further nine are required to incorporate the ∼2000-bin bispectrum. An additional possibility is projected statistics such as lensing; the large number of bins across different redshifts in 3x2pt analyses could be substantially reduced using SVs. In general, the exact number of subspace coefficients depends both on the chosen parameters and on the experimental set-up (e.g., survey redshift and volume), and should thus be computed separately for each experiment, but this is a trivial operation to implement once the SVD has been performed.
The methods developed herein provide a useful sandbox in which to investigate an important question in cosmology: how parameter estimates depend on power spectrum covariances. Focusing on the largest data-chunk of BOSS, we find that the inclusion of off-diagonal trispectrum terms in the covariance does not affect the parameter estimates, though this will generically depend on the shot-noise and k-binning. Further, in the uncompressed likelihood, there is a ∼0.5σ shift in the best-fit posteriors for Ω_m using a covariance drawn from MultiDark-Patchy mocks compared to that using analytic covariances, leading to a bias away from the true cosmology (and in the direction of increasing tension with Planck). In contrast, using a small number of subspace coefficients, all covariances yield consistent results, indicating that (a) there is residual noise in the mock-based covariance that can bias the inferred cosmological parameters even if 2000 mock catalogs are used, and (b) analytic prescriptions can be accurately applied. Subspace projection provides a simple way of ameliorating this, by limiting the effects of covariance matrix noise.
Approaches such as that of this work are vital when using statistics beyond the power spectrum. In particular, bispectrum analyses should no longer be considered mock-limited, opening up new and exciting avenues into the exploration of cosmological data-sets. The future grows brighter for higher-order statistics.
Appendix A: Positive Semidefiniteness of ∆Φ

Here we show that the parameter covariance shift ∆Φ of Eq. 23 is positive semidefinite. The covariance shift is related to the Fisher matrix shift by

∆Φ = F̂^{-1} − F^{-1} = −F̂^{-1} ∆F F^{-1},   (A1)

so it suffices to show that ∆F is negative semidefinite, since inversion reverses the ordering of positive definite matrices. Rotating to a set of transformed parameters ψ with diagonal covariance, the full Fisher matrix is simply F_ij = δ^K_ij / σ_i², where σ_i² is the variance of parameter ψ_i. The Fisher matrix in the subspace defined by N_SV ≤ N_bin SVs is given by (working in the Euclidean metric)

F̂_ij = Σ_{α=1}^{N_SV} (∂c_α/∂ψ_i)(∂c_α/∂ψ_j),

such that the Fisher shift is ∆F_ij = −Σ_{α>N_SV} (∂c_α/∂ψ_i)(∂c_α/∂ψ_j). Since N_SV ≤ N_bin and the derivatives are real, each component of the excluded sum is positive. Contracting this with an arbitrary vector x, one can easily show ∆F to be negative semidefinite,

xᵀ ∆F x = −Σ_{α>N_SV} (Σ_i x_i ∂c_α/∂ψ_i)² ≤ 0.

From Eq. A1, the parameter covariance shift ∆Φ is thus positive semidefinite.
Appendix B: Parameter Shifts from Noisy Data and Precision Matrices

Expanding the subspace likelihood around the true parameters θ* and solving for its minimum gives the leading-order shift in the best-fit parameters (Eq. B5), agreeing with standard results (e.g., Eq. 23 of Ref. [2]). To simplify this further, we note that, at first order, the relevant prefactor is just Φ = F^{-1}, the standard parameter covariance. A number of well-known results are apparent from Eq. B5: (a) the best-fit parameters are shifted by noisy data; (b) the amplitude of the shift is proportional to the parameter covariance; (c) noise in the precision matrix gives an additional shift in the parameters, though this is only present when the data are themselves noisy. Note that we have made no assumptions thus far about the number of SVs used in the α, β summation; the above results thus indicate that, for noise-free data, we will never incur a bias from using too few SVs.
To quantify the extent of these parameter shifts, it is easiest to consider the two regimes separately, first setting δΨ = 0. The covariance of δθ is then

⟨δθ_i δθ_j⟩ = Φ_ij,   (B6)

using ⟨n̂_α n̂_β⟩ = C̃_{D,αβ} = Ψ^{-1}_{αβ} and ignoring noise contributions to δF_ij. This is a standard result: the covariance of the parameter shift δθ is just the inverted Fisher matrix. Combining Eqs. B5 & B6, we find that precision matrix noise induces an extra shift in the best-fit parameter vector (Eq. B8). When the error in the precision matrix arises from estimating the sample covariance with too few mocks, the covariance of this extra parameter shift can be shown to be

⟨δθ_i δθ_j⟩_Ψ = B (N_SV − N_param) Φ_ij   (B9)

[2], where N_param is the length of the parameter vector and B is as in Eq. 27. This just inflates the usual parameter covariance by a constant factor which, for N_mock ≫ N_SV, is well approximated by (N_SV − N_param)/N_mock.
Appendix C: Analytic Marginalization

In some scenarios, one has a parameter, say ζ, that is additionally constrained to be positive (for example the shot-noise, if an estimate has not already been subtracted). Analytic marginalization can still be performed in this case, though the expression for the log-likelihood is more complex, since the Gaussian integral over ζ ≥ 0 introduces error-function terms; here, ζ̄ and σ_ζ define the Gaussian prior on the positive parameter, and C_{D,M} does not include ζ.