Future prospects on testing extensions to Λ CDM through the weak lensing of gravitational waves

,


I. INTRODUCTION
Gravitational Waves (GWs) are becoming a powerful cosmological tool.In 2017, coincident GW [1] and electromagnetic (EM) [2,3] observations of a merging binary neutron star system heralded the beginning of multimessenger GW cosmology.This event, named GW170817, was used to measure the present expansion rate of the Universe H 0 .The associated gamma ray burst GRB170817A allowed localisation of the binary to its host galaxy, providing a redshift.This, combined with the distance to the source found using properties of the GW waveform, gave a value of H 0 = 70.0+12.0 −8.0 km s −1 Mpc −1 [4].Precise, independent measurements of H 0 are pertinent due to the apparent tension between local Universe measurements of H 0 (such as Type Ia Supernovae [5] and lensed Quasars [6]) and the value inferred from the Cosmic Microwave Background [7].Just 50 multimessenger events similar to GW170817 would give us sufficient precision on H 0 to favour the early or late universe measurement [8].The planned improvement to the current LIGO-Virgo-Kagra (LVK) network [9] makes this goal highly achievable within the next few years, motivating exploring the further possibilities of using GWs, beyond a standard siren measurement of H 0 .By the 2040's, several other GW detectors (Einstein Telescope [10], Cosmic Explorer [11], LISA [12]), both ground-and spaced-based, will be either taking data or in the later stages of their development.The combination of * c.mpetha@ed.ac.uk these detectors will allow an exploration of more GW frequency ranges (source types) than LVK, and with much improved sensitivity.This will dramatically increase the scope of GWs for astrophysics and cosmology.Large numbers of GWs from binary systems will be ideal for statistical cosmological tests.This study aims to evaluate the prospects of the weak lensing of gravitational waves, where correlations in the small perturbations of the GW propagation can be used to infer properties of the intervening matter distribution.Such observations open up the possibility of using GWs as a novel probe of not only the geometry of the Universe, but also of large scale structure.
There are a few advantages of performing a weak lensing analysis using GWs instead of galaxies, besides it being a novel probe.It is a very clean measurement, avoiding systematics such as intrinsic alignments or blending which plague present and future weak lensing studies [13].GW detectors will also observe mergers to much higher redshifts, up to z ∼ 100.As in the standard siren case, the limiting factor in this analysis is the redshift determination.If there was a reliable GW only redshift inference method, for example sufficient source numbers for a 'Spectral Siren' analysis [14], then properties of the large scale matter distribution could be probed up to very high redshifts.Although the lensing window function peaks at low redshift, direct observation of the 'Dark Ages' would give us valuable observational information on structure formation, such that we do not need to extrapolate the properties of structure in this redshift regime from models.The key challenge of GW weak lensing compared to galaxy weak lensing is the difficulty of observing a GW arXiv:2208.05959v4[astro-ph.CO] 23 May 2024 and distinguishing it from detector noise.
The possibility for using the weak lensing of GWs (GW-WL) as a probe of cosmology was first discussed in Ref. [15] and developed in Ref. [16].GW-WL forecasting was performed in Congedo and Taylor [17] (henceforth CT19), where it was demonstrated a joint analysis of GWs, utilising both the standard siren distance measurement and the matter field information encapsulated in their weak lensing, can break key degeneracies giving better constraints than would be obtained individually.WL is largely insensitive to the Hubble parameter H 0 [18], while a joint standard siren+GW-WL analysis is sensitive to the expansion rate.The standard siren measurement provides a tight constraint on geometry parameters such as the expansion rate H 0 and matter density fraction Ω m , without relying on an external data set.In CT19 the authors assumed a number density of GW sources of 1 deg −2 (15 000 sources), and that each was a well localised event similar to GW170817 (a binary neutron star merger with an EM counterpart).But the performance of future detectors is currently speculative, and highly dependent on the unknown distribution and number of binary mergers in the Universe.To fully explore the potential of cosmological constraints through a joint analysis of standard sirens and GW-WL, a range of assumptions of source properties is needed.The analysis should also be extended to varying models of dark energy and a varying neutrino mass -parameters of interest over the coming decades.
Here we build upon the work of CT19.Tomographic weak lensing is used to exploit all available cosmological information.A variety of assumptions on source and detector properties are made, and we extend to more cosmological models to explore how GWs can be used to answer present problems in cosmology.We also include sources without an associated EM counterpart, which will comprise the bulk of future GW detections.We find that combining two merger populations (sources with and without an EM counterpart) helps break parameter degeneracies and improve constraints in the w 0 − w a and Ω m − S 8 planes, where the clustering parameter S 8 = σ 8 Ω m /0.3.We show how a cosmologically varying source redshift distribution in the weak lensing analysis can further improve dark energy constraints.This is an advantage of gravitational wave studies where an astrophysics motivated cosmic merger rate density is possible.For a combination of GW detectors in the 2040's with EM telescopes, we find a standard siren+weak lensing analysis will be capable of improving upon dark energy equation of state parameter constraints from galaxy surveys such as Euclid [19] and the Vera C. Rubin Observatory [20], and intensity mapping surveys such as HI-RAX [21].As we extend further into the future, if DeciHz GW detectors such as DECIGO [22] and the Big Bang Observer [23] come to fruition then the weak lensing of GWs will not only provide ultra precise constraints on geometry parameters, but could even give us valuable information on structure formation [24] and constrain the sum of neutrino masses Σm ν .This is the first demonstration of GWs from merging binaries being an independent probe on neutrinos.We show that, by combining GWs with redshifts from galaxy surveys, GW-WL will be competitive with other single probes such as The Vera C. Rubin Observatory [25] and CMB-S4 [26] in constraining Σm ν .Even in the most cosmology agnostic model used, νkwCDM (curvature, dark energy and neutrinos all free parameters), we obtain percent or sub-percent level errors on all parameters, apart from Σm ν .The clean and well understood measurement from GWs could be a valuable source of extra information when combining with other future surveys.
Section II details future planned GW detectors, sources of GWs and their use for cosmology.In section III the physics of the weak lensing of gravitational waves is introduced, before the modelling used in this analysis, including source assumptions and detector uncertainties, is described in section IV.Finally the results are presented in section V and we conclude in section VI.

II. STANDARD SIRENS: SOURCES AND FUTURE DETECTORS
Observing gravitational waves from compact object mergers provides the luminosity distance d L of the source, through determination of the amplitude and frequency evolution of the wave.This measurement is model-independent, giving it a clear advantage over, for example, Type Ia Supernovae which depend on calibration of the cosmic distance ladder.This direct measurement of the distance from the waveform makes GWs 'Standard Sirens'.The observed gravitational wave amplitude is given by the linear combination of the + and × polarisation of the GW, combined with their corresponding antenna pattern functions F describing a detectors response to the wave: To leading order, the individual components of the gravitational wave are given by The speed of light is given by c. M c is the redshifted chirp mass, and M 1 are the component masses), f the frequency, ι is the inclination angle and Φ is the phase.Observable binary mergers include binary neutron stars (BNS), a black hole and a neutron star (BHNS) and binary black holes (BBH), and the inspiral of these sources in quite universal.They can, however, be distinguished during the binary merger and ringdown through higher order terms in the Parameterised post-Newtonian formalism of the waveform.For more detail see Refs [27,28].
The expansion rate H(z) at redshift z is related to d L by where H 0 is the present Universe expansion rate (h = H 0 /100 km s −1 Mpc −1 is the dimensionless expansion rate), Ω m , Ω K and Ω DE are the present matter, curvature and dark energy density parameters, while the widely used CPL parameterisation [29,30] of a timevarying dark energy equation of state (EoS) is assumed.w 0 is the dark energy EoS today, and w a is its growth rate with the scale factor a. This functional form can accurately recreate the EoS for alternative dark energy models, such as a scalar field dark energy.Encapsulated in Ω m is the baryonic, cold dark matter and neutrino density components, In ΛCDM most of the matter in the Universe is Cold Dark Matter (CDM), dark energy is a cosmological constant Λ (Ω DE = Ω Λ ) with a constant EoS of w Λ = −1 (w 0 = −1 and w a = 0) and there is zero curvature (Ω K = 0).If the redshift of the GW source can also be determined, then a d L −z relation can be used to probe the cosmological parameters in Eqs (4)(5)(6)(7).This redshift determination is the limiting factor for cosmological applications of GWs.Due to the mass-redshift degeneracy in the GW waveform, some method of breaking this degeneracy, or external data, is needed to determine the redshift.GW sources can be broadly split into two categories, depending on whether they can or can not be localised to their host galaxy.These two categories are Bright Standard Sirens (BSS) and Dark Standard Sirens (DSS).BSS represent an ideal case for cosmology.Either an associated EM counterpart, or very precise sky localisation due to a high SNR, allows an accurate source redshift determination [31].For multimessenger events, there is the possibility to probe deviations from GR through comparing the EM and GW propagation [32].In the case of DSS, there is no associated electromagnetic counterpart and the sky localisation is too poor to identify the host galaxy.These sources make up the bulk of gravitational wave detections, therefore statistical redshift inference methods for large numbers of DSS are receiving more attention in the ramp up towards 3 rd generation (3G) GW detectors.
BNS mergers may produce an observable EM counterpart, depending on the merger distance and inclination angle.Supermassive black hole (SMBH) and BHNS mergers may produce an EM counterpart, but for SMBH mergers their rate is too small for statistical cosmology applications [33].For BHNS mergers their rate is uncertain, as is the fraction of these that will produce a detectable counterpart, though the expectation is they will mostly be DSS [34].Most DSS are stellar binary black hole (BBH) mergers, and for these several redshift inference methods have been proposed in the absence of a counterpart.These include: a statistical average of galaxies within the GW localisation error box [31,35,36] which has already been performed on LVK data [37], correlating the clustering of GW sources with the clustering of galaxies [38,39] or another tracer such as HI intensity mapping [40], by breaking the mass-redshift degeneracy using information on the source frame mass distribution [14,41,42], tidal corrections in the late-inspiral signal of a BNS merger [43], or an expected cosmic merger rate density of sources [44,45].
The so-called 3G ground-based GW detectors expected during the 2030's-2040's include the triangular configuration Einstein Telescope (ET), and Cosmic Explorer (CE) which is similar to LVK with longer arm lengths.In the same time period we will see the first space-based GW detector, LISA.The combined ground-based detectors will observe in the frequency range 1 − 10 3 Hz and be sensitive to most BBH and BNS mergers in the Universe [42,46], observing O(10 4 −10 6 ) binary mergers every year [47,48].The large uncertainty is due to the uncertain cosmic merger rates for different binary populations (see e.g [49] for a summary).LISA, observing in the 10 −4 − 1 Hz frequency range, will observe SMBH inspirals, stellar binaries before their inspiral and merger is detected by ground-based observatories [50], as well as many extreme mass-ratio inspirals (EMRI), which can also be used for cosmology [51].Space-based GW detectors observing in the DeciHz regime, such as DECIGO [22] and Big Bang Observer (BBO) [23], are possible successors to LISA and will bridge the gap between these two frequency ranges.Their design is far more ambitious, with four LISA-like constellations (with smaller arm lengths) in heliocentric orbits, two of which at the same location and the other two distributed around the Sun.They could observe the inspiral of binaries over periods of months before they enter the ground-based frequency range during their merger.This, coupled with their changing position and orientation, allows observations of similar numbers of mergers to 3G detectors but with much improved sky localisation, sufficient to localise a merger to its host galaxy even in the absence of a transient EM counterpart.For example BBO could contribute O(10 5 ) highly localised (∼ few arcsec) binary mergers per year [15].But while 3G ground-based detectors such as ET and CE are in the later stages of their development, the difficulty of space missions makes the future of detectors such as DECIGO and BBO less certain, and strongly dependent on the success of LISA.
To summarise, from 3G detectors we can expect O(10 5 − 10 7 ) sources observed by ground-based observatories by the end of the 2040's, and comparable numbers with much improved sky localisation in the second half of the century should space-based DeciHz detectors be launched.

III. WEAK LENSING OF GRAVITATIONAL WAVES
The gravitational potential that is created by largescale structure in the Universe induces small perturbations in the path of propagating radiation.The radiation is weakly lensed, and the resulting magnification of both electromagnetic [52,53] and gravitational [54] radiation in the geometric optics regime is given by where κ is the lensing convergence, a weighted projection of density perturbations along the line-of-sight.
For GWs the situation is seen in Fig. 1.The amplitude of the wave is increased as the wave is lensed by an overdensity, leading to a smaller inferred luminosity distance.By relating the lensed and unlensed fluxes we obtain an expression for the observed, lensed luminosity distance d obs L in terms of the true d L and the convergence: There is also a small modification to the gravitational wave phase.In the geometric optics limit any phase fluctuations will just correspond to an arrival time shift making them observationally irrelevant in the case of weak lensing [55].The strong lensing of gravitational waves, in the wave optics regime, is another science application of lensed GWs.In this case, it is useful for constraining properties of the lens and source populations [56], or as a further test of geometry [57].To use GWs to explore the large scale matter distribution we turn to their weak lensing.
From a set of many mergers, a best fit d L − z relation can be constructed, sensitive to cosmological parameters.This is the standard siren measurement.The weak lensing of each source will create scatter around this best fit curve of order 1%, and due to the cosmological principle an average from large numbers of sources will recover the true relationship.Hence from this scatter, each source provides a point estimate of the convergence field.Though the lensing error dominates towards higher redshifts, uncertainties introduced by the source redshift, distance measurement and peculiar velocities of the sources make this is a noisy estimate of the convergence field.
Sources, in our case binary mergers, are binned in redshift intervals.The variance of κ at different multipole modes ℓ (the Fourier conjugate of the angular separation) across tomographic redshift bins i and j is found through a two-point correlation function of the Fourier transformed convergence field in each bin, where C κκ (ij) (ℓ) is the convergence power spectrum.This power spectrum is sensitive to cosmological parameters, and provides a complementary probe of information to the standard sirens, through a different filtering of the same data.The auto-correlation for each redshift bin corresponds to i = j.Cross-correlating between bins adds information on the time variation of cosmological parameters.The convergence power spectrum between the i th and j th redshift bin is given by where is the Window function, containing the normalised observed source distribution p(z) and the effect of the lens on this source due to their separation.The comoving distance r is related to the transverse comoving distance where K is the curvature.The Limber approximation for k-modes, k = (ℓ + 1/2)/K(r), is shown in the matter power spectrum P δ (k, z), and is used for large values of ℓ.A common parameter to normalise the matter power spectrum is the linear normalisation of its amplitude in spheres of 8 h −1 Mpc, σ 8 [58], where W (k, R) is a spherical tophat window function.

IV. MODELLING BINARY MERGER OBSERVATIONS A. Properties of binary merger populations
In this study we use certain assumptions on the properties of the BSS and DSS, summarised by the following.
-For 3G detectors, all BSS events are assumed to be from merging binary neutron stars (BNS), while DSS are from both binary black hole (BBH) and black hole neutron star (BHNS) mergers.Though many observed BNS will also be DSS, BBH and BHNS will be the dominant contributors.For De-ciHz detectors we assume well-localised BBH and BHNS mergers also contribute to the BSS sample.This is because for these detectors the constellations of satellites distributed around the Sun provide excellent source sky localisation [23], allowing many GWs to be localised to a host galaxy from which a redshift can be determined.
-For DSS, a localisation uncertainty of 1 deg 2 is adopted [59,60].The redshift is determined by a statistical average of galaxies in the localisation error box.We set Here, σ 0,DSS = 0.03 representing expected photometric uncertainties from the Vera C. Rubin Observatory [61].Future high redshift spectroscopic galaxy surveys, such as MegaMapper [62], could allow a statistical DSS method with spectroscopic redshifts (spec−z).The two functions f (z) and g(z) represent the GW detector and galaxy survey selection functions respectively, and are discussed further below.
-For DSS, z max = 3.5, from the limit of future photometric galaxy surveys [65].These galaxies are needed to estimate the source redshift.
-There are typically expected to be a factor of 10 more BBH than BNS detections [10,[66][67][68].Only a fraction of BNS mergers will have EM counterparts.
From this we assume N DSS /N BSS = 100 for 3G detectors, consistent with current LVK detections [69].
The observed redshift distribution of binary mergers p(z) is a combination of different factors, For a given population of binaries, p (th) (z) is a theoretical probability distribution derived from a theoretical cosmic merger rate density R(z), which can be modelled from the star formation rate and binary synthesis properties.The source number distribution is found from where V c is the comoving volume.We use the BPASS predictions for transients [70], which provides R(z) for different binary populations separately, including BNS, BBH and BHNS.This number distribution is then normalised to find the probability distribution, Depending on GW source and detector parameters such as the sensitivity curve, antenna patterns, inclination angle, compact object masses etc., the GW may or may not be detected.This is modelled by the selection function g(z).In Leandro et al. [71], using a Fisher matrix analysis and SNR limit of 8 for merging BBHs, the authors find for a choice of 3G detector, where the value r cut for BBH mergers is given as 7.9 Gpc, which we adopt for the DSS case.This form is found by fitting a distribution of realistic sources that are above the SNR threshold as a function of comoving distance.In the BSS case a value of r cut = 4 Gpc is found by assuming the same functional form, but with a detector redshift limit of z = 2, as will be the case for BNS observations.The inclusion of the galaxy survey selection function f (z) in Eq. ( 17) is important as we are assuming the redshift determination of merging compact objects is through galaxy observations, either from host galaxy determination, or from a statistical average of galaxies in a photometric galaxy survey.The inclusion of a source in the observed distribution depends on whether or not it has an associated redshift, hence depends on a relevant galaxy survey selection function.A simplistic choice has been made in this case, of a tophat function with a probability of 0.9 (completeness), then a smooth decay before reaching the maximum redshift.The shape is determined by a pivot redshift (z pivot = 1.8 for BSS [72] and 2.8 for DSS [65]), These functions are also used to estimate the redshift uncertainty in the DSS case by acting as a shot noise scaling as shown in Eq. ( 16), recovering the expected behaviour of near-photometric uncertainties at low redshift [51,68], and a rapid dilution to higher redshifts as the number of GW and galaxy observations decreases.The resulting p (obs) (z) is shown for both BSS and DSS in Fig. 2. The sensitivity of the results to the forms of g(z) and f (z) is explored in Appendix A 1.

B. Source and detector uncertainties
When using GWs as standard sirens, there are several sources of uncertainty.The GW detector instrumental uncertainty (σ GW ) is assumed to have a range of values from 0.004d L to 0.1d L .The source redshift uncertainty (σ z ) is defined above.Uncertainty in the redshift owing to the source peculiar velocity (σ vpec ) is given the form An rms value of √ ⟨v 2 pec ⟩ ∼ 500 km s −1 is adopted [73].Finally is the uncertainty in the true value of d L due to the weak lensing of the GWs (σ WL ).Different predictions for the size of σ WL exist in the literature.A (pessimistic) fit used for Einstein Telescope forecasts [74,75] assumes 5% uncertainty at z = 1 and linearly extrapolates: σ WL /d L = 0.05z.A more appropriate expression derived in Ref. [76] using properties of the magnification probability distribution up to z = 3 is given by Other forms for σ WL exist in the literature [77,78], Eq. ( 24) is used here as a middle estimate.There has been development in 'delensing' the GW, by observing the matter distribution along the line of sight of the GW source [79].This can reduce the lensing uncertaintyhowever it is not useful for our purposes, where we wish to extract information from the lensing of GWs.The combined uncertainty is The observed power spectrum is the underlying convergence power spectrum, modified by a shot noise term due to our finite number of sources.
where n is the number density of GW sources, and δ ij is the Kronecker delta ensuring shot noise only contributes for auto-correlations.Possible high-order correlations between the WL and GW signal are not included.To include the effects of sky localisation on the convergence power spectrum, we assume that a Gaussian kernel damps the signal.The total observed power spectrum is then deconvolved with the same kernel, leaving the Inverse-Gaussian term in the shot-noise error seen in Eq. ( 26) [39].The maximum ℓ mode that can be probed is directly related to the localisation area, the uncertainty blows up past this ℓ mode.If the localisation area is given by A loc = Ω deg 2 , then ℓ max ∼ 180/ √ πΩ.Finally, the covariance is given by where trispectrum terms have been neglected.In this work we assume an observable sky area of 15 000 deg 2 giving f sky = 0.36 based on Euclid.This value considers the sky blocked from view by the galactic disk and zodiacal plane, and while GW observations are not limited by this, host galaxy redshifts are.

V. FORECASTING A JOINT STANDARD SIREN+WEAK LENSING ANALYSIS
The real strength of this approach comes from the combination of the standard siren and weak lensing analyses, where the standard sirens essentially provide a strong constraint on geometry in the weak lensing analysis, without needing to rely on external datasets.The Bayesian framework is outlined in CT19.Analytic derivatives can be found in the standard siren case, and numerical derivatives are needed when finding the Jacobian of the convergence power spectrum.Care is needed for these derivatives as the response of the power spectrum to a cosmological parameter can vary greatly over ℓ-modes and redshift bins.Our solution is to, for each cosmological parameter, use the median value from the optimal steps of all modes and redshift bin correlations, then test it for optimality and stability.More detail can be found in Appendix B. A Fisher information matrix is constructed and used to find the Cramér-Rao lower bound on physical parameters, which the model depends on.These are then mapped to observed parameter uncertainties using Monte Carlo methods, as the mapping is often non-linear.We sum Fisher matrices to combine the results from the standard siren and weak lensing analyses, and all quoted constraints are 1σ marginalised uncertainties.
Class [80] with HMcode [81] is used to compute σ 8 values and the auto-and cross-convergence power spectra between six redshift bins, where bin edges are defined such that there is an approximately equal number of sources per bin.A maximum ℓ mode of ℓ = 3000 is used to stay clear of the highly non-linear regime, where the statistical properties of the convergence field are close to Gaussian [82].Otherwise we could produce overly optimistic forecasts by assuming perfect knowledge of a regime with uncertain modelling.This is due to the importance of baryonic physics on such small scales [83].Nevertheless, there is still an assumption of much improved understanding of non-linearity in the matter power spectrum by the 2040's-2050's.We investigate the effect of decreasing the maximum ℓ mode on forecasts in Appendix C. Fiducial parameter choices and the range of their flat priors are given in Table I, as well as a summary of the cosmological models used in this analysis, demonstrating which parameters are allowed to vary in which models.In this work we set h 2 Ω b = 0.0224 as a fixed parameter, motivated by its strong CMB constraint from the Planck satellite [7].

A. 2040's: 3G GW detectors
In the 3rd generation of GW detectors, consisting of a network of ground-based interferometers such as the Einstein Telescope and Cosmic Explorer, there is expected to be up to ∼ 10 5 yr −1 observations of BNS mergers [84] and ∼ 10 6 yr −1 for BBH mergers [10,11,48].In Ref. [85] the authors forecast the expected number of BNS mergers with an associated gamma ray burst counterpart (making them BSS), using different assumptions on the GW and gamma ray detectors.An upper limit from CE+GECAM of N ∼ 3 × 10 4 (translating to n ∼ 2 deg −2 using our assumed fractional sky area of f sky = 0.36) over a 10 year runtime is found, and the number could be higher when considering other GW and EM detectors.The inclusion of the ET would give comparable or slightly higher numbers-there would likely be numerous coincident detections.Predictions for joint GW+EM observations (gamma ray bursts or X-rays) are also found in Refs [86,87], and are generally comparable or lower, agreeing with the range given in Ref. [85].These numbers are fairly small for statistical cosmological tests.In this case of relatively small numbers of BSS, then it is vital the abundant DSS detections are also used for cosmology.With a redshift determined from a statistical average of galaxies in the localisation error box of the GW, the redshift uncertainties of DSS are expected to be much larger than BSS.Despite this, the deeper redshift range and larger numbers means they can still be informative.
TABLE I. Parameters used in the Fisher forecasting analysis, their flat prior ranges, and the cosmological models in which they are treated as free (✓) or set to constant (✗).kΛCDM, wCDM and νCDM are ΛCDM with curvature, dark energy or neutrinos respectively as a free parameter.The most cosmology agnostic model in this analysis, νkwCDM, treats all these as free.3. Forecasts for combined gravitational wave weak lensing+standard siren constraints on cosmological parameters in the wCDM model (see Tab. I) using realistic numbers for 3G detectors.Bright Standard Sirens (BSS) have a smaller redshift range but more accurate redshift determination through localisation to a host galaxy, while Dark Standard Sirens (DSS) can be observed up to larger redshifts, but their redshift is estimated using statistical methods and so is much noisier.We set the GW detector distance uncertainty to 2% for BSS and 10% for DSS.

Parameter
By the late 2030's, there may be other spectroscopic redshift surveys, for example the proposed MegaMapper [62] which would observe spec−z's of ∼ 10 8 galaxies from 2 < z < 5.While not a benefit to spectroscopic follow up of BNS mergers, whose GW detector selection function drops off around z = 2, MegaMapper could be used to perform statistical dark standard siren redshift estimation using spec−z's.To test this possibility, we reran the analysis replacing the redshift uncertainty in Eq. ( 16) with σ 0,DSS = 0.001.This decreases the combined BSS+DSS parameter constraints by a factor of 0.5 − 0.8. of DSS to the sample.Despite their noisy redshift estimate, they probe higher redshifts leading to an improved Ω m and greater sensitivity to the time-variation of dark energy parameterised by w a .But the larger sky localisation uncertainty means the C κκ ℓ result is degraded, hence the much weaker constraint on the clustering parameter S 8 , which can only be probed through the weak lensing measurement of GWs.The dark energy equation of state parameters can be constrained through the standard siren measurement alone, and this provides the most information.Combining the standard siren measurement with weak lensing tomography, as is done for all results, improves constraints on geometry parameters by ∼ 10%.Using 2D lensing (no tomographic bins) gives marginal added information for geometry parameters.Fig. 4 shows constraints for w 0 and w a in wCDM against source numbers, which is a proxy for detector runtime.Using the uncertainty on the LVK inferred local Universe BNS merger rate R(0) [69], and assuming the same BPASS form for R(z), an estimate for the range of total mergers occurring over all cosmic history can be obtained, as an indication of future detection limits.This is done by normalising the BPASS R(z) to each upper and lower bound of R(0) in turn, substituting into Eq.( 18) and integrating over the redshift range.This range over a 10 year runtime is shown by the red vertical lines.The forecasts for the combined standard siren and GW weak lensing constraints using a population of BSS only and a range of values for the instrumental d L uncertainty from 1% − 10% (green lines) are in broad agreement with those presented in Ref. [85].The combination of these populations of BSS with a population of DSS with N DSS = 100 N BSS and an instrumental uncertainty of 10% (blue lines) demonstrates the importance of including the large number of noisier, but higher redshift, sources.The blue lines stop at nBSS = 10 deg −2 as DSS source numbers greater than 10 3 deg −2 are unlikely, indicated by the LVK inferred limits of all BNS mergers.For expected numbers of sources from 3G GW detectors, by combining BSS and DSS observations, constraints on w 0 and w a will improve upon those by the Euclid 3 × 2pt analysis [88] by a factor of 3−10.The Vera C. Rubin Observatory predicts similar constraints [20,89], therefore a joint standard siren+GW-WL analysis using 3G GW detectors will improve further dark energy constraints from future galaxy surveys.Expected forecasts from 3G detectors for all parameters can be found in Appendix D. Fig. 4 shows how increasing the detector uncertainty of the BSS sources from 1% to 10% in the combined BSS+DSS constraints (blue lines) increases these combined constraints by a factor of 2, indicating that the forecasts are sensitive to the assumed instrumental error.However, there is still a significant gain in information when combining the two merger populations, demonstrated by the difference between the green and blue lines.These forecasts are also sensitive to the observed distribution of mergers and hence on the choice of population synthesis code used, in this case BPASS [70].It has been shown how the choice of the star formation rate and metallicity evolution used by the population synthesis code can have a large impact on the distribution of BBH mergers, less so on the BNS merger distribution [90].Forecasts were also produced using different p(z) assumptions to test the sensitivity of our results to this choice, see Appendix A 2. As expected different redshift distributions can alter the parameter degeneracies and resulting constraints, but this effect is marginal, and does not affect the main conclusions of the paper.

B. 2050's and beyond: DeciHz GW detectors
DeciHz detectors will observe the inspiral of orbiting binaries for months to years before merger as the satellites orbit the Sun.The resulting precise d L determination and sky localisation will allow identification to a host galaxy for many sources, without the need for an EM transient counterpart [15,[91][92][93].This greatly increases the expected number of BSS, comparable to the number of DSS.
In Fig. 4, when we extend to source numbers only possible with DeciHz detectors the precision will be significantly improved (for example see Tab.III).This huge population of sources with accurate redshifts allows weakly lensed GWs to be used for more challenging statistical cosmological tests.The value of the sum of neutrino masses Σm ν is presently only constrained by upper bounds.The most stringent constraint is Σm ν < 0.12 eV (95% C.L.) coming from Planck 2018 including lensing + BAO [7].The value of Σm ν impacts structure formation, and precise determinations could shed light on whether neutrinos follow a normal or inverted hierarchy [95], adding valuable information to the question of whether neutrinos are Dirac or Majorana fermions [96].The normal hierarchy predicts Σm ν > 0.06 eV, while in the inverted hierarchy scenario Σm ν > 0.10 eV, so they can be distinguished in the case where neutrinos follow a normal hierarchy and Σm ν < 0.1 eV.
It is interesting to investigate how informative the weak lensing of GWs can be on the sum of neutrino masses.Refs [97][98][99] explored how combining a standard siren measurement of GWs with other probes (CMB, Supernovae and BAO) could be used to improve constraints on the sum of neutrino masses, but as far as we are aware our work demonstrates the first use of GWs from merging binaries as an independent probe of the sum of neu-FIG. 5. Constraints in the sum of neutrino masses vs σ8 plane in the νΛCDM cosmological model (see Tab. I) from a joint gravitational wave weak lensing+standard siren analysis.zmax = 2 corresponds to merging Binary Neutron Stars with an observed electromagnetic (EM) transient counterpart (spec−z), observed up to a maximum redshift of 2. zmax = 3.5 is a more optimistic population with spectroscopic redshifts up to 3.5, and the inclusion of sources without an EM counterpart (merging black holes or black hole-neutron stars) whose host galaxy has been identified through exquisite sky localisation of the gravitational wave (GW) signal.This is only possible with DeciHz GW detectors.Here it is assumed the instrumental uncertainty σ d L /dL = 0.4% [15].The region left of the black vertical can only correspond to the normal neutrino mass hierarchy (NH), while that to the right can correspond to either the normal or inverted hierarchy (NH/IH).trino masses.We include a more optimistic source distribution where BSS are comprised of highly localised BNS, BBH and BHNS mergers and have an associated spectroscopic redshift up to z max = 3.5, similar to that used in [15] where they assumed z max = 5.Fig. 5 shows constraints in νΛCDM in the Σm ν vs σ 8 plane, which can only be investigated with the added lensing information and so represents an entirely novel use of GWs.We see the degeneracy slope between these two parameters, where the free-streaming of higher mass neutrinos impedes clustering.Because of the improved d L measurement in DeciHz detectors we set σ GW /d L = 0.4% based on the median BBH distance uncertainty at z ∼ 1.5 of Ref. [15].More conservative results can be seen in Appendix E. In Fig. 5 the results for z max = 2 correspond to the same BSS population as in Fig. 3.With the higher redshift population and larger source numbers, a ∼ 1σ preference for NH can be determined with σ(Σm ν ) = 0.04 eV.σ(Σm ν ) = 0.05 eV is still achiev-FIG.6. Constraints on all cosmological parameters in the νkwCDM cosmological model (see Tab. I) from the joint inference analysis of gravitational waves (GWs).This involves combining a standard siren distance measurement with a weak lensing analysis of the GWs.Here we use 3 × 10 6 sources up to zmax = 2 with σGW/dL = 1%.Created using the corner package [94].able with the same population and a much smaller source number, even when increasing the distance uncertainty to 1%.Assuming the smaller number and lower redshift scenario with σ GW /d L = 1% gives σ(Σm ν ) = 0.07 eV.Due to cosmic variance and the assumed f sky , improvement with number density diminishes past n = 500 deg −2 .
The most cosmology agnostic model used in our framework is the νkwCDM model.Allowing both curvature and dark energy to vary changes some parameter degeneracies [100].Fig. 6 shows contours for all parameters, in this case there is nBSS = 200 deg −2 (N = 3 × 10 6 ) with σ GW /d L = 1%.Even in this case we obtain percent or sub-percent accuracy on all parameters, besides those relating to neutrinos.This demonstrates the power of this joint standard siren+GW-WL method.Forecasts for constraints on all parameters in each cosmological model from DeciHz detectors can be found in Appendix D.

C. Cosmologically varying p(z)
For the weak lensing of gravitational waves, in this framework we generate the observed redshift distribution using a fixed R(z) in Eq. (18).If the form of R(z) is wellknown, then the cosmology dependence of p(z), through the comoving volume term, can be exploited.Expected astrophysical parameters such as spin and mass distributions contributing to the form of R(z) can be found from many observations, and through this added knowledge we can perform a first order decoupling of astrophysics and cosmology.This will be possible with 3G detectors as we start to constrain the formation channels of binary mergers, leading to an observationally motivated expression for R(z) [101][102][103].Knowing R(z) means the cosmology dependence of p(z) can be exploited, and also reduces our sensitivity to selection effects usually present in an observed p(z).Again we use BPASS for R(z)'s.We choose this non-parametric model due to its good agreement with observations.A parametric model, while possible to marginalise over, could produce wildly incorrect rates and therefore not be a useful model.Fig. 7 shows a comparison between forecasts found when p(z) is kept fixed with the fiducial cosmology, or allowed to vary in the derivative calculation.These contours show constraints given by the weak lensing of GWs only (not combined with the standard siren measurement).It can be seen how the w 0 − w a degeneracy is shifted, and information is gained on w a .For other parameters, allowing p(z) to vary leads to some cancellation of parameter dependence and therefore weaker constraints.The combined standard siren and weak lensing constraints are only marginally affected.Tests were performed to check whether uncertainties in R(z) could affect parameter uncertainties, and we found that even large errors in R(z) do not have an effect on quoted results.

VI. DISCUSSION AND CONCLUSIONS
Future 3 rd generation gravitational wave detectors will observe huge numbers of binary mergers, sufficient for challenging statistical cosmological tests such as weak lensing.Two confirmed detector proposals, the Einstein Telescope and Cosmic Explorer plan to come online in the late 2030's -early 2040's.Looking further, there will be an incredible boom for GW cosmology with the advent of DeciHz observatories, but due to the technical difficulty their status will not be clear until after the pioneering space-based GW observatory LISA has been launched, also in the 2030's.Therefore, as we consider 10 years of data taking, these forecasts are for far in the future, when GW cosmology has become a tried and tested method.Here we explore those cosmological parameters that are presently of interest, such as the dark energy equation of state parameters and the sum of neutrino masses, to illustrate the power of the method and compare with other surveys which provide predictions for the same parameters.
We find GW weak lensing tomography is set to be a valuable cosmological tool for 3G detectors.Bright Standard Sirens found from multimessenger gravitational wave and electromagnetic observations could constrain the dark energy equation of state parameters w 0 and w a with comparable precision to Euclid, depending on the number of sources and their uncertainty.It is likely that true numbers of BSS will be similar to the upper limits quoted in Ref. [85] due to the existence of other GW (ET, LISA) and electromagnetic (Fermi-GBM, Swift-BAT etc.) observatories.In the case where we combine a population of BSS with realistic numbers of Dark Standard Sirens coming from binary black hole mergers, constraints on w 0 and w a improve upon Euclid, and give comparable or better constraints to forecasts for a CMB-S4-like experiment with 10 9 spectrometer hours [104], and a 1024 dish HIRAX experiment which expects a precision ∼ few percent [105].Therefore it is vital the tools necessary for statistical inference of DSS redshifts are developed.The situation could be improved even further by high redshift spectroscopic galaxy surveys such as MegaMapper [62], which could allow a spec−z statistical redshift determination for DSS mergers.The advantage of GW-WL over intensity mapping is we have a much better understanding of the physics underlying how GWs are lensed (essentially just General Relativity), compared to the complex relationship between the distribution of, for example, neutral Hydrogen and dark matter.The advantage over a CMB experiment is the lack of foregrounds confusing the GW signal.A final benefit of GW-WL is the ability to use an astrophysically motivated cosmic merger rate density of sources.Large numbers of future observations will help to constrain the formation channels of binary mergers, allowing us to exploit the cosmology dependence of p(z).We find allowing p(z) to vary with cosmology leads to improved constraints on the time-variation dark energy, but degrades other geometry parameters such as h and Ω m .The main difficulty to contend with for GWs is observing a signal over the background with a high enough SNR to accurately measure its parameters, such as distance and chirp mass.
For space-based DeciHz GW detectors, many more BSS observations are expected due to the incredible sky localisation of these detectors, including DECIGO and the Big Bang Observatory.There will be exquisite precision on geometry parameters due to very large number of d L − z data points with small d L errors.These detectors could achieve a 1σ preference for the neutrino normal hierarchy using a fractional sky coverage of f sky = 0.36.While not significant enough as a single probe to determine the neutrino mass hierarchy, these results demonstrate for the first time how the weak lensing of GWs could provide valuable extra information on the sum of neutrino masses when combined with future CMB experiments [26] and HI intensity mapping surveys [106].For these results to be possible, the substantial challenge to overcome is building and deploying the space-based DeciHz GW detectors into their heliocentric orbits.Another recently explored possibility is a Lunar-based De-ciHz GW detector, which would be easier to maintain and have comparable sensitivity to the proposed space-based DeciHz detectors [107].
A limitation of this study is using single values for uncertainties for the whole population of GW sources.A natural extension to this work is to produce a realistic set of GW measurements through a Fisher forecast of their parameter uncertainties, then perform the d L − z and power spectrum fitting on this realistic set of measurements.Here assumptions on our survey selection functions are varied to test the sensitivity of results to these choices, which are based on predictions for forthcoming electromagnetic and gravitational wave detectors.The galaxy survey selection function f (z), seen in Eq. ( 21), depends only on the pivot redshift z pivot .Instead of changing the form, we investigate the impact of the choice of this pivot redshift in the BSS case.It is found that, because the bulk of the sources are at lower redshifts that are accessible to spectroscopic surveys, varying the pivot redshift and hence the number of sources in the high redshift tail has a small impact on results.

CTM thanks Danny
For the GW selection function, we modify the power in the exponential in Eq. ( 20) and the value of r cut .Varying the third power to a quadratic or quartic term has a very small effect as the change in the shape of the distribution leads to less sources at lower redshifts but more at higher redshifts.Changing the value of r cut has the largest impact on the presented results, as we are essentially changing the sensitivity of our GW detector.Decreasing r cut by 1 Gpc degrades forecasts by ∼ 10% without affecting the parameter degeneracies.The general conclusions of the paper including the benefits of combining BSS and DSS, and a standard siren+GW-WL analysis producing competitive forecasts with other future probes, are not affected.

Observed p(z)
Different choices for the observed p(z) can be made.We can use a shifted galaxy survey p(z), with z 0 = 2/ √ 2 which is higher than the Euclid value of z 0 = 0.9/ √ 2 to reflect the greater redshift range of 3G GW detectors.This is the distribution used in CT19, and will be referred to as galaxy survey-like (GS-like).
Another route is changing the assumption on the cosmic merger rate density R(z).One way is by using a specific delay time distribution (DTD) between the formation of binaries, which follows the star formation rate (SFR), and the eventual inspiral and merger.This is similar to the BPASS results [70], without any modelling of the complicated physics of binary inspirals.P (τ ) is the DTD, and t max is the age of the Universe minus the lookback time to the galaxy.To test the sensitivity of our results we use the more extreme 'Slow' model in Ref. [108].These distributions are seen in Fig. 8.Although different assumptions on the source distributions do affect the forecasts, the effect is marginal as demonstrated in Fig. 9 Appendix B: Numerical Derivatives As discussed in Ref. [88] section 4.5.2,great care is needed when finding numerical derivatives of the power spectrum-especially in the case of a tomographic analysis.This is because the response of the power spectrum to a changing cosmological parameter can vary greatly over different ℓ modes and redshift bin correlations.The choice of the step size for the derivative becomes nontrivial, and tests are needed to ensure results are not overly sensitive to this choice.

R(z) ∝
The method adopted here uses the numdifftools package.The Derivative function is an adaptive central difference approximation scheme which can calculate the derivative of C κκ ℓ for a range of step sizes, returning the optimal step size for each ℓ mode independently.This optimal step can vary greatly over ℓ modes and tomographic bin combinations.But using different values of the step size for different ℓ modes causes discontinuities in the derivative and an unrealistic forecast.Therefore for each parameter, we use the median of the optimal step sizes of each ℓ mode and correlation returned by the Derivative function.This may introduce inaccuracies as too large/small a step is being used in some cases.To test whether this median step is indeed optimal for the derivative of the the whole C κκ ℓ , we define a merit function.We find the absolute difference between the derivative found using a step size and the previous step.This difference between derivatives is found over all ell modes and bin correlations.Then we find the mean of all of these differences (similar results are obtained if the maximum or median difference is used).For an optimal step size, we expect this merit function to be minimised.This implies successive steps are narrowing in on the true value for the derivative.At too small step sizes the difference between successive steps can vary a lot due to numerical error in the calculation.At too large step sizes, we are not picking up the true shape of the function, and the derivative can vary a lot depending on the smaller scale features.
Once we have checked for optimality using this merit function, stability of results around this step size needs to be tested.Stability was tested for by recalculating forecasts for a large range of step sizes around this median step.We find that the parameter uncertainties vary by at most ∼ ±3% of the uncertainty found using the optimal step size when the step size is varied in a range of at least −log(2) ≤ log(Step / Optimal Step) ≤ log(2) for each parameter.This demonstrates that the results are optimal and robust.These results can be seen in Fig. 10.Each parameter's merit function has a minimum plateau region, and the optimal step size (vertical dotted line), along with the region where the results are robust (grey shaded region), lie within this minimum plateau.A method of improving the accuracy could be using a different step size for each bin correlation, though from Fig. 10 it is not expected this will have a significant impact on the results presented.FIG. 10.For each parameter a chosen merit function for the derivative of the convergence power spectrum C κκ ℓ as a function of the derivative step size is displayed.This is the mean difference between the derivative at a step i and i − 1.We expect this to be minimised for an optimal step size.Also shown by the vertical black dotted line is the step size found by taking the median of all the 'optimal' step sizes returned by the numdifftools.Derivative function for each ℓ mode and redshift bin correlation.The grey shaded region shows the step sizes which produce parameter uncertainties within ∼ ±3% of the uncertainty found using the step size at the dotted line.

FIG. 1 .
FIG. 1. Gravitational waves being lensed by a large matter overdensity.The magnification results in a larger wave amplitude, and a smaller inferred distance to the source.The size of the deflection and change in amplitude has been exaggerated for illustration purposes.

FIG. 2 .
FIG. 2. p(z)is the observed redshift distribution of gravitational waves from binary mergers for 3G ground-based detectors.This is split into two populations; binary neutron stars with an EM counterpart (Bright Standard Sirens, top), and a combined population of binary black holes and black hole-neutron stars with a statistically obtained redshift (Dark Standard Sirens, bottom).p(z) is obtained by combining the theoretical normalised redshift distribution p (th) (z) (found from BPASS simulated merger rate densities[70]) with approximate forms of the relevant galaxy survey selection function f (z) and 3 rd generation gravitational wave detector selection function g(z).

FIG. 7 .
FIG. 7. Constraints on dark energy equation of state parameters w0 and wa in the wCDM cosmological model (see Tab. I) from the weak lensing of gravitational waves only.The orange contour is the situation of a fixed, observed p(z) in the weak lensing analysis.The green contour is when we use prior knowledge about the expected cosmic merger rate density of sources, and then exploit the dependence of p(z) on cosmological parameters through the comoving volume.

FIG. 8 .FIG. 9 .
FIG.8.Alternative observed redshift distributions to those used in the main analysis (BPASS, a population synthesis code).'GS-like' is a galaxy-survey like redshift distribution.The last model assumes binary populations follow the star formation rate, with some time delay between the formation and merger.

FIG. 11 .
FIG. 11.The fractional difference between parameter constraints using a maximum ℓ mode of ℓmax = 3000 and other choices of ℓmax.The comparison is made for the results in Fig.3(top) and Fig.5(bottom).
Laghi for helpful discussion on statistical redshift inference, and Suvodip Mukherjee for useful exchange on the shot noise modification due to the gravitational wave localisation uncertainty.CTM is supported by a Science and Technology Facilities Council (STFC) Studentship Grant.AT is supported by an STFC Consolidated Grant.The Python packages numpy, scipy, numdifftools, matplotlib and corner have been used in this work.For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

TABLE III .
1σ marginalised uncertainties for parameters in the DeciHz case assuming a number density of Bright Standard Sirens of 200 deg −2 , 3 × 10 6 sources using f sky = 0.36.The maximum redshift these sources are observed to is zmax = 2. Case A uses σGW/dL = 0.4%, case B uses σGW/dL = 1%.