Cross-correlating galaxy catalogs and gravitational waves: a tomographic approach

Unveiling the origin of the coalescing binaries detected via gravitational waves (GW) is challenging, notably if no multi-wavelength counterpart is detected. One important quantity for diagnostics is their distribution with respect to the large scale structures of the universe, as encoded, for instance, in their (linear) biases. We discuss the prospects of inferring these quantities via the cross-correlation of galaxy catalogs with mock GW ones, using both existing and forthcoming galaxy catalogs and realistic Monte Carlo simulations of GW events. We find that the bias should be marginally detectable with 10 years of data taking by current-generation detectors at design sensitivity, at least for binary neutron star mergers. The forthcoming five-detector network would allow for a firmer detection of this signal and, in combination with future cosmological surveys, would also permit the detection of the coalescing black hole bias. Such a measurement may also unveil, for instance, a primordial origin of coalescing black holes. To attain this goal, we find that it is crucial to adopt a tomographic approach and to reach a sufficiently accurate localization of GW events. The depth of forthcoming surveys will be fully exploited by third generation GW detectors, which will allow one to perform precision studies of the coalescing black hole bias and attain rather advanced model discrimination capabilities, at the few percent level.


I. INTRODUCTION
New windows in astrophysics usually address existing questions, but also suggest new diagnostic tools and raise new puzzles. This is the case with gravitational waves (GW): On the one hand, the multi-messenger detection of a neutron star-neutron star merger (BNS) [1] had important implications for a wide range of topics, from r-process nucleosynthesis to modified gravity theories. On the other hand, both the progenitors of coalescing binary black holes (BBH) and the environments where they originated remain mysterious [2] (see for instance [3] for a compact review).
At the same time, cosmological applications of this new observational window are routinely discussed. The most popular is an independent determination of the current Hubble expansion rate, H_0, from standard "sirens", the GW analogue of standard candles for electromagnetic observations, with completely different systematic uncertainties [4]. In the future, this may shed light on the so-called "H_0 crisis" [5], i.e. the disagreement between the values inferred from CMB analyses and from supernovae observations.
Another example of the possible synergy between cosmology and multimessenger astronomy is the cross-correlation between future GW catalogs and galaxy surveys [6][7][8][9][10]. Until now, studies have mostly focused on cosmological parameter inference (e.g. non-gaussianities in [6], Hubble constant and dark energy in [7]), or constraints on BBH progenitors (e.g. [9,10]). In particular, recent analyses by [9,10] have emphasized that the bias, b, of BBH with respect to large scale structures (LSS) of the universe, as traced by galaxy catalogs, can disentangle a primordial black hole origin from an astrophysical one. Indeed, a bias b ≃ 1 is expected from primordial BBH tracing the dark matter LSS, a b < 1 is typical of the low mass halos favored when primordial black holes form binaries only at late times in virialized objects, while a b > 1 characterizes stellar origin BBH harbored in luminous and massive galaxies, which are biased tracers of LSS. The most recent study by [10] has found marginal capabilities of detection and discrimination (typical signal to noise ratio of ∼ 1 − 10), which improve with future galaxy surveys and GW detectors. The authors have performed a joint cosmological parameters/BBH bias analysis, assessing the impact of including Planck data as well as lensing and relativistic effects, while adopting a parametric approach for the number of GW events and the angular resolution of the GW catalog.
Here we revisit the issue of the determination of the bias for different classes of binary mergers. We adopt a complementary approach: On the one hand, we fix the cosmology to the Planck best fit, since for a long time to come we expect GW data to provide only modest help to cosmological parameter determinations, at least in minimal cosmological models. This choice also implies that lensing and relativistic effects have a tiny impact (percent level) on the results [10], and can be neglected at the price of a (rather small) underestimate of the error. On the other hand, we decide to work in terms of realistic benchmarks, rather than in terms of simplified parameterizations for the source distribution and angular reconstruction capabilities. For example, we do not limit ourselves to the BBH channel nor to specific pre-defined ranges of redshift z, as opposed to previous studies that focused either on the cumulative signal at low redshifts [8], on a coarse-grained distribution in z [6], or on high z only [10].
More specifically: (i) We realistically simulate catalogs of GW events, taking into account the full star formation history. Furthermore, we fully model selection effects for each event, rather than assuming a sharp detectability distance.
(ii) Besides BBH, we extend our study to all types of binaries, including also BNS and black hole-neutron star (BHNS) mergers. We compare prospects and performances of both existing and forthcoming catalogs.
(iii) We pay attention to -and highlight the relevance of -the diagnostic power of a binning in z (tomographic approach), with a focus on the relatively low-z range.
(iv) We do not assume a sharp angular resolution cutoff in multipole (ℓ) space.
(v) We adopt a more realistic angular resolution, also studying the dependence of the results on different detector configurations.
This article is structured as follows. In Sec. II we introduce the formalism. In Sec. III we describe the generated distribution of GW events, as well as the detector configuration benchmarks used. In Sec. IV we describe the (existing or forecasted) catalogs used. In Sec. V we present our results, while in Sec. VI we draw our conclusions.

II. METHODOLOGY AND FORMALISM
Our goal is to investigate the cross-correlation between the distribution of GW events and the sky-projected spatial distribution of galaxies (or, more generally, different types of extragalactic objects). The cross angular power spectrum (CAPS) between number density fluctuations in a population of discrete objects of two catalogs a and b can be expressed in terms of the multipoles, C_ℓ^{ab}:

\[
C_\ell^{ab} = \langle a^a_{\ell m} (a^b_{\ell m})^* \rangle \,, \quad (1)
\]
\[
a^i_{\ell m} = 4\pi\, i^\ell \int dz\, \frac{dN_i}{dz}\, b_i(z) \int \frac{d^3k}{(2\pi)^3}\, \delta(\mathbf{k}, z)\, j_\ell(k\chi(z))\, Y^*_{\ell m}(\hat{k}) \,, \quad (2)
\]

where for both catalogs i = a or b, provided that a and b, say galaxies and GW events, trace the LSS with a certain (scale-independent, but possibly redshift-dependent) linear bias b_i(z). In the expression above, χ(z) is the comoving distance to redshift z, dN_i/dz is the redshift distribution of the objects (normalized so that ∫ (dN_i/dz) dz = 1), j_ℓ are the spherical Bessel functions of order ℓ, and δ(k, z) is the relative matter density fluctuation field in Fourier space. Taking the ensemble average and using the Limber approximation, the previous two equations can be combined [11] into

\[
C_\ell^{ab} = \int \frac{d\chi}{\chi^2}\, W_a(\chi)\, W_b(\chi)\, b_a(z)\, b_b(z)\, P\!\left(\frac{\ell + 1/2}{\chi}, z\right) \,, \quad (3)
\]

where W_i(χ) = H(z) dN_i/dz (in units c = 1) is the window function of the field i, normalized so that ∫ dχ W_i(χ) = 1, H(z) is the Hubble function, and P(k, z) is the matter power spectrum as a function of redshift. The main unknown in this approach is the linear bias b_i(z). We expect the above approximations to be of comparable accuracy for GW events and galaxies, since current predictions tend to yield coalescence rates roughly proportional to the star formation rate, e.g. [12].
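As an illustration, the Limber integral of Eq. 3 can be evaluated numerically. The sketch below is a simplified stand-in, not our analysis code: it uses a toy power-law matter power spectrum in place of the CLASS output and hypothetical Gaussian redshift distributions for the two catalogs.

```python
import numpy as np
from scipy.integrate import quad, trapezoid

# Toy flat-LCDM background; H0 matches the Planck value quoted below
H0, Om = 67.7, 0.31          # km/s/Mpc, matter fraction
c = 299792.458               # km/s

def H(z):                    # Hubble rate in km/s/Mpc
    return H0 * np.sqrt(Om * (1 + z)**3 + 1 - Om)

def chi(z):                  # comoving distance in Mpc
    return quad(lambda zp: c / H(zp), 0.0, z)[0]

# Hypothetical redshift distributions (normalized Gaussians)
def dNdz(z, zmean, sigma):
    return np.exp(-0.5 * ((z - zmean) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

# Toy power spectrum in Mpc^3 (placeholder for the CLASS + halo-model P(k, z))
def P(k, z):
    return 1e4 * (k / 0.05)**0.96 / (1 + (k / 0.02)**3) / (1 + z)**2

def caps(ell, za, zb, ba=1.0, bb=1.0, sa=0.05, sb=0.05):
    """Limber approximation: C_ell = int dz H/(c chi^2) W_a W_b b_a b_b P."""
    zg = np.linspace(1e-3, 2.0, 400)
    chig = np.array([chi(z) for z in zg])
    integrand = (H(zg) / c / chig**2 * dNdz(zg, za, sa) * dNdz(zg, zb, sb)
                 * ba * bb * P((ell + 0.5) / chig, zg))
    return trapezoid(integrand, zg)

cl = [caps(ell, 0.1, 0.1) for ell in (10, 50, 200)]
```

In the actual analysis, P(k, z) comes from CLASS plus the non-linear prescription, and the dN_i/dz from the simulations and catalogs described below.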
Similarly to the above, we can define the auto-correlation power spectra, C_ℓ^{aa} and C_ℓ^{bb}.
The model is fully specified by the dN/dz (from the GW simulations and the galaxy catalogs), the bias b(z) (our unknown, together with the galaxy-catalog one), and the assumed cosmology. The bias of the galaxy catalog can typically be measured at a good level of precision (better than 10%) from the autocorrelation of the galaxies in the catalog itself. For the cosmological parameters we use the latest determination from the Planck Collaboration [13] with a flat cosmology, Ω_k = 0 (namely, H_0 = 67.7 km s⁻¹ Mpc⁻¹, Ω_c h² = 0.1193, Ω_b h² = 0.0224, n_s = 0.966, σ_8 = 0.810). We use the software CLASS [14] with these parameters as input to compute the linear matter power spectrum P_lin(k, z). To derive the fully non-linear P(k, z), required in Eq. 3, we use the halo model prescription [15]. We also compared it with the more common procedure of using Halofit [16], finding small differences, of order 10%. We stress that even neglecting completely the non-linear corrections to P(k, z) would hardly affect our results: we always limit the analysis to a maximum multipole of ℓ = 200, where non-linear effects are subdominant. Note that, although some of the analyses reported below are sensitive to larger multipoles (up to ℓ = 1000 in some cases), these are unessential for the determination of the linear bias b of GW events.
Furthermore, given the above limitations, we decided to neglect general relativistic effects such as weak lensing, which are expected to yield small corrections, notably at the smallest scales not probed by current experiments [9]. In the future, one may want to include them to account better for parameter correlations when inferring cosmological parameters and their uncertainties [7,10].
In general, the detectability of the cross-correlation signal depends on observational aspects, such as the number of sources, the field of view, and the angular resolution of the two catalogs. A simple analytical estimate of the error on the CAPS leads to [17] (see also [18,19]):

\[
\Delta C_\ell^{ab} = \sqrt{\frac{(C_\ell^{ab})^2 + \left(C_\ell^{aa} + C_N^a/(W_\ell^a)^2\right)\left(C_\ell^{bb} + C_N^b/(W_\ell^b)^2\right)}{(2\ell+1)\, f_{\rm sky}}} \,, \quad (4)
\]

where we introduce the fraction of the sky covered by the most restrictive of the two catalogs,

\[
f_{\rm sky} = \min\left(f_{\rm fov}^a, f_{\rm fov}^b\right) \,, \quad (5)
\]

and the Poisson noise arising from the discrete number of sources of each catalog, i.e.

\[
C_N^a = \frac{4\pi f_{\rm fov}^a}{N_a} \,, \quad (6) \qquad C_N^b = \frac{4\pi f_{\rm fov}^b}{N_b} \,, \quad (7)
\]

with N_a (N_b) the total number of sources of catalog a (b) within the considered field of view. We also introduce the Gaussian angular window function, i.e.

\[
W_\ell^a = \exp\left(-\frac{\ell^2 \sigma_a^2}{2}\right) \,, \quad (8)
\]

where σ_a represents the root mean square deviation of the Gaussian beam (i.e. the point-spread function) of catalog a; an analogous relation holds for catalog b. For galaxy catalogs, σ_a is at the arcsec scale or better, and the window function can be set to unity for the range of ℓ we will consider. For completeness, the error on the autocorrelation is

\[
\Delta C_\ell^{aa} = \sqrt{\frac{2}{(2\ell+1)\, f_{\rm sky}}} \left(C_\ell^{aa} + \frac{C_N^a}{(W_\ell^a)^2}\right) \,. \quad (9)
\]

A tomographic approach consists in evaluating Eq. 3 in several redshift bins, instead of over the whole redshift range. Spectroscopic, or sufficiently precise photometric, redshifts for each object are provided by most galaxy catalogs. GW measurements, instead, provide a determination of the luminosity distance of the object, since the amplitude of the GW signal scales with the inverse of the luminosity distance. Assuming a given cosmology, this can be translated into a determination of z.
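The noise and beam terms entering the error estimate above can be sketched as follows; the spectra and source counts below are illustrative numbers, not taken from our simulations.

```python
import numpy as np

def beam(ell, sigma_rad):
    """Gaussian beam window, W_ell = exp(-ell^2 sigma^2 / 2)."""
    return np.exp(-0.5 * ell**2 * sigma_rad**2)

def shot_noise(n_sources, f_fov):
    """Poisson noise of a catalog with n_sources over a sky fraction f_fov."""
    return 4 * np.pi * f_fov / n_sources

def caps_error(ell, cab, caa, cbb, na, nb, wa, wb, f_sky):
    """Gaussian error on the cross spectrum in a single multipole."""
    return np.sqrt((cab**2 + (caa + na / wa**2) * (cbb + nb / wb**2))
                   / ((2 * ell + 1) * f_sky))

# Illustrative case: 2500 GW events over the full sky with a ~1 deg beam,
# cross-correlated with 8e5 galaxies covering 70% of the sky.
ell = 100
sigma_gw = np.deg2rad(1.0)
err = caps_error(ell,
                 cab=1e-6, caa=1e-5, cbb=1e-5,
                 na=shot_noise(2500, 1.0), nb=shot_noise(8e5, 0.7),
                 wa=beam(ell, sigma_gw), wb=1.0,   # galaxy beam ~ arcsec -> 1
                 f_sky=0.7)
```

Note how the beam suppression enters through 1/W_ℓ²: a degraded GW localization inflates the effective shot noise exponentially with ℓ.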
The GW bias can be derived by constructing a χ² estimator,

\[
\chi^2(b) = \sum_{\ell,\, i} \left[\frac{C_\ell^{ab,i}(b=1) - C_\ell^{ab,i}(b)}{\Delta C_\ell^{ab,i}}\right]^2 \,, \quad (10)
\]

where the index i runs over redshift bins. When performing an analysis on actual data, the first term in the numerator of the χ² will be the measured CAPS of the GW and galaxy catalogs. In this work, we use simulated data in order to assess how well the bias can be determined; we therefore replace that first term with the expected CAPS for b = 1, the bias benchmark value of our analysis. We will use Eq. 10 in a twofold way: First, to determine the significance level at which the two cases b = 0 (isotropy) and b = 1 can be distinguished, interpreting √χ²(0) as the significance with which anisotropy can be detected. Secondly, to compute the error with which a bias b = 1 could be measured: such an error is provided by the values b* for which χ²(b*) = 1. Clearly, the results depend on the choice of the bias benchmark value, although this dependence is mild. We note that our benchmark b = 1 is slightly conservative, since the expected case from a theoretical point of view (and likely correct at least for the BNS case) is b > 1, typically associated with binaries coming from the endpoint of stellar evolution [10].
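The twofold use of the χ² estimator can be sketched as follows. The fiducial spectrum and its errors below are hypothetical placeholders, used only to show how the detection significance and the one-sigma interval on b are extracted; we exploit the fact that, at fixed galaxy bias, the cross spectrum scales linearly with the GW bias, C_ℓ(b) = b C_ℓ(b=1).

```python
import numpy as np

def chi2(b, cl_fid, cl_err):
    """Eq. 10-style estimator, comparing C_ell(b) = b * C_ell(1) to b = 1."""
    return np.sum(((cl_fid - b * cl_fid) / cl_err)**2)

# Hypothetical fiducial spectrum (b = 1) and errors over ell = 2..200
ells = np.arange(2, 201)
cl_fid = 1e-6 * (ells / 10.0)**-1.2
cl_err = 0.3 * cl_fid * (1 + 50.0 / ells)

# Detection significance: distinguish b = 0 (isotropy) from b = 1
significance = np.sqrt(chi2(0.0, cl_fid, cl_err))

# 1-sigma error on b: the values b* with chi2(b*) = 1
bs = np.linspace(0.5, 1.5, 2001)
chis = np.array([chi2(b, cl_fid, cl_err) for b in bs])
inside = bs[chis <= 1.0]
b_lo, b_hi = inside.min(), inside.max()
```

With the linear scaling, χ²(b) = (1 − b)² χ²(0), so the one-sigma half-width is simply 1/√χ²(0); the grid scan above recovers that relation and generalizes to non-linear dependences.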
Due to the large field of view of GW detectors, and given the multi-year GW surveys considered here, the sky coverage is quite close to uniform, justifying our all-sky angular analysis. We therefore neglect the effect of the slightly non-uniform coverage, which is not crucial for our forecast and could be accounted for in an actual analysis, where the sky coverage is known.

III. MOCK GRAVITATIONAL WAVE CATALOGS
The GW mock events are generated using a Monte Carlo code developed for the Einstein Telescope mock data challenges [20][21][22] and used subsequently for other studies [23][24][25]. We first produce a population of compact binary mergers distributed in the universe, selecting the redshift from the probability distribution function

\[
p(z) = \frac{1}{\mathcal{N}}\, \frac{dR_m}{dz} \,, \quad (11)
\]

constructed by normalizing, in the range z = [0, 10], the merger rate per interval of redshift,

\[
\frac{dR_m}{dz} = \frac{R_m(z)}{1+z}\, \frac{dV}{dz} \,. \quad (12)
\]

The above quantity is given by the merger rate density R_m(z) times the comoving volume element; the factor (1 + z) in the denominator converts the rate from the source frame to the observer frame. As in [26,27], we assume that R_m(z) follows the star formation rate of [28] with a time delay t_d whose probability distribution is of the form

\[
P(t_d) \propto \frac{1}{t_d} \,, \quad t_d = t(z) - t(z_f) \,, \quad (13)
\]

where z_f is the redshift of formation of the binary. The local rate, R_0, corresponds to the rates inferred from the detections in the first two LIGO/Virgo observing runs [29]. We use the median rates obtained from the GstLAL pipeline, assuming a Gaussian distribution of the component masses for BNS (R_0 = 920 Gpc⁻³ yr⁻¹) and a power-law distribution for BBH (R_0 = 56 Gpc⁻³ yr⁻¹). For the case of BHNS we consider the upper limit derived assuming fixed component masses of 1.4 M_⊙ and 10 M_⊙ (R_0 = 600 Gpc⁻³ yr⁻¹) [29]. Note that a future refined determination of the event rates would translate into a renormalization of the running time required to attain the performances reported in the following.
After the redshift is selected, we draw the masses of the components from the distributions above, as well as the inclination of the orbit, the polarization and the phase at the merger from uniform distributions [20].
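The redshift draw can be sketched via an inverse-CDF sample of p(z). The Madau-Dickinson-like rate density and crude volume factor below are illustrative stand-ins for the actual R_m(z) (which includes the time-delay convolution); only the sampling mechanics matters here.

```python
import numpy as np

# Illustrative SFR-like merger-rate density (Madau-Dickinson functional form),
# standing in for R_m(z) after time delays; arbitrary normalization.
def Rm(z):
    return (1 + z)**2.7 / (1 + ((1 + z) / 2.9)**5.6)

def dV_dz(z):
    # crude comoving-volume factor for a toy flat cosmology (arbitrary units)
    return z**2 / np.sqrt(0.31 * (1 + z)**3 + 0.69)

def merger_rate_z(z):
    """dR/dz = R_m(z)/(1+z) * dV/dz (observer-frame rate per unit redshift)."""
    return Rm(z) / (1 + z) * dV_dz(z)

# Build p(z) on z in [0, 10] and sample via the inverse CDF
zg = np.linspace(1e-4, 10.0, 5000)
pdf = merger_rate_z(zg)
cdf = np.cumsum(pdf)
cdf /= cdf[-1]

rng = np.random.default_rng(1)
z_samples = np.interp(rng.uniform(size=100000), cdf, zg)
```

The same inverse-CDF trick is the standard way to draw the remaining parameters from their (here uniform) distributions.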
In order to determine whether a source can be detected, for a specific observational scenario (see below), we calculate the combined matched-filter signal-to-noise ratio, SNR, as the quadrature sum SNR² = Σ_i SNR_i² of the individual signal-to-noise ratios SNR_i in each detector, cf. Eq. 18 in [20]. In this paper we assume 10 years of data taking with a duty cycle of 80% for each detector, and we set the detection threshold to SNR > 8, since a modest contamination with noise is tolerable for this specific statistical analysis: spurious events due to noise are expected to be uncorrelated with extragalactic structures and would not bias the current forecast.
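The network detection criterion is a simple quadrature sum; a minimal sketch (the per-detector SNRs below are made up):

```python
import numpy as np

def network_snr(snrs):
    """Combined matched-filter SNR: quadrature sum over the detectors."""
    return np.sqrt(np.sum(np.asarray(snrs, dtype=float)**2))

# A hypothetical event seen by five detectors (HLVIK-like configuration)
event = [6.0, 5.5, 3.0, 4.0, 2.0]
detected = network_snr(event) > 8.0
```

Note that an event below threshold in every single detector can still pass the network cut, which is why the five-detector configurations gain so much statistics.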
We consider three possible benchmark configurations for the GW detector network (see also Tab. I): I. HLV: Advanced LIGO-Virgo with three detectors of second generation (2G) at design sensitivity, at the current Hanford, Livingston (LIGO) and Cascina (Virgo) locations. Typical samples of detected mock GW events with an SNR above 8 (based on the simulation described above) are ∼500 BNS events, 4000 BBH events, and up to 2000 BHNS mergers. The redshift distribution of the events from the simulation is reported in Fig. 1 (upper panel). We see that in this case the BNS events are all cosmologically very close (within z ≲ 0.15), while BBH extend to higher z, up to a maximum of z ∼ 1 with a peak at z ∼ 0.3. The BHNS redshift distribution is intermediate between the two other classes.
II. HLVIK: Five 2G detectors, the three above plus IndIGO [30] and KAGRA [31]. The redshift distribution of detectable sources is very similar to the previous case, extending to slightly higher z. The statistics, instead, increases by roughly a factor of 5 for each class of mergers, i.e. 2500 BNS events and 2 × 10⁴ BBH (with a similar upper limit for the BHNS events).
III. ET2CE: Three detectors of third generation (3G), namely one Einstein Telescope (ET, [32]) at the Virgo site and two Cosmic Explorers (CE, [33]) at the two LIGO sites. This configuration would allow one to explore redshifts up to z ∼ 5, with a distribution typically peaking around z ∼ 1 for all classes. The number of events also features a large increase, of up to two orders of magnitude with respect to HLVIK, with ∼7 × 10⁶ BNS and ∼7 × 10⁵ BBH. However, since HLVIK already detects BBH with almost full efficiency below z ∼ 0.3, HLVIK and ET2CE will detect roughly the same number of BBH (∼6000 in ten years) below z ≲ 0.3.
We perform the reconstruction of about 1000 BBH events and 300 BNS events for the HLVIK case, and of ∼1.4 × 10⁴ BBH events and ∼5000 BNS events for the ET2CE case, which is enough to perform statistical studies of the average expected sky localization and redshift error. For the BHNS case, which is anyway the least reliable and constrained of the cases shown, we simply assume in all configurations performances similar to the BBH case, given the expected similarity of their SNR distribution shapes, cf. Fig. 4.
In the literature, the error on the luminosity distance and redshift is typically estimated to be inversely proportional to the SNR, for example δz/z ≃ δd_L/d_L ≃ 3/SNR in Eq. 3 of [36], which corresponds to ∼35% for an SNR of 8. Such an approximation is only partly confirmed by the BAYESTAR reconstruction results, as shown in Fig. 2: On the one hand, we observe the predicted 20−30% error on the reconstructed redshift (assuming a ΛCDM cosmology); on the other hand, the relative error is weakly dependent on the SNR, scaling as ∝1/SNR only for a small fraction of the BBH ET2CE events. This poses a lower limit on the minimum width of the bins into which the sample can be divided to perform tomography. In the following, nonetheless, we will use quite large bins in z, safely above the above-mentioned limit.
As for the angular reconstruction, discussions can be found in Refs. [36][37][38][39][40][41][42]. However, most references provide only the average containment area of the event at a given level of significance, typically 68% or 90% C.L., but not the full shape of the "beam", which is in principle required for the full angular analysis.
With our simulations, we are able to study the shape of the full posterior probability distribution of the reconstructed localization. We find that, when the event is reasonably well localized (i.e. better than ∼100 deg² at 50% C.L.), the posterior is very well approximated by a two-dimensional Gaussian. It is thus reasonable to use the formula in Eq. 8, describing the beam as a Gaussian with a width σ_b. In the HLVIK case, with five detectors, the beam is circular to a good approximation, while for ET2CE, with three detectors, the localization is significantly elliptical. While the formalism from Sec. II can be extended to include an elliptical beam, this would be an unnecessary complication for the forecast purpose of this work, and we consider the GW beam to be circular also in the ET2CE case. In order to derive σ_b, we start from the 50% C.L. localization area, as provided by BAYESTAR, and we convert it to σ_b assuming a circular Gaussian beam. The angular resolution that we adopt in the following requires only a mild quality cut, excluding the small fraction (about O(15%)) of events with σ_b > 5°. We note that this procedure mimics what can be done on real data, where the reconstructed angular resolution is known event-by-event. For the BNS HLVIK case, the angular resolution is almost z-independent at the shallow distances accessible to the detectors. If we exclude BNS events with σ_b > 3° (constituting about 20% of the sample), our fit yields σ_b ≃ 1.0° z^(−0.022) at z ≳ 0.01. The BBH ET2CE case has a very good angular resolution for z < 1, mainly because low-z events have a very large SNR in ET2CE and the angular resolution is a decreasing function of the SNR. In deriving the z-σ_b relation we have excluded the events with σ_b > 3°, which, as can be seen from Fig. 3, constitute a second sub-dominant mode containing only O(1%) of the total events. For z ≳ 0.1, we obtain σ_b ≃ 0.31° z^(0.88). The localization for BNS with ET2CE is typically worse than for BBH.
In particular, the distribution is bimodal, similarly to the BBH case, but the fraction of badly reconstructed events (with σ_b as high as 10°) is larger than 50%. We attribute the different reconstruction quality for BBH and BNS largely to the different SNR distributions of the two samples. This is illustrated in the bottom panel of Fig. 4, which shows that the SNR distribution of BNS ET2CE is peaked at low SNR, qualitatively similar to the event distribution of HLVIK reported in the top panel of Fig. 4, while the one of BBH is peaked at SNR ∼ 30. In other words, the BBH sample seen by ET2CE is virtually complete, with the improved sensitivity resulting in stronger signals, rather than many more events at low SNR. Thus, the second mode, which for both BBH and BNS contains events with low SNR, is scarcely populated for BBH, while it is heavily populated for BNS.

TABLE II. Parameters of the z-σ_b(z) relation, in the form log₁₀(σ_b(z)) = A + B log₁₀(z), with A ≡ log₁₀(σ_b(z = 1)), corresponding to the best-fit parameters of the red curves in Fig. 3.

In Tab. II, we quote the parameters of the σ_b(z) linear regression fit (in log₁₀ space) for the two classes of events (BNS, BBH) and the two detector configurations HLVIK and ET2CE, obtained considering only the mode with good angular resolution for each case (see Fig. 3). For the HLV case, the presence of only three detectors, combined with a reduced sensitivity, translates into an SNR distribution strongly peaked at low values, and more pronounced deviations from a Gaussian reconstructed region are present. For this case, it is also computationally more challenging to collect sufficient statistics of acceptable-quality events to be post-processed via BAYESTAR. Since the angular resolution is mostly influenced by the number and location of the detectors, we used the ET2CE performance to estimate an optimistic proxy for the HLV case, consisting of a circular equivalent σ_b of about 4° and no redshift dependence.
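The linear regression in log₁₀ space behind Tab. II can be sketched on synthetic data; the points below are generated around the ET2CE BBH scaling quoted above (0.31° z^0.88) with artificial scatter, just to show the fit recovering A and B.

```python
import numpy as np

def fit_sigma_b(z, sigma_deg):
    """Fit log10(sigma_b) = A + B log10(z); A = log10(sigma_b at z = 1)."""
    B, A = np.polyfit(np.log10(z), np.log10(sigma_deg), 1)
    return A, B

# Synthetic localizations around the ET2CE BBH scaling, with 0.05 dex scatter
rng = np.random.default_rng(2)
z = np.linspace(0.1, 3.0, 200)
sigma = 0.31 * z**0.88 * 10**(0.05 * rng.standard_normal(z.size))  # degrees

A, B = fit_sigma_b(z, sigma)
sigma_at = lambda zz: 10**(A + B * np.log10(zz))   # reconstructed relation
```

On real reconstructions, the fit is applied only to the well-localized mode, after the quality cuts described above.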
Since we will see that the perspectives for a detection in the HLV case are very poor, a more conservative choice would not alter significantly our conclusions.
Note that, for BNS (and possibly for BHNS), electromagnetic counterparts will be available for a certain fraction of the events, a fraction which will depend crucially on the multi-messenger efforts of the astronomical community and on the reliability of current models. This fraction is difficult to estimate at the moment and will thus be conservatively neglected in the following. It is clear, though, that for these events a very precise redshift and angular determination will be available, allowing one to use this sub-sample for more precise angular and tomographic analyses.

IV. THE GALAXY CATALOGS
We will use in the following the 2MASS photometric redshift catalog (2MPZ) [43], the WISE×SuperCosmos catalog (WIxSC) [44], and the SDSS Data Release 12 photometric catalog [45,46]. The catalogs' characteristics are described in [47,48], which the interested reader can consult for details. Their source redshift distributions, dN/dz, are reported in Fig. 5. We provide below a short summary of the main features of each catalog.
• 2MPZ is a shallow catalog covering the redshift range z ∼ 0.0 − 0.2, including ∼8 × 10⁵ galaxies and featuring a large sky coverage of about 70%. Given the very large overlap with the BNS and BHNS distributions for the benchmark cases HLV and HLVIK, it is particularly well suited for studying their cross-correlation with LSS.
• WIxSC is an extension of 2MPZ to larger redshifts, up to z ∼ 0.4, with a similar sky coverage of 70%, and includes about 2 × 10⁷ galaxies. Similarly to 2MPZ, it is especially adequate for cross-correlation studies of the BNS and BHNS classes in the HLV and HLVIK cases.
• SDSS has a similar number of galaxies as WIxSC, but a much smaller sky coverage (about 20%), while extending up to z ∼ 0.7. The SDSS dN/dz shown in Fig. 5 resembles more closely the BBH class of events for the HLV and HLVIK cases (see Fig. 1), and is thus better suited for their cross-correlation analysis.
Current surveys are almost all-sky only up to z ∼ 0.4, and have large statistics and sky coverage only up to z ∼ 0.7 (SDSS). 3G GW observatories will be all-sky and cover redshifts up to z ∼ 5; hence, current catalogs would only allow for adequate cross-correlation studies of a small fraction of the collected GW events. This gap is however expected to be filled in the next decade by new surveys like EUCLID, LSST and, to some extent, SPHEREx. EUCLID [49] will have both spectroscopic and photometric redshift samples. For the purpose of performing cross-correlations a very precise redshift estimate is not crucial, and thus we consider here only the photometric sample. The advantage is that the number of galaxies will be extremely large, about ∼1.6 × 10⁹, with a galaxy density of about 30 arcmin⁻², providing a greatly reduced galaxy shot noise and thus considerably improving the cross-correlation sensitivity. Further specifics of the EUCLID photometric sample reported in the following are taken from [49]. Its dN/dz can be approximated as

\[
\frac{dN}{dz} \propto \left(\frac{z}{z_0}\right)^2 \exp\left[-\left(\frac{z}{z_0}\right)^{3/2}\right] \,, \qquad z_0 = \frac{z_m}{\sqrt{2}} \,, \quad (14)
\]

with z_m = 0.9 the median redshift. As shown in Fig. 5, it has a very good overlap with the redshift distribution of GW events for the 3G detectors benchmark. The survey area will be ∼1.5 × 10⁴ deg², so that f_fov ∼ 0.4. The photometric redshifts of the galaxies will be known with an error of ∆z ∼ 0.05 (1 + z), thus around 10% at z = 1. A reasonable approximation for the bias is b(z) = √(1 + z).
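The quoted dN/dz can be checked numerically. Assuming the standard Smail-type form (z/z₀)² exp[−(z/z₀)^{3/2}] with z₀ = z_m/√2 (our assumption for the EUCLID-like photometric sample), the median of the distribution indeed lands very close to z_m = 0.9:

```python
import numpy as np

def dndz_euclid(z, zm=0.9):
    """Smail-type dN/dz ~ (z/z0)^2 exp[-(z/z0)^1.5], with z0 = zm/sqrt(2)."""
    z0 = zm / np.sqrt(2.0)
    return (z / z0)**2 * np.exp(-(z / z0)**1.5)

# Normalize on a fine grid and locate the median redshift
zg = np.linspace(1e-4, 6.0, 20000)
pdf = dndz_euclid(zg)
cdf = np.cumsum(pdf)
cdf /= cdf[-1]
z_median = np.interp(0.5, cdf, zg)
```

The same normalized pdf is what enters the window functions W_i(χ) of Sec. II for the forecast with this catalog.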
The photometric sample of LSST [50,51] will have very similar numbers as EUCLID, hence we will not adopt a different galaxy catalog benchmark. The main difference is that LSST, providing a new survey of the sky every few nights, is optimized to detect transients. Although not directly relevant for the present analysis, this will likely play a crucial role in identifying the counterparts of the GW events.
Finally, SPHEREx [52,53] will have a similar number of galaxies as EUCLID, with however a larger redshift error, up to 0.2 (1 + z), although with the possibility to select sub-samples at low z ≲ 0.5 with a few-% error. Furthermore, it will have almost all-sky coverage, which will help to improve the cross-correlation sensitivity at low z with respect to the EUCLID performance.

V. RESULTS

A. Cross-correlation of 2G GW events
We first briefly mention the perspectives for measuring the autocorrelation of the GW events. The results for the HLVIK case are reported in Fig. 6. One can see that the signal is larger when the source redshift distribution is skewed towards lower z. This effect, which is purely geometric and due to the stronger anisotropies in the local universe, qualitatively favors a detection of BNS compared to BHNS, and of BNS or BHNS compared to BBH. Needless to say, the total number of events is also important in assessing the error with which a measurement can be performed. Overall, we find that for 2G GW detectors there is little hope of a measurement. The perspectives for a low-significance detection are a bit brighter for 3G detectors, for which the improved statistics may be enough to beat the shot noise, despite the normalization of the signal being very low as a result of the redshift distribution peaking at z ∼ 1 for all types of binaries.
A much more promising diagnostic for the not-so-far future (i.e. for HLV and especially HLVIK) is the cross-correlation signal (see Fig. 7 for the cross-correlation with 2MPZ). The reason to be optimistic can be immediately grasped by comparing the z distribution of the expected events (Fig. 1) with those of the catalogs (Fig. 5). The overlap at relatively low z, where the signal from BNS (and possibly from BHNS) should be detectable already for HLV, is particularly large for the BNS signal, notably for the 2MPZ case. For the WIxSC HLV case (not shown), only a marginal detection for the BHNS case seems possible, while the HLVIK case shows a strong correlation signal with both BNS and BHNS, but still a weak one with BBH. Finally, a cross-correlation with SDSS (not shown) appears out of reach even for HLVIK, for which only a BHNS signal may be detected given favorable statistics. These differences highlight the importance of the overlap between the redshift region covered by the galaxy catalogs and that of the GW sample.
The detectability of the signal is quantified through the χ² (see Eq. 10), shown as a function of the bias in Fig. 8 for all types of binaries, for HLV and HLVIK, and for the 2MPZ catalog. We recall that the benchmark model assumes that GW events are an unbiased tracer (b = 1) of LSS, and that a model with a given b is compared to the case b = 1 via the above χ². The plot encodes two pieces of information:
• The range of b around b = 1 characterized by a deviation ∆χ² = 1 is the expected (one sigma) error on b.
• The ∆χ² for b = 0 quantifies the statistical significance (√∆χ²) of the cross-correlation with respect to the isotropic sky.
A table with the ∆χ²(0) values for the different combinations of catalogs and GW detector configurations is given in Tab. III. Considering HLVIK, the cross-correlation of BNS with the 2MPZ catalog has a significance of about 6.5σ. This catalog is indeed optimal, since it peaks at low z. On the other hand, the optimistic BHNS class would emerge with the highest significance when cross-correlated with WIxSC. The SDSS catalog performs significantly worse, due to the inadequate matching of the z distribution of events. For HLV, the 2MPZ catalog offers a concrete hope for a ∼3σ detection of the BNS (and perhaps BHNS, at a similar significance) cross-correlation within a decade of running at design sensitivity. The BBH case, however, remains out of reach, mainly as a result of a redshift distribution peaking at quite large z, where the anisotropy is expected to be small; a detection would require a much larger number of GW events. This can, however, be in part overcome using redshift tomography, as we discuss in the next section.

In order to overcome the above-mentioned limitations, we move on to study the potential of a future EUCLID-like catalog for the only type of binaries for which it would be definitely needed, namely BBH. An important ingredient that we consider is redshift tomography. To evaluate its impact, we consider configurations with an increasing number of redshift bins; in all cases, the binning in z was chosen so as to roughly provide equal numbers of BBH in each bin. While with HLV we expect no detection of the BBH signal, for HLVIK the impact of EUCLID-like catalogs combined with a tomographic approach can be crucial. Fig. 9 and Tab. IV show that the first two cases only provide a modest detection of the signal, but that cases with three bins or more strongly increase the significance, especially thanks to the bins at low redshift, where the anisotropy is the largest. This can be seen in the right panel of Fig.
9, where the contribution from each redshift bin is shown for the case 3zbin. Fig. 10 and Tab. IV provide a more quantitative comparison in terms of the ∆χ² estimator. While for HLV the signal is never above 2σ even with multiple z bins, for HLVIK even a modest binning in redshift is enough to make a significant signal emerge, up to 7σ for the five-bin case. For ET2CE, the signal is always very well detected. It can be seen that neglecting the information from z < 1, as done in [10] (dashed black curve), dramatically decreases the significance of the detection (from about 20σ to 10σ), even if only ∼20% of the events are in this range, cf. Fig. 10, right panel. Fig. 11 shows in more detail how well the bias is reconstructed independently in each single bin, for the HLVIK and ET2CE cases. For the ET2CE case (middle panel) the error is typically ∼10% at low z, decreases to a few percent around z ∼ 0.5, and then significantly increases in the range 1 < z < 5. This behavior is the result of two opposite trends: the lower the redshift, the higher the level of anisotropy projected over the sky; on the other hand, the number of events grows with redshift up to z ≃ 1. As a result, there is a sweet spot at intermediate redshifts where the error determination is optimal. Finally, the blue shaded region in Fig. 11 shows the combined constraint on the bias considering all the z bins together, yielding an error of a few percent. Such a precise BBH bias determination will eventually allow one to discriminate different models for the origin of the BBH population, in particular between an astrophysical and a primordial origin. For HLVIK (top panel) the bias will be determined with a precision of about 10%, definitely enough for a first discrimination among models. In both cases, tomography would allow one to shed light on significant evolutionary effects of the BBH population, if present.
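If the per-bin estimates are treated as independent, the combined constraint follows from standard inverse-variance weighting; the sketch below uses made-up per-bin errors loosely mimicking the ET2CE trend described above (∼10% at low z, a few % near z ∼ 0.5, degrading at high z).

```python
import numpy as np

def combine_bias(b_hat, b_err):
    """Inverse-variance combination of independent per-bin bias estimates."""
    w = 1.0 / np.asarray(b_err, dtype=float)**2
    b = np.sum(w * np.asarray(b_hat, dtype=float)) / np.sum(w)
    err = 1.0 / np.sqrt(np.sum(w))
    return b, err

# Illustrative per-bin central values and errors (not our actual numbers)
b_hat = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
b_err = np.array([0.10, 0.04, 0.03, 0.08, 0.20])
b_comb, err_comb = combine_bias(b_hat, b_err)
```

The combined error is dominated by the best-measured intermediate-z bins, which is why the all-bin constraint reaches the few-percent level even though the extreme bins are much weaker.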
The lower panel provides another illustration of the advantage of the cross-correlation analysis over the autocorrelation one: for the autocorrelation, the degradation of the constraints on b in the ET2CE case is dramatic, and only upper limits on b (or a marginal detection in some redshift bins) can be hoped for with this analysis technique.
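For independent redshift bins, a combined constraint such as the blue shaded region of Fig. 11 follows from the standard inverse-variance combination of the per-bin measurements. The following is a minimal sketch of that combination with made-up numbers (not values read off Fig. 11), illustrating why the joint error ends up smaller than any single-bin one.

```python
def combine_measurements(values, sigmas):
    """Inverse-variance weighted mean of independent measurements
    and its 1-sigma uncertainty."""
    weights = [1.0 / s**2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    return mean, wsum ** -0.5

# Hypothetical per-bin bias estimates b_i, each with a ~10% error:
b_vals = [1.5, 1.6, 1.5, 1.7]
b_errs = [0.15, 0.16, 0.15, 0.17]
b_comb, sigma_comb = combine_measurements(b_vals, b_errs)
# The combined error is smaller than every individual one, roughly by
# a factor sqrt(n_bins) when the per-bin errors are comparable.
```

With four bins of comparable precision, the combined error shrinks by about a factor of two relative to a single bin, which is the mechanism behind the few-percent global constraint quoted above.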

VI. DISCUSSION AND CONCLUSIONS
A few years after the first detection of coalescing binaries via their GW signal, we are about to enter the era of large samples of detected events, allowing one to perform increasingly refined statistical studies. The cross-correlation of the events with LSS (notably galaxy) catalogs has been discussed in the recent past as a potential tool, in particular to identify the underlying population of progenitors of BBH and discriminate, for instance, between astrophysical and primordial models.
In this article, we have revisited this problem with a number of advances on both the technical and the physical side: i) We have relied on realistic mock GW catalogs, calibrated on the most recent GW data, and on existing (and expected) galaxy catalogs. ii) We have post-processed these events with realistic reconstruction tools for existing (and expected) detector configurations, in order to determine their statistics as well as the key information concerning the angular resolution. iii) We have extended the study to different classes of events, namely BBH, BNS and BHNS. iv) We have explored the importance of the redshift distribution of both catalogs and GW samples and studied the impact of a tomographic approach.
Our results are rather promising: For BNS, the current generation of detectors (running at design sensitivity for a decade) should be able to marginally detect a cross-correlation with a few hundred events, in particular with relatively shallow catalogs like 2MPZ, despite a relatively poor GW localization. When the complete 2G network of five detectors becomes available, a firm measurement of the cross-correlation signal will be possible, thanks both to increased statistics and to a much better angular resolution. For BBH, however, both current GW detectors and current surveys prove inadequate for a detection of a cross-correlation signal. Fortunately, forthcoming surveys like EUCLID (similar performance is expected for LSST and, possibly, SPHEREx) will provide a much better coverage of the redshift range from which the bulk of BBH events originates. This, combined with a fully tomographic approach, would allow one to detect the BBH bias at the 10% level with a network of five 2G detectors, and thus permit the first tests of their origin. Finally, 3G detectors will be able to detect mergers at large z, in a range of distances much better covered by forthcoming, deep cosmological surveys. We foresee that the combination of these two facts will open exciting perspectives for precision studies.
FIG. 11. Constraint on the BBH bias from the cross-correlation with an EUCLID-like catalog, in each redshift bin separately (error bars) and globally (shaded region), for HLVIK (top panel) and ET2CE (middle panel). For comparison, the bias extracted for ET2CE from the autocorrelation is also shown in the bottom panel.