Improved detection statistics for non-Gaussian gravitational wave stochastic backgrounds

In a recent paper we described a novel approach to the detection and parameter estimation of a non-Gaussian stochastic background of gravitational waves. In this work we propose an improved version of the detection procedure, which preserves robustness against imperfect noise knowledge at no cost in detection performance: in the previous approach, the solution proposed to ensure robustness reduced the performance of the detection statistic, which in some cases (namely, mild non-Gaussianity) could be outperformed by Gaussian statistics established in the literature. We show, through a simple toy model, that the new detection statistic performs better than the previous one (and than the Gaussian statistic) everywhere in the parameter space. It approaches the optimal Neyman-Pearson statistic monotonically with increasing non-Gaussianity and/or number of detectors, and we discuss its efficiency in detail. This is a second, important step towards the implementation of a nearly optimal detection procedure for a realistic non-Gaussian stochastic background. We discuss the relevance of the results obtained in the context of the toy model used, and their importance for understanding a more realistic scenario.


I. INTRODUCTION
The non-Gaussian stochastic gravitational wave (GW) background is an interesting and promising target for GW detectors. It originates from a superposition of many uncorrelated and unresolved events. An interesting example is the background originating from compact binary coalescences, which is within reach of the current-generation detector network [1], but several others, of both astrophysical [2] and cosmological [3] origin, could be observed in the future.
The statistical properties of a non-Gaussian background are related to astrophysically relevant quantities of great interest, such as the event rate and the population distribution of source parameters. In the presence of strong overlap between the individual signals generated by the different events, the stochastic background is well modeled by a Gaussian stochastic process, fully characterized by its second order statistics. In its absence, the statistical distribution of the strain signals measured by the detectors becomes non-Gaussian and contains more information that can in principle be extracted. For example, according to the rate estimates based on recent LVK observations [4], the background generated by BBH coalescences is expected to be strongly non-Gaussian, with a product between the length of the observable coalescence signal and the event rate of O(10^-3). The same parameter is O(10) for BNS coalescences, for which non-Gaussianity is expected to be less important. Several detection strategies for a non-Gaussian background have been proposed in the literature: Drasco & Flanagan [5] devised a likelihood appropriate for the superposition of burst-like signals, deriving an optimal detection statistic under the assumption of well separated, burst-like signals, co-aligned and co-located detectors, and white noise; an approach applicable without these assumptions, but only with a negligible superposition probability for different events, has been proposed by Thrane [6]; Smith & Thrane [7] further proposed an optimal Bayesian parameter estimation method suitable for a background of astrophysical events in the low overlap regime.
In all the above proposals the basic building block is the probability distribution p for a convenient statistic ρ of the data, which can be evaluated for each segment of a measured signal. This is written as a weighted sum p(ρ) = ξ p_1(ρ) + (1 - ξ) p_0(ρ), where ρ ∼ p_1 when the given segment contains an astrophysical event and ρ ∼ p_0 otherwise. The weight ξ is just the probability of having an event in the segment. Other approaches, based on semiparametric models [8] or higher order statistics [9,10], have also been explored. A comprehensive review can be found in [11].
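As a minimal numerical sketch of this building block (the densities p_0 and p_1, the duty cycle ξ, and all numerical values below are illustrative placeholders, not the choices made in the cited works):

```python
import math

def p0(rho):
    # placeholder density of the segment statistic with no event (unit-variance Gaussian)
    return math.exp(-rho ** 2 / 2) / math.sqrt(2 * math.pi)

def p1(rho):
    # placeholder density when the segment contains an event (wider Gaussian, sigma = 3)
    sigma = 3.0
    return math.exp(-rho ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p_mixture(rho, xi):
    # weighted sum: xi is the probability that the segment contains an event
    return xi * p1(rho) + (1 - xi) * p0(rho)

def log_likelihood_ratio(rhos, xi):
    # log of the signal-vs-noise likelihood ratio over independent segments
    return sum(math.log(p_mixture(r, xi) / p0(r)) for r in rhos)
```

A large positive value of the log likelihood ratio favors the presence of a background.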
In a recent paper [12] we discussed a direct approach for detection and parameter estimation of a non-Gaussian stochastic background of GWs through a network of detectors. The core idea was to follow as closely as possible an optimal approach. The target was to implement an optimal detection statistic (DS) and, similarly, a Bayesian approach to parameter estimation based on a realistic likelihood L. To circumvent the lack of analytical closed-form expressions for the DS and for L we opted to numerically evaluate them. This approach implies approximations of the relevant integrals, which can be reduced at will with a large enough computational power. We proposed a Monte Carlo importance sampling procedure to achieve this objective. In this paper we focus on a further improvement of the DS, and do not investigate parameter estimation further. In Sec. II we summarize our previous findings relevant to the improvement presented in this paper.

II. SUMMARY OF PREVIOUS RESULTS
As a first step in [12] we studied the optimal Neyman-Pearson DS for a simple toy model, where the stochastic signal is a sequence of independent pulses with amplitudes distributed as a Gaussian mixture. In this particular case the DS can be evaluated in a closed form, as a nonlinear function of the data.
A conceptual issue arose in this context: the optimal Neyman-Pearson DS, which is obtained under the assumption of known noise, contains contributions proportional to single-detector data autocorrelations (of arbitrary order). These contributions are dominated by the noise 1 , and in a realistic regime the noise amplitude cannot be assumed known with a precision high enough not to spoil the DS effectiveness.
The same shortcoming has been encountered in the literature also in the Gaussian case, and we will clarify it with an example. Let us suppose we want to discriminate between the models

H_0 : s^A_i = σ n^A_i ,    H_1 : s^A_i = σ n^A_i + A h_i ,

where s^A_i are measured values (1 ≤ A ≤ N_D is an index labelling the detector and 1 ≤ i ≤ N the sample), and σ and A are the noise and signal amplitudes. The stochastic variables n^A_i , h_i are normally distributed and independent. Under the assumption of a known σ^2 the Neyman-Pearson test is easily found from the likelihood ratio, which is equivalent to thresholding

Y = Σ_{i=1}^{N} ( Σ_{A=1}^{N_D} s^A_i )^2 = Σ_i Σ_{A ≠ B} s^A_i s^B_i + Σ_i Σ_A (s^A_i)^2 .    (5)

The mean of Y is

⟨Y⟩ = N N_D (N_D - 1) A^2 + N N_D (A^2 + σ^2) .

1 We always assume a Gaussian noise.
We see that the first term, originating from cross-correlations, is noise independent, while the second, originating from auto-correlations, depends both upon signal and noise. As a matter of fact, for a realistic stochastic background σ^2 is uncertain, with an uncertainty larger than A^2. This means that we will not be able to confidently detect a stochastic background from the DS in (5): a given value of Y could be explained both by a background contribution and by a noise fluctuation. The problem disappears when the noise dominated diagonal components are removed, obtaining

Y' = Σ_{i=1}^{N} Σ_{A ≠ B} s^A_i s^B_i .    (7)

This is a very simplified version of the statistic commonly employed to detect a Gaussian stochastic background, see for example Eq. (3.73) in [13], which defines the optimal cross-correlation statistic in the Gaussian case as

Ŷ = ∫ df  γ(f) Ω_GW(f) / [ f^3 S^A_n(f) S^B_n(f) ]  s̃^A(f)^* s̃^B(f) ,    (8)

up to a normalization. A simplification comes from the assumption of co-located and co-oriented detectors, which makes the overlap reduction function γ(f) equal to one. Furthermore, both the stochastic signal and the noise are assumed here to have a white spectrum, so Ω_GW ∝ f^3 and S^{A,B}_n are frequency independent, and Eq. (8) becomes equivalent to Eq. (5). It is interesting to note that if we use a generalized likelihood ratio test (GLRT) like

Λ = max_{σ,A} p(s | H_1, σ, A) / max_{σ} p(s | H_0, σ) ,

where σ̂ and Â are the maximum likelihood estimates of σ and A under the H_1 hypothesis, we obtain a statistic whose numerator is the cross-correlation sum and whose denominator depends on σ̂^2. For large enough N, relative fluctuations in the denominator become negligible, and we recover the DS in Eq. (7). Under the assumption that noises across different detectors are uncorrelated, the modified statistic has by construction a zero expectation value under the hypothesis H_0 (i.e. absence of signal), and a detection procedure robust against noise mismodelling can be defined (see for example [13], or [14] and references therein).
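The effect of removing the diagonal terms can be illustrated numerically; the following sketch (with made-up parameters and function names of ours) contrasts a full statistic of the type of Y with its cross-correlation-only counterpart, whose mean vanishes under H_0 for independent noises:

```python
import random

def full_statistic(s):
    # Y = sum_i (sum_A s^A_i)^2: contains noise-dominated auto-correlations
    n = len(s[0])
    return sum(sum(stream[i] for stream in s) ** 2 for i in range(n))

def cross_statistic(s):
    # keep only the A != B terms: zero mean under H0 for independent noises
    n, nd = len(s[0]), len(s)
    total = 0.0
    for i in range(n):
        for a in range(nd):
            for b in range(nd):
                if a != b:
                    total += s[a][i] * s[b][i]
    return total

rng = random.Random(1)
n_samples = 20000
# H0 data: two detectors with independent unit-variance Gaussian noise, no signal
data = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(2)]
```

Per sample, full_statistic stays close to the total noise variance (here 2) and shifts if the noise amplitude is misestimated, while cross_statistic fluctuates around zero.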
Removing analogous contributions 2 from the nonlinear statistic appropriate for a non-Gaussian background is tricky. In [12] we defined a procedure involving the subtraction of a set of auxiliary data constructed from the original ones. The basic idea was to construct a set of data streams (one for each detector) with the same autocorrelation but with zero cross-correlation among themselves and with the original set. This can easily be done, for example, by introducing a time shift larger than the correlation length of the stochastic background. In this way autocorrelations can be independently estimated and subtracted.
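The auxiliary-data construction can be sketched as a circular time shift per detector; the shift length below is a placeholder standing in for "larger than the correlation length":

```python
def time_shift(stream, shift):
    # circular shift of one detector stream; its autocorrelation is unchanged
    shift %= len(stream)
    return stream[shift:] + stream[:shift]

def auxiliary_data(streams, base_shift=1000):
    # a different shift per detector destroys cross-correlations among the
    # auxiliary streams and with the original set (for short-range correlations)
    return [time_shift(s, (a + 1) * base_shift) for a, s in enumerate(streams)]
```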
However, our approach was suboptimal: the auxiliary data have by construction zero-average estimates of the cross-correlation, but these estimates, which are subtracted from those of the original data, carry additional fluctuations. These add to the fluctuations of the original set: as a consequence, we obtain a noisier DS and the detection performance is reduced. In [12] this was evident in the regime where non-Gaussianity was not too high. It was quantified by the reduction of detection probability at a given false alarm probability, which in some cases was worse than that of the Gaussian DS. We show these results again in this paper, comparing them with the improved ones (see Figures 1, 2, 4, 5).
In this work we define a different, straightforward procedure to remove diagonal contributions from the nonlinear statistic. This procedure does not introduce additional fluctuations, and improves performance compared to [12], as measured by figures of merit related to detection probability, discussed extensively in Section IV. We apply the new procedure to the same toy model used previously, and we quantify the resulting improvement. Finally, we show that the new procedure can be applied to a realistic background.
Henceforth, we will discuss and compare a number of detection statistics, with the following naming convention: • The "Exact" DS is constructed via the Neyman-Pearson lemma [15]. It is formally the optimal one, but the presence of diagonal terms makes it unusable in a realistic setup, i.e. when the uncertainty on the noise amplitude is larger than the target GW signal under the hypothesis H 1 .
• The "Scrambled" DS is obtained from the "Exact" statistic with the subtraction procedure defined and characterized in [12], and shortly described in this Section.
• The "Improved" DS is obtained from the "Exact" statistic with the new subtraction procedure described in Section III.
• The "Gaussian" DS is the optimal Neyman-Pearson detector for a Gaussian stochastic background with a spectrum matching that of the toy model described in Section III, the same considered in [12]. It is suboptimal when applied to a non-Gaussian stochastic background, and is used as a fiducial reference for performance comparison.
In this work we provide only minimal details on the construction of the "Exact", "Scrambled", and "Gaussian" DSs; the interested reader will find them in [12] and references therein. We will instead focus on the "Improved" statistic. In Sec. III we construct it explicitly, providing a proof of its optimality with respect to a set of robustness requirements. In Sec. IV A we present performance comparisons between the above statistics, exploring their behaviour for varying signal duration, amount of non-Gaussianity, and number of detectors. In Sec. IV B we briefly discuss a realistic implementation of the procedure, highlighting its advantages. Finally, in Sec. V we draw conclusions and prospects for further developments of our approach.

III. IMPROVED DETECTION
Our discussion will be mainly in the context of the toy model studied in [12], which allows for a quantitative comparison of detection performances. In this model, the data measured by each detector are time series of the form

s^A_i = n^A_i + h_i ,

where i is a discrete time index, 1 ≤ i ≤ N, t_i = i δt is the measurement time and δt the sampling time. We suppose to have a network of N_D detectors labeled by the index A, with 1 ≤ A ≤ N_D. The noise component of the data is n^A_i, and h_i the signal: we will assume that the n^A_i are Gaussian, zero-average random numbers with covariances defined by

⟨ n^A_i n^B_j ⟩ = σ_A^2 δ^{AB} δ_{ij} .

The signal h_i is the same on each detector, so the detectors can be thought of as identical, aligned and co-located. It follows a Gaussian mixture model distribution

p(h_i) = Σ_α p_α N(h_i | 0, σ_α) .

By hypothesis, different signal data points are therefore independent: the waveforms associated with our events are very short, delta-like bursts, without a resolved structure. Let us write the single event in the form

h_i = h δ_{ij} ,

with two parameters: the amplitude h and the index j associated with the event time. As the model is time independent, all the values of j have the same probability. If p(h) is the probability distribution for the amplitude parameter, h_i will be distributed, over an observing time N δt, as the compound Poisson mixture

p(h_i) = Σ_{n=0}^{∞} e^{-Γδt} (Γδt)^n / n!  p^{(∗n)}(h_i) ,    (16)

where Γ is the event rate and p^{(∗n)} the n-fold convolution of p. This can be approximated by the toy model with two mixture components α ∈ {+, −} introduced in [12] if p ∼ N(· | 0, σ_+) and Γδt ≪ 1, in such a way that terms with n > 1 can be neglected in Eq. (16):

p(h_i) ≈ (1 - Γδt) δ(h_i) + Γδt N(h_i | 0, σ_+) .    (17)

This corresponds to σ_− = 0, p_+ = Γδt and σ_+/σ_h = 1/√(Γδt). The parameter Γδt is the average number of events that contribute to a given h_i, a measure of overlap and Gaussianity. For our delta-like signals the approximation Γδt ≪ 1 is not a problem, because the sampling time δt is a parameter unrelated to the physics and can be assumed to be small.
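A simulation of this toy model is straightforward; the following sketch (function name and parameter values are ours, chosen for illustration) draws the shared burst amplitudes and adds independent detector noise:

```python
import random

def toy_model_data(n_det, n_samples, sigma_noise, gamma_dt, sigma_plus, seed=0):
    # With probability p+ = gamma_dt a sample carries a delta-like burst with
    # amplitude ~ N(0, sigma_plus), identical in all (co-located) detectors;
    # otherwise the signal sample is zero. Noise is independent per detector.
    rng = random.Random(seed)
    h = [rng.gauss(0, sigma_plus) if rng.random() < gamma_dt else 0.0
         for _ in range(n_samples)]
    data = [[h[i] + rng.gauss(0, sigma_noise) for i in range(n_samples)]
            for _ in range(n_det)]
    return data, h
```

With gamma_dt = 0.01 and sigma_plus/sigma_h = 1/sqrt(gamma_dt) = 10, the signal variance is sigma_h^2 = 1 but the amplitude distribution is strongly non-Gaussian.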
The two main limits of the model are the lack of any time structure for the event waveform, which does not allow for a relevant overlap structure, and the approximation of identical (co-aligned and co-located) detectors. To accommodate the general case, our model should be interpreted as a superposition of an astrophysical contribution like the previous one, with p ∼ N(· | 0, σ_+), and of a Gaussian contribution with variance σ_−^2. In this case (again in the approximation Γδt ≪ 1) we obtain

p(h_i) = (1 - p_+) N(h_i | 0, σ_−) + p_+ N(h_i | 0, σ_+) ,    (18)

where now both components have a finite width. We can expand the DS as

Ŷ = Σ_{k_1 ··· k_{N_D}} Ŷ_{k_1 ··· k_{N_D}}    (25)

with

Ŷ_{k_1 ··· k_{N_D}} = c_{k_1 ··· k_{N_D}} Π_{A=1}^{N_D} (u^A)^{k_A} ,    (26)

where the k_A are non-negative integers and u^A denotes the data of detector A. Following the argument provided in Sec. II we want to remove from Eq. (25) all terms with non-zero expectation value under the hypothesis H_0 (i.e., in the absence of a GW signal). Under such hypothesis, the u^A are independent normal variables and, by virtue of Isserlis' theorem, the terms to be cancelled are those in the sum of Eq. (26) with all even k_A.
This can be written explicitly, up to an irrelevant multiplicative constant, as

Ŷ′(u^1, ···, u^{N_D}) = Ŷ(u^1, ···, u^{N_D}) - (1/2^{N_D}) Σ_{ε_A = ±1} Ŷ(ε_1 u^1, ···, ε_{N_D} u^{N_D}) .    (27)

A formal proof that this is the desired DS follows.
Theorem III.1. For a given statistics S(u_1, ···, u_{N_D}), where u_A = h_A + n_A, if the n_A are statistically independent stochastic variables (i.e., the noise in our context) with non-zero even moments and zero odd moments, consider the formal Taylor expansion of S in powers of the u_A. A modified statistics where, under H_0, terms with non-zero expectation values are canceled while the others are preserved is

S′(u_1, ···, u_{N_D}) = S(u_1, ···, u_{N_D}) - (1/2^{N_D}) Σ_{ε_A = ±1} S(ε_1 u_1, ···, ε_{N_D} u_{N_D}) .    (28)

Proof. Let us consider a generic term of the Taylor expansion, which reads

c_{k_1 ··· k_{N_D}} Π_{A=1}^{N_D} (u_A)^{k_A} ,    (29)

where the k_A are non-negative integers. If h_A = 0, its expectation value is different from zero iff all the k_A are even. In fact,

⟨ Π_A (n_A)^{k_A} ⟩ = Π_A ⟨ (n_A)^{k_A} ⟩ ,    (30)

as the n_A are statistically independent, and the factors with odd k_A vanish. We observe that

(1/2^{N_D}) Σ_{ε_A = ±1} Π_A (ε_A u_A)^{k_A} = Π_A { [1 + (−1)^{k_A}] / 2 } Π_A (u_A)^{k_A} ,    (31)

and the expression between curly braces is equal to one iff all the k_A are even, and equal to zero otherwise. Therefore the statistics in Eq. (28) cancels the fully even terms, while preserving all the others.
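The construction of Theorem III.1 translates directly into code: average the statistic over the 2^{N_D} per-detector sign flips and subtract. The monomials below are toy inputs used to check the cancellation:

```python
from itertools import product

def improved_statistic(S, u):
    # Subtract the average of S over all per-detector sign flips (Eq. (28)):
    # this cancels exactly the Taylor terms with all even exponents.
    nd = len(u)
    flip_avg = sum(S([e * x for e, x in zip(eps, u)])
                   for eps in product((1, -1), repeat=nd)) / 2 ** nd
    return S(u) - flip_avg

def fully_even(u):
    # u1^2 u2^2: non-zero mean under H0, must be removed
    return u[0] ** 2 * u[1] ** 2

def cross_term(u):
    # u1 u2: zero mean under H0, must be preserved
    return u[0] * u[1]
```

For fully_even the 2^2 = 4 flipped evaluations coincide, so the subtraction cancels it pointwise; for cross_term the flipped evaluations average to zero and the term survives untouched.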
It is worth noting that the Gaussian noise assumption is not mandatory: only independence and zero odd moments are required. Gaussian, zero-average noise is a particular case. We focus on two particular examples, N_D = 2 and N_D = 3. For N_D = 2, the first two non-zero orders in a power expansion of the modified statistic involve the weighted cross-correlation u^1 u^2 and fourth-order combinations of the data: for a linear function ŷ the usual Gaussian optimal DS is recovered, while higher-order corrections are proportional to higher order correlations of the measured data. As desired, diagonal terms such as (u^1)^2, (u^1)^2 (u^2)^2 etc. are canceled under the null hypothesis. We also note here that under the competing hypothesis (h ≠ 0), O(u^4) terms proportional to h^2 n^2 and to h^4 are preserved. The first provides additional information on the GW spectra, boosted by the noise spectra, frequently assumed to be much larger. The other contains information about non-Gaussianity. Similar contributions will appear at higher orders.
In the same way, for N_D = 3 the lowest non-zero order is just a sum over all pairings of the weighted cross-correlations, with no diagonal higher order terms: we recover again the Gaussian optimal DS at the lowest order, with higher-order corrections involving higher order cross-correlations of the data.

IV. RESULTS
In order to contextualize the performances of the improved DS, we first discuss the connection between the toy model used for the tests and a realistic stochastic background. Let us remind that the former can be described as a superposition of statistically independent events. Each event will produce a contribution to the strain

h_{μν}(x, t) = Σ_I u_{μν}(x, t - τ_I ; λ_I) ,

where τ_I is the event time and λ_I = (λ^1_I, ···, λ^{N_p}_I) an appropriate set of parameters describing the event.
These can be intrinsic (e.g., the chirp mass of a coalescing binary) or extrinsic (e.g., the source luminosity distance or its position in the sky). It is convenient for our modelling to isolate the arrival time and not describe it on the same footing as the other parameters.
A useful formalism for the description of a sequence of independent events with a fixed rate is based on the introduction of a function Q(λ) [16].
We will not give the explicit derivations (see [16] for details): the interesting final result is that the statistical cumulants of the strain field can be written as

⟨⟨ h_{μ_1ν_1}(x_1, t_1) ··· h_{μ_nν_n}(x_n, t_n) ⟩⟩ = ∫ dλ Q(λ) ∫ dτ u_{μ_1ν_1}(x_1, t_1 - τ; λ) ··· u_{μ_nν_n}(x_n, t_n - τ; λ) ,    (38)

where we assumed that the average value ⟨h_{μν}(x, t)⟩ is zero. The quantity Q(λ)dλ can be interpreted as the rate of events in a given infinitesimal volume of the parameter space. This is more evident if we write

Q(λ) = Γ P(λ) ,

where Γ is the event rate and P(λ) the probability density of λ over its parameter space. We use this notation to also rewrite Eq. (38) as

⟨⟨ h_{μ_1ν_1}(x_1, t_1) ··· h_{μ_nν_n}(x_n, t_n) ⟩⟩ = Γ ⟨ ∫ dτ u_{μ_1ν_1}(x_1, t_1 - τ; λ) ··· u_{μ_nν_n}(x_n, t_n - τ; λ) ⟩_λ

or, in the frequency domain,

⟨⟨ h̃_{μ_1ν_1}(x_1, ω_1) ··· h̃_{μ_nν_n}(x_n, ω_n) ⟩⟩ = 2π Γ δ(ω_1 + ··· + ω_n) ⟨ ũ_{μ_1ν_1}(x_1, ω_1) ··· ũ_{μ_nν_n}(x_n, ω_n) ⟩_λ .    (43)

Here ⟨X⟩_λ denotes the average of X over the parameters λ. Note that, from an observational point of view, we are interested in cumulants of the strain measured by the detectors, namely

h_A(t) = D^{ij}_A(t) h_{ij}(x_A, t) .    (44)

Here D^{ij}_A is a generalized detector tensor of the A-th interferometer, located in x_A, where we included the time dependency connected to the detector's movement and a whitening transformation that can be chosen in such a way to normalize the signal to a white noise spectrum. Moments of the measured strains defined in Eq. (44) can be easily recovered from Eq. (38).
All the information of astrophysical interest is encoded in Q(λ). Q functions are additive: if a given stochastic background is the sum of several contributions, each described by Q^{(a)}(λ), we have

Q(λ) = Σ_a Q^{(a)}(λ) .    (45)

This mathematical description lets us highlight an important fact: let us suppose that the rate is increased, Q(λ) → η Q(λ), and at the same time individual event amplitudes are scaled according to u_{μν} → η^{-1/2} u_{μν}. The second order cumulant ⟨⟨h_{μ_1ν_1}(x_1, t_1) h_{μ_2ν_2}(x_2, t_2)⟩⟩ does not change, while higher order ones are rescaled as

⟨⟨ h_{μ_1ν_1}(x_1, t_1) ··· h_{μ_nν_n}(x_n, t_n) ⟩⟩ → η^{(2−n)/2} ⟨⟨ h_{μ_1ν_1}(x_1, t_1) ··· h_{μ_nν_n}(x_n, t_n) ⟩⟩    (46)

and become less and less important in the η → ∞ limit. This offers a straightforward interpretation: when the background is generated by a very large number of weak events, it becomes a Gaussian stochastic field, and only its second order cumulant, which is related to spectral characteristics, is relevant.
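This rescaling can be checked with a small compound Poisson simulation (the rates, amplitude law, and sample sizes below are arbitrary illustrative choices): keeping the variance fixed while the rate grows by a factor η, the excess kurtosis, which is controlled by the fourth cumulant, shrinks as 1/η:

```python
import math
import random

def poisson(rng, lam):
    # Knuth's inversion-by-multiplication method, fine for the modest rates used here
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_field(rng, eta, n, base_rate=0.5):
    # rate scaled by eta, single-event amplitude scaled by eta^{-1/2}:
    # the second cumulant stays fixed, the n-th scales as eta^{(2-n)/2}
    return [sum(rng.gauss(0, 1) for _ in range(poisson(rng, eta * base_rate)))
            / math.sqrt(eta) for _ in range(n)]

def excess_kurtosis(xs):
    # kappa_4 / kappa_2^2 for a zero-mean sample
    n = len(xs)
    m2 = sum(x * x for x in xs) / n
    m4 = sum(x ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0
```

For a unit-variance event amplitude the expected excess kurtosis is 3/(base_rate · η), so increasing η drives the field toward Gaussianity.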
Let us now connect the formalism above with the proposed toy model. Considering σ_− = 0 (Eq. (17)) we have

ũ(ω) = h δt ,

and we can use directly Eq. (43) to get the even cumulants

⟨⟨ h̃_{A_1}(ω_1) ··· h̃_{A_n}(ω_n) ⟩⟩ = 2π Γ δ(ω_1 + ··· + ω_n) ⟨h^n⟩ δt^n = 2π Γ (n - 1)!! p_+ σ_+^n δt^n δ(ω_1 + ··· + ω_n) ,    (48)

while the odd ones are zero. The general case in Eq. (18) can be obtained by substituting in this result σ_+^2 → σ_+^2 - σ_−^2 and by adding the contribution of a Gaussian background with variance σ_−^2. This contributes only to the n = 2 cumulant and we get

⟨⟨ h̃_{A_1}(ω_1) ··· h̃_{A_n}(ω_n) ⟩⟩ = 2π Γ (n - 1)!! [ p_+ (σ_+^2 - σ_−^2)^{n/2} + σ_−^2 δ_{n,2} ] δt^n δ(ω_1 + ··· + ω_n) .    (49)

In particular the power spectrum is flat and given by

S_h(f) = σ_h^2 δt ,

which corresponds to an adimensional energy density per logarithmic interval of frequency Ω_GW(f) ∝ f^3 S_h(f). It is important to note that the distribution of signal amplitudes for each event depends on the specific details of Q(λ): e.g., it is sensitive to the redshift z distribution of the sources. The signal power spectrum depends only on σ_h, while higher order moments carry information, and can therefore distinguish additional signal properties. The noise power spectrum can be written similarly as S^A_n(f) = σ_A^2 δt. Finally, the signal-to-noise ratio for a pair of detectors A, B can be written as a function of the toy model parameters as

SNR ∝ √(T Δf)  σ_h^2 / (σ_A σ_B) ,

where T is the observing time and Δf the sensitivity bandwidth of the detectors. We stress that this is an appropriate figure of merit for the performances of the Gaussian analysis. In order to compare the DSs' performances, we follow closely the procedure in [12]. In addition, we observe that the moments of the DS can be evaluated as explicit integrals, and we therefore avoid a direct simulation of the data under H_1: for the "Exact" DS these integrals are Gaussian, involving matrices C_± in detector space whose entries describe the two multivariate Gaussian distributions entering the competing hypotheses.
Similar expressions can be obtained for the other DSs listed in Sec. II (detailed derivations are provided in [12]).

A. Performance comparison
When a series of N_d data points is collected from each detector, the DS is the sum over samples of the single-sample statistic in Eq. (19), and for large enough N_d, Ŷ is approximately Gaussian distributed. Using this property, and the fact that the connected moments are proportional to N_d, i.e.

⟨⟨ Ŷ^n ⟩⟩ ∝ N_d ,

we write a relation between false alarm probability P_FA, detection probability P_D, and N_d in the form

√N_d (μ_1 - μ_0) = √Σ_0 Q^{-1}(P_FA) - √Σ_1 Q^{-1}(P_D) ,    (58)

where N_d μ_h and N_d Σ_h are the mean and variance of Ŷ under hypothesis H_h, and Q^{-1} is the inverse of the Gaussian tail probability. We note that this relation is valid for all the DSs considered, for a large enough number of data points N_d (or, equivalently, for a large enough data taking time T = δt N_d). It follows that a naive definition of a signal-to-noise ratio, proportional to √N_d (μ_1 - μ_0)/√Σ_0, can always be used. However, it is also clear that this SNR does not describe completely the performances of a given DS, as the ratio Σ_1/Σ_0 is also relevant. For this reason, we argue that the best and unambiguous figure of merit for a comparison between the Gaussian and the non-Gaussian case is the value of the detection probability.
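Under this Gaussian approximation, the relation can be inverted to give the number of data points needed at given error probabilities; the function below is a sketch, with per-sample moments mu_h, var_h as our placeholder names:

```python
from math import sqrt
from statistics import NormalDist

def q_inv(p):
    # inverse of the Gaussian tail probability Q(x) = P(Z > x)
    return NormalDist().inv_cdf(1.0 - p)

def required_samples(mu0, mu1, var0, var1, p_fa, p_d):
    # Number of data points such that a DS with per-sample means mu_h and
    # variances var_h (h = 0, 1) reaches detection probability p_d at false
    # alarm probability p_fa, in the Gaussian approximation.
    num = sqrt(var0) * q_inv(p_fa) - sqrt(var1) * q_inv(p_d)
    return (num / (mu1 - mu0)) ** 2
```

For p_d = 0.5 the second term drops, and the answer scales as var0 · q_inv(p_fa)^2 / (mu1 - mu0)^2.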
Performances are presented in Figs. 1, 2, 4 and 5, using Eq. (58), for N_D = 2, 3, 4 and 5, respectively. We plot the detection probability against the number of data points N, for P_FA = 10^-15. The advantage of the procedure is that it gives precise estimates of the relevant quantities also for very small values of P_FA, otherwise inaccessible through simulated data. On the other hand, the functional form of Eq. (58) is based on the assumption that the statistic is Gaussian, so it should be considered reliable only for a high enough number of data points.
Every figure shows several panels with different choices of σ_+/σ_h and σ_−/σ_h, to explore different levels of non-Gaussianity, from negligible (bottom right panel) to strong (top left panel). The "Improved" DS systematically outperforms the "Scrambled" one studied in [12], with a greater probability of detection for every N and level of signal Gaussianity. In particular, the "Improved" performances are always equal to or better than the "Gaussian" ones. The "Improved" DS does not suffer from the loss of performance affecting the "Scrambled" one, even for signals very close to Gaussian.
Remarkably, increasing the number of detectors (e.g. in Fig. 2) we see that the "Improved" DS has performances similar to the "Exact" one, showing an important gain in having a greater N_D: the "Improved" DS performances converge to those of the "Exact" one for large N_D.
A semi-quantitative understanding of this behavior is obtained as follows: we notice that, at a given order O(u^{2n}), the fully even terms are a small fraction of all the terms, and this fraction decreases as N_D grows. This suggests that the subtracted terms impact the performances less and less when N_D grows larger, the "Improved" DS getting closer and closer to the "Exact" one. We note that a network with N_D = 5 is not unrealistic [17], and is already large enough to make the "Exact" and "Improved" DSs nearly equivalent. A further argument in support of such behaviour follows: we rewrite Eq. (58) as an explicit expression for the number of data points N needed to reach a given detection probability at fixed false alarm probability, and we define N_G and N_I as its values computed for the "Gaussian" and "Improved" DSs, respectively. The ratio N_G/N_I then gives the multiplicative factor for the number of data points needed by the "Gaussian" DS to reach the same detection probability as the "Improved" one, for fixed competing hypotheses and probability of false alarm.
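The counting argument can be made concrete with a brute-force check, at small orders, of the fraction of fully even monomials removed by the subtraction:

```python
from itertools import product

def fully_even_fraction(order, n_det):
    # among monomials u_1^{k_1} ... u_ND^{k_ND} of total degree `order`,
    # fraction with all exponents even (the terms removed by the subtraction)
    total = even = 0
    for ks in product(range(order + 1), repeat=n_det):
        if sum(ks) == order:
            total += 1
            if all(k % 2 == 0 for k in ks):
                even += 1
    return even / total
```

At order u^4, the removed fraction drops from 3/5 for N_D = 2 to 3/14 for N_D = 5, consistent with the convergence of the "Improved" DS to the "Exact" one.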
We show a plot of such ratio at fixed P_D = 0.5 and P_FA = 10^-15 in Fig. 3 (right panel) for a set of N_D values. The ratios σ_+/σ_h and σ_−/σ_h are shown on the vertical and horizontal axes, respectively. The improved statistic always performs better than the Gaussian statistic, as N_G/N_I is globally greater than one. Such improvement grows larger in the upper left corner, where the signal kurtosis is larger, as expected. In the left panel of Figure 3 we represent the relation between the reduced kurtosis and N_G/N_I, for a uniform sampling in the parameters σ_+/σ_h, σ_−/σ_h of the region shown in the right panel. Results for N_D = 2, 3, 4, 5 can be compared, and we see again that the performance of the "Improved" DS relative to the "Gaussian" one increases with N_D.

FIG. 1. Comparison between detection statistics: each panel contains the probability of detection P_D as a function of the number of data points N. We show the "Exact", "Improved", "Scrambled", and "Gaussian" statistics as solid red, dashed green, dashed blue, and solid black lines, respectively. Several different signal models are considered, parameterized by σ_+/σ_h and σ_−/σ_h, of increasing non-Gaussianity, going from the bottom right to the top left panel. We fix N_D = 2 and P_FA = 10^-15. Shaded red areas delimit regions with beyond-optimal performances, right-bounded by the "Exact" DS. Shaded grey areas delimit statistics with performances worse than the "Gaussian" one. We observe uniform improvement of the "Improved" statistic over the whole parameter space as compared to the Gaussian one. By contrast, the scrambling procedure introduced in [12] outperforms the Gaussian one for strong non-Gaussianities only. In Figs. 2, 4, and 5 similar results for an increasing number of detectors are shown, with the "Improved" statistic approaching the "Exact" one.
This is equivalent to a comparison of the time required to achieve a detection at the same level of significance using the two detection statistics. From Figure 3 we see that, with two detectors and a mild non-Gaussianity, a confident detection would require a data taking time three times larger with the standard "Gaussian" DS than with the "Improved" DS we introduced. Larger improvements are obtained with larger non-Gaussianities and more than two detectors.
We note that having N_G/N_I ≥ 1 is a priori expected. In fact, the "Improved" version of the optimal statistic is obtained by subtracting non-zero mean terms from the "Exact" one, which is optimal in the Neyman-Pearson sense. In the Gaussian limit the "Improved" statistic becomes exactly the "Gaussian" one, as discussed in the previous section (see Eq. (36) for a specific example), and can be considered its natural generalization. However, we do not provide here a rigorous proof.

B. Subtraction for realistic backgrounds
Given the effectiveness of the new procedure, we explore its application to a realistic non-Gaussian stochastic background. We consider the model where the GW signals measured by each detector are now different, and the joint probability distribution reads

p(s) = ∫ Π_{A,i} dh^A_i  D(h)  p_n(s - h) ,

where D(h) is some generic function of the data h^A_i (a probability density), Π_{A,i} dh^A_i a measure in the space (of dimension N_D × N) of possible signal time series, and p_n the joint noise probability density.
We first observe that the statistical properties of h are irrelevant, because we only want to obtain the cancellation of diagonal terms under the H_0 hypothesis. However, realistic detector noises can have non-trivial time-domain correlations, and non-Gaussian contaminations are expected. A generic term of a Taylor expansion of the DSs with degree κ_1 + ··· + κ_{N_D} reads

Π_{A=1}^{N_D} T_A(u^A) ,

where T_A is a monomial of total degree κ_A in the samples of detector A, and its expectation under H_0 is

⟨ Π_A T_A(n^A) ⟩ = Π_A ⟨ T_A(n^A) ⟩ ,

as we assume again statistically independent noises across different detectors. After applying the subtraction procedure defined in Eq. (27) we get

Π_A T_A(u^A) { 1 - Π_B [ 1 + (−1)^{κ_B} ] / 2 } .

As in the simplified case, a given term is cancelled if all the κ_A are even and is preserved otherwise. The expectation values of all the preserved contributions are zero, unless some odd-order noise moments are non-null. Therefore, the subtraction procedure is effective in a very general sense, and is applicable also when non-Gaussian contaminations (without non-zero odd moments) are present. In particular this procedure can easily be incorporated in the general DS defined in Eq. (17) of [12]. However, it should be noted that non-Gaussian noise contributions can change the expectation value of the statistic under H_1.

FIG. 2. Comparison between detection statistics: each panel contains the probability of detection P_D as a function of the number of data points N. We show the "Exact", "Improved", "Scrambled", and "Gaussian" statistics as solid red, dashed green, dashed blue, and solid black lines, respectively. Several different signal models are considered, parameterized by σ_+/σ_h and σ_−/σ_h, of increasing non-Gaussianity, going from the bottom right to the top left panel. We fix N_D = 3 and P_FA = 10^-15. Shaded red areas delimit regions with beyond-optimal performances, right-bounded by the "Exact" detection statistic. Shaded grey areas delimit statistics with performances worse than the "Gaussian" one. We observe uniform improvement of the "Improved" statistic over the whole parameter space as compared to the "Gaussian" one. By contrast, the scrambling procedure introduced in [12] outperforms the Gaussian one for strong non-Gaussianities only.
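The generality claimed here is easy to probe numerically: with a symmetric but non-Gaussian noise (zero odd moments), the subtracted statistic keeps a zero mean under H_0. The statistic S and the Laplace-like noise below are arbitrary test choices of ours:

```python
import random
from itertools import product

def subtracted(S, u):
    # sign-flip subtraction of Eq. (27): removes the fully even terms
    nd = len(u)
    avg = sum(S([e * x for e, x in zip(eps, u)])
              for eps in product((1, -1), repeat=nd)) / 2 ** nd
    return S(u) - avg

def symmetric_noise(rng):
    # Laplace-like sample: non-Gaussian, but all odd moments vanish
    return rng.choice((-1, 1)) * rng.expovariate(1.0)

rng = random.Random(7)
S = lambda u: u[0] ** 2 * u[1] ** 2 + u[0] * u[1]  # even part plus cross part
vals = [subtracted(S, [symmetric_noise(rng), symmetric_noise(rng)])
        for _ in range(50000)]
h0_mean = sum(vals) / len(vals)
```

The even part is cancelled sample by sample, so h0_mean estimates the mean of the surviving cross term, which vanishes for independent noises.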

V. CONCLUSIONS AND PERSPECTIVES
The "Improved" DS introduced in this paper can be seen as the natural generalization to the non-Gaussian case of the optimal statistic used for a Gaussian stochastic background. As a study of a realistic model is computationally expensive, we performed an exploration based on a simplified toy model, and focused in this manuscript on the resulting findings. In particular, the toy model completely neglects the non-trivial time structure of the signals associated with a realistic event. However, we showed that the core procedure to remove diagonal contributions is applicable to the general case, since it is insensitive to time or frequency structure, and only performs a transformation in detector space.
Performance results are promising, and we expect that the improvement with respect to the "Gaussian" DS will be preserved in a realistic situation. We observed that the "Improved" DS becomes nearly equivalent to the "Exact" one for a large enough (although reasonable) number of detectors in the network. For completeness, it should be noted that the improvement achievable in a real scenario depends on the details of the background non-Gaussianity, so our findings should be extrapolated with care, and ultimately supported through direct investigation.
From a practical point of view, the main difference in a realistic scenario, where no analytical expression for the "Improved" DS is available, arises from the computational cost of evaluating it, as discussed in [12] (e.g. the discussion following Eq. (17) therein), which requires an efficient numerical procedure. The computational cost is related to the number of configurations needed for an accurate estimate of the statistic, which we investigated in previous work using an importance sampling procedure. On this ground, we expect a large number of detectors to help in reducing the effective computational cost.
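As an illustration of the kind of numerical procedure involved (a generic self-normalized importance sampler; the target, proposal, and sample sizes are arbitrary stand-ins for the actual construction of [12]):

```python
import math
import random

def importance_expectation(f, log_target, rng, n=20000, proposal_sigma=2.0):
    # Self-normalized importance sampling: estimate E_target[f] with draws
    # from a Gaussian proposal, weighting by target/proposal density ratios.
    log_norm_q = math.log(proposal_sigma * math.sqrt(2 * math.pi))
    num = den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, proposal_sigma)
        log_q = -x * x / (2 * proposal_sigma ** 2) - log_norm_q
        w = math.exp(log_target(x) - log_q)
        num += w * f(x)
        den += w
    return num / den

# toy check: target is a standard normal, so E[x^2] should be close to 1
log_std_normal = lambda x: -x * x / 2 - 0.5 * math.log(2 * math.pi)
```

A wider proposal than the target keeps the weights bounded; the estimate degrades when the proposal misses the regions where the integrand is large, which is why the choice of proposal drives the effective cost.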
Estimating the computational cost of the method in a realistic case is a key step that needs to be fully addressed to assess its applicability. We argue that an expensive, although manageable, computational cost would certainly be justified by the prospect of an earlier and more confident detection of an astrophysical stochastic background, e.g. one generated by a superposition of binary coalescence events, within reach in the near future [1]. We expect no major additional obstructions to the application of the method to other detectors, such as LISA, or to pulsar timing arrays.

ACKNOWLEDGMENTS
RB acknowledges support through the Italian Space Agency grant Phase A activity for LISA mission, Agreement n. 2017-29-H.0, CUP F62F17000290005.