Frequentist versus Bayesian analyses: Cross-correlation as an (approximate) sufficient statistic for LIGO-Virgo stochastic background searches

Sufficient statistics are combinations of data in terms of which the likelihood function can be rewritten without loss of information. Depending on the data volume reduction, the use of sufficient statistics as a preliminary step in a Bayesian analysis can lead to significant increases in efficiency when sampling from posterior distributions of model parameters. Here we show that the frequency integrand of the cross-correlation statistic and its variance are approximate sufficient statistics for ground-based searches for stochastic gravitational-wave backgrounds. The sufficient statistics are approximate because one works in the weak-signal approximation and uses measured estimates of the auto-correlated power in each detector. Using analytic and numerical calculations, we demonstrate that LIGO-Virgo's hybrid frequentist-Bayesian parameter estimation analysis is effectively equivalent to a fully Bayesian analysis. This work closes a gap in the LIGO-Virgo literature, and suggests directions for additional searches.


I. INTRODUCTION
Current searches for stochastic gravitational-wave backgrounds (GWBs) using ground-based laser interferometers (e.g., the Advanced LIGO [1] and Virgo [2] detectors) use a hybrid of frequentist and Bayesian analysis techniques [3][4][5]. Certain frequentist statistics (namely, frequency integrands for the cross-correlation statistic and its variance) are calculated for relatively short stretches of time-series data (of order a couple of minutes), and then combined using inverse-noise weighting to produce two final frequency series, valid for the whole observation period (of order several months to a year). These frequency series are then used as the fundamental input data and variance for a subsequent Bayesian analysis that calculates posterior probability distributions and Bayesian upper limits on the strength of potential correlated gravitational-wave signals [6]. These hybrid frequentist-Bayesian analyses have been used to place upper limits on GWBs with different amplitudes and spectral shapes [3,7]; GWBs having non-GR polarizations predicted by alternative theories of gravity [3,8]; and potential contamination from correlated global magnetic fields (i.e., Schumann resonances) [9][10][11][12][13].
On the other hand, most searches for GWBs using pulsar timing data [14,15] and proposed searches using space-based detectors like LISA [16] are fully Bayesian, proceeding directly from the time-series data to posterior distributions and upper limits, without ever calculating a cross-correlation frequency integrand or statistic. So the question naturally arises as to whether LIGO and Virgo's current hybrid frequentist-Bayesian analysis is losing information relative to a fully-Bayesian search.
As we shall show below, the answer is basically "no". We assume that we can work in the weak-signal approximation, and we use measured estimates of the auto-correlated power in each detector, as opposed to trying to infer the noise power spectra as part of the full analysis. Under these simplifications, the frequency integrands of the cross-correlation statistic and its variance are approximately lossless combinations of the full time-series data in terms of which the full likelihood function can be rewritten. Hence, Bayesian posterior distributions produced from these frequency series agree quite well with those produced from the full time-series data. Said another way, the frequency integrands of the cross-correlation statistic and its variance are approximate sufficient statistics for the analysis, so the hybrid frequentist-Bayesian method is essentially equivalent to a full Bayesian analysis.
The calculations presented in this paper can also be thought of as providing an alternative conceptual starting point for LIGO-Virgo's stochastic cross-correlation analysis. We use the full Bayesian likelihood function as an organizing principle, out of which the LIGO-Virgo stochastic analyses follow. Various steps in the stochastic analysis, such as coarse-graining of cross-correlated data [17], estimating auto-correlated power spectra from neighboring data segments [18], inverse-noise weighting, optimal filtering [19], the inclusion of "bias" factors for the theoretical variance [18], and the use of the cross-correlation frequency integrand and its variance to do parameter estimation and model selection [6], all follow directly from the Bayesian likelihood under the assumptions of a weak signal and estimated auto-correlated detector power spectra. We also see opportunities to develop new methods to look for signals which violate our assumptions.
Our main result will be to show that the full Gaussian likelihood for a stochastic background (cf. equations (31) and (32)) is equivalent to the reduced likelihood given in equation (48), under the approximations enumerated above. To do this, we will build up the tools necessary in a series of increasingly realistic (and more complex) scenarios. In Sec. II, we give a simple example of a sufficient statistic in the context of a search for a constant deterministic signal in the output of a single detector. This example serves as a basis for the calculations in the following two sections for cross-correlation-based searches for stochastic gravitational waves. We first restrict attention in Sec. III to white signal+noise models, and stationary data. We then extend our analyses in Sec. IV to colored signal+noise models, allowing for non-stationary noise. Both of these sections present results of different analyses performed on simulated data comparing posterior distributions produced by fully-Bayesian and sufficient-statistic analyses. We conclude in Sec. V with a brief summary, followed by a discussion of other related approaches in the literature, and possible extensions of these results. We also include three appendices: Appendix A contains a simple, yet very useful, identity (A1) that we use repeatedly throughout the paper; Appendix B summarizes uncertainties in power spectrum estimation; and Appendix C gives an alternative derivation of a reduced likelihood function, but which makes different assumptions than those given in the main text.

II. SIMPLE EXAMPLE
Perhaps the simplest example of a non-trivial sufficient statistic is the sample mean of the data for a constant signal in white, Gaussian noise (see Sec. 3.5 of Ref. [5] for a more detailed treatment of this example). We suppose we record N time-series data samples $d \equiv \{d_i\}$ as
$$d_i = a + n_i, \qquad i = 1, 2, \cdots, N,$$
where $a > 0$ is the amplitude of the signal and $n_i$ denotes the $i$th sample of the noise. For simplicity, we will assume that the noise is white and has zero mean, i.e., $\langle n_i\rangle = 0$, $\langle n_i n_j\rangle = \sigma^2\delta_{ij}$, and that the variance $\sigma^2$ is known a priori. The likelihood function is then
$$p(d|a) = (2\pi\sigma^2)^{-N/2}\exp\left[-\frac{1}{2\sigma^2}\sum_i (d_i - a)^2\right],$$
which has the interpretation of being the probability of observing the data $d$ given a signal of amplitude $a$ in white noise with known variance $\sigma^2$.
It is fairly straightforward to show that the maximum-likelihood estimator of the amplitude $a$ is given by the sample mean of the data,
$$\hat a = \frac{1}{N}\sum_i d_i.$$
This is an unbiased estimator of $a$ and has variance
$$\mathrm{var}(\hat a) = \sigma^2/N.$$
In terms of $\hat a$, the data-dependent term in the likelihood becomes
$$\sum_i (d_i - a)^2 = \sum_i d_i^2 - N\hat a^2 + N(\hat a - a)^2.$$
This equation is a special case of the general identity (A1), which is discussed and proven in Appendix A, and which we will use in future sections. The likelihood can then be rewritten in the form
$$p(d|a) = (2\pi\sigma^2)^{-N/2}\exp\left[-\frac{1}{2\sigma^2}\left(\sum_i d_i^2 - N\hat a^2\right)\right]\exp\left[-\frac{N(\hat a - a)^2}{2\sigma^2}\right].$$
In this expression, we see that the data appear only via the combination $\hat a$, up to a proportionality factor which is independent of the parameter $a$. By precomputing the sample mean $\hat a$ and $\sum_i d_i^2$, we can reduce the evaluation of the likelihood from $O(N)$ to $O(1)$ operations.
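As a concrete check, the identity above can be verified numerically; the sketch below (plain NumPy, with variable names of our own choosing) evaluates the data-dependent term both directly, in O(N) operations, and from the two precomputed statistics, in O(1) operations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: constant signal of amplitude a_true in white Gaussian noise.
N, a_true, sigma = 512, 0.7, 1.0
d = a_true + sigma * rng.standard_normal(N)

# Precompute the sufficient statistics once:
a_hat = d.mean()          # sample mean (maximum-likelihood estimator)
sum_d2 = np.sum(d**2)     # summed squared data

# Evaluate the data-dependent term in the likelihood for a trial amplitude,
# once directly (O(N)) and once via the identity (O(1)):
a_trial = 0.5
direct = np.sum((d - a_trial) ** 2)
reduced = sum_d2 - N * a_hat**2 + N * (a_hat - a_trial) ** 2

assert np.isclose(direct, reduced)
```

The two evaluations agree to machine precision, since the identity is exact rather than approximate.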
The posterior distribution for $a$, denoted $p(a|d)$, is calculated using Bayes' theorem,
$$p(a|d) = \frac{p(d|a)\,p(a)}{p(d)},$$
where $p(a)$ is the prior probability distribution for $a$, and
$$p(d) = \int \mathrm{d}a\; p(d|a)\,p(a)$$
is the so-called evidence (or marginalized likelihood).
Since $p(d)$ is independent of $a$, the evidence acts as a normalization factor as far as the posterior distribution of $a$ is concerned. Thus, one often writes $p(a|d) \propto p(d|a)\,p(a)$ for posterior distribution calculations. For this example,
$$p(a|d) \propto \exp\left[-\frac{N(\hat a - a)^2}{2\sigma^2}\right]p(a),$$
where the data enter the RHS of the above expression only via the expression (3) for the maximum-likelihood estimator $\hat a \equiv a_{\rm ML}(d)$. This shows that $\hat a$ is a sufficient statistic for $a$. Finally, we note that the prior probability distribution $p(a)$ is chosen based on expectations for $a$ prior to observing the data. It is common to use flat or log-uniform priors, defined over some interval $[a_{\min}, a_{\max}]$, where the constants in these expressions are determined by the normalization condition $\int \mathrm{d}a\; p(a) = 1$. Note that non-trivial priors imply that the maximum-likelihood and maximum-posterior values will differ in general. For the analyses that we will perform in the following sections, we will consider flat priors for simplicity, but note here that the results we obtain are valid for arbitrary, non-flat priors as well.
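For a flat prior, the posterior above can be evaluated on a grid directly from the sufficient statistic; a minimal sketch with illustrative numbers of our own (not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, a_true, sigma = 256, 0.5, 1.0
d = a_true + sigma * rng.standard_normal(N)
a_hat = d.mean()  # the sufficient statistic

# Posterior on a grid for a flat prior on [a_min, a_max] = [0, 2]:
# p(a|d) ∝ exp[-N (a - a_hat)^2 / (2 sigma^2)] inside the interval.
a_grid = np.linspace(0.0, 2.0, 4001)
da = a_grid[1] - a_grid[0]
log_post = -N * (a_grid - a_hat) ** 2 / (2 * sigma**2)
post = np.exp(log_post - log_post.max())
post /= post.sum() * da  # enforce the normalization condition ∫ p(a|d) da = 1

a_map = a_grid[np.argmax(post)]  # with a flat prior, the MAP value equals a_hat
```

Note that the raw time series `d` is never touched during the grid evaluation; only `a_hat` enters.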

III. SUFFICIENT STATISTICS FOR CROSS-CORRELATION SEARCHES -WHITE SIGNAL+NOISE MODELS
Here we extend the calculations of the previous section to cross-correlation searches for stochastic gravitational waves. We restrict our attention in Sec. III A to a white signal+noise model, assuming stationary signal and noise (see also Sec. 4 of Ref. [5]). In Sec. III B, we consider a reduced version of this model, where we assume weak signals relative to the detector noise and use measured estimates of the auto-correlated power in each detector, as opposed to having these as model parameters to be determined by the search. Our analysis of the white noise case contains all of the important steps present in the colored noise case that we shall discuss in Sec. IV, but with considerably less complication.

A. White signal+noise, stationary data
To start, let us consider two coincident and coaligned detectors with uncorrelated noise. We take both the detector noise and correlated stochastic signal to be Gaussian, stationary, and white. We denote the variances by $\sigma^2_{n_1}$, $\sigma^2_{n_2}$, $\sigma^2_h$, respectively. We will not assume that we know the noise variances a priori, so we will try to recover $\sigma^2_{n_1}$ and $\sigma^2_{n_2}$ in addition to $\sigma^2_h$. Then the likelihood function is given by
$$p(d|\sigma^2_{n_1},\sigma^2_{n_2},\sigma^2_h) = \frac{1}{\sqrt{\det(2\pi C)}}\exp\left[-\frac12\, d^T C^{-1} d\right],$$
where
$$C \equiv \langle d\,d^T\rangle = \begin{pmatrix}(\sigma^2_{n_1}+\sigma^2_h)\,\mathbb{1}_{N\times N} & \sigma^2_h\,\mathbb{1}_{N\times N}\\ \sigma^2_h\,\mathbb{1}_{N\times N} & (\sigma^2_{n_2}+\sigma^2_h)\,\mathbb{1}_{N\times N}\end{pmatrix}$$
is the covariance matrix and $d \equiv \{d_{\alpha i}\}$. The indices $i, j = 1, 2, \cdots, N$ label individual time samples, and $\alpha, \beta = 1, 2$ label the two detectors. The joint posterior distribution for the signal and noise variances is
$$p(\sigma^2_{n_1},\sigma^2_{n_2},\sigma^2_h|d) = \frac{p(d|\sigma^2_{n_1},\sigma^2_{n_2},\sigma^2_h)\,p(\sigma^2_{n_1},\sigma^2_{n_2},\sigma^2_h)}{p(d)},$$
where $p(\sigma^2_{n_1},\sigma^2_{n_2},\sigma^2_h)$ is the joint prior probability distribution, and $p(d)$ is the evidence for this signal+noise model. It is easy to show that the maximum-likelihood estimators for the parameters $\sigma^2_{n_1}$, $\sigma^2_{n_2}$, $\sigma^2_h$ are given by the following quadratic combinations of the data:
$$\hat\sigma^2_h = \frac1N\sum_i d_{1i}\,d_{2i}, \qquad \hat\sigma^2_{n_1} = \frac1N\sum_i d_{1i}^2 - \hat\sigma^2_h, \qquad \hat\sigma^2_{n_2} = \frac1N\sum_i d_{2i}^2 - \hat\sigma^2_h.$$
These expressions show that it is also convenient to define
$$\hat\sigma^2_1 \equiv \frac1N\sum_i d_{1i}^2, \qquad \hat\sigma^2_2 \equiv \frac1N\sum_i d_{2i}^2,$$
which are estimators of the total auto-correlated variances in the two detectors:
$$\sigma^2_1 \equiv \sigma^2_{n_1} + \sigma^2_h, \qquad \sigma^2_2 \equiv \sigma^2_{n_2} + \sigma^2_h.$$
In the weak-signal limit $\sigma^2_1 \approx \sigma^2_{n_1}$ and $\sigma^2_2 \approx \sigma^2_{n_2}$, but we will not make that approximation at this stage. We will discuss this approximation in Sec. III B.
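A short simulation illustrates these quadratic estimators; the sketch below (illustrative variances of our own, not LIGO data) recovers the injected values:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
sig_h2, sig_n1_2, sig_n2_2 = 0.3, 1.0, 1.0   # injected signal/noise variances

# Common stochastic signal plus independent noise in two coincident,
# coaligned detectors:
h = np.sqrt(sig_h2) * rng.standard_normal(N)
d1 = h + np.sqrt(sig_n1_2) * rng.standard_normal(N)
d2 = h + np.sqrt(sig_n2_2) * rng.standard_normal(N)

# Quadratic maximum-likelihood data combinations:
hat_sig_h2 = np.mean(d1 * d2)      # cross-correlation -> sigma_h^2
hat_sig1_2 = np.mean(d1**2)        # total auto-power  -> sigma_n1^2 + sigma_h^2
hat_sig2_2 = np.mean(d2**2)        # total auto-power  -> sigma_n2^2 + sigma_h^2
hat_sig_n1_2 = hat_sig1_2 - hat_sig_h2
hat_sig_n2_2 = hat_sig2_2 - hat_sig_h2
```

With $N = 10^5$ samples the estimators land within a few percent of the injected values, as expected from their $O(1/\sqrt N)$ statistical fluctuations.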
To show that the above estimators are sufficient statistics for this problem, it suffices to show that the data enter the likelihood function (11) solely in the form of these estimators, up to an overall normalization that does not depend on the signal and noise parameters. Since the only part of the likelihood function that depends on the data is the argument of the exponential, we just need to show that $d^T C^{-1} d$ can be written in terms of $\hat\sigma^2_{n_1}$, $\hat\sigma^2_{n_2}$, $\hat\sigma^2_h$, or, equivalently, in terms of $\hat\sigma^2_1$, $\hat\sigma^2_2$, $\hat\sigma^2_h$. Since the covariance matrix $C$ is a $2\times 2$-block matrix with each block proportional to $\mathbb{1}_{N\times N}$, we can explicitly invert $C$, yielding
$$C^{-1} = \frac{1}{\sigma^2_1\sigma^2_2 - \sigma^4_h}\begin{pmatrix} \sigma^2_2\,\mathbb{1}_{N\times N} & -\sigma^2_h\,\mathbb{1}_{N\times N}\\ -\sigma^2_h\,\mathbb{1}_{N\times N} & \sigma^2_1\,\mathbb{1}_{N\times N}\end{pmatrix},$$
where $\sigma^2_1$, $\sigma^2_2$ are the total auto-correlated variances defined above. Using this result, it is then straightforward to show that
$$d^T C^{-1} d = \frac{N\left(\sigma^2_2\,\hat\sigma^2_1 + \sigma^2_1\,\hat\sigma^2_2 - 2\sigma^2_h\,\hat\sigma^2_h\right)}{\sigma^2_1\sigma^2_2 - \sigma^4_h}. \tag{21}$$
Thus,
$$p(d|\sigma^2_1,\sigma^2_2,\sigma^2_h) \propto \frac{1}{\left(\sigma^2_1\sigma^2_2 - \sigma^4_h\right)^{N/2}}\exp\left[-\frac{N}{2}\,\frac{\sigma^2_2\,\hat\sigma^2_1 + \sigma^2_1\,\hat\sigma^2_2 - 2\sigma^2_h\,\hat\sigma^2_h}{\sigma^2_1\sigma^2_2 - \sigma^4_h}\right], \tag{22}$$
which depends on the data only via $\hat\sigma^2_1$, $\hat\sigma^2_2$, $\hat\sigma^2_h$, or, equivalently, $\hat\sigma^2_{n_1}$, $\hat\sigma^2_{n_2}$, $\hat\sigma^2_h$.
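The reduction of $d^T C^{-1} d$ to the three quadratic statistics can be checked numerically by building the full block covariance for a small toy problem (dimensions and variances are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 64
sig_h2, sig1_2, sig2_2 = 0.3, 1.3, 1.3   # sigma_h^2 and total auto-powers

# Build the full 2N x 2N block covariance and draw one data realization:
I = np.eye(N)
C = np.block([[sig1_2 * I, sig_h2 * I],
              [sig_h2 * I, sig2_2 * I]])
d = rng.multivariate_normal(np.zeros(2 * N), C)
d1, d2 = d[:N], d[N:]

# Quadratic form d^T C^{-1} d computed directly ...
direct = d @ np.linalg.solve(C, d)

# ... and from the three sufficient statistics alone:
hat1, hat2, hath = np.mean(d1**2), np.mean(d2**2), np.mean(d1 * d2)
det = sig1_2 * sig2_2 - sig_h2**2
reduced = N * (sig2_2 * hat1 + sig1_2 * hat2 - 2 * sig_h2 * hath) / det

assert np.isclose(direct, reduced)
```

The agreement is exact (up to floating-point round-off), since no approximation has been made at this stage.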

B. Weak-signal approximation, estimating detector auto-correlations
We now consider a reduced signal+noise model and its corresponding likelihood function, again for the case of white signal+noise, stationary data. The reduction in the model has two components. First, instead of treating the detector noise variances $\sigma^2_{n_1}$, $\sigma^2_{n_2}$ or the total auto-correlated variances $\sigma^2_1$, $\sigma^2_2$ as free parameters for our analysis, we will use measured estimates $\bar\sigma^2_1$, $\bar\sigma^2_2$ in place of $\sigma^2_1$, $\sigma^2_2$. We use overbars instead of hats to denote these quantities to indicate that $\bar\sigma^2_1$, $\bar\sigma^2_2$ are not the same as the maximum-likelihood data combinations $\hat\sigma^2_1$, $\hat\sigma^2_2$ for the segment that we are analyzing. For example, for the actual LIGO-Virgo stochastic searches, the estimates of the auto-correlated power are constructed from two segments of data (each approximately one minute in duration) immediately preceding and following the analysis segment. In fact, we will show in Fig. 4 that if the measured estimates of the auto-correlated power equal the maximum-likelihood data combinations for the analysis segment, then we obtain biased recoveries of the GWB spectrum, due to covariances between the auto-correlation and cross-correlation estimators [18]. (But see also the discussion of bias at the end of Appendix C.) Second, we will assume that the stochastic signal is weak compared to the detector noise, and thus keep only the leading-order terms in expressions involving $(\sigma^2_h)^2/\bar\sigma^2_1\bar\sigma^2_2 \ll 1$. To zeroth order, the detector noise and total auto-correlated variances are equal to one another,
$$\sigma^2_1 \approx \sigma^2_{n_1}, \qquad \sigma^2_2 \approx \sigma^2_{n_2}.$$
This weak-signal approximation is valid for searches for GWBs using ground-based detectors like LIGO and Virgo. It is not a good approximation for GWB searches using pulsar timing arrays, where the auto-correlated power in the GWB may exceed that due to pulsar noise and timing measurement noise at very low frequencies [15,20].
The likelihood function for the reduced signal+noise model can be obtained from (22) by making the simplifications described above. We first approximate the terms $\hat\sigma^2_1/\sigma^2_1$ and $\hat\sigma^2_2/\sigma^2_2$ by 1, given that we are replacing the parameters $\sigma^2_1$ and $\sigma^2_2$ by measured estimates of these quantities. We then replace the remaining factors of $\sigma^2_1\sigma^2_2$, which appear in the likelihood function, by the combination
$$\left(1 + \frac{2}{N_{\rm avg}}\right)\bar\sigma^2_1\bar\sigma^2_2,$$
where $N_{\rm avg}$ (which we assume to be equal for both detectors) is the number of averages used in the construction of $\bar\sigma^2_1$, $\bar\sigma^2_2$, e.g., Welch power spectrum estimates [21]. The justification for including the factor of $(1 + 2/N_{\rm avg})$ is given in Appendix B; the factor removes a bias that would otherwise exist in the estimation of $1/\sigma^2_1\sigma^2_2$, due to the use of a finite amount of data to construct $\bar\sigma^2_1$, $\bar\sigma^2_2$. Making all these replacements in (22), Taylor expanding the RHS, and keeping only the leading-order terms in $(\sigma^2_h)^2/\bar\sigma^2_1\bar\sigma^2_2 \ll 1$, we can complete the square in $\sigma^2_h$ and $\hat\sigma^2_h$ and make the substitution
$$\mathrm{var}(\hat\sigma^2_h) \equiv \left(1 + \frac{2}{N_{\rm avg}}\right)\frac{\bar\sigma^2_1\bar\sigma^2_2}{N},$$
which is the leading-order expression for the variance of the maximum-likelihood estimator $\hat\sigma^2_h$. Thus,
$$p(d|\sigma^2_h) \propto \exp\left[-\frac{\left(\hat\sigma^2_h - \sigma^2_h\right)^2}{2\,\mathrm{var}(\hat\sigma^2_h)}\right],$$
which shows that $\hat\sigma^2_h$ and $\mathrm{var}(\hat\sigma^2_h)$ are sufficient statistics for this reduced signal+noise model. Note that the $\sigma^2_h$-dependent term in the likelihood function has the same general form as that for the simple example described in Sec. II; see (6). (In Appendix C, we present an alternative derivation of the likelihood function for a reduced signal+noise model, but under different assumptions than used here.)

Figure 1 compares the marginalized posterior distributions for $\sigma^2_h$ calculated from the two likelihood functions, (22) and (29), for a short stretch of simulated time-series data consisting of a white signal injected into white noise in two coincident and coaligned detectors.
The simulated noise had variances $\sigma^2_{n_1} = \sigma^2_{n_2} = 1$, while the injected signal had $\sigma^2_h = 0.3$. For 512 samples, these values correspond to an expected signal-to-noise ratio $\rho = \sqrt{N}\,\sigma^2_h/\sqrt{\sigma^2_1\sigma^2_2} = 5.22$ for the full set of data. Note, however, that $(\sigma^2_h)^2/\sigma^2_1\sigma^2_2 = 0.05 \ll 1$, consistent with the weak-signal approximation. The blue histogram is the marginalized posterior for $\sigma^2_h$ calculated from the full likelihood function (22); the orange histogram is the marginalized posterior for $\sigma^2_h$ calculated from the reduced likelihood function (29). For the reduced likelihood analysis, the detector auto-correlated powers were estimated from an additional simulated data segment.

FIG. 1. Marginalized posteriors for $\sigma^2_h$ computed from the full likelihood (22) (blue histogram) and its reduced version (29) (orange histogram), which substitutes measured estimates $\bar\sigma^2_1$, $\bar\sigma^2_2$ for the auto-correlated power $\sigma^2_1$, $\sigma^2_2$ and keeps only leading-order terms in $(\sigma^2_h)^2/\bar\sigma^2_1\bar\sigma^2_2$.
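The reduced-likelihood recipe can be sketched in a few lines. Here the auto-correlated power is estimated from a neighboring simulated segment, and the $(1 + 2/N_{\rm avg})$ bias factor is dropped for simplicity (all numbers illustrative, not taken from the figure):

```python
import numpy as np

rng = np.random.default_rng(4)
N, sig_h2 = 512, 0.3

def make_segment(rng, N, sig_h2):
    """White GWB signal + unit-variance white noise in two coaligned detectors."""
    h = np.sqrt(sig_h2) * rng.standard_normal(N)
    return h + rng.standard_normal(N), h + rng.standard_normal(N)

# The analysis segment supplies the cross-correlation statistic ...
d1, d2 = make_segment(rng, N, sig_h2)
hat_sig_h2 = np.mean(d1 * d2)

# ... while a neighboring segment supplies the auto-power estimates
# (using the analysis segment itself would bias the recovery):
e1, e2 = make_segment(rng, N, sig_h2)
bar_sig1_2, bar_sig2_2 = np.mean(e1**2), np.mean(e2**2)

# Leading-order variance of hat_sig_h2 in the weak-signal approximation
# (the (1 + 2/N_avg) bias factor is omitted in this toy sketch):
var_hat = bar_sig1_2 * bar_sig2_2 / N

# The reduced likelihood is then Gaussian in sigma_h^2:
def log_like(sig_h2_trial):
    return -((hat_sig_h2 - sig_h2_trial) ** 2) / (2 * var_hat)
```

Evaluating `log_like` on a grid or inside a sampler never touches the raw time series again, which is the efficiency gain discussed in the text.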

IV. SUFFICIENT STATISTICS FOR CROSS-CORRELATION SEARCHES -COLORED SIGNAL+NOISE MODELS
Here we extend the analysis of the previous section to a colored signal+noise model, starting with stationary data in Sec. IV A and then discussing the complications introduced by non-stationary noise in Sec. IV B.
In Sec. IV C, we simplify the signal+noise model by again considering weak signals and measured estimates of the auto-correlated power, for which the frequency integrands of the cross-correlation statistic and its variance are sufficient statistics. In Sec. IV D we show that the full Bayesian analysis is approximately equivalent to LIGO-Virgo's hybrid frequentist-Bayesian analysis, and in Sec. IV E we construct percentile-percentile (pp) plots [22] to show that the reduced analyses have proper statistical coverage. (We do not assume that the detectors are coincident and co-aligned in this section.)

A. Colored signal+noise, stationary data
For the case where the detector noise and GWB signal are colored, it is simplest to work in the frequency domain, since the Fourier components are independent of one another (assuming the data are stationary over the duration of the analysis segment). Assuming multivariate Gaussian distributions as before, the variances $\sigma^2_{n_1}$, $\sigma^2_{n_2}$, $\sigma^2_h$ for the white signal+noise case are replaced by power spectral densities $P_{n_1}(f)$, $P_{n_2}(f)$, $P_h(f)$, defined by
$$\langle \tilde n_\alpha(f)\,\tilde n^*_\alpha(f')\rangle = \tfrac12\,\delta(f - f')\,P_{n_\alpha}(f), \qquad \langle \tilde h(f)\,\tilde h^*(f')\rangle = \tfrac12\,\delta(f - f')\,P_h(f),$$
where $\tilde h(f)$ is the Fourier transform of the signal component $h(t)$ of the time-series data. The factor of 1/2 is included to make these one-sided power spectral densities, so that the total variance is given by an integral of the power spectral density over positive frequencies,
$$\sigma^2 = \int_0^\infty \mathrm{d}f\; P(f).$$
Although the above expressions are written in terms of continuous functions of frequency, in practice we work with frequency series, e.g., $P_h(f_k)$, where the discrete frequencies take values $f_k \equiv k\,\Delta f$, with $\Delta f \equiv 1/T$ and $k = 0, 1, \cdots, N/2 - 1$, for a data segment of duration $T \equiv N\,\Delta t$. For white data, $P(f) = 2\sigma^2\Delta t$ is constant between $f = 0$ and the Nyquist frequency $f_{\rm Nyq} \equiv 1/(2\Delta t)$. Even though we are assuming here that the data are stationary, it is convenient to divide the data from a large observation period into segments, which we will label by $I = 1, 2, \cdots, N_{\rm seg}$. The full likelihood function is then a product of the likelihood functions for the individual segments,
$$p(d|P_{n_1},P_{n_2},P_h) = \prod_{I=1}^{N_{\rm seg}} p(d_I|P_{n_1},P_{n_2},P_h),$$
where
$$p(d_I|P_{n_1},P_{n_2},P_h) \propto \prod_k \frac{1}{\det\tilde C(f_k)}\,\exp\left[-\frac{2}{T}\,\tilde d_I^\dagger(f_k)\,\tilde C^{-1}(f_k)\,\tilde d_I(f_k)\right].$$
In the above expression,
$$\tilde C(f) = \begin{pmatrix} P_1(f) & \gamma(f)\,P_h(f) \\ \gamma(f)\,P_h(f) & P_2(f) \end{pmatrix}$$
is the covariance matrix of the data, with
$$P_1(f) \equiv P_{n_1}(f) + P_h(f), \qquad P_2(f) \equiv P_{n_2}(f) + P_h(f)$$
denoting the total auto-correlated detector power spectral densities. The inverse covariance matrix is simply
$$\tilde C^{-1}(f) = \frac{1}{P_1(f)\,P_2(f) - \gamma^2(f)\,P_h^2(f)}\begin{pmatrix} P_2(f) & -\gamma(f)\,P_h(f) \\ -\gamma(f)\,P_h(f) & P_1(f) \end{pmatrix}.$$
Note that although the data depend on segment $I$, the parameters $P_{n_1}$, $P_{n_2}$, $P_h$ do not, since we are assuming that the noise and signal power spectra are the same in each segment.
The dimensionless function γ(f ), which appears in the off-diagonal elements of the covariance matrix, is the overlap reduction function, which accounts for the relative position and orientation of the detectors [23,24]. The functional form of γ(f ) is not relevant for the discussion that follows, other than the fact that γ(f ) equals unity for coincident and coaligned detectors in the longwavelength limit.
If we want to estimate the values of the power spectral densities $P_{n_1}(f)$, $P_{n_2}(f)$, $P_h(f)$ at each discrete (positive) frequency $f_k \equiv k\,\Delta f$, then there are no simplifying sufficient statistics for this case, as the data enter the likelihood function only through the combinations
$$|\tilde d_{1I}(f_k)|^2, \qquad |\tilde d_{2I}(f_k)|^2, \qquad \mathrm{Re}\!\left[\tilde d_{1I}(f_k)\,\tilde d^*_{2I}(f_k)\right], \tag{36}$$
which do not correspond to a reduction in the number of data samples used in writing the likelihood function. However, if the power spectra are expected to be smooth over a coarser frequency resolution $\delta\!f \equiv 1/\tau > \Delta f$, where $\tau \equiv T/M$ is some fractional part of the segment duration $T$, then there is a reduction in the data combinations. This is because the relevant power spectra need only be estimated at fewer discrete frequencies $f_\ell \equiv \ell\,\delta\!f$, where $\ell = 0, 1, \cdots, (N/M)/2 - 1$. (For typical LIGO-Virgo searches, $M$ is approximately 20.) Hence the data combinations (36) can be averaged over the subset of $M$ fine-grained frequencies $f_k$ centered at $f_\ell$. This leads to averaged (or coarse-grained) power spectral density estimators
$$\hat P_{1I}(f_\ell) \equiv \frac{2}{TM}\sum_{k\in\ell}|\tilde d_{1I}(f_k)|^2, \qquad \hat P_{2I}(f_\ell) \equiv \frac{2}{TM}\sum_{k\in\ell}|\tilde d_{2I}(f_k)|^2, \qquad \hat P_{hI}(f_\ell) \equiv \frac{2}{TM}\sum_{k\in\ell}\frac{\mathrm{Re}\!\left[\tilde d_{1I}(f_k)\,\tilde d^*_{2I}(f_k)\right]}{\gamma(f_k)},$$
in terms of which the likelihood function (32) can be written
$$p(d_I|P_{n_1},P_{n_2},P_h) \propto \prod_\ell \frac{1}{\left[P_1 P_2 - \gamma^2 P_h^2\right]^M}\exp\left[-M\,\frac{P_2\,\hat P_{1I} + P_1\,\hat P_{2I} - 2\gamma^2 P_h\,\hat P_{hI}}{P_1 P_2 - \gamma^2 P_h^2}\right]\Bigg|_{f_\ell}.$$
We note that estimating a power spectrum by subdividing a segment of data into shorter duration subsegments is a standard practice in signal processing [21]. It's a way to reduce the variance in the power spectrum estimate at the expense of a coarser frequency resolution.
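Coarse-graining a one-sided periodogram can be sketched as follows. The normalization conventions below are ours (white data, so the expected level is $2\sigma^2\Delta t$), and the averaging of $M$ adjacent fine-grained bins mirrors the estimators above:

```python
import numpy as np

rng = np.random.default_rng(5)
N, dt, M = 4096, 1.0, 16      # samples, sample spacing, bins averaged per coarse bin
sigma2 = 2.0                  # white-noise variance; expect P(f) = 2 sigma^2 dt

d = np.sqrt(sigma2) * rng.standard_normal(N)
T = N * dt

# One-sided periodogram at the fine frequency resolution df = 1/T:
d_tilde = np.fft.rfft(d) * dt
P_fine = 2.0 * np.abs(d_tilde) ** 2 / T
P_fine = P_fine[1 : N // 2]   # drop the DC and Nyquist bins

# Coarse-grain: average M adjacent fine-grained bins, so the frequency
# resolution degrades from 1/T to M/T while the variance drops by ~1/M:
n_coarse = len(P_fine) // M
P_coarse = P_fine[: n_coarse * M].reshape(n_coarse, M).mean(axis=1)
```

The coarse-grained estimate scatters around the true level $2\sigma^2\Delta t$ with roughly $\sqrt{M}$ times less fluctuation per bin than the fine-grained periodogram, which is the variance/resolution trade-off noted in the text.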
For this stationary case, we get a further level of data reduction in the sufficient statistics, as we can average the coarse-grained power spectrum estimators (38) over the number of segments. By construction, these segment-averaged estimators will have expected values $P_h(f)$, $P_1(f)$, $P_2(f)$ over the whole observation. This is fine for stationary data. But if the detector noise levels can change from segment to segment, then this simple averaging will fail to capture the non-stationarity of the noise. We have to do something different for this more complicated, but realistic, scenario.

B. Colored signal+noise, non-stationary data
For non-stationary detector noise, we have to increase the number of model parameters from $P_{n_1}(f)$, $P_{n_2}(f)$, $P_h(f)$ to $P_{n_1 I}(f)$, $P_{n_2 I}(f)$, $P_h(f)$, where $I = 1, 2, \cdots, N_{\rm seg}$, since the noise levels can differ from segment to segment. We are assuming here that the power spectrum of the stochastic signal is stationary, which is not necessarily the case for a "popcorn-like" background, such as that produced by stellar-mass binary black hole mergers [25]. The covariance matrix for this case is then
$$\tilde C_I(f) = \begin{pmatrix} P_{1I}(f) & \gamma(f)\,P_h(f) \\ \gamma(f)\,P_h(f) & P_{2I}(f) \end{pmatrix},$$
where
$$P_{1I}(f) \equiv P_{n_1 I}(f) + P_h(f), \qquad P_{2I}(f) \equiv P_{n_2 I}(f) + P_h(f)$$
are the auto-correlated detector power spectra for segment $I$. Similar to what we found in (39), the corresponding likelihood function for a single segment of data can be written as
$$p(d_I|P_{n_1 I},P_{n_2 I},P_h) \propto \prod_\ell \frac{1}{\left[P_{1I} P_{2I} - \gamma^2 P_h^2\right]^M}\exp\left[-M\,\frac{P_{2I}\,\hat P_{1I} + P_{1I}\,\hat P_{2I} - 2\gamma^2 P_h\,\hat P_{hI}}{P_{1I} P_{2I} - \gamma^2 P_h^2}\right]\Bigg|_{f_\ell},$$
where the estimators $\hat P_{hI}(f)$, $\hat P_{1I}(f)$, $\hat P_{2I}(f)$ are the same as in (38). We emphasize that this likelihood differs from that in (39) only by our assumption that the noise is not stationary, which is reflected in the fact that the parameters $P_{1I}(f)$, $P_{2I}(f)$ carry $I$ indices. Note that the above likelihood has the same form as the white noise case (22), but with both frequency and segment dependence in the estimators and parameters. The full expression for the likelihood function involves a further product over $I$, as specified in (31).

C. Reduced version of the colored, non-stationary likelihood function
To simplify the above analysis, we will proceed as we did in Sec. III B, where we considered a reduced signal+noise model and its corresponding likelihood, by replacing the auto-correlated power spectral densities with measured estimates $\bar P_{1I}(f)$, $\bar P_{2I}(f)$, and working in the weak-signal approximation where $P_h^2(f)/\bar P_{1I}(f)\bar P_{2I}(f) \ll 1$. For a given discrete frequency $f_\ell$ and data segment $I$, the reduction in the likelihood function has exactly the same form as in the white noise case, which allows us to immediately write down
$$p(d_I|P_h) \propto \prod_\ell \exp\left[-\frac{\left(\hat P_{hI}(f_\ell) - P_h(f_\ell)\right)^2}{2\,\sigma^2_{hI}(f_\ell)}\right],$$
where
$$\sigma^2_{hI}(f) \equiv \left(1 + \frac{2}{N_{\rm avg}}\right)\frac{\bar P_{1I}(f)\,\bar P_{2I}(f)}{2T\,\delta\!f\,\gamma^2(f)}$$
is the leading-order expression for the variance of $\hat P_{hI}(f)$. (Here we used the relation $M = T\,\delta\!f$ for the coarse-grained frequencies, and we included the factor of $(1 + 2/N_{\rm avg})$ to account for imperfect estimation of the detector auto-correlated power spectra $\bar P_{1I}(f)$, $\bar P_{2I}(f)$.) Additionally, since the parameter $P_h(f)$ shows up only in the last exponential, we can use identity (A1) from Appendix A to perform the product over the number of data segments, which translates into a sum over $I$ inside the exponential:
$$\hat P_{h,{\rm opt}}(f) \equiv \frac{\sum_I \hat P_{hI}(f)/\sigma^2_{hI}(f)}{\sum_I 1/\sigma^2_{hI}(f)}, \qquad \frac{1}{\sigma^2_{h,{\rm opt}}(f)} \equiv \sum_I \frac{1}{\sigma^2_{hI}(f)}.$$
Thus, summing over both $I$ and $\ell$:
$$p(d|P_h) \propto \exp\left[-\sum_\ell \frac{\left(\hat P_{h,{\rm opt}}(f_\ell) - P_h(f_\ell)\right)^2}{2\,\sigma^2_{h,{\rm opt}}(f_\ell)}\right]. \tag{48}$$
This is our main result. Note that it has the same basic form as (29), which we derived for the white signal+noise case. Thus, given some choice for the prior probability $p(P_h(f))$, the posterior distribution for $P_h(f)$ is given by
$$p(P_h(f)|d) \propto p(P_h(f))\,\exp\left[-\sum_\ell \frac{\left(\hat P_{h,{\rm opt}}(f_\ell) - P_h(f_\ell)\right)^2}{2\,\sigma^2_{h,{\rm opt}}(f_\ell)}\right].$$
This expression for the posterior shows that $\hat P_{h,{\rm opt}}(f)$ and $\sigma^2_{h,{\rm opt}}(f)$ given by (46) and (47) are sufficient statistics for this colored signal+noise, non-stationary analysis, assuming weak signals and measured estimates of the auto-correlated power in two detectors for each data segment.
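The combination over segments is standard inverse-variance weighting; a toy sketch for a single frequency bin, with made-up per-segment noise levels to mimic non-stationarity:

```python
import numpy as np

rng = np.random.default_rng(6)
n_seg = 200
P_h_true = 1.0

# Per-segment cross-correlation estimates at one frequency bin, with
# segment-dependent (non-stationary) noise levels sigma_I:
sigma_I = rng.uniform(2.0, 8.0, n_seg)
P_hat_I = P_h_true + sigma_I * rng.standard_normal(n_seg)

# Optimal (inverse-noise-weighted) combination over segments:
w = 1.0 / sigma_I**2
P_opt = np.sum(w * P_hat_I) / np.sum(w)
sigma_opt = 1.0 / np.sqrt(np.sum(w))   # 1/sigma_opt^2 = sum_I 1/sigma_I^2
```

Noisier segments are automatically down-weighted, and the combined uncertainty `sigma_opt` is smaller than that of even the quietest single segment.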
In Figure 2, we compare recoveries of the amplitude of the GWB using the full and reduced versions of the Bayesian likelihood functions appropriate for both white and colored data. The simulated data are the same as that from the previous section, consisting of a white GWB signal injected into white detector noise, for two coincident and coaligned detectors. For the reduced likelihood functions, the detector auto-correlated powers were estimated from an additional simulated data segment. We see that all analyses agree very well, demonstrating that the mathematical derivations capture the behavior of the fully-Bayesian analysis in practice.
In Figure 3, we compare recoveries of the GWB amplitude and spectral index using the full and reduced versions of the Bayesian likelihood function appropriate for a colored signal+noise model. For the full likelihood, we parameterize both the GWB signal and detector noise as power laws of the form
$$P(f) = A\left(\frac{f}{f_{\rm ref}}\right)^{\beta},$$
where $A$ is the amplitude, $\beta$ is the spectral index, and $f_{\rm ref}$ is a reference frequency. (The amplitude and spectral indices for the detector noise will differ, in general, from those for the GWB.) For this simulation the injected noise power spectra have $A_{n_1} = A_{n_2} = 0.125$ and $\beta_1 = \beta_2 = 0.5$, while the injected signal has $A = A_{n_1}/2$, $\beta = 0$ (i.e., it is white). For the reduced likelihood function, the detector auto-correlated powers were estimated from coarse-grained power spectral density estimators applied to an additional simulated data segment.

Finally, we note that by using sufficient statistics, one improves the computational efficiency of a search relative to a fully-Bayesian analysis that works with the raw uncombined time-domain (or frequency-domain) data samples. For example, for a stationary GWB, one can reduce the number of likelihood evaluations by a factor of $M N_{\rm seg}$, where $M \sim 10$-$20$ corresponds to coarse-graining, and $N_{\rm seg} \sim 10^5$ comes from averaging six months of data split into approximately one-minute data segments. One simply precomputes the sufficient statistic data combinations (46) and (47), and performs Bayesian inference (e.g., MCMC sampling) with the corresponding likelihood function (48).

D. Connection to LIGO's hybrid frequentist-Bayesian analysis
We can use the reduced form of the likelihood (48) to search for GWB signals described by the power spectral density $P_h(f)$. But to make connection with the literature on GWBs, we should parametrize the background in terms of the (dimensionless) energy density spectrum [19]:
$$\Omega_{\rm gw}(f) \equiv \frac{1}{\rho_c}\frac{\mathrm{d}\rho_{\rm gw}}{\mathrm{d}\ln f},$$
where $\rho_c$ is the energy density needed to close the universe. Then
$$P_h(f) = \frac{3H_0^2}{10\pi^2}\,\frac{\Omega_{\rm gw}(f)}{f^3},$$
which differs from the one-sided strain spectral density $S_h(f) = (3H_0^2/2\pi^2)\,\Omega_{\rm gw}(f)/f^3$ by a factor of 1/5, since we are interested here in the strain response of a laser interferometer with a $90^\circ$ opening angle between the arms [5]. Thus, we can rewrite the likelihood function (48) in terms of $\Omega_{\rm gw}(f)$ and its optimal estimator $\hat\Omega_{\rm gw}(f)$ as
$$p(d|\Omega_{\rm gw}) \propto \exp\left[-\sum_\ell \frac{\left(\hat\Omega_{\rm gw}(f_\ell) - \Omega_{\rm gw}(f_\ell)\right)^2}{2\,\sigma^2_{\rm gw}(f_\ell)}\right],$$
where
$$\hat\Omega_{\rm gw}(f) \equiv \frac{10\pi^2}{3H_0^2}\,f^3\,\hat P_{h,{\rm opt}}(f), \qquad \sigma_{\rm gw}(f) \equiv \frac{10\pi^2}{3H_0^2}\,f^3\,\sigma_{h,{\rm opt}}(f).$$
This is the form of the likelihood function that one finds in the LIGO-Virgo GWB literature, e.g., [6], which serves as the starting point for subsequent Bayesian parameter estimation analyses. We can go one step further if we fix the spectral shape of the GWB and focus attention on estimating only its amplitude at some reference frequency $f_{\rm ref}$, where we normalize the spectral shape to have unit value. (Typically $f_{\rm ref} = 25$ Hz for LIGO-Virgo stochastic analyses.) For example, for a power-law background with spectral index $\alpha$, we have
$$\Omega_{\rm gw}(f) = \Omega_\alpha\left(\frac{f}{f_{\rm ref}}\right)^{\alpha}$$
(note that there is no implied sum over $\alpha$ in the above equation). The spectral index of $\Omega_{\rm gw}(f)$ is related to the spectral index of $P_h(f)$ defined in the previous section by $\alpha = \beta + 3$. Then we can perform the product over frequencies $f_\ell$, again using identity (A1) from Appendix A to do the relevant sum in the argument of the exponential. This yields a likelihood function and posterior distribution for the amplitude $\Omega_\alpha$ that are both proportional to
$$\exp\left[-\frac{\left(\hat\Omega_\alpha - \Omega_\alpha\right)^2}{2\,\sigma_\alpha^2}\right], \qquad \hat\Omega_\alpha \equiv \sigma_\alpha^2\sum_\ell \left(\frac{f_\ell}{f_{\rm ref}}\right)^{\alpha}\frac{\hat\Omega_{\rm gw}(f_\ell)}{\sigma^2_{\rm gw}(f_\ell)}, \qquad \frac{1}{\sigma_\alpha^2} \equiv \sum_\ell \left(\frac{f_\ell}{f_{\rm ref}}\right)^{2\alpha}\frac{1}{\sigma^2_{\rm gw}(f_\ell)}. \tag{58}$$
Note that the factor $(f/f_{\rm ref})^{\alpha}/\sigma^2_{\rm gw}(f)$, which multiplies the correlated data in (58), is proportional to the standard expression for the optimal filter in the frequency domain; see, e.g., [5,19].
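Fixing a power-law spectral shape reduces the per-frequency estimates to a single broadband amplitude via inverse-variance weighting. A sketch with illustrative (not LIGO) numbers, using $f_{\rm ref} = 25$ Hz; the per-bin uncertainty model `sigma_f` is an assumption of ours for demonstration:

```python
import numpy as np

rng = np.random.default_rng(7)
f_ref, alpha = 25.0, 2.0 / 3.0
f = np.arange(20.0, 200.0, 0.25)            # coarse-grained frequency bins (Hz)
shape = (f / f_ref) ** alpha

# Simulated per-bin estimates Omega_hat(f) with illustrative uncertainties:
Omega_true = 1e-8                           # injected broadband amplitude
sigma_f = 1e-7 * (f / f_ref)
Omega_hat_f = Omega_true * shape + sigma_f * rng.standard_normal(f.size)

# Broadband amplitude: inverse-variance weighting of the per-bin amplitude
# estimates Omega_hat_f / shape, whose variances are (sigma_f / shape)^2:
w = (shape / sigma_f) ** 2
Omega_alpha_hat = np.sum(w * Omega_hat_f / shape) / np.sum(w)
sigma_alpha = 1.0 / np.sqrt(np.sum(w))
```

The weights `w` play the role of the optimal filter: bins where the assumed signal shape is large relative to the noise contribute the most.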

E. Rigorous comparison
As a check on the results of the previous section, we produce several percentile-percentile (pp) plots [22] to verify that the LIGO-Virgo hybrid frequentist-Bayesian analysis has good statistical coverage.
To generate a pp plot, we first perform N = 300 injections and recover each one with the different likelihood functions described below. For each recovery, we record the percentile of the posterior distribution at which the injected value lies. We then plot the cumulative distribution function of the percentiles, along with a 90% credible interval, determined using order statistics, showing the expected range of the cumulative distribution at each percentile value. If the methods we use are unbiased, then the cumulative distribution should lie within the 90% credible interval, close to the diagonal line. Deviations from this line indicate poor coverage, i.e., that the method is biased.
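The percentile-recording step can be sketched for the simple constant-signal example of Sec. II, where the posterior is approximately Gaussian when the flat prior is much wider than the posterior (all numbers illustrative; this is not the LIGO-Virgo pipeline):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(8)
n_inj, N, sigma = 300, 128, 1.0

percentiles = []
for _ in range(n_inj):
    a_inj = rng.uniform(0.0, 10.0)          # draw each injection from the prior
    d = a_inj + sigma * rng.standard_normal(N)
    a_hat = d.mean()
    # For a flat prior much wider than the posterior, the posterior is
    # approximately Gaussian with mean a_hat and std sigma/sqrt(N); record
    # the posterior percentile at which the injected value lies:
    z = (a_inj - a_hat) / (sigma / np.sqrt(N))
    percentiles.append(0.5 * (1.0 + erf(z / sqrt(2.0))))

# For an unbiased analysis the recorded percentiles are uniform on [0, 1];
# the pp plot is their empirical CDF plotted against the diagonal.
percentiles = np.sort(percentiles)
ecdf = np.arange(1, n_inj + 1) / n_inj
max_dev = np.max(np.abs(percentiles - ecdf))
```

Plotting `percentiles` against `ecdf` gives the pp curve; `max_dev` quantifies the departure from the diagonal (a Kolmogorov-Smirnov-type statistic).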
First, we consider a set of white signal+noise injections. We generate 300 noise and signal realizations, and recover the signal amplitude using the full Bayesian likelihood, the reduced likelihood for a white signal+noise model, and also the LIGO-Virgo hybrid frequentist-Bayesian analysis. As emphasized previously, for the reduced likelihood it is important to use data segments different from the analysis segment to estimate the auto-correlated power in the two detectors, in order to avoid a bias in the recovered parameters. To illustrate the importance of this point, we recover the signal using the reduced likelihood analysis in two different ways: (i) using data segments different from the analysis segment to estimate the auto-correlated power, and (ii) using the same data segment as the analysis segment for estimating the auto-correlated power. We show the results of those analyses in Figure 4. The analysis using the same data segment as the analysis segment to estimate the auto-correlated power is clearly biased. On the other hand, the full likelihood analysis, the reduced likelihood analysis, and the LIGO-Virgo stochastic analysis, the latter two of which use data segments different from the analysis segment to estimate the auto-correlated power, all show good coverage. These analyses are not identical, however, due to different conditioning of the data. In particular, the LIGO-Virgo analysis computes the cross-correlation using 50% overlapping Hann windows, and estimates the auto-correlated power by averaging Welch estimators from adjacent data segments. In contrast, for our simple reduced likelihood analysis, we do not window the data (which is okay for white data), and use a single additional data segment for estimating the auto-correlated power.

Second, we perform a more realistic set of colored signal+noise injections, and recover the signal with the LIGO-Virgo stochastic analysis.
In these injections, the theoretical noise power spectra are kept the same, although the noise realization changes in every injection. The amplitude of P h (f ) at a reference frequency of 1 Hz is fixed, but we draw the spectral index from a uniform prior probability distribution ranging from −3 to 3. We use the same prior on the spectral index to perform the recovery. The realization of the GWB signal changes in every injection. We perform two sets of 300 injections. In the first set, the amplitude of P h (f ) is set to be 1/8 of the amplitude of P n (f ) at the reference frequency in each segment, which is consistent with the weak-signal approximation. In the second set, the amplitude of the signal is set to 2 times the amplitude of the noise at the reference frequency in each segment, which violates the weak-signal approximation.
In the top panel of Figure 5, we show the spectrum associated with one injection in the weak-signal regime, along with the injected noise and GWB signal spectra. In the bottom panel, we show the results of the pp plot analysis for both strong- and weak-signal amplitudes. The recovery of both the amplitude and spectral index for weak signals lies within the 90% uncertainty region, demonstrating good Bayesian coverage. On the other hand, the recovery of the amplitude and spectral index for strong signals lies outside the 90% region. This is expected, given that strong signals violate the weak-signal approximation, which was used to derive the approximate equivalence of the LIGO-Virgo stochastic analysis and the fully-Bayesian approach.
FIG. 5. Colored signal+noise analysis. In the top panel, we show the injected noise power spectrum P_n(f) (the same for both detectors) and the injected GWB power spectrum P_h(f), along with the optimal estimator P̂_h,opt(f) and its uncertainty σ̂_h,opt(f) for one segment in one realization. In the bottom panel, we show a pp plot generated by performing 300 strong-signal and 300 weak-signal injections and recoveries. We see that when the weak-signal approximation is satisfied, the LIGO-Virgo stochastic analysis has excellent Bayesian coverage. Outside of the weak-signal approximation, the coverage is worse, as expected.
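The pp-plot construction used for Figure 5 can be illustrated with a toy Gaussian problem where the posterior is known in closed form; the model (estimating the mean of unit-variance white data under a flat prior) and the sample sizes are assumptions chosen so each posterior quantile can be computed analytically.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_inj, N = 300, 2048

# Toy model: estimate the mean of unit-variance white data. Under a
# flat prior, the posterior for the mean is Gaussian with standard
# deviation 1/sqrt(N), so the posterior quantile of the true value
# is available in closed form for each injection.
quantiles = np.empty(n_inj)
for k in range(n_inj):
    mu_true = rng.uniform(-1.0, 1.0)          # draw the truth from the prior
    d = rng.normal(mu_true, 1.0, N)
    quantiles[k] = norm.cdf(mu_true, loc=d.mean(), scale=1.0 / np.sqrt(N))

# pp plot: the sorted quantiles should track the diagonal if the
# analysis has proper Bayesian coverage
q = np.sort(quantiles)
max_dev = np.max(np.abs(q - np.arange(1, n_inj + 1) / n_inj))
```

A biased analysis would show up as a systematic departure of the sorted quantiles from the diagonal, which is exactly the signature seen for the strong-signal injections in Figure 5.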

V. DISCUSSION
In this paper, we have shown in what sense LIGO-Virgo's hybrid frequentist-Bayesian cross-correlation analysis for a GWB is equivalent to a fully-Bayesian search. The main result was our proof that for a reduced signal+noise model consisting of weak signals and estimated auto-correlated power spectra, the frequency integrand of the cross-correlation statistic and its variance are sufficient statistics for the recovery of the GWB. This means that the posterior distributions of the recovered spectrum of the GWB (e.g., its amplitude and spectral index) will agree for the hybrid frequentist-Bayesian analysis and the fully-Bayesian analysis in the context of this reduced signal+noise model. The results of our analyses on simulated data are consistent with those found in [26], which describes a fully-Bayesian implementation of LIGO's stochastic search that can estimate the presence of a primordial GWB in the presence of an astrophysical foreground.
We note that a similar hybrid frequentist-Bayesian analysis is also being used for pulsar timing array searches for a GWB produced by inspiraling supermassive black-hole binaries associated with galaxy mergers. Here a noise-marginalized version of the cross-correlation statistic [27,28] is used to avoid strong covariances that exist between individual estimates of pulsar red noise parameters and the amplitude of the GWB. Noise marginalization is performed by drawing values of the pulsars' red noise parameters from posterior distributions generated by an earlier Bayesian analysis that jointly estimates the pulsars' red noise parameters and those of a common red process (e.g., the auto-correlated power due to the GWB), which may exceed the noise at the lowest observed frequencies. This leads to a more accurate recovery of the amplitude of the GWB at relatively little additional computational cost to the cross-correlation statistic analysis.
A similar hierarchical approach, which first estimates the auto-correlated component of the background before looking for evidence of cross-correlations, would most likely also be needed for analyzing data from proposed 3rd-generation (3G) ground-based detectors, like Cosmic Explorer [29] and Einstein Telescope [30]. The expected sensitivity of these 3G detectors is such that the weak-signal approximation in a single segment of data is no longer valid, at least for the rate of binary black hole mergers that Advanced LIGO and Virgo are currently observing [31]. As such, simply estimating the auto-correlated power spectra without assigning some portion of them to the GWB would lead to biased estimates of the amplitude of the background, similar to what is seen in pulsar timing analyses [27].
It may be possible to extend the results of our paper to other signal+noise models, such as a non-stationary "popcorn-like" background produced, e.g., by stellar-mass binary black hole mergers in the LIGO frequency band [25]. For such a background, one would need to use a mixture signal+noise prior to model the non-stationarity, which amounts to postulating the presence of a correlated GWB signal in a certain fraction ξ of the data segments and just noise for the remaining segments, with fraction 1 − ξ [25]. The detection of a GWB in this scenario would correspond to a posterior distribution for ξ strongly peaked away from zero. By focusing on a GWB signal model that consists of just excess correlation in a certain fraction of data segments (as opposed to marginalizing over the parameters of potential BBH mergers [25]), one should be able to implement a computationally efficient and robust (albeit suboptimal) search for a non-stationary GWB.
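A minimal white-data sketch of such a mixture likelihood follows. Everything here is a toy assumption: single-detector Gaussian segment likelihoods stand in for the actual signal and noise models, and the segment counts and excess-variance amplitude are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
N, n_seg, xi_true = 256, 60, 0.3
sigma_h2 = 0.5   # hypothetical excess variance carried by signal segments

# a fraction xi_true of segments carry the common-signal excess variance
segments = []
for _ in range(n_seg):
    has_signal = rng.uniform() < xi_true
    std = np.sqrt(1.0 + sigma_h2) if has_signal else 1.0
    segments.append(rng.normal(0.0, std, N))

def log_likelihood(xi):
    """Mixture likelihood: each segment is signal+noise with
    probability xi, or noise-only with probability 1 - xi."""
    total = 0.0
    for d in segments:
        ls = norm.logpdf(d, 0.0, np.sqrt(1.0 + sigma_h2)).sum()   # signal+noise
        ln_ = norm.logpdf(d, 0.0, 1.0).sum()                      # noise-only
        m = max(ls, ln_)   # log-sum-exp for numerical stability
        total += m + np.log(xi * np.exp(ls - m) + (1.0 - xi) * np.exp(ln_ - m))
    return total
```

A posterior for ξ built from this likelihood peaks away from zero when a non-stationary background is present, which is the detection criterion described above.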

ACKNOWLEDGMENTS
We thank Stephen Taylor for useful discussions about Bayesian analyses and noise-marginalized statistics in the context of pulsar timing arrays. We also thank Colm Talbot and Sylvia Biscoveanu for conversations at the early stages of this work, regarding fully Bayesian searches for stochastic backgrounds using LIGO-Virgo data. JDR thanks Tom Callister, Michael Coughlin, Joey Key, and Tyson Littenberg for many useful discussions, which took place several years ago and which eventually led to this project, albeit in a slightly different form than originally planned. JDR also acknowledges support of start-up funds from Texas Tech University. The authors are grateful for computational resources provided by the LIGO Laboratory and supported by National Science Foundation Grants PHY-0757058 and PHY-0823459. Numerical computations were performed with the numpy [32], scipy [33], and bilby [34] libraries, and plots with the matplotlib [35] library.

Appendix A: Useful identity
A useful identity that appears in several calculations in the paper is
$$\sum_{i=1}^{N} \frac{(x_i-\mu)^2}{\sigma_i^2} = \frac{(x_{\rm opt}-\mu)^2}{\sigma_{\rm opt}^2} + \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} - \frac{x_{\rm opt}^2}{\sigma_{\rm opt}^2}\,, \qquad {\rm (A1)}$$
where x_opt and σ_opt^2 are defined by
$$x_{\rm opt} \equiv \sigma_{\rm opt}^2 \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2}\,, \qquad \frac{1}{\sigma_{\rm opt}^2} \equiv \sum_{i=1}^{N} \frac{1}{\sigma_i^2}\,. \qquad {\rm (A2)}$$
One can think of x_i and σ_i, where i = 1, 2, ..., N, as a set of N independent measurements and error bars of the quantity μ, whose value is to be determined from the measured data. It is a well-known result that x_opt defined above is the minimal-variance unbiased estimator of μ, with variance σ_opt^2. A proof of (A1) is the following:
$$\sum_{i=1}^{N} \frac{(x_i-\mu)^2}{\sigma_i^2} = \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} - \frac{2\mu\, x_{\rm opt}}{\sigma_{\rm opt}^2} + \frac{\mu^2}{\sigma_{\rm opt}^2} = \frac{(x_{\rm opt}-\mu)^2}{\sigma_{\rm opt}^2} + \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} - \frac{x_{\rm opt}^2}{\sigma_{\rm opt}^2}\,,$$
where we completed the square in x_opt and μ to get the last equality.
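The identity can be checked numerically for arbitrary inputs; the measurement values, variances, and μ below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 7
x = rng.normal(size=N)             # arbitrary "measurements" x_i
sig2 = rng.uniform(0.5, 2.0, N)    # arbitrary variances sigma_i^2
mu = 0.3                           # arbitrary value of mu

# optimal (inverse-noise-weighted) combination, as in (A2)
sig2_opt = 1.0 / np.sum(1.0 / sig2)
x_opt = sig2_opt * np.sum(x / sig2)

# both sides of the identity (A1)
lhs = np.sum((x - mu) ** 2 / sig2)
rhs = (x_opt - mu) ** 2 / sig2_opt + np.sum(x ** 2 / sig2) - x_opt ** 2 / sig2_opt
```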
Appendix B: Uncertainties in power spectrum estimates

In this appendix, we provide a brief summary of uncertainties in power spectrum estimates. The final result of this analysis justifies the inclusion of the factor (1 + 2/N_avg) in the expressions for the reduced likelihood functions for both the white and colored signal+noise models. Our presentation follows that of an unpublished internal LIGO technical note by Warren Anderson (25 May 2004).
To simplify the notation a bit, we will use P̂_1, P̂_2 to denote two power spectrum estimators, representing either the auto-correlations σ̂_1^2, σ̂_2^2 for the white signal+noise model or the auto-correlated power spectra P̄_1I(f), P̄_2I(f) for the colored signal+noise model. The number of averages used in the construction of the power spectrum estimators is denoted by N_avg, which is proportional to the number of data samples N for the white signal+noise model, or to the number of frequency bins M averaged over for coarse graining in a Welch power spectrum estimate [21] for the colored signal+noise model.
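For concreteness, a Welch estimate built from N_avg non-overlapping periodogram averages can be formed with scipy; the sampling rate, segment length, and boxcar window below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(4)
fs = 1024.0                  # Hz, arbitrary sampling rate
n_avg = 64                   # number of periodogram averages
nperseg = 1024
x = rng.normal(0.0, 1.0, n_avg * nperseg)   # unit-variance white noise

# Welch estimate: average n_avg periodograms; the relative standard
# deviation of each frequency bin scales as 1/sqrt(n_avg)
f, Pxx = welch(x, fs=fs, window='boxcar', nperseg=nperseg, noverlap=0)

# for unit-variance white noise, the one-sided PSD is 2/fs away from
# DC and Nyquist, so fs*Pxx/2 should scatter around 1
rel = fs * Pxx[1:-1] / 2.0
```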

Writing the power spectrum estimators as P̂_1 = P_1 + δP_1 and P̂_2 = P_2 + δP_2, the fluctuations δP_1, δP_2 have zero mean and variances
$$\langle \delta P_1^2 \rangle = \frac{P_1^2}{N_{\rm avg}}\,, \qquad \langle \delta P_2^2 \rangle = \frac{P_2^2}{N_{\rm avg}}\,. \qquad {\rm (B5)}$$

Since the power spectra appear in the full likelihood function via the product of their inverses, 1/σ̂_1^2σ̂_2^2 or 1/P̄_1I(f)P̄_2I(f), we need to calculate the expectation value ⟨1/P̂_1P̂_2⟩. Making a Taylor series expansion for both 1/P̂_1 and 1/P̂_2, it follows that
$$\frac{1}{\hat{P}_1\hat{P}_2} = \frac{1}{P_1 P_2}\left(1 - \frac{\delta P_1}{P_1} + \frac{\delta P_1^2}{P_1^2} - \cdots\right)\left(1 - \frac{\delta P_2}{P_2} + \frac{\delta P_2^2}{P_2^2} - \cdots\right). \qquad {\rm (B7)}$$
Taking the expectation value of both sides of the above expression, we find
$$\left\langle \frac{1}{\hat{P}_1\hat{P}_2} \right\rangle \approx \frac{1}{P_1 P_2}\left(1 + \frac{2}{N_{\rm avg}}\right),$$
where we have ignored cubic and higher-order terms in δP_1 and δP_2, and assumed the weak-signal approximation, P_h^2/P_1P_2 ≪ 1, to get the last approximate equality. This result shows that 1/P̂_1P̂_2 is a biased estimator of 1/P_1P_2. Nonetheless, this bias can be removed by simply moving the factor of (1 + 2/N_avg) to the left-hand side, so that
$$\frac{1}{(1 + 2/N_{\rm avg})\,\hat{P}_1\hat{P}_2}$$
is an unbiased estimator of 1/P_1P_2. This is the replacement we make for 1/P_1P_2 in the reduced likelihood functions in the main text, cf. (23).
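The (1 + 2/N_avg) bias factor can be checked by Monte Carlo in the noise-dominated (weak-signal) limit. The sketch below models each periodogram value as exponentially distributed, which is appropriate for Gaussian noise; the number of averages and trial count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n_avg = 32            # number of periodogram averages
P1 = P2 = 1.0         # true power in each detector (weak-signal limit)
n_trials = 100000

# each power estimate averages n_avg independent exponentially
# distributed periodogram values, as appropriate for Gaussian noise
P1_hat = rng.exponential(P1, (n_trials, n_avg)).mean(axis=1)
P2_hat = rng.exponential(P2, (n_trials, n_avg)).mean(axis=1)

# measured bias of 1/(P1_hat*P2_hat) relative to 1/(P1*P2),
# compared against the leading-order prediction 1 + 2/n_avg
bias = np.mean(1.0 / (P1_hat * P2_hat)) * P1 * P2
predicted = 1.0 + 2.0 / n_avg
```

The exact bias for exponential periodograms is (n_avg/(n_avg − 1))^2, whose leading-order expansion reproduces 1 + 2/n_avg.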
Appendix C: Alternative derivation of a reduced likelihood function

In this appendix, we give an alternative derivation of the likelihood function for a reduced signal+noise model, but under different assumptions than those given in the main text. More specifically, we do not assume here that the GWB signal is weak compared to the detector noise, nor do we use estimators of the auto-correlated power calculated from data segments different from that being analyzed for the signal. Rather, we assume that: (i) the number of data points N (or coarse-grained averages M) for a given data segment I is sufficiently large that one can expand the likelihood function around the maximum-likelihood estimators of σ_1^2, σ_2^2, σ_h^2 (or P_1I(f), P_2I(f), P_h(f)) to second order without loss of information; and (ii) the data are informative for the auto-correlated power σ_1^2, σ_2^2 (or P_1I(f), P_2I(f)), allowing us to evaluate the second-order likelihood function at the values of σ_1^2, σ_2^2 (or P_1I(f), P_2I(f)) that maximize the likelihood for fixed values of σ_h^2 (or P_h(f)). For concreteness, we give the derivation here in the context of the white signal+noise model for two coincident and coaligned detectors, but it can be easily extended to the case of colored data with non-stationary noise and a non-trivial overlap function. Our derivation follows that given in [36].
We start with the full likelihood function (22) for the white signal+noise model, which we rewrite here as
$$p(d|\sigma_1^2, \sigma_2^2, \sigma_h^2) = \exp\!\left[-\tfrac{N}{2}\, f(\sigma_1^2, \sigma_2^2, \sigma_h^2|d)\right],$$
where
$$f(\sigma_1^2, \sigma_2^2, \sigma_h^2|d) = 2\ln(2\pi) + \ln\!\left(\sigma_1^2\sigma_2^2 - \sigma_h^4\right) + \frac{\sigma_2^2\,\hat\sigma_1^2 - 2\sigma_h^2\,\hat\sigma_h^2 + \sigma_1^2\,\hat\sigma_2^2}{\sigma_1^2\sigma_2^2 - \sigma_h^4}\,.$$
In the above expressions, σ_1^2, σ_2^2 are the total auto-correlated variances in the two detectors, see (18), and
$$\hat\sigma_1^2 \equiv \frac{1}{N}\sum_{j=1}^{N} d_{1j}^2\,, \qquad \hat\sigma_2^2 \equiv \frac{1}{N}\sum_{j=1}^{N} d_{2j}^2\,, \qquad \hat\sigma_h^2 \equiv \frac{1}{N}\sum_{j=1}^{N} d_{1j} d_{2j}$$
are the maximum-likelihood estimators of σ_1^2, σ_2^2, σ_h^2. Assuming that N is sufficiently large, we can Taylor expand f around its maximum-likelihood values, ignoring terms higher than second order in the differences σ_1^2 − σ̂_1^2, σ_2^2 − σ̂_2^2, σ_h^2 − σ̂_h^2. Doing so gives
$$f(\sigma_1^2, \sigma_2^2, \sigma_h^2|d) \simeq f(\hat\sigma_1^2, \hat\sigma_2^2, \hat\sigma_h^2|d) + \frac{1}{2}\sum_{i,j}\left.\frac{\partial^2 f}{\partial x_i \partial x_j}\right|_{\hat{x}}(x_i - \hat{x}_i)(x_j - \hat{x}_j)\,,$$
where x_i ≡ (σ_1^2, σ_2^2, σ_h^2), and it follows that the inverse matrix

(ii) Although we denoted the right-hand sides of (C9) by σ̂_1^2, σ̂_2^2, these expressions are not the same as those used in the main text (see the discussion in Sec. III B), which were estimators of σ_1^2, σ_2^2 constructed from data segments different than that being analyzed for the signal. The use of different data segments to estimate the auto-correlations is necessary for LIGO-Virgo data, since the number of averages used to estimate power spectra is not sufficiently large to beat down the bias that arises from using the same data in both the numerator and denominator of the exponential in (C11). (Recall that for the colored signal+noise model, the number of averages is proportional to the number of frequency bins M that are averaged together for coarse-graining.) For LIGO-Virgo data, one is restricted to N_avg of order at most 50, due to the complexity of the detector noise (the need to take the coarse-grained frequency resolution to be δf ∼ 0.25 Hz) and its broad non-stationarity on time scales of order minutes (ignoring shorter time-scale instrumental glitches). By using different data segments to estimate the auto-correlated power, the bias goes away in the weak-signal limit, as shown in Fig. 4.
But if the number of averages were sufficiently large, then one could use the expression in (C11) as is.
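Assumption (ii), evaluating the likelihood at the auto-variances that maximize it for fixed σ_h^2, can be sketched numerically for the white signal+noise model. Everything here is an illustrative choice, not the paper's implementation: the amplitudes, sample size, and use of a generic Nelder-Mead optimizer are assumptions, and the likelihood is the standard bivariate Gaussian for two coincident, coaligned detectors.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
N = 20000
sig_h2_true = 0.05     # hypothetical common-signal variance
h = rng.normal(0.0, np.sqrt(sig_h2_true), N)
d = np.vstack([h + rng.normal(0.0, 1.0, N),
               h + rng.normal(0.0, 1.0, N)])
S = d @ d.T / N        # sample covariance of the two channels

def neg_log_like(params, sig_h2):
    """Negative log-likelihood of the white signal+noise model with
    total auto-variances s1, s2 and cross-correlation sig_h2."""
    s1, s2 = params
    det = s1 * s2 - sig_h2 ** 2
    if det <= 0.0 or s1 <= 0.0 or s2 <= 0.0:
        return np.inf
    quad = (s2 * S[0, 0] - 2.0 * sig_h2 * S[0, 1] + s1 * S[1, 1]) / det
    return 0.5 * N * (np.log(det) + quad)

def profile_log_like(sig_h2):
    """Maximize over the auto-variances at fixed sigma_h^2."""
    res = minimize(neg_log_like, x0=[1.1, 1.1], args=(sig_h2,),
                   method='Nelder-Mead')
    return -res.fun
```

Because the maximum-likelihood covariance is the sample covariance, the profile likelihood peaks at σ_h^2 equal to the measured cross-correlation S[0, 1], exceeding its value at σ_h^2 = 0 when a correlated signal is present.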