The optimal search for an astrophysical gravitational-wave background

Roughly every 2-10 minutes, a pair of stellar mass black holes merge somewhere in the Universe. A small fraction of these mergers are detected as individually resolvable gravitational-wave events by advanced detectors such as LIGO and Virgo. The rest contribute to a stochastic background. We derive the statistically optimal search strategy for a background of unresolved binaries. Our method applies Bayesian parameter estimation to all available data. Using Monte Carlo simulations, we demonstrate that the search is both"safe"and effective: it is not fooled by instrumental artefacts such as glitches, and it recovers simulated stochastic signals without bias. Given realistic assumptions, we estimate that the search can detect the binary black hole background with about one day of design sensitivity data versus $\approx 40$ months using the traditional cross-correlation search. This framework independently constrains the merger rate and black hole mass distribution, breaking a degeneracy present in the cross-correlation approach. The search provides a unified framework for population studies of compact binaries, which is cast in terms of hyper-parameter estimation. We discuss a number of extensions and generalizations including: application to other sources (such as binary neutron stars and continuous-wave sources), simultaneous estimation of a continuous Gaussian background, and applications to pulsar timing.


I. INTRODUCTION
Observations of gravitational waves from binary black hole mergers imply that stellar-mass black holes coalesce somewhere in the visible Universe every 223 +352 −115 s [1].Binary neutron stars merge every 13 +49 −9 s [1].The vast majority of these events are too distant to be individually resolved by the current generation of detectors.The most distant event yet observed, GW170104, was measured to have a redshift of z = 0.18 +0.08  −0.07 [2].Nonetheless, unresolved compact binary mergers contribute to a stochastic background of gravitational waves, which may be detectable with current detectors [3].Measuring the stochastic background from compact binaries has the potential to provide information about high-redshift binary black holes and neutron stars, which complements observations of local mergers [4].
The stochastic background is typically characterized by the gravitational-wave energy density spectrum where dρ gw is the energy density of gravitational waves between f and f + df and ρ c is the critical energy density for a flat universe [5].Searches for the stochastic background seek to measure Ω gw (f ) > 0. In the LIGO/Virgo band (10 − 2000 Hz), the best current limits on the gravitational-wave energy density spectrum are Ω gw (f ) < 1.7 × 10 −7 (95% confidence, measured in the band 20 − 86 Hz) [6,7].
a rory.smith@monash.edub eric.thrane@monash.edu To date, all LIGO/Virgo searches for the stochastic background have relied on the cross-correlation method described by Allen & Romano [5].A similar crosscorrelation technique is employed by pulsar timing arrays operating in the nanohertz band [8].The crosscorrelation method has two nice features.It is computationally cheap and it yields a statistically optimal (minimum-variance) measurement of Ω gw (f ) for the case of a persistent, Gaussian background.
A Gaussian background is characterized entirely by Ω gw (f ).The stochastic background from stellar-mass binary black holes is highly non-Gaussian; it is rare for different events to overlap in time [1] (within the advanced detector band).The stochastic backgrounds from binary neutron stars are quasi-Gaussian since the signals from individual events overlap in time [1].We focus initially on the highly non-Gaussian background from stellar-mass binary black holes, but return below to consider quasi-Gaussian backgrounds from binary neutron stars.
The binary black hole background consists of (inprinciple), clearly-distinguishable, deterministic signals [9].Since distant events are not resolvable with current detectors, most events are not distinguishable in practice, but they would be with a more sensitive detector.This is in contrast to, for example, the stochastic background from white dwarf binaries in the millihertz band, which cannot be distinguished in principle; see, e.g., [10].Cross-correlation searches are sub-optimal for non-Gaussian backgrounds.It is possible to improve the sensitivity of the stochastic search by including a more accurate description of the signal model.
A number of studies dating back to 2002 have outlined a variety of strategies for developing a non-Gaussian pipeline.Drasco & Flanagan derived an algorithm suitable for bursting sources [11] observed by co-located detectors with white noise [12].While the assumptions of white noise and co-located detectors are not realistic, the study was important for showing that a non-Gaussian analysis could achieve a potentially significant improvement in sensitivity compared to a crosscorrelation search.Subsequently, Thrane outlined a method for bursting sources that can be applied in the more realistic case of spatially separated detectors with colored noise [13].Work by Martellini & Regimbau has explored additional strategies [14,15].None of these methods have yet been used in a published search.For a comprehensive review of stochastic background methodology up until this point, see [16].
In this paper, we take a step back and ask a new question: what is the optimal method for detecting a stochastic background of binary black holes (or any other astrophysical background)?Formally, the optimal method yields a minimum credible interval posterior.In practical terms, the optimal method can be more sensitive than sub-optimal techniques.Conceptually, deriving the optimal method amounts to implementing a likelihood function, which describes the salient features of the signal model and measurement noise.Turning the crank of Bayesian statistics, the resulting posterior distributions yield minimum credible intervals by construction.
It turns out that this question of optimality is interesting for several reasons.First and most obvious, it is desirable to derive the most sensitive possible search.Depending on the magnitude of improvement over the cross-correlation search, an optimal pipeline could significantly reduce the time to the detection of a stochastic background.Second, we show below that the optimal Bayesian search yields, as a byproduct, information about the population properties of high-redshift black holes.In particular, we automatically obtain a posterior distribution for the coalescence rate of binary black holes at high redshift.This information is lost during the process of cross-correlation.
Third, our formalism provides a natural framework for unifying the stochastic search with measurements of unambiguous detections and even "silver-plated" candidates [17] like LVT151012 [18].As the number of detected black hole mergers increases, the gravitationalwave community is likely to be increasingly interested in population statements.The framework proposed here is the natural means of combining all possible data to make statements about populations of binary black holes (and other astrophysical phenomena).The method is free of Malmquist bias [19] since there is no selection of events.
Last, we outline how the method can be generalized in order to solve a number of important problems in gravitational-wave astronomy including (1) measurement of a primordial Gaussian background in the presence of an astrophysical foreground and (2) measurement of the population properties of binary black holes and neutron stars, e.g., their mass and spin distributions.
The remainder of this paper is organized as follows.In Section II, we derive the optimal search for an ensemble of binary black hole mergers.In Section III, we present the results of a Monte Carlo study that demonstrates the method.In Section IV, we investigate scaling relations in order to estimate the performance of the search in various contexts.In Section V, we consider complications arising from non-Gaussian noise.In Section VI, we consider the feasibility of the search given plausible computing resources.
In Section VII, we discuss how this method can be used to measure the population properties of unresolved binary black holes using hyper-parameters.In Sec.VIII, we discuss how the method can be generalized to similar problems including binary neutron stars, neutron star black hole binaries, continuous waves from rotating neutron stars, and supermassive black hole binaries.In Section IX, we discuss the simultaneous measurement of a primordial gravitational-wave background.In Section X, we provide an overall assessment of the feasibility of an optimal stochastic search and the prospects for detection of a stochastic background.

A. The likelihood function
We break the strain data s into convenient-sized segments.The data for segment i is denoted s i .By assumption, each segment is big enough that it might include one signal (specifically, the merger part of the signal), but small enough that it is unlikely to contain two such signals.In this paper, we satisfy these criteria by using segments with a duration of 4 s.For binary black holes like GW150914, the signal is in-band for only ≈ 0.2 s, less than the segment duration.Since the probability of observing just one event is in any given segment is small ≈ 2%, the probability of observing two at once is negligibly small: ≈ 10 −4 .
Assuming Gaussian noise, the log likelihood for a single segment s i containing a compact binary signal with parameters θ is log Here h(θ i ) is the signal model, which depends on a vector of parameters θ i , e.g., sky location, component masses, the time of coalescence within the segment, and so on.
We have introduced the usual noise-weighted inner product: where ∆f is the frequency-bin size and is the real part of the sum.The variable S n (f k ) is the detector noise power spectral density.The sum runs over k frequency bins.
When combining data from a network of M detectors, the likelihood function for segment i becomes where s i represents the data from all M detectors and j indexes each of the M detectors.Stochastic searches typically rely on M ≥ 2 networks in order to distinguish astrophysical signals from poorly modeled noise [5].Henceforth, we assume M ≥ 2 detectors because this enables a coherent search for sub-threshold signals.The notation s i is used to indicate the strain data from M detectors associated with segment i.
We can generalize the likelihood function in Eq. 2 to account for the fact that we expect the data to contain a signal plus Gaussian noise with probability ξ, or pure Gaussian noise with probability (1−ξ).(We defer discussion of non-Gaussian noise until Section V.) Hence, we define a "generalized likelihood" for segment i, modified to include the signal probability hyper-parameter ξ [12], which we refer to as the "duty cycle" We note that this astrophysical duty cycle should not be confused with the detector duty cycle, which is the fraction of time during which a detector collects sciencequality data.Here, L( s i |0) is the likelihood function given the hypothesis that no signal is present: We marginalize over the astrophysical parameters of the event θ i (with a prior distribution π(θ i )) in order to obtain a likelihood for the data given the duty cycle ξ for segment i We have introduced two new terms corresponding respectively to the signal evidence and the noise evidence.These evidences are the raw output of LIGO parameter estimation algorithms such as LALInference [20].We do not marginalize over the detector noise power spectral density.The power spectral densities that enter Eq. 3 are empirically estimated for each data segment.The next step is to combine data from a large set of n segments { s}.The combined likelihood for the data given ξ is The posterior for the duty cycle p(ξ|{ s}) is simply where π(ξ) is the prior distribution on ξ.For simplicity we will assume a flat, uniform prior for π(ξ) so that though our analysis can incorporate any suitable choice of π(ξ), informed, for instance, by expectations about the average time between binary black hole mergers [1].

B. Detection statistic
In order to search for an astrophysical stochastic background, we calculate a "stochastic background evidence" The null hypothesis (there is no stochastic background) is described by a null evidence: We construct a Bayes factor to compare the two hypotheses: This variable is an optimal detection statistic for an astrophysical background of compact binaries.The optimality follows from the fact that the likelihood function describing the data given the signal and noise models is complete.Hence, the search produces a minimum credible-interval posterior distribution on the duty cycle.
In the remainder of this paper, we adopt the convention that a log Bayes factor of ≈ 8 represents a statistically significant preference for one hypothesis over the other [21].
Up to this point, we have, for the sake of convenience, written our likelihoods in terms of duty cycle ξ.In the subsequent subsections (II C-II E), we discuss how ξ is related to other quantities including rate and energy density.

C. Rate
In this subsection, we take the likelihood L tot { s}|ξa function of duty cycle ξ-and recast it as a function of R: the number of mergers per segment.In particular, R is the rate of events throughout the visible Universe per segment.This will allow us to more easily relate our analysis to observations of individual merger events.It is useful to contrast ξ and R. While duty cycle ξ is defined on (0, 1), the rate of events per unit segment is defined on (0, ∞).Even perfect knowledge of ξ does not determine R because the latter is subject to cosmic variance arising from the fact that events take place randomly following Poisson statistics.
The number of compact binary mergers N is given by the product of the duty cycle ξ and the number of segments n N = n ξ. ( We perform a change of variable in order to recast the likelihood variable in terms of N : where Having recast the likelihood in terms of the number of compact binary mergers N , we can marginalize over N to obtain where π(N |R) is a conditional prior for the number of compact binary mergers N given a rate R (with units of mergers per segment).The conditional prior is given by a Poisson distribution The total evidence is a likelihood function for the data given the rate hyper-parameter The rate posterior is where π(R) is some suitable prior on the rate R.

D. Local rate
The variable that we have been referring to as "the rate" R-by which we mean, the number of compact binary mergers in the visible Universe per segmentis distinct from the local merger rate R 0 with units of Gpc −3 yr −1 .However, they are related.We derive the local rate R 0 from R: Here, R(z) is the co-moving merger rate in units of Gpc −3 yr −1 as a function of redshift and δt is the segment duration.The factor dV /dz describes how an element of volume evolves in an expanding universe while the factor of 1 + z comes about transforming the time variable in the source frame to the detector frame; see [1].The shape of R(z) is determined by a model, which takes into account, e.g., the stellar formation rate as a function of redshift and the time delay between formation and coalescence; see, e.g., [3].However, we can treat the overall normalization as a free parameter so that where S(z) is a model-dependent, dimensionless shape function (normalized so that S(z = 0) = 1) and the local rate R 0 ≡ R(z = 0) is the normalization.Combining Eqs.24 and 25, we obtain It will be interesting to compare the local rate inferred from a stochastic detection-and assuming some shape model S(z)-with the local rate measured directly by resolvable mergers.Tension in these two measurements could indicate an inadequate shape model S(z) among other things.

E. Energy density
Given some model, the local rate R 0 (and therefore R) can be converted into dimensionless energy density Ω gw (f ) defined in Eq. 1; see, e.g., [22,23].The fully general expression is a bit unwieldy so we employ two simplifying assumptions in order to obtain an intuitive initial expression.First, we assume that the (source-frame) energy spectrum of the event dE GW /df s -a function of the source frame frequency f s -is determined primarily by the chirp mass of the binary M c .Second, we assume that the co-moving merger rate does not depend on mass.Given these two assumptions, where H 0 is the Hubble constant, Ω M is the energy density of matter, and Ω Λ is the energy density of the cosmological constant.In the first set of parentheses, ρ c is the critical energy density for a flat universe.The variable π(M c ) is the mass distribution.
Thus, we can relate the number of events per segment R (and/or the local rate R 0 ) to the energy density spectrum Ω gw (f ), but only by employing a model to describe the distribution of events in redshift, mass, and so on.In the derivation above, we assumed that Ω gw (f ) depends primarily on chirp mass.However, a more general expression for Ω gw (f ) would include integrals over every variable that can affect the energy spectrum, for example, the mass ratio.Also, in general, S(z) can depend on variables such as M c , in which case S(z|M c , ...) cannot be taken out of the Eq.27 integral over M c .

III. DEMONSTRATION USING GAUSSIAN NOISE
In this section we carry out a Monte Carlo simulation in order to demonstrate the method described in Section II.Our goal is to calculate the duty cycle posterior p(ξ|{ s}) using simulated data containing a population of sub-threshold black hole binaries.We seek to fulfill three criteria.First, the method should be "safe": it should return a null result when applied to pure Gaussian noise.Second, the method should be effective: it must yield a positive detection in the presence of a sufficiently loud stochastic signal.Third, the method should be unbiased.The duty cycle posterior ought to, on average, peak at the injected value.
We assume a two-detector network consisting of the LIGO Hanford and Livingston observatories operating at design sensitivity [24].It is straightforward to extend the method to include additional detectors [25], but we begin with two for the sake of simplicity.For each detector, we generate two datasets.The noise dataset consists of 1000 4 s segments of Gaussian noise.The signal dataset consists of 300 4 s segments with a binary black hole signal added to Gaussian noise.
The signals are coherently generated (and later recovered) using the IMRPhenomPv2 approximant [26].The parameters of each merger are drawn randomly.The orientation angles and sky position are drawn from isotropic distributions.We marginalize over the time of coalescence, which is drawn from a uniform distribution.The total mass is drawn from a uniform distribution on (48M , 80M ) while the mass ratio is drawn from a uniform distribution on (1, 8).This mass range produces signals that fit conveniently into our 4 s segments, given a minimum frequency of 20 Hz.The mass ratio is within the domain of validity for IMRPhenomPv2.The dimensionless spin magnitudes (a 1 , a 2 ) are drawn from a uniform distribution on (0, 0.89), which is within the domain of validity for IMRPhenomPv2.The spin unit vectors are drawn from an isotropic distribution.The luminosity distances are drawn from a uniform-in-volume distribution on the interval (0.50 Gpc, 5 Gpc).The lower limit of 0.50 Gpc removes most gold-plated detections, the remainder of which are eliminated with an additional cut described below.
Reconstructed masses and distance are affected by redshift.The lab-frame masses m l i=1,2 and the luminosity distance d L are given by where m s i is measured in the source frame and d M is the comoving distance.The mass distributions described above apply to quantities measured in the lab frame.
Luminosity distances of (0.50 Gpc, 5 Gpc) imply a redshift interval of (0.10, 0.77) within the framework of the standard ΛCDM cosmology.In principle, we can (and eventually will) extend the interval to d 30 Gpc (z 3.3) beyond which the stochastic signal is expected to be marginal due to the low stellar formation rate and expanding Universe [4].The uniform-in-volume distribution is then modified by the imposition of a "Malmquist prior" [19]; we exclude "gold-plated" detections that can be observed with a statistically significant coherent matched-filter signal-to-noise ratio ρ network ≥ 12: Here, the sum over j runs over detectors.The maximum over θ determines the matched filter template that best fits the data, and so ρ network is the maximum-likelihood signal-to-noise ratio.Since most binaries merge near the edge of our 5 Gpc sphere, this cut removes 1% of the events.Eventually, gold-plated events should be included in the analysis in order to achieve a unified approach to compact binary population inference and stochastic backgrounds.However, we exclude them here for the sole purpose of demonstrating that we can recover the signal from a stochastic background of sub-threshold events.A histogram of ρ network is shown in Fig. 1.
Having generated mock data for the noise and signal populations, we create mixed populations for arbitrary duty cycles ξ by selecting at random a mixture of entries from each distribution.We construct three datasets corresponding to ξ = (0, 0.05, 1) with n = (500, 525, 300) segments respectively.We compute the signal evidence Z S and noise evidence Z N for every event in each dataset (Eqs.8 and 9).The calculation is carried out using LAL-Inference.We employ reduced order modeling and reduced order quadrature methods to control the computational cost of the analysis [27].
In Fig. 2, we plot p(ξ|{ s}) for the three values of duty cycle.The posterior for pure noise (ξ true = 0) is indicated by \-hatched orange.The fact that it peaks at ξ = 0 shows that the method is safe.The posterior for pure signal (ξ true = 1) is indicated by /-hatched blue.The fact that it peaks at ξ = 1, clearly excluding ξ = 0, indicates that the method is effective.Finally, the posterior for a mixed distribution (ξ true = 0.05) is indicated by green.The posterior peaks near the true value of ξ true = 0.05, which shows that the method is unbiased.filter signal-to-noise ratio (Eq.31) for a dataset containing simulated signals + Gaussian noise.The network signal-to-noise ratio is calculated by maximizing over all astrophysical parameters.This is why the distribution falls off sharply near ρ network = 1; one of the many available templates tends to produce a reasonable fit to the data such that ρ network 1, even for weak signals.The signal distribution is generated from a population of mergers uniform in volume between (0.50 Gpc, 5 Gpc).Gold-plated detections with ρ network ≥ 12 are excluded.

IV. TIME TO DETECTION
In this section, we estimate the time it will take to make a confident detection of an astrophysical background (ln BF > 8).In doing so, we study how the Bayes factor (Eq. 15) scales as a function of the true duty cycle ξ true and the number of data segments N segs .Recent estimates place the binary black hole background at Ω gw (f = 25 Hz) ≈ 1.1 × 10 −9 [1].In Appendix A, we show that, given our mass and distance distributions (described above in Section III), ξ true = 4 × 10 −4 provides a realistic duty cycle.We assume this value of ξ true for the remainder of this section.Due to computational constraints, we create 1000 noise segments for this preliminary study.This allows us to carry out a mock study using ≈ 1 hour of data and to probe duty cycles ξ 0.1%.In order to estimate how the algorithm will perform when applied to longer datasets and/or lower duty cycles, it is necessary to extrapolate.Our extrapolation provides an initial performance estimate, which should be checked with a more computationally expensive mock data challenge.
Our extrapolation uses a Gaussian Mixture Model (GMM) [28] to fit the distributions of signal and noise evidences [π(Z S ), π(Z N )] using the (1000, 300) data segments that we have already simulated.A GMM models the data as a superposition of independent Gaussian distributions.The GMM fits are displayed in Figs.3a (signal dataset) and Figs.3b (noise dataset).In each panel, the horizontal axis is the log signal evidence while FIG.2: Duty cycle posteriors p(ξ|h) (Eq.11) for Monte Carlo datasets.The orange data are pure Gaussian noise, the blue data are a population of sub-threshold binary black hole events added to Gaussian noise, and the green data correspond to a mixture with a true duty cycle of ξ true = 0.05.The fact that each posterior peaks at the appropriate value shows that the method is safe, effective, and unbiased.
the vertical axis is the log noise evidence.The color bar indicates the probability density.
Using the GMM fits, we can generate large extrapolated datasets with arbitrarily low duty cycles.In Fig. 4, we show how the log Bayes factor (Eq. 15) scales with the injected duty cycle ξ true and the number of segments N segs , the latter of which is equivalent to the observation time.We average over 1000 realizations of (Z S , Z N ) drawn from the GMMs (created assuming a two-detector LIGO network operating at design sensitivity).We find that an astrophysical background with a realistic effective duty cycle ξ true ≈ 4 × 10 −4 can be detected with N segs ≈ 17000 data segments, corresponding to 20 hours.This can be compared to a detection time of 1 year using cross-correlation [1].The signal is created by ≈ 7 sub-threshold events with z < 0.77.We expect that the improvement in sensitivity results from two effects.First, the likelihood includes information about the non-Gaussian nature of the binary black hole background.Second, the likelihood includes information about the deterministic nature of compact binary waveforms.Additional work is required to determine the relative importance of these two factors.mistaken for gravitational-wave signals.Failure to account for glitches can bias the duty cycle posterior, or worse, yield a false positive.We compute the duty cycle posterior for a set of 600 4 s segments of data from LIGO's first observing run (O1).We introduce an unphysical time shift to ensure that any real gravitational-wave signals do not produce coherent signals in the Hanford and Livingston detectors.The true duty cycle should therefore be zero, but the posterior peaks at around ξ = 0.4 and excludes zero.Non-stationary noise is producing a significant bias.An example of the undesirable effect of glitches is shown in Fig. 5.
In order to take into account non-Gaussian noise, we extend the likelihood expression from Eq. 7 to include contributions from glitches.For simplicity, we consider a two-detector network, but the results generalize.For our glitch model, we conservatively suppose that glitches look exactly like binary black hole waveforms except that the waveform in one detector is completely uncorrelated with the waveform in the other.Any part of a glitch that is orthogonal to the binary black hole signal manifold will not contribute any signal evidence Z S .Some additional notation is necessary.We introduce parameters ξ (1) g and ξ (2) g corresponding to the glitch duty cycle in detectors one and two respectively.The variables Z (1) g and Z (2) g are the single-detector evidences for the glitch hypothesis given by Here, θ (1) and θ (2) are the signal parameters for (uncorrelated) glitches in detectors 1 and 2 respectively.The variables s (1) and s (2) are the strain data in each detector.We note that the glitch evidences are equivalent to single-detector signal evidences given our conservative glitch model.A key feature of this glitch model is that the parameters in each detector are independent.
We introduce the variables Z (1) N and Z (2) N for the singledetector noise evidences (see Eq. 9).The variables Z (1) S+g and Z (2) S+g are the two-detector evidences for a coherent signal with a glitch in one detector.Finally, the variable S+g is the two-detector evidence for a coherent signal with a glitch in both detectors.We do not include explicit expressions for Z (1) S+g , and S+g because-as we shall see momentarily-they can be approximated as zero.For the sake of compact notation, we suppress the segment number index i on every evidence term.
Given our new definitions, the likelihood for coherent merger events in glitchy data is Each line corresponds to a distinct possibility.A probability tree corresponding to the likelihood function is shown in Fig. 6.Starting from the first line and reading toward the bottom, the possibilities are: 1. signal and no glitches 2. no signal and no glitches 3. no signal and a glitch in detector (1) 4. no signal and a glitch in (2) 5. no signal and a glitch in (1) and (2) 6. signal and a glitch in (1) 7. signal and a glitch in (2) 8. signal with glitches in (1) and (2) .
Note the priors for ξ, ξ g , ξ (2) g all run from (0, 1).They are all independent.Each of the above possibilities corresponds to a branching path in Fig. 6.
For the sake of simplicity, we hypothesize that the last three terms in this likelihood contain evidences that are small enough to safely ignore in practice: S+g , and Z (1,2) S+g .We expect these three Z to be small because they employ overzealous models, which tend to overfit the data.While terms like Z S , Z (1) g and Z (2) g employ 15 parameters to fit a merger event or merger-like glitch, Z (1) S+g and Z (2) S+g employ 30 parameters to simultaneously fit a merger event and a glitch.The Z (1,2) S+g term employs 45 parameters to fit a merger event and two glitches.Since we expect that mergers and glitches are easily fit with 15parameter models, the final three Z incur large Occam factors, which results in small Z.This reasoning may not apply to the special case of data that actually contains a merger and a glitch, but we expect such events to be rare.The recent binary neutron star event GW170817 was detected coincident with a significant glitch [29], but this does not change our expectations.Binary neutron star signals are in band for 100 s versus 0.3 s for high-mass binary black holes, and so the chance of a coincident glitch is relatively higher.Of course, one is free to retain the S + g terms, but this requires modification of LALInference.We anticipate that such modifications will be well worth pursuing for a number of applications, including work to extend this analysis to low-mass systems.The subtraction of a glitch associated with GW170817 [29] using a wavelet reconstruction is a step in this direction [30].
If we assume that the final three Z are small enough to ignore, the glitchy likelihood becomes Below, we show that the "small-Z" approximation works well when we apply this likelihood to real data.This is convenient because it will enable us to obtain results using only existing 15-dimensional signal models.We now show that our method is robust when applied to real LIGO noise.Because the LIGO detectors cannot be shielded from gravitational-wave signals, we ensure that the analyzed data cannot contain coincident real signals by performing "time slides" in which a relative time Is there a signal?Is there a glitch in 1?
Is there a glitch in 2? 6: Probability tree for Eq.34 assuming a two detector network.The left branches correspond to "yes" and the right branches "no".The probability of each branch is labeled accordingly.The "leaves" at the base of the tree show the evidence associated with each path.
offset, longer than the light-travel time between sites, is applied between the data from the detectors.Time-slides are a common boot-strap technique for generating realistic background noise.
As before, we generate two datasets: a noise-only dataset consisting of 670 4 s background data segments from O1; and an injection dataset consisting of 60 software injections into 4 s background data segments from O1.The signals are generated according to the prescription described in Sec.III.Injections are drawn from a uniform-in-volume distribution.They are all subthreshold.We construct three populations corresponding to ξ = (0, 0.09, 1) to test for effectiveness, safety, and bias.We compute the duty cycle posterior using the "glitchy" likelihood function (Eq.35).In Fig. 7 we demonstrate that the glitch model employed in the likelihood function Eq. 35 is safe, effective and unbiased as required.

VI. COMPUTATIONAL REQUIREMENTS
Figure 4 implies that, at design sensitivity, a realistic astrophysical background with an effective z < 0.77 duty cycle of (ξ true = 4 × 10 −4 ) could be detected using ≈ 17000 4 s data segments.At the time of writing, a typical run-time of LALInference on 4 s data segments is usually no more than 10 CPU hours [27].To produce evi-dences in a two detector network, we perform three separate runs: a coherent analysis which produces (Z i S , Z i N ); and two incoherent analysis which produce (Z i,(j) S , Z i,(j) N ) where the index j labels the detector, j = (1, 2).We therefore estimate that an astrophysical background is detectable in 500K CPU hours.This works out to one week with 3000 dedicated CPUs.Detecting an astrophysical background of binary black holes is therefore feasible with current computing resources of the LIGO Data Grid (LDG), which consists of tens of thousands of CPUs.This cost estimate assumes relatively high-mass signals split into 4 s data segments.Additional work is required to investigate the cost and performance of the search as the analysis is extended to lower-mass events.As the minimum mass is decreased, the event rate is expected to increase while the signal duration increases, resulting in a higher duty cycle.We discuss some of the associated challenges below in Subsection VIII A.
While it may be possible to observe a stochastic background by analyzing a small amount of data (with a small computational cost), there is strong motivation for analyzing all available data.By analyzing an entire observing run, we can make inferences about the population properties of binary black holes (and with additional work, eventually binary neutron stars).We describe this population inference in the subsequent section.Carrying out full parameter estimation on a year of data with existing software and hardware would require around 15K FIG.7: Duty cycle posteriors computing using the "glitchy" likelihood function Eq. 35 continuously operational CPUs in order to process the data in real time.While this is technically possible, it would be a costly proposition.
In order to reduce the cost, it is worthwhile to consider the development of new computational methods and implementation on new hardware architecture in order to realize the ultimate goals of the search.These advances might come, for instance, by exploring greater parallelization of parameter estimation methods through the use of, e.g., graphical processor unit (GPU) clusters or larger supercomputer clusters; see, e.g., [31].We note in passing that it might be possible to detect some individually resolvable sources using parameter estimation if they were just below the threshold for detection with matched filter pipelines [32].

VII. POPULATION INFERENCE
In addition to inferring the the astrophysical duty cycle, which can be related to the local merger rate (Subsection II C) and GW energy density (Subsection II E), we can also use this framework to infer properties of the binaries that contribute to the astrophysical background.These population properties may be encoded as hyperparameters, which affect the prior distributions for various binary parameters.For example, instead of assuming a flat prior on chirp mass, we can assume that the total mass follows a power-law distribution with some spectral index α Using Bayesian hierarchical modeling [33,34], we can marginalize over M to obtain a posterior on α.
Before sketching how this works, we note that there are three good reasons for eventually developing a sophisticated hyper-parameterization scheme.First, there is interesting information encoded in the population properties of binary black hole coalescences, which can be used, for example, to study binary black hole formation channels; see, e.g., [35][36][37][38][39][40].Second, we should hyperparameterize theoretically uncertain prior distributions in order to obtain an unbiased estimate of the coalescence rate.For example, a systematic error in the assumed mass spectrum will lead to a systematic error in the inferred rate posterior.Finally, by acknowledging theoretical uncertainty with hyper-parameterization, we should improve the sensitivity of the search.This should generally be true when necessary parameters are added to the search.For example, marginalizing over the unknown mass spectrum index is likely to yield a higher Bayes factor than just assuming some value (unless we have a strong prior belief in some particular value).
The likelihood function for a vector of hyperparameters Λ is obtained by introducing a conditional prior π(θ|Λ) and then marginalizing over θ The distribution π(θ|Λ) is the hyper-parameterized conditional prior.The above integral can be computed without performing any extra sampling by "recycling" the posterior samples already computed by LALInference.
Replacing the likelihood by a sum over samples we obtain: The distribution π(θ|LAL) is the prior distribution used for the generation of the posterior samples by LALInference.The product over i runs over the number of data segments from 1 to n.The sum over k runs over the number of posterior samples from 1 to n i .(Each segment has a different number of posterior samples.)We note that Eq. 38 can be applied as a postprocessing step.That is, one need only run LALInference once.As long as we use relatively uniformative priors for π(θ|LAL), the resulting posterior samples can be reweighted in order to obtain the likelihood for different hyper-parameter values.The posterior for Λ and the (hyper-marginalized) Bayes factor are calculated the usual way.The first prior distributions to hyperparameterize are (1) those, which shed light on the mechanisms of binary black hole formation and (2) those, which are subject to significant theoretical uncertainty.Probably, this means hyper-parameter descriptions of the binary black hole mass spectrum and the distribution of black hole spins.The former is the subject of work in preparation [41].

VIII. EXTENSIONS OF THE SEARCH
In this section, we consider five possible extensions to this search method: searches for binary neutron stars, continuous waves, super-massive black hole binaries, bursts, and glitch classification.We also consider the simultaneous estimation of Gaussian background with a non-Gaussian foreground, but the discussion requires some depth, and so we discuss that separately in the next section.Our goal here is two-fold.First, we seek to illustrate promising directions for future research.Second, we endeavor to showcase that, for every astrophysical background, there is an appropriately complete likelihood function, which can serve as the basis for an optimal search.
A. Binary neutron stars.
An obvious extension of this technique is to search for backgrounds of binary neutron stars.Recent observations of a binary neutron star inspiral [29] suggest that the gravitational-wave energy density Ω gw from such systems is roughly comparable to the energy density from binary black holes [3].Binary neutron stars are less massive than typical binary black holes, but they coalesce more frequently.Thus, while the two backgrounds are expected to produce comparable Ω gw , the rate of binary neutron star mergers is much higher (by a factor of ≈ 17).Moreover, binary neutron star waveforns are much longer than binary black hole waveforms.The first binary black hole event GW150914 was in-band for ≈ 0.3 s [42] versus 100 s for the first binary neutron star GW170817 [29].It is important to understand that while the binary neutron star background is continuous (always present), it is still non-Gaussian: the mergers are, for the most part, clearly separated in time [9].
These two effects all but ensure that a signal is present in the data at any given time.At any given moment, there are likely to be 15 binary neutron stars somewhere in the Universe, producing gravitational waves with f > 10 Hz (versus 0.06 for binary black holes) [1].This violates an assumption in our formulation of the binary black hole search: that the vast majority of segments contain either no signal or one signal.
All is not lost.There are probably a number of solutions.One possibility is to treat the number of events in some data segment as a free parameter using the local rate to inform the prior.The formalism of reversiblejump Markov chain Monte Carlo, where the dimensionality of the parameter space is itself a free parameter, is potentially suited for this problem; see, e.g., [30].Using this plan of attack, the algorithm could attempt, for example, to simultaneously fit dozens of binary neutron signals simultaneously, each with 15 parameters.It is not clear if such a high-dimensional search would be computationally feasible, but further investigation is warranted.

B. Continuous waves
There are thought to be ≈ 50000 isolated neutron stars in the Milky Way emitting gravitational waves in the observing band of advanced detectors [43,44].The computational power required to search the full parameter space is so vast that semi-coherent techniques are required.The data are analyzed coherently in small chunks of manageable size, typically 1800 s [45].The results from each chunk are combined incoherently.
Since the signals overlap in time, it would not make sense to define a duty cycle as fraction of segment that include a signal.However, it may be possible to define a duty cycle equal to the fraction of frequency bins that contain a signal.To get a (very rough) idea of the duty cycle, we can assume that continuous-wave sources are roughly evenly distributed throughout the advanced detector observing band from 10 − 1500 Hz [45].Assuming 1800 s segments (with frequency resolution 0.56 mHz), there are 2.7M frequency bins.This implies a duty cycle of ξ ≈ 2%, which is much less than one.We therefore expect that this formalism can be applied relatively straightforwardly to search for a population of Galactic neutron stars.This proposal bears similarities to [46], which proposes an ensemble search for known pulsars.

C. Super-massive black hole binaries
The background from super-massive black hole binaries is analogous to the background from isolated neutron stars, except the measurements are carried out by pulsar timing arrays; for a review see [47,48].Most super-massive black hole binaries are expected to produce nearly monochromatic signals that evolve slowly over the decade-long observation period.At any one time, there might be 10 4 − 10 5 such binaries emitting gravitational waves in the pulsar-timing band [49].The probability that an inspiralling binary emits gravitational waves on the interval (f, f + df ) is [50] Thus, most binaries emit near the lower limit of the observing band.Pulsar timing arrays observe in a band 1 − 100 nHz with a resolution of ≈ 1 nHz.We expect 90% of the binaries to be observed in a narrow band of (1 − 2.4 nHz).This is also the most sensitive part of the band.Thus, we expect many thousands of binaries per frequency bin in the relevant part of the band.We are therefore unable to define a duty cycle as the fraction of frequency bins containing a signal.The frequency bins that contribute the signal do not contain a signal, they contain thousands.One can imagine defining the duty cycle in terms of sky location: the fraction of patches of sky which contain a signal.However, given current pulsar timing arrays, it seems unlikely that there are enough quasi-independent patches of sky so that the expected duty cycle is less than one; see, e.g., [51].If so, and if we are not missing some other means of distinguishing supermassive black hole binary signals, it seems that the pulsar timing background is, for all intents and purposes, Gaussian in nature, at least as measured by foreseeable detectors.

D. Bursts
Gravitational-wave bursts are unmodeled transients, which can be contrasted with well-modeled signals from compact binaries.There are expected bursts from objects like supernovae, but gravitational-wave astronomers also search for unexpected bursts.Given the significant theoretical uncertainties about the loudness and rate of different gravitational-wave bursts, it is hard to say if there is a significant stochastic background from bursts, and if so, whether or not it is Gaussian.If, for example, there is a detectable background from supernovae [52], one might expect a Gaussian background since supernovae explode in the Universe at a rate of ≈ 30 s −1 .Alternatively, the burst background might be dominated by louder but less frequent signals creating a non-Gaussian background from, e.g., cusps of cosmic strings.
Given these theoretical uncertainties, it seems worthwhile to carry out a non-Gaussian search for bursts.The formalism described here can be extended.Instead of marginalizing over compact binary parameters, one can imagine marginalizing over arbitrary combinations of wavelets, see, e.g., [30].Burst waveforms that can be easily parameterized, such as cosmic string bursts [53] and gravitational-wave memory [54], are relatively straightforward to implement in this formalism.

E. Glitch classification
The method can also be repurposed to identify populations of glitches.To do this, we employ the formalism for non-Gaussian noise.We assume that the glitches are described by a hyper-parameterized prior.Following the method described in Section VII, we can estimate the hyper-parameters of the glitch population in order to classify populations of glitches.For example, one might find that there is a population of transients in the LIGO Hanford detector that are best fit by templates corresponding to 50 + 50 M mergers.This idea is only a sketch, but we can envision developing a practical tool, which could be useful for commissioning and detector characterization.For a related discussion, see [55].

IX. SIMULTANEOUS ESTIMATION OF GAUSSIAN BACKGROUND WITH A NON-GAUSSIAN FOREGROUND
In the long run, we are interested in uncovering the primordial background likely to be lurking underneath the astrophysical background from compact binary mergers.Measurement of a primordial background is considered a Holy Grail of gravitational-wave astronomy, potentially allowing us to probe times well before the formation of the cosmic microwave background and to test energy scales that are not accessible through any other means; see, e.g., [56].The problem of disentangling primordial backgrounds from astrophysical foregrounds is therefore important.The primordial background is likely to be Gaussian.In this section, we sketch out how the analysis could be extended to simultaneously measure a Gaussian background in the presence of a non-Gaussian foreground.For the sake of readability, we suppress frequency dependence as well as indices denoting segment number.The reader should consider both of these to be implied.
As a first step, we derive the likelihood for a purely Gaussian background.This derivation will be helpful in order to see how the result is generalized to include simultaneous Gaussian and non-Gaussian signals.The likelihood of obtaining strain data s given a persistent Gaussian signal The inner product, defined in Eq. 3, includes an implicit sum over detectors.Variables in bold-face are vectors with a different entry for each detector, e.g., G , ...).The strain induced in two detectors α and β is not in general the same, though, the two strains are related via the overlap reduction function γ αβ [57]: We do not have a template for h G because it is described by a stochastic process.Our prior for h G is where S h is the signal power-spectral density.Note that both S h and h G are implicit functions of frequency.(This frequency dependence may be described by additional parameters such as a spectral index.)The prior in Eq. 42 states that h G is a Gaussian field characterized by a power spectrum S h and with covariance described by Eq. 41.
Before proceeding further, we introduce a simplifying assumption: that the detector network consists of two co-located detectors.This assumption is by no means necessary, but it will facilitate straightforward comparison with [16].Employing this assumption, we can make the following substitutions Here, γ with no indices is the overlap reduction function for a co-located detector pair.For comparison with [16], we work for the time being with "effective power" S eff h ≡ γS h .
Next, following [16] (see their Section 4.2), we marginalize over h G in Eq. 42 to obtain a marginalized likelihood Following [16], we define P ≡ σ 2 .Remember that there is an implied sum over frequency bins in the expression s T C −1 s and that det(C) includes an implied product over frequency bins.After combining data from multiple segments, there is, additionally, an implied sum over segments in the expression s T C −1 s and an implicit product over segments in the expression det(C).
There are two terms in the exponential of Eq. 47: an auto-power term containing s 2 1 + s 2 2 and a cross-power term containing s * 1 s 2 .The auto-power terms are considered unreliable because a detection relying on auto-power would require a a precise noise budget.If the noise power spectral density includes a component that exceeds the noise budget, experimentalists assume it is an unmodeled noise, not a stochastic background.The typical solution is to start over with a likelihood constructed only out of cross-power [5]

L(s|S
This prescription ensures that auto-power from unknown noise does not create a false signal. The prescription in Eq.50 does not lend itself to our present purposes.Fortunately, there is a Bayesian approach that does.The notion that interferometer noise budgets are not trustworthy enough to detect excess auto-power can be framed in terms of a prior belief.We treat P as a parameter with a prior distribution π(P ) peaked at P 0 with some width σ P , which is wide compared to S h π(P ) ∝ exp(−(P − P 0 ) 2 /2σ 2 p ).
That is to say, we do not trust our measurement of the noise at a level comparable to the size of the stochastic signal power S h .Marginalizing over P , we obtain an evidence (marginalized likelihood) for each S eff h , which cannot be tricked by excess auto-power When σ P S eff h , the stochastic signal encoded in the auto-power is lost, and we recover something close to the cross-correlation likelihood; see Fig. 8.
In Fig. 8, we show likelihoods for a demonstration calculation with mock data.The cross-correlation likelihood (Eq.50, /-purple) and the marginalized likelihood from Eq. 52 (\-red) are nearly identical.Both are consistent with the injected value of S h .If we calculate the likelihood from Eq. 47, but do not marginalize over uncertainty in P , we obtain the green distribution.Since this distribution includes information from both cross-and auto-power, it is narrower than the other distributions.However, it is also biased because an error in P leads to an overestimate of S h .If we do not include a systematic error in P , the posterior peaks in the correct place.For the purposes of this sketch, we model uncertainty in the noise power spectral density with a simple expression for π(P ).We note in passing that significantly more sophisticated models are available in order to take into account, e.g., frequency-dependent artifacts such as instrumental lines [58].Now we are ready to sketch out a method to detect a background with both Gaussian and non-Gaussian components.The likelihood includes a Gaussian component h G and a non-Gaussian component h The non-Gaussian signal depends implicitly on binary parameters θ.The bold face indicates a vector with entries for different detectors.
Following the same reasoning we used to calculate the Gaussian likelihood in Eq. 47, the Gaussian + non- FIG.8: A comparison of likelihood and marginalized likelihood functions for stochastic power spectral density S h .We carry out a numerical experiment with toy-model data.We plot likelihood functions L(s|S h ) using cross-and auto-power (Eq.47, "Unmarginalized", green), using cross-power only (Eq.50, "Cross correlation", hatched purple), and using cross-and auto-power, but marginalizing over large uncertainty in the auto-power (Eq.52, "Marginalized", hatched red).
The purple is nearly indistinguishable from the red.The green "unmarginalized" distribution is narrower, but it is also biased because it does not take into account uncertainty in P .(If we do not include a systematic error in P , the posterior peaks in the correct place.)The horizontal axis is strain power in arbitrary units and the vertical axis is the likelihood.The black vertical line indicates the injected value.
Gaussian likelihood is By introducing a suitable prior π(P ) and marginalizing over uncertainty in the detector noise, it should be possible to simultaneously infer the existence of a population of unresolved binaries and a continuous Gaussian background.
Z(h NG , S h ) = dP π(P )L(s|h NG , S h ).(55) We note that, by marginalizing over uncertainty in P , we should not hamper our ability to measure binary signals.Individual binary signals contribute to our evidence by matching with waveform templates, not by inducing excess power.Indeed, parameterized noise models are a fixture of recent transient analysis [30].
Using the Gaussian + non-Gaussian likelihood function in Eq. 55, we can marginalize over the implicit binary parameters θ, to obtain signal and "noise" evidences analogous to Eqs. 8 and 9 Z S (S h ) = dθ Z(h NG , S h ) ( Z N (S h ) =Z(h NG = 0, S h ).(57) These evidences are functions of the hyper-parameter S h , which describes the Gaussian background.We refer to the "noise" evidence with quotation marks because it contains a Gaussian background signal.In order to calculate them in LALInference, one must implement the revised likelihood function (Eq.55) with a new parameter S h and a suitable prior π(P ).The next step is to introduce the duty cycle as in Eq. 5 as in Section II.
Retracing our steps, we obtain a posterior p(ξ, S h |{ s}) analogous to Eq. 11.We may convert S h into Ω G , the Gaussian energy density; see, e.g., [59] Ω where H 0 is the Hubble parameter.While significant work is required to go beyond this sketch and demonstrate this technique, we hope this proposal provides a useful outline to construct the optimal method to simultaneously detect Gaussian and non-Gaussian backgrounds.In particular, we expect the optimal method to improve upon various schemes in which astrophysical events are fit separately and then subtracted, see, e.g., [60].It is worth investigating as a promising tool for future efforts to measure primordial backgrounds.

X. CONCLUSIONS
Preliminary estimates suggest that advanced detectors, operating at design sensitivity, can detect a stochastic background from binary black holes in about one day.These estimates rely on extrapolation using Gaussian mixture modeling of our Bayesian evidence distributions.The next step is to carry out a mock data challenge in which we demonstrate the safety and efficacy of the search using ≈ 1 day of design sensitivity Monte Carlo data.Such a demonstration would allow us to verify the extrapolations made here with a modest computational cost ≈ 500K core hours.
We have highlighted new directions worthy of deeper investigation beyond the overview we provide here.It will be interesting to more fully develop this method for other audio-band sources of gravitational waves including binary neutron stars, continuous waves, unmodeled bursts, and glitches.The method does not appear to be helpful for pulsar timing.We look forward to demonstrating the simultaneous detection of a Gaussian background in the presence of a non-Gaussian foreground.
This formulation of binary black hole detection provides a unified framework for the analysis of both resolvable signals and a stochastic background of unresolvable signals.It is also a natural framework to carry out analysis of the population properties of binary black holes.Since the resolved and unresolved binaries are analyzed as a single dataset, it is possible to eliminate selection effects.The drop at 20 Hz is an artifact from our cut-off frequency.The orange curve is the LIGO design sensitivity noise curve.Right: energy density Ω gw (f ).Solid blue is for the dataset used in our mock dataset (z < 0.77) while dashed red is the same as solid blue, but including binaries out to arbitrarily high redshifts.The dashed red background is three times higher than the solid blue.The black curve is the 1 σ power-law sensitivity curve [59], indicating the sensitivity of the stochastic cross-correlation search assuming a LIGO Hanford-Livingston network operating at design sensitivity for one year.Since the red dashed curve is below the black sensitivity curve, the background is not detectable with cross correlation after one year.
FIG. 1: Histogram of the coherent network matchedfilter signal-to-noise ratio (Eq.31) for a dataset containing simulated signals + Gaussian noise.The network signal-to-noise ratio is calculated by maximizing over all astrophysical parameters.This is why the distribution falls off sharply near ρ network = 1; one of the many available templates tends to produce a reasonable fit to the data such that ρ network 1, even for weak signals.The signal distribution is generated from a population of mergers uniform in volume between (0.50 Gpc, 5 Gpc).Gold-plated detections with ρ network ≥ 12 are excluded.
FIG.3: Probability density functions for signal and noise evidences.The left-hand panel shows the fit for the signal dataset; right is for the noise dataset.The horizontal axis is the log signal evidence while the vertical axis is the log noise evidence.

FIG. 4 :FIG. 5 :
FIG. 4: Contours of average ln BF as a function of the simulated true duty cycle ξ true and number of 4 s segments N segs .The blue contour corresponds to ln BF = 8 where an astrophysical background is detectable.The color bar saturates at ln BF = 20).The horizontal and vertical lines correspond to ξ true = 4 × 10 −4 and N segs = 17, 500 respectively.
Unbiased duty cycle posterior for O1 background containing software injections with a duty cycle ξtrue = 0.09.The vertical line shows the true duty cycle.

FIG. 9 :
FIG.9: Left: amplitude spectral density [S h (f )] 1/2 .Blue is for the dataset used in our mock dataset (z < 0.77).The drop at 20 Hz is an artifact from our cut-off frequency.The orange curve is the LIGO design sensitivity noise curve.Right: energy density Ω gw (f ).Solid blue is for the dataset used in our mock dataset (z < 0.77) while dashed red is the same as solid blue, but including binaries out to arbitrarily high redshifts.The dashed red background is three times higher than the solid blue.The black curve is the 1 σ power-law sensitivity curve[59], indicating the sensitivity of the stochastic cross-correlation search assuming a LIGO Hanford-Livingston network operating at design sensitivity for one year.Since the red dashed curve is below the black sensitivity curve, the background is not detectable with cross correlation after one year.