Fast and Accurate Sensitivity Estimation for Continuous-Gravitational-Wave Searches
arXiv:1808.02459v1 [gr-qc] 7 Aug 2018

This paper presents an efficient numerical sensitivity-estimation method and implementation for continuous-gravitational-wave searches, extending and generalizing an earlier analytic approach by Wette [1]. This estimation framework applies to a broad class of F-statistic-based search methods, namely (i) semi-coherent StackSlide F-statistic (single-stage and hierarchical multi-stage), (ii) Hough number count on F-statistics, as well as (iii) Bayesian upper limits on (coherent or semi-coherent) F-statistic search results. We test this estimate against results from Monte-Carlo simulations assuming Gaussian noise. We find the agreement to be within a few % at high (i.e. low false-alarm) detection thresholds, with increasing deviations at decreasing (i.e. higher false-alarm) detection thresholds, which can be understood in terms of the approximations used in the estimate. We also provide an extensive summary of sensitivity depths achieved in past continuous-gravitational-wave searches (derived from the published upper limits). For the F-statistic-based searches where our sensitivity estimate is applicable, we find an average relative deviation from the published upper limits of less than 10%, which in most cases is within the systematic uncertainty about the noise-floor estimate used in the published upper limits.


INTRODUCTION
The recent detections of gravitational waves from merging binary-black-hole and double neutron-star systems [2][3][4] have opened a whole new observational window for astronomy, allowing for new tests of general relativity [5], new constraints on neutron star physics [6] and new measurements of the Hubble constant [7], to mention just a few highlights.
Continuous gravitational waves (CWs) from spinning non-axisymmetric neutron stars represent a different class of potentially-observable signals [8,9], which have yet to be detected [10]. These signals are expected to be long-lasting (at least several days to years) and quasi-monochromatic, with slowly changing (intrinsic) frequency. The signal amplitude depends on the rich (and largely not yet well-understood) internal physics of neutron stars [11], as well as their population characteristics [12,13]. A detection (and even non-detection) of CWs could therefore help us better understand these fascinating astrophysical objects, and may allow for new tests of general relativity [14,15].

Overview of search categories
We can categorize CW searches in two different ways: either based on the search method, or on the type of explored parameter space.
The search methods fall into two broad categories: coherent and semi-coherent (sometimes also referred to as incoherent). Roughly speaking, a coherent search is based on signal templates with coherent phase evolution over the whole observation time, while semi-coherent searches typically break the data into shorter coherent segments and combine the resulting statistics from these segments incoherently (i.e. without requiring a consistent phase evolution across segments). However, there are many different approaches and variations, which are beyond the scope of this paper; see e.g. [10] for a more detailed overview. Here we will exclusively focus on coherent and semi-coherent methods based on the F-statistic, which will be introduced in Sec. II.
Coherent search methods are the more sensitive in principle, but in practice they usually suffer from severe computing-cost limitations: for finite search parameter spaces the required number of signal templates typically grows as a steep power-law of the observation time, making such searches infeasible except when the search region is sufficiently small. For larger signal parameter spaces the observation time needs to be kept short enough for the search to be computationally feasible, which limits the attainable coherent sensitivity. This is where semi-coherent searches tend to yield substantially better sensitivity at fixed computing cost (e.g. see [16,17]).
Based on the explored parameter space, we distinguish the following search categories (referencing a recent example for each case): (i) Targeted searches for known pulsars [18] assume a perfect fixed relationship between the observed neutron-star spin frequency and the CW emission frequency. Therefore one only needs to search a single point in parameter space for each pulsar, allowing for optimal fully-coherent searches [19].
(ii) Narrow-band searches for known pulsars assume a small uncertainty in the relationship between the CW frequency and the measured pulsar spin rates. This finite search parameter space requires a template bank with (typically) many millions of templates, while still allowing for optimal fully-coherent search methods to be used [20].
(iii) Directed (isolated) searches aim at isolated neutron stars with known sky-position and unknown spin frequency. The search parameter space covers the unknown frequency and spindowns of the neutron star signal within an astrophysically-motivated range [21,22].
(iv) (Directed) binary searches aim at binary systems with known sky-position and parameter-space uncertainties in the frequency and binary-orbital parameters. Typically these sources would be in low-mass X-ray binaries, with the most prominent example being Scorpius X-1 (Sco X-1) [23,24].
(v) All-sky (isolated) searches cover the whole sky over a large frequency (and spindown) band for unknown isolated neutron stars [25,26].
(vi) All-sky binary searches are the most extreme case, covering the whole sky for unknown neutron stars in binary systems [27,28].

Sensitivity estimation
In this work we use the term sensitivity to refer to the upper limit on the signal amplitude h_0 (or equivalently the sensitivity depth D ≡ √S_n/h_0, see Sec. II E). This can be either the frequentist upper limit for a given detection probability at a fixed false-alarm level (p-value), or the Bayesian upper limit at a given credible level for the given data.
Sensitivity therefore only captures one aspect of a search, namely how "deep" into the noise-floor it can detect signals, without accounting for how "wide" a region in parameter space is covered, how much prior weight this region contains, or how robust the search is to deviations from the signal model. Comparing sensitivity depth therefore only makes sense for searches over very similar parameter spaces. A more complete measure characterizing searches would be their respective detection probability, which folds in sensitivity depth, breadth in parameter space, as well as the prior weight contained in that space [29,30].
However, it is often useful to be able to reliably and cheaply estimate the sensitivity of a search setup without needing expensive Monte-Carlo simulations:

• In order to determine optimal search parameters for a semi-coherent search (i.e. the number and length of semi-coherent segments and the template-bank mismatch parameters), it is important to be able to quickly assess the projected sensitivity for any given search-parameter combination (e.g. see [17,[29][30][31]).
• For setting upper limits for a given search, one typically has to repeatedly add software-generated CW signals to the data and perform a search, in order to measure how often these signals are recovered above a given threshold. By iterating this procedure one tries to find the weakest signal amplitude that can be recovered at the desired detection probability (or "confidence"). This can be very computationally expensive, and a quick and reasonably-reliable estimate for the expected upper-limit amplitude can therefore substantially cut down on the cost of this iterative process, which can also improve the accuracy of the upper limit.
• The estimate can also serve as a sanity check for determining upper limits.
A number of theoretical sensitivity estimates have been developed over the past decades. One of the first estimates was obtained for a coherent F-statistic search [32], yielding

    h_0^{90%} ≈ 11.4 √( S_n / T_data )    (1)

for a 90% confidence upper limit at 1% false-alarm (per template), where S_n denotes the (single-sided) noise power spectral density, and T_data is the total amount of data. This was later generalized to the semi-coherent Hough [33] and StackSlide method [34,35], yielding an expression of the form

    h_0^{90%} ∝ N_seg^{1/4} √( S_n / T_data )    (2)

for the same confidence and false-alarm level as Eq. (1), and where N_seg denotes the number of semi-coherent segments. These latter results suggested the inaccurate idea that the sensitivity of semi-coherent searches follows an exact N_seg^{1/4} scaling. However, this was later shown [1,17] to not be generally a good approximation except asymptotically in the limit of a large number of segments (N_seg ≳ 100 − 1000).
Furthermore, these past sensitivity estimates relied on the assumption of a "constant signal-to-noise ratio (SNR)" population of signals. While this approximation substantially simplifies the problem, it introduces a noticeable bias into the estimate, as discussed in more detail in [1]. For example, the constant-SNR bias combined with the incorrect N_seg^{1/4} scaling in Eq. (2) would result in an overestimate by a factor of two of the sensitivity of the first Einstein@Home search on LIGO S5 data [36].
These limitations of previous sensitivity estimates were eventually overcome by the analytic sensitivity-estimation method developed by Wette [1] for semi-coherent StackSlide F-statistic searches. In this work we simplify and extend this framework by employing a simpler direct numerical implementation. This further improves the estimation accuracy by requiring fewer approximations. It also allows us to generalize the framework to multi-stage hierarchical StackSlide-F searches, Hough-F searches (such as [36]), as well as to Bayesian upper limits based on F-statistic searches.

Plan of this paper
Sec. II provides a description of the CW signal model and introduces different F-statistic-based search methods. In Sec. III we present the sensitivity-estimation framework and its implementation, for both frequentist and Bayesian upper limits. Section IV discusses how (frequentist) upper limits are typically measured using Monte-Carlo injection-recovery simulations. Section V provides comparisons of our sensitivity estimates to simulated upper limits in Gaussian noise, while in Sec. VI we provide a comprehensive summary of published sensitivities of past CW searches (translated into sensitivity depth), and a comparison to our sensitivity estimates where applicable. We summarize and discuss the results in Sec. VII. Further details on the referenced searches and upper limits are given in appendix A. More technical details on the signal model can be found in appendix B. Finally, appendix C contains a discussion of the distribution of the maximum F-statistic over correlated templates.

II. F-STATISTIC-BASED SEARCH METHODS
This section provides an overview of the F-statistic-based search methods for which sensitivity estimates are derived in Sec. III. Further technical details about the signal model and the F-statistic are given in appendix B. For a broader review of the CW signal model, assumptions and search methods, see for example [8][9][10].

A. Signal model

For the purpose of sensitivity estimation we assume the data timeseries x^X(t) from each detector X to be described by Gaussian noise, i.e. n^X(t) ∼ Gauss(0, S^X_n) with zero mean and (single-sided) power-spectral density (PSD) S^X_n. A gravitational-wave signal creates an additional strain h^X(t) in the detector, resulting in a time-series

    x^X(t) = n^X(t) + h^X(t).    (3)

For continuous gravitational waves the two polarization components can be written as

    h_+(τ) = A_+ cos φ(τ),    h_×(τ) = A_× sin φ(τ),    (4)

where φ(τ) describes the phase evolution of the signal in the source frame. For the typical quasi-periodic signals expected from rotating neutron stars, this can be expressed as a Taylor series expansion around a chosen reference time (here τ_ref = 0 for simplicity) as

    φ(τ) = φ_0 + 2π Σ_{k≥0} f^(k) τ^{k+1} / (k+1)!,    (5)

in terms of derivatives of the slowly-varying intrinsic CW frequency f(τ). For a triaxial neutron star spinning about a principal axis, the two polarization amplitudes are given by

    A_+ = (h_0/2) (1 + cos²ι),    A_× = h_0 cos ι,    (6)

in terms of the angle ι between the line of sight and the neutron star rotation axis and the overall signal amplitude h_0. This definition uses the common gauge condition of A_+ ≥ |A_×|. After translating the source-frame signal into the detector frame (see appendix B for details), the strain signal h^X(t) at each detector X can be expressed in the factored form

    h^X(t; A, λ) = A^µ h^X_µ(t; λ),  with implicit summation over µ = 1, ..., 4,    (7)

which was first shown in [37], and where the four amplitudes A^µ depend on the amplitude parameters {h_0, cos ι, ψ, φ_0} as given in Eq. (B5). The four basis functions h^X_µ(t; λ), which are given explicitly in Eq. (B6), depend on the phase-evolution parameters λ = {n̂, f, ḟ, ...}, namely sky position n̂, frequency f and its derivatives f^(k) ≡ d^k f/dτ^k |_{τ_ref}, and binary-orbital parameters in the case of a neutron star in a binary.

B. Coherent F-statistic
For pure Gaussian-noise timeseries {n^X(t)} in all detectors X, the likelihood can be written as (e.g. see [38][39][40])

    P(x | h_0 = 0) ∝ e^{−(1/2) ⟨x, x⟩},    (8)

in terms of the multi-detector scalar product

    ⟨x, y⟩ ≡ 4 Re Σ_X ∫_0^∞ df  x̃^X(f) ỹ^{X*}(f) / S^X_n(f),    (9)

where x̃(f) denotes the Fourier transform of x(t), and x* denotes complex conjugation of x. Using the additivity of noise and signals (cf. Eq. (3)), we can express the likelihood for data x, assuming Gaussian noise plus a signal h(A, λ), as

    P(x | A, λ) ∝ e^{−(1/2) ⟨x − h, x − h⟩}.    (10)

From this we obtain the log-likelihood ratio between the signal and noise hypotheses as

    ln Λ(x; A, λ) = ⟨x, h⟩ − (1/2) ⟨h, h⟩.    (11)

Analytically maximizing the log-likelihood ratio over A (c.f. appendix B) yields the F-statistic [37]:

    F(x; λ) ≡ max_A ln Λ(x; A, λ).    (12)

The statistic 2F follows a χ²-distribution with four degrees of freedom and non-centrality ρ², i.e.

    2F(x; λ) ∼ χ²_4(ρ²),    (13)

with expectation and variance

    E[2F] = 4 + ρ²,    Var[2F] = 2 (4 + 2ρ²),    (14)

where ρ corresponds to the signal-to-noise ratio (SNR) of coherent matched filtering.
In the perfect-match case, where the template phase-evolution parameters λ coincide with the parameters λ_s of a signal in the data x, the SNR can be explicitly expressed as

    ρ_0² = (h_0² / S_n) T_data R²(θ),    (15)

where T_data is the total amount (measured as time) of data over all detectors, and S_n denotes the multi-detector noise floor, defined as the harmonic mean over the per-detector PSDs S^X_n, namely

    S_n ≡ N_det ( Σ_X 1/S^X_n )^{−1}.    (16)

Note that in practice the F-statistic-based search implementations do not assume stationary noise over the whole observation time, but only over short durations of order T_SFT ∼ 30 mins, corresponding to the length of the Short Fourier Transforms (SFTs) that are typically used as input data. The present formalism can straightforwardly be extended to this case [41], where the relevant overall multi-detector noise-PSD definition S_n generalizes as the harmonic mean over all SFTs, namely

    S_n ≡ N_SFT ( Σ_α 1/S^α_n )^{−1},    (17)

where α is an index enumerating all SFTs (over all detectors), and S^α_n is the corresponding noise PSD estimated for SFT α.
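As a concrete illustration of the harmonic-mean noise-floor definition of Eq. (17), here is a minimal Python sketch; the PSD values are made-up placeholders, not from any real detector:

```python
def harmonic_mean_psd(psd_values):
    """Overall noise floor S_n as the harmonic mean over per-SFT
    (or per-detector) PSD estimates S_n^alpha, cf. Eq. (17)."""
    n = len(psd_values)
    return n / sum(1.0 / s for s in psd_values)

# Toy example: three SFTs at 1e-46 and one noisier SFT at 4e-46;
# the harmonic mean is pulled towards the quieter SFTs, unlike an
# arithmetic mean which would be dominated by the loud one.
S_n = harmonic_mean_psd([1e-46, 1e-46, 1e-46, 4e-46])
```

This weighting is what makes the harmonic mean the natural noise-floor definition for matched filtering: each SFT contributes inversely proportional to its noise power.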
The response function R(θ) (following the definition in [1]) depends on the subset of signal parameters

    θ ≡ {n̂, cos ι, ψ},    (18)

and can be explicitly expressed [42] as

    R²(θ) = α_+² (A cos²2ψ + B sin²2ψ + C sin 4ψ) + α_ײ (A sin²2ψ + B cos²2ψ − C sin 4ψ),    (19)

with the sky-dependent antenna-pattern coefficients {A, B, C} of Eq. (B10), and

    α_+ ≡ (1 + cos²ι)/2,    α_× ≡ cos ι.    (20)

One can show that R² averaged over ψ ∈ [−π/4, π/4] and cos ι ∈ [−1, 1] yields

    ⟨R²⟩_{cos ι, ψ} = (2/5) (A + B),    (21)

and further averaging n̂ isotropically over the sky yields

    ⟨R²⟩_θ = 4/25.    (22)

Using this with Eq. (15) we can therefore recover the sky- and polarization-averaged squared-SNR expression (e.g. see [37]):

    ⟨ρ_0²⟩_θ = (4/25) (h_0² / S_n) T_data.    (23)

C. Semi-coherent F-statistic methods

Semi-coherent methods [16] typically divide the data into N_seg shorter segments of duration T_seg < T_obs. The segments are analyzed coherently, and the per-segment detection statistics are combined incoherently. Generally this yields lower sensitivity for the same amount of data analyzed than a fully-coherent search. However, the computational cost for a fully-coherent search over the same amount of data is often impossibly large, while the semi-coherent cost can be tuned to be affordable, and a semi-coherent search typically ends up being more sensitive at fixed computing cost [16,17,43].
There are a number of different semi-coherent methods currently in use, such as PowerFlux, Frequency-Hough, SkyHough, TwoSpect, CrossCorr, Viterbi, Sideband, loosely-coherent statistics and others (e.g. see [10] and references therein). Many of these methods work on short segments, typically of length T seg ∼ 30 min, and use Fourier power in the frequency bins of these Short Fourier Transforms (SFTs) as the coherent base statistic.
In this work we focus exclusively on sensitivity estimation of F-statistic-based methods, namely StackSlide-F (e.g. see [17]) and Hough-F introduced in [33]. Here the length of segments is only constrained by the available computing cost, and segments will typically span many hours to days, which yields better sensitivity, but also requires higher computational cost. Therefore, many of the computationally expensive semi-coherent F-statistic searches are run on the distributed Einstein@Home computing platform [44].
Note that these methods are not to be confused with the (albeit closely related) "classical" StackSlide and Hough methods, which use SFTs as coherent segments, as described for example in [35].

1. StackSlide-F: summing F-statistics
The StackSlide-F method uses the sum of the coherent per-segment F̂-statistic values in a given parameter-space point λ as the detection statistic, namely

    2F(x; λ) ≡ Σ_{ℓ=1}^{N_seg} 2F̂_ℓ(x; λ),    (26)

where F̂_ℓ is the coherent F-statistic of Eq. (12) computed on segment ℓ. This statistic follows a χ²-distribution with 4N_seg degrees of freedom and non-centrality ρ², i.e.

    2F(x; λ) ∼ χ²_{4N_seg}(ρ²),    (27)
where the non-centrality ρ² is identical to the expression for the coherent squared SNR of Eq. (15), with T_data referring to the whole data set used, and S_n the corresponding noise floor. However, the non-centrality in the semi-coherent case cannot be considered a "signal-to-noise ratio", due to the larger N_seg-dependent number of degrees of freedom of the χ²-distribution compared to Eq. (13), which increases the false-alarm level at fixed threshold and reduces the "effective" semi-coherent SNR to ŜNR² = ρ²/√N_seg (e.g. see [Eq. (14)] in [45]).
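The distributional statement above is easy to verify numerically. The following stdlib-only Monte-Carlo sketch (all numbers are illustrative) sums per-segment non-central χ²_4 draws and checks that the summed statistic has mean 4 N_seg + ρ², with the non-centrality split over segments:

```python
import math, random

random.seed(1)

def sample_2F(rho2, dof=4):
    """One draw of a non-central chi^2 variable with `dof` degrees of
    freedom and non-centrality rho2 (a per-segment 2F-hat value)."""
    s = random.gauss(math.sqrt(rho2), 1.0) ** 2   # carries the non-centrality
    for _ in range(dof - 1):
        s += random.gauss(0.0, 1.0) ** 2
    return s

N_seg, rho2_tot = 10, 36.0
rho2_seg = rho2_tot / N_seg        # non-centrality split evenly over segments
N_mc = 20000
mean_2F = sum(sum(sample_2F(rho2_seg) for _ in range(N_seg))
              for _ in range(N_mc)) / N_mc
# theoretical expectation: 4*N_seg + rho2_tot = 76
```

The same sampler with N_seg = 1 reproduces the coherent case of Eq. (13).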
The expectation and variance for 2F are

    E[2F] = 4N_seg + ρ²,    Var[2F] = 2 (4N_seg + 2ρ²).    (28)

We note that in practice StackSlide-F searches often quote the average F̄ over segments instead of the sum F, i.e.

    2F̄ ≡ (1/N_seg) Σ_{ℓ=1}^{N_seg} 2F̂_ℓ = 2F / N_seg.    (29)
2. Hough-F: summing threshold crossings

The Hough-F method [33] sets a threshold 2F̂_th on the per-segment coherent F̂-statistics and uses the number of threshold crossings over segments as the detection statistic, the so-called Hough number count n_c, i.e.

    n_c ≡ Σ_{ℓ=1}^{N_seg} Θ( 2F̂_ℓ − 2F̂_th ),    (30)
where Θ(x) is the Heaviside step function.

D. Mismatch and template banks
In wide-parameter-space searches the unknown signal parameters λ ∈ P are assumed to fall somewhere within a given search space P. In this case one needs to compute a statistic (such as those defined in the previous sections) over a whole "bank" of templates T ≡ {λ_i}_{i=1}^N. This template bank has to be chosen in such a way that any putative signal λ_s ∈ P would suffer only an acceptable level of loss of SNR. This is typically quantified in terms of the so-called mismatch µ, defined as the relative loss of ρ²(λ_s; λ) in a template λ with respect to the perfect-match ρ²(λ_s; λ_s) = ρ_0² (of Eq. (15)), namely

    µ(λ_s; λ) ≡ 1 − ρ²(λ_s; λ) / ρ_0²,  with  µ ∈ [0, 1].    (31)

We can therefore express the "effective" non-centrality parameter ρ_eff² in a template point λ in the F-statistic χ²-distribution of Eqs. (13), (27) as

    ρ_eff² = ρ_0² (1 − µ).    (32)
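In code, the mismatch-induced reduction of the non-centrality is a one-line helper (numbers are illustrative only):

```python
def effective_noncentrality(rho2_0, mu):
    """Effective non-centrality rho_eff^2 = rho_0^2 * (1 - mu) for a
    template with mismatch mu relative to the perfect-match value."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mismatch mu must lie in [0, 1]")
    return rho2_0 * (1.0 - mu)

# A 10% mismatch reduces a perfect-match rho_0^2 = 50 to 45:
rho2_eff = effective_noncentrality(50.0, 0.1)
```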

E. Sensitivity Depth
The F-statistic non-centrality parameter ρ² depends on the signal amplitude h_0 and the overall noise floor S_n (cf. Eq. (17)) only through the combination h_0/√S_n, as seen in Eq. (15). The sensitivity of a search is therefore most naturally characterized in terms of the so-called sensitivity depth [46], defined as

    D ≡ √S_n / h_0,    (33)

in terms of the overall noise PSD S_n defined as the harmonic mean over all SFTs used in the search, see Eq. (17). A particular choice of search parameters (N_seg, T_data, template bank) will generally yield a frequency-dependent upper limit h_0(f), due to the frequency-dependent noise floor S_n(f). However, for fixed search parameters this will correspond to a constant sensitivity depth D, which is therefore often a more practical and natural way to characterize the performance of a search, independently of the noise floor.
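Converting between amplitude upper limits and sensitivity depth is then trivial; a minimal sketch (the numerical values are toy numbers, not from any real search):

```python
def sensitivity_depth(sqrt_Sn, h0):
    """Sensitivity depth D = sqrt(S_n)/h_0, conventionally quoted
    in units of 1/sqrt(Hz)."""
    return sqrt_Sn / h0

def h0_from_depth(depth, sqrt_Sn):
    """Frequency-dependent upper limit h_0(f) implied by a constant
    depth D and the noise amplitude sqrt(S_n)(f) at that frequency."""
    return sqrt_Sn / depth

# Toy numbers: sqrt(S_n) = 4e-24 /sqrt(Hz) and h_0 = 8e-26 give D = 50:
D = sensitivity_depth(4e-24, 8e-26)
```

Because D is constant for fixed search parameters, quoting a single depth summarizes the whole frequency-dependent upper-limit curve h_0(f).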

III. SENSITIVITY ESTIMATE
As discussed in more detail in the introduction, by sensitivity we mean the (measured or expected) upper limit on h 0 for a given search (or equivalently, the sensitivity depth D = √ S n /h 0 ), which can either refer to the frequentist or Bayesian upper limit.

A. Frequentist upper limits
The frequentist upper limit is defined as the weakest signal amplitude h_0 that can be detected at a given detection probability p_det [2] (typically chosen as 90% or 95%) above a threshold d_th on a statistic d(x). The threshold can be chosen as the loudest candidate obtained in the search, or it can be set corresponding to a desired false-alarm level p_fa (or p-value), defined as

    p_fa(d_th) ≡ P( d > d_th | h_0 = 0 ),    (34)

which can be inverted to yield d_th = d_th(p_fa). The detection probability for signals of amplitude h_0 is

    p_det(d_th; h_0) ≡ P( d > d_th | h_0 ),    (35)

which can be inverted to yield the upper limit h_0(d_th, p_det). We can write p_fa(d_th) = p_det(d_th; h_0 = 0), and so we can express both in terms of the general threshold-crossing probability as

    P( d > d_th | h_0 ) = ∫_{d_th}^∞ P(d | h_0) dd.    (36)

B. Approximating wide-parameter-space statistics

As discussed in Sec. II D, a wide parameter-space search for unknown signals λ ∈ P typically proceeds by computing a (single-template) statistic over a bank of templates T ≡ {λ_i}_{i=1}^N covering the parameter space P. This results in a corresponding set of (single-template) statistic values {d_1(x; λ_i)}, which need to be combined to form the overall wide-parameter-space statistic d(x). This would naturally be obtained via marginalization (i.e. integrating the likelihood over P), but in practice is mostly done by maximizing the single-template statistic over T, i.e.

    d*(x) ≡ max_{λ ∈ T} d_1(x; λ).    (37)
[2] Or equivalently, at a given false-dismissal probability p_fd = 1 − p_det.

1. Noise case: estimating the p-value p_fa

For the pure noise case of Eq. (34), it is difficult to determine a reliable expression for P(d* | h_0 = 0), even if the single-template statistic P(d_1 | h_0 = 0) follows a known distribution (such as for the F-based statistics discussed in Sec. II). The reason for this difficulty lies in the correlations that generally exist between "nearby" templates λ_i ∈ T.
If all N templates were strictly uncorrelated, one could use the well-known expression Eq. (C1) [1,47] for the distribution of the maximum. In this case one can also relate the single-trial p-value p¹_fa ≈ p_fa/N to the wide-parameter-space p-value p_fa (for p¹_fa ≪ 1). Although it is a common assumption in the literature, template correlations do not simply modify the "effective" number of independent templates to use in Eq. (C1); they generally also affect the functional form of the underlying distribution for the maximum d*, as illustrated in appendix C with a simple toy model.
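For the idealized uncorrelated case, the relation between single-template and wide-parameter-space p-values can be sketched as follows (this illustrates the Eq. (C1)-type relation only; as noted above, real template correlations change both the effective N and the functional form):

```python
def pfa_of_max(p1, N):
    """False-alarm probability of the maximum over N strictly
    independent templates with single-template p-value p1."""
    return 1.0 - (1.0 - p1) ** N

# In the regime N * p1 << 1 this reduces to p_fa ~ N * p1:
p_exact = pfa_of_max(1e-9, 10000)   # close to 1e-5
```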
In this work we assume that the upper limit refers to a known detection threshold in Eq. (35). This can be obtained either from (i) the loudest observed candidate (the most common situation in real searches), from (ii) setting a single-template p-value p¹_fa and inverting the known single-template distribution Eq. (34), or from (iii) a numerically-obtained relation between p_fa and the threshold d_th, e.g. via Monte-Carlo simulation.
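Option (iii) can be sketched with a small stdlib Monte Carlo: draw the loudest 2F over a (here assumed uncorrelated, purely illustrative) template bank in pure noise, repeat, and read off the desired quantile. The bank size, trial count, and target p_fa below are invented for illustration:

```python
import random

random.seed(42)

def chi2_4():
    """One central chi^2 draw with 4 degrees of freedom (2F in noise)."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(4))

def threshold_from_pfa(p_fa, n_templates, n_trials=2000):
    """Estimate the threshold d_th with false-alarm probability p_fa
    as the (1 - p_fa) quantile of the loudest-candidate distribution."""
    maxima = sorted(max(chi2_4() for _ in range(n_templates))
                    for _ in range(n_trials))
    return maxima[int((1.0 - p_fa) * n_trials)]

d_th = threshold_from_pfa(0.1, n_templates=100)
```

A real application would replace the independent-templates sampler with searches over noise realizations (or off-source data), which automatically folds in the template correlations discussed above.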

Signal case: estimating the detection probability p det
In the signal case it is easier to estimate the maximum statistic d*(x) over the full template bank T, provided we can assume that the highest value of d_1 will be realized near the signal location. This should hold as long as the p-value p_fa is low (typically p_fa ≲ 1%) and the signals have relatively high detection probability (typically p_det ∼ 90% or 95%). This will typically be a good approximation, but in Sec. V we will also encounter situations where deviations from the predictions can be traced to violations of these assumptions. We therefore approximate

    d*(x) ≈ d_1(x; λ*),    (38)

where λ* is the "closest" template ∈ T to the signal location λ_s, defined in terms of the metric Eq. (31), namely the template with the smallest mismatch µ from the signal. This template yields the highest effective non-centrality parameter over the template bank, namely

    ρ_eff²(λ*) = ρ_0² ( 1 − µ(λ_s; λ*) ).    (39)

C. StackSlide-F sensitivity

We first consider a semi-coherent StackSlide-F search using the summed F-statistic of Eq. (26), i.e. d_1(x; λ) = 2F(x; λ). This case also includes fully-coherent F-statistic searches, which simply correspond to the special case N_seg = 1.
We see from Eq. (36) that in order to estimate the sensitivity, we need to know P(2F | h_0). This can be obtained via marginalization (at fixed h_0) of the known distribution P(2F | ρ²) of Eq. (27), combined with the assumption Eq. (39) that the highest statistic value will occur in the "closest" template, with mismatch distribution P(µ):

    P(2F | h_0) = ∫ dθ P(θ) ∫ dµ P(µ)  P( 2F | ρ_eff²(h_0, θ, µ) ),    (40)

where ρ_eff²(h_0, θ, µ) = ρ_0²(h_0, θ) (1 − µ) in terms of the perfect-match non-centrality ρ_0² defined in Eq. (15), and in the last step we used the fact that the distribution for 2F is fully specified in terms of the non-centrality parameter ρ² of the χ²-distribution with 4N_seg degrees of freedom, as given in Eq. (27).
Equation (40) requires five-dimensional integration for each sensitivity estimation, which would be slow and cumbersome. One of the key insights in [1] was to notice that the perfect-match SNR ρ_0 of Eq. (15) depends on the four parameters θ only through the scalar R²(θ), and we can therefore use a reparametrization

    P(R²) ≡ ∫_{θ(R²)} P(θ) dθ,    (41)

where θ(R²) denotes the subspace of θ values yielding a particular R² from Eq. (19). The one-dimensional distribution P(R²) can be obtained by Monte-Carlo sampling over the priors of sky position n̂ (typically either isotropically over the whole sky, or a single sky position in the case of a directed search) and polarization angles cos ι (uniform in [−1, 1]) and ψ (uniform in [−π/4, π/4]). The resulting values of R²(θ) are histogrammed and used as an approximation for P(R²), which can be reused for repeated sensitivity estimations with the same θ-priors. We can therefore rewrite Eq. (40) as

    P(2F | h_0) = ∫ dR² P(R²) ∫ dµ P(µ)  P( 2F | ρ_eff²(h_0, R², µ) ),    (42)

with

    ρ_0²(h_0, R²) = (h_0² / S_n) T_data R²,    (43)

and the effective non-centrality

    ρ_eff²(h_0, R², µ) = ρ_0²(h_0, R²) (1 − µ).    (44)

The mismatch distribution P(µ) for any given search will typically be obtained via injection-recovery Monte-Carlo simulation, where signals are repeatedly generated (without noise) and searched for over the template bank, obtaining the corresponding mismatch µ for each injection. This is often a common step in validating a search and template-bank setup. Alternatively, for some search methods pre-computed estimates for the mismatch distributions exist as a function of the template-bank parameters, e.g. for the Weave search code [48]. Inserting Eq. (42) into the detection probability of Eq. (36), we obtain

    p_det(h_0; 2F_th) = ∫ dR² P(R²) ∫ dµ P(µ)  p_det( 2F_th; ρ_eff² ),    (45)

where

    p_det( 2F_th; ρ_eff² ) ≡ ∫_{2F_th}^∞ P( 2F | ρ_eff² ) d(2F).    (46)

Equation (45) can be easily and efficiently computed numerically, and simple inversion (via 1-D root-finding) yields the sensitivity (i.e. upper limit) h_0 for given detection probability p_det and threshold 2F_th.
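The computation of Eq. (45) and its 1-D inversion can be sketched compactly with a stdlib Monte Carlo. Everything numeric here is an invented placeholder (the stand-ins for the P(R²) histogram and the mismatch distribution P(µ), the threshold, and the data span); the proper distributions come from the actual search setup. Pre-drawing the h_0-independent randomness makes p_det a smooth function of the depth, so plain bisection suffices for the inversion:

```python
import math, random

random.seed(7)

# Illustrative placeholders -- not from any real search setup:
N_seg    = 10
dof      = 4 * N_seg
two_F_th = 80.0                      # assumed detection threshold on 2F
T_data   = 100 * 86400.0             # 100 days of data (toy value)
N_mc     = 4000

# Pre-draw all h0-independent randomness once:
samples = []
for _ in range(N_mc):
    R2   = random.uniform(0.01, 0.08)   # stand-in for the P(R^2) histogram
    mu   = random.uniform(0.0, 0.3)     # stand-in for the mismatch P(mu)
    z0   = random.gauss(0.0, 1.0)       # component carrying non-centrality
    rest = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dof - 1))
    samples.append((R2, mu, z0, rest))

def p_det(depth):
    """Detection probability at depth D = sqrt(S_n)/h_0, marginalized
    over R^2 and mismatch (a Monte-Carlo version of Eq. (45))."""
    n_det = 0
    for R2, mu, z0, rest in samples:
        rho = math.sqrt(T_data / depth ** 2 * R2 * (1.0 - mu))
        if (z0 + rho) ** 2 + rest > two_F_th:
            n_det += 1
    return n_det / len(samples)

def depth_at(p_target=0.90, lo=1.0, hi=1e4, iters=50):
    """Invert p_det(D) = p_target by bisection (p_det falls with D)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if p_det(mid) > p_target else (lo, mid)
    return 0.5 * (lo + hi)

D90 = depth_at(0.90)   # estimated 90%-confidence sensitivity depth
```

A production implementation would instead evaluate the inner non-central χ² survival function semi-analytically and sum over the P(R²) and P(µ) histogram bins, which is both faster and free of Monte-Carlo noise.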

D. Multi-stage StackSlide-F sensitivity
The sensitivity estimate for a single StackSlide-F search can be generalized to hierarchical multi-stage searches, where threshold-crossing candidates of one search stage are followed up by deeper subsequent searches in order to increase the overall sensitivity (e.g. see [16,26,43,49,50]). We denote the n stages with an index i = 1...n. Each stage i is characterized by its number N^(i)_seg of segments, its template-bank mismatch distribution P(µ^(i)), and its detection threshold 2F^(i)_th. The initial wide-parameter-space search (stage i = 1) yields candidates that cross the threshold 2F^(1)_th in certain templates {λ}. The next stage follows up these candidates with a more sensitive search, which can be achieved by reducing the mismatch µ^(i) (choosing a finer template-bank grid), or by increasing the coherent segment length (and reducing the number of segments N^(i)_seg). Often the final stage i = n in such a follow-up hierarchy would be fully coherent, i.e. N^(n)_seg = 1. In order for any given candidate (which can be either due to noise or a signal) to cross the final threshold 2F^(n)_th, it has to cross all previous thresholds as well; in other words, Eqs. (34), (35) now generalize to

    p^(tot)_{fa/det} = P( d^(1) > d^(1)_th, ..., d^(n) > d^(n)_th | h_0 ).    (47)

In order to make progress at this point we need to assume that the threshold-crossing probabilities in different stages are independent of each other, so for j ≠ i we assume

    P( d^(i), d^(j) | h_0 ) ≈ P( d^(i) | h_0 ) P( d^(j) | h_0 ),    (48)

which would be exactly true if the different stages used different data (see also [43]). In the case where the same data is used in different stages, this approximation corresponds to an uninformative approach, in the sense that we do not know how to quantify and take into account the correlations between the statistics in different stages. We proceed without using this potential information, which could in principle be used to improve the estimate. It is not clear if and how much of an overall bias this approximation would introduce. A detailed study of this question is beyond the scope of this work and will be left for future study.
Using the assumption of independent stages we write

    p^(tot)_det(h_0) = ∫ dR² P(R²) ∏_{i=1}^{n} p^(i)_det(h_0; R²),    (49)

where now the R²-marginalization needs to happen over all stages combined, as the signal parameters R²(θ) are intrinsic to the signal and therefore independent of the stage. On the other hand, the mismatch distribution P(µ^(i)) depends on the stage, as each stage will typically use a different template grid, and so we have

    p^(i)_det(h_0; R²) = ∫ dµ^(i) P(µ^(i))  p_det( 2F^(i)_th; ρ_eff²(h_0, R², µ^(i)) ),    (50)

where p_det(2F_th; ρ_eff²) is given by Eq. (46) using the respective per-stage values.
Equation (49) can easily be solved numerically and inverted for the sensitivity h_0 at given p^(tot)_det. Note that in practice one would typically [50] want to choose the thresholds in such a way that a signal that passed the 1st-stage threshold 2F^(1)_th should have a very low probability of being discarded by subsequent stages, in other words p^(i>1)_det ≈ 1, and therefore p^(tot)_det ≈ p^(1)_det(2F^(1)_th; h_0). Therefore subsequent stages mostly serve to reduce the total false-alarm level p^(tot)_fa, allowing one to increase the first-stage p^(1)_fa by lowering the corresponding threshold 2F^(1)_th, resulting in an overall increased sensitivity.
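Under the independent-stages approximation, the bookkeeping at fixed signal parameters is just a product of per-stage detection probabilities. A trivial sketch with invented per-stage numbers (a real estimate would evaluate each factor via the per-stage marginalization above):

```python
def total_pdet(per_stage):
    """Total detection probability across a follow-up hierarchy,
    assuming statistically independent stages."""
    p = 1.0
    for p_i in per_stage:
        p *= p_i
    return p

# Thresholds tuned so follow-up stages almost never lose a real signal:
p_tot = total_pdet([0.90, 0.995, 0.999])   # stays close to the first-stage 0.90
```

This makes the design rule quantitative: pushing the later-stage probabilities towards 1 leaves the total essentially equal to the first-stage p_det, while the later stages suppress the false-alarm level.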

E. Hough-F sensitivity
Here we apply the sensitivity-estimation framework to the Hough-F statistic introduced in Sec. II C 2. The key approximation we use here is to assume that for a given signal {h_0, R²(θ)}, the coherent per-segment F̂-statistic has the same threshold-crossing probability p_th in each segment ℓ, i.e.

    p_th,ℓ = p_th  for all ℓ = 1, ..., N_seg,    (51)

with

    p_th(h_0, R²) = ∫ dµ̂ P(µ̂) ∫_{2F̂_th}^∞ P( 2F̂ | ρ̂_eff² ) d(2F̂),    (52)

where the per-segment effective SNR ρ̂_eff is given by Eq. (44) with T_data and S_n referring to the respective per-segment quantities, and µ̂ is the mismatch of the F̂-statistic in a single segment. Provided these quantities are reasonably constant across segments, for a fixed signal {h_0, R²} we can write the probability for the Hough number count n_c of Eq. (30) as a binomial distribution, namely

    P( n_c | h_0, R² ) = ( N_seg choose n_c )  p_th^{n_c} ( 1 − p_th )^{N_seg − n_c},    (53)

with p_th(h_0, R²) given by Eq. (52). For a given threshold n_c,th on the number count we therefore have the detection probability

    p_det( n_c,th; h_0, R² ) = Σ_{n_c = n_c,th}^{N_seg} P( n_c | h_0, R² ),    (54)

and marginalization over R² yields the corresponding detection probability at fixed amplitude h_0, namely

    p_det( h_0; n_c,th ) = ∫ dR² P(R²)  p_det( n_c,th; h_0, R² ).    (55)

We can numerically solve this for h_0 at given p_det and number-count threshold n_c,th, which yields the desired sensitivity estimate.
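The binomial number-count tail underlying this detection probability is straightforward to evaluate exactly; a minimal sketch (the segment count, crossing probability, and threshold are illustrative values only):

```python
from math import comb

def hough_pdet(N_seg, p_th, n_c_th):
    """Probability that the Hough number count n_c reaches the
    threshold n_c_th, given per-segment crossing probability p_th
    (upper tail of a binomial distribution)."""
    return sum(comb(N_seg, n) * p_th ** n * (1.0 - p_th) ** (N_seg - n)
               for n in range(n_c_th, N_seg + 1))

# Example: 50 segments, a 40% per-segment crossing probability for the
# signal, and a number-count threshold of 25:
p_det = hough_pdet(50, 0.40, 25)
```

Marginalizing this quantity over the P(R²) histogram and root-finding in h_0 completes the estimate, in direct analogy to the StackSlide-F case.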

F. Bayesian Upper Limits
Bayesian upper limits are conceptually quite different [51] from the frequentist ones discussed up to this point. A Bayesian upper limit h^C_0 at a given confidence (or "credible level") C corresponds to the interval [0, h^C_0] that contains the true value of h_0 with probability C. We can compute this from the posterior distribution P(h_0 | x) for the signal amplitude h_0 given data x, namely

    C = ∫_0^{h^C_0} P(h_0 | x) dh_0.    (56)

The Bayesian targeted searches (here referred to as BayesPE) for known pulsars (see Table V and Sec. A 5) compute the posterior P(h_0 | x) directly from the data x, using a time-domain method introduced in [52].
Here we focus instead on F-statistic-based searches over a template bank. As discussed in [51], to a very good approximation we can compute the posterior from the loudest candidate 2F*(x) found in such a search, using this as a proxy for the data x, i.e.

    P( h_0 | 2F* ) ∝ P( 2F* | h_0 ) P(h_0),    (57)
where we used Bayes' theorem, and the proportionality constant is determined by the normalization condition

    ∫_0^∞ P( h_0 | 2F* ) dh_0 = 1.    (58)

We have already derived the expression for P(2F | h_0) in Eq. (42), and for any choice of prior P(h_0) we can therefore easily compute the Bayesian upper limit h^C_0(2F*) for a given loudest candidate 2F* by inverting Eq. (56).
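Given a likelihood P(2F* | h_0) and a prior, the normalization and quantile inversion can be sketched on a simple h_0 grid. Everything numeric here is invented for illustration: a fully-coherent (4-degrees-of-freedom) non-central χ² likelihood with an assumed scaling ρ² = c h_0² (made-up constant c), a made-up loudest candidate 2F* = 60, and a uniform prior on h_0:

```python
import math

def chi2_pdf(x, k):
    """Central chi^2 pdf with k degrees of freedom."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def ncx2_pdf(x, k, lam, jmax=120):
    """Non-central chi^2 pdf as the Poisson-weighted mixture of central
    chi^2 pdfs (truncated series, adequate for the values used here)."""
    if lam == 0.0:
        return chi2_pdf(x, k)
    return sum(math.exp(-lam / 2 + j * math.log(lam / 2) - math.lgamma(j + 1))
               * chi2_pdf(x, k + 2 * j) for j in range(jmax))

# Invented illustration: loudest candidate 2F* = 60, rho^2 = c * h0^2
# with a made-up constant c, uniform prior on the h0 grid:
two_F_star, c = 60.0, 1e50
h0_grid = [i * 1e-26 for i in range(1, 200)]
post = [ncx2_pdf(two_F_star, 4, c * h0 * h0) for h0 in h0_grid]
norm = sum(post)
post = [p / norm for p in post]

def bayesian_ul(confidence):
    """Smallest grid point h0^C whose cumulative posterior >= C."""
    acc = 0.0
    for h0, p in zip(h0_grid, post):
        acc += p
        if acc >= confidence:
            return h0
    return h0_grid[-1]

h0_90 = bayesian_ul(0.90)
```

In practice the likelihood would be the semi-coherent P(2F | h_0) of Eq. (42), marginalized over P(R²) and P(µ), rather than this single-template stand-in.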
It is common for Bayesian upper limits on the amplitude to choose a uniform (improper) prior in h 0 (e.g. see [19]), which has the benefit of simplicity, but puts relatively more weight on larger values of h 0 than might be physically expected (weaker signals should be more likely than stronger ones). This prior therefore results in larger, i.e. "more conservative", upper limits than a more physical prior would.

G. Numerical implementation
The expressions for the various different sensitivity estimates of the previous sections have been implemented in GNU Octave [53], and are available as part of the OctApps [54] data-analysis package for continuous gravitational waves.
The function to estimate (and cache for later reuse) the distribution P (R 2 ) of Eq. (41) is implemented in SqrSNRGeometricFactorHist().
The sensitivity-depth estimate for StackSlide-F searches is implemented in SensitivityDepthStackSlide(), both for the single-stage case of Eq. (45) and for the general multi-stage case of Eq. (49). For single-stage StackSlide-F there is also a function DetectionProbabilityStackSlide() estimating the detection probability for a given signal depth D and detection threshold.
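The core of such a depth estimate can be sketched in a few lines of Python (a hedged illustration, not the OctApps code): the single-stage detection probability of Eq. (45) is a Monte-Carlo average of a non-central χ 2 survival function over draws from P (µ) and P (R 2 ), and the depth follows by root-finding. The non-centrality ρ eff 2 = (1 − µ) R 2 T data / D 2 used below is an illustrative stand-in for Eq. (44) (units and prefactors omitted), and the bracketing interval is an assumption:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import ncx2

def pdet_stackslide(depth, twoF_th, n_seg, t_data, mu_samples, R2_samples):
    """Detection probability p_det(D) for a single-stage StackSlide-F
    search: the summed statistic follows a non-central chi^2 with
    4*N_seg degrees of freedom, and the expectation over P(mu) and
    P(R^2) is taken as a Monte-Carlo average over supplied samples."""
    rho2 = (1.0 - mu_samples) * R2_samples * t_data / depth**2
    return np.mean(ncx2.sf(twoF_th, 4 * n_seg, rho2))

def sensitivity_depth(p_det, twoF_th, n_seg, t_data, mu_samples, R2_samples,
                      d_min=1.0, d_max=1.0e3):
    """Invert p_det(D) = p_det for the sensitivity depth D^{p_det}
    by root-finding over an assumed bracketing interval."""
    func = lambda d: pdet_stackslide(d, twoF_th, n_seg, t_data,
                                     mu_samples, R2_samples) - p_det
    return brentq(func, d_min, d_max)
```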
The Hough-F sensitivity estimate of Eq. (55) is implemented in SensitivityDepthHoughF(). An earlier version of this function was used for the theoretical sensitivity comparison in [36] (Sec. V B; see also [55]), where it was found to agree within an rms error of 7% with the measured upper limits.
Typical input parameters are the number of segments N seg , the total amount of data T data , the mismatch distribution P (µ), the detectors used, the single-template false-alarm level p 1 fa (or alternatively the F-statistic threshold), and the confidence level p det . The default prior on sky-position is isotropic (suitable for an all-sky search), but this can be restricted to any sky-region (suitable for directed or targeted searches).
The typical runtime on a 3 GHz Intel Xeon E3 for a sensitivity estimate including computing P (R 2 ) (which is the most expensive part) is about 25 seconds per detector. When reusing the same θ-prior on subsequent calls, a cached P (R 2 ) is used and the runtime is reduced to about 10 seconds total, independently of the number of detectors used.

IV. DETERMINING FREQUENTIST UPPER LIMITS
In order to determine the frequentist upper limit (UL) on the signal amplitude h 0 defined in Eq. (35), one needs to quantify the probability that a putative signal with fixed amplitude h 0 (and all other signal parameters drawn randomly from their priors) would produce a statistic value exceeding the threshold (corresponding to a certain false-alarm level, or p-value). The upper limit on h 0 is then defined as the value h p det 0 for which the detection probability is exactly p det , typically chosen as 90% or 95%, which is often referred to as the confidence level of the UL.
Note that here and in the following it will often be convenient to use the sensitivity depth D ≡ √ S n /h 0 introduced in Sec. II E instead of the amplitude h 0 . We denote D p det as the sensitivity depth corresponding to the upper limit h p det 0 (note that this corresponds to a lower limit on depth).
The UL procedure is typically implemented via a Monte-Carlo injection-and-recovery method: a signal of fixed amplitude h 0 = √ S n /D and randomly-drawn remaining parameters is generated in software and added to the data (either to real detector data or to simulated Gaussian noise). This step is referred to as a signal injection. A search is then performed on this data, and the loudest statistic value F * is recorded and compared against the detection threshold F th . Repeating this injection and recovery step many times and recording the fraction of times the threshold is exceeded yields an approximation for p det (F th ; D). By repeating this procedure over different D values and interpolating, one can find the D p det corresponding to the desired detection probability (and therefore also h p det 0 ). We distinguish in the following between measured and simulated upper limits:

• Measured ULs refer to the published UL results obtained on real detector data. These typically use an identical search procedure for the ULs as in the actual search, often using the loudest candidate (over some range of the parameter space) from the original search as the corresponding detection threshold for setting the UL. The injections are done in real detector data, and typically the various vetoes, data-cleaning and follow-up procedures of the original search will also be applied in the UL procedure.
• Simulated ULs are used in this work to verify the accuracy of the sensitivity estimates. They are obtained using injections in simulated Gaussian noise, and searching only a small box in parameter space around the injected signal locations. The box size is empirically determined to ensure that the loudest signal candidates are always recovered within the box. Only the original search statistic is used in the search without any further vetoes or cleaning.
A key difference between (most) published (measured) ULs and our simulated ULs concerns the method of interpolation used to obtain D p det : in practice this is often obtained via a sigmoid p det -interpolation approach (Sec. IV A), while we use (and advocate for) a (piecewise) linear threshold interpolation (Sec. IV B) instead.

A. Sigmoid p det interpolation
In this approach one fixes the detection threshold F th and determines the corresponding p det for each fixed-D injection set. The resulting function p det (D) has a qualitative "sigmoid" shape, as illustrated in Fig. 1. A sigmoid function of the form of Eq. (59) is then fit to the data by adjusting its free parameters k and D 0 , and from this one obtains an interpolated value for D p det . One problem with this method is that the actual functional form of p det (D) is not analytically known, and does not actually seem to be well described by the sigmoid of Eq. (59), as seen in Fig. 1. In this particular example the true value at p det = 90% happens to lie very close to the sigmoid fit, but the deviation is quite noticeable at p det = 95% (see the zoomed inset in Fig. 1).
Another problem with this method is that the range of depths required to sample the relation p det (D) often needs to be quite wide, due to initial uncertainties about where the UL value would be found, which can compound the above-mentioned sigmoid-fitting problem. Furthermore, the injection-recovery step can be quite computationally expensive, limiting the number of trials and further increasing the statistical uncertainty on the p det measurements.
Both of these problems can be mitigated to some extent by using the sensitivity-estimation method described in this paper (Sec. III) to obtain a fairly accurate initial guess about the expected UL value, and then sample only in a small region around this estimate, in which case even a linear fit would probably yield good accuracy.
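A minimal sketch of this fitting procedure, assuming a common two-parameter logistic form for the sigmoid of Eq. (59) (the paper's exact parameterization may differ), with the inversion done analytically from the fitted parameters:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(D, k, D0):
    """Two-parameter logistic sigmoid, decreasing in depth D
    (an assumed form for Eq. (59))."""
    return 1.0 / (1.0 + np.exp(k * (D - D0)))

def fit_and_invert(depths, pdets, p_target=0.90):
    """Fit the sigmoid to sampled (D, p_det) points and invert it
    analytically at the target detection probability:
    D = D0 + ln(1/p - 1) / k."""
    (k, D0), _ = curve_fit(sigmoid, depths, pdets,
                           p0=[0.1, float(np.median(depths))])
    return D0 + np.log(1.0 / p_target - 1.0) / k
```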

B. Piecewise-linear threshold interpolation
An alternative approach is used in this work to obtain the simulated ULs: for each set of fixed-D injections and recoveries, we determine the threshold on the statistic 2F required in order to obtain the desired detection fraction p det . This is illustrated in Fig. 2, which shows a histogram of the observed loudest 2F candidates obtained in each of N = 10 4 injection and recovery runs at a fixed signal depth of D = 86 Hz −1/2 , using the S6-CasA-StackSlide-F search setup (cf. Sec. A 3). By integrating the probability density from 2F = 0 until we reach the value 1 − p det , we find the detection threshold 2F th at this signal depth D. Repeating this procedure at different depths generates a sampling of the function D p det (2F th ), illustrated in Fig. 3. These points can be interpolated to the required detection threshold, which yields the desired upper-limit depth D p det .
We see in Fig. 3 that this function appears less "curvy" in the region of interest than p det (D) shown in Fig. 1. This allows for easier fitting and interpolation; for example, a linear or quadratic fit should work quite well. In fact, here we have simply used piecewise-linear interpolation, which is sufficient given our relatively fine sampling of signal depths.
As already mentioned in the previous section, using the sensitivity estimate of Sec. III one can determine the most relevant region of interest beforehand and focus the Monte-Carlo injection-recoveries on this region, which will help ensure that any simple interpolation method will work well.
Alternatively, for either the p det (D)- or the D(2F th )-sampling approach, one could also use an iterative root-finding method to approach the desired p det or 2F th , respectively.
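The threshold-interpolation procedure of Sec. IV B amounts to computing an empirical quantile at each injection depth and interpolating piecewise-linearly, e.g. (function names are illustrative, not from any search code):

```python
import numpy as np

def threshold_at_depth(loudest_2F, p_det=0.90):
    """Empirical detection threshold at one fixed injection depth:
    the (1 - p_det) quantile of the loudest-candidate 2F samples,
    so that a fraction p_det of injections exceeds it."""
    return np.quantile(np.asarray(loudest_2F), 1.0 - p_det)

def depth_at_threshold(depths, samples_per_depth, twoF_th, p_det=0.90):
    """Piecewise-linear interpolation of the sampled D^{p_det}(2F_th)
    relation to the required threshold.  Thresholds decrease with
    increasing depth (weaker signals), so they are sorted before
    interpolating."""
    thresholds = np.array([threshold_at_depth(s, p_det)
                           for s in samples_per_depth])
    order = np.argsort(thresholds)
    return np.interp(twoF_th, thresholds[order],
                     np.asarray(depths, dtype=float)[order])
```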

V. COMPARING ESTIMATES AGAINST SIMULATED UPPER LIMITS
In this section we compare the sensitivity estimates from Sec. III against simulated ULs for two example cases (an all-sky search and a directed search), in order to quantify the accuracy and reliability of the estimation method and implementation. This comparison shows generally good agreement, and also some instructive deviations.
Both examples are wide-parameter-space searches using a template bank over the unknown signal parameter dimensions (namely, {sky, frequency and spindown} in the all-sky case, and {frequency and first and second derivatives} in the directed-search case).
The simulated-UL procedure (see Sec. IV) performs a template-bank search over a box in parameter space containing the injected signal (at a randomized location) in Gaussian noise. On the other hand, the sensitivity estimate (cf. Eq. (45)) uses the mismatch distribution P (µ) obtained for this template bank via injection-recovery box searches on signals without noise. We refer to this in the following as the box search.
It will be instructive to also consider the (unrealistic) case of a perfectly-matched search, using only a single template that matches the signal parameters perfectly for every injection, corresponding to zero mismatch µ = 0 in Eq. (45). We refer to this as the zero-mismatch search.
A. Example: S6-AllSky-StackSlide-F search

In this example we use the setup of the all-sky search S6-AllSky-StackSlide-F [56], which used the GCT implementation [57] of the StackSlide-F statistic and was performed on the volunteer-computing project Einstein@Home [44]; see Table I and Sec. A 2 for more details. Figure 4 shows the comparison between simulated ULs and estimated sensitivity depths D 90% versus threshold 2F th , for the box search (squares and solid line) as well as for the zero-mismatch search (crosses and dashed line). We see excellent agreement between estimated and simulated ULs for the zero-mismatch search. We also find very good agreement for the box search at higher thresholds, while we see an increasing divergence D → ∞ of the simulated ULs at decreasing thresholds, which is not captured by the estimate.
This discrepancy can be understood as the effect of noise fluctuations, which can enter in two different ways (that are not completely independent of each other): (i) For decreasing thresholds the corresponding false-alarm level Eq. (34) grows, as it becomes increasingly likely that a "pure noise" candidate (i.e. unrelated to a signal) crosses the threshold. In the extreme case where p fa approaches p det , the frequentist upper limit would tend to h 0 → 0, corresponding to D → ∞. This is illustrated in Fig. 5, showing the distribution of the loudest 2F in a box search on pure Gaussian noise, which can be compared to the diverging depth of the simulated box search around 2F th ≈ 6 in Fig. 4. We note that in practice the procedures used for measured ULs in CW searches typically make sure that the detection threshold has a very small false-alarm level, and we therefore expect this effect to have a negligible impact in cases of practical interest.
(ii) The sensitivity estimate for wide-parameter-space searches makes the assumption that the loudest candidate 2F * is always found in the closest template to the signal (i.e. with the smallest mismatch µ), as discussed in Sec. III B. However, while the closest template has the highest expected statistic value (by definition), other templates can actually produce the loudest statistic value in any given noise realization. How likely that is to happen depends on the details of the parameter space, the template bank and the threshold. It will typically be more likely at lower thresholds, as more templates further away from the signal are given a chance to cross the threshold (despite their larger mismatch).
The true distribution P (2F * | h 0 ) of a box search will therefore be shifted to higher values compared to the approximate distribution used in Eq. (42). This implies that an actual search can have a higher detection probability than predicted by the estimate (corresponding to a larger sensitivity depth).
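Effect (ii) is easy to demonstrate with a toy Monte-Carlo: drawing the closest template from a non-central χ 2 with 4 degrees of freedom, and the remaining (mismatched) templates with a reduced non-centrality, the maximum over the bank is stochastically larger than the closest template alone. All numbers below (non-centralities, bank size) are purely illustrative, and the templates are treated as independent, which real, correlated template banks are not:

```python
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(42)
n_trials, n_templates = 5000, 100
rho2_closest = 30.0   # non-centrality in the closest template (hypothetical)
rho2_others = 10.0    # reduced non-centrality of the mismatched templates

# Approximation used by the estimate: the loudest candidate always
# comes from the closest template
closest = ncx2.rvs(4, rho2_closest, size=n_trials, random_state=rng)

# Actual box search: maximum over the closest template and the rest
others = ncx2.rvs(4, rho2_others, size=(n_trials, n_templates - 1),
                  random_state=rng)
loudest = np.maximum(closest, others.max(axis=1))

# Fraction of noise realizations in which a mismatched template
# produced the loudest 2F value
frac = np.mean(loudest > closest)
```

The distribution of `loudest` is shifted upwards relative to `closest`, mirroring the shift of the true P (2F * | h 0 ) relative to the approximation of Eq. (42).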
Both of these effects contribute to different extents to the box-search discrepancy in Fig. 4 at lower thresholds: the sampling distribution of 2F * in the presence of relatively strong signals at D = 20 Hz −1/2 is shown in Fig. 6, both for a simulated box search as well as for the assumed distribution in the estimate. We see that most of the loudest candidates obtained in the simulation lie above 2F * > 9, and are therefore extremely unlikely to be due to noise alone, as seen from Fig. 5. The difference between the two distributions in Fig. 6 is therefore solely due to effect (ii).

B. Example: directed searches

The comparison between simulated and estimated UL depths D 90% for the two directed-search targets (Vela Jr. and Cas A) is shown in Fig. 8. We see again very good agreement (relative deviations of ∼ 3%) in the zero-mismatch case. However, these deviations are larger than in the all-sky case shown in Fig. 4. We suspect that this is due to the different antenna-pattern implementations of Eq. (B10) between the search code and the estimation scripts: we see different signs of the deviation for different sky positions (Vela Jr. versus Cas A), and the effect disappears when averaging over the whole sky (as seen in Fig. 4). However, the small size of the deviations did not warrant further efforts to mitigate this.
For the box-search case we see good agreement at higher thresholds, with again increasing deviations at lower thresholds due to the noise effects discussed in the previous all-sky example (Sec. V A).

VI. COMPARING ESTIMATES AGAINST MEASURED UPPER LIMITS
In this section we present a general overview of measured sensitivity depths D meas derived from the published upper limits. The results are summarized in Tables I-IV for the different search categories (all-sky, directed and narrow-band, binary and targeted), and more details about each search are found in Appendix A.
A. General remarks and caveats

Bayesian UL comparison
We also provide sensitivity estimates (using the framework of Sec. III F) for comparison to the Bayesian ULs of targeted searches for known pulsars (BayesPE). These searches compute the h 0 -posterior directly from the data rather than from an F-statistic, which makes the comparison somewhat more indirect: we cannot use a known threshold or loudest candidate 2F * , and instead we compute an expected depth by calculating estimates for 2F * values drawn randomly from the central χ 2 distribution with 4 degrees of freedom and averaging the results.

Converting published h0 ULs into Depths D
Some searches already provide their upper limits in the form of a sensitivity depth D p det , but in most cases only the amplitude upper-limits h p det 0 are given. For these latter cases we try to use a reasonable PSD estimate S n (f ) for the data used in the search in order to convert the quoted amplitude upper limits into sensitivity depths according to Eq. (33). This PSD estimate introduces a systematic uncertainty in the converted depth values, as in most cases we do not have access to the "original" PSD estimate used for the h 0 UL calculation.
In particular, even small differences in windowing or the type of frequency averaging can result in large differences in the PSD estimate near spectral disturbances, which can translate into large differences in the resulting converted depth values. In order to mitigate outliers due to such noise artifacts, we quote the median over the converted measured depth values {D k } (where k runs over multiple frequencies, targets or detectors) and estimate the corresponding standard deviation using the median absolute deviation (MAD) [58].
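In code, the conversion and the robust summary might look as follows; the 1.4826 factor is the standard Gaussian MAD-to-standard-deviation scale and is used here as a stand-in for the paper's Eq. (60):

```python
import numpy as np

def depths_from_h0(h0_ul, Sn):
    """Convert amplitude upper limits into sensitivity depths,
    D = sqrt(Sn) / h0 (Eq. (33))."""
    return np.sqrt(np.asarray(Sn, dtype=float)) / np.asarray(h0_ul, dtype=float)

def robust_depth_summary(depths):
    """Outlier-robust summary over converted depth values {D_k}:
    the median, and a standard-deviation estimate from the median
    absolute deviation (Gaussian scale factor 1.4826)."""
    d = np.asarray(depths, dtype=float)
    med = float(np.median(d))
    sigma = 1.4826 * float(np.median(np.abs(d - med)))
    return med, sigma
```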

Comparing different searches by sensitivity depth D
We can see in Tables I-IV that searches within the same search category often show roughly comparable sensitivity depths. At one end of the spectrum are the fully-targeted searches, for which the parameter space (for each pulsar) is a single point, and one can achieve the maximal possible sensitivity for the available data, namely D ∼ O(500) Hz −1/2 (see Table V). At the other end of the spectrum lies the all-sky binary search with a sensitivity depth of D ∼ 3 Hz −1/2 (see Table IV), which covers the largest parameter space of any search to date.
One cannot directly compare searches on sensitivity depth alone, even within the same search category. Other key aspects of a search are the parameter-space volume covered, the total computing power used, and the robustness of the search to deviations from the assumed signal- or noise-model.
It is intuitively obvious that the more computing power is spent on a fixed parameter-space volume, the more sensitive the search will tend to be, although the increase in sensitivity is typically very weak, often of order the 10th to 14th root of the computing power [17].
It is also evident that the larger the parameter space covered by a search, the less sensitivity depth can be achieved due to the increased spending of computing power on "breadth" rather than depth. Ultimately the most directly relevant characteristic of a search would be its total detection probability [29,30], which factors in both breadth and depth as well as the underlying astrophysical prior on signal amplitudes over the parameter space searched.
B. All-sky searches

Estimated and measured sensitivity depths for all-sky searches are given in Table I, and further details about individual searches can be found in appendix A 2.
The mean relative error between measured and estimated depths is 9 %, while the median error is 7 %.
One case of interest is the surprisingly large discrepancy of ∼ 18% observed for the S6-AllSky-StackSlide-F+FUP search, shown in Fig. 4, where we see a significantly higher measured depth (D med meas = 46.9 Hz −1/2 ) than estimated (D est = 38.3 Hz −1/2 ). This can be traced back to the template-maximization approximation used in the estimate, namely effect (ii) discussed in Sec. V A. The low threshold used in the search (2F th = 6.1) appears to be at the cusp of becoming affected by pure-noise candidates (effect (i) in Sec. V A), but this effect is still small and does not account for the discrepancy. Furthermore, the upper-limit procedure used a multi-stage follow-up, which ensures that the final false-alarm level (p-value) is very small, ruling out contamination from pure-noise candidates.

[Figure caption: the triangles (∆) and dashed lines show the measured upper-limit depth D med meas in the initial S6-AllSky-StackSlide-F search [56], and the diamond shows the corresponding result from the follow-up (FUP) search [50] (threshold 2F th = 6.1).]

C. Directed and Narrow-band searches
Estimated and measured sensitivity depths for directed and narrow-band searches are given in Tables II and  III, and further details about individual searches can be found in appendix A 3.
The mean relative error between measured and estimated depths is 5 %, and the median error is 1 %. For the S6-NineYoung-F search for nine young supernova remnants shown in Table III, the mean relative error between measured and estimated depths is 4 % (median error 4 %).
For two cases of interest we investigated more closely to understand the origin of the observed deviation:
S5-GalacticCenter-StackSlide-F search [72]: the reason for the relatively large deviation of 19% between D est = 58.2 Hz −1/2 and D med meas = 72.1 Hz −1/2 can be understood by looking at the details of this search setup: contrary to the assumed uniform averaging of antenna-pattern functions over time (cf. Sec. III C), this search was specifically optimized by choosing relatively short segments of T seg = 11.5 hours in such a way as to maximize sensitivity, selecting times of maximal antenna-pattern sensitivity towards the sky direction of the galactic center. This is described in more detail in [46], where it is quoted as yielding a sensitivity improvement of about 20%, consistent with the observed enhancement of measured sensitivity compared to our estimate.
S6-CasA-StackSlide-F search [22]: the deviation between D est = 79.6 Hz −1/2 and D med meas = 72.9 Hz −1/2 does not seem very large per se, but is unusual in that the estimate does not typically overestimate sensitivity by that much. A detailed investigation led us to discover a bug in the original upper-limit script used in [22], which caused the injection-recovery procedure to sometimes search the wrong box in parameter space, missing the injected signal. By artificially reproducing the bug in our upper-limit simulation we are able to confirm that it accounts for a decrease in detection probability of about 7%, resulting in an underestimate of the upper-limit depth as shown in Fig. 10.

D. Searches for neutron stars in binaries
Estimated and measured sensitivity depths for searches for CWs from neutron stars in binary systems are given in Table IV, and further details about individual searches can be found in appendix A 4. In this case the only F-statistic-based search is S2-ScoX1-F, for which we obtain an estimate of D est = 4.4 Hz −1/2 (assuming an average mismatch of µ ∼ 0.1/3, corresponding to a cubic lattice with maximal mismatch of 0.1 [60]). The relative error between measured and estimated sensitivity depth is therefore 8 %.

E. Targeted searches for known pulsars
Estimated and measured sensitivity depths for targeted searches are given in Table V, and further details about individual searches can be found in appendix A 5.
Note that the quoted upper limits of the BayesPE method are obtained by Bayesian parameter estimation [52] of P (h 0 | x) directly on the data x. Therefore we cannot directly apply the Bayesian sensitivity estimate derived in Sec. III F, which assumes an initial F(x)-statistic computed on the data, from which the Bayesian upper limit would be derived. We therefore provide an approximate comparison with an expected sensitivity estimate, which we compute by estimating depths using 2F * drawn from a central χ 2 distribution with 4 degrees of freedom (given that each target corresponds to a single template) and averaging the resulting estimated D values. In cases where several targets are covered by the search, we assume for simplicity that the targets are isotropically distributed over the sky and compute a single all-sky sensitivity estimate. For single-target searches the exact sky position is used for the estimate. The mean relative error between measured and estimated depths is 16 %, and the median error is 10 %.

VII. CONCLUSIONS

In this paper we presented a fast and accurate sensitivity-estimation framework and implementation for F-statistic-based search methods for continuous gravitational waves, extending and generalizing an earlier analytic estimate derived by Wette [1]. In particular, the new method is more direct and uses fewer approximations for single-stage StackSlide-F searches, and is also applicable to multi-stage StackSlide-F searches, Hough-F searches and Bayesian upper limits (based on F-statistic searches).
The typical runtime per sensitivity estimate is about 10 seconds with a cached P (R 2 ) distribution, and about 25 seconds per detector for the first call with a new parameter prior. The accuracy compared to simulated Monte-Carlo upper limits in Gaussian noise is within a few % (provided the threshold corresponds to a low false-alarm level), and we find generally good agreement (less than ∼ 10% average error) compared to published upper limits in the literature. Several factors leading to the observed deviations in various cases are discussed in detail.
We also provided a comprehensive overview of published CW upper-limit results, converting the quoted h 0 upper limits into sensitivity depths. This introduces some systematic uncertainties, as we often do not have access to the original PSD estimate used for the upper limits. We therefore advocate for future searches to directly provide their upper-limit results also in terms of the sensitivity depth of Eq. (33), in order to allow easier direct comparison between searches and to sensitivity estimates.

Appendix A

1. General remarks

In this appendix we will refer to the different detectors as G for GEO600 [84], V for VIRGO [85,86], H1 and H2 for the two LIGO detectors in Hanford (4 km, 2 km) and L1 for LIGO Livingston [87,88].
We will use the common abbreviations CW for continuous gravitational waves, SFT for Short Fourier Transform, PSD for power spectral density and UL for upper limits.
The quoted sensitivity depths in Tables I-V can correspond to different confidence levels, as some searches use 90%- and others 95%-confidence upper limits. The applicable confidence level is indicated by regular versus italic font in the tables, respectively.
For searches over many frequencies, multiple targets, or upper limits reported separately for different detectors, we use a consistent averaging procedure based on the median and the median absolute deviation of Eq. (60), in order to estimate the mean and standard deviation in an outlier-robust way.
PowerFlux and loosely-coherent searches typically give separate upper limits for circular (best) polarization and for the worst linear polarization, but not the more common population-averaged upper limits. There has been some work estimating conversion factors of these upper limits into polarization-averaged sensitivity, writing D PF ∼ w worst D PF worst and D PF ∼ w best D PF best . For example, [1] obtains conversion factors in the ranges w worst ∼ 1.1 − 1.3 and w best ∼ 0.39 − 0.46. More recent work [89] estimating these conversion factors on O1 data for 90%-confidence upper limits yields w worst ≈ 1.51 ± 0.13 and w best = 0.52 ± 0.02. However, these conversion factors were obtained by treating the set of upper limits as a whole, so they should not be used to derive a proxy of population-averaged upper limits in individual frequency bands. Furthermore, PowerFlux strict upper limits are derived by taking the highest upper limits over regions of parameter space. This procedure has the advantage that the upper limits retain validity over any subset of parameter space, such as a particular frequency band or sky location. However, the maximization makes it difficult to convert them into population-averaged upper limits, which are more robust to small spikes in the data. Given that there is currently some uncertainty on the detailed values of the conversion factors to use for different PowerFlux searches, here we report the best/worst upper limits converted into sensitivity depths separately in Tables I and II.
Generally, for converting h 0 upper limits into depths according to Eq. (33), we need to use an estimate for the corresponding noise PSD S n , for which we either use a corresponding PSD over the data used in the search, where available, or a 'generic' PSD estimate from LIGO for the given science run [90,91] otherwise. This adds another level of uncertainty in the conversions, which could easily be in the range 10% − 20% due to different calibrations and different types of averaging over time.
2. All-sky searches, see Table I

a. S2-AllSky-Hough [59]

The first all-sky search for CWs from isolated neutron stars, using a semi-coherent Hough transform method applied to Short Fourier Transforms (SFTs) of the data of length T seg = 30 min. The search used data from the second LIGO Science Run (S2); the number of SFTs used in the search was 687 from L1, 1761 from H1 and 1384 from H2.
The UL sensitivity depth for this search is calculated as the mean over the three depths for H1, L1 and H2, where each depth is computed from the respective quoted best upper-limit value h 95% 0 and the corresponding PSD S n in TABLE III of [59].
b. S2-AllSky-F [60]

A matched-filtering search based on the coherent (single-detector) F-statistic, using 20 SFTs from H1 and 20 SFTs from L1 (SFT length T SFT = 30 min). The per-detector F-statistic values were combined via a coincidence scheme, determining the most significant candidate in each ∼ 1 Hz band, which was then used for measuring the upper limits.
The sensitivity depth for this search is calculated from the given (combined multi-detector) upper limits h 95% 0 (f ) over the search frequency range, combined with the harmonic mean over generic H1-and L1-PSDs for the LIGO S2 data.
The estimate was calculated with the mean loudest templates of the search given in the paper as F th = (39.5, 32.2) for the L1 and H1 detectors, respectively, and we used an average mismatch of 0.5 % in the H1 search and 1 % in the L1 search, estimated from Figs. 27 and 28 in [60].
c. S4-AllSky-{StackSlide,Hough,PowerFlux} [35]

Three semi-coherent all-sky searches using different search methods, all based on incoherently combining SFTs of length T seg = 30 min. The StackSlide and Hough searches used 1004 SFTs from H1 and 899 from L1, and the Hough search additionally included 1063 SFTs from H2. The PowerFlux search used 1925 and 1628 SFTs from H1 and L1, respectively.
The sensitivity depths are calculated from the quoted upper limits h 95% 0 (f ) from each of the three searches over the search frequency range, combined with the PSDs for two (H1 and L1) detectors (as a common reference) from the S4 science run. Note that the Hough depth corresponds to the quoted multi-detector UL, while the other searches reported only per-detector ULs.
d. S4-AllSky-F+Coinc [61]

A search which used the distributed computing project Einstein@Home [44] to analyse 300 h of H1 data and 210 h of L1 data from the S4 run. The data was split into 30 h long segments, coherently analysed with the multi-detector F-statistic, followed by a coincidence step. The measured sensitivity depth D 90% meas is calculated by converting the quoted sensitivity factors R 90% = {31.8, 33.2} (for frequencies below and above 300 Hz, respectively) into sensitivity depths. However, given that these were computed with respect to an (arithmetically) averaged PSD estimate (given in Fig. 1 of the paper), we first converted these factors back into equivalent h 0 values using the mean PSD, and then computed the depth with respect to the harmonic-mean (over detectors) generic noise PSD for S4.

e. earlyS5-AllSky-PowerFlux [62]

An all-sky search with PowerFlux over the first eight months of S5 data. The search in total used roughly 4077 h of H1 data and 3070 h of L1 data, divided into SFT segments of T seg = 30 min.
The sensitivity depth is calculated from the quoted per-detector upper limits h 95% 0 (f ) over the search frequency range and the corresponding S5 noise PSDs.
f. earlyS5-AllSky-F+Coinc [63]

An all-sky search run on Einstein@Home [44], using 660 h of data from H1 and 180 h of L1 data, taken from the first 66 days of the LIGO S5 science run. The data was divided into 28 segments of T seg = 30 h duration, and each segment was searched using the fully-coherent multi-detector F-statistic. These per-segment F-statistics were combined across segments using a coincidence scheme.
The measured sensitivity depth D 90% meas is calculated as the median over the converted sensitivity depths converted from the quoted sensitivity factors R 90% = {29.4, 30.3} in the paper for the frequencies below and above 400 Hz, respectively.
The sensitivity depth is calculated from the quoted upper limits h_0^{95%} and the S5 noise PSD.
h. S5-AllSky-Hough-F [36] An all-sky search using the Hough-F variant of the semi-coherent Hough method described in Sec. II C 2, which was run on Einstein@Home. The analyzed data consisted of 5550 and 5010 SFTs from the LIGO H1 and L1 interferometers, respectively, taken from the second year of the S5 science run. The data was divided into 121 segments of length T_seg = 25 h, and the coherent per-segment F-statistics were combined via the Hough method to compute the Hough number count of Eq. (30).
The sensitivity depth of the search is calculated from the quoted h_0^{90%} upper limits and the corresponding S5 noise PSD.
The estimated sensitivity depth uses the generalization of the estimator described in Sec. III E with a number-count threshold of n_c,th = 70, a per-segment threshold of F_th = 2.6 and a mismatch histogram obtained from an injection-recovery simulation (with an average mismatch of μ = 0.61).
i. S5-AllSky-Hough [65] An SFT-based Hough all-sky search on S5 data. The search was split into the first and the second year of S5, which were searched separately. The first year used 11 402 SFTs from H1, 12 195 SFTs from H2 and 8 698 SFTs from L1, of length T_SFT = 30 min. The analysis of the second year used 12 590 H1-SFTs, 12 178 H2-SFTs and 10 633 L1-SFTs.
The sensitivity depth is calculated from the quoted h_0^{90%} upper limits of the second-year search found in the paper and from the S5 noise PSD.
j. S5-AllSky-StackSlide-F [66] A high-frequency all-sky search to complement previous lower-frequency all-sky searches on S5 data. The search used the so-called GCT method [57] implementing the StackSlide-F statistic and was run on the distributed Einstein@Home platform. The search used a total of 17 797 SFTs spanning the whole two years of S5 data from H1 and L1, divided into 205 segments of length T_seg = 30 h.
The measured sensitivity depth D_meas^{90%} is determined by extrapolating the depth values given in the paper for critical ratios of 0 and 3.5 to the median critical ratio over all frequency bands of −0.15, according to figure 6 of [66].
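This extrapolation can be sketched as a simple linear map from critical ratio to depth. The linearity between the two quoted reference points is a simplifying assumption on our part (not a statement from [66]), and the numbers below are hypothetical:

```python
def extrapolate_depth(cr, cr_a, depth_a, cr_b, depth_b):
    """Linearly inter-/extrapolate the sensitivity depth to the
    critical ratio `cr`, given depths at two reference critical ratios."""
    slope = (depth_b - depth_a) / (cr_b - cr_a)
    return depth_a + slope * (cr - cr_a)

# Hypothetical numbers (not the published ones): depths of 30 and 26
# Hz^(-1/2) at critical ratios 0 and 3.5, extrapolated to the median
# critical ratio of -0.15.
print(extrapolate_depth(-0.15, 0.0, 30.0, 3.5, 26.0))
```

The same construction applies to the S6-AllSky-StackSlide-F depth extrapolation described further below.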
For the estimate we determined the median threshold over all frequency bands from figure 4 of [66] to be 2F_th = 5.72. Two mismatch histograms, at 1255 Hz and 1495 Hz, generated with injection-recovery studies were used; the average mismatch for both was μ ≈ 0.82. The quoted value is the mean of the two estimates obtained with the different mismatch histograms.

k. An all-sky search using data from the first Virgo science run, VSR1. The search method uses a time-domain implementation of the coherent F-statistic, computed over 2-day coherent segments, which are combined using coincidences. In total the search used 134 days of data.
The measured sensitivity depth D_meas^{90%} is calculated as the median of the sensitivity depths converted from the given sensitivity factors of 15.6 and 22.4.

l. {VSR2,4}-AllSky-FreqHough+FUP [68] This all-sky search was performed using data from initial Virgo's second (VSR2) and fourth (VSR4) science runs. It used the FrequencyHough transform as the incoherent step, with 149 days of VSR2 data and 476 days of VSR4 data, using segments of length 8192 seconds. The initial candidates were followed up using 10 times longer segments.
The measured sensitivity depth was calculated from upper limits h_0^{90%} extracted from figure 12 of [68] and the harmonic mean of the PSD estimates of VSR2 and VSR4 in 0.1 Hz frequency bands.
m. S6-AllSky-StackSlide-F [56] This search used 12 080 SFTs from L1 and H1 data to perform a StackSlide-F search based on the GCT implementation, and was run on Einstein@Home. The search used 90 coherent segments of length T_seg = 60 h.
The measured sensitivity depth D_meas^{90%} is determined by extrapolating the depth values given for critical ratios of 0 and 6 to the median critical ratio of −0.07, according to figure 5 of [56].
The estimated depth is given for a threshold of 2F_th = 6.694, which is the median of the thresholds given for the frequency bands in figure 4 of [56]. For the estimate, two mismatch histograms created with injection-recovery studies at 55 Hz and 505 Hz were used; the average mismatch of the template grid was found to be μ = 0.72 at both frequencies. The quoted value is the mean of the two estimates obtained with the different mismatch histograms.
n. S6-AllSky-StackSlide-F+FUP [50] A multi-stage follow-up on candidates from the S6-AllSky-StackSlide-F search described in the previous paragraph, zooming in on candidates using increasingly finer grid resolution and longer segments. Every candidate from the initial stage with 2F ≥ 6.109 was used as the center of a new search box for the first-stage follow-up, continuing for a total of four semi-coherent follow-up stages. The sensitivity of the search is dominated by the initial-stage threshold, because the later stages are designed to have a very low probability of dismissing a real signal. The measured sensitivity depth D_meas^{90%} = 46.9 Hz^{-1/2} of this search is directly taken from the quoted value in the paper.
The estimated multi-stage sensitivity of Sec. III D, using the thresholds given in the paper, namely {2F_th^(i)} = (6.109, 6.109, 7.38, 8.82, 15), a mismatch histogram generated by injection-recovery studies for the main search, and mismatch histograms provided by the original authors for every stage with average mismatches {μ^(i)} = (0.72, 0.55, 0.54, 0.29, 0.14), yields a value of D^{90%} = 38.3 Hz^{-1/2}, which differs significantly from the quoted measured sensitivity depth. As discussed in Sec. V, we trace this discrepancy to the low threshold used, which significantly affects the loudest-candidate mismatch approximation used in the theoretical estimate.
o. S6-AllSky-PowerFlux [69] The data used by this search span 232.5 d, with detector duty factors of 53% for H1 and 51% for L1.
The measured sensitivity depth is calculated from the quoted upper limits h_0^{95%} in the paper and the S6 noise PSD.
p. O1-AllSky-StackSlide-F [26] A low-frequency all-sky search for gravitational waves from isolated neutron stars using the distributed computing project Einstein@Home on data from Advanced LIGO's first observing run (O1). This search used the GCT implementation of the semi-coherent StackSlide-F method with N_seg = 12 segments of length T_seg = 210 h in the initial search stage. The analyzed data consisted of 4 744 SFTs from the H1 and L1 detectors. The search also included a hierarchical follow-up similar to the S6Bucket follow-up search [50].
The measured sensitivity depth D_meas^{90%} = 48.7 Hz^{-1/2} of this search is directly taken from the quoted value in the paper.
The sensitivity estimate used a threshold of 2F_th = 14.5, which we inferred from figure 4 in [26]; we obtained the mismatch histograms of the template grid at different frequencies using an injection-recovery study, which yielded average mismatches of μ = 0.35 and μ = 0.37 at 20 Hz and 100 Hz, respectively. The quoted depth is the average of the two estimates resulting from the two mismatch histograms. Note that, contrary to the measured sensitivity, the estimate only uses the first-stage parameters in this case, as we currently cannot model the line-robust statistic used in the follow-up stages. However, as mentioned in Sec. III D, the overall detection probability is dominated by the first stage, while subsequent stages mostly serve to reduce the false-alarm level.
The first paper [25] searched the lower frequency range [20, 475] Hz, using four methods: PowerFlux, FrequencyHough, SkyHough and a time-domain F-statistic search with segment coincidences (denoted as F_TD+Coinc). The PowerFlux, FrequencyHough and SkyHough searches used SFT lengths in the range 1800–7200 s as coherent segments, while the time-domain F-statistic used a coherence time of T_seg = 6 d. The total amount of analyzed data was about 77 d of H1 data and 66 d of L1 data.
In the second paper [70] three of these searches, namely PowerFlux, SkyHough and the time-domain F_TD+Coinc search, were extended up to 2000 Hz using the same data.
The sensitivity depths for the four searches are calculated from the quoted h_0^{95%} amplitude upper limits and the noise PSD for the O1 science run.
Note that for the SkyHough method a sensitivity depth of 24.2 Hz^{-1/2} is quoted in the paper. However, this value is based on a slightly different convention for the multi-detector noise PSD S_n (maximum over detectors instead of the harmonic mean) than used here. For consistency with the other searches in table I we therefore compute the sensitivity depth by converting from the quoted h_0^{95%} upper limits instead.
A comparison set of PowerFlux 90%-confidence upper limits for an isotropic polarization population was provided for the O1 Einstein@Home paper [26], with a frequency spacing of 0.0625 Hz; these are converted into sensitivity depths using the O1 noise PSD.

3. Directed Searches, see Tables II, III

a. earlyS5-Crab-F [71] This search aimed at the Crab pulsar and used the first nine months of initial LIGO's fifth science run (S5). It consisted of both a targeted search (described in Sec. A 5 d) and the directed F-statistic search described here. The directed search used 182, 206 and 141 days of data from the H1, H2 and L1 LIGO detectors, respectively. The measured depth value is calculated from the given upper limits h_0^{95%} and the PSD estimate of the S5 data at the search frequency.
The estimated depth uses the StackSlide estimator for a coherent search with N_seg = 1 segment, a threshold of F_th = 37 and a maximal template-bank mismatch of 5% (given in the paper), from which we estimate the average mismatch as μ ∼ (1/3) × 5% ≈ 1.7% (assuming a square lattice).
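The factor 1/3 relating average to maximal mismatch for a (hyper)cubic lattice follows from averaging the quadratic mismatch over a unit cell: each coordinate offset contributes ⟨x²⟩ = d²/12 against a corner value of (d/2)² = d²/4, independent of dimension. A quick Monte-Carlo check (our own illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)
ndim, nsamples = 4, 200_000

# Uniform template offsets within a unit hypercubic cell, with a
# quadratic mismatch mu = sum_i x_i^2 (metric taken as the identity).
offsets = rng.uniform(-0.5, 0.5, size=(nsamples, ndim))
mu = np.sum(offsets**2, axis=1)

mu_max = ndim * 0.25          # mismatch at a cell corner
ratio = mu.mean() / mu_max    # close to 1/3 in any dimension
print(ratio)
```

The same 1/3 rule is used again for the S6-NGC6544-F estimate below.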
b. S5-CasA-F [47,92] The first search for continuous gravitational waves from the Cassiopeia A supernova remnant using data from initial LIGO's fifth science run (S5). The search coherently analyzed data in an interval of 12 days (934 SFTs of length 30 min) using the F-statistic.
The measured sensitivity depth is obtained from the quoted upper limits h_0^{95%} in the paper and the S5 noise PSD.
The estimate is calculated using the StackSlide estimator for a coherent search (N_seg = 1 segment), with the mismatch histogram for an A_n* lattice with maximal mismatch of μ = 0.2 (obtained from LatticeMismatchHist() in [54]), and the average threshold of 2F_th = 55.8 (averaged over the respective loudest 2F-candidates found in each of the upper-limit bands).
c. S5-GalacticCenter-StackSlide-F [46,72] The first search for continuous gravitational waves directed at the galactic center. The search used LIGO S5 data and the GCT implementation of the StackSlide-F semi-coherent search algorithm with 630 segments, each spanning 11.5 h, for a total data set of 21 463 SFTs of length 30 min.
The segments of the search were selected from the whole S5 science run in such a way as to maximize the SNR for fixed-strength GW signals at the sky-position of the galactic center. Therefore the selected segments fall at times where the antenna patterns of the LIGO detectors are better than average for this particular sky-position. As discussed in Sec. VI C, the sensitivity-estimation method presented in this work assumes the antenna patterns are averaged over multiple days, which causes an unusually large deviation between the estimate and the measured sensitivity depth from the h_0^{90%} upper limits. The estimate is calculated using the mismatch histogram (with mean μ = 0.13) obtained from an injection-recovery study on the template bank of this search, and a detection threshold of 2F_th = 4.77.
d. VSR4-{Vela,Crab}-5-vector [73] This coherent narrow-band search on the data from initial Virgo's fourth science run (VSR4) was directed at the Vela and Crab pulsars. The search used the 5-vector method, and covers a range of ±0.02 Hz around twice the known rotation frequencies of Vela and Crab. The total amount of data used is 76 d.
The measured sensitivity depth for this search was obtained from the published h_0^{95%} upper limits and the noise PSD estimate for VSR4.

e. S6-NineYoung-F [21] This search was directed at nine different targets, listed in Table III, each corresponding to a (confirmed or suspected) compact object in a young supernova remnant. The search uses a fully-coherent F-statistic. The amount of data used for each target varies between 7.3 × 10^5 s and 3.1 × 10^6 s (cf. table III).
The measured depth is calculated for each of the targets from the quoted upper limits h_0^{95%} and the corresponding PSD for the actual data used in the search.
The estimate for each target is calculated using the StackSlide estimator for a coherent search (N_seg = 1 segment), with the mismatch histogram for an A_n* lattice with maximal mismatch of μ = 0.2 (obtained from LatticeMismatchHist() in [54]); the average thresholds 2F_th for each target (averaged over the respective loudest 2F-candidates found in each of the upper-limit bands) are given in table III.
The 'NineYoung' entry in Table II presents the median depth over all targets for the measured and estimated depths, respectively.

f. S6-CasA-StackSlide-F [22] A search directed at Cassiopeia A, which was run on the distributed computing project Einstein@Home using data from the LIGO S6 science run. The search was based on the GCT implementation of the semi-coherent StackSlide-F statistic, with N_seg = 44 segments of length T_seg = 140 h, and a total amount of data of 13 143 SFTs of length 30 min from the two LIGO detectors in Hanford (H1) and Livingston (L1). The measured sensitivity depth given in table II is computed from the h_0^{90%} upper limits quoted in the paper [22] combined with the corresponding PSD estimates. However, as discussed in Sec. VI C, this measurement suffered from a bug in the upper-limit script and as a result is somewhat too conservative (i.e. too high).
The estimated sensitivity is calculated assuming an average threshold of F_th = 8.25 (estimated from Fig. 4 in [22]), using the mean over estimates with different mismatch histograms generated by injection-recovery studies at different frequencies (spanning 50–1000 Hz, average mismatch ∼ 9%).
g. S6-OrionSpur-LooselyCoherent [74] This was a search employing the so-called loosely-coherent method, aimed at the Orion spur towards both the inner and outer regions of our Galaxy. The explored sky regions are disks of 6.87° diameter centered at 20h 10m 54.71s +33°

h. S6-NGC6544-F [75] This was the first search directed at the nearby globular cluster NGC 6544. The search coherently analyzed data from the two LIGO detectors' S6 science run with the F-statistic, using a single coherent segment with T_seg = 9.2 d. The search analyzed two different data stretches separately; the first contained 374 SFTs and the second 642 SFTs, with an SFT duration of 30 min.
The measured depth was determined from the upper limits h_0^{95%} given in figure 2 of [75] and a PSD estimate for the LIGO S6 run.
The estimate used the StackSlide estimator with one segment, a threshold of 2F_th = 55 (quoted in the paper) and an average mismatch of μ = 0.2/3 ≈ 0.067 (assuming a roughly square lattice).
i. O1-Narrow-band-5-vector [20] A narrow-band search aiming at 11 known pulsars using the fully-coherent 5-vector method on data from Advanced LIGO's first observing run (O1). The search used a total of 121 days of data from the Hanford (H1) and Livingston (L1) detectors.
The sensitivity depth in the table is calculated as the median over the single-target depths, which are converted from the upper limits h_0^{95%} quoted in the paper and the corresponding noise PSD of the data used.
j. O1-{SN1987,GalacticCenter}-Radiometer [76] Described in Sec. A 4 g.

4. Searches for neutron stars in binary systems, see Table IV

a. S2-ScoX1-F [60] The first search specifically aimed at the NS in the LMXB system Scorpius X-1, using a coherent single-detector F-statistic and a coincidence check on a 6 h long stretch of S2 data.
The measured sensitivity depth was calculated from the quoted upper limits h_0^{95%} in the paper (for the zero-eccentricity case e = 0) and the PSD estimate of the corresponding S2 data.
b. S5-ScoX1-Sideband [77] A search aimed at Scorpius X-1 by incoherently combining sidebands of a coherent F-statistic search that only demodulates the signal for the sky-position but not for its binary-orbital Doppler modulation. This method used a stretch of 10 days of data selected from the S5 science run for maximal sensitivity. Two searches were performed, one with no prior assumptions about the orientation of Sco X-1, and one using more restrictive angle priors based on electromagnetic observations. Bayesian upper limits h_0^{95%} were computed over the search frequency range, which we convert into sensitivity depths (for the unknown-polarization case, see Fig. 5(a) in [77]) using the noise PSD for the data given in the paper. In each 1 Hz band, 2 × 10^6 upper-limit values were quoted, of which we use the maximum value per band in order to be consistent with the usual 'loudest-candidate' approach of setting upper limits in a given frequency band.
The quoted upper limits h_0^{95%} for the all-sky search and the Scorpius X-1 search were converted into depths using a combined (generic) PSD for the S6, VSR2 and VSR3 science runs.
d. S6-{ScoX1,J1751}-TwoSpect [78] A search for CWs from the low-mass X-ray binaries Scorpius X-1 and XTE J1751-305 using the TwoSpect algorithm. It used about 4 × 10^7 s of data from each of the two detectors in the S6 science run, with two different SFT lengths, 840 s and 360 s, which were also the lengths of the coherently-analysed segments.
The given sensitivity depth D^{95%} is obtained from the quoted h_0^{95%} upper limits combined with the corresponding noise PSD for S6 data.
e. O1-ScoX1-Viterbi [24] A search aimed at Scorpius X-1 using the Viterbi search method performed on 130 days of data from Advanced LIGO's first observational run (O1), segmented into coherent segments of length T seg = 10 days.
The measured sensitivity depth is converted from the quoted upper limits h_0^{95%} (for unknown polarization) and the noise PSD of the corresponding O1 data.
Note that contrary to many other search methods, this search setup appears to result in a frequency-dependent sensitivity depth, namely D(f) ∝ f^{-1/4} (see Eq. (9) in [24]). For consistency with other searches, we quote the median and (MAD) standard deviation over frequencies in Table IV, and note that the total range of sensitivity depths of this search is found as D(f) ∼ 11 (f/f_0)^{-1/4} Hz^{-1/2} ∈ [4.6, 11.2] Hz^{-1/2} with f_0 = 60.5 Hz.
f. O1-ScoX1-CrossCorr [23] This search aimed at Scorpius X-1 using the CrossCorr search algorithm on data from Advanced LIGO's first observational run (O1). The data was split into coherently-analysed segments (SFTs) with a (frequency-dependent) length between 240 s and 1400 s.
The measured sensitivity depth is obtained from the quoted (isotropic-prior) upper limits h_0^{95%} and the noise PSD of the O1 data. Note, however, that the search ULs are given per 0.05 Hz band, which is unusually small compared to most other upper-limit bands (typically 0.25–1 Hz), and therefore they display more variability. In order to make these ULs more comparable to other searches, we use the 95th-percentile highest upper limits per 1 Hz band (as recommended in Fig. 5 of [23]). This 'binning' procedure only has a small effect on the resulting sensitivity depth, which is reduced from 25.3 Hz^{-1/2} to 24.0 Hz^{-1/2}.
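The re-binning used here (and again for the Radiometer results below) can be sketched as follows; this is our own implementation of the stated procedure, with function and variable names of our choosing:

```python
import numpy as np

def rebin_upper_limits(freqs, uls, bandwidth=1.0, percentile=95.0):
    """Collapse finely-spaced upper limits onto coarser frequency bands
    by taking the given percentile of the UL values within each band."""
    freqs, uls = np.asarray(freqs), np.asarray(uls)
    edges = np.arange(freqs.min(), freqs.max() + bandwidth, bandwidth)
    binned = []
    for lo in edges[:-1]:
        mask = (freqs >= lo) & (freqs < lo + bandwidth)
        if mask.any():
            binned.append(np.percentile(uls[mask], percentile))
    return np.array(binned)
```

Taking a high percentile per band approximates the conservative 'loudest-candidate' convention while suppressing the variability of very narrow UL bands.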
Note that this search has a frequency-dependent sensitivity depth, which starts at around D(25 Hz) ∼ 45 Hz^{-1/2} at low frequencies, asymptoting down to D ∼ 23 Hz^{-1/2} above f ≳ 800 Hz. However, in order to be consistent with other searches, we quote the median and (MAD) standard deviation over all frequencies in Table IV.
g. O1-{ScoX1 and others}-Radiometer [76] The 'Radiometer' search method, which was developed mainly for stochastic background searches, can also be used for directed CW searches at particular sky-positions. This method does not use a particular signal model, which allows it to be sensitive to a wide range of possible signal families, at the cost of somewhat lower sensitivity to 'regular' CW signals. This search aimed at the sky-positions of Sco X-1, as well as at the supernova remnant 1987A and the galactic center.
The search reported h_0^{90%} (and, for Sco X-1, h_0^{95%}, reported in [23]) upper limits in narrow frequency bands of 1/32 Hz = 0.03125 Hz, which is unusually small compared to most other upper-limit bands (typically 0.25–1 Hz), and therefore they display more variability. In order to make these ULs more comparable to other searches, we use the 95th-percentile highest upper limits per 1 Hz band (as recommended in Fig. 5 of [23]), following the same procedure as used for the CrossCorr results (discussed in Sec. A 4 f).

5. Targeted Searches, see Table V

a. S1-J1939+21-{F,BayesPE} [32] The first CW search, on data from GEO 600 and LIGO's first science run (S1).
It used (16.7, 5.73, 8.73, 8.9) days of data from four detectors, GEO 600 (G1), LIGO Livingston (L1), LIGO Hanford 4 km (H1) and LIGO Hanford 2 km (H2), respectively. Two types of searches were performed, a coherent F-statistic search as well as direct Bayesian parameter estimation (BayesPE). Table V gives the mean and standard deviation of the sensitivity depths over the four detectors. The measured sensitivity depth for the F-statistic search was determined from the upper limits h_0^{95%} quoted in table IV of [32] for the most pessimistic ι (cos ι = 0) and ψ, and from the quoted numbers in the conclusion for the (standard) population-averaged orientation. The noise PSD values are taken from table III in [32]. The corresponding estimate is calculated with the StackSlide estimator for N_seg = 1 and the quoted threshold values 2F_th = (1.5, 3.6, 6.0, 3.4) for the four detectors from table III in the paper. For the 'worst-case' estimate we use the prior cos ι = 0 and minimise the sensitivity depth over ψ ∈ [−π/4, π/4] in order to reflect the 'conservative' ULs quoted in the paper. Note, however, that contrary to the typically small false-alarm level (p-value) of the UL thresholds used (typically 1%), the loudest candidates used as thresholds here had relatively high p-values of 83%, 46%, 20% and 49%, respectively, as seen in table III of [32].
b. S2-Known pulsars-BayesPE [79] A coherent targeted search for 28 known isolated radio pulsars was performed using the Bayesian parameter-estimation pipeline (BayesPE) on data from the second LIGO science run (S2), using 910 h of H1 data, 691 h of H2 data and 342 h of L1 data from the S2 data set.
The measured sensitivity depth is calculated from the quoted Bayesian upper limits h_0^{95%} and the corresponding noise PSD estimates for the S2 science run.
The sensitivity estimate is performed using the Bayesian sensitivity estimator, for simplicity assuming the sources are distributed isotropically over the sky.
The measured sensitivity depth was determined from the quoted Bayesian upper limits h_0^{95%} combined with the noise PSDs of the S3 and S4 science runs (combined using the harmonic mean).
The sensitivity estimate is calculated using the Bayesian sensitivity estimator, for simplicity assuming the sources to be isotropically distributed on the sky.

d. earlyS5-Crab-BayesPE [71] This search on 9 months of data from the early LIGO S5 science run targeted only the Crab pulsar at twice its rotation rate, using the Bayesian parameter-estimation pipeline. A corresponding narrow-band search using the F-statistic is described in Sec. A 3 a. The targeted search used 201, 222 and 158 days of data from the H1, H2 and L1 LIGO detectors, respectively.
The measured depth is determined from the quoted (i.e. the corrected value in the Erratum) upper limit h_0^{95%} assuming an isotropic polarization prior, and the corresponding noise PSD of the detectors for the early S5 science run data.
e. S5-Known pulsars-BayesPE [81] A search targeting 116 known pulsars using 525 days of H1 data, 532 days of H2 data and 437 days of L1 data from LIGO's fifth science run (S5). The search employed the Bayesian parameter-estimation pipeline.
The measured sensitivity depth is calculated from the quoted Bayesian upper limits h_0^{95%} and the noise PSD of the S5 data.
The estimate is calculated with the Bayesian sensitivity estimator under the assumption that the targets are distributed isotropically over the sky.
f. VSR2-Vela-{BayesPE,F,5-vector} [82] A targeted search for the Vela pulsar using Virgo's second science-run (VSR2) data, using three different methods: Bayesian parameter estimation, the F-statistic (and G-statistic) and the 5-vector method. The data set consisted of 149 days of Virgo data.
Two types of searches and upper limits were computed, namely (i) using uninformative (isotropic) priors on the pulsar orientation, and (ii) using angle priors on cos ι and ψ from electromagnetic observations.
In table V we only give the measured depth corresponding to the isotropic prior, averaged over the three methods, which obtained very similar results. This was computed from the quoted upper limits h_0^{95%} and the noise PSD for the VSR2 run. The measured sensitivity depth obtained when using the angle priors is found to be 462.1 ± 35.0 Hz^{-1/2}.
The estimated sensitivity depth is calculated using the Bayesian sensitivity estimator.
g. {S6,VSR2,4}-Known pulsars-{BayesPE,F,5-vector} [83] This search targeted 195 known pulsars, using 149 days of VSR2 and 76 days of VSR4 data for pulsars with a CW frequency lower than f < 40 Hz and an additional 238 days of S6 data from H1 and 225 days from L1 for faster spinning pulsars with f > 40 Hz. The analysis was done using three different methods: Bayesian parameter estimation, the F-statistic (or G-statistic for restricted angle priors) and the 5-vector method.
The measured sensitivity depth given in table V is the median and MAD standard deviation over the sensitivity depths for the different targets (averaged over high- and low-frequency targets). The sensitivity depths are obtained from the quoted upper limits h_0^{95%} and the corresponding noise PSD estimate of the data used (either S6, VSR2 and VSR4 for high-frequency targets f > 40 Hz, or only VSR2 and VSR4 for low-frequency targets).
The estimated sensitivity is obtained from the Bayesian sensitivity estimator assuming an isotropic prior over the sky, averaged over the high- and low-frequency depth results.
h. O1-Known pulsars-{BayesPE,F,5-vector} [19] In this search 200 known pulsars were targeted using three different methods: Bayesian parameter estimation, the F-statistic (or G-statistic for restricted angle priors) and the 5-vector method. The searches used 78 and 66 days of H1 and L1 data, respectively, from the first observational run of Advanced LIGO (O1).
The measured sensitivity depth is obtained from the quoted Bayesian upper limits h_0^{95%} over all targets and the corresponding noise PSD for the LIGO detectors during O1.
The estimated sensitivity depth is determined from the Bayesian estimator as an all-sky estimate, assuming the targets to be isotropically distributed over the sky.
Appendix B: CW signal model and F-statistic

A plane gravitational wave arriving from a direction n̂ (unit vector) can be written [93] in TT gauge (in the notation of [94]) as a purely spatial strain tensor h↔ with two polarizations +, ×, namely

h↔(τ) = h_+(τ) e↔_+ + h_×(τ) e↔_× ,

where τ is the emission time of the signal in the source frame, and e↔_+ and e↔_× are the two polarization basis tensors, which can be constructed from a right-handed orthonormal basis {l̂, m̂, −n̂} as e↔_+ = l̂⊗l̂ − m̂⊗m̂ and e↔_× = l̂⊗m̂ + m̂⊗l̂. The measured scalar CW signal h^X(t) at time t in detector X is the response of the detector to the GW tensor h↔(τ^X(t)), where τ^X(t) denotes the emission time of a wavefront that reaches detector X at time t. This timing relationship depends on the sky-position n̂ of the source, as well as on any binary-orbital parameters in the case of a CW from a neutron star in a binary system, as it describes the time-dependent light-travel time from the source to the detector. In the long-wavelength limit we assume the GW wavelength to be much larger than the detector arm length, which is a good approximation for current ground-based detectors up to kHz frequencies. This allows us to write the detector response as a tensor contraction (in both tensor indices):

h^X(t) = d↔^X : h↔(τ^X(t)) ,

where d↔^X = û⊗û − v̂⊗v̂ for interferometer arms along the unit vectors û and v̂.
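As a sanity check of these definitions, the following sketch (our own illustration; the particular basis vectors and amplitudes below are arbitrary choices) builds the polarization tensors and verifies that they are traceless and transverse to the propagation direction, then evaluates the detector contraction:

```python
import numpy as np

# Arbitrary propagation direction n and a right-handed orthonormal
# basis {l, m, -n}, i.e. l x m = -n.
n = np.array([0.0, 0.0, -1.0])
l = np.array([1.0, 0.0, 0.0])
m = np.array([0.0, 1.0, 0.0])

# Polarization basis tensors e_+ and e_x as defined in the text.
e_plus = np.outer(l, l) - np.outer(m, m)
e_cross = np.outer(l, m) + np.outer(m, l)

# Detector tensor d = u (x) u - v (x) v for arms along u, v, and the
# scalar response as a double contraction d : h.
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
d = np.outer(u, u) - np.outer(v, v)
hp, hx = 1e-24, 5e-25  # illustrative polarization amplitudes
h_gw = hp * e_plus + hx * e_cross
response = np.einsum('ab,ab->', d, h_gw)

# TT-gauge properties: symmetric, traceless, transverse to n.
assert abs(np.trace(e_plus)) < 1e-15 and abs(np.trace(e_cross)) < 1e-15
assert np.allclose(e_plus @ n, 0) and np.allclose(e_cross @ n, 0)
print(response)
```

With this (deliberately aligned) geometry only the + polarization couples to the detector, so the response reduces to 2 h_+; normalization conventions for the antenna-pattern functions vary between references.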
Appendix C: Distribution of the F-statistic maximized over correlated templates

It has been a long-standing assumption (e.g. [1,47]) that the distribution of the statistic 2F*(x) ≡ max_{λ_i} 2F(x; λ_i) in Gaussian noise x, maximized over a template bank λ_i ∈ T of i = 1...N (generally correlated) templates, can be modelled by assuming maximization over an 'effective' number N' of uncorrelated trials instead, namely

P(2F* | N') = N' [cdf_0(2F*)]^{N'−1} pdf_0(2F*) , (C1)

where pdf_0(2F) = P(2F | ρ = 0) is the single-template F-statistic distribution in pure Gaussian noise, which follows a central χ² distribution (with four degrees of freedom in the fully-coherent case, Eq. (13), or 4 N_seg degrees of freedom for a semi-coherent F-statistic over N_seg segments, Eq. (27)), and cdf_0 is the corresponding cumulative distribution. We show here by counter-example that the model of Eq. (C1) is not generally accurate, as correlations between templates do not simply modify N' but also change the functional form of the distribution. It has been hypothesized previously [1] that these (already-observed) deviations might be due to certain approximations (cf. [42]) used in the numerical implementation of the F-statistic. While such effects will account for some amount of deviation, one can show this effect to be quite small overall.
We demonstrate the fundamentally statistical nature of this discrepancy using a simpler example: we generate a time-series {x_j}, j = 0...N−1, of N = 200 samples drawn from a Gaussian distribution and compute its Fourier transform x̃_k, normalized such that E[|x̃_k|²] = 2, so that 2F_2(x, f) ≡ |x̃(f)|² follows a central χ² distribution with two degrees of freedom in every frequency bin f. We can therefore set pdf_0(2F_2) = χ²_2(2F_2; 0) and use the corresponding cdf in Eq. (C1).
We consider different degrees of oversampling by zero-padding the time-series to a multiple (denoted as the oversampling factor in Fig. 11) of the original N time samples: the N/2 − 1 = 99 (positive) frequency bins without oversampling are strictly uncorrelated (and we also know that there can be at most N = 200 independent templates in total, given the length of the initial time-series). With increasing oversampling, the correlations between frequency bins increase. We repeat this process 10^6 times for different noise realizations; in each case we compute 2F*_2(x) as the maximum over all (positive) frequency bins of the Fourier power, and histogram these values. We then fit the number of effective templates N' in the theoretical distribution of Eq. (C1) by minimizing the (symmetric) Jensen-Shannon divergence between the measured and theoretical distributions. The results are shown in Fig. 11 for different oversampling factors. We see that with increased oversampling, i.e. more correlations between 'templates' (i.e. frequency bins), the functional form of the histogram agrees less and less with the theoretical distribution assuming independent templates. The effect appears to saturate for oversampling factors ≳ 10, with N' ∼ 230 greater than the known maximal number (i.e. N = 200) of (strictly) independent templates in this vector space.
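The no-oversampling baseline of this experiment can be reproduced in a few lines. This is a simplified sketch: we use far fewer noise realizations than the 10^6 above, and we fit the effective number of templates by maximum likelihood (which has a closed form for χ²_2 bins) rather than by the Jensen-Shannon divergence:

```python
import numpy as np

rng = np.random.default_rng(1)
N, ntrials = 200, 2000

max2F = np.empty(ntrials)
for i in range(ntrials):
    x = rng.standard_normal(N)
    X = np.fft.rfft(x)                   # no zero-padding: independent bins
    twoF = (2.0 / N) * np.abs(X) ** 2    # normalized so each bin ~ chi^2_2
    max2F[i] = twoF[1:-1].max()          # drop the (real) DC and Nyquist bins

# Under Eq. (C1) with chi^2_2 bins, cdf_0(y) = 1 - exp(-y/2), and the
# maximum-likelihood estimate of the effective trials number is
#   N_eff = -ntrials / sum_i ln cdf_0(y_i).
log_cdf = np.log1p(-np.exp(-max2F / 2.0))
N_eff = -ntrials / log_cdf.sum()
print(N_eff)  # close to the 99 independent positive-frequency bins
```

Rerunning with `np.fft.rfft(x, n=oversamp * N)` for oversampling factors > 1 introduces correlated bins, and the fitted effective number then drifts above the nominal template count while the histogram shape departs from Eq. (C1).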
There is no simple or intuitive explanation for this effect that we are aware of, but it is reminiscent of a similarly surprising result found in the localization of the maximum over different assumed signal durations of transient CW signals, see Figs. 8 and 9 in [96]. The distribution of the statistic is identical in each time-step, but the steps are correlated, resulting in a peculiar non-uniform distribution of the location of the maximum.