Improved all-sky search method for continuous gravitational waves from unknown neutron stars in binary systems

Continuous gravitational waves from spinning deformed neutron stars have not been detected yet, and are one of the most promising signals for future detection. All-sky searches for continuous gravitational waves from unknown neutron stars in binary systems are the most computationally challenging search type. Consequently, very few search algorithms and implementations exist for these sources, and only a handful of such searches have been performed so far. In this paper, we present a new all-sky binary search method, BinarySkyHou$\mathcal{F}$, which extends and improves upon the earlier BinarySkyHough method, and which was the basis for a recent search (Covas et al. [1]). We compare its sensitivity and computational cost to those of previous methods, showing that it is both more sensitive and more computationally efficient, which allows for broader and more sensitive searches.


I. INTRODUCTION
Continuous gravitational waves (CWs) are long-lasting and nearly monochromatic gravitational waves, expected to be emitted by deformed spinning neutron stars (NSs) due to their time-varying quadrupole moment [e.g. 2]. Although many CW searches have been performed to date, using data from the LIGO (H1 and L1) and Virgo (V1) detectors, no detection has been achieved yet (see [3] for a recent review). The expected CW amplitudes are several orders of magnitude smaller than those of the compact-binary-coalescence signals currently being routinely detected. Therefore, the combined analysis of months to years worth of data is required to accumulate enough signal-to-noise ratio.
When searching for CWs from known pulsars, all the phase-evolution parameters are known from electromagnetic observations, which allows one to perform statistically optimal searches by coherent matched filtering with very little required computing power. All-sky CW searches for unknown neutron stars represent the opposite extreme, where no prior information about the signals is available, requiring an expensive explicit search over the phase-evolution parameters.
Furthermore, because the required parameter-space resolution increases rapidly with longer coherent integration time, the resulting computing cost explodes and makes it impossible to analyze longer stretches of data by coherent matched filtering. This computing-cost problem is pushed to the extreme when searching for unknown neutron stars in binary systems, as we then also need to search over the unknown binary orbital parameters [4]. Therefore, all-sky searches for unknown neutron stars in binary systems are the most computationally challenging type of search.
The primary strategy followed by computationally limited CW searches is to break up the data into shorter segments that can be coherently analyzed individually, and then to combine these coherent results across segments. These are the so-called semi-coherent methods (see [5] for a recent review of search methods). The resulting coarser parameter-space resolution entails a reduced computational cost, which allows for analyzing larger datasets and thereby regaining sensitivity. As a result, semi-coherent methods are typically more sensitive than fully coherent matched filtering at a fixed computational budget [6,7].
The most commonly used coherent detection statistics are the F-statistic and the Fourier power (for sufficiently short segments, where a simple sinusoid can approximate the signal). The F-statistic is obtained by analytically maximizing the likelihood ratio over the four unknown amplitude parameters of a CW signal [8,9]. Although this statistic was initially thought to be optimal, its implicit amplitude priors have been shown to be unphysical [10]. Using better physical priors results in a more sensitive Bayes factor, albeit (currently) at increased computational cost, which is why this is not yet a viable alternative to the F-statistic for wide parameter-space searches. However, for segments short compared to a day, a new detection statistic has recently been found [11] that is more sensitive than F at no extra computing cost, termed the dominant-response statistic F_AB.
Using Fourier power over short segments directly as a coherent detection statistic is computationally cheaper, given that no phase demodulation or other additional calculations are needed. However, one limitation of this statistic is the constrained maximum coherent length (about T_seg ≲ 30 minutes), resulting from the approximation of the signal as a simple sinusoid. Typically this is expressed as the criterion that the signal power remains in a single frequency bin (of size 1/T_seg). Furthermore, while demodulated statistics (such as F and F_AB) can naturally combine data from several detectors coherently [9,11], this is not straightforward to achieve for short Fourier transforms [12] and is not commonly done. Therefore, constructing semi-coherent statistics on demodulated coherent statistics is generally more sensitive and flexible than using Fourier power.
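To make the single-bin criterion concrete, the following minimal sketch (not from the paper; the function name and the drift estimate are our illustrative assumptions) estimates the longest segment for which the binary Doppler drift, |df/dt| ~ f0 · a_p · Ω², keeps the signal within one Fourier bin, i.e., |df/dt| · T_seg ≤ 1/T_seg:

```python
import math

def max_single_bin_tseg(f0, a_p, omega):
    """Longest segment length (s) for which a binary-modulated signal of
    frequency f0 (Hz) stays within one Fourier bin of width 1/Tseg.

    Uses the drift criterion |df/dt| * Tseg <= 1/Tseg, with the maximum
    binary Doppler drift estimated as |df/dt| ~ f0 * a_p * Omega^2
    (a_p in light-seconds, Omega = 2*pi/P_orb in rad/s). This is a
    rough order-of-magnitude sketch, not the pipeline's exact criterion.
    """
    fdot_max = f0 * a_p * omega**2
    return 1.0 / math.sqrt(fdot_max)

# Illustrative numbers: 300 Hz signal, a_p = 1 ls, P_orb = 10 h
omega = 2 * math.pi / (10 * 3600)
tseg = max_single_bin_tseg(300.0, 1.0, omega)
```

Lower frequencies or milder orbits allow longer segments, consistent with the ~30-minute ballpark quoted above for typical parameters.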
Only two previous all-sky binary pipelines have been used in searches before BinarySkyHouF [1], namely TwoSpect [13] and BinarySkyHough [14]. TwoSpect was the first pipeline for all-sky CW searches for unknown NSs in binary systems [15]. BinarySkyHough is an extension of SkyHough [16] (an all-sky pipeline for isolated systems) to searches for NSs in binary systems, which yields higher sensitivity than TwoSpect thanks to its more sensitive detection statistic and its usage of GPU parallelization. Two recent all-sky binary searches deployed BinarySkyHough on data from Advanced LIGO's O2 and O3 observing runs [17,18], although over a reduced parameter space in frequency and binary parameters compared to the TwoSpect search.
BinarySkyHough uses short-segment Fourier power as its coherent detection statistic, limiting its attainable sensitivity (as discussed above). Here we present BinarySkyHouF, an extension of BinarySkyHough, which features several improvements compared to the previous pipeline:
• Use of demodulated coherent statistics (such as the F- or F_AB-statistic) instead of short-segment Fourier power.
• Various code-implementation improvements (such as GPU coalesced memory access) and optimizations, increasing computational efficiency.
As will be shown in this paper, the new search pipeline is both more sensitive and more computationally efficient than BinarySkyHough, i.e., for the same coherent segment length T_seg and mismatch distribution, it achieves higher sensitivity at lower computational cost.
A key ingredient of the new pipeline is the use of (low-order) Taylor-expanded phase parameters to describe the binary motion over the (short) coherent segments, instead of the physical binary orbital parameters. These Taylor coordinates allow for a substantial dimensional reduction and solve the problem of covering the highly degenerate per-segment coherent parameter space with an efficient template bank. However, this approach limits the sensitivity to signals from binary systems with orbital periods substantially longer than the segment length T_seg.
The development of more sensitive all-sky binary search methods is of utmost importance, since more than half of all known millisecond pulsars are part of a binary system [20]. Furthermore, accretion from a companion provides a plausible mechanism to generate an asymmetry or excite an r-mode with an amplitude detectable by the current generation of gravitational-wave detectors, as recently discussed in [1]. This paper is organized as follows: in Sec. II we introduce the approximate signal model used to compute the F-statistic; in Sec. III we present the new BinarySkyHouF pipeline and compare it to its predecessor; in Sec. IV we show sensitivity comparisons of different detection statistics; in Sec. V we summarize the main results of this paper and lay out some ideas for future work.

II. SIGNAL MODEL

A. Physical phase model
Assuming a slowly varying NS spin frequency, the phase of a CW signal in the source frame can be expressed in terms of a Taylor expansion around a reference time τ_ref, namely

φ(τ) = φ_0 + 2π Σ_{k=0}^{s} f_k (τ − τ_ref)^{k+1} / (k+1)!,   (1)

where τ denotes the time in the source frame, and s is the order of spindown parameters f_k needed to accurately describe the intrinsic frequency evolution. The evolution of the frequency f(τ) and higher-order spindowns f^{(k)}(τ) is given by

f^{(k)}(τ) = Σ_{j=k}^{s} f_j (τ − τ_ref)^{j−k} / (j−k)!,   (2)

and the frequency and spindown parameters f_k in the phase model of Eq. (1) are defined at the reference time τ_ref, i.e.,

f_k ≡ f^{(k)}(τ_ref).   (3)

In order to obtain the phase of the signal in the frame of a detector, we need to transform it from the source frame by taking into account the motion of the NS and the motion of the detector with respect to the solar system barycenter (SSB). We absorb the unknown relative distance of the source with respect to the SSB into the reference time τ_ref, and here we neglect relativistic effects such as Shapiro and Einstein delays and the transverse proper motion of the source. We can obtain the timing relation in two steps, first linking the wavefront-emission time τ in the source frame to its arrival time t_SSB in the SSB frame, namely

t_SSB = τ + R(τ),   (4)

where R(τ) is the radial distance (in light-travel time) of the source to the binary barycenter (BB) [4], with R > 0 when the source is further away from us than the BB. In the second step we can relate the SSB time to the arrival time t at detector X by the Rømer-delay expression

t_SSB = t + r_X(t) · n,   (5)

where r_X(t) is the position vector (in light-travel time) of detector X with respect to the SSB, and n is the sky-position unit vector pointing from the SSB to the BB. The radial distance R of the source to the BB can be expressed [4] as

R = a_p [ sin ω (cos E − e) + cos ω √(1 − e²) sin E ],   (6)

in terms of the eccentric anomaly E, given by Kepler's equation

E − e sin E = Ω (τ − t_p),   (7)

where a_p is the projected semi-major axis of the orbital ellipse (in light-travel time), Ω = 2π/P_orb is the (average) orbital angular velocity (corresponding to the period P_orb), e is the orbital eccentricity, t_p is the time of periapse passage, and ω is the (angular) argument of periapse.
For small-eccentricity orbits, this can be approximated by the (linear in e) ELL1 model [4,21,22], namely (dropping a constant term −3a_p η/2)

R(τ) ≈ a_p [ sin ψ + (κ/2) sin 2ψ − (η/2) cos 2ψ ],   (8)

with the Laplace-Lagrange parameters defined as κ ≡ e cos ω and η ≡ e sin ω, and the orbital phase

ψ ≡ Ω (τ − t_asc),   (9)

using the time of ascending node t_asc instead of the time of periapse passage t_p, which (to lowest order in e) are related by t_asc = t_p − ω/Ω. Expressions for R up to any order in e are found in Appendix C of [23].
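As a concrete illustration, the ELL1 radial distance can be evaluated as in the following sketch. The function name is ours, and the coefficients follow the standard first-order-in-e expansion; conventions may differ from the paper's equation by the dropped constant term.

```python
import math

def radial_distance_ell1(tau, a_p, omega, t_asc, kappa=0.0, eta=0.0):
    """Radial distance R (light-seconds) of the source from the binary
    barycenter in the small-eccentricity ELL1 model, to first order in e.

    psi is the orbital phase measured from the ascending node; for a
    circular orbit (kappa = eta = 0) this reduces to R = a_p * sin(psi).
    Sign convention: R > 0 when the source is behind the barycenter.
    """
    psi = omega * (tau - t_asc)
    return a_p * (math.sin(psi)
                  + 0.5 * kappa * math.sin(2 * psi)
                  - 0.5 * eta * math.cos(2 * psi))
```

For a circular orbit this gives R = a_p sin ψ, the familiar projected circular motion.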
B. Short-segment SSB Taylor coordinates {u_k}

Given that all-sky binary CW searches need to cover a huge signal parameter space with finite computing resources, the longest coherent segment lengths T_seg that can be used are typically very short (i.e., much shorter than a day). If we further assume the orbital periods to be much longer than the short segments, i.e., T_seg ≪ P_orb, then in this short-segment regime we can resort to Taylor-expanding the phase (in the SSB) around each segment mid-time t_m (translated to the SSB), similar to what was done in [4,22], namely

φ(t_SSB) ≈ φ(t_m) + 2π Σ_{k=1}^{k_max} (u_k / k!) (t_SSB − t_m)^k,   (10)

which defines the (SSB) Taylor coordinates {u_k}_{k=1}^{k_max} as

u_k ≡ (1/2π) d^k φ / dt_SSB^k |_{t_m}.   (11)

Note that for segments short compared to a day one could also define Taylor coordinates in the detector frame instead of in the SSB, but this would result in detector-dependent coordinates that are not suitable for our present search method. The resulting expressions are given in App. A for reference.
Here the derivatives τ^{(k)} ≡ d^k τ / dt_SSB^k of the source-to-SSB timing relation τ(t_SSB) can be further expanded using Eq. (4), involving derivatives of the binary radial distance R(τ) of Eq. (6) with respect to t_SSB, which can in turn be expanded in terms of the source-frame time derivatives d^k R(τ)/dτ^k. This analysis is complicated by the fact that the binary radial distance R(τ) of Eq. (6) is a function of the source-frame (emission) time τ, not the SSB (arrival) time t_SSB of a wavefront. In [4,22] this difficulty could be neglected for the purposes of computing the parameter-space metric, where a slow-orbit approximation, i.e., R(τ) ≈ R(t_SSB), is sufficient. However, in the present application we want to preserve higher accuracy for the purpose of using these coordinates for coherent matched filtering.
Substituting into the timing derivatives of Eq. (4), we can obtain explicit expressions, thanks to the iterative backwards dependency of the τ^{(k)} on only lower-order derivatives. From these expressions we obtain the explicit Taylor coordinates u_k, where for the present application (such as [1]) at most first- or second-order u_k will be needed in practice, as discussed in Sec. III C 1. To leading order in the eccentricity these read

u_1 ≈ f(t_m) [1 − Ṙ(t_m)] ≈ f_0 − f_0 a_p Ω cos[Ω(t_m − t_asc)],   (15)
u_2 ≈ f_1 − f_0 R̈(t_m) ≈ f_1 + f_0 a_p Ω² sin[Ω(t_m − t_asc)],   (16)

assuming the small-eccentricity approximation of Eq. (8) and (here) neglecting the NS-BB time delay of Eq. (4) as a higher-order correction, i.e., τ(t_m) ≈ t_m. These u_1, u_2 coordinates have units of Hz and Hz², respectively, and depend on the physical parameters {{f_k}, a_p, Ω, t_asc, e, ω}. Using these coordinates assumes that we have performed the standard SSB demodulation of the signal for any given sky position n, which is typically expressed in terms of the right ascension α and declination δ in equatorial coordinates.
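A minimal sketch of these leading-order Taylor coordinates, assuming a circular orbit for compactness (the paper's full expressions include eccentricity and retardation corrections, so this is an illustrative approximation, not the pipeline's exact computation):

```python
import math

def taylor_coords_circular(f0, f1, a_p, omega, t_asc, t_m):
    """Leading-order Taylor coordinates (u1, u2) at segment mid-time t_m
    for a circular orbit, neglecting the source-to-SSB retardation
    (tau(t_m) ~ t_m) and higher-order corrections.

    u1 [Hz] is the instantaneous demodulated frequency, u2 [Hz^2] its
    drift:
      Rdot = a_p * Omega * cos(psi)   =>  u1 ~ f0 * (1 - Rdot)
      Rddot = -a_p * Omega^2 * sin(psi) => u2 ~ f1 - f0 * Rddot
    """
    psi = omega * (t_m - t_asc)
    u1 = f0 * (1.0 - a_p * omega * math.cos(psi))
    u2 = f1 + f0 * a_p * omega**2 * math.sin(psi)
    return u1, u2
```

Averaging u2² over a full orbit reproduces the segment-averaged estimate ⟨u_2²⟩ ≈ ½ f_0² a_p² Ω⁴ (for f_1 = 0) used later for the truncation-mismatch analysis.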
The resulting (constant) parameter-space metric for the Taylor coordinates {u_k} (valid for any signal phase of the form Eq. (10)) is found in Eq. (57) of [4].

III. BINARYSKYHOUF
In this section we present a summary of the new BinarySkyHouF pipeline and its main advantages over the previous BinarySkyHough.

A. Summary of the previous and new pipeline
The predecessor SkyHough and BinarySkyHough algorithms are described in more detail in [16] and [14]; here we only provide a short overview. Both analyze the frequency-time matrix of short-Fourier-transform (SFT) power, searching for "tracks" (corresponding to different source parameters) that lie above the statistical expectation for noise.
SkyHough is limited to searches for signals from isolated systems, while BinarySkyHough is an extension designed for all-sky searches for unknown neutron stars in binary systems. Both are extremely fast model-based pipelines, due to the highly efficient algorithms used to analyze the sky-maps and their effective use of look-up tables (see [14,16] for details). Furthermore, BinarySkyHough leverages the computational advantages provided by GPUs by parallelizing the most expensive steps in the algorithm, further massively reducing the runtime of a search.
A BinarySkyHough search is divided into two consecutive stages, using different detection statistics. In the first stage, a "Hough" weighted sum of 1s and 0s (depending on whether the SFT power crosses a threshold or not) is calculated, and all of the templates are sorted by the resulting significance (a normalized Hough number count with normal distribution, see Eq. (25) of [14]). The frequency-time pattern used for the tracks in the first stage is an approximation to the exact one, due to the usage of look-up tables (explained in Sec. IV B of [14]). In the second stage, the refinement stage, a fraction of the best-ranked templates is analyzed again, this time using a StackSlide weighted sum of SFT power, which has higher sensitivity than summing weighted 1s and 0s (e.g., see [19]), and using a more accurate expression for the frequency-time pattern.
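The two ranking statistics just described can be sketched as follows. The normalizations (exponential-distributed SFT power under Gaussian noise, a simple weighted StackSlide average) are illustrative assumptions, not the pipeline's exact definitions.

```python
import math

def hough_significance(powers, weights, threshold):
    """Weighted Hough number count of SFT-power threshold crossings along
    a track, normalized to a significance. Under Gaussian noise the
    normalized SFT power is ~exponential, so each crossing is a Bernoulli
    trial with probability p = exp(-threshold) (illustrative model).
    """
    p = math.exp(-threshold)
    n = sum(w for pw, w in zip(powers, weights) if pw > threshold)
    mean = p * sum(weights)
    sigma = math.sqrt(p * (1.0 - p) * sum(w * w for w in weights))
    return (n - mean) / sigma

def stackslide_power(powers, weights):
    """Weighted StackSlide average of normalized SFT power along a track,
    used in the refinement stage (illustrative normalization)."""
    return sum(w * pw for pw, w in zip(powers, weights)) / sum(weights)
```

A loud track raises both statistics, but the StackSlide sum keeps the full power information that the binary number count discards, which is why the refinement stage is more sensitive.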
The central new feature of the BinarySkyHouF pipeline is to use a demodulated coherent detection statistic for the segments, such as the F- [8,9] or F_AB-statistic [11], instead of the number count or SFT power, but otherwise still benefit from the highly efficient GPU-based BinarySkyHough-type algorithm to combine the coherent results into a semi-coherent statistic. Three main benefits arise from using a demodulated coherent statistic:
1. Demodulation removes the constraint on the maximum segment length, as the signal is no longer approximated as a pure sinusoid. This allows the algorithm to turn increases in computing power into better sensitivity (shown in Sec. IV C).
2. The per-detector data is combined coherently, which reduces the number of coherent segments that need to be combined in the semi-coherent stage, improving sensitivity (shown in Sec. IV A) and reducing computational cost (shown in Sec. III D).
3. The parameter-space resolution and resulting mismatch can be controlled as a free parameter (rather than the fixed 1/T_seg Fourier resolution of SFTs), which will be discussed in Sec. III C 1.
When combining coherent results to calculate a semi-coherent detection statistic, it has been shown that applying per-segment weights can improve the sensitivity [26]. While there is currently no analytic argument for using a weighted sum of F- or F_AB-statistics, empirically we find (shown in Sec. IV A) that using weights also improves the sensitivity for these detection statistics. The weight w_ℓ at segment ℓ is given by

w_ℓ = (K / S_n;ℓ) Σ_{SFTs ∈ ℓ} (a² + b²),

where K is a normalization factor, S_n;ℓ is the noise power spectral density of segment ℓ, and a and b are the antenna patterns of a detector (evaluated at the mid-time of every SFT), with the sum running over all the SFTs in segment ℓ. When the segment has just one SFT, we recover the weights given by Eq. (22) of [14]. For the dominant-response statistic we use analogous weights, involving also the cross term C = Σ_{SFTs ∈ ℓ} a b.
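A sketch of such per-segment weights, assuming the proportionality w_ℓ ∝ (1/S_n;ℓ) Σ (a² + b²) described above; the normalization choice (unit mean weight) is ours:

```python
def segment_weights(psd, antenna_ab):
    """Per-segment weights for the semi-coherent sum (illustrative form):
    w_l proportional to (1/S_n;l) * sum over the SFTs in segment l of
    (a^2 + b^2), normalized so the weights average to one.

    psd:        list of noise PSDs S_n;l, one per segment
    antenna_ab: per segment, a list of (a, b) antenna-pattern pairs,
                one pair per SFT mid-time
    """
    raw = [sum(a * a + b * b for a, b in seg) / s
           for s, seg in zip(psd, antenna_ab)]
    k = len(raw) / sum(raw)  # normalization constant K: mean weight = 1
    return [k * r for r in raw]
```

Segments with lower noise or better antenna response receive larger weights, so they contribute more to the semi-coherent sum.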
As discussed in Sec. II B, for computing-cost reasons the coherent segments of all-sky binary searches need to be very short, which allows us to use a small number (currently one or two) of Taylor coordinates u_k to represent the spindown and binary orbital motion in the coherent segments. Using physical parameters, we would need to build an (at least) 6-dimensional parameter-space grid over {α, δ, f_0, a_p, Ω, t_asc}, while using the Taylor coordinates we effectively only need three (or four) parameter-space dimensions for the short segments currently considered, namely {α, δ, u_1 (, u_2)}. This reduces complexity (the parameter-space metric in physical parameters would be hugely degenerate for short segments, cf. [4,22]) and lowers the resulting computational cost.
The u_k template bank is constructed as a hyper-cubic lattice in coordinate space. The code processes the sky in patches defined by an isotropic grid with cells of fixed solid angle, using partial Hough map derivatives [14] to carry out the semi-coherent sky mapping. Coherent per-segment statistics are computed only for the center of each sky-patch, using the corresponding antenna-pattern modulations and weights.

B. Semi-coherent interpolation
In the previous section we obtained the Taylor coordinates u_k, which together with the sky-position coordinates will be used in the coherent stage in order to calculate the F-statistic values over the coordinates {α, δ, {u_k}}. In the semi-coherent stage, on the other hand, we use physical coordinates to combine the per-segment statistics, namely {α, δ, {f_k}, a_p, Ω, t_asc, e, ω}. For every semi-coherent template, we therefore need to calculate the appropriate mapping to the corresponding Taylor coordinates {u_k} and coherent sky position.
In addition to using different signal parameters, the semi-coherent template grids also generally need to be finer than the coherent per-segment grids, which results in the need to interpolate the coherent results when combining them semi-coherently (typically using nearest-neighbor interpolation). This is a generic feature of the semi-coherent approach (cf. [7,28]), and in SkyHough-derived pipelines it takes the form of the so-called master equation [14,16], linking sky offsets to the resulting effective frequency shifts of the signal.
The SkyHough-type sky interpolation works by breaking the sky into several sky patches, as mentioned above, where the center of each patch is used as the coherent sky template for every semi-coherent sky-template in the same sky patch.The resulting offset in sky-position between the semi-coherent and coherent template results in compensating offsets in the {u k } coordinates, generalizing the Hough master equation.
A simple way to derive the shift in u_k coordinates due to an offset δn in sky position is to use the full detector-frame Taylor coordinates u_k^X for each detector X, given in the appendix in Eq. (A5). Using this we can express the induced shifts δu_k^X in terms of the detector velocity v_m^X and acceleration a_m^X at the segment mid-time t_m translated to the SSB. To remove the detector dependency we average over detectors, which will be a good approximation for δu_1, given that the detector velocity is dominated by the (detector-independent) orbital motion of the Earth. On the other hand, the detector acceleration a_m^X in δu_2^X is dominated by the Earth's rotation, so averaging over detectors might be a less reliable approximation, but should still work well as long as the detectors are not too far apart, such as LIGO H1 and L1. Therefore we arrive at the following generalized master equations:

δu_1 = f_0 v̄(t_m) · δn,   (25)
δu_2 = f_0 ā(t_m) · δn,   (26)

with the detector-averaged velocity N_det v̄ ≡ Σ_X v^X and acceleration N_det ā ≡ Σ_X a^X. Eq. (25) agrees with the previous Hough-on-F-statistic master equation found in [16] and [25] (with implicit detector averaging).
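A minimal sketch of these leading-order master-equation shifts, assuming δu_1 ≈ f_0 v̄·δn and δu_2 ≈ f_0 ā·δn with the velocity in units of c and the acceleration in units of c per second (this leading-order form is our reading of the detector-averaged equations; function names are ours):

```python
def delta_u1(f0, v_avg, dn):
    """Sky-offset-induced shift in u1 (generalized Hough master equation,
    leading order): delta_u1 ~ f0 * (v_bar . delta_n), with the
    detector-averaged velocity v_bar in units of c."""
    return f0 * sum(vi * di for vi, di in zip(v_avg, dn))

def delta_u2(f0, a_avg, dn):
    """Analogous shift in u2 from the detector-averaged acceleration
    a_bar (units of c per second)."""
    return f0 * sum(ai * di for ai, di in zip(a_avg, dn))
```

For a zero sky offset both shifts vanish, and for a fixed offset the shifts trace a track over segment mid-times as the Earth's velocity and acceleration vectors rotate.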
The u_1 master equation is illustrated in Fig. 1, where the u_1 values with the highest signal power are plotted as a function of segment mid-time t_m for an offset δn = n − n_d between the signal sky position n and the coherent demodulation point n_d. In addition we plot the predicted track of the shifted u_1 given by Eq. (25), which closely follows the path that minimizes the mismatch. The mismatch in Fig. 1 between the path followed by the shifted u_1 and the maximum path is around 0.1%, whereas the mismatch between the non-shifted u_1 and the maximum path would be around 80%.
The parameter-space bounds for each of the Taylor coordinates are found using Eqs. (25) and (26), by calculating the maximum possible values over the given physical parameter space.
For computational-efficiency reasons (namely the look-up table approach used here, see [16]) the first-stage "Hough" semi-coherent summation actually uses approximate versions of Eqs. (25) and (26), in which the signal frequency is replaced by a "middle" frequency f_H, obtained from the midpoints f_kH of the frequency and spindown ranges currently being searched over, evaluated at the mid-time t_mid of the full dataset.

C. Mismatch
In this subsection we describe and characterize the different sources of mismatch for the BinarySkyHouF pipeline. The mismatch is defined as the relative loss of signal power, namely

µ ≡ 1 − ρ_r² / ρ²,

where ρ² is the full signal power (given by Eq. 20 in [11]) and ρ_r² is the signal power recovered by the search. The total mismatch of the BinarySkyHouF pipeline has several contributions, which can be separated into coherent and semi-coherent mismatches. The main contributions to the coherent mismatch are offsets between the signal and the closest template in the coherent template grid, and the usage of the (truncated) Taylor coordinates u_k, while the semi-coherent mismatch is produced by signal-template offsets in the semi-coherent grid and approximations in the interpolation mapping (discussed in the previous section).

Mismatch due to Taylor-coordinate truncation
The usage of a limited number of Taylor coordinates u_k incurs an intrinsic mismatch, due to the corresponding approximation of the signal waveform. In practice we currently envisage using only u_1, or at most up to order u_2, which turns out to be sufficient (given computational constraints) for currently considered practical all-sky binary searches, similar to the recent search [1] using only u_1. Therefore we quantify the mismatch and the corresponding constraints on the maximum coherent segment length T_seg due to truncation at order u_1 or u_2.
The mismatch µ_{u_k} due to the omission of order u_k (and higher) can be estimated as µ_{u_k} ∼ g_kk u_k², using the metric g_kl in u_k-space, which is given in Eqs. (56, 57) of [4]. Using this we can express the mismatch due to truncation of u_{k≥2} or u_{k≥3}. Using a time average ⟨·⟩ over segments together with Eq. (16) for u_2 we obtain ⟨u_2²⟩ ≈ ½ f_0² a_p² Ω⁴ + f_1² (neglecting smaller corrections), and for u_3 we can use Eq. (58) of [4] as an estimate, which yields ⟨u_3²⟩ ≈ ½ f_0² a_p² Ω⁶. The resulting (segment-averaged) mismatch estimates illustrate the fact that the segments must be short compared to the orbital period, i.e., T_seg Ω ≪ 1, in order for the Taylor coordinates u_k to be a good approximation, as discussed in [4].
We can rearrange these equations to obtain a constraint on the maximum coherent time T_seg allowed for a given acceptable mismatch µ_u from Taylor truncation, both when using only u_1 and when truncating after u_2. These expressions for the maximum coherent time are illustrated in Fig. 2 as a function of frequency for different choices of binary orbital parameters; it can be seen that when u_2 is also used the maximum coherent time increases substantially. Figure 3 shows a test of the mismatch predicted by Eq. (32), generating 1000 different signals with a frequency of 300 Hz and random orbital parameters distributed log-uniformly, with orbital periods 2π/Ω ∈ [0.1, 1] days and a_p ∈ [0.01, 1] ls. For each signal we measure the perfectly-matched signal power when using physical coordinates and compare it to the signal power obtained with Taylor coordinates up to order u_1. The corresponding measured mismatch is then compared to the model prediction of Eq. (32). The figure shows that these equations correctly predict the measured mismatch in the range where the metric approximation is expected to be accurate (i.e., below mismatches of ∼ 0.3).
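The truncation constraint can be sketched numerically as follows. The O(1) prefactor comes from the unprojected quadratic-phase variance π²T⁴/180 over a segment of length T, which is our simplifying assumption; the paper's metric g_22 may differ by a geometric factor.

```python
import math

def tseg_max_u1_truncation(mu_max, f0, a_p, omega, f1=0.0):
    """Rough maximum segment length (s) when truncating at u1 (i.e.,
    omitting u2), a sketch under simplifying assumptions:

      mu ~ pi^2 * <u2^2> * Tseg^4 / 180,
      <u2^2> ~ 0.5 * (f0 * a_p * Omega^2)^2 + f1^2  (orbit average),

    solved for Tseg. The 1/180 factor is the variance of the quadratic
    phase term over [-T/2, T/2] with only a constant offset projected out.
    """
    u2_sq = 0.5 * (f0 * a_p * omega**2) ** 2 + f1**2
    return (180.0 * mu_max / (math.pi**2 * u2_sq)) ** 0.25
```

The scaling T_seg,max ∝ µ^{1/4} f_0^{-1/2} a_p^{-1/2} Ω^{-1} makes explicit why higher frequencies and tighter orbits force shorter segments.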

Total mismatch
In the previous subsection we discussed the mismatch contribution from using a limited set of Taylor coordinates. Additionally there will be template-bank mismatches incurred by the coherent and semi-coherent template grids. If we count the Taylor-truncation mismatch µ_u as part of the coherent mismatch µ_c, then the total average (over the template bank) mismatch is given approximately [7] by the sum of the mean coherent and mean semi-coherent mismatches, namely

µ̄ ≈ µ̄_c + µ̄_s.

From this expression it can be seen that if the mean coherent mismatch is reduced while the semi-coherent mismatch is kept fixed, the total mismatch decreases. This is shown in Fig. 4, where the total measured mismatch can be seen for two different cases, which have different u_1 coherent grids but the same semi-coherent grid (the coherent sky position is the same for both cases, and it is shifted from the signal value). A decrease in the total mismatch can be seen for the case where the coherent u_1 grid is finer, as predicted by the previous equation. This represents an improvement over the previous pipeline, where the coherent frequency grid was fixed to equal the Fourier-transform spacing.

Maximum spindown and eccentricity
Although BinarySkyHouF is able to search over the f_1 spindown and eccentricity e parameters in the semi-coherent summation, all-sky searches for neutron stars in binary systems are so computationally expensive that one would usually not explicitly search over these parameters at first. For this reason, we want to estimate the value ranges of these omitted parameters to which the pipeline is still sensitive, which will depend on the particular set-up (such as the amount of data and the coherent time).
The maximum covered spindown value |f_1| is important for interpreting astrophysical upper limits, since it limits the maximum "mountain height" (NS deformation) or r-mode amplitude that the search can find: a larger deformation (or mode amplitude) would produce a larger spindown |f_1| due to the emission of gravitational waves, which could render the signal undetectable by such a search if it were too large.
The semi-coherent grid is constructed by requiring a maximum mismatch µ_M. We can estimate the maximum allowed values from the required resolution in these parameters, |f_1|_max ∼ √(µ_M / g_{f_1 f_1}) and e_max ∼ √(µ_M / g_ee), where g_{f_1 f_1} is obtained from [29] assuming that the refinement factor is equal to N_seg (i.e., there are no gaps), and g_ee is found in [4].
It can be seen that for the spindown f_1 the maximum value only depends on the segment length T_seg and the total amount of data, while the maximum eccentricity e depends on the frequency f_0 and the orbital parameters a_p and Ω. To obtain a limit, one can take the parameters that produce the most conservative eccentricity, or calculate a mean value over the parameter-space boundaries. The eccentricity equation has the exact same functional form as Eq. (43) in [14], while here we make explicit the dependence on the desired maximum mismatch.

FIG. 2. Maximum coherent time T_seg that can be used in a given region of the binary parameter space with a certain mismatch due to the Taylor truncation (assuming zero spindown f_1 = 0). The left plot shows the results for µ_u = 0.1 (solid lines) and µ_u = 0.4 (dashed lines) in Eq. (34), using only the u_1 Taylor coordinate. The right plot shows the same when using both the u_1 and u_2 Taylor coordinates.

FIG. 3. Measured mismatch µ_{u_2} (blue points) from Taylor truncation (averaged over segments) against the predicted mismatch (solid line) given by Eq. (32), using 1000 injections and assuming one year of data from the H1 detector with coherent segments of T_seg = 900 s.
The previous equations quantify the additional mismatch produced by a signal with nonzero spindown f_1 and eccentricity e. Another potential side effect would be a shift in the remaining estimated parameters, due to correlations between the parameters. Note that values exceeding the limits above would not automatically mean such signals are undetectable, only that the resulting mismatch would be larger, thus decreasing the sensitivity of the search.
We can compare the mismatch distributions obtained with the same grid for four different cases: signals with f_1 = 0 and e = 0; signals with the maximum values; signals with values in between (log-uniformly distributed up to the maximum value); and signals with double the maximum value. This is shown in Fig. 5, where it can be seen that signals with parameters at the maximum value (the eccentricity maximum has been calculated using the binary parameters that give the largest eccentricity) increase the mean mismatch by ∼ 0.08, which would reduce the sensitivity of a search by ∼ 5%.

D. Computational model
In this section we explain how the computational cost and random-access memory (RAM) requirements of our pipeline scale with the different set-up variables, updating and expanding Sec. V F of [14].

Coherent computational cost
Due to the calculation of the F-statistic values, the coherent stage has an additional computational cost beyond loading the input data and calculating the partial Hough map derivatives. In order to estimate this cost, we summarize the content of [30]. The cost of calculating the F-statistic (or its related quantities) in segment ℓ at a single sky-patch scales with the timing constants τ_core and τ_buffer (fundamental constants that only depend on the hardware and optimization settings, with τ_buffer usually about one order of magnitude bigger), the number 2N_dterms + 1 of frequency bins used for the calculation of the Dirichlet kernel, the number N_u of coherent u_k templates, and the total number N_T;ℓ of SFTs in segment ℓ. The total coherent computational cost of a single sky-patch then scales with the number of segments, summing the per-segment costs. Since the calculations for each segment are independent of the others, these steps can be easily parallelized. We use an OpenMP loop to take advantage of multi-core CPUs, which can speed up the calculation by approximately the number of cores used.
The total coherent computational cost will scale linearly with the number of sky-patches.
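The per-segment cost model can be sketched as below; the exact combination of the timing constants is our plausible reading of the scaling described above, not the paper's verbatim formula.

```python
def coherent_cost(n_u, n_dterms, n_sft_per_seg, tau_core, tau_buffer):
    """Illustrative coherent-stage cost (s) for one sky-patch, assuming:
    per SFT, the Dirichlet-kernel demodulation costs
    tau_core * (2*N_dterms + 1) per u-template, plus a per-SFT buffering
    cost tau_buffer; the independent per-segment costs are then summed.

    n_sft_per_seg: list with the number of SFTs N_T;l in each segment l.
    """
    return sum(n_sft * (n_u * (2 * n_dterms + 1) * tau_core + tau_buffer)
               for n_sft in n_sft_per_seg)
```

The sum over segments makes the linear scaling with the amount of data (and with the number of sky-patches, by an outer loop) explicit.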

Semi-coherent computational cost
In the semi-coherent stage the coherent detection statistics are combined for every template that is searched.
The cost τ_H,j of the first stage over a sky-patch j scales as
τ_H,j ∝ τ_1 N_fs b(N_binary) g_j(N_α, N_δ, δ_s) h(r), (41)
where N_fs is the number of semi-coherent frequency and spindown templates, b() is a function containing the non-linear dependency on the number of binary templates N_binary, g_j() is a function describing the effective number of semi-coherent sky points needed (due to the SkyHough algorithm), N_α and N_δ are the number of right ascension and declination points in each sky-patch, δ_s is the semi-coherent sky grid resolution, h() is a function that depends on the threshold r set at the coherent stage, and τ_1 is a fundamental timing constant. The total semi-coherent cost scales with the number of sky-patches. In the previous paper [14] it was assumed that b = N_binary and g_j = N_δ N_α, which left some details out.
The function b() depends on the GPU architecture. If we use a CPU, it is simply equal to N_binary, but if we use a GPU it depends non-linearly on parameters such as the occupancy of the GPU cores and the usage of shared memory. This can be seen in Fig. 7, where the non-linear scaling with N_binary is clear.
The function g_j() is less than or equal to N_δ N_α, and it encodes the SkyHough-type sky interpolation mechanism, which depends on the relation between the size of the annulus produced by the Doppler modulations and the size of the semi-coherent sky grid, as explained in Sec. IV B of [16]. At a given timestamp, the sky-patches with n more parallel to v have wider annuli, which may contain several semi-coherent sky pixels, thus lowering the number of sky points that need to be taken into account in the semi-coherent loop. This effect is different at each timestamp, and over a long observing run it produces an average value between one and N_δ N_α for the function g_j(). This effect gives the SkyHough algorithm a computational advantage.
The function h differs from one for a non-zero threshold r, which replaces coherent values below the threshold with 0, thus reducing the computational cost. This function is given by h = e^(−r/r̄), where r̄ is the expected value of the coherent detection statistic.
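As a quick illustration (not part of the pipeline code), the surviving fraction h can be checked with a Monte Carlo, under the simplifying assumption that the coherent statistic follows an exponential distribution with mean r̄:

```python
import numpy as np

rng = np.random.default_rng(42)
r_bar = 2.0       # assumed expected value of the coherent statistic
r_thresh = 4.0    # threshold applied before the semi-coherent sum

# Draw coherent statistic values (an exponential distribution is assumed
# here purely for illustration).
samples = rng.exponential(scale=r_bar, size=1_000_000)

# Fraction of values that survive the threshold, i.e. the factor h by
# which the semi-coherent workload is reduced.
h_measured = (samples > r_thresh).mean()
h_model = np.exp(-r_thresh / r_bar)
print(f"measured h = {h_measured:.4f}, model h = {h_model:.4f}")
```

The measured surviving fraction agrees with e^(−r/r̄) to within Monte Carlo error, confirming that a threshold of a few times r̄ removes the large majority of coherent values before summing.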
We define the average cost τ_H of the first stage as the mean of τ_H,j over the different sky-patches j. The cost τ_R of the second (i.e., "refinement") stage scales as
τ_R ∝ τ_2 N_cand N_a b(N_binary),
where N_cand is the number of candidates that are passed to the second stage, N_a is the number of additional points around each candidate that are searched (when using a finer grid), b() is the same function as before, and τ_2 is a fundamental timing constant. The total semi-coherent computational cost is the sum of the first and second stages.

Total computational cost
In order to estimate the total computational cost of a search, we add the coherent and semi-coherent costs (neglecting other costs such as loading the data and writing output to files, since in a realistic scenario they are negligible):
C_total = Σ_{l=1}^{N_F} N_SP,l (τ_coh,l + τ_H,l + τ_R,l),
where N_F is the number of frequency bands needed to cover a certain frequency range, and N_SP,l is the number of sky-patches at frequency band l. The values of τ_H,l and τ_R,l depend on the frequency band, since N_binary, N_δ, N_α, and N_cand scale with frequency. Figure 6 shows a comparison of the coherent and semi-coherent costs as a function of the number of binary templates. It can be seen that the coherent cost stays constant, while the semi-coherent cost increases, as expected. Past searches using BinarySkyHough and BinarySkyHouF have used N_binary larger than 10^5, so in such typical scenarios the coherent cost is a small fraction of the total cost. If the number of binary templates is small, the coherent computational cost has a non-negligible impact on the total cost. This also happens for isolated searches, where the search over binary parameters is replaced by a search over f_1 values, whose number is usually much smaller than 10^5. In these two cases, calculating the F-statistic might lower the sensitivity or the span of a search.
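The structure of this cost model can be sketched in code. The functional forms and constants below are illustrative placeholders, not the pipeline's calibrated timing model: b() and g_j() are taken at their upper bounds b = N_binary and g_j = N_δ N_α, the per-segment coherent form is an assumption, and all τ values are made up.

```python
# Illustrative timing constants (arbitrary values; hardware-dependent in reality).
TAU_CORE, TAU_BUFFER = 1e-8, 1e-7
TAU_1, TAU_2 = 1e-9, 1e-9

def coherent_cost(n_seg, n_sft, n_u, n_dterms):
    """Coherent F-statistic cost of one sky-patch, summed over segments.
    Assumed form: linear in the number of SFTs and Dirichlet-kernel terms,
    plus a per-template buffer overhead."""
    return n_seg * n_u * (n_sft * (2 * n_dterms + 1) * TAU_CORE + TAU_BUFFER)

def semicoherent_cost(n_fs, n_binary, n_sky, n_cand, n_a, h=1.0):
    """First (Hough) plus second (refinement) stage costs for one sky-patch,
    with b() and g_j() replaced by their upper bounds."""
    return TAU_1 * n_fs * n_binary * n_sky * h + TAU_2 * n_cand * n_a * n_binary

def total_cost(bands):
    """Sum over frequency bands l, each with N_SP,l sky-patches."""
    return sum(
        b["n_sp"] * (coherent_cost(b["n_seg"], b["n_sft"], b["n_u"], b["n_dterms"])
                     + semicoherent_cost(b["n_fs"], b["n_binary"], b["n_sky"],
                                         b["n_cand"], b["n_a"]))
        for b in bands
    )

band = dict(n_sp=50, n_seg=3000, n_sft=200, n_u=1, n_dterms=8,
            n_fs=3000, n_binary=10**5, n_sky=100, n_cand=1000, n_a=10)
print(f"toy total cost: {total_cost([band]):.2e} (arbitrary units)")
```

As in Fig. 6, the coherent term does not depend on N_binary while the semi-coherent term grows with it, so for the toy numbers above (N_binary = 10^5) the semi-coherent stage dominates by orders of magnitude.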
When comparing the computational cost to the previous pipeline, it is important to notice that when using the F-statistic as the coherent detection statistic, the total computational cost is reduced compared to using SFT power for multi-detector searches. This is because the semi-coherent summing of the power is done over ∼ N_seg N_det values, while the F-statistic generates just N_seg coherent detection statistics. For this reason, in the semi-coherent stage the combination of powers takes roughly N_det times longer than the combination of F-statistic values. If the template grids are the same in both cases, the computational cost of a semi-coherently dominated search using the F-statistic is therefore reduced. This improvement can be seen in Fig. 7, where the green points show the lowering of the computational cost compared to the orange points.

FIG. 7. Timing for a single sky-patch of a 0.1 Hz frequency band as a function of the number of binary templates. The blue points show the cost of the search for the BinarySkyHough pipeline using SFT power as the coherent detection statistic, the orange points show the cost for the new BinarySkyHouF pipeline using the power statistic, while the green points refer to using the F-statistic. All three searches use the same number of templates and the same amount of data (Gaussian noise with equal amounts of data from detectors H1 and L1). These timings are obtained on an NVIDIA Tesla V100 and a single core of an Intel(R) Xeon(R) Silver 4215 CPU at 2.50 GHz.
Furthermore, the computational efficiency of the code has been improved, mainly by better usage of CUDA's coalesced global memory access. This can be seen in Fig. 7, comparing the performance of the previous and the new code for the same set-up as a function of the number of binary templates (which increases the RAM requirements). The increased efficiency of the new code corresponds to a lowering of the timing coefficient τ_1 in Eq. (41), manifesting as a weaker scaling with the number of binary templates.

RAM
In order to estimate the RAM required by BinarySkyHouF, we find the scaling of the data structures in the code as a function of input parameters such as the maximum mismatch, the coherent time, and the amount of data used.
The biggest data structures in the code are:
• The partial Hough map derivatives, which hold the results from the coherent stage, and whose size scales with the number of segments, the number of frequency bins, and N_u N_sky, where N_u is the total number of Taylor u-coordinate templates and N_sky = N_δ N_α is the number of sky templates.
• The per-frequency-bin semi-coherent results.
These structures are orders of magnitude larger than the rest and are therefore enough to give a good estimate of the required RAM. The main differences with respect to the previous pipeline are:
• The number of frequency bins needed in the partial Hough map derivatives slightly decreases due to the coherent sky demodulation.
• The effective number of "segments" is reduced by the coherent multi-detector combination.
• If more than one Taylor coordinate needs to be used to maintain the mismatch at a certain level, the RAM increases in order to hold the coherent results. This limits the number of Taylor coordinates that can be used at a given coherent time.
Another RAM limitation of the code is due to the usage of CUDA's shared memory in the GPU kernel functions. This limits the size of the sky-patches, in a way that depends on the GPU architecture. The required shared memory scales with T, the number of threads per block in the GPU kernel launch.
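A back-of-the-envelope RAM estimate for the two dominant structures can be sketched as follows. The scalings are taken from the description above, while the assumed memory layout, byte counts, and example numbers are illustrative only:

```python
def phmd_bytes(n_seg, n_u, n_sky, n_freq_bins, bytes_per_entry=4):
    """Partial Hough map derivatives: one entry per segment, Taylor
    u-template, sky template, and frequency bin (assumed layout)."""
    return n_seg * n_u * n_sky * n_freq_bins * bytes_per_entry

def semicoherent_bytes(n_binary, n_sky, bytes_per_entry=4):
    """Per-frequency-bin semi-coherent results (assumed layout)."""
    return n_binary * n_sky * bytes_per_entry

# Toy numbers for a short-segment search (illustrative only)
gib = 2**30
print(f"PHMD: {phmd_bytes(3000, 1, 100, 2000) / gib:.2f} GiB")
print(f"semi-coherent (per frequency bin): "
      f"{semicoherent_bytes(10**5, 100) / gib:.2f} GiB")
```

Note that the PHMD size is linear in N_u, which is the reason stated above that using several Taylor coordinates limits the maximum coherent time.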

IV. SENSITIVITY AND PARAMETER ESTIMATION
In this section we estimate the sensitivity of the new pipeline and compare it to the previous one. In order to do this, we compare the different detection statistics being used, showing the improvements in sensitivity made possible by the usage of demodulated statistics. We are not attempting to estimate a realistic sensitivity that these pipelines would achieve in an actual search, since this also depends on other post-processing procedures, such as clustering or follow-up, which are beyond the scope of the present study.
To estimate the sensitivity, we follow the same procedure as in [14]. Namely, for different set-ups (encompassing the amount of data, the detectors and their relative noise levels, the maximum mismatch parameters, and the coherent time) and detection statistics, we generate Gaussian noise and perform a search to obtain the threshold at a certain false alarm probability (in the results shown, we use the top template as the threshold). Next, we add six groups (each with a different gravitational-wave amplitude) of 1000 randomly distributed signals to the previously generated Gaussian noise, and perform a separate search for each signal. The resulting statistic values are compared to the threshold, and the detection probability is estimated as the fraction of signals detected (i.e., crossing that threshold) out of the total number of signals. This procedure is performed for every detection statistic.
The injected signals have a random isotropic (NS spin) orientation, a random isotropic sky position, a random frequency f_0 between [100, 100.1] Hz, a random orbital period P_orb ∈ [15, 60] days, and a random projected semi-major axis a_p ∈ [10, 40] ls.
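The injection-and-threshold procedure can be illustrated with a toy Monte Carlo. Here the actual search is replaced by drawing template statistics from a χ² distribution (a simplified stand-in for any of the detection statistics compared below), with the signal strength entering as a non-centrality parameter rho2; all numbers are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_templates = 100_000   # toy number of templates in the noise-only search
dof = 4                 # 2F-like statistic: chi^2 with 4 degrees of freedom

# Step 1: search over pure Gaussian noise; the loudest template sets the
# threshold (as done in the results shown).
threshold = rng.chisquare(dof, size=n_templates).max()

# Step 2: inject groups of signals of increasing strength and estimate
# the detection probability as the fraction crossing the threshold.
def detection_probability(rho2, n_injections=1000):
    stats = rng.noncentral_chisquare(dof, rho2, size=n_injections)
    return (stats > threshold).mean()

for rho2 in (10.0, 30.0, 60.0):
    print(f"rho2 = {rho2}: p_det = {detection_probability(rho2):.3f}")
```

Repeating this for each statistic and amplitude group yields the detection-probability curves shown in Fig. 8.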
The detection statistics that we compare are the original Hough number count (given by Eq. (25) of [14]), the SFT power (given by Eq. (26) of [14]), the F-statistic (given by Eq. (23) of [11]), and the dominant-response statistic (given by Eq. (34) of [11]). For each of these statistics we also compare their weighted versions (discussed in Sec. III A). Here we show the results for a single set-up as an illustration, but we have tested various set-ups with different amounts of data and mismatch distributions and have obtained similar results.

A. Comparison of detection statistics
The left plot of Fig. 8 shows the sensitivity of two unweighted (F and F_AB) and three weighted detection statistics (SFT power, F, and F_AB) for two detectors (H1 and L1), while the right plot shows the same comparison for three detectors (using V1 in addition). In both plots we see that the most sensitive statistic is the weighted dominant-response statistic F_AB. For the two-detector case, the sensitivities of the SFT power and the F-statistic are within the statistical errors of each other, but the right plot shows that for more than two detectors the F-statistic is more sensitive. This is expected from the reduced number of effective segments (as discussed earlier), resulting in fewer χ² degrees of freedom for the background distribution (e.g., see [11]). We further see that the sensitivity of all statistics improves (on these short segments) when using weights. Overall, these results illustrate the sensitivity advantage of the demodulated (F- and F_AB-) statistics over using SFT power.
Next we compare the parameter estimation accuracy of the different weighted detection statistics. To do this, we select the template with the highest detection statistic and, if the signal was detected, compare its parameters with those of the injected signal. The distance between the injected signal and the recovered "loudest" template is measured in terms of the number of grid bins along all six parameter-space dimensions. The results are shown in Fig. 9. It can be seen that the different detection statistics show very similar behavior, and that for all of them more than 90% of the signals are recovered within one bin.
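Measuring recovery accuracy in grid bins amounts to dividing each parameter offset by its bin width and rounding. A minimal sketch (the parameter values and bin widths below are hypothetical, chosen only to illustrate the bookkeeping):

```python
import numpy as np

def bin_distance(injected, recovered, bin_widths):
    """Offset between the injected parameters and the loudest template,
    expressed in (rounded) numbers of grid bins per dimension."""
    inj, rec, dw = map(np.asarray, (injected, recovered, bin_widths))
    return np.round(np.abs(inj - rec) / dw).astype(int)

# Hypothetical 6D point: (f0 [Hz], f1 [Hz/s], alpha, delta, a_p [ls], P_orb [d])
injected  = [100.05, -1.0e-12, 1.0000, 0.5000, 20.00, 30.0]
recovered = [100.05, -1.0e-12, 1.0000, 0.5001, 20.00, 30.0]
widths    = [1e-6, 1e-12, 0.01, 0.01, 0.5, 1.0]
print(bin_distance(injected, recovered, widths))  # all offsets within one bin
```

A signal counts as "recovered within one bin" when every entry of this vector is at most one.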

B. Refinement stage
As discussed in Sec. III A, the new BinarySkyHouF pipeline (like the previous BinarySkyHough) consists of two main stages: after the initial search stage, a percentage of templates with the highest detection statistic are reanalyzed with more accurate u_Ik interpolation expressions (see Sec. III B), and possibly with a finer mismatch. The previous BinarySkyHough pipeline also used a more sensitive detection statistic in the second stage, namely the weighted power instead of the weighted Hough number count, while now we always use the most sensitive statistic (i.e., the weighted F_AB) in all stages.
The sensitivity of this refinement procedure depends on the percentage of templates that is passed to the second stage. If it is large enough, at a realistically low false alarm probability the sensitivity of the search is determined by the statistic used in the second stage, thus improving the overall sensitivity. In order to simulate realistic search conditions, for a candidate to count as detected in the second stage we also require its first-stage detection statistic to be higher than that of the weakest candidate that was passed to the second stage.
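The two-stage selection logic can be sketched as follows, with toy χ²-distributed first-stage statistics standing in for the actual weighted F_AB values:

```python
import numpy as np

rng = np.random.default_rng(7)
stage1 = rng.chisquare(4, size=100_000)  # toy first-stage statistics

# Pass the loudest 1% of the templates to the refinement stage.
n_pass = int(0.01 * stage1.size)
order = np.argsort(stage1)[::-1]
stage1_cut = stage1[order[n_pass - 1]]   # weakest candidate that was passed

def detected(stage1_value, stage2_value, stage2_threshold):
    """A candidate counts as detected only if it was loud enough to be
    passed to the second stage AND its refined statistic crosses the
    second-stage threshold."""
    return bool(stage1_value >= stage1_cut and stage2_value > stage2_threshold)

print(f"first-stage cut at {stage1_cut:.2f} "
      f"({n_pass} of {stage1.size} templates passed)")
```

The extra first-stage condition is what makes the simulated detection probability reflect a realistic two-stage search rather than the second-stage statistic alone.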
Figure 10 compares the weighted number count without a second stage to the result when the loudest 1% of the candidates are passed to the second stage. We see that the sensitivity of the weighted number count (with the optimal threshold of 3.2) is within the uncertainty errors of the weighted power. This shows that the previous pipeline's sensitivity was effectively determined by the second-stage weighted power statistic.
We further see that the sensitivity of the weighted dominant-response F_AB-statistic is slightly improved in the refinement stage, due to the usage of the more accurate u_Ik master equations of Eqs. (25), (26). For more than two detectors, the sensitivities of the demodulated statistics are expected to be even better, similar to what we saw in the right plot of Fig. 8.

C. Increasing the coherent time
One of the advantages of the new pipeline is the ability to extend the coherent segment time T_seg while maintaining the same mismatch. Using a longer coherent time increases the computational cost, while also increasing the sensitivity of the search. For a large number of segments, the sensitivity of a StackSlide search scales as N_seg^(−1/4) (e.g., [7]), so at fixed false-alarm probability and amount of data, the coherent time would need to be increased by a factor of 16 in order to double the sensitivity of the search. Here we compare the difference in sensitivity between using a coherent time of T_seg = 900 s and T_seg = 3600 s, with fixed maximum mismatch parameters.
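This N_seg^(−1/4) scaling can be checked with two lines of arithmetic: at fixed total observing time T_obs = N_seg T_seg, multiplying T_seg by a factor k divides N_seg by k and thus improves the sensitivity by k^(1/4):

```python
# Sensitivity scales as N_seg**(-1/4) at fixed T_obs = N_seg * T_seg,
# so increasing T_seg by a factor k improves sensitivity by k**(1/4).
def sensitivity_gain(tseg_factor):
    """Relative sensitivity improvement from lengthening T_seg."""
    return tseg_factor ** 0.25

print(sensitivity_gain(16))  # 16x longer segments -> 2.0 (double sensitivity)
print(sensitivity_gain(4))   # 4x longer segments -> ~1.41
```

The second value is the ∼ 1.41 improvement expected for the T_seg = 900 s to 3600 s comparison below.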
For this comparison we add random signals from the binary parameter-space region P_orb ∈ [0.785, 0.8] days and a_p ∈ [0.5, 0.6] ls. In this region of the parameter space, several u_2 templates are needed in order to maintain the same coherent mismatch for the coherent time of T_seg = 3600 s, while the searches with T_seg = 900 s only need templates in u_1.
Fig. 11 shows the results by comparing the dominant-response F_AB-statistic using these two coherent times. The improvement in sensitivity from using a longer coherent time is clear, both for the weighted and unweighted cases. For the unweighted detection statistics, the maximal expected improvement due to the four-fold increase in T_seg is a factor of ∼ 1.41, which agrees approximately with what is seen in the figure. An interesting point to observe here is that the relative improvement due to weighting of the statistics decreases for longer segment lengths T_seg (at constant Gaussian-noise floors). This can be understood from the reduced differences in antenna-pattern responses between segments, which therefore contribute more similar signal power.
We also show in Fig. 11 a search using the longer T_seg = 3600 s coherent segments, but without including u_2 templates. We see that this decreases the sensitivity, illustrating the need to include u_2 templates for this set-up, as expected from the mismatch estimates in Sec. III C 1.

V. CONCLUSIONS
In this paper we have presented a new pipeline, BinarySkyHouF, to search for continuous gravitational waves from unknown neutron stars in binary systems. This new pipeline is a descendant of the previous BinarySkyHough pipeline, improving several different aspects. It can cover the same parameter space at a reduced computational cost, and the usage of demodulated (F- and F_AB-) statistics allows the use of longer coherent segments, which increases the sensitivity and gives more flexibility to the pipeline. Furthermore, the per-detector data are combined coherently, thereby reducing the computational cost (by a factor of ∼ N_det) and further improving the sensitivity for searches with more than two detectors.
The new pipeline gives explicit control over the mismatch in the coherent stage, allowing one to perform lower-mismatch searches than before, for example when following up an interesting candidate or targeting a particularly interesting smaller region of parameter space. One can now explicitly search over binary orbital parameters such as the eccentricity and argument of periapsis, or over higher-order frequency derivatives. Therefore this pipeline can be used to follow up candidates from other searches, such as an all-sky isolated search that only gives constraints on the ranges of possible binary parameters to follow up.
BinarySkyHouF also has some limitations. The Taylor coordinates used in the coherent stage only allow searching for orbital periods substantially longer than the coherent segment length, as discussed in Sec. III C 1. This limits the possibility of searching over the shortest orbital periods with longer coherent times. The corresponding RAM requirements limit the number of Taylor coordinates that can be used, which also limits the maximum coherent time that can be used in certain regions of the parameter space.
However, a number of future improvements to this pipeline can be envisaged: for example, the usage of a non-zero threshold on the coherent statistics before summing, or including only a subset of segments in the first stage depending on their sensitivity. Further enhancements could be achieved by reducing the second-stage mismatch by refining the search grid. It would also be interesting to develop an explicit optimization algorithm for the optimal search set-up parameters given a certain amount of data and computational budget, similar to what was done in [7, 31]. We also plan to explore the usage of further demodulated detection statistics, in particular the line-robust statistics of [32], which should help mitigate non-Gaussian artifacts in the data.

ACKNOWLEDGMENTS
This work has utilized the ATLAS computing cluster at the MPI for Gravitational Physics Hannover.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement number 101029058.

In Sec. II we introduced short-segment SSB Taylor coordinates {u_k}, defined by the Taylor expansion of the signal phase in the SSB frame around each segment midpoint. Here we show how similar detector-frame Taylor

FIG. 1. Comparison between the u_I1 coordinate per segment given by Eq. (25) (green line) and the u_1 values per segment that maximize the signal power (orange points). The sky position is offset by 0.3 rad in both α and δ from the signal. This example assumes a single detector (H1) and one year of data, with segments of length T_seg = 900 s, and a constant-frequency signal of f_0 = 100 Hz without binary modulation.

FIG. 4. Mismatch histograms for 1000 injections with random parameters, assuming one year of data from both the H1 and L1 detectors and coherent segments of T_seg = 900 s. The olive (right) histogram corresponds to a grid resolution of δu_1 = 1/T_seg, while the red (left) histogram uses a finer resolution of δu_1 = 0.25/T_seg. Only u_1 and a single sky position are searched in the coherent stage. The semi-coherent grids are equal for both cases.

FIG. 5. Measured mismatch for 1000 injections with random parameters, assuming one year of data from the H1 detector and segments of T_seg = 900 s. From left to right, the blue (A) histogram corresponds to signals with zero spindown, f_1 = 0, and eccentricity e = 0; the red (B) histogram uses signals log-uniformly distributed in f_1 ∈ [−4.7 × 10^−14, −4.7 × 10^−11] Hz/s and e ∈ [0.2 × 10^−4, 0.2 × 10^−1]; the black (C) histogram uses signals with f_1 and e at their maximum values given by Eq. (37); the yellow (D) histogram is for signals with f_1 and e given by twice the maximum values.

FIG. 6. Timing for a single sky-patch of a 0.1 Hz frequency band as a function of the number of binary templates. The orange points show the computing time of the semi-coherent stage, while the blue points are for the coherent stage (including the F-statistic computation and the generation of the partial Hough map derivatives). These timings are obtained on an NVIDIA Tesla V100 and a single core of an Intel(R) Xeon(R) Silver 4215 CPU at 2.50 GHz.

FIG. 8. Detection probability as a function of sensitivity depth √Sn/h0 for different detection statistics in Gaussian noise. The left plot shows the results for the H1 and L1 detectors, while the right plot additionally includes the V1 detector. The error bars denote the 95% binomial confidence interval. We assume one year of data at the same noise floor from each of the detectors, and a coherent segment length of T_seg = 900 s. (The points have been slightly displaced along the x-axis to ease visibility.)

FIG. 10. Detection probability as a function of sensitivity depth √Sn/h0 for different detection statistics in Gaussian noise. The plot shows the results for the H1 and L1 detectors, assuming one year of data at the same noise floor from each of the detectors, and a coherent segment length of T_seg = 900 s. The error bars denote the 95% binomial confidence interval. (The points have been slightly displaced along the x-axis to ease visibility.)

FIG. 11. Detection probability as a function of sensitivity depth √Sn/h0 for different detection statistics in Gaussian noise, using two different coherent segment lengths, namely T_seg = 900 s and T_seg = 3600 s. The plot shows the results for the H1 and L1 detectors, assuming one year of data at the same noise floor from each of the detectors. The error bars denote the 95% binomial confidence interval. (The points have been slightly displaced along the x-axis to ease visibility.)
Appendix A: Detector-frame Taylor coordinates