The Einstein@Home search for periodic gravitational waves in LIGO S4 data

A search for periodic gravitational waves, from sources such as isolated rapidly-spinning neutron stars, was carried out using 510 hours of data from the fourth LIGO science run (S4). The search was for quasi-monochromatic waves in the frequency range from 50 Hz to 1500 Hz, with a linear frequency drift f-dot (measured at the solar system barycenter) in the range -f/tau<f-dot<0.1 f/tau, where the minimum spin-down age tau was 1000 years for signals below 300 Hz and 10000 years above 300 Hz. The main computational work of the search was distributed over approximately 100000 computers volunteered by the general public. This large computing power allowed the use of a relatively long coherent integration time of 30 hours, despite the large parameter space searched. No statistically significant signals were found. The sensitivity of the search is estimated, along with the fraction of parameter space that was vetoed because of contamination by instrumental artifacts. In the 100 Hz to 200 Hz band, more than 90% of sources with dimensionless gravitational wave strain amplitude greater than 1e-23 would have been detected.


I. INTRODUCTION
Gravitational waves are a fundamental prediction of Einstein's General Theory of Relativity [1,2].But these waves are very weak, so although there is compelling indirect evidence for their existence [3], direct detection has so far not been possible.
In the past decade, advances in lasers, optics and control systems have enabled construction of a new generation of gravitational-wave detectors [4] that offer the first realistic promise of a direct detection.The Laser Interferometer Gravitational-wave Observatory (LIGO) [5,6] is currently the most sensitive of these instruments.LIGO consists of three kilometer-scale instruments.Two are located in a common vacuum envelope in Hanford, Washington, USA and the other is located in Livingston, Louisiana, USA.
This paper reports on the results of the Einstein@Home search for "continuous wave" sources in the data from the fourth LIGO science run (S4).The configuration of the LIGO detectors during the S4 run is described in a separate instrumental paper [7].

A. Continuous wave (CW) sources and detection methods
"Continuous waves" (CW) are quasi-monochromatic gravitational-wave signals whose duration is longer than the observation time.They have a well-defined frequency on short time-scales, which can vary slowly over longer times.These types of waves are expected, for example, from spinning neutron stars with non-axisymmetric deformations.If the system is isolated, then it loses angular momentum to the radiation.The spinning slows down, and the gravitational-wave frequency decreases.Gravitational acceleration towards a large nearby mass distribution can also produce such a frequency drift (of either sign).Many possible emission mechanisms could lead the to the emission of such waves by spinning neutron stars [8,9,10,11,12,13,14,15,16].
If there were no acceleration between the LIGO detectors and the GW sources, then it would be possible to search for CW signals from unknown sources using only "standard" computing resources, such as a highend workstation or a small computing cluster.In this case the analysis technique would be simple: compute the Fast Fourier Transform (FFT) [17,18] of the original time-series data, and search along the frequency axis for peaks in the power spectrum.Time-domain resampling or similar techniques could be used to compensate for the effects of a linear-in-time frequency drift.
However, this simple analysis is not possible because of the terrestrial location of the LIGO detectors: signals that are purely sinusoidal at the source are Dopplermodulated by the Earth's motion and thus are no longer sinusoidal at the detector.The Earth's rotation about its axis modulates the signal frequency at the detector by approximately one part in 10 6 , with a period of one sidereal day.In addition, the Earth's orbit about the Sun modulates the signal frequency at the detector by approximately one part in 10 4 , with a period of one year.These two modulations, whose exact form depends upon the precise sky location of the source, greatly complicate the data analysis when searching for unknown sources.The search becomes even more complicated if the CWemitter is part of a binary star system, since the orbital motion of the binary system introduces additional modulations into the waveform.
The "brute force" approach to the data analysis problem would employ matched filtering, convolving all available data with a family of template waveforms corresponding to all possible putative sources.The resulting search statistic is called the F-statistic and was first described in a seminal paper of Jaranowski, Królak, and Schutz [19].But even for isolated neutron stars (i.e. which are not in binary systems) the parameter space of possible sources is four-dimensional, with two parameters required to describe the source sky position using standard astronomical equatorial coordinates α (right as-cension) and δ (declination), and additional coordinates (f, ḟ ) denoting the intrinsic frequency and frequency drift.To achieve the maximum possible sensitivity, the template waveforms must match the source waveforms to within a fraction of a cycle over the entire observation time (with current detectors this is months or years).So one must choose a very closely spaced grid of templates in this four-dimensional parameter space, and the computational cost exceeds all available computing resources on the planet [20].Thus the direct approach is not possible in practice.
More efficient and sensitive methods for this type of search have been studied for more than a decade and are under development [21,22,23,24,25].In this paper, the frequency-domain method described in [26,27] is used to calculate the F-statistic.In order to maximize the possible integration time, and hence achieve a more sensitive coherent search, the computation was distributed among approximately 10 5 computers belonging to ∼ 5 × 10 4 volunteers in ∼ 200 countries.This distributed computation project, called Einstein@Home [28], follows the model of a number of other well-known volunteer distributed computing projects such as SETI@home [29] and Folding@home [30].
Other methods have also been employed for the CW search of the S4 data [31,32] and searches for other signal types (burst, inspiral, stochastic background) have also been carried out [33,34,35,36,37,38] with this data set.The results of these searches are all upper bounds, with no detections reported.

B. Outline of this paper
The outline of this paper is as follows.Sections II and III describe the overall construction of the search, including the data set preparation, regions of parameter space searched, and the choices of thresholds and sensitivities.Section IV describes the post-processing pipeline.The level of sensitivity of the search is estimated in Section V. Section VI describes the vetoing of instrumental line artifacts and the fraction of parameter space that was therefore excluded.Section VII describes the end-to-end validation of the search and the post-processing pipeline, which was done by injecting simulated CW signals into the detector hardware.Section VIII describes the final results of the search, followed by a short conclusion.

II. DATA SELECTION AND PREPARATION
The data for the S4 run was collected between February 22, 2005 andMarch 23, 2005.The data analyzed consisted of 300 hours of data from the LIGO Hanford 4-km (H1) detector and 210 hours of data from the LIGO Livingston 4-km (L1) detector.The search method used here (explained in detail in Section III) consists of computing a coherent F-statistic over data segments of 30 hours each, and combining these results via an incoherent coincidence scheme.However, the 30-hour segments have time-gaps, and the number of templates needed for the coherent F-statistic step grows rapidly as the gaps get longer.For this reason, the start and end times of the data segments were selected based on the criteria that the gaps totaled no more than ten hours: each data segment contains 30 hours of sciencemode data and lies within a total time span of less than 40 hours.Here and in the following the term "segment" is always used to refer to one of these time stretches of data, each of which contains exactly T obs = 30 h of data.The total time spanned by the data in segment j is written T span,j ; 30 h < T span,j < 40 h.
The data segments consist of uninterrupted blocks of 1800 s of contiguous science-mode data.This is for technical reasons: the F-statistic code uses Short Fourier Transforms (SFTs) over T SFT = 1800 s as input data, (this data format is described in [39]).To produce these SFTs, the data is first calibrated in the time domain using the method described in [40,41].Then the data is windowed in 1800 s intervals using a Tukey window with  a characteristic turn-on/turn-off time of 500 ms, followed by an FFT.
Applying the above constraints to the S4 data set yielded a total of N seg = 17 data segments (10 from H1, 7 from L1), labeled by j = 1, • • • , 17.The GPS start time of segment j is denoted t j , and these values are listed in Table I.
The maximum Doppler modulation (from the Earth's motion about the Sun) is about one part in 10 4 .Over the length of S4, and in the parameter range considered, the frequency changes due to intrinsic spin-down are smaller still.This means that the CW signals searched for here always stays within a narrow frequency band, drifting no more than about ±0.15 Hz from some fiducial frequency.For this reason the input data, spanning the frequency range of 50 Hz to 1500 Hz, is partitioned in the frequency domain into 5800 "slices" of 0.5 Hz plus wings of 0.175 Hz on either side.The size of one such input data slice is 7 368 000 bytes for H1 (containing 600 SFTs from 10 segments) and 5 157 600 bytes for L1 (containing 420 SFTs from 7 segments).
The detector data contains dozens of narrow-band spectral lines whose origin is instrumental, for example the harmonics of the 60 Hz mains frequency, and violin modes of the mirror suspensions in the range from 342 Hz -350 Hz (H1) and 335 Hz -355 Hz (L1).To simplify later analysis, line features that are known to be instrumental artifacts are removed ("cleaned") from the data by replacing the frequency-domain data bins with computer-generated random Gaussian values.The fre-  quencies of these lines are shown in Table II.The cleaning algorithm uses a moving-in-frequency median of the power in individual frequency bins to determine the instrumental noise floor.To prevent bias at the boundaries of the cleaned regions, the mean of the random values to replace the line features interpolates linearly between the noise floor at either side of the line feature.The median noise strain amplitude spectra of the final cleaned H1 and L1 data sets are shown in Figure 1.

III. DATA PROCESSING
Figure 2 is a schematic flow-diagram of the Einstein@Home data processing which is described in this section and in the following section on post-processing.
It shows what parts of the analysis were done by project participants, what parts were done on project servers, and the relationships between these.

A. BOINC workunit distribution and validation
The computational work of the search is partitioned into 6 731 410 workunits (separate computing tasks) and processed using the Berkeley Open Infrastructure for Network Computing (BOINC) [42,43,44].Because the work is done on computers that are not owned or controlled by our scientific collaboration or institutions, any individual result could be wrong.Error sources include defective hardware (such as over-clocked memory), defective software (erroneous system libraries), and malicious users (faking correct results).To identify and eliminate such errors, BOINC was configured so that each workunit is done independently by computers owned by at least three different volunteers.
The most common types of errors (lack of disk space, corrupted or missing input files, inconsistent internal state, etc.) are detected during program execution.If an error is detected during run-time, the program reports to the Einstein@Home server that the workunit was unsuccessful, and BOINC generates another instance of the workunit, to be sent to another volunteer's computer.This behavior is repeated as necessary until three successful results have been obtained.
The three successful results obtained for each workunit are then compared by an automatic validator, which rejects results that do not agree closely.The validation process is more complicated than simple byte-by-byte comparison of output files, because Einstein@Home supports multiple computing platforms (Windows, GNU/Linux, Mac OS X on Intel and PPC, FreeBSD, and Solaris) and differences in CPU hardware, compiler instruction ordering, and floating-point libraries mean that correct and valid result files may exhibit small numerical differences.The automatic validation takes place in two steps.
The output files have a fixed five-column format and contain 13 000 candidate events, with one line per candi-FIG.2: Schematic overview of the Einstein@Home dataprocessing and subsequent post-processing.date event, as described in Section III C. The first validation step checks that the file syntax is correct and that each value is within the allowed range for that column.This detects most file corruption.
Then the validator does comparison of all possible pairs of result files.For a given pair of result files, the validator checks that corresponding candidate events lie on the same template grid-point and have F-statistic values that agree to within 1%.Since each file contains the 13 000 events with the largest values of the F-statistic, numerical fluctuations in determining the value of F can lead to slightly different lists being returned on different platforms.Hence the validator tolerates unpaired candidate events if they lie within 1% of the smallest value on the list.
A workunit is validated once it has three results that agree with one another to within these tolerances.If the three results do not pass this validation step, the Einstein@Home server generates more instances of this workunit until three valid results have been obtained.For the search described in this paper, a "post-mortem" analysis of the computation shows that the probability of a successful but invalid result is small (0.36%), and the errors which make a successful result invalid are typically unique and irreproducible.Hence we estimate that it is highly improbable that even a single incorrect result has been marked as "valid" by the automatic validator.

B. Workunit design
The different workunits cover (search) different parts of parameter space.A key design goal of these workunits is that they should have roughly equal computational run times, of the order of ∼ 8 hours, and that the computational effort to carry out the entire search should last about 0.5-1 years.Another key design goal is that each workunit uses only a small re-usable subset of the total data set.These allow Einstein@Home volunteers to do useful computations on the one-day timescale, and minimizes the download burden on their internet connections and on the Einstein@Home data servers.
Each workunit uses only one segment of data over a narrow frequency-range, but covers the whole sky and the full range of frequency-derivatives ḟ at that frequency.Therefore, the entire search is divided into computational units over different data segments and frequency-bands.In the following it will be useful to label the workunits by three indices (j, k, ), where j = 1, • • • , 17 denotes the data segment, k = 1, • • • , 2900 labels the 0.5 Hz band covered by the input data file, and = 1, • • • , M (j, k) enumerates individual workunits associated with data segment j and frequency band k.Note that each workunit uses a frequency band that is smaller than the 0.5 Hz covered by the input data files, i.e.M (j, k) ≥ 1.

Search grid in parameter space
The parameter space is gridded in such a way that no point has a "squared-distance" from its nearest grid point that exceeds a certain "maximal mismatch".The distance is defined by a metric on parameter space, first introduced in [21,22].The squared distance is the fractional loss of squared signal-to-noise ratio (SNR 2 ) due to waveform mismatch between the putative signal and the template.The search grid was constructed based on the projected metric on the subspace orthogonal to the frequency direction ∂ f with ḟ = 0..For any given workunit, the parameter-space grid is a Cartesian product of uniformly-spaced steps df in frequency, uniformlyspaced steps d ḟ in frequency derivative, and a twodimensional sky-grid, which has non-uniform spacings determined by the projected metric.For frequencies in the range [50,290) Hz, the maximal mismatch was chosen as m = 0.2 (corresponding to a maximal loss in SNR 2 of 20%), while in the range [300, 1500) Hz, the maximal mismatch was m = 0.5.Due to a bug in the script generating the sky-grids, the range [290, 300) Hz, was covered by frequency and spin-down steps corresponding to m = 0.2, whereas the sky-grids were constructed for m = 0.5.The distribution of actual mismatches in this frequency range will therefore be somewhat in between those of the lowfrequency and high-frequency workunits.
It can be shown [45] that these relatively large mismatches give near-optimal sensitivity for a coherent search at fixed CPU power.Choosing finer grid spacings (i.e. a smaller mismatch) would require searching more grid-points, thus reducing the maximal possible coherent integration time.A coarser search grid would allow longer integrations but at a larger average loss in SNR.Because of these two competing tendencies, the sensitivity as a function of mismatch m has a maximum in the range m ∼ 0.25-0.7,depending on the choice of false-dismissal rate from the grid mismatch.Full details of the parameter-space grid and workunit construction are given in [45]; a short summary follows.

Search grid in frequency and frequency-derivative
The step-size in frequency was determined using the metric-based expression so the frequency-spacing depends on T span,j of the data segment j.
The range of frequency-derivatives ḟ searched is defined in terms of the "spin-down age" τ ≡ −f / ḟ , namely τ ≥ 1000 years for low-frequency and τ ≥ 10 000 years for high-frequency workunits.Younger neutron stars than the limited range of the search probably would have left a highly visible (Sedov phase) supernova remnant or a pulsar wind nebula, and thus our search for unknown neutron stars targeted older objects, which also resulted in less computational cost.The search also covers a small "spin-up" range, so the actual ranges searched are ḟ ∈ [−f /τ, 0.1f /τ ].In ḟ the grid points are spaced ac- cording to resulting in resolutions d ḟj ∈ [1.84, 2.59] × 10 −10 Hz/s for low-frequency workunits, and d ḟj ∈ [2.92, 4.09] × 10 −10 Hz/s for high-frequency workunits, depending on the duration T span,j of different segments j.

Search grid in the sky parameters
The resolution of the search grid in the sky-directions depends both on the start-time t j and duration T span,j of the segment, as well as on the frequency.The number of grid points on the sky scales as ∝ f 2 , and approximately as ∝ T 2.4  span,j for the range of T span,j ∼ 30 − −40 h used in this search.Contrary to the simple uniform spacings in f and ḟ , the sky-grids are computed beforehand and shipped together with the workunits.In order to simplify the construction of workunits and limit the amount of different input-files to be sent, the sky-grids are fixed over a frequency range of 10 Hz, but differ for each data segment j.The sky-grids are computed at the higher end of each 10 Hz band, so they are slightly "over-covering" the sky instead of being too coarse.The search covers a frequency range of 1450 Hz, and so there are 145 different sky-grids for each segment.To illustrate this, four of these sky-grids are shown in Figure 3 corresponding to two different data segments at two distinct frequency bands.
To ensure that each workunit takes a similar amount of CPU time, the total number of template grid points of each workunit is chosen to be approximately constant for all workunits.However, in practice, this number can vary by up to a factor of 2 due to discretization effects.The number of points in the sky-grids grows with frequency as f 2 and the number of points in the spin-down grid grows linearly with f .Thus, to keep the number of templates (and therefore the CPU time) approximately constant, the frequency range covered by each workunit decreases as f −3 .Hence for fixed j, M (j, k) is roughly proportional to k 3 .

C. The output of a workunit
The result from completing one workunit on an Einstein@Home host computer is a ZIP-compressed ASCII text file containing the 13 000 candidate events with the largest values of the F-statistic found over the parameterspace grid points analyzed by that workunit.Each line of the output file contains five columns: frequency (Hz), right ascension angle (radians), declination angle (radians), spin-down-rate (Hz/s) and 2F (dimensionless).The frequency is the frequency at the Solar System Barycenter (SSB) at the instant of the first data point in the corresponding data segment.
The number 13 000 was decided in advance, when the workunits were first launched on the Einstein@Home project, which was about one year before the postprocessing pipeline was developed.The network bandwidth required to retain more than 13 000 candidates per workunit, and the storage space required to preserve them, would have exceeded the capacity of the Einstein@Home server and its internet connection.For frequency-band and data segment combinations with small numbers of workunits, for example the j = 1 data set from 301.0 Hz to 301.5 Hz, almost all of the 13 000 candidate events are later used in the post-processing pipeline.However (as can be seen later in Figure 4) for most frequency-bands the post-processing pipeline only needed and used a fraction of the events that were returned.
Returning the "loudest" 13 000 candidate events effectively corresponds to a floating threshold on the value of the F-statistic.This avoids large lists of candidate events being produced in regions of parameter space containing non-Gaussian noise, such as instrumental artifacts that were not removed a priori from the input data.

IV. POST-PROCESSING
As shown previously in Figure 2, after result files are returned to the Einstein@Home servers by project participants, further post-processing is carried out on those servers and on dedicated computing clusters.The goal of this post-processing pipeline is to identify consistent candidate events that appear in many of the 17 different data segments.
In this paper, a consistent (coincident) set of "candidate events" is called a "candidate".Candidate events from different data segments are considered coincident if they cluster closely together in the four-dimensional parameter space.A clustering method using a grid of "coincidence cells" will reliably detect strong CW signals, which would produce candidate events with closelymatched parameters.
The post-processing pipeline operates in 0.5 Hz-wide frequency-bands, and can be summarized in three steps.In step one, the coincident false alarm probability is fixed.In step two, the frequency values of candidate events are shifted to the same fiducial time.In step three, a grid of cells is constructed in the four-dimensional parameter space, and each candidate event is assigned to a particular cell.In the following the details involved in each step are described.

A. Preparation and selection of candidate events
In the first step the individual result files are prepared for the later analysis by uncompressing them and keeping only a subset of the candidate events: from the (j, k, )'th workunit only the E(j, k, ) candidate events with the largest values of 2F are retained.
The number of these candidate events is chosen a priori to obtain a pre-determined fixed false alarm probability.The false alarms should be approximately uniformly distributed among the workunits, since each workunit examines a similar number of independent grid points in parameter space.The number of candidate events is chosen so that in a 0.5 Hz-wide frequency-band the probability that one or more coincidence cells after doing the clustering (in step three) has C max = 7 or more coincidences is P F = 0.001.Thus, in the analysis of 2900 such frequency bands, in random noise one would expect to find only about three candidates with seven or more coincidences.(As explained later in Section IV F 1, this overall probability for the entire search is somewhat increased because the coincidence cell grids are also shifted by half their width in 16 possible combinations).
In terms of the notation introduced in the previous section, the number of candidate events kept from the (j, k, )'th workunit takes the form where E seg (k) is shown in Figure 4.Because the individual workunits are constructed to use approximately the same amount of CPU time, each workunit examines approximately the same number of templates in parameter space, so the same number of candidate events are retained from all workunits which have the same input data file and data segment.This implies that the number of candidate events that are kept per data segment j and per frequency band is independent of the data segment j: In the later stage of the post-processing, this ensures that each of the different data segments contributes equally to the probability of generating false alarms in the coincidence step.

B. Number of cells in the post-processing coincidence grid
It is important to calculate the number of coincidence cells in the coincidence grid.Together with the desired false alarm probability, this determines the number of candidate events to retain in the post-processing pipeline.
The number of coincidence cells N cell (k) contained in each 0.5 Hz frequency band k (while doing the clustering in step three) is determined by the sizes of the cells.This is given by where ∆f denotes the coincidence cell width in frequency, ∆ ḟ denotes the width in spin-down, and ∆α(δ) and ∆δ(δ) denote the coincidence cell widths in right ascension and declination (both of which vary with declination δ).The choice of the coincidence cell sizes will be explained in detail later when step three will be described.

C. False alarm rate and the number of candidate events retained
The number of candidate events that must be retained is determined by the number of cells in the coincidence grid N cell (k) and by the desired probability of false alarm P F for false coincidence of candidate events from C max or more data segments in each 0.5 Hz band.To relate these quantities, consider the case of random instrumental noise, in which the candidate events are distributed uniformly about the coincidence grid.Concentrate on a single 0.5 Hz band k, and consider the first of the N seg = 17 data segments.A total of E seg (k) candidate events must be distributed uniformly among N cell (k) coincidence cells.Each candidate event falls in a random coincidence cell, independent of the locations of the previous candidate events.The probability that the first candidate event falls in the first coincidence cell is 1/N cell (k), and hence the probability that the first coincidence cell remains empty is 1 − 1/N cell (k).If the remaining E seg (k) − 1 candidate events fall independently and at random into the coincidence cells, then this generates a binomial distribution, and the probability that the first coincidence cell contains no candidate events is Since the first coincidence cell is equivalent to any other, the probability that the candidate events from the first data segment populate any given coincidence cell with one or more candidate events is thus given by In random noise, the candidate events produced by each different data segment are independent, so that the coincidence cells that are "marked" by one or more candidate events are also described by a (different) binomial distribution.Without loss of generality, again consider the first coincidence cell.The probability that it contains candidate events from n distinct data segments is then given by where a b = a! b!(a−b)! is the binomial coefficient.Thus the probability per coincidence cell of finding C max or more coincidences is given by For the desired P F = 0.1% = 10 −3 per 0.5 Hz band k, this equation is solved numerically to find E seg (k).The results for E seg (k) are shown in Figure 4.

D. Choice of false alarm probability and detection threshold
The goal of this work is to make a confident detection, not to set upper limits with the broadest possible coverage band.This is reflected in the choice of the expected false alarm probability and the choice of a detection threshold.
The detection threshold of 12 events was chosen because, as described in Section VII, the hardware injections are only "turned on" for 12 of the 17 data segments.The detection threshold ensures that these simulated signals are properly detected by the post-processing pipeline.
The choice of false alarm probability (P F = 0.1% = 10 −3 per 0.5 Hz band to have coincidences in C max = 7 or more data segments) is a pragmatic choice, which leads to an extremely small false alarm rate at the detection threshold.For actual data, the probability of finding 7 or more coincidences in a given 0.5 Hz band can be somewhat larger than the target value of 0.1% because the candidate events are not uniformly distributed over the grid of coincidence cells and because (as described in Section IV F 1) 16 sets of coincidence cells are used for each 0.5 Hz band.
In random noise, the probability of reaching the detection threshold of 12 coincidences depends on the number of cells in the coincidence grid, which is a function of frequency.Some representative numbers are given in Table III; they vary from about 10 −15 to 10 −13 depending upon the 0.5 Hz band.The false alarm probabilities decrease very rapidly with increasing coincidence number.For example the probability of finding 14 or more coincidences in random noise varies from about 10 −18 to 10 −21 .Once might ask why we chose to specify a uniform false alarm probability, across all frequencies, of 0.1% for C max = 7, rather than directly specify a much lower false alarm probability at the detection threshold C = 12.This was because we wanted the most significant coincident events due to noise alone to have C values a few less than our detection threshold, and we wanted these candidates to be uniformly distributed over frequency bands.Any detected signals could then be compared against fairly uniform fields of noise candidates in adjacent frequency bands.If a uniform false alarm probability had been specified at the C = 12 level, then the expected noise candidates with C ∼ 7 would not have been uniformly distributed over frequency, due to the differing numbers of coincidence cells in each frequency band.
The choice of detection threshold and false alarm probability sacrifice a small amount of sensitivity compared with a higher values, but ensures that high numbers of coincidences are extremely improbable in random noise.A strong signal (say a factor of 1.5 above the upper curve in Figure 9) would be expected to produce 15 or more coincidences in this detection pipeline.With the thresholds that we have adopted, this would stand out very strongly: the probability of having even one such an event appear in coincidence in random noise is about 10 −22 per 0.5 Hz band.

E. Shifting candidate event frequencies to a fixed fiducial time
In the second step of the post-processing, the frequency value of each retained candidate event is shifted to the same fiducial time: the GPS start time of the earliest (j = 4) data segment, t fiducial = t 4 = 793 555 944 s.This shifting is needed because a CW source with non-zero spin-down would produce candidate events with different apparent frequency values in each data segment.This FIG.5: Additional "wings" at the boundaries of each 0.5 Hz frequency band must be included in the coincidence analysis stage of the post-processing.This is because spin-down can carry a source below this half-Hz band, and spin-up can carry it above the band.To illustrate this, the frequency band k = 498 (covering [299, 299.5)Hz) is (partly) shown by the dark-gray shaded area.The dashed sloped lines show the boundaries of the small additional regions (light gray) in frequency space whose candidate events must also be considered in the post-processing.Because the allowed spin-up range is ten times smaller than the allowed spin-down range, the upper boundary has a slope ten times smaller than the lower boundary.
step would shift these candidate events back to the same frequency value: where ḟ and f (t j ) are the spin-down-rate and frequency of a candidate event reported by the search code in the result file, and t j is the timestamp of the first datum in the data segment.At the end of the second step, all candidate events for the 0.5 Hz band are merged into one file.
These candidate events are collected from a frequency interval that is slightly wider than 0.5 Hz.To see why this is necessary, consider a potential source whose frequency in the first data segment (j = 4) is at the lower (or upper) boundary of the 0.5 Hz interval.If the source has the minimum (or maximum) allowed value of ḟ , then in the later data segments it moves into, or is recorded in, the previous (or next) 0.5 Hz band.This effect is most apparent for the last j = 11 data segment, as illustrated in Figure 5.So in collecting the candidate events for analysis of a given 0.5 Hz band, the frequency range is enlarged slightly for events coming from later and later data segments, as shown in Figure 5.

F. Search for coincident candidate events
The third step and final stage of the post-processing is to search for parameter-space coincidence among the candidate events.If a CW source is present that is strong enough to be confidently detected, then it would produce large F-statistic values (i.e.candidate events) in many or all of the 17 data segments.In addition, the values of the frequency at the fiducial time f (t fiducial ), sky position (given by right ascension α and declination δ), and spindown ḟ for these candidate events would agree among all data segments (within some coincidence "window" or "cell").

Coincidence search algorithm
To find coincidences, a grid of cells is constructed in four-dimensional parameter space, as described previously.This analysis uses rectangular cells in the coordinates (f, ḟ , α cos δ, δ).The dimensions of the cells are adapted to the parameter space search.Each candidate event is assigned to a particular cell.In cases where two or more candidate events from the same data segment j fall into the same cell, only the candidate event having the largest value of 2F is retained in the cell.Then the number of candidate events per cell coming from distinct data segments is counted, to identify cells with more coincidences than would be expected by random chance.
The search for coincident cells containing large numbers of candidate events is done with an efficient code that uses linked-list data structures, N log N sort algorithms, and log N bisection search algorithms.To ensure that candidate events located on opposite sides of a cell border are not missed, the entire cell coincidence grid is shifted by half a cell width in all possible 2 4 = 16 combinations of the four parameter-space dimensions.Hence 16 different coincidence cell grids are used in the analysis.
The cells in the coincidence grid are constructed to be as small as possible to reduce the probability of coincidences due to false alarms.However, since each of the 17 different data segments uses a different parameter space grid, the coincidence cells must be chosen to be large enough that the candidate events from a CW source (which would appear at slightly different points in parameter space in each of the 17 data segments) would still lie in the same coincidence cell.

Frequency and spin-down coincidence windows
In frequency, the spacing of the parameter-space grid is largest for the data segment with the smallest value of T span,j , which is the first data segment j = 1.At first, this would appear to be the correct size ∆f for the coincidence cell in the frequency direction.However since the frequency of a candidate event must be shifted to a fixed fiducial time according to its spin-down value, and since that spin-down value can not be more accurate than the ḟ spacing, the size of the coincidence cell must be somewhat larger to accommodate the effects of this discretization error in ḟ .The coincidence window in the frequency direction is thus determined by ∆f = max where the maximization over j selects the data segment with the smallest T span,j (which is j = 1) and is the total time span between the latest and earliest data segments.For safety, e.g. against noise fluctuations that could shift a candidate peak, ∆f has been increased by a further 40% below 300 Hz, so that the width of the coincidence cell in frequency is ∆f = 0.77 × 10 −3 Hz, and by 30% above 300 Hz, so that ∆f = 1.08 × 10 −3 Hz.
For the spin-down parameter, the size of the coincidence cell is given by the largest d ḟj spacing in the parameter space grid, which is also determined by the smallest value of T span,j .For safety this is also increased by 40% below 300 Hz giving ∆ ḟ = 3.7 × 10 −10 Hz s −1 , and by 30% above 300 Hz giving ∆ ḟ = 5.18 × 10 −10 Hz s −1 .

Coincidence windows in apparent sky position
Determining the optimal size for the coincidence cells in the sky coordinate directions is more difficult.Each of the 17 different data segments uses a different sky-grid, as illustrated in Figure 3. Ideally the size of the coincidence cells in these sky directions must be just large enough to enclose one parameter space grid point from each of the 17 different sky-grids.A practical solution to determine the coincidence cells, which is close to optimal, makes use of an observation concerning the parameter-space metric that first appears in [25].
To understand the properties of the parameter-space metric, first consider the relative orders-of-magnitude of the different frequency modulation effects.spin-down rate ḟ of a source: the effects of the Earth's center of mass motion are degenerate with a shift in frequency and spin-down.So the Earth's orbital motion causes a signal only to shift to a different template in f and ḟ ; the Earth's rotation has a period of one sidereal day and can not be modeled by a shift in f or ḟ .Note that terms are neglected only in determining where to place search grid points in parameter space (because the neglected terms have an insignificant effect on where the grid points are placed).The actual filtering of the data uses "exact" barycentering routines (total timing errors below 3µs).
The search grid in parameter space is a Cartesian product of a frequency grid, a spin-down grid, and a twodimensional sky grid.Since the search maximizes the detection statistic over frequency and spin-down, the metric used to place grid points on the sky [45] may be obtained by minimizing the four-dimensional metric over frequency and spin-down and projecting it into the sky directions.As shown in the previous paragraph, this twodimensional projected sky metric is well-approximated by assuming that the Earth is spinning about its axis but has its center of mass at rest.If the coherent integration period is an integer number of days, then by symmetry the two-dimensional metric on the sky is invariant under rotation about Earth's axis (∂ α is a Killing vector).This is still an approximate symmetry for the search described here, since the coherent integration period and T span are longer than the rotation period (one day).
One can easily find coordinates in which this approximate sky metric (the four-dimensional metric, minimized over frequency and spin-down and projected onto the sky directions) is proportional to diag(1, 1).These new skycoordinates are obtained by perpendicular projection of a point on the two-sphere (celestial sphere) vertically down into the unit radius disk that lies in the equatorial plane.If n denotes a unit vector pointing from the SSB to the source the new coordinates are the components of n in the equatorial plane: n x = cos δ cos α, n y = cos δ sin α.Points which are equally spaced in these coordinates correspond to equal spacing in Doppler shift, since source Doppler shift due to the Earth's rotation is just proportional to the component of the source direction vector in the equatorial plane.It then follows from rotational invariance that (with these approximations) the projected sky metric in these coordinates is proportional to diag(1, 1) [47].The effect may be immediately seen in Figure 6: the grid of "equally-spaced" points forms a (roughly) uniform square grid on the unit radius disk in the equatorial plane.Computing the Jacobian of the coordinate transformation shows that in the original coordinates (α, δ) the coordinate-space density of grid points should be proportional to | cos δ sin δ| = | sin(2δ)|.
This simple behavior of the projected sky metric guides the construction of the coincidence-windows in the sky directions.Define polar coordinates (r, α) on the unit radius disk in the equatorial plane by r = cos δ.The coordinate boundaries of uniformly distributed coincidence cells containing a single parameter-grid point are then given by r dα = dr = const.When written in terms of the original coordinates this becomes This is not directly useful, because it is singular as δ → 0, but suggests a coincidence window size which varies with declination according to the model ∆α(δ) = ∆α(0)/ cos(δ) ( 14) To ensure continuity at δ = δ c , the transition point δ c is defined by the condition ∆α(0)/| sin(|δ c | − κ ∆α(0))| = ∆δ(0).κ is a tuning parameter of order unity, described below.An example of this coincidence window model is shown in Figure 7.
For each of the 145 different 10 Hz bands, the window size is determined by the three constants ∆α(0), ∆δ(0) and κ.For each sky-grid p these values are directly determined from the sky-grids used in the search as follows.For each 10 Hz frequency band the maximum distances between adjacent declination points to either side are calculated for each of the 17 sky-grids as a function of declination δ.In this way, 17 different overlaying curves ∆ j (δ) (one per data segment) are obtained.These are indicated by the circles in Figure 7 for a representative 510 Hz−520 Hz frequency band as illustration.Then the parameter ∆δ(0) is obtained by considering the maximum separation to either side between all neighboring declination grid points and between the 17 different skygrids, increased by a 10% safety factor as ∆δ(0) = 1.1 max The largest separations near the poles (1.4 < |δ| ≤ π/2) are then found and increased by a safety factor of 20% to determine the parameter ∆α(0) via Finally, the parameter κ was chosen by visually examining diagrams similar to Figure 7 for all 145 of the 10 Hz bands.A κ value of 1.5 was found to be sufficient in most cases, while some bands required somewhat higher or lower values.For each triple of sky-coincidence parameters, tests were then performed to check that each skycell contained at least one sky-point from each data segment.In Figure 7 the complete declination coincidence window model given by Equation ( 14) is represented by the solid black curve.The three parameters for all sky-grids as a function of frequency are shown in Figure 8.As stated above, the sky-grids are constant for 10 Hz-wide steps in frequency, and so these parameters vary with the same step-size.

G. Output of the coincidence search and significance of a candidate
The output of the post-processing pipeline is a list of the most coincident candidates.In each frequency band of coincidence-window width ∆f , the coincidence cell containing the largest number of candidate events is found.Thus for each frequency band the pipeline finds the most coincident candidate maximized over the sky and over the spin-down parameter-range.The pipeline outputs the average frequency of the coincidence cell, the average sky position and spin-down of the candidate events, the number of candidate events in the coincidence cell, and the significance of the candidate.The "significance" of a candidate was first introduced in [27].A candidate, consisting of the candidate events 1, . . ., Q, has significance where Q ≤ 17 is the number of candidate events in the same coincidence cell.To understand the meaning of the significance, consider the case of pure Gaussian noise with no signal present.In this case the values of 2F have a central χ 2 distribution with four degrees of freedom.The corresponding probability density function p 0 of 2F is given by The false alarm probability P 0 that 2F exceeds a certain threshold 2F 0 when there is no signal present has the form The joint false alarm probability of candidate events 1, . . ., Q can be written as Q q=1 P 0 (2F q ).Therefore, in this analysis candidates are ranked according to where S = Q q=1 − ln P 0 (2F q ) is exactly the significance defined in Equation (17).Thus ranking candidates by S is equivalent to ranking them by false alarm probability: candidates with large positive significance would not be expected to occur in Gaussian random noise.As will be described later in Section VIII the significance is used to rank equally coincident candidates within the same narrow frequency-band.In such cases the candidate with the largest significance is considered.
The post-processing pipeline has been validated by internal testing, and also using simulated CW signals created via so-called "software injections" [48].In addition, Section VII presents realistic end-to-end testing of the analysis pipeline using "hardware injections", where simulated isolated-pulsar signals are physically added into the interferometer control systems to produce instrumental signals that are indistinguishable from those that would be produced by physical sources of gravitational waves.

V. ESTIMATED SENSITIVITY
The sensitivity of the search is estimated using Monte-Carlo methods for a population of simulated sources.The goal is to find the strain amplitude h 0 at which 10%, 50%, or 90% of sources uniformly populated over the sky and in their "nuisance parameters" (described below) would be detected.As previously discussed, the false alarm probability (of getting 7 or more coincidences in a 0.5 Hz frequency-band) is of order 10 −3 .In this analysis, "detectable" means "produces coincident events in 12 or more distinct data segments".The false alarm probability for obtaining 12 or more coincidences in a 0.5 Hz band is of order 10 −14 , making it extremely unlikely for candidate events from random noise to show up consistently in 12 or more segments of data.This is therefore an estimate of the signal strength required for high-confidence detection.The pipeline developed for this purpose operates in 0.5 Hz frequency bands and consists of testing a large number of distinct simulated sources (trials) to see if they are detectable.A "trial" denotes a single simulated source which is probed for detection.

A. Iteration method
For every trial, source parameters are randomly chosen independent of the previous trial, except for the intrinsic amplitude h 0 .For the first trial h 0 is set to a starting value 30 S h /30 hours.The rule for varying h 0 depends upon the last N last trials, where N 10% last = 100, N 50% last = 20, and N 90% last = 100.In the past N last trials, if more than 10%, 50%, or 90% of simulated sources have been detected then h 0 is decreased by 0.25 h 0 /n trial for the following trial, where n trial is an integer in the range 0 ≤ n trial ≤ 1000 that is incremented with each additional trial.On the other hand, if less than 10%, 50%, 90% of simulated sources have been detected then h 0 is increased by 0.25 h 0 /n trial for the next trial.This process is followed until h 0 has converged to a stationary range after 1000 trials.Then the median of h 0 is found using the h 0 -values starting from that trial, where the desired detection level has been reached the first time during the N last trials.The following describes the pipeline for a single trial.

B. Population of simulated sources
For each trial, a random source frequency is chosen from a uniform distribution in a given 0.5 Hz frequency band and a frequency-derivative is drawn from a uniform distribution in the range covered by the Einstein@Home search.A sky position is chosen from a uniform distribution on the celestial sphere, and random values are chosen for the pulsar "nuisance parameters".These are the inclination parameter cos(ι), initial phase Φ 0 , and polarization angle ψ as defined in [19], and are all drawn from the appropriate uniform distributions.

C. Determination of 2F values for a single simulated source
The noise floors of the different SFTs are estimated at the source's frequency intervals using a running median with a window of ±25 frequency bins.Figure 1 showed the average of these for the data segments used from each instrument.
Then for each set of parameters the detection statistic 2F is estimated using a semi-analytic method.From the estimated noise floor at the simulated source's frequency and given the other source parameters, the expectation value of the F-statistic is calculated analytically as given in [19].A random number is then drawn from a non-central χ 2 distribution with four degrees of freedom and with the corresponding mean value.
These random numbers, drawn from the appropriate distribution of 2F values, would be sufficient to determine the sensitivity of the search, if the template grid in parameter space were very closely spaced, so that the template bank always contained at least one waveform that was a very close match to the putative signal.However the grid in parameter space used in this search is quite "coarse", corresponding to a mismatch of 20% below 290 Hz and 50% above 300 Hz, so that the 2F value that would be returned by the search might be significantly lower than the value drawn from the distribution above.To account for this effect, the sensitivity prediction considers the mismatch between the parameters of the simulated signals (determined by a random number generator) and the template grid points of the search (fixed as described earlier).For each simulated source, the search grid point that is nearest in the sense of the metric is located.Then, using the parameter-space metric, the mismatch between the simulated signal and the closest search template is computed.This gives the fractional amount by which the 2F value is reduced.
From this ensemble of 2F values, one can determine the number of coincidences that would be produced by each simulated source.As previously described, the postprocessing sets an effective lower threshold on the Fstatistic of the retained candidate events.For each simulated source, these thresholds are determined by examining the exact workunits that would have contained the corresponding signal.Then the number of data segments for which the estimated 2F values are above threshold is counted.If the 2F values are above threshold in 12 or more of the 17 data segments, the simulated source is labeled as "detected", else it is labeled as "undetected".

D. Search sensitivity, estimated errors, and comparison with the expected sensitivity
Shown in Figure 9 are the resulting search sensitivity curves as functions of frequency.Each data point on the plot denotes the results of 1000 independent trials.These show the values of h 0 as defined in [19] such that 10%, 50%, and 90% of simulated sources produce 12 or more coincidences in the post-processing pipeline.The dominant sources of error in this sensitivity curve are uncertainties in the magnitudes of the LIGO detector response functions (calibration errors).Details of these frequency-dependent uncertainties may be found in reference [41].The uncertainties are typically of order 5% (L1) and 8% (H1) in the frequency band from 50-1500 Hz, and are always less than 10%.Systematic errors, which arise because of the finite number of Monte-Carlo trials and similar effects, are less than ±2%.These can be added in quadrature to the uncertainties given in [41] to obtain frequency-dependent error bounds in the sensitivity curve.The resulting error in this sensitivity plot is below 10% at all frequencies.
The behavior of the curves shown in Figure 9 essentially reflect the instrument noise given in Figure 1.One may fit the curves obtained in Figure 9 to the strain noise power spectral density S h (f ) and then describe the three sensitivity curves in Figure 9 by where the pre-factors R D for different detection probabilities levels D = 90%, 50%, and 10% are well fit below 300 Hz by R 90% = 31.8,R 50% = 20.1, and R 10% = 12.6, above 300 Hz by R 90% = 33.2,R 50% = 21.0, and R 10% = 12.9.Some other published CW searches were done at 95% detection confidence.For comparison in the next section, the sensitivity of this search at that confidence is R 95% = 36.2below 300 Hz and R 95% = 37.9 above 300 Hz.The iteration method previously described used N 95% last = 200.

E. Comparison with sensitivity of other search and upper limit methods
The methods used here would be expected to yield very high confidence if a strong signal were present.It is interesting to compare the sensitivity of this detection method with the sensitivity of CW upper limits such as that of reference [27].The sensitivity of the highconfidence detection method used here is well-described by Equation 21.The same equation describes the results of the S2 F-statistic loudest-event upper limit analysis [27], but in that work the 95% detection confidence curve has a pre-factor R 95% = 29.5.It is useful to understand the source of this apparent difference in sensitivity (a factor of 37.9/29.5 = 1.28).There are three main contributors to this difference.
The most significant difference between the two analyses is the spacing of the search grid templates.In this search, the templates are significantly farther apart (worst-case 50% loss of signal-to-noise ratio, or expected 2F) than in [27], where the worst-case mismatch was negligible.This effect of employing different mismatches has been studied by running the sensitivity estimation pipeline using simulated sources only at the template grid points, and reduces R 95% in Equation ( 21) by a factor of 1.17.
Another difference between the two analyses is the detection criteria.In this work, detection requires a signal to produce 12 or more coincidences between the 17 different data segments.This corresponds to a false alarm probability (in Gaussian noise) of order 10 −14 per 0.5 Hz frequency-band.This is different from [27], where simulated signals are compared against the loudest candi-FIG.9: Estimated sensitivity of the Einstein@Home search for isolated CW sources in the LIGO S4 data set.The set of three curves shows the source strain amplitudes h0 at which 10% (bottom), 50% (middle) and 90% (top) of simulated sources would be confidently detected by the Einstein@Home pipeline (appearing in coincidence in 12 or more of the 17 data segments).The centers of the circles labeled P0 to P9 give the strain amplitudes of the S4 hardware injections as listed in Table IV.Based on this curve, one would expect that the simulated signals P3, P4 and P8 could be confidently detected, and that P0, P1 and P5 would not be detected.

date found (largest 2F
).An equivalent detection criterion for this work would be to compare the simulated signals against the loudest candidates (per 0.5 Hz band).These typically had 7 or 8 coincidences, corresponding to a Gaussian noise false alarm probability of order 10 −3 and 10 −5 , respectively.To estimate the effect on the sensitivity, the sensitivity estimation pipeline was rerun, but now requiring the signal to exceed the 2F-thresholds in only 7 of the 17 data segments.This reduced R 95% in Equation ( 21) by an additional factor of 1.14.
The least important difference between the two analyses is the effective threshold on the F-statistic.As explained in Sections IV A and IV C, only a subset of candidate events with the largest values of 2F are retained in the post-processing, fixing the false alarm probability.The smallest 2F-value on this list is typically around 28 or slightly higher.In [27] a fixed threshold of 2F = 20 has been used.Then, events with a combined significance below S = 64.4[see Equation (17)] were also dropped.While it is difficult to compare these two criteria, they seem to be fairly close.
Taken together, the differences in grid spacing and detection thresholds are responsible for, and explain, the sensitivity difference in the two analyses (a factor of 1.17 × 1.14 = 1.33 ≈ 1.28).

VI. VETOING OF INSTRUMENTAL LINE ARTIFACTS
When the instrument data was prepared and cleaned, narrow-band instrumental line features of known origin were removed, as previously described in Section II.However, the data also contained stationary instrumental line features that were not understood, or were poorly understood.These features were not removed from the input data for the search.As a consequence, the output from the post-processing pipeline contains instrumental artifacts that in some respects mimic CW signals.But these artifacts tend to cluster in certain regions of parameter space, and in many cases they can be automatically identified and vetoed.In previous incoherent searches for CW sources in LIGO data [32] the S-veto method has been employed, which excludes the regions of parameter space where there is little or no frequency modulation from the Earth's motion, leading to a relatively stationary detected frequency.This cannot directly be applied to a coherent matched-filtering search using the F-statistic.Thus the method used here will be similar, but arises from a conceptually different approach that is appropriate for an F-statistic search.

A. Parameter space locations of instrumental lines
For a coherent observation of 30 hours the parameterspace regions where instrumental lines tend to appear are determined by the global-correlation hypersurfaces [49] in parameter space.The global-correlation hypersurface on which stationary instrumental lines preferably produce candidate events is shown in [49] to be described by where c denotes the speed of light, n is a unit vector pointing to the source's sky-location in the SSB frame and relates to the equatorial coordinates α and δ by n = (cos δ cos α, cos δ sin α, sin δ), ω is the angular velocity vector of the Earth as it orbits around the Sun (|ω| ≈ 2π/year) and v av is the average velocity of the Earth (|v av | ≈ 9.9 × 10 −5 c).This equation can also be understood on simple physical grounds.The l.h.s. of Equation ( 22) is the rate of change of detector output frequency, for a source whose SSB frequency and spindown are f and ḟ .An instrumental line, which has fixed detector frequency, mimics such a source when the l.h.s.vanishes.
The potential CW sources whose locations in parameter space are consistent with Equation ( 22) will not produce a modulation pattern that would distinguish them from an instrumental line.As the resolution in parameter space is finite, the post-processing analysis eliminates (vetoes) candidates that satisfy the condition where the parameter > 0 accounts for a certain tolerance needed due to the parameter-space gridding.This tolerance-parameter can be understood as where ∆f denotes width in frequency (corresponding to the coincidence-cell width in the post-processing) up to which candidate events can be resolved during the characteristic length of time ∆T , and N cell represents the size of the vetoed or rejected region, measured in coincidence cells .In this analysis ∆T = 2 122 735 s (≈ 24 days) is the total time interval spanned by the data For potential sources that satisfy (23), the modulation due to the Earth's motion does not make the signal appear in more than N cell coincidence cells during ∆T .

B. Fraction of parameter space excluded by the veto
One can visualize and calculate the volume of the region in four-dimensional parameter space that is excluded by this veto.For a given source sky position, Equation (23) is linear in f and ḟ .Thus, for fixed sky position n, the veto condition (23) defines two parallel lines in the (f, ḟ )-plane.Candidate events that lie in the region between the lines are discarded (vetoed).Candidates that lie outside this region are retained (not vetoed).The locations of these two lines in the (f, ḟ ) plane depend upon the sky position.The fractional volume excluded by the veto depends upon whether or not (as the source position varies over the sky) the excluded region between the lines lies inside or outside of the boundaries of the search, or intersects it.In this work, for the search region −f /τ < ḟ < 0.1 f /τ described in the Abstract, the excluded region lies entirely within the parameter space above 300 Hz, but crosses the boundaries below 300 Hz.This is because a wider range of spin-down ages is searched below 300 Hz.
The fractional volume of the region in parameter space excluded by the veto may be easily calculated.The details of the calculation are in Appendix A. The resulting fraction of sky excluded by the veto (uniformly averaged over spin-down) as a function of frequency is shown in Figure 10.In this search, the fraction of the sky excluded for frequencies f ∈ [300, 1500) Hz has been fixed at the constant fraction 30%.In this search, the fraction of the sky excluded for frequencies f ∈ [50, 300) Hz has been chosen to depend upon the values of f and ḟ , where the uniform average of the excluded sky fraction over the spin-down range considered in this analysis is 36% at 50 Hz and 6% just below 300 Hz.Finally, Figure 15 shows a conclusion diagram illustrating which of the candidates have been vetoed in this search.

VII. HARDWARE-INJECTED SIGNALS
A good way to test and validate search algorithms and code is to add simulated signals into the detector's data stream.This can either be done while the experiment is in progress (real-time injections) or after the data has been collected (software injections).If it is done while the experiment is in progress, the simulated signals can either be added into the hardware (into feedback and error-point control signals) or after data acquisition.
At the time that the S4 run was carried out, ten simulated CW signals were injected at the hardware level: using magnetic coil actuators, the interferometer mirrors were physically made to move as if a gravitational wave was present.

A. Parameters of hardware injections
Table IV shows the parameters of the hardware injections that were carried out at the LIGO detectors during the S4 run, mimicking gravitational-wave signals from ten different isolated pulsars with different frequencies, sky locations, and frequency derivatives.The ten artificial pulsars are denoted Pulsar0 to Pulsar9.At the time of the injections, lack of complete knowledge of the instrument's response function (calibration) meant that the actual hardware injections did not actually have the intended strain amplitudes as given in the Table .The effective strain amplitudes may be computed from correction factors provided in reference [32].These factors are 1.12 for all simulated pulsars in the H1 detector.In the  L1 detector, the correction factor is 1.11 for all simulated pulsars, except for Pulsar1 (1.15) and Pulsar9 (1.18).

B. Duty cycle of hardware injections
During S4 the hardware injections were not active all of the time.Table V shows the fractional overlap between the times when the hardware injections were active and the times of the S4 Einstein@home data segments.As can be seen from the table, the hardware injections were only turned on during twelve of the data segments analyzed in this paper, and for two of those twelve data data segments, the injections were only turned on for about 20% of the data taking time.In the remaining ten data segments, the hardware injections were turned on for al-most the entire segment.This needs to be taken into account when analyzing the Einstein@Home search results for these injections.Because of this, the maximum possible number of coincidences expected from these simulated signals is 12, even though 17 data segments are searched.

C. Results from the hardware injections
For each hardware-injected pulsar signal Table VI compares a prediction for the outcome of the Einstein@Home search to the actual results found through the Einstein@Home analysis pipeline.The predicted values given in Table VI are obtained by feeding the sensitivityestimation pipeline, which was described in Section V,  IV: Parameters for hardware-injected CW signals during the S4 run, labeled Pulsar0 to Pulsar9.The parameters are defined at the GPS reference time t ref = 793130413 s in the Solar System Barycenter.These are the frequency f (t ref ), the spin-down ḟ , the sky position right ascension α and declination δ, the polarization angle ψ, the initial phase Φ0, the inclination parameter cos ι, and the dimensionless strain amplitude h0.Because the calibration was only accurately determined after S4 was finished, the H1 strain amplitudes should be multiplied by the correction factor 1.12.The L1 amplitudes should be multiplied by 1. 15   sar8 are clearly detected.The parameters of Pulsar7 and Pulsar9 are such that in both cases the search pipeline found 7 coincidences, but this is consistent with the level of coincidences that would result from Gaussian noise with no signal present, and so these are not confidently detected.
Figure 11 presents the results of the search for all hardware injections.Small subspaces of the search parameter space around the hardware injections are shown, as well as the locations of the artificial signal parameters.The subspaces considered in Figure 11 and also for the (measured) results presented in   IV.The data points colored in dark-gray show the candidates resulting from the hardware-injected CW signals.
to a band of 2 × 10 −4 f to either side of the injected frequency.This choice of frequency-bandwidth is motivated by the global parameter-space correlations [49,50] and represents the approximate maximum extension in frequency-direction of a global-correlation hypersurface generated by a single signal.
Moreover, the global correlations also explain the significant sky position offset between the a priori location of the simulated source and the location where the search located the source with respect to the detected signals Pulsar4 and Pulsar8.This arises because for Pulsar4 and Pulsar8, the spin-down range that is searched (region between dashed lines in the far right column) is too small to include the actual spin-down value used in creating the simulated signals.Due to the global parameter-space correlations the offset between the actual and detected spin-down value gives rise to the offset in the sky position.The observed structure of large-coincident events in the sky is consistent with a (sky-projected) globalcorrelation hypersurface first found analytically in [49].This is also why Pulsar4 shows a considerable discrepancy between the significance that would have been expected if the search-grid had also covered the a priori parameters, and the significance that was actually observed in the search, as shown in Table VI.

VIII. RESULTS
This section presents the results of the Einstein@Home S4 CW search.Figures 12 and 13 give a summary of all post-processing results, from 50 to 1500 Hz.In Figure 12 the coincidences and significance of all candidates that have 7 or more coincidences are shown as functions of frequency.Figure 13 presents the same information as given in Figure 12, but projected on the sky, and showing all cells that have more than 7 candidate events.
In Figure 13 the number of coincidences is maximized over the entire sky and full spin-down range.The color indicates the numbers of coincidences, where the same color-scale has been used in each plot.The maximum possible number of coincidences ranges from a minimum of 0 to a maximum of 17 (the number of data segments analyzed).The meaning of 0 coincidences is that there is no candidate event found, 1 coincidence means a single candidate events is found (which is always coincident with itself).
Four illustrative examples of different types of typical post-processing results in particular 10 Hz bands are shown in Appendix B by Figures 16,18,17,and 19.Table VII shows all candidates (cells) which have 10  cand and C L1 cand denote the number of coincidences contributing to C cand from detector H1 and L1, respectively.The column "Information" lists information about the source.The following are understood sources of narrow-band line noise in the instrument: "Demod" are the electronics boards that demodulate the signal at the antisymmetric port of the interferometer, "H1 (or L1) MC 1" is a violin mode resonance of the first mode cleaner mirror, "H1 MC 2/3" are violin mode resonances of the second and third mode cleaner mirrors,"TM violin" are harmonics of the test mass violin modes, "EX +15v" is a fifteen volt power supply at the end station of the X arm, "EM Interference" is electromagnetic interference, "H1 Cal" are side-bands of calibration lines at 393.1 Hz and 1144.3Hz.
or more coincidences.In cases where a set of candidates is clustered together at slightly different frequencies, Table VII lists the bandwidth in frequency covered by these candidates and shows the parameters of the most coincident candidate.If candidates within these narrow frequency-bands have the same number of coincidences, then the candidate with the largest significance is shown.
Table VIII shows the same information after the veto method described in Section VI has been applied, for candidates with 9 or more coincidences.There are no candidates that exceed the predefined detection threshold of appearing in 12 or more data segments.(Note that this would be a threshold for initiating a more extensive investigation of the candidate event, not a threshold for announcing a discovery!) Figure 14 shows all candidates from the post- and S cand are for the most significant most coincident candidate within the frequency band, where C H1 cand and C L1 cand denote the number of coincidences contributing to C cand from detectors H1 and L1, respectively.The column "Information" lists information about the source.The following are understood sources of narrow-band line noise in the instrument: "Demod" are the electronics boards which demodulate the signal at the antisymmetric port of the interferometer, "H1 MC 2/3" are violin mode resonances of the second and third mode cleaner mirrors,"EM Interference" is electromagnetic interference, "H1 Cal" are side-bands of a 1144.3Hz calibration line.For the single candidate labeled "Unknown" in the last column no instrumental source could be confidently identified, however the 9 coincidences are far below the confident-detection threshold.
processing results that have not been discriminated by the veto introduced in Section VI. Figure 15 illustrates the fraction of candidates that has been excluded by the veto.Removing fractional bands of 2×10 −4 f around the frequencies f of the S4 hardware injections, the veto discriminates 99.5% of all candidates that have more than 7 coincidences.

IX. CONCLUSION
These are the first published results from the Einstein@Home project, which was launched in February 2005.While no credible CW sources were found in this search of LIGO S4 data, the results clearly establish that this type of distributed computing project can carry out a credible and sensitive search for such signals.
In retrospect, it probably would have been a good idea to employ identical grids on the four-dimensional parameter space for all 17 data segments.This would have required more CPU time on the part of participants, but would have greatly simplified and sped up the development of the post-processing pipeline and would also have greatly simplified the interpretation of the results.
A similar search (also with a 30-hour time baseline) has already been completed using 660 hours of data from the beginning of the S5 science run.The post-processing of that data set is currently underway, using methods identical to those employed here.Future Einstein@Home searches overcome some of the sensitivity limitations discussed at the end of Section V by doing the incoherent step (called "post-processing" in this paper) on the host machines.This allows the use of the optimal threshold of 2F ∼ 5, so those searches are expected to be the most sensitive blind CW searches that will be possible using LIGO data.Results from those searches should become available within the next one to two years, and are expected to offer more than one order of magnitude improvement in strain sensitivity compared with the work presented here.
In the longer term, further increases in sensitivity will probably result mostly from improvements in the detectors rather than from improvements in the data analysis methods.In 2009 LIGO is expected to begin its S6 run with an "enhanced" detector configuration that should improve on S5 sensitivity by at least a factor of two.By 2014, an advanced LIGO detector configuration should give at least another factor of five improvement.By combining these data sets with those from LIGO's international partner projects Virgo and GEO, there is FIG.14: Candidates not eliminated by the veto.This shows Hammer-Aitoff sky projections of all candidates obtained from post-processing that had more than 7 coincidences and that passed the veto.The upper plot includes the S4 hardware injections.The lower plot removes bands of 2 × 10 −4 f width to either side of the hardware injections' frequencies f .In comparison to Figure 13, after excluding the hardware injections, the veto rejects 99.5% of all candidates.
real hope that the first direct CW detection can be made using methods like the ones described here.

X. ACKNOWLEDGMENTS
The authors thank the tens of thousands of volunteers who have supported the Einstein@Home project by donating their computer time and expertise for this analysis.Without their contributions, this work would not have been possible.
The authors gratefully acknowledge the support of the United States National Science Foundation for the construction and operation of the LIGO Laboratory and the Particle Physics and Astronomy Research Council of the United Kingdom, the Max-Planck-Society and the State of Niedersachsen/Germany for support of the construction and operation of the GEO600 detector.The authors also gratefully acknowledge the support of the research by

FIG. 1 :
FIG. 1: Strain amplitude spectral densities p S h (f ) of the S4 data from the LIGO detectors H1 (top) and L1 (bottom).The gray curves are medians of the entire uncleaned LIGO S4 science-mode data set with a frequency resolution of 0.125 Hz.The black curves show the cleaned S4 data used in this analysis with a frequency resolution of 0.5 Hz.The top (bottom) plot is the mean of the 10 H1 (7 L1) 30-hour data segments used in this Einstein@Home analysis.

FIG. 3 :
FIG.3: Four different sky-grids in Hammer-Aitoff[46] projection.The top row is for frequency f = 60 Hz and the bottom row is for f = 310 Hz.The left column shows data segment j = 1 (from H1) with a spanned time of Tspan,1 = 33.8h, while the right column shows data segment j = 15 (from L1) with a spanned time of Tspan,15 = 39.8 h.The grid points are spaced more closely for a longer spanned time, and for a higher frequency.

FIG. 4 :
FIG. 4: The circles show the number of candidate events Eseg(k) retained per data segment and per 0.5 Hz frequency band in the post-processing in each 10 Hz band.The dashed curve represents the number of candidate events which are returned from volunteering hosts participating in Einstein@Home.The strange location of the point at 290 Hz is explained in Sec.III B 1.
Since the sky-grids are fixed in 10 Hz intervals, E seg (k) takes the same value for all values of k in the range of 20p + 1, • • • , 20(p + 1) where p labels the sky-grids by an integer in the range p ∈ 0, • • • , 144.It is illustrative to look at a specific case.For example consider the 0.5 Hz band covering [301.0,301.5)Hz, this band is labeled by k = 503.As is shown in Figure 4, in this band the post-processing pipeline retains E seg (k = 503) = 24 960 candidate events from each of the 17 different 30-hour data segments.The 30-hour data segment from H1 with the shortest time span (j = 1) has approximately 4.3 × 10 8 templates divided among just M (j = 1, k = 503) = 2 workunits, so 12 480 candidate events are retained from each of these workunits.The 30hour data segment from L1 with the longest time span (j = 15) has approximately 1.7 × 10 9 templates divided among M (j = 15, k = 503) = 7 workunits, so 3 565 candidate events are retained from each of these workunits.

FIG. 6 :
FIG. 6: Example sky-grid and its projection onto the equatorial plane.This sky-grid corresponds to the data segment j = 7 used in the frequency range from 60 Hz to 70 Hz.The top plot shows a Hammer-Aitoff projection of the sky-grid.The middle plots show the projection of the sky-grid points in the northern hemisphere (left column) and in the southern hemisphere (right column) onto the equatorial plane.The bottom plots show histograms of cos(δ) and the dashed line represents a linear fit to the distribution showing its uniformity.

FIG. 7 :FIG. 8 :
FIG. 7: The sky coincidence-window model for the frequency band from 510 − 520 Hz.The horizontal axis shows the declination δ in radians.On the vertical axis, the circles labeled ∆j(δ) correspond to the the maximum distance in radians to neighboring δ-points on either side.The solid curve shows the declination coincidence-window model ∆δ(δ) with the parameters ∆δ(0) = 0.2489, ∆α(0) = 0.0433, and κ = 1.5 used in this frequency band.It lies just above the largest declination separations shown.The stars denote the borders of the declination coincidence cell-grid and the diamonds represent the borders of the shifted declination coincidence cell-grid.

FIG. 10 :
FIG.10:The average fraction of sky excluded by the veto method as a function of frequency, uniformly averaged over the searched spin-down range.

FIG. 11 :
FIG. 11: Einstein@Home results showing the 10 hardware-injected pulsar signals labeled P0 to P9.Here, a narrow band of width 2 × 10 −4 f to either side of each injection's frequency f is considered.The color-bar in each plot indicates the number of coincidences.As shown in the color-scale, only candidates having 7 or more coincidences appear.For each hardware injection a group of three different sub-plots are given representing different projections of the parameter space.The left sub plot is a Hammer-Aitoff projection of the entire sky.The middle sub-plot shows declination δ versus frequency f .The right sub-plot shows spin-down ḟ versus frequency f , where the region between the two horizontal magenta dashed lines refers to searched range of spin-downs.The center of a magenta circle represents the location of the injection.P4 and P8 appear at the wrong sky position because their intrinsic spin-downs lie outside the searched range.Table VI shows a comparison with the expectations for these simulated signals.

FIG. 12 :
FIG. 12: Numbers of coincidences greater than 7 (top) and the significance (bottom) of all candidates found in the Einstein@Home post-processing, shown as functions of frequency.The light-gray shaded rectangular regions highlight the S4 hardware injections, listed in TableIV.The data points colored in dark-gray show the candidates resulting from the hardware-injected CW signals.

FIG. 13 :
FIG. 13: All candidates obtained from the post-processing that have more than 7 coincidences, shown in Hammer-Aitoff projections of the sky.The color-bar indicates the number of coincidences of a particular candidate (cell).The upper plot includes the S4 hardware-injected pulsars.In the lower plot, bands of 2 × 10 −4 f width to either side of the hardware injections' frequencies f have been removed.

FIG. 15 :
FIG. 15: Conclusion diagram of candidates discriminated by the veto method.All candidate cells obtained from postprocessing that have more than 7 coincidences are shown, where the color-bar indicates the number of coincidences of a particular cell.The vertical axis represents the veto quantity on the left-hand side of (23), as a function of frequency.Candidates located below the magenta line are eliminated by the veto.The four accumulations of highly coincident cells above the magenta line are the hardware injected pulsars, which are not eliminated by the veto.

FIG. 17 :
FIG. 17: Einstein@Home S4 Post-Processing results for a "quiet" frequency band of real instrumental data from 110.0-120.0Hz.From top to bottom the different plots show the numbers of coincidences in a 3D map of sky and spin-down, in a 2D Hammer-Aitoff projection of the sky, in a 2D plot of declination over frequency, and in a histogram as a function of frequency.

FIG. 18 :
FIG. 18: Einstein@Home S4 Post-Processing results for the frequency band 570.0-580.0Hz including a hardware injected CW signal (Pulsar2).From top to bottom the different plots show the numbers of coincidences in a 3D map of sky and spin-down, in a 2D Hammer-Aitoff projection of the sky, in a 2D plot of declination over frequency, and in a histogram as a function of frequency.

FIG. 19 :
FIG. 19: Einstein@Home S4 Post-Processing results for a "noisy" frequency band of data polluted by instrumental noise artifacts from 640.0-650.0Hz.These spectral features are resonance modes of the mode cleaner optics suspensions.From top to bottom the different plots show the numbers of coincidences in a 3D map of sky and spin-down, in a 2D Hammer-Aitoff projection of the sky, in a 2D plot of declination over frequency, and in a histogram as a function of frequency.

TABLE I :
Segments of S4 data used in this search, in order of decreasing sensitivity at 141.3 Hz for H1 and at 135.3 Hz for L1.The columns are the data segment index j, the GPS start time tj, the GPS end time t end j , and the time spanned Tspan,j = t end j − tj.

TABLE II
: Instrumental lines removed from the input data.The three columns show the frequency of the fundamental harmonic fLine, the number of harmonics N , and the bandwidth ∆fLine removed on either side of the central frequency (total bandwidth removed per harmonic is 2∆fLine).In total 77.92Hz of H1 data and 144.29 Hz of L1 data have been excluded ab initio.If ∆fLine = 0 then the line-cleaning algorithm replaces a single Fourier bin with the average of bins on either side.The spacing between Fourier bins is 1/1800 Hz.

TABLE III :
False alarm probabilities PF in four different 0.5 Hz frequency bands labeled by the integer k.The frequency at the lower boundary of the 0.5 Hz band k is denoted by f k .The number of coincidence cells in the k'th half-Hz frequency band is denoted by N cell (k).The probability of finding 7 or more coincidences (C ≥ 7) in randomly-distributed noise is fixed to be 0.1%.The probability of finding 12 or more coincidences (the detection threshold, C ≥ 12) in random noise varies over two orders of magnitude, from about 10 −15 to 10 −13 .The probability of finding 14 or more coincidences (C ≥ 14) in random noise varies from about 10 −18 to 10 −21 .
Table VI shows a comparison with the expectations for these simulated signals.

TABLE V :
for Pulsar1, 1.18 for Pulsar9, and 1.11 for the others.The time overlap between the Einstein@Home data segments and the hardware injections.The hardware injections were only turned on about 2/3 of the time.The columns are data segment index j, detector, the duration of the overlap, and the fractional overlap (obtained by dividing the third column by 30 hours = 108 000 s).

TABLE VII :
The post-processing candidates that have 10 or more coincidences.The frequency f cand corresponds to the most coincident candidate in the band.The lowest frequency of a candidate in the band is labeled by fstart.The difference from the highest frequency is given by ∆f cand .The parameters δ cand , α cand , ḟcand , C cand = C H1 cand + C L1 cand and S cand are for the most significant most coincident candidate within the frequency band, where C H1

TABLE VIII :
Post-processing candidates that have 9 or more coincidences and that are not excluded by the veto.The frequency f cand corresponds to the most coincident candidate in the band.The lowest frequency of a candidate in the band is labeled by fstart.The difference from the highest frequency is given by ∆f cand .The parameters δ cand , α cand , ḟcand , C cand = C H1 cand + C L1 cand