Inference with finite time series: Observing the gravitational Universe through windows

Time series analysis is ubiquitous in many fields of science including gravitational-wave astronomy, where strain time series are analyzed to infer the nature of gravitational-wave sources, e.g., black holes and neutron stars. It is common in gravitational-wave transient studies to apply a tapered window function to reduce the effects of spectral artifacts from the sharp edges of data segments. We show that the conventional analysis of tapered data fails to take into account covariance between frequency bins, which arises for all finite time series -- no matter the choice of window function. We discuss the origin of this covariance and show that as the number of gravitational-wave detections grows, and as we gain access to more high signal-to-noise ratio events, this covariance will become a non-negligible source of systematic error. We derive a framework that models the correlation induced by the window function and demonstrate this solution using both data from the first LIGO--Virgo transient catalog and simulated Gaussian noise.


I. INTRODUCTION
Time-series analysis underpins recent advances in gravitational-wave astronomy. The vast majority of gravitational-wave data analysis relies on windowing, a procedure that multiplies the time-domain data segment by a window function that tapers off at the beginning and end of the segment. Analysts apply tapered windows to mitigate two effects: (1) spectral artifacts arising from the Fourier transform of the data segment edges (Gibbs phenomena) and (2) correlations between neighboring frequency bins. While correlations between neighboring frequency bins can be reduced, they are never eliminated.
Choosing a suitable window requires balancing various considerations including the spectral leakage from instrumental lines and the low-frequency "seismic wall," effectiveness mitigating the Gibbs phenomenon, and the loss of signal. For a systematic study of the properties of different windows, we refer the reader to, e.g., [1,2] for theoretical introductions, and [3] for a specific discussion in the context of gravitational-wave data analysis. Once the data are windowed, they are typically analyzed in the frequency domain where the noise is described by a power spectral density (PSD), and it is assumed that each frequency bin is statistically independent. However, this assumption is not true for a finite stretch of a longer noise process.
The assumptions underpinning the Whittle approximation have been thoroughly studied and many refinements have been proposed (e.g., [4][5][6][7][8][9][10]). In this paper, we derive from first principles a formalism which accounts a colm.talbot@ligo.org for the correlations between neighboring frequencies introduced by the window function applied to obtain finite time series. We show how correlations between frequency bins arise from the fact that quasi-stationary Gaussian noise processes are fundamentally described in the frequency domain by continuous functions, which imply infinite-duration time series. We derive a simple expression for the "finite-duration" covariance matrix, which encodes the correlations naturally present in all finite time series and identify our result as a specific basis for a Karhunen-Loève transformation (KLT) (see, e.g., [11]). We show that there are practical applications where the current conventional approach of windowing data incurs systematic errors, which though small, produce invalid inferences when data are combined in large sets or when we analyze gravitational-wave events with high signal-tonoise ratio (SNR).
The remainder of this paper is organized as follows.
In Section II, we present the formalism underlying our framework. We derive the finite-duration covariance matrix for the analysis of finite time series. In Section III, we perform a demonstration, applying our method to the binary black hole merger events, GW150914 [12], GW170814 [13], and GW190521 [14] and contrast with results neglecting covariances. We show how the current conventional windowing procedure can lead to faulty inferences when many gravitational-wave measurements are combined. While our demonstration uses data from gravitational-wave astronomy, the framework we put forward is broadly applicable to all time-domain analysis. We show that the problem is fixed by using the finiteduration covariance matrix. We provide closing thoughts in Section IV.

II. FORMALISM
In this section, we derive a likelihood that enables us to analyze time-series data characterized by stationary, Gaussian noise in a way that correctly takes into account covariance between neighboring frequency bins that arises for all finite time series.

A. Basic notation
We consider time-series data d(t) consisting of signal s(t) and noise n(t) d(t) = s(t) + n(t). (1) In gravitational-wave observatories like LIGO [15] and Virgo [16], n(t) is a time series of dimensionless strain (change in length per unit length). Transient gravitational-wave signals from merging binaries are characterized by comparing the data to gravitational waveform templates h(t).

B. Continuous, infinite-duration noise
To start, we focus on noise in the absence of signals. The noise can be expressed in the frequency domain as The noise is best described as continuous because it is defined for an arbitrary choice of frequency: with a sufficiently long measurement, it is possible in principle to achieve sufficient resolution to measureñ(f ) for any value of f . If we assume the noise is Gaussian, the likelihood of observing a specific noise realization is characterized by a covariance matrix the diagonal of which is equal to the noise PSD S We refer to C µν as the "infinite-duration" covariance matrix. It is defined continuously for arbitrary values of f µ and f ν and, in the time domain, it is defined for all times: (−∞, +∞). Throughout, repeated indices are summed over unless otherwise specified. In the next subsection, we contrast C µν (calligraphic script and greek indices) with the finite-duration covariance matrix, denoted C ij (no calligraphic script and roman indices), which is defined only for discrete frequency bins f i and f j (or equivalently, for a finite duration). If we further assume that the noise is stationary (the PSD does not vary in time), C µν is diagonal.
We note that, even if the true covariance matrix is diagonal, a naive empirical estimate of this quantity necessarily has some statistical uncertainty and will not generically be diagonal, even for a stationary Gaussian process. We emphasize that in this section we seek to derive expressions for the true noise covariance matrix assuming C µν is known. and neglect the impact of empirical estimates. In Appendix B, we describe how we estimate C µν in practice.

C. Non-continuous, finite-duration noise
In practice, we only consider finite stretches of data. In this subsection, we derive the properties of finite stretches of continuous noise. Let us consider data measured with sampling rate f s over data segment duration T . There are independent measurements. We assume that the noise has no content above half the sampling rate and so we can probe every frequency without aliasing. In practice, applying an aggressive low-pass filter removes this highfrequency content. These data can be represented either in the timedomain as a real N -component time series with spacing 1/f s or in the frequency domain as a complex frequency seriesd i with −f s /2 ≤ f ≤ f s /2 with spacing 1/T where the endpoints and zero-frequency component are required to be real. These two domains are related via the discrete Fourier transform [17] The frequency-domain covariance matrix for finiteduration, non-continuous noise is: Here, the angled brackets denote ensemble averages over noise realizations. The widely-used Whittle approximation assumes that data at each of the analyzed frequencies are independent, i.e., C ij is a diagonal matrix. This is generally a good approximation. However, as we show in this paper, the assumption of independence is not strictly valid when analyzing a finite stretch of data, especially when using a tapered window. We begin by defining our window function w, which describes how we measure some segment of noise from what is, in theory, an infinite-duration noise process: Here, w j is a time-domain window function and the frequency-domain noise is now the convolution of the original frequency-domain noise with the Fourier transformed window function. The prime denotes quantities associated with the windowed data. We stress that this window function is always present in gravitational-wave data analysis problems and is defined for all times, not just the analysis segment. It is often ignored when it is a top hat function, i.e., The most commonly used window function in parameter estimation for compact binary coalescences is the Tukey window Common limiting cases of the Tukey window are the rectangular window (α = 0) and the Hann window (α = 1). Throughout this paper, we use Tukey windows with α = 0.1 unless otherwise specified, although the formalism described here holds for arbitrary window functions.
Since convolution is a linear operation, we can express the windowed frequency-domain data using standard linear algebra notationd where repeated indices denote summation. Here,W kµ is a non-square subset of the circulant matrix that projects infinite-duration noise with frequency resolution δf → 0 to finite duration data with frequency resolution 1/T . Here,w is the discrete Fourier transform of the time-domain window function.
We can now write the covariance matrix for a finiteduration data stream with an arbitrary window function in terms of the frequency-domain covariance matrix of the infinite-duration process and the window function us- The estimated (top) and analytic (bottom) covariance matrix for simulated noise using a noise power spectral density estimated from LIGO Livingston data around the time of the binary black hole merger GW170814. The color bar indicates log 10 power spectral density in units of Hz −1 . The estimate is obtained using approximately three months of simulated data. The off-diagonal behavior agrees well with the analytic expression. The window used is a Tukey window with α = 0.1.
ing Eq. 7 and Eq. 13 If the underlying data are Gaussian and stationary, C µν is diagonal and the finite-duration covariance matrix only depends on the window function and the infiniteduration PSD. While we initially defined the roman indices as covering frequencies from [−f s /2, f s /2], in practice we analyze a narrower (positive) frequency range from [f min , f max ]. In this paper we will set f min = 20 Hz, f max = 800 Hz; this omits data that are affected by the bandpass filter we apply. From here, roman indices will refer to this frequency range only. Formally, one must carry out matrix products over the infinite axes denoted by greek indices to obtain the finiteduration covariance matrix in Eq. 16. In practice, however, via numerical experiment (see Appendix B) we find that infinite-duration matrices can be adequately approximated using a frequency resolution 16 times that of the analysis segment. In other words, when analyzing a 4 s data segment (frequency resolution = 0.25 Hz), we may model infinite-duration noise with a 1/64 Hz (or higherresolution) noise model. In practice, we use a 1/128 Hz resolution. The resolution of the noise model can be tuned to achieve the necessary precision for a given problem.
In Fig. 1 we show the empirical finite-duration covariance matrix in the neighborhood of the diagonalobtained by averaging approximately three months of simulated Gaussian noise broken into 10 6 4 s segments and using a Tukey window with α = 0.1 (Eq. 7) (top panel)-compared with the exact finite-duration covariance matrix obtained with our analytic expressions (Eq. 16) (bottom panel). The underlying PSD (S µ ) is estimated using data from the LIGO Livingston interferometer at the time of the binary black hole merger GW170814 with a resolution of 1/128 Hz using the method described in Appendix B. The two panels agree well, demonstrating the correctness of this formalism. However, the averaging estimate converges unacceptably slowly for practical use, and so our analytic expression is essential for practical applications.
We are interested in the inverse of the covariance matrix C −1 ij . We discuss the issue of inverting this matrix in Sec. II E. We also emphasize that the finite-duration PSD (the leading diagonal of C ij ) S i = S i , i.e., the finite-duration PSD is not the infinite-duration evaluated at the desired frequencies. Additional technical details about this formalism are provided in the Appendix. In Appendix A, we discuss how our formalism is related to "coarse-graining" procedures used for PSD estimation (e.g., [18]). In Appendix C and Algorithm 1, we describe in detail how we approximate C µν and C ij from real data.

D. Quantifying spectral leakage
From Fig. 1, we see that windowing produces offdiagonal elements in the finite-duration covariance matrix in general. In this subsection, we illustrate how this covariance is compounded by sharp spectral features. The relationship between the choice of window and spectral leakage from sharp spectral features is well known in signal processing; see, e.g., [1]. In gravitational-wave detectors, there are a number of such features commonly referred to as "lines" [19] or the low-frequency seismic wall. The leading-order correction from the off-diagonal components of the covariance matrix is clearly seen using the following metric: We consider two regimes that determine the limiting behaviors of ∆ i : (i) where the power spectral density is slowly varying (locally white noise) and (ii) near a large spectral feature. We first consider the case of white noise, i.e., C µν = Sδ µν . We write the finite-duration covariance matrix and We note that the quantityW iµW * jµ is real and its amplitude monotonically decreases as |i − j| increases. The contamination is therefore maximized when i and j are neighboring finite-duration frequency bins (we denote this as j = i ± and emphasize that i ± = i ± 1 due to the omitted interstitial frequencies in the finite-duration analysis). We find The variable ∆ i is a monotonically increasing function of the Tukey parameter α, with ∆ α=0 i = 0 for a rectangular window and ∆ α=1 i = 2/3 for a Hann window (see Appendix E for a derivation). While the power spectrum of gravitational-wave detectors is not white, it is slowly varying away from the large spectral features and so we expect this approximation to hold throughout much of the observing band.
The other limiting case we can analytically describe is the behavior near a spectral line with relative amplitude L at frequency f µ l with an otherwise white spectrum. In this case we can approximate We can now write the row corresponding to the line in the finite-duration covariance matrix:  [13]. We see two competing effects. First, there is broadband contamination at ∼ 10% when the PSD is slowly varying. The magnitude of this contamination rises with increasing Tukey α (see Appendix E). There is also large contamination near instrumental lines due to spectral leakage that is suppressed with increasing Tukey α.
In the last line we assume Lw i−µ l 1 and so the contamination will fall off with the same spectral shape as the window function: We emphasize at this stage that µ l is not necessarily (and in fact almost guaranteed to not be) contained in the set of roman indices; i.e., the line is not exactly a delta function at one of the 1/T Hz spaced frequency bins. If the line were located at one of these frequencies, a rectangular window would have zero spectral leakage and be the optimal choice. However, when this is not the case, the rectangular window maximizes the contamination from lines. In practice, sharp spectral features in interferometer power spectra have finite width and therefore a rectangular window will never generically avoid leakage from lines. In Fig. 2, we show this quantity for the PSDs used in our analysis of the gravitational-wave signal, GW170814 [13] (Sec. III A), which was observed in 2017 by the Advanced LIGO [15] and Virgo [16] observatories. For the two LIGO observatories [15], the magnitude of the off-diagonal terms is approximately constant throughout the observing band with exceptions for the known lines. The "violin modes" for the Livingston interferometer (∼ 500 Hz) are significantly larger than for the Hanford interferometer and the contamination near the lines is larger and more broadband. Given this behavior, one might think that we can neglect the impact of the off-diagonal terms if we remove from the analysis the frequency bins in the neighborhood of the lines. However, for the Virgo observatory [16], we see that the average correction is much larger across most of the band and is frequently > 20%. This can be attributed to the Virgo PSD being less smoothly varying. A cut based on frequencies with unacceptably large contamination would Eigenvalues of an estimated noise power covariance matrix for data close to GW170814 for a Tukey window with α = 0.1. The large eigenvalues correspond to spectral lines and the low-frequency seismic wall; the slowly varying region corresponds to the smoothly varying observing band. The dashed vertical line denotes the points after which we discard the eigenmodes as determined by the window information loss. The dotted lines indicate the point at which the eigenvalue drops below the minimum of the finite-duration PSD. Both of these well-approximate the turnover after which the eigenvalues rapidly decline due to information loss from windowing. The difference is most pronounced for the Virgo data, which is consistent with the increased contamination between frequency bins (see Figure 14). remove most of the observing band. However, data analysis with the finite-duration covariance matrix can be used to take into account covariance in Virgo noise.

E. Regularized inversion
Having characterized the covariance between frequency bins due to windows, we turn to the inversion of the covariance matrix required to evaluate the likelihood function. Since tapered window functions go to zero at the edges by construction, the covariance matrix is not invertible [20]. To deal with this issue, we perform a regularized inversion using a singular value decomposition (SVD) and discard the smallest eigenvalues. The SVD of a Hermitian matrix can be written as Here, Λ kl = δ kl λ k (no summation) is a diagonal matrix with the eigenvalues λ k on the leading diagonal. Regularization simply involves removing eigenmodes corresponding to small eigenvalues. In practice, this is done by setting the eigenvalue to ∞: Eigenmodes for the covariance matrix estimated from data from the LIGO Livingston interferometer close to GW170814. This matrix encodes the correlations between physical frequencies. The horizontal axis corresponds to the physical frequencies, while the vertical axis is the order of decreasing eigenvalue. The bottom-most eigenmodes are the ones that are discarded. These eigenmodes are associated with very broadband frequency content.
where a fraction of the N eigenmodes are retained. The regularized matrix and its inverse arē In Fig. 3, we show the eigenvalues in decreasing order for the covariance matrix estimated at the time of GW170814 and the corresponding eigenmode spectrum is shown in Fig. 4. We identify three regimes in the eigenvalue spectrum: 1. Large eigenvalues with a steep slope at low eigenmode number. These predominantly correspond to frequencies where C ij is large, e.g., near spectral lines and at low frequencies.
2. A slowly varying region encompassing the majority of the eigenvalues. This corresponds to the remainder of the frequencies where the PSD is smoothly varying.
3. A rapid drop to the smallest eigenvalues. This is a characteristic feature of ill-conditioned matrices and is the region we should remove when regularizing.
We consider two methods to determine how many eigenvalues to discard. In the first method we consider the power loss from the window function. The amount of information lost by the window is related to the timeaveraged square of the window function. The effective number of independent time samples is We choose the fraction of eigenvalues to retain based on the window function: The vertical dashed line is at N eff and shows the number of modes we omit to account for the information loss due to the tapered window; we note that this matches well the transition to the badly behaved modes. For the second method we set a threshold value corresponding to the minimum of the power spectral density over the analyzed frequency band λ T = min µ (S µ ). We show this threshold in the dotted line in Figure 3. The two threshold values agree well for small Tukey α although the former method systematically removes more eigenmodes. We find that the choice of regularization scheme has a negligible impact on the results of our analysis.

F. Finite-duration likelihood
We can now write our final result, the regularized likelihood with a finite-duration covariance matrix: where detC is the determinant of the finite-duration noise covariance matrix,h is a template for the signal, and the inner product is defined as As the exponent in the likelihood can still be written as weighted inner products between data and template, we can analytically marginalize over extrinsic parameters in the same way as for the diagonal likelihood [21].

G. Relation to the Karhunen-Loève transform
The Karhunen-Loève theorem states that for any stochastic process there exists a basis in which the noise covariance matrix is diagonal, and the transformation into this basis is referred to as the KLT. For a colored stationary Gaussian process that is periodic with period T , this basis is the discrete Fourier transform with spacing 1/T Hz. The Whittle likelihood approximation specifically assumes that this basis also diagonalizes the covariance matrix for a subset of a longer Gaussian process that is not periodic with period T . As we have demonstrated, the covariance matrix in the Fourier basis is not diagonal in this case.
We note that the KLT is closely related to the SVD and therefore identify that the basis for the KLT of a finite subset of a longer Gaussian process is the basis found in Section II E. This provides a second, equivalent, interpretation of the inner product in Eq. 34: where we have definedx i ≡x i U ik . The likelihood is now We identify that this has the usual form of the likelihood except that all of the quantities are described in the eigenbasis of the KLT, rather than the Fourier basis.

III. DEMONSTRATION
To demonstrate our formalism we perform two tests. First, we analyze three binary black hole mergers to demonstrate that the effect of the off-diagonal corrections is small but noticeable for confidently detected signals. Second, we demonstrate that, although this effect produces a minor correction to modest-SNR events, neglecting the impact of off-diagonal terms in the noise covariance matrix biases precision estimates, such as evidence calculations required for searches for a population of weak, sub-threshold signals as in [22].

A. Single events
We compare the posterior distributions obtained using both the conventional and finite-duration covariance matrices for three of the observed binary black hole mergers GW150914 [12], GW170814 [13], and GW190521 [14]. We choose these as they are relatively high-mass systems (with detector frame primary and secondary masses of m 1 ≈ m 2 ≈ 30 − 40M for GW150914 and GW170814 and m 1 ≈ m 2 ≈ 150M for GW190521) for which we expect the tapered window to have a comparatively large impact on the data as they use a relatively short stretch of analysis data. They also span a large range in time, with one event from each of the first three observing runs of the advanced detector network, leading to significantly different PSDs.
For each event, we analyze 4 s of data centered at the trigger time for the event and estimate the PSD using a 512 s stretch of data ending 2 s before the trigger. We apply a bandpass filter between 16 and 1024 Hz and resample the data to a new Nyquist frequency of 2048 Hz using GWpy [23] to mitigate spectral leakage from low and high frequencies. We use a Tukey window with α = 0.1 for the analysis segment. The details of the covariance matrix calculation are described in Algorithm 1. We employ the IMRPhenomXPHM waveform approximant [24][25][26]  in the frequency range 20 − 800 Hz; for GW190521 we set the upper frequency limit as 300 Hz. We neglect the impact of calibration uncertainty or uncertainty in our estimate of the PSD. For GW150914, we analyze data from the two LIGO interferometers, for the other two events, we analyze data from the two LIGO interferometers and Advanced Virgo.
We show the posterior distribution for two of the intrinsic binary parameters when assuming C ij is diagonal (blue) and using the full covariance matrix (orange) for GW150194, GW170814, and GW190521 in Figs. 5, 6, and 7 respectively. We assume the same prior for both analyses. The primary and secondary mass refer, respectively, to the more-massive and less-massive black-hole masses. The largest difference we see is in the component masses for GW170814, primarily driven by a change in the inferred mass ratio. There is no visible difference between the posteriors for the other events. The change in the posterior distributions is at a similar level to the changes due to marginalizing over uncertainty in the detector calibration [27,28] or PSD estimate [9,29,30], but less than the difference due to using different PSD estimation methods [9]. Errors of this magnitude become important when we combine many events together for population studies and/or precision tests of general relativity (see, e.g., [31,32]). FIG. 6. Intrinsic parameters for the binary black hole merger GW170814 [13] using our new finite-duration likelihood (blue) and the diagonal likelihood (orange). The primary and secondary mass (m1, m2) refer, respectively, to the more-massive and less-massive component masses. Including covariance between neighboring frequency bins slightly shifts the inferred posterior distribution for the mass ratio.

B. Combining data segments
By combining large numbers of time-series data segments it is sometimes possible to extract weak signals not visible in individual segments, for example, to measure the population of gravitational waves from unresolved compact binaries [22,[33][34][35] and to detect gravitationalwave memory [36]. Combining data segments can also have the effect of magnifying systematic errors that are small enough to ignore when considering just a single segment in isolation. For example, failing to take into account uncertainty in estimates of the noise PSD leads to low-level excess power, which can be mistaken for a population of sub-threshold gravitational-wave signals [9,29].
Here, we show that the correlations between neighboring frequency bins induced by all windows must be taken into account to avoid systematic error in studies that rely on precision measurements combining many segments. We illustrate this point using simulated data to carry out a mock search for a population of sub-threshold simulated signals in simulated Gaussian noise with a known PSD. We employ the formalism from [22] to estimate the fraction of M = 160000 data segments of which 15000 contain a simulated signal.
Our likelihood is a mixture model, which allows for each segment to consist of either signal S or noise N : Here, L(d i |S) is the likelihood of data segment i given the signal hypothesis while L(d i |N ) is the likelihood given the noise hypothesis and the parameter ξ is the fraction of segments that contain a signal. We simulate data in 128 s chunks and break up the data into 4 s segments.
We compute the finite-duration PSD matrix using the known PSD used to simulate the data. The known PSD is as estimated for the LIGO Livingston detector in our analysis of GW170814. We do not re-estimate the PSD from the simulated data in order to avoid uncertainty intrinsic to the PSD estimation method. For the simple example considered here we assume that the signal in each segment containing a signal is the same [37]. We consider four choices of signal: (i) a Gaussian burst centered at 50 Hz and standard deviation 10 Hz, (ii) a Gaussian burst centered at 500 Hz and standard deviation 10 Hz, (iii) a 60M total mass binary black hole merger waveform, and (iv) a 300M total mass binary black hole merger waveform. The first two are two well-localized signals in frequency with random perfrequency phases. The lower-frequency burst does not significantly overlap with large spectral features, while The signals considered for our population test. The four signals we consider are: a Gaussian burst centered at 50 Hz and standard deviation 10 Hz (blue), a Gaussian burst centered at 500 Hz and standard deviation 10 Hz (orange), a 60M binary black hole merger (green), and a 300M binary black hole merger (red). In purple, we show the finite-duration amplitude spectral density for the LIGO Livingston interferometer at the time of binary black hole merger GW170814. The Gaussian bursts isolate the impact of specific spectral features, e.g., the large lines around 500 Hz. The binary black hole mergers are broadband and accumulate most of their signal-to-noise ratio at low frequencies, near the sharp rise due to seismic noise. the higher-frequency burst overlaps with the largest spectral lines.
The second two are representative of the gravitationalwave signals observed so far and considered in the search proposed in [22]. These signals are relatively broadband in frequency with well-defined frequency evolution. The 60M waveform is chosen to be representative of the most commonly observed systems and the 300M waveform is chosen based on the largest observed system [38]. As in the previous section, for the binary black hole waveforms we use the IMRPhenomXPHM waveforms. In Figure 8, we show the amplitude spectral density of each of the signals along with the diagonal of the finite-duration covariance matrix (S i ).
For each iteration, we calculate the posterior for ξ two ways, once using the standard diagonal likelihood and once using the finite-duration likelihood. The results are We analyze ∼ 7.5 days of simulated Gaussian noise divided into 4s segments (1.6 × 10 5 segments), and 15000 of these segments contain four different signals: (i) a Gaussian pulse with standard deviation 10 Hz and central frequency 50 Hz (ii) a Gaussian pulse with central frequency 500 Hz, (iii) an equal-mass binary black hole merger signal with total mass 60M and (iv) an equal-mass binary black hole merger signal 300M . The amplitude of the signals and the PSD are shown in Figure 8. In blue we show the posterior distribution obtained using the standard likelihood that ignores correlations between neighboring frequency bins induced by the window function. In orange, we show the posterior distribution using a likelihood that accounts for these correlations. We see that the diagonal method gives a biased result when the signal is centered at 500 Hz and when analyzing simulated binary black hole systems.
shown in Fig. 9 for a Tukey window with α = 0.1. For all considered signals, the diagonal method produces a visibly biased result. When the Gaussian signal is close to large spectral lines (b) the bias is most significant. The bias for the binary black hole signals (c-d) is larger than for the low-frequency Gaussian (a). We attribute this to two effects. The binary black hole signals ac-cumulate significant SNR at low frequencies where the noise is dominated by leakage from seismic noise. This is somewhat mitigated by the application of a high-pass filter to suppress content below 16 Hz. They also have a characteristic phase evolution, which contributes significant resolving power to the likelihood function. This means that the phase coherence between the neighboring frequencies is important to the evaluated likelihoods.

IV. DISCUSSION
As gravitational-wave astronomy matures, the growing catalog of events enables exciting new science. However, as we probe increasingly higher SNRs, and as we combine larger ensembles of data segments, our analyses are increasingly susceptible to systematic error from approximations in our models and data analysis. Many sources of systematic error in our modeling have been considered in recent years including calibration uncertainty [27,28], waveform systematics [31,[39][40][41], and noise estimation [9,29,30]. In this paper, we examine how correlations between frequency bins are inevitably introduced by windowing in time-domain analysis and find corrections at the same level as those due to calibration and PSD uncertainty. We show how these correlations can be modeled using a frequency-domain noise covariance matrix, thereby avoiding bias. By performing a singular value decomposition, we identify that the basis that diagonalizes the covariance matrix is not the Fourier basis as usually assumed, but depends on both the PSD and the choice of window function.
We demonstrate that, while the impact of the offdiagonal components in the noise covariance matrix is small for individually resolved events, they are important for precision estimates of the Bayesian evidence. These precision estimates of the Bayesian evidence are crucial when attempting to use Bayesian evidence estimates as a detection statistic, e.g., [22,42].
A natural extension of the framework provided here is to incorporate marginalization over sources of systematic uncertainty. Marginalization over waveform or detector calibration uncertainty can be trivially combined with this method. Marginalizing over uncertainty in the PSD estimate would require either modifying the BayesLine algorithm [43] to include modeling the full noise covariance matrix or an analytic method such as in [6,9,44]. Even after considering all these forms of systematic uncertainty, we still need to develop methods to deal with the non-Gaussianity and non-stationarity of real data. This is an active area of development [45][46][47][48][49][50][51][52]; however, establishing a unified treatment is left to future studies.
While the analysis here has focused on analysis of short-duration transients, windows are also used in searches for longer duration gravitational-wave signals [53,54]. Typically, long-duration searches use much longer segment durations than those described here with Hann windows to mitigate leakage from lines. As we showed in this paper, this may lead to correlations between neighboring frequency bins for these analyses. In searches for the stochastic gravitational-wave background, a coarse-grained PSD estimate is typically used (see Section A) which may reduce the impact of these correlations. Additionally, window functions are used to excise short duration non-Gaussian "glitches" from analysis [46,[55][56][57] (typically this is referred to as "gating"). These gates have very short rise times (small Tukey α) and so introduce significant contamination around spectral lines.

ACKNOWLEDGEMENTS
We are grateful to Sharan Banagiri, Katerina Chatziiaonnou, Joe Romano, and Alan Weinstein for many fruitful discussions and the anonymous referee for a thoughtful and detailed review. CT acknowledges the support of the National Science Foundation, and the LIGO Laboratory. This work is supported through the Australian Research Council (ARC) Centre of Excellence CE170100004. SB is also supported by the Paul and Daisy Soros Fellowship for New Americans, the Australian-American Fulbright Commission, and the NSF Graduate Research Fellowship under Grant No. DGE-1122374. This research has made use of data, software, and/or web tools obtained from the Gravitational Wave Open Science Center [58,59] (https://www.gwopenscience.org), a service of LIGO Laboratory, the LIGO Scientific Collaboration and the Virgo Collaboration. LIGO is funded by the U.S. National Science Foundation. Virgo is funded by the French Centre National de Recherche Scientifique (CNRS), the Italian Istituto Nazionale della Fisica Nucleare (INFN), and the Dutch Nikhef, with contributions by Polish and Hungarian institutes. The authors are grateful for computational resources provided by the LIGO Lab and supported by National Science Foundation Grants PHY-0757058 and PHY-0823459. This is document LIGO-P2100090.
Cython and CUDA implementations and Python wrappers of the PSD matrix calculation are available at github.com/ColmTalbot/psd-covariance-matrices. We also provide example scripts demonstrating how to produce the results in this paper and some supplementary results in the same location.

Appendix A: Connection to coarse-grained PSD estimation
While time-averaging of short segments to estimate power spectral densities (e.g., Welch averaging) is common in gravitational-wave data analysis, an alternative "coarse-graining" method is used in some areas, especially searches for the stochastic gravitational-wave background; see, e.g., [18]. In this appendix, we demonstrate that coarse graining is a special case of the projection method we use in this paper.
The coarse-grained PSD is defined for a frequency resolution δf as In the second line, Π is the unit boxcar function. As for the window operator in this paper, we can express this as a circulant matrix with non zero entries only in a small region. The corresponding time-domain window is the sinc function. We note that the coarsened frequencydomain covariance matrix is diagonal by construction in this case.

Appendix B: Estimating the infinite-duration PSD
Computing C ij using Equation 16 requires an expression for the infinite-duration covariance matrix C µν . In this Appendix, we describe our method for estimating C µν from the data. This represents stages 1 − 4 of Algorithm 1.
We assume the data are stationary and Gaussian and therefore the only nonzero elements of C µν are the leading diagonal S µ . We approximate this infinite-duration PSD using segments longer than our final analysis segment. In order to avoid long-term drift of S µ in real interferometer data, we restrict ourselves to 512 s of data. We then subdivide this into non-overlapping segments with duration = D and compute a median average PSD using a Hann window as our representation of S µ .
To assess the convergence of this method, we compute S i using a range of values of D. In Fig. 10, we show the ratio of the finite-duration PSDs to the finite-duration PSD obtained when using D = 128 s (the longest duration we consider). The estimate quickly converges with increasing D away from the spectral lines. Close to the forest of large lines around 500 Hz, the difference between the D = 64 s and D = 128 s estimates is ≈ 20%. We therefore infer that D = 64 s is sufficiently converged.
The key quantity for ensuring adequate convergence is the ratio between D and the original segment duration. In this case, S µ has a resolution 16× as fine as S i . We use this procedure when analyzing real data throughout this paper. For a typical 4 s data segment, with sampling frequency 2048 Hz, analyzed with a 1/128 Hz noise model, we must perform the matrix operations in Equation 16 with O(2 18 × 2 18 ) elements. A naive implementation at double precision would require a prohibitive amount of computational resources. Fortunately, the computation can be performed much more computationally efficiently. The first thing we note is that C µν is diagonal andW µν is a circulant matrix, i.e., it is fully specified by a single row/column (w µ ). Since each matrix can be represented using a single vector, we do not need to form any matrices with the 1/128 Hz resolution, dramatically reducing memory requirements. We also identify that C ij is a Hermitian matrix, reducing the computational cost by a factor of two.
We provide functions to compute the coarsened PSD matrix from a frequency-domain window and PSD implemented in Cython and cupy compatible CUDA. The former runs in O(N 3 ) time and the latter in O(N ) wall time. The latter is used to produce the results in this paper and is less computationally expensive than the SVD performed on the coarsened data.
In Algorithm 1, we describe the process used to compute the regularized inverse covariance matrices used for our binary black hole analyses. We note that the method presented here requires an estimate for the true underlying power spectral density. In practice, we estimate this from the data by taking a median average of the PSD in several longer segments.

Appendix D: Additional figures
In this Appendix we show the PSD matrix (top left), regularized PSD matrix (top right), SVD eigenmatrix (bottom left), and regularized inverse PSD matrix (bottom right) for our analysis of GW170814 for LIGO Hanford (Fig. 11), LIGO Livingston (Fig. 12), and Virgo (Fig. 13). We note that the data for all three interferometers show the same qualitative features and quantitative differences determined by the specific sensitivity of each interferometer.
We see that the PSD matrix is dominated by the leading diagonal and nearby frequencies and correlations at frequencies corresponding to spectral lines. The correlations from the spectral lines are less pronounced in the regularized PSD matrix, however, there is more broadband correlation between frequencies above/below the most sensitive frequency. The divide between frequencies above and below the most sensitive frequency can also be seen in the SVD and regularized inverse PSD matrices.

Appendix E: Window overlaps
In order to quantify spectral leakage for white noise we find Equation 20: The denominator in this expression is simply the total window power w 2 . In the continuum limit, the numerator limit. For finite duration window functions, the rectangular window still gives ∆ i (α = 0) = 0 and the Hann window still has a maximum for |n| = 1. We note that an equivalent result is shown in [2,60] in the discussion of asymptotic independence.
The generic Tukey window does not have an analytic Fourier transform, however, numerical experiments confirm that for all other values of the α parameter, neighboring frequency bins are not independent and ∆ i is a monotonically increasing function of α. In Figure 14, we show ∆ i as a function of α. FIG. 14. Fractional contamination for white noise (Equation 20) as a function of Tukey α parameter. We note that the bias is a monotonic function of α ranging from ∆i = 0 for α = 0 to ∆i ≈ 2/3 for α = 1.