Gravitational-wave astronomy with an uncertain noise power spectral density

In order to extract information about the properties of compact binaries, we must estimate the noise power spectral density of gravitational-wave data, which depends on the properties of the gravitational-wave detector. In practice, it is not possible to know this perfectly, only to estimate it from the data. Multiple estimation methods are commonly used and each has a corresponding statistical uncertainty. However, this uncertainty is widely ignored when measuring the physical parameters describing compact binary coalescences, and the appropriate likelihoods which account for the uncertainty are not well known. In order to perform increasingly precise astrophysical inference and model selection, it will be essential to account for this uncertainty. In this work, we derive the correct likelihood for one of the most widely used estimation methods in gravitational-wave transient analysis, the median average. We demonstrate that simulated Gaussian noise follows the predicted distributions. We then examine real gravitational-wave data at and around the time of GW151012, a relatively low-significance binary black hole merger event. We show that the data are well described by stationary-Gaussian noise and explore the impact of different noise power spectral density estimation methods on the astrophysical inferences we draw about GW151012.


I. INTRODUCTION
The astrophysical parameters of compact binaries are inferred from gravitational-wave data using Bayesian inference. A crucial first step for Bayesian inference is to choose the appropriate likelihood for our data. In gravitational-wave transient data analysis this typically hinges on assumptions that the noise is Gaussian and stationary over the period being analysed [1]. If these conditions are met, and if the noise power spectral density (PSD) were known exactly, the appropriate likelihood would be the Whittle likelihood [2],

\mathcal{L}(\tilde{d} | \theta, P) = \frac{2}{\pi T P} \exp\left( -\frac{2 |\tilde{d} - \tilde{\mu}(\theta)|^{2}}{T P} \right).    (1)
Here P is the PSD, \tilde{d} is the frequency-domain interferometer data, T is the duration of the data being analyzed, and \tilde{\mu} is our model for the expected signal. However, in practice, we do not have access to the true power spectral density of gravitational-wave detectors and so we have to rely on an empirical estimate. There are two commonly used methods to compute these estimates. The simplest method is to average over the power in neighbouring stretches of data to generate an "off-source" estimate. This method assumes that the PSD does not vary over the duration being averaged and that there are no non-Gaussian features in the data. The other commonly used method is to simultaneously fit the signal and power spectral density to obtain an "on-source" estimate, e.g., [3]. While this method does not involve analyzing as much data, and hence is less affected by non-stationarity and non-Gaussianity, it is far more computationally expensive. In this work, we are going to focus on the former.
To generate an off-source PSD we typically compute either the mean or median average of neighbouring segments. Taking the mean of neighbouring segments is a commonly used method (sometimes referred to as the Welch or Bartlett method [4]) in gravitational-wave data analysis and many other signal processing applications. However, it is not widely used in gravitational-wave transient data analysis due to its sensitivity to non-Gaussian transients, "glitches," in the detector noise. To mitigate the effect of these glitches a median average is instead used to compute the PSD, as the median is more robust to the presence of large outliers. Alternatively, there may be effective methods to either remove or exclude these non-Gaussian features [5][6][7][8][9][10][11].
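As a toy illustration of this robustness, consider the power in a single frequency bin across a set of segments, one of which contains a loud glitch. Unit-mean exponential draws stand in for the per-segment Gaussian-noise power; the specific numbers are illustrative only.

```python
import numpy as np

# A single loud glitch drags the mean of the segment periodograms far
# from the true noise level, but barely moves the median.
rng = np.random.default_rng(0)
powers = rng.exponential(size=31)   # per-segment power in one bin
powers[0] = 1000.0                  # one glitchy segment
mean_est = powers.mean()            # badly biased by the glitch
median_est = np.median(powers)      # stays near the true median
```

Dividing the median by the bias factor discussed in Section II B then recovers an unbiased estimate of the mean power in the absence of glitches.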
The other assumption underlying Equation 1 is that the PSD does not change over time; in other words, the data are stationary. In practice, the PSD of real interferometers varies over the time scale of minutes, and so care must be taken when estimating the PSD using longer stretches of time [1,3,12]. Methods for mitigating this non-stationarity have also been considered previously [11]. In this paper, we ignore these possible effects and assume the data we look at are both Gaussian and stationary.
In Chatziioannou et al. [13], the authors compare Advanced LIGO data whitened with a median-estimated off-source PSD with on-source PSD estimation methods. They show that the data whitened using the median PSD do not follow a unit normal distribution. They argue that this difference is due to non-Gaussianity and non-stationarity in the data. However, data whitened using an off-source estimate are known to follow a non-normal distribution even for Gaussian noise. For a mean average, the whitened Gaussian-noise data follow a Student's t-distribution, and the correct likelihood to use is the Student-Rayleigh distribution [14][15][16][17]. In this work, we demonstrate that data whitened with a median PSD estimate follow a different distribution, and we show how to marginalise over the uncertainty in this estimated PSD to obtain the correct likelihood for stationary, Gaussian noise.
On-source PSD estimation using the BayesLine algorithm [3] marginalizes over a prior which models the PSD as a combination of a slowly varying spline and Lorentzians to fit sharp spectral features. This algorithm is generally combined with the BayesWave algorithm [18] to fit astrophysical and terrestrial transients simultaneously with the PSD. However, this does not allow direct inference of physical parameters describing compact binary coalescences, e.g., the masses and spins of merging black holes.
A common approach in compact binary data analysis is to take a median average of the posterior distribution for the PSD obtained using BayesLine rather than an off-source averaged PSD, e.g., [19]. Recently, Biscoveanu et al. introduced a method to marginalize over the uncertainty in these on-source PSD estimates. However, this comes at significant computational cost, requiring ∼ 200× the computational resources of a standard analysis. Additionally, under the formalism presented there, it is not possible to compute the Bayesian evidences necessary to perform model comparison.
The remainder of this paper is structured as follows. In Section II, we derive the appropriate distributions for the likelihood and whitened data after marginalising over the uncertainty in a median (and/or mean) PSD estimate. We provide a brief introduction to Bayesian inference in the context of gravitational-wave astronomy in Section III. We then demonstrate the efficacy of our formalism by applying it to simulated Gaussian data in Section IV. Following this, in Section V we consider a case study using real Advanced LIGO data. We analyze the marginal gravitational-wave candidate GW151012 with both mean and median PSD estimates to understand the effect of marginalizing over the statistical uncertainty and of using the different estimation techniques. This event is convenient for our present purposes since the effects we seek to study are most prominent for marginal signals like GW151012. Some closing comments are then provided in Section VI.

A. Gaussian noise
For stationary Gaussian noise n(t), if we do not manipulate the data in any way before performing a Fourier transform, the noise covariance can be written in the frequency domain as

\langle \tilde{n}(f)\, \tilde{n}^{*}(f') \rangle = \frac{1}{2} P(f)\, \delta(f - f').    (2)

The angle braces denote an ensemble average over realizations. In practice, we work with discrete Fourier transforms and noise covariance matrices

C_{ij} = \langle \tilde{n}_{i}\, \tilde{n}^{*}_{j} \rangle = \frac{T}{2} P_{i}\, \delta_{ij}.    (3)

Here T is the duration over which the discrete Fourier transform is performed, i, j index frequency bins, and

\sigma_{i} = \sqrt{P_{i}}    (4)

is the noise amplitude spectral density (ASD). For real data, a number of manipulations are performed before the data are Fourier transformed, which makes things more complicated. The data are band-passed and windowed in the time domain to prevent aliasing and spectral leakage [1]. As long as the frequency limits of the band-pass filters do not overlap with the frequency range of interest, the band-passing can be ignored. However, the window applied to the data must be considered. Since the window is multiplicative in the time domain, there is a corresponding convolution in the frequency domain,

C_{ij} = \frac{T}{2} \sigma_{i}\, \sigma_{j}\, \mathcal{T}_{ij},    (5)

where \mathcal{T} is a Hermitian Toeplitz matrix and there is no implied summation over i or j. For a rectangular window \mathcal{T}_{ij} = \delta_{ij} and the covariance matrix is diagonal. For generic windows, there is a regular, predictable, off-diagonal power. In practice, the effect of this is much smaller than the effects considered here, and inverting the covariance matrix is a significant computational challenge. We leave a detailed analysis of the effect of non-rectangular windows on parameter estimation and model selection to a future study. The real and imaginary components of the frequency-domain noise each follow a normal distribution with variance proportional to P,

p(\tilde{n}_{r/i}) = \left( \frac{2}{\pi T P} \right)^{1/2} \exp\left( -\frac{2\, \tilde{n}_{r/i}^{2}}{T P} \right).    (6)

This is not the likelihood which we use when analyzing gravitational-wave transients, as we need to simultaneously consider the real and imaginary components of the noise.
The likelihood is given by

\mathcal{L}(\tilde{n}) = \frac{1}{\pi^{k} \det C} \exp\left( -\sum_{ij} \tilde{n}^{*}_{i}\, C^{-1}_{ij}\, \tilde{n}_{j} \right),    (7)

where k is the number of frequency bins. Since we assume the covariance matrix is diagonal, this is often written in the simplified form known as the Whittle likelihood,

\mathcal{L}(\tilde{n} | P) = \prod_{i} \frac{2}{\pi T P_{i}} \exp\left( -\frac{2 |\tilde{n}_{i}|^{2}}{T P_{i}} \right).    (8)

We note that this likelihood is normalized over the complex plane. It is convenient to reduce to one dimension for visualisation purposes, so we note that the power of the noise, |\tilde{n}_{i}|^{2}, follows an exponential distribution,

p(|\tilde{n}_{i}|^{2}) = \frac{2}{T P_{i}} \exp\left( -\frac{2 |\tilde{n}_{i}|^{2}}{T P_{i}} \right).    (9)

All of the expressions above assume that there are no non-Gaussian signals in the data. In order to include signals we simply make the substitution \tilde{n} = \tilde{d} - \tilde{\mu}, where \tilde{d} is the data and \tilde{\mu} is the signal.
Time-domain windows affect the noise and signal components differently. We assume that the window is always applied such that the window does not cause any loss of signal power in the observing frequency band. In addition to the correlation between different frequency bins induced by the window, there is a net power loss in the Gaussian noise given by the mean square value of the window function. Care must be taken to consistently correct for this power loss to avoid biasing our inference, e.g., [21]. Now that we have established which distributions we want to use when the PSD is known, we can address the distributions we want to use when the PSD is uncertain.
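For reference, the Whittle likelihood above is straightforward to evaluate once the PSD is fixed. A minimal sketch follows; the function name is illustrative, and a diagonal covariance (rectangular window) is assumed.

```python
import numpy as np

def whittle_log_likelihood(data, mu, psd, duration):
    """Log of the Whittle likelihood of Eq. (1), summed over frequency
    bins. `data` and `mu` are the complex frequency-domain strain and
    signal model, `psd` is the one-sided PSD at the same frequencies,
    and `duration` is the segment length T in seconds."""
    residual_power = np.abs(data - mu) ** 2
    return np.sum(np.log(2.0 / (np.pi * duration * psd))
                  - 2.0 * residual_power / (duration * psd))
```

For pure noise (mu = 0) this is the Gaussian-noise log evidence for the analyzed segment.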

B. Median PSD estimate
The generic expression for the likelihood marginalised over uncertainty in an estimated PSD, \hat{P}, is

\mathcal{L}_{P}(\tilde{d} | \theta, \hat{P}) = \int_{0}^{\infty} dP\, \mathcal{L}(\tilde{d} | \theta, P)\, \pi(P | \hat{P}),    (10)

where \mathcal{L}(\tilde{d} | \theta, P) is the likelihood of obtaining the data given model parameters θ and the true PSD P, as defined in Equation 1, and \pi(P | \hat{P}) is our prior on the true PSD given the estimated PSD. Similarly, using Equation 6, we can write down an expression for the expected distribution of whitened strain residuals, \tilde{\nu} = \sqrt{4 / (T \hat{P})}\, \tilde{n}, normalized so that the components of \tilde{\nu} have unit variance when \hat{P} equals the true PSD,

p(\tilde{\nu}_{r/i} | \hat{P}) = \int_{0}^{\infty} dP\, p(\tilde{\nu}_{r/i} | P)\, \pi(P | \hat{P}).    (11)

Here \tilde{n} is the frequency-domain data after removing any signals present. First we need to define the estimated PSD,

\hat{P}_{i} = \frac{1}{\alpha} \operatorname{median}_{k}\left( P^{(k)}_{i} \right),    (12)

where

\alpha = \sum_{n=1}^{N} \frac{(-1)^{n+1}}{n}    (13)

is a factor to account for the median being a biased estimator of the mean (see, e.g., Appendix B of [22]), and k indexes the segments being averaged over. For simplicity, we assume that we are computing the median of an odd number, N, of non-overlapping stretches, ensuring α > 0.
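The bias factor above can be checked numerically. Assuming unit-mean exponential draws as a stand-in for the per-bin segment powers, the sample mean of many medians should land on the closed-form α_N of Appendix B of [22]:

```python
import numpy as np

def median_bias(n_segments):
    """alpha_N = sum_{n=1}^{N} (-1)^(n+1) / n: the expected value of
    the median of N (odd) unit-mean exponential draws."""
    n = np.arange(1, n_segments + 1)
    return np.sum((-1.0) ** (n + 1) / n)

# Monte Carlo check: the mean of many medians of N = 31 exponential
# draws should match alpha_31 (about 0.709), so dividing a median PSD
# estimate by alpha removes the bias relative to the mean.
rng = np.random.default_rng(1)
medians = np.median(rng.exponential(size=(200_000, 31)), axis=1)
```

For N = 1 the factor is exactly one, since the median of a single segment is the segment itself.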
It is convenient to work with a regularised version of the PSD estimate in each segment (suppressing the frequency index, since each bin is treated independently),

Q^{(k)} = \frac{2 P^{(k)}}{P},    (14)

so that the median satisfies \hat{Q} = \operatorname{median}_{k}(Q^{(k)}) = 2 \alpha \hat{P} / P. Since the data are assumed to follow a zero-mean Gaussian distribution with variance set by P, the Q^{(k)} are drawn from a χ² distribution of order 2,

p(Q) = \frac{1}{2} e^{-Q/2}.    (15)

Additionally, we define the usual cumulative distribution function, Φ, and survival function, S, for this quantity,

\Phi(Q) = 1 - e^{-Q/2},    (16)

S(Q) = e^{-Q/2}.    (17)

The probability of the median of an odd number of segments follows the median order statistic. This is the probability of getting the median value from the distribution, multiplied by the probability of having m = (N − 1)/2 measurements less than \hat{Q} and m measurements larger than \hat{Q}. Symbolically, this is

p(\hat{Q}) = \frac{p(\hat{Q} | \chi^{2}_{2})\, \Phi(\hat{Q})^{m}\, S(\hat{Q})^{m}}{B(m+1, m+1)}    (18)

= \frac{1}{2 B(m+1, m+1)} \sum_{j=0}^{m} \binom{m}{j} (-1)^{j} e^{-(m+j+1)\hat{Q}/2},    (19)

where B is the Beta function, B(m+1, m+1) = (m!)^{2} / (2m+1)! is the normalization of the median order statistic, and in the last line we perform a binomial expansion of \Phi^{m} = (1 - S)^{m}. The final piece we need is to relate our prior on Q to our prior on P,

\pi(P | \hat{P})\, dP = p(\hat{Q})\, d\hat{Q},    (20)

which gives

\pi(P | \hat{P}) = \frac{2 \alpha \hat{P}}{P^{2}}\, p\!\left( \hat{Q} = \frac{2 \alpha \hat{P}}{P} \right).    (21)
Substituting this expression into Equation 10, the PSD-marginalized likelihood is

\mathcal{L}_{P}(\tilde{d} | \theta, \hat{P}) = \int_{0}^{\infty} d\hat{Q}\, p(\hat{Q})\, \frac{\hat{Q}}{\pi \alpha T \hat{P}} \exp\left( -\frac{\hat{Q}\, |\tilde{d} - \tilde{\mu}|^{2}}{\alpha T \hat{P}} \right)    (22)

= \frac{1}{2 \pi \alpha T \hat{P}\, B(m+1, m+1)} \sum_{j=0}^{m} \binom{m}{j} (-1)^{j} \left[ \frac{m+j+1}{2} + \frac{|\tilde{d} - \tilde{\mu}|^{2}}{\alpha T \hat{P}} \right]^{-2},    (23)

and using Equation 11, the distribution of whitened residuals is

p(\tilde{\nu}_{r/i} | \hat{P}) = \int_{0}^{\infty} d\hat{Q}\, p(\hat{Q}) \sqrt{\frac{\hat{Q}}{4 \pi \alpha}} \exp\left( -\frac{\hat{Q}\, \tilde{\nu}_{r/i}^{2}}{4 \alpha} \right)    (24)

= \frac{1}{8 \sqrt{\alpha}\, B(m+1, m+1)} \sum_{j=0}^{m} \binom{m}{j} (-1)^{j} \left[ \frac{m+j+1}{2} + \frac{\tilde{\nu}_{r/i}^{2}}{4 \alpha} \right]^{-3/2}.    (25)

While the final expressions, Equations 23 and 25, are exact closed-form solutions, they are numerically unstable and cannot be safely computed for m ≳ 15, due to cancellation between the large alternating terms. We therefore simply construct an interpolant over numerically computed values of the integrals in Equations 22 and 24, which can be rapidly evaluated at run time.
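As a concrete illustration of the numerical approach, the sketch below marginalizes the whitened-component distribution over the median order statistic by direct quadrature. The function names are illustrative; it assumes the whitened data are normalized to unit variance when the estimated PSD equals the true one, and it evaluates the integral directly rather than building the interpolant described above. For N = 1 the result should reduce to a Student's t-distribution with two degrees of freedom.

```python
import numpy as np
from scipy import integrate, special, stats

def median_bias(n_segments):
    """alpha_N = sum_{n=1}^{N} (-1)^(n+1) / n."""
    n = np.arange(1, n_segments + 1)
    return np.sum((-1.0) ** (n + 1) / n)

def whitened_component_pdf(nu, n_segments):
    """Distribution of a whitened strain component, marginalized over
    the median order statistic of N chi^2_2 segment powers."""
    m = (n_segments - 1) // 2
    alpha = median_bias(n_segments)

    def p_median(q):
        # pdf of the median of N draws from a chi^2_2 distribution
        pdf = 0.5 * np.exp(-q / 2)
        cdf = 1 - np.exp(-q / 2)
        sf = np.exp(-q / 2)
        return pdf * cdf**m * sf**m / special.beta(m + 1, m + 1)

    def integrand(q, x):
        # given the median q, the true PSD is 2 * alpha * Phat / q, so
        # a whitened component is normal with variance 2 * alpha / q
        return p_median(q) * stats.norm.pdf(x, scale=np.sqrt(2 * alpha / q))

    return np.array([
        integrate.quad(integrand, 0, np.inf, args=(x,))[0]
        for x in np.atleast_1d(nu)
    ])
```

In a production analysis one would tabulate this once over a grid of ν̃ values and interpolate, as in the text.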

C. Mean PSD estimate
The appropriate distribution to use for a mean-averaged PSD has been discussed and independently derived multiple times in the literature, e.g., [15,17]; in this work, we just quote the relevant results. For a mean estimate,

\hat{P}_{i} = \frac{1}{N} \sum_{k=1}^{N} P^{(k)}_{i},    (26)

the scaled estimate follows a χ² distribution with 2N degrees of freedom,

\frac{2 N \hat{P}}{P} \sim \chi^{2}_{2N},    (27)

and the marginalised likelihood and distribution of whitened residuals are

\mathcal{L}_{P}(\tilde{d} | \theta, \hat{P}) = \frac{2}{\pi T \hat{P}} \left[ 1 + \frac{2 |\tilde{d} - \tilde{\mu}|^{2}}{N T \hat{P}} \right]^{-(N+1)},    (28)

p(\tilde{\nu}_{r/i} | \hat{P}) = \frac{\Gamma\!\left( N + \frac{1}{2} \right)}{\sqrt{2 \pi N}\, \Gamma(N)} \left[ 1 + \frac{\tilde{\nu}_{r/i}^{2}}{2N} \right]^{-(N + 1/2)}.    (29)

Equation 28 is the Student-Rayleigh distribution and Equation 29 is the Student's t-distribution with 2N degrees of freedom, two degrees for each segment being averaged over, coming from the real and imaginary components of the frequency-domain strain.
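The Student's t result is easy to check by simulation. The snippet below (variable names are illustrative) draws the sampling distribution of the mean estimate directly: with N averaged segments, 2N P̂/P is χ²-distributed with 2N degrees of freedom, so a unit-variance Gaussian component divided by (P̂/P)^{1/2} follows a Student's t-distribution with 2N degrees of freedom, whose variance is 2N/(2N − 2).

```python
import numpy as np

# Monte Carlo check of the mean-estimate result for N = 15 segments.
rng = np.random.default_rng(2)
n_segments, draws = 15, 100_000
# sampling distribution of Phat / P: chi^2_{2N} / (2N)
phat_over_p = rng.chisquare(2 * n_segments, size=draws) / (2 * n_segments)
# whitened Gaussian component: should follow Student's t with 2N dof
nu = rng.normal(size=draws) / np.sqrt(phat_over_p)
```

The sample variance should land on 30/28 ≈ 1.07 for N = 15, noticeably wider than the unit normal that would apply if the PSD were known exactly.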

D. Limiting cases
When N = 1 the mean and median are the same and so Equation 25 should reduce to a Student's t-distribution with two degrees of freedom. As expected, we find

p(\tilde{\nu}_{r/i} | \hat{P}) = \frac{\sqrt{2}}{4} \left[ 1 + \frac{\tilde{\nu}_{r/i}^{2}}{2} \right]^{-3/2}.    (30)

Analogously, the PSD-uncertainty-marginalised likelihoods also match and are both

\mathcal{L}_{P}(\tilde{d} | \theta, \hat{P}) = \frac{2}{\pi T \hat{P}} \left[ 1 + \frac{2 |\tilde{d} - \tilde{\mu}|^{2}}{T \hat{P}} \right]^{-2}.    (31)

The other important limiting case is N → ∞. It is a well-known result that the Student's t-distribution converges to a Gaussian in this limit; this follows from Taylor-expanding the distribution. Performing a similar expansion, it is possible to demonstrate that the Student-Rayleigh distribution converges to the Whittle likelihood. We numerically confirm that Equations 25 and 23 also converge to a Gaussian distribution and the Whittle likelihood, respectively.

III. BAYESIAN INFERENCE FOR GRAVITATIONAL-WAVE TRANSIENTS
In the previous section, we derived likelihood functions marginalized over the statistical uncertainty in an estimate of the PSD. These likelihood functions \mathcal{L}(\tilde{d} | \theta, \hat{P}, M) are the probability of obtaining data \tilde{d} given a signal model M described by parameters θ and a PSD estimate \hat{P}. However, we are generally interested in measuring the source-model parameters and performing model comparison. Using Bayes' theorem we get

p(\theta | \tilde{d}, \hat{P}, M) = \frac{\mathcal{L}(\tilde{d} | \theta, \hat{P}, M)\, \pi(\theta | M)}{\mathcal{Z}(\tilde{d} | \hat{P}, M)}.    (32)

The term on the left-hand side, p(\theta | \tilde{d}, \hat{P}, M), is the posterior probability distribution: the probability of the parameters describing the model given the data. The term \pi(\theta | M) is our prior distribution, which is based on our expectations before analysing the data. The term \mathcal{Z}(\tilde{d} | \hat{P}, M) is the evidence for the data given the model M,

\mathcal{Z}(\tilde{d} | \hat{P}, M) = \int d\theta\, \mathcal{L}(\tilde{d} | \theta, \hat{P}, M)\, \pi(\theta | M).    (33)
The evidence is used for model comparison by computing Bayes factors between two models,

\mathrm{BF}^{A}_{B} = \frac{\mathcal{Z}(\tilde{d} | M_{A})}{\mathcal{Z}(\tilde{d} | M_{B})}.    (34)

While the Bayes factor is often used for model selection, strictly speaking we should compare the probability of the model given the data, rather than the probability of the data given the model. This is given by the odds, which is the Bayes factor comparing the two models multiplied by the prior odds. Throughout this work, we will assume all models have equal prior odds, and so the odds reduces to the Bayes factor. Finally, we define the coherent-vs-incoherent Bayes factor [23], BCI, as a measure of the relative probability that the data contain a coherent signal or incoherent signals in different detectors,

\mathrm{BCI} = \frac{\mathcal{Z}(\{\tilde{d}_{k}\} | M_{\mathrm{coherent}})}{\prod_{k} \mathcal{Z}(\tilde{d}_{k} | M_{\mathrm{incoherent}})}.    (35)

Here, k indexes multiple independent interferometers, e.g., LIGO Hanford and LIGO Livingston. As in [24], we assume that any incoherent signals are described by the same model as the coherent signals; however, this is not necessarily the case [25]. We note that the BCI is not used as the final discriminator between the coherent and incoherent models, as it is missing a prior for the relative rates of coherent and incoherent signals. In [24,25] the priors on rate are empirically calibrated delta functions; however, in [26] the authors fit the rates of coherent and incoherent signals.
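In terms of log evidences the BCI is simple bookkeeping; a minimal helper (the function name is illustrative) might look like:

```python
import numpy as np

def log_bci(log_z_coherent, log_z_incoherent):
    """ln BCI: the log evidence for a coherent signal across the
    network, minus the summed single-detector log evidences for
    independent incoherent signals. Inputs are natural logarithms."""
    return log_z_coherent - np.sum(log_z_incoherent)
```

For example, a coherent ln Z of 10 against per-detector incoherent ln Z values of 2 and 3 gives ln BCI = 5.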
In general relativity, non-eccentric binary black hole coalescences are fully described by fifteen parameters: eight "intrinsic" parameters describing the binary itself (two masses and two three-dimensional angular momentum vectors), and seven "extrinsic" parameters specifying the position, orientation, and coalescence time of the binary relative to Earth. This parameter space is typically explored using stochastic samplers, either Markov-chain Monte Carlo [27] or nested sampling [28].
In order to improve the convergence of the sampling and accelerate our inference, it is possible to use a modification of the Whittle likelihood which is marginalized over the coalescence time, orbital phase, and distance of the source [23,29]. It is not possible to perform these marginalizations as easily while also marginalizing over uncertainty in the PSD. Therefore, in this work, we perform our inference in two stages following [30].
1. First, we analyze the data using the Whittle likelihood marginalized over coalescence time, binary orbital phase, and distance to obtain samples from the posterior distribution and estimates of the signal evidence and Gaussian-noise evidence. Posterior distributions for the marginalized parameters are then recomputed in post-processing. We use dynesty [31], an implementation of the nested sampling algorithm, via Bilby [32], to sample the parameter space.
2. After this, we importance resample the posterior obtained in the previous step by the ratio of the PSD-marginalised likelihood to the Whittle likelihood to obtain posterior samples and an evidence which include the marginalisation over the statistical uncertainty.
We note that the importance sampling in step 2 only works when resampling to a distribution which is similar to the original posterior distribution. We quantify this by evaluating the efficiency of the resampling and the number of effective samples from the PSD marginalised posterior. Since the marginalized likelihoods converge to the non-marginalized likelihood when averaging many segments, we expect the resampling to be efficient. This method also generically gives a much smaller uncertainty on the Bayes factor comparing the two models than would be obtained by performing two independent sampling runs [33]. A similar method has previously been employed for cosmological inference in [34] to marginalize over uncertainty in an estimated covariance matrix.
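The reweighting step can be sketched as follows; the function and variable names are illustrative, not from Bilby:

```python
import numpy as np

def reweight_posterior(samples, log_l_new, log_l_old, rng=None):
    """Importance resample posterior draws from one likelihood to
    another. Returns the resampled draws, the log Bayes factor between
    the two likelihoods, and the resampling efficiency (effective
    samples / total samples)."""
    if rng is None:
        rng = np.random.default_rng()
    log_w = np.asarray(log_l_new, float) - np.asarray(log_l_old, float)
    shift = log_w.max()                      # stabilize the exponentials
    w = np.exp(log_w - shift)
    # ln BF = ln of the posterior-averaged likelihood ratio
    log_bf = shift + np.log(w.mean())
    n_eff = w.sum() ** 2 / (w ** 2).sum()    # Kish effective sample size
    idx = rng.choice(len(w), size=len(w), p=w / w.sum())
    return np.asarray(samples)[idx], log_bf, n_eff / len(w)
```

When the two likelihoods coincide, every weight is unity, the log Bayes factor is zero, and the efficiency is one; as the distributions separate, the effective sample count shrinks and the reweighted posterior becomes noisier.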

IV. DEMONSTRATION WITH GAUSSIAN NOISE
To demonstrate the accuracy of the methods described in Section II, we analyze simulated Gaussian noise colored by the Advanced LIGO design sensitivity PSD [35].

[Figure caption] In the top panels we consider 512 s of simulated data and in the bottom panels 8 s of simulated data. The two PSDs used are a mean estimate (orange/red) and a median estimate (green/purple). The gray shaded regions show the expected 1σ, 2σ, and 3σ uncertainties. We average over 31 realisations to generate the PSDs. There are significant deviations when the uncertainty in the PSD is not marginalized over once enough data are considered.
Following [13], we perform three tests to verify that data whitened using median and mean PSD estimates follow the predicted distributions. As an extension to the analysis presented in [13], we consider the whitened power |ν̃|² in addition to the real and imaginary components of the whitened strain ν̃. For all estimated PSDs, we average non-overlapping segments with the same duration as the analysis segment.
First, we perform a visual test of the whitened data. In Figure 1, we show the distribution of the real and imaginary components of the whitened strain (first and third panels) and whitened power (second and fourth panels) along with the theoretical expectations. In the top (bottom) pair of panels, we average over 7 (31) independent noise realisations. We see that the data whitened using the off-source estimates follow the expected distributions in each case.
In Figure 2 we show the difference between the empirical and expected cumulative distribution functions, Φ s −Φ, plotted against the expected cumulative distribution function for the same data as in Figure 1. In orange and red we compare the data whitened with the mean PSD estimate with the expected distributions with and without marginalizing over the uncertainty in the PSD respectively. In green and purple we compare the data whitened with the median PSD estimate with the expected distributions with and without marginalizing over the uncertainty in the PSD respectively. The grey regions indicate the expected 1σ, 2σ, and 3σ fluctuations. For both PSD estimation methods, we see that the data agree better with the distributions which marginalize over the uncertainty in the PSD.
When comparing the marginalized distributions to the non-marginalized distributions, we see two clear differences. The uncertainty-marginalized distribution of the whitened strain has wider, symmetric tails than a normal distribution, leading to negative Φ_s − Φ for small ν̃ and positive Φ_s − Φ for large ν̃. The distribution of the whitened power, however, only has a wide tail out to large |ν̃|², leading to positive Φ_s − Φ for large |ν̃|².
To quantify the similarity of the data to the expected distributions we compute the Anderson-Darling statistic,

A^{2} = N \int_{0}^{1} \frac{\left( \Phi_{s} - \Phi \right)^{2}}{\Phi \left( 1 - \Phi \right)}\, d\Phi.    (36)

Here N is the number of samples, in this case the number of frequency bins. The numerator is the square of the quantity on the vertical axis of Figure 2 and the integral is over the horizontal axis.
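For sorted samples, this integral reduces to the standard order-statistic form, which can be sketched as:

```python
import numpy as np

def anderson_darling(samples, cdf):
    """Anderson-Darling statistic of the samples against a reference
    CDF (a callable). Uses the standard order-statistic form:
    A^2 = -N - mean((2i - 1) * (ln u_i + ln(1 - u_{N+1-i}))),
    where u_i are the sorted reference-CDF values of the samples."""
    u = cdf(np.sort(np.asarray(samples)))
    n = len(u)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))
```

For example, `anderson_darling(data, scipy.stats.norm.cdf)` compares whitened strain components against a unit normal.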
In Figure 3 we show the survival function of the distribution of the Anderson-Darling statistic for four cases: for both the mean and median PSD estimation methods we compare the distribution of the whitened strain to a unit normal distribution and the expected distribution as described in the previous section for the whitened strain (left) and whitened power (right). We also show the expected distribution if the two distributions are the same.
We note that the gradient of the expected distribution of the Anderson-Darling statistic is steeper for the whitened strain than for the whitened power. When applying a window to the data before performing the discrete Fourier transform, the real and imaginary components of the frequency-domain strain are no longer independent, reducing the appropriate value of N by a factor of two (see Appendix A of [21]). We therefore avoid the case identified in [13], where the distribution of ν̃ appeared to match the correct distribution better than possible.

V. A CASE STUDY -GW151012
To examine the effect of non-Gaussianity and non-stationarity on the noise properties of real gravitational-wave detectors, we analyze the data in the two Advanced LIGO interferometers [36] at and around the time of GW151012 [37], the lowest-significance binary black hole merger included in the first gravitational-wave transient catalog [19]. We analyze 256 s of data ending 2 s after the merger time. We subdivide the data into thirty-two 8 s chunks; the first 31 chunks are used to compute the PSD, and the final 8 s are the on-source data.
We apply a Tukey window with a roll off of 0.2 s to each of the chunks to suppress spectral leakage. We then fast-Fourier transform the windowed time-domain strain before averaging the PSD chunks. We do not apply the conventional window amplitude correction factor to either the PSD or the data. After applying the fast-Fourier transform, we remove all data below 20 Hz and above 1024 Hz.
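A sketch of this pipeline using SciPy follows. The function name and defaults are illustrative; the median bias factor and the window amplitude correction are deliberately omitted, consistent with the procedure described above.

```python
import numpy as np
from scipy.signal.windows import tukey

def median_psd(strain, fs, seg_dur=8.0, roll_off=0.2,
               f_min=20.0, f_max=1024.0):
    """Median off-source PSD from non-overlapping Tukey-windowed
    chunks, restricted to the analysis band."""
    n = int(seg_dur * fs)
    # fraction of each chunk tapered: one roll_off at each end
    window = tukey(n, 2 * roll_off / seg_dur)
    n_seg = len(strain) // n
    segments = strain[: n_seg * n].reshape(n_seg, n) * window
    # one-sided periodograms: P_k = 2 |FFT|^2 / (fs * n)
    periodograms = 2 * np.abs(np.fft.rfft(segments, axis=1)) ** 2 / (fs * n)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    keep = (freqs >= f_min) & (freqs <= f_max)
    return freqs[keep], np.median(periodograms[:, keep], axis=0)
```

With 248 s of off-source data sampled at the analysis rate, this yields the 31-segment median estimate used in the text.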
The resulting PSDs and the power in the on-source data are shown in Figures 4 and 5. Figure 4 shows data from the LIGO Hanford interferometer and Figure 5 data from the LIGO Livingston interferometer. The orange curves show the mean-estimated PSDs and the green curves show the median-estimated PSDs. All the PSD estimates lie at the centre of the scatter in the on-source data, as expected. We note that the width of the scatter about the mean PSDs is slightly smaller than for the median, due to the slower convergence of the median estimate.

A. Data quality tests
The main reason for using a median estimate over a mean estimate is to mitigate the effect of large non-Gaussian transients. However, the formalism derived above is invalid if there is a large outlier in the data being averaged over. Therefore, we test whether any of the segments are clear outliers. We compute the power in each segment divided by the mean power in all the other segments; this essentially tests how well the data in each segment are whitened by the data in the other segments. We apply an empirically tuned threshold of 1.5 to the mean whitened power per segment: any segment with a mean whitened power above this value is discarded, and the test is repeated. We identify one segment of the Hanford data which fails this test, with a mean whitened power of 2.66. Visual inspection reveals that this segment has a larger amplitude than all the others below ∼ 100 Hz. No significant outliers are present in the selected Livingston data.
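An illustrative implementation of this iterative test follows; the function name and the band-averaging details are assumptions, while the threshold of 1.5 is the empirically tuned value from the text.

```python
import numpy as np

def flag_outlier_segments(periodograms, threshold=1.5):
    """Iteratively discard segments whose band-averaged power, whitened
    by the mean power of all other segments, exceeds the threshold.
    `periodograms` has shape (segments, frequency bins); returns a
    boolean mask of segments to keep."""
    keep = np.ones(len(periodograms), dtype=bool)
    while keep.sum() > 1:
        idx = np.where(keep)[0]
        ratios = np.empty(len(idx))
        for j, i in enumerate(idx):
            others = keep.copy()
            others[i] = False
            ratios[j] = np.mean(
                periodograms[i] / periodograms[others].mean(axis=0))
        if ratios.max() <= threshold:
            break
        keep[idx[ratios.argmax()]] = False
    return keep
```

Dropping the single worst segment before re-testing mirrors the discard-and-repeat procedure described above.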
Additional data quality tests are possible and are performed routinely during gravitational-wave data analysis. For example, researchers often remove specific frequency bins if the noise at that frequency is known to be non-Gaussian, e.g., around the frequency (and higher harmonics) of mains electricity [1]. A possible extension would be to use the normalised average power introduced above to track non-stationarity in the data; a similar method is used in [11]. Implementing further data quality cuts and vetoes will improve the quality of our off-source PSD estimates and is an interesting avenue for further study.

B. Data whitening
We repeat the tests performed in Section IV on the data. In Figure 6, we show the deviations from the expected cumulative distribution functions for the data from the Hanford (top) and Livingston (bottom) interferometers. On the left, we show the real and imaginary components of the whitened strain, and on the right the whitened power. In orange we show the difference between the empirical mean-estimated PSD whitened data and the expected mean-marginalised distributions (Equations 29 and 28); in green, the difference between the empirical median-estimated PSD whitened data and the expected median-marginalised distributions (Equations 25 and 23). In red (purple) we compare the data whitened using the mean- (median-) estimated PSDs with the distributions which do not account for the uncertainty. In gray we show the 1σ, 2σ, and 3σ expected deviations.
In Table I, we quote the corresponding values of the Anderson-Darling statistic for each of these lines. We see that the largest deviations are observed when using data whitened with a median-estimated PSD and compared to the non-marginalised distributions. We also see that, with the exception of the strain components in Hanford, the Anderson-Darling statistic is always smaller when using the appropriate marginalised distributions.

FIG. 6: The difference between the empirical and analytic cumulative distribution functions of whitened strain ν̃ and whitened power |ν̃|² in the LIGO Hanford (top) and Livingston (bottom) interferometers, with two different power spectral density estimation methods, at the time of GW151012. The gray shaded regions show the expected 1σ, 2σ, and 3σ fluctuations. For the orange and green curves, the data are compared with the distributions which marginalise over uncertainty in the power spectral density estimate. For the red and purple curves, the data are compared with the distributions which do not. We note that the latter pair of curves deviates from the 3σ region, while the former does not. The data are well described by a stationary Gaussian process when marginalising over uncertainty in the power spectral density.

C. Impact on inference
FIG. 7: In blue and green we use the mean-estimated PSD, while in yellow and red we use the median estimate. In blue and yellow we neglect the uncertainty in the PSD estimate; in green and red we marginalise over the appropriate statistical uncertainty. In both cases, marginalising over the uncertainty increases the width of the chirp mass posterior and decreases the average SNR. The mean-estimated PSD gives a wider chirp mass posterior than the median PSD estimate, and we see a corresponding decrease in the recovered matched-filter SNR.

We analyze the data using Bayesian inference as described in Section III twice, once each with the mean-averaged and median-averaged PSDs, to obtain samples from the posterior distribution and Bayesian evidences under four sets of assumptions.
1. The data are well described by the mean-estimated PSD and the Whittle likelihood.
2. The data are well described by the mean-estimated PSD and the Student-Rayleigh likelihood.
3. The data are well described by the median-estimated PSD and the Whittle likelihood.
4. The data are well described by the median-estimated PSD and the median-marginalized likelihood, Equation 23.
FIG. 8: The posterior distribution for right ascension (top) and declination (bottom) for GW151012 for four different models. In blue and green we use the mean-estimated PSD, while in yellow and red we use the median estimate. In blue and yellow we neglect the uncertainty in the PSD estimate; in green and red we marginalise over the appropriate statistical uncertainty. In this case, marginalising over the uncertainty does not make a large difference to the inferred posterior distributions. However, the different PSD estimation techniques, while giving consistent posterior distributions, give different posterior weights to different parts of the sky.

In Table II we show the natural logarithm of the BCI under these four sets of assumptions. We find that both PSD estimation methods give ln BCI ≈ 10, a moderately strong preference for the coherent hypothesis, although we note that a full treatment requires careful consideration of the prior odds. For both PSD estimates the BCI decreases slightly when marginalizing over the uncertainty. The increase in the BCI when using the median-estimated PSD is mirrored in the increased signal-to-noise ratio ρ in the lower panel of Figure 7. This is likely due to the different handling of non-Gaussian features in the mean and median PSD estimation methods.
In Table III we show the natural log Bayes factors comparing the marginalized to unmarginalized likelihoods for both PSD estimation methods. In both cases, we see a strong preference for the model which marginalizes over the uncertainty. This preference is much larger for the median PSD estimate. This can be understood by the fact that the large-|ν̃| tail of the median-marginalized likelihood is broader than that of the mean-marginalized likelihood, cf. Figure 1, lower-right panel.
In Figures 7 and 8 we show selected posterior distributions under our four sets of assumptions. In the top panel of Figure 7 we show the posterior distribution for the best-measured combination of the component masses, the chirp mass

\mathcal{M} = \frac{(m_{1} m_{2})^{3/5}}{(m_{1} + m_{2})^{1/5}}.    (37)

The parameters m_i are the masses of the two component black holes. In the bottom panel we show the network matched-filter signal-to-noise ratio (SNR). We find that the recovered SNR is larger when using the median PSD estimate, and that marginalizing over the uncertainty in the PSD increases the posterior support at SNRs below the maximum without decreasing the maximum SNR. Correspondingly, the posterior for chirp mass is slightly less strongly peaked when marginalizing over uncertainty in the PSD, and when using the median PSD estimate. In Figure 8 we show the posterior distribution for the parameters describing the position on the sky, right ascension α and declination δ. Marginalizing over the uncertainty in the PSD does not significantly affect the inferred sky localisation of the binary. However, the two PSD estimation methods recover different posterior distributions within the same region of the sky.
The fact that the differences in the posterior distributions and Bayes factors between the PSD estimation methods are larger than the corrections due to marginalizing over the statistical uncertainty means that at least one of the estimation methods is producing a biased estimate of the true PSD. The source of this bias is presumably non-stationarity and/or non-Gaussianity in the data used to estimate the PSD. It is not possible to determine which estimate is less biased, and so this must be considered an additional source of systematic uncertainty in off-source PSD estimates.

VI. DISCUSSION
Performing astrophysical inference on gravitational-wave data requires an estimate of the noise power spectral density (PSD). In practice it is not possible to know this perfectly, only to estimate it from the data. Multiple methods of estimating the PSD are used, and each carries with it a different statistical uncertainty. In this work, we derived the relevant statistical uncertainty for an estimation method widely used when analyzing gravitational-wave transients: the median average. We obtained a closed-form expression for the likelihood which marginalizes over the statistical uncertainty, and demonstrated that simulated Gaussian data match the predicted distributions.
We then applied our new results to the lowest-significance transient in the first LIGO/Virgo gravitational-wave transient catalog, GW151012. We analyzed this event using two different PSD estimates, one a median average and one a mean average, with likelihoods which did and did not marginalize over the appropriate statistical uncertainty. We showed that the PSD estimation method has a clear effect on the inferred posterior probability distribution and Bayesian evidence. The changes in the posterior distributions and Bayesian evidence when marginalizing over the statistical uncertainty are more subtle. However, for applications which require precise estimates of the evidence, such as [24][25][26], these small differences will be crucial.
There are many interesting extensions to the work presented here, which are left to future work. These include implementing data quality tests when analyzing real gravitational-wave data, which are known to be non-Gaussian and non-stationary over timescales of minutes to hours. Examples of this can be found in other areas of gravitational-wave data analysis. For example, searches for gravitational-wave transient signals implement methods to track and mitigate non-stationarity [11] and remove large non-Gaussian transients [10,11]. Searches for continuous gravitational-wave sources and the stochastic gravitational-wave background include algorithms to detect and remove stretches of data where the PSD is rapidly fluctuating [5] or frequencies where the data are known to be non-Gaussian, e.g., around the frequency of AC mains electricity [1]. By combining these methods and the statistical models presented here, we can enable precision astrophysical inference for gravitational-wave transients without large computational overheads.

ACKNOWLEDGMENTS

… of LIGO Laboratory, the LIGO Scientific Collaboration and the Virgo Collaboration. LIGO is funded by the U.S. National Science Foundation. Virgo is funded by the French Centre National de Recherche Scientifique (CNRS), the Italian Istituto Nazionale della Fisica Nucleare (INFN) and the Dutch Nikhef, with contributions by Polish and Hungarian institutes. The authors are grateful for computational resources provided by the LIGO Lab and supported by National Science Foundation Grants PHY-0757058 and PHY-0823459.