Model systematics in time domain tests of binary black hole evolution

We perform several consistency tests between different phases of binary black hole dynamics; the inspiral, the merger, and the ringdown on the gravitational wave events GW150914 and GW170814. These tests are performed explicitly in the time domain, without any spectral leakage between the different phases. We compute posterior distributions on the mass and spin of the initial black holes and the final black hole. We also compute the initial areas of the two individual black holes and the final area from the parameters describing the remnant black hole. This facilitates a test of Hawking's black hole area theorem. We use different waveform models to quantify systematic waveform uncertainties for the area increase law with the two events. We find that these errors may lead to overstating the confidence with which the area theorem is confirmed. For example, we find $>99\%$ agreement with the area theorem for GW150914 if a damped sinusoid consisting of a single-mode is used at merger to estimate the final area. This is because this model overestimates the final mass. Including an overtone of the dominant mode decreases the confidence to $\sim94\%$; using a full merger-ringdown model further decreases the confidence to $\sim 85-90\%$. We find that comparing the measured change in the area to the expected change in area yields a more robust test, as it also captures over estimates in the change of area. We find good agreement with GR when applying this test to GW150914 and GW170814.

The frequency-domain IMR consistency test is routinely performed on GW events detected by the LIGO and VIRGO detectors [14].This test checks the consistency of the low frequency part of the observed signal with the high-frequency part.Results are presented in terms of the differences of the final mass and final spin inferred from the two different parts.These values are computed using extrapolations based on GR models parametrized in terms of the individual black hole masses and spins.Results to date are consistent with these differences being zero, implying that the GR based models are consistent with the data.
A related test uses measurements of the initial and final black holes' parameters as a check of the black hole area increase law.This law states that within classical general relativity, assuming the null energy condition and cosmic censorship, the area of a black hole horizon can never decrease.This was first shown by Hawking [33,34] and was later generalized to include non-differentiable event horizons and a cosmological constant [35].Quasi-local versions of this law, which do not require cosmic censorship are also known [36][37][38].Any stationary, astrophysical black hole is completely described by the Kerr metric in terms of its mass M and angular momentum J if the black hole no-hair theorem holds.The corresponding horizon area is A = 8πM 2 (1 + 1 − χ 2 ), where χ = J/M 2 is the dimensionless spin of the black hole.For a binary black hole coalescence, the area theorem states that the final area A f of the merged black hole will be larger than the combined area (A 1 + A 2 ) of the two initial black holes, It is additionally expected that sufficiently removed from the dynamical regime, i.e. at very early or late times, these black holes will be well described by the stationary Kerr metric.Following [39], a concrete proposal for a test of the area increase law was proposed in [40].The test consists of independent analyses of the early inspiral and the final ringdown stages, leading respectively to independent estimates of the initial and the final masses and spins of the binary.This test is carried out in the time domain, meaning that a portion of the strain data around the merger is excised in the time domain to separate the inspiral and ringdown phases.The data segment around the merger is excised in estimating the parameters, since a violation of the area theorem is perhaps most likely to occur near the merger of the two black holes where the spacetime is highly dynamical.These estimates are then used to obtain the areas of the individual BHs using the Kerr formula.To demonstrate the method, the authors used a simulated GW150914-like binary black hole signal and found that the area theorem could be confirmed at 75% probability.
Recently, Ref. [41] has explored the validity of the area theorem in the time domain and presented observational evidence that actual GW150914 data is consistent with the theorem with a probability of 97% when they do not excise the merger and 95% when they excise 3ms of the merger.They also provide an estimate of the same by truncating the inspiral at different times before the peak amplitude and find that the different measurements support the area theorem with probabilities within 88-97%.
In this paper we apply the test proposed in [40] to two GW events, GW150914 and GW170814.To test the area theorem or the IMR consistency between different phases of the waveform, we need each of these different phases (inspiral/ringdown) to have reasonable signal-to-noise ratio (SNR) to perform the parameter estimations.We choose GW150914 as it has a high network-SNR of ∼ 24 [1,2], with a ringdown SNR ∼ 8.5 [42] 3ms after the merger.The other event, GW170814, showed some support for a deviation from GR in initial analyses (∆M f / M f showed a second peak at higher values away from zero; see Fig. 2 of [14]).To explore the validity of the area theorem we choose GW170814 as our second candidate event.
We separately analyze the data segment before a truncation time, removing later data, to obtain the initial parameters.Similarly, the data before a truncation time are removed, and the remainder is analyzed to estimate the final parameters of each event.The truncation times may be chosen differently between the pre-and post-truncation analyses such that the merger phase is excluded.We employ various waveform models for estimating the parameters to show the effect of waveform systematics on the constraints of the area increase law.We show that ignoring higher overtones in the ringdown waveform model leads to overestimation of the final mass.This yields an inconsistency between the estimated parameters from the pre-and post-truncation analyses, but perversely results in a better agreement with the area theorem.A positive change in area, while obeying the area increase law, may still disagree with GR predictions if the area increase is too large.We use the ratio between the measured and the ex- ), as a measure here.For a perfect measurement, obeying GR predictions, R = 1.However, in the presence of noise, we expect R to follow a Gaussian distribution with unit mean.In order to quantify the agreement of our results with GR, we also compute the mismatch, C R .It denotes the probability of R lying within the range symmetric about the median of the R-distribution, extending to R=1.For GW150914, we find C R = 6.5% when we avoid 26ms of data around the merger.Although, for GW170814, we obtain a better measurement yielding C R = 2.7% when ∼ 13ms of data is avoided around the merger.
This paper is organized as follows.In Sec.II we explain the method adopted here to excise the data and perform parameter estimation.In Sec.III and Sec.IV we develop the different pre-truncation and post-truncation analyses.In Sec.V we discuss our results for the inspiral-merger-ringdown consistency test and in Sec.VI for the test of the area theorem.Our concluding remarks are presented in Sec.VII.

II. DATA PREPARATION AND PARAMETER ESTIMATION
The gravitational wave strain, h(t), observed by a detector can be schematically written as where F +,× are the antenna pattern functions of the detector.The right ascension, α, and declination, δ, define the skylocation of the source in a geocentric coordinate system and the polarisation angle, ψ, defines the relative orientation of the wave frame with respect to the geocentric coordinate system [43,44].t 0 is the arrival time of the signal at the detector and φ 0 is the phase at t 0 .h + (t) and h × (t) are the two independent polarisations of the GW signal, given by where A +,× are slowly varying amplitudes and Φ(t) is a rapidly varying phase.
To estimate the source parameters of a GW signal present in a given data stream, s(t), we use Bayesian inference.We first consider a model for the signal h within GR, parametrized by the source properties such as masses, spins etc., {M 1 , χ 1 , ...} ≡ ϑ .According to Bayes' Theorem, the probability distribution of the model parameters given the data s(t), p( ϑ|s, h) (known as the posterior distribution) is proportional to the likelihood, L(s| ϑ, h), of observing the data given ϑ multiplied by a prior distribution, p( ϑ), on the parameters representing the allowed range and expected distribution on ϑ.For a network of N gravitational-wave detectors d, the likelihood function is given by where s d is the data in the d-th detector (assuming the noise to be uncorrelated, stationary and Gaussian) and h d is the waveform model (or template).Here, the noise-weighted inner product is defined as FIG. 1.The 90% posterior contours of the redshifted final mass (detector frame) M f and the final spin χ f obtained from the various pre-and post-truncation analyses for GW150914.In the left panel, we show the contours from the two pre-truncation analyses performed with two gate-start-times, t m and t hmeco using three different waveform models, IMRPhenomPv2, IMRPhenomXPHM and NRSur7dq4.On the right panel, we show the final mass-final spin contours obtained using damped sinusoids with only the dominant mode [220] and the dominant mode with one overtone [220+221] at different analysis start times (or the "gate-end-times") as the representatives of the post-truncation analyses.
We also show three post-truncation analyses with the NRSur7dq4 model for the three "gate-end-times", t hmeco , t m , and t m + 3ms.The posterior contour arising from the full IMR analysis with NRSur7dq4 is shown by the innermost contour.The full IMR contour is contained within the contours of all the truncated analyses.
where S (d) n ( f ) is the noise power spectral density (PSD) of the d-th detector and ã( f ) and b( f ) are frequency domain representations of the time-domain data.We use PyCBC Inference [45] to evaluate Eq. ( 6) over the large, multidimensional parameter space defining the waveform model.To sample the parameter space we use the dynesty [46] and parallel-tempered emcee [47,48] stochastic samplers.Marginalizing the resulting distribution yields measurements on individual parameters.
Our aim is to analyze separately the early and late parts of the signal to investigate the consistency between the different phases, and to explore the validity of the area theorem.In order to perform these analyses, we excise portions of the templates, keeping a desired time segment intact to perform the parameter estimation.As observed in ref. [40], merely excising times from the template can lead to biases unless a corresponding part of the signal is also removed from the data.Hence, both the template and the signal need to be excised while performing parameter estimation.We accomplish this by doing "gating and in-painting" [9,29].In this method, we zero-out ("gate") the residual s d − h d ( ϑ) over times to be excised, then add a component to the gated times such that the contribution of these times to the likelihood is zero ("inpainting").This removes biases that arise due to the convolution of the inverse covariance of the detector noise with the residual.The end result is the pre-truncation analysis is independent of the post-truncation analysis.
Since the sky location affects the arrival time of the signal in the detectors, the specific time for the gating changes with respect to different sky locations.If the sky location is varied during the analysis, the truncated template favors those sky locations that include more of the signal.This results in an estimation of the parameters shifted from their true values [40].To avoid this, here we consider a fixed sky location.
In order to fix the values for α and δ, we first perform the parameter estimation analysis on the full data stream, using the complete inspiral-merger-ringdown signal.We then use the corresponding maximum likelihood values for the sky location in the area-theorem analysis.Using slightly different values within the range of the posterior distributions on α and δ does not affect our final conclusions.

III. PRE-TRUNCATION ANALYSIS
In this section we focus on measuring the initial parameters from the early part of the data.To model the gravitationalwave signal in the early to late inspiral regime, we use three waveform models: IMRPhenomPv2 [49,50], which models precessing binaries; IMRPhenomXPHM [51,52], which models precessing binaries with higher modes; and NR-Sur7dq4 [53], which also models precessing binaries with higher modes, mass ratios q ≤ 6, and spin magnitudes χ 1 , χ 2 ≤ 0.8.
For the pre-truncation analysis, we excise the residual after a desired time ("gate-start-time") and perform the sampling on the remaining data segment.While the truncation can be started at any time within the inspiral regime, earlier truncation times lead to posterior distributions yielding worse constraints on the estimated parameters due to decreasing SNR of the remaining signal.We choose two different gate-starttimes in our analysis.In the first case, we use the merger time, t m , defining the IMR waveform as the truncation time.For GW150914, we base the merger time on the estimate in [15], while for GW170814 we use the results from [2], as explained in more detail in the respective sections.
In the second case we use the time corresponding to the hybrid minimum energy circular orbit (hybrid MECO), FIG. 2. The 90% posterior contours of the redshifted final mass (detector frame) M f and the final spin χ f obtained from the various preand post-truncation analyses for GW170814.On the left panel we show the contours from the pre-truncation analyses performed with two gate-start-times, t m and t hmeco using three different waveform models, IMRPhenomPv2, IMRPhenomXPHM and NRSur7dq4.On the right panel we show the 90% posterior contours of the redshifted final mass (detector frame) M f and the final spin χ f obtained using the damped sinusoids with only dominant mode [220] and the dominant mode with one overtone [220+221] at the merger time t m as the representatives of the post-truncation analyses for GW170814.We also show the results for the two post-truncation analyses with two "gate-end-times", t m and t hmeco for IMRPhenomPv2, IMRPhenomXPHM and NRSur7dq4 waveform models.The ringdown contour is larger when the 221 mode is included because inclusion of the overtone increases the dimension of the parameter space.The posterior contour arising from the full IMR analysis with NRSur7dq4 waveform model is shown by the innermost solid black contour.
t hmeco [54].The hybrid MECO depends on the mass ratio and the spins of the black holes.It corresponds to a time earlier than t m , which can be considered as the end of the inspiral phase for comparable mass binaries, but before the peak amplitude GW emission.Here the hybrid MECO time is computed using the maximum likelihood values for the masses and the spins of the individual black holes from the full IMR analysis.
In the parameter estimation performed here, we use uniform priors on merger time and the source-frame component masses.For the different analyses we use different prior ranges on the component masses.We keep the interval between the minimum and the maximum values of the component masses sufficiently large so that the posteriors have negligible values on the prior boundaries.
We also assume a distance prior uniform in comoving volume assuming a flat ΛCDM cosmological model.For the spins, we use uniform priors for the magnitude of the spin and isotropic for the orientation.We numerically marginalize over polarization.From each of the inspiral analyses we obtain the posterior distributions on the individual masses and spins.Finally, using the fitting formula in Refs.[55][56][57], we convert the initial masses and spins to the final mass and spin posterior.

IV. POST-TRUNCATION ANALYSIS
After the two individual black holes merge, the remnant is expected to settle down to a final stable black hole during the ringdown phase.Similar to the pre-truncation analysis, we perform post-truncation analyses using the GW signal emitted during this merger and post-merger phase.As opposed to the previous analysis, here we excise the residual all the way up to the analysis start-time ("gate-end-time") to perform sampling on the data segment after the gate-end-time to obtain the posterior distribution on the final parameters.
We preform two sets of post-truncation analyses.In one, we use the late part of the same IMR waveform models as in the pre-truncation analysis to obtain the posterior distribution on the masses and the spins.Using the fitting formula in Refs.[55][56][57] we then convert the component-object posteriors to a distribution on the final mass and spin of the remnant BH.For these cases we use similar priors as in the pretruncation analyses: uniform priors on merger time and the source-frame component masses, and a distance prior uniform in comoving volume.For the spins, the priors are uniform in magnitude of the spin and isotropic for the orientation.We also numerically marginalize over polarization.
In the second set of post-truncation analyses we use damped sinusoids as the signal model.The signal emitted during this post-merger phase is conventionally called the "ringdown" signal and can be decomposed into a sum of exponentially damped sinusoids [58].The gravitational waveform for the ringdown can be schematically written in terms of spinweighted spheroidal harmonics as where the sum is over the various integer quantum numbers, l ≥ 2, − ≤ m ≤ and n ≥ 0 denoting the different quasinormal modes.ι is the inclination angle, ϕ is the azimuthal angle of the black hole with respect to the observer, and −2 S mn (ι, ϕ) are the spin-weighted spheroidal harmonics.The spin-weighted spheroidal harmonics reduce to the usual spinweighted spherical harmonics, Y m for the non-spinning case.
The various mode amplitudes A mn and the phases φ mn depend on the initial configuration of the binary and on the particular theory of gravitation.In our analysis we use spheroidal harmonics [29] and treat the mode amplitudes and the phases as independent unknown parameters.The complex frequencies Ω mn consist of the quasi-normal mode frequencies ( f mn ) and the damping times (τ mn ), which can be determined from the Teukolsky equation [59,60].The no-hair theorem states that all ( f mn ) and (τ mn ) are determined by only two quantities, the mass M f and spin χ f of the black hole.Here, we use χ f ∈ (−0.99, 0.99), with positive or negative values referring to perturbations that are co-or counter-rotating with respect to the black hole's spin.
One can in principle vary many modes in the ringdown analysis with Bayesian inference.However, this increases the dimension of the parameter space leading to weaker constraints on the measured parameters if the contribution to the signal from additional modes is weak.Here we perform two types of ringdown analyses for each event.First we consider only the dominant fundamental mode, for which = m = 2 and n = 0 (i.e., the [220] mode).Secondly, we include one overtone of the dominant mode (i.e., [220+221]), giving a signal template consisting of two modes.In a similar study of GW150914 in Ref. [31], the authors claimed the existence of the fundamental quasinormal mode and one overtone associated with the dominant angular mode ( = m = 2) with 3.6σ confidence.
In the ringdown analyses performed here, we vary the following parameters: final mass M f , final spin χ f , A 220 , φ 220 , and inclination ι in the single-mode 220 case, and add two additional parameters A 221 and φ 221 when considering the additional first overtone.We numerically marginalize over the polarization ψ.Furthermore, we use different analysis start times (or "gate-end-times") for the ringdown template.For each of these cases, the priors on final mass and final spin are assumed to be uniform in the following ranges For A 220 , we choose a prior uniform in log 10 .We allow the overtone amplitude, A 221 , to uniformly vary from zero to ten times that of A 220 when we start the analysis at the merger time, t m .When starting the analysis at later times, we choose a uniform prior so that A 221 < A 220 .

V. CONSISTENCY BETWEEN INSPIRAL-MERGER-RINGDOWN PHASES
Our results for GW150914 are summarized in Fig. 1.In the left panel of Fig. 1, we compare the posterior distributions from the various pre-truncation analyses performed with different segments of the data and different waveform models.The innermost contour refers to the full IMR analysis performed using the complete data segment with the NRSur7dq4 model.We find similar estimates with the IMRPhenomX-PHM and IMRPhenomPv2 models.For all these cases we fix the sky location at α = 1.252, δ = −1.224, the maximumlikelihood values from [61].
For the pre-truncation analyses we consider two different gate start times, after which the residual is excised for each of the three waveform models.The first one is the merger time t m , where we use the estimate from [15], 1,126,259,462.423s GPS time at the LIGO Hanford site.Using the sky-location quoted above, this corresponds to 1,126,259,462.411 s in geocentric GPS time.The second one is the hybrid MECO time, t hmeco = 1,126,259,462.388s in geocentric GPS time, which is ∼23ms earlier than t m .Different waveform models provide different constraints on both the final mass and final spin parameters, but all of them are consistent with the full IMR analysis.
For a fixed waveform model, we obtain better constraints for the "gate-start-time" at t m (denoted by "IMR model before t m ") compared to t hmeco ("IMR model before t hmeco ") in left panel of Fig. 1).This is because later times include signal power from the late inspiral regimes.The posterior contours shrink and converge to the full IMR values as more data are included.Comparing between different waveform models, we find that the strongest constraints are obtained with the NR-Sur7dq4 model.
The different post-truncation analyses are shown in the right panel of Fig. 1.First, we consider the same waveform models as in the pre-truncation analyses, where we excise the residual before t hmeco , t m and t m + 3ms.We find that all three waveform models provide consistent results for each of the gateend-times.Hence we only show the posterior arising from the NRSur7dq4 model.As is evident from the figure, the posterior distribution arising from choosing t hmeco as "gate-end-time" provides the most stringent constraints on the parameters and the closest to the IMR values among these three.This is expected since in this case we exclude only the early inspiral segment of the data and include the late inspiral, merger and the ringdown phases.
Secondly, we perform the ringdown analysis as described in Sec.IV.Here we choose two different start times for the ringdown models, t m and t m +3 ms and for each of these, we do a single mode [220] analysis with only the dominant = m = 2, n = 0 harmonic and another with the dominant harmonic and the first overtone [220+221], = m = 2, n = 1.We find that at t m , the dominant mode [220] analysis provides estimates with higher final mass and higher final spin than the IMR result.The 90% posterior contours are completely disjoint.This is expected: applying the ringdown analysis too early leads to too low a ringdown frequency and results in overestimating the final mass.However, including the first overtone we recover estimates consistent with the full IMR analysis.Starting both the single-and two-mode analyses at 3ms, we find that the dominant mode analysis provides tighter constraints compared to the [220+221] analysis.
We find no support in our posterior for final masses above 100M , but masses below 50M are supported for the quasinormal mode analysis after 3ms with [220+221].We also find that as we end the truncation at later times, the remaining signal becomes quieter and the constraints on final mass and final spin get broader.In comparison with Isi et al. [41], we use the same merger time, t m = 1,126,259,462.423s GPS time at the Hanford detector site, but perform the analysis with a different sky location (see Ref. [31] for comparison).We find that changing the sky location does not affect our final result.We also find similar final mass and final spin estimates from the [220]-analysis at t m and the [220+221] analysis at t m + 3 ms as quoted in [41].However, the estimates from the pre-truncation analysis carried out before t m with NRSur7dq4 differ from the results quoted in [41].In this particular case, we find the final mass and final spin estimates to be (1 + z)M f ∼ 73.3 +9.3 −6.6 M and χ f ∼ 0.72 +0.08 −0.1 respectively.We perform a similar set of analyses with GW170814 and present our results in Fig. 2. We fix the sky-location parameters of GW170814 to α = 0.8, δ = −0.8, the maximum likelihood values from the IMR analysis in [2].We use two truncation times to start or end the gating here, the merger time t m and hybrid MECO time t hmeco .For the merger time, we use the maximum likelihood value of the coalescence time t c from [2].The hybrid MECO time is calculated from the waveform with the maximum likelihood parameters also from [2].These are t m = 1,186,741,861.527s and t hmeco = 1,186,741,861.5136s in geocentric GPS time, corresponding to t m = 1,186,741,861.531s and t hmeco = 1,186,741,861.513s at the Hanford detector site.
The estimated final mass-final spin contours from the pretruncation analyses before each of the gate-times are presented in the left panel of Fig. 2. We also show the IMR results from the analysis of the full data segment with NR-Sur7dq4 waveform model.This is found to be consistent with all the different analyses.
In the case of the pre-truncation analyses, for a fixed gatestart-time, the different waveform models provide very similar results.The post-truncation analyses are presented in the right panel of Fig. 2.Here we find that at fixed gate-end-time, the NRSur7dq4 model provides the most stringent constraints.In addition to these IMR waveform models we show our estimates from the two ringdown analyses with the [220] and the [220+221] modes performed at t m in right panel of Fig. 2. We find that both estimates are consistent with the IMR values.However the analysis with only the dominant mode provides better constraints on the parameters.Due to the post-merger signal being quiet, we do not find reasonable constraints on the final mass and final spin for any analysis performed beyond t m and omit them in the figure.

VI. AREA THEOREM
In this section we investigate the validity of the area theorem from the estimated parameters described in the previous sections.In Table I we report the 90% bounds on the fractional change of the black hole horizon area, , for the five (four) different gate-times and three different waveform models for GW150914 (GW170814).Here is computed using the initial component masses and spins from the pre-truncation analyses and A f is the final area estimated from the post-truncation parameters.A graphical representation of the same is provided in Fig. 3 for GW150914 (left panel) and GW170814 (right panel).In these figures we  Here "IMR model" indicates the different IMR waveform models used in parameter estimation, which is denoted in the header of the remaining three different columns.The percentages of the posteriors that have positive area increase are given in parentheses.For GW170814, due to low SNR (∼ 7) of the signal in post-truncation segment after t m , we find larger posterior bounds on the parameters from IMRPhenomXPHM and IMRPhenomPv2 model, resulting in negative relative change in black hole horizon area.
only provide the results obtained using combinations of NR-Sur7dq4 and the quasi-normal mode model for different data segments.
For GW150914 we obtain a 99% agreement with the area theorem when NRSur7dq4 waveform model before t m is the pre-truncation model and a single [220] damped sinusoid after t m as the post-truncation model.The agreement reduces to 98% when the pre-truncation analysis is limited to times before t hmeco .This high agreement with the area theorem is due to the fact that the single [220] damped sinusoid model after t m overestimates the final mass and final spin, and hence overestimates the final area.
A weaker constraint ∼ 94% on the validity of the area theorem is obtained when the pre-truncation analysis is carried out before t m with the post-truncation analysis being the simple ringdown analyses with [220+221] at t m and just [220] at t m + 3ms (see the second and the third entry in Table I).These two estimates are slightly lower than the results in Isi et al [41].Here, in the pre-truncation analysis before t m , the prior ranges on the component masses (uniform between 11M − 120M ) are chosen to ensure that the posteriors have negligible support at the prior boundaries.Using tighter priors (uniform between 17M − 76M , which excludes some part of posterior on the component masses), we find improved constraints on the area theorem, up to ∼97%, as found in Ref. [41].We also find that using uniform priors on the component masses provides similar results if we use uniform pri-ors on total mass and mass ratio, as was done in Ref. [41].To be precise, we find that the difference in the prior distribution function (whether uniform on component masses or uniform in total mass and mass ratio) does not greatly affect our result, but that a larger prior boundary weakens the constraints on the area theorem.
As might be expected, an earlier truncation time (t hmeco ) for the pre-truncation analysis or a later truncation time (3ms) for the post-truncation analysis gives weaker constraints on the area change.
Comparing different waveform models, we find much stronger constraints on the area theorem for GW150914 using NRSur7dq4 or IMRPhenomXPHM than using IMRPhe-nomPv2.This may be due to the fact that NRSur7dq4 and IMRPhenomXPHM models include sub-dominant modes, whereas IMRPhenomPv2 does not.
For comparison, we also provide a similar study on GW170814 data (see Table I).Compared to the GW150914 results, we find weaker bounds on the area theorem for GW170814.This is due to fact that GW170814 has a quieter post-merger signal.When the pre-truncation analysis is extended only until t hmeco , we find that IMRPhenomPv2 provides similar constraints to NRSur7dq4.On the other hand, when the pre-truncation analysis is extended until the merger, the IMRPhenomPv2 analysis provides a stronger bound.We also find a negative change in black hole area when we use IM-RPhenomPv2 or IMRPhenomXPHM in the post-truncation ) from the pre-truncation analyses to the final parameters assuming GR.All the pretruncation analyses are performed before t hmeco to completely avoid the merger regime.For demonstration purposes here we have only used NRSur7dq4 as the IMR model in all of the pre-truncation analyses.
If the GR prediction is true, we expect R to be exactly 1 for a perfect measurement.Due to statistical uncertainties, the probability distribution on R is expected to be a Gaussian with mean equal to one.The priors on masses and spins used for the different pre-and post-truncation analyses are equivalent to a prior distribution on R that can be both negative as well as a positive, with R > 4 corresponding to a violation of the conservation of energy (see the black solid and the blue dashed lines in Fig. 4).
For GW150914, the post-truncation analysis using the [220] damped sinusoid mode starting at t m produces a probability distribution on R peaking at a value greater than 1.Consistently, for this particular case we find a better agreement with the area theorem (∼ 99% [see Table I]).This is expected, as the area theorem only requires the final area to be larger than the initial area, so the agreement is improved when the distribution for final area is shifted to higher values.The final mass and spin are overestimated for this case as seen in the right panel of Fig. 1, resulting in overestimating the final area, with the bias visible in Fig. 4.
On the other hand, when we consider the first overtone with the dominant mode after t m + 3 ms, we find a shift in the distribution to the opposite direction and the agreement with the area theorem drops.To quantify the agreement with GR, we also quote the mismatch, C R for each of the curves in Figs. 4 in Table II.Here the mismatch C R denotes the probability of getting R within the range of values symmetric about the median value of R, extending to R=1 for each of the curves.To be precise, C R denotes the area under the curve bounded by R = 1 and symmetric around the mean value.Lower values of C R refer to higher accuracy in recovery of the GR estimates.We find that excluding the merger and starting the post-truncation analysis with [220] at t m + 3ms provides a better agreement with GR having C R ∼ 6.5% as compared to other cases where we use different waveform models (IMR model or the damped sinusoids with [220+221]) after t m or t m + 3ms.
A similar study for GW170814 is also given in Fig. 4 and Table II.As opposed to the GW150914 result, here we find that the dominant mode post-truncation analysis recovers the GR value with greater accuracy, with the mismatch only C R = 2.9%.As opposed to this, using the NRSur7dq model for the post-truncation analysis leads to lower agreement with GR.

VII. CONCLUSION
In this paper we provide an extensive study on the validity of Hawking's area theorem and the consistency of different phases of the compact binary signal using GW150914 and GW170814 data.We investigate how the different waveform models, various prior distributions, and use of different data segments affect the final results.We observe that uniform priors on component masses or the total mass and mass ratio yield similar results.
For both GW150914 and GW170814, different waveform models provide different constraints on both the final mass and final spin parameters, but all of them are consistent with the full IMR analysis except one.The ringdown analysis of GW150914 at t m considering only the dominant mode overestimates the final mass and final spin for the remnant black hole.However, considering one additional overtone provides consistent bounds when compared to the result obtained by analysing the full data segment using IMR waveform models.In the case of GW170814, we find that the dominant mode analysis and the [220+221] mode analysis give consistent bounds.
We observe that the different choices for the excision of data and for the waveform models introduce significant systematic errors in the measurements of the validity of the area theorem.In the case of GW150914, for various combinations of excised data, the probability of the validity of the area theorem varies in the range ∼ 74% − 94% when we use the NRSur7dq4 waveform model in the pre-truncation analysis.Using instead the waveform model IMRPhenomXPHM, the range drops by ∼ 3%, to ∼ 71% − 91%.IMRPhenomPv2 leads to an even broader range from 54% to 85%.
In Ref. [40], a study of a similar test of the area theorem was presented based on a simulated GW150914-like signal.This study used IMRPhenomPv2 for the inspiral and a [220] damped sinusoid for the ringdown, finding support for the validity of the theorem of ∼ 74%.Our analysis of the GW150914 data shows a similar result of ∼ 70% when using the corresponding waveform model choices (see 9th row on the 4th column of Table.I).
When considering different waveform models along with the different choices of data duration, the probability on the validity of the area theorem varies in the range 54%−94%.We also see that using a damped sinusoid as the waveform model for the post-truncation analysis improves the probability by ∼ 10% on average as compared to one of the updated IMR waveform models.We also see that if we apply the ringdown analysis with the dominant mode [220] at t m for GW150914, we get a lower ringdown frequency.This overestimates the final mass, making an area theorem test biased in a direction that likely overstates the agreement, with the probability being ∼ 99%.For this reason, we believe the ratio of the measured change in area to the expected change R, and the associated mismatch C R , to be a better metric for determining the consistency of the signal with GR.This yields weaker agreement with GR for the [220] ringdown analysis at t m , for which C R ∼ 67%.This is expected, since a ringdown with only the [220] is not thought to be good model of the signal at merger.For comparison, the [220+221] ringdown analysis at t m + 3ms yields C R ∼ 6.7%.
For GW170814, various data durations used in the analyses lead to larger uncertainty on the probability values as compared to GW150914.For NRSur7dq4 it ranges in between 71% − 87%.However, for IMRPhenomXPHM and IMRPhe-nomPv2 it varies from 17% − 85% and 27% − 90%, respectively.To compare between a damped sinusoid and an IMR waveform model for the post-truncation analysis, we find that the damped sinusoid improves the probability of a positive area increase by ∼ 15% as compared to the case where the IMR model is NRSur7dq4.But the improvement could be as high as ∼ 60% when the IMR models used are IMRPhenomX-PHM or IMRPhenomPv2.The similar findings are reflected in Table II through the values of C R .In this case the best case scenario is obtained using an IMR model before t hmeco and the [220] mode analysis after t m , for which C R = 2.7%.However, using IMR model to analyze the post-truncation segment after t m , C R drops to ∼ 61%.
These large systematic uncertainties highlight the need for binary black hole observations across a longer time period than is possible with current generation detectors.This should become possible in the 2030s with the launch of the spacebased Laser Interferometer Space Antenna (LISA) [62] and the beginning of ground-based "3G" detectors, such as Cosmic Explorer (CE) [63] and Einstein Telescope (ET) [64].With a sensitive frequency band of 0.1 − 100mHz, LISA will be able to detect binary black holes ∼ a year before their merger are detected by ground-based detectors [25,[65][66][67][68][69][70][71].This should yield unprecedented precision measurements of the fundamental laws governing black hole thermodynamics.

FIG. 3 .
FIG. 3. Fractional change in the black hole horizon area, ∆A/A measured i , due to the BBH merger [GW150914 on the left and GW170814 on the right] for different pre-and post truncation analyses performed with NRSur7dq4 or the damped sinusoid models.The change in the area is denoted by ∆A = (A measured f

FIG. 4 .
FIG. 4. Ratio of the measured and the expected change in black hole horizon area, R = (A measured f

TABLE I .
The median values and 90% bounds on the fractional change of the BH horizon areas with different IMR waveform models at different truncation times.The first column corresponds to the different combinations of pre-and post-truncation analyses as indicated by the various rows.

TABLE II .
IMR model before t hmeco -Damped Sinusoid[220]after t m 67.8% IMR model before t hmeco -Damped Sinusoid [220+221] after t m 27.6% IMR model before t hmeco -Damped Sinusoid[220] after t m +3 ms 6.5% IMR model before t hmeco -Damped Sinusoid[220+221] after t m +3 ms 42.3% IMR model before t hmeco -IMR model after t m Mismatch C R for the recovery of R =1 using the various pre-and post-truncation analyses for GW150914 and GW170814.Lower value of the mismatch C R , denotes greater agreement with GR predictions.The IMR model used here is NRSur7dq.