VARAHA: A Fast Non-Markovian sampler for estimating Gravitational-Wave posteriors

This article introduces VARAHA, an open-source, fast, non-Markovian sampler for estimating gravitational-wave posteriors. VARAHA differs from existing Nested sampling algorithms by gradually discarding regions of low likelihood, rather than gradually sampling regions of high likelihood. This alternative mindset enables VARAHA to freely draw samples from anywhere within the high-likelihood region of the parameter space, allowing for analyses to complete in significantly fewer cycles. This means that VARAHA can significantly reduce both the wall and CPU time of an analysis. VARAHA offers many benefits, particularly for gravitational-wave astronomy where Bayesian inference can take many days, if not weeks, to complete. For instance, VARAHA can be used to estimate accurate sky locations, astrophysical probabilities and source classifications within minutes, which is particularly useful for multi-messenger follow-up of binary neutron star observations; VARAHA localises GW170817 $\sim 30$ times faster than LALInference. Although VARAHA is currently limited to aligned-spin, dominant-multipole waveform models for gravitational-wave analyses, it is straightforward to extend the algorithm to include additional physics without hindering performance. We envision VARAHA being used for gravitational-wave studies, particularly estimating parameters using expensive waveform models, analysing subthreshold gravitational-wave candidates, generating simulated data for population studies, and rapid posterior estimation for binary neutron star mergers.

These estimated parameters are broadly categorised as a) intrinsic parameters: parameters that are directly responsible for the orbital evolution of the binary, such as masses, spins, tidal parameters, eccentricity, periastron distance, etc., and b) extrinsic parameters: parameters that are observer-dependent, namely, luminosity distance, the binary's orientation from the line of sight, sky location, coalescence phase and coalescence time of the GW signal. For binaries moving on a quasi-circular orbit with spins aligned with the orbital angular momentum, the extrinsic parameters do not impact the orbital evolution of the binary and consequently can only impart an overall shift to the amplitude or phase evolution of a signal. However, for binaries moving on an eccentric orbit, or with spins misaligned from the orbital angular momentum, the GW signal morphology depends on the extrinsic parameters [30,31]. Although the number of intrinsic parameters may change depending on the physics of the problem, the number of extrinsic parameters remains fixed at seven [32]. The basics of PE have been thoroughly discussed, and we cite some of the early works [33,34].
Multiple methods have been developed to perform PE on GW signals. The minute-scale analysis, BAYESTAR [35], focuses on the rapid localisation of GW signals by estimating only the extrinsic parameters of the signal. It achieves this by keeping the intrinsic parameters fixed to the estimate provided by the GW search analysis that first identified the signal. The packages that perform PE of both the extrinsic and intrinsic (full) parameters through stochastic sampling methods include LALInference, which was previously the go-to analysis for GW PE [36], PyCBC Inference [37] and Bilby [38-41], which offer greater flexibility and modularity. These analyses often employ Nested [42] or Markov Chain Monte Carlo (MCMC) sampling [43] to obtain estimates for the binary's parameters. The packages RapidPE and RIFT use a non-Markovian approach to create an embarrassingly parallel infrastructure and provide comparatively faster processing times. They can also provide the marginal likelihood for straightforward model selection [44-46] (although see Refs. [47,48] for other model selection algorithms). Alternatively, methods to approximately estimate the binary's parameters have also been developed [49], including recent advancements utilising machine-learning techniques [50-56].
Typically, stochastic sampling analyses that perform full PE can take hundreds or thousands of CPU-hours of processing time [57,58]. This high computational requirement is not sustainable, as the detection rate of GW signals is expected to increase [59] due to the continued improvement in the sensitivity of the GW detectors. Fast, minute-scale PE is therefore crucial, especially for low-latency analyses where accurate skymaps and source classification probabilities are needed for timely follow-up by other multi-messenger facilities.
Attempts at improving computation time have primarily focused on speeding up waveform generation and computation of the likelihood function [48, 60-64], or on utilising machine-learning techniques [50-56]. However, a significant improvement in computation time can also be achieved by efficiently populating the parameter space. In this paper, we introduce VARAHA, an alternative sampling technique that iteratively discards regions of low likelihood and converges to the region of the parameter space that contains high posterior probability density (i.e. the posterior mass). We achieve significant gains in speed by introducing a) a non-Markovian method that performs a comparable number of computational operations, resulting in a similar number of effective samples, as Nested sampling, but in significantly fewer iterations, and b) splitting one high-dimensional sampling problem into two lower-dimensional problems, sampling the extrinsic parameters first and using the acquired information to then sample the intrinsic parameters. These advantages result in significantly reduced processing times, arising from greatly improved process parallelisation and array vectorisation in the analysis.
VARAHA can perform GW PE in a matter of a few minutes. Currently, it is limited to using waveform models that a) assume the spins are aligned with the orbital angular momentum, meaning that the binary does not precess [30], and b) restrict attention to only the ℓ = 2 gravitational-wave multipole, meaning that higher multipoles are neglected. Nevertheless, VARAHA can meaningfully be used to perform fast PE to localise and classify a source for electromagnetic follow-up, estimate parameters for a large number of sub-threshold GW candidates, and generate PE for simulated populations. Future extensions will broaden its applicability to the estimation of in-plane spins, eccentricity, and tidal deformability. Future extensions will also include uncertainties arising from the calibration of detector data, and fast methods for waveform generation and matched filtering, which currently consume a significant portion of the computation.
In section II we describe the basics of the analysis and the factors responsible for the faster processing time. We describe its application to the parameter estimation of GWs in section III. In section IV, we present PE for the individual observations GW151226 [65] and GW170817 [66], as well as a population-level validation using hundreds of simulated signals. We also discuss the computational requirements of VARAHA, as well as its scalability with the number of CPUs.

II. METHOD
PE is the process of obtaining the probability distribution of parameters, θ, for a model, m, which is believed to describe the observed data, d. Given d, PE generates an estimate for the posterior Probability Density Function (PDF), p(θ|d, m), through Bayes' theorem,

p(θ|d, m) ∝ L(d|θ, m) π(θ|m) . (1)

The likelihood, L(d|θ, m), is a function of the observed data and model parameters, and π(θ|m) is the prior probability for the model parameters. The posterior distribution can alternatively be described as a weighted prior, with weights given by

w(θ) = L(d|θ, m) / L_max , (2)

where L_max is the maximum likelihood¹. Typically, obtaining a closed-form expression for the posterior probability across the parameter space θ is not possible. This means that we are not able to trivially evaluate Eq. 1, even if a functional form for the prior distribution is given. It is therefore common to draw samples from the unknown posterior distribution through stochastic sampling techniques, such as Nested sampling [42] or Markov-Chain-Monte-Carlo [43]. For the case of Nested sampling, a series of contours of increasing likelihood converge, through an iterative process, to the region of high likelihood. Practically, a Nested sampling routine draws a series of live points and, at each cycle, it stores the live point with the lowest likelihood and replaces it with a new point drawn randomly from the prior; the new point is accepted through the Metropolis-Hastings algorithm [67]. A new contour that encases the current set of live points is generated, and eventually, the contours converge to the regions of the highest likelihood. The stored points, along with their weights, constitute the samples drawn from the posterior distribution.
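The weighted-prior view of Eq. 2 can be illustrated with a toy one-dimensional model (the model and all names below are illustrative, not part of VARAHA): prior samples weighted by w_i = L_i / L_max approximate the posterior, so the weighted mean recovers the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D problem: flat prior on [-10, 10], Gaussian likelihood centred at 2.
def log_likelihood(theta):
    return -0.5 * ((theta - 2.0) / 0.5) ** 2

theta = rng.uniform(-10.0, 10.0, size=200_000)   # samples from the prior
log_l = log_likelihood(theta)

# Eq. 2: weight each prior sample by its likelihood relative to the maximum.
w = np.exp(log_l - log_l.max())

# The weighted prior samples approximate the posterior (here: mean near 2).
posterior_mean = np.average(theta, weights=w)
```

Because the prior is flat, the weights alone carry all the posterior information; for a non-flat prior the same weights would be applied to samples drawn from that prior.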
A drawback of Nested sampling is that a few thousand live points are evolved in series, meaning that sampling can take thousands of cycles to complete. In addition, while the recovered posterior distribution becomes more accurate as the number of live points is increased, the number of cycles needed to sample from the unknown posterior distribution also increases (the number of cycles scales linearly with the number of live points [68]).
Dynamic nested sampling [69,70] enables the number of live points to change throughout the analysis. It has been shown that this can be optimised for PE analyses, allowing for a reduction in run time by a factor of ∼ 70 for relatively simple cases. This significant improvement is possible when the region of high probability is contained in a small region of the prior volume (as is typically the case in high-dimensional problems) [69].
In this paper, we introduce an alternate sampling technique that has a major advantage over Nested sampling: a significantly reduced wall and CPU time. Our sampler, VARAHA, achieves this by (a) drawing thousands of points from the relevant regions in the parameter space that contain the posterior probability mass and (b) intelligently defining a likelihood threshold below which the parameter space can be safely ignored. This approach means that, although VARAHA evaluates the likelihood a comparable number of times as Nested sampling, it computes the likelihood for a large number of points at once, meaning that the computation can be efficiently vectorised and parallelised over multiple CPUs for enhanced performance. This is in contrast to computing the likelihood for a relatively low number of points at once, as is done in Nested sampling. Since evaluating the likelihood is often the most computationally expensive element of PE, VARAHA is able to perform PE in a fraction of the wall and CPU time required by other conventional samplers.
VARAHA iteratively discards regions of the parameter space that do not contribute to the posterior distribution and restricts attention to the remaining regions of high likelihood through a series of cycles. Subsequent cycles draw points from within only the regions of high likelihood identified in the previous cycle. Once the final volume has been found, the points contained within the final volume are returned, along with weights given by Eq. 2. We pictorially show VARAHA's algorithm in the top row of panels in Fig. 1.
VARAHA defines each volume to contain likelihoods greater than L′. This volume is defined as

V(L′) = ∫ Θ[L(d|θ, m) − L′] dθ , (3)

and the posterior probability mass contained within the volume is

P(L′) = ∫ L(d|θ, m) Θ[L(d|θ, m) − L′] π(θ|m) dθ / ∫ L(d|θ, m) π(θ|m) dθ . (4)

Here Θ[x] is the Heaviside step function, Θ[x] = 1 for x ≥ 0 and 0 otherwise. As L′ → −∞, the probability tends to 1 and the volume becomes the full prior volume. However, by choosing L′ for which P(L′) is close to, but slightly smaller than, unity, the volume can be significantly smaller than the full prior volume. VARAHA defines L′ such that the corresponding volume V(L′) contains a probability P_thr. This likelihood value is hereafter referred to as the likelihood threshold. By identifying this region and sampling only from it, VARAHA can very efficiently generate samples.
The challenge, then, is to efficiently find the volume that contains a probability P_thr, referred to as the live volume. Since it is generally not possible to evaluate Eq. 4 analytically to find the appropriate likelihood threshold, other than for simple toy examples, VARAHA computes it numerically, and iteratively identifies both the appropriate threshold and the corresponding region of parameter space.
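The numerical approach can be sketched for a toy one-dimensional model: draw prior samples, rank them by likelihood, and read off the threshold whose enclosed posterior mass first reaches P_thr. Everything below is illustrative, not VARAHA's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 1-D model: flat prior on [-10, 10], Gaussian likelihood.
def likelihood(theta):
    return np.exp(-0.5 * theta**2)

theta = rng.uniform(-10.0, 10.0, size=500_000)
L = likelihood(theta)

# Sort samples by descending likelihood; the running weighted sum estimates
# the posterior mass P(L') enclosed above each candidate threshold.
order = np.argsort(L)[::-1]
mass = np.cumsum(L[order]) / L.sum()

# Smallest region (largest threshold) whose enclosed mass reaches P_thr.
P_thr = 0.999
L_thr = L[order][np.searchsorted(mass, P_thr)]

# Fraction of the prior volume kept as the live volume.
live_fraction = (L >= L_thr).mean()
```

For a standard Gaussian, 99.9% of the posterior mass lies within |θ| ≲ 3.3, so the live volume shrinks to roughly a third of the prior range even in this benign one-dimensional case.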
At the beginning of the first cycle, shown in the top left panel of Fig. 1, the live volume is simply the full prior volume. A large number of live points, N_live = N_pts, are randomly generated within the live volume (shown in green in the figure), and at each point, the likelihood is evaluated. We can use these points to perform a Monte-Carlo integral of Eq. 4 to obtain the required likelihood threshold, L_thr, that contains the desired P_thr. However, particularly for sharply peaked posteriors in large parameter spaces, it is likely that only a small number of points will contribute significantly to the Monte-Carlo integral. Therefore, in addition, we also calculate the likelihood threshold L_Nmin such that a minimum number of points, N_min, lie above the threshold. If L_thr < L_Nmin, we have successfully identified the region which contains P_thr. However, if L_Nmin < L_thr, we have not sampled the posterior distribution sufficiently densely. In this case, we exclude the regions of parameter space with likelihood below L_Nmin and repeat the process.
Selecting the N_min points with the largest likelihoods identifies a region of high likelihood in the parameter space, which is shown by the orange dots in the top row of panels in Fig. 1. The volume of this region is given by Eq. 3 and is numerically evaluated through the Monte-Carlo integration

V ≈ V_0 N_min / N_pts , (5)

where V_0 is the full prior volume. The uncertainty in the Monte-Carlo integration is directly related to the Poisson fluctuations in N_min:

δV = V / √N_min . (6)

The set of N_min points provides an estimate of the volume and discretely constitutes the region of the parameter space enclosed by this volume. With N_min chosen to be a few thousand, the error in the estimated volume is of the order of a few per cent. The procedure outlined above presents two situations:

i) The posterior mass enclosed by L_Nmin is larger than P_thr: Starting from the full prior volume, the live volume reduces to the volume enclosed by L_Nmin (bounded by the red contour in Fig. 1). The fluctuation in L_Nmin, due to sampling with a finite number of points, will have a negligible impact on the posterior mass contained by it. The largest fluctuation occurs for a uniform distribution and is equal to the Monte Carlo (MC) errors of a few per cent in the volume. However, in general, the likelihood distribution is expected to decay from one or several maxima.

ii) The posterior mass enclosed by L_Nmin is smaller than P_thr: This occurs when there is a large number of points with non-negligible weights, n > N_min, and a threshold smaller than L_Nmin is required to enclose the posterior mass P_thr. To estimate n, VARAHA calculates the empirical distribution function (EDF), F(L′), which simply gives the fraction of points with likelihood less than L′.² The final likelihood threshold is then calculated as L_thr := F⁻¹(P_thr).
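The Monte-Carlo volume estimate of Eq. 5 and its Poisson uncertainty can be sketched on a hypothetical two-dimensional toy problem, where the true volume of the retained region is known analytically:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 2-D problem on a [-2, 2]^2 prior (V0 = 16): the "likelihood" decreases
# with radius, so the N_min highest-likelihood points fill a small disc.
N_pts, N_min = 100_000, 2_000
V0 = 16.0

pts = rng.uniform(-2.0, 2.0, size=(N_pts, 2))
r = np.linalg.norm(pts, axis=1)
r_thr = np.sort(r)[N_min - 1]        # radius enclosing exactly N_min points

# Eq. 5: Monte-Carlo estimate of the volume above the likelihood threshold.
V = V0 * N_min / N_pts
# Poisson uncertainty from the N_min retained points: dV = V / sqrt(N_min).
dV = V / np.sqrt(N_min)

# The analytic area of the enclosing disc agrees within the MC uncertainty.
true_V = np.pi * r_thr**2
```

With N_min = 2,000 the fractional uncertainty is 1/√2000 ≈ 2 per cent, matching the few-per-cent figure quoted in the text.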
VARAHA evolves the first situation (i) using multiple cycles of MC integration until it reaches the second situation (ii). Each cycle performs the integration using live volumes, which themselves decrease as the cycles progress. A generalised form of Eq. 3 for this progression is given by

V(L′_k) = ∫_{V(L′_{k−1})} Θ[L(d|θ, m) − L′_k] dθ ≈ V(L′_{k−1}) N_live / N_pts . (7)

For the second and consecutive cycles, VARAHA samples from the live volume and ignores the remaining parameter space. Since we only store samples contained within the volume, and not the volume itself, the structure of the live volume is not known. VARAHA navigates this by a) generating a multi-dimensional grid covering the full parameter space, and b) selecting those hypercubes that contain at least one of the live points from the previous cycle. Once the relevant hypercubes have been identified, the live volume has been reconstructed. N_pts points are then uniformly scattered within the reconstructed live volume (the purple points in the top middle panel of Fig. 1). The likelihood of all the points is calculated, and only those with a likelihood above the threshold are kept (the green points in the top middle panel of Fig. 1). Since the same number of points is now scattered within a much smaller volume, VARAHA is able to increase the number of live points contained within the live volume, thus setting the stage for the next MC integration.
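The grid-based reconstruction can be sketched as follows: live points are binned onto a grid over the full parameter space, occupied hypercubes are retained, and fresh points are drawn uniformly from those hypercubes. This is an illustrative two-dimensional toy, not VARAHA's implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Live points from a previous cycle: uniform over the unit disc, embedded
# in a [-2, 2]^2 prior.
n_live = 5_000
ang = rng.uniform(0.0, 2.0 * np.pi, n_live)
rad = np.sqrt(rng.uniform(0.0, 1.0, n_live))
live = np.column_stack([rad * np.cos(ang), rad * np.sin(ang)])

# a) Grid the full prior; b) keep hypercubes holding at least one live point.
n_bins = 20
width = 4.0 / n_bins
occupied = {tuple(c) for c in np.floor((live + 2.0) / width).astype(int).tolist()}
cells = np.array(sorted(occupied))

# Scatter fresh points uniformly over the reconstructed live volume: pick an
# occupied cell at random, then a uniform offset inside it.
pick = cells[rng.integers(len(cells), size=10_000)]
new_pts = -2.0 + (pick + rng.uniform(0.0, 1.0, size=(10_000, 2))) * width

# Every new point lies within one cell diagonal of the true live volume.
inside = (np.linalg.norm(new_pts, axis=1) <= 1.0 + width * np.sqrt(2)).mean()
```

Because only occupied cells are sampled, the new points land in or immediately around the true live volume without the sampler ever storing its boundary.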
It is important to choose an appropriate spacing for the multi-dimensional grid. If the grid spacing is too large, the reconstructed live volume is much larger than the value estimated in Eq. 5. If the grid spacing is too small, the reconstructed live volume will not include relevant regions of the parameter space that did not get sampled due to random fluctuations in the location of points. VARAHA constructs the multi-dimensional grid over the full parameter space, requiring that the volume of each hypercube in the grid is equal to the error in the estimated volume, δV. We motivate this choice as follows: the chosen hypercube volume is √N_min times larger than the average volume, V/N_min, approximately occupied by each live point. This choice ensures that the volume will have a maximum uncertainty of δV if this uncertainty arises due to a Poisson fluctuation near a single live point. In this case, the multi-dimensional grid is expected to enclose most of V, even though it is constructed using information gained from the MC volume V. A uniform grid spacing in all dimensions therefore leads to the number of bins per dimension

N_bins = (V_0 / δV)^(1/N_dim) ,

where N_dim is the dimensionality of the parameter space.³

The iterative process described above gradually increases the value of the likelihood threshold and discards the uninteresting regions of the parameter space. However, a very small amount of posterior mass is also lost in the process. A new likelihood threshold approximately increases the discarded posterior mass by an amount

ΔP ≈ Σ_j w_j Θ[L_Nmin − L_j] / Σ_j w_j , (8)

where j identifies the samples that are enclosed by the live volume in the current cycle, and L_Nmin is the new likelihood threshold calculated for the next cycle. This corresponds to the posterior mass contained between two concentric circles in Fig. 1. In practice, when deciding if L_Nmin > L_thr by calculating the EDF, we require the discarded posterior mass not to accumulate to more than 1 − P_thr. This implicitly ensures that the posterior mass enclosed by the live volume is at least P_thr. VARAHA does not evolve the likelihood threshold any further once the discarded posterior mass becomes very close to 1 − P_thr.
Even though the likelihood threshold no longer evolves, the number of bins used to create the multi-dimensional grid continues to increase, as the number of samples inside the live volume also increases. Cycles are terminated once the desired accuracy is obtained. The stopping condition could be determined based on a fixed number of weighted samples or a fixed number of cycles. At the end of the analysis, VARAHA returns a set of weighted samples, where the weight of each sample is determined by Eq. 2. All the samples have a likelihood value greater than the final value of L_thr. Samplers that employ MCMC methods typically return a set of unweighted samples, with sample weights equal to 1. It is possible to generate such samples from the weighted samples by performing rejection sampling on the weighted samples (see Appendix A for details). The final set of unweighted samples has a sample size that is approximately Σ_i w_i [72]. However, since n_eff ≥ Σ_i w_i [73], with equality only if all the weights are equal to 1, the rejection sampling process leads to a reduction in the information contained in the samples.
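The relationship between weighted samples, rejection sampling, and the effective sample size can be illustrated with a toy weighted set (the Kish estimate of the effective sample size is used; all names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# A toy weighted posterior: standard-normal prior samples with weights
# w_i = L_i / L_max in (0, 1], mimicking the sampler's output.
theta = rng.normal(size=50_000)
w = np.exp(-0.5 * theta**2)

# Kish effective sample size of the weighted set.
n_eff = w.sum() ** 2 / (w**2).sum()

# Rejection sampling: keep sample i with probability w_i. The resulting
# unweighted set has roughly sum(w) members, which never exceeds n_eff.
keep = rng.uniform(size=w.size) < w
unweighted = theta[keep]
```

Since every w_i ≤ 1, Σw_i² ≤ Σw_i and hence n_eff ≥ Σw_i: rejection sampling always returns fewer samples than the effective size of the weighted set, which is the information loss described above.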

A. Implementation
VARAHA implements this algorithm as follows:

1. Sprinkle points within the multi-dimensional grid: (a) N_pts points are uniformly drawn from the reconstructed live volume and the likelihood for all points is calculated. (b) Live points with a likelihood larger than the threshold from the previous cycle are then identified. If this is the first cycle, N_pts points are uniformly distributed within the entire prior volume and all points are kept, meaning that N_pts = N_live (this is equivalent to setting the likelihood threshold from the previous cycle to negative infinity).

2. Calculate the likelihood threshold: The likelihood that accumulates no more than 1 − P_thr of the truncated posterior mass up to the current cycle, L_thr, is calculated using the N_live live points that are enclosed by the live volume. A second threshold, L_Nmin, which ensures that N_min points lie above the threshold, is also calculated. The final likelihood threshold is chosen to be the minimum of these two values.

3. Calculate the volume of the live volume: The volume of, and uncertainty in, the live volume are calculated through MC integration; see Eq. 5.
4. Calculate the effective sample size: All points from the current or previous cycles that cross the current likelihood threshold are stored as weighted samples. The number of effective samples is calculated using

n_eff = (Σ_i w_i)² / Σ_i w_i² , (9)

where w_i is the weight of each sample, defined in Eq. 2 [74].

5. Reconstruct the live volume: A multi-dimensional grid is created that spans the whole parameter space, and hypercubes that register at least one live point are kept. The resulting hypercubes reconstruct the live volume.
6. Repeat: Steps 1 to 5 are repeated until a stopping criterion is reached. Example stopping criteria include terminating once a specific number of effective samples has been obtained, or terminating after a specific number of cycles has elapsed.
The final output is then a set of weighted samples, where the weights are given by the likelihood evaluated at the sample, normalised by the maximum likelihood; see Eq. 2.
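The implementation steps above can be sketched end-to-end on a toy two-dimensional Gaussian likelihood. This is a simplified illustration under stated assumptions (a fixed grid resolution rather than the adaptive rule, a per-cycle EDF threshold, and no accumulated-discard bookkeeping), not the production algorithm:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy 2-D Gaussian likelihood (sigma = 0.5) on a flat prior over [-10, 10]^2.
def log_l(x):
    return -0.5 * np.sum((x / 0.5) ** 2, axis=-1)

N_pts, N_min, P_thr = 20_000, 1_000, 0.999
lo, hi, n_bins = -10.0, 10.0, 50
width = (hi - lo) / n_bins

cells, log_thr = None, -np.inf
stored_x, stored_ll = [], []

for cycle in range(4):
    # Step 1: sprinkle N_pts points inside the reconstructed live volume.
    if cells is None:                       # first cycle: full prior volume
        pts = rng.uniform(lo, hi, size=(N_pts, 2))
    else:
        pick = cells[rng.integers(len(cells), size=N_pts)]
        pts = lo + (pick + rng.uniform(0.0, 1.0, size=(N_pts, 2))) * width
    ll = log_l(pts)
    live_x, live_ll = pts[ll > log_thr], ll[ll > log_thr]

    # Step 2: threshold = min(EDF threshold enclosing P_thr, L_Nmin).
    order = np.argsort(live_ll)
    w = np.exp(live_ll[order] - live_ll[order][-1])
    edf = np.cumsum(w) / w.sum()
    log_edf_thr = live_ll[order][np.searchsorted(edf, 1.0 - P_thr)]
    log_nmin_thr = live_ll[order][-N_min]
    log_thr = min(log_edf_thr, log_nmin_thr)

    # Step 4 (simplified): store every point above the current threshold.
    keep = live_ll >= log_thr
    stored_x.append(live_x[keep])
    stored_ll.append(live_ll[keep])

    # Step 5: occupied hypercubes reconstruct the live volume.
    idx = np.floor((live_x[keep] - lo) / width).astype(int)
    cells = np.unique(idx, axis=0)

# Final output: weighted samples above the final likelihood threshold.
x = np.vstack(stored_x)
ll_all = np.concatenate(stored_ll)
x = x[ll_all >= log_thr]
w = np.exp(ll_all[ll_all >= log_thr] - ll_all.max())
```

In this sketch the threshold is driven by L_Nmin in the first cycle and settles at the EDF value thereafter, so the weighted samples recover the Gaussian's zero mean and 0.5 width.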

B. Sampler settings
The number of samples drawn from the uniform distribution, N_pts, the posterior mass required to be contained within the live volume, P_thr, and the minimum number of points to retain, N_min, are the sampler parameters that the user is free to specify. In our testing, we found that the choice of N_pts has only a small effect on the recovered posterior probability. Although reducing N_pts decreases the number of computations in each cycle, it increases the number of cycles needed to obtain the required effective number of samples. We find that choosing N_pts in the range 100,000-1,000,000 provides a good compromise between these. Similarly, N_min sets the minimum fractional error on the estimated MC volume. In our testing, we found that as the dimensionality of the problem increases, N_min should increase correspondingly. Depending on the complexity, N_min within the range of 1,000-10,000 is adequate for most distributions with dimensionality between 2 and 8.
As VARAHA only keeps points that cross a specific likelihood threshold, defined through P_thr, we deliberately discard part of the parameter space with low likelihood during each cycle. This can have an impact on the recovered posterior distribution if P_thr is too small. In contrast, if the threshold is chosen to be too large, then we exclude only a limited region of parameter space, which leads to a significant increase in analysis time for limited benefit. In our testing, we found that P_thr = 0.999 is sufficient for most cases.

C. Comparison with Nested Sampling
In Fig. 1, we compare the convergence of VARAHA (top panels) and a Nested sampling algorithm (Dynesty [70], bottom panels) for a simple two-dimensional Gaussian likelihood distribution. We used the static version of Dynesty with default settings, but increased the number of live points to 1,000, as this is similar to the number of live points used for gravitational-wave analyses [see e.g. 39]. For this example, VARAHA used N_pts = 20,000, N_min = 1,000 and P_thr = 0.999.
As expected, VARAHA rapidly converges to the region of high likelihood (within 3 cycles), while Dynesty requires significantly more cycles to constrain to a similar region of the parameter space (∼ 4,200 cycles). The significant reduction in the number of cycles of VARAHA, compared to Dynesty, is primarily caused by the likelihood threshold: we see that in VARAHA's first cycle, we constrain the high-likelihood region to within a circle of radius 5, while Dynesty takes ∼ 3,000 cycles to constrain to a comparable region of the parameter space. VARAHA further constrains the high-likelihood region to within a circle of radius 2.5 within 3 cycles, while Dynesty takes ∼ 4,200 cycles to obtain a similar constraint. This culminates in a significantly reduced wall and CPU time. VARAHA also obtains a larger number of effective samples, with n_eff ∼ 6,000 compared to n_eff ∼ 4,000 for Dynesty.

D. The evolution of the multi-dimensional grid
One of the key features of VARAHA is that it is able to draw points from within the live volume without having to know its structure. It achieves this by generating a multi-dimensional grid that covers the full parameter space and registering hypercubes that contain at least one point with a likelihood above the previous likelihood threshold. To demonstrate this in practice, we analyse a complex two-dimensional distribution and explicitly show how the multi-dimensional grid is constructed and how it converges to the region of high likelihood. The chosen distribution has multiple disconnected regions of equal likelihood. While this example is only two-dimensional, the disconnected likelihood peaks can prove challenging to identify. Since this distribution is analytically known, it provides a good illustration of VARAHA. For this example, we used N_pts = 20,000, N_min = 1,000 and we terminated VARAHA once 5 cycles had elapsed.
FIG. 2. Plot showing the evolution of the multi-dimensional grid that surrounds the live volume when VARAHA samples from a complex two-dimensional distribution with a constant density. The purple dots show the points randomly drawn from within the multi-dimensional grid, and the green dots show the points that have a likelihood larger than the likelihood threshold from the previous cycle (Step 1). The orange dots show the points with a likelihood larger than the likelihood threshold calculated at the current cycle (Step 2), and the black lines show the multi-dimensional grid that surrounds the live volume (Step 5). The left, middle and right panels correspond to the 1st, 3rd and 5th cycles respectively. In the right panel, the red line shows the contour of fixed likelihood equal to the likelihood threshold, representing the final live volume.
In Fig. 2 we show the evolution of the multi-dimensional grid. We see that in the first cycle, VARAHA is able to identify the rough location of the high-likelihood region and construct a coarse multi-dimensional grid that surrounds the entire volume. As VARAHA progresses, we see that the number of hypercubes increases and the multi-dimensional grid converges to the complex two-dimensional distribution, with gaps appearing between adjacent hypercubes where there is little to no probability support. Unlike in Fig. 1, we do not see any green dots beyond the first cycle. This is because VARAHA rapidly identifies the high-likelihood region (owing to the constant density throughout) in the first cycle and maintains a constant likelihood threshold for all subsequent cycles. This implies that all points that cross the previously defined likelihood threshold (green dots) are used as live points for the current cycle (orange dots).

E. Example: Bimodal multi-variate Gaussian distribution
Next, we showcase the full VARAHA sampling algorithm. We chose to analyse a six-dimensional bimodal multi-variate Gaussian distribution and compare results with existing Nested and Markov-Chain-Monte-Carlo samplers. We also explicitly show how the likelihood threshold evolves as the number of cycles increases. For this example, we terminated VARAHA once 20,000 effective samples were collected. During our sampling we used P_thr = 0.999, N_pts = 400,000 and N_min = 1,000.
We analysed an asymmetric multivariate Gaussian distribution with each mode's mean and covariance randomly chosen: the means of the first and second modes are randomly chosen between [−1, 1] and [−5, −3] respectively, and we use a covariance matrix that is obtained by applying an inverse Wishart distribution [75] to a diagonal matrix with elements randomly chosen between 1 and 2.
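A stand-in for such a test distribution can be built directly. Note that the random covariances below come from A Aᵀ + 0.5 I rather than an inverse-Wishart draw, so this is an analogous toy, not the exact distribution used in the paper:

```python
import numpy as np

rng = np.random.default_rng(7)
dim = 6

# Two randomly placed 6-D Gaussian modes with random SPD covariances.
means = [rng.uniform(-1.0, 1.0, dim), rng.uniform(-5.0, -3.0, dim)]
covs = [rng.normal(size=(dim, dim)) * 0.3 for _ in range(2)]
covs = [A @ A.T + 0.5 * np.eye(dim) for A in covs]

inv_covs = [np.linalg.inv(c) for c in covs]
log_norms = [-0.5 * np.linalg.slogdet(2.0 * np.pi * c)[1] for c in covs]

def log_likelihood(x):
    """Log-density of an equal-weight two-mode Gaussian mixture, x: (n, 6)."""
    parts = [
        ln - 0.5 * np.einsum("ni,ij,nj->n", x - mu, ic, x - mu)
        for mu, ic, ln in zip(means, inv_covs, log_norms)
    ]
    return np.logaddexp(parts[0], parts[1]) - np.log(2.0)
```

Because the modes are well separated relative to their covariance scales, the density between them is many orders of magnitude below either peak, which is what makes the bimodality a meaningful stress test for a sampler.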
The output of VARAHA is shown in Table I. The prior volume spans from −20 to 20 in each dimension. As VARAHA converges to the region of high likelihood, the number of bins steadily increases over the cycles, reflecting the decreased uncertainty in the recovered volume. As expected, the likelihood threshold monotonically increases from −170 to −13.6 as the number of cycles increases. The likelihood threshold for the first three cycles is set by L_Nmin, with subsequent cycles using L_thr. For the fourth cycle and beyond, the likelihood threshold remains fixed, which reflects the fact that the volume enclosing P_thr of the posterior mass has been found. The number of effective samples is relatively low in the first few cycles, since the majority have L_i ≪ L_max, but increases steadily as the likelihood threshold increases. VARAHA obtains just over 23,000 effective samples from a total of approximately 800,000 weighted samples. Of course, rejection sampling could be used to obtain a set of unweighted samples; however, this would result in a smaller final sample size. We see that a larger number of cycles is needed than in the previous example, which is a result of the higher dimensionality of the likelihood surface.

FIG. 3. Corner plot [76] showing the posterior PDF obtained by VARAHA and the true bimodal multi-variate Gaussian distribution across four of the six dimensions. The 2-dimensional contours show the 1.5σ and 2.5σ confidence levels, and the individual histograms on the leading diagonal show the data marginalised to a particular dimension. VARAHA completes sampling in 30 seconds on a single CPU thread.
In Figure 3, we plot the posterior distribution obtained with VARAHA and the known analytic distribution. We see that VARAHA recovers the true distribution to high accuracy, with the mean and width of each mode correctly identified. For comparison, we also analysed the same analytic distribution with two external samplers: dynesty [70], a Nested sampler, and Bilby MCMC [41], a Markov-Chain-Monte-Carlo sampler, both operated through the bilby infrastructure [38]. In our testing, we found that VARAHA finished sampling in 30 seconds on a single CPU: at least 60× faster than either dynesty or Bilby MCMC. Although the runtimes for both dynesty and Bilby MCMC can vary significantly depending on the chosen settings, it is unlikely that either sampler, using a single CPU, could obtain in 30 seconds a comparable number of posterior samples that accurately recover the mean and width of each mode.

III. APPLICATION TO PARAMETER ESTIMATION OF GRAVITATIONAL WAVES
In this section, we demonstrate the application of VARAHA to the PE of gravitational-wave signals. We focus only on gravitational waves originating from compact binary mergers and compare results to those obtained with LALInference, the software that has regularly been used since the first gravitational-wave detection in 2015 [17] (see also [36-40, 44, 45, 50, 51, 56, 77, 78]). For this article, we restrict attention to quasi-circular binaries with spins aligned with the orbital angular momentum, referred to as aligned-spin binaries, meaning that the binary does not precess [30]. In addition, we focus only on the leading (2,2) harmonic of the waveform, which is typically the most significant contribution to the observed GW signal [79]. In this case, the binary parameters can be cleanly decomposed into intrinsic parameters, which determine the properties of the binary, and extrinsic parameters, which determine the location and orientation of the binary relative to the Earth. Throughout this work, we use the IMRPhenomD [80,81] gravitational-wave model to evaluate the likelihood, since it is optimised for aligned-spin binary systems that contain only the leading (2,2) harmonic (see also [82-85]).
For an aligned-spin model, the intrinsic parameters primarily affect the amplitude and phase evolution of the gravitational wave, while the extrinsic parameters only affect the overall amplitude, phase and time of arrival of the signal at each detector. Table II lists the extrinsic and intrinsic parameters of the system estimated by VARAHA. VARAHA allows separable sampling of the extrinsic and intrinsic parameters, breaking one large-dimensional problem into two smaller ones. We note that other analyses, e.g. rapid PE/RIFT [44,45], employ a similar methodology to VARAHA. This framework is extensible to include higher harmonics in the gravitational-wave signal and additional signal parameters, such as in-plane spins, which lead to additional physical effects in the waveform and can reduce biases in the recovered PE [4, 86-92]. Doing so will require accounting for the morphological dependence of the signal on the extrinsic parameters; this extension will be investigated in future work.

TABLE II. GW signal parameters sampled by VARAHA. We group the extrinsic parameters, except coalescence time, into one variable Ω (top section), and all the intrinsic parameters into θ (bottom section): the chirp mass M, the mass ratio q (defined to be less than 1), the first aligned-spin component χ1 and the second aligned-spin component χ2. The component masses m1 and m2 are characterised by the mass ratio q = m2/m1 and chirp mass $\mathcal{M} = (m_1 m_2)^{3/5}/(m_1 + m_2)^{1/5}$. The masses are measured in the frame of the detector from the signal, which has already suffered cosmological redshift.
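The mapping between the component masses and the (chirp mass, mass ratio) parametrisation used in Table II is exactly invertible. A minimal sketch in Python (the helper names are ours, not part of VARAHA's API):

```python
def to_mchirp_q(m1, m2):
    """Convert component masses to chirp mass M = (m1 m2)^(3/5)/(m1+m2)^(1/5)
    and mass ratio q = m2/m1 with the convention q <= 1."""
    m1, m2 = max(m1, m2), min(m1, m2)          # enforce m1 >= m2
    mchirp = (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2
    return mchirp, m2 / m1

def to_component_masses(mchirp, q):
    """Invert the mapping using M = eta^(3/5) * M_total."""
    eta = q / (1.0 + q) ** 2                   # symmetric mass ratio
    mtotal = mchirp / eta ** 0.6
    m1 = mtotal / (1.0 + q)
    return m1, m1 * q

mc, q = to_mchirp_q(14.2, 7.5)                 # GW151226-like component masses
m1, m2 = to_component_masses(mc, q)            # round-trips to (14.2, 7.5)
```

The round trip is exact up to floating-point error, which is why sampling in (M, q) and converting back to component masses loses no information.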

A. Factorising the likelihood
For a network of gravitational-wave detectors (e.g. [93-95]), the probability that the observed detector data contains a gravitational-wave signal from a coalescing binary with parameters (t_c, Ω, θ) is given by Bayes' formula as

$$p(t_c, \Omega, \theta \,|\, d) = \frac{\mathcal{L}(d \,|\, t_c, \Omega, \theta)\, \pi(t_c, \Omega, \theta)}{Z(d)}, \qquad (10)$$

where d ≡ {d_1, d_2, ...} represents the strain data observed in each gravitational-wave detector, t_c is the coalescence time in the reference detector, Ω denotes the remaining extrinsic parameters, and θ are the intrinsic parameters. Under the assumption that the data in each detector are independent, stationary, Gaussian and contain a gravitational-wave signal, the likelihood is Gaussian,

$$\mathcal{L}(d \,|\, t_c, \Omega, \theta) = c \prod_i \exp\left(-\tfrac{1}{2}\, \langle d_i - h_i \,|\, d_i - h_i \rangle\right), \qquad (11)$$

where c = (2π)^{−k/2} for a k-dimensional distribution and h_i ≡ h_i(t_c, Ω, θ) is the expected GW signal in the i-th detector. The term Z(d) is the marginal likelihood (or evidence),

$$Z(d) = \int \mathcal{L}(d \,|\, t_c, \Omega, \theta)\, \pi(t_c, \Omega, \theta)\, \mathrm{d}t_c\, \mathrm{d}\Omega\, \mathrm{d}\theta.$$

In Eq. 11, the complex noise-weighted inner product ⟨a(t) | b(t)⟩ between two time-domain functions is defined in the frequency domain as

$$\langle a \,|\, b \rangle = 4 \int_0^\infty \frac{\tilde{a}^*(f)\, \tilde{b}(f)}{S(f)}\, \mathrm{d}f, \qquad (13)$$

where S(f) is the power spectrum of the detector noise [33], $\tilde{a}(f)$ and $\tilde{b}(f)$ are the Fourier transforms of a(t) and b(t) respectively, and the star represents complex conjugation. Eq. 13 can be evaluated at an arbitrary time shift by using the convolution theorem and taking the inverse Fourier transform [96],

$$\langle a \,|\, b \rangle(t) = 4 \int_0^\infty \frac{\tilde{a}^*(f)\, \tilde{b}(f)}{S(f)}\, e^{2\pi i f t}\, \mathrm{d}f.$$

Consequently, the likelihood can be evaluated, for a fixed set of intrinsic parameters, from a single inner product calculation.
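The time-shifted inner product is what allows a single filtering operation to cover all arrival times. A self-contained NumPy sketch of this FFT evaluation, under our own discretisation conventions (a one-sided PSD sampled at the rfft frequencies; this is an illustration, not VARAHA's implementation):

```python
import numpy as np

def inner_product_series(a, b, psd, delta_t):
    """Complex noise-weighted inner product <a|b>(t) for all time shifts,
    evaluated via the convolution theorem.

    a, b : real time series of equal length n
    psd  : one-sided noise power spectrum at the rfft frequencies (n//2 + 1)
    """
    n = len(a)
    delta_f = 1.0 / (n * delta_t)
    af = np.fft.rfft(a) * delta_t            # discrete approximation to the FT
    bf = np.fft.rfft(b) * delta_t
    integrand = np.conj(af) * bf / psd
    spectrum = np.zeros(n, dtype=complex)    # one-sided spectrum, zero-padded
    spectrum[: n // 2 + 1] = integrand
    # ifft supplies e^{2 pi i f t} and a 1/n factor; multiplying by n and by
    # 4*delta_f turns the sum back into 4 * int df integrand * e^{2 pi i f t}
    return 4.0 * delta_f * np.fft.ifft(spectrum) * n

# white-noise sanity check: <a|a>(t) peaks at zero lag, where it is real
rng = np.random.default_rng(0)
a = rng.standard_normal(4096)
z = inner_product_series(a, a, np.ones(2049), delta_t=1.0 / 2048)
```

One call produces the inner product at every time shift, which is the operation the likelihood factorisation below exploits.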
Returning to Eq. 11, the first term, ⟨d_i | d_i⟩, is independent of the parameters Ω and θ and is therefore constant. Since it will be absorbed into the normalisation of Eq. 10, we neglect it in what follows. The final term is the inner product of the expected signal in the i-th detector with itself, ⟨h_i | h_i⟩. If h(θ) is the signal arriving from a binary with intrinsic parameters θ at a unit distance, from overhead the detector and with the orbital plane facing the observer, the relative amplitude of a signal from a given distance d_L, sky location and orientation is determined by the effective distance, given by

$$D^i_{\rm eff} = d_L \left[ \left(F^i_+\, \frac{1 + \cos^2\iota}{2}\right)^2 + \left(F^i_\times \cos\iota\right)^2 \right]^{-1/2}, \qquad (15)$$

where the detector response functions F^i_{+,×} depend upon the sky location, polarisation and time of arrival of the source [97]. Thus, once $\sigma_i^2(\theta) \equiv \langle h(\theta) \,|\, h(\theta) \rangle$ has been calculated, it is straightforward to evaluate ⟨h_i | h_i⟩ as

$$\langle h_i \,|\, h_i \rangle = \frac{\sigma_i^2(\theta)}{(D^i_{\rm eff})^2}.$$

Finally, we turn to the two middle terms in Eq. 11, which constitute the inner product of the data d_i with the expected signal. We have already seen that the variation of the signal amplitude can be encoded in the effective distance D^i_eff. Similarly, the phase of the signal observed in detector i, φ^i_Ω, is a known function of the coalescence phase, inclination and antenna responses [98], and the time of arrival is t^i = t_c + Δt_i, where Δt_i depends upon the location of the source relative to the detectors. Since the amplitude and phase evolution of the signal is unchanged by the extrinsic parameters, we can calculate the inner product for any set of extrinsic parameters by rescaling and time-shifting the inner product time series for the reference waveform h(θ). Thus, the inner product of the data d_i with the expected signal is given by

$$\langle d_i \,|\, h_i \rangle = \frac{\sigma_i(\theta)}{D^i_{\rm eff}}\, e^{-i\phi^i_\Omega}\, \rho_i(\theta, t^i), \qquad (21)$$

where we have defined the Signal-to-Noise Ratio (SNR) for the template with intrinsic parameters θ as

$$\rho_i(\theta, t) = \frac{\langle d_i \,|\, h(\theta, t) \rangle}{\sigma_i(\theta)}.$$

The third term in Eq. 11 is simply the complex conjugate of Eq. 21.
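The effective distance of Eq. 15 is cheap to evaluate, which is what makes rescaling amplitudes so much faster than regenerating waveforms. A sketch with the two polarisation amplitude factors written out explicitly (the function name is ours):

```python
import numpy as np

def effective_distance(d_l, f_plus, f_cross, cos_iota):
    """Effective distance: always >= d_L, with equality only for a face-on
    source (cos_iota = +-1) directly overhead (F+^2 + Fx^2 = 1)."""
    a_plus = 0.5 * (1.0 + cos_iota ** 2)     # plus-polarisation amplitude
    a_cross = cos_iota                       # cross-polarisation amplitude
    return d_l / np.sqrt((f_plus * a_plus) ** 2 + (f_cross * a_cross) ** 2)

# face-on, overhead source: effective distance equals luminosity distance
d_eff_best = effective_distance(100.0, 1.0, 0.0, 1.0)
# edge-on source: the same binary appears much more distant
d_eff_edge = effective_distance(100.0, 1.0, 0.0, 0.0)
```

The edge-on case illustrates why the inferred distance posterior can extend well beyond the luminosity distance of the best-fitting face-on configuration.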
Combining these expressions, the resulting log-likelihood from Equation 11 assumes the form

$$\ln \mathcal{L} = \sum_i \left[ \frac{\sigma_i(\theta)}{D^i_{\rm eff}}\, \left|\rho_i(\theta, t^i_c)\right| \cos\Delta\phi_i - \frac{\sigma_i^2(\theta)}{2\, (D^i_{\rm eff})^2} \right], \qquad (23)$$

where Δφ_i is the difference between the measured phase in detector i and the expected phase, given the parameters of the signal:

$$\Delta\phi_i = \phi^i_\Omega - \arg \rho_i(\theta, t^i_c).$$

The log-likelihood is maximised for a given set of intrinsic parameters when $D^i_{\rm eff} = \sigma_i(\theta)/|\rho_i(\theta, t^i_c)|$ and the cosine term is unity. However, as the cosine term depends solely on the phase acquired through the extrinsic parameters, and ρ_i(t^i_c) depends on the arrival times in the different detectors of the network, this maximisation imposes a time-phase constraint [99].
The expression in Eq. 23 clearly separates the likelihood's dependence on the intrinsic parameters θ from the extrinsic parameters Ω and the coalescence time t_c. In particular, the effective distance D^i_eff, phase φ^i_Ω and time of arrival t^i in each detector depend only upon the extrinsic parameters Ω and the coalescence time t_c. The SNR time series associated with the fiducial waveform h(θ), and its overall normalisation, depend only on the intrinsic parameters θ. However, the specific time at which to evaluate the SNR does depend upon the extrinsic parameters through Δt_i. In the following sections, we make repeated use of this splitting of the likelihood to independently estimate the intrinsic and extrinsic parameters. In particular, the most time-consuming step is the generation of the simulated waveform and the evaluation of the SNR time series. By writing the likelihood in the form of Eq. 23, it becomes clear that the entire extrinsic parameter space, for a fixed set of intrinsic parameters, can be explored with a single evaluation of the SNR time series.
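For illustration, the per-detector terms of the factorised log-likelihood can be evaluated for a large batch of extrinsic draws in one vectorised call, given |ρ| and arg ρ interpolated at each draw's arrival time. The helper below is a hypothetical sketch of that pattern, not VARAHA's implementation; all argument names are ours:

```python
import numpy as np

def log_likelihood_extrinsic(rho_abs, rho_phase, sigma, d_eff, phi_omega):
    """Single-detector log-likelihood terms for many extrinsic draws at once.

    rho_abs, rho_phase : |rho| and arg(rho) at each draw's arrival time
    sigma              : scalar template norm sigma(theta)
    d_eff, phi_omega   : per-draw effective distance and expected phase
    """
    amp = sigma / d_eff                      # per-draw amplitude rescaling
    dphi = phi_omega - rho_phase             # phase mismatch per draw
    return amp * rho_abs * np.cos(dphi) - 0.5 * amp ** 2

# one million extrinsic draws evaluated with no Python-level loop
n = 1_000_000
rng = np.random.default_rng(1)
logl = log_likelihood_extrinsic(
    rho_abs=10.0 + 0.1 * rng.standard_normal(n),
    rho_phase=rng.uniform(0, 2 * np.pi, n),
    sigma=1.0,
    d_eff=rng.uniform(100, 400, n),
    phi_omega=rng.uniform(0, 2 * np.pi, n),
)
```

Summing such arrays over detectors gives the network log-likelihood for every draw; no waveform generation is needed inside the loop over extrinsic samples.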

B. Extrinsic Parameters
VARAHA starts by fixing the intrinsic parameters θ to a reference waveform and sampling the posterior distribution for the seven extrinsic parameters, Ω and t_c. BAYESTAR [35], a rapid, non-Markovian sky-localisation algorithm commonly used by the LIGO, Virgo and KAGRA collaborations (see e.g. [66]), also samples the extrinsic parameters by fixing the intrinsic parameters to a reference waveform. In this subsection, we explain how VARAHA differs from BAYESTAR and demonstrate that it is able to compete with BAYESTAR's performance.
In the initial evaluation, it is natural to use the values for the intrinsic parameters, θ_o, reported by the search analysis that identified the signal [100-103]. The detector with the largest value of ρ_i(θ_o) in the network is chosen as the reference detector. We then perform PE over the seven-dimensional extrinsic parameter space using the method described in Section II. The first step involves scattering millions of points across the extrinsic parameter space. Since five of the dimensions are angles, it is natural to cover the full range of possible values: $0 \leq \alpha, \psi, \phi_c < 2\pi$, $-\pi/2 \leq \delta \leq \pi/2$ and $0 \leq \iota \leq \pi$. The initial choice of bounds for the coalescence time and luminosity distance requires more care: we wish to determine the narrowest ranges of t_c and d_L that still enclose the posterior mass.
If the initial choice is too narrow, we risk missing part of the relevant parameter space; if it is too broad, we unnecessarily explore uninteresting regions of the parameter space, increasing the analysis time.
To fix the range of coalescence time, we restrict attention to the reference detector (the one with the largest SNR). We assume that the extrinsic parameters are chosen to maximise the likelihood contribution from the reference detector, i.e. that $D^i_{\rm eff} = \sigma_i(\theta_o)/|\rho_i(\theta_o, t_c)|$ and that the phase difference Δφ_i = 0. In that case, the likelihood contribution from the reference detector is $\tfrac{1}{2}|\rho_i(\theta_o, t_c)|^2$, which is approximately normally distributed in t_c. Furthermore, from the GW search result, we know the time t_o which gives the maximum SNR, ρ_i(θ_o, t_o). As we vary the coalescence time t^i_c in the reference detector, the observed SNR will be reduced and, consequently, the maximum contribution of the reference detector to the likelihood will be reduced. The initial range of coalescence times is chosen so that the boundary is at least 4-σ, i.e. we require

$$\tfrac{1}{2} |\rho_i(\theta_o, t_c)|^2 \geq \tfrac{1}{2} |\rho_i(\theta_o, t_o)|^2 - 8. \qquad (26)$$

So far we have restricted attention to the reference detector. Let us assume that there exists a set of extrinsic parameters which is a good fit to the data in all detectors, i.e. $D^i_{\rm eff} \approx \sigma_i(\theta_o)/|\rho_i(\theta_o, t_o)|$ and Δφ_i ≈ 0. Then, as we vary the coalescence time t_o, at best we will find a set of extrinsic parameters that matches the data in all detectors other than the reference, while in the reference detector the time is offset from the observed peak. Thus, the loss in likelihood in the reference detector gives an (approximate) lower limit on the loss in the network likelihood.
We restrict the initial range of coalescence times to those that satisfy Eq. 26. Figure 4 plots an example of the bounds on ρ(t_c), estimated from LIGO Hanford data for the signal GW151226 [65]. The discrete values of ρ(t_c) are obtained by taking the discrete inverse Fourier transform of the frequency-domain inner product between the data and the waveform, sampled at 2048 Hz. A smooth function is obtained by interpolating with a cubic spline.
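The spline-based bound on t_c can be sketched as follows, using a synthetic Gaussian-shaped SNR peak in place of real matched-filter output. The drop of 8 in ½|ρ|² corresponds to a 4-σ boundary for a locally Gaussian peak; this is our reading of the criterion, and the synthetic data are purely illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Discrete |rho(t_c)| samples at 2048 Hz around the peak; a Gaussian-shaped
# SNR peak stands in for the inverse-FFT matched-filter output.
t = np.arange(-0.05, 0.05, 1.0 / 2048)
rho = 12.0 * np.exp(-0.5 * (t / 0.004) ** 2)

spline = CubicSpline(t, rho)                 # smooth interpolant, as in the text
t_fine = np.linspace(t[0], t[-1], 100_000)
rho_fine = spline(t_fine)

# keep times whose log-likelihood |rho|^2 / 2 is within 8 of the peak value
peak = rho_fine.max()
keep = rho_fine ** 2 / 2 >= peak ** 2 / 2 - 8.0
t_lo, t_hi = t_fine[keep][0], t_fine[keep][-1]
```

For this toy peak the retained window is a few milliseconds wide, illustrating how tightly the coalescence-time prior can be constrained before any sampling begins.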
The initial range of distances is chosen to ensure that it encloses the posterior mass. We also require the bounds to be as narrow as possible, to prevent VARAHA from sampling regions of parameter space with low likelihood. Returning to Eq. 15, we see that the effective distance is always equal to or greater than the luminosity distance, with equality only for face-on systems (cos ι = ±1) lying directly above or below the detector ($(F^i_+)^2 + (F^i_\times)^2 = 1$). We choose the maximum distance to be three times the effective distance in the reference detector,

$$d^{\max}_L = 3\, D^{\rm ref}_{\rm eff},$$

as well as choosing a minimum distance of 0. At the distance d^max_L, the maximum possible likelihood in the reference detector occurs for a face-on, overhead system. In that case, the log-likelihood is reduced by an amount

$$\Delta \ln \mathcal{L} = \frac{2\rho^2}{9}.$$

At an SNR of 7 in the reference detector, which corresponds to a relatively weak signal, this leads to a reduction in the log-likelihood of 11. However, as with the discussion of the coalescence time, it is likely that there will also be a reduction in the likelihood in the other detectors, meaning this is a lower limit on the loss in the network likelihood. As is often the case, instead of a prior uniform in distance we use a prior uniform in volume,

$$\pi(d_L) \propto d_L^2.$$

We have verified the efficacy of our choices for the ranges of coalescence time and distance by performing parameter estimation runs on hundreds of simulated signals.
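Sampling a prior uniform in volume is a one-line inverse-CDF transform, since the CDF of $\pi(d_L) \propto d_L^2$ on [0, d_max] is $(d_L/d_{\max})^3$. A sketch (the numerical values are illustrative):

```python
import numpy as np

# Inverse-CDF sampling of p(d_L) ∝ d_L^2 on [0, d_max]:
# d_L = d_max * u^(1/3) for u ~ Uniform(0, 1).
rng = np.random.default_rng(2)
d_max = 3.0 * 150.0                       # e.g. 3x a reference effective distance (Mpc)
u = rng.uniform(size=200_000)
d_l = d_max * u ** (1.0 / 3.0)

# sanity check: the empirical CDF should follow (d / d_max)^3,
# so a fraction ~ 0.5^3 = 0.125 of draws lie below d_max / 2
frac_below_half = np.mean(d_l < 0.5 * d_max)
```

This transform lets the millions of scattered points in Step 1 follow the volumetric prior directly, with no rejection overhead.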
As an example, we estimate the extrinsic parameters for the signal GW151226 [65]. We scatter N_pts = 1,000,000 points within the multi-dimensional grid for each cycle (Step 1 in Section II), set P_thr = 0.9999 when evaluating L_thr, and keep a minimum of N_min = 8,000 points at each cycle to evaluate L_Nmin (Step 2 in Section II). We terminate sampling once 8 cycles have elapsed. Table III shows the evolution of various quantities. The process collects ∼10,000 effective samples from a total of ∼160,000 weighted samples in 8 cycles. As expected, for the first two cycles the likelihood threshold is set by L_Nmin, while for later cycles, where there is a higher probability of randomly scattering points within the high-likelihood region, the threshold is set by L_thr. Unlike in previous examples, the likelihood threshold is positive and increases monotonically; this is because we have neglected the ⟨d|d⟩ term in Eq. 11. Figure 5 pictorially shows VARAHA converging to the most probable sky location of GW151226. We see that as the number of cycles increases, the live volume shrinks to the region of high likelihood. As a result, the number of bins in the multi-dimensional grid that encloses the live volume steadily increases with each cycle, indicating that the error on the MC volume is steadily decreasing over time. For this example, VARAHA localises GW151226 within 45 seconds on a single CPU thread. Assuming an approximately linear scaling with the number of CPU threads, we expect to localise most gravitational-wave signals in less than 5 seconds when parallelising over 10 CPU threads (see Section IV C for a discussion of CPU scaling). The exact runtime of BAYESTAR is unknown for this case, but we expect that BAYESTAR completed in ∼2 minutes when running on a single CPU thread (based on Fig. 12 in [35]).

FIG. 5. The purple dots show the points randomly drawn from within the multi-dimensional grid, the green dots are the points with a likelihood larger than the likelihood threshold from the previous cycle, and the orange dots are the points with likelihoods larger than the likelihood threshold for the current cycle. The black lines show the multi-dimensional grid that surrounds the live volume. Eight cycles were completed in less than 45 s using one CPU thread and accumulated around 10,000 effective samples.
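One common way to quantify an effective-sample count like the ∼10,000-from-∼160,000 quoted above is the Kish effective sample size, $N_{\rm eff} = (\sum_j w_j)^2 / \sum_j w_j^2$. We show it as an illustration, since the text does not specify VARAHA's exact estimator:

```python
import numpy as np

def effective_samples(weights):
    """Kish effective sample size of a set of importance weights."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# equal weights recover the raw sample count exactly
n_equal = effective_samples(np.ones(1000))

# skewed weights reduce the effective count below the raw count
rng = np.random.default_rng(3)
n_eff = effective_samples(rng.uniform(size=160_000))
```

For uniformly distributed weights the ratio tends to 0.75 of the raw count; the more concentrated the weights, the smaller the effective sample size becomes.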

C. Intrinsic Parameters
In order to obtain samples from the full posterior distribution, we need to vary both the intrinsic and extrinsic parameters in Equation 23. However, as changing the extrinsic parameters only changes the overall amplitude, phase and time of arrival of the gravitational-wave signal (since we are restricting to the leading multipole of a non-precessing system), samples can be obtained by combining independent, separate sampling over the extrinsic and intrinsic parameters. To achieve this, we require two likelihoods: one conditioned only on the extrinsic parameters (introduced in Section III B), and another, marginalised likelihood that depends only on the intrinsic parameters of the source. The marginalised likelihood is simply [44]

$$\mathcal{L}_{\rm intr}(d \,|\, \theta) = \int \mathcal{L}(d \,|\, t_c, \Omega, \theta)\, \pi(t_c)\, \pi(\Omega)\, \mathrm{d}t_c\, \mathrm{d}\Omega. \qquad (29)$$

This procedure breaks one high-dimensional problem into two lower-dimensional problems and has two significant benefits. First, the computational cost of sampling decreases with dimensionality [42], which is expected to reduce the overall cost. Second, an analysis sampling the full parameter space needs to generate a simulated gravitational waveform for each likelihood calculation and compute the inner product of the waveform with the detector data. Often the computational cost of generating and filtering these waveforms is high. By splitting the sampling, for each waveform generation and matched-filtering operation, the likelihood for arbitrary values of the extrinsic parameters can be calculated using inexpensive operations that change the overall amplitude, phase and time of the signal.
To evaluate the marginalised likelihood in Equation 29, we integrate over the extrinsic parameter space for each draw of the intrinsic parameters θ. We identify the high-likelihood region of the intrinsic parameter space by following the method outlined in Section II, albeit with the likelihood calculation replaced by the more complex calculation of L_intr. To begin, we define the region of interest in the intrinsic parameter space. Here, our four parameters are the chirp mass M, the mass ratio q, and the spins of each black hole aligned (or anti-aligned) with the orbital angular momentum, χ_1 and χ_2. We set the initial range of the chirp mass to $(\mathcal{M}_o - \Delta\mathcal{M},\, \mathcal{M}_o + \Delta\mathcal{M})$; the ranges for the mass ratio and spins are primarily driven by the region of validity of the IMRPhenomD waveform model [80,81], as well as the physical restrictions q ≤ 1 and |χ| ≤ 1. The central value of the chirp mass range, M_o, is the chirp mass of the GW search template that identified the signal. The width ΔM is chosen as a combination of two terms (Eq. 31), where ρ_0 is the reported SNR. The first term in Eq. 31 is motivated by the expected accuracy of chirp mass measurements for low-mass signals, where the inspiral dominates the signal observed in the detectors [33]. The second term in Eq. 31 is taken from the empirical uncertainties of chirp mass measurements in GWTC-3 [4] and is conservatively broad, to ensure that the range is wider than the observed distribution. We again note that the mass values are in the frame of the detector, thus M = (1 + z)M_source.
For each draw of intrinsic parameters θ_j, we marginalise the likelihood by integrating it over a fiducial parameter space for the extrinsic parameters, Ω_o. To generate Ω_o, we make use of the samples generated in the extrinsic parameter space associated with the intrinsic parameters identified by the search, θ_o. In general, we expect minimal correlation between the masses and spins and several of the extrinsic parameters. As discussed in [35,104], while changing the masses and spins will impact the measured coalescence time in each detector, the relative time delays will be only weakly affected, and consequently the inferred sky location will be largely independent of the intrinsic parameters. Similarly, the orientation of the binary, encoded in the inclination ι, primarily depends upon the observed ratio of power in the two gravitational-wave polarisations, and this is unlikely to change significantly with mass or spin. Finally, although the intrinsic amplitude of a gravitational-wave signal does vary linearly with mass, the chirp mass width is constrained to be at most a few per cent of the measured value, so the inferred distance varies by only a few per cent with changes in mass. Thus, the volume that contains the high-likelihood region of the extrinsic parameter space varies by only a few per cent with any change in the intrinsic parameters. To accommodate these fractional changes, we use θ_o and specify a lower P_thr for the intrinsic parameter space than is desired for the extrinsic parameters. This means that VARAHA defines each volume in the intrinsic parameter space to accommodate a slightly smaller probability than is desired for the extrinsic parameters. For instance, if P_thr = 0.999 is specified for the extrinsic parameters, VARAHA uses P_thr = 0.995 for the intrinsic parameters. This ad-hoc choice results in the recovery of sane posteriors even at a population level; however, we defer a more rigorous treatment of this problem to a future presentation.
We generate Ω_o by retaining the samples of (α, δ, ι, d_L) that cross the likelihood threshold in the extrinsic-only analysis and augmenting them with samples of the remaining three parameters, (ψ, φ_c, t_c), as defined below. For each draw in the intrinsic parameter space, we evaluate Eq. 29 by numerically integrating over Ω_o. The observed gravitational-wave phase and coalescence time vary significantly with the masses and spins; we therefore draw (φ_c, ψ) from the full range (0, 2π). In addition, due to the degeneracy between the coalescence phase and the polarisation angle, retaining samples of the latter from the extrinsic-only analysis, or regenerating them, does not impact the posterior on the intrinsic parameters. The appropriate range for the coalescence time t_c can be derived using the same method as for the initial point θ_o, described around Eq. 26, but using the peak SNR in the reference detector for the intrinsic parameters θ_j. For each sample in Ω^i_o we pre-calculate D^i_eff and φ^i_Ω. Each draw of θ_j then requires waveform generation, matched filtering of the data and calculation of the likelihood using these pre-calculated values. Finally, the marginalised likelihood is obtained by approximating the integral in Eq. 29 as a Monte Carlo sum over the fiducial samples,

$$\mathcal{L}_{\rm intr}(d \,|\, \theta_j) \approx \frac{1}{N} \sum_i \mathcal{L}(d \,|\, t^i_c, \Omega^i_o, \theta_j). \qquad (32)$$

We continue sampling with different draws of θ_j, guided by their marginalised likelihood values.
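The Monte Carlo average in Eq. 32 is numerically delicate when the per-sample likelihoods span many orders of magnitude; the standard log-sum-exp trick keeps it stable. A sketch (the exact weighting VARAHA applies may differ):

```python
import numpy as np

def marginalised_log_likelihood(logl_extrinsic):
    """Stable log of (1/N) * sum_i L_i, given per-draw log-likelihoods.

    Subtracting the maximum before exponentiating avoids overflow when the
    log-likelihoods are large, which they typically are for loud signals.
    """
    logl = np.asarray(logl_extrinsic, dtype=float)
    m = logl.max()
    return m + np.log(np.mean(np.exp(logl - m)))

# toy example: 10,000 fiducial extrinsic draws with scattered log-likelihoods
rng = np.random.default_rng(4)
log_l_intr = marginalised_log_likelihood(50.0 + rng.standard_normal(10_000))
```

Evaluating exp(50) directly would be harmless here, but for network log-likelihoods of several hundred the naive average overflows, while the shifted form does not.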
Calculating the marginalised likelihood is computationally expensive. An approximate but optimistic value can be calculated by maximising the likelihood independently over the time of arrival and phase in each detector. This is done by ignoring the phase term in Eq. 29 and using the maximum SNR in each detector independently. By ignoring both of these factors, we obtain a likelihood that is always equal to or greater than the true likelihood,

$$\mathcal{L}_{\rm intr}(d \,|\, \theta) \leq \exp\!\left(\sum_i \left[ \frac{\sigma_i(\theta)}{D^i_{\rm eff}}\, \rho^{\max}_i(\theta) - \frac{\sigma_i^2(\theta)}{2\,(D^i_{\rm eff})^2} \right]\right) \leq \exp\!\left(\sum_i \tfrac{1}{2} \left(\rho^{\max}_i(\theta)\right)^2\right), \qquad (33)$$

where $\rho^{\max}_i(\theta) = \max_t |\rho_i(\theta, t)|$.
The last line in Eq. 33 further maximises the likelihood over all the extrinsic parameters. The benefit is that the term in the second line can be calculated immediately after matched filtering the data; for the term in the first line, we only need to generate D^i_eff for each of the intrinsic samples. Both of these calculations require significantly less computation than the full likelihood calculation. We therefore marginalise only if both of these values exceed the likelihood threshold.
The marginalised likelihood L_intr(d | θ_j) assigns a weight to each point in the intrinsic parameter space,

$$w_j = \frac{\mathcal{L}_{\rm intr}(d \,|\, \theta_j)}{\mathcal{L}_{\max}(j)},$$

where L_max(j) is the maximum marginalised likelihood value among all the samples. In addition, we obtain a weight w^i_j for each sample Ω^i_o corresponding to the intrinsic parameters θ_j, which is simply

$$w^i_j = \frac{\mathcal{L}(d \,|\, t^i_c, \Omega^i_o, \theta_j)}{\mathcal{L}_{\max}(i, j)},$$

where L_max(i, j) is the maximum likelihood over all the samples in the fiducial volume, estimated for each sample of the intrinsic parameters.
We are interested in generating samples that cover the full eleven-dimensional parameter space of intrinsic and extrinsic parameters combined, which presents a challenge in generating and storing the samples. We circumvent this problem by randomly keeping only one set of extrinsic parameters for each set of intrinsic parameters and discarding the rest. This choice is made by performing rejection sampling on the weights w^i_j to select a single Ω^i_o and t^i_j for each θ_j. Thus, for each point sampled in the intrinsic parameter space, we store a single sample (θ_j, Ω^i_o, t^i_j) and the weight w_j associated with the intrinsic parameters θ_j.
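The selection of a single extrinsic sample per intrinsic draw can be sketched as likelihood-weighted sampling over the fiducial draws. The helper name and the use of `Generator.choice` are ours, illustrating the idea rather than VARAHA's exact code:

```python
import numpy as np

def select_one_extrinsic(logl_per_draw, rng):
    """Pick one extrinsic sample index for an intrinsic draw theta_j,
    with probability proportional to the per-draw likelihood weights."""
    logl = np.asarray(logl_per_draw, dtype=float)
    w = np.exp(logl - logl.max())        # weights relative to the maximum
    return rng.choice(len(w), p=w / w.sum())

# the draw with the highest log-likelihood is selected most often
rng = np.random.default_rng(5)
idx = select_one_extrinsic([10.0, 12.0, 5.0], rng)
```

Storing only the selected (θ_j, Ω^i_o, t^i_j) triple keeps memory usage linear in the number of intrinsic samples while preserving the joint posterior on average.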
We are now left with sampling the intrinsic parameters and their associated marginalised likelihoods: in successive cycles, we identify live volumes across the four dimensions (M, q, χ_1, χ_2). For each cycle, we evaluate the marginalised likelihood values and continue until we obtain the desired number of effective samples. As an example, we continue the analysis of GW151226 by sampling its intrinsic parameters.
We scatter points within the multi-dimensional grid for each cycle (Step 1 in Section II), set P_thr = 0.9999 when evaluating L_thr, and keep a minimum of N_min = 1,000 points at each cycle to evaluate L_Nmin (Step 2 in Section II). We terminate sampling once 8 cycles have been completed. Table IV lists the number of bins in the multi-dimensional histogram, the likelihood threshold and the number of effective samples over the cycles. As in previous examples, the likelihood threshold increases initially before reaching a final value. Figure 6 shows the evolution of the live volume in the M–q plane as the cycle number increases.

D. Implementation
To summarise, VARAHA analyses gravitational-wave signals as follows:

1. Obtain the posterior on the extrinsic parameters: Obtain the posterior distribution for the extrinsic parameters by following the steps detailed in Sec. II A. The full gravitational-wave likelihood is used, but we fix the intrinsic parameters to the values reported by the GW search pipelines.
2. Construct a fiducial volume for the extrinsic parameters: Retrieve the luminosity distance, inclination angle, right ascension and declination, and augment with the remaining extrinsic parameters as defined in Sec. III C.

FIG. 6. The 2D projection of the evolution in the M–q plane for the analysis sampling the intrinsic parameters for the observation GW151226. The left panel shows the 1st cycle, the middle panel the 4th cycle and the right panel the 8th cycle. The purple dots show the points randomly drawn from within the multi-dimensional grid, the green dots are the points with a likelihood larger than the likelihood threshold from the previous cycle, and the orange dots are the points with likelihoods larger than the current likelihood threshold. The black lines show the multi-dimensional grid that surrounds the live volume with likelihood equal to the likelihood threshold.
3. Obtain the posterior on the intrinsic parameters: Marginalise the full likelihood over the fiducial volume constructed in Step 2, and obtain the posterior distribution for the intrinsic parameters by following the steps detailed in Sec. II A. We now use the marginalised likelihood given in Eq. 32 and a smaller P_thr than in Step 1. The fiducial volume based on the extrinsic parameters remains fixed.

E. Processing Time
The faster processing times of this analysis are due to the following reasons:

1. A high-dimensional problem has been broken into two lower-dimensional problems, reducing the computational requirements.
2. The sampling method is entirely likelihood driven and swiftly converges to the relevant region of the parameter space.
3. Taking the inverse Fourier transform of Equation 13 allows bounds on t_c to be constrained and enables vectorised estimation of likelihood values at thousands of samples in the fiducial set of extrinsic parameters.
4. The waveform morphology of the inferred templates is expected to be similar. This analysis does not perform matched filtering if the phase accumulated in the detector's sensitivity band (∼20–2000 Hz) by the fiducial waveform and a template waveform differs by more than 30 radians (approximately five cycles).
5. This analysis does not marginalise over the extrinsic parameters if the approximate but optimistic estimate of the marginalised likelihood, given in Eq. 33, is smaller than the likelihood threshold.
6. The analysis has been rigorously optimised and performs judicious vectorised operations to save computation time.
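Point 4 above can be illustrated with the leading-order (Newtonian) stationary-phase approximation, in which the frequency-domain phase depends on the chirp mass alone. The band-edge difference below is our illustrative proxy for the accumulated-phase comparison, not VARAHA's exact implementation:

```python
import numpy as np

MSUN_S = 4.925491e-6          # solar mass in seconds (G * Msun / c^3)

def newtonian_phase(f, mchirp_msun):
    """Leading-order stationary-phase frequency-domain phase (radians)."""
    mc = mchirp_msun * MSUN_S
    return (3.0 / 128.0) * (np.pi * mc * f) ** (-5.0 / 3.0)

def band_phase_difference(mc1, mc2, f_lo=20.0, f_hi=2000.0):
    """Difference in phase accumulated across the band by two templates."""
    dphi = lambda f: newtonian_phase(f, mc1) - newtonian_phase(f, mc2)
    return abs(dphi(f_lo) - dphi(f_hi))

# nearby templates stay under the 30-radian cut; distant ones are skipped
close = band_phase_difference(9.70, 9.71)     # fraction of a radian
far = band_phase_difference(9.70, 12.00)      # ~1e2 radians
```

Because the phase scales as $\mathcal{M}^{-5/3}$ and is dominated by the low-frequency end of the band, even a cheap Newtonian estimate cleanly separates templates worth filtering from those that cannot match the data.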
IV. RESULTS

A. Example: GW151226
Figure 7 compares VARAHA's posterior with the posterior obtained using LALInference. Both analyses use IMRPhenomD [80,81] for waveform generation and use almost equivalent priors on the masses and spins. LALInference places priors on the component masses, while VARAHA uses uniform priors on the chirp mass and mass ratio; we therefore used a wide prior on the component masses in LALInference and then applied a chirp mass constraint to produce almost equivalent priors between the two algorithms. The two results agree well; there are small differences in the marginalised one-dimensional posteriors, but they are consistent at the 90% confidence level. This difference is likely a result of the slightly different priors assumed by the two codes. We also see good agreement in the recovered sky map and distance estimate, with any deviations likely a consequence of LALInference marginalising over the calibration uncertainty, which VARAHA does not yet have the functionality to do. VARAHA obtained the posterior in less than one CPU hour. Based on experience gained while running the two codes on different signals, we expect VARAHA's computation times to be more than two orders of magnitude shorter.

B. Example: GW170817
VARAHA can rapidly estimate the origin of an observed gravitational wave. This can have significant implications for electromagnetic follow-up programs of BNS observations. To demonstrate this, we analyse GW170817 using data from all three detectors and compare the estimated skymap with the location of the known host galaxy, NGC 4993 [66]. Fig. 8 shows two skymaps: one produced when VARAHA samples only the extrinsic parameters, and one produced when VARAHA samples both the intrinsic and extrinsic parameters. Within 2 CPU minutes, VARAHA localises GW170817 to within 49 square degrees (at 90% confidence) when sampling only the extrinsic parameters. For this analysis, VARAHA used N_pts = 1,000,000, N_min = 8,000 and P_thr = 0.999, and we stopped sampling once 8 cycles were completed. The localisation area was reduced to 17 square degrees (at 90% confidence) after 16 CPU hours when VARAHA sampled both the intrinsic and extrinsic parameters. For this more detailed analysis, VARAHA used N_min = 1,000 and P_thr = 0.995, and we terminated sampling once 8 cycles were completed. For comparison, BAYESTAR localises GW170817 to within 31 square degrees (at 90% confidence) when analysing only the extrinsic parameters, and LALInference localises GW170817 to within 23 square degrees (at 90% confidence) [66] when analysing both intrinsic and extrinsic parameters. Although the exact runtimes of BAYESTAR and LALInference are unknown for this case, we expect that BAYESTAR completed in ∼2 CPU minutes (based on Fig. 12 in Ref. [35]) and LALInference in ∼500 CPU hours (based on Ref. [58]). Consequently, VARAHA matches the performance of BAYESTAR when sampling only the extrinsic parameters but, importantly, significantly improves upon LALInference when sampling both the intrinsic and extrinsic parameters. We note, however, that there has been recent work to reduce the runtime of LALInference by utilising reduced-order quadrature models [106], as well as meshfree approximations [107]. Since the inclusion of intrinsic parameters is preferred, as it reduces the 90% contour in most cases, VARAHA may be pivotal for the rapid follow-up of binary neutron star observations.

FIG. 9. Percentile-Percentile (P-P) plot for 500 simulated injections. The 1-, 2- and 3-σ confidence intervals are indicated by the grey bands. For results to be unbiased, the trails are required to be enclosed by the bands [108].

C. Population Level Test
We evaluate the population-level accuracy of VARAHA by performing a Percentile-Percentile (P-P) test [108]; this test involves performing hundreds of parameter estimation runs on synthetic signals embedded in simulated detector noise. The P-P test checks whether the interval measured at f% credibility also encloses f% of the true values across all the measurements. We perform parameter estimation on 500 simulated signals and show the P-P plot in Figure 9. The parameters of the synthetic signals are drawn from VARAHA's prior, and we only analyse signals that cross a chosen SNR threshold. As described in Section III D, we first estimate the extrinsic parameters. We do this by fixing the intrinsic parameters to the true values used when generating the synthetic signals. This is a reasonable choice, as we do not expect the detector noise to bias the measurement, and the inferred population should average out to the true population. We then construct the fiducial volume for the extrinsic parameters and use it to estimate the marginalised likelihood for sampling the intrinsic parameters, as well as obtaining the 11-dimensional posterior on the full parameter space, as described in Sec. III C.
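The per-injection quantity underlying a P-P plot is the percentile of the true value within the recovered posterior; for unbiased analyses these percentiles are uniform on [0, 1]. A toy sketch with Gaussian posteriors (purely illustrative, not VARAHA's pipeline):

```python
import numpy as np

def credible_level(posterior_samples, true_value):
    """Fraction of posterior samples below the injected value."""
    return np.mean(np.asarray(posterior_samples) < true_value)

# toy check: unbiased Gaussian recovery gives roughly uniform percentiles
rng = np.random.default_rng(6)
levels = []
for _ in range(300):
    truth = rng.uniform(10, 20)
    datum = truth + rng.standard_normal()        # noisy observation
    posterior = datum + rng.standard_normal(4000)  # posterior N(datum, 1)
    levels.append(credible_level(posterior, truth))
```

Plotting the sorted `levels` against the uniform quantiles produces the diagonal trails shown in Figure 9; systematic departures from the diagonal indicate biased recovery.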
The distribution of injection parameters is listed in Table 25 and Table 30. The luminosity distance is uniform in volume, and the chirp mass is uniformly distributed between 10 M⊙ and 20 M⊙. Any injection that crosses a matched-filter network SNR of 10 is selected for parameter estimation. In this analysis, the network comprises the Advanced LIGO Livingston and Hanford detectors and the Advanced Virgo detector [109,110].
Most of the injections required 8 seconds of simulated noise to accommodate the duration that the simulated signals last in the detectors' sensitivity band (∼20–2000 Hz). Figure 10 shows the time taken by the analysis in performing PE. The median time needed was less than 4 minutes using 10 threads, and almost all the PE runs were completed in less than sixteen minutes. In general, VARAHA takes longer to analyse binaries with more asymmetric masses, with the longest runtime of 40 minutes arising from a binary with mass ratio q = 0.1. Waveform generation and matched filtering consumed around 60% of the time, and calculation of the reduced likelihood consumed around 30%.
In Figure 11, we show the scalability of the analysis with the number of CPU threads. We perform two additional sets of runs, each using an expensive likelihood calculation, performed with 1 and 40 CPU threads respectively. We make the likelihood calculation expensive by including a one-tenth of a second delay in waveform generation, to reflect the scalability we expect when VARAHA is extended to include additional physics (for example precession, higher-order multipoles and eccentricity), since these waveform models are more expensive to generate than the simple aligned-spin case. The median time when using 40 CPU threads was 508 seconds, while the median time when using 1 CPU thread was 16910 seconds. Increasing the number of threads by a factor of 40 therefore reduced the analysis time by a factor of 33. We expect this scaling to improve if the likelihood calculation is made more expensive.
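These timings are consistent with a simple Amdahl's-law picture (our illustrative model, not an analysis from the paper): a measured speedup of 16910/508 ≈ 33 on 40 threads implies a parallel fraction of roughly 99.5%, and making the likelihood more expensive pushes that fraction, and hence the scaling, even higher:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n threads for parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)

def parallel_fraction(speedup, n):
    """Invert Amdahl's law to estimate the parallel fraction from a
    measured speedup on n threads."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)

observed = 16910.0 / 508.0            # ≈ 33.3 on 40 threads
p = parallel_fraction(observed, 40)   # ≈ 0.995
```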

V. DISCUSSION
In this article, we introduced a new sampling method that estimates Bayesian posteriors by identifying the volume that encloses the posterior mass and calculating the likelihood values inside the identified volume. This approach significantly increases our ability to parallelise the analysis over multiple CPU threads and increases the efficiency of vectorised likelihood calculations. In addition, we introduced the use of a likelihood threshold to judiciously populate parts of the parameter space based on our prior understanding of the distribution at hand. Compared to Nested sampling, which focuses directly on the estimation of probability mass, this approach is less robust in estimating higher-dimensional multi-modal distributions.
Our sampling method is ideally suited for estimating the parameters of compact binaries from their GW signals. These parameters are inherently assumed to follow a uni-modal likelihood distribution. We show that one large 11-dimensional sampling problem can be broken into two lower-dimensional sampling problems. Using vectorised likelihood calculations, we first sample the extrinsic parameters and subsequently obtain posteriors on the intrinsic parameters. We employ several approximations of the likelihood function to draw boundaries in the parameter space and calculate likelihood values only in the parts that meaningfully contribute to the posterior distribution. Tests indicate that our analysis can estimate parameters for most BBH signals in a few minutes.
There is significant scope to further decrease the computation time. The most expensive components of the analysis are waveform generation and matched filtering, both of which can be significantly reduced by using existing proposals [77,111,112]. We reconstruct the volume that contains the probability mass using a structured multi-dimensional grid, which results in the reconstructed volume being much larger than what is estimated from the MC integral. A more efficient reconstruction could employ unstructured grids, reducing the number of cycles needed to obtain an effective sample size. Finally, many of the calculations currently done on the fly (the time delays between detectors and the antenna patterns) could be pre-calculated to save computation time.
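As a sketch of the pre-calculation idea (our own illustration, not VARAHA's implementation), the standard '+' antenna pattern of an idealised L-shaped detector with arms along the x and y axes, here at fixed polarisation ψ = 0, can be tabulated once on a sky grid and then looked up cheaply inside the sampler:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def fplus(theta, phi, psi=0.0):
    """Standard '+' antenna pattern for arms along the x and y axes."""
    return (0.5 * (1.0 + np.cos(theta) ** 2) * np.cos(2 * phi) * np.cos(2 * psi)
            - np.cos(theta) * np.sin(2 * phi) * np.sin(2 * psi))

# Tabulate once on a 1-degree sky grid, interpolate thereafter.
theta = np.linspace(0.0, np.pi, 181)
phi = np.linspace(0.0, 2.0 * np.pi, 361)
table = fplus(theta[:, None], phi[None, :])
interp = RegularGridInterpolator((theta, phi), table)

approx = float(interp([[1.234, 2.345]])[0])  # cheap table lookup
exact = float(fplus(1.234, 2.345))           # direct evaluation
```

On a 1-degree grid the linear-interpolation error for such a smooth function is at the 10⁻⁴ level, far below the statistical uncertainty of the likelihood.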

VI. CONCLUSION
The presented analysis offers a significant improvement in processing time for estimating the parameters of a CBC while producing results comparable with contemporary samplers. Although VARAHA is currently restricted to using only aligned-spin waveform models, it is trivial to extend this algorithm to include additional physics, such as precession, eccentricity or tidal deformability, without hindering performance. However, for most cases, we expect that GW signals can be accurately modelled as produced by aligned-spin binaries, since the degree of orbital precession is often difficult to measure [see e.g. 113]. This means that, often, the posterior distribution simply recovers the prior [114-116]. Of course, for binaries that do precess [91], this means that some information is lost.
We use a uniform prior on all the parameters but, as we calculate the likelihood for each of our samples, it is straightforward to re-weight to a new prior: we simply divide the probability density of the new prior by that of the old one and multiply the result by the likelihood value [117]. As we also calculate the marginal likelihood, the samples can naturally be used for model selection. Furthermore, owing to the reduced computational time compared to other samplers, VARAHA is a natural choice for data diagnostics when understanding the systematics or non-Gaussianity in data associated with a signal, as well as for performing parameter estimation on gravitational-wave data collected by the Laser Interferometer Space Antenna (LISA) [118] and third-generation detectors [119,120], where PE is likely to be slow due to the long observation times.
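Reweighting of this kind is a one-liner in practice. In the sketch below, the toy prior, toy likelihood and helper name are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def reweight(log_like, samples, old_prior_pdf, new_prior_pdf):
    """Normalised posterior weights: w_i ∝ L_i · π_new(θ_i) / π_old(θ_i)."""
    w = (np.exp(log_like - log_like.max())
         * new_prior_pdf(samples) / old_prior_pdf(samples))
    return w / w.sum()

# Samples drawn under a uniform prior on [0, 1] with a Gaussian toy
# likelihood, reweighted to a prior p(θ) = 2θ favouring larger values.
theta = rng.uniform(size=10_000)
log_like = -0.5 * (theta - 0.6) ** 2 / 0.1 ** 2
w = reweight(log_like, theta, lambda x: np.ones_like(x), lambda x: 2.0 * x)
posterior_mean = float(np.sum(w * theta))
```

The reweighted mean shifts slightly above 0.6, as the new prior up-weights larger values of θ.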
This code is highly parallelisable as individual threads do not communicate. It also does not have to address the problems that a usual MCMC encounters, such as proper mixing of chains, tuning of the sampler, and potential correlations due to a low proposal acceptance rate.
with the weights given by

w = exp(ℓ),   ℓ = log(L) − log(L_max),   (A9)

where we have scaled L such that the maximum value of ℓ is zero. Such a scaling does not impact any of the earlier discussion, as it gets absorbed when normalising Eq. A9.
This scaling helps obtain equal-weight samples after performing rejection sampling on the weights. A point to consider is that if rejection sampling is performed on the values of the weights, it results in a sample size close to Σ_i w_i, with all the samples having equal weights of 1 [72]. However, such a procedure results in a loss of information, as n_eff is always larger than Σ_i w_i. Often, MCMC methods are employed to sample from the posterior probability distribution. All the algorithms performing PE using MCMC methods implement some kind of rejection sampling; although they produce equally weighted samples, they discard a good fraction of the likelihood information [73].
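This point is easy to verify numerically. In the sketch below, the Kish estimate (Σw)²/Σw² for n_eff and the random toy log-likelihoods are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def weights_and_neff(log_like):
    """Weights of Eq. A9 (maximum weight scaled to 1) and the Kish
    effective sample size n_eff = (Σw)² / Σw²."""
    w = np.exp(log_like - log_like.max())
    return w, w.sum() ** 2 / (w ** 2).sum()

log_like = rng.normal(size=5000)
w, n_eff = weights_and_neff(log_like)

# Rejection sampling on the weights keeps ≈ Σ_i w_i equally weighted
# samples, always fewer than n_eff: likelihood information is discarded.
kept = rng.uniform(size=w.size) < w
```

With weights capped at 1, Σw² ≤ Σw, so n_eff = (Σw)²/Σw² ≥ Σw always holds, which is the inequality quoted above.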
Using a proposal distribution that is significantly different from the true distribution when estimating parameters using importance sampling leads to most of the samples having very small weights, resulting in a severely reduced n_eff and a grossly inefficient analysis. If uniform priors are used, a naive estimation of the likelihood inside an arbitrarily big box leads to most samples having very small weights. Unlike MCMC methods, importance sampling is severely impacted by the curse of dimensionality.
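The effect is easy to demonstrate numerically (the Gaussian target, the [-5, 5] box and the sample size below are our own illustrative choices): the fraction of effective samples falls roughly geometrically with dimension:

```python
import numpy as np

rng = np.random.default_rng(5)

def neff_fraction(dim, n=50_000):
    """n_eff / n when importance sampling a standard Gaussian target
    from a uniform proposal on the box [-5, 5]^dim."""
    x = rng.uniform(-5.0, 5.0, size=(n, dim))
    log_w = -0.5 * np.sum(x ** 2, axis=1)
    w = np.exp(log_w - log_w.max())
    return (w.sum() ** 2 / (w ** 2).sum()) / n

fracs = [neff_fraction(d) for d in (1, 3, 6)]  # shrinks with dimension
```

Each added dimension multiplies the efficiency by the same per-dimension factor (here roughly 0.35), so by six dimensions fewer than 1% of the draws are effective.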

FIG. 1. Top row: A pictorial representation of VARAHA's sampling algorithm for a two-dimensional Gaussian likelihood distribution with mean [0.5, 0.5] and covariance [0.5, 0.5]. The left, middle and right panels show cycle 1, cycle 2 and cycle 3 respectively. The purple dots show the points randomly drawn from within the multi-dimensional grid and the green dots show the points that have a likelihood larger than the likelihood threshold from the previous cycle (Step 1). The orange dots show the points with a likelihood larger than the likelihood threshold calculated at the current cycle (Step 2). The red line shows the contour of fixed likelihood equal to the likelihood threshold, representing the live volume, and the black lines show the multi-dimensional grid that surrounds the live volume (Step 5). Bottom row: A pictorial representation of a Nested sampling algorithm for the same two-dimensional Gaussian likelihood distribution; for this case, we use the Dynesty [70] Nested sampler. The left, middle and right panels show cycle 1, cycle 3,000 and cycle 4,200 respectively, and the grey dots show the Nested live points.
α  Right ascension of the source
δ  Declination of the source
dL  Luminosity distance of the source
ι  Inclination angle
ψ  Polarisation angle
φc  Coalescence phase
tc  Coalescence time in the reference detector
M  Detector-frame chirp mass
q  Mass ratio

FIG. 4. The figure plots ρ(tc) for values of tc around the observed GPS time of the signal GW151226 [65] for the LIGO-Hanford detector. The dotted red lines set the bounds on tc as described in Eq. 26. The dashed blue lines show the 99.9% credible interval of the posterior distribution on tc obtained from performing parameter estimation on the full set of extrinsic parameters.

FIG. 5. Points obtained when sampling over the extrinsic parameters for the observation GW151226. The top-left panel shows the 1st cycle, the top-right shows the 2nd cycle, the bottom-left shows the 4th cycle and the bottom-right shows the 8th cycle. The purple dots show the points randomly drawn from within the multi-dimensional grid, the green dots are all of the points that have a likelihood larger than the likelihood threshold from the previous cycle, and the orange dots are the points with likelihoods larger than the likelihood threshold for the current cycle. The black lines show the multi-dimensional grid that surrounds the live volume. Eight cycles were completed in less than 45 s using one CPU thread and accumulated around 10,000 effective samples.

FIG. 7. Plot comparing the posterior distributions obtained from VARAHA (blue) and LALInference (orange) when analysing GW151226. The left panel contains a corner plot for the primary and secondary masses as well as χPN (Equation 5 in [105]). The right panel shows the most probable sky location of GW151226 as well as the inferred distance. In both panels, contours enclose 90% of the probability mass.

FIG. 10. Plot showing the wall time taken to perform PE on 500 synthetic signals embedded in simulated detector noise (see Fig. 9). Each of the 500 individual PE runs used 10 CPU threads, and each point is colour-coded by the median of the inferred mass-ratio distribution. The median wall time is 4 minutes.

FIG. 11. Plot showing the wall time taken to perform PE on 100 synthetic signals embedded in simulated detector noise when the likelihood calculation is made expensive by including a delay in the waveform generation. The wall time taken when using a single CPU thread is shown in blue and with 40 CPU threads in orange. Both analyses used identical settings. Assuming a linear CPU scaling from the 1-CPU-thread analysis, the expected time taken to analyse 100 PE runs on 40 CPU threads is shown by the blue dashed histogram. In reality, increasing the number of CPU threads to 40 reduces the computation time by a factor of 33.

TABLE I. Output from VARAHA showing the evolution of the number of bins in each dimension N_bins, the likelihood threshold in each cycle, and the number of effective samples n_eff for the multi-variate Gaussian example. The likelihood threshold in each cycle is set by either L_N^min or L_thr depending on the situation (see text for details); for this case, the first 3 cycles are set by L_N^min and subsequent cycles are set by L_thr. The sampled distribution is a bimodal multivariate Gaussian, and VARAHA collected more than 20,000 effective samples across 13 cycles in 30 seconds on a single CPU thread.

TABLE III. Output from VARAHA showing the evolution of the number of bins in each dimension N_bins, the likelihood threshold in each cycle (either L_N^min or L_thr depending on the situation; see text for details), and the number of effective samples n_eff when estimating the extrinsic parameters for GW151226.

TABLE IV. Output from VARAHA showing the evolution of the number of bins in each dimension N_bins, the likelihood threshold in each cycle (either L_N^min or L_thr depending on the situation; see text for details), and the number of effective samples n_eff. Here we estimate the intrinsic parameters for the observation GW151226.