Hints of spin-orbit resonances in the binary black hole population

Binary black hole spin measurements from gravitational wave observations can reveal the binary's evolutionary history. In particular, the spin orientations of the component black holes within the orbital plane, $\phi_1$ and $\phi_2$, can be used to identify binaries caught in the so-called spin-orbit resonances. In a companion paper, we demonstrate that $\phi_1$ and $\phi_2$ are best measured near the merger of the two black holes. In this work, we use these spin measurements to provide the first constraints on the full six-dimensional spin distribution of merging binary black holes. In particular, we find that there is a preference for $\Delta \phi = \phi_1 - \phi_2 \sim \pm \pi$ in the population, which can be a signature of spin-orbit resonances. We also find a preference for $\phi_1 \sim -\pi/4$ with respect to the line of separation near merger, which has not been predicted for any astrophysical formation channel. However, the strength of these preferences depends on our prior choices, and we are unable to constrain the widths of the $\phi_1$ and $\Delta \phi$ distributions. Therefore, more observations are necessary to confirm the features we find. Finally, we derive constraints on the distribution of recoil kicks in the population, and use this to estimate the fraction of merger remnants retained by globular and nuclear star clusters. We make our spin and kick population constraints publicly available.

Binary black hole spin measurements from gravitational wave observations can reveal the binary's evolutionary history. In particular, the spin orientations of the component black holes within the orbital plane, φ1 and φ2, can be used to identify binaries caught in the so-called spin-orbit resonances. In a companion paper, we demonstrate that φ1 and φ2 are best measured near the merger of the two black holes. In this work, we use these spin measurements to provide the first constraints on the full six-dimensional spin distribution of merging binary black holes. In particular, we find that there is a preference for ∆φ = φ1 − φ2 ∼ ±π in the population, which can be a signature of spin-orbit resonances. We also find a preference for φ1 ∼ −π/4 with respect to the line of separation near merger, which has not been predicted for any astrophysical formation channel. However, the strength of these preferences depends on our prior choices, and we are unable to constrain the widths of the φ1 and ∆φ distributions. Therefore, more observations are necessary to confirm the features we find. Finally, we derive constraints on the distribution of recoil kicks in the population, and use this to estimate the fraction of merger remnants retained by globular and nuclear star clusters. We make our spin and kick population constraints publicly available.
Introduction.-Binaries of spinning black holes (BHs) serve as a unique astrophysical laboratory for a range of relativistic phenomena. For example, if the BH spins χ 1 and χ 2 are aligned with the orbital angular momentum L, the orientations of the orbital plane and the spins remain fixed during the inspiral (cf. Fig. 1 for definitions of the binary BH spin parameters). However, if the spins are tilted with respect to L, relativistic spin-orbit and spin-spin coupling cause the orbital plane and the spins to precess [1,2].
While the tilt angles θ 1 and θ 2 control precession, the orbital-plane spin angles φ 1 and φ 2 play a central role in binaries undergoing spin-orbit resonances (SORs) [3]. For these binaries, the χ 1 , χ 2 and L vectors become locked into a common resonant-plane such that ∆φ = φ 1 − φ 2 is fixed at 0 or ±π as the binary precesses. Refs. [4,5] pointed out that this locking is a limiting case of librating states near ∆φ ∼ 0 or ±π. For simplicity, we will follow previous literature [6][7][8] and refer to the more general librating states as SORs. While evidence for precession has been found in the astrophysical binary BH population [9], SORs have not yet been observed even though they are expected in some astrophysical scenarios. For example, stellar binaries can cluster near these resonances if supernova natal kicks and stellar tides are significant [10,11].
Another important relativistic effect that gets amplified for spinning binaries is the gravitational recoil. Gravitational waves (GWs) can carry away linear momentum from the binary, imparting a recoil or kick velocity to the merger remnant [13][14][15][16]. These velocities can reach values up to ∼ 5000 km/s for precessing binaries [17][18][19], large enough to be ejected from any host galaxy [20]. Kick measurements from GW observations [21] can be used to constrain the formation of heavy BHs via successive mergers [22]. However, the kick depends very sensitively on the orbital-plane spin angles [23]. GW observations by LIGO [24] and Virgo [25] have en-abled increasingly precise constraints on the astrophysical distributions of BH spin magnitudes and tilts [9,26], but the distributions of the orbital-plane spin angles remain unconstrained. Constraining these distributions would allow us to understand the prevalence of SORs and merger kicks in nature. The biggest obstacle for this, however, is the difficulty in measuring φ 1 , φ 2 and ∆φ from individual GW events with current detectors [6-8, [27][28][29]. However, in a companion paper, Varma et al. [30], we show that this can be greatly improved by measuring the spins near the merger, in particular, at a fixed dimensionless reference time t ref /M = −100 before the peak of the GW amplitude, rather than the traditional choice of a fixed GW frequency of f ref = 20 Hz. Here M = m 1 + m 2 is the total (redshifted) mass of the binary with component masses m 1 ≥ m 2 , and we set G = c = 1. Ref. [30] shows that this improvement can be attributed to the waveform being more sensitive to variations in the orbital-plane spin angles near the merger. In particular, measuring the spins near the merger leads to improved constraints for φ 1 and φ 2 for several events in the latest GWTC-2 catalog [31][32][33][34][35] released by the LIGO-Virgo Collaboration. While the ∆φ measurements are not significantly impacted, Ref. [30] shows that this parameter will also be better constrained with louder signals expected in the future.
In this Letter, we use the spin constraints from Ref. [30] to perform the first measurement of the full spin distribution in the astrophysical binary BH population. In particular, we identify a preference for ∆φ ∼ ±π, which can be a signature of SORs. Next, given the spin population, we derive constraints on the kick population. Finally, we use the kick constraints to estimate the fraction of merger remnants retained by globular and nuclear star clusters.
Methodology.-The first step in our analysis is to estimate the binary BH parameters from individual GW signals, which is done following Bayes' theorem [36]: where p(Θ|d) is the posterior probability distribution of the binary parameters Θ given the observed data d, L(d|Θ) is the likelihood of the data given Θ, and π(Θ) is the prior probability distribution for Θ. The full set of binary parameters Θ is 15 dimensional [31], and includes the masses and spins of the component BHs as well as extrinsic properties such as the distance and sky location.
In this work, we use the posteriors samples from Ref. [30], obtained using the numerical relativity (NR) surrogate waveform model NRSur7dq4 [37], with the spins measured at t ref /M = −100. NRSur7dq4 accurately reproduces precessing NR simulations and is necessary to reliably measure the orbital-plane spin angles [30]. GWTC-2 includes a total of 46 binary BH events. However, because NRSur7dq4 only encompasses ∼20 orbits before merger, it can only be applied to the shorter signals with M 60 M [37]. This reduces our set of events to 31; these events are listed in Tab. I of Ref. [30]. Given the posterior samples p(Θ|d) for the individual events, we want to measure the astrophysical distribution of the full spin degrees of freedom, S = {χ 1 , χ 2 , θ 1 , θ 2 , φ 1 , ∆φ}, which is a subset of Θ. The remaining angle, φ 2 , is redundant given φ 1 and ∆φ; we choose to work with ∆φ as it is relevant for SORs. As an intermediate step, we first reweight the posterior samples for each event to account for known astrophysical constraints on the primary mass and mass ratio (q = m 2 /m 1 ) populations [9]. The details of the reweighting procedure are given in the Supplement [38]. Post reweighting, Eq. (1) can be rewritten as: where R indicates that these are the reweighted posteriors. Using these reweighted posteriors in Eq. (4) below ensures that the implicit priors on the mass population are astrophysically motivated [39].
To constrain the astrophysical distribution of S, we begin by making the assumption that the true value of S for each event is drawn from a common underlying distribution π(S|Λ), which is conditional on a set of hyperparameters Λ. We then use hierarchical Bayesian inference [36] to collectively analyze all 31 events and constrain Λ: where p(Λ|{d i }) is the hyper-posterior distribution for Λ given a set of observations {d i }, L({d i }|Λ) is the hyperlikelihood of this dataset given Λ, and π(Λ) is the hyperprior distribution for Λ. In our case, {d i } with i = 1 . . . N represents the observed data for our set of N = 31 GW events. The hyper-likelihood is obtained by coherently combining the data from from all events [36]: For the underlying distribution π(S|Λ), the spin magnitudes and tilts are modeled following the "Default spin" model of Ref.
[9]. The orbital-plane spin angles φ 1 and ∆φ are modeled as being drawn from independent von Mises distributions [40]. The von Mises distribution is an approximation of a Gaussian distribution with periodic boundary conditions and is parameterized by a mean and a standard deviation (or simply, width).
The explicit forms of π(S|Λ) and the hyper-prior π(Λ) are given in the Supplement [38]. In particular, the priors on the mean (µ φ1 and µ ∆φ ) and width (σ φ1 and σ ∆φ ) hyperparameters for the φ 1 and ∆φ distributions are as follows. The prior for the mean parameters is always uniform in (−π, π). We consider two different prior choices for the widths: (i) A Jeffreys prior [41] that is log-uniform in σ φ1 and σ ∆φ between (0.3, 4π), henceforth referred to as the Jeffreys-σ φ prior. (ii) A prior that is uniform in σ φ1 and σ ∆φ between (0.3, 4π), henceforth referred to as the Flat-σ φ prior. The Jeffreys prior is an uninformative prior choice often used for scale parameters [41]. The flat prior may be considered a control case to understand the impact of the prior. The lower limit of 0.3 rad on the width priors is arbitrary, but chosen to be smaller than the sharpest features we expect to be resolvable with LIGO-Virgo (which we estimate from the NR injections in Ref. [30]). The upper limit of 4π is chosen to be large enough to approximate a flat distribution between (−π, π).
We use the Bilby [42] package with the dynesty [43] sampler to draw posterior samples for the hyperparameters Λ from p(Λ|{d i }). Finally, the posterior distribution for the S population, also referred to as the posterior population distribution, is obtained by averaging over Λ [36]: In practice, this is done by drawing samples from the hyper-posterior p(Λ|{d i }) and evaluating π(S|Λ) on an array of S values for each Λ sample. This gives us an ensemble of probability distributions on S, which we use to compute the mean and 90% credible widths. Finally, we note that we ignore selection effects for the spin population, as they are not expected to be significant at current sensitivity [38]. Figure 2 shows our constraints on the φ 1 and ∆φ populations. The left panel shows the posteriors for the mean and width hyperparameters. For both Jeffreys-σ φ and Flat-σ φ prior choices, we find that the 1D marginalized posteriors for the widths σ φ1 and σ ∆φ are dominated by the prior itself. However, the 1D posteriors for the mean parameters show a preference for µ φ1 ∼ −π/4 and µ ∆φ ∼ ±π. This is reflected in the corresponding constraints on the posterior population distributions, p(φ 1 ) and p(∆φ), shown in the right panel of Fig. 2. These represent our constraints on the astrophysical distributions for φ 1 and ∆φ; they are generated by evaluating the von Mises model using draws from the joint posterior of the mean and width parameters. We interpret the population constraint in Fig. 2 as follows. For the Flat-σ φ prior, examining the 2D posterior for µ ∆φ − σ ∆φ , we note that when σ ∆φ → 0, only the region around µ ∆φ ∼ ±π is allowed in the 90% credible region. This means that, if there is a sharp peak in the ∆φ population, it is only allowed near ∼ ±π. Similarly, examining the 2D posterior for µ φ1 − σ φ1 , we find that when σ φ1 → 0, there is a preference for µ φ1 ∼ −π/4. The preferences in the 1D µ φ1 /µ ∆φ posteriors and the p(φ 1 )/p(∆φ) distributions get amplified for the Jeffreys-σ φ prior, as this prior already prefers small widths. In short, the data disfavor peaks at regions other than φ 1 ∼ −π/4 and ∆φ ∼ ±π, and this leads to p(φ 1 ) and p(∆φ) peaks in these regions. However, the data are not informative enough to constrain the widths of these peaks. We further note that both populations are still consistent with a uniform distribution at the 90% credible level.

Spin population.-
It is important to recognize that the location of the φ 1 peak in Fig. 2 depends strongly on our choice of reference point. This is because φ 1 changes on the orbital timescale as it is defined with respect to the line-of-separation (cf. Fig. 1). On the other hand, ∆φ only changes on the longer precession time scale, and we find that repeating our analysis using spins measured at 20Hz leads to consistent results for the ∆φ population [38]. However, the biggest gain in measuring the spins at t ref /M = −100 is in the φ 1 population constraint, as φ 1 is significantly better measured there [30]. Constraining both φ 1 and ∆φ is necessary to constrain the kick population below.
For completeness, we include our constraints on the spin magnitude and tilt populations, along with full model hyperparameter posteriors in the Supplement [38]. Our constraints on the spin magnitude and tilt populations are consistent with Ref.
[9], and we do not find any obvious correlations between the orbital-plane spin angles and the other spin parameters. To gain further confidence in our results, we also conduct some mock population studies [38], which suggest that at least some φ 1 and ∆φ populations can be reliably recovered at current detector sensitivity. Finally, by iteratively leaving one event out from the dataset and repeating our analysis, we check that our results are not driven by any single event.
One limitation of this work is the restriction to the 31 signals with M 60M so that we can use the NRSur7dq4 model. We also repeat our analysis for all 46 binary BH events from GWTC-2, using the phenomenological waveform model IMRPhenomTPHM [44] for the remaining 15 events. Interestingly, we find that there is some information gain in the width parameters in this case, with a preference for small widths. However, as noted in Ref. [30], IMRPhenomTPHM can have biases in recovering the orbital-plane spin angles. Therefore, while we include these results in the Supplement [38] for completeness, we treat Fig. 2 as our main result.
Kick population.-Having constrained the full spin degrees of freedom for the binary BH population, we can now derive constraints on the kick population. We begin by generating one realization of the q, χ 1 and χ 2 populations. For q, we use the same model that was used in the initial posterior reweighting. For χ 1,2 , we use the spin population constraints at t ref /M = −100. We simply draw one hyperparameter sample from the posterior of the hierarchical analysis and evaluate the q, χ 1 and χ 2 population models at that point. Next, we draw a large number of samples for q, χ 1 and χ 2 from this population realization and compute the corresponding kick magnitudes using the NRSur7dq4Remnant model [37,45]. Repeating these steps over many draws of q, χ 1 and χ 2 populations, we generate an ensemble of kick population distributions p(v f ). For comparison, we also evaluate the prior p(v f ) by repeating this procedure using prior hyperparameter samples. Figure 3 shows the 90% credible constraints on p(v f ) for the Jeffreys-σ φ and Flat-σ φ prior choices. In addition, we consider a prior choice where the φ 1 and ∆φ populations are restricted to be uniformly distributed. We refer to this prior choice as Infinite-σ φ , as the other priors reduce to this when σ φ1 = σ ∆φ = ∞. This restricted prior was also used in Ref. [46] to constrain the kick population. The mass ratio, spin magnitude and tilt population models are the same for all three choices. We compare our kick population constraints against fiducial escape velocities for globular clusters [47,48], nuclear star clusters [48], elliptical galaxies [20] and Milky Way-like galaxies [49].
Comparing the prior and posterior ranges in Fig. 3, we note that there is significant information gain about the kick population, even though individual events are largely uninformative about the kick [21]. The three prior choices lead to consistent kick populations in Fig. 3, with the Infinite-σ φ prior leading to the tightest constraint. This is expected as the Infinite-σ φ prior is a special case of the other two. This is also reflected in the more restrictive p(v f ) prior in Fig. 3 for Infinite-σ φ . It is somewhat surprising that the kick population is not hugely influenced by the prior choices on φ 1 and ∆φ population, even though the kick is known to be very sensitive to these parameters [23]. This is explained by the fact that the φ 1 and ∆φ distributions in Fig. 2 are still consistent with a uniform distribution at 90% credibility. We expect this to change with future observations. Astrophysical implications.-While the location of the φ 1 peak in Fig. 2 is not particularly important, the fact that there is a peak at all is indeed interesting. This feature has not been predicted for any formation channel. Therefore, our naive expectation is that this peak will get smoothed over as more data are added. However, it will be interesting to see if there are alternative explanations.
On the other hand, the preference for ∆φ ∼ ±π in Fig. 2 is expected in some formation channels where SORs [3] are important. In particular, stellar binaries with significant supernova natal kicks and efficient stellar tides can be driven towards these resonances [10,11]. In the standard scenario where the heavier star becomes the heavier BH, the ∆φ ∼ ±π resonant mode is expected to be dominant. However, if mass transfer between the two components is significant, a mass-ratio reversal occurs and the ∆φ ∼ 0 mode becomes dominant [10]. Note that the predictions of Refs. While our ∆φ population constraint can be interpreted as coming from SORs, this does not yet constitute conclusive evidence for them-especially not without a measurement of the width of the distribution to confirm this feature. In addition, consider a binary for which the spin angular momenta in the plane of the orbit perfectly cancel (which requires ∆φ = ±π). This system will not undergo orbital precession and can approximately mimic a spinaligned system, as pointed out by Ref. [29]. As a result, a precessing waveform model can sometimes mistake a spin-aligned system for one with ∆φ = ±π. However, such a binary still undergoes spin precession which can be used to break this degeneracy given sufficient signalto-noise ratio. A more detailed analysis may be necessary to account for effects of such potential degeneracies on our results.
Finally, as an application of our p(v f ) constraints in Fig. 3, we estimate the fraction of merger remnants that would be retained by various host environments. Assuming a maximum escape velocity v max esc = 100 km/s for globular clusters [47,48], 6 +5 −3 (6 +4 −3 ) % of the remnants will be retained for the Jeffreys-σ φ (Flat-σ φ ) prior. For nuclear star clusters, assuming v max esc = 600 km/s [48], the retention fraction is constrained to 54 +19 −17 (57 +16 −15 ) % for the Jeffreys-σ φ (Flat-σ φ ) prior. Averaging over the two prior choices, we estimate the retention fraction to be ∼ 3 − 10 % for globular clusters, and ∼ 39 − 73 % for nuclear star clusters. All constraints are quoted at 90% credibility. Our constraints on the retention fraction are consistent with those of Refs. [46,50].
If the observed ∆φ ∼ ±π preference is confirmed to be due to SORs, our findings have several important implications: (i) SOR measurements can be used to place new astrophysical constraints on supernova natal kicks and stellar tides [10, 11]. (ii) SORs can be used to constrain the cosmic merger rate of isolated stellar binaries in galactic fields, as well as measure what fraction of merging binaries form via that channel. (iii) The ∆φ ∼ ±π resonance mode tends to enhance merger kicks and suppress the spins of the remnant BHs [51,52], both of which are important observables for constraining the formation of heavy BHs via successive mergers [22]. Furthermore, independent of whether our findings can be attributed to SORs, our p(v f ) constraints already suggest that globular clusters are an unlikely site for that formation channel.
Conclusion.-We constrain the distribution of the orbital-plane spin orientations φ 1 and ∆φ in the binary black hole population. We find that there is a preference for ∆φ ∼ ±π, which can be a signature of SORs. In addition, we find a preference for φ 1 ∼ −π/4 in the population, which has not been predicted for any astrophysical formation channel. However, the strength of these preferences depends on our prior choices. Finally, we constrain the distribution of recoil kicks in the population, and estimate the fraction of merger remnants retained by globular and nuclear star clusters. We make our population constraints publicly available at Ref. [53].
Observational evidence for SORs has far reaching implications for black hole astrophysics. While our population constraints suggest the influence of SORs, we are unable to constrain the widths of the φ 1 and ∆φ distributions with the current dataset of events. Therefore, further observations are necessary to confirm these trends. With LIGO and Virgo approaching their design sensitivities [54], our constraints are certain to improve in the near future.
[1] Theocharis A. Apostolatos, Curt Cutler, Gerald J. Sussman, and Kip S. Thorne, "Spin-induced orbital precession and its modulation of the gravitational waveforms from merging binaries," Phys. Ref. [37]. In this frame, the z-axis is along the direction that maximises the power in the (2,2) mode, which is taken to be the direction of the orbital angular momentum [56]. The x-axis is along the line of separation from the lighter to the heavier BH, and the y-axis completes the righthanded triad. Note that this frame is defined using the gauge-invariant waveform at future null infinity, rather than the gauge-dependent BH trajectories. ().

I. ADDITIONAL DETAILS ON THE HIERARCHICAL ANALYSIS
In this section, we provide additional details on the reweighting procedure applied to posterior samples for individual GW events, the explicit forms for the spin population model and the hyper-prior, and potential selection effects in the hierarchical analysis.
[9] analyzed the GWTC-2 catalog to place constraints on the populations of component BH masses, spin magnitudes and tilts (but not φ 1 and ∆φ). For simplicity, we only model the spin degrees of freedom in this work, but incorporate the astrophysical mass and mass-ratio population constraints from Ref. [9].
We denote Γ = {m src 1 , q}, where m src 1 = m 1 /(1 + z) is the mass of the heavier BH in the source frame, and z is the source redshift. To account for the astrophysical constraints on Γ, we apply the following weights to the posterior samples for each individual event where j indicates the particular GW event, π() is the same prior as in Eq. (1), and p(Γ j |{d} i =j ) denotes the posterior population distribution (cf. Eq. (46) of Ref. [57]) for the Γ population obtained using the data from all events other than j. We use the public data release for the "Power Law + Peak" model from Ref.
[9] for the Γ population constraints. p(Γ j |{d} i =j ) is obtained from these results using the "leave-one-out" computation described in Ref. [57]. This ensures that the event j is not double-counted in the analysis. We perform an additional reweighting to switch to a more astrophysically motivated redshift prior. Note that, when obtaining the posterior samples in Eq. (1), Ref. [30] used a prior that is uniform in comoving volume [31,58]: where dV c /dz is the differential comoving volume. We now apply the weights to these posteriors, effectively switching to a prior that that is uniform in comoving volume and source frame time [31,58]: The additional (1 + z) −1 factor accounts for cosmological time dilation. This matches the redshift prior assumed for the "Power Law + Peak" model in Ref.
[9]. We perform both reweighting steps simultaneously, by applying the weights to the posterior samples Θ j for each event j in our dataset. Post reweighting, Eq. (1) can be rewritten as Eq.
[9] models the mass degrees of freedom simultaneously with the spin degrees of freedom. Using the mass-population-reweighted posteriors is equivalent to Ref.
[9], except that this does not account for any possible correlations between the mass and spin hyperparameters. However, Ref.
[9] found that these correlations are not significant.

Parameter Prior
µχ Table S1. Priors on hyperparameters for our spin population model. In addition, following Ref.
[9], we exclude µχ, σ 2 χ values where the Beta distribution becomes singular. Here, U(a, b) indicates a uniform distribution on the interval (a, b).

Name
Prior on σ φ 1 and σ ∆φ )   Table S2. The prior choices we consider for σ φ 1 and σ ∆φ . LU(a, b) indicates a log-uniform distribution on the interval (a, b), while δ(a) indicates a Dirac delta distribution where the parameter is fixed at a. Note that the Infinite-σ φ prior restricts the φ1 and ∆φ populations to be uniform.

B. Population model
We use the following joint distribution for the underlying spin distribution π(S|Λ) in Eq. (4): Here, p(χ 1,2 | µ χ , σ 2 χ ) is a Beta distribution in the spin magnitudes, parameterized by its mean µ χ and variance σ 2 χ [59], and p(θ 1,2 | ξ θ , σ θ ) is an isotropic tilt distribution with a Gaussian peak component, parameterized by the standard deviation σ θ of the Gaussian and the mixing fraction ξ θ coming from the Gaussian component [60]. Note that we assume the distributions for the two component BHs are the same for the spin magnitude and tilt. This model for the spin magnitudes and tilts is the same as "Default spin" model described in App.D.1 of Ref.
[9]. On top of this model, we include p(φ 1 | µ φ1 , σ φ1 ) and p(∆φ | µ ∆φ , σ ∆φ ) as independent von Mises distri-butions parameterized by their corresponding mean and standard deviations. Our choices for the hyper-prior π(Λ) (cf. Eq. (3)) imposed on the hyperparameters are described in Tab. S1. The von Mises distribution [40] is defined as where µ is the mean, κ is a shape parameter, and I 0 is the modified Bessel function of order 0. The von Mises distribution is a close approximation of a Gaussian with periodic boundary conditions at φ = ±π, making it an appropriate choice for phase parameters like φ 1 and ∆φ. The variance of the von Mises distribution can be approximated as 1/κ; therefore we define the standard deviation to be σ ≡ 1/ √ κ.

C. Selection effects
Ref.
[9] also distinguishes between between the astrophysical distribution of a parameter-the distribution as it is in nature-and the observed distribution of a parameter-the distribution as it appears among detected events due to selection effects, because of which binaries with certain parameters may be easier to detect than others. These effects can be accounted for by modifying the hyper-likelihood in Eq. (4) as done in Eq. (1) of Ref.
[9] includes selection effects for their mass population models, they are ignored for the "Default spin" model as they are not expected to be significant at current detector sensitivity.
In our case, the mass-reweighted posteriors (cf. Sec. I A) already account for selection effects for the mass population. As we simply extend the spin population model of Ref.
[9] with the φ 1 and ∆φ models, we also ignore selection effects for our spin population model. As noted in Ref.
[9], it will be important to include spin selection effects as detectors sensitivity improves. However, at current sensitivity, assuming spin selection effects are not significant (as done in Ref. [9]), our results can be treated as constraints on the astrophysical distribution rather than the observed distribution.
Additional selection effects can arise from the NRSur7dq4 requirement of M 60M , which restricts us to 31 of the available 46 events in GWTC-2. However, assuming the correlations between the mass and spin population distributions are not significant at current sensitivity [9], imposing an additional selection criterion on the masses should not affect the inferred spin distribution, as the 31 events included in our analysis would be a fair representation of the underlying spin distribution. Note that while Ref. [9] found that such correlations are not significant, this did not include the orbital-plane spin angles. Therefore, a full resolution of this would require extending NRSur7dq4 to longer inspirals; see Ref. [63] for work in this direction. Finally, NRSur7dq4 is also restricted to mass ratios q ≥ 1/6 [37]. However, this restriction does not exclude any additional events, as the only GWTC-2 events with significant support at q 1/6 also have a total mass < 60M [31].

A. Full spin population
In Fig. 2, we only show the population constraints on φ 1 and ∆φ. For completeness, we now show the hyperparameter posteriors and population constraints on the spin magnitudes and tilts in Fig. S1 (for the Jeffreys-σ φ prior) and Fig. S2 (for the Flat-σ φ prior). Our population constraints on the spin magnitudes and tilts are consistent with Ref.
[9], but somewhat broader as we only use 31 of the available 46 binary BH events in GWTC-2. We do not find any clear correlations between the orbital-plane spin angles and the other spin parameters.
The spin magnitude and tilt populations of Ref.
[9] are constrained at f ref = 20 Hz while our constraints are at t ref /M = −100. However, we expect these populations to be similar, as spin tilt measurements at current detector sensitivity are not strongly dependent on the reference point [30]. While the BH spin magnitudes can evolve during the inspiral due to in-falling angular momentum carried by GWs, this is a very small effect (4PN higher than leading angular-momentum loss [64,65]), and is thus safely ignored by current waveform models including NRSur7dq4. The orbital-plane spin angle measurements, on ther other hand, do strongly depend on the reference point [30], which leads to noticeable differences in the population constraints as discussed in Sec. II B.

B. Population constraints at f ref = 20 Hz
The results in Fig. 2 are obtained using NRSur7dq4 spin posteriors at t ref /M = −100 [30]. Ref. [30] also generated NRSur7dq4 spin posteriors at f ref = 20 Hz. In this section, we repeat our hierarchical analysis using these spin posteriors for comparison. Figure S3 Fig.4 of Ref. [30]), the ∆φ populations are also consistent between Fig. S3 and Fig. 2. By contrast, as there is significant improvement in φ 1 measurements for individual events at t ref /M = −100 (cf. Fig.3 of Ref. [30]), the φ 1 population is much better constrained in Fig. 2 compared to Fig. S3. It is important to note that this does not imply that the astrophysical φ 1 distribution is flatter at f ref = 20 Hz compared to t ref /M = −100. Instead, this is because the φ 1 measurements at f ref = 20 Hz are very poor. In fact, even the mild peak near φ 1 ∼ 0 for the Jeffreys-σ φ prior in Fig. S3 is driven entirely by GW190521 [61]. As discussed in Ref. [30], GW190521 is the only event with a good measurement of Fixed φ 1 = π/2, ∆φ = 0 Figure S4. Same as Fig. 2, but for the two mock populations (with 31 events) described in Sec. II C. The left panels correspond to injections with fixed φ1 = −π/2 and ∆φ = ±π at t ref /M = −100, while the right panels have fixed φ1 = π/2 and ∆φ = 0.
The injected values are indicated by star markers. For both mock populations, the true values are reasonably well recovered, but the ∆φ = 0 population is better constrained than the ∆φ = ±π one, especially for the Flat-σ φ prior.
distribution at 90% credibility, at current sensitivity.
To project measurements of the kick population with improved detector sensitivity, it is important to first appreciate that kick measurements for individual events do not depend on the reference point when using the method in Ref. [21]. The kick model NRSur7dq4Remnant [37] takes spins at t ref /M = −100 as input; therefore, if spin measurements are available at f ref = 20 Hz, they are first evolved using the NRSur7dq4 dynamics from f ref = 20 Hz to t ref /M = −100 before computing the kick [37,45]. As discussed in our companion paper [30], this is equivalent to measuring the spins directly at t ref /M = −100. Therefore, by construction, we get the same kick velocity for individual events, independent of the reference point (modulo NRSur7dq4 spin evolution errors, which are small compared to the model errors [37,66]). However, the same logic does not apply at the population level. Because the orbital-plane spin angles are poorly constrained at f ref = 20 Hz for individual events, this information can get diluted at the population level (cf. Fig. S3). Therefore, even if we use the spin population at f ref = 20 Hz and evolve spins drawn from this population to t ref /M = −100, the orbital-plane spin angle information is already lost. On the other hand, at t ref /M = −100, the orbital-plane spin angles are better constrained for both individual events [30] and on the population level (cf. Fig. S3), and this information can lead to improved kick population constraints. For this reason, we expect that as more observations become avail-able, the φ 1 and ∆φ populations will be significantly better constrained at t ref /M = −100, leading to better kick population constraints as well.

C. Mock population study
To test the fidelity of the von Mises model in recovering φ 1 and ∆φ populations, we conduct a mock population study. For each of our 31 events, we pick the maximumlikelihood posterior sample, but we rotate the orbitalplane spin angles and set them to constant values at t ref /M = −100 for all events. We construct two such populations, one with φ 1 = −π/2 and ∆φ = ±π and another with φ 1 = π/2 and ∆φ = 0. We inject the corresponding signals in simulated detector noise and recover the spin population using our hierarchical analysis.
We use the NRSur7dq4 waveform model for the injections as well as the parameter inference (with the LAL-Inference package [62]). The signals are injected into Gaussian noise from a simulated LIGO-Virgo network at design sensitivity; however, we rescale the injected distance such that the SNR matches that of the observed event. Therefore, our mock populations approximately mimic the parameters and detector sensitivity for these events. Figure S4 shows the results from our hierarchical analysis on these mock populations. For both mock populations, the φ 1 and ∆φ distributions show a clear preference for