A Parameter-Free Tour of the Binary Black Hole Population

The continued operation of the Advanced LIGO and Advanced Virgo gravitational-wave detectors is enabling the first detailed measurements of the mass, spin, and redshift distributions of the merging binary black hole population. Our present knowledge of these distributions, however, is based largely on strongly parameteric models; such models typically assume the distributions of binary parameters to be superpositions of power laws, peaks, dips, and breaks, and then measure the parameters governing these"building block"features. Although this approach has yielded great progress in initial characterization of the compact binary population, the strong assumptions entailed leave it often unclear which physical conclusions are driven by observation and which by the specific choice of model. In this paper, we instead model the merger rate of binary black holes as an unknown \textit{autoregressive process} over the space of binary parameters, allowing us to measure the distributions of binary black hole masses, redshifts, component spins, and effective spins with near-complete agnosticism. We find the primary mass spectrum of binary black holes to be doubly-peaked, with a fairly flat continuum that steepens at high masses. We identify signs of unexpected structure in the redshift distribution of binary black holes: a uniform-in-comoving volume merger rate at low redshift followed by a rise in the merger rate beyond redshift $z\approx 0.5$. Finally, we find that the distribution of black hole spin magnitudes is unimodal and concentrated at small but non-zero values, and that spin orientations span a wide range of spin-orbit misalignment angles but are also moderately unlikely to be truly isotropic.

At the same time, this approach has some less desirable downsides.
• First, our chosen functional form prescribes from the very outset the set of possible population features, and so it is not always clear which conclusions come from informative data and which are built by assumption into the models themselves.Parametrized models including sharp features, for example, are prone to "false alarms," favoring the existence of such features even when none exist [14].
• Second, different models may yield very different or even conflicting conclusions if they prescribe different sets of features.This again makes it difficult to conclude which conclusions are robust and which are model-induced.
• Finally, strongly parametrized models allow us to search for "known unknowns" (e.g. is there a pair instability cut-off in the black hole mass spectrum?) but do not let us search for the "unknown unknowns," truly unexpected features that might challenge our astrophysical understanding of compact binary formation and evolution.Several features in the binary black hole population (a peak in the merger rate near 35 M [8], a correlation between binary mass ratio and spin [16,17], etc.), for example, were discovered serendipitously only after a fortuitous choice of model.
In this paper, we will explore an alternative and complementary approach, treating the merger rate of binary black holes as an unknown autoregressive process defined over masses, spins, and/or redshifts.Whereas all other population models entail the use of hyperparameters to specify the dependence of the merger rate on mass, spin, and redshift, under our approach the merger rates at every posterior sample are themselves the quantities that we directly infer from data.This allows us to characterize the compact binary distribution with a high degree of agnosticism, assuming only a prior preference that the merger rate be a continuous function of binary parameters.This approach will allow us to confirm the robustness of features previously identified using standard strongly-parameterized models, as well as identify new features that might otherwise be overlooked.
More specifically, our goals in this work are three-fold: 1. First, we present a flexible measurement of the merger rate densities and probability distributions over binary black hole masses, mass ratios, redshifts, and spins.
2. Inference of the binary merger rate is only a first step.As we discussed further in Sect.VII, a conceptually distinct and equally important step is feature extraction: the subsequent identification of features and assessment of their significance.For stronglyparametrized models, rate inference and feature extraction are by definition performed simultaneously.This is not the case for flexible approaches like spline methods or our autoregressive process, and so the importance of developing techniques for feature extraction is particularly acute.For each binary black hole parameter, we therefore seek to methodically assess feature significance without resorting to hyperparameters, but instead by the calculation and comparison of merger rates between different regions of interest.
3. Finally, our goal is to leverage our autoregressive inference to provide new or extended stronglyparametrized models that reflect our most up-todate understanding of the binary black hole population.This serves two purposes.First, these updated models may provide a robust and accessible point of comparison between theory and observation.And second, these strongly-parametrized models can in turn be adopted in traditional hierarchical analyses, helping to confirm (or reject) possible features that appear in more flexible analyses of the black hole population.
In Sect.II we begin by motivating and defining autoregressive processes as a useful tool in the inference of compact binary populations.In Sects.III through VI, we then apply our method to study the distributions of masses, redshifts, component spins, and effective spins among the binary black hole population.Along the way, we systematically demonstrate feature extraction and discuss new or expanded strongly-parametrized models motivated by our results.We conclude in Sect.VII by commenting on the relationship between flexible and strongly-parameterized models and avenues for future work.
The code used to generate our results and to produce all figures and numbers in the text is hosted on GitHub at https://github.com/tcallister/autoregressive-bbh-inference/, and the data produced by our analyses can be download from Zenodo at https://doi.org/10.5281/zenodo.7616096.
To help make our discussion concrete, consider the problem of measuring the binary black hole primary mass distribution.This amounts to measuring the differential merger rate giving the number dN of mergers per unit comoving volume dV c , per unit source-frame time dt s , and per logarithmic mass interval d ln m.For notational convenience, we will use the shorthand R(θ) ≡ dR/dθ to denote the merger rate density over parameters θ, e.g.
The standard strongly-parameterized approach involves assuming some particular functional form for R(ln m), such as a superposition of power laws, Gaussians, and/or truncations, and then measuring the parameters of these functions [5,[8][9][10][11][12][13]. Stepping back, however, we can think more generally about the merger rate density R(ln m) that we seek to measure.
In nature there exists some underlying function that describes the true mass spectrum of compact binaries; this is illustrated in cartoon form by the dark blue curve in Fig. 1.A priori we know nothing about the exact shape of this function.However, we can still attempt to write down prior assumptions about this function's likely behavior.In Fig. 1, we hypothetically know the merger rate R i at some particular value ln m i .Given this knowledge what is our prior expectation on the merger rate at a new point ln m i+1 ?A reasonable expectation is e N L G g o a j q p r s r S g S 3 4 P v f X m F t f W N z q 7 h d 2 t n d 2 z 8 o H x 6 1 r E 4 N Z U 2 q h T a d i F g m u G J N 4 C B Y J z G M y E i w d j S + m / n t J 2 Y s 1 + o R J g n r S T J U P O a U g J P C r l B Y 9 j N + E U z 7 5 Y p f 9 e f A q y T I S Q X l a P T L X 9 2 B p q l k C q g g 1 o a B n 0 A v I w Y 4 F W x a 6 q a W J Y S O y Z C F j i o i m e 1 l 8 5 O n + M w p A x x r 4 0 o B n q u / J z I i r Z 3 I y H V K A i O 7 7 M 3 E / 7 w w h f i m l 3 G V p M A U X S y K U 4 F B 4 9 n / e M A N o y A m j h B q u L s V 0 x E x h I J L q e R C C J Z f X i W t y 2 p w V a 0 9 1 C r 1 2 z y O I j p B p + g c B e g a 1 d E 9 a q A m o k i j Z / S K 3 j z w X r x 3 7 2 P R W v D y m W P 0 B 9 7 n D 5 a 5 k N A = < / l a t e x i t > ln m i+1 Cartoon demonstrating the physical principle behind our autoregressive inference of the binary hole population.Consider a situation in which we know the differential merger rate R(ln m) at several different masses, up to some ln mi.Next take some new mass, ln mi+1.What prior should we place on the merger rate at this new location?If ln mi and ln mi+1 are close (Scenario 1), then we expect the merger rates at this location to be reasonably close as well; we might therefore place a tight prior on Ri+1 about the value Ri.If, on the other hand, ln mi and ln mi+1 are distant (Scenario 2), the merger rates at each point are unlikely to be related, and so we might place a considerably less informative prior on Ri+1.This intuition is codified by choosing an autoregressive prior on ln R(ln m), such that the merger rate at any mass value has a Gaussian prior about the merger rate at the previous value, with variance related to the separation between points.
that, if ln m i and ln m i+1 are close together (Scenario 1 in the top panel), then the rates at these locations are likely similar as well.In fact, in the limit that ln m i = ln m i+1 , we should recover R i = R i+1 .Conversely, if ln m i and ln m i+1 are far apart (Scenario 2), then the rates at each point need not be similar at all.This intuition forms the basis of an autoregressive process prior.An autoregressive process Ψ(x) is a stochastic function whose value Ψ i at some new point is related to the values at all previous points by Here, the {c j } are deterministic coefficients and w i is a random variable.Qualitatively, the coefficients {c j } govern the degree to which Ψ(x) "remembers" its past values, while the parent distribution of {w i } governs the degree to which the function is allowed to randomly fluctu-ate.The parameter p is called the "order" of the process and determines the smoothness of the resulting functions; an autoregressive process of order p has p − 1 continuous derivatives.Choosing order p = 1 gives us the simplest "AR(1)" autoregressive process, which obeys We can adopt this language as a framework with which to codify our intuition regarding possible merger rate densities, considering the merger rate as function of mass to be of the form where Ψ(ln m) is an autoregressive process in ln m of order p = 1.This implies that, if we know the merger rate R i−1 at one mass location, then we take the rate at a new location ln m i to be probabilistically given by the for some choice of c i and w i (discussed further below).The quantity r sets the mean of this process; it is the departures from ln r that are described via an AR(1) model.Note also that it is the log of the merger rate, not the merger rate itself, that is modeled as an AR(1) process.This guarantees that predicted merger rates are everywhere positive, but has the downside that our inferred merger rate can never strictly go to R = 0 (corresponding to ln R → −∞); see Appendix B. We take c i and w i to be of the form and where ∆ i = ln m i − ln m i−1 is the distance between mass locations and n i is a random variable drawn from a unit normal distribution: n i ∼ N (0, 1).The parameter σ functions to rescale the random variable n i and thus controls the allowed variance of the merger rate.The parameter τ , meanwhile, defines the mass scale over which the mass spectrum remains significantly correlated with itself.In the limit that ∆ i τ , Eq. ( 6) demands that ln R i → ln R i−1 .And in the opposite limit that ∆ i τ , we instead have ln R i drawn randomly from N (ln r, σ), with no memory of earlier merger rate values.The exact forms of Eqs.(7) and ( 8) are chosen to ensure that σ 2 and τ indeed control the variance and autocorrelation length of the process; see Appendices A and B for more information about these expressions.
Figure 2 illustrates several random AR(1) processes Ψ(x), generated with various choices of τ and σ.Processes with large τ (top panel) exhibit much stronger correlations between adjacent points, yielding larger observed scale lengths than those with smaller τ (bottom panel).Processes with large σ, meanwhile, traverse a much larger vertical range than processes with small σ.Note that these functions are continuous, but do not have well-defined first derivatives.If we wanted to instead consider functions with continuous first derivatives, we could instead adopt "AR(2)" processes of order p = 2.We continue with an AR(1) process, however, in order to better capture any sharp or non-differentiable features in the binary black hole population that could be missed by models that require continuous derivatives.

B. Hierarchical Inference with an Autoregressive Prior
Consider a set of N obs gravitational-wave detections with sets of posterior samples {λ I } on the properties

FIG. 2. Examples of various autoregressive processes Ψ(x).
Each curve is random draw from an AR(1) process, subject to different autocorrelation lengths τ and standard deviations σ; see Eqs. ( 4), (7), and (8).The top panel shows example autoregressive processes with large τ , while the bottom panel illustrates two processes with short τ .In Secs.III through IV below, we model the mass, redshift, and spin distributions of binary black holes assuming they are each describable as unknown AR(1) processes.
(9) Here, the product is taken over detected events I and the expectation value is taken over posterior samples j for each event.The quantity p pe (λ I,j ) is the prior probability assigned to each posterior sample under parameter estimation, while is the detector frame merger rate density, to be evaluated at each posterior sample.We use semicolons to indicate that R d (λ; Λ) is a function of the population model Λ but not a density over Λ.Note also that R d (λ; Λ) is not a volumetric density, as in Eq. (1).If redshift z is a parameter in λ such that R d is a merger rate per unit redshift then Eqs. ( 1) and (10) are related by where λ is the set of all binary parameters excluding z and R( λ; z, Λ) is the volumetric merger rate density as evaluated at redshift z.The factor dVc dz is the differential comoving volume per unit redshift, while the factor (1 + z) −1 is needed to convert between source frame and detector frame times.
Equation ( 9) additionally depends on N exp (Λ), the expected number of detections over our observation time T obs given the population Λ.We evaluate N exp (Λ) using a set of successfully recovered signals injected into LIGO and Virgo data [5,34].If p inj (λ) is the reference probability distribution from which these injections were drawn, then [35] where N inj is the total number of injections performed, detected or otherwise, and T obs is our total search time.
The detector frame rate R d (λ inj,i ; Λ) at the location of each injection can once again be related to the underlying volumetric rate using Eq. ( 11).The critical ingredients underlying Eqs. ( 9) and ( 12) are the differential rates R(λ I,j ; z, Λ) and R(λ inj,i ; z, Λ) at the locations of every posterior sample and every found injection.In the usual strongly-parametrized approach, we obtain these quantities by assuming some functional form for R(λ; z, Λ).Here, our goal is to not assume a particular functional form for the differential rate, but to directly infer the merger rate at every posterior sample and every found injection using our autoregressive prior.In adopting this approach, we have rid ourselves of (nearly) all ordinary hyperparameters.Instead, the merger rates at every posterior sample and every injection are themselves the parameters that we directly infer from the data.This is a rather large-dimensional parameter space.If we have N obs events (each with N samp posterior samples) and N inj injections, we are directly inferring the binary merger rate at N obs N samp + N inj discrete locations.The form of Eq. ( 6), however, imposes an almost equally large number of constraints, ensuring that the inference problem remains tractable.
We haven't quite discarded all hyperparameters: we do still need to determine the variance σ and autocorrelation length τ associated with our autoregressive rate prior.Rather than fix σ and τ , we hierarchically infer them as well, allowing our data to dictate the characteristic length scale and size of features present in the binary black hole population.In Appendix B we derive and discuss the priors we place on σ and τ .To obtain physically meaningful priors, we approach the problem indirectly, considering not constraints on σ and τ themselves but instead on allowed variations in the black hole merger rate R(λ); these choices then induce priors on σ, τ , and the ratio σ/ √ τ .
It is worthwhile to compare our methodology to other flexible approaches appearing in the literature.One similar approach is the spline-based method appearing Refs.[18][19][20]; this model proceeds by first defining merger rates over a discrete set of "knot" locations and then constructing a spline interpolant between these knots.The rates at each knot location may themselves be linked via a Gaussian process prior or other regularization schemes [19].The "binned Gaussian process" models in Refs.[5,[21][22][23][24][25] operate similarly.In this approach, a merger rate is again defined across a discrete grid of points and interpolated, but now assuming that the merger rate is a piecewise constant between grid points rather than a spline.
A primary methodological difference between these approaches and ours is that we perform no interpolation: the parameters governing our models are the direct merger rates at each point of interest, rather than the rates defined over some reference grid.Our autoregressive model also behaves in ways that make it complementary to these other approaches.Because spline interpolants are continuous in their derivatives, the spline-based approaches above are suitable for identifying smooth trends in the data, but may struggle to resolve sharp features or features at the same scale as the knot locations.The continuity and differentiability imposed by spline models can additionally sometimes give rise to oscillatory "ringing" that depends on the precise choice of knot locations [20,36].Our AR(1) model requires no differentiability, however, avoiding this oscillatory behavior.The lack of a reference grid also means that we require no a priori choice of scale.This freedom, however, greatly boosts the computational cost of our approach and does give rise to possible instabilities in the hierarchical likelihood; this instability is described in Appendix B. And even with the flexibility afforded by an AR(1) process, there do remain limitations on the degree to which our model can recover discontinuously sharp features; see Appendix C for further discussion.
We implement our autoregressive model using jax [37] and numpyro [38,39], which enable compilation and auto-differentiation of our likelihood.We perform our Bayesian inference using numpyro's implementation of the NUTS ("No U-Turn Sampler") algorithm [40], a variant of Hamiltonian Monte Carlo (HMC) sampling [41].As noted above, our autoregressive models actually comprise a vast number of latent parameters: one per posterior sample and found injection.In practice, this amounts to ≈ 2.5 × 10 5 parameters for the analyses presented in this paper.Given this extremely highdimensional space, the computational acceleration and sampling efficiency afforded by auto-differentiation and HMC methods is critical.Further details regarding our hierarchical inference, including the exact data and priors used, are given in Appendix B.

III. STOP ONE: MASSES
We first use our autoregressive model to investigate the distribution of binary black hole primary masses m 1 and mass ratios q.We consider the merger rate to be the combination of two parallel autoregressive processes, Ψ(ln m 1 ) and Φ(q), that capture the dependence of the merger rate on both ln m 1 and q: R(ln m 1 , q, χ 1 , χ 2 , cos θ 1 , cos θ 2 ; z) = r e Ψ(ln m1) e Φ(q) 1 + z We fit for both Ψ(ln m 1 ) and Φ(q) simultaneously, allowing each process to possess its own variance and autocorrelation length.
While our focus in this section is mass distribution of binary black holes, when measuring the population distribution of any one parameter it is usually important to simultaneously fit the distributions of other parameters like spin magnitudes χ i , spin-orbit misalignment angles θ i , and redshifts.There is no fundamental reason why the distributions over all binary parameters cannot be fit as simultaneous autoregressive processes.Since each additional AR(1) process introduces a fairly high computational cost, however, for simplicity we will fit the "leftover" redshift and spin distributions by falling back on ordinary parametrized models.We assume that the merger rate evolves as (1 + z) κ for some unknown index κ [42], and that component spins are independently and identically distributed with a probability distribution p(χ 1 , χ 2 , cos θ 1 , cos θ 2 ) of the form given in Appendix D; these redshift and spin distributions are hierarchically fit alongside our autoregressive mass model.Note finally that our model in Eq. ( 13) is presumed to be separable, with no correlations between the masses, mass ratios, and spins of binary black holes.
The top panel of Fig. 3 shows our autoregressive measurement on the merger density rate of binary black holes as a function of primary mass, evaluated at q = 1 and z = 0.2 and marginalized over spin degrees of freedom.Each blue trace shows a single posterior sample for R(ln m 1 ),2 while the thick and thin black curves mark a running median and central 90% credible bounds, respectively.We note that our presentation of the mass spectrum, conditioned on some particular reference values of mass ratio and redshift, is slightly unusual; it is more common to show a mass distribution that has been fully marginalized over other parameters.When marginalizing a merger rate over one or more parameters, however the result can show extreme systematic dependence on the exact model presumed for these marginalized parameters, particularly across regions of parameter space that are not well-measured.An extreme example can be found in Ref. [5], in which the fully-marginalized binary neutron star merger rate can vary by two orders of magnitude depending on the mass model used.Our approach in this paper will be to minimize such systematics by instead quoting differential merger rates at well-measured locations in parameter space (e.g.q = 1 and z = 0.2); this approach maximizes precision and best enables comparison to predictions between observation and theory.
Returning to Fig. 3, we see three possible features in the black hole primary mass spectrum: The binary merger rate appears to be maximized at m 1 ≈ 10 M primary masses, falling off with both lower and higher primary masses.We can quantify the significance of this feature by computing the fraction of posterior samples that exhibit a systematic peak in this neighborhood.To do so, we compute and compare the average merger rates across three bins: 7.5−9 M , 9−11 M , and 11−13.5 M (chosen to have roughly equal logarithmic widths).We regard a "peak" as a case when the averaged merger rate in the middle interval is higher than the averaged merger rates in both adjacent bins.As shown in the top panel of Fig. 4, we find that 96% of our samples meet this criterion and exhibit a systematic peak near 10 M .

2.
A local maximum at m 1 ≈ 35 M .We can again quantify the significance of this feature by comparing the average rates across three bins: 20 − 28 M , 28 − 40 M , and 40 − 55 M .As shown in the middle panel of Fig. 4, 94% of our posterior draws yield higher averaged merger rates in the 28 − 40 M range than in both adjacent bins.Thus both the 10 M and 35 M maxima have roughly equal significance; although neither are unambiguously required by the data, both are favored to exist at greater than 90% credibility.
3. Steepening of the continuum above 40 M .Between the 10 M and 35 M maxima is a large, relatively flat continuum.Above the 35 M maximum, the continuum appears to steepen, falling off more rapidly with increasing mass.We quantify the evidence for this steepening by computing and comparing the mean power-law slope of the black hole merger rate above and below the 35 M maximum.From each posterior sample we extract the merger rates R(ln m 1 ) near 15, 25, 45, and 85 M ; these are then used to compute the power-law indices characterizing the middle and high end of the mass spectrum: Autoregressive Power-Law + Peak Eq. 16

FIG. 3. Top:
The binary black hole merger rate as a function of primary mass, inferred non-parametrically under an autoregressive prior.The merger rate is evaluated at mass ratio q = 1, redshift z = 0.2, and marginalized over spins, following the model defined in Eq. ( 13).The solid black trace marks the mean inferred R(ln m1) as a function of m1, while the lighter black traces bound our central 90% credible bounds.Individual posterior draws on R(ln m1) are shown via light blue traces.Three features naturally emerge in the inferred mass distribution: a global maximum in the merger rate at m1 ≈ 10 M , a secondary maximum at m1 ≈ 35 M , and an otherwise smooth continuum that steepens above 40 M .Each of these three features is exhibited by approximately 90% of posterior draws.Bottom: A comparison between our autoregressive inference (blue band) and results obtained using the strongly-parametrized PowerLaw+Peak model in Ref. [5] (red).Rates are again evaluated at q = 1, z = 0.2, and marginalized over spin.Each band encompasses the central 90% credible region inferred using the given model.Both approaches give consistent estimates of the merger rate at m1 ≈ 10 M , as well as the merger rate in the 30 − 70 M interval.In order to match these rates, though, we see that the parametrized model is forced to overestimate the merger rate between 15 − 30 M , as well as the merger rate below 10 M .Furthermore, our autoregressive model shows no indication of a sharp cutoff in the binary mass distribution at or above 80 M (this feature is included a priori in the strongly parametrized model).A simple fit to our median inferred rate, using the parametric form of Eq. ( 16), is shown via the black dotted curve. and We write R(25 M ), for example, to indicate the average merger rate in a 1 M window about 25 M .Using window-averaged rates in this fashion enables more reliable estimates of representative power-law indices, due to the rapid oscillations exhibited by individual R(ln m 1 ) traces.The joint distribution of both power-law slopes is plotted in the lower panel of Fig. 4. In the 15 − 25 M interval, we find an average power-law index , while in the 45 − 85 M range we find α high = −3.8+2.6 −2.7 .We identify a preference for steepening, with α high < α mid , in 89% of samples, although this behavior is not strictly required by the data.
The significances of the 10 M and 35 M peaks, as computed here, are similar to but more conservative than significance estimates presented elsewhere.A stronglyparameterized analysis presented in Ref. [5] identify a 35 M excess at effectively 100% credibility, and an anal- ysis in the same study using splines to measure deviations from an ordinary power-law finds upward fluctuations at 10 M and 35 M with > 99% credibility (see also Refs.[18,19]).Ref. [36] alternatively explores the frequency with which apparent peaks might arise purely from random counting statistics, due to our stillmoderate number of binary black hole detections.By repeatedly drawing realizations of 69 events from a peakless power-law population, they find the observed 10 M and 35 M peaks to be more statistically significant than > 99% of false peaks arising from random clustering.The difference between these significance estimates and ours is likely two-fold.First, these significance estimates test slightly different features; an upward fluctuation relative to a power-law does not necessarily indicate a local maximum, but can also be caused by a plateau or change in slope.Second, by virtue of its extreme flexibility, our autoregressive prior likely maximizes the variance in our R(ln m 1 ) measurements, diminishing slightly our confidence in any given feature.We note that our assessment of feature significance does not depend on the particular choice of reference mass ratio and redshift adopted in Fig. 3; different reference values would rescale each merger rate by a hyperparameter-dependent constant, which cancels when subsequently taking ratios between rates as in Fig. 4.
In addition to the 10 M and 35 M maxima, other studies have noted the possible existence of other features in the primary mass spectrum, namely additional maxima or minima in the 15−25 M range [5,12,19,43].We do not see evidence for any such features here, however.This indicates that, with current data, any additional features are likely prior-dependent and consistent with random clustering of a still small number of observations.Refs.[5,18,19,36] note a somewhat significant dip in the mass spectrum, relative to a powerlaw, near 14 M .We interpret this result not as a local minimum, but just as a flattening of the power law index at lower masses, as seen in Fig. 3 and discussed further below.Additionally, various studies have searched for the presence of a high-mass cutoff in the black hole mass spectrum [5,8,9,11,23,24,[44][45][46], due possibly to the occurrence of pair-instability supernova [47].In Ref. [5], for example, it is inferred that if such a cut-off exists, then it must occur at m 1 > 78 M at 95% credibility.Our analysis, however, shows no indication of a cutoff in the black hole mass spectrum, instead recovering a distribution that remains smoothly declining out to m 1 ≈ 100 M .Note that the slight increase in the variance of R seen near 100 M marks a reversion to the prior in the region m 1 100 M where we have little data.
It is valuable to compare our autoregressive results to measurements made using standard stronglyparameterized models, in order to identify regions where strongly-parameterized models may fail to capture features in the data and to guide the iterative development of improved models going forward.In the bottom panel of Fig. 3, we compare our autoregressive model of the binary merger rate (blue) with results obtained under PowerLaw+Peak model [9] presented in Ref. [5].Both models identify an excess of mergers near 35 M , and both measure approximately consistent merger rates near 10 M .We see two signs of tension, however.First, the PowerLaw+Peak model otherwise adopts a single unbroken power law; in order to match the merger rate at both low (10 M ) and high (≥ 30 M ) masses, it is therefore forced to overestimate the merger rate in the 15 − 30 M range.This is consistent with the downward perturbation identified by spline-based methods in this region [5,18,19,36]; this downward perturbation may not be caused by a local minima in the mass spectrum, but just a flattening of the power law index at lower masses.
In cases where a strongly-parameterized phenomenological model is needed, our autoregressive result suggests that a sufficient choice is a model comprising two Gaussian peaks and a broken power-law, with a probability density Here, we use N (m 1 |µ, σ) to signify a normalized Gaussian distribution with mean µ and standard deviation σ, and Γ(m 1 ) to denote a broken power law tapered towards zero at low masses: with a proportionality constant chosen to enforce Γ(m 1 )dm 1 = 1.A least-squares fit against our mean inferred ln R(ln m) gives best-fit parameters The corresponding distribution p(ln m 1 ) = p(m 1 )m 1 is shown as a dotted line in Fig. 3.Note that this fit approximates the fully-marginalized primary mass distribution, and is thus valid at any choice of q, z, etc.Compared to the primary mass distribution, we resolve relatively little information about the distribution 5. Top: The merger rate of binary black holes as a function of mass ratio, evaluated at m1 = 20 M , z = 0.2, and integrated over possible spins, following Eq.( 13).The thick and thin black lines mark the mean and central 90% bounds on R(q), while thin blue traces show individual draws from our posterior on R(q).We see a preference for an increasing merger rate as a function of q, but this behavior is not strictly required.Bottom: Comparison between R(q) as inferred by our autoregressive model (blue) and the strongly-parametrized analysis of Ref. [5] (red), which assumes a power-law dependence on q with a truncation in the merger rate below q = mmin/m1 for some minimum mass mmin.The two results are broadly consistent, although under our autoregressive model we find reduced evidence for a merger rate that increases with larger q.
of black hole mass ratios.The top panel of Fig. 5 illustrates our constraints on R(q), evaluated at z = 0.2, m 1 = 20 M , and integrated over component spins.The only feature that manifests in Fig. 5 is a possible preference for larger q.As above, we can compare integrated merger rates in two bands, 0.5 ≤ q ≤ 0.6 and 0.9 ≤ q ≤ 1, to quantify the significance of this feature.We find that the merger rate the high-q interval is greater than the rate in the low-q interval for 90% of samples, such that the binary black hole population likely favors equal mass ratios.
In the lower panel of Fig. 5 we compare our results with the strongly-parametrized measurements presented in Ref. [5] using the PowerLaw+Peak model, in which the mass ratio distribution is modeled as a power-law with a primary-mass-dependent truncation: p(q|m 1 ) ∝ S(q; m 1 ) q βq .( Here, S(q; m 1 ) is a tapering function that sends p(q|m 1 ) to zero when q < m min /m 1 for some m min .Both results are again evaluated at z = 0.2, m 1 = 20 M , and integrated over black hole spins.Other than the truncation below q ≈ 0.2 (imposed in Ref. [5] as an a priori modeling choice), both sets of results are broadly consistent.
In the strongly-parameterized analysis of Ref. [5], it is found that β q > 0 with 92% credibility, comparable to our significance estimate above. 3  IV.STOP TWO: REDSHIFTS Next, we investigate the redshift distribution of binary black holes.In most analyses, the redshift dependence of the binary black hole merger rate is presumed to follow a power-law form: R(z) ∝ (1 + z) κ for some index κ [5,8,42,46,48,49].Under this model, it has been concluded that the binary black hole merger rate systematically grows with redshift at a rate consistent with star formation in the local Universe [5].Here, we will instead model the redshift dependence of the black hole merger rate as an autoregressive progress, searching for any features that might be missed under a more stronglyparameterized approach.We simultaneously measure the mass and component spin distributions by falling back on the "strongly-parameterized" models described in Appendix D. Together, our model is of the form The left panel of Fig. 6 shows our resulting inference on the binary merger rate as a function of redshift, evaluated at m 1 = 20 M and q = 1 and integrated across spins.Blue traces show individual draws from our posterior, the solid black curve marks the running median rate, and thin grey lines denote central 90% credible bounds on the merger rate at each redshift.The right panel of Fig. 6 compares these results (in blue) to the results obtained in Ref. [5] using the strongly-parameterized power-law model for the black hole merger rate.Both approaches yield consistent estimates of the merger rate at z ≈ 0.3 and z ≈ 1, but our autoregressive result suggests that the intervening evolution is not necessarily well-modeled by a power law.Instead, our result is consistent with a "sigmoid" shape displaying the following features: 1.A non-evolving, uniform-in-comoving-volume rate below z ≈ 0.4.At the lowest redshifts, the data do not require the merger rate to evolve with redshift.Instead, our autoregressive results are consistent with a rate that 3 Within Ref. [5], the right-hand panel of Fig. 10 appears to be in tension with this significance estimate, instead showing a measurement of R(q) (marginalized over all m 1 ) that unambiguously increases as a function of q.The behaviour in Fig. 10 is actually due to the presumed truncation in the mass ratio distribution, rather than a confident measurement of positive βq.Since the overall merger rate is highest at small m 1 , the structure of the marginalized R(q) must correspondingly be dominated by the mass ratio distribution at small m 1 .The truncation in p(q|m 1 ), however, enforces that q ≈ 1 when m 1 is small.This combination of effects requires R(q) to be maximized at q = 1 after marginalization over m 1 , nearly independently of βq.
remains constant out to z ≈ 0.4.To gauge the significance of this feature, we compute and compare the mean merger rates in two intervals: 0.1 < z < 0.2 and 0.3 < z < 0.4.As shown in the left panel of Fig. 7, we find these mean rates to be consistent with one another, with the mean rate in the 0.3 < z < 0.4 interval exceeding the rate in the 0.1 < z < 0.2 interval only 53% of the time.
2. A rise in the merger rate between z ≈ 0.4 and 0.8.Beyond redshift z ≈ 0.4, however, we do find a requirement that the merger rate rise by up to an order of magnitude by z ≈ 0.8.We quantify the significance of this rise by comparing the mean merger rate between 0.1 < z < 0.2 to the mean rate between 0.7 < z < 0.8.As shown in the right panel of Fig. 7, the mean rates in these high-and low-redshift intervals are confidently unequal, with the 0.7 < z < 0.8 merger rate exceeding the 0.1 < z < 0.2 rate 93% of the time.Beyond redshift z ≈ 1, the absence of informative data causes our measurement to asymptote back towards the autoregressive prior, yielding expanding error bars towards higher redshifts.
Other studies employing flexible non-parameteric analysis have also obtained results indicating a possible tension with a (1 + z) κ power law.Ref. [50] explored the use of population models composed of "Green's-functions"like delta functions as a tool with which to diagnose the performance of strongly-parameterized models.They find the likelihood to be maximized when R(z) is modeled as a sequence of delta functions that initially decrease in height below z ≈ 0.13, followed by an elevated but flat merger rate between 0.2 z 0.5 that then Autoregressive (1 + z) κ Eq. 22 FIG. 6. Left: The binary black hole merger rate as a function of redshift, inferred non-parametrically using an autoregressive process prior.The merger rate is evaluated at a primary mass m1 = 20 M , mass ratio q = 1, and integrated over black hole spins.Light blue traces show individual draws from our posterior, while the black and grey curves denote a running median and central 90% credible bounds, respectively.Right: A comparison between our non-parametric result (blue) and the result obtained in Ref. [5] when assuming that the merger rate evolves as (1 + z) κ for an unknown index κ.Both bands denote 90% credible bounds.We see that both approaches recover similar merger rates at z ≈ 0.3 and z ≈ 1, and both indicate that the black hole merger rate systematically grows with redshift.Our autoregressive result, however, suggests that this growth may not be well-modeled by a power law, but instead by a slowly growing or constant merger rate that begins to evolve more sharply only beyond z 0.4.The dashed black curve, for example, shows the result of a simple least-squares fit to our median inferred merger rate using the sigmoid model defined in Eq. (22).A broken power law, as in Eq. ( 21), also yields a good fit at z 0.8.Left: A comparison between the mean merger rate across the interval 0.3 < z < 0.4 and the mean rate across 0.1 < z < 0.2.Each point corresponds to a single posterior draw from Fig. 6.All estimates cluster around the diagonal, indicating that the merger rates in both intervals are consistent with one another.The data are therefore consistent with a non-evolving merger rate below z 0.4.Right: An analogous comparison between the mean merger rate in the interval 0.7 < z < 0.8 and the mean rate within 0.1 < z < 0.2.The merger rate in the high-redshift interval is greater than that in the low-redshift interval for 96% of samples, indicating a preference for a merger rate that grows at large redshifts.more sharply rises between 0.5 z 0.75; see their Fig. 5.Other than the initially decreasing merger rate, which we do not recover, these results are consistent with the behavior we see in Fig. 6.Ref. [19], in turn, measured the redshift-dependent merger rate using a set of basis splines to capture deviations from a (1 + z) κ power law.They too recover a largely constant merger rate density below z ≈ 0.4, followed by a steeper increase in the merger rate out to z ≈ 1; see their Fig. 8.
If real, the step-like structure in the redshift-dependent merger rate could arise from a variety of effects.The redshift-dependent merger rate R(z) is generally modeled by convolving an estimate of the metallicity-dependent cosmic star formation rate with a distribution of time delays between progenitor formation and binary merger; the time delay distribution is itself typically modeled as a power law.The resulting merger rate is also usually welldescribed by a power-law at low redshifts.If the observed binary black hole population is dominated by a single formation channel, the possible non-power law behavior in Fig. 6 could indicate additional non-trivial structure in the birth rate or time delay distribution of binary progenitors.Alternatively, the observed binary population could comprise a mixture of several distinct formation channels.A shift from a flat to an evolving merger rate at z ≈ 0.5 could mark a transition between two formation channels, one of which dominates low-redshift mergers and the other of which takes over at larger redshifts.If a mixture between formation channels is the correct explanation of Fig. 6, then we should also expect to see systematic evolution in other intrinsic properties of binary black holes between low and high redshifts.Although no such evolution has been found in the binary black hole mass spectrum [46,49], the binary black hole spin distribution does potentially evolve with redshift, with the effective inspiral spin (further discussed in Sect.VI below) becoming larger and more positive at higher z [51].Additional observations will be critical in confirming the trends identified in Fig. 6 and in Ref. [51] and in probing any relationship between these two trends.
When a parametric model is required, our autoregressive results suggest that one might replace the standard power-law model with a broken power law: with a transition between power-law indices κ 1 and κ 2 occurring at z = z b , or a sigmoid, in which the merger rate density rises from R 0 to R 0 +δR across an interval of width δz around a transition redshift z b .A least-squares fit to our median ln R using Eq. ( 21) gives A fit using Eq. ( 22), in turn, gives this fit is shown as a dotted curve in Fig. 6.As our autoregressive results begin to revert to the prior above z = 1, these fits are performed only in the restricted range z ≤ 0.8.Recall that the above fits describe the merger rate per ln m 1 per unit q evaluated at m 1 = 20 M and q = 1, not the fully-integrated merger rate.If the full binary black hole merger rate, integrated over all masses, is desired, this can be fit with the same functional forms: or in Eq. ( 26).

V. STOP THREE: COMPONENT SPINS
Next, we turn to the distribution of spins among binary black hole systems.A black hole binary is characterized by six spin degrees of freedom, three per component spin.Assuming that component spins have no preferential azimuthal orientations (although see Ref. [52]), we work in a reduced four-dimensional space and fit for the distributions of component spin magnitudes, χ 1 and χ 2 , and (cosine of the) spin-orbit tilt angles, cos θ 1 and cos θ 2 .We assume that the variation of the merger rate across spin magnitudes and tilts is described via two autore-gressive processes, Ψ(χ) and Φ(cos θ), with the two component spins in a given binary distributed independently and identically.As we measure Ψ(χ) and Φ(cos θ), we simultaneously infer the mass and redshift distributions of the binary black hole population by falling back on or-dinary "strongly-parametrized" models, assuming a primary mass and mass ratio distributions f (m 1 ) and p(q) as described in Appendix D and a merger rate density that grows as (1 + z) κ .Together, our full merger rate model is of the form Figure 8 shows our autoregressive measurements of the black hole spin magnitude and tilt distributions.We plot our results in two ways.First, the upper row shows merger rates as a function of spin magnitude and orientation.The upper left panel shows the merger rate of binaries along the χ 1 = χ 2 = χ diagonal at fixed reference mass, mass ratio, redshift, and spin tilts (m 1 = 20 M , q = 1, z = 0.2, and cos θ 1 = cos θ 2 = 1); using Eq. ( 29), this is given by Similarly, the upper right panel shows the merger rate along the cos θ 1 = cos θ 2 = cos θ diagonal at fixed spin magnitudes (χ 1 = χ 2 = 0.1) and the same reference masses and redshift: R(cos θ 1 , cos θ 2 = cos θ) = r e Ψ(0.1) 2 e Φ(cos θ) 2 .
(31) We choose to plot results along the χ 1 = χ 2 and cos θ 1 = cos θ 2 diagonals to mitigate systematic modeling uncertainties, in much the same way that we plot merger rates conditioned on specific values of other parameters rather than marginalizing over them.The rate of black hole mergers as a function of χ 1 only (marginalized over χ 2 ), for instance, is strongly affected by assumptions regarding spin pairing which tend to differ widely across the literature.For better comparison with other work, however, in the lower row we also show the implied probability distributions on individual component spin magnitudes and tilts.Since we assume that component spins are independently and identically distributed, these are given by and Note that Eqs. ( 30) and ( 31) are proportional to the squares of Eqs. ( 32) and (33), respectively.
From Fig. 8, we can make the following parameter-free statements regarding the binary black hole spin distribution: 1.The binary black hole merger rate is maximized at low spin magnitudes.As in Sect.III, we can evaluate the robustness of this statement by comparing mean merger rates in different intervals.We find, for example, that our inferred rate of mergers with 0 ≤ χ ≤ 0.2 is greater than the rate of mergers across 0.6 ≤ χ ≤ 0.8 for each of our 4500 posterior samples on R(χ).In Fig. 9, we additionally show the ensemble of cumulative distribution functions corresponding to our posterior on p(χ) from Fig. 8.We find the 50th percentile to occur at χ 50% = 0.21 +0.07 −0.07 , such that half of black holes have spin magnitudes below χ 0.2.The p(χ) distribution shown in Fig. 8 furthermore suggests that the spin magnitude distribution may actually peak near χ ≈ 0.2; the recovered mean (shown in black) rises slightly in this region and our the upper bound on p(χ) is elevated between 0.2 χ 0.25.Neither of these features are statistically significant though; only 66% of traces give larger integrated probability in the 0.15 χ 0.35 interval than in the 0 χ 0.15 interval.Thus the spin magnitude distribution is consistent with a peak global maximum at χ ≈ 0.

2.
No special features at χ = 0 or χ = 1.Despite a preference for small spins, the binary black hole population shows no special preference for non-spinning or maximally spinning black holes.The lack of any discernible features at χ = 0 or χ = 1 is in tension with common assumptions in the population synthesis of compact binaries [53][54][55]: that efficient angular momentum transport yields isolated black holes born with very small (e.g.χ 0.1) [56,57] or vanishing (χ 0.01) natal spin magnitudes [58].This should yield a sharp excess of low or non-spinning systems in the binary black hole spin distribution.Meanwhile, if some fraction of mergers arise from isolated stellar binaries, then late time tidal spinup of the second-born black hole's progenitor can override otherwise efficient angular momentum loss, yielding a secondary sub-population of black holes with spins up to χ ≈ 1 [53,54,59,60].The existence of these predicted features has been the subject of much scrutiny.Some studies initially found that gravitational-wave data  The merger rate of binary black holes as a function of component spin magnitudes (left) and spin-orbit misalignment angles (right), as inferred using our autoregressive model defined in Eq. ( 29).In the axes labels, we use the shorthand d χ1 ≡ dχ1d cos θ1 to indicate a density over both spin magnitude and cosine tilt.Specifically, the rates shown are that of binaries with equal component spin magnitudes (χ1 = χ2 = χ; see Eq. ( 30)) or tilts (cos θ1 = cos θ2 = cos θ; Eq. ( 31)), each evaluated at fixed reference masses and redshift (m1 = 20 M , q = 1, and z = 0.2).The bottom panels show the corresponding probability distributions on component spin magnitudes and tilts among black hole binaries.Within each panel, the central black curve marks the mean inferred rate/probability, while outer black curves bound 90% credible intervals.We see that spin magnitudes are well-described by a unimodal distribution that peaks at low values, with no sign of an excess of non-spinning (χ = 0) or near-maximally spinning black holes.Meanwhile, the rate of binary mergers is non-zero across the full range of misalignment angles, with a spin-tilt distribution that is possibly (but not necessarily) isotropic.While there also appears to be a possible excess of black holes with cos θ ≈ 0.4, this feature is not statistically significant.
were consistent with [61] or even required [62] 4 two distinct populations: a "spike" at χ = 0 comprising the majority of the binary population and a secondary broad sub-population centered at χ ≈ 0.5 and possibly extending to large spins.Follow-up investigations, though, have concluded instead that the data remain agnostic about 4 A subsequent erratum [63] diminished the initial evidence in Ref. [62] distinct non-spinning and spinning sub-populations, bringing their conclusions into closer agreement with those of Refs.[5,14,15,64] these features [5,14,15,64].In our Fig. 8, we see no indication of an excess of non-spinning systems, nor do we see any feature suggesting a sub-population of rapidly spinning black holes.There may exist a small number of rapidly spinning black holes; as illustrated in Fig. 9 we infer the 95th percentile of the spin magnitude distribution to occur at χ 95% = 0.72 +0.17 −0.25 .We emphasize, however, that there is no observational evidence that these systems comprise a physically distinct sub-population, and not simply an extended tail of a single predominantly low-spin population.Consistent results have also been found when alternatively using splines to flexibly model Cumulative distribution functions of binary black hole spin magnitudes (top) and cosine tilt angles (bottom), corresponding to the probability distributions shown in Fig. 11.For reference, we mark median estimates of the 50th and 95th percentiles in the spin magnitude distribution, occurring at χ 50% = 0.21 +0.07 −0.07 and χ 95% = 0.72 +0.17 −0.25 , respectively.We also indicate the measured 50th percentile of the cos θ distribution, occurring at cos θ 50% = 0.16 +0. 28 −0.17 , with cos θ 50% > 0 at 88% credibility.
3. The merger rate is non-zero at χ = 0. Despite no excess of systems with vanishing spin, the the binary black hole merger rate is confidently non-zero at χ = 0.This is in conflict with commonly-used parametric models that assume component spins follow nonsingular Beta distributions [5,8,62,65], which by definition require that p(χ) = 0 at χ = 0; see Fig. 11 and further discussion below. 5The fact that the spin mag- 5 Sometimes singular Beta distributions are also allowed.Singular Beta distributions give p(χ) → ∞ as χ → 0, which is also precluded in Fig. 8.
nitude is non-zero at χ = 0 may have implications for the processes by which black holes acquire their spins.
If black holes acquire their spins via stochastic or incoherent isotropic processes (e.g.random bombardment by gravity waves soon before core collapse [66,67] or statistically isotropic fallback accretion), then the spin magnitude distribution should have a Maxwellian-like form p(χ) ∝ χ 2 near χ = 0.The fact that this is not seen suggests instead that black hole spins originate instead from longer-lived or directionally-coherent processes [68,69].
3. Black holes exhibit a broad range of spin-orbit misalignment angles.As illustrated in the upper-and lowerright panels of Fig. 8, we infer a non-zero merger rate across the full range of cos θ.Using our autoregressive constraints on R(cos θ), we estimate that 41 +9 −17 % of black hole spins are misaligned by more than 90 • with respect to binaries' orbital angular momenta, and that the rate of mergers with at least one component spin tilted by θ > 90 • is 17.0 +10.7 −7.4 Gpc −3 yr −1 .Past studies using strongly-parametrized models have also concluded that the binary black hole population exhibits significant spin-orbit misalignment [5,8,14].The results presented here, obtained under our highly agnostic and parameterfree autoregressive model, corroborate these conclusions.
4. A perfectly isotropic distribution is moderately disfavored.As seen Fig. 8, both the merger rate R(cos θ) and probability distribution p(cos θ) have a tendency to increase towards positive cos θ.In Fig. 9 we show the corresponding cumulative distribution of cos θ and the inferred median cos θ among the black hole population.We find this median to be cos θ 50% = 0.16 +0. 28 −0.17 , with cos θ 50% > 0 for 88% of our posterior samples (the mean value of cos θ is also positive at comparable credibility).These results somewhat disfavor a purely isotropic component spin distribution, although isotropy cannot yet be ruled out.
5. A possible excess of systems with cos θ ≈ 0.4?As identified in Ref. [70], we also see a possible excess of black holes with cos θ ≈ 0.4.We find that although this feature is possible, it is not required by the data.Following our procedure from Sect.III above, we can evaluate the significance of the cos θ ≈ 0.4 peak by asking what fraction of posterior samples give a higher mean probability in a window centered on the peak than in windows at both higher and lower cos θ values.Shown in Fig. 10, only 37% of samples are consistent with a peak at cos θ ≈ 0.4.While the probability distribution of spin tilts is very likely to rise between cos θ ≈ −0.1 and cos θ ≈ 0.4, few samples exhibit the subsequent drop necessary for a peak.
Figure 11 compares our flexible autoregressive inference with results from the strongly-parameterized Default model [65] presented in Ref. [5].In this model, component spin magnitudes are independently and identically drawn from a Beta distribution, while spin tilts are drawn from a mixture between isotropic and preferentially-aligned sub-populations.The two approaches generally yield similar conclusions, with some  8.For each probability distribution p(cos θ) in Fig. 8, we show the ratios between the mean probability in the window 0.05 < cos θ < 0.75 (centered on the possible peak) and the mean probabilities across adjacent windows at smaller and larger cos θ.When a peak is present, both ratios should be greater than one, corresponding to the upper right quadrant.We find that only 37% of posterior samples fall in this quadrant, indicating that the cos θ ≈ 0.4 peak is not statistically significant.
notable exceptions.First, as noted above, the Default model is defined such that p(χ) is necessarily zero at χ = 0. Our autoregressive results indicate that this is likely not the case; we infer a non-zero rate/probability density of mergers with χ = 0 (although no excess of such mergers as one might expect if isolated black holes have vanishing natal spins).Second, strongly-parametrized approaches typically require the cos θ distribution to be either isotropic or peaked at cos θ = 1.As illustrated in e.g. the bottom right-hand panel of Fig. 11, the data tell a more complicated story, with a possible (albeit statistically insignificant) feature at intermediate cos θ values.See Ref. [33] for further investigations of this feature.
Finally, it is instructive to compare the behavior of R(cos θ), in top right panel of Fig. 11, with that of p(cos θ), in the bottom right.Studies of the black hole spin distribution sometimes include the following seemingly inconsistent statements: (i ) that an isotropic cos θ distribution is disfavored but cannot be ruled out, and (ii ) that our knowledge of p(cos θ) is accurately reflected e.g.via the red band in the lower-right panel of Fig. 11. Figure 11, though, seems to show unambiguously that p(cos θ) is an increasing function of cos θ, in conflict with the first of the two statements above!The resolution to this tension is involves the fact that we directly measure R(cos θ), not the normalized probability distribution p(cos θ).Although spin isotropy is disfavored, it is evident in Fig. 11 that a flat R(cos θ) cannot yet be fully ruled out.The renormalized probability density p(cos θ) can inadvertently obscure this fact: Al-though there may exist many distinct posterior samples which yield isotropic R(cos θ) (e.g.flat traces at different vertical positions within the red or blue bands), each of these possibilities is mapped to the same function, p(cos θ) = 1/2, upon normalization.Hidden behind the "uncertainty bands" in the lower-right panel of Fig. 11 is thus a very uneven density of possibilities, with a high number of individual draws stacked directly on p(cos θ) = 1/2.Because the uncertainty bounds do not communicate this density, the result is a figure that appears to indicate an unambiguous measurement of anisotropy.In order to avoid this counterintuitive behavior, we recommend that measurements of the cos θ distribution be shown as both constraints on the probability density p(cos θ) and the merger rate R(cos θ).
While our autoregressive model makes minimal physical assumptions, it does still impose a degree of continuity in the merger rate as a function of χ and cos θ.It is therefore reasonable to ask if our conclusions above are still being driven by modeling assumptions, rather than informative data.This is particular critical when interpreting our conclusions regarding the lack of sharp features near χ ≈ 0; is our non-detection of such features significant, or do they fall outside the coverage of our model?In Appendix C, we conduct a mock data challenge to test the ability of our autoregressive model to recover a sharp excess of non-spinning black holes.Although the resolution of our results is at times limited by the processes' finite scale length τ , we find that we can successfully identify narrow excesses or bimodalities in the merger rate arising from an population of non-spinning systems, should it exist.
A related question is the degree to which we can trust extended tails appearing in our autoregressive measurements of the spin-dependent merger rate.As also demonstrated in Appendix C, our autoregressive process never go completely to zero, as this would correspond to ln R → −∞.Consequently, are the tails in Fig. 8 towards large χ and negative cos θ physically meaningful, or do they arise from our prior modeling assumptions?Within Appendix C we find that, in the absence of observations, the recovered merger rates asymptotically approach a value corresponding to N exp 1 total expected detections (integrated across the region of interest).We can leverage this behavior to gauge the extent to which tails in our χ and cos θ distributions are prior-or likelihood-dominated. Specifically, we use our posteriors on R(χ i ) and R(cos θ i ) to compute expected detection rates at large χ and small cos θ and identify the threshold spin magnitude and tilts beyond which we expect fewer than N = 2 component spins to arise in our sample; these values mark the boundaries beyond which our results are likely prior dominated.This calculation is described in more detail in Appendix E, and accounts also for the influence of selection effects on the observed distribution of binary parameters.
Marginalizing over our posteriors on the spin magnitude distribution, we find an expectation value of N ≤  p(cos θ) Autoregressive "Default" Model Eq. 32 FIG.11.A comparison of the binary black hole spin distributions inferred using our autoregressive model (blue) and that recovered by a strongly parameterized approach (red, the Default model of [5]).As in Fig. 8, the top row shows the binary merger rate as a function of component spin magnitude and spin-orbit tilt angle, at fixed m1, q, and z, while the lower row shows the corresponding probability distributions.Component spins are assumed to be independently and identically distributed.Overall, there is good reasonable qualitative agreement between both sets of results; each recovers similar merger rates across the range of cos θ values and for 0.1 χ 0.3.At the same time, the autoregressive results indicate that the merger rate remains finite for both smaller and larger spin magnitudes, whereas the parametric model requires a priori that the vanish as χ → 0 and χ → 1.
2 detections with at least one component spin magnitude χ ≥ 0.92.Thus our recovered spin magnitude distribution is likely prior-dominated at χ ≥ 0.92.Similarly, we find that our results on average predict fewer than two detections with cos θ ≤ −0.96.This implies that our posterior on the spin tilt distribution is likelihood dominated across nearly the full range of cos θ values.One might find more conservative thresholds by instead identifying values beyond which fewer than N = 4 detections are predicted; these occur at χ ≥ 0.85 and cos θ ≤ −0.91.
When a standard strongly-parameterized model is required, we find that our autoregressive measurement of p(χ) is well-fit by a truncated Gaussian or a truncated Lorentzian, with normalization and p(cos θ) by a mixture between isotropic and Gaussian components, where N [−1,1] (cos θ|µ, σ) indicates a truncated Gaussian normalized on the interval −1 ≤ cos θ ≤ 1. Equation ( 36) is the same as the Default spin-tilt distribution [65], but with a freely varying mean as advocated in Ref. [70].A least-squares fit of our results to Eqs. ( 34) and ( 36) yields best-fit parameters χ 0 = 0.15 γ = 0.18 f iso = 0.67 µ = 0.59 σ = 0.58 .
These fits describe marginal probability distributions, and are therefore valid at any choice of m 1 , q, and z.

VI. STOP FOUR: EFFECTIVE SPINS
Although component spin magnitudes and spin-orbit misalignment angles have clear physical interpretation, they are particularly difficult to measure using gravitational waves.Easier to directly measure are various effective spins: derived parameters that, while less physically interpretable, more directly govern a gravitational wave's morphology.These effective parameters include the ef-fective inspiral spin [71,72], and the effective precessing spin [73], χ eff quantifies the degree of spin projected parallel to a binary's orbital angular momentum, while χ p approximately quantifies the degree of in-plane spin (and hence more directly controls the degree of spin-orbit precession).Although χ eff and χ p are less manifestly physical than the component spin magnitudes and tilts (much like the relationship between a binary's chirp mass and component masses), they do act as signposts by which to identify categorical features of the compact binary spin distribution.Negative χ eff , for example, can arise only if one or both component spins is inclined by more than 90 • with respect to their orbit.Non-zero χ p , meanwhile, can manifest only if a system has at least some in-plane spin, such that sin θ > 0.
Just as we have applied our autoregressive model to non-parametrically infer the component spin magnitude and tilt distributions, we can use our autoregressive model to measure the distribution of these spin parameters.If Ψ(χ eff ) and Φ(χ p ) are autoregressive functions of χ eff and χ p , respectively, then our merger rate model will be of the form where we again fall back on parametric models for the dependence of the merger rate on binary masses and redshift.Note that, while we are describing a binary's spin configuration in turns of χ eff and χ p , binary spin is fundamentally six-dimensional.Our choice to work in a reduced two-dimensional space requires that we assume some distribution for the remaining four degrees of freedom, even if that assumption is implicit.In defining Eq. ( 40), we indirectly assume that the remaining spin degrees of freedom follow their default parameter estimation priors (uniform spin magnitudes and isotropic directions), conditioned on χ eff and χ p .
Figure 12 shows our inference of the χ eff and χ p distributions of binary black holes.As in Fig. 8 above, the upper row shows the inferred merger rate as a function of χ eff (with fixed χ p ; left) and χ p (with fixed χ eff ; right).Both rates are evaluated at a fixed reference primary mass, mass ratio, and redshift.The bottom row, meanwhile, shows the corresponding normalized proba-bility distributions of each effective spin parameter.From Fig. 12 we draw the following conclusions: 1.The merger rate is non-zero for χ eff < 0. Consistent with the results of Sect.V, we find a non-zero merger rate for binaries with χ eff < 0, suggesting the presence of component spins misaligned by more than 90 • with respect to their orbital angular momenta.We find that 27 +17 −14 % of binary black holes have negative χ eff , and that integrated merger rate of binaries with negative effective spin is 7.7 +8.2 −4.3 Gpc −3 yr −1 .These estimates are comparable to those presented in Ref. [5], which concluded using a strongly-parameterized model that 29 +15 −13 % of binaries exhibit negative χ eff .12. Top: The merger rate of binary black holes as a function of effective inspiral spin (χ eff , left) and effective precessing spin (χp, right), as inferred using our autoregressive model.The rates shown are each evaluated at fixed reference masses and redshift (m1 = 20 M , q = 1, and z = 0.2).The bottom panels show the corresponding probability distributions on each effective spin parameter.Within each panel, the central black curve marks the mean inferred rate/probability, while outer black curves bound 90% credible intervals.Effective inspiral spins exhibit a unimodal distribution.The center of this distribution prefers to be at positive χ eff , but with a non-zero merger rate at χ eff < 0. The χp distribution, meanwhile, preferentially peaks toward χp = 0 but with a shoulder that extends to moderate/large precessing spins.
grated merger rate between 0 ≤ χ eff ≤ 0.1 than between −0.1 ≤ χ eff ≤ 0. Similarly, the median χ eff is inferred to be positive for 98.2% of samples. 6.The binary black hole distribution exhibits non-zero χ p .Consistent with the measurement of a range of cos θ values, we find that the black hole merger rate extends across a wide range of χ p .The percentage of binaries with χ p > 0.2, for example, is 42 +35 −32 %.We compare our autoregressive measurements to previous strongly-parametrized population measurements in Fig. 13.Blue bands show central 90% credible intervals on the rates and probability distributions of χ eff and χ p under our autoregressive model, while red bands show results using obtained when modeling the χ eff − χ p as a bivariate Gaussian in order to measure the mean and standard deviation of each quantity.Our autoregressive R(χ eff ) measurement is, in fact, in reasonable agreement with a Gaussian model, although with extended tails to χ eff 0.4 and χ eff −0.4.Both the autoregressive and Gaussian χ p models, in turn, yield similar merger rates at χ p ≈ 0.1, although the Gaussian model appears to vanish too quickly as χ p → 0 or 1.
As in the component spin case above, it is valuable to explore whether these extended tails in our χ eff and χ p distribution are due to informative data or to the continuity imposed by our autoregressive model.We will once again estimate the regions in which our results are prior dominated by identifying the threshold χ eff and χ p Autoregressive Gaussian Eq. 30 Autoregressive Gaussian Eq. 30 FIG. 13.A comparison of the binary black hole effective spin distributions inferred using our autoregressive model (blue) and that recovered by a strongly parameterized approach (red, the Gaussian spin model of [O3b]).As in Fig. 12, the top row shows the binary merger rate as a function of effective spin parameters, χ eff and χp (at fixed m1, q, and z), while the lower row shows the corresponding probability distributions.The merger rates recovered by each approach agree well in the −0.1 χ eff 0.2 and 0.1 χp 0.3 ranges, beyond which the Gaussian rates fall to zero much more quickly than our autoregressive inference.
values beyond which our posteriors predict fewer than N = 2 detections.We find that fewer than two detections are expected at χ eff ≤ −0.35, at χ eff ≥ 0.53, and at χ p ≥ 0.89.More conservative thresholds are set by identifying regions beyond which N ≤ 4 detections are expected; these occur for χ eff ≤ −0.17, χ eff ≥ 0.35, and χ p ≥ 0.78.For the case of χ eff , we accordingly conclude that much of the disagreement between the Gaussian and autoregressive models at very positive and very negative values is due to differing prior assumptions.For χ p , however, we conclude that the extended tail in R(χ p ) to large χ p is due to informative data, not simply an extrapolation from the prior.
When a strongly-parameterized model is needed for p(χ eff ), we find our autoregressive result to be wellapproximated by a truncated Gaussian with mean and standard deviation µ = 0.07 σ = 0.09 (41) or a truncated Lorentzian (see Eq. ( 34)) with χ 0 = 0.07 γ = 0.07; (42) this latter fit is shown as a dotted line in the lower-right panel of Fig. 13.Similarly p(χ p ) can be approximated by either a truncated Gaussian or Lorentzian, with best-fit parameters and respectively, the latter of which is shown in Fig. 13.As in previous sections, these fits are valid at any choice of m 1 , q, or z.

VII. DISCUSSION
In this paper, we have developed and demonstrated a novel means of measuring the population properties of merging binary black holes.By describing the black hole merger rate as a stochastic process, we hierarchically inferred the black hole mass, redshift, and spin distributions without resorting to strongly-parameterized models that a priori assume some particular structure.The advantage of highly flexible models like autoregressive processes is two-fold.They allow us to agnostically study the "known unknowns," like theoretically-predicted features in the black hole population, but also reveal the "unknown uknowns" -unexpected and impactful features that may otherwise be missed by standard stronglyparameterized approaches.
We accordingly searched for expected and unexpected features alike in the distributions of binary black hole masses, redshifts, and spins.Our results reiterated known features in the black hole mass spectrum (peaks at approximately 10 and 35 M ), but also revealed more nuanced structure like an additional steepening of R(m 1 ) towards high masses.We found signs of unexpected structure in the redshift distribution of binary black holes, recovering a merger rate that prefers to remain flat at low redshifts followed by steeper growth at z 0.5.And our autoregressive results offered a direct and model agnostic look at the black hole spin distribution, revealing features like severe spin-orbit misalignment and a unimodal spin magnitude distribution that have previously been controversial.
A challenge that arises when using flexible models is how exactly to translate results (e.g.our posterior on R(m 1 )) into statements about physical features and their significances.We find it useful to conceptually distinguish between two steps: (i ) data fitting and (ii ) feature extraction.When performing hierarchical inference with strongly-parameterized models, these two steps are accomplished simultaneously.A clear example is the Power Law+Peak model for R(m 1 ), whose parameters directly encode the location, width, and height of a possible Gaussian peak.Fitting the Power Law+Peak model to data, therefore, automatically extracts information about the feature of interest.When using highly flexible models, on the other hand, data fitting and feature extraction are necessarily distinct.Although hierarchically fitting our autoregressive model yields, for instance, the mass spectrum shown in Fig. 3, this result offers no immediate information about the presence and/or significance of possible features.Instead, we need to visually inspect our results and devise further tests or summary statistics to make any quantitative statements about the features we see.A major focus of ours has accordingly been the use of parameter-free summary statistics, like the ratios of merger rates in adjacent bins, to identify and characterize the features summarized above.These parameter-free techniques for feature extraction can be employed for any model, and additionally offer a means of directly comparing results obtained under two or more different models (strongly-parameterized or not).
While highly flexible models like ours enable a very agnostic exploration of the compact binary population, we do not necessarily advocate for replacing standard strongly-parameterized models.Instead, we envision using both strongly-parameterized and flexible models in a cyclic development process: flexible models enable the identification of possible new features, which are followed up and characterized using targeted stronglyparameterized models, whose validity is finally re-checked with flexible models as new data become available.In the spirit of this cyclic development, in each section above we have offered refined strongly-parameterized models that capture the range of features identified in our autoregressive results.
One limitation of the autoregressive model employed here the fact that it fundamentally one dimensional.Although we can simultaneously measure the dependence of the merger rate on different binary parameters, each with its own autoregressive process, this approach cannot capture any intrinsic correlations among parameters.As strongly-parameterized models begin to identify possible correlations between binary parameters [16,51], flexible population models that can operate in higher dimensions will be critical in following up these results and agnostically identifying new correlations.Some alternative approaches, like spline-based [18,19] or binned [21,22] models can be very easily extended to more than n = 1 dimension, but likely become computationally infeasible when n becomes large.Future work will involve the exploration of multi-dimensional stochastic processes as tools with which to measure the merger rate across the complete higher-dimensional space of binary black hole parameters.
6. Finally, apply the mean: Ψ i = Ψi + ln r To enable efficient sampling, the Ψ i , w i , and c i are generated following non-centered approaches; we directly sample in Ψi and n i and then transform to the actual parameters of interest.Once the complete set Ψ i of log-merger rates is generated, the sorting performed to obtain Eq. (A9) can be reversed to repartition Ψ i back into the merger rates across individual events' posterior samples and found injections.
In some cases the merger rate ln R(θ) is not well measured at the lowest θ in our set of samples, but at some intermediate value.The merger rate as a function of mass, for example, is much better constrained near m ≈ 25 M than at the very lowest masses m 5 M .In this case, sampling efficiency is maximized by not initializing our autoregressive process at its left-most point (as in Step 3 above), but instead initializing the process in the middle of our parameter range, near the best-measured rate.In this case, Steps 4-6 above are just repeated twice, once to generate forward steps to the right of our reference point, and once to generate backward steps to the left of the reference point.
trast, are explicitly designed to allow for rapid variations in the merger rate, and so are subject to this instability.In particular, our autoregressive inference is most susceptible to runaway oscillatory behavior when the processes' variance becomes large and the autocorrelation length becomes small.This instability can be regulated by placing suitable priors on the parameters governing the autoregressive process.To motivate physically meaningful priors, it is useful to think about how an autoregressive process is allowed to vary between two points.Consider autoregressive process Ψ(x) with zero mean, variance σ 2 , and autocorrelation length τ defined across some parameter x.Also let δΨ = Ψ 2 − Ψ 1 to be the difference in the process' values between two points x 1 and x 2 (separated by δx = x 2 − x 1 ).The expectation value of δΨ is since, by definition, the process has zero mean.The expectation value of δΦ 2 , though, is nonzero: using Eqs.(A6) and (A7) for the variance and covariance of an autoregressive process.It is helpful to consider Eq. (B2) in two different limits.In the limit δx τ , In the opposite limit lim δx τ More generally, we can show that in each of the above limits δΨ 2 is χ 2 -distributed with k = 1 degrees of freedom.Recall that Ψ 1 and Ψ 2 are related by where n ∼ N (0, 1) is drawn from a unit normal distribution.Then (B6) First consider the δx τ limit.Expanding to lowest order in δx/τ , where the first term is subdominant to the second in the limit of small δx/τ .We therefore have such that, by definition, this quantity is chi-squared distributed with one degree of freedom: Its corresponding value is compare to Eq. (B3) above.In the opposite limit, where δx τ , Eq. (B6) becomes Note that Ψ 1 is itself drawn from a unit normal distribution (see Appendix A).We can therefore write where m ∼ N (0, 1) is another normally distributed random variable.The difference n − m is itself a Gaussian random variable with zero mean and standard deviation √ 2, and so where n = (n − m)/ √ 2 ∼ N (0, 1) now follows a unit normal distribution.Hence is also chi-squared distributed with one degree of freedom and with mean δΨ 2 2σ 2 = 1; (B15) compare to Eq. (B4).All together, We use Eq.(B16) to motivate physical priors on σ 2 and τ .First, we might expect the log merger rate to vary by no more than δΨ max over the full parameter space.Let ∆x be the full extent of the parameter space and assume that ∆x τ , such that we are in the second case in Eq. (B16).Then our expectation is that for a chi-squared distributed random variable q ∼ χ 2 (1).
In particular, we might assert that δΨ 2 max /2σ 2 occurs at the 99th percentile q 99 of the chi-squared distribution.The cumulative distribution of a χ 2 (1) distribution is the regularized gamma function P (1/2; q/2), and so q 99 occurs at q 99 = P −1 (1/2; 0.99) ≈ 3.32. (B18) Inserting into Eq.(B17) and solving for σ, we have We therefore choose a prior on σ to enforce Eq. (B19).Specifically, we adopt a half-Gaussian prior on the range 0 ≤ σ ∞.The variance Σ 2 σ of this prior is chosen so that 99% of our prior weight occurs below the threshold set by Eq. (B19).Specifically, Next, consider how we expect our autoregressive process to vary on small scales.In particular, let δΨ event be the maximum variation we expect in the log rate on the typical inter-event distance scale δx ≈ ∆x/N , where N is the number of events in our catalog.This distance scale is likely smaller than the autoregressive process' correlation length τ , and so we now use the first inequality in Eq. (B16), demanding that for q ∼ χ 2 (1).We proceed as above, using Eq.(B22) to define the 99th percentile of q and rearranging to obtain the limit We impose this limit by adopting another half-Gaussian prior on the ratio σ/ √ τ , with Σ r chosen such that 99% of the prior weight occurs below the limit set by Eq. (B23): We can also place limits on the expected length scale τ of our autoregressive processes.We generally expect the merger rate to be smoothly varying across the parameter space of interest, but also want a prior that will nevertheless allow for rapid, small-scale variations should they be demanded by the data.We accordingly place an unbounded Gaussian prior on ln τ : This prior is centered at ln(∆x/2), and we set Σ ln τ by considering the minimum length scale that can be meaningfully constrained by N detections.In particular, the data contain no information about features on scales smaller than the minimum spacing between events.If we consider randomly placing N events across an interval of width ∆x, the spacing δx between events will be exponentially distributed: with a cumulative distribution normalizing over the range 0 ≤ δx ≤ ∆x.With N events, we expect the minimum spacing δx min in our sample to probe the 1/N quantile of this exponential distribution.Setting F = 1/N and inverting Eq. (B28) then gives this is the minimum length scale we expect to probe with N events.We choose Σ ln τ such that 95% of our prior lies above ln δx min : So far, all our discussion has concerned priors on the variance and lengthscale of an autoregressive process.We also need to consider a prior on the mean of the process, denoted above as ln r.For all analyses, we place a loga-TABLE I. Priors governing the various autoregressive process models used in this paper.For each physical parameter, we give our choices for the scale values characterizing the priors defined in Appendix B: the domain width δx, the maximum variation δΨmax in the log merger rate, the maximum inter-event variation δΨevent in the log rate, and the number N of measurements considered.We specifically give e δΨmax and e δΨ event , so that that the quantities listed in the table are directly interpretable as merger rate variations.Also note that while the majority of priors use N = 69 (the number of events in our sample), we take N = 138 when analyzing the component spin magnitude and tilt distributions, as each binary contributes two component spins to our sample.Given these parameter choices, we also show the derived quantities directly appearing in Eqs.(B20), (B24), and (B26).
In Sect.II, we discussed how the likelihood is approximated via a weighted Monte Carlo average over ensembles of posterior samples; see Eq. (9).Similarly, the expected number of detections was evaluated by a Monte Carlo average over a set of successfully found injections; see Eq. (12).Both approximations break down if these averages become dominated by a very small number of posterior samples or found injections.One metric for gauging the health of Monte Carlo averaging is the effective sample number.Define w i (Λ) = R d (λ inj,i ; Λ)/p pe (λ inj,i ) to be the weights appearing in the calculation of N exp (Λ) in Eq. (12).The effective number of samples informing this calculation is given by In order for systematic uncertainty in N exp = (Λ) to remain a subdominant effect in our hierarchical analysis, it is necessary that N inj eff (Λ) 4N obs [35,85].Similarly, define w I,j (Λ) = R d (λ I,j ; Λ)/p pe (λ I,j ) to be the weights defined in Eq. ( 9) over the posterior samples j of each event I.The number of effective posterior samples in-forming each event's likelihood is then In particular, a useful metric is the minimum number of effective posterior samples, min I N samp eff,I (Λ), taken over events I. Healthy inference generally requires min I N samp eff,I (Λ) 1 [86].For each of our analyses, we monitor N inj eff and min N samp eff for signs of poor effective sample counts; see Figs. [16][17][18][19] in Appendix E. Although potential population models with low effective sample counts are largely discouraged by the above priors on σ, ln τ , and σ/ √ τ , we further prevent our inference from exploring models with pathologically low effective samples by severely penalizing proposed populations with N inj eff < 4N obs and/or min log 10 N samp eff < 0.6.Specifically, we define the function (B34) this asymptotes to unity when x is large and falls to zero when x approaches zero.We then add the terms ln S(N inj eff /4N obs ) + ln S(min log 10 N samp eff /0.6) to the loglikelihood implemented in numpyro.These send the loglikelihood towards −∞ when either of the above conditions are violated.

Appendix C: Demonstrations on Known Populations
We demonstrate the machinery developed in Appendices A and B by injecting and recovering a set of known distributions.This is useful in verifying that our methodology works as expected, and in diagnosing any limitations in the performance of our autoregressive model or Eq.(D3) IV,V,VI N(0, 4) µ χ Eq. (D4) III,IV U(0, 1) σ χ Eq. (D4) III,IV LU(0.1, 1) σ u Eq. (D5) III,IV U(0.3, 2) κ Eq. (D6) III,V,VI N(0, 5) shifts (Sect.IV), modeling each set of distributions in turn as autoregressive processes.To accurately measure the population distribution of any one parameter, it is generally necessary to simultaneously fit for the distributions of other parameters.Therefore, wherever we focused on modeling a specific subset of parameters using autoregressive models, we concurrently fit the remaining parameters using simple strongly-parametrized models.The priors used for each of the following models are given in Table II.
Our parametric mass model assumes that primary masses are drawn from a mixture between a power-law and a Gaussian peak, with possible tapering at low and high masses.This is a variant of the Power Law+Peak model first defined in Ref. [9] and used in depth in Refs.[5,8].Specifically, define (D1) to be the superposition of a power law and Gaussian; the former is normalized between 2 M ≤ m 1 ≤ 100 M with spectral index λ, while the latter is centered at mean µ m with standard deviation σ m .The parameter f p controls the relative contribution of each component.Our with a mean fixed to 1 but a standard deviation σ u measured from the data.This parameteric model is used in Sects.III and IV when targeting binary masses and redshifts with our autoregressive prior.Finally, when targeting binary masses or spins, we revert to a standard parametric redshift model in which the comoving merger rate density grows as R(θ; z) ∝ (1 + z) κ . (D6) The observed detector-frame merger rate per unit redshift correspondingly grows as where the additional factor of (1 + z) −1 converts between source-and detector-frame rates.This parametric model is adopted in Sects.III, V, and VI when non-parametrically measuring the black hole mass and spin distributions.

Appendix E: Additional Posteriors and Predictive Checks
In this appendix, we show a few more supplemental results that may be useful in assessing the bheavior and performance of our population inference with an autoregressive model.
As discussed elsewhere, our autoregressive process models are characterized by hyperparameters σ and τ controlling the variance and autocorrelation length of the (log) merger rate.show the posteriors obtained on these parameters for each autoregressive model used in the main text.For diagnostic purposes, these figures also show quantities related to the effective number of samples informing our inference.Included in each corner plot is the distribution of effective injections per observation, N inj eff /N obs and the distribution of effective posterior sample counts, minized across events.See Appendix B for further details regarding these diagnostics.
We furthermore assess the validity of our results using posterior predictive checks, comparing distributions of observed binary parameters to distributions predicted by our fitted models.Catalogs of "observed" and "predicted" parameters are generated by resampling individual event posteriors as well as the set of injections used elsewhere to compute N exp (Λ).This process proceeds as follows [87]: 1. Randomly draw a sample Λ i from our posterior on the population parameters Λ (this includes the variables characterizing strongly-parametrized models, as well as latent parameters defining any autoregressive processes in use).
2. For each observed event, with posterior samples {λ j }, compute weights w j = R d (λ j |Λ i )/p pe (λ j ), the ratios between the detector-frame distribution defined by Λ i and the parameter estimation prior p pe (λ).
3. From each observed event, draw a single sample λ j with draw probabilities proportional to the weights {w j }.The resulting set {λ} defines a single catalog of "observed" parameters consistent with our data.
5. Draw N obs values λ inj,j with draw probabilities proportional to the above weights.The resulting set {λ inj } defines a single catalog of "predicted" parameters consistent with the proposed population Λ i and with appropriate selection effects.6. Sort the "observed" and "predicted" catalogs, and plot against one another.
Figure 20 shows the result of this algorithm using the population model adopted in Sect.III, in which the primary mass and mass ratio distributions of binary black holes are described as autoregressive processes.For convenience, the dotted black lines mark the diagonals along which "observed" and "predicted" values are equal.A systematic deviation from this diagonal would indicate tension between our model (and its corresponding predictions) and our observed data.In Fig. 20, all traces are systematically clustered around the diagonal, with no indication of systematic mismodeling.similarly show predictive checks on the autoregressive models used in Sects.IV-VI to describe distributions of binary redshifts, component spins, and effective spins; these also indicate good agreement between our autoregressive models and observed data.
In addition to checking model validity, the catalogs of "predicted" detections enable other related calculations regarding the locations in which detections are expected (or not expected) to arise.In particular, we saw in Appendix C that autoregressive processes can exhibit priordominated tails in regions with no data, and that the onset of these tails can be characterized by identifying regions in which fewer than one detection is predicted by the fitted model.We can accordingly use the ensembles of "predicted" detections appearing in Figs.20-23 to assess the statistical significances of tails appearing in our autoregressive results, calculating threshold values above or below which fewer than N detections are predicted by our fitted models, on average.The results of this calculation are quoted in Sects.V and VI in the main text, when discussing the robustness of our recovered spin distributions.As a diagnostic, we also include the effective number of injections per observed event and the effective number of posterior samples informing the per-event likelihood, minimized over the 69 events in our sample.
t e x i t s h a 1 _ b a s e 6 4 = " 8 r U u T a C W o w E O b + I H 4 s o C W H X J a o U = " > A A A B / 3 i c b V D L S g M x F L 3 j s 9 b X q O D G T b A I d V N m p K j L o h u X V e w D O qV k 0 k w b m s k M S U Y o Y x f + i h s X i r j 1 N 9 z 5 N 2 b a W W j r g c D J O f e S k + P H n C n t O N / W 0 v L K 6 t p 6 Y a O 4 u b W 9 s 2 v v 7 T d V l E h C G y T i k W z 7 W F H O B G 1 o p j l t x 5 L i 0 O e 0 5 Y + u M 7 / 1 Q K V i k b j X 4 5 h 2 Q z w Q L G A E a y P 1 7 E O P C + S F W A 8 J 5 u n d p J z d w 9 O e X X I q z h R o k b g 5 K U G O e s / + 8 v o R S U I q N O F Y q Y 7 r x L q b Y q k Z 4 X R S 9 B J F Y 0 x G e E A 7 h g o c U t V N p / k n 6 M Q o f R R E 0 h y h 0 V T 9 v Z H i U K l x 6 J v J L K m a 9 z L x P 6 + T 6 O C y m z I R J 5 o K M n s o S D j S E c r K Q H 0 m K d F 8 b A g m k p m s i A y x x E S b y o q m B H f + y 4 u k e V Z x z y v V 2 2 q p d p X X U Y A j O I Y y u H A B N b i B O j S A w C M 8 w y u 8 W U / W i / V u f c x G l 6 x 8 5 w D + w P r 8 A Q o e l X o = < / l a t e x i t > ln R(ln m) < l a t e x i t s h a 1 _ b a s e 6 4 = " 8 r U u T a C W o w E O b + I H 4 s o C W H X J a o U = " > A A A B / 3 i c b V D L S g M x F L 3 j s 9 b X q O D G T b A I d V N m p K j L o h u X V e w D O q V k 0 k w b m s k M S U Y o Y x f + i h s X i r j 1 N 9 z 5 N 2 b a W W j r g c D J O f e S k + P H n C n t O N / W 0 v L K 6 t p 6 Y a O 4 u b W 9 s 2 v v 7 T d V l E h C G y T i k W z 7 W F H O B G 1 o p j l t x 5 L i 0 O e 0 5 Y + u M 7 / 1 Q K V i k b j X 4 5 h 2 Q z w Q L G A E a y P 1 7 E O P C + S F W A 8 J 5 u n d p J z d w 9 O e X X I q z h R o k b g 5 K U G O e s / + 8 v o R S U I q N O F Y q Y 7 r x L q b Y q k Z 4 X R S 9 B J F Y 0 x G e E A 7 h g o c U t V N p / k n 6 M Q o f R R E 0 h y h 0 V T 9 v Z H i U K l x 6 J v J L K m a 9 z L x P 6 + T 6 O C y m z I R J 5 o K M n s o S D j S E c r K Q H 0 m K d F 8 b A g m k p m s i A y x x E S b y o q m B H f + y 4 u k e V Z x z y v V 2 2 q p d p X X U Y A j O I Y y u H A B N b i B O j S A w C M 8 w y u 8 W U / W i / V u f c x G l6 x 8 5 w D + w P r 8 A Q o e l X o = < / l a t e x i t > ln R(ln m) < l a t e x i t s h a 1 _ b a s e 6 4 = " E 4 u j W R w X 9 I O r 2 E R z p + M s o 2 Y y z 2 E = " > A A A B 8 n i c b V D L S g N B E J y N r x h f U Y 9 e B o M g C G F X g n o M e v E Y w T x g E 8 L s Z D Y Z M o 9 l p l c I S z 7 D i w d F v P o 1 3 v w b J 8 k

FIG. 4 .
FIG.4.Tests quantifying the significances of features identified in Fig.3.Top: Ratios between the average merger rate across 9 M < m1 < 11 M and in adjacent lower-and higher-mass intervals.If a peak is present near 10 M , both ratios should be greater than one; this is true for 96% of our samples.Middle: Similarly, ratios between the average merger rate across 28 M < m1 < 40 M and in adjacent bands.A peak near 35 M is present in 94% of samples.Bottom: The implied power law indices characterizing the 15-25 M and 45-75 M intervals (α mid and α high , respectively).We find that 89% of samples show a steepening in the mass spectrum, with α high < α mid .

R 0.1<z<0. 2 [
FIG. 7.Left: A comparison between the mean merger rate across the interval 0.3 < z < 0.4 and the mean rate across 0.1 < z < 0.2.Each point corresponds to a single posterior draw from Fig.6.All estimates cluster around the diagonal, indicating that the merger rates in both intervals are consistent with one another.The data are therefore consistent with a non-evolving merger rate below z 0.4.Right: An analogous comparison between the mean merger rate in the interval 0.7 < z < 0.8 and the mean rate within 0.1 < z < 0.2.The merger rate in the high-redshift interval is greater than that in the low-redshift interval for 96% of samples, indicating a preference for a merger rate that grows at large redshifts.

FIG. 8 .
FIG. 8.Top: The merger rate of binary black holes as a function of component spin magnitudes (left) and spin-orbit misalignment angles (right), as inferred using our autoregressive model defined in Eq. (29).In the axes labels, we use the shorthand d χ1 ≡ dχ1d cos θ1 to indicate a density over both spin magnitude and cosine tilt.Specifically, the rates shown are that of binaries with equal component spin magnitudes (χ1 = χ2 = χ; see Eq. (30)) or tilts (cos θ1 = cos θ2 = cos θ; Eq. (31)), each evaluated at fixed reference masses and redshift (m1 = 20 M , q = 1, and z = 0.2).The bottom panels show the corresponding probability distributions on component spin magnitudes and tilts among black hole binaries.Within each panel, the central black curve marks the mean inferred rate/probability, while outer black curves bound 90% credible intervals.We see that spin magnitudes are well-described by a unimodal distribution that peaks at low values, with no sign of an excess of non-spinning (χ = 0) or near-maximally spinning black holes.Meanwhile, the rate of binary mergers is non-zero across the full range of misalignment angles, with a spin-tilt distribution that is possibly (but not necessarily) isotropic.While there also appears to be a possible excess of black holes with cos θ ≈ 0.4, this feature is not statistically significant.

FIG. 10 .
FIG.10.Evaluation of the significance of the cos θ ≈ 0.4 peak in Fig.8.For each probability distribution p(cos θ) in Fig.8, we show the ratios between the mean probability in the window 0.05 < cos θ < 0.75 (centered on the possible peak) and the mean probabilities across adjacent windows at smaller and larger cos θ.When a peak is present, both ratios should be greater than one, corresponding to the upper right quadrant.We find that only 37% of posterior samples fall in this quadrant, indicating that the cos θ ≈ 0.4 peak is not statistically significant.

1 )
dq d χ FIG. 12. Top:The merger rate of binary black holes as a function of effective inspiral spin (χ eff , left) and effective precessing spin (χp, right), as inferred using our autoregressive model.The rates shown are each evaluated at fixed reference masses and redshift (m1 = 20 M , q = 1, and z = 0.2).The bottom panels show the corresponding probability distributions on each effective spin parameter.Within each panel, the central black curve marks the mean inferred rate/probability, while outer black curves bound 90% credible intervals.Effective inspiral spins exhibit a unimodal distribution.The center of this distribution prefers to be at positive χ eff , but with a non-zero merger rate at χ eff < 0. The χp distribution, meanwhile, preferentially peaks toward χp = 0 but with a shoulder that extends to moderate/large precessing spins.

FIG. 15 .
FIG. 15.Demonstrated autoregressive inference of four simulated populations, as described in Appendix C. The left column indicates the normalized probabilitity distributions recovered in each case, while the right column shows inferred rate densities.In the right-hand column, the median recovered autocorrelation length τ and standard deviation e σ of the process are shown via the horizontal and vertical error bars (recall that σ 2 defines the variance of the logarithmic number density).In each case, true distributions are indicated as dashed red curves.

2 low(m 1 + β q m 1+βq 1 − (2M ) 1+βq m βq 2 (
complete primary mass distribution is of the shapef (m 1 ) = 1 ) exp −(m1−m low ) 2 2δm < m low ) φ(m 1 ) (m low ≤ m 1 ≤ m high ) φ(m 1 ) exp −(m1−m high ) 2 2δm 2 high (m high < m 1 ),(D2) with squared exponentials that taper f (m 1 ) towards zero above and below m low and m high , respectively.The tapering scales δm low and δm high are additional free parameters inferred from the data.Mass ratios, in turn, are assumed to follow a power law distribution, withp(q|m 1 ) = 1 D3)This parametric mass model is used in Sects.V, VI, and IV when focusing on autoregressive modeling of black hole spins and redshifts.When non-parametrically exploring the black hole mass and redshift distributions, we revert to a parametric spin model in which component spin magnitudes and spin-orbit tilt angles are independently and identically distributed as truncated normal distributions.Each component spin magnitude in a given binary has a probability distribution p(χ i ) µ χ and standard deviation σ χ that are inferred from the data.The cosines of component spin tilt angles, meanwhile, are independently distributed as p(cos θ i ) = 2 πσ 2 u e −(cos θi−1) 2 /2σ 2

51 FIG. 16 .
FIG.16.Posteriors on the hyperparameters characterizing the primary mass and mass ratio distributions presented in Sect.III.As a diagnostic, we also include the effective number of injections per observed event and the effective number of posterior samples informing the per-event likelihood, minimized over the 69 events in our sample.

28 FIG. 19 .
FIG.19.As in Fig.16, but for the effective spin distributions shown in Sect.VI.