A graph model for the clustering of dark matter halos

We use network theory to study hierarchical clustering of dark matter halos, the building blocks of cosmic structure. We analyze cosmological simulations of structure formation, which are publicly available, and construct tree graphs for individual main halo systems by connecting their associated subhalos. These graphs exhibit a power-law degree distribution with an exponent of $-2$. We propose a growing graph model based on preferential attachment to realize the power-law distribution exhibited in the simulations. The attachment kernel effectively incorporates effects of minor mergers, major mergers and tidal stripping. The model reproduces many important features of the simulated halos, including the subhalo abundance and the hierarchical clustering. It provides a new way of modeling complex gravitational dynamics of structure formation.

Introduction.Complex systems commonly contain many interacting components.The connectivity of their components can be described using a graph [1,2].When graph nodes and edges are associated with physical components and interactions, a graph is also called a network.Growing random network models have been constructed to study different systems, e.g., the world-wideweb, e-mail, social, protein, and metabolic networks [3][4][5][6][7][8][9][10][11][12][13][14].These networks usually exhibit power-law degree distributions as P (k) ∝ k −ν , where P (k) measures the relative frequency of the nodes with degree k, i.e., the number of connections to other nodes, and ν is a constant with its value typically in the range 2 < ν < 3 [2].
Ref. [15] shows that a mechanism based on preferential attachment can explain the power-law degree distribution.In this mechanism, one starts with a connected graph and attaches a new node to the existing ones once at a time.The probability for a new node to be attached to the i th existing node is computed as A i / j A j , where A i is the attachment kernel that positively correlates to the degree of the i th node k i , and the summation in the denominator goes over all existing nodes.This positive correlation naturally leads to a preference in the attachment, resulting in the effect that the rich gets richer [15][16][17][18][19][20][21].When the attachment kernel is linear A i = k i , we have the Barabási-Albert model that produces a distribution of P (k) ∝ k −3 at large k [15].When the attachment kernel is non-linear, the power-law feature of the degree distribution can be transient [22][23][24].
In this work, we study the hierarchical clustering of dark matter halos using graphs.We will analyze stateof-the-art cosmological N-body simulations of structure formation from the FIRE [25][26][27] and Bolshoi [28,29] projects and construct graphs for individual main halo systems by connecting their associated subhalos.These graphs exhibit a power-law degree distribution with an exponent of −2.We will develop a random graph model to realize this degree distribution.In this model, the attachment kernel is linear for effectively modeling minor mergers in structure formation, while it approaches two asymptotic regimes for incorporating major mergers and An example graph constructed for a Milky Way analog named m12r from the FIRE2 cosmological simulations [27,30].Left: 40 most massive dark matter halos of m12r.Each circle represents a dark matter halo and its size denotes the halo's virial radius.An arrowed link is created between a subhalo to its least massive host halo.Right: A tree graph for modeling the hierarchical clustering of the 40 halos.A node represents a halo and an edge represents a connection of a subhalo to its least massive host halo.
tidal stripping.We will show that our graph model reproduces not only the expected power-law distribution, but also topological and structural properties of the simulated systems.In particular, we will quantitatively show that the graphs constructed from the simulations and the model are scale-free.Graphs for the clustering of subhalos.In the standard model of cosmology, the cosmic structure forms hierarchically, i.e., small dark matter halos, a gravitationally self-bound system, form first, and they merge to form larger and more massive halos, see, e.g., [31][32][33][34].During this process, most of the merging halos "dissolve", virialize, and become the smooth component of their host halos.However, some of them can survive from tidal disruption during the mergers and become subhalos.Fig. 1 (left) illustrates a Milky Way analog, named m 12r , from the public data release of the FIRE2 simulation project [27,30].We include a main halo and its 39 most massive subhalos.Here, we consider a main halo to be isolated and its mass and virial radius are largest in the system [35].A subhalo is identified if its distance to the center of the main halo is less than the virial radius of the latter.
To incorporate the hierarchical nature of structure formation, for each subhalo, we identify its least massive host halo, which is not necessarily the main halo, and create a link between the two, see the red arrows in Fig. 1 (left).Through this process, we obtain a connected and directed tree graph that has a total number of 40 = 39 (subhalos)+1 (main halo) nodes, and each of the nodes represents a dark matter halo.For a constructed graph, the in-degree of the i th node N i corresponds to the number of subhalos that are directly hosted by its host halo.Fig. 1 (right) shows a graph for m 12r .There are four nodes that have none zero N i values: {N 1 , N 2 , N 3 , N 8 } = {29, 8, 1, 1}.Note halo "1" is the main halo.
The distribution of subhalo abundance from the Bolshoi cosmological simulations [28] (solid magenta) and our graph model, ModBA (solid blue).Bottom: The in-degree distribution of subhalos constructed from the Bolshoi [28] (solid magenta) and FIRE2 (solid red) simulations, as well as the one generated using the ModBA model (solid blue).In both panels, a power-law scaling relation with an exponent of −2 is included for comparison (dotted black).
The power-law degree distribution.To construct a large sample of graphs and perform statistical analysis, we take public data from two complementary simulation projects.The FIRE2 cosmological zoom-in simulations of galaxy formation are part of the Feedback In Realistic Environments (FIRE) project, generated using the Gizmo code [25] and the FIRE2 physics model [26].In the FIRE2 public data release, there are 11 main halos with masses ∼ 10 12 M [26,27,30,[36][37][38][39][40][41][42][43].For each of the FIRE2 main halos, the resolution is high enough to resolve subhalos with masses larger 10 7 M .On average, there are N sub ∼ 800 resolved subhalos for each main halo.In our analysis, we take the 600 most massive ones, including the main halo, and have checked that our main results do not change if we vary the number of subhalos.The Bolshoi project is based on dark matter-only cosmological simulations [28], and it contains about 6500 main halos with masses larger than ∼ 10 11 M .But the resolution is relatively low.For the four most massive Bolshoi main halos with masses higher than 10 14 M , N sub ∼ 400 with subhalo masses larger than 10 9 M .Fig. 2 (top) shows the distribution of the subhalo number for the Bolshoi main halos at redshift z = 0 (solid magenta), where P (N sub ) is the fraction of the main halos that have N sub subhalos.Apparently, the probability of having a high number of subhalos decreases and the trend follows a power-law scaling relation as P (N sub ) ∝ N −2 sub (dotted black).Fig. 2 (bottom) shows the N i distribution for the FIRE2 (solid red) and Bolshoi (solid magenta) simulations.For each of the 11 FIRE2 main halos, we construct a graph with 600 halos and show the averaged result.For Bolshoi, we show the stacked result from the four most massive main halos.We again see the N i distribution exhibits a power law of P (N i ) ∝ N −2 i (dotted black) for both simulations, although they have different implementations, initial conditions and resolutions.
From the perspective of structure formation theory, the scaling relation sub can be attributed to the scale invariance of the matter power spectrum.In this case, the Press-Schechter formalisim [44] predicts the mass function as dn/dM ∝ 1/M 2 on the low mass limit for isolated main halos, where n is the number of halos and M is the halo mass.Thus we have dn/dN sub ∝ 1/N 2 sub as N sub ∝ M approximately [45].A similar argument applies to the P (N i ) distribution, as the structure formation is self-similar [44].
The scaling relation shown in Fig. 2 is robust to the variation of mass resolutions in the simulations and halo virial radii.We have also checked that it holds at higher redshifts as well.This implies an increment in N sub must be proportional to its current value, i.e., ∆N sub ∝ N sub such that the power-law distribution is maintained.For minor mergers, where a low mass halo falls into a massive one, this halo accretion process is equivalent to attachment with the kernel to be linear in N sub .This linear attachment feature plays an important role in constructing a graph model to describe the clustering of the halos shown in Fig. 2, but it should break, as we discuss next.A modified Barabási-Albert model for dark matter halos.As we discussed in the introduction, in the original Barabási-Albert model, the attachment kernel is linear and the degree distribution follows a power law with an exponent of −3.For the simulated dark matter halos, both N sub and N i distributions have a power law with an exponent of −2.This indicates that the linear at-tachment kernel, which is realized through minor mergers in halo accretion, should be broken to some extent.
We identify two physical processes that break the linearity in two opposite limits.One is the major merger, where two halos of similar mass coalesce into one, without increasing the number of subhalos in the merged main halo.This effect is most important for halos with low N sub , resulting in a suppression in the subhalo attachment rate for the N sub distribution.The other is tidal stripping, which removes subhalo masses in the outer region and redistributes them to the host halos.As subhalos evolve in the tidal field of their host halos, they become lighter and smaller.Some higher-order subhalos residing in the outer region of these subhalos would be released (some of them would be erased), resulting in a net enhancement in the attachment rate for host halos that have the largest number of subhalos.The effect of tidal stripping is most relevant for the N i distribution.
With these considerations, we propose a modified Barabási-Albert model (ModBA) to describe the clustering of dark matter halos.To reproduce the N sub distribution, we consider the following attachment kernel where k j refers to the degree of the j th node, β is a parameter larger than 1, and α much less than 1.We see that A j asymptotes a linear kernel for large k j , but it is suppressed for small k j as α is much less 1 by design.
In practice, we initialize a graph that has two mutually connected nodes and add a new node at a time.After attaching many nodes, we obtain a connected tree graph with the highest degree node being the root node.Assuming the links take their directions from leaves to the root, we identify the non-zero in-degree nodes, excluding the root node, as main halos.The number of main halos is determined by the size of the graph which can be taken arbitrarily large.The number of subhalos N sub for each of the main halos equals to the in-degree of the corresponding node.Taking α = 0.05 and β = 4.5, we can produce the N −2 sub distribution with constructed graphs, as shown in in Fig. 2 (top, solid blue).We have averaged over 8 graphs and each has 10 4 nodes to obtain this result.
The attachment kernel in Eq. 1 is similar to the one proposed in [22,23].They showed if one sets A 1 to be a constant α, while keeping A j = k j linear for j ≥ 2, the constructed network will possess a central hub and has a power-law degree distribution for large k.The corresponding exponent is −(3 + √ 1 + 8α)/2, which asymptotes −2 as α approaches zero.Thus in our model, the power-law distribution with the exponent −2 is realized naturally and it does not suffer from fine-tuning.
To produce the N i distribution, we consider the fol-lowing attachment kernel where we choose α = 0.05 as before and γ is a parameter depending on the total number of halos.This kernel amplifies the attachment rate for halos with large N i in order to account for the tidal stripping effect, while it becomes linear at small k j .We express γ as γ = 4.97/N 0.117 + 85.7/N 1.44 , where N is the total number of the nodes in the graph, which corresponds to the total number halos N = N sub + 1.The numerical factors are chosen such that for given N nodes the maximum node degree, which characterizes the size of the central hub, agrees with the averaged value from the graphs directly constructed using the 11 FIRE2 main halos.In Fig. 2 (bottom, solid blue), we show that the degree distribution, averaged over 100 graphs using the kernel Eq. 2 with N = 600.It follows a power law with an exponent of −2, consistent with the N i distribution from the simulated halos.For N = 600, γ ≈ 2.4 and it does not become 1 until N ∼ 10 6 .Thus the attachment kernel for low-degree nodes is still linear, while the one with a large degree receives a superlinear enhancement.Fig. 3 shows a visual comparison between a graph constructed from FIRE2 m 12r and an analogous one generated using our ModBA model.The latter is selected by eye from 100 randomly generated graphs.Our ModBA model reproduces the structural properties of the simulated halo system, including the presence of a central hub and subclusters.The largest sub-cluster shown in the left panel represents a simulated analog of the Large Magellanic Cloud, the most massive satellite galaxy (subhalo) of the Milky Way [30].Thus the ModBA model can effectively capture the accretion history of structure formation.Scale-free graphs.The scaling behavior shown in Fig. 2 indicates that our constructed graphs are scale- free.Ref. [46] proposed a structural metric to quantify the extent to which a graph is scale-free.For a graph with edge set E, the metric is defined as s G = (i,j)∈E k i k j .Fig. 4 (left) shows s G distributions for the graphs constructed from the 11 FIRE2 main halos (red dots) and those generated using our ModBA model (blue rectangles).For both cases, the distribution peaks towards large s G values, indicating the graphs are scale-free.The value of s G can be maximized if one performs degree-preserving rewiring such that high-degree nodes are attached to other high-degree nodes.For a given degree sequence, {k 1 , ..., k N }, we follow the algorithm in Appendix A of Ref. [46] and reconstruct tree graphs that have the maximal s G value, s max .Fig. 4 (left) shows the s max distribution for the graphs from the 11 FIRE2 simulations (orange dots) and from the ModBA model (green rectangles).For the latter, we generated 1000 graphs.We see the s max distribution is similar to the s G one, but peaks more towards large s max values.Quantitatively, the averaged ratios of s G /s max are 0.98 and 0.93 for the FIRE2 and ModBA graphs, respectively.Graph spectra.We use the adjacency and normalized Laplacian matrices to describe the structure of a graph [1,2], where directions of graph edges are neglected.Fig. 4 shows the spectra of the adjacency (middle) and normalized Laplacian (right) matrices of the FIRE2 (red) and ModAB (blue) graphs.The FIRE2 result is based on the average of 11 graphs, while the ModBA result on 100 graphs generated randomly.The adjacency spectra are symmetric around zero and strongly localized around the hub.The pair of the largest eigenvalues are close to √ N , which are not shown for clarity [47].The eigenvalue of the normalized Laplacian matrix is in a range of 0-2.The corresponding spectra have a strong correlation to the topological features of a network [48][49][50][51][52][53][54][55]: small and large eigenvalues are associ-ated with the clustering and "bipartiteness" of graph substructures, respectively.The peak around one indicates that our graphs have a large central hub.Our ModAB model produces very similar spectra as the ones from the cosmological simulations.
We have also calculated the graph distance and its variance of all distinct pairs of graphs from the FIRE2 simulations and the ModBA model.Given two graphs where λ a,i refers to the i th eigenvalue for the graph G a [2].For the adjacency spectra, the mean and variance are d A ≈ 7.7 (7.4) and σ A ≈ 3.2 (4.5) for the FIRE2 (ModBA) graphs, respectively.For the Laplacian spectra, d L ≈ 2.6 (2.9) and σ L ≈ 1.3 (1.5) for the FIRE2 (ModBA) graphs.We again see that the population of graphs generated in our model captures the variance in the assembly history of dark matter halos.
Conclusions.We have proposed a graph model to study the clustering of dark matter halos.In particular, we focused on subhalos from cosmological simulations of structure formation and constructed tree graphs that characterize the relationship between a subhalo and its host halo.Our model is based on preferential attachment and it successfully produces the clustering properties of the simulated dark matter halos.We showed that the attachment kernel is linear for modeling minor mergers, and it approaches two asymptotic regimes for incorporating major mergers and tidal stripping.The graphs produced using our model are almost identical to those from the simulations in terms of their topological and structural properties.The results indicate that the clustering of cosmic dark matter structure can be modeled with a growing random network.
There are several related topics worthy of further investigations.We could extend our study to dark matter scenarios beyond the standard one, such as those with strong dark matter self-interactions [56] or damped matter power spectra [57], where the abundance of subhalos can be suppressed.In addition, other halo properties, such as shape and orientation, can be incorporated as weights in nodes and edges in our model, offering a natural framework for neural network studies, see recent examples [58][59][60][61][62][63].
FIG. 1.An example graph constructed for a Milky Way analog named m12r from the FIRE2 cosmological simulations[27,30].Left: 40 most massive dark matter halos of m12r.Each circle represents a dark matter halo and its size denotes the halo's virial radius.An arrowed link is created between a subhalo to its least massive host halo.Right: A tree graph for modeling the hierarchical clustering of the 40 halos.A node represents a halo and an edge represents a connection of a subhalo to its least massive host halo.

FIRE2 m 12 r
FIG.3.A visual comparison between a graph constructed from FIRE2 m12r (left) and an analogous one generated using the ModBA model (right).Both graphs have 600 nodes.Our graph model reproduces the clustering properties of the simulated halos, including the presence of a central hub and subclusters.

FrequencyFIG. 4 .
FIG. 4. Left:The distribution of scale-free metric sG for the graphs from the FIRE2 simulations (red dots with statistical errors), and those from the ModBA model based on 1000 graphs generated randomly (blue rectangles).For comparison, the corresponding distribution of smax is shown for the FIRE2 (orange dots with errors) and ModBA (green rectangles) graphs.Middle and Right: the adjacency and Laplacian spectra, respectively, for the FIRE2 (red) and ModBA (blue) graphs with 600 nodes; the former is averaged over 11 graphs and the latter over 100.