Combinatorics and topological weights of chromatin loop networks

Polymer physics models suggest that chromatin spontaneously folds into loop networks with transcription units (TUs), such as enhancers and promoters, as anchors. Here we use combinatoric arguments to enumerate the emergent chromatin loop networks, both in the case where TUs are labelled and where they are unlabelled. We then combine these mathematical results with those of computer simulations aimed at finding the inter-TU energy required to form a target loop network. We show that different topologies are vastly different in terms of both their combinatorial weight and energy of formation. We explain the latter result qualitatively by computing the topological weight of a given network -- i.e., its partition function in statistical mechanics language -- in the approximation where excluded volume interactions are neglected. Our results show that networks featuring local loops are statistically more likely with respect to networks including more non-local contacts. We suggest our classification of loop networks, together with our estimate of the combinatorial and topological weight of each network, will be relevant to catalogue 3D structures of chromatin fibres around eukaryotic genes, and to estimate their relative frequency in both simulations and experiments.


I. INTRODUCTION
Chromatin is a protein-DNA composite polymer that provides the building block of chromosomes, and it constitutes the form in which genomic information is stored in the nuclei of eukaryotic cells.Chromatin also provides the genomic substrate for fundamental intracellular processing of DNA, such as transcription and replication [1,2].Long-standing observations suggest that the 3D structure of chromatin is functionally important: for instance, it is known that the 3D structure of a gene locus correlates with its transcriptional activity [3].
Polymer models to determine chromatin structure in 3D are therefore important in this field, and several coarse-grained potentials have been developed to describe them (see, e.g., [4][5][6][7][8], and [9,10] for a review of some of these).Typically, coarsegrained polymer models view chromatin as a copolymer, or heterogeneous polymer, where different beads may have different properties to reflect, among others, the local sequence and post-translational modification in DNA-binding histone proteins, such as acetylation or methylation (see, e.g., [3,6,11]).
A simple copolymer model for active chromatin [12], which is relevant to our current work, views the fibre as a semiflexible polymer with interspersed "transcription units" (TUs, the red circles in Fig. 1), representing open chromatin regions such as enhancers or promoters which have high affinity for multivalent chromatin-binding proteins associated with transcription -such as RNA polymerases and transcription factors, or protein complexes including both of these [13,14].
Simulations of more sophisticated polymer models, resolving chromatin-binding proteins, show that TUs come together due to the bridging-induced attraction, a positive-feedback loop associated with multivalent chromatin-protein binding [13].The bridging-induced attraction leads to microphase separation into clusters of TUs (and their associated proteins) because clustering the TUs create loops whose entropy grows superlinearly with TU number, eventually balancing the energetic gain of clustering [12,13].This phenomenon provides a mechanistic model for the formation of transcription factories in mammalian nuclei [15].This discussion suggests that in a simpler effective model, one can consider the TUs themselves as sticky for each other, and this is the model sketched in Fig. 1.
In the copolymer model of Fig. 1, chromatin loop networks emerge in a steady state due to the sticky nature of TUs.Some natural questions then arise, namely how to classify the emerging network topologies (such as the one in Fig. 1B), and what the statistical likeliness of observing each of such topologies is.A possible way to classify the loop topologies is by computing the entropic exponent associated with the network, as in [16].However, the issue arises that all networks with the same number of nodes and edges (or legs) emanating from each node would have the same entropic exponent, as they have the same number of nodes and edges [16].As shown in the companion paper [17], simulations suggest, instead, that the probability of observing different networks is not constant at all, so it would be desirable to go beyond the calculation of the entropic exponent and estimate the statistical weight associated with each loop topology.
We consider two possible classes of chromatin loop networks.First, "labelled" networks are those in which the TUs are numbered.This is often relevant in biological examples where different TUs correspond to different regulatory elements, and it may be important in practice to distinguish networks with the same topology and distribution of clusters, but where different TUs participate in the clusters.
Second, "unlabelled" networks are those where TUs are not numbered, such that different configurations are topologically non-equivalent configurations of our chromatin fibre.For example, the two networks in Fig. 2A  but represent the same topology when counting unlabelled networks.Unlabelled networks are relevant when considering generic topologies, for example, the rosette and watermelon ones in Fig. 2B, and asking which topology is most often found in gene loci genome-wide.Labelled networks are a lot simpler to count combinatorically with respect to unlabelled ones; this is because it is hard in general to count the multiplicity of labelled networks corresponding to a unique unlabelled network topology.
In the present work, we aim to classify topologies of chromatin loop networks, counting them and finding their statistical weights, which measures the probability of observing them in a polymer model.Our article is structured as follows.First, in Section II we provide combinatorial formulas to count labelled networks.As we shall see, the theory of Bell numbers and partitions provides a powerful way to count such networks.We also find a series of recursion relations which constrain the number of labelled networks with specific properties (e.g., without or with singletons).These recursions are associated with an exponential network-generating function which we find explicit formulas for.Second, in Section III we discuss the case of unlabelled, topologically inequivalent, networks, and derive a formula to count the number of such structures with two clusters, which is of interest in applications to chromatin structures in real gene loci.Section IV contains numerical results obtained by simulating chromatin folding within a specific polymer model, viewing the chromatin fibre as a semiflexible self-avoiding chain with equally spaced sticky sites (the TUs).Here we show that different target topologies require different interaction energies between the TUs to form so that they are in general associated with a different entropic cost of formation.These results complement those discussed in the companion paper [17], which show that rosette-like topologies, that are rich in local loops, are much more favoured statistically with respect to others with non-local loops.In Section V, we compute the statistical weight of a generic topology in the simplified case of a phantom freely-jointed chain (i.e., without excluded volume interactions).We show that the weights we compute, although approximate due to the neglect of excluded volume effects, are sufficient to recapitulate the much enhanced statistical likeliness of forming rosette-like networks, in spite of the fact that the combinatoric multiplicities of other topologies are often larger.Finally, Section VI contains our conclusion.

II. COMBINATORICS OF LABELLED CHROMATIN LOOP NETWORKS
We first consider the case of labelled chromatin loop networks, where more progress can be done analytically.Therefore, in this Section TUs are assumed to be labelled from 1 to n, and we think of TUs as the set {1, 2, . . ., n} that is also denoted by [n].
For such labelled networks, we first derive a few enumerative results; we then discuss recursion relations and derive their generating function.Whenever suitable, the asymptotics will be discussed, and we will also give references to the Online Encyclopedia of Integer Sequences [18], when the counting sequence in question can be found there.
The combinatoric multiplicities which we will find can be used, for instance, to find all possible configurations of a chromatin segment with a given number of TUs and a list of desired features (such as the number of clusters and of singletons).This provides a useful bound for all possible topologies that this genomic region can form, in either simulations or experiments.

A. Configurations with an arbitrary number of clusters
To begin with, we note that, if we do not care about the number of clusters in the configurations, then the number of different configurations with n TUs is given by the Bell number B n .This is because each configuration can be thought of as a partition of the set [n], where each subset (or block, or part) with at least two TUs will form a cluster, while the singletons will correspond to the TUs not belonging to any clusters.For example, for n = 6, the partition {{1, 4}, {2}, {3, 5, 6}} encodes a possible configuration with two clusters.It is well-known that B n counts the number of partitions of [n], and this is the sequence A000110 in [18] that begins with 1, 2, 5, 15, 52, 20, 877, 4140, . . . .
The Bell numbers satisfy the recurrence relation n! is e e t −1 , while the ordinary generating function is Also, the Bell numbers satisfy Dobinski's formula k n k! and asymptotically (n → ∞), where the Lambert W function has the same growth as the logarithm [19].Interestingly, if B * n denotes the number of configurations without singletons (i.e. each TU is part of a cluster), then the following (well-known) combinatorial argument can be used to show that B * n+1 = B n − B * n .Note that B n − B * n is the number of partitions of [n] that have at least one singleton.Now, take all singletons in a partition counted by B n − B * n and add them together along with the element n + 1 so that to form a subset in a partition of [n + 1] that has no singletons (and hence is counted by B * n+1 ).This mapping is a bijection.

B. Configurations with a fixed number of clusters
We now discuss how to enumerate configurations with a fixed number of clusters.To do so, a useful set of quantities is provided by the Stirling numbers of the second kind S (n, k), which count the number of ways to partition the set [n] into k subsets.Even though S (n, k) does not directly give us the number of configurations with k clusters, below we will make use of these numbers.
We wish to find the number of partitions of [n] into subsets (i.e., the number of configurations) so that precisely k subsets, 1 ≤ k ≤ n − 2, have two or more elements (i.e., with precisely k clusters).We call this number f (n, k).We highlight that this quantity counts the partition of n TUs into k clusters, with an arbitrary number of singletons.

Number of configurations with one cluster
There are 2 n − n − 1 configurations corresponding to the case of k = 1.Indeed, each binary string s 1 s 2 . . .s n over the alphabet {0, 1} corresponds to a configuration, where s i = 0 indicates that the TU i is a singleton, while s i = 1 indicates that the TU i is included in the only cluster.The number of possibilities is 2 n , but we need to subtract the situations when at most one 1 is present in the string because a cluster needs to have at least two TUs.Note that asymptotically, we have O(2 n ) such configurations.For n ≥ 1, the counting sequence begins 0, 1, 4, 11, 26, 57, 120, 247, 502, . . .
and this is the sequence A000295 in [18].

Number of configurations with two clusters
The case of k = 2 can be derived similarly to the case of k = 1.Instead of binary sequences, we can consider sequences over {0, 1, 2} (there are 3 n of them), and then subtract those sequences that do not correspond to configurations with precisely two clusters (for example, sequences with no 2s, or with one 1 and one 2).However, this method is still cumbersome, so we use the following formula instead, where i corresponds to the number of singletons in a configuration (this number cannot be bigger than n − 4 for us to be able to create two clusters), n i is the number of ways to choose these singletons in [n], S (n − i, 2) (equal to 2 n−i−1 − 1 [20]) counts the number of configurations with two clusters, and we have to subtract (n − i), the number of possibilities for clusters receiving a single TU, giving The last equality can be checked, for instance, by induction.We note that asymptotically, the number of configurations is O(3 n ) and the counting sequence begins, for n ≥ 4, with 3, 25, 130, 546, 2037, 7071, . . .; (6) this is the sequence A112495 in [18].

C. Recurrence relations and generating function for loop networks with an arbitrary number of singletons
Using the approaches above is rather cumbersome to produce explicit formulas for arbitrary k.Alternatively, we can produce a recurrence relation for the numbers in question, f (n, k), that can be turned into a partial differential equation for the respective generating function.
Note that for n ≥ 2, this recursion reads as follows, To prove Eq. ( 7), we can think of producing, in a unique way, a configuration with n TUs from a smaller configuration by introducing the n-th TU.The possible disjoint options are: (i) n joins an existing cluster (with at least two TUs in it) or n becomes a singleton, and there are (k + 1) f (n − 1, k) possibilities in this case; (ii) n ends up in a cluster with exactly two TUs, and there are (n − 1) f (n − 2, k − 1) ways as there are n − 1 ways to select a TU to share the cluster with n.
The initial conditions of ( 7) are f (0, 0) = 1 and f (0, k) = 0 for k 0, and f (1, 0) = 1 and f (1, k) = 0 for k 0, along with f (n, k) = 0 for k < 0. We now consider the exponential generating function, defined as We can write that Therefore, F(t, x) satisfies the following partial differential equation, The solution of this equation with the boundary conditions F(0, x) = 1 and F(t, 0) = e t can be found explicitly to be Eq. ( 12) can be used to find f (n, k) for arbitrary values of n and k, as well as the associated asymptotic behaviour.Note that f (n, k) is the sequence known as A124324 in [18], where Eq. ( 12) is also given.

D. Loop networks with a fixed number of singletons
We can refine Eq. ( 7) to enumerate configurations with a fixed number of singletons.Let f (n, k, ℓ) be the number of configurations with n TUs, k clusters, and ℓ singletons.This quantity satisfies the following recursion relation: Indeed, we can think of producing, in a unique way, a configuration with n TUs from a configuration with n − 1 TUs by introducing the TU n.The possible disjoint options are: (i) n joins an existing cluster (with at least two TUs in it), and there are k f (n − 1, k, ℓ) possibilities in this case; (ii) n is a singleton, and there are f (n − 1, k, ℓ − 1) possibilities in this case; (iii) n forms a cluster with precisely one other TU, in which case there are (ℓ By iterating the recursion relation (13) with the easily checkable base f (1, 0, 1) = 1 and f (1, k, ℓ) = 0 otherwise, we obtain recovering the total number of configurations with two TUs, B 2 = 2; (ii recovering the total number of configurations with three TUs, (iv) f (5, 0, 5 = 15 recovering the total number of configurations with five TUs, B 5 = 52, and so on.
By using a similar approach as in Section IIC, we can define the following exponential generating function, which obeys the following partial differential equation, Quite remarkably, the physically relevant solution of this more complex equation can also be found explicitly and is given by Note that this solution satisfies the following boundary conditions: Once more, Eq. ( 16) can be expanded to yield coefficients f (n, k, ℓ), therefore solving the problem of enumerating all configurations with a fixed number of TUs, clusters, and singletons.

E. Results for networks without singletons
It is sometimes useful, or of interest, to consider the case where there are no singletons in the configuration.This is, for instance, the case that is considered in the companion paper [17].If we denote by N(n, k) the number of configurations with n TUs, k clusters and no singletons, such that This equation can be derived by noting that the configurations of the chain can be constructed by assigning the first bead to cluster 0, and computing the number of configurations of the rest of the TUs with a single cluster and an arbitrary number of singletons.The singletons are then put in the same cluster as the first bead.In this way, we obtain all configurations with 2 clusters and no singletons once we subtract the single configuration which has no singletons in the rest of the chain (as this would lead to a configuration where the first TU is a singleton, which does not contribute to N(n, 2)).
A similar argument leads to the general identity linking the number of configurations with a given number of clusters with and without singletons.The quantities N(n, k) obey the following recursion relation [21][22][23]] Similarly to what done previously, starting from Eq. ( 19) we can find the following exponential generating function for N(n, k), to be given by The related quantities can now be found exactly for each k, and are given by [21] FIG. 3. Example of a reducible network, and of its decomposition into irreducible blocks (here separated by dashed vertical lines).The configuration shown is a string of rosettes.

F. String of rosettes and reducible networks
A natural question is whether a particular configuration can be broken up, or reduced, into a series of simpler configurations.To characterise such states, we call a configuration with n TUs irreducible if it contains no cluster and only singletons, or it has k clusters, one of which contains TU n, and it is not possible to separate the k clusters into two groups by cutting a single polymer segment.
An example of a reducible network is a string of rosettes, shown in Figure 3, where each irreducible component has a single cluster at most.The decomposition into irreducible blocks is always unique assuming that if the configuration has at least one cluster, then the leftmost irreducible block has a cluster.
We next derive the ordinary generating function A(t) for the number of configurations in a string of rosettes.Note that there are 2 n−1 − 1 irreducible configurations with n TUs with a cluster as this is precisely the number of ways to choose at least one TU to join n in the cluster.The generating function for these numbers is .
Noting that the generating function for irreducible blocks without clusters is 1 1−t , we have The corresponding sequence is A001519 in [18] and it has many combinatorial interpretations.One can derive from the generating function through the recurrence relation that where ϕ = (1 + √ 5)/2, and hence asymptotically, the number of configurations in a string of rosettes is The concept of reducible networks would be useful to enumerate configurations of longer chains that we consider in this work.Additionally, strings of rosettes appear often in simulations and it is therefore useful to provide a way to separately count the number of possible configurations leading to this specific type of polymer network.

III. COMBINATORICS OF INEQUIVALENT TOPOLOGIES FOR UNLABELLED NETWORKS
We now discuss the case of unlabelled networks, which as anticipated is of interest when discussing the relative frequencies of different network topologies, irrespective of the specific labelling chosen.This is relevant, for instance, when asking whether, in a simulation, or experiment, rosette topologies are more or less common than watermelon ones.To study this case, we will be mapping networks to graphs and matrices.While this mapping is not necessary to derive the formula we will give below, which holds for k = 2 clusters, it provides a useful framework to build, for instance, numerical algorithms which can enumerate all possible inequivalent topologies with a larger number of clusters k.
Specifically, we begin by noting that the network topologies assembled by joining the TUs of a polymer can be mapped to graphs with n v vertices and n e edges (see Fig. 4A).The vertices of the graph correspond to either cluster of TUs or to one of the two polymer ends, while the edges of the graph correspond to polymer segments between two TUs or between one TU and one of the polymer ends.
We note that not all graphs can be representations of a polymer with TUs: since they are associated with a folded polymer, graphs representative of a chromatin loop network must be connected and traversable (see Fig. 4B).Additionally, both n v and n e are constrained by the number of TUs n.If none of the TUs coincide with the ends of the polymer, and if singletons are disallowed (as in Fig. 4A), n e = n + 1 and 3 < n v ≤ ⌊ 2p+2 3 ⌋ + 2, where p = n e − 2. Two of these vertices correspond to the polymer ends, and their degree is 1; we will call internal vertices all the others, namely all the vertices associated with clusters of TUs (V 1 and V 2 in Fig. 4A) [24].

A. Enumeration of inequivalent topologies with 2 clusters
We now proceed to count the number of topologically inequivalent, connected, and traversable graphs with a given number, n + 2, of edges and 4 vertices, V 1 , V 2 , V 3 , V 4 , two of which, V 3 , V 4 , of degree 1.This is the number of topologically inequivalent networks with n TUs and k = 2 clusters, without any singletons, studied in the companion paper [17].
Let G be a graph of this kind.G is identified by 5 numbers: a, b, c, n 1 and n 2 .Of these, a and b denote the number of edges connecting, respectively, vertex V 1 and vertex V 2 to themselves, c is the number of edges connecting vertex V 1 to vertex V 2 , while n 1 and n 2 are the numbers of vertices of degree 1 connected, respectively, to V 1 and V 2 (for instance, the network in Fig. 4A has n 1 = n 2 = 1, whereas the top left graph in Fig. 4B has n 1 = 2, n 2 = 0).The following symmetric matrix, therefore, identifies G in a compact way (see Fig. 4A) To count the number of graphs we are interested in, we remark that: (i) G and G ′ are equivalent if and only if where P is one of the two permutation matrices In this case, we say that M(G) and M(G ′ ) are equivalent (i.e., they represent equivalent graphs).
(ii) G is disconnected if and only if c = 0.
(v) Since a connected graph is traversable if and only if the number of vertices with odd degree is either 0 or 2 [25], deg(V 1 ) and deg(V 2 ) must be even.Moreover, since deg(V 1 ) = n 1 + 2a + c and deg(V 2 ) = n 2 + 2b + c we have the following cases: (A) if c is even, either n 1 = 2 and n 2 = 0, or n 1 = 0 and n 2 = 2; (B) if c is odd, n 1 = 1 and n 2 = 1.
Let us call {M} G the set of all inequivalent (according to point (i) above) matrices representing a graph with the desired constraints.To count the inequivalent topologies, let us consider the map f This map covers all the desired inequivalent topologies, but it is not injective [26].The number of possible combinations of (a, b, c) satisfying the constraints n = a + b + c, a ≥ 0, b ≥ 0 and c ≥ 1 is given by From this number, we first need to identify the combinations of (a, b, c) which map to equivalent graphs (or matrices), then to take away those which would lead to multiple counting of the same topology, and finally to remove the combinations which do not satisfy point (iv).
To do so, we note that the graph equivalence condition If c is even, this is never met by construction; if c is odd, (a, b, c) and (b, a, c) are mapped to equivalent matrices.To account for this, and avoid double counting of these equivalent topologies, we require a ≥ b, a condition which removes odd c≤n c−1 2 possibilities: equivalently, n−1 2 i=1 i combinations if n is odd, and n 2 −1 i=1 i combinations if n is even.Finally, to account for point (iv), we also remove the two configurations (a = n − 2, b = 0, c = 2) and (a = n − 1, b = 0, c = 1) from the total count [27].
The total number of inequivalent graphs with n TUs and 2 clusters is then given by where ⌊x⌋ denotes the floor of x (the largest integral smaller than or equal to x).

B. Network multiplicities
Note that, for each of the unlabelled network topologies just found, there are multiple possible labelled configurations that correspond to it.As these combinatorial weights, or multiplicities, are generally different for different topologies, it is desirable to keep track of these.It is however difficult to go beyond a case-by-case study.Here, we focus on the case of n = 8 TUs and k = 2 clusters, studied in [17], for which there are 20 inequivalent topologies [as predicted by Eq. ( 32) for n = 8].
In this case, for each inequivalent topology, Table I provides the number of ties, n t , the number of loops n l , the degree of the two vertices in the graphs (corresponding to the clusters), L 1 and L 2 respectively, and the multiplicity of the topology Ω, which is the number of labelled configurations corresponding to that topology.The degrees L 1 and L 2 determine the entropic exponent of the polymer network [16]: it can be seen that there are three classes of such exponents in the 20 topologies considered in Table I.
Regarding multiplicities, we observe that these tend to be larger for hybrid networks which are intermediate between rosettes and watermelons and that multiplicities are in general small for networks with low n l .This would suggest, that, in the absence of other biases, such configurations would form less often.We shall see in what follows, however, that these topologies are actually easier to form, so there is an interesting competition between the combinatoric multiplicity and the entropic cost of formation of these loop networks.
Note that, as required, the total sum of multiplicities for all topologies equals N(8, 2) = 119, namely the number of 2−cluster configurations with n = 8 without singletons found previously (see (17)).

IV. COARSE-GRAINED MOLECULAR DYNAMICS SIMULATIONS
Having discussed the combinatorial problem of enumerating the possible configuration and inequivalent topologies of a chromatin loop network, we now turn to the associated polymer physics problem and ask what interaction between TUs needs to be included to form a target topology in practice.This calculation requires computer simulations for polymer models representing chromatin fibres, and therefore here we use coarse-grained molecular dynamics simulations to study this problem.
In this Section, we will first describe the model used, and then present our simulation results, whose main outcome will be to show that different topologies require significantly different energy inputs to form.This energy of formation will combine in practical examples with the combinatoric multiplicities discussed above (the relevant ones for the case at hand are those given in Table I) to determine the likeliness of observing a given topology in an unconstrained polymer simulation, such as the one discussed in [17].

A. Model and potentials used
We model a chromatin fibre as a bead-and-spring polymer.The underlying equations of motion are the set of Langevin equations for each bead, where m is the bead mass, x i is the i th bead position, γ is the drag, and η(t) is uncorrelated white noise defined by η(t) = 0 and η α (t)η β (t ′ ) = δ αβ δ(t − t ′ ).This Langevin equation imposes an NVT ensemble on the system within which fluctuation and dissipation govern the exploration of the configuration space.
In order to reproduce behaviour appropriate to chromatin, the following potentials, U, enter into this equation according to the particular beads under consideration.A simple phenomenological Lennard-Jones potential, truncated to include only the repulsive regime (Weeks-Chandler-Andersen potential) acts between all beads in the system enforcing excluded volume, or self-avoidance.This is given by where σ is the bead diameter.
To capture chain connectivity, finitely extensible non-linear elastic (FENE) bonds are considered, acting only between consecutive beads along the polymer chain: give the critical energy between TUs needed to form the topology, ϵ c , in units of k B T , together with the 95% confidence interval: these results correspond to the simulations presented in Section IV.We have subdivided the diagrams into three classes, each characterised by the same pair of nontrivial vertex orders.
where K = 30 k B T /σ 2 is the spring constant and R 0 = 1.6σ is the maximum extent of the bond.Finally, we added a bending or Kratky-Porod potential, which acts on the angle θ between three consecutive beads along the chain and enforces a non-zero persistence length, l p : where K b = k B T l p /σ = 3 k B T .Note that the persistence length is artificially raised during equilibration to assist the system in reaching a self-avoiding configuration.It is then lowered from 10σ to 3σ during the main simulation.This is an appropriate value for flexible chromatin [13,14,28].
To study the formation of a target topology we included attractive interactions between beads which should be in the same cluster in the target topology; this procedure is similar to what is done in a Go model approach to study protein folding, where only attractive interactions between residues in contact in the folded state are included [29].We considered all 20 topologies in Table I; for instance, for a symmetric 2-rosette state we included an interaction between the first four T Us, and between the last four.The attraction between the selected TUs was simulated by a Lennard-Jones potential, where part of the attractive tail was retained.The interaction range (cut-off) was set to 1.8σ, while the interaction strength ϵ was adjusted between 5 and 15 k B T .For large enough ϵ, the target topology is formed, with all interactions realised.[Note that not all topology may be realisable if the steric interactions between different polymer segments prevent this, but in our case, this was not an issue.] The transition between an initially unstructured chain and the target topology arises because the energy gained as binding sites come together increases (asymptotically) linearly with the number of binding beads in a cluster, whereas the entropic cost of adding legs to a cluster scales superlinearly [12,16,30].As such, there is a critical energy ϵ c at which the energy gain just offsets the entropic loss, and this is the transition point.Our goal was to find how the value of ϵ c depends on topology.Note that all topologies we compare (i.e., within each of the three classes in Table I) contain the same number of binding site interactions, hence the total maximum energy.The difference in ϵ c is then primarily due to the entropic cost of forming that specific target topology.
Our script loaded a modular pair coefficient file generated by a simple Python script.This allowed the target polymer network topologies to be easily specified and changed while keeping other elements of the simulation fixed, which was important for reproducibility, scalability, and comparisons.Finally, the system was evolved via Langevin dynamics using the LAMMPS simulation package [31].

B. Target topology simulations: rosettes, watermelons, and dependence on the number of ties
As discussed above, in the thermodynamic limit it is expected that the entropic exponent of forming a given topology should depend only on the number (L 1,2 ) of legs meeting at its vertices [16,30].This partitions the set of 20 inequivalent topologies into three classes that have the same values of L 1,2 , (see Table I) so that results discussed below should be compared only between topologies in the same class.
Amongst the first class (first seven topologies in Table I), two topologies, namely the rosette (top topology in the class) and the watermelon (bottom topology in the class), stand out as particularly illustrative choices to discuss the results of the simulations.For these two topologies, simulations were carried out by varying the interaction affinity ε between 5k B T and 15k B T .From the estimate of the pairing energy (E pair ) we computed the dimensionless quantity, e pair = E pair /ε that is reported in figure 5 as a function of ε.
As expected, for small values of ϵ, the chain remains unfolded.In contrast, for sufficiently large values of ϵ the targeting topology is formed (examples of folded configurations in this regime for the rosette and watermelon topologies are shown in Figs. 6 and 7 respectively).The point of sharpest variation of the sigmoidal curves in Fig. 5 can be interpreted as the critical interaction affinity, ϵ c , required to form the target topology (either rosette or watermelon).Thicker lines indicate the mean over 100 random initial configurations for each ε of the normalized pairing energy (e pair ).The surrounding shaded regions represent one standard deviation on either side of this mean.
The interaction affinity giving rise to the maximal standard deviation is taken as the recorded transition affinity (ϵ c ). Confidence intervals were formed using a bootstrapping procedure.In order to better illustrate the relationship between the standard deviation amongst simulations with the same interaction affinity, and the inferred transition affinity, the standard deviations are plotted independently in figure 8. From these curves, it is clear that the location of the maximum differs for the rosette and watermelon topologies.In particular, one can observe that the rosette topology forms more easily, as it requires a smaller value of ε, or equivalently the associated ϵ c (corresponding to the peak in Fig. 8) is smaller.
Note that, while the rosette and watermelon topologies have the same values of L 1,2 , they differ by the number of ties, n t (n t = 0 for the rosette topology, and n t = 7 for the watermelon one).In general, we observed that the larger n t in a target topology, the greater the interaction affinity typically required to form it (with exceptions, see Table I).In this respect, a simple class to study is that of the first seven symmetric topologies shown in Table I, of which the rosette and the watermelon constitute the limiting cases.In order to elucidate the relationship between n t and ϵ c for this class, we carried out a bootstrapped linear regression.The corresponding fit is plotted in Fig. 9: a simple linear relationship between n t and ϵ c holds to a good approximation.As the number of ties increases by one, the number of loops decreases by one too, and so our results indicate that there is a nearly uniform energetic cost each time one loop is exchanged for a tie in a chromatin network.The estimate of the constant cost per tie is ∆ϵ c = 0.31k B T (95% confident interval 0.24 − 0.36k B T ).The other two classes of topologies that are reported in Table I) still lead to an increase of ϵ c with n t but the functional form is less clear (see Table I for a list of values of ϵ c found for each inequivalent 2-cluster topology).

V. TOPOLOGICAL WEIGHTS OF GAUSSIAN CHROMATIN LOOP NETWORKS
Up to now, we have enumerated the configurations of polymer loop networks, thereby finding their combinatorial weights.We have also seen in the last Section that Brownian dynamic simulations show that the energy which is required to offset the entropic cost associated with the formation of these topologies is significantly different.In the companion paper, we have additionally shown that inequivalent (unlabelled) topologies with the same combinatorial weight, such as the rosette and watermelon ones, are observed in polymer models with starkly different frequencies.In this Section, we will show that these results can be understood, at least qualitatively, by computing the topological weight of a given graph, which is essentially the partition function of a Gaussian polymer network with that topology.[Note this is equivalent to a freely-jointed polymer network with a large number of monomers [32].]More specifically, to compute the topological weight of a given graph G, associated with an inequivalent topology of a chromatin loop network with n TUs, we need to compute its corresponding partition function,   where e − 3(x i+1 −x i ) 2 2lσ can be thought of, in field theoretical terms, as the propagator of our Gaussian theory, from the i-th to the (i + 1)-th TU.
In the remainder of this Section, we will first compute in detail the topological weights of unlabelled configurations with 2 clusters, which are the focus of the numerical simulations in the companion paper [17].Afterward, we shall see how to generalise the calculation to compute the topological weight of any given Gaussian polymer loop network.This calculation can be done explicitly because we are approximating the polymer with a Gaussian chain.Including self-avoidance and mutual avoidance between different polymer, segments would require a separate treatment and is outside the scope of the current work.In the special case of 2-cluster configurations, self-avoidance is included in the discussion of the results in [17].

A. Topological weights of 2-cluster configurations
We begin by noting that the term δ(G) in Eq. ( 37) is a product of Dirac δ functions which specify the topology of the network [16].For instance, in the case of rosettes (G ≡ R, Fig. 10A) and watermelons (G ≡ W, Fig. 10B), δ(G) is explicitly given by δ(R) and δ(W), with δ(x 2 − x j ).The topological weight of the rosette is therefore given by where the TU labelling in the integral follows the one in Fig. 10A.Noting that with , and that an analogous formula holds for the integral over dx 9 , we obtain, by making use of the properties of the δ function, that where we have called V the volume of the system.By repeating the same steps for the watermelon topology (see Fig. 10B, and the associated choice of TU labelling), we get Therefore, the topological weight of the watermelon is much smaller than that of the rosette.Additionally, one can generalise the result shown above to hybrid rosette-watermelon configurations with 2 clusters and n t ties, obtaining that their topological weight is given by which becomes Eq.(42) for n t = 7 (which holds for the watermelon topology).The decrease in topological weight of 2-cluster topologies with n t qualitatively explains why they are seen less frequently in simulations [17], and why the interaction energy between TU needed to stabilise a topology increases with n t , as found in the previous Section with coarse-grained molecular dynamics simulations.

B. General formulas for the topological weights of Gaussian loop networks
With a bit more work, the topological weight calculation just outlined can actually be generalised to any chromatin loop network.
To see how let us consider the topology G shown in Fig. 11.Its associated topological weight is given by: which, by using methods similar to those described in the previous section, can also be written as, We now introduce the following matrix, equal to the adjacency matrix of the multi-graph corresponding to G (note that, if loops were present in G, they should not be included in the calculation of A(G)).From this, we define the matrix, where we have changed the sign of the off-diagonal components and the diagonal components have been set equal to the sum of the corresponding row in A(G).With this setup the argument of the exponential in Eq. ( 11) can be written in matrix form as, where x T = (x 1 , x 2 , x 3 , x 5 ).Note that det(B(G)) = 0.This is consistent with the fact that the topological weight is proportional to the volume V of the system.By fixing the position of one of the cluster centre of mass, say x 1 , and integrating over it, the weight associated with this topology can be given in terms of the determinant of the matrix obtained by removing the first row and column, as follows It can be verified that, as expected, the above result does not depend on which cluster is fixed and integrated upon, as the determinant of any matrix obtained by removing the i-th row and column is the same.
By applying this procedure, for instance, to a string of rosettes with n TUs one can show that the corresponding topological weight is W −3 0 V. Finally for a general graph G with n TUs and k clusters, its topological weight can be computed by starting from the corresponding matrix B(G) and computing the determinant of any sub-matrix B ′ (G) obtained by removing the i-th row and column, for any i ∈ [1, k].This is given by The above result confirms that a generic network typically has a (significantly) lower weight with respect to that of a string of rosettes with the same number of TUs n.This is in line with the numerical results obtained for k = 2.

VI. DISCUSSION AND CONCLUSIONS
In summary, we have presented a combination of analytical and numerical results for the combinatorial and topological weights of chromatin loop networks.These weights are important to determine the relative frequencies with which different topologies arise in polymer models for DNA and chromatin, which are studied in the companion paper [17].In particular, we are interested here in the relation between these results and the physical properties of the loop networks which arise due to the bridging-induced attraction [13,14,28], in polymer models for the 3D structures formed by chromatin fibres in vivo, and which are associated with gene folding.For instance, the statistical, or Boltzmann, weight associated with a given topology shown in Table I, which determines the frequency with which it is observed for instance in computer simulations, is proportional to its combinatorial weight (computed in Section III) times its topological weight (computed in Section V).
We have shown that the enumeration problems associated with counting labelled and unlabelled chromatin loop networks are fundamentally different.When transcription units (TUs) are labelled (Section II), the problem can be usefully mapped to that of counting the ways in which n different TUs can be distributed into k clusters with ≥ 2 TUs per cluster.The resulting combinatorial sequences are often related to the Bell or Stirling numbers, and we have shown that it is possible to find explicit formulas for the exponential generating functions associated with a number of different cases, with or without singletons (i.e., TUs not in any clusters).This is useful for providing estimates or upper bounds for the number of topologies which a given chromatin region (with a specified number of TUs) can fold into.
For networks with unlabelled TUs, corresponding to inequivalent topologies (Section III), the enumeration problem is related to that of counting multi-graph, which is NP-complete, hence harder.We have though provided here a derivation of a formula counting all inequivalent topologies with n TUs and k = 2 clusters; for n = 8 (a common occurrence in real gene loci [3]) and k = 2, the case studied in detail in the continuum paper, this formula gives 20 inequivalent topologies (shown explicitly in Table I).
We also asked what attraction energy is needed to form a target topology.This is a biophysically relevant question as regards chromatin loop networks: for instance, we may want to know whether a rosette topology or a watermelon one forms more easily (i.e., requires less interaction between the TUs), as this may affect the relative frequency with which these two structures are observed either in computer simulations of chromatin fibres [3,17], or experimentally in 3D structures of gene loci [33].Previous work based on renormalisation group calculations came to the important conclusion that the entropic exponent of a polymer loop network solely depends on the degree of its nodes (the clusters in our terminology) [16,30].However, this exponent does not completely determine the weight, as there is a pre-factor which could in principle be also topology-dependent.More in detail, rosettes and watermelons, and indeed all first seven network topologies in Table I, have the same entropic exponent, yet they require significantly different energies to form, as we show in Section IV.In particular, by focussing on configurations with n = 8 TUs and k = 2 clusters, our simulations show that the critical energy to form a target topology tends to increase with the number of ties, or polymer segments, linking the two clusters, which we call n t .
In Section V, we compute the topological weight of a chromatin loop network, under the assumption that the polymer is a Gaussian chain.This weight is the partition function of a network with the given topology and, importantly, we find that it depends strongly on n t , qualitatively explaining our numerical results in Section IV.
In the future, it would be interesting to generalise the topological weight calculations in Section V to the case where the polymer network has both self-and mutual avoidance.From an application perspective, it would be desirable to use our labelled and inequivalent unlabelled topologies to classify the 3D configurations of chromatin fibre around genes, for instance, the gene loci configuration found by "HiP-HoP" simulations in [3], or the interaction networks and hypergraphs found by chromatin capture experiments accounting for multiway chromatin contacts, such as poreC [33].We hope that these extensions of our work will be addressed in the future.
This work was supported by the Wellcome Trust (223097/Z/21/Z).For the purpose of open access, the authors have applied a Creative Commons Attribu-tion (CC BY) licence to any Author Accepted Manuscript version arising from this submission. .

FIG. 1 6 FIG. 2 .
FIG. 1. (A) (Top) A chromatin fibre with n = 8 TUs.(Bottom) A possible structure formed when TUs attract each other, for instance effectively due to the bridging-induced attraction [13].The structure is made of two clusters and 4 local loops (B) The loop network topology corresponding to this configuration (repeated at the top for clarity) is shown at the bottom of the panel.
recovering the total number of configurations with four TUs, B 4 = 15;

FIG. 4 .
FIG. 4. (A) Schematics showing how polymer networks can be converted into graphs, and graphs to matrices.(B) Examples of connected and disconnected, traversable and not traversable, topologically equivalent and inequivalent graphs (or equivalently polymer networks).

FIG. 5 .FIG. 6 .
FIG. 5. Plot of the average energy per bead as a function of attractive energy ϵ.The sigmoidal shape of the curve signals the formation of the target topology.

FIG. 7 .
FIG. 7. Simulation snapshot and corresponding topology for the watermelon case.Note different TUs are differently coloured.

FIG. 8 .
FIG.8.Plot showing simulation results for the standard deviation of the normalised pairing energy as a function of the attraction between TU for the rosette (blue) and watermelon (red) topologies.Maxima were used to infer the transition affinities, which are indicated with dashed lines.Bootstrapped 95 % confidence intervals are shaded in yellow.

. 10 .
FIG.10.Loop network configurations and TU labelling used for the calculation of the topological weights of the rosette (A) and watermelon (B) topologies.

10 FIG. 11 .
FIG. 11.Loop network configuration and TU labelling used for the calculation of the topological weights of a network with multiple clusters (here 4).
Diagram n t n l L 1 L 2 Ω ϵ c /(k B T ) CI/(k B T ) TABLE I. Topology summary table.All topologies with n = 8 binding sites and 2 clusters (graph vertices) are listed, together with their number of ties (n t ), number of loops (n t ), nontrivial vertex orders (L 1 and L 2 for first and second cluster), and multiplicity (Ω).The last two columns FIG.9.Plot of critical energy, ϵ c , against number of ties n t for the first class of topologies in TableI, with L 1,2 = (8, 8).Values corresponding to 95% confidence intervals for each values of n t are found by bootstrap.