Weighted hypersoft configuration model

Maximum entropy null models of networks come in different flavors that depend on the type of the constraints under which entropy is maximized. If the constraints are on degree sequences or distributions, we are dealing with configuration models. If the degree sequence is constrained exactly, the corresponding microcanonical ensemble of random graphs with a given degree sequence is the configuration model per se. If the degree sequence is constrained only on average, the corresponding grandcanonical ensemble of random graphs with a given expected degree sequence is the soft configuration model. If the degree sequence is not fixed at all but randomly drawn from a fixed distribution, the corresponding hypercanonical ensemble of random graphs with a given degree distribution is the hypersoft configuration model, a more adequate description of dynamic real-world networks in which degree sequences are never fixed but degree distributions often stay stable. Here, we introduce the hypersoft configuration model of weighted networks. The main contribution is a particular version of the model with power-law degree and strength distributions, and superlinear scaling of strengths with degrees, mimicking the properties of some real-world networks. As a by-product, we generalize the notions of sparse graphons and their entropy to weighted networks.


I. INTRODUCTION
Many real-world complex systems that can be represented as networks [1,2] require weighted representations in which connections between nodes are characterized by positive weights [3]. For example, in modeling the global spread of an epidemic using an air transportation network as a backbone, it is important to know not only that there exists a flight from airport i to airport j, but also the volume of the passenger flow between the two airports. This volume is usually encoded as the link weight w ij [4,5].
The configuration model (CM) [33,34] is a microcanonical ensemble of random graphs with sharp constraints on the degree sequence, meaning that every graph in this ensemble has exactly the same degree sequence, e.g., the one observed in a real-world network. In the CM, every graph satisfying the constraint has the same probability in the ensemble. All other graphs are excluded. The soft configuration model (SCM) [6,8,10,35,36] is a grandcanonical ensemble of random graphs with soft constraints on the degree sequence. This means that the expected degree sequence in the ensemble is equal to a given degree sequence.
In both CM and SCM, a fixed degree sequence is the constraint under which the ensemble entropy is maximized, either micro-or grand-canonically. This constraint, however, does not properly reflect the dynamic nature of node degrees observed in many real networks, where the degrees of all nodes may constantly change, while the shape of the degree distribution stays stable [37,38]. These observations motivated the development of the hypersoft configuration model. The hypersoft configuration model (HSCM) [14,[39][40][41] is a hypercanonical ensemble of random graphs whose entropy is maximized under the constraint that the degree distribution has a given shape. The HSCM belongs to the class of models with hidden variables [40], meaning that each node has a latent parameter sampled from a fixed distribution. This latent distribution defines the degree distribution, so that by tuning the former, one can reproduce any shape of the latter.
The ((H)S)CM story outlined above for unweighted networks finds an incomplete and somewhat distorted reflection in weighted networks where the constraints under which entropy is maximized are on both degrees and strengths of nodes [12]. The weighted configuration model (WCM) is a microcanonical ensemble of networks with sharp constraints on both the degree and strength sequences. Every weighted graph in this ensemble has the same degree and strength sequences, and the same probability in the ensemble. However, we note that several models that come under the WCM name [12,42,43] are not really as defined above. The weighted soft configuration model (WSCM) [9,12,13,15] is a grandcanonical ensemble of networks with soft constraints on the degree and strength sequences, meaning that the expected degree and strength sequences in the ensemble are equal to given degree and strength sequences.
In this paper, we introduce the weighted hypersoft configuration model (WHSCM) which is a hypercanonical ensemble of networks with a fixed joint distribution of degrees and strengths. Similar to the HSCM, the WHSCM is a hidden variable model where each node has two latent parameters sampled from a fixed joint distribution which defines the joint distribution of degrees and strengths. We summarize the taxonomy of the (((W)H)S)CM models in Table I. Besides introducing the WHSCM in general, the main focus of this paper is a much more involved task, which is to identify the joint distribution of latent parameters that reproduces several features of degree and strength distributions observed in many real weighted networks [3,[44][45][46][47][48]. Specifically, these features are: (1) power-law degree distribution, (2) super-linear scaling between strengths and degrees, and (3) sparsity. The last one means that the average degree is constant as a function of the network size.
We proceed by first providing all the necessary motivation and background information in Section II that ends with the introduction of the most general form of the WHSCM. In Section III, we document in detail the real-world-network-dictated properties, mentioned above, that we want our particular power-law version of the WHSCM to reproduce. To reproduce those properties, we need some experimental input from the WSCM as defined in [15], a subject of Section IV. Based on this input, we derive in Section V the WHSCM latent parameter distribution that satisfies the requirements of Section III. We check in simulations that these requirements are indeed satisfied in Section VI. To demonstrate how the constructed model can be used in the analysis of real-world networks, we juxtapose several real-world networks and their WHSCM counterparts in Section VII. We conclude in Section VIII with the discussion of obvious and less obvious limitations, caveats, wishful thoughts, and abstract remarks. We release our implementation of the WHSCM graph generator in a software package available at GitHub [49].

II. MOTIVATION, BACKGROUND, AND GENERAL WHSCM
A. Maximum entropy models Understanding mechanisms that drive formation of complex networks and dynamical processes running in them is a crucial task in network science [50]. It is commonly believed that many real complex systems are self-organizing, i.e., adjusting their structure to optimize their function [51]. Because of that, a lot of past research was dedicated to finding structural network properties that may be indicative of yet unknown optimization mechanisms behind network evolution and function [51][52][53]. However, due to potentially strong interdependencies between different structural properties of networks [17][18][19][20], it is important to ensure that a property of interest is indeed a salient feature, and not a mere consequence of some of its other structural properties. A method that is often used to make this check is to compare the significance of the structural property present in a given network with respect to the same property in benchmark null model networks [21][22][23][24]. This step should be taken with care as choosing an inappropriate null model for a given network may lead to wrong conclusions about its functional and structural features [25][26][27].
The maximum entropy network null models [6,8,9,12,13,16,54] have proven to be an indispensable tool in avoiding possible statistical biases caused by interdependencies of structural network properties. These models are ensembles of networks that reproduce given structural properties, and that are maximally random in all other respects. This maximal randomness is important for a great variety of tasks [17][18][19][20][21][22][23][24][25][26][27]. For a basic example, if a maximum entropy null model is defined by property X observed in a real-world network, and if the model also reproduces some other property Y of the network, then we know right away that Y is not a salient independent feature, but a statistical consequence of X [20].
Maximum entropy network models are usually formulated for a given observed network G * of size n in terms of sets of constraints. These constraints are usually some properties C(G * ) of network G * that the model is required to reproduce. We note that the constraints do certainly not have to be properties of any real network, they can be any set of (artificial) network properties, but what we describe is the most common application scenario.
Given constraints C(G * ), the model is then an ensem-ble of graphs G of the same size n as the original graph G * , with each graph G ∈ G appearing in the ensemble with probability P (G), known as the ensemble distribution. The ensemble distribution in a maximum entropy model defined by constraints C(G * ) is the unique unbiased probability distribution P (G) that maximizes Gibbs/Shannon entropy and that satisfies the constraints C(G * ) and the normalization condition G∈G P (G) = 1 [16,55].

B. Unweighted configuration models
In the simplest case of undirected and unweighted networks, the degrees of all nodes in a given network are frequently used as constraints in maximum entropy null models. The simplest example is the configuration model.
Configuration model (CM) [33,34]. The CM is a microcanonical ensemble of graphs with the same degree sequence as observed in a real network. That is, given the degree sequence {k * 1 , . . . , k * n } = k * observed in a graph G * , the CM ensemble consists of all graphs G with exactly the same degree sequence, i.e., for each degree sequence k(G) of an ensemble graph G, the following holds: The distribution P (G) that maximizes Shannon entropy in Eq. (1) is the uniform distribution over the set of all graphs whose degree sequence is k * , meaning that for these graphs P (G) = 1/N k * and S = log N k * , where N k * is the number of graphs with the degree sequence k * . For all other graphs, P (G) = 0. Soft configuration model (SCM) [6,8,10,35,36]. The SCM is a grandcanonical ensemble of graphs whose expected degree sequence is constrained to a given (observed) sequence. Specifically, given a degree sequence k * , the SCM constraint is where k(G) is the degree sequence in an ensemble graph G. Since the degree sequence used as a constraint is reproduced only in expectation, it does not have to consist of integers only; the expected degrees can be any positive real numbers. As in any (grand)canonical ensemble, the Shannon entropy in the SCM is usually maximized using the method of Lagrange multipliers [6], yielding the familiar Gibbs/Boltzmann exponential family distribution with Hamiltonian where ν i is the Lagrange multiplier coupled to node i, k i (G) i's degree in G, P the set of all node pairs, a ij the adjacency matrix of G, and Z = G∈G e −H SCM (G) the partition function. In statistical terms, these equations say that the degree sequence is the sufficient statistics in the ensemble, so that all graphs with the same degree sequence have the same probability in the ensemble. Graphs G can be sampled from the ensemble distribution P (G) constructively by walking over all node pairs i, j and linking them with the Fermi-Dirac connection probability The expected degree κ i of node i in the ensemble is thus so that the values of the Lagrange multipliers ν i for a given k * are found as the solution of the system of n equations Hypersoft configuration model (HSCM) [14,[39][40][41]. The HSCM is a hypercanonical ensemble of graphs defined by the constraint that the expected degree distribution has a given form. The HSCM can be viewed as a hyperparametrization of the SCM, in that the Lagrange multipliers ν are not fixed by any degree sequence as solutions of (8), but random, sampled independently for each node i from a fixed distribution ρ(ν): Having a sampled sequence of Lagrange multipliers ν i , the nodes are then connected as in the SCM, with the Fermi-Dirac connection probability in Eq. (6). The expected degree κ(ν) of a node with Lagrange multiplier ν in the ensemble is where p(ν, ν ) is from Eq. (6). It is convenient to abuse the notations by defining the expected degree random variable κ via where the l.h.s. κ is a random variable, but the r.h.s. κ(ν) is a function, defined in Eq. (10), of the random variable ν whose distribution is ρ(ν). That is, the last equation is a change of latent variables from ν to κ. One can show [14] that in sparse graphs the κ(ν) function can be well approximated as where R and κ 0 are the interchangeable parameters that control the expected average degreek = κ 2 0 e 2R /n in the ensemble. With this approximation, the Fermi-Dirac connection probability in Eq. (6) can be rewritten in terms of the κ variables as As proven in [56], the classical limit (or Chung-Lu [35,36]) approximation to the connection probability (13) "almost always works," in the sense that the HSCM ensembles of random graphs defined by the connections probabilities (13) and (14) are asymptotically equivalent under very mild assumptions on the distribution of κ. Since Eq. (11) defines the relation between the two random variables κ and ν, it also defines the relation between their distributions ρ(ν) and ρ(κ) via the standard formula where ν(κ) is the inverse function of κ(ν) and ν (κ) its derivative. By the definition of κ, its distribution ρ(κ) is the distribution of expected degrees in the HSCM, the analogy of the expected degree sequence κ i (7) in the SCM. Therefore, the HSCM analogy of the SCM constraints (8) is where ρ * (κ) is any desired expected degree distribution. For example, it can be a pure power law, i.e., the continuous Pareto distribution In view of Eq. (12), the distribution of the Lagrange multipliers ν is exponential in this case: The general exact expression for the expected average degree in the ensemble is In sparse graphs, the expected degree distribution ρ(κ) defines the degree distribution P (k) via where Pois(k|κ) is the Poisson distribution with mean κ. Such distributions P (k) are called mixed Poisson distributions with κ the mixing parameter. The shape of a mixed Poisson distribution P (k) follows the shape of the distribution of its mixing parameter ρ(κ). For example, if ρ(κ) is Pareto with exponent γ, then the Pareto-mixed Poisson distribution is also a power law, albeit impure, with the same exponent, P (k) ∼ k −γ , or in stricter terms, it is a regularly varying distribution with exponent γ [32]. As shown in [14] and discussed in Appendix A, the entropy of HSCM graphs is maximized across all graphs whose degree distribution converges to a given distribution.
The nested hierarchy of the described configuration models is visualized in Fig. 1.

C. Weighted configuration models
For weighted networks, the configuration models are analogous to those for unweighted networks discussed above. They are formulated in terms of constraints on both degrees and strengths of nodes.
Weighted Configuration Model (WCM). The WCM is a microcanonical ensemble of weighted networks with sharp constraints on the degree and strength sequences k * and s * . The ensemble consists of weighted graphs that have exactly the same degree and strength sequences as in the observed graph: Analogously to its unweighted version, the distribution maximizing Shannon entropy is the uniform distribution over all graphs with the joint degree-strength sequence equal to (k * , s * ). This ensemble is well defined and such a uniform distribution always exists because the space of weight matrices {w ij } representing graphs satisfying the constraints (21,22) is always of a finite volume (Lebesgue measure) if weights are real, or of a finite cardinality if they are integer. Indeed, since all weights are positive, they all are bounded by the minimum strength of the two incident nodes: 0 < w ij ≤ min(s * i , s * j ). We note that there exist several network models introduced under the WCM name in the past that are different from the WCM definition above. Specifically, in Ref. [42] the authors consider the unweighted multigraph CM with degree sequences following power-law distributions with γ < 2. In these settings, multiple links between the same pairs of nodes are present with high probability. These multiple links are treated as weights in [42]. In Ref. [12], the WCM is a model in which only the strength sequence is fixed, but the degree sequence is not fixed. In Ref. [43], the WCM is a model in which the degree sequence is fixed, while the weights of links attached to a node are sampled from a distribution that is allowed to depend on the node degree. To the best of our knowledge, the WCM as we defined it above has not been considered. The nested hierarchy of the configuration models. Sampling a graph G from the hypercanonical HSCM ensemble can be done in three steps: 1) sample a sequence κ = {κi} of expected degrees independently from the distribution ρ(κ), i.e., from PHSCM (κ|ρ) = i ρ(κi); 2) sample a degree sequence k = {ki} from the SCM degree sequence distribution PSCM (k|κ); and 3) sample graph G from the uniform CM distribution PCM (G|k). The HSCM probability distribution can thus be written as a chain PHSCM (G|ρ) = κ k PCM (G|k)PSCM (k|κ)PHSCM (κ|ρ) dκ showing that the ensemble at the next level of the hierarchy is a probabilistic mixture of the ensembles at the previous level. In the graphon theory [57], one has to consider the fourth highest level (not shown) where ρ is also random. Moving up in the hierarchy adds new sources of randomness, thus increasing entropy. Since PSCM (k|κ) is intractable-it is a mixture of mixed Poisson distributions-in practice it is much easier to sample graphs as described in the text. The same picture applies to the weighted case upon the addition of strengths s and their expectations σ.

Weighted
Soft Configuration Model (WSCM) [9,12,13,15]. The WSCM is a grandcanonical ensemble of networks whose expected degree and strength sequences are constrained to given (observed) sequences. For any given (observed) degree and strength sequences k * and s * , the WSCM constraints where k(G) and s(G) are the degree and strength sequences of an ensemble graph G. As in the unweighted case, Shannon entropy is maximized using the method of Lagrange multipliers leading to the ensemble distribution with Hamiltonian where ν i and µ i are the Lagrange multipliers coupled to node i, constraining its degree k i (G) and strength s i (G), a ij and w ij are the adjacency and weight matrices of G, P and E are the sets of node pairs and connected node pairs, and Z = G∈G e −H W SCM (G) the partition function. The sufficient statistics are thus the degree and strength sequences, so that all graphs with the same degree and strength sequences have the same probability in the ensemble.
The WSCM was defined and studied both for positive integer- [9,12,13] and real-valued weights [15]. In the latter case the graphs can be sampled constructively from the ensemble distribution P (G) as follows. First, every pair of nodes i and j is connected with the connection probability Second, every established link i, j is weighted by a random weight w ij sampled from the exponential distribution with rate µ i + µ j : The expected weight of link i, j is then The expected degree κ i and strength σ i of node i in the ensemble are thus so that the Lagrange multipliers ν i , µ i are found for given k * , s * as the solution of the system of the 2n equations Weighted Hypersoft Configuration Model (WHSCM). We introduce the WHSCM here as a hypercanonical ensemble of weighted networks with positive real-valued weights. The maximum entropy constraint of the model is that the joint distribution of expected degrees and strengths has a given form. Similar to the HSCM, which is a hyperparametrization of the SCM, the WHSCM is a hyperparametrization of the WSCM, meaning that the Lagrange multipliers ν i , µ i of node i are not fixed by any fixed degree and strength sequences. Instead, ν i , µ i are random, sampled independently for each node i from a fixed joint probability distribution ρ(ν, µ): Having a joint sampled sequence of Lagrange multipliers ν i , µ i , the nodes are then connected as in the WSCM, with the connection probability in Eq. (27), and the established links are weighted by random weights in Eq. (28). The expected degree κ(ν, µ) and strength σ(ν, µ) of a node with Lagrange multipliers ν and µ in the ensemble are where p, ω are from Eqs. (27,29). As in the HSCM, it is convenient to abuse the notations by introducing the expected degree and strength random variables κ and σ via thus changing latent random variables from ν, µ to κ, σ.
The joint distribution of the latter is given by the standard formula where ν(κ, σ), µ(κ, σ) are the inverse functions of κ(ν, µ), σ(ν, µ), and |∂(ν, µ)/∂(κ, σ)| is the absolute value of the determinant of the Jacobian: By the definition of κ and σ, their joint distribution ρ(κ, σ) is the joint distribution of expected degrees and strengths in the WHSCM, the analogy of the joint expected degree sequence κ i , σ i (30,31)-joint via node index i-in the WSCM. Therefore, the WHSCM analogy of the WSCM constraints (32,33) is where ρ * (κ, σ) is any desired joint distribution of expected degrees and strengths. The marginal distributions ρ(κ) and ρ(σ) of the joint distribution ρ(κ, σ) are the distributions of expected degrees and strengths in the ensemble. Therefore the expected average degree and strengths in the model are given bȳ The joint distribution ρ(κ, σ) of expected degrees κ and strengths σ defines the joint distribution of actual degrees k and strengths s via Unfortunately, the conditional joint distribution P (k, s|κ, σ) of degrees and strengths k, s of nodes of a given expected degree and strength κ, σ is in general unknown. It is not even known, in general, what the closed form expression is for the conditional distribution P (s|σ) of strengths of nodes of a given expected strength, which appears in the expression for the strength distribution The distribution P (s|σ) is a probabilistic mixture of exponentially distributed random variables with different random rates. The best what is known about such distributions are some bounds on their tails, but only for fixed, not random rates [58]. However, it is known [40] that if the graphs are sparse then the conditional distribution P (k|κ) of degree of nodes of a given expected degree is still Poisson, as in the HSCM, so that Eq. (20) holds in the WHSCM as well.
In Appendix A, we generalize the notions of sparse graphons and their entropy to weighted networks. This generalization allows us to discuss how to extend the HSCM maximum entropy proof to the WHSCM case, thus showing that the entropy of graphs in the WHSCM ensemble defined above is maximized across all graphs whose joint distribution of strengths and degrees converges to a given joint distribution.
The main focus of the rest of the paper is to identify the latent parameter distribution ρ(ν, µ) that leads to joint degree-strength distributions P (k, s) observed in realworld weighted networks. In what follows, we first formulate in the next section this real-world-inspired form of P (k, s) that we want our specific version of the general WHSCM to reproduce, and then derive the ρ(ν, µ) that leads to this P (k, s).

III. SPECIFIC WHSCM REQUIREMENTS
According to our general definition of the WHSCM model in the previous section, a specific version of the model is fixed by a particular choice of the joint degreestrength distribution P (k, s). Here we document a specific set of properties of this joint distribution, dictated by the properties of many real-world weighted networks [3,[45][46][47][48], that we want our specific version of the general WHSCM introduced above to reproduce.
First, we require the degree distribution P (k), a marginal of P (k, s), to be a power law with γ > 2. Here by "power law" and the "∼" sign, we mean that P (k) is a regularly varying distribution [32] which is a distribution whose complementary CDF satis-fiesF where (k) is a slowly varying function. Our power laws will be Pareto-mixed Poisson distributions whose (k)s converge to constants, lim k→∞ (k) = c. Second, we require the strength of nodes to grow superlinearly with their degrees, as observed in many real weighted networks [3,47,48]. This observation is often expressed ass where η ≥ 1 ands(k) is the average strength of nodes of degree k. We interpret this relation to mean that for some constant s 0 . If the distributions P (s|k) of strengths s of nodes of degree k are concentrated around their expected values, and the degree distribution P (k) is a power law with exponent γ, then the resulting strength distribution P (s) is a power law with exponent δ: If η > γ − 1, then the strength distribution P (s) has exponent δ < 2, meaning that for such combinations of γ and η, the strength distribution P (s) has an infinite first moment, so that the average strengths diverges. We do not want to exclude this possibility from our model, since such combinations of the values of γ and η can be found in real networks [48]. Third, we want our model to produce networks whose average degreek, given by Eq. (42), is independent of the network size n. A model satisfying this condition is said to produce sparse networks.
To simplify the problem significantly, we next observe that if the actual degrees k and strengths s are concentrated around their expected values κ and σ-and this is indeed the case as we show in Appendix B-then the requirements discussed above can be formulated in terms of κ, σ instead of k, s.
We are thus looking for a model in which the distribution ρ(κ) of expected degrees κ follows a power law with exponent γ > 2, and the expected strengths σ grow super-linearly with expected degrees κ. For further simplicity, we want expected strengths and degrees to be deterministically related via where σ 0 > 0. This choice instantly fixes the joint expected degree-strength distribution to where δ() is the Dirac delta function, while the expected strengths are distributed as The parameters of a WHSCM model satisfying the discussed requirements are thus γ, δ,k, σ 0 .
The reason for having σ 0 as a parameter instead of the more natural average strengths, given by Eq. (43), is that the latter is actually infinite if δ < 2 as discussed above. However, σ 0 is well defined even in this case, and controls the baseline of the scaling of strength as a function of degree. If δ > 2, the expected average strengths is finite and in one-to-one relation to σ 0 viā

IV. EXPERIMENTAL HINTS FROM THE WSCM
We are to find the latent parameter distribution ρ(ν, µ) that yields the distribution ρ(κ, σ) of expected degrees and strengths that we want in Eq. (54). Unfortunately, the accomplishment of this task using brute force does not appear possible since it involves solving a system of nonlinear integral equations, as we will see below. Therefore, in this section, we describe a workaround that relies on getting some experimental hints from MLE techniques applied to the WSCM.
First, for convenience and consistency with the powerlaw HSCM described in Section II, we change variables from ν to λ via The support of ν is (−∞, R], so that the support of λ is [1, ∞). We will see that R is the parameter that controls the average degreek and constant σ 0 , and that it grows logarithmically with n, Appendix C. With this change of variables, the connection probability and expected weight change from (27,29) to while the expected degree and strength as functions of the latent variables change from (35,36) to Observe that since weights are positive, the support of µ is (0, ∞).
The brute-force solution of this task is the solution of the system of the two two-dimensional nonlinear integral equations (62,63) with respect to ρ(λ, µ), so that the resulting ρ(κ, σ) is as required in (54). The required amount of brute force for this task is well beyond our analytical strength.
Therefore, we devise a workaround, looking for an ansatz for ρ(λ, µ) using numerical experiments. Specifically, we attempt to "reverse engineer" the distribution ρ(λ, µ) by inferring the values of variables λ i , µ i in the WSCM for synthetically generated degree and strength sequences that satisfy our desired WHSCM constraints in Section III.
The inference is done using maximum likelihood estimation (MLE) by finding a set of values λ i , µ i that maximize the log-likelihood L = log P (G) of a weighted graph G in the WSCM. The probability P (G) of G in the WSCM ensemble is given by Eq. (25). The partition function Z in that equation is calculated in [15] to yield the following explicit expression for P (G): where a ij and w ij denote the entries of the adjacency and weight matrices of G, and P and E denote the sets of node pairs and connected node pairs in G. Using where k i , s i are the degree and strength of node i in graph G, the expression for the log-likelihood simplifies to Observe that log P (G) depends on graph G only via its degree k and strength s sequences, because they are the sufficient statistics in this grandcanonical ensemble. Furthermore, the probability P (k, s|κ, σ) of the degreestrengths sequence k, s in the WSCM ensemble defined by the expected degree-strength sequence κ, σ is at its maximum for k = κ and s = σ [10,16]. Therefore, to execute our MLE program, all we have to do is to generate synthetic degree-strength sequences satisfying the requirements in Section III, and then feed them to the MLE applied to the log P (G) above. We do so as follows: for i = 1, . . . , n, we sample real random numbers x i from the Pareto distribution with exponent γ > 2, and then round them to the closest integers to yield degrees sequences k i = [x i ]. The strengths are then set to s i = σ 0 k η i for some σ 0 > 0 and η ≥ 1. The obtained sequences of degrees {k 1 , . . . , k n } and strengths {s 1 , . . . , s n } are then supplied to the MLE inference of the parameters {ν 1 , . . . , ν n } and {µ 1 , . . . , µ n } by maximizing the log P (G) in Eq. (67) which is done using the simulated annealing algorithm available in the PaGMO package [59]. The inferred {ν 1 , . . . , ν n } are then mapped to {λ 1 , . . . , λ n } via (59) The obtained sequences {λ 1 , . . . , λ n } and {µ 1 , . . . , µ n } for different values of γ and η are shown in Fig. 2. From this figure, we extract several hints suggesting a possible shape of the joint distribution of latent variables ρ(λ, µ): 1. the variable µ can be directly coupled to the variable λ via some function f (λ) as indicated by the λ-µ scatter plots; 2. the visual inspection of the λ-µ scatter plots on the log-log scale suggests that this function scales approximately as a power-law for small and large values of λ, potentially with two different exponents, i.e., f (λ) ∼ λ −β1 when λ → 1, and f (λ) ∼ λ −β2 when λ 1; 3. the complementary CDFF (λ) behaves approximately as a power-law for small and large values of λ, potentially with two different exponents α 1 and α 2 .

V. WEIGHTED HYPERSOFT CONFIGURATION MODEL WITH POWER-LAW DEGREE AND STRENGTH DISTRIBUTIONS
Here we rely on the observations at the end of the previous section to specify a particular version of the WH-SCM model that satisfies the requirements in Section III. These observations instruct us to set the joint distribution of latent parameters λ, µ to where δ() is the Dirac delta function. This setting fixes the WHSCM-definitive distribution ρ(ν, µ) via the change of variables (59), but ρ(λ) and f (λ) are still to be specified. We specify them, again following the hints from the end of the previous section, as follows. The distribution of λ is set to where α 1 > 1, α 2 > 1, and λ c is a crossover point between the two power laws with exponents α 1 and α 2 , whereas A 1 , A 2 are the normalization constants given by while the function f (λ) is set to where a > 0, β 1 ≥ 0, β 2 ≥ 0. With these settings, our model is fully specified by the following set of parameters: 1. exponents α 1 , α 2 , β 1 , β 2 , and 2. parameters R and a, that we move on to specify below. The double power law crossover point λ c may also appear as a free parameter, but we set it to be a function of other parameters as follows.
Recall that the connection probability p(λ i , µ i , λ j , µ j ) in the model is given by Eq. (60). The crossover point λ c is selected in such a way that for λ i , λ j below λ c this connection probability can be approximated by dropping the 1 in the denominator, analogous to the classical limit approximation of the Fermi-Dirac distribution function in statistical physics [14]. This means that the constant λ c defines the point where the term 1 in the denominator in (60) is comparable to the other term there. Therefore, we define λ c to be the value of λ i , λ j such that e 2R × µi+µj λiλj = 1, so that We show in Appendix D that with the settings above, the expected degree of a node with latent parameter λ can be written as where the integrals I 1 -I 4 are explicitly defined in Eqs. (D1-D4). Similarly, the expected strength of a node with latent parameter λ can be written as where the integrals I 5 -I 8 are explicitly defined in Eqs. (D13-D16). We also show in Appendix D that the functions κ(λ) and σ(λ) can be well approximated by power laws where φ 1 ≥ 1, 0 < φ 2 ≤ 1, χ 1 ≥ φ 1 , χ 2 ≥ φ 2 , and that with these approximations, the distributions of expected degrees and strengths do indeed exhibit power-law behavior: Now, if we know how the scaling exponents φ 1 , φ 2 , χ 1 , χ 2 behave as functions of the model parameters, we can easily select those parameters to ensure that ρ(κ) ∼ κ −γ for all values of λ, and that ρ(σ) ∼ σ −δ with δ = 1 + γ−1 η as required in Section III. In Appendix D we find φ 1 , φ 2 , χ 1 , χ 2 as functions of α 1 , α 2 , β 1 , β 2 , and show that the following nontrivial choice of the latter as functions of γ, η produces the desired outcome: The choice of the model parameters above is obtained using a series of approximations outlined in Appendix D.
We find in simulations that this choice produces networks with desired properties if γ > 2 and η ≤ 2 as we will see below.
Finally, we have to express the R and a parameters as functions of the average degreek and σ 0 . In the λ terms, the average degree is equal tō Both R and a appear in this integral. The second equation defining these two parameters is σ = σ 0 κ η . For any given value of the latent parameter λ = λ 0 , we can write For the reasons explained in Appendix D, we set the λ 0 to the value such that κ(λ 0 ) =k. By solving numerically the system of Eqs. (84,85) with the λ 0 set to this value, we find the values of the parameters R and a. With these solutions, the model is fully specified. Weighted hypersoft configuration model with power-law degree and strength distributions. To summarize, the graphs in the described power-law WH-SCM can be generated using the following algorithm: 1. For a given set of input parameters, which are the network size n, the degree distribution power law exponent γ > 2, the strength-degree scaling exponent η ≥ 1, the average degreek, and the constant σ 0 controlling the baseline of the strength-degree scaling, 2. Find the model parameters α 1 , α 2 , β 1 , β 2 , R, a using Eqs. (80)(81)(82)(83)(84)(85).
4. For each node pair (i, j), draw the link between them with probability given by Eq. (60).
5. For each node pair (i, j) linked at the previous step, draw the weight of the link from the exponential distribution with rate µ i + µ j , Eq. (28).
We provide an implementation of this algorithm at the GitHub repository [49].

VI. SIMULATION RESULTS FOR SYNTHETIC NETWORKS
Here we check in simulations that the specific version of the general WHSCM model documented at the end of the previous section produces networks satisfying the requirements in Section III.
In our simulations, we generate graphs of size n = 10 5 with the average degreek = 10 and σ 0 = 0.1 for γ ∈ {2.2, 2.6, 3.0} and η ∈ {1.0, 1.1, 1.3, 1.5, 1.7, 1.9}. For each combination of the parameters, 20 graph instances are generated. The resulting empirical complementary CDFs of degrees k and the average strengthss(k) of nodes of degree k are shown in Fig. 3. We see that the generated networks indeed have power-law degree distributions with the prescribed values of exponent γ, and that the strength-degree correlations follow the superlinear law with the prescribed values of exponent η.
To demonstrate how the degree-strength distributions, constrained in the maximum entropy manner, implicitly constrain some other network properties to some not exactly trivial values, we measure the weight disparity [60][61][62] in the generated networks. The weight disparity Y i is a quantity that characterizes the heterogeneity of weights of links incident to node i. It is defined as where N i denotes the set of neighbors of node i, w ij the weight of the link between nodes i and j, and s i the strength of i. If most of the strength of node i is concentrated in the weights of a few links incident to i, then Y i is close to 1. If all the links incident to i carry the same weight s i /k i , then Y i is equal to 1/k i . To characterize the local distribution of weights in a weighted network as a whole, the average weight disparityȲ (k) of nodes of degree k is often looked at [63]. IfȲ (k) ∼ 1/k for all degrees k, then the weights are homogeneously distributed among all links and all nodes. An average weight disparity function decaying slower than 1/k, as observed in many real networks [48,62], indicates that the link weights are distributed more heterogeneously.
In Fig. 4 we show the average weight disparity func-tionsȲ (k) in the generated WHSCM networks. We observe that the weight disparity behaves quite differently for different values of the scaling exponent η. For small values of η, the weight disparity as a function of degree scales roughly asȲ (k) ∼ 1/k in the lower to mid-range degree values. For larger values of γ this behavior persists even for large degrees, indicating that weights are distributed homogeneously for nodes of both low and high degrees. However, if η is larger, the high degree behavior ofȲ (k) changes entirely, and the function starts to increase with the degree k. This indicates that the higher the degree of a node, the more heterogeneous the distribution of weights among its links.
The intuition behind this effect is that the higher the exponent η, the larger the strengths of the highdegree nodes. In order for these nodes to satisfy their demanding strength constraints without disturbing the small strengths of low-degree nodes attached to them, they have to allocate increasingly heavier weights on their links to other high-degree nodes. This creates a weighted connectivity pattern where large portions of the strengths of high-degree nodes are distributed among the links interconnecting the high-degree, high-strength nodes. This weighted rich-club pattern is similar in spirit to the unweighted rich-club effect [25][26][27]. It is important to reemphasize that this seemingly nontrivial effect is caused purely by simple constraints on low-order network properties-namely, by heterogeneous degree distributions and super-linear scalings of strengths with degrees.

VII. POWER-LAW WHSCM VERSUS REAL-WORLD NETWORKS
Finally, we demonstrate a use case scenario involving the power-law WHSCM to construct null model graphs for the following three real-world weighted networks: 1. C. elegans metabolic network [64][65][66], where nodes are metabolites and links indicate interactions between them, with the link weight representing the number of such interactions; 2. computational geometry collaborations network [67], where nodes are authors publishing works on computational geometry and links between them represent co-authorships, with the link weight representing the number of co-authored works; 3. Bible proper nouns network [66,68], where nodes are proper nouns of places and names in the King James Version of the Bible and links between them indicate that a pair of nouns are mentioned in the same Bible verse, with link weights representing the number of such noun co-occurrences.
To construct null model graphs for these networks, we first measure the WHSCM input parameters in these real networks. We find the n andk parameters directly from the networks, the power-law exponent γ is found using the package from Ref. [32], and the exponent η and σ 0 are found by a linear fit of degrees and strengths in the log-log scale. The resulting values for the three networks are shown in Table II. Using the parameters from Table II, we generate 10 synthetic WHSCM network instances for each of the three real networks, and compare the basic structural properties of the real networks and their WHSCM null model counterparts. Specifically, we look at the following properties:   The last, less familiar property is defined in [69] for a node i as where (j, k) are all the pairs of neighbors N (i) of the node i, and w denotes the link weight rescaled by the maximum weight observed in the network. The function c w (s) is the the average of c (i) w across all nodes i whose strength is s. If weights are real, every node has a unique strength with high probability, so that this function is a scatter plot consisting of n data points, one data point for every node.
Juxtaposing these properties in the considered real networks against their WHSCM counterparts in Fig. 5, we observe that the properties fixed by the WHSCM are well-captured by the null model graphs, as expected. Moreover, even though the weight disparity is not explicitly constrained in the WHSCM, it nevertheless behaves in a qualitatively similar manner in the real and null model networks. Network namecu,rcu,scw,rcw,s C. elegans metabolic 6.5 · 10 −1 2.2 · 10 −2 6.6 · 10 −3 2.8 · 10 −3 Comp. geo. collab. 4.9 · 10 −1 3.2 · 10 −2 4.7 · 10 −3 8.2 · 10 −5 Bible proper nouns 7.2 · 10 −1 4.6 · 10 −2 5.2 · 10 −3 5.0 · 10 −4 However, both the unweighted and weighted clustering coefficients are much lower in the null models than in the real networks. To quantify this difference a bit further, we average the unweighted and weighted clustering coefficients across all nodes in the real networks and across all nodes in all the 10 WHSCM replicas of the real networks. The results are shown in Table III. We see that in all the three real networks, both the unweighted and weighted average clustering is much higher, often by orders of magnitude, than in their null model counterparts. This observation suggests that the clustering coefficient is a statistically significant structural feature of these networks that cannot be explained by power-law degree and strength distributions alone. Some other mechanisms, such as latent geometry [48], are thus responsible for the formation of clustering in these real networks.

VIII. DISCUSSION
There exist a plenty of weighted network models with tunable degree and strength distributions [46][47][48][70][71][72][73][74][75][76][77][78][79][80]. Here, we have contributed to this list by introducing the unique unbiased null model, the WHSCM, that satisfies the maximum entropy requirement. The model produces random graphs whose entropy is maximal across all graphs whose joint distribution of degrees and strengths converges to a given distribution. The outline of the proof that this is indeed so in Appendix A is not as detailed as the proof of the maximum entropy properties of the HSCM in [14]. The WHSCM proof can thus be improved by filling in all the missing details.
In developing a particular version of the model with power-law degree and strength distributions, we encountered the major challenge in the form of the system of nonlinear integral equations (35,36) or (62,63) that need to be solved to find the latent parameter distribution ρ(ν, µ) or ρ(λ, µ). These equations appear intractable, so that we devised a workaround that worked well, but is there a better, more principled solution to the problem?
It would be nice to have such a solution because one may wish to constrain the degree-strength distributions not necessarily to power laws but to something elseto truncated power laws, for instance, observed in real weighted networks [3,[81][82][83]. What is the latent parameter distribution in this case? In our experiments (not shown), we saw that some relatively "clean" latent pa-  87), as a function of node strength. The data for the real networks are shown in the orange color, while the data for the synthetic WHSCM networks are in blue. For each real networks, 10 random WHSCM graphs are generated using the power-law WHSCM with the parameters listed in Table II. rameter distributions ρ(λ, µ) tend to lead to truncated power laws in certain cases, but even with the "cleanest" choices of ρ(λ, µ), things are difficult to control either analytically or even numerically. In any case, for any desired strength-degree distribution ρ(κ, σ), the most principled solution is the exact solution of the system of integral equations (62,63) with respect to the latent parameter distribution ρ(λ, µ).
We emphasize that the introduced null model is what it is, a null model, so that it should be used as such. It should not be confused with or considered as a realistic model of real-world weighted networks. We saw, for instance, that the model does not capture clustering observed in real networks. Latent geometry was proposed in [48] as a possible mechanism explaining clustering in real weighted networks. It would be interesting to see whether the model in [48] satisfies the maximum entropy requirements, and if so, then under what constraints.
Related to that, it would be also nice to have a weighted generalization of random hyperbolic graphs [84] whose maximum entropy properties are well understood and whose infinite temperature limit is exactly the HSCM [85,86]. In other words, what is the model of weighted random hyperbolic graphs that have analogous maximum entropy properties and whose infinite temperature limit is the WHSCM? Here we summarize the key points of the proof from [14] that the HSCM random graphs maximize graph entropy across all graphs whose degree distribution converges to a given distribution, and show how to generalize this proof to weighted graphs in the WHSCM.

HSCM as entropy maximizer
For a given distribution P (k) of node degrees k, what is the graph ensemble whose ensemble distribution P (G) of graphs G maximizes Shannon entropy (1) but in such a way that the degree distribution in the ensemble converges to P (k)? As proved in [14], the unique answer is the HSCM.
The proof is not exactly trivial because any brute-force attack at it is doomed to fail since if understood literally, the task is an intractable combinatorial optimization problem, a mission impossible. To circumvent this impasse, a workaround collection of techniques, based on the graphon theory [57], was devised in [14]. We describe this collection here.
The main idea of this workaround proof is to maximize not the entropy S[P (G)] of the intractable ensemble distribution P (G) but the graphon entropy defined below. Intuitively, the graphon entropy is the entropy of random edges conditioned on given values of the Lagrange multipliers ν i . In other words, for a given collection of fixed ν i s, the graphon entropy is the SCM entropy. The maximization of the graphon entropy turns out to be a tractable functional analysis problem with an explicit unique solution, but there is another contribution to the HSCM entropy which is the entropy of ν i s that are not fixed but random in the HSCM. The crux of the proof is to show that the entropy of the graphon that maximizes the graphon entropy is the leading term in the graph entropy S[P (G)], while the entropy of ν i s is subleading. Since so, the graphon that maximizes the graphon entropy maximizes also the graph entropy, thus solving the original entropy maximization problem. We provide some key details behind how this works next.
In the HSCM, the graphon is no mystery but just the connection probability function p(ν, ν ). This function literally says that if the Lagrange multipliers of two nodes happen to be ν and ν , then the link between them is the Bernoulli random variable with the success rate p(ν, ν ). Observe that the Bernoulli random variable Be(p) is trivially the random variable that maximizes the Shannon entropy of distributions on {0, 1} with mean p. Recall that the Lagrange multipliers ν as random variables can be mapped to the expected degrees κ via (10,11) resulting in the graphon p(κ, κ ) expressed as a function of κs. Observe that if κ is now treated as a latent variable, then the expected degree of a node with latent variable κ is κ: The edges a ij in the HSCM are Bernoullis with different success rates p(κ i , κ j ) that are random because κs are random. The entropy S[Be(p)] of the Bernoulli random variable with the success rate p is These observations justify the definition of the entropy S[p] of the graphon p() in [57] which is What is the graphon p() that maximizes the graphon entropy in Eq. (A3) while satisfying the constraint in Eq. (A1), where ρ(κ) is our desired expected degree distribution, the one that yields the desired P (k)? As shown in [14], it is relatively simple to prove that the unique exact answer is the Fermi-Dirac graphon in Eq. (6) with νs mapped to κs via (10,11). As also shown in [14], in sparse graphs this exact solution is asymptotically equivalent to the approximate expressions in Eqs. (13,14) that express the solution graphon explicitly in terms of the κ variables.
Proving that the entropy of this graphon S[p] is the dominating term in the graph entropy S[P (G)], in comparison to entropy of random κs, is a much more delicate endeavor. For this, the following techniques from [57] are properly adjusted in [14].
First, it is easy to see that the graphon entropy is a trivial lower bound for the HSCM graph entropy divided by n 2 . Indeed, if all κs are fixed, then the graphon entropy in the HSCM is the entropy of the SCM graphs with this graphon divided by n 2 . The hard part is thus to find a matching upper bound, and this is where the techniques from [57] come really useful.
The key point in establishing such an upper bound is to recognize that for any partition of the values of κs into consecutive intervals π k , k = 1, . . . , K, the entropy of P (G) is upper-bounded by the entropy of the averaged graphon defined below, plus the entropy of the indicator random variables I ik that indicate whether the random expected degree κ i of node i happened to land in the interval π k . The averaged graphon is defined as the piecewise constant functionp(κ, κ ) whose values for the values of κ, κ belonging to a given rectangle π k ×π k in the κ×κ partition are equal to the average value of the graphon p(κ, κ ) in this rectangle. Observe that the smaller the number of the partition intervals K, the smaller the total entropy of the indicator random variables I ik , just because there are fewer of them, but the larger the sum of the error terms coming from graphon averaging, simply because rectangles π k × π k are large. The smaller they are, the smaller the total graphon entropy error term, but the larger the total entropy of indicators I ik . The crux of the proof is to find a "sweet spot"-the right number of intervals of the right size guaranteeing the proper balance between these two types of contributions to the upper bound, which we want to be tighter than the difference between the graph and graphon entropies. In sparse graphs, this task turns out to be a blade runner exercise.
Notwithstanding these blade runner difficulties, the required partition π k was found in [14], completing the proof that the HSCM is indeed the unique entropy maximizer across all random graph ensembles whose degree distribution converges to a given P (k).

WHSCM as entropy maximizer
The maximum entropy HSCM proof described in the previous section should apply to the WHSCM as well, upon the modifications that we discuss below. The key idea behind these modifications is a proper generalization of graphons and their entropy to weighted networks.
Similar to the unweighted case, in the weighted case it is more convenient to deal with the expected degree and strength variables κ, σ instead of the Lagrange multipliers ν, µ. The map from the latter to the former is given by Eqs. (35,36) which, if rewritten in the κ, σ variables, become the system of the self-consistency equations analogous to Eq. (A1). In unweighted networks, graph edges a ij are Bernoullis with different random success rates p(κ i , κ j ): In weighted networks, the edges are no longer Bernoullis. Instead they are random variables that we call Bernoulli-Exponential, or BeExp for short: where p and ω are the two parameters of the BeExp. We define the vanilla BeExp as follows: if w = BeExp(p, ω), where p ∈ [0, 1], ω > 0, and w ≥ 0, then w = 0 with probability 1 − p, while with probability p, w is the exponential random variable with mean ω (or rate 1/ω). In other words, the PDF of the BeExp w = BeExp(p, ω) is so that its entropy is where S[Exp(ω)] = 1 + log ω is the entropy of the exponential distribution with mean ω. We observe that the BeExp(p, ω) can be intuitively thought of as a "smearing" of the probability p of 1 (edge existence) in Be(p) into the Exp(ω), the exponential distribution with mean ω (edge weight). As Be(p) is trivially the maximum entropy distribution on {0, 1} with mean p, so is BeExp(p, ω), less trivially, the maximum entropy distribution on [0, ∞) with mean ω and P (w > 0) = p. That is, the BeExp(p, ω) is the maximum entropy distribution under the constraints that the edge exists with probability p and that its mean weight is ω. It is common knowledge in statistical mechanics that in maximum entropy canonical ensembles of systems of particles, the distributions of particles over particle states are also maximum entropy (Fermi-Dirac or Bose-Einstein). Since graph edges are analogous to particles in statistical mechanics [6,9,15,16], these observations motivate us to constrain the space of all possible probability distributions on [0, ∞) to the two-parametric maximum entropy BeExp family.
The rest of the proof then proceeds as in the unweighted case, using the same techniques as in [14]: first show that the unique graphon maximizing the graphon entropy in Eq. (A10) subject to the constraints in Eqs. (A4,A5) is given by Eqs. (27,29) with ν, µ mapped to κ, σ via Eqs. (35,36), and then prove that the entropy of this graphon dominates the graph entropy, while the entropy of random κ, σ is negligible.
The latter step could be challenging as it calls for repeating the blade runner partition finding exercise from [14], this time for the product space of κ × σ values. This should be still possible using the same ideas as in [14]-roughly, the key idea is that the partition is such that all its boxes have the same number of nodes in them on average. However, for the specific power-law WHSCM considered in this paper, or for any other WH-SCM version in which strengths σ are set to be a deterministic function of degrees κ, we only need to partition the space of κ values. That is, the settings are exactly as in [14] in that regard, so that exactly the same partition as in [14] can be used in these cases.

Appendix B: Simulation results for the relation between expected and actual degrees and strengths
The WHSCM is formulated in terms of constraints for the joint distribution of expected degrees and strengths, while in many cases we are interested in the behavior of actual degrees and strengths realized in the corresponding ensemble of graphs. Thus, we need to show that the results obtained so far for expected values also hold for actual degrees and strengths. For latent variable graph models, it is known that actual degrees are concentrated around their expected values, and distributed according to Poisson distribution, i.e., P (k|κ) = Pois(κ) [40]. Since no similar claims are known for the behavior of strengths, we test the correlation between both κ and k, and σ and s values in our model. To this end, we constructed logbinned scatter plots of actual degrees/strengths versus their expected values in WHSCM graphs with varying parameters γ and η. The results are shown in Fig. 6. From the figure, it is evident that both degrees and strengths are highly correlated with their expected values. The narrow error bars also indicate that the distribution of actual degree/strength values around their expected values are narrow. Thus, the power-law scalings obtained for the expected values should also hold for actual values, as we have already demonstrated in the main text for various values of γ and η.
Appendix C: Scaling of R with the number of nodes In the unweighted HSCM, the parameter R scales as R ∼ 1 2 log n with the network size n, which is evident from Eq. (12). This motivated us to assume a similar scaling for the analysis of the WHSCM. In this appendix, we validate this choice by studying numerically how the parameters R and a scale with the system size in the WHSCM.
To this end, we solve Eqs. (84) and (85) as function of network size n for various input exponents γ, η, and fixed values of average degreek = 10, and expected strengthdegree scaling constant σ 0 = 0.1. The resulting solution curves are shown in Fig. 7. The solutions indicate that the parameter R scales with n in the same way as in the unweighted HSCM case, while the parameter a varies slowly with the network size. We note that since we solve for the R, a numerically with fixed average degree requirement, the resulting networks are guaranteed to have constant average degree independent of network size, thus forming a sparse ensemble of graphs.  In the WHSCM, each node is characterized by the two latent parameters λ and µ that are distributed according to the joint probability distribution from Eq. (68). In Section V we stated expressions for the exponent α 1 , α 2 , β 1 and β 2 for the densities of λ and µ. They are obtained from analysis of the integral expressions for both κ and σ in Eq. (62) and (63), respectively.
Approximating the integral expressions for expected degrees. By our double power-law choice for the marginal distributions for λ and µ, the expected degree of a node with latent parameter λ, Eq. (62), may be written as the following four integrals, where each in- tegral represents one of combinations for "small" (≤ λ c ) and "large" (> λ c ) parameters λ and λ : While these integrals cannot be computed directly in a closed-form, it is possible to approximate them. For (D1) we use that for both λ, λ < λ c the 1 in the denominator can be removed. This cannot be done for the other three integrals, since λ > λ c or λ > λ c . Therefore, in the second integral (D2) we use that λ β2−β1 c λ −β2 ≤ λ −β1 c and hence this term is negligible with respect to λ −β1 when 1 ≤ λ ≤ λ c . A similar approach is applied to (D3), with the role of λ and λ reversed. Finally for integral (D4) we use that for λ, λ > λ c , and, given that λ c ≈ (2ae 2R ) 1/(2+β1) and ae 2R ∼ n, as we demonstrate in Appendix C, we have so that the 1 is the dominant term in the denominator. This allows us to obtain the following approximations for the integrals above: The four integrals on the right-hand side can be evaluated exactly: , where 2 F 1 (q 1 , q 2 , q 3 , z) is Gauss hypergeometric function.
Approximating the integral expressions for expected strengths. As was the case for the expected degree, the expected strength as a function of the latent parameter λ, see Eq. (63), may be split into four integrals as follows: Illustration of the approximation used to find expression for κ and σ. The scheme is plotted on the log-log scale.
Analysis of the φ 1 scaling exponent. We start by analyzing the scaling of κ ∼ λ φ1 for the small values of λ, i.e., λ ≤ λ c . While it is possible to get the scaling of hypergeometric functions from Eqs. (D9) and (D10), we note that the resulting scaling of κ as a function of λ is given by the sum of λ terms raised to different powers. Moreover, coefficients in front of these terms may change their signs depending on the choice of the parameters α 1 , α 2 , β 1 , β 2 . In general, for any combination of the model parameters, it is impossible to approximate this sum of power terms as a single power of λ, e.g., by extracting leading order terms.
To circumvent this issue, we find an approximation to the φ 1 scaling using the observations from the MLE inference of the WSCM latent parameters introduced in Section IV. More precisely, we first infer the nodes' λ and µ parameters given degree and strength sequences with predefined exponents γ and η. Second, we obtain the estimate of the φ 1 by linear fitting of the κ(λ) function on the log-log scale. We observe that for various input values of γ and η, the φ 1 scaling exponent is very close to η. We therefore assume that φ 1 ≈ η, and, given Eq. (D23), the α 1 parameter is set to: Analysis of the χ 1 scaling exponent. The approximated expression for the expected strength σ(λ) in the small regime, i.e., λ ≤ λ c , is given by Eqs. (D17) and (D18). Although it is again possible to obtain σ(λ) scaling in terms of series of various λ power-terms, it is hard to extract a single power scaling that approximates the sum of these power-terms for all parameters α 1 , α 2 , β 1 , β 2 . However, it is possible to relate the χ 1 scaling exponent to the β 1 parameter using the following considerations. From the MLE inference of the WSCM latent parameters as in the case of the φ 1 scaling exponent, we observe that the resulting empirical model parameter β 1 is linearly related to the η − 1, and the coefficient between the two depends only on the input parameter γ, i.e., β 1 ≈ u(γ)(η − 1), where u(γ) is a function of γ. By numerical fitting, we find that u(γ) behaves approximately as u(γ) ≈ γ − γ−2 γ . Thus which yields β 1 (η + 1) ≈ (γ − (γ − 2)/γ)(η 2 − 1), so that η 2 ≈ 1 + (η+1)β1 γ−(γ−2)/γ . Moreover, given Eq. (D25) and the fact that φ 1 may be approximated as φ 1 ≈ η, the resulting exponent χ 1 should be approximately equal to χ 1 ≈ η 2 . This means that χ 1 ≈ 1 + (η+1)β1 γ−(γ−2)/γ . Analysis of the φ 2 scaling exponent. The scaling of κ(λ) ∼ λ φ2 for large λ > λ c is encoded in the two integrals from Eqs. (D11) and (D12). The approximation in Eq. (D12) does not have any λ-dependent terms, so it does not contribute to the scaling. We may approximate the scaling of the hypergeometric functions appearing in Eq. (D11) for λ → λ c . In this case, the argument of the second hypergeometric function approaches −1, therefore, the function approaches a constant and does not contribute to the scaling. The first hypergeometric function may be approximated as follows, assuming that its argument is large: 2 F 1 1, π(α 1 − 1) (1 + β 1 ) sin π(α1−1) 1+β1 λ ae 2R α 1 −1 1+β 1 .

(D29)
Additionally, in the case when λ → ∞, the κ(λ) function should saturate to the maximum possible degree n − 1, therefore, we only consider the κ(λ) scaling near the left boundary of this regime, i.e., λ c . The signs of the coefficients in front of the two λ-dependent terms may be of different signs, so it is again impossible to extract a single-exponent scaling from the expression above. However, from the WSCM MLE-inferred latent parameters as was done in the case of φ 1 , χ 1 scalings, we observe that for large values of η 1, the scaling exponent φ 2 → α1−1 1+β1 . We therefore seek a scaling exponent φ 2 in the form φ 2 ≈ ψ(γ, η) α1−1 1+β1 . We know that φ 2 → 1 when η → 1 to be compatible with the unweighted HSCM, and we know that in this limit β 1 → 0, α 1 → γ. Therefore, there should be a prefactor of 1 γ−1 in front of the α1−1 1+β1 to guarantee that φ 2 → 1 in the HSCM limit. Additionally, as φ 2 → α1−1 1+β1 in the large η limit, we need to compensate for this prefactor. Given these requirements, we find from the numerical fitting procedure that the following form of ψ(γ, η) fits well the φ 2 exponent for various γ, η parameters: ψ(γ, η) = 1 γ−1 1 + (γ − 2) (1 − 1/η) . Then the φ 2 scaling exponent is: φ 2 = α1−1 1+β1 1 γ−1 1+(γ −2) (1 − 1/η) . Therefore, the model parameter α 2 should be selected as follows: Analysis of the χ 2 scaling exponent. The scaling of σ(λ) is given in Eq. (D19) and (D20). The hypergeometric functions from Eq. (D19) will not contribute to the scaling in the large λ limit, as their arguments would approach −1, so we only expect to see an effect of these functions near the left boundary of the region, λ c . Conversely, the hypergeometric function from Eq. (D20) is the main contributor to the scaling for large λ. We note that we may not neglect the behavior of the σ(λ) for large λ, as, in general, strengths are not bounded for a weighted network, unlike degrees that have to be at most n−1. Using the large argument approximation for hypergeometric functions as before, we obtain for Eq. (D20): Given the λ β2 prefactor in Eq. (D20), we observe that the only possible resulting scaling may be ∼ λ 1+β2−α2 . However, we note that the large-argument approximation used above is only valid for β 1 > 0 and β 2 > 0. Instead, when η → 1, we expect to recover the unweighted HSCM behavior, effectively giving us the same scaling for the χ 2 as for the φ 2 exponent. Thus, we will set β 2 to 0 in this limit, and use the above approximation otherwise. This defines our choice for the β 2 parameter: Finding the (R, a) solution. As explained in Section V, we find the R, a parameters of the WHSCM by numerically solving Eqs. (84,85) for a fixed λ 0 . For the solver from our code package [49], we set λ 0 such that κ(λ 0 ) =k. This is done to prevent the solver from finding an (R, a) solution that corresponds to a low-degree region (k <k) of thes(k) scaling curve that may not exhibit a clean power-law scaling and distort the resulting baseline for the strength-degree correlation curve. For each solver round, we iteratively update the λ 0 value corresponding to κ(λ 0 ) =k with the current solver's guess of (R, a).
Appendix E: Behavior of the WHSCM in the η → 1 limit With the choice of model parameters above and setting η = 1, we have the following distribution of the latent parameter λ: where λ ∈ (1, ∞) and α = α 1 = α 2 = γ. This gives the expressions for κ(λ) and σ(λ): The integral for κ(λ) may be evaluated directly: This function scales linearly with λ for large enough ae 2R 1, therefore, the expected degree κ(λ) ∼ λ. From the above expressions, we also see that σ(λ) = 1 2a κ(λ). Thus, the constant a may be easily found given the model input parameter σ 0 , a = 1 2σ0 . Moreover, the average degreek defined by Eq. (84) now reads: where Φ(z, q 1 , q 2 ) is the Lerch transcendent function. These equations allows to solve for the model parameter R numerically, given a target average degreek. The considered limiting behavior is included as a special case in the code package for generation of WHSCM networks [49].