Deterministic, quenched and annealed parameter estimation for heterogeneous network models

At least two different approaches to define and solve statistical models for the analysis of economic systems exist: the typical, econometric one, interpreting the Gravity Model specification as the expected link weight of an arbitrary probability distribution, and the one rooted in statistical physics, constructing maximum-entropy distributions constrained to satisfy certain network properties. In a couple of recent, companion papers, the two approaches have been successfully integrated within the framework induced by the constrained minimisation of the Kullback-Leibler divergence: specifically, two broad classes of models have been devised, i.e. the integrated and the conditional ones, defined by different probabilistic rules to place links, load them with weights and turn them into proper, econometric prescriptions. Still, the recipes adopted by the two approaches to estimate the parameters entering into the definition of each model differ. In econometrics, a likelihood that decouples the binary and weighted parts of a model, treating a network as deterministic, is typically maximised; to restore its random character, two alternatives exist: either solving the likelihood maximisation on each configuration of the ensemble and taking the average of the parameters afterwards, or taking the average of the likelihood function and maximising the latter. The difference between these approaches lies in the order in which the operations of averaging and maximisation are taken - a difference reminiscent of the quenched and annealed ways of averaging out the disorder in spin glasses. The results of the present contribution, devoted to comparing these recipes in the case of continuous, conditional network models, indicate that the annealed estimation recipe represents the best alternative to the deterministic one.


I. INTRODUCTION
Over the last twenty years, the growth of network science has impacted several disciplines by establishing new, empirical facts about the structural properties of the related systems. Prominent examples are provided by economics and finance: the growing availability of data has motivated researchers to explore and model the architecture of cryptocurrencies [1], interbank networks [2], production networks [3] and trading networks [4][5][6][7].
Modelling the establishment of a connection and the corresponding weight simultaneously poses a serious challenge. Econometrics prescribes to estimate binary and weighted parameters either separately, within the context of hurdle models [8], or jointly, within the context of zero-inflated models [9]; in both cases, the Gravity Model specification [10]

⟨w_ij⟩_GM = f(ω_i, ω_j, d_ij | ϕ) = e^ρ (ω_i ω_j)^α d_ij^γ

- where ω_i ≡ GDP_i/⟨GDP⟩ is the GDP of country i divided by the arithmetic mean of the GDPs of all countries, d_ij is the geographic distance between the capitals of countries i and j, and ϕ ≡ (ρ, α, γ) is the vector of parameters defining the Gravity Model specification - is interpreted as the expected value of a probability distribution whose functional form is arbitrary. On the other hand, the approach rooted in statistical physics constructs maximum-entropy distributions, constrained to satisfy certain network properties [11][12][13][14][15].
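As a concrete illustration of the Gravity Model prescription above, the expected-weight matrix can be sketched in a few lines; the function name and the toy values below are our own, not taken from the paper:

```python
import numpy as np

def gravity_expected_weight(gdp, dist, rho, alpha, gamma):
    """Expected link weights under the Gravity Model specification
    <w_ij>_GM = e^rho * (omega_i * omega_j)^alpha * d_ij^gamma,
    with omega_i = GDP_i / mean(GDP)."""
    omega = gdp / gdp.mean()              # GDPs rescaled by their arithmetic mean
    d = np.where(dist > 0, dist, 1.0)     # guard the zero diagonal before exponentiating
    w = np.exp(rho) * np.outer(omega, omega) ** alpha * d ** gamma
    np.fill_diagonal(w, 0.0)              # no self-interactions
    return w

# toy example: two countries with equal GDP, 2 distance units apart, gamma < 0
W = gravity_expected_weight(np.array([1.0, 1.0]),
                            np.array([[0.0, 2.0], [2.0, 0.0]]),
                            rho=0.0, alpha=1.0, gamma=-1.0)
```

With these toy inputs, ω_1 = ω_2 = 1 and the only expected weight is e^0 · 1 · 2^(−1) = 0.5.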
In a couple of recent, companion papers [16, 17] the two aforementioned approaches have been integrated within the framework induced by the constrained optimisation of the Kullback-Leibler (KL) divergence [18]. In particular, two broad classes of models have been constructed, i.e. the integrated and conditional ones, defined by different probabilistic rules to place links, load them with weights and turn them into proper, econometric prescriptions. For what concerns integrated models, the first two rules follow from a single, constrained optimisation of the KL divergence [19]; for what concerns conditional models, the two rules are disentangled and the functional form of the weight distribution follows from a conditional optimisation procedure [20]. Still, the prescriptions adopted by the two approaches to carry out the estimation of the parameters entering into the definition of each model differ.
The present contribution is devoted to comparing these recipes in the case of continuous, conditional network models defined by both homogeneous and heterogeneous constraints.

II. MINIMISATION OF THE KULLBACK-LEIBLER DIVERGENCE
The functional form of continuous, conditional network models can be identified through the constrained minimisation of the KL divergence of a distribution Q from a prior distribution R, i.e.

D_KL(Q||R) = ∫_𝕎 Q(W) ln[Q(W)/R(W)] dW,

where W is one of the possible values of a continuous random variable, 𝕎 is the set of possible values that W can take, Q(W) is the (multivariate) probability density function to be estimated and R(W) plays the role of prior distribution, whose divergence from Q(W) must be minimised: in our setting, W represents an entire network whose weights, now, obey the property w_ij ∈ ℝ⁺₀, ∀ i < j. Such an optimisation scheme embodies the so-called Minimum Discrimination Information Principle [16, 17], implementing the idea that, as new information becomes available, an updated distribution Q(W) should be chosen in order to make its discrimination from the prior distribution R(W) as hard as possible.
Let us, now, separate both the prior and the posterior distribution into a purely binary part and a conditional, weighted one; the positions Q(W) = P(A)Q(W|A) and R(W) = T(A)R(W|A), where A denotes the binary projection of the weighted network W (i.e. Θ[W] = A), T(A) represents the binary prior and R(W|A) represents the conditional, weighted prior, lead the KL divergence to be re-writable as

D_KL(Q||R) = D_KL(P||T) + Σ_{A∈𝔸} P(A) D_KL(Q(·|A)||R(·|A)),

i.e. as a sum of the two addenda, a purely binary one and a conditional, weighted one. In what follows, we will deal with completely uninformative priors, a choice that amounts to considering the (somehow, simplified) expression

D_KL(Q||R) ≃ −S(P) − S(Q|P),

i.e. 'minus' the joint entropy, where

S(P) = −Σ_{A∈𝔸} P(A) ln P(A)

is the Shannon entropy of the probability distribution describing the binary projection of the network structure [14, 15] and

S(Q|P) = −Σ_{A∈𝔸} P(A) ∫_{𝕎_A} Q(W|A) ln Q(W|A) dW

is the conditional Shannon entropy of the probability distribution describing the weighted network structure [16, 17, 20]. Notice that, when continuous models are considered, S(Q|P) is defined by a sum running over all the binary configurations within the ensemble 𝔸 and an integral over all the weighted configurations that are compatible with each, specific, binary structure, i.e. 𝕎_A = {W : Θ[W] = A}. For a more detailed discussion, see Appendix A.
The functional form of P(A) can be determined by carrying out the usual, constrained maximisation of Shannon entropy [14, 15]; remarkably, any set of (binary) constraints considered in the present paper will lead to the same expression for P(A), i.e.

P(A) = Π_{i<j} p_ij^{a_ij} (1 − p_ij)^{1−a_ij}, with p_ij = x_ij/(1 + x_ij):

specifically, the position x_ij ≡ x individuates the Undirected Binary Random Graph Model (UBRGM), the position x_ij ≡ x_i x_j individuates the Undirected Binary Configuration Model (UBCM) and the position x_ij ≡ δ ω_i ω_j individuates the Logit Model (LM) [21].
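In code, the shared functional form p_ij = x_ij/(1 + x_ij) can be sketched as follows; only the parameterisation of x_ij distinguishes the three binary models, and all parameter values below are illustrative rather than fitted:

```python
import numpy as np

def connection_probabilities(x_ij):
    """Common maximum-entropy form p_ij = x_ij / (1 + x_ij)."""
    p = x_ij / (1.0 + x_ij)
    np.fill_diagonal(p, 0.0)   # no self-loops
    return p

n = 4
# UBRGM: a single global parameter x for every pair
P_ubrgm = connection_probabilities(np.full((n, n), 2.0))
# UBCM: one parameter x_i per node, x_ij = x_i * x_j
x = np.array([0.5, 1.0, 2.0, 4.0])
P_ubcm = connection_probabilities(np.outer(x, x))
# LM: a global delta modulating node fitnesses (here, toy GDP shares omega_i)
omega = np.array([0.8, 0.9, 1.1, 1.2])
P_lm = connection_probabilities(0.5 * np.outer(omega, omega))
```

For the UBRGM sketch, every off-diagonal probability equals 2/(1 + 2) = 2/3.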
On the other hand, the functional form of Q(W|A) can be determined by carrying out the constrained maximisation of S(Q|P), the set of constraints being, now,

1 = ∫_{𝕎_A} Q(W|A) dW, ∀ A ∈ 𝔸,
⟨C_α⟩ = Σ_{A∈𝔸} P(A) ∫_{𝕎_A} C_α(W) Q(W|A) dW, ∀ α:

while the first condition ensures the normalisation of the probability distribution, the vector {C_α(W)} represents the proper set of weighted constraints. The distribution induced by such an optimisation problem reads

Q(W|A) = e^{−H(W)}/Z_A

if W ∈ 𝕎_A and 0 otherwise. While the Hamiltonian H(W) = Σ_α ψ_α C_α(W) lists the constraints, the quantity at the denominator, Z_A = ∫_{𝕎_A} e^{−H(W)} dW, is the partition function, conditional on the fixed topology A [20].
For mathematical convenience, in what follows we will consider separable Hamiltonians, i.e. functions that can be written as sums of node pair-specific Hamiltonians, H(W) = Σ_{i<j} H_ij(w_ij); in this case, the conditional distribution factorises as

Q(W|A) = Π_{i<j} q_ij(w_ij|a_ij), with q_ij(w_ij|a_ij = 1) = e^{−H_ij(w_ij)}/ζ_ij and ζ_ij = ∫_{m_ij}^{+∞} e^{−H_ij(w_ij)} dw_ij

(with m_ij being the pair-specific, minimum weight allowed by a given model and ζ_ij being the corresponding partition function), irrespective of the specific, functional form of H_ij(w_ij) [17]. For a more detailed discussion, see Appendix B.

III. ESTIMATION OF THE PARAMETERS
Several alternative recipes are viable for estimating the parameters entering into the definition of continuous, conditional network models.

A. 'Deterministic' parameter estimation
The simplest one prescribes to consider the traditional likelihood function ln Q(W*|A*), with W* (A*) being the empirical, weighted (binary) adjacency matrix; its maximisation allows the parameters entering into the definition of the purely topological distribution and those entering into the definition of the conditional, weighted one to be estimated in a totally disentangled fashion [17]. In fact, maximising ln Q(W*|A*) with respect to the unknown parameters leads us to find the vector of values ψ* satisfying the vector of relationships

∇_ψ ln Q(W*|A*)|_{ψ=ψ*} = 0,

which stands for the set of relationships ⟨C_α⟩_{A*}(ψ*) ≡ C*_α, ∀ α, each one equating the model-induced, average value of the corresponding constraint to its empirical value, marked with an asterisk. This first approach to parameter estimation can be named 'deterministic', to stress that A* is considered as not being subject to variation; otherwise stated, this recipe - which is the most common in econometrics - prescribes to estimate the parameters entering into the definition of the conditional, weighted probability distribution by assuming the network topology to be fixed.
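As an illustration of the 'deterministic' recipe, anticipating the 'scalar' variant of the CEM discussed in Section IV A, the moment-matching condition ⟨W⟩_{A*}(β) = W* admits the closed-form solution β = L*/W*; a minimal sketch, with toy matrices and naming of our own:

```python
import numpy as np

def beta_deterministic(A_star, W_star_mat):
    """'Deterministic' estimate of the scalar CEM parameter:
    the model average <W>_{A*}(beta) = L*/beta is matched to the
    empirical total weight W*, yielding beta = L* / W*."""
    L_star = np.triu(A_star, 1).sum()        # empirical number of links
    W_star = np.triu(W_star_mat, 1).sum()    # empirical total weight
    return L_star / W_star

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
W = np.array([[0.0, 2.0, 3.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
beta = beta_deterministic(A, W)   # 2 links, total weight 5
```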

B. 'Annealed' parameter estimation
Topology, however, is a random variable itself, obeying the probability distribution P(A). As a consequence, the 'deterministic' recipe for parameter estimation could lead to inconsistencies, should the description of A* provided by P(A) be not accurate. The variability induced by P(A) can be properly accounted for by considering the generalised likelihood [20]

G(ψ) = Σ_{A∈𝔸} P(A) ln Q(W*|A),

whose maximisation leads us to find the vector of values ψ* satisfying the vector of relationships

∇_ψ G(ψ)|_{ψ=ψ*} = 0,

which stands for the set of relationships ⟨C_α⟩(ψ*) ≡ C*_α, ∀ α. Taking this average is conceptually similar to taking the 'annealed' average in physics: parameter estimation is carried out while random variables - again, the entries of the adjacency matrix - are left to vary.
Interestingly, the 'deterministic' recipe is a special case of the 'annealed' recipe, since the former can be recovered by posing P(A) ≡ δ_{A,A*}: in this case, in fact, the generalised likelihood reduces to the traditional one, ln Q(W*|A*).

C. 'Quenched' parameter estimation
A viable alternative to properly account for the variability induced by P(A) is that of reversing the two operations of 'likelihood maximisation' and 'ensemble averaging': in other words, one can 1) numerically sample the ensemble of configurations induced by P(A), 2) maximise the likelihood ln Q(W*|A) for each generated network, 3) take the average of the resulting set of parameters, according to the formula

⟨ψ*_α⟩ = Σ_{A∈𝔸} P(A) ψ*_α(A), ∀ α,

the estimation of the α-th parameter being assumed to coincide with the average ⟨ψ*_α⟩. Taking this average is conceptually similar to taking the 'quenched' average in physics: random variables - in the specific case, the entries of the adjacency matrix - are frozen, parameter estimation is carried out and, only at the end, the values of the parameters are averaged over the ensemble of configurations induced by P(A).
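A Monte Carlo sketch of the three steps above, for the scalar CEM anticipated in Section IV A: sample binary configurations from P(A), maximise the conditional likelihood on each (which, in that case, gives β(A) = L(A)/W* in closed form) and average the resulting estimates. All names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_quenched(P, W_star, n_samples=5000):
    """'Quenched' estimate for the scalar CEM: average, over an
    ensemble of binary configurations sampled from P(A), of the
    per-configuration maximum-likelihood estimate beta(A) = L(A)/W*."""
    iu = np.triu_indices(P.shape[0], k=1)
    p = P[iu]                                  # independent link probabilities
    # number of links L(A) of each sampled configuration
    L = (rng.random((n_samples, p.size)) < p).sum(axis=1)
    return (L / W_star).mean()

P = np.full((10, 10), 0.5)
np.fill_diagonal(P, 0.0)
b_que = beta_quenched(P, W_star=100.0)
b_ann = np.triu(P, 1).sum() / 100.0            # 'annealed' benchmark <L>/W*
```

Because β(A) is linear in L(A) here, the 'quenched' average converges to the 'annealed' value as the sample grows.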
As our models inherit their functional form from the constrained minimisation of the KL divergence, each parameter controls for a specific constraint: when employing the 'deterministic' recipe, such a circumstance makes each parameter configuration-dependent; when employing either the 'annealed' or the 'quenched' recipe, instead, accounting for the variability of a network structure induces a sort of 'loss of memory' about its empirical, purely topological details.

IV. RESULTS
In order to test if the 'deterministic', 'annealed' and 'quenched' prescriptions lead to the same estimation, let us focus on a number of variants of the Conditional Exponential Model (CEM), induced by the positions q_ij(w_ij|a_ij = 0) = δ(w_ij) (i.e. if nodes i and j are not connected, the weight of the corresponding link is zero with probability equal to one) and q_ij(w_ij|a_ij = 1) = β_ij e^{−β_ij w_ij}. In what follows, we will consider three different instances of P(A):
• the Undirected Binary Random Graph Model (UBRGM), defined by posing x_ij ≡ x and induced by the maximisation of S(P) while constraining the total number of links, L(A*) ≡ L* = Σ_{i<j} a*_ij;
• the Undirected Binary Configuration Model (UBCM), defined by posing x_ij ≡ x_i x_j and induced by the maximisation of S(P) while constraining the whole degree sequence;
• two different instances of the Logit Model (LM), both representing a fitness-driven version of the UBCM, (again) induced by constraining the total number of links, L(A*) ≡ L* = Σ_{i<j} a*_ij. The first one is defined by posing x_ij ≡ δ ω_i ω_j, i.e.

p_ij = δ ω_i ω_j/(1 + δ ω_i ω_j),

and has been employed to study the year 2017 of the CEPII-BACI version of the World Trade Web (WTW) [22], that is a network of N = 171 nodes and a link density of d = 0.87. The second one is defined by posing x_ij ≡ δ s_i s_j, i.e.

p_ij = δ s_i s_j/(1 + δ s_i s_j),

and has been employed to study the 01/03/2019 snapshot of the Bitcoin Lightning Network (BLN) [23], that is a network of N = 5012 nodes and a link density of d = 0.003.

A. 'Scalar' variant of the Conditional Exponential Model
Let us start by considering the 'scalar' or homogeneous variant of the CEM, defined by the position β_ij ≡ β, ∀ i < j. In this case, the 'deterministic' recipe for parameter estimation prescribes to maximise the likelihood

ln Q(W*|A*) = L* ln β − β W*,

where W(W*) ≡ W* = Σ_{i<j} w*_ij, and whose optimisation leads to the expression β = L*/W*. The 'annealed' recipe prescribes to maximise the likelihood

G(β) = ⟨L⟩ ln β − β W*,

whose optimisation leads to the expression β = ⟨L⟩/W*. The 'quenched' recipe, on the other hand, prescribes to calculate the average ⟨β⟩ = Σ_{A∈𝔸} P(A) β(A), with β(A) = L(A)/W*.

FIG. 1. Estimates of the parameter β defining the 'scalar' variant of the CEM, where the binary topology is either 'deterministic' (black vertical line) or generated via the UBRGM (light orange or light grey), the UBCM (purple or dark grey) and the LM (light purple or grey). The deterministic approach leads to a single estimate, while the other approaches lead to either a single, 'annealed' estimate (vertical, solid lines) or to a whole distribution of 'quenched' estimates (empirical distribution constructed over an ensemble of 5,000 binary configurations, with theoretical curves, Binomial or Poisson-Binomial, depending on the binary model; the corresponding average value is indicated by a vertical, dash-dotted line). The 'annealed' parameter estimates, the average values of the 'quenched' parameter distributions and the 'deterministic' parameter estimate coincide. Data refer to the year 2017 of the CEPII-BACI version of the WTW [22].
In the case of the 'scalar' variant of the CEM, the estimations coincide for any null model preserving the total number of links, i.e. ensuring that ⟨L⟩ = L*, regardless of the network density. Such a result is confirmed by Fig. 1, where each recipe has been implemented on the WTW, by adopting the distributions induced by the UBRGM (blue), the UBCM (green) and the LM (red). Specifically, the 'deterministic' estimation (black, solid line) and the 'annealed' estimations (blue, green and red, solid lines) overlap; moreover, each 'annealed' estimation overlaps with the corresponding 'quenched' estimation, i.e. the average value of the related 'quenched' distribution (blue, green and red, dash-dotted lines).
In the case of the UBRGM-induced, homogeneous version of the CEM, the 'quenched' distribution of the parameter β(A) = L(A)/W* 'inherits' the distribution of the total number of links, i.e. L ∼ Bin(N(N − 1)/2, p), with p = 2L*/N(N − 1): more precisely, W*β ∼ Bin(N(N − 1)/2, p); analogously for the UBCM- and the LM-induced, homogeneous versions of the CEM, the only difference being that, now, L obeys two different Poisson-Binomial distributions.

FIG. 2. Estimates of the parameter β_i defining the 'vector' variant of the CEM (case-study of node 166 of the WTW), where the binary topology is either 'deterministic' (black vertical line) or generated via the UBRGM (light orange or light grey), the UBCM (purple or dark grey) and the LM (light purple or grey). The deterministic approach leads to a single estimate, while the other approaches lead to either a single, 'annealed' estimate (vertical, solid lines) or to a whole distribution of 'quenched' estimates (histograms with normal density curves having the same average and standard deviation, constructed over an ensemble of 5,000 binary configurations; the average value is indicated by a vertical, dash-dotted line). Each 'annealed' parameter estimate coincides with the average value of the corresponding 'quenched' distribution although the distributions induced by the three, binary recipes are well separated. In addition, the 'deterministic' parameter estimate is very close to the UBCM-induced, 'annealed' one. Data refer to the year 2017 of the CEPII-BACI version of the WTW [22].
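The 'inheritance' claim is easy to check numerically: under the UBRGM, W*β = L ∼ Bin(N(N − 1)/2, p), so mean and variance of the 'quenched' distribution of β follow from the Binomial ones. A small self-check, with toy values of our own:

```python
import numpy as np

rng = np.random.default_rng(42)

N, L_star, W_star = 50, 300, 900.0
M = N * (N - 1) // 2                 # number of node pairs
p = L_star / M                       # UBRGM connection probability, so that <L> = L*

# sample the quenched distribution of beta(A) = L(A)/W*
L = rng.binomial(M, p, size=20000)
beta_samples = L / W_star

mean_theory = M * p / W_star                   # = L*/W*
var_theory = M * p * (1 - p) / W_star ** 2     # Binomial variance, rescaled by W*^2
```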

B. 'Vector' variant of the Conditional Exponential Model
Let us, now, consider the 'vector' or weakly heterogeneous variant of the CEM, defined by the position β_ij ≡ β_i + β_j, ∀ i < j. In this case, the 'deterministic' recipe for parameter estimation prescribes to maximise the likelihood

ln Q(W*|A*) = Σ_{i<j} a*_ij ln(β_i + β_j) − Σ_i β_i s*_i,

where s_i(W*) ≡ s*_i = Σ_{j(≠i)} w*_ij, and whose optimisation requires to solve the system of equations

Σ_{j(≠i)} a*_ij/(β_i + β_j) = s*_i, ∀ i.

The 'annealed' recipe, instead, prescribes to maximise the likelihood

G({β_i}) = Σ_{i<j} p_ij ln(β_i + β_j) − Σ_i β_i s*_i,

whose optimisation requires to solve the system of equations

Σ_{j(≠i)} p_ij/(β_i + β_j) = s*_i, ∀ i

(notice that both the 'deterministic' and the 'annealed' version of the 'vector' variant of the CEM are alternative instances of the so-called CReM_A, introduced in [20]).
The 'quenched' recipe, on the other hand, requires to solve the system of equations ⟨β_i⟩ = Σ_{A∈𝔸} P(A) β_i(A), ∀ i, which no longer admits an explicit expression. Devising some sort of approximation is, however, possible. Let us start by re-writing the previous system of equations for a generic configuration A and consider the node whose coefficient is the largest one: this allows us to write β_i(A) ≃ k_i(A)/s*_i. Were the UBRGM implemented, we would hence expect the 'quenched' distribution of s*_i β_i to coincide with that of the degree k_i, i.e. a Binomial; if, on the other hand, we implemented the UBCM, we would obtain β_i(A) ∝ k_i(A)/s*_i, hence expecting the 'quenched' distribution of s*_i β_i to obey a Poisson-Binomial. Again, the estimations coincide for any null model preserving the structural properties characterising the binary recipe implemented.
More generally, the mutual relationships between the estimations provided by the three recipes are node-dependent (see Fig. 2, illustrating the case-study of node 166 of the WTW, and Fig. 4 in Appendix C): in general, however, each 'annealed' estimation overlaps with the average value of the related 'quenched' distribution. Moreover, the 'deterministic' estimation is very close to the UBCM-induced, 'annealed' one; such a result is a consequence of the accurate description of the empirical network topology provided by the UBCM - in fact, much more accurate than the ones provided by the UBRGM and the LM: indeed, the better the approximation p_ij ≃ a_ij, ∀ i < j, the closer the 'annealed' estimation to the 'deterministic' one. This is even more evident when considering the 'tensor' variant of the CEM, in which case the three optimisation procedures lead to the expressions β_det,ij = a*_ij/ŵ_ij, ∀ i < j, and β_ann,ij = ⟨β_ij⟩_que = p_ij/ŵ_ij, ∀ i < j - with ŵ_ij representing an estimate of the empirical weight w*_ij; if, however, ŵ_ij ≡ w*_ij, ∀ i < j then, for consistency, p_ij ≡ a*_ij and the three recipes coincide.

C. 'Econometric' variant of the Conditional Exponential Model
As a third case-study, let us focus on the 'econometric' variant of the CEM, defined by posing β_ij ≡ β_0 + z_ij^{−1}, ∀ i < j, where z_ij ≡ e^ρ (ω_i ω_j)^α d_ij^γ represents the Gravity Model specification traditionally employed to analyse undirected, weighted, trade networks and β_0 is a structural parameter to be tuned in order to ensure that ⟨W⟩ = W*. In this case, the 'deterministic' recipe for parameter estimation prescribes to maximise the likelihood

ln Q(W*|A*) = Σ_{i<j} [a*_ij ln β_ij − β_ij w*_ij],

whose optimisation requires to solve the corresponding system of first-order conditions for the parameters β_0, ρ, α and γ. The 'annealed' recipe, instead, prescribes to maximise the likelihood

G(β_0, ϕ) = Σ_{i<j} [p_ij ln β_ij − β_ij w*_ij],

whose optimisation requires to solve the analogous system of first-order conditions, with p_ij replacing a*_ij.

FIG. 3. Estimates of the parameters defining the 'econometric' variant of the CEM, where the binary topology is either 'deterministic' (black vertical line) or generated via the UBRGM (light orange or light grey), the UBCM (purple or dark grey) and the LM (light purple or grey). The deterministic approach leads to a single estimate, while the other approaches lead to either a single, 'annealed' estimate (vertical, solid lines) or to a whole distribution of 'quenched' estimates (histograms with kernel density curves, constructed over an ensemble of 5,000 binary configurations; the corresponding average value is indicated by a vertical, dash-dotted line). Each 'annealed' parameter estimate coincides with the average value of the corresponding 'quenched' distribution although the distributions induced by the three, binary recipes may overlap or not; the 'deterministic' estimate, instead, overlaps with the other two only for the parameter α, under the UBCM-induced, binary recipe. Data refer to the year 2017 of the CEPII-BACI version of the WTW [22].
The 'quenched' recipe, on the other hand, requires to solve the system of equations ⟨β 0 ⟩ = A∈A P (A)β 0 (A) and ⟨ϕ⟩ = A∈A P (A)ϕ(A) which no longer have an explicit expression.
Figures 3 and 5 in Appendix C illustrate the case-study of the WTW: although the 'quenched' distributions induced by the three, binary recipes are characterised by different shapes that may overlap (as in the case of the parameters ρ -under the UBRGM-induced and UBCM-induced binary recipes -and γ -under all, binary recipes) or not (as in the case of the parameters β 0 and α), 'annealed' and 'quenched' estimations always coincide (the only, small discrepancy being observable for the parameter β 0 , under the UBRGM-induced, binary recipe).The 'deterministic' estimation, instead, is compatible with the other, two ones only for the parameter α, under the UBCM-induced, binary recipe.
Sparse networks deserve a separate discussion. The results concerning the homogeneous and econometric variants of the CEM on the BLN, the latter defined by posing β_ij ≡ β_0 + z_ij^{−1}, ∀ i < j, with z_ij ≡ e^ρ (s_i s_j)^α, are analogous to the ones shown for the WTW - in the latter case, the 'annealed' estimates of β_0, ρ and α are very close to their 'quenched' counterparts, the relative error RE = |ϕ_ann − ⟨ϕ⟩_que|/|ϕ_ann| amounting to ≃ 10^{−3} for β_0 and ≃ 10^{−4} for ρ, α. On the contrary, these conclusions no longer hold true when the weakly heterogeneous variant of the CEM is considered: in this case, in fact, carrying out the 'quenched' approach can lead to binary configurations with disconnected nodes, a circumstance that impairs the correct estimation of the corresponding parameters; carrying out the 'annealed' estimation, instead, remains a feasible task.

V. DISCUSSION
The present contribution focuses on three recipes for estimating the parameters entering into the definition of statistical network models, i.e. the 'deterministic', 'annealed' and 'quenched' ones. In order to implement them, we have considered several variants of the CEM, i.e. the homogeneous one (defined by one, global parameter), the weakly heterogeneous one (defined by N, local parameters) and the econometric one (defined by four, global parameters), each one combined with three different recipes for estimating the network topology (i.e. the UBRGM, the UBCM and the LM).
The 'deterministic' recipe, routinely employed in econometrics to determine the so-called hurdle models [8], prescribes to estimate the parameters associated with the weighted constraints on the empirical realisation of the network topology. Since it considers A* as not being subject to variation, its use is recommended whenever Var[a_ij] = p_ij(1 − p_ij) ≃ 0 or, equivalently, p_ij ≃ a_ij, ∀ i < j, i.e. whenever the binary random variables can be safely considered as deterministic or, more in general, whenever their (scale of) variation is negligible with respect to the (scale of) variation of the weighted random variables.
Accounting for such a variability in a fully consistent manner can be achieved upon adopting either the 'annealed' recipe (according to which parameters are estimated on the average network topology) or the 'quenched' recipe (according to which parameters are, first, estimated on a large number of binary configurations and, then, averaged); the main difference between these procedures lies in the order in which the two operations of 'averaging' (of the entries of the binary adjacency matrix) and 'maximisation' (of the related likelihood function) are taken. Interestingly, no variant of the CEM is sensitive to this choice (neither the purely structural ones nor the 'econometric' one); while, however, the coincidence of the 'annealed' and 'quenched' estimates for purely structural models can be explicitly verified, this is no longer true when the 'econometric' variant is considered: in this case, in fact, one can proceed only numerically.
This evidence reveals the main limitation of the 'quenched' approach, i.e. the need to resort to an explicit sampling of the chosen, binary ensemble. As any 'good' sampling algorithm must lead to a faithful representation of the parent distribution, we are left with the following question: is this always guaranteed, in all cases of interest to us?
This seems to be the case for dense networks.As shown in [24], a study of the coefficient of variation of the constraints defining the 'vector' variant of the CEM (i.e. the ratio between standard deviation and expected value of each degree) reveals it to vanish in the asymptotic limit: in other words, the fluctuations affecting each degree vanish, a result guaranteeing that the degree sequence of any configuration in the ensemble remains 'close enough' to the empirical one.
When sparse networks are, instead, considered, the coefficient of variation of the constraints defining the 'vector' variant of the CEM remains finite in the asymptotic limit: in other words, the fluctuations affecting each degree do not vanish, a result implying that the degree sequence of any configuration in the ensemble may largely differ from the empirical one; to provide a concrete example, nodes whose empirical degree is 'small' may disconnect, hence leading to a system of equations that is not even compatible with the set of constraints defining the original problem. Overcoming such a limitation implies quantifying the bias affecting the estimates in cases like these: although possible, calculations of this kind are far beyond the scope of the present paper.
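The dense-versus-sparse argument can be made concrete with the UBRGM, for which each degree obeys k_i ∼ Bin(N − 1, p): the coefficient of variation is CV = sqrt((1 − p)/((N − 1)p)), which vanishes as N grows at fixed density but tends to a finite constant in the sparse regime p = c/(N − 1). A short numerical check, with a parameterisation of our own:

```python
import math

def degree_cv(N, p):
    """Coefficient of variation of a UBRGM degree k_i ~ Bin(N-1, p):
    CV = std/mean = sqrt((1 - p) / ((N - 1) * p))."""
    return math.sqrt((1 - p) / ((N - 1) * p))

# dense regime: fixed density, degree fluctuations vanish as N grows
cv_dense = [degree_cv(N, 0.5) for N in (100, 1000, 10000)]

# sparse regime: p = c/(N-1) keeps the mean degree at c, and CV stays finite
c = 4.0
cv_sparse = [degree_cv(N, c / (N - 1)) for N in (100, 1000, 10000)]
```

In the sparse regime the CV approaches 1/sqrt(c), so degree fluctuations never become negligible, which is exactly the circumstance in which 'quenched' sampling can produce disconnected nodes.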
Overall, then, two alternatives exist to overcome the main limitation of the 'deterministic' estimation recipe, i.e. that of ignoring the variety of structures that are compatible with a given probability distribution P(A), namely the 'annealed' and the 'quenched' ones. As the 'quenched' recipe requires an explicit sampling of the ensemble - potentially leading to inconsistent estimates for sparse configurations - we believe the 'annealed' one to represent the better alternative, 1) being unbiased by definition, 2) being convenient from a numerical point of view, and 3) reducing to the 'deterministic' recipe in case the empirical configuration is not subject to variation.

VI. ACKNOWLEDGEMENTS
SoBigData.it receives funding from European Union - NextGenerationEU - National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) - Project: "SoBigData.it - Strengthening the Italian RI for Social Mining and Big Data Analytics" - Prot. IR0000013 - Avviso n. 3264 del 28/12/2021. This work is also supported by PNRR-M4C2-Investimento 1.3, Partenariato Esteso PE00000013 - 'FAIR - Future Artificial Intelligence Research' - Spoke 1 'Human-centered AI', funded by the European Commission under the NextGeneration EU programme, and by the project 'Network analysis of economic and financial resilience', Italian DM n. 289, 25-03-2021 (PRO3 Scuole) CUP D67G22000130001. DG acknowledges support from the Dutch Econophysics Foundation (Stichting Econophysics, Leiden, the Netherlands) and the Netherlands Organization for Scientific Research (NWO/OCW). MDV acknowledges support from the European Union ERC-2018-ADG Grant Agreement n. 834756, 'XAI: Science and technology for the explanation of AI decision making'. MDV and DG also acknowledge support from the 'Programma di Attività Integrata' (PAI) project 'Prosociality, Cognition and Peer Effects' (Pro.Co.P.E.), funded by IMT School for Advanced Studies Lucca.

APPENDIX A. CONDITIONAL NETWORK MODELS FROM KL DIVERGENCE MINIMISATION
Discrete maximum-entropy models can be derived by performing a constrained maximisation of Shannon entropy [11, 12]. Here, however, we focus on continuous probability distributions: in such a case, mathematical problems are known to affect the definition of Shannon entropy as well as the resulting inference procedure; to restore the framework, one has to consider the KL divergence D_KL(Q||R) of a distribution Q(W) from a prior distribution R(W) and re-interpret the maximisation of the entropy associated to Q(W) as the minimisation of its 'distance' from R(W). Such an optimisation scheme embodies the so-called Minimum Discrimination Information Principle, originally proposed by Kullback and Leibler [18], requiring new data to produce an information gain that is as small as possible. In formulas, the KL divergence is defined as

D_KL(Q||R) = Σ_{A∈𝔸} ∫_{𝕎_A} Q(W) ln[Q(W)/R(W)] dW;

the class of conditional models can be introduced upon re-writing the posterior distribution Q(W) as Q(W) = P(A)Q(W|A), where A denotes the binary projection of the weighted network W. This equation allows us to split the KL divergence into the sum of three terms reading

D_KL(Q||R) = −S(P) − S(Q|P) − Σ_{A∈𝔸} P(A) ∫_{𝕎_A} Q(W|A) ln R(W) dW,

where

S(Q|P) = −Σ_{A∈𝔸} P(A) ∫_{𝕎_A} Q(W|A) ln Q(W|A) dW

is the conditional Shannon entropy of the probability distribution of the weighted network structure, given the binary projection. The expression above can be further manipulated as follows: upon separating the prior distribution itself into a purely binary part and a conditional, weighted one, we can pose R(W) = T(A)R(W|A), an expression that allows the KL divergence to be re-written as

D_KL(Q||R) = D_KL(P||T) + Σ_{A∈𝔸} P(A) D_KL(Q(·|A)||R(·|A)),

i.e. as a sum of the two addenda, with T(A) representing the binary prior and R(W|A) representing the conditional, weighted one. Dealing with completely uninformative priors amounts to considering the expression

D_KL(Q||R) ≃ −S(P) − S(Q|P),

i.e. 'minus' the joint entropy. The (independent) constrained optimisation of S(P) and S(Q|P) represents the starting point for deriving the members of the class of conditional models.

APPENDIX B. CONDITIONAL NETWORK MODELS: DETERMINING THE FUNCTIONAL FORM
The constrained maximisation of S(Q|P) proceeds by specifying the set of weighted constraints, the first of which reads

1 = ∫_{𝕎_A} Q(W|A) dW, ∀ A ∈ 𝔸,

ensuring the normalisation of the conditional probability distribution, while the remaining ones fix the expected values of the weighted quantities of interest, i.e. ⟨C_α⟩ = C*_α, ∀ α.

APPENDIX C. CONDITIONAL NETWORK MODELS:
ESTIMATING THE PARAMETERS
Let us, now, provide general expressions for the 'deterministic' and the 'annealed' recipe for parameter estimation. The first one follows from maximising the traditional likelihood ln Q(W*|A*), while the second one follows from maximising the generalised likelihood G(ψ) = Σ_{A∈𝔸} P(A) ln Q(W*|A).

'Scalar' or homogeneous variant of the CEM. In the particular case of the UBRGM-induced, homogeneous variant of the CEM, one can derive the 'quenched' distribution of the parameter β upon considering that it is a function of the discrete, random variable L. Since L ∼ Bin(N(N − 1)/2, p), with p = 2L*/N(N − 1), one finds that β(A) = L(A)/W* obeys a rescaled Binomial distribution, an expression allowing us to derive the expected value of β, i.e. ⟨β⟩ = ⟨L⟩/W* = L*/W*, as well as its variance.

FIG. 4. Estimates of the parameters β_i defining the 'vector' variant of the CEM for a selection of nodes of the WTW. The deterministic approach leads to a single estimate, while the other approaches lead to either a single, 'annealed' estimate (vertical, solid lines) or to a whole distribution of 'quenched' estimates (histograms with normal density curves having the same average and standard deviation, constructed over an ensemble of 5,000 binary configurations; the average value is indicated by a vertical, dash-dotted line). Each 'annealed' estimate overlaps with the average value of the related 'quenched' distribution, although 1) the latter ones are well separated in the case of node 168, 2) only partly overlapped in the case of node 171, 3) the UBCM-induced and the LM-induced ones overlap while the UBRGM-induced one remains well separated in the case of node 170. Moreover, the 'deterministic' estimates are always very close to (if not overlapping with) the UBCM-induced, 'annealed' ones. Although the empirical and theoretical CDFs (respectively depicted as solid lines and dotted lines in the bottom panels) seem to be in a very good agreement, the Anderson-Darling test never rejects the normality hypothesis only for node 166 and does not reject the normality hypothesis in the case of the UBCM-induced distribution of estimates for node 168.
'Vector' or weakly heterogeneous variant of the CEM. As pointed out in the main text, each 'annealed' estimate overlaps with the average value of the related 'quenched' distribution, although the latter ones 1) are well separated in the case of node 168, 2) are only partly overlapped in the case of node 171, and 3) in the case of node 170, the UBCM-induced and the LM-induced ones overlap while the UBRGM-induced one remains well separated (see Fig. 4). Moreover, the 'deterministic' estimate is always very close to the UBCM-induced, 'annealed' one - a result that may be a consequence of the accurate description of the empirical network topology provided by the UBCM, evidently much more accurate than those provided by the UBRGM and the LM.
Each solid line in Fig. 4 represents a normal distribution whose average value and variance coincide with those of the corresponding sample distribution: although the empirical and theoretical CDFs seem to be in very good agreement, the Anderson-Darling test fails to reject the normality hypothesis for all three binary recipes only in the case of node 166 and, for node 168, only for the UBCM-induced distribution of values.
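A normality check of this kind can be reproduced with any standard implementation of the Anderson-Darling test. A minimal sketch in Python, using SciPy's `stats.anderson` on synthetic stand-in samples (the data below are hypothetical, not the WTW estimates):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two stand-in samples of 'quenched' estimates (hypothetical data): one
# drawn from a normal distribution, one from a clearly skewed one.
normal_like = rng.normal(loc=1.3, scale=0.05, size=5000)
skewed = rng.exponential(scale=1.0, size=5000)

def rejects_normality(sample, level=5.0):
    """Anderson-Darling test against the normal family: normality is
    rejected at the given significance level (in percent) whenever the
    test statistic exceeds the matching critical value."""
    res = stats.anderson(sample, dist='norm')
    idx = list(res.significance_level).index(level)
    return res.statistic > res.critical_values[idx]

print(rejects_normality(normal_like), rejects_normality(skewed))
```

Note that `stats.anderson` returns critical values for a fixed grid of significance levels (15%, 10%, 5%, 2.5%, 1%) rather than a p-value, which is why the decision above compares the statistic against the tabulated threshold.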
'Tensor' variant of the CEM. Let us now leave β_ij in its tensor form and constrain the set of weight-specific estimates ŵ_ij, ∀ i < j. In this case, the three recipes lead to estimates signalling large differences between the 'deterministic' recipe, on the one hand, and the 'quenched' and 'annealed' recipes, on the other - the latter two, instead, coincide. If, however, ŵ_ij ≡ w*_ij, ∀ i < j, then, for consistency, p_ij ≡ a*_ij and the three recipes coincide.

'Econometric' variant. As Figs. 3 and 5 show, the 'deterministic' estimate is always quite different from the other two - the only exception being represented by the parameter α under the UBCM-induced, binary recipe. Such a result should warn against employing the 'deterministic' estimation recipe tout court, as ignoring the variety of structures that are compatible with a given probability distribution P(A) will, in general, affect the estimation of the parameters of interest.

FIG. 1: Estimations of the parameter β, entering the definition of the homogeneous version of the CEM, where the binary topology is either 'deterministic' (black vertical line) or generated via the UBRGM (light orange or light grey), the UBCM (purple or dark grey) or the LM (light purple or grey). The deterministic approach leads to a single estimate, while the other approaches lead to either a single, 'annealed' estimate (vertical, solid lines) or to a whole distribution of 'quenched' estimates (empirical distribution constructed over an ensemble of 5,000 binary configurations, with theoretical curves - Binomial or Poisson-Binomial, depending on the binary model; the corresponding average value is indicated by a vertical, dash-dotted line). The 'annealed' parameter estimates, the average values of the 'quenched' parameter distributions and the 'deterministic' parameter estimate coincide. Data refer to the year 2017 of the CEPII-BACI version of the WTW [22].

FIG. 2: Estimations of the parameter β_166, entering the definition of the weakly heterogeneous version of the CEM, where the binary topology is either 'deterministic' (black vertical line) or generated via the UBRGM (light orange or light grey), the UBCM (purple or dark grey) or the LM (light purple or grey). The deterministic approach leads to a single estimate, while the other approaches lead to either a single, 'annealed' estimate (vertical, solid lines) or to a whole distribution of 'quenched' estimates (histograms with normal density curves having the same average and standard deviation, constructed over an ensemble of 5,000 binary configurations; the average value is indicated by a vertical, dash-dotted line). Each 'annealed' parameter estimate coincides with the average value of the corresponding 'quenched' distribution, although the distributions induced by the three, binary recipes are well separated. In addition, the 'deterministic' parameter estimate is very close to the UBCM-induced, 'annealed' one. Data refer to the year 2017 of the CEPII-BACI version of the WTW [22].

FIG. 3: Estimations of the parameters (a) β_0, (b) ρ, (c) α and (d) γ, entering the definition of the econometric version of the CEM, where the binary topology is either 'deterministic' (black vertical line) or generated via the UBRGM (light orange or light grey), the UBCM (purple or dark grey) or the LM (light purple or grey). The deterministic approach leads to a single estimate, while the other approaches lead to either a single, 'annealed' estimate (vertical, solid lines) or to a whole distribution of 'quenched' estimates (histograms with kernel density curves, constructed over an ensemble of 5,000 binary configurations; the corresponding average value is indicated by a vertical, dash-dotted line). Each 'annealed' parameter estimate coincides with the average value of the corresponding 'quenched' distribution, although the distributions induced by the three, binary recipes may or may not overlap; the 'deterministic' estimate, instead, overlaps with the other two only for the parameter α, under the UBCM-induced, binary recipe. Data refer to the year 2017 of the CEPII-BACI version of the WTW [22].
\[
S(Q \,\|\, R) = -\int_{\mathbb{W}} Q(\mathbf{W}) \ln R(\mathbf{W})\, d\mathbf{W} \tag{40}
\]

is the cross entropy, quantifying the amount of information required to identify a weighted network sampled from the distribution Q(W) by employing the distribution R(W),

\[
S(P) = -\sum_{\mathbf{A} \in \mathbb{A}} P(\mathbf{A}) \ln P(\mathbf{A}) \tag{41}
\]

is the Shannon entropy of the probability distribution describing the binary projection of the network structure and

\[
S(Q|P) = -\sum_{\mathbf{A} \in \mathbb{A}} P(\mathbf{A}) \int_{\mathbb{W}_{\mathbf{A}}} Q(\mathbf{W}|\mathbf{A}) \ln Q(\mathbf{W}|\mathbf{A})\, d\mathbf{W}
\]

is the conditional entropy of the weighted distribution, given the binary one. Its constrained maximisation proceeds by specifying the conditions

\[
1 = \int_{\mathbb{W}_{\mathbf{A}}} Q(\mathbf{W}|\mathbf{A})\, d\mathbf{W}, \qquad \langle C_\alpha \rangle = \int_{\mathbb{W}_{\mathbf{A}}} C_\alpha(\mathbf{W})\, Q(\mathbf{W}|\mathbf{A})\, d\mathbf{W}, \quad \forall\, \alpha \tag{48}
\]

the first condition ensuring the normalisation of the probability distribution and the vector {C_α(W)} representing the proper set of weighted constraints. The distribution induced by them reads

\[
Q(\mathbf{W}|\mathbf{A}) = \frac{e^{-H(\mathbf{W})}}{Z_{\mathbf{A}}} = \frac{e^{-H(\mathbf{W})}}{\int_{\mathbb{W}_{\mathbf{A}}} e^{-H(\mathbf{W})}\, d\mathbf{W}} = \prod_{i<j} \left[ \frac{e^{-H_{ij}(w_{ij})}}{\int_{m_{ij}}^{+\infty} e^{-H_{ij}(w_{ij})}\, dw_{ij}} \right]^{a_{ij}} = \prod_{i<j} \left[ \frac{e^{-H_{ij}(w_{ij})}}{\zeta_{ij}} \right]^{a_{ij}} \tag{49}
\]

if W ∈ W_A and 0 otherwise, since each Hamiltonian considered in the present paper is separable, i.e. a sum of node pair-specific Hamiltonians: in formulas, H(W) = Σ_{i<j} H_ij(w_ij).
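The factorisation in (49) implies that, once a binary configuration A is drawn, the weights can be sampled independently on each existing link. A minimal sketch, assuming the exponential pairwise Hamiltonian H_ij(w_ij) = β_ij·w_ij with lower bound m_ij (all parameter values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Minimal sketch of sampling from the factorised conditional distribution
# Q(W|A) = prod_{i<j} [exp(-H_ij(w_ij)) / zeta_ij]^{a_ij}, assuming the
# exponential pairwise Hamiltonian H_ij(w) = beta_ij * w with lower
# cut-off m_ij (all parameter values are hypothetical).
N = 4
A = np.triu(rng.random((N, N)) < 0.5, k=1).astype(int)  # binary configuration
beta = np.full((N, N), 2.0)                             # pairwise parameters
m = np.full((N, N), 1.0)                                # lower bounds m_ij

# Under H_ij(w) = beta_ij * w on [m_ij, +inf), zeta_ij = exp(-beta_ij * m_ij)
# / beta_ij, so each conditional weight is a shifted exponential:
# w_ij = m_ij + Exp(rate = beta_ij), drawn independently for every pair
# with a_ij = 1, exactly as the factorisation in (49) prescribes.
W = np.zeros((N, N))
links = A == 1
W[links] = m[links] + rng.exponential(1.0 / beta[links])

# Weights sit above the cut-off exactly where links exist, zero elsewhere.
print(W)
```

The design point is that separability of H turns weighted sampling into a set of independent, one-dimensional draws, one per existing link, which is what makes the 'quenched' ensemble of 5,000 configurations computationally cheap to generate.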

FIG. 4: Estimations of the parameters (a)-(b) β_168, (c)-(d) β_170 and (e)-(f) β_171, entering the definition of the weakly heterogeneous version of the CEM, where the binary topology is either 'deterministic' (black vertical line) or generated via the UBRGM (light orange or light grey), the UBCM (purple or dark grey) or the LM (light purple or grey). The deterministic approach leads to a single estimate, while the other approaches lead to either a single, 'annealed' estimate (vertical, solid lines) or to a whole distribution of 'quenched' estimates (histograms with normal density curves having the same average and standard deviation, constructed over an ensemble of 5,000 binary configurations; the average value is indicated by a vertical, dash-dotted line). Each 'annealed' estimate overlaps with the average value of the related 'quenched' distribution, although the latter ones 1) are well separated in the case of node 168, 2) are only partly overlapped in the case of node 171, and 3) in the case of node 170, the UBCM-induced and the LM-induced ones overlap while the UBRGM-induced one remains well separated. Moreover, the 'deterministic' estimates are always very close to (if not overlapping with) the UBCM-induced, 'annealed' ones. Although the empirical and theoretical CDFs (respectively depicted as solid and dotted lines in the bottom panels) seem to be in very good agreement, the Anderson-Darling test fails to reject the normality hypothesis for all three binary recipes only in the case of node 166 and, for node 168, only for the UBCM-induced distribution of estimates.

FIG. 5: Empirical CDFs for the parameters (a) β_0, (b) ρ, (c) α and (d) γ, entering the definition of the econometric version of the CEM, where the binary topology is either 'deterministic' (black vertical line) or generated via the UBRGM (light orange or light grey), the UBCM (purple or dark grey) or the LM (light purple or grey). The deterministic approach leads to a single estimate, while the other approaches lead to either a single, 'annealed' estimate (vertical, solid lines) or to a whole distribution of 'quenched' estimates (constructed over an ensemble of 5,000 binary configurations; the corresponding average value is indicated by a vertical, dash-dotted line). The shapes of the 'quenched' cumulative distributions induced by the three, binary recipes are very similar.