Relating balance and conditional independence in graphical models

When data are available for all nodes of a Gaussian graphical model, it is possible to use sample correlations and partial correlations to test to what extent the conditional independencies that encode the structure of the model are indeed verified by the data. In this paper, we give a heuristic rule useful in such a validation process: When the correlation subgraph involved in a conditional independence is balanced (i.e., all its cycles contain an even number of negative edges), the conditional independence tends to be validated by the data more often than when it is unbalanced.


I. INTRODUCTION
Graphical models are a popular tool for representing relationships among variables in a wide variety of contexts, from medicine to biology, from socio-economic sciences to psychology [1-6].
When the joint probability distribution in a graphical model is Gaussian, the dependency structure among the variables is completely represented by the so-called concentration matrix K, i.e., the inverse of the covariance matrix Σ. If we look at the graph whose adjacency matrix is K, then the absence of an edge between the ith and jth nodes (K_{ij} = 0) can be encoded as an independence relationship between the ith and jth variables, marginal or conditioned on some other variables of the graph. In particular, the whole topology determined by K can be mapped into a set of conditional independencies, often referred to as Markov conditions [7]. This mapping can be achieved in both undirected and directed graphical models with only minor differences, the most important being the causality interpretation which can be attached to the directed case, and which leads to graphs having the structure of directed acyclic graphs (DAGs).
When sample data are available for the random variables that form the graphical model, K can be estimated by inverting the sample covariance matrix. This is known to be an optimal estimator when the sample size goes to infinity [8]. When instead the sample size is small, inversion of the sample covariance is an ill-posed operation, and many alternative methods have been developed in the literature for topology inference or validation [8-14].
The setting we deal with in this paper is the latter one: We assume that we know the underlying topology of the graphical model and that we have a small number of observations available for all nodes of the graph. Our aim is, therefore, to use these data to verify how many of the conditional independencies that correspond to the topology are "visible" in the data, and to provide criteria for discriminating which conditional independencies are more likely to be validated by the data, and which are less likely and hence can be labeled as false positives. As we assume to be dealing with Gaussian multivariates, measures based on sample correlations are suitable for this task. In particular, it is well known that independence conditioned on certain variables is equivalent to the vanishing of the corresponding partial correlation, i.e., the vanishing of the correlation among the residuals obtained after projecting away the contribution of the conditioning variables. For sample data, conditional independencies can, therefore, be associated with sample partial correlations that are "small enough" [15].
The contribution of this paper is to relate partial correlation with the properties of the associated correlation graph. In particular, a correlation graph is a signed graph, and the property of signed graphs which we want to highlight here is the so-called structural balance (henceforth, for brevity, balance): a signed graph is balanced if all its cycles are positive, i.e., have an even number of negative edges, see Refs. [16-18]. The property is also called frustration-free in the statistical physics literature [19-21]. It is closely related to the notion of positive association among random variables [22]. In particular, we provide two heuristic rules, valid for small sample sizes: (1) Partial correlations associated with balanced correlation graphs tend to be smaller in absolute value than those associated with unbalanced correlation graphs; (2) partial correlations associated with balanced correlation graphs tend to be contractions of the corresponding correlations (contraction is intended in an elementwise sense: a partial correlation coefficient is smaller in absolute value than the corresponding correlation coefficient).
Both rules point to the fact that balanced correlation graphs tend to lead to validation of conditional independencies more often than unbalanced ones. It is straightforward to show on examples that both rules are only heuristic and not obeyed strictly. The second, however, can be made into a rigorous law if instead of correlation graphs we look at concentration graphs. The equivalent of balance for concentration graphs is an analogous property which we call inverse balance but which in Ref. [23] is called signed MTP2 (multivariate totally positive of order 2). MTP2 is a stronger form of positivity, which corresponds to the concentration matrix K being an M-matrix ([24]; see below for details of this characterization). Indeed, we show in the paper that (2') partial covariances associated with inverse balanced correlation graphs are always contractions of the corresponding covariances.
Hence, for inverse balance, conditioning indeed leads to contractions, which further points to the balance property as a predictor of the validity of a conditional independence.
As an application of the rules above to empirical data, we consider length-2 causal chains in high-resolution models of gene transcription and protein synthesis. Here the task is to discriminate which chains are true positives, in the sense that the associated conditional independence is validated by our omics data. It is shown that there is a systematic difference between the validation rate obtained in the balanced and inverse balanced cases and that obtained in the unbalanced and inverse unbalanced cases. In particular, balance and inverse balance appear to be good predictors of conditional independence, in accordance with the above-mentioned rules. These results are qualitatively similar to other studies we recently carried out on different datasets and/or different network motifs, see Refs. [25,26]. They are here integrated by a more thorough theoretical analysis of the problem.

II. GAUSSIAN GRAPHICAL MODELS AND CONDITIONAL INDEPENDENCE
Let us consider a Gaussian Bayesian network, i.e., a probabilistic graphical model on a DAG in which to each node is associated a Gaussian random variable X_i. Denote X = {X_1, ..., X_n} ∼ N(0, Σ), with Σ an n × n positive definite (p.d.) covariance matrix. We assume that the joint probability distribution that represents the DAG factorizes according to the structure of the DAG, i.e., according to the topology determined by the concentration matrix K = Σ^{-1} with the directionality of the edges prescribed by the DAG. Such factorization determines a set of conditional independencies (also called Markov conditions) of the form X_i ⊥ X_j | S, where the symbol "⊥" means independence, "|" means conditioning, and S is a separating set for X_i and X_j, see Ref. [7]. We also assume that the nodes of the DAG are "well ordered" [7] so that local and global Markov conditions coincide.
For instance, for the length-2 chain X_1 → X_2 → X_3 shown in Fig. 1(a), the joint probability distribution factorizes as

p(X_1, X_2, X_3) = p(X_1) p(X_2 | X_1) p(X_3 | X_2).   (1)

Conditioning on X_2 and using Bayes' rule, we get

p(X_1, X_3 | X_2) = p(X_1 | X_2) p(X_3 | X_2),

which shows that X_1 and X_3 are conditionally independent given X_2: X_1 ⊥ X_3 | X_2, see Ref. [27].
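As a sanity check, this conditional independence can be verified numerically. The sketch below (with hypothetical chain coefficients a, b and unit noise variances, not taken from the paper) builds Σ for the chain from its structural equations and confirms that the (1,3) entry of the concentration matrix vanishes.

```python
import numpy as np

# Chain X1 -> X2 -> X3 with assumed coefficients a, b and unit noise variances:
# X1 = e1, X2 = a*X1 + e2, X3 = b*X2 + e3, with ei ~ N(0, 1) independent.
a, b = 0.8, -0.5
A = np.array([[0, 0, 0],
              [a, 0, 0],
              [0, b, 0]])           # structural coefficients of the DAG
M = np.linalg.inv(np.eye(3) - A)    # X = (I - A)^{-1} e
Sigma = M @ M.T                     # covariance of X
K = np.linalg.inv(Sigma)            # concentration matrix

# Absence of the (1,3) edge in K encodes X1 ⊥ X3 | X2.
print(abs(K[0, 2]) < 1e-10)         # True
```

Since the noise covariance is the identity, K equals (I - A)^T (I - A) here, whose (1,3) entry is exactly zero, while the (1,2) and (2,3) entries are nonzero.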

III. CORRELATION, PARTIAL CORRELATION, AND CONDITIONAL INDEPENDENCE
From K = Σ^{-1}, consider the normalized versions of both Σ and K, i.e., the correlation matrix

R = diag(Σ)^{-1/2} Σ diag(Σ)^{-1/2},

and the normalized concentration matrix

H = diag(K)^{-1/2} K diag(K)^{-1/2}.

Both R and H are obviously p.d. when Σ, respectively, K are, and have the same sign pattern as Σ, respectively, K. Note that even if K (and H) is sparse, Σ (and R) is, in general, a dense matrix. The matrix H (and, hence, K) is related to the partial correlation matrix P, whose (i, j)th entry, denoted R_{X_i X_j.S}, is the partial correlation between variables X_i and X_j given the remaining n − 2 variables S = X \ {X_i, X_j}, and is defined as

R_{X_i X_j.S} = −K_{ij} / (K_{ii} K_{jj})^{1/2}.   (2)

In matrix form,

P = 2I − H.   (3)

Formula (3) shows that the normalized concentration matrix H and the partial correlation matrix P have the same nonzero pattern but with opposite signs in the off-diagonal part. P is, in general, not p.d. Conditional independence between X_i and X_j corresponds to the absence of the (i, j) edge in the concentration matrix, which, in turn, corresponds also to

R_{X_i X_j.S} = 0.   (4)

On sample data, the exact test (4) is normally replaced by a statistical test, such as

|R_{X_i X_j.S}| < θ,   (5)

where θ is a significance threshold obtained, e.g., through a Fisher test, see Ref. [15]. This is especially needed when the sample size m is small (comparable to the number of variables n) because the matrix inversion operation in (2) becomes ill conditioned even when R is p.d. This ill conditioning calls for alternative tests to be developed in order to check conditional independence on sample data, a topic which is discussed next.
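The quantities above can be computed in a few lines. The sketch below (with an assumed example covariance, not taken from the paper) builds R, H, and P, and checks entrywise agreement with the definition of the partial correlation.

```python
import numpy as np

def normalize(M):
    """Scale M by its diagonal: diag(M)^{-1/2} M diag(M)^{-1/2}."""
    d = np.sqrt(np.diag(M))
    return M / np.outer(d, d)

# Assumed example covariance (AR(1)-like, so X1 ⊥ X3 | X2 holds exactly).
Sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])
K = np.linalg.inv(Sigma)
R = normalize(Sigma)            # correlation matrix
H = normalize(K)                # normalized concentration matrix
P = 2 * np.eye(3) - H           # partial correlation matrix

# P agrees entrywise with -K_ij / sqrt(K_ii K_jj), and P_13 = 0 here.
print(np.isclose(P[0, 2], -K[0, 2] / np.sqrt(K[0, 0] * K[2, 2])))  # True
```

Since H has unit diagonal, P = 2I − H simply keeps the unit diagonal and flips the sign of the off-diagonal part of H.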

IV. BALANCE AND INVERSE BALANCE ON GRAPHICAL MODELS
For Gaussian variables X_i and X_j, one says that X_i and X_j are positively associated if, for all monotone functions f and g, R_{f(X_i) g(X_j)} ≥ 0, which, in particular, implies that R_{X_i X_j} ≥ 0 [22]. Consequently, if the entire vector of Gaussian random variables X = {X_1, ..., X_n} is positively associated, the graph G(R) formed by taking the correlation matrix R as the adjacency matrix has all nonnegative entries. In general, however, R contains both positive and negative off-diagonal entries, meaning that G(R) is a signed graph. Among all signed graphs, a special class stands out because it has properties that are similar to those of a nonnegative graph: it is the class of balanced graphs (also called frustration-free graphs). A graph is said to be balanced if all its cycles are positive, i.e., they contain an even number of negative edges. An equivalent condition is that there exists a diagonal signature matrix D = diag(d_1, ..., d_n) with d_i = ±1, such that after the change of basis with D = D^{-1} (sometimes called a "gauge transformation"), R̃ = DRD is non-negative, and, therefore, G(R̃) has all non-negative edges [18]. We then say that a vector of Gaussian random variables X is balanced if it has a balanced correlation graph G(R), i.e., if the variables DX are positively associated (they have correlation matrix R̃).
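Balance of a signed graph can be checked in linear time by trying to build the gauge D directly, via a BFS two-coloring of the nonzero edges. A minimal sketch (the function name and example matrices are ours, not from the paper):

```python
import numpy as np
from collections import deque

def is_balanced(R, tol=1e-12):
    """Structural balance of the signed graph G(R): try to construct a gauge
    d in {+1,-1}^n such that d_i * sign(R_ij) * d_j > 0 on every edge.
    A conflicting edge reveals a cycle with an odd number of negative edges."""
    n = len(R)
    d = np.zeros(n, dtype=int)          # 0 = not yet assigned
    for s in range(n):
        if d[s]:
            continue
        d[s] = 1
        q = deque([s])
        while q:
            i = q.popleft()
            for j in range(n):
                if i == j or abs(R[i, j]) <= tol:
                    continue
                want = d[i] * (1 if R[i, j] > 0 else -1)
                if d[j] == 0:
                    d[j] = want
                    q.append(j)
                elif d[j] != want:      # inconsistent cycle: unbalanced
                    return False
    return True

# All-positive triangle: balanced; one negative edge: unbalanced.
Rb = np.array([[1.0, 0.6, 0.4], [0.6, 1.0, 0.5], [0.4, 0.5, 1.0]])
Ru = np.array([[1.0, 0.6, -0.4], [0.6, 1.0, 0.5], [-0.4, 0.5, 1.0]])
print(is_balanced(Rb), is_balanced(Ru))  # True False
```

For a triangle this reduces to checking that the product of the three edge signs is positive; the BFS version works for arbitrary (possibly sparse) signed graphs.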
In Gaussian graphical models, a concept stronger than positive association is often used, that of MTP2 [24]. In terms of the concentration matrix, X ∼ N(0, Σ) is MTP2 if and only if K is an M-matrix. Recall that a matrix K is said to be an M-matrix if it can be written as K = sI − B with B ≥ 0 elementwise and s > ρ(B), the spectral radius of B [24]. In practice, K being an M-matrix implies that K (and H) has all off-diagonal entries nonpositive. Hence, from (3), P has all non-negative off-diagonal entries. Summarizing: X MTP2 is equivalent to saying that the partial correlation graph G(P) has all non-negative edges. A generalization from non-negative G(P) to signed G(P) can be obtained in the same way as we discussed above for G(R). In particular, in this paper we call inverse balanced a Gaussian probability distribution X ∼ N(0, Σ) such that K becomes an M-matrix after a gauge transformation with a diagonal signature matrix D, i.e., K̃ = DKD is an M-matrix.
It is straightforward to show that if D renders K̃ = DKD an M-matrix, then it also renders DRD (and DΣD) non-negative, meaning that the same gauge transformation D that renders X MTP2 also renders it positively associated. The same D is also such that all partial correlations d_i R_{X_i X_j.S} d_j become non-negative for any conditioning subset S ⊂ X \ {X_i, X_j}. The notion of inverse balance is referred to as signed MTP2 in Ref. [23]. Given a p.d. covariance matrix Σ and its concentration matrix K, checking inverse balance is a purely graphical condition: X is inverse balanced (i.e., signed MTP2) if and only if G(P) is balanced. Since X being MTP2 is a sufficient but not necessary condition for X being positively associated, X being inverse balanced (i.e., signed MTP2) is likewise a sufficient but not necessary condition for X being balanced. The various properties with their graphical characterizations are summarized in Table I.
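Since inverse balance amounts to the existence of a suitable gauge D, for small motifs it can be checked by brute force over all 2^n signature matrices. The sketch below (our own illustration, exponential in n and meant only for small graphs) does exactly this.

```python
import numpy as np
from itertools import product

def is_inverse_balanced(Sigma, tol=1e-12):
    """Brute-force check: X ~ N(0, Sigma) is inverse balanced (signed MTP2)
    iff some gauge D = diag(±1) makes D K D have nonpositive off-diagonal
    entries (for a symmetric p.d. K this is the M-matrix condition)."""
    K = np.linalg.inv(Sigma)
    n = len(K)
    off = ~np.eye(n, dtype=bool)
    for signs in product([1, -1], repeat=n):
        d = np.array(signs)
        if np.all((K * np.outer(d, d))[off] <= tol):
            return True
    return False

# Assumed example: K is already an M-matrix, so D = I works ...
K = np.array([[2.0, -0.5, -0.3],
              [-0.5, 2.0, -0.4],
              [-0.3, -0.4, 2.0]])
print(is_inverse_balanced(np.linalg.inv(K)))          # True
# ... and a gauged version D K D (mixed signs) is still inverse balanced.
D = np.diag([1, -1, 1])
print(is_inverse_balanced(np.linalg.inv(D @ K @ D)))  # True
# A triangle whose cycle of -K_ij signs is negative admits no gauge.
Kn = np.array([[2.0, -0.5, 0.3],
               [-0.5, 2.0, -0.4],
               [0.3, -0.4, 2.0]])
print(is_inverse_balanced(np.linalg.inv(Kn)))         # False
```

Note that gauging K by D gauges Σ by the same D, so the check is invariant under the transformation, as the text states.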

V. CONTRACTION PROPERTY OF INVERSE BALANCED PARTIAL COVARIANCES
It is known that in MTP2 distributions, partial correlations and covariances over all conditioning sets are nonnegative, see Ref. [22]. On the other hand, conditioning corresponds to projecting the data on the orthogonal complement of the variables being conditioned upon, hence, intuitively, to reducing the variance and covariance of the data. Combining these two concepts, we have the following elementwise contraction property.
Theorem 1. Consider n Gaussian random variables X_i ∈ X ∼ N(0, Σ) such that their joint probability distribution is inverse balanced (i.e., signed MTP2). Then, for any two sets S_1, S_2 ⊂ X, S_1 ⊂ S_2, and X_i, X_j ∉ S_2, it is

|Σ_{X_i X_j.S_2}| ≤ |Σ_{X_i X_j.S_1}|,   (6)

where Σ_{X_i X_j.S_k} is the covariance of X_i and X_j conditioned on S_k and | · | is the absolute value.
Proof. We can use the recursive formula for conditioning a covariance of k variables, see Ref. [28]. Adding one variable X_k at a time to the conditioning set S, the partial covariance obeys

Σ_{X_i X_j.S∪{X_k}} = Σ_{X_i X_j.S} − Σ_{X_i X_k.S} Σ_{X_k X_j.S} / Σ_{X_k X_k.S}.   (7)

If the probability distribution of X is inverse balanced, then there exists a diagonal signature matrix D = diag{±1} such that Σ̄ = [Σ̄_{ij}] = DΣD ≥ 0. Since X̄ = DX is MTP2, all its partial covariances Σ̄_{ij.S} are non-negative for any S; hence, in particular, all three terms appearing in (7) are non-negative, which implies

0 ≤ Σ̄_{ij.S∪{k}} ≤ Σ̄_{ij.S}.

Iterating over successive conditionings leads to 0 ≤ Σ̄_{ij.S_2} ≤ Σ̄_{ij.S_1}. Consequently, a similar expression holds for X with the absolute values as in (6).
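The recursion (7) and the resulting contraction chain can be verified numerically. The sketch below uses an assumed 4 × 4 M-matrix concentration (so the distribution is MTP2 and inverse balanced with D = I) and the Schur-complement formula for conditional covariances.

```python
import numpy as np

# Assumed MTP2 example: K is an M-matrix (p.d., nonpositive off-diagonal),
# so all partial covariances of X ~ N(0, inv(K)) are nonnegative.
K = np.array([[ 2.0, -0.6, -0.3, -0.2],
              [-0.6,  2.0, -0.5, -0.1],
              [-0.3, -0.5,  2.0, -0.4],
              [-0.2, -0.1, -0.4,  2.0]])
Sigma = np.linalg.inv(K)

def cond_cov(Sigma, keep, cond):
    """Schur complement: covariance of variables `keep` conditioned on `cond`."""
    A = Sigma[np.ix_(keep, keep)]
    B = Sigma[np.ix_(keep, cond)]
    C = Sigma[np.ix_(cond, cond)]
    return A - B @ np.linalg.inv(C) @ B.T

s0 = Sigma[0, 1]                              # no conditioning
s1 = cond_cov(Sigma, [0, 1], [2])[0, 1]       # condition on X3
s2 = cond_cov(Sigma, [0, 1], [2, 3])[0, 1]    # condition on {X3, X4}

# One step of the recursion: add X4 to the conditioning set {X3}.
c = cond_cov(Sigma, [0, 1, 3], [2])
step = c[0, 1] - c[0, 2] * c[2, 1] / c[2, 2]

print(np.isclose(step, s2))                   # True: the recursion holds
print(0 <= s2 <= s1 <= s0)                    # True: contraction chain of (6)
```

The monotone chain 0 ≤ s2 ≤ s1 ≤ s0 is exactly the iterated contraction used in the proof.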
Since Σ_{X_i X_j.S} is a scalar, the contraction in (7) is elementwise. A similar result is likely valid for the corresponding correlations, although a formal proof is still missing.
Remark 1. If a distribution is balanced but not inverse balanced, then the contraction property (6) can be violated when Σ_{X_i X_j.S} changes sign with respect to Σ_{X_i X_j}. Consider, for example,

Σ = [ 1.14  0.67  0.08
      0.67  0.87  0.29
      0.08  0.29  0.89 ].

Here Σ_{X_1 X_3} = 0.08, but conditioning on X_2 gives Σ_{X_1 X_3.X_2} = 0.08 − (0.67 · 0.29)/0.87 ≈ −0.143, so the partial covariance changes sign and |Σ_{X_1 X_3.X_2}| > |Σ_{X_1 X_3}|, violating (6).
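The violation in Remark 1 is easy to reproduce numerically from the Σ given in the text:

```python
import numpy as np

# Covariance matrix from Remark 1 (balanced: all entries positive).
Sigma = np.array([[1.14, 0.67, 0.08],
                  [0.67, 0.87, 0.29],
                  [0.08, 0.29, 0.89]])
# One-step conditioning of (X1, X3) on X2, as in the recursion (7).
s13_2 = Sigma[0, 2] - Sigma[0, 1] * Sigma[1, 2] / Sigma[1, 1]
print(round(s13_2, 3))                  # -0.143: the sign flips
print(abs(s13_2) > abs(Sigma[0, 2]))    # True: contraction (6) is violated
```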

VI. USING BALANCE AND INVERSE BALANCE OF THE SAMPLES FOR CHECKING CONDITIONAL INDEPENDENCE
If all nodes of the graphical model are measured, then we can form the sample correlation matrix R (which we can assume p.d.). By construction, correlation matrices are symmetric, R_{ij} = R_{ji}, meaning that G(R) is undirected. The sample correlation matrix obtained from the data is also typically full; hence G(R) is typically a complete graph. This implies that all variable pairs are connected by an undirected edge in G(R), even those that are not so in the underlying graphical model. It is this passage from the "true" topology of the graphical model (a DAG) to the fully connected topology of G(R) that provides an opportunity for checking the consistency of the data, as the balance of the induced cycles on G(R) helps us discriminate the reliable data from the unreliable ones. We stress that G(R) is now a sample correlation graph, meaning that we always rely exclusively on the data to determine balance and inverse balance.
As an example, in Fig. 1 we consider the chain X_1 → X_2 → X_3. In this case G(R) is a triangle on nodes 1, 2, and 3. On this triangle, looking at the signs of the edges, eight different G(R) are possible, four of which are balanced and four unbalanced, see Fig. 1(b). If we associate an empirical regulatory action to each edge of G(R) by taking the corresponding value of R, then it is easy to realize that if G(R) is balanced the three regulatory actions are compatible with each other, whereas if G(R) is unbalanced, then at least one of these empirical regulations is incompatible with the others. For instance, if the data for X_1 and X_2 are positively correlated (meaning, e.g., X_1 activates X_2), but X_2 and X_3 are negatively correlated (X_2 inhibits X_3), then one expects that X_1 and X_3 should also be negatively correlated (X_1 indirectly inhibits X_3; balanced case); if instead X_1 and X_3 are positively correlated (X_1 indirectly activates X_3; unbalanced case), then the three data series of X_1, X_2, and X_3 are incompatible with each other and should not belong to the same "true" regulatory graph.
To check what happens on our length-2 chain, we generated 10^4 sample realizations using the "mvnrnd" function of MATLAB with sample sizes m = 6 and m = 20. For the conditional independence X_1 ⊥ X_3 | X_2, the correlation R_{X_1 X_3} and the partial correlation R_{X_1 X_3.X_2} were computed, as well as the balance and inverse balance properties of the resulting graphs. The histograms of the resulting partial correlations, classified according to balance and unbalance and to inverse balance and inverse unbalance, are shown in Figs. 1(c) and 1(d) for both values of m, and the scatter plots of correlations vs partial correlations for the case m = 6 in Figs. 1(e) and 1(f). The case m = 6 is the most interesting: the partial correlations for the balanced cases tend to be more concentrated around the origin, whereas those associated with unbalanced cases tend to stay away from the origin, see Fig. 1(c). Something similar is visible also for inverse balance, although in a less pronounced way, see Fig. 1(d). When m grows, the effect tends to disappear in both the unbalanced and inverse unbalanced cases because the sample correlations and sample partial correlations become numerically more precise. Note in Fig. 1(e) how in the balanced case the contraction rule tends to be satisfied. The same relationship appears to be always satisfied for inverse balanced graphs (even though our proof of Theorem 1 is only for covariances and not for correlations). The unbalanced cases instead do not obey any contraction rule; quite the contrary. To summarize, the "sign consistency" encoded in the balance of G(R) is reflected in the amplitude of the partial correlations, and, in particular, leads to |R_{X_1 X_3.X_2}| < |R_{X_1 X_3}| much more often than unbalance does, which, in turn, leads to verification of the conditional independence X_1 ⊥ X_3 | X_2 more often. These are heuristic properties and, indeed, exceptions are visible in the data. A consequence is that balance can be used as a (heuristic) test of the validity of a conditional independence among the variables of the graphical model. Since inverse balance is computed from samples in an analogous way, inverse balance of a graphical model can also be used to set up a heuristic test of independence. What can be made rigorous in this case is the property stated in Theorem 1: for inverse balance, conditioning always leads to an elementwise contraction towards 0 of the "residual" partial covariance.
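A reduced version of this Monte Carlo experiment can be sketched in a few lines (in Python rather than MATLAB; the chain coefficients are assumed for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_partial(m, a=0.8, b=0.7):
    """Draw m samples from the chain X1 -> X2 -> X3 (assumed coefficients);
    return balance of the sample triangle G(R) and |R_{X1 X3.X2}|."""
    e = rng.standard_normal((m, 3))
    x1 = e[:, 0]
    x2 = a * x1 + e[:, 1]
    x3 = b * x2 + e[:, 2]
    R = np.corrcoef(np.c_[x1, x2, x3], rowvar=False)
    # A triangle is balanced iff the product of its three edge signs is positive.
    balanced = R[0, 1] * R[1, 2] * R[0, 2] > 0
    r13_2 = (R[0, 2] - R[0, 1] * R[1, 2]) / np.sqrt(
        (1 - R[0, 1] ** 2) * (1 - R[1, 2] ** 2))
    return balanced, abs(r13_2)

runs = [sample_partial(6) for _ in range(5000)]
bal = [p for ok, p in runs if ok]
unb = [p for ok, p in runs if not ok]
# Balanced sample triangles typically give smaller |partial correlation|.
print(round(float(np.mean(bal)), 2), round(float(np.mean(unb)), 2))
```

With m = 6, sampling noise regularly produces unbalanced triangles even though the generating chain is balanced, which is what makes the comparison between the two classes informative.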
These observations for chains X 1 → X 2 → X 3 and artificial data are confirmed in the next section on experimental data.
Similar arguments can be set up for any graphical model, or subset of a graphical model, involving at least two adjacent edges (not forming a collider motif). This gives us a way to label certain interaction patterns as "more plausible" (candidates to be true positives) or as "less plausible" (candidates to be false positives) based on the sample correlations, without relying on the computation of partial correlations. This sign consistency test on G(R) is akin to a parity check in error-correcting codes [29,30], and, in fact, the notion of balance can be mapped into the solvability of an XOR-SAT problem, i.e., a linear system over a binary field [20]. The test is purely on the sample correlations, and, even when applied to Bayesian networks, it does not lead to any causality violation on the corresponding DAG.

VII. AN EXPERIMENT: ELEMENTARY CHAIN MOTIFS IN GENE TRANSCRIPTION AND TRANSLATION
We consider high-resolution models of transcription and translation processes for around 4,800 human genes and proteins, see Fig. 2. In particular, for each gene and protein the variables we consider are: (1) the chromatin accessibility state in sites (called "peaks") in the promoter region of the gene; (2) the RNA expression of the alternative splice variants that characterize a gene; (3) protein abundance. Omics measurements of these quantities are available through ATAC-seq, RNA-seq, and mass spectrometry, see Refs. [25,26] for details. We use the subindex "A" to denote ATAC sites, "S" to denote splice variants, and "P" for proteins. An "open" chromatin around the promoter region of a gene favors the binding of transcription factors and hence enhances the regulatory effects on the gene (both positive and negative regulations). ATAC-seq (assay for transposase-accessible chromatin using sequencing) assays probe the chromatin state by estimating the accessible sites ("peaks") on the DNA in the promoter region of a gene, in particular in proximity of the transcription start site of each splice variant of a gene. The measurements it produces are assumed to be in direct proportionality with this accessibility. Gene expression itself is composed of the abundances of an ensemble of transcripts representing alternative splice variants of the gene, all measurable in modern deep RNA-seq experiments.
The DNA regions targeted by ATAC-seq reads can be of relevance for some splice variants but not for others; hence different measured ATAC peaks can be associated with different splice variants of the same gene. By knowing the position on the DNA of both the ATAC peaks and the transcription start sites of each splice variant of a gene, it is possible to create a bioinformatic map of putative A → S interactions for each gene. The splice variants of a gene can, in principle, lead to different protein isoforms, which, however, are not distinguishable by current mass-spec proteomics. Hence, all splice variants of a gene are necessarily associated with the same protein variable, leading to a set of causal interactions S → P, some of which are "real" whereas others are false positives. To summarize, for each protein a bioinformatic analysis provides us with a set of putative interactions among A, S, and P. Since causality flows unidirectionally from chromatin accessibility to RNA transcription and then to translation into proteins, both sets of interactions A → S and S → P have an unambiguous directionality. The motif we are interested in is the length-2 chain A → S → P. The number of such chains in our data is 34,026. The set of putative chains obtained in this way contains, however, a large number of false positives. Our task is to use the notions of balance and inverse balance to prune some of these false positive interactions.
More specifically, only some of the ATAC sites, in practice, influence transcription, and, similarly, only some of the splice variants S are of relevance for protein synthesis. Our task is to find out which ones by checking for which triples {A, S, P} involved in a chain it is indeed A ⊥ P | S. For each triple, the sample correlations R_{AS} and R_{SP} can be used to estimate the direct regulatory effects, and R_{AP} the indirect one. For each {A, S, P}, the three correlations form a triangle G(R) whose balance can be computed, as well as the associated inverse balance on G(P). Because of the edge directionality, the balance can be interpreted straightforwardly as coherence between the regulatory signs inferred for the "physical path" A → S → P and that associated with the "nonphysical" indirect path A → P. These motifs have some similarity with the so-called feedforward loops used in biology, where balance is, indeed, referred to as "coherence," see Ref. [31]. The conditional independence A ⊥ P | S can also be checked via partial correlations using (5). The distribution of such partial correlations for our 34,026 chains, classified according to balance and unbalance (and inverse balance and inverse unbalance), is shown in Fig. 3(a), and the scatter plots of R_{AP} vs R_{AP.S} are given in Fig. 3(b). These plots show the following two behaviors: (i) partial correlations R_{AP.S} associated with balanced G(R) tend to concentrate around 0, whereas those associated with unbalanced G(R) tend to stay away from 0, and (ii) |R_{AP.S}| < |R_{AP}| occurs much more often in the balanced cases than in the unbalanced ones. Both properties lead to the same conclusion: balance is a much better proxy for conditional independence than unbalance. Both properties keep holding for inverse balanced graphs where, in particular, the contraction property |R_{AP.S}| < |R_{AP}| is always verified, see the lower left panel in Fig. 3(b).
The main contribution of this paper is to introduce the notions of balance and inverse balance as easy-to-check proxies for conditional independence in graphical models. While in this paper we have used them as a validation test for graphs of known topology, a more challenging task ahead is to use them as a structure discovery tool, in the case in which the topology of the graphical model is not known and its conditional independencies (and concentration matrix) must be found relying only on the sample data.

FIG. 1. Conditional independence on a length-2 chain motif and graphical tests based on balance and unbalance. (a) The length-2 chain motif. (b) Possible correlation graphs G(R) associated with the chain motif. Four of the G(R) are balanced and four unbalanced. (c) Distribution of the partial correlations R_{X_1 X_3.X_2} according to balance (left) or unbalance (right) for two different sample sizes, m = 6 and m = 20. (d) Distribution of the partial correlations R_{X_1 X_3.X_2} according to inverse balance (left) or inverse unbalance (right) for m = 6 and m = 20. (e) Scatter plot of R_{X_1 X_3} vs R_{X_1 X_3.X_2} in the balanced (left) and unbalanced (right) cases. (f) Scatter plot of R_{X_1 X_3} vs R_{X_1 X_3.X_2} in the inverse balanced (left) and inverse unbalanced (right) cases.

FIG. 2. (a) Sketch of the gene transcription/translation process. The open or closed status of the chromatin in the sites in proximity of the promoter region of a gene influences gene transcription. A gene is itself composed of various exons which, during transcription, combine in different ways, yielding alternative splice variants of the gene. (b) All of the splice variants of the gene, in principle, can contribute to protein synthesis. (c) To each gene and protein are associated multiple elementary chain motifs A → S → P corresponding to the three variables chosen (A = chromatin accessibility state, S = splice variant of a gene, and P = protein).

FIG. 3. Balance and unbalance for elementary chain motifs A → S → P in gene transcription and translation. (a) R_{AP.S} classified according to balance and unbalance (left panel) and according to inverse balance and inverse unbalance (right panel). (b) Scatter plots of R_{AP} vs R_{AP.S} classified according to balance and unbalance (upper panels) and to inverse balance and inverse unbalance (lower panels).

TABLE I. Summary of properties of random Gaussian variables, their equivalent graphical characterizations, and their dependencies.
Note: A matrix K is said to be an M-matrix if it can be written as K = sI − B with B ≥ 0 and s > ρ(B), where B ≥ 0 means elementwise non-negative, and ρ(B) is the spectral radius of B. X being MTP2 is a sufficient but not necessary condition for X being positively associated (meaning that K an M-matrix implies R (and Σ) non-negative but not vice versa), see Ref. [24].