Classifying Causal Structures: Ascertaining when Classical Correlations are Constrained by Inequalities

The classical causal relations between a set of variables, some observed and some latent, can induce both equality constraints (typically conditional independences) as well as inequality constraints (Instrumental and Bell inequalities being prototypical examples) on their compatible distribution over the observed variables. Enumerating a causal structure's implied inequality constraints is generally far more difficult than enumerating its equalities. Furthermore, only inequality constraints ever admit violation by quantum correlations. For both those reasons, it is important to classify causal scenarios into those which impose inequality constraints versus those which do not. Here we develop methods for detecting such scenarios by appealing to d-separation, e-separation, and incompatible supports. Many (perhaps all?) scenarios with exclusively equality constraints can be detected via a condition articulated by Henson, Lal and Pusey (HLP). Considering all scenarios with up to 4 observed variables, which number in the thousands, we are able to resolve all but three causal scenarios, providing evidence that the HLP condition is, in fact, exhaustive.

In the mathematical framework for causal inference [1][2][3][4][5], the candidate causal explanations are represented by directed acyclic graphs (DAGs), where each node is associated with a variable and each edge represents direct causal influence.
A DAG imposes causal compatibility constraints on the probability distributions that can be causally explained by it. For example, a probability distribution over variables {A,B,C} where A and C remain correlated even after a value of B is conditioned upon cannot be causally explained by the DAG of Figure 1. All of the distributions over {A,B,C} that can be explained by Figure 1 need to satisfy P(AC|B) = P(A|B)P(C|B).
The causal compatibility constraint described above, denoted by A ⊥⊥_CI C | B, is a conditional independence relation. That is, it says that one set of variables becomes independent of a second set of variables when conditioning on a third. In general, however, a DAG can impose more complicated types of constraints on the compatible probability distributions. This only happens for DAGs that have latent nodes, i.e. nodes associated with variables that do not appear in the probability distributions of interest.

[Figure 1: A DAG with nodes A, B and C.]
From now on, the variables that do appear in the probability distributions of interest will be called observed variables, while the ones that do not will be called latent variables. Nodes associated with observed variables will be called observed nodes and will be depicted by triangles, while nodes associated with latent variables will be called latent nodes and will be depicted by circles.
An example of a DAG that imposes more complicated types of constraints on the compatible distributions over observed variables is the Bell DAG, presented in Figure 2. This DAG is of interest to physicists, as it encompasses the causal assumptions of Bell's theorem. Bell's theorem [6,7] is central to the foundations of quantum mechanics [8][9][10][11], as it says that no locally causal hidden-variable theory in which the observers can choose their measurements independently of the source can ever be capable of reproducing all the operational predictions of quantum theory [12][13][14][15][16]. These assumptions are encoded in the Bell DAG: there, the settings X and Y are not causally connected to the source Λ, and the setting in one wing does not causally influence the outcome in the other wing.

[Figure 2: The Bell DAG, with observed nodes A, B, X, Y and latent node Λ.]
As it turns out, the Bell DAG imposes causal compatibility constraints on the compatible distributions over {A,B,X,Y} that take the form of inequalities, which are precisely the Bell inequalities. This means that all the distributions which violate Bell's inequalities, including some of the quantum predictions for this scenario, cannot be causally explained by the Bell DAG.
With the goal of causally explaining the violation of Bell's inequalities without changing the causal assumptions embedded in the structure of the Bell DAG, Henson, Lal and Pusey (HLP) [17] developed a generalization of Pearl's causal inference. In this generalized framework, the latent nodes can be associated with quantum or other generalized probabilistic theory (GPT) systems. HLP also proved that the conditional independence constraints remain the same independently of the theory that describes the latent nodes; other types of constraints, like Bell's inequalities, can be violated.
The Bell DAG is just one of the many causal structures for which inequality constraints can be violated when the latent nodes are associated with nonclassical systems [17][18][19][20][21]. In the Bell scenario, correlations that violate Bell inequalities have cryptographic applications precisely because of their non-classicality, so it seems reasonable to hope that other scenarios allowing non-classical correlations will have similar applications. Finding which causal structures imply inequalities which potentially admit quantum violation is a critical step towards such potential applications.
In HLP, the concept of Non-Algebraicness of a causal structure was defined; there, this concept was called interestingness.² A causal structure is said to be Algebraic when all of its causal compatibility constraints are of the form of conditional independence relations, even in the classical case. Conversely, if the causal structure imposes more complicated constraints on the compatible probability distributions, it is said to be Non-Algebraic. As proven in HLP, only the Non-Algebraic scenarios are capable of witnessing a difference between the sets of classically and quantumly achievable probability distributions.
In HLP, a sufficient condition for Algebraicness of a causal structure was developed; it is referred to here as the HLP criterion. A central motivation behind this work is that, at present, it is not known whether the HLP criterion is also necessary for Algebraicness. That is, if a DAG cannot be proven Algebraic by virtue of the HLP criterion, is the DAG necessarily Non-Algebraic? The conjecture that the HLP criterion is indeed necessary for Algebraicness will be referred to as the HLP conjecture.
How might we evaluate the HLP conjecture? To disprove it, we need to find only one DAG for which the HLP criterion does not apply, but which can be proven Algebraic by some other method. We did not pursue a search for such a counterexample, simply because we are unaware of any means to prove Algebraicness when the HLP criterion does not apply. We therefore concentrate on providing evidence in support of the conjecture being true. Namely, we show that as one considers "larger and larger" DAGs, we can still certify the Non-Algebraicness of (almost) all DAGs for which the HLP criterion does not apply.
² Our use of Algebraic and Non-Algebraic is equivalent to the distinction between boring and interesting causal structures in the terminology of Ref. [17]. We recognize that, in general, renaming existing technical jargon is counterproductive. However, we feel that our terminology of Algebraic and Non-Algebraic has semantic connotations so much closer to their underlying meanings as to warrant the renaming. The authors acknowledge Robert W. Spekkens for suggesting these terms.
To accomplish these goals we must clarify two preliminary questions. Firstly, how should we enumerate over DAGs? One enumeration style, the original choice employed by HLP, is to count DAGs by their total number of nodes. We can thus consider DAGs with 5 total nodes, with 6 total nodes, with 7 total nodes, etc. While this enumeration style has the advantage of simplicity, the arguably more natural enumeration, which will be used here, is to count by the total number of observed nodes. We can thus consider DAGs with 2 observed nodes, 3 observed nodes, 4 observed nodes, etc. From naive structural considerations alone, however, one might imagine that there are infinitely many DAGs with any fixed number of observed nodes. Motivated in part by avoiding such infinities, when enumerating by the number of observed nodes we elect to work within Evans' framework of marginalized DAGs (mDAGs) [22], which will be explained in Section 3.1.
The second preliminary question is how to prove that a DAG for which the HLP criterion does not apply is indeed Non-Algebraic. Since we here seek to consider hundreds if not thousands of such DAGs, we are heavily invested in identifying algorithmic techniques for proving Non-Algebraicness, within which we deprioritize computationally expensive methods. In stark contrast to the approach of HLP, we extensively leverage a new result due to Evans [23], who has shown that a DAG G is Algebraic if and only if it is observationally equivalent to some DAG that does not have latent variables, where two DAGs are said to be (classically) observationally equivalent when their sets of (classically) compatible probability distributions are the same. As such, we herein primarily exploit necessary conditions for observational equivalence, whose failure certifies that a graph is observationally inequivalent to every latent-free graph.
One such condition is that when the DAG is nonmaximal, i.e. when it has a pair of nodes that are not adjacent (not connected by an arrow nor by a shared latent common cause) but are nevertheless not d-separated by any set of observed nodes, then the DAG is not observationally equivalent to any latent-free DAG. Another condition says that for two DAGs to be observationally equivalent, they must admit the same e-separation relations over their observed nodes. A third condition says that for two DAGs to be observationally equivalent, they must admit the same set of compatible supports. Although the latter condition subsumes the two former ones (as discussed in Section 4.5), the former conditions can be evaluated more efficiently. The latter condition involving supports (remarkably!) can be assessed using Fraser's algorithm [24], which generally requires higher computational overhead. All of these conditions can be utilized to prove the Non-Algebraicness of a given DAG, via proving the classical observational inequivalence of said DAG with every latent-free graph which has the same number of observed nodes.
It is worth contrasting the tools we employ here to certify Non-Algebraicness with those employed in prior literature by HLP and Pienaar [25].³ HLP themselves attempted to explore all DAGs with 6 total nodes for which their sufficient condition for Algebraicness did not apply. For all but five of these DAGs, HLP proved Non-Algebraicness by means of deriving DAG-specific entropic inequalities and showing that those entropic inequalities could be violated by a DAG-specific distribution which nevertheless satisfied the conditional independence constraints of the DAG. One of the remaining five cases is the Bell DAG, which is long-since established as Non-Algebraic. Another one of the five is the so-called triangle scenario, whose Non-Algebraicness was proven using a one-off proof technique which HLP did not generalize. The remaining three DAGs with 6 total nodes were eventually established as Non-Algebraic in a separate work by Pienaar [25], who employed a proof technique using fine-grained entropic inequalities.
³ In doing so, we expose a critical error in one of the theorems in Ref. [25], which we then rectify here. (Happily, the handful of explicit DAGs which were declared Non-Algebraic pursuant to the fallacious theorem in Ref. [25] are indeed Non-Algebraic nevertheless; moreover, their Non-Algebraicness follows from our replacement nonmaximality theorem here.) This discussion is made in Appendix A.
At first glance, the algorithmic proofs of Non-Algebraicness we employ here may seem unrelated to those utilized by HLP or Pienaar. However, our theorem relating e-separation to Non-Algebraicness turns out to recover all but one of the Non-Algebraicness results that HLP achieved by appealing to entropic inequalities. Additionally, our application of Fraser's algorithm further witnesses the Non-Algebraicness of every other DAG conjectured by HLP to be Non-Algebraic, including the three which were only proven Non-Algebraic in the later work of Pienaar [25]. We are therefore confident that our techniques likely supersede those earlier employed by HLP and Pienaar [25], although we do not yet have a formal proof of this.
We summarize our ultimate findings in Table 1. We interpret these results as evidence in favour of the HLP conjecture: among the thousands of analyzed mDAGs, there are only 3 potential counterexamples. As discussed, this work is of interest to quantum physicists because the Non-Algebraic DAGs are the possible candidates to explore quantum advantages in device-independent information processing protocols [26,27]. These DAGs are also the ones that should be looked into to compare quantum theory to more general probabilistic theories (GPTs) [28,29].
On the other hand, our problem is also of central interest for purely classical causal inference. The set of probability distributions which are classically compatible with an Algebraic DAG is constrained only by conditional independence relations, which can be obtained from the d-separation relations of the DAG. Isolating a sufficient set of d-separation relations in a graph is a well-studied problem. It is of paramount value to a classical data scientist, therefore, to know if the causal hypothesis encoded in a DAG may or may not be falsified by accounting for nontrivial inequality constraints. Such inequality constraints, when present, are often difficult to explicitly characterize.

Structure of the paper
In Section 2.1 we present an introduction to the formalism used in causal inference, and in Section 3 we explain precisely what Algebraic and Non-Algebraic mean in this context. We state and prove our e-separation condition in Section 4. In Section 5 we present our computational results and also discuss the methods we tried in order to confirm the Non-Algebraicness of the three remaining mDAGs with 4 observed nodes. Our conclusions can be found in Section 6.

Causal Explanations of Observational Data
The area of causal inference is concerned with finding potential causal explanations for observed events. For example, imagine that we want to find out what is the causal relationship between three events: a cloudy sky, rain and the floor being wet. Figure 3 depicts two possible causal structures between these three events; in 3(a) we hypothesize that the clouds cause the rain and the rain makes the floor wet. In 3(b), on the other hand, we hypothesize that the wet floor causes the rain, and the rain makes the sky cloudy.

[Figure 3: Two hypotheses for the causal relationships between observing clouds, rain and wet floor. (a) Clouds → Rain → Wet Floor. (b) Wet Floor → Rain → Clouds. By intervening on the experiment we can attest that (a) is the correct explanation, but we cannot attest that if we only passively look at the correlations between those events.]
There is an easy way to check that Figure 3(b) is not the correct causal hypothesis: if we pour water on the floor on a sunny day, it will not start raining.
Note that this method presupposes that we can intervene on our experiment, meaning that we can force one of the variables of the experiment (wet floor) to assume the value we want. However, it is not always possible to do that; sometimes it is unethical, or there are technical or fundamental limitations that prevent it.
If the experimenter cannot make interventions on the variables of interest, it is still possible for them to draw some conclusions from a passive observation of the correlations between the events of interest. As we will see, each causal structure imposes constraints on which probability distributions obtained from passive observations can be classically explained by it. These constraints can be tested against the observed probability distribution to see if the given causal structure is a valid causal hypothesis for the observed phenomena. In Figure 3 it happens that both causal structures impose the same constraints on probability distributions, so they are not distinguishable by passive observations.
In this work we will be mainly concerned with passive observations. The sets defined in Section 3 that are of relevance to us make reference to distributions obtained from passive observations only.

Directed Acyclic Graphs
In the framework we use here, the mathematical object that describes a causal structure is a directed acyclic graph (DAG). A directed graph G is a pair (A,E), where A is a set of nodes and E ⊆ A×A is a set of directed edges. A directed acyclic graph (DAG) is a directed graph that has no directed cycles. Below, we introduce some definitions regarding DAGs that will be useful later.
Definition 1 (Children, Parents, Descendants, Ancestors). Let X be a node of a DAG G. If Y is another node of G such that there is a directed edge X → Y, then Y is called a child of X, and X is called a parent of Y. The set of all children of X is denoted as CH_G(X), and the set of all parents of X is denoted as PA_G(X).
A directed path is a sequence of nodes X_1, X_2, X_3, ..., X_n such that X_i → X_{i+1} for i = 1,...,n−1. The descendants of X in G are all the nodes in G that can be reached from X by a directed path. Conversely, all the nodes that have X as a descendant are called ancestors of X. The set of all ancestors of X is denoted as ANC_G(X), while the set of all descendants of X is denoted as DES_G(X).⁵ For example, in the DAG of Figure 4 we have that PA_G(B) = {A,D}, and that E is a descendant of D, even though it is not its child.
⁵ In this article we follow the convention of Refs. [30][31][32] and others, namely, the convention in which X ∈ ANC_G(X) and X ∈ DES_G(X) but X ∉ PA_G(X) and X ∉ CH_G(X). That is, a node is considered its own ancestor and its own descendant, but not its own parent or its own child.
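The relations of Definition 1 can be sketched in a few lines of Python. The representation (a dictionary mapping each node to the set of its children), the function names and the example graph are all illustrative assumptions; in particular, the example graph is hypothetical and is not claimed to be Figure 4, although it is chosen so that PA_G(B) = {A,D} and E is a descendant but not a child of D.

```python
from typing import Dict, Set

Dag = Dict[str, Set[str]]  # node -> set of its children (assumed encoding)

# Hypothetical example graph: A -> B, D -> B, B -> E.
EXAMPLE_DAG: Dag = {"A": {"B"}, "D": {"B"}, "B": {"E"}, "E": set()}


def children(dag: Dag, x: str) -> Set[str]:
    """CH_G(x): nodes reached from x by a single directed edge."""
    return set(dag[x])


def parents(dag: Dag, x: str) -> Set[str]:
    """PA_G(x): nodes with a directed edge into x."""
    return {node for node, kids in dag.items() if x in kids}


def descendants(dag: Dag, x: str) -> Set[str]:
    """DES_G(x), with x counted as its own descendant."""
    seen, stack = {x}, [x]
    while stack:
        for kid in dag[stack.pop()]:
            if kid not in seen:
                seen.add(kid)
                stack.append(kid)
    return seen


def ancestors(dag: Dag, x: str) -> Set[str]:
    """ANC_G(x), with x counted as its own ancestor."""
    seen, stack = {x}, [x]
    while stack:
        for par in parents(dag, stack.pop()):
            if par not in seen:
                seen.add(par)
                stack.append(par)
    return seen
```

For the hypothetical graph above, `parents(EXAMPLE_DAG, "B")` gives `{"A", "D"}`, and `"E"` belongs to `descendants(EXAMPLE_DAG, "D")` without belonging to `children(EXAMPLE_DAG, "D")`.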

DAGs as Causal Structures
When we associate each node of a DAG with an event of interest, the DAG is a representation of a causal structure: an edge X → Y indicates the possibility of a direct causal influence of X on Y. Here, we will indicate the variable associated with a node by the same capital letter as the node itself. If all the events of interest are described by classical random variables, the idea that a probability distribution "can be causally explained" by a certain DAG is formalized through the Markov condition:
Definition 2 (Markov Condition). Let G be a DAG with nodes A. A probability distribution P over the variables A is said to be Markov with respect to G if it can be factorized as:
$$P(X_1,\dots,X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{PA}_G(X_i)\big),$$
where A = {X_1,...,X_n} and PA_G(X_i) is the set of parents of the node X_i in G.
If P is Markov with respect to G, we also say that it is "classically compatible" with G.
As an example of this definition, a joint probability distribution P_ABCDE over the random variables A, B, C, D and E is Markov with respect to the DAG of Figure 4 if it can be decomposed into a product of one conditional distribution per variable, each conditioned on the parents of that variable in Figure 4; for instance, the factor associated with B is P(B|AD), since PA_G(B) = {A,D}.
Therefore, through the Markov condition, each DAG imposes constraints on the probability distributions that are classically compatible with it. A DAG which is classically compatible with every probability distribution, that is, a DAG that does not impose any constraint on the classically compatible distributions, is said to be a saturated DAG.

d-separation: A graphical criterion for conditional independence
If P is a probability distribution over a certain set of random variables A, we say that the variables of the subset X ⊆ A are conditionally independent of the variables of the subset Y ⊆ A given the subset Z ⊆ A if P can be factorized as P(X,Y|Z) = P(X|Z)P(Y|Z). This is denoted by X ⊥⊥_CI Y | Z.
Some of the constraints that a DAG imposes on the probability distributions that are classically compatible with it are of the form of conditional independence: if a probability distribution P cannot be factorized according to a certain conditional independence that is imposed by the DAG, then it is not classically compatible with the DAG. As it turns out, there exists a graphical algorithm to obtain all the conditional independence constraints that are imposed by a DAG. This algorithm is called "d-separation", and we will now describe it.
If G is a DAG with nodes A and X, Y, Z ⊆ A are sets of nodes in G, the d-separation algorithm says whether X and Y are "d-separated" by Z. This happens if all the undirected paths (paths that ignore the direction of the arrows) from X to Y are blocked by Z. A path is blocked if one or more of the following is true:
1. There is a chain of nodes along the path: i → m → j such that m ∈ Z.
2. There is a fork along the path: i ← m → j such that m ∈ Z.
3. There is a collider along the path: i → m ← j such that m ∉ Z and d ∉ Z for all the descendants d of m.
If X is d-separated from Y given Z in the DAG under consideration, we denote it as X ⊥_d Y | Z.
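As a minimal sketch of how such a test can be automated, the function below decides d-separation without enumerating paths. It uses the standard equivalent "moralized ancestral graph" test, which is a swapped-in technique rather than the path-blocking rules listed above: restrict to the ancestors of X ∪ Y ∪ Z, connect every pair of parents that share a child, drop edge directions, delete Z, and check whether X can still reach Y. The graph encoding (node mapped to its set of children, with every node present as a key) and the name `d_separated` are assumptions made for illustration.

```python
from itertools import combinations


def d_separated(dag, xs, ys, zs):
    """Return True if xs and ys are d-separated by zs in dag.

    dag maps every node to the set of its children; xs, ys, zs are assumed
    to be pairwise disjoint collections of nodes.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = {node: set() for node in dag}
    for node, kids in dag.items():
        for kid in kids:
            parents[kid].add(node)

    # Step 1: ancestral closure of xs, ys and zs.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents[node])

    # Step 2: moralize (link co-parents) and forget edge directions.
    neighbours = {node: set() for node in relevant}
    for node in relevant:
        for par in parents[node]:
            neighbours[node].add(par)
            neighbours[par].add(node)
        for p, q in combinations(parents[node], 2):
            neighbours[p].add(q)
            neighbours[q].add(p)

    # Step 3: remove zs and test whether xs can reach ys.
    seen, stack = set(xs), list(xs)
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        for nxt in neighbours[node]:
            if nxt not in zs and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


# A three-node chain A -> B -> C and the Bell DAG ("L" stands for Λ).
CHAIN = {"A": {"B"}, "B": {"C"}, "C": set()}
BELL = {"L": {"A", "B"}, "X": {"A"}, "Y": {"B"}, "A": set(), "B": set()}
```

With these graphs, `d_separated(CHAIN, {"A"}, {"C"}, {"B"})` holds, while in the Bell DAG the outcomes A and B are d-separated by {X, Y, L} but not by {X, Y} alone.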
For example, for the DAG of Figure 1 the d-separation criterion says that A ⊥_d C | B. This means that the event A should become independent of the event C upon knowledge of B; if your distribution P does not satisfy the constraint P(AC|B) = P(A|B)P(C|B), Figure 1 is not a valid causal explanation for it. Interpreting the variables {A,B,C} as respectively the clouds, rain and wet floor (such as in Figure 3(a)), this d-separation relation says that the occurrence of clouds becomes independent of the floor being wet if we already know whether it is raining.
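The constraint P(AC|B) = P(A|B)P(C|B) can be checked numerically. The sketch below assumes binary variables and a chain A → B → C; the conditional probability tables are made-up illustrative numbers, and the function names are hypothetical. It builds a joint distribution from the Markov factorization P(A)P(B|A)P(C|B), confirms that the constraint holds, and then shows a distribution that fails it.

```python
from itertools import product

# Made-up conditional probability tables for binary variables.
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # [a][b]
P_C_GIVEN_B = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.1, 1: 0.9}}  # [b][c]


def chain_joint():
    """Joint P(a,b,c) = P(a) P(b|a) P(c|b) for the chain A -> B -> C."""
    return {
        (a, b, c): P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]
        for a, b, c in product((0, 1), repeat=3)
    }


def a_indep_c_given_b(joint, tol=1e-12):
    """Check P(a,c|b) = P(a|b) P(c|b), written as P(a,b,c) P(b) = P(a,b) P(b,c)."""
    for a, b, c in product((0, 1), repeat=3):
        p_b = sum(joint[(i, b, k)] for i in (0, 1) for k in (0, 1))
        p_ab = sum(joint[(a, b, k)] for k in (0, 1))
        p_bc = sum(joint[(i, b, c)] for i in (0, 1))
        if abs(joint[(a, b, c)] * p_b - p_ab * p_bc) > tol:
            return False
    return True


# A distribution in which A and C are perfectly correlated regardless of B.
CORRELATED = {
    (a, b, c): (0.25 if a == c else 0.0)
    for a, b, c in product((0, 1), repeat=3)
}
```

Here `a_indep_c_given_b(chain_joint())` holds, whereas `a_indep_c_given_b(CORRELATED)` fails, so the second distribution cannot be causally explained by the chain.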
The following theorem, proven in Ref. [2], makes explicit the connection between d-separation relations and conditional independence relations:
Theorem 3 (d-separation and conditional independence). Let G be a DAG with nodes A, and let X ⊆ A, Y ⊆ A and Z ⊆ A be three disjoint subsets of A. Then:
$$X \perp_d Y \mid Z \ \text{in } G \iff X \perp\!\!\!\perp_{CI} Y \mid Z \ \text{for every } P \text{ that is Markov with respect to } G.$$
Importantly, in general this is not true for DAGs that include latent nodes, as we will see now.

Latent-Permitting DAGs
As discussed in the introduction, sometimes we want to allow for causal explanations that include latent variables, i.e. variables that do not appear in the final probability distribution we are trying to explain. When our DAG of interest can have latent nodes, we call it a latent-permitting DAG, as opposed to the latent-free DAGs that only have observed nodes.
Let G be a DAG with node set A = V ∪ L, with V and L disjoint, where V are the observed nodes and L are the latent nodes. Mimicking the terminology used for the latent-free case, we will say that a probability distribution P(V) over the observed variables is classically compatible with G if there exists some probability distribution P(V,L) over V ∪ L such that:
• P(V,L) is Markov with respect to G.
• The marginal of P(V,L) over V is the original probability distribution that we are interested in, i.e. ∑_L P(V,L) = ∑_L P(V|L)P(L) = P(V).
As an example, a probability distribution P_ABXY over the random variables A, B, X and Y is classically compatible with the Bell DAG (Figure 2) if it can be decomposed as:
$$P(ABXY) = \sum_{\Lambda} P(A \mid X\Lambda)\, P(B \mid Y\Lambda)\, P(X)\, P(Y)\, P(\Lambda).$$
When dealing with latent-permitting DAGs, we will call the conditional independence relations that only involve observed variables the observed conditional independence relations, and similarly the d-separation relations that only involve observed nodes will be called observed d-separation relations. From the definition of classical compatibility with a latent-permitting DAG, we can see that the conclusions of Theorem 3 are also valid for latent-permitting DAGs, with d-separation and conditional independence restricted to observed nodes and variables (Theorem 5).
As seen in Theorem 4, the only constraints that latent-free DAGs impose on the compatible probability distributions are the conditional independence relations that can be obtained from d-separation. Consequently, if a DAG G has nodes A = V ∪ L, all the constraints that it imposes on the compatible joint probability distributions P(A) = P(V,L) are the conditional independence relations. However, if we are interested only in distributions over the observed variables V, sometimes the conditional independence constraints that involve the latent variables L might induce more complicated extra constraints on the distributions over the observed variables V. If a probability distribution P(V) over the observed variables satisfies both the observed conditional independence relations (obtained from d-separation) and the extra constraints derived from the conditional independence relations that involve the latent variables L, then it is classically compatible with G.
Therefore, in principle one could find all the d-separation relations of a DAG (involving both observed and latent nodes), thus getting conditional independence constraints on P(V,L), and from there infer the constraints on the probability distribution over the observed variables, P(V). This process will be exemplified with the Bell DAG in the beginning of the next section. Inferring constraints on P(V) from the conditional independence relations of P(V,L), however, is in general very complicated.

The problem: Looking for Non-Algebraic Scenarios
The goal of this work is to classify DAGs as Algebraic or not, following the concept introduced in HLP. Before establishing this concept in full generality, we will explore the example of the Bell DAG (Figure 2).
As discussed in Section 2.5, the d-separation criterion gives us all the classical compatibility constraints imposed on the probability distribution over all the nodes, P(V,L). In the case of the Bell DAG, we have P(V,L) = P(ABXYΛ), while P(V) = P(ABXY).
The conditional independence relations of P(ABXYΛ) that come from the d-separation relations of the Bell DAG are:
$$XY \perp\!\!\!\perp_{CI} \Lambda, \tag{4}$$
$$A \perp\!\!\!\perp_{CI} BY \mid X\Lambda, \qquad B \perp\!\!\!\perp_{CI} AX \mid Y\Lambda, \tag{5}$$
$$A \perp\!\!\!\perp_{CI} Y \mid X, \qquad B \perp\!\!\!\perp_{CI} X \mid Y, \qquad X \perp\!\!\!\perp_{CI} Y. \tag{6}$$
These equations give rise to the constraints imposed by the Bell DAG on P(ABXY) through the marginalization of Λ. Equations (6), which do not involve Λ, carry over directly as observed conditional independence constraints imposed by the Bell DAG on P(ABXY). Equations (4) and (5), which do involve the latent variable Λ, give rise to another type of constraint on P(ABXY): Bell's inequality. In fact, Equation (4) encodes the no-superdeterminism assumption and Equation (5) encodes the local causality assumption that are used to derive Bell's inequality.
Therefore, to study the Bell DAG, it is not enough to just look at the conditional independence constraints on the compatible distributions P(ABXY) (Equations (6)). Doing so would miss important information that is encoded in Bell's inequality. This is so because the set of probability distributions P(ABXY) that satisfy only the conditional independence relations of Equations (6) is strictly larger than the set of probability distributions that satisfy these conditional independence relations and Bell's inequality. In other words, Bell's inequality is not implied by Equations (6). This is the core of the concept of Non-Algebraicness: a Non-Algebraic DAG imposes nontrivial inequality constraints on the classically compatible distributions. A "nontrivial" inequality constraint is an inequality that is not implied by the observed conditional independence relations of the DAG, along with nonnegativity of all probabilities and normalization.⁶ Note that an Algebraic DAG can impose trivial inequality constraints on the compatible distributions: for example, if the node Λ in the Bell DAG were treated as an observed node, then the Bell inequalities would still be satisfied by the compatible distributions P(ABXYΛ). However, in this case the Bell inequalities would be trivial, because they would be implied by observed conditional independence relations (Eqs. (4) and (5)).
To formalize the idea of Non-Algebraicness, we will introduce a few definitions:
Definition 6. Let G be a DAG. The sets C_G and I_G of probability distributions over observed variables are defined as follows:
1. C_G: Set of probability distributions that are classically compatible with G.
2. I_G: Set of probability distributions that satisfy all the conditional independence constraints that follow from the observed d-separation relations of G.
⁶ There is another type of equality constraint that a DAG can impose on the compatible distributions, apart from conditional independence relations: the "nested Markov constraints". In Ref. [23] it was proven that every DAG that presents nested Markov equality constraints also presents nontrivial inequalities. Consequently, Non-Algebraicness may be equivalently defined relative to all implied equality constraints or relative to all implied conditional independence relations.
For the case of the Bell DAG, I_Bell represents the set of distributions that obey Equations (6). By contrast, C_Bell is a strict subset of I_Bell, where we additionally restrict the conditional probabilities P(AB|XY) to satisfy Bell's inequalities and thus lie in the local polytope.
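The strict inclusion of C_Bell in I_Bell can be illustrated with a small numerical sketch. It specializes to binary settings and outcomes and uses the CHSH inequality as one particular Bell inequality; both choices, as well as the function names, are assumptions made here for illustration. The Popescu-Rohrlich (PR) box satisfies the no-signalling conditions, which are what the observed conditional independences of the Bell DAG demand of P(AB|XY), yet it reaches a CHSH value of 4, while every deterministic local strategy (a vertex of the local polytope) stays at or below 2.

```python
from itertools import product


def pr_box(a, b, x, y):
    """PR box: P(a,b|x,y) = 1/2 when a XOR b equals x AND y, else 0."""
    return 0.5 if (a ^ b) == (x & y) else 0.0


def chsh(p):
    """CHSH value E(0,0) + E(0,1) + E(1,0) - E(1,1) of a conditional p(a,b,x,y)."""
    def corr(x, y):
        return sum((-1) ** (a + b) * p(a, b, x, y)
                   for a, b in product((0, 1), repeat=2))
    return corr(0, 0) + corr(0, 1) + corr(1, 0) - corr(1, 1)


def no_signalling(p, tol=1e-12):
    """Check that each party's marginal ignores the other party's setting."""
    for out, own in product((0, 1), repeat=2):
        alice = [sum(p(out, b, own, y) for b in (0, 1)) for y in (0, 1)]
        bob = [sum(p(a, out, x, own) for a in (0, 1)) for x in (0, 1)]
        if abs(alice[0] - alice[1]) > tol or abs(bob[0] - bob[1]) > tol:
            return False
    return True


def max_local_deterministic_chsh():
    """Maximum CHSH value over the 16 deterministic strategies a=f(x), b=g(y)."""
    best = float("-inf")
    for f0, f1, g0, g1 in product((0, 1), repeat=4):
        f, g = (f0, f1), (g0, g1)

        def p(a, b, x, y, f=f, g=g):
            return 1.0 if (a == f[x] and b == g[y]) else 0.0

        best = max(best, chsh(p))
    return best
```

Because the CHSH expression is linear in P(AB|XY), its maximum over the local polytope is attained at a deterministic strategy, which is why enumerating the 16 strategies suffices in this binary case.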
Theorem 5 shows that C_G ⊆ I_G for every DAG G. This is so because all the probability distributions that are classically compatible with G have to satisfy the conditional independence constraints that come from its observed d-separation relations. When the observed conditional independence relations are the only constraints imposed by a DAG on the compatible probability distributions over observed variables, the DAG is said to be Algebraic:
Definition 7. A DAG G is said to be Algebraic if C_G = I_G. Otherwise, i.e. if C_G ⊊ I_G, it is said to be Non-Algebraic.
We borrow the terminology of Non-Algebraic and Algebraic from algebraic geometry: an algebraic set is defined by polynomial equalities (or more generally, by some finite union of sets each of which is defined by polynomial equalities). Semialgebraic sets, by contrast, are characterised by both polynomial equalities and polynomial inequalities. To emphasise that a DAG's set of (classically) compatible distributions is defined by more than just the conditional independence (notably, equality) constraints, we therefore elect to speak of such a DAG as Non-Algebraic.
As proven by HLP, the observed conditional independence constraints imposed by a DAG on the compatible probability distributions do not change when the latent variables of the DAG are associated with quantum systems or other GPT systems. Therefore, if one is interested in studying causal structures that provide any quantum or GPT observational advantage, then there is only hope among the Non-Algebraic scenarios. If a DAG is Algebraic, then all the probability distributions that exhibit the conditional independence relations associated with its observed d-separation relations can be explained by this DAG classically.
Theorem 4 implies that every latent-free DAG is Algebraic. In HLP, a stronger sufficient condition for Algebraicness is provided, together with an algorithmic strategy to check it. This condition, called the HLP criterion, will be discussed in Section 3.2. It is still not known whether the HLP criterion is also necessary for Algebraicness; its possible outcomes for a given DAG are either that it is Algebraic or that it is "unresolved". The unresolved DAGs thus need to be assessed by some other method.
Based on the HLP criterion and certain types of entropic inequalities, HLP and Pienaar [25] classified the Algebraicness or Non-Algebraicness of all DAGs of up to 6 total nodes (observed and latent), thus leaving no DAGs of 6 total nodes with inconclusive status. In this work, however, we elect to count DAGs not by their total node count but rather by their number of observed nodes. It turns out that HLP's complete classification of DAGs with total node count up to 6 meant they resolved all DAGs with 3 observed nodes, a few DAGs with 4 observed nodes, and no DAGs with 5 or more observed nodes. Here we attempt to tackle the Algebraicness classification of all causal structures with up to 4 observed nodes.⁷ To do so, we will utilize the mDAG formalism introduced by Evans in Ref. [22].

Simplifying the problem by using mDAG formalism
Two DAGs G and H are said to be classically observationally equivalent when C_G = C_H. Note that C_G = C_H implies that I_G = I_H: by Theorem 5, if a certain d-separation relation is not present in a DAG, it is always possible to find a probability distribution that violates the conditional independence corresponding to this d-separation relation and is classically compatible with the DAG. In other words, if two DAGs are classically compatible with the same sets of distributions, they have to present the same d-separation relations.⁸ In short,
$$C_G = C_H \implies I_G = I_H,$$
with contrapositive
$$I_G \neq I_H \implies C_G \neq C_H.$$
In particular, this means that if a DAG G is Algebraic (Non-Algebraic), then all of the DAGs H that are classically observationally equivalent to it are also Algebraic (Non-Algebraic).
In this section, we will present two results of [22] that prove classical observational equivalence, thus reducing the number of DAGs that have to be examined for Algebraicness.After presenting the two results, we will show that they allow for a definition of a new object, called mDAG, that encompasses this simplification.
To do so, we start with the definition of exogenization. It might be easier to understand this definition by following Figure 5, where DAG 5(b) is obtained from DAG 5(a) by exogenizing node B.
Definition 8 (Exogenized DAG). Let G be a DAG and let λ be a latent variable of G. We define the exogenized DAG E(G,λ) as follows: take the vertices and edges of G and (1) add an edge m → n from every m ∈ PA_G(λ) to every n ∈ CH_G(λ) and (2) delete the edge m → λ for every m ∈ PA_G(λ). All other edges remain the same.
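The two steps of Definition 8 can be sketched in a few lines of Python. This is only an illustrative sketch: the representation of a DAG as a dictionary from each node to its set of parents, the function name `exogenize`, and the example graph are assumptions made here, not notation from the references.

```python
def exogenize(parents, lam):
    """Return E(G, lam) for a DAG stored as {node: set of parent nodes}.

    Hypothetical helper illustrating Definition 8; the input is left unmodified.
    """
    new = {v: set(ps) for v, ps in parents.items()}
    lam_parents = parents[lam]
    lam_children = [v for v, ps in parents.items() if lam in ps]
    for child in lam_children:
        # Step (1): add an edge m -> n from every parent m of lam
        # to every child n of lam.
        new[child] |= lam_parents
    # Step (2): delete every edge m -> lam, so that lam becomes parentless.
    new[lam] = set()
    return new


# Illustrative graph (an assumption, not Figure 5): A -> lam, lam -> B, lam -> C.
g = {"A": set(), "lam": {"A"}, "B": {"lam"}, "C": {"lam"}}
e = exogenize(g, "lam")
```

In the exogenized graph `e`, the latent node `lam` has no parents, and `A` has become a direct parent of both `B` and `C`.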
With this definition at hand, we state Lemma 3.7 of [22]:
Lemma 9 (Exogenization). Let G be a DAG with observed nodes V and latent nodes L, and let λ ∈ L. Then, C_G = C_E(G,λ).
Now, we state Lemma 3.8 of [22]. This Lemma is also illustrated in Figure 5, where it is applied to go from 5(b) to 5(c).
⁷ Note that the set of all DAGs with 4 observed nodes includes the set of all DAGs with 7 total nodes which persist under HLP's reduction techniques. Thus, this work can also be considered an extension of HLP's classification from up-to-6 to up-to-7 total nodes. As noted in Section 5, we ultimately resolve the Algebraicness or Non-Algebraicness of all but three DAGs of 4 observed nodes. Only one of those 3 has 7 total nodes. Thus, we ultimately resolve all but one DAG with 7 total nodes.
⁸ On the other hand, if DAGs G and H are such that I_G = I_H, this does not imply that C_G = C_H.

Lemma 10 (No redundant latents). Let G be a DAG with observed nodes V and latent nodes
In this case, we say that the node λ is "redundant". Let G′ be the DAG obtained after deleting the node λ. Then, C_G = C_G′.
Like Lemma 9, Lemma 10 also shows conditions under which proving the Non-Algebraicness of one DAG automatically gives you the Non-Algebraicness of another.
For example, Lemmas 9 and 10 show that all three DAGs of Figure 5 have the same sets C and I. Thus, we need only examine one of these DAGs. It makes sense to pick the DAG of Figure 5(c), as it is the simplest.
Following this same idea, we can work with the concept of an mDAG, first defined in Ref. [22]. The definition below is different from, but equivalent to, the one presented in Ref. [22].
Definition 11 (mDAG). An mDAG is a DAG where none of the latent nodes is redundant (as defined in Lemma 10) nor has any parents.
For example, the DAG of Figure 5(c) is an mDAG. For a fixed number of observed nodes, there is a finite number of mDAGs. In particular, for 3 observed nodes there are 46 mDAGs, while for 4 observed nodes there are 2809 mDAGs. Lemmas 9 and 10 show that the mDAG encodes all the necessary information of the DAG if you only want to talk about the sets C and I. The Lemmas presented here also give an argument in favour of counting the causal structures in terms of the number of observed nodes instead of in terms of the total number of nodes; Lemma 10 shows that DAGs 5(a) and 5(b), which have 6 total nodes, actually do not have to be analyzed; their Non-Algebraicness can be examined by looking at 5(c), which has 5 total nodes.
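The count for 3 observed nodes can be reproduced by brute force. The sketch below rests on two assumptions that are not spelled out in the text above: (i) an mDAG is encoded as a DAG over the observed nodes together with a family of latent "facets", each facet being the set of children (of size at least 2) of one parentless latent node, with no facet contained in another (otherwise the smaller latent would be redundant); and (ii) mDAGs are counted up to relabeling of the observed nodes. All function names are hypothetical.

```python
from itertools import combinations, permutations


def powerset(items):
    """Yield every subset of `items` as a tuple."""
    items = list(items)
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def is_acyclic(n, edges):
    """Check acyclicity by repeatedly peeling off nodes with no incoming edge."""
    remaining = set(range(n))
    while remaining:
        sources = {v for v in remaining
                   if not any(b == v and a in remaining for a, b in edges)}
        if not sources:
            return False
        remaining -= sources
    return True


def latent_facet_families(n):
    """Yield families of child-sets (size >= 2) in which no set contains another."""
    candidates = [frozenset(c) for r in range(2, n + 1)
                  for c in combinations(range(n), r)]
    for family in powerset(candidates):
        if all(not (a < b) for a in family for b in family):  # a < b: proper subset
            yield family


def count_mdags(n):
    """Count mDAGs on n observed nodes up to relabeling (assumed encoding)."""
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    dags = [e for e in powerset(pairs) if is_acyclic(n, e)]
    families = list(latent_facet_families(n))
    perms = list(permutations(range(n)))
    canonical_forms = set()
    for edges in dags:
        for family in families:
            # Canonical form: the smallest relabeled description over all permutations.
            canonical_forms.add(min(
                (tuple(sorted((p[a], p[b]) for a, b in edges)),
                 tuple(sorted(tuple(sorted(p[v] for v in s)) for s in family)))
                for p in perms))
    return len(canonical_forms)


n3 = count_mdags(3)  # evaluates to 46 under the assumptions above
```

Under these assumptions the enumeration visits 25 labeled DAGs and 9 facet families for 3 observed nodes and finds 46 classes, in agreement with the number quoted above. The same function can be called with `n = 4`, at a noticeably higher running time.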

The HLP criterion
In this section, we describe the sufficient criterion for Algebraicness that was developed in HLP. The criterion gives some transformations that take a DAG G to another DAG H such that C_H ⊆ C_G while I_H = I_G. If the final DAG H is known to be Algebraic (for example, by being latent-free), then the original DAG G is also Algebraic.
These transformations, adjusted to the language of mDAGs, are presented in Theorem 12.
Theorem 12. Let G and H be two mDAGs. Suppose that H can be obtained by starting from G and applying one or more of the following transformations: 1. Removal of an edge.

Addition of an edge
As noted before, C ⊆ I is valid for any DAG, due to Theorem 5. Therefore, C G = I G , meaning that G is Algebraic.
Figure 6(a) exemplifies an mDAG that can be shown Algebraic by the HLP Criterion: by a sequence of transformations defined in Theorem 12 we can obtain the mDAG shown in Figure 6(c), that obeys the conditions presented in Corollary 13.
The natural question to ask here is whether the HLP criterion is also necessary for an mDAG to be Algebraic. The conjecture that the HLP criterion is also necessary will be called the HLP conjecture:
Conjecture 14 (HLP Conjecture). Let G be an Algebraic mDAG. Then, by a sequence of transformations defined in Theorem 12, it is possible to start from G and reach another mDAG H such that: 1. H does not have latent nodes. 2. The set of observed d-separation relations of H and G is the same, i.e. I_H = I_G.
As we will see, our current results do not prove this conjecture, but give hints towards its validity.
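In Ref. [23], Evans has shown that: Theorem 15. If a DAG G is Algebraic, then G is classically observationally equivalent to some latent-free DAG. Therefore, proving the HLP conjecture would also be of relevance to the problem of classifying causal structures into classical observational equivalence classes.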

Methods to determine Non-Algebraicness
In this section we discuss the methods we used to prove the Non-Algebraicness of a large number of DAGs that do not respect the HLP criterion.

Using nonmaximality to prove Non-Algebraicness
The first method to show Non-Algebraicness that we will present relies on the concept of maximality [23,34]. To define this, we will first define adjacent and d-separable pairs of nodes.
Definition 17 (Adjacency). Let G be an mDAG, and let A and B be a pair of observed nodes of G. We say that A and B are adjacent in G if A is a parent of B, or B is a parent of A, or A and B share a common latent parent.⁹
Definition 18 (d-(un)separable pair of observed nodes). Let G be a DAG with nodes A = V ∪ L, where V are observed nodes and L are latent nodes. A pair of observed nodes {A,B} for A ∈ V, B ∈ V is said to be d-separable if there is some subset Z ⊆ V of the remaining observed nodes such that (A ⊥_d B|Z); otherwise, A and B are said to be d-unseparable.
These two definitions are related, in that they are criteria which relate to the compatibility of a particular distribution, namely, perfect correlation between one pair of nodes while all other nodes are point distributed. Consider the following distribution: Eq. (9) describes a probability distribution in which A and B are random but perfectly correlated, and every other observed node is point-distributed at the value 0. Then,
Proposition 19 (Adjacency ⇔ P_(9) ∈ C_G). The distribution in Eq. (9) is classically compatible with the graph G if and only if A and B are adjacent nodes in G.
Proof. See Refs. [22,35]. Clearly P_(9) ∈ C_G when A and B are adjacent. Whenever A and B are not adjacent, then evidently A and B are e-separated¹⁰ upon removal of V\{A,B}, which also means that P_(9) would violate the entropic inequality in Theorem 5 of Ref. [36].
Putting Propositions 19 and 20 together, we find that P_(9) ∈ I_G but P_(9) ∉ C_G whenever a graph has A and B nonadjacent but also d-unseparable. This leads us to the concept of maximality [23,34]:
Definition 21 (Maximal DAG). Let G be a DAG. If all of the pairs of nodes of G which are not d-separable are also adjacent then G is said to be maximal, otherwise G is said to be nonmaximal.
Theorem 22 [Nonmaximal]. Every Algebraic DAG is maximal, that is, every nonmaximal DAG is Non-Algebraic.
Proof. As we have seen from Propositions 19 and 20, if a graph G is nonmaximal, then there is some pair of observed nodes such that the distribution in which those two nodes are random and perfectly correlated while all other observed nodes are point distributed lies in the gap between I_G and C_G. Alternatively, note that the fact that nonmaximality implies Non-Algebraicness also follows from the fact that all latent-free graphs are maximal [34, Prop. 3.19]. We then simply note that a nonmaximal graph cannot be observationally equivalent to any maximal graph: It follows from Eq. (8) that every pair of observationally equivalent DAGs must agree on their sets of d-(un)separable observed-node pairs. Furthermore, pursuant to Lemma 53 (discussed in Appendix B), agreement with respect to adjacency structure is also a prerequisite for observational equivalence [22,35]. These facts imply that if a DAG G is nonmaximal, then it is not going to be observationally equivalent to any latent-free DAG (and will thus be Non-Algebraic by Theorem 15).
Figure 8 shows an example of a 4-observed-nodes mDAG whose Non-Algebraicness may be shown by Theorem 22 [Nonmaximal].
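As an illustration of Definitions 17 and 18 and of the maximality test, the following sketch checks maximality by brute force. It is not the authors' implementation: the graph encoding (a dictionary from each node to its set of parents, latent nodes included), all function names, and the two example graphs are assumptions made here. d-separation is tested with the standard moralized-ancestral-graph criterion, and d-separability is decided by naively trying every conditioning set rather than by any more efficient shortcut.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def d_separated(parents, x, y, z):
    """Test whether x and y are d-separated by z (moralized ancestral graph)."""
    keep = ancestors(parents, {x, y} | set(z))
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = list(parents[v])
        for p in ps:                       # undirected copy of each edge
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(ps, 2):   # "marry" parents of a common child
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {x}, [x]
    while stack:                           # search from x while avoiding z
        v = stack.pop()
        for w in nbrs[v] - seen - set(z):
            seen.add(w)
            stack.append(w)
    return y not in seen


def d_separable(parents, observed, a, b):
    """Definition 18: some set Z of the other observed nodes d-separates a and b."""
    others = [v for v in observed if v not in (a, b)]
    return any(d_separated(parents, a, b, z)
               for r in range(len(others) + 1)
               for z in combinations(others, r))


def adjacent(parents, observed, a, b):
    """Definition 17: a direct edge between a and b, or a shared latent parent."""
    latent_pa = lambda v: {p for p in parents[v] if p not in observed}
    return a in parents[b] or b in parents[a] or bool(latent_pa(a) & latent_pa(b))


def is_maximal(parents, observed):
    """Every d-unseparable pair of observed nodes must be adjacent."""
    return all(adjacent(parents, observed, a, b) or d_separable(parents, observed, a, b)
               for a, b in combinations(observed, 2))


obs = ["A", "B", "C"]
# Assumed encoding of the Instrumental scenario: A -> B -> C with a latent L -> B, L -> C.
instrumental = {"A": set(), "B": {"A", "L"}, "C": {"B", "L"}, "L": set()}
# A latent-free chain A -> B -> C for comparison.
chain = {"A": set(), "B": {"A"}, "C": {"B"}}
```

In this encoding, `is_maximal(instrumental, obs)` returns `False`, because A and C are neither adjacent nor d-separable, whereas `is_maximal(chain, obs)` returns `True`.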
To find out which pairs of observed nodes are d-separable in G and then check whether it is maximal, is it necessary to first obtain all the d-separation relations of G? The following accessory lemma shows that this is not the case, thus simplifying the application of Theorem 22 [Nonmaximal] in practice.
Proof. This follows from Theorem 6 of Ref. [37]. In particular, it follows from the use of Ref. [37]'s Algorithm 3 to solve Ref. [37]'s Problem 4.
In their Appendix E, HLP effectively utilized Shannon-type inequalities to certify the Non-Algebraicness of many DAGs with 5 or 6 total nodes. In particular, they found that the Shannon-type inequalities for those DAGs could be violated by distributions where two particular variables are perfectly correlated and all other observable variables are point distributed, identical in nature to the construction of Eq. (9). For each of these examples, HLP highlighted the two nodes corresponding to the perfectly correlated variables in yellow. We observe here that the node pairs highlighted by HLP are precisely pairs of nodes which are not d-separable but also not adjacent. Accordingly, our Theorem 22 [Nonmaximal] similarly certifies the Non-Algebraicness of all those examples.

Using setwise nonmaximality to prove Non-Algebraicness
In the previous subsection we found that membership in I_G or C_G of the distribution in which a pair of observed nodes is perfectly correlated while all other observed nodes are point distributed is directly related to the concepts of d-unseparability and adjacency, respectively. Here we formulate analogous criteria in order to assess perfect correlation of three-or-more variables when all other variables in the distribution are point-distributed.
Definition 24 (Setwise adjacency). Let G be a DAG with nodes A = V ∪ L, where V are observed nodes and L are latent nodes. Then, the subset of observed nodes {V_1...V_k} is setwise adjacent in G if and only if there is some node X (possibly but not necessarily within {V_1...V_k}) such that X is an ancestor of every node in {V_1...V_k} not only in the DAG G but also in the subgraph of G formed by deleting all nodes V\{V_1...V_k} from G.¹¹
Definition 25 (Setwise d-unrestriction). Let V be the set of all observed nodes in some DAG G, and let S be some subset of V. Then, the nodes S are setwise d-unrestricted in G if and only if there does not exist any pair of nodes {S_i,S_j} ⊂ S along with some (possibly empty) set of observed nodes Z ⊂ V\S such that (S_i ⊥_d S_j|Z).
¹¹ Note that we are using the convention that a node counts as its own ancestor, à la Refs. [30][31][32].
We next show that these definitions for setwise adjacency and setwise d-unrestriction have the desired properties. Consider the following distribution: Eq. (10) describes a probability distribution in which the first k observed variables are random but perfectly correlated while all other observed variables are point-distributed at the value 0. Then,
violates the conditional independence relation S_i ⊥⊥_CI S_j|Z, and therefore P_(10) ∉ I_G. For the other direction, note that if S_i and S_j are not d-separated by Z, then neither are any two disjoint sets of nodes that contain S_i and S_j, respectively. Consequently, when {V_1...V_k} are setwise d-unrestricted in G, there is no way to d-separate any subset of {V_1...V_k} from any other subset of {V_1...V_k} by any subset of the observed nodes outside of {V_1...V_k}, which are the only d-separation relations whose corresponding conditional independence relations would exclude the distribution of Eq. (11). This means that, in this case, P_(10) ∈ I_G.
Putting Propositions 26 and 29 together leads us to a natural generalization of maximality, which we now define and employ in a theorem.
Definition 30 (Setwise Maximal DAG). Let G be a DAG. If every subset of the observed nodes of G which is setwise d-unrestricted is also setwise adjacent then G is said to be setwise-maximal, otherwise G is said to be setwise nonmaximal.
On the other hand, the mDAG of Fig. 7(a) is both maximal and setwise maximal, as all setwise maximal DAGs are also maximal.
Note that there are precisely five mDAGs with 3 observed nodes which are Non-Algebraic (the Triangle scenario, the Unrelated Confounders scenario and three observationally equivalent versions of the Instrumental scenario). Theorem 31 [Setwise Nonmaximal] certifies the Non-Algebraicness of all five. Indeed, Theorem 31 [Setwise Nonmaximal] turns out to be an extremely powerful filter for recognizing Non-Algebraic mDAGs with 4 or 5 observed nodes as well, as discussed in Section 5.
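Definitions 24, 25 and 30 can likewise be checked by brute force. The sketch below is illustrative only: the graph encoding (a dictionary from each node to its set of parents, latents included), the function names, and the encoding of the Triangle scenario as three observed nodes with one parentless latent per pair are all assumptions. Conditioning sets in Definition 25 are taken to range over all subsets of V\S, including the empty set and V\S itself.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors (a node is its own ancestor)."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def d_separated(parents, x, y, z):
    """Test whether x and y are d-separated by z (moralized ancestral graph)."""
    keep = ancestors(parents, {x, y} | set(z))
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = list(parents[v])
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for w in nbrs[v] - seen - set(z):
            seen.add(w)
            stack.append(w)
    return y not in seen


def setwise_d_unrestricted(parents, observed, subset):
    """Definition 25: no pair inside `subset` is d-separated by observed nodes outside it."""
    outside = [v for v in observed if v not in subset]
    return not any(d_separated(parents, a, b, z)
                   for a, b in combinations(subset, 2)
                   for r in range(len(outside) + 1)
                   for z in combinations(outside, r))


def setwise_adjacent(parents, observed, subset):
    """Definition 24: a common ancestor survives deletion of the other observed nodes."""
    removed = set(observed) - set(subset)
    sub = {v: ps - removed for v, ps in parents.items() if v not in removed}
    return bool(set.intersection(*(ancestors(sub, {v}) for v in subset)))


def is_setwise_maximal(parents, observed):
    """Definition 30: every setwise d-unrestricted subset is setwise adjacent."""
    return all(setwise_adjacent(parents, observed, s)
               for k in range(2, len(observed) + 1)
               for s in combinations(observed, k)
               if setwise_d_unrestricted(parents, observed, s))


obs = ["A", "B", "C"]
# Assumed encoding of the Triangle scenario: one latent shared by each pair.
triangle = {"A": {"L1", "L3"}, "B": {"L1", "L2"}, "C": {"L2", "L3"},
            "L1": set(), "L2": set(), "L3": set()}
chain = {"A": set(), "B": {"A"}, "C": {"B"}}
```

With this encoding, every pair of observed nodes in `triangle` is adjacent, yet the full set {A, B, C} is setwise d-unrestricted without being setwise adjacent, so `is_setwise_maximal(triangle, obs)` returns `False`. The latent-free `chain` passes the test.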

Using d-separation to prove Non-Algebraicness
With Eq. (8), it was noted that any two DAGs that have different sets of observed d-separation relations are not classically observationally equivalent. In other words, imposing the same observed conditional independence constraints on the compatible distributions is a necessary condition for two DAGs to be classically observationally equivalent.
This fact can be used to establish the Non-Algebraicness of some DAGs: if we can prove that the set of observed d-separation relations of a latent-permitting DAG does not match the set of d-separation relations of any latent-free DAG, then our latent-permitting DAG is classically observationally inequivalent to all latent-free DAGs.Via Evans' [23] Theorem 15, then, we can conclude that our latent-permitting DAG is Non-Algebraic.
This type of reasoning, which says that when a certain property of a DAG G is unmatched by all latent-free DAGs then G is Non-Algebraic, will be used a few times in this subsection and in the next two. As such, it will be useful to define the auxiliary term Not Achievable in Latent-Free (NALF):
Definition 32 (NALF property of a DAG). Let G be a latent-permitting DAG. If G has a certain property that does not match that same property of any latent-free DAG, we say that this property of G is NALF (not achievable in latent-free).
For example, we can have a DAG G whose set of observed d-separation relations is NALF. If there is a proof that a DAG is observationally inequivalent to all latent-free DAGs whenever a certain property of the DAG is NALF, then this can be used to show Non-Algebraicness. As discussed, this is the case for d-separation.
Theorem 33 [NALF d-sep]. Let G be a DAG. Suppose that the set of observed d-separation relations of G is NALF (as per Definition 32). Then, G is Non-Algebraic.
Proof. From Eq. (8), G is not classically observationally equivalent to any latent-free DAG. Thus, via Theorem 15, G is Non-Algebraic.
The mDAG presented in Figure 7(a), which was not shown Non-Algebraic by Theorem 31 [Setwise Nonmaximal] due to being setwise maximal, can be shown Non-Algebraic by Theorem 33 [NALF d-sep]: its set of observed d-separation relations, A ⊥_d D|∅ and B ⊥_d C|A, is not matched by any latent-free DAG. This mDAG can be considered a special case of the bilocality scenario [38] restricted to have the same setting employed at both the extreme wings. Remarkably, it can support non-classical correlations even though the "setting" for the extreme wings is the same (in stark contrast with the Bell scenario).
On the other hand, note that the Evans scenario (Figure 7
There are a variety of different practical methods to ascertain whether or not Theorem 33 [NALF d-sep] is satisfied for a given latent-permitting DAG G. Naively, one could construct all the latent-free DAGs with the same number of observed nodes as G, and one-by-one check whether any of them have observed d-separation relations matching those of G. Alternatively, one could envision employing a constraint-based causal discovery algorithm where the input to the algorithm is precisely the observed d-separation relations of G: If the output of the causal discovery algorithm fails to include any latent-free DAG as a viable explanation given the input constraints, then evidently G is Non-Algebraic per Theorem 33 [NALF d-sep]. While such "brute force" approaches are viable for a small number of observed nodes, as the number of observed nodes increases one would need to contend with potential combinatorial explosion. As it turns out, however, when the DAG is maximal (and thus not shown Non-Algebraic by Theorem 22 [Nonmaximal]) there is an efficient way to check whether or not its set of observed d-separation relations is NALF; the efficient algorithm is presented in Appendix C.
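The naive strategy described above can be sketched as follows. This is the brute-force comparison against all latent-free DAGs, not the efficient algorithm of Appendix C. The graph encoding, the function names, and in particular the example graph `bilocal_like` are assumptions: the latter is a hypothetical graph (A → B, A → C, one latent shared by B and D, one latent shared by C and D) chosen here because it exhibits exactly the two relations A ⊥_d D|∅ and B ⊥_d C|A quoted for Figure 7(a); it is not claimed to be a verbatim copy of that figure. Only relations between single nodes are recorded, which suffices because two sets are d-separated exactly when every pair of their elements is.

```python
from itertools import combinations, permutations, product


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def d_separated(parents, x, y, z):
    """Test whether x and y are d-separated by z (moralized ancestral graph)."""
    keep = ancestors(parents, {x, y} | set(z))
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = list(parents[v])
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for w in nbrs[v] - seen - set(z):
            seen.add(w)
            stack.append(w)
    return y not in seen


def observed_dsep_relations(parents, observed):
    """All triples (a, b, Z) of observed nodes with a d-separated from b by Z."""
    rels = set()
    for a, b in combinations(sorted(observed), 2):
        others = [v for v in observed if v not in (a, b)]
        for r in range(len(others) + 1):
            for z in combinations(others, r):
                if d_separated(parents, a, b, z):
                    rels.add((a, b, frozenset(z)))
    return rels


def latent_free_dags(nodes):
    """Yield every DAG on `nodes` (with repeats): pick an order, then forward edges."""
    for order in permutations(nodes):
        forward = list(combinations(order, 2))   # (earlier, later) pairs
        for mask in product([False, True], repeat=len(forward)):
            parents = {v: set() for v in nodes}
            for (u, v), present in zip(forward, mask):
                if present:
                    parents[v].add(u)
            yield parents


def dsep_is_nalf(parents, observed):
    """True if no latent-free DAG has the same observed d-separation relations."""
    target = observed_dsep_relations(parents, observed)
    return all(observed_dsep_relations(h, observed) != target
               for h in latent_free_dags(observed))


# Hypothetical graph reproducing the two relations quoted for Figure 7(a).
bilocal_like = {"A": set(), "B": {"A", "L1"}, "C": {"A", "L2"}, "D": {"L1", "L2"},
                "L1": set(), "L2": set()}
# Assumed encoding of the Instrumental scenario, which has no observed d-separations.
instrumental = {"A": set(), "B": {"A", "L"}, "C": {"B", "L"}, "L": set()}
```

For `bilocal_like` with observed nodes A, B, C, D the function `dsep_is_nalf` returns `True`. For `instrumental` it returns `False`, since a latent-free complete DAG also has no d-separation relations, so that graph has to be caught by one of the other tests.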
As well as our previous methods,Theorem 33

Using e-separation to prove Non-Algebraicness
Just as a mismatch of d-separation relations is a witness of classical observational inequivalence, we can show that a mismatch of e-separation relations, a concept that will be defined below, also witnesses classical observational inequivalence. As such, we can mimic the same logic used in the last section, thus showing that e-separation relations can be used to attest Non-Algebraicness. An e-separation relation is defined as:
Definition 34 (e-separation). Let G be a DAG, and let X, Y, Z and W be four disjoint sets of nodes of G. Let G_del W be the DAG obtained by starting from G and deleting the nodes of W. The sets X and Y are said to be e-separated by Z upon deletion of W if X and Y are d-separated by Z in G_del W.
Note that if W = ∅, the concept of e-separation reduces to d-separation. As in the case of d-separation, we will say that an e-separation relation is an observed e-separation relation when the sets X, Y, Z and W only involve observed nodes. Matching observed e-separation relations is a prerequisite for observational equivalence:
Lemma 35 (e-separation condition for observational equivalence). Let G and H be two DAGs. If they are classically observationally equivalent (i.e. C_G = C_H), then their sets of observed e-separation relations must be identical.
Proof. In Ref. [36] it is shown that e-separation relations imply inequalities that must be satisfied by the compatible probability distributions. In Appendix E of that same reference, it is further shown that if a DAG does not exhibit an e-separation relation, then there must exist a compatible probability distribution which violates the inequality associated with that e-separation relation. This implies that, if a DAG G exhibits an e-separation relation which is not exhibited by another DAG H, then it is possible to find a probability distribution that is compatible with H but not with G.
Using this lemma, we can derive the analog of Theorem 33 for e-separation:
Theorem 36 [NALF e-sep]. Let G be a DAG. Suppose that the set of observed e-separation relations of G is NALF (as per Definition 32). Then, G is Non-Algebraic.
Moreover, every nonmaximal DAG has a NALF set of observed e-separation relations.
Proof. If G is nonmaximal, then there are at least two nodes A and B which are not d-separable, but are also not adjacent in G. If they are not adjacent in G, it is clear that A is e-separated from B by deletion of every other observed node of G.
Let us make a proof by contradiction: suppose that the set of e-separation relations of G is not NALF. Then, there is a latent-free DAG H which has exactly the same set of e-separation relations as G. This means that A should be e-separated from B by deletion of every other node of H, which implies that A and B are not adjacent in H. In a latent-free graph, two nodes are nonadjacent if and only if said pair of nodes are d-separable [34, Prop. 3.19]. However, if A and B are d-separable in H but not in G, then their sets of d-separation relations do not coincide, which is a contradiction.
¹³ We do not discuss how to find a latent-free DAG with the same d-separation relations as G, as ultimately we conclude that there is apparently no advantage in doing so, as discussed in Appendix C.
Appendices A and B relate Theorems 22 [Nonmaximal] and 36 [NALF e-sep] to results of prior literature. In particular, in Appendix A we show that a version of Theorem 22 [Nonmaximal] in terms of e-separation that was presented in Ref. [25] is incorrect.

Using incompatible supports to prove Non-Algebraicness
The final method we used to prove Non-Algebraicness is based on the classical feasibility of supports.
Given a set of random variables {X_1,...,X_n}, a specific set {X_1 = x_1,...,X_n = x_n} of values that these random variables can take is called an event. The support S(P_X1,...,Xn) of a probability distribution P_X1,...,Xn over the variables {X_1,...,X_n} is the set of events that have non-zero probability: S(P_X1,...,Xn) = { {X_1 = x_1,...,X_n = x_n} : P_X1,...,Xn(x_1,...,x_n) > 0 }.
We previously defined what it means for a probability distribution to be classically compatible with a DAG. Here, we define what it means for a support to be classically compatible with a DAG:
Definition 38 (Compatibility of a support with a DAG). Let G be a DAG with nodes A = V ∪ L, where V are observed nodes and L are latent nodes. Let S be a set of events over the variables V. We say that S is a support classically compatible with G if there exists a probability distribution P_V over V that is classically compatible with G (i.e., P_V ∈ C_G) and whose support is S(P_V) = S. We say that S is a support compatible-up-to-CI with G if there exists a probability distribution P_V over V such that P_V ∈ I_G and whose support is S(P_V) = S.
Note that if a support is compatible with DAG G, that does not mean that every distribution with that support will be compatible with G. There are countless counterexamples, but let us simply note that the full support (the one where all events have positive probability) is compatible with any DAG, but at the same time we know of many incompatible distributions which nevertheless have full support.
Naturally, admitting the same set of compatible supports is a prerequisite for two DAGs to admit the same set of compatible distributions:
Lemma 39 (Supports condition for observational equivalence). Let G and H be two DAGs. If they are classically observationally equivalent (i.e. C_G = C_H), then their sets of classically compatible supports must be identical.
It remains an open question whether the condition of Lemma 39 is also sufficient for observational equivalence. In particular, it is not known whether or not there exists a DAG for which some distributions are incompatible (due to inequalities) but for which all supports are compatible.
As before, this necessary condition for observational equivalence immediately translates into a method for proving Non-Algebraicness:
Theorem 40 [NALF Supports]. Let G be a DAG. Suppose the set of classically compatible supports of G is NALF (as per Definition 32). Then, G is Non-Algebraic.
To exploit Theorem 40 [NALF Supports] in practice, we need an algorithm capable of assessing whether or not a given support is compatible with a given DAG. Such an algorithm was developed in Ref. [24], and we refer to it here as Fraser's algorithm. We have implemented Fraser's algorithm in Python and scripted it to yield all the supports that are classically incompatible with a given DAG (for a certain assignment of the cardinalities of the observed variables).
In general, Fraser's algorithm is much more computationally expensive than simply assessing whether or not a graph exhibits some d-separation or e-separation relation.Consequently, we consider Theorem 40 [NALF Supports] a method of last resort to show Non-Algebraicness.

Rapidly testing supports (without comparing to any latent-free graph)
As in the case of e-separation, Theorem 40 [NALF Supports] has the downside that it requires one to find the supports compatible with the DAG G and then check the compatible supports of all the latent-free DAGs with the same number of observed nodes (or alternatively to find the latent-free H that has the same set of d-separation relations as G, and then check which supports are compatible with H). Since Fraser's algorithm is computationally expensive, doing this in practice can be cumbersome.
Luckily, it is possible to develop a rapid supports test where it is not even necessary to find all of the supports compatible with G. The idea of the rapid supports test comes from noting that sometimes we can prove the incompatibility of a given support with a DAG G by recognizing that the given support conflicts with a d-separation relation exhibited by G.
Suppose, for instance, that a DAG G has the (unconditional) d-separation relation A ⊥_d B. Then, the support in Eq. (14) is clearly incompatible with G, since any probability distribution with that support must have P_AB(1,1) = 0 ≠ P_A(1)P_B(1) > 0, contradicting A ⊥⊥_CI B.
Indeed, we can formally categorize all such "trivial" proofs of support incompatibility through the following two definitions:
Definition 41 (Support conflicting with a conditional independence relation). Let S be a support over a set V of variables, and let A ⊆ V, B ⊆ V and C ⊆ V be three disjoint subsets of V. We say that S conflicts with the conditional independence relation A ⊥⊥_CI B|C if there exists a set {a,b,c} of values of the variables in A, B and C such that the events {A = a,C = c} and {B = b,C = c} occur in S, but the event {A = a,B = b,C = c} does not occur in S.
For example, the support of Eq. (14) conflicts with the conditional independence relation A ⊥⊥_CI B: both the events A = 1 and B = 1 occur in the support, but the event {A = 1,B = 1} does not.
If a support S conflicts with a conditional independence relation A ⊥⊥_CI B|C, then there is no probability distribution with support S that obeys A ⊥⊥_CI B|C. This will be seen explicitly in the proof of Lemma 43.
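The conflict test of Definition 41 is a purely combinatorial check on the set of events and is easy to sketch. In the code below a support is a set of tuples, one entry per variable; the function name, the argument layout, and the two example supports are assumptions introduced for illustration. In particular, `example_support` is a hypothetical support with the property described in the text (A = 1 and B = 1 each occur, but never together); it is not claimed to be the support of Eq. (14).

```python
def conflicts_with_ci(support, names, A, B, C=()):
    """Definition 41: does `support` conflict with A independent of B given C?

    `support` is a set of events (tuples ordered as in `names`);
    A, B and C are disjoint sequences of variable names.
    """
    index = {name: i for i, name in enumerate(names)}

    def project(event, variables):
        return tuple(event[index[v]] for v in variables)

    # Marginal events {A=a, C=c}, {B=b, C=c} and {A=a, B=b, C=c} that occur in S.
    ac = {(project(e, A), project(e, C)) for e in support}
    bc = {(project(e, B), project(e, C)) for e in support}
    abc = {(project(e, A), project(e, B), project(e, C)) for e in support}
    # Conflict: (a, c) and (b, c) both occur, but (a, b, c) does not.
    return any((a, b, c) not in abc
               for a, c in ac
               for b, c2 in bc
               if c == c2)


names = ("A", "B")
# Hypothetical support: A = 1 occurs and B = 1 occurs, but never jointly.
example_support = {(0, 0), (0, 1), (1, 0)}
full_support = {(0, 0), (0, 1), (1, 0), (1, 1)}
```

Here `conflicts_with_ci(example_support, names, ["A"], ["B"])` returns `True`, while the same call on `full_support` returns `False`. A test for trivial incompatibility with a DAG would simply run this check over each observed d-separation relation of the DAG.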

Definition 42 (Triviality of support incompatibility).
A support S is said to be trivially incompatible with a given DAG whenever the DAG exhibits some d-separation relation whose associated conditional independence relation conflicts with S (as in Definition 41).
By generalizing the discussion made around Eq. (14), we see that this definition indeed implies classical incompatibility of the support with the DAG:
Lemma 43 (Trivial incompatibility implies incompatibility). If a support S is trivially incompatible with a DAG G, then it is classically incompatible with G.
Proof. Let A ⊥_d B|C be a d-separation relation of G which is in conflict with S. Furthermore, let {a,b,c} be a set of values of the variables in A, B and C that witnesses this conflict. This means that all of the probability distributions which have the support S must have P_AB|C(ab|c) = 0 ≠ P_A|C(a|c)P_B|C(b|c) > 0. Therefore, S is not classically compatible with G.
The key insight which allows us to accelerate the application of Theorem 40 [NALF Supports] is that for a latent-free DAG, the only supports incompatible with it are those which are trivially incompatible with it.
Lemma 44 (Latent-free support compatibility). Let H be a latent-free DAG. If S is a support which is not trivially incompatible with H as per Definition 42, then S is classically compatible with H as per Definition 38.
¹⁴ however, we can instead consider variant algorithms related to Inflation [41] which cannot certify support compatibility but which can often efficiently detect support incompatibility. Some such algorithms are discussed in Ref. [42], for example.
It is clear that the application of Theorem 45 [Rapid Supports] is much more efficient than the application of Theorem 40 [NALF Supports]. We can also show that both theorems are equally powerful:
Proof. First, suppose that there exists a latent-free DAG H such that and hence every support which is incompatible with G must also be incompatible with H. If Theorem 40 [NALF Supports] shows the Non-Algebraicness of G, then there must be a support which is compatible with H but not G. Since the set of supports incompatible with H is exactly the set of trivially incompatible supports as per Lemmas 43 and 44, it follows that the support which is incompatible with G but not H must not be trivially incompatible with G. Therefore, the Non-Algebraicness of G can be shown by Theorem 45 [Rapid Supports].
Now, suppose that there is no latent-free DAG that has the same set of d-separation relations as G, i.e. that the set of d-separation relations of G is NALF. If G is not maximal, in the proof of Theorem 22 [Nonmaximal] we showed a distribution which is not compatible with G (Eq. (9)). In fact, all the distributions which have the same support as this one will be incompatible with G. Furthermore, since the distribution of Eq. (9) is an element of I_G, this support is not trivially incompatible with G. Therefore, the support of the distribution of Eq. (9) is an incompatible support which is not trivially incompatible.
If the set of d-separation relations of G is NALF and G is maximal, per Theorem 56 we know that G has one of the eighteen mDAGs of Figure 12 as a subgraph.One can explicitly check that these eighteen mDAGs have incompatible supports that are not trivially incompatible; therefore, G itself also must have incompatible supports that are not trivially incompatible, namely, by taking all its observed variables outside of the pertinent subgraph to be point distributed.Therefore, this case also falls under the scope of Theorem 45 [Rapid Supports].
It is also worth noting that, when there is a latent-free DAG H with the same set of d-separation relations as G, Theorem 45 [Rapid Supports] is constructive: the distribution P constructed for H in the proof of Lemma 44 is such that
An example of a support which is incompatible (but not trivially incompatible) with the Evans scenario (Figure 7(b)) is the following: The Evans scenario does not have any d-separation relation. Nevertheless, the support S_Evans of Equation (15) is not compatible with it. This can be seen by noting that the variable C in S_Evans is associated with a point distribution: it always takes the value 0. Since in the Evans scenario all the correlation between D and E is established through C, it is impossible to have perfect correlation between D and E while C takes a point distribution.
An example of a DAG whose Non-Algebraicness was first certified in Ref. [24] via the discovery of an incompatible support is presented in Figure 10. By means of his eponymous algorithm for compatible supports, Fraser showed that the following support is classically incompatible with the DAG of Figure 10:¹⁵
When using Theorem 45 [Rapid Supports] to attest Non-Algebraicness, we start by checking supports at binary cardinalities of the observed variables. If no incompatible but not trivially incompatible support is found at that level, we can try to search for such a support at higher cardinalities of the observed variables.
Remark 47. We anticipated that if a DAG has any incompatible support (for any cardinality), then it seemed likely that we should expect to find some incompatible support where all variables have binary cardinality. Indeed, prior to this work, we were not aware of any counterexample. Even the three challenging DAGs identified in Figure 14 of Ref. [22] were eventually found to have incompatible supports with merely binary variables. However, this intuition turns out to be misplaced: We identified 4 mDAGs for which the high-cardinality support of Eq. (17) is identified as incompatible, but where nevertheless every support over binary variables is provably compatible. These mDAGs are depicted in Table 2.

Incompatible supports subsumes all other methods
The methods to show Non-Algebraicness that we presented here are not independent of each other.For example, it is clear that Theorem 36 [NALF e-sep] subsumes Theorem 33 be a powerful tool to show Non-Algebraicness; at this stage, we are left with only 186 unresolved mDAGs after this preliminary filtering.
We then exploit the d-separation test for Non-Algebraicness per Subsection 4.3. That is, among those 186 remaining mDAGs we filter out any mDAGs which possess observed d-separation relations not matching those of any latent-free DAG (NALF d-separation relations). We find that 168 mDAGs remain as-yet unresolved; the 18 mDAGs that are shown Non-Algebraic at this stage are the ones presented in Figure 12.
The e-separation test for Non-Algebraicness presented in Subsection 4.4 does not resolve any of these 168 unresolved cases.As mentioned, it is still an open problem whether the nonmaximality test and the d-separation test together will always subsume the e-separation test.We then turn to the method of supports analysis per Subsection 4.5, which ultimately leaves us with only 3 remaining unresolved mDAGs.
More specifically, by considering supports with binary cardinalities of the observed variables we were able to certify the Non-Algebraicness of 161 out of the 168 remaining mDAGs.We could not, however, find any classically incompatible supports for the remaining 7 mDAGs when only considering binary cardinality variables.But by increasing the cardinality of variable A to be three we were able to identify a support that is incompatible -but not trivially incompatible -with the four mDAGs in Table 2, hence certifying their Non-Algebraicness.Said support is explicitly reproduced in Eq. (17); it is obviously not trivially incompatible with any of the mDAGs in Table 2, since none of those four mDAGs exhibits any d-separation relations over its observed nodes.The remaining 3 mDAGs which we were unable to resolve as Non-Algebraic via supports considerations -up to computational tractability limits -are depicted in Table 3.
The exact number of mDAGs that we are able to characterize as Non-Algebraic or Algebraic at each stage is summarized in Table 1.
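As a quick consistency check, the counts quoted in this section can be tallied stage by stage. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the figure for the preliminary filtering is derived by subtraction rather than quoted.

```python
# Counts quoted in the text for mDAGs with 4 observed nodes.
total_mdags = 2809
algebraic_by_hlp = 1813
not_certified_by_hlp = total_mdags - algebraic_by_hlp

# Unresolved mDAGs after the preliminary filtering (quoted in the text).
unresolved = 186
# Derived, not quoted: how many mDAGs the preliminary filtering resolved.
resolved_by_preliminary = not_certified_by_hlp - unresolved

# Later stages, each with the number of mDAGs it newly certifies.
stages = [
    ("NALF d-separation", 18),
    ("NALF e-separation", 0),
    ("incompatible binary supports", 161),
    ("incompatible support with ternary A", 4),
]
history = []
for name, newly_resolved in stages:
    unresolved -= newly_resolved
    history.append((name, unresolved))

non_algebraic = not_certified_by_hlp - unresolved

for name, left in history:
    print(f"after {name}: {left} unresolved")
print(f"Non-Algebraic: {non_algebraic}, still unresolved: {unresolved}")
```

The tally reproduces the intermediate figures 168 and 7 and the final split of the 996 mDAGs not certified by the HLP criterion.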

On the potential Non-Algebraicness of the remaining 3 mDAGs
The 3 as-yet unresolved mDAGs with 4 observed nodes are depicted in Table 3. For these 3 mDAGs we could not find any incompatible supports, at least up to the small cardinalities of the observed nodes that we checked. Searching for incompatible supports at higher cardinalities of the observed nodes using Fraser's algorithm is computationally expensive, as the algorithm's complexity increases significantly on increasing the cardinalities. Perhaps future acceleration of Fraser's algorithm may allow us to probe supports for higher cardinalities. For the present work, however, we considered one final attempt to prove (see footnote 16) the Non-Algebraicness of these 3 mDAGs, namely, by exploring entropic inequalities. For a more comprehensive introduction to entropic inequalities and Shannon cones see Refs. [43,44].
In particular, we attempted to isolate some Shannon-type inequalities that constrain C G, but are not Shannon-type inequalities for I G. This analysis is also computationally expensive (often intractable), as generating the Shannon-type inequalities corresponding to C G is accomplished via linear quantifier elimination. The complexity of the most common algorithm for performing linear quantifier elimination is doubly exponential in the number of eliminated variables, though alternative algorithms have different complexities [45].
Footnote 16: The entropic technique is predicated on trying to prove that the Shannon-type entropic inequalities constraining C G define a hypercone strictly in the interior of the hypercone given by the Shannon-type entropic inequalities constraining I G. Even when such a finding can be established, however, it need not mean that C G ⊊ I G, though that is almost certainly the case. The loophole, however implausible, is that perhaps the true entropy cone constraining I G actually coincides with (or is interior to) the projection of the Shannon cone associated with C G. See Appendix E of Ref. [17].
We can nevertheless certify that the Shannon cones corresponding to C G and I G are indistinguishable for these 3 remaining mDAGs without explicitly constructing the Shannon cone for C G. We do so as follows:
1. From all the Shannon-type inequalities corresponding to I G, generate the extremal rays of this cone.
2. Check whether each of those extremal rays is implicitly contained in the Shannon cone corresponding to C G by asking if the Shannon-type inequalities over all the variables (thus not using linear quantifier elimination) corresponding to C G are satisfiable by the given extremal ray.
If every extremal ray of the Shannon cone of I G is contained in the Shannon cone corresponding to C G, then those two Shannon cones coincide.
For the 3 mDAGs in Table 3 we find that the Shannon cones corresponding to C G and I G are the same. That is, we were unable to find any valid Shannon-type inequality for C G that is not also a Shannon-type inequality for I G for these 3 mDAGs. Thus, entropic methods are incapable of proving the Non-Algebraicness of these 3 mDAGs, unless perhaps we explore non-Shannon-type inequalities or entropic inequalities involving non-Shannon entropies.
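To make the notion of a Shannon-type inequality concrete, the following minimal sketch enumerates the elemental Shannon inequalities for n variables and tests whether a fully specified entropy vector satisfies them. It is only a building block: it does not encode any causal constraints, and it does not perform the satisfiability check over latent variables used in step 2, which would require a linear-programming solver. All function names are illustrative.

```python
from itertools import combinations
from math import log2

def entropy_vector(dist, n):
    """Map every subset of {0..n-1} (as a frozenset) to its joint Shannon entropy."""
    h = {}
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            marginal = {}
            for outcome, p in dist.items():
                key = tuple(outcome[i] for i in subset)
                marginal[key] = marginal.get(key, 0.0) + p
            h[frozenset(subset)] = -sum(p * log2(p) for p in marginal.values() if p > 0)
    return h

def elemental_inequalities(n):
    """Yield each elemental inequality as {subset: coefficient}, meaning sum >= 0."""
    full = frozenset(range(n))
    for i in range(n):
        # H(X_i | all other variables) >= 0
        yield {full: 1, full - {i}: -1}
    for i, j in combinations(range(n), 2):
        rest = sorted(full - {i, j})
        for r in range(len(rest) + 1):
            for k in combinations(rest, r):
                k = frozenset(k)
                # I(X_i ; X_j | X_K) >= 0
                yield {k | {i}: 1, k | {j}: 1, k | {i, j}: -1, k: -1}

def satisfies_shannon(h, n, tol=1e-9):
    """True if the entropy vector h obeys every elemental inequality (up to tol)."""
    return all(
        sum(coeff * h[subset] for subset, coeff in ineq.items()) >= -tol
        for ineq in elemental_inequalities(n)
    )

# Example: three bits with the third equal to the XOR of the first two.
xor_dist = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
h_xor = entropy_vector(xor_dist, 3)
print(satisfies_shannon(h_xor, 3))
```

Any entropy vector computed from a genuine distribution passes this test; a candidate extremal ray would be checked in the same way once values for all variables, latent ones included, are available.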

Conclusions
In this work, we contributed to causal investigation by categorizing which causal structures of 4 observed nodes present inequality constraints and which do not. To do so, we developed a plethora of techniques to prove that a causal structure is Non-Algebraic (has inequality constraints), while we used one single technique to prove that a causal structure is Algebraic: the HLP criterion.
As can be seen from Table 1, out of the 2809 mDAGs with 4 observed nodes, the HLP criterion shows that 1813 are Algebraic. Out of the remaining 996 mDAGs, our techniques showed 993 of them to be Non-Algebraic, while we are still uncertain about the status of 3 mDAGs (presented in Table 3). While these 3 remaining mDAGs are still potential counter-examples to the HLP conjecture (which says that the HLP criterion is necessary and sufficient for Algebraicness), we believe that our numerical results are a hint towards the validity of this conjecture. A truly thorough analysis of all mDAGs with 5 observed nodes proved to be quite computationally demanding. Nevertheless, in Appendix E we show that, among those 5-node mDAGs which the HLP criterion fails to certify as Algebraic, at least 99% are Non-Algebraic, which we again elect to interpret as at least consistent with the HLP conjecture.
It is also interesting to note that all of our techniques to show Non-Algebraicness give explicit constructions of distributions which are in I G but not in C G, i.e., which respect the conditional independence constraints of DAG G but not its inequality constraints. Theorem 22 [Nonmaximal] is related to the construction of Eq. (9), just as Theorem 31 [Setwise Nonmaximal] is related to the construction of Eq. (10). Theorem 45 [Rapid Supports] is constructive whenever there is a latent-free DAG H with the same set of d-separation relations as G (such a construction can then be found in the proof of Lemma 44). If G is maximal but there is no latent-free DAG H with the same set of d-separation relations as G, then Theorem 56 says that G has one of the eighteen DAGs of Figure 12 as a subgraph. At the end of Appendix C, an explicit distribution which is in I G but not in C G for these eighteen DAGs is presented: it is the uniform distribution over the events in the Popescu-Rohrlich support presented in Eq. (13).
In particular, our Theorem 22 [Nonmaximal], which showed itself to be very powerful in proving Non-Algebraicness, is a corrected version of the e-separation theorem of [25] (as discussed in Appendix A).
By showing practical tools to attest that a causal structure presents inequality constraints, this work simultaneously contributes to purely classical causal inference and advances the question of which causal scenarios might exhibit quantum or post-quantum advantage.
node E, and neither (F ⊥ d D|C) nor (F ⊥ d D|CE) hold true. So this DAG is again (but this time, wrongly) characterised as Non-Algebraic by Pienaar's theorem. As previously discussed, all latent-free DAGs are Algebraic! In fact, every DAG for which the conditions of Theorem 50 hold can be converted into a different latent-free DAG for which the conditions of the theorem would continue to hold by making all the nodes observed. That is, for every DAG which is correctly classified as Non-Algebraic by the (invalid) condition formulated as Theorem 50 one can construct a latent-free counterexample to Theorem 50.
The flaw with Pienaar's proof of the "if" direction is that it invokes a probability distribution that may not actually be in I, in that the constructed distribution may be inconsistent with some d-separation relation not mentioned in the statement of Theorem 50. For example, in Figure 11
We know that there are DAGs for which one cannot find any DAG H with the required properties to make use of Theorem 54; see Appendix C for examples. In light of Theorem 15 we would want to reformulate Theorem 54 in a manner that is clearly (strictly) more powerful, which removes the caveat about finding such an H.
Theorem 55 (Improved skeleton condition for Non-Algebraicness). Let G be the DAG we need to check for Non-Algebraicness. Consider all latent-free graphs which share the same skeleton as that of G. If no latent-free graph within that set furthermore matches the observed d-separations of G, then G is Non-Algebraic.
As it turns out, however, Theorem 55 is equivalent to the conjunction of Theorems 22 [Nonmaximal] and 33 [NALF d-sep]. First, we can see that Theorem 22 [Nonmaximal] is a special case of Theorem 55. If G is nonmaximal, then it has at least one non-adjacent pair of observed nodes which is d-unseparable. This implies that all the latent-free DAGs which have the same skeleton (same adjacency structure) as G have a different set of d-separable pairs of nodes than that of G, because all latent-free DAGs are maximal [34, Prop. 3.19]. This then implies that they have a different set of d-separation relations than those of G, so Theorem 55 witnesses all nonmaximal DAGs as Non-Algebraic. Secondly, it is easy to see that Theorem 33 [NALF d-sep] is also a special case of Theorem 55: if there are no latent-free DAGs that match the set of d-separation relations of G, then in particular there are no latent-free DAGs that match both the set of d-separation relations and the skeleton of G.
Next, we prove the converse, that Theorems 22 [Nonmaximal] and 33 [NALF d-sep] together subsume Theorem 55. Let G be a DAG which is shown Non-Algebraic by Theorem 55. If G is nonmaximal, then it is also shown Non-Algebraic by Theorem 22 [Nonmaximal]. What remains, then, is to show that there are no maximal DAGs which cannot be proven Non-Algebraic via Theorem 33 [NALF d-sep] but can be seen as Non-Algebraic via Theorem 55. But if the DAG G is both maximal and shares the same d-separation relations as some latent-free DAG H, it automatically follows that G and H agree on their skeletons as well. After all, G and H are both maximal, which means that their skeletons are dictated by their respective d-unseparable node pairs, which are identical.
Note that Theorem 55 is also subsumed by Theorem 36 [NALF e-sep], since agreeing on e-separation relations implies agreeing on adjacencies (i.e., skeleton) as well as on d-separation relations.
When the DAG is maximal, to see whether its set of d-separation relations is NALF one just needs to check whether it has one of the DAGs of Figure 12 as a subgraph. This was recognized in Evans' own proof of Theorem 15 [23], as we argue in the proof of the following theorem:
Proof. Evans [23] has shown that if a latent-permitting DAG G has a NALF set of observed d-separation relations, then its associated PAG (partial ancestral graph) contains a so-called "locally unshielded collider path" of length 3 and/or a so-called "discriminating path" of length 3 (see footnote 18). This implies that the PAG associated with G will contain a sub-PAG matching one of the 4-node PAGs depicted within Figures 4 and 5(i) of [23] whenever the observed d-separation relations of G are NALF.
A PAG is an abstract graphical representation of a set of d-separation relations. In particular, two nodes are adjacent in a PAG if and only if the two nodes are d-unseparable in the original DAG. A "path of length 3" in a PAG refers to a set of four nodes {X,A,B,Y } such that {X,A}, {A,B}, {B,Y } all constitute d-unseparable pairs.
The adjacencies of a DAG G and the adjacencies of the PAG associated with G can, in general, differ. For maximal DAGs, however, they must coincide: for those, adjacency is equivalent to d-unseparability. The PAG associated with a maximal DAG G will exhibit an unshielded collider path or a discriminating path if and only if in the original G we can find four nodes {X,A,B,Y } such that {X,A}, {A,B}, {B,Y } represent adjacent pairs and such that the d-separation relations pertaining exclusively to {X,A,B,Y } are of either "unshielded collider path" type or "discriminating path" type. For brevity we do not define these two types, but we exhibit all such 4-node mDAGs in Figure 12.
The set of d-separation relations of each one of the DAGs in Figure 12 [Figure 12 (p)-(r)]
By checking the d-separation relations of all the 4-observed-node mDAGs, we know that these are the only mDAGs that present these sets of d-separation relations. While searching for subgraphs is computationally easy, an alternative is to consider all 4-node subsets of a given large mDAG and ask if the d-separation relations pertaining exclusively to some four nodes contain all and only the relations of one of the patterns listed above, up to relabelling. If yes, and if the large mDAG G is maximal, then G certainly contains one of the patterns of Figure 12 as a subgraph.
Footnote 18: The phrasing presented in the proof of Proposition 4.2 of Ref. [23] says that either the associated PAG has a locally unshielded collider path of length exactly 3, all options of which are presented in Figure 4 of Ref. [23], or a discriminating path of length at least 3. However, Proposition B.1 of [23] further shows that all discriminating paths of length at least 3 contain a locally unshielded collider path of length 3 and/or a discriminating path of length exactly 3, which is presented in Figure 5(i) of [23].
Figure 13 shows an example of a DAG that does not have any of the DAGs of Figure 12 as a subgraph, but nevertheless has a NALF set of d-separation relations. However, this is not a problem: this DAG is not maximal (X and Y are not d-separable but are e-separable by the empty set), so its Non-Algebraicness follows from Theorem 22 [Nonmaximal].
Finally, we will show an explicit construction of a distribution which is in I G but not in C G for the maximal mDAGs which are shown Non-Algebraic by Theorem 33 [NALF d-sep]. First, we note that the Popescu-Rohrlich box support shown in Eq. (13) is not compatible with any of the mDAGs of Figure 12, as can be shown using Fraser's algorithm. This implies that the uniform distribution over the events of the Popescu-Rohrlich box support is not classically compatible with any of the mDAGs of Figure 12, and is thus not an element of C G for these mDAGs. On the other hand, it is well-known that said distribution obeys all of the conditional independence relations that come from the d-separation relations of the Bell DAG. As seen above, all the d-separation relations of the mDAGs of Figure 12 are included in the d-separation relations of the Bell DAG, which implies that such a distribution is an element of I G for all the mDAGs of Figure 12.
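The claim that the uniform Popescu-Rohrlich distribution obeys the conditional independences of the Bell DAG can be verified directly. The sketch below assumes that the support of Eq. (13) is the standard one, namely all events (a, b, x, y) over bits with a XOR b = x AND y; the helper names are illustrative. It checks three representative relations implied by the Bell DAG and confirms that the outcomes remain correlated given the settings.

```python
from fractions import Fraction
from itertools import product

# Variable order in each event: (A, B, X, Y).
A, B, X, Y = 0, 1, 2, 3

# Assumed form of the Popescu-Rohrlich support: a XOR b == x AND y.
support = [e for e in product((0, 1), repeat=4) if e[A] ^ e[B] == e[X] & e[Y]]
pr_box = {e: Fraction(1, len(support)) for e in support}

def marginal(dist, idx):
    """Marginal distribution over the variables listed in idx."""
    out = {}
    for event, p in dist.items():
        key = tuple(event[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return out

def cond_indep(dist, xs, ys, zs):
    """Exact test of xs independent of ys given zs: P(x,y,z) P(z) == P(x,z) P(y,z)."""
    pxyz = marginal(dist, xs + ys + zs)
    pxz = marginal(dist, xs + zs)
    pyz = marginal(dist, ys + zs)
    pz = marginal(dist, zs)
    for x in product((0, 1), repeat=len(xs)):
        for y in product((0, 1), repeat=len(ys)):
            for z in product((0, 1), repeat=len(zs)):
                lhs = pxyz.get(x + y + z, 0) * pz.get(z, 0)
                rhs = pxz.get(x + z, 0) * pyz.get(y + z, 0)
                if lhs != rhs:
                    return False
    return True

print(cond_indep(pr_box, (X,), (Y,), ()))      # independence of the settings
print(cond_indep(pr_box, (A,), (Y,), (X,)))    # no-signalling from Bob to Alice
print(cond_indep(pr_box, (B,), (X,), (Y,)))    # no-signalling from Alice to Bob
print(cond_indep(pr_box, (A,), (B,), (X, Y)))  # outcomes stay correlated
```

Exact rational arithmetic avoids floating-point tolerance issues in the equality tests. The incompatibility of the support with the mDAGs of Figure 12 is a separate claim that this sketch does not address.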
The application of the conditions for Non-Algebraicness presented here to mDAGs of 5 observed nodes gives the results shown in Table 4. Although the application of Fraser's algorithm on the remaining 12,834 mDAGs was computationally infeasible, the application of all the other techniques provides a similar success percentage as compared to mDAGs of 4 and 3 observed nodes. Precisely, for mDAGs of 5 observed nodes all the other techniques (apart from supports) reduce the number of unresolved mDAGs by 99.15%, while this percentage is 99.89% and 97.8% for mDAGs of 4 and 3 observed nodes respectively. This result is consistent with the HLP conjecture, though whether or not it can be considered evidence in favor of the conjecture is debatable. Unsurprisingly

Figure 2: The Bell DAG. It encompasses the assumptions of Bell's theorem for a Bell Scenario where X and Y are the measurement settings of Alice and Bob, A and B are their outcomes and Λ is a classical hidden variable. The probability distributions that are classically compatible with this DAG are those that decompose as in Equation (3).

Figure 4: Example of a directed acyclic graph (DAG). The probability distributions that are classically compatible with this DAG are those that can be decomposed as in Equation (2).

Theorem 5 (Observed d-separation in latent-permitting DAGs). Let G be a DAG with nodes A = V ∪ L, where V are observed nodes and L are latent nodes. Let X ⊆ V, Y ⊆ V and Z ⊆ V be three disjoint sets of observed nodes of G. Then:
1. If G has the d-separation relation X ⊥ d Y |Z, then all of the probability distributions over the variables V which are classically compatible with G need to satisfy X ⊥ ⊥ CI Y |Z.
2. If G does not have the d-separation relation X ⊥ d Y |Z, then there exists some probability distribution over the variables V which is classically compatible with G and does not satisfy X ⊥ ⊥ CI Y |Z.

Figure 5: The DAGs (a) and (b) are classically observationally equivalent to the mDAG of (c). The step (a)→(b) can be shown by Lemma 9, and the step (b)→(c) can be shown by Lemma 10.

Figure 6: An example of the application of the HLP criterion. All three of these mDAGs have the same set of d-separation relations: A ⊥ d E, A ⊥ d E|C, A ⊥ d C and A ⊥ d C|D. Since (c) is latent-free, we can conclude that (a) is Algebraic.

Proposition 20 (d-unseparability ⇔ P (9) ∈ I G ). The distribution in Eq. (9) satisfies all the conditional independence constraints that follow from the observed d-separation relations of graph G if and only if A and B are d-unseparable in G.
Proof. The only conditional independence relations that the distribution in Eq. (9) fails to satisfy are those of the form A ⊥ ⊥ CI B|Z. Such conditional independence relations follow from the observed d-separation relations of graph G if and only if A and B are d-separable in G.

Figure 7 shows an example of a maximal DAG (7a) and an example of a nonmaximal DAG (7b).

Lemma 23 (Rapid test for d-separability of pairs). A pair of node subsets A and B of a DAG G are d-separable by some set of observed nodes if and only if they are d-separable by a particular set of observed nodes, namely, the set of all and only those observed nodes which are ancestors of either A or of B. In other words, A ⊥ d B | ANC G (A) ∪ ANC G (B) whenever there exists some set Z such that A ⊥ d B|Z.
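The rapid test of Lemma 23 can be sketched in a few lines. The code below is an illustrative implementation, not the one used for the enumerations in this work: it decides d-separation with the standard moralized-ancestral-graph criterion, restricts attention to single nodes rather than node subsets, and represents a DAG as a hypothetical dictionary mapping each node to its parents. A brute-force search over all conditioning sets is included only to cross-check the rapid test on the Bell DAG of Figure 2, with "L" standing for the latent node Λ.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, zs):
    """Test whether xs and ys are d-separated given zs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    nbrs = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:                      # drop edge directions
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(ps, 2):  # connect co-parents (moralization)
            nbrs[p].add(q)
            nbrs[q].add(p)
    reached = set(xs - zs)
    stack = list(reached)
    while stack:                          # search that never enters zs
        for m in nbrs[stack.pop()]:
            if m not in zs and m not in reached:
                reached.add(m)
                stack.append(m)
    return not (reached & ys)

def d_separable(parents, observed, a, b):
    """Rapid test: condition on the observed ancestors of a or b."""
    z = (ancestors(parents, {a, b}) & set(observed)) - {a, b}
    return d_separated(parents, {a}, {b}, z)

def d_separable_brute_force(parents, observed, a, b):
    """Reference check: try every conditioning set of observed nodes."""
    others = sorted(set(observed) - {a, b})
    return any(
        d_separated(parents, {a}, {b}, z)
        for r in range(len(others) + 1)
        for z in combinations(others, r)
    )

# Bell DAG: X -> A <- L -> B <- Y, with L latent.
bell = {"A": ["X", "L"], "B": ["Y", "L"], "X": [], "Y": [], "L": []}
bell_observed = ["A", "B", "X", "Y"]
for a, b in combinations(bell_observed, 2):
    print(a, b, d_separable(bell, bell_observed, a, b))
```

The rapid test needs a single d-separation query per pair, whereas the brute-force reference needs exponentially many in the number of observed nodes.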

Figure 8: A DAG with 4 observed nodes that is shown to be Non-Algebraic per Theorem 22 [Nonmaximal]. This can be seen because G and F are not d-separable, i.e. none of the d-separation relations (G ⊥ d F |E), (G ⊥ d F |D) or (G ⊥ d F |D,E) hold, but they are not adjacent.

Figure 9: Examples of maximal DAGs which are setwise-nonmaximal. (a) is the Triangle scenario, where the set {A,B,C} (all visible nodes) is setwise d-unrestricted but not setwise adjacent. In (b), the set {A,B,C} is setwise d-unrestricted (in fact, there are no d-separation relations between observed nodes) but not setwise adjacent; by contrast, the larger set {A,B,C,D} is both setwise d-unrestricted and setwise adjacent, despite (b) exhibiting the d-separation relation (A ⊥ d D|B), as that d-separation relation involves conditioning on a node within the set. In (c), both of the sets {A,B,C} and {A,B,C,D} are d-unrestricted but not setwise adjacent. Note that {A,B,C,D} is d-unrestricted in the DAG (c) despite that DAG exhibiting the d-separation relation (A ⊥ d D|B,C), as that d-separation relation involves conditioning on nodes within the set.
(b)), which was shown Non-Algebraic by Theorem 22 [Nonmaximal], cannot be shown Non-Algebraic via Theorem 33 [NALF d-sep]: it does not have any d-separation relation, just like the saturated latent-free DAGs. Therefore, Theorems 22 [Nonmaximal] and 33 [NALF d-sep] are not redundant to each other. Similarly, Theorems 31 [Setwise Nonmaximal] and 33 [NALF d-sep] are not redundant to each other.
[NALF d-sep] is only a sufficient condition for Non-Algebraicness, and not a necessary one. If there exists a latent-free DAG H which has the same observed d-separation relations as G (i.e. I H = I G ), this does not imply that C G = C H , and as such we cannot conclude anything about the relation between C G and I G. We will now discuss other sufficient conditions for Non-Algebraicness that can be used when Theorems 22 [Nonmaximal] and 33 [NALF d-sep] fail.
33 [NALF d-sep] for the case of e-separation:
Theorem 36 [NALF e-sep]. Let G be a DAG. Suppose that the set of observed e-separation relations of G is NALF (as per Definition 32). Then, G is Non-Algebraic.
Proof. Follows directly from Lemma 35 and Theorem 15.
To use Theorem 36 [NALF e-sep] in practice, one might enumerate the e-separation relations exhibited by every latent-free DAG with the same number of observed nodes as G and compare them to the observed e-separation relations of G. Far more efficiently, one need only check against any one latent-free graph which matches the observed d-separation relations of G. (Such a latent-free DAG must exist if G is not already certified as Non-Algebraic by Theorem 33 [NALF d-sep].) After all, if two latent-free graphs share the same d-separation relations, they will also share the same e-separation relations, per Theorem 4 and Lemma 35. We thus advise verifying that a DAG G in question is not already certified as Non-Algebraic by Theorem 33 [NALF d-sep] before invoking Theorem 36 [NALF e-sep].
It is clear that every DAG that can be shown Non-Algebraic by Theorem 33 [NALF d-sep] can also be shown Non-Algebraic by Theorem 36 [NALF e-sep], since d-separation is a special case of e-separation. Less trivially, every DAG that can be shown Non-Algebraic by Theorem 22 [Nonmaximal] can also be shown Non-Algebraic by Theorem 36 [NALF e-sep]: all nonmaximal DAGs have a set of e-separation relations which is NALF.
Proposition 37 (Theorem 36 [NALF e-sep] subsumes Theorem 22 [Nonmaximal]). Let G be a nonmaximal DAG. Then, the set of e-separation relations of G is NALF.
Theorems 22 [Nonmaximal] and 33 [NALF d-sep] is as good as Theorem 36 [NALF e-sep] to show Non-Algebraicness. We did not find any DAG that was shown Non-Algebraic by Theorem 36 [NALF e-sep] but not by either of these two previous methods. (See Appendix E for further discussion of this question.) Note, however, that Theorem 36 [NALF e-sep] does not subsume Theorem 31 [Setwise Nonmaximal]. For example, the Triangle scenario (Fig. 9(a)), which is shown Non-Algebraic by Theorem 31 [Setwise Nonmaximal], does not have any e-separation relation (just like the saturated latent-free DAGs).

Proposition 46. Let G be a Non-Algebraic DAG. If the Non-Algebraicness of G can be shown via Theorem 40 [NALF Supports], then it can also be shown via Theorem 45 [Rapid Supports].

Figure 10: A DAG with 4 observed nodes and 7 total nodes whose Non-Algebraicness can be shown by Fraser's algorithm for compatible supports.

Figure 11: (a) depicts a scenario characterised correctly by Pienaar's theorem as Non-Algebraic, whereas (b) depicts an Algebraic scenario which Pienaar's theorem incorrectly deems Non-Algebraic.
, Pienaar's proof would invoke a distribution which posits perfect correlation between D and F while all other variables are point-distributed. While such a distribution is consistent with all the observable d-separation relations of Figure 11a, it is nevertheless inconsistent with the d-separation relation (F ⊥ d D|BCE) exhibited by Figure 11b. This loophole in Pienaar's proof can be closed by strengthening the conditions of Pienaar's theorem, namely, to exclude (F ⊥ d D|S) for any subset S of the remaining observed nodes. In other words,
Theorem 51 (A corrected version of Pienaar's e-separation theorem). Let G be a DAG, and let X, Y, Z and W be disjoint sets of nodes of G such that none of the nodes of Z is a descendant of a node in W and (X ⊥ e Y |Z) del W. Then G is Non-Algebraic if the d-separation relations of G do not include any relations of the form (X ⊥ d Y |S), where S may be any subset of the observed nodes of G other than X ∪ Y.
However, a graph can only exclude relations of the form X ⊥ d Y |S for any S if for every pair of singleton nodes {X,Y } such that X ∈ X and Y ∈ Y it holds that (X ⊥ d Y |S). On the other hand, a pair of nodes {X,Y } can only be e-separated by Z upon the removal of W if X and Y are not adjacent. Consequently, a DAG can only be certified as Non-Algebraic by Theorem 51 if there exists a pair of d-unseparable but nevertheless non-adjacent nodes.
Theorem 56 (rapid d-separation condition for Non-Algebraicness). Let G be an mDAG. If G is not maximal, then it is Non-Algebraic by Theorem 22 [Nonmaximal]. If G is maximal, then it has a set of observed d-separation relations unmatched by any latent-free DAG (NALF), and is therefore Non-Algebraic pursuant to Theorem 33 [NALF d-sep], if and only if it contains one of the eighteen graph patterns (up to relabelling the nodes) presented in Figure 12 as a subgraph.

Figure 12: A maximal DAG has a set of d-separation relations unmatched by any latent-free DAG (and is thus Non-Algebraic by Theorem 33 [NALF d-sep]) if and only if it has one of these eighteen patterns as a subgraph.

Figure 13: A nonmaximal DAG. It does not have any of the DAGs of Figure 12 as a subgraph, but it nevertheless has a d-separation pattern that does not correspond to any latent-free DAG.
, our d-separation condition as articulated in Theorem 33 [NALF d-sep] is now effectively redundant to the conjunction of Proposition 58 and Theorem 22 [Nonmaximal], since a maximal mDAG with 5+ observed nodes will have a set of observed d-separation relations inequivalent to any latent-free DAG only if the large mDAG contains one of 18 particular 4-observed-node mDAGs, as discussed extensively in Appendix C. The fact that Theorem 36 [NALF e-sep] does not resolve any further mDAGs as Non-Algebraic is evidence in favor of a conjecture that Theorem 36 [NALF e-sep] is perhaps subsumed by the conjunction of Theorems 22 [Nonmaximal] and 33 [NALF d-sep].

Table 1: A summary of our findings: apart from the HLP criterion, which shows Algebraicness, all of the other conditions listed show Non-Algebraicness. Note that here we are counting by unlabelled DAGs. That is, two labelled DAGs which are equivalent under a relabelling of the observed nodes and/or a relabelling of the hidden nodes are represented by a single unlabelled DAG in these enumerations.

A probability distribution over the variables A is Markov with respect to G if and only if it satisfies all of the conditional independence relations associated with the d-separation relations of G.
then all of the probability distributions over the variables A which are Markov with respect to G need to satisfy X ⊥ ⊥ CI Y |Z.
2. If G does not have the d-separation relation X ⊥ d Y |Z, then there exists some probability distribution over the variables A which is Markov with respect to G and does not satisfy X ⊥ ⊥ CI Y |Z.
Proposition 26 (Setwise Adjacency ⇔ P (10) ∈ C G ). The distribution in Eq. (10) is classically compatible with graph G if and only if {V 1 ...V k } are setwise adjacent in G.
Lemma 27 (Partial point distribution =⇒ subgraph compatibility). Suppose P (V ) is some distribution wherein the variables V \{V 1 ...V k } are point-distributed and moreover all the variables in {V 1 ...V k } have finite cardinality. Then, P (V ) is classically compatible with G if and only if P ({V 1 ...V k }) is compatible with the subgraph of G formed by deleting all nodes V \{V 1 ...V k } from G.
Proof. This lemma is an immediate consequence of the e-separation theorem central in Ref. [35].
Lemma 28 (setwise correlation =⇒ common ancestor). Let P perfect correlation (A) be the distribution in which all variables in A are random but perfectly correlated with each other. Then, P perfect correlation (A) is compatible with a DAG G if and only if all the nodes in A share some common ancestor in G.
Proof. The "if" direction is trivial; the "only if" direction follows from Ref. [31, Theorem 2].
Note that Proposition 26 implies Proposition 19 as a special case: a pair of observed nodes share a common ancestor upon removing all other observed nodes from a DAG G if and only if they are adjacent in G. We likewise highlight the utility of the definition of a setwise d-unrestricted set:
Proposition 29 (Setwise d-unrestriction ⇔ P (10) ∈ I G ). The distribution in Eq. (10) satisfies all the conditional independence constraints that follow from the observed d-separation relations of graph G if and only if {V 1 ...V k } are setwise d-unrestricted in G.
Proof. Suppose that {V 1 ...V k } are not setwise d-unrestricted in G. Then, there exists a pair of nodes {S i ,S j } ⊆ {V 1 ...V k } such that G exhibits the d-separation relation (S i ⊥ d S j |Z). In this case, the distribution

Table 2: The only 4 mDAGs with 4 observed nodes for which every support over binary observed variables is classically compatible but which are nevertheless provably Non-Algebraic by virtue of the higher-cardinality support given in Eq. (17) being incompatible with all 4 DAGs here.

Table 3: The mDAGs of 4 observed nodes whose Non-Algebraicness remains unresolved.