Quantum common causes and quantum causal models

Reichenbach's principle asserts that if two observed variables are found to be correlated, then there should be a causal explanation of these correlations. Furthermore, if the explanation is in terms of a common cause, then the conditional probability distribution over the variables given the complete common cause should factorize. The principle is generalized by the formalism of causal models, in which the causal relationships among variables constrain the form of their joint probability distribution. In the quantum case, however, the observed correlations in Bell experiments cannot be explained in the manner Reichenbach's principle would seem to demand. Motivated by this, we introduce a quantum counterpart to the principle. We demonstrate that under the assumption that quantum dynamics is fundamentally unitary, if a quantum channel with input A and outputs B and C is compatible with A being a complete common cause of B and C, then it must factorize in a particular way. Finally, we show how to generalize our quantum version of Reichenbach's principle to a formalism for quantum causal models, and provide examples of how the formalism works.


I. INTRODUCTION
It is a general principle of scientific thought (and indeed of everyday common sense) that if physical variables are found to be statistically correlated, then there ought to be a causal explanation of this fact. If the dog barks every time the telephone rings, we do not ascribe this to coincidence. A likely explanation is that the sound of the telephone ringing is causing the dog to bark. This is a case where one of the variables is a cause of the other. If sales of ice cream are high on the same days of the year that many people get sunburned, a likely explanation is that the sun was shining on these days and that the hot sun causes both sunburns and the desire to have an ice cream. Here the explanation is not that buying ice cream causes people to get sunburned, nor vice versa, but instead that there is a common cause of both: the hot sun.
That the principle is highly natural is most apparent when it is expressed in its contrapositive form: if there is no causal relationship between two variables (i.e. neither is a cause of the other and there is no common cause) then the variables will not be correlated. In particular, without a general commitment to this latter statement, it would be impossible ever to regard two different experiments as independent from one another, or for the results of one scientific team to be regarded as an independent confirmation of the results of another.
This principle of causal explanation was first made explicit by Reichenbach [1]. It is key in scientific investigations which aim to find causal accounts of phenomena from observed statistical correlations. Not only that, but the scientific process of experimentation requires that our tests are not subject to arbitrary causal influences. Without assuming some form of Reichenbach's principle it becomes impossible to reason about experimental outcomes.
Despite the central role of causal explanations in science, there are significant challenges to providing them for the correlations that are observed in quantum experiments [2]. In a Bell experiment, a pair of systems is prepared together and then taken to space-like separated locations for measurement. The choice of which measurement to perform is also made at each of these space-like separated locations. The natural causal explanation of the correlations that one observes in such experiments is that each measurement outcome is influenced by the local measurement setting as well as by a common cause located in the joint past of the two measurement events. But Bell's theorem [3] famously rules out this possibility: within the standard framework of causal models, if the correlations violate a Bell inequality [4], as is predicted by quantum theory and verified experimentally [5][6][7], then a common cause explanation of the correlations is ruled out. Furthermore, Ref. [2] proves that it is not possible to explain Bell correlations with classical causal models without unwelcome fine-tuning of the parameters, and this holds even for explanations that invoke exotic causal influences, such as retrocausality and superluminal signalling. In the study of classical causation, fine-tuning is normally ruled out by assumption [8]. However, the results of Ref. [2] presuppose classical models of causation. It has been suggested [2] that an appropriate quantum generalisation of these models might preserve Reichenbach's principle without fine-tuning. Such a generalisation would provide a satisfactory causal explanation of Bell experiments, and indeed of all other quantum experiments. This article seeks to develop such a model by first proposing an intrinsically quantum version of Reichenbach's principle.
Specifically, we consider the case of a quantum system A in the causal past of a bipartite quantum system BC and ask what constraints on the channel from A to BC follow from the assumption that A is the complete common cause of B and C. In this scenario we are able to find a natural quantum analog to Reichenbach's principle. This analog can be expressed in several equivalent forms, each of which naturally generalises a corresponding classical expression. In particular, one of these conditions states that A is a complete common cause of B and C if one can dilate the channel from A to BC to a unitary by introducing two ancillary systems, contained in the causal past of BC, such that each ancillary system can influence only one of B and C. This unitary dilation codifies the causal relationship between A and BC and illustrates the fact that no other system can influence both B and C. Moreover, our quantum Reichenbach's principle contains the classical version as a special case in the appropriate limit. This suggests that our quantum version is the correct way to generalise Reichenbach's principle.
The mathematical framework of causal models [8,9] can be seen as a direct generalisation of Reichenbach's principle to arbitrary causal structures. By following this classical example we are able to generalise our quantum Reichenbach's principle to a framework of quantum causal models. In both the classical and the quantum case, the corresponding version of Reichenbach's principle is recovered as a special case of the framework. Just as with classical causal models, the framework of quantum causal models allows us to analyse the causal structure of arbitrary quantum experiments. It also does so while preserving an appropriate form of Reichenbach's principle (by construction) and avoiding fine-tuning.
Although our main motivation for developing quantum causal models is the possibility of finding a satisfactory (i.e., non-fine-tuned) causal explanation of Bell inequality violations [2,10], they are also likely to have practical applications. For instance, finding quantum-classical separations in the correlations achievable in novel causal scenarios might lead to new device-independent protocols [11], such as randomness extraction and secure key distribution. Quantum causal models may also provide novel schemes for simulating many-body systems in condensed matter physics [12] and novel means for inferring the underlying causal structure from quantum correlations [13,14].
The structure of the paper is as follows. Section II provides a formal statement of Reichenbach's principle and shows how it can be rigorously justified under certain philosophical assumptions. The main body of results is in Sec. III. Here our quantum generalisation of Reichenbach's principle is presented and justified by reasoning parallel to that of the classical case. This is then fleshed out with alternative characterisations of our quantum version of conditional independence and some specific examples. We return to the classical world in Sec. IV, discussing classical causal models and providing a rigorous justification of the Markov condition, which plays the role of Reichenbach's principle for general causal structures. Sec. V then generalizes these ideas to the quantum setting and presents our proposal for quantum causal models. Finally, in Sec. VI we discuss the relationship of our proposal to prior work on quantum causal models, and in Sec. VII we summarize and describe some directions for future work.

A. Statement
Reichenbach gave his principle a formal statement in Ref. [1]. Following Ref. [15], we here distinguish two parts of the formalized principle. First is the qualitative part which expresses the intuitions described at the beginning of the introduction. The other is the quantitative part which constrains the sorts of probability distributions one should assign in the case of a common cause explanation.
The qualitative part of Reichenbach's principle may be stated as follows: if two physical variables Y and Z are found to be statistically dependent, then there should be a causal explanation of this fact, of one of the following forms:
1. Y is a cause of Z and there is no common cause of Y and Z;
2. Z is a cause of Y and there is no common cause of Y and Z;
3. there is no causal link between Y and Z, but there is a common cause X of Y and Z;
4. Y is a cause of Z and there is a common cause X of Y and Z; or
5. Z is a cause of Y and there is a common cause X of Y and Z.
Note that the causal influences considered here may be indirect (mediated by other variables). If none of these causal relations hold between Y and Z, then we refer to them as ancestrally independent (because their respective causal ancestries constitute disjoint sets). Using this terminology, the qualitative part of Reichenbach's principle can be expressed particularly succinctly in its contrapositive form as: ancestral independence implies statistical independence, i.e., P(YZ) = P(Y)P(Z). The quantitative part of Reichenbach's principle applies only to the case where the correlation between Y and Z is due purely to a common cause (case 3 above). Let X be a complete common cause for Y and Z, meaning that X is the collection of all variables acting as common causes. The quantitative part states that Y and Z must be conditionally independent given X, so the joint probability distribution P(XYZ) satisfies
$$P(YZ|X) = P(Y|X)\,P(Z|X). \tag{1}$$

B. Justifying the quantitative part of Reichenbach's principle
Within the philosophy of causality, providing an adequate justification of Reichenbach's principle is a delicate issue. It rests on controversy over basic questions, such as what it means for one variable to have a causal influence on another and what is the correct interpretation of probabilistic statements. In this section, we discuss one way of justifying the principle which provides a clean motivational story with a natural quantum analogue. This involves temporarily making a strong philosophical assumption about determinism. The reader need not accept this assumption as it is only temporarily adopted to motivate the quantitative part of Reichenbach's principle, which stands apart from the assumption.
Suppose we adopt a Bayesian point of view on probabilities: they are the degrees of belief of a rational agent. Dutch book arguments (based on the principle that a rational agent will never accept a set of bets on which they are certain to lose money) can then be given as to why probabilities should be non-negative, sum to 1, and so forth. But why should an agent who takes X to be a complete common cause for Y and Z arrange their beliefs such that P(YZ|X) = P(Y|X)P(Z|X)? If the agent does not do this, are they irrational?
One way to justify a positive answer to this question is to assume that in a classical world there is always an underlying deterministic dynamics. In this case, one variable is causally influenced by another if it has a nontrivial functional dependence upon it in the dynamics. Probabilities can be understood as arising merely due to ignorance of the values of unobserved variables. Under these assumptions, one can show that the qualitative part of Reichenbach's principle implies the quantitative part.
In general, a classical channel describing the influence of random variable X on Y is given by a conditional probability distribution P(Y|X). If the underlying dynamics is deterministic, then although the value of Y might not be completely determined by the value of X, it must be determined by the value of X together with the values of some extra, unobserved variables in the past of Y, which can collectively be denoted λ. Any variation in the value of Y for a given value of X is then explained by variation in the value of λ. This can be formalised as follows.
Definition 1 (Classical dilation). For a classical channel P(Y|X), a classical deterministic dilation is given by some random variable λ with probability distribution P(λ) and some deterministic function f from (λ, X) to Y such that
$$P(Y|X) = \sum_{\lambda} \delta\big(Y, f(\lambda, X)\big)\,P(\lambda),$$
where δ(a, b) = 1 if a = b and 0 otherwise.
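To make Definition 1 concrete, the sketch below builds one deterministic dilation for an arbitrary finite channel. It is a minimal illustration under our own choices, not the paper's construction: λ is taken to be a lookup table assigning an output to every input value, drawn with probability ∏_x P(y_x|x), and f(λ, x) simply reads off the entry for x. The function names `deterministic_dilation` and `channel_from_dilation` are hypothetical.

```python
import itertools

import numpy as np


def deterministic_dilation(P):
    """Build a deterministic dilation of a classical channel P[y, x] = P(Y=y | X=x).

    Hypothetical construction: lambda is a tuple (y_0, ..., y_{n_x - 1}) that assigns
    an output to every input value, drawn with probability prod_x P(y_x | x), and the
    deterministic function is f(lambda, x) = lambda[x].
    """
    n_y, n_x = P.shape
    lambdas = list(itertools.product(range(n_y), repeat=n_x))
    p_lambda = np.array([np.prod([P[lam[x], x] for x in range(n_x)]) for lam in lambdas])

    def f(lam, x):
        return lam[x]

    return lambdas, p_lambda, f


def channel_from_dilation(lambdas, p_lambda, f, n_y, n_x):
    """Recompute P(Y|X) = sum_lambda delta(Y, f(lambda, X)) P(lambda)."""
    Q = np.zeros((n_y, n_x))
    for lam, p in zip(lambdas, p_lambda):
        for x in range(n_x):
            Q[f(lam, x), x] += p
    return Q


# A noisy binary channel; each column is a distribution over Y.
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
lambdas, p_lambda, f = deterministic_dilation(P)
print(np.allclose(channel_from_dilation(lambdas, p_lambda, f, 2, 2), P))  # True
```

Summing over all λ with λ[x] = y leaves P(y|x) times a product of column sums equal to 1, which is why the reconstruction is exact.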
We now apply this to the situation depicted in Fig. 1, where X is the complete common cause of Y and Z. The conditional distribution P(YZ|X) admits of a dilation in terms of an ancillary unobserved variable λ, for some distribution P(λ) and a function f from (λ, X) to (Y, Z):
$$P(YZ|X) = \sum_{\lambda} \delta\big((Y, Z), f(\lambda, X)\big)\,P(\lambda).$$
The assumption that X is the complete common cause of Y and Z implies that the ancillary variable λ can be split into a pair of ancestrally independent variables, λ_Y and λ_Z, where λ_Y only influences Y and λ_Z only influences Z [68]. It follows that there must exist λ_Y and λ_Z that are causally related to X, Y and Z as depicted in Fig. 2, where the causal dependences are deterministic and given by a pair of functions f_Y and f_Z such that
$$Y = f_Y(\lambda_Y, X), \qquad Z = f_Z(\lambda_Z, X).$$
In this case, we have
$$P(YZ|X) = \sum_{\lambda_Y, \lambda_Z} \delta\big(Y, f_Y(\lambda_Y, X)\big)\,\delta\big(Z, f_Z(\lambda_Z, X)\big)\,P(\lambda_Y, \lambda_Z).$$
Finally, given the qualitative part of Reichenbach's principle, the ancestral independence of λ_Y and λ_Z in the causal structure implies that P(λ_Y, λ_Z) = P(λ_Y)P(λ_Z). The sum then splits into a factor that depends only on Y and a factor that depends only on Z,
$$P(YZ|X) = \Big[\sum_{\lambda_Y} \delta\big(Y, f_Y(\lambda_Y, X)\big)P(\lambda_Y)\Big]\Big[\sum_{\lambda_Z} \delta\big(Z, f_Z(\lambda_Z, X)\big)P(\lambda_Z)\Big] = P(Y|X)\,P(Z|X),$$
which establishes the quantitative part of Reichenbach's principle.
FIG. 1: A causal structure represented as a directed acyclic graph depicting that X is the complete common cause of Y and Z.
FIG. 2: The causal structure of Fig. 1, expanded so that Y and Z each has a latent variable as a causal parent in addition to X, so that both Y and Z can be made to depend functionally on their parents.
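The factorisation step can be checked numerically. The sketch below assumes bit-valued variables and two arbitrary, purely illustrative response functions f_Y(λ_Y, X) = λ_Y XOR X and f_Z(λ_Z, X) = λ_Z AND X (they are not taken from the text). It compares independent latent variables against perfectly correlated ones, which would play the role of an extra hidden common cause. The helper names are hypothetical.

```python
import numpy as np


def channel_YZ_given_X(p_lam, f_Y, f_Z, n_y, n_z, n_x):
    """P(YZ|X) = sum_{lY, lZ} delta(Y, f_Y(lY, X)) delta(Z, f_Z(lZ, X)) P(lY, lZ).

    p_lam[lY, lZ] is a joint distribution over the two latent variables; the
    result is indexed as [y, z, x].
    """
    P = np.zeros((n_y, n_z, n_x))
    for l_Y in range(p_lam.shape[0]):
        for l_Z in range(p_lam.shape[1]):
            for x in range(n_x):
                P[f_Y(l_Y, x), f_Z(l_Z, x), x] += p_lam[l_Y, l_Z]
    return P


def factorizes(P):
    """Check P(YZ|X) == P(Y|X) P(Z|X) for every value of X."""
    PY = P.sum(axis=1)  # P(Y|X), indexed [y, x]
    PZ = P.sum(axis=0)  # P(Z|X), indexed [z, x]
    return bool(np.allclose(P, PY[:, None, :] * PZ[None, :, :]))


def f_Y(l_Y, x):
    return l_Y ^ x  # illustrative choice


def f_Z(l_Z, x):
    return l_Z & x  # illustrative choice


independent = np.outer([0.3, 0.7], [0.6, 0.4])   # P(lY) P(lZ)
correlated = np.array([[0.5, 0.0], [0.0, 0.5]])  # lY = lZ, a hidden common cause

print(factorizes(channel_YZ_given_X(independent, f_Y, f_Z, 2, 2, 2)))  # True
print(factorizes(channel_YZ_given_X(correlated, f_Y, f_Z, 2, 2, 2)))   # False
```

With correlated latents and X = 1, the outputs are perfectly anticorrelated (Y = 1 - Z), so conditional independence given X fails.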
A well-known converse statement is also worth noting: any classical channel P (Y Z|X) satisfying P (Y Z|X) = P (Y |X)P (Z|X) admits of a dilation where X is the complete common cause of Y and Z [8].
Summarizing, we can identify what it means for P(YZ|X) to be explainable in terms of X being a complete common cause of Y and Z by appealing to the qualitative part of Reichenbach's principle and fundamental determinism. The definition can be formalized into a mathematical condition as follows.
Definition 2 (Classical compatibility). P(YZ|X) is said to be compatible with X being the complete common cause of Y and Z if one can find variables λ_Y and λ_Z, distributions P(λ_Y) and P(λ_Z), a function f_Y from (λ_Y, X) to Y and a function f_Z from (λ_Z, X) to Z, such that these constitute a dilation of P(YZ|X), that is, such that
$$P(YZ|X) = \sum_{\lambda_Y, \lambda_Z} \delta\big(Y, f_Y(\lambda_Y, X)\big)\,\delta\big(Z, f_Z(\lambda_Z, X)\big)\,P(\lambda_Y)\,P(\lambda_Z).$$
With this definition, we can summarize the result described above as follows.
Theorem 1. Given a conditional probability distribution P(YZ|X), the following are equivalent:
1. P(YZ|X) is compatible with X being the complete common cause of Y and Z.
2. P(YZ|X) = P(Y|X)P(Z|X).
The 1 → 2 implication is what establishes that a rational agent should espouse the quantitative part of Reichenbach's principle if they espouse the qualitative part and fundamental determinism.
The 2 → 1 implication allows one to deduce a possible causal explanation of an observed distribution from a feature of that distribution. However, it is important to stress that it only establishes a possible causal explanation. It does not state that this is the only causal explanation. Indeed, it may be possible to satisfy this conditional independence relation within alternative causal structures by fine-tuning the strengths of the causal dependences. However, as noted above, fine-tuned causal explanations are typically rejected as bad explanations in the field of causal inference. Therefore, the best explanation of the conditional independence of Y and Z given X is that X is the complete common cause of Y and Z.

III. THE QUANTUM VERSION OF REICHENBACH'S PRINCIPLE
In this section, we introduce our quantum version of Reichenbach's principle. The definition of a quantum causal model that we provide in Sec. V can be seen as generalizing these ideas in much the same way that classical causal models generalize the classical Reichenbach's principle.

A. Quantum preliminaries
We assume throughout that all quantum systems are finite-dimensional for simplicity. Given a quantum system A, we will write H_A for the corresponding Hilbert space, d_A for the dimension of H_A, and I_A for the identity on H_A. We will also write H*_A for the dual space to H_A, and I_{A*} for the identity on the dual space. If a quantum system is initially uncorrelated with any other system, then the most general time evolution of the system corresponds to a quantum channel, i.e., a completely positive trace-preserving (CPTP) map. If the system at the initial time is labelled A, with Hilbert space H_A, and the system at the later time is labelled B, with Hilbert space H_B, then the CPTP map is
$$\mathcal{E}_{B|A} : \mathcal{B}(\mathcal{H}_A) \to \mathcal{B}(\mathcal{H}_B),$$
where B(H) is the set of bounded operators on H.
An alternative way to express the channel E_{B|A} is as an operator, using a variant of the Choi-Jamiołkowski isomorphism [16,17]:
$$\rho_{B|A} = \sum_{i,j} \mathcal{E}_{B|A}\big(|i\rangle\langle j|_A\big) \otimes |i\rangle\langle j|_{A^*}.$$
Here, the vectors {|i⟩_A} form an orthonormal basis of the Hilbert space H_A. The vectors {|i⟩_{A*}} form the dual basis, belonging to H*_A. The operator ρ_{B|A} therefore acts on the Hilbert space H_B ⊗ H*_A. Although the expression above involves an arbitrary choice of orthonormal basis, the operator ρ_{B|A} thus defined is independent of the choice of basis. This version of the Choi-Jamiołkowski isomorphism was chosen because it is both basis-independent and a positive operator. Following [18], we have chosen the operator ρ_{B|A} to be normalized in such a way that Tr_B(ρ_{B|A}) = I_{A*} (in analogy with the normalization condition Σ_Y P(Y|X) = 1 for a classical channel P(Y|X)).
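The two properties claimed for ρ_{B|A}, positivity and Tr_B(ρ_{B|A}) = I_{A*}, are easy to check numerically. The sketch below assumes the channel is given by Kraus operators and, as a numerical convention, labels the dual basis of A* by the same indices as the computational basis of A, with tensor factors ordered B ⊗ A*. The amplitude-damping channel is only an illustrative choice, and `choi` and `trace_out_first` are hypothetical helper names.

```python
import numpy as np


def choi(kraus):
    """Variant Choi operator rho_{B|A} = sum_ij E(|i><j|_A) (x) |i><j|_{A*}.

    E is given by its Kraus operators; factors are ordered B (x) A*, and the dual
    basis of A* is labelled by the same indices as the computational basis of A.
    """
    d_out, d_in = kraus[0].shape
    rho = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            e_ij = np.zeros((d_in, d_in))
            e_ij[i, j] = 1.0                                  # |i><j|
            out = sum(K @ e_ij @ K.conj().T for K in kraus)   # E(|i><j|)
            rho += np.kron(out, e_ij)
    return rho


def trace_out_first(rho, d1, d2):
    """Partial trace over the first factor of an operator on C^d1 (x) C^d2."""
    return np.einsum('ijik->jk', rho.reshape(d1, d2, d1, d2))


# Amplitude-damping channel with damping probability gamma (illustrative choice).
gamma = 0.3
K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
rho_BA = choi([K0, K1])

print(np.allclose(trace_out_first(rho_BA, 2, 2), np.eye(2)))  # True: Tr_B rho_{B|A} = I_{A*}
print(np.linalg.eigvalsh(rho_BA).min() > -1e-12)               # True: positive operator
```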
Suppose that ρ_B = E_{B|A}(ρ_A). Given that the operator ρ_{B|A} contains all of the information about the channel E_{B|A}, the question arises of how one can express ρ_B in terms of ρ_{B|A} and ρ_A. Recall that ρ_{B|A} is defined on H_B ⊗ H*_A, while ρ_A is defined on H_A. As we discuss further in Sec. V, by defining an appropriate "linking operator" τ^{id}_{AA*} with respect to dual orthonormal bases on H_A and H*_A, one can write
$$\rho_B = \mathrm{Tr}_{AA^*}\big(\rho_{B|A}\,\tau^{\mathrm{id}}_{AA^*}\,\rho_A\big).$$
This expression is meant to be reminiscent of the classical formula P(Y) = Σ_X P(Y|X)P(X).
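The linking operator τ^{id} is only defined in Sec. V. As a hedged numerical stand-in, the sketch below uses the standard identity ρ_B = Tr_{A*}[ρ_{B|A}(I_B ⊗ ρ_A^T)] in a fixed computational basis, which we assume plays the same role. The transpose appears because ρ_A enters through the dual basis. Helper names are hypothetical, and the amplitude-damping channel and the state ρ_A are illustrative choices.

```python
import numpy as np


def choi(kraus):
    """rho_{B|A} = sum_ij E(|i><j|) (x) |i><j|, ordered as B (x) A* (computational basis)."""
    d_out, d_in = kraus[0].shape
    rho = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            e_ij = np.zeros((d_in, d_in))
            e_ij[i, j] = 1.0
            rho += np.kron(sum(K @ e_ij @ K.conj().T for K in kraus), e_ij)
    return rho


def apply_via_choi(rho_BA, rho_A):
    """rho_B = Tr_{A*}[rho_{B|A} (I_B (x) rho_A^T)]."""
    d_A = rho_A.shape[0]
    d_B = rho_BA.shape[0] // d_A
    M = rho_BA @ np.kron(np.eye(d_B), rho_A.T)
    return np.einsum('ijkj->ik', M.reshape(d_B, d_A, d_B, d_A))  # trace out A*


gamma = 0.3
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
         np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
rho_A = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])  # a valid qubit state

direct = sum(K @ rho_A @ K.conj().T for K in kraus)
print(np.allclose(apply_via_choi(choi(kraus), rho_A), direct))  # True
```

Dropping the transpose would conjugate the off-diagonal terms of ρ_A, and the check would then fail for this ρ_A.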
Given an operator ρ_{AB···|CD···}, acting on H_A ⊗ H_B ⊗ ··· ⊗ H_{C*} ⊗ H_{D*} ⊗ ···, we will use the same expression with missing indices to denote the result of taking partial traces on the corresponding factor spaces. For example, given a channel ρ_{BB̄|AĀ}, we write ρ_{B|AĀ} := Tr_{B̄}(ρ_{BB̄|AĀ}).
We will sometimes use a hat to denote an operator renormalized such that the trace is 1. For example, if ρ_{B|A} is the operator representing a channel from A to B, then ρ̂_{B|A} := ρ_{B|A}/d_A, since Tr(ρ_{B|A}) = Tr(I_{A*}) = d_A. When writing products of operators, we will sometimes suppress tensor products with identities. For example, (ρ_{B|A} ⊗ I_C)(ρ_{C|A} ⊗ I_B) will be written simply as ρ_{B|A}ρ_{C|A}.

B. Main result
The qualitative part of Reichenbach's principle can be applied to quantum theory with almost no change: if quantum systems B and C are correlated then this must have a causal explanation in one of the five forms listed above. Here, for quantum systems to be correlated means there exist measurements upon them for which the outcomes are correlated.
Finding a quantum version of the quantitative part of Reichenbach's principle is more subtle. If a quantum system A is a complete common cause of B and C (as depicted in Fig. 3), then one expects there to be some constraint analogous to P(YZ|X) = P(Y|X)P(Z|X) in the classical case. If one tries to do this by generalising the joint distribution P(XYZ), then one immediately faces the problem that textbook quantum theory has no analogue of a joint distribution for a collection of quantum systems in which some are causal descendants of others. The situation is improved if one focuses on finding an analogue of P(YZ|X) instead. In standard quantum theory, as long as a system A is initially uncorrelated with its environment, the evolution from A to BC is described by a channel ρ_{BC|A}, and this seems to be a natural analogue of P(YZ|X). However, even in this case it is not obvious what constraint on ρ_{BC|A} should serve as the analogue of the classical constraint P(YZ|X) = P(Y|X)P(Z|X).
The treatment of quantum systems over time is deferred to the full definition of quantum causal models in Sec. V. This section focuses on the case of a channel ρ BC|A .
In Sec. II B, we demonstrated how to justify the quantitative part of Reichenbach's principle from the qualitative part in the classical case by temporarily assuming that all dynamics are fundamentally deterministic. We shall now make an analogous argument in the quantum case by temporarily assuming that quantum dynamics are fundamentally unitary. Just as in the classical case, this assumption simply provides a clean way to motivate our result and the reader need not accept it as philosophically accurate since the result can stand apart from this assumption.
In general, a quantum channel from A to B is given by a CPTP map E_{B|A}. If the underlying dynamics is unitary, then the output state at B must depend unitarily on A together with some extra ancillary system λ in the past of B. This can be formalised as follows.

Definition 3 (Unitary dilation). For a quantum channel E_{B|A}, a quantum unitary dilation is given by some ancillary quantum system λ with state ρ_λ and some unitary U : H_λ ⊗ H_A → H_B ⊗ H_{B̄} such that
$$\mathcal{E}_{B|A}(\rho_A) = \mathrm{Tr}_{\bar B}\big[U(\rho_\lambda \otimes \rho_A)U^\dagger\big] \quad \text{for all } \rho_A,$$
where the dimension of B̄ is fixed by unitarity since d_λ d_A = d_B d_{B̄}.
If we represent the channels by our variant of the Choi-Jamiołkowski isomorphism, with ρ_{B|A} representing E_{B|A} and ρ^U_{BB̄|Aλ} representing U(·)U†, then the dilation equation has the form
$$\rho_{B|A} = \mathrm{Tr}_{\bar B \lambda \lambda^*}\big(\rho^U_{B\bar B|A\lambda}\,\tau^{\mathrm{id}}_{\lambda\lambda^*}\,\rho_\lambda\big),$$
where τ^{id}_{λλ*} is the linking operator defined in Sec. V. Just as in the classical case, we would like to apply this to the situation depicted in Fig. 3, where A is the complete common cause for B and C. This was easy classically, as it is clear what it means for a classical variable X to have no causal influence on another variable Y in a deterministic system. Specifically, if the collection of inputs other than X is denoted X̄, so that there is a deterministic function f such that Y = f(X, X̄), then the assumption that X has no causal influence on Y is formalized as f(X, X̄) = f′(X̄) for some function f′. In unitary quantum theory the corresponding condition is less obvious, so we spell it out explicitly with a definition.
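One standard way to show that every channel has a unitary dilation in the sense of Definition 3 is the Stinespring construction. The sketch below uses it under our own simplifying assumptions: d_A = d_B, λ prepared in |0⟩, and d_λ = d_{B̄} equal to the number of Kraus operators, so that d_λ d_A = d_B d_{B̄}. It stacks the Kraus operators into an isometry and completes it to a unitary with an SVD. Helper names are hypothetical.

```python
import numpy as np


def unitary_dilation(kraus):
    """Hypothetical helper: unitary U on lambda (x) A -> B (x) Bbar dilating a channel.

    Assumes d_A == d_B; lambda and Bbar then both have dimension n = len(kraus),
    consistent with d_lambda * d_A == d_B * d_Bbar. Lambda is prepared in |0>.
    """
    n = len(kraus)
    d_B, d_A = kraus[0].shape
    assert d_A == d_B, "sketch only covers d_A == d_B"
    # Isometry V|a> = sum_k K_k|a> (x) |k>_Bbar; row index is b * n + k.
    V = np.stack(kraus, axis=1).reshape(d_B * n, d_A)
    # Columns of W beyond d_A span the orthogonal complement of range(V).
    W, _, _ = np.linalg.svd(V, full_matrices=True)
    return np.hstack([V, W[:, d_A:]])  # first d_A columns <-> lambda = |0>


def dilated_channel(U, rho_A, n):
    """Tr_Bbar[U (|0><0|_lambda (x) rho_A) U^dag], with Bbar of dimension n."""
    proj0 = np.zeros((n, n))
    proj0[0, 0] = 1.0
    out = U @ np.kron(proj0, rho_A) @ U.conj().T
    d_B = out.shape[0] // n
    return np.einsum('ikjk->ij', out.reshape(d_B, n, d_B, n))


gamma = 0.3
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
         np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
U = unitary_dilation(kraus)
rho_A = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])

print(np.allclose(U.conj().T @ U, np.eye(4)))  # True: U is unitary
print(np.allclose(dilated_channel(U, rho_A, 2),
                  sum(K @ rho_A @ K.conj().T for K in kraus)))  # True
```

The SVD is used instead of a QR completion because QR may multiply the columns of V by phases, and that would change the dilated channel.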

Definition 4 (No influence). Consider a unitary channel
Equivalently, A has no causal influence on B in some unitary channel if and only if the marginal output state at B is independent of any operations performed on A before A enters the channel. There is a rich literature concerning similar properties of unitary operators from various perspectives. In particular, the results of Ref. [19] are very close to ours (where they use the phrase "nonsignalling" rather than "no causal influence"), and Refs. [20,21] contain similar results (where they say "semi-causal" rather than "no causal influence").
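As a hedged illustration of the operational criterion above, the sketch below tests two-qubit unitaries. We assume that "input wire k has no causal influence on output wire j" is equivalent to the Choi operator of the reduced channel (all inputs → output j) factorising as R′ ⊗ I on the dual of input k, which is how we read "the marginal at B is independent of operations on A". The helper names and the wire-numbering convention are hypothetical. The CNOT case shows a distinctly quantum feature: the target input influences the control output through phase kickback.

```python
import numpy as np


def reduced_choi(U, out_wire):
    """Choi tensor of a two-qubit unitary U, keeping only output wire `out_wire`.

    U maps input wires (0, 1) to output wires (0, 1), with the first tensor factor
    most significant. Returned indices are (o, i0, i1, o', j0, j1).
    """
    T = U.reshape(2, 2, 2, 2)  # T[o0, o1, i0, i1] = <o0 o1| U |i0 i1>
    if out_wire == 0:
        return np.einsum('abcd,ebfg->acdefg', T, T.conj())  # trace out output wire 1
    return np.einsum('bacd,befg->acdefg', T, T.conj())      # trace out output wire 0


def no_influence(U, in_wire, out_wire):
    """True if the reduced Choi operator factorises as R' (x) I on the dual of `in_wire`."""
    R = reduced_choi(U, out_wire)
    I2 = np.eye(2)
    if in_wire == 0:
        S = np.einsum('acdecg->adeg', R) / 2   # (1/2) trace over dual input wire 0
        expected = np.einsum('adeg,cf->acdefg', S, I2)
    else:
        S = np.einsum('acdefd->acef', R) / 2   # (1/2) trace over dual input wire 1
        expected = np.einsum('acef,dg->acdefg', S, I2)
    return bool(np.allclose(R, expected))


IDENTITY = np.eye(4)
SWAP = np.eye(4)[[0, 2, 1, 3]]
CNOT = np.eye(4)[[0, 1, 3, 2]]  # control on wire 0, target on wire 1

print(no_influence(IDENTITY, in_wire=1, out_wire=0))  # True
print(no_influence(SWAP, in_wire=0, out_wire=0))      # True
print(no_influence(CNOT, in_wire=1, out_wire=0))      # False: phase kickback
```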
We can now apply this to the complete common cause situation of Fig. 3. The channel E_{BC|A} admits a unitary dilation in terms of an ancillary system λ, for some state ρ_λ and unitary U from λA to BDC. Here, an ancillary output D is generally required so that the dimensions of inputs and outputs match, but it plays no further role and will always be traced out. This dilation is such that
$$\mathcal{E}_{BC|A}(\rho_A) = \mathrm{Tr}_D\big[U(\rho_\lambda \otimes \rho_A)U^\dagger\big].$$
Just as in Sec. II B, the assumption that A is a complete common cause for B and C implies that the ancilla λ can factorise into ancestrally independent λ_B and λ_C, where λ_B has no causal influence on C and λ_C has no causal influence on B. It follows that systems λ_B and λ_C exist and are causally related to A, B, and C as depicted in Fig. 4, where the channel is unitary and given by U.
FIG. 4: The causal structure of Fig. 3, expanded so that B and C each has a latent system as a causal parent in addition to A. By analogy with the classical case, we take B and C to depend unitarily on λ_B, A, and λ_C.
The ancestral independence of λ_B and λ_C implies that the input quantum state factorises, ρ_{λ_B A λ_C} = ρ_{λ_B} ⊗ ρ_A ⊗ ρ_{λ_C}, suggesting the following quantum analog to our "classical compatibility" condition.
Definition 5 (Quantum compatibility). ρ_{BC|A} is said to be compatible with A being a complete common cause of B and C if it is possible to find ancillary quantum systems λ_B and λ_C, states ρ_{λ_B} and ρ_{λ_C}, and a unitary channel U from λ_B A λ_C to BDC, where λ_B has no causal influence on C and λ_C has no causal influence on B, such that these constitute a dilation of ρ_{BC|A}, that is, such that E_{BC|A}(ρ_A) = Tr_D[U(ρ_{λ_B} ⊗ ρ_A ⊗ ρ_{λ_C})U†] for all ρ_A.
All that remains is to show that this, together with the qualitative part of the quantum Reichenbach's principle, implies an appropriate quantitative part (generalising Thm. 1).

Theorem 2.
The following are equivalent:
1. ρ_{BC|A} is compatible with A being the complete common cause of B and C.
2. ρ_{BC|A} = ρ_{B|A}ρ_{C|A}.
The proof is in Appendix A. Note that there is no ordering ambiguity on the right-hand side of the second condition, because the two terms must commute. This is seen by taking the Hermitian conjugate of both sides of the equation.
The strong analogy that exists between Thms. 1 and 2 suggests the following definition: Definition 6 (Quantum conditional independence of outputs given input). Given a quantum channel ρ BC|A , the outputs are said to be quantum conditionally independent given the input, if and only if ρ BC|A = ρ B|A ρ C|A .
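Definition 6 can be checked directly on a channel operator. The sketch below assumes factors ordered B ⊗ C ⊗ A* and implements ρ_{B|A}ρ_{C|A} with the suppressed identities made explicit. It applies the check to two illustrative qubit channels, a measure-and-prepare ("incoherent") copy and a coherent copy into a GHZ-type operator. The helpers `qci` and `ket` are hypothetical names.

```python
import numpy as np


def qci(rho, dB, dC, dA):
    """Check rho_{BC|A} == (rho_{B|A} (x) I_C)(rho_{C|A} (x) I_B) for rho on B (x) C (x) A*."""
    T = rho.reshape(dB, dC, dA, dB, dC, dA)
    rho_B = np.einsum('bcaBcA->baBA', T)   # Tr_C
    rho_C = np.einsum('bcabCA->caCA', T)   # Tr_B
    n = dB * dC * dA
    B_full = np.einsum('baBA,cC->bcaBCA', rho_B, np.eye(dC)).reshape(n, n)
    C_full = np.einsum('caCA,bB->bcaBCA', rho_C, np.eye(dB)).reshape(n, n)
    return bool(np.allclose(B_full @ C_full, rho))


def ket(bits):
    """Computational-basis vector |bits> on len(bits) qubits."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v


incoherent = np.outer(ket('000'), ket('000')) + np.outer(ket('111'), ket('111'))
ghz = ket('000') + ket('111')
coherent = np.outer(ghz, ghz)

print(qci(incoherent, 2, 2, 2))  # True
print(qci(coherent, 2, 2, 2))    # False
```

For the coherent copy, each marginal ρ_{B|A} and ρ_{C|A} is already dephased, so their product loses the off-diagonal GHZ terms and cannot reproduce ρ_{BC|A}.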
It is easily seen that the quantum definition reduces to the classical definition in the case that the channel ρ BC|A is invariant under the operation of completely dephasing the systems A, B, and C in some basis. More precisely: if fixed bases are chosen for H A , H B , H C , and the operator ρ BC|A is diagonal when written with respect to the tensor product of these bases, then the outputs are quantum conditionally independent given the input if and only if the classical channel defined by the diagonal elements of the matrix has the property that the outputs are conditionally independent given the input.
With this terminological convention in hand, we can express our quantum version of the quantitative part of Reichenbach's principle as follows: if a channel ρ BC|A is compatible with A being a complete common cause of B and C, then for this channel B and C are quantum conditionally independent given A.
The 1 → 2 implication in the theorem is what establishes the quantum version of the quantitative part of Reichenbach's principle.
The 2 → 1 implication is pertinent to causal inference: analogously to the classical case, if one grants the implausibility of fine-tuning, then one must grant that the most plausible explanation of the quantum conditional independence of outputs B and C given input A is that A is a complete common cause of B and C.
Thm. 2, and the surrounding discussion, motivates the definition of quantum causal models given in Sec. V. For the rest of this section we make some further remarks about the quantum version of Reichenbach's principle.

C. Alternative expressions for quantum conditional independence of outputs given input
Classically, conditional independence of Y and Z given X is standardly expressed as P (Y Z|X) = P (Y |X)P (Z|X). However, there are alternative ways of expressing this constraint.
For instance, if one defines the joint distribution over X, Y, Z that one obtains by feeding the uniform distribution on X into the channel P(YZ|X), that is,
$$\tilde P(XYZ) = \frac{1}{d_X}\,P(YZ|X),$$
where d_X is the cardinality of X, then Y and Z being conditionally independent given X in P(YZ|X) can be expressed as the vanishing of the conditional mutual information of Y and Z given X in the distribution P̃(XYZ) [8]. This is defined as
$$I(Y:Z|X) = H(XY) + H(XZ) - H(X) - H(XYZ),$$
with H(·) denoting the Shannon entropy of the marginal on the subset of variables indicated in its argument. Therefore, the condition is simply I(Y:Z|X) = 0.
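The entropic criterion can be computed in a few lines. The sketch below feeds a uniform X into a channel stored as an array indexed [y, z, x] and evaluates I(Y:Z|X) = H(XY) + H(XZ) - H(X) - H(XYZ) in bits. Two illustrative channels are compared: the classical copy, and a channel that ignores X and outputs a shared uniformly random bit on both Y and Z. The helper names are hypothetical.

```python
import numpy as np


def shannon(p):
    """Shannon entropy in bits of a probability vector (zeros ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def cond_mutual_info(P_YZgX):
    """I(Y:Z|X) for the joint P~(XYZ) = P(YZ|X) / d_X obtained from a uniform input."""
    d_X = P_YZgX.shape[2]
    J = P_YZgX / d_X                          # J[y, z, x] = P~(x, y, z)
    H_XYZ = shannon(J.ravel())
    H_XY = shannon(J.sum(axis=1).ravel())     # marginal on (Y, X)
    H_XZ = shannon(J.sum(axis=0).ravel())     # marginal on (Z, X)
    H_X = shannon(J.sum(axis=(0, 1)))
    return H_XY + H_XZ - H_X - H_XYZ


# Classical copy: P(YZ|X) = delta(Y, X) delta(Z, X).
copy_channel = np.zeros((2, 2, 2))
copy_channel[0, 0, 0] = copy_channel[1, 1, 1] = 1.0

# Shared noise: X is ignored and Y = Z is a uniformly random bit.
shared_noise = np.zeros((2, 2, 2))
shared_noise[0, 0, :] = shared_noise[1, 1, :] = 0.5

print(cond_mutual_info(copy_channel))   # zero: conditionally independent given X
print(cond_mutual_info(shared_noise))   # one bit: correlation not explained by X
```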
Similarly, if Y and Z are conditionally independent given X in P (Y Z|X), then it is possible to mathematically represent the channel P (Y Z|X) as the following sequence of operations: copy X, then process one copy into Y via the channel P (Y |X) and process the other into Z via the channel P (Z|X).
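The copy-then-process representation amounts to summing over the two copies of X. The sketch below writes the copy as a tensor δ(x1, x)δ(x2, x) and contracts it with the two local channels. It confirms that the composite equals P(Y|X)P(Z|X) and that its marginals recover the local channels. The helper name and the example channels are illustrative.

```python
import numpy as np


def copy_then_process(PY, PZ):
    """P(YZ|X) realised as: copy X -> (X1, X2), then X1 -> Y via PY and X2 -> Z via PZ.

    PY[y, x] = P(Y=y|X=x) and PZ[z, x] = P(Z=z|X=x); the result is indexed [y, z, x].
    """
    d_X = PY.shape[1]
    copy = np.zeros((d_X, d_X, d_X))   # copy[x1, x2, x] = delta(x1, x) delta(x2, x)
    for x in range(d_X):
        copy[x, x, x] = 1.0
    return np.einsum('ya,zb,abx->yzx', PY, PZ, copy)


PY = np.array([[0.9, 0.2], [0.1, 0.8]])
PZ = np.array([[0.5, 0.0], [0.3, 0.4], [0.2, 0.6]])  # Z takes three values
P = copy_then_process(PY, PZ)
print(np.allclose(P, PY[:, None, :] * PZ[None, :, :]))  # True: P(YZ|X) = P(Y|X) P(Z|X)
```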
We present here the quantum analogues of these alternative expressions. They will be found to be useful for developing intuitions about quantum conditional independence and in the proofs. Recall that the quantum conditional mutual information of B and C given A is
$$I(B:C|A) = S(AB) + S(AC) - S(A) - S(ABC),$$
where S(·) denotes the von Neumann entropy of the reduced trace-one quantum state on the subsystem that is specified by its argument.
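The quantum conditional mutual information of a trace-one operator ρ̂_{BC|A} can be computed from partial traces and eigenvalues. The sketch below is a minimal illustration with hypothetical helper names. It evaluates two illustrative channels. The first is the identity channel from A = A_L A_R to B, C, a case where A factorises into parts feeding B and C separately. The second ignores A and prepares a Bell pair on BC.

```python
import numpy as np


def partial_state(rho, dims, keep):
    """Reduced state of rho on the subsystems listed in `keep` (all others traced out)."""
    T = rho.reshape(list(dims) * 2)
    for k in sorted(set(range(len(dims))) - set(keep), reverse=True):
        m = T.ndim // 2                       # number of subsystems still present
        T = np.trace(T, axis1=k, axis2=k + m)
    d = int(np.prod([dims[k] for k in keep]))
    return T.reshape(d, d)


def vn_entropy(rho):
    """von Neumann entropy in bits."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))


def qcmi(rho, dims, B, C, A):
    """I(B:C|A) = S(AB) + S(AC) - S(A) - S(ABC); B, C, A are lists of subsystem indices."""
    def S(keep):
        return vn_entropy(partial_state(rho, dims, sorted(keep)))
    return S(A + B) + S(A + C) - S(A) - S(A + B + C)


# Identity channel A_L A_R -> B C, trace-one, ordered B, C, A_L*, A_R*.
phi = np.eye(4).reshape(-1)
identity_state = np.outer(phi, phi) / 4

# Channel that ignores A and prepares a Bell pair on BC, ordered B, C, A*.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell_state = np.kron(np.outer(bell, bell), np.eye(2) / 2)

print(qcmi(identity_state, [2, 2, 2, 2], B=[0], C=[1], A=[2, 3]))  # approximately 0
print(qcmi(bell_state, [2, 2, 2], B=[0], C=[1], A=[2]))            # approximately 2
```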

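The Hilbert space for the A system can be decomposed as
$$\mathcal{H}_A = \bigoplus_i \mathcal{H}_{A^L_i} \otimes \mathcal{H}_{A^R_i}, \qquad \text{such that} \qquad \rho_{BC|A} = \bigoplus_i \rho_{B|A^L_i} \otimes \rho_{C|A^R_i}.$$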
The equivalence of conditions 3 and 4 follows as a corollary of Thm. 6 of Ref. [22]; our main contribution is showing that these are also equivalent to conditions 1 and 2 of Thm. 2.
The final condition can be described as follows. First one imagines decomposing the system A into a direct sum of subspaces, each of which is denoted A_i. For each i, the subspace A_i is split into two factors, denoted A^L_i and A^R_i, with one factor evolving via a channel ρ_{B|A^L_i} into system B, and the other factor evolving via ρ_{C|A^R_i} into system C. In the special case where there is only a single value of i, this is simply a factorization of the A system into two parts. In the special case where all of the A^L_i and A^R_i are 1-dimensional Hilbert spaces, it is simply a coherent version of a classical copy operation.

D. Circuit representations
It is instructive to summarize the contents of Thms. 1 and 2 using circuit diagrams.
Consider Fig. 5. Equality (4) in the figure asserts that Y and Z are conditionally independent given X in P(YZ|X). Here conditional independence is expressed in the form of the classical analogue of condition 4 in Thm. 3, based on making a copy of X, as discussed at the beginning of Sec. III C. Equality (1) asserts that the conditional probability distribution P(YZ|X) always admits of a classical dilation of the sort described below Definition 1, while equality (3) asserts simply that each of P(Y|X) and P(Z|X) admits of such a classical dilation. Finally, equality (2) asserts that P(YZ|X) is compatible with X being a complete common cause of Y and Z.
Analogous circuit diagrams can be provided in the quantum case, as depicted in Fig. 6. In this case, equality (4) asserts that the channel ρ_{BC|A} is such that B and C are quantum conditionally independent given A, expressed in the form of condition 4 of Thm. 3. For each i, the left-hand wire carries the factor H_{A^L_i} and the right-hand wire the factor H_{A^R_i}. The gate applied to the left-hand wire is decorated by a set of quantum channels, also indexed by i, representing the fact that the gate acts as the ith channel on the ith factor.

Equality (1) in Fig. 6 simply asserts the fact that ρ_{BC|A} admits of a unitary dilation. Equality (3) in Fig. 6 asserts that for each i, the channel ρ_{B|A^L_i} admits a unitary dilation (where the unitary is denoted V_i and acts on H_{λ_B} ⊗ H_{A^L_i}) and the channel ρ_{C|A^R_i} admits a unitary dilation (where the unitary is denoted B_i and acts on H_{λ_C} ⊗ H_{A^R_i}). Finally, equality (2) asserts that ρ_{BC|A} is compatible with A being a complete common cause of B and C by depicting conditions under which λ_B has no influence on C and λ_C has no influence on B.

A unitary transformation
Consider the case in which inputs A and D evolve, via a generic unitary transformation U into outputs B and C. In Fig. 7, we illustrate the circuit and the corresponding causal diagram.
The channel ρ BC|AD which one obtains in this case is compatible with the complete common cause of B and C being the composite system AD. This follows from the fact that ρ BC|AD is its own dilation and hence trivially satisfies the condition for compatibility laid out in Def. 5. It follows from Thm. 2 that for such a ρ BC|AD , the outputs B and C are quantum conditionally independent given the input AD, which means that ρ BC|AD = ρ B|AD ρ C|AD , as can also be verified by direct calculation. Similarly, the alternative expressions for this sort of quantum conditional independence, namely, conditions 3 and 4 of Thm. 3, can be verified to hold.
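The claim that a unitary channel satisfies ρ_{BC|AD} = ρ_{B|AD}ρ_{C|AD} can be spot-checked numerically. The sketch below assumes two-qubit inputs and outputs, with the composite input AD treated as a single 4-dimensional system whose dual is ordered after BC. It uses random unitaries obtained from a QR decomposition; they are unitary, although we make no claim that they are Haar-distributed, which does not matter for this check. Helper names are hypothetical.

```python
import numpy as np


def qci(rho, dB, dC, dA):
    """Check rho_{BC|A} == (rho_{B|A} (x) I_C)(rho_{C|A} (x) I_B) for rho on B (x) C (x) A*."""
    T = rho.reshape(dB, dC, dA, dB, dC, dA)
    rho_B = np.einsum('bcaBcA->baBA', T)   # Tr_C
    rho_C = np.einsum('bcabCA->caCA', T)   # Tr_B
    n = dB * dC * dA
    B_full = np.einsum('baBA,cC->bcaBCA', rho_B, np.eye(dC)).reshape(n, n)
    C_full = np.einsum('caCA,bB->bcaBCA', rho_C, np.eye(dB)).reshape(n, n)
    return bool(np.allclose(B_full @ C_full, rho))


def unitary_choi(U):
    """rho_{BC|AD} = sum_ij U|i><j|U^dag (x) |i><j|, ordered as (B C) (x) (A D)*."""
    vec = U.reshape(-1)   # vec[o * d + i] = <o|U|i>
    return np.outer(vec, vec.conj())


rng = np.random.default_rng(1)
Z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(Z)   # a random two-qubit unitary AD -> BC
print(qci(unitary_choi(U), dB=2, dC=2, dA=4))  # True
```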

Coherent copy vs. incoherent copy
Consider the simple example of a classical channel taking $X$ to $Y, Z$, where $X$, $Y$, $Z$ are bit-valued and
$$P(YZ|X) = \delta_{Y,X}\,\delta_{Z,X}. \qquad (8)$$
The outputs of the channel are conditionally independent given the input. In particular, if $X$ is some variable that obtains at an earlier time, and $Y$, $Z$ are two variables at a later time, with the channel describing the evolution of $X$ into $Y, Z$, then it is intuitively obvious that variation in $X$ fully explains any correlation between $Y$ and $Z$. Indeed, this example may be seen as the paradigmatic case of the explanation of classical correlations via a complete common cause.
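A short pure-Python check of the two properties just described for the classical copy $P(YZ|X) = \delta_{Y,X}\delta_{Z,X}$: the channel factorizes as $P(Y|X)P(Z|X)$, yet for a uniformly distributed $X$ (our illustrative choice) the outputs $Y$ and $Z$ are perfectly correlated. The function names are ours.

```python
from itertools import product

def copy_channel(y, z, x):
    """Classical copy P(YZ|X) = delta_{Y,X} delta_{Z,X} on bits."""
    return 1 if (y == x and z == x) else 0

def p_y(y, x):
    """Marginal channel P(Y|X), summing out Z."""
    return sum(copy_channel(y, z, x) for z in (0, 1))

def p_z(z, x):
    """Marginal channel P(Z|X), summing out Y."""
    return sum(copy_channel(y, z, x) for y in (0, 1))

# Conditional independence: P(YZ|X) = P(Y|X) P(Z|X) for all y, z, x.
factorizes = all(copy_channel(y, z, x) == p_y(y, x) * p_z(z, x)
                 for y, z, x in product((0, 1), repeat=3))

# With X uniform, Y and Z are nonetheless perfectly correlated.
p_yz = {(y, z): sum(0.5 * copy_channel(y, z, x) for x in (0, 1))
        for y, z in product((0, 1), repeat=2)}
print(factorizes, p_yz)
```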
One quantum analogue of this channel is the incoherent copy of a qubit: a qubit $A$ is measured in the computational basis; if 0 is obtained, then prepare the state $|00\rangle_{BC}$, and if 1 is obtained, prepare $|11\rangle_{BC}$. The operator representing this channel is
$$\rho_{BC|A} = |000\rangle\langle 000|_{BCA^*} + |111\rangle\langle 111|_{BCA^*}.$$
It is easily verified that this operator satisfies each of the conditions of Thm. 2. The decomposition of the $A$ Hilbert space implied by Condition 4 is
$$\mathcal{H}_A = (\mathbb{C} \otimes \mathbb{C}) \oplus (\mathbb{C} \otimes \mathbb{C}),$$
where $\mathbb{C}$ is the 1-dimensional complex Hilbert space, i.e., the complex numbers.
The other direct quantum analogue of the classical copy above is the coherent copy of a qubit, with
$$\rho_{BC|A} = \left(|000\rangle_{BCA^*} + |111\rangle_{BCA^*}\right)\left(\langle 000|_{BCA^*} + \langle 111|_{BCA^*}\right),$$
which corresponds to an unnormalized GHZ state. It can easily be verified that $I(B:C|A) = 1$ for a trace-one version of this state; hence it is not the case that the outputs $B$ and $C$ are quantum conditionally independent given the input $A$. There is, then, no way in which this channel can arise as a marginal channel in a situation in which $A$ is the quantum complete common cause of $B$ and $C$.
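The value $I(B:C|A) = 1$ for the trace-one coherent copy can be confirmed numerically, and for contrast the same computation gives $I(B:C|A) = 0$ for the trace-one incoherent copy $\frac{1}{2}(|000\rangle\langle 000| + |111\rangle\langle 111|)_{BCA^*}$. This is a minimal NumPy sketch; the subsystem ordering $B, C, A^*$ and all helper names are our own assumptions.

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Reduce rho, on subsystems of dimensions `dims`, to the subsystems in `keep`."""
    n = len(dims)
    t = rho.reshape(list(dims) * 2)
    # Trace out from the highest index down so lower axis positions stay valid.
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        m = t.ndim // 2
        t = np.trace(t, axis1=i, axis2=i + m)
    d = int(np.prod([dims[i] for i in sorted(keep)]))
    return t.reshape(d, d)

def entropy(rho):
    """von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def cmi_bc_given_a(rho):
    """I(B:C|A) = S(BA) + S(CA) - S(A) - S(BCA), subsystems ordered B, C, A*."""
    dims = [2, 2, 2]

    def S(keep):
        return entropy(partial_trace(rho, dims, keep))

    return S([0, 2]) + S([1, 2]) - S([2]) - S([0, 1, 2])

ket000 = np.eye(8)[0]  # |000>_{BCA*}
ket111 = np.eye(8)[7]  # |111>_{BCA*}

ghz = (ket000 + ket111) / np.sqrt(2)
coherent = np.outer(ghz, ghz)  # trace-one coherent copy (GHZ state)
incoherent = 0.5 * (np.outer(ket000, ket000) + np.outer(ket111, ket111))  # trace-one incoherent copy

print(cmi_bc_given_a(coherent), cmi_bc_given_a(incoherent))
```

The two printed values are 1 and 0 up to floating-point rounding: the only difference between the two operators is the off-diagonal coherence, which makes the joint $BCA^*$ state pure in the coherent case.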
At first blush, this conclusion may seem surprising. Given the evolution of Eq. (9), where would correlations between outputs B and C come from, other than being completely explained by the input A?
The puzzle is resolved by considering the dilation of the coherent copy to a unitary transformation, and the interpretation of quantum pure states. Consider a CNOT gate with $A$ as the control and an ancilla $\lambda$, prepared in the state $|0\rangle$, as the target, where the control output is $B$ and the target output is $C$; its classical counterpart is a classical CNOT acting on $X$ and $\lambda$, with outputs $Y$ and $Z$. In the classical case, there are two reasons why any correlation between $Y$ and $Z$ must be entirely explained by statistical variation in the value of $X$. First, the ancillary variable $\lambda$ is prepared deterministically with value 0, so there is no possibility that statistical variation in the value of $\lambda$ underwrites the correlations between $Y$ and $Z$. Second, the classical CNOT gate (which one easily verifies to reduce to the classical copy of Eq. (8) when one sets $\lambda$ to 0) has the causal structure depicted in Fig. 8, so that $\lambda$ does not act as a common cause of $Y$ and $Z$ but only as a local cause of $Z$.
In the quantum case, there is no analogue for either of these reasons. Concerning the second reason, the quantum CNOT has the causal structure depicted in Fig. 9: not only does $A$ have a causal influence on $C$, but $\lambda$ has a causal influence on $B$ as well. In other words, unlike the classical CNOT, the quantum CNOT exhibits a back action of the target on the control. It follows that in the quantum case, $\lambda$ can act as a common cause of $B$ and $C$. Concerning the first reason, the ancilla is prepared in a quantum pure state $|0\rangle$. This is disanalogous to a point distribution on the value 0 for the classical variable $\lambda$ if one takes the view that a quantum pure state represents maximal but incomplete information about a quantum system [24][25][26][27][28]. In this case, one must allow for the possibility that some correlation between $B$ and $C$ is due to the ancilla, in which case $A$ is not the complete common cause of $B$ and $C$ [70].
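Both causal claims about the CNOT can be checked directly. In the sketch below (our conventions: computational basis, control as the first tensor factor), the classical CNOT with $\lambda = 0$ reproduces the classical copy and its control output never depends on the target input. The quantum CNOT, conjugated by Hadamards on both qubits, equals a CNOT with control and target exchanged, so changing the target input from $|+\rangle$ to $|-\rangle$ changes the control output (phase kickback), which is the back action of $\lambda$ on $B$.

```python
import numpy as np

def classical_cnot(x, lam):
    """Classical CNOT: control x passes through; target becomes lam XOR x."""
    return x, lam ^ x

# With the ancilla fixed to 0, the classical CNOT is the classical copy.
copies = all(classical_cnot(x, 0) == (x, x) for x in (0, 1))
# The control output never depends on the target input: no back action.
no_back_action = all(classical_cnot(x, 0)[0] == classical_cnot(x, 1)[0] for x in (0, 1))

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])      # control = 1st qubit
CNOT_REV = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]])  # control = 2nd qubit

# In the Hadamard basis, control and target exchange roles.
roles_swap = np.allclose(np.kron(H, H) @ CNOT @ np.kron(H, H), CNOT_REV)

plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)
# Same control input |+>, different target inputs: the control output changes.
out_plus = CNOT @ np.kron(plus, plus)    # |+>|+>
out_minus = CNOT @ np.kron(plus, minus)  # |->|->
print(copies, no_back_action, roles_swap)  # True True True
```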
F. Generalization to one input, k outputs
Thms. 2 and 3, which apply to quantum channels with one input and two outputs, can be generalized to the case of one input and $k$ outputs.
Consider a channel $\rho_{B_1 \ldots B_k|A}$, and let $\bar{B}_i$ denote the collection of all outputs apart from $B_i$. The notion of quantum compatibility from Def. 5 generalizes in the obvious way: $\rho_{B_1 \ldots B_k|A}$ is said to be compatible with $A$ being a complete common cause of $B_1, \ldots, B_k$ if it is possible to find ancillary quantum systems $\lambda_1, \ldots, \lambda_k$, states $\rho_{\lambda_1}, \ldots, \rho_{\lambda_k}$, and a unitary channel in which, for each $i$, $\lambda_i$ has no causal influence on $\bar{B}_i$, such that these constitute a dilation of $\rho_{B_1 \ldots B_k|A}$.
The generalization of Thms. 2 and 3, consolidated into a single theorem, is as follows:
Theorem 4. The following are equivalent:
The proof is in Appendix B. By analogy to the classical case, we refer to conditions 2, 3 and 4 of Thm. 4 as $B_1 \ldots B_k$ being quantum conditionally independent given $A$ for the channel $\rho_{B_1 \ldots B_k|A}$.

A. Definitions
Reichenbach's principle is important because it generalizes to the modern formalism of causal models [8,9].
A causal model consists of two entities: (i) a causal structure, represented by a directed acyclic graph (DAG) where the nodes represent random variables and the directed edges represent the directed causal influences among these (several examples have already been presented in this article), and (ii) some parameters, which specify the strength of the causal dependences. Some terminology is required to present the formal definitions.
Given a DAG with nodes $X_1, \ldots, X_n$, let $\mathrm{Parents}(i)$ denote the parents of node $X_i$, that is, the set of nodes that have an arrow into $X_i$, and let $\mathrm{Children}(i)$ denote the children of node $X_i$, that is, the set of nodes $X_j$ such that there is an arrow from $X_i$ to $X_j$. The descendants of $X_i$ are those nodes $X_j$, $j \neq i$, such that there is a directed path from $X_i$ to $X_j$. The ancestors of $X_i$ are those nodes $X_j$ such that $X_i$ is a descendant of $X_j$.
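These graph-theoretic notions are straightforward to compute. The pure-Python sketch below encodes a DAG as a list of directed edges; as an illustrative instance it uses the structure the paper later describes for Fig. 11 (the arrows $A \to B$, $A \to C$, $B \to C$ of Fig. 10, plus a node `lam` pointing into $B$ and $C$). The function names are ours.

```python
from collections import deque

# Fig. 10 (A -> B, A -> C, B -> C) plus an extra common cause "lam" of B and C.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("lam", "B"), ("lam", "C")]

def parents(node, edges=EDGES):
    return {u for u, v in edges if v == node}

def children(node, edges=EDGES):
    return {v for u, v in edges if u == node}

def descendants(node, edges=EDGES):
    """Nodes reachable from `node` by a directed path (never the node itself in a DAG)."""
    seen, queue = set(), deque(children(node, edges))
    while queue:
        n = queue.popleft()
        if n not in seen:
            seen.add(n)
            queue.extend(children(n, edges))
    return seen

def ancestors(node, edges=EDGES):
    """Nodes X_j such that `node` is a descendant of X_j."""
    nodes = {x for edge in edges for x in edge}
    return {m for m in nodes if node in descendants(m, edges)}

print(parents("C"), descendants("A"), ancestors("B"))
```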

Definition 7. A causal model specifies a DAG, with nodes corresponding to random variables $X_1, \ldots, X_n$, and a family of conditional probability distributions $\{P(X_i|\mathrm{Parents}(i))\}$, one for each $i$.
Definition 8. Given a DAG, with random variables $X_1, \ldots, X_n$ for nodes, and given an arbitrary joint distribution $P(X_1 \ldots X_n)$, the distribution is said to be Markov for the graph if and only if it can be written in the form
$$P(X_1 \ldots X_n) = \prod_{i=1}^{n} P(X_i|\mathrm{Parents}(i)).$$
(Recall that each conditional $P(X_i|\mathrm{Parents}(i))$ can be computed from the joint $P(X_1 \ldots X_n)$.)
The generalization of Reichenbach's principle that is afforded by the formalism of causal models is this: if there are statistical dependences among variables $X_1, \ldots, X_n$, expressed in the particular form of the joint distribution $P(X_1 \ldots X_n)$, then there should be a causal explanation of these dependences in terms of a DAG relative to which the distribution $P(X_1 \ldots X_n)$ is Markov.
Note that an alternative way of formalizing the Markov property is that $P(X_1 \ldots X_n)$ is Markov for the graph if and only if, for each $i$, $P(X_i|\mathrm{Parents}(i)) = P(X_i|\mathrm{Nondesc}(i))$, where $\mathrm{Nondesc}(i)$ is the set of nondescendants of node $X_i$. The intuitive idea is that the parents of a node screen off that node from its other nondescendants: once the values of the parents are fixed, the values of the other nondescendant nodes are irrelevant to the value of $X_i$.
The qualitative and quantitative parts of Reichenbach's principle are easily seen to be special cases of the requirement that, for a joint distribution to be explainable by the causal structure of some DAG, it must be Markov for that DAG. If two variables, $Y$ and $Z$, are ancestrally independent in the graph, then any distribution that is Markov for this graph must factorize on these, $P(YZ) = P(Y)P(Z)$, which is the qualitative part of Reichenbach's principle in its contrapositive form; if two variables, $Y$ and $Z$, have a variable $X$ as a complete common cause, as in the DAG of Fig. 1, then any distribution that is Markov for the graph must satisfy $P(YZ|X) = P(Y|X)P(Z|X)$, which is the quantitative part of Reichenbach's principle.
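To illustrate the quantitative part, the sketch below builds a distribution that is Markov for the DAG of Fig. 1 ($X \to Y$, $X \to Z$) from hypothetical parameters of our own choosing, using exact rational arithmetic. It confirms that $X$ screens off $Y$ from $Z$, while $Y$ and $Z$ remain correlated once $X$ is marginalized.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)
# Illustrative parameters (our choice) for the DAG X -> Y, X -> Z of Fig. 1.
P_X = {0: F(1, 3), 1: F(2, 3)}
P_Y_X = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(1, 10), 1: F(9, 10)}}  # P_Y_X[x][y]
P_Z_X = P_Y_X  # Z is another noisy copy of X with the same parameters

# A distribution that is Markov for the DAG: P(XYZ) = P(X) P(Y|X) P(Z|X).
joint = {(x, y, z): P_X[x] * P_Y_X[x][y] * P_Z_X[x][z]
         for x, y, z in product(BITS, repeat=3)}

def p_x(x):
    return sum(joint[x, y, z] for y, z in product(BITS, repeat=2))

def p_y_given_x(y, x):
    return sum(joint[x, y, z] for z in BITS) / p_x(x)

def p_z_given_x(z, x):
    return sum(joint[x, y, z] for y in BITS) / p_x(x)

# Quantitative part: X screens off Y from Z.
screened_off = all(joint[x, y, z] / p_x(x) == p_y_given_x(y, x) * p_z_given_x(z, x)
                   for x, y, z in product(BITS, repeat=3))

# Yet Y and Z are correlated once X is marginalized.
p_y0_z0 = sum(joint[x, 0, 0] for x in BITS)
p_y0 = sum(joint[x, 0, z] for x, z in product(BITS, repeat=2))
p_z0 = sum(joint[x, y, 0] for x, y in product(BITS, repeat=2))
print(screened_off, p_y0_z0, p_y0 * p_z0)  # True 83/300 121/900
```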

B. Justifying the Markov condition
Just as we previously asked whether there was some principle that forced a rational agent to assign probability distributions in accordance with the quantitative part of Reichenbach's principle, we can similarly ask why a rational agent who takes causal relationships to be given by a particular DAG should arrange their beliefs so that the joint distribution is Markov for the DAG.
The justification of the Markov condition parallels the justification of the quantitative part of Reichenbach's principle that was presented in Sec. II B. We begin by outlining what the qualitative part of Reichenbach's principle and the assumption of fundamental determinism imply for any arbitrary causal structure.
Definition 9 (Classical compatibility with a DAG). $P(X_1 \ldots X_n)$ is said to be compatible with a DAG $G$ with nodes $X_1, \ldots, X_n$ if one can find a DAG $G'$ that is obtained from $G$ by adding extra nodes $\lambda_1, \ldots, \lambda_n$, such that for each $i$, the node $\lambda_i$ has a single outgoing arrow, to $X_i$, and no incoming arrows, and one can find, for each $i$, a distribution $P(\lambda_i)$ and a function $f_i$ from $(\lambda_i, \mathrm{Parents}(i))$ to $X_i$ such that
$$P(X_1 \ldots X_n) = \sum_{\lambda_1, \ldots, \lambda_n} \prod_{i=1}^{n} \delta_{X_i, f_i(\lambda_i, \mathrm{Parents}(i))}\, P(\lambda_i).$$
Theorem 5 (Ref. [8]). Given a joint distribution $P(X_1 \ldots X_n)$ and a DAG $G$ with nodes $X_1, \ldots, X_n$, the following are equivalent:
1. $P(X_1 \ldots X_n)$ is compatible with the causal structure described by the DAG $G$.
2. $P(X_1 \ldots X_n)$ is Markov for $G$.
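The $1 \to 2$ direction can be seen in a small instance. The sketch below takes the DAG $X \to Y$, $X \to Z$, attaches an independent exogenous variable to each node, defines deterministic functions $f_i(\lambda_i, \mathrm{Parents}(i))$ (noise distributions and functions are hypothetical choices of ours), builds $P(XYZ)$ by summing over the $\lambda_i$, and checks that the result is Markov for the original DAG.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)
# Hypothetical exogenous variables lam_X, lam_Y, lam_Z, one per node of X -> Y, X -> Z.
P_LX = {0: F(1, 2), 1: F(1, 2)}
P_LY = {0: F(3, 4), 1: F(1, 4)}
P_LZ = {0: F(4, 5), 1: F(1, 5)}

# Deterministic functions f_i(lam_i, Parents(i)); Y and Z are noisy copies of X.
def f_x(lx): return lx
def f_y(ly, x): return x ^ ly
def f_z(lz, x): return x ^ lz

# P(XYZ) = sum over lambdas of prod_i delta(X_i, f_i(lam_i, Parents(i))) P(lam_i)
joint = {k: F(0) for k in product(BITS, repeat=3)}
for lx, ly, lz in product(BITS, repeat=3):
    x = f_x(lx)
    joint[x, f_y(ly, x), f_z(lz, x)] += P_LX[lx] * P_LY[ly] * P_LZ[lz]

def p_x(x): return sum(joint[x, y, z] for y, z in product(BITS, repeat=2))
def p_y_x(y, x): return sum(joint[x, y, z] for z in BITS) / p_x(x)
def p_z_x(z, x): return sum(joint[x, y, z] for y in BITS) / p_x(x)

# Markov for the original DAG: P(XYZ) = P(X) P(Y|X) P(Z|X).
markov = all(joint[x, y, z] == p_x(x) * p_y_x(y, x) * p_z_x(z, x)
             for x, y, z in product(BITS, repeat=3))
print(markov, joint[0, 0, 0])  # True 3/10
```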
The 1 → 2 implication in Thm. 5 can be read as follows: if it is granted that causal relationships are indicative of underlying deterministic dynamics, and that the qualitative part of Reichenbach's principle is valid, then, on pain of irrationality, an agent's assignment P (X 1 . . . X n ) must be Markov for the original graph.
The 2 → 1 implication in Thm. 5, like that of Thm. 1, is pertinent for causal inference. It asserts that if one observes a distribution P (X 1 . . . X n ), then of the causal models that are compatible with this distribution, the only ones that do not require fine-tuning of the parameters are those involving DAGs relative to which the distribution is Markov.

A. The proposed definition
In our treatment of the simple causal scenario where $A$ is a complete common cause of $B$ and $C$ (the DAG of Fig. 3), we focussed on what form is implied for the quantum channel $\rho_{BC|A}$. We have not attempted to define a quantity analogous to the classical joint distribution, that is, a quantity analogous to $P(XYZ)$ in the case of the DAG of Fig. 1, nor indeed analogues of other classical Bayesian conditionals such as $P(X|YZ)$. For results that aim to achieve such analogues, see Refs. [18,27]. See also Ref. [29], however, where it is shown that there are significant obstacles to establishing an analogue of a classical joint distribution when the set of quantum systems includes some that are causal descendants of others.
This work takes a different approach. The interpretation of a quantum causal model will be that each node represents a local region of time and space, with channels such as $\rho_{BC|A}$ describing the evolution of quantum systems between these regions. At each node, there is the possibility that an agent is present with the ability to intervene inside that local region. Each node $A_i$ will then be associated with two Hilbert spaces: one, $\mathcal{H}_i$, corresponding to the incoming system (before the agent's intervention), and its dual, $\mathcal{H}_i^*$, corresponding to the outgoing system (after the agent's intervention). A quantum causal model will consist of a specification of all the quantum channels representing the evolution of systems between nodes, with the operational significance of a network being that it is used to calculate joint probabilities for the agents to obtain the various possible joint outcomes for their interventions. This way of treating quantum systems over time has appeared in various different approaches in the literature, including the multi-time formalism [30][31][32][33], the quantum combs formalism [34][35][36], the process matrices formalism [37][38][39], and a number of other formalisms besides [14,[40][41][42].
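The discussion of classical causal models in Sec. IV, and the results of Sec. III for the special case of $A$ a complete common cause of $B$ and $C$, suggest the following generalization: a quantum causal model is specified by a DAG with nodes $A_1, \ldots, A_n$, together with a quantum channel $\rho_{A_i|\mathrm{Parents}(A_i)}$ for each node, such that the channels commute pairwise. It defines the state
$$\sigma_{1 \ldots n} = \prod_{i=1}^{n} \rho_{A_i|\mathrm{Parents}(A_i)},$$
where $\sigma_{1 \ldots n}$ is an operator on $\bigotimes_{i=1}^{n} (\mathcal{H}_i \otimes \mathcal{H}_i^*)$. Recall from Section III that a quantum channel $\rho_{BC|A}$ is compatible with $A$ being the complete common cause of $B$ and $C$ if and only if $\rho_{BC|A} = \rho_{B|A}\rho_{C|A}$, and if this holds, then $[\rho_{B|A}, \rho_{C|A}] = 0$. The definition of a quantum causal model, in particular the stipulation that the channels commute pairwise, generalizes this idea.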

B. Bayesian vs. do-conditionals
In the theory of classical causal models, there is an important distinction between two types of conditionals: Bayesian conditionals and do-conditionals. Roughly, a Bayesian conditional gives the probability distribution for a variable (or collection thereof), given that another variable (or collection thereof) is discovered to have a certain value, as computed using Bayes' rule: P (Y |X) = P (Y X)/P (X). A do-conditional, on the other hand, gives the probability distribution for a variable (or collection thereof) given that an agent intervenes deliberately and fixes another variable (or collection thereof) to have a certain value. Do-conditionals are written P (Y |do X). For a full treatment, see Ref. [8].
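The two kinds of conditional can differ when the conditioning variable shares a common cause with the target. The sketch below uses a hypothetical confounded DAG $U \to X$, $U \to Y$, $X \to Y$ with parameters of our own choosing. The Bayesian conditional uses Bayes' rule on the full joint, while the do-conditional is computed by the standard truncated factorization for classical causal models (see Ref. [8]): the factor $P(X|U)$ is dropped and $P(U)$, $P(Y|XU)$ are kept.

```python
from fractions import Fraction as F

BITS = (0, 1)
# Hypothetical confounded DAG U -> X, U -> Y, X -> Y (parameters are our own choice).
P_U = {0: F(1, 2), 1: F(1, 2)}
P_X1_GIVEN_U = {0: F(1, 5), 1: F(4, 5)}                     # P(X=1|U=u)
P_Y1_GIVEN_XU = {(x, u): F(1, 10) + F(4, 10) * x + F(4, 10) * u
                 for x in BITS for u in BITS}                 # P(Y=1|X=x,U=u)

def p_x_given_u(x, u):
    return P_X1_GIVEN_U[u] if x == 1 else 1 - P_X1_GIVEN_U[u]

def bayes_y1_given_x(x):
    """P(Y=1|X=x) by Bayes' rule on the joint P(U) P(X|U) P(Y|XU)."""
    num = sum(P_U[u] * p_x_given_u(x, u) * P_Y1_GIVEN_XU[x, u] for u in BITS)
    den = sum(P_U[u] * p_x_given_u(x, u) for u in BITS)
    return num / den

def do_y1_given_x(x):
    """P(Y=1|do X=x): the factor P(X|U) is dropped; P(U) and P(Y|XU) are kept."""
    return sum(P_U[u] * P_Y1_GIVEN_XU[x, u] for u in BITS)

print(bayes_y1_given_x(1), do_y1_given_x(1))  # 41/50 7/10
```

Observing $X = 1$ also raises the probability that $U = 1$, which in turn raises $Y$; intervening on $X$ leaves the distribution of $U$ untouched, hence the lower value.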
Conceptually, a classical channel with input $X$ and output $Y$ is better thought of as defined by a do-conditional $P(Y|\mathrm{do}\,X)$, rather than the Bayesian conditional $P(Y|X)$. A quantum channel with input $A$ and output $B$, represented by $\rho_{B|A}$, is best thought of as the analogue of a do-conditional. Hence, using a more careful notation, a quantum channel might be written $\rho_{B|\mathrm{do}\,A}$ and, given a channel that satisfies the conditions of Thm. 2, we would say that the outputs $B$ and $C$ are quantum do-conditionally independent given $A$. But this is cumbersome, which is why we have not made use of these distinctions throughout the bulk of this work.
In the definition of a classical causal model above (Def. 7), the DAG is supplemented with Bayesian conditional probabilities of the form $P(X_i|\mathrm{Parents}(i))$. It is easy to show that when $P(X_1 \ldots X_n)$ is Markov for the graph, then for all $i$, $P(X_i|\mathrm{Parents}(i)) = P(X_i|\mathrm{do}\,\mathrm{Parents}(i))$. Hence an equivalent definition of a classical causal model would be as a DAG, supplemented with a do-conditional $P(X_i|\mathrm{do}\,\mathrm{Parents}(i))$ for each $i$. Our definition of a quantum causal model might be regarded as analogous to a classical causal model in this sense. In particular, we do not in this work attempt any generalization of the classical concept of a Bayesian conditional to the quantum case. More precisely, our definition of a quantum causal model is analogous to a generalization of classical causal models in which the random variable at each node is split into two copies, with the possibility of an intervention at the node.
For similar remarks in the context of a related approach, see [39].

C. Making predictions
In order to see how a quantum causal model is used to calculate probabilities for the outcomes of agents' interventions, consider a quantum causal model with nodes $A_1, \ldots, A_n$ and state $\sigma_{1 \ldots n}$. Let the intervention at node $A_i$ have classical outcomes labelled by $k_i$. The intervention is defined by a quantum instrument, that is, by a set of completely positive trace-non-increasing maps, one for each outcome, which sum to a trace-preserving map. In order to write the probabilities for the outcomes in a simple form, it is useful to define the instrument in such a way that the maps take operators on $\mathcal{H}_i^*$ into operators on $\mathcal{H}_i^*$. Hence, suppose that the outcome $k_i$ corresponds to the map $\mathcal{E}^{k_i}_i : \mathcal{B}(\mathcal{H}_i^*) \to \mathcal{B}(\mathcal{H}_i^*)$, and let $\tau^{k_i}_i$ be the associated operator on $\mathcal{H}_i \otimes \mathcal{H}_i^*$. The outcome $k_i$ of the agent's intervention can then be represented by the (positive, basis-independent) operator $\tau^{k_i}_i$. If an agent does not intervene at the node $A_i$, this corresponds to the operator $\tau_i$ associated with the identity map on $\mathcal{B}(\mathcal{H}_i^*)$.
The joint probability for the agents to obtain outcomes $k_1, \ldots, k_n$ is given by
$$P(k_1, \ldots, k_n) = \mathrm{Tr}\left[\sigma_{1 \ldots n}\, \left(\tau^{k_1}_1 \otimes \cdots \otimes \tau^{k_n}_n\right)\right].$$
We can also define operations on the state $\sigma_{1 \ldots n}$ corresponding to marginalization over some nodes, much as classical marginal distributions are written $P(X) = \sum_y P(XY)$. For example, marginalizing over node $A_i$ corresponds to there being no intervention at that node:
$$\sigma_{1 \ldots (i-1)(i+1) \ldots n} = \mathrm{Tr}_{\mathcal{H}_i \otimes \mathcal{H}_i^*}\left[\sigma_{1 \ldots n}\, \tau_i\right].$$
There is in general no reason to expect that the marginal operator $\sigma_{1 \ldots (i-1)(i+1) \ldots n}$ will be compatible with the remaining network. Even in the classical case, the marginal distribution $P(X_1 \ldots X_{i-1} X_{i+1} \ldots X_n)$ obtained by marginalizing over some variable $X_i$ is not in general Markov for the graph obtained by removing the $X_i$ node, along with its incoming and outgoing arrows.
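The classical remark at the end of this paragraph has a minimal instance: take the DAG $\lambda \to X$, $\lambda \to Y$ with $X$ and $Y$ both copies of a fair bit $\lambda$ (a hypothetical example of ours). Removing the $\lambda$ node leaves two nodes and no arrows, for which being Markov would require $P(XY) = P(X)P(Y)$; the marginal violates this.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)
# Hypothetical model with DAG lam -> X, lam -> Y: both X and Y copy a fair bit lam.
P_LAM = {0: F(1, 2), 1: F(1, 2)}
joint = {(lam, x, y): P_LAM[lam] * int(x == lam) * int(y == lam)
         for lam, x, y in product(BITS, repeat=3)}

# Marginalize over lam, as in P(X) = sum_y P(XY).
p_xy = {(x, y): sum(joint[lam, x, y] for lam in BITS) for x, y in product(BITS, repeat=2)}
p_x = {x: sum(p_xy[x, y] for y in BITS) for x in BITS}
p_y = {y: sum(p_xy[x, y] for x in BITS) for y in BITS}

# Removing lam leaves X and Y with no arrows; Markov would require P(XY) = P(X) P(Y).
markov_after_removal = all(p_xy[x, y] == p_x[x] * p_y[y] for x, y in product(BITS, repeat=2))
print(p_xy[0, 0], p_x[0] * p_y[0], markov_after_removal)  # 1/2 1/4 False
```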

Confounding common cause
Consider a quantum causal model with the DAG depicted in Fig. 10. The DAG is supplemented with the quantum channels $\rho_{C|BA}$, $\rho_{B|A}$, and $\rho_A$, where the latter is simply a quantum state on $\mathcal{H}_A$ (which can also be thought of as a channel from the trivial, 1-dimensional, system into $A$).
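The corresponding state is $\sigma = \rho_{C|BA}\rho_{B|A}\rho_A$, where $\sigma$ acts on the Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_A^* \otimes \mathcal{H}_B \otimes \mathcal{H}_B^* \otimes \mathcal{H}_C \otimes \mathcal{H}_C^*$. By stipulation, the channels commute pairwise. This is immediate in the case of, say, $\rho_{B|A}$ and $\rho_A$, since these operators are nontrivial on distinct Hilbert spaces. But it is significant in the case of $\rho_{C|BA}$ and $\rho_{B|A}$, both of which act nontrivially on $\mathcal{H}_A^*$. From Thm. 2, this implies that $\mathcal{H}_A^*$ decomposes as $\bigoplus_i \mathcal{H}^*_{A^L_i} \otimes \mathcal{H}^*_{A^R_i}$, with $\rho_{C|BA}$ acting trivially on (say) the right-hand factors and $\rho_{B|A}$ acting trivially on the left-hand factors.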
In words, the evolution undergone by the system emerging from A is as follows: a (possibly degenerate) von Neumann measurement is performed and, depending on the outcome, the A system is split into two pieces. One piece evolves to become the input at B. The output at B is then recombined with the other piece, and evolves to become the input at C.
There is a noteworthy difference here between classical and quantum causal models. In the case of a classical causal model with the graph of Fig. 10 and random variables $A$, $B$, $C$, any distribution $P(ABC)$ is Markov for the graph, since the DAG is complete and the chain rule $P(ABC) = P(C|BA)P(B|A)P(A)$ holds for every distribution. In the quantum case, however, the fact that the output Hilbert space for the $A$ system decomposes as described is a very strong constraint on the kinds of evolution of a quantum system that can be described by this DAG.
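The classical half of this comparison can be checked on an arbitrary distribution. The sketch below draws random positive weights (a seeded, purely illustrative choice), normalizes them into $P(ABC)$ with exact fractions, and confirms that $P(A)P(B|A)P(C|BA)$ reproduces the joint exactly, i.e., that the distribution is Markov for the complete DAG of Fig. 10. The helper `p` is ours.

```python
import random
from fractions import Fraction as F
from itertools import product

random.seed(7)
weights = {k: random.randint(1, 20) for k in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {k: F(w, total) for k, w in weights.items()}  # an arbitrary P(ABC), keys (a, b, c)

def p(**fixed):
    """Marginal probability that the named variables (a, b, c) take the given values."""
    pos = {"a": 0, "b": 1, "c": 2}
    return sum(q for k, q in joint.items()
               if all(k[pos[name]] == v for name, v in fixed.items()))

# Markov for the complete DAG A -> B, A -> C, B -> C: P(A) P(B|A) P(C|BA).
markov = all(
    joint[a, b, c] == p(a=a) * (p(a=a, b=b) / p(a=a)) * (p(a=a, b=b, c=c) / p(a=a, b=b))
    for a, b, c in product((0, 1), repeat=3)
)
print(markov)  # True
```

Whatever seed is used, the check succeeds, since this is just the chain rule of probability; no analogous statement holds for the quantum channels described by the same DAG.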
It is also instructive to consider classical and quantum causal models with the causal structure shown in Fig. 11. Consider a classical causal model with this graph, with $A$, $B$, $C$, $\lambda$ random variables, and a joint distribution $P(ABC\lambda)$. Suppose that $P(\lambda = 0) = 1$. Since $\lambda$ does not vary, the $\lambda$ node can be removed from the graph (along with its outgoing arrows) and the marginal distribution $P(ABC)$ is Markov for the portion of the graph that remains. (Of course, it must be, since any distribution $P(ABC)$ is Markov for the graph of Fig. 10.) In the quantum case, the DAG of Fig. 11 may represent, for example, the evolution of a qubit over three time steps, $A$, $B$ and $C$, where the qubit interacts with an environment whose initial state is $\rho_\lambda$. Nodes corresponding to the environment at the second and third time steps are omitted. Given that over the course of this evolution, information can flow from the qubit to the environment and back again, it is necessary to include an arrow from $A$ to $C$, as well as from $\lambda$ to $B$ and $\lambda$ to $C$.

FIG. 11: The causal structure of Fig. 10 with an extra node λ which is a common cause for B and C. A causal model with this DAG may describe a qubit interacting with an environment: A, B, C represent the qubit system at three different times and λ the environment at the initial time.
A quantum causal model with this DAG defines commuting channels ρ C|BAλ , ρ B|Aλ , ρ A , ρ λ . From the fact that ρ C|BAλ and ρ B|Aλ commute, we conclude that the Hilbert space H * A ⊗ H * λ decomposes as a direct sum over direct products. However, a decomposition of H * A ⊗ H * λ as a direct sum over direct products does not imply a decomposition of the Hilbert space H * A alone as a direct sum over direct products. Further, suppose that ρ λ is the pure state |0 0|. Then, unlike in the classical case of a point distribution on the variable λ, we cannot remove the quantum λ node and expect the resulting marginal channels to be compatible with the causal structure of Fig. 10. This makes intuitive sense if one takes the view (as per Section III E) that a quantum pure state represents maximal but incomplete information. Uncertainty about the λ system may underwrite correlations between B and C, in such a way that these correlations cannot be understood as arising from channels that would be compatible with the causal structure of Fig. 10.

A simple case of Bayesian updating
This section discusses another sense in which the quantum notion of conditional independence of the outputs of a channel given the input mirrors qualitatively an important aspect of the classical case.
Consider a classical causal model with the DAG of Fig. 1 and distribution $P(XYZ)$ such that $P(YZ|X) = P(Y|X)P(Z|X)$. A particular feature of this causal scenario is that if new information is obtained about the variable $Y$, for example if an agent learns that $Y = y$, then the process of Bayesian updating can go as follows. First, update the distribution over $X$ by applying the rule
$$\tilde{P}(X) := P(X|Y=y) = \frac{P(Y=y|X)\,P(X)}{P(Y=y)}.$$
Then use the new probabilities for $X$ to get an updated distribution for $Z$:
$$\tilde{P}(Z) := \sum_X P(Z|X)\,\tilde{P}(X), \qquad (13)$$
where the sum ranges over the values that $X$ may take. Roughly speaking, the process of Bayesian updating "follows the arrows" of the graph. For this it is crucial that the joint distribution $P(XYZ)$ satisfies $P(YZ|X) = P(Y|X)P(Z|X)$; otherwise the term $P(Z|X)$ in Eq. (13) would have to be replaced with $P(Z|X, Y=y)$.
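The two-step procedure can be checked against direct conditioning on the joint distribution. In the sketch below (hypothetical parameters for the DAG of Fig. 1, exact fractions), the agent learns $Y = 1$, updates $X$ by Bayes' rule, pushes the result through $P(Z|X)$ as in Eq. (13), i.e., $\tilde{P}(Z) = \sum_X P(Z|X)\tilde{P}(X)$, and obtains the same $P(Z|Y=1)$ as conditioning the full joint.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)
# Hypothetical parameters for X -> Y, X -> Z with P(YZ|X) = P(Y|X) P(Z|X).
P_X = {0: F(1, 4), 1: F(3, 4)}
P_Y_X = {0: {0: F(2, 3), 1: F(1, 3)}, 1: {0: F(1, 5), 1: F(4, 5)}}    # [x][y]
P_Z_X = {0: {0: F(1, 2), 1: F(1, 2)}, 1: {0: F(1, 10), 1: F(9, 10)}}  # [x][z]
joint = {(x, y, z): P_X[x] * P_Y_X[x][y] * P_Z_X[x][z]
         for x, y, z in product(BITS, repeat=3)}

y_obs = 1  # the agent learns that Y = 1

# Step 1: Bayes' rule for X given Y = y_obs.
p_y_obs = sum(P_Y_X[x][y_obs] * P_X[x] for x in BITS)
P_X_new = {x: P_Y_X[x][y_obs] * P_X[x] / p_y_obs for x in BITS}

# Step 2: push the updated X forward through P(Z|X), as in Eq. (13).
P_Z_new = {z: sum(P_Z_X[x][z] * P_X_new[x] for x in BITS) for z in BITS}

# Direct conditioning on the joint distribution, for comparison.
p_y_joint = sum(joint[x, y_obs, z] for x, z in product(BITS, repeat=2))
p_direct = {z: sum(joint[x, y_obs, z] for x in BITS) / p_y_joint for z in BITS}
print(P_Z_new[1], P_Z_new == p_direct)  # 349/410 True
```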
Consider now a quantum causal model, with the DAG of Fig. 3 and state $\sigma_{ABC} = \rho_{B|A}\rho_{C|A}\rho_A$. Suppose that an agent at $B$ intervenes, obtaining outcome $k_B$, corresponding to the operator $\tau^{k_B}_B$. The agent wishes to calculate the probability that an intervention at $C$ yields outcome $k_C$, corresponding to $\tau^{k_C}_C$, conditioned on having obtained the outcome $k_B$, and assuming that there is no intervention at $A$. This can be done as follows. First, update the state at $A$ in light of the outcome $k_B$.
Then apply the channel $\rho_{C|A}$ to the updated state at $A$ to get an updated state at $C$. Finally, calculate the probability of $k_C$ from this updated state. Again, the process of Bayesian updating "follows the arrows" of the graph. For this to work, it was crucial that the channel $\rho_{BC|A}$ satisfies $\rho_{BC|A} = \rho_{B|A}\rho_{C|A}$.

VI. RELATION TO PRIOR WORK
We now present a short review of prior works on quantum causal models and describe how our own proposal relates to these.
Preliminary work in this area took the form of explorations of Bell-type inequalities (and whether they admit of quantum violations) for novel causal scenarios [43,44]. Several researchers recognized that the formalism of classical causal models could provide a unifying framework in which to pose the problem of deriving Bell-type constraints, and that this framework might be extended to address the problem of deriving constraints on the correlations that can be obtained with quantum resources [2,10,11,45]. Note that such constraints are expressed entirely in terms of classical settings and classical outcomes of measurements. Henson, Lal and Pusey [46] and Fritz [47] independently proposed definitions of quantum causal models with the purpose of expressing such constraints. In these approaches, each node of the DAG represents a process (which may have a classical outcome), while each directed edge is associated with a system that is passed between processes. However, despite the fact that their frameworks incorporate postclassical resources, they do not have sufficient structure to define conditional independences between quantum systems.
Operational reformulations of quantum theory such as Refs. [48][49][50][51][52][53] helped to set the stage for the development of quantum causal models. Although they were conceived independently of the framework of classical causal models, they were quite similar to that framework insofar as they made heavy use of DAGs-in the form of circuit diagrams-to depict structural features of a set of processes. When the authors of these formulations turned their attention to relativistic causal structure, the frameworks they devised drew even closer in spirit to that of causal models. Prominent examples include: the causaloid framework of Hardy [54], the multi-time formalism of Aharonov and co-workers [30][31][32][33], the quantum combs framework of Chiribella, D'Ariano and Perinotti [34][35][36], the causal categories of Coecke and Lal [23], and the process matrix formalism of Oreshkov, Costa and Brukner [37,38]. A common aim of these approaches is to be able to compute the consequences of an intervention upon a particular quantum system within the circuit, and this is precisely one of the tasks that a quantum analogue of a causal model should be able to handle.
Many of these frameworks represent a quantum system at a given region of space-time by two copies of its Hilbert space, one corresponding to the system that is input into the region and one corresponding to the system that is output from it. In this way, the region becomes a "locus of intervention" for the system. By inserting a particular quantum process into the "slot", one determines the nature of the intervention. This is the approach taken, for instance, in the multi-time formalism of Ref. [31], the quantum combs of Ref. [34], and the process matrices of Ref. [37]. This representation of interventions has a counterpart in classical causal models, for instance in the work of [55], as was noted in Refs. [14,39]. Costa and Shrapnel [39] in particular have sought to explicitly cast this sort of framework as a quantum generalization of a causal model. In their approach, the nodes of the DAG are associated with a quantum system localized in a region (understood as a potential locus of intervention) and the collection of edges from one set of nodes to another represent causal processes.
An approach of this sort is required if one seeks to find intrinsically quantum versions of important theorems of classical causal models. For instance, while Henson, Lal and Pusey derive a generalization of the d-separation theorem of classical causal models, it only infers conditional independence relations from d-separation relations for the classical variables in the graph. An intrinsically quantum version of the d-separation theorem, by contrast, would be one which concerns the causal relations among quantum systems (see, for instance, [56]). If a set of nodes representing quantum systems can be described by a joint or conditional state, then one can seek to determine whether factorization conditions on this state are implied by d-separation relations among the quantum systems on the graph. Similarly, while both the approaches of Henson, Lal and Pusey and of Fritz allow one to derive, from the structure of the DAG, constraints on the joint distribution over classical variables embedded therein, they do not address an intrinsically quantum version of this problem. If a set of nodes representing quantum systems can be described by a joint or conditional state, then one can seek to derive constraints on this state directly from the structure of the DAG.
Our own approach aims at an intrinsically quantum generalization of the notion of a causal model. We therefore associate to each node of the DAG a quantum system localized to a space-time region, and we represent it by a pair of Hilbert spaces, corresponding to the input and output of an intervention upon the system. Consequently, our approach is very similar to that of Costa and Shrapnel. Nonetheless, there are significant differences in how we represent common causes.
First, in Costa and Shrapnel's work, any node with multiple outgoing edges is represented as a locus of intervention where the output Hilbert space is a tensor product of Hilbert spaces, one for each outgoing edge. As such, any node acting as a common cause must be associated with a composite quantum system; it cannot, for instance, be associated with a single qubit. By contrast, our approach does not constrain the representation of common causes in this fashion: any quantum system, including a single qubit, may constitute a complete common cause of a collection of other quantum systems. This extra generality is required since, as our examples have shown, the complete common cause of a set of systems can be a single qubit. Second, and more importantly, our work has shown that for a quantum channel whose input is the complete quantum common cause of its $n$ outputs, it is not the case that the channel must split the input into $n$ components, each of which exerts a causal influence on a different output; this is merely one special case of the most general form that such a channel can take. Third, if the complete common causes consist of multiple nodes in the DAG, then it is only the joint Hilbert space of these that must satisfy the condition of factorizing-in-subspaces, while each individual Hilbert space need not.
These differences are likely to have a significant impact on the form of any intrinsically quantum d-separation theorem.
Finally, we mention a third purpose to which quantum causal models can be put. Theorems about classical causal models often concern the sorts of inferences one can make about one variable given information about another. As an example, if Z is a common effect of X and Y , then learning Z can induce correlations between X and Y . As such, one might expect quantum causal models to also constrain the sort of inferences one can make among quantum variables. Early work by Leifer and Spekkens [18] had this purpose in mind. The authors noted various scenarios in which their proposal could not be applied, and subsequent work [29] has narrowed down the scope of possibilities for any such proposal. Our own proposal provides the means of making many of the Bayesian inferences considered in Ref. [18]. The case discussed in Sec. V D 2 is one such example.
There is also prior work on quantum causal models that takes a significantly different approach to the ones described above and for which the relation to our work is less clear. The work of Tucci [57,58], which is in fact the earliest attempt at a quantum generalization of a causal model, represents causal dependences by complex transition amplitudes rather than quantum channels.

VII. CONCLUSIONS
The field of classical statistics has benefited greatly from analysis provided by the formalism of causal models [8,9]. In particular, this formalism allows one to infer facts about the underlying causal structure purely from uncontrolled statistical data, a tool with significant applications in all branches of science, economics and the social sciences. Given that some paradoxical features of classical correlations have found satisfying resolutions when viewed through a causal lens, one might wonder to what extent the same is true of quantum correlations.
Starting with the idea that whatever innovation quantum theory might hold for causal models, the intuition contained in Reichenbach's principle ought to be preserved, we motivated the problem of finding a quantum version of the principle. This required us to determine what constraint a channel from A to BC must satisfy if A is the complete common cause of B and C. We solved the problem by considering a unitary dilation of the channel and by noting that there is no ambiguity in how to represent the absence of causal influences between certain inputs and certain outputs of a unitary. From this, we derived a notion of quantum conditional independence for the outputs of the channel given its input. This inference from a common-cause structure to quantum conditional independence was then generalized to obtain our quantum version of causal models.
Given a state on a quantum causal model, we described how to construct a marginal state for a subset of nodes. We discussed a number of simple examples of quantum channels and causal structures. A theme of the examples is that when there is a difference between the quantum and classical cases, this can often be understood if one takes the view that a quantum pure state represents maximal but incomplete information about a system, hence may underwrite correlations between other systems in a way that a classical pure state cannot.
There are many directions for future work. In the case of classical causal networks, an important theorem states that d-separation is sound and complete for conditional independence in the joint probability distribution [8]. Here, for arbitrary disjoint subsets of nodes S, T and U, subsets S and T are said to be d-separated by U if every path between a node in S and a node in T is blocked by U, a criterion determined purely by the structure of the DAG (a path is blocked by U if it contains a chain or fork node that lies in U, or a collider such that neither it nor any of its descendants lies in U). An important question is therefore whether d-separation is sound and complete for some natural property of the state σ on a quantum causal network. It is also desirable to relate properties of a quantum causal network to operational statements involving the outcomes of agents' interventions: under what circumstances, for example, does it follow that there is an intervention by the agents at nodes in a subset U such that, conditioned on its outcome, the outcomes of the interventions by agents at S and T must be independent? Such a result would have applications to quantum protocols. Imagine, for example, a cryptographic scenario in which agents at S and T desire shared correlations that are not screened off by the information held by agents at U.
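For readers less familiar with the classical criterion, the following is a minimal Python sketch of a d-separation test. Rather than enumerating paths, it uses the moralization formulation (restrict to the ancestral set of S ∪ T ∪ U, connect co-parents, drop edge directions, delete U, then test connectivity), a standard equivalent of d-separation for disjoint S, T and U. The encoding of the DAG as a dictionary from each node to its parents and the function names are illustrative assumptions; the sketch says nothing about the quantum question raised above.

```python
from itertools import combinations


def ancestral_set(parents, nodes):
    """Return `nodes` together with all of their ancestors.

    `parents` maps each node to an iterable of its parents; nodes without
    an entry are treated as having no parents.
    """
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def d_separated(parents, S, T, U):
    """Test whether disjoint node sets S and T are d-separated by U in a DAG."""
    S, T, U = set(S), set(T), set(U)
    keep = ancestral_set(parents, S | T | U)

    # Moral graph of the ancestral set, as undirected adjacency sets.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))  # parents lie in `keep` by closure
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):   # "marry" co-parents
            adj[p].add(q)
            adj[q].add(p)

    # Search outward from S with the nodes of U deleted.
    frontier = [s for s in S if s not in U]
    reached = set(frontier)
    while frontier:
        node = frontier.pop()
        for nbr in adj[node]:
            if nbr not in U and nbr not in reached:
                reached.add(nbr)
                frontier.append(nbr)
    return not (reached & T)


# Common cause A -> B, A -> C: B and C are d-separated by {A} but not by {}.
common_cause = {"B": ["A"], "C": ["A"]}
print(d_separated(common_cause, {"B"}, {"C"}, {"A"}))  # True
print(d_separated(common_cause, {"B"}, {"C"}, set()))  # False

# Collider B -> D <- C: conditioning on D connects B and C.
collider = {"D": ["B", "C"]}
print(d_separated(collider, {"B"}, {"C"}, set()))  # True
print(d_separated(collider, {"B"}, {"C"}, {"D"}))  # False
```

The common-cause case is the classical pattern behind Reichenbach's principle: conditioning on the complete common cause A screens off B from C.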
In the classical case, there has been a great deal of work on the problem of causal inference [8,[59][60][61]: given only certain facts about the joint probabilities, e.g., a set of conditional independences, what can be inferred about the underlying causal structure? For an initial approach to quantum causal inference, with a quantum-over-classical advantage in a simple scenario, see [14]. The formalism of quantum causal networks described here is the appropriate framework for inferring facts about underlying, intrinsically quantum, causal structure, given observed facts about the outcomes of interventions by agents.
Recently, there has been much interest in deriving bounds on the correlations achievable in classical causal models [59,60,62,63]. Such bounds constitute Bell-like inequalities for arbitrary causal structures. The main technical challenge in deriving such inequalities is that, unlike standard Bell inequalities, they are nonlinear in character, hence standard techniques for deriving Bell inequalities are not applicable. By adapting these new techniques to the formalism presented here, one could perhaps systematically derive bounds on the quantum correlations achievable in certain quantum causal models, thereby providing a general method of deriving Tsirelson-like bounds for arbitrary causal structures.
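The simplest instance of a gap between a classical bound and a Tsirelson-type quantum bound is the CHSH scenario, i.e. the standard Bell causal structure. The sketch below is only an illustration of that gap, not the nonlinear technique referred to above: it finds the classical maximum by brute force over deterministic local strategies, and evaluates the quantum value for a maximally entangled state with the textbook optimal settings (the choice of state and settings is an assumption of the example).

```python
import itertools

import numpy as np


def chsh(e00, e01, e10, e11):
    """CHSH combination of the four correlators E(a_x, b_y)."""
    return e00 + e01 + e10 - e11


# Classical bound: any local hidden-variable model is a mixture of deterministic
# +/-1 assignments to the four observables, so maximizing over these 16 suffices.
classical_max = max(
    chsh(a0 * b0, a0 * b1, a1 * b0, a1 * b1)
    for a0, a1, b0, b1 in itertools.product([1, -1], repeat=4)
)

# Quantum value: |Phi+> with Alice measuring Z, X and Bob measuring (Z +/- X)/sqrt(2).
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
alice = [Z, X]
bob = [(Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)]


def corr(a, b):
    """Expectation value <Phi+| a (x) b |Phi+>."""
    return float(phi_plus @ np.kron(a, b) @ phi_plus)


quantum_value = chsh(corr(alice[0], bob[0]), corr(alice[0], bob[1]),
                     corr(alice[1], bob[0]), corr(alice[1], bob[1]))

print(classical_max)            # 2
print(round(quantum_value, 6))  # 2.828427, i.e. 2*sqrt(2), Tsirelson's bound
```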
Finally, it would be interesting to extend the formalism to explore the possibility that certain quantum scenarios are best understood as involving a quantum-coherent combination of different causal structures [35,37,64,65]. It has been argued that the possibility of such indefinite causal structure may be significant for the project of unifying quantum theory with general relativity [54].
supported by the Government of Canada through the Department of Innovation, Science and Economic Development Canada and by the Province of Ontario through the Ministry of Research, Innovation and Science.
[68] This is because any other λ would necessarily introduce new common causes for Y and Z that are not screened off by X, which would violate the assumption that X is a complete common cause.
[69] The quantum version in Fig. 9 was studied for similar reasons in Ref. [67], though from a different perspective.
[70] It is interesting to consider the exactly analogous scenario that arises in the toy theory of Ref. [26]. Here, a system analogous to a qubit can exist in one of four distinct classical states (the ontic states of the system), but an agent who prepares and measures such systems can only ever have partial information about which of the four ontic states a system is in. The toy equivalent of a CNOT gate corresponds to a reversible deterministic map, i.e., a permutation of the ontic states. By considering the probability distribution over the ontic states of the various systems, one may verify directly that the ontic states of toy systems B and C are not determined by the ontic state of toy system A; rather, they depend also on the ontic state of λ. Furthermore, the analogue of a pure quantum state for λ is a probability distribution over the ontic states of λ that is not a point distribution. In this way, statistical correlations between B and C can be underwritten by statistical variation in the ontic state of λ.
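Thm. 2 concerns the channel operator $\rho_{BC|A}$, which satisfies $\mathrm{Tr}_{BC}(\rho_{BC|A}) = I_{A^*}$. Applying Lem. 1 to the normalized operator $\tilde\rho_{BC|A} = (1/d_A)\,\rho_{BC|A}$ yields a decomposition $\mathcal{H}_A = \bigoplus_i \mathcal{H}_{A_i^L}\otimes\mathcal{H}_{A_i^R}$ such that
$$\tilde\rho_{BC|A} = \bigoplus_i q_i\,\tilde\rho_{B|A_i^L}\otimes\tilde\rho_{C|A_i^R},$$
where $\{q_i\}$ is a probability distribution. Using $\mathrm{Tr}_{BC}(\tilde\rho_{BC|A}) = (1/d_A)\,I_{A^*}$, it follows that for each $i$ the components satisfy $\mathrm{Tr}_B(\tilde\rho_{B|A_i^L}) = (1/d_{A_i^L})\,I_{(A_i^L)^*}$ and $\mathrm{Tr}_C(\tilde\rho_{C|A_i^R}) = (1/d_{A_i^R})\,I_{(A_i^R)^*}$. The result follows.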
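A sketch of the step behind "the result follows", assuming that Lem. 1 delivers a decomposition $\tilde\rho_{BC|A} = \bigoplus_i q_i\,\tilde\rho_{B|A_i^L}\otimes\tilde\rho_{C|A_i^R}$ with normalized components and weights $q_i$, and that the factorized form in Thm. 2 is a direct sum of unnormalized channel operators. Comparing the trace of each block with $(1/d_A)\,I_{A^*}$ fixes the weights, and a rescaling then absorbs them:

```latex
\begin{align*}
q_i\,\frac{1}{d_{A_i^L}}\,I_{(A_i^L)^*}\otimes\frac{1}{d_{A_i^R}}\,I_{(A_i^R)^*}
  &= \frac{1}{d_A}\,I_{(A_i^L)^*}\otimes I_{(A_i^R)^*}
  \quad\Longrightarrow\quad q_i = \frac{d_{A_i^L}\,d_{A_i^R}}{d_A},\\
\rho_{BC|A} = d_A\,\tilde\rho_{BC|A}
  &= \bigoplus_i d_{A_i^L}\,d_{A_i^R}\,\tilde\rho_{B|A_i^L}\otimes\tilde\rho_{C|A_i^R}
   = \bigoplus_i \rho_{B|A_i^L}\otimes\rho_{C|A_i^R},\\
\rho_{B|A_i^L} &:= d_{A_i^L}\,\tilde\rho_{B|A_i^L},\qquad
\rho_{C|A_i^R} := d_{A_i^R}\,\tilde\rho_{C|A_i^R},\\
\mathrm{Tr}_B\,\rho_{B|A_i^L} &= I_{(A_i^L)^*},\qquad
\mathrm{Tr}_C\,\rho_{C|A_i^R} = I_{(A_i^R)^*}.
\end{align*}
```

As a consistency check, $\sum_i q_i = \sum_i d_{A_i^L}d_{A_i^R}/d_A = 1$, because the blocks exhaust $\mathcal{H}_A$.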
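Let $\rho^U_{BFC|\lambda_B A\lambda_C}$ be the Choi-Jamiołkowski operator for the unitary $U$, defined according to the conventions set out in the main text. As in the main text, a missing index indicates that the corresponding system has been traced out. Note that in general $\rho^U_{BC|A} \neq \rho_{BC|A}$, since the latter is obtained via a particular choice of input states for $\lambda_B$ and $\lambda_C$. The proof proceeds by proving relations between quantum conditional mutual informations evaluated on the renormalized operator $\tilde\rho^U_{BFC|\lambda_B A\lambda_C} = \frac{1}{d_{\lambda_B} d_A d_{\lambda_C}}\,\rho^U_{BFC|\lambda_B A\lambda_C}$ and its partial traces. First,
$$I(B : FC \mid \lambda_B A\lambda_C) = 0. \tag{A2}$$
This follows by expanding in terms of von Neumann entropies:
$$I(B : FC \mid \lambda_B A\lambda_C) = S(\tilde\rho^U_{B|\lambda_B A\lambda_C}) + S(\tilde\rho^U_{FC|\lambda_B A\lambda_C}) - S(\tilde\rho^U_{BFC|\lambda_B A\lambda_C}) - S(\tilde\rho^U_{\cdot|\lambda_B A\lambda_C}).$$
The third term is zero, since the unitarity of $U$ implies that $\tilde\rho^U_{BFC|\lambda_B A\lambda_C}$ is a pure state. The final term is $\log(d_{\lambda_B} d_A d_{\lambda_C})$, since $\tilde\rho^U_{\cdot|\lambda_B A\lambda_C} = \frac{1}{d_{\lambda_B} d_A d_{\lambda_C}}\,I_{(\lambda_B A\lambda_C)^*}$. Noting also that $\mathrm{Tr}_{\lambda_B A\lambda_C}(\tilde\rho^U_{BFC|\lambda_B A\lambda_C}) = \frac{1}{d_{\lambda_B} d_A d_{\lambda_C}}\,I_{BFC}$, and using the fact that the von Neumann entropy of a partial trace of a pure state equals the von Neumann entropy of the complementary partial trace, the first two terms equal $\log(d_F d_C)$ and $\log d_B$ respectively. Since $d_B d_F d_C = d_{\lambda_B} d_A d_{\lambda_C}$ for a unitary, their sum equals $\log(d_{\lambda_B} d_A d_{\lambda_C})$, and Eq. (A2) follows.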
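The first relation of the proof can be sanity-checked numerically, assuming Eq. (A2) reads $I(B : FC \mid \lambda_B A\lambda_C) = 0$, the form implied by the four entropy terms described above. This is a sketch, not part of the original argument: it takes every system to be a qubit, uses the standard pure Choi state $(1/d)\sum_{ij}|i\rangle\langle j|\otimes U|i\rangle\langle j|U^\dagger$ of a Haar-random three-qubit unitary, measures entropies in bits, and all helper names are hypothetical.

```python
import numpy as np


def reduced_state(rho, dims, keep):
    """Partial trace of `rho` over every subsystem whose index is not in `keep`."""
    n = len(dims)
    t = rho.reshape(list(dims) * 2)
    # Trace out from the highest index down so lower axis positions stay valid.
    for k in sorted(set(range(n)) - set(keep), reverse=True):
        m = t.ndim // 2
        t = np.trace(t, axis1=k, axis2=k + m)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)


def entropy(rho):
    """Von Neumann entropy in bits (numerically zero eigenvalues are dropped)."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))


def cmi(rho, dims, X, Y, Z):
    """I(X : Y | Z) = S(XZ) + S(YZ) - S(XYZ) - S(Z); X, Y, Z are index lists."""
    def S(sub):
        return entropy(reduced_state(rho, dims, sub)) if sub else 0.0
    return S(X + Z) + S(Y + Z) - S(X + Y + Z) - S(Z)


def haar_unitary(d, rng):
    """Haar-random unitary via QR of a complex Gaussian matrix, with phase fix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))


def choi_state(U):
    """Normalized Choi state of U on (inputs, outputs); pure because U is unitary."""
    d_in = U.shape[1]
    psi = U.T.reshape(-1) / np.sqrt(d_in)  # psi[i * d_out + k] = U[k, i] / sqrt(d_in)
    return np.outer(psi, psi.conj())


# Subsystems: 0 = lambda_B*, 1 = A*, 2 = lambda_C* (inputs); 3 = B, 4 = F, 5 = C (outputs).
dims = [2] * 6
rng = np.random.default_rng(7)
rho = choi_state(haar_unitary(8, rng))

i_a2 = cmi(rho, dims, [3], [4, 5], [0, 1, 2])         # I(B : FC | lambda_B A lambda_C)
s_out = entropy(reduced_state(rho, dims, [3, 4, 5]))  # maximally mixed output: 3 bits
print(f"I(B : FC | inputs) = {i_a2:.1e} bits; S(BFC) = {s_out:.6f} bits")
```

Because the relation holds for every unitary, any seed should give a value that vanishes up to floating-point rounding.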
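Second, The second and fourth terms are entropies of maximally mixed states on their respective systems, and together contribute $\log d_{\lambda_C}$. For the first and third terms, it follows from the assumption that there is no causal influence from $\lambda_C$ to $B$ in $U$ that $\tilde\rho^U_{B|\lambda_B A\lambda_C} = \tilde\rho^U_{B|\lambda_B A}\otimes\frac{1}{d_{\lambda_C}}\,I_{\lambda_C^*}$. Hence the third term is equal to $S(\tilde\rho^U_{B|\lambda_B A}) + \log d_{\lambda_C}$, which gives Eq. (A5), $I(B : \lambda_C \mid \lambda_B A) = 0$. Fourth, $I(C : \lambda_B \mid A\lambda_C) = 0$, which is Eq. (A7). This follows from an argument similar to that for Eq. (A5), using the assumption that there is no causal influence from $\lambda_B$ to $C$ in $U$.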
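The role of the no-influence assumption can also be seen numerically, assuming Eq. (A5) reads $I(B : \lambda_C \mid \lambda_B A) = 0$. The sketch below (qubits, standard pure Choi state, hypothetical helper names) builds a unitary in which the $B$ output never interacts with $\lambda_C$: a random $V$ on wires $(\lambda_B, A)$ followed by a random $W$ on the remaining two wires. For contrast it uses a swap that copies $\lambda_C$ straight into $B$, where the same quantity is 2 bits.

```python
import numpy as np


def reduced_state(rho, dims, keep):
    """Partial trace of `rho` over every subsystem whose index is not in `keep`."""
    n = len(dims)
    t = rho.reshape(list(dims) * 2)
    for k in sorted(set(range(n)) - set(keep), reverse=True):
        m = t.ndim // 2
        t = np.trace(t, axis1=k, axis2=k + m)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)


def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))


def cmi(rho, dims, X, Y, Z):
    """I(X : Y | Z) = S(XZ) + S(YZ) - S(XYZ) - S(Z)."""
    def S(sub):
        return entropy(reduced_state(rho, dims, sub)) if sub else 0.0
    return S(X + Z) + S(Y + Z) - S(X + Y + Z) - S(Z)


def haar_unitary(d, rng):
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))


def choi_state(U):
    d_in = U.shape[1]
    psi = U.T.reshape(-1) / np.sqrt(d_in)  # psi[i * d_out + k] = U[k, i] / sqrt(d_in)
    return np.outer(psi, psi.conj())


I2 = np.eye(2)
rng = np.random.default_rng(11)
# Wires 0, 1, 2 carry lambda_B, A, lambda_C in and B, F, C out.
V = haar_unitary(4, rng)  # acts on wires (0, 1)
W = haar_unitary(4, rng)  # acts on wires (1, 2); never touches wire 0
U_no_influence = np.kron(I2, W) @ np.kron(V, I2)

# Contrast: swap wires 0 and 2, so the B output is a copy of the lambda_C input.
SWAP02 = np.zeros((8, 8))
for b0 in range(2):
    for b1 in range(2):
        for b2 in range(2):
            SWAP02[4 * b2 + 2 * b1 + b0, 4 * b0 + 2 * b1 + b2] = 1.0

dims = [2] * 6  # 0 = lambda_B*, 1 = A*, 2 = lambda_C*; 3 = B, 4 = F, 5 = C


def i_a5(U):
    """I(B : lambda_C | lambda_B A) evaluated on the Choi state of U."""
    return cmi(choi_state(U), dims, [3], [2], [0, 1])


print(f"{i_a5(U_no_influence):.1e}")  # zero up to rounding
print(round(i_a5(SWAP02), 6))         # 2.0
```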
The aim is now to use Eqs. (A2), (A4), (A5) and (A7) to show that $\rho_{BC|A}$ satisfies $I(B : C|A) = 0$. This follows using a result from Ref. [12], which states that quantum conditional mutual informations evaluated on partial traces of a multipartite quantum state satisfy the semi-graphoid axioms familiar from the classical formalism of causal networks [8].
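The result of Ref. [12] is not reproduced here, but one way to see why the decomposition, weak-union and contraction axioms hold for quantum conditional mutual information is the chain rule $I(X : YW \mid Z) = I(X : Y \mid Z) + I(X : W \mid YZ)$ together with strong subadditivity ($I \ge 0$): if the left side vanishes, so do both terms on the right, and conversely. The sketch below checks these two facts on a random four-qubit state and, as a hypothetical illustration of the block structure of Lem. 1, compares a state with $A = A^L A^R$ (two Bell pairs) against a GHZ state. Qubit systems, the random seed and the helper names are assumptions.

```python
import numpy as np


def reduced_state(rho, dims, keep):
    """Partial trace of `rho` over every subsystem whose index is not in `keep`."""
    n = len(dims)
    t = rho.reshape(list(dims) * 2)
    for k in sorted(set(range(n)) - set(keep), reverse=True):
        m = t.ndim // 2
        t = np.trace(t, axis1=k, axis2=k + m)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)


def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))


def cmi(rho, dims, X, Y, Z):
    """I(X : Y | Z) = S(XZ) + S(YZ) - S(XYZ) - S(Z)."""
    def S(sub):
        return entropy(reduced_state(rho, dims, sub)) if sub else 0.0
    return S(X + Z) + S(Y + Z) - S(X + Y + Z) - S(Z)


# Random full-rank four-qubit density matrix; subsystems X = 0, Y = 1, W = 2, Z = 3.
rng = np.random.default_rng(3)
g = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
rho = g @ g.conj().T
rho = rho / np.trace(rho).real
dims = [2, 2, 2, 2]

lhs = cmi(rho, dims, [0], [1, 2], [3])      # I(X : YW | Z)
i_xy_z = cmi(rho, dims, [0], [1], [3])      # I(X : Y | Z)
i_xw_yz = cmi(rho, dims, [0], [2], [1, 3])  # I(X : W | YZ)
rhs = i_xy_z + i_xw_yz
print(f"chain-rule gap = {abs(lhs - rhs):.1e}; terms = {i_xy_z:.4f}, {i_xw_yz:.4f}")

# Lem. 1-type structure with one block: B paired with A^L, C paired with A^R.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell_vec = np.kron(phi, phi)                           # order B, A^L, A^R, C
bell = np.outer(bell_vec, bell_vec)
bell_cmi = cmi(bell, [2, 2, 2, 2], [0], [3], [1, 2])   # I(B : C | A^L A^R)

# GHZ state on A, B, C for contrast.
ghz_vec = np.zeros(8)
ghz_vec[0] = ghz_vec[7] = 1 / np.sqrt(2)
ghz = np.outer(ghz_vec, ghz_vec)
ghz_cmi = cmi(ghz, [2, 2, 2], [1], [2], [0])           # I(B : C | A)
print(f"I(B:C|A): Bell pairs {bell_cmi:.3f}, GHZ {ghz_cmi:.3f}")  # about 0 and 1 bit
```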