Causal Networks and Freedom of Choice in Bell's Theorem

Bell's theorem is typically understood as the proof that quantum theory is incompatible with local-hidden-variable models. More generally, we can see the violation of a Bell inequality as witnessing the impossibility of explaining quantum correlations with classical causal models. The violation of a Bell inequality, however, does not exclude classical models where some level of measurement dependence is allowed, that is, the choice made by observers can be correlated with the source generating the systems to be measured. Here, we show that the level of measurement dependence can be quantitatively upper bounded if we arrange the Bell test within a network. Furthermore, we also prove that these results can be adapted in order to derive nonlinear Bell inequalities for a large class of causal networks and to identify quantumly realizable correlations that violate them.


I. INTRODUCTION
Bell's theorem [1] can arguably be seen as the most radical departure from classical physics. The kind of nonclassicality it entails is established without relying on any specific details of, or assumptions about, the experimental apparatus, and it can furthermore be put to practical use in a variety of quantum information processing protocols within what is known as the device-independent framework [2].
In its standard interpretation, the violation of a Bell inequality shows that quantum correlations are incompatible with any theory respecting local realism, that is, any theory in which physical properties have well-defined values prior to measurement and in which far-away events do not have a direct causal influence on each other. From another perspective, a Bell inequality violation can also be seen as disproving any theory obeying the notion of local causality, which implies a certain factorization of probabilities and which can be derived from two more fundamental assumptions, namely Reichenbach's principle [3] and relativistic causality, according to which the past corresponds to the past light-cone (see Refs. [4,5]). However, as first pointed out by Brans [6], local hidden variable models are still capable of reproducing the quantum predictions if we allow for measurement dependence, a mechanism whereby our measurement devices are correlated with the system to be measured (see also [7][8][9]). This subtle assumption in Bell's theorem, also known as the assumption of "free will" or of "statistical independence", has since attracted growing attention, both from a theoretical [9][10][11][12][13][14][15][16][17][18][19][20][21] and an experimental [22][23][24][25] perspective, and can be related to the communication cost between the measurement stations needed by classical models to reproduce quantum correlations [11][26][27][28][29][30][31][32][33][34].
* rchaves@iip.ufrn.br
Of particular relevance is the paradigmatic Clauser-Horne-Shimony-Holt (CHSH) Bell scenario [35], involving two distant parties, each measuring two dichotomic observables, which has been thoroughly analyzed also allowing for relaxations of the measurement independence assumption [20]. As compared with the relaxation of locality, where it is known [29] that one requires 1 bit of classical communication to simulate the maximal quantum violation of the CHSH inequality, measurement dependence turns out to be a stronger resource, as merely 0.046 bits of correlation are already enough to achieve the simulation [20]. In spite of steady progress, all results to date suffer from the fact that they only impose lower bounds on the amount of measurement dependence needed to simulate a given violation of a Bell inequality using a classical causal model. If, by considering a slight modification of a Bell experiment, one had the means of determining an upper bound on the amount of measurement dependence that can be present, then whenever this amount was less than the lower bound on the amount needed to explain the observed violation of a Bell inequality in a classical causal model, one could infer that the violation was due to nonclassical effects. Such a modification, in other words, would provide a means of adjudicating between measurement dependence and nonclassicality as explanations of the violation. In the absence of such an upper bound, measurement dependence has remained a seemingly untestable loophole in any Bell experiment.
The first aim of this paper is to revisit measurement dependence and show that, under some assumptions, it can indeed be upper bounded from the data observed in a slightly modified Bell experiment. To that end, we arrange the standard Bell scenario as part of a larger causal network that includes an auxiliary variable, and use the correlations between the measurement inputs and the auxiliary variable in order to upper bound how much such inputs might depend on the source generating the physical systems to be measured. From that we obtain non-linear Bell inequalities (which explicitly incorporate possible correlations between the system to be measured and the settings of the measurement devices), the violation of which is a clear signature of nonclassicality, rather than something which can be explained by merely positing measurement dependence within a classical causal model.
arXiv:2105.05721v2 [quant-ph] 19 Nov 2021
Following this, we explore the connections between measurement-dependent Bell causal structures and general causal networks that have recently started to attract attention in the literature [36]. As compared with usual Bell scenarios, these new networks have two characteristic features. First, the fact that the correlations between the distant parties are now mediated by a number of independent sources. Second, the fact that one can prove nonclassical behaviour even without the need for any inputs, something considered quintessential in Bell's theorem [37]. In spite of the promising foundational and applied uses, progress in the analysis of general causal networks has been impeded by the fact that the correlations compatible with them define nonconvex sets, for which decades of expertise gathered with the standard Bell scenario and convex optimization algorithms are of limited or no use. Here, we show that standard Bell scenarios with measurement dependence can readily and generally be mapped onto causal networks of growing size and with different topologies. Not only do we derive new non-linear Bell inequalities for a variety of networks but also show, for the first time, that they lead to nonclassical correlations.
The paper is organized as follows. In Sec. II, we revisit the measurement independence assumption in Bell's theorem from a causal perspective and in particular identify and discriminate two possible mechanisms able to generate measurement dependence. In Sec. III, we briefly present the entropic approach for the characterization of classical causal structures. In Sec. IV, we discuss and derive a number of results and inequalities for the characterization of bipartite Bell scenarios with measurement dependence. In this section we also generalize these results to generic multipartite Bell scenarios and in particular derive strong lower bounds for measurement dependence based on the Mermin inequality [38]. In Sec. V, we adapt our results for the analysis of a wide variety of causal networks, in particular proving that they can lead to nonclassical correlations. Finally, in Sec. VI, we discuss our results and point out interesting directions for future research.

II. FREEDOM OF CHOICE IN BELL'S THEOREM
Bell's theorem [1] is the prime example of the incompatibility of quantum predictions with those of classical causal models. From a modern perspective [4,17], we impose a given causal structure on our quantum experiment and inquire whether causal models of classical origin can explain the observed empirical data, that is, the observed probability distribution. Different causal structures can be used to that aim [39][40][41][42][43][44][45][46][47][48][49], but the paradigmatic illustration involves two distant parties, Alice and Bob, who, upon receiving physical systems from a common source, randomly decide which measurements to perform, obtaining the corresponding outcomes. Their measurement choices are labelled by the variables X and Y while their outcomes are labelled by A and B, for Alice and Bob respectively. We note that random variables are typically denoted by uppercase letters and the values these variables can assume by lowercase letters. For instance, X would represent the input variable of Alice and X=x a specific value of it. In order to simplify the notation and presentation, in what follows we will use lowercase letters only. Invoking special relativity, if Alice and Bob are space-like separated, then the measurement choices of one should have no causal influence on the measurement outcomes of the other. These are the so-called no-signalling constraints. They imply that the probability distribution p(a, b|x, y) observed in such an experiment should respect
$$p(a|x,y) = \sum_{b} p(a,b|x,y) = p(a|x), \qquad p(b|x,y) = \sum_{a} p(a,b|x,y) = p(b|y). \tag{1}$$
From a causal inference perspective [50], the central question is: what is the simplest causal structure able to recover, in a faithful manner, the probabilistic conditional independence relations corresponding to the no-signalling condition? That is, these conditional independencies should follow from the causal structure itself and not from a fine-tuning of model parameters.
More formally, a causal structure can be defined by a directed acyclic graph (DAG), where the nodes represent the variables of interest and directed edges encode the causal relations among them [50]. Classically, any node $x_i$ in the graph can be understood as a function of its graph-theoretical parents $Pa(x_i)$, implying the causal Markov condition [50], which stipulates that a variable is conditionally independent of its nondescendants given its parents, and thus
$$p(x_1,\dots,x_n) = \prod_{i=1}^{n} p(x_i|Pa(x_i)). \tag{2}$$
It turns out that [4] the observational equivalence class of causal structures which faithfully (that is, without fine-tuning of parameters) imply the no-signalling constraints is the one which includes the causal structure posited by Bell in his seminal work, shown in Fig. 1 [51]. Employing the causal Markov condition [50], any probability distribution compatible with Bell's causal structure should fulfill
$$p(a,b|x,y) = \sum_{\lambda} p(a|x,\lambda)\,p(b|y,\lambda)\,p(\lambda). \tag{3}$$
As shown by Bell [1], a quantum experiment which mirrors this causal structure can yield correlations that are incompatible with the classical description given by (3). This is the phenomenon known as Bell nonlocality, which can be witnessed by violating Bell inequalities, linear constraints of the general form
$$\sum_{a,b,x,y} c_{a,b,x,y}\, p(a,b|x,y) \le L, \tag{4}$$
where the $c_{a,b,x,y}$ are real coefficients and L is the maximum possible value achievable by the classical description (3).
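As a minimal sketch of what the classical description (3) allows, the following Python snippet enumerates the deterministic response functions of Alice and Bob in the binary-input, binary-output scenario and evaluates a CHSH-type linear expression on each of them. Since any model of the form (3) is a convex mixture of such deterministic strategies and the expression is linear, the largest value found is the classical bound. The function names and the particular sign convention for the CHSH expression (minus sign on the x = y = 1 term) are assumptions made here for illustration.

```python
import itertools

def chsh(p):
    """CHSH value of p[(a, b, x, y)] = p(a,b|x,y) with inputs/outputs in {0, 1}."""
    def corr(x, y):
        return sum((-1) ** (a + b) * p[(a, b, x, y)]
                   for a in (0, 1) for b in (0, 1))
    # Assumed convention: the minus sign sits on the (x, y) = (1, 1) term.
    return corr(0, 0) + corr(0, 1) + corr(1, 0) - corr(1, 1)

def deterministic_behaviour(fa, fb):
    """p(a,b|x,y) for one value of lambda: a = fa[x], b = fb[y]."""
    return {(a, b, x, y): float(a == fa[x] and b == fb[y])
            for a, b, x, y in itertools.product((0, 1), repeat=4)}

def is_no_signalling(p, tol=1e-12):
    """True if Alice's marginal ignores y and Bob's marginal ignores x."""
    for a, x in itertools.product((0, 1), repeat=2):
        m = [sum(p[(a, b, x, y)] for b in (0, 1)) for y in (0, 1)]
        if abs(m[0] - m[1]) > tol:
            return False
    for b, y in itertools.product((0, 1), repeat=2):
        m = [sum(p[(a, b, x, y)] for a in (0, 1)) for x in (0, 1)]
        if abs(m[0] - m[1]) > tol:
            return False
    return True

# Each strategy is the pair (f(0), f(1)) of outputs for the two inputs.
strategies = list(itertools.product((0, 1), repeat=2))
behaviours = [deterministic_behaviour(fa, fb)
              for fa in strategies for fb in strategies]

max_lhv = max(chsh(p) for p in behaviours)          # classical bound L
all_ns = all(is_no_signalling(p) for p in behaviours)
print(max_lhv, all_ns)
```

Under these assumptions the enumeration also confirms that every deterministic local strategy, and hence every mixture of them, respects the no-signalling constraints.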
Often it is said that Bell's theorem shows the incompatibility between quantum theory and the assumptions of realism, locality and freedom of choice. Under this standard view, the classical decomposition (3) is referred to as a local hidden variable (LHV) model. This interpretation follows naturally from the causal perspective. The realism assumption corresponds to the assumption of explainability of the correlations in terms of a classical causal model, in particular, one positing the existence of a hidden variable Λ. In turn, the locality and freedom of choice (or measurement independence) assumptions are encoded in the causal structure of Fig. 1. Through the Markov condition, this structure implies that p(a|b,y,x,λ) = p(a|x,λ) and p(b|a,x,y,λ) = p(b|y,λ), a condition that is typically called 'local causality' [5,52] and which asserts that Alice's and Bob's outcomes are fully determined by their choices and the state of the source. It also implies that p(x,y,λ) = p(x,y)p(λ), the condition that is typically taken to formalize the notion of measurement independence and which asserts that Alice and Bob's choices are independent from the common source establishing the correlations.

A. When Measurement Independence Fails
The failure of measurement independence is called measurement dependence. In a scenario with measurement dependence we have p(x,y,λ) ≠ p(x,y)p(λ). In stark contrast with the causal models of the standard Bell scenario given by Eq. (3), in the presence of measurement dependence we allow p(λ|x,y) ≠ p(λ). Without measurement independence, the admissible correlations p(a,b|x,y) are fairly unrestricted; certainly standard Bell inequalities would no longer constrain them.
However, just because one grants that X or Y might be somewhat correlated with Λ does not mean that X and Y are not also largely functions of causal factors which are independent of Λ. To this end, we formally introduce local laboratory private randomness sources U_x and U_y for Alice and Bob respectively, as depicted in Fig. 2. We now provide some physical intuition for why it makes sense to model X and Y in terms of both causal factors confounded with Λ and private causal factors independent of Λ. Unlike the locality assumption, which can be assured by invoking special relativity, measurement independence is a thorny issue. Nevertheless, the independence of U_x, U_y and Λ can indeed be made very plausible. For instance, U_x and U_y could stand for stars emitting cosmic photons centuries ago [23,25] or even for the human randomness of hundreds of thousands of people around the globe [24]. It seems reasonable that such sources are independent of Λ, which in a photonic experiment would represent the laser and non-linear crystal employed to generate entangled photons.

Figure 2. Bell's causal structure with measurement dependence. The purple arrows represent the fact that the correlations between X, Y and Λ can be due to some direct causal influence among them or mediated by an external common source. We emphasise that even though the settings are plausibly common-cause connected with Λ, typically one can also be confident that they are (strongly) influenced by independent localized sources of randomness.
By explicitly introducing U_x and U_y into our causal models, we can now rephrase the assumption of measurement independence as equivalent to the assumption that X depends exclusively on U_x and not on any hidden factor correlated with Λ, i.e., the assumption that p(x|u_x,λ) = p(x|u_x). While it is clear that (by definition) U_x is independent of U_y and Λ, it can be difficult to rigorously justify the assumption of the independence of X and Y from Λ.
As an illustration of why this is the case, suppose that U_x represents some far-away star emitting cosmic photons that define the variable X. There is no reason to doubt that this star is independent of a laser in a laboratory today. However, at some point, the photons emitted by these independent sources (the star and the laser) meet up within the physicist's lab and define the outcome A they will give rise to. Within this context, the atmosphere in the lab (acting as a medium for the photons), or whatever else might affect the photons' state, can act as a source of correlations and lead to deviations from perfect measurement independence.
In short, even though the experiment might use a source U_x that is independent of Λ, a causal mediary between U_x and the measurement outcome variable A might nonetheless be influenced by something that also has an influence on Λ, so that X and Λ end up having a common cause (as illustrated in Fig. 2) and therefore the potential for a small amount of statistical dependence between them. Furthermore, as it turns out, even a quite small amount of measurement dependence (to be explicitly quantified below) is already sufficient to simulate the maximum possible Bell inequality violation achievable with quantum mechanics [10,20].
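Even though the sources U_x, U_y and Λ can be assumed independent, the most general local hidden variable model represented by the DAG in Fig. 2 implies that
$$p(a,b,x,y) = \sum_{u_x,u_y,\lambda} p(a|x,\lambda)\,p(b|y,\lambda)\,p(x|u_x,\lambda)\,p(y|u_y,\lambda)\,p(u_x)\,p(u_y)\,p(\lambda), \tag{5}$$
such that
$$p(a,b|x,y) = \sum_{\lambda} p(a|x,\lambda)\,p(b|y,\lambda)\,p(\lambda|x,y). \tag{6}$$
Note that p(λ|x,y) = p(λ) only if p(x|u_x,λ) = p(x|u_x) and p(y|u_y,λ) = p(y|u_y), in which case we recover the standard measurement-independent model of Eq. (3).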

B. Quantifying Measurement Dependence
In the special limit p(x,y,λ)−p(x,y)p(λ) → 0 we recover measurement independence, so one natural measure for quantifying the degree of measurement dependence in a given Bell experiment is [17]
$$M := \sum_{x,y,\lambda} |p(x,y,\lambda) - p(x,y)\,p(\lambda)|. \tag{7}$$
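To make the L1-norm quantifier M concrete, the following minimal Python sketch evaluates it for two toy joint distributions p(x,y,λ): one in which the settings are independent of a binary λ, and one in which λ is a perfect copy of x. Both distributions and the function name are hypothetical examples chosen here for illustration; they do not come from any experiment.

```python
import itertools

def measurement_dependence(p_xyl):
    """M = sum over (x, y, l) of |p(x,y,l) - p(x,y) p(l)|.

    p_xyl maps (x, y, l) to a probability; missing keys count as zero.
    """
    p_xy, p_l = {}, {}
    for (x, y, l), v in p_xyl.items():
        p_xy[(x, y)] = p_xy.get((x, y), 0.0) + v
        p_l[l] = p_l.get(l, 0.0) + v
    return sum(abs(p_xyl.get((x, y, l), 0.0) - p_xy[(x, y)] * p_l[l])
               for (x, y) in p_xy for l in p_l)

# Uniform settings and uniform binary lambda, all mutually independent.
independent = {(x, y, l): 1 / 8
               for x, y, l in itertools.product((0, 1), repeat=3)}

# Uniform settings, but lambda is a perfect copy of Alice's setting x.
copy_x = {(x, y, x): 1 / 4 for x, y in itertools.product((0, 1), repeat=2)}

m_independent = measurement_dependence(independent)
m_copy = measurement_dependence(copy_x)
print(m_independent, m_copy)
```

In the second example each of the eight terms contributes 1/8, so M equals 1, whereas the independent distribution gives M = 0 as expected.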
Clearly, when M = 0, the measurement-dependent model (6) goes over to the usual measurement-independent model (3). A related measure that has also been considered in the literature [10,20] is the mutual information between the experimenters' choices and the source Λ, defined as
$$I(X,Y:\Lambda) := H(X,Y) + H(\Lambda) - H(X,Y,\Lambda), \tag{8}$$
where $H(X) := -\sum_x p(x)\log_2 p(x)$ is the Shannon entropy of the random variable X. Importantly, the measures (7) and (8) can be related via the Pinsker inequality [53], which for entropies measured in bits reads
$$I(X,Y:\Lambda) \ge \frac{M^2}{2\ln 2}. \tag{9}$$
Within this context, we can then ask how much measurement dependence would be necessary to reproduce some Bell inequality violation using the classical causal model described by (6) (see Fig. 2). For instance, in the paradigmatic Clauser-Horne-Shimony-Holt (CHSH) scenario [35], where each of the parties can perform one of two possible dichotomic measurements, LHV models (3) respect
$$\mathrm{CHSH} = \langle A_0B_0\rangle + \langle A_0B_1\rangle + \langle A_1B_0\rangle - \langle A_1B_1\rangle \le 2, \tag{10}$$
where $\langle A_xB_y\rangle = \sum_{a,b}(-1)^{a+b}p(a,b|x,y)$ is the expectation value of Alice and Bob's outcomes A and B conditioned on the inputs x and y (with all input and output variables taking values 0 or 1). For measurement-dependent models (6), one can prove that [17]
$$(\mathrm{CHSH}-2)/4 \le M. \tag{11}$$
In turn, using the mutual information, an analogous lower bound on I(X,Y : Λ) as a function of the CHSH value, Eq. (12), has been proven in [20]; it is expressed in terms of the binary entropy
$$h(x) = -x\log_2 x - (1-x)\log_2(1-x). \tag{13}$$
The bound (12) implies, in particular, that even for the maximal quantum violation of $2\sqrt{2}$ of the CHSH inequality, a measurement dependence as low as I(X,Y : Λ) = 0.046 bits is already enough to reproduce the quantum predictions.
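The order of magnitude quoted above can be checked numerically with a toy measurement-dependent model. The sketch below is an illustrative construction assumed here, not necessarily the one used in [20]: λ takes four equiprobable values, each labelled by one pair of settings at which its deterministic strategy fails the corresponding CHSH term, and the settings are biased so that the failing pair occurs with probability 1 − q (q = 3/4 is measurement independence). The names `toy_model`, `strategy_failing_at` and the CHSH sign convention are likewise assumptions.

```python
import itertools
import math

SETTINGS = list(itertools.product((0, 1), repeat=2))  # all pairs (x, y)

def sign(x, y):
    """Sign of <A_x B_y> in CHSH (assumed convention: minus on x = y = 1)."""
    return -1 if (x, y) == (1, 1) else 1

def strategy_failing_at(fail):
    """Deterministic +/-1 outputs winning every CHSH term except `fail`."""
    for a0, a1, b0, b1 in itertools.product((1, -1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        if all((a[x] * b[y] == sign(x, y)) != ((x, y) == fail)
               for x, y in SETTINGS):
            return a, b
    raise ValueError("no deterministic strategy found")

def toy_model(q):
    """Return (CHSH value, I(X,Y:Lambda) in bits) for bias parameter q."""
    strategies = {l: strategy_failing_at(l) for l in SETTINGS}
    # p(lambda) = 1/4; p(x,y|lambda) = 1 - q on the failing pair, q/3 elsewhere.
    p_xyl = {(x, y, l): 0.25 * ((1 - q) if (x, y) == l else q / 3)
             for (x, y) in SETTINGS for l in SETTINGS}
    p_xy = {s: sum(p_xyl[s + (l,)] for l in SETTINGS) for s in SETTINGS}
    chsh = 0.0
    for (x, y) in SETTINGS:
        # Correlator from p(a,b|x,y) = sum_l p(a|x,l) p(b|y,l) p(l|x,y).
        corr = sum(p_xyl[(x, y, l)] / p_xy[(x, y)]
                   * strategies[l][0][x] * strategies[l][1][y]
                   for l in SETTINGS)
        chsh += sign(x, y) * corr
    info = sum(p * math.log2(p / (p_xy[(x, y)] * 0.25))
               for (x, y, l), p in p_xyl.items() if p > 0)
    return chsh, info

q_tsirelson = (4 + 2 * math.sqrt(2)) / 8
chsh_q, info_q = toy_model(q_tsirelson)
chsh_free, info_free = toy_model(0.75)
print(chsh_q, info_q)
```

Within this toy family the CHSH value is 8q − 4, so the value $2\sqrt{2}$ is reached at q = (4 + 2√2)/8, where the mutual information evaluates to roughly 0.046 bits, in line with the figure quoted in the text. The sketch only exhibits one model attaining that value; it does not show that no model can do with less.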
Equations (11) and (12) show that the violation of a Bell inequality can be explained by some non-zero degree of measurement dependence. The problem, however, is that whether or not there is such measurement dependence cannot be determined in a standard Bell experiment. In other words, one cannot know if the violation of a given Bell inequality was due to quantum effects or simply because the measurement independence assumption was not fulfilled. As such, the measurement dependence loophole remains as a possible classical explanation for the violation of a Bell inequality, even for milestone experiments violating a Bell inequality while sealing the locality and detection efficiency loopholes [54][55][56].

Figure 3. Causal structure assessing the measurement dependence. The extra measurement outcome variable R can be used to provide an upper bound on the measurement dependence. Notice that this model allows, at least in principle, for arbitrary correlations between the input variables X and Y and the source Λ.
At the core of the problem is the fact that, in a standard Bell experiment, there is nothing that implies an upper bound on the amount of measurement dependence.
In the following, we will show that by embedding a Bell experiment in a larger causal network that includes an auxiliary variable R which is influenced by U_x and U_y but is independent of Λ (see Fig. 3), we are able to derive an upper bound on the amount of measurement dependence. In essence, the idea is that, for this causal structure, if R is strongly correlated with X, then X must be only weakly correlated with Λ, and similarly for Y. (The limiting case of this tradeoff, namely the fact that perfect correlation between R and X implies no correlation between X and Λ, is what Fritz [41] used to establish that the triangle causal scenario can be mapped onto the Bell scenario. The latter observation is therefore the root of the approach to bounding measurement dependence described in this article.) This fact allows us to answer in an unambiguous manner (assuming the sources U_x, U_y and Λ to be independent) the question of whether the violation of a Bell inequality is due to the presence of quantum entanglement or due to measurement dependence. We will also extend our analysis to the multipartite case and show how our inequalities can be used to characterize a large class of causal networks that are increasingly attracting attention [39][40][41][42][43][44][45][46][47][48][49][57]. Prior to doing so, however, we briefly introduce the entropic framework for causal inference [58] that will be crucial to derive some of our technical results.

III. ENTROPIC INEQUALITIES AND CAUSAL NETWORKS
Our aim is to obtain an upper bound on the measurement dependence measures, M or I(X,Y : Λ), in terms of the degree of correlation exhibited between R and X,Y in the causal structure of Fig. 3. Note that the joint distributions on observed variables that are compatible with the causal structure of Fig. 3 are of the form
$$p(a,b,x,y,r) = \sum_{u_x,u_y,\lambda} p(a|x,\lambda)\,p(b|y,\lambda)\,p(x|u_x,\lambda)\,p(y|u_y,\lambda)\,p(r|u_x,u_y)\,p(u_x)\,p(u_y)\,p(\lambda). \tag{14}$$
However, the fact that the sources U_x, U_y and Λ are independent implies that this set is nonconvex and therefore difficult to characterize [44,59]. This nonconvexity can be circumvented by the entropic approach introduced in [58][60][61][62], allowing one to obtain analytical bounds for I(X,Y : Λ). The bounds can then also be translated into bounds on the L1-norm quantifier M via the Pinsker inequality [53]. A detailed account of the entropic approach can be found in [58]. Here we will introduce the central concepts necessary to understand the results that will follow.
Consider a set of n discrete random variables X 1 , ... , X n . We denote as [n] = {1, ... , n} the set of indices of these random variables.
For every subset $S \in 2^{[n]}$ of indices, $X_S$ is the random vector $(X_i)_{i\in S}$ and $H(S) := H(X_S)$ is its associated Shannon entropy, defined by $H(X_S) := -\sum_{x_S} p(x_S)\log_2 p(x_S)$. We can construct an entropy vector h with all $2^n$ possible entropies for n variables (including the empty set) and ask what constraints h must satisfy to be a valid entropy vector. The region of the real space $\mathbb{R}^{2^n}$ corresponding to entropies is known to define a convex cone [63], but a complete and explicit description of it remains unknown. For this reason, one has to work with an outer approximation, known as the Shannon cone $\Gamma_n$, defined by the set of linear inequalities
$$H([n]\setminus\{i\}) \le H([n]), \tag{15a}$$
$$H(S) + H(S\cup\{i,j\}) \le H(S\cup\{i\}) + H(S\cup\{j\}), \tag{15b}$$
for all $i \ne j$ and $S \subseteq [n]\setminus\{i,j\}$, together with $H(\emptyset) = 0$. These inequalities are known as the elementary inequalities, and any inequality that follows from the elementary set is said to be of the Shannon type. The first constraint (15a) is known as monotonicity and states that the uncertainty about a set of variables should always be larger than or equal to the uncertainty about any subset of it, i.e., nonnegativity of conditional entropy. The second constraint (15b) is called strong sub-additivity and is equivalent to the nonnegativity of the conditional mutual information, that is, $I(X_i : X_j | X_S) \ge 0$.

The causal relations implied by a given causal structure can be easily integrated in this framework as linear constraints. For instance, the independence of the sources in the causal structure of Fig. 3 implies that $H(U_x,U_y,\Lambda) = H(U_x) + H(U_y) + H(\Lambda)$. The subspace of $\mathbb{R}^{2^n}$ defined by all such constraints can be denoted as $\Gamma_c$. Thus, any entropy vector compatible with a given causal structure should lie in the convex cone $\Gamma_n^c := \Gamma_n \cap \Gamma_c$. Since the sources are not directly observable in the Bell experiment, they need to be traced out from our description, an instance of a quantifier elimination problem that in the entropic case can be performed by a simple Fourier-Motzkin elimination [64].
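As a rough illustration of the objects involved, the following Python sketch (using NumPy) builds the entropy vector of a joint distribution stored as an n-dimensional array, checks the monotonicity and strong sub-additivity inequalities numerically, and verifies that a product distribution satisfies the linear independence constraint H(X,Y) = H(X) + H(Y). The function names and the example distributions are assumptions introduced for illustration; the sketch does not perform the Fourier-Motzkin elimination itself.

```python
import itertools
import numpy as np

def entropy_vector(p):
    """Map every subset S of the axes of the joint array p to H(X_S) in bits."""
    n = p.ndim
    h = {}
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            drop = tuple(i for i in range(n) if i not in S)
            marg = p.sum(axis=drop) if drop else p
            q = np.asarray(marg).ravel()
            q = q[q > 0]
            h[frozenset(S)] = float(-(q * np.log2(q)).sum())
    return h

def satisfies_elementary(h, n, tol=1e-9):
    """Check monotonicity and strong sub-additivity for an entropy vector h."""
    full = frozenset(range(n))
    # Monotonicity: H([n] \ {i}) <= H([n]).
    for i in range(n):
        if h[full - {i}] > h[full] + tol:
            return False
    # Strong sub-additivity: H(S) + H(S+i+j) <= H(S+i) + H(S+j).
    for i, j in itertools.combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        for r in range(len(rest) + 1):
            for S in itertools.combinations(rest, r):
                S = frozenset(S)
                if h[S] + h[S | {i, j}] > h[S | {i}] + h[S | {j}] + tol:
                    return False
    return True

# A random joint distribution over three variables with 2, 3 and 2 values.
rng = np.random.default_rng(0)
p = rng.random((2, 3, 2))
p /= p.sum()
in_shannon_cone = satisfies_elementary(entropy_vector(p), 3)

# Independence as a linear constraint: H(X,Y) = H(X) + H(Y) for a product.
hp = entropy_vector(np.outer([0.3, 0.7], [0.5, 0.25, 0.25]))
independence_gap = hp[frozenset({0})] + hp[frozenset({1})] - hp[frozenset({0, 1})]
print(in_shannon_cone, independence_gap)
```

Every genuine probability distribution yields a vector inside the Shannon cone, so the check is expected to succeed for any valid input; its purpose here is only to make the inequalities tangible.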

IV. BOUNDING THE MEASUREMENT DEPENDENCE
As noted earlier, in order to upper bound the measurement dependence I(X,Y : Λ), we modify the causal structure to that of Fig. 3, wherein there is an extra variable R that might depend on the sources U x and U y but is independent of Λ.
Employing the general entropic framework introduced in [58] and outlined above, we can completely characterize the Shannon inequalities bounding the measurement dependence I(X,Y : Λ). For our purpose, the causal constraints implied by the DAG in Fig. 3 can be summarized by the entropic constraints (16a) and (16b). Eq. (16a) follows from the independence of the sources, while Eqs. (16b) encode the zero conditional mutual information between any random variable and its causal nondescendants given its parents, i.e., the local Markov condition. Using the approach delineated before and performing the corresponding Fourier-Motzkin elimination [64], we find three non-trivial upper bounds for I(X,Y : Λ), which constitute Lemma 1. The entropic approach also gives rise to a lower bound on the same quantity. Each of the upper bounds in Lemma 1 can be combined with (11) or (12) to give rise to a non-linear Bell inequality.
Proposition 2. For observational data compatible with the classical causal structure in Fig. 3, inequalities (19) and (20) hold, the former by virtue of combining (12) with Lemma 1 and the latter by virtue of combining (11) with Lemma 1 through the Pinsker inequality (9).
Under the assumption that the sources U x , U y and Λ are independent, a violation of any combination of these inequalities would mean that whatever degree of measurement dependence is present, i.e., whatever value I(X,Y : Λ) takes, it is not enough to explain the observed correlations. Thus, we would be unambiguously witnessing nonclassicality. Notice that if the input variables X and Y are perfectly correlated with the auxiliary variable R, then H(X,Y |R) = 0 implying that I(X,Y : Λ) = 0. In this case, we recover the usual Bell scenario with no measurement dependence. It is important to highlight, however, that in our scenario, an upper bound on the amount of measurement dependence is implied by the empirical data (assuming the independence of sources) and not assumed a priori, like in a standard Bell scenario.
Notice that the upper bounds in Lemma 1 are valid for an arbitrary number of inputs and outputs. Thus, inequalities like (19) and (20) can be derived for arbitrary bipartite Bell scenarios. To illustrate, it was noted in [17] that the measure M can also be related to the CGLMP inequality [65], a Bell inequality bounding classical correlations in a scenario where Alice and Bob have d possible outcomes. More precisely,
$$(I_d - 2)/4 \le M, \tag{21}$$
where $I_d$ is the CGLMP expression of [65], written in terms of $p(a_x,b_y) = p(a,b|x,y)$. Using the Pinsker inequality (9), we can readily derive a generalization of inequality (20).

Proposition 3.
For observational data compatible with the classical causal structure in Fig. 3, inequality (24) holds. Violation of inequality (24) implies that the degree of violation of the CGLMP inequality cannot be accounted for by measurement dependence and therefore attests to the presence of nonclassicality.

A. Example: The Fritz distribution
To illustrate our results we consider the Fritz distribution [41], the first known example connecting causal networks with Bell's theorem. In this case, all variables are binary and the measurement outcome of variable r consists of two bits, $r = (r_0, r_1)$. As argued by Fritz, if the bit x is perfectly correlated with $r_0$, this implies that x should be completely uncorrelated with the source λ. Similarly, perfect correlation between y and $r_1$ implies that y is uncorrelated with the source λ. That is, the variables X and Y can be seen as the standard measurement choices of Alice and Bob in the usual Bell scenario. Under this condition of perfect correlations, the violation of a standard Bell inequality by the conditional distribution p(a,b|x,y) is then a sufficient condition to witness nonclassicality.
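A quantum realization of such a scenario is given by
$$p(a,x,b,y,r_0,r_1) = \mathrm{Tr}\left[\left(M^{AX}_{a,x}\otimes M^{BY}_{b,y}\otimes M^{R_0R_1}_{r_0,r_1}\right)\left(\rho_{AB}\otimes\rho_{XR_0}\otimes\rho_{YR_1}\right)\right]$$
(with the tensor factors matched to the corresponding subsystems), where $\rho_{AB}$ denotes the density operator of the state shared between Alice and Bob (thus replacing the classical description in terms of the hidden variable Λ) and similarly for $\rho_{XR_0}$ and $\rho_{YR_1}$; $\{M^{AX}_{a,x}\}$ denotes a POVM acting on the physical system in possession of Alice (similarly for $\{M^{BY}_{b,y}\}$ and $\{M^{R_0R_1}_{r_0,r_1}\}$). In the Fritz case, the sources $\rho_{AB}$, $\rho_{XR_0}$, $\rho_{YR_1}$ are given by three Bell states $|\Phi\rangle = (1/\sqrt{2})(|00\rangle + |11\rangle)$, and the POVMs have the following form:
$$M^{AX}_{a,x} = M^{A}_{a|x}\otimes M^{X}_{x}, \qquad M^{BY}_{b,y} = M^{B}_{b|y}\otimes M^{Y}_{y}, \qquad M^{R_0R_1}_{r_0,r_1} = M^{R_0}_{r_0}\otimes M^{R_1}_{r_1},$$
where $\{M^{R_0}_{r_0}\}$, $\{M^{R_1}_{r_1}\}$, $\{M^{X}_{x}\}$, $\{M^{Y}_{y}\}$ are all measurements in the $\sigma_z$ basis, $\{M^{A}_{a|x}\}$ corresponds to one of the two Pauli observables among $\{\sigma_x,\sigma_z\}$ depending on the value of x, and $\{M^{B}_{b|y}\}$ corresponds to one of the two observables among $\{(\sigma_z+\sigma_x)/\sqrt{2},(\sigma_z-\sigma_x)/\sqrt{2}\}$ depending on the value of y. The fact that the measurements in Fritz's example have been chosen to ensure that the conditional p(a,b|x,y) violates a Bell inequality implies that Fritz's distribution $p(a,x,b,y,r_0,r_1)$ has no classical explanation.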
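The claim that these observables lead to a Bell inequality violation on the state shared by Alice and Bob can be checked with a few lines of NumPy. The sketch below computes the four correlators for the state (|00⟩ + |11⟩)/√2 and reports the best CHSH-type combination. Which Pauli observable is assigned to which value of x, and which of the four correlators carries the minus sign, are not fixed by the text, so both choices are assumptions made here; the code simply maximizes over the position of the minus sign.

```python
import itertools
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=float)   # Pauli sigma_x
sz = np.array([[1, 0], [0, -1]], dtype=float)  # Pauli sigma_z

# Shared state (|00> + |11>)/sqrt(2) as a density matrix.
phi = np.array([1, 0, 0, 1], dtype=float) / np.sqrt(2)
rho_ab = np.outer(phi, phi)

# Assumed assignment: x = 0 -> sigma_x, x = 1 -> sigma_z (hypothetical labelling).
A = [sx, sz]
B = [(sz + sx) / np.sqrt(2), (sz - sx) / np.sqrt(2)]

# Correlators E[x, y] = Tr[rho (A_x tensor B_y)].
E = np.array([[np.trace(rho_ab @ np.kron(A[x], B[y])) for y in (0, 1)]
              for x in (0, 1)])

# CHSH-type expressions: three correlators enter with +, one with -.
best_chsh = max(E.sum() - 2 * E[x, y]
                for x, y in itertools.product((0, 1), repeat=2))
print(E, best_chsh)
```

With this labelling the maximizing combination places the minus sign on the (x, y) = (0, 1) correlator and reaches $2\sqrt{2}$, above the classical bound of 2.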
Any experiment that aims to realize the Fritz distribution in the triangle scenario aims to realize the ideal states and measurements specified above, but due to the inevitability of noise, the states and measurements that are actually implemented are necessarily noisy versions of these. Consider, for instance, that the source states are noisy versions of the Bell state, given by $\rho = v|\Phi\rangle\langle\Phi| + (1-v)\mathbb{1}/4$. This implies that the correlations between x and $r_0$ and between y and $r_1$ will not be perfect, and the Fritz argument can no longer be employed. Even though in this case we do not have any measurement dependence, the point is that the correlations generated by such a model are indistinguishable from those of a measurement-dependent model. In other terms, to be sure about the nonclassicality of the data we have to employ the causal network delineated above. For this case, however, Θ(X,Y,R) is given by an expression in terms of $s(x) := x\log_2(x)$, implying that visibilities as high as v ≈ 0.994 are required to observe any violation of the inequalities (9) or (12) and thus witness nonclassicality even in the potential presence of measurement dependence. It is worth pointing out, however, that the source shared between X and $R_0$ and the source shared between Y and $R_1$ do not need to have a quantum nature. Since we are simply measuring such states in the computational basis, the same correlations can be achieved with a classical source, significantly simplifying an experimental test.
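To see how noise degrades the correlations between the inputs and the auxiliary variable, the following NumPy sketch computes the conditional entropy H(X,Y|R) when each of the two auxiliary sources is a noisy Bell state of visibility v measured in the computational basis. It relies on the observation that, in this realization, the pair (x, r_0) is independent of the pair (y, r_1), so that H(X,Y|R) = H(X|R_0) + H(Y|R_1). The function names are hypothetical, and the sketch evaluates only this single quantity; it is not claimed to reproduce the full expression Θ(X,Y,R) or the quoted visibility threshold.

```python
import numpy as np

def noisy_bell_state(v):
    """rho = v |Phi><Phi| + (1 - v) * identity / 4 for visibility v."""
    phi = np.array([1, 0, 0, 1], dtype=float) / np.sqrt(2)
    return v * np.outer(phi, phi) + (1 - v) * np.eye(4) / 4

def shannon(q):
    """Shannon entropy in bits of a probability array (zeros are skipped)."""
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

def h_x_given_r0(v):
    """H(X|R0) when both qubits of the noisy state are measured in sigma_z."""
    # Computational-basis statistics are the diagonal of rho: p[x, r0].
    p = np.real(np.diag(noisy_bell_state(v))).reshape(2, 2)
    return shannon(p.ravel()) - shannon(p.sum(axis=0))

def h_xy_given_r(v):
    """H(X,Y|R) for two independent, identical noisy sources."""
    return 2 * h_x_given_r0(v)

for v in (1.0, 0.994, 0.9):
    print(v, h_xy_given_r(v))
```

For v = 1 the quantity vanishes, which is the perfectly correlated Fritz case; for any v < 1 it is strictly positive, equal to twice the binary entropy of the disagreement probability (1 − v)/2.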
In hindsight, it is not surprising that the inequalities we derive are not robust. It is known that measurement dependence is a very strong resource to simulate nonlocal correlations in a Bell scenario [10,20]. Remarkably, however, different approaches that do not hinge on Bell's theorem can tolerate a significant amount of measurement dependence, well beyond what is possible within the standard Bell scenario. As will be explored in detail elsewhere, resorting to the inflation technique using the Web inflation [59, Fig. 2], we can derive new non-linear inequalities allowing for visibilities as low as v ≈ 0.907, showing that the causal network we propose here not only leads to testable constraints but can also tolerate much more measurement dependence than usual approaches.

B. Multipartite Bell Inequalities without Measurement Independence
So far, we have focused on the bipartite scenario, but our results can be readily extended beyond this. Concretely, consider the case of n parties, the i-th party having an input $X_i$ and an output $A_i$, with the cardinalities of the inputs and outputs being arbitrary (see Fig. 4).
Similarly to the bipartite case, we introduce an auxiliary variable R that can depend on all the sources of local laboratory private randomness $\{U_i\}_i$, where $U_i$ accounts for causal influences on $X_i$ and $A_i$ which are independent of Λ (with i = 1,...,n) (see Fig. 5). The joint distributions over the observed variables that are compatible with the causal structure of Fig. 5 are:
$$p(a_1,\dots,a_n,x_1,\dots,x_n,r) = \sum_{u_1,\dots,u_n,\lambda} p(r|u_1,\dots,u_n)\,p(\lambda)\prod_{i=1}^{n} p(a_i|x_i,\lambda)\,p(x_i|u_i,\lambda)\,p(u_i).$$
Observational statistics over the original observable variables together with R can then be employed to upper bound $I(X_1,\dots,X_n : \Lambda)$.
Clearly, if $H(X_1,\dots,X_n|R) = 0$ we recover the usual measurement-independent case characterized by the Bell inequality f(I) ≤ 0. Instead of using the Pinsker inequality to connect the L1-norm quantifier M with the information-theoretical measure $I(X_1,\dots,X_n : \Lambda)$, we can try to derive lower bounds for the latter by exploring multipartite inequalities. This is how we arrive at the following proposition, for the tripartite Bell scenario, by adapting the results in [10] to the Mermin-Ardehali-Belinski-Klyshko [38,66,67] inequality.

Proposition 5. For observational data compatible with the classical causal structure in Fig. 5 specialized to the case of three parties, the corresponding inequality holds whenever the distribution over the inputs is uniform, i.e., when p(x,y,z) = 1/8.

Proof. As we show in Appendix A, assuming p(x,y,z) = 1/8, we get the lower bound (36), which is the same for signaling and nonsignaling behaviours (see Appendix B). We then obtain Prop. 5 by combining (36) with Lemma 4.
Comparing the lower bound on the amount of measurement dependence required to explain a given degree of violation of the Mermin inequality (Eq. (36)) with that of the CHSH inequality (Eq. (12)) is not straightforward, since the Mermin inequality can be violated quantumly up to its algebraic maximum, while the CHSH inequality cannot. We have therefore considered the degree of measurement dependence as a function of the ratio between the violation and the maximum quantum violation. The result is plotted in Fig. 6, which demonstrates that Mermin requires more measurement dependence than CHSH to explain away comparable violation ratios.
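The contrast between the two inequalities can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes one common sign convention for the Mermin expression (XXX − XYY − YXY − YYX, local bound 2, algebraic maximum 4) and the standard CHSH settings on a maximally entangled two-qubit state. The operator names and helper functions are illustrative only.

```python
import numpy as np
from functools import reduce

# Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron(*ops):
    """Tensor product of several single-qubit operators."""
    return reduce(np.kron, ops)

def expval(op, psi):
    """Expectation value <psi|op|psi> for a normalized state vector."""
    return float(np.real(np.vdot(psi, op @ psi)))

# GHZ state (|000> + |111>)/sqrt(2)
ghz = np.zeros(8, dtype=complex)
ghz[0] = ghz[7] = 1 / np.sqrt(2)

# Assumed convention for the Mermin expression; only 4 of the 8
# input combinations appear in it.
mermin_op = kron(X, X, X) - kron(X, Y, Y) - kron(Y, X, Y) - kron(Y, Y, X)
mermin_value = expval(mermin_op, ghz)

# Bell state (|00> + |11>)/sqrt(2) with the standard CHSH settings
phi_plus = np.zeros(4, dtype=complex)
phi_plus[0] = phi_plus[3] = 1 / np.sqrt(2)
B0 = (Z + X) / np.sqrt(2)
B1 = (Z - X) / np.sqrt(2)
chsh_op = kron(Z, B0) + kron(Z, B1) + kron(X, B0) - kron(X, B1)
chsh_value = expval(chsh_op, phi_plus)

print("Mermin value on GHZ:", mermin_value)    # algebraic maximum is 4
print("CHSH value on Bell state:", chsh_value)  # algebraic maximum is 4
```

Under these assumptions the GHZ state saturates the algebraic maximum of the Mermin expression, whereas the CHSH value stays at 2√2, below its algebraic maximum of 4. This is why the comparison in the text is made in terms of the ratio to the maximum quantum violation.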
Moreover, by assuming that some inputs never happen (as is the case in the Mermin inequality), one obtains a lower bound for I(X,Y,Z : Λ) that is exactly the same as in the CHSH scenario (see Appendix A), indicating that it is possible to explore biased distributions of inputs in the analysis of measurement dependence.

Figure 5. Causal structure assessing measurement dependence in a multipartite Bell scenario. The auxiliary variable R allows one to practically upper bound the amount of measurement dependence that might be present. The double-arrowed (purple) edges indicate that the correlations between the input variables X_i and Λ might arise from direct causal influence or via a common source.

Figure 6. The blue curve shows the lower bound on the measurement dependence I(X,Y,Z : Λ), as described by Eq. (36), needed to explain a given degree of quantum violation of the Mermin inequality, given by (M−2)/4. A comparison between both curves shows that a higher degree of measurement dependence is required to explain the violation of the Mermin inequality than to explain the same degree of quantum violation in the CHSH case, a point that can be of experimental relevance when trying to violate measurement-dependent Bell inequalities.

scenarios have started to be considered, typically consisting of many independent sources [39-47,49]. The paradigmatic example is the so-called triangle scenario, shown in Fig. 7. Its most prominent feature is that it can lead to nonlocal correlations even though it has no input variables [41,49], an ingredient that until then was considered essential for the appearance of nonclassical behaviour. In spite of the growing theoretical and experimental attention [36], progress in the analysis of nonclassical behaviour in such causal structures has been hampered by the fact that the set of correlations they define is nonconvex and very difficult to characterize [44,59]. In the following, we will show how our results for Bell scenarios with measurement dependence can be readily applied to derive new non-linear Bell inequalities for different classes of causal networks.
For the sake of example, we start by focusing on the triangle network. The most general correlations admissible in the triangle network have the form

$$p(\alpha,\beta,r) = \sum_{u_x,u_y,\lambda} p(\alpha|u_x,\lambda)\,p(\beta|u_y,\lambda)\,p(r|u_x,u_y)\,p(u_x)\,p(u_y)\,p(\lambda), \qquad (37)$$

which is precisely the same form as Eq. (14) under the association α ↔ (A,X) and β ↔ (B,Y).
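To make the structure of the triangle decomposition concrete, the following sketch builds a classical triangle-network distribution p(α,β,r) from randomly drawn response functions and source priors. The cardinalities, the random seed and all helper names are arbitrary choices made for illustration; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_conditional(n_out, *parent_dims):
    """Random table p(out | parents), normalized over the first axis."""
    table = rng.random((n_out, *parent_dims))
    return table / table.sum(axis=0, keepdims=True)

def random_prior(n):
    """Random probability vector of length n."""
    p = rng.random(n)
    return p / p.sum()

# Hypothetical cardinalities of observed variables and latent sources
n_alpha, n_beta, n_r = 4, 4, 4
n_ux, n_uy, n_lam = 3, 3, 3

p_alpha = random_conditional(n_alpha, n_ux, n_lam)  # p(alpha | u_x, lambda)
p_beta = random_conditional(n_beta, n_uy, n_lam)    # p(beta  | u_y, lambda)
p_r = random_conditional(n_r, n_ux, n_uy)           # p(r     | u_x, u_y)
p_ux, p_uy, p_lam = random_prior(n_ux), random_prior(n_uy), random_prior(n_lam)

# Sum over the three latent sources.
# Index letters: a=alpha, b=beta, r=r, x=u_x, y=u_y, l=lambda
p_abr = np.einsum("axl,byl,rxy,x,y,l->abr",
                  p_alpha, p_beta, p_r, p_ux, p_uy, p_lam)

print("shape:", p_abr.shape, "total probability:", p_abr.sum())
```

Each observed variable depends on exactly two of the three sources, and each source is shared by exactly two observed variables, which is what the index pattern in the `einsum` string encodes.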
As shown in [41], standard Bell scenario correlations can be mapped onto the triangle network, thus providing the first proof that such a causal structure can support nonclassical correlations. We generalize this result by noting that correlations in the nonstandard Bell scenario with measurement dependence of Fig. 3 can be mapped bijectively onto the triangle network via the common forms of Eqs. (14) and (37). This two-way mapping allows us to translate results both ways between those scenarios.
Note that we have relaxed the assumption that X has a direct causal influence over A. Upon allowing for measurement dependence, there is no further loss of generality in treating A and X on an equal footing, i.e., as a single composite outcome variable α = (A,X) that is a function of the sources U_x and Λ [68]. We similarly merge B and Y into the single composite variable β = (B,Y). As detailed in the proof of Lemma 4, the bounds on the measurement dependence I(X,Y : Λ) only assume such general dependence. Thus, all the results we have derived above can be directly applied to the triangle network.

Lemma 6. Let G_Bell-MI+aux be the bipartite Bell scenario without the assumption of measurement independence supplemented with an auxiliary variable R as per Fig. 3. Let G_triangle be the causal scenario depicted in Fig. 7. Then, a distribution P(a,b,x,y,r) is incompatible with G_Bell-MI+aux if and only if P(α = (a,x), β = (b,y), r) is incompatible with G_triangle.

Corollary 6.1. Any correlations compatible with the classical triangle network should respect the non-linear inequalities (19), (20) and (24).

Corollary 6.2. For the special case of triangle scenario correlations where H(X,Y|R) = 0, it follows that if P(a,b|x,y) violates a traditional Bell inequality, then P(α = (a,x), β = (b,y), r) is incompatible with G_triangle.
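The relabelling α = (a,x), β = (b,y) and the condition H(X,Y|R) = 0 can be illustrated on a toy distribution. The sketch below is only an illustration: the array layout [a, b, x, y, r], the function names and the toy data (uniform binary variables with r = 2x + y) are assumptions made here, not constructions from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zeros are ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def cond_entropy_xy_given_r(p_abxyr):
    """H(X,Y|R) = H(X,Y,R) - H(R) for an array indexed as [a, b, x, y, r]."""
    p_xyr = p_abxyr.sum(axis=(0, 1))
    p_r = p_xyr.sum(axis=(0, 1))
    return entropy(p_xyr) - entropy(p_r)

def to_triangle(p_abxyr):
    """Regroup P(a,b,x,y,r) into P(alpha,beta,r), alpha=(a,x), beta=(b,y)."""
    na, nb, nx, ny, nr = p_abxyr.shape
    # Reorder axes to (a, x, b, y, r), then flatten (a,x) and (b,y).
    return p_abxyr.transpose(0, 2, 1, 3, 4).reshape(na * nx, nb * ny, nr)

# Toy data (hypothetical): binary a, b, x, y, all uniform, and r = 2*x + y,
# so that R reveals both settings exactly.
p = np.zeros((2, 2, 2, 2, 4))
for x in range(2):
    for y in range(2):
        p[:, :, x, y, 2 * x + y] = 1 / 16

h_xy_given_r = cond_entropy_xy_given_r(p)
p_triangle = to_triangle(p)

print("H(X,Y|R) =", h_xy_given_r)
print("triangle distribution shape:", p_triangle.shape)
```

In this toy case R determines both settings, so H(X,Y|R) vanishes, which is the regime addressed by Corollary 6.2. The regrouped array carries the same probabilities, only indexed by the composite variables.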
As a consequence of Lemma 6, our results generalize the result of Fritz [41] in a number of ways, since Fritz's original argument was only applicable when H(X,Y|R) = 0 and furthermore did not explicitly derive a testable Bell inequality. It is worth pointing out that different Bell inequalities able to witness quantum nonlocality in the triangle have already been derived [49,69,70]. In particular, in [69] a specific inequality has been obtained for testing the mapping between a standard Bell scenario and the triangle network as proposed in [41]. Similarly, our inequalities (19) and (20) will also witness the nonclassical behaviour whenever the CHSH inequality is violated (if H(X,Y|R) = 0).
Moreover, our construction can be extended to prove the possibility of nonclassical behaviour in causal networks of growing size and where the variables can assume different cardinalities. As a generalization of the triangle causal structure we will consider any causal network inspired by the so-called bipartite graphs [71,72] and composed of two layers: a first layer of o+s sources, labelled {Λ_1,...,Λ_o,U_1,...,U_s}, acting as common causes of a second layer of n+m observable variables α_1,...,α_n,R_1,...,R_m. In the following, we will restrict our attention to two particular classes of these general networks and derive nonlinear Bell inequalities which can be violated by quantum correlations in those networks. In the first class we fix o = 1 and let s = n, whereas in the second class we fix s = 2 and let o = n−2.
A. "2's & n" networks
As a first case, we consider networks of n+1 parties in which all parties except for R are connected by an n-way source, whereas R is connected to every other individual party by a 2-way source. An example of this is given in Fig. 8. We call such a scenario a "2's & n" network. As a second case we will consider the cyclic network of degree n where n−2 sources Λ_i connect pairs of α_i's (with i = 1,...,n), source U_1 connects α_1 to R and U_2 connects α_n to R (see Fig. 9). A "2's & n" network admits distributions of the form

$$p(\alpha_1,\ldots,\alpha_n,r) = \sum_{u_1,\ldots,u_n,\lambda} p(r|u_1,\ldots,u_n)\,p(\lambda)\prod_{i=1}^{n} p(\alpha_i|u_i,\lambda)\,p(u_i). \qquad (38)$$

The mapping between "2's & n" networks and multipartite Bell scenarios with measurement dependence and an auxiliary variable R is readily evident by comparing the forms of Eqs. (28) and (38). They are equivalent under the relabelling α_i ↔ (A_i,X_i).
Lemma 7. Let G_MultiBell-MI+aux be the multipartite Bell scenario without the assumption of measurement independence supplemented with an auxiliary variable R as per Fig. 5. Let G_"2's & n" be a causal scenario of the family depicted in Fig. 8. Then, a distribution P(a_1,...,a_n,x_1,...,x_n,r) is incompatible with G_MultiBell-MI+aux if and only if P(α_1 = (a_1,x_1),...,α_n = (a_n,x_n),r) is incompatible with G_"2's & n".

Corollary 7.1. Any correlations compatible with the "2's & n" network for n = 3 depicted in Fig. 8 should respect the non-linear inequality (35).
Firstly, note that the generic multipartite bounds implied by combining inequality (32) with Lemma 4 remain valid for "2's & n" networks, thus providing a general non-linear Bell inequality of the form (33) for them. To see this, notice that in the proof of Lemma 4, the crucial step (31), which invokes the causal structure under analysis, only makes use of the causal assumption that R is independent of Λ, a condition fulfilled by "2's & n" networks. As a consequence, the generic non-linear Bell inequality (32) will hold, where I is a function of the conditional probability distribution p(a_1,...,a_n|x_1,...,x_n).
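The independence of R and Λ in a "2's & n" network can be verified numerically on a randomly generated classical model. The sketch below assumes n = 3 and arbitrary cardinalities, with hypothetical variable names; it keeps Λ explicit in the joint distribution so that the marginal p(r,λ) can be compared with the product p(r)p(λ).

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(table, axis=0):
    """Normalize a non-negative table along the given axis."""
    return table / table.sum(axis=axis, keepdims=True)

# Hypothetical cardinalities for an n = 3 "2's & n" network
n_u, n_lam, n_alpha, n_r = 2, 3, 4, 5

p_u = [normalize(rng.random(n_u)) for _ in range(3)]          # p(u_i)
p_lam = normalize(rng.random(n_lam))                          # p(lambda)
p_alpha = [normalize(rng.random((n_alpha, n_u, n_lam)))       # p(alpha_i | u_i, lambda)
           for _ in range(3)]
p_r = normalize(rng.random((n_r, n_u, n_u, n_u)))             # p(r | u_1, u_2, u_3)

# Joint over (alpha_1, alpha_2, alpha_3, r, lambda), with the u_i summed out.
# Index letters: a,b,c = alpha_1..3; u,v,w = u_1..3; l = lambda; r = r
joint = np.einsum("aul,bvl,cwl,ruvw,u,v,w,l->abcrl",
                  *p_alpha, p_r, *p_u, p_lam)

p_observed = joint.sum(axis=-1)        # p(alpha_1, alpha_2, alpha_3, r)
p_r_lam = joint.sum(axis=(0, 1, 2))    # p(r, lambda)
p_r_marginal = p_r_lam.sum(axis=1)     # p(r)

r_independent_of_lambda = np.allclose(p_r_lam, np.outer(p_r_marginal, p_lam))
print("R independent of Lambda:", r_independent_of_lambda)
```

Because R only has the U_i as parents and the U_i are independent of Λ, the factorization p(r,λ) = p(r)p(λ) holds for any choice of the tables, which is the only causal assumption used in the step referred to above.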

B. Cyclic networks
Next we will consider the cyclic network. By mapping it onto an n-locality scenario [39,40,73], we will be able to solve an open problem in the characterization of quantum correlations in networks. More specifically, although it has been proven in [41] that the cyclic scenario of Fig. 9 gives rise to nonclassical correlations, the proof there relies on nonclassicality of a post-quantum nature [74]. It was left open whether a nonclassical behaviour that is quantumly realizable would be possible. The basic idea will be to map the bilocality scenario shown in Fig. 10 onto this cyclic scenario with n = 4. Notice that in this particular bilocality scenario there are no external inputs acting as causal parents of the central node A_2. Thus we achieve a mapping by setting α_1 = (A_1,X_1), α_2 = A_2 and α_3 = (A_3,X_3).

Figure 8. "2's & n" network for n = 3. One observable source connects three of the four observable variables; the other sources only connect pairs, however. More generally, a "2's & n" network consists of observable variables α_1,...,α_n in addition to R. Notice that the triangle network (see Fig. 7) is a "2's & n" network with n = 2.

Figure 9. Cyclic causal structure with n = 4. Each observable variable is connected to its neighbour via a common source. Notice that the triangle network (see Fig. 7) is a particular case of a cyclic network with n = 3. We index the observable variables in an n-cyclic scenario as α_1,...,α_{n−1} along with R.
There are many nonlinear Bell inequalities I ≤ L (where I is a polynomial function of p(a 1 ,a 2 ,a 3 |x 1 ,x 3 )) derived to characterize the causal structure of the bilocality scenario and that can be violated by quantum correlations. As an example, we have the bilocality inequality [39,40]: and where all input and output variables assume the values 0 or 1.
Following the recipe in [17], one can also relate the violation of such inequalities with the degree of measurement dependence $M := \sum_{\lambda_1,\lambda_2} |p(x_1,x_3,\lambda_1,\lambda_2) - p(x_1,x_3)\,p(\lambda_1)\,p(\lambda_2)|$ required to explain it. As before, one can then generally write g(I) ≤ M, where now g is a polynomial function of the Bell inequality I such that g(I) > 0 if the inequality is violated and g(I) ≤ 0 otherwise. Thus, we also obtain

In particular, if X_1 and X_3 are perfectly correlated with R, then H(X_1,X_3|R) = 0 and the quantum violation of any bilocality inequality would be enough to witness quantum nonclassicality in such a cyclic network.
Leveraging the linear chain n-locality scenario considered in [73], one can demonstrate nonclassicality in cyclic networks of arbitrary size. In the linear chain n-locality scenario, we have n+1 observable variables A_i of which only A_1 and A_{n+1} have inputs, denoted X_1 and X_{n+1} respectively. If we treat α_1 = (a_1,x_1) and α_{n+1} = (a_{n+1},x_{n+1}) and otherwise α_i = a_i, and we close the cycle by introducing the sources U_1 and U_2 that connect α_1 and α_{n+1} to the variable R, we obtain the (n+2)-cycle causal network. The joint distributions on observed variables that are compatible with this network are

$$p(\alpha_1,\ldots,\alpha_{n+1},r) = \sum_{u_1,u_2,\lambda_1,\ldots,\lambda_n} p(r|u_1,u_2)\,p(\alpha_1|u_1,\lambda_1)\,p(\alpha_{n+1}|u_2,\lambda_n)\,p(u_1)\,p(u_2)\,p(\lambda_1)\prod_{i=2}^{n} p(\alpha_i|\lambda_{i-1},\lambda_i)\,p(\lambda_i).$$

Compatible distributions in the n-locality scenario with measurement dependence and an auxiliary variable R, that is, generalizations of Fig. 10, are of precisely the same form but with the relabelling α_1 ↔ (A_1,X_1), α_{n+1} ↔ (A_{n+1},X_{n+1}) and α_i ↔ A_i otherwise. As before, if X_1 and X_{n+1} are perfectly correlated with R then H(X_1,X_{n+1}|R) = 0 and the violation of any n-locality inequality bounding the linear chain scenario will suffice to demonstrate nonlocality also in the cyclic scenario. An example of such an inequality was provided in [73].

Figure 10. Bilocality causal structure, shown here with measurement dependence (sources and local inputs potentially correlated) along with an auxiliary variable R. In the absence of measurement dependence this scenario is akin to an entanglement swapping scenario [75] with a central node sharing correlations with two peripheral nodes via two independent sources. More generally, in an n-locality scenario with measurement dependence and an auxiliary variable, we have a chain of outcome variables A_1,...,A_{n+1} along with R, and only the two peripheral nodes A_1 and A_{n+1} have inputs.

VI. DISCUSSION
If observed statistical correlations are found to violate a Bell inequality, then one can conclude that these correlations cannot be explained by a classical causal model having the causal structure of Fig. 1. The moniker of 'Bell nonlocality' has traditionally been assigned to this phenomenon because one can avoid the contradiction through a modification of the causal structure of Fig. 1 that incorporates a (nonlocal) causal influence from the setting on one side to the outcome on the other. It is well known, however, that there are other opportunities for evading the contradiction. In particular, one can modify the causal structure of Fig. 1 by relaxing the assumption that there is no causal influence from Λ to the setting variables X or Y and no common cause of Λ and X or of Λ and Y , that is, by relaxing the assumption of measurement independence. We have argued that there are two distinct causal mechanisms by which the assumption of measurement independence might fail in a Bell experiment. The first mechanism is a violation of independence between Λ and variables that causally determine the settings X and Y . If the settings are made to depend causally on cosmic photons [23,25], then this version of the assumption seems especially plausible, since denying it seems to require assuming a superdeterministic world wherein everything is potentially correlated with everything else. The second mechanism is one wherein some systems that mediate the influence of the ultimate causal determinants of the settings (such as cosmic photons) become influenced by Λ or by a variable that also influences Λ. In other words, even though the variables that determine the settings may start out independent of Λ, they might become correlated with it (e.g., when the cosmic photons enter the laboratory). It is this second class of mechanism for measurement dependence that we have here shown can be subjected to an experimental test.
We embed the Bell causal structure in a larger network and assume the independence of the variables that causally determine the settings (i.e., we rule out, by assumption, the first mechanism for achieving measurement dependence), and show that in this case it is possible to upper bound the amount of measurement dependence (of the second kind) based on observational data. Combining these upper bounds with previous lower bounds for the amount of measurement dependence required to explain a given violation of Bell inequalities, we are able to derive nonlinear Bell-type inequalities whose violation can witness nonclassicality in spite of the presence of some measurement dependence (of the second kind).
To our knowledge, this is the first demonstration of the possibility of putting an upper bound on the amount of measurement dependence in a Bell experiment, a feature that deserves further theoretical and experimental investigation. It is worth remarking that the results in [70] can be understood as complementary to ours. There, focusing on the triangle network, it has been shown that under the assumption of perfect correlation between some variables (ruling out any possible measurement dependence of the second kind) they can witness nonclassicality even allowing correlations between the sources (thus allowing measurement dependence of the first kind, even though the latter cannot be upper bounded by observational data).
Following that, we have also shown how measurement-dependent Bell causal structures can be readily mapped onto causal networks of growing size and complexity, a field of research that is attracting growing attention but for which advances have been hampered by the difficulty in deriving Bell inequalities. By doing so, we have derived a robust Bell inequality especially suited to test the Fritz correlations [41] in the triangle network and which, in contrast to previous attempts, does not require perfect correlations between some subset of variables [70], which is a welcome feature for achieving an experimental implementation. Finally, by mapping fully connected networks onto multipartite Bell scenarios and by mapping cyclic networks to the linear n-locality scenario, we were able to show that such networks can give rise to correlations that witness nonclassicality.
We believe that our results are just a first step towards a better understanding of measurement-dependent causal models and how these can be tested experimentally. For instance, upper bounds like the one in (33) employ the Pinsker inequality, which is known to be non-tight. Can we employ more modern techniques such as those in [20], the covariance [71,76] or the inflation technique [59], in order to provide better lower and upper bounds to M and I(X,Y : Λ) and thus improve our inequalities? We have proven that this is possible for the tripartite scenario by deriving new results for the Mermin inequality.
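The looseness of the Pinsker inequality can be explored numerically. The sketch below assumes the L1-norm quantifier in the generic form M = Σ |p(x,λ) − p(x)p(λ)| and the mutual information measured in bits, in which case Pinsker's inequality reads I(X : Λ) ≥ M²/(2 ln 2). The normalization of M used in the paper may differ, and the function names and random test distributions are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)

def marginals(p_xl):
    """Marginals p(x) (column vector) and p(lambda) (row vector)."""
    return p_xl.sum(axis=1, keepdims=True), p_xl.sum(axis=0, keepdims=True)

def mutual_information_bits(p_xl):
    """I(X:Lambda) in bits for a joint table indexed as [x, lambda]."""
    p_x, p_l = marginals(p_xl)
    product = p_x * p_l
    mask = p_xl > 0
    return float((p_xl[mask] * np.log2(p_xl[mask] / product[mask])).sum())

def l1_dependence(p_xl):
    """M = sum over x, lambda of |p(x,lambda) - p(x) p(lambda)|."""
    p_x, p_l = marginals(p_xl)
    return float(np.abs(p_xl - p_x * p_l).sum())

def pinsker_lower_bound_bits(m):
    """Pinsker lower bound on the mutual information, in bits."""
    return m ** 2 / (2 * np.log(2))

# Compare both quantities on random joint distributions (toy data).
gaps = []
for _ in range(200):
    p = rng.random((4, 6))
    p /= p.sum()
    gaps.append(mutual_information_bits(p)
                - pinsker_lower_bound_bits(l1_dependence(p)))

print("smallest gap I - M^2/(2 ln 2):", min(gaps))
```

The gap is non-negative in every sample, as Pinsker's inequality guarantees, and is typically strictly positive, which illustrates why tighter relations between M and I(X,Y : Λ) could improve the resulting inequalities.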
It is noteworthy that the analysis we have carried out here can also be extended to analyze measurement dependence in different causal structures, in particular in the instrumental scenario (which plays a central role in the field of causal inference). As is further explored in [77], measurement independence is also a crucial assumption in causal inference, the violation of which has important consequences to the analysis of cause and effect in empirical data [78,79].
Finally, we believe that the techniques we have introduced here can offer a way to tackle open questions in the study of networks. As mentioned before, the first example of nonclassical correlations in the triangle network was provided by a mapping of that network onto the usual Bell network [41]. Other examples of nonclassicality that do not hinge directly on Bell's theorem are known [49]. However, it remains an open question how one can prove that the nonclassicality observed in a given network is truly different from that in Bell's theorem. Further exploration of the connection between Bell scenarios with measurement dependence and networks may offer a way to better understand their similarities and their differences. We hope that our results might trigger future research along all of these directions.