Monogamy of Temporal Correlations: Witnessing non-Markovianity Beyond Data Processing

Matheus Capela,1 Lucas C. Céleri,1,2 Kavan Modi,3 and Rafael Chaves4,5
1 Institute of Physics, Federal University of Goiás, POBOX 131, 74001-970, Goiânia, Brazil
2 Department of Physical Chemistry, University of the Basque Country UPV/EHU, Apartado 644, E-48080 Bilbao, Spain
3 School of Physics & Astronomy, Monash University, Clayton, Victoria 3800, Australia
4 International Institute of Physics, Federal University of Rio Grande do Norte, 59070-405 Natal, Brazil
5 School of Science and Technology, Federal University of Rio Grande do Norte, 59078-970 Natal, Brazil
(Dated: October 11, 2019)
arXiv:1910.04236v1 [quant-ph] 9 Oct 2019


I. INTRODUCTION
The importance of Markov processes can hardly be overstated. In short, a process is Markovian if, in order to predict its future state, the present state contains as much information as the full previous history. That is, a Markov process keeps only information about its immediate past. Its applications range from computer science [1], causal inference [2,3] and statistics [4] to social sciences [5] and genetics [6]. Within physics, random walks [7] and Brownian motion [8] are paradigmatic examples of Markov processes. In quantum information, they also play a key role [9], particularly in the understanding of open-system dynamics [10].
Mathematically, a discrete stochastic process $\{X_n, t_n \in T\}$ is Markovian if the probability that the random variable $X_n$ takes the value $x_n$ at time $t_n \in T$ is determined by the value $x_{n-1}$ at time $t_{n-1}$ alone, and is not affected by the values of $X$ at times earlier than $t_{n-1}$. That is,

$$p(x_n | x_{n-1}, \ldots, x_1) = p(x_n | x_{n-1}), \quad \text{for all } t_n \in T. \tag{1}$$

Given a joint probability distribution $p(x_1, \ldots, x_n)$, to check whether it arises from a Markovian process we have to test all the $n$ conditions expressed in (1). However, if the number $n$ of variables or their cardinalities (the number of possible values they can assume) is large, it is practically impossible to gather enough statistical data to reconstruct $p(x_1, \ldots, x_n)$ and thus check its Markovianity. As an example, an English text with 1000 letters has $26^{1000} \approx 10^{1415}$ possible variations, vastly more than the estimated number of atoms in the universe. For that reason, we often have to rely on marginal information, for instance, only the pairwise correlations between two variables. Because of this estimation limitation, we face a particular case of a marginal problem [11]: given some limited/marginal information, what can we conclude about the global object?
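As an illustration of what testing condition (1) involves when the full joint distribution is available, the following minimal sketch checks it by brute force. The dictionary representation of the joint distribution, the function names and the two toy distributions are assumptions made for this example; they are not taken from the paper.

```python
from itertools import product

def conditional(joint, target, given, outcome):
    """Return p(x_target | x_given) for the values fixed by `outcome`."""
    den = sum(p for o, p in joint.items()
              if all(o[i] == outcome[i] for i in given))
    num = sum(p for o, p in joint.items()
              if all(o[i] == outcome[i] for i in given + (target,)))
    return num / den

def is_markov(joint, tol=1e-9):
    """Check condition (1) at every step, for every outcome with nonzero probability."""
    n = len(next(iter(joint)))
    for outcome, prob in joint.items():
        if prob <= 0:
            continue
        for k in range(2, n):  # zero-based position k corresponds to X_{k+1} in the text
            full_history = conditional(joint, k, tuple(range(k)), outcome)
            last_step = conditional(joint, k, (k - 1,), outcome)
            if abs(full_history - last_step) > tol:
                return False
    return True

def flip(a, b, eps=0.1):
    """Toy transition rule: copy the previous bit, flipping it with probability eps."""
    return 1 - eps if a == b else eps

# Toy Markov chain of three bits (hypothetical example).
chain = {(a, b, c): 0.5 * flip(a, b) * flip(b, c)
         for a, b, c in product((0, 1), repeat=3)}
# Toy non-Markovian process: the third bit depends on both earlier bits.
xor = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

print(is_markov(chain))  # True
print(is_markov(xor))    # False
```

The check needs every entry of the joint distribution, which is exactly the resource that becomes unavailable for long chains or large alphabets.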
Formally, the problem is equivalent to quantifier elimination [12,13]: starting from a set of constraints (in this case the positivity and normalization of probabilities plus the Markov conditions (1)), we aim to eliminate from our description those variables to which we do not have empirical access (e.g., correlations involving more than two variables). The same problem arises in causal inference [14], an extremely demanding computational task that cannot be performed beyond very few variables [15]. A more tractable approach is to ask instead what constraints Markovianity implies on the entropies of the variables of interest [11,16-19]. For instance, a simple yet fundamental result is that Markov processes fulfil data processing inequalities, which state that as we move along the Markov chain the correlations between variables cannot increase, although they may remain constant. Data processing inequalities are a consequence of the Markov condition, imposing constraints on the correlations between the variables at different time steps, but are they the only consequence? This is the central question we explore in this article.
As we show, data processing inequalities are indeed the only consequence of the Markov condition for $n = 3$. However, for $n \geq 4$, Markovianity implies new kinds of monogamy constraints on the correlations in the Markov chain, violations of which can be seen as a device-independent test [20-22] of the non-Markovianity of the underlying process. Interestingly, by employing an operational definition of divisibility [23], we show that such a condition is sufficient for a process to satisfy all data processing and monogamy inequalities. Furthermore, we show how the violation of these new constraints can also be connected with the quantification of causal influences [16,24,25] among the variables. Finally, we consider a quantum information application of this framework, showing that non-projective measurements can give rise to statistics that are compatible with data processing but nonetheless violate our new monogamy inequalities, thus witnessing the non-Markovianity arising from such quantum measurements.
The paper is organized as follows. In Sec. II we review the basic toolbox to be used in this paper: the entropic approach to the marginal problem [11,16-18]. In Sec. III we employ this framework to show that a new kind of constraint, beyond that given by data processing inequalities, arises for Markov chains with $n \geq 4$. The question regarding divisibility is addressed in Sec. IV. In Sec. V we show that the violation of both the data processing and the monogamy inequalities can be given a causal interpretation. Finally, in Sec. VI we conclude and discuss our findings. More technical details and proofs of our results can be found in the Appendix.

II. SHANNON ENTROPIES, ENTROPY CONES AND MARGINAL PROBLEMS
The Shannon entropy of a random variable $X$ is the fundamental building block in information theory [26]. It is defined as

$$H(X) = -\sum_{x} p(x) \log_2 p(x),$$

where the sum is taken over the support of $X$. If we are interested in the entropy of $n$ variables $X_1, \ldots, X_n$, it is useful to construct the entropy vector $h$ associated with these variables as $h = (H(\emptyset), H(X_1), H(X_2), H(X_1, X_2), \ldots, H(X_1, \ldots, X_n))$. That is, $h$ is a vector with $2^n$ components given by all possible entropies among $n$ variables (including the empty set $\emptyset$, for which we define $H(\emptyset) = 0$). Within this approach, a natural question is which vectors in the real space $\mathbb{R}^{2^n}$ are entropy vectors. The first trivial constraint follows from the fact that entropies are non-negative quantities, that is, the entropy vector cannot have negative components. We are thus restricted to the positive orthant of $\mathbb{R}^{2^n}$. The second constraint comes from the realization that entropies define an unbounded convex set, the so-called entropy cone [26]. We can think of it as a high-dimensional, infinite ice cream cone whose tip lies at the origin of our coordinate system. However, the exact structure of the set of entropy vectors is still not precisely known; the best general description is an outer approximation of the true entropy cone, the so-called Shannon cone [26]. The useful feature of the Shannon cone is that it is defined in terms of finitely many linear inequalities of two types:

$$H(X_{S \cup \{i\}}) + H(X_{S \cup \{j\}}) \geq H(X_{S \cup \{i,j\}}) + H(X_S), \qquad H(X_{[n]}) \geq H(X_{[n] \setminus \{i\}}), \tag{2}$$

where $[n] = \{1, \ldots, n\}$, $i \neq j$, $S \subseteq [n] \setminus \{i, j\}$, and $X_S$ denotes the collection of variables indexed by $S$. The first inequality is known as strong subadditivity (or submodularity), basically stating the non-negativity of the conditional mutual information. The second constraint is known as monotonicity, stating the non-negativity of conditional entropies or, alternatively, that the uncertainty of the whole is at least as large as the uncertainty of its parts.
Given a collection of $n$ variables there are $n + \binom{n}{2} 2^{n-2}$ non-redundant inequalities (that is, inequalities that cannot be obtained by combining the other inequalities) defining the Shannon cone. This minimal set of linear inequalities is known as the elemental set of Shannon-type inequalities.
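To make the entropy vector concrete, the sketch below builds all $2^n$ marginal entropies from a joint distribution and evaluates the elemental-inequality count quoted above. The dictionary representation and the helper names `entropy_vector` and `elemental_count` are hypothetical choices for this illustration, and base-2 logarithms (entropies in bits) are assumed.

```python
from itertools import combinations
from math import comb, log2

def entropy_vector(joint):
    """Map every subset of variable positions to the Shannon entropy (bits) of its marginal."""
    n = len(next(iter(joint)))
    vec = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            marg = {}
            for outcome, prob in joint.items():
                key = tuple(outcome[i] for i in subset)
                marg[key] = marg.get(key, 0.0) + prob
            vec[subset] = sum(-p * log2(p) for p in marg.values() if p > 0)
    return vec

def elemental_count(n):
    """Number of elemental Shannon-type inequalities: n + C(n,2) * 2^(n-2)."""
    return n + comb(n, 2) * 2 ** (n - 2)

# Toy distribution: X1 and X2 are perfectly correlated fair bits, X3 is an independent fair bit.
joint = {(0, 0, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 0): 0.25, (1, 1, 1): 0.25}
h = entropy_vector(joint)

print(len(h))              # 8 components for n = 3
print(h[(0, 1)])           # H(X1, X2) = 1 bit
print(h[(0, 1, 2)])        # H(X1, X2, X3) = 2 bits
print(elemental_count(3))  # 9
```

The empty tuple plays the role of $\emptyset$, and its entropy comes out as zero, in line with the convention $H(\emptyset) = 0$.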
Given the Shannon cone (as represented by the elemental set), we can ask the following: what are the constraints following from the elemental set on the subspace where we eliminate some of the variables in the entropy vector [11]? As an example, consider the following situation. Three people, seated in different rooms in such a way that they cannot directly communicate (but are allowed to establish some pre-shared correlations), have to answer questions from a referee. The answers of the three participants are represented by the random variables $X_1$, $X_2$ and $X_3$. However, in any given run of the experiment, the referee asks questions to only two of them; that is, the referee does not have access to the entropy $H(X_1, X_2, X_3)$ corresponding to the event where all three participants would give their answers. This means that we have to eliminate this entropy from our description, implying that our object of interest is a new entropy vector where we trace out one of its components. Formally, the problem is equivalent to a quantifier elimination problem [12]: we have a set of (linear) inequalities and we want an equivalent description of this set in which some of the variables appearing in the inequalities have been eliminated. Coming back to our problem, to obtain an inequality that does not depend on $H(X_1, X_2, X_3)$, we can simply sum the two Shannon-type inequalities in (2), namely the strong subadditivity $H(X_1, X_2) + H(X_1, X_3) \geq H(X_1, X_2, X_3) + H(X_1)$ and the monotonicity $H(X_1, X_2, X_3) \geq H(X_2, X_3)$, to obtain

$$H(X_1, X_2) + H(X_1, X_3) - H(X_2, X_3) - H(X_1) \geq 0, \tag{3}$$

or, in terms of the mutual information $I(X:Y) = H(X) + H(Y) - H(X, Y)$ (a measure of correlations between $X$ and $Y$),

$$I(X_1:X_2) + I(X_1:X_3) - I(X_2:X_3) \leq H(X_1). \tag{4}$$

As we can see, the simple assumption that a joint probability distribution describing the three variables exists already implies constraints on their pairwise correlations.
To illustrate the use of these marginal constraints, consider dichotomic answers (yes/no questions). Also, suppose that after we run the experiment a sufficient number of times, the referee observes that all answers are unbiased ($H(X_1) = H(X_2) = H(X_3) = 1$). Furthermore, answer $X_1$ is fully correlated with $X_2$ and $X_3$ (that is, $I(X_1:X_2) = I(X_1:X_3) = 1$); however, $X_2$ and $X_3$ are uncorrelated ($I(X_2:X_3) = 0$). If we plug these values into (4) we see that the inequality is violated. What does this mean? Notice that (3) follows from the assumption that, even if we cannot observe it, there is a well-defined joint entropy $H(X_1, X_2, X_3)$ for all the answers. The violation of the inequality shows that this assumption is not valid: from the marginal observations we cannot construct a well-defined $H(X_1, X_2, X_3)$ (in such a way that all the elemental inequalities are respected). In fact, notice that the distribution violating the inequality is a bit odd. Since $X_2$ and $X_3$ are fully correlated with $X_1$, by the transitivity of correlations we would expect $X_2$ and $X_3$ to be fully correlated as well. We can understand this distribution as a violation of the rules of the game. At each run, parties 2 and 3 communicate through a secret channel: if in a given run one of them is excluded from the game (the referee does not ask that party any question), then the other party (who was asked something) uses a strategy that correlates the answer with $X_1$; if both parties 2 and 3 are asked questions, they simply give answers that are completely uncorrelated with each other.
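The arithmetic behind this violation can be spelled out in a few lines. The sketch assumes that inequality (4) reads $I(X_1:X_2) + I(X_1:X_3) - I(X_2:X_3) \leq H(X_1)$, which is what summing strong subadditivity and monotonicity gives; the function name `marginal_slack` is a hypothetical helper.

```python
def marginal_slack(h1, i12, i13, i23):
    """Slack H(X1) + I(X2:X3) - I(X1:X2) - I(X1:X3); a negative value means (4) is violated."""
    return h1 + i23 - i12 - i13

# Values observed by the referee in the text (all entropies in bits).
referee = marginal_slack(h1=1.0, i12=1.0, i13=1.0, i23=0.0)
# For comparison: three identical fair bits, which do admit a joint distribution.
copies = marginal_slack(h1=1.0, i12=1.0, i13=1.0, i23=1.0)

print(referee)  # -1.0, so no joint distribution reproduces these marginals
print(copies)   # 0.0, the inequality is saturated
```

The comparison case shows that full correlation of $X_1$ with both other answers is unproblematic as long as $X_2$ and $X_3$ are themselves fully correlated.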

III. MARKOV PROCESSES BEYOND DATA PROCESSING INEQUALITIES
In the case of a Markov process we can follow a construction similar to the one delineated above. We have $n$ variables respecting the usual elemental inequalities; however, in this case, we also have a set of new constraints that follow from the Markov conditions (1). In terms of entropies, (1) can be expressed as

$$H(X_i | X_{i-1}, \ldots, X_1) = H(X_i | X_{i-1}), \quad i = 3, \ldots, n. \tag{5}$$

That is, in this case we are interested in the intersection of the Shannon cone with the hyperplanes defined by (5). We can proceed with the quantifier elimination and thus eliminate all terms except the marginals involving single- and two-body terms.

Let us start with the simplest possible Markov chain, with $n = 3$. In this case, the only Markov condition is $H(X_3 | X_1, X_2) = H(X_3 | X_2)$. Performing the quantifier elimination step, we observe that the only non-trivial inequalities characterizing the marginal Shannon cone are

$$I(X_1:X_2) \geq I(X_1:X_3), \qquad I(X_2:X_3) \geq I(X_1:X_3),$$

that is, we recover the usual data processing inequalities that we expect to hold in a Markov chain. By non-trivial, we mean inequalities that are not simple elemental inequalities (strong subadditivity and monotonicity). In this geometric perspective, data processing inequalities are nothing else than the facets of the Shannon cone intersected with the hyperplanes defining the entropic Markov conditions (5), marginalized to the subspace where the joint entropy of the three variables has been eliminated. For instance, the data processing inequality $I(X_1:X_2) \geq I(X_1:X_3)$ is a direct consequence of combining the strong subadditivity $I(X_1:X_2 | X_3) \geq 0$ with the Markov condition $H(X_3 | X_1, X_2) = H(X_3 | X_2)$.

We can now move to the case of a Markov chain with $n = 4$. In this case, we have two Markov conditions: $H(X_3 | X_1, X_2) = H(X_3 | X_2)$ and $H(X_4 | X_1, X_2, X_3) = H(X_4 | X_3)$. Performing the quantifier elimination, we observe that the only non-trivial (and non-redundant) inequalities are

$$I_{1,2} \geq I_{1,3}, \quad I_{3,4} \geq I_{2,4}, \quad I_{1,3} \geq I_{1,4}, \quad I_{2,4} \geq I_{1,4}, \quad I_{1,4} + I_{2,3} \geq I_{1,3} + I_{2,4}, \tag{6}$$

where we have used the shorthand notation $I_{i,j} \equiv I(X_i:X_j)$. We highlight two aspects of the set of inequalities above.
First, we notice that there are other data processing inequalities that follow in this scenario (five more), for instance, $I_{1,2} \geq I_{1,4}$. However, all these other data processing inequalities are redundant, in the sense that they follow from the set above combined with Shannon-type inequalities (see Appendix for a proof); this set is the minimal non-redundant one [27]. Second, the most interesting feature is that a new kind of inequality, not of the data processing type, emerges for $n = 4$. It shows that in Markov processes not only should the correlations decrease as we move along the chain (as quantitatively expressed by the data processing inequality), but the pairwise correlations between the variables should also respect a monogamy-type constraint. Interestingly, as shown below, there are non-Markovian processes that respect all data processing inequalities but nonetheless violate our newly derived inequality, which shows its relevance.
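A quick numerical sanity check of the $n = 4$ constraints can be sketched as follows: sample random Markov chains and evaluate the slack of each inequality. The code assumes that (6) consists of the four data processing inequalities $I_{1,2} \geq I_{1,3}$, $I_{3,4} \geq I_{2,4}$, $I_{1,3} \geq I_{1,4}$, $I_{2,4} \geq I_{1,4}$ together with the monogamy inequality $I_{1,4} + I_{2,3} \geq I_{1,3} + I_{2,4}$; the sampling scheme, alphabet size and function names are illustrative choices, not the authors' procedure.

```python
import random
from itertools import product
from math import log2

def entropy(joint, idx):
    """Shannon entropy (bits) of the marginal over the positions in idx."""
    marg = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + prob
    return sum(-p * log2(p) for p in marg.values() if p > 0)

def mutual_info(joint, i, j):
    return entropy(joint, (i,)) + entropy(joint, (j,)) - entropy(joint, (i, j))

def random_markov_joint(rng, n=4, d=3):
    """Joint distribution of a Markov chain with random initial state and transitions."""
    def normalized(weights):
        total = sum(weights)
        return [w / total for w in weights]
    first = normalized([rng.random() for _ in range(d)])
    steps = [[normalized([rng.random() for _ in range(d)]) for _ in range(d)]
             for _ in range(n - 1)]
    joint = {}
    for xs in product(range(d), repeat=n):
        prob = first[xs[0]]
        for k in range(n - 1):
            prob *= steps[k][xs[k]][xs[k + 1]]  # p(x_{k+2} | x_{k+1}) in the text's labels
        joint[xs] = prob
    return joint

def slacks(joint):
    """Left side minus right side for each inequality; keys use the 1-based labels of the text."""
    I = {(i, j): mutual_info(joint, i - 1, j - 1)
         for i in range(1, 5) for j in range(i + 1, 5)}
    return {
        "I12 >= I13": I[1, 2] - I[1, 3],
        "I34 >= I24": I[3, 4] - I[2, 4],
        "I13 >= I14": I[1, 3] - I[1, 4],
        "I24 >= I14": I[2, 4] - I[1, 4],
        "I14 + I23 >= I13 + I24": I[1, 4] + I[2, 3] - I[1, 3] - I[2, 4],
    }

rng = random.Random(0)
worst = min(min(slacks(random_markov_joint(rng)).values()) for _ in range(200))
print(worst >= -1e-9)  # no slack is negative beyond floating-point tolerance
```

Such sampling cannot replace the proof in the Appendix; it only confirms that no counterexample appears among the sampled chains.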
In particular, notice that the sum of the distances between the nodes involved is the same on the left-hand side (LHS) and right-hand side (RHS) of the monogamy inequalities. For instance, the mutual information $I_{1,6}$ involves nodes $X_1$ and $X_6$, which have four nodes in between them. So, for $n = 6$ the sum of the distances on the LHS is $4 + 2 + 0 = 6$, while on the RHS we have $2 + 3 + 1 = 6$. A similar analysis shows that for $n = 8$ the sum of distances on the LHS gives $6 + 4 + 2 + 0 = 12$ and the RHS gives $3 + 5 + 3 + 1 = 12$; for $n = 10$ the LHS gives $8 + 6 + 4 + 2 + 0 = 20$ and the RHS gives $4 + 7 + 5 + 3 + 1 = 20$. Based on that and on the clear pattern observed for $n = 4, 6, 8, 10$, we conjecture that a monogamy inequality with this structure, Eq. (7), holds for arbitrary even $n$. This conjecture rests on inductive reasoning from these cases; a general analytical proof is still missing.
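The distance bookkeeping quoted above can be reproduced with a short script. It assumes that the pattern read off from the listed cases continues for every even $n$: distances $n-2, n-4, \ldots, 0$ on the LHS, and $n/2 - 1$ followed by $n-3, n-5, \ldots, 1$ on the RHS. That extrapolation, and the helper name `distance_sums`, are assumptions of this sketch.

```python
def distance_sums(n):
    """Sums of the numbers of intermediate nodes on each side, for even n."""
    lhs = list(range(n - 2, -1, -2))                 # n-2, n-4, ..., 0
    rhs = [n // 2 - 1] + list(range(n - 3, 0, -2))   # n/2 - 1, then n-3, n-5, ..., 1
    return sum(lhs), sum(rhs)

for n in (4, 6, 8, 10, 12, 50):
    print(n, distance_sums(n))
```

Under this assumed pattern both sides sum to $(n/2)(n/2 - 1)$, which matches the values 6, 12 and 20 quoted for $n = 6, 8, 10$.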

A. Examples of monogamy violation while satisfying the data processing inequalities
Two bit process. We now give an explicit example of a classical non-Markovian process where the data processing inequalities hold, while the monogamy inequality does not. Consider that variable $X_1$ is binary ($x_1 = 0, 1$) and all the others can assume four values, $x_2, x_3, x_4 = 0, 1, 2, 3$. The process is described by the following four steps: 1. Let $x_1 = 0, 1$ with probability 1/2.
This protocol generates a distribution $p(x_1, x_2, x_3, x_4)$. Computing the associated entropies, we can check that this distribution respects all eight data processing inequalities (the four in (6) plus the four redundant ones) but nonetheless violates the monogamy constraint in (6). Notice that the process is Markovian among the first three nodes ($X_2$ is a function of $X_1$ alone, and $X_3$ is a function of $X_2$ alone). However, $X_4$ has a direct dependence on the values of $X_2$ and $X_3$ (thus breaking the Markov condition).

Non-Markovianity from non-projective measurements.
We now discuss another example where the data processing inequalities are satisfied but the monogamy inequality is violated.
Here the non-Markovianity arises from a sequence of non-projective quantum measurements. Generalized quantum measurements are defined by a collection of measurement operators $M_x$ satisfying the completeness relation $\sum_x M_x^\dagger M_x = \mathbb{1}$, where each index $x$ is associated with an experimental outcome and $\mathbb{1}$ is the identity operator on the system's Hilbert space. Here we consider generalized measurements performed on a qubit system. The first collection of measurement operators, $\{A_i\}$, is written in the computational basis. These measurement operators were chosen such that each $A_i^\dagger A_i$ ($i = 1, 2, 3, 4$) has the same eigenvalues, 0.3943 and 0.1057, but different associated eigenvectors.
The second set of measurement operators, $\{B_y\}$, is defined in terms of the states $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$ and a real parameter $\alpha$ that varies as $0 \leq \alpha \leq 1$. For $\alpha = 1$ the operators describe a projective measurement, while a weak measurement is described when $\alpha \ll 1$. The protocol considered here consists of four sequential measurements after the preparation of a qubit state $\rho$: the first measurement is defined by $\{A_x\}_{x=1,\ldots,4}$ and the second one by $\{B_y\}_{y=1,2}$. The third and fourth measurements are just a repetition of the first and second ones. Such sequential measurements are described by the operators $M_{i,j,k,l} = B_l A_k B_j A_i$ ($i, k = 1, 2, 3, 4$ and $j, l = 1, 2$), associated with the joint probability

$$p(i, j, k, l) = \mathrm{Tr}\left(M_{i,j,k,l}\, \rho\, M_{i,j,k,l}^\dagger\right). \tag{8}$$

If the initial state is $\rho = |+\rangle\langle+|$, it is found that the monogamy inequality is violated in the region greater than $\alpha \approx 0.8$, but none of the data processing inequalities are violated, as shown in Fig. 1.

FIG. 1. The inequalities in (6) as a function of the parameter $\alpha$ (defining the measurement operator). The shaded region shows the values of $\alpha$ for which all data processing inequalities are satisfied (are positive) while the monogamy inequality is violated (is negative). There are 9 data processing inequalities for $n = 4$ but only 6 of them have to be considered here (see Appendix for more details).

This example is interesting for several reasons. First, while the underlying process is quantum, it leads to a classical process described by the distribution in Eq. (8). Second, the quantum process itself is Markovian here [28]. We can think of it simply as the identity channel, or we can think of the measurements as being performed in the computational basis and of the process as unitary transformations between the measurements. In both cases the quantum process is Markovian, but it leads to a non-Markovian classical distribution. Therefore the non-Markovianity must arise from the measurements themselves. In the classical domain, making coarse measurements can turn a Markov process into a non-Markov process; see Examples 4-6 in [29].
In this example the coarseness comes from the fact that the chosen quantum measurements are not sharp, i.e., they are not rank-1 projections. Finally, the example illustrates a key difference between classical and quantum stochastic processes: in the former theory there is an assumption of non-invasiveness, while the latter requires invasive measurements to say anything about the process [30].
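The mechanics of such a sequential-measurement protocol can be sketched in a few lines: check the completeness relation and evaluate the probability of a sequence of outcomes as $\mathrm{Tr}(M \rho M^\dagger)$. The explicit operators of the paper are not reproduced here. The two-outcome family $B_\pm = \sqrt{(1 \pm \alpha)/2}\,|+\rangle\langle+| + \sqrt{(1 \mp \alpha)/2}\,|-\rangle\langle-|$ used below is a hypothetical stand-in, chosen only because it is projective at $\alpha = 1$ and weak for small $\alpha$; all function names are likewise assumptions.

```python
from math import sqrt

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def weak_x_measurement(alpha):
    """Hypothetical operators B_+ and B_- (not the paper's definition)."""
    plus = [[0.5, 0.5], [0.5, 0.5]]       # |+><+|
    minus = [[0.5, -0.5], [-0.5, 0.5]]    # |-><-|
    ops = []
    for sign in (+1, -1):
        a = sqrt((1 + sign * alpha) / 2)
        b = sqrt((1 - sign * alpha) / 2)
        ops.append([[a * plus[i][j] + b * minus[i][j] for j in range(2)]
                    for i in range(2)])
    return ops

def is_complete(ops, tol=1e-12):
    """Check the completeness relation sum_x M_x^dagger M_x = identity."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for op in ops:
        term = matmul(dagger(op), op)
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return all(abs(total[i][j] - (1 if i == j else 0)) <= tol
               for i in range(2) for j in range(2))

def sequence_probability(ops, rho):
    """p = Tr(M rho M^dagger), where M applies ops[0] first and ops[-1] last."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for op in ops:
        m = matmul(op, m)
    out = matmul(matmul(m, rho), dagger(m))
    return (out[0][0] + out[1][1]).real

rho = [[0.5, 0.5], [0.5, 0.5]]            # initial state |+><+|
B = weak_x_measurement(alpha=0.5)
total = sum(sequence_probability([B[j], B[l]], rho)
            for j in range(2) for l in range(2))
print(is_complete(B), round(total, 12))
```

With the actual operators of the protocol, the same `sequence_probability` call on the four-operator sequence would give the joint distribution of Eq. (8), from which the pairwise mutual informations follow.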

IV. INEQUALITIES FOR DIVISIBLE PROCESSES
Divisible processes form a special superset of Markov processes. Before we talk about divisible processes we first introduce the notion of stochastic matrices. Consider the process $X \to Y$; the stochastic matrix $\Gamma_{Y:X}$ maps any initial distribution $p(X)$ to the corresponding final distribution $p(Y)$. The stochastic matrix can be acquired from the joint probability distribution $p(X, Y)$ as . . .
Let us now consider the process $X \to Y$ with the stochastic matrix $\Gamma_{Y:X}$ and the process $X \to Z$ with the stochastic matrix $\Gamma_{Z:X}$. A process is called divisible if the matrix $G_{Z:Y} := \Gamma_{Z:X} \Gamma_{Y:X}^{-1}$ is also a stochastic matrix [31]. This is known as divisibility by inversion. There is also a stronger operational notion of divisibility. Above, we have only used the joint distributions $p(X, Y)$ and $p(X, Z)$. If we also have access to $p(Y, Z)$, then we can compute $\Gamma_{Z:Y}$ and check whether $\Gamma_{Z:X} = \Gamma_{Z:Y} \Gamma_{Y:X}$. The latter condition is stronger than the former because $\Gamma_{Z:Y}$ represents the actual process $Y \to Z$ (see [23] for details).
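The operational check described above can be sketched directly on a joint distribution. The code assumes that the stochastic matrix is read off as a matrix of conditional probabilities, with entry $(b, a)$ equal to $p(\text{target} = b \,|\, \text{source} = a)$; this reading, the function names and the two toy distributions are assumptions of the sketch.

```python
from itertools import product

def stochastic_matrix(joint, src, dst, d=2):
    """gamma[b][a] = p(dst = b | src = a), estimated from the joint distribution."""
    pair = [[0.0] * d for _ in range(d)]
    for outcome, prob in joint.items():
        pair[outcome[dst]][outcome[src]] += prob
    gamma = [[0.0] * d for _ in range(d)]
    for a in range(d):
        column_total = sum(pair[b][a] for b in range(d))  # p(src = a), assumed nonzero
        for b in range(d):
            gamma[b][a] = pair[b][a] / column_total
    return gamma

def operationally_divisible(joint, x=0, y=1, z=2, d=2, tol=1e-9):
    """Check whether Gamma_{Z:X} equals the composition Gamma_{Z:Y} Gamma_{Y:X}."""
    g_yx = stochastic_matrix(joint, x, y, d)
    g_zy = stochastic_matrix(joint, y, z, d)
    g_zx = stochastic_matrix(joint, x, z, d)
    composed = [[sum(g_zy[c][b] * g_yx[b][a] for b in range(d)) for a in range(d)]
                for c in range(d)]
    return all(abs(composed[c][a] - g_zx[c][a]) <= tol
               for c in range(d) for a in range(d))

def flip(a, b, eps=0.1):
    """Toy transition rule: copy the previous bit, flipping it with probability eps."""
    return 1 - eps if a == b else eps

# Toy Markov chain X -> Y -> Z, and a toy process where Z copies X while Y is pure noise.
chain = {(a, b, c): 0.5 * flip(a, b) * flip(b, c)
         for a, b, c in product((0, 1), repeat=3)}
memory = {(a, b, a): 0.25 for a, b in product((0, 1), repeat=2)}

print(operationally_divisible(chain))   # True
print(operationally_divisible(memory))  # False
```

In the second toy process the intermediate step erases all information about $X$, yet $Z$ still reproduces $X$, so the composed map cannot equal the observed $\Gamma_{Z:X}$.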
Importantly, it is known that there are non-Markovian processes that are divisible [23]. This is because divisibility only accounts for pairwise correlations and neglects higher-order correlations in time. A natural question is then whether divisible processes satisfy the set of inequalities presented, for instance, in (6). Recently, the equivalence between a non-entropic data processing inequality and divisibility was proved in Ref. [32]. It is important to stress that their data processing inequality is distinct from the ones considered here. Furthermore, Ref. [32] only considers divisibility by inversion, and the equivalence may not hold when operational divisibility is considered. Here we show that operational divisibility is sufficient for satisfying the data processing and the monogamy relations.
Suppose a process is operationally divisible; then any map $\Gamma_{Z:X}$ can be written as $\Gamma_{Z:X} = \Gamma_{Z:Y} \Gamma_{Y:X}$ for some intermediate time step $Y$. Here, the RHS is a Markov process, meaning that all pairwise correlations, i.e., the mutual informations $\{I(Z:X), I(Y:X), \ldots\}$, can be obtained from the underlying Markov process. Since the inequalities in Eq. (6) only require pairwise correlations, which in effect come from a Markov process, all inequalities there will be satisfied because they are derived under the Markov assumption. It is worth stating that the same argument does not hold if the process is only divisible by inversion.

A. Example relating divisibility and the inequalities
A divisible non-Markovian process. We give here an illustrative example of a non-Markovian process that is divisible and thus only carries higher-order correlations. Consider a one-bit process with $x_1 = 0, 1$ with probability 1/2. Let $x_j = y_j$ for $j = 2, 3$, where the $y_j$ are random bits. Finally, we let $x_4 = x_1 + y_2 + y_3$ (here we have modular addition). It is clear that the mutual information between any two marginals will be zero since they are all independent and random. In fact, the process is divisible and therefore all inequalities in (6) will be trivially satisfied. However, higher-order correlations, those containing correlations between three or more variables, cannot be obtained in the same way. Indeed, this is exactly why a divisible process can be non-Markovian. For instance, mutual informations such as $I(X_4 : X_1 X_2 X_3)$ or $I(X_4 : X_1 | X_2 X_3)$ will not vanish.
Here we find that the monogamy inequality and all data processing inequalities hold.
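The claims of this example can be checked by direct enumeration. The sketch assumes that the modular addition is taken modulo 2 (an XOR of the three bits) and that $y_2$ and $y_3$ are uniform and independent; variable positions in the code are zero-based, so position 3 is $X_4$.

```python
from itertools import product
from math import log2

def entropy(joint, idx):
    """Shannon entropy (bits) of the marginal over the positions in idx."""
    marg = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + prob
    return sum(-p * log2(p) for p in marg.values() if p > 0)

# x1, x2 = y2, x3 = y3 are uniform bits and x4 = x1 + y2 + y3 (mod 2).
joint = {(x1, y2, y3, x1 ^ y2 ^ y3): 1 / 8
         for x1, y2, y3 in product((0, 1), repeat=3)}

# All pairwise mutual informations, keyed by the 1-based labels of the text.
pairwise = {(i + 1, j + 1):
            entropy(joint, (i,)) + entropy(joint, (j,)) - entropy(joint, (i, j))
            for i in range(4) for j in range(i + 1, 4)}

# I(X4 : X1 X2 X3) = H(X4) + H(X1 X2 X3) - H(X1 X2 X3 X4)
i_4_123 = (entropy(joint, (3,)) + entropy(joint, (0, 1, 2))
           - entropy(joint, (0, 1, 2, 3)))

# I(X4 : X1 | X2 X3) = H(X2 X3 X4) + H(X1 X2 X3) - H(X1 X2 X3 X4) - H(X2 X3)
i_4_1_given_23 = (entropy(joint, (1, 2, 3)) + entropy(joint, (0, 1, 2))
                  - entropy(joint, (0, 1, 2, 3)) - entropy(joint, (1, 2)))

print(pairwise)
print(i_4_123, i_4_1_given_23)
```

Every pairwise mutual information comes out as zero, so both sides of each inequality in (6) vanish, while the two higher-order quantities equal one bit each.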

V. CAUSAL INTERPRETATION OF THE DATA PROCESSING AND MONOGAMY INEQUALITIES
The violation of the data processing or monogamy inequalities implies that the Markov constraint is not fulfilled by the process under investigation. It seems natural that the more we violate such constraints, the more non-Markovian the process should be. In order to formalize that quantitatively, we make use here of the formalism of causal Bayesian networks [14].
A central concept is that of a directed acyclic graph (DAG) which has the variables $X_i$ as vertices. The directed edges in the DAG represent relations of cause and effect, which is why the graph should be acyclic; otherwise we would incur paradoxical situations where an effect is its own cause. For the $X_i$ to form a Bayesian network (with respect to the DAG), every variable should be expressible as a function of its graph-theoretical parents $PA_i$ and an unobserved noise term $N_i$ (such that the $N_i$ are jointly independent). That is the case if and only if the distribution is of the form

$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i | pa_i).$$

We notice that this is equivalent to the so-called local Markov property, stating that every $X_i$ is conditionally independent of its non-descendants $ND_i$ given its parents $PA_i$.

Within this context, a Markovian process is nothing else than a causal model where a given variable $X_i$ has only $X_{i-1}$ as a parent and $X_{i+1}$ as a descendant (see Fig. 2). Thus, a natural way to quantify non-Markovianity is to quantify how much causal dependence $X_i$ has on $X_{i-2}, \ldots, X_1$. To that aim, we notice that a list of reasonable postulates that any measure of causal strength should fulfill has been proposed in [24], in particular the axiom stating that

$$C_{X \to Y} \geq I(X : Y | PA_Y^X),$$

where $C_{X \to Y}$ is the causal strength of $X$ on $Y$ and $PA_Y^X$ stands for the parents of variable $Y$ other than $X$. In general, if the Markovian condition is not fulfilled, $H(X_i | X_{i-1}, \ldots, X_1) \neq H(X_i | X_{i-1})$, this means that the variables $(X_1, \ldots, X_{i-2})$ have a direct causal influence over $X_i$ (see Fig. 2), which can be quantified as

$$C_{(X_1, \ldots, X_{i-2}) \to X_i} \geq I(X_1, \ldots, X_{i-2} : X_i | X_{i-1}).$$

To draw a connection between causality, entropies and Markovianity, we first notice that if a process is Markovian then

$$H(X_1, \ldots, X_n) = H(X_1) + \sum_{i=2}^{n} H(X_i | X_{i-1}), \tag{10}$$

as can be seen by a direct application of the chain rule plus the Markov conditions (5). A violation of (10) implies that at least one of the Markov conditions (5) is not fulfilled. In other terms, there is a causal influence over some $X_i$ that is not simply given by its immediate past variable $X_{i-1}$.
It can be demonstrated (see Appendix for the proof) that the following inequality holds for the sum of all non-Markovian causal influences:

$$\sum_{i=3}^{n} C_{(X_1, \ldots, X_{i-2}) \to X_i} \geq H(X_1) + \sum_{i=2}^{n} H(X_i | X_{i-1}) - H(X_1, \ldots, X_n),$$

where $n \geq 3$. The right-hand side is equal to zero if and only if the entropic Markov constraints (5) are fulfilled. Therefore, the sum of causal influences is equal to zero if and only if the entropies of the stochastic process are indeed Markovian. The more the Markov equality (10) is violated, the more non-Markovian causal influences should be present in the underlying process.
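The lower bound above is straightforward to evaluate on small examples. The sketch below computes the gap $H(X_1) + \sum_{i=2}^{n} H(X_i | X_{i-1}) - H(X_1, \ldots, X_n)$ for two toy three-bit processes; the function name `markov_gap` and the examples are illustrative assumptions, and the form of the bound is the one implied by the Appendix.

```python
from itertools import product
from math import log2

def entropy(joint, idx):
    """Shannon entropy (bits) of the marginal over the positions in idx."""
    marg = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + prob
    return sum(-p * log2(p) for p in marg.values() if p > 0)

def markov_gap(joint):
    """H(X1) + sum_i H(X_i | X_{i-1}) - H(X1, ..., Xn), using zero-based positions."""
    n = len(next(iter(joint)))
    h_markov = entropy(joint, (0,))
    for i in range(1, n):
        # H(X_i | X_{i-1}) = H(X_{i-1}, X_i) - H(X_{i-1})
        h_markov += entropy(joint, (i - 1, i)) - entropy(joint, (i - 1,))
    return h_markov - entropy(joint, tuple(range(n)))

# Toy Markov process: three copies of one fair bit.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Toy non-Markovian process: the third bit is the XOR of the first two fair bits.
xor3 = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

print(markov_gap(copies))  # 0.0
print(markov_gap(xor3))    # 1.0
```

In the XOR case the bound says that at least one bit of causal influence must flow from $X_1$ directly to $X_3$, bypassing $X_2$.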
To further illustrate the connection between causality and Markovianity, consider the data processing gap $\mathrm{DP}^{1}_{23} = -I_{1,2} + I_{1,3}$, which is never positive for Markov processes and may be positive for non-Markovian processes. A violation of the data processing inequality implies that $\mathrm{DP}^{1}_{23} > 0$ and thus $H(X_3 | X_2, X_1) \neq H(X_3 | X_2)$. Using the entropic framework described above we can prove (see Appendix) that

$$C_{X_1 \to X_3} \geq \mathrm{DP}^{1}_{23},$$

thus showing that the more we violate the data processing inequality, the larger the direct causal effect of the variable $X_1$ on $X_3$. Similarly, the monogamy gap for $n = 4$, expressed as $M_4 = -I_{1,4} - I_{2,3} + I_{1,3} + I_{2,4}$, is never positive for Markov processes and can be positive for some non-Markovian processes. As proven in the Appendix, the monogamy inequality imposes a lower bound on the total non-Markovianity of the process, as quantified by the sum of causal influences:

$$C_{X_1 \to X_3} + C_{(X_1, X_2) \to X_4} \geq M_4.$$

We could analytically prove (see Appendix), up to $n = 10$, that the following inequality holds:

$$\sum_{i=3}^{n} C_{(X_1, \ldots, X_{i-2}) \to X_i} \geq M_n,$$

where, similarly to $M_4$ defined above, $M_n$ is the sum of mutual information terms appearing in the monogamy inequalities (7). Unfortunately, a general result for larger chains is still missing. Overall, this framework allows for an operational causal interpretation of the violation of data processing and monogamy inequalities, as a lower bound on the non-Markovian causal influences necessary to explain the observed correlations.

VI. DISCUSSION
The main question addressed in this article is the following: if a process is Markovian, what are the additional restrictions (beyond data processing inequality) imposed on the pairwise correlations between the variables in the Markov chain? This is a very important question since we do not have, in general, access to the full probability distribution describing the process. For instance, we cannot perform sequential time measurements in a continuous way, implying that a sequence of events in time must necessarily be discrete. Our contribution in the search for answering this question, based on an entropic approach, is threefold.
First, we know that Markov processes satisfy all data processing inequalities, which are restrictions on the possible correlations between distinct time steps. However, the converse of this statement is not true. Along these lines, we demonstrate that data processing constraints are not the only consequences of the Markov condition. We proved a new class of relations, named monogamy inequalities, that can be violated by non-Markovian processes that satisfy all data processing inequalities. In this way, the violation of one of these monogamy inequalities can be employed as a device-independent test of the non-Markovianity of the underlying process. It is worth highlighting that, as the size of the Markov chain increases, new kinds of constraints beyond data processing and monogamy are likely to appear. Also, in the derivation of the monogamy inequalities, only strong subadditivity constraints have been employed, meaning that they also hold for von Neumann entropies, if the corresponding joint density matrix respecting the Markov condition can be defined [28,33,34]. Exploring these possibilities defines a clear avenue for future research.
Our second contribution deals with the concepts of Markovianity and divisibility, sometimes seen as synonymous in the literature. By considering divisible processes, we showed that operational divisibility is sufficient to guarantee that the process will satisfy all data processing and monogamy inequalities, even if the process is non-Markovian (but operationally divisible). This result is interesting because it points out the distinction between Markovianity and divisibility in a way that can be experimentally investigated.
Finally, our third contribution is to build a connection between the violation of data processing and monogamy inequalities and causal influences. In short, the more these inequalities are violated, the greater the causal influence from the past. This implies that these relations can be employed as a quantifier for causal influences, since they satisfy all the requirements for a bona fide measure.
In summary, the ideas put forward in this article may have potential applications in several fields, like causal inference, statistical physics and information theory.

Analytical derivation of the data processing inequality
Let us first see how the data processing inequalities can be proven. Simply use the Markov condition $H(X_3 | X_1, X_2) = H(X_3 | X_2)$ (rewritten as $H(X_1, X_2, X_3) = H(X_1, X_2) + H(X_2, X_3) - H(X_2)$) in the elemental inequality

$$H(X_1, X_3) + H(X_2, X_3) - H(X_1, X_2, X_3) - H(X_3) \geq 0,$$

which gives $H(X_1, X_3) + H(X_2) - H(X_1, X_2) - H(X_3) \geq 0$ and can be rewritten as $I_{1,2} \geq I_{1,3}$, employing the same notation used in the main text.

Analytical derivation of the monogamy inequality for n = 4
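Let us now prove the new inequality beyond data processing. Add the appropriate basic inequalities and use the Markov conditions $H(X_3 | X_1, X_2) = H(X_3 | X_2)$ and $H(X_4 | X_1, X_2, X_3) = H(X_4 | X_3)$. Substituting these into the resulting expression we get $H(X_1, X_3) + H(X_2, X_4) - H(X_1, X_4) - H(X_2, X_3) \geq 0$, which can be rewritten as $-I_{1,3} + I_{1,4} - I_{2,4} + I_{2,3} \geq 0$, that is, the monogamy inequality in (6).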
In turn, this set of 6 DP inequalities is implied by the 4 DP inequalities plus the monogamy inequality in (6). For instance, the DP inequality $I_{2,3} \geq I_{1,3}$ can be obtained by summing the DP inequality $I_{2,4} \geq I_{1,4}$ with the monogamy inequality. Similarly, the DP inequality $I_{2,3} \geq I_{2,4}$ can be obtained by summing the DP inequality $I_{1,3} \geq I_{1,4}$ with the monogamy inequality. Thus, the non-redundant set of DP inequalities is given by those expressed in (6).

Causal Influences and Entropy
Consider a stochastic process $\{X_1, \cdots, X_n\}$. One defines the joint Markov entropy by

$$H_{\mathrm{Markov}}(X_1, \cdots, X_n) = H(X_1) + \sum_{i=2}^{n} H(X_i | X_{i-1}),$$

as the joint entropy of Markovian processes has exactly this form. The sum of the non-Markovian causal influences is lower bounded as follows:

$$\sum_{i=3}^{n} C_{(X_1, \ldots, X_{i-2}) \to X_i} \geq H_{\mathrm{Markov}}(X_1, \cdots, X_n) - H(X_1, \cdots, X_n).$$
The proof follows by mathematical induction on the number of random variables. Suppose that the following equality is valid for $n$:

$$\sum_{i=3}^{n} I(X_1, \cdots, X_{i-2} : X_i | X_{i-1}) = H_{\mathrm{Markov}}(X_1, \cdots, X_n) - H(X_1, \cdots, X_n).$$
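Independently of the induction argument, the equality can be checked numerically on arbitrary (generally non-Markovian) distributions. The sketch below assumes that the sum runs over $i = 3, \ldots, n$ and that $H_{\mathrm{Markov}} = H(X_1) + \sum_{i=2}^{n} H(X_i | X_{i-1})$; the random sampling and the function names are illustrative choices.

```python
import random
from itertools import product
from math import log2

def entropy(joint, idx):
    """Shannon entropy (bits) of the marginal over the positions in idx."""
    marg = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + prob
    return sum(-p * log2(p) for p in marg.values() if p > 0)

def random_joint(rng, n=4, d=2):
    """Random joint distribution with no Markov structure imposed."""
    weights = {xs: rng.random() for xs in product(range(d), repeat=n)}
    total = sum(weights.values())
    return {xs: w / total for xs, w in weights.items()}

def cmi_sum(joint):
    """Sum over i = 3..n of I(X_1..X_{i-2} : X_i | X_{i-1})."""
    n = len(next(iter(joint)))
    total = 0.0
    for i in range(3, n + 1):        # i is the 1-based index used in the text
        past = tuple(range(i - 2))   # zero-based positions of X_1 .. X_{i-2}
        prev, cur = i - 2, i - 1     # zero-based positions of X_{i-1} and X_i
        total += (entropy(joint, past + (prev,)) + entropy(joint, (prev, cur))
                  - entropy(joint, past + (prev, cur)) - entropy(joint, (prev,)))
    return total

def markov_gap(joint):
    """H_Markov(X_1..X_n) - H(X_1..X_n)."""
    n = len(next(iter(joint)))
    h_markov = entropy(joint, (0,))
    for i in range(1, n):
        h_markov += entropy(joint, (i - 1, i)) - entropy(joint, (i - 1,))
    return h_markov - entropy(joint, tuple(range(n)))

rng = random.Random(1)
max_dev = max(abs(cmi_sum(j) - markov_gap(j))
              for j in (random_joint(rng) for _ in range(30)))
print(max_dev < 1e-9)
```

The agreement reflects the telescoping structure of the sum: each conditional mutual information contributes $H(X_i | X_{i-1}) - H(X_i | X_{i-1}, \ldots, X_1)$, and the second terms combine into the joint entropy through the chain rule.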