Contextual advantage for state discrimination

Finding quantitative aspects of quantum phenomena which cannot be explained by any classical model has foundational importance for understanding the boundary between classical and quantum theory. It also has practical significance for identifying information processing tasks for which those phenomena provide a quantum advantage. Using the framework of generalized noncontextuality as our notion of classicality, we find one such nonclassical feature within the phenomenology of quantum minimum error state discrimination. Namely, we identify quantitative limits on the success probability for minimum error state discrimination in any experiment described by a noncontextual ontological model. These constraints constitute noncontextuality inequalities that are violated by quantum theory, and this violation implies a quantum advantage for state discrimination relative to noncontextual models. Furthermore, our noncontextuality inequalities are robust to noise and are operationally formulated, so that any experimental violation of the inequalities is a witness of contextuality, independently of the validity of quantum theory. Along the way, we introduce new methods for analyzing noncontextuality scenarios, and demonstrate a tight connection between our minimum error state discrimination scenario and a Bell scenario.

Understanding the boundary between the quantum and the classical is of fundamental importance for understanding quantum theory. One successful metric for nonclassicality, violation of Bell's notion of local causality [1], defines a clear departure from classicality in relativistic theories, but is relevant only for experiments with space-like separated measurements. Another notion of classicality, which concerns context-independence, was proposed by Kochen-Specker [2] and Bell [3], and has since been significantly refined and generalized [4]. It is the generalized notion of noncontextuality from Ref. [4] which we study in this paper, but we refer to it simply as "noncontextuality" hereafter. As a metric for nonclassicality, the failure of noncontextuality has a broader scope than the failure of local causality insofar as it does not require space-like separation. It has also been shown to subsume many other pre-existing notions of nonclassicality, such as the negativity of quasi-probability representations [5], the generation of anomalous weak values [6], and even the aforementioned violations of local causality [4].
The quantum-classical boundary is also of practical importance in identifying tasks which admit of a quantum advantage.
For example, violations of Bell inequalities have been shown to be a resource for device-independent key distribution [7], certified randomness [8], and communication complexity [9]. The failure of noncontextuality has also been shown to be a resource, leading to advantages for cryptography [10][11][12] and computation [13][14][15].
We here analyze minimum-error state discrimination (MESD) from the point of view of noncontextuality. Quantum state discrimination is a task wherein one must guess which quantum state describes a given quantum system when the state of that system is drawn from a known set of possibilities with a known prior distribution, and the estimation is based on the outcome of a measurement of one's choosing. In the "minimum error" variety of state discrimination, the objective is to minimize the probability that the estimate is in error. We here focus on the simplest case of a set containing just two states having equal a priori probability.
* dschmid@perimeterinstitute.ca
Although it is common to assert that the impossibility of perfectly discriminating nonorthogonal quantum states is an intrinsically nonclassical effect, this claim does not meet the minimal standard that one should require of any claim that some operational feature of quantum theory cannot be explained classically: namely, that it be justified by a rigorous no-go theorem. Such a theorem articulates a principle of classicality which has implications for operational statistics, and then proves that these implications are inconsistent with some operational feature(s) of quantum theory. Because the principle of noncontextuality constrains operational statistics and also has very broad scope, it is a particularly useful notion of classicality. If one does take it as one's principle of classicality, then the impossibility of discriminating nonorthogonal pure quantum states cannot be considered a nonclassical effect because there are subtheories of quantum theory (containing a strict subset of the states, measurements and transformations of the full theory) [16] wherein this phenomenon arises and which admit of a noncontextual model. (Within such models, the phenomenon can be attributed to the fact that the probability distributions associated to such quantum states are overlapping 1 .) It follows that one must look at more nuanced aspects of the phenomenology of quantum state discrimination to identify features which are truly nonclassical by these lights.
We identify one such strongly nonclassical aspect of minimum error state discrimination: the particular dependence of the probability of successful discrimination on the overlap of the quantum states. For a given overlap, the quantum probability of discrimination is larger than can be accounted for by a noncontextual model. After presenting this result as a no-go theorem (namely, that no noncontextual model can reproduce certain features of quantum MESD), we reformulate the problem in a manner which makes no reference to quantum theory, and which does not rely on any theoretical idealizations such as noise-free measurements or preparations. Our entirely operational formulation allows us to derive inequalities which can experimentally witness a contextual advantage for state discrimination, in the presence of noise and independently of the validity of quantum theory.
Our result identifies a key feature of quantum state discrimination which cannot be understood in any noncontextual model, and hence which is strongly nonclassical. Because quantum state discrimination is a primitive in many important quantum information processing protocols [18,19], this work constitutes a first step towards identifying contextuality as a resource for more tasks concerning communication, computation, and cryptography.
We also prove an isomorphism between our operational MESD scenario and a two-party Bell test in which one party performs one of a pair of binary-outcome measurements and the other performs one of three binary-outcome measurements. This is similar to the fact that the noncontextuality inequality delimiting the success rate for parity-oblivious multiplexing [10] is isomorphic to the CHSH inequality in the Bell scenario [10].
Finally, we introduce two powerful new technical tools. First, we generalize existing methods for simulating exact operational equivalences [20]. Namely, while Ref. [20] shows how one may find a set of procedures which respects certain operational equivalences exactly, we have further demonstrated that one can find procedures which respect operational equivalences and simultaneously obey useful auxiliary constraints, such as the symmetries native to our ideal MESD scenario. This tool may have more general applications in the comparison of experimental data with theoretical expectations. More importantly, we find our noncontextuality inequalities using a novel algorithm (presented in Appendix B) for deriving the full set of necessary and sufficient noncontextuality inequalities for any finite prepare-and-measure scenario, with respect to any fixed operational equivalences 2 .
2 A full description of this algorithm can be found in Ref. [21].

I. OPERATIONAL THEORIES AND ONTOLOGICAL MODELS
An operational theory is a specification of sets of primitive laboratory operations (e.g., preparations and measurements) and a prescription for finding the probabilities $p(k|M,P)$ for each outcome $k$ given any measurement $M$ performed on any preparation $P$. Two preparations $P$ and $P'$ are termed operationally equivalent if they cannot be differentiated by the statistics of any measurement; we denote this operational equivalence by
$$P \simeq P'. \tag{1}$$
In this article, quantum theory is understood as an operational theory. In the quantum formalism, the density operator specifies the statistics for all measurements, so that two preparation procedures are operationally equivalent if and only if they are represented by the same density operator.
An ontological model of an operational theory has the following form. To every system, there is associated an ontic state space $\Lambda$, where each ontic state $\lambda \in \Lambda$ specifies all the physical properties of the system. Each preparation $P$ of a system is presumed to sample the system's ontic state $\lambda$ at random from a probability distribution, denoted $\mu_P(\lambda)$ and termed the epistemic state associated to $P$, where $\int_\Lambda d\lambda\, \mu_P(\lambda) = 1$.
Each measurement $M$ on a system is presumed to have its outcome $k$ sampled at random in a manner that depends on the ontic state $\lambda$. The term effect will be used to refer to the pair consisting of a measurement, $M$, together with one of its outcomes, $k$, and will be denoted by $k|M$. The probability of outcome $k$ given measurement $M$, considered as a function of $\lambda$, will be termed the response function associated to $k|M$, and denoted $\xi_{k|M}(\lambda)$, where
$$\forall \lambda: \ \sum_k \xi_{k|M}(\lambda) = 1.$$
Finally, an ontological model of an operational theory must reproduce the latter's empirical predictions; that is,
$$p(k|M,P) = \int_\Lambda d\lambda\, \xi_{k|M}(\lambda)\, \mu_P(\lambda). \tag{6}$$
We are now in a position to describe the assumption of preparation noncontextuality defined in Ref. [4]. An ontological model is said to be preparation noncontextual if it assigns the same epistemic state to all operationally equivalent preparations [4]:
$$P \simeq P' \implies \mu_P(\lambda) = \mu_{P'}(\lambda). \tag{7}$$
In operational quantum theory, the principle of preparation noncontextuality is respected whenever any two preparations that are associated to the same density operator are represented by the same epistemic state. For instance, different ensembles of states that average to the same mixed state (and for which one discards the information about which element of the ensemble was prepared) are operationally equivalent, and must be assigned the same epistemic state in a preparation noncontextual model. Although there is a corresponding notion of measurement noncontextuality (namely, that operationally equivalent outcomes of measurements are represented by the same response functions), we will not have use of it in this article.
A few terminological conventions will be useful. A measurement is said to be represented as outcome-deterministic in the ontological model if the associated response functions all take values in $\{0, 1\}$. The support of an epistemic state is defined as the set of $\lambda \in \Lambda$ which are assigned nonzero probability by it, $\mathrm{supp}[\mu_P(\lambda)] \equiv \{\lambda : \mu_P(\lambda) \neq 0\}$, while the support of a response function is defined as the set of $\lambda \in \Lambda$ for which the response function is nonzero, $\mathrm{supp}[\xi_{k|M}(\lambda)] \equiv \{\lambda : \xi_{k|M}(\lambda) \neq 0\}$.

II. QUANTUM MINIMUM ERROR STATE DISCRIMINATION
We begin with the problem of discriminating two nonorthogonal pure quantum states $|\phi\rangle$ and $|\psi\rangle$. These two states span a 2-dimensional space, so we can represent them as points in an equatorial plane of the Bloch ball, as in Fig. 1.
First, we consider the operational signature of their nonorthogonality.
A measurement of the $\phi$ basis, $B_\phi \equiv \{|\phi\rangle\langle\phi|, |\bar\phi\rangle\langle\bar\phi|\}$, perfectly distinguishes between state $|\phi\rangle$ and its complement; we denote the associated outcomes by $\phi$ and $\bar\phi$, respectively. A measurement of the $\psi$ basis, $B_\psi \equiv \{|\psi\rangle\langle\psi|, |\bar\psi\rangle\langle\bar\psi|\}$, does the same for the state $|\psi\rangle$ and its complement, with associated outcomes $\psi$ and $\bar\psi$. If one implements the $\psi$ basis measurement on the state $|\phi\rangle$, the probability of obtaining the $\psi$ outcome is
$$c_q \equiv \mathrm{Tr}\big[|\psi\rangle\langle\psi|\,|\phi\rangle\langle\phi|\big] = |\langle\phi|\psi\rangle|^2. \tag{8}$$
Because one could think of this quantity as the probability that $\phi$ passes the test for $\psi$ and thus is confusable with $\psi$, we henceforth call it the confusability. Note that if one implements the $\phi$ basis measurement on the state $|\psi\rangle$, the probability of obtaining the $\phi$ outcome is also $c_q$.
If $|\phi\rangle$ and $|\psi\rangle$ have nonzero confusability (i.e., if they are not orthogonal), then no measurement can distinguish between the two without incurring a nonzero probability of error. We denote the discriminating measurement by $B_d \equiv \{E_{g_\phi}, E_{g_\psi}\}$, where the outcome for which one should guess $\phi$ (respectively $\psi$) is denoted $g_\phi$ (respectively $g_\psi$). Assuming equal prior probabilities of $|\phi\rangle$ and $|\psi\rangle$, the probability of guessing the state correctly with this measurement is
$$s_q \equiv \tfrac{1}{2}\mathrm{Tr}\big[E_{g_\phi}|\phi\rangle\langle\phi|\big] + \tfrac{1}{2}\mathrm{Tr}\big[E_{g_\psi}|\psi\rangle\langle\psi|\big]. \tag{9}$$
We assume that the discriminating measurement has the natural symmetry property $\mathrm{Tr}[E_{g_\phi}|\phi\rangle\langle\phi|] = \mathrm{Tr}[E_{g_\psi}|\psi\rangle\langle\psi|]$, so that
$$s_q = \mathrm{Tr}\big[E_{g_\phi}|\phi\rangle\langle\phi|\big]. \tag{10}$$
The measurement scheme that yields the greatest probability of guessing correctly which of two nonorthogonal states was prepared is called the minimum error state discrimination (MESD) scheme. Since $|\phi\rangle$ and $|\psi\rangle$ are prepared with equal probability, the POVM $\{E_{g_\phi}, E_{g_\psi}\}$ achieving MESD is the one consisting of projectors onto the basis that straddles $|\phi\rangle$ and $|\psi\rangle$ in Hilbert space, which is depicted in the Bloch sphere in Fig. 1. This is called the Helstrom measurement [22]. It is well-known that the probability of guessing the state correctly using the Helstrom measurement is
$$s_q = \frac{1}{2}\left(1 + \sqrt{1 - c_q}\right). \tag{11}$$
We have now described all of the preparations and measurements that usually appear in a discussion of the problem of discriminating two nonorthogonal quantum states, and some basic facts about the relations that hold among the operational quantities characterizing the discrimination problem (i.e., facts about the phenomenology of quantum state discrimination). However, these facts are insufficient for deriving a no-go theorem for noncontextuality. The reason is that the preparations and measurements described thus far do not exhibit any operational equivalences via which the assumption of noncontextuality could imply nontrivial constraints on the ontological model.
However, there is a simple solution: we also consider the problem of discriminating the pair of quantum states that are complementary to $|\phi\rangle$ and $|\psi\rangle$, namely, $|\bar\phi\rangle$ and $|\bar\psi\rangle$, also depicted in Fig. 1. By symmetry, the confusability of $|\bar\phi\rangle$ and $|\bar\psi\rangle$ is also equal to $c_q$, and the success rate for distinguishing $|\bar\phi\rangle$ and $|\bar\psi\rangle$ when they have equal prior probability is also equal to $s_q$ (where the optimal measurement is again $\{E_{g_\phi}, E_{g_\psi}\}$, but now the outcomes $g_\phi$ and $g_\psi$ signal one to guess preparations $\bar\psi$ and $\bar\phi$, respectively). So the $|\bar\phi\rangle$ vs. $|\bar\psi\rangle$ discrimination problem is a mirror image of the $|\phi\rangle$ vs. $|\psi\rangle$ discrimination problem, and consequently does not require specifying any additional facts about the phenomenology of quantum state discrimination. However, the inclusion of $|\bar\phi\rangle$ and $|\bar\psi\rangle$ in our analysis provides us with a nontrivial operational equivalence relation among the preparations, namely,
$$\tfrac{1}{2}|\phi\rangle\langle\phi| + \tfrac{1}{2}|\bar\phi\rangle\langle\bar\phi| = \tfrac{1}{2}|\psi\rangle\langle\psi| + \tfrac{1}{2}|\bar\psi\rangle\langle\bar\psi| = \tfrac{1}{2}\mathbb{1}. \tag{12}$$
We will show that this equivalence relation together with the phenomenology of quantum state discrimination described above is sufficient to derive a no-go theorem for noncontextuality. The probability of a given measurement outcome occurring on a given preparation, for every possible pairing thereof, is summarized in Table I. Here, the columns correspond to the distinct state-preparations and the rows correspond to the distinct effects (where one need only include a single effect for each binary-outcome measurement given that the probability for the other effect is fixed by normalization).
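The quantities of this section can be checked numerically. The following is a minimal sketch, assuming real qubit states parametrized by a Bloch angle (the helper names `ket` and `proj`, the dictionary layout, and the angle `theta = pi/3` are illustrative choices, not taken from the paper). It computes the confusability, builds the Helstrom measurement from the eigenvectors of $|\phi\rangle\langle\phi| - |\psi\rangle\langle\psi|$, and tabulates the outcome probabilities for the three effects on the four preparations.

```python
import numpy as np

def ket(theta):
    """Real qubit state at Bloch angle theta in an equatorial plane (illustrative parametrization)."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def proj(v):
    """Rank-one projector onto the real unit vector v."""
    return np.outer(v, v)

theta = np.pi / 3  # Bloch angle between |phi> and |psi>; arbitrary illustrative value

phi, psi = ket(0.0), ket(theta)
phi_bar, psi_bar = ket(np.pi), ket(theta + np.pi)  # orthogonal complements

# Confusability: probability that |phi> passes the test for |psi>.
c_q = float(phi @ proj(psi) @ phi)

# Helstrom measurement: E_{g_phi} projects onto the positive eigenvector
# of |phi><phi| - |psi><psi| (eigh returns eigenvalues in ascending order).
evals, evecs = np.linalg.eigh(proj(phi) - proj(psi))
E_g_phi = proj(evecs[:, np.argmax(evals)])
s_q = float(phi @ E_g_phi @ phi)

preps = {"phi": proj(phi), "psi": proj(psi),
         "phi_bar": proj(phi_bar), "psi_bar": proj(psi_bar)}
effects = {"phi|B_phi": proj(phi), "psi|B_psi": proj(psi), "g_phi|B_d": E_g_phi}

# Rows are effects, columns are preparations.
table = {e: {p: float(np.trace(E @ rho)) for p, rho in preps.items()}
         for e, E in effects.items()}

for e, row in table.items():
    print(e, {p: round(v, 4) for p, v in row.items()})
print("c_q =", round(c_q, 4), " s_q =", round(s_q, 4))
```

For this choice of angle the success rate agrees with the Helstrom formula, and both equal mixtures reduce to the maximally mixed state, which is the operational equivalence used in the no-go argument.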

III. NONCONTEXTUALITY NO-GO THEOREM FOR MESD IN QUANTUM THEORY
The fact that the ontological model must reproduce the probabilities in Table I via Eq. (6) implies constraints on the epistemic states associated to the four preparations and the response functions associated to the three effects. For instance, to reproduce the first column of the table, one requires that
$$\int_\Lambda d\lambda\, \xi_{\phi|B_\phi}(\lambda)\, \mu_\phi(\lambda) = 1, \tag{13}$$
$$\int_\Lambda d\lambda\, \xi_{\psi|B_\psi}(\lambda)\, \mu_\phi(\lambda) = c_q, \tag{14}$$
$$\int_\Lambda d\lambda\, \xi_{g_\phi|B_d}(\lambda)\, \mu_\phi(\lambda) = s_q. \tag{15}$$
Given that convex mixtures of preparations are represented in an ontological model by the corresponding mixture of epistemic states (see Eq. (7) of [5] and the surrounding discussion), it follows that $\frac{1}{2}|\phi\rangle\langle\phi| + \frac{1}{2}|\bar\phi\rangle\langle\bar\phi|$ is represented by $\frac{1}{2}\mu_\phi(\lambda) + \frac{1}{2}\mu_{\bar\phi}(\lambda)$, and $\frac{1}{2}|\psi\rangle\langle\psi| + \frac{1}{2}|\bar\psi\rangle\langle\bar\psi|$ is represented by $\frac{1}{2}\mu_\psi(\lambda) + \frac{1}{2}\mu_{\bar\psi}(\lambda)$. But because both of these mixtures of preparations are associated to the completely mixed state (Eq. (12)), they are operationally equivalent, and thus by the assumption of preparation noncontextuality, they are represented by the same epistemic state. It follows that
$$\tfrac{1}{2}\mu_\phi(\lambda) + \tfrac{1}{2}\mu_{\bar\phi}(\lambda) = \tfrac{1}{2}\mu_\psi(\lambda) + \tfrac{1}{2}\mu_{\bar\psi}(\lambda). \tag{16}$$
Any ontological model satisfying noncontextuality, and consequently Eq. (16), and reproducing the form of the data in Table I, and consequently Eqs. (13)-(15) and their kin, can be shown to satisfy the following trade-off between $s_q$ and $c_q$:
$$s_q \le 1 - \frac{c_q}{2}. \tag{17}$$
An intuitive proof is provided in Section III A, where we also discuss how this result is related to the results of Refs. [23][24][25]. (In Appendix A, we provide a proof using more general methods, which generalizes more easily to the noisy case discussed later, in Section V.) This tradeoff relation contradicts the one known to be optimal in quantum theory, Eq. (11). The optimal quantum tradeoff generally allows higher success rates for a given confusability than the noncontextual tradeoff. Therefore, we conclude that the phenomenology of minimum-error state discrimination in the noiseless quantum case is inconsistent with the principle of noncontextuality.
In Fig. 2, we plot the maximum success rate for MESD as a function of the confusability for both quantum theory (Eq. (11)) and for a noncontextual model (the tradeoff that saturates the inequality of Eq. (17)).
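As a rough numerical companion to this comparison, the sketch below evaluates the Helstrom success rate $\frac{1}{2}(1+\sqrt{1-c_q})$ and the noncontextual bound $1 - c_q/2$ at a few confusabilities. The function names are illustrative, not from the paper.

```python
import math

def s_quantum(c):
    """Helstrom success rate for two equiprobable pure states with confusability c."""
    return 0.5 * (1.0 + math.sqrt(1.0 - c))

def s_noncontextual(c):
    """Largest success rate compatible with the noncontextual tradeoff s <= 1 - c/2."""
    return 1.0 - c / 2.0

def gap(c):
    """Quantum advantage over the noncontextual bound at confusability c."""
    return s_quantum(c) - s_noncontextual(c)

for c in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"c = {c:.2f}  quantum = {s_quantum(c):.4f}  "
          f"noncontextual = {s_noncontextual(c):.4f}  gap = {gap(c):.4f}")
```

The gap equals $\frac{1}{2}(\sqrt{1-c_q} - (1-c_q))$, so it vanishes only for orthogonal ($c_q = 0$) or identical ($c_q = 1$) states and is positive in between.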

A. Intuitive proof of the noncontextual tradeoff
We now introduce some basic facts from classical probability theory, which we then leverage to prove Eq. (17).
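Suppose that a classical variable $\lambda$ has been sampled from one of two overlapping probability distributions, $p(\lambda|a)$ and $p(\lambda|b)$. Absent additional information, it is straightforward to see that in trying to guess which of the two distributions a given $\lambda$ was drawn from, one cannot do better than guessing 'distribution a' for the values of $\lambda$ for which $p(a|\lambda) > p(b|\lambda)$, and guessing 'distribution b' when the opposite is true. (Of course, it is irrelevant what one guesses for the values of $\lambda$ for which $p(a|\lambda) = p(b|\lambda)$.) In the special case we are considering, with equal prior probability $p(a) = p(b) = \frac{1}{2}$ for the two options, if we perform a Bayesian inversion, we find $p(\lambda|a) > p(\lambda|b)$ if and only if $p(a|\lambda) > p(b|\lambda)$, and hence one should guess 'distribution a' for the values of $\lambda$ for which $p(\lambda|a) > p(\lambda|b)$, and guess 'distribution b' when the opposite is true.
The probability that the guess $g \in \{a, b\}$ was correct given a particular value of $\lambda$ is simply $p(g|\lambda)$. Since we always guess the distribution $a$ or $b$ that has the higher likelihood of being correct, the probability that we are right in each run is simply $\max\{p(a|\lambda), p(b|\lambda)\}$. On average, then, the success probability $r$ is
$$r = \int_\Lambda d\lambda\, p(\lambda) \max\{p(a|\lambda), p(b|\lambda)\} \tag{18}$$
$$\ \ = \int_\Lambda d\lambda\, p(\lambda) \big(1 - \min\{p(a|\lambda), p(b|\lambda)\}\big) \tag{19}$$
$$\ \ = 1 - \int_\Lambda d\lambda\, p(\lambda) \min\{p(a|\lambda), p(b|\lambda)\} \tag{20}$$
$$\ \ = 1 - \int_\Lambda d\lambda\, \min\{p(\lambda|a)p(a), p(\lambda|b)p(b)\} \tag{21}$$
$$\ \ = 1 - \frac{1}{2}\int_\Lambda d\lambda\, \min\{p(\lambda|a), p(\lambda|b)\}, \tag{22}$$
where the equality on line (19) uses the fact that $p(a|\lambda) + p(b|\lambda) = 1$ for all $\lambda$. The quantity $\int_\Lambda d\lambda\, \min\{p(\lambda|a), p(\lambda|b)\}$ is termed the classical overlap of the probability distributions $p(\lambda|a)$ and $p(\lambda|b)$.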
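The relation between the optimal guessing probability and the classical overlap (success equals one minus half the overlap, for equal priors) can be checked by brute force on a small discrete example. In the sketch below, the two distributions over four values of $\lambda$ are arbitrary illustrative numbers, and the search runs over all deterministic guessing strategies.

```python
import itertools
import numpy as np

# Two overlapping distributions over four values of lambda (illustrative numbers).
p_lam_given_a = np.array([0.5, 0.3, 0.2, 0.0])
p_lam_given_b = np.array([0.0, 0.2, 0.3, 0.5])
prior = 0.5  # equal prior probability for a and b

def success(strategy):
    """Average success probability when strategy[i] in {'a', 'b'} is the guess for lambda = i."""
    return sum(prior * (p_lam_given_a[i] if g == "a" else p_lam_given_b[i])
               for i, g in enumerate(strategy))

# Exhaustive search over all 2^4 deterministic guessing strategies.
best = max(success(s) for s in itertools.product("ab", repeat=4))

# Classical overlap of the two distributions.
overlap = float(np.minimum(p_lam_given_a, p_lam_given_b).sum())

print("best success probability:", best)
print("1 - overlap / 2:         ", 1 - overlap / 2)
```

Randomized strategies are convex mixtures of deterministic ones, so they cannot exceed the best deterministic value found here.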
In an MESD scenario, the task is to guess, in each particular run of the experiment, whether a system was prepared by state-preparation $|\phi\rangle$ or by state-preparation $|\psi\rangle$. If the experiment is described by an ontological model, then this task corresponds to guessing, from a single sample of the ontic state $\lambda$ of the system, whether it was sampled from the distribution $\mu_\phi(\lambda)$ or from $\mu_\psi(\lambda)$. Given that we do not assume any operational equivalence relations among the measurements in the experiment, the assumption of measurement noncontextuality does not place any constraints on the ontological representation of the measurements. Therefore, in particular, the Helstrom measurement is at best represented in the ontological model by the set of response functions that yield the maximum probability of guessing which distribution the ontic state $\lambda$ was sampled from. From our discussion concerning two overlapping classical probability distributions, it is clear that this corresponds to a measurement that returns the $g_\phi$ outcome whenever $\mu_\phi(\lambda) > \mu_\psi(\lambda)$ and the $g_\psi$ outcome whenever $\mu_\phi(\lambda) < \mu_\psi(\lambda)$, and that the probability of guessing correctly based on the outcome of the Helstrom measurement is upper bounded as follows:³
$$s_q \le 1 - \frac{1}{2}\int_\Lambda d\lambda\, \min\{\mu_\phi(\lambda), \mu_\psi(\lambda)\}. \tag{23}$$
We will now show that in a noncontextual model,
$$\int_\Lambda d\lambda\, \min\{\mu_\phi(\lambda), \mu_\psi(\lambda)\} = c_q, \tag{24}$$
so that substituting Eq. (24) into Eq. (23), we infer that $s_q \le 1 - \frac{c_q}{2}$, the noncontextual bound on the trade-off between $s_q$ and $c_q$ described in Eq. (17).
Firstly, in any preparation noncontextual model the response function $\xi_i(\lambda)$ for a projector onto pure state $|i\rangle$ satisfies
$$\xi_i(\lambda) = \begin{cases} 1 & \text{if } \lambda \in \mathrm{supp}[\mu_i(\lambda)], \\ 0 & \text{otherwise}. \end{cases} \tag{25}$$
This outcome determinism for sharp measurements was first proven in Ref. [4]. It can be seen by considering the projector as part of some projective measurement $M$ with effects $\{E_i = |i\rangle\langle i|\}$, and the corresponding basis of pure states $\{\rho_i = |i\rangle\langle i|\}$, so that $\mathrm{Tr}[E_i \rho_j] = \delta_{i,j}$. Denoting the epistemic state of $\rho_j$ as $\mu_j(\lambda)$ and the response function for $E_i$ as $\xi_{i|M}(\lambda)$, this implies that $\int_\Lambda d\lambda\, \mu_j(\lambda)\xi_{i|M}(\lambda) = \delta_{i,j}$. Because $\mu_j(\lambda)$ is a normalized probability distribution, this implies that, for any ontological model,
$$\xi_{i|M}(\lambda) = \begin{cases} 1 & \text{if } \lambda \in \mathrm{supp}[\mu_i(\lambda)], \\ 0 & \text{if } \lambda \in \mathrm{supp}[\mu_j(\lambda)] \text{ for some } j \neq i. \end{cases} \tag{26}$$
Eq. (26) is not equivalent to Eq. (25), since there may exist ontic states that are not in the support of any of the $\mu_i(\lambda)$, and Eq. (26) does not constrain such ontic states in any way. In a preparation noncontextual model, however, we can furthermore show that there are no ontic states outside of the union of the supports of the set of basis states, $\cup_i \mathrm{supp}[\mu_i(\lambda)]$, as follows. Every density operator $\rho$ appears in some decomposition of the maximally mixed state $\frac{1}{d}\mathbb{1}$. By preparation noncontextuality, every such decomposition has the same distribution $\mu_{\mathbb{1}/d}(\lambda)$ over ontic states. Thus, every ontic state in the support of the corresponding $\mu_\rho(\lambda)$ also appears in the support of $\mu_{\mathbb{1}/d}(\lambda)$. Since the basis states $\{\rho_i\}$ themselves form such a decomposition, the support of $\mu_{\mathbb{1}/d}(\lambda)$ is $\cup_i \mathrm{supp}[\mu_i(\lambda)]$, and by Eq. (26) the supports of distinct $\mu_i(\lambda)$ are disjoint. Thus every ontic state $\lambda$ must be in the support of exactly one of the $\rho_i$, and Eq. (26) can be strengthened to Eq. (25).
Recalling the expression for the confusability of quantum states $|\phi\rangle$ and $|\psi\rangle$ in an ontological model, $c_q = \int_\Lambda d\lambda\, \xi_{\phi|B_\phi}(\lambda)\mu_\psi(\lambda)$, Eq. (25) implies that for a preparation noncontextual model:
$$c_q = \int_{\mathrm{supp}[\mu_\phi(\lambda)]} d\lambda\, \mu_\psi(\lambda). \tag{27}$$
By virtue of the symmetry of the problem, the analogous expression with the roles of $\phi$ and $\psi$ reversed also holds.
The fact that the expression for the ideal confusability $c_q = |\langle\phi|\psi\rangle|^2$ of $\phi$ and $\psi$ in a preparation-noncontextual model is given by Eq. (27) was noted by Leifer and Maroney [24].
3 A mathematically equivalent version of this upper bound was previously proven under different assumptions in Refs. [25,26]. The former article considered the assumption that this inequality is saturated as a constraint on ontological models, which they termed "maximal ψ-epistemicity". (Note that this constraint is different from the constraint considered in Ref. [24] even though it has the same name.)
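The second implication of preparation noncontextuality which we require to prove Eq. (24) is that for each of the four quantum states $\Psi \in \{\phi, \psi, \bar\phi, \bar\psi\}$,
$$\mu_\Psi(\lambda) = 2\mu_{\mathbb{1}/2}(\lambda) \quad \forall \lambda \in \mathrm{supp}[\mu_\Psi(\lambda)],$$
where $\mu_{\mathbb{1}/2}(\lambda)$ is the distribution associated with the maximally mixed state $\frac{1}{2}\mathbb{1}$. This was also first proven in Ref. [4], and follows immediately from preparation noncontextuality,
$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{2}\mu_\phi(\lambda) + \tfrac{1}{2}\mu_{\bar\phi}(\lambda) = \tfrac{1}{2}\mu_\psi(\lambda) + \tfrac{1}{2}\mu_{\bar\psi}(\lambda),$$
and the fact that an ontic state can be in the support of at most one state from a set of orthogonal states; that is, $\mu_\phi(\lambda)\mu_{\bar\phi}(\lambda) = 0$ and $\mu_\psi(\lambda)\mu_{\bar\psi}(\lambda) = 0$.
It follows that $\min\{\mu_\phi(\lambda), \mu_\psi(\lambda)\}$ is equal to $2\mu_{\mathbb{1}/2}(\lambda)$ on $\mathrm{supp}[\mu_\phi(\lambda)] \cap \mathrm{supp}[\mu_\psi(\lambda)]$, and is equal to 0 everywhere else, and consequently
$$\int_\Lambda d\lambda\, \min\{\mu_\phi(\lambda), \mu_\psi(\lambda)\} = \int_{\mathrm{supp}[\mu_\phi(\lambda)]} d\lambda\, \mu_\psi(\lambda). \tag{28}$$
Finally, Eq. (27) and Eq. (28) together imply Eq. (24), which is what we sought to prove.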
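To make the argument concrete, the sketch below builds a small preparation noncontextual model and checks the steps of the proof on it. The model is a hypothetical construction for illustration only, not taken from the paper: it has four ontic states, labeled by which of the supports of $\mu_\phi$ or $\mu_{\bar\phi}$ and $\mu_\psi$ or $\mu_{\bar\psi}$ they lie in, and it is parametrized by a confusability $0 < c < 1$.

```python
import numpy as np

def noncontextual_model(c):
    """Illustrative four-ontic-state model (an assumption for this sketch).

    Ontic states: 0 = (phi, psi), 1 = (phi, psi_bar), 2 = (phi_bar, psi), 3 = (phi_bar, psi_bar).
    """
    mu = {
        "phi":     np.array([c, 1 - c, 0.0, 0.0]),
        "psi":     np.array([c, 0.0, 1 - c, 0.0]),
        "phi_bar": np.array([0.0, 0.0, 1 - c, c]),
        "psi_bar": np.array([0.0, 1 - c, 0.0, c]),
    }
    xi = {
        # Outcome-deterministic indicator functions of supp[mu_phi] and supp[mu_psi].
        "phi|B_phi": np.array([1.0, 1.0, 0.0, 0.0]),
        "psi|B_psi": np.array([1.0, 0.0, 1.0, 0.0]),
        # Optimal guess: phi where mu_phi > mu_psi, psi where mu_psi > mu_phi,
        # and a fair coin where the two are equal.
        "g_phi|B_d": np.array([0.5, 1.0, 0.0, 0.5]),
    }
    return mu, xi

c = 0.3  # illustrative confusability
mu, xi = noncontextual_model(c)

# Preparation noncontextuality: both equal mixtures get the same epistemic state.
mix_phi = 0.5 * (mu["phi"] + mu["phi_bar"])
mix_psi = 0.5 * (mu["psi"] + mu["psi_bar"])

overlap = float(np.minimum(mu["phi"], mu["psi"]).sum())  # classical overlap
confusability = float(xi["phi|B_phi"] @ mu["psi"])      # p(phi | B_phi, psi)
success = float(xi["g_phi|B_d"] @ mu["phi"])            # p(g_phi | B_d, phi)

print("mixtures agree:", np.allclose(mix_phi, mix_psi))
print("overlap =", overlap, " confusability =", confusability)
print("success =", success, " bound 1 - c/2 =", 1 - c / 2)
```

In this model the classical overlap and the confusability coincide, and the success rate sits exactly on the noncontextual bound, which is below the Helstrom value for the same confusability.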

B. Graphical summary of the proof
The intuitive proof is best summarized graphically, by contrasting a preparation-contextual ontological model, Fig. 3, with a preparation noncontextual ontological model, Fig. 4. For visual simplicity, we have chosen a continuous, 1-dimensional, bounded ontic state space. We arrange the state space into a circle, so that each point on the circle is a unique ontic state, and epistemic states are represented as probability distributions on the surface of the circle (where the probability density corresponds to the radial height). In each figure, we show the epistemic states for the four preparations and for the two mixed preparations, the classical overlap for two epistemic states, a representative response function, and the confusability generated by that response function. We then show that in the contextual model, the classical overlap and confusability can differ, while in the noncontextual model, they must be identical.
FIG. 4 (caption, continued). Thus, in a preparation-noncontextual model, the classical overlap is given simply by the integral of $2\mu_{\mathbb{1}/2}(\lambda)$ in the region of common support, as shown by the shaded region in (g). Furthermore, preparation noncontextuality implies that the response function $\xi_{\phi|B_\phi}(\lambda)$ is 1 on the support of $\mu_\phi(\lambda)$ and 0 on all other ontic states, as shown in (h). Given this form for the response function, the confusability $c_q = \int_\Lambda d\lambda\, \xi_{\phi|B_\phi}(\lambda)\mu_\psi(\lambda)$ is given by the area of the shaded region in (i). Clearly, the classical overlap and the confusability are identical in a preparation-noncontextual model.

C. Relation to previous work
Leifer and Maroney [24] consider the assumption that Eq. (27) should hold for every possible pair of quantum states $\phi$ and $\psi$ as a constraint on ontological models that is worthy of investigation in its own right. They term ontological models that satisfy this assumption maximally ψ-epistemic. As we noted in Sec. III A (and as demonstrated in their article), this assumption follows from preparation noncontextuality (and hence from universal noncontextuality). However, Leifer and Maroney investigate the consequences of making the assumption of maximal ψ-epistemicity without also assuming other consequences of universal noncontextuality, in particular, without assuming other consequences of preparation noncontextuality.
[FIG. 4 caption fragment: In a noncontextual model of an MESD scenario: (a)-(f) Epistemic states; (g) Classical overlap between $\mu_\phi(\lambda)$ and ...]
They establish their no-go theorem for maximal ψ-epistemicity (and hence for universal noncontextuality) by demonstrating that maximal ψ-epistemicity implies the Kochen-Specker notion of noncontextuality (which is measurement noncontextuality together with the assumption of outcome determinism for sharp measurements), and then relying on the fact that quantum theory does not admit of a Kochen-Specker noncontextual model (the Kochen-Specker theorem).
Both our article and theirs explore senses in which a pair of quantum states may be said to be "indistinguishable", and to what extent some operational counterpart of this indistinguishability can be explained in an ontological model satisfying certain properties. But there are key differences. As we've noted, the property of ontological models that we focus on is different: we consider the assumption of universal noncontextuality rather than just maximal ψ-epistemicity.⁴ The more important difference between our work and that of Leifer and Maroney, however, is in how we operationalize the notion of indistinguishability.
To explain the difference, it is useful to highlight two distinct facts about a pair of nonorthogonal pure quantum states (i.e., a pair $|\psi\rangle$ and $|\phi\rangle$ for which $|\langle\psi|\phi\rangle|^2 > 0$): (i) they are not perfectly discriminable, which is to say that there is no quantum measurement that achieves zero error in the discrimination task, formalized as $s_q < 1$, and (ii) they are confusable, which is to say that the ideal quantum measurement that tests for being in the state $|\phi\rangle$ has a nonzero probability of being passed by the state $|\psi\rangle$, and similarly for $|\phi\rangle$ and $|\psi\rangle$ interchanged, formalized as $c_q > 0$.
The determination of the maximum probability of discrimination for a given confusability, that is, the optimal tradeoff relation that holds between $s_q$ and $c_q$, is one of the central results in the field of quantum state estimation. Our work seeks to determine constraints on this tradeoff relation from assumptions about the ontological model.
Leifer and Maroney, by contrast, do not consider this tradeoff relation, nor the expression for the discriminability of quantum states. Rather, they address (and answer in the negative) the question of whether the degree of confusability of nonorthogonal pure quantum states can be given a particular expression in the ontological model, namely, that of Eq. (27), which asserts that the test associated to the state $|\phi\rangle$ is a test for whether the ontic state $\lambda$ is inside the ontic support of the distribution representing $|\phi\rangle$.⁵ While the expression for the confusability of two quantum states is a feature of their indistinguishability, it is not one that has previously been of interest in the field of quantum state estimation.
Thus, whereas Leifer and Maroney show the impossibility of a particular ontological expression for the confusability from a known no-go result for Kochen-Specker noncontextuality (the Kochen-Specker theorem), we begin with the native phenomenology of minimum-error state discrimination (the quantum tradeoff between $s_q$ and $c_q$), and we derive a novel no-go result for universal noncontextuality from it.
The form of the tradeoff relation between discriminability and confusability has relevance for quantum information processing tasks that make use of minimum error state discrimination. For instance, it is used in Ref. [28] to derive the tradeoff relation between concealment and bindingness in quantum bit commitment protocols [29,30], and such protocols can be used as subroutines in protocols for other tasks, such as strong coin flipping [28,31]. It has also been used in the analysis of quantum protocols for the task of oblivious transfer [32]. Our results may be useful, therefore, in determining whether or not the failure of universal noncontextuality is a resource for such tasks.
Note that because MESD for two pure quantum states is a phenomenon occurring in a two-dimensional Hilbert space (the subspace spanned by the two states) while the Kochen-Specker theorem can only be proven in Hilbert spaces of dimension three or greater, there is no possibility of leveraging facts about Kochen-Specker-uncolourable sets to infer anything about which aspects of MESD resist explanation within a universally noncontextual model.⁶
A final crucial advantage of our approach over that of Ref. [24] is that it can be used to derive noncontextuality inequalities that are noise-robust and hence experimentally testable, as we will show in the next section. Noise-robustness is critical if one hopes to leverage contextuality as a resource in real (hence noisy) implementations of information-processing protocols.
4 [...] ψ-epistemicity that is not simultaneously a motivation for universal noncontextuality. Therefore, unlike Ref. [23], we remain unconvinced that the assumption that Eq. (27) holds for every pair of quantum states is interesting in its own right.
5 This is the sort of explanation one obtains in the toy theory model of the single qubit stabilizer subtheory of quantum theory [27] or the Kochen-Specker model of a single qubit [2]. Note that this is not the only way to explain the degree of confusability; the response function for $|\phi\rangle$ might be nontrivial outside the ontic support of the distribution representing $|\phi\rangle$ and even indeterministic in that region, and if so, one can have a nonzero confusability even though $\mu_\phi$ and $\mu_\psi$ have disjoint ontic supports.

IV. DEALING WITH NOISE
It is important to recognize that the inequality of Eq. (17) is not experimentally testable.
To clarify this point, we first draw a distinction between noncontextuality no-go results and noncontextuality inequalities. A noncontextuality no-go result is a proof that no noncontextual model can reproduce certain predictions of quantum theory; as such, a no-go result can contain idealizations (such as perfect correlations) which are justified by quantum theory but which never hold in real experiments. In some cases (as above), a no-go result may derive an inequality on the way to deriving a logical contradiction, but such an inequality may not qualify as a proper noncontextuality inequality. In our usage, a noncontextuality inequality makes no reference to the quantum formalism and must not invoke idealized assumptions in its derivation. We give such an inequality for MESD in Section V.
The distinction between no-go results and robust inequalities has historical precedent.
In his 1964 paper [1], in deriving an inequality that could be shown to be violated by quantum correlations, Bell assumed an experiment wherein certain pairs of measurements had perfectly correlated outcomes. Such perfect correlations hold for ideal quantum states and measurements, but are never observed in nature.
Hence, Bell's 1964 result is a no-go result, with consequences for the interpretation of quantum theory, but the inequality he derives en route to this contradiction does not provide a means of experimentally testing the principle of local causality. In 1969, Clauser, Horne, Shimony, and Holt [33] derived an inequality without assuming these idealizations. Because their inequality makes no reference to perfect correlations or to any other feature of quantum theory, its violation rules out all locally causal ontological models, independently of the validity of quantum theory. Only inequalities of this type are termed "Bell inequalities" in modern usage (so that the inequality in Bell's 1964 paper is not a "Bell inequality").
Similarly, Eq. (17) is not a proper noncontextuality inequality because it relies upon the idealization of perfect correlations between which of the states $|\phi\rangle$ or $|\bar\phi\rangle$ was prepared and which of the outcomes will occur in the measurement of the $B_\phi$ basis (and similarly for $|\psi\rangle$ and $|\bar\psi\rangle$). To get a noncontextuality inequality, we must allow these correlations to be imperfect. Thus, in Table I, the entries that take the values 0 and 1 must instead be presumed to take the values $\epsilon$ and $1-\epsilon$ respectively, such that $\epsilon$ becomes a parameter in our noncontextuality inequality which quantifies the degree of imperfection of the correlations. We then show that quantum mechanics still allows higher success rates for a given confusability than any noncontextual model, even when $\epsilon \neq 0$.
Before proving this, we first rephrase the scenario as a totally operational prepare-and-measure experiment, with no reference to the quantum formalism (despite the suggestive notation below). This is a necessary first step for deriving any proper noncontextuality inequality.

A. Operationalizing MESD
We imagine an experiment involving four preparations $\{P_\phi, P_\psi, P_{\bar\phi}, P_{\bar\psi}\}$ and three binary-outcome measurements, $\{M_\phi, M_\psi, M_d\}$, with outcome sets denoted $\{\phi, \bar\phi\}$, $\{\psi, \bar\psi\}$, and $\{g_\phi, g_\psi\}$, respectively. An arbitrary data table for such an experiment would contain 12 independent parameters, specifying the probability of the first outcome of each measurement when acting on each preparation (the probability of obtaining the second outcome being fixed by normalization).
However, we wish to study the scenario in which preparations $P_\phi$, $P_\psi$, $P_{\bar\phi}$, and $P_{\bar\psi}$ satisfy the following relation: the procedure $P_{\frac12\phi+\frac12\bar\phi}$ defined by sampling from preparations $P_\phi$ and $P_{\bar\phi}$ uniformly at random (and then forgetting which preparation occurred) is indistinguishable from the similarly defined procedure $P_{\frac12\psi+\frac12\bar\psi}$:
$$P_{\frac12\phi+\frac12\bar\phi} \simeq P_{\frac12\psi+\frac12\bar\psi}. \tag{29}$$
This implies that only 3 of the parameters in each row are independent, so only 9 independent parameters remain. Previously the operational equivalence of Eq. (29) was guaranteed by quantum theory (Eq. (12)), but now we wish to justify it experimentally. In order to do so, one must show that the statistics for $P_{\frac12\phi+\frac12\bar\phi}$ and for $P_{\frac12\psi+\frac12\bar\psi}$ are identical for all measurements. Because the statistics of a tomographically complete set of measurements allow one to predict the statistics for all measurements, it suffices to verify this identity for such a tomographically complete set. Accumulating evidence that a given set of measurements is indeed tomographically complete represents the most difficult challenge for an experimental test of noncontextuality (see Refs. [20,34] for a more detailed discussion).
Note that in a realistic experiment, the four preparations that are realized, called the primary preparations, will not satisfy Eq. (29) perfectly. However, this problem can be solved by post-processing these into "secondary preparations" which are chosen to enforce this equivalence [20,35], as discussed in Section VI.
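For this 9-parameter problem, the algorithm we describe in Appendix B gives the full set of necessary and sufficient noncontextuality inequalities, which we list in Appendix D. For now, however, we consider a special case with just three parameters, which captures the essence of minimum error state discrimination. Namely, we assume symmetries that parallel those in the ideal quantum case:
$$s \equiv p(g_\phi|M_d, P_\phi) = p(g_\psi|M_d, P_\psi) = p(g_\psi|M_d, P_{\bar\phi}) = p(g_\phi|M_d, P_{\bar\psi}), \tag{30}$$
$$c \equiv p(\phi|M_\phi, P_\psi) = p(\psi|M_\psi, P_\phi) = p(\bar\phi|M_\phi, P_{\bar\psi}) = p(\bar\psi|M_\psi, P_{\bar\phi}), \tag{31}$$
$$1-\epsilon \equiv p(\psi|M_\psi, P_\psi) = p(\phi|M_\phi, P_\phi) = p(\bar\psi|M_\psi, P_{\bar\psi}) = p(\bar\phi|M_\phi, P_{\bar\phi}). \tag{32}$$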
We have denoted the three free parameters that remain after imposing the symmetries by $s$, $c$, and $1-\epsilon$, paralleling their ideal quantum counterparts, $s_q$, $c_q$, and 1, respectively. Just like the operational equivalence, these symmetries will never hold exactly for the primary procedures, but we can enforce them while choosing secondary procedures, as discussed in Section VI. The notation $P_\phi$, $P_\psi$, $P_{\bar\phi}$, $P_{\bar\psi}$, $M_\phi$, $M_\psi$, and $M_d$ will henceforth be used to denote the secondary procedures, for which the operational equivalence and symmetries are exact.
The resulting data table, Table II, is similar to the ideal scenario of Table I, but contains the noise parameter $\epsilon$ ($1-\epsilon$) in place of the probability 0 (1). Note that for each row, the average of the entries in the $P_\phi$ and $P_{\bar\phi}$ columns is $\frac12$ (and similarly for $P_\psi$ and $P_{\bar\psi}$). Here, this follows from the assumed symmetries, not from the operational equivalence (which specifies that the average of the entries for $P_\phi$ and $P_{\bar\phi}$ is the same as the average of the entries for $P_\psi$ and $P_{\bar\psi}$, but not necessarily $\frac12$); in Table I, the same averaging property is implied by the operational equivalence of each of the two mixtures to the maximally mixed quantum state in Eq. (12) (and redundantly implied by these symmetries).
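The structure of such a data table can be sketched in a few lines of Python. The sketch assumes the layout described in the text (rows $M_\phi$, $M_\psi$, $M_d$; columns $P_\phi$, $P_\psi$, $P_{\bar\phi}$, $P_{\bar\psi}$; entries giving the probability of the first outcome) and fills in the entries implied by the stated symmetries. The function names `table_ii` and `mixture_gap` are illustrative and are not taken from the paper.

```python
def table_ii(s, c, eps):
    """Return p(first outcome | measurement, preparation) for the symmetric scenario.

    Assumed layout: the first outcomes are phi (M_phi), psi (M_psi), g_phi (M_d).
    """
    return {
        "M_phi": {"P_phi": 1 - eps, "P_psi": c, "P_phibar": eps, "P_psibar": 1 - c},
        "M_psi": {"P_phi": c, "P_psi": 1 - eps, "P_phibar": 1 - c, "P_psibar": eps},
        "M_d": {"P_phi": s, "P_psi": 1 - s, "P_phibar": 1 - s, "P_psibar": s},
    }


def mixture_gap(table):
    """Largest difference, over the listed measurements, between the statistics of
    the uniform mixture of (P_phi, P_phibar) and that of (P_psi, P_psibar)."""
    gaps = []
    for row in table.values():
        mix_phi = 0.5 * (row["P_phi"] + row["P_phibar"])
        mix_psi = 0.5 * (row["P_psi"] + row["P_psibar"])
        gaps.append(abs(mix_phi - mix_psi))
    return max(gaps)


table = table_ii(s=0.8, c=0.3, eps=0.05)
# Each mixture averages to 1/2 in every row, so the gap vanishes up to rounding.
print(mixture_gap(table))
```

Agreement on these three measurements alone is necessary but not sufficient for the operational equivalence, which requires agreement on a tomographically complete set.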
Finally, we assume that the measurements and outcomes are labeled in the natural way; e.g., the outcome of $M_\phi$ that is more likely to occur given the preparation $P_\phi$ is $\phi$ rather than $\bar\phi$, etc. Then, the data satisfies the constraint that
$$\epsilon \le c \le 1-\epsilon. \tag{33}$$

V. NONCONTEXTUALITY INEQUALITIES FOR MESD
The operational equivalence relation of Eq. (29) together with the assumption of preparation noncontextuality implies via Eq. (7) that
$$\tfrac12\mu_{P_\phi}(\lambda) + \tfrac12\mu_{P_{\bar\phi}}(\lambda) = \tfrac12\mu_{P_\psi}(\lambda) + \tfrac12\mu_{P_{\bar\psi}}(\lambda) \quad \text{for all } \lambda, \tag{34}$$
where we have again used the fact that convex mixtures of preparations are represented in an ontological model by the corresponding mixture of epistemic states. The fact that the ontological model must reproduce Table II implies constraints analogous to Eqs. (13)-(15) and their kin.
As we prove in Appendix B, the tradeoff between $s$, $c$, and $\epsilon$ in any noncontextual model of our operational scenario must satisfy
$$s \le 1 - \frac{c-\epsilon}{2}. \tag{35}$$
In Appendix C, we show that quantum theory allows a tradeoff of
$$s = \frac12\left(1 + \sqrt{1 - \epsilon + 2\sqrt{\epsilon(1-\epsilon)c(1-c)} + c(2\epsilon-1)}\right). \tag{36}$$
Thus quantum theory predicts a higher state discrimination success rate for any given $c$ and $\epsilon$ than a noncontextual model allows. One easily verifies that Eq. (35) reduces to Eq. (17) in the limit of $\epsilon \to 0$, and that Eq. (36) reduces to Eq. (11) in the same limit.
It is an open question whether Eq. (36) is the optimal tradeoff that quantum theory allows. We conjecture that it is optimal for pairs of states in a 2-dimensional Hilbert space.
The noncontextual and quantum tradeoffs are shown in Fig. 5. The purple surface represents the triples $(s, c, \epsilon)$ saturating the inequality of Eq. (35), while the light blue surface represents the triples $(s, c, \epsilon)$ corresponding to the quantum success rate of Eq. (36).
If an experiment generates data having the form of Table II and satisfying Eq. (29), and it is found to lie above the purple shaded surface, then one has experimental evidence for the failure of noncontextuality. This evidence is independent of the validity of quantum theory, and signals a contextual advantage for state discrimination, even when one's preparations and measurements are imperfect.
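The comparison between the two tradeoffs can be checked numerically. The following is a minimal sketch, assuming the noncontextual bound has the form $s \le 1 - (c-\epsilon)/2$ and the quantum tradeoff has the nested-square-root form that reduces to $\frac12(1+\sqrt{1-c})$ at $\epsilon = 0$ and to $s = 1$ at $c = \epsilon$; the function names are illustrative.

```python
import math


def noncontextual_bound(c, eps):
    """Largest success rate allowed by a noncontextual model (assumed form of Eq. (35))."""
    return 1 - (c - eps) / 2


def quantum_tradeoff(c, eps):
    """Success rate of the quantum realization (assumed form of Eq. (36))."""
    inner = 1 - eps + 2 * math.sqrt(eps * (1 - eps) * c * (1 - c)) + c * (2 * eps - 1)
    return 0.5 * (1 + math.sqrt(max(inner, 0.0)))  # clamp guards against rounding below 0


eps = 0.05
for c in (0.1, 0.3, 0.5, 0.7, 0.9):
    gap = quantum_tradeoff(c, eps) - noncontextual_bound(c, eps)
    print(f"c={c:.1f}  noncontextual={noncontextual_bound(c, eps):.4f}  "
          f"quantum={quantum_tradeoff(c, eps):.4f}  gap={gap:.4f}")
```

For every value of $c$ strictly between $\epsilon$ and $1-\epsilon$ in this grid the gap is positive, while at $c = \epsilon$ both expressions equal 1.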

A. Understanding the quantum and noncontextual bounds
For both quantum and noncontextual models, we adopt the natural labeling convention described above Eq. (33), so that all operational data necessarily satisfies $\epsilon \le c \le 1-\epsilon$. In the $c$-$\epsilon$ plane of Fig. 5, these constraints describe a triangular wedge which points into the page.
In the plane with $\epsilon = 0$, Section A provides an intuitive explanation for the tradeoff relation.
In the plane with $\epsilon = c$, we can see that for both quantum and noncontextual models, the preparations can be perfectly distinguishable, $s = 1$. This follows from the fact that the value of $\epsilon$ quantifies the noise in $M_\phi$ and $M_\psi$, and when $c$ is no larger than $\epsilon$ we can attribute all of the confusability to this noise. Explicitly, one can construct a quantum model where preparation $P_\phi$ is represented by $|0\rangle\langle 0|$ and $P_\psi$ is represented by $|1\rangle\langle 1|$ and where effect $E_{\phi|M_\phi}$ is represented by $(1-\epsilon)|0\rangle\langle 0| + \epsilon|1\rangle\langle 1|$ and $E_{\psi|M_\psi}$ is represented by $\epsilon|0\rangle\langle 0| + (1-\epsilon)|1\rangle\langle 1|$, which implies that $c = \epsilon$, while $s = 1$ for the Helstrom measurement $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$. Furthermore, since these states and effects are all diagonal in the same basis, we can take the eigenvalues of these to define the conditional probabilities of a noncontextual model which achieves $c = \epsilon$ and $s = 1$. Whenever $c > \epsilon$, however, the noise in $M_\phi$ and $M_\psi$ cannot explain all of the confusability, and therefore some of this confusability must be explained by the lack of perfect distinguishability of the preparations; that is, in a quantum model, the preparations must be represented by nonorthogonal states, while in a noncontextual model, they must be represented by overlapping probability distributions. Thus, the maximum value of $s$ falls away from 1 as we move away from the $\epsilon = c$ plane. In a noncontextual model, it falls off linearly, interpolating between its value for $\epsilon = c$ and its value for $\epsilon = 0$. The quantum bound falls off more slowly.
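The diagonal model just described is simple enough to evaluate directly. In the sketch below, each operator is stored as its pair of eigenvalues in the $\{|0\rangle, |1\rangle\}$ basis, so that every Born-rule probability is an ordinary dot product; this is exactly why the same numbers also define a noncontextual model. The helper names are illustrative.

```python
def probability(state, effect):
    """Tr(state * effect) for operators diagonal in a common basis (eigenvalue pairs)."""
    return sum(p * e for p, e in zip(state, effect))


def diagonal_model(eps):
    """Evaluate the diagonal quantum model in the eps = c plane."""
    rho_phi, rho_psi = (1.0, 0.0), (0.0, 1.0)  # |0><0| and |1><1|
    e_phi = (1 - eps, eps)                     # effect for outcome phi of M_phi
    e_psi = (eps, 1 - eps)                     # effect for outcome psi of M_psi
    helstrom_g_phi = (1.0, 0.0)                # Helstrom effect |0><0| for outcome g_phi
    return {
        "one_minus_eps": probability(rho_phi, e_phi),
        "c_from_M_phi": probability(rho_psi, e_phi),
        "c_from_M_psi": probability(rho_phi, e_psi),
        "s": probability(rho_phi, helstrom_g_phi),
    }


print(diagonal_model(0.1))
```

Both ways of computing the confusability return $\epsilon$, and the success rate is 1, as claimed for the $\epsilon = c$ plane.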

B. Robustness to depolarizing noise
We can get a sense for the robustness of our noncontextuality inequalities by considering a specific noise model in quantum theory. Imagine that one's attempts to implement the ideal quantum preparations and measurements are thwarted by a depolarizing channel which has the same noise parameter $v$ for all states and effects. The resulting states and effects are shown in Fig. 6 for some fixed $v$. One can graphically see that this uniform depolarization map generates a new set of states and measurements which satisfy the symmetries and operational equivalence we require. However, if the noise is too large, our noncontextuality inequality will not be violated, as we now show. This noisy model generates a data table of the form of Table II whose entries are fixed by $v$ together with the ideal values $s_q$ and $c_q$, where, as always, $c_q = |\langle\phi|\psi\rangle|^2$.
FIG. 7. The maximum value of the parameter v for the depolarizing noise model that allows a violation of our noncontextuality inequality, as a function of the Bloch sphere angle θ between the two states.
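The noise threshold can be explored with a short numerical sketch. It rests on an assumption about the form of the noise: each state and each effect is mixed with the maximally mixed operator with weight $v$, so that every Bloch vector shrinks by a factor $(1-v)$ and every ideal probability $p_q$ becomes $\frac12 + (1-v)^2(p_q - \frac12)$. The function names, and this specific parametrization of the depolarizing map, are assumptions made for illustration.

```python
import math


def noisy_parameters(theta, v):
    """(s, c, eps) for ideal states at Bloch angle theta under the assumed noise model."""
    r = (1 - v) ** 2  # assumed: state and effect Bloch vectors each shrink by (1 - v)
    s_q = 0.5 * (1 + math.sin(theta / 2))  # ideal Helstrom success rate
    c_q = math.cos(theta / 2) ** 2         # ideal confusability |<phi|psi>|^2
    s = 0.5 + r * (s_q - 0.5)
    c = 0.5 + r * (c_q - 0.5)
    eps = 0.5 * (1 - r)
    return s, c, eps


def violation(theta, v):
    """s minus the noncontextual bound 1 - (c - eps)/2; positive means a violation."""
    s, c, eps = noisy_parameters(theta, v)
    return s - (1 - (c - eps) / 2)


def max_noise(theta):
    """Largest v that still permits a violation under the assumed noise model."""
    return 1 - 1 / math.sqrt(math.cos(theta / 2) ** 2 + math.sin(theta / 2))


theta = math.pi / 3
print(max_noise(theta), violation(theta, 0.05), violation(theta, 0.15))
```

Under these assumptions the violation is positive exactly when $(1-v)^2\left[\cos^2(\theta/2) + \sin(\theta/2)\right] > 1$, and the tolerable noise is largest at $\sin(\theta/2) = \frac12$.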

VI. ENFORCING SYMMETRIES AND OPERATIONAL EQUIVALENCES
In Section IV A, we predicated our noncontextuality inequalities on the exact operational equivalence of Eq. (29) and the exact operational symmetries of Eqs. (30)-(32), yet we claimed that these idealizations can in fact be realized in realistic, noisy experiments. Of course, no experimental data will directly satisfy either of these requirements; rather, one performs a post-processing of the data, as originally outlined in [20].
For pedagogical clarity, we will discuss this data processing under the assumption that the operational theory is quantum theory. Note, however, that our comments can easily be generalized to the framework of generalized probabilistic theories (defined in Refs. [36,37]), as demonstrated in Refs. [20] and [34]. Indeed, the analysis must be performed in this framework if one hopes to directly test the hypothesis of noncontextuality against one's experimental data (i.e., without assuming the validity of quantum theory).
For any set P of noisy preparations that has been performed experimentally, one can simulate perfectly the statistics of all other preparations in the convex hull of P, viewed as points in the quantum state space (here, simply a plane of the Bloch sphere). Similarly, for any set E of noisy measurement effects, one can perfectly simulate the statistics of all other effects in the convex hull of E, viewed as points in the space of valid quantum effects. In [20], this fact was leveraged to simulate exact operational equivalences for a set of "secondary preparations" from data on a set of "primary preparations" that failed to satisfy the operational equivalences exactly. Here, we leverage this trick to simulate preparations and measurements which simultaneously satisfy our operational equivalence as well as the symmetries. We now argue that this can always be done, although if the primary preparations or measurements are too noisy, the resulting simulated data will not violate our inequalities.
As we showed explicitly in Section V B, even a partially depolarized set of states and effects can violate our inequality. Hence, one need only realize experimental sets P and E which contain in their convex hull the images of our ideal states and effects under the depolarization map. Then, one can post-process the data obtained from P and E to obtain a physically meaningful set of data which satisfies the operational equivalence and symmetries that we assumed in the main text, and our inequality will still be violated. Geometrically, this simply means that the primary preparations must have a convex hull which contains the image of the ideal states under a depolarizing map with $v \le 1 - \frac{1}{\sqrt{\cos^2(\theta/2) + \sin(\theta/2)}}$, as pictured in Fig. 8 (and similarly for the measurements, also pictured).
In fact, there are other noisy sets of preparations and measurements besides the depolarized versions of the corresponding ideals which satisfy the operational equivalence and symmetries needed for the noncontextuality inequality to apply. A simple example is states and measurements that are depolarized versions of the ideals that are also rotated in the plane by the same angle. By doing such a rotation, one may be able to simulate a set of states and effects with less depolarization, which then leads to larger violations. In general, there are many sets of states and effects that satisfy our operational equivalence and symmetries. Given a set of primary procedures that one has performed and characterized, finding the states and measurements satisfying our constraints which maximize the violation of our inequality is a straightforward linear program [20].
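The idea of simulating a secondary preparation by convexly mixing primary ones can be illustrated with a toy example. The sketch below is not the linear program of Ref. [20]; it assumes just three hypothetical primary preparations, given as Bloch vectors in a plane, and finds the mixing weights that reproduce a chosen target vector. All numerical values and function names are invented for illustration.

```python
def barycentric_weights(target, p1, p2, p3):
    """Weights (w1, w2, w3), summing to 1, with w1*p1 + w2*p2 + w3*p3 == target."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = target, p1, p2, p3
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1 - w1 - w2


def bloch_probability(state, effect):
    """p = (1 + n.m)/2 for a state with Bloch vector n and an effect (I + m.sigma)/2."""
    return 0.5 * (1 + state[0] * effect[0] + state[1] * effect[1])


# Hypothetical noisy primary preparations and a desired secondary preparation.
primaries = [(0.0, 0.9), (0.8, -0.5), (-0.8, -0.5)]
target = (0.0, 0.5)
weights = barycentric_weights(target, *primaries)

# The target is simulable only if it lies in the convex hull: all weights nonnegative.
simulable = all(w >= 0 for w in weights)

# Statistics of the secondary preparation are the same mixture of primary statistics.
effect = (0.6, 0.3)
mixed = sum(w * bloch_probability(p, effect) for w, p in zip(weights, primaries))
print(simulable, mixed, bloch_probability(target, effect))
```

With more than three primaries the weights are no longer unique, which is what leaves room for optimizing the violation over the admissible secondary procedures.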
Leveraging the convex structure of operational theories in order to define secondary laboratory procedures which respect certain theoretical idealizations is a powerful tool which we expect to have broad applicability. To date, this method has been proposed to identify operational procedures which respect exact operational equivalences. What we have just shown is that the method also allows one to enforce natural symmetries which greatly simplify the problem at hand (as evidenced by comparing Eq. (35) to the set of inequalities in Appendix D). Of course, this tool does not allow one to define laboratory procedures which satisfy any desired idealizations; for example, one could never generate a pure state or a sharp measurement effect by convexly mixing the noisy procedures actually performed in the lab. We expect future work to continue expanding the range of practical applicability of the technique of secondary procedures.

VII. ISOMORPHISM BETWEEN MESD AND A BELL SCENARIO
Any noncontextuality scenario that makes no assumptions of measurement noncontextuality, and for which there is a single mixed preparation whose various ensemble decompositions generate all of the operational equivalences of interest, is isomorphic to a related Bell scenario [38]. Both of these conditions hold for our MESD scenario, since we do not consider any operational equivalences among the measurements, and the operational equivalences among the preparations are generated by decompositions of a single mixed preparation (e.g. the maximally mixed state in the ideal case). The operational Bell scenario which relates to our MESD scenario is one with two parties, whom we denote by S and M (for reasons that will become apparent), where S has 2 binary measurements, denoted S 1 and S 2 , and M has 3 binary measurements, denoted M 1 , M 2 , and M 3 . The outcomes (which we denote s i for S i and m j for M j ) take values in the set {−1, +1}.
For such a scenario, the set of constraints which define the local set of correlations is given by positivity inequalities, $p(s_i m_j|S_i M_j) \ge 0$, the normalization condition $\sum_{s_i, m_j} p(s_i m_j|S_i M_j) = 1$, and the CHSH inequalities [33] (applied to any of the 3 possible pairings of 2 measurement settings on S with 2 measurement settings on M) [39]. As we will show, the bound on our MESD success rate follows under our assumed symmetries from the CHSH inequality of Eq. (43), written in terms of the expectation values defined in Eq. (44). The connection between this Bell scenario and our MESD scenario is most easily seen in the ideal quantum realization.
Imagine that the two parties share a maximally entangled state $|\Phi^+\rangle_{SM} = \frac{1}{\sqrt2}\left(|00\rangle_{SM} + |11\rangle_{SM}\right)$ (with $|0\rangle$ and $|1\rangle$ defined so that $|\phi\rangle$ and $|\psi\rangle$ have real coefficients when written in this basis), and imagine that their measurements correspond to the quantum measurements from the main text, as specified in Eq. (45). We take the +1 outcome for each measurement to correspond to the first quantum effect for that measurement. This ideal quantum realization of this Bell scenario is conceptually transformed into our ideal quantum realization of the MESD scenario by viewing a measurement by party S to be a remote preparation (via quantum steering) for party M. For example, outcome +1 for $S_1$ remotely prepares the state $|\phi\rangle_M$ (which is why we have chosen the notation S, for 'source'). Similarly, outcome −1 for measurement $S_2$ prepares the state $|\bar\psi\rangle_M$, and so on.
Thus, one can verify that in the ideal quantum realization, $s_q$ and $c_q$ take the form given in Eq. (46) (in our new notation, and assuming the symmetries in Eqs. (30)-(32)), while the fact that paired preparations and measurements are perfectly correlated in the ideal quantum realization corresponds to Eq. (47). Furthermore, the no-signaling condition in the Bell scenario implies the operational equivalence of our MESD scenario. If party S performs measurement $S_1$, the updated state on M will be either $|\phi\rangle$ or $|\bar\phi\rangle$ with equal likelihood, and if party S performs measurement $S_2$, the updated state on M will be either $|\psi\rangle$ or $|\bar\psi\rangle$ with equal likelihood. In quantum theory, the no-signaling condition implies that the average density operator prepared on M is the same for either choice of measurement by S, which is precisely the operational equivalence of Eq. (12).
Using Eq. (44), we can write Eq. (46) and Eq. (47) in terms of expectation values. Rewriting Eq. (43) in terms of $s_q$ and $c_q$ instead of expectation values, one obtains $s_q \le 1 - \frac{c_q}{2}$, recovering Eq. (17), our bound for the success rate in state discrimination. Because both the Bell scenario and our MESD scenario are operationally defined, one can also make the translation without assuming the ideal quantum realizations. In a realistic operational scenario, $\epsilon$ will be nonzero, and the expectation values are modified accordingly. Rewriting Eq. (43) in terms of $s$, $c$, and $\epsilon$ instead of expectation values, one obtains $s \le 1 - \frac{c-\epsilon}{2}$, recovering Eq. (35), our bound for the success rate in state discrimination.
Due to the redundancies induced by our assumed symmetries, Eq. (35) follows also from the CHSH inequality by the same logic. More generally, if we do not assume any symmetries, then there are no redundant inequalities. If we furthermore do not assume the natural labeling constraint (Eq. (33)), then the full polytope of local correlations for this Bell scenario [39] (and described just above Eq. (43)) is isomorphic to the full polytope of noncontextual correlations for our MESD scenario.
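The translation between the two scenarios can be made concrete with a small sketch. It assumes the identification described in the text (outcome +1 of $S_1$ selects $P_\phi$ and −1 selects $P_{\bar\phi}$; likewise $S_2$ selects $P_\psi$ or $P_{\bar\psi}$; each outcome of S occurs with probability $\frac12$), takes $M_1 = M_\phi$ and $M_3 = M_d$ with +1 as the first outcome, and uses one particular sign arrangement of the CHSH expression. That sign arrangement and the function names are assumptions chosen so that the algebra closes; they need not coincide with the paper's Eq. (43).

```python
import math


def correlator(p_plus_if_s_plus, p_plus_if_s_minus):
    """<S M> when each outcome of S occurs with probability 1/2 and selects a preparation.

    Arguments are p(M = +1) for the preparations tied to S = +1 and to S = -1.
    """
    return 0.5 * (2 * p_plus_if_s_plus - 1) - 0.5 * (2 * p_plus_if_s_minus - 1)


def chsh_value(s, c, eps):
    """Assumed CHSH combination <S1 M1> + <S2 M1> + <S1 M3> - <S2 M3>."""
    e_s1_m1 = correlator(1 - eps, eps)  # P_phi, P_phibar measured with M_phi
    e_s2_m1 = correlator(c, 1 - c)      # P_psi, P_psibar measured with M_phi
    e_s1_m3 = correlator(s, 1 - s)      # P_phi, P_phibar measured with M_d
    e_s2_m3 = correlator(1 - s, s)      # P_psi, P_psibar measured with M_d
    return e_s1_m1 + e_s2_m1 + e_s1_m3 - e_s2_m3


# Saturating s = 1 - (c - eps)/2 gives a CHSH value of 2 (the local bound).
print(chsh_value(s=0.875, c=0.3, eps=0.05))
# The ideal quantum point (eps = 0, c = 1/2) exceeds the local bound.
print(chsh_value(s=0.5 * (1 + math.sqrt(0.5)), c=0.5, eps=0.0))
```

With this combination the CHSH value equals $4s + 2c - 2\epsilon - 2$, so the local bound of 2 is equivalent to $s \le 1 - (c-\epsilon)/2$.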

VIII. FUTURE DIRECTIONS
We have identified a quantitative feature of minimum-error state discrimination in quantum theory that fails to admit of a noncontextual model. We have derived noncontextuality inequalities that delimit the tradeoff between success rate, error rate, and confusability in state discrimination, independently of the validity of quantum theory.
Our results show that contextuality is a resource for state discrimination, even in realistic, noisy experiments. This suggests many directions for future research. One important question is how our results translate into advantages for quantum information processing tasks which have MESD as a sub-routine. Because many such tasks (e.g., key distribution) consider consecutive measurements on the system, this research program would require further analysis regarding the consequences of noncontextuality for experiments involving sequential measurements [6,40,41].
It would also be interesting to generalize these results to other types of state discrimination, such as unambiguous state discrimination. Indeed, one can easily derive a relevant no-go theorem. The challenge is to define an operational notion of "unambiguous" given that no measurement yields truly unambiguous knowledge in the presence of noise. Once this challenge is met, it should be straightforward to apply the general algorithm we have introduced in this article in order to derive the noncontextuality inequalities for this scenario. Understanding the relation between noncontextuality and other kinds of state discrimination should translate into new kinds of quantum advantages for information processing tasks.
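Let us define a variable $\kappa$ which runs over the eight extremal points in the cube of ontic states. Then, there exists a $p(\kappa|\lambda)$ such that $\xi_{k|M}(\lambda) = \sum_\kappa \xi_{k|M}(\kappa)\,p(\kappa|\lambda)$. We can thus write any observable probability $p(k|M,P)$ as
$$p(k|M,P) = \int_\Lambda d\lambda\, \xi_{k|M}(\lambda)\,\mu_P(\lambda) = \sum_\kappa \xi_{k|M}(\kappa)\,\mu_P(\kappa),$$
where $\mu_P(\kappa) \equiv \int_\Lambda d\lambda\, p(\kappa|\lambda)\,\mu_P(\lambda)$. This construction lets us write observed probabilities in terms of extremal value assignments by effectively moving uncertainty into the new state distributions $\mu_P(\kappa)$. We sometimes simplify the notation by letting the distributions and response functions be vectors of probabilities indexed by the ontic states $\kappa$; e.g., $p(k|M,P) = \vec\xi_{k|M} \cdot \vec\mu_P$.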
We thus convert an outcome-indeterministic model over a continuum of ontic states (the unit cube) to an outcome-deterministic model over just 8 ontic states (its vertices), without any loss of generality. The vertices $\kappa_1$ to $\kappa_8$ correspond to the deterministic triples
$$\left(\xi_{\phi|B_\phi}(\kappa),\ \xi_{\psi|B_\psi}(\kappa),\ \xi_{g_\phi|B_d}(\kappa)\right) \in \{(0,0,0), (0,0,1), (0,1,0), \dots, (1,1,1)\}.$$
In fact, if we assume $B_d$ is optimal, the fourth and fifth of these value assignments will never occur. Consider for example the triple (1, 0, 0) (which occurs for $\kappa_5$). Since $\xi_{\phi|B_\phi}(\kappa_5) = 1$, the state cannot have been $\bar\phi$. Since $\xi_{\psi|B_\psi}(\kappa_5) = 0$, the state cannot have been $\psi$. Thus, we know the state must have been $\phi$ or $\bar\psi$; in either case, the winning strategy is for $B_d$ to return the outcome $g_\phi$. Therefore the winning strategy has $\xi_{g_\phi|B_d}(\kappa_5) = 1$, and thus the triple (1, 0, 0) never occurs in the winning strategy. Similar logic applies to the triple (0, 1, 1), and hence we need not consider these two assignments (see footnote 7 below). The remaining value assignments are
$$\{(0,0,0),\ (0,0,1),\ (0,1,0),\ (1,0,1),\ (1,1,0),\ (1,1,1)\}.$$
Thus six ontic states are sufficient for describing our experiment: one for each remaining deterministic assignment. It follows that the vectors representing each of the three response functions are:
$$\vec\xi_{\phi|B_\phi} = (0,0,0,1,1,1), \quad \vec\xi_{\psi|B_\psi} = (0,0,1,0,1,1), \quad \vec\xi_{g_\phi|B_d} = (0,1,0,1,0,1). \tag{A7}$$
We can constrain the most general form of the epistemic states using the perfect predictability of measurements $B_\phi$ and $B_\psi$ on their corresponding states. Namely, recalling Eq. (25) and the form of the response functions, $\xi_{\phi|B_\phi}(\kappa)$, $\xi_{\psi|B_\psi}(\kappa)$, $\xi_{\bar\psi|B_\psi}(\kappa) \equiv 1 - \xi_{\psi|B_\psi}(\kappa)$, and $\xi_{\bar\phi|B_\phi}(\kappa) \equiv 1 - \xi_{\phi|B_\phi}(\kappa)$, we can see that our epistemic states must have the form
$$\vec\mu_\phi = (0,0,0,a_4,a_5,a_6), \quad \vec\mu_{\bar\phi} = (a_1,a_2,a_3,0,0,0), \quad \vec\mu_\psi = (0,0,b_3,0,b_5,b_6), \quad \vec\mu_{\bar\psi} = (b_1,b_2,0,b_4,0,0), \tag{A8}$$
where normalization requires that $a_4 + a_5 + a_6 = 1$, and so on.
Footnote 7: These assumptions for $B_d$ ensure that the relationship we derive between $s_q$ and $c_q$ will saturate the bound on $s_q$ implied by any noncontextual model. If we had not used this argument, we would obtain the same relationship, but only as a bound on $s_q$, not as the saturating equality. This is easily verified explicitly, e.g. by taking $\epsilon = 0$ in Appendix B below. However, including two more ontic states requires considerably more algebra.
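The bookkeeping of this construction is easy to reproduce in code. The sketch below enumerates the cube vertices in the order used in the text, removes the fourth and fifth, and reads off the response-function vectors. As a consistency check it also evaluates a one-parameter family of epistemic states, with a parameter `t` introduced here for illustration, that respects the support constraints and preparation noncontextuality; the relation it exhibits, $s_q = 1 - c_q/2$, is the one derived in the remainder of this appendix.

```python
from itertools import product

# Cube vertices ordered as kappa_1 = (0,0,0), ..., kappa_8 = (1,1,1);
# each triple is (xi_phi, xi_psi, xi_gphi).
vertices = list(product((0, 1), repeat=3))

# For an optimal discriminating measurement the fourth and fifth vertices never occur.
excluded = {vertices[3], vertices[4]}  # (0, 1, 1) and (1, 0, 0)
kept = [v for v in vertices if v not in excluded]

xi_phi = [v[0] for v in kept]
xi_psi = [v[1] for v in kept]
xi_gphi = [v[2] for v in kept]


def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


def epistemic_states(t):
    """Hypothetical family (0 <= t <= 1/2) obeying the supports and noncontextuality."""
    return {
        "phi": [0, 0, 0, 1 - 2 * t, t, t],
        "phibar": [t, t, 1 - 2 * t, 0, 0, 0],
        "psi": [0, 0, 1 - 2 * t, 0, t, t],
        "psibar": [t, t, 0, 1 - 2 * t, 0, 0],
    }


mu = epistemic_states(0.2)
c_q = dot(xi_phi, mu["psi"])   # confusability
s_q = dot(xi_gphi, mu["phi"])  # success rate
print(kept, c_q, s_q)
```

Here the uniform mixture of the `phi` and `phibar` vectors coincides with that of the `psi` and `psibar` vectors, which is the preparation noncontextuality condition in this discrete setting.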
The definitions of $c_q$ and $s_q$ in Eqs. (8) and (10), translated into our ontological model, become dot products between response-function vectors and epistemic-state vectors (Eq. (A10)). Taking these dot products using the vectors in Eq. (A7) and Eq. (A8) gives $c_q$ and $s_q$ as sums of the parameters $a_i$ and $b_i$. Because the epistemic states must be normalized, it follows that $b_5 + b_6 = 1 - b_3$, $a_5 + a_6 = 1 - a_4$, $a_4 + a_6 = 1 - a_5$, and $b_2 + b_4 = 1 - b_1$. Substituting these four expressions, we obtain
$$c_q = 1 - b_3 = 1 - a_4 = 1 - b_4 = 1 - a_3$$
and
$$s_q = 1 - a_5 = 1 - b_6 = 1 - a_2 = 1 - b_1,$$
and hence $b_3 = a_4 = b_4 = a_3$ and $a_5 = b_6 = a_2 = b_1$. Let us take $s_q = 1 - a_2$ and $c_q = 1 - a_3$. If there were no more constraints, then $a_2$ and $a_3$ could range from 0 to 1 independently, and $s_q$ and $c_q$ could take any values from 0 to 1. By imposing preparation noncontextuality, however, we have
$$\tfrac12\vec\mu_\phi + \tfrac12\vec\mu_{\bar\phi} = \tfrac12\vec\mu_\psi + \tfrac12\vec\mu_{\bar\psi}.$$
This implies $b_i = a_i$ for all $i$. Since $a_1 + a_2 + a_3 = 1$ from normalization, $a_1 = b_1$ from preparation noncontextuality, and $b_1 = a_2$ as derived above, we also have $2a_2 + a_3 = 1$ and hence $c_q = 2a_2$. Finally, writing $s_q$ in terms of $c_q$ yields
$$s_q = 1 - \frac{c_q}{2}.$$

Appendix B: Proof of noncontextuality inequality for MESD
Herein we prove our noncontextuality inequality, Eq. (35); that is, we prove that
$$s \le 1 - \frac{c-\epsilon}{2} \tag{B1}$$
must be satisfied for any $s$, $c$, and $\epsilon$ arising in a noncontextual model that reproduces data in Table II and respects Eq. (29).
First, we use the arguments of Appendix A to write down an ontological model with 8 ontic states and purely outcome-deterministic response functions. Second, we parametrize the set of possible epistemic states for this second model in accordance with preparation noncontextuality. Third, we calculate expressions for $s$, $c$, and $\epsilon$ in terms of these response functions and epistemic states.
These manipulations reduce the problem to a small set of linear equalities and inequalities over unobserved and observed variables. Finally, we eliminate the unobserved variables to obtain inequalities concerning only the observed variables $s$, $c$, and $\epsilon$.
Exactly as before, we can convert a general, outcome-indeterministic model over a continuum of ontic states (the unit cube) to an outcome-deterministic model over just 8 ontic states (its vertices), without any loss of generality. (As before, this is simply a mathematical construction, and in no way commits us to a fundamental principle of outcome-determinism.) The vertices of the unit cube, $\kappa_1$ to $\kappa_8$, again correspond to the deterministic triples
$$\left(\xi_{\phi|M_\phi}(\kappa),\ \xi_{\psi|M_\psi}(\kappa),\ \xi_{g_\phi|M_d}(\kappa)\right) \in \{(0,0,0), (0,0,1), (0,1,0), \dots, (1,1,1)\}, \tag{B2}$$
and the three response functions are again
$$\vec\xi_{\phi|M_\phi} = (0,0,0,0,1,1,1,1), \quad \vec\xi_{\psi|M_\psi} = (0,0,1,1,0,0,1,1), \quad \vec\xi_{g_\phi|M_d} = (0,1,0,1,0,1,0,1). \tag{B3}$$
(In a more general situation in which measurement noncontextuality is also leveraged, there will be linear constraints on this set of response functions, and the extremal response functions will no longer all be outcome-deterministic. In this case, one can still explicitly enumerate the finite set of extremal response functions by taking the intersection of the linear constraints with the above cube of value assignments. These extremal points modify the specific form of Eq. (B3), and our methods would proceed largely unchanged.)
Each preparation generates a probability distribution over $\kappa$, so we can write the epistemic states as four 8-component vectors with entries $a_i$, $b_i$, $c_i$, and $d_i$, one vector per preparation (Eq. (B4)), where the parameters in each vector are nonnegative and sum to 1.
Dot products between a vector in Eq. (B3) and a vector in Eq. (B4) can produce any set of observable statistics, and thus constitute a general ontological model for our measurements and preparations. The values of $(s, c, \epsilon)$ that we can observe in a noncontextual model with our assumed symmetries, however, are restricted by the above constraints, all of which we repeat here for convenience.
Eqs. (2) and (3) imply that, for all four preparations, the components of the epistemic-state vector are nonnegative and sum to 1 (Eq. (B5)). Eq. (29) gives the equality of the two uniform mixtures of epistemic states (Eq. (B6)). Eqs. (30)-(32) express $s$, $c$, and $1-\epsilon$, respectively, as dot products between the response functions of Eq. (B3) and the epistemic states of Eq. (B4), and Eq. (33) gives the labeling constraints. Eqs. (B5)-(B11) define a set of constraints over the variables $s$, $c$, $\epsilon$, $a_i$, $b_i$, $c_i$, and $d_i$ (where $i \in \{1, 2, \dots, 8\}$). Although the parameters $a_i$, $b_i$, $c_i$, $d_i$ in our epistemic states are not observable, constraints upon them (Eqs. (B5) and (B6)) have consequences for the set of possible triples $(s, c, \epsilon)$. Finding the set of inequalities over only $(s, c, \epsilon)$ that is implied by the full set of linear equalities and inequalities of Eqs. (B5)-(B11) is algebraically tedious by hand, but straightforward using the well-known Fourier-Motzkin elimination algorithm, which returns our result
$$s \le 1 - \frac{c-\epsilon}{2}.$$
It is worth noting that the technique for deriving noncontextuality inequalities we have introduced here, insofar as it reduces to a convex hull problem, is an instance of the problem of quantifier elimination. Recent work in quantum foundations has seen increasing use of quantifier elimination algorithms, in noncontextuality [42,43] as well as other scenarios. Fourier-Motzkin elimination, which is appropriate for problems wherein the dependence on the variables to be eliminated is linear, has been used to derive Bell inequalities [44], and also recently, to derive Bell-like inequalities for novel causal scenarios [45-47].
In Ref. [47], the problem is reduced to what is known as the classical marginals problem (that of determining whether a given set of distributions on various subsets of a set of variables can arise as the marginals of a single joint distribution over all of the variables), which can be solved by performing quantifier elimination on the probabilities in the joint distributions using convex hull algorithms. Nonlinear quantifier elimination using cylindrical algebraic decomposition has also found application in deriving Bell-like inequalities in simple scenarios [46,48]. We anticipate that these more general techniques for quantifier elimination will ultimately also find applications to the derivation of noncontextuality inequalities.
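For readers unfamiliar with Fourier-Motzkin elimination, the following is a minimal sketch of a single elimination step, applied to a toy system rather than to the full set of constraints of this appendix. The toy system has one hidden parameter `t` and is loosely modelled on the $\epsilon = 0$ case; it and the function names are illustrative assumptions. The sketch performs no removal of redundant inequalities, which a practical implementation would need in order to handle the 32 hidden parameters of the actual problem.

```python
from fractions import Fraction


def eliminate(rows, j):
    """One Fourier-Motzkin step: project variable j out of rows (a, b) meaning a.x <= b."""
    positive = [r for r in rows if r[0][j] > 0]
    negative = [r for r in rows if r[0][j] < 0]
    projected = [r for r in rows if r[0][j] == 0]
    for a_pos, b_pos in positive:
        for a_neg, b_neg in negative:
            # Positive multipliers chosen so that the coefficient of x_j cancels.
            m_pos, m_neg = -a_neg[j], a_pos[j]
            a = [m_pos * u + m_neg * w for u, w in zip(a_pos, a_neg)]
            projected.append((a, m_pos * b_pos + m_neg * b_neg))
    return projected


def as_rows(raw):
    """Convert integer or Fraction data to exact Fraction rows."""
    return [([Fraction(x) for x in a], Fraction(b)) for a, b in raw]


# Toy system over x = (s, c, t), with t playing the role of a hidden parameter.
rows = as_rows([
    ([1, 0, 1], 1),               # s + t <= 1
    ([0, 1, -2], 0),              # c - 2t <= 0
    ([0, 0, 1], Fraction(1, 2)),  # t <= 1/2
    ([0, 0, -1], 0),              # -t <= 0
])

result = eliminate(rows, 2)
for a, b in result:
    print([str(x) for x in a], "<=", b)
```

Among the projected inequalities is $2s + c \le 2$, that is, $s \le 1 - c/2$, together with the trivial constraints $s \le 1$ and $c \le 1$.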

Appendix C: Noisy quantum realization which violates our noncontextuality inequality
We now sketch a quantum realization of the MESD scenario for any given values of $c$ and $\epsilon$ satisfying the assumed symmetries and operational equivalences and violating our noncontextuality inequality for all values of $c$ and $\epsilon$. (The ideal quantum realization of the MESD scenario, given earlier, was defined only for $\epsilon = 0$.)
There is no general technique for finding the set of all data tables achievable in quantum theory for some prepare-and-measure scenario. For some cases (e.g., Bell tests), this set can be approximated efficiently via the Navascues-Pironio-Acin hierarchy [49]. For situations with multiple preparations or additional constraints, no such method exists yet.
However, we can apply our understanding from Section V A to construct a quantum model which recovers Eq. (36), which we conjecture is optimal for qubits. Namely, because we want to find the maximum value of $s$ consistent with a given $c$ and $\epsilon$, we should attribute as much of the confusability as possible to noise in the $M_\phi$ and $M_\psi$ measurements, and only attribute the remainder of the confusability to nonorthogonality of the states. As such, in this section we allow the effects $E_\phi$, $E_\psi$, $E_{\bar\phi}$, and $E_{\bar\psi}$ to be noisy POVM elements (unlike in Appendix A, where $E_\phi$ denoted a projector onto $|\phi\rangle$, and so on).
Imagine $P_\phi$ prepares state $|0\rangle$ on the Bloch sphere and $P_\psi$ prepares a pure state $|\theta\rangle$ rotated by $\theta \in [0, \pi]$ with respect to $|0\rangle$ in the $X$-$Z$ plane. We will specify the value of $\theta$ later. Within this plane, the effect $E_\phi$ must lie on the green line shown in Fig. 9, since only these effects imply $\langle 0|E_\phi|0\rangle = 1-\epsilon$. The choice of $E_\phi$ that yields the maximum confusability is the one on the green line, closest to $|\theta\rangle$ (but not closer to $|\theta\rangle$ than to $|0\rangle$, since that would imply that $c \ge 1-\epsilon$). The remainder of the confusability must then be attributed to the nonzero inner product between the two pure states, so $\theta$ is fixed by $\langle\theta|E_\phi|\theta\rangle = c$. Now that the two states are specified, calculating the optimal (Helstrom) probability is a simple quantum calculation whose result gives Eq. (36), that is,
$$s = \frac12\left(1 + \sqrt{1 - \epsilon + 2\sqrt{\epsilon(1-\epsilon)c(1-c)} + c(2\epsilon-1)}\right).$$
The remaining states and effects are completely fixed by the assumed symmetries and operational equivalence. For a general pair of $c$ and $\epsilon$, this quantum model outperforms the optimal noncontextual model, as seen in Fig. 5.
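The geometric construction can be followed step by step in a short sketch. It assumes that the chosen effect $E_\phi$ is the endpoint of the green line on the Bloch circle, i.e. a rank-one projector $|\alpha\rangle\langle\alpha|$ with $|\langle 0|\alpha\rangle|^2 = 1-\epsilon$, lying between $|0\rangle$ and $|\theta\rangle$; with that reading the Helstrom probability reproduces the closed form quoted above. The function names are illustrative, and the sketch is intended for $\epsilon \le c \le 1-\epsilon$.

```python
import math


def constructed_success(c, eps):
    """Helstrom success rate for the states fixed by the geometric construction."""
    half_alpha = math.asin(math.sqrt(eps))  # <0|E_phi|0> = cos^2(alpha/2) = 1 - eps
    half_gap = math.acos(math.sqrt(c))      # <theta|E_phi|theta> = cos^2((theta-alpha)/2) = c
    half_theta = half_alpha + half_gap      # E_phi lies between |0> and |theta>
    overlap = math.cos(half_theta) ** 2     # |<0|theta>|^2
    return 0.5 * (1 + math.sqrt(1 - overlap))


def closed_form(c, eps):
    """Assumed closed form of the quantum tradeoff, Eq. (36)."""
    inner = 1 - eps + 2 * math.sqrt(eps * (1 - eps) * c * (1 - c)) + c * (2 * eps - 1)
    return 0.5 * (1 + math.sqrt(max(inner, 0.0)))


for c, eps in [(0.5, 0.05), (0.3, 0.1), (0.8, 0.2), (0.2, 0.2)]:
    print(c, eps, constructed_success(c, eps), closed_form(c, eps))
```

The two columns agree to numerical precision, which is a useful check on the placement of the square roots in the closed form.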

Appendix D: Full set of noncontextuality inequalities for MESD without symmetries
As promised in Section IV A, we now derive the full set of noncontextuality inequalities for our operational MESD scenario when the symmetries of Eqs. (30)-(32) are not assumed. In Table III we show a general data table for 3 binary measurements and 4 preparations which respect our operational equivalence. There are 9 free parameters, since the probabilities in the last column are fixed by those in the first three. The procedure from Appendix B yields the following set of inequalities over the 9 free parameters, which are necessary and sufficient for the data to have been generated by a noncontextual model respecting operational equivalence Eq. (29): Of course, these inequalities reproduce Eq. (35) if the symmetries are now imposed.
In deriving these inequalities, we have assumed the logical labeling of Eq. (33). If one drops the labeling condition, then the resulting inequalities are identical to the facets of the Bell polytope discussed in Section VII (but have no practical relevance to minimum error state discrimination).