Bounding and simulating contextual correlations in quantum theory

We introduce a hierarchy of semidefinite relaxations of the set of quantum correlations in generalised contextuality scenarios. This constitutes a simple and versatile tool for bounding the magnitude of quantum contextuality. To illustrate its utility, we use it to determine the maximal quantum violation of several noncontextuality inequalities whose maximum violations were previously unknown. We then go further and use it to prove that certain preparation-contextual correlations cannot be explained with pure states, thereby showing that mixed states are an indispensable resource for contextuality. In the second part of the paper, we turn our attention to the simulation of preparation-contextual correlations in general operational theories. We introduce the information cost of simulating preparation contextuality, which quantifies the additional, otherwise forbidden, information required to simulate contextual correlations in either classical or quantum models. In both cases, we show that the simulation cost can be efficiently bounded using a variant of our hierarchy of semidefinite relaxations, and we calculate it exactly in the simplest contextuality scenario of parity-oblivious multiplexing.


I. INTRODUCTION
The contextuality of quantum theory is a fundamental sign of its nonclassicality that has been investigated for several decades. While contextuality was originally established as a property specific to the formalism of quantum theory [1,2], it has, in more recent times, been further generalised as a property of nonclassical probability distributions that can arise in operational theories [3]. This operational notion of contextuality is applicable to a broad range of physical scenarios and has been shown to be linked to a variety of foundational and applied topics in quantum theory (see, e.g., Refs. [4][5][6][7][8][9][10][11][12]).
The principle of noncontextuality holds that operationally equivalent physical procedures must correspond to identical descriptions in any underlying ontological model [3]. This assumption imposes constraints on the correlations that can be obtained in prepare-and-measure scenarios involving operationally equivalent preparations and measurements. In such scenarios, which we term "contextuality scenarios", the correlations obtainable by noncontextual models can be characterised in terms of linear programming [13]. In contrast, quantum models that nonetheless respect the operational equivalences may produce "contextual correlations" unobtainable by any such noncontextual model [3]. This leads to a conceptually natural question: how can we determine if, for a given contextuality scenario, a given set of contextual correlations is compatible with quantum theory? This question is crucial for understanding the extent of nonclassicality manifested in quantum theory, and hence also for the development of quantum information protocols powered by quantum contextuality. While an explicit quantum model is sufficient to prove compatibility with quantum theory, proving the converse, namely that no such model exists, is more challenging.
Here, we provide an answer to the question by introducing a hierarchy of semidefinite relaxations of the set of quantum correlations arising in contextuality scenarios involving arbitrary operational equivalences between preparations and measurements. This constitutes a sequence of increasingly precise necessary conditions that contextual correlations must satisfy in order to admit a quantum model. Thus, if a given contextual probability distribution fails one of the tests, it is incompatible with any quantum model satisfying the specified operational equivalences. We exemplify its practical utility by determining the maximal quantum violations of several different noncontextuality inequalities (for noisy state discrimination [14], for three-dimensional parity-oblivious multiplexing [15], for the communication task experimentally investigated in Ref. [16], and for the polytope inequalities obtained in Ref. [13]). Then, we apply our method to solve a foundational problem in quantum contextuality: we present a correlation inequality satisfied by all quantum models based on pure states and show that it can be violated by quantum strategies exploiting mixed states. Thus, we prove that mixed states are an indispensable resource for strong forms of quantum contextuality.
Equipped with the ability to bound the magnitude of quantum contextuality, we ask what additional resources are required to simulate preparation contextual correlations with classical or quantum models. We identify this resource as the preparation of states deviating from the required operational equivalences, and quantify this deviation in terms of the information extractable about the operational equivalences via measurement. This allows us to interpret preparation contextuality scenarios, and experiments aiming to simulate their results, as particular types of informationally restricted correlation experiments [17,18]. For both classical and quantum models, we show that the simulation cost can be lower bounded using variants of our hierarchy of semidefinite relaxations. We apply these concepts to the simplest preparation contextuality scenario [19], where we explicitly derive both the classical and quantum simulation costs of contextuality.

II. CONTEXTUALITY
Consider a prepare-and-measure experiment in which Alice receives an input x ∈ [n_X] := {1, . . . , n_X}, prepares a system using the preparation P_x and sends it to Bob. Bob receives an input y ∈ [n_Y], performs a measurement M_y and obtains an outcome b ∈ [n_B]; this event is called the measurement effect and is denoted [b|M_y]. When the experiment is repeated many times, it gives rise to the conditional probability distribution p(b|x, y) := p(b|P_x, M_y).
An ontological model provides a realist explanation of the observed correlations p(b|x, y) [3]. In an ontological model, the preparation is associated to an ontic variable λ subject to some distribution (i.e., an epistemic state) p(λ|x), and the measurement is represented by a probabilistic response function p(b|y, λ) depending on the ontic state. 1 The observed correlations are then written

p(b|x, y) = Σ_λ p(λ|x) p(b|y, λ).   (1)

Notice that every probability distribution admits an ontological model.
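To make Eq. (1) concrete, here is a minimal sketch (all numbers are invented for illustration) that computes the observed correlations from a given epistemic state and response function:

```python
# Toy ontological model: 2 preparations (x), 1 measurement (y), 2 outcomes (b),
# and a three-valued ontic variable lambda. All numbers are purely illustrative.
p_lambda_given_x = {            # epistemic states p(lambda|x)
    1: [0.6, 0.3, 0.1],
    2: [0.1, 0.3, 0.6],
}
p_b_given_y_lambda = {          # response functions p(b|y,lambda), here for y = 1
    1: [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]],
}

def p(b, x, y):
    """Observed correlation p(b|x,y) = sum_lambda p(lambda|x) p(b|y,lambda), Eq. (1)."""
    return sum(pl * p_b_given_y_lambda[y][lam][b]
               for lam, pl in enumerate(p_lambda_given_x[x]))

# The model yields a normalised distribution over b for each (x, y).
for x in (1, 2):
    assert abs(p(0, x, 1) + p(1, x, 1) - 1.0) < 1e-12
```

Any such table of p(λ|x) and p(b|y, λ) reproduces some valid correlation, reflecting the remark that every probability distribution admits an ontological model.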

A. Operational equivalences
The notion of noncontextuality becomes relevant when certain operational procedures (either preparations or measurements) are operationally equivalent [3]. Two preparations P and P′ are said to be operationally equivalent, denoted P ≃ P′, if no measurement 2 can distinguish them, i.e., if p(b|P, M) = p(b|P′, M) for every outcome b of every measurement M. In prepare-and-measure experiments, we are particularly interested in operationally equivalent procedures obtained by convexly combining preparations P_x or measurement effects [b|M_y]. Specifically, one may have (hypothetical) preparations satisfying Σ_x α_x P_x ≃ Σ_x β_x P_x, where {α_x}_x and {β_x}_x are convex weights (i.e., nonnegative and summing to one) and, likewise, (hypothetical) measurement effects formed from convex combinations of the effects [b|M_y].

1 These distributions must, respectively, be linear in the preparations P_x (since the epistemic state of a mixture of preparations is the mixture of the epistemic states of the respective preparations) and in the measurement effects [b|M_y] (since a measurement effect arising from a mixture of measurements or a post-processing of outcomes must be represented by the corresponding mixture and post-processing of response functions).

2 Here, the quantifier is over all possible measurements (that, e.g., Bob could perform), not only the fixed set {M_y}_y used in the prepare-and-measure experiment at hand.
Such operationally equivalent procedures can naturally be grouped into equivalence classes, and it will be convenient for us to specify equivalent procedures in a slightly different, yet equivalent, way as follows.
Note that any operational equivalence of the form above can be specified in this way. 3 The formulation of Definition 1 allows us to consider natural partitions into K ≥ 2 or L ≥ 2 sets, which will prove useful later. For example, if one had three operationally equivalent preparations of the form (1/2)(P_1 + P_2) ≃ (1/2)(P_3 + P_4) ≃ (1/2)(P_5 + P_6), one could express this as a single operational equivalence rather than as several pairwise equivalences.
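A toy sketch of such a three-way equivalence: the six preparations below are invented classical probability vectors chosen so that the three pairwise mixtures coincide, forming a single equivalence class with K = 3 partitions:

```python
# Six hypothetical preparations, represented as classical probability vectors,
# whose uniform pairwise mixtures (1/2)(P1+P2), (1/2)(P3+P4), (1/2)(P5+P6)
# all coincide. All vectors are invented for illustration.
P = {
    1: [1.0, 0.0], 2: [0.0, 1.0],
    3: [0.7, 0.3], 4: [0.3, 0.7],
    5: [0.9, 0.1], 6: [0.1, 0.9],
}

def mix(i, j):
    """The uniform mixture (1/2)P_i + (1/2)P_j."""
    return [0.5 * a + 0.5 * b for a, b in zip(P[i], P[j])]

def equivalent(u, v, tol=1e-12):
    """Two identical mixtures cannot be distinguished by any measurement."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

# A single operational equivalence with K = 3 partitions.
assert equivalent(mix(1, 2), mix(3, 4))
assert equivalent(mix(3, 4), mix(5, 6))
```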

B. Contextuality scenarios and noncontextuality
With these basic notions in hand, we can now specify more precisely the kind of scenario in which we will study noncontextuality, and define noncontextuality precisely in such settings. In particular, we consider prepare-and-measure scenarios of the form described above in which Alice's preparations and Bob's measurements must obey fixed sets of operational equivalences.
Definition 2. A contextuality scenario is a tuple (n_X, n_Y, n_B, {E_P^(r)}_{r=1}^R, {E_M^(q)}_{q=1}^Q), where {E_P^(r)}_r and {E_M^(q)}_q are preparation and measurement operational equivalences, respectively.
Note that the normalisation of the probability distribution p(b|x, y) implies that Σ_b [b|M_y] ≃ Σ_b [b|M_{y′}] for all y, y′, and hence every ontological model must satisfy the corresponding operational equivalence. We will generally omit this trivial operational equivalence from the specification of a contextuality scenario.
The notion of (operational) noncontextuality formalises the idea that operationally identical procedures must have identical representations in the underlying ontological model [3].
Definition 3. An ontological model is said to be:
(a) Preparation noncontextual if it assigns the same epistemic state to operationally equivalent preparation procedures; i.e., if the preparations P_x satisfy an operational equivalence Σ_x α_x P_x ≃ Σ_x β_x P_x, then ∀λ: Σ_x α_x p(λ|x) = Σ_x β_x p(λ|x).
(b) Measurement noncontextual if it endows operationally equivalent measurement procedures with the same response function; i.e., if the measurement effects [b|M_y] satisfy an operational equivalence, then the corresponding convex combinations of the response functions p(b|y, λ) must coincide for every λ.
Finally, if an ontological model is both preparation and measurement noncontextual, we simply say that it is noncontextual.
The assumption of noncontextuality imposes nontrivial constraints on the probability distributions that can arise in an ontological model [3].
Definition 4. Given a contextuality scenario, the correlations p(b|x, y) are said to be (preparation/measurement) noncontextual if there exists a (preparation/measurement) noncontextual ontological model satisfying the operational equivalences of the scenario and reproducing the desired correlations. If no such model exists, we say that the correlations are (preparation/measurement) contextual.
It is known that the set of noncontextual correlations (and, likewise, the sets of preparation or measurement noncontextual correlations) forms, for a given contextuality scenario, a convex polytope delimited by noncontextuality inequalities [13].

C. Quantum models
Here, we are particularly interested in what correlations can be obtained in contextuality scenarios within quantum mechanics. In quantum theory, a preparation P corresponds to a density matrix ρ (i.e., satisfying ρ ⪰ 0 and tr(ρ) = 1), and two preparations ρ and ρ′ are operationally equivalent if and only if ρ = ρ′. Preparation operational equivalences thus correspond to different decompositions of the same density matrix. Likewise, a measurement corresponds to a positive operator-valued measure (POVM) {E_b}_b, i.e., satisfying E_b ⪰ 0 and Σ_b E_b = 1, where the E_b are the measurement effects. Measurement effects E_b and E_b′ are thus operationally equivalent if and only if E_b = E_b′. We can thus specify precisely what a quantum model for a contextuality scenario corresponds to.
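As a concrete (and entirely hypothetical) illustration, the following sketch builds two different decompositions of the maximally mixed qubit state; being decompositions of the same density matrix, they constitute operationally equivalent preparation procedures, and no effect can distinguish them:

```python
# Two decompositions of the same qubit density matrix (here I/2), giving an
# operational equivalence: (1/2)(rho1+rho3) = (1/2)(rho2+rho4) = I/2.
# States are real 2x2 matrices as nested lists; all choices are illustrative.

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def trace_prod(A, B):
    """tr(A B) for real 2x2 matrices."""
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))

rho1 = [[1.0, 0.0], [0.0, 0.0]]       # |0><0|
rho3 = [[0.0, 0.0], [0.0, 1.0]]       # |1><1| (antipodal to rho1)
rho2 = [[0.5, 0.5], [0.5, 0.5]]       # |+><+|
rho4 = [[0.5, -0.5], [-0.5, 0.5]]     # |-><-| (antipodal to rho2)

sigma  = mat_add(scale(0.5, rho1), scale(0.5, rho3))
sigma2 = mat_add(scale(0.5, rho2), scale(0.5, rho4))
assert sigma == sigma2 == [[0.5, 0.0], [0.0, 0.5]]   # both equal I/2

# No POVM effect E can tell the two mixtures apart: tr(sigma E) = tr(sigma2 E).
E = [[0.8, 0.1], [0.1, 0.2]]          # an arbitrary valid effect (0 <= E <= I)
assert abs(trace_prod(sigma, E) - trace_prod(sigma2, E)) < 1e-12
```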
Definition 5. A quantum model for a contextuality scenario is given by two sets of Hermitian positive semidefinite operators: states {ρ_x}_{x=1}^{n_X} with tr(ρ_x) = 1 and, for each y, a POVM {E_{b|y}}_b with Σ_b E_{b|y} = 1, satisfying the operational equivalences of the scenario: ∀r, k, the mixtures of states specified by E_P^(r) equal a common operator σ_r and, ∀q, ℓ, the mixtures of effects specified by E_M^(q) equal a common operator τ_q, for some operators σ_r and τ_q independent of k and ℓ. If a quantum model consists only of pure states (i.e., if ρ_x² = ρ_x for all x) or projective measurements (i.e., if E_{b|y}² = E_{b|y} and E_{b|y} E_{b′|y} = 0 for all b ≠ b′ and all y), then we will call the model pure or projective, respectively.
It turns out that quantum theory is conceptually different from standard realist models, in the sense that there exist quantum models for contextuality scenarios that respect the specified operational equivalences but nevertheless give rise to contextual correlations [3]. Quantum theory is thus said to be contextual.
Interestingly, quantum models cannot provide any advantage over noncontextual ontological models in the absence of nontrivial preparation operational equivalences [20]. In this sense, quantum theory is measurement noncontextual. Conversely, however, quantum contextuality can be witnessed in contextuality scenarios involving only operational equivalences between the preparations (along with the trivial measurement operational equivalence arising from Eq. (9), which is necessarily satisfied by any quantum model for any contextuality scenario). For this reason there has been particular interest in preparation noncontextual inequalities, although interesting contextuality scenarios involving both preparation and measurement operational equivalences have been proposed (see e.g. [8,13,21,22]).

III. A HIERARCHY OF SDP RELAXATIONS
In recent years, hierarchies of semidefinite programming (SDP) relaxations of the set of quantum correlations have become an invaluable tool in the study of quantum correlations [23,24]. Such a hierarchy capable of bounding contextual correlations in contextuality scenarios, where operational equivalences must be taken into account, has thus far, however, proved elusive, and it is this problem we address here.
The fundamental question we are interested in is the following: given a contextuality scenario (n_X, n_Y, n_B, {E_P^(r)}_r, {E_M^(q)}_q) and a probability distribution p(b|x, y), does there exist a quantum model for the scenario reproducing the observed correlations, i.e., satisfying p(b|x, y) = tr(ρ_x E_{b|y})?
Note that, in contrast to many scenarios in quantum information, such as Bell nonlocality, it is not a priori clear that, in the search for such a quantum model, one can restrict oneself to pure states and projective measurements despite the fact that no assumption on the Hilbert space dimension is made. Indeed, while one can always purify a mixed state, or perform a Naimark dilation of the POVMs, such extensions may no longer satisfy the operational equivalences of the contextuality scenario.
Although SDP hierarchies have previously been formulated for prepare-and-measure scenarios [18, 25–27], the main challenge for contextuality scenarios is to represent the constraints arising from the operational equivalences. Here, we adopt an approach motivated by a recent hierarchy [18] bounding informationally restricted correlations [17] and by the fact that operational equivalences can be interpreted as restrictions on the information obtainable about equivalent operational procedures (see also Sec. V A).

A. Necessary conditions for a quantum model
Similarly to other related SDP hierarchies, our approach to formulating increasingly strict necessary conditions for the existence of a quantum model is based on reformulating the problem in terms of the underlying moment matrix of a quantum model. To this end, let us define the set of operator variables J = {ρ_x}_x ∪ {E_{b|y}}_{b,y} ∪ {σ_r}_r ∪ {τ_q}_q, where σ_r, τ_q (with r ∈ [R], q ∈ [Q]) are variables corresponding to the operators defined in Eqs. (10) and (11) and will be used to enforce the operational equivalences robustly. Consider a list S = (S_1, . . . , S_|S|) of monomials (of degree at least one) of variables in J. We say that S represents the kth degree of the hierarchy if it contains all monomials over J of degree at most k. 4 The choice of S will lead to different semidefinite relaxations, but it should at least include all elements of J. Given a monomial list S, the existence of a quantum model implies the existence of a moment matrix Γ whose elements, labelled by the monomials u, v ∈ S, are Γ_{u,v} = tr(u†v) and satisfy a number of properties that form our necessary conditions. Some of these constraints are common to those found in similar hierarchies (points (I)–(III) below), while others capture important aspects of quantum models for contextuality scenarios (points (IV)–(V)) and will be expressed through localising matrices [28]. We outline these constraints below.
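A sketch of how a monomial list S for the kth degree might be generated in practice; the operator labels and scenario sizes below are invented, and since the operators do not commute, a monomial is an ordered tuple of labels:

```python
from itertools import product

# Build the monomial list S for the k-th degree of the hierarchy from symbolic
# labels for a small (hypothetical) operator set J: two states, the effects of
# one binary measurement, and one slack operator sigma for an equivalence.
J = ["rho1", "rho2", "E1|1", "E2|1", "sigma1"]

def monomials(J, k):
    """All monomials over J of degree 1..k, as ordered tuples of labels."""
    S = []
    for degree in range(1, k + 1):
        S.extend(product(J, repeat=degree))
    return S

S1 = monomials(J, 1)
S2 = monomials(J, 2)
assert len(S1) == len(J)                  # degree 1: the elements of J itself
assert len(S2) == len(J) + len(J) ** 2    # degree 2 adds all |J|^2 products
assert ("rho1", "E1|1") in S2 and ("E1|1", "rho1") in S2   # order matters
```

Real implementations (such as the MATLAB code described in Sec. IV) handle this symbolically and prune monomials that are redundant under the trace's cyclicity; the sketch above keeps all of them for simplicity.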
(I) Hermitian positive semidefiniteness. By construction, the moment matrix is Hermitian, and it is easily seen to be positive semidefinite [23], i.e., Γ = Γ† and Γ ⪰ 0.
(II) Consistency with p. Since the quantum model must reproduce the correlations p(b|x, y), Γ must satisfy ∀x, y, b: Γ_{ρ_x, E_{b|y}} = p(b|x, y).
(III) Validity of states and measurements. Since any quantum model must satisfy the constraints of Eqs. (8) and (9), Γ must satisfy the corresponding linear identities between its elements, holding for all monomials u, v in S. These constraints are, in particular, those satisfied by any quantum model that follow from the validity of the states and measurements making up the model and the cyclicity of the trace. For example, Eq. (17) includes constraints of the form Σ_b Γ_{E_{b|y}, E_{b′|y′}} = Γ_{1, E_{b′|y′}}, as well as constraints such as Γ_{E_{b|y}, ρ_x E_{b′|y′}} = Γ_{ρ_x, E_{b′|y′} E_{b|y}}, which follows from the fact that tr(E_{b|y} ρ_x E_{b′|y′}) = tr(ρ_x E_{b′|y′} E_{b|y}). It thus includes the constraints implied by the trivial operational equivalence following from Eq. (9) that are satisfied by any quantum model, thereby justifying the fact that we generally do not explicitly include this operational equivalence when specifying contextuality scenarios.
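The consistency and trace-cyclicity constraints of points (II) and (III) can be checked numerically on a toy model. The following sketch, with an invented qubit state and binary measurement, verifies that the corresponding moment-matrix entries coincide:

```python
# Toy moments Gamma[u][v] = tr(u^T v) for a qubit model with one state and one
# binary measurement (all operators real and purely illustrative).

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def T(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def tr(A):
    return A[0][0] + A[1][1]

rho = [[0.75, 0.25], [0.25, 0.25]]   # a valid qubit state (PSD, unit trace)
E1 = [[0.9, 0.0], [0.0, 0.3]]        # effect of outcome 1
E2 = [[0.1, 0.0], [0.0, 0.7]]        # effect of outcome 2 (E1 + E2 = I)

# (II) Consistency: Gamma_{rho, E_b} reproduces p(b) = tr(rho E_b).
p1 = tr(mul(T(rho), E1))
assert abs(p1 + tr(mul(T(rho), E2)) - 1.0) < 1e-12   # normalisation

# (III) Cyclicity of the trace: tr(E1 rho E2) = tr(rho E2 E1), so the entries
# Gamma_{E1, rho E2} and Gamma_{rho, E2 E1} must coincide.
lhs = tr(mul(T(E1), mul(rho, E2)))
rhs = tr(mul(T(rho), mul(E2, E1)))
assert abs(lhs - rhs) < 1e-12
```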
Note that if we were to assume the quantum model to be either pure or projective (so that, respectively, either ρ_x² = ρ_x, or E_{b|y}² = E_{b|y} and E_{b|y} E_{b′|y} = 0), then this would imply further constraints of the form (17). In particular, one can always make this assumption if there are no nontrivial operational equivalences of the corresponding type, which allows the SDP hierarchy we formulate to be simplified; it can also be regarded as an additional assumption of interest in its own right (see Sec. IV B).
(IV) Operational equivalences. A quantum model must satisfy the operational equivalences of Eqs. (10) and (11). While this implies that the traces of the two sides of those equations must be equal, which in turn imposes the corresponding linear identities on the moment matrix, this alone does not fully capture the constraints implied by the operational equivalences and, notably, is not enough to provide a good hierarchy. To properly enforce these constraints, we draw inspiration from the hierarchy of informationally restricted quantum correlations [18] and make use of localising matrices. These are additional matrices of moments whose elements (or a subset thereof) are linear combinations of elements of Γ, and which must themselves be positive semidefinite [28].
We thus define, for all r ∈ [R], k ∈ [K^(r)] and all q ∈ [Q], ℓ ∈ [L^(q)], the localising matrices Λ^(r,k) and Λ^(q,ℓ), whose elements are labelled now by monomials from a monomial list L, in general different from S (and which, in principle, could differ for each localising matrix). Ideally, L should be chosen so that the elements of the localising matrices are linear combinations of elements of the moment matrix Γ. For a quantum model exactly satisfying the operational equivalences E_P^(r) and E_M^(q), with σ_r and τ_q defined as in Eqs. (10) and (11), these localising matrices would satisfy corresponding matrix equality constraints. Such complicated matrix equality constraints (which one could in principle enforce without defining the localising matrices), however, tend to lead to poor results in practice due to the numerical instability of SDP solvers. Instead, we impose the more robust constraints that Λ^(r,k), Λ^(q,ℓ) ⪰ 0 (along with the equality constraints on the traces of σ_r, τ_q, which serve to "normalise" the localising matrices), which follow from the existence, for any quantum model, of Hermitian operators σ_r, τ_q satisfying the operational equivalences. We thus have, for all r, k, q, ℓ, that Λ^(r,k) ⪰ 0 and Λ^(q,ℓ) ⪰ 0. Moreover, whenever the monomials u, σ_r u and ρ_x u are in S, the elements of Λ^(r,k) are linear combinations of elements of Γ and, when u, τ_q u and E_{b|y} u are similarly in S, likewise for Λ^(q,ℓ), thereby relating the localising matrices to the moment matrix Γ.
We note that the operators σ_r and τ_q, and the localising matrices expressing the deviation of their moments from those of the operational equivalences, hence play the role of slack variables that robustly enforce the operational equivalences. As we will see in Sec. V, the formulation adopted here also admits a natural generalisation that allows us to study the simulation cost of preparation contextuality, where the trace of σ_r has a natural interpretation; this further motivates our choice to present the constraints in the form given here.
(V) Positivity of states and measurements. In most SDP hierarchies used in quantum information, one can assume without loss of generality that the states and measurements in question are projective (see e.g. [23,24]); since all projective operators are positive semidefinite, it is not necessary in such cases to explicitly consider the constraints that the positive semidefiniteness of the operators in a quantum model imposes on a moment matrix. As already mentioned, however, for contextuality scenarios this is not a priori the case, and to capture the constraints implied by the positive semidefiniteness of the states and measurements (i.e., ρ_x, E_{b|y} ⪰ 0) we again exploit localising matrices.
Let us thus introduce the localising matrices Υ_x and Υ^(b,y) (for all x, y, b), with elements of the form tr(u† ρ_x v) and tr(u† E_{b|y} v), respectively, labelled by monomials u, v from a monomial list O, in general different from S (and which, as for L, could in principle differ for each x, y, b). Ideally, O should be chosen so that the elements of the localising matrices are also elements of the moment matrix Γ.
It is easily seen that the positive semidefiniteness of ρ_x and E_{b|y} implies Υ_x ⪰ 0 and Υ^(b,y) ⪰ 0, which in turn (for well-chosen O) constrains Γ. Moreover, for all u, v in O, whenever the monomials u, ρ_x v or, respectively, u, E_{b|y} v are in S, the corresponding elements of the localising matrices coincide with elements of Γ, thereby relating the localising matrices to the main moment matrix.
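As a toy illustration of point (V), the following sketch builds a small positivity localising matrix with elements tr(u† ρ v) over an invented two-element monomial list O = (I, E), and checks that it is positive semidefinite, as guaranteed by ρ ⪰ 0:

```python
# Positivity localising matrix Upsilon[u][v] = tr(u^T rho v) for a PSD qubit
# state rho, over the tiny monomial list O = (I, E). All operators are
# illustrative real 2x2 matrices.

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def T(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def tr(A):
    return A[0][0] + A[1][1]

rho = [[0.75, 0.25], [0.25, 0.25]]   # PSD, unit trace
I = [[1.0, 0.0], [0.0, 1.0]]
E = [[0.9, 0.0], [0.0, 0.3]]
O = [I, E]                           # monomial list for the localiser

Upsilon = [[tr(mul(T(u), mul(rho, v))) for v in O] for u in O]

# A real symmetric 2x2 matrix is PSD iff both diagonal entries and the
# determinant are nonnegative.
det = Upsilon[0][0] * Upsilon[1][1] - Upsilon[0][1] * Upsilon[1][0]
assert Upsilon[0][0] >= 0 and Upsilon[1][1] >= 0 and det >= -1e-12
```

Note also that each entry here is an element of Γ, e.g. tr(I† ρ E) = Γ_{1, ρE}, illustrating how a well-chosen O ties the localising matrix back to the moment matrix.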
For given choices of the monomial lists S, L and O, the constraints presented above thus provide necessary conditions for a given correlation to have a quantum realisation in the contextuality scenario. Note, moreover, that by standard arguments [23] one can assume the moment matrix (and the localising matrices) to be real, since the above constraints involve only real coefficients. These conditions are all semidefinite constraints, which leads us to the following proposition summarising our hierarchy of SDP relaxations.
Proposition 6. A necessary condition for the existence of a quantum model in a given contextuality scenario reproducing the correlations {p(b|x, y)}_{b,x,y} is the feasibility of the SDP defined by the constraints (I)–(V) above, where the operators involved are all symmetric real matrices.
By taking increasingly long monomial lists S, L and O, one thus obtains increasingly strong necessary conditions for a quantum realisation, which can be efficiently checked by standard numerical SDP solvers.
While the above hierarchy applies to arbitrary contextuality scenarios, in many scenarios or situations of interest it can be somewhat simplified. In particular, if one wishes to determine whether a given correlation is compatible with a pure and/or projective quantum model, the extra constraints imposed on the states and measurement effects (cf. Definition 5) correspond to further linear constraints in Eq. (29d), meaning that the corresponding localising matrices Υ_x and/or Υ^(b,y) (and the subsequent constraints in Eq. (29h)) are not required. Similarly, if there are either no preparation or no measurement operational equivalences present in the problem (i.e., if R = 0 or Q = 0), then the corresponding localising matrices Λ^(r,k) or Λ^(q,ℓ) (and the subsequent constraints in Eqs. (29e) and (29f)) are likewise not required. The latter case is particularly relevant in many (preparation) contextuality scenarios of interest, including the examples we consider in the following section. To illustrate this, in Appendix A we show how the SDP simplifies for the case of preparation noncontextuality, where only nontrivial preparation operational equivalences are considered.
Although the above hierarchy solves a feasibility problem, asking whether a distribution p(b|x, y) is compatible with a quantum model for the contextuality scenario, in practice one is often interested in maximising a linear functional of the probability distribution over all possible quantum models (i.e., a noncontextuality inequality), perhaps subject to some further constraints on the distribution. It is easily seen that, following standard techniques, the hierarchy of necessary conditions we have presented also allows one to bound such optimisation problems by instead maximising the corresponding functional over all feasible solutions to the SDP of Proposition 6.
As we will see in the following section, the hierarchy of Proposition 6 allows us to readily obtain tight bounds on quantum contextual correlations in many scenarios of interest. However, in some cases involving both nontrivial preparation and measurement operational equivalences and no assumptions of pure states or projective measurements, it performs relatively poorly in practice. This appears to stem from the fact that, in such cases, the probabilities p(b|x, y) do not appear on the diagonal of the moment matrix or any of the localising matrices. In Appendix B we show how these difficulties can be overcome by presenting a modified version of our hierarchy, obtained by taking the operators {√ρ_x}_x and/or {√E_{b|y}}_{b,y} in the operator set J (cf. Eq. (12)) instead of {ρ_x}_x and {E_{b|y}}_{b,y}, an approach which we believe may be of independent technical interest.

IV. APPLICATIONS OF THE SDP HIERARCHY
We implemented a version of this hierarchy (and the variant described in Appendix B) in MATLAB, exploiting the SDP interface YALMIP [29], and our code is freely available [30]. Our implementation can handle arbitrary contextuality scenarios, restrictions to pure or projective quantum models or to classical (commuting) models, and solve either the feasibility SDP of Proposition 6 or maximise a linear functional of the correlations p(b|x, y) subject to linear constraints on the probabilities. In solving large SDP problems that would otherwise be numerically intractable, it can make use of RepLAB [31,32] (a recently developed tool for manipulating finite groups with an emphasis on SDP applications) to exploit symmetries in noncontextuality inequalities, a capability we exploit in obtaining some of the results presented below.

A. Quantum violations of established preparation noncontextuality inequalities
To illustrate the usefulness of the hierarchy described in Proposition 6, we first exploit it to derive tight bounds on the maximal quantum violation of three preparation noncontextuality inequalities introduced in previous literature. In Appendix C we detail the analysis of two examples based on the inequalities derived in Ref. [15] and the inequalities experimentally explored in Ref. [16]. Here, we focus on the noncontextuality inequalities for state discrimination presented in Ref. [14].
To reveal a contextual advantage in state discrimination, Ref. [14] considers a scenario with x ∈ [4], y ∈ [3] and b ∈ [2] in which one attempts to discriminate the preparations P_1 and P_2, while P_3 and P_4 are symmetric extensions that ensure the operational equivalence (1/2)P_1 + (1/2)P_3 ≃ (1/2)P_2 + (1/2)P_4. The first two measurements (y = 1, 2) correspond to distinguishing preparations P_1 and P_3, and P_2 and P_4, respectively (in the noiseless case, these should be perfectly discriminable), while the third (y = 3) corresponds to the state discrimination task, i.e., discriminating P_1 and P_2. There are three parameters of interest: the probability of a correct discrimination, s; the probability of confusing the two states, c; and the noise parameter, ε. Under the symmetry ansatz considered, the observed statistics are thus required to satisfy the corresponding symmetry constraints. The authors show that, for ε ≤ c ≤ 1 − ε, the noncontextuality inequality (31) bounding s holds. What is the maximal quantum advantage in the task? Ref. [14] presented a specific family of quantum models achieving the value of s given in Eq. (32), which violates the bound (31), and conjectured it to be optimal for qubit systems. The semidefinite programming hierarchy presented in the previous section allows us to place upper bounds on s for given values of (c, ε) by maximising s under the above constraints. Using a moment matrix of size 42 and localising matrices of size 7, 5 we systematically performed this maximisation with a standard numerical SDP solver [33] for different values of (c, ε), dividing the space of valid parameters (i.e., those satisfying ε ≤ c ≤ 1 − ε) into a grid with spacing 0.01. In every case we obtained an upper bound agreeing with the value in Eq. (32) to within 10⁻⁵, which is consistent with the precision of the SDP solver. We thus find that Eq. (32) indeed gives the maximal quantum contextual advantage in state discrimination.
For the interested reader, in Appendix D we use this example to show more explicitly what form the constraints of the SDP hierarchy take and how they relate the moment matrix and localising matrices.

B. Mixed states as resources for quantum contextuality
In many forms of nonclassicality, such as Bell nonlocality, steering and quantum dimension witnessing, the strongest quantum correlations are necessarily obtained with pure states. In the former two, this stems from the fact that any mixed state can be purified in a larger Hilbert space. In the latter, it follows from the possibility to realise a mixed state as a convex combination of pure states of the same dimension. Interestingly, however, it is a priori unclear whether mixed states should play a more fundamental role in quantum contextuality: both purifications of mixed states and post-selections on pure-state components of mixed states may break the operational equivalences between preparations in contextuality scenarios. Here we show that this intuition turns out to be correct: preparation contextuality is indeed exceptional, as mixed states are needed to obtain some contextual quantum correlations.

5 The precise lists of moments used in this and all subsequent examples can be found along with our implementation of the SDP hierarchy, where the code generating these results is available [30]. In all these examples we take the monomial lists L and O for the localising matrices to be the same (simply because this was sufficient to obtain the presented results), although one could take them to be different if desired.
To prove this, we consider the noncontextuality scenario of Hameedi-Tavakoli-Marques-Bourennane (HTMB) [16]. In this scenario, Alice receives two trits, x := x₁x₂ ∈ {0, 1, 2}², Bob receives a bit y ∈ [2] and produces a ternary outcome b ∈ {0, 1, 2}. There are two operational equivalences involved, corresponding to Alice sending zero information about the value of the sums x₁ + x₂ and x₁ + 2x₂ (modulo 3), respectively. Each of these corresponds to a partition of Alice's nine preparations into three sets. Under these constraints, Alice and Bob evaluate a Random Access Code [34]. The HTMB inequality bounds the success probability of the task in a noncontextual model [16]: A_HTMB ≤ 2/3. (33) We revisit this scenario and employ our semidefinite relaxations to determine a bound on the largest value of A_HTMB attainable in a quantum model in which all nine preparations are pure. As described following Proposition 6, this scenario can easily be considered with our hierarchy by simply including the linear constraints following from ρ_x² = ρ_x (for all x) in Eq. (29d) and noting that the localising matrices Υ_x are no longer required. Using a moment matrix of size 2172 and localising matrices of size 187, we find that A_HTMB ≤ 0.667 up to solver precision. To make such a large SDP problem numerically tractable, we used RepLAB [31,32] to make the moment matrix invariant under the symmetries of the random access code, thereby significantly reducing the number of variables in the SDP problem. This gives us strong evidence (i.e., up to numerical precision) that pure states cannot violate the HTMB inequality (33), and we conjecture this to indeed be the case exactly. 6 Importantly, however, mixed states are known to enable a violation of the inequality: six-dimensional quantum systems can achieve A_HTMB ≈ 0.698 [16]. 7 This shows that sufficiently strong contextual quantum correlations can require the use of mixed states.

6 Note that the large size of the moment matrices meant that the solver precision we were able to obtain is somewhat reduced compared to the other examples discussed in this paper. Our numerical result agrees with the noncontextual bound of 2/3 to within 2 × 10⁻⁴, which is within an acceptable range given the error metrics returned by the solver.

7 We were similarly able to use our hierarchy to place an upper bound on the quantum violation of this inequality at A_HTMB ≤ 0.704, using a moment matrix of size 3295 and localising matrices of size 268 with the solver SCS [35]. We note that obtaining this bound required using terms from the 4th level of the hierarchy. We leave open the question of the tight quantum bound.

C. Quantum violation of contextuality inequalities involving nontrivial measurement operational equivalences
The examples discussed above focused on preparation contextuality scenarios, in which there are no nontrivial measurement operational equivalences. Nonetheless, quantum contextuality can also be observed in scenarios involving measurement operational equivalences (in addition to preparation operational equivalences), and we demonstrate the ability of our hierarchy to provide tight bounds in such scenarios by applying it to the noncontextuality inequalities derived in Ref. [13].
In Ref. [13], the authors consider a scenario with x ∈ [6], y ∈ [3] and b ∈ [2], where the preparations satisfy the operational equivalence ½(P_1 + P_2) ≃ ½(P_3 + P_4) ≃ ½(P_5 + P_6) and the measurements satisfy an additional operational equivalence. The authors completely characterised the polytope of noncontextual correlations in this contextuality scenario, finding six inequivalent (under symmetries), nontrivial noncontextuality "facet" inequalities (where we use the notation p_xy := p(1|x, y)). While it was shown in Ref. [21] that a quantum model can violate the first of these inequalities and obtain the logical maximum of I_1 = 3, the degree to which the other inequalities can be violated has not, to our knowledge, previously been studied. We note that this question is also addressed in the parallel work of Ref. [36].
In this scenario, where we have both nontrivial preparation and measurement operational equivalences, we failed to obtain nontrivial bounds on these inequalities using the basic hierarchy described by Proposition 6. Instead, we employed the variant of the hierarchy described in Appendix B, which uses the principal square roots of the states ρ_x and/or measurements E_(b|y) in the operator list, but otherwise follows the same approach. This hierarchy, which is a strict extension of the one described by Proposition 6, allowed us to place strong bounds on all the above inequalities. Indeed, using moment matrices of size 1191 and localising matrices of size 85 (and monomials involving square roots of measurement operators, but not of states; see Appendix B), we obtained upper bounds on all six inequalities. Using a see-saw optimisation approach, for all six inequalities we were able to obtain quantum strategies saturating the bounds from the hierarchy, showing that they are in fact tight up to the precision of the SDP solver. Interestingly, we were moreover able to show that the maximum quantum violation of the third inequality (34c) cannot be obtained with projective measurements. Indeed, by using the hierarchy of Proposition 6 and imposing the constraints following from the projectivity of POVM elements (and using the same monomial lists as for the above results), we were able to show that I_3 ≲ 3.464 for projective quantum models. Using a see-saw optimisation, we were able to obtain projective quantum models saturating this bound to numerical precision, thereby confirming its tightness and showing that non-projective measurements, just like mixed states, are resources for quantum contextuality.

V. SIMULATING PREPARATION CONTEXTUALITY
Quantum correlations are famously capable of going beyond those achievable in classical theories in numerous scenarios, as highlighted by the violation of Bell inequalities and, indeed, noncontextuality inequalities. One can likewise consider correlations that are even stronger than those observed in nature, which we call "post-quantum" correlations. Interest in post-quantum theories stems from the fact that they nonetheless respect physical principles such as no-signalling, and understanding what physical principles distinguish quantum and post-quantum correlations can lead to new insights into quantum theory itself [37][38][39].
An interesting strategy for studying the correlations obtained by different physical theories is to ask what kind of resource, and how much of it, one should supplement a theory with to achieve stronger correlations. This question has been extensively studied in the context of simulating Bell correlations with classical theory and additional resources. Two such resources that can be used in that case are classical communication [40,41] and measurement dependence [42]. Similarly, various resources have also been investigated in Kochen-Specker contextuality experiments with the goal of simulating quantum correlations within a classical theory [43][44][45]. To our knowledge, however, nothing is known about the resources necessary to simulate operationally contextual correlations, and in particular the especially relevant preparation-contextual correlations.
In this section, we begin by casting preparation contextuality scenarios as information-theoretic games, and show how these allow us to formalise a notion of simulation cost, for both classical and quantum models. The resource used is the preparation of states which deviate from the required operational equivalences. This is a natural figure of merit as the defining feature of a model for noncontextual correlations within a given theory is that the underlying ontological model obeys the specified operational equivalences; it is thus this condition that must be violated in some way if stronger correlations are to be simulated. We leverage our hierarchy of semidefinite relaxations to quantify both the simulation of quantum contextual correlations using classical theory, and the simulation of post-quantum correlations using quantum theory.

A. Zero-information games
To show how the cost of simulating preparation contextuality can be quantified in information theoretic terms, we begin by giving an alternative interpretation for preparation contextuality scenarios (i.e., contextuality scenarios involving only nontrivial operational equivalences between sets of preparations). In particular, we will describe how preparation contextuality scenarios can be interpreted as games in which Alice is required to hide some knowledge about her input x (see, e.g., Ref. [46]).
Consider thus a contextuality experiment involving R preparation operational equivalences. A given such equivalence r ∈ [R] involves a partition of Alice's preparations into K_r sets S_1^(r), . . . , S_{K_r}^(r). How well could an observer discriminate which set S_k^(r) the state they receive is sampled from? The optimal discrimination probability in an operational theory is
G^(r) = max_p̃ (1/K_r) Σ_{k=1}^{K_r} Σ_{x∈S_k^(r)} ξ_k^(r)(x) p̃(k|x), (36)
where p̃ is the response distribution for the discrimination. Using that Σ_{k=1}^{K_r} p̃(k|x) = 1, it straightforwardly follows (see Appendix E) that the discrimination probability is G^(r) = 1/K_r (i.e., random) if and only if the rth operational equivalence is satisfied. The discrimination probability constitutes an operational interpretation of the min-entropic accessible information about the set membership of x [47], and is convenient to work with. More precisely, the accessible information is given by I_r = log_2(K_r) + log_2(G^(r)).
Thus, we can associate the operational equivalences to an information tuple Ī = (I_1, . . . , I_R). A contextuality experiment is a zero-information game, since G^(r) = 1/K_r for all r is equivalent to vanishing information: Ī = 0̄.
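The relation between the discrimination probability and the accessible information can be made concrete with a few lines of arithmetic. The following minimal sketch (the helper name `accessible_information` is ours, not from any referenced implementation) evaluates I = log₂(K) + log₂(G):

```python
import math

def accessible_information(K, G):
    """Min-entropic accessible information (in bits) about set membership,
    given the optimal discrimination probability G over K sets:
    I = log2(K) + log2(G)."""
    return math.log2(K) + math.log2(G)

# Random guessing among K sets (G = 1/K) reveals zero information.
assert accessible_information(4, 1 / 4) == 0.0

# Perfect discrimination of K = 4 sets reveals log2(4) = 2 bits.
assert accessible_information(4, 1.0) == 2.0

# A slight deviation from the operational equivalence leaks a little information.
print(accessible_information(2, 0.55))  # ≈ 0.1375 bits
```

The zero-information condition of the main text is exactly the first case: G = 1/K for every operational equivalence gives Ī = 0̄.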

B. Information cost of simulating preparation contextuality
Since a vanishing information tuple Ī is necessary for a faithful realisation of a contextuality scenario in a given physical model, it follows that contextual correlations that cannot be explained in said model require an overhead information, i.e., an information tuple Ī ≠ 0̄. In both classical (noncontextual) models and quantum theory, this means that the preparations are allowed to deviate from the operational equivalences specified by the contextuality scenario, to an extent quantified by the overhead information. By doing so, one necessarily goes beyond a standard model for the scenario, as defined in Definition 3 for classical models and Definition 5 for quantum theory.
For the simplest case of a single operational equivalence (i.e., R = 1), we define the information cost, Q, of simulating p(b|x, y) in quantum theory as the smallest amount of overhead information required for quantum theory to reproduce the correlations. However, when several operational equivalences are involved, the information is represented by a tuple Ī and it is unclear how the information cost of simulation should be defined (note, in particular, that the operational equivalences may not be independent, so information about one may also provide information about another). We thus focus here on the simpler case described above, and leave the more general case of R > 1 for future research.
It is not straightforward to evaluate Q directly. However, by modifying our semidefinite relaxations of contextual quantum correlations, we can efficiently obtain lower bounds on Q in general scenarios. Indeed, note that from Eq. (36), interpreted in a quantum model, it follows that if σ_r satisfies σ_r ⪰ Σ_{x∈S_k^(r)} ξ_k^(r)(x) ρ_x for every k ∈ [K_r], then one has G^(r) ≤ (1/K_r) tr(σ_r). Thus, rather than imposing the constraint arising from tr(σ_r) = 1 in our hierarchy of semidefinite relaxations, we can instead minimise (1/K_r times) the term corresponding to tr(σ_r) in the moment matrix, which thus provides an upper bound on G^(r). Note that this provides an alternative interpretation of the constraint Γ_{1,σ_r} = 1 in Eq. (29e): it enforces the fact that Bob should have no information about which set S_k^(r) Alice's state was chosen from. This interpretation makes an interesting link to the recently developed approach to bounding informationally restricted correlations [18], which indeed was the initial motivation for the approach we take in this paper.
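To build intuition for the bound G^(r) ≤ (1/K_r) tr(σ_r), consider the special case of commuting (diagonal) set-averaged states, for which the operator inequality σ_r ⪰ Σ_x ξ_k^(r)(x) ρ_x reduces to an entrywise maximum and the minimisation of tr(σ_r) can be done by hand; the general case requires an SDP solver. The sketch below (pure Python, illustrative only; function name ours) shows that for two commuting averaged states the bound reproduces the Helstrom discrimination value:

```python
def discrimination_bound(averaged_states):
    """Upper bound on the set-discrimination probability G for diagonal
    (hence commuting) set-averaged states, given as lists of diagonal entries.
    Any sigma with sigma >= A_k for all k gives G <= tr(sigma) / K; for
    diagonal A_k the minimal such sigma is the entrywise maximum."""
    K = len(averaged_states)
    dim = len(averaged_states[0])
    sigma_diag = [max(A[i] for A in averaged_states) for i in range(dim)]
    return sum(sigma_diag) / K

A1 = [0.9, 0.1]   # diagonal of the first set's average state
A2 = [0.2, 0.8]   # diagonal of the second set's average state
print(discrimination_bound([A1, A2]))  # 0.85

# For two commuting states this matches the Helstrom value 1/2 + ||A1 - A2||_1 / 4.
helstrom = 0.5 + sum(abs(a - b) for a, b in zip(A1, A2)) / 4
assert abs(discrimination_bound([A1, A2]) - helstrom) < 1e-12
```

In the hierarchy itself, the same trace term is minimised inside the moment matrix rather than computed from explicit states.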
Considering still the case of R = 1, we thereby bound the information cost of a quantum simulation by evaluating the semidefinite relaxation as follows.
where G* is obtained as the optimal value of the corresponding SDP, in which the above operators are all taken to be Hermitian.
The correctness of Proposition 7 follows immediately from Eq. (37) and the fact that G* is an upper bound on G^(1).
Furthermore, one can similarly consider the information cost of simulation in classical models. In analogy with the quantum simulation cost, we define the classical simulation cost, C, as the smallest overhead information required for a classical noncontextual model to reproduce given correlations. Naturally, in contrast to quantum simulation, every contextual distribution p(b|x, y) is associated with a non-zero classical simulation cost. In analogy with the quantum case, we can place lower bounds on the classical simulation cost using the SDP hierarchy discussed above while assuming that all variables commute; this introduces many further constraints on the SDP and provides necessary conditions for a classical model to exist for a given value of G. However, it turns out that a precise characterisation of the classical simulation cost, in terms of a linear program, is also possible by exploiting the fact that the set of classical, informationally restricted, correlations forms a convex polytope [17,18].⁹ Finally, we make the interesting observation that the discrimination probability G can be given a resource-theoretic interpretation in terms of a robustness measure. As we discuss in Appendix F, this can be used to give an alternative interpretation of the simulation cost.

C. Simulation cost in the simplest scenario
We illustrate the above discussion of the classical and quantum simulation costs of contextuality by applying it to arguably the simplest contextuality experiment, namely parity-oblivious multiplexing (POM) [19]. In POM, Alice has four preparations (x ∈ [4], written in terms of two bits x := x_1x_2 ∈ [2]²) and Bob has two binary-outcome measurements (y ∈ [2] and b ∈ [2]). The sole operational equivalence is ½P_11 + ½P_22 ≃ ½P_12 + ½P_21, which corresponds to Alice's preparations carrying no information about the parity of her input x. The task is for Bob to guess the value of Alice's yth input bit, x_y. The average success probability in a noncontextual model obeys A_POM ≤ 3/4. In contrast, quantum models obey the tight bound A_POM ≤ ½(1 + 1/√2) [19]. However, a post-quantum probability theory can achieve the algebraically maximal success probability of A_POM = 1 [48].
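A standard qubit strategy attaining the quantum bound places Alice's four states at Bloch vectors (±1/√2, 0, ±1/√2) and has Bob measure σ_z or σ_x. The snippet below (an illustrative check with 0-indexed bits, our labelling convention, and not the construction of Ref. [19] verbatim) verifies both the parity obliviousness of the states and the value A_POM = ½(1 + 1/√2):

```python
import math

s = 1 / math.sqrt(2)

def rho(x1, x2):
    """Qubit state with Bloch vector ((-1)**x2/sqrt2, 0, (-1)**x1/sqrt2);
    real 2x2 density matrix, all states lying in the x-z plane."""
    nx, nz = (-1) ** x2 * s, (-1) ** x1 * s
    return [[(1 + nz) / 2, nx / 2],
            [nx / 2, (1 - nz) / 2]]

def proj(y, b):
    """Projector for outcome b of measurement y (y=0: sigma_z, y=1: sigma_x)."""
    sign = (-1) ** b
    if y == 0:
        return [[(1 + sign) / 2, 0.0], [0.0, (1 - sign) / 2]]
    return [[0.5, sign / 2], [sign / 2, 0.5]]

def tr_prod(A, B):
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))

# Parity obliviousness: the two parity classes average to the same state.
avg_even = [[(rho(0, 0)[i][j] + rho(1, 1)[i][j]) / 2 for j in range(2)] for i in range(2)]
avg_odd = [[(rho(0, 1)[i][j] + rho(1, 0)[i][j]) / 2 for j in range(2)] for i in range(2)]
assert all(abs(avg_even[i][j] - avg_odd[i][j]) < 1e-12 for i in range(2) for j in range(2))

# Average success probability: Bob guesses bit x_y by measuring y.
A_pom = sum(tr_prod(rho(x1, x2), proj(y, (x1, x2)[y]))
            for x1 in (0, 1) for x2 in (0, 1) for y in (0, 1)) / 8
assert abs(A_pom - (1 + s) / 2) < 1e-12  # = 0.5 * (1 + 1/sqrt(2)) ≈ 0.854
```

The parity-class averages both equal the maximally mixed state, which is exactly the operational equivalence of the scenario.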
We consider the information cost of simulating a given value of A_POM (i.e., the minimal information cost over all distributions compatible with that value, which can easily be evaluated by modifying the linear and semidefinite programs defined above) in both classical and quantum models. The results are illustrated in Fig. 1. The classical simulation cost admits a closed-form analytic expression; in Appendix G we present an explicit simulation strategy that saturates it, while the results of the linear program and the classical version of the hierarchy coincide with this value up to numerical precision. For quantum models, we have employed the described semidefinite relaxations using a moment matrix of size 547 and localising matrices of size 89. Importantly, we find that this lower bound on the quantum simulation cost is tight, since we can saturate it with an explicit quantum strategy (detailed in Appendix G), which also yields the quantum simulation cost in closed analytic form.

VI. CONCLUSIONS
In this paper we introduced a semidefinite relaxation hierarchy for bounding the set of contextual quantum correlations and demonstrated its usefulness by applying it to solve several open problems in quantum contextuality. This approach opens the door to the investigation of the limits of quantum contextuality in general prepare-and-measure experiments, as well as potential applications thereof. Moreover, it provides the building blocks with which to explore several interesting, related questions, such as whether our approach can be extended to contextuality scenarios involving more than two parties, and whether it can be adapted to bound quantum correlations in Kochen-Specker type contextuality experiments.
By leveraging the interpretation of contextuality experiments as zero-information games, we introduced a measure of the cost of simulating preparation-contextual correlations in restricted physical models, and showed how this simulation cost can be bounded in both classical and quantum models. This raises three fundamental questions: 1) How can the definition of the simulation cost be extended to scenarios with multiple preparation operational equivalences, which, a priori, may not be independent? 2) How does the simulation cost of contextuality scale in prepare-and-measure scenarios with increasingly many settings? and 3) For a given number of inputs and outputs, what is the largest simulation cost required for classical models to reproduce quantum correlations? Additionally, it would be interesting to investigate how the simulation cost of operational contextuality relates to other notions of simulation, e.g., in Bell nonlocality, Kochen-Specker contextuality and communication complexity. In particular, can our semidefinite relaxation techniques be adapted to also bound simulation costs in such correlation experiments?
Our work thus provides both a versatile tool for bounding quantum contextuality and a general framework for analysing the simulation of contextual correlations.
Finally, while finalising this article, we became aware of the related work of Ref. [36]. This work also addresses the problem of bounding the set of contextual quantum correlations. It uses a hierarchy of semidefinite programming relaxations that is considerably different from the one introduced here. For contextuality scenarios featuring measurement operational equivalences, as well as general mixed states and non-projective measurements, the hierarchy of Ref. [36] appears to provide faster convergence (for example, it recovers more readily the bounds of Eq. (35)). In contrast, the hierarchy we introduced here appears particularly well suited to preparation contextuality scenarios, admits a generalisation to quantifying the simulation cost of contextuality, and makes an interesting conceptual connection to informationally restricted quantum correlations [17,18].

In preparation contextuality scenarios, the hierarchy can be simplified by noting that, in this particular case, we can assume the measurements to be projective. Indeed, we can always invoke Naimark's dilation theorem to obtain projective measurements on a larger Hilbert space that give the same statistics on the states in a given quantum model. Crucially, since there are no (nontrivial) measurement operational equivalences, these dilated projective measurements also provide a valid quantum model for the contextuality scenario in question.
Proposition 8. Let S, L, O be fixed lists of monomials from J. A necessary condition for the existence of a quantum model in a given preparation contextuality scenario reproducing the correlations {p(b|x, y)} b,x,y is the feasibility of the following SDP: where the above operators are all taken to be real symmetric matrices.
Appendix B: Variant of SDP hierarchy using principal-square-root operators

In the hierarchy described in Proposition 6, if the measurements are taken to be projective or the states pure (so that they are likewise described by projectors), then all of the probabilities p(b|x, y) appear on the diagonal either of one of the localising matrices Υ_x or Υ_(b,y) or, if both these sets of operators are projective, of the moment matrix Γ. For example, in the case of projective measurements (as can always be assumed in preparation contextuality scenarios), one has (Υ_x)_{E_(b|y),E_(b|y)} = tr(E_(b|y) ρ_x E_(b|y)) = tr(ρ_x E_(b|y)) = p(b|x, y). The positive semidefiniteness of these matrices thereby imposes strong constraints on the probability distribution, even at low levels of the hierarchy (notably, that the probabilities are non-negative, although the constraints are strictly stronger than this).
In the most general case, however, when no assumption of projective measurements or pure states can be made, the probabilities only appear in off-diagonal entries. In practice, we found that a consequence of this was the need to go to much higher levels of the hierarchy to obtain nontrivial constraints. Indeed, for the inequalities discussed in Sec. IV C we were unable to obtain useful constraints with the hierarchy of Proposition 6. Here, we show how this hierarchy can be modified and generalised to overcome this shortcoming.
Our approach exploits the simple fact that, since the states ρ_x and POVM elements E_(b|y) are positive semidefinite, they have positive semidefinite principal square roots √ρ_x and √E_(b|y) such that √ρ_x √ρ_x = ρ_x and √E_(b|y) √E_(b|y) = E_(b|y), respectively. Instead of taking the operator set J defined in Eq. (12), we reformulate our hierarchy using a finer-grained operator set J′ built from these square-root operators. The moment matrix Γ and localising matrices Λ_(r,k), Λ_(q,ℓ) can be constructed in the same way as for the original hierarchy, while the localising matrices Υ_x and Υ_(b,y) are now used to enforce the positive semidefiniteness of the principal roots. While this modification may appear to change little, an immediate consequence is that the probabilities p(b|x, y) now appear on the diagonal of Γ; indeed, one has tr(√ρ_x E_(b|y) √ρ_x) = p(b|x, y). Apart from this change in operator set, the conceptual approach of the hierarchy remains unchanged. The constraints (II)-(IV) described in Sec. III A are thus enforced in the same way, but now on the squares of the operators √ρ_x and √E_(b|y) around which the hierarchy is constructed. For example, the constraint that tr(ρ_x) = 1 for all x in any quantum model is now imposed by requiring that Γ satisfy ∀x : Γ_{√ρ_x,√ρ_x} = 1. (B4) Following analogous reasoning to that of Sec. III A, we thus arrive at the following proposition describing the modified hierarchy.
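The principal square root used here is the standard matrix function: diagonalise the PSD matrix and take the nonnegative root of each eigenvalue. As a minimal illustration (real symmetric 2×2 case only; helper name ours), one can verify √ρ √ρ = ρ numerically:

```python
import math

def sqrt_psd_2x2(M):
    """Principal square root of a symmetric PSD 2x2 matrix via eigendecomposition."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    # Eigenvalues of [[a, b], [b, d]].
    mean, diff = (a + d) / 2, math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    l1, l2 = mean + diff, mean - diff
    if abs(b) < 1e-15:  # already diagonal
        return [[math.sqrt(a), 0.0], [0.0, math.sqrt(d)]]
    # Normalised eigenvector for l1 (the eigenvector for l2 is orthogonal).
    v = (b, l1 - a)
    n = math.hypot(*v)
    c, s_ = v[0] / n, v[1] / n
    r1, r2 = math.sqrt(l1), math.sqrt(l2)
    # sqrt(M) = r1 * P1 + r2 * P2, with P1, P2 the spectral projectors.
    return [[r1 * c * c + r2 * s_ * s_, (r1 - r2) * c * s_],
            [(r1 - r2) * c * s_, r1 * s_ * s_ + r2 * c * c]]

rho = [[0.75, 0.25], [0.25, 0.25]]  # a valid (mixed) qubit state in real form
R = sqrt_psd_2x2(rho)
# Check sqrt(rho) @ sqrt(rho) == rho.
prod = [[sum(R[i][k] * R[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
assert all(abs(prod[i][j] - rho[i][j]) < 1e-12 for i in range(2) for j in range(2))
```

In the hierarchy, of course, the square roots are never computed explicitly; they are abstract operators whose defining relations (√ρ_x)² = ρ_x are imposed on the moments.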
Proposition 9. Let S, L, O be fixed lists of monomials from J′. A necessary condition for the existence of a quantum model in a given contextuality scenario reproducing the correlations {p(b|x, y)}_{b,x,y} is the feasibility of the following SDP: where the above operators are all symmetric real matrices.
Let us note firstly that Proposition 9 is strictly stronger than Proposition 6. Indeed, the latter can be seen as a special case of the former in which the monomial lists S, L, O are chosen so that the square root operators only ever appear in "matching" pairs.
While one may worry that one must go to higher levels of the hierarchy to obtain similarly strong constraints when employing this modified hierarchy, in practice we find that the situation is more subtle. Even in the case where either the measurements are assumed to be projective, or the states pure, we generally found that equally tight bounds could be obtained using either hierarchy. On the other hand, in the fully general case we found that the modified hierarchy of Proposition 9 provided a clear advantage.
Finally, we note that one could likewise consider the intermediate possibility of taking the principal roots of only the states or only the POVM elements in the operator set. In this case, the probabilities instead appear on the diagonal of the localising matrices Υ_x or Υ_(b,y). We found that, in practice, this option generally provided the best results for moment and localising matrices of a given size; indeed, the results for the example of Sec. IV C were obtained using such an operator set, containing square roots of the measurement operators but not of the states. The implementation of our hierarchy, which is freely available [30], allows one to choose between all these different variants of the hierarchy. We finish by noting that, to our knowledge, this approach of building an SDP hierarchy from principal-square-root operators is novel, at least within quantum information, and may be of independent interest in other applications.

Appendix C: Maximal quantum violations of noncontextuality inequalities
Here we present two further case studies illustrating the practical usefulness of the hierarchy of semidefinite relaxations of the set of quantum correlations in contextuality experiments that we described in the main text.
1. The inequality of Ref. [16]

Ref. [16] experimentally implemented a test of contextuality based on the communication games introduced in Ref. [49]. In the scenario considered there, Alice receives one of six preparations x := x_1x_2, where x_1 ∈ {0, 1} is a bit and x_2 ∈ {0, 1, 2} a trit. Bob receives a binary input y ∈ {0, 1} and produces a ternary outcome b ∈ {0, 1, 2}. The authors then present a noncontextuality inequality involving the quantities T_m, where m = 0, 1 and T_m = x_2 − (−1)^(x_1+y+m) m − x_1 y mod 3. This inequality is valid under the operational equivalence ⅓(P_00 + P_01 + P_02) ≃ ⅓(P_10 + P_11 + P_12), i.e., when no information is relayed about the bit x_1. Notably, this noncontextuality inequality is isomorphic to the Collins-Gisin-Linden-Massar-Popescu Bell inequality [50]. It is shown in Ref. [16] that a quantum strategy (based on qutrits) can achieve the violation A_Q = (3 + √33)/12 ≈ 0.7287, but the optimality of this violation was not proved.
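As a quick arithmetic check of the quoted value:

```python
import math

# Quantum value reported in Ref. [16] for the qutrit strategy.
A_Q = (3 + math.sqrt(33)) / 12
print(round(A_Q, 4))  # 0.7287
```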
Using our semidefinite relaxations, we evaluated an upper bound on the largest possible value of A attainable in quantum theory. Specifically, using a moment matrix of size 386 and localising matrices of size 49 (with L = O) and evaluating the corresponding semidefinite program, we obtain the value A_Q (up to a precision of approximately 10⁻⁷). Hence, up to solver precision, this shows that the quantum protocol considered in Ref. [16] is indeed optimal.
2. The inequality of Ref. [15]

Ref. [15] introduced noncontextuality inequalities based on the task of Random Access Coding. The authors consider a scenario in which Alice has an input x ∈ [d²], represented as two d-valued entries x = x_1x_2 ∈ {0, . . . , d − 1}², while Bob receives a binary input y ∈ [2] and produces an output b ∈ {0, . . . , d − 1}. Alice is required to communicate no information about the modular sum x_1 + x_2 mod d, i.e., for every pair (s, s′), the uniform mixture of her preparations with x_1 + x_2 = s must be operationally equivalent to that of her preparations with x_1 + x_2 = s′ (with addition modulo d). Ref. [15] shows that the success probability of the Random Access Code in a noncontextual model is bounded accordingly. Notably, these noncontextuality inequalities are isomorphic to known Bell inequalities for Random Access Codes [51].
Let us focus on the case of d = 3 (note that the case of d = 2 was solved in Ref. [19]). It was shown in Ref. [15] that there exists a quantum strategy (based on qutrits) which achieves the quantum violation A_Q = 7/9. However, the authors were unable to prove that a better quantum implementation cannot be found. Using a semidefinite relaxation corresponding to a moment matrix of size 563 and localising matrices of size 52 (with L = O), we evaluated an upper bound on A valid for general quantum models. Up to solver precision, we recover the result A_Q (agreement to order 10⁻⁸), thus showing that the explicit quantum strategy of Ref. [15] is optimal.
Appendix D: Pedagogical illustration of SDP hierarchy constraints

To give some further insight into the SDP hierarchy we present in Proposition 6, and in particular the form of the moment and localising matrices and the constraints imposed upon them, we show here somewhat more explicitly the form that they take in the example treated in Sec. IV A, based on state discrimination.
In this example, the only operational equivalence is the preparation operational equivalence ½P_1 + ½P_3 ≃ ½P_2 + ½P_4, which can be written in the form of Definition 1. Since there are no measurement operational equivalences, we assume the measurements are projective and use the simplified version of the SDP hierarchy given in Proposition 8.
The remaining constraints of interest are those referred to in Eq. (A1d). To illustrate these, let us expand on the form of some of the blocks of Γ. From the completeness relation Σ_b E_(b|y) = 𝟙, the block Γ_(ρ,σE) can be written in terms of the elements γ_(ρ_x σ) of Γ_(ρ,σ). By the cyclicity of the trace and the projectivity of the measurements (i.e., E_(b|y) E_(b′|y) = δ_(b,b′) E_(b|y)), the elements of Γ_(ρ,σE) are then related to the elements of Γ_(ρ_x E,σE) (recalling that Γ_(u,v) = tr(u†v), so that the elements of the monomial u appear reversed). The completeness relations then impose further linear relations among these elements. The other blocks of Γ can be reduced and related in similar ways by applying similar simplifications.
In practice, our code (which is freely available [30]) works by applying such reductions to every element of the moment matrix so as to bring it to a canonical form, before identifying the unique elements. The completeness relations can then be applied to further reduce the number of variables in the optimisation problem. We note, however, that when projective measurements are considered, it is generally not necessary to apply the constraints arising from the completeness relation. Although one obtains a potentially weaker set of necessary conditions, in practice we rarely see any difference in the power of the hierarchy under this relaxation.
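The reduction to canonical form can be illustrated with a toy word-rewriting routine for projective measurement operators (a simplified sketch of the idea, not the actual implementation of [30]; the (y, b) symbol encoding is our own):

```python
def canonicalise(word):
    """Reduce a word of projector symbols (y, b) using projectivity:
    E_{b|y} E_{b|y} = E_{b|y}            (idempotence), and
    E_{b|y} E_{b'|y} = 0 for b != b'     (orthogonality within a measurement).
    Returns the reduced word as a tuple, or None if the word is zero."""
    out = []
    for sym in word:
        if out and out[-1] == sym:
            continue          # idempotence: drop the repeated projector
        if out and out[-1][0] == sym[0] and out[-1][1] != sym[1]:
            return None       # orthogonal projectors of one measurement: word is 0
        out.append(sym)
    return tuple(out)

# E_{0|1} E_{0|1} E_{1|2}  ->  E_{0|1} E_{1|2}
assert canonicalise([(1, 0), (1, 0), (2, 1)]) == ((1, 0), (2, 1))
# E_{0|1} E_{1|1}  ->  0
assert canonicalise([(1, 0), (1, 1)]) is None
```

The real moment-matrix reduction additionally handles the state operators, trace cyclicity, and the completeness relations, but the word-rewriting core follows this pattern.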

Appendix E: Contextuality experiments as zero-information games
Here we show that, for a given operational equivalence (i.e., a fixed r ∈ [R]), a uniform discrimination probability G = 1/K (i.e., vanishing information I = 0) is equivalent to the corresponding operational equivalence being satisfied. To this end, we use that Σ_{k=1}^K p̃(k|x) = 1 to rewrite the discrimination probability in a form in which the deviation from the operational equivalence appears explicitly inside the maximisation. It then follows from the convex linearity of p̃ in x (cf. Footnote 1, noting that p̃ must by definition arise from an ontological model) that the operational equivalences (E1) imply that the bracketed term in this expression vanishes, thus leading to G = 1/K. Conversely, the condition G = 1/K is equivalent to Eq. (E3); if the bracketed term on its right-hand side did not vanish, we could always find a p̃(·|x) such that the argument of the maximisation becomes positive. Thus, the operational equivalences are implied.
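This equivalence can be checked numerically in a toy ontological model. In the sketch below (our own illustrative helper, enumerating deterministic responses over a three-element ontic space, which suffices since optimal responses can be taken deterministic), equal set-averaged epistemic states force G = 1/K, while unequal averages allow G > 1/K:

```python
from itertools import product

def G(avg_sets):
    """Optimal set-discrimination probability for set-averaged epistemic states
    over a finite ontic space: maximise over deterministic responses
    lambda -> k of (1/K) * sum_lambda avg_sets[k(lambda)][lambda]."""
    K, n = len(avg_sets), len(avg_sets[0])
    best = 0.0
    for resp in product(range(K), repeat=n):
        best = max(best, sum(avg_sets[resp[lam]][lam] for lam in range(n)) / K)
    return best

# Operational equivalence satisfied: both sets average to (1/2, 1/2, 0).
equal = [(0.5, 0.5, 0.0), (0.5, 0.5, 0.0)]
assert abs(G(equal) - 0.5) < 1e-12   # G = 1/K: zero information

# Equivalence broken: the averages differ, and the discriminator beats guessing.
broken = [(0.6, 0.4, 0.0), (0.4, 0.6, 0.0)]
assert G(broken) > 0.5               # G = 0.6 here
```

When the set averages coincide, every response distribution yields the same total, so the bracketed deviation term of the main argument vanishes identically.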

Appendix F: Simulation cost from robustness of operational inequivalence
Let us first show how the discrimination probability G, as defined in Eq. (36), can be related to a robustness measure within a resource-theoretic framework (see Ref. [52] for an overview of robustness measures in such frameworks). To