Semi-device-independent information processing with spatiotemporal degrees of freedom

Nonlocality, as demonstrated by the violation of Bell inequalities, enables device-independent cryptographic tasks that do not require users to trust their apparatus. In this article, we consider devices whose inputs are spatiotemporal degrees of freedom, e.g. orientations or time durations. Without assuming the validity of quantum theory, we prove that the devices' statistical response must respect their input's symmetries, with profound foundational and technological implications. We exactly characterize the bipartite binary quantum correlations in terms of local symmetries, indicating a fundamental relation between spacetime and quantum theory. For Bell experiments characterized by two input angles, we show that the correlations are accounted for by a local hidden variable model if they contain enough noise, but conversely must be nonlocal if they are pure enough. This allows us to construct a "Bell witness" that certifies nonlocality with fewer measurements than possible without such spatiotemporal symmetries, suggesting a new class of semi-device-independent protocols for quantum technologies.


I. INTRODUCTION
Quantum theory radically challenges our classical intuitions. A famous example is provided by the violation of Bell inequalities [1][2][3][4][5][6], demonstrating that local hidden variable models are inadequate to account for all observable correlations in quantum theory. While this so-called nonlocality was initially of foundational concern, it transpires to have a very powerful practical use: it enables device-independent protocols in quantum information theory (e.g. [7][8][9][10]). In this paradigm, one can perform certain tasks (e.g. cryptography) without trusting one's apparatus, or even necessarily assuming the full formalism of quantum mechanics. These protocols rely on the readily believable no-signalling constraint, which forbids the instantaneous transmission of information between sufficiently distant laboratories. Since this constraint originates in special relativity, it may be thought of as a property of spacetime itself.
A pillar of the device-independent formalism is its abstract black box description: experimental devices are fully characterized by probability tables of outputs given a supplied input (figure 1a). In this article, we supplement these inputs with physical structure, and adopt a semi-device-independent approach that makes no assumptions about the inner workings of the devices, or the physical theories governing them (i.e. quantum or otherwise), but assumes that their ensemble statistics can be characterized by a finite number of parameters. Specifically, we consider when inputs are spatiotemporal degrees of freedom, e.g. some orientation in space or duration of time. This includes, for example, the bias of a magnetic field, duration of a Rabi pulse, or angle of a polarizer (figure 1b). Spatiotemporal degrees of freedom bring with them a symmetry structure, which can be mathematically described using Lie group theory.
In this article, we introduce a general framework for spatiotemporal black boxes. We prove that the probability tables associated with spatiotemporal inputs must encode a linear representation of the corresponding symmetry groups (section II A). We demonstrate the power of this approach with two examples in Bell test scenarios. First, if each laboratory controls a single angle (section II B), we find, independently of the underlying theory, that the response to rotations can in some cases certify the existence of a local hidden variable model, or the violation of a Bell inequality. Consequently, we present a novel protocol for witnessing nonlocality, similar in spirit to [11,12], but without presupposing the validity of quantum theory. Secondly, we consider the case where both inputs are chosen via rotations in d-dimensional space. We show that natural assumptions on the local response to those rotations recover the set of bipartite binary quantum correlations exactly (section II C), indicating a fundamental relation between the structures of spacetime and of quantum mechanics. Finally, we discuss the implications of these results (section III), particularly for the construction of novel experimental tests of quantum mechanics and of new semi-device-independent protocols for quantum technologies.

A. Representation theorem for spatiotemporal degrees of freedom
The device-independent formalism abstracts experiments into a table of output statistics conditional on some choice of input. This is imbued with causal structure [13] by separating the inputs and outputs into local choices and responses made and observed by different local agents, acting in potentially different locations and times. The simplest structure is one agent at a single point in time. More commonly considered is the Bell scenario [6], where two spatially separated agents each independently select an input (measurement choice) and record the resulting local output. Theorem 1 of this paper applies to any causal structure, but with a view towards applications, the later examples will use the Bell scenario.
Here, we shall consider experiments where the local inputs correspond to spatiotemporal degrees of freedom: for example, the direction of inhomogeneity of the magnetic field in a Stern-Gerlach experiment, or the angle of a polarization filter (figure 1b). Crucially, we will describe such experiments without assuming the validity of quantum mechanics.
Let us first consider a single laboratory, say, Alice's. For concreteness, assume for the moment that Alice's input is given by the direction x of a magnetic field. She chooses her input by applying a rotation R ∈ SO(3) to some initial magnetic field direction x_0, i.e. x = R x_0. Her statistics of obtaining any outcome a will now depend on this direction, giving her a black box P(a | x).
In general, Alice will have a set of inputs X and a symmetry group G that acts on X. Given some arbitrary x_0 ∈ X, we assume that Alice can generate every possible input x ∈ X by applying a suitable transformation R ∈ G, such that x = R x_0. Mathematically, X is then a homogeneous space [14], which can be written X = G/H, where H ⊆ G is the subgroup of transformations R with R x_0 = x_0. In the example above, G = SO(3) describes the full set of rotations that Alice can apply to x_0, while H = SO(2) describes the subset of rotations that leave x_0 invariant (i.e. the axial symmetry of the magnetic field vector). Then, X = SO(3)/SO(2) = S^2 is the 2-sphere of unit vectors (i.e. directions) in 3-dimensional space. Similarly, the polarizer (figure 1b) corresponds to G = SO(2), H = {1}, and X = S^1, which we identify with the unit circle.
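To make the homogeneous-space picture concrete, the following is a minimal sketch for the magnetic-field example, assuming the default input x_0 points along the z-axis. The helper names (`rot_z`, `rot_y`, `direction`) are illustrative choices, not notation from this article. It shows that rotations about x_0 form the stabilizer H, and that R and R·h (with h ∈ H) generate the same input.

```python
import math

def rot_z(phi):
    """Rotation by angle phi about the z-axis (3x3 matrix as a list of rows)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(theta):
    """Rotation by angle theta about the y-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(R, v):
    """Apply the matrix R to the vector v."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

# Assumed default input: field along z. Any other unit vector would do.
x0 = [0.0, 0.0, 1.0]

def direction(theta, phi):
    """Input x = R x0 with R = rot_z(phi) rot_y(theta)."""
    return apply(matmul(rot_z(phi), rot_y(theta)), x0)

# Elements of H = SO(2) (rotations about z) leave x0 invariant ...
fixed = apply(rot_z(2.5), x0)
# ... so R and R h generate the same input x.
R = matmul(rot_z(1.2), rot_y(0.7))
same_input = apply(matmul(R, rot_z(2.5)), x0)
```

Here `direction(theta, phi)` sweeps out the whole unit sphere, which is the quotient SO(3)/SO(2) in this example.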
Temporal symmetries also fit into this formalism. If Alice's input corresponds to letting her system evolve for some time, then G = (R, +) is the group of time translations. If we know that the system evolves periodically over intervals τ ∈ R^+, which we model as a symmetry subgroup H = (τ · Z, +), then the input domain is X = G/H ≅ S^1. Physically, this could correspond to applying a controlled-duration Rabi pulse to an atomic system of trusted periodicity before recording an outcome.
Now suppose Alice has a black box P, where on spatiotemporal input x ∈ X, the outcome a is observed with probability P(a | x). Then, Alice can "rotate" her apparatus by R ∈ G, and induce a new black box P′ with outcome probabilities P′(a | x) = P(a | Rx). Physically, R could be an active rotation within Alice's laboratory (e.g. spinning a polarizer), of the incident system (e.g. adding a phase plate), or could be a passive change of coordinates.
Thus, a given black box and a spatiotemporal degree of freedom define a family of black boxes, and transformations R ∈ G map a given black box to another one in this family. Suppose we denote the action of R on the black boxes by T_R : P → P′. If rotating the input first by R then by R′ is equivalent to a single rotation R″ = R′ ∘ R, it follows that the black box formed by applying T_R and then T_{R′} is equivalent to applying the single transformation T_{R″} = T_{R′} ∘ T_R on P.
We can say more about this action if we consider ensembles of black boxes. For any family of black boxes {P_i}, the experiment of first drawing i with probability λ_i and then applying black box P_i defines a new, effective black box P, with statistics P(a | x) = Σ_i λ_i P_i(a | x). All these black boxes are in principle operationally accessible to Alice. However, a priori, we cannot say much about the resulting set of boxes: it could be a complicated uncountably-infinite-dimensional set defying simple analysis. Thus, we make a minimal assumption that this set is not "too large":
Assumption (i). Ensembles of black boxes can be characterized by a finite number of parameters.
The mathematical consequence is that the space of possible boxes for Alice is finite-dimensional. This is a weaker abstraction of a stronger assumption typically made in the semi-device-independent framework of quantum information: that the systems involved in the protocols are described by Hilbert spaces of bounded (usually small) dimension [15,16].
For example, BB84 [17] quantum cryptography assumes that the information carriers are two-dimensional, excluding additional degrees of freedom that could serve as a side channel for eavesdroppers [18]. Assumption (i) is much weaker; it does not presume that we have Hilbert spaces in the first place. It is for this assumption (and not the spatiotemporal structure of the input space) that the results presented in this article lie in the semi-device-independent regime.
We thus arrive at our first theorem. Recall that Alice chooses her input x_R ∈ X by selecting some R ∈ G and applying it to a default input x_0, i.e. x_R = R x_0. Then:
Theorem 1. There is a representation of the symmetry group G in terms of real orthogonal matrices R → T_R, such that for each outcome a, the outcome probabilities P(a | x_R) are a fixed (over R) linear combination of matrix entries of T_R.
The proof is given in appendix A, and is based on the observation that T R becomes a linear group representation on the space of ensembles. Motivated by this characteristic response, we refer to black boxes whose inputs are selected through the action of G as G-boxes.
A few comments are in order. First, this theorem applies to any causal structure, including the case of two parties performing a Bell experiment. If Alice and Bob have inputs and transformations X_A, G_A and X_B, G_B respectively, then the full setup can be seen as an experiment with X = X_A × X_B and G = G_A × G_B, to which Theorem 1 applies directly.
Secondly, there may be more than one transformation that generates the desired input x, i.e. both x = R x_0 and x = R′ x_0 for R ≠ R′; this is precisely the case if R^{−1} R′ ∈ H. For example, a magnetic field can be rotated from the y- to the z-direction in many different ways. In this case, Theorem 1 applies to both R and R′, which yields additional constraints.
Finally, quantum theory is contained as a special case. Typically, one argues that due to preservation of probability, transformations R must be represented in quantum mechanics via unitary matrices U_R acting on density matrices via ρ → U_R ρ U_R^†. This projective action can be written as an orthogonal matrix on the real space of Hermitian operators, in concordance with Theorem 1.
As a specific example, consider a quantum harmonic oscillator with frequency ω, initially in state ρ_0, left to evolve for a variable time t before it is measured by a fixed POVM [19] {M_a}_{a∈A}. The free dynamics are given by the Hamiltonian H, whose discrete set of eigenvalues {E_n = ħω(1/2 + n)} correspond to allowed "energy levels". The evolution is periodic, so (recalling the temporal example above) G = (R, +), H = ((2π/ω) · Z, +) and X ≅ S^1. The associated black box is thus P(a | t) = Tr[M_a exp(−iHt/ħ) ρ_0 exp(iHt/ħ)]. For any given ρ_0 and M_a, this evaluates to an affine-linear combination of terms of the form cos[(n − m)ωt] and sin[(n − m)ωt], involving all pairs of energy levels that have non-zero occupation probability in ρ_0 (and non-zero support in M_a). This is a linear combination of entries of the matrix representation in accordance with Theorem 1. For T_t to be a finite matrix, there must only be a finite number of occupied energy differences E_m − E_n.
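The oscillator example can be checked numerically with a short sketch. It assumes a pure initial state with amplitudes `c` over the lowest energy levels and a rank-one measurement effect with amplitudes `d`. These names, the frequency value and the restriction to pure states are illustrative assumptions rather than part of the article. The first function evolves the state directly; the second evaluates the same probability as a sum of cos[(n − m)ωt] and sin[(n − m)ωt] terms.

```python
import cmath
import math

OMEGA = 2.0  # oscillator frequency (hypothetical value, units with hbar = 1)

def prob(t, c, d, omega=OMEGA):
    """P(t) = |<phi| exp(-iHt) |psi>|^2 with E_n = omega * (n + 1/2)."""
    amp = sum(complex(dn).conjugate() * complex(cn)
              * cmath.exp(-1j * omega * (n + 0.5) * t)
              for n, (cn, dn) in enumerate(zip(c, d)))
    return abs(amp) ** 2

def fourier_form(t, c, d, omega=OMEGA):
    """Same probability, written as a combination of cos/sin((n-m) omega t)."""
    total = 0.0
    for n in range(len(c)):
        for m in range(len(c)):
            w = (complex(d[n]).conjugate() * complex(c[n])
                 * complex(d[m]) * complex(c[m]).conjugate())
            k = (n - m) * omega * t
            # Re[w * exp(-ik)] = Re(w) cos(k) + Im(w) sin(k)
            total += w.real * math.cos(k) + w.imag * math.sin(k)
    return total

# Three occupied levels: only energy differences 0, 1 and 2 (times omega) appear.
c = [0.5, 0.5j, math.sqrt(0.5)]
d = [1 / math.sqrt(3)] * 3
```

With finitely many occupied levels, only finitely many frequencies (n − m)ω appear, which is the finite-dimensionality that Assumption (i) asks for in this setting.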
Here, Assumption (i) is equivalent to an upper (and lower) bound on the system's energy. In the general framework that does not assume the validity of quantum mechanics (or presuppose trust in our devices, or our assignment of Hamiltonians), we can view Assumption (i) as a natural generalization of this to other symmetry groups and beyond quantum theory. By assuming a concrete upper bound on the representation label (such as α in eq. (1)), we can establish powerful theory- and device-independent consequences for the resulting correlations, as we will now demonstrate by means of several examples.

B. Example: Two angles and Bell witnesses
Let us consider the simplest non-trivial spatiotemporal freedom, where Alice and Bob each have the choice of a single continuous angle, respectively α, β ∈ [0, 2π), and each obtain a binary output a, b ∈ {+1, −1}. Physically, this would arise, say, in experiments where a pair of photons is distributed to the two laboratories, each of which contains a rotatable polarizer followed by a photodetector (figure 1b).
Due to Theorem 1, the probabilities P(a, b | α, β) are linear combinations of matrix entries of an orthogonal representation of SO(2)×SO(2). From the classification of these representations (see appendix B 1), it follows that all SO(2)×SO(2)-boxes are of the form given in eq. (2), resulting in a correlation function of the form given in eq. (4), where J ∈ {0, 1/2, 1, 3/2, . . .} is some finite maximum "spin".
If Alice and Bob's laboratories are spatially separated, the laws of relativity forbid Alice from sending signals to Bob instantaneously. This "no-signalling" principle constrains the set of valid joint probability distributions: namely, Bob's marginal statistics cannot depend on Alice's choice of measurement, and vice versa. However, for any given correlation function of the form eq. (4), there is always at least one set of valid no-signalling probabilities (see appendix B 2): for example, those where the marginal distributions are "maximally mixed" such that, independent of α, a is +1 or −1 with equal probability (likewise for β and b), consistent with an observation of Popescu and Rohrlich [20].
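As a small illustration of the "maximally mixed" construction, the sketch below builds the joint distribution P(a, b | α, β) = (1 + ab C(α, β))/4 from a correlation function and checks Bob's marginal. The particular `example_corr` is a hypothetical choice with |C| ≤ 1, used only for illustration.

```python
import math

def box(a, b, alpha, beta, corr):
    """Joint probability with uniform marginals and correlation corr(alpha, beta)."""
    return (1.0 + a * b * corr(alpha, beta)) / 4.0

def bob_marginal(b, alpha, beta, corr):
    """Bob's marginal P_B(b | alpha, beta), summing over Alice's outcome."""
    return sum(box(a, b, alpha, beta, corr) for a in (+1, -1))

def correlation(alpha, beta, corr):
    """Recover C = sum_{a,b} a*b*P(a,b | alpha, beta) from the box."""
    return sum(a * b * box(a, b, alpha, beta, corr)
               for a in (+1, -1) for b in (+1, -1))

def example_corr(alpha, beta):
    # Hypothetical correlation function; the coefficients sum to 1, so |C| <= 1.
    return 0.6 * math.cos(alpha - beta) + 0.4 * math.sin(2 * alpha - beta)
```

Because Bob's marginal equals 1/2 for every choice of α, nothing about Alice's setting can be read off from Bob's local statistics, and the same holds with the roles exchanged.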
Consider a quantum example: two photons in a Werner state [21,22] parametrized by p ∈ [0, 1], for which projective polarization measurements at angles α and β yield the correlation function C(α, β) = −p cos[2(α − β)].
A paradigmatic question in this setup is whether the statistics can be explained by a local hidden variable (LHV) model. Namely, is there a single random variable λ over some space Λ such that P(a, b | α, β) = ∫_Λ dλ P_Λ(λ) P_A(a | α, λ) P_B(b | β, λ), where P_Λ(λ) is a classical probability distribution, and P_A(a | α, λ) and P_B(b | β, λ) are respectively Alice and Bob's local response functions (conditioned on their input choices α and β and the particular realization of the hidden variable λ)? If no LHV model exists, then the scenario is said to be nonlocal. Famously, Bell's theorem shows that quantum theory admits correlations that are nonlocal in this sense [1,2]. This follows from the violation of Bell inequalities that are satisfied by all distributions with LHV models, the archetypical example being the Clauser-Horne-Shimony-Holt (CHSH) inequality [3]: |C(α_1, β_2) + C(α_3, β_2) + C(α_3, β_4) − C(α_1, β_4)| ≤ 2, where α_1, α_3 are two choices of Alice's angle, and β_2, β_4 of Bob's. Classical systems always satisfy this bound, but quantum theory admits states and measurements that violate it. When working with a continuous parameter, Bell inequalities need not be limited to a subset of angles, but can also be formulated as a functional of the entire correlation function [23,24].
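The CHSH test can be evaluated directly for the Werner-state correlation function −p cos[2(α − β)] quoted in this section. The sketch below assumes the standard CHSH combination |C(α_1, β_2) + C(α_3, β_2) + C(α_3, β_4) − C(α_1, β_4)| ≤ 2 and one particular choice of angles (`ANGLES`). Both are illustrative assumptions rather than settings prescribed by the article.

```python
import math

def werner_corr(alpha, beta, p):
    """Correlation function for projective measurements on a Werner state."""
    return -p * math.cos(2.0 * (alpha - beta))

def chsh(corr, a1, a3, b2, b4):
    """Absolute value of the CHSH combination for a correlation function corr."""
    return abs(corr(a1, b2) + corr(a3, b2) + corr(a3, b4) - corr(a1, b4))

# Assumed settings (a1, a3, b2, b4), spaced by pi/8 along the chain.
ANGLES = (0.0, math.pi / 4, math.pi / 8, 3 * math.pi / 8)

def chsh_werner(p):
    """CHSH value reached by the Werner correlations at the assumed settings."""
    return chsh(lambda a, b: werner_corr(a, b, p), *ANGLES)
```

At these settings the CHSH value equals 2√2·p, so the local bound of 2 is exceeded exactly when p > 1/√2 ≈ 0.707.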
Not all correlations of the form in eq. (4) are allowed by quantum theory. For example, "science fiction" polarizers with the correlation function C(α, β) =  [25]. With this general form, we can make broad statements about whether correlations are local or nonlocal.
First, if the correlations are sufficiently "noisy", we can systematically construct a LHV model by generalizing a procedure by Werner [21] (see appendix B 3). If the only constraint on the correlations is that they have some maximum J, then the existence of a LHV model is guaranteed if the magnitude of angle-dependent changes in C is less than γ_J, as given in eq. (6). Subject to extra restrictions that keep the form of C simple, more permissive bounds are also derived. For instance, if there is only one non-zero coefficient in eq. (4), then γ = √2/π ≈ 0.4502. Recall the correlation function for projective measurements on a Werner state, −p cos[2(α − β)], and identify γ with p. In this case, our bound is comparable with that in Hirsch et al. [26] of p ≤ 0.6829.
Conversely, we can give a simple sufficient criterion for nonlocality if we separate the terms in eqs. (2) and (4) into relational and non-relational components. The relational components where m = n account for behaviour that depends only on the difference between the two angles. Purely relational correlations, i.e. ones with C(α, β) ≡ C(α − β), can be motivated by symmetry (i.e. that in the absence of external references, only the relative angle should have operational meaning). Here, the J = 1/2 case contains the bipartite rotationally invariant correlations discussed in Nagata et al. [27]. Alternatively, the correlations resulting from any experiment can be actively made relational, as we will describe in more detail below.
If the relational part of a correlation function C rel has an angle difference Θ + which results in near perfect (anti-)correlations, and another angle difference Θ − that does not, then one can systematically construct a (Braunstein-Caves [28]) Bell inequality that will be violated (see appendix B 4). Specifically, "near perfect" means that for a given J, where K J := √ 2π 2 J(2J + 1)(4J + 1)/3, and C rel (Θ − ) ≤ 1 − ∆ bounds the "other" value measured at Θ − . (See appendix B 4 for proof).
We summarize these results (see also figure 2):
Theorem 2. Consider a two-angle Bell experiment with correlations C in the form of eq. (4), with an upper bound J on the representation labels.
A. If C is sufficiently "noisy", in the sense that the magnitude of its angle-dependent changes is less than γ_J as in eq. (6), then the correlations can always be exactly accounted for by a LHV model.
B. If the relational part of C is sufficiently "pure" for some angle Θ_+ (above 1 − ε_J, as defined in eq. (7)), but also sufficiently different (below 1 − ∆) for some other angle Θ_−, then the correlations violate a Bell inequality.
FIG. 2: Two-angle relational correlation functions. A "sufficiently noisy" correlation function can always be reproduced exactly by a LHV model (Theorem 2A). This is represented by the green curve completely contained within the central green-shaded region (drawn for C_00 = 0). Conversely, if the function is "pure enough", then it must be nonlocal (Theorem 2B). This is represented by the blue curve with values in both extremal blue-shaded regions. Not all curves can be realized within quantum theory, but simple sinusoidal curves certainly can (such as the dashed black curve), following from Theorem 3 in two dimensions.
This is a powerful result: with a choice between two experimental settings for Alice, and no choice made by Bob, we can witness nonlocality. This can be done by the following protocol:
• Alice and Bob share some random angle λ, uniformly distributed in the interval [0, 2π).
• Alice now inputs α + λ into her half of the box, while Bob inputs λ.
• By repeating the protocol, they determine the correlations C_rel(Θ_+) and C_rel(Θ_−), and verify that they violate the inequality above.
The protocol assumes that Alice and Bob have some physically motivated promise on the maximum representation label J (e.g. by assuming an upper bound on the total energy of the system, or the number of elementary particles transmitted), and that they know the angles Θ + and Θ − beforehand. The latter assumption is analogous to standard Bell experiments, where the relevant measurement settings are assumed to be known.
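The step of making correlations relational via the shared angle λ can be sketched numerically. The code below averages C(θ + λ, λ) over an evenly spaced grid of λ values, which stands in for the uniformly random shared angle. The grid average and the particular `example_corr` are illustrative assumptions.

```python
import math

def relational_part(corr, theta, samples=64):
    """Average corr(theta + lam, lam) over lam on an even grid in [0, 2*pi).

    For trigonometric polynomials of degree below `samples`, the grid
    average coincides with the uniform average over the shared angle.
    """
    total = 0.0
    for k in range(samples):
        lam = 2.0 * math.pi * k / samples
        total += corr(theta + lam, lam)
    return total / samples

def example_corr(alpha, beta):
    # Hypothetical correlations: one relational term (depending on alpha - beta)
    # plus two non-relational terms that the averaging should remove.
    return (0.5 * math.cos(alpha - beta)
            + 0.3 * math.cos(2 * alpha - beta)
            + 0.2 * math.sin(beta))
```

In this example only the 0.5 cos(θ) term survives the averaging: the other two terms depend on λ and average to zero, leaving a function of the angle difference alone.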
Witnessing Bell nonlocality is not the same as directly demonstrating nonlocality (i.e. collecting all the statistics for a Bell test, which is only possible if Bob has some free choice too) but rather, subject to Assumption (i), implies the existence of an experiment that would demonstrate nonlocality. In contrast to a full Bell experiment, a Bell witness has the advantage of being experimentally easier to implement: the protocol above allows one to witness nonlocality with only two measurement settings instead of four. Note that only making the correlation function relational (i.e. going from C to C rel as above) without any additional assumption on J is not sufficient to obtain this reduction, as we show in appendix B 5.
Our protocol hence demonstrates that natural assumptions on the response of devices to spatiotemporal transformations can give additional constraints that allow for the construction of new Bell witnesses. This opens up the possibility of new methods of experimentally certifying nonlocal behaviour, similar to [11,12,29], but without the need to presume the validity of quantum theory or to trust all involved measurement devices.
Theorem 2 shows us that smaller values of J (and hence "simpler" responses to changes in angles) result in more permissive bounds for finding LHV models, or witnessing non-locality. In our next example, we shall move from angles (SO(2)) to directions (SO(d)), but consider arguably the simplest non-trivial response.

C. Example: Characterizing quantum correlations
For our last example, we shall apply our framework to characterize the set of correlations that can be realized by two parties sharing a quantum state, each locally choosing one of two binary-outcome measurements: the so-called quantum "(2,2,2)"-behaviours. The set of quantum (2, 2, 2)-behaviours Q is a strict superset of the classical (2, 2, 2)-behaviours C (i.e. those admitting a LHV model). However, the set of all no-signalling behaviours NS is strictly larger: C ⊊ Q ⊊ NS [20,30]. This has led to the search for simple physical or information-theoretic principles that would explain "why" nature admits no more correlations than in Q. Several candidates have been suggested over the years, including information causality [31], macroscopic locality [32], or non-trivial communication complexity [33], but none of these have been able to single out Q uniquely [34].
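The strict inclusions between the classical, quantum and no-signalling sets can be illustrated through the CHSH value, using well-known facts that are not derived in this article: deterministic local strategies reach at most 2, quantum correlations at most 2√2 (Tsirelson's bound), and the Popescu-Rohrlich (PR) box reaches 4. The sketch below brute-forces the local maximum and evaluates the PR-box correlators. The variable names are illustrative.

```python
import itertools
import math

def chsh_value(E):
    """CHSH combination for a 2x2 table of correlators E[x][y]."""
    return E[0][0] + E[0][1] + E[1][0] - E[1][1]

def max_local_chsh():
    """Maximize CHSH over all 16 deterministic local strategies.

    Each party outputs a fixed value +1 or -1 for each of its two inputs.
    Mixtures of such strategies cannot exceed this maximum by linearity.
    """
    best = -math.inf
    for a0, a1, b0, b1 in itertools.product((+1, -1), repeat=4):
        E = [[a0 * b0, a0 * b1], [a1 * b0, a1 * b1]]
        best = max(best, chsh_value(E))
    return best

# PR-box correlators: perfectly correlated except for the input pair (1, 1).
PR_BOX = [[1, 1], [1, -1]]

# Tsirelson's bound on the CHSH value attainable in quantum theory.
TSIRELSON = 2.0 * math.sqrt(2.0)
```

The three values 2 < 2√2 < 4 give one concrete witness of the chain of strict inclusions.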
Here, we will provide such a characterization by considering black boxes that transform in arguably the simplest manner. Over a spherical input domain Consequently, a black box that transforms fundamentally has an affine representation, . Motivated by symmetry, we consider a class of unbiased black boxes that do not prefer any particular output when averaged over all possible inputs. This implies that c^a_0 = 1/|A| for every a. For example, this symmetry holds for measurements on quantum spin-1/2 particles: spin +1/2 in one direction is the same as spin −1/2 in the opposite direction, and hence neither outcome is preferred on average.
Imagine Alice and Bob residing in d-dimensional space (d ≥ 2), sharing a non-signalling box P(a, b | x, y), where both inputs x, y ∈ S d−1 are spatial directions, and a, b each can take two values. Suppose that their conditional boxes transform fundamentally and are unbiased.
A conditional box P_A^{b,y}(a | x) := P(a, b | x, y) / P_B(b | y) describes the local black box Alice would have if she were told Bob's measurement choice y and outcome b. If all conditional boxes for Alice and Bob transform fundamentally, then the bipartite box is said to transform fundamentally locally. Similarly, if all conditional boxes are unbiased, P(a, b | x, y) is said to be locally unbiased.
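The definition of a conditional box can be made concrete with a short sketch. It assumes a singlet-like bipartite box P(a, b | x, y) = (1 − ab x·y)/4 over unit vectors in three dimensions. This choice of box, and the function names, are illustrative assumptions rather than an example taken from the article.

```python
def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def joint(a, b, x, y):
    """Assumed singlet-like box over unit vectors x, y; outcomes a, b are +1 or -1."""
    return (1.0 - a * b * dot(x, y)) / 4.0

def bob_marginal(b, x, y):
    """P_B(b | y), computed by summing the joint box over Alice's outcome."""
    return sum(joint(a, b, x, y) for a in (+1, -1))

def conditional(a, x, b, y):
    """Alice's conditional box: P(a, b | x, y) / P_B(b | y)."""
    return joint(a, b, x, y) / bob_marginal(b, x, y)

def negate(x):
    """Antipodal direction -x."""
    return [-xi for xi in x]
```

For this box the conditional probabilities are (1 − ab x·y)/2, which is affine in Alice's direction x. Antipodal inputs x and −x give complementary probabilities, so neither outcome is preferred when averaging over the sphere.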
Surprisingly, these local symmetries severely constrain the global correlations: they allow for only and exactly those correlations that can be realized by two parties who share a quantum state and choose between two possible two-outcome quantum measurements each, namely the quantum (2, 2, 2)-behaviours:
Theorem 3. The quantum (2, 2, 2)-behaviours are exactly those that can be realised by binary-outcome bipartite SO(d)×SO(d)-boxes that transform fundamentally locally and are locally unbiased, restricted to two choices of input direction per party per box, and statistically mixed via shared randomness.
The proof is given in appendix C.
A few remarks are in order. First, the unbiasedness refers to the total set S^{d−1} of possible inputs per party, not to the two inputs to which the box is restricted. Even if the unrestricted behaviour is unbiased in the sense described above, the resulting (2, 2, 2)-behaviour can be biased. Secondly, this unbiasedness of the underlying SO(d) × SO(d)-box is necessary to recover the quantum correlations: without it, one can realize arbitrary no-signalling correlations, including PR-box behaviour, in a way that still transforms fundamentally locally (we give an example in appendix C). Finally, shared randomness is necessary to realize explicitly non-extremal quantum correlations by such boxes, following on the observation that the set of (2, 2, 2)-behaviours realizable by POVMs on two qubits is not convex [35,36]. Namely, if both parties share the (2, 2, 2)-behaviours P_0 and P_1 and a random bit c ∈ {0, 1} that equals 0 with probability λ, they can statistically implement the mixed behaviour λP_0 + (1 − λ)P_1 by feeding their inputs into box P_c.
For n = 2 parties with m = 2 measurements and k = 2 outcomes each, our result provides a characterization of the quantum set. Although Theorem 3 cannot be extended to general (m, n, k)-behaviours [37] without modification, this result shows that our framework of G-boxes offers a very natural perspective on physical correlations, and reinforces earlier observations that hint at a deep fundamental link between the structures of spacetime and quantum theory [38][39][40][41].

III. DISCUSSION AND OUTLOOK
We have introduced a general framework for semi-device-independent information processing, without assuming quantum mechanics, for black boxes whose inputs are degrees of freedom that break spatiotemporal symmetries.
Such black boxes have characteristic probabilistic responses to symmetry transformations, and natural assumptions about this behaviour can certify technologically important properties like the presence or absence of Bell correlations.
Specifically, we have shown that the quantum (2, 2, 2)-behaviours can be exactly classified as those of bipartite boxes that transform locally in the simplest possible way: by the fundamental representation of SO(d) rotations, respecting the unbiasedness of outcomes. For Bell experiments with SO(2) × SO(2)-boxes, we have shown that correlations that are quantifiably "noisy enough" always admit a local hidden variable model, whereas relational correlations for which there are settings with differing "purity" must violate a Bell inequality. Since the underlying technical tools (e.g. Schur orthogonality [42]) hold in greater generality, many of our results could be applied to other groups.
Furthermore, these results have allowed us to construct a protocol to witness the violation of a Bell inequality within a causal structure that is otherwise too simple to admit the direct detection of nonlocality. We believe that our approach can be applied to experimental settings, such as the recent demonstration of Bell correlations in a Bose-Einstein condensate [12], and potentially eliminate the necessity to trust all detectors or to assume the exact validity of quantum mechanics. Many of these experiments do work with spatiotemporal inputs like Rabi pulses, which makes our approach particularly natural for analyzing them.
We have predominantly worked under the assumption that ensembles of black boxes are characterized by a finite number of parameters, and, more specifically, that an upper bound on the representation label (say, the "spin" J) of the boxes is known. On one hand, this assumption can likely be weakened by employing group-theoretic results such as the Peter-Weyl theorem [42]. On the other hand, we have argued that this assumption is natural: it is weaker than assuming a Hilbert space with bounded dimension (standard in the semi-device-independent framework [16]) and constitutes a generalization of an "energy bound" beyond quantum theory (cf. [43]).
Moreover, it incorporates an intuition conceptually closer to particle physics: to quantify the potential eavesdropping side channels, one might not count Hilbert space dimensions, but rather representation labels, since these are intuitively (and sometimes rigorously) related to the total number of particles.
Our framework opens up several potential avenues for future work.
First, as the witness example demonstrates, our formalism hints at novel semi-device-independent protocols based on assumptions with firmer physical motivation than the usual dimension bounds. In contrast to recent proposals for using energy bounds [44][45][46], our assumption on the devices' symmetry behaviour does not presume the validity of quantum mechanics, but rather embodies a natural upper bound to the "fine structure" of the devices' response. Meanwhile, one might apply the functional approach [23,24] to our framework by taking Haar integrals over spatiotemporal input spaces to derive a device-independent family of generalized Bell-Żukowski inequalities for various limits of fine structure.
Secondly, our framework informs novel experimental searches for conceivable physics beyond quantum theory. Previous proposals (e.g. superstrong nonlocality [20] or higher-order interference [47,48]) have simply described the probabilistic effects without predicting how they could actually occur within spacetime as we know it. This has made the search for such effects seem like the search for a needle in a haystack [49]. Our formalism promises a more direct spatiotemporal description of such effects -hopefully leading to predictions that are more tied to experiments and in greater compatibility with spacetime physics.
Combining the principles of quantum theory with special relativity has historically been an extremely fruitful strategy. Here, we propose to extend this strategy to device-independent quantum information and even beyond quantum physics. In principle, suitable extensions of our framework would allow us to address questions such as: which probability rules are compatible with Lorentz invariance? Any progress on these kinds of questions has the potential to give us fascinating insights into the logical architecture of our physical world.

ACKNOWLEDGMENTS
We are grateful to Miguel Navascués, Matt Pusey, and Valerio Scarani for discussions.
This project was made possible through the support of a grant from the John Templeton Foundation. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation. We acknowledge the support of the
Let us first furnish a mathematical description of a black box as an input-output process. We begin with the single party case (say, Alice). Suppose the domain of Alice's inputs is the set X, and of her outputs is the finite set A. As motivated in the main text, we are interested in the case where X is a homogeneous space. That is, we have a group G that acts transitively on the set of inputs X, such that X = G/H, and H ⊆ G is the corresponding stabilizer subgroup. The paradigmatic example is given by X = S^{d−1}, G = SO(d) and H = SO(d−1) ⊂ G, such that the inputs x ∈ X are unit vectors. Even though the inputs need not be vectors in general, we will use the vector notation in the following for convenience. We will assume that G is a locally compact group, such that all bounded finite-dimensional representations are unitary [50].
For such an input domain, we can assign an arbitrary "default input" x 0 ∈ X , such that every other input x ∈ X can be written as x = R x x 0 for some suitable transformation R x ∈ G. Physically, we can imagine that Alice chooses her input by "rotating" the default input x 0 into her desired direction x, and she can do so by applying a suitable rotation R x . In general, R x is not unique, and Alice's freedom of choice of transformation is given by H.
A black box P is then a map P : X → R^{|A|} such that for x ∈ X, P_a : x → P(A = a | X = x), where P_a is the a-th element of the vector map. Since probabilities satisfy 0 ≤ P(A = a | X = x) ≤ 1, each P_a is a non-negative bounded real function on X. Probabilities also obey the constraint $\sum_a P_a(x) = 1$ for all x, so the range of the vector function P is actually an (|A| − 1)-dimensional simplex (a compact convex subset of R^{|A|}). As such, P ∈ B(X)^{|A|}, where B(X) is the set of bounded functions on X.
Definition 1 (G-box). A black box (formalized above) whose input domain X is a homogeneous space acted transitively upon by the group G is known as a G-box.
Proof of Theorem 1. Consider a G-box whose ensemble behaviour can be characterized by a finite number of parameters (Assumption (i)). There is a representation of the symmetry group G in terms of real orthogonal matrices R → T R , such that for each outcome a, the outcome probabilities P(a | x R ) are a fixed (over R) linear combination of matrix entries of T R .
Proof. Suppose Alice has a black box P, and access to a geometric freedom G acting on X. For each R ∈ G, Alice can induce a new black box P′ by first applying R to her input x and then supplying the input Rx to P. The new box acts as P′_a : x → P(a | Rx), i.e. P′(a | x) = P(a | Rx).
For each R, we can define a map T_R : P → P′, acting on each component of P via T_R P_a = P′_a. Obviously, T_R ∘ T_S = T_{RS}, so if we denote the "space of black boxes" accessible to Alice by Ω_G := {T_R P | R ∈ G} ⊆ B(X)^{|A|}, then T_R defines a group action on Ω_G.
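As a concrete illustration of the induced boxes, the following minimal sketch takes G = SO(2) acting on the circle, with inputs parametrized by an angle. The box `example_box`, its probabilities, and the helper `induce` are hypothetical and chosen only for illustration; they do not appear in the derivation above.

```python
import math

def example_box(theta):
    """Hypothetical two-outcome box on the circle: (P(+1 | theta), P(-1 | theta))."""
    p_plus = 0.5 + 0.4 * math.cos(theta)
    return (p_plus, 1.0 - p_plus)

def induce(box, rotation):
    """Induced box T_R P, i.e. x -> P(a | R x), for a rotation by `rotation` radians."""
    return lambda theta: box(theta + rotation)

r, s = 0.3, 1.1
lhs = induce(induce(example_box, s), r)  # T_R (T_S P)
rhs = induce(example_box, r + s)         # T_{RS} P; SO(2) is Abelian, so ordering is immaterial
```

Evaluating `lhs` and `rhs` at any angle gives the same probability vector, which is the composition property used above, checked here only for the Abelian group SO(2).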
Consider the linear extension Ω^ℝ_G := span(Ω_G), a linear subspace of B(X)^{|A|}, with elements $Q = \sum_{i=1}^{n} \lambda_i P_i$, where n ∈ N is arbitrary but finite, all λ_i ∈ R, and P_i ∈ Ω_G. Note that Q : X → R^{|A|}, but without further restriction on {λ_i} its image may lie outside the simplex of normalized probabilities. Now, consider the effect of R ∈ G on some object Q. Since Q : X → R^{|A|}, we can define a map $\tilde{T}_R$ via $\tilde{T}_R Q := Q \circ R$ as an extension of the map T_R. By construction, every $\tilde{T}_R$ is a linear map. Since $\tilde{T}_R$ is an extension of T_R, we drop the tilde from our notation. As we have assumed that ensembles of black boxes can be characterized by a finite number of parameters, the linear space Ω^ℝ_G is finite-dimensional. The T_R, as linear maps acting on a finite-dimensional real vector space, may then be expressed as real matrices.
Next, we need to show that the representation R → T_R is bounded, i.e. that $\sup_{R \in G} \|T_R\| < \infty$. This will exclude, for example, cases like G = (R, +) and $T_t := \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$. To this end, let P_1, …, P_D ∈ Ω_G be a linearly independent set of boxes that spans Ω^ℝ_G (that is, a basis of boxes, hence D = dim Ω^ℝ_G). Then, every P ∈ Ω^ℝ_G has a unique representation $P = \sum_{i=1}^{D} \alpha_i P_i$, and $\|P\|_1 := \sum_{i=1}^{D} |\alpha_i|$ defines a norm on Ω^ℝ_G. We can define another norm on this space via $\|P\| := \sup_{x \in X} \sum_a |P_a(x)|$. This is finite since P ∈ B(X)^{|A|}, and it is easy to check that it satisfies the properties of a norm. Since all norms on a finite-dimensional vector space are equivalent, there is some c > 0 such that $\|\cdot\|_1 \le c\, \|\cdot\|$. Furthermore, all P ∈ Ω_G satisfy ‖P‖ = 1. Thus, noting that T_R P_i ∈ Ω_G for all i = 1, …, D, we get $\|T_R P\| = \left\| \sum_{i=1}^{D} \alpha_i T_R P_i \right\| \le \sum_{i=1}^{D} |\alpha_i| \, \|T_R P_i\| = \|P\|_1 \le c\, \|P\|$. This establishes that the operator norm of all T_R with respect to ‖·‖ (and hence with respect to all other norms) is uniformly bounded. Since we have assumed that G is locally compact, this implies that there is a basis of Ω^ℝ_G in which the T_R are orthogonal matrices.
Consider now the evaluation functional δ^a_x : Ω^ℝ_G → R; namely, the map from the space of black boxes to the particular probability of outcome a given input x. It follows that the statistics satisfy P(a | x) = P(a | R x_0) = (T_R P)_a(x_0) = δ^a_{x_0}(T_R P). Since the evaluation functional is a linear map, the probabilities are given by a linear combination of entries of T_R. For all x ∈ X, we use the same P and the same δ^a_{x_0}, so the only element that changes is the representation matrix T_R.
Arguing via harmonic analysis on homogeneous spaces [51], we expect that Theorem 1 can be extended: it is not only entries of T R that appear in the probability table P (a|x R ), but, more specifically, generalized spherical harmonics.
A taste of this appears in Lemma VIII, but since the formulation of Theorem 1 is sufficient for the purpose of this article, we defer this extension to future work.
where J is some non-negative integer or half-integer.
(Note that this does not yet assume the no-signalling principle.)
Proof. While the representation T_R = T_{α,β} from Theorem 1 acts on a real vector space V of finite dimension D, we can also regard it as a representation on the complexification W = V ⊕ iV. Since SO(2) × SO(2) is an Abelian group, all its irreducible representations are one-dimensional [42]. Thus, we can decompose W as $W = \bigoplus_{j=1}^{D} W_j$, where each W_j is a one-dimensional invariant subspace on which T_{α,β} acts as a complex phase. It follows that $T_{\alpha,\beta} = \bigoplus_{j=1}^{D} \exp(i(m_j \alpha - n_j \beta))$ with suitable integers m_j, n_j ∈ Z (to see this, write T_{α,β} as a composition of the SO(2) representations T_{α,0} and T_{0,β}). Then, due to Theorem 1, P(a, b | α, β) must be a linear combination of real and imaginary parts of T_{α,β}, which proves that it is of the form (B1).

Generic no-signalling correlations
It is well-known (e.g. [20]) that the no-signalling principle does not impose any constraints on the form of the correlation function if we have a bipartite box with two outcomes a, b ∈ {+1, −1} each. Namely, if X, Y denote two arbitrary sets of inputs, then given an arbitrary function C : X × Y → R with −1 ≤ C(x, y) ≤ 1 for all x ∈ X, y ∈ Y, the simple prescription $P(a, b \mid x, y) := \tfrac{1}{4} + \tfrac{1}{4}\, ab\, C(x, y)$ generates a valid no-signalling distribution that has C(x, y) as its correlation function. It is a simple exercise to check that P is non-negative, normalized and no-signalling, and that C is the correlation function for P.
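The "simple exercise" can also be checked numerically. The sketch below builds P(a, b | x, y) = 1/4 + ab C(x, y)/4 for an arbitrary bounded function C; the particular C and the helper names `ns_box`, `marginal_A` and `correlation` are assumptions made for illustration only.

```python
import math

def ns_box(C):
    """Build P(a, b | x, y) = 1/4 + a*b*C(x, y)/4 from a function C with |C| <= 1."""
    def P(a, b, x, y):
        return 0.25 + 0.25 * a * b * C(x, y)
    return P

def marginal_A(P, a, x, y):
    """Alice's marginal, summed over Bob's outcome; no-signalling means no y-dependence."""
    return sum(P(a, b, x, y) for b in (+1, -1))

def correlation(P, x, y):
    """Correlation function sum_{a,b} a*b*P(a, b | x, y)."""
    return sum(a * b * P(a, b, x, y) for a in (+1, -1) for b in (+1, -1))

# Arbitrary illustrative correlation function with values in [-1, 1]
C = lambda x, y: math.sin(3 * x) * math.cos(x * y)
P = ns_box(C)
```

For any inputs, `marginal_A` returns 1/2 regardless of Bob's setting, and `correlation(P, x, y)` reproduces `C(x, y)`.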

Local hidden variable models for SO(2) × SO(2) settings
Generalizing ideas of Werner [21], we can show that for noisy enough correlation functions of SO(2) × SO(2) settings, we can always construct a LHV model that achieves these correlations.
We will construct local probabilities that implement the dependence on α, β, λ in the following form: where q A and q B are response functions defined in the following way: q A (−| λ 1 , . . . , λ N ) = 1 − q A (+| λ 1 , . . . , λ N ), is some (small) constant and φ j is the angle such that λ j = (cos φ j , sin φ j ) T . Furthermore, j + s 2 j ) ≤ 2 and |λ | 2 = j | λ j | 2 = N , the sum hence is upper-bounded by √ 2N due to the Cauchy-Schwarz inequality. This shows that q B yields valid probabilities.
We calculate the joint probability distribution obtained in the Bell test scenario: We apply the substitution λ j := R mj α λ j and λ := ( λ 1 , . . . , λ N ), noting that this does not change the integral due to our choice of measure: Due to the definition of q A and q B , this equals (B15) Noting that ξ −ξ f j (φ j )dφ j = 2 sin ξ c j cos(m j α − n j β) +s j sin(m j α − n j β) , (B16) we can evaluate the integral explicitly, obtaining Next, let us look at the other probabilities: and the final integral vanishes on all b j · λ j -terms of q B , leaving only the constant term 1/2. That is, These give the correlation function Finally, we define γ N as the largest admissible prefactor among all possible choices of ξ.
Let us now introduce a constant term: Lemma III. Consider any two-angle correlation function where (m j , n j ) ∈ Z × Z \ (0, 0), and (as above) without loss of generality we choose m j ≥ 0 and n j > 0 if m j = 0, and disallow (m i , with constant γ N given by then this correlation function has a local hidden variable model.
Proof. First, we obtain the form of γ N by solving the optimization problem of Lemma II exactly for N = 1, 2, 3, and by substituting x = π 1 − 1 N and using (B24) We can add the constant function 1/ √ 2 to the orthonormal system in (eq. (B5)); similar reasoning as in the proof of Lemma II shows that ( √ 2c 0 ) 2 ≤ 2, i.e. that −1 ≤ c 0 ≤ 1, and |c 0 | = 1 is only possible if C(α, β) = c 0 (i.e. with no angle-dependent terms). Now consider the case 0 ≤ c 0 < 1. We can write where 1 is the constant function that takes the value 1 on all angles, and f (α, β) = (C(α, β) − c 0 )/(1 − c 0 ) is of the form of the function in Lemma II. If inequality (B22) holds, then and so Lemma II proves that f (α, β) is a classical correlation function. Moreover, 1 is trivially a classical correlation function, and thus so must be C(α, β), which is a convex combination of the two. Then case −1 < c 0 < 0 can be treated analogously, using that −1 is a classical correlation function too.
Proof of Theorem 2A. Consider an SO(2) × SO(2) box, with a correlation function in the form of eq. (4) with maximum (half-)integer J = 0. If where C 00 is the angle-independent contribution to the correlation function (as in eq. (4)), and γ J is a given then there is a LHV model that accounts for these correlations.
Proof. This follows as a corollary of Lemma III. We convert between the form of correlations in eq. (4) and eq. (B21) by counting the maximum number N of unique terms that could appear for a given positive (half-)integer J. The double sum contributes (2J + 1)(4J + 1) terms, from which we remove 2J cases corresponding to negative n where m = 0, and the one completely constant case m = n = 0. This gives a maximum of N = 4J(2J + 1). Since the lowest value (J = 1/2) already yields N = 4 unique terms, we only need the final case of eq. (B23), and hence the constant
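The counting argument can be checked by brute-force enumeration. The sketch below assumes that m ranges over the integers 0, …, 2J and n over −2J, …, 2J, which is the reading that reproduces the (2J + 1)(4J + 1) terms quoted above; the function name `count_terms` and the parametrization by 2J are illustrative choices.

```python
def count_terms(two_J):
    """Count admissible (m, n) pairs for maximum (half-)integer J = two_J / 2."""
    pairs = []
    for m in range(0, two_J + 1):
        for n in range(-two_J, two_J + 1):
            if m == 0 and n <= 0:
                # Drop negative n when m = 0, and the constant term m = n = 0.
                continue
            pairs.append((m, n))
    return len(pairs)
```

With J = two_J / 2, the closed form 4J(2J + 1) equals `2 * two_J * (two_J + 1)`, so `count_terms(1)` corresponds to the N = 4 terms at J = 1/2.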

Witnessing nonlocality
Bell inequalities can be chained by direct addition. For instance, suppose one takes a CHSH inequality (eq. (5)) with measurements {x_1, y_2, x_3, y_4} and a second with measurements {x_1, y_4, x_5, y_6}. Adding these together yields C(x_1, y_2) + C(x_3, y_2) + C(x_3, y_4) + C(x_5, y_4) + C(x_5, y_6) − C(x_1, y_6) ≤ 4. This can inductively be done for a set of N measurements (N/2 each for Alice and Bob), leading to a chained Bell inequality, known as the Braunstein-Caves inequality (BCI) [28]: $C(x_1, y_2) + C(x_3, y_2) + C(x_3, y_4) + \dots + C(x_{N-1}, y_N) - C(x_1, y_N) \le N - 2$. (B29) If such an inequality is violated, then no LHV model can account for these statistics².
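To make the chaining concrete, the following sketch evaluates the chained sum for N settings and compares it with the bound N − 2 implied by adding CHSH inequalities as above (2 for N = 4, 4 for N = 6). The correlation function C(α, β) = cos(β − α), the evenly spaced settings, and the helper names are assumptions for illustration and are not the specific settings used in the lemmas that follow.

```python
import math

def chained_bell_value(C, xs, ys):
    """Chained sum C(x1,y2) + C(x3,y2) + C(x3,y4) + ... + C(x_{N-1},y_N) - C(x1,y_N).

    xs = [x1, x3, ...] and ys = [y2, y4, ...] each hold N/2 settings.
    """
    half = len(xs)
    total = 0.0
    for k in range(half):
        total += C(xs[k], ys[k])          # term C(x_{2k+1}, y_{2k+2})
        if k + 1 < half:
            total += C(xs[k + 1], ys[k])  # term C(x_{2k+3}, y_{2k+2})
    total -= C(xs[0], ys[-1])             # closing term, entering with a minus sign
    return total

def evenly_spaced_settings(N, step):
    """Alternate Alice's and Bob's angles along a line with spacing `step`."""
    xs = [2 * k * step for k in range(N // 2)]
    ys = [(2 * k + 1) * step for k in range(N // 2)]
    return xs, ys

# Illustrative relational correlation function (an assumption of this sketch)
C = lambda alpha, beta: math.cos(beta - alpha)
```

For example, `chained_bell_value(C, *evenly_spaced_settings(4, math.pi / 4))` exceeds the N = 4 bound of 2, as expected for the CHSH case.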
Recall, eq. (4) gives the generic SO(2) × SO(2) correlation function. If we restrict ourselves to relational correlations, this amounts to setting S_{mn} = C_{mn} = 0 when m ≠ n, such that the correlation function has a single parameter form
Proof. We show this by construction. For even N, define We use the notation "x mod 2π" to indicate x − 2πn where n ∈ Z is chosen such that x − 2πn ∈ [0, 2π), mapping the angle to the principal range. We construct an N-measurement BCI, as defined in eq. (B29). Since the correlation function is relational, we write C(α, β) as the single parameter function C(β − α), and assign the measurement settings: (See figure 3.) This amounts to setting the arguments of the correlation functions featured in the BCI to where equality is taken modulo 2π. With such assignments, the BCI is then written: (B34) Recall that C(Θ_−) = −1. C(Θ_+) = +1 must be a local maximum, and a finite J allows us to assume the function C is smooth at this point. Thus, in the limit of small δ_N, where k ≥ 0 is some constant. We then rewrite eq. (B34) as which for large enough N will eventually be violated.
Proof. First, since the correlation function C is continuous, it attains its global maximum at some Θ′_+. Since C(Θ′_+) ≥ C(Θ_+) ≥ 1 − ε, the premises of this lemma are also satisfied if Θ_+ is replaced by Θ′_+, i.e., we can assume without loss of generality that C attains its global maximum at Θ_+.
With these Θ_+ and Θ_−, we use the prescription in Lemma IV, with the angle choices in eq. (B32) to generate the following BCI, which must hold for all even integers N ≥ 2 if there exists a LHV model: (B42) Let us write δΘ := Θ_− − Θ_+ mod 2π, such that δ_N = δΘ/(N − 1). Let K ∈ R be any constant such that C″(x) ≥ −K for all x; it follows from Lemma V that such K exists, and we will fix K later in accordance with that lemma. Since Θ_+ is a local maximum, we know that 0 ≥ C″(Θ_+) ≥ −K, i.e. K ≥ 0. The global bound on the second derivative of C gives us Thus, eq. (B42) implies Under what conditions does there exist an even integer N ≥ 2 such that this inequality is violated, i.e. the existence of a LHV model is ruled out? The negation of this inequality can be rearranged into a quadratic inequality in (N − 1): If this inequality has a solution for some even integer N, then the non-existence of a LHV model follows. If ε = 0, then there will always be a solution for large enough N, recovering Lemma IV. Thus, we here only give further consideration to the case where ε > 0. Since this quadratic function in (N − 1) is positive for large values of ±(N − 1), it is necessary for the existence of a negative value that this function has zeroes over the real numbers. The zeroes are and so the following inequality is necessary for the existence of a solution of eq. (B45): If it is satisfied, then the values of (N − 1)_± are well-defined, and we can continue to argue as follows. The quadratic function in eq. (B45) is negative for all real numbers N ∈ (N_−, N_+), where 0 < N_− < N_+. Now, this interval definitely contains an even integer N if the difference N_+ − N_− is larger than two, which holds if and only if The two solutions of the corresponding quadratic equation are They are both real, and ε_− < 0 < ε_+. Thus, ε < ε_+ implies a suitable solution of eq. (B45), i.e. rules out the existence of a LHV model. In fact, if ε < ε_+, then we automatically get (B50) i.e. eq. (B47) is automatically satisfied. Now, considering ε_+ as a function of δΘ, this function is decreasing for δΘ > 0. Since δΘ ≤ 2π, ε < ε_+(2π) implies ε < ε_+(δΘ) ≡ ε_+. Thus, the inequality implies a violation of a BCI. The statement of the lemma now follows from taking the value of K from Lemma V, and by substituting K J := π 2 K.
Proof. Subtracting the "non-relational" parts of C(α, β) is equivalent to performing the following integration: $C_{\mathrm{rel}}(\alpha, \beta) = \frac{1}{2\pi} \int_0^{2\pi} C(\alpha + \varphi, \beta + \varphi)\, d\varphi$. This may be directly verified by noting that terms of the form cos(mα − nβ + (m − n)φ) and sin(mα − nβ + (m − n)φ) individually integrate to 0 over φ except when m = n. This allows us to interpret taking the relational core of a correlation function as mixing C over many settings offset by a shared uniform random angle. It then follows from the convexity of Bell inequalities that if the BCI implied by Lemma VI for the relational core is violated "on average" for this mixture of settings, there must be at least one single set of input settings that also results in that BCI being violated.
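The averaging over a shared offset can be illustrated numerically. The sketch below assumes the relational core is obtained as the uniform average of C(α + φ, β + φ) over φ ∈ [0, 2π), and uses a hypothetical correlation function with one relational term (m = n = 1) and one non-relational term (m = 2, n = 1); the coefficients and the name `relational_core` are illustrative.

```python
import math

def relational_core(C, alpha, beta, samples=2000):
    """Average C(alpha + phi, beta + phi) over a uniform grid of shared offsets phi."""
    total = 0.0
    for k in range(samples):
        phi = 2 * math.pi * k / samples
        total += C(alpha + phi, beta + phi)
    return total / samples

# Hypothetical correlation function: relational part plus a non-relational part
C = lambda a, b: 0.5 * math.cos(a - b) + 0.3 * math.cos(2 * a - b)
```

The average leaves the m = n term 0.5 cos(α − β) untouched and removes the m ≠ n term, so the result depends only on the difference of the two angles.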

Necessity of a bound on J
We will now show that our protocol for witnessing nonlocality does not work if we simply drop the assumption that J is finite.
A correlation function C(α, β) has a LHV model if and only if there exists a variable λ ∈ Λ, distributed via some P_Λ(λ), and a family of local response functions $C_A(\alpha, \lambda) := \sum_{a \in \{-1,+1\}} a\, P_A(a \mid \alpha, \lambda)$ and $C_B(\beta, \lambda) := \sum_{b \in \{-1,+1\}} b\, P_B(b \mid \beta, \lambda)$ such that $C(\alpha, \beta) = \int_\Lambda d\lambda\, P_\Lambda(\lambda)\, C_A(\alpha, \lambda)\, C_B(\beta, \lambda)$. (B54)
Suppose that there are two angles θ_+, θ_− such that our protocol gives correlation values C_rel(θ_±) very close to ±1. If this is the only experimental syndrome, without further assumptions on the form of the correlation function (in particular, without any assumption on J as explained in the main text), then this experimental behavior can always be reproduced to arbitrary accuracy by local hidden variables. Namely, θ_± can be arbitrarily well approximated by angles θ′_± that satisfy the premises of the following lemma:
Lemma VII. Suppose that θ_+, θ_− ∈ [0, 2π) are such that θ_− − θ_+ = (m/n)π, where n ∈ N and m ≤ n is an odd integer. Then there exists a local relational correlation function C(α, β) ≡ C(α − β) such that C(θ_+) = +1 and C(θ_−) = −1. (B55)
Proof. We set Λ = [0, 2π) and P_Λ(λ) = 1/(2π), the uniform measure on this interval. Without loss of generality, assume that θ_− > θ_+ and θ_+ = 0 (we can choose our local coordinates α, β to make this the case). Define That is, f is a square-wave function of period 2π/n. This f is piecewise continuous and satisfies f(x + π/n) = −f(x) for all x. Thus, f(x + 3π/n) = −f(x + 2π/n) = f(x + π/n) = −f(x) for all x, and in particular, by induction, f(x + mπ/n) = −f(x) for all x since m is odd by assumption. Now, defining C(α, β) as in (B54), this correlation function is relational, since by substitution and due to the (2π)-periodicity of f. Furthermore, Therefore, simply following our protocol but relaxing our assumption on J (while not imposing any other assumptions) cannot be sufficient to certify nonlocality.
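The square-wave construction can be sketched numerically. The code below is one possible instance and not necessarily the exact definition used in the proof: it assumes f(x) = sign(sin(nx)), response functions C_A(α, λ) = f(α − λ) and C_B(β, λ) = f(β − λ), a uniform λ on [0, 2π), and θ_+ = 0. All function names are hypothetical.

```python
import math

def square_wave(x, n):
    """Square wave of period 2*pi/n with values +1 and -1, so f(x + pi/n) = -f(x)."""
    return 1.0 if math.sin(n * x) >= 0 else -1.0

def lhv_correlation(theta, n, samples=10000):
    """C(theta) = (1/2pi) * integral of f(theta - lam) * f(-lam) over lam.

    Uses a midpoint rule so that no sample lands on a jump of the square wave.
    """
    total = 0.0
    for k in range(samples):
        lam = 2 * math.pi * (k + 0.5) / samples
        total += square_wave(theta - lam, n) * square_wave(-lam, n)
    return total / samples
```

With n = 5 and m = 3, `lhv_correlation(0.0, 5)` is +1 and `lhv_correlation(3 * math.pi / 5, 5)` is −1, so perfect correlation and anticorrelation at the two angles are reproduced by a purely local model.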

Appendix C: Characterizing quantum correlations
Let us consider black boxes that have a particularly simple transformation behaviour under rotations:
Definition 2 (Transforming fundamentally). Consider an SO(d)-box P(a | x), where d ≥ 2. Let x_0 ∈ X. We say that this box transforms fundamentally under rotations if for all x ∈ X and all R_x ∈ G with R_x x_0 = x one finds $P(a \mid x) = c^a_0 + \sum_{i,j=1}^{d} c^a_{i,j}\, (R_x)_{i,j}$, (C1) where (R_x)_{i,j} is the fundamental matrix representation of R_x ∈ G, and c^a_0, c^a_{i,j} are constants independent of x and R_x.
Equivalently, a black box transforms fundamentally if the corresponding representation R → T_R from Theorem 1 can be chosen as a direct sum of copies of the trivial and the fundamental representations of G = SO(d). Since G is transitive on X, the existence of the above representation is independent of the particular x_0: any alternative x′_0 ∈ X satisfies x′_0 = S x_0 for some S ∈ G, satisfying the above with R′ = R S^{−1}.
Lemma VIII. Suppose that X = S^{d−1} (the unit sphere), and P(a | x) transforms fundamentally under rotations in G = SO(d). Then, $P(a \mid x) = c^a_0 + c^a \cdot x$, (C2) where constants c^a_0 ∈ R and c^a ∈ R^d satisfy The fact that this group is isomorphic to SO(d − 1) is precisely due to the fact that our set of inputs is the homogeneous space X = SO(d)/SO(d − 1), i.e. the (d − 1)-sphere. For d ≥ 3, we have $\int_{SO(d-1)} T\, dT = 0$, and thus Since R_x S e_1 = x for every S ∈ G_{e_1}, every rotation matrix R_x S can be substituted for R_x into definition 2. Thus $P(a \mid x) = c^a_0 + \sum_{i,j=1}^{d} (R_x S)_{i,j}\, c^a_{i,j}$. By taking the average over all S ∈ G_{e_1} according to the Haar measure, we get But R_x Q = (x, 0, …, 0), i.e. a matrix with first column equal to x and all further columns equal to zero. This proves that P(a | x) is affine-linear in x as claimed, in the case d ≥ 3. Now consider the case d = 2. Here, x = (x_1, x_2)^T, and there is a unique choice of R_x, namely $R_x = \begin{pmatrix} x_1 & -x_2 \\ x_2 & x_1 \end{pmatrix}$. Then, P(a | x) being affine-linear in R_x is equivalent to being affine-linear in x.
From normalization, $\sum_a P(a \mid x) = 1$ for all x ∈ X, which by transitivity of G on X can be re-written as $\sum_a P(a \mid R x_0) = 1$ for all R ∈ G. Suppose we take the Haar average over G of both sides of this constraint: where we have used $\int_{SO(d)} R\, dR = 0$. Since $\sum_a P(a \mid R x_0) = 1$ for each individual R ∈ G, we have $(\sum_a c^a) \cdot (R x_0) = 0$. Since by transitivity dim[span({R x_0}_{R∈G})] = d, it follows that $\sum_a c^a = 0$.
For any x, one may find some R ∈ SO(d) such that Rx = −x (since both x and −x ∈ S^{d−1} and SO(d) is transitive on S^{d−1}). Hence, for the black box P(a | x), there is always another black box P(a | −x) such that the average statistics of these two boxes is given by ½[P(a | x) + P(a | −x)] = c^a_0. Clearly, then c^a_0 ≥ 0. Finally, from the definition of the dot product, . The converse follows from the transitivity of SO(d) on S^{d−1}: any x can be expressed as R x_0 for some fixed x_0 and R ∈ SO(d). Thus eq. (C2) can be written P(a | x) = c^a_0 + c^a · R x_0, which has the form of eq. (C1).
We can formally define the concept of an "unbiased" black box where if the input orientation is randomized, no particular outcome is preferred:
Definition 3 (Unbiased). Consider a G-box P(a | x) for some compact group G. We say that this box is unbiased if the Haar average of P(a | Rx) over R ∈ G is the same for all a ∈ A.
It follows from normalization that if a black box transforms fundamentally under rotations and is unbiased, it may be written in the form P(a | x) = 1/|A| + c^a · x.
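A small numerical sketch of this form may help. The vectors c^a below are hypothetical and chosen so that they sum to zero and keep all probabilities in [0, 1]; the helper name `unbiased_box` is likewise an assumption of this example.

```python
import math

def unbiased_box(c_vectors):
    """Return P(a | x) = 1/|A| + c^a . x for outcome-indexed vectors c^a summing to zero."""
    num_outcomes = len(c_vectors)
    def P(a, x):
        return 1.0 / num_outcomes + sum(ci * xi for ci, xi in zip(c_vectors[a], x))
    return P

# Hypothetical two-outcome example on the 2-sphere with c^{+1} = -c^{-1}
c = {+1: (0.0, 0.0, 0.5), -1: (0.0, 0.0, -0.5)}
P = unbiased_box(c)
x = (math.sin(0.4), 0.0, math.cos(0.4))  # a unit vector used as input
minus_x = tuple(-xi for xi in x)         # the antipodal input
```

Here the probabilities are normalized for every input, and averaging the box over x and −x gives 1/|A| = 1/2 for each outcome, matching the unbiasedness condition.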
We extend both these definitions to the local parts of a bipartite system by considering the conditional boxes $P_A^{b,y}(a \mid x) := P(a, b \mid x, y) / P_B(b \mid y)$ and $P_B^{a,x}(b \mid y) := P(a, b \mid x, y) / P_A(a \mid x)$, defined whenever P_B(b | y) > 0 and P_A(a | x) > 0 respectively. A conditional box can be thought of as the black box Alice has if she is told Bob's measurement and outcome. (This is in contrast to a marginal black box, which quantifies Alice's statistics when she knows nothing of Bob's measurement or outcome.) No-signalling implies the existence of well-defined marginal boxes P_B(b | y) and P_A(a | x).
Definition 4. We say that a no-signalling bipartite box P(a, b | x, y) transforms fundamentally locally (is locally unbiased) if all conditional boxes transform fundamentally (are unbiased).
The next two lemmas show that these properties are preserved by convex combinations of boxes.
Lemma IX. Let {P_i(a, b | x, y)}_{i=1,…,N} be a set of no-signalling bipartite black boxes that transform fundamentally locally. Any convex combination $P(a, b \mid x, y) := \sum_i \lambda_i P_i(a, b \mid x, y)$, where all λ_i ≥ 0 and $\sum_i \lambda_i = 1$, also transforms fundamentally locally.
Proof. First, we calculate the marginal distribution: $P_B(b \mid y) = \sum_i \lambda_i P_{B,i}(b \mid y)$. We note that P_B(b | y) = 0 only if P_{B,i}(b | y) = 0 for all i. In this case, the combined conditional box is undefined, and there is nothing to prove. Thus, we may proceed with the case that P_B(b | y) > 0.
With P(a, b | x, y) Here, i denotes a sum over all those i for which P B,i (b| y) > 0. These are exactly the i for which 1, and hence we may define µ i := for those i with P B,i (b| y) > 0, and µ i = 0 for all other i, such that µ i ≥ 0 and moreover i µ i = i µ i = 1. Thus, the new conditional box is itself a convex combination of the constituent conditional boxes. A similar convex combination can be found for P a, x B (b| y). Since the constituent conditional boxes transform fundamentally, from Lemma VIII we write P b, y A,i (a| x) = c . By the converse part of Lemma VIII, P b, y A (a| x) is thus a valid black box that transforms fundamentally. The same argument holds for Bob's conditional boxes.
Lemma X. Let {P_i(a, b | x, y)}_{i=1,…,N} be a set of no-signalling black boxes that are locally unbiased with respect to G. Any convex combination $P(a, b \mid x, y) := \sum_i \lambda_i P_i(a, b \mid x, y)$, where all λ_i ≥ 0 and $\sum_i \lambda_i = 1$, is also locally unbiased with respect to G.
Proof. In the proof of Lemma IX we have seen that there exists a probability distribution {µ i } i such that P b, y A (a| x) = i µ i P b, y A,i (a| x). Thus we find Likewise holds for Bob's conditional boxes, and hence, P(a, b | x, y) is also locally unbiased.
Lemma XI. Consider a bipartite black box with inputs X = Y = S d−1 and binary outcomes A = B = {+1, −1}. If this box transforms fundamentally locally and is locally unbiased, then it describes quantum correlations.
Proof. From Lemma VIII, binary-outcome conditional boxes that transform fundamentally and are unbiased can be written: such that the joint probability distribution is given by: This defines a mapω AB acting on e a, x := (1, a x) T and e b, y := (1, b y) T , such thatω AB ( e a, x , e b, y ) = P(a, b | x, y). Moreover, span( e a, x ) = span( e b, y ) = R d+1 , and so this function has a unique bilinear extension ω AB : The set of non-negative linear combinations of e a, x define a positive Euclidean cone A + ⊂ R d+1 , whose extremal rays are the non-negative multiples of (1, z) T for z ∈ S d−1 . We may then define an Archimedean order unit (AOU) [52], u := (2, 0, . . . , 0) T ∈ R d+1 and define an AOU-space (R d+1 , A + , u). An identical AOU-space (R d+1 , B + , u) can be defined using the non-negative linear combinations of e b, y . Now, we shall employ a result from Kleinmann et al. [52] (generalizing a result by Barnum et al. [53]) that pertains to bilinear maps on positive Euclidean cones. If a bilinear map ω AB on such cones is both unital and positive, then there exists a bipartite quantum system ρ AB and a map from each point a ∈ A + , b ∈ B + onto local quantum POVM elements M a , M b such that We show that ω AB satisfies these conditions. First, for any given a, x (likewise b, y), it can be seen that and hence ω AB ( u, u) = a,b P(a, b | x, y) = 1, which means that ω AB is unital. Next, since every p ∈ A + can be written as a non-negative linear combination of finitely many e a, x (likewise for q ∈ B + ), then ω AB ( p, q) ≥ 0 for all p, q ∈ A + , B + , showing that ω AB is positive. Hence, ω AB can be realised by a quantum system, and P(a, b | x, y) is a quantum behaviour.
Lemma XII. Let d ≥ 2. Then all extremal quantum (2,2,2)-behaviours can be realized by locally unbiased SO(d)-boxes that transform fundamentally locally with X A = X B = S d−1 ; the two settings (inputs) correspond to two choices of directions.
Any extremal quantum (2, 2, 2)-behaviour P(a, b | r, t) can then be written in the form P(a, b | r, t) = Tr[ρ AB (E are qubit rank-1 projectors, a, b ∈ {−1, +1} and r, t ∈ {1, 2}. We shall show that there exists a non-signalling SO(d) × SO(d)-box P(a, b | x, y) that transforms fundamentally locally, is locally unbiased, and has choices x r , y t such that P(a, b | x r , y t ) = P(a, b | r, t).
(C19) This expression yields well-defined probabilities by construction, and it is affine-linear in x. Analogous statements hold for the other conditional boxes. Thus, according to Lemma VIII, P(a, b | x, y) transforms fundamentally locally. Furthermore, averaging the above conditional box uniformly over x replaces x by zero and annihilates all dependence on a; hence this box is locally unbiased. Let x r ∈ R d be the vector whose first two components are the first two components of x r , and all other (d − 2) components are zero; define y t analogously. Then P(a, b | r, t) = P(a, b | x r , y t ).

Proof of Theorem 3.
Let d ≥ 2. The quantum (2, 2, 2)-behaviours are exactly those that can be realised by binary-outcome bipartite SO(d) × SO(d) boxes that transform fundamentally locally and are locally unbiased, restricted to two choices of input direction per party per box, and statistically mixed via shared randomness.
Proof. Lemma XI tells us that "(2, S^{d−1}, 2)-behaviours" that transform fundamentally locally, and are locally unbiased, can be realised by local measurements on a bipartite quantum system. If we restrict our choice of inputs from the full S^{d−1} freedom to just two choices of orientation per party, then these will be (2, 2, 2)-behaviours, and since they can be realised by a quantum system, they are quantum (2, 2, 2)-behaviours.
The other direction follows from Lemma XII: all extremal quantum (2, 2, 2)-behaviours can be realised by restricting binary-outcome bipartite SO(d) × SO(d) boxes, transforming fundamentally locally and being locally unbiased, to two possible input directions per party. Additional shared randomness allows the two parties to generate all statistical mixtures of these behaviours, yielding all further quantum (2, 2, 2)-behaviours.
Theorem 3 cannot hold for all d ≥ 2 without allowing shared randomness. For example, suppose that d = 3; then the proof of Lemma XI shows that all correlations realizable with binary-outcome bipartite SO(3) × SO(3) boxes that transform fundamentally locally and are locally unbiased can be realized via unital positive bilinear forms on the positive semidefinite qubit cone. Consequently, the result by Barnum et al. [53] implies that all these correlations can also be realized via POVMs on ordinary two-qubit quantum state space. However, Donohue and Wolfe [36] (extending results by Pál and Vértesi [35]) have shown that the set of (2, 2, 2)-behaviours realizable on two qubits via POVMs is not convex, and thus not equal to the convex set of quantum (2, 2, 2)-behaviours.