Multi-system measurements in generalized probabilistic theories and their role in information processing

Generalized probabilistic theories (GPTs) provide a framework in which a range of possible theories can be examined, including classical theory, quantum theory and those beyond. In general, enlarging the state space of a GPT leads to fewer possible measurements because the additional states give stronger constraints on the set of effects, the constituents of measurements. This can have implications for information processing. In boxworld, for example, a GPT in which any no-signalling distribution can be realised, there is no analogue of a measurement in the Bell basis and hence the analogue of entanglement swapping is impossible. A comprehensive study of measurements on multiple systems in boxworld has been lacking. Here we consider such measurements in detail, distinguishing those that can be performed by interacting with individual systems sequentially (termed wirings), and the more interesting set of those that cannot. We compute all the possible boxworld effects for cases with small numbers of inputs, outputs and parties, identifying those that are wirings. The large state space of boxworld leads to a small effect space and hence the effects of boxworld are widely applicable in GPTs. We also show some possible uses of non-wirings for information processing by studying state discrimination, nonlocality distillation and the boxworld analogue of nonlocality without entanglement. Finally, we connect our results to the study of logically consistent classical processes and to the composition of contextuality scenarios. By enhancing understanding of measurements in boxworld, our results could be useful in studies of possible underlying principles on which quantum theory can be based.


I. INTRODUCTION
Standard textbook presentations of the postulates of quantum mechanics usually begin with a list of mathematical axioms with relatively little accompanying explanation. This is in contrast to relativity theory, for example, which can be based on the premise that the laws of physics are frame independent. Whether or how we can formulate quantum theory in a similar way remains open in spite of significant investigation (see, e.g., [1][2][3][4][5][6]). Quantum theory has some counter-intuitive features such as the presence of non-local correlations that seemingly defy classical explanation [7]. A general framework to study these features in the context of quantum theory and possible alternatives is that of generalized probabilistic theories (GPTs) [8]. Beyond classical and quantum theory, one well-studied GPT is boxworld, which allows arbitrary no-signalling distributions to be realised. It is known, for instance, that particular cryptographic tasks remain possible even against adversaries that have access to boxworld systems [9][10][11][12]. In a further line of work that tries to single out quantum correlations within a range of alternative theories (see, e.g., [13][14][15][16]), boxworld is a useful foil theory. In spite of these results, the structure of multi-system measurements in boxworld has not been developed in detail, but understanding these is important to fully characterize the information processing power of boxworld. Furthermore, although we refer to effects in boxworld throughout this paper, the measurements we find are applicable to a range of GPTs (see Section VII E).
Further motivation for the study of multi-system measurements comes from the recent trend for studying information processing in quantum networks [17][18][19][20][21]. It is likely that simple quantum networks will be built in the near future, hence it is useful to explore the possibilities these networks bring. GPTs provide a useful means to form a more general understanding of this. To compare what is possible in quantum theory as opposed to other GPTs, it is necessary to understand the structure of multi-system measurements in the latter. Such a comparison also informs the question of in what sense quantum theory is optimal for information processing [22]. In the present work, we thus explore such multi-system measurements. In addition, we connect our results to the study of logically consistent classical processes [23] and of composition of contextuality scenarios [24,25], pointing to a new kind of composition in the latter case.
Our study of the set of possible measurements in boxworld proceeds using the set of effects, the constituent parts of measurements (see later). The large set of possible states in boxworld comes at the expense of having a smaller set of effects than in theories with weaker correlations. In this work we will be interested in the set of possible measurements that can be performed with access to several systems in boxworld. One type of such measurements are the wirings [8], which correspond to processes in which a measurement is applied to one system, then to a second depending on the result and so on (or convex combinations of such processes). Wirings are hence implementable using local operations and classical communication (LOCC) when the individual systems are separated¹. It is well-known that in quantum theory not all measurements are of this form. For example, a measurement in the Bell basis as used in teleportation cannot be realised by wiring together measurements on individual systems. For single and bipartite systems in boxworld, all measurements are wirings [8], while this is no longer true for three or more systems [26]. One of the aims of the present paper is to find the multi-system measurements that are not wirings and investigate their significance for information processing.
We classify scenarios by the number of systems, and the number of inputs and outputs for the boxes of each system. One way to find the set of all extremal effects in a given scenario is using vertex enumeration (all the extremal no-signalling distributions act as facets, since the inner product of each such state with any valid effect must be at least 0 and at most 1). However, directly performing vertex enumeration is slow, except in the smallest scenario [27].
To circumvent this we exploit an alternative method for finding extremal effects that starts with different ways to represent the identity effect. By breaking down these identity effects we can find the extremal effects in various scenarios. We separately consider deterministic and non-deterministic effects and classify the effects into wirings and non-wirings before investigating the significance of the latter for state discrimination and nonlocality distillation, showing the advantages of non-wiring measurements. We also discuss how our findings relate to the phenomenon of quantum nonlocality without entanglement [28], which is the existence of measurements comprising product effects that cannot be performed by LOCC. The non-wiring type measurements of the present paper serve as a boxworld analogue of these, and we find a set of product states in boxworld that can be perfectly distinguished with a non-wiring measurement, but not with any wiring.
Understanding measurements in GPTs is also useful to gain insight into which features of quantum theory make it special with respect to other theories, which in turn may help find underlying principles on which quantum theory can be based. Boxworld is known to have only separable measurements [26] (i.e., those for which all effects can be expressed as a sum of product effects), which in a sense makes it inferior to other GPTs. However, this property also makes boxworld a suitable example for studying the difference between separable measurements that can be done using LOCC and those that cannot, and of the information processing capabilities enabled by the latter, an understanding that will likely carry over to arbitrary GPTs. Further insights into this difference are also desirable in the quantum case [29][30][31][32].
The rest of the paper proceeds as follows. We first give a short introduction to GPTs, before introducing the notation and technical background in Section III. In Section IV, we present technical results underlying the algorithms for generating all extremal effects in boxworld; these algorithms are presented in Section V. In Section VI we apply our algorithm to various scenarios of up to 4 inputs and up to 4 outputs, and in Section VII we demonstrate the application of these results in information processing tasks. Finally, we connect our work to logically consistent classical processes and to composition of contextuality scenarios in Section VIII before making a few concluding remarks in Section IX.

II. BACKGROUND
We briefly outline the framework for GPTs that will be used throughout this paper. In a GPT, states are represented as vectors in a vector space $V$. We use $S \subseteq V$ to denote the set of all possible states (note that the state space depends on the size of the system), and we typically assume $S$ is convex and compact. An effect is a linear map from a state to a probability. Effects can be taken as vectors in the vector space dual to $V$, which we call $E$, and the map from states to probabilities is then formed by taking the inner product between the state and the effect. Given a state space, a valid effect $e$ must satisfy $0 \le \langle e, s\rangle \le 1$ for all $s \in S$. Under the no restriction hypothesis [33], which we will assume here, this condition is taken to be both necessary and sufficient for an effect to be valid.
The state space of a composite system is formed by taking some kind of tensor product between the individual state spaces. A minimal requirement is that if $s_A \in S_A$ is a state of system $A$ and $s_B \in S_B$ is a state of system $B$, then $s_A \otimes s_B$ is a state of $AB$. If all joint states have this form, or are convex combinations of states of this form, then the joint state space is said to be the minimal tensor product of the individual state spaces. Taking the composite system effect space to obey the no restriction hypothesis, the minimal tensor product state space corresponds to forming the effect space by taking the maximal tensor product of the individual effect spaces.
One way to specify a state is via the probabilities of the outcomes of a set of fiducial measurements [2], i.e., a set that is sufficient to completely characterize the state. We will work with systems that have a finite set of fiducial measurements, each of which has a finite number of outcomes. This means that a state can be specified using a vector whose entries contain the probabilities of outcomes for the fiducial measurements. For simplicity, we consider cases in which the number of outcomes of each fiducial measurement is the same. For instance, if for a single system there are two 2-outcome fiducial measurements, we write the state as
$$s = \big(P(0|0),\, P(1|0) \,\big|\, P(0|1),\, P(1|1)\big)^T, \tag{1}$$
where $P(i|j)$ is the probability of output $i$ given input $j$.
Although this is a four-dimensional vector, due to normalization there are only two independent parameters. For later convenience we stick with the larger representation rather than suppressing the redundant parameters.
In this work we assume local tomography [34], i.e., that the state of a joint system can be determined from the statistics of local fiducial measurements (called the global state assumption in [8]). This means that a state of $N$ parties (where for each single system there are two 2-outcome fiducial measurements) has a similar representation:
$$s = \big(P(0\ldots00|0\ldots00),\, P(0\ldots01|0\ldots00),\, \ldots,\, P(1\ldots11|0\ldots00) \,\big|\, P(0\ldots00|0\ldots01),\, \ldots \,\big|\, \ldots \,\big|\, \ldots,\, P(1\ldots11|1\ldots11)\big)^T \tag{2}$$
(the ordering is such that first for input $00\ldots0$ the outcomes increase counting in binary, and then the inputs increase counting in binary).
For every system there is an identity effect, i.e., an effect $u$ such that $\langle u, s\rangle = 1$ for all $s \in S$. In our notation there are several ways to write this effect. For a single system we can write $u^R = (1, 1|0, 0)$, $u^R = (0, 0|1, 1)$, $u^R = (1, 1|1, 1)/2$, etc. Although written differently, these all represent the same effect, the notation $u^R$ meaning a representation of the effect $u$. In general, we identify two vectors as representing the same effect if they have the same inner product with all elements of the state space.
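The equivalence of these three representations can be checked numerically. The following is a minimal sketch for a single system with two 2-outcome fiducial measurements; the helper names `state` and `inner` and the sampling grid are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from itertools import product

def state(p, q):
    """Single-system state (P(0|0), P(1|0) | P(0|1), P(1|1)) with P(0|0)=p, P(0|1)=q."""
    return (p, 1 - p, q, 1 - q)

def inner(e, s):
    """Inner product between an effect vector and a state vector."""
    return sum(ei * si for ei, si in zip(e, s))

half = Fraction(1, 2)
identity_reps = [
    (1, 1, 0, 0),              # u^R = (1, 1 | 0, 0)
    (0, 0, 1, 1),              # u^R = (0, 0 | 1, 1)
    (half, half, half, half),  # u^R = (1, 1 | 1, 1) / 2
]

# Illustrative sample of valid states: p and q on a grid in [0, 1].
grid = [Fraction(k, 4) for k in range(5)]
states = [state(p, q) for p, q in product(grid, grid)]

all_one = all(inner(u, s) == 1 for u in identity_reps for s in states)
print(all_one)  # True
```

Exact rational arithmetic is used so that the comparison with 1 is not affected by rounding.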
The set of vectors that can be added to any effect vector without changing the effect it represents we term no-signalling moves, because they each arise as a result of the state space $S$ only containing no-signalling distributions (or as a result of the normalization). For instance, for all valid states, $P(00|00) + P(10|00) - P(00|10) - P(10|10) = 0$, which represents the impossibility of Alice's choice of measurement affecting Bob's outcome when Bob makes input 0. This no-signalling condition can be encoded using a vector $r$ such that $\langle r, s\rangle = 0$ for all $s \in S$. Thus, if $e^R$ is a vector representing a valid effect, then $e^R + r$ represents the same effect. We use $\{r_i\}$ to denote a complete set of generators of all such vectors, so that any no-signalling move can be written as a linear combination of vectors from $\{r_i\}$.
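As an illustration of the example above, the sketch below builds the vector $r$ for the bipartite case with binary inputs and outputs and evaluates it on a PR-box-type state and on a deliberately signalling distribution. The index convention (inputs counted in binary, then outputs) follows the ordering described earlier; the function and variable names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

def idx(a1, a2, x1, x2):
    """Position of P(a1 a2 | x1 x2): input block first, then outputs in binary."""
    return 4 * (2 * x1 + x2) + (2 * a1 + a2)

def inner(e, s):
    return sum(ei * si for ei, si in zip(e, s))

# r encodes P(00|00) + P(10|00) - P(00|10) - P(10|10) = 0.
r = [0] * 16
for a1 in (0, 1):
    r[idx(a1, 0, 0, 0)] += 1
    r[idx(a1, 0, 1, 0)] -= 1

pr_box = [Fraction(0)] * 16   # no-signalling: a1 XOR a2 = x1 AND x2
signalling = [0] * 16         # Bob outputs Alice's input, so Alice signals to Bob
for a1, a2, x1, x2 in product((0, 1), repeat=4):
    if (a1 ^ a2) == (x1 & x2):
        pr_box[idx(a1, a2, x1, x2)] = Fraction(1, 2)
    if a1 == 0 and a2 == x1:
        signalling[idx(a1, a2, x1, x2)] = 1

# Adding r to a representation leaves its value on no-signalling states unchanged.
e = [0] * 16
e[idx(0, 0, 0, 0)] = 1
e_moved = [ei + ri for ei, ri in zip(e, r)]

print(inner(r, pr_box))                                # 0
print(inner(e, pr_box) == inner(e_moved, pr_box))      # True
print(inner(r, signalling))                            # 1
```

The last line shows that the same vector detects signalling: on a distribution outside the state space the inner product is non-zero.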
A measurement is a collection of effects that sum to the identity effect, i.e., given state space $S$, the set of effects $\{e_1, \ldots, e_m\}$ forms a measurement on $S$ if for each $i = 1, \ldots, m$ we have $0 \le \langle e_i, s\rangle \le 1$ for all $s \in S$ and if $\sum_{i=1}^m e_i = u$. In this work we will be interested in the effect space when the state space comprises all no-signalling distributions (also known as boxworld). Because boxworld has large multipartite state spaces, the set of possible effects is comparably restricted. Boxworld effects are therefore valid in a wide range of GPTs. We consider the effect spaces for various small numbers of inputs, outputs and parties (larger cases become too computationally intensive). Due to linearity, the effect space is convex, and can hence be characterized in terms of a set of extremal effects. There are a finite number of these, generating a convex polytope; our aim is to have a procedure that can generate the vertices of this polytope.
In the two-party case, the complete set of extremal effects was computed in [27]. It was found that there are 82 such effects. These were computed by vertex enumeration starting from the facet description of the effect space. This facet description says that to be a valid effect $0 \le \langle e, s_i\rangle \le 1$ for all extremal states $s_i$ (in the present case there are 24 extremal states: 16 local deterministic states and 8 PR-box-type states). In this work we use a different method that allows us to treat cases for which our computational tools for vertex enumeration are prohibitively slow.
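The facet description can be made concrete with a short sketch that enumerates the 24 extremal states of the two-party scenario with binary inputs and outputs and tests candidate vectors against them. The parametrization of the PR-box-type states by three bits and all function names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

def idx(a1, a2, x1, x2):
    return 4 * (2 * x1 + x2) + (2 * a1 + a2)

def local_deterministic():
    """16 states: each party fixes an output for each of its two inputs."""
    states = []
    for f0, f1, g0, g1 in product((0, 1), repeat=4):
        s = [0] * 16
        for x1, x2 in product((0, 1), repeat=2):
            s[idx((f0, f1)[x1], (g0, g1)[x2], x1, x2)] = 1
        states.append(tuple(s))
    return states

def pr_boxes():
    """8 PR-box-type states: a1 XOR a2 = x1*x2 XOR al*x1 XOR be*x2 XOR ga."""
    boxes = []
    for al, be, ga in product((0, 1), repeat=3):
        s = [Fraction(0)] * 16
        for a1, a2, x1, x2 in product((0, 1), repeat=4):
            if (a1 ^ a2) == ((x1 & x2) ^ (al & x1) ^ (be & x2) ^ ga):
                s[idx(a1, a2, x1, x2)] = Fraction(1, 2)
        boxes.append(tuple(s))
    return boxes

extremal_states = local_deterministic() + pr_boxes()

def value_range(e):
    """Smallest and largest inner product of e with the extremal states."""
    values = [sum(ei * si for ei, si in zip(e, s)) for s in extremal_states]
    return min(values), max(values)

def is_valid_effect(e):
    lo, hi = value_range(e)
    return 0 <= lo and hi <= 1

sbv = [0] * 16
sbv[idx(0, 0, 0, 0)] = 1       # the single probability P(00|00)
not_effect = list(sbv)
not_effect[idx(1, 1, 1, 1)] = 1  # P(00|00) + P(11|11)

print(len(set(extremal_states)))    # 24
print(is_valid_effect(sbv))         # True
print(is_valid_effect(not_effect))  # False
```

The second candidate fails because a local deterministic state can assign probability 1 to both of its entries at once, giving an inner product of 2.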
We will also be interested in a special type of effect called a wiring. Wirings are effects that can be implemented by convex combinations of procedures of the following form: choose one of the states and choose a measurement to make on that state, take its output and apply a function to it to choose the next state and the measurement on the next state, and so on, where at each step all the previous outputs are used as arguments of the function that selects the next state and the measurement performed on it. The final output is then formed by taking a function of all the individual measurement outcomes (see Fig. 1 for an illustration). The extremal wirings are effects of this type that cannot be represented as convex combinations of others. These boxworld measurements are of particular interest, since they correspond to classical processing of inputs and outcomes and are thus applicable in any operational theory allowing classical processing. In particular, they are applicable irrespective of the type of system under consideration and can be used for instance for non-locality distillation, which in turn may have applications for device-independent information processing. This is discussed further in Section VII C.

FIG. 1. Illustration of a wiring-effect in the bipartite case. An input $x_1$ is made generating outcome $a_1$, then $x_2$ is taken as a function of $a_1$ giving outcome $a_2$. The final output, $a$, is then a function of $a_1$ and $a_2$. For instance, if $x_1 = 0$, $x_2 = a_1$ and $a = a_1 \oplus a_2$ then the effect corresponding to $a = 0$ is $e_{a=0} = (1, 0, 0, 0|0, 0, 0, 1|0, 0, 0, 0|0, 0, 0, 0)$ and the effect corresponding to $a = 1$ is $e_{a=1} = (0, 1, 0, 0|0, 0, 1, 0|0, 0, 0, 0|0, 0, 0, 0)$.
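The example wiring of Fig. 1 can be reproduced with a few lines of code. This is a minimal sketch for the bipartite case with binary inputs and outputs; `wiring_effects` and its arguments are hypothetical names introduced for illustration.

```python
from itertools import product

def idx(a1, a2, x1, x2):
    """Position of P(a1 a2 | x1 x2): input block first, then outputs in binary."""
    return 4 * (2 * x1 + x2) + (2 * a1 + a2)

def wiring_effects(x1, choose_x2, final):
    """Effects of the wiring: input x1, then x2 = choose_x2(a1), output final(a1, a2)."""
    effects = {}
    for a1, a2 in product((0, 1), repeat=2):
        e = effects.setdefault(final(a1, a2), [0] * 16)
        e[idx(a1, a2, x1, choose_x2(a1))] = 1
    return {a: tuple(e) for a, e in effects.items()}

# x1 = 0, x2 = a1, a = a1 XOR a2, as in the figure caption.
effects = wiring_effects(0, lambda a1: a1, lambda a1, a2: a1 ^ a2)

print(effects[0])
print(effects[1])

# The two effects sum to a {0, 1}-valued vector with one entry per (a1, a2) branch.
total = tuple(u + v for u, v in zip(effects[0], effects[1]))
print(total)
```

The two printed effect vectors coincide with $e_{a=0}$ and $e_{a=1}$ from the caption, and their sum is a {0, 1}-valued representation of the identity effect.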

III. PRELIMINARIES
Let $n_I$ be the number of inputs and $n_O$ be the number of outputs per party, with $N$ parties. A state can then be represented as a $(n_I n_O)^N$-dimensional vector, whose entries are the conditional probabilities of every set of outputs given every set of inputs. Given a subset, $V$, of the set $[N] := \{0, 1, 2, \ldots, N - 1\}$ of parties, we use $X_V$ as the random variable representing their inputs (elements of $[n_I]^{|V|}$) and $A_V$ as that for their outputs (elements of $[n_O]^{|V|}$).
We will now define our state space $S_{NS}$. To be elements of this we require three conditions to hold: no-signalling, positivity and normalization, which we detail below. The no-signalling condition is that no subset of parties should be able to signal to any other subset of parties. In other words, for any disjoint subsets $V$ and $W$ of $[N]$, the parties in $V$ cannot signal to those in $W$. Mathematically, this means
$$P_{A_W|X_V X_W}(a_W|x_V, x_W) = P_{A_W|X_V X_W}(a_W|x'_V, x_W) \quad \text{for all } a_W,\ x_W,\ x_V,\ x'_V. \tag{3}$$
A smaller subset of this set of conditions is sufficient to generate the whole set, namely it suffices that no party can signal to the collection of all the others. This is stated formally in the following lemma.
Lemma 1. The set of conditions implied by (3) for all disjoint subsets $V$ and $W$ of $[N]$ is implied by demanding that (3) holds for the subset of conditions where $V$ is a singleton and $W = [N] \setminus V$.

The conditions where $V$ is a singleton and $W = [N] \setminus V$ can be represented by a set of vectors $\{r_i\}$, such that $\langle r_i, s^R\rangle = 0$ for all $s \in S_{NS}$, and where each $r_i$ contains $n_O$ elements with value 1, $n_O$ elements with value $-1$ and the remaining entries are 0. The number of such conditions is $N (n_O n_I)^{N-1} (n_I - 1)$. These are not all linearly independent: the dimension of the space of un-normalized no-signalling distributions is $(n_I (n_O - 1) + 1)^N$ [35,36].
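The counting above can be checked for small scenarios. The sketch below enumerates one vector per condition (comparing input 0 of a party with each other input, an assumed but natural choice of generators) and computes the rank of the resulting set by exact Gaussian elimination. Since the vectors orthogonal to all no-signalling distributions form the complement of a space of dimension $(n_I(n_O-1)+1)^N$, the rank is expected to equal $(n_I n_O)^N - (n_I(n_O-1)+1)^N$. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def index(a, x, n_in, n_out):
    """Flat position of P(a|x): inputs are the major index, outputs the minor one."""
    xi = 0
    for v in x:
        xi = xi * n_in + v
    ai = 0
    for v in a:
        ai = ai * n_out + v
    return xi * n_out ** len(a) + ai

def generators(N, n_in, n_out):
    """One vector per condition 'party k cannot signal to all the others'."""
    gens = []
    for k in range(N):
        for x_new in range(1, n_in):
            for a_rest in product(range(n_out), repeat=N - 1):
                for x_rest in product(range(n_in), repeat=N - 1):
                    r = [0] * (n_in * n_out) ** N
                    for a_k in range(n_out):
                        a = a_rest[:k] + (a_k,) + a_rest[k:]
                        x_zero = x_rest[:k] + (0,) + x_rest[k:]
                        x_other = x_rest[:k] + (x_new,) + x_rest[k:]
                        r[index(a, x_zero, n_in, n_out)] += 1
                        r[index(a, x_other, n_in, n_out)] -= 1
                    gens.append(r)
    return gens

def rank(rows):
    """Rank via Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk

for N in (1, 2, 3):
    gens = generators(N, 2, 2)
    expected_rank = (2 * 2) ** N - (2 * (2 - 1) + 1) ** N
    print(N, len(gens), rank(gens), expected_rank)
```

For two parties with binary inputs and outputs this gives 8 conditions of rank 7, illustrating that the conditions are not all linearly independent.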
A valid state must also have non-negative entries, since each entry represents a probability, i.e.,
$$P_{A_S|X_S}(a_S|x_S) \ge 0 \quad \text{for all } a_S,\ x_S, \tag{4}$$
where $S = [N]$. The remaining condition for a vector to be an element of $S_{NS}$ is that
$$\sum_{a_S} P_{A_S|X_S}(a_S|x_S) = 1, \tag{5}$$
where $S = [N]$. Note that, given the no-signalling conditions, if (5) holds for $X_S = \mathbf{0} := 0\ldots0$ then it also holds for $X_S = x_S$ for any other $x_S \in [n_I]^{|S|}$. Furthermore, together with (4) this implies that no entry can exceed 1 (so conditions that each probability is at most 1 do not need to be added).
An important set of states are the local deterministic states. These correspond to deterministically assigning one of the outcomes for every possible input of every party. Thus, they are states of the form
$$P_{A_S|X_S} = \prod_{i \in S} P_{A_i|X_i}, \tag{6}$$
where $P_{A_i|X_i} \in \{0, 1\}$ in all cases. We denote these $s_{L,i}$, where $i$ runs from 1 to $n_O^{n_I N}$ (one outcome is chosen for each of the $n_I$ inputs of each of the $N$ parties). The linear span of the set of local deterministic states is the same as that of the no-signalling distributions (being local does not confer any additional equality constraints over being no-signalling). Thus, given two vectors representing effects, we can check whether they represent the same effect by checking whether both vectors have the same inner product with all local deterministic distributions².
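The comparison of representations via local deterministic states can be sketched as follows. The enumeration assigns an output to every input of every party; `same_effect` and the other names are assumptions introduced for this example.

```python
from itertools import product

def index(a, x, n_in, n_out):
    """Flat position of P(a|x): inputs are the major index, outputs the minor one."""
    xi = 0
    for v in x:
        xi = xi * n_in + v
    ai = 0
    for v in a:
        ai = ai * n_out + v
    return xi * n_out ** len(a) + ai

def local_deterministic_states(N, n_in, n_out):
    """One state per assignment of an output to every input of every party."""
    strategies = list(product(range(n_out), repeat=n_in))  # maps input -> output
    states = []
    for strat in product(strategies, repeat=N):
        s = [0] * (n_in * n_out) ** N
        for x in product(range(n_in), repeat=N):
            a = tuple(strat[k][x[k]] for k in range(N))
            s[index(a, x, n_in, n_out)] = 1
        states.append(tuple(s))
    return states

def inner(e, s):
    return sum(ei * si for ei, si in zip(e, s))

def same_effect(e, f, states):
    """Two vectors represent the same effect if they agree on all given states."""
    return all(inner(e, s) == inner(f, s) for s in states)

states = local_deterministic_states(2, 2, 2)

def vector(entries):
    """Helper: {0,1}-valued bipartite vector with 1s at the listed (a, x) pairs."""
    v = [0] * 16
    for a, x in entries:
        v[index(a, x, 2, 2)] = 1
    return v

e = vector([((0, 0), (0, 0)), ((0, 1), (0, 0))])  # P(00|00) + P(01|00)
f = vector([((0, 0), (0, 1)), ((0, 1), (0, 1))])  # P(00|01) + P(01|01)
g = vector([((0, 0), (0, 0)), ((1, 0), (0, 0))])  # P(00|00) + P(10|00)

print(len(states))                 # 16
print(same_effect(e, f, states))   # True
print(same_effect(e, g, states))   # False
```

Here `e` and `f` both give the probability that the first party outputs 0 on input 0, which by no-signalling does not depend on the second party's input, whereas `g` is the corresponding marginal for the second party.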
An important symmetry class for a given state or effect is that of the relabellings. We can relabel the parties, the inputs for each party, or the outputs for each input of each party. Thus, the total size of the symmetry group of relabellings is $((n_O!)^{n_I}\, n_I!)^N\, N!$.
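As a worked instance of this count (using only the formula above), the two smallest multi-party scenarios with binary inputs and outputs give:

```latex
\begin{align*}
(N, n_I, n_O) = (2,2,2):&\quad \big((2!)^{2}\cdot 2!\big)^{2}\cdot 2! = 8^{2}\cdot 2 = 128,\\
(N, n_I, n_O) = (3,2,2):&\quad \big((2!)^{2}\cdot 2!\big)^{3}\cdot 3! = 8^{3}\cdot 6 = 3072.
\end{align*}
```

The factor $(n_O!)^{n_I}\, n_I!$ counts the relabellings of a single party (outputs for each input, then inputs), which is raised to the power $N$ and multiplied by the $N!$ permutations of the parties.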

IV. EXTREMAL EFFECTS
We are interested in finding the extremal effects for a given scenario, i.e., for a given $(N, n_I, n_O)$. As mentioned before, we use a representation that has some redundancy in that the constraints arising from the no-signalling moves and normalisation are not used to reduce the parameters. (Such a reduction would typically be done when trying to solve this problem directly by means of vertex enumeration.) Instead, the technical results of this section show that the redundant representation employed has a convenient structure that allows us to find extremal effects in a different way, which will be the basis of the algorithms we present in Section V. We start with the following observation, expressed in terms of a standard basis vector (SBV), by which we mean a vector with all components 0 except for one 1.

Lemma 2. Every $(n_I n_O)^N$-dimensional SBV is a representation of an extremal effect.
Proof. To see that SBVs all represent effects, note that the inner product of such a vector $e^R$ with a state $s \in S$ gives a single probability, hence $0 \le \langle e^R, s^R\rangle \le 1$ for all $s \in S$, making $e^R$ a representation of a valid effect. By considering a state satisfying $\langle e, s\rangle = 1$, it is clear that $\alpha e$ is not an effect for any $\alpha > 1$, hence $e$ is on the boundary of the effect space. To see that it is extremal, suppose $e = \alpha e_1 + (1 - \alpha) e_2$ for two effects $e_1$ and $e_2$ and $0 < \alpha < 1$. Then $\langle e, s\rangle = \alpha\langle e_1, s\rangle + (1 - \alpha)\langle e_2, s\rangle$. Now suppose $s_1$ is a state with $\langle e, s_1\rangle = 1$. Since $e_1$ and $e_2$ are effects, it follows that $\langle e_1, s_1\rangle = \langle e_2, s_1\rangle = 1$. Similarly, if $s_0$ is a state with $\langle e, s_0\rangle = 0$, it follows that $\langle e_1, s_0\rangle = \langle e_2, s_0\rangle = 0$. Because the local deterministic distributions are {0, 1}-valued, if $s$ is local deterministic it satisfies $\langle e, s\rangle \in \{0, 1\}$. It follows that both $e_1$ and $e_2$ have the same action on any local deterministic distribution as $e$ does. Since the set of local deterministic distributions spans the space of no-signalling distributions, $e_1$ and $e_2$ must have the same action as $e$ for any $s \in S_{NS}$. Thus, $e_1$, $e_2$ and $e$ must be the same effect and $e$ is extremal.
The {0, 1}-valued representations of the identity effect will be important because these are the class that can be used to generate all wirings (as well as some non-wirings).
Lemma 3. Let $u^R$ be a {0, 1}-valued vector representing the identity effect. Any vector $e^R$ formed from $u^R$ by replacing any number of the 1 entries with 0s is a representation of an extremal effect.
Proof. First, since $u^R$ represents the identity effect we have $\langle u^R, s^R\rangle = 1$ for all $s \in S$. Since $u^R$ is a {0, 1}-valued vector we can write it in terms of the SBV effects $\{e_i\}$ as $u^R = \sum_i \lambda_i e^R_i$, where $\lambda_i \in \{0, 1\}$. We can also write $e^R = \sum_i \lambda'_i e^R_i$, where $\lambda'_i \in \{0, 1\}$ and $\{i : \lambda'_i = 1\} \subset \{i : \lambda_i = 1\}$. It follows that $0 \le \langle e, s\rangle \le \langle u, s\rangle = 1$ for all $s \in S$, so $e$ is a valid effect.
To see that it is extremal, consider writing $e = \alpha e_1 + (1 - \alpha) e_2$ for two effects $e_1$ and $e_2$ and $0 < \alpha < 1$. Since $e^R$ is a {0, 1}-valued representation of $e$, and all local deterministic states have a {0, 1}-valued representation, for a local deterministic state $s$ we have $\langle e, s\rangle \in \{0, 1\}$. Hence, by the same argument as in Lemma 2, $e$ must be extremal.

Lemma 4. Any extremal wiring has a representation that can be formed by taking the {0, 1}-valued representation of the identity effect corresponding to all parties making the input 0, applying no-signalling moves in such a way that it remains {0, 1}-valued, and replacing some of the entries that are 1 with 0.
Proof. Consider first the case $N = 1$. In this case the extremal wirings are the effects formed by choosing one of the $n_I$ measurements and then applying a function from $\{0, 1, \ldots, n_O - 1\}$ to $\{0, 1, \ldots, n_O - 1\}$ to the outcome. In this case the no-signalling moves³ take us from the identity for input 0 to that for all other choices of measurement, so the statement holds. Now assume by induction that the statement holds for $N - 1$ parties and consider $N$ parties. The first step in an extremal wiring is to choose one of the $N$ boxes and make a fixed input to that box. Up to symmetry, we can assume the first box is chosen and input 0 is made (relabelling parties or inputs does not change whether an effect is a wiring or not). In this case, the effect can only have non-zero entries where these correspond to elements of the state of the form $P(a_1 a_2 \ldots a_N | 0\, x_2 \ldots x_N)$, i.e., where $x_1 = 0$. For each outcome we can then consider the $N - 1$ party effect that is performed conditioned on $x_1 = 0$ and the value of $a_1$. By assumption, each of these sub-effects can be formed by taking the {0, 1}-valued representation of the identity effect corresponding to $N - 1$ parties making the input 0, applying no-signalling moves in such a way that it remains {0, 1}-valued, and replacing some of the entries that are 1 with 0. Let us ignore the replacement of 1 entries with 0s for the moment and consider only the representation of identity. Up to no-signalling moves each of the sub-effects corresponds to all parties measuring 0, and, up to symmetry, the measurement on the first box corresponds to $x_1 = 0$. Thus, up to no-signalling moves, the identity is that corresponding to all parties measuring 0, and the effect is then formed by zeroing entries of this.
Lemma 5. Every effect can be represented by a $(n_I n_O)^N$-dimensional vector in which every entry is non-negative.
Proof. This is proven as part of Theorem 7 in [8]. We give the argument for completeness. Consider the cone of non-normalized states in our representation, i.e., the set of $(n_I n_O)^N$-dimensional vectors
$$\mathcal{V} = \{ v : \langle e_i, v\rangle \ge 0 \ \forall i,\ \langle r_j, v\rangle = 0 \ \forall j \},$$
where $\{e_i\}$ are the SBV effects and $r_j$ are vectors representing the no-signalling moves. We can rewrite $\langle r_j, v\rangle = 0$ as $\langle r_j, v\rangle \ge 0$ and $\langle -r_j, v\rangle \ge 0$. The dual cone is then that formed by the conic hull of $\{e_i\}_i \cup \{r_j\}_j \cup \{-r_j\}_j$. Thus, any effect can be written as $\sum_i t_i e_i + \sum_j w_j r_j$, where $t_i \ge 0$, but $w_j$ can be negative. Since $\langle r_j, v\rangle = 0$ for all $v \in \mathcal{V}$, one representation of the effect is when the values of $\{w_j\}$ are set to zero. Thus, any effect can be written in the form $\sum_i t_i e_i$ where $t_i \ge 0$.

It is helpful to consider the set of $(n_I n_O)^N$-dimensional identity effects that have non-negative entries. These form a convex polytope, since they are defined by the vectors $u^R$ for which every element is non-negative and such that $\langle s_{L,i}, u^R\rangle = 1$, where $i$ runs over all local deterministic distributions. There are hence $n_O^{n_I N}$ equality constraints (one per local deterministic distribution) and $(n_I n_O)^N$ inequality constraints. We call the extreme points of this polytope the extremal representations of the identity effect, and these can be computed using vertex enumeration.

Lemma 6. Every extremal effect has a representation as a vector that can be formed by taking an extremal representation of the identity effect and replacing some of the non-zero entries with 0s.
Proof. Let $e$ be an effect and $f = u - e$. By Lemma 5, we can represent $e$ and $f$ using $(n_I n_O)^N$-dimensional vectors $e^R$ and $f^R$ whose entries are non-negative. Write $e^R = \sum_i \lambda_i e_i$ and $f^R = \sum_i \mu_i e_i$, where $\{e_i\}$ are the SBV effects and $\{\lambda_i\}$ and $\{\mu_i\}$ are non-negative. Thus, $u^R = \sum_i (\lambda_i + \mu_i) e_i$ is a representation of the identity effect.
We claim that if $e$ is extremal then for each $i$ either $\lambda_i = 0$ or $\mu_i = 0$. Suppose by contradiction that there is some $j$ for which $0 < \lambda_j < \lambda_j + \mu_j$. Then
$$e^R = \frac{\lambda_j}{\lambda_j + \mu_j}\,(e^R + \mu_j e_j) + \frac{\mu_j}{\lambda_j + \mu_j}\,(e^R - \lambda_j e_j).$$
Both $e^R + \mu_j e_j$ and $e^R - \lambda_j e_j$ represent effects (for the former, note that $e^R + \mu_j e_j = u^R - (f^R - \mu_j e_j)$). Hence we have decomposed $e$ as a convex combination of other effects, contradicting the assumption that $e$ is extremal. Note that this implies that for any extremal effect, there exists a representation of it and its complement that are orthogonal⁴. Hence, we have shown that if $e$ is extremal, it can be formed by zeroing entries from a representation of an identity effect.

It remains to show that non-extremal representations of the identity effect need not be used. Suppose $u^R$ is a non-extremal representation of the identity effect, so $u^R = \sum_i \nu_i u^R_i$ with $\{u^R_i\}$ being extremal representations of the identity effect, $\nu_i \ge 0$, $\sum_i \nu_i = 1$ and at least two $\nu_i > 0$. Let $Z$ be a map that zeroes some of the entries and suppose $e^R = Z(u^R) = \sum_i \nu_i Z(u^R_i)$. If there are two or more values of $i$ for which both $\nu_i > 0$ and $Z(u^R_i) \ne 0$, then $e^R$ is not extremal. If there is only one $i$ such that both $\nu_i > 0$ and $Z(u^R_i) \ne 0$, then $e^R$ is also not extremal (but is proportional to the zeroing of an extremal representation of the identity effect).

V. COMPUTING EXTREMAL EFFECTS
Our method for computing extremal effects is suggested by Lemma 6. The first step is to identify all extremal representations of the identity effect by means of a vertex enumeration.
Specifically, the condition $\langle u, s\rangle = 1$ for all elements of the state space can be imposed by requiring $\langle u, s\rangle = 1$ for all local deterministic $s$, since the linear span of the local deterministic states covers the state space. This means that, for $n_I = n_O = 2$, the set of identity effects for $N$ parties can be expressed using $4^N$ equality constraints and $4^N$ inequality constraints (positivity of the individual entries). This problem is more tractable than performing the full vertex enumeration to compute all extremal effects directly. Instead, the extremal effects are obtained from the identity effects by considering sub-effects (see Lemma 6).
Computing the set of all extremal effects scales badly with the parameters of the scenario and hence we do not compute all in most cases (although our algorithm would in principle allow this).In several scenarios we instead compute the deterministic extremal effects, which include all extremal wirings.

A. Computing deterministic extremal effects
Our method for computing all deterministic extremal effects is suggested by Lemma 3. This algorithm recovers all deterministic extremal effects, which includes all wirings (cf. Lemma 4).
We first find all the {0, 1}-valued representations of the identity effect. This can be done by the following algorithm:

Algorithm 1 - generate {0, 1}-valued identity effects
1. Set $S = \{u^R_1\}$, where $u^R_1$ is any {0, 1}-valued representation of the identity effect. [The algorithm can also be started with any initial set of {0, 1}-valued representations of the identity effect.]
2. Generate $S' = \{s_j \pm r_i\}_{i,j}$, where $s_j$ are elements of $S$ and $r_i$ are a complete set of generators of the no-signalling moves taking values in $\{-1, 0, 1\}$ (cf. Lemma 1).
3. Remove elements of $S'$ with negative entries and set $S = S'$.
4. Repeat steps 2 and 3 until $S$ stops increasing.
5. Output $S$.
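The steps above can be sketched for the two-party scenario with binary inputs and outputs. This is an illustrative implementation under stated assumptions, not the authors' code: the generators compare input 0 of one party with its other input, and the set of vectors found so far is accumulated across rounds (a breadth-first search) so that the loop terminates once no new vector appears. All names are hypothetical.

```python
from itertools import product

N, N_IN, N_OUT = 2, 2, 2          # scenario (assumed for this sketch)
DIM = (N_IN * N_OUT) ** N

def index(a, x):
    """Flat position of P(a|x): inputs are the major index, outputs the minor one."""
    xi = 0
    for v in x:
        xi = xi * N_IN + v
    ai = 0
    for v in a:
        ai = ai * N_OUT + v
    return xi * N_OUT ** N + ai

def generators():
    """No-signalling moves with entries in {-1, 0, 1} (singleton party vs. the rest)."""
    gens = []
    for k in range(N):
        for x_new in range(1, N_IN):
            for a_rest in product(range(N_OUT), repeat=N - 1):
                for x_rest in product(range(N_IN), repeat=N - 1):
                    r = [0] * DIM
                    for a_k in range(N_OUT):
                        a = a_rest[:k] + (a_k,) + a_rest[k:]
                        r[index(a, x_rest[:k] + (0,) + x_rest[k:])] += 1
                        r[index(a, x_rest[:k] + (x_new,) + x_rest[k:])] -= 1
                    gens.append(tuple(r))
    return gens

def identity_representations(start):
    """Apply +/- moves repeatedly, keeping only vectors without negative entries."""
    moves = generators()
    found = {start}
    frontier = [start]
    while frontier:
        new = []
        for s in frontier:
            for r in moves:
                for sign in (1, -1):
                    t = tuple(si + sign * ri for si, ri in zip(s, r))
                    if min(t) >= 0 and t not in found:
                        found.add(t)
                        new.append(t)
        frontier = new
    return found

# Starting point: the identity written for all parties making input 0.
start = [0] * DIM
for a in product(range(N_OUT), repeat=N):
    start[index(a, (0,) * N)] = 1
start = tuple(start)

reps = identity_representations(start)
print(len(reps))
print(all(set(u) <= {0, 1} for u in reps))  # True
```

Only non-negativity is enforced explicitly; because every vector reached still represents the identity, an integer entry larger than 1 would give an inner product above 1 with some local deterministic state, so the surviving vectors are {0, 1}-valued.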
Algorithm 1 is a sub-algorithm of our main routine:

Algorithm 2 - generate all {0, 1}-valued effects
1. Use Algorithm 1 to generate all {0, 1}-valued representations of the identity effect.
2. For each representation of the identity, form a new set of effects containing all effects that are obtained by deleting any number of 1s from each of the representations.
3. Take the union of all the sets generated.
4. For each element $e_i$ in this union, compute $(e_i, L(e_i))$, where $L(e_i) = (\langle e_i, s_{L,1}\rangle, \langle e_i, s_{L,2}\rangle, \ldots)$, and generate a list $S = ((e_1, L(e_1)), (e_2, L(e_2)), \ldots)$ of all these pairs.
5. Go through the list checking whether $L(e_i) = L(e_j)$ where $i \ne j$. If so, remove one of the two elements from $S$.

Note that each identity effect has $n_O^N$ elements with value 1, so there are $2^{n_O^N}$ ways of deleting 1s for each representation of the identity effect in Step 2 (hence this algorithm does not scale well as the number of parties increases).
Taking the union of the sets involves removing any duplicate representations.However, at this point there remain different representations of the same effect in our set.The use of L(e i ) is a convenient way to remove such different representations.
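To illustrate Steps 2 to 5 on the smallest possible case, the sketch below treats a single system with two 2-outcome measurements, for which the two {0, 1}-valued representations of the identity are hard-coded rather than generated by Algorithm 1. The function names are assumptions made for this example.

```python
from itertools import product

# Single system, n_I = n_O = 2: vectors are (P(0|0), P(1|0) | P(0|1), P(1|1)).
identity_reps = [(1, 1, 0, 0), (0, 0, 1, 1)]   # hard-coded input (assumption)

# Local deterministic states: choose an output for each of the two inputs.
ld_states = []
for out0, out1 in product((0, 1), repeat=2):
    s = [0, 0, 0, 0]
    s[out0] = 1
    s[2 + out1] = 1
    ld_states.append(tuple(s))

def sub_vectors(u):
    """Step 2: all vectors obtained from u by deleting any number of 1s."""
    ones = [i for i, v in enumerate(u) if v == 1]
    out = set()
    for keep in product((0, 1), repeat=len(ones)):
        e = [0] * len(u)
        for pos, k in zip(ones, keep):
            e[pos] = k
        out.add(tuple(e))
    return out

def signature(e):
    """L(e): inner products of e with all local deterministic states."""
    return tuple(sum(ei * si for ei, si in zip(e, s)) for s in ld_states)

# Step 3: the union removes duplicate vectors (here, the zero vector).
candidates = set().union(*(sub_vectors(u) for u in identity_reps))

# Steps 4 and 5: keep one vector per signature, removing different
# representations of the same effect (here, the two identity vectors).
unique = {}
for e in sorted(candidates):
    unique.setdefault(signature(e), e)
effects = list(unique.values())

print(len(candidates), len(effects))  # 7 6
```

The six effects that remain are the zero effect, the identity and the four single-probability effects.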
We are also interested in classifying the extremal effects as either wirings or non-wirings.We first classify a set of wiring representations using the following algorithm (cf.Lemma 4):

Algorithm 3 -classify a representation of an effect as wiring representation
This algorithm takes as input a representation e R of an effect.
1. If the number of parties is 1, return 1.⁵
2. Run over all exchanges of party 1 with each other party, and exchanges of input labels for the chosen party until the resulting effect (after the exchanges) has zero elements wherever $x_1 \ne 0$. If no such case is found return 0.
3. For each of the possible outcomes $a_1$ compute the $N - 1$ party effect conditioned on $x_1 = 0$ and the value of $a_1$ (there are $n_O$ instances to compute).
4. Recursively run the same algorithm on each of these $N - 1$ party effects. If all cases return 1 then return 1, otherwise return 0.
Algorithm 3 outputs 0 if its input is not a wiring representation and outputs 1 if it is.An effect is a wiring if and only if it can be expressed as a convex combination of effects that have wiring representations.An extremal effect is hence a wiring if and only if it has a wiring representation.
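A possible recursive implementation of this check is sketched below. It departs from the literal steps in two labelled ways: instead of relabelling parties and inputs it searches directly for a party whose input is the same in every non-zero entry, and it tries every such party rather than stopping at the first one found. The sparse dictionary format and all names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

def is_wiring_representation(entries, n_parties):
    """entries maps (a, x) -> value for the non-zero entries; a, x are tuples."""
    if n_parties == 1:
        return True                      # Step 1
    for k in range(n_parties):           # Step 2 (search instead of relabelling)
        if len({x[k] for (a, x) in entries}) > 1:
            continue                     # party k's input is not fixed
        # Step 3: the (N-1)-party vectors conditioned on each outcome of party k.
        branches = {}
        for (a, x), value in entries.items():
            rest = (a[:k] + a[k + 1:], x[:k] + x[k + 1:])
            branches.setdefault(a[k], {})[rest] = value
        # Step 4: every conditional vector must itself pass the check.
        if all(is_wiring_representation(b, n_parties - 1) for b in branches.values()):
            return True
    return False

def from_vector(vec, n_parties, n_in=2, n_out=2):
    """Convert a flat vector (inputs major, outputs minor) to the sparse format."""
    block = n_out ** n_parties
    all_x = list(product(range(n_in), repeat=n_parties))
    all_a = list(product(range(n_out), repeat=n_parties))
    return {(all_a[i % block], all_x[i // block]): v
            for i, v in enumerate(vec) if v != 0}

# The effect e_{a=0} of Fig. 1: x1 = 0, x2 = a1, accept when a1 XOR a2 = 0.
wiring = (1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
# A valid but non-wiring representation of the identity: the average of the
# identity written for inputs 00 and for inputs 11.
half = Fraction(1, 2)
mixed_identity = (half,) * 4 + (0,) * 8 + (half,) * 4

print(is_wiring_representation(from_vector(wiring, 2), 2))          # True
print(is_wiring_representation(from_vector(mixed_identity, 2), 2))  # False
```

The second example shows why a result of 0 for one representation does not settle the question for the effect: the identity effect is a wiring, but this particular representation of it is not a wiring representation.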
To connect Algorithm 3 to the previously mentioned notion of a wiring, consider the first step in an extremal wiring. This involves choosing one of the boxes to make an input to as well as the value of the input. Consider the case where this is the first box and the input made is $x_1 = 0$. In this case, $e_R$ will have zeros for the elements corresponding to any probabilities conditioned on other values of $x_1$. Hence, if $e_R$ is a wiring, there must exist a permutation $\Pi$ of parties and of input labels such that $\Pi e_R$ has zero elements wherever $x_1 \neq 0$. For each of the possible outcomes $a_1$ we can consider the $(N-1)$-party effect conditioned on $x_1 = 0$ and the value of $a_1$ (that each of these is an effect follows by considering the set of states that are a tensor product of the deterministic state that always outputs $a_1$ for $x_1 = 0$ with any $(N-1)$-party state on the remaining systems). We can check that all these smaller effects have an analogous property in the same way and recurse. If the required label permutation exists at all levels, we can conclude that the effect corresponds to a wiring.
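The recursion of Algorithm 3 can be sketched in Python as follows. This is an illustrative sketch under our own assumptions, not the authors' code: the representation is stored as a NumPy array e[x_1, a_1, ..., x_N, a_N], the input is assumed to be a valid effect representation, and the function name is hypothetical. Instead of physically exchanging party and input labels, the sketch searches for a party p and an input x such that all entries with $x_p \neq x$ vanish. It also tries every such candidate rather than stopping at the first one found, which is a cautious variation on Step 2.

```python
import numpy as np

def is_wiring_representation(e):
    """Recursive test in the spirit of Algorithm 3 for an array
    e[x_1, a_1, ..., x_N, a_N] (assumed to be a valid effect representation)."""
    e = np.asarray(e)
    n_parties = e.ndim // 2
    if n_parties == 1:                       # Step 1
        return True
    for p in range(n_parties):               # Step 2: candidate first party ...
        x_axis = 2 * p
        for x in range(e.shape[x_axis]):     # ... and candidate first input
            if np.any(np.delete(e, x, axis=x_axis) != 0):
                continue                     # some entry with x_p != x is nonzero
            fixed_input = np.take(e, x, axis=x_axis)  # axis 2p now labels a_p
            # Steps 3 and 4: condition on each outcome a_p and recurse.
            if all(is_wiring_representation(np.take(fixed_input, a, axis=x_axis))
                   for a in range(fixed_input.shape[x_axis])):
                return True
    return False

# Two-party example: input 0 to the first box, feed its outcome a into the
# second box as the input, and accept if the second outcome is 0.
wiring = np.zeros((2, 2, 2, 2), dtype=int)
for a in range(2):
    wiring[0, a, a, 0] = 1
print(is_wiring_representation(wiring))
```

As the surrounding text explains, a negative answer only concerns the given representation: another representation of the same effect may still be a wiring representation.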
Suppose we run Algorithm 3 on a particular {0, 1}-valued representation $e_R$ of an effect $e$. If we get output 1 then we know that $e$ can be implemented as a wiring. However, if we get 0, it could be that there is an alternative representation of $e$ that is a wiring representation. To understand which effects are wirings or not we can modify Steps 4 and 5 of Algorithm 2 to the following:

This case was already computed in [27] using a different procedure. The output of our algorithm in this case agrees with that of [27]. In particular, there are 82 extremal effects. We break these down into 7 classes (two effects are in the same class if they are equivalent up to relabellings, where we can relabel the parties, the inputs for each party and the outputs for each input of each party). All of the effects are wirings, which was already known from [26]. That all of the effects are wirings means we can alter Step 1 of Algorithm 3 to "if the number of parties is 2, return 1".
In this case the deterministic effects are sufficient for describing the full effect polytope, so the more general case is omitted here.
Performing a vertex enumeration using the software Porta [38], we are also able to find all the extremal representations of the identity effect. It turns out that there are 710760 of these, of which 744 are {0, 1}-valued (and 680 of the latter are wiring representations). These extremal representations of the identity effect break down into 307 classes, of which 9 are {0, 1}-valued (and 8 of the latter are wiring representations). Representatives of each of the 307 classes can be found in the Supplementary Material [37].

N = 4
In this case computing all the extremal representations of the identity effect is not feasible in reasonable time using Porta. Furthermore, the number of extremal effects is too large to directly use our previous technique for the {0, 1}-valued extremal effects. Instead we can compute all the {0, 1}-valued extremal effects by computing one representative of each symmetry class. The computation works in the same way as before, but we remove symmetries at every step.
In particular, in Algorithm 1 we add a step between Steps 3 and 4 that removes elements of $S$ that are equal to others under relabelling symmetries. In Algorithm 2, rather than using the list of local values $L(e_i)$ we generate a canonical form of these by generating $L(\Pi e_i)$ for every symmetry $\Pi$ of $e_i$ and then storing the first of all of these according to some ordering function (e.g., since each list $L(\Pi e_i)$ is {0, 1}-valued, they can be ordered as a binary number). We run Algorithm 2 with the modification to classify into wiring or non-wiring representations (i.e., using Steps 4′ and 5′).
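The canonical form described above can be sketched as follows. This is a small-scale illustration under our own assumptions (array storage e[x_1, a_1, ..., x_N, a_N], local deterministic states used for the local values, hypothetical function names); it enumerates all relabellings by brute force and is therefore only practical for very small scenarios. Comparing the {0, 1}-valued tuples lexicographically is equivalent to comparing them as binary numbers of equal length.

```python
import itertools
import numpy as np

def relabellings(e, n_in, n_out):
    """Yield e under every relabelling of parties, of inputs per party and of
    outputs per input per party; e is indexed as e[x_1, a_1, ..., x_N, a_N]."""
    n = e.ndim // 2
    out_perms = list(itertools.permutations(range(n_out)))
    local = [(pi, outs)
             for pi in itertools.permutations(range(n_in))
             for outs in itertools.product(out_perms, repeat=n_in)]
    for party_perm in itertools.permutations(range(n)):
        base = np.transpose(e, [ax for p in party_perm for ax in (2 * p, 2 * p + 1)])
        for choice in itertools.product(local, repeat=n):
            g = base
            for p, (pi, outs) in enumerate(choice):
                # Bring (x_p, a_p) to the front, relabel, and move them back.
                g = np.moveaxis(g, (2 * p, 2 * p + 1), (0, 1))
                g = np.stack([g[pi[x]][list(outs[x])] for x in range(n_in)])
                g = np.moveaxis(g, (0, 1), (2 * p, 2 * p + 1))
            yield g

def local_states(n, n_in, n_out):
    """All products of deterministic single-party states, as integer arrays."""
    strategies = list(itertools.product(range(n_out), repeat=n_in))
    states = []
    for fs in itertools.product(strategies, repeat=n):
        s = np.zeros((n_in, n_out) * n, dtype=int)
        for xs in itertools.product(range(n_in), repeat=n):
            s[tuple(v for x, f in zip(xs, fs) for v in (x, f[x]))] = 1
        states.append(s)
    return states

def canonical_local_values(e, n_in, n_out):
    """Smallest list of local values over all relabellings of e."""
    e = np.asarray(e)
    states = local_states(e.ndim // 2, n_in, n_out)
    return min(tuple(int(np.sum(g * s)) for s in states)
               for g in relabellings(e, n_in, n_out))

product_a = np.zeros((2, 2, 2, 2), dtype=int)
product_a[0, 0, 0, 0] = 1
product_b = np.zeros((2, 2, 2, 2), dtype=int)
product_b[1, 1, 1, 0] = 1
print(canonical_local_values(product_a, 2, 2) == canonical_local_values(product_b, 2, 2))
```

The two product effects in the usage example differ only by a relabelling of inputs and outputs, so they receive the same canonical key and would be stored once.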
Overall we find 168301 classes of extremal {0, 1}-valued effect, of which 124698 are wiring representations.By generating all the symmetries of each, we can then compute the total number of {0, 1}-valued effects to be 7940781474, of which 4729832866 are wiring representations.Because of the size, we only supply Supplementary files with an element of each class in this case [37].

B. Generalisations: more inputs and outputs
In the case $n_I = 2$ and $n_O = 2$ we were only able to partially solve the cases with $N = 3$ and $N = 4$. Increasing the number of inputs and outputs further increases the complexity, but we can make a few remarks.
Firstly consider the case $N = 2$. It was proven in [26] that the bipartite effect spaces also only contain wirings. Using our code we enumerate the number of classes for the first few cases, as well as the total number of effects (see Table I). Data with the full set of extremal effects for these cases can be found in the Supplementary Material [37].
In the case N = 3, the only additional case we attempt is Here we use the previous method to compute the {0, 1}-valued effects, obtaining 79 classes of such effect of which 76 are wirings and 3 are non-wirings.In total the number of {0, 1}-valued effects is 505136 which breaks down as 449840 wirings and 55296 non-wirings.Again, these cases can be found in the Supplementary Material [37].
Our codes can also be used for the enumeration of all extremal effects in these scenarios (within the computational limitations).We omit explicit characterisations here.

VII. APPLICATIONS OF NON-WIRINGS
In this section we discuss some applications of non-wiring measurements, focusing on those that outperform wirings.

A. State discrimination
Given a black box that outputs one of two possible (known) states, $s_1$ and $s_2$, with probability 1/2 each, the task is to choose a measurement that gives the highest probability of correctly guessing which state was produced given just one copy.
When trying to discriminate between two probability distributions, $P_X$ and $Q_X$, the guessing probability is
\[ \frac{1}{2}\left(1 + D_C(P_X, Q_X)\right), \]
where $D_C$ is the total variation distance (the subscript C indicating classical), i.e., $D_C(P, Q) = \frac{1}{2}\sum_x |P_X(x) - Q_X(x)|$. In the case of two quantum states, $\rho_1$ and $\rho_2$, this optimal guessing probability is
\[ \frac{1}{2}\max_{E_1, E_2}\left(\mathrm{tr}(E_1\rho_1) + \mathrm{tr}(E_2\rho_2)\right) = \frac{1}{2}\left(1 + D_Q(\rho_1, \rho_2)\right), \]
where $\{E_1, E_2\}$ form a POVM and $D_Q$ is the trace distance (the subscript Q indicating quantum), i.e., $D_Q(\rho_1, \rho_2) = \frac{1}{2}\mathrm{tr}|\rho_1 - \rho_2|$ (see, e.g., [39,40]). The analogous formula for boxworld is that the optimal probability is
\[ \frac{1}{2}\max_{e_1, e_2}\left(\langle e_1, s_1\rangle + \langle e_2, s_2\rangle\right) = \frac{1}{2}\left(1 + \max_{e_1}\left(\langle e_1, s_1\rangle - \langle e_1, s_2\rangle\right)\right), \]
where the first maximization is over all measurements $\{e_1, e_2\}$ and the second is over all effects $e_1$. The second expression follows because $e_2 = u - e_1$ and $\langle u, s_2\rangle = 1$. The optimum will always be achieved by an extremal effect; hence, in cases where we have computed all extremal effects, we can calculate it by running over all of these. By analogy with the quantum and classical cases, it is natural to define $D_B(s_1, s_2) := \max_{e_1} |\langle e_1, s_1\rangle - \langle e_1, s_2\rangle|$ (it is not clear how to remove the maximization from this expression in this case). The quantity $D_B$ also satisfies the requirements of a distance measure [41].
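Running over a list of extremal effects to evaluate $D_B$ can be sketched in a few lines of Python. The example below is only an illustration: it uses a single system with two inputs and two outputs, stores states and effects as arrays indexed by [x, a] (our convention), and the function names are hypothetical.

```python
import numpy as np

def boxworld_distance(effects, s1, s2):
    """D_B(s1, s2) = max over the supplied effects of |<e, s1> - <e, s2>|."""
    diff = np.asarray(s1, dtype=float) - np.asarray(s2, dtype=float)
    return max(abs(float(np.sum(np.asarray(e) * diff))) for e in effects)

def guessing_probability(effects, s1, s2):
    """Optimal single-copy guessing probability (1 + D_B) / 2."""
    return 0.5 * (1.0 + boxworld_distance(effects, s1, s2))

# Effects of a single two-input, two-output system used for this illustration:
# the zero effect, the identity effect and the four effects e_{x,a}.
zero = np.zeros((2, 2))
identity = np.array([[1, 1], [0, 0]])
single_entry = []
for x in range(2):
    for a in range(2):
        e = np.zeros((2, 2))
        e[x, a] = 1
        single_entry.append(e)
effects = [zero, identity] + single_entry

always_zero = np.array([[1.0, 0.0], [1.0, 0.0]])   # P(a|x): outputs 0 for both inputs
uniform = np.array([[0.5, 0.5], [0.5, 0.5]])       # uniformly random outputs
print(guessing_probability(effects, always_zero, uniform))
```

For multi-system states the same loop applies once the list of extremal effects for the relevant scenario has been generated.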
Using wirings and deterministic non-wirings we can only correctly guess which of these two states is present with probability at most 2423/2592 ≈ 0.935. We can turn this problem around and ask which non-wirings are advantageous for state discrimination (meaning that they outperform wiring effects for some pair of states). For any non-wiring effect $e$, this question can be answered using a linear program. Let $s_1$ and $s_2$ be the two states to be distinguished and $\mu$ be fixed. The linear program is

where $v_N$ is a vector that encodes the normalisation constraints, and $M_{NS}$ and $M_W$ are matrices encoding the non-signalling constraints and wirings respectively. (An inequality between a vector and a number is interpreted element-wise.) A non-wiring $e$ is advantageous if $\nu < \mu$. In the case of perfect distinguishability $\mu$ is set to 1. In order to find any separation, $\mu$ could be taken as a variable that is optimised and $\mu - \nu$ maximized instead.
We have implemented this program in MATLAB, relying on YALMIP [42] and MOSEK [43] to solve the linear programs. Using this program, various examples analogous to the ones above can be found. Checking all effects mentioned in Section VI A 2 with this program, we find that there are also non-wirings that can perfectly distinguish only pairs of states that are already perfectly distinguishable with wirings. In addition, many of the effects outperform wirings with respect to the distinguishing probabilities they achieve for some states, but without reaching perfect distinguishability.
Our example disproves Observation 1 from [44]. Indeed we show that these 8 states cannot be perfectly distinguished with 1-way LOCC in boxworld, while a global measurement achieves this. We see this as an indication that it is not the local indistinguishability within pairs of local states that causes this phenomenon in this example but rather the existence of separable but global measurements.
Note that on the other hand an analogue to the bipartite (9-state) example from [28,45] cannot be constructed, since all bipartite measurements in boxworld (even in higher dimensions) are wirings [26].
In boxworld all measurements are separable so a distinction between separable and entangled measurements cannot be made.That we can still demonstrate that separable measurements outperform wiring measurements suggests that in some contexts comparing separable and wiring measurements may be more natural than comparing separable and entangled measurements, although in others, e.g., when considering teleportation, entangled measurements are required.

C. Nonlocality distillation
Consider two parties, Alice and Bob, who hold parts of $t$ bipartite systems, with each subsystem having two inputs and two outputs. For simplicity, take these $t$ systems to be identical (with state $\hat{s}$). A nonlocality distillation protocol seeks to use these $t$ systems to give a larger violation of a Bell inequality than is possible with only 1. The most general strategy is for each party to associate a $t$-system 2-outcome measurement with each possible input. Such measurements have the form $\{e, u - e\}$ and so can be expressed in terms of one effect. Thus, the overall strategy can be expressed using 4 effects that act on $t$ systems (one for each of Alice's inputs and one for each of Bob's inputs).

Because the individual states are identical, the overall starting state is the $t$-fold tensor product, $s = \hat{s}^{\otimes t}$. If $e_x$ are the effects associated with outcome 0 when Alice's input is $x$, and $f_y$ are likewise those for Bob, then the outcome probabilities are given by $P'(00|xy) = \langle e_x \otimes f_y, s\rangle$, $P'(01|xy) = \langle e_x \otimes (u - f_y), s\rangle$ etc., where the tensor factors need to be matched appropriately ($s \in S_{A_1B_1A_2B_2\ldots A_tB_t}$, $e_x \in E_{A_1A_2\ldots A_t}$ and $f_y \in E_{B_1B_2\ldots B_t}$). That the overall effect is a tensor product reflects the independence of Alice's and Bob's operations. The idea of nonlocality distillation is to choose the four effects $e_0$, $e_1$, $f_0$ and $f_1$ so as to maximize the violation of a Bell inequality in the resulting distribution, $P'(ab|xy)$. Figure 2 depicts this intuitively for 3 shared systems.

For systems with two inputs and two outputs, the only extremal Bell inequality (up to symmetry) is the Clauser-Horne-Shimony-Holt (CHSH) inequality, which we can express as $\mathrm{CHSH}(P(ab|xy)) = E_{00} + E_{01} + E_{10} - E_{11}$, with $E_{xy} = P(a = b|xy) - P(a \neq b|xy)$. We hence use this as our measure of nonlocality. Since optimizing over all effects for one party is a linear program, we can run over all extremal effects for Alice and do a linear program for Bob to determine the optimal strategy for CHSH-value distillation given 3 shared systems [46] (in principle we could also do 4, but, given the number of effects, the computation time is prohibitive).
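The CHSH value used as the figure of merit above can be evaluated directly from a distribution $P(ab|xy)$. The following Python sketch assumes (our convention) that the distribution is stored as an array P[a, b, x, y]; the function name is hypothetical.

```python
import numpy as np

def chsh_value(P):
    """CHSH(P) = E_00 + E_01 + E_10 - E_11 with E_xy = P(a=b|xy) - P(a!=b|xy),
    for a distribution stored as P[a, b, x, y]."""
    value = 0.0
    for x in range(2):
        for y in range(2):
            same = P[0, 0, x, y] + P[1, 1, x, y]
            different = P[0, 1, x, y] + P[1, 0, x, y]
            sign = -1.0 if (x, y) == (1, 1) else 1.0
            value += sign * (same - different)
    return value

# PR box: outputs are uniformly random subject to a XOR b = x AND y.
pr_box = np.zeros((2, 2, 2, 2))
for a, b, x, y in np.ndindex(2, 2, 2, 2):
    if (a ^ b) == (x & y):
        pr_box[a, b, x, y] = 0.5

# Local deterministic box: both parties always output 0.
local_box = np.zeros((2, 2, 2, 2))
local_box[0, 0, :, :] = 1.0

print(chsh_value(pr_box), chsh_value(local_box))
```

The PR box reaches the algebraic maximum of 4 and the deterministic box reaches the local bound of 2; a distillation protocol is judged by applying the same function to the final distribution $P'(ab|xy)$.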

D. Limitations for information processing with wirings and boxworld's non-wiring operations
Despite the advantages we managed to demonstrate above, access to non-wirings in boxworld does not unlock the same potential as access to measurements that cannot be implemented as local measurements and classical communication in quantum mechanics does. In particular, all measurements in boxworld are separable [26] (i.e., can be expressed as a sum of product effects), which leads to various restrictions. For instance, teleportation and entanglement swapping are impossible in boxworld [8,26,27]. This directly implies that entanglement swapping is also not possible in the multi-partite setting (i.e., with non-wirings). A direct proof of this, which also applies to other GPTs, is obtained by following the same lines of reasoning as Lemma 2 of [50].

E. Boxworld effects in other GPTs
Wirings correspond to a classical processing of inputs and outcomes from measurements and as such the inputs and outcomes of any GPT can be connected by wirings. We show here that from the effects derived in this work, including non-wiring effects, we can indeed construct valid effects for any other GPT.

Consider a single system that is fully characterised by $n_I$ inputs and $n_O$ outcomes and call the SBV effects $e_{1,1}, \ldots, e_{n_I,n_O}$, where $e_{i,j}$ is the effect that has an entry 1 for the $i$th input and $j$th outcome and is zero otherwise. For $N$ parties, the effects $\{e_{i_1,j_1} \otimes \cdots \otimes e_{i_N,j_N}\}_{i_1,\ldots,i_N,j_1,\ldots,j_N}$ span the effect cone of all boxworld effects. Thus, the effects we derived above for $N$ parties can be written in terms of the local effects as
\[ e = \sum_{i_1,\ldots,i_N,j_1,\ldots,j_N} \lambda_{i_1,\ldots,i_N,j_1,\ldots,j_N}\, e_{i_1,j_1} \otimes \cdots \otimes e_{i_N,j_N}, \]
where $\lambda_{i_1,\ldots,i_N,j_1,\ldots,j_N}$ are some coefficients (which can be taken to be positive, cf. Lemma 5). Now, as we know that any valid effect is valid on any valid state, i.e., $0 \leq \langle e, s\rangle \leq 1$, we also have
\[ 0 \leq \sum_{i_1,\ldots,i_N,j_1,\ldots,j_N} \lambda_{i_1,\ldots,i_N,j_1,\ldots,j_N}\, P(j_1, \ldots, j_N|i_1, \ldots, i_N) \leq 1, \]
where $P(j_1, \ldots, j_N|i_1, \ldots, i_N)$ is any no-signalling distribution. This holds because the state space in boxworld is in 1:1 correspondence with the set of all no-signalling distributions, of which the effects $\{e_{i_1,j_1} \otimes \cdots \otimes e_{i_N,j_N}\}_{i_1,\ldots,i_N,j_1,\ldots,j_N}$ essentially just pick out elements.

FIG. 3. The red lines show the boundary of the distillable regions using our non-wiring protocol, in different cross-sections of the (2,2,2) no-signalling polytope. The dotted curves represent the boundary of the quantum realizable correlations. On the left, the blue curve represents the boundary of the distillable region using the xor protocol of [47], while the green one represents correlations that can be distilled through the or protocols of [46]. The red shaded area shows where our non-wiring protocol achieves higher final CHSH values than the protocols of [46][47][48][49] (for the protocol of [49], the comparison has been made with both its 2-copy and 3-copy versions). Note that in CS III there is no two-copy protocol that can distil any of the states [46].
Performing local measurements on $N$-party states leads to non-signalling correlations in any GPT, since non-signalling is one of the underlying assumptions. This implies that the correlations arising from performing any local measurements on an $N$-party system are mapped to a probability by the map defined by the coefficients $\{\lambda_{i_1,\ldots,i_N,j_1,\ldots,j_N}\}_{i_1,\ldots,i_N,j_1,\ldots,j_N}$. This means that for any GPT we can build valid effects from the ones we derived for boxworld, namely using these same coefficients:
\[ \sum_{i_1,\ldots,i_N,j_1,\ldots,j_N} \lambda_{i_1,\ldots,i_N,j_1,\ldots,j_N}\, f_{i_1,j_1} \otimes \cdots \otimes f_{i_N,j_N}, \]
where the $f_{i,j}$ are local effects in the GPT of interest.
Note that a special case of the above is that in any GPT in which SBVs are valid local effects, all the effects found by our algorithms are directly valid.Furthermore, the state space of a GPT can always be expressed in terms of the outcome probabilities of a set of fiducial measurements.Having made the transformation needed for this representation, the SBVs are local effects.
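As a concrete illustration of this construction, the Python sketch below takes quantum theory as the GPT of interest and uses qubit measurements in the Z and X bases as the two local measurements. These choices, the array storage lam[i_1, j_1, ..., i_N, j_N] and the function name are our assumptions for the example only. The coefficients are those of a simple two-party wiring (measure the first system with input 0 and use its outcome as the input for the second system, accepting outcome 0).

```python
import numpy as np

def build_effect(lam, local_effects):
    """Sum of lam[i_1, j_1, ..., i_N, j_N] times the tensor product of
    local_effects[i_k][j_k], returned as a matrix on the joint system."""
    n_parties = lam.ndim // 2
    dim = local_effects[0][0].shape[0]
    total = np.zeros((dim ** n_parties, dim ** n_parties), dtype=complex)
    for idx in np.ndindex(*lam.shape):
        if lam[idx] == 0:
            continue
        term = np.array([[1.0 + 0.0j]])
        for p in range(n_parties):
            term = np.kron(term, local_effects[idx[2 * p]][idx[2 * p + 1]])
        total += lam[idx] * term
    return total

# Local qubit effects f_{i,j}: input 0 is a Z-basis measurement, input 1 an
# X-basis measurement (illustrative choice).
z_effects = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
x_effects = [0.5 * np.array([[1.0, 1.0], [1.0, 1.0]]),
             0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])]
local_effects = [z_effects, x_effects]

# Coefficients of the two-party wiring described above.
lam = np.zeros((2, 2, 2, 2))
for a in range(2):
    lam[0, a, a, 0] = 1.0

quantum_effect = build_effect(lam, local_effects)
eigenvalues = np.linalg.eigvalsh(quantum_effect)
print(eigenvalues.min(), eigenvalues.max())
```

In quantum theory a two-party operator is a valid effect when its eigenvalues lie in [0, 1], which is what the final check inspects for this example.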

VIII. THE SIGNIFICANCE OF WIRING AND NON-WIRING OPERATIONS BEYOND GPTS
The polytope spanned by all identity effects in boxworld in any $(N, n_I, n_O)$ scenario is of interest beyond the scope of boxworld and even beyond the scope of GPTs. This is illustrated with the following two connections to other areas of quantum foundations. This means that our Algorithm 2 may be of more general interest in the future in the sense that it allows us to construct logically consistent classical processes without causal order as well as compositions of contextuality scenarios.

A. Logically consistent classical processes
We also remark that the polytope of all identity effects in boxworld is equivalent to that of the logically consistent classical processes, which was computed in [23]. Since extremal effects for a single system in boxworld are SBVs in the $(n_I n_O)^N$-dimensional representation (and coarse-grainings of these), operationally, any extremal local measurement can be performed by choosing an input and then potentially coarse-graining the outcome, which is a classical process. This explains the correspondence between performing multi-system measurements on boxworld systems and the ways these classical local operations can be consistently connected. This correspondence extends: for any number of inputs and outputs the logically consistent classical processes can be understood in terms of the (identity) measurements of a GPT, namely boxworld. This operational way of thinking about the logically consistent operations may aid our understanding of such operations.

B. Composing contextuality scenarios
Contextuality is a notion that captures the lack of predetermined outcomes in quantum measurements, as these may depend on the context in which the measurements are performed. There are various approaches capturing this notion, starting from the original work of Kochen and Specker [51]. In [25], the composition of contextuality scenarios was analysed, following the approach to contextuality from [24]. According to this approach, a contextuality scenario is represented by a hypergraph $(W, M)$, where the vertices $e \in W$ represent events of a specific outcome occurring and hyperedges $m \in M$ correspond to collections of events that make up a measurement. These edges may overlap on some vertices, meaning that the respective event can occur as part of several measurements. In a non-contextual model, each vertex can be assigned a probability, since the event occurs with the same frequency no matter which of the measurements it is part of is performed (in a contextual model the probability will in general also depend on the measurement). The probabilistic models that are allowed then further depend on the underlying theory (e.g., classical, quantum or more general).
These contextuality scenarios can then also be considered in the multi-party regime, meaning that independent contextuality scenarios for several independent systems are turned into a single scenario for the joint system. This means that for two systems $A$, $B$, for each of which a contextuality scenario $(W_A, M_A)$, $(W_B, M_B)$ is given, a joint scenario $(W_{AB}, M_{AB})$ with events $e_{AB} = (e_A, e_B) \in W_{AB}$, $e_A \in W_A$, $e_B \in W_B$ is constructed in a way that the probabilistic models defined on the hypergraph satisfy the non-signalling principle. Depending on the set $M_{AB}$ that is constructed for this purpose, we speak about a different product. While in the case of two contextuality scenarios there is a unique way to compose them [52], namely by means of the Foulis-Randall (FR) product, in [24,25] different ways to compose more than two such scenarios, all respecting the non-signalling principle, were proposed.
Our multi-partite {0, 1}-valued effects can be seen as ways to combine products of deterministic local effects into multipartite measurements for our (non-signalling) GPT systems, which indeed amounts to the same mathematical problem as composing contextuality scenarios.Local deterministic effects correspond to the events v and the identity effects to the edges e in these multiparty scenarios.Thus our Algorithm 1 can be seen as a way to construct compositions of contextuality scenarios.Specifically, this algorithm constructs a product known as the disjunctive FR product in [25].The subset of wiring effects identified by Algorithm 3 constructs the maximal FR product from [25].
The existence of extremal identity effects that are not {0, 1}-valued further shows that there is a more general way to combine single-system effects compatibly with non-signalling. This reasoning could also be applied to the events in a contextuality scenario. This suggests that the hypergraph formalism for describing contextuality scenarios needs to be extended to include a new concept that generalises the notion of a hyperedge. This generalisation requires a weight for each element of the hyperedge. [This can be easily added within the matrix representation of the hypergraph, which works as follows. Each column corresponds to a vertex and each row denotes a hyperedge, with a 1 meaning the vertex of the corresponding column is in the hyperedge and a 0 meaning it is not. In the generalisation, the matrix is no longer restricted to contain only elements of {0, 1}.] It would be interesting to explore the significance of this for non-contextuality in more detail.
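The matrix representation mentioned in the bracketed remark can be made concrete with a toy Python example. Everything specific here is an assumption for illustration: the scenario has four events and two measurements, the normalisation check follows the usual requirement that the probabilities on each hyperedge sum to 1, and the weighted row is simply the average of the two ordinary hyperedges. It is chosen to show the data structure and is not one of the extremal identity effects found in this work.

```python
import numpy as np

# Columns: vertices (events) v0..v3. Rows: hyperedges (measurements).
# A 1 means the vertex belongs to the hyperedge, a 0 means it does not.
incidence = np.array([[1, 1, 0, 0],
                      [0, 0, 1, 1]])

def is_normalised_on(rows, p, tol=1e-9):
    """Check that every (possibly weighted) row assigns total weight 1 to p."""
    totals = np.asarray(rows, dtype=float) @ np.asarray(p, dtype=float)
    return bool(np.all(np.abs(totals - 1.0) < tol))

# A probabilistic model: one probability per event.
p = np.array([0.3, 0.7, 0.6, 0.4])

# Generalised hyperedge: entries are weights rather than 0/1 membership flags.
weighted_row = np.array([[0.5, 0.5, 0.5, 0.5]])

print(is_normalised_on(incidence, p), is_normalised_on(weighted_row, p))
```

The only structural change in the generalisation is the data type of the matrix entries; the normalisation check itself is unchanged.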

IX. CONCLUSION
Characterizing measurements in theories beyond quantum theory allows us to better explore the possibilities for information processing in such theories and, in turn, helps us understand what is special about quantum theory itself. In this paper we have explored ways to generate all the effects present in a maximally nonlocal alternative theory, boxworld. We have been able to find all the deterministic effects in several scenarios, dividing them into wirings and non-wirings, and also found many classes of non-deterministic effect. Although we have focused on boxworld, theories with fewer states have fewer constraints on allowed effects (under the no-restriction hypothesis) and hence the effects we have found are applicable in a wide range of GPTs (see Section VII E). The effects we have found are also relevant for studies of logically consistent classical processes and for compositions of contextuality scenarios, where our findings suggest the need to extend the hypergraph formalism for dealing with these.
We have applied our findings to several information-processing tasks, demonstrating advantages of both deterministic and nondeterministic non-wirings. In particular, we showed that, contrary to previous claims, examples of nonlocality without entanglement also appear in boxworld. We remark here that the existence of further examples of quantum nonlocality without entanglement has recently been shown in the literature, using a construction based on classical processes without causal order [54,55]. Due to the correspondences we establish in Sections VIII A and VII E, any such example can likely be turned directly into an example for other GPTs.
In quantum theory there is often focus on entangled measurements, but there are other types of joint measurement (see, e.g., [28]) whose power would be useful to explore.Since boxworld has no non-separable effects and is somewhat limited with respect to its reversible dynamics [56], it would be interesting to complement our investigations with those of possible measurements in no-signalling theories with restricted nonlocality.

FIG. 2. Illustration of nonlocality distillation in the case $t = 3$. The grey boxes represent the identical initial bipartite states. The outer black frame represents the final correlations $P'$. The dashed ovals indicate the systems the measurements are performed on (left for Alice; right for Bob).