Certifying the Classical Simulation Cost of a Quantum Channel

A fundamental objective in quantum information science is to determine the cost in classical resources of simulating a particular quantum system. The classical simulation cost is quantified by the signaling dimension, which specifies the minimum amount of classical communication needed to perfectly simulate a channel's input-output correlations when unlimited shared randomness is held between encoder and decoder. This paper provides a collection of device-independent tests that place lower and upper bounds on the signaling dimension of a channel. Among them, a single family of tests is shown to determine when a noisy classical channel can be simulated using an amount of communication strictly less than either its input or its output alphabet size. In addition, a family of eight Bell inequalities is presented that completely characterizes when any four-outcome measurement channel, such as a Bell measurement, can be simulated using one communication bit and shared randomness. Finally, we bound the signaling dimension for all partial replacer channels in $d$ dimensions. The bounds are found to be tight for the special case of the erasure channel.


I. INTRODUCTION
The transmission of quantum states between devices is crucial for many quantum network protocols. In the near-term, quantum memory limitations will restrict quantum networks to "prepare and measure" functionality [1], which allows for quantum communication between separated parties but requires measurement immediately upon reception. Prepare and measure scenarios exhibit quantum advantages for tasks that involve distributed information processing [2] or establishing nonlocal correlations which cannot be reproduced by bounded classical communication and shared randomness [3]. These nonlocal correlations lead to quantum advantages in random access codes [4,5], randomness expansion [6], device self-testing [7], semi-device-independent key distribution [8], and dimensionality witnessing [9,10].
The general communication process is depicted in Fig. 1(a) with Alice (the sender) and Bob (the receiver) connected by some quantum channel N^{A→B}. Alice encodes a classical input x ∈ X into a quantum state ρ_x and sends it through the channel to Bob, who then measures the output using a positive operator-valued measure (POVM) {Π_y}_{y∈Y} to obtain a classical message y ∈ Y. The induced classical channel, denoted by P_N, has transition probabilities

P_N(y|x) = Tr[Π_y N(ρ_x)].   (1)

A famous result by Holevo implies that the communication capacity of P_N is limited by log_2 d, where d is the input Hilbert space dimension of N [11]; hence a noiseless classical channel transmitting d messages has a capacity no less than that of P_N. However, channel capacity is just one figure of merit, and there may be other features of P_N that do not readily admit a classical simulation. The strongest form of simulation is an exact replication of the transition probabilities P_N(y|x) for any set of states {ρ_x}_{x∈X} and POVM {Π_y}_{y∈Y}. This problem falls in the domain of zero-error quantum information theory [12][13][14][15][16], which considers the classical and quantum resources needed to perfectly simulate a given channel. Unlike the capacity, a zero-error simulation of P_N typically requires additional communication beyond the input dimension of N. For example, a noiseless qubit channel id_2 can generate channels P_{id_2} that cannot be faithfully simulated using one bit of classical communication [3].
The simulation question becomes more interesting if "static" resources are used for the channel simulation [17,18], in addition to the "dynamic" resource of noiseless classical communication. For example, shared randomness is a relatively inexpensive classical resource that Alice and Bob can use to coordinate the encoding and decoding maps used in the simulation protocol shown in Fig. 1(b). Using shared randomness, a channel can be exactly simulated with a forward noiseless communication rate that asymptotically approaches the channel capacity, a fact known as the Classical Reverse Shannon Theorem [19]. More powerful static resources such as shared entanglement or non-signaling correlations could also be considered [15,20,21].

FIG. 1. A general classical communication process. We represent classical information as blue double lines, quantum information as black solid lines, and shared randomness as dotted red lines. (a) A classical channel P_N is generated from a quantum channel N via Eq. (1). A classical-quantum encoder Ψ maps the classical input x ∈ X into a quantum state ρ_x. A quantum-classical decoder Π implements the POVM {Π_y}_{y∈Y}. (b) Channel P_N is simulated using shared randomness and a noiseless classical channel via Eq. (2). Alice encodes input x into classical message m with probability T_λ(m|x), while Bob decodes message m into output y with probability R_λ(y|m). The protocol is coordinated using a shared random value λ drawn from sample space Λ with probability q(λ).
While the Classical Reverse Shannon Theorem describes many-copy channel simulation, this work focuses on zero-error channel simulation in the single-copy case. The minimum amount of classical communication (with unlimited shared randomness) needed to perfectly simulate every classical channel P_N having the form of Eq. (1) is known as the signaling dimension of N [22]. Significant progress in understanding the signaling dimension was made by Frenkel and Weiner, who showed that every d-dimensional quantum channel requires no more than d classical messages to perfectly simulate [23]. This result is a "fine-grained" version of Holevo's Theorem for channel capacity mentioned above. However, the Frenkel-Weiner bound is not tight in general. For example, consider the completely depolarizing channel on d dimensions, D(ρ) = I/d. For any choice of inputs {ρ_x}_x and POVM {Π_y}_y, the Frenkel-Weiner protocol yields a simulation of P_D that uses a forward transmission of d messages. However, this is clearly not optimal since P_D can be reproduced with no forward communication whatsoever; Bob simply samples from the distribution P(y) = Tr[Π_y]/d. A fundamental problem is then to understand when a noisy classical channel sending d messages from Alice to Bob actually requires d noiseless classical messages for zero-error simulation. As a main result of this paper, we provide a family of simple tests that determine when this amount of communication is needed. In other words, we characterize the conditions in which the simulation protocol of Frenkel and Weiner is optimal for the purposes of sending d messages over a d-dimensional quantum channel.
This work pursues a device-independent certification of signaling dimension similar to previous approaches used for the device-independent dimensionality testing of classical and quantum devices [24][25][26][27][28]. Specifically, we obtain Bell inequalities that stipulate necessary conditions on the signaling dimension of N in terms of the probabilities P_N(y|x), with no assumptions made about the quantum states {ρ_x}_x, the POVM {Π_y}_y, or the channel N [29]. Complementary results have been obtained by Dall'Arno et al., who approached the simulation problem from the quantum side and characterized the set of channels P_N that can be obtained using binary encodings for special types of quantum channels N [29]. In this paper, we compute a wide range of Bell inequalities using the adjacency decomposition technique [30], recovering prior results of Frenkel and Weiner [23] and generalizing work by Heinosaari and Kerppo [31]. For certain cases we prove that these inequalities are complete, i.e., they provide both necessary and sufficient conditions on the signaling dimension. As a further application, we compute bounds for the signaling dimension of partial replacer channels. Proofs for our main results are found in the Appendix, while our supporting software is found on GitHub [32].

II. SIGNALING POLYTOPES
We begin our investigation by reviewing the structure of channels that use noiseless classical communication and shared randomness. Let P^{n→n'} denote the family of channels having input set X = [n] := {1, ..., n} and output set Y = [n']. A channel P ∈ P^{n→n'} is represented by an n' × n column stochastic matrix, and we thus identify P^{n→n'} as a subset of R^{n'×n}, the set of all n' × n real matrices. In general we refer to a column (or row) of a matrix as being stochastic if its elements are non-negative and sum to unity, and a column (resp. row) stochastic matrix has only stochastic columns (resp. rows). The elements of a real matrix G ∈ R^{n'×n} are denoted by G_{y,x}, while those of a column stochastic matrix P ∈ P^{n→n'} are denoted by P(y|x) to reflect their status as conditional probabilities. The Euclidean inner product between G, P ∈ R^{n'×n} is expressed as ⟨G, P⟩ := Σ_{x,y} G_{y,x} P(y|x), and for any G ∈ R^{n'×n} and γ ∈ R, we let the tuple (G, γ) denote the linear inequality ⟨G, P⟩ ≤ γ.
Consider now a scenario in which Alice and Bob have access to a noiseless channel capable of sending d messages. They can use this channel to simulate a noisy channel by applying pre- and post-processing maps. If they coordinate these maps using a shared random variable λ with probability mass function q(λ), then they can simulate any channel P that decomposes as

P(y|x) = Σ_{λ∈Λ} q(λ) Σ_{m=1}^{d} R_λ(y|m) T_λ(m|x),   (2)

where m ∈ [d] is the message sent from Alice to Bob and T_λ(m|x) (resp. R_λ(y|m)) is an element of Alice's encoder T_λ ∈ P^{n→d} (resp. Bob's decoder R_λ ∈ P^{d→n'}).
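The decomposition in Eq. (2) is a convex mixture of products R_λ T_λ and is easy to realize numerically. The following sketch is in Python with NumPy (the paper's own package, SignalingDimension.jl, is written in Julia; the function name here is ours, chosen for illustration):

```python
import numpy as np

def simulated_channel(q, encoders, decoders):
    """Eq. (2): P = sum_lambda q(lambda) * R_lambda @ T_lambda."""
    return sum(p * R @ T for p, T, R in zip(q, encoders, decoders))

# One classical bit (d = 2) with a fair shared coin, n = n' = 3.
T0 = np.array([[1, 0, 0],
               [0, 1, 1]])                # lambda = 0: Alice merges inputs 2 and 3
T1 = np.array([[1, 1, 0],
               [0, 0, 1]])                # lambda = 1: Alice merges inputs 1 and 2
R0 = np.array([[1, 0], [0, 1], [0, 0]])   # Bob relays messages to outputs 1, 2
R1 = np.array([[1, 0], [0, 0], [0, 1]])   # Bob relays messages to outputs 1, 3
P = simulated_channel([0.5, 0.5], [T0, T1], [R0, R1])
print(P[:, 2])   # -> [0.  0.5 0.5]: input 3 lands on outputs 2 and 3 evenly
```

The resulting matrix is column stochastic by construction, since each R_λ T_λ is a channel and the weights q(λ) sum to one.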
Definition 1. For given positive integers n, n', and d, the set of all channels satisfying Eq. (2) constitutes the signaling polytope, denoted by C^{n→n'}_d.
The signaling polytope C^{n→n'}_d is a convex polytope of dimension n(n'−1) whose vertices V ∈ P^{n→n'} have 0/1 matrix elements and satisfy rank(V) ≤ d. We define ⟨G, P⟩ ≤ γ as a Bell inequality for C^{n→n'}_d if C^{n→n'}_d ⊂ {P ∈ P^{n→n'} | ⟨G, P⟩ ≤ γ}, and it is a "tight" Bell inequality if the equation ⟨G, P⟩ = γ is also solved by n(n'−1) affinely independent vertices. When the latter holds, the solution space of ⟨G, P⟩ = γ is called a facet of C^{n→n'}_d. The Weyl-Minkowski Theorem ensures that a complete set of tight Bell inequalities {⟨G_k, P⟩ ≤ γ_k}_{k=1}^{r} exists such that P ∈ C^{n→n'}_d iff it satisfies all inequalities in this set [33]. Additional details about signaling polytopes are found in Appendix B.
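For small alphabets, the vertex description of C^{n→n'}_d can be enumerated by brute force: vertices are the deterministic channels (0/1 column stochastic matrices) whose rank, i.e. the number of distinct outputs used, is at most d. A Python sketch (the helper name is ours):

```python
import numpy as np
from itertools import product

def vertices(n, n_out, d):
    """All 0/1 column-stochastic n_out x n matrices of rank <= d, i.e. the
    deterministic channels that use at most d distinct outputs."""
    verts = []
    for image in product(range(n_out), repeat=n):   # deterministic map x -> y
        V = np.zeros((n_out, n), dtype=int)
        V[list(image), list(range(n))] = 1
        if len(set(image)) <= d:                    # rank(V) = # distinct outputs
            verts.append(V)
    return verts

# For n = n' = 3 and d = 2, the 27 deterministic maps minus the 6 bijections
# leave 21 vertices.
print(len(vertices(3, 3, 2)))   # -> 21
```

Such direct enumeration scales as n'^n, which is exactly why the adjacency decomposition technique discussed later becomes necessary for larger polytopes.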
Having introduced signaling polytopes, we can now define the signaling dimension of a channel. This terminology is adopted from recent work by Dall'Arno et al. [22] who defined the signaling dimension of a system in generalized probability theories; an analogous quantity without shared randomness has also been studied by Heinosaari et al. [34]. In what follows, we assume that N : S(A) → S(B) is a completely positive trace-preserving (CPTP) map, with S(A) denoting the set of density operators (i.e. trace-one positive operators) on system A, and similarly for S(B).
Definition 2. Let P_N^{n→n'} be the set of all classical channels P_N ∈ P^{n→n'} generated from N via Eq. (1). The n → n' signaling dimension of N, denoted by κ^{n→n'}(N), is the smallest d such that P_N^{n→n'} ⊆ C^{n→n'}_d.

For any channel N, a trivial upper bound on the n → n' signaling dimension is given by

κ^{n→n'}(N) ≤ min{n, n'}.   (3)

Indeed, when this bound is attained, Alice and Bob can simulate any P ∈ P^{n→n'}: either Alice applies channel P on her input and sends the output to Bob, or she sends the input to Bob and he applies P on his end. In Theorem 1 we provide necessary and sufficient conditions for when this trivial upper bound is attained. For a quantum channel N with input dimension d_A and output dimension d_B, the trivial upper bound becomes

κ^{n→n'}(N) ≤ min{n, d_A, d_B^2, n'}.   (4)

This follows from the Frenkel-Weiner result together with Carathéodory's Theorem [35], which implies that every POVM on a d_B-dimensional system can be expressed as a convex combination of POVMs with no more than d_B^2 outcomes [36]. Since shared randomness is free, Alice and Bob can always restrict their attention to POVMs with no more than d_B^2 outcomes for the purposes of simulating any channel in P_N^{n→n'} when n' ≥ d_B^2.

The notion of signaling dimension also applies to noisy classical channels. A classical channel from set X to Y can be represented by a CPTP map N : S(C^{|X|}) → S(C^{|Y|}) that completely dephases its input and output in fixed orthonormal bases {|x⟩}_{x∈X} and {|y⟩}_{y∈Y}, respectively. The transition probabilities of N are then given by Eq. (1) as P_N(y|x) = Tr[|y⟩⟨y| N(|x⟩⟨x|)]. The channel N can be used to generate another channel N' with input and output alphabets X' and Y' by performing a pre-processing map T : X' → X and a post-processing map R : Y → Y', thereby yielding the channel P_{N'} = R P_N T. When this relationship holds, P_{N'} is said to be ultraweakly majorized by P_N [31,34], and the signaling dimension of P_{N'} is no greater than that of P_N [15].
In practice, the channel connecting Alice and Bob may be unknown or not fully characterized. This is the case in most experimental settings where unpredictable noise affects the encoded quantum states. In such scenarios it is desirable to ascertain certain properties of the channel without having to perform full channel tomography, a procedure that requires trust in the state preparation device on Alice's end and the measurement device on Bob's side. A device-independent approach infers properties of the channel by analyzing the observed input-output classical correlations P(y|x) obtained as sample averages over many uses of the memoryless channel [29]. The Bell inequalities introduced in the next section can be used to certify the signaling dimension of the channel: if the correlations P(y|x) are shown to violate a Bell inequality of C^{n→n'}_d, then the signaling dimension κ^{n→n'}(N) > d. If these correlations arise from some untrusted quantum channel N^{A→B}, by Eq. (4) it then follows that min{d_A, d_B^2} > d. Hence a device-independent certification of signaling dimension leads to a device-independent certification of the physical input/output Hilbert spaces of the channel connecting Alice and Bob.

III. BELL INEQUALITIES FOR SIGNALING POLYTOPES
In this section we discuss Bell inequalities for signaling polytopes. Since signaling polytopes are invariant under the relabelling of inputs and outputs, each discussed inequality describes a family of inequalities whose elements are obtained by permutations of the inputs and/or outputs. Additionally, a Bell inequality for one signaling polytope can be lifted to a polytope having more inputs and/or outputs [37,38] (see Fig. 2). Formally, a Bell inequality ⟨G, P⟩ ≤ γ is said to be input lifted to ⟨G', P⟩ ≤ γ if G' ∈ R^{n'×m} is obtained from G ∈ R^{n'×n} by padding it with (m − n) all-zero columns. On the other hand, a Bell inequality ⟨G, P⟩ ≤ γ is said to be output lifted to ⟨G', P⟩ ≤ γ if G' ∈ R^{m'×n} is obtained from G ∈ R^{n'×n} by copying rows; i.e., there exists a surjection s : [m'] → [n'] such that row y of G' equals row s(y) of G.

To obtain polytope facets, it is typical to first enumerate the vertices, then use a transformation technique such as Fourier-Motzkin elimination to derive the facets [33]. Software such as PORTA [39,40] assists in this computation, but the large number of vertices leads to impractical run times. To improve efficiency, we utilize the adjacency decomposition technique, which heavily exploits the permutation symmetry of signaling polytopes [30] (see Appendix C). Our software and computed facets are publicly available on GitHub [32], while a catalog of general tight Bell inequalities is provided in Appendix D. We now turn to a specific family of Bell inequalities motivated by our computational results.
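The two lifting rules are simple matrix operations: input lifting pads all-zero columns, and output lifting copies rows along a surjection. A minimal Python sketch (the function names are ours, not part of the referenced software):

```python
import numpy as np

def input_lift(G, m):
    """Pad G with all-zero columns so the lifted inequality ignores new inputs."""
    n_out, n = G.shape
    return np.hstack([G, np.zeros((n_out, m - n), dtype=G.dtype)])

def output_lift(G, s):
    """Copy rows along a surjection s : [m'] -> [n']; row y of the lifted
    matrix equals row s(y) of G."""
    return G[list(s), :]

G = np.eye(3, dtype=int)               # ML game matrix for n = n' = 3
G_in = input_lift(G, 5)                # 3 x 5: two dummy inputs added
G_out = output_lift(G, [0, 1, 2, 2])   # 4 x 3: output 4 behaves like output 3
print(G_out.shape)   # -> (4, 3)
```

In both cases the bound γ is unchanged, since a channel can neither score on an all-zero column nor gain from a duplicated row beyond what the original row allowed.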

A. Ambiguous Guessing Games
For k ∈ [0, n'] and d ≤ min{n, n'}, let G^{n,n'}_{k,d} be any n' × n matrix such that (i) k rows are stochastic with 0/1 elements, and (ii) the remaining (n' − k) rows have 1/(n − d + 1) in each column. As explained below, it will be helpful to refer to rows of type (i) as "guessing rows" and rows of type (ii) as "ambiguous rows." For example, if n = n' = 6, k = 5, and d = 2, then up to a permutation of rows and columns we have

G^{6,6}_{5,2} =
[  1    0    0    0    0    0
   0    1    0    0    0    0
   0    0    1    0    0    0
   0    0    0    1    0    0
   0    0    0    0    1    0
  1/5  1/5  1/5  1/5  1/5  1/5 ].

For any channel P ∈ C^{n→n'}_d, the Bell inequality

⟨G^{n,n'}_{k,d}, P⟩ ≤ d   (7)

is satisfied. To prove this bound, suppose without loss of generality that the first k rows of G^{n,n'}_{k,d} are guessing rows. Let V be any vertex of C^{n→n'}_d where t of its first k rows are nonzero. If t = d, then clearly Eq. (7) holds. Otherwise, if t < d, then ⟨G^{n,n'}_{k,d}, V⟩ ≤ t + (n − t)/(n − d + 1) ≤ d, where the last inequality follows after some algebraic manipulation.
Equation (7) can be interpreted as the score of a guessing game that Bob plays with Alice. Suppose that Alice chooses a channel input x ∈ [n] with uniform probability and sends it through a channel P. Based on the channel output y, Bob guesses the value of x. Formally, Bob computes x̂ = f(y) for some guessing function f, and if x̂ = x then he receives one point. In this game, Bob may also declare Alice's input as being ambiguous or indistinguishable, meaning that f : [n'] → [n] ∪ {?} with "?" denoting Bob's declaration of an ambiguous input. However, whenever Bob declares "?" he only receives 1/(n − d + 1) points. Then, Eq. (7) says that whenever P ∈ C^{n→n'}_d, Bob's average score is bounded by d/n. Note that there is a one-to-one correspondence between each G^{n,n'}_{k,d} and the particular guessing function f that Bob performs. If y labels a guessing row of G^{n,n'}_{k,d}, then f(y) = x̂, where x̂ labels the only nonzero column of row y. On the other hand, if y labels an ambiguous row, then f(y) = "?".
We define the (k, d)-ambiguous polytope A^{n→n'}_{k,d} as the collection of all channels P ∈ P^{n→n'} satisfying Eq. (7) for every G^{n,n'}_{k,d}. Naturally, C^{n→n'}_d ⊆ A^{n→n'}_{k,d}. Based on the discussion of the previous paragraph, it is easy to decide membership of A^{n→n'}_{k,d}.

Proposition 1. A channel P ∈ P^{n→n'} belongs to A^{n→n'}_{k,d} iff

max_π [ Σ_{i=1}^{k} ‖r_{π(i)}‖_∞ + (1/(n − d + 1)) Σ_{i=k+1}^{n'} ‖r_{π(i)}‖_1 ] ≤ d,   (8)

where the maximization is taken over all permutations π on [n'], r_i denotes the i-th row of P, ‖r_i‖_∞ is the largest element in r_i, and ‖r_i‖_1 is the row sum of r_i.
The maximization on the LHS of Eq. (8) can be performed efficiently using the following procedure. For each row r_i we assign a pair (a_i, b_i) where a_i = ‖r_i‖_∞ and b_i = (1/(n − d + 1)) ‖r_i‖_1. Define δ_i = a_i − b_i, and relabel the rows of P in non-increasing order of the δ_i. Then according to this sorting, we obtain an ambiguous guessing game score of Σ_{i=1}^{k} a_i + Σ_{i=k+1}^{n'} b_i, which we claim attains the maximum on the LHS of Eq. (8). Indeed, for any other row permutation π, the guessing game score is given by

Σ_{i : π(i) ≤ k} a_i + Σ_{i : π(i) > k} b_i.

Hence the difference between the sorted score and this score is

Σ_{i ≤ k, π(i) > k} (a_i − b_i) − Σ_{i > k, π(i) ≤ k} (a_i − b_i) ≥ 0,

where the inequality follows from the fact that we have ordered the indices in non-increasing order of δ_i = a_i − b_i, and the number of terms in each summation is the same since π is a bijection.
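The sorting procedure above gives an O(n' log n') membership test for the ambiguous polytope. A Python sketch implementing Eq. (8) this way (the function names are ours):

```python
import numpy as np

def ambiguous_game_score(P, k, d):
    """Maximum of Eq. (8) over all row orderings: the k rows with the largest
    a_i - b_i serve as guessing rows, the rest as ambiguous rows."""
    n = P.shape[1]
    a = P.max(axis=1)                    # a_i = ||r_i||_inf, guessing value
    b = P.sum(axis=1) / (n - d + 1)      # b_i, ambiguous value
    delta = np.sort(a - b)[::-1]         # non-increasing order of a_i - b_i
    return b.sum() + delta[:k].sum()

def in_ambiguous_polytope(P, k, d, tol=1e-9):
    """Membership in A^{n -> n'}_{k,d}: the best game score is at most d."""
    return ambiguous_game_score(P, k, d) <= d + tol

# id_3 needs 3 messages: its score 3.0 exceeds d = 2 for the k = 3 game.
print(ambiguous_game_score(np.eye(3), 3, 2))   # -> 3.0
```

The test covers every G^{n,n'}_{k,d} at once because, for a fixed row assignment, the optimal guessing row places its single 1 on each row's largest element.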
A special case of the ambiguous guessing games arises when k = n'. Then up to a normalization factor 1/n, we interpret the LHS of Eq. (8) as the success probability when Bob performs maximum likelihood estimation of Alice's input value x given his outcome y (i.e. he chooses the value x that maximizes P(y|x)). We hence define M^{n→n'}_d := A^{n→n'}_{n',d} as the maximum likelihood (ML) estimation polytope. Using Proposition 1, we see that P ∈ M^{n→n'}_d iff Σ_{y=1}^{n'} max_{x∈[n]} P(y|x) ≤ d. An important question is whether the ambiguous guessing Bell inequalities of Eq. (7) are tight for a signaling polytope C^{n→n'}_d. In general this will not be the case. For instance, ⟨G^{n,n'}_{k,d}, P⟩ ≤ d is trivially satisfied whenever k = 0. Nevertheless, in many cases we can establish tightness of these inequalities. A demonstration of the following facts is carried out in Appendix E.

Proposition 2.
(i) For min{n, n'} > d > 1 and k = n', Eq. (7) is a tight Bell inequality of C^{n→n'}_d iff G^{n,n'}_{k,d} can be obtained by performing input/output liftings and row/column permutations on an m × m identity matrix I_m, with min{n, n'} ≥ m > d.
(ii) For n' > k ≥ n > d > 1, Eq. (7) is a tight Bell inequality of C^{n→n'}_d iff G^{n,n'}_{k,d} can be obtained from the (n + 1) × n matrix

[        I_n
  1/(n−d+1) · 1^T ],   (12)

where 1^T denotes the all-ones row vector, by performing output liftings and row/column permutations.
Note that the input/output liftings are used to manipulate the identity matrix I_m and the matrix of Eq. (12) into an n' × n matrix G^{n,n'}_{k,d}. The tight Bell inequalities described in Proposition 2(i) completely characterize the ML polytope M^{n→n'}_d. For this reason, we refer to any G^{n,n'}_{k,d} satisfying the conditions of Proposition 2(i) as a maximum likelihood (ML) facet (see Appendix D 2). Likewise, we refer to any G^{n,n'}_{k,d} satisfying the conditions of Proposition 2(ii) as an ambiguous guessing facet (see Appendix D 3).

B. Complete Sets of Bell Inequalities
In general, we are unable to identify the complete set of tight Bell inequalities that bound each signaling polytope C^{n→n'}_d. However, we analytically solve the problem in special cases.

Theorem 1. Let n and n' be arbitrary positive integers and let d = min{n, n'} − 1. A channel P ∈ P^{n→n'} belongs to C^{n→n'}_d iff P ∈ A^{n→n'}_{k,d} for every k ∈ [0, n'].
In other words, to decide whether a channel can be simulated using a number of classical messages strictly smaller than its input and output alphabet sizes, it suffices to consider the ambiguous guessing games. Moreover, by Eq. (8) it is simple to check whether these conditions are satisfied for a given channel P. A proof of Theorem 1 is found in Appendix F. We also characterize the C^{n→4}_2 signaling polytope. As an application, this case can be used to understand the classical simulation cost of performing Bell measurements on a two-qubit system, since this process induces a classical channel with four outputs.
Theorem 2. For any integer n, a channel P ∈ P^{n→4} belongs to C^{n→4}_2 iff it satisfies the eight Bell inequalities depicted in Fig. 3 and all their input/output permutations.
Remarkably, this result shows that no new facet classes for C^{n→4}_2 are found when n > 6. Consequently, to demonstrate that a channel P ∈ P^{n→4} requires more than one bit for simulation, it suffices to consider input sets of size no greater than six. For n < 6, the facet classes of C^{n→4}_2 are given by the facets in Fig. 3 having (6 − n) all-zero columns. We conjecture that in general, no more than (n' choose d) inputs are needed to certify that a channel P ∈ P^{n→n'} has a signaling dimension larger than d. A proof of Theorem 2 is found in Appendix G.

C. The Signaling Dimension of Replacer Channels
In the device-independent scenario, Alice and Bob make minimal assumptions about the channel N^{A→B} connecting them; they simply try to lower bound the dimensions of N using the input-output correlations P_N(y|x). Applying the results of the previous section, if ⟨G, P⟩ ≤ γ is a Bell inequality for C^{n→n'}_d, then κ^{n→n'}(N) > d whenever the correlations violate it, i.e. whenever

max_{{ρ_x}_x, {Π_y}_y} ⟨G, P_N⟩ > γ.   (13)

Eq. (13) describes a conic optimization problem that can be analytically solved only in special cases [41]. Hence deciding whether a given quantum channel can violate a particular Bell inequality is typically quite challenging. Despite this general difficulty, we nevertheless establish bounds for the signaling dimension of partial replacer channels. A d-dimensional partial replacer channel has the form

R_µ(X) = µ X + (1 − µ) Tr[X] σ,   (14)

where 1 ≥ µ ≥ 0 and σ is some fixed density matrix. The partial depolarizing channel D_µ corresponds to σ being the maximally mixed state, whereas the partial erasure channel E_µ corresponds to σ being an erasure flag |E⟩⟨E| with |E⟩ orthogonal to {|1⟩, ..., |d⟩}.
Theorem 3. The signaling dimension of a partial replacer channel is bounded by

⌈µd + (1 − µ)⌉ ≤ κ(R_µ) ≤ min{d, ⌈µd + 1⌉}.   (15)

Moreover, for the partial erasure channel, the upper bound is tight for all µ ∈ [0, 1].
Proof. We first prove the upper bound in Eq. (15). The trivial bound κ(R_µ) ≤ d was already observed in Eq. (4). To show that κ(R_µ) ≤ ⌈µd + 1⌉, let {ρ_x}_x be any collection of inputs and {Π_y}_y a POVM. Then

P_{R_µ}(y|x) = µ P(y|x) + (1 − µ) S(y),   (16)

where P(y|x) = Tr[Π_y ρ_x] and S(y) = Tr[Π_y σ]. From Ref. [23], we know that P(y|x) can be decomposed as in Eq. (2) using no more than d messages. Substituting this decomposition into Eq. (16) yields

P_{R_µ}(y|x) = µ Σ_λ q(λ) Σ_{m=1}^{d} R_λ(y|m) T_λ(m|x) + (1 − µ) S(y).   (17)

For r = ⌈µd + 1⌉, let ν be a random variable uniformly distributed over the collection of all subsets of [d] having size r − 1. For a given λ, ν, and input x, Alice performs the channel T_λ to obtain message m. If m ∈ ν, Alice sends message m' = m; otherwise, Alice sends message m' = 0. Upon receiving m', Bob does the following: if m' ≠ 0 he performs channel R_λ with probability µd/(r − 1) and samples from the distribution S(y) with probability 1 − µd/(r − 1); if m' = 0 he samples from S(y) with probability one. Since Pr{m ∈ ν} = (r − 1)/d, this protocol faithfully simulates P_{R_µ} using the r messages {0} ∪ ν.

To establish the lower bound in Eq. (15), suppose that Alice sends orthogonal states {|1⟩, ..., |d⟩} and Bob measures in the same basis. Then

P_{R_µ}(y|x) = µ δ_{y,x} + (1 − µ) S(y),

which will violate Eq. (7) for the ML polytope M^{d→d}_r whenever r < µd + (1 − µ). Hence any zero-error simulation requires at least ⌈µd + (1 − µ)⌉ classical messages. For the erasure channel, this lower bound can be tightened by considering the score of other ambiguous games, as detailed in Appendix H.
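The lower-bound argument can be checked numerically: with basis-state inputs and a measurement in the same (flag-extended) basis, the replacer correlations take the form µδ_{y,x} + (1 − µ)S(y), and the maximum likelihood score evaluates to µd + (1 − µ). A Python sketch, using the erasure channel with µ = 1/2 and d = 3 as an example (the function name is ours):

```python
import numpy as np

def replacer_correlations(mu, d, S):
    """P_{R_mu}(y|x) = mu*delta_{y,x} + (1 - mu)*S(y) for basis-state inputs
    {|1>,...,|d>} and a measurement in the same (flag-extended) basis."""
    return mu * np.eye(len(S), d) + (1 - mu) * np.outer(S, np.ones(d))

mu, d = 0.5, 3
S = np.array([0.0, 0.0, 0.0, 1.0])   # erasure: sigma = |E><E| is the flag outcome
P = replacer_correlations(mu, d, S)
ml_score = P.max(axis=1).sum()       # LHS of the ML inequality
print(ml_score)                      # -> 2.0, i.e. mu*d + (1 - mu)
```

The score µd + (1 − µ) exceeds r for any r < µd + (1 − µ), which is exactly the violation invoked in the proof.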

DISCUSSION
In this work, we have presented the signaling dimension of a channel as its classical simulation cost. In doing so, we have advanced a device-independent framework for certifying the signaling dimension of a quantum channel as well as its input/output dimensions. While this work focuses on communication systems, our framework also applies to computation and memory tasks.
The family of ambiguous guessing games includes the maximum likelihood facets, which say that Σ_{y=1}^{n'} max_{x∈[n]} P(y|x) ≤ d for all P ∈ C^{n→n'}_d. Since the results of Frenkel and Weiner imply that P_N^{n→n'} ⊆ C^{n→n'}_{d_A} for every channel N with input dimension d_A, every quantum channel satisfies this ML inequality with d = d_A, an observation also made in Ref. [27]. Despite the simplicity of this bound, in general it is too loose to certify the input/output Hilbert space dimensions of a channel. For example, consider the 50:50 erasure channel E_{1/2} acting on a d_A = 3 system. It can be verified that P_{E_{1/2}}^{n→n'} ⊆ M^{n→n'}_2; hence maximum likelihood estimation yields only the lower bound κ(E_{1/2}) ≥ 2. On the other hand, the classical channel generated by sending basis states through E_{1/2} and measuring in the flag-extended basis lies outside the ambiguous polytope A^{3→4}_{3,2}, and it follows that κ^{3→4}(E_{1/2}) ≥ 3. Therefore, the ambiguous guessing game certifies the qutrit nature of the input space whereas maximum likelihood estimation does not.
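The erasure-channel example can be verified directly. The channel below relays x with probability 1/2 and otherwise raises the flag; taking the three relay outputs as guessing rows and the flag row as the ambiguous row yields a score of 9/4 > 2, placing the channel outside A^{3→4}_{3,2}:

```python
import numpy as np

# 3 -> 4 correlations of E_{1/2}: relay the input with probability 1/2,
# otherwise output the erasure flag (output 4).
P = np.array([[0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5],
              [0.5, 0.5, 0.5]])

n, d = 3, 2                           # ambiguous game with k = 3 guessing rows
guessing = P[:3].max(axis=1).sum()    # rows 1-3 guess the input directly
ambiguous = P[3].sum() / (n - d + 1)  # row 4 declares "?"
score = guessing + ambiguous
print(score)                          # -> 2.25 > 2, so kappa^{3->4}(E_1/2) > 2
```

By contrast, the ML score of the same matrix is 3·(1/2) + 1/2 = 2, which never exceeds 2 and thus certifies nothing beyond κ ≥ 2.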
Our results can be extended in two key directions.
First, our characterization of the signaling polytope is incomplete. Novel Bell inequalities, lifting rules, and complete sets of facets can be derived beyond those discussed in this work. Such results would help improve the signaling dimension bounds and the efficiency of computing Bell inequalities. Second, the signaling dimension specifies the classical cost of simulating a quantum channel, but not the protocol that achieves the classical simulation. Such a simulation protocol would apply broadly across the field of quantum information science and technology.

Supporting Software
This work is supported by SignalingDimension.jl [32]. This software package includes our signaling polytope computations, numerical facet verification, and signaling dimension certification examples. SignalingDimension.jl is publicly available on GitHub and written in the Julia programming language [42]. The software is documented, tested, and reproducible on a laptop computer. The interested reader should review the software documentation as it elucidates many details of our work.

Notation
Terminology: Definition

P^{n→n'} (Set of Classical Channels): The subset of R^{n'×n} containing column stochastic matrices.

Classical Channel: An element of P^{n→n'} that represents a classical channel with n inputs and n' outputs.

Quantum Channel: A completely positive trace-preserving map.

P_N^{n→n'}: The subset of P^{n→n'} which decomposes as Eq. (1) for some quantum channel N.

C^{n→n'}_d (Signaling Polytope): The subset of P^{n→n'} containing channels that decompose as Eq. (2).

(G, γ) (Linear Bell Inequality): A tuple describing the linear inequality ⟨G, P⟩ ≤ γ where G ∈ R^{n'×n}, γ ∈ R, and P ∈ P^{n→n'}.

(G^{n,n'}_{k,d}, d) (Ambiguous Guessing Game): A signaling polytope Bell inequality where G^{n,n'}_{k,d} ∈ R^{n'×n} has k rows that are stochastic with 0/1 elements and (n' − k) rows with each column containing 1/(n − d + 1).

A^{n→n'}_{k,d} (Ambiguous Polytope): The subset of P^{n→n'} which is tightly bound by inequalities of the form (G^{n,n'}_{k,d}, d).

Appendix B: Signaling Polytopes

In this section we provide details about the structure of signaling polytopes (see Definition 1). The signaling polytope, denoted by C^{n→n'}_d, is a subset of P^{n→n'}. Therefore, a channel P ∈ C^{n→n'}_d has matrix elements P(y|x) subject to the constraints of non-negativity, P(y|x) ≥ 0, and normalization, Σ_{y∈[n']} P(y|x) = 1, for all y ∈ [n'] and x ∈ [n]. Furthermore, since channels P ∈ C^{n→n'}_d are permitted the use of shared randomness, the set C^{n→n'}_d is convex. In the two extremes of communication, the signaling polytope admits a simple structure. For maximum communication, d = min{n, n'}, any channel P ∈ P^{n→n'} can be realized; hence C^{n→n'}_{min{n,n'}} = P^{n→n'}. For no communication, d = 1, Bob's output y is independent of Alice's input x, meaning that P(y|x) = P(y|x') for any choice of x, x' ∈ [n] and y ∈ [n']. This added constraint simplifies the signaling polytope C^{n→n'}_1 to P^{1→n'}, which is formally an n'-simplex [33]. For all other cases, min{n, n'} > d > 1, the signaling polytope C^{n→n'}_d takes on a more complicated structure.

Vertices
The vertices of the signaling polytope are denoted by V^{n→n'}_d. Signaling polytopes are convex and therefore described as the convex hull of their vertices, C^{n→n'}_d = conv(V^{n→n'}_d). As noted in the main text, a vertex V ∈ V^{n→n'}_d is an n' × n column stochastic matrix with 0/1 elements and rank(V) ≤ d. For instance,

V = [ 1  1  0
      0  0  1
      0  0  0 ]

is a vertex of V^{3→3}_2, since it is column stochastic with 0/1 elements and rank(V) = 2.

Polytope Dimension
The dimension of the signaling polytope satisfies dim(C^{n→n'}_d) ≤ dim(P^{n→n'}) = n(n' − 1). This upper bound follows from the facts that C^{n→n'}_d ⊆ P^{n→n'} and any P ∈ P^{n→n'} must satisfy n normalization constraints, one for each column of P. Naively, P^{n→n'} ⊂ R^{n'×n} where dim(R^{n'×n}) = n·n'; however, the n normalization constraints restrict P^{n→n'} to dim(P^{n→n'}) = n(n' − 1). To evaluate the dimension of C^{n→n'}_d with greater precision, the number of affinely independent vertices in V^{n→n'}_d can be counted, where dim(C^{n→n'}_d) is one less than this number. When d ≥ 2, one can count n(n' − 1) + 1 affinely independent vertices in V^{n→n'}_d; therefore, dim(C^{n→n'}_d) = n(n' − 1). In the remaining case of d = 1, each of the n' vertices is affinely independent and dim(C^{n→n'}_1) = n' − 1. This result is not surprising because, as noted before, C^{n→n'}_1 = P^{1→n'} and dim(P^{1→n'}) = n' − 1.

Facets
A linear Bell inequality is represented as a tuple (G, γ) with G ∈ R^{n'×n} and γ ∈ R, where the inequality ⟨G, P⟩ = Σ_{x,y} G_{y,x} P(y|x) ≤ γ is formed by the Euclidean inner product with a channel P ∈ P^{n→n'}. For convenience, we identify two polyhedra of channels,

C(G, γ) := {P ∈ P^{n→n'} | ⟨G, P⟩ ≤ γ}  and  F(G, γ) := {P ∈ P^{n→n'} | ⟨G, P⟩ = γ}.

Lemma 1. An inequality (G, γ) is a tight Bell inequality of the C^{n→n'}_d signaling polytope iff (1) C^{n→n'}_d ⊆ C(G, γ), and (2) dim(C^{n→n'}_d ∩ F(G, γ)) = dim(C^{n→n'}_d) − 1.

Condition 1 requires that the half-space of Bell inequality (G, γ) contains all channels P ∈ C^{n→n'}_d, while Condition 2 requires that inequality (G, γ) is both a proper half-space and a facet of C^{n→n'}_d. Tight Bell inequalities and facets are closely related and described by the same tuple (G, γ). The key difference is that a tight Bell inequality is a half-space inequality ⟨G, P⟩ ≤ γ whereas a facet is the polytope C^{n→n'}_d ∩ F(G, γ). The complete set of signaling polytope facets is denoted by F^{n→n'}_d, and the signaling polytope is simply the intersection of all tight Bell inequalities, C^{n→n'}_d = ∩_{(G,γ)∈F^{n→n'}_d} C(G, γ). The number of facet inequalities is typically larger than the number of vertices in V^{n→n'}_d, presenting another challenge in the characterization of signaling polytopes.

Remark. A given Bell inequality (G, γ) ∈ F^{n→n'}_d does not have a unique form. Therefore, it is convenient to establish a normal form for a given facet inequality [30]. First, observe that multiplying an inequality (G, γ) by a positive scalar a ∈ R does not change the inequality, that is, C(G, γ) = C(aG, aγ). Second, observe that the vertices in V^{n→n'}_d have 0/1 elements and the rational arithmetic in Fourier-Motzkin elimination [33,39] results in the matrix coefficients of G being rational. Therefore, there exists a rational scalar a such that aG_{y,x} and aγ are integers for all x ∈ [n] and y ∈ [n']. Third, observe that the normalization and non-negativity constraints for channels P ∈ P^{n→n'} allow the equivalence, for any x ∈ [n] and c ∈ R, between the following two inequalities:

⟨G, P⟩ ≤ γ  ⟺  ⟨G + c E_x, P⟩ ≤ γ + c,   (B5)

where E_x denotes the n' × n matrix with ones in column x and zeros elsewhere; the equivalence holds because Σ_y P(y|x) = 1. Therefore, it is always possible to find a form of inequality (G, γ) where G_{y,x} ≥ 0 for all y ∈ [n'] and x ∈ [n].
Hence we define a normal form for any tight Bell inequality $(G,\gamma) \in \mathcal{F}_d^{n\to n'}$: • Inequality $(G,\gamma)$ is scaled such that $\gamma$ and all $G_{y,x}$ are integers with a greatest common divisor of 1.
• Normalization constraints are added or subtracted from all columns using Eq. (B5) such that $G_{y,x} \ge 0$ and the smallest element in each column of $G$ is zero.
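The two normal-form steps above can be sketched in a few lines of Python (a hypothetical helper, not the paper's supporting software [32]; here the column shift is applied before the integer scaling, which yields the same normal form).

```python
# Sketch: bring (G, gamma) into the normal form described above.
from fractions import Fraction
from math import gcd

def normal_form(G, gamma):
    # Shift: subtracting c from column x of G and c from gamma leaves the
    # inequality unchanged, since sum_y P(y|x) = 1 for every input x.
    G = [[Fraction(g) for g in row] for row in G]
    gamma = Fraction(gamma)
    for x in range(len(G[0])):
        c = min(row[x] for row in G)
        for row in G:
            row[x] -= c
        gamma -= c
    # Scale: clear denominators, then divide by the collective gcd.
    denoms = [f.denominator for row in G for f in row] + [gamma.denominator]
    lcm = 1
    for q in denoms:
        lcm = lcm * q // gcd(lcm, q)
    ints = [[int(f * lcm) for f in row] for row in G]
    gamma_i = int(gamma * lcm)
    g = abs(gamma_i)
    for row in ints:
        for v in row:
            g = gcd(g, v)
    g = g or 1
    return [[v // g for v in row] for row in ints], gamma_i // g

G = [[Fraction(1, 2), 1], [Fraction(3, 2), 0]]
result = normal_form(G, 2)
print(result)  # ([[0, 2], [2, 0]], 3)
```

Every column of the returned matrix has smallest element zero, and all coefficients are coprime integers, matching the two bullet points.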

Permutation Symmetry
The input and output values $x$ and $y$ are merely labels for a channel $P \in \mathcal{P}^{n\to n'}$; therefore, swapping labels $x \leftrightarrow x'$ and $y \leftrightarrow y'$ where $x, x' \in [n]$ and $y, y' \in [n']$ does not affect $\mathcal{P}^{n\to n'}$ [38]. The relabeling operation is implemented using elements from the set $S_k$ of $k\times k$ permutation matrices, which are doubly stochastic; for example, $\pi_X \in S_n$ permutes inputs and $\pi_Y \in S_{n'}$ permutes outputs. Note that permuting the rows or columns of a matrix cannot change its rank. It follows that this permutation symmetry holds for any channel in the signaling polytope: $P \in \mathcal{C}_d^{n\to n'}$ iff $P' = \pi_Y P \pi_X \in \mathcal{C}_d^{n\to n'}$, where $P'$ is a permutation of $P$. Likewise, a facet inequality maps to a facet inequality under any such relabeling.

Generator Facets
Permutation symmetry motivates the notion of a facet class, defined as the collection of facet inequalities formed by taking all permutations of a canonical facet $(G',\gamma) \in \mathcal{F}_d^{n\to n'}$, which we refer to as a generator facet. The choice of canonical facet is arbitrary, thus we define the generator facet as the lexicographic normal form [30,38] of the facet class. The set of generator facets, denoted $\mathcal{G}_d^{n\to n'}$, contains the generator facet of each facet class bounding $\mathcal{C}_d^{n\to n'}$. Since the number of input and output permutations scales as factorials of $n$ and $n'$ respectively, the set of generator facets is considerably smaller than $\mathcal{F}_d^{n\to n'}$ and therefore provides a convenient simplification of $\mathcal{F}_d^{n\to n'}$. To recover the complete set of facets from $\mathcal{G}_d^{n\to n'}$, we take all row and column permutations of each generator facet $(G',\gamma) \in \mathcal{G}_d^{n\to n'}$. As a final remark, we note that $\mathcal{V}_d^{n\to n'}$ can also be reduced to a set of generator vertices; however, this set is not required for our current discussion of signaling polytopes.
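Recovering a facet class from its generator is a direct enumeration over relabelings. A small Python sketch (our own illustration, not the supporting software [32]), using a $3\times 3$ maximum-likelihood-style generator:

```python
# Sketch: expand a generator facet matrix into its full facet class by
# applying all row (output) and column (input) permutations, deduplicating.
from itertools import permutations

def facet_class(G):
    n_out, n_in = len(G), len(G[0])
    seen = set()
    for pr in permutations(range(n_out)):       # output relabelings
        for pc in permutations(range(n_in)):    # input relabelings
            M = tuple(tuple(G[pr[y]][pc[x]] for x in range(n_in))
                      for y in range(n_out))
            seen.add(M)
    return seen

G_ML = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity, an ML-game matrix
cls = facet_class(G_ML)
print(len(cls))  # 6: the 3x3 permutation matrices
```

Although $3!\cdot 3! = 36$ permutations are applied, only 6 distinct matrices result, illustrating why the class is stored by a single generator.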

Appendix C: Adjacency Decomposition
This section provides an overview of the adjacency decomposition technique [30]. In our work, we use an adjacency decomposition algorithm to compute the generator facets of the signaling polytope. Our implementation can be found in our supporting software [32]. The adjacency decomposition provides a few key advantages in the computation of Bell inequalities: 1. The algorithm stores only the generator facets $\mathcal{G}_d^{n\to n'}$ instead of the complete set of facets $\mathcal{F}_d^{n\to n'}$. This considerably reduces the required memory.
2. New generator facets are derived in each iteration of the computation, hence, the algorithm does not need to run to completion to provide value.

Adjacency Decomposition Algorithm
The adjacency decomposition is an iterative algorithm which requires as input the signaling polytope vertices $\mathcal{V}_d^{n\to n'}$ and a seed generator facet $(G_{\text{seed}}, \gamma_{\text{seed}}) \in \mathcal{G}_d^{n\to n'}$. The algorithm maintains a list of generator facets $\mathcal{G}_{\text{list}}$ where each facet $(G',\gamma) \in \mathcal{G}_{\text{list}}$ is marked either as considered or unconsidered. The generator facet is defined as the lexicographic normal form of the facet class [30,38]. Before the algorithm begins, $(G_{\text{seed}}, \gamma_{\text{seed}})$ is added to $\mathcal{G}_{\text{list}}$ and marked as unconsidered. In each iteration, the algorithm proceeds as follows [30]: 1. An unconsidered generator facet $(G',\gamma) \in \mathcal{G}_{\text{list}}$ is selected.
2. The set of facets adjacent to $(G',\gamma)$ is computed (see Facet Adjacency below). 3. Each adjacent facet is converted into its lexicographic normal form. 4. Any new generator facets identified are marked as unconsidered and added to $\mathcal{G}_{\text{list}}$. 5. Facet $(G',\gamma)$ is marked as considered.
The procedure repeats until all facets in $\mathcal{G}_{\text{list}}$ are marked as considered. If run to completion, then $\mathcal{G}_{\text{list}} = \mathcal{G}_d^{n\to n'}$ and all generator facets of the signaling polytope $\mathcal{C}_d^{n\to n'}$ are identified. The algorithm is guaranteed to find all generator facets due to the permutation symmetry of the signaling polytope: by this symmetry, any representative of a given facet class has the same fixed set of facet classes adjacent to it. For the search to reach every class, there cannot be two disjoint sets of generator facets where the members of one set do not lie adjacent to the members of the other.
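The five steps above can be sketched generically. In the following Python sketch (our own skeleton, not the supporting software [32]), the polytope-specific pieces, computing adjacent facets and taking the lexicographic normal form, are passed in as callables and mocked with a toy adjacency structure on facet-class labels:

```python
# Generic sketch of the adjacency decomposition loop.
def adjacency_decomposition(seed, adjacent_fn, normal_form_fn):
    generators = {normal_form_fn(seed): False}  # facet -> considered?
    while not all(generators.values()):
        facet = next(f for f, done in generators.items() if not done)  # step 1
        for adj in adjacent_fn(facet):                                 # step 2
            canon = normal_form_fn(adj)                                # step 3
            if canon not in generators:
                generators[canon] = False                              # step 4
        generators[facet] = True                                       # step 5
    return set(generators)

# Toy model: classes 'a'..'d'; lowercase plays the role of the normal form.
toy_adjacency = {'a': ['B'], 'b': ['a', 'C'], 'c': ['b', 'D'], 'd': ['c']}
found = adjacency_decomposition('A', lambda f: toy_adjacency[f], str.lower)
print(sorted(found))  # all four classes reachable from the seed
```

The worklist never stores more than one representative per class, which is the memory advantage noted above.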
The inputs of the adjacency decomposition are easy to produce computationally. A seed facet can always be constructed using the lifting rules for signaling polytopes (see Fig. 2) and the signaling polytope vertices $\mathcal{V}_d^{n\to n'}$ can be easily computed (see supporting software [32]). Note, however, that the exponential growth of $|\mathcal{V}_d^{n\to n'}|$ eventually hinders the performance of the adjacency decomposition algorithm.

Facet Adjacency
A key step in the adjacency decomposition algorithm is to compute the set of facets adjacent to a given facet (G, γ). In this section, we define facet adjacency and outline the method used to compute the adjacent facets.
Lemma 2. Two facets $(G_1,\gamma_1), (G_2,\gamma_2) \in \mathcal{F}_d^{n\to n'}$ are adjacent iff they share a ridge, that is, a face $H = \mathcal{C}_d^{n\to n'} \cap \mathcal{F}(G_1,\gamma_1) \cap \mathcal{F}(G_2,\gamma_2)$ with $\dim(H) = \dim(\mathcal{C}_d^{n\to n'}) - 2$. A ridge can be understood as a facet of the facet polytope $\mathcal{C}_d^{n\to n'} \cap \mathcal{F}(G,\gamma)$. Therefore, to compute the ridges of a given facet $(G,\gamma) \in \mathcal{F}_d^{n\to n'}$ we take the typical approach for computing facets. Namely, the set of vertices $\{V \in \mathcal{V}_d^{n\to n'} \mid \langle G, V\rangle = \gamma\}$ is constructed and PORTA [39,40] is used to compute the ridges of $(G,\gamma)$. A facet adjacent to $(G,\gamma)$ is computed from each ridge using a rotation algorithm described by Christof and Reinelt [30]. Given the signaling polytope vertices $\mathcal{V}_d^{n\to n'}$, this procedure computes the complete set of facets adjacent to $(G,\gamma)$.

Appendix D: Tight Bell Inequalities
In this section we discuss the general forms of each of the signaling polytope facets in Fig. 3. Each facet class is described by a generator facet (see Appendix B 4), where all permutations and input/output liftings of these generator facets are also tight Bell inequalities. To prove that an inequality $(G,\gamma)$ is a facet of $\mathcal{C}_d^{n\to n'}$, both conditions of Lemma 1 must hold. The proofs contained in this section verify Condition 2 of Lemma 1 by constructing a set of $\dim(\mathcal{C}_d^{n\to n'}) = n(n'-1)$ affinely independent vertices in $\{V \in \mathcal{V}_d^{n\to n'} \mid \langle G, V\rangle = \gamma\}$. These enumerations are verified numerically in our supporting software [32]. To assist with the enumeration of affinely independent vertices, we introduce a simple construction for affinely independent vectors with 0/1 elements.

Lemma 3.
Consider $n$-element binary vectors $\vec{b}_k \in \{0,1\}^n$ with $n_0$ null elements and $n_1$ unit elements, where $n_0 + n_1 = n$. A set of $n$ affinely independent vectors $\{\vec{b}_k\}_{k=1}^{n}$ is constructed as follows: • Let $\vec{b}_1$ be the binary vector where the first $n_0$ elements are null and the next $n_1$ elements are unit values.
• For $k \in [2, n_0+1]$, $\vec{b}_k$ is derived from $\vec{b}_1$ by swapping the unit element at index $(n_0+1)$ with the null element at index $(k-1)$.
• For $k \in [n_0+2, n]$, $\vec{b}_k$ is derived from $\vec{b}_1$ by swapping the null element at index $n_0$ with the unit element at index $k$. Proof. To verify the affine independence of $\{\vec{b}_k\}_{k=1}^{n}$ it is sufficient to show the linear independence of $\{\vec{b}_1 - \vec{b}_k\}_{k=2}^{n}$. Note that each $(\vec{b}_1 - \vec{b}_k)$ has two nonzero elements, one of which occurs at an index that is zero for all $(\vec{b}_1 - \vec{b}_{k'})$ with $k' \ne k$. Therefore, the vectors in $\{\vec{b}_1 - \vec{b}_k\}_{k=2}^{n}$ are linearly independent and $\{\vec{b}_k\}_{k=1}^{n}$ is affinely independent.
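Lemma 3's construction is mechanical enough to verify directly. A short Python sketch (illustrative only; indices converted to 0-based):

```python
# Sketch: build the n binary vectors of Lemma 3 and verify affine
# independence via the rank of the differences from b_1.
import numpy as np

def lemma3_vectors(n0, n1):
    n = n0 + n1
    b1 = [0] * n0 + [1] * n1
    vecs = [b1]
    for k in range(2, n0 + 2):      # swap unit at index n0+1 with null at k-1
        b = b1.copy()
        b[n0], b[k - 2] = 0, 1      # 0-based positions
        vecs.append(b)
    for k in range(n0 + 2, n + 1):  # swap null at index n0 with unit at k
        b = b1.copy()
        b[n0 - 1], b[k - 1] = 1, 0
        vecs.append(b)
    return vecs

vecs = lemma3_vectors(2, 3)
diffs = np.array([np.subtract(v, vecs[0]) for v in vecs[1:]])
print(len(vecs), np.linalg.matrix_rank(diffs))  # 5 vectors, rank 4 = n - 1
```

Each difference vector has a nonzero coordinate no other difference touches, which is exactly the linear-independence argument in the proof.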

k-Guessing Facets
Consider a guessing game with $k$ correct answers out of $n'$ possible answers. In this game, Alice has $n = \binom{n'}{k}$ inputs where each value $x$ corresponds to a unique set of $k$ correct answers. Given an input $x \in [n]$, Alice signals to Bob using a message $m \in [d]$ and Bob makes a guess $y \in [n']$. A correct guess scores 1 point while an incorrect guess scores 0 points. This type of guessing game is described by Heinosaari et al. [31,34] and used to test the communication performance of a particular theory. In this work, we treat this $k$-guessing game as a Bell inequality $(G_K^{n',k}, \gamma_K^{n',k,d})$ of the signaling polytope $\mathcal{C}_d^{n\to n'}$, where $G_K^{n',k} \in \mathbb{R}^{n'\times\binom{n'}{k}}$ is a matrix whose columns each contain a unique arrangement of $k$ unit elements and $(n'-k)$ null elements. This general Bell inequality for signaling polytopes was identified by Frenkel and Weiner [23], who bounded $\langle G_K^{n',k}, P\rangle$ for any channel $P \in \mathcal{C}_d^{n\to n'}$; however, we only focus on the upper bound $\gamma_K^{n',k,d} = \binom{n'}{k} - \binom{n'-d}{k}$. We now show conditions for which $(G_K^{n',k}, \gamma_K^{n',k,d})$ is a tight Bell inequality. Proof. To prove that $(G_K^{n',k}, \gamma_K^{n',k,d})$ is tight when $n' = k + d$, we decompose $G_K^{n',k}$ into the block form
$$G_K^{n',k} = \begin{pmatrix} \vec{1} & \vec{0} \\ G_K^{(n'-1),(k-1)} & G_K^{(n'-1),k} \end{pmatrix},$$
where $\vec{0}$ and $\vec{1}$ are row vectors containing zeros and ones, and we refer to $G_K^{(n'-1),(k-1)}$ and $G_K^{(n'-1),k}$ as left and right $k$-guessing blocks respectively. The left and right $k$-guessing blocks suggest a recursive approach to our construction of affinely independent vertices. Namely, we construct $\binom{n'}{k}$ vertices by targeting the first row of $G_K^{n',k}$ while Proposition 3 is recursively applied to enumerate the remaining vertices using the left and right $k$-guessing blocks. The recursion requires two base cases to be addressed: 1. When $d = 2$ and $n' = k + d$, the construction of affinely independent vertices is described in Proposition 4.
2. When $k = 1$, the construction of affinely independent vertices is described in Proposition 5.
An iteration of this recursive construction proceeds as follows.
First, we construct an affinely independent vertex for each of the $\binom{n'}{k}$ elements in the first row of $G_K^{n',k}$. For each index $x_1$ in the $\vec{1}$ block, a vertex $V_1$ is constructed by setting $V_1(1|x) = 1$ for all $x \ne x_1$ with $G_{1,x} = 1$, and $V_1(y|x_1) = 1$ where $y > 1$ is the smallest row index such that $G_{y,x_1} = 1$. The remaining rows of $V_1$ are filled to maximize the right $k$-guessing block. Then, for each index $x_0$ in the $\vec{0}$ block, a vertex $V_0$ is constructed by setting $V_0(1|x_0) = 1$ and all $V_0(1|x) = 1$ where $G_{1,x} = 1$. The remaining $(d-1)$ rows of $V_0$ are filled to maximize the right $k$-guessing block. This procedure enumerates $\binom{n'}{k}$ affinely independent vertices. Then, the remaining $(n'-2)\binom{n'}{k}$ vertices are found by individually targeting the left and right $k$-guessing blocks. To construct a vertex $V_L$ using the left block $G_K^{(n'-1),(k-1)}$, the first row of $V_L$ is not used. The left block is then a $(k-1)$-guessing game with $(n'-1)$ outputs where $d = (n'-1)-(k-1) = n'-k$; hence, Proposition 3 holds and $(n'-2)\binom{n'-1}{k-1}$ affinely independent vertices are enumerated using the described recursive process. Note that for each vertex of form $V_L$, the remaining elements are filled to maximize the right $k$-guessing block $G_K^{(n'-1),k}$. Similarly, to construct a vertex $V_R$ using the right block $G_K^{(n'-1),k}$, we set all elements $V_R(1|x) = 1$ where $G_{1,x}^{n',k} = 1$. The remaining $(d-1)$ rows of $V_R$ are filled by optimizing the $G_K^{(n'-1),k}$ block. Since $d = n'-k$ and $(d-1) = (n'-1)-k$, Proposition 3 holds, and recursively applying this procedure constructs $(n'-2)\binom{n'-1}{k}$ vertices of form $V_R$ using the right $k$-guessing block.
Finally, vertices of forms $V_0$, $V_1$, $V_L$, and $V_R$ are easily verified to be affinely independent. Summing these vertices yields $(n'-2)\binom{n'-1}{k-1} + (n'-2)\binom{n'-1}{k} + \binom{n'}{k} = (n'-1)\binom{n'}{k}$ affinely independent vertices; therefore, the $k$-guessing Bell inequality is proven to be tight when $n' = k + d$. In the case $d = 2$ and $k = n'-2$, any two rows $y$ and $y'$ of $G_K^{n',(n'-2)}$ have a unique column $x_0$ containing null elements in both rows $y$ and $y'$. Therefore, for each unique pair $y$ and $y'$, two affinely independent vertices $V_1$ and $V_2$ are constructed by setting $V_1(y|x_0) = 1$ and $V_2(y'|x_0) = 1$, while the remaining terms are arranged such that all unit elements in row $y$ and the remaining elements in row $y'$ are selected to achieve the optimal score. Performing this procedure for the first two rows of $G_K^{5,3}$ ($y = 1$ and $y' = 2$) constructs the vertices where $x_0 = 10$ in this example. Repeating this procedure for each of the $\binom{n'}{2}$ row selections produces $2\binom{n'}{2} = 2\binom{n'}{k}$ affinely independent vertices, one for each null element in $G_K^{n',(n'-2)}$. The remaining vertices are constructed by selecting a target row $y \in [n'-1]$. In the target row, for each $x$ where $G_{y,x} = 1$, a vertex $V_3$ is constructed by setting $V_3(y|x') = 1$ for all $x' \ne x$ that satisfy $G_{y,x'} = 1$. A secondary row $y' > y$ of $V_3$ is chosen where $y'$ is the smallest index satisfying $G_{y',x} = 1$. We then set $V_3(y'|x) = 1$ while the remaining elements of $V_3$ are set to achieve the optimal score. For the selected rows $y$ and $y'$, the null column at index $x_0$ is set in the target row as $V_3(y|x_0) = 1$. For example, consider $G_K^{5,3}$ with the target row $y = 1$ and $x = 4$; we construct the corresponding vertex. Note that all secondary row indices $y' \le y + 3$ are required to construct a vertex $V_3$ for each unit element in the target row $y$. Let $\Delta y = y' - y$; then $\sum_{\Delta y=1}^{3} \binom{n'-1-\Delta y}{d+1-\Delta y}$ vertices are constructed for target row $y$. For $y = n'-2$ and $y = n'-1$, the sum terminates at $\Delta y = 2$ and $\Delta y = 1$ respectively, because the vertices are only affinely independent if the secondary row has index $y' > y$.
Thus, this process produces $(n'-3)\binom{n'}{k}$ vertices of form $V_3$. Combining the vertices of form $V_1$, $V_2$, and $V_3$ yields a set of $2\binom{n'}{k} + (n'-3)\binom{n'}{k} = (n'-1)\binom{n'}{k}$ affinely independent vertices. Therefore, when $d = 2$ and $k = n'-2$, $(G_K^{n',(n'-2)}, \binom{n'}{n'-2} - 1)$ is a tight Bell inequality of the $\mathcal{C}_2^{\binom{n'}{k}\to n'}$ signaling polytope.
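The Frenkel-Weiner upper bound $\gamma_K^{n',k,d} = \binom{n'}{k} - \binom{n'-d}{k}$ used above can be cross-checked by brute force on a small instance. A Python sketch (our own, with hypothetical helper names): since Alice chooses the message per input, a deterministic strategy reduces to Bob's $d$ answers, and a column scores 1 iff its $k$-subset hits one of them.

```python
# Sketch: build G_K^{n',k} (columns indexed by k-subsets of outputs) and
# brute-force the optimal score over d-message classical strategies.
from itertools import combinations
from math import comb
import numpy as np

def guessing_matrix(n_out, k):
    cols = list(combinations(range(n_out), k))
    G = np.zeros((n_out, len(cols)), dtype=int)
    for x, subset in enumerate(cols):
        for y in subset:
            G[y, x] = 1
    return G

def best_score(G, d):
    # A strategy is Bob's d answers {y_1..y_d}; a column scores 1 iff it
    # has a unit element in one of those rows.
    n_out = G.shape[0]
    return max(int(np.any(G[list(rows), :], axis=0).sum())
               for rows in combinations(range(n_out), d))

n_out, k, d = 4, 2, 2
G = guessing_matrix(n_out, k)
print(best_score(G, d), comb(n_out, k) - comb(n_out - d, k))  # both 5
```

The brute-force optimum matches $\binom{4}{2} - \binom{2}{2} = 5$: the only column missed is the one $k$-subset disjoint from Bob's $d$ answers.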

Maximum Likelihood Facets
In this section, we discuss the conditions for which maximum likelihood games (see main text) are tight Bell inequalities. The maximum likelihood Bell inequality $(G_{ML}^{n'}, d)$ is a $(k=1)$-guessing game where $G_{ML}^{n'} = G_K^{n',1}$. For simplicity, this section considers unlifted forms, in which $G_{ML}^{n'}$ is an $n'\times n'$ doubly stochastic matrix with 0/1 elements, such as the $n'\times n'$ identity matrix. For any vertex $V \in \mathcal{V}_d^{n\to n'}$, the inequality $\langle G_{ML}^{n'}, V\rangle \le d$ is satisfied because $\text{rank}(V) \le d$ and $G_{ML}^{n'}$ is doubly stochastic. By the convexity of $\mathcal{C}_d^{n\to n'}$, inequality (D10) must hold for all $P \in \mathcal{C}_d^{n\to n'}$. We now discuss the conditions for which $(G_{ML}^{n'}, d)$ is a tight Bell inequality.
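The bound $\langle G_{ML}^{n'}, V\rangle \le d$ is easy to confirm exhaustively on a small case. A Python sketch (illustrative only): with $G_{ML}$ the identity, the score of a deterministic channel is its number of diagonal hits, which cannot exceed the number of distinct outputs it uses.

```python
# Sketch: brute-force check that every deterministic channel using at most
# d outputs scores at most d against the ML game (n = n' = 4, d = 2).
from itertools import product

n, d = 4, 2
scores = []
for f in product(range(n), repeat=n):       # deterministic channels
    if len(set(f)) <= d:                    # rank(V) <= d
        scores.append(sum(1 for x in range(n) if f[x] == x))  # <G_ML, V>
print(max(scores))  # 2, i.e. the bound gamma = d is attained
```

The maximum equals $d$, showing the inequality is both valid and saturated by some vertex, consistent with the tightness discussion that follows.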

Ambiguous Guessing Facets
In this section we discuss the conditions for which ambiguous guessing games (see main text) are tight Bell inequalities. Consider the ambiguous guessing Bell inequality $(G_?^{n',d}, \gamma_?^{n',d})$, where we refer to the rows of the $G_{ML}^{(n'-1)}$ block as guessing rows and the row vector of ones $\vec{1}$ as the ambiguous row. Note that $G_?^{n',d}$ is a special case of the ambiguous guessing game $G^{(k)}$ (see main text), and without loss of generality, we express $G_?^{n',d} \in \mathbb{R}^{n'\times(n+1)}$ by taking a guessing row $y$ where $G_{y,x} = (n-d)$ and distributing the value between two columns such that $G_{y,x} = 1$ and $G_{y,x'} = (n-d)-1$, where $x' = n+1$ indexes a new column. This rescaling is a non-trivial input lifting rule; the bound of the input-lifted facet is the same as that of the unlifted version. For example, when $n' = 5$ and $d = 2$, $G_?^{5,2}$ is rescaled along the 4th row. This rescaling input lifting is a general trend observed in our computed signaling polytope facets [32]; however, it is not clear how broadly this lifting rule applies or generalizes.

Anti-Guessing Facets
Another special case of the $k$-guessing game is the anti-guessing game Bell inequality $(G_A^{n'}, n')$ where $G_A^{n'} = G_K^{n',(n'-1)}$. For any channel $P \in \mathcal{P}^{n\to n'}$ with $n = n'$, the anti-guessing Bell inequality $\langle G_A^{n'}, P\rangle \le n'$ is satisfied; therefore, anti-guessing games alone are not very useful for witnessing signaling dimension. That said, the anti-guessing game is significant because it can be combined with a maximum likelihood game in block form to construct facets of signaling polytopes $\mathcal{C}_d^{n'\to n'}$ with $d \le n'-2$. We denote these anti-guessing facets by $G_A^{\varepsilon,m}$, where the facet is constructed as
$$G_A^{\varepsilon,m} = \begin{pmatrix} G_A^{\varepsilon} & \mathbf{0} \\ \mathbf{0} & G_{ML}^{m} \end{pmatrix},$$
where $G_A^{\varepsilon,m} \in \mathbb{R}^{n'\times n'}$, $n' = \varepsilon + m$, and $\mathbf{0}$ is a matrix block of zeros. For a channel $P \in \mathcal{C}_d^{n'\to n'}$, the upper bound $\langle G_A^{\varepsilon,m}, P\rangle \le \gamma_A^{\varepsilon,d} = \varepsilon + d - 2$ follows from the fact that no more than two rows are required to score $\varepsilon$ in the $G_A^{\varepsilon}$ block and the remaining $d-2$ rows score one point each against the $G_{ML}^{m}$ block.
Proof. To prove the tightness of the anti-guessing Bell inequality we show a row-by-row construction of $\dim(\mathcal{C}_d^{n'\to n'}) = n'(n'-1)$ affinely independent vertices. For convenience, we refer to the first $\varepsilon$ rows of $G_A^{\varepsilon,m}$ as anti-guessing rows and the remaining $m$ rows as guessing rows. We treat anti-guessing and guessing rows individually because each admits its own vertex construction. To help illustrate this proof, we draw upon the example where $\varepsilon = m = d = 3$. For a target anti-guessing row $y \in [1,\varepsilon]$ we construct $(n'-1)$ vertices, where $(\varepsilon-1)$ vertices are constructed using the $G_A^{\varepsilon}$ block and $m$ vertices are constructed using the $\mathbf{0}$ block in the top right. Note that a vertex achieves the upper bound $\gamma_A^{\varepsilon,d}$ only if two or fewer anti-guessing rows are used. A vertex $V_A$ is constructed using the $G_A^{\varepsilon,m}$ block by setting $V_A(y|x) = 1$ for all $x$ that satisfy $G_{y,x}^{\varepsilon,m} = 1$, selecting a secondary row $y' \ne y$ with $y' \in [1,\varepsilon]$, and setting $V_A(y'|x') = 1$ where $x'$ is the index of the null element in the target row, $G_{y,x'}^{\varepsilon,m} = 0$. All remaining elements of $V_A$ are set so that the first $(d-2)$ diagonal elements of the $G_{ML}^{m}$ block are selected and any remaining terms are set as unit elements in the target row. An affinely independent vertex is constructed for each of the $(\varepsilon-1)$ choices of secondary row $y'$. For example, when targeting row $y = 1$ we enumerate two vertices. For a target anti-guessing row $y$, an additional $m$ vertices of form $V_{A,0}$ are constructed using the $\mathbf{0}$ block in the top right. If $m > (d-1)$, we set the target row as $V_{A,0}(y|x) = 1$ where $x \in [1,\varepsilon]$. The remaining $(d-1)$ rows are then used to maximize the $G_{ML}^{m}$ block. Using Lemma 3, a set of $m$ affinely independent vectors $\{\vec{b}_k\}_{k=1}^{m}$ with $(d-1)$ null elements and $(m-d+1)$ unit elements can be constructed and used in the $\mathbf{0}$ block of $V_{A,0}$ by setting $V_{A,0}(y|[\varepsilon+1, n']) = \vec{b}_k$. All remaining null elements in the target row of $V_{A,0}$ are then set along the diagonal of the $G_{ML}^{m}$ block.
Since there are $m$ choices of $\vec{b}_k$, that many affinely independent vertices can be constructed. For example, when targeting row $y = 1$ we enumerate 3 vertices (D20). If $m = (d-1)$, a secondary anti-guessing row $y'$ is selected, and the anti-guessing rows are set as $V_{A,0}(y|x) = 1$ and $V_{A,0}(y'|x') = 1$, where $x, x' \in [1,\varepsilon]$, $G_{y,x}^{\varepsilon,m} = 1$, and $G_{y,x'}^{\varepsilon,m} = 0$. The remainder of the procedure is the same as in the $m > (d-1)$ case. Note that in the $m = (d-1)$ case one of the $V_{A,0}$ vertices is redundant with a $V_A$ vertex. To reconcile this conflict another vertex must be added which maximizes $G_{ML}^{m}$ with $V(y|x) = 1$ for all $x \in [1,\varepsilon]$ and $V(x'|x') = 1$ for all $x' \in [\varepsilon+1, n']$. By this procedure $(\varepsilon-1) + m = (n'-1)$ affinely independent vertices are constructed for each target row $y \in [1,\varepsilon]$. Thus, $\varepsilon(n'-1)$ affinely independent vertices are constructed for the anti-guessing rows of $G_A^{\varepsilon,m}$. For a target guessing row $y \in [\varepsilon+1, n']$ we construct $(n'-1)$ vertices, where $\varepsilon$ are constructed using the $\mathbf{0}$ block in the lower left and $(m-1)$ vertices using the $G_{ML}^{m}$ block. Starting with the lower-left $\mathbf{0}$ block, we construct a vertex $V_{ML,0}$ for each $x \in [1,\varepsilon]$ by setting $V_{ML,0}(y|x) = 1$ and $V_{ML,0}(y|y) = 1$. Of the remaining $(d-1)$ rows, one is used to maximize the $G_A^{\varepsilon}$ block and $(d-2)$ rows maximize the $G_{ML}^{m}$ block. Any unspecified unit terms of $V_{ML,0}$ are set in the target row $y$. Since there are $\varepsilon$ values of $x$ to consider, this procedure produces $\varepsilon$ affinely independent vertices. For example, when targeting row $y = 4$ we enumerate 3 vertices (D21). Next, we use the $G_{ML}^{m}$ block to construct a vertex $V_{ML}$. If $m > (d-1)$, then we set $V_{ML}(1|x) = 1$ for all $x \in [1,\varepsilon]$ and use the procedure in Proposition 5 to enumerate $(m-1)$ affinely independent vertices that optimize the $G_{ML}^{m}$ block in the target row.
If $m = (d-1)$, then two anti-guessing rows are selected to maximize the $G_A^{\varepsilon}$ block while the procedure in Proposition 5 is applied to the remaining $(d-2)$ rows to construct $(m-1)$ affinely independent vertices that optimize the $G_{ML}^{m}$ block in the target row. For example, when targeting row $y = 4$ we enumerate 2 vertices. Each guessing row produces $\varepsilon + (m-1) = (n'-1)$ affinely independent vertices; thus, in total, $m(n'-1)$ vertices are enumerated for the guessing rows. Combining the procedures for the guessing and anti-guessing rows, we construct a total of $\varepsilon(n'-1) + m(n'-1) = n'(n'-1)$ affinely independent vertices. Therefore, we prove that $(G_A^{\varepsilon,m}, \gamma_A^{\varepsilon,d})$ is a tight Bell inequality. We now address the bounds on $d$ and $\varepsilon$. The lower bound $\varepsilon \ge 3$ follows from the fact that $G_A^{2}$ is simply a permutation of $G_{ML}^{2}$, meaning the anti-guessing game is indistinguishable from the maximum likelihood game. The upper bound $\varepsilon \le n'-d+1$ follows from the fact that $m \ge (d-1)$ must be satisfied, for otherwise $n'(n'-1)$ affinely independent vertices cannot be found because the entire diagonal of the $G_{ML}^{m}$ block must be used by every vertex to saturate $\langle G_A^{\varepsilon,m}, V\rangle = \gamma_A^{\varepsilon,d}$. The upper bound $d \le n'-2$ results from the lower bound on $\varepsilon$ and the fact that $d$ cannot be so large that $n'-d+1 < 3$.
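The block structure and the bound $\gamma_A^{\varepsilon,d} = \varepsilon + d - 2$ can be sanity-checked by brute force on the $\varepsilon = m = d = 3$ example used above. A Python sketch (our own check, not the supporting software [32]):

```python
# Sketch: build G_A^{eps,m} (anti-guessing block J - I top-left, ML
# identity block bottom-right) and brute-force its maximum score over
# deterministic channels of rank <= d.
from itertools import product
import numpy as np

def anti_guessing_matrix(eps, m):
    n = eps + m
    G = np.zeros((n, n), dtype=int)
    G[:eps, :eps] = 1 - np.eye(eps, dtype=int)  # anti-guessing block
    G[eps:, eps:] = np.eye(m, dtype=int)        # maximum likelihood block
    return G

eps = m = d = 3
G = anti_guessing_matrix(eps, m)
n = eps + m
best = max(sum(G[f[x], x] for x in range(n))
           for f in product(range(n), repeat=n) if len(set(f)) <= d)
print(best)  # 4 = eps + d - 2
```

The optimum uses two anti-guessing rows to score $\varepsilon = 3$ and one diagonal element of the ML block, exactly the saturating configuration described in the proof.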

Appendix E: Proof of Proposition 2
In this section we prove the conditions for which the ambiguous guessing game $(G_{k,d}^{n,n'}, d)$ is a facet of $\mathcal{C}_d^{n\to n'}$.

Proof of Proposition 2(i)
Proof. To prove Proposition 2(i), we consider the general form of an ambiguous guessing Bell inequality $(G_{k,d}^{n,n'}, d)$ where $G_{k,d}^{n,n'} \in \mathbb{R}^{n'\times n}$ is row stochastic and contains $k = n'$ guessing rows (see main text). Note that the matrix $G_{k,d}^{n,n'}$ is row stochastic and therefore describes any input/output lifting and permutation of the maximum likelihood game. Remark. Proposition 2(i) is significant because it allows one to easily find a facet of any signaling polytope $\mathcal{C}_d^{n\to n'}$. This enables the use of adjacency decomposition for any signaling polytope (see Appendix C).

Proof of Proposition 2(ii)
Proof. To prove Proposition 2(ii), we consider the ambiguous guessing game Bell inequalities $(G_{k,d}^{n,n'}, d)$ with $k$ guessing rows and $(n'-k)$ ambiguous rows (see main text). Note that the ambiguous rows of $G_{k,d}^{n,n'}$ span the entire width of the matrix. Finally, if the rank of the guessing rows of $G_{k,d}^{n,n'}$ is less than $n$, then $G_{k,d}^{n,n'}$ cannot be a facet of $\mathcal{C}_d^{n\to n'}$ because there is an insufficient number of affinely independent vertices in $\{V \in \mathcal{V}_d^{n\to n'} \mid \langle G_{k,d}^{n,n'}, V\rangle = d\}$. This is true because Proposition 5 implies that we can enumerate $(k-1)n$ affinely independent vertices using only the guessing rows of $G_{k,d}^{n,n'}$. This requires that the remaining $(n'-k)n$ affinely independent vertices are enumerated using $(d-1)$ guessing rows and one ambiguous row. However, as exemplified in the proof of Proposition 6, this cannot be done unless there is a nonzero element in each column of the $k$ guessing rows of $G_{k,d}^{n,n'}$. Thus we conclude that for $n' > k \ge n$ and $n > d$, $G_{k,d}^{n,n'}$ is a facet of $\mathcal{C}_d^{n\to n'}$ iff the rank of the guessing rows is $n$.
Remark. In our proof, we do not consider input liftings of $G_?^{m',d}$ because they result in matrices which deviate in form from $G_{k,d}^{n,n'}$. Input lifting appends an all-zero column to $G_?^{m',d}$, while $G_{k,d}^{n,n'}$ is defined to have a nonzero element in each column of an ambiguous row. Therefore, input liftings of ambiguous guessing facets $G_?^{n',d}$ are incompatible with the ambiguous guessing games $G_{k,d}^{n,n'}$ described in the main text.
Proof. We first show the conclusion of Lemma 4 is true for any extreme point $V$ of $\mathcal{M}_{n'-1}^{n\to n'}$ having ML sum $\varphi(V) < n'-1$. If $V$ is not extremal in $\mathcal{P}^{n\to n'}$, then $V$ must have at least one column $x$ with two non-extremal elements $V(y_1|x)$ and $V(y_2|x)$. However, we could then take two perturbations $V(y_1|x) \to V(y_1|x) \pm \epsilon$ and $V(y_2|x) \to V(y_2|x) \mp \epsilon$ with $\epsilon$ chosen sufficiently small so that the ML sum remains $< n'-1$ and the elements remain non-negative. Hence, by contradiction, $V$ must be extremal in $\mathcal{P}^{n\to n'}$, with rank clearly $< n'-1$.
Let us then consider an extremal point $V$ of $\mathcal{M}_{n'-1}^{n\to n'}$ for which $\varphi(V) = n'-1$. Since $\varphi(V) = n'-1$ is an integer and $V$ has $n'$ rows, $V$ must have at least two non-extremal row maximizers (possibly in different columns). We will again introduce perturbations, but care is needed to ensure that the perturbations are valid; i.e. the perturbed channels must remain in $\mathcal{M}_{n'-1}^{n\to n'}$. There are two cases to consider. Case (a): Suppose that two non-extremal row maximizers occur in the same column: say $V(y_1|x)$ and $V(y_2|x)$ are both row maximizers in column $x$. Since these values account for the contributions of rows $y_1$ and $y_2$ in the ML sum, and since there are only $n'$ total rows in this sum, we must have that all other row maximizers equal 1. Hence we introduce perturbations $V(y_1|x) \to V(y_1|x) \pm \epsilon$ and $V(y_2|x) \to V(y_2|x) \mp \epsilon$. If $V(y_1|x)$ and $V(y_2|x)$ are unique row maximizers, then this perturbation is valid. On the other hand, if there are columns $x'$, $x''$ such that $V(y_1|x) = V(y_1|x')$ and/or $V(y_2|x) = V(y_2|x'')$ (with possibly $x' = x''$), then we must also introduce corresponding perturbations $V(y_1|x') \to V(y_1|x') \pm \epsilon$ and/or $V(y_2|x'') \to V(y_2|x'') \mp \epsilon$. To preserve normalization in columns $x'$ and/or $x''$, we will have to introduce an off-setting perturbation to some other row in $x'$ and/or $x''$. This can always be done since either $x' = x''$, or $x'$ and/or $x''$ have a non-extremal element in some other row which is not a row maximizer (since all other row maximizers equal 1).

Case (b):
No column has two non-extremal row maximizers, and $V$ has at least two non-extremal row maximizers that belong to different columns. For each row $y$ with a non-extremal row maximizer, add perturbations $\pm\epsilon_y$ to all the row maximizers in that row. Since each column has at most one row maximizer, a normalization-preserving perturbation $\mp\epsilon_y$ can be added to another non-extremal element in any column having a row maximizer in row $y$. Finally, choose the $\epsilon_y$ so that $\sum_{y=1}^{n'} \epsilon_y = 0$.

Proof of Theorem 1(ii)
We now turn to the ambiguous polytopes $\mathcal{A}_\cap^{n\to n'} := \bigcap_{k=n}^{n'} \mathcal{A}_{k,n-1}^{n\to n'}$. Recall that $\mathcal{A}_{k,n-1}^{n\to n'}$ is the polytope of channels $P \in \mathcal{P}^{n\to n'}$ satisfying all Bell inequalities of the form $\langle G_{k,n-1}^{n,n'}, P\rangle \le n-1$, with $G_{k,n-1}^{n,n'}$ having $k$ guessing rows and $(n'-k)$ ambiguous rows. In this case, all the elements in an ambiguous row are equal to $\frac{1}{n-d+1} = \frac{1}{2}$ since $d = n-1$. To prove Theorem 1(ii) we apply the following lemma to show that the extreme points of $\mathcal{A}_\cap^{n\to n'}$ are the same as those of $\mathcal{C}_{n-1}^{n\to n'}$. Then, by convexity of $\mathcal{A}_\cap^{n\to n'}$ and $\mathcal{C}_{n-1}^{n\to n'}$, we must have $\mathcal{A}_\cap^{n\to n'} = \mathcal{C}_{n-1}^{n\to n'}$.
Lemma 5. For arbitrary $n' \ge n$, the extreme points of $\mathcal{A}_\cap^{n\to n'}$ are extreme points of $\mathcal{C}_{n-1}^{n\to n'}$.
Proof. We first argue that the conclusion of Lemma 5 is true for any extreme point $V$ of $\mathcal{A}_\cap^{n\to n'}$ such that $\langle G_{k,n-1}^{n,n'}, V\rangle < n-1$ for all $G_{k,n-1}^{n,n'}$ and all integers $k \in [n, n']$. Analogous to Lemma 4, if $V$ has at least one column $x$ with two non-extremal elements $V(y_1|x)$ and $V(y_2|x)$, we can take two sufficiently small perturbations $V(y_1|x) \to V(y_1|x) \pm \epsilon$ and $V(y_2|x) \to V(y_2|x) \mp \epsilon$ and still satisfy all the constraints of Eq. (F3). Hence, $V$ must be an extreme element of $\mathcal{P}^{n\to n'}$. In this case, $\text{rank}(V) < n-1$ since $\varphi(V) < n-1$, and so $V \in \mathcal{C}_{n-1}^{n\to n'}$. It remains to prove the conclusion of Lemma 5 whenever Eq. (F3) is tight for some $\mathcal{A}_{k,n-1}^{n\to n'}$. The lengthiest part of this argument is when $k = n'$, for which tightness in Eq. (F3) corresponds to the ML sum equaling $n-1$. In this case, Proposition 10 below shows that $V$ must be an extreme point of $\mathcal{C}_{n-1}^{n\to n'}$. However, before proving this result, we apply it to show that Lemma 5 holds whenever Eq. (F3) is tight for some other $G_{k,n-1}^{n,n'}$ with $k < n'$. Specifically, we will perform a lifting technique on any vertex $V$ satisfying $\langle G_{k,n-1}^{n,n'}, V\rangle = n-1$ and reduce it to the case of the ML sum equaling $(n-1)$.
Suppose that $\varphi(V) < n-1$ yet there exists some $G_{k,n-1}^{n,n'}$ such that $\langle G_{k,n-1}^{n,n'}, V\rangle = n-1$. The matrix $G_{k,n-1}^{n,n'}$ identifies $(n'-k)$ ambiguous rows, and suppose that $y$ is an ambiguous row such that $\frac{1}{2}\|r_y\|_1 > \|r_y\|_\infty$, with $r_y$ being the $y^{\text{th}}$ row of $V$. To be concrete, let us suppose without loss of generality that the components of row $r_y$ are arranged in non-increasing order (i.e. $V(y|x_i) \ge V(y|x_{i+1})$), and let $k$ be the smallest index satisfying Eq. (F4). By the assumption $\frac{1}{2}\|r_y\|_1 > \|r_y\|_\infty$, we have $k > 1$. Also, since $k$ is the smallest integer satisfying Eq. (F4), subtracting $V(y|x_{k-1})$ from both sides of the corresponding equation implies that the LHS of Eq. (F4) is strictly positive. Hence, there exists some $\lambda \in (0,1]$ satisfying the required condition. Consider then the new matrix $\widetilde{V}$ formed from $V$ by splitting row $y$ into $k$ rows as in Eq. (F7). Notice that we can obtain $V$ from $\widetilde{V}$ by coarse-graining over these rows. Moreover, this decomposition was constructed so that the guessing score is unchanged, where the $r_{y_i}$ are the rows in Eq. (F7). Essentially, this transformation allows us to replace an ambiguous row with a collection of guessing rows so that the overall guessing score does not change.
We perform this row splitting process on all ambiguous rows of $V$, thereby obtaining a new matrix $\widetilde{V}$ such that $\varphi(\widetilde{V}) = n-1$. If $m$ is the total number of rows in $\widetilde{V}$, then $\widetilde{V}$ will be an element of $\mathcal{A}_\cap^{n\to m}$. We decompose $\widetilde{V}$ into a convex combination of extremal points of $\mathcal{A}_\cap^{n\to m}$ as $\widetilde{V} = \sum_\lambda p_\lambda \widetilde{V}_\lambda$. By the convexity of $\varphi$, it follows that $\varphi(\widetilde{V}_\lambda) = n-1$, and we can therefore apply Proposition 10 below to the channels $\widetilde{V}_\lambda$ to conclude that they are extreme points of $\mathcal{C}_{n-1}^{n\to m}$. Consequently, each $\widetilde{V}_\lambda$ has only one nonzero element per column. Let $R$ denote the coarse-graining map such that $V = R\widetilde{V}$, and apply it to the convex decomposition to obtain $V = \sum_\lambda p_\lambda R\widetilde{V}_\lambda$. However, by the assumption that $V$ is extremal, this is only possible if $R\widetilde{V}_\lambda$ is the same for every $\lambda$. As a result, any two $\widetilde{V}_\lambda$ and $\widetilde{V}_{\lambda'}$ can differ only in rows that coarse-grain into the same rows under $R$. From this it follows that $V$ can have no more than one nonzero element per column and $\text{rank}(V) \le n-1$. Hence we have shown that the extreme points of $\mathcal{A}_\cap^{n\to n'}$ are indeed extreme points of the signaling polytope $\mathcal{C}_{n-1}^{n\to n'}$. To complete the proof of Lemma 5, we establish the case when $\varphi(V) = n-1$, as referenced above. We begin by proving the partial result provided by Proposition 9 and then use this result to prove Proposition 10.
Proposition 9. If $V$ is an extreme point of $\mathcal{A}_\cap^{n\to n'}$ satisfying $\varphi(V) = n-1$, then each column of $V$ must have at least one unique row maximizer or it has only one nonzero element.
Proof. Suppose on the contrary that some column $x$ has more than one nonzero element yet no unique row maximizer. Let $S_x \subset [n']$ be the set of rows for which column $x$ contains a row maximizer. Since only one row maximizer per row contributes to the ML sum, and the elements of column $x$ sum to one, we can satisfy $\varphi(V) = n-1$ iff both conditions hold: (i) each row $y$ in $S_x$ has only two nonzero elements, $V(y|x)$ and $V(y|x_y)$ for some column $x_y \ne x$; (ii) every other nonzero element in $V$ outside of column $x$ and the rows in $S_x$ is a unique row maximizer.
With this structure, we introduce three cases of valid perturbations.
Case (a): $V(y_1|x)$ and $V(y_2|x)$ are non-extremal elements in column $x$ with $y_1, y_2 \notin S_x$. Then $V(y_1|x) \to V(y_1|x) \pm \epsilon$ and $V(y_2|x) \to V(y_2|x) \mp \epsilon$ is a valid perturbation. Indeed, even if we consider $y_1$ or $y_2$ as ambiguous rows, there is at most one other element in each of these rows (properties (i) and (ii) above), and so this perturbation would not violate any of the inequalities in (F3).
Case (b): $V(y_1|x)$ and $V(y_2|x)$ are non-extremal elements in column $x$ with $y_1 \in S_x$ and $y_2 \notin S_x$. Then $V(y_1|x) = V(y_1|x_{y_1})$ for some other column $x_{y_1} \ne x$. By normalization, there will be another element $V(y_3|x_{y_1})$ in column $x_{y_1}$ that by property (ii) is a unique row maximizer. Hence, we introduce perturbations in which the tied pair $V(y_1|x)$ and $V(y_1|x_{y_1})$ is shifted by $\pm\epsilon$, offset by $\mp\epsilon$ shifts of $V(y_2|x)$ and $V(y_3|x_{y_1})$ so that each row's elements stay aligned and each column remains normalized. By properties (i) and (ii), these perturbations do not increase the ML sum, nor are they able to violate any of the other inequalities in (F3).
Case (c): $V(y_1|x)$ and $V(y_2|x)$ are non-extremal elements in column $x$ with $y_1, y_2 \in S_x$. Then $V(y_1|x) = V(y_1|x_{y_1})$ and $V(y_2|x) = V(y_2|x_{y_2})$ for some other columns $x_{y_1}, x_{y_2} \ne x$ (with possibly $x_{y_1} = x_{y_2}$). By normalization, there will be elements $V(y_3|x_{y_1})$ and $V(y_4|x_{y_2})$ in columns $x_{y_1}$ and $x_{y_2}$ respectively that are unique row maximizers (again by property (ii)). Note this requires that $y_1, y_2, y_3, y_4$ are all distinct rows. Hence, we introduce perturbations in which the tied pairs $V(y_1|x), V(y_1|x_{y_1})$ and $V(y_2|x), V(y_2|x_{y_2})$ are shifted by $\pm\epsilon$ and $\mp\epsilon$ respectively, offset by $\mp\epsilon$ and $\pm\epsilon$ shifts of $V(y_3|x_{y_1})$ and $V(y_4|x_{y_2})$. Normalization is preserved under these perturbations and all the inequalities in (F3) are satisfied. As we have shown valid perturbations in all three cases under the assumption that some column has non-extremal elements with no unique row maximizer, the proposition follows. Proposition 10. If $V$ is an extreme point of $\mathcal{A}_\cap^{n\to n'}$ satisfying $\varphi(V) = n-1$, then $V$ is an extreme point of $\mathcal{C}_{n-1}^{n\to n'}$. Proof. Suppose that $V$ has some column $x_1$ containing more than one nonzero element (if no such column can be found, then the proposition is proven). Let $V(y_1|x_1) \in (0,1)$ denote a unique row maximizer, which is assured to exist by Proposition 9. We again proceed by considering two cases.
Case (a): Column $x_1$ contains only one row maximizer $V(y_1|x_1)$, and all other elements in the column are not row maximizers. Then there must exist another column $x_1'$ that also contains at least two nonzero elements. Indeed, if on the contrary all other columns had only one nonzero element each, then it would be impossible for $\phi(\mathbf{V}) = n-1$. If $x_1'$ contains only row maximizers, then proceed to case (b) with $x_1'$ in place of $x_1$. Otherwise, $x_1'$ does not contain only row maximizers; rather it has a unique row maximizer $V(y_3|x_1')$ in row $y_3$ and a nonzero element $V(y_4|x_1')$ in row $y_4$ that is not a row maximizer. Thus, we can introduce the valid perturbations
$$V(y_1|x_1) \to V(y_1|x_1) \pm \epsilon, \qquad V(y_2|x_1) \to V(y_2|x_1) \mp \epsilon,$$
$$V(y_3|x_1') \to V(y_3|x_1') \mp \epsilon, \qquad V(y_4|x_1') \to V(y_4|x_1') \pm \epsilon,$$
where $V(y_2|x_1)$ denotes another nonzero element in column $x_1$ (with possibly $y_2 = y_3, y_4$ and/or $y_3 = y_1$). It can be verified that all inequalities in (F3) are preserved under these perturbations.
Case (b): Column $x_1$ contains only row maximizers, with $V(y_2|x_1)$ being another one in addition to $V(y_1|x_1)$. If $V(y_2|x_1)$ is a unique row maximizer, then valid perturbations can be made to both $V(y_1|x_1)$ and $V(y_2|x_1)$. On the other hand, suppose that $V(y_2|x_1)$ is a non-unique row maximizer, and let $V(y_2|x_2) = V(y_2|x_1)$ be another row maximizer in column $x_2$. There can be no other nonzero elements in row $y_2$. Indeed, if there were another column, say $x_3$, such that $V(y_2|x_3) > 0$, then $\mathbf{V}$ would violate the ambiguous guessing inequality of $G^{n,n'}_{n'-1,n-1}$ in which the one ambiguous row is $y_2$. Hence, the only nonzero elements in row $y_2$ are $V(y_2|x_1)$ and $V(y_2|x_2)$. Let $V(y_3|x_2)$ be a unique row maximizer in column $x_2$.
We must be able to find another column $x_3$ with more than one nonzero element, one of which is a unique row maximizer and another of which is a non-unique row maximizer. For if this were not the case, then any other column in $\mathbf{V}$ would either have a unique row maximizer equaling one, or it would have at least two elements, one being a unique row maximizer and the others not being row maximizers. However, the latter possibility was covered in case (a) and was shown to be impossible for an extremal $\mathbf{V}$. For the former, if all of the other $n-2$ columns outside of $x_1$ and $x_2$ contained unique row maximizers equaling one, then they would collectively contribute an amount of $n-2$ to the ML sum. Since every element in column $x_1$ is a row maximizer, and $V(y_3|x_2)$ is a row maximizer in column $x_2$, we would have $\phi(\mathbf{V}) \geq (n-2) + 1 + V(y_3|x_2) > n-1$, contradicting $\phi(\mathbf{V}) = n-1$. Hence, there must exist another column $x_3$ with a non-unique row maximizer $V(y_5|x_3)$ that is shared with column $x_4$ (which may be equivalent to either $x_1$ or $x_2$). Letting $V(y_4|x_3)$ and $V(y_6|x_4)$ denote unique row maximizers in columns $x_3$ and $x_4$, respectively, we can perform the valid perturbations
$$V(y_1|x_1) \to V(y_1|x_1) \pm \epsilon, \quad V(y_2|x_1) \to V(y_2|x_1) \mp \epsilon, \quad V(y_2|x_2) \to V(y_2|x_2) \mp \epsilon, \quad V(y_3|x_2) \to V(y_3|x_2) \pm \epsilon,$$
$$V(y_4|x_3) \to V(y_4|x_3) \mp \epsilon, \quad V(y_5|x_3) \to V(y_5|x_3) \pm \epsilon, \quad V(y_5|x_4) \to V(y_5|x_4) \pm \epsilon, \quad V(y_6|x_4) \to V(y_6|x_4) \mp \epsilon.$$
Note that $y_1, y_2, y_3, y_4, y_5, y_6$ are all distinct rows since each row in $\mathbf{V}$ can have at most one pair of non-unique row maximizers while rows $y_1, y_3, y_4, y_6$ contain unique row maximizers. This assures that the perturbations do not violate the inequalities in (F3). As cases (a) and (b) exhaust all possibilities, we see that $\mathbf{V}$ can only have one nonzero element per column. From this the conclusion of Proposition 10 follows.
This completes the proof of Lemma 5.
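The $\pm\epsilon$ perturbation chains used throughout this proof can be checked numerically. The sketch below uses a hypothetical $4\times 3$ column-stochastic matrix $\mathbf{V}$ whose first column has two shared row maximizers (the structure of case (c)); applying a compensating perturbation chain produces two distinct valid channels whose midpoint is $\mathbf{V}$, certifying that such a $\mathbf{V}$ is not a vertex. The matrix and $\epsilon$ are illustrative choices, not taken from the paper.

```python
import numpy as np

# Hypothetical column-stochastic V illustrating case (c): column 0 has two
# non-extremal entries whose row maxima are shared with columns 1 and 2.
V = np.array([
    [0.5, 0.5, 0.0],   # row y1: maximizer shared by columns x=0 and x_{y1}=1
    [0.5, 0.0, 0.5],   # row y2: maximizer shared by columns x=0 and x_{y2}=2
    [0.0, 0.5, 0.0],   # row y3: unique row maximizer in column 1
    [0.0, 0.0, 0.5],   # row y4: unique row maximizer in column 2
])

eps = 0.1
D = np.zeros_like(V)
# compensating perturbation chain (one +eps and one -eps per column)
D[0, 0], D[1, 0] = +eps, -eps   # column x
D[0, 1], D[2, 1] = +eps, -eps   # column x_{y1}
D[1, 2], D[3, 2] = -eps, +eps   # column x_{y2}

for W in (V + D, V - D):
    assert np.allclose(W.sum(axis=0), 1.0)   # columns stay normalized
    assert ((W >= 0) & (W <= 1)).all()       # entries stay probabilities

# V is the midpoint of two distinct valid channels, hence not extremal
assert np.allclose(V, 0.5 * ((V + D) + (V - D)))
```

Note that the signs are chosen so that shared row maximizers move together (rows $y_1$ and $y_2$), keeping every row maximum, and hence the ML sum, balanced between the two branches.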

Appendix G: Proof of Theorem 2
In this section we analyze the $\mathcal{C}_2^{n\to 4}$ signaling polytope to prove Theorem 2. To begin, for any Bell inequality $(\mathbf{G},\gamma)$ with $\mathbf{G} \in \mathbb{R}^{n'\times n}$ and $\gamma \in \mathbb{R}$, we define the polyhedron of channels
$$\mathcal{C}(\mathbf{G},\gamma) = \{\mathbf{P} \in \mathcal{P}^{n\to n'} : \langle\mathbf{G},\mathbf{P}\rangle \leq \gamma\}.$$
Since $\mathcal{C}_d^{n\to n'}$ is a convex polytope, there exists a finite number of polyhedra $\{\mathcal{C}(\mathbf{G}_m,\gamma_m)\}_{m=1}^{r}$ such that $\mathcal{C}_d^{n\to n'} = \bigcap_{m=1}^{r}\mathcal{C}(\mathbf{G}_m,\gamma_m)$. If $(\hat{\mathbf{G}},\hat{\gamma})$ is a lifting of a Bell inequality $(\mathbf{G},\gamma)$, then $\mathcal{C}_d^{n\to n'} \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma})$. Conversely, if $\mathbf{P} \in \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma})$, then
$$\langle\mathbf{G},\mathbf{P}\rangle = \sum_{y,x} G_{y,x}P(y|x) \leq \sum_{y,x} \hat{G}_{y,x}P(y|x) \leq \hat{\gamma} = \gamma.$$
Therefore, $\mathbf{P} \in \mathcal{C}(\mathbf{G},\gamma)$, and so $\mathcal{C}_d^{n\to n'} \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma}) \subset \mathcal{C}(\mathbf{G},\gamma)$. Note that if $\mathbf{G}$ has only non-negative elements then so will $\hat{\mathbf{G}}$.
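The containment argument rests on an elementary monotonicity: if the lifted game dominates the original entrywise and the channel probabilities are non-negative, the lifted score can only be larger. A minimal numerical sketch (the games and the channel here are random, hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_out = 5, 4

# a random classical channel P(y|x): columns are probability vectors
P = rng.random((n_out, n))
P /= P.sum(axis=0)

# a hypothetical non-negative game G and an entrywise-dominating lift G_hat
G = rng.random((n_out, n))
G_hat = G + rng.random((n_out, n))          # G_hat >= G entrywise

score, score_hat = np.sum(G * P), np.sum(G_hat * P)
# since P >= 0, <G, P> <= <G_hat, P>; thus <G_hat, P> <= gamma
# forces membership P in C(G, gamma) as well
assert score <= score_hat
```

In other words, the polyhedron of the dominating game is contained in that of the original game at the same bound $\gamma$, which is the direction used in the chain $\mathcal{C}_d^{n\to n'} \subset \mathcal{C}(\hat{\mathbf{G}},\hat{\gamma}) \subset \mathcal{C}(\mathbf{G},\gamma)$.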
Lemma 7. For any finite number of inputs $n$, $\mathcal{C}_2^{n\to 4} = \bigcap_{m=1}^{s}\mathcal{C}(\mathbf{G}_m,\gamma_m)$, with each $\mathbf{G}_m$ having at most six nonzero columns.
Proof. As a consequence of Lemma 6, we can always find a complete set of polyhedra $\{\mathcal{C}(\hat{\mathbf{G}}_m,\hat{\gamma}_m)\}_{m=1}^{s}$ such that each $\hat{\mathbf{G}}_m$ has no more than two positive elements in each column, the rest being zero. Our goal is to show that the number of nonzero columns can be reduced to six. The key steps in our reduction are given by the following two propositions.
Proof. Every vertex $\mathbf{V}$ of $\mathcal{C}_2^{n\to 4}$ has support in only two rows. If $\mathbf{V}$ has support in the first two rows, then its upper left $2\times 2$ corner will have one of the forms $(\begin{smallmatrix}1&1\\0&0\end{smallmatrix})$, $(\begin{smallmatrix}1&0\\0&1\end{smallmatrix})$, $(\begin{smallmatrix}0&1\\1&0\end{smallmatrix})$, $(\begin{smallmatrix}0&0\\1&1\end{smallmatrix})$. In each of these cases, $\langle\hat{\mathbf{G}}',\mathbf{V}\rangle \leq \hat{\gamma} \Leftrightarrow \langle\hat{\mathbf{G}},\mathbf{V}\rangle \leq \hat{\gamma}$. The other possibility is that $\mathbf{V}$ has support in only one of the first two rows. In this case, writing $\kappa$ for the contribution of the other columns to the inner product and using the assumption that $a \geq c$, one again finds $\langle\hat{\mathbf{G}}',\mathbf{V}\rangle \leq \hat{\gamma}$. Similar reasoning shows that $\langle\hat{\mathbf{G}}',\mathbf{V}\rangle \leq \hat{\gamma}$ for all other vertices $\mathbf{V}$. Conversely, by an analogous case-by-case consideration, we can establish that $\mathcal{C}_2^{n\to 4} \subset \mathcal{C}(\hat{\mathbf{G}}',\hat{\gamma})$ implies $\langle\hat{\mathbf{G}},\mathbf{V}\rangle \leq \hat{\gamma}$ for all vertices $\mathbf{V}$ of $\mathcal{C}_2^{n\to 4}$.
Proof. This proof considers the vertices of $\mathcal{C}_2^{n\to 4}$ and applies the same reasoning as the proof of Proposition 11.
Continuing with the proof of Lemma 7, suppose that $\mathcal{C}_2^{n\to 4} \subset \mathcal{C}(\hat{\mathbf{G}}_m,\hat{\gamma}_m)$ with each column of $\hat{\mathbf{G}}_m$ having no more than two nonzero rows. We can group the columns into six groups according to which two rows are nonzero (it may be that a column has more than two zeros, in which case we just select one group to place it in). By repeatedly applying Proposition 11, we can replace $\hat{\mathbf{G}}_m$ with a matrix $\hat{\mathbf{G}}_m'$ such that each group has at most one column with two nonzero elements; the remaining columns in that group have at most one nonzero element. We then repeatedly apply Proposition 12 to remove multiple columns sharing the same single nonzero row. In the end, we arrive at a decomposition in which each $\hat{\mathbf{G}}_{m,j}$ has at most ten nonzero columns, corresponding to the different ways that no more than two nonzero elements can occupy a column. That is, up to a permutation of columns, each $\hat{\mathbf{G}}_{m,j}$ will have the form
$$\hat{\mathbf{G}}_{m,j} = \begin{pmatrix} a_1 & b_1 & c_1 & 0 & 0 & 0 & g & 0 & 0 & 0 & 0 & \cdots\\ a_2 & 0 & 0 & d_1 & e_1 & 0 & 0 & h & 0 & 0 & 0 & \cdots\\ 0 & b_2 & 0 & d_2 & 0 & f_1 & 0 & 0 & i & 0 & 0 & \cdots\\ 0 & 0 & c_2 & 0 & e_2 & f_2 & 0 & 0 & 0 & j & 0 & \cdots \end{pmatrix}.$$
The final step is to remove the block of diagonal elements $[g,h,i,j]$. To do this, observe that we can absorb any of these diagonal elements into an earlier column, provided that its row contains the largest element in that column. For example, if $f_2 > f_1$, then we can replace $\hat{\mathbf{G}}_{m,j}$ with
$$\hat{\mathbf{G}}_{m,j}' = \begin{pmatrix} a_1 & b_1 & c_1 & 0 & 0 & 0 & g & 0 & 0 & 0 & 0 & \cdots\\ a_2 & 0 & 0 & d_1 & e_1 & 0 & 0 & h & 0 & 0 & 0 & \cdots\\ 0 & b_2 & 0 & d_2 & 0 & f_1 & 0 & 0 & i & 0 & 0 & \cdots\\ 0 & 0 & c_2 & 0 & e_2 & f_2 + j & 0 & 0 & 0 & 0 & 0 & \cdots \end{pmatrix},$$
and we can easily see that $\mathcal{C}_2^{n\to 4} \subset \mathcal{C}(\hat{\mathbf{G}}_{m,j},\hat{\gamma}_m)$ iff $\mathcal{C}_2^{n\to 4} \subset \mathcal{C}(\hat{\mathbf{G}}_{m,j}',\hat{\gamma}_m)$. By considering the maximum element in each of the first six columns, we can perform this replacement for at least three of the four elements $[g,h,i,j]$. If we can do this for all four elements, then the proof is complete.
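The absorption step can be tested by brute force. Since the vertices of $\mathcal{C}_2^{n\to 4}$ are deterministic channels using at most two distinct outputs, the tight bound a game places on the polytope is its maximum over all row pairs, with each column picking its better row. The sketch below (hypothetical random coefficients, with $f_2 > f_1$ enforced) checks that absorbing $j$ into the $f_2$ entry leaves that maximum unchanged:

```python
import itertools
import numpy as np

def max_vertex_score(G):
    """Maximum of <G, V> over vertices of C_2^{n->4}: deterministic
    channels supported on at most two rows. Each vertex fixes a pair
    of rows and lets every column select its larger entry."""
    best = -np.inf
    for y1, y2 in itertools.combinations(range(G.shape[0]), 2):
        best = max(best, np.maximum(G[y1], G[y2]).sum())
    return best

rng = np.random.default_rng(1)
a1, a2, b1, b2, c1, c2, d1, d2, e1, e2 = rng.random(10)
f1, f2 = np.sort(rng.random(2))            # ensure f2 > f1
g, h, i_, j = rng.random(4)

# hypothetical game matrix with the diagonal block [g, h, i, j]
G_hat = np.array([
    [a1, b1, c1, 0,  0,  0,  g, 0, 0,  0],
    [a2, 0,  0,  d1, e1, 0,  0, h, 0,  0],
    [0,  b2, 0,  d2, 0,  f1, 0, 0, i_, 0],
    [0,  0,  c2, 0,  e2, f2, 0, 0, 0,  j],
])

# absorb j into the f2 entry (its row holds the largest element of that column)
G_abs = G_hat.copy()
G_abs[3, 5] += j
G_abs[3, 9] = 0.0

# both games place the same bound on the signaling polytope
assert np.isclose(max_vertex_score(G_hat), max_vertex_score(G_abs))
```

Any row pair containing the bottom row collects $f_2 + j$ in either game, while pairs avoiding it see neither $f_2$ nor $j$; this is why the condition $f_2 > f_1$ is needed for the absorption to be harmless.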
On the other hand, if we can only remove three of these elements, then we will obtain a matrix $\hat{\mathbf{G}}_{m,j}$ of the form (up to row/column permutations)
$$\hat{\mathbf{G}}_{m,j} = \begin{pmatrix} a_1 & b_1 & c_1 & 0 & 0 & 0 & g & 0 & 0 & 0 & 0 & \cdots\\ a_2 & 0 & 0 & d_1 & e_1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots\\ 0 & b_2 & 0 & d_2 & 0 & f_1 & 0 & 0 & 0 & 0 & 0 & \cdots\\ 0 & 0 & c_2 & 0 & e_2 & f_2 & 0 & 0 & 0 & 0 & 0 & \cdots \end{pmatrix} \quad \text{(G17)}$$
with $a_1$, $b_1$, $c_1$ not having the largest values in their respective columns. In this case, we construct the matrix
$$\tilde{\mathbf{G}}_{m,j} = \begin{pmatrix} a_1+g & b_1+g & c_1+g & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots\\ a_2+g & 0 & 0 & d_1 & e_1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots\\ 0 & b_2+g & 0 & d_2 & 0 & f_1 & 0 & 0 & 0 & 0 & 0 & \cdots\\ 0 & 0 & c_2+g & 0 & e_2 & f_2 & 0 & 0 & 0 & 0 & 0 & \cdots \end{pmatrix}.$$

The POVM $\{\Pi_y\}$ must satisfy $\sum_{y=1}^{n'+1}\Pi_y = \mathbb{I}_d$. Furthermore, one column $x$ of $\mathbf{G}_{\mathrm{ML}}$ has two nonzero elements in rows $y$ and $y'$ where $G_{y,x} = G_{y',x} = 1$. In this case, two POVM elements $\Pi_y$ and $\Pi_{y'}$ are both optimized against the state $\rho_x$. However, the constraint $\mathrm{Tr}[\Pi_y\rho_x] + \mathrm{Tr}[\Pi_{y'}\rho_x] \leq 1$ holds for any choice of $\rho_x$ and POVM. Therefore, the inner product satisfies $\langle\mathbf{G}_{\mathrm{ML}},\mathbf{P}_{R_\mu}\rangle \leq \mu d + (1-\mu)$. The argument applied for the output lifting holds in general whenever one or more columns $x$ contain at least two nonzero elements. Thus, the upper bound in Eq. (H6) holds for any input/output lifting taking $\mathbf{G}_{\mathrm{ML}}^{n'} \to \mathbf{G}_{\mathrm{ML}} \in \mathbb{R}^{m'\times m}$ where $\min\{m,m'\} \geq n'$. This concludes the proof.

The upper bound on the maximum likelihood score from Proposition 13 serves as a lower bound on the signaling dimension $\kappa(R_\mu)$ of the partial replacer channel. This follows from the fact that if $\mathbf{P}_{R_\mu} \notin \mathcal{C}_r^{n\to n'}$, then $\kappa(R_\mu) > r$. Furthermore, the integer nature of the signaling dimension implies that $\kappa(R_\mu) \geq \lceil \mu d + (1-\mu) \rceil$. We now turn to certifying the signaling dimension of the partial erasure channel.

Proposition 14. The signaling dimension of a $d$-dimensional partial erasure channel is
$$\kappa(E_\mu) = \min\{d, \lceil \mu d \rceil + 1\}. \quad \text{(H10)}$$

Proof. Let the classical channel $\mathbf{P}_{E_\mu}$ be induced by the partial erasure channel $E_\mu$ via Eq. (1) for any collection of quantum states $\{\rho_x\}_x$ and POVM $\{\Pi_y\}_y$.
The transition probabilities are then expressed as
$$P_{E_\mu}(y|x) = \mu P_{\mathrm{id}_d}(y|x) + (1-\mu)P_{|E\rangle}(y),$$
where $P_{\mathrm{id}_d}(y|x) = \mathrm{Tr}[\Pi_y\rho_x]$ and $P_{|E\rangle}(y) = \mathrm{Tr}[\Pi_y|E\rangle\langle E|]$. Since the simulation protocol for partial replacer channels can faithfully simulate $\mathbf{P}_{E_\mu}$, the upper bound $\kappa(E_\mu) \leq \lceil \mu d \rceil + 1$ holds (see the proof of Theorem 3 in the main text). Therefore, $\min\{d, \lceil \mu d \rceil + 1\} \geq \kappa(E_\mu)$. To establish a lower bound on $\kappa(E_\mu)$, we consider the channel $\mathbf{P}_{E_\mu} \in \mathcal{P}^{d\to d+1}$. As demonstrated in Proposition 13, $\mathbf{P}_{E_\mu}$ achieves the maximum likelihood upper bound for partial replacer channels, $\langle\mathbf{G}_{\mathrm{ML}},\mathbf{P}_{E_\mu}\rangle = \mu d + (1-\mu)$. In fact, this bound also holds for non-orthogonal quantum states $\{\rho_x\}_{x\in[n]}$ where $n > d$.
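The scores above are easy to reproduce numerically. The sketch below uses hypothetical computational-basis encodings $\rho_x = |x\rangle\langle x|$ and the basis measurement, which are illustrative choices rather than the optimizers from the proof: the diagonal (maximum likelihood) score of the induced replacer channel equals $\mu d + (1-\mu)$ for any trace-one replacer state $\sigma$, and the erasure channel decomposes as $\mu \mathbf{P}_{\mathrm{id}_d} + (1-\mu)\mathbf{P}_{|E\rangle}$:

```python
import numpy as np

rng = np.random.default_rng(2)
d, mu = 4, 0.3

# --- partial replacer: R_mu(rho) = mu*rho + (1 - mu)*sigma ---
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = A @ A.conj().T
sigma /= np.trace(sigma).real              # hypothetical replacer state

# induced channel for basis encodings |x> and the basis measurement {|y><y|}:
# P(y|x) = mu * delta_{xy} + (1 - mu) * <y|sigma|y>
P_R = mu * np.eye(d) + (1 - mu) * np.real(np.diag(sigma))[:, None] * np.ones((1, d))

# the diagonal (maximum likelihood) score equals mu*d + (1 - mu)
assert np.isclose(np.trace(P_R), mu * d + (1 - mu))

# --- partial erasure: outputs gain an erasure row, P = mu*P_id + (1-mu)*P_E ---
P_id = np.vstack([np.eye(d), np.zeros((1, d))])
P_E = np.vstack([np.zeros((d, d)), np.ones((1, d))])
P_era = mu * P_id + (1 - mu) * P_E

assert np.allclose(P_era.sum(axis=0), 1.0)       # valid classical channel
assert np.isclose(np.trace(P_era[:d]), mu * d)   # diagonal terms sum to mu*d
assert np.isclose(P_era[d, 0], 1 - mu)           # every erasure entry is 1 - mu
```

For the erasure channel the diagonal terms contribute $\mu d$ and any single erasure-row entry contributes $1-\mu$, consistent with the score $\mu d + (1-\mu)$ quoted above.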
To improve the lower bound on $\kappa(E_\mu)$ beyond Proposition 13, we consider the ambiguous polytope $\mathcal{A}_{(n'-1),r}^{(n'-1)\to n'}$ with ambiguous guessing facets $\mathbf{G}_{?}^{n',r}$ that are tight Bell inequalities of $\mathcal{C}_r^{n\to n'}$ (see Appendix D 3). Our goal is to find the smallest integer $r$ such that $\mathbf{P}_{E_\mu} \in \mathcal{A}_{(n'-1),r}^{(n'-1)\to n'}$, that is, the smallest $r$ for which $\mathbf{P}_{E_\mu}$ satisfies the ambiguous guessing inequality given by $\mathbf{G}_{?}^{n',r}$.