General entropic constraints on CSS codes within magic distillation protocols

Magic states are fundamental building blocks on the road to fault-tolerant quantum computing. CSS codes play a crucial role in the construction of magic distillation protocols. Previous work has cast quantum computing with magic states for odd dimension $d$ within a phase space setting in which universal quantum computing is described by the statistical mechanics of quasiprobability distributions. Here we extend this framework to the important $d=2$ qubit case and show that we can exploit common structures in CSS circuits to obtain distillation bounds capable of out-performing previous monotone bounds in regimes of practical interest. Moreover, in the case of CSS code projections, we arrive at a novel cut-off result on the code length $n$ of the CSS code in terms of parameters characterising a desired distillation, which implies that for fixed target error rate and acceptance probability, one needs only consider CSS codes below a threshold number of qubits. These entropic constraints are not due simply to the data-processing inequality but rely explicitly on the stochastic representation of such protocols.


I. INTRODUCTION
Work towards achieving fault-tolerant quantum computing is currently seeing rapid progress on qubit systems on many computational platforms [1][2][3][4][5][6][7][8][9]. In particular, the surface code [10][11][12] is a leading framework that allows Clifford operations to be implemented transversally on blocks of physical qubits. However, Clifford operations are not universal for quantum computing [13,14], and in fact it is impossible to encode any universal gateset transversally [15]. A prominent method of circumventing this problem is the magic state injection model, wherein the Clifford group is promoted to a universal gateset by injecting copies of special non-stabilizer states known as magic states [16,17]. While magic states can only be prepared in surface codes with relatively high error rates, it is possible to reduce the noise per copy by converting many noisy magic states into fewer higher-fidelity magic states using only stabilizer operations [18]. This process, known as magic state distillation, allows magic states to be produced with arbitrarily high purity, and thereby enables universal quantum computation within surface code models.
Almost all protocols [18][19][20][21][22] to date for qubit magic distillation are based on a subclass of stabilizer codes known as Calderbank-Shor-Steane (CSS) codes [23,24]. CSS codes can be constructed from two classical linear codes, allowing one to draw on a plethora of results from classical coding theory to construct quantum codes with desirable properties. For instance, it has been shown that CSS codes are optimal when it comes to constructing quantum error correcting codes that support a transversal T-gate [25], a key feature in many of the aforementioned distillation protocols. Although significant progress has been made to reduce the overhead of such protocols [26], distillation is still estimated to dominate the total resource cost of performing a computation in the magic state injection model. Therefore, a better understanding of the extent to which this cost can be reduced is of great practical interest.
Recent work [27] has developed a framework for analysing magic distillation in odd-dimensional systems by taking key insights from a rich literature of majorization theory and applying them to discrete phase space representations of magic states. In odd dimensions, Gross's Wigner function [28] provides a representation wherein distillable magic states correspond to quasiprobability distributions containing negativity on a discrete phase space [16]. By contrast, stabilizer states correspond to probability distributions, and stabilizer operations in general are represented by stochastic transformations [28,29]. Thus when computation is restricted to the stabilizer setting, one obtains a classical stochastic model that can be studied using entropic theory and in particular relative majorization [30][31][32][33][34]. Ref. [27] extended majorization tools to negatively-represented magic states, and found that a dense subset of α-Rényi entropies $H_\alpha$ remain well-defined and meaningful as quantifiers of disorder on quasiprobability distributions under stochastic processing, leading to fundamental constraints on magic distillation protocols in the form of thermodynamic laws.
Since most quantum algorithms are formulated for systems of qubits, the important question of whether this framework can be extended to qubit systems remains. There are, however, many well-known obstacles in constructing valid Wigner representations for qubits (related to the fact that $2^{-1}$ does not exist modulo 2 [29]). Many constructions for Wigner functions, including Gross's, cannot be extended to qubits [35,36], while others represent some pure stabilizer states negatively [37,38], which breaks the link established in odd dimensions between Wigner negativity and quantum computational speedup [39,40]. While substantial work has been done to develop phase space representations wherein all qubit stabilizer operations are non-negatively represented [41,42], channels are typically not mapped to linear, let alone stochastic, transformations under such representations.
Progress can be made by identifying subsets of qubit stabilizer operations that can be represented stochastically, while nevertheless remaining capable of universal quantum computation via magic state injection. In this paper, we make use of a Wigner representation for qubits introduced in Ref. [43] that shares many desirable features with Gross's representation, such as the linear representation of channels. Drawing on results from Ref. [44], we show that CSS circuits (the subset of stabilizer circuits wherein CSS states play the role of stabilizer states) remain stochastic in this representation. Since CSS circuits are known to be capable of universal quantum computation via magic state injection [44,45], they provide a setting where we can extend the statistical mechanics framework for universal quantum computing developed in Ref. [27] to qubits.
The structure of our paper is as follows. In Section III, we introduce the phase space representation of qubit states and channels we will use, and identify some regimes where this representation becomes stochastic. Building from this, in Section IV we develop majorization techniques to analyse stochastically-represented magic distillation protocols on qubits. Finally, in Section V, we apply these tools to derive general entropic constraints (in the form of upper and lower bounds on code length) for distillation protocols that project onto CSS codes, which exploit structures basic to this distillation strategy.

II. MAIN RESULTS
We show that CSS circuits can be represented by stochastic maps on a well-defined multiqubit phase space. By exploiting techniques from majorization, in Theorem 5 we extend the statistical mechanical framework of Ref. [27] to CSS qubit quantum computation.
We further find that, similar to Ref. [47], every CSS circuit can be decomposed in terms of protocols that project onto CSS codes, which therefore constitute the core machinery for magic distillation in CSS circuits. For such CSS code projection protocols, we obtain novel upper bounds on the code length $n$ as a function of the number of output qubits $k$, acceptance probability $p$, and input and output error rates $\epsilon$ and $\delta$ respectively (which are typically related to the code distance $D$ via $\delta = O(\epsilon^D)$). Our main result is the following, which we generalise to odd-dimensional systems and arbitrary stabilizer codes in Theorem 9.
Result 1. Consider the distillation of $k$ copies of a pure qubit magic state $\psi$ from a supply of the noisy magic state $\rho$, where both $\psi$ and $\rho$ have real density matrices in the computational basis. Any magic distillation protocol that projects onto the codespace of an $[[n, k]]$ CSS code and can use $n$ copies of $\rho$ to distil out a $k$-qubit state $\rho'$ at output error $\delta \geq \|\rho' - \psi^{\otimes k}\|_1$ and acceptance probability $p$ must have a code length $n$ such that for all $\alpha \in \mathcal{A}$ for which $H_\alpha(W_\rho) < 1$, and bounds on the number of noisy magic states needed to distil out a single copy at the output take on the particularly simple form: where the base of the logarithm is given by f
[Figure: Code length $n$ required to distil a single output qubit $|H\rangle$ with output error rate $\delta = 10^{-9}$ by projecting onto an $[[n, 1]]$ CSS code. The shaded purple region shows the range of code lengths allowed by the tightest numeric upper bound (red curve) from Theorem 8 and the lower bound from projective robustness (PR) introduced in Ref. [46] (blue curve). The analytic upper bound $n^*$ (dashed yellow curve) defined in Eq. (2) is shown to form a good approximation to the numeric bound. (a) When the target acceptance probability $p$ is low ($p = 0.1$), the upper bounds are less constraining. (b) By increasing to $p = 0.9$, the upper bounds become considerably tighter. In both cases, there is a cut-off input error beyond which no CSS code projection protocol can achieve the desired combination of output error and acceptance probability.]
In the spirit of Ref. [48], Eq. (3), and more generally Theorem 9, can be viewed as expressing trade-off relations between various distillation parameters. These fundamental no-go results may be instructive when constructing stabilizer code projection protocols with optimized parameters.

III. STOCHASTIC REPRESENTATION FOR CSS CIRCUITS ON QUBITS
In this section, we review the qubit representation $W_\rho$ introduced in Ref. [43] that forms the backbone of our work. We expand upon its properties and confirm that it respects all sequential and parallel composition of processes, the former of which crucially gives rise to a well-defined input-output relation $W_{\mathcal{E}(\rho)} = W_{\mathcal{E}} W_\rho$. Moreover, the latter implies that the representation of product states factorizes over subsystems, a property which will prove computationally advantageous given that inputs to magic distillation protocols typically take on the form $\rho^{\otimes n}$. Furthermore, we show that all magic distillation protocols executed by CSS circuits are stochastically represented, which means that such protocols can be analysed using majorization theory and admit a description in terms of classical statistical mechanics on quasiprobability distributions.

A. Phase space representation of qubit states
We first establish some convenient notation. Let $u := (u_1, \dots, u_n) \in \mathbb{Z}_2^n$ denote a binary vector. Furthermore, given any single qubit operator $O$, let us denote With this in place, consider an $n$-qubit quantum system with total Hilbert space $\mathcal{H}_2^n := \mathcal{H}_2^{\otimes n}$. We associate to this system a phase space $\mathcal{P}_n := \mathbb{Z}_2^n \times \mathbb{Z}_2^n$, where $\mathcal{P}_n$ consists of all vectors $(u_x, u_z)$, and has a symplectic inner product $[u, v]$ defined as where arithmetic is carried out modulo 2.
We are now in a position to define our chosen representation over $\mathcal{P}_n$ (we refer the reader to Appendix A for further details and proofs of the properties presented). We first define $n$-qubit displacement operators $\{D_u\}$, where $u := (u_x, u_z) \in \mathcal{P}_n$, via strings of single qubit Pauli operators $X$ and $Z$ as which generate the Heisenberg-Weyl group $H(2)^{\times n}$ on $n$ qubits modulo phase factors [49]. These displacement operators satisfy Using these displacement operators, we can construct the following representation for any $n$-qubit state $\rho$, where $\{A_u\}$ are the set of $2^{2n}$ phase point operators on $n$ qubits, which are defined as It can be shown (see Appendix A 1) that these phase point operators share the following properties with those defining Gross's Wigner representation of $n$ qudits with Hilbert space dimension $d$ on a phase space where $u_X$ and $u_Y$ are respectively points in the phase spaces of subsystems $X$ and $Y$.
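The displayed definitions above can be illustrated numerically. The following is a minimal sketch, assuming the standard binary symplectic product $[u, v] = u_x \cdot v_z + u_z \cdot v_x \pmod 2$ and the phase-free convention $D_u = \bigotimes_i X^{u_{x,i}} Z^{u_{z,i}}$; both conventions, and all function names, are assumptions made for illustration and may differ in operator ordering from Ref. [43]. Under these assumptions, any two displacement operators commute up to the sign $(-1)^{[u,v]}$, which the sketch checks by brute force.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=int)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

def phase_space(n):
    """All points u = (u_x, u_z) of the 2n-bit phase space (hypothetical helper)."""
    bits = list(itertools.product((0, 1), repeat=n))
    return [(ux, uz) for ux in bits for uz in bits]

def symplectic(u, v):
    """Assumed convention: [u, v] = u_x . v_z + u_z . v_x (mod 2)."""
    (ux, uz), (vx, vz) = u, v
    return (sum(a * b for a, b in zip(ux, vz))
            + sum(a * b for a, b in zip(uz, vx))) % 2

def displacement(u):
    """Assumed convention: D_u = tensor product of X^a Z^b, with no phase factors."""
    ux, uz = u
    op = np.eye(1, dtype=int)
    for a, b in zip(ux, uz):
        op = np.kron(op, (X if a else I2) @ (Z if b else I2))
    return op

def check_commutation(n):
    """Verify D_u D_v = (-1)^[u,v] D_v D_u for every pair of phase-space points."""
    pts = phase_space(n)
    return all(
        np.array_equal(displacement(u) @ displacement(v),
                       (-1) ** symplectic(u, v) * displacement(v) @ displacement(u))
        for u in pts for v in pts
    )

if __name__ == "__main__":
    print(check_commutation(1), check_commutation(2))
```

Because every $D_u$ is a real matrix in this convention, the operators with an odd number of $XZ$ factors are not Hermitian, which is the feature the text below attributes to the missing phase factors.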
These properties imply that $W_\rho$ provides an informationally complete and normalized representation of general $n$-qubit states, i.e., for any quantum state $\rho$. Like Gross's Wigner function, an immediate consequence of Eq. (7) is that the representation $W_\rho$ transforms covariantly under the displacement operators, namely, for all $u, v \in \mathcal{P}_n$. In fact, everything in the construction of this representation has proceeded in direct analogy to Gross's, except for the lack of phase factors ensuring the Hermiticity of the displacement operators. As a result, the phase point operators in Eq. (9) are no longer Hermitian, which in turn implies that $W_\rho$ is generally complex. However, it turns out that the real and imaginary parts of $W_\rho$ are related to the quantum state in the following simple way (a proof is given in Appendix A 2): for all $u \in \mathcal{P}_n$, where $\mathrm{Re}(\rho)$ and $\mathrm{Im}(\rho)$ are respectively the real and imaginary parts of the density matrix of $\rho$ in the computational basis.

This immediately implies
Corollary 1. The representation $W_\rho$ of an $n$-qubit state $\rho$ is a quasiprobability distribution if and only if $\rho$ is an $n$-rebit state, i.e., the density matrix $\rho$ is real in the computational basis.
To simplify our analysis, we will therefore focus on rebit states for the majority of this work, although in Section VI we show how we can handle arbitrary qubit states by treating the real and imaginary components separately, resulting in an overcomplete quasiprobability representation. Typically, however, we will consider the case of distilling the following Hadamard state which is equivalent to the canonical magic state $T|+\rangle$ up to a Clifford unitary [50], where $T := \mathrm{diag}(1, e^{i\pi/4})$ is the T-gate. The Hadamard state can thus be used in a stabilizer gadgetisation circuit to implement the T-gate [18].
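Corollary 1 can be checked on small examples. The sketch below is illustrative only: it assumes phase point operators of the Gross-like form $A_u = 2^{-n} \sum_v (-1)^{[u,v]} D_v$ with phase-free $D_v = \bigotimes_i X^{v_{x,i}} Z^{v_{z,i}}$, the representation $W_\rho(u) = 2^{-n}\,\mathrm{tr}[A_u^\dagger \rho]$, and the Hadamard state taken as $|H\rangle = \cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$. These are assumptions standing in for the displayed definitions, and every function name is hypothetical.

```python
import itertools
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def phase_space(n):
    bits = list(itertools.product((0, 1), repeat=n))
    return [(ux, uz) for ux in bits for uz in bits]

def symplectic(u, v):
    (ux, uz), (vx, vz) = u, v
    return (sum(a * b for a, b in zip(ux, vz))
            + sum(a * b for a, b in zip(uz, vx))) % 2

def displacement(u):
    # Assumed convention: D_u = tensor_i X^{u_x,i} Z^{u_z,i}, no phase factors.
    op = np.eye(1)
    for a, b in zip(*u):
        op = np.kron(op, (X if a else I2) @ (Z if b else I2))
    return op

def wigner(rho):
    """Assumed form: W_rho(u) = 2^{-n} tr[A_u^dagger rho], A_u = 2^{-n} sum_v (-1)^[u,v] D_v."""
    n = rho.shape[0].bit_length() - 1
    pts = phase_space(n)
    disp = [displacement(v) for v in pts]
    values = []
    for u in pts:
        a_u = sum((-1) ** symplectic(u, v) * d for v, d in zip(pts, disp)) / 2**n
        values.append(np.trace(a_u.conj().T @ rho) / 2**n)
    return np.array(values, dtype=complex)

def pure_state(amplitudes):
    psi = np.asarray(amplitudes, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

# Rebit magic state: real, normalised, but with one negative entry.
w_hadamard = wigner(pure_state([np.cos(np.pi / 8), np.sin(np.pi / 8)]))
# CSS state |0>: a genuine probability distribution.
w_zero = wigner(pure_state([1, 0]))
# Y eigenstate: not a rebit state, so the representation becomes complex.
w_plus_i = wigner(pure_state([1, 1j]))
```

With the point ordering $(u_x, u_z) = (0,0), (0,1), (1,0), (1,1)$ used here, the Hadamard state evaluates to $((1+\sqrt{2})/4,\ 1/4,\ 1/4,\ (1-\sqrt{2})/4)$ under these assumptions, so its negativity sits on a single phase-space point.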

B. Phase space representation of channels
The representation of qubit states induces a corresponding representation of qubit channels. Let $\mathcal{E}$ be an arbitrary channel from $n$ to $m$ qubits, and be its associated Choi state [51], where $|\phi^+_n\rangle$ is the canonical maximally entangled state on two copies of the input system. We now define a representation [52] of a quantum channel $\mathcal{E}$ as for all $v \in \mathcal{P}_m$ and $u \in \mathcal{P}_n$. Under this representation, every channel becomes a matrix mapping the representation of an input state to the representation of the output state. More precisely, if $\sigma = \mathcal{E}(\rho)$, then for all $v \in \mathcal{P}_m$. Furthermore, the representation $W_{\mathcal{E}}$ respects the sequential and parallel composition of channels, i.e.,
One useful implication of Eq. (19) is that when $\mathcal{E}$ and $\mathcal{F}$ respectively prepare states $\rho$ and $\sigma$, we obtain which informs us that our chosen representation factorizes over subsystems for product states. The transition matrix formed by $W_{\mathcal{E}}(v|u)$ preserves normalization since for any $u \in \mathcal{P}_n$ and any quantum channel $\mathcal{E}$. (Proofs of Eqs. (17) through (21) can be found in Appendix A 3.) Therefore, a quantum channel $\mathcal{E}$ from $n$ qubits to $m$ qubits is represented by a stochastic matrix if and only if $W_{\mathcal{E}}(v|u) \geq 0$ for all $u, v$. By inspection of Eq. (16) we equivalently have that the quantum channel $\mathcal{E}$ is stochastically represented if and only if its Choi state $\mathcal{J}(\mathcal{E})$ on $n + m$ qubits is represented by a genuine probability distribution on the phase space $\mathcal{P}_{n+m}$. Unlike the odd-dimensional case, the channel $\mathcal{E}$ is not guaranteed a stochastic representation whenever $\mathcal{J}(\mathcal{E})$ is a stabilizer state. This is an immediate consequence of the sequential and parallel composition rules of Eqs. (18) and (19), which imply that our qubit representation cannot be non-negative over the full stabilizer subtheory [53]. However, we will show in the next section that $\mathcal{E}$ is stochastically represented if $\mathcal{J}(\mathcal{E})$ belongs to an important subset of stabilizer states known as CSS states.
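The contrast between stochastic and non-stochastic stabilizer operations can be made concrete for unitary channels. The sketch below does not use the Choi-state definition of Eq. (16); instead it assumes phase point operators $A_u = 2^{-n}\sum_v (-1)^{[u,v]} D_v$ (with phase-free $D_v$), for which the input-output relation $W_{\mathcal{E}(\rho)} = W_{\mathcal{E}} W_\rho$ forces the matrix elements $W_{\mathcal{E}}(v|u) = 2^{-m}\,\mathrm{tr}[A_v^\dagger\, \mathcal{E}(A_u)]$. These conventions and all names are assumptions made for illustration.

```python
import itertools
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
HADAMARD = (X + Z) / np.sqrt(2)
CNOT = np.array([[1.0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

def phase_space(n):
    bits = list(itertools.product((0, 1), repeat=n))
    return [(ux, uz) for ux in bits for uz in bits]

def symplectic(u, v):
    (ux, uz), (vx, vz) = u, v
    return (sum(a * b for a, b in zip(ux, vz))
            + sum(a * b for a, b in zip(uz, vx))) % 2

def displacement(u):
    # Assumed convention: D_u = tensor_i X^{u_x,i} Z^{u_z,i}, no phase factors.
    op = np.eye(1)
    for a, b in zip(*u):
        op = np.kron(op, (X if a else I2) @ (Z if b else I2))
    return op

def phase_point_operators(n):
    pts = phase_space(n)
    disp = [displacement(v) for v in pts]
    return [sum((-1) ** symplectic(u, v) * d for v, d in zip(pts, disp)) / 2**n
            for u in pts]

def transition_matrix(unitary):
    """Matrix with entries 2^{-n} tr[A_v^dagger U A_u U^dagger] (rows v, columns u)."""
    n = unitary.shape[0].bit_length() - 1
    ops = phase_point_operators(n)
    return np.array([[np.trace(a_v.conj().T @ unitary @ a_u @ unitary.conj().T) / 2**n
                      for a_u in ops] for a_v in ops])

def is_stochastic(matrix, tol=1e-12):
    """Non-negative entries and columns summing to one."""
    return bool(np.all(matrix > -tol) and np.allclose(matrix.sum(axis=0), 1.0))

M_CNOT = transition_matrix(CNOT)
M_HADAMARD = transition_matrix(HADAMARD)
```

Under these assumptions the CNOT gate is represented by a permutation of the 16 phase-space points, whereas the single-qubit Hadamard gate, although a stabilizer operation, has entries equal to $-1/2$ and so is normalization-preserving but not stochastic.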

C. CSS states and circuits
We now identify a class of qubit distillation protocols that arise naturally in fault-tolerant quantum computing, are sufficiently large to enable universal quantum computation, and admit a stochastic representation. In particular, we show that a channel is stochastically represented if its Choi state is CSS. Building on this result, we construct a stochastically-represented subset of stabilizer circuits wherein CSS states play the role of stabilizer states, which has been shown to be capable of universal quantum computation with magic state injection.
A pure CSS state on $n$ qubits is any stabilizer state whose stabilizer group can be generated by $n$ Pauli observables that are individually of X-type or Z-type only. For instance, $|\phi^+\rangle := \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ has the stabilizer group and is therefore CSS. By contrast, $|\psi\rangle$ and is not CSS because its stabilizer generators necessarily mix $X$ and $Z$. As they are generators of stabilizer groups defining CSS states, we group X- and Z-type Pauli observables together as CSS observables. Letting the set of all pure CSS states be denoted $\Omega_{\mathrm{css}}$, we further define the set of all CSS states $\mathcal{D}_{\mathrm{css}}$ as the convex hull of $\Omega_{\mathrm{css}}$. The representation we have chosen coincides on all rebit states with an earlier one introduced by Ref. [44] (see Appendix A 2), in which it was shown that a discrete Hudson's theorem [28] can be recovered for qubits when one restricts to rebits. More precisely, it was shown that any pure $n$-rebit state is non-negatively represented if and only if it is CSS. Therefore, $W_\rho$ is a valid probability distribution for all $\rho \in \mathcal{D}_{\mathrm{css}}$, and we conclude that
Theorem 2. A quantum channel $\mathcal{E}$ from $n$ to $m$ qubits is stochastically represented if $\mathcal{J}(\mathcal{E})$ is a CSS state on $n + m$ qubits.
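The rebit version of Hudson's theorem quoted above can be spot-checked on two-qubit states. The sketch below assumes the same illustrative conventions as before ($A_u = 2^{-n}\sum_v (-1)^{[u,v]} D_v$, phase-free $D_v$, $W_\rho(u) = 2^{-n}\mathrm{tr}[A_u^\dagger\rho]$), and compares the CSS state $|\phi^+\rangle$ with a non-CSS rebit stabilizer state chosen here purely for illustration: the two-qubit graph state $(|00\rangle + |01\rangle + |10\rangle - |11\rangle)/2$, stabilized by $X \otimes Z$ and $Z \otimes X$. This second state is a hypothetical choice and is not claimed to be the example used in the original text.

```python
import itertools
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def phase_space(n):
    bits = list(itertools.product((0, 1), repeat=n))
    return [(ux, uz) for ux in bits for uz in bits]

def symplectic(u, v):
    (ux, uz), (vx, vz) = u, v
    return (sum(a * b for a, b in zip(ux, vz))
            + sum(a * b for a, b in zip(uz, vx))) % 2

def displacement(u):
    # Assumed convention: D_u = tensor_i X^{u_x,i} Z^{u_z,i}, no phase factors.
    op = np.eye(1)
    for a, b in zip(*u):
        op = np.kron(op, (X if a else I2) @ (Z if b else I2))
    return op

def wigner_rebit(psi):
    """Representation of a pure rebit state with real amplitudes psi (assumed form)."""
    psi = np.asarray(psi, dtype=float)
    psi = psi / np.linalg.norm(psi)
    rho = np.outer(psi, psi)
    n = rho.shape[0].bit_length() - 1
    pts = phase_space(n)
    disp = [displacement(v) for v in pts]
    return np.array([
        sum((-1) ** symplectic(u, v) * np.trace(d.T @ rho) for v, d in zip(pts, disp)) / 4**n
        for u in pts
    ])

w_bell = wigner_rebit([1, 0, 0, 1])      # CSS: stabilized by XX and ZZ
w_graph = wigner_rebit([1, 1, 1, -1])    # not CSS: stabilized by XZ and ZX (tensor products)
```

Under these assumptions the Bell state is uniform on four phase-space points, while the graph state takes the value $-1/8$ on four points, in line with the statement that a pure rebit state is non-negatively represented if and only if it is CSS.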
Theorem 2 can be leveraged to identify stochastically-represented qubit stabilizer operations in a systematic way. A channel $\mathcal{E}$ from a system of qubits $B$ is CSS-preserving if $\mathcal{E}(\rho)$ is a CSS state for all $\rho \in \mathcal{D}_{\mathrm{css}}$, and completely CSS-preserving if, given any CSS state $\rho_{AB}$ on another system of qubits $A$ as well as $B$, $\mathcal{I}_A \otimes \mathcal{E}_B(\rho_{AB})$ is always CSS. We now note that the maximally entangled state $|\phi^+_n\rangle$ over two sets of $n$ qubits is CSS for all $n$ (see Appendix A 3). Therefore, if $\mathcal{E}$ is completely CSS-preserving, $\mathcal{J}(\mathcal{E})$ must be CSS. By Theorem 2, it follows that every completely CSS-preserving channel is stochastically represented.
To motivate the class of completely CSS-preserving channels as operationally significant, we highlight that they cover at least the following subset of stabilizer circuits (see Appendix B 3 for proof):
Lemma 3 (CSS circuits). Any sequence of the following stabilizer operations:
1. introducing a CSS state on any number of qubits,
2. performing a completely CSS-preserving gate on any number $n$ of qubits, i.e., a member of the group $G(n) := \langle \mathrm{CNOT}(i, j), Z_i, X_i \rangle_{i,j=1,\dots,n,\ i \neq j}$,
3. projectively measuring a CSS observable (with the possibility of classical control conditioned on outcome),
4. discarding any number of qubits,
as well as statistical mixtures of such sequences, is completely CSS-preserving.
By using CSS-preserving rather than completely CSS-preserving gates, channels covered by Lemma 3 can be promoted to the subset of stabilizer circuits where CSS states play the role of stabilizer states. However, both groups of gates are equally powerful for magic distillation (see discussion in Appendix B 3 a). Thus we directly refer to the set of channels covered by Lemma 3 as "CSS circuits", and conclude that all such circuits are stochastically represented.
We emphasise the computational power of CSS circuits. Firstly, they are capable of universal quantum computation when supplemented by rebit magic states [44,45], which they can also distil [54]. Moreover, the gateset $G(n)$ constitutes all gates that can be implemented fault-tolerantly using defect braiding in surface codes [55]. Finally, we will see in Section V that CSS circuits form the basis of many existing magic distillation protocols constructed around CSS codes.

IV. ENTROPIC CONSTRAINTS ON COMPLETELY CSS-PRESERVING PROTOCOLS
The standard approach to obtaining constraints on magic distillation is tracking a magic monotone [48,56,57,58,59], which is any property of a quantum system that cannot be increased under some class of magic non-generating operations (e.g. stabilizer operations). The paradigmatic example is mana [56], the total negativity in the Wigner representation of a state. However, this approach operates at the state level and therefore does not incorporate any additional distinguishing physics of distillation protocols. In contrast, the recent work [27] considers how a class of magic distillation protocols transform a pair of quantum states: one a noisy magic state, the other a stabilizer state singled out by the characteristic physics of those protocols.
Here we briefly review the approach taken in Ref. [27] to extend relative majorization to quasiprobability distributions, and how that leads to the extension of a dense subset of α-Rényi divergences from classical statistical mechanics to quantify the non-classical order in magic states under distillation.We then adapt this work for rebit magic state distillation using CSS circuits.

A. Statistical mechanics of quasiprobability distributions
At the heart of statistical mechanics are the notions of disorder and deviations from equilibrium. In classical statistical mechanics, this leads to the thermodynamic entropy $H(p) = -\sum_i p_i \log p_i$, which is essentially the unique measure of disorder of a statistical distribution $p = (p_1, \dots, p_N)$.
In odd-dimensional systems [16] or restricted qubit models [44,45,60], magic states that promote an efficiently simulable part of quantum mechanics to universal quantum computation must have negativities in their representation within a phase space model. Despite this negativity, it is still possible to arrive at a well-defined statistical mechanical description that circumvents the fact that the Boltzmann entropy is not well-defined. The key observation we exploit is that the framework of majorization remains well-defined when extended to quasiprobability distributions, and is a more fundamental concept than the traditional entropy.
Given two probability distributions $p = (p_1, \dots, p_N)$ and $p' = (p'_1, \dots, p'_M)$, we would like to determine which of them is "more disordered" than the other. This can be done by comparing $p$ to some reference probability distribution $r = (r_1, \dots, r_N)$ of our choice, and $p'$ to some other reference probability distribution $r' = (r'_1, \dots, r'_M)$, also of our choice. We then say that $(p, r)$ relatively majorizes $(p', r')$ and write $(p, r) \succ (p', r')$ if there exists a stochastic map $A$ that sends the first pair of distributions into the second, namely $(Ap, Ar) = (p', r')$. It was shown in Ref. [27] that this definition can be extended to the case of quasiprobability distributions in the first argument, and the following result was established to provide an entropic measure in terms of the α-Rényi divergences.

Theorem 4 ([27]). Let $w = (w_1, \dots, w_N)$ and $w' = (w'_1, \dots, w'_M)$ be any two quasiprobability distributions and let $r = (r_1, \dots, r_N)$ and $r' = (r'_1, \dots, r'_M)$ be any two probability distributions with non-zero components. If $(w, r) \succ (w', r')$ then $D_\alpha(w\|r) \geq D_\alpha(w'\|r')$ for all $\alpha \in \mathcal{A}$.
Here $D_\alpha(w\|r)$ is an extension of the classical α-Rényi divergence to the case of $w$ being a quasiprobability distribution. This extension requires $\alpha \in \mathcal{A}$ in order for the expression to be well-defined [61]. In the case of $r$ being the uniform distribution $r = (1/N, 1/N, \dots, 1/N)$, we have that $D_\alpha(w\|r) = \log N - H_\alpha(w)$, where $H_\alpha(w) := (1 - \alpha)^{-1} \log \sum_i w_i^\alpha$ is the α-Rényi entropy evaluated on $w$. Another result of Ref. [27] is that $w$ has negativity if and only if $H_\alpha(w)$ is negative for α close to 1 and diverges to $-\infty$ in the limit $\alpha \to 1^+$. This provides a well-defined and meaningful notion of negative entropy in a statistical mechanical setting.
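The behaviour described in Theorem 4 can be explored numerically. The sketch below makes several assumptions that are not stated in this section: that the admissible set is $\mathcal{A} = \{2a/(2b-1) : a \geq b \geq 1\}$ (so that $w_i^\alpha = |w_i|^\alpha$ is real for negative $w_i$), that the extended divergence has the usual form $D_\alpha(w\|r) = (\alpha-1)^{-1} \log \sum_i w_i^\alpha r_i^{1-\alpha}$, and that logarithms are taken base 2. The function names are hypothetical.

```python
from fractions import Fraction
import numpy as np

def in_A(alpha: Fraction) -> bool:
    """Assumed admissible set: alpha = 2a/(2b-1) with integers a >= b >= 1."""
    return alpha > 1 and alpha.numerator % 2 == 0 and alpha.denominator % 2 == 1

def renyi_entropy(w, alpha: Fraction) -> float:
    """H_alpha(w) = (1 - alpha)^{-1} log2 sum_i w_i^alpha, with w_i^alpha = |w_i|^alpha."""
    if not in_A(alpha):
        raise ValueError("alpha must lie in the assumed set A")
    a = float(alpha)
    w = np.asarray(w, dtype=float)
    return float(np.log2(np.sum(np.abs(w) ** a)) / (1 - a))

def renyi_divergence(w, r, alpha: Fraction) -> float:
    """Assumed form: D_alpha(w||r) = (alpha - 1)^{-1} log2 sum_i w_i^alpha r_i^(1 - alpha)."""
    if not in_A(alpha):
        raise ValueError("alpha must lie in the assumed set A")
    a = float(alpha)
    w, r = np.asarray(w, dtype=float), np.asarray(r, dtype=float)
    return float(np.log2(np.sum(np.abs(w) ** a * r ** (1 - a))) / (a - 1))

def random_stochastic(rows, cols, seed=0):
    """Column-stochastic matrix with strictly positive entries."""
    rng = np.random.default_rng(seed)
    m = rng.random((rows, cols)) + 0.05
    return m / m.sum(axis=0)

# Quasiprobability distribution with one negative entry (the single-qubit
# Hadamard-state values obtained under the conventions assumed in this sketch).
w_magic = np.array([(1 + np.sqrt(2)) / 4, 0.25, 0.25, (1 - np.sqrt(2)) / 4])
uniform = np.full(4, 0.25)
```

For example, `renyi_entropy(w_magic, Fraction(2))` evaluates to 1, while `renyi_entropy(w_magic, Fraction(100, 99))` is negative, illustrating how negativity shows up as α approaches 1 from above. Applying `random_stochastic(3, 4)` to both `w_magic` and `uniform` does not increase `renyi_divergence`, which follows from convexity of $x \mapsto |x|^\alpha$.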

B. Application to completely CSS-preserving magic distillation
Since completely CSS-preserving protocols are stochastically represented, the following family of entropic constraints on rebit magic distillation applies to them all (see Appendix C for proof):
Theorem 5. Let $\rho$ be a noisy rebit magic state and $\tau$ be a CSS state in the interior of $\mathcal{D}_{\mathrm{css}}$. If there exists a completely CSS-preserving protocol $\mathcal{E}$ such that $\mathcal{E}(\rho^{\otimes n}) = \rho'$ and $\tau' := \mathcal{E}(\tau^{\otimes n})$ is also in the interior of $\mathcal{D}_{\mathrm{css}}$, then for all $\alpha \in \mathcal{A}$, where
The reference process $\tau^{\otimes n} \to \tau'$ in Theorem 5 can be used in three different ways: (1) as a variational parameter, (2) to account for limitations of the physical hardware carrying out magic distillation, or (3) to capture structure distinctive to a family of protocols and thereby produce entropic constraints specific to that family. We now elaborate on each in turn.
For (1), we simply treat $\tau$ as a variational parameter, which can be optimized over $\mathcal{D}_{\mathrm{css}}$ to obtain the following set of monotones [62] on completely CSS-preserving protocols for all $\alpha \in \mathcal{A}$. To see this, we note that we have $D_\alpha(W_\rho\|W_\tau) \geq 0$ for all $\rho, \tau$, with equality if and only if $\rho = \tau$ (see Lemma 15). Given any rebit state $\rho$, let $\tau_\rho$ be a solution to the optimization problem in Eq. (31). Then if there exists a completely CSS-preserving protocol $\mathcal{E}$ such that $\mathcal{E}(\rho) = \rho'$, we obtain where the first inequality follows from generalised relative majorization and the second inequality follows by the definition in Eq. (31). Therefore $\{\Lambda_\alpha\}_{\alpha \in \mathcal{A}}$ form an infinite set of monotones on all completely CSS-preserving protocols. It is straightforward to verify that the $\Lambda_\alpha$ are sub-additive, i.e., $\Lambda_\alpha(\rho^{\otimes n}) \leq n\Lambda_\alpha(\rho)$ (this follows from the additivity of the generalized α-Rényi divergences, by restricting the optimization to product reference states $\tau^{\otimes n}$). Therefore, these $\Lambda_\alpha$-monotones allow us to set global bounds on any completely CSS-preserving protocol. More precisely, if there exists a completely CSS-preserving protocol $\mathcal{E}$ such that $\mathcal{E}(\rho^{\otimes n}) = \rho'$, then the overhead $n$ is lower bounded as
For (2), we can use the reference process to take into account limitations in the hardware carrying out magic distillation. For instance, Ref. [27] uses the reference process to preserve the Gibbs state in order to encode a background temperature or free energy production in the distillation hardware.
The final way (3) of using the reference process is demonstrated in the next section. We show explicitly how the reference process may be chosen to produce entropic constraints specialised for CSS code projection protocols.

V. ENTROPIC CONSTRAINTS ON CSS CODE PROJECTION PROTOCOLS
In this section, we apply Theorem 5 to CSS code projection protocols, and obtain lower and upper bounds on their code length (∼ resource cost). In some parameter regimes, the new lower bounds outperform those due to magic monotones such as generalised robustness [63] and projective robustness [46]. To our knowledge, these constitute the first set of trade-off relations on distillation parameters that act as fundamental upper bounds on the resource cost for a family of distillation protocols.

A. CSS code projections
An elementary protocol for magic distillation, proposed in the seminal work of Bravyi and Kitaev [18], uses projection onto a quantum error correcting code. This protocol begins by taking in $n$ copies of a noisy magic state $\rho$ and post-selecting the no-error outcome from the syndrome measurement of an $[[n, k]]$ stabilizer code $C$. Doing so has the effect of projecting onto $C$, so the protocol proceeds to decode onto $k$ output qubits and discard the remaining syndrome qubits. In general, any such code projection protocol only succeeds probabilistically with some acceptance probability $p$. Nevertheless, if the likelihood of an undetectable error is less than the input error rate $\epsilon$, the post-selected output state will have a higher fidelity per qubit with respect to the target magic state than $\rho$. Many existing magic distillation protocols are based on CSS codes, such as the 15-to-1 protocol [18] based on the $[[15, 1]]$ punctured Reed-Muller code [64,65], as well as straightforward code projection protocols based on the $[[7, 1]]$ Steane and $[[23, 1]]$ Golay CSS codes analysed in [19].
It has long been known that any n-to-1 magic distillation protocol can be decomposed as a sum of stabilizer code projections followed by Clifford post-processing [47]. This result implies that the optimal fidelity with respect to a target magic state, though not necessarily optimal acceptance probability, can always be achieved by a stabilizer code projection. In a similar way, we can show (Theorem 21) that any CSS circuit carrying out an n-to-k magic distillation protocol is a sum of CSS code projections followed by completely CSS-preserving post-processing. (In fact, the line of proof we give also allows one to generalise the result of Ref. [47] to arbitrary n-to-k stabilizer protocols.)
An n-to-k CSS code projection protocol is an operation $\mathcal{K}$ from $n$ to $k$ qubits that acts as where $\mathcal{U}$ and $\mathcal{P}$ are respectively a unitary decoding channel and codespace projection for an $[[n, k]]$ CSS code.
Given $n$ copies of a noisy magic state $\rho$, $\mathcal{K}$ acts as where $\rho'$ is the output magic state on $k$ qubits and we have defined the acceptance probability $p := \mathrm{tr}[P \rho^{\otimes n}]$ for a single successful run of $\mathcal{K}$. Distillation is successful if the output $\rho'$ from a successful run has a greater fidelity per qubit with respect to a target (pure) magic state of choice than $\rho$.
Since code projection protocols are not trace-preserving, the majorization constraints do not immediately apply. However, this can be remedied by preparing a specially designated CSS state $\sigma$ on $k$ qubits whenever an n-to-k code projection protocol fails, while continuing to distinguish between successful (labelled '0') and unsuccessful (labelled '1') runs of the protocol by recording this information in an ancillary qubit. We can therefore extend $\mathcal{K}$ into the following trace-preserving operation $\mathcal{E}$: where $\bar{\mathcal{P}} := \bar{P}(\cdot)\bar{P} := (\mathbb{1}^{\otimes n} - P)(\cdot)(\mathbb{1}^{\otimes n} - P)$ performs the projection onto the orthogonal complement of $C$, and $\sigma$ is an arbitrary CSS state. We conclude that there exists an n-to-k CSS code projection such that $\rho^{\otimes n} \to p\rho'$ if and only if there exists a trace-preserving n-to-k CSS code projection $\mathcal{E}$ identified in Eq. (36) such that

B. General constraints on CSS code projections
We now exploit Theorem 5 to derive majorization conditions that apply across all n-to-k CSS code projection protocols. Crucially, the trace-preserving CSS code projection identified in Eq. (36) can be implemented as a CSS circuit (see Appendix B 4 for proof), which leads to the following Lemma:
Lemma 6. Every trace-preserving CSS code projection can be executed as a sequence of completely CSS-preserving operations, and is therefore stochastically represented.
A natural reference process can be chosen for all trace-preserving CSS code projections by exploiting the fact that their successful components are sub-unital. To see this, we first note that the identity operator on $n$ qubits can be decomposed as $\mathbb{1}^{\otimes n} = P + \bar{P}$ for the codespace projector $P$ of any $[[n, k]]$ CSS code $C$. The successful component $\mathcal{K}$ in the trace-preserving code projection of $C$ therefore acts as Since $P$ is the logical identity on $k$ logical qubits, i.e., the decoding of $P$ in Eq. (34) must give an output state that is proportional to the maximally mixed state on $k$ physical qubits, so we obtain $\mathcal{K}(\mathbb{1}^{\otimes n}) \propto \mathbb{1}^{\otimes k}$. This confirms the successful component of any trace-preserving CSS code projection to be sub-unital. Furthermore, since $P$ is a rank-$2^k$ projector, the acceptance probability associated with this protocol is $p = \mathrm{tr}[P\, \mathbb{1}^{\otimes n}/2^n] = 2^{k-n}$. Putting everything together, we find that every n-to-k trace-preserving CSS code projection $\mathcal{E}$ maps the maximally mixed state to where we have defined the output state Since $\mathbb{1}/2$ and $\tau_{n,k}$ are both full-rank CSS states (for the appropriate choice of $\sigma \in \mathcal{D}_{\mathrm{css}}$), Eq. (40) is a valid reference process for all trace-preserving CSS code projections.
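The acceptance probability $p = \mathrm{tr}[P\rho^{\otimes n}]$ and the value $2^{k-n}$ for maximally mixed inputs are easy to check on a small code. The sketch below uses the $[[4, 2]]$ CSS code with stabilizer generators $XXXX$ and $ZZZZ$, chosen here only as a convenient illustration (it is not one of the codes discussed in the text), and takes the Hadamard state as $|H\rangle = \cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$; the function names are hypothetical.

```python
import functools
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def tensor_power(op, n):
    """n-fold tensor product of a single-qubit operator."""
    return functools.reduce(np.kron, [op] * n)

def codespace_projector(generators):
    """P = prod_g (1 + g)/2 for commuting stabilizer generators g."""
    dim = generators[0].shape[0]
    proj = np.eye(dim)
    for g in generators:
        proj = proj @ (np.eye(dim) + g) / 2
    return proj

def acceptance_probability(projector, rho, n):
    """p = tr[P rho^{(x) n}] for n identical single-qubit inputs."""
    return float(np.real(np.trace(projector @ tensor_power(rho, n))))

n, k = 4, 2
P = codespace_projector([tensor_power(X, n), tensor_power(Z, n)])

psi_h = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)])
rho_h = np.outer(psi_h, psi_h)

p_mixed = acceptance_probability(P, I2 / 2, n)   # maximally mixed inputs
p_magic = acceptance_probability(P, rho_h, n)    # noiseless Hadamard-state inputs
```

Here `p_mixed` equals $2^{k-n} = 1/4$, as the rank argument above requires, while for noiseless Hadamard states the stabilizer group $\{\mathbb{1}, XXXX, ZZZZ, YYYY\}$ gives $p = \tfrac{1}{4}(1 + \langle X\rangle^4 + \langle Z\rangle^4 + \langle Y\rangle^4) = 3/8$.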
We therefore conclude that, if there exists an n-to-k CSS code projection such that $\rho^{\otimes n} \to p\rho'$, then for all $\alpha \in \mathcal{A}$. We define $\Delta D_\alpha$ over the restricted domain $n \in [k, \infty]$ (as the number of logical qubits cannot exceed the number of physical qubits). We highlight the satisfying fact that $\Delta D_\alpha$ is independent of the choice of CSS state $\sigma$ in Eq. (36). This follows from resource-theoretic arguments as well as properties of the α-Rényi divergence (see Appendix D 2 for details).
[Figure: Code length $n$ required to distil a single output qubit $|H\rangle$ with output error rate $\delta = 10^{-9}$ and acceptance probability $p = 0.9$ under a CSS code projection protocol as a function of input error rate $\epsilon$. Our tightest lower bound from majorization (maj.) is shown to be tighter than those from mana [56] and generalized robustness (GR) [63]. However, it only outperforms the lower bound from projective robustness (PR) [46] in the high-$p$, high-$\epsilon$ regime.]

C. Bounds on the code length of CSS code projection protocols
The entropic constraints of Eq. (42) on CSS code projections can be used to bound many metrics on their performance as magic distillation protocols. In this section, we apply these constraints to bounding the code length of any CSS code projection protocol that could achieve some target combination of noise reduction and acceptance probability from a given supply of noisy magic states.
We first highlight some properties of the relative entropy difference $\Delta D_\alpha$ in the following lemma, proofs of which can be found in Appendix D 3.

Lemma 7.
The following properties of the relative entropy difference $\Delta D_\alpha$ hold for any noisy input rebit magic state $\rho$, output $k$-rebit magic state $\rho'$, acceptance probability $p < 1$ and $\alpha \in A$: (ii) $\Delta D_\alpha$ is negative in the limit where $n = k$.

An immediate consequence of Lemma 7 is that $\Delta D_\alpha$ is either negative for all $n$, which implies no CSS code projection protocol can distil $\rho'$ from a supply of $\rho$ with acceptance probability $p$, or $\Delta D_\alpha$ has one or two roots located at $n^\alpha_L$ and $n^\alpha_U$. These roots therefore constitute lower and upper bounds on the code length $n$ of any CSS code projection that can carry out the desired distillation. We formalise these observations in the following theorem.

Theorem 8. Let $\rho$ be a noisy rebit magic state. Any $n$-to-$k$ CSS code projection that can distil out the $k$-rebit magic state $\rho'$ from a supply of $\rho$ at acceptance probability $p$ must have a code length $n$ such that $n^\alpha_L \leq n \leq n^\alpha_U$ for all $\alpha \in A$. Moreover, given any $\alpha$ such that $H_\alpha[W_\rho] > 1$, the upper bound $n^\alpha_U$ is finite.
For sufficiently low $k$, these bounds can be computed numerically using basic root-finding methods. However, we also find analytic upper and lower bounds on $n$ in Section V C 1. The values of $n^\alpha_L$ and $n^\alpha_U$ when $\Delta D_\alpha < 0$ for all $n$ were chosen to indicate that no $n$-to-$k$ CSS code projection can carry out the desired distillation.
We emphasise that $n$ in Theorem 8 refers to the code length (related to the resource cost $C$ by $C = \frac{n}{pk}$) in a single run of a distillation protocol, as opposed to the asymptotic overhead. However, single-run $n$ still constitutes a useful metric for analysing the actual resource cost of a given stage of a protocol. Moreover, distillation costs are typically dominated by the final round of a multi-stage distillation protocol (see Ref. [66] and references contained therein), so we expect the above bounds to be particularly informative in this context.
In Fig. 3, we plot the tightest lower bound produced by Theorem 8 on the code length of $n$-to-1 CSS code projection protocols for Hadamard state distillation. We consider input magic states of the form $\rho(\epsilon) := (1 - \epsilon)|H\rangle\langle H| + \epsilon\frac{1}{2}$, which can be generally assumed irrespective of the particular error model. This is because any state $\rho$ can be converted into this canonical form by applying the pre-processing channel $\rho \mapsto \frac{1}{2}(\rho + H\rho H^\dagger)$, i.e. twirling with respect to the Clifford subgroup generated by the single Hadamard gate. In all parameter regimes, our lower bound is observed to be tighter than the mana bound [67] and the generalized robustness bound in Theorem 13 of Ref. [63]. Furthermore, in the high $p$, high $\epsilon$ regime our lower bound gives tighter constraints than the projective robustness bound [46]. In particular, there is a cut-off input error rate $\epsilon \approx 0.12$ at which our lower bound shoots up to infinity because, for any input error greater than this cut-off, one can always find some $\alpha$ such that $\Delta D_\alpha < 0$ for all $n$, so no CSS code projection can carry out the desired distillation given a higher input error rate (see Section V C 2 for physical intuition on the origin of this behaviour). In the low $p$ regime, our upper bounds are still able to give additional constraints on code length beyond those given by the projective robustness bound. In particular, Fig. 1 puts together information from our upper bounds and the lower bound from projective robustness to show that no CSS code projection can achieve some target combinations of output error and acceptance probability beyond a cut-off input error rate.

[Figure 4 caption: Even in the limit of zero input error $\epsilon = 0$ we obtain a valid set of permissible $\alpha$, which implies that Hadamard state distillation under $n$-to-1 CSS code projection is ruled out in the asymptotic limit $n \to \infty$. We further highlight that the error rate $\epsilon = 0.3$ (dashed curve) is outside of the region where $\rho(\epsilon)$ is magic, and therefore $W_{\rho(\epsilon)}$ is a proper probability distribution at $\epsilon = 0.3$, which is why $H_\alpha$ is only seen to satisfy standard monotonicity properties at this input error. We also highlight that the $\alpha \to 1$ divergence corresponds to a pole in $H_\alpha[W_\rho]$ for magic state $\rho$, and its residue is the mana of the state.]
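The reduction of a general noisy input to the canonical form $\rho(\epsilon)$ can be sketched numerically. The snippet below assumes the twirl is the equal mixture of the identity and Hadamard conjugation, $\rho \mapsto \frac{1}{2}(\rho + H\rho H^\dagger)$, and that the input has fidelity at least $1/2$ with $|H\rangle$; the function names and the test state are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
HAD = (X + Z) / np.sqrt(2)

# |H> is the +1 eigenstate of the Hadamard gate.
ket_H = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)], dtype=complex)
proj_H = np.outer(ket_H, ket_H.conj())


def rho_eps(eps):
    """Canonical noisy Hadamard state (1 - eps)|H><H| + eps * 1/2."""
    return (1 - eps) * proj_H + eps * I2 / 2


def hadamard_twirl(rho):
    """Equal mixture of rho and its Hadamard conjugate (assumed twirl)."""
    return (rho + HAD @ rho @ HAD.conj().T) / 2


def canonical_eps(rho):
    """Error rate eps for which the twirled state equals rho_eps(eps)."""
    fidelity = np.real(ket_H.conj() @ rho @ ket_H)
    return 2 * (1 - fidelity)


# Hypothetical single-qubit input with a complex off-diagonal element.
rho_in = np.array([[0.8, 0.1 - 0.2j], [0.1 + 0.2j, 0.2]])
eps_in = canonical_eps(rho_in)
rho_out = hadamard_twirl(rho_in)
```

The twirl leaves the fidelity with $|H\rangle$ unchanged, which is why the error rate can be read off from the input state directly.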

Extension to non-qubit code projection
Mathematically, the appearance of upper bounds on the code length of qubit CSS code projection protocols comes from the concavity of the objective function $\Delta D_\alpha$ in $n$. We now demonstrate that this feature is not peculiar to qubits, and in fact arises whenever stabilizer code projections on any quantum system are stochastic under a Wigner representation sufficiently similar to Gross's.
More precisely, we say that a Wigner representation $W$ of $n$ qudits with Hilbert space dimension $d$ is a generalised Gross's Wigner representation if it represents each state $\rho$ as the function $W_\rho$ on a phase space via phase point operators $\{A_u\}$ satisfying (A1)-(A4).
For all stabilizer code projections on odd-dimensional systems, $W$ can simply be Gross's Wigner representation [27]. For CSS code projections on qubits, $W$ can be the representation from Eq. (8). We then have analytic lower and upper bounds on the resource cost of code projection protocols (Theorem 9): a lower bound that holds for all $\alpha \in A$ for which $H_\alpha(W_\rho) < \log d$, and an upper bound that holds for all $\alpha \in A$ for which $H_\alpha(W_\rho) > \log d$.
One might be concerned that the conditions $H_\alpha(W_{\rho(\epsilon)}) > \log d$ given in Theorems 8 and 9 for the existence of a finite upper bound on $n$ are never actually satisfied. This turns out not to be the case. For $n$-to-1 CSS code projection protocols for Hadamard distillation, there always exists a valid set of $\alpha$-values such that $H_\alpha[W_{\rho(\epsilon)}] > 1$ for all $\epsilon$, so we always obtain a finite upper bound on $n$. This can be seen by upper bounding the Rényi entropy of $W_{\rho(\epsilon)}$ and then examining Fig. 4, which shows that there exists a finite range of $\alpha$ such that $H_\alpha > 1$ at $\epsilon = 0$.

The trade-off relations given in Theorem 9 can be rearranged to bound other parameters such as the acceptance probability $p$. In Fig. 5, we compare the tightest upper bound on $p$ produced by Theorem 9 for $n$-to-1 Hadamard distillation to those attained in existing protocols based on CSS codes given in Refs. [18] and [19]. We see that the acceptance probabilities of basic code projection protocols using the $[[7,1]]$ Steane and $[[23,1]]$ Golay codes in Fig. 5(a) are orders of magnitude less than our upper bounds, suggesting that substantial room for improvement is not ruled out. Interestingly, in Fig. 5(b) our upper bound is very close to the actual acceptance probability of the protocol based on the $[[15,1]]$ Reed-Muller code in Ref. [18], which we speculate may hint at something fundamental about the role of the intermediate Clifford corrections used in that protocol.

[Figure 5 caption: (a) Majorization upper bound on the acceptance probability $p$ of an $n$-to-1 code projection, plotted against actual acceptance probabilities attained using the Steane code (purple) at $n = 7$ and the Golay code (green) at $n = 23$ (detailed in Ref. [19]). Attained acceptance probabilities are orders of magnitude less than our upper bounds. (b) We plot the majorization upper bound (dashed line) on the acceptance probability $p$ of any 15-to-1 CSS code projection protocol with which one can distil the noisy magic state $(1 - \epsilon)|A\rangle\langle A| + \epsilon\frac{1}{2}$. Interestingly, our bound is very close to the actual acceptance probability for the 15-to-1 protocol (blue line) given in [18], though we emphasise this latter protocol is not a straightforward CSS code projection.]
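The condition $H_\alpha[W_{\rho(\epsilon)}] > 1$ can be explored numerically for the single value $\alpha = 2$. The sketch below assumes a specific form of the single-qubit phase point operators, namely $D_{(x,z)} = Z^z X^x$, $A_0 = \frac{1}{2}\sum_u D_u$ and $A_u = D_u A_0 D_u^\dagger$; this is one choice consistent with the properties stated in Appendix A (real, unit trace, orthogonal) and is not confirmed by the text. It also takes $H_2(W) = -\log_2 \sum_u W(u)^2$ as the order-2 Rényi entropy.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def displacement(x, z):
    # ASSUMPTION: displacement operators are D_(x,z) = Z^z X^x (no phases).
    return np.linalg.matrix_power(Z, z) @ np.linalg.matrix_power(X, x)


PHASE_POINTS = [(x, z) for x in (0, 1) for z in (0, 1)]
# ASSUMPTION: A_0 = (1/2) sum_u D_u and A_u = D_u A_0 D_u^dagger.
A0 = sum(displacement(x, z) for x, z in PHASE_POINTS) / 2
A = {u: displacement(*u) @ A0 @ displacement(*u).T for u in PHASE_POINTS}

ket_H = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)])


def rho_eps(eps):
    """Noisy Hadamard state (1 - eps)|H><H| + eps * 1/2."""
    return (1 - eps) * np.outer(ket_H, ket_H) + eps * I2 / 2


def wigner(rho):
    """W_rho(u) = (1/2) tr[A_u^dagger rho]; real for rebit states."""
    return np.array([np.trace(A[u].T @ rho) / 2 for u in PHASE_POINTS])


def renyi2(w):
    """Order-2 Renyi entropy of a (quasi)distribution, in bits."""
    return -np.log2(np.sum(w**2))


w_pure = wigner(rho_eps(0.0))    # has a negative entry: the state is magic
w_noisy = wigner(rho_eps(0.1))   # still negative, but more mixed
w_free = wigner(rho_eps(0.3))    # non-negative entries only
```

Under these assumptions the orthogonality property (A2) gives $\sum_u W_\rho(u)^2 = \mathrm{tr}[\rho^2]/2$, so `renyi2` returns exactly 1 for the pure state and exceeds 1 for any $\epsilon > 0$. Other values of $\alpha \in A$ would need the full definition of $A$ from the main text.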
Theorem 9 takes on a particularly simple form in the case of $n$-to-1 CSS code projection protocols for Hadamard state distillation. By evaluating the $\alpha = 2$ condition explicitly, we find that, if there exists a CSS code projection that sends $n$ copies of $\rho(\epsilon)$ to a $k$-qubit state $\rho'$ with acceptance probability $p$ and output error $\delta$, then Eq. (52) holds, where the logarithm base is $f(\epsilon) := [1 - \epsilon + \dots$ This expression captures the fact that under a CSS code projection protocol, there is a fundamental trade-off between acceptance probability and output fidelity. For instance, Eq. (52) shows that given a supply of noisy magic states ($\epsilon > 0$), we cannot use CSS code projection to distil a perfect magic state ($\delta = 0$) with certainty ($p = 1$), which was first shown in Ref. [48]. To further investigate this trade-off, in Fig. 6(b) we plot the maximum fidelity with respect to the Hadamard state that can be achieved by an $n$-to-1 CSS code projection $\mathcal{K}$, where the maximization is performed over the set of all $n$-to-1 CSS code projection protocols.

Why do we expect upper bounds?
Taking CSS circuits as our free operations, the appearance of upper bounds on $n$ might at first seem to contradict a resource theory perspective, where we might expect $n + 1$ copies of a noisy magic state to be at least as good as $n$ copies at distilling magic, since discarding subsystems is itself a CSS circuit. However, by specialising to stabilizer code projection protocols, we necessarily introduce a trade-off between $n$ and acceptance probability $p$, which we now show in a simple calculation. For any $n$-to-$k$ stabilizer code projection protocol for $d$-dimensional qudit distillation, the acceptance probability $p$ is given by how much of $n$ copies of the noisy input magic state $\rho$ projects onto the $d^k$-dimensional codespace spanned by the logical basis $\{|j_L\rangle\}_{j=0,\dots,d^k-1}$ of the code. Letting $\lambda_{\max}(\cdot)$ denote a state's largest eigenvalue, we immediately identify the following upper bound on $p$, $p = \sum_{j=0}^{d^k-1} \langle j_L|\rho^{\otimes n}|j_L\rangle \leq d^k \lambda_{\max}(\rho^{\otimes n}) = d^k [\lambda_{\max}(\rho)]^n$, which falls monotonically towards 0 as $n \to \infty$. At an intuitive level, the trade-off between $n$ and $p$ occurs because the codespaces of $[[n, k]]$ stabilizer codes remain the same size as we increase $n$, and so take up a vanishingly small part in the support of all the noisy input magic states used. Under the requirement that we have some threshold acceptance probability (below which the expected overhead would be too large), a corresponding upper bound on $n$ is then expected.
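The eigenvalue argument above can be turned into a rough numerical cut-off. The sketch below takes the bound to be $p \leq d^k [\lambda_{\max}(\rho)]^n$, which follows because each of the $d^k$ logical basis states has overlap at most $\lambda_{\max}(\rho^{\otimes n})$ with the input, and solves for the largest $n$ compatible with a threshold acceptance probability. The function names and example numbers are hypothetical; for the Hadamard input $\rho(\epsilon)$ the largest eigenvalue is $1 - \epsilon/2$.

```python
import math


def acceptance_upper_bound(lam_max, n, k, d=2):
    """Upper bound d^k * lam_max^n on the acceptance probability, capped at 1."""
    return min(1.0, d**k * lam_max**n)


def max_code_length(lam_max, p_min, k, d=2):
    """Largest n with d^k * lam_max^n >= p_min (requires 0 < lam_max < 1)."""
    return math.floor((k * math.log(d) - math.log(p_min)) / -math.log(lam_max))


# Example: Hadamard input with eps = 0.1, so lam_max = 1 - eps/2 = 0.95.
n_cut = max_code_length(lam_max=0.95, p_min=0.9, k=1)
```

This elementary cut-off is only meant to illustrate why an upper bound on $n$ is expected; it is not the bound of Theorem 8 or Theorem 9.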

Comparison with the data-processing inequality (DPI)
We have seen that stochastically represented CSS code projection protocols give rise to a set of upper bounds on $n$ (Theorem 9). By comparing to the data-processing inequality (DPI), we see that although the existence of upper bounds is a general feature of code projection protocols, exploiting the stochasticity in the representation of CSS code projections gives strictly stronger bounds.
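According to the DPI, if there exists a code projection (CSS or otherwise) that can distil out the $k$-qubit magic state $\rho'$ from $n$ copies of the noisy magic state $\rho$ with acceptance probability $p$, then the divergence between the input and its reference state cannot be smaller than the divergence between the corresponding outputs, for all $\alpha \in (1, \infty)$ [68]. The divergence in question is the sandwiched $\alpha$-Rényi divergence $\tilde{D}_\alpha(\rho\|\tau)$ [69,70] on the normalized quantum states $\rho$ and $\tau$, which is defined as $\tilde{D}_\alpha(\rho\|\tau) := \frac{1}{\alpha - 1} \log \mathrm{tr}\left[\left(\tau^{\frac{1-\alpha}{2\alpha}} \rho\, \tau^{\frac{1-\alpha}{2\alpha}}\right)^\alpha\right]$. We then obtain an upper bound on $n$ which is finite whenever $\alpha$ is such that $H_\alpha(\rho) > 1$ (proof essentially identical to that of Theorem 8).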
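For readers who want to experiment with the DPI-based bound, the sandwiched Rényi divergence can be evaluated directly from its standard definition, $\tilde{D}_\alpha(\rho\|\tau) = \frac{1}{\alpha-1}\log \mathrm{tr}[(\tau^{\frac{1-\alpha}{2\alpha}}\rho\,\tau^{\frac{1-\alpha}{2\alpha}})^\alpha]$. The sketch below is a minimal implementation assuming a full-rank reference state $\tau$ and base-2 logarithms; the helper names and the depolarizing test channel are illustrative choices, not part of the text.

```python
import numpy as np


def _herm_power(mat, power):
    """Real power of a positive-definite Hermitian matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals**power) @ vecs.conj().T


def sandwiched_renyi(rho, tau, alpha):
    """Sandwiched Renyi divergence in bits, for alpha in (1, inf), full-rank tau."""
    s = _herm_power(tau, (1 - alpha) / (2 * alpha))
    inner = s @ rho @ s
    vals = np.clip(np.linalg.eigvalsh(inner), 0, None)
    return np.log2(np.sum(vals**alpha)) / (alpha - 1)


def depolarize(rho, q):
    """Illustrative channel: mix rho with the maximally mixed state."""
    dim = rho.shape[0]
    return (1 - q) * rho + q * np.eye(dim) / dim


rho = np.array([[0.8, 0.1 - 0.2j], [0.1 + 0.2j, 0.2]])
tau = np.eye(2) / 2
before = sandwiched_renyi(rho, tau, alpha=2.0)
after = sandwiched_renyi(depolarize(rho, 0.3), depolarize(tau, 0.3), alpha=2.0)
```

In this example `after` does not exceed `before`, as the DPI requires. When $\tau$ is maximally mixed on $n$ qubits, the divergence reduces to $n - H_\alpha(\rho)$ in bits, which is where a condition on $H_\alpha(\rho)$ enters.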
Given that we also obtain upper bounds on code length from simple data processing of quantum states, we should ask: does majorization give genuinely new constraints on magic state distillation beyond the DPI? Since our majorization conditions are a consequence of the stochastic representation of stabilizer circuits, while the DPI arises from the fact that all quantum channels are CPTP, this question may be loosely rephrased as asking whether stochasticity imposes any additional constraints beyond those imposed by CPTP on magic state distillation by stabilizer code projection.
Fig. 6 allows us to answer this question in the affirmative, since the upper bound on code length due to majorization is observed to be stronger than that due to the DPI over a wide range of parameter regimes for CSS code projection protocols. In particular, extracting $n^{\mathrm{maj}}_U := \min_\alpha n^\alpha_U$ as the tightest bound due to majorization from Theorem 8, we find that, in the low acceptance probability $p$ and low input error $\epsilon$ regime of Fig. 6(a), the difference in upper bounds $\Delta n_U := n^{\mathrm{DPI}}_U - n^{\mathrm{maj}}_U$ (the amount by which majorization "beats" DPI) is of the order $\Delta n_U = O(10^4)$. We thus conclude that the constraints on CSS protocols stemming from majorization go beyond those from the DPI.

VI. BEYOND REBIT DISTILLATION
We have simplified our analysis thus far by restricting our attention to rebit magic states such as $|H\rangle$. However, many protocols such as the seminal 15-to-1 Bravyi-Kitaev protocol [18] distil magic states such as $|A\rangle \propto |0\rangle + e^{i\pi/4}|1\rangle$, which have complex density matrices in the computational basis. An argument can be made that since $|A\rangle$ is Clifford-equivalent to $|H\rangle$, this state should be considered equally resourceful for magic distillation. However, to address this concern more directly, we extend our majorization relations to states with complex Wigner representations. A discussion of how this can be achieved is given in Appendix F. The basic idea is that we can define valid $2d^2$-dimensional quasiprobability distributions by forming the direct sum of the real and imaginary parts of the original distribution, with a correspondingly simple choice for our reference CSS states. It then follows that if there exists a completely CSS-preserving operation $\mathcal{E}$ such that $\mathcal{E}(\rho) = \rho'$ and $\mathcal{E}(\tau) = \tau'$, then a complex relative majorization condition holds for all $\alpha \in A$. Further study of the significance of these complex relative majorization conditions may be of foundational interest, but we leave this for future work.

VII. DISCUSSION AND OUTLOOK
We have shown that the statistical mechanical framework of Ref. [27] can be extended to the experimentally significant case of qubit systems by focusing on the processing of magic states under CSS circuits, i.e., the subset of stabilizer circuits where CSS states play the role of stabilizer states. To achieve this, we made use of a Wigner representation first introduced in Ref. [43] wherein completely CSS-preserving channels correspond to stochastic transformations on phase space. This set of channels includes CSS circuits, which are sufficient for universal quantum computing [44,45] and consist precisely of the gateset that can be performed fault-tolerantly on surface code constructions [71].
Within this framework, we showed that relative majorization can be used to encode particular properties of an important class of distillation protocols that project onto CSS codes, in terms of which all protocols carried out by CSS circuits can be decomposed. We established general entropic constraints on such protocols in terms of upper and lower bounds on the code length $n$.
In the context of achieving full fault-tolerance, a natural extension of this work would be to generalize our results to more sophisticated protocols. For instance, we might ask how the use of $m$ intermediate Clifford corrections in between the measurements of stabilizer generators might affect these fundamental constraints. One would expect to be able to obtain more refined bounds as a function of $m$. Moreover, while many protocols are based on CSS codes in part due to their relative ease of construction via tri-orthogonal matrices [20], from an operational perspective it would be of interest to see whether we can extend to the complete set of stabilizer operations on qubit systems.
We have also obtained a set of monotones $\{\Lambda_\alpha\}$ for completely CSS-preserving magic distillation, each of which forms a convex optimization problem. We speculate that an analogous monotone can be constructed for any resource theory for which the free operations are a subset of operations that completely preserve Wigner positivity. From the perspective of quantum optics experiments, wherein Gaussian operations and probabilistic randomness are readily available, it may be of interest to consider the case of continuous variable systems under the set of Gaussian operations and statistical mixtures [72]. Since the individual $\alpha$-Rényi divergences on quasiprobability distributions were seen in Section V C 3 to typically produce stronger constraints than the corresponding constraints given by $\alpha$-Rényi divergences on quantum states, it would be interesting to see how well these quasidistribution-based monotones perform relative to known state-based counterparts.
On a technical note regarding majorization theory, we point to two interesting directions for further study. Firstly, complex majorization constraints arise naturally when we extend our setup from rebit to all qubit states, where Wigner representations can become complex due to the non-Hermiticity of the operator basis $\{A_u\}$. We expect such constraints to take the form of a duplet of constraints applying separately to the real parts and imaginary parts of the Wigner function. In the context of non-Hermitian quantum mechanics [73], results on complex majorization would also benefit theories that require an ordering between Hamiltonian eigenvalues, such as quantum thermodynamics. Secondly, the Wigner representation of Ref. [43] recovers the covariance over symplectic affine transformations on qubit phase spaces, a property shared by Gross's Wigner function on odd-dimensional systems. This added structure on the phase space was ignored by our analysis, but could be utilised to tighten the obtained bounds in future work. In particular, as explained in the discussion of Ref. [27], the stochastic majorization used in our analysis is only a special case of $G$-majorization, where $G$ can be taken as a subgroup of the stochastic group such as the symplectic group. It can then be shown [74][75][76] that one should expect to obtain a set of finite lower bound constraints on distillation, which will be tighter than stochastic majorization constraints.

ACKNOWLEDGEMENTS
RA and NK are supported by the EPSRC Centre for Doctoral Training in Controlled Quantum Dynamics. SGC is supported by the Bell Burnell Graduate Scholarship Fund. DJ is supported by the Royal Society and a University Academic Fellowship.

APPENDIX A: WIGNER REPRESENTATION
For any $n$-qubit state $\rho$, we define the following complex-valued function $W_\rho(u) := \frac{1}{2^n}\mathrm{tr}[A_u^\dagger \rho]$, where $\{A_u\}$ is the set of $2^{2n}$ phase point operators on $n$ qubits. As a consequence of Eq. (7), these phase point operators can alternatively be expressed in a form which further reveals that every $A_u$ is real in the computational basis. Despite being complex-valued, $W_\rho$ transforms covariantly under the displacement operators (informally speaking, $\rho$ is shifted by the displacement operators around phase space), just like Gross's representation in odd dimensions. Concretely, we consider the Wigner representation of $D_v \rho D_v^\dagger$ for an arbitrary phase space displacement $v$, which is $W_{D_v \rho D_v^\dagger}(u) = \frac{1}{2^n}\mathrm{tr}[A_u^\dagger D_v \rho D_v^\dagger]$. Inserting the commutation relation for the displacement operators in Eq. (7) into Eq. (A4), we obtain $W_{D_v \rho D_v^\dagger}(u) = W_\rho(u + v)$, which confirms that $W_\rho$ transforms covariantly under the action of the displacement operators.

Properties of qubit phase point operators and Wigner function
We first establish that the phase point operators of a joint system are simply tensor products of phase point operators on its subsystems: (A1) (Factorization). On a bipartite system with subsystems $X$ and $Y$, $A_{u_X \oplus u_Y} = A_{u_X} \otimes A_{u_Y}$.
Proof. From the definition of $D_u$ in Eq. (6), it is clear that $D_{u_X \oplus u_Y} = D_{u_X} \otimes D_{u_Y}$. Let $n_X$ and $n_Y$ be the numbers of qubits in subsystems $X$ and $Y$ respectively. Then the zero phase point operator on the bipartite system, $A_0$, factorizes as the tensor product of the zero phase point operators on $X$ and $Y$, which in turn implies that any phase point operator $A_u := A_{u_X \oplus u_Y}$ for some $u_X \in P_X$ and $u_Y \in P_Y$ is as claimed.
Property (A1) enables us to break down any $n$-qubit phase point operator $A_u$ as a tensor product of single-qubit phase point operators, $A_u = \bigotimes_{j=1}^n A_{u_j}$ [Eq. (A9)], where $u_j \in \mathbb{Z}_2 \times \mathbb{Z}_2$ is a co-ordinate in the phase space of the $j$th qubit only. It is therefore instructive to calculate the single-qubit phase point operators [Eq. (A10)]. We next demonstrate how the explicit forms of single-qubit phase point operators can be leveraged via Eq. (A9) to prove two further properties for general $n$-qubit phase point operators. In particular, we show how distinct $n$-qubit phase point operators are orthogonal under the Hilbert-Schmidt inner product: (A2) (Orthogonality). Let $A_u$ and $A_v$ be two $n$-qubit phase point operators. Then $\mathrm{tr}[A_u^\dagger A_v] = 2^n \delta_{u,v}$.
Proof. Let us first decompose $u$ and $v$ as $u = \bigoplus_{j=1}^n u_j$ and $v = \bigoplus_{j=1}^n v_j$, where $u_j$ and $v_j$ are phase point co-ordinates on the $j$th qubit only. By Eq. (A10), $\mathrm{tr}[A_u^\dagger A_v] = \prod_{j=1}^n \mathrm{tr}[A_{u_j}^\dagger A_{v_j}] = \prod_{j=1}^n 2\delta_{u_j,v_j} = 2^n \delta_{u,v}$, as claimed.
There are $2^{2n}$ phase point operators on $n$ qubits. Property (A2) thus implies $\{A_u\}_{u \in P_n}$ forms an orthogonal complex basis for the complex vector space $\mathbb{C}^{2^n \times 2^n}$ of $2^n \times 2^n$ complex matrices under the Hilbert-Schmidt inner product. Therefore, $W_\rho$ is an informationally complete representation of $n$-qubit states.
More precisely, any $n$-qubit quantum state $\rho$ can be uniquely decomposed as $\rho = \sum_{u \in P_n} W_\rho(u) A_u$, where $W_\rho(u) := \frac{1}{2^n}\mathrm{tr}[A_u^\dagger \rho]$ is a complex function on $P_n$. Furthermore, every phase point operator has trace 1: (A3) (Unit trace). Let $A_u$ be any $n$-qubit phase point operator. Then we have $\mathrm{tr}[A_u] = 1$.
Proof. Let us first decompose $u$ as $u = \bigoplus_{j=1}^n u_j$, where $u_j$ is a point in the phase space of the $j$th qubit. We see that $\mathrm{tr}[A_{u_j}] = 1$ from Eq. (A10). Therefore, $\mathrm{tr}[A_u] = \prod_{j=1}^n \mathrm{tr}[A_{u_j}] = 1$, which completes the proof.
Property (A3) implies that all $n$-qubit Wigner functions are normalized. Since any $n$-qubit state $\rho$ has trace 1, $1 = \mathrm{tr}[\rho] = \sum_{u \in P_n} W_\rho(u)\,\mathrm{tr}[A_u] = \sum_{u \in P_n} W_\rho(u)$, where the last equality is established by the unit trace of phase point operators.
We will also find it useful to identify the following property of phase point operators: (A4) $\sum_{u \in P_n} A_u = 2^n 1^{\otimes n}$.
Proof. Adopting the decomposition of each $A_u$ in Eq. (A9), we see that $\sum_{u \in P_n} A_u = \bigotimes_{j=1}^n \sum_{u_j} A_{u_j}$. Using the explicit forms of single-qubit phase point operators in Eq. (A10), we calculate that $\sum_{u_j} A_{u_j} = 2 \cdot 1$ on each qubit, from which the result immediately follows.
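Properties (A2)-(A4) can be checked numerically for small $n$. The explicit single-qubit phase point operators are not reproduced here, so the sketch below assumes the form $D_{(x,z)} = Z^z X^x$, $A_0 = \frac{1}{2}\sum_u D_u$ and $A_u = D_u A_0 D_u^\dagger$, which yields real, non-Hermitian operators consistent with the stated properties; this choice and all function names are assumptions. The $n$-qubit operators are built as tensor products, i.e. (A1) is used as the construction rather than verified.

```python
import itertools
from functools import reduce

import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def single_qubit_phase_points():
    # ASSUMPTION (not confirmed by the text): D_(x,z) = Z^z X^x,
    # A_0 = (1/2) sum_u D_u and A_u = D_u A_0 D_u^dagger.
    disp = {(x, z): np.linalg.matrix_power(Z, z) @ np.linalg.matrix_power(X, x)
            for x in (0, 1) for z in (0, 1)}
    a0 = sum(disp.values()) / 2
    return {u: d @ a0 @ d.T for u, d in disp.items()}


def phase_points(n):
    """n-qubit operators built as tensor products, i.e. assuming (A1)."""
    single = single_qubit_phase_points()
    return {u: reduce(np.kron, [single[uj] for uj in u])
            for u in itertools.product(single, repeat=n)}


def check_properties(n):
    """Return booleans for (A3) unit trace, (A2) orthogonality, (A4) sum rule."""
    ops = phase_points(n)
    dim = 2**n
    unit_trace = all(np.isclose(np.trace(a), 1) for a in ops.values())
    orthogonal = all(
        np.isclose(np.trace(a.T @ b), dim * (u == v))
        for u, a in ops.items() for v, b in ops.items())
    sum_rule = np.allclose(sum(ops.values()), dim * np.eye(dim))
    return unit_trace, orthogonal, sum_rule
```

Calling `check_properties(2)` exercises all 16 two-qubit operators; the transpose stands in for the adjoint because the assumed operators are real.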

Wigner representation for rebits
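Any $n$-qubit state $\rho$ can be decomposed as $\rho = \rho^{(0)} + i\rho^{(1)}$ with $\rho^{(0)} := \frac{1}{2}(\rho + \rho^T)$ and $\rho^{(1)} := \frac{1}{2i}(\rho - \rho^T)$, where the transposition is taken with respect to the computational basis. Because $\rho^* = \rho^T$ in any basis, we can identify $\rho^{(0)} = \mathrm{Re}(\rho)$ and $\rho^{(1)} = \mathrm{Im}(\rho)$, i.e., $\rho^{(0)}$ and $\rho^{(1)}$ are respectively the real and imaginary components of the density matrix of $\rho$ in the computational basis.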
We first prove Lemma 1, which shows there is a direct correspondence between the real/imaginary components of a state's Wigner representation and those of its density matrix in the computational basis.
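Lemma 1. Given any $n$-qubit quantum state $\rho$, $\mathrm{Re}[W_\rho(u)] = W_{\mathrm{Re}(\rho)}(u)$ and $\mathrm{Im}[W_\rho(u)] = W_{\mathrm{Im}(\rho)}(u)$ for all $u \in P_n$, where $\mathrm{Re}(\rho)$ and $\mathrm{Im}(\rho)$ are respectively the real and imaginary parts of the density matrix of $\rho$ in the computational basis.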

Wigner representation of qubit channels
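We recall from the main text that the Wigner representation of a channel $\mathcal{E}$ from $n$ to $m$ qubits is the linear map $W_\mathcal{E} : P_n \to P_m$ on phase space defined in terms of the Choi state of $\mathcal{E}$ for all $v \in P_m$, $u \in P_n$, where the Choi state [51] of $\mathcal{E}$, $J(\mathcal{E})$, is defined as $J(\mathcal{E}) = (I \otimes \mathcal{E})(|\phi^+_n\rangle\langle\phi^+_n|)$ for the canonical maximally entangled state $|\phi^+_n\rangle$ on two sets of $n$ qubits, $|\phi^+_n\rangle := 2^{-n/2}\sum_{k \in \{0,1\}^n} |k\rangle \otimes |k\rangle$. One can straightforwardly verify that $|\phi^+_n\rangle$ is stabilized by $\langle Z_i Z_{n+i}, X_i X_{n+i} \rangle_{i=1,\dots,n}$ and is therefore a CSS state.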
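The phase-space representation of a channel can be computed directly for small examples. The sketch below uses the form $W_\mathcal{E}(v|u) = 2^{-m}\mathrm{tr}[A_v^\dagger \mathcal{E}(A_u)]$, which is what linearity forces if $W_\mathcal{E}$ is to map $W_\rho$ to $W_{\mathcal{E}(\rho)}$, together with an assumed explicit form of the phase point operators ($D_{(x,z)} = Z^z X^x$, $A_0 = \frac{1}{2}\sum_u D_u$, $A_u = D_u A_0 D_u^\dagger$, extended by tensor products). These definitions and the function names are assumptions rather than the text's own equations.

```python
import itertools
from functools import reduce

import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def phase_points(n):
    # ASSUMPTION: D_(x,z) = Z^z X^x, A_0 = (1/2) sum_u D_u,
    # A_u = D_u A_0 D_u^dagger, with n-qubit operators as tensor products.
    disp = {(x, z): np.linalg.matrix_power(Z, z) @ np.linalg.matrix_power(X, x)
            for x in (0, 1) for z in (0, 1)}
    a0 = sum(disp.values()) / 2
    single = {u: d @ a0 @ d.T for u, d in disp.items()}
    return {u: reduce(np.kron, [single[uj] for uj in u])
            for u in itertools.product(single, repeat=n)}


def wigner_matrix(kraus_ops, n_in, n_out):
    """W(v|u) = 2^(-m) tr[A_v^dagger E(A_u)] for E(.) = sum_K K (.) K^dagger."""
    a_in, a_out = phase_points(n_in), phase_points(n_out)
    W = np.zeros((len(a_out), len(a_in)))
    for col, au in enumerate(a_in.values()):
        image = sum(K @ au @ K.conj().T for K in kraus_ops)
        for row, av in enumerate(a_out.values()):
            W[row, col] = np.real(np.trace(av.T @ image)) / 2**n_out
    return W


CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
HAD = (X + Z) / np.sqrt(2)

W_cnot = wigner_matrix([CNOT], 2, 2)   # permutation of the 16 phase points
W_had = wigner_matrix([HAD], 1, 1)     # columns sum to 1, one negative entry
```

Under these assumptions the CNOT gate is represented by a permutation matrix, while a Hadamard gate on a single qubit yields a matrix with a negative entry, in line with the statement that the collective Hadamard gate is not completely CSS-preserving. Both matrices have columns summing to 1, as trace preservation requires.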
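The factorization property (A1) of phase point operators implies that $A_{u \oplus v} = A_u \otimes A_v$ for $u \in P_n$ and $v \in P_m$. Using the identity $\mathcal{E}(X) = 2^n \mathrm{tr}_{1,\dots,n}[(X^T \otimes 1^{\otimes m}) J(\mathcal{E})]$ for transposition taken with respect to the computational basis, and recalling that $A_u$ is real in the computational basis, we then conclude that $W_\mathcal{E}(v|u) = \frac{1}{2^m}\mathrm{tr}[A_v^\dagger \mathcal{E}(A_u)]$. Therefore, if $\sigma = \mathcal{E}(\rho)$, then we obtain Eq. (17) from the main text, i.e., $W_\sigma(v) = \sum_{u \in P_n} W_\mathcal{E}(v|u) W_\rho(u)$. We thereby see that if $\mathcal{E}$ maps $\rho$ to $\sigma$, then $W_\mathcal{E}$ is a matrix that maps $W_\rho$ to $W_\sigma$, which justifies regarding $W_\mathcal{E}$ as the representation of $\mathcal{E}$ on phase space.

By property (A4) of the phase point operators, we have that $\sum_{v \in P_m} A_v = 2^m 1_m$. By applying this to the alternative formulation of $W_\mathcal{E}$ in Eq. (A42), we see that $\sum_{v \in P_m} W_\mathcal{E}(v|u) = \mathrm{tr}[\mathcal{E}(A_u)] = \mathrm{tr}[A_u]$ for trace-preserving $\mathcal{E}$. Then recalling that $\mathrm{tr}[A_u] = 1$ (property (A3)), we obtain Eq. (21) from the main text, i.e., $\sum_{v \in P_m} W_\mathcal{E}(v|u) = 1$. This means every column of $W_\mathcal{E}$ sums up to 1.

Finally, we show that the representation we have chosen respects sequential and parallel composition of processes. Let $\mathcal{E}_1$ and $\mathcal{E}_2$ be two multiqubit channels, with $\mathcal{E}_1$ outputting $m$ qubits and $\mathcal{E}_2$ acting on $l$ qubits. Since $\{A_x\}_{x \in P_m}$ are a complex orthogonal basis for $2^m \times 2^m$ complex matrices under the Hilbert-Schmidt inner product, we have that $\mathcal{E}_1(A_u) = \sum_{x \in P_m} W_{\mathcal{E}_1}(x|u) A_x$. Therefore, when $m = l$, we obtain $W_{\mathcal{E}_2 \circ \mathcal{E}_1}(v|u) = \sum_{x \in P_m} W_{\mathcal{E}_2}(v|x) W_{\mathcal{E}_1}(x|u)$, or in matrix notation, $W_{\mathcal{E}_2 \circ \mathcal{E}_1} = W_{\mathcal{E}_2} W_{\mathcal{E}_1}$. Furthermore, due to the factorization property of the phase point operators, we have that $W_{\mathcal{E}_1 \otimes \mathcal{E}_2}(v_1 \oplus v_2|u_1 \oplus u_2) = W_{\mathcal{E}_1}(v_1|u_1) W_{\mathcal{E}_2}(v_2|u_2)$, or in matrix notation $W_{\mathcal{E}_1 \otimes \mathcal{E}_2} = W_{\mathcal{E}_1} \otimes W_{\mathcal{E}_2}$.

APPENDIX B: COMPLETELY CSS-PRESERVING OPERATIONS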

Completely CSS-preserving unitaries
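The group of CSS-preserving unitaries on $n$ qubits [44] can be generated as $\langle \mathrm{CNOT}(i, j), Z_i, X_i, H^{\otimes n} \rangle_{i,j=1,\dots,n,\, i \neq j}$. We will also find it useful to note the following conjugation relations of the collective Hadamard gate, $H^{\otimes n}\,\mathrm{CNOT}(i, j)\,H^{\otimes n} = \mathrm{CNOT}(j, i)$ and $H^{\otimes n} Z(a) H^{\otimes n} = X(a)$, which respectively hold for all $i, j \in \{1, \dots, n\}$ where $i \neq j$ and $n$-bit strings $a \in \{0, 1\}^n$.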
Lemma 11. The group of completely CSS-preserving unitaries on $n$ qubits is $G(n) := \langle \mathrm{CNOT}(i, j), Z_i, X_i \rangle_{i,j=1,\dots,n,\, i \neq j}$. (B4)
Proof. Let $U_+$ be any CSS-preserving unitary on $n$ qubits. We first observe that $U_+$ is either in $G(n)$ or is a unitary from $G(n)$ followed by the collective Hadamard gate on $n$ qubits, i.e., $U_+ = [H^{\otimes n}]^b U$ for some binary digit $b \in \{0, 1\}$ and unitary $U \in G(n)$. This follows from the conjugation relations Eq. (B2) and Eq. (B3) alongside the fact that $H^{\otimes n}$ is self-inverse.
If $U_+$ is not in $G(n)$, then $U_+ = H^{\otimes n} U$ for some $U \in G(n)$, which implies $U_+$ cannot be completely CSS-preserving since $H^{\otimes n}$ is not completely CSS-preserving. Therefore, $U_+$ is completely CSS-preserving if and only if it is in $G(n)$.

Completely CSS-preserving measurements
Throughout the rest of this appendix, we extend, wherever necessary, the notion of being completely CSS-preserving to trace-decreasing operations, i.e., a trace-decreasing operation $\mathcal{E}$ from $n$ to $n'$ qubits is completely CSS-preserving if, given any CSS state $\rho$ on $m + n$ qubits, we have that $(I_m \otimes \mathcal{E})(\rho)$ is always a (possibly subnormalised) CSS state on $(m + n')$ qubits.
The projective measurement of any $n$-qubit Pauli observable $S$ is carried out using projectors $P(\pm S) := \frac{1}{2}(1_n \pm S)$ corresponding to the $\pm 1$ outcomes. Post-selection for the $\pm 1$ outcome is then carried out by the operation $\mathcal{P}(\pm S) := P(\pm S)(\cdot)P(\pm S)$.
Lemma 12. Post-selecting the ±1 outcome in the projective measurement of a CSS observable is completely CSS-preserving.
Proof. Let $S$ be a CSS observable on $n$ qubits and $|\psi\rangle$ be a pure CSS state on $m + n$ qubits for any $m \geq 0$. Then let $S_1, \dots, S_{m+n}$ be a set of $m + n$ CSS observables that generate the stabilizer group $\mathcal{S}(|\psi\rangle)$ of $|\psi\rangle$.
Post-selecting the $\pm 1$ outcome in a projective measurement of $S$ on the last $n$ qubits of $|\psi\rangle$ yields the possibly subnormalised output $P(\pm\tilde{S})|\psi\rangle$, where we have defined the CSS observable $\tilde{S} := 1^{\otimes m} \otimes S$.
Therefore, given any pure CSS state $|\psi\rangle$ on $m + n$ qubits, post-selecting the $\pm 1$ outcome in the projective measurement of a CSS observable on the last $n$ qubits of $|\psi\rangle$ always produces a (possibly subnormalised) CSS state. Since every CSS state is a statistical mixture of pure CSS states, the lemma follows.

CSS circuits
In this section, we show that the subset of stabilizer operations covered by Lemma 3, which we referred to as CSS circuits, are completely CSS-preserving.
To reiterate, a CSS circuit is any sequence of the following four primitive CSS channels:
1. Introducing a CSS state on any number of qubits,
2. Performing a completely CSS-preserving unitary,
3. Projective measurement of any CSS observable, with the possibility of performing different sequences of primitive CSS channels depending on outcome,
4. Discarding any number of qubits,
as well as statistical mixtures of such sequences. We emphasise that all CSS circuits are trace-preserving.

Any sequence of primitive CSS channels can be executed as a binary tree where the root node represents inputting qubits, the leaf nodes represent outputting qubits, and internal nodes represent primitive CSS channels. An illustrative example is provided in Fig. 7, from which we see that a sequence of primitive CSS channels can produce outputs distinguished by the sequences of measurement outcomes leading up to them. Different numbers of ancillary qubits may be introduced on different branches of a tree representing a sequence of primitive CSS channels. However, one can arbitrarily increase the number of qubits introduced on any branch, without affecting what it does, by introducing the maximally mixed state on qubits that are immediately discarded just before the branch's output. Furthermore, different branches may have different lengths. However, one can arbitrarily lengthen any branch, without affecting what it does, by inserting identity channels just before the branch's output. Since introducing the maximally mixed state and the identity channel are both primitive CSS channels, we can, without loss of generality, only consider sequences of primitive CSS channels executed as binary trees where every branch has the same length and introduces the same number of ancillary qubits.
Lemma 13. A CSS circuit $\mathcal{E}$ on $n$ qubits is a statistical mixture of channels $\mathcal{E}_i$ representing sequences of primitive CSS channels, in which each sequence $\mathcal{E}_i$ is a sum of (possibly trace-decreasing) operations $\mathcal{E}_{i,j}$ generating its distinguishable outputs. Thus one can write $\mathcal{E} = \sum_i p_i \mathcal{E}_i = \sum_i p_i \sum_j \mathcal{E}_{i,j}$, where each $\mathcal{E}_{i,j}$ is built from the following ingredients: $\{p_i\}$ forms a probability distribution, $\sigma_{i,j}$ is a CSS state on $m$ ancillary qubits, $R$ is a subset of the $(n + m)$ input and ancillary qubits, $U_{(i,j),l}$ is a completely CSS-preserving unitary and $P(S_{(i,j),l})$ projects onto the $+1$ eigenspace of the CSS observable $S_{(i,j),l}$. Moreover, $P(S_{(i,j),1}), \dots, P(S_{(i,j),N})$ gives the sequence of measurement outcomes that operationally distinguish the $j$th possible output of sequence $\mathcal{E}_i$.
Proof. Without loss of generality, every branch of every sequence forming the mixture of $\mathcal{E}$ introduces the same number of ancillary qubits $m$ and has the same length $N$. One can show that the $j$th branch of the $i$th sequence must generate the channel $\mathcal{E}_{i,j}$ by induction over the steps of the branch.
Because each possible output from a sequence of primitive CSS channels results from a unique sequence of measurement outcomes, it is operationally meaningful to prepare a state conditioned upon obtaining the $j$th possible output in the $i$th sequence from the statistical mixture forming $\mathcal{E}$. We therefore have the following corollary:

Corollary 2. We can record which operationally distinguishable output $\mathcal{E}_{i,j}$ from a CSS circuit $\mathcal{E}$ was obtained using a classical register prepared in $|r_{i,j}\rangle$, where $|r_{i,j}\rangle$ is a computational basis state on multiple qubits. We note that, since $|r_{i,j}\rangle$ is a CSS state, the resulting channel can be carried out as a CSS circuit.
In the next subsection, we use this corollary to obtain the trace-preserving CSS code projection studied throughout this paper.
We are finally in a position to prove Lemma 3, which is reproduced below.To this end, it is convenient to define the unique 0-qubit state 1 as CSS.
Lemma 3. Any CSS circuit is completely CSS-preserving.
Proof. The decomposition of CSS circuits given in Lemma 13 implies that if (i) performing completely CSS-preserving unitaries, (ii) conditioning on the $+1$ outcome in the projective measurement of a CSS observable, (iii) introducing a CSS state and (iv) discarding any number of qubits are completely CSS-preserving, then all CSS circuits are completely CSS-preserving. Now (i) is completely CSS-preserving by definition, we proved that (ii) is completely CSS-preserving in Lemma 12, and since the tensor product of two CSS states is always a CSS state, (iii) is completely CSS-preserving.
Therefore, to prove that all CSS circuits are completely CSS-preserving, we just have to prove that discarding any number of qubits is completely CSS-preserving.
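Consider discarding $l$ qubits from $n$, where $n \geq l \geq 1$. Since we can freely relabel subsystems, we need only consider discarding the last $l$ qubits. Let $|\psi\rangle$ be a pure CSS state on $m + n$ qubits for any $m \geq 0$. Discarding the last $l$ qubits of $|\psi\rangle$ then produces the state $\sigma := I_m \otimes \mathrm{tr}_{n-l+1,\dots,n}[|\psi\rangle\langle\psi|]$. Since tracing out is unaffected by first performing a computational basis measurement on the last $l$ qubits, we have that $\sigma$ can be written as a sum over the computational basis outcomes $|k\rangle$ on the last $l$ qubits. We then observe that each projector $1^{\otimes m+n-l} \otimes |k\rangle\langle k|$ is a product of projectors onto $\pm 1$ eigenspaces of the CSS observables $Z_i$ on the last $l$ qubits, and so by Lemma 12, $(1^{\otimes m+n-l} \otimes |k\rangle\langle k|)|\psi\rangle$ is a possibly subnormalised pure CSS state $\sqrt{p_k}\,|\phi_k\rangle \otimes |k\rangle$, where $p_k$ is the probability of getting the $|k\rangle$ outcome in the computational basis measurement, and $|\phi_k\rangle$ must be a (normalised) CSS state to keep the full state CSS. We thus obtain $\sigma = \sum_k p_k |\phi_k\rangle\langle\phi_k|$, which is a CSS state on $(m + n - l)$ qubits. We conclude that, given any pure CSS state on $m + n$ qubits, discarding any $l \leq n$ of its final $n$ qubits always produces a CSS state. This is true if and only if discarding any number of qubits is completely CSS-preserving.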

a. Omission of the collective Hadamard gate
The collective Hadamard gate promotes CSS circuits to a subset of stabilizer circuits where CSS states play the role of stabilizer states. One can reasonably ask why we have excluded the collective Hadamard gate from the construction of CSS circuits. Our justification is that one can conjugate the collective Hadamard gate past any primitive CSS channel and leave another primitive CSS channel behind. This follows from the conjugation relations given by Eq. (B2) and Eq. (B3) for completely CSS-preserving unitaries and projective measurements of CSS observables, from the cyclic property of the trace for discarding qubits, and from
$$H^{\otimes (n+m)} (\rho \otimes \sigma) H^{\otimes (n+m)} = \big(H^{\otimes n} \rho H^{\otimes n}\big) \otimes \big(H^{\otimes m} \sigma H^{\otimes m}\big)$$
for introducing a CSS state $\sigma$ on $m$ ancillary qubits to an $n$-qubit system, where we note that $H^{\otimes m} \sigma H^{\otimes m}$ is also a CSS state because $H^{\otimes m}$ is CSS-preserving on $m$ qubits. Therefore, circuits from this wider subset are operationally equivalent to CSS circuits followed by the collective Hadamard gate conditioned upon obtaining certain outputs, and are therefore not more powerful as magic distillation protocols.

CSS code projections
An $[[n, k]]$ CSS code, where $n \ge 1$ and $n > k \ge 0$, is a vector space $C$ stabilized by a subgroup $S$ of $n$-qubit Pauli observables such that $-\mathbb{1}^{\otimes n} \notin S$ and $S$ can be generated from $n - k$ independent and commuting CSS observables $S_1, \dots, S_{n-k}$, of which none is the identity.
Lemma 14. Let $S := \langle (-1)^{b_i} S_i \rangle_{i=1,\dots,n-k}$ be the stabilizer group of an $[[n, k]]$ CSS code, where each $S_i$ is a positive CSS observable and each $b_i$ is a binary digit. Then there exists a completely CSS-preserving unitary $U$ such that
Proof. Let $r$ be the number of Z-type generators for $S$.
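We first construct the completely CSS-preserving unitary $U$ for the case where all Z-type generators of $S$ appear before X-type ones, i.e.,
$$S_i = \begin{cases} Z(u_i) & \text{for } 1 \le i \le r, \\ X(v_i) & \text{for } r < i \le n-k, \end{cases} \quad \text{(B13)}$$
where each $u_i$, $v_i$ is a non-zero $n$-dimensional binary vector.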
Let us define $Z$ as the set of Z-type generators for $S$ without their signs, i.e., $Z := \{Z(u_1), \dots, Z(u_r)\}$. We now prove that there exists a sequence of CNOT operations that transforms $Z(u_i)$ to $Z_i$ for all $1 \le i \le r$.
We observe that $Z$ is a subset of positive Z-type observables on $n$ qubits, which form an $n$-dimensional vector space $V$ over the field $F_2$. Choosing the basis $\{Z_1, \dots, Z_n\}$ for $V$, we can simply write $Z(u_i)$ as $u_i$, and we further have that $\{u_1, \dots, u_r\}$ are linearly independent. Therefore, the matrix $M_Z$ formed by taking members of $Z$ as columns has rank $r$. Using Gauss-Jordan elimination, we can convert $M_Z$ into its unique reduced row echelon form $R_Z$,
$$R_Z = \begin{pmatrix} I_{r,r} \\ 0_{n-r,r} \end{pmatrix}, \quad \text{(B14)}$$
where $I_{r,r}$ is an $r \times r$ identity matrix while $0_{n-r,r}$ is an $(n - r) \times r$ null matrix. On vector spaces over $F_2$, Gauss-Jordan elimination consists of row swaps and additions. We now show how both can be done on any matrix $M$ whose columns are elements of $V$ in the basis $\{Z_1, \dots, Z_n\}$ using CNOT-gates:
1. Swapping rows $j$ and $l$. This corresponds to swapping qubits $j$ and $l$, which is carried out by performing CNOT$(l, j)$ CNOT$(j, l)$ CNOT$(l, j)$ on each positive Z-type observable forming a column in $M$.
2. Adding row $j$ to row $l$. The action of CNOT$(j, l)$ Therefore, given any vector $u$ in $V$, we have that where arithmetic is modulo 2 and $e_l$ gives coordinates for the $l$th basis vector of $V$. In words, CNOT$(j, l)$ adds the $j$th component of $u$ to the $l$th. Therefore, performing CNOT$(l, j)$ on each positive Z-type observable forming a column in $M$ would add the $j$th row of $M$ to the $l$th row.
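The two row operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: qubits are indexed from 0, each Z-type generator $Z(u)$ is stored as the bit list `u`, the update rule for a CNOT is taken from the conjugation relation $\mathrm{CNOT}(i,j)\,Z(a)\,\mathrm{CNOT}(i,j) = Z(a + a_j e_i)$ of Eq. (E14), and the function names `cnot_on_z` and `reduce_z_generators` are hypothetical.

```python
def cnot_on_z(vec, control, target):
    """Conjugate Z(vec) by CNOT(control, target): Z(a) -> Z(a + a_target * e_control) mod 2."""
    out = list(vec)
    out[control] ^= out[target]
    return out


def reduce_z_generators(generators):
    """Gauss-Jordan elimination over F_2 using only CNOT row operations.

    `generators` is a list of r linearly independent n-bit lists, read as the
    columns of the matrix M_Z. Returns the CNOT sequence as (control, target)
    pairs together with the reduced generators, which should equal e_0, ..., e_{r-1}.
    """
    gens = [list(g) for g in generators]
    n = len(gens[0])
    ops = []

    def apply(control, target):
        # Record the gate and apply it to every generator (every column of M).
        ops.append((control, target))
        for idx, g in enumerate(gens):
            gens[idx] = cnot_on_z(g, control, target)

    def swap_rows(a, b):
        # Row swap = qubit swap = three alternating CNOTs.
        apply(a, b)
        apply(b, a)
        apply(a, b)

    for col in range(len(gens)):
        # Linear independence guarantees a pivot in some row >= col.
        pivot = next(row for row in range(col, n) if gens[col][row])
        if pivot != col:
            swap_rows(col, pivot)
        for row in range(n):
            if row != col and gens[col][row]:
                apply(row, col)  # adds row `col` to row `row`
    return ops, gens


ops, reduced = reduce_z_generators([[0, 1, 1, 0], [1, 1, 0, 1]])
print(ops)
print(reduced)
```

Replaying the recorded `ops` on the original generators with `cnot_on_z` reproduces the reduced form, which is the role played by the unitary $U_Z$ in the argument.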
We conclude that there exists a sequence of CNOT-gates that, when performed on each element of $Z$, accomplishes the Gauss-Jordan elimination in Eq. (B14). Denoting this sequence of CNOT-gates by the unitary $U_Z$, we have that $U_Z (Z(u_i)) U_Z^\dagger = Z_i$ for all $1 \le i \le r$.
We next consider X-type generators for $S$ without their signs. The action of CNOT$(j, l)$ on $X_m$ is
$$\mathrm{CNOT}(j, l)\, X_m\, \mathrm{CNOT}(j, l) = \begin{cases} X_j X_l & \text{for } m = j, \\ X_m & \text{otherwise,} \end{cases} \quad \text{(B17)}$$
so $U_Z$ only transforms positive X-type observables into other positive X-type observables. In other words, we can find non-zero $n$-bit strings $\{v_{r+1}, \dots,$ However, since the Z-type generators of $S$ commute with the X-type generators, $X(v_i)$ must commute with $Z_1, \dots, Z_r$. Therefore, $X(v_i)$ must act trivially on qubits 1 through $r$, so the first $r$ bits of $v_i$ must be 0. Therefore Everything we have done for $Z$ can now be repeated for $X := \{X(v_{r+1}), \dots, X(v_{n-k})\}$ on qubits $r + 1$ through $n$. The only thing that needs to be checked is that row addition in any matrix whose columns are elements from the vector space of positive $(n - r)$-qubit X-type observables can be performed by CNOT-gates. This can be confirmed using Eq. (B17), which implies
$$\mathrm{CNOT}(j, l)\, X(v)\, \mathrm{CNOT}(j, l) = X(v + v_j e_l),$$
where arithmetic is modulo 2, so CNOT$(j, l)$ adds the $j$th component of $v$ to the $l$th.
We conclude that there also exists a sequence $U_X$ of CNOT-gates such that $U_X (X(v_i)) U_X^\dagger = X_i$ for all $r < i \le n - k$. Furthermore, $U_X$ acts trivially on qubits 1 through $r$, which implies We now define unitaries $U_C$ and $U_{\mathrm{SWAP}}$, which respectively remove the signs from the generators of $S$ and move qubits 1 through $n - k$ to $k + 1$ through $n$, where SWAP$(i, j)$ := CNOT$(i, j)$ CNOT$(j, i)$ CNOT$(i, j)$ swaps qubits $i$ and $j$. Defining Since $U^\dagger := U_C U_Z U_X$ is formed from CNOT-, single-qubit X- and Z-gates, it is a completely CSS-preserving unitary that accomplishes the Lemma's claim for the ordering of $S_i$ given by Eq. (B13). By appropriate swaps among the last $n - k$ qubits, which we have seen are completely CSS-preserving, we can construct a completely CSS-preserving unitary accomplishing the Lemma's claim for any ordering of $S_i$.
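The identity SWAP$(i, j)$ = CNOT$(i, j)$ CNOT$(j, i)$ CNOT$(i, j)$ used above can be checked directly on computational basis states, since each CNOT simply permutes them. The following is a minimal sketch assuming 0-based qubit indices; the helper names are hypothetical and chosen only for this illustration.

```python
from itertools import product


def cnot(bits, control, target):
    """Action of CNOT(control, target) on a computational basis state given as a bit list."""
    out = list(bits)
    out[target] ^= out[control]
    return out


def swap_via_cnots(bits, i, j):
    """Apply CNOT(i, j), then CNOT(j, i), then CNOT(i, j)."""
    return cnot(cnot(cnot(bits, i, j), j, i), i, j)


def check_swap(n, i, j):
    """Return True if the three-CNOT sequence swaps qubits i and j on every n-bit basis state."""
    for bits in product([0, 1], repeat=n):
        expected = list(bits)
        expected[i], expected[j] = expected[j], expected[i]
        if swap_via_cnots(bits, i, j) != expected:
            return False
    return True


print(check_swap(3, 0, 2))
```

Because the gate sequence is a permutation of basis states, agreement on all basis states is enough to establish the operator identity.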
As an immediate consequence of Lemma 14, we obtain the following corollary.
Corollary 3. Every $[[n, k]]$ CSS code has a completely CSS-preserving encoding unitary.
Proof. Let $C$ be an $[[n, k]]$ CSS code whose stabilizer group is generated by CSS observables $S_1, \dots, S_{n-k}$, and let $s$ be a $k$-bit string (if $k = 0$, then $s$ is the empty string). By Lemma 14, there exists a completely CSS-preserving unitary $U$ such that In words, $U$ can encode any computational basis state $|s\rangle$ on $k$ physical qubits (if $k = 0$, then $|s\rangle = 1$) as a CSS logical basis state $|s\rangle_C$ in $C$.
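As a concrete instance of the $[[n, k]]$ CSS codes considered here, one can check the defining conditions for the Steane $[[7, 1]]$ code. The sketch below assumes the standard presentation in which both the X-type and the Z-type generators are $X(a)$ and $Z(a)$ for $a$ ranging over the rows of the $[7,4]$ Hamming parity-check matrix; it uses the fact that $X(a)$ and $Z(b)$ commute exactly when $a \cdot b = 0$ modulo 2. The function names are hypothetical.

```python
# Rows of the [7,4] Hamming parity-check matrix (supports of the generators).
HAMMING = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def commute(x_support, z_support):
    """X(a) and Z(b) commute iff a . b = 0 (mod 2)."""
    return sum(a & b for a, b in zip(x_support, z_support)) % 2 == 0


def rank_gf2(rows):
    """Rank over F_2 of a list of bit lists, via elimination on integer bitmasks."""
    masks = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    while masks:
        pivot = masks.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            masks = [m ^ pivot if m & low else m for m in masks]
    return rank


n = len(HAMMING[0])
all_commute = all(commute(a, b) for a in HAMMING for b in HAMMING)
k = n - rank_gf2(HAMMING) - rank_gf2(HAMMING)  # n minus number of independent generators
print(all_commute, k)
```

The six generators are independent and mutually commuting, so $n - k = 6$ and the code encodes a single logical qubit.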
With Corollary 3 in hand, we can show that every trace-preserving CSS code projection can be executed as a CSS circuit, which leads to the following Lemma from the main text:
Lemma 6. Every trace-preserving CSS code projection can be executed as a sequence of completely CSS-preserving operations, and is therefore stochastically represented.
Proof. Let $C$ be an $[[n, k]]$ CSS code generated by CSS observables $S_1, \dots, S_{n-k}$. We can then define a quantum channel $F$ that carries out the following sequence of primitive CSS channels:
1. Projectively measure $S_1, \dots, S_{n-k}$, which is equivalent to the syndrome measurement of $C$.
2. If the no-error syndrome is obtained, decode onto the first $k$ qubits using a completely CSS-preserving decoding unitary, which always exists in accordance with Corollary 3, and discard the final $n - k$ qubits.
3. Otherwise, discard all qubits and prepare a k-qubit CSS state σ.
Since $F$ is a CSS circuit by construction, it is completely CSS-preserving. By Corollary 2, we can convert $F$ into a trace-preserving code projection $E$ for $C$ by distinguishing the output of the no-error syndrome from those of all other syndromes using a single-qubit classical register, i.e. by preparing an ancillary qubit in the state $|0\rangle$ if we obtain the no-error syndrome output, and in the state $|1\rangle$ otherwise. Corollary 2 confirms that $E$ can also be executed as a CSS circuit and is therefore completely CSS-preserving. We conclude by Lemma 3 that $E$ is stochastically represented.

APPENDIX C: ENTROPIC CONSTRAINTS ON COMPLETELY CSS-PRESERVING PROTOCOLS
In this section we prove Theorem 5, which gives entropic constraints on generic completely CSS-preserving protocols for qubits.
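We first define the sets of positively-represented and real-represented quantum states in any Wigner representation $W$ as
$$W_+ := \{\rho : W_\rho(u) \ge 0 \ \text{for all } u \in P\}, \qquad W_R := \{\rho : W_\rho(u) \in \mathbb{R} \ \text{for all } u \in P\}.$$
We then have the following Lemma, which is a generalisation of Theorem 11 of Ref. [27].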
Lemma 15. Let $\rho$ and $\tau$ be states of a $d$-dimensional qudit such that $\rho \in W_R$ and $\tau \in W_+$ in some generalised Gross's Wigner representation $W$. Furthermore, let $E : B(H_d) \to B(H_d)$ be a stochastically represented channel. Then the $\alpha$-Rényi divergence $D_\alpha(\cdot \| \cdot)$ is well-defined and satisfies the following properties for $\alpha \in A$: … stochastically represented $E$ such that $E(\tau) \in W_+$.
Proof. In general, $W_\rho$ is a quasiprobability distribution, but for $\alpha \in A$ we see that $W_\rho(u)^\alpha \ge 0$ for all $u \in P$. Therefore $D_\alpha(W_\rho \| W_\tau)$ is always well-defined and real-valued. The proofs of 1-4 are then identical to the proof given for Theorem 11 in Ref. [27].
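To make the well-definedness argument concrete, the following sketch evaluates an $\alpha$-Rényi divergence on a quasiprobability vector with a negative entry. Several assumptions are made that are not taken from the text above: the divergence is taken in the standard form $D_\alpha(w \| r) = \frac{1}{\alpha - 1} \log_2 \sum_i w_i^\alpha r_i^{1-\alpha}$, and $\alpha$ is restricted to positive even integers as a simple stand-in for the set $A$, so that $w_i^\alpha \ge 0$ even when $w_i < 0$. The function names are hypothetical.

```python
import math


def renyi_divergence(w, r, alpha=2):
    """D_alpha(w || r) for a real quasiprobability vector w and a strictly positive distribution r.

    Assumes alpha is a positive even integer, so every term w_i ** alpha is non-negative.
    """
    if alpha <= 0 or alpha % 2 != 0:
        raise ValueError("this sketch only handles positive even integer alpha")
    total = sum((wi ** alpha) * (ri ** (1 - alpha)) for wi, ri in zip(w, r))
    return math.log2(total) / (alpha - 1)


def apply_stochastic(matrix, vec):
    """Apply a column-stochastic matrix (given as a list of rows) to a vector."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]


w = [0.6, 0.6, -0.2]          # quasiprobability vector, sums to 1
r = [1 / 3, 1 / 3, 1 / 3]     # positive reference distribution
merge = [[1, 0, 0],           # stochastic map merging the last two outcomes
         [0, 1, 1]]

before = renyi_divergence(w, r)
after = renyi_divergence(apply_stochastic(merge, w), apply_stochastic(merge, r))
print(before, after)
```

In this example the divergence is real-valued despite the negative entry of `w`, and it does not increase under the stochastic map, which is the kind of behaviour the stochastic representation is used for in what follows.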
Importantly for our purposes, this abstract but general result applies to input and output systems of any (even or odd) finite dimension. With this in hand, we can now give a proof of Theorem 5, which we restate for clarity:
Theorem 5. Let $\rho$ be a noisy rebit magic state and $\tau$ be a CSS state in the interior of $D_{\mathrm{css}}$. If there exists a completely CSS-preserving protocol $E$ such that $E(\rho^{\otimes n}) = \rho'$ and $\tau' := E(\tau^{\otimes n})$ is also in the interior of $D_{\mathrm{css}}$, then for all $\alpha \in A$, where
Proof. Since $\tau$ and $\tau'$ are in the interior of $D_{\mathrm{css}}$, they are positively represented. Moreover, we have established in Theorem 2 that every trace-preserving and completely CSS-preserving operation is stochastically represented. The results of Lemma 15 thus apply, from which properties 3 and 4 combine to give the claimed result.
We recall from Section V that if there exists an $n$-to-$k$ CSS code projection that achieves the distillation $\rho^{\otimes n} \to \rho'$ with acceptance probability $p$, then
$$\Delta D_\alpha \ge 0 \quad \text{(D1)}$$
for all $\alpha \in A$, where We recall the following output states from the system and reference processes, where $\sigma$ is the CSS state prepared after a failed run. We now prove the following lemma, which enables us to simplify the expression of our constraint functions $\Delta D_\alpha$. To this end, we will find it useful to first introduce the general mean $Q_\alpha(w \| r)$ on a quasiprobability distribution $w := (w_1, \dots, w_N)^T$ and probability distribution $r := (r_1, \dots, r_N)^T$,
Lemma 16. Consider the following pairs of $k$-rebit states $(\rho_0, \rho_1)$ and $(\tau_0, \tau_1) \in \mathrm{Int}(D_{\mathrm{css}})$. Moreover let $\psi_0$ and $\psi_1$ be two distinct computational basis states on $m$ rebits, and let $\{p_i\}_{i=0}^{1}$, $\{q_i\}_{i=0}^{1}$ be valid probability distributions. We then have the identity which in turn implies the inequality for each $i \in \{0, 1\}$.
Proof. Since $\psi_0$ and $\psi_1$ are orthogonal, by Eq. (A12) this implies As $\psi_i \in D_{\mathrm{css}}$ for each $i \in \{0, 1\}$, we must have $W_{\psi_i}(u) \ge 0$. We thus conclude from Eq. (D8) that With this in hand, we can explicitly evaluate: where in the second equality we used the normalisation of our chosen representation $W$. The inequality in the Lemma statement then follows from the fact that both terms on the right hand side of Eq. (D6) must be non-negative for all $\alpha \in A$.
With this property in hand, we obtain the following Lemma, which makes the non-trivial $n$-dependence in $\Delta D_\alpha$ more explicit.
Lemma 17. Let $\mu_k$ denote the maximally mixed state on $k$ qubits. The function $\Delta D_\alpha$ can then be expressed as
Proof. By Lemma 16, we have the expansion where in the last equality we have used $Q_\alpha(p \| p) = 1$ for all probability distributions $p$. Therefore, We therefore conclude that so the entropic constraints on CSS code projections are unaffected by varying $\sigma$ in the interior of $D_{\mathrm{css}}$.

Proof of Lemma 7
We now prove the following properties of the constraint function $\Delta D_\alpha$ from the main text:
Lemma 7. The following properties of the relative entropy difference $\Delta D_\alpha$ hold for any noisy input rebit magic state $\rho$, output $k$-rebit magic state $\rho'$, acceptance probability $p < 1$ and $\alpha \in A$:
(i) $\Delta D_\alpha$ is concave in $n$.
(ii) $\Delta D_\alpha$ is negative in the limit where $n = k$:
then $\Delta D_\alpha$ is also negative in the asymptotic limit
Proof. Let us denote the maximally mixed state on $k$ qubits by $\mu_k$. We further simplify notation by defining the constants c
Proof of (i): Let us define the following function This means that from Lemma 17 we can write and since the first term is linear we need only check the second derivative of the second term to establish that $\Delta D_\alpha$ is concave. We have Since $c_1, c_2 \ge 0$ for all $\rho$ and $p$, the term in square brackets is non-negative for all $n > k$, $\alpha > 1$, $\rho$ and $p$ (strictly positive for $p < 1$), which implies $\partial_n^2 \Delta D_\alpha$ is nonpositive everywhere on our restricted domain. Therefore $\Delta D_\alpha$ is concave, as claimed.
Proof of (ii): Recalling that $\alpha > 1$, we have from Lemma 17 that so long as $c_2 > 1$, which is true if and only if $p < 1$.
Proof of (iii): We have: This completes the proof.

Analytic bounds on code length in qudit code projection protocols
In this section, we consider [[n, k]] stabilizer code projections for quantum systems of arbitrary Hilbert space dimension d, which are stochastic under some generalised Gross's Wigner representation.
Following Eq. (36), we can define the following trace-preserving projection where $U$ and $P$ are respectively the decoding channel and codespace projection for $C$, $P^{\perp}$ is the projection onto the orthogonal complement of $C$, and $\sigma$ is positively represented under some generalised Gross's Wigner representation of our choice. This channel transforms $n$ copies of an input noisy magic state $\rho$ as where we assume the output state $\rho'$ following successful code projection is $\delta$-close to $k$ copies of our pure target magic state $\psi$, as measured by the trace distance $\|\rho - \sigma\|_1$, where $\|X\|_1 := \mathrm{tr}|X| = \mathrm{tr}\sqrt{X^\dagger X}$ is the trace norm (also known as the Schatten-1 norm). Formally, we assume
$$\|\rho' - \psi^{\otimes k}\|_1 \le \delta.$$
We also define the Frobenius norm (also known as the Schatten-2 norm) as
$$\|X\|_2 := \sqrt{\mathrm{tr}(X^\dagger X)}.$$
We further define the $\ell_1$- and $\ell_2$-norms, respectively, of a vector $w \in \mathbb{R}^d$ as
$$\|w\|_1 := \sum_{i} |w_i|, \qquad \|w\|_2 := \Big(\sum_{i} w_i^2\Big)^{1/2}.$$
We now make use of the following result from the literature on real vector spaces (e.g. see [77]), which is a consequence of the Cauchy-Schwarz inequality.

Lemma 18. For all $w \in \mathbb{R}^d$, $\|w\|_1 \le \sqrt{d}\, \|w\|_2$.
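The standard Cauchy-Schwarz bound relating the two vector norms, $\|w\|_1 \le \sqrt{d}\,\|w\|_2$ for $w \in \mathbb{R}^d$, together with the monotonicity $\|w\|_2 \le \|w\|_1$, can be checked numerically. The sketch below is only an illustration; the helper names and the choice of random test vectors are assumptions made here.

```python
import math
import random


def l1_norm(w):
    """Sum of absolute values of the entries."""
    return sum(abs(x) for x in w)


def l2_norm(w):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in w))


def norms_consistent(w, tol=1e-12):
    """Check ||w||_2 <= ||w||_1 <= sqrt(d) ||w||_2 up to a small numerical tolerance."""
    d = len(w)
    return (l2_norm(w) <= l1_norm(w) + tol
            and l1_norm(w) <= math.sqrt(d) * l2_norm(w) + tol)


random.seed(0)
samples = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(1000)]
print(all(norms_consistent(w) for w in samples))

# The upper bound is saturated by vectors with entries of equal magnitude.
flat = [1, -1, 1, -1]
print(l1_norm(flat), math.sqrt(len(flat)) * l2_norm(flat))
```

Quasiprobability vectors with negative entries are covered as well, since both norms depend only on the magnitudes of the entries.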
This result enables us to show that vanishingly small variations in quantum states correspond to vanishingly small variations in their Wigner representations:
Lemma 19. If $\|\rho - \sigma\|_1 \le \epsilon$ then for any generalised Gross's Wigner representation $W$, we have
Proof. To simplify notation, we first define the state difference $\Delta := \rho - \sigma$ such that $W_\Delta = W_\rho - W_\sigma$. Since the Schatten-$p$ norms are non-increasing with respect to $p$ [77], we obtain where in the second inequality we employ Lemma 18. Therefore
Lemma 20. Let $\rho$ and $\sigma$ be two quantum states of a $d$-dimensional qudit such that $\|\rho - \sigma\|_1 \le \epsilon$. Then given any generalised Gross's Wigner representation $W$,
Proof. Theorem 7 (2) of Ref. [78] applies to quasiprobability distributions and tells us that for two $d^2$-dimensional distributions $w$, $w'$, we have the following continuity statement on the $\alpha$-Rényi entropies The proof of this just relies on the monotonicity of the $p$-norms $\|w\|_p := \big(\sum_i |w_i|^p\big)^{1/p}$.
We are now in a position to prove Theorem 9, which we restate here for clarity:
Theorem 9 (Qudit code bounds). Consider the distillation of $k$ copies of a pure magic state $\psi$ from a supply of the noisy magic state $\rho$, where $\psi$ and $\rho$ are $d$-dimensional qudit states that are real-represented under a generalised Gross's Wigner representation $W$. Any stochastically-represented distillation protocol that projects onto the codespace of an $[[n, k]]$ stabilizer code and can use $n$ copies of $\rho$ to distil out a $k$-qudit state $\rho'$ with acceptance probability $p$ and output error $\delta \ge \|\rho' - \psi^{\otimes k}\|_1$ must have a code length $n$ such that for all $\alpha \in A$ for which $H_\alpha(W_\rho) < \log d$, and for all $\alpha \in A$ for which $H_\alpha(W_\rho) > \log d$.
Proof. Let $\mu_k$ denote the maximally mixed state of $k$ qudits with Hilbert space dimension $d$. Following the same proof strategy as that of Lemma 16, we obtain where in the final equality we have used the identity We can now make use of the continuity of the Rényi entropy as stated in Lemma 20 to further lower-bound this divergence as where the equality follows from the factorisation of the $\alpha$-Rényi entropy over subsystems. This gives rise to the following upper bound on the relative entropy difference $\Delta D$.

(D36)
This gives a weaker but still necessary constraint on stochastic transformations accomplishing ρ ⊗n → ρ p and .
2. By Lemma 24, each operation $E_i$ can be decomposed into a sum where each operation $E_{i,s}$ first introduces $m$ ancillary qubits in the CSS state $|\psi_i\rangle$, then post-selects the $+1$ outcome in a sequence of projective measurements of CSS observables $S_{i,N}, \dots, S_{i,1}$, and then performs a CSS code projection on the input and ancillary qubits. Thus one can write (note that $K$ depends on $i$ and $s$)
3. By repeated applications of Lemma 25, we find that $E_{i,s}$ performs a CSS code projection on the input and ancillary qubits, followed by preparing a CSS state and completely CSS-preserving post-processing. Thus one can write where $q$ is a probability, $U$ is a completely CSS-preserving unitary channel on $k$ qubits, $|\varphi\rangle$ is a CSS state on $k - k'$ qubits for some integer $k'$ in the range $0 \le k' \le k$, and $K$ is a code projection for an $[[n + m, k']]$ CSS code.
4. By Lemma 27, each CSS code projection $K$ on the input and ancillary qubits from Eq. (E7) can be reduced to a CSS code projection on the input qubits alone, followed by preparing a CSS state and completely CSS-preserving post-processing. Thus one can write where $q$ is a probability, $U$ is a completely CSS-preserving unitary channel on $k'$ qubits, $K$ is the code projection for an $[[n, k'']]$ CSS code for some integer $k''$ in the range $0 \le k'' \le k'$, and $|\varphi\rangle$ is a CSS state on $k' - k''$ qubits. Substituting back then immediately yields the result.

Auxiliary lemmas
Before turning to the proofs of Lemmas used in each step of the main proof, we first present a result that will be useful throughout.
Lemma 22. Given any completely CSS-preserving unitary $U$ and CSS observable $S$ on $n$ qubits, $S' := U^\dagger S U$ is another CSS observable of the same type as $S$. This further implies $P(\pm S) U = U P(\pm S')$.
Proof. For convenience, let us label the $n$ qubits as $1, \dots, n$. Let $a$ be an arbitrary $n$-bit string, and $e_j$ be the $n$-bit string with 1 in its $j$th entry and 0 everywhere else. We then have the following conjugation relations
$$Z_j [X(a)] Z_j = (-1)^{a_j} X(a) \quad \text{(E9)}$$
$$Z_j [Z(a)] Z_j = Z(a) \quad \text{(E10)}$$
$$X_j [X(a)] X_j = X(a) \quad \text{(E11)}$$
$$X_j [Z(a)] X_j = (-1)^{a_j} Z(a) \quad \text{(E12)}$$
$$\mathrm{CNOT}(i, j) [X(a)]\, \mathrm{CNOT}(i, j) = X(a + a_i e_j) \quad \text{(E13)}$$
$$\mathrm{CNOT}(i, j) [Z(a)]\, \mathrm{CNOT}(i, j) = Z(a + a_j e_i), \quad \text{(E14)}$$
for any $i, j$ from the range $1, \dots, n$, where arithmetic is modulo 2. Since $S$ is of the form $\pm X(a)$ or $\pm Z(a)$, and $U$ is a product of CNOT$(i, j)$, $Z_j$ and $X_j$ for $i, j$ in the range $1, \dots, n$, we immediately arrive at the Lemma result.
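The conjugation relations (E9), (E12), (E13) and (E14) can be verified on two qubits with explicit integer matrices. The following is a minimal sketch using only the standard library; the helpers `kron`, `matmul`, `pauli_string` and `scale` are hypothetical names introduced here, and the CNOT is taken with qubit 1 as control and qubit 2 as target in the basis $|q_1 q_2\rangle$.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
# CNOT(1, 2): control on the first qubit, target on the second.
CNOT_12 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def pauli_string(single, bits):
    """X(a) or Z(a): apply `single` on every qubit where the bit is 1, identity elsewhere."""
    out = [[1]]
    for bit in bits:
        out = kron(out, single if bit else I2)
    return out


def scale(c, m):
    """Multiply a matrix by a scalar."""
    return [[c * x for x in row] for row in m]


def conj(u, p):
    """u p u for a self-inverse u (true for X_j, Z_j and CNOT)."""
    return matmul(matmul(u, p), u)


def relations_hold():
    z1, x1 = kron(Z, I2), kron(X, I2)
    for a in product([0, 1], repeat=2):
        xa, za = pauli_string(X, a), pauli_string(Z, a)
        sign = (-1) ** a[0]
        ok = (conj(z1, xa) == scale(sign, xa)                                            # (E9), j = 1
              and conj(x1, za) == scale(sign, za)                                        # (E12), j = 1
              and conj(CNOT_12, xa) == pauli_string(X, (a[0], (a[1] + a[0]) % 2))        # (E13)
              and conj(CNOT_12, za) == pauli_string(Z, ((a[0] + a[1]) % 2, a[1])))       # (E14)
        if not ok:
            return False
    return True


print(relations_hold())
```

Because all matrices have integer entries, the comparisons are exact; the same check extends to more qubits at the cost of larger matrices.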
It is furthermore useful to recall that, given an $[[n, k]]$ CSS code $C$ whose stabilizer group is generated by CSS observables $S_1, \dots, S_{n-k}$, Corollary 3 implies the code projection $K$ of $C$ can be represented using a completely CSS-preserving encoding unitary $U$ as $K(\rho) := \mathrm{tr}_{k+1,\dots,n}\big[U^\dagger P(\rho) P\, U\big]$. We remark that in the $k = 0$ case, $K$ simply projects onto a pure $n$-qubit CSS state and then discards it.
Step 1: Standard form for CSS magic distillation protocols
Lemma 23. Any $n$-to-$k$ CSS magic distillation protocol $E$ can be decomposed as a sum of $n$-qubit operations where $p_i$ is a probability and $E_i(\rho) := \mathrm{tr}_{k+1,\dots,n+m} K_i (\rho$ for a CSS state $|\psi_i\rangle$ on $m$ ancillary qubits and Kraus operator $K_i$. Furthermore, $K_i$ has the form where $U_i$ is a completely CSS-preserving unitary and $P(S_{i,l})$ projects onto the $+1$ eigenspace of a CSS observable $S_{i,l}$.
where $|\psi\rangle$ is a CSS state on $m$ qubits. We then have that We can then use Eq. (E90) to show that

Figure 1. (Finite range on CSS code lengths for magic state distillation protocols). We plot upper and lower bounds on the number of copies $n$ of the noisy Hadamard state $(1 - \epsilon)|H\rangle\langle H| + \epsilon\, \mathbb{1}/2$ required to distil a single output qubit $|H\rangle$ with output error rate $\delta = 10^{-9}$ by projecting onto an $[[n, 1]]$ CSS code. The shaded purple region shows the range of code lengths allowed by the tightest numeric upper bound (red curve) from Theorem 8 and the lower bound from projective robustness (PR) introduced in Ref. [46] (blue curve). The analytic upper bound $n^*$ (dashed yellow curve) defined in Eq. (2) is shown to form a good approximation to the numeric bound. (a) When the target acceptance probability $p$ is low ($p = 0.1$), the upper bounds are less constraining; (b) by increasing to $p = 0.9$, the upper bounds become considerably tighter. In both cases, there is a cut-off input error beyond which no CSS code projection protocol can achieve the desired combination of output error and acceptance probability.

Figure 2. (Schematic of our approach). We find that the set of completely CSS-preserving protocols $O$ is stochastically represented. Such protocols contain the family of CSS code projections as a subset, examples of which include the 7-1 and 23-1 protocols based respectively on the Steane $[[7, 1]]$ and Golay $[[23, 1]]$ codes [19].

Figure 3. (Lower bound comparison). We plot lower bounds on the number of copies $n$ of the noisy Hadamard state $(1 - \epsilon)|H\rangle\langle H| + \epsilon\, \mathbb{1}/2$ required to distil a single output qubit $|H\rangle$ with output error rate $\delta = 10^{-9}$ and acceptance probability $p = 0.9$ under a CSS code projection protocol, as a function of the input error rate $\epsilon$. Our tightest lower bound from majorization (maj.) is shown to be tighter than those from mana [56] and generalized robustness (GR) [63]. However, it only outperforms the lower bound from projective robustness (PR) [46] in the high-$p$, high-$\epsilon$ regime.

Figure 4. (Wigner-Rényi entropies & magic distillation). We plot the condition in Theorem 8 for the existence of finite upper bounds on $n$ in an $n$-to-1 CSS distillation for the qubit Hadamard state, signified by the region where $H_\alpha[W_{\rho(\epsilon)}] > 1$. Even in the limit of zero input error $\epsilon = 0$ we obtain a valid set of permissible $\alpha$, which implies that Hadamard state distillation under $n$-to-1 CSS code projection is ruled out in the asymptotic limit $n \to \infty$. We further highlight that the error rate $\epsilon = 0.3$ (dashed curve) is outside of the region where $\rho(\epsilon)$ is magic (0 ≤ ε < 1 − 1


Figure 5. (Explicit protocol comparison). (a) We compare the majorization upper bounds (dashed lines) on the acceptance probability $p$ with which one can distil a noisy Hadamard state $(1 - \epsilon)|H\rangle\langle H| + \epsilon\, \mathbb{1}/2$ via an $n$-to-1 code projection against actual acceptance probabilities attained using the Steane code (purple) at $n = 7$ and the Golay code (green) at $n = 23$ (detailed in Ref. [19]). Attained acceptance probabilities are orders of magnitude less than our upper bounds. (b) We plot the majorization upper bound (dashed line) on the acceptance probability $p$ of any 15-to-1 CSS code projection protocol with which one can distil the noisy magic state $(1 - \epsilon)|A\rangle\langle A| + \epsilon\, \mathbb{1}/2$. Interestingly, our bound is very close to the actual acceptance probability for the 15-to-1 protocol (blue line) given in [18], though we emphasise this latter protocol is not a straightforward CSS code projection.

Figure 6. (Majorization gives independent constraints over DPI). (a) Shown is how the (scaled) difference $\Delta n_U := n_U^{\mathrm{DPI}} - n_U^{\mathrm{maj}}$ varies over all possible values of acceptance probability $p$ and a realistic range of input error $\epsilon$, with fixed $\delta = 10^{-9}$. Whenever $\log_{10}(\Delta n_U + 1) > 0$, the upper bounds from majorization give tighter constraints than the DPI, reaching $\Delta n_U = O(10^4)$ in the low-$p$, low-$\epsilon$ regime. (b) We show the trade-off relation given by bounds on the maximum achievable fidelity $F_{\max}(\rho)$ vs. target acceptance probability $p$, under an $n$-to-1 CSS code projection, where $\rho = \tfrac{3}{4}|H\rangle\langle H| + \tfrac{1}{8}\mathbb{1}$. For $p$ above a given threshold ($\approx 0.6$) no perfect distillation is theoretically possible, even for $n \to \infty$ copies of the input state. Majorization (maj.) is shown to give stronger constraints than the DPI.

Figure 7. The sequence of primitive CSS channels on the top can be executed as the binary tree on the bottom.

$$E_{i,s}(\rho) = K \circ P(S_{i,1}) \circ \cdots \circ P(S_{i,N})\big[\rho \otimes |\psi_i\rangle\langle\psi_i|\big], \quad \text{(E6)}$$
in which $P(S_{i,j})$ post-selects the $+1$ outcome in a projective measurement of the CSS observable $S_{i,j}$, and $K$ is the code projection of an $[[n + m, k]]$ CSS code.
are now two possibilities: (a) that $S$ commutes with every generator of $S(|\psi\rangle)$, so $S$ or $-S$ must stabilize $|\psi\rangle$. Therefore, either $|\phi_+\rangle = |\psi\rangle$ and $|\phi_-\rangle = 0$ or vice versa, so $|\phi_\pm\rangle$ are possibly subnormalised CSS states. (b) that, without loss of generality, $S$ does not commute with just one CSS observable $S_1$ that generates $S(|\psi\rangle)$. This follows from the fact that, in any set of $m + n$ CSS observables that generate $S(|\psi\rangle)$, those that do not commute with $S$ must all be X- or Z-type, so by picking one such generator and multiplying all others by it, we obtain another set of $m + n$ CSS observables generating $S(|\psi\rangle)$ in which only one generator does not commute with $S$. Then the states $|\phi_\pm\rangle$ have norm $1/\sqrt{2}$ and are

Substituting Eq. (D14) and D α W ρ W 1 − H α [W ρ ] into Eq. (D2) gives the result as claimed.

2. Constraints are independent of the choice of CSS state σ prepared on failed code projection
By inspection, the form for $\Delta D_\alpha$ given in Lemma 17 has no $\sigma$-dependence, which implies
Corollary 4. The entropic constraints $\Delta D_\alpha \ge 0$ on $n$-to-$k$ CSS code projection protocols are independent of which CSS state $\sigma$ is prepared following failed runs.
This result also follows from resource-theoretic arguments. Consider a CSS circuit $E_\omega$ on $k + 1$ qubits that performs a Z-basis measurement on the last qubit and re-prepares the first $k$ qubits in a CSS state $\omega$ conditioned upon the $-1$ outcome. Thus one can write $E_\omega(\cdot) := I \otimes P_0(\cdot) + \omega\, \mathrm{tr} \otimes P_1(\cdot)$, where $P_k(\cdot) := |k\rangle\langle k| (\cdot) |k\rangle\langle k|$, from which one straightforwardly verifies that $E_\omega(\rho_p)$ and $E_\omega(\tau_{n,k})$ simply replace $\sigma$ in $\rho_p$ and $\tau_{n,k}$ respectively by $\omega$. Since $E_\omega$ is stochastically represented, given any $\omega$ in the interior of $D_{\mathrm{css}}$, we have by Property 4 in Lemma 15 that

$S(|\psi_0\rangle)$ is the stabilizer group of $|\psi_0\rangle$. Consequently, we can express $S_i$ as $S_i = S'_i \otimes T_i$, where $S'_i$ and $T_i$ are CSS observables of the same type as $S_i$ that respectively stabilize $|\psi_0\rangle$ and $|\psi\rangle$. Therefore, by multiplying each $S_i$ in Eq. (E81) by appropriate generators of the same type in $\{\mathbb{1}^{\otimes n} \otimes T_1, \dots, \mathbb{1}^{\otimes n} \otimes T_m\}$, we can represent the stabilizer group of $|s\rangle_C$ as
$$S(|s\rangle_C) = S(|\psi_s\rangle) \otimes S(|\psi\rangle), \quad \text{(E82)}$$
where the stabilizer group $S(|\psi_s\rangle)$ is generated by
$$S(|\psi_s\rangle) = \big\langle (-1)^{s_1} S'_1, \dots, (-1)^{s_k} S'_k, S'_{k+1}, \dots, S'_n \big\rangle, \quad \text{(E83)}$$
in which $S'_1, \dots, S'_k$ are Z-type because $S_1, \dots, S_k$ are Z-type. By applying Lemma 14 to $S'_1, \dots, S'_n$, we can find a logical basis $\{|s\rangle_C\}$ for an $[[n, k]]$ CSS code $C$ stabilized by $S'_{k+1}, \dots, S'_n$, generated by a completely CSS-preserving encoding unitary, such that $|s\rangle_C$ shares the stabilizer group of $|\psi_s\rangle$. Therefore, $|s\rangle_C$ and $|\psi_s\rangle$ only differ up to a phase, which implies the Lemma.
Lemma 27. Let $K$ be the code projection for an $[[n + m, k]]$ CSS code $C$ where $n \ge 1$, $n > k$ and $m > 0$. Then given any $m$-qubit CSS state $|\psi\rangle$, we have that $K([\cdot] \otimes |\psi\rangle\langle\psi|)$ is equivalent to a CSS code projection on $n$ qubits alone, followed by preparing a CSS state and completely CSS-preserving post-processing, i.e.
$$K(\rho \otimes |\psi\rangle\langle\psi|) = p\, \tilde{U} \circ \big[K'(\rho) \otimes |\varphi\rangle\langle\varphi|\big], \quad \text{(E84)}$$
where $p$ is a probability, $\tilde{U}$ is a completely CSS-preserving unitary channel on $k$ qubits, $K'$ is a code projection for an $[[n, k']]$ CSS code where $0 \le k' \le k$, and $|\varphi\rangle$ is a CSS state on $k - k'$ qubits.
Proof. Let $\{S_{n+1}, \dots, S_{n+m}\}$ be a set of CSS observables that generate the stabilizer group defining $|\psi\rangle$. Then $K(\rho \otimes |\psi\rangle\langle\psi|)$ is equivalent to
$$K(\rho \otimes |\psi\rangle\langle\psi|) = K\big(P[\rho \otimes |\psi\rangle\langle\psi|]P\big), \quad \text{(E85)}$$
where $P$ projects the last $m$ qubits onto $|\psi\rangle$, i.e.