Magic state distillation in all prime dimensions using quantum Reed-Muller codes

We propose families of protocols for magic state distillation, important components of fault-tolerance schemes, for systems of odd prime dimension. Our protocols utilize quantum Reed-Muller codes with transversal non-Clifford gates. We find that, in higher dimensions, small and effective codes can be used that have no direct analogue in qubit (two-dimensional) systems. We present several concrete protocols, including schemes for three-dimensional (qutrit) and five-dimensional (ququint) systems. The five-dimensional protocol is, by many measures, the best magic state distillation scheme yet discovered. It excels both in terms of error threshold with respect to depolarising noise (36.3%) and in terms of the efficiency measure known as "yield", where, for a large region of parameters, it outperforms its qubit counterpart by many orders of magnitude.

The central challenge of implementing scalable quantum computing is to protect quantum systems against noise and decoherence while retaining the capacity to perform computation. Quantum error correction and fault-tolerant techniques provide a solution to this problem, and a variety of constructions for fault-tolerant quantum computation have been proposed [1][2][3][4]. In all these schemes, a delicate balance must be maintained between coherently manipulating the encoded system, preserving the protected subspace, and prohibiting the proliferation of errors. For example, in schemes built on stabilizer codes [5] transversal gates have the desired properties, while in topological systems topologically protected braiding operations [2] provide the logical gates. While much work in quantum computation has focussed upon qubits (two-level systems), it is known that for any prime d, effective codes exist for storing d-level quantum systems [5][6][7]. Thus qudit systems are also candidates for scalable fault-tolerant quantum computation.
In many approaches, the protected unitary gates are a subset of the so-called Clifford group. The stabilizer operations (comprising Clifford unitaries as well as preparation and measurements in the computational basis) are known to be efficiently classically simulatable [5,6,8], and on their own are not universal for quantum computation. Furthermore, several theorems have shown [9][10][11][12] that, in general, there is a tension between providing protection against generic noise and achieving universal quantum computing. Despite these obstacles, fault-tolerant universal quantum computing is possible [1]. One particularly successful approach, known as state-injection, is to achieve universality by augmenting the fault-tolerant operations with a supply of many copies of a suitable ancillary resource state. While methods for direct preparation of sufficiently noise-free protected resource states have been proposed [1], a particularly elegant solution is provided by distillation techniques, where many noisy copies of a resource state are distilled to arbitrary fidelity using only error-protected operations, while preserving the error threshold of the model.

[Figure 1: schematic of one distillation round, from "more copies, lower fidelity" inputs $\rho$ through $\mathcal{E}$ to "fewer copies, higher fidelity" outputs $\rho'$.]
FIG. 1: An outline of a round of a magic state distillation protocol. Within many fault-tolerant quantum computing architectures, a large proportion of the device is committed to these magic state factories. Each attempt uses $n$ copies of a state $\rho$ and, when successful, outputs a state $\rho'\propto\mathcal{E}(\rho^{\otimes n})$. Given $n$ successful attempts, these are used as inputs to the next iterate. Within the magic states model the completely positive map $\mathcal{E}$ is composed of a sequence of Clifford unitaries and Pauli measurements. This figure illustrates a protocol with $n=4$, for example the ququint ($d=5$) protocol that we discuss throughout the article.

* Electronic address: earltcampbell@gmail.com
Here we consider the typical but idealized case where the available protected operations are perfect stabilizer operations. As such, whenever we speak of a qubit we mean a qubit encoded into either a topological system or a stabilizer code that provides protection of stabilizer operations. This has become known as the magic states model and was first studied by Bravyi and Kitaev [13], who proposed two protocols for distillation of qubit magic states, the resource for state-injection of a non-Clifford unitary. In parallel, Knill proposed the concept of a post-selected quantum computer [4,14,15] that used state preparation protocols that appeared distinct from magic state distillation, but were later shown to be equivalent [16]. These techniques are a key component of many fault-tolerance schemes, including, for example, the topological cluster state scheme [3,17-22].
Additional protocols were later discovered by Reichardt [16,23,24], enlarging the family of qubit states known to be distillable. Conversely, Campbell and Browne [25,26] showed that no finite iterative protocol could distill all mixed nonstabilizer states. Many other results have contributed to our understanding of the magic states theory for qubit systems [27][28][29][30], and a 5-qubit distillation protocol has been implemented in an NMR system [31].
The theory of higher-dimensional quantum computation [32][33][34][35], stabilizer operations and error-correcting codes [36,37] is well known. However, higher-dimensional magic state models have been largely neglected until recently. In anyonic systems the dimensionality of the available stabilizer operations is determined by the underlying physics, and so with some physical systems we would have no choice but to work in the higher-dimensional model (see e.g. [38,39]). Recent progress on this problem has centered on exploiting a discrete phase space, or Wigner function, representation of quantum states [40][41][42]. Notably, Veitch et al. [42] showed that states with positive Wigner functions can never be used as a resource for magic state distillation. Although all stabilizer states have positive Wigner functions, there also exist nonstabilizer states with positive Wigner functions; these are undistillable, and hence bound magic states exist.
These developments on no-go results took place without any known distillation protocols in higher dimensions. However, we have recently proposed a protocol for 3-dimensional (qutrit) systems that uses a generalization of the 5-qubit code [43]. Magic state distillation was observed there, but the error suppression was slower than in qubit protocols. Here we present a family of protocols that distil magic states in any odd prime dimension and do so with a quadratic reduction in noise per iteration. As such, the protocols are competitive with (and in some cases outperform) the best previously known qubit protocols.
Our protocols exploit higher-dimensional quantum Reed-Muller codes [44] and so generalize the qubit protocol of Bravyi and Kitaev [13], which used a 15-qubit quantum Reed-Muller code. This 15-qubit code was, to our knowledge, first developed by Knill, Laflamme and Zurek [45] and later studied by Steane [46]. These quantum codes are constructed from classical Reed-Muller codes [47][48][49][50][51][52], which have played a pivotal role in classical coding theory. Notably, the family of Reed-Muller codes includes the famous Reed-Solomon code used for communication with the Voyager space probe and for data storage on compact disks.
We begin with a formal description of the Clifford group and the magic states model. This allows us to state our main theorem: roughly, that magic state distillation is possible in higher dimensions. Next we review some basic theory of quantum error correction and show which properties of error-correcting codes would enable us to build a protocol for magic state distillation. This sets the stage for constructing codes, the quantum Reed-Muller codes, which have the required properties. We then introduce some additional tools from classical coding theory that help to simplify our analysis for a uniform depolarizing noise model.
We consider several measures of performance for many protocols applied to systems of dimension up to 19. The measures indicate that two protocols, for qutrits (3-dimensional systems) and ququints (5-dimensional systems), perform well compared both to qubit protocols and to protocols for even higher-dimensional systems. For these two protocols we investigate their performance in more detail. We find that the qutrit protocol performs well, but that the ququint protocol outperforms all other known magic state protocols, both in terms of the degree of error on the initial state it can tolerate and in terms of the efficiency of the protocol. We will see that the effectiveness of these protocols can be related to properties of the Clifford group. The startlingly good performance of these protocols makes higher-dimensional systems an enticing alternative to qubit systems.
Finally, we show how to perform state-injection to convert the distilled magic states into non-Clifford gates. The addition of any non-Clifford unitary to the set of $n$-qudit Clifford gates gives a set of gates dense in $SU(d^n)$, and so approximately universal via the Solovay-Kitaev theorem. This fact is well known for qubits, but is also true for general prime $d$, which follows from theorems proven by Nebe, Sloane and Rains in their recent book [53] (see Appendix D).

I. STABILIZER OPERATIONS AND THE MAGIC STATES MODEL
We are interested in $d$-dimensional quantum systems, or qudits, where $d$ is an odd prime. The computational basis states are labeled by $j\in\mathbb{F}_d$, where $\mathbb{F}_d$ denotes the finite field of $d$ elements. For such systems, the so-called Pauli group, $\mathcal{P}_d$, is generated by
$$X|j\rangle=|j\oplus1\rangle,\qquad Z|j\rangle=\omega^{j}|j\rangle,$$
where $\oplus$ is addition modulo $d$ and $\omega=\exp(i2\pi/d)$. The conjugation relation $XZ=\omega^{-1}ZX$ is easy to verify and is used throughout. The Pauli group over $n$ qudits, $\mathcal{P}_d^n$, is the $n$-fold tensor product of the single-qudit Pauli group. Consider an Abelian subgroup $S$ of the Pauli group that contains the identity but no other multiple of the identity, e.g. $\omega\mathbb{1}\notin S$. Associated with this group is a physical subspace, called a stabilizer code, and a projector onto this subspace, $\Pi\propto\sum_{s\in S}s$. We equate $\Pi$ with the code and call the group $S$ the stabilizer of the code. When the code is 1-dimensional, the projector describes a pure quantum state, which we call a pure stabilizer state. We also follow common terminology and call any probabilistic ensemble of pure stabilizer states a stabilizer state, even when there is no unique stabilizer group describing the mixture.
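As a quick numerical check of these definitions, the sketch below builds the shift and clock matrices for a few odd primes and verifies $XZ=\omega^{-1}ZX$. It is written in Python with NumPy, which is our choice of tooling, and the helper name `pauli_xz` is ours.

```python
import numpy as np


def pauli_xz(d):
    """Shift X|j> = |j+1 mod d> and clock Z|j> = omega^j |j> for one qudit."""
    omega = np.exp(2j * np.pi / d)
    X = np.roll(np.eye(d), 1, axis=0)  # column j holds a 1 in row (j + 1) mod d
    Z = np.diag(omega ** np.arange(d))
    return X, Z, omega


for d in (3, 5, 7):
    X, Z, omega = pauli_xz(d)
    assert np.allclose(X @ Z, Z @ X / omega)  # XZ = omega^{-1} ZX
    assert np.allclose(np.linalg.matrix_power(X, d), np.eye(d))  # X^d = 1
    assert np.allclose(np.linalg.matrix_power(Z, d), np.eye(d))  # Z^d = 1
print("XZ = omega^-1 ZX verified for d = 3, 5, 7")
```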
The Clifford unitaries $\mathcal{C}_d^n$ are those that conjugate the Pauli group to itself, so
$$\mathcal{C}_d^n=\{U:\ U\mathcal{P}_d^nU^\dagger=\mathcal{P}_d^n\}.$$
The whole Pauli group is a subgroup of the Clifford group. Gottesman [36] introduced several other Clifford gates, including single-qudit gates (among them the phase gate $P$ referred to below) and the two-qudit SUM (or generalized CNOT) gate, $\mathrm{SUM}|i\rangle|j\rangle=|i\rangle|i\oplus j\rangle$. These gates have been shown to generate the whole Clifford group [54]. The magic states model also allows the implementation of so-called Pauli measurements. Given any Pauli $U\in\mathcal{P}_d^n$, which we express as $U=\sum_{k=0}^{d-1}\omega^kU_k$ with $U_k$ the projectors onto its eigenspaces, we allow for POVM measurements with elements $\{U_k\}$. It is commonplace, though a modest abuse of terminology, to speak of measuring the Pauli $U$.
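The claim that SUM is a Clifford gate can be checked directly. In this minimal sketch, SUM maps each generator of the two-qudit Pauli group to another Pauli operator. The helper names `sum_gate` and `conj` are ours. The basis ordering $|i\rangle|j\rangle\mapsto id+j$ is an assumption chosen to match `np.kron`.

```python
import numpy as np


def sum_gate(d):
    """SUM|i>|j> = |i>|i + j mod d>, basis index i*d + j (the np.kron ordering)."""
    U = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            U[i * d + (i + j) % d, i * d + j] = 1
    return U


d = 5
omega = np.exp(2j * np.pi / d)
X = np.roll(np.eye(d), 1, axis=0)
Z = np.diag(omega ** np.arange(d))
I = np.eye(d)
S = sum_gate(d)


def conj(P):
    """Conjugate a two-qudit operator by SUM."""
    return S @ P @ S.conj().T


# Each two-qudit Pauli generator is mapped to another Pauli operator.
assert np.allclose(conj(np.kron(X, I)), np.kron(X, X))
assert np.allclose(conj(np.kron(I, X)), np.kron(I, X))
assert np.allclose(conj(np.kron(Z, I)), np.kron(Z, I))
assert np.allclose(conj(np.kron(I, Z)), np.kron(np.linalg.inv(Z), Z))
```

Because SUM is a permutation matrix, it is automatically unitary. The four identities above are enough to fix its action on the whole Pauli group.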
For an $n$-qudit system the density matrices lie within the set $B(\mathcal{H}_{d^n})$ of bounded operators acting on $\mathcal{H}_{d^n}=(\mathbb{C}^d)^{\otimes n}$. For such a space, the set of physical stabilizer operations allowed in the magic states model is captured by the following.
Definition 1 Consider a completely positive map $\mathcal{E}:B(\mathcal{H}_{d^n})\rightarrow B(\mathcal{H}_{d^{n'}})$. The map is a stabilizer operation if and only if it can be composed from the following:
1. Clifford unitaries;
2. measurements and subsequent projections onto stabilizer subspaces;
3. preparation of fresh ancillas in a stabilizer state;
4. tracing out of unwanted qudits;
5. adaptive decision making based on both measurement outcomes and random coin tosses.
The number of qudits output and input may differ, as is typically the case when magic state distillation is performed.

II. M-TYPE GATES AND M-DISTILLATION CODES
Every code defines an iterative scheme for magic state distillation. However, some codes are much more suitable than others, and their usefulness can often be inferred from abstract properties of the code. In particular, the 15-qubit Reed-Muller code exploited by Bravyi and Kitaev has a very special property. There exists a product operator, of the form $U^{\otimes n}$, that acts on the logical basis as a non-Clifford operator. Such a code is said to have transversal non-Clifford gates, and we will consider generalizations of the qubit Reed-Muller codes with this remarkable property.
The transversal non-Clifford gate of the 15-qubit code, the so-called π/8 gate denoted $U_{\pi/8}$, has another interesting property: for all Paulis $P\in\mathcal{P}_2^n$ we have $U_{\pi/8}^{\otimes n}P\,(U_{\pi/8}^{\otimes n})^\dagger\in\mathcal{C}_2^n$. Gottesman and Chuang defined this set of gates as the second level of an infinite hierarchy of qubit gates [55]. The hierarchy generalizes easily to qudits.

Definition 2
The $k$th level of the Clifford hierarchy for $n$ qudits is the set
$$\mathcal{C}_d^n(k)=\{U:\ UPU^\dagger\in\mathcal{C}_d^n(k-1)\ \text{for all}\ P\in\mathcal{P}_d^n\},$$
where the bottom level is the Pauli group, $\mathcal{C}_d^n(0)=\mathcal{P}_d^n$. The hierarchy is defined recursively, with the $k$th level being the set of unitaries that conjugate the Pauli operators to unitaries in the $(k-1)$th level. The bottom level is fixed as the Pauli group, and the first level is simply the Clifford group, $\mathcal{C}_d^n(1)=\mathcal{C}_d^n$. Higher levels, however, are not groups. The qudit gates of interest share these properties with the qubit $U_{\pi/8}$ gate and are defined as follows.
Definition 3 The set of gates $\mathcal{M}_d^m$ contains all $M$ such that:
1. $M$ is diagonal in the computational basis;
2. $M^{d^m}=\mathbb{1}$;
3. $\det(M)=1$;
4. $M\in\mathcal{C}_d^1(2)$ but $M\notin\mathcal{C}_d^1$.
We outline the motivation for these criteria and remark that $M$ can be remembered as short for magic. Conditions 1-3 will be directly related to the transversality of the gate for our quantum Reed-Muller codes. Furthermore, if we express the eigenvalues of $M$ as $\exp(i2\pi\lambda_j/d^m)$, then condition 2 entails that the $\lambda_j$ are integers, and condition 3 is satisfied when $\sum_j\lambda_j=0$. Condition 4 requires that while $M$ is a member of the second level of the Clifford hierarchy, it is not a member of the Clifford group itself. From this we conclude that the operator $C_M=MXM^\dagger$ is in the Clifford group but is not a Pauli operator. The eigenstates of $C_M$ will be the attractors of our distillation protocols, which is why it is essential that $C_M$ be a non-Pauli operator. Distillation would be possible without requiring that $C_M$ is a Clifford operator, but demanding this property provides us with tools that improve the protocols' efficiency. We observe that these sets form their own hierarchy, such that for any $m<m'$ we have $\mathcal{M}_d^m\subset\mathcal{M}_d^{m'}$. This holds because, almost trivially, $M^{d^{m+1}}=(M^{d^m})^d=\mathbb{1}^d=\mathbb{1}$. We remark also that if $M\in\mathcal{M}_d^m$ then $M^\dagger\in\mathcal{M}_d^m$, and we use this feature throughout.
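The conditions of Definition 3 can be checked numerically for small cases. The checks cover diagonality, order $d^m$, unit determinant (the $\sum_j\lambda_j=0$ condition), membership of $C_M=MXM^\dagger$ in the Clifford group, and $C_M$ being non-Pauli. The sketch assumes, for illustration only, two diagonal gates:
- the qutrit gate with $\lambda=(0,1,-1)$ and $m=2$;
- the ququint gate with $\lambda_j=\binom{j}{3}$ and $m=1$.

We do not claim these coincide with the canonical $M_d$ gate of Theorem 1. They are simply gates that pass all four tests. The helper names are ours.

```python
import numpy as np
from math import comb


def paulis(d):
    omega = np.exp(2j * np.pi / d)
    return np.roll(np.eye(d), 1, axis=0), np.diag(omega ** np.arange(d))


def pauli_up_to_phase(U, d):
    """True if the unitary U equals a phase times X^a Z^b for some a, b."""
    X, Z = paulis(d)
    for a in range(d):
        for b in range(d):
            P = np.linalg.matrix_power(X, a) @ np.linalg.matrix_power(Z, b)
            # |tr(P^dag U)| = d exactly when P^dag U is a multiple of the identity.
            if np.isclose(abs(np.trace(P.conj().T @ U)), d):
                return True
    return False


def is_clifford(U, d):
    """U is Clifford if it maps the generators X and Z to Paulis (up to phase)."""
    X, Z = paulis(d)
    return all(pauli_up_to_phase(U @ P @ U.conj().T, d) for P in (X, Z))


def check_definition_3(lams, d, m):
    """Conditions 2-4 for M = diag(exp(2 pi i lam_j / d^m)); condition 1 holds by construction."""
    M = np.diag(np.exp(2j * np.pi * np.asarray(lams) / d**m))
    X, _ = paulis(d)
    C_M = M @ X @ M.conj().T  # M Z M^dag = Z, so only M X M^dag needs checking
    return {
        "order": np.allclose(np.linalg.matrix_power(M, d**m), np.eye(d)),
        "det_one": bool(np.isclose(np.linalg.det(M), 1)),
        "second_level": is_clifford(C_M, d),
        "not_clifford": not pauli_up_to_phase(C_M, d),
    }


print(check_definition_3([0, 1, -1], 3, 2))                      # qutrit, m = 2
print(check_definition_3([comb(j, 3) for j in range(5)], 5, 1))  # ququint, m = 1
print(check_definition_3([0, 1, 2], 3, 1))                       # this M is just Z
```

The last line is a control case. Since $M=Z$ is itself a Pauli operator, the `not_clifford` entry comes out `False`.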
For every such set that is non-empty we will design protocols that distill eigenstates of C M .However, we need to know that such gates exist.In the qubit setting, the π/8-phase gate provides such a unitary for m = 4.However, for m < 4 it is easy to check that all qubit gates with the form required by conditions (1-3) of the above definition are Clifford unitaries and so fail condition (4).Remarkably, for all odd prime dimensions d ≥ 3 we can find such gates for m = 2, and when d ≥ 5 these gates exist for m = 1 [56].Using tall brackets to denote binomial coefficients we have the following.
Theorem 1 For all odd primes $d$, there exists a gate $M$ such that:
1. for $d=3$ we have $M\in\mathcal{M}_d^m$ for all $m\ge2$;
2. for prime $d\ge5$ we have $M\in\mathcal{M}_d^m$ for all $m\ge1$.
One such gate is the following with We refer to this M as the canonical M d gate.
In particular, the canonical $M_d$ gate is associated with the non-Pauli Clifford unitary $C_M=MXM^\dagger$, which can be expressed using the Clifford gate $P$ introduced earlier in Eq. (2). Clearly, a different $M$ exists for every dimension $d$; for notational clarity we suppress this $d$ dependence. The proof of Theorem 1 is elementary, but for completeness it is given in App. A. That odd prime dimensions produce the desired gates with smaller $m$ is no mere technicality: it has far-reaching benefits for magic state distillation in higher dimensions. We also remark that these gates, for $d=3,5$, are Clifford equivalent to those found in Ref. [41] to be the most robust to depolarizing noise before becoming stabilizer operations. This family of gates was discovered independently of, and concurrently with, this paper by Howard and Vala [56], and many other important properties of such gates can be found there. The eigenstates of $C_M$ are non-stabilizer states, which we label $|M_k\rangle$. We note that $|M_k\rangle=M|+_k\rangle$, where $|+_k\rangle$ is an eigenstate of $X$ with eigenvalue $\omega^k$. We aim to use magic state distillation to purify copies of $|M_0\rangle$ from noisy copies, and in turn to use these for fault-tolerant state-injection of the magic unitary $M$. This brings us to our main result.

Theorem 2 There exists a stabilizer operation, $\mathcal{E}$, that iteratively distils the magic state $|M_0\rangle$. The map $\mathcal{E}$ takes $n=d^m-1$ copies of a qudit state $\rho$, where $\epsilon=1-\langle M_0|\rho|M_0\rangle$. With non-zero probability the protocol outputs a state $\rho'\propto\mathcal{E}(\rho^{\otimes n})$ such that $\epsilon'=1-\langle M_0^\dagger|\rho'|M_0^\dagger\rangle$, where $|M_0^\dagger\rangle=M^\dagger|+_0\rangle$. There exists a $K>0$ such that for all $\epsilon$ we have $\epsilon'\le K\epsilon^2$. Consequently, there exists a threshold $\epsilon^*>0$ such that if $0<\epsilon<\epsilon^*$ then $\epsilon'<\epsilon$.
Notice that after a single iterate, using noisy $|M_0\rangle$ states as input, the protocol will output a noisy $|M_0^\dagger\rangle$ state. By performing an even number of iterations a fixed state can be distilled. We call this phenomenon cycling; in many cases it may be prevented by a Clifford unitary correction. However, cycling can be desirable, as it provides a mechanism for producing both $|M_0\rangle$ and $|M_0^\dagger\rangle$ states. The rate of error suppression is always quadratic, and so these results give the first better-than-linear error reduction in higher-dimensional systems. The Clifford unitary $C_M$ plays a practical role in several steps of our protocols. First, it is used for $C_M$-twirling, a process for converting input states into a canonical form. By randomly choosing an integer $k=1,\ldots,d$ and applying $C_M^k$, we twirl any quantum state into a mixture that is diagonal in the $\{|M_k\rangle\}$ basis. Hence all qudit states $\rho$ can be twirled into a form that depends on only $d-1$ independent parameters, such that
$$\rho=\sum_{k=0}^{d-1}f_k|M_k\rangle\langle M_k|,\qquad\sum_{k=0}^{d-1}f_k=1.$$
Our distillation protocols seek to increase the value of $f_0$. We will see that $C_M$ is also used in our protocols for Clifford correction, which significantly increases the success probability, and as part of the final state-injection.
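$C_M$-twirling can be illustrated numerically. The sketch assumes the illustrative ququint gate $M=\mathrm{diag}(\omega^{\binom{j}{3}})$, which is our choice and not necessarily the canonical $M_5$. It takes $C_M=MXM^\dagger$, whose eigenstates are $|M_k\rangle=M|+_k\rangle$. Averaging over $C_M^k$ for $k=1,\ldots,d$ removes all coherences in the $\{|M_k\rangle\}$ basis. The weights $f_k=\langle M_k|\rho|M_k\rangle$ are left unchanged.

```python
import numpy as np
from math import comb

d = 5
omega = np.exp(2j * np.pi / d)
X = np.roll(np.eye(d), 1, axis=0)
# Illustrative ququint gate (an assumption, not necessarily the canonical M_5).
M = np.diag(omega ** np.array([comb(j, 3) for j in range(d)]))
C_M = M @ X @ M.conj().T

# Column k of `plus` is |+_k>, with X|+_k> = omega^k |+_k>; column k of `magic` is |M_k>.
plus = np.array([[omega ** (-k * j) for k in range(d)] for j in range(d)]) / np.sqrt(d)
magic = M @ plus


def twirl(rho):
    """C_M-twirl: uniform average of C_M^k rho C_M^-k over k = 1, ..., d."""
    out = np.zeros((d, d), dtype=complex)
    for k in range(1, d + 1):
        U = np.linalg.matrix_power(C_M, k)
        out += U @ rho @ U.conj().T
    return out / d


# A random full-rank density matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho)

t = magic.conj().T @ twirl(rho) @ magic  # twirled state written in the |M_k> basis
f = np.real(np.diag(t))                  # f_k = <M_k|rho|M_k>
print(np.round(f, 4), "sum =", round(f.sum(), 6))
```

The twirl works because the eigenvalues $\omega^k$ of $C_M$ are all distinct. Every off-diagonal term therefore picks up a nontrivial root of unity and averages to zero.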

III. MAGIC STATE DISTILLATION PROTOCOLS
A. CSS codes
Calderbank, Shor and Steane identified a special class of quantum codes, which in their honor are now known as CSS codes [57]. These codes have stabilizers generated by two subgroups, $S_Z$ and $S_X$, which contain only $Z^k$ and $X^k$ terms, respectively. Therefore, the code projector has the form $\Pi_S=\Pi_{S_X}\Pi_{S_Z}$. All CSS codes can also be described by a pair of classical vector spaces, which correspond to $S_Z$ and $S_X$. If we have a vector $u\in\mathbb{F}_d^n$ and a single-qudit operator $U$, then we define the $n$-qudit operator
$$U[u]=U^{u_1}\otimes U^{u_2}\otimes\cdots\otimes U^{u_n}.$$
The $k$th element of the vector $u$ tells us what power of $U$ acts on the $k$th qudit. It follows that for every $s\in S_Z$ we can find a $u$ such that $s=Z[u]$ with $u\in L_Z$, where $L_Z$ is a linear vector space. The closure of the stabilizer group under multiplication directly corresponds to closure of $L_Z$ under addition modulo $d$. Similarly, we can find a linear code, $L_X$, for $S_X$. The whole stabilizer must be Abelian, and so for all $u\in L_X$ and $v\in L_Z$ we require $\langle u,v\rangle=\bigoplus_ju_jv_j=0$. Furthermore, for any code $L$ we define the dual code
$$L^\perp=\{u\in\mathbb{F}_d^n:\ \langle u,v\rangle=0\ \text{for all}\ v\in L\}.$$
In terms of duality, commutation inside the stabilizer equates to $L_X\subseteq L_Z^\perp$, or equivalently $L_Z\subseteq L_X^\perp$. The dimensionalities of the duals are related by $\mathrm{Dim}(L^\perp)=n-\mathrm{Dim}(L)$, where $n$ is the dimension of the vector space $\mathbb{F}_d^n$ they inhabit. Finally,
$$k=n-\mathrm{Dim}(L_X)-\mathrm{Dim}(L_Z)$$
gives the number of logical qudits supported by the code $\Pi$.
Here we are solely interested in stabilizer codes of only $d$ dimensions, that is, a single logical qudit. It is useful to specify a basis spanning the code, which we again do using Pauli operators $Z_L$ and $X_L$. These are the so-called logical operators of the subspace, and they must commute with the code stabilizer. With respect to each other, however, the logical operators must conjugate in the same way as $Z$ and $X$, such that $X_LZ_L=\omega^{-1}Z_LX_L$. It follows that there exists an orthonormal basis $\{|j_L\rangle\}$ of stabilizer states that obey
$$Z_L|j_L\rangle=\omega^j|j_L\rangle,\qquad X_L|j_L\rangle=|(j\oplus1)_L\rangle,$$
and which we call the logical basis. In this basis, the code projector can be expressed as $\Pi=\sum_j|j_L\rangle\langle j_L|$. We also make use of the $X$-basis, denoted $|+_j\rangle$ for single qudits stabilized by $\omega^{-j}X$ and $|+^L_j\rangle$ for logical encoded states stabilized by $\omega^{-j}X_L$. Typically, such logical operators can also be expressed in terms of vectors, such as $X_L=X[u]$, where commutation of $X_L$ with $S_Z$ entails $u\in L_Z^\perp$ and $L_Z\subseteq u^\perp$. Given this vector description, a useful fact is that $L_Z=(\mathrm{span}(L_X,u))^\perp$, where $\mathrm{span}(\ldots,\ldots)$ is the vector space generated by its arguments. Let us prove this by first observing that, since $L_Z\subseteq u^\perp$ and $L_Z\subseteq L_X^\perp$, we have $L_Z\subseteq(\mathrm{span}(L_X,u))^\perp$. That $L_Z$ can be no smaller than this set follows from dimension counting; that is,
$$\mathrm{Dim}\big((\mathrm{span}(L_X,u))^\perp\big)=n-\mathrm{Dim}(L_X)-1,$$
since $u\notin L_X$ (otherwise $X_L$ would be a stabilizer). Since we have a single logical qudit, $k=1$, we know also that
$$\mathrm{Dim}(L_Z)=n-\mathrm{Dim}(L_X)-1.$$
Since the dimensionalities match, the assertion is proven. Taking also $Z_L=Z[v]$ and noting $(L^\perp)^\perp=L$, many such results for single-qudit codes can be deduced by similar reasoning:
$$L_Z=(\mathrm{span}(L_X,u))^\perp,\quad L_X=(\mathrm{span}(L_Z,v))^\perp,\quad L_Z^\perp=\mathrm{span}(L_X,u),\quad L_X^\perp=\mathrm{span}(L_Z,v).$$
We employ the above relations throughout.
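The dimension counting and the span/dual relations above can be checked by brute force on a small case. The sketch uses a hypothetical pair that we chose because it satisfies the hypotheses: $d=5$, $n=4$, $L_X=\mathrm{span}\{(1,2,3,4)\}$ and $L_Z=L_X^\perp\cap\mathbf{1}^\perp$. It takes $u=\mathbf{1}$ and $v$ a nonzero multiple of $\mathbf{1}$. The helper names are ours. Enumerating all of $\mathbb{F}_d^n$ is only feasible for tiny $n$.

```python
import itertools
import math

d, n = 5, 4
one = (1,) * n


def inner(u, v):
    return sum(a * b for a, b in zip(u, v)) % d


def span(gens):
    """All F_d-linear combinations of the generators, as a set of tuples."""
    vecs = {(0,) * n}
    for g in gens:
        vecs = {tuple((x + c * y) % d for x, y in zip(v, g)) for v in vecs for c in range(d)}
    return vecs


def dual(L):
    """L^perp: all vectors of F_d^n orthogonal (mod d) to every element of L."""
    return {u for u in itertools.product(range(d), repeat=n) if all(inner(u, v) == 0 for v in L)}


def dim(L):
    return round(math.log(len(L), d))


L_X = span([(1, 2, 3, 4)])                        # hypothetical example
L_Z = {v for v in dual(L_X) if sum(v) % d == 0}   # L_X^perp intersected with 1^perp

assert L_X <= dual(L_Z)                           # X and Z stabilizers commute
k = n - dim(L_X) - dim(L_Z)
print("Dim(L_X) =", dim(L_X), " Dim(L_Z) =", dim(L_Z), " logical qudits k =", k)
assert inner(one, (d - 1,) * n) == 1              # X[1] and Z[(d-1)1] obey X_L Z_L = w^-1 Z_L X_L
assert L_Z == dual(span([(1, 2, 3, 4), one]))     # L_Z = span(L_X, u)^perp with u = 1
assert dual(L_X) == span(list(L_Z) + [one])       # L_X^perp = span(L_Z, v), v a multiple of 1
assert dual(dual(L_Z)) == L_Z                     # (L^perp)^perp = L
```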
The smallest operator capable of acting non-trivially on the encoded information determines the robustness of the code to noise. For CSS codes it suffices to consider phase-flip and bit-flip noise separately. For an operator $U[u]$, its "size" is measured by the Hamming weight, $|u|_H=\#\{j:\ u_j\neq0\}$, the number of qudits upon which the operator acts non-trivially. The robustness to phase noise is measured by the distance $D$, the smallest Hamming weight $|u|_H$ of a nonzero $u\in L_X^\perp$, i.e. of a nontrivial phase-flip operator $Z[u]$ that commutes with $S_X$. Finally, we remark that for any code there always exists a Clifford unitary $V$ that decodes, such that $V|j_L\rangle=|j\rangle\otimes|0\rangle^{\otimes(n-1)}$.
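Hamming weights and the phase-flip distance can be computed by brute force for small codes. The sketch again assumes a hypothetical pair over $\mathbb{F}_5$: $L_X=\mathrm{span}\{(1,2,3,4)\}$ and $L_Z=L_X^\perp\cap\mathbf{1}^\perp$. It reports two quantities:
- the minimum weight over nonzero $u\in L_X^\perp$;
- the minimum weight of a nontrivial logical $Z[u]$, that is, $u\in L_X^\perp\setminus L_Z$.

For this pair the two coincide.

```python
import itertools
from collections import Counter

d, n = 5, 4


def dual(L):
    return {u for u in itertools.product(range(d), repeat=n)
            if all(sum(a * b for a, b in zip(u, v)) % d == 0 for v in L)}


def hamming_weight(u):
    """|u|_H: number of qudits on which X[u] or Z[u] acts non-trivially."""
    return sum(1 for x in u if x != 0)


# Hypothetical example pair: L_X = span{(1,2,3,4)}, L_Z = L_X^perp intersected with 1^perp.
L_X = {tuple(c * g % d for g in (1, 2, 3, 4)) for c in range(d)}
L_X_perp = dual(L_X)
L_Z = {v for v in L_X_perp if sum(v) % d == 0}

# Distance as the smallest weight of a nonzero u in L_X^perp.
D = min(hamming_weight(u) for u in L_X_perp if any(u))
# Lightest non-trivial logical Z[u] (u in L_X^perp but not in L_Z).
D_logical = min(hamming_weight(u) for u in L_X_perp - L_Z)

print("D =", D, " D_logical =", D_logical)
print("weights in L_Z:     ", sorted(Counter(map(hamming_weight, L_Z)).items()))
print("weights in L_X^perp:", sorted(Counter(map(hamming_weight, L_X_perp)).items()))
```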

B. Suitable codes
We now define the broad class of quantum codes that we show can be used to distill these magic states.
We have introduced the vector shorthand $\mathbf{1}=(1,1,\ldots,1)$. Notice that we require a special kind of transversality, such that the logical operator $M_L^\dagger$ is implemented by applying $M^{\otimes n}$. The need for Hermitian conjugation will be explained later, and will be seen to result in a cycling phenomenon in the distillation protocol.
Here we show that all $\mathcal{M}_d^m$-distillation codes can be used to distill magic states of the form $|M_0\rangle=M|+_0\rangle$ for all $M\in\mathcal{M}_d^m$. Due to cycling, after a single iterate using noisy $|M_0\rangle$ states as input, the protocol will output a noisy $|M_0^\dagger\rangle$ state.
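Theorem 3 Given an $n$-qudit $\mathcal{M}_d^m$-distillation code of distance $D$, the following holds. For all $M\in\mathcal{M}_d^m$ there exists a stabilizer operation, $\mathcal{E}$, that iteratively distils the magic state $|M_0\rangle$. The protocol takes as input $n$ copies of a state $\rho$, where $\epsilon=1-\langle M_0|\rho|M_0\rangle$. With non-zero probability the protocol outputs a state $\rho'\propto\mathcal{E}(\rho^{\otimes n})$ such that $\epsilon'=1-\langle M_0^\dagger|\rho'|M_0^\dagger\rangle$. There exists a $K>0$ such that for all $\epsilon$ we have $\epsilon'\le K\epsilon^D$. Consequently, there exists a threshold $\epsilon^*>0$ such that if $0<\epsilon<\epsilon^*$ then $\epsilon'<\epsilon$.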
Later we show the existence of the required codes with distance $D=2$ and $n=d^m-1$, which will then entail Thm. 2. For now we show how to proceed given such a code.

C. The protocol
We prove the above key result constructively. Given an $n$-qudit $\mathcal{M}_d^m$-distillation code and any $M\in\mathcal{M}_d^m$, we can perform the following iterative magic state distillation protocol. When iterating the protocol, on the odd iterates we must replace $C_M$ by $C_M^\dagger$ to account for cycling. We have not yet defined the exact setting of $C_M[w]$, but will come to this in due time. For simplicity, we begin by assuming that step 2 generates all "+1" measurement outcomes, for which $C_M[w]=\mathbb{1}$. We explain later how the Clifford correction in step 3 increases the success probability.
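After $C_M$-twirling the $n$ copies we have a state
$$\rho^{\otimes n}=\sum_{v\in\mathbb{F}_d^n}\alpha_v|M_v\rangle\langle M_v|,\qquad\alpha_v=\prod_{k=0}^{d-1}f_k^{\mathrm{wt}_k(v)},$$
where $|M_v\rangle=|M_{v_1}\rangle\otimes\cdots\otimes|M_{v_n}\rangle$, $\mathrm{wt}_k(v)$ is the $k$-weight (the number of elements of $v$ equal to $k$), and $f_k=\langle M_k|\rho|M_k\rangle$. We note that $|M_v\rangle=M^{\otimes n}|+_v\rangle=M_L^\dagger|+_v\rangle$, where $|+_v\rangle=|+_{v_1}\rangle\otimes\cdots\otimes|+_{v_n}\rangle$ and $M_L^\dagger=M^{\otimes n}$. Upon a successful projection onto the code subspace, we have
$$\Pi\rho^{\otimes n}\Pi=M_L^\dagger\Big(\sum_v\alpha_v\,\Pi|+_v\rangle\langle+_v|\Pi\Big)M_L,$$
as the projector commutes with $M_L$. We need to determine the effect of each term $\Pi|+_v\rangle$. We will find that $\Pi|+_v\rangle=0$ for all $v\notin L_X^\perp$. For $v\in\mathrm{span}(L_Z,\mathbf{1})$, the projection instead gives, up to a phase, $\sqrt{c}\,|+^L_j\rangle$, with $j$ fixed by $v\oplus j\mathbf{1}\in L_Z$. By virtue of Eq. (16) we know $L_X^\perp=\mathrm{span}(L_Z,\mathbf{1})$, and so these two cases account for all possible $v$. The constant $c$ gives the probability of this projection when the initial state is pure, $c=\langle+_0|^{\otimes n}\,\Pi\,|+_0\rangle^{\otimes n}$. Furthermore, $|+_0\rangle^{\otimes n}$ is an eigenstate of $\Pi_{S_X}$, and so this randomness can be completely attributed to the $Z$ stabilizer measurements, which can be made deterministic by Clifford correction. These relations follow directly from properties of error-correcting codes, but for completeness more details are given in App. B. In summary, the transversality of $M$ allows us to treat the distillation of magic states $|M_0\rangle$ as equivalent to the simpler problem of distillation in the $X$ basis. Combining these results, we have
$$\Pi\rho^{\otimes n}\Pi=c\sum_{j=0}^{d-1}\Big(\sum_{v:\,v\oplus j\mathbf{1}\in L_Z}\alpha_v\Big)M_L^\dagger|+^L_j\rangle\langle+^L_j|M_L.$$
The output state is diagonal in the basis $M_L^\dagger|+^L_j\rangle$ rather than the desired $M_L|+^L_j\rangle$. We reiterate that this cycling is not problematic, as an even number of iterations always brings us back to the initial basis. Decoding onto a single qudit and using $\mathcal{E}$ to denote the whole process, we have
$$\mathcal{E}(\rho^{\otimes n})\propto\sum_{j=0}^{d-1}\Big(\sum_{v:\,v\oplus j\mathbf{1}\in L_Z}\alpha_v\Big)|M^\dagger_j\rangle\langle M^\dagger_j|,\qquad|M^\dagger_j\rangle=M^\dagger|+_j\rangle.$$
By expanding out $\alpha_v$, we get an iterative formula for the output weights,
$$f'_j=\frac{1}{P}\sum_{v:\,v\oplus j\mathbf{1}\in L_Z}\ \prod_{k=0}^{d-1}f_k^{\mathrm{wt}_k(v)},$$
which has been renormalized by dividing through by the success probability $P$. This probability equals the sum of the numerators, which is
$$P=\sum_{j=0}^{d-1}\ \sum_{v:\,v\oplus j\mathbf{1}\in L_Z}\ \prod_{k=0}^{d-1}f_k^{\mathrm{wt}_k(v)}.$$
The summation over all $j$ such that $v\oplus j\mathbf{1}\in L_Z$ is equivalent to a sum over all $v\in\mathrm{span}(L_Z,(d-1)\mathbf{1})$. Using the features of CSS codes (see Eq. (15)) we know $\mathrm{span}(L_Z,(d-1)\mathbf{1})=L_X^\perp$, and so
$$P=\sum_{v\in L_X^\perp}\ \prod_{k=0}^{d-1}f_k^{\mathrm{wt}_k(v)}.$$
Notice that we have dropped a factor of $c$ from the success probability, which will be justified later by Clifford correction. Both numerator and denominator of $f'_j$ are polynomials of degree $n$, and can be calculated from the classical codes.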
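The iterative formula can be evaluated directly from the two classical codes. This sketch computes $f'_j$ and the success probability $P$ (up to the factor $c$) for an arbitrary, non-depolarizing input. It uses the identity $\prod_kf_k^{\mathrm{wt}_k(v)}=\prod_if_{v_i}$. The pair $L_X=\mathrm{span}\{(1,2,3,4)\}$, $L_Z=L_X^\perp\cap\mathbf{1}^\perp$ over $\mathbb{F}_5$ is a hypothetical example of ours, as are the function names.

```python
import itertools
import numpy as np

d, n = 5, 4


def dual(L):
    return {u for u in itertools.product(range(d), repeat=n)
            if all(sum(a * b for a, b in zip(u, v)) % d == 0 for v in L)}


# Hypothetical example pair over F_5 (an assumption made for illustration).
L_X = {tuple(c * g % d for g in (1, 2, 3, 4)) for c in range(d)}
L_X_perp = dual(L_X)
L_Z = {v for v in L_X_perp if sum(v) % d == 0}


def alpha(v, f):
    """alpha_v = prod_k f_k^{wt_k(v)}, which equals prod_i f_{v_i}."""
    return float(np.prod([f[x] for x in v]))


def iterate(f):
    """One round: input weights f_k, output weights f'_j (for the |M^dag_j> basis)."""
    P = sum(alpha(v, f) for v in L_X_perp)  # success probability, up to the factor c
    out = np.zeros(d)
    for v in L_X_perp:
        # Each v in L_X^perp = span(L_Z, 1) lies in exactly one coset: v + j*1 in L_Z.
        j = next(j for j in range(d) if tuple((x + j) % d for x in v) in L_Z)
        out[j] += alpha(v, f)
    return out / P, P


f = np.array([0.9, 0.04, 0.03, 0.02, 0.01])  # a non-depolarizing example input
f_out, P = iterate(f)
print("f' =", np.round(f_out, 5), " P =", round(P, 5))
```

Since the cosets $L_Z\ominus j\mathbf{1}$ partition $L_X^\perp$, the outputs automatically sum to one after dividing by $P$.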

D. Analyzing the iterative formulae
Here we consider some properties of the above iterative formulae. First we consider a simple depolarizing noise model and give a Taylor-series approximation. Next, we consider a completely general noise model and show the existence of a distillation threshold.
When the noise is depolarizing, so that $f_{j\neq0}=\epsilon/(d-1)$ and $f_0=1-\epsilon$, the formula for the fidelity simplifies to
$$f'_0=\frac{\sum_{v\in L_Z}f_0^{\,n-|v|_H}\big(\tfrac{\epsilon}{d-1}\big)^{|v|_H}}{\sum_{v\in L_X^\perp}f_0^{\,n-|v|_H}\big(\tfrac{\epsilon}{d-1}\big)^{|v|_H}},$$
where $|\cdot|_H$ is again the Hamming weight. The factors $f_0^n$ appear in both numerator and denominator and so cancel. Making use of the shorthand
$$\mu=\frac{\epsilon}{(d-1)(1-\epsilon)},$$
we can further simplify the fidelity formula to
$$f'_0=\frac{\sum_{v\in L_Z}\mu^{|v|_H}}{\sum_{v\in L_X^\perp}\mu^{|v|_H}}.$$
Such cases are easier to study, as they depend on only a single parameter and the simple Hamming weights. Indeed, we will show later, in Sec. IV E, that this simple form can be further simplified by leveraging some powerful techniques from classical coding theory. For now we make some casual observations concerning quadratic error suppression.
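In the depolarizing case the whole iteration reduces to the map $\epsilon\mapsto\epsilon'=1-f'_0$, and its nontrivial fixed point is the threshold. A minimal sketch evaluates the map from the Hamming-weight sums and locates the fixed point by bisection. It assumes the same kind of hypothetical pair as a worked example: $L_X=\mathrm{span}\{(1,2,3,4)\}$ and $L_Z=L_X^\perp\cap\mathbf{1}^\perp$ over $\mathbb{F}_5$. The bracket $[0.3,0.5]$ for the bisection is also our choice.

```python
import itertools

d, n = 5, 4


def dual(L):
    return {u for u in itertools.product(range(d), repeat=n)
            if all(sum(a * b for a, b in zip(u, v)) % d == 0 for v in L)}


def weight(v):
    return sum(1 for x in v if x != 0)


# Hypothetical example pair over F_5 (assumed for illustration).
L_X = {tuple(c * g % d for g in (1, 2, 3, 4)) for c in range(d)}
L_X_perp = dual(L_X)
L_Z = {v for v in L_X_perp if sum(v) % d == 0}


def depolarizing_map(eps):
    """eps -> eps' = 1 - f'_0 for f_0 = 1 - eps and f_{j != 0} = eps / (d - 1)."""
    mu = eps / ((d - 1) * (1 - eps))
    num = sum(mu ** weight(v) for v in L_Z)
    den = sum(mu ** weight(v) for v in L_X_perp)
    return 1 - num / den


def threshold(lo=0.3, hi=0.5, tol=1e-10):
    """Bisection for the non-trivial fixed point eps' = eps inside [lo, hi]."""
    assert depolarizing_map(lo) < lo and depolarizing_map(hi) > hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if depolarizing_map(mid) < mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for eps in (0.3, 0.1, 0.01):
    print(f"eps = {eps:<5} -> eps' = {depolarizing_map(eps):.3e}")
print(f"fixed point (threshold) ~ {threshold():.4f}")
```

For this assumed pair the bisection lands close to 0.363, in line with the depolarizing threshold quoted in the abstract. We read this only as a consistency check, not as evidence that the pair is the code of Sec. IV.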
Taylor expanding the numerator and denominator to order $\mu^D$ we have
$$f'_0\approx\frac{1+a\mu^D}{1+b\mu^D},$$
where $a$ ($b$) is the number of weight-$D$ elements of $L_Z$ ($L_X^\perp$). Both $L_Z$ and $L_X^\perp$ contain a single weight-zero element, $v=0=(0,0,\ldots,0)$. By definition, both contain no other elements with weight smaller than $D$. Further approximating the denominator and using $\epsilon'=1-f'_0$, we obtain
$$\epsilon'\approx(b-a)\mu^D.$$
So the suppression of errors is of degree $D$, as $\mu\sim\epsilon$. In particular, since $D\ge2$ the error suppression is at least quadratic.
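The leading-order estimate $\epsilon'\approx(b-a)\mu^D$ can be compared with the exact depolarizing map. The sketch counts $a$ and $b$ by enumeration for a hypothetical pair over $\mathbb{F}_5$: $L_X=\mathrm{span}\{(1,2,3,4)\}$ and $L_Z=L_X^\perp\cap\mathbf{1}^\perp$. It then prints the ratio of exact to approximate output error, which should approach 1 as $\epsilon\to0$. The exact error is computed as $(\mathrm{den}-\mathrm{num})/\mathrm{den}$ to avoid cancellation for tiny $\epsilon$.

```python
import itertools

d, n = 5, 4


def dual(L):
    return {u for u in itertools.product(range(d), repeat=n)
            if all(sum(a * b for a, b in zip(u, v)) % d == 0 for v in L)}


def weight(v):
    return sum(1 for x in v if x != 0)


# Hypothetical example pair over F_5 (assumed for illustration).
L_X = {tuple(c * g % d for g in (1, 2, 3, 4)) for c in range(d)}
L_X_perp = dual(L_X)
L_Z = {v for v in L_X_perp if sum(v) % d == 0}

D = min(weight(v) for v in L_X_perp if any(v))
a = sum(1 for v in L_Z if weight(v) == D)       # weight-D elements of L_Z
b = sum(1 for v in L_X_perp if weight(v) == D)  # weight-D elements of L_X^perp


def mu_of(eps):
    return eps / ((d - 1) * (1 - eps))


def exact_eps_out(eps):
    """Exact eps' = 1 - f'_0, written as (den - num)/den for numerical accuracy."""
    mu = mu_of(eps)
    num = sum(mu ** weight(v) for v in L_Z)
    den = sum(mu ** weight(v) for v in L_X_perp)
    return (den - num) / den


print(f"D = {D}, a = {a}, b = {b}")
for eps in (1e-2, 1e-3, 1e-4):
    approx = (b - a) * mu_of(eps) ** D
    print(f"eps = {eps:.0e}: exact/approx = {exact_eps_out(eps) / approx:.5f}")
```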
The depolarizing noise model is useful for illustrating the salient features of a distillation protocol. However, it is important to demonstrate error suppression and the existence of a threshold for all possible noise models. Again we rescale the noise parameters to $\mu_k=f_k/f_0$, and so
$$f'_0=\frac{\sum_{v\in L_Z}\prod_{k\neq0}\mu_k^{\mathrm{wt}_k(v)}}{\sum_{v\in L_X^\perp}\prod_{k\neq0}\mu_k^{\mathrm{wt}_k(v)}}.$$
Both $L_Z$ and $L_X^\perp$ contain $v=0$, for which $\mathrm{wt}_{k\neq0}(v)=0$, and so both numerator and denominator contain a term equal to 1. We make a very coarse lower bound on the numerator, which must be at least 1 since all terms are positive. We wish to upper bound the denominator less coarsely. First we define $\mu=\max_{k\neq0}\{\mu_k\}$ and use it to replace all other noise parameters in the denominator, yielding the inequality
$$\sum_{v\in L_X^\perp}\prod_{k\neq0}\mu_k^{\mathrm{wt}_k(v)}\le\sum_{v\in L_X^\perp}\mu^{|v|_H}.$$
Recall that $v=0$ contributes 1 to the summation and all other terms are upper bounded by $\mu^D$, where $D$ is the distance of the code. This uses $\mu\le1$, which holds under the assumption below that $f_0$ is the largest weight. Hence we have
$$\epsilon'=1-f'_0\le1-\frac{1}{1+C\mu^D}\le C\mu^D\le C\Big(\frac{\epsilon}{1-\epsilon}\Big)^D,$$
where $C$ is the number of nontrivial terms and $\epsilon=1-f_0$. We assume without loss of generality that $f_0$ is larger than all other $f_j$, which allows us to bound $(1-\epsilon)^{-1}\le d$, and so
$$\epsilon'\le d^DC\epsilon^D.$$
This gives us a valid constant $K=d^DC$, as asserted in Thm. 2 and Thm. 3. The existence of some distillation threshold follows quickly. If we consider $\epsilon^*=K^{-(D-1)^{-1}}$, we find that if $0<\epsilon<\epsilon^*$ then $\epsilon'<\epsilon$. The above analysis is very general, but the corresponding bounds are far from tight, and a much higher $\epsilon^*$ will exist.
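The constants in this general bound are easy to evaluate and to test against random non-depolarizing noise. This sketch computes $C$, $K=d^DC$ and the crude threshold $\epsilon^*=K^{-1/(D-1)}$ for a hypothetical pair over $\mathbb{F}_5$: $L_X=\mathrm{span}\{(1,2,3,4)\}$ and $L_Z=L_X^\perp\cap\mathbf{1}^\perp$. It then checks $\epsilon'\le K\epsilon^D$ on random error splits with $f_0$ largest. The sampling scheme (uniform $\epsilon\le0.05$, Dirichlet split) is our own choice.

```python
import itertools
import numpy as np

d, n = 5, 4


def dual(L):
    return {u for u in itertools.product(range(d), repeat=n)
            if all(sum(a * b for a, b in zip(u, v)) % d == 0 for v in L)}


# Hypothetical example pair over F_5 (assumed for illustration).
L_X = {tuple(c * g % d for g in (1, 2, 3, 4)) for c in range(d)}
L_X_perp = dual(L_X)
L_Z = {v for v in L_X_perp if sum(v) % d == 0}

D = min(sum(1 for x in v if x) for v in L_X_perp if any(v))
C = len(L_X_perp) - 1            # number of non-trivial terms in the denominator
K = d**D * C
eps_star = K ** (-1 / (D - 1))
print(f"D = {D}, C = {C}, K = {K}, eps* = {eps_star:.3e}")


def alpha(v, f):
    return float(np.prod([f[x] for x in v]))


def output_error(f):
    """eps' = 1 - f'_0 from the general iterative formula."""
    num = sum(alpha(v, f) for v in L_Z)
    den = sum(alpha(v, f) for v in L_X_perp)
    return (den - num) / den


rng = np.random.default_rng(1)
worst = 0.0
for _ in range(200):
    eps = rng.uniform(0, 0.05)
    # Arbitrary (non-depolarizing) split of the error over the d - 1 wrong states.
    f = np.concatenate(([1 - eps], eps * rng.dirichlet(np.ones(d - 1))))
    worst = max(worst, output_error(f) / (K * eps**D))
print(f"largest observed eps' / (K eps^D) = {worst:.2e}")  # the bound requires <= 1
```

The printed $\epsilon^*$ is several orders of magnitude below the exact depolarizing threshold of the same pair, which illustrates how loose the bound is.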

E. Clifford correction
So far we have assumed that the $Z$ stabilizer measurements all yield the desired "+1" outcome. Next we consider the process of Clifford correction, as outlined by step 3 of our protocol. This additional strategy significantly increases the success probability of each round, so much so that success is guaranteed in the limit of pure initial states. The general idea is that for any measurement outcomes, with resulting projector $\Pi_{S_Z}$, there exists a Clifford correction $C_M[w]=C_M^{w_1}\otimes\cdots\otimes C_M^{w_n}$ that maps the post-measurement state onto the one obtained for trivial outcomes. The key fact exploited is that for a single qudit $C_MZ=\omega^{-1}ZC_M$, and so for many qudits
$$C_M[w]\,Z[u]=\omega^{-\langle w,u\rangle}Z[u]\,C_M[w].$$
To proceed we must specify the projector $\Pi_{S_Z}$. We begin by expressing the linear code as $L_Z=\{Gu:\ u\in\mathbb{F}_d^m\}$, where $m=\mathrm{Dim}(L_Z)$ and $G$ is an $n$ by $m$ matrix called the generator matrix of $L_Z$. Each column of $G$ gives an individual generator of $L_Z$ and hence of $S_Z$. When the measurement corresponding to the $j$th generator gives outcome $\omega^{k_j}$, the result is a projection onto the joint eigenspace with these eigenvalues. Conjugating with a Clifford correction $C_M[w]$ shifts the phase of each stabilizer term $Z[Gu]$ by $\langle w,Gu\rangle$. The correction therefore works when, for all $u$, we have $\langle k,u\rangle=\langle w,Gu\rangle \bmod d$. We can always choose a canonical form for the generator matrix, $G=\begin{pmatrix}\mathbb{1}_m\\ G'\end{pmatrix}$, where the identity occupies the first $m$ rows of $G$ and $G'$ labels the remainder of the matrix. For such a canonical generator matrix we choose $w=(k_1,k_2,\ldots,k_m,0,\ldots,0)$, so that it matches the measurement outcomes on the first $m$ entries. This yields $\langle w,Gu\rangle=\langle k,u\rangle$, and so Clifford correction achieves its goal.
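Both ingredients of Clifford correction can be checked on small instances. The first is the choice of $w$ for a canonical generator matrix. The second is the phase relation $C_M[w]Z[v]=\omega^{-\langle w,v\rangle}Z[v]C_M[w]$. The sketch rests on three assumptions:
- the lower block $G'$ is random, drawn only for illustration;
- $n=6$, $m=3$ and $d=3$ are our choices;
- the qutrit gate $M=\mathrm{diag}(1,\tau,\tau^{-1})$ with $\tau=\exp(i2\pi/9)$ is an illustrative example, not necessarily the canonical $M_3$.

The helper `tensor_power` mirrors the $U[u]$ notation of the text.

```python
import numpy as np

d = 3

# Choice of w for a canonical generator matrix G = (identity_m stacked over G').
rng = np.random.default_rng(7)
n, m = 6, 3
G = np.vstack([np.eye(m, dtype=int), rng.integers(0, d, size=(n - m, m))])  # hypothetical G'
k = rng.integers(0, d, size=m)                     # outcomes omega^{k_j}, one per generator
w = np.concatenate([k, np.zeros(n - m, dtype=int)])
for u in np.ndindex(*([d] * m)):
    u = np.array(u)
    assert (w @ (G @ u)) % d == (k @ u) % d        # <w, Gu> = <k, u> mod d

# Phase picked up by Z[v] under C_M[w], checked on two qutrits.
omega = np.exp(2j * np.pi / d)
X = np.roll(np.eye(d), 1, axis=0)
Z = np.diag(omega ** np.arange(d))
M = np.diag(np.exp(2j * np.pi * np.array([0, 1, -1]) / 9))  # example qutrit gate (assumed)
C = M @ X @ M.conj().T                                      # C_M


def tensor_power(op, exps):
    """op[e] = op^{e_1} (x) op^{e_2} (x) ..., the U[u] notation of the text."""
    out = np.eye(1)
    for e in exps:
        out = np.kron(out, np.linalg.matrix_power(op, e))
    return out


for w2 in np.ndindex(d, d):
    for v2 in np.ndindex(d, d):
        CM, Zv = tensor_power(C, w2), tensor_power(Z, v2)
        phase = omega ** (-int(np.dot(w2, v2)))
        assert np.allclose(CM @ Zv, phase * Zv @ CM)  # C_M[w] Z[v] = omega^{-<w,v>} Z[v] C_M[w]
print("Clifford-correction identities verified")
```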

IV. REED-MULLER CODES
A. Some concrete examples
Our demonstration of magic state distillation in higher dimensions was conditional on the existence of $\mathcal{M}_d^m$-distillation codes, as specified in Def. 4. Before introducing a family of $\mathcal{M}_d^m$-distillation codes for all odd prime $d$, we give some concrete examples. We label the codes as $\mathrm{QRM}_d(m)$, where $d$ is again the dimensionality and $m$ dictates the code's size and transversality properties.
Definition 5 $\mathrm{QRM}_3(2)$ is a CSS code over $n=8$ qudits of dimension 3. The $L_X$ code is generated by Whereas, $L_Z$ is the code generated by The logical operators are $Z_L=Z[2\mathbf{1}]$ and For the above qutrit code, we will find that it is transversal with respect to the canonical $M_3$ non-Clifford gate, as in Thm. 1, where $\tau=\exp(i2\pi/9)$.
In the introduction of suitable non-Clifford gates, Thm. 1 showed that for odd primes greater than 3 it suffices to set $m=1$ to find non-Clifford gates. Remarkably, this means that we can find even smaller codes with transversal non-Clifford gates. The smallest such code exists for $d=5$, and, as we shall see later, it performs exceptionally well at magic state distillation.
Definition 6 $\mathrm{QRM}_5(1)$ is a CSS code over $n=4$ ququints of dimension 5. The $L_X$ code is generated by Whereas, $L_Z$ is the code generated by The logical operators are $Z_L=Z[4\mathbf{1}]$ and For the above code we find it is transversal with respect to the canonical $M_5$ non-Clifford gate, where here $\omega=\exp(i2\pi/d)=\exp(i2\pi/5)$. Notice how the eigenvalues are all powers of $\omega$. For dimensions smaller than $d=5$, any diagonal gate with phases that are powers of $\omega$ will be a Clifford gate rather than the desired non-Clifford gate. This property makes it possible in higher dimensions to find smaller codes with a transversal non-Clifford gate.
From this information one can numerically verify that both codes are well defined and have the correct transversality properties.Transversality can be verified by calculating the effect of the non-Clifford gates on the logical basis states.Over the following sections we develop an analytic proof that these features are valid for a whole family of quantum codes.Further details of the performance of these codes are given later, but we hope the examples help guide the reader through the general case.

B. Classical Reed-Muller codes
Here we review $d$-ary generalizations of Reed-Muller codes [49][50][51][52] and derive the crucial properties we exploit later. Convention dictates that we denote Reed-Muller codes as $\mathrm{RM}_d(u, m)$, where $d$ specifies the relevant field, $u$ is the order of the code and $m$ determines the size of the code. Here we explicitly use only first-order Reed-Muller codes, so $u = 1$. All Reed-Muller codes are defined by polynomials of degree bounded by the code's order. For first-order Reed-Muller codes we must consider degree-1 polynomials, that is, linear functions. The dual of a Reed-Muller code is another Reed-Muller code, though it may have a different order [49][50][51][52]. In this way higher-order Reed-Muller codes do enter into our work; however, it is sufficient for us to define them in terms of duality. Ultimately, we will not use these codes but their smaller shortened versions introduced in the next section. However, for pedagogical reasons we first review the unshortened variants.
Hence for $\bar{u} = (0, 1)$ we have …. For any positive integers $d$ and $m$, the set $L = \{U^m_d(\bar{u}) : \bar{u} \in \mathbb{F}_d^m\}$ is a linear vector space. Closure under addition follows directly from the closure under addition of homogeneous linear maps. The codes of interest are constructed by considering all affine functions, which are linear maps plus an additional constant $c$, such that they map $\bar{u}$ to $U^m_d(\bar{u}) \oplus c\mathbf{1}$.
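The construction can be sketched directly. The helper `U` is our assumed concrete form of $U^m_d$: it evaluates $x \mapsto \bar{u}\cdot x \bmod d$ at every $x \in \mathbb{F}_d^m$, listed lexicographically starting from $x = 0$, which matches the repetitive structure described for unit vectors. The check confirms that the affine code has $d^{m+1}$ codewords, i.e. dimension $m + 1$, and is closed under addition.

```python
import itertools


def U(u_bar, d):
    """Assumed form of U^m_d(u_bar): values of x -> u_bar . x (mod d) for all
    x in F_d^m, in lexicographic order starting from x = 0."""
    m = len(u_bar)
    return tuple(sum(a * b for a, b in zip(u_bar, x)) % d
                 for x in itertools.product(range(d), repeat=m))


def rm_code(d, m):
    """RM_d(1, m) = {U^m_d(u_bar) + c*1 : u_bar in F_d^m, c in F_d}."""
    return {tuple((v + c) % d for v in U(u_bar, d))
            for u_bar in itertools.product(range(d), repeat=m)
            for c in range(d)}


def is_closed_under_addition(code, d):
    return all(tuple((a + b) % d for a, b in zip(v, w)) in code
               for v in code for w in code)


print(U((0, 1), 3))  # (0, 1, 2, 0, 1, 2, 0, 1, 2)
print(U((1, 0), 3))  # (0, 0, 0, 1, 1, 1, 2, 2, 2)
code = rm_code(3, 2)
print(len(code) == 3 ** (2 + 1), is_closed_under_addition(code, 3))  # True True
```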
The reader should note that λ-functions are closely related to the non-Clifford gates introduced in Def. 3. Our main observation here is the following.
Lemma 1 Given a λ-function $\Lambda$ and an unshortened code $\mathrm{RM}_d(1, m)$, every codeword $v \in \mathrm{RM}_d(1, m)$ satisfies $\Lambda(v) = 0 \pmod{d^m}$.
To prove the lemma we first consider codewords where $\bar{u} = 0$, so that $v = (c, c, \ldots, c)$; then $\Lambda(v) = d^m\lambda_c$, which vanishes modulo $d^m$. Let us now consider the codeword for the unit vector $\bar{u} = (1, 0, \ldots, 0)$ and $c = 0$. The corresponding codeword has a repetitive structure as in Eq. (51), where each element of $\mathbb{F}_d$ appears $d^{m-1}$ times. Hence $\Lambda(v) = d^{m-1}\sum_{j=0}^{d-1}\lambda_j = 0$, since we required in the definition of a λ-function that $\sum_{j=0}^{d-1}\lambda_j = 0$. The above argument looks tailored to codewords for a unit vector $\bar{u}$, but a similar argument holds for all codewords with non-trivial $\bar{u}$. That is, for any non-trivial $\bar{u}$ the linear map $x \mapsto \bar{u}\cdot x$ takes each possible output value at exactly $d^{m-1}$ points $x \in \mathbb{F}_d^m$. To see this, consider that the family of linear maps is invariant under changes of variables that preserve linearity. Hence, the family of functions can always be expressed in a basis such that $\bar{u}$ is a unit vector. Furthermore, these codewords have uniform multiplicity of every value in $\mathbb{F}_d$, and so adding $c\mathbf{1}$ only reorders the elements and does not change the multiplicity with which they appear. This proves our lemma.
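Lem. 1 is easy to spot-check by brute force. The sketch below enumerates $\mathrm{RM}_d(1,m)$ as evaluations of affine functions on $\mathbb{F}_d^m$; the ordering of points is irrelevant here, because $\Lambda$ only counts how often each value appears. The integers $\lambda_j$ are drawn at random subject to summing to zero, as Def. 8 requires, and the helper names are our own.

```python
import itertools
import random


def rm_code(d, m):
    """RM_d(1, m): affine functions x -> u_bar . x + c evaluated on F_d^m."""
    points = list(itertools.product(range(d), repeat=m))
    return {tuple((sum(a * b for a, b in zip(u_bar, x)) + c) % d for x in points)
            for u_bar in itertools.product(range(d), repeat=m)
            for c in range(d)}


def random_lambdas(d, rng):
    """d integers summing to zero (the defining property of a lambda-function)."""
    lams = [rng.randint(-50, 50) for _ in range(d - 1)]
    return lams + [-sum(lams)]


def Lambda(lams, v):
    return sum(lams[x] for x in v)


def lemma1_holds(d, m, lams):
    return all(Lambda(lams, v) % d ** m == 0 for v in rm_code(d, m))


rng = random.Random(1)
for d, m in [(3, 1), (3, 2), (5, 1), (5, 2), (7, 1)]:
    print(d, m, all(lemma1_holds(d, m, random_lambdas(d, rng)) for _ in range(20)))
```

Dropping the zero-sum condition breaks the lemma: with $\lambda = (1, 0, 0)$ and $d = 3$, $m = 1$, the codeword $(0, 1, 2)$ gives $\Lambda = 1$.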
In summary, unshortened Reed-Muller codes have a huge amount of symmetry that they inherit from the families of affine and linear maps.However, they actually have too much symmetry for our purposes.We break just enough of that symmetry by shortening the code.

C. Shortened classical Reed-Muller codes
Given a code $L$ over $\mathbb{F}_d^n$, the corresponding shortened code, denoted $L^*$, is over $\mathbb{F}_d^{n-1}$. It contains all the codewords of $L$ with 0 in the first position, with that position deleted. The process of shortening is closely related to puncturing, where the first position is removed but all codewords are kept. We can also give a self-contained definition of a shortened Reed-Muller code as follows. Here $P^m_d$ is the same map as $U^m_d$, but omitting the first element. For example, the shortened version of Eq. (51) is …, which the reader may have noticed is also one of the generators of the $L_X$ code for the quantum $\mathrm{QRM}_3(2)$ code reviewed earlier. Notice that the self-contained definition of the shortened Reed-Muller code makes use of only linear maps and not affine maps. In the unshortened code we had a generator $\mathbf{1}$ that corresponded to the constant term in affine functions. However, when shortening a code we only keep codewords with zero in the first position, and so the $\mathbf{1}$ generator is dropped. For this reason the dimension of the code drops by one, from $m + 1$ to $m$. Let us now consider the shortened analog of Lem. 1.
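Shortening can likewise be checked directly: keeping the codewords of $\mathrm{RM}_d(1,m)$ that start with 0 and deleting that position should give exactly the linear-map code $\{P^m_d(\bar{u})\}$, with $d^m$ codewords (dimension $m$). As before, the lexicographic ordering with $x = 0$ first is our assumption, and the function names are illustrative.

```python
import itertools


def evaluations(u_bar, c, d):
    """Affine function x -> u_bar . x + c on F_d^m, lexicographic, x = 0 first."""
    m = len(u_bar)
    return tuple((sum(a * b for a, b in zip(u_bar, x)) + c) % d
                 for x in itertools.product(range(d), repeat=m))


def rm_code(d, m):
    return {evaluations(u, c, d)
            for u in itertools.product(range(d), repeat=m) for c in range(d)}


def shorten(code):
    """Keep codewords with 0 in the first position, then delete that position."""
    return {w[1:] for w in code if w[0] == 0}


def P(u_bar, d):
    """P^m_d: same as U^m_d but omitting the first (x = 0) entry."""
    return evaluations(u_bar, 0, d)[1:]


d, m = 3, 2
short = shorten(rm_code(d, m))
linear = {P(u, d) for u in itertools.product(range(d), repeat=m)}
print(short == linear, len(short) == d ** m)  # True True
print(P((1, 0), 3))  # (0, 0, 1, 1, 1, 2, 2, 2)
```

Only codewords with $c = 0$ survive, since the first entry of $U^m_d(\bar{u}) \oplus c\mathbf{1}$ is $c$; this is why the constant generator disappears.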
Lemma 2 Given a λ-function $\Lambda$ and a shortened code $\mathrm{RM}^*_d(1, m)$, …
This follows quickly from Lem. 1. Given a $v \in \mathrm{RM}^*_d(1, m)$, let us define $w$ as …, where clearly $w$ is a codeword of the unshortened code $\mathrm{RM}_d(1, m)$. Furthermore, $\Lambda(w) = \Lambda(v) + \lambda_c$, as $w$ has an extra term appended. However, Lem. 1 tells us that $\Lambda(w) = 0 \pmod{d^m}$, and so $\Lambda(v) = -\lambda_c \pmod{d^m}$. We will soon see that Lem. 2 is intimately related to transversality of quantum gates for an associated quantum code.
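The consequence used for transversality can be spot-checked in the same way. The sketch verifies, for every $v \in \mathrm{RM}^*_d(1,m)$ and every $c \in \mathbb{F}_d$, that $\Lambda(v \oplus c\mathbf{1}) \equiv -\lambda_c \pmod{d^m}$, which follows from Lem. 1 by prepending the entry $c$ to $v \oplus c\mathbf{1}$. If logical basis states are superpositions of such $|v \oplus c\mathbf{1}\rangle$ (our assumption, corresponding to $X_L = X[\mathbf{1}]$), then a diagonal gate with eigenvalues $\tau^{\lambda_j}$, $\tau = e^{i2\pi/d^m}$, applied transversally multiplies $|c_L\rangle$ by $\tau^{-\lambda_c}$.

```python
import itertools
import random


def P(u_bar, d):
    """Shortened first-order Reed-Muller codeword: u_bar . x for x != 0."""
    m = len(u_bar)
    points = itertools.product(range(d), repeat=m)
    next(points)  # drop x = 0 (lexicographic order starts with the zero vector)
    return tuple(sum(a * b for a, b in zip(u_bar, x)) % d for x in points)


def transversal_phase_exponents(d, m, lams):
    """Set of residues (Lambda(v + c*1) + lambda_c) mod d^m over all v, c.
    The claim is that this set is {0}."""
    residues = set()
    for u_bar in itertools.product(range(d), repeat=m):
        v = P(u_bar, d)
        for c in range(d):
            shifted = [(x + c) % d for x in v]
            residues.add((sum(lams[x] for x in shifted) + lams[c]) % d ** m)
    return residues


rng = random.Random(3)
for d, m in [(3, 2), (5, 1), (7, 1), (5, 2)]:
    lams = [rng.randint(-30, 30) for _ in range(d - 1)]
    lams.append(-sum(lams))
    print(d, m, transversal_phase_exponents(d, m, lams))  # {0} each time
```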

D. Quantum Reed-Muller codes
Here we construct quantum codes from shortened Reed-Muller codes for general m and d.
We could have equivalently specified $L_Z$ as a higher-order Reed-Muller code, though the above is simpler. We first check that $\mathrm{QRM}_d(m)$ codes are indeed quantum codes. By construction, the stabilizer is Abelian, as $L_Z \subset L_X^\perp$. It is easy to check that the logical operators are well defined: $Z_L$ commutes with the stabilizer; $X_L$ commutes with the stabilizer; and $X_L Z_L = \omega^{-1}Z_L X_L$. Our next main result can now be stated concisely. The main property we need to prove is transversality for all $M \in \mathcal{M}^m_d$. As with all CSS codes, we have that … Acting on this logical state with $M^{\otimes n}$ gives …, where $\Lambda$ is a λ-function (recall Def. 8) using the integers $\{\lambda_j\}$ associated with the eigenvalues of the unitary $M$. Now we use our key Lem. 2 to conclude …, and so we can identify $M^{\otimes n}$ with $M_L^\dagger$. Proving a distance lower bound is straightforward, as distance 2 is the smallest non-trivial distance. The relevant distance is $D_z$, the smallest $|v|_H$ such that $Z[v]$ produces a logical error, $Z[v]\Pi = Z_L^j\Pi$ with $j \neq 0$. For such an operator $v \in L_X^\perp$ but $v \neq 0$, so the phase error commutes with the X stabilizer but is non-trivial. If such an operator existed with Hamming weight 1, there would have to be a qudit upon which every element of $L_X$ acts trivially, and there is none. That the distance is not greater than 2 follows from the following section.
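For the two smallest codes the distance can be confirmed by exhaustive search. The sketch enumerates $L_X^\perp$ and removes $L_Z$, using $L_Z = [\mathrm{span}(L_X, \mathbf{1})]^\perp$ (Eq. 13), so membership in $L_Z$ only adds the condition $v\cdot\mathbf{1} = 0$. The minimum Hamming weight of what remains is $D_z$. The generator ordering is our lexicographic assumption, which does not affect weights.

```python
import itertools


def lx_generators(d, m):
    points = [x for x in itertools.product(range(d), repeat=m) if any(x)]
    return [[x[i] for x in points] for i in range(m)]


def z_distance(d, m):
    """Minimum weight of v in L_X^perp that is not in L_Z = span(L_X, 1)^perp."""
    gens = lx_generators(d, m)
    n = len(gens[0])
    best = None
    for v in itertools.product(range(d), repeat=n):
        if any(sum(a * b for a, b in zip(v, g)) % d for g in gens):
            continue  # not in L_X^perp: fails to commute with an X stabilizer
        if sum(v) % d == 0:
            continue  # v . 1 = 0 as well, so v is in L_Z (a Z stabilizer)
        weight = sum(1 for a in v if a)
        if best is None or weight < best:
            best = weight
    return best


print(z_distance(5, 1))  # 2
print(z_distance(3, 2))  # 2 (exhaustive over 3^8 vectors)
```

The search grows as $d^{\,d^m - 1}$, so it is only practical for these small examples; the general argument above is what covers the whole family.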

E. MacWilliams identities
We have introduced higher-dimensional Reed-Muller codes and shown that they have suitable transversality properties for magic state distillation. Knowing the code stabilizer and using Eqs. (29, 34), we can calculate the exact analytic formulae for arbitrary noise. For $\mathrm{QRM}_3(2)$ the general noise problem is tractable because $L_Z$ and $L_X^\perp$ are quite small sets, but the size and complexity of these sets grow rapidly with $d$ and $m$. This is relevant because the fidelity after one iterate is calculated by summing over all elements in $L_Z$ and $L_X^\perp$. By considering depolarizing noise, the problem is partially simplified by Eq. (34), which we restate here as …, where $W_L(\mu)$ is known as a weight enumerator …. Weight enumerators have been extensively studied in classical coding theory [47]. In particular, the weight enumerator for a code $L$ can be related to the weight enumerator for the dual code $L^\perp$ by the MacWilliams identity [47] …, where we use the shorthand …. Using $L_Z = [\mathrm{span}(L_X, \mathbf{1})]^\perp = (\bar{L}_X)^\perp$ (see Eq. 13) and the MacWilliams identity, we have …. The codes $L_X$ and $\bar{L}_X$ are much smaller and simpler than their duals, and so the MacWilliams identities have proven extremely helpful. Indeed, for Reed-Muller codes we can find a closed form for these enumerators. When $L_X = \mathrm{RM}^*_d(1, m)$ we have (see App. C for details) … and …. Combining all these formulae and reverting back to the original variables gives a closed analytic form, which is manageable albeit too long to reproduce here. Rather, we present the Taylor expansion to second order in $\epsilon$: …. It is interesting that for all protocols based upon a quantum code $\mathrm{QRM}_d(m)$ we see quadratic error suppression for all odd prime $d$ and all $m$. In contrast, the quantum Reed-Muller code used by Bravyi and Kitaev, $\mathrm{QRM}_2(4)$, obtains a cubic reduction, such that $\epsilon' \approx 35\epsilon^3$. Our analysis also describes the Bravyi-Kitaev protocol, the only difference being that in the qubit case we need $m \geq 4$, and so the above formula also holds for qubits. It is intriguing to observe that the factor $(d - 2)$ appears above, and so the quadratic term vanishes only in the qubit case; in higher dimensions these Reed-Muller codes are only distance 2. This is one of many curious differences between qubits and odd prime dimensions.
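A brute-force sketch of the MacWilliams identity in its standard weight-distribution form (not the $\mu$-parameterized form used above): if $A_i$ and $B_j$ count weight-$i$ codewords of a linear code $L \subseteq \mathbb{F}_q^n$ and weight-$j$ codewords of $L^\perp$, then $|L|\,B_j = \sum_i A_i K_j(i)$, with Krawtchouk polynomials $K_j(i) = \sum_s (-1)^s (q-1)^{j-s}\binom{i}{s}\binom{n-i}{j-s}$. We check it for $L_X$ and $\bar{L}_X = \mathrm{span}(L_X, \mathbf{1})$ of $\mathrm{QRM}_3(2)$, whose duals are $L_X^\perp$ and $L_Z$. The output also shows that every non-zero codeword of $L_X$ has weight $d^m - d^{m-1}$ (each non-trivial linear map vanishes on exactly $d^{m-1} - 1$ non-zero points), so the enumerator of $L_X$ has only two terms. Helper names and the qudit ordering are our own.

```python
import itertools
from collections import Counter
from math import comb


def lx_generators(d, m):
    """Rows of L_X = RM*_d(1, m), qudits = non-zero points of F_d^m (lexicographic)."""
    points = [x for x in itertools.product(range(d), repeat=m) if any(x)]
    return [[x[i] for x in points] for i in range(m)]


def span(gens, d):
    n = len(gens[0])
    return {tuple(sum(c * g[k] for c, g in zip(cs, gens)) % d for k in range(n))
            for cs in itertools.product(range(d), repeat=len(gens))}


def dual(gens, d):
    """Brute-force dual: all v in F_d^n orthogonal to every generator."""
    n = len(gens[0])
    return [v for v in itertools.product(range(d), repeat=n)
            if all(sum(a * b for a, b in zip(v, g)) % d == 0 for g in gens)]


def weight_distribution(words, n):
    counts = Counter(sum(1 for a in w if a) for w in words)
    return [counts.get(i, 0) for i in range(n + 1)]


def krawtchouk(j, i, n, q):
    return sum((-1) ** s * (q - 1) ** (j - s) * comb(i, s) * comb(n - i, j - s)
               for s in range(j + 1))


def macwilliams_holds(gens, d):
    """Check |L| B_j = sum_i A_i K_j(i) for L = span(gens) and its dual."""
    n = len(gens[0])
    code = span(gens, d)
    A = weight_distribution(code, n)
    B = weight_distribution(dual(gens, d), n)
    return all(len(code) * B[j] == sum(A[i] * krawtchouk(j, i, n, d) for i in range(n + 1))
               for j in range(n + 1))


d, m = 3, 2
gens = lx_generators(d, m)
n = len(gens[0])
print(weight_distribution(span(gens, d), n))  # [1, 0, 0, 0, 0, 0, 8, 0, 0]
print(macwilliams_holds(gens, d))              # L_X versus L_X^perp: True
print(macwilliams_holds(gens + [[1] * n], d))  # bar L_X versus L_Z: True
```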

V. PERFORMANCE OF PROTOCOLS
Here we consider various aspects of the performance of our protocols. We begin by showing that our protocols yield magic states at a rate that decreases only polynomially in $\log(1/\epsilon_{\mathrm{target}})$, where $\epsilon_{\mathrm{target}}$ is the desired final error probability. We then use MacWilliams identities to analyse thresholds under depolarizing noise models for much larger codes. Next we consider in more detail the performance of our protocols based on $\mathrm{QRM}_3(2)$ and $\mathrm{QRM}_5(1)$.

A. Yields
The overall performance of a protocol can be captured by its yield. Given some target error probability, we calculate the yield as the expected fraction of the initial copies that achieves the goal. By definition, for any protocol and any distillable state $\rho$ with error probability $\epsilon_{\mathrm{in}}$, there exists a number of rounds $N(\rho, \epsilon_{\mathrm{target}})$ required to achieve $\epsilon_{\mathrm{target}}$. If on the $k$th round of distillation the success probability is $P_k$, the yield is simply $Y = \prod_{k=1}^{N} P_k/n$, where $n$ is again the number of copies used per iterate. We are interested in how this scales as $\epsilon_{\mathrm{target}}$ vanishes. The success probability is continuous in $\epsilon$ and approaches 1 as $\epsilon$ vanishes; thus $P_k$ approaches 1 as $k$ increases. Therefore, for all $p < 1$ there exists a $c$ such that for all $k > c$ we have $P_k > P_c = p$. This allows us to lower bound the yield such that $Y \geq C\,(p/n)^{N-c}$, where $C$ is a constant overhead, independent of $\epsilon_{\mathrm{target}}$, that represents the yield for $c$ iterations. Furthermore, after $c$ iterations the error probability is now $\epsilon_c$. Next we observe that for a single round we know $\epsilon' \leq K\epsilon^D$ for some $K$, equivalently $K\epsilon' \leq (K\epsilon)^D$. Therefore, the error probability after $N$ iterations, $\epsilon_N$, satisfies $K\epsilon_N \leq (K\epsilon_c)^{D^{N-c}}$. Taking $K\epsilon_c < 1$ allows us to bound the number of iterations needed such that …. For positive $a$ and $b$ we have the identity $a^{\log_D(b)} = b^{\log_D(a)}$, which combined with the above equations entails that …. With the shorthand $\gamma = -\log_D(P_c/n)$, which is positive, we have …. This decreases by a factor polynomial in $\log(\epsilon_{\mathrm{target}}^{-1})$. Conversely, the expected resource cost of distillation is the inverse yield, and this increases only polynomially in $\log(\epsilon_{\mathrm{target}}^{-1})$. The scaling is governed by the factor $\gamma = -\log_D(P_c/n)$, but $P_c$ can be taken arbitrarily close to 1. As such, the relevant scaling parameter is $\gamma^* = \log_D(n)$, and so …. For our protocols in odd prime dimension we will find that $D = 2$ and $n = d^m - 1$, so $\gamma^* = \log_2(d^m - 1)$, which we give in Table I.

TABLE I: The yield scaling parameter, $\gamma^*$, for distillation by $\mathrm{QRM}_d(m)$ as governed by Eq. (71). The smaller the value of $\gamma^*$, the more resource efficient the protocol in the limit of many iterations. For qubit systems, the 10-to-2 protocol of Ref. [58] achieves $\gamma^* = \log_2(5) \approx 2.32193$, which is the best known value for qubit protocols. N/A indicates not applicable, as for those parameters no non-Clifford gates exist.
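The entries of Table I follow from $\gamma^* = \log_D(n)$. The sketch below tabulates them, assuming (as stated above) $D = 2$ and $n = d^m - 1$ for odd primes, and $D = 3$, $n = 15$ for the 15-qubit code $\mathrm{QRM}_2(4)$ of Bravyi and Kitaev. Parameter pairs without a non-Clifford gate are reported as N/A, using the conditions of Thm. 2 (odd $d$ with $m \geq 2$, or $d \geq 5$ with $m \geq 1$); for qubits only the $m = 4$ code is considered.

```python
import math


def gamma_star(d, m):
    """Yield scaling parameter gamma* = log_D(n) for QRM_d(m), or None (N/A)."""
    if d == 2:
        # Only the 15-qubit Bravyi-Kitaev code (cubic suppression, D = 3).
        return math.log(15, 3) if m == 4 else None
    if d == 3 and m < 2:
        return None
    return math.log2(d ** m - 1)  # quadratic suppression, D = 2


for d in (2, 3, 5, 7, 11):
    row = []
    for m in (1, 2, 3, 4):
        g = gamma_star(d, m)
        row.append("N/A" if g is None else f"{g:.4f}")
    print(d, row)

# Compare with the 10-to-2 qubit protocol, gamma* = log2(5) ~ 2.32193.
print(gamma_star(5, 1) < math.log2(5))  # True
```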
Notice that the code $\mathrm{QRM}_5(1)$ achieves the best yield scaling of all quantum Reed-Muller codes. It retains this distinction even when compared with all presently known magic state distillation protocols.

B. Depolarizing noise thresholds
For some values of $d$ and $m$, we have used the exact expression for $\epsilon'$ to find the depolarizing noise threshold $\epsilon^*_{\mathrm{dep}}$ below which distillation occurs (see Table II below). This should not be confused with the absolute threshold $\epsilon^*$, which holds for all noise models and can be smaller. The threshold gets weaker with both increasing $d$ and increasing $m$, as suggested by the approximate formula for $\epsilon'$ above (see Eq. (65)). When we increase $m$, we increase the number of copies required per iteration but decrease the depolarizing noise threshold. This makes it advantageous to use the smallest possible $m$ such that $M \in \mathcal{M}^m_d$. The benefit of larger $m$ is rather that a larger set of states can be distilled by the protocol.
If we also compare our protocols with the threshold of the BK protocol for $d = 2$, the pattern of better thresholds for smaller dimensions no longer holds. The best threshold we observe is for $\mathrm{QRM}_5(1)$, with a fairly high threshold also observed for $\mathrm{QRM}_3(2)$. There are many subtle differences in the Clifford group between odd and even dimensions, and here those differences work in our favor. In odd prime dimension we can construct smaller codes with transversal non-Clifford gates. Our code $\mathrm{QRM}_5(1)$ uses 4 ququints covering a Hilbert space of dimension $5^4$, which to our knowledge is the smallest non-trivial stabilizer code with a transversal non-Clifford gate. Furthermore, research to date indicates that smaller codes lend themselves to better thresholds. A plausible explanation is that larger codes allow more undetected errors. Most of these undetected errors have a large Hamming weight, and so, while negligible for small $\epsilon$, they become damaging at the moderate values of $\epsilon$ relevant for threshold calculations.

TABLE II: The distillation threshold $\epsilon^*_{\mathrm{dep}}$ for depolarizing noise when distilled by $\mathrm{QRM}_d(m)$. We include the threshold for the Bravyi-Kitaev protocol using 15 qubits, which uses a quantum Reed-Muller code $\mathrm{QRM}_2(4)$. N/A indicates not applicable, as for those parameters no non-Clifford gates exist.

d     m = 1       m = 2        m = 3         m = 4
2     N/A         N/A          N/A           0.14148
3     N/A         0.211001     0.0657764     0.0214564
5     0.3631226   0.0614718    0.0119213     0.00236986
7     0.2322599   0.0291865    0.00409851    0.000584079
11    0.1341066   0.0111835    0.00100907    0.0000916717
13    0.1106148   0.00790156   0.000604487   0.0000464795
17    0.0818753   0.00454655   0.000266565   0.0000156773
19    0.072453    0.00362063   0.000190054   0.0000100014
Concerning thresholds for qubit protocols, we have focused on the comparable protocol using quantum Reed-Muller codes. The qubit threshold can be slightly extended by using the 7-qubit Steane code [16] ($\epsilon^* = 0.14645$) or the 5-qubit code [13] on a different class of magic states ($\epsilon^* = 0.1719$). Though these are slight improvements, both fall short of our qutrit and ququint thresholds and have much poorer yields.
Before proceeding, we remark on our notation and terminology for quantifying depolarizing noise. Throughout we have used $\epsilon = 1 - \langle M_0|\rho|M_0\rangle$ for the error probability. If a state suffers depolarizing noise, it has the form $\rho = (1 - \delta)|M_0\rangle\langle M_0| + \delta\,\mathbb{1}/d$, and in some parts of the literature $\delta$ is used to quantify noise. Relating these two distinct noise measures we have $\epsilon = \frac{d-1}{d}\,\delta$, and so a dependence on the dimensionality appears. In terms of $\delta$ thresholds appear larger, with $\mathrm{QRM}_3(2)$ and $\mathrm{QRM}_5(1)$ having thresholds at $\delta = 0.317$ and $\delta = 0.453$, respectively. Some readers may find using $\delta$ more natural, as it may be related to the depolarizing noise rate of some unitary used to prepare the initial noisy magic states. However, when unitaries suffer depolarizing noise, the best strategy is not simply to apply the noisy unitary to $|+\rangle$. Rather, better thresholds can be achieved with noisy unitaries by using the noise dilution protocol of Howard and Vala [56]. Furthermore, the threshold boosts from noise dilution become more prominent in higher dimensions.

FIG. 2: The canonical $C_M$-plane for a qutrit, $d = 3$, onto which any state can be projected by $C_M$-twirling. Every quantum state is a point in the complex plane given by the complex number $z_\rho = \mathrm{tr}(C_M\rho)$. The three pure magic states, $|M_k\rangle$, take values $z = 1, \omega, \omega^2$, which have $|z|^2 = 1$ and so lie on a circle in the plane. All physical states have $z = (1 - f_1 - f_2) + \omega f_1 + \omega^2 f_2$, and so lie in the convex hull of the pure magic states, forming a triangle of physical states. The distillable region of states can, by use of the $\mathrm{QRM}_3(2)$ protocol, be brought arbitrarily close to the nearest pure magic state. The stabilizer states are the convex hull over the set of points $z$ taken for each of the pure stabilizer states. It is impossible to distil not only the stabilizer states but also the bound states, as demonstrated in Ref. [42]. Note that the rotational symmetry is to be expected, as the Pauli $Z$ performs a rotation in the $C_M$-plane.
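The conversion between the two noise measures, $\epsilon = \frac{d-1}{d}\delta$, is a one-liner; the sketch below applies it to the depolarizing thresholds listed in Table II. The function names are ours.

```python
def delta_from_epsilon(eps, d):
    """Invert eps = (d - 1) / d * delta for a depolarized state."""
    return d * eps / (d - 1)


def epsilon_from_delta(delta, d):
    return (d - 1) * delta / d


thresholds = {"QRM_2(4) (BK)": (2, 0.14148),
              "QRM_3(2)": (3, 0.211001),
              "QRM_5(1)": (5, 0.3631226)}
for name, (d, eps) in thresholds.items():
    print(f"{name}: eps* = {eps}, delta* = {delta_from_epsilon(eps, d):.4f}")
```

The factor $d/(d-1)$ is 2 for qubits but only $5/4$ for ququints, which is why switching to $\delta$ narrows, but does not close, the gap between the qubit and ququint thresholds.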
Here we apply our methods to the 3-dimensional case using $\mathrm{QRM}_3(2)$, as explicitly defined in Def. 5. In previous work [43] we proposed other protocols for the 3-dimensional case, including a generalization of the 5-qubit code to qutrits. While magic state distillation was observed for this 5-qutrit code, those previous studies only showed a linear suppression of noise, whereas here we observe a more rapid, quadratic suppression with each iteration.
We take $M$ to be the canonical $M_3$ gate, as in Thm. 1 and Eq. (46). By $C_M$-twirling, all single-qudit quantum states are projected onto the diagonal in the $|M_k\rangle$ basis, such that $\rho = (1 - f_1 - f_2)|M_0\rangle\langle M_0| + f_1|M_1\rangle\langle M_1| + f_2|M_2\rangle\langle M_2|$. When we wish to distil $|M_0\rangle$, the weights $f_1$ and $f_2$ represent different types of noise. A more convenient parameterization is $f_1 = \epsilon\cos^2(\theta)$ and $f_2 = \epsilon\sin^2(\theta)$, as we are mainly interested in how the total noise reduces. Our techniques allow us to find an analytic solution for $\epsilon'$ after a single iterate of magic state distillation with $\mathrm{QRM}_3(2)$. However, the expression is lengthy, so here we truncate it to third order, …, which is quadratically reduced. In Fig. 3a, we show the exact output error probability for the whole range of different noise models (different $\theta$) and for depolarizing noise ($\theta = \pi/4$). We find a threshold of $\epsilon^* = 0.20015$ for general noise and $\epsilon^*_{\mathrm{dep}} = 0.211001$ for depolarizing noise (as cited earlier). As such, for all $\theta$, if $0 < \epsilon < \epsilon^*$ it follows that $\epsilon' < \epsilon$. We can also find a quadratic upper bound, such that for all $\epsilon$ and $\theta$ we have $\epsilon' \leq K\epsilon^2$ with $K = 5.03$. The value of $K$ is found by numerically maximizing the function $\epsilon'\epsilon^{-2}$, so that $K = \sup_{\epsilon,\theta}\{\epsilon'\epsilon^{-2}\}$.
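To connect this parameterization with the $C_M$-plane of Fig. 2, the sketch below evaluates $z = (1 - f_1 - f_2) + \omega f_1 + \omega^2 f_2$ taking $f_1 = \epsilon\cos^2\theta$ and $f_2 = \epsilon\sin^2\theta$ (so that $f_1 + f_2 = \epsilon$). Depolarizing noise ($\theta = \pi/4$) keeps $z$ on the real axis at $z = 1 - 3\epsilon/2$, and the maximally mixed state ($\epsilon = 2/3$) sits at the origin. The function name is illustrative.

```python
import cmath
import math


def z_coordinate(eps, theta):
    """C_M-plane coordinate of the twirled qutrit state with weights
    f0 = 1 - eps, f1 = eps cos^2(theta), f2 = eps sin^2(theta)."""
    omega = cmath.exp(2j * math.pi / 3)
    f1 = eps * math.cos(theta) ** 2
    f2 = eps * math.sin(theta) ** 2
    return (1 - f1 - f2) + omega * f1 + omega ** 2 * f2


eps_dep = 0.211001  # depolarizing threshold of QRM_3(2)
z = z_coordinate(eps_dep, math.pi / 4)
print(z, 1 - 1.5 * eps_dep)                    # imaginary part ~0; real parts agree
print(abs(z_coordinate(2 / 3, math.pi / 4)))   # ~0: maximally mixed state
print(abs(z_coordinate(0.1, 0.3)) <= 1)        # True: inside the unit disc
```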
The region of distillable states is actually slightly larger than the $\epsilon < \epsilon^*$ region, with a greater noise tolerance for some values of $\theta$. To find the whole distillable region we resort to numerics and present the results as part of Fig. 2. Several other important regions of the plane are also highlighted. We show the stabilizer states and bound magic states, which cannot be distilled by any stabilizer operation. Between these regions is a non-empty regime of ambiguous status, on which our protocol does not work but whose distillability is not ruled out by any known theorem. Even in the simple qubit case such puzzling regimes exist, and it has proven challenging to decide their status conclusively; see, for example, Refs. [25,26].
Also important is the success probability of distillation with $\mathrm{QRM}_3(2)$, which for all states satisfies $P \geq 1/9$ and for small $\epsilon$ is approximately …. Given these fairly high success probabilities and that we use only 8 copies per iteration, this protocol is competitive with the Bravyi-Kitaev protocol (herein BK), which also uses Reed-Muller codes. Their protocol uses 15 copies per iteration, has $P \geq 1/16$, and for small $\epsilon$ achieves $P = 1 - 15\epsilon + O(\epsilon^2)$. Our $\mathrm{QRM}_3(2)$ code requires fewer copies per iteration, but it would require more iterations to achieve the same error suppression as BK, since BK has cubic rather than quadratic error suppression. In Figs. 4(1a) and 4(1b) we consider the exact yield of our protocol, compared against BK, assuming depolarizing noise, so $\theta = \pi/4$. For small error probability, $\epsilon_{\mathrm{in}} < 0.05$, the yield of our protocol $\mathrm{QRM}_3(2)$ is similar to that of BK. Both protocols give yields of the same order of magnitude, and which protocol is superior fluctuates with the number of required iterations. However, as the initial error probability $\epsilon_{\mathrm{in}}$ increases, the yield of $\mathrm{QRM}_3(2)$ exceeds that of BK by many orders of magnitude. The dominant effect here is that the yield of BK vanishes as we approach the threshold $\epsilon^*_{\mathrm{BK}} \approx 0.1415$, whereas our protocol can tolerate depolarization all the way up to $\epsilon^*_{\mathrm{dep}} \approx 0.211$. The results of Sec. V A also give us analytic tools for estimating yields. These show that for small $\epsilon_{\mathrm{target}}$ the yield of our protocol decreases as …, where $\gamma^* = \log_2(8) = 3$. This can be compared with the BK protocol, which achieves a similar scaling with $\gamma^* = \log_3(15) \approx 2.46$. We see that these protocols have similar scaling properties, but BK performs slightly better in the large $\epsilon_{\mathrm{target}}^{-1}$ limit. However, the numerical results reported above show that finite-size effects and a superior threshold often outweigh these asymptotic arguments.
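To make the yield bookkeeping concrete, here is a sketch of the round-by-round product of $P_k/n$ for BK, using only the leading-order behaviour quoted in the text, $\epsilon' \approx 35\epsilon^3$ and $P \approx 1 - 15\epsilon$. This approximation is our own simplification, valid only for small $\epsilon$: it is not the exact expression behind Fig. 4, and its implied threshold, $1/\sqrt{35} \approx 0.169$, exceeds the true $\epsilon^*_{\mathrm{BK}}$. It does reproduce the jumps in yield that occur when an extra round becomes necessary.

```python
def rounds_and_yield(eps_in, eps_target, n, step, success, max_rounds=100):
    """Iterate eps -> step(eps) until eps <= eps_target; the yield is the
    product of success(eps) / n over the rounds performed."""
    eps, yield_, rounds = eps_in, 1.0, 0
    while eps > eps_target:
        if rounds == max_rounds:
            raise ValueError("target not reached; input may be above threshold")
        yield_ *= success(eps) / n
        eps = step(eps)
        rounds += 1
    return rounds, yield_


def bk_step(eps):
    """Leading-order Bravyi-Kitaev output error (small eps only)."""
    return 35 * eps ** 3


def bk_success(eps):
    """Leading-order Bravyi-Kitaev success probability (small eps only)."""
    return 1 - 15 * eps


for target in (1e-10, 1e-15):
    print(target, rounds_and_yield(0.01, target, 15, bk_step, bk_success))
```

Going from $\epsilon_{\mathrm{target}} = 10^{-10}$ to $10^{-15}$ forces a third round and costs roughly another factor of 15 in yield, which is the discrete jump visible in the cross-sections of Fig. 4.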
Next we apply our methods to the 5-dimensional case using the code $\mathrm{QRM}_5(1)$, as explicitly defined in Def. 6. This is the first protocol ever applied to the problem of distilling magic states in 5-dimensional systems. The code and associated protocol have many distinguishing features already mentioned: it is the smallest known non-trivial code to have a transversal non-Clifford gate; it has the largest noise threshold against depolarizing noise ($\epsilon^*_{\mathrm{dep}} = 0.363$); and it has the best known scaling in terms of expected yield (with $\gamma^* = 2$). All these features can be attributed to the fact that $d = 5$ is the smallest dimension in which diagonal non-Clifford gates exist with period $d$, allowing us to work with $m = 1$.
Again we take $M$ to be the canonical $M_5$ gate, as in Thm. 1 and Eq. (49). The $C_M$-twirled states are parameterised by a fidelity, $f_0 = 1 - \epsilon$, and 4 noise parameters $f_j$ for $j = 1, 2, 3, 4$. In Fig. 3b we show the range of different output error rates for all different types of noise and for depolarizing noise, which have thresholds of $\epsilon^* = 0.31195$ and $\epsilon^*_{\mathrm{dep}} = 0.363122$, respectively. We noted earlier that $\mathrm{QRM}_5(1)$ offers the best known protection against depolarizing noise; here we also see that its robustness against generic noise is unrivaled.
Unfortunately, 5-dimensional systems are quite complex. Even after twirling into the $C_M$-plane, we cannot easily represent the whole distillability region visually, as we did for the qutrit protocol. For this reason we focus on the depolarized case with $f_0 = 1 - \epsilon$ and $f_{j \neq 0} = \epsilon/4$. After a successful implementation of one round, a depolarized state is output with …, and this occurs with probability …. Based on these results we expect the protocol to have an excellent yield. We numerically studied the yield and again compared it against the qubit protocol $\mathrm{QRM}_2(4)$ of Bravyi and Kitaev; see Figs. 4(2a) and 4(2b). The numerics confirm that across all parameter regimes $\mathrm{QRM}_5(1)$ offers significant resource savings, potentially of many orders of magnitude. Magic state distillation is typically the most resource-intensive aspect of fault tolerance schemes, and so high-yield protocols are very desirable.

VI. STATE-INJECTION AND UNIVERSAL QUANTUM COMPUTING
Protocols for qudit magic state distillation are our main focus, but what happens after preparation of a highly purified magic state? Our ultimate goal is to simulate a non-Clifford unitary via state-injection. For the $C_M$-magic states of direct interest we show the following. … For perfect magic states, $\epsilon = 0$, this entails that $G(\rho_M \otimes \rho) = M\rho M^\dagger$. We first focus on the ideal case and later extend to noisy magic states.
For qubit systems, any magic state on the equator of the Bloch sphere may be exchanged for a unitary randomly selected from a pair of non-Clifford phase gates [13].In previous work, it was shown that a qutrit analog of the Bloch sphere equator [43] provides magic states that can be used for state-injection of non-Clifford phase gates.Here we review and generalize these ideas.
which can also be expressed as …, where we note …. The transformation is unitary, but randomly selected from $d$ different possibilities. How do we simulate the deterministic unitary required for a computation? Herein we consider unitary gates produced from a magic state, $|M_0\rangle$, such that …. Again we exploit the relationship between $M$ and the Clifford unitary $C_M = MXM^\dagger$. Noting that $C_M^k = MX^kM^\dagger$, we express the unitary as …. Therefore, we can recover the desired unitary $M$ by applying the inverse of the Clifford unitary $(X^k)^\dagger C_M^k$.
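The Clifford correction can be checked numerically as a sketch. Here $M$ is an arbitrary diagonal unitary with random phases $\Theta_j$ (our choice), $U_k$ is the diagonal gate with entries $e^{i\Theta_{j\oplus k}}$, and we confirm that $U_k = X^{-k}MX^k$, that $C_M^k = MX^kM^\dagger$, and that applying the inverse of $(X^k)^\dagger C_M^k$ to $U_k$ recovers $M$. The identity is purely algebraic; whether $C_M$ is actually Clifford depends on the choice of $M$, which this check does not test.

```python
import numpy as np


def check_correction(theta):
    d = len(theta)
    X = np.roll(np.eye(d), 1, axis=0)                   # X|j> = |j+1 mod d>
    M = np.diag(np.exp(1j * theta))
    C = M @ X @ M.conj().T                               # C_M = M X M^dagger
    ok = True
    for k in range(d):
        Xk = np.linalg.matrix_power(X, k)
        Ck = np.linalg.matrix_power(C, k)
        Uk = np.diag(np.exp(1j * np.roll(theta, -k)))    # entries e^{i Theta_{j+k}}
        ok = ok and np.allclose(Uk, Xk.conj().T @ M @ Xk)
        ok = ok and np.allclose(Ck, M @ Xk @ M.conj().T)
        correction = Xk.conj().T @ Ck                    # (X^k)^dagger C_M^k
        ok = ok and np.allclose(np.linalg.inv(correction) @ Uk, M)
    return bool(ok)


rng = np.random.default_rng(0)
print(check_correction(rng.uniform(0, 2 * np.pi, size=5)))  # True
```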
Taking the trace norm and using the triangle inequality gives …. This is a rigorous treatment of the intuition that if the magic state is almost perfect, then so too is the state-injection. The addition of non-Clifford $M$ and $M^\dagger$ gates to our repertoire of unitaries generates a set dense in the special unitary group (see Refs. [53,59] and App. D). Furthermore, for every gate in this set its inverse is also contained in the set. Thus the Solovay-Kitaev algorithm can be applied to ensure an efficient approximation of any unitary. This argument also applies to the results of Ref. [43], where the qutrit Clifford group was supplemented by a non-Clifford unitary but universality was only conjectured.

VII. DISCUSSION
We have generalized the idea of magic state distillation using quantum Reed-Muller codes to all prime dimensions, enabling preparation of highly purified non-stabilizer states given a device capable of ideal stabilizer operations. By state-injection these magic states enable us to simulate universal quantum computation. While many aspects of the generalization were closely analogous to the qubit case, there have also been some remarkable surprises. In odd prime dimension the non-Clifford gates we gain are fundamentally different from the phase gates implemented by the Bravyi-Kitaev protocols. In particular, we find that for primes $d \geq 5$ there exist quantum Reed-Muller codes of only $d - 1$ qudits that possess these non-Clifford gates as transversal gates, whereas $2^4 - 1 = 15$ qubits are needed for a similar construction.
To our knowledge the ququint ($d = 5$) code, using only 4 ququints, is the smallest non-trivial stabilizer code with a transversal non-Clifford gate. This translates into real practical gains, with the ququint protocol achieving a better error probability threshold (see Sec. V B) than any other known protocol with a polynomially scaling yield: $\epsilon^*_{\mathrm{dep}} = 0.363$ for depolarizing noise. Calculating the yield of the ququint protocol also shows that it is superior to all known qubit protocols, as demonstrated both by numerics and by analytic scaling arguments. For larger prime dimensions, $d > 5$, the thresholds and resource costs deteriorate with increasing dimension. It is not presently clear whether this is an inevitable problem with higher-dimensional systems or a peculiarity of our protocols.
We also investigate in detail the performance of an 8 qutrit (d = 3) protocol, which whilst not as effective as the ququint protocol was still competitive against qubit protocols.
It is natural to question whether $\epsilon$, as defined in Eq. (9), is a fair measure for comparing noise thresholds in systems of different dimensions. It would be desirable to use a noise measure that is practically motivated by noise processes that could occur in the lab. In [41], the depolarising noise rate $\delta$ is employed, where $\delta$ measures the degree of depolarising noise of a state $\rho$ relative to a pure state $|\psi\rangle$ via $\rho = (1 - \delta)|\psi\rangle\langle\psi| + (\delta/d)\mathbb{1}$. For a depolarising noise model, $\delta$ is related to $\epsilon$ via $\epsilon = ((d - 1)/d)\delta$. Quantifying error via $\delta$ penalises higher-dimensional states, yet even by this measure the thresholds for the 4-ququint code continue to significantly outperform their qubit counterparts.
Nevertheless, one can argue that $\delta$ is also an unfair basis for comparison, given the larger number of noise processes that contribute to depolarising noise in higher-dimensional systems. Ultimately, in the context of magic state distillation, the most relevant measure of comparison would be the yield at the fault-tolerance threshold. Unfortunately, at present, fault-tolerant quantum computation with higher-dimensional systems remains a little-explored research area, and thresholds comparable to, e.g., those of Knill [4] or Harrington et al. [3] are unknown. We know of only one study of qudit fault tolerance thresholds [60], and while evidence was presented there that higher-dimensional systems may provide better thresholds than their binary counterparts, the analysis in that paper is limited. In particular, therefore, our results motivate further study of full fault-tolerance schemes based on ququint and qutrit components. It is possible that the enhanced performance in dimensions 3 and 5 seen in our magic state distillation protocols translates into better thresholds and resource costs for full fault tolerance schemes based on qutrits and ququints.
Another application of our results is to models of computation where the fault-tolerant operations are a proper subgroup of the Clifford group. For instance, the qubit topological cluster states [3,17] cannot directly prepare Y eigenstates, but these can be distilled using magic state distillation. In qudit generalizations of the topological cluster scheme, we anticipate that preparation of XZ eigenstates will not be topologically protected. While we have focused on distillation of non-stabilizer states, our protocols also enable distillation of XZ stabilizer states.
Our understanding of the magic state model is still in its infancy, despite many striking similarities with the more mature theory of entanglement. However, as reviewed in the introduction, there has been a flurry of recent results on the qudit magic state model. Numerous problems of a fundamental nature now present themselves as ripe for tackling. Inspired by entanglement theory, we might ask whether there exist qudit protocols for magic catalysis [61,62] or magic activation [61,63]. Furthermore, while all known protocols offer yields of magic states with arbitrarily small error probabilities, the yield vanishes as the target error vanishes. Contrast this with entanglement theory, where the hashing protocol [64,65] and quantum polar-coding techniques [66] offer a method of distilling entanglement at a non-zero yield even for vanishing target error.
Whether such a protocol could exist for magic state distillation is an intriguing and wide open question.
In the final stages of this research we became aware of recent work that proposes a novel protocol [58] for qubit magic state distillation. The protocol, which they call the 10-to-2 protocol, takes 10 noisy magic states each iterate and outputs 2 magic states. This is the first protocol to output more than 1 magic state per iterate, and this has the benefit of increasing its yield. Similar techniques could potentially also be used to design higher-dimensional protocols.
unitaries, that is dense in $SU(d^n)$. In Ref. [59], Theorem 7.3 implies that any finite group that contains the Clifford group must be generated by the Clifford group and a gate proportional to the identity. Thus the group $H$ generated by the Clifford group and a non-Clifford unitary (not proportional to the identity) cannot be finite and must be of infinite order.
In Corollary 6.8.2 of [53], it is shown that any closed subgroup $H$ satisfying $\mathcal{C}^n_d \subset H \subset U(d^n)$ must either have finite order (ignoring global phase factors) or be $SU(d^n)$. Combining this corollary with the previous theorem, we conclude that the closure of the group generated by the Clifford group and any non-Clifford unitary (not proportional to the identity) is $SU(d^n)$.

Theorem 2
Consider any $M \in \mathcal{M}^m_d$ for any odd prime $d$ and any integer $m \geq 2$, or any odd prime $d \geq 5$ and $m \geq 1$.

1. Take $n$ copies of the state $\rho$ and $C_M$-twirl;
2. Measure generators of the phase stabilizer $S_Z$;
3. Accept all outcomes, but perform a Clifford correction operator $C_M[w]$ tuned to the outcomes;
4. Measure generators of the bit-flip stabilizer $S_X$;
5. Postselect on all "+1" measurement outcomes;
6. Decode the encoded qudit to a single qudit;
7. Use the output, labelled $\rho'$, as input in the next iterate.

Definition 7 Unshortened Reed-Muller codes, $\mathrm{RM}_d(1, m)$, are classical linear codes on $\mathbb{F}_d^n$, where $n = d^m$, of dimension $m + 1$. They are the set of codewords $\mathrm{RM}_d(1, m) = \{U^m_d(\bar{u}) \oplus c\mathbf{1} : \bar{u} \in \mathbb{F}_d^m, c \in \mathbb{F}_d\}$ defined in terms of affine functions.
Such codes have many exotic properties. Before investigating them we introduce one more definition.
Definition 8 We say a function $\Lambda : \mathbb{F}_d^n \to \mathbb{Z}$ is a λ-function if there exists a set of $d$ integers $\{\lambda_0, \ldots, \lambda_{d-1}\}$ such that $\sum_{j \in \mathbb{F}_d}\lambda_j = 0$ and $\Lambda(v) = \sum_{j=1}^{n}\lambda_{v_j}$.

Definition 9
Shortened Reed-Muller codes, $RM^*_d(1, m)$, are classical linear codes on $\mathbb{F}^n_d$, where $n = d^m - 1$, of dimension $m$. They are the set of codewords $RM^*_d(1, m) = \{P^m_d(\bar{u}) : \bar{u} \in \mathbb{F}^m_d\}$ defined in terms of linear maps.
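Assuming $P^m_d(\bar{u})$ lists the values $\bar{u} \cdot v$ over the $d^m - 1$ nonzero points $v \in \mathbb{F}^m_d$ (again an assumption about the map's definition), a nonzero codeword contains the value 0 exactly $d^{m-1} - 1$ times and each nonzero value $d^{m-1}$ times. Hence any λ-function takes the same value, $-\lambda_0$, on every nonzero codeword of $RM^*_d(1, m)$. The sketch checks this for $(d, m) = (3, 2)$ and $(5, 1)$, which presumably correspond to the labels $QRM_3(2)$ and $QRM_5(1)$; the λ values are hypothetical.

```python
from itertools import product

def shortened_rm_codewords(d, m):
    """Codewords of RM*_d(1, m).

    Assumption: P^m_d(u) lists u.v mod d over the nonzero points v of F_d^m.
    """
    points = [v for v in product(range(d), repeat=m) if any(v)]
    return {tuple(sum(ui * vi for ui, vi in zip(u, v)) % d for v in points)
            for u in product(range(d), repeat=m)}

def lambda_function(lams):
    """Lambda(v) = sum_j lams[v_j]; the lambda values must sum to zero."""
    if sum(lams) != 0:
        raise ValueError("lambda values must sum to zero")
    return lambda v: sum(lams[x] for x in v)

for d, m, lams in [(3, 2, [2, -1, -1]), (5, 1, [4, -1, -1, -1, -1])]:
    Lam = lambda_function(lams)
    words = shortened_rm_codewords(d, m)
    values = {Lam(w) for w in words if any(w)}
    print(d, m, len(words), values, -lams[0])
# 3 2 9 {-2} -2
# 5 1 5 {-4} -4
```

For $d = 5$, $m = 1$ the code has length $n = 4$, and each nonzero codeword is simply a permutation of $(1, 2, 3, 4)$, which makes the constant value $-\lambda_0$ immediate.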

Theorem 4
$QRM_d(m)$ quantum codes are $\mathcal{M}^m_d$-distillation codes of distance $D = 2$.

FIG. 3: The output error $\epsilon_{\mathrm{out}}$ against the input error $\epsilon_{\mathrm{in}}$ for (a) $QRM_3(2)$ and (b) $QRM_5(1)$. For a fixed $\epsilon_{\mathrm{in}}$ there are many compatible states, and hence many possible output errors; these are shown as a region rather than a single curve. For worst-case noise we mark the threshold $\epsilon^*$. The dashed line shows the specific case of depolarizing noise, and the associated depolarizing threshold $\epsilon^*_{\mathrm{dep}}$ is also shown. The straight line is simply the "break-even" line, $\epsilon_{\mathrm{out}} = \epsilon_{\mathrm{in}}$.

FIG. 4: The yield, on a log scale, of our protocols $QRM_3(2)$ and $QRM_5(1)$ (blue) compared with the Bravyi-Kitaev $QRM_2(4)$ protocol (red). Plots (1a) and (2a) show the yield as a function of the initial error probability $\epsilon_{\mathrm{in}}$ and the target error probability $\epsilon_{\mathrm{target}}$. For the qutrit and ququint states the noise is depolarizing. Plots (1a) and (2a) come with cross-sections (1b) and (2b), respectively, in which the target error probability is held constant. The sudden changes in yield occur because of discrete changes in the number of iterations required.

TABLE I :
Definition 11
We say a qudit quantum state $|\Theta\rangle$ is equatorial, or a phase state, if $|\Theta\rangle = d^{-1/2} \sum_j e^{i\Theta_j} |j\rangle$ for some $\Theta \in \mathbb{R}^d$. The $|M_0\rangle$ state is a phase state with $\Theta_j = 2\lambda_j\pi/d^m$. The essential feature of such states is that they are unbiased with respect to the computational basis, so a $Z$ measurement generates completely random outcomes. Taking an unknown state $|\psi\rangle$ and measuring $Z \otimes Z^\dagger$ on the pair $|\Theta\rangle|\psi\rangle$ also gives unbiased outcomes, and so no information about $|\psi\rangle$ is gained. Denoting a general state as $|\psi\rangle = \sum_j c_j |j\rangle$, the projection $\Pi_k$ onto the subspace stabilized by $\omega^{-k} Z \otimes Z^\dagger$ yields
$$\Pi_k |\Theta\rangle|\psi\rangle \propto \sum_j c_j e^{i\Theta_{j \oplus k}} |j \oplus k\rangle|j\rangle. \qquad (81)$$
We decode by performing a Clifford unitary such that $|j \oplus k\rangle|j\rangle \to |k\rangle|j\rangle$ and tracing out the first system. As promised, the result is a unitary transform, $|\psi\rangle \to U_k(\Theta)|\psi\rangle$, where $U_k(\Theta) = \sum_j e^{i\Theta_{j \oplus k}} |j\rangle\langle j|$,