Optimizing short stabilizer codes for asymmetric channels

For a number of quantum channels of interest, phase-flip errors occur far more frequently than bit-flip errors. When transmitting across these asymmetric channels, the decoding error rate can be reduced by tailoring the code used to the channel. However, analyzing the performance of stabilizer codes on these channels is made difficult by the #P-completeness of optimal decoding. To address this, at least for short codes, we demonstrate that the decoding error rate can be approximated by considering only a fraction of the possible errors caused by the channel. Using this approximate error rate calculation, we extend a recent result to show that there are a number of [[5 ≤ n ≤ 12, 1 ≤ k ≤ 3]] cyclic stabilizer codes that perform well on two different asymmetric channels. We also demonstrate that an indication of a stabilizer code's error rate is given by considering the error rate of a classical binary code related to the stabilizer. This classical error rate is far less complex to calculate, and we use it as the basis for a hill-climbing algorithm, which we show to be effective at optimizing codes for asymmetric channels. Furthermore, we demonstrate that simple modifications can be made to our hill-climbing algorithm to search for codes with desired structure requirements.


I. INTRODUCTION
Quantum codes can be employed to protect quantum information against the effects of a noisy channel. Of particular note are the stabilizer codes, which are defined by a stabilizer S that is an Abelian subgroup of the n-qubit Pauli group P_n [1]. An [[n, k]] stabilizer code encodes the state of a k-qubit system in that of an n-qubit system; that is, it is a subspace Q ⊆ (C²)^⊗n of dimension 2^k. For a Pauli channel, an error E acting on the code is also an element of P_n, with the component acting on any given qubit being I with probability p_I, X with probability p_X, Y with probability p_Y, or Z with probability p_Z. Most stabilizer codes are implicitly designed for good decoding performance (that is, a low decoding error rate) on the depolarizing channel, where p_X = p_Y = p_Z. This is achieved by ensuring that the code has large distance d, which is the weight of the lowest weight error that yields a trivial syndrome while having a nontrivial effect on the code. However, for a number of channels of physical interest, Z errors occur far more frequently than X errors [2,3]. For these channels, better decoding performance can be achieved by using codes that are tailored to the channel [4,5].
In this paper, our focus is on the construction of stabilizer codes for two different asymmetric channels. The first of these is the biased XZ channel, for which the X and Z components of an error occur independently at different rates. The second is a Pauli approximation of the combined amplitude damping (AD) and dephasing channel. Both of these channels have two degrees of freedom, which means that the values of p_X, p_Y, and p_Z can be defined via the total error probability p = p_X + p_Y + p_Z and bias η = p_Z/p_X [4,6]. A well-studied approach to constructing codes for asymmetric channels is to restrict consideration to Calderbank-Shor-Steane (CSS) codes [7,8], which can be designed to have separate X and Z distances d_X and d_Z (typically d_Z > d_X) [6,9-13]. We wish to take a more direct approach to the problem by actually determining the decoding error rates of the codes we construct (this also allows us to meaningfully consider non-CSS codes). However, to do this, we have to overcome the #P-completeness of stabilizer decoding [14], which stems from the equivalence of errors up to an element of the stabilizer. To achieve this, at least for short codes (that is, codes with small n), we first demonstrate that the error rate of an optimal decoder can be approximated by considering only a small subset E of the 4^n possible Pauli errors. We derive a bound on the relative error in this approximation, and we demonstrate that the independence of error components can be exploited to construct E without having to enumerate all possible errors. We also show that the performance of a classical [2n, n + k] binary linear code associated with the stabilizer [1,15] gives an indication of the stabilizer code's performance (note that whenever we mention a code's performance or error rate, we are referring to that of the associated decoder). It is several orders of magnitude faster to calculate this classical error rate, and we show that it can itself be approximated using a limited error set.
We have a particular focus on cyclic codes, which are stabilizer codes based on classical self-orthogonal additive cyclic GF(4) codes [16-18] [where GF(q) is the q-element finite field]. This is motivated by the recent result of Ref. [4], where it was shown that a [[7,1]] cyclic code performs near optimally compared to 10 000 randomly constructed codes on the biased XZ channel for a range of error probabilities and biases. We extend this result by enumerating the [[5 ≤ n ≤ 12, 1 ≤ k ≤ 3]] cyclic codes and making use of our approximate error rate calculation. In particular, we demonstrate that there are a number of cyclic codes that perform well compared to the best of 10 000 randomly constructed codes for both the biased XZ and AD channels across a range of p and η values. In some cases, such as [[n ≥ 9, 1]] codes for the biased XZ channel, the best cyclic codes significantly outperform the best of the random codes constructed. To improve on the poor performance of the random search, we demonstrate the effectiveness of a simple hill-climbing algorithm that attempts to optimize the performance of the classical binary code associated with a stabilizer. We also show that by modifying the mutation operation employed by this hill-climbing algorithm, we can effectively search for codes with desired structure. In particular, we show that we can search for codes with weight-four generators, CSS codes, and linear codes.
The paper is organized as follows. Section II gives an overview of classical codes, asymmetric quantum channels, and stabilizer codes. In Sec. III, we detail our methods for calculating approximate error rates. In Sec. IV, we demonstrate the performance of cyclic codes, outline our hill-climbing search algorithm, and show its effectiveness. The paper is concluded in Sec. V.

II. BACKGROUND

A. Classical codes
A classical channel Φ maps a set of inputs A_x to a set of outputs A_y. We are interested in the case where A_x = A_y = GF(q), for which the action of the channel is given by

Φ(x) = x + e = y, (1)

where x ∈ GF(q) is the channel input, y ∈ GF(q) is the channel output, and e ∈ GF(q) is an error (or noise) symbol that occurs with probability P(e). Φ is called symmetric if P(0) = 1 − p and P(e_i) = p/(q − 1) for e_i ≠ 0. The noise introduced by the channel can be protected against using a code C ⊆ GF(q)^n, whose elements are called codewords. The effect of the combined channel Φ^n, which is composed of n copies of Φ, on some codeword x ∈ C is

Φ^n(x) = x + e = y, (2)

where y ∈ GF(q)^n is the channel output and e ∈ GF(q)^n is an error "vector." Assuming that error components occur independently, the probability of e = (e_1, . . ., e_n) occurring is

P(e) = ∏_{i=1}^{n} P(e_i), (3)

where P(e_i) is the probability of the error symbol e_i occurring on Φ. It follows that for a symmetric channel, the probability of an error e occurring depends only on its weight w(e), which is the number of nonzero components from which it is composed. The distance d of a code is the weight of the lowest weight error mapping one codeword to another. The (minimum) weight w(C) of a code C is simply the weight of the lowest weight codeword it contains.
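As a concrete illustration of Eq. (3), the following sketch computes the probability of an error vector on a q-ary symmetric channel; the function name and interface are our own and are not taken from the paper. For a symmetric channel, the result depends only on the weight w(e).

```python
def error_probability(e, p, q=2):
    """Probability of error vector e on a q-ary symmetric channel with
    total symbol error probability p, following Eq. (3). For a symmetric
    channel this depends only on the weight w(e)."""
    n = len(e)
    w = sum(1 for ei in e if ei != 0)   # weight w(e): number of nonzero components
    return (1 - p) ** (n - w) * (p / (q - 1)) ** w
```

For example, on a binary symmetric channel with p = 0.1, the weight-2 error (1, 0, 1) of length 3 has probability 0.9 × 0.1².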
A code is called additive if it forms a group (under addition) and linear if it forms a vector space. Such codes can be described by a generator matrix

G = (b_1 · · · b_k)^T, (4)

where B = {b_1, . . ., b_k} is either a generating set or basis, respectively (note that we consider codewords as column vectors). A linear code can also be defined as the kernel of a GF(q) parity-check matrix H; that is,

C = {c ∈ GF(q)^n : Hc = 0}. (5)

If H has m rows, then dim(C) = k ≥ n − m, with equality when H is full rank. For a linear code, the errors mapping one codeword to another are themselves codewords; therefore, it follows that the distance of a linear code C is simply d = w(C). A linear code of length n with dimension k and distance d is called an [n, k]_q or [n, k, d]_q code (the q is typically omitted for binary codes, where q = 2). More generally, a length-n code of size |C| = K and distance d is called an (n, K)_q or (n, K, d)_q code. The dual code of some C ⊆ GF(q)^n with respect to the inner product ⟨•, •⟩ : GF(q)^n × GF(q)^n → GF(q) is

C^⊥ = {c′ ∈ GF(q)^n : ⟨c′, c⟩ = 0 ∀ c ∈ C}. (6)

C^⊥ is the annihilator of C and is therefore a linear code. If C ⊆ C^⊥, then C is called self-orthogonal. Unless otherwise specified, the dual code is with respect to the Euclidean inner product

⟨a, b⟩ = Σ_{i=1}^{n} a_i b_i. (7)

In this case, if C is linear with generator matrix G, then a necessary and sufficient condition for c ∈ C^⊥ is Gc = 0; that is, a generator matrix for C is a parity-check matrix for C^⊥. Conversely, if H is a parity-check matrix for C, then it is a generator matrix for C^⊥. A decoder uses the output of a channel to infer its input. For a linear code, this inference can be aided by the syndrome

z = Hy = H(x + e) = He. (8)

As channel outputs that differ only by a codeword yield the same syndrome, the q^{n−k} possible syndromes can be associated with the cosets of GF(q)^n/C. Given some syndrome measurement z, an optimal maximum a posteriori (MAP) decoder will then return the most probable error

ê_z = argmax_{e ∈ GF(q)^n : He = z} P(e|z) (9)

in the corresponding coset. The channel input can then be inferred as x̂ = y − ê_z. If e = ê_z (and hence x̂ = x), then decoding is successful; otherwise, a decoding error has occurred.
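The syndrome decoding procedure can be sketched for a toy binary code. The code, variable names, and table-based decoder below are illustrative choices of ours, not taken from the paper; for a binary symmetric channel with p < 1/2, the MAP rule of Eq. (9) reduces to picking the minimum-weight error (the coset leader) for each syndrome.

```python
from itertools import product

# Parity-check matrix of the [3, 1] binary repetition code {000, 111}.
H = [[1, 1, 0],
     [0, 1, 1]]

def syndrome(v):
    """z = Hv over GF(2), as in Eq. (8)."""
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)

# Build the coset-leader table: for each syndrome, keep the lowest-weight
# error producing it (the MAP choice on a BSC with p < 1/2).
coset_leaders = {}
for e in product([0, 1], repeat=3):
    z = syndrome(e)
    if z not in coset_leaders or sum(e) < sum(coset_leaders[z]):
        coset_leaders[z] = e

def decode(y):
    """Infer the channel input as x-hat = y - e-hat (over GF(2),
    subtraction is addition)."""
    e_hat = coset_leaders[syndrome(y)]
    return tuple((yi + ei) % 2 for yi, ei in zip(y, e_hat))
```

Any single bit flip is corrected back to the transmitted codeword; a two-bit error is miscorrected to the wrong codeword, which is exactly the decoding-error event contributing to the FER.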
The probability of such a decoding error, called the frame error rate (FER), is simply

F = 1 − Σ_z P(ê_z). (10)

Unfortunately, even in the simple case of a binary code operating on the binary symmetric channel (a symmetric channel with q = 2), this decoding problem can be shown to be NP-complete [19]. This complicates the design of highly performant codes (that is, codes yielding a low FER). In practice, when designing codes for symmetric channels, the simpler goal of achieving a large distance is typically settled for. This is motivated by the fact that for low-distance codes, there are many errors in each coset ê_z + C with weight, and hence probability, similar to that of ê_z, which leads to a high FER according to Eq. (10) (see Sec. II A of Ref. [20] for a more detailed discussion). Two codes C and C′ are called permutation equivalent if they are the same up to a relabeling of coordinates. Permutation-equivalent codes share a large number of properties including length, size, and distance; furthermore, they yield the same FER for channels where the error components are independently and identically distributed. While there are more general notions of code equivalence, whenever we say that two codes are equivalent in this paper, we mean that they are permutation equivalent. Furthermore, if some family (set) of codes {C_1, . . ., C_N} can be split into M equivalence classes (according to permutation equivalence), then we simply say that M of the codes are inequivalent.

B. Cyclic codes
Cyclic codes are those for which a cyclic shift of any codeword is also a codeword; that is, for a cyclic code C, if (c_0, c_1, . . ., c_{n−1}) ∈ C, then it is also the case that (c_{n−1}, c_0, . . ., c_{n−2}) ∈ C (note that to be consistent with standard convention, we index the codewords of cyclic codes from zero in this section). If C is linear, then it has a convenient description through the mapping

c = (c_0, c_1, . . ., c_{n−1}) ↔ c(x) = c_0 + c_1 x + · · · + c_{n−1} x^{n−1} (11)

of codewords to polynomials in GF(q)[x]. Cyclic shifts of codewords correspond to a multiplication by x taken modulo x^n − 1, from which it follows that C corresponds to an ideal I_C ⊆ GF(q)[x]/(x^n − 1). Any such ideal is principal and is generated by a unique monic polynomial of minimal degree g(x) ∈ I_C that is a factor of x^n − 1 [21]; through slight abuse of notation, we write C = ⟨g(x)⟩. C has dimension k = n − deg(g) and has a generator matrix whose rows correspond to g(x), xg(x), . . ., x^{k−1}g(x). Furthermore, a parity-check matrix is given in terms of the check polynomial h(x) = (x^n − 1)/g(x). It follows that the dual code C^⊥ is also cyclic and is generated by x^k h(x^{−1}).
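To make the polynomial picture concrete, the sketch below builds the [7, 4] binary cyclic code generated by g(x) = 1 + x + x³ (a factor of x⁷ − 1 over GF(2), chosen here purely as a familiar example) and verifies closure under cyclic shifts.

```python
from itertools import product

n = 7
g = (1, 1, 0, 1, 0, 0, 0)   # g(x) = 1 + x + x^3, a factor of x^7 - 1 over GF(2)

def shift(c):
    """Cyclic shift of a codeword; equivalent to multiplying the
    polynomial c(x) by x modulo x^n - 1."""
    return (c[-1],) + c[:-1]

# Generator matrix rows correspond to g(x), x g(x), ..., x^{k-1} g(x),
# with k = n - deg(g) = 4.
rows = [g]
for _ in range(3):
    rows.append(shift(rows[-1]))

# The code is the GF(2) span of these rows: 2^4 = 16 codewords.
code = {tuple(sum(a * r[i] for a, r in zip(coeffs, rows)) % 2 for i in range(n))
        for coeffs in product([0, 1], repeat=4)}
```

Since the span of these k rows coincides with the ideal ⟨g(x)⟩, the resulting set of 16 codewords is closed under cyclic shifts.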
In the quantum setting, we are particularly interested in codes over GF(4) = {0, 1, ω, ω² = ω̄} that are self-orthogonal with respect to the trace inner product (this will be explained further in Sec. II D). Note that the trace inner product of a, b ∈ GF(4)^n is

⟨a, b⟩_tr = Σ_{i=1}^{n} tr(a_i b̄_i), (14)

where conjugation is given by 0̄ = 0, 1̄ = 1, ω̄ = ω², and ω̄² = ω; and the trace tr(x) = x + x² maps GF(4) to GF(2) [16]. More generally, an (n, 2^k)_4 additive cyclic code C has two generators [16-18]. Following the formulation of Ref. [16],

C = ⟨ωp(x) + q(x), r(x)⟩, (15)

where p(x), q(x), r(x) ∈ GF(2)[x]; p(x) and r(x) are factors of x^n − 1; and r(x) is also a factor of q(x)(x^n − 1)/p(x). In general, the choice of generators is not unique; however, any other representation will be of the form C = ⟨ωp(x) + q′(x), r(x)⟩, where q′(x) ≡ q(x) [mod r(x)]. The size of C is 2^k, where k = 2n − deg(p) − deg(r), with a generator matrix consisting of the n − deg(p) cyclic shifts of the codeword corresponding to ωp(x) + q(x) and the n − deg(r) cyclic shifts of the codeword corresponding to r(x). C is self-orthogonal (with respect to the trace inner product) if and only if

p(x)r(x^{n−1}) ≡ p(x^{n−1})r(x) ≡ 0 (mod x^n − 1) (16)

and

p(x)q(x^{n−1}) ≡ p(x^{n−1})q(x) (mod x^n − 1). (17)

It is possible to enumerate all the self-orthogonal (n, 2^k)_4 additive cyclic codes through a slight modification of the method presented in Ref. [22]: r(x) ranges over all factors of x^n − 1; for each r(x), p(x) ranges over the factors of x^n − 1 of degree 2n − k − deg(r) that satisfy Eq. (16); and for each pair of r(x) and p(x), q(x) ranges over the polynomials with deg(q) < deg(r) that satisfy both Eq. (17) and the requirement that r(x) divides q(x)(x^n − 1)/p(x). While every additive cyclic code has a "canonical" representation involving two generators, many of them can be described using only one [17,18] (that is, they have a generating set composed of cyclic shifts of a single codeword). This is guaranteed to be the case if r(x) = x^n − 1 or if p(x) = x^n − 1 and q(x) is a multiple of r(x). However, these are not necessary conditions for a single-generator representation to exist. For example, there is a (5, 2^5)_4 code with p(x) = 1 + x, q(x) = x³, and r(x) = 1 + x + x² + x³ + x⁴, which gives a canonical generator matrix consisting of the four cyclic shifts of the codeword ωω010 corresponding to ωp(x) + q(x) together with the codeword 11111 corresponding to r(x); however, it also has a generator matrix consisting of the five cyclic shifts of ωω010. We can express this code compactly as C = ⟨ωω010, 11111⟩_cyc ≡ ⟨ωω010⟩_cyc.
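The trace inner product and the self-orthogonality of the (5, 2^5)_4 example above can be checked directly. In the sketch below, the GF(4) elements {0, 1, ω, ω²} are encoded as the integers 0-3 (an encoding of our choosing), conjugation is x ↦ x², and tr(x) = x + x².

```python
# GF(4) = {0, 1, w, w^2} encoded as the integers 0, 1, 2, 3.
MUL = [[0, 0, 0, 0],        # multiplication table for GF(4)
       [0, 1, 2, 3],
       [0, 2, 3, 1],        # w * w = w^2, w * w^2 = 1
       [0, 3, 1, 2]]
CONJ = [0, 1, 3, 2]         # conjugation x -> x^2 swaps w and w^2
TR = [0, 0, 1, 1]           # trace tr(x) = x + x^2, valued in GF(2)

def trace_inner(a, b):
    """Trace inner product of GF(4) vectors, as in Eq. (14)."""
    return sum(TR[MUL[ai][CONJ[bi]]] for ai, bi in zip(a, b)) % 2

# The five cyclic shifts of the codeword ww010 from the (5, 2^5)_4 example.
c = (2, 2, 0, 1, 0)
shifts = [c[-i:] + c[:-i] for i in range(5)]
```

Since the code is the additive span of these shifts, checking that every pair of shifts has a vanishing trace inner product confirms self-orthogonality.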

C. Quantum channels
The action of a quantum channel Φ on a quantum state described by the density operator ρ is

Φ(ρ) = Σ_k A_k ρ A_k†,

where the A_k, called Kraus operators, satisfy Σ_k A_k† A_k = I (the identity operator) [23]. We are interested in qubit systems, for which states belong to a two-dimensional Hilbert space H ≅ C². Furthermore, we are concerned with Pauli channels, which are of the form

Φ(ρ) = p_I ρ + p_X XρX + p_Y YρY + p_Z ZρZ,

where p_I + p_X + p_Y + p_Z = 1, and in the computational {|0⟩, |1⟩} basis

X = (0 1; 1 0), Y = (0 −i; i 0), Z = (1 0; 0 −1).

The action of this channel can be interpreted as mapping a pure state |φ⟩ to E|φ⟩, where the error E is I with probability p_I, X with probability p_X, Y with probability p_Y, or Z with probability p_Z [24]. X can be viewed as a bit-flip operator as X|0⟩ = |1⟩ and X|1⟩ = |0⟩. Z can be viewed as a phase flip as Z|0⟩ = |0⟩ and Z|1⟩ = −|1⟩. Y = iXZ can be viewed as a combined bit and phase flip. The quantum equivalent of the symmetric channel is the depolarizing channel, for which p_I = 1 − p and p_X = p_Y = p_Z = p/3. For a number of systems of physical interest, phase-flip errors occur far more frequently than bit-flip errors [2,3]. We focus on two such asymmetric channels in this paper. The first is the biased XZ channel, for which the X and Z components of an error E ∝ X^u Z^v, where u, v ∈ GF(2), occur independently with probabilities q_X and q_Z, respectively. It follows from the independence of the error components that p_X = q_X(1 − q_Z), p_Z = q_Z(1 − q_X), and p_Y = q_X q_Z. A typical way to specify an asymmetric channel with two degrees of freedom is through the total error probability p = p_X + p_Y + p_Z and bias η = p_Z/p_X. Note that while this definition of bias is consistent with Refs. [4,6], some authors give alternate definitions; for example, bias is defined as p_Z/(p_X + p_Y) in Ref. [5] and (p_Y + p_Z)/(p_X + p_Y) in Ref.
[25]. Ultimately, the exact nature of the channel parametrization will have no real impact on our results, which has led us to select the simplest definition of bias. The second channel of interest is the combined amplitude damping (AD) and dephasing channel, which is described by a set of non-Pauli Kraus operators depending on two parameters γ and λ. A Pauli approximation of this channel can be obtained through a process called Pauli twirling [26-28]; we denote the resulting approximate Pauli channel by T [6]. Again, this channel has two degrees of freedom (λ and γ) and can therefore be described in terms of the total error probability p and bias η = p_Z/p_X. Note that in the case of η = 1, T reduces to the depolarizing channel. For the sake of brevity, we will simply refer to T as the AD channel. The Pauli matrices are Hermitian, unitary, and anticommute with each other. Furthermore, they form a group

P_1 = {±I, ±iI, ±X, ±iX, ±Y, ±iY, ±Z, ±iZ} = ⟨X, Y, Z⟩ (28)

called the Pauli group. The n-qubit Pauli group P_n is composed of all n-fold tensor product combinations of elements of P_1. Note that when writing elements of P_n, the tensor products are often implied; for example, we may write I ⊗ I ⊗ X ⊗ I ⊗ Y ⊗ Z ⊗ I ⊗ I ∈ P_8 as IIXIYZII. The weight w(g) of some g ∈ P_n is the number of nonidentity components from which it is composed. It follows from the commutation relations of the Pauli matrices that any two elements of P_n commute if their nonidentity components differ in an even number of places; otherwise, they anticommute.
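For the biased XZ channel, the relationship between (q_X, q_Z) and (p, η) can be made concrete numerically. The helper functions below are our own (not from the paper); they recover q_X and q_Z from a given p and η by bisection, using the fact that at fixed total error probability p the bias p_Z/p_X decreases monotonically as q_X grows.

```python
def biased_xz_probs(qx, qz):
    """(p_X, p_Y, p_Z) when the X and Z error components occur
    independently with probabilities qx and qz."""
    return qx * (1 - qz), qx * qz, qz * (1 - qx)

def from_p_eta(p, eta, iters=200):
    """Invert (p, eta) -> (qx, qz) by bisection on qx, assuming
    0 < p < 1 and eta > 0."""
    lo, hi = 1e-15, p
    for _ in range(iters):
        qx = (lo + hi) / 2
        qz = (p - qx) / (1 - qx)       # enforces p_X + p_Y + p_Z = p
        px, _, pz = biased_xz_probs(qx, qz)
        if pz / px > eta:              # bias still too high: raise qx
            lo = qx
        else:
            hi = qx
    return qx, qz
```

The identity p_X + p_Y + p_Z = q_X + q_Z − q_X q_Z makes the substitution q_Z = (p − q_X)/(1 − q_X) hold the total error probability fixed while the bisection tunes the bias.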
As in the classical case, the noise introduced by a quantum channel can be protected against using a code. In the qubit case, a code is a subspace Q ⊆ (C²)^⊗n whose elements are again called codewords. These codewords are transmitted across the combined n-qubit channel Φ^⊗n, which, in the Pauli case, maps a codeword |φ⟩ to E|φ⟩ where E ∈ P_n. Similar to the classical case of Eq. (3), if the error components are independent, then the probability of an error E = E_1 ⊗ · · · ⊗ E_n occurring is

P(E) = ∏_{i=1}^{n} P(E_i),

where P(E_i) is the probability of the error E_i occurring (up to phase) on the single-qubit channel Φ. The equivalence of errors up to phase can be addressed more explicitly by instead considering Ẽ = {E, −E, iE, −iE} ∈ P_n/{±I, ±iI} = P̃_n.

D. Stabilizer codes
Stabilizer codes are defined by an Abelian subgroup S < P_n, called the stabilizer, that does not contain −I [1]. The code Q is the space of states that are fixed by every element s_i ∈ S; that is,

Q = {|φ⟩ ∈ (C²)^⊗n : s_i|φ⟩ = |φ⟩ ∀ s_i ∈ S}.

The requirement that −I ∉ S means both that no s ∈ S can have a phase factor of ±i, and also that if s ∈ S, then −s ∉ S. If S is generated by M = {M_1, . . ., M_m} ⊂ P_n, then it is sufficient (and obviously necessary) for Q to be stabilized by every M_i. Assuming that the set of generators is minimal, which will be the case for all codes considered in this paper, it can be shown that dim(Q) = 2^k where k = n − m [24]; that is, Q encodes the state of a k-qubit system.
Suppose an error E occurs, mapping some codeword |φ⟩ ∈ Q to E|φ⟩. Measuring the generators M_i then yields a syndrome z ∈ GF(2)^{n−k}, where z_i = 0 if E commutes with M_i and z_i = 1 if it anticommutes. Defining S̄ = {s̄ = {s, −s, is, −is} : s ∈ S}, the syndrome resulting from Ẽ ∈ P̃_n depends only on which coset of P̃_n/N(S̄) it belongs to, where N(S̄) = {g ∈ P̃_n : g^{−1}S̄g = S̄} is the normalizer of S̄ in P̃_n; furthermore, the effect of Ẽ on the code depends only on which coset of P̃_n/S̄ it belongs to [1]. Note that as S̄ ⊴ N(S̄), the 2^{n−k} cosets of P̃_n/N(S̄) are each the union of 2^{2k} cosets of P̃_n/S̄. In the classical case, the distance d of a linear code is equal to the weight of the lowest weight error yielding a trivial syndrome while having a nontrivial effect on the code. This extends to the quantum case, with the distance d of a stabilizer code being the weight of the lowest weight element in N(S̄)\S̄ [1]. An n-qubit code of dimension 2^k and distance d is called an [[n, k]] or [[n, k, d]] code (the double brackets differentiate it from a classical code).
Given the equivalence of errors up to an element of the stabilizer, a MAP decoder will determine the most likely coset

Â_z = argmax_{Ā ∈ P̃_n/S̄ consistent with z} P(Ā)

for the syndrome measurement z. If Â_z has the representative Ã = {Ê, −Ê, iÊ, −iÊ}, then the decoder attempts correction by applying Ê to the channel output. If the actual error Ẽ ∈ Â_z, and hence ÃẼ ∈ S̄, then decoding is successful; otherwise, a decoding error has occurred. It therefore follows that the FER is

F_MAP = 1 − Σ_z P(Â_z). (33)

Unfortunately, this decoding problem has been shown to be #P-complete [14]. Furthermore, the simpler decoding problem of determining the single most likely error

Ẽ_z = argmax_{Ẽ ∈ P̃_n consistent with z} P(Ẽ) (34)

corresponding to the observed syndrome is essentially the same as the classical decoding problem outlined in Sec. II A and hence is also NP-complete [29-31]. The FER for this decoder is

F_MAP−SE = 1 − Σ_z P(Ẽ_z S̄), (35)

where "SE" stands for "single error." Two stabilizers (or the codes they define) are permutation equivalent if they are equal up to a relabeling of qubits. As in the classical case, if two stabilizer codes are permutation equivalent, then they are both [[n, k, d]] codes; furthermore, they will yield the same FERs (both F_MAP and F_MAP−SE) when the error components are independently and identically distributed, which is the case for the channels that we consider. Again, while there are more general notions of quantum code equivalence, we are always referring to permutation equivalence in this paper.
The links between stabilizer codes and classical codes can be made more concrete by representing the elements of P̃_n as elements of GF(2)^{2n} [1,15]. This is achieved via the isomorphism

Ẽ, E ∝ X^{e_X} Z^{e_Z} ↔ e = (e_X^T|e_Z^T)^T,

with the product of elements in P̃_n corresponding to addition in GF(2)^{2n}. Furthermore, representatives of elements in P̃_n commute if the symplectic inner product of the binary representations is zero, where the symplectic inner product of u = (u_X^T|u_Z^T)^T and v = (v_X^T|v_Z^T)^T is

⟨u, v⟩_s = u_X · v_Z + u_Z · v_X (mod 2).

Utilizing this isomorphism, the generators of some stabilizer S can be used to define the rows of an m × 2n binary matrix

H = (H_X|H_Z),

where H_X and H_Z are each m × n matrices. Under this mapping, the requirement that all stabilizer generators commute becomes

H_X H_Z^T + H_Z H_X^T = 0. (38)

Conversely, a [2n, n + k] linear binary code C with a parity-check matrix H satisfying this constraint can be used to define a stabilizer S̄. Technically, this only specifies S̄; however, as previously outlined, it is S̄ that dictates the effect of an error on a stabilizer code, which means that the 2^{n−k} stabilizers corresponding to S̄ will all have the same error correction properties (the codes corresponding to each such stabilizer actually form a partition of (C²)^⊗n [32,33]). Without loss of generality, we can therefore map S̄ to a particular stabilizer S by arbitrarily selecting a phase factor of +1 for all the generators. A subclass of stabilizer codes are the Calderbank-Shor-Steane (CSS) codes [7,8], which have a binary representation of the form

H = (H_X 0; 0 H_Z).

For such codes, the commutation condition of Eq. (38) becomes H_Z H_X^T = 0, which is satisfied when C_X^⊥ ⊆ C_Z, where C_X and C_Z are classical codes defined by the parity-check matrices H_X and H_Z, respectively. If C_X = C_Z, then this reduces to C_X^⊥ ⊆ C_X, in which case, the CSS code is called dual containing (DC).
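The binary symplectic representation can be illustrated with a short helper (our own naming) that maps a Pauli string, phase ignored, to (e_X|e_Z) and tests commutation via the symplectic inner product.

```python
def to_binary(pauli):
    """Map a Pauli string (phase ignored) to e = (e_X | e_Z):
    e_X marks X/Y components, e_Z marks Z/Y components."""
    e_x = [1 if s in "XY" else 0 for s in pauli]
    e_z = [1 if s in "ZY" else 0 for s in pauli]
    return e_x + e_z

def commute(p1, p2):
    """Test commutation via the symplectic inner product
    u_X . v_Z + u_Z . v_X (mod 2): zero means the Paulis commute."""
    u, v = to_binary(p1), to_binary(p2)
    n = len(p1)
    sym = sum(u[i] * v[n + i] + u[n + i] * v[i] for i in range(n)) % 2
    return sym == 0
```

For example, X and Z anticommute on a single qubit, while XX and ZZ commute because their components anticommute in an even number of places.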
As previously mentioned, the decoding problem of Eq. (34) is essentially the same as the classical decoding problem. This link can be made more explicit by expressing errors within the binary framework using the mapping E ∝ X^{e_X} Z^{e_Z} ↔ e = (e_X^T|e_Z^T)^T (where e_X, e_Z, and e are column vectors for consistency with the classical case). If the generators of a stabilizer define the parity-check matrix H for the binary code C, then the syndrome corresponding to E can be found by taking the symplectic inner product of e with each row of H, which can be written compactly as

z = HΛe, where Λ = (0 I_n; I_n 0).

With this slight modification to classical syndrome calculation, determining Ẽ_z in Eq. (34) corresponds precisely to determining ê_z in Eq. (9). Note that some authors avoid this difference in syndrome calculation by using the mapping E ∝ X^{e_X} Z^{e_Z} ↔ e = (e_Z^T|e_X^T)^T [34], which gives z = He as in the classical case of Eq. (8). For a CSS code, the syndrome associated with an error E ∝ X^{e_X} Z^{e_Z} splits as z = (z_X^T|z_Z^T)^T, with z_X = H_Z e_X and z_Z = H_X e_Z. This allows the X and Z components of the error to be treated separately. In particular, e_Z can be inferred from H_X e_Z = z_Z, while e_X can be inferred from H_Z e_X = z_X. However, this approach is only guaranteed to determine the single most likely error if the X and Z components of E occur independently, which is the case for the biased XZ channel but not for the AD channel among others (see Sec. II E of Ref. [20] for a more detailed discussion).
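The modified syndrome calculation z = HΛe can be sketched as follows, using the three-qubit stabilizer generators ZZI and IZZ as a toy example of our choosing (it detects single-qubit X errors but no Z errors).

```python
def syndrome(H, e):
    """z = H (Lambda e), where Lambda swaps the X and Z halves of e,
    so each entry is a symplectic inner product with a row of H."""
    n = len(e) // 2
    swapped = e[n:] + e[:n]            # Lambda e
    return [sum(h * x for h, x in zip(row, swapped)) % 2 for row in H]

# Toy stabilizer with generators ZZI and IZZ in (H_X | H_Z) form.
H = [[0, 0, 0, 1, 1, 0],
     [0, 0, 0, 0, 1, 1]]
```

An X error on any qubit flips at least one syndrome bit, while a Z error commutes with both generators and therefore yields the trivial syndrome.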
Elements of P̃_n can also be represented as elements of GF(4)^n according to a componentwise isomorphism that maps Ĩ to 0 and X̃, Ỹ, Z̃ to the three nonzero elements of GF(4) [1,16], with the product of elements in P̃_n corresponding to addition in GF(4)^n. Representatives of elements in P̃_n commute if the trace inner product [see Eq. (14)] of the corresponding elements of GF(4)^n is zero. Utilizing this isomorphism, any (n, 2^{n−k})_4 additive GF(4) code C that is self-orthogonal with respect to the trace inner product can be used to define an [[n, k]] stabilizer code (it is for this reason that stabilizer codes are sometimes called additive codes). Furthermore, the generators of the stabilizer S can be associated with the rows of a generator matrix G for C. We can describe a stabilizer code based on properties of C; for example, if C is linear and/or cyclic, then we will also call S (and the code Q it defines) linear and/or cyclic. Similar to the classical case, when designing a stabilizer code for the depolarizing channel, the complexity of determining its FER can be avoided by instead using code distance as something of a proxy. However, for asymmetric channels, distance becomes a less accurate metric as the probability of an error occurring no longer depends only on its weight. One approach in this case is to design codes with different X and Z distances d_X and d_Z. For these so-called asymmetric codes, d_X and d_Z are the maximal values for which there is no Ẽ ∈ N(S̄)\S̄ where E ∝ X^{e_X} Z^{e_Z} and both w(e_X) < d_X and w(e_Z) < d_Z. Such codes are typically constructed within the CSS framework, where d_X = w(C_Z\C_X^⊥) and d_Z = w(C_X\C_Z^⊥) [35]. Outside of the CSS framework, where the X and Z components of an error cannot be considered separately, the distances d_X and d_Z are somewhat less meaningful and potentially not even unique. For example, the (7, 2^6)_4 additive cyclic code ⟨ω10ω100⟩_cyc maps to the [[7,1,3]] cyclic stabilizer code with S = ⟨XZIZXII⟩_cyc, which can be considered as a [[7, 1, 7/1]], [[7, 1, 1/7]], or [[7, 1, 2/3]] code. Some examples of asymmetric codes (for qubits) can be found in Refs. [6,9-13].

III. APPROXIMATE FER CALCULATION
In this paper, we wish to construct stabilizer codes that perform well on asymmetric channels. In particular, we wish to gauge their performance directly; that is, we wish to accurately determine the FER exhibited by a MAP decoder as given in Eq. (33). As previously noted, determining this error rate is a #P-complete problem. In this section, we therefore investigate lower-complexity methods of approximating F_MAP and derive bounds on the relative error of these approximations.

A. Limited error set
In most cases, many of the errors in P̃_n occur with very low probability. It seems reasonable to assume that ignoring these low-probability errors will have little effect on the FER calculation of Eq. (33). In particular, suppose we only consider a subset of errors E ⊂ P̃_n. We can calculate an approximate FER using E by first partitioning it by syndrome into the sets B_1, . . ., B_r, where r ≤ 2^{n−k}. Each of these B_i is then further partitioned by equivalence up to an element of S to give the sets A_{i1}, . . ., A_{is}, where s ≤ 2^{2k}. The approximate FER is then

F_E = 1 − Σ_{i=1}^{r} P(Â_i), (44)

where Â_i = argmax_{A_{ij}} P(A_{ij}). Note that if we wish to explicitly associate a stabilizer S with F_E, then we write F_E^S. In the best case, E will contain every Â_z in its entirety, which gives Σ_z P(Â_z) = Σ_{i=1}^{r} P(Â_i) and hence F_E = F_MAP. In the worst case, all of the ignored probability 1 − P(E) belongs to the optimal cosets Â_z, which leads to

δ_E ≤ (1 − P(E))/[F_E − (1 − P(E))] = Δ_E. (47)

This bound Δ_E on the relative error δ_E = (F_E − F_MAP)/F_MAP in the approximate FER calculation is of practical use as it does not require any knowledge of F_MAP.
There are two desirable attributes of the set E ⊂ P̃_n used to calculate F_E. The first of these, which follows from Eq. (47), is for 1 − P(E) to be less than some predetermined value as this affects the accuracy of F_E. The second is for |E| to be small as this reduces the complexity of calculating F_E. It is possible to construct such a set without enumerating P̃_n in its entirety by exploiting the independence of error components, which means that the probability of an error occurring depends only on the number of I, X, Y, and Z components it contains. Explicitly, the probability of some error Ẽ ∈ P̃_n occurring is

P(Ẽ) = p_I^{n(I)} p_X^{n(X)} p_Y^{n(Y)} p_Z^{n(Z)}, (48)

where n(σ) is the number of tensor components of E that are equal to σ up to phase. Furthermore, the number of errors in P̃_n with a given distribution of components is [36]

N = n!/[n(I)! n(X)! n(Y)! n(Z)!]. (49)

Therefore, to construct E, we first enumerate all of the possible combinations of n(I), n(X), n(Y), and n(Z) such that

n(I) + n(X) + n(Y) + n(Z) = n, (50)

which is a straightforward variation of the integer partition problem [37]. These combinations are sorted in descending order according to their associated per-error probability as given in Eq. (48). In an iterative process, we then work through this list of combinations, adding the N distinct errors associated with each one to E until we reach the desired value of 1 − P(E). This construction has the added benefit of ensuring that E is permutation invariant, which guarantees that F_E will be the same for equivalent codes.
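The construction above can be sketched at the level of component-count profiles: enumerate the (n(I), n(X), n(Y), n(Z)) combinations, sort them by per-error probability, and accumulate whole profiles (with their multinomial counts) until the target coverage P(E) is reached. The helper below and its naming are our own; it returns the chosen profiles, the achieved coverage, and |E| without listing individual errors.

```python
from math import factorial

def build_error_set_profile(n, p_ops, target):
    """Sketch of the limited-error-set construction. p_ops is
    (p_I, p_X, p_Y, p_Z); profiles are added in order of decreasing
    per-error probability until the coverage P(E) reaches `target`."""
    p_i, p_x, p_y, p_z = p_ops
    combos = []
    for n_x in range(n + 1):
        for n_y in range(n + 1 - n_x):
            for n_z in range(n + 1 - n_x - n_y):
                n_i = n - n_x - n_y - n_z
                prob = p_i**n_i * p_x**n_x * p_y**n_y * p_z**n_z
                # multinomial count of distinct errors with this profile
                count = factorial(n) // (factorial(n_i) * factorial(n_x)
                                         * factorial(n_y) * factorial(n_z))
                combos.append((prob, count, (n_i, n_x, n_y, n_z)))
    combos.sort(key=lambda t: -t[0])
    chosen, coverage, size = [], 0.0, 0
    for prob, count, profile in combos:
        if coverage >= target:
            break
        chosen.append(profile)
        coverage += prob * count
        size += count
    return chosen, coverage, size
```

Because whole profiles are added at once, the resulting set is automatically permutation invariant; with an unreachable target the loop simply exhausts all 4^n errors and the coverage sums to 1 by the multinomial theorem.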
For the approximate error rate calculation presented in this section to be of any real use, it must be accurate even when E is relatively small. To demonstrate that this is in fact the case, we have first constructed 1 000 random [[7,1]] codes. To produce a random stabilizer S = ⟨M_1, . . ., M_{n−k}⟩, we iteratively select M̃_i = {M_i, −M_i, iM_i, −iM_i} at random from N(⟨M̃_1, . . ., M̃_{i−1}⟩)\⟨M̃_1, . . ., M̃_{i−1}⟩ (note that we arbitrarily use a phase factor of +1 for each M_i as outlined in Sec. II D). Our only structure constraint on S is that it must involve every qubit; that is, for all 1 ≤ j ≤ n, there must be some M_i^{(j)} that is not proportional to I, where M_i^{(j)} is the jth tensor component of M_i (if a stabilizer does not satisfy this constraint, we simply discard it and construct a new one). For biased XZ channels with p = 0.1, 0.01, or 0.001 and η = 1, 10, or 100, we have then determined the fraction of the 1 000 codes that yield a relative error δ_E ≤ 0.01 or relative error bound Δ_E ≤ 0.01 for varying |E|. The results of this are shown in Fig. 1, where it can be seen that, depending on the channel parameters, only 1-5% of P̃_n needs to be considered to yield δ_E ≤ 0.01 for every code. As is to be expected, a slightly larger fraction of P̃_n is required to ensure a relative error bound of Δ_E ≤ 0.01; however, in every case this can still be achieved by considering only 1-10% of P̃_n. Interestingly, for higher p, increasing η reduces the number of errors that need to be considered, while for lower p, this trend is reversed. Figure 2 shows the results of a similar analysis for codes with 5 ≤ n ≤ 7 and 1 ≤ k ≤ 3 on a biased XZ channel with p = 0.01 and η = 10. It can be seen that increasing k for fixed n reduces the fraction of errors that must be considered, which makes sense given that encoding a larger number of qubits will lead to a higher FER. Furthermore, increasing n for fixed k reduces the fraction of errors that need to be considered, which bodes well for the analysis of longer codes. We note that changing p and/or η has little effect on these observations.

B. Most likely error
We now consider the decoder of Eq. (34) that determines the single most likely error given a syndrome measurement, which has an error rate as given in Eq. (35). Note that F_MAP−SE is simpler to calculate than F_MAP as it does not require a complete partitioning of P̃_n to form P̃_n/S̄. When using F_MAP−SE as an approximation of F_MAP, the best case scenario is that the most likely coset Â_z will contain Ẽ_z for every z, which gives F_MAP−SE = F_MAP. In the worst case scenario, two things will occur. Firstly, the probability distribution over every Â_z will be uniform; that is, each element of Â_z will occur with probability P(Â_z)/|S̄| = P(Â_z)/2^{n−k}. Secondly, the distribution over every Ẽ_z S̄ will be sharply peaked without P(Ẽ_z S̄) being large; that is, for every z, P(Ẽ_z) = P(Â_z)/2^{n−k} + ε and P(Ẽ_z S̄\Ẽ_z) = ε′ for some small ε, ε′ ≥ 0. In general, it is therefore the case that

F_MAP ≤ F_MAP−SE ≤ 1 − (1 − F_MAP)/2^{n−k}. (51)

This upper bound on F_MAP−SE is very loose, and in practice, F_MAP−SE tends to be quite close to F_MAP. To demonstrate this, we have again constructed 1 000 random [[7,1]] codes. For each code, we have then determined both F_MAP and F_MAP−SE for the same nine biased XZ channel parameter combinations considered in Sec. III A (p = 0.1, 0.01, or 0.001 and η = 1, 10, or 100). The results of this are shown in Fig. 3. Especially for the codes yielding a low F_MAP, which are the codes of greatest interest, it can be seen that the difference between F_MAP−SE and F_MAP is often negligible.
F_MAP−SE can itself be approximated using a limited error set E. We call this approximation F_E−SE, and it can be calculated in much the same manner as F_E. Again, E is first partitioned by syndrome to give B_1, …, B_r. For each 1 ≤ i ≤ r, we then determine the most likely error Ẽ_i ∈ B_i, which we use to define Â_i = {Ẽ ∈ B_i : Ẽ_i Ẽ ∈ S}. With this altered definition of Â_i, F_E−SE is given by the right-hand side of Eq. (44). Furthermore, the relative error bound of Eq. (47) also holds for F_E−SE with respect to F_MAP−SE. We emphasize that F_E−SE can be calculated faster than F_E as there is no need to fully partition each B_i.
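As a sketch of this calculation (the helper interfaces here are hypothetical, not the paper's code): given a limited error set with probabilities, a syndrome function, and a predicate telling whether two errors differ only by a stabilizer element, F_E−SE is one minus the total probability of the errors the single-most-likely-error decoder corrects.

```python
from collections import defaultdict

def f_e_se(errors, syndrome, logical_class):
    """Approximate MAP-SE FER from a limited error set (a sketch).
    errors: dict mapping an error label to its probability.
    syndrome(e): hashable syndrome of error e.
    logical_class(e1, e2): True if e1*e2 lies in the stabilizer S, i.e.
    the two errors act identically on the code (assumed helper)."""
    buckets = defaultdict(list)
    for e, pr in errors.items():
        buckets[syndrome(e)].append((e, pr))   # partition E by syndrome
    success = 0.0
    for bucket in buckets.values():
        e_hat, _ = max(bucket, key=lambda t: t[1])  # most likely error E_i
        # A_i: errors in the bucket that the decoder corrects, i.e. those
        # equivalent to e_hat up to a stabilizer element
        success += sum(pr for e, pr in bucket if logical_class(e_hat, e))
    return 1.0 - success   # F_{E-SE}
```

Note that only the maximum within each bucket is needed, matching the remark that no full partitioning of each B_i is required.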

C. Most likely error only
As outlined in Sec. II D, the single most likely error decoder for an [[n, k]] stabilizer code can be viewed as a decoder for an associated [2n, n + k] classical code C. However, the calculation of F_MAP−SE as in Eq. (35) is more complicated than determining the FER of a classical MAP decoder as the cosets Ẽ_z S still need to be enumerated. If we ignore the coset nature of the error correction, then we obtain the error rate F_MAP−SEO, where "SEO" stands for "single error only." Note that this is exactly the FER of the classical decoder for C as in Eq. (10).
Given the nature of the assumptions leading to Eq. (51), it also holds for F_MAP−SEO. Again, it is a very loose upper bound, and as can be seen in Fig. 4, F_MAP−SEO does tend to be somewhat close to F_MAP. In particular, it can be seen that the codes yielding a minimal value of F_MAP−SEO often also yield a near-minimal value of F_MAP. F_MAP−SEO can also be approximated using a limited error set to yield F_E−SEO. This involves first partitioning E to form B_1, …, B_r and then determining the most likely error Ẽ_i in each B_i. By defining Â_i = {Ẽ_i}, F_E−SEO is also given by the right-hand side of Eq. (44). Note that as no partitioning of each B_i is required, calculating F_E−SEO is less complex than calculating F_E−SE (or, indeed, F_E). The upper bound on relative error given in Eq. (47) again holds for F_E−SEO with respect to F_MAP−SEO.
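The SEO calculation is simpler still, since no stabilizer structure is needed at all: for each syndrome, only the single most likely error in E counts as a success. A minimal sketch, using the same hypothetical interfaces as above:

```python
from collections import defaultdict

def f_e_seo(errors, syndrome):
    """SEO approximation of the FER: the decoder succeeds only on the
    single most likely error for each syndrome, so no coset/stabilizer
    structure is needed (a sketch; interfaces are assumptions).
    errors: dict mapping an error label to its probability."""
    best = defaultdict(float)
    for e, pr in errors.items():
        s = syndrome(e)
        best[s] = max(best[s], pr)   # track the top error per syndrome
    return 1.0 - sum(best.values())  # F_{E-SEO}
```

This is why F_E−SEO is the cheapest of the three approximations: a single pass over E suffices.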
Assuming that E contains the most likely errors in P_n, which is the case for the construction given in Sec. III A, we can derive another simple bound. In particular, if E contains errors corresponding to r different syndromes, then an error Ẽ ∉ E yielding one of the other 2^(n−k) − r possible syndromes must have probability P(Ẽ) ≤ min_{Ê∈E} P(Ê) (as otherwise it would be an element of E). This gives a corresponding bound, which leads to a combined bound on the relative error.

IV. CODE PERFORMANCE
In this section, we employ the approximate FER calculation methods outlined in Sec. III to investigate the performance of various families of codes on biased X Z and AD channels. There is a particular focus on the performance of cyclic codes as it has previously been shown that a [[7,1,3]] cyclic code with S = ⟨XZIZXII⟩_cyc performs near optimally on the biased X Z channel for a range of error probabilities and biases [4].

A. [[7,1]] codes
To demonstrate our approach, we first consider the case of [[7,1]] codes. We have constructed all of the [[7,1]] cyclic codes by enumerating the self-orthogonal additive cyclic (7, 2^6)_4 codes as outlined in Sec. II B. There are 11 such codes, six of which are inequivalent. Following the lead of Ref. [4], we have also constructed 10 000 random codes to serve as a point of comparison. Our random construction, as detailed in Sec. III A, differs from that of Ref. [4] in that we do not require our codes to have weight-four generators or distance d ≥ 3. For both biased X Z and AD channels with p = 0.1, 0.01, 0.001, or 0.0001 and η = 1, 10, 100, or 1 000, we have determined F_E for each code, ensuring that in every case, E is large enough to give ε_E ≤ 0.01. This can be achieved without having to construct a new E for every FER calculation. For a given channel type (biased X Z or AD), channel parameter combination (p and η pair), and code family (random or cyclic), we first construct E, as outlined in Sec. III A, such that 1 − P(E) ≤ 0.1 and then calculate F_E for every code in the family. If ε_E > 0.01 for any of these codes, we then add errors to E until 1 − P(E) ≤ 0.01 and recalculate F_E for these codes. This proceeds iteratively, reducing 1 − P(E) by a factor of 10 each time, until ε_E ≤ 0.01 for every code.
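This iterative refinement can be sketched as a small driver loop. The helpers `build_error_set` and `fer_with_bound` below are assumed interfaces standing in for the constructions of Sec. III A, not the paper's actual code.

```python
def fer_to_tolerance(code, channel, build_error_set, fer_with_bound,
                     start_mass_gap=0.1, target=0.01):
    """Enlarge the error set E until the relative-error bound eps_E of
    the approximate FER drops below `target` (a sketch with assumed
    helper interfaces).
    build_error_set(channel, gap): returns E with 1 - P(E) <= gap.
    fer_with_bound(code, channel, E): returns (F_E, eps_E)."""
    gap = start_mass_gap                 # desired 1 - P(E)
    E = build_error_set(channel, gap)
    fer, eps = fer_with_bound(code, channel, E)
    while eps > target:
        gap /= 10.0                      # add errors: shrink 1 - P(E) tenfold
        E = build_error_set(channel, gap)
        fer, eps = fer_with_bound(code, channel, E)
    return fer, E
```

In practice, as noted above, the same E can be reused across every code in a family, with enlargement triggered only when some code's bound exceeds the target.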
For each channel type, channel parameter combination, and code family, we report two values. The first of these is simply the lowest FER of any code in the family, which can be viewed as a performance measure of the family as a whole. The second is the FER of the code that performs the best on average across all channel parameter combinations. We quantify this average performance by taking the geometric mean of a code's FERs across the associated channels; that is, we take the best code to be the one with the stabilizer given by Eq. (56), where F is the family of stabilizers and E_i is the error set associated with one of the N = 16 channels. Figure 5 shows these values for the biased X Z channel. It can be seen that for every parameter combination, there is a cyclic code that performs nearly as well as the best random code. Furthermore, there is a single cyclic code that performs optimally (among the cyclic codes) on all channels. In fact, there are three such codes; however, they are all equivalent to the code with stabilizer S = ⟨XZIZXII⟩_cyc. The values for the AD channel are shown in Fig. 6, where the code with stabilizer ⟨XZIZXII⟩_cyc again performs optimally among the cyclic codes; however, in some cases, it is outperformed by the best random code by quite a margin, particularly at lower error probabilities (note that for consistency, we have used the same random codes for both channel types). At these low error probabilities, it can also be seen that, unlike for the biased X Z channel, increasing the bias does little to decrease the error rate. Interestingly, the code with stabilizer ⟨YZIZYII⟩_cyc, which is not equivalent to ⟨XZIZXII⟩_cyc, yields the same performance. This is a result of the fact that p_X = p_Y for the AD channel, which means that applying the permutation X ↔ Y to a code's stabilizer on any subset of qubits has no effect on its performance. Note that the relative error of a geometric mean of FERs, such as the one in Eq. (56), is bounded by the relative error of the least accurate individual FER; this bound is given in Eq. (58).
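The selection rule of Eq. (56) amounts to an argmin over geometric means. A minimal sketch (the data layout here is an illustrative assumption):

```python
import math

def best_average_code(family, fers):
    """Pick the stabilizer whose FERs have the smallest geometric mean
    across the channels, in the spirit of Eq. (56).
    family: iterable of stabilizer labels.
    fers[s]: list of code s's FERs, one per channel (assumed layout)."""
    def geo_mean(xs):
        # exp of the mean log avoids underflow when FERs are very small
        return math.exp(sum(math.log(x) for x in xs) / len(xs))
    return min(family, key=lambda s: geo_mean(fers[s]))
```

A code that is mediocre everywhere can thus lose to one that is excellent on most channels, which is the intended notion of "best on average."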

B. Other parameters
We have repeated the analysis of Sec. IV A for codes with 5 ≤ n ≤ 12 and 1 ≤ k ≤ 3. For each combination of n and k, this has again begun by constructing 10 000 random codes and enumerating the cyclic stabilizer codes. The number of these cyclic codes is given in the first column of Table I. The first value in each row gives the number of inequivalent codes, while the value in brackets gives the total number of distinct codes. Note that for odd n, the number of distinct codes we report is consistent with Ref. [17]. To the best of our knowledge, neither the number of distinct codes with even n nor the number of inequivalent codes with any n has previously been published (Ref. [18] does give the total number of distinct [[n, k ≤ n]] cyclic codes, but it does not include the number for each specific k). Note that in some cases, there are no cyclic codes.
For each channel type, code family, and pair of n and k, we report two values. The first of these is the geometric mean of the FERs for the single best code as defined in Eq. (56). The second value is the geometric mean of the minimum FERs of all codes in a family for each channel, which can again be viewed as a performance measure of the family as a whole. Figure 7 shows these values for the biased X Z channel. It can be seen that for both the random and cyclic codes, there is typically a single code that performs nearly as well as the family as a whole across the 16 different channels considered. Furthermore, when [[n, k]] cyclic codes exist, there is often one that performs as well as or better than the best random code we have created. In fact, for n ≥ 9 and k = 1, the best cyclic codes significantly outperform the best random codes. The results for the AD channel are given in Fig. 8.
Again, where [[n, k]] cyclic codes exist, they typically perform favorably compared to the random codes. However, any performance advantages over the random codes are less pronounced than in the biased X Z case. Generators for the best cyclic codes on both the biased X Z and AD channels can be found in Table II (for reference, we also give their distances). In particular, we list generators for all codes that yield a geometric mean of FERs within 1% of the minimum value we have observed (these are all codes that could conceivably be optimal within our margin of error). There are a few notable properties of these codes. The first is that they can all be expressed using a single generator. While, as shown in the second column of Table I, a large number of codes have such a representation, this is still a somewhat surprising result. It can also be seen that in nearly every case, there are codes that perform well for both the biased X Z and AD channels (the only exceptions to this are the [[6,1]], [[6,2]], and [[10,2]] cases). A third property of note is that the codes for the AD channel typically come in pairs, one being an X ↔ Y permuted version of the other. This is to be expected given the partial channel symmetry outlined in Sec. IV A. The only two exceptions to this are the [[5,1]] and [[10,2]] cases, where the single code given is invariant under an X ↔ Y permutation (up to a permutation of qubit labels).

TABLE II. Generators and distances for the best performing inequivalent cyclic codes on the biased X Z and AD channels. Note that each stabilizer can be expressed using a single generator; that is, each generator given corresponds to a different code. The generators of codes performing well on both channel types are given in bold.

C. Hill climbing
The results of Sec. IV B, particularly those for [[n ≥ 9, 1]] codes on the biased X Z channel, show that constructing 10 000 random codes is not a reliable way of finding a good code for larger n. One approach to finding better codes would be to simply increase the size of the random search. However, even with the reduction in error set size afforded by the approach of Sec. III A, this quickly becomes computationally impractical. As such, we need a more efficient search strategy.
To achieve this, we use the observation of Sec. III C that codes yielding a low F_E−SEO tend to also yield a low F_E (recall that F_E ≤ F_E−SEO). We can therefore reduce the search to finding codes that yield a low F_E−SEO, which is beneficial as it is typically several orders of magnitude faster to calculate F_E−SEO than it is to calculate F_E to the same accuracy.
We start by considering the problem of finding codes that perform well for a single channel parameter combination. That is, we want to find a stabilizer S that yields a low F^S_E−SEO. We have found a simple hill-climbing search strategy to be effective at this. This involves first constructing S at random. S is then mutated (modified) somehow to produce S′, and if F^S′_E−SEO < F^S_E−SEO, then S is replaced with S′. This process repeats for a predetermined number of iterations, after which we calculate F^S_E to quantify the actual performance of the code. Similar to the random search outlined in Sec. IV A, we ensure that the relative error of all approximate FER calculations is less than 1%. To achieve this, we again initially construct E such that 1 − P(E) ≤ 0.1, and if ε_E−SEO > 0.01 (ε_E > 0.01) for any calculation of F_E−SEO (F_E), then we add errors to E to reduce 1 − P(E) by a factor of 10 and recalculate the error rate. To better explore the space of possible stabilizers, we run a number of these hill-climbing instances in parallel (this is often called hill climbing with random restarts [38]).
The choice of a mutation operator that maps S to S′ is limited by the requirement that S′ must be a stabilizer. We consider two types of mutation that satisfy this constraint. The first of these involves permuting the nonidentity Pauli matrices of all stabilizer elements at any given index 1 ≤ i ≤ n with probability 1/n. Note that these permutations correspond to a multiplication of coordinates of the associated classical GF(4) code by a nonzero scalar α ∈ GF(4) followed by a possible conjugation. The second mutation method involves first removing any given generator M_i of S = ⟨M_1, …, M_{n−k}⟩ with probability 1/(n − k) and then adding generators, as outlined in Sec. III A, to form S′. When performing this generator mutation, we still require that all qubits are involved in the stabilizer; if this is not achieved after adding the new generators, we remove them and try again. To compare these two mutation operators, we consider [[9,1]] codes on the biased X Z channel with p = 0.01 and η = 10. We have run 1 000 hill-climbing instances, each for a maximum of 1 000 iterations. Across all of these instances, Fig. 9 shows the 95th percentile F_E−SEO at each iteration; that is, it shows the 50th lowest F_E−SEO (we have chosen to show this value as it reflects the performance of the best codes while having less potential variance than showing the FER of the single best code). As a control, we have also tested random mutation, which involves simply creating S′ at random (this reduces hill climbing to a random search). It can be seen that both the permutation and generator mutations outperform this random mutation, with the permutation mutation performing best initially but then tapering off somewhat. Finally, we have tested a combination of the two mutation methods (a generator mutation followed by a permutation mutation), which can be seen to perform better than either of the methods individually.
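A single hill-climbing instance can be sketched as follows. The three callables are assumed interfaces, not the paper's implementation: `score(S)` plays the role of F^S_E−SEO, and `mutate(S, rng)` stands in for a generator mutation followed by a permutation mutation.

```python
import random

def hill_climb(random_stabilizer, mutate, score, iterations=1000, seed=0):
    """One hill-climbing instance (a sketch with assumed interfaces).
    random_stabilizer(rng): random initial stabilizer S.
    mutate(S, rng): candidate S' (must still be a valid stabilizer).
    score(S): quantity to minimize, e.g. the approximate FER F^S_{E-SEO}."""
    rng = random.Random(seed)
    S = random_stabilizer(rng)
    best = score(S)
    for _ in range(iterations):
        S_new = mutate(S, rng)
        s_new = score(S_new)
        if s_new < best:          # keep the mutation only if it improves
            S, best = S_new, s_new
    return S, best
```

Running many such instances with different seeds gives hill climbing with random restarts; the test below uses a toy integer "stabilizer" purely to exercise the loop.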

D. Multiobjective hill climbing
The results of Sec. IV B suggest that there are typically codes that perform well across a range of channel parameter combinations. We can search for such codes by building on the hill-climbing algorithm outlined in Sec. IV C. In particular, instead of comparing F^S_E−SEO to F^S′_E−SEO, we compute and compare the geometric means (∏_{i=1}^N F^S_{E_i−SEO})^{1/N} and (∏_{i=1}^N F^S′_{E_i−SEO})^{1/N} of the FERs for N channel parameter combinations. Following Eq. (58), we ensure that these geometric means are accurate to within 1% by keeping each of the individual ε_{E_i−SEO} ≤ 0.01 as outlined in Sec. IV C. Again, we run a number of these hill-climbing instances in parallel, and at the end of each one, we calculate (∏_{i=1}^N F^S_{E_i})^{1/N}. Note that for N = 1, this search reduces to that of Sec. IV C.
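Since x ↦ exp(x/N) is monotone, comparing geometric means is equivalent to comparing mean log-FERs, which also avoids floating-point underflow when the individual rates are very small. A sketch of the multiobjective score:

```python
import math

def multi_score(fers):
    """Mean log-FER across the N channels; minimizing this is equivalent
    to minimizing the geometric mean (prod F_i)^(1/N), since exp(x/1) is
    monotone. fers: list of approximate FERs F^S_{E_i-SEO}, one per
    channel (assumed layout)."""
    return sum(math.log(f) for f in fers) / len(fers)
```

In the hill-climbing loop of Sec. IV C, this function would simply replace the single-channel score when comparing S to S′.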
We have performed such searches for the same cases considered in Sec. IV B (that is, codes with 5 ≤ n ≤ 12 and 1 ≤ k ≤ 3 for biased X Z and AD channels with p = 0.1, 0.01, 0.001, or 0.0001 and η = 1, 10, 100, or 1 000). For each combination of n, k, and channel type, we have run 1 000 hill-climbing instances based on the combined generator and permutation mutation, each for 1 000 iterations. Figure 10 compares the performance (that is, the geometric mean of FERs) of the best codes found in this way to that of the best cyclic codes (the other values shown will be detailed in Secs. IV E to IV G). It can be seen that in all but the [[10,1]] case, the best code found via hill climbing is either as good as or better than the best cyclic code. Very similar results can be seen in Fig. 11 for the AD channel, where the best code found via hill climbing performs as well as or better than the best cyclic code in every instance. Generators for the best codes we have found for the biased X Z and AD channels can be found in Tables III and IV, respectively.

FIG. 10. The performance (geometric mean of FERs) of the best [[5 ≤ n ≤ 12, 1 ≤ k ≤ 3]] codes found via hill climbing for biased X Z channels with p = 0.1, 0.01, 0.001, or 0.0001 and η = 1, 10, 100, or 1 000. Also shown is the performance of the best cyclic codes and dual-containing CSS codes.

E. Weight-four codes
Through slight modification of the hill-climbing algorithm, we can search for good codes that satisfy structure constraints. The first constraint we consider is the requirement that the stabilizer has a representation involving only weight-four generators; such codes are of practical interest as their syndrome measurements involve fewer qubits, and are hence less complex, than those for codes with high-weight generators. The first modification required to search for these codes, which is somewhat obvious, is to ensure that the initial random stabilizer has weight-four generators. This also extends to the generator mutation; that is, any generator added to replace a removed one must also have weight four. No change to the permutation mutation is required as it preserves the weight of stabilizer elements. We compare the codes found via this constrained hill-climbing search to the cyclic codes with a weight-four generator representation. The number of such cyclic codes is given in the third column of Table I, where it can be seen that they are reasonably plentiful.
The performance of the weight-four codes found via hill climbing for the biased X Z channel is shown in Fig. 10. It can be seen that in many cases, these codes perform nearly as well as those found using unconstrained hill climbing in Sec. IV D. The performance of the weight-four cyclic codes is more varied. In some cases, they are optimal (among the cyclic codes), while in others, they perform relatively poorly. Figure 11 shows that the performance of the weight-four codes found via hill climbing for the AD channel is somewhat mixed, ranging from outperforming the unconstrained [[9,1]] codes to performing very poorly for k = 3 and n ≥ 8. The performance of the weight-four cyclic codes relative to the best unconstrained cyclic codes is much the same as for the biased X Z channel. Generators for the best weight-four codes found via hill climbing can be found in Tables V and VI, and generators for the best cyclic codes are given in Table VII.

F. CSS codes
We next consider CSS codes, which, as outlined in Sec. II D, are codes that can be represented using generators that contain either only X or only Z matrices as their nonidentity elements. Similar to the search for weight-four codes, we must modify both the initial stabilizer construction and the generator mutation. In particular, when adding a new generator, we select a suitable X-only element half the time and a Z-only element the other half. Another required modification is the removal of the permutation mutation as, in general, it does not map CSS codes to CSS codes. We also consider cyclic CSS codes, which can be thought of in two equivalent ways. They can be viewed as codes with a binary representation where H_X and H_Z each correspond to a binary cyclic code. Alternatively, they can be considered in the GF(4) framework as additive cyclic codes that can be represented by a 1-only cyclic generator and/or an ω-only cyclic generator. The number of these cyclic CSS codes is given in the fourth column of Table I. We also consider the family of dual-containing CSS codes to generalize the result of Ref. [4], where it was shown that the [[7,1,3]] Steane code [39] performs poorly on the biased X Z channel. We have constructed these codes by enumerating all of the inequivalent binary self-orthogonal codes using SAGEMATH [40] (recall that a generator matrix for a binary self-orthogonal code is the parity-check matrix for a dual-containing code). The number of such codes is given in the fifth column of Table I. Note that there can only be an [[n, k]] dual-containing CSS code if n − k is even; furthermore, even when n − k is even, not many of them exist for the parameters considered.
As can be seen for the biased X Z channel in Fig. 10, both the CSS codes found via hill climbing and the cyclic CSS codes perform poorly compared to their non-CSS counterparts. This performance can be improved by following the modification outlined in Ref. [5], which involves applying the permutation Z ↔ Y to the code's generators (this is motivated by the fact that Z-only generators commute with any Z-only error, meaning that they often provide no information about an error when η is large). Given the nature of this modification, we call such codes CSSY codes. We have performed a hill-climbing search for CSSY codes, and it can be seen that they perform significantly better than the standard CSS codes; however, they are still outperformed by non-CSS codes in most instances. Similarly, while the cyclic CSSY codes perform better than the cyclic CSS codes, there is often a significant performance gap to the non-CSS cyclic codes. The dual-containing CSS codes perform poorly across the board, which can at least partially be attributed to the fact that they must have d_X = d_Z. Furthermore, their performance cannot be improved in this way as they are invariant under a Z ↔ Y permutation. As shown in Fig. 11, the results on the AD channel are similar to those for the biased X Z channel. Both the CSS codes found via hill climbing and the cyclic CSS codes perform poorly compared to the non-CSS codes. In this case, the performance gain of the CSSY codes over the CSS codes is less pronounced. A notable exception to this is the [[9,1]] case, where the best CSSY code found via hill climbing outperforms the best unrestricted code found. Somewhat surprisingly, after applying an X ↔ Y permutation to the second, fourth, fifth, sixth, and ninth qubits, this code is equivalent to the best code with weight-four generators found in Sec. IV E.
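The Z ↔ Y relabeling that turns a CSS code into a CSSY code is a simple symbol substitution on the generators. A sketch, with generators written as Pauli strings:

```python
def to_cssy(generators):
    """Apply the Z <-> Y permutation to every generator of a CSS code,
    yielding the corresponding CSSY code. Generators are given as Pauli
    strings over the alphabet I, X, Y, Z (an illustrative representation)."""
    swap = {'I': 'I', 'X': 'X', 'Y': 'Z', 'Z': 'Y'}
    return [''.join(swap[p] for p in g) for g in generators]
```

X-only generators are unchanged, while Z-only generators become Y-only ones, which no longer commute with every Z-only error and can therefore detect them.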
Again, the performance of the dual-containing CSS codes is very poor compared to nearly all other codes considered. Generators for the best CSSY codes found via hill climbing can be found in Tables VIII and IX. We omit the standard CSS codes found via hill climbing and the cyclic CSS(Y) codes due to their poor performance.

G. Linear codes
The dual-containing CSS codes considered in the previous section are examples of linear stabilizer codes. An additive (n, 2^{n−k})_4 code C is linear if and only if it has a generating set of the form B = {b_1, …, b_{(n−k)/2}, ωb_1, …, ωb_{(n−k)/2}}. This corresponds to the stabilizer having generators of the form S = ⟨M_1, …, M_{(n−k)/2}, M̄_1, …, M̄_{(n−k)/2}⟩, where M̄_i is a version of M_i that has been subjected to the permutation (X, Y, Z) → (Z, X, Y). To search for such codes, we must first modify the initial construction and generator mutations. In particular, we add or remove the generators M_i and M̄_i in pairs. To preserve linearity, the permutation mutation has to be restricted to permutations corresponding to a multiplication of a coordinate of C by ω or ω̄. That is, the permutation must be either (X, Y, Z) → (Z, X, Y) or (X, Y, Z) → (Y, Z, X). We also consider linear cyclic codes, the structure of which is outlined in Sec. II B. The number of such codes is given in the sixth column of Table I. Like the dual-containing CSS codes, [[n, k]] linear codes can only exist for even n − k; furthermore, while n − k is even for [[5,3]] codes, there are no linear codes with these parameters that involve every qubit.
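The companion generator M̄_i is obtained from M_i by the fixed Pauli relabeling (X, Y, Z) → (Z, X, Y). A sketch, again with generators as Pauli strings:

```python
def linear_pair(generator):
    """Companion generator M-bar_i for a linear stabilizer code: apply the
    cyclic Pauli permutation (X, Y, Z) -> (Z, X, Y) to M_i. Generators are
    Pauli strings over I, X, Y, Z (an illustrative representation)."""
    perm = {'I': 'I', 'X': 'Z', 'Y': 'X', 'Z': 'Y'}
    return ''.join(perm[p] for p in generator)
```

Applying the map twice gives the other linearity-preserving permutation, (X, Y, Z) → (Y, Z, X), matching the two allowed choices for the restricted permutation mutation.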
As shown in Fig. 10, the linear codes found via hill climbing perform reasonably well on the biased X Z channel. The performance of the linear cyclic codes is somewhat less impressive, with a significant gap in performance to the more general additive cyclic codes. This can potentially be attributed to the fact that, at least for the code parameters considered, there are very few linear codes. As can be seen in Fig. 11, the linear codes found via hill climbing for the AD channel perform better than those for the biased X Z channel, particularly in the k = 3 case. However, the linear cyclic codes still perform poorly. The best linear codes found via hill climbing are given in Tables X and XI. We omit the linear cyclic codes due to their poor performance.

V. CONCLUSION
We have shown that the error rate of an optimal stabilizer code decoder can be effectively approximated by considering only a limited subset E of the 4^n possible Pauli errors, and we have outlined how to construct E without having to enumerate all of these errors. Utilizing this approximate calculation, we have demonstrated that there are a number of [[5 ≤ n ≤ 12, 1 ≤ k ≤ 3]] cyclic stabilizer codes that perform very well on both the biased X Z and AD channels across a range of error probabilities and biases. We have also shown that an indication of the performance of a stabilizer code can be obtained by considering the error rate of an associated [2n, n + k] classical code. We have used this as the basis for a hill-climbing algorithm, which we have shown to be effective at optimizing codes for both of the asymmetric channels considered. Furthermore, we have demonstrated that by modifying the mutation operation of this hill-climbing algorithm, it is possible to search for highly performant codes that satisfy structure constraints. In particular, we have successfully performed searches for codes with weight-four generators, CSS(Y) codes, and linear codes.

FIG. 1. The fraction of 1 000 randomly generated [[7,1]] codes that yield a relative error δ_E ≤ 0.01 or relative error bound ε_E ≤ 0.01 for varying |E| and biased X Z channel parameters.

FIG. 3. F_MAP versus F_MAP−SE for 1 000 random [[7,1]] codes on biased X Z channels with varying parameters. The dotted lines give F_MAP = F_MAP−SE.

FIG. 4. F_MAP versus F_MAP−SEO for 1 000 random [[7,1]] codes on biased X Z channels with varying parameters. The dotted lines give F_MAP = F_MAP−SEO.

TABLE I. The number of inequivalent (distinct) [[n, k]] cyclic codes, single-generator cyclic codes, cyclic codes with weight-four generators, cyclic CSS codes, dual-containing CSS codes, and linear cyclic codes.

TABLE III. Generators and distances for the best codes found for the biased X Z channel using hill climbing.

TABLE IV. Generators and distances for the best codes found for the AD channel using hill climbing.

TABLE V. Generators and distances for the best weight-four codes found for the biased X Z channel using hill climbing.

TABLE VI. Generators and distances for the best weight-four codes found for the AD channel using hill climbing.

TABLE VII. Generators and distances for the best performing inequivalent cyclic codes with weight-four generators on the biased X Z and AD channels. If a code requires two generators, they are grouped in brackets; otherwise, a single generator is given as in Table II. The generators of codes performing well on both channel types are given in bold, while the generators for codes previously appearing in Table II are marked with an asterisk.

TABLE VIII. Generators and distances for the best CSSY codes found for the biased X Z channel using hill climbing.

TABLE IX. Generators and distances for the best CSSY codes found for the AD channel using hill climbing.

TABLE X. Generators and distances for the best linear codes found for the biased X Z channel using hill climbing.

TABLE XI. Generators and distances for the best linear codes found for the AD channel using hill climbing.