Estimating the bias of CX gates via character randomized benchmarking

Recent work has demonstrated that high-threshold quantum error correction is possible for biased-noise qubits, provided one can implement a controlled-not (CX) gate that preserves the bias. Bias-preserving CX gates have been proposed for several biased-noise qubit platforms, most notably Kerr cats. However, experimentally measuring the noise bias is challenging as it requires accurately estimating certain low-probability Pauli errors in the presence of much larger state preparation and measurement (SPAM) errors. In this paper, we introduce bias randomized benchmarking (BRB) as a technique for measuring bias in quantum gates. BRB, like all RB protocols, is highly accurate and immune to SPAM errors. Our first protocol, CX-dihedral BRB, is a straightforward method to measure the bias of the entire CX-dihedral group. Our second protocol, interleaved bias randomized benchmarking (IBRB), is a generalization of interleaved RB tailored to the experimental constraints biased-noise qubits; this is a more involved procedure that directly targets the bias of the CX gate alone. Our BRB procedures occupy a middle ground between classic RB protocols that only estimate the average fidelity, and tomographic RB protocols that provide more detailed characterization of noise but require more measurements as well as experimental capabilities that are not necessarily available in biased-noise qubits.


I. INTRODUCTION
Achieving scalable quantum computing will require efficient error-correction to counter the effects of noise. Recent work has revealed that there exist error correcting codes that tolerate much higher noise rates [1][2][3][4][5][6][7] when the noise is biased, that is, when errors that cause bit flips are suppressed compared to errors that cause only phase flips. In order to efficiently detect errors, these proposals rely on using controlled-NOT (CX) gates that are bias-preserving such that bit-flip noise remain suppressed to leading order [2][3][4][5][6][7][8][9][10]. Without bias-preserving CX gates, error correction is possible but requires more complex circuits, reducing the effectiveness of the underlying code [11]. Recently, CX gates that preserve the noise bias have been theoretically proposed in Kerr cat [12] and other qubit platforms [13,14] (see also [15] for improvements to bias-preserving CX gates in Kerr cat qubits). Experimental efforts towards realizing these proposals are rapidly growing. In order to determine the effectiveness of the bias-tailored codes implemented with the experimentally realized CX gates, it is necessary to estimate the amount of noise-asymmetry or bias along with the total error probability of these gates. However, it is challenging to precisely measure the rate of bit flip errors of highly-biased noise channels, as such rates are extremely low.
A common method for precisely estimating low error probabilities in quantum gates is randomized benchmarking (RB) [16][17][18][19] and its derivatives [20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35][36]. In RB, one randomly generates circuits from some set of elementary gates and estimates the average error probability as a function of the circuit depth; the decay rate of the error probability with circuit depth then gives information about the error rate in the circuit. RB protocols typically boast two main advantages over other characteri-zation methods. Primarily, RB protocols decouple state preparation and measurement (SPAM) errors from gate errors. In addition, because RB involves evaluating the error rate of circuits composed of many elementary gates, small error probabilities are magnified and may be precisely measured. The accuracy of RB experiments to estimate properties of the error channel can be rigorously guaranteed in a variety of settings [23,24,[37][38][39][40][41].
There exists a zoo of different RB protocols (see [25] for a taxonomy) which differ in whether they characterize a group of gates or a single gate, whether they measure only the average fidelity or additional properties of the noise, and whether they are tailored to specific experimental hardware. The original RB measures the fidelity averaged over elements of the Clifford group [16,17], but there also exist extensions of RB which measure the average fidelity of other efficientlysimulatable groups [22,26,27], most notably character RB [21,23,24] which uses techniques from representation theory to significantly simplify the RB decay functions. In contrast to RB methods characterizing groups of gates, interleaved RB [20] and its variants [23,[28][29][30][31]42] instead measure the average fidelity of a single gate or layer of gates. There also exist RB protocols that measure properties of the noise channel beyond the average fidelity for either groups of gates or specific gates [32-35, 43, 44]. Moreover, efforts have been directed towards designing both group and interleaved RB protocols that are specifically tailored to the gate-set available in a given hardware [24,36,42,45].
In this spirit, here we introduce group and interleaved RB procedures tailored to extract information about the total noise as well as noise-asymmetry in biased-noise hardware. Our first procedure is CX-dihedral bias RB (BRB), which is a straightforward modification of CXdihedral RB [22,23]. This procedure measures the noise arXiv:2206.00009v1 [quant-ph] 31 May 2022 bias and average fidelity of the entire CX-dihedral group. Our second procedure, interleaved bias RB (IBRB), is a generalization of interleaved RB [20] and the recently introduced 2-for-1 interleaved RB [23,42]. In particular, the IBRB procedure is tailored to the available gate-set of biased-noise qubits [12,46,47]. This procedure uses interleaved CX and Z gates to estimate the bias and fidelity of the CX. Both the BRB and IBRB protocols make heavy use of the recently introduced framework of character RB [23,24].
Our approach to biased-noise benchmarking is notably different from Pauli channel estimation [34,35,44]. Pauli channel estimation is an interleaved RB procedure that uses character RB and randomized compiling to measure the full set of Pauli-diagonal elements of the noise channel associated with any Clifford operation. While Pauli channel estimation would appear to be sufficient for measuring the bias of CX gates, it has two drawbacks that make it unsuitable for estimating bias in biasednoise qubits. First, Pauli channel estimation demands considerable experimental overhead, as it estimates the probability of each Pauli error rather than the probability of sets of Pauli errors such as bit-flips and phase-flips. More importantly, Pauli channel estimation requires interleaving the full Pauli group between the Clifford operators, and is only guaranteed to measure the Clifford's noise when the Pauli group is high-fidelity. This is not suitable for biased-noise qubits, where we generically expect X and Y gates to be of comparable fidelity to CX gates if implemented in a bias-preserving manner [12]. On the other hand, if X and Y gates are not implemented in a bias-preserving manner then it won't be possible to accurately estimate the suppressed bit-flip rate in the CX gates. In contrast to Pauli channel estimation, our interleaved bias RB is designed specifically to use only interleaved Z gates, which are trivially bias-preserving and can be implemented with high-fidelity [8,12].
Our paper is organized as follows. In Section II, we define the bias and fidelity of a multi-qubit gate in terms of the Pauli-diagonal part of its noise channel. In Section III, we introduce the Pauli, Z, and CX-dihedral groups that we will use to benchmark the bias. In Section IV, we give the step-by-step instructions for our bias RB procedures and illustrate them with simulated experiments. The derivations of these procedures, as well as technical details about biased-noise error channels, are relegated to appendices.

II. DEFINING THE BIAS
Intuitively, the bias of a noise channel Λ is a measure of the likelihood that an error will not flip a bit and will instead only apply an erroneous phase. We refer to errors that only apply an erroneous phase as dephasing errors, while we refer to errors that include a bit-flip as non-dephasing errors, even if they also apply erroneous phases. For example, an error 1 ⊗ Z is dephasing, while Y ⊗ Z is non-dephasing.
To formally define the bias, we introduce the χ χ χ-matrix of a quantum channel [48]. Since the Pauli group on N qubits forms a basis for the set of all operators on N qubits, we can write the action of an arbitrary noise channel as for some coefficients χ α1 β1, α2 β2 [49]. Here α i , β i ∈ Z N 2 are vectors of 0s and 1s that index the Pauli group, and for any single qubit operator O we use the notation The diagonal of the χ-matrix is non-negative and sums to one. As a Pauli error is dephasing if and only if α = 0, we define the probabilities of dephasing and nondephasing errors, p D and p ND , by Finally, we define the bias η as the ratio of the error probabilities, This definition of the bias for a multi-qubit gate was originally given in [12]. The probability of no error is given by χ 0 0, 0 0 , which is related to p D and p ND in the obvious way The usual measure of gate quality is the average fidelity F Λ , which in can be written in terms of the χ-matrix as [50,51] where N is the number of qubits. There exist numerous RB procedures to measure the average fidelity of a group of gates [16,17,[21][22][23][24][25] or a specific gate [20,23,28,29,31,42] and therefore determine χ 0 0, 0 0 . Note that while p D and p ND only directly give information about the diagonal elements of the χ-matrix, complete-positivity of the noise channel Λ implies the offdiagonal elements of the χ-matrix are bounded by the diagonal elements as [33,Appendix D] |χ α1 β1, α2 β2 | ≤ χ α1 β1, α1 β1 χ α2 β2, α2 β2 .
One can use this bound to show that if two biased noise channels Λ A and Λ B have non-dephasing error probabilities p A ND and p B ND , their composition will still have and similar for p D (see Appendix B). Thus, p ND as defined in Eq. 4 is a useful characterization of the channel's behavior under composition even if the off-diagonal elements are not negligible. We also note that previous work on quantum error correction has demonstrated that measurement of the stabilizers of an error-correcting code causes the off-diagonal elements of the χ-matrix to rapidly decay [52][53][54][55].
III. THE PAULI, Z, AND CX-DIHEDRAL QUANTUM GROUPS We will focus specifically on the Pauli, Z, and CXdihedral [22,56] groups. These are all finite subgroups of the full unitary group that can be efficiently simulated. We will consider these groups to be defined modulo overall phases for convenience.
The N -qubit Pauli group P N is the group generated by the Pauli X and Z operators on all N qubits. As XZ = ZX up to a phase, we may assume all X operators appear to the left of all Z operators. We then have Since X( α 1 )Z( β 1 )X( α 2 )Z( β 2 ) = X( α 1 + α 2 )Z( β 1 + β 2 ) up to a phase, this group is isomorphic to Z 2N 2 The N -qubit Z group Z N is the commutative subgroup of the Pauli group generated by all single-qubit Z: This group is clearly isomorphic to Z N 2 . Finally, the N -qubit CX-Dihedral group D N is the group generated by the N -qubit dihedral group along with all CX gates between any pair of qubits [22]: Here, T i := exp{iπZ i /4} is the T-gate, while CX i,j denotes the CX gate with the first index, i, the control and the second index, j, the target. Simple presentations of D N were introduced in [22,57], and efficient decompositions of elements of D N which minimize the number of two-qubit gates are given in [56].

IV. THE BIAS RB PROCEDURES
In this section, we outline our two bias RB procedures. The first is a simple and scalable protocol for measuring p D and p ND for the entire CX-dihedral group, which may be a useful proxy for the performance of the CX gate when the X and T gates are high-fidelity. The second is a protocol for measuring p D and p ND for the CX gate CX 1,2 acting on two qubits, by interleaving with elements of Z 2 and randomly swapping between the usual CX 1,2 gate and a |0 -controlled CX gate which flips the target qubit if the control qubit is in the |0 state. The |0 -controlled CX gate can be written as a CX gate conjugated by a Pauli X on the control qubit, X 1 CX 1,2 X 1 . This second protocol is specifically designed for biased-noise, stabilized-cat qubits, where the fidelity of diagonal Z gates is much higher than the fidelity of X, and where we can apply both a bias-preserving CX 1,2 and a bias-preserving |0 -controlled CX gate by simply changing the phase of a drive [9,12,13] so that these two operations have similar p D and p ND . However, we expect this protocol to be generally applicable to other biasednoise architectures, since arbitrary biased-noise architectures will likely have much higher-fidelity diagonal gates than off-diagonal gates [9,13], and X 1 CX 1,2 X 1 differs from CX 1,2 simply by swapping the roles of |0 and |1 in the control qubit.

A. CX-dihedral bias RB (BRB)
Standard Clifford RB measures the fidelity averaged over elements of the Clifford group; similarly, CXdihedral bias RB measures p D and p ND averaged over elements of the CX-dihedral group D N . Our procedure is essentially identical to the previous character RB procedure for estimating the fidelity of the CX-dihedral group [21,23]. The only modification is a post-processing step to extract the dephasing and non-dephasing error probabilities instead of just the average fidelity.
For convenience, we make the standard RB assumption of gate-independent noise, so that the noisy implementation of any U ∈ D N is Λ • U for a noise channel Λ independent of U . However, like the usual character RB, our procedure works for gate-dependent noise, provided all U ∈ D N are high-fidelity; see [23,24] for proofs.
The CX-dihedral bias RB procedure is: 1. For each b in Table I: (a) For arbitrary n, choose unitaries U 0 ∈ P N and U 1 , ..., U n ∈ D N at random. Set U n+1 = U † 1 · · · U † n .
(b) Prepare the initial state ρ b listed in Table I. (c) Successively apply the gates (U 1 U 0 ), U 2 , U 3 , . . . , U n+1 . Note that instead of applying U 0 and then U 1 , we compile the product U 1 U 0 into a single element of D N .
(d) Perform a measurement of the observable E b in Table I. Weight the outcome by χ * b (U 0 ). (e) Repeat steps 1a-1d many times to estimate the character-weighted survival probability where P {Ui} denotes the expectation value of E b after applying the gates in step 1c and E[·] denotes the average over the choices of gates.
(f) Repeat steps 1a-1e for many different values of n to estimate the whole S b (n) curve.
(g) Fit the decay curve, S b (n), to the functional form listed in Table I to estimate λ b .
2. Estimate the dephasing and non-dephasing error probabilities of the error channel Λ as Note that the weights and measurement outcomes satisfy |χ * b (U 0 )P {Ui} | ≤ 1, so that if we take N s samples to estimate S b (n) at a specific value of b and n, the statistical uncertainty in our estimate is roughly 1/ √ N s , independent of n or N . To achieve a good relative uncertainty, then, we simply need the true value of S b (n) to be sufficiently large. For high-fidelity gates, we show in our derivation in Appendix A 4 that A b ≈ 1, λ b ≈ 1, so that we can reliably fit the decay curve.
We give an example of CX-dihedral BRB in Fig. 1. Here, we generate random error channels Λ by generating random sets of Kraus operators, and simulate a CXdihedral BRB experiment (see Appendix D for details on the random error channels). Fig. 1a illustrates the experiment for a single error channel, where we estimate the value of S b (n) for a few different values of n and fit the data to the functional forms given in Table I. In this simulation, each value of S b (n) is estimated using 5000 measurements. To demonstrate the effectiveness of our procedure, we repeat this experiment for many different error channels, using Eq. 16 to estimate p D and p ND and finally using these estimates to extract the bias. In Fig.  1b we plot our estimated probabilities versus the exact probabilities of the error channel, and in Fig. 1c we do the same for the bias. To estimate the error bars, we use bootstrap resampling. Visually, it is clear that we are accurately estimating p D , p ND , and η even for very high biases. To verify this, we compute the reduced-χ 2 FIG. 1. Simulated CX-dihedral bias RB experiments. (a) An example of estimating S b (n), with the S b (n) curves plotted in orange and our estimates of S b (n) given by blue dots. To estimate sensitivity to statistical errors, we generate multiple fits of S b (n) using bootstrap resampling of our data, which are plotted in grey and generally overlap the exact S b (n) curves. (b) The probabilities pD and pND extracted from the fits for 50 randomly generated error channels. Error bars denote the standard deviation over 50 resamplings. (c) The estimated bias η for each of these error channels. We see that even at very high bias, we can accurately estimate the value of η. statistic for our estimates of p D , p ND , and η, and find χ 2 between 1 and 1.3, indicating that our bootstrapping is accurately estimating the error bars to within a factor of between 1 and √ 1.3.

B. Interleaved bias RB (IBRB)
A common approach to estimate the fidelity of a single gate is interleaved RB, in which one interleaves the gate of interest with random elements from an interleaving group designed to simplify the error channel. Originally, the gate of interest was required to be from the interleaving group [20], but later work has relaxed this requirement [23,28,29,31,42]. Interleaved RB estimates the fidelity of the combined error channel Λ U • Λ G of the gate U and interleaving group G; knowing the fidelity of Λ U • Λ G and the fidelity of Λ G allows one to bound the fidelity of Λ U , with bounds that become tight as the fidelity of Λ G approaches one [20,33,58]. Thus, we typically require the interleaving group to have high-fidelity. On the other hand, the advent of randomized compiling [59][60][61] implies that in some cases it is not necessary to separately estimate the fidelities of Λ U and Λ G , as Λ U • Λ G is the relevant error channel for a circuit that has been randomly compiled with the group G. However, in the case of randomized compiling, it is still necessary for G to be high-fidelity, so that the randomized compilation does not add in significant additional noise.
We develop an interleaved bias RB procedure that directly estimates the bias of the CX gate in Kerr cat qubits [12,46,47] or other biased-noise platforms [9,13,14]. In biased-noise qubits, we generally expect gates that are diagonal in the Z-basis to have much higher fidelity than non-diagonal gates. It is thus desirable to have a protocol that uses Z 2 as the interleaving group, as the fidelity of X and Y gates may be no better than the fidelity of the CX gate [62]. Restricting to an interleaving group such as Z 2 that is diagonal in the Z-basis introduces considerable complications, as we will see below.
For convenience, we define C := CX 1,2 to be a CX gate on the two qubits. We also define the gate C := X 1 CX 1,2 X 1 , which is similar to C, except that it applies X 2 when qubit 1 is in |0 rather than |1 . The Kerr cat system can implement bias-preserving versions of both of these gates using similar procedures, and we expect that both C and C will have similar dephasing and non-dephasing probabilities. These features will likely be shared by any biased-noise system. Define the error channels Λ C and Λ C to be the error channels associated with C and C respectively, so the noisy implementation of C is C • Λ C and similar for C . In addition define Λ G to be the error channel associated with gates U ∈ Z 2 , so the noisy implementation of U is Λ G • U . Finally, define Λ = Λ C • Λ G and Λ = Λ C • Λ G to be the composed error channels. Our protocol will directly estimate the dephasing and non-dephasing probabilities of the average channel (Λ + Λ )/2. If the dephasing and non-dephasing probabilities of Λ C and Λ C are close and Λ G ≈ 1, this is identical to the dephasing and non-dephasing probabilities of Λ C alone. On the other hand, even without these assumptions, we will see below that (Λ + Λ )/2 is the relevant error channel for a certain randomized compilation procedure.
The interleaved bias RB procedure is: Table II: (a) For arbitrary n, choose unitaries U 1 , ..., U n+1 ∈ Z 2 uniformly at random. Also choose CX gates C 1 , ..., C n ∈ {C, C } uniformly at random. (b) If there is more than one initial state ρ b listed in Table II, randomly select one of the listed initial states and prepare it. (c) Alternatively apply the gates from Z 2 and the CX gates as and χ b (U i ) is given in Table II. If b = 0− or b = 1− the effect of σ is to multiply the measurement outcome by (−1) for each C we apply, while otherwise σ is trivial. (e) Repeat steps 1a-1d many times, to estimate the signed character-weighted survival probability where P {Ci},{Ui},ρ b denotes the expectation value of E b after applying the gates in step 1c to ρ b .
(f) Repeat steps 1a-1e for many different values of n to estimate the whole S b (n) curve.
(g) Fit S b (n) to the functional form listed in Table II to estimate λ b and κ b , where we take the convention (λ b ) > (κ b ). Table II: (a) For arbitrary n, choose unitaries U 1 , ..., U 2n+1 ∈ Z 2 uniformly at random. Also choose CX gates C 1 , ..., C 2n ∈ {C, C } uniformly at random.

For b = 2± in
(b) If there is more than one initial state ρ b listed in Table II, randomly select one of the listed initial states and prepare it.
(c) Alternatively apply the gates from Z 2 and the CX gates as where χ 2± (U i ) are given in Table II.
(e) Repeat steps 2a-2d many times, to estimate the signed character-weighted survival probability where P {Ci},{Ui},ρ b denotes the expectation value of E b after applying the gates in step 2c to ρ b .
(f) Repeat steps 2a-2e for many different values of n to estimate the whole S b (n) curve.
3. Estimate the dephasing and non-dephasing error probabilities of the combined error channel Λ as To realize the mixed, non-positive state ρ = 1 2 Z ⊗ |0 0| we simply prepare either |00 00| or |10 10| with equal probability, and weight the resulting measurement by (−1) if we prepare |10 10|. We realize the states ρ = 1 2 1 ⊗ |+ +| and ρ = 1 2 Z ⊗ |+ +| similarly. In this procedure, for b = 0+ we need to accurately estimate λ 0+ and for b = 0+ we need to accurately estimate both λ b and κ b . To accurately fit these decay parameters, we require the prefactors A 0+ and A b , B b for b = 0+ to be large. As we will demonstrate in our derivation in Appendix A 5, for high-fidelity gates we expect A 0+ ≈ 1, as well as A b ≈ 1/2 and B b ≈ 1/2 for b = 0+, so that we may accurately fit the decay parameters.
In the case of b = 0− and b = 1−, we also need to distinguish between λ b and κ b , since they enter into Eq. 23 with different signs. As we demonstrate in Appendix A 5, for high-fidelity gates we expect λ b ≈ 1 and κ b ≈ −1, so we can define λ b to be the decay constant with the largest real part. In the case of b = 0+, b = 1+, and b = 2± we cannot differentiate between λ b and κ b , since they are both ≈ 1 for high-fidelity gates, but they enter Eq. 23 with the same sign and therefore do not need to be distinguished.
While there are nine initial states and measurements listed in Table II, one can use the same experimental data for b = 0+ and b = 0−, the two rows of b = 1+, for the first row of b = 2+ and the first row of b = 2−, and for the second row of b = 2+ and the second row of b = 2−, reducing the cost to five distinct pairs of initial states and measurements.
We give an example of IBRB in Fig. 2. Here, we generate random error channels Λ G , Λ C , and Λ C by generating random sets of Kraus operators, and simulate an IBRB experiment (see Appendix D for details on the random error channels). Fig. 2a illustrates the experiment for a single error channel, where we estimate the value of S b (n) for a few different values of n and fit the data to the functional forms given in Table II. Again, each value of S b (n) is estimated using 5000 measurements. Note oscillates with period 2, because λ b ≈ 1 and κ b ≈ −1. However, we can still accurately estimate the parameters taking only a few widely spaced data points, as shown in Fig. 2a, provided we take data at both even and odd sequence lengths n. This is because the rapid oscillations are constrained to have period 2 by the form of S b (n), so it is not necessary to take finegrained data at nearby values of n to fit this rapidly oscillating function. To demonstrate the effectiveness of our procedure, we repeat this experiment for many different error channels, using Eq. 23 to estimate p D and p ND and finally using these estimates to extract the bias. In Fig. 2b we plot our estimated probabilities versus the exact probabilities of the error channel, and in Fig. 2c we do the same for the bias. To estimate the error bars, we again use bootstrap resampling. Visually, it is clear that we are accurately estimating p D , p ND , and η even for very high biases. To verify this, we again compute To robustly fit S b (n) in the presence of these oscillations does not require densely sampling data points, since we know the oscillatory period is always 2; however, it does require taking data at both even and odd sequence lengths, as shown in the figure. To estimate sensitivity to statistical errors, we generate multiple fits of S b (n) using bootstrap resampling of our data, which are plotted in grey and generally overlap the exact S b (n) curves. (b) The probabilities pD and pND extracted from the fits for 50 randomly generated error channels. Error bars denote the standard deviation over 50 resamplings. (c) The estimated bias η for each of these error channels. We see that even at very high bias, we can accurately estimate the value of η.
the reduced-χ 2 statistic for our estimates, and find χ 2 between 2 and 3, indicating that our bootstrapping is accurately estimating the error bars to within a factor of between √ 2 and √ 3.

C. Randomized compiling for interleaved bias RB
Our IBRB measures p D and p ND for the averaged, composite channel (Λ + Λ )/2, with Λ = Λ C • Λ G and Λ = Λ C • Λ G . Provided that p D and p ND for Λ C and Λ C are equal and Λ G = 1, these are also the dephasing and non-dephasing probabilities for Λ C alone. However, in general we would like to avoid assuming the error channels Λ C and Λ C are identical, and we would like to allow for the possibility of Λ G = 1.
Previous interleaved RB procedures for determining the fidelity of a gate C have a similar problem; in these procedures one separately estimates the fidelity of Λ C • Λ G and Λ G , and uses this information to provide bounds on the fidelity of Λ C alone [20]. These bounds depend on the fidelity of Λ G , with lower fidelity Λ G resulting in looser bounds for the fidelity of Λ C . From this point of view, it is important for the interleaving group to be high-fidelity, in order to be able to tightly bound the fidelity of Λ C . It is also possible to use this method to bound p D and p ND of (Λ C + Λ C )/2, provided we know the corresponding probabilities for Λ G and Λ (see Appendix B 3). However, benchmarking p D and p ND of Λ G is challenging for G = Z 2 as averaging over sequences of Z 2 gates does not randomize error channels enough to guarantee a simple form of the survival probability.
On the other hand, the technique of randomized compiling [59][60][61] provides an alternative interpretation of interleaved RB results. In randomized compiling, one intentionally inserts random elements of a high-fidelity interleaving group between the lower-fidelity gates C in order to eliminate the coherence of the noise. In a circuit that has been randomly compiled, the error channel associated with a gate C is the combined error channel Λ C • Λ G . From this point of view, we require the interleaving group to be high-fidelity to ensure that randomly inserting elements of the interleaving group into a circuit does not notably increase the errors in the circuit.
We can do a modified version of randomized compiling for bias-preserving CX gates. Each time a CX gate appears in a circuit, we randomly insert an element of Z 2 before it, and with 50% probability replace it with |0 -controlled CX, X c CX c,t X c . These extra Pauli operators can be commuted through the rest of the circuit and their effect can be tracked in software. This is illustrated in Fig. 3 for the example of a circuit that measures the stabilizer of the XZZX surface code [5].
The error channel for the resulting randomized gate Randomizing a single CXc,t gate in a circuit that measures the stabilizer of the XZZX code, where c denotes the control qubit and t the target. Rather than apply the bare CXc,t gate, we randomly insert an element of Z2 before the gate, and randomly apply either CXc,t or XcCXc,tXc. This modification of the circuit results in an overall Pauli operator, which can be tracked in software.
then has the p D and p ND that we measure in our experiment. Therefore, we can always achieve the error rates p D and p ND by this randomized compiling procedure. Like the usual randomized compiling, our biased-noise randomized compilation procedure has the additional benefit of limiting how errors can build up when composing noisy gates. We show in Appendix C that if two error channels Λ A and Λ B are randomly compiled as above, their composition (Λ A • Λ B ) will have a non-dephasing error probability which, in contrast to to Eq. 9, says that p ND grows linearly rather than quadratically in the number of composed error channels when the circuit is randomly compiled by Z N . This is similar to the behavior of the average infidelity for full randomized compilation, which also may grow quadratically under composition for generic noise channels but grows linearly for randomly compiled circuits [58].

V. CONCLUSION
Measuring the bias of a highly-biased gate is a delicate process, as the non-dephasing error probability must be precisely estimated. By using techniques from randomized benchmarking, we can precisely estimate these error probabilities. The essential ingredient in our method is defining efficiently measurable weighted survival probabilities whose decay rates depend only on p ND . Because we consider variable sequence lengths, our method estimates gate error rates independently from SPAM errors, even if the SPAM errors are much larger than the nondephasing gate errors. By measuring the weighted survival probabilities for long gate sequences, we can magnify the effect of small non-dephasing errors, allowing us to precisely measure arbitrarily small error probabilities by simply increasing our sequence length.
Our interleaved bias RB in particular is highly tailored to the experimental constraints of biased noise qubits. In general, interleaved RB works because the interleaved gates randomize the error channel while not adding significant additional errors. However, in the case of biased noise qubits, we can only interleave Z gates without introducing additional errors; X and Y gates are generally as error-prone as CX gates. As a result, we were motivated to add additional randomization by swapping between C and C , which allowed for sufficient randomization of the error channel. This is in contrast to standard techniques for estimating Pauli channels, which assume one can freely add Pauli operators to a circuit without adding significant errors [34,35,44,61]. We expect our techniques to be highly relevant to near-term experiments, as the numerous proposals for bias-preserving CX gates [12][13][14] are realized experimentally.

ACKNOWLEDGMENTS
We thank Joel J. Wallman for helpful discussions about randomized compiling and its implications for our procedure and Steven T. Flammia for a critical reading of our manuscript. This work is supported by the ARO under grant number W911NF18-1-0212.

Appendix A: Derivation of the procedures
To derive these procedures requires a detour into some background mathematics. To begin, we review a few necessary aspects of representation theory, and define a natural representation for quantum groups, the Liouville representation. Next, we determine the irreducible representations of the Liouville representation of P N , Z N , and D N . Then we derive how the dephasing and nondephasing probabilities may be written as the trace over invariant subspaces of the Liouville representations. Finally, armed with this background, we derive each of the bias RB procedures.

Background: Representation theory
Given a finite group G, a unitary representation is a map assigning a unitary matrix to each group element such that group multiplication is preserved: where U (m) is the group of m × m unitary matrices acting on C m . A representation is irreducible if the image of φ doesn't preserve any proper subspace of C m . Every finite-dimensional representation can be uniquely decomposed as the direct sum of irreducible representations (irreps): where φ i : G → V i is irreducible and a i φ i is standard shorthand for the direct sum of a i copies of φ i . The number a i is refered to as the multiplicity of the ith irrep. Finally, the character χ of a representation is defined by We will repeatedly use two elementary facts about representations.
Fact 1 (Projection Formula). If φ a 1 φ 1 ⊕ · · · ⊕ a I φ I , then the projectorΠ i onto V ai i ⊆ C m is given bŷ where χ i is the character of φ i .
Fact 2 (Schur's Lemma). If φ a 1 φ 1 ⊕ · · · ⊕ a I φ I , then for any matrix Q ∈ GL(m) we have where Q i is some a i × a i matrix, and Q i ⊗ 1 i is a matrix that acts on V ai i by mixing the a i copies of V i but acting as the identity on the degrees of freedom within each copy of V i . In particular, if a i = 1 for all i, we have whereΠ i is the projector onto the single copy of V i ⊆ C m .
See [63] for proofs of these facts, as well as more details on the representation theory of finite groups.
In this paper, we will be interested in the case where G ⊂ U (2 N ) is a finite subgroup of the unitary group on N qubits. In this case, the standard action of U ∈ G on a density matrix, ρ → U ρU † , is a unitary representation of G on the vector space of density matrices. Choosing a basis for H, we can more conveniently represent a density matrix ρ = i,j ρ ij |i j| by a vector |ρ := i,j ρ ij |i |j . In terms of this vectorized density matrix, it is simple to see that the action of a unitary U on |ρ is given by where we've definedÛ := U ⊗ U * to be the matrix representation of the unitary U acting on the space of vectorized density matrices. This representation φ : U →Û sending a d × d unitary to a d 2 × d 2 unitary is known as the Liouville representation. We can define the Liouville representation of a quantum channel Λ : ρ → a K a ρK † a byΛ := a K a ⊗ K * a , in which case we have Λ(ρ) =Λ|ρ . (A9) Finally, we can write the expectation value of an observable E over a state ρ in terms of the Liouville representation as We refer to [48] for a more detailed treatment of both quantum channels and the Liouville representation.

Irreps of the quantum groups
For the Pauli, Z, and CX-dihedral groups, we will need to understand the decomposition of their Liouville representation into irreps, and the characters of those irreps.
In the case of the 1-qubit Pauli group, the Liouville representation decomposes into four non-isomorphic irreps, with projectors and characters given in Table III. The Liouville representation of the N -qubit Pauli group P N then decomposes into 4 N non-isomorphic irreps indexed by vectors i ∈ Z N 4 , with projectors and characterŝ One can similarly determine a factorized form for the irreps of the Liouville representation of the Z group Z N , but we will need only the case N = 2 here. The Liouville representation of Z 2 decomposes into 16 irreps with projectors and characters given Table IV. Note that in this case, each irrep has multiplicity 4.
Finally, the Liouville representation of the CX-dihedral group D N decomposes into the three irreps given in Table V. Note that the number of irreps is independent of N . Derivation of these irreps can be found in [22].

The dephasing/non-dephasing error probabilities in the Liouville representation
From the definition of the dephasing and nondephasing error probabilities, Eqs. 3 and 4, we want to express p D and p ND as the trace ofΛ over subspaces Z N and P N \ Z N of the Pauli group. Note that these subspaces are invariant under the action of any bias-   preserving operator. It is straightforward to show Rearranging this gives formulas for the dephasing and non-dephasing probabilities Our goal in these RB procedures is to determine the two traces Tr Z N (Λ) and Tr P N \Z N (Λ).

Deriving CX-dihedral bias RB
The character-weighted survival probability S b can be written as where Λ is the error channel associated with elements of D N , and we have included unknown preparation and measurement errors Λ P and Λ M . We make the standard RB change-of-variables, defining V 1 := U 1 and inductively defining V m := U m V m−1 . Note that the expectation value over {V m } is equivalent to the expectation value over {U m }.
Thus, we can write The two expectation values can be evaluated using Facts 1 and 2. First, we note from Eq. A11 that χ 1 is the character function of the irrep of the Liouville representation indexed by i 1 = (1, 1, . . . , 1), and χ 2 is the character function of the irrep indexed by i 2 = (3, 3, . . . , 3). Then Fact 1 says that where the projectors have the explicit form (see Eq. A11) Second, we note that, since the Liouville representation of the dihedral group is multiplicity-free (all a i = 1), Fact 2 gives Combining these facts allows us to simplify the survival probability as We have carefully chosen χ b to give us a projectorΠ P as can be checked from the formulas forΠ D i in Table V. Therefore, all terms in the sum vanish except for i = b, and we can write the final form of S b (n) as where we have defined A b , λ b to match the form given in Table I. Note that λ b only depends on Λ, and all effects of the SPAM errors Λ P and Λ M are absorbed in A b . Note also that provided we have reasonably high-fidelity preparation, gates, and measurements, we can approximate A b as where the last equality is found by plugging in the explicit formulas for E b ,Π P i b , and ρ b given in Eq. A21 and Table I. Therefore, the prefactor in front of λ n b is large, and we can accurately fit λ b .
To finish, we note that plugging in the explicit formulas from Table V Plugging this into Eq. A16 for p D and p ND gives the estimates in Eq. 16, as desired.

Deriving interleaved bias RB
a. Deriving the survival probabilities for b = 0± and b = 1± We begin by considering the survival probability S 0+ . For convenience, we define the operatorM 0+ bŷ Note thatΠ Z 0 is a rank-4 projector, soM 0+ is a rank-4 operator in the general case.
We now evaluate the survival probability S 0+ (n) in terms of M 0+ . We note that Fact 1 implies that Given thatM 0+ has rank 4, we can expand it in terms of its eigenvalues µ j and corresponding left and right eigenvectors |φ j and |ψ j aŝ where we have normalized our eigenvectors so that ψ j |φ k = δ j,k . In this case, our survival probability becomes (A33) Note that again the eigenvalues µ j depend only on the gate errors Λ and Λ , and not on the SPAM errors Λ P and Λ M .
In this form, the survival probability is the sum of four exponential decays, which is infeasible to fit to experimental data. However, in the case of high-fidelity gates, we can show that only two exponential decays are relevant using perturbation theory in δΛ := (Λ − 1) and δΛ := (Λ − 1). For perfect gates with δΛ = δΛ = 0, M 0+ has eigenvalues and eigenvectors given by Then to first order in δΛ and δΛ , we have We therefore see that one eigenvalue is always 1, and that we may neglect the eigenvalues µ 3 and µ 4 , since their contribution to the survival probability S 0+ (n) is O(δ n ). We can therefore fit S 0+ (n) to a single exponential decay plus a constant. In the notation of Table II, µ 2 corresponds to λ 0+ . From Eq. A33, the prefactor A 0+ in front of the exponential decay is given by We can estimate the value of A 0+ by assuminĝ Λ G ,Λ P ,Λ M ≈ 1 and evaluating Eq. A42 explicitly, provided we can estimate the eigenvectors |ψ 2 , |φ 2 . Since the eigenvectors at δ = 0 (Eqs. A34 and A35) are degenerate, we must use degenerate perturbation theory to find the eigenvectors at δ = 0. This means that |ψ 2 and |φ 2 will in general be (up to O(δ)) some linear combination the δ = 0 eigenvectors with eigenvalue 1. Specifically, we have where γ is some constant determined by the specific perturbations δΛ and δΛ , and the overall form is restricted by the normalization condition φ i |ψ j = δ i,j and the fact that |φ 1 = 1 2 |11 is a right-eigenvector with eigenvalue 1 for any trace-preserving map. Using these eigenvectors to evaluate Eq. A42 then gives A 0+ ≈ 1, so that we may accurately fit λ 0+ .
The case of b = 0− and b = 1± are similar. We definê and repeat the above analysis to see that we have for b = 0− and b = 1− where µ j , |ψ j , and |φ j are the eigenvalues and eigenvectors of the correspondingM b operator. Performing perturbation theory, we find that the unperturbedM 0− andM 1− have eigenvalues {µ 0 j } = {1, −1, 0, 0}, while the unperturbedM 1+ has eigenvalues {µ 0 j } = {1, 1, 0, 0}. We label the two largest-magnitude eigenvalues by λ b and κ b , and neglect the remaining eigenvalues that are O(δ). Working to first order in δΛ and δΛ , we find In total, we've demonstrated that for these values of b, we can fit S b (n) to the sum of two exponential decays, as given in Table 23. We can again estimate the prefactors A b and B b in front of the decays by assuminĝ Λ G ,Λ P ,Λ M ≈ 1 and evaluating Eqs. A48 and A49. In the case of b = 1+ we again need to use degenerate perturbation theory, while the cases of b = 0− and b = 1− are non-degenerate. We find that A b , B b ≈ 1/2, so that we can reliably fit both decay curves. Note that the only reason for averaging over two initial states and final measurements for b 1+ is to ensure that A 1+ , B 1+ ≈ 1/2.

b. Deriving the survival probabilities for b = 2±
We begin by considering the survival probability S 2+ . For convenience, we define the operatorM 2+ bŷ Note thatΠ Z 2 is a rank-4 projector, soM 2+ is a rank-4 operator in the general case.
We now evaluate the survival probability S 2+ (n) in terms of M 2+ . We note that Fact 1 implies that Table II to the characters of the Liouville representation of Z 2 in Table IV). We then have where in the last line, µ j , |ψ j , and |φ j denote the eigenvalues and eigenvectors ofM 2+ .
We again simplify this expression through perturbation theory. When δΛ = δΛ = 0,M 2+ has eigenvalues and eigenvectors given by Then to first order in δΛ and δΛ , the eigenvalues satisfy We neglect µ 3 and µ 4 , and find S 2+ is the sum of two exponential decays. In the notation of Table II, µ 1 corresponds to λ 2+ and µ 2 corresponds to κ 2+ . Similarly, for S 2− (b), we definê in terms of which we can write where now µ j , |ψ j , and |φ j denote the eigenvalues and eigenvectors ofM 2− . We again use perturbation theory; the unperturbed eigenvalues and eigenvectors are while to first order in δΛ and δΛ , the eigenvectors satisfy We again neglect µ 3 and µ 4 , and find S 2− is also the sum of two exponential decays. In the notation of Table II, µ 1 corresponds to λ 2− and µ 2 corresponds to κ 2− . Finally, we can estimate the prefactors A 2± and B 2± by assumingΛ G ,Λ P ,Λ M ≈ 1 and evaluating Eqs. A57 and A66. In both cases, we must use degenerate perturbation theory to find the approximate eigenvectors. We again find A b , B b ≈ 1/2.

c. Finding the dephasing and non-dephasing probabilities
We can combine Eqs. A39 and A50 to evaluate the trace over Z 2 as and we can combine Eqs. A51, A52, A62, and A71 to evaluate the trace over P 2 \ Z 2 as Plugging Eqs. A74 and A75 into Eq. A16 for p D and p ND gives Eq. 23 as desired.
Appendix B: Dephasing/non-dephasing probabilities of a composite channel Given two quantum channels with dephasing and non-dephasing probabilities p A D , p A ND , p B D , and p B ND , we want to find bounds on the dephasing/non-dephasing probabilities of the combined channel (Λ A • Λ B ). We will denote the combined error probabilities by simply p D and p ND .

Finding the non-dephasing error probability
From the definition of p ND , Eq. 4, we have For legibility in what follows, we will denote the diagonal elements χ α β, α β of the χ-matrices by simply χ α β . Note that the complete-positivity of Λ A and Λ B requires that the χ-matrices are positive semidefinite, which in turn implies |χ α1 β1, α2 β2 | ≤ χ α1 β1 χ α2 β2 [33], a fact we will use repeatedly. We will also repeatedly use two elementary inequalities, both of which are versions of the Cauchy-Schwarz inequality: The condition α 1 + γ 1 = α 2 + γ 2 = 0 implies at most two of { α 1 , α 2 , γ 1 , γ 2 } can be equal to 0. We thus divide the terms in Eq. B3 into several subsets.
Let's take each of these in turn. We have The first term is equal to p B ND , up to irrelevant higherorder terms. We can bound the magnitude of the remaining terms: Thus in total, we have By symmetry, we similarly have The remaining terms are only higher-order corrections, and so we bound their magnitudes: We can bound each of these four terms as Thus, in total we have By symmetry, we also have Continuing, we have each of which can be bounded as Thus in total, By symmetry, we also have Finally, Combining the bounds given in Eqs. B17, B18, B33, B34, B45, B46, B47, B48, and B53, we have that where the error term is bounded by While the bound on | | involves many terms, in the case where p A,B text. This is the relevant regime for high fidelity, highlybiased channels on a small number of qubits. It is also the relevant regime for error channels in which off-diagonal elements of the χ-matrix are exponentially suppressed in the weight of the Pauli errors, which is likely the case for error-correction circuits as they tend to decohere noise [52][53][54][55].

Finding the dephasing error probability
From the definition of p D , Eq. 3, we have We could similarly divide the terms in this sum into categories and bound them, as we did for p ND above. However, if we assume p A,B D p A,B ND , we can instead approximate p D by p = p D +p ND , the total error probability of the channel. We then use the known result [58, Theorem 1] 3. Extracting the dephasing and non-dephasing probabilities in IBRB Given the estimates of p D and p ND given in Eqs. B57 and B54, respectively, we can reverse these equations to try and estimate p A D and p A ND from p B D , p B ND , p D , and p ND . This is relevant if Λ A is the error channel associated to our gate of interest and Λ B is the error channel of our interleaving group. For convenience, we'll assume we're in the regime where we can neglect the error terms proportional to 2 N/2 and higher powers. Rearranging Eqs. B57 and B54 gives We note that in our particular case, it seems difficult to extract p B D and p B ND for the group Z 2 , as the Liouville representation of this group has high multiplicity. This is why we prefer the randomized compiling interpretation of p D and p ND given in the main text.
Appendix C: Randomized compiling with Z randomization In this section, we explain the randomized compiling by Z N in more detail, and prove that it ensures nondephasing errors increase linearly under composition (Eq. 24 of the main text).
Given an N -qubit bias-preserving Clifford circuit composed of gates C 1 , C 2 , . . . , C n , a noisy implementation of the circuit is given bŷ C nΛn · · ·Ĉ 2Λ2Ĉ1Λ1 (C1) whereΛ 1 ,Λ 2 , . . . ,Λ n are the associated error channels. Let's assume that we can implement arbitrary elements of Z N with a gate-independent error channelΛ G , wherê Λ G is negligible red to the Clifford errors. We take the convention that the In this case, we can interleave randomly chosen gates U ∈ Z N between the Clifford elements without increasing the error. Note that because the circuit elements are Clifford, the effect of interleaving U can be corrected by an efficiently computable Pauli correction operator. We'll denote the correction operator by U n+1 . Note that because the Cliffords preserve the bias, we have U n+1 ∈ Z N . The resulting noisy circuit is then Because the Cliffords preserve the bias, commuting any element of Z N past a Clifford results in another element of Z N . We can thus rewrite the circuit aŝ for some V i ∈ Z N that are also distributed uniformly over where the twirled error channel Λ T i is given bŷ The twirled version of the error channels is highly simplified compared to the original error channel. In terms of the χ-matrix, if the original error channel is given by χ α1 β1, α2 β2 X( α 1 )Z( β 1 )ρZ( β 2 )X( α 2 ) (C6) then the twirled error channel is given by χ α β1, α β2 X( α)Z( β 1 )ρZ( β 2 )X( α) where the off-diagonal elements of the χ-matrix with α 1 = α 2 are set to zero. If we consider the composition of two twirled error channels (Λ T A • Λ T B ), we can again estimate the nondephasing probability of the composition in terms of the dephasing and non-dephasing probabilities of Λ A and Λ B . In evaluating the sum in Section B 1, the fact that the error channels are twirled means the only nonzero terms in the sum have α 1 = α 2 and γ 1 = γ 2 . These leaves the sum over subsets (1) and (2) unchanged, makes the sum over subsets (3)-(8) zero, and lets us reevaluate the sum over subset (9) as |(9)| ≤ α, γ = 0 β1, β2, δ2 Thus by combining Eqs. B17, B18, and C13, we have In the regime of high-fidelity gates on a small number of qubits, | | is negligible, which gives Eq. 24 in the main text.
In contrast, randomized Z 2 compiling produces no notable improvement to the bounds on p D for a highly biased noise channel, as in this case the dominant uncertainty in p D comes from off-diagonal elements of the χ-matrices χ A and χ B with α 1 = α 2 = γ 1 = γ 2 = 0, which are not affected by twirling.
Appendix D: Generating random biased-noise error channels Here, we give more details on our procedure to generate random biased-noise error channels to use in our simulation data. We make no claim that this procedure is optimal or generates all realistic error channels; our goal was simply to generate error channels that were both biased and not Pauli-diagonal to illustrate the power of our BRB methods.
We first randomly choose the number d = 1, ..., 4 N of Kraus operators to include, as well as the approximate target probabilities p D and p ND for the channel. For i = 1, . . . , (d − 1), we choose each Kraus operator to be either dephasing or non-dephasing with probability 1/2, and set it to where each c is generated by choosing a uniform r ∈ [0, 1], θ ∈ [0, 2π], and setting c = re iθ . The factor of 10 was inserted "by hand" to make the resulting error channel approximately have the desired dephasing/nondephasing probabilities.
Finally, having defined K i for i = 1, . . . , (d − 1), we define K d to be a matrix satisfying to ensure the overall channel is trace-preserving. While this description of K d is not unique, one matrix satisfying this equation is given by the Cholesky decomposition of (1 − i K † i K i ) [64], which is what we used in our simulations.