Fast Estimation of Sparse Quantum Noise

As quantum computers approach the fault tolerance threshold, diagnosing and characterizing the noise on large scale quantum devices is increasingly important. One of the most important classes of noise channels is the class of Pauli channels, for reasons of both theoretical tractability and experimental relevance. Here we present a practical algorithm for estimating the $s$ nonzero Pauli error rates in an $s$-sparse, $n$-qubit Pauli noise channel, or more generally the $s$ largest Pauli error rates. The algorithm comes with rigorous recovery guarantees and uses only $O(n^2)$ measurements, $O(s n^2)$ classical processing time, and Clifford quantum circuits. We experimentally validate a heuristic version of the algorithm that uses simplified Clifford circuits on data from an IBM 14-qubit superconducting device and our open source implementation. These data show that accurate and precise estimation of the probability of arbitrary-weight Pauli errors is possible even when the signal is two orders of magnitude below the measurement noise floor.


I. INTRODUCTION
Estimating noise in quantum computers is becoming increasingly important as we begin to test quantum error correction (QEC) on current noisy intermediate-scale devices [1]. Much of the current effort in noise estimation is focused on identifying methods that will remain tractable as the system size increases beyond the few qubit regime [2][3][4][5][6][7][8][9][10][11][12][13][14][15]. In such larger systems it is important to identify not only the errors that occur when qubits are operated in isolation or in small groups but also the additional errors that occur when the device is implementing fault-tolerant QEC circuits and nontrivial quantum algorithms. If we are able to characterize the noise and noise types (such as control errors, decoherence and crosstalk errors) in such a system then that will allow us to better diagnose and fix such errors, for instance by enabling calibration in the presence of crosstalk. Characterization of the noise will also allow the construction of tailored quantum error-correcting codes and decoders and customized fault-tolerance protocols designed to counteract the specific noise in the system. Such bespoke systems have been shown to outperform their generic counterparts at quantum error correction [16][17][18][19][20][21].
Noise estimation is possible in principle using quantum process tomography [22], but in practice this is often not desirable for several reasons. First, even using methods such as compressed sensing [23][24][25][26][27][28], the enormous Hilbert space of a multi-qubit machine makes it difficult to efficiently estimate all possible parameters beyond a handful of qubits. Second, standard tomography protocols are susceptible to state preparation and measurement (SPAM) errors [29], which limit the accuracy in estimating noise in quantum gates. * These authors contributed equally.
One promising approach to make noise characterization more tractable is to reduce the noise to a smaller set of relevant parameters that can be estimated in a SPAMfree way. A natural candidate for this approach is to learn the Pauli projection of a quantum noise channel. This is the channel obtained when the noise channel is twirled over the set of n-qubit Pauli operators. The remaining parameters of the channel, known as the Pauli error rates, are the most relevant parameters for near-term applications of QEC and fault tolerance because of the dominant role played by stabilizer codes [30]. Moreover, practical methodologies have been developed to implement the Pauli projection without substantially changing the average error rate in a given round of gates [31][32][33]. Furthermore, QEC tends to make noise less coherent [34][35][36], which further justifies the Pauli approximation at the logical level. Finally, Pauli error rates can be learned in a SPAM-free way [37,38].
Focusing on Pauli channels reduces the number of parameters required for complete noise estimation to 4 n , where n is the number of qubits of the device. Although this has better scaling than other SPAM-robust methods that attempt to learn an entire noise channel (e.g. [39,40]), this is unfortunately already too large to be tractable for some present-day quantum devices [41]. There are several ways to try to reduce this parameter count even further while still capturing the most relevant parameters for fault tolerance and QEC. For example, when the Pauli error rates form a bounded-degree Markov field, then the channel can be learned efficiently in n [37]; this algorithm was experimentally validated in Ref. [38]. Ref. [37] also gave an efficient algorithm for estimating the class of s-sparse Pauli channels, i.e. those with at most s nonzero Pauli error rates. These two classes of Pauli channels are motivated by the fact that quantum devices approaching the fault-tolerant regime will have very few significant errors (and therefore are approximately sparse) and will have errors that are only weakly correlated (and therefore are approximated by a low-degree Markov field).

A. Main Results
In this paper, we give a new algorithm for estimating ssparse Pauli channels that is distinct from Ref. [37]. This algorithm can reconstruct an s-sparse Pauli channel with the following recovery guarantee. We assume first that an experiment can be modeled as having access to a noisy oracle that can return an eigenvalue of an unknown Pauli channel with some independent Gaussian noise with variance ξ 2 . Then using at most O(sn) queries to the noisy oracle, the algorithm returns s estimated error ratesp j that agree with the channel error rates p j with precision |p j − p j | ≤ O ξ √ s . In fact, the bound is slightly stronger than this. The precise statement is given in Theorem 1, together with Assumptions 1 which lay out the precise mathematical assumptions used in the derivation.
We then show how to break open the oracle and perform the entire estimation efficiently. We show that noisy eigenvalues can be estimated to within variance ξ 2 by using only Clifford quantum circuits and computational basis measurements. Our results use modifications of the algorithm from [37] and show how the relevant noisy eigenvalue queries can be obtained with only O n 2 ξ 2 measurements.
Next, we validate these algorithms using experimental data from a 14-qubit superconducting device [38]. The original experiment exhaustively estimated the averaged eigenvalues in this device. We use these data to construct our eigenvalue oracle. We then simulate various levels of measurement noise on top of this "true" experimental signal to validate our algorithms. Our results are depicted in Figure 1. We show that when the noise added to the eigenvalues has any standard deviation in the range of 10 −3 -10 −5 then we can accurately recover Pauli error rates as small as two orders of magnitude less than the noise added on the eigenvalues. Importantly, even when we artificially add arbitrary many-body Pauli errors with comparable error probabilities, we still recover these strongly correlated errors with high relative precision.
Our results suggest that practical characterization of all Pauli error rates with probabilities greater than 10 −4 or 10 −5 in a quantum device with 10-20 qubits can be achieved with around 10 6 or 10 7 experimental measurements. In such quantum devices having sub-microsecond gate times, this puts practical noise characterization within reach on a time scale of hours, not days or weeks.
Finally, we have written open source code, available on GitHub [42], which reproduces all the figures in this manuscript and contains other examples which explain how to use the algorithms in real experiments.
The remainder of this paper is organized as follows. We provide some notation and background in §II followed by an intuitive overview of our recovery algorithm in §III.
We state our precise recovery guarantees in §IV. We describe the circuits we use for practical eigenvalue estimation and provide details of our validation results in § §V and VI. We have deferred the precise definition of the algorithm until §VII and the proofs until § §VIII and IX. We conclude in §X.

II. NOTATION AND BACKGROUND
Given a set of n qubits with Hilbert space dimension 2 n , we can introduce the following notation. Let P n denote the group of Pauli operators on all n qubits and P n = P n / i be the Paulis modulo phase. There is a natural isomorphism between multiplication on P n and bit-wise addition of 2n-bit strings F 2n 2 given by a ∈ F 2n 2 , a ←→ P a = P axaz = i ax·az X[a x ]Z[a z ], (1) where a x , a z ∈ F n 2 and X and Z are the standard singlequbit Pauli matrices, and P a ∈ P n is understood to be a canonical coset representative. Here X[a x ] = X ax 1 ⊗. . .⊗ X ax n , and similar for Z[a z ]. Using this isomorphism, we can directly use a ∈ F 2n 2 to denote the Pauli matrix P a . For any two Pauli matrices P a and P b , we have P a P b = (−1) a,b P b P a where the symplectic inner product is symmetric and bilinear. We define a stabilizer group S to be a linear subspace of F 2n 2 such that for all a, b ∈ S, a, b = 0. Thus a stabilizer group forms a commuting subgroup of the full Pauli group by the mapping in (1).
An n-qubit Pauli channel E acting on a quantum state ρ is of the form where p j is the error rate associated with the Pauli operator P j . The Pauli error rates p j form a probability distribution over all N = 4 n elements of the n-qubit Pauli group modulo phases. These are closely related to, but distinct from, the Pauli channel eigenvalues, which are defined as λ j = 1 2 n Tr P j E(P j ) .
Because it will be clear from context, we will often refer to these simply as the "error rates" and the "eigenvalues". Thus, when a state ρ is subjected to the noisy channel E, the error rate p j describes the probability of a multiqubit Pauli error P j affecting the system. In contrast, the eigenvalues describe how faithfully a given multi-spin Pauli operator is transmitted through the channel. The error rates p j and eigenvalues λ j are related by a Walsh-Hadamard transform (WHT  1. (a) This figure shows the ability of the reconstruction algorithms to recover sparse Pauli error rates from experimental data. The "true" error rates (gray line) were constructed using data from a 14-qubit experiment [38] as described in the main text ( §V). Dots indicate recovered Pauli error rates using our algorithm with artificially added normally distributed noise on top of the true error rates to simulate finite sampling and other noise sources. The reconstruction used two experimental designs using a number of randomized benchmarking style experiments: Type I used 58 and Type II used 365 such experiments, each with the same number of samples per experiment. The Type II experimental design runs more experiments and therefore takes more data overall, but allows recovery of an increasing number of Paulis while keeping constant the number of measurements per experiment. (b) shows the recovery of 1000 different error rates as low as 10 −7 with high relative precision, when the experimental noise varies between ξ = 10 −3 -10 −5 . Notably, the error rates are recovered with a precision almost two orders of magnitude below the standard deviation ξ of the noise added to the signal. (c) a more detailed look at the recovery in the regime between 10 −6 and 10 −7 with noise levels of ξ = 10 −5 . (d) Violin plot of the total variational (1-norm) distance between the original probability distribution (p) and the reconstructed probability distribution (p), being 1 2 p −p 1. The charts show the spread of recovery error over 200 different randomly generated samples of noise. As can be seen the entire probability distributions are consistently recovered to high precision. (e) shows a separate experiment where 4 distinct uniformly random many-body Paulis were added to the oracle with error rates chosen randomly from a normal distribution N (0.005, 0.001). The algorithm was run with this additional signal to test if it can recover these Paulis as well. In all cases the planted Paulis were recovered with small relative error, as shown. and the orthogonality relations of the Pauli group, we can compute the Walsh-Hadamard transform coefficients: The symmetrical nature of the Walsh-Hadamard transform means we also have the inverse relation: Note that our WHT is ordered by Pauli commutation relations-see Appendix A for a further discussion of this subtlety. Finally, for any natural number N , we then write [N ] to mean {0, . . . , N − 1}.
In an analogy with discrete Fourier transforms, the error rates can be thought of as the frequency domain components of the time domain signal, which in this case is the eigenvalues. Our goal is to sparsely sample the dense time domain signal (the eigenvalues) and reconstruct the entire (but sparse) frequency domain (the error rates). The theory of compressed sensing allows us to do this in principle with very few measurements, namely O(s log N ). However the standard reconstruction methods that use convex optimization require poly(N ) classical computation, which is too expensive since N = 4 n . Therefore, unlike in compressed sensing, we must reconstruct the sparse frequency domain signal using only poly(s, log N ) resources for our algorithm to be considered efficient, which is indeed what we achieve here.
Throughout this paper, we will restrict to a sparsity regime with only s 4 n/2 nonzero error rates, each having probabilities greater than a specified cutoff 0 . (In our proofs, we assume that any error rate less than 0 is identically zero, although the heuristic algorithm is more forgiving.) This is what we mean when we refer to an s-sparse model. This allows our algorithm to perform in the regime where s is exponential in n. When such an exponential scaling holds, it makes our algorithm inefficient in n, but this is also a relevant regime if we wish to estimate Pauli channels with an extensive entropy. Distributions with extensive entropy will generally require an exponential number of error rates to estimate them with arbitrary accuracy.
Our recovery methodology builds on one of the main results of Ref. [37] and an adaptation of the classical algorithms described in Refs. [43,44]. In Ref. [37], the authors show how to recover all N = 4 n Pauli channel eigenvalues to relative precision using O −2 n2 n measurements. The circuit modifications we require on top of that algorithm are shown in Figure 3. The recovery of all N eigenvalues would require 2 n + 1 applications of depth O(m + n 2 / log n) Clifford circuits, or 3 n applications of depth O(m) Clifford circuits. Here m is a constant that depends on the channel being estimated and is O(1/∆) with ∆ = 1 − max j =0 λ j being the spectral gap of the channel. We can consider m = O(1), although the implied constant might be large for high-fidelity quantum channels. While the depth of this algorithm is efficient, the number of distinct circuits required is clearly not scalable in n. A single individual eigenvalue can still be learned to relative precision using only O −2 measurements, however. It is the need to sweep through 2 n + 1 (or more) sets that leads to the factor of O n2 n in the sample complexity.
In Ref. [37], the authors also derived what is essentially a variant of the Kushilevitz-Mansour Algorithm [45] for learning decision trees via the Fourier spectrum and applied it to the case of Pauli channels. The basic idea is to do a binary search through the marginals for Pauli errors rates that have large probability mass. This algorithm is theoretically efficient in s and n, however our numerical experiments using this algorithm suggest that the number of eigenvalues required per recovered error rate will make that algorithm difficult to use in practice, at least in its current instantiation and in the relevant regime for quantum computing applications.

III. ALGORITHM OVERVIEW
The problem of reconstructing a sparse set of Pauli error rates by measuring few eigenvalues is closely related to a classical problem of computing a sparse Walsh-Hadamard transform. This problem was studied by Scheibler et al. [43] and later (in the regime of noisy signals) by Li et al. [44] by decoding a signal x ∈ R N which contains 2 n points indexed by j ∈ F n 2 . In our circumstances we are not analyzing the frequency domain of a signal, but rather the global probability distribution of the Pauli error rates in a quantum device and the eigenvalue distribution of the Paulis in a super-operator representation of a Pauli noise channel, so this formalism requires some adaptation.
Given the WHT mapping in equations 5 and 6 the algorithms presented in [43] and [44] are broadly applicable, but require some modifications. We will note where adjustments have to be made. One major difference is our inability to simultaneously measure noncommuting Pauli operators. Below we give a broad overview of the reconstruction algorithms as applicable to our needs. A complete and rigorous analysis can be found in §VIII, but the main recovery guarantee is stated below in Theorem 1. We first deal with the noiseless case.
The main idea behind the algorithm is to note that each Pauli eigenvalue is made up of a linear combination of all the Pauli error rates. By subsampling the eigenvalues, we are able to split up the Pauli error rates, figuratively creating 'bins' of error rates, where each bin contains a linear combination of a smaller number of error rates. Provided that there are sufficient bins, then in the sparse regime most of these bins will only contain a few Pauli error rates with weight ≥ 0 . Using aliasing, we can identify these bins and can therefore evaluate these error rates. This information will allow us to reconstruct all the sparse error rates. With this in mind, the reconstruction algorithm can be broken down into three main steps: 1. Determine the subsampling bins and perform the experiments to measure the required eigenvalues.
2. Calculate and measure the aliased bins to enable identification of single Pauli-bins (single-tons) and the Pauli error rates that occupy them.
3. Run a decoder to 'peel back ' single-tons, converting multi-Pauli bins to single-Pauli bins and repeat until all error rates are identified.
We describe these three steps in an intuitive manner below and relegate the analysis and proofs to §VII.
Step 1-Subsampling. The intuition behind the first step is that it is possible to sample a specific pattern of eigenvalues that will allow the reconstruction of the global probability vector, but where various probabilities are binned (i.e. added together). For instance, given a global probability vector with N = 4 n values it is possible to rewrite this as a "reduced" vector (p) with B = 2 b values, each value being composed of the summation of N/B of the original global probability values (possibly with signs). In the regime where our sparsity is s < 2 n then we will show that with appropriate random sampling a large number of these reduced vector values (which we will call bins), will be composed of none or one of our sparse Pauli errors, i.e. those with a weight ≥ 0 for a parameter 0 to be chosen later. In what follows we will always choose B = 2 n , but we will occasionally use the notation B = 2 b (so that b = n) to illustrate where a given numerical factor originates from.
Whereas Ref. [43] imagined using specific bit patterns of binary strings to index the requisite eigenvalues to sample, we wish to exploit the ability of a quantum device with independent measurement on each qubit to sample from a bit string of 2 n values. As previously discussed, the protocol in [37] shows how to measure, to multiplicative precision, the Pauli eigenvalues of 2 n commuting Paulis using one randomized-benchmarking style experiment with n-bit spin measurements at the output. The constraint that the Paulis measured be mutually commuting is exactly the constraint we require for the subsampling to allow us to create the required reduced probability vectorp.
Suppose we have a specific stabilizer group S. We will postpone how to choose this group until later. We can represent the entire stabilizer group by an n × 2n binary matrix S whose j th row is the stabilizer generator s j . Now let v ∈ F n 2 label the elements of the chosen stabilizer group, for example via the mapping v.S, where v is thought of as a row vector. Our reduced probability vectorp then consists of B bins each containing a sum of N/B distinct Pauli errors. It is labeled by a string j ∈ F b 2 and is given by The effect of this is that the sampled Pauli eigenvalues from the stabilizer group, when transformed by the Walsh-Hadamard transform, give us B bins each containing a sum of 2 n = N/B error rates, many of which will be zero in general. The binning is chosen in such a way that with high probability there will be a large number of bins that only contain a single Pauli error rate with a weight ≥ 0 (the other Pauli errors allocated to that bin being, effectively, zero). This will depend on the size of the bins and the sparsity of the Pauli error rates, and is discussed further in §VII. An simple example of the subsampling and binning idea is shown in Figure 2.
So how do we construct our stabilizer group? The most obvious way is to sample a random n-qubit Clifford (see [46,47] for how to do this). However as n grows past a few qubits, then on current devices the number of single and multi-qubit gates required to construct a generic element of the Clifford group requires circuits of depth O(n 2 / log n), and if these circuits are noisy then this will wash out the signal required to estimate the eigenvalues. A better way for current devices is to use a random subset of n-qubit stabilizers that can be formed from a single round of non-overlapping 2-qubit Clifford gates. This has the added advantage of making it trivial to work out how to perform step 2.
Step 2-Aliasing. The question then becomes: how do we detect which bins contain a single Pauli error rate? To do this the reconstruction algorithm uses the shift/modulation property of the WHT. Specifically, if we let {p k } be the WHT of {λ m } we have: By taking each element of the stabilizer group and offsetting the sample with a shifting bit pattern (e.g. for four qubits the sample would be offset by the five following bit patterns [0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]) then the Pauli error rates consigned to that bin are no longer merely summed but rather are added or subtracted depending on whether the inner product of their 'bit-strings' and the relevant pattern is zero or one. This result will be illustrated in more detail in Algorithm 2 and lemma 1, where we also discuss how to use bit-flip error detection codes to make the decoding more robust to noise. This leads to a number of remarkable effects. If the bin is empty (i.e. contains no Pauli error rates with non-zero errors) each of the offset bins (i.e. for a particular j each p j,d , d ∈ {2 0 . . . 2 2n }) will also be zero. If the bin contains only one non-zero Pauli error rate then the magnitude of the sum of each of the offset bins will be constant, and the sign of the sums will identify exactly which Pauli has the non-zero error rate. (For example using the four qubit offsets shown above, if the absolute values of the bins were all 0.001 and the signs of the 4 offset bins were (+, −, −, +), this could only be caused by a single Pauli error rate of 0.001, with a bit string of 0110). In every other case, the bin contains multiple Pauli error rates (a multi-ton bin), which leads us to the PEELING decoder (see step 3).
So how can we construct the experiments that will allow us to extract the 'shifted' eigenvalues? For instance, one might note that for any particular stabilizer group S, the offset bit pattern applied to each of the elements of the group are unlikely to form a stabilizer group.
It transpires that where we use a stabilizer group created by local two-qubit Cliffords (on each qubit pair), we can do this simply by iterating each distinct qubit , separating them into bins consisting of singletons, multi-tons and zero-tons, as described in the text. In this example it can be seen that the IY Pauli exists as a single-ton in group 2, allowing its value to be recovered. It can then be 'peeled' from the third bin in group 1 converting that bin from a multi-ton to a single-ton containing just Pauli XY . This then allows Pauli XY to be recovered as well, which would not otherwise be possible, as the signal would conflate with that of IY from group 1 alone. Iterative peeling in this fashion will eventually recover all of the nonzero Pauli error rates. On the right side of the figure we illustrate how the sub-sample groups are formed. The sparse Pauli errors are transformed into the dense eigenvalues by the WHT transform. Here we have only added the lines relating to the nonzero Paulis, with blue a positive and red a negative contribution to the relevant eigenvalue. The sampled eigenvalues are transformed to define the subsamping groups.
pair through four further (different) two-qubit stabilizer patterns. There are five two-qubit stabilizer groups, the union of whose bases form a complete set of mutually unbiased bases; let us label them S ⊗2 1...5 . We set out a specific choice of these groups in detail, together with the two-qubit circuit needed to create them, in Figure 3. The initial stabilizer is chosen by selecting randomly from S ⊗2 1,2 for each qubit pair. This becomes the stabilizer for the purpose of Step 1. This circuit is used to conduct the first experiment and extract 2 n Pauli eigenvalues. The offset pattern required for this in Step 2 is constructed by iterating over each qubit pair, and replacing the circuit chosen in Step 1 with one of the other 5 (for a total of 4 further experiments per qubit pair). The total number of experiments required is therefore 2n + 1. By analyzing each of the experiments formed we will be able to pull out of the all of the eigenvalues determined by such experiments. Figure 3 shows the circuits used for each experiment and illustrates the method described above.
Step 3-Peeling. If we use a variety of subsampling matrices (that is we repeat Steps 1 and 2 for more than one random initial choice of Cliffords) we are now in the position where we have identified a number of Pauli error rates (from bins that contain only one Pauli error rate) and we will also have a number of bins that contain more than one Pauli error rate (multi-ton bins). In general, for any two stabilizer groups, different Pauli error rates will get hashed into different bins. Where we have identified a single Pauli error rate under, say, stabilizer group 1, that same error rate may be in a different bin under stabilizer group 2, a bin it shares with one or more different high weight Paulis (i.e. it may be in a multi-ton bin under stabilizer group 2). However, because we know the value of this Pauli error rate (since it was a singleton under stabilizer group 1), we can remove it from the bin created by stabilizer group 2 by simple subtraction. After this removal, some bins that were previously multiton bins will now become singletons, or at the very least they will be closer to being singleton in that we are left with a bin that now has one fewer Pauli error rate in it. This removal of the value of a previously identified singleton from a different stabilizer group's bin is known as 'peeling back' the known values, giving the PEELING decoder its name. The goal is that when we peel back our identified error rates, we create more and more bins that now contain only one Pauli error rate. This can be applied in an iterative fashion. We can then iterate this until we have either identified all the Pauli error rates (all the bins are empty) or until we have no further single Pauli error rates to peel back. All of these steps can be viewed in Algorithm 3. In the latter case the reconstruction algorithm has failed, although we will at least know the magnitude of the error rates we have failed to identify, and can perform additional experiments to try to learn them.

A. Dealing with noise
Using the ideas in Ref. [44], we can modify the reconstruction algorithm to handle noise of the form where w is a Gaussian distributed noise vector, w ∼ N (0, ξ 2 1). It is only for simplicity in the proof that we consider the isotropic case, and small dependencies and correlations do not substantially affect the observed numerical performance. In our case, the noise arises as the estimation error in our eigenvalues caused by finite sampling. These finite sampling errors occur because of the limited number of random sequences and measurement shots per sequence occurring when the eigenvalue estimation experiments are carried out. Errors of this nature have been analyzed and in Ref. [37]. To reduce noise, the number of sequences and shots per experiment needs to be increased, and this sample complexity was also bounded in Ref. [37]. In the relevant regime of high precision, the estimation error on the eigenvalues will be approximately normally distributed, and empirical estimates of the variance an covariance can be determined by bootstrapping from the observed measurement outcomes [38].
The PEELING decoder only requires two adjustments to account for such noise: the zero Pauli verification and the single Pauli search protocols.
For the former, in the noiseless model we identify a bin as being empty if the value of the bin (and each of the offset bins) is zero. Where we have noise, we simply relax the requirement that the bins are exactly equal to zero before identifying them as empty. We can bound an acceptable small value as indicating an empty bin, given the number of 'noisy' zeros in the bin and our estimate of the noise variance. This will lead to a noise floor of Pauli error weights we can recover. That is, we are unlikely to recover those Pauli errors with a value so small they are swamped by the noise in the bins. This is an inevitable consequence of the noise.
The latter case of single Pauli identification has two aspects that need to be considered. The first is 'does the bin contain only a single Pauli?', and the second is 'if so: which Pauli?'. For a noisy version the first ques-tion is dealt with the same way as the noisy zero, i.e. we only require the magnitudes of the offset bins to match to within some estimated noise window. While this runs the risk of not noticing some small Pauli error rates that are also in the bin, it appears to work well in practice. The second is more akin to a noisy bit flip channel, in that the noise may cause us to incorrectly identify a '1' as a zero or vice-versa. (This is more likely when the noise is commensurate with or greater than the Pauli error weight.) One simple method of dealing with this is to repeat sample with different offsets, and then take a majority vote, however our numerical simulations do not suggest that this is necessary. Finally we can use a number of random offsets and some additional fixed offsets chosen in such a way they form a classical error correction code to further protect the algorithm from noise. When an appropriate classical code is chosen this does not alter the sample complexity scaling, though it does increase slightly the number of experiments. It also comes with a robust recovery guarantee as described in the next section and §VII.

IV. RECOVERY GUARANTEE FROM NOISY EIGENVALUES
Using the algorithm illustrated above and leveraging some proofs contained in [44], we can construct the following recovery guarantee that relates our ability to recover Pauli error rates with bounded error to the noise in the estimated Pauli eigenvalues. The intuition behind the guarantee is that by increasing the number of offset observations we can reduce the chance of incorrectly detecting whether the bin occupancy is zero, or one, or more than one. If the bin detection succeeds, then the peeling step will succeed with high probability for appropriate choices of the subsampling and aliasing designs.
Our recovery guarantee does however rely on several assumptions, which we now state explicitly. Assumptions 1. Let p ∈ R N be the target Pauli error rates with support K = supp(p) and sparsity s = |K|.
A1 (Random sparse support.) The support set K is chosen uniformly at random from all subsets of [N ] of size exactly s, where s = 4 δn is sub-linear in the dimension N = 4 n for some 0 < δ < 1/2.
A2 (Independent Gaussian noise.) Each queried Pauli eigenvalue λ j has noise given by independent Gaussian noise centered around the eigenvalue with variance ξ 2 .
A3 (Good signal-to-noise.) Each error rate p m for m ∈ K is lower bounded by p m ≥ 0 for some 0 > 0, the eigenvalue noise variance is upper bounded as ξ 2 ≤ min( B s 2 , 1), and the two are Here B is the number of bins in a single subsampling group.
Our main theorem is then the following. Theorem 1. Suppose the Assumptions 1 hold for an unknown Pauli channel with eigenvalues λ and error rates p. Then with failure probability P F ≤ e −O(n) , Algorithms 2, 3, and 4 estimate the s-sparse Pauli error rates p such that p − p ∞ ≤ 2ξ/ √ B using O sn eigenvalue queries and O(sn 2 )-time classical computation.
Proof. The proof is given in §VIII.
Note that our main theorem references a noisy eigenvalue oracle rather than a direct sample complexity for estimating the eigenvalues. From [37], O(sn) queries to the eigenvalue oracle can be approximated to within variance ξ 2 using only O n 2 ξ 2 samples. While a variant of the protocol in [37] can make the noise independent, it will not be exactly isotropic Gaussian noise, so we can only informally claim this as the sample complexity. This is why we state the formal main result in terms of query complexity.
It is worth remarking on the strength of the assumptions that go into the statement of the theorem. Assumption A1 is mathematically convenient, but is certainly too strong physically since most errors in near-term quantum devices are likely to have low weight. This could in principle be compensated by incorporating a randomizing permutation into the experimental design. However, our experiments (see the next section) do not seem to require such a compensation for convergence. Assumption A2 is again mathematically convenient, and it will only ever be approximately true in practice. We believe that other error models with weak correlations and bounded variance will have similar guarantees, but an analysis of this would introduce significant complications without elucidating anything about the algorithm. Weakening A2 in this way would be interesting future work, as it would let us make direct formal statements about the sample complexity. Finally, a signal-to-noise assumption along the lines of Assumption A3 seems to be a mathematical necessity for convergence. However, it may be possible that a guarantee could still be proven with a smaller signalto-noise ratio or with weaker restrictions on 0 and ξ.

V. EXPERIMENTAL VALIDATION
To validate our algorithm we use data extracted from a 14-qubit superconducting device build by IBM. In Ref. [38] the complete distribution of locally averaged Pauli error rates in the device was estimated. In this work, we recycle the data from that experiment to validate our new algorithms.
The data set from Ref. [38] consists of 2 14 locally averaged eigenvalue estimates, meaning that each eigenvalue is labeled by a 14-bit string that labels the presence or absence of a nontrivial Pauli on each corresponding qubit. This is in contrast to the full eigenvalues, each of which would require a 28-bit label and could additionally resolve the entire set of 2 28 Pauli eigenvalues, without local averaging. Although we could run our algorithm on the 2 14 locally averaged eigenvalues, to make it more challenging we have looked at random self-consistent extrapolations of the data onto the full set of 2 28 ≈ 2.7 × 10 8 eigenvalues.
The random interpolation proceeds as follows. From the estimated eigenvalues of Ref. [38], we reconstruct the locally averaged error rates. (In fact, this step was already done in [38].) For each locally averaged error rate, we pick a uniformly random point in the probability simplex of the Paulis supporting the local average. This defines a new probability distribution on the full set of 2 28 Paulis. Every such extrapolation has the property that locally averaging it will return the original experimentally observed data. We construct a "true" set of known Pauli channel eigenvalues by transforming (using the Walsh-Hadamard transform) on these extrapolated error rates. This gives us a family of experimentally derived eigenvalue oracles that we can use to validate our numerical reconstructions.
The data from Ref. [38] have a 'no-error' probability of about 0.86, and upon extrapolation they have approximately 200 Paulis with an error rate above 10 −5 , about 600 above 10 −6 , and about 2,000 above 10 −8 . Although the original estimation cannot resolve error rates as small as 10 −8 with meaningful error bars, our eigenvalue oracle still has access to these numbers as part of the simulation. For this discussion, we will focus on reconstructing errors in the regime above 10 −5 , as these are the most relevant. This corresponds to a sparsity s = 4 δn with roughly δ ≈ 1 4 . As can be seen from Figure 1, the sparse recovery protocol performs well in this regime (δ 0.25), requiring only a fraction of the eigenvalues that would be required for a full recovery of all Pauli error rates. The limiting factor in this regime is the noise in the oracle, which equates directly to the number of measurements and sequences sampled as part of the original experiment (see [37] for relevant reconstruction guarantees). It appears that the effect of the protocol is to allow recovery of the Pauli error rates to (approximately) an order of magnitude or more less than the noise in the oracle.
Importantly, if a device has unexpected many-body correlations (for example through unexpected qubit interactions or crosstalk), then we should also be able to find these errors whenever their probability is above our noise floor. We have validated this feature of the algorithms as well by injecting known high-weight Pauli errors into the oracle. Our algorithm reveals and evaluates such Pauli error rates to a high degree of relative accuracy, as shown in Figure 1(e).
Section V A discusses the regime where (δ ≥ 0.25). In that case continued recovery of Paulis with low error rates requires some changes to the local stabilizer groups used (or a switch to global random stabilizer groups).
A. Experiments in the regime 1 4 < δ < 1 2 While the experimental protocol presented above is likely to be all that is required in most practical regimes, if the number of Paulis to be recovered is large then a slight modification might be needed. Unlike the situation where one is using completely random stabilizer groups, the local stabilizer protocol can fail when trying to reconstruct many low-error Paulis that differ only in one or two Paulis, in such a way that they cannot be separated by the local stabilizers. This might occur, for instance, in the regime where δ > 0. 25. In such circumstances, one can cycle each distinct set of two qubit pairs through the 5 stabilizer groups identified in Figure 3(c), and then generate the offset bins for each of them. The number of experiments that need to be performed are the original experiment (1), then a further 4 for each qubit pair (4), times the number of qubit pairs (n/2), times the number of experiments needed to generate the offsets on the remaining n − 2 qubits (4(n − 2)), for a total of 1+8n(n−2) = O(n 2 ) total experiments. The eigenvalues gathered this way allow the creation of n/2 properly offset subsampling matrices of 2 n+2 bins each containing 2 n−2 Paulis. Empirically, this appears to be sufficient to exactly recreate the global probabilities up to δ = 0.5. Figure 1(b) illustrates the extra recovery power available in the highest precision regime.

VI. HEURISTIC NOISE RECONSTRUCTION
Here we describe in more detail the intuition behind the algorithm, the experiments prescribed and a simplified, practical extraction algorithm. Our GitHub repository [42] contains code and examples showing how the algorithms can be used to recreate the figures in this paper.

A. Determining a suitable number of subsampling groups
Our proofs relating to the recovery of s-sparse Pauli errors require an assumption that each element in the support set K is chosen independently and uniformly at random from [N ]. At first glance it may appear that this is not likely to be the case in a quantum device as the Pauli errors are likely to cluster around low-weight Pauli errors rather than be uniformly distributed over the 4 n different possible Pauli errors. However where we choose random n-qubit stabilizers (global stabilizer groups) as the basis for sampling the Pauli eigenvalues, this effectively randomizes the bin into which we consign any specific Pauli error rate, which (empirically) allows us to satisfy the uniformly random distribution requirement.
Given this we can continue to use the "balls-and-bins" model utilized in [44,Appendix B]. We can use this insight, together with our ability to simultaneously sample 2 n commuting eigenvalues, to determine practical values for the number of subsampling groups C given our bin size of B = 2 n .
Since the sparsity s N , we have that the expected number of Paulis (balls) in one bin will be s N × N B = s B , which in the sparsity regime of interest will be < 2. Assuming we have an s-sparse distribution, with B = 2 n being the number of bins sampled, we define the sparsity coefficient η = B s , which is at least 1. This means that we only require C, the number subsampling groups, to be 2 in order to recover all of the edges in O(s) iterations with probability at least 1 − O(1/s) (see [44,Appendix B]). It can then be seen that as each experimental run recovers 2 n eigenvalues, we need to perform at least one experimental run for each bin plus one for each of the offsets of the bin times the number of subsampling groups. This means that the minimum number of experimental runs is 2(2n + 1). As we discuss later, by increasing the number of offsets we can increase the recovery guarantees, but at the cost of sampling more eigenvalues (although still only scaling proportional to n).
In the case where our device is too large for η ≥ 1, for instance where we have had to marginalize over the measurements as log 2 (B) ≥ 30, then we can increase our effective C by marginalizing over randomly chosen qubits and creating our subsampling matrices from such randomly chosen sub-samples of the measurement outcomes. This will allow us to retain the recovery guarantees without increasing the number of experiments on the device, although this incurs an increased computational cost in setting up and performing the peeling decoder.

B. Heuristic algorithm using local circuits
Our numerical simulations based on the data collected in the experiments from Ref. [38] indicate that randomly chosen global stabilizers are not in fact necessary to distribute the Pauli errors widely enough to allow recovery. It appears that local stabilizer groups suffice. This allows us to dramatically reduce circuit complexity while keeping the number of experiments required to a minimum. Figure 3(c) details the local two-qubit stabilizer groups that can be selected to perform an extraction experiment that is viable on most current devices. In Figure 3(d) we show the local Clifford circuits that can be used (with their corresponding inverse) at the beginning (end) of the measurement circuits to create these local stabilizer groups. In Ref. [37] it was shown that using such circuits we can estimate 2 n Pauli eigenvalues with relative precision , using O( −2 n) measurements.
Having chosen the series of stabilizers to measure, together with the circuits for offsets, we will have all the relevant eigenvalues required to use the noisy peeling decoder.
In Algorithm 1 we show how to operate the decoder on a practical level, assuming that there are a large number of Paulis sitting below the level of interest (i.e. with an   Two qubit Cliffords  3. (a) shows the type of circuit described in [37] that allows the recovery of 2 n eigenvalues of the averaged noise channel of the device. The random Pauli gates (blue) are used to twirl the channel and by averaging over a number of random choices of Pauli, the noise channel in the device is transformed into a Pauli channel. In a similar way to randomized benchmarking, by repeating the twirl for a certain number of Pauli gates (m) and then returning the system into the computation basis by choosing the Pauli inverting the twirl, a decay curve will be induced and this can then be fit to determine the eigenvalues independently of state preparation and measurement errors. The single n-qubit Clifford (and its inverse at the end) determines which of the 4 n Paulis are sampled and an appropriate Clifford can be used to select any n-qubit stabilizer set. (b) shows a further modification of the circuit, where instead of using a generic n-qubit Clifford only two-qubit Cliffords are used. (Where the device has an odd-number of qubits a single Clifford can be used on one of the qubits.) As discussed in the text, for each chosen value of m the circuit is repeated for multiple sequences with different randomly chosen Paulis, but for fixed Cliffords. Collectively each of the runs for multiple choices of Paulis carried out over several different lengths of m are defined as an experiment. (c) shows how once an experiment has been chosen in Step 1 of the procedure, further experiments are created in order to determine the offsets required to identify the Pauli (see text). As shown in Step 2, each of the two qubit Cliffords needs to be cycled sequentially through the four other 2-qubit stabilizer groups (i.e. the four that are different from the initial choice). This means that for each sequence in Step 1, a further 2n experiments need to be performed, leading to 2n + 1 experiments per chosen stabilizer group. For the second group an offset of one qubit should be chosen, meaning the local stabilizer groups now span different qubit pairs. Simple Pauli twirls can be carried out on any odd or isolated qubits. (d) Some example sub-circuits required to perform the transform into the local stabilizer group listed in (c)-Step 2. The inverse gate will be of a similar form.
error rate 0 ). To understand how it works one should note that there are two main components to dealing with the noise. The first is when deciding if the bin is zero, i.e. when the only values in the bin and its offsets are noise. We initially start willing to assume this is the case and slowly become less willing (by δ z ) to accept that the bin is really zero as we start to try and recover smaller and smaller error rates. This means that initially the decoder will concentrate only on bins that have relatively large Pauli errors in them and will be less likely to mistake noise as indicative of an error.
For instance, assuming a reconstruction error normally distributed with a standard deviation of 0.01, then if a bin contained 2 14 0-error Paulis with such noise, we would expect the mean of such a bin to be centered around 0, with a standard deviation of (0.01 2 /2 14 ) ≈ 7.8 × 10 −5 . Therefore allowing 3 standard deviations we would expect the square of the noise in the bin to be less that ≈ 5.5 × 10 −8 . By ignoring bins with a squared value less than 5.5×10 −8 and slowly decreasing this number to, say, 5.5 × 10 −10 one can ensure that higher error rate Paulis are first recovered, before exploring possibly empty bins for low error rate Paulis.
The second component is the willingness to accept that there is only one value in the bin offsets. This time we start with a strict check, and we only accept a Pauli if the noise is below a threshold, then slowly relax this (by δ s ) as we aim to recover Paulis that happen to have increasing amounts of noise in the bins with them.

Algorithm 1 Noisy Peeling Decoder
Require: M ← Paulis in groups (C sets), Populate P with singletons. 10: for = 1, 2, ..., arbitrary do 11: for c=1, . . . , C do 12: forp ∈ [pj,c,p j,c,b ], j ∈ F n 2 , b ∈ 2 [2n] do 13: if not IsCloseToZero(p, Zs) then 14: if IsSingleton(p, Ss) then 15: (P,E) ← SingletonPauliAndSize(p) 16 In this section, we describe in detail a hashing-based subsampling recovery algorithm for which we prove a recovery guarantee. For convenience, we have collected the notation used in the provable recovery algorithm into a glossary of symbols in Table I  Consider a Walsh-Hadamard transformation among n-qubit Pauli eigenvalues and Pauli error rates like (5). In order to recover a set of sparse error rates with noisy eigenvalues, it is necessary to consider a noisy variation Note that we employ w k to indicate the sampling errors of the eigenvalues, and for simplicity we are assuming that they are all independent Gaussian random variables with distribution N (0, ξ 2 ). The proposed algorithm follows the SPRIGHT framework of Ref. [44]. It first samples Pauli eigenvalues and forms several groups of bins. The algorithm then implements the Peeling Decoder algorithm 3 to recover individual Pauli error rates from the sampled bins.
To subsample bins from noisy eigenvalues, this algorithm employs C subsampling groups. Each subsampling group is specified by a binary matrix M c ∈ F 2n×b 2 for c ∈ [C] and a set of P offsets. The binary matrices M c serve as hash functions to isolate individual Pauli error rates into B bins with high probability. In our experimental setting, this B is always chosen as B = 2 n for convenience, while in the following proof it's sufficient for B to be as large as s, and increasing B exponentially to s is enough to find out the magnitude of s. Therefore in the proof, we use the general case that B = O(s). The P offsets provide redundancy designed to make the recovery algorithm robust to limited amounts of sampling noise.
Before constructing explicit algorithms, we shall introduce a method for choosing offsets by using good error correcting codes [44].
Here P is an upper bound on the probability that the sample error will change the sign of a single-ton bin (i.e., a bin with a single nonzero Pauli error rate).
It will be convenient in what follows to define a 2n×2n matrix J n given by where 0 n is the n × n zero matrix and I n is the n × n identity matrix. This is the symplectic form that controls the commutation relations in the Pauli group. That is, if p, q ∈ F 2n 2 , then where the arithmetic is implicitly modulo 2. k ← M c + dc;t

11:
Return Uc;t[j] 12: end for We now introduce Algorithm 2 to use for data preprocessing and bin construction. The indices on each array are considered to be modulo their respective dimension, and each element of the summation M c + d c;t is calculated in the field F 2 . The algorithm calculates bin coefficients using the corresponding binary matrices and by taking sums over the whole space F b 2 . After this subsampling process, each subsampling group contains P sets of B = 2 b bins, where b is a free parameter. The result of applying Algorithm 2 is summarized in the following lemma.
Moreover the sample error is as follows where W m is the noise of the Pauli error rate p m .
We remark that the noise W m on the error rate p m is induced by the Gaussian noise {w k } on the noisy eigenvalues {λ k } by the WHT. It is important to keep these two noise sources separate, although we will not make much direct use of w k in the remainder of the paper.
Proof. Denote p m + W m byp m , so the noisy Pauli eigenvalues can be transformed to From Algorithm 2, a specific bin U c;t [j] for some c ∈ [C], t ∈ [P ], and j ∈ F 2n 2 is constructed as follows, A key observation is where the first equation is from the definition of the symplectic inner product in §II, and the second equation comes from Modify part in Algorithm 2, and the third is due to the following property of J J n · J n = I 2n ∀ n ∈ N.
Thus the bin can be simplified as follows, where W c,t [j] is defined in the lemma.
We note that from Line 10 in algorithm 2, the fact that the original noise w is isotropic, and the fact that the -bit WHT is proportional to an orthogonal transformation, it follows that the noise in each bin W c [j] remains Gaussian distributed, but according to the distribution N (0, ν 2 1) where ν 2 = ξ 2 B . Moreover, we can combine each U c;t [j] for different t ∈ [P ], and get a vector  Proof. This is a variation of lemma 1.
After subsampling and calculating bins, it's straightforward to design a protocol to extract information from these bins. The idea is to construct a bipartite graph G, as in Figure 2, with s left nodes representing nonzero Pauli error rates and BC right nodes representing bin vectors U c [j].
We draw an edge from each left node (a nonzero Pauli error rate) to every right node that contains that Pauli. Each Pauli error rate will occur exactly once in each subsampling group, the degree of the left nodes is therefore C. We can use the resulting degrees of the right hand nodes to partition them into three types. We call a bin with exactly one nonzero Pauli error rate a single-ton, and similarly there are zero-ton and multi-ton bins that contain exactly zero or contain more than one Pauli error rate respectively. (Recall that this graph is depicted in Figure 2.) Shortly we will describe in detail a method to detect which type of bin a particular node has been partitioned into. After invoking such a bin detector, the peeling decoder can be designed to peel out the detected single-ton Pauli error rates by subtracting them from every multi-ton bin in which they appear, removing the associated edge from the graph. This will reduce the degree of that right-hand node, potentially turning it from a multi-ton bin into a single-ton bin. For the range of parameters that we have chosen and the assumptions outlined above, iterating this decoder to discover new single-tons and reduce multi-tons will converge to reduce the graph to only zero-ton and single-ton bins with high probability.
In the Algorithm 3, we apply an array T that indicates the variance of the propagated noise part, W c;t [j], in each bin. These numbers help track the propagation of error in the bin detector from the calculation in Line 12 of Algorithm 3. The equation in Line 11 of that algorithm describes how to update T. Lemma 7 below shows the need and utility of this parameter.
One subtlety to applying the peeling decoder to this graph is that the graph might have cycles. Peeling on a graph with cycles will in general lead to dependencies in the random variables, which complicates the analysis. However, as we show below in Lemma 5, large local neighborhoods of the peeling graph look locally tree-like with high probability, therefore we can peel for a large number of steps before encountering a cycle. With the correct choice of parameters, the tree-like neighborhood can be made large enough throughout the graph to ensure convergence of the peeling decoder.
As we previously mentioned, the peeling decoder algorithm is based on a subroutine that we call Bin Detector (it is set out in Algorithm 4). We will denote it by BD(U c , D c , T ). The subroutine, BD, will take a bin, the offsets chosen, and a noise parameter T as inputs, and it will output an estimate for the type (zero-ton, single-ton, or multi-ton) of the bin B, and if the bin is a single-ton it also returns the estimated index m and Pauli error rate p m . The subroutine BD also depends on two parameters γ 1 and γ 2 , but these can be chosen as arbitrary constants in the interval (0, 1). Their only purpose is to ensure exponential decay of the failure probability of bin detection, as we discuss in Lemma 10. if B = single-ton then  else if B = single-ton then 15: continue to next j. 16: end if 17: end for 18: end for 19: Return: P By using the sparsity assumption and our choice of subsampling matrices, this peeling process will succeed with high probability. Intuitively we can see this from our ability to choose subsampling matrices M in such a way that we can find bins that typically contain only zero or one nonzero Pauli error rate. In [44] the authors provide a proof that if a bin detector algorithm always returns an exactly correct answer, then the oracle-based peeling decoder has a failure probability that vanishes with the signal size. So it suffices to propose a suitable design for the bin detector and a corresponding recovery guarantee.
In designing such a bin detector we will need to estimate the index of the relevant Pauli in the bin. For index estimation in the setting where there is noise we will need to make our estimation robust. One approach is to use some repetition of detection and a majority voting. A better approach is to use some form of error correcting code for the offsets, as discussed above in Definition 1.
In what follows, we will use the following definition of a sign function, With this definition, we have the following lemma, which confirms that offsets chosen in accordance with Definition 1 can be used to estimate the indices.
Lemma 3. Given a single-ton bin (m, p m ) observed with noise and supposing that the variance in each row (offset, d) of the bin is equal to T ν 2 , then the sign of each observation satisfies where Z is a Bernoulli random variable with probability Proof. The first term in (18) follows trivially from the sign of the power of minus one in (17) and the fact that p m , being a probability, is always positive. The second term, Z, will be 1 if and only if |W | is larger than p m so that it can change the sign generated by the first term of (17). Therefore, Z is Bernoulli distributed with a probability that we can bound as follows. Recalling that W is a Gaussian random variable, we can use the relevant tail bounds for our assumption on the variance (for details see [48]) to obtain where T is the number extracted from the array T in Algorithm 3.

Remark 1.
If we assume that the maximum degree of right nodes in the bipartite graph G is not larger than 1 2 P 1 , P m < 1 2 is satisfied for all m ∈ F 2n 2 (using A1 in Assumptions 1 and lemma 9). We will bound the probability that this right-hand degree assumption fails in lemma 8. In what follows, we will ignore the subscripts c and indices j of the bins when it will not lead to any misunderstanding. Now lemma 3 can be used to identify m, the index of the Pauli error rate in a single-ton bin. Given the offsets chosen in Definition 1 and recalling lemma 2, we have the following equation from the code generator G for the signs of every element in a bin, . . .
Since the bit length of the index m is 2n, we can choose the number P 2 as follows. We choose any linear code with rate R and distance d, and a decoder that can decode up to at least a minimum distance β2n/R for parameters β, R = Θ(1). Obviously this requires d ≥ β2n/R. The additional constraint on β is that β is larger than the probability P of any of the Bernoulli random variables Z i to be 1. Then we can choose P 2 = 2n/R. That is, we are looking for a classical linear code with parameters [2n/R, 2n, d ≥ β2n/R]. There are a number of pre-existing candidate codes that can be decoded up to a constant fraction of the minimal distance in linear time in the length of the code exist that satisfy these stringent conditions. For example, expander codes [49] can be implemented to construct the code generator G and the parity check matrix H, and the greedy linear-time decoder [49] can correct errors with weight up to d/4. The decoder of the corresponding code is required to retrieve the estimatem. Since the manner of coding and decoding is flexible, here we only use Decode to indicate the decoder.
With this we can specify the modified algorithm to detect the bin U with the offsets as in Definition 1 along with the corresponding number T .
We are now ready to give a precise specification of the bin detector algorithm.  Recall that using the protocol in [37], we can estimate O(sn) eigenvalues to within variance ξ 2 by doing O( n 2 ξ 2 ) measurements. Therefore, informally, the entire algorithm needs O( n 2 ξ 2 ) measurements to achieve this recovery guarantee.
Proof. Firstly we consider the stated query and computational complexities. From [44,Theorem 2], it's shown that the oracle-based peeling decoder succeeds with probability 1 − O(1/s) for a random sparse set (obeying assumption A1) as long as C = O(1) and B = O(s). Therefore, to prepare these bins, the number of queried eigenvalues is BP C = O(P s).
To construct the bins and the corresponding graph, the computational complexity can be calculated by the complexity from the construction algorithm. Note there are P offset coefficients d and each U c,t [j] comes from the sum of B samples in Algorithm 2. To construct the total set {U c,t [j]} C,P,B , the we can use a fast WHT (which has complexity O(B log B) to calculate a B-point WHT) for each offset. Therefore, the computational complexity for this part is The second part of computational complexity comes from the computation of Algorithm 3. Each step in the bin detector checks the type of the bin with O(P ) calculations, and there are O(B) iterations. Accordingly, the complexity is Therefore, the total computational complexity is Z = Z 1 + Z 2 = O(P sn).
Let us denote by E bin the event that any invocation of the bin detector (in the execution of Algorithm 3) returns one or more of the following: (a) an incorrect identification of the type of bin; (b) wrong indices for a detected single-ton, or (c) a mis-estimate of the Pauli error rates of a detected single-ton by more than 2ν = 2ξ/ √ B. Furthermore, let D denote the event that the maximum degree of the right nodes in the graph G is less than or equal to P 1 . Let H be the event that all the peeling routes in the procedure are cycle-free. Then utilizing the law of total probability we can bound the failure rate of the entire algorithm as Here the subscript c denotes the complement of the event, e.g. E c bin , denotes that no bin detection error occurred in the entire execution of Algorithm 3.
The first term in (22) is the chance that the oraclebased peeling decoder fails, even though the bin decoder is always correct. This probability scales as O(1/s) (Proposition 4 in [44]).
To bound the second term, it will be more convenient to consider the probability that every invocation of a bin detector works correctly given D and H. Let M denote the number of times the peeling decoder calls the bin detector subroutine. This probability can be expressed as follows, where E i denotes the event that the ith call of bin detector returns a wrong answer. According to lemma 7, the parameter T will always correctly estimate the variance if all the earlier bin detectors worked correctly. From lemma 10, each term in the above equation will be lower bounded by where V here just indicates that all the previous bin detectors work correctly. So we have Moreover, since M , the number of times the bin detector routine is called, is at most O(BCs), the upper bound of the second term is Lemma 8 provides that the third term in (22) is also exponentially decaying with P 1 : where the last inequality comes from the definition of P 1 from Definition 1. Similarly lemma 5 and Remark 2 provide the bound on the probability of H c : Therefore, the total failure probability of our peeling decoder algorithm is vanishing exponentially with the number of qubits n. And from Definition 1, the total num-ber of offsets consists of P 1 = Θ(n) random offsets and P 2 = Θ(n) coding offsets, thus P = Θ(n) and stated complexities have been proven.

IX. TAIL BOUNDS
In this section, we prove several statements bounding the failure probabilities of various events that can cause the bin detector outlined as Algorithm 4 to fail.
One of the main lemmas that we will need is the following tail bound on Gaussian random variables.
Lemma 4 (Tail bound [44,Lemma 11]). Given g, k ∈ R N where k is an isotropic Gaussian random variable k ∼ N (0, ν 2 1 N ), then the following tail bound holds: for τ 1 , τ 2 , and θ 0 satisfying Since we will use this lemma 4 to get a failure bound, it is critical to show the sample errors within the different offsets for a particular bin are independent. However, given that bins are created as shown in line 10 in Algorithm 3, it is not immediately clear that the sample errors remain independent. To show this independence let us first extend the definition of the sample errors W c;t [j] to take into account the effect of the peeling decoder Algorithm 3. Recall that for any particular bin we have P (being, P1+P2) offset bins.
Definition 2. For a specific bin, regard U c [j] as a vector of length P as in lemma 2. Denote the set of indices of the current contained non-zero Pauli error rates by P. Denote the P1 sample errors, being the sample errors that occur in the offsets of that bin where the offset indices are t ∈ [P 1 ], for a certain timestamp in the peeling decoder And similarly denote the P2 sample errors, being those those that occur in the bins with offset indices t ∈ [P 2 ], as We can combine all of these sample errors to be W c [j] following the manner of lemma 2.
In case of the discussion about the independence of errors and the variance evolution, we first introduce some results to rule out some intricate situations. To define this more rigorously, consider the directed neighborhood N l e in the bipartite graph consists of nonzero Pauli error rates (left nodes) and all the bins (right nodes). This N 2l e with length 2l and an edge e = (v, c) is an induced sub-graph containing all edges and nodes on paths e 1 , e 2 · · · , e 2l from node v where e = e 1 .
Denote the event that for every edge in the bipartite graph, this sub-graph N 2l e is cycle-free by T l . If T l occurs all the first l peeling iterations will progress independently and there is no initial error propagating to a single bin with more than once in the first l iterations. It has been shown in [44] that with sufficiently large s and N , the effective part of the subsampling-based bipartite graph, similar to Figure 2, behaves with a cycle-free manner.
Lemma 5 (Ref. [44,Lemma 6]). For any iteration l, the probability of the complement of T l is bounded as for some constant c 0 .

Remark 2.
Ref. [44] shows the probability p l to depict an arbitrary edge to be held after l peelings given the subgraph is tree-like or cycle-free is calculated in an iterative approach.
where η is the factor B s and C is the number of subsampling groups.
If we take C = 6 and η = 1, which is a reasonable setting, the probability will decrease to the machine epsilon in only three iterations. And in general this p l vanishes exponentially with the exponent with of l. With the law of total probability, the probability of that there exist any edges after l ∼ O(log log(s)) iterations, denoted by h c , is Therefore, the event that there exist bins getting peeled by some earlier bins in a cycle manner during the whole process happens with probability of the same magnitude of (26), which is convergent to zero as s is sufficiently large.
Lemma 6. For an arbitrary timestamp in Algorithm 3, sample errors in each of the P1 sample errors for a particular bin U c [j] remain independent of each other given that the peeling route is cycle-free.
Proof. For an initial bin subsampled from Algorithm 2, consider an arbitrary pair of offsets labeled by t 1 , t 2 ∈ [P 1 ] in the same bin U c [j] Since all the errors W m are i.i.d. Gaussian random variables N (0, ξ 2 N ), it is obvious that E(W c;t1 [j]·W c;t2 [j]) = 0. So they are independent given that the expected values of the samples errors are 0.
The peeling decoder in line 10 in Algorithm 3 causes errors in the estimate of W c;t [j] to propagate in the following manner We now proceed by induction. As discussed above the noise is initially independent, so the base case is satisfied. Now assume that the sample errors before peeling are independent of each other. Observing that the updated error still has mean zero, we can calculate the expected value of a product between an arbitrary pair of sample errors in the offsets of a bin to show independence as between the offset bins. For convenience, denote the updated error by W c;t [j] . Then we have Note the last two terms are expected to be zero since in each term the original error is independent from the additional one according to the condition that the peeling process is cycle-free.
In Algorithm 3, we employ an array T to keep track of the variance of sample error for each bin. This array gets updated whenever the algorithm peels a bin using an estimated Pauli error rate. We now show that this does indeed correctly track the variance of the sample errors in the bins.
Lemma 7. Suppose that at a given arbitrary timestamp in Algorithm 3 all the bin detector subroutines called earlier have correctly identified their bins. And the peeling route is cycle-free. Then for each bin and its corresponding offsets U c [j], the sample error for that bin and each of its offsets W c [j] have the same variance T c [j] · ν 2 .
Proof. Since this statement is based on the premise that all of the earlier bin detector runs were accurate, we can assume that the index m = m is correct. We will still write m to distinguish these index estimates from the original index m. The idea is to calculate the variance after a peel by induction with the fact that for any two random variables, Var (X + Y ) = Var (X) + Var (Y ) + 2Cov (X, Y ) . (27) Assume that the statement in the lemma holds before a peeling. Then we need to show that if we subtract an estimated Pauli, p m , from a bin in a different subsampling group that contains this Pauli, the statement is preserved as the updating of T . To do this we will work out the variance of the sample error in each term and the covariance between these sample errors. Armed with this we will be able to prove that the statement is preserved after peeling.
We now consider the peeling process causes error propagation for the error part of the bin U c [j] as follows The variance of the first term is by induction T c [j]ν 2 , while the variance of the second term is not so trivial. The estimated Pauli comes from the single-ton search, where all the first P 1 observations get summed after added a random sign. Since all the random signs of W m terms are annihilated before summation, and all the other error parts still remain random, the variance of this second term in (28) is Because of the assumption that all the initial errors in different bins are independent and the condition that the whole peeling routes are cycle-free, it can be viewed obviously that these two term remains independent. Therefore, we have calculated variance as follows which is consistent with what we have claimed.
In order to prove theorem 1 and find a bound on the variances of the sample errors, it is necessary to find an upper bound on the parameter T , which needs to be analyzed for both the graph and the algorithm. Denote by G the bipartite graph of which each right node represents a bin observation, each left node represents a nonzero Pauli error rate, and edges come from the hash function relation: That is, a bin-observation-node U c [j] is connected to error-rate-node p m if and only if it holds that M T c m = j. A right node is a single-ton node if and only if it has a single edge connected to it. Every time we peel a left node (that is we identify a Pauli error) we remove the edges connecting it and the right nodes. Each peeling therefore decreases the degree of the right nodes.
The following two lemmas help us bound the integer array T as we peel along the graph G. We first bound the right degree of G.
Lemma 8. The maximum degree of the right nodes in G is less or equal than P1 2 with probability 1 − e −O(n) .
Proof. Put the right nodes in some sequential order, and define events {X i } BC i=1 where X i denotes the ith node and is linked to more than P 1 /2 left nodes. According to the bin observation model (15), each bin connects with N B Pauli error rates (most of which will be zero), so the expected degree of a right node is s B where s is the number of left nodes. Since the protocol chooses B = O(s), the expected degree e = s B ∼ O(1). Note that by Assumption 1, the support set of the Pauli error rates is chosen randomly. Therefore, concentrating on a specific bin i, we can introduce a random variable d i that denotes the degree of bin-observationnode i (a right node), and introduce the variable d ij which is 0 if the corresponding jth Pauli error rate is zero or if p j is not in the ith bin, and 1 otherwise. Then we have the relation since this counts the support in the ith bin. These variables d ij are actually all Bernoulli variables and the only correlation among them comes from the constraint that there are exactly s elements in the entire support. This constraint means that the d ij are negatively correlated, and so the probability of d ij = 1 can be upper bounded by considering the event that all the other Pauli rates linked with this bin are zero, . Now consider another set of i.i.d. Bernoulli variables {d ij } j each of which is 1 with probability sB N (B−1) . We then have Since the expected value of the sum of {d ij } is s B−1 , the Chernoff bound is suitable for this case, and we find According to the union bound, the event X which denotes that there exist some left nodes with degree larger than P1 2 follows from the upper bound, That this is at most e −O(n) follows since B = Θ(s), s = Θ(N δ ), and P 1 = Θ(n).
The degree bound we have just proven allows us to bound the maximum element of the array T. Lemma 9. Suppose the maximum degree of the right nodes in G is not larger than P1 2 . Then the maximum element of the array T is at most 4 if all the earlier bin detectors work correctly.
Proof. The recursive equation for T in algorithm 3 is There exists a time sequential order for each nonzero Pauli error rate to be detected. For each step i and bin U zi , being the next bin in which we will find a nonzero Pauli rate. Let T max denote the current maximum element in a subset T peeled of T. T peeled contains T of bins in which we have already found a nonzero Pauli error. Also, we use T max to indicate the maximum T that will exist after the next step. According to the assumption, the maximum degree of an arbitrary right node is less than min(P 1 /2, N/B), and the number of peels needed is never more than the maximum degree of that node. Note our assumption is that the previous bins have been accurately detected, so the process will always choose nonzero Pauli error rates (and the corresponding bins) to peel. The noise in the peeling bin will increase by at most Tmax P1 + (P1−1)B P1N . Each T is initialized as 1 and denote the number of peeling by κ, therefore Then by induction, because initially the maximum element in T peeled is equal to 0 and in each step this value will increase as the above formula. Therefore, the maximum element we can get is the limit of this equation, which is T max ≤ 4.
According to the above two lemmas, the integer T max indicates that the upper bound of the maximum element in the array T is not larger than 4 with high probability for a sufficiently large N . In order to compute the failure rates and make the algorithm realizable, we assumed (as in A3) the minimum nonzero Pauli error rate 0 satisfies In anticipation of applying the union bound, let us define the following error categories and their probabilities. For brevity, we will denote a zero-ton, single-ton, or multi-ton by just the letters z, s, or m, and we denote the true value of the bin by B. • The value error probability: Of course these probabilities are not all independent. However, by the union bound it suffices to bound each of these bad events individually and the total failure probability will be at most the sum of the probabilities of these failure modes. We will show that all of these failure probabilities decay exponentially with P 1 , the number of randomly chosen offsets.
Lemma 10. Let E denote the event that an arbitrary bin detection with inputs as those in Algorithm 3 returns either the wrong bin type, the wrong index or an estimated Pauli error rate with error larger than √ 4ν 2 . Let D be the event that the maximum degree of right nodes in G is not larger than P 1 /2. Let V be the event that all the earlier bin detectors work correctly. And denote the event that every peeling route is cycle-free by H. There exists a bound that Proof. This theorem means that the bin detector algorithm succeeds with high probability whenever D, V and H occur. To show it, we have to bound the failure probabilities for each failure mode of the bin detection algorithm and then apply the union bound. We will prove most of our statements by bounding failure probabilities with expressions of the form e −O(P1) . This is equivalent to a bound of the form e −O(n) since P 1 = Θ(n) by Definition 1. Also note that conditioning on events D, V and H allows the use of lemma 7 and lemma 9, specifically that the variance of noise in each row of this bin is T ν 2 from lemma 7, and that this T is no more than 4 according to lemma 9.
We first consider the single-ton false negative probability in Definition 3. Note in this case, the underlying bin contains only one Pauli error rate along with noise, that is Let f 1 = Pr B = z | B = s . Then by definition, the probability can be upper bounded by the probability of a single-ton bin passing the zero-ton verification: where s c,m is the vector with t-th element to be the random sign of index m and the offset d c;t . Since here the noise vector W comes from the sum of noise w, it's obvious that all the elements of W are Gaussian distributed with variance T ν 2 . Therefore, according to the tail bounds of lemma 4, the following holds as long as Now let f 2 = Pr B = m|B = s . This kind of failure happens if and only if the single-ton bin fails during single-ton verification, Considering the underlying structure of this bin (33), this probability can be bounded using a conditional probability. We first denote the event {| p m − p m | > √ 4ν 2 or m = m} by E 0 . Then we observe Using the tail bound (23), we have that Then using union bound, we can deal with the first term Note above, the estimated Pauli error rate can be calculated according to Algorithm 4, so we obtain the bound where Y is the sum of P 1 i.i.d. Gaussian variables with N 0, T + (P1−1)B N ν 2 like (29), and the last inequality comes from the Chernoff-Hoeffding bound [50]. According to lemma 9 powers in (35) and (37) are both scaling linearly with P 1 , thus the probabilities decay exponentially with P 1 .
Since the second term in (36), Pr ( m = m), is essentially the probability of the index error, the failure probability of such a decoding process also decays exponentially with P 2 . In accordance with (20), the sign vector [sgn [U P1 ] , · · · , sgn [U P −1 ]] T is the sum of a codeword G, m and a vector of noise. Since the decoding process fails only if the weight of the noise is larger than distance βP 2 and each element of the noise is an independent Bernoulli random variable with error probability upper bounded by P, the index error probability can be bounded by Chernoff-Hoeffding bound: Moreover, as noted in Remark 1, we have P m < 1 2 for all m ∈ F 2n 2 . Given the assumptions of D, V and H, we choose the maximum P m to be P = max m P m . Therefore, using the law of total probability we have Recall the Definition 1 to learn the magnitude of P 1 , P 2 , and we have a bound f 2 = e −O(n) . We now turn to the case that the bin detection algorithm incorrectly recognizes a zero-ton or a multi-ton bin as a single-ton bin, i.e., we consider the single-ton false positive probability. For this, we need to consider the general underlying bin structure where U c is either zero-ton or multi-ton, and only contains the P 1 fully random offsets when choosing as Definition 1. Here S c ∈ {±1} P1×N/B is the sign matrix constructed according to lemma 2. Now consider the probability of the bin detector falsely detecting a zero-ton as a single-ton, and denote Pr B = s|B = z by f 3 . By definition, the probability of f 3 can be bounded by the probability of zero-ton verification failing According to the tail bound (23), this failure probability can be bounded by an exponentially decaying function Now let f 4 = Pr B = s|B = m . The kind of error probability can be evaluated under the multi-ton model when it passes the single-ton verification step for some estimated index-value pair ( m, p m ) and let the sample error be W = k. Then the law of total probability can be used as follows: Note that the first term can be bounded by (24) since the conditional part shows the lower bound of the parameter θ 0 as defined in lemma 4 The second term can be bounded as follows. Let α = p − p m e m , and we have Here e k is the vector with support only on the k th element. We denote the support set of the vector α by L 0 , and define the 0 -essential support of α to be Denote the cardinality of L by L. Then the above probability can be bounded by an application of the Chernoff-Hoeffding bound.
With the same argument as in the proof of lemma 6, the sample error in each row in vector g is independent, and so is the square of that error. When we calculate g 2 , we can regard it as a sum of P 1 independent random variables. Also, each term in this sum contains the same structure, and identically distributed parameters, so we can claim each term is identically distributed.
Therefore, we first analyze the expected value E of each variable in this sum. Take one of these terms X i as an example, where {d i } is a set of independent random 2n-bit strings.
The expected value E of X i satisfies the following bound Note that above we used the fact that any random strings are independent.
Moreover, since we want to show this term will be large with high probability, we should consider the random terms in each X i , and that is where the order is in lexicographical order. Note the rest part of X i is a deterministic one, so we only calculate the variance for this R i . It's straightforward that we have and the only contributed terms are those without random signs in R 2 i . For example, if we consider a specific u, v and the term (α u α v ) × (α w α x ) with some (w, x) = (u, v) (assume w > x), this term will contribute to the expected value only if w + x + u + v = 0. Therefore, the only potential effective terms are those with four different Pauli error rates. Moreover, since w + x + u + v must be in the null-space of M T c according to lemma 1, of which the size is N B = 1 η e (1−δ)n , we can abstract this probability from the so-called Balls and Bins problem.
Regardless of square terms, the number of potential terms are L 4 . And the probability of the number of terms in 0 is at least a chosen constant η 0 , denoted by G i , can be bounded which is decaying exponentially with n. Given that the complementary event G i happens and that the set of ef-fective terms are A i , the variance of R i is the sum The last inequality comes from the mean inequality, and we can use the Hoeffding bound to obtain The last inequality used facts that For any nontrivial signal, we have that 1 ≤ L < P 1 /2. As long as 0 < γ 2 < 2 0 /2T ν 2 and choosing η 0 = 6, for any multi-ton there exists Denote the first term Pr( B = z | B = m) by f 5 and the second Pr( B = m | B = z) by f 6 . For f 5 , recognizing a multi-ton as a zero-ton, we have the following inequality, Note this probability can be analyzed in just the same way as f 4 's, and the only difference is that when we consider f 5 , the α just based on several underlying Pauli error rates without any subtraction, so L ≥ 2 for this case. As long as 0 < γ 1 < 2 0 /T ν 2 , for any multi-ton we have the bound Moreover, it is clear that the failure probability of recognizing a zero-ton bin as a multi-ton bin, namely f 6 , is smaller than f 3 . Next, consider the index error probability, and denote Pr B = s, m = m|B = s, m by f 7 . This probability can be bounded by the probability of estimating a wrong index m and some Pauli error rate, and still passing the single-ton verification Note the last inequality is just (38), and according to Remark 1, P m < 1 2 for all m ∈ F 2n 2 given that events D and V happen and we choose the maximum one to be P.
Finally, let's consider the value error probability, and denote Pr( B = s, |pm − p m | > √ 4ν 2 |B = s, m, p m ) by f 8 . Note that we have chosen √ 4ν 2 as the error bound for the Pauli error rate, so similar to the index error probability, this f 8 can be bounded by the probability of estimating a noisy Pauli error rate and passing the single-ton verification. We can loosen this bound by only consider the first event, and we obtain the inequality Note the last inequality comes from combination of (37) and (38). According to Remark 1, we again have P m < 1 2 for all m ∈ F 2n 2 given that events D, V and H happen and we choose the maximum one to be P.
According to Definition 3, we have treated all of the failure cases of the bin detector algorithm. Using the union bound, we can get the following inequality As we illustrated at the beginning of this proof, events D, V and H have showed that the variance of noise in each row of this bin is T ν 2 from lemma 7, and that this T is no more than 4 according to lemma 9. Furthermore, they imply that θ m is strictly smaller than 1 2 for all m ∈ F 2n 2 in (39), (52) and (53). Since constraining the peeling graph G to obey this event is independent of the above analysis of the failure probabilities of the bin detector, it will hold that Pr (E|D, V, H) ≤ e −O(P1) .
This completes the proof.

X. CONCLUSION
We have shown that for sparse Pauli channels we can learn all the significant Pauli errors, even those associated with high-weight Pauli strings, using realistic ex-perimental resources that scale with the sparsity of the Pauli errors rather than the dimension. In particular we have demonstrated that using only a few local two-qubit gates and a number of quantum experiments that scales linearly (with a factor of about 4), we can recover up to 4 δn of the largest error rates, where δ 0.25. Our numerical analysis indicates in the regime where 0.25 < δ < 0.5 we can still recover these errors with a number of experiments that only scales as O(n 2 ).
We support these experimental protocols by defining and analyzing an algorithm with rigorous performance guarantees. This provable algorithm confirms that, with explicitly stated assumptions, high-precision reconstruction is possible when querying only a number of Pauli eigenvalues that scales like O(sn). Moreover, the heuristic practical circuits used above are able to approximate the relevant noisy eigenvalue queries with sufficient precision ξ using only O(n 2 /ξ 2 ). These circuits exploit the protocols presented in [37] and [38] to learn up to 2 n commuting Pauli eigenvalues per experiment, and greatly reduces the required experimental resources.
This work provides an experimentally realizable method of identifying the relevant Pauli errors in largescale quantum devices even if there are unexpected longrange correlations between the qubits. The ability to do so will be vital as we seek to mitigate the errors in such devices, to learn the noise patterns that exist when such devices are operated holistically and will allow better designing and tailoring of error correction and fault tolerance in such devices.
Many interesting open questions remain.
For example, what about very large scale devices? The practicality of the algorithm in the regime of greater than (say) 30 qubits, where memory storage becomes an issue, could potentially be addressed as follows. We keep the protocol executed on the device identical as system size increases. However, we can take advantage of the fact that the WHT commutes with the marginalization of the observed probabilities and the process of fitting required to ascertain the SPAM-free eigenvalues (see [38]). The actual observations only require n bits of data to store. We can, therefore, marginalize the observations to obtain overlapping sets of 2 m eigenvalues (where we choose m to be the largest computationally tractable number for our classical computer). This will mean that we have multiple sets of 2 m bins, each potentially containing 2 2n−m Pauli error rates. Given this, the s-sparse assumption now becomes s < 2 m . It would be extremely interesting to implement this version of the algorithm on real data.
Another approach to dealing with very large scale devices is to incorporate our algorithm as a subroutine in a larger algorithm that builds a globally consistent Pauli error distribution from estimations of marginal error rates. For example, as proposed in Ref. [37], one could efficiently estimate a Markov random field description of a Pauli channel if the underlying graphical model has bounded degree correlations. This idea has been performed experimentally on 14 qubits in Ref. [38]. We believe using the algorithm presented here would improve the estimation of the core subroutines and lead to better performance of the global reconstruction.
There are also several open mathematical questions about the reconstruction of sparse (or approximately sparse) Pauli channels. For example, it would be interesting to relax the random sparsity assumption on the support. Similarly, it would be interesting to treat more general noise on the eigenvalue oracle. In particular, treating the case of noise with bounded variance seems to be the most relevant for providing recovery guarantees that relate to practical experimental capabilities. It might also be possible to weaken our assumptions about the signalto-noise ratio. A lower bound would help to clarify where the limits are to these types of algorithms.
A further important open question is understanding the power of the structured circuits that we use for eigenvalue estimation. When using shallow depth Clifford circuits to prepare stabilizer bases for eigenvalue estimation, what recovery guarantees are possible? Is it still possible to efficiently reconstruct arbitrary sparse Pauli channels?
Finally, the most important open problem is to use our algorithms on real experiments to characterize noise, improve calibration of a device, or customize an error correction procedure.

This work was supported by the US Army Research
Office grant numbers W911NF-14-1-0098 and W911NF-14-1-0103, and the Australian Research Council Centre of Excellence for Engineered Quantum Systems (EQUS) grant number CE170100009.
Appendix A: Walsh-Hadamard ordering In this paper we use a variant of the Walsh-Hadamard transform where the ordering is determined by the commutation relations between the Paulis. The natural ordering of a WHT matrix can be calculated from the tensor product as: In this case, like the sequency order and dyadic order variants of the WHT, we reorder the columns of the transform matrix. Unless otherwise expressly noted, we use a WHT where the (i, j)th entry of the Hadamard transform matrix is given by (−1) i,j , where the inner product is the symplectic inner product introduced above. The advantages of using this variant of the WHT is that when it is used to transform eigenvalues to error rates and vice-versa (Equations (5) and (6)), the position of each Pauli in the transformed vector remains constant.