Efficient diagnostics for quantum error correction

Fault-tolerant quantum computing will require accurate estimates of the resource overhead, but standard metrics such as gate fidelity and diamond distance have been shown to be poor predictors of logical performance. We present a scalable experimental approach based on Pauli error reconstruction to predict the performance of concatenated codes. Numerical evidence demonstrates that our method significantly outperforms predictions based on standard error metrics for various error models, even with limited data. We illustrate how this method assists in the selection of error correction schemes.

Noise is pervasive in quantum processing, and must be overcome to achieve the disruptive capabilities of quantum computing. Fault tolerance guarantees reliable logical quantum computation in the presence of noise under prescribed conditions, often oversimplified as achieving a threshold on gate error rates. However, achieving low logical error rates in practice is a significant challenge, in part because of the large overheads required in terms of additional qubits and gates. The design and selection of an error correction strategy for a particular platform requires accurate prediction of its expected logical performance. For instance, in the presence of biased noise [1][2][3][4][5][6], tailored codes have been shown to outperform traditional codes that are designed to correct unstructured noise. However, bias is only one of the exponentially many parameters that describe the noise on n physical qubits. This work addresses the lack of tools for predicting the logical performance of a fault-tolerant (FT) architecture based on a description of noise at the physical level.
The existing theoretical framework for choosing a FT scheme is centered around the fault tolerance accuracy threshold theorem [7,8], which provides a threshold on the strength of physical noise below which reliable quantum computation can be guaranteed. However, directly applying the theorem to realistic noise presents several challenges. One is that the FT threshold is derived under oversimplified conditions that implicitly model a physical noise process as an incoherent error model with the same diamond distance. This leads to loose estimates of the logical performance when the noise has complex features such as coherence or strong correlations. Another is that the diamond distance, which is usually invoked for assessing error rates in FT proofs, cannot be measured in a scalable way [9]. It has been shown that the resource overheads for a FT architecture depend critically on the precise relationship between the architecture and the underlying error model. While there are several well-studied error metrics, none of them can accurately predict the logical error rate of a quantum code, with predictions varying by several orders of magnitude [10]. In this work we address this crucial deficiency prevalent in all of the known standard error metrics.
Here, we present a new figure of merit to predict the performance of concatenated codes, which can be measured efficiently using experimental protocols. As opposed to existing metrics such as average gate fidelity and diamond distance, our approach captures the interplay between the physical noise model and the choice of FT architecture. Our method leverages Randomized Compiling (RC) [11] to create an effective Pauli noise on the physical qubits, and then uses noise reconstruction (NR) techniques [12][13][14] to estimate Pauli error probabilities. Using these data, we then design a logical estimator that predicts the total probability of Pauli errors that a code cannot correct. While exactly computing this quantity is inefficient for a generic code, we introduce an efficient approximation to accurately estimate the total probability of uncorrectable errors for concatenated codes. We provide a bound on the efficiency and demonstrate the accuracy of our method through numerical simulations in several noise scenarios of interest. Finally, as an application, we demonstrate how the logical estimator pinpoints the selection of a suitable error correcting code for differing noise environments.
The net effect of a physical noise process E_0 together with a quantum error correction (QEC) routine using an [[n, k]] stabilizer code [38] is captured by the effective logical channel E^s_1 [39] acting on an encoded state ρ as in eq. (1), where Pr(s) is the probability of measuring the syndrome outcome s, Π_s is the syndrome projector and R_s is the corresponding recovery. The average logical channel Ē_1 is given by eq. (2) [10,40]. While physical error rates are measured by noise metrics on E_0, logical error rates are measured on Ē_1. Concatenated quantum codes are a popular family of codes of increasing sizes [41], and are often used to guarantee error suppression in fault tolerance proofs [7,42]. Physical qubits of a code C_{ℓ+1} are encoded using a code C_ℓ, for 1 ≤ ℓ ≤ L−1, yielding a level-L concatenated code. The recursive encoding structure is represented by a tree where the i-th node at depth (L − ℓ) denotes a quantum error correcting code block C_{ℓ,i}. The subtree of the node is itself a concatenated code, denoted by C_{ℓ,i}, consisting of (n^ℓ − 1)/(n − 1) code blocks. There are n − 1 independent stabilizer measurements corresponding to each of the code blocks of C_{ℓ,i}. The resulting error syndrome s(C_{ℓ,i}) has (n^ℓ − 1) bits, which can be grouped into subsets of n − 1 bits identified with the code blocks. We denote the subset of syndrome bits obtained by measurements on a code block C_{ℓ,j} by s(C_{ℓ,j}).
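The display equations (1) and (2), dropped in this extraction, can plausibly be reconstructed from the surrounding definitions (Pr(s), Π_s, R_s) as:

```latex
\mathcal{E}_1^{s}(\rho) \;=\; \frac{R_s\,\Pi_s\,\mathcal{E}_0(\rho)\,\Pi_s\,R_s^{\dagger}}{\Pr(s)},
\qquad
\Pr(s) \;=\; \mathrm{Tr}\!\left[\Pi_s\,\mathcal{E}_0(\rho)\right],
\qquad
\bar{\mathcal{E}}_1 \;=\; \sum_{s} \Pr(s)\,\mathcal{E}_1^{s}.
```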
We consider the following iterative routine for quantum error correction in concatenated codes. For each level ℓ = 1, . . . , L: (i) syndromes are extracted for each code block C_{ℓ,1}, . . . , C_{ℓ,n^{L−ℓ}}, and (ii) a minimum-weight correction [43] is applied in each case. Although we assume the popular choice of a minimum-weight decoder in (ii), the methods prescribed in this work can be adapted to any lookup table decoder [44]. Hence, the correction applied at any level depends on the syndrome history of the code blocks in the lower levels.
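The level-by-level routine above can be sketched for the simplest case: a level-2 concatenated 3-qubit repetition code with bit-flip errors only. This is a hedged illustration of the iteration order (inner blocks first, residual logical faults fed to the outer block), not the paper's code; the lookup table is the standard minimum-weight one for this toy code.

```python
# Minimum-weight lookup table for the 3-qubit repetition code:
# syndrome (Z1Z2, Z2Z3) -> X correction pattern.
LOOKUP = {(0, 0): [0, 0, 0], (1, 0): [1, 0, 0], (1, 1): [0, 1, 0], (0, 1): [0, 0, 1]}

def syndrome(block):
    return (block[0] ^ block[1], block[1] ^ block[2])

def decode_block(block):
    corr = LOOKUP[syndrome(block)]
    corrected = [b ^ c for b, c in zip(block, corr)]
    return corrected[0]  # residual logical value of the block (0 or 1)

def decode_level2(error):  # `error` lists X errors on the 9 physical qubits
    # (i) decode each inner block, then (ii) decode the outer block on the residuals
    residuals = [decode_block(error[3 * i:3 * i + 3]) for i in range(3)]
    return decode_block(residuals)  # 0 = success, 1 = logical X error

# A weight-2 error confined to one inner block defeats that block,
# but the outer level catches the single resulting logical fault:
assert decode_level2([1, 1, 0, 0, 0, 0, 0, 0, 0]) == 0
```

Note that the correction at level 2 depends only on the residuals left by the level-1 decoders, mirroring the syndrome-history dependence described above.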
The effective channel for a level-ℓ concatenated code can also be computed in a recursive fashion, i.e., using eq. (1) where E_0 is replaced by the effective channel on the level-(ℓ − 1) code blocks, i.e., E^s_{ℓ−1,1} ⊗ · · · ⊗ E^s_{ℓ−1,n} [39,45]. The performance of the level-ℓ concatenated code can be quantified [10] by the infidelity r(Ē_ℓ) = Σ_s Pr(s) r(E^s_ℓ) of the average logical channel Ē_ℓ. For concatenated codes, it is possible to calculate both Pr(s) and the effective channel E^s_ℓ. However, as the number of syndromes grows exponentially with the number of levels, Monte Carlo sampling techniques described in section G of the appendix can be used to estimate this average.

II. METHODS
While the special setting of Pauli errors drastically simplifies the predictability problem, realistic noise processes are nonetheless poorly described by Pauli error models. To circumvent this problem, we recall a straightforward application of RC [11] to fault tolerant circuits, which allows us to model the effect of complex noise processes by simple Pauli errors. In other words, RC ensures that no parameter of the physical channel other than the Pauli error probabilities affects the logical error rate. The physical twirling gates required for RC can be absorbed into the logical gadgets of the FT circuits, at no additional cost in overhead. We provide explicit details of the procedure in appendix section B. With a noise model described by Pauli errors, we first develop the background needed to define the notion of a logical estimator that can accurately predict the logical error rate. A stabilizer code and decoder pair is designed to correct a target set of errors E_C, called correctable errors [46,47]. Errors not in this set, i.e., uncorrectable errors, contribute to the logical error rate. Ideally, we want to estimate the total probability of all uncorrectable errors, which can be obtained by adding the probabilities of correctable errors and subtracting the result from one: p_u = 1 − Σ_{E ∈ E_C} χ_{E,E}, where χ_{P,Q} is the element corresponding to the Pauli operators P and Q in the chi-matrix representation [48] of the physical noise process. For Pauli error models, p_u is identical to the logical infidelity, while for generic CPTP maps, a precise relationship is presented in section A of the appendix. While the average gate infidelity r accounts for the effect of only the trivial correctable error I, p_u captures all the degrees of freedom that are relevant to the logical error rate. However, an exact computation of p_u is intractable in general.
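For a small code the sum p_u = 1 − Σ_{E ∈ E_C} χ_{E,E} can be evaluated by brute force. A hedged sketch for the 3-qubit repetition code under i.i.d. bit-flip noise (where χ_{E,E} is just the probability of the X pattern E, and E_C = {I, X_1, X_2, X_3}):

```python
from itertools import product

def p_uncorrectable(p):
    """Exact total probability of uncorrectable errors for the 3-qubit
    repetition code under i.i.d. bit-flip noise with flip probability p."""
    correctable = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)}  # E_C: one X pattern per syndrome
    p_c = 0.0
    for err in product((0, 1), repeat=3):
        if err in correctable:
            w = sum(err)
            p_c += p**w * (1 - p) ** (3 - w)
    return 1.0 - p_c

# Matches the closed form 3p^2 - 2p^3:
p = 1e-3
assert abs(p_uncorrectable(p) - (3 * p**2 - 2 * p**3)) < 1e-12
```

The brute-force loop visits all 2^n (here: X-only) error patterns, which is exactly the enumeration that becomes intractable for large codes.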
A correctable error E for the concatenated code C_ℓ falls into one of two categories: either (i) it is corrected within the lower level code blocks C_{ℓ−1,1}, . . . , C_{ℓ−1,n}, or (ii) it has a non-trivial correction applied by the decoder of the level-ℓ code block C_{ℓ,1}. Adding up the contributions to 1 − p_u(C_ℓ) from cases (i) and (ii) respectively, we find 1 − p_u(C_ℓ) = Λ(C_ℓ) + Γ(C_{ℓ,1}), where Λ(C_ℓ) denotes the case (i) contribution and Γ(C_{ℓ,1}) = Σ_{Ē ∈ E_C∖{I}} Pr(⊗^n_{j=1} Ē_{ℓ−1,j}). An exact computation of Γ(C_{ℓ,1}) involves enumerating all possible syndrome outcomes for the level-ℓ concatenated code.
FIG. 1: (a,b) Predictive power of our logical estimator (red) against two standard error metrics (gray): the average gate infidelity (a) and the diamond distance (b), under an ensemble of 18000 CPTP maps. Each point p = (x_p, y_p) corresponds to a physical noise process; x_p is its physical error metric and y_p its logical error rate. The dispersion of the points, quantified as ∆ in the insets, indicates the predictive power of the physical error metric. While logical error rates can vary over several orders of magnitude with respect to standard error metrics, our logical estimator is strongly correlated with the logical error rate. (c) Correlated Pauli error model.

Our logical estimator, denoted by p̂_u, is the result of estimating p_u using an efficient approximation for Γ(C_{ℓ,1}) for concatenated codes. In particular, we use a coarse-grained estimate of the probability of a syndrome outcome (a joint probability distribution over O(n^ℓ) syndrome bits), calculated as a product of marginal probability distributions over the n code blocks at level (ℓ − 1). This procedure is recursed through the levels of the concatenated code. A detailed derivation of the logical estimator is provided in section C of the appendix. For i.i.d. Pauli error models with sufficiently small single-qubit infidelity r_0, the quality of the approximation is |r_ℓ − p̂_u| ≤ n_C^{ℓ+1} r_0^{t_C^ℓ}, with t_C = (d_C + 1)/2. Here, d_C and n_C describe the distance and the size of a code block of a level-ℓ concatenated code. For instance, using an i.i.d. depolarizing error model with r_0 = 10^{−3} and the level-2 concatenated Steane code, the above expression yields |r_2 − p̂_u| ≤ 5 × 10^{−10}. This is validated by numerics: p̂_u = 4.24 × 10^{−9} and r_2 = 4.20 × 10^{−9}. Notably, the time complexity of computing p̂_u for the concatenated code, O(4^{n_C} + n^ℓ), scales polynomially in the total number of physical qubits, i.e., as n^ℓ, whereas an exact computation of p_u would scale doubly exponentially in ℓ. An analysis of the quality and efficiency of the approximation can be found in sections D and E, respectively, of the appendix.

III. RESULTS AND DISCUSSION
We provide numerical evidence to highlight the improvements offered by our methods for optimizing FT schemes. We begin with the task of accurately predicting the performance of concatenated Steane codes. We perform numerical simulations of quantum error correction in the RC and non-RC settings under a large ensemble of random CPTP maps applied to the physical qubits. Following Ref. [10], we generate a single qubit CPTP map E from its Stinespring dilation: a random unitary matrix U of size (8 × 8), given by U = e^{−iHt} for a complex Hermitian matrix H whose entries are sampled from a Gaussian distribution of unit variance, centred at 0. We vary the time parameter t between 0.001 and 0.1 to vary the noise strength. Figure 1 shows that logical error rates can vary wildly across physical noise processes with fixed infidelity or fixed diamond distance, in agreement with Ref. [10]. The variation, captured by the amount of dispersion in the scatter plots, is quantified using a simple measure: the ratio of the maximum to the minimum logical error rate across channels of similar physical error rate, denoted by ∆. In other words, we partition the range of physical error rates into bins b_i and use ∆(b_i) = (1/|b_i|) (max_{p∈b_i} y_p)/(min_{p∈b_i} y_p) to quantify the amount of dispersion, where |b_i| is the number of channels in the bin b_i. The large fluctuations in the logical error rates can be attributed to two extreme features of the error metrics. While infidelity controls only one parameter out of the many that specify a noise process, the diamond distance suffers from being sensitive to details of a noise process that are irrelevant to the logical error rate. In addition, standard error metrics can only reveal intrinsic properties of the underlying noise process, agnostic to the choice of an error correcting code.
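The sampling procedure described above can be sketched as follows. This is a hedged reconstruction (not the paper's code): an 8 × 8 Gaussian-random Hermitian H defines U = e^{−iHt}, the 4-dimensional environment is traced out from |0⟩ to obtain Kraus operators, and the average gate infidelity follows from the standard formula r = 1 − (Σ_i |Tr K_i|²/d + 1)/(d + 1).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_channel_kraus(t, dim_env=4):
    d = 2
    n = d * dim_env
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    H = (A + A.conj().T) / 2                       # random Hermitian matrix
    w, V = np.linalg.eigh(H)
    U = (V * np.exp(-1j * w * t)) @ V.conj().T     # U = exp(-i H t)
    U = U.reshape(d, dim_env, d, dim_env)
    # Kraus operators K_i = (I ⊗ <i|) U (I ⊗ |0>), environment starts in |0>
    return [U[:, i, :, 0] for i in range(dim_env)]

def avg_gate_infidelity(kraus):
    d = 2
    f = (sum(abs(np.trace(K)) ** 2 for K in kraus) / d + 1) / (d + 1)
    return 1 - f

ks = random_channel_kraus(t=0.01)
# Unitarity of U guarantees trace preservation: sum_i K_i^† K_i = I
assert np.allclose(sum(K.conj().T @ K for K in ks), np.eye(2))
```

Varying t between 0.001 and 0.1, as in the text, sweeps the channel from near-identity to moderately noisy.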
The logical estimator with RC, in contrast, is very strongly correlated with the logical error rate. This improvement can be attributed to two features. First, RC provides a drastic reduction from the O(12^n) parameters that specify an n-qubit Markovian noise process to O(4^n) Pauli error probabilities. Second, unlike standard error metrics, p̂_u carefully accounts for the Pauli error probabilities that contribute to the logical error rate. Section F of the appendix shows drastic gains in predictability using the logical estimator with RC, over standard error metrics, for the class of coherent errors.
The special setting of i.i.d. noise hides the drastic advantages provided by p_u in predicting logical infidelity, because the dominant contribution to p_u comes from χ_{0,0}, which is also well captured by r. However, for correlated error models, given only χ_{0,0}, the uncertainty in the logical error rate ranges between the extremities 0 and 1, achieved when all the multi-qubit errors are correctable and uncorrectable, respectively. While r is completely insensitive to either of these scenarios, p_u helps distinguish between them, thereby providing a far more accurate estimate of the logical error rate.
We support the above argument with numerical studies of correlated Pauli error models generated from a convex combination of an i.i.d. process of infidelity r_0 and multi-qubit interactions. While the i.i.d. component E_iid is specified by single qubit error probabilities, the multi-qubit interactions are specified by an arbitrary subset S, so E_cor(ρ) = Σ_{P ∈ S} χ_{P,P} P ρ P, where χ_{P,P} is sampled from the normal distribution with mean and variance 4^n r_0. The combined Pauli error model is the convex combination of E_iid and E_cor. Explicitly setting χ_{0,0}, followed by appropriate normalization, ensures that the infidelity of the above noise model is r_0. Figure 1(c) highlights the importance of the logical estimator over the standard infidelity error metric for predicting the performance of the concatenated Steane code under correlated Pauli noise processes.
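A hedged sketch of this construction for three qubits follows. The mixing parameter `alpha`, the even X/Y/Z split of the i.i.d. component, and the use of absolute values to keep probabilities non-negative are assumptions not fixed by the text; the infidelity convention is the paper's r = 1 − χ_{0,0}.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

def correlated_pauli_model(n, r0, n_corr, alpha=0.5):
    paulis = list(product("IXYZ", repeat=n))
    # i.i.d. component: single-qubit error probability r0, split evenly over X, Y, Z
    single = {"I": 1 - r0, "X": r0 / 3, "Y": r0 / 3, "Z": r0 / 3}
    iid = np.array([np.prod([single[q] for q in P]) for P in paulis])
    # correlated component: random weights on a random subset S of non-identity terms,
    # drawn with scale 4^n * r0 (variance convention assumed) and kept non-negative
    cor = np.zeros(len(paulis))
    S = rng.choice(np.arange(1, len(paulis)), size=n_corr, replace=False)
    cor[S] = np.abs(rng.normal(0.0, (4**n) * r0, size=n_corr))
    cor /= cor.sum()
    probs = (1 - alpha) * iid + alpha * cor
    # explicitly set chi_{0,0} and renormalize so the model has infidelity exactly r0
    probs[0] = 0.0
    probs *= r0 / probs.sum()
    probs[0] = 1 - r0
    return dict(zip(paulis, probs))

model = correlated_pauli_model(n=3, r0=1e-3, n_corr=10)
assert abs(sum(model.values()) - 1) < 1e-12
```

The final two lines implement the "explicitly setting χ_{0,0} followed by appropriate normalization" step quoted above.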

A. Logical estimator using limited NR data
Even in the absence of correlations across the n-qubit code blocks of a concatenated code, we require O(4^n) Pauli error rates from NR to compute p̂_u. Extracting this exponential-sized NR dataset is a challenge for experimentalists. Refs. [14,49] describe how to extract the leading K Pauli error probabilities in a noise process, where K ≪ 4^n. We want to combine a handful of leading Pauli error rates extracted by NR with a simple method to extrapolate the remaining ones. We define the probability of a Pauli error Q that is not given in the NR dataset by the i.i.d.-style extrapolation Pr(Q) ∝ r_0^{wt(Q)}, where wt(Q) is the Hamming weight of Q, and r_0 is derived from the infidelity of the noise process: r = 1 − (1 − r_0)^n. We construct an adversarial example of an error model where the above extrapolation is unlikely to perform well by setting some multi-qubit error probabilities that violate eq. (5). Furthermore, when errors are sampled uniformly from the set of correctable and uncorrectable errors, we observe maximum fluctuations in the logical error rate. However, Fig. 2 presents strong numerical evidence indicating that the simple extrapolation works well in practice even for the adversarial example.

FIG. 2: Accuracy of the logical estimator based on limited NR data, using a level-2 concatenated Steane code for an ensemble of about 15000 random correlated Pauli channels. The accuracy, quantified by ∆, improves sharply with the number of Pauli error rates (K) extracted using NR. We observe that for K = 200, which is about 1.2% of all Pauli error rates on the Steane code block, the accuracy closely matches the logical estimator computed using all NR data, i.e., K = 4^7.
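The combination of measured and extrapolated rates can be sketched as below. The normalization (an even split of weight-w probability over X, Y, Z) is an assumption for illustration; the paper's exact eq. (5) may differ.

```python
from itertools import product

def extrapolate(known, n, r):
    """Fill in Pauli error rates missing from the NR dataset `known`,
    using an i.i.d.-style guess based only on the infidelity r."""
    r0 = 1 - (1 - r) ** (1 / n)   # per-qubit rate from r = 1 - (1 - r0)^n
    full = {}
    for Q in product("IXYZ", repeat=n):
        wt = sum(q != "I" for q in Q)
        # assumed form: weight-wt probability split evenly over X, Y, Z
        full[Q] = known.get(Q, (r0 / 3) ** wt * (1 - r0) ** (n - wt))
    return full

# Measured NR entries always take precedence over the extrapolation:
full = extrapolate({("X", "X", "X"): 0.5}, n=3, r=0.01)
assert full[("X", "X", "X")] == 0.5
```

With an empty dataset the extrapolated rates form a normalized i.i.d. distribution, so the logical estimator can always be evaluated on a complete set of 4^n rates.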

B. Code selection
Selecting a quantum error correcting code that has the smallest logical error rate under a given physical noise process is a crucial step in optimizing resources for fault tolerance. To demonstrate the efficacy of the logical estimator for this problem, we consider an example of an error model and two different error correcting codes: (i) the concatenated Steane code and (ii) a concatenated version of a [[7,1,3]] code used in Ref. [2] that we refer to as the Cyclic code. The error model is obtained from a Pauli twirl of the i.i.d. application of a CPTP map E, with a bias specified by η = p_Z/p_X. Based on Ref. [2], we expect the Steane code to outperform the Cyclic code in one noise regime, and the converse in a different regime. Our tool is successful if it produces a lower value of p̂_u for the code with lower logical infidelity, for any noise rate. Lastly, to compute the logical estimator as well as the logical error rate estimates, we use a bias-adapted minimum-weight decoder that assigns weights η, η, and 1 to each Pauli error of type X, Y, and Z, respectively.
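The bias-adapted weighting rule quoted above can be sketched as follows; candidate generation (all errors consistent with a syndrome) is assumed given, since only the scoring rule is specified in the text.

```python
def error_weight(pauli_string, eta):
    """Score a candidate Pauli error: eta per X or Y, 1 per Z, 0 per I."""
    w = {"I": 0.0, "X": eta, "Y": eta, "Z": 1.0}
    return sum(w[p] for p in pauli_string)

def bias_adapted_choice(candidates, eta):
    """Pick the lowest-weight candidate consistent with the syndrome."""
    return min(candidates, key=lambda e: error_weight(e, eta))

# With strong Z bias (eta >> 1), a two-qubit ZZ error is the preferred
# explanation over a single X for the same syndrome:
assert bias_adapted_choice(["XII", "ZZI"], eta=10) == "ZZI"
```

For eta < 1 (X-biased noise) the preference reverses, which is the mechanism by which the decoder adapts to the bias.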
Our results presented in Fig. 3 show that the logical estimator correctly identifies the optimal code for all values of the physical error rate (bias). Furthermore, it also replicates the functional form of the logical error rate, showing that the performance gain from the Cyclic code over the Steane code increases with the bias.

IV. CONCLUSIONS
We have shown how experimental data from NR, even limited data, can be used to successfully predict the logical performance of FT architectures based on concatenated codes. It can be used to precisely and efficiently estimate the resource overhead required to achieve a target logical error rate [2,50,51] for implementing quantum algorithms. Along with informing the choice of an optimal code for an underlying physical noise process, the logical estimator provides directives for other components in a FT scheme, such as a decoder. Different lookup table decoders can be compared using our logical estimator, similar to the work in Refs. [52][53][54].
Our scheme relies on RC to yield a Pauli error model, and although in theory this requires twirling with the full Pauli group, it has been observed in practice that a handful of random compilations of the original circuit are sufficient to achieve an almost Pauli-like effective noise process [55,56]. A natural question that follows is whether RC also mitigates the impact of physical noise on the logical qubit. There is no persistent trend across the general class of Markovian noise processes, and in some cases, RC degrades the performance of the code. Developing noise tailoring techniques that guarantee an improvement to the performance as well as predictability is an interesting problem for future research.
Although the methods and techniques presented in the paper address generic noise processes, there are a number of roadblocks in broadening the scope of this study beyond concatenated codes, where the complexity of computing the logical estimator grows exponentially with the size of the code. In addition, further research is needed to extend these ideas to the context of multiple logical qubits.

Appendix A: The effective logical channel

The average logical channel Ē_1, defined in eq. (2), summarizes the effect of quantum error correction on a physical noise process E_0 affecting an encoded state ρ. In this section, we derive a closed form expression for the average logical channel in terms of the physical channel and the error correcting code parameters. Similar derivations have appeared in [39,40,57]; however, we present ours for the sake of completeness.
The action of the average logical channel defined in eq. (2) on the logical state is given in eq. (A1), where in the last line we used the fact that Π_s P_i = P_i Π_{s⊕s(P_i)}. In other words, whenever s ≠ s(P_i), the projector Π_{s⊕s(P_i)} annihilates the encoded state ρ.
The chi matrix χ̄ of the effective logical channel, defined with respect to the logical operators P̄_l and P̄_m of the code, can be extracted from eq. (A1). The total probability of errors successfully corrected by the decoder, χ̄_{00}, can be estimated from the following observation. An error whose syndrome is s is corrected if the net effect of applying the error along with the recovery prescribed by the decoder is the action of a stabilizer. In other words, all the terms in eq. (A1) where R_s P_i and P_j R_s are stabilizers contribute to χ̄_{00}. So χ̄_{00} can be written as a sum over such terms, where Ē is the logical component in the decomposition of E with respect to the stabilizer group and φ(E) is specified by R_{s(E)} E = φ(E) S for any Pauli error E and some stabilizer S. The average logical infidelity r is then given by 1 − χ̄_{00}. When a Pauli error is not correctable, the effect of applying a recovery yields a logical operator. Hence, in general, R_{s(E)} E P̄_l = φ(E, s) S, for l ∈ {0, 1, 2, 3}, any Pauli error E and some stabilizer S.

Appendix B: Quantum error correction with randomized compiling
In this section, we discuss how randomized compiling (RC) can be performed in fault tolerant circuits. Note that a Pauli error P can be decomposed with reference to a stabilizer code: P = P̄ S_P E_P, where S_P is an element of the stabilizer group S, P̄ is a logical Pauli operator in L = N(S)/S, and E_P is an element of N(L)/S, usually called a pure error [45,58]. Unlike pure errors, stabilizers and logical operators commute with QEC routines. A Pauli error P can be compiled into QEC, resulting in a new quantum error correction routine QEC(P) in which the input to the decoder corresponding to a syndrome outcome s is s ⊕ s(P) [59,60].
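The syndrome-offset rule s → s ⊕ s(P) can be illustrated concretely. A hedged sketch for the 3-qubit repetition code (X-type frames only): instead of physically applying the Pauli frame, the decoder's input syndrome is XORed with the frame's syndrome.

```python
def syndrome_of_x_pattern(xs):
    """Parity checks Z1Z2 and Z2Z3 of the 3-qubit repetition code."""
    return (xs[0] ^ xs[1], xs[1] ^ xs[2])

def qec_with_frame(measured_syndrome, frame_xs):
    """QEC(P): feed the decoder s XOR s(P) instead of s, so the compiled
    Pauli frame P never has to be applied as a physical gate."""
    offset = syndrome_of_x_pattern(frame_xs)
    return tuple(m ^ o for m, o in zip(measured_syndrome, offset))

# A frame X on qubit 1 flips the first check: the decoder sees s ⊕ s(P).
assert qec_with_frame((0, 0), [1, 0, 0]) == (1, 0)
```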
In fault tolerant circuits, each logical gate G is sandwiched between QEC routines. Following the prescription in [11], we divide logical gates into two sets, S_1 and S_2, calling them easy and hard gates respectively. A crucial requirement on S_1 and S_2, stated in eq. (B1), involves all easy logical gates C ∈ S_1, n-qubit Pauli gates T and hard gates G. Recall that QEC(T) refers to the compilation of the Pauli gate T into the QEC routine, discussed in sec. I. In eq. (B2) we use the decomposition of Pauli gates with reference to a stabilizer code. Note that the expression G T G† in eq. (B3) is guaranteed to be an easy gate for the choice of easy and hard gate sets in [11]. Fig. A.2(a) shows a canonical presentation of a quantum circuit, where the k-th clock cycle is composed of an easy gate C_k and a hard gate G_k, sandwiched between QEC routines. Noise processes affecting easy and hard gates are denoted by E_{1,k} and E_{2,k} respectively. These complex processes can be tailored to Pauli errors by inserting Pauli gates T_{1,k}, T†_{1,k}, T_{2,k}, T†_{2,k}. However, to guarantee that they are applied in a noiseless fashion, we compile them into the existing gates of the fault tolerant circuit. This is achieved in two steps. First, T†_{1,k} and T_{2,k} are compiled into the QEC routine following E_{1,k}, resulting in QEC(T†_{1,k} T_{2,k}). Second, T†_{2,k} is propagated across G_k, and compiled with QEC and C_{k+1} T_{k+1}, resulting in a dressed gate C^D_{k+1} = G_k T_k G†_k · QEC · C_{k+1} T_{k+1}. It follows from eq. (B1) that C^D_{k+1} is equivalent to quantum error correction followed by an easy gate. Fig. A.2(c) shows the result of compiling all of the twirling gates into the easy gates and quantum error correction routines. Note that the compiled circuit is logically equivalent to the original circuit in the absence of noise. However, in the presence of noise, the average output of the circuit is dictated by the performance of QEC(T) averaged over the different choices of Pauli gates T.
This is what we refer to as QEC in the RC setting. In practice, this average performance can be achieved by repeating every iteration (shot) of the algorithm with a different Pauli operation compiled into the constituent QEC routines. For the purpose of the numerical simulations in this paper, we have used the performance of the QEC routine under the twirled noise process as a proxy for the performance of QEC in the RC setting.

Appendix C: Logical estimator for concatenated codes
In this section we discuss the derivation of the logical estimator p̂_u used to predict the logical error rate under physical Pauli noise processes. A stabilizer code and decoder pair is designed to correct a target set of errors E_C, called correctable errors [46,47]. For an [[n, k]] code, E_C can be partitioned into 2^{n−k} disjoint subsets E_{C,1}, . . . , E_{C,2^{n−k}}, each of which can be identified with a unique syndrome measurement outcome. The construction of the set E_{C,s} depends closely on the choice of a decoder. Recall that the output of a decoder on input syndrome s is a Pauli recovery operator R_s, i.e., R_s ∈ E_{C,s}. A key observation for constructing elements of E_{C,s} besides R_s is that any error of the form R_s S, where S is an element of the stabilizer group, is also correctable; so E_{C,s} = {R_s S : S ∈ S}. Uncorrectable errors cause the quantum error correction scheme to fail. We adopt the notation p_c to denote the total probability of correctable errors, p_c = Σ_{E ∈ E_C} χ_{E,E}, and p_u to denote the total probability of uncorrectable errors: p_u = 1 − p_c. It is easy to note that p_u is an upper bound on the average logical infidelity, which is measured by randomized benchmarking, i.e., r = 1 − χ̄_{0,0} ≤ p_u. In particular, for Pauli noise processes, eqs. C4 and C5 show that p_u is exactly the average logical infidelity r.
A detailed derivation of eq. C3 is presented in section A of this appendix. The expressions in eqs. C4 and C5 point to a conceptual difference between infidelity and the uncorrectable error probability. While r accounts for the effect of only the trivial correctable error I, p_u captures many more degrees of freedom, including all other correctable errors in E_C. Hence, we expect r to be a worse predictor of the logical infidelity than p_u. It is generally infeasible to enumerate all the O(4^{n−k}) correctable errors of an [[n, k]] stabilizer code to compute p_u exactly. Our logical estimator is the result of an efficient heuristic to approximate p_u, particularly for concatenated code families. Furthermore, its accuracy is provably high for uncorrelated Pauli error models.
While for concatenated codes the number of physical qubits itself grows exponentially in the size of a code block n, we can exploit the encoding structure to simplify the complexity of computing p_u. However, it turns out that despite this simplification we cannot compute p_u exactly in time that scales polynomially in the number of physical qubits. This leads us to resort to a heuristic method for a reasonable approximation of p_u for concatenated codes, described in the rest of this section. Here we present a method to measure and compute an approximation, denoted by p̂_u(C_ℓ), to the probability of uncorrectable errors for a concatenated code C_ℓ: p_u(C_ℓ). For ease of notation we also define the quantities p_c(C_ℓ) = 1 − p_u(C_ℓ) and p̂_c(C_ℓ) = 1 − p̂_u(C_ℓ). An error Ē for the level-ℓ concatenated code C_ℓ can be expressed as a tensor product of Pauli errors Ē_{ℓ−1,i} for the level-(ℓ−1) codes C_{ℓ−1,i}, as in eq. C6. Let us define Ē to be a correctable pattern if this tensor product corresponds to an encoded version of a correctable error for the code block C_ℓ. For example, Ē_2 = X̄ ⊗ Ī^{⊗6} is a correctable pattern for the ℓ = 2 concatenated Steane code, since X ⊗ I^{⊗6} is a correctable error for the Steane code block. A correctable error Ē for C_ℓ falls into one of two categories: either (i) it is corrected within the lower level code blocks C_{ℓ−1,1}, . . . , C_{ℓ−1,n}, or (ii) it has a non-trivial correction applied by the decoder of the level-ℓ code block C_{ℓ,1}. Let us denote the contribution to p_c(C_ℓ) from case (i) by Λ and that from case (ii) by Γ, so that p_c(C_ℓ) = Λ(C_ℓ) + Γ(C_ℓ). (C7) Case (i) implies that each of the errors Ē_{ℓ−1,i} is a correctable error for the code C_{ℓ−1,i}.
Therefore, the total probability of correctable errors in case (i) admits the recursive definition in eq. C8. Recall that case (ii) covers the total probability of non-trivial correctable patterns for C_ℓ (eq. C10), where we have used the fact that each correctable error corresponds to a pattern according to eq. C6. A logical error Ē_{ℓ−1,i} occurs on the code block C_{ℓ−1,i} whenever the decoder fails to correct the physical errors, in such a way that the residual effect of the physical noise process affecting the qubits of C_{ℓ−1,i} and the recovery operation applied by the decoder results in Ē_{ℓ−1,i}. Let us denote by Pr_D(Ē_{ℓ−1,i} | s(C_{ℓ−1,i})) the probability that the decoder for C_{ℓ−1,i} leaves a residual Ē_{ℓ−1,i}, conditioned on the syndrome measurements. We can rewrite eq. C10 as eq. C11, where Pr(s(C_ℓ) | s(C_{ℓ−1,1}) . . . s(C_{ℓ−1,n})) is the conditional probability of measuring the syndrome outcome s(C_ℓ) on the code block C_ℓ when the outcomes on the lower level code blocks C_{ℓ−1,1}, . . . , C_{ℓ−1,n} are s(C_{ℓ−1,1}), . . . , s(C_{ℓ−1,n}), respectively; eq. C12 gives an equivalent form. A major hurdle in computing Γ using eq. C12 is the sum over an exponentially large set of syndrome outcomes for the concatenated code. To circumvent this difficulty, we apply an efficient heuristic to approximate the probability in eq. C13. In essence, we replace the conditional channel E^{s(C_{ℓ−1,i})}_{ℓ−1,i} by the average logical channel Ê_{ℓ−1,i}, defined in eq. C14. Note that Ê_{0,j} is the physical noise model while Ê_{1,j} is the exact average logical channel Ē_{1,j}. However, in general for ℓ ≥ 2, Ê_ℓ is a coarse-grained approximation to the exact average logical channel Ē_ℓ. In other words, Ê_{ℓ−1,i} is computed using the knowledge of the syndrome bits measured only at level ℓ − 1, while assuming the noise model Ê_{ℓ−2,1} ⊗ . . . ⊗ Ê_{ℓ−2,n}, which accounts for the average effect of all syndrome measurements at lower levels.
Replacing the conditional channel E^{s(C_{ℓ−1,i})}_{ℓ−1,i} in eq. C12 by the average channel Ê_{ℓ−1,i} defined in eq. C14 allows us to approximate Γ by the quantity Γ̂ defined in eq. C16.
Let R(s(C_{ℓ−1,i}), P) denote the set of n-qubit errors on which a lookup table decoder for the code block C_{ℓ−1,i} leaves a residual logical error P when the error syndrome s(C_{ℓ−1,i}) is encountered. Then Pr_D(Ē_{ℓ−1,i} | Ê_{ℓ−1,j}) can be computed recursively in terms of Pr_D(Q_{ℓ−2,j} | Ê_{ℓ−2,j}).
Note that the probability of leaving a residual error at level 0 is simply specified by the physical noise model, i.e., Pr_D(P | Ê_{0,j}) is the probability of the Pauli error P on the physical qubit j. This concludes the method to efficiently compute Γ̂, an approximation to Γ.
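The base case of this recursion can be made concrete. A hedged sketch for a single 3-qubit repetition block with bit-flip noise: for each physical error pattern, apply the lookup decoder and record whether the residual is logical I or logical X, yielding the level-1 residual distribution Pr_D(· | level-0 model) that feeds the next level.

```python
from itertools import product

# Minimum-weight lookup table for the 3-qubit repetition code.
LOOKUP = {(0, 0): (0, 0, 0), (1, 0): (1, 0, 0), (1, 1): (0, 1, 0), (0, 1): (0, 0, 1)}

def residual_distribution(p):
    """Probability of residual logical I (key 0) and logical X (key 1)
    after decoding one block under i.i.d. bit-flip noise of rate p."""
    pr = {0: 0.0, 1: 0.0}
    for err in product((0, 1), repeat=3):
        s = (err[0] ^ err[1], err[1] ^ err[2])
        corrected = tuple(e ^ c for e, c in zip(err, LOOKUP[s]))
        w = sum(err)
        # after correction the pattern is (0,0,0) or (1,1,1); the first bit
        # flags a residual logical X
        pr[corrected[0]] += p**w * (1 - p) ** (3 - w)
    return pr

dist = residual_distribution(1e-2)
assert abs(sum(dist.values()) - 1) < 1e-12
```

Feeding such per-block residual distributions into the decoder of the next level, in place of the full conditional channels, is exactly the coarse-graining that defines Γ̂.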
Recall that the total probability of correctable errors is given by eq. C7. An approximation to p_c(C_ℓ) is given by p̂_c(C_ℓ) = Λ̂(C_ℓ) + Γ̂(C_ℓ) (C18), where Γ̂ is defined in eq. C16 and Λ̂ is defined in a similar fashion to eq. C8. Using the approximation in eq. C18, we can efficiently estimate the logical estimator p̂_u for concatenated codes.
Appendix D: Approximation quality for the uncorrectable error probability

In this section, we quantify the accuracy of approximating the uncorrectable error probability by p̂_u for concatenated codes. For simplicity, we assume that the code blocks in the concatenated code are all identical, equal to an [[n, 1, d]] quantum error correcting code with d ≥ 3. Recall that the distance of a level-ℓ concatenated code scales as d^ℓ. We use t_ℓ = (d^ℓ + 1)/2 to denote the Hamming weight of the smallest uncorrectable error. Recall that p̂_u is defined recursively as the sum of two quantities (eq. C18). We use δ_ℓ to denote the inaccuracy in computing p_u for a level-ℓ concatenated code (eq. D1), and γ_ℓ to denote the inaccuracy in computing Γ (eq. D2); eq. D3 then follows. The most important ingredient in computing δ_ℓ is γ_ℓ, defined in eq. D2. For simplicity we compute γ_ℓ for the i.i.d. depolarizing error model. However, for generic i.i.d. Pauli error models, we can replace the depolarizing rate p in our analysis by the physical infidelity of the single qubit error model, r_0. The extension to correlated Pauli error models remains unclear. An i.i.d. application of the depolarizing channel on n qubits can be described by E(ρ) = Σ_{P ∈ P_n} χ_{P,P} P ρ P, with χ_{P,P} = (p/3)^{|P|} (1 − p)^{n−|P|}, where P_n is the n-qubit Pauli group, 0 ≤ p ≤ 1 is the depolarizing rate and |P| is the Hamming weight of the Pauli error P. In this case, we show that eq. D5 holds for a level-ℓ concatenated code. Combining eq. D5 with eq. D3, we arrive at the expression for δ_ℓ in eq. D6, where d is the distance of a code block.
In the rest of this section, we will derive eq. D5. Recall that eq. C16 outlines the approximation made by the heuristic to compute $\Gamma(\mathcal{C}_\ell)$. It involves replacing the knowledge of the conditional channels $\mathcal{E}^{s}_{\ell-1,j}$ by the average channel $\hat{\mathcal{E}}_{\ell-1,j}$. We will prove the scaling in eq. D5 in two steps. The first is the observation in eq. D7, which follows from the fact that at least one of the errors $E_{\ell-1,j}$ in the error pattern $E_{\ell-1,1} \otimes \ldots \otimes E_{\ell-1,n}$ must be non-identity. Note that a non-identity logical error is left as a residual when the decoder for the subsequent lower level fails. Such an event will not occur for errors whose weight is below $t_{\ell-1}$. The second step consists of showing the scaling of $\Pr(s(\mathcal{C}_{\ell-1,j}))$ in eq. D8. Recall from eq. C14 that the average channel $\hat{\mathcal{E}}_{\ell,i}$ is defined recursively in terms of $\hat{\mathcal{E}}_{\ell-1,j}$. While the term corresponding to $s(\mathcal{C}_{\ell,i}) = 0$ describes the effect of stabilizers on the input state, the other terms include the effect of non-trivial errors. Note that a non-trivial error $E$ has weight at least $t_{\ell-1}$, equal to the weight of the smallest uncorrectable error of the concatenated code $\mathcal{C}_{\ell-1,j}$. Carrying this idea from level $\ell - 1$ down to level 1, we find the chain of relations in eqs. D9-D11, where in eq. D10 we have used the fact that the leading contribution to the conditional channel for the trivial syndrome is the physical channel itself. Equation D11 describes the recursion until level $\ell = 1$, where $\hat{\mathcal{E}}_{1,j} = \mathcal{E}_{1,j}$.
Recall that the conditional channel for an error-syndrome $s(\mathcal{C}_{\ell-1,i})$ is defined by applying quantum error correction routines corresponding to the syndrome outcomes in the respective code-blocks of $\mathcal{C}_{\ell-1,i}$. Note that an error is detected (by means of a non-trivial syndrome outcome) in a code block at level $\ell$ when the decoder operating on the code block at level $\ell - 1$ leaves a non-trivial residue. Hence, for a leading-order analysis, we will consider conditional channels that correspond to trivial syndromes in all the code-blocks except for those at level one, i.e., $s(\mathcal{C}_{\ell,i}) = 0$ for all $\ell > 1$ in eq. D12. In other words, we will consider errors that are corrected within the code blocks at level one, as expressed in eq. D13. Using eqs. D11 and D13, we note that the quality of the approximation in eq. D8 can be bounded as in eqs. D14-D16, where $\chi^{s}_{1}$ refers to the chi matrix of the conditional channel $\mathcal{E}^{s}_{1}$ while $\hat{\chi}_{1}$ refers to the chi matrix of the average channel $\hat{\mathcal{E}}_{1}$. In eq. D16, we have used the matrix norm $\lVert A \rVert_{\infty}$ to refer to the maximum absolute value in the matrix $A$.
To establish the scaling in eq. D8 it remains to show eq. D17. Recall that the effective channel for a given syndrome $s$, $\mathcal{E}^{s}_{1}$, describes the composite effect of the physical noise process and quantum error correction conditioned on the measurement outcome $s$. Comparing eq. A1 to the general form in eq. A2, we find an expression similar to eq. A4. For the specific case of the depolarizing channel in eq. D4, we can express $[\chi^{s}_{1}]_{i,i}$, $\Pr(s)$ and $\hat{\chi}_{i,i}$ as polynomials in the depolarizing rate $p$, with coefficients $A^{s}_{i,w}$ counting the number of Pauli errors $Q$ of Hamming weight $w$ on which the action of the decoder leaves a residual logical error $P_i$. In other words, $Q = P_i R_s S$, where $R_s$ is the recovery operation prescribed by the decoder for the error-syndrome $s$ and $S$ is any stabilizer. We can use two simple facts about errors to simplify the coefficients $A^{s}_{i,w}$. First, since the only error of Hamming weight zero is the identity, which has $s = 0$, we find $A^{s}_{i,0} = \delta_{s,0}\,\delta_{i,0}$. Second, since all errors of Hamming weight up to $(d-1)/2$ are correctable, we find $A^{s}_{i,w} = \delta_{i,0}\, A^{s}_{0,w}$ for all $w \leq (d-1)/2$. Using these simplifications, it is now straightforward to see that eq. D17 follows from the above set of equations.
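The coefficients $A^{s}_{i,w}$ and the two simplifications above can be verified by brute force on a small example. The sketch below uses the three-qubit bit-flip repetition code restricted to $X$ errors (a minimal stand-in for the codes considered here, not the paper's setup) with a majority-vote decoder, tabulating the number of weight-$w$ errors for each syndrome $s$ and residual logical class $i$:

```python
from itertools import product

def syndrome(e):
    """Parity checks Z1 Z2 and Z2 Z3 of the three-qubit bit-flip code."""
    return (e[0] ^ e[1], e[1] ^ e[2])

# Minimum-weight (majority-vote) decoder: recovery for each syndrome.
RECOVERY = {(0, 0): (0, 0, 0), (1, 0): (1, 0, 0),
            (1, 1): (0, 1, 0), (0, 1): (0, 0, 1)}

# A[(s, i, w)] = number of weight-w X-error patterns with syndrome s after
# which the decoder leaves residual logical class i (0: identity, 1: logical X).
A = {}
for e in product((0, 1), repeat=3):
    s = syndrome(e)
    residual = tuple(a ^ b for a, b in zip(e, RECOVERY[s]))
    i = 1 if residual == (1, 1, 1) else 0
    w = sum(e)
    A[(s, i, w)] = A.get((s, i, w), 0) + 1
```

In this toy tabulation the only weight-0 entry is $A^{0}_{0,0} = 1$ and no error of weight up to $(d-1)/2 = 1$ leaves a logical residue, matching the two simplifications used above.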
In summary, this section establishes that the approximation used by the heuristic to compute $p_u(\mathcal{C}_{\ell,1})$ is accurate to $O(n^{\ell+1}\, p^{2 + (d+1)/2})$ for the i.i.d. depolarizing physical error model with error rate $p$. To get a sense of this approximation quality, we can plug in relevant numbers for an i.i.d. Pauli error model and a level-2 concatenated Steane code: $p = 10^{-3}$, $n = 7$, $\ell = 2$, $d = 3$. Numerical simulations of quantum error correction yield an estimate of the logical infidelity given by $4.2 \times 10^{-9}$. The analytical bound suggests that the logical estimator derived from our heuristic method agrees with the logical infidelity up to $O(10^{-11})$. However, the scaling suggests that the heuristic may not be accurate for large codes in the high-noise regime. Nonetheless, we have strong numerical evidence supporting that the logical estimator predicts the functional form of the logical infidelity.
Appendix E: Time complexity of computing $p_u$ for concatenated codes

Recall that $p_u = 1 - p_c(\mathcal{C}_\ell)$, where $p_c(\mathcal{C}_\ell)$ is an approximation to the total probability of correctable errors. We will analyze here the time complexity of the technique described in section C to compute $p_c(\mathcal{C}_\ell)$.
Using the above solution in eq. E1, we find the time complexity for computing $p_u$ quoted in the main text.

FIG. 5: The dispersion in the scatter corresponding to a metric ($\Delta$ in the insets) is indicative of its predictive power. The gains in predictability offered by our tool are drastic for the above case of unitary errors when compared to CPTP maps.

Appendix F: Predictability results for coherent errors
Numerical results presented in section III highlight the predictive power of the tools developed in this work with respect to the standard error metrics, under random CPTP maps. Although CPTP maps encompass a wide range of physical noise processes, our method of generating random CPTP maps does not draw attention to an important class of noise processes: coherent errors, a special case of CPTP maps in which the evolution of a qubit is described by a unitary matrix. They occur due to imperfect control of quantum devices and calibration errors [61,62]. Various methods, such as dynamical decoupling [63,64], designing pulses using optimal control theory [65] and machine learning approaches [66], are used to mitigate these errors. However, each of these methods has its shortcomings, and unitary errors continue to form a major part of the total error budget [67][68][69]. The methods presented in this paper will be particularly advantageous in these cases.
In this section we highlight the predictive power of our tool, over standard error metrics, under different coherent noise processes. We choose a simple class of coherent errors modeled by an unknown unitary $U_i$ on each physical qubit $i$, of the form $U = e^{-i \frac{\pi}{2} \delta\, \hat{n} \cdot \vec{\sigma}}$, where $\delta$ sets the angle of rotation about an axis $\hat{n}$ on the Bloch sphere. With a slight loss of generality, we will consider $n$-qubit unitary errors of the form $\otimes_{i=1}^{n} U_i$. We control the noise strength by rotation angles $\delta_i$ drawn from a normal distribution of mean and variance equal to $\mu_\delta$, where $10^{-3} \leq \mu_\delta \leq 10^{-1}$. Figure 5 shows that logical error rates vary over several orders of magnitude across coherent errors of the same noise strength as measured by standard error metrics such as infidelity and the diamond distance. In contrast, our tools provide an accurate prediction using the logical estimator developed in section C. Moreover, we observe a drastic gain in predictability using our tools for this case of unitary errors, when compared to CPTP maps in figure 1 of the main text.

FIG. 6: The above figures highlight the rapid convergence rate of the importance sampler as compared to the direct sampler, under CPTP noise processes in Fig. 6(a) and coherent errors in Fig. 6(b). Each trend line in the figures is associated to a physical noise rate. While different colors are used to identify different physical error rates, solid and dashed lines are used to distinguish between the sampling techniques. Note that while the direct sampler takes a large number of syndrome samples to provide a reliable estimate of $r(\mathcal{E}_\ell)$, the importance sampler achieves this task with far fewer syndrome samples. The speedup offered by importance sampling is quite drastic. The case of $r = 4 \times 10^{-3}$ in Fig. 6(b) is a good example: the direct sampler shows signs of convergence around $10^{7}$ syndrome samples, whereas the importance sampler converges with just $10^{4}$ samples.
Notice, however, that with only $10^{4}$ samples, the direct sampler underestimates $r(\mathcal{E}_\ell)$ by almost two orders of magnitude. In summary, the average logical error rate is grossly underestimated unless an unreasonably large number of outcomes is sampled. We will resort to an importance sampling technique, proposed in [10] and discussed in section 3.3 of [37], to improve our estimate of the average logical error rate; we briefly review the technique here for completeness. Previously, similar techniques have also been discussed for Pauli noise processes in [71,72]. Instead of sampling the syndrome probability distribution $\Pr(s)$, we sample an alternate distribution $Q(s)$, which we will simply refer to as the importance distribution. The corresponding sampling methods with $\Pr(s)$ and $Q(s)$ will be referred to as direct sampling and importance sampling, respectively.
The expression for the average logical error rate estimated by the importance sampler takes the form of eq. G2, where $\hat{s}$ is a random syndrome outcome drawn from the importance distribution $Q(s)$. The average estimated by importance sampling coincides with $r(\mathcal{E}_\ell)$, which is estimated by the direct sampling technique. The crucial difference between the two sampling techniques is that the variance of the estimated average can be significantly lowered by an appropriate choice of the importance distribution $Q(s)$, which in our case takes the form
$$Q(s) = \frac{\Pr(s)^{1/k}}{Z},$$
where $Z$ is a normalization factor, $Z = \sum_{s} \Pr(s)^{1/k}$, and $k \in (0, 1]$ is chosen such that the total probability of non-trivial syndrome outcomes, $s \neq 00\ldots0$, is above a fixed threshold $\lambda_0$, i.e.,
$$\sum_{s \neq 00\ldots0} \frac{\Pr(s)^{1/k}}{Z} \geq \lambda_0.$$
(G4)

Figure 6 shows that our heuristic for the importance distribution provides rapid convergence to $r(\mathcal{E}_\ell)$ when compared to the direct sampling method. Note that the noise processes in these figures are the same as those used to compare the predictive powers of physical error metrics in figures 1 and 5. Hence, the employment of importance sampling is key to an honest comparison of the predictive powers of the physical error metrics.
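To make the variance-reduction mechanism concrete, here is a toy importance sampler. This is our illustration, not the paper's code: the syndrome distribution, failure rates and the flattening exponent are invented, and we use a flattened proposal $Q(s) \propto \Pr(s)^{\alpha}$ with $0 < \alpha < 1$ as a stand-in for the heuristic above. Reweighting each sample by $\Pr(\hat{s})/Q(\hat{s})$ keeps the estimate unbiased while rare, failure-carrying syndromes are drawn far more often:

```python
import random

random.seed(7)

# Toy syndrome distribution: the trivial outcome dominates, while the rare
# outcomes carry the logical failures (all numbers invented for illustration).
outcomes = ["000", "001", "010", "100"]
prob = {"000": 0.997, "001": 0.001, "010": 0.001, "100": 0.001}
logical_fail = {"000": 0.0, "001": 0.3, "010": 0.5, "100": 0.2}

# Exact average logical error rate: sum_s Pr(s) * f(s).
exact = sum(prob[s] * logical_fail[s] for s in outcomes)

# Flattened importance distribution Q(s) proportional to Pr(s)**alpha,
# with 0 < alpha < 1 so rare syndromes are sampled far more often.
alpha = 0.25
Z = sum(prob[s] ** alpha for s in outcomes)
Q = {s: prob[s] ** alpha / Z for s in outcomes}

def importance_estimate(num_samples):
    total = 0.0
    for _ in range(num_samples):
        s = random.choices(outcomes, weights=[Q[o] for o in outcomes])[0]
        total += logical_fail[s] * prob[s] / Q[s]  # reweight: unbiased estimate
    return total / num_samples
```

A direct sampler sees a non-trivial syndrome only about once per 333 draws here, whereas the flattened proposal draws one in roughly a third of the samples, which is the mechanism behind the convergence gap visible in figure 6.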