Independent State and Measurement Characterization for Quantum Computers

Correctly characterizing state preparation and measurement (SPAM) processes is a necessary step towards building reliable quantum processing units (QPUs). In this work, we discuss the subtleties behind separately measuring SPAM errors. We propose a protocol that can separately estimate SPAM errors, in the case where quantum gates are ideal. In the case where the quantum gates are imperfect, we derive bounds on the estimated SPAM error rates, based on gate error measures which can be estimated independently of SPAM processes. Our method shows that the gauge ambiguity in characterizing SPAM operations can be resolved, by assuming that there exists one qubit whose initial state is uncorrelated with other qubits in a QPU. We test the protocol on a publicly available five-qubit QPU and demonstrate its validity by comparing our results with simulations.


I. INTRODUCTION
Successfully operating quantum processing units (QPUs) requires sufficiently low error rates. Protocols that accurately characterize error rates in different components of a QPU are necessary for testing its quality. While there exists many well-developed methods that characterize errors of quantum gates such as quantum process tomography [1][2][3] and (variants of) randomized benchmarking [4][5][6][7][8][9], fewer have focused on studying state preparation and measurement (SPAM) errors which can be on the same order as (and sometimes surpass) gate errors in some current QPUs. For example, the combined SPAM error in current superconducting transmon qubit systems has been reported to range from 0.8% to 2% [10], while one-and two-qubit gates may achieve fidelities over 99.9% and 99%, respectively [11]. The requirement to repeatedly prepare qubits in well-defined initial states and perform syndrome measurements in quantum error correcting codes also puts SPAM operations on the same level of importance as gate operations.
Separately characterizing SPAM is not a straightforward task. Conventional approaches, such as quantum state tomography [1] or detector tomography [12][13][14], rely on the existence of some ideal set of measurements or probe states to determine the other. Gate-set tomography [15] avoids such unrealistic assumptions by simultaneously determining all state, gate and measurement operators. However, such a general treatment can only provide estimates up a gauge transformation [16,17], which can alter the relative strength between preparation and measurement errors. In this paper, we approach this problem from a new perspective, in view of the above issues. After illustrating the problem caused by the gauge freedom and a sufficient assumption to eliminate it, we provide a simple protocol from which the SPAM operators can be separately determined. We then derive upper and lower bounds on the estimated parameters in the case of non-ideal quantum gates, based on an error metric that can be estimated independently of SPAM, resolving the self-consistency problem. To make the protocol concrete, we performed it on a publicly available five-qubit QPU and obtained consistent results with a simulation. Our method provides new insights into the problem of SPAM characterization, and is valuable to validating QPUs. Moreover, it complements the many existing protocols that measure gate errors.

II. SPAM CHARACTERIZATION AND GAUGE AMBIGUITY
The ideal operations on a QPU generally include initializing the qubits in a state described by a density operator ρ, applying an arbitrary sequence of unitary gates, and making a final measurement described by a k-outcome positive operator-valued measure (POVM) M = {M 1 , . . . , M k }. We will assume here that the state which the QPU can be initialized to is unique. The implementation of each of these operations is imperfect due to a variety of noise processes. Noisy implementations of operations are denoted with an overset ∼ so that, for example,ρ is the noisy implementation of ρ. To avoid overcrowding the text, we do not put additional ∼'s on parameters describing any operator: their meaning can usually be understood from context, and additional special notations on parameters will be defined prior to being used.
Denoting the N -qubit Pauli basis as P N = {I, X, Y, Z} ⊗N , we can uniquely writẽ  In the Pauli-Liouville representation,ρ is represented as a 4 N × 1 vector |ρ with components 2 −N/2 s P [18], and similarlyM i as |M i . A linear map G is represented by Φ is called the Pauli transfer matrix, or PTM. In this picture, the result of mappingG to a stateρ is given by a matrix multiplication: |G(ρ) = ΦG|ρ . The probability p(ρ,G,M ) of an outcome corresponding to a POVM elementM given an input state |G(ρ) can be computed by the inner product via p(ρ,G,M i ) = M i |ΦG|ρ .
We now give an operational definition of what "SPAM errors" and "SPAM error rates" mean. Consider the experiment where one prepares the initial state and performs a measurement. The ideal and actual probabilities of obtaining outcome i are M i |ρ and M i |ρ respectively. We thus define the SPAM error to be the difference between these probabilities for this experiment, that is, as the vector δ SPAM with components Next, the state preparation (SP) error vector δ SP and the measurement (M) error vector δ M are defined by δ SP,i = δ SPAM,i (ρ, M ) and δ M,i = δ SPAM,i (ρ,M ), i.e., the SPAM error vector with the measurement/state preparation operators replaced by their ideal versions, respectively. For a single qubit with a two-outcome measurement, we can always write δ SPAM := (1 − , ) T . In the usual case where ρ = M 0 = |0 0|, corresponds to the probability of returning an outcome 1 when measuring ρ, which we will refer to as the SPAM error rate. Then, δ SP and δ M are also each characterized by a single parameter, which we will refer to as the SP-error rate SP , and the M-error rate M , respectively.
Ideally, one would like to obtain a full, unique description ofρ,M , and all possible control operationsG j . Unfortunately, this is impossible due to a gauge freedom [15][16][17]. In reality, these operators are hidden and we can only infer their values from probabilities based on the Born rule. It turns out that the choice of (ρ,G j ,M ) given a list of p(ρ,G j ,M ) is non-unique: they are related by a "gauge transformation" where B is an invertible matrix. This preserves all outcome probabilities when applied to all elements simultaneously, making the transformed set equally valid as the original set. On the other hand, most quality metrics for individual components (such as SP , M , or gate error rates) are not gauge-invariant. Since separate components in a QPU often require individual calibration in reality, having non-unique metrics is problematic because it becomes unclear whether an operation has improved (e.g., due to a change in control parameter) or not.
Next, we consider the weaker question of SPAM characterization, which boils down to estimating s P and m P,i . Previous studies on quantum state and detector tomography showed thatρ orM can be determined if the other is fully known. If we assume both to be in the most general form (satisfying only the physicality constraints that ρ is a density matrix andM is a POVM), but allow an arbitrary set of known, unitary gates G j , can we learn eitherρ orM ? Interestingly, the answer is still negative. In particular, since a unitary gate G j is trace-preserving and unital, it can be parametrized by where φ j is a block matrix with components φ P,Q,j . All outcome probabilities are thus in the form where we followed the definitions in Eq. (1) [19]. Among all such equations which can be constructed, s Q and m P,i always appear in a product form and cannot be separately solved for, assuming that φ only consists of constants. A gauge transformation (named "blame gauge" in [16]) of the form in the second term for some real number x will keep the equations unaltered [20]. While this transformation also needs to maintain the physicality constraints onρ and M , it is valid for most experimentally relevant cases [17]. Therefore, in addition to assuming ideal gates, one needs further assumptions about the structure of SPAM elements, and needs to design an effective operation that breaks this symmetry. Below, we will state a sufficient assumption, and present a protocol that achieves this by engineering Φ to depend upon the SPAM coefficients.

III. PROTOCOL ASSUMING IDEAL GATES
To develop a straightforward protocol, we engineer simplified effective SPAM operators based on an averaging technique in [21], by removing undesired components inρ andM . From now on, the qubit or system of qubits whose SPAM operators we would like to know will be called the target qubit (system), and we will use the subscript t to indicate parameters of the target qubit (system). For now we assume all quantum gates to be ideal, and will relax this later. Consider a single qubit initialized toρ and has a two-outcome POVM We will assume that the ideal state and measurement are ρ = M 0 = |0 0|, corresponding to s X = s Y = m X = m Y = 0, and s Z = m Z = 1 in Eq. (8). In reality these parameters deviate from the ideal, but we can use the following technique to eliminate some undesired components. By linearity of quantum operations and probabilities, applying two Pauli gates from the set {I, Z} immediately after state preparation and before the measurement, and averaging over outputs from all possible circuits would set s X = s Y = m X = m Y = 0 (this is similar to the phase cycling technique commonly used in nuclear magnetic resonance (NMR) spectroscopy to suppress spurious signals [22]). To fix m I , we apply an additional gate from {I, X} immediately before the measurement, and relabel the outcome when we apply an X so that the outcome 0 corresponds to the POVM element M 1 and vice versa. We label this with an overhead circle in Fig. 1 and Fig. 2. Averaging the results from these two circuits (with an I or X averaging gate) effectively sets m I = 1. Combining the above, the problem is now reduced to finding s Z,t and m Z,t on the target qubit q t .
A few words regarding SPAM averaging shall take place before we proceed. While the SPAM operators after averaging deviate from the original ones, they do represent the ones that actually enter the circuit, if SPAM averaging is consistently applied in all future circuits. Since our protocol estimates exactly the parameters in this averaged model, they will predict the correct experimental outputs for future circuits as well. Now, we provide a simple protocol that estimates s Z,t and m Z,t . We assume that there exists an ancillary qubit q a which can be prepared and measured independently, i.e. is described by two independent but unknown coefficients s Z,a and m Z,a (after applying the same SPAM averaging). If we apply a CNOT gate controlled on q t and targeted on q a (see Fig. 1), which we will call C t,a , the PTM on q a can be calculated to be a diagonal matrix which depends upon s Z,t as desired. In other words, the entangling CNOT gate propagates SP parameters of q t to q a . Because the measurement on q a is unaffected by the gate, we can learn s Z,t as follows: define α and β as the expectation values M 0 −M 1 a on q a in the absence and presence of the CNOT gate, respectively. A direct calculation [see Eq. (8) and Eq. (9)] shows which gives s Z,t = β t /α a . m Z,t can then be determined by a separate experiment that measures α t on q t with Note that the subscript for α refers to the qubit being measured, while for β it refers to the qubit whose parameter is being propagated. The SP-and M-error rates can then be computed as We can generalize this idea to measuring parameters of an N -qubit system, as long as we assume that there exists one additional ancilla q a that can be prepared and measured independently. This is thus a sufficient condition for breaking the gauge symmetry. Here, α a is estimated using the same circuit as the one on the left of Fig. 1. To estimate β P,t where P labels the Pauli components of the unknown initial state, the CNOT in the one-qubit case is generalized to U P = (H ⊗ I)C P (H ⊗ I), where C P corresponds to the unitary |0 0|⊗I +|1 1|⊗P , P = P 1 ⊗ · · · ⊗ P N , P i ∈ {I, Z}, and H corresponds to a Hadamard gate on q a . C P is controlled on q a and targeted on the N qubit system (see Fig. 2). As shown in Appendix A, the effect on q a is identical to Eq. (9) with s Z,t replaced by s P,t . Repeating for all possible P 's will fully determine ρ t , allowing one to determine the POVMs by detector tomography. This is summarized in Algorithm 1. Note that the number of circuits to average over grows exponentially with the size of q t . For large systems, one should instead randomly sample from the space of all SPAM averaging gates. Additionally, the controlled-P gate requires O(N ) controlled-Z gates and has depth O(N ) for the worst case, but can be achieved by two all-to-all Mølmer-Sørensen gates [23,24].
We would also like to point out that small modifications to Algorithm 1 would allow one to obtain all components of the SPAM operators in principle. For example, in the one-qubit case, one could average over {I, X} instead of {I, Z} to obtain |ρ ∼ (1, s X,t , 0, 0) T . Then performing a Y (−π/2) rotation would result in |ρ ∼ (1, 0, 0, s X,t ) T , so that s X can then be determined in exactly the same way as s Z,t before. But this is unnecessary if SPAM averaging is consistently applied for all future circuits, as we discussed previously. Apply the circuits in Fig. 2 with gate P , measure qa and record the result βP,t

IV. PROTOCOL WITH IMPERFECT GATES
We now take into account gate imperfections. The effects of gate errors must be treated in a way that does not rely on any prior information on SPAM, since they are assumed unknown. This prohibits using protocols like process tomography to extract the full effect of the gate in question, and substitute to replace Φ in Eq. (9). Protocols that estimate gate error strengths independently of SPAM offer a solution to the problem. Here we utilize the recently proposed cycle benchmarking (CB) [25] procedure. CB estimates the process infidelity of a composite cycle (consisting of a round of the original gatesG composed with a round of "dressing" gatesD), averaged over all Pauli dressing gates, namely where the process infidelity is The figure r CB is relevant when a quantum computation task is used in conjunction with a noise-tailoring procedure called randomized compiling (RC) [21]. Here, random twirling gates are inserted into the original circuit, such that the logical circuit is preserved. Uniformly averaging over all twirling gates turns the error of a composite cycle into a Stochastic Pauli channel P: P(ρ) = P ∈P N c P P ρP † , where c P 's form a probability distribution. The error rate of P is then precisely characterized by r CB (G j , G j ) [25]. This twirling is exact under the standard assumption that errors on twirling gates are gate-independent (which we assume throughout), and has a relatively small correction when gate dependence is present [21]. For simplicity we also make the standard assumption that one-qubit gates have a one-qubit error channel, however, we conjecture that this can be relaxed.  (12)] on a target qubit qt. α and β are defined in Eq. (10). rt,a is shorthand for rCB(Ct,a, Ct,a).
Let us now denote the parameters that would have been obtained with an ideal propagating cycle with a superscript ic (i.e., ideal cycle). These are the actual parameters describing our unknown SPAM operators, which are not affected by the imperfect gates. On the other hand, the ones that are actually obtained in experiments will be denoted as normal letters. We will show in Appendix B that β ic P,t can be bounded using the measured β P,t and r CB (Ũ P , U P ) as: (15) which holds independently of the dimension of q t . Since s P,t = β P,t /α a , and because α a does not involve gates with unknown effects, we see that (16) Repeating for all values of P would give a bound on each parameter s ic P,t of the estimated initial stateρ. Recall from our previous definition that the i-th component of δ SP is given by (17) where we used the superscript "ideal" to represent the ideal parameters of ρ. This is a linear function of s P , whose bounds are given by Eq. (16). Therefore the upper and lower bounds for δ SP,i can simply be obtained by optimizing each term in the sum, resulting in where sgn is the sign function, and we use the shorthand s − and s + to represent the lower and upper bounds in Eq. (16). The bounds for measurement parameters are more complicated, since one would need to perform measurement tomography based on the learned initial state, and different tomography approaches will lead to different bounds. But, the principles behind all approaches will be similar. Here we demonstrate with the simplest case of a linear inversion (LI) tomography. In LI detector tomography, one prepares an informationally-complete set of initial statesρ 1 , . . . ,ρ 4 N , all of which have been characterized using our procedure by assumption. For qubit measurements in the computational basis, the unknown POVM elements will correspond to the outcomes |0 ⊗N . . . |1 ⊗N , so there are a total of 2 N of them. Arrange the column vectors |ρ j into a 4 N × 4 N matrix S. Arrange the vectorized POVM elements |M i into a 2 N × 4 N matrix R. One then measures each basis stateρ i and records the data matrix with components D i,j = M i |ρ j . This gives the matrix relation: which can be inverted as R = D · S −1 to solve for the unknown matrix R. In the absence of gate error and measurement shot noise, this results in a noiseless reconstruction of the POVM elementsM i . In the presence of gate errors when measuring the statesρ j , we have learned from Eq. (18) that each component of the vector |ρ j is bounded. This uncertainty translates into uncertainties inM i through the matrix inverse, S −1 . In this case, each component of the resulted M i | is a (potentially highly nonlinear) function of the components of |ρ j . Nonetheless, the max and min values of M i | k are guaranteed by the extreme value theorem, and can be found using numerical programs such as SCIPY. From this, bounds on components of δ M can be derived in the same way as what we did for δ SP . The situation becomes particularly simple for one qubit with a two-outcome measurement, along with SPAM averaging. In this case there is only one unknown parameter s Z forρ and another one m Z forM 0 (M 1 is fixed byM 0 ). The bound for s Z is given directly in Eq. (16). Since m Z and s Z are inversely proportional [see Eq. (11)], the maximum of s Z gives the minimum of m Z , and vice versa. We then use Eq. (12) to convert to bounds on the error rates SP and M . These are summarized in Table I. Intuitively, a smaller gate error corresponds to a narrower range, and the region restores the previous point estimate [Eq. (10)] in the limit of perfect gates.
We incorporate gate error effects into Algorithm 1 by proposing a simple procedure to separately estimate the single qubit SP-and M-errors in a QPU. For each qubit i, label it as the "target" (t) and find an "ancilla" (a) such that a CNOT gate C t,a is allowed by the QPU's connectivity. We then run the one-qubit protocol to estimate the SPAM parameters in Table I. Repeatedly identifying each of the N qubits as the "target" gives a single-qubit SPAM characterization of the full device. This is summarized in Algorithm 2. While providing experimentally relevant single-qubit error rates for the full system, the protocol has an overhead that only scales linearly with the system size, making it a practical tool for many scenarios.
Algorithm 2 Estimating single qubit SP-and M-error rates on an N -qubit device 1: for each of the N qubits do 2: Label it as qt; choose an ancilla qa where Ct,a is allowed 3: Run the one-qubit protocol to estimate αa, αt, and βt

4:
Use cycle benchmarking to estimate rCB(Ct,a, Ct,a)

5:
Compute a regional estimate on SP,t and M,t on qt according to Table I. 6: end for We performed the above protocol on a publicly available five-qubit QPU (IBMQ-SANTIAGO [26]) to estimate SP and M on each qubit. For each target qubit, we chose the ancilla to be the one connected to it with the lowest error CNOT. The specification for the experiments can be found in Appendix D. We used the TrueQ [27] software to generate circuits, submit to the IBM-Q server, and perform data analysis. The range of possible error rates (i.e., between the upper and lower bounds in Table I) is shown in shaded regions for each estimated SP or M . The 95% confidence intervals (CIs) of the upper and lower bounds are shown as error bars, whose derivations can be found in Appendix C. Any region below zero is discarded due to the physicality constraint that error rates are positive by definition [Eq. (12)].
Since IBM-Q does not provide separate SP/M error rates for us to compare with, we simulated the same circuits by manually injecting SPAM errors, in the form of applying a Pauli X gate to the initial |0 state with probability SP,i and flipping the (classical) measurement outcome with probability M,i independently on each qubit i. We modeled noisy gates by assuming a simple T 1 + T 2 relaxation model, and using the relaxation time and gate time obtained from the provider. By manually adjusting the relative magnitudes between SP,i and M,i (indicated by the green stars) while fixing the total SPAM error to the measured value, we obtained a similar behavior compared to the data, strengthening our claim about noise on the physical device (see the bottom half of Fig. 3). Our result shows that state preparation contributes more to the total SPAM error on qubit 4, while not conclusively on the other qubits. Better distinguishability can be achieved if higher quality gates are available, as shown by the darker shaded regions in the bottom of Fig. 3, which represent the same bounds (error bars omitted) in Table I with all gate times reduced to 1/5 of their original values. and measurement (in blue) error rates on IBMQ-SANTIAGO QPU. Shaded regions represent the range of error rates consistent with the measured gate error, and error bars are 95% CIs for the endpoints. Green dots indicate measured total SPAM error SPAM. All estimates are cut-off below 0%. Bottom: A simulation assuming T1, T2 relaxation gate errors on two-qubit gates and ideal one-qubit gates, using the device's specifications. Green stars mark the magnitudes of SP-and M-errors used in the simulation, which combine to SPAM on each qubit. Darker regions mark the same bounds when all gate times are reduced to 1/5 of their original values.

V. CONCLUSIONS
In this work, we proposed a method to characterize state preparation and measurement errors independently on a QPU. In the case where quantum gates are ideal, our method returns the exact state preparation and measurement errors. In the case where quantum gates are imperfect, by utilizing randomized compiling and cycle benchmarking techniques, we derived upper and lower bounds for the estimated SPAM errors in terms of gate error rates that can be measured independently of SPAM. This resolves the self-consistency issue due to the gauge freedom. We demonstrated our protocol on a publicly available QPU and observed consistent results between the data and a computer simulation. We believe this protocol can be a valuable tool for benchmarking near-term quantum devices, in complement to the existing protocols that estimate errors on quantum gates. Here we show that the entangling cycle has the effect of "propagating" the desired component from the target qubits to the ancillary qubit. Specifically, first note that due to the commutation relations between Pauli operators, the only non-zero components of an N -qubit state after averaging over {I, Z} on each qubit are tensor products of I and Z: that is, To calculate the effective PTM on q a , we first note that [P, Q] = 0 for all P, Q ∈ {I, Z} ⊗N , and that all elements of {I, Z} ⊗N are involutory (meaning that they square to the identity). First consider the effect of the entangling gate only. We will consider a particular gate U = |0 0| ⊗ I + |1 1| ⊗ T , where T also belongs to the group of {I, Z} ⊗N . The PTM on q a is given by where the map G is the effect on q a by first attaching a state q t , applying the controlled-P gate, and tracing out q t . We can write the term G(Q) as Consider two separate cases: 1. Q = I or Q = Z. Then Q is diagonal and the only 4. An expanded version of the two-qubit propagating circuits that are actually carried out. The restoring gates are given by D † r = CD † C † . Note that SPAM averaging gates are compiled with adjacent dressing gates from randomized compiling (denoted by dashed boxes) and are implemented as one gate, hence there is only one noise channel E ⊗2 1 . The red dashed line indicates the place where we make the comparison (see text).
non-zero elements are Q 00 := 0| Q |0 and Q 11 := 1| Q |1 . Therefore we can simplify Eq. (A3) as = Q (A4) where we used the fact that T commutes with R for all T, R ∈ {I, Z} ⊗N in the first step, that Q is diagonal in the second step, and that density operators have unit trace in the third step. Therefore, 2. Q = X or Q = Y . Then Q has only off-diagonal elements Q 01 := 0| Q |1 and Q 10 := 1| Q |0 . So we can write Eq. (A3) as (A6) Here we use the fact that all elements of {I, Z} ⊗N are both Hermitian and involutory, which leads to the relation Since the only element of {I, Z} ⊗N with a nonzero trace is I ⊗N , these would be the only remaining terms when the partial trace is performed. Keeping only these terms, we further simplify Eq. (A6) as The term in the parentheses is just Q since Q = X or Q = Y has off-diagonal elements only. Thus we get G(Q) = s T Q and From the above we get the form of Φ G as: Finally, the PTM of the a one-qubit Hadamard gate is The effect of the full cycle is given by the matrix product which is identical in form to Eq. (9) in the main text.
Appendix B: Proof of Equation (15) In this section we prove Eq. (15) in the main text. First, we define the operator 1-norm A 1 := Tr √ A † A , the induced superoperator norm G 1→1 := max{ G(A) 1 : A 1 ≤ 1} and the diamond norm G := G ⊗ I d 1→1 for quantum channels [28,29]. Note that ρ 1 = 1 for any density matrix ρ. We would like to compare the distance between the final states of the ancillary qubit q a immediately before the measurement (see Fig. 4), under two cases: the imaginary case where the propagation cycle is ideal (denoted as ρ ic a , "ideal cycle"), and the actual case with imperfect gates (denoted as ρ a ).
Below we denote a round of CNOT gate as C, and the channel represented by the round of dressing gate D 1 ⊗D 2 as D. Under the assumption of gate-independent error on the dressing gates, we will denote this (single-qubit) error channel as E 1 so thatD = E 1 D. The error on the CNOT can be a general channel denoted byC = CE C .
The dressing gates are randomly sampled from the twoqubit Pauli channels {I, X , Y, Z} ⊗2 . Recall that ρ a denotes the state of the ancilla qubit immediately before measurement. Further, let us denote the full system immediately before measurement as ρ full . We then have the following chain of inequalities: for each choice of dressing gates D. The first inequality is because partial tracing does not increase the trace distance [30]. The second inequality is because trace distance is non-increasing upon action of any CPTP map. The third inequality is from the definition of the diamond norm.
The diamond norm is related to the process infidelity r p [Eq. (14) in the main text] by [21] 2r p (CD, CD) ≤ CD − CD ≤ 2d r p (CD, CD) (B2) for d-dimensional channels. From our assumption on the error model,CD = CE C E ⊗2 1 D. Note that for any channel E and unitary processes U and V, and since both · and r p are linear functions in their arguments, and where P is the twirled error channel mentioned in the main text. After averaging over all Pauli dressing gates (i.e., the circuit is randomly compiled), where the last equality is because the lower bound of Eq. (B2) is saturated for a Pauli noise channel. Combining with Eq. (B4), we finally have On the other hand, ρ a,RC − ρ ic a,RC 1 is related to β and β ic through the Holevo-Helstrom theorem for distinguishing quantum states [29]. We quote theorem 3.4 in [29] as the following lemma: Lemma 1. Let ρ 1 , ρ 2 be density operators. Let λ ∈ [0, 1]. For an arbitrary two-outcome POVM measurement described by elements {M 0 , M 1 }, it holds that From the definition of β, we can rewrite it as for each particular D. Since the measurement on q a is unchanged by the propagation cycle, we can apply the above lemma with λ = 1 2 twice: first, using the first definition of β D and the second definition of β ic D , (B11) Combining the above two equations we get: Thus, combining with Eq. (B7) and averaging over all D's, we obtain the desired result The proof can be trivially extended to the case where a multi-qubit propagation cycle, (H ⊗ I)C P (H ⊗ I), is used in place of the CNOT gate, hence Eq. (15) in the main text.

Appendix C: Calculating uncertainties in estimated parameters
In this section, we derive the expressions for uncertainties in our experiments. The directly measured quantities in our scheme are the expectation values α and β, as well as the infidelity r CB measured by cycle benchmarking. Here, we will focus on the uncertainties on α and β. The one for r CB is based on the same ideas but involves more technical details, and we refer to the Supplementary Information of the original paper [25] for the exact expressions. In this section only, we will denote the estimated values of the random variables with an overhead tilde, such that the measured value of α is written asα. We will first derive estimators for the expectation value (denoted with E) and variance (denoted with V) of a desired quantity in the general case, then apply it to the case of α and β. Finally we use standard error propagation to obtain the uncertainty on the upper and lower bounds.
The quantity of interest which we try to estimate can generally be described by the following average value: where the value of N depends on the context. In our case, λ can be α or β, so the p i 's are expectation values of single qubit measurements. There are two things to be noted about estimating this quantity: first, N can be very large in general, so that it is sometimes not possible to exhaustively sample all p i 's. Second, each p i cannot be measured perfectly because of finite sampling error. From now on, we will assume that we sample n out of the N elements with or without replacement. For a particular n-element sample s, the value of each sampled elementp si is a random variable which is denoted with a hat. Furthermore we assume that E [p i ] = p i , and that the variance V [p i ] := σ i exists and can be estimated using an unbiased estimatorσ i . We now derive the estimators of interest. By a simple counting argument (see, for example, Sec. 2.6 of [31]) and the law of total expectation, it is easy to see that where the standard abuse of notation of denoting the sample with subscript (and summation) is used. This holds true whether we are sampling with or without replacement. Therefore the estimator is an unbiased estimator for the population mean. Next, we can use the law of total variance to compute the variance of this estimator. Note that in the absence of noisy measurements (p i = p i ) the problem reduces to estimating the variance of the sample mean, which has a wellknown formula (see Sec. 2.5 and 2.6 of [31]) when the sampling is done without replacement: or with replacement: where σ 2 is the population variance of p i : By the law of total variance we can extend to account for noisy measurements: for sampling without replacement we have and similarly for with replacement, An important implication from the above expression is that, for situations where each σ i is small, or where N is very large, the above expression depends mostly on the spread of the quantities over the set of values (i.e., σ 2 ) and only very weakly on N . A practical example that aligns with our protocol is where N grows exponentially in the number of qubits, and σ i decreases as the square root of measurement "shots." This ensures that randomly sampling from a large population is scalable in practice.
We then need an estimator for σ 2 and 1 N N i=1 σ 2 i . For the second quantity it is simply 1 n n i=1σ 2 si , which are variances of each element in the chosen sample. For the first quantity, it can be shown that the sample variance corrected by the average of σ 2 si 's gives an unbiased estimator for σ 2 : i.e., wheres 2 is the variance for the chosen sample. To see this, note that σ 2 is equal to Meanwhile, the expectation value ofs 2 is By a counting argument, the first term evaluates to The second term evaluates to where the last equality is given by independence of p i 's. Combining Eqs. (C19), (C20) and (C23) and compare with Eq. (C17), one sees that the corrected estimator [Eq. (C13)] is unbiased. We can then use this in Eq. (C11) and simplify to get our unbiased estimator for the variance for sampling without replacement: (C28) Eq. (C3) and Eq. (C26) [or Eq. (C28)] allow us to write down mean and variance estimators for any quantity in the form of Eq. (C1). We then use standard (linearapproximated) error propagation to estimate uncertainties on the parameters of interest, i.e., upper and lower bound on the error rates SP and M from Table 1 in the main text. Each bound is an independent estimate and is a function of the four parameters: α t , α a , β t , and r t,a . The uncertainty on individual parameters are independent of each other, so its covariance matrix Σ p is diagonal (where p stands for "parameter"). The first order approximation to the covariance matrix of the bounds is given by Σ b = JΣ p J T where J is the Jacobian. We then take the diagonal elements of Σ b to be the uncertainties of the bounds.
Finally we mention how each σ 2 si is estimated for our experiment. Sincep i equals to a binomial variable divided by the sample size k, it has mean p i and variance pi(1−pi) k . Using again the relation E p 2 = p 2 +σ 2 , it can be verified that an unbiased estimator for the variance is pi(1−pi) k−1 . Becauseα i = 2p i − 1, σ 2 α,i = 4σ 2 p,i . We can express the estimator in terms ofα i as The estimator for β is identical, except changing theα i toβ i in the above expression.
3. The infidelity r t,a for each CNOT was estimated using cycle benchmarking by repeating the CNOT cycle {4, 84} times, and averaging over all 16 Pauli decay strings, each by sampling 30 random circuits with 128 shots. The specific choice of CNOT gates used in the experiment were: C 0,1 , C 1,0 , C 2,1 , C 3,2 , C 4,3 .
Next we sketch how the simulation was performed. For each qubit, we individually add a state preparation error to it by replacing the ideal initial state |0 with a density matrix ρ = diag(1 − SP , SP ). We add a measurement error by classically flipping the outcome (symmetrically, from 0 to 1 and from 1 to 0) with probability M . Gate errors are simulated using a simple T 1 , T 2 relaxation model: for each clock cycle in the circuit, we apply a noise pro-cess to each qubit defined by the following Choi matrix (D1) The T 1 and T 2 relaxation times are obtained from the provider and are tabulated in Table II