Demonstrating scalable randomized benchmarking of universal gate sets

Randomized benchmarking (RB) protocols are the most widely used methods for assessing the performance of quantum gates. However, the existing RB methods either do not scale to many qubits or cannot benchmark a universal gate set. Here, we introduce and demonstrate a technique for scalable RB of many universal and continuously parameterized gate sets, using a class of circuits called randomized mirror circuits. Our technique can be applied to a gate set containing an entangling Clifford gate and the set of arbitrary single-qubit gates, as well as gate sets containing controlled rotations about the Pauli axes. We use our technique to benchmark universal gate sets on four qubits of the Advanced Quantum Testbed, including a gate set containing a controlled-S gate and its inverse, and we investigate how the observed error rate is impacted by the inclusion of non-Clifford gates. Finally, we demonstrate that our technique scales to many qubits with experiments on a 27-qubit IBM Q processor. We use our technique to quantify the impact of crosstalk on this 27-qubit device, and we find that it contributes approximately 2/3 of the total error per gate in random many-qubit circuit layers.


I. INTRODUCTION
Quantum computers suffer from a diverse range of errors that must be quantified if their performance is to be understood and improved. Errors that are localized to single qubits or pairs of qubits can be studied in detail using tomographic techniques [1,2]. However, many-qubit circuits are often subject to large additional errors, such as crosstalk [3-8], that are not apparent in isolated one- or two-qubit experiments. There are now techniques for partial tomography on individual many-qubit circuit layers (also called "cycles"), including cycle benchmarking [9] and Pauli noise learning [10-12]. But quantum computers can typically implement exponentially many different circuit layers, and it is only feasible to characterize a small subset of them.
Randomized benchmarks [5-8,13-31] make it possible to quantify the rate of errors in an average n-qubit layer, by probing a quantum computer's performance on random n-qubit circuits. However, established randomized benchmarks cannot measure the performance of universal layer sets in the many-qubit regime, where quantum computational advantage may be possible. Those randomized benchmarks that can be applied to universal layer sets, such as standard randomized benchmarking (RB) [15,16] and cross-entropy benchmarking (XEB) [27-29], require classical computations that scale exponentially in the number of qubits (n). XEB requires classical simulation of random circuits that are famously infeasible to simulate for more than approximately 50 qubits [28]. This is because XEB requires estimating the (linear) cross-entropy between each circuit's actual and ideal output distributions.
FIG. 1. (a) Randomized mirror circuits combine a simple reflection structure with randomized compiling to enable scalable and robust RB of universal gate sets. (b) Data and fits to an exponential obtained by using our method (MRB of universal gate sets) to benchmark a universal gate set on n = 1, 2, 3, 4 qubits of the Advanced Quantum Testbed, and the average error rates of n-qubit layers (r_Ω, where Ω is the layer sampling distribution) extracted from these decays. (c) We benchmarked each connected set of n qubits for n = 1, 2, 3, 4, enabling us to map out the average layer error rate (r_Ω) for each subset of qubits.

arXiv:2207.07272v3 [quant-ph] 11 Oct 2023
Standard RB of a universal layer set is restricted to even smaller n, because it requires compiling and running Haar random n-qubit unitaries [15]. This compilation requires classical computations that are exponentially expensive in n, and results in circuits containing O(2^n) two-qubit gates [32]. Due to the large overhead, even standard RB on Clifford gates, which has lower overheads and non-exponential scaling, has only been implemented on up to 5 qubits [5,6,8]. Existing RB protocols can be used for heuristic estimates of the performance of a universal gate set, e.g., by synthesizing Clifford gates from a universal gate set [33] or by separately benchmarking Clifford gates with standard RB and a non-universal set of non-Clifford gates with dihedral RB [18] or interleaved RB [22,34-37]. However, these approaches do not holistically assess a universal gate set, and they typically require strong assumptions on the types of gate errors to be accurate.
In this paper we introduce and demonstrate a simple and scalable technique for RB of a broad class of universal gate sets. Our technique uses a novel kind of randomized mirror circuits, shown in Fig. 1 (a), and advances on a recently introduced method, mirror RB (MRB), that enables scalable RB of Clifford gates [6]. Mirror circuits [6,7,12] use a layer-by-layer inversion structure that enables classically efficient circuit construction and prediction of that circuit's error-free output. The idea of layer-by-layer inversion was explored in the earliest work on RB [13,14], and recently it was shown that the addition of Pauli frame randomization [38] to Clifford mirror circuits enables reliable error rate estimation [6,7,12]. The randomized mirror circuits we introduce here combine layer-by-layer inversion with a form of randomized compilation [39] to enable reliable and efficient RB of universal gate sets. MRB on universal gate sets consists of running randomized mirror circuits of varied depths and computing their mean observed polarization [6], a quantity that is closely related to success probability. The mean observed polarization versus circuit depth is fit to an exponential decay, as shown in Fig. 1 (b). As in standard RB, the estimated decay rate is then simply rescaled to estimate the average error rate of n-qubit layers. MRB therefore preserves the core strengths and simplicity of standard RB and XEB, while avoiding the classical simulation and compilation roadblocks that have prevented scalable and efficient RB of universal gate sets.
We use MRB to study errors in two different quantum computing systems based on superconducting qubits. We demonstrate our method on 4 qubits of the Advanced Quantum Testbed (AQT) [40] and on all of the qubits of a 27-qubit IBM Q quantum computer (ibmq_montreal) [41]. In our experiments on AQT we use MRB to quantify and compare the performance of three different layer sets on each subset of n qubits (for n = 1, 2, 3, 4), including a layer set containing non-Clifford two-qubit gates [see Fig. 1 (b-c)]. In our demonstration on ibmq_montreal we show that our method scales to many qubits by performing MRB on a universal gate set on up to 27 qubits.
Multi-qubit MRB enables probing and quantifying crosstalk, which is an important source of error in contemporary many-qubit processors [3-5,7] that cannot be quantified by only testing one or two qubits in isolation. We quantify the contribution of crosstalk errors to the observed error rates in our experiments on AQT and further divide the error into contributions from individual layers and gates. The techniques we introduce for these analyses complement other established RB-like methods for estimating the error rates of individual gates, such as interleaved RB [34,36,42] and cycle benchmarking [9]. We use MRB to study how crosstalk errors vary on ibmq_montreal as n increases, with n ranging from n = 1 up to n = 27. We find that crosstalk errors dominate in circuit layers with n ≫ 1 qubits.
This paper is structured as follows: In Section II we introduce our notation and define the error rate that our method measures. In Section III we define the MRB protocol. In Section IV we present theory and simulations that show that MRB is reliable. In Sections V and VI we present the results of our experiments on AQT and demonstration on IBM Q's quantum processors, respectively.

II. DEFINITIONS AND PRELIMINARIES
In this section, we introduce our notation and background information related to our method. In Section II A we introduce the notation and definitions used throughout this paper. In Section II B we introduce the type of random circuits whose error MRB is designed to measure. In Section II C we describe the gate sets that our method can be used to benchmark, i.e., we state the conditions that a gate set must satisfy if it is to be benchmarked with MRB.

A. Definitions
We begin by introducing our notation and definitions. A k-qubit gate g is an instruction to perform a particular unitary operation U(g) ∈ SU(2^k) on k qubits. We will only consider k = 1, 2, and we use G_1 and G_2 to denote a set of one- and two-qubit gates, respectively. In this work G_2 will only contain controlled rotations about the X, Y, or Z axis, denoted CP_θ and defined by where θ is the angle of rotation and P is the axis of rotation.
We denote the single-qubit gate that is a rotation by θ about P by P_θ. An n-qubit, depth-d circuit is a length-d sequence of n-qubit layers
$C = L_d L_{d-1} \cdots L_2 L_1$. (2)
An n-qubit layer L is an instruction to perform a particular unitary operation U(L) ∈ SU(2^n) on those n qubits. In this work, we use layers that consist of parallel applications of only one-qubit gates or only two-qubit gates. We use L(G) to denote the set of all layers constructed by parallel applications of gates from the gate set G. Often it will be convenient to think of random circuits and layers as random variables, and when we do so we use a distinct font, e.g., we often use $\mathsf{L}$ to denote a layer-valued random variable, meaning that $\mathsf{L} = L$ with probability Ω(L) for some distribution Ω over L(G). We use L^{-1} to denote an instruction to perform the operation U(L)^{-1}.
For a layer or circuit L, we use 𝒰(L) and ϕ(L) to denote the superoperators for its perfect and imperfect implementations, respectively, so 𝒰(L)[ρ] = U(L) ρ U†(L). We assume that ϕ(L) is a completely positive trace-preserving (CPTP) map. We often represent superoperators as matrices, acting on states represented as vectors in Hilbert-Schmidt space (denoted by |ρ⟩⟩). A layer L's error map is defined by E(L) = ϕ(L)𝒰†(L). The entanglement fidelity (also called the process fidelity) of ϕ(L) to 𝒰(L) is defined by
$F(\phi(L), \mathcal{U}(L)) = \langle\varphi| \left(\mathbb{I} \otimes E(L)\right)\left[|\varphi\rangle\langle\varphi|\right] |\varphi\rangle$, (3)
where φ is any maximally entangled state of 2n qubits [43]. Throughout, we use the term "(in)fidelity" to refer to the entanglement (in)fidelity. Our theory will make use of the polarization of a channel E, which is a rescaling of E's fidelity given by
$\gamma(E) = \frac{4^n}{4^n - 1} F(E) - \frac{1}{4^n - 1}$, (4)
as well as stochastic Pauli channels. An n-qubit stochastic Pauli channel E_{pauli,{ε_P}} is parameterized by a probability distribution {ε_P} over the 4^n Pauli operators (P_n), and it has the action
$E_{\mathrm{pauli},\{\varepsilon_P\}}[\rho] = \sum_{P \in \mathbb{P}_n} \varepsilon_P \, P \rho P$, (5)
with $\sum_{P \in \mathbb{P}_n} \varepsilon_P = 1$. For a stochastic Pauli channel, the total probability of a fault, i.e., the probability it applies a nonidentity Pauli operator, is 1 − ε_{I_n} = 1 − F(E_{pauli,{ε_P}}).
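The relationship between a stochastic Pauli channel's rates, its entanglement fidelity, and its polarization can be checked numerically. The following is a minimal sketch, not code from this work: it assumes the polarization convention γ = (4^n F − 1)/(4^n − 1), represents Pauli operators as strings such as "XZ", and uses hypothetical function names. It relies on the standard fact that a stochastic Pauli channel is diagonal in the Pauli basis, with eigenvalue Σ_P ±ε_P for each Pauli Q (plus sign when P and Q commute).

```python
from itertools import product

def commutes(p, q):
    """True if the Pauli strings p and q (e.g. 'XZ' and 'IY') commute."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0

def pauli_eigenvalues(rates, n):
    """Diagonal of the Pauli transfer matrix of a stochastic Pauli channel.

    rates maps n-character Pauli strings to probabilities (missing strings mean 0).
    """
    eig = {}
    for q in map("".join, product("IXYZ", repeat=n)):
        eig[q] = sum(e if commutes(p, q) else -e for p, e in rates.items())
    return eig

def entanglement_fidelity(rates, n):
    """F = Tr(PTM) / 4^n, which equals the identity rate for this channel type."""
    return sum(pauli_eigenvalues(rates, n).values()) / 4**n

def polarization_from_fidelity(fidelity, n):
    """Assumed convention: gamma = (4^n F - 1) / (4^n - 1)."""
    return (4**n * fidelity - 1) / (4**n - 1)

def polarization(rates, n):
    """Mean Pauli eigenvalue over the 4^n - 1 non-identity Paulis."""
    eig = pauli_eigenvalues(rates, n)
    return sum(v for q, v in eig.items() if q != "I" * n) / (4**n - 1)

# Hypothetical two-qubit error rates, chosen only for illustration.
rates = {"II": 0.90, "XZ": 0.04, "ZI": 0.06}
F = entanglement_fidelity(rates, 2)
gamma = polarization(rates, 2)
```

In this sketch the fidelity comes out equal to the identity rate, and the two routes to the polarization agree, which is the content of the final sentence of the definitions above.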

B. Ω-distributed random circuits
In this work, we aim to estimate the average error rate ϵ_Ω of circuit layers sampled from a distribution Ω. We now introduce a natural family of circuits, which we call Ω-distributed random circuits, that we use in our method in order to estimate ϵ_Ω. Ω-distributed random circuits are similar to the circuits used in XEB and other benchmarking routines. They are defined in terms of a customizable gate set G and sampling distribution Ω over that gate set. This gate set consists of one- and two-qubit gate sets G = (G_1, G_2), and Ω is determined by two probability distributions Ω_1 and Ω_2 over n-qubit layer sets L(G_1) and L(G_2), respectively. An Ω-distributed random circuit with a benchmark depth of d is a circuit-valued random variable
$\mathsf{C}_d = \mathsf{L}_{2d} \mathsf{L}_{2d-1} \cdots \mathsf{L}_2 \mathsf{L}_1$,
where the d odd-indexed layers are Ω_1-distributed and the d even-indexed layers are Ω_2-distributed. These circuits consist of interleaved layers of one- and two-qubit gates, so it is useful to define a composite layer to be a pair of layers of the form L = L_2 L_1, where L_1 ∈ L(G_1) is a layer of one-qubit gates and L_2 ∈ L(G_2) is a layer of two-qubit gates. We denote the set of all composite layers by L(G). An Ω-distributed random circuit of benchmark depth d then consists of d composite layers that are Ω-distributed over L(G) with Ω(L_2 L_1) = Ω_1(L_1)Ω_2(L_2).
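To make the structure of an Ω-distributed random circuit concrete, here is a rough sketch of a sampler. It is an illustration under stated assumptions, not the sampler used in this work: the circuit data structure, the function names, the `prob` parameter, and the simple edge-by-edge two-qubit layer sampler are all hypothetical. Single-qubit gates are drawn Haar-uniformly from SU(2) using ZYZ Euler angles, and each two-qubit gate is chosen as cs or its inverse with equal probability.

```python
import math
import random

def sample_one_qubit_layer(n, rng):
    """Omega_1 sketch: an independent Haar-random SU(2) gate on each qubit."""
    layer = []
    for q in range(n):
        theta = math.acos(1 - 2 * rng.random())  # sin(theta) weight of the Haar measure
        phi = rng.uniform(0, 2 * math.pi)
        lam = rng.uniform(0, 2 * math.pi)
        layer.append(("u", (q,), (theta, phi, lam)))
    return layer

def sample_two_qubit_layer(edges, prob, rng, gates=("cs", "csdg")):
    """Omega_2 sketch: visit edges in random order, adding a gate with probability
    `prob` whenever both qubits are still free. Choosing a gate or its inverse
    uniformly keeps the distribution invariant under swapping gates with inverses."""
    used, layer = set(), []
    shuffled = list(edges)
    rng.shuffle(shuffled)
    for a, b in shuffled:
        if a not in used and b not in used and rng.random() < prob:
            layer.append((rng.choice(gates), (a, b)))
            used.update((a, b))
    return layer

def sample_omega_circuit(n, edges, d, prob=0.5, seed=None):
    """Benchmark depth d: d composite layers, each a one-qubit layer
    followed by a two-qubit layer (2d layers in total)."""
    rng = random.Random(seed)
    circuit = []
    for _ in range(d):
        circuit.append(sample_one_qubit_layer(n, rng))
        circuit.append(sample_two_qubit_layer(edges, prob, rng))
    return circuit

# Four qubits on a line, benchmark depth 3.
circuit = sample_omega_circuit(4, [(0, 1), (1, 2), (2, 3)], d=3, seed=7)
```

The list is ordered in time, so the first, third, and fifth entries are the Ω_1-distributed layers and the others are Ω_2-distributed.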

C. The gate set and sampling distributions
Our technique places certain conditions on the gate set G = (G_1, G_2) and the sampling distributions Ω_1 and Ω_2. In order to construct the circuits required for our method, the gate set and sampling distributions must satisfy the following properties:
1. G_2 is a set of CP_θ gates (defined in Section II A) and is closed under inverses. Examples of valid G_2 are {cnot} and {cs, cs†}.
2. G_1 is closed under inverses, conjugation by Pauli operators, and multiplication by the single-qubit Pauli axis rotation P_θ for each CP_θ ∈ G_2. This is guaranteed to hold if G_1 is the set of all single-qubit gates SU(2).
In addition, we require that our circuits are highly scrambling. To ensure this, our gate set and sampling distributions must satisfy the following conditions:
1. G_1 is a unitary 2-design over SU(2). Examples of valid G_1 are the set of all single-qubit gates SU(2) and the set of all 24 single-qubit Clifford gates C_1.
2. G_2 contains at least one gate with θ ≠ 0, i.e., it must contain at least one entangling gate.
3. Ω-distributed layers quickly randomize and delocalize errors. Informally, this means that any Pauli error is mapped to a distribution over many different errors before another error is likely to have occurred. Formally, we require that for all Pauli operators P, P′ ≠ I_n, there exists a constant k ≪ 1/ε such that where 𝒫[ρ] = PρP and 𝒫′[ρ] = P′ρP′, L_1, . . ., L_k are Ω-distributed random layers, δ ≪ 1, and ε is the expected infidelity of an Ω-distributed random layer. While we require that k ≪ 1/ε for our theory, this condition on k can be relaxed when n ≫ 1 because errors that occur on spatially separated qubits cannot cancel even if they occur in sequential circuit layers. Note that Eq. (6) is not equivalent to requiring that a length-k sequence of Ω-distributed layers is a good approximation to a unitary 2-design [because we do not require that
4. Ω_1 is the uniform distribution over G_1.
5. Ω_2 is invariant under exchanging any subset of the gates in a two-qubit gate layer L with their inverses.
The above conditions are sufficient to ensure our circuits are highly scrambling, but not necessary. In particular, our method can be generalized to single-qubit gate sets G_1 that only generate a unitary 2-design and to distributions Ω_1 other than the uniform distribution. However, this complicates the analysis, so we do not consider this more general case herein.
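The requirement that G_1 is a unitary 2-design can be checked numerically for a finite gate set. The sketch below is an illustration rather than part of the protocol: it builds the 24 single-qubit Clifford gates (up to global phase) from the Hadamard and phase gates and evaluates the frame potential, using the standard result that a finite set of unitaries is a 2-design exactly when its t = 2 frame potential equals the Haar value, which is 2 for a single qubit. The function names are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def same_up_to_phase(U, V):
    """Two 2x2 unitaries agree up to a global phase iff |Tr(U^dag V)| = 2."""
    return abs(abs(np.trace(U.conj().T @ V)) - 2) < 1e-9

def generate_group(generators):
    """Breadth-first closure of the generators, keeping one gate per phase class."""
    group = [np.eye(2, dtype=complex)]
    frontier = list(group)
    while frontier:
        new = []
        for U in frontier:
            for g in generators:
                V = g @ U
                if not any(same_up_to_phase(V, W) for W in group + new):
                    new.append(V)
        group += new
        frontier = new
    return group

def frame_potential(gates, t=2):
    """Average of |Tr(U^dag V)|^(2t) over all ordered pairs of gates."""
    return np.mean([abs(np.trace(U.conj().T @ V)) ** (2 * t)
                    for U in gates for V in gates])

cliffords = generate_group([H, S])   # 24 gates, frame potential 2
paulis = generate_group([X, Z])      # 4 gates, frame potential 4
```

The Pauli group fails the test because its frame potential exceeds the Haar value, which is consistent with it being a 1-design but not a 2-design.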

III. SCALABLE RANDOMIZED BENCHMARKING OF UNIVERSAL GATE SETS
In this section we introduce our method for MRB of universal gate sets. In Section III A we introduce the family of randomized mirror circuits used in MRB. In Section III B we explain the MRB data analysis and define the complete MRB protocol.

A. Randomized mirror circuits for universal gate sets
Our protocol uses a novel family of randomized mirror circuits [6,7,31] that we now introduce. The structure of these randomized mirror circuits allows our protocol to measure ϵ_Ω, the average error rate of n-qubit layers sampled from Ω (see Section IV A for the precise definition of ϵ_Ω), without expensive classical computation. One approach to estimating ϵ_Ω is to run Ω-distributed random circuits of varied depths, and then estimate the decay in the (linear) cross entropy between these circuits' ideal and actual output probability distributions [27,29]. This is because the decay rate of this cross entropy is known to be approximately equal to ϵ_Ω [27,29]. The problem with this method is that the classical computation cost of computing the ideal output probability distribution scales exponentially in the number of qubits (n) when the gate set is universal [28], limiting it to n ≲ 50. To estimate ϵ_Ω without expensive classical computation our protocol runs Ω-distributed randomized mirror circuits, which use an inversion structure to transform an Ω-distributed random circuit into a circuit with an efficiently computable outcome.
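The inversion structure can be illustrated with a small numerical check. The following is a minimal two-qubit sketch, not the construction used in this work: it omits randomized compiling, takes the controlled-S gate as diag(1, 1, 1, i), and uses hypothetical helper names. It samples a random circuit of Haar-random single-qubit gates and cs or cs† gates, appends the layer-by-layer inverse, and confirms that the result is the identity.

```python
import numpy as np

CS = np.diag([1, 1, 1, 1j])  # controlled-S on two qubits (assumed matrix form)

def haar_unitary(dim, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix column phases

def one_qubit_layer(rng):
    """Independent random single-qubit gates on both qubits."""
    return np.kron(haar_unitary(2, rng), haar_unitary(2, rng))

def random_half_circuit(pairs, rng):
    """An initial one-qubit layer, then `pairs` (two-qubit, one-qubit) layer pairs."""
    layers = [one_qubit_layer(rng)]
    for _ in range(pairs):
        layers.append(CS if rng.random() < 0.5 else CS.conj().T)
        layers.append(one_qubit_layer(rng))
    return layers

def circuit_unitary(layers):
    """Multiply the layers together; layers are applied in list order."""
    U = np.eye(4, dtype=complex)
    for layer in layers:
        U = layer @ U
    return U

def mirror(layers):
    """Append the inverse of each layer, in reverse order."""
    return layers + [layer.conj().T for layer in reversed(layers)]

rng = np.random.default_rng(0)
half = random_half_circuit(3, rng)
full = mirror(half)
is_identity = np.allclose(circuit_unitary(full), np.eye(4))
```

Because the mirrored circuit is the identity, its ideal output on the all-zeros input is known without simulating the random half, which is the property the protocol exploits.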
We construct a specific randomized mirror circuit on n qubits with benchmark depth d via the three-step procedure shown in Fig. 2. This procedure consists of first sampling a circuit C_1 consisting of an Ω-distributed random circuit preceded by an initial layer of random single-qubit gates that randomizes the state input into the circuit (enabling estimation of the circuit's fidelity using the method of Ref. [44]). We then append the inverse of C_1 to obtain C_2, a simple form of mirror (or motion-reversal) circuit which, if run perfectly, always outputs a single bit string. Finally, C_2 is randomly compiled, to prevent systematic coherent addition or cancellation of errors between the Ω-distributed random circuit and its inverse, which is essential for reliable estimation of ϵ_Ω. The exact procedure is as follows:
1. (Sample a random circuit) Sample a circuit C_1 consisting of:
(a) A layer L_0 sampled from Ω_1, which consists of a single-qubit gate on each qubit.
(b) d/2 composite layers L_i L_{θ_i}, where L_i is sampled from Ω_1, and L_{θ_i} is sampled from Ω_2.
2. (Construct simple mirror circuit) Add to the circuit C_1 the layers in step 1 in reverse order, with each layer replaced with its inverse. The result is a circuit C_2 such that U(C_2) = I.

3. (Randomized compiling) Construct a new circuit M by starting with C_2 and replacing layers using the following randomized compilation procedure, which reduces to standard Pauli frame randomization [39] when the two-qubit gates are all Clifford gates. To specify our procedure, we first write C_2 [Eq. (7)] in the form where L_{θ_{d/2+1}} is a dummy (empty) two-qubit gate layer, so that C_2 consists of alternating layers of one- and two-qubit gates. Then:
(a) For each single-qubit gate layer L_i in C_2, sample a uniformly random layer of Pauli gates P_i, which in the following procedure is inserted after and then compiled into L_i.
(b) Replace each two-qubit gate layer L_{θ_i} in C_2 with a new two-qubit gate layer T(L_{θ_i}, P_{i−1}) that is constructed as follows: For each gate CP_θ in L_{θ_i} with control qubit q_j and target qubit q_k, consider the instructions in P_{i−1} acting on q_j and q_k, denoted by P_{i−1,j} and P_{i−1,k}, respectively. If U(P_{i−1,j}) = I or Z, then add CP_ϕ acting on (q_j, q_k) to T(L_{θ_i}, P_{i−1}) where
(c) For each single-qubit gate layer L_i in C_2 with i > 0, we define a layer of single-qubit gates P^c_{i−1} that undoes the effect of adding P_{i−1} into the circuit, meaning a layer such that U(P^c_{i−1} T(L_{θ_i}, P_{i−1}) P_{i−1}) = U(L_{θ_i}). Because G_2 is restricted to only controlled Pauli-axis rotations, the correction takes the form U(P^c_{i−1}) = U(P_{i−1} P̃_{θ_i}), where P̃_{θ_i} consists of single-qubit Pauli axis rotations. If L_i is not immediately preceded by a two-qubit gate layer, then P̃_{θ_i} = I. Otherwise,
(d) Replace each single-qubit gate layer L_i in C_2 with a recompiled layer R(P_i L_i P^c_{i−1}), defined by This randomized compilation step transforms the layer pair
The final circuit produced by this procedure (M) has the property that U(M) = U(P_{d+1}), i.e., its overall action is an n-qubit Pauli operator. So, if run perfectly, M returns a single bit string (s_M) that is determined during circuit construction with no additional computation needed.
FIG. 2. To construct a randomized mirror circuit of benchmark depth d (and total depth 2d + 2) we first sample a random depth d + 1 circuit. This circuit alternates between layers of randomly sampled one-qubit gates and layers of randomly sampled two-qubit gates. It can be thought of as consisting of a single initial layer of random one-qubit gates followed by d/2 composite layers (see inset). We then append to this circuit its inverse, i.e., the circuit in reverse with each layer replaced with its inverse. This creates a depth 2d + 2 circuit that will, if run perfectly, always return the all zeros bit string. This circuit is susceptible to systematic addition or cancellation of errors between the two halves of the circuit. To prevent this unwanted effect we then apply randomized compiling to the circuit. We insert a layer of random single-qubit Pauli gates (cyan) after each one-qubit gate layer. In order to guarantee that this randomly compiled circuit still always, if run perfectly, returns a single bit string s, our procedure (1) changes the rotation angles in the two-qubit gates (orange) if these gates are not Clifford gates, (2) adds in single-qubit Pauli axis rotations following the two-qubit gates (red), and (3) adds in correction Pauli gates (purple) prior to each single-qubit gate layer. The yellow boxes show gates that are compiled together to create the final circuit of depth 2d + 2. This circuit contains d composite layers, which we call its benchmark depth.
The final depth-d randomized mirror circuit has the form where , is the circuit obtained after applying randomized compilation to the d /2 composite layers sampled from Ω and their inverses.
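The way a random Pauli layer is absorbed by a controlled Pauli-axis rotation can be verified numerically. The sketch below rests on an assumed convention that is not taken from the text: CP_θ = |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ exp(−iθP/2), with the first qubit as control. Under that assumption, conjugating CP_θ by a Pauli A on the control and a Pauli B on the target gives a rotation P_ψ on the target times CP_ϕ, where the angle flips sign if B anticommutes with P and flips again (with a compensating target rotation) if A is X or Y. The helper names, including `push_pauli`, are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {
    "I": I2,
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def rot(axis, theta):
    """Single-qubit rotation exp(-i * theta / 2 * P) about the Pauli axis P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]

def controlled_rot(axis, theta):
    """CP_theta with the first qubit as control (assumed convention)."""
    p0 = np.diag([1, 0]).astype(complex)
    p1 = np.diag([0, 1]).astype(complex)
    return np.kron(p0, I2) + np.kron(p1, rot(axis, theta))

def push_pauli(axis, theta, a, b):
    """Return (phi, psi) with (A x B) CP_theta (A x B) = (I x P_psi) CP_phi,
    for Pauli A on the control and Pauli B on the target."""
    sign = 1 if b in ("I", axis) else -1   # B anticommuting with P flips the angle
    if a in ("I", "Z"):                    # control Pauli commutes with the gate
        return sign * theta, 0.0
    return -sign * theta, sign * theta     # X or Y on the control swaps the branches

theta = 0.7
a, b, axis = "X", "Y", "Z"
phi, psi = push_pauli(axis, theta, a, b)
frame = np.kron(PAULI[a], PAULI[b])
lhs = frame @ controlled_rot(axis, theta) @ frame
rhs = np.kron(I2, rot(axis, psi)) @ controlled_rot(axis, phi)
matches = np.allclose(lhs, rhs)
```

When θ is a multiple of π the corrections are themselves Pauli operators up to phase, which is consistent with the statement above that the procedure reduces to standard Pauli frame randomization for Clifford two-qubit gates.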

B. RB with non-Clifford randomized mirror circuits
We now introduce our protocol, MRB for universal gate sets. Our protocol has the same general structure as standard RB [15] and many of its variants: an exponential decay is fit to data from random circuits. However, our data analysis method is different from standard RB. We use the same analysis technique as MRB of Clifford gate sets [6]. In particular, for each n-qubit circuit C that we run, we estimate its observed polarization [6]
$S = \frac{4^n}{4^n - 1} \left[ \sum_{k=0}^{n} \left(-\frac{1}{2}\right)^k h_k \right] - \frac{1}{4^n - 1}$, (12)
where h_k is the probability that the circuit outputs a bit string with Hamming distance k from its target bit string (s_C). As shown in Ref. [6] and discussed further below, the simple additional analysis in computing S simulates an n-qubit 2-design twirl using only local state preparation and measurement.
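The observed polarization is straightforward to compute from measurement counts. The following sketch assumes the definition used for MRB of Clifford gates, S = 4^n/(4^n − 1) Σ_k (−1/2)^k h_k − 1/(4^n − 1), with h_k estimated as the fraction of shots at Hamming distance k from the target bit string. The function names and the counts dictionary format are assumptions made for illustration.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def observed_polarization(counts, target):
    """Observed polarization of one circuit from a {bitstring: count} dictionary."""
    n = len(target)
    shots = sum(counts.values())
    h = [0.0] * (n + 1)                      # h[k]: fraction of shots at distance k
    for bits, c in counts.items():
        h[hamming_distance(bits, target)] += c / shots
    weighted = sum((-0.5) ** k * hk for k, hk in enumerate(h))
    return (4**n * weighted - 1) / (4**n - 1)

# A perfect circuit gives S = 1; uniformly random output gives S = 0.
perfect = observed_polarization({"010": 100}, "010")
uniform = observed_polarization({"00": 25, "01": 25, "10": 25, "11": 25}, "00")
```

The two limiting cases show why S is a convenient rescaling of success probability: it is 1 for an error-free circuit and 0 for a completely depolarized output, independent of n.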
A specific MRB experiment is defined by a gate set G, a sampling distribution Ω, and the usual RB sampling parameters (a set of benchmark depths d, the number of circuits K sampled per depth, and the number of times N each circuit is run). Our protocol is the following:
1. For a range of integers d ≥ 0, sample K randomized mirror circuits that have a benchmark depth of d, using the sampling distribution Ω, and run each one N ≥ 1 times.
2. Estimate each circuit's observed polarization S [Eq. (12)].
3. Fit S̄_d, the mean of S at benchmark depth d, to
$\bar{S}_d = A p^d$, (13)
where A and p are fit parameters, and then compute
$r_\Omega = (4^n - 1)(1 - p)/4^n$. (14)
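The final analysis step can be sketched as follows. This is an illustration under assumptions, not the fitting code used in this work: it assumes the decay model S̄_d = A p^d and the rescaling r_Ω = (4^n − 1)(1 − p)/4^n used in MRB of Clifford gates, and it replaces a nonlinear least-squares fit with a simple log-linear regression, which requires all mean polarizations to be positive. The function names are hypothetical.

```python
import math

def fit_decay(depths, mean_polarizations):
    """Fit S = A * p**d by linear least squares on log(S) versus d.

    Returns (A, p). Requires every mean polarization to be positive.
    """
    xs = list(depths)
    ys = [math.log(s) for s in mean_polarizations]
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    return math.exp(intercept), math.exp(slope)

def mrb_error_rate(p, n):
    """Rescale the decay parameter p into an n-qubit layer error rate."""
    return (4**n - 1) * (1 - p) / 4**n

# Synthetic, noise-free data for a two-qubit experiment.
depths = [0, 2, 4, 8, 16, 32]
data = [0.95 * 0.98**d for d in depths]
A, p = fit_decay(depths, data)
r = mrb_error_rate(p, n=2)
```

A log-linear fit weights noisy low-polarization points heavily, so with real data a weighted or nonlinear fit would usually be preferable; the sketch is meant only to show how p is rescaled into r_Ω.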

IV. THEORY AND SIMULATIONS OF MRB ON UNIVERSAL GATE SETS
In this section we present a theory for MRB of universal gate sets that shows that our method is reliable. We show that the average observed polarization (S̄_d) decays exponentially, and that the MRB error rate (r_Ω) approximately equals the average error rate of Ω-distributed layers (ϵ_Ω). In Section IV A we define ϵ_Ω, the error rate that MRB is designed to measure. In Section IV B we show that r_Ω ≈ ϵ_Ω assuming Pauli stochastic error on each circuit layer. In Sections IV C and IV D we present theory and simulations of the performance of MRB under general Markovian errors to further validate our method. In particular, we show that the randomized compilation step of our circuit construction guarantees that all errors in the circuit are twirled into Pauli stochastic error (implying that r_Ω ≈ ϵ_Ω) under the assumption that all two-qubit gates are Clifford gates.
FIG. (caption fragment): The relative error δ_rel = (r_{Ω,perQ} − ϵ_{Ω,perQ})/ϵ_{Ω,perQ} divided by its uncertainty σ_{δ_rel} for each randomly sampled error model (σ_{δ_rel} is calculated via a standard non-parametric bootstrap). The MRB error rate r_Ω is biased towards very slightly underestimating ϵ_Ω for n > 2 qubits, which is expected from our theory (see main text).

A. The error rate of Ω-distributed random circuits
Our claim is that r_Ω is a reliable estimate of the average error rate ϵ_Ω of Ω-distributed n-qubit circuit layers. We now make this claim precise by defining ϵ_Ω. Surprisingly, defining the error rate that our method (or any other RB method) should aim to estimate is challenging. RB protocols are often formulated as methods for measuring the mean infidelity of a set of n-qubit gates or layers, but this is subtly flawed: the mean infidelity is not an observable property of a set of physical gates, i.e., it is not "gauge-invariant" [45]. One solution to this problem, which we adopt herein, was introduced in Ref. [46]: the rate of decay of the mean fidelity of a family of random circuits, as a function of increasing circuit depth, is (approximately) gauge-invariant. This decay rate can therefore be what an RB protocol aims to measure.
Our protocol aims to estimate the rate at which the fidelity of Ω-distributed random circuits decays with depth. The average fidelity of Ω-distributed random circuits with benchmark depth d (F̄_d) is given by
$\bar{F}_d = \mathbb{E}_{\mathsf{C}_d} F\left(\phi(\mathsf{C}_d), \mathcal{U}(\mathsf{C}_d)\right)$. (15)
The requirement that our Ω-distributed circuits are highly scrambling, which is guaranteed by our restrictions on G and Ω (see Section II C), ensures that F̄_d decays exponentially, and therefore has a well-defined rate of decay. In Section IV we show that F̄_d decays exponentially in depth for our circuits, i.e., F̄_d ≈ A p_rc^d + B, for constants A and B. We then define
$\epsilon_\Omega = (4^n - 1)(1 - p_{\mathrm{rc}})/4^n$. (16)
We choose ϵ_Ω to be this particular rescaling of p_rc because p_rc corresponds to the effective polarization of a random composite layer in an Ω-distributed random circuit (i.e., the polarization in a depolarizing channel that would give the same fidelity decay), and so ϵ_Ω is the effective average infidelity of a layer sampled from Ω. When stochastic Pauli errors are the dominant source of error, ϵ_Ω is approximately equal to the average layer entanglement infidelity (see Appendix A 3).
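The interpretation of ϵ_Ω as an effective layer infidelity can be checked with a short calculation. The sketch below is illustrative only: it assumes ϵ_Ω = (4^n − 1)(1 − p_rc)/4^n and models every composite layer as an identical global depolarizing channel with polarization p_rc, for which the polarization after d layers is p_rc^d and the fidelity is p_rc^d (1 − 4^{−n}) + 4^{−n}. The function names are hypothetical.

```python
def layer_error_rate(p_rc, n):
    """Effective layer infidelity obtained by rescaling the decay parameter p_rc."""
    return (4**n - 1) * (1 - p_rc) / 4**n

def depolarized_fidelity(p_rc, d, n):
    """Entanglement fidelity of d identical depolarizing layers of polarization p_rc.

    This has the form A * p_rc**d + B with A = 1 - 4**-n and B = 4**-n.
    """
    return p_rc**d * (1 - 1 / 4**n) + 1 / 4**n

p_rc, n = 0.99, 3
single_layer_infidelity = 1 - depolarized_fidelity(p_rc, 1, n)
eps = layer_error_rate(p_rc, n)
```

In this depolarizing model the infidelity of a single layer equals the rescaled decay parameter exactly, which is the sense in which ϵ_Ω is an effective average infidelity.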

B. MRB with stochastic Pauli errors
We now show that r_Ω ≈ ϵ_Ω under the assumption of stochastic Pauli errors on each circuit layer. A more detailed derivation can be found in Appendix A. Throughout this section, we will treat circuits and circuit layers as random variables. We assume each circuit layer has gate-dependent Markovian error, ϕ(L) = E(L)𝒰(L). We will model the error on state preparation and measurement (SPAM) and the first and last circuit layers of a randomized mirror circuit [R(P_0 L_0) and R(P_d L_0^{-1} P^c_{d−1}), respectively] as a single global depolarizing error channel E_SPAM immediately before the final circuit layer. We assume E_SPAM is independent of L_0 and the target bit string of the circuit.
We start by showing that the mean observed polarization [Eq. (12)] of randomized mirror circuits, which is measured in the MRB protocol, equals the mean polarization of the overall error map of a randomized mirror circuit. An implementation of the depth-d randomized mirror circuit M_d [whose structure is given in Eq. (11)] can be expressed in terms of its error and its target evolution U(P_{d+1}) as where (19) and Eq. (18) defines an overall error map for M_d, which includes the error from the d/2 Ω-distributed circuit layers and their inverses (after randomized compilation). To extract the polarization [Eq. (4)] of this error map, we average over the initial circuit layer L_0, making use of a fidelity estimation technique that requires only single-qubit gates: the fidelity of any error channel E can be found by averaging over a tensor product of single-qubit 2-designs [44]. In particular, for any bit string y ∈ {0, 1}^n, where Ē = 𝔼_L[𝒰(L)† E 𝒰(L)] and L = ⊗_{i=1}^{n} L_i, where each L_i is an independent, single-qubit 2-design [44]. Applying Eq. (21) to Eq. (17), we find that where S(M_d) denotes the observed polarization [Eq. (12)] of M_d. Therefore, the mean observed polarization over all depth-d randomized mirror circuits is Equation (23) says that the average observed polarization S̄_d, which is estimated in the MRB protocol, is equal to the expected polarization of the error channel of a depth-d randomized mirror circuit.
We now show how S̄_d depends on the error rate of layers sampled from Ω (ϵ_Ω). To do so, we use the fact that a depth-d randomized mirror circuit consists of randomized compilation of a circuit consisting of a depth-d/2 Ω-distributed random circuit C_{d/2} followed by its inverse. These two depth-d/2 circuits are both Ω-distributed (even after randomized compilation), but they are correlated. In particular, where E_eff(C_{d/2}) is the overall error map for C_{d/2} [i.e., ϕ(C_{d/2}) = 𝒰(C_{d/2})E_eff(C_{d/2})] and Ē_eff(C^{-1}_{d/2}) denotes the average error map over all possible circuits C′ resulting from applying randomized compilation to C^{-1}_{d/2}. Expressing Eq. (24) in terms of the mean observed polarization of the overall error map on an Ω-distributed random circuit, we have where and Equation (25) shows that if S̄_d and Γ̄_{d/2} decay exponentially, then their decay rates are related, i.e., r_Ω = ϵ_Ω if ∆_Ω = 0. ∆_Ω quantifies the correlation between the overall error map of a depth-d/2 Ω-distributed random circuit and the overall error map of its randomly compiled inverse. We conjecture that |∆_Ω| is typically small for physically relevant errors, which is supported by our simulations (see Section IV D) and prior work [6].
We now show that the expected polarization of Ω-distributed random circuits (Γ̄_d) decays exponentially. Together with the assumption that |∆_Ω| is small, this implies that S̄_d decays exponentially. To show that Γ̄_d decays exponentially, we will assume that the error on each composite layer E(L) is a stochastic Pauli channel [Eq. (5)]. This assumption implies that E_eff(C_d) [Eq. (19)] is the composition of a stochastic Pauli channel for each composite layer of M_d, each rotated by a unitary. This allows us to relate the polarization of E_eff(M_d) to the polarizations of the error channels of individual circuit layers using the scrambling condition required for MRB [Eq. (6)].
Due to the scrambling condition on the gate set and sampling distribution for MRB [Eq. (6)], the polarization of the effective error channel of Ω-distributed random circuits is approximately equal to the product of the polarizations of the layers' error channels. Specifically, the expected polarization of the overall error map is where δ = O(dε(δ + kε)), and ε is the average layer infidelity. Because circuits longer than d = O(1/ε) have negligible polarization, we need only consider the case where dε = O(1). Because kε and δ are small, δ is negligible. In the small n limit, Eq. (28) follows because Eq. (6) implies that depth-k Ω-distributed random circuits rapidly converge to a unitary 2-design (as a function of k). In this case, errors in Ω-distributed random circuits are rapidly scrambled into global depolarizing errors, which implies that the polarizations of the circuit layers approximately multiply. For n ≳ 3, our circuits do not quickly converge to a 2-design, but in Appendix A 3 we show that Eq. (6) implies that error cancellation is negligible in Ω-distributed random circuits, from which it follows that Γ̄_d decays exponentially at a rate determined by the expected layer polarization.
We have shown that the expected polarization of the overall error map of Ω-distributed random circuits decays exponentially, and we now relate its decay rate to the decay rate of the observed polarization of randomized mirror circuits, thereby relating r_Ω and ϵ_Ω. Combining Eq. (28) with Eq. (25), we have Assuming that ∆_Ω is small, Eq. (29) implies that S̄_d and Γ̄_d have approximately the same decay rate, which implies that r_Ω ≈ ϵ_Ω.

C. MRB with general errors
The theory presented above (Section IV B) shows that MRB is reliable whenever stochastic Pauli errors dominate over all other possible errors (e.g., coherent errors). In practice, stochastic error is not always dominant, which our protocol addresses with the randomized compilation step [see Fig. 2]. The purpose of this step is to, upon averaging, convert all types of errors into stochastic Pauli errors [39], in which case the theory presented above can be used to infer that r_Ω ≈ ϵ_Ω. When MRB is implemented on a gate set in which all of the two-qubit gates are Clifford gates, this noise tailoring follows from standard randomized compilation theory [39]. In Appendix B, we show that with a Clifford two-qubit gate set, the error in MRB circuits is twirled into Pauli stochastic noise under the assumption that the error map on the single-qubit gates is independent of the Pauli gates with which they are compiled. In actual devices it is common for the single-qubit gate layers to have errors that are gate-dependent but much smaller than the two-qubit gate errors, in which case this result holds approximately [39].
Our MRB protocol can be applied to all controlled rotations around Pauli axes, i.e., all CP θ gates. When the two-qubit gates are not all Clifford gates (i.e., when θ ≠ 0, π), the randomized compilation method used in our circuits is not equivalent to standard randomized compilation. In this case, we cannot use standard randomized compilation theory to guarantee that all coherent errors on the two-qubit gates are twirled into stochastic Pauli errors. Ineffective twirling of coherent errors on two-qubit gates could result in coherent cancellation of the errors in a layer of two-qubit gates and its inversion layer in the second half of the mirror circuit (as happens in a simple mirror circuit, or standard Loschmidt echo [7]). In Appendix C 1 we prove that our randomized compilation method largely, but not entirely, prevents this error cancellation. We consider the sensitivity of our method to general Hamiltonian errors on each gate g ∈ G 2. We model these errors by an error map $E(g) = e^{M_g}$, where
\[ M_g = \sum_{P_a, P_b} \varepsilon^{g}_{P_a,P_b} H_{P_a,P_b}, \tag{30} \]
and $H_{P_a,P_b}$ is the two-qubit Hamiltonian error generator indexed by the Pauli operators P a and P b, with rate $\varepsilon^{g}_{P_a,P_b}$, as defined in Ref. [47].
We show that r Ω depends on all Hamiltonian errors in CP θ gates except one particular linear combination of the Hamiltonian errors on CP θ and CP −θ gates, when θ ≠ 0, π (i.e., when CP θ is not a Clifford gate). In particular, r Ω is insensitive (at first order) to $\varepsilon^{CP_\theta}_{P,P} + \varepsilon^{CP_{-\theta}}_{P,P}$ when θ ≠ 0, π. This is the sum of over- and under-rotation Hamiltonian errors in the CP θ gate and its inverse. In Appendix C 2 we discuss how our technique could be adapted to remove this limitation. Note that if G 2 = {cs, cs †}, as is the case in our simulations (below) and some of our experiments (Section V), then r Ω is insensitive (at first order) to $\varepsilon^{cs}_{Z,Z} + \varepsilon^{cs^\dagger}_{Z,Z}$. However, it is sensitive to all other linear combinations of the Hamiltonian errors on the cs and cs † gates.

D. Simulations
We now use numerical simulations to investigate the robustness of MRB, studying whether the MRB error rate (r Ω) closely approximates the error rate of Ω-distributed layers (ϵ Ω). Our theory for MRB suggests that MRB is particularly robust when the two-qubit gates are Clifford gates and when all errors are stochastic Pauli errors. Therefore, to test MRB outside this regime, we simulated MRB with non-Clifford two-qubit gates and for both stochastic and coherent errors. We simulated MRB for n-qubit layer sets constructed from the gate set G 1 = SU(2) and G 2 = {cs, cs †} and n = 1, 2, 4, with all-to-all connectivity. We used a sampling distribution Ω 2 for which the two-qubit gate density is ξ = 1/2 [48]. In these simulations (and our experiments) each single-qubit gate u(θ, ϕ, λ) is decomposed into the sequence of x π/2 and z θ gates given in Eq. (31). Here x π/2 is a π/2 rotation around the X axis and z θ is a rotation around the Z axis by θ ∈ [0, 2π). Note that even when a shorter sequence of gates can implement the required unitary (e.g., u(0, 0, 0) implements the identity so it could be implemented with no gates) we always use this sequence of five gates. Therefore, the only difference between any two single-qubit gates is the angles of the z θ gates.

We simulated three different families of error model: stochastic Pauli errors, Hamiltonian errors, and stochastic and Hamiltonian errors. These error models are specified using the error generator framework of Ref. [47], and they consist of gate-dependent errors specified by randomly sampling error rates for each type of error and each gate. We simulated error models that are crosstalk free (note that our theory encompasses crosstalk errors), so each error model is specified by the rates of each type of local error on each gate. In particular, for an m-qubit gate we randomly sample 4^m − 1 stochastic error generators, or 4^m − 1 Hamiltonian error generators, or both, depending on the error model family. We sampled the error rates so that the infidelity of each two-qubit gate was approximately q, and the infidelity of each one-qubit gate was approximately 0.1q, where q is a parameter swept over a range of values (see Appendix D). These error models have perfect state preparation and measurements. The effect of SPAM error on the polarization is approximately independent of benchmark depth, and therefore we expect MRB to be robust to SPAM error. In Appendix D we present simulations that compare the MRB error rate in error models with perfect measurements to error models with bit flip and amplitude damping measurement error. We find that these measurement errors do not significantly impact the resulting MRB error rate.
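The sampling of a crosstalk-free stochastic error model can be illustrated with a minimal sketch. It is not the procedure used for the simulations above (that is described in Appendix D); it assumes, as a first-order approximation, that the infidelity of a gate with only stochastic Pauli errors equals the sum of its stochastic error rates. The function names and the value of q are hypothetical.

```python
import random


def num_error_generators(m: int) -> int:
    """Number of non-identity m-qubit Pauli labels: 4**m - 1."""
    return 4**m - 1


def sample_stochastic_rates(m: int, target: float, rng: random.Random) -> list[float]:
    """Hypothetical sampler: random nonnegative rates rescaled to sum to `target`.

    Assumption: to first order, the gate infidelity equals the sum of the
    stochastic Pauli error rates, so the rescaling fixes the infidelity.
    """
    raw = [rng.random() for _ in range(num_error_generators(m))]
    total = sum(raw)
    return [target * x / total for x in raw]


rng = random.Random(0)
q = 0.01  # hypothetical two-qubit gate infidelity
two_qubit_rates = sample_stochastic_rates(2, q, rng)        # 15 rates summing to q
one_qubit_rates = sample_stochastic_rates(1, 0.1 * q, rng)  # 3 rates summing to 0.1q
print(len(one_qubit_rates), len(two_qubit_rates))
```

Sweeping q then generates a family of error models whose two-qubit gates are roughly ten times noisier than their one-qubit gates.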
Figure 3 shows the results of our main simulations. It compares the true average layer error rate per qubit to the observed MRB error rate per qubit in each simulation, separated into the three families of error model (1σ error bars are shown, computed using a standard bootstrap). Figure 3 (a-c) shows that r Ω ≈ ϵ Ω in every simulation, which means that our method closely approximates the error rate of Ω-distributed layers for all of these error models.
For stochastic error models [Fig. 3 (a)], the relative error δ rel = (r Ω,perQ − ϵ Ω,perQ)/ϵ Ω,perQ in the MRB estimate of ϵ Ω,perQ is small: |δ rel| < 0.04 for all sampled error models, and the mean |δ rel| is 0.007. This is consistent with, and supports, our theory for MRB with stochastic errors. The relative error is larger for Hamiltonian error models: the mean relative error is 0.04 and |δ rel| < 0.21 for all error models. We expect larger relative error for some Hamiltonian error models, because MRB is insensitive to some Hamiltonian errors (see Section IV C), but note that the uncertainty due to finite sample fluctuations (σ) is also larger in these simulations. For stochastic Pauli errors [Fig. 3 (a)], the uncertainty in r Ω,perQ is small, because there is little variation in the performance of circuits of the same depth (the mean uncertainty in r Ω,perQ is 0.5%). For Hamiltonian errors [Fig. 3 (c)], the uncertainty in r Ω,perQ is larger (the mean uncertainty is 3%), as individual circuit performance varies widely due to coherent addition or cancellation of error being highly dependent on the circuit structure (as in all RB methods, we expect coherent errors to add or cancel in individual MRB circuits).
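As a small illustration of how the summary statistics quoted above are formed, the sketch below computes δ rel for a few error models and then the mean and maximum of |δ rel|. The numerical values are invented for illustration and are not taken from the simulations.

```python
def relative_error(r_per_q: float, eps_per_q: float) -> float:
    """Relative error of the MRB estimate: (r - eps) / eps."""
    return (r_per_q - eps_per_q) / eps_per_q


# Hypothetical (observed MRB rate, true layer error rate) pairs, per qubit.
models = [(0.0098, 0.0100), (0.0051, 0.0050), (0.0200, 0.0203)]

deltas = [relative_error(r, eps) for r, eps in models]
mean_abs = sum(abs(d) for d in deltas) / len(deltas)  # mean |delta_rel|
max_abs = max(abs(d) for d in deltas)                 # bound on |delta_rel|
print(f"mean |delta_rel| = {mean_abs:.3f}, max |delta_rel| = {max_abs:.3f}")
```

A negative δ rel means that MRB underestimates the true layer error rate for that model.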
Arguably the most relevant simulations for real-world quantum computers are those with both stochastic and coherent errors [Fig. 3 (b)]. In these simulations we sampled random combinations of stochastic and Hamiltonian errors (so the dominant source of error varies across these models). We find that r Ω ≈ ϵ Ω holds to a good approximation for typical error models sampled from this ensemble (the mean relative error is 0.017, |δ rel| < 0.11 for all models, and the mean uncertainty in r Ω,perQ is 1.4%).
To investigate whether there is evidence for r Ω systematically under (or over) estimating ϵ Ω we plot the relative error divided by its uncertainty σ δ rel [Fig. 3 (d-f)]. For a single qubit, there is no evidence that MRB is significantly biased towards under or overestimating ϵ Ω with these error models. In contrast, we find that MRB slightly but systematically underestimates ϵ Ω for n > 1 qubits. This underestimate can be explained by the correlation between the error in an Ω-distributed circuit and its randomly-compiled inverse, which determines the difference between r Ω and ϵ Ω (see Section IV B). When the circuits contain two-qubit gates, which in our simulations (and in most real systems) have higher error rates than one-qubit gates, the error in a circuit is typically highly correlated with the number of two-qubit gates in the circuit. As a result, the correlation between a circuit and its randomly-compiled inverse is typically larger when the circuits contain a variable number of two-qubit gates, causing r Ω to slightly underestimate ϵ Ω.

FIG. 4. The Advanced Quantum Testbed. We performed MRB experiments on four qubits (Q4-Q7) of AQT's eight-qubit superconducting transmon processor (AQT@LBNL Trailblazer8-v5.c2). The processor includes 8 fixed-frequency transmons coupled in a ring geometry. Each qubit (purple) has its own control line (orange) and readout resonator (cyan) coupled to a shared readout bus (red) for multiplexed readout.

V. EXPERIMENTS ON THE ADVANCED QUANTUM TESTBED
We used MRB to benchmark universal gate sets on the Advanced Quantum Testbed (AQT) [40], a quantum computing testbed platform based on superconducting qubits. We performed our experiments on four qubits (Q4-Q7) of an eight-qubit superconducting transmon processor (AQT@LBNL Trailblazer8-v5.c2). These four qubits are coupled to their nearest neighbors in a linear geometry (see Fig. 4). Below and throughout this paper, estimated quantities include error bars where possible [49]. All error bars are 1σ and are written using standard concise notation, i.e., r = 1.2(3)% means r = 1.2% with a standard error of 0.3%.
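The concise notation can be unpacked mechanically: the digits in parentheses give the uncertainty in the last quoted digits of the value. The helper below is a hypothetical convenience function, not part of any analysis code used for the experiments.

```python
import re


def parse_concise(text: str) -> tuple[float, float]:
    """Parse concise notation such as '1.2(3)%' into (value, standard error)."""
    match = re.fullmatch(r"\s*(-?\d+)(?:\.(\d+))?\((\d+)\)\s*%?\s*", text)
    if match is None:
        raise ValueError(f"not in concise notation: {text!r}")
    integer_part, decimals, err_digits = match.groups()
    decimals = decimals or ""
    value = float(f"{integer_part}.{decimals}") if decimals else float(integer_part)
    # The error digits are aligned with the last quoted digits of the value.
    error = int(err_digits) / 10 ** len(decimals)
    return value, error


print(parse_concise("1.2(3)%"))   # value 1.2, standard error 0.3
print(parse_concise("1.64(5)%"))  # value 1.64, standard error 0.05
```

The returned numbers are in the same units as the quoted value (percent in the examples above).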
FIG. 5. Randomized benchmarking of universal gate sets on four qubits of the Advanced Quantum Testbed. We used MRB to benchmark n-qubit layers constructed from three different gate sets, on each connected n-qubit subset of a linearly-connected set of four qubits {Q4, Q5, Q6, Q7} in an eight-qubit superconducting transmon processor (AQT@LBNL Trailblazer8-v5.c2). The rows correspond to results from three different choices of gate set, each consisting of a two-qubit gate set G 2 and a single-qubit gate set G 1. From top to bottom, the rows correspond to: a universal gate set containing two non-Clifford entangling gates and the set of all single-qubit gates [G 2 = {cs, cs †}, G 1 = SU(2)] ... (d-f): The estimated error rate r Ω for each qubit subset that we benchmarked. (g-i): Predictions for the average layer error rate of 3- and 4-qubit subsets (hatched) based on the experimental 1- and 2-qubit error rates (un-hatched) and the assumption of no crosstalk errors. The difference between (d-f) and (g-i) quantifies the contribution of crosstalk errors to the average error rate of an n-qubit layer, for n = 3, 4. For all three gate sets and n = 4, we see that crosstalk errors are contributing approximately 0.7% error to r Ω, which is approximately 1/3 of r Ω.

A. Experiment design
One of the advantages of MRB is that it can benchmark a wide variety of n-qubit layer sets, and we used this flexibility to explore the performance of three distinct layer sets on AQT. Each layer set is defined by a set of single-qubit gates G 1, a set of two-qubit gates G 2, a two-qubit gate density ξ, and the connectivity of the qubit subset (see Section II). In our experiments we investigated three different choices for (G 1, G 2): (SU(2), {cs, cs †}), (SU(2), {cz}), and (C 1, {cz}), where C 1 is the set of all 24 single-qubit Clifford gates. These circuits contain strict barriers between all layers, including between the single- and two-qubit gate layers that make up each composite layer.
MRB enables benchmarking each layer set on any connected set of qubits, and the error rates on subsets of a device can be used to learn about the location and type of errors. We benchmarked n-qubit layer sets for every possible connected set Q ⊆ {Q4, Q5, Q6, Q7} of n qubits with n = 1, 2, 3, 4, resulting in 10 different qubit subsets. Independently benchmarking every connected subset of qubits allows us to study the spatial variation in gate performance in detail and determine the size of crosstalk error in circuits with 3 and 4 qubits (see Section V C). For each RB experiment, we sampled K = 30 circuits at a set of exponentially-spaced benchmarking depths (d = 0, 2, 4, 8, ...).
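The count of 10 qubit subsets follows from the linear connectivity: the connected subsets of a chain are exactly its contiguous runs. The sketch below enumerates them and also generates exponentially spaced depths; the maximum depth of 64 is a hypothetical value chosen for illustration.

```python
def connected_subsets(chain: list[str]) -> list[tuple[str, ...]]:
    """All connected subsets of a linearly connected chain (its contiguous runs)."""
    subsets = []
    for size in range(1, len(chain) + 1):
        for start in range(len(chain) - size + 1):
            subsets.append(tuple(chain[start:start + size]))
    return subsets


def benchmark_depths(max_depth: int) -> list[int]:
    """Depth 0 followed by powers of two up to max_depth (hypothetical cutoff)."""
    depths = [0]
    d = 2
    while d <= max_depth:
        depths.append(d)
        d *= 2
    return depths


subsets = connected_subsets(["Q4", "Q5", "Q6", "Q7"])
print(len(subsets))          # 4 + 3 + 2 + 1 = 10
print(benchmark_depths(64))  # [0, 2, 4, 8, 16, 32, 64]
```

For a chain of N qubits this gives N(N + 1)/2 subsets, so exhaustive subset benchmarking of this kind remains feasible only for short chains.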
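For each of the three gate sets (G 1, G 2), and each qubit subset Q, we ran experiments with a two-qubit gate density of ξ = 1/2. To investigate the effect of varying ξ, we also ran experiments with ξ = 1/8 for one of the gate sets, (SU(2), {cs, cs †}), and every Q. For each qubit subset we therefore ran 4 MRB experiments, defined by [50]:
\[ (G_1, G_2, \xi) \in \left\{ \left(SU(2), \{cs, cs^\dagger\}, 1/2\right),\ \left(SU(2), \{cs, cs^\dagger\}, 1/8\right),\ \left(SU(2), \{cz\}, 1/2\right),\ \left(C_1, \{cz\}, 1/2\right) \right\}. \]
Further experiment details are provided in Appendix E.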

B. Estimating average error rates of universal layer sets
Figure 5 summarizes the results of the 3 × 10 MRB experiments in which we vary the gate set (G 1, G 2) (corresponding to the rows of Fig. 5) and the subset of qubits benchmarked Q, but we keep the expected two-qubit gate density constant (ξ = 1/2). The main output of an MRB experiment is an average layer error rate (r Ω), obtained by fitting the mean observed polarization [S d, defined in Eq. (12)] to an exponential decay. This error rate is a function of (G 1, G 2, Q, ξ), so we denote our estimated error rates by r(G 1, G 2, Q, ξ) whenever we need to refer to a particular error rate. These error rates quantify the performance of random circuits on this device and enable us to compare the average performance of the gate sets we tested.
Figure 5 (a-c) shows MRB data and fits to an exponential, for each of the three gate sets and ξ = 1/2. For each MRB experiment, we show the mean observed polarization (S d) versus benchmark depth, the distribution of the observed polarization versus benchmark depth, and the fit of S d to S d = A p^d. Data for a single representative subset of qubits of each size (n = 1, 2, 3, 4) are shown. In all cases, we observe that S d is consistent with an exponential decay in d, providing experimental evidence for our claim that S d will decay exponentially under a broad range of conditions.
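The fit described above can be sketched in a few lines. This is a simplified stand-in rather than the analysis used for Fig. 5: it fits S d = A p^d by linear least squares on log S d (so it requires positive polarizations), whereas a weighted nonlinear fit would normally be preferred. The rescaling from p to r Ω is given by Eq. (13), which is not reproduced in this section; the form used below, r = (4^n − 1)(1 − p)/4^n, is an assumption.

```python
import math


def fit_exponential_decay(depths, polarizations):
    """Fit S_d = A * p**d by least squares on log(S_d). Returns (A, p)."""
    xs = list(depths)
    ys = [math.log(s) for s in polarizations]  # requires every S_d > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


def mrb_error_rate(p: float, n_qubits: int) -> float:
    """Assumed rescaling of the decay constant: r = (4**n - 1) * (1 - p) / 4**n."""
    return (4**n_qubits - 1) * (1 - p) / 4**n_qubits


# Synthetic, noise-free data with hypothetical A = 0.9 and p = 0.98.
depths = [0, 2, 4, 8, 16, 32]
polarizations = [0.9 * 0.98**d for d in depths]
A, p = fit_exponential_decay(depths, polarizations)
print(f"A = {A:.3f}, p = {p:.3f}, r = {mrb_error_rate(p, 2):.4f}")
```

Because the prefactor A absorbs depth-independent effects, only p enters the error rate.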
Figure 5 (d-f) shows the estimated error rates (r Ω) for each qubit subset that we benchmarked, for each of the three different gate sets. Each r Ω is a rescaling of the decay rate of the fitted exponential [see Eq. (13)]. By comparing Fig. 5 (d), (e) and (f) we can compare the average error rates of n-qubit layers constructed from three different gate sets, two of which are universal and one of which contains only Clifford gates and therefore is not. By comparing (e) and (f), we find that the average error rate of a layer set is approximately independent of whether single-qubit gates are sampled from SU(2) or from C 1 (the single-qubit Clifford group), that is, r(SU(2), {cz}, Q, 1/2) ≈ r(C 1, {cz}, Q, 1/2) for all ten subsets of qubits Q. All single-qubit gates in our experiments are implemented using a composite u(θ, ϕ, λ) gate [see Eq. (31)] that contains two x π/2 gates and three z θ gates. This is the case even for unitaries that do not require two x π/2 pulses, such as the identity. The difference between any two single-qubit gates is therefore only in the angles of the three z θ gates within u(θ, ϕ, λ). The z θ gates are implemented by in-software phase updates on later pulses [51], so it is expected that these "virtual gates" cause negligible errors. The observed similarity between the average performance of these two gate sets is consistent with this expectation (numerical values for all estimated r Ω are included in Table I). Note, however, that the observed similarity between the average success rates of circuits in which the single-qubit gates u(θ, ϕ, λ) are sampled from two different distributions does not imply that the success rate of an individual circuit is independent of the values of θ, ϕ and λ in its u(θ, ϕ, λ) gates; see Appendix E 2 for further discussion.
Our experiments included MRB on n-qubit layers containing two non-Clifford two-qubit gates, cs and cs †, and we now turn to these results. Comparing Figs. 5 (d) and (e), we observe that the error rates for layers containing cs and cs † gates are all almost equal to, but slightly larger than, the error rates for layers containing cz gates. The largest relative difference is in the experiments on the 3-qubit set {Q4, Q5, Q6}: r(SU(2), {cs, cs †}, {Q4, Q5, Q6}, 1/2) = 1.64(5)% and r(SU(2), {cz}, {Q4, Q5, Q6}, 1/2) = 1.48(4)%. The three different two-qubit gates (cs, cs †, and cz) on each qubit pair were a priori expected to have similar error rates, due to their similar calibration procedures. The slightly larger error rates for cs and cs † were cross-validated using cycle benchmarking [9] (see Section V D for a quantitative comparison). Therefore, these results are experimental evidence for the robustness of MRB with non-Clifford two-qubit gates (see Sections IV C and IV D for theory and discussion of MRB with non-Clifford two-qubit gates).

C. Estimating crosstalk errors
Crosstalk is an important type of error in current quantum processors, but it is challenging to quantify [4]. Multi-qubit MRB captures crosstalk errors, and it enables us to quantify the contribution of crosstalk errors to the average error rate of n-qubit layers. To do so, we compare the observed increase in r Ω with n [Fig. 5 (d-f)] to predictions for r Ω that assume no crosstalk errors. The excess observed error above these predictions is then attributed to crosstalk.
FIG. 6. Estimating crosstalk errors on AQT. We estimate the contribution of crosstalk errors to the layer error rate r Ω for n = 3, 4 qubits by taking the difference between each experimental error rate (r Ω) and a corresponding prediction (r Ω,pred) obtained from the experimental one- and two-qubit error rates and the assumption of no crosstalk. We find that crosstalk contributes approximately 0.2%-0.4% to r Ω for n = 3 (which is 1/8-1/4 of r Ω), and approximately 0.7% to r Ω for n = 4 (which is 1/3 of r Ω).

We predict r Ω for sets of three or more qubits from the observed r Ω values for each one- and two-qubit subset (note, however, that this is not the only possible way to predict r Ω). This prediction is built on a simple theory for MRB. We model r Ω by
\[ r_\Omega = \sum_{L} \Omega(L)\, \epsilon_L, \tag{34} \]
where ϵ L is the infidelity of a G 1-dressed layer L, which consists of a specific layer of two-qubit gates (i.e., L is labelled by the two-qubit gate layer) followed by a layer of random single-qubit gates (either from SU(2) or C 1). Equation (34) is justified by our theory for MRB (see Section IV), but note that it only holds approximately, unless each layer's error channel is an n-qubit depolarizing channel. The fidelity F = 1 − ϵ of a tensor product of channels is the product of those channels' fidelities. So, under the assumption that there are no crosstalk errors, the infidelity of L is given by $\epsilon_L = 1 - \prod_{g \in L} F_g$, where g are the G 1-dressed gates in the G 1-dressed layer L, and F g is the fidelity of g. Therefore,
\[ \epsilon_L = 1 - \prod_{g \in L} (1 - \epsilon_g), \tag{35} \]
where ϵ g = 1 − F g. To predict ϵ L using Eq. (35) [and then r Ω using Eq. (34)] we need estimates for ϵ g for every possible G 1-dressed gate g. That is, we need estimates for (1) ϵ idle(Qi) for each qubit Qi ∈ {Q4, Q5, Q6, Q7}, where idle(Qi) is the G 1-dressed idle gate on Qi, and (2) ϵ g(Qi,Qj) for each connected pair of qubits {Qi, Qj}, where g(Qi, Qj) is a two-qubit gate on {Qi, Qj} uniformly sampled from G 2. Each of these quantities can be estimated from the observed one- and two-qubit MRB error rates. Using Eq. (34) we have
\[ \epsilon_{\mathrm{idle}(Qi)} = r(G_1, G_2, \{Qi\}, \xi), \tag{36} \]
because each single-qubit MRB circuit simply consists of repeating the G 1-dressed idle gate. Similarly, using Eq. (34) we have
\[ r(G_1, G_2, \{Qi, Qj\}, \xi) = \xi\, \epsilon_{g(Qi,Qj)} + (1 - \xi)\left[ 1 - \left(1 - \epsilon_{\mathrm{idle}(Qi)}\right)\left(1 - \epsilon_{\mathrm{idle}(Qj)}\right) \right], \tag{37} \]
because each G 1-dressed layer in a two-qubit MRB circuit is either (with probability ξ) a G 1-dressed two-qubit gate sampled uniformly at random from G 2, or a G 1-dressed idle on each qubit (with probability 1 − ξ). Using Eqs. (35)-(37) and explicit expressions for Ω(L), we obtain analytic expressions for our crosstalk-free predictions of r Ω for the 3- and 4-qubit layers. These predictions are shown in Fig. 5 (g-i). The crosstalk-free predictions are significantly smaller than the observed experimental values, shown in Fig. 5 (d-f). For each gate set, the predicted 4-qubit r Ω is approximately 25% smaller than the observed value. The crosstalk-free predictions for {Q4, Q5, Q6} are 13%-19% smaller than their observed values, and the crosstalk-free predictions for {Q5, Q6, Q7} are 20%-27% smaller than their observed values. The difference between the experimental error rates and the crosstalk-free predictions, shown in Fig. 6, is a quantification of the contribution of crosstalk errors to the average rate of errors in 3- and 4-qubit random circuits in this system. We note that one contribution to the difference between the observed r Ω and the crosstalk-free prediction is the difference between idle gates that occur in parallel with a two-qubit gate and idle gates that occur in single-qubit circuits. The idle that occurs in parallel with a two-qubit gate is a 200 ns idle (the duration of a two-qubit gate on this device), whereas the idle gate that occurs in a one-qubit circuit is a 60 ns idle. Our prediction methodology implicitly assumes that these two idle gates have the same error rate. However, we conjecture that the contribution from this difference is small, because idle gates in this system are relatively low error.
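The crosstalk-free prediction can be sketched numerically under stated assumptions. The code takes the dressed idle infidelity to equal the one-qubit MRB error rate, models the two-qubit MRB error rate as a mixture (probability ξ of a dressed two-qubit gate, probability 1 − ξ of a dressed idle on each qubit), multiplies fidelities within a layer, and averages layer infidelities over a layer distribution. All numerical rates and the three-layer distribution below are hypothetical; in particular, the distribution is not the Ω used in the experiments.

```python
from math import prod


def two_qubit_gate_infidelity(r_pair, eps_idle_a, eps_idle_b, xi):
    """Solve r_pair = xi*eps_g + (1 - xi)*eps_idle_pair for eps_g."""
    eps_idle_pair = 1 - (1 - eps_idle_a) * (1 - eps_idle_b)
    return (r_pair - (1 - xi) * eps_idle_pair) / xi


def layer_infidelity(gate_infidelities):
    """Crosstalk-free layer infidelity: 1 - product of the gate fidelities."""
    return 1 - prod(1 - eps for eps in gate_infidelities)


def predicted_error_rate(layer_distribution):
    """Average of layer infidelities weighted by the layer probabilities."""
    return sum(prob * layer_infidelity(gates) for prob, gates in layer_distribution)


xi = 0.5
# Hypothetical one-qubit MRB error rates, used as dressed idle infidelities.
idle = {"Q4": 0.0010, "Q5": 0.0012, "Q6": 0.0009}
# Hypothetical two-qubit MRB error rates.
g45 = two_qubit_gate_infidelity(0.006, idle["Q4"], idle["Q5"], xi)
g56 = two_qubit_gate_infidelity(0.007, idle["Q5"], idle["Q6"], xi)

# Hypothetical three-qubit layer distribution: (probability, gate infidelities).
layers = [
    (0.50, [idle["Q4"], idle["Q5"], idle["Q6"]]),  # no two-qubit gate
    (0.25, [g45, idle["Q6"]]),                     # gate on (Q4, Q5)
    (0.25, [idle["Q4"], g56]),                     # gate on (Q5, Q6)
]
r_pred = predicted_error_rate(layers)
print(f"crosstalk-free prediction: {100 * r_pred:.3f}%")
```

Subtracting such a prediction from the measured three-qubit r Ω gives the excess error that is attributed to crosstalk.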
FIG. 8. ... (b) The error rates obtained when running equivalent DRB and MRB experiments, on every one- and two-qubit subset of the 4 qubits we benchmarked on AQT. The close agreement between the DRB and MRB error rates is experimental evidence that MRB is reliable. The inset shows the polarization at benchmark depth d = 0 (S 0) for n-qubit DRB with n = 1, 2, 3. The rapid decay in S 0 is due to the overhead in implementing a Haar-random unitary, and it makes DRB of universal gate sets infeasible on more than around 2-3 qubits.

D. Estimating the error rates of individual gates
An MRB experiment is primarily designed to estimate a single error rate (r Ω) that quantifies the average error rate of an n-qubit layer. However, it is also often useful to quantify the error in specific layers, e.g., to identify high-error gates. Information about the error rates of individual layers is contained within the MRB data (e.g., RB data can even be used for full tomography [52,53]), and we extract it using a scalable model fitting method. Specifically, we fit a 4-qubit depolarizing error model to the 4-qubit MRB data to estimate the error rates of individual G 1-dressed layers [Fig. 7]. To validate our results, we compare the infidelities we estimate to independent estimates obtained from an established technique: cycle benchmarking [9], which is a method for estimating the infidelity of individual many-qubit gate layers. Fig. 7 shows that our estimates are broadly similar to those obtained from cycle benchmarking, differing by at most 23%; we would not expect exact agreement [54]. This demonstrates the potential of MRB to go beyond average error rate estimation, and provides an alternative to, e.g., interleaved RB. In Appendix E 3 we discuss the depolarizing model fit as well as two additional methods for estimating the error rate of individual layers from MRB data, and compare their predictions.

E. Comparison to direct RB
One of the purposes of our experiments is to test the reliability of MRB. To investigate whether r Ω ≈ ϵ Ω in experiment (as claimed by our theory), we compare the results of MRB to an alternative, established RB technique: direct RB (DRB) [5]. DRB is a streamlined variant of standard RB. Both DRB and standard RB are inefficient when applied to universal gate sets, as they have costs that scale exponentially with the number of qubits, but they are feasible in the very few qubit regime. We chose to compare MRB to DRB because these two methods have the same flexible circuit sampling and they are designed to measure the same error rate: ϵ Ω. In contrast, standard RB benchmarks a gate set that forms a group, e.g., SU(2^n), and it measures an error rate for a uniformly random element of that group, so this error rate cannot be directly compared to r Ω.
An n-qubit, benchmark depth d DRB circuit for a universal layer set is constructed by first sampling a depth-d circuit C with layers sampled from some distribution Ω, exactly as with MRB. As shown in Fig. 8 (a), this circuit C is then embedded between (1) a circuit that implements an n-qubit Haar random unitary, and (2) a circuit that returns the qubits to the computational basis. Note that both (1) and (2) require circuits of one- and two-qubit gates whose size grows exponentially in n (we compile an SU(2^n) unitary into a circuit of x π/2, z θ and cz gates using the Qsearch package [55,56]). We therefore ran DRB on all n-qubit subsets only up to n = 3.
In our DRB experiments we used the same layer sampling distribution as in our G 1 = SU(2), G 2 = {cs, cs †}, and ξ = 1/2 MRB experiments. So the DRB error rates we are measuring, which we denote by r DRB(SU(2), {cs, cs †}, Q, 1/2) for qubit subset Q, will be equal to the equivalent MRB error rates r(SU(2), {cs, cs †}, Q, 1/2) if both DRB and MRB are working correctly. Figure 8 compares these DRB and MRB error rates for each one- and two-qubit subset. For each of these qubit subsets, the two error rates differ by no more than 2σ. Due to the overhead in implementing a Haar-random unitary from SU(2^n), the 3-qubit DRB circuits were so large that the polarization of all n = 3 DRB circuits was S d ≈ 0, even for the d = 0 circuits, so we were not able to obtain reliable estimates of r DRB for either 3-qubit subset. The rapid decrease in the d = 0 polarization (S 0) with increasing n is shown in the inset of Fig. 8 (b). This demonstrates that DRB cannot be used to benchmark universal gate sets on more than around 2-3 qubits (and note that standard RB requires running even larger circuits than those used in DRB).

VI. 27-QUBIT IBM Q DEMONSTRATION
To investigate what many-qubit MRB can reveal about errors in current many-qubit hardware, we ran MRB on a 27-qubit IBM Q device (ibmq_montreal, a Falcon r4 processor). We used the universal gate set G 1 = SU(2) and G 2 = {cnot}, and we sampled layers with a two-qubit gate density of ξ = 1/4. Our circuits contain barriers between each layer of gates, as in our experiments on AQT (Section V). We chose a single qubit subset Q containing n qubits for 15 exponentially spaced n up to n = 27. This is illustrated in Fig. 9 (b), for 6 of the 15 qubit subsets. For each qubit subset, we sampled and ran 25 circuits at each of a set of exponentially spaced depths.
Figure 9 (a) shows the observed polarization versus benchmark depth for six representative values of n. Even for n = 27, where we observe an average layer error rate of r Ω = 28(1)%, we obtain a d = 0 average observed polarization of S 0 ≈ 40%. This demonstrates that MRB is practical on many qubits, even when the error rate per layer is O(10%). For all n, we observe that the mean observed polarization is consistent with an exponential decay, as expected. Fig. 9 (c) shows the error rate per qubit (r Ω,perQ = 1 − (1 − r Ω)^{1/n} ≈ r Ω/n) versus n. Our circuits have a fixed expected two-qubit gate density (of ξ = 1/4). Therefore, r Ω,perQ will be independent of n for n ≥ 2 if (1) the error rate of one-qubit gates and the error rate of two-qubit gates is invariant across the device, and (2) there are no crosstalk errors. Instead, we observe that r Ω,perQ rapidly increases from r Ω,perQ ≈ 0.2% for n = 2 up to r Ω,perQ ≈ 1.2%, an increase of approximately 500%.
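As a worked check of the per-qubit conversion, the short sketch below applies r Ω,perQ = 1 − (1 − r Ω)^{1/n} to the quoted n = 27 layer error rate of 28% and compares it to the linear approximation r Ω/n. The function name is hypothetical.

```python
def per_qubit_error_rate(r_layer: float, n: int) -> float:
    """Per-qubit error rate: 1 - (1 - r_layer)**(1/n)."""
    return 1 - (1 - r_layer) ** (1 / n)


r27 = per_qubit_error_rate(0.28, 27)
linear = 0.28 / 27  # first-order approximation r/n
print(f"exact: {100 * r27:.2f}%, linear: {100 * linear:.2f}%")  # about 1.2% and 1.0%
```

The exact value is consistent with the quoted r Ω,perQ ≈ 1.2%; the linear approximation is visibly lower at this error rate because r Ω is no longer small.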
To quantify the contribution of crosstalk errors to the observed increase in the per-qubit error rate with n, we first need to quantify the spatial variations in the one- and two-qubit gate error rates (meaning the error rates of those gates when all other qubits are idle). We used one- and two-qubit MRB to measure the error rates of each one-qubit subset and each connected two-qubit subset of the 27 qubits. Because of the large number of qubits, implementing independent one-qubit MRB experiments on each qubit (27 MRB experiments) and independent two-qubit MRB experiments on each connected pair of qubits (30 MRB experiments) would have required running more circuits than was feasible. Instead, we implemented all 27 one-qubit MRB experiments simultaneously [3]. The resultant one-qubit MRB error rates therefore include contributions from single-qubit gate crosstalk errors. We ran the 30 two-qubit MRB experiments in eight groups, selected to minimize the closeness in frequency space of the qubits in each group. These two-qubit MRB error rates will therefore include some contributions from two-qubit gate crosstalk, but the experiments have been designed with the aim of minimizing this contribution. We also ran five isolated two-qubit MRB experiments and observed that the simultaneous two-qubit MRB error rates were between 1.5 and 2.5 times larger than the corresponding isolated MRB error rates (see Table III).
We use the set of measured one- and two-qubit MRB error rates to predict the n-qubit r Ω that would be observed if there are no two-qubit gate crosstalk errors, using Eqs. (34)-(37) [57]. Figure 9 (c) shows the predictions for the per-qubit error rate r Ω,perQ. For n ≫ 1 these predictions (blue diamonds) are much smaller than the observations (red circles). This prediction accounts for spatial variations in the one- and two-qubit error rates, and includes contributions from one-qubit gate crosstalk errors (and some contributions from two-qubit gate crosstalk). Therefore, we can conclude that the additional observed error is due to crosstalk caused by the two-qubit gates, and it lower bounds the total contribution of crosstalk errors to r Ω. Figure 9 (d) shows the ratio R of the observed to the predicted error rate per qubit r Ω,perQ, versus n. R grows approximately linearly from R ≈ 0.2 at n = 2 up to R ≈ 2.5 at n ≈ 13 and then saturates at between R ≈ 2.5 and R ≈ 3.0. One possible explanation for this is two-qubit gate crosstalk errors with finite spatial radius, i.e., two-qubit gates cause increased errors on other qubits within some distance of the target qubits.
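The ratio R can be converted into the fraction of the observed per-qubit error that the prediction does not account for, namely 1 − 1/R. The sketch below does this for the saturation values quoted above, under the assumption that the prediction is treated as the baseline without two-qubit gate crosstalk. Because that prediction already contains some crosstalk contributions, the result is a lower bound on the total crosstalk fraction.

```python
def unexplained_fraction(ratio: float) -> float:
    """Fraction of the observed error rate above the prediction: 1 - 1/R."""
    return 1 - 1 / ratio


for R in (2.5, 3.0):
    print(f"R = {R}: fraction = {unexplained_fraction(R):.2f}")  # 0.60 and 0.67
```

At R ≈ 3 the excess is about two thirds of the observed error per qubit, matching the crosstalk contribution of approximately 2/3 stated in the abstract.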

VII. DISCUSSION
Scalable benchmarking methods are needed to quantify the integrated performance of medium- and large-scale quantum processors. In this paper, we introduced a scalable method for RB of universal gate sets that uses a novel and customizable family of randomized mirror circuits. We presented a theory for our method, showing that it reliably measures the error rate of a random n-qubit circuit layer sampled from a user-specified distribution Ω. We demonstrated MRB on multiple gate sets in both simulations and experiments, demonstrating that it is reliable and that it is a powerful tool for understanding errors in many-qubit circuits. Our method can be viewed as both an adaptation of standard RB and its variants, to enable efficient and scalable benchmarking of universal gate sets, and as an adaptation of XEB that removes XEB's inefficient circuit simulation step. It therefore provides a link between two widely used benchmarking methodologies, and so we anticipate that the ideas introduced here will lead to further advances in randomized benchmarking.
Using two quantum processors, we demonstrated MRB of a gate set consisting of cnot and arbitrary single-qubit gates on up to 27 qubits and MRB of a gate set with non-Clifford two-qubit gates (cs and cs†) on up to 4 qubits. Our results provide evidence that MRB with non-Clifford gates is a robust method for determining a processor's error rate per gate layer, and that these error rates can be used to understand the magnitude of various types of errors. Additionally, our results show that MRB on many qubits reveals and quantifies errors not present in one- and two-qubit circuits, highlighting the importance of scalable benchmarks. Comparisons of RB error rates predicted from crosstalk-free models and our experimental results show evidence of large crosstalk errors in both of the devices we benchmarked and, importantly, our methods make it possible to quantify the size of these crosstalk errors.

[Figure caption, partial] The observed error rate per qubit $r_{\Omega,\mathrm{perQ}} = 1 - (1 - r_\Omega)^{1/n} \approx r_\Omega/n$ (red circles) versus n increases rapidly with n, even though the circuits have a constant expected two-qubit gate density ξ = 1/4. This increase in $r_{\Omega,\mathrm{perQ}}$ is due to two-qubit gate crosstalk, not spatial variations in gate error rates. This is confirmed by comparison to predictions for $r_{\Omega,\mathrm{perQ}}$ (blue diamonds) obtained from one- and two-qubit error rates, for each one-qubit and connected two-qubit subset, and the assumption of no crosstalk. (d) The ratio of the observed ($r_{\Omega,\mathrm{perQ}}$) to predicted ($r_{\Omega,\mathrm{perQ,pred}}$) per-qubit error rate shows that crosstalk errors cause the per-qubit error rate $r_{\Omega,\mathrm{perQ}}$ to increase by approximately 250%-300% when n ≥ 15. (e) The one- and two-qubit error rates were obtained using simultaneous one-qubit MRB on all 27 qubits (blue boxes), and two-qubit MRB on all pairs of connected qubits, run simultaneously on the qubit pairs in eight distinct groupings (the purple and green boxes show two such groups).
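The per-qubit error rate quoted in the figure caption, $1 - (1 - r_\Omega)^{1/n}$, and the observed-to-predicted ratio can be computed as in the following minimal sketch. The function names and the numerical inputs are hypothetical and chosen only for illustration; they are not values from the experiments.

```python
def per_qubit_error_rate(r_omega: float, n: int) -> float:
    """Return r_perQ = 1 - (1 - r_omega)**(1/n) for an n-qubit layer error rate."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= r_omega <= 1.0:
        raise ValueError("r_omega must lie in [0, 1]")
    return 1.0 - (1.0 - r_omega) ** (1.0 / n)


def crosstalk_ratio(r_observed: float, r_predicted: float, n: int) -> float:
    """Ratio of observed to crosstalk-free predicted per-qubit error rates."""
    return per_qubit_error_rate(r_observed, n) / per_qubit_error_rate(r_predicted, n)


# Hypothetical numbers, chosen only to illustrate the calculation.
n_qubits = 20
r_obs, r_pred = 0.30, 0.10
print(per_qubit_error_rate(r_obs, n_qubits))
print(crosstalk_ratio(r_obs, r_pred, n_qubits))
```

For small $r_\Omega$ the result is close to $r_\Omega/n$, which is the approximation stated in the caption.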
We anticipate that a variety of interesting benchmarking methods can be constructed using MRB and extensions or adaptations of this method. For example, we anticipate that MRB can form the foundation of methods for estimating the error rates of individual gates and layers, within the context of many-qubit circuits. In this work we demonstrated a simple example of such a technique (fitting MRB data to a depolarizing model), and we expect that a variety of robust methods could be developed that would complement or improve on existing methods for this task [9,12,42], such as interleaved RB. For example, MRB can potentially be adapted to extend the averaged circuit eigenvalue sampling protocol [12] to universal gate sets. Furthermore, we anticipate that MRB can be adapted to construct scalable "full-stack" benchmarks based on random circuits, such as a scalable variant of the widely used quantum volume benchmark [30].

[...] Department of Energy's National Nuclear Security Administration (DOE/NNSA) under contract DE-NA0003525. This written work is authored by an employee of NTESS. The employee, not NTESS, owns the right, title and interest in and to the written work and is responsible for its contents. Any subjective views or opinions that might be expressed in the written work do not necessarily represent the views of the U.S. Government. The publisher acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this written work or allow others to do so, for U.S. Government purposes. The DOE will provide public access to results of federally sponsored research in accordance with the DOE Public Access Plan.

We acknowledge the use of IBM Quantum services for this work. The views expressed are those of the authors, and do not reflect the official policy or position of IBM or the IBM Quantum team.

CODE AND DATA AVAILABILITY
[...] To obtain Eq. (A3), we use the reflection structure of randomized mirror circuits, in particular $\mathcal{U}(R(P_{d+1} L_0^{-1} P_d^c)\,\tilde{M}_d) = \mathcal{U}(P_{d+1} L_0^{-1} P_0)$, where $P_0$ and $P_{d+1}$ are the Pauli gates that are recompiled into $L_0$ and $L_0^{-1}$, respectively, in the randomized compilation step. The Pauli gate $P_{d+1}$ determines the target bit string of $M_d$, i.e., $\mathcal{U}(M)|0\rangle\rangle = \mathcal{U}(P_{d+1})|0\rangle\rangle = |b\rangle\rangle$. The overall error map $\mathcal{E}_{\rm eff}(\tilde{M}_d)$ [Eq. (A5)] contains the error from the $d/2$ Ω-random circuit layers and their inverses (after randomized compilation), and it is composed of unitary rotations of the error channels associated with each circuit layer.
In the MRB protocol, we compute each circuit's observed polarization $S$ [Eq. (12)]. We now show that the observed polarization $S(M_d)$ is related to the polarization [Eq. (4)] of $M_d$'s overall error map (introduced above). Using the expression for $\phi(M_d)$ in Eq. (A3), the probability of measuring bit string $x$ on circuit $M_d$ is given by [...].

The layer $L_0$ consists of single-qubit gates independently sampled from single-qubit unitary 2-designs. We now average over the initial circuit layer $L_0$, making use of a fidelity estimation technique based on single-qubit gates: the fidelity of any error channel $\mathcal{E}$ can be found by averaging over a tensor product of single-qubit 2-designs [44]. In particular, for any bit string $y \in \{0, 1\}^n$, [...], where each $L_i$ is an independent, single-qubit 2-design. This implies that the expected observed polarization of $M_d$ over $L_0$ is [...], where $\gamma(\mathcal{E})$ denotes the polarization of $\mathcal{E}$ [Eq. (4)]. Eq. (A9) follows from Eq. (A8). Averaging over all depth-$d$ randomized mirror circuits, the mean observed polarization is [...].

Equation (A10) says that the average observed polarization $\bar{S}_d$, which is estimated by the MRB protocol, is equal to the expected polarization of the error channel of a depth-$d$ mirror circuit.
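As a concrete illustration of the quantity being averaged, the sketch below estimates a circuit's observed polarization from measured counts. Eq. (12) is not reproduced in this appendix, so the code assumes the Hamming-distance-weighted form used for mirror RB, $S = \frac{4^n}{4^n-1}\sum_{k=0}^{n}(-\tfrac12)^k h_k - \frac{1}{4^n-1}$, where $h_k$ is the fraction of shots at Hamming distance $k$ from the target bit string. The function name and the input format (a dictionary of bit strings to counts) are hypothetical.

```python
def observed_polarization(counts: dict, target: str) -> float:
    """Observed polarization S from bit-string counts (assumed form of Eq. (12))."""
    n = len(target)
    shots = sum(counts.values())
    if shots == 0:
        raise ValueError("counts must contain at least one shot")
    # h[k] is the fraction of shots whose Hamming distance to the target is k.
    h = [0.0] * (n + 1)
    for bits, count in counts.items():
        if len(bits) != n:
            raise ValueError("bit strings must have the same length as the target")
        k = sum(a != b for a, b in zip(bits, target))
        h[k] += count / shots
    weighted = sum((-0.5) ** k * h[k] for k in range(n + 1))
    return (4 ** n / (4 ** n - 1)) * weighted - 1 / (4 ** n - 1)


# Hypothetical two-qubit data: mostly the target string, with some errors.
print(observed_polarization({"00": 90, "01": 6, "10": 3, "11": 1}, "00"))
```

Under this assumed form, an error-free circuit gives $S = 1$ and a uniformly random output distribution gives $S = 0$.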

Relating the observed polarization of MRB circuits and Ω-distributed random circuits
Above, we related the mean observed polarization ($\bar{S}_d$), which determines the MRB error rate, to the expected polarization of the overall error map of a depth-$d$ randomized mirror circuit. We now use this result to derive Eq. (25), which relates the mean observed polarization of depth-$d$ randomized mirror circuits to the expected polarization of the overall error map of a depth-$d/2$ Ω-distributed circuit. In combination with the theory in Section A 3, which shows that $\bar{S}_d$ and the mean polarization of the overall error map of Ω-distributed random circuits decay exponentially, the relationship we derive here implies that $r_\Omega \approx \epsilon_\Omega$.
Our goal is to relate the rate of decay of $\bar{S}_d$ to the rate of decay of the fidelity of Ω-distributed circuits ($\bar{F}_d$) [Eq. (15)]. We start by expressing $\bar{F}_d$ in terms of the expected polarization of the overall error map of a depth-$d$ Ω-distributed circuit. Applying Eq. (A1) to a depth-$d$, Ω-distributed random circuit, [...]. We define $\bar{\Gamma}_d$ to be the average polarization of the error map of a depth-$d$ mirror circuit: [...].

To relate $\bar{S}_d$ to $\bar{\Gamma}_d$, we use the fact that a depth-$d$ randomized mirror circuit consists of randomized compilation of a depth-$d/2$ Ω-distributed random circuit followed by its inverse. These two depth-$d/2$ circuits are both Ω-distributed (even after randomized compilation), but they are correlated. Below, we show that the polarization of the mirror circuit's overall error map depends on the covariance between the error in a depth-$d/2$ Ω-distributed circuit and its randomly compiled inverse. We can write the overall error map in Eq. (A5) as a composition of two error maps, an overall error map for a random circuit and an overall error map for its randomly compiled inverse: [...], where [...] and $C$ is the first half of $\tilde{M}_d$; it is a depth-$d/2$ Ω-distributed random circuit that has had randomized compilation applied to it. By substituting Eq. (A12) into Eq. (A10), we obtain [...], where, to go from Eq. (A13) to Eq. (A14), we have used the assumption that $\mathcal{E}_{\rm SPAM}$ is a global depolarizing channel.
Applying randomized compilation to an Ω-distributed random circuit creates a new random circuit that is also Ω-distributed. This is due to the conditions we require of $\Omega_1$ and $\Omega_2$ ($\Omega_1$ is the uniform distribution, and $\Omega_2$ is invariant under replacing a subset of a layer's gates with their inverses). Therefore, we can replace the average over all depth-$d$ randomized mirror circuits in Eq. (A14) with an average over all depth-$d/2$ Ω-distributed random circuits: [...], where $\bar{\mathcal{E}}_{\rm eff}(C^{-1}_{d/2})$ denotes the average over all possible circuits $C'$ resulting from applying randomized compilation to $C^{-1}_{d/2}$. Expressing Eq. (A15) in terms of $\bar{\Gamma}_{d/2}$ [Eq. (A11)], we have [...], where [...].

Fidelity decay of Ω-distributed random circuits
In this section, we show that the fidelity of Ω-distributed random circuits decays approximately exponentially in depth, assuming stochastic Pauli errors, when n is sufficiently large that $1/4^n$ is negligible (in the small-n case, Ω-distributed random circuits rapidly converge to a 2-design, from which it follows that the fidelity decays approximately exponentially). In this section, we use the notation $L_{a;b}$ to denote the sequence of composite layer-valued random variables $L_a L_{a+1} \cdots L_b$. We will assume that each composite layer has a stochastic Pauli error channel, i.e., $\phi(L) = \mathcal{E}_L\,\mathcal{U}(L)$, where $\mathcal{E}_L$ is a stochastic Pauli channel. We will use the stacked representation of superoperators, $\mathcal{U} = U \otimes U^*$. We use $\mathbb{P}^*_n$ to denote the n-qubit Paulis, excluding the identity. We use $\mathcal{P}$ to denote the superoperator representation of a Pauli $P$.
We first prove a useful lemma that follows from the properties of the layer sampling distribution required for MRB (see Section II C). MRB requires that the layer sampling distribution is invariant under the randomized compilation procedure defined in Section III A, which implies that the distribution of unitaries induced by our layer sampling distribution is invariant under left and right multiplication by Paulis, i.e., for all $P, P' \in \mathbb{P}_n$, [...], where $L$ is sampled from a layer sampling distribution Ω that satisfies the conditions in Section II C. Using this fact, we obtain the following lemma:

Lemma 1. Let $L$ be a circuit layer-valued random variable sampled from an MRB layer sampling distribution Ω. Let $P_1, P_2, P_3, P_4 \in \mathbb{P}_n$ be Pauli operators. If either (i) $P_1 \neq P_2$ or (ii) $P_3 \neq P_4$, then
$$\mathbb{E}_L\, \mathrm{Tr}\!\left[\mathcal{U}(L)^{-1}(P_1 \otimes P_2^*)\,\mathcal{U}(L)\,(P_3 \otimes P_4^*)\right] = 0.$$

Proof. We first consider the case where $P_3 \neq P_4$. Because Ω is invariant under right multiplication by Paulis, we can insert a right-multiplying Pauli $Q$: [...], where $\mathcal{Q}$ is a Pauli superoperator. We can rewrite Eq. (A20) as [...], where $\eta(P, Q) = -1$ if $P$ and $Q$ anticommute and $\eta(P, Q) = 1$ if $P$ and $Q$ commute. If $P_3 \neq P_4$, then there exists some Pauli $Q$ such that $Q$ anticommutes with $P_3$ and commutes with $P_4$ (otherwise, we would have $[PQ, P'] = 0$ for all $P' \in \mathbb{P}_n$, which implies $PQ = I$ and hence $P = Q$). Taking $Q$ to be such a Pauli, we have [...]. Therefore, $\mathbb{E}_L\, \mathrm{Tr}[\mathcal{U}(L)^{-1}(P_1 \otimes P_2^*)\,\mathcal{U}(L)\,(P_3 \otimes P_4^*)] = 0$. Similarly, to address the case where $P_1 \neq P_2$, we use the invariance of the sampling distribution under left multiplication by Paulis to obtain [...]. Using an argument analogous to the previous case, we conclude that the expectation also vanishes if $P_1 \neq P_2$. □

We now show that the fidelity of Ω-distributed random circuits decays approximately exponentially. Our theory shows that the expected polarization of Ω-distributed random circuits is given by Eq. (26): [...], where $\mathcal{E}_{\rm eff}$ is the overall error channel of a depth-$d$ Ω-distributed random circuit. Analogously, the expected fidelity of these circuits is given by [...]. We will now show that the expected process fidelity decays exponentially in benchmark depth ($d$), from which it follows that the polarization $\gamma(\mathcal{E}_{\rm eff})$ decays exponentially (as $n \gg 1$). We will use the Pauli unravelling of the circuit: we expand each error channel as $\mathcal{E}_{L_i} = \sum_{P \in \mathbb{P}_n} \gamma_P(L_i)\,\mathcal{P}$. We will then expand the fidelity [Eq. (A27)] as a sum of terms corresponding to sequences of Paulis in the Pauli unravelling. Eq. (A27) becomes [...].

The sum in Eq. (A29) has $4^{nd}$ terms, each with a different sequence of $d$ Pauli superoperators $\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_d$, which represents a possible sequence of Pauli errors in a depth-$d$ circuit. We will separate these terms by the number of errors in the Pauli sequence, i.e., the number of indices $i$ such that $P_i \neq I_n$. Throughout this section, we will assume that indices for circuit layers and Paulis satisfy $1 \le i \le d$. We use the term error pattern to refer to a description of the locations in a Pauli sequence where errors occur, which we describe by the set of indices $S$ such that $P_i \neq I_n$ if and only if $i \in S$. We call an error k-separated if there are no errors within $k$ layers of the error, i.e., there is a k-separated error on layer $i$ if $P_i \neq I$ and $j \notin S$ for all $j \neq i$ such that $|i - j| < k$. We expand Eq. (A29) by dividing the terms in the sum over Paulis $P_1, \ldots, P_d$ up by their error patterns to get [...], where $c_j$ is the sum of all terms in Eq. (A29) in which the error pattern has $|S| = j$ errors and contains a k-separated error, and $h_j$ is the sum of all terms in Eq. (A29) with an error pattern with $|S| = j$ that does not contain a k-separated error. By the cyclic property of the trace, all terms with exactly one error have no contribution to the fidelity: [...]. It remains to show that $c_j$ and $h_j$ are negligible for $j \ge 2$.
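Lemma 1 above can be checked numerically in a small case. The sketch below is an illustration under stated assumptions, not part of the proof: it takes a single qubit, builds a Pauli-invariant distribution by multiplying one fixed random unitary on the left and right by every single-qubit Pauli, and averages the trace in the lemma exactly over those 16 unitaries using the stacked representation $\mathcal{U} = U \otimes U^*$. All function names are hypothetical.

```python
import itertools

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]


def superop(u):
    """Stacked superoperator representation U (x) U^* of a unitary."""
    return np.kron(u, u.conj())


def random_unitary(rng, dim=2):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(a)
    d = np.diag(r)
    return q * (d / np.abs(d))


def averaged_trace(u, idx):
    """Average of Tr[U(L)^-1 (P1 (x) P2*) U(L) (P3 (x) P4*)] over L = Pl u Pr."""
    p1, p2, p3, p4 = (PAULIS[i] for i in idx)
    left = np.kron(p1, p2.conj())
    right = np.kron(p3, p4.conj())
    total = 0.0 + 0.0j
    for pl, pr in itertools.product(PAULIS, repeat=2):
        v = superop(pl @ u @ pr)
        total += np.trace(v.conj().T @ left @ v @ right)
    return total / 16


rng = np.random.default_rng(0)
u = random_unitary(rng)
worst = max(
    abs(averaged_trace(u, idx))
    for idx in itertools.product(range(4), repeat=4)
    if idx[0] != idx[1] or idx[2] != idx[3]
)
print(worst)  # numerically zero whenever P1 != P2 or P3 != P4
```

The only property of the layer distribution used here is invariance under left and right Pauli multiplication, which is the property the lemma relies on.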
First, we will use the scrambling condition on our gate set and sampling distribution to show that the $c_j$ are small. We start by considering the terms that make up $c_2$. We break $c_2$ up into terms for each error pattern $\{i_1, i_2\}$ contributing to $c_2$, and we will bound the contribution of each of these terms: [...], where [...]. We will now use the scrambling condition [Eq. (6)] to derive a bound on $c_{\{i_1,i_2\}}$. We can simplify Eq. (A34) to [...]. Now, we define the unitarily rotated error channel [...], where $d_{P,P'}$ depends on $\mathcal{U}(L_{i_2})$ and $P_{i_2}$, but we have suppressed this dependence. Substituting Eq. (A36) into Eq. (A35) and applying Lemma 1, we get [...]. We now apply the scrambling condition [Eq. (6)] and use the definition of the layer infidelity to obtain an upper bound on $c_{\{i_1,i_2\}}$: [...]. With $\varepsilon$ denoting the expected layer infidelity, Eq. (A41) becomes [...]. Eq. (A42) bounds the value of each term in the expansion of $c_2$ given in Eq. (A33), and the number of terms in [...].

We now bound $c_j$ for $j > 2$. Again, we will start by bounding the contribution of each individual error pattern $S$ that contributes to $c_j$, i.e., each $S$ such that $|S| = j$ and $S$ contains a k-separated error. Let $S = \{i_1, i_2, \ldots, i_j\}$ be an error pattern with $j > 2$ and $i_1 < i_2 < \cdots < i_j$, and assume that $i_{q+1} - i_q > k$ and $i_q - i_{q-1} > k$ for some $q < j$ (we will address cases in which the first or last error is k-separated later). We expand the errors before and after the k-separated error in layer $i_q$ in terms of tensor products of Paulis, then apply the properties of our layer distribution to bound the value of $c_S$. First, using the cyclic property of the trace, $c_S$ becomes [...]. We expand two unitaries, corresponding to the layers of the circuit before and after $k$ error-free layers, in terms of tensor products of Paulis: [...]. Using Eq. (A45) and Eq. (A46), Eq. (A44) becomes [...]. Using Lemma 1, Eq. (A47) becomes [...]. Applying Eq. (6), we have [...]. We then bound sums of $d_{P',P'}$ and $b_{P,P}$ coefficients and use the fact that Eq. (6) implies $b_{I,I} \le \delta$ to bound $c_S$ in terms of the average layer infidelity and $\delta$: [...], where the final inequality follows from the definition of $\varepsilon$. Eq. (A52) bounds the contribution of the term with error pattern $S$ to the polarization of depth-$d$ Ω-distributed random circuits. The number of possible error patterns $S$ with $|S| = j$ of the type considered in our argument above (i.e., error patterns with $|S| = j$ that contain a k-separated error that is not the first or last error) is bounded by [...].

There are two additional types of error pattern $S$ that contribute to $c_j$: (1) $i_2 - i_1 > k$, and (2) $i_j - i_{j-1} > k$. These two cases essentially reduce to the two-error case. In case (1), all errors after the first error can be expanded in terms of the Pauli basis: [...]. The argument for case (2) is analogous, doing an expansion of all errors except the last error. For each of case (1) and case (2), the number of possible error patterns $S$ with $|S| = j$ is bounded by [...]. As in the previous section, our arguments bound $c_S$ for each valid error pattern $S$ with $|S| \ge 3$. Therefore, [...]. We need not consider circuits with depth larger than $O(1/\varepsilon)$, because the circuit depth at which the polarization becomes negligible is $O(1/\varepsilon)$: when $d\varepsilon \gtrsim 1$, at least one error is almost certain to occur. Therefore, Eq. (A59) implies that the $c_j$ terms have a negligible contribution to the fidelity $F(\mathcal{E}_{\rm eff})$.
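The split of error patterns into those that contain a k-separated error (contributing to $c_j$) and those that do not (contributing to $h_j$) can be made concrete by brute-force enumeration at small depth. The sketch below uses the definition given earlier in this section, namely that the error on layer $i$ is k-separated if no other error lies at a layer $j$ with $|i - j| < k$. The function names are hypothetical, and the enumeration is only feasible for small $d$.

```python
from itertools import combinations


def has_k_separated_error(pattern, k):
    """True if some error in the pattern has no other error at distance < k."""
    layers = set(pattern)
    return any(
        all(abs(i - j) >= k for j in layers if j != i)
        for i in layers
    )


def count_patterns(depth, num_errors, k):
    """Count error patterns with |S| = num_errors, split by k-separation."""
    with_separated = 0
    without_separated = 0
    for pattern in combinations(range(1, depth + 1), num_errors):
        if has_k_separated_error(pattern, k):
            with_separated += 1
        else:
            without_separated += 1
    return with_separated, without_separated


# Small illustrative case: depth 12, three errors, k = 3.
print(count_patterns(12, 3, 3))
```

As the depth grows with the number of errors fixed, the second count becomes a small fraction of the total, which matches the intuition that clustered error patterns are rare.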
We have bounded the contributions of the $c_j$ terms to $F(\mathcal{E}_{\rm eff})$. We now argue that the contribution of the $h_j$ terms is also negligible. These terms represent error patterns in which every error is within $k$ layers of another error, so that our scrambling condition cannot guarantee that the probability of error cancellation is negligible. Instead, we will argue that the total probability of these error patterns is negligible. Because the contribution of an error pattern to the fidelity is bounded by its probability, this will bound the contribution of the $h_j$ terms to the fidelity. Since we have $k\varepsilon \ll 1$ from our scrambling condition, it follows that $dk\varepsilon^2$ is small, and hence $h_j$ is small. We also note that as $d$ gets large, it is highly likely that an error pattern contains a k-separated error, and we can show that the probability of no k-separated layers is exponentially suppressed. We can break a depth-$d$ circuit into $d/(2k+1)$ pieces of $2k+1$ layers to bound the probability of there being no k-separated errors: if one of these blocks of $2k+1$ layers consists of an error with $k$ error-free layers before and after it, then there is a k-separated error. When $d\varepsilon/(2k+1) > 1$, we can bound the probability of there being no block of this form (which we call a k-separated block) using a Chernoff bound: [...].

We consider a depth-$d$ randomized mirror circuit (treated as a random variable), which we write as [...]. When the two-qubit gate layers consist of two-qubit Cliffords of the form $CP_\theta$, they are not changed by the randomized compilation step of our circuit construction. Therefore, [...]. We will assume the error on the single-qubit gates is independent of the Paulis they are recompiled with, i.e., $\phi(R(P' L_i P)) = \mathcal{E}(L_i)\,\mathcal{U}(R(P' L_i P))$. Using this assumption, an implementation of the circuit $M_d$ can be written as [...]. We now push the error on the single-qubit gate layers through the two-qubit gate layers, defining new error channels that represent the error on a composite layer. Eq. (B3) becomes [...], where [...]. We have grouped the error channels into error channels $\mathcal{E}'_i$ that represent the error in a composite layer. Now, we use the structure of the randomized compilation procedure to twirl the error. The dressed layers can be expanded in terms of the original sampled layer and the Paulis inserted in randomized compilation as [...]. Rewriting Eq. (B4) using these expansions, we have [...], where each correction layer $P^c_i$ is a uniform random Pauli, because the two-qubit gates are Clifford. Averaging over the uniform random n-qubit Paulis $P_0, P_1, \ldots, P_d$, which equivalently averages over the correction Paulis $P^c_1, \ldots, P^c_{d+1}$, performs a Pauli twirl, converting the error channels into Pauli stochastic error channels. Performing this average, Eq. (B9) becomes [...], where $\mathcal{S}_i = \mathbb{E}_P\, \mathcal{P}\mathcal{E}'_i\mathcal{P}^{-1}$ and $\mathcal{S}(L_{d/2}) = \mathbb{E}_P\, \mathcal{P}\mathcal{E}(L_{d/2})\mathcal{P}^{-1}$ are stochastic Pauli channels, each of which captures the error from one composite layer. All error, except the error on the final circuit layer, is twirled into stochastic Pauli noise by the random Paulis inserted in randomized compilation. Therefore, we expect our method to be sensitive to all errors when the two-qubit gates are chosen to be Clifford gates.
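The Pauli twirl $\mathcal{S} = \mathbb{E}_P\, \mathcal{P}\mathcal{E}\mathcal{P}^{-1}$ used above can be illustrated on a single qubit. The following sketch is an assumption-laden toy example rather than the paper's simulation code: it takes a coherent over-rotation about X as the error channel, twirls it over the four single-qubit Paulis, and compares Pauli transfer matrices. A stochastic Pauli channel has a diagonal Pauli transfer matrix, so the off-diagonal entries should vanish after twirling. All function names and the rotation angle are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]


def pauli_transfer_matrix(kraus_ops):
    """R[i, j] = (1/2) Tr[P_i E(P_j)] for a single-qubit channel E."""
    r = np.zeros((4, 4))
    for j, pj in enumerate(PAULIS):
        out = sum(k @ pj @ k.conj().T for k in kraus_ops)
        for i, pi in enumerate(PAULIS):
            r[i, j] = 0.5 * np.real(np.trace(pi @ out))
    return r


def pauli_twirl(kraus_ops):
    """Kraus operators of the uniform average of P E P^-1 over the Paulis."""
    # Each Pauli is its own inverse, and the factor 1/2 squares to the
    # uniform weight 1/4 in the Kraus representation.
    return [0.5 * p @ k @ p for p in PAULIS for k in kraus_ops]


theta = 0.3  # hypothetical coherent over-rotation angle about X
coherent_error = [np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X]

r_before = pauli_transfer_matrix(coherent_error)
r_after = pauli_transfer_matrix(pauli_twirl(coherent_error))
print(np.round(r_before, 4))
print(np.round(r_after, 4))
```

The twirled matrix keeps the diagonal of the original and drops the Y-Z mixing terms, which corresponds to a stochastic X error with probability $\sin^2(\theta/2)$.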

Sensitivity of errors in non-Clifford two-qubit gates
In Appendix B, we showed that when the two-qubit gate set used in MRB contains only Clifford gates, the error in the two-qubit gates is twirled, upon averaging, into stochastic Pauli noise. This guarantees sensitivity to general errors on the two-qubit gates. We now consider circuits with non-Clifford two-qubit gates and show that randomized mirror circuits are sensitive to most Hamiltonian errors on the two-qubit gates, to first order. We will assume there is no crosstalk error, and all two-qubit layers are sampled independently, so that we expect the only systematic coherent cancellation of errors to come from a layer and its inverse. We will also assume there is no error on the single-qubit gates. To see the effect of error in a two-qubit gate on a randomized mirror circuit to first order, it is sufficient to consider mirror circuits resulting from applying our circuit construction procedure to a single two-qubit composite layer $L = L_1 L_\theta$, where $L_\theta = CP_\theta$ is a two-qubit gate and $L_1$ is a one-qubit gate layer. After mirroring and randomized compilation on $L$, we have the circuit $M = T(L_\theta^{-1}, P_2)\,R(P_2 L_1^{-1} P_1^c)\,R(P_1 L_1 P_0^c)\,T(L_\theta, P_0)$, where $P_0$, $P_1$, and $P_2$ are random two-qubit Pauli layers. The ideal operation $M$ implements is $\mathcal{U}(M) = (P_2^c)^{-1} P_0$. An imperfect implementation of $M$ can be expressed as [...], where we have used the definitions of the two-qubit gate layer $T(L_\theta, P)$ and the correction layers $P^c_i$ to rewrite the unitary evolution.
We now consider the effect of general gate-dependent Hamiltonian errors on the two-qubit gates on $\phi(M)$. We will write the error in terms of elementary error generators, as defined in the error generator formalism of Ref. [47]. We model the error on each two-qubit gate $g$ as $\mathcal{E}(g) = e^{M_g}$, where [...] and where $H_{P_a,P_b}$ is the two-qubit Hamiltonian error generator indexed by the Pauli operators $P_a$ and $P_b$. Using this expression for the error and expanding Eq. (C5) to first order in the error rates $\varepsilon^{g}_{P_a,P_b}$, we have [...]. Eq. (C7) expresses the implementation of $M$ in terms of its target evolution and a first-order correction. The circuit is insensitive to an error to first order when the correction term vanishes, which occurs when [...]. Satisfying Eq. (C8) requires that the coefficient of each elementary error generator $H_{P_a,P_b}$ is 0, which results in a system of 15 linear equations for each of the $16^2$ choices of two-qubit random Paulis $P_2$, $P_0$ used in randomized compilation. The randomized mirror circuits are sensitive to an error if, for some choice of $P_0$ and $P_2$, the system cannot be satisfied when the error is nonzero. The two-qubit gate set $G_2$ is closed under inverses, so in addition to mirroring $L = L_1 L_\theta$ as we have done above, we can mirror $L_1 L_{-\theta}$ to get an analogous set of linear equations. Considering all of the equations from mirroring $L_1 L_\theta$ and $L_1 L_{-\theta}$, we have a system of $2 \times 16^2 \times 15$ linear equations. The solutions to this system are $\varepsilon^{\theta}_{P_a,P_b} = \varepsilon^{-\theta}_{P_a,P_b} = 0\ \forall (P_a, P_b) \neq (P, P)$ and $\varepsilon^{G}_{P,P} = \varepsilon^{G^\dagger}_{P,P}$. This means that, to first order, the mirror circuits are not sensitive to the sum of $H_{P,P}$ errors on $CP_\theta$ and $CP_{-\theta}$, as we can change $\varepsilon^{G}_{P,P} + \varepsilon^{G^\dagger}_{P,P}$ without changing the error in any of the mirror circuits. This is a result of the structure we use for our mirror circuits. Below, we discuss how our method can be adapted to address this insensitivity.

Adaptations of MRB
While our simulations and experiments suggest non-Clifford MRB is a robust method, when our randomized mirror circuits contain non-Clifford two-qubit gates they are not sensitive to some coherent errors on these gates. In Appendix C 1 we showed that MRB circuits containing non-Clifford two-qubit gates are not sensitive to one linear combination of the Hamiltonian errors in these gates because of the correlations between the randomized compilation and the two-qubit gate that is applied, which prevent error from being perfectly twirled into stochastic noise. This shortcoming in our method is due to our choice of structure for our randomized mirror circuits. However, circuit mirroring is a flexible technique that can be applied to a variety of circuit structures, and here we discuss several adaptations of our method utilizing this flexibility that would address the error insensitivity in MRB.
Our method involves sampling random circuits with layers sampled from a user-specified distribution Ω over circuit layers. Different choices of circuit structure can address the shortcomings of our method and yield other scalable benchmarks. We could guarantee sensitivity to all errors with more complex sampling of the Ω-distributed random circuit. For example, to benchmark a two-qubit gate set $G_2$ = {cs, cs†} we could generate circuits containing cs, cs† and cz gates and implement the cz gate by two consecutive cs or cs† gates. This MRB experiment would be sensitive to the $H_{Z,Z}$ errors on the cs and cs† gates that our MRB experiment is insensitive to (see above).
Our MRB protocol performs inversion layer-by-layer, and an alternative method to guarantee sensitivity to all errors is to use more complex inversion strategies that reduce the correlation in the gate layers in the two halves of a mirror circuit. One option is to invert multiple circuit layers at a time, by computing the inverse of the layers and compiling an inverse circuit; similar ideas have recently been used to implement RB of continuously parameterized gates [59]. However, compilation can be computationally intensive with many qubits. Alternatively, we can modify the inversion layers by adding in additional gates, while maintaining a circuit that is logically equivalent to the inverse.

Appendix D: Simulations of MRB
In this appendix, we provide further details about our simulations of MRB, which are discussed in Section IV D.

Error Models for MRB Simulations
We simulated MRB with three classes of error models: stochastic, Hamiltonian, and stochastic+Hamiltonian. Our models are defined based on the error generator formalism in [47]. Error rates are specified as elementary error generators of a post-gate error map. We include qubit-dependent Hamiltonian errors and Pauli stochastic errors on the $x_{\pi/2}$ and single-qubit idle gates with Hamiltonian error rates sampled in the range $[0, h/10]$, and stochastic Pauli error rates sampled in the range $[0, s/10]$. The stochastic and Hamiltonian errors are each split randomly across the three Paulis. We also include qubit-dependent Hamiltonian errors and Pauli stochastic errors on the cs and cs† gates with Hamiltonian error rates sampled in the range $[0, h]$, and Pauli stochastic error rates sampled in the range $[0, s]$, spread at random across the 15 two-qubit Pauli errors.
To generate error models, we start with an overall error parameter $p$ and select $s$, $h$ such that $h^2 + s = p$. We generate models for 150 evenly spaced values of $p \in [0.001, 0.2475]$ for the 1-qubit models and for 150 evenly spaced values of $p \in [0.0001, 0.075]$ for the 2- and 4-qubit models. In the stochastic error models, we set $h = 0$. In the Hamiltonian error models, we set $s = 0$. In the stochastic+Hamiltonian error models, we generate $s \in [0, p]$ at random, and set $h = \sqrt{p - s}$.
For each error model, we run a randomly generated set of MRB circuits consisting of $K = 300$ circuits at each benchmark depth $d \in \{2^j \mid 0 \le j \le 8\}$. We approximate the error rate in Ω-distributed random circuits ($\epsilon_\Omega$) via sampling. For each depth $d \in \{2^j \mid 0 \le j \le 8\}$, we ran $K$ randomly generated depth-$d/2$ Ω-distributed random circuits, each followed by a perfect projective measurement onto the target state.
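The choice of $s$ and $h$ for each model class can be sketched as follows. This is a minimal illustration of the constraint $h^2 + s = p$ and the parameter grids stated above; the function name, the model labels, and the use of NumPy's random generator are assumptions made for the example, and the subsequent spreading of $s$ and $h$ across individual gates and Paulis is not shown.

```python
import numpy as np


def sample_strengths(p, model, rng):
    """Return (s, h) with h**2 + s == p for the given error-model class."""
    if model == "stochastic":
        s, h = p, 0.0
    elif model == "hamiltonian":
        s, h = 0.0, float(np.sqrt(p))
    elif model == "stochastic+hamiltonian":
        s = float(rng.uniform(0.0, p))
        h = float(np.sqrt(p - s))
    else:
        raise ValueError(f"unknown model class: {model}")
    return s, h


# Grids of the overall error parameter p described in the text.
p_values_1q = np.linspace(0.001, 0.2475, 150)
p_values_2q_4q = np.linspace(0.0001, 0.075, 150)

rng = np.random.default_rng(0)
for p in p_values_2q_4q[:3]:
    print(p, sample_strengths(p, "stochastic+hamiltonian", rng))
```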

Simulations of MRB with measurement error
We simulated MRB with two types of measurement error: amplitude damping error and bit flip error. For each type of measurement error, we performed simulations where only a single qubit had measurement error, and where each qubit had measurement error. We compare these results to simulations of MRB with no measurement error.
To generate each error model, we sampled a stochastic+Hamiltonian gate error model via the method in Appendix E 3 with overall error parameter $p$, with $p = 0.1$ for single-qubit error models and $p = 0.02$ for 2- and 4-qubit error models. We define our measurement error using the single-qubit elementary error generators $S_X$, $S_Y$, and $A_{X,Y}$ defined in Ref. [47]. To add bit flip error of strength $p_m$ to a qubit, we apply the error $\exp(p_m S_X)$ immediately before measurement. To add amplitude damping error to a qubit, we apply the error $\exp(p_m (S_X + S_Y - A_{X,Y}))$ immediately before measurement. We used 80 evenly spaced values of $p_m \in [0.0001, 0.0801]$. We ran simulations with measurement error on a single qubit and on all qubits. In models with measurement error on all qubits, we sampled a uniform random value $p \in [0, 2p_m]$ for each qubit, and applied error of strength $p$ to that qubit. In models with measurement error on one qubit, we applied a fixed measurement error of strength $p_m$ to one qubit. Fig. 10 shows the results of these simulations. We observe that $r_\Omega \approx \epsilon_\Omega$ across all types of measurement error models we sampled, providing evidence that measurement error does not significantly impact the performance of MRB. Furthermore, the [...]

FIGURE 13. Fitting error models to MRB data and estimating gate error rates. We fit two types of error models to MRB data to estimate the infidelity of individual circuit layers. (a) By running two MRB experiments with two different two-qubit gate densities ξ, we can estimate the mean infidelity of a set of one or more two-qubit gates (here cs and cs†) using basic linear algebra (see Appendix E 3 a). We call this procedure the two densities heuristic. The estimates of the average gate error obtained from the two densities heuristic (orange) are compared to independent estimates obtained from two more rigorous but more complex and computationally intensive procedures: fitting each set of two-qubit MRB data to (1) a depolarizing model (light blue), and (2) [...]

[...] parameters. We can therefore use our data for each such circuit pair $(C_1, C_2)$ to investigate whether circuit success rates depend on the values of the phase shifts. Figure 12 shows the observed polarization $S$ for $C_1$ versus $S$ for $C_2$ for each pair of circuits $(C_1, C_2)$ that differ only by the values of the phases in $z_\theta$ gates. There are many circuit pairs that have very different $S$, e.g., there is a circuit pair for which $S \approx 0.9$ for one circuit and $S \approx -0.3$ for the other (note that $-1/2 \le S \le 1$). Figure 12(b) shows that the spread in the differences between the observed polarization of circuit pairs is largest for single-qubit circuits (σ = 0.147) and decreases as the number of qubits increases (σ = 0.052 for n = 4). The substantial variance in observed polarization differences implies strongly structured errors, e.g., coherent errors. Even for perfect $z_\theta$ gates, the value of each phase shift impacts how errors in other gates propagate through a circuit [61,62].

Estimating the error rates of individual gates
A single MRB experiment is designed to estimate a single error rate $r_\Omega$ that quantifies the average rate at which an n-qubit layer causes an error in Ω-random circuits. But we can also use MRB to extract information about the error rates of particular layers. In Section V D we present one method for doing so: fitting to a depolarizing model. In this appendix we explain this method and present two alternative methods. These methods are complementary, as they trade off rigor for complexity. Note that one possible method for estimating the error rates of individual gates using MRB is to run an interleaved [42] MRB (and interleaved standard RB has been previously used to measure the error rate of a cs gate [34]). We do not explore this here, although we note that interleaved MRB would inherit all of the known problems with interleaved standard RB [37,52].

[Figure caption, partial] [...] or cz gate on one of the three connected pairs of qubits. We fit a simple n-qubit depolarizing model to (1) the 4-qubit data, and (2) the 1- and 2-qubit data, and use both models to estimate the infidelity of 4-qubit $G_1$-dressed layers. The estimates from fitting to the 1- and 2-qubit data do not account for any additional crosstalk errors that occur in 4-qubit layers, so the additional error estimated when fitting to the 4-qubit data is a quantification of crosstalk. We also fit a more sophisticated stochastic Pauli error model to the 4-qubit circuit data, resulting in comparable estimates to those obtained from the simple depolarizing model (which uses a scalable, less computationally intensive analysis). To validate our results against an established technique, we compare to infidelities independently estimated using cycle benchmarking [9]. We observe qualitative agreement. The cycle benchmarking experiments measure the infidelities of layers dressed with one-qubit gates sampled from a different gate set (the Pauli group) to that used in our MRB experiments [SU(2) or $C_1$, the single-qubit Clifford group], and these experiments were implemented on a different day than the MRB circuits, so exact agreement is not expected.
a. Estimating gate error rates using a varied-densities heuristic

MRB uses flexible sampling of the circuit layers, as each composite layer is sampled from some distribution Ω. By running MRB experiments with the same layer set $\mathbb{L}$ but different sampling distributions $\{\Omega_1, \Omega_2, \ldots\}$ over $\mathbb{L}$, we can (approximately) ascertain the average error rates of different subsets of gates by applying basic linear algebra to $\{r_{\Omega_1}, r_{\Omega_2}, \ldots\}$ [5]. In our experiments, we ran MRB for the gate set ({cs, cs†}, SU(2)) with two different Ω defined by two different two-qubit gate densities: ξ = 1/2 and ξ = 1/8. We focus on the three two-qubit sets of connected qubits. Using Eq. (37), for each two-qubit subset $Q$, we have that [...], where $r_\xi = r(G_1, \{\mathrm{cs}, \mathrm{cs}^\dagger\}, Q, \xi)$, $\epsilon_1$ is the infidelity of the dressed layer consisting of dressed idles on each qubit in $Q$, and $\epsilon_2$ is the mean of the infidelities of $G_1$-dressed cs and cs† gates applied to the qubits $Q$. We solve these linear equations to estimate $\epsilon_2$, for all three connected qubit pairs. These estimates are shown in Fig. 13(a), and we call this method the two densities heuristic, as it is based on the approximate relation of Eq. (34) [63].
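The linear-algebra step of the two densities heuristic can be sketched as below. The linear equations themselves are not reproduced here, so the code assumes a simple mixture form, $r \approx (1 - q)\,\epsilon_1 + q\,\epsilon_2$, where $q$ is the probability that a sampled layer on the qubit pair contains a two-qubit gate. How $q$ depends on the density ξ is determined by the sampling distribution Ω and is left as an input; the function name and all numerical values are hypothetical.

```python
import numpy as np


def two_densities_estimate(r_a, r_b, q_a, q_b):
    """Solve r = (1 - q) * eps1 + q * eps2 for (eps1, eps2) from two experiments.

    r_a, r_b: MRB error rates from the two sampling distributions.
    q_a, q_b: assumed probabilities that a layer contains a two-qubit gate.
    """
    if np.isclose(q_a, q_b):
        raise ValueError("the two experiments must use different gate probabilities")
    coefficients = np.array([[1.0 - q_a, q_a], [1.0 - q_b, q_b]])
    eps1, eps2 = np.linalg.solve(coefficients, np.array([r_a, r_b]))
    return float(eps1), float(eps2)


# Hypothetical error rates, for illustration only.
eps1, eps2 = two_densities_estimate(r_a=0.0105, r_b=0.003375, q_a=0.5, q_b=0.125)
print(eps1, eps2)
```

Because the system is two equations in two unknowns, the estimate of $\epsilon_2$ is sensitive to statistical noise in both measured rates when the two gate probabilities are close together.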
b. Estimating gate error rates by fitting depolarizing error models

The two densities heuristic is built on the standard MRB data analysis, which extracts a single error rate ($r_\Omega$) from each MRB experiment design. But data from even a single MRB experiment contains a lot more information about each gate's errors than is contained in $r_\Omega$, e.g., RB data can contain sufficient information for complete tomography [52,53]. In principle, this information can be extracted by fitting error models to MRB data, as has been demonstrated in simulations with 2-qubit standard RB [53].
However, fitting an error model to data typically requires simulating the circuits under that error model, and this simulation is, in general, exponentially expensive in the number of qubits. Simplified, scalable approximations are therefore useful. One model that satisfies these criteria is a model in which each gate's error is modelled by a single error rate, and the error map for a layer of gates is a depolarizing channel [7].
Our depolarizing model summarizes the errors in each dressed one- and two-qubit gate (g) with a single, independent error rate $\epsilon_g$. The error channel for each dressed n-qubit layer L is modelled by an n-qubit depolarizing channel with an infidelity $\epsilon_L$ given by

$$\epsilon_L = 1 - \prod_{g \in L} (1 - \epsilon_g).$$

This means modelling the error channel for each dressed n-qubit layer L by

$$\mathcal{E}_L[\rho] = \gamma(L)\,\rho + [1 - \gamma(L)]\,\frac{\mathbb{I}}{2^n},$$

with

$$\gamma(L) = 1 - \frac{4^n}{4^n - 1}\,\epsilon_L.$$

This error model is illustrated in Fig. 13(b). We also model the readout on each qubit with an independent error rate $\epsilon_{Q_i}$, where the readout error on an n-qubit circuit is an n-qubit depolarizing channel with infidelity $\epsilon_R = 1 - \prod_{Q_i \in Q} (1 - \epsilon_{Q_i})$. Under this error model, the observed polarization of a circuit $C = L_1 L_2 \cdots L_d$ is

$$S(C) = \gamma(L_1)\,\gamma(L_2) \cdots \gamma(L_d)\,\gamma(R). \tag{E4}$$

The parameters of this depolarizing model are a set of error rates: an $\epsilon_g$ for each $G_1$-dressed one- and two-qubit gate g and an $\epsilon_{Q_i}$ for the readout on each qubit $Q_i$. To estimate these parameters we use a least-squares fit of Eq. (E4) to the observed polarizations of the MRB circuits. We separately fit the parameters of the depolarizing model to the data from MRB circuits on different numbers of qubits (n = 1, 2, 3, 4), so that we can study how the error rates of the gates change with n, due to crosstalk errors.
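The forward model of Eq. (E4) can be sketched in a few lines. This is an illustrative sketch rather than the fitting code used for the paper: it assumes the standard relation between the entanglement infidelity ϵ of an n-qubit depolarizing channel and its polarization, γ = 1 − 4^n ϵ/(4^n − 1), and it represents each layer simply as a list of the error rates of its dressed gates. The function names and the numbers in the example are hypothetical.

```python
import math


def layer_infidelity(error_rates):
    """eps_L = 1 - prod_g (1 - eps_g) over the dressed gates (or readouts) in a layer."""
    return 1.0 - math.prod(1.0 - e for e in error_rates)


def polarization(infidelity, n_qubits):
    """Polarization of an n-qubit depolarizing channel with the given
    entanglement infidelity (assumed relation, see the text above)."""
    dim_sq = 4 ** n_qubits
    return 1.0 - dim_sq * infidelity / (dim_sq - 1)


def predicted_circuit_polarization(layers, readout_error_rates, n_qubits):
    """Product of layer polarizations times the readout polarization, as in Eq. (E4).

    `layers` is a list of layers; each layer is a list of per-gate error rates.
    """
    s = polarization(layer_infidelity(readout_error_rates), n_qubits)
    for gate_error_rates in layers:
        s *= polarization(layer_infidelity(gate_error_rates), n_qubits)
    return s


# Made-up example: a 2-qubit circuit with three layers, each holding one
# dressed two-qubit gate, and a per-qubit readout error rate of 1%.
layers = [[0.012], [0.012], [0.012]]
print(predicted_circuit_polarization(layers, [0.01, 0.01], n_qubits=2))
```

A least-squares fit would wrap `predicted_circuit_polarization` in an objective that sums squared differences to the observed polarizations and minimizes over the per-gate and per-qubit error rates.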
Fitting Eq. (E4) to the data from two-qubit MRB circuits results in estimates of the infidelity of each two-qubit dressed layer containing a two-qubit gate from $G_2$ (and an estimate for $\epsilon_1$, the dressed two-qubit idle). We can therefore use these fits to compare to the two densities heuristic (above). Figure 13(a) compares the mean of the entanglement infidelities of the dressed cs and cs† gates, obtained from this fit, with the estimate from the two densities heuristic. The estimates of both methods are between 1.1% and 1.5%, with the estimates differing by 0.5%-2.1%, which cross-validates the two methods.
Fitting Eq. (E4) to the data from 4-qubit MRB circuits provides estimates for the entanglement infidelities of all 15 dressed 4-qubit layers used in our experiments. Figure 14 shows the estimated infidelity for each of the nine $G_1$-dressed layers that consist of a single dressed two-qubit gate applied to one of the three connected qubit pairs (in parallel with dressed idles on the other two qubits). These infidelities are between 2% and 3.1%, and they vary between qubit pairs and between gates (cs, cs† and cz). We quantify the contribution of crosstalk errors to these infidelities by also predicting the infidelities of these 4-qubit layers from the dressed gate error rates obtained from fitting the depolarizing model to the one- and two-qubit data. As shown in Fig. 14, these predicted infidelities are smaller than those estimated from the 4-qubit data by up to 60%. Crosstalk errors are therefore a large proportion of the total infidelity in these 4-qubit layers.
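The crosstalk comparison described above amounts to combining isolated gate error rates into a crosstalk-free prediction and comparing it with the infidelity fitted to the 4-qubit data. The sketch below assumes the prediction uses the same product rule as the depolarizing model, and it uses hypothetical numbers that are not taken from the experiments.

```python
import math


def predicted_layer_infidelity(isolated_error_rates):
    """Crosstalk-free prediction: eps_L = 1 - prod_g (1 - eps_g)."""
    return 1.0 - math.prod(1.0 - e for e in isolated_error_rates)


def crosstalk_share(observed_infidelity, predicted_infidelity):
    """Fraction of the observed infidelity not explained by the isolated rates."""
    return (observed_infidelity - predicted_infidelity) / observed_infidelity


# Hypothetical layer: one dressed two-qubit gate plus two dressed idles.
isolated = [0.012, 0.002, 0.002]   # made-up error rates from 1- and 2-qubit fits
observed = 0.025                   # made-up infidelity from a 4-qubit fit
predicted = predicted_layer_infidelity(isolated)
print(f"predicted {predicted:.4f}, crosstalk share {crosstalk_share(observed, predicted):.0%}")
```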
To validate our results, we compare the infidelities we estimate to independent estimates obtained from an established technique: cycle benchmarking [9], a technique for estimating the infidelity of individual many-qubit gate layers. Fig. 14 shows that our estimates are broadly similar to those obtained from cycle benchmarking, differing by at most 23%. Moreover, we only expect rough agreement with the cycle benchmarking estimates, for two reasons: (1) the cycle benchmarking experiments were implemented on a different day (they were run immediately after the gates were calibrated), and (2) cycle benchmarking estimates the error rate of layers that are dressed by random Pauli gates (whereas our layers are dressed by Haar-random gates or random single-qubit Clifford gates).

c. Estimating gate error rates by fitting Pauli error models
Fitting data to an n-qubit depolarizing model is scalable, but the actual error map for each layer is unlikely to be global depolarization. For example, a global depolarizing channel causes highly correlated errors, whereas physically we expect many errors to be local errors. We therefore fit a more physically well-motivated model against which to compare our estimates for each dressed layer's infidelity. Arbitrary Markovian errors on a set of n-qubit layers $\mathbb{L}$ can be modelled by an n-qubit process matrix for each $L \in \mathbb{L}$ [1]. But each of these process matrices has $O(16^n)$ parameters, resulting in an infeasible number of parameters to estimate when n = 4. Instead, we fit to a process matrix model of reduced complexity. This error model is illustrated in Fig. 13(c). We model the error in each one- or two-qubit native gate (i.e., each $x_{\pi/2}$ etc., not each dressed gate or each element of $G_1$) by a one- or two-qubit stochastic Pauli channel [Eq. (5)], respectively. We allow the Pauli error rates

FIGURE 1. Scalable randomized benchmarking of universal gate sets. (a) Randomized mirror circuits combine a simple reflection structure with randomized compiling to enable scalable and robust RB of universal gate sets. (b) Data and fits to an exponential obtained by using our method (MRB of universal gate sets) to benchmark a universal gate set on n = 1, 2, 3, 4 qubits of the Advanced Quantum Testbed, and the average error rates of n-qubit layers ($r_\Omega$, where Ω is the layer sampling distribution) extracted from these decays. (c) We benchmarked each connected set of n qubits for n = 1, 2, 3, 4, enabling us to map out the average layer error rate ($r_\Omega$) for each subset of qubits.

FIGURE 2. Randomized mirror circuits over universal gate sets. To construct a randomized mirror circuit of benchmark depth d (and total depth 2d + 2) we first sample a random depth d + 1 circuit. This circuit alternates between layers of randomly sampled one-qubit gates and layers of randomly sampled two-qubit gates. It can be thought of as consisting of a single initial layer of random one-qubit gates followed by d/2 composite layers (see inset). We then append to this circuit its inverse, i.e., the circuit in reverse with each layer replaced with its inverse. This creates a depth 2d + 2 circuit that will, if run perfectly, always return the all-zeros bit string. This circuit is susceptible to systematic addition or cancellation of errors between the two halves of the circuit. To prevent this unwanted effect we then apply randomized compiling to the circuit. We insert a layer of random single-qubit Pauli gates (cyan) after each one-qubit gate layer. In order to guarantee that this randomly compiled circuit still always, if run perfectly, returns a single bit string s, our procedure (1) changes the rotation angles in the two-qubit gates (orange) if these gates are not Clifford gates, (2) adds in single-qubit Pauli axis rotations following the two-qubit gates (red), and (3) adds in correction Pauli gates (purple) prior to each single-qubit gate layer. The yellow boxes show gates that are compiled together to create the final circuit of depth 2d + 2. This circuit contains d composite layers, which we call its benchmark depth.
FIGURE 5. Randomized benchmarking of universal gate sets on four qubits of the Advanced Quantum Testbed. We used MRB to benchmark n-qubit layers constructed from three different gate sets, on each connected n-qubit subset of a linearly-connected set of four qubits {Q4, Q5, Q6, Q7} in an eight-qubit superconducting transmon processor (AQT@LBNL Trailblazer8-v5.c2). The rows correspond to results from three different choices of gate set, each consisting of a two-qubit gate set $G_2$ and a single-qubit gate set $G_1$. From top to bottom, the rows correspond to: a universal gate set containing two non-Clifford entangling gates and the set of all single-qubit gates [$G_2$ = {cs, cs†}, $G_1$ = SU(2)]; a universal gate set containing a Clifford entangling gate and the set of all single-qubit gates [$G_2$ = {cz}, $G_1$ = SU(2)]; and a non-universal, Clifford gate set [$G_2$ = {cz}, $G_1 = C_1$, where $C_1$ is the one-qubit Clifford group]. (a-c): MRB decays for the qubit subsets {Q4}, {Q4, Q5}, {Q4, Q5, Q6}, and {Q4, Q5, Q6, Q7}. Violin plots and points show the distribution and mean, respectively, of the MRB circuits' observed polarization ($S_d$) versus benchmark depth (d). The curve is a fit of the mean of $S_d$ ($\bar{S}_d$) to $\bar{S}_d = A p^d$. The average error rate of an n-qubit layer ($r_\Omega$) is given by $r_\Omega = (4^n - 1)(1 - p)/4^n$. The observed $\bar{S}_d$ decays exponentially, as predicted by our theory for MRB. (d-f): The estimated error rate $r_\Omega$ for each qubit subset that we benchmarked. (g-i): Predictions for the average layer error rate of 3- and 4-qubit subsets (hatched) based on the experimental 1- and 2-qubit error rates (un-hatched) and the assumption of no crosstalk errors. The difference between (d-f) and (g-i) quantifies the contribution of crosstalk errors to the average error rate of an n-qubit layer, for n = 3, 4. For all three gate sets and n = 4, we see that crosstalk errors are contributing approximately 0.7% error to $r_\Omega$, which is approximately 1/3 of $r_\Omega$.
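The decay fit and the conversion from p to the layer error rate quoted in the caption above can be sketched as follows. This sketch fits the exponential by ordinary least squares on the logarithm of the mean polarization, which is an assumption made for simplicity; the source does not state which fitting routine was used, and a nonlinear least-squares fit may behave differently on noisy data. The data in the example are synthetic.

```python
import math


def fit_decay(depths, mean_polarizations):
    """Fit S_d = A * p**d by a straight-line least-squares fit to (d, ln S_d).

    Returns (A, p). Requires strictly positive mean polarizations.
    """
    if any(s <= 0 for s in mean_polarizations):
        raise ValueError("log-linear fit needs positive polarizations")
    ys = [math.log(s) for s in mean_polarizations]
    n = len(depths)
    mean_x = sum(depths) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in depths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depths, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


def layer_error_rate(p, n_qubits):
    """r = (4^n - 1)(1 - p) / 4^n for an n-qubit layer."""
    return (4 ** n_qubits - 1) * (1.0 - p) / 4 ** n_qubits


# Synthetic, noise-free decay for a 2-qubit subset (illustrative values only).
depths = [0, 2, 4, 8, 16]
means = [0.95 * 0.97 ** d for d in depths]
A, p = fit_decay(depths, means)
print(A, p, layer_error_rate(p, n_qubits=2))
```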
FIGURE 6. Estimating crosstalk errors on AQT. We estimate the contribution of crosstalk errors to the layer error rate $r_\Omega$ for n = 3, 4 qubits by taking the difference between each experimental error rate ($r_\Omega$) and a corresponding prediction ($r_{\Omega,\mathrm{pred}}$) obtained from the experimental one- and two-qubit error rates and the assumption of no crosstalk. We find that crosstalk contributes approximately 0.2%-0.4% to $r_\Omega$ for n = 3 (which is 1/8 to 1/4 of $r_\Omega$), and approximately 0.7% to $r_\Omega$ for n = 4 (which is 1/3 of $r_\Omega$).

FIGURE 9. Randomized benchmarking of a universal gate set on a 27-qubit IBM Q processor. We ran MRB on n-qubit subsets of the ibmq_montreal processor, for 15 exponentially spaced n from n = 1 to n = 27. (a) The MRB decays and fits to an exponential for the six subsets of qubits illustrated in (b). The observed polarization decays exponentially in all cases. Due to the minimal overhead in MRB circuits, we obtain an exponential decay even for 27 qubits and can extract a low-uncertainty estimate of the average error rate of 27-qubit layers [$r_\Omega$ = 28(1)%]. (c) The observed error rate per qubit $r_{\Omega,\mathrm{perQ}} = 1 - (1 - r_\Omega)^{1/n} \approx r_\Omega/n$ (red circles) versus n increases rapidly with n, even though the circuits have a constant expected two-qubit gate density ξ = 1/4. This increase in $r_{\Omega,\mathrm{perQ}}$ is due to two-qubit gate crosstalk, not spatial variations in gate error rates. This is confirmed by comparison to predictions for $r_{\Omega,\mathrm{perQ}}$ (blue diamonds) obtained from one- and two-qubit error rates, for each one-qubit and connected two-qubit subset, and the assumption of no crosstalk. (d) The ratio of the observed ($r_{\Omega,\mathrm{perQ}}$) to predicted ($r_{\Omega,\mathrm{perQ,pred}}$) per-qubit error rate shows that crosstalk errors cause the per-qubit error rate $r_{\Omega,\mathrm{perQ}}$ to increase by approximately 250%-300% when n ≥ 15. (e) The one- and two-qubit error rates were obtained using simultaneous one-qubit MRB on all 27 qubits (blue boxes), and two-qubit MRB, on all pairs of connected qubits, run simultaneously on the qubit pairs in eight distinct groupings (the purple and green boxes show two such groups).
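As a small worked example of the per-qubit rescaling used in panel (c), the sketch below evaluates 1 − (1 − r)^(1/n) and compares it with the first-order approximation r/n. The function name is hypothetical; the input r = 28% at n = 27 is the layer error rate quoted in the caption.

```python
def error_rate_per_qubit(r_layer, n_qubits):
    """Per-qubit error rate: 1 - (1 - r)^(1/n)."""
    return 1.0 - (1.0 - r_layer) ** (1.0 / n_qubits)


r_layer, n = 0.28, 27
exact = error_rate_per_qubit(r_layer, n)
first_order = r_layer / n  # the approximation r/n is tightest when r is small
print(f"per-qubit rate {exact:.5f}, first-order approximation {first_order:.5f}")
```

The two values agree closely only when the layer error rate is small; at a layer error rate as large as 28% the first-order approximation is noticeably lower than the exact expression.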

ACKNOWLEDGEMENTS
This material is based upon work supported by the Laboratory Directed Research and Development program at Sandia National Laboratories and the U.S. Department of Energy, Office of Science, National Quantum Information Science Research Centers, Quantum Systems Accelerator. This work was also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research Quantum Testbed Program under Contract No. DE-AC02-05CH11231. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC (NTESS), a wholly owned subsidiary of Honeywell International Inc., for the U.S.
FIGURE 13. Fitting error models to MRB data and estimating gate error rates. We fit two types of error models to MRB data to estimate the infidelity of individual circuit layers. (a) By running two MRB experiments with two different two-qubit gate densities ξ, we can estimate the mean infidelity of a set of one or more two-qubit gates (here cs and cs†) using basic linear algebra (see Appendix E 3 a). We call this procedure the two densities heuristic. The estimates of the average gate error obtained from the two densities heuristic (orange) are compared to independent estimates obtained from two more rigorous but more complex and computationally intensive procedures: fitting each set of two-qubit MRB data to (1) a depolarizing model (light blue), and (2) a stochastic Pauli errors model (dark blue). (b) To fit a depolarizing model, we assign an error rate to each dressed layer and an error rate to each qubit's readout. (c) To fit a Pauli stochastic model, we assign a Pauli stochastic channel to each possible gate except the virtual $z_\theta$ gates.
The MRB error rate per qubit [$r_{\Omega,\mathrm{perQ}} = 1 - (1 - r_\Omega)^{1/n}$] versus the average composite layer error rate per qubit [$\epsilon_{\Omega,\mathrm{perQ}} = 1 - (1 - \epsilon_\Omega)^{1/n}$] for each randomly sampled error model. The MRB error rate $r_\Omega$ closely approximates $\epsilon_\Omega$, and the agreement is closest under purely stochastic errors. (d-f):

Estimating the infidelity of dressed 4-qubit layers. Estimates of the error rates of individual $G_1$-dressed layers containing a single 2-qubit gate (cs, cs†, or cz), obtained by fitting an n-qubit depolarizing model to the 4-qubit MRB data. This scalable analysis technique enables extraction of additional information about each layer's error from MRB data. To validate our results against an established technique, we compare to infidelities independently estimated using cycle benchmarking [9]. We observe qualitative agreement. The cycle benchmarking experiments measure the infidelities of layers dressed with one-qubit gates sampled from a different gate set (the Pauli group) than that used in our MRB experiments, so exact agreement is not expected.
Validating MRB by comparison to DRB. (a) The structure of the circuits used in DRB, a method for benchmarking an n-qubit layer set, when applied to a universal gate set. DRB is known to be reliable, but it is exponentially expensive in n for universal gate sets because, for a universal gate set, its circuits start by implementing a Haar-random unitary from SU($2^n$

TABLE I. RB error rates on AQT. The RB error rates for every RB experiment we ran on AQT. We benchmarked each connected subset of four linearly-connected qubits and used three different gate sets.

Estimating the infidelity of dressed 4-qubit layers. By fitting error models to MRB data, we can estimate the infidelity of each $G_1$-dressed layer used in the MRB circuits. Here we show four different estimates of the infidelities of 4-qubit layers containing a single cs, cs†

TABLE III. 1- and 2-qubit isolated and simultaneous MRB on IBMQ. We performed simultaneous one-qubit MRB on all 27 individual qubits of ibmq_montreal. We also performed simultaneous two-qubit MRB on each connected qubit pair of IBMQ Montreal, in eight groups. We ran isolated MRB experiments on five qubit pairs to compare the error rates from simultaneous and isolated two-qubit mirror RB. Isolated MRB experiments had an error rate approximately 50% smaller than simultaneous MRB experiments. Columns: qubit subset; $r_\Omega$ (isolated MRB); $r_\Omega$ (simultaneous MRB).

TABLE IV. IBMQ Montreal calibration data. Calibration data from ibmq_montreal from the time of our MRB demonstrations (September 7, 2021). Columns: qubit; T1 (μs); T2 (μs); frequency (GHz); anharmonicity (GHz); readout error; Pr(prep 1, measure 0); Pr(prep 0, measure 1); readout length (ns).