Randomized Benchmarking Beyond Groups

Randomized benchmarking (RB) is the gold standard for experimentally evaluating the quality of quantum operations. The current framework for RB is centered on groups and their representations, but this can be problematic. For example, Clifford circuits need up to $O(n^2)$ gates, and thus Clifford RB cannot scale to larger devices. Attempts to remedy this include new schemes such as linear cross-entropy benchmarking (XEB), cycle benchmarking, and non-uniform RB, but they do not fall within the group-based RB framework. In this work, we formulate the \emph{universal randomized benchmarking (URB) framework} which does away with the group structure and also replaces the recovery gate plus measurement component with a general ``post-processing'' POVM. Not only does this framework cover most of the existing benchmarking schemes, but it also gives the language for and helps inspire the formulation of new schemes. We specifically consider a class of URB schemes called \emph{twirling schemes}. For twirling schemes, the post-processing POVM approximately factorizes into an intermediate channel, inverting maps, and a final measurement. This leads us to study the twirling map corresponding to the gate ensemble specified by the scheme. We prove that if this twirling map is strictly within unit distance of the Haar twirling map in induced diamond norm, the probability of measurement as a function of gate length is a single exponential decay up to small error terms. The core technical tool we use is the matrix perturbation theory of linear operators on quantum channels.


Introduction
With recent breakthroughs in the hardware development of quantum processors, it becomes increasingly important to be able to efficiently characterize their performance. Such a task, often called benchmarking, is essential in directing future quantum hardware development. An efficient and reliable benchmarking scheme not only allows comparison between different physical platforms and designs, but also provides useful feedback for device calibration and error diagnosis. This in turn provides useful information for future hardware designs, and, eventually, achieving fault-tolerant quantum computing.
A large number of such benchmarking schemes is collectively called randomized benchmarking (RB). Randomized benchmarking aims to extract information about a certain gate or a collection of gates through runs of random gate sequences, while isolating the effect of state preparation and measurement (SPAM) errors. Implementing the RB experiment usually produces an exponential decay curve with respect to the length of the random gate sequences. It is widely assumed that the decay rate of this curve, which can be obtained via fitting the experimental data, indicates a certain fidelity measure of the given gate set [1]. Since its first proposal in [1], randomized benchmarking has been thoroughly analyzed for its theoretical correctness and robustness [2,3,4,5,6], and it has been extensively validated by experiments [2,7,8,9,10]. Many variants of the original RB protocol have been proposed due to its great flexibility and adaptability to different experimental scenarios [3,11,12,13,14], One central question in the study of RB is choosing the probability distributions according to which the random gate sequences are chosen. Most RB schemes require that the gate set forms a group, 1 as the final gate in most RB experiments inverts all the previously applied gates, hence requiring the gate set to be closed under inversion of products. However, requiring that the gate set form a group is not always feasible nor necessary in an experimental setting. We give several reasons: • Realizing an inverse gate can be experimentally challenging. In superconducting devices, arbitrary single-qubit gates can be easily implemented with high fidelities, whereas realizing a new two-qubit gate usually requires significant additional control and calibration. As a result, a connected qubit pair usually only admits a single calibrated two-qubit gate. For two-qubit gates that are locally equivalent to their inverses (or equivalently, locally equivalent to a real rotation in SO(4) [15]), inverting them only requires appending the corresponding single-qubit gates. However, this is not true for all two-qubit gates. Counterexamples include the fSim gates which are indeed used in benchmarking experiments [16].
• Group structure can be very costly to implement. A typical Haar random element in SU (2 n ) would require a circuit of exponential length and depth to realize. A random Clifford gate on n qubits on average requires Ω(n 2 ) size and Ω(n/ log n) depth [17]. Neither of these groups can be used to extract any useful information given the noise levels on current physical devices; the signal would be long gone after implementing just a few group elements.
• Recently, linear cross-entropy benchmarking [16] was proposed as an experiment-friendly alternative to RB on large quantum devices with tens of qubits. Linear XEB uses random shallow circuits whose unitaries clearly do not form a group, but an exponential decay was nevertheless observed experimentally. In fact, multiple existing benchmarking schemes [18,19,20] rely on random sequences of shallow circuits whose unitaries do not form groups.
In this paper, we extend the RB framework beyond groups, and propose a universal randomized benchmarking (URB) framework which incorporates most of existing RB schemes, 2 in particular ones that cannot be easily formulated as group-based protocols. Most importantly, the random gates do not need to form a group but can be any set. Furthermore, instead of a single recovery gate that is often applied at the end of the group-based RB sequence (which is often well-defined given the group structure), we allow for a more general post-processing POVM that depends on the random set elements chosen. For known RB schemes, intuitively the post-processing POVM verifies that the gate sequence was perfectly applied, which is the case for group-based RB where the post-processing POVM factors into the recovery gate and the final measurement. However, under our framework any gate sequence dependent POVM is allowed, even ones that do not follow the intuition of verification. The general, possibly not a group, gate set and the general post-processing POVM define the essence of the generality provided by the URB framework. This helps open the doors to thinking about RB from a new angle and may motivate fundamentally different schemes.
Our main technical contribution is determining conditions under which a URB scheme experiment gives rise to a single exponential decay. The usual strategy to prove exponential decay for an RB scheme is to first formulate the RB measurement probability as a linear functional of powers of a linear operator. The linear operator represents the average effect of one random gate, and the power is the sequence length. The spectral properties of the linear operator then gives the exponential decay. For group-based RB with gate-independent noise, this operator is the twirled noise channel [1,3,11]. For group-based RB with gate-dependent noise it is the Fourier transformation of the implementation map [5,6,21]. For our case where a group structure is not present, we assume the post-processing POVM involves in a certain sense inverting the random gates, leading us to study linear operators on the set of quantum channels known as twirling maps: where µ is some measure over a gate set and ω is a map from the gate set to SU (d). Intuitively, these are higher-order operators representing the averaged joint action of a random gate and its inversion. They appeared in [22] to analyze a specific scheme without group structure. We can prove that single exponential decays can be observed in general when the corresponding twirling map is within γ of the Haar twirling map (set µ as the Haar measure over SU (d)) in induced diamond norm with γ < 1. This does not involve any underlying group structure, but instead the matrix perturbation theory of higher-order operators. The rest of the paper is organized as follows. We first introduce basic notation, introducing operator norms and matrix perturbation theory of higher-order operators in Section 2. We then introduce the URB framework, how an experiment would run, and state sufficient conditions to obtain a single exponential decay in Section 3, together with examples of existing benchmarking schemes formulated as URB schemes. Section 4 presents the proof of the single exponential decay under the assumptions and other relevant technical details. Section 5 concludes with a discussion and open problems.

Linear Operators and Norms
Here we give an exposition of different norms on linear operators that we will use.
Norms on Hermitian matrices. The spectrum of a Hermitian matrix is a vector in R d . We can then define norms of a Hermitian matrix via vector p -norms of its spectrum. For general matrices, these are known as Schatten p-norms. We consider three norms: • Trace norm ρ 1 := spec(ρ) 1 = tr |ρ|.
• Spectral norm It is easy to see that Quantum states are positive semidefinite operators with unit trace, and a positive operator-valued measurement (POVM) element is a positive semidefinite operator with spectral norm upper bounded by 1.
Define the Hilbert-Schmidt inner product on Hermitian matrices as ·, · HS : (ρ, σ) → tr [ρσ]. It is easy to see that the Frobenius norm of Hermitian matrices is induced by this inner product.

Norms on real superoperators
Consider linear maps C on Hermitian matrices, which we call real superoperators. We can treat them as usual linear maps and define the corresponding operator norms: • Induced trace norm • Induced Frobenius norm We can also define another norm via the superoperator inner product: where {X i } i is an orthonormal basis of matrices under the Hilbert-Schimidt inner product. Note that {X i } i can be any orthonormal basis of matrices, and we can extend the action of C to non-Hermitian matrices via linearity. We can then define This norm is the analogue of the Frobenius norm for matrices. Another norm, called the diamond norm, is defined on composite systems: where id is the identity map on C d×d . From the definitions and Equation (2.1) we have the following relations between the norms: We can also prove the following norm inequality: where E(i, j) kl := δ ik δ jl are the elementary matrices, which are orthonormal under the Hilbert-Schmidt inner product and have unit trace norms, and H and AH denote the Hermitian and anti-Hermitian parts, respectively. We conclude For the other direction, we argue where the second inequality follows since any Hermitian matrix can be extended to a basis. A quantum channel is a completely positive, trace-preserving (CPTP) map, and consequently has unit induced trace norm and unit diamond norm. Furthermore, a quantum channel has unit induced Frobenius norm if it is a unitary, and sub-unit induced Frobenius norm if it is a mixture of unitaries. Throughout the paper we will mainly consider the Hilbert space spanned by all channels equipped with the ·, · SO inner product, which we denote as V (d). There is a subspace V 0 (d) with codimension 1 spanned by all differences of quantum channels. This with an arbitrary quantum channel spans the whole V (d). All norms defined on real superoperators natrually carries to V (d) and V 0 (d).
It can be verified that the above norms are all bona fide matrix norms, namely for arbitrary C and D. Furthermore, for the SO norm and the induced Frobenius norm we have

Norms on linear maps on real superoperators
In this work, we investigate linear operators on real superoperators, which we call twirling maps. Again, we can treat them like linear operators. One can define the induced diamond norm of a twirling map Λ as 3 We can similarly define the induced SO norm ||| · ||| 2 := · SO→SO and induced trace norm ||| · ||| tr := · tr →tr , where to avoid awkwardness, we omitted the second "induced". Again, these are all bona fide matrix norms. Note that the ||| · ||| 2 norm corresponds to the usual spectral norm under the superoperator inner product. Although we do not use them here, a detailed discussion involving norms and approximate twirls and unitary designs is given in [23]. We have the following relation similar to Equation (2.2): Moreover, twirling maps have the following property. 3 Without specifying otherwise, all norms are induced from real superoperators. Note that restricting to subspaces V (d) or V0(d) does not change the inequalities and does not increase the induced norms.

Proposition 1 (Data Processing Inequality). If a twirling map Λ can be decomposed as
where we have the appropriate correspondence between twirling map and real superoperator norms. Moreover, we have a tighter bound regarding the induced SO norm on twirling maps and induced Frobenius norm on real superoperators: Proof. The first inequality follows from the triangle inequality plus submultiplicativity, while the second inequality follows from the triangle inequality and Equation (2.4).

Matrix Perturbation Theory
We first state a few results regarding the perturbation of matrices. Let H be a finite dimensional Hilbert space and V 1 , V 2 be subspaces such that V 1 ⊕ V 2 = H. Let A 1 and A 2 be operators on V 1 and V 2 respectively. Let M(V 2 , V 1 ) be the set of linear operators from V 2 to V 1 , that is, linear operators P = X 1 P X 2 where X 1 and X 2 are the projectors onto V 1 and V 2 respectively. For a norm · defined on H, we define the corresponding seperation function sep(·, ·) as From [24] we have the following result.
Theorem 2 (Stewart and Sun [24]). Let H be a finite-dimensional Hilbert space with norm · . This norm naturally induces a norm on linear operators. Let V 1 be a linear subspace of H and V 2 be its orthogonal complement. Let X 1 and X 2 be the projectors onto V 1 and V 2 respectively. Let A be a linear operator on H such that Let E be an arbitrary operator. If E satisfies then there exist operators P 1 , P 2 such that such that A + E can be diagonalized as and We will actually make use of a corollary that has stronger assumptions: Let H be a finite-dimensional Hilbert space with norm · . This norm naturally induces a norm on linear operators. Let X 1 , A, X 2 = I − X 1 be linear operators on H such that (1) X 1 is an orthogonal projector, (2) . Then we obtain the conclusion of Theorem 2 with P 1 ≤ 1, P 2 ≤ 1. Furthermore, • All eigenvalues of A 1 is 2δ-close to 1, Proof. We first prove sep(A 1 , A 2 ) ≥ 1 − γ. This is because A 1 P = X 1 P = P and thus Furthermore, X 2 = I − X 1 ≤ 2. Therefore X 1 EX 1 ≤ δ, X 2 EX 1 ≤ 2δ, X 1 EX 2 ≤ 2δ and X 2 EX 2 ≤ 4δ. It is easy to verify that both assumptions in Theorem 2 hold, and We have From Theorem 2 we have A 1 = X 1 A 1 X 1 . Therefore for any eigenvector v of A 1 with eigenvalue Λ, Finally, proving that κ ≤ 16.
Note that assumption (1) of Corollary 3 implies the setting of Theorem 2.

URB Scheme and Experiment
Let C(d) be the set of qudit channels and H(d) be the set of d × d Hermitian matrices.
Definition 4 (URB scheme). A univeresal randomized benchmarking (URB) scheme R on a d-dimensional quantum system can be expressed as a tuple (S, µ, φ, M, ρ 0 ) consisting of: • A gate set S, encoding the gates to be applied in the URB scheme. For sake of generality, we do not restrict S to be a subset of the unitary group SU (d); instead S can encode any description that leads to an implementation of the gate independent of other gates in a circuit.
• A probability distribution µ over the gate set S.
• An implementation map φ : S → C(d), assuming a gate-dependent yet Markovian noise model.
• A post-processing POVM M : S * → H(d) taking a finite string of elements from the set to a Hermitian operator on C d .
3. Perform a measurement M (g 1 , · · · , g m ) on the final state and get a binary result. 4. Repeat steps 1 to 3 to get an estimationp(m) of the success probability.
It is easy to see that eachp(m) is an unbiased estimator of the quantity The output estimationsp(m 1 ), · · · ,p(m k ) are in practice assumed to be sufficiently close to p R (m 1 ), · · · , p R (m k ) and are fit to a single exponential decay curve f (m) = A + B · u m to extract the decay rate u. The decay rate u is commonly believed to indicate an "average fidelity" of the gate ensemble (see Section 4.4 for the interpretation of the RB value). However, the above URB framework itself does not guarantee a single exponential decay. To guarantee such a decay, we need the following.
where we use the notation for a Hermitian matrix ρ and real superoperator C, C † being the adjoint superoperator with respect to the Hilbert-Schmidt norm. 4 Near-ideal implementation There exists an ideal map ω : S → C(d) into unitary channels, such that the implementation map is close to the ideal map and the inverting map is close to its adjoint: Approximate twirling The twirling map is a γ-approximate twirl: that |||Λ * R − Λ * ||| ≤ γ under a certain norm ||| · ||| (usually taken as the diamond norm) where Λ * : N → g∼η dgg † • N •g is the twirling map where η is the Haar random distribution on SU (d), andg : ρ → gρg −1 is the unitary channel corresponding to the fundamental representation of SU (d) for g.
We will also say that a URB scheme R is an ( , δ, γ) twirling scheme with respect to the tuple (M 0 , φ * , I, ω) in cases where the components are to be specified. The approximately factorizable post-processing POVM condition expresses that a twirling scheme is an RB-like scheme in that after the random gates, a series of inverting gates are applied followed by a measurement. However, this mathematical definition can be more inclusive than it may seem, as we will see that linear XEB also effectively uses a factorizable post-processing POVM. The near-ideal implementation condition simply expresses that the noise levels are bounded. The most interesting condition is the γ-approximate twirl, which expresses that our random gate distribution is constant distance from a unitary 2-design. Figure 1 visually illustrates the URB framework and twirling schemes.
The main technical contribution of our work is a characterization of the exponential decay behavior provided that the URB scheme parameters satisfy certain constraints. More specifically we have the following main result.
Theorem 7 (Theorem 8, informal). Let R be an ( , δ, γ) twirling scheme with respect to the diamond norm, and assume that γ Figure 1: A visual illustration of the URB framework and twirling schemes. (a) A generic URB scheme. Given a sequence length m, we randomly select m i.i.d. samples from a distribution µ. The experiment then applies g 1 , · · · , g m sequentially via an implementation map φ on a fixed initial state ρ 0 , followed by a post-processing POVM M that depends on the random elements g 1 , · · · , g m . (b) Approximate factoring. When the post-processing POVM admits an approximate factoring, it can be approximately decomposed into an intermediate channel I, a sequence of inverting operations φ * (g m ), · · · , φ * (g 1 ), and a fixed measurement operator M 0 . (c) A twirling map. The average joint operation of φ(g) and φ * (g) sandwiching a channel can be formulated as a linear operator Λ R on quantum channels. (d) A URB scheme in terms of twirling maps. Expressing the probability of measurement in terms of the twirling map Λ R reduces the proof of the exponential decay to the study of its spectral properties.
We see that the crucial property we need to establish to apply Theorem 7 is γ < 1, so that for sufficiently low experimental error δ, γ ≤ 1 − 11δ. One may mistakenly think that our result is essentially saying there is a single exponential decay if our twirling map is close to the Haar twirl Λ * , which intuitively means the unitary ensemble defined by µ, ω is close to a unitary 2-design. This is of course unsurprising. However, we stress that we do not require γ to be small, but just less than 1. This is not as strong of a requirement as being close to a unitary 2-design. Note also that Theorem 7 by itself is not sufficient to imply we can extract a single exponential decay from measured data. In general, we require the magnitude of B to be significantly larger than 0, p to be significantly larger than γ + 6δ, and is sufficiently small for the URB experiment to be able to extract a single exponential decay with decay rate close to p given sufficiently many repeated experiments. For the effect of , see Section 4.5 for an analysis of the robustness of fitting to a perturbed exponential decay.

Examples of URB Schemes
In the URB framework, the post-processing POVM M is defined abstractly for generality. To our knowledge, all known URB schemes can be approximately factorized, but we leave open other possibilities. We here give examples of schemes that fall into our framework. Note that we make the distinction between a scheme falling into our framework and it being guaranteed by our theorem to have a single exponential decay, which requires additional assumptions. Further assumptions are required for this exponential decay to be extractable. We summarize the relationship between the URB framework, our class of twirling schemes, and the existing group-based framework as well as other classes of RB schemes in Figure 2.
Group-based Randomized Benchmarking Standard group-based RB can be readily formulated as a URB scheme, where the gate set is taken as a group G, with the distribution being the uniform distribution over the group. The post-processing POVM is then M (g 1 , · · · , g m ) = M 0 · φ(g −1 1 · · · · · g −1 m ), i.e. applying a fixed measurement after physically applying the gate corresponding to the inverse of the product of the previous elements. This post-processing POVM admits a δ-approximate factoring into the triple (M 0 ,ω † , id) the when the implementation map φ is δ-close to a representationω, that is, Moreover, in the case that the uniform distribution over the image of the ideal map forms a unitary 2-design, the twirling map is an exact twirl, and a single exponential decay occurs whenever δ < 1 11 . More general cases where there are different representations or multiplicities of representations still lie in the URB framework, but we do not consider them below.
Note that unlike the results for example in [6], we have an additional constant error term . This is because we want to reduce our analysis to twirling maps, and to do this we need to consider φ * (g 1 ) • · · · • φ * (g m ) instead of φ((g m · · · g 1 ) −1 ). We also cannot consider the latter option because we lack group structure.
Non-uniform RB There have been several extensions to standard RB [2,26,19] where the random gates are drawn from non-uniform distributions over a group, since a typical Haar random element is too costly. Such RB variants readily fit into our URB framework as it makes no assumption on the distribution. In fact, URB schemes with an ideal reference map ω naturally gives rise to a (not necessarily uniform) distribution on the special unitary group of corresponding dimension. Our work extends [26] in several ways, but most importantly we do not require the distribution to be approximately uniform, nor inverse-symmetric, nor that its support contains the generators of a group. To be a twirling scheme, we only require that the twirling map is close to Haar under a certain norm. As a result, our result provides tighter bounds with less assumptions on the distribution, and also applies to infinite groups. We leave it to future work to study a twirling scheme where there are multiple irreducible representations or irreducible representations with multiplicities, which would result in a matrix exponential decay.
Linear Cross-entropy Benchmarking (Linear XEB) Linear cross-entropy benchmarking (Linear XEB) is a benchmarking scheme first introduced in [16]. In this framework, a certain number of layers, typically shallow, of random circuits C 1 , · · · , C m are applied to an initial state. A subsequent measurement then returns a bitstring x, whose ideal probability q(x) is numerically simulated on a classical computer. The protocol returns the value 2 n q(x) − 1, where n is the number of qubits in the system.
We claim that linear XEB falls into the URB framework, provided that the random shallow circuits are chosen i.i.d. from a distribution µ over a set S. To see this, we express the expected outcome where the distributions q,q are defined as for some POVM {M x } x∈{0,1} n and initial state ρ 0 . By lifting the distributions as diagonal operators, we have representing the ideal and physical measurement under the computational basis, and C i the ideal implementation of C i as a unitary channel. Using the fact that Q = Q † , we have that is, the post-processing POVM, being the combination of the quantum measurement followed by classical simulation, can be exactly factored in terms of a tuple (|0 0|,ω † ,D), with a prefactor of 2 n . Linear XEB experiments suggest that F (m) can be fit into a single exponential decay 1. This implies that the single exponential decay may not be observable unless the circuit depth approaches ω(n log(1 − γ)) so that the error term is negligible compared to the actual single exponential decay. We leave a more detailed analysis of linear XEB experiments for future work.
Cycle Benchmarking Cycle benchmarking [20] is a specialized protocol to measure the Pauli errors of Clifford gadgets in a large-scale quantum processor. Given an n-qubit Clifford gate g ∈ C n , let k = ord(g) := min{k ∈ Z + |g k = I}. Each random element is chosen uniformly from the set of n-qubit Pauli operators P ⊗k n , and is implemented by for some physical implementation ϕ defined on all Pauli gates and g. It can be proven that in the noiseless case such implementations result in a Pauli operator, and it is therefore sufficient to calculate and implement the final Pauli gate as the recovery gate. The post-processing POVM can be approximately factored given ϕ on the set of Pauli operators is close to ideal. One distinct feature of the cycle benchmarking is that there is almost never a single exponential decay: the twirling map would be far from an approximate twirl since the distribution effectively defined on the set of Pauli channels. Instead, there is typically an exponential number of exponential decay components from a single experiment. Multiple experiments, typically with different measurements, are required in order to isolate the exponential decay components to give useful information about the Pauli error channels. Our proof centers around twirling maps that map channels to channels; for a URB scheme R = (S, µ, φ, M, ρ 0 ) that is a twirling scheme with parameters (M 0 , φ * , I, ω), define the ideal twirling map and the physical twirling map The following is a summary of the proof: We first approximate p R (m) as a linear function of the m-th power of the physical twirling map Λ R , transforming the problem into the study of the major spectral components of Λ R . This is then analyzed by regarding Λ R as a perturbed version of the ideal twirling map Λ * R , whose spectral properties are given by the γ-approximation with respect to the Haar twirl Λ * . For sake of simplicity we denote the Haar random distribution η without explicity specifying the dependence on the dimension d.
Proof. We first compute p R (m): where the first inequality uses the triangle inequality, the second inequality uses the Hölder inequality, the third equality uses the fact that all φ(g i )'s are channel and thus φ(g m ) • · · · • φ(g 1 )(ρ 0 ) is a state with unit trace, and the final inequality follows by the factorization assumption. Now we turn to study the quantity tr[M 0 · Λ m R (I)(ρ 0 )]. We do this by studying the spectral properties of Λ R . Specifically, we would like to invoke Corollary 3 with the following setup: We verify that all the assumptions in Corollary 3 hold.
1. X † 1 = X 1 , X 2 1 = X 1 . The first is because for any C, D ∈ V (d), Here we used the facts that {Y g i } i := {g(Y i )} i is an orthonormal basis under the Hilbert-Schmidt inner product if {Y i } i is, that (g † ) † =g, and that the Haar measure is inverse symmetric.
To prove X 2 1 = X 1 , define Λ(ν) := g∼ν dgg † • · •g for arbitrary distribution ν defined on SU (d). We then have where we have used the fact that the Haar measure is right invariant. Taking ν = η proves R is uniquely determined by the distribution µ ω on SU (d) which is defined by the distribution µ on S and the ideal map ω, we have Λ * Λ * R = Λ * and similarly Λ * R Λ * = Λ * from the above. This directly leads to where the first inequality uses the triangle inequality and Proposition 1, and the second inequality uses that ω(g) , φ * (g) = 1 since they are channels.
7. This is assumed.
We can now invoke Corollary 3 to obtain L 1 , L 2 , R 1 , R 2 , A 1 , A 2 such that Λ R is diagonalized: The second term is bounded by an exponential decay: We finally show that the term tr[M 0 · L 1 (A 1 ) m R 1 (I)(ρ 0 )] is exactly a single exponential decay. Specifically, we show that the rank of A 1 is 2, and it can be diagonalized with one eigenvalue 1 and the other p ∈ [1 − 2δ, 1].

A 1 is of rank 2.
First, by construction the rank is at most 2. Also, for any quantum channel N , for some α N . Therefore Im(Λ * ) = span({dep, id}) and rank(Λ * ) = 2. Moreover, since A 1 − A 1 ≤ 2δ and A 1 = Λ * is a projector, the two eigenvalues λ 1 , λ 2 of A 1 must be 2δ-close to 1 and A 1 is therefore of rank 2.
2. Eigenvalues of A 1 are bounded in magnitude by 1. For any non-zero eigen-channel N of Λ R with eigenvalue λ, we have Therefore all eigenvalues of Λ R are bounded in magnitude by 1, and so are A 1 , A 2 since their two spectra constitute a bipartition of the spectrum of Λ R .
3. All Λ R , Λ * R and Λ * are real-valued under a properly chosen basis. Consider a orthogonal basis B (under the Hilbert-Schmidt inner product) of the space of Hermitian matrices. Then, any real superoperator is defined by its action on elements of B, giving rise to a matrix representation. In particular, real superoperators will be real-valued matrices since they preserve hermiticity. Now, we can consider a basis B := {N i } i of real superoperators with a matrix representation with respect to B being an elementary matrix: a single 1 and 0's everywhere else. Since ω(g), φ(g) and φ * (g) are quantum channels for all g, Λ R , Λ * R , Λ * maps N i to real matrices with respect to B and therefore are themselves real matrices with respect to B. This also indicates that L 1 , L 2 , R 1 , R 2 , A 1 , A 2 are real with respect to B. Additionally, the intermediate channel I, the initial state ρ 0 , and the final measurement M 0 are all real with respect to the corresponding bases. 4. A 1 has one eigenvalue 1. Consider the channel C φ * := g∼µ dgφ * (g).

5.
where the magnitude of the first term is bounded by p R (m) + ≤ 1 + and that of the second term by 16(γ + 6δ) m ≤ 16.

Remark 9.
In the proof we chose the Hilbert space H to be the channel space V (d) as it is an invariant space under all twirling maps Λ * , Λ * R and Λ R . A similar proof considers the channel difference space V 0 (d) as it is again an invariant subspace under all above twirling maps by taking as input the channel difference I − E ρ * . We will use this fact when giving upper bounds on γ.

Proving Measures are Approximate Twirls
In Theorem 8, a key requirement is for the ideal twirling map Λ * R to be a γ-approximate twirl with γ < 1. Here, we give several possible ways to upper bound γ. A trivial upper bound is 2 which follows directly from Proposition 1.
We establish some notation. Note that a probabilistic distribution µ on the set S together with the ideal map ω : S → SU (d) defines a probabilistic distribution on SU (d), from which the ideal twirling map Λ * R is uniquely determined. For the rest of the section, we consider probabilistic distributions µ on SU (d), and let Λ(µ) := g∼µ ω(g) † • · • ω(g)dg.

Bounds on γ for Convex Combinations
One straightforward yet somewhat restrictive way of bounding γ is when µ is a convex combination of a unitary 2-design ν with any other measure.
We first observe that if µ is a p-convex combination of some measure µ and ν, then Note that this implies the set of twirling maps corresponding to some measure is convex. However, this implies Moreover, Λ(ν) = Λ(η), where η is the Haar random distribution on SU (d). We conclude the following:

Proposition 10. If µ is a p-convex combination of any measure and a unitary 2-design, then µ is a 2p-approximate twirl.
Consider the Clifford group C(d) ⊆ SU (d) and let the measure µ C be the uniform distribution over C(d). Suppose a discrete measure µ has support that includes C(d). Then, define If m satisfies this with equality then µ = µ C . Assume it doesn't. Then, define the measure µ as By construction, µ is a legitimate measure. That is, µ is a (1 − m|C(d)|)-convex combination of µ and µ C . Note that µ having support that includes C(d) is also a necessary condition for it to be a nontrivial convex combination involving µ C . By the above result, we can conclude µ is a 2(1 − m|C(d)|)-approximate twirl. We can therefore conclude Proposition 11. If a measure µ's support includes the Clifford group with all probabilities greater than half of that of the uniform distribution over the Clifford group, it is a γ-approximate twirl with γ < 1.
In particular, we can find another distribution over the Clifford group for which we can do URB. This can be extended to any unitary 2-design that is a uniform distribution over a finite set. We can also consider more general two-designs. Let ν be a measure with support a finite subset S ⊆ SU (d) that is a unitary 2-design. We define Consider a measure µ whose support includes S. Define We conclude m S (µ) ≤ M S (ν). Then, we can define the measure By construction, µ is a legitimate measure. Thus, µ is a convex combination of µ and ν. Note that µ having support on S is a necessary condition for µ to be a nontrivial convex combination involving ν. We conclude Proposition 12. Given a unitary 2-design ν with finite support S, if a measure µ has probability greater than half of the max probability of ν on all of S, µ is a γ-approximate twirl with γ < 1.
Note that this implies if M S (ν) ≥ 2/|S|, which is impossible. Thus in this case we cannot find such a measure µ. Note that we can easily extend these arguments to the case when the support of µ is not finite.

Bounds on γ via Pauli Representation
For the following bounds we restrict ourselves to the case of qubits, that is d = 2 n . We consider the Pauli matrices , that is tensor products of {I, X, Y, Z}, that span L(C d ) and are orthogonal under the Hilbert-Schmidt inner product With this basis, we can express any linear operator. Since superoperators can be expressed in terms of left and right multiplications by operators, in general we can express a superoperator as a matrix α whose coefficients are defined by: We refer to this as the α matrix and will sometimes specify which superoperator we're considering by α(T ). Unless otherwise specified, summation indices range from 1 to d 2 . Going further, given a supersuperoperator Λ, we now introduce the β tensor whose coefficients are defined by We can linearly extend this to obtain the action of Λ on any superoperator. We refer to this as the β tensor. The supersuperoperators we will mainly be considering are twirling maps corresponding to a measure µ, and we sometimes write β(µ) as the β tensor for Λ(µ). For a more detailed introduction to these representations and some basic facts that will be relevant to proving upper bounds on γ, see Section A. The first bound is the following: Proposition 13. Let β(µ) be the β tensor for the twirling map Λ(µ). Then, Proof. We prove this in Section B.
This is a direct and general upper bound on the diamond norm. However, its form is reminiscent of an 2 -norm, while we would expect something that looks more like an induced 1 -norm. We can prove such a bound for a special case.

Proposition 14.
Suppose where µ P is the uniform distribution over Pauli operators. Furthermore, let T be a difference of two channels: T = N 1 − N 2 . Then, Proof. We prove this in Section C.
Intuitively, the Pauli representation lets us express supersuperoperators as a linear map on matrices, and our upper bound on the corresponding induced diamond norm resembles the induced 1 -norm for regular matrices: The assumption that Λ(µ) is right invariant under Λ(µ P ) is natural in that this allows us to connect the trace norm of α with the superoperator's diamond norm. This right invariance is easily enforced by adding a uniformly random Pauli operator before each random gate for any URB scheme, along with the fact that Λ(µ P ) is a projector as shown in Section C. Finally, the restriction that T is a difference of channels still allows us to bound γ in Theorem 8. This is because of Remark 9 and the fact that To see this, we simply observe that a linear combination of a difference of channels is simply a scaled difference of channels. Without loss of generality, let c 1 , c 2 ≥ 0 and consider As an exercise, we can evaluate γ for µ P : This for d = 2 is already 4/3 > 1, so this does not justify uniform Pauli randomized benchmarking. However, as d → ∞, the upper bound goes to 2, which is the highest the induced diamond norm distance can be.

Approximate Twirls in Other Norms
Theorem 8 shows that a single exponential decay can be observed when R is a twirling scheme with respect to the induced diamond norm. We give upper bounds on the induced diamond norm, but in general this is difficult to compute. Other norms are easier to compute and characterize, and we give the following corollaries as alternatives to Theorem 8. Proof. This can be readily proved by replacing the diamond norms in the proof of Theorem 8 by the trace norm.
Given a twirling map, it is often much easier to observe its spectrum through diagonalization of its matrix representation. In contrast, the diamond norm or trace norm are typically computed via semidefinite programming. The following gives an analog of Theorem 8 in terms of a spectral gap, although with an error term involving the dimension of the quantum system. Corollary 16 (Single exponential decay with respect to the Frobenius norm). Let R = (S, µ, φ, M, ρ 0 ) be an ( , δ, γ) twirling scheme on quantum systems with dimension d with respect to the tuple (M 0 , φ * , I, ω) under the · 2 norm. Let δ := d · δ. Given that δ ≤ 1−γ 11 , there exists A, B ∈ R and p ∈ [1 − 2δ , 1] such that Furthermore, in the case that φ * or φ maps to unitary mixtures we can restrict to δ = √ d · δ.
Proof. There are only two places in the proof of Theorem 8 where the diamond norm is explicitly used, and we replace them with the spectral norms. The first one is at Equation (4.2), to bound the norm of the perturbation E. Here we instead have ≤ dδ, (4.4) where in the third inequality we used the fact that ω(g) 2 = 1 ≤ √ d and φ * (g) have 2 norm upper bounded by √ d. In case that φ * (g) is a unitary mixture (similarly for φ(g)), the bound can further be tightened to √ dδ by observing φ * (g) 2 ≤ φ * (g) tr = 1. The second one is at Equation (4.3). Here we have (4.5) As in the case of Theorem 8, the crucial property we need to establish is γ < 1, but now with respect to these other norms.

Gauge Invariance
RB schemes are known to be gauge invariant: that an implementation map φ only needs to be close to a conjugation of the ideal map ω in order to exhibit the desired exponential decay, even if φ and ω were far apart under diamond norm. This is still true in the URB setting, and gives us the following alternative definition for the near-ideal implementation.
With the definition of near-ideal implementation under gauge, we can further define that a URB scheme R is a ( , δ, γ, κ) twirling scheme under gauge if it has such a near-ideal implementation. The gauge free version corresponds to the case that U = V = id and κ = 1. Our single exponential decay results for different norms can be readily applied to twirling schemes under gauge. We have the following.
Furthermore, in the case that φ * or φ maps to unitary mixtures we can restrict to δ = √ d · δ.
Proof. The proofs closely mimic the proofs for Theorem 8, Corollary 15 and Corollary 16. The central difference is to substitute the use of Λ R with a gauge-corrected twirling map with the observation that By treatingΛ R as a perturbed version of Λ * R , we can similarly get a diagonalization (L 1 , L 2 , R 1 , R 2 , A 1 , A 2 ) ofΛ R . This gives (4.6) The second term is an exponential decay term; we have from which Equation (4.3) or Equation (4.5) apply. The analysis of the first term is similar to that of the proof of Theorem 8.

Relating the Decay Rate to the Average Fidelity
The decay rate in an RB or URB scheme is often believed to indicate the average "quality" of a collection of gate implementations. Mathematically, it corresponds to the second largest eigenvalue (guaranteed to be real when the URB scheme is a twirling scheme) of the twirl operator Λ R . Relating such an eigenvalue with an indicator of the average "quality" of gates can be non-obvious and sometimes tricky [27]. Of course, we can always avoid this question by defining a URB scheme's decay rate as a figure of merit by fiat, but we can consider some cases where there is a relatively clear connection between the average fidelity and the second largest eigenvalue. During the proof of Theorem 8, we see that the eigen-channel with eigenvalue 1 is always a replacement channel E ρ * . All other eigen-superoperators, including the one corresponding to the second largest eigenvalue, lie in the space of channel differences V 0 (d). To see this, note that any eigne-channel N ∈ C(d) of Λ R must be of eigenvalue 1 (otherwise Λ R (N ) is no longer a channel) and consequently corresponds to an eigen-superoperator N − E ρ * ∈ V 0 (d). Let D R ∈ V 0 (d) be the eigen-superoperator corresponding to the eigenvalue p R = λ 2 .
An easy case that p has a natural interpretation is when the line {E ρ * + λ · D R |λ ∈ R} contains a unitary channel U. In this case we can perform a unitary gauge transformation φ(g) → U • φ(g) • U −1 so that we can assume without loss of generality that the unitary channel is the identity channel id. 6 Then The decay rate p is then related to the fidelity of the channel Λ R (id) = g∼u dgφ * (g)φ(g), which in turn is the average fidelity of the channels φ * (g)φ(g) under the probability distribution µ, specifically, where the average fidelity of a channel C is defined as over Haar random unitaries U . One example of an error model satisfying this condition is the gate-dependent replacement model: that there exists probabilities p(g) and states ρ(g) for each g, For gate independent noise models we can without loss of generality consider either in-between noise models or sandwiched noise models R . • For the in-between noise model, we have In the case that µ gives rise to a unitary 2-design, or equivalently Λ * R = Λ * , we have . This recovers the case that RB on unitary two-designs extract the average fidelity of gate-independent noises.
• For the sandwiched noise model, we have Then the RB exponent p R relates to the average fidelity of N L • N R if the following holds: where ρ * is the eigenstate of the channel g∼µ dgφ * (g) = N L • g∼µ dgω(g) † . This also guarantees that p R = p.
To summarize, for gate-independent noise models, the RB exponent extracted from the experiment is not always determined by the average fidelity of the error channels, partially because that the ideal twirl is not always a full twirl. Special cases where there is such a determination is when the ideal twirl is a full twirl, and when the noise channels adds up to an replacement channel with the maximal eigenstate.

Analysis on Robustness of Data Fitting
In this section, we will prove that a sufficiently small perturbation to an exponential decay curve will not significantly affect the extracted decay rate. More specifically, we will prove the following lemma: then the decay rates β and α must satisfy Proof.
Assume without loss of generality that 1 > β ≥ α > 0. Then we have Lemma 19 bounds the deviation of the decay rate, given that the fitted curve is O( /A 0 )-close to a certain ground truth single exponential decay curve under the ∞ -norm. In our case, such ∞ deviations come from the error terms + 16(γ + 6δ) m , standard deviation from sampling Θ log k √ K (k being the number of sequence lengths chosen and K the number of repeats for each sequence length), and imperfect fitting algorithms. Even though finite makes the error term non-vanishing, we conclude that unless the number of samples for each sequence length exceeds −2 , the inaccuracy of decay rate extraction mainly comes from stochastic fluctuations of the random experiments.

Discussion and Open Questions
In this paper we propose a new framework for randomized benchmarking, which we call the URB framework. To formulate this we take major components of widely used benchmarking schemes such as group-based RB and linear XEB and then generalize them. With this generalization we can express a much wider variety of schemes that could possibly overcome the shortcomings of current schemes, such as scalability. For a certain class of URB schemes, which we call twirling schemes, we can prove an exponential decay using Theorem 8, a hallmark feature of randomized benchmarking schemes. We saw that the crucial property we need to establish to apply Theorem 8 is for the twirling operator Λ * R to be a γ-approximate twirl for γ < 1. We gave upper bounds for γ and give alternative exponential decay results for easier bounds to compute, such as the ||| · ||| 2 norm. We also discuss issues of gauge invariance and the interpretation of the decay rate.
One recent RB variant, called randomized benchmarking with mirror circuits [18], aims to extract system fidelity information from shallow Clifford circuits. The random gate sequence consists of 1. Clifford layers with a mirror structure, i.e. the second half of the Clifford circuits are inversions of the first half 2. Uniform Pauli gates interleaving the Clifford layers 3. Single-qubit Cliffords at the beginning and the end of the circuits.
This falls into our RB framework as all gates after the first half of Clifford layers can be seen as the post-processing POVM. However, with the presence of the interleaving Pauli gates, it is impossible to separate out "inverting maps" that locally approximately inverts the implementation maps. Hence, this scheme does not readily yield to our analysis. In this scheme, the Pauli error is corrected on a global scale after the final measurement. We leave it to future work to analyze such schemes involving complex correlations between the random gates.
We leave some open questions for future work. Obviously, the value of our framework is really proved by novel benchmarking schemes that have better performance by bypassing the group-based requirement. One work towards this direction is [28], which proposes performing linear XEB with shallow Clifford circuits so that the classical simulation step is much more scalable to larger numbers of qubits. They simulate systems with more than a thousand qubits to showcase this scalability. An earlier work [22] also considered a variant of RB that does not form a group.
The upper bounds on γ are somewhat loose and could be analyzed more systematically. In particular, it would be interesting to express the induced diamond norm as an SDP so that it can be bounded or even exactly calculated. Furthermore, various inequalities of the proof of Theorem 8 could be tightened, such as improving the factor of 11 in the δ ≤ (1 − γ)/11 bound and a possible typicality argument for the bounding of (A 2 ) m . Also, our result on robustness of data fitting naturally leads to a question of sample complexity, which we also leave for future work. As discussed above, the interpretation of the decay rate is still a matter of discussion. Also, the exponential decay of the F (m) quantity for linear XEB defined in Equation (3.1) is still to be further investigated. Lastly, from our work we recognize the value of a systematic formulation of norms for multilinear algebra as applied to quantum information science.
Another direction to pursue is to recover matrix exponential decay results for group-based RB [4,6,5,21]. There are two major differences between our result and Fourier-analysis based results for group-based RB. First, we treat the final recovery gate as an imperfect implementation of the perfect recovery gate, which enables factorization and hence the study of twirling maps, but this introduces a constant error term which is not present in the Fourier-based analyses. Intuitively, this is because our twirling map approach fails to capture the correlation between the noise in the final recovery gate with those of the previously applied gates due to taking the expectation over the group. This removes the constant error term in Fourier analysis-based approaches. We leave the incorporation of such correlated noise in a twirling maps approach to future work. Also, the case where multiple irreducible representations are concerned corresponds to the case where Λ * R = Λ * is a projector on a larger space and is thus not an approximate twirl. Recovering the matrix exponential result then requires an investigation into Λ * R that do not form approximate twirls. However, our framework is not concerned with group structure or irreducible representations, but only the twirling map. This could provide a conceptually easier approach when dealing with infinite groups.

A Properties of the α Matrix and β Tensor
We first recall that we have a basis {P i } d 2 i=1 of L(C d ) that satisfies tr[P i P j ] = dδ ij .
We can use this basis to express superoperators T = i,j α i,j P i · P j and supersuperoperators Λ : P i · P j → k,l β k,l i,j P k · P l .
We establish some facts relating the norms of the α matrix with the norms of its corresponding superoperator. We will use these results when proving the bounds on γ.
Proposition 20. Let T be a superoperator. Then, where · SO is the norm on superoperators induced by the ·, · SO inner product.
In the fourth equality we used for any operator X, This directly gives our conclusion.

Proposition 21.
Let N be a quantum channel. Its α matrix is positive semidefinite with unit trace.
Proof. Let N be a quantum channel. Then, it has a Kraus representation where K ≤ d 2 is the Kraus rank and Suppose the decomposition of A k ∈ L(C d ) in terms of {P i } i is given by Then, This implies α is a Gram matrix, which is equivalent to being positive semidefinite. Furthermore, Taking traces on both sides, Hence, α has unit trace.
This is a nice property as intuitively one would expect a quantum channel to be a higher-order tensor analogue of a quantum state. However, we can't take this analogy that far since the converse is not true. We next consider the β tensor for twirling maps corresponding to a measure. We write out: Since the Paulis are Hermitian and orthogonal, u(g) i,k ∈ R and d 2 k=1 |u(g) i,k | 2 = 1. Then, This proves β k,l i,j (µ) ∈ R. From this representation we also obtain the identity: α i,j β k,l i,j (µ))P k · P l is also a channel, which puts additional constraints on β(µ). Lastly, we compute the β tensor for the Haar measure: β(η). We can use the expression for a Haar twirl from [1]  If exactly one of P i , P j is I, then the RHS is zero. If they're both not I,

C Induced 1 -Norm Like Bound
Here we prove Proposition 14: Proof. We first establish that Furthermore, we have that Λ(µ P )(T ) = Λ(µ P ) where ·, · S denotes the symplectic inner product of the binary symplectic representations of a Pauli string [29] and ⊕ denotes bitwise addition. Two Paulis commute iff the inner product vanishes. The fourth equality follows from the distributive property of the inner product. The last inequality follows because when i = j, b i ⊕ b j = 0, while when i = j, half of the summed b k is orthogonal to b i ⊕ b j while the other half is not and therefore cancels out.