Benchmarking Gate Fidelities in a Si/SiGe Two-Qubit Device

We report the first complete characterization of single-qubit and two-qubit gate fidelities in silicon-based spin qubits, including cross-talk and error correlations between the two qubits. To do so, we use a combination of standard randomized benchmarking and a recently introduced method called character randomized benchmarking, which allows for more reliable estimates of the two-qubit fidelity in this system. Interestingly, with character randomized benchmarking, the two-qubit CPhase gate fidelity can be obtained by studying the additional decay induced by interleaving the CPhase gate in a reference sequence of single-qubit gates only. This work sets the stage for further improvements in all the relevant gate fidelities in silicon spin qubits beyond the error threshold for fault-tolerant quantum computation.


INTRODUCTION
With steady progress towards practical quantum computers, it becomes increasingly important to efficiently characterize the relevant quantum gates. Quantum process tomography [1][2][3] provides a way to reconstruct a complete mathematical description of any quantum process, but has several drawbacks. The resources required increase exponentially with qubit number and the procedure cannot distinguish pure gate errors from state preparation and measurement (SPAM) errors, making it difficult to reliably extract small gate error rates. Randomized benchmarking (RB) was introduced as a convenient alternative [4][5][6][7]. It estimates the gate fidelity as a concise and relevant metric, requires fewer resources, is more robust against SPAM errors and works well even for low gate error rates.
Various randomized benchmarking methods have been investigated to extract fidelities and errors in different scenarios. In standard randomized benchmarking, sequences of increasing numbers of random Clifford operations are applied to one or more qubits [5,6]. Then, loosely speaking, the average Clifford gate fidelity is extracted from how rapidly the final state diverges from the ideally expected state as a function of the number of random Clifford operations. In interleaved randomized benchmarking, the fidelity of a particular quantum gate is obtained by interleaving that gate in a reference sequence of random Clifford gates and studying how much faster the final state deviates from the ideal case [8]. Simultaneous randomized benchmarking uses simultaneously applied random Clifford operations to different qubits to characterize the degree of cross-talk [9].
A major drawback of these traditional randomized benchmarking methods is that the number of native gates that need to be executed in sequence to implement a Clifford operation can rapidly increase with the qubit number. For example, it takes on average 1.5 controlled-phase (CPhase) gates and 8.25 single-qubit gates to implement a two-qubit Clifford gate [10]. This in turn puts higher demands on the coherence time, which is still a challenge for near-term devices, and leads to rather loose bounds on the gate fidelity inferred from interleaved randomized benchmarking [8,11]. Therefore, in early work characterizing two-qubit gate fidelities for superconducting qubits, the effect of the two-qubit gate projected in single-qubit space was reported instead of the actual two-qubit gate fidelity [12,13]. For semiconductor spin qubits, even though two-qubit Bell states have been prepared [14][15][16][17] and simple quantum algorithms were implemented on two silicon spin qubits [15], the implementation issues of conventional randomized benchmarking have long stood in the way of quantifying the two-qubit gate fidelity. These limitations can be overcome either by using different native gates [17] or by using a new method called character randomized benchmarking (CRB) [18], which allows extracting a two-qubit gate fidelity by interleaving the two-qubit gate in a reference sequence consisting of a small number of single-qubit gates only. As an additional benefit, CRB provides detailed information on separate decay channels and error correlations.
Here we supplement standard randomized benchmarking with character randomized benchmarking for a comprehensive study of all the relevant gate fidelities of two electron spin qubits in silicon quantum dots, including the single-qubit and two-qubit gate fidelity as well as the effect of cross-talk and correlated errors on single-qubit gate fidelities. This work is of strong interest since silicon spin qubits are highly scalable, owing to their compact size (< 100 nm pitch), coherence times up to tens of milliseconds and ability to leverage existing semiconductor technology [19,20].
FIG. 1. A double quantum dot is formed in the Si/SiGe quantum well, where two spin qubits Q1 (blue spin) and Q2 (red spin) are defined. The green shaded areas show the location of two accumulation gates on top of the double dot and the reservoir. The blue dashed lines indicate the positions of three Co micro-magnets, which form a magnetic field gradient along the qubit region. MW1 and MW2 are connected to two vector microwave sources to perform EDSR for single-qubit gates. The yellow ellipse shows the position of a larger quantum dot which is used as a charge sensor for single-shot readout. Plunger gates P1 and P2 are used to pulse to different positions in the charge stability diagram as needed for initialization, manipulation, and readout, as well as for pulsing the detuning for controlling the two-qubit gate.

DEVICE AND QUBIT OPERATION
The device is cooled to ∼ 20 mK in a dilution refrigerator. By applying positive voltages on the accumulation gate, a two-dimensional electron gas is formed in the quantum well. Negative voltages are applied to the depletion gates in such a way that two single electrons are confined in a double-well potential [15]. A 617 mT magnetic field is applied in the plane of the quantum well. Two qubits, Q1 and Q2, are encoded in the Zeeman-split states of the two electrons.
Single-qubit rotations rely on electric dipole spin resonance (EDSR), making use of artificial spin-orbit coupling induced by the transverse magnetic field gradient from three cobalt micro-magnets fabricated on top of the gate stack [21]. The longitudinal magnetic field gradient leads to well-separated spin resonance frequencies of 18.34 GHz and 19.72 GHz for Q1 and Q2 respectively. The rotation axis in the x̂-ŷ plane is set by the phase of the on-resonance microwave drive, while rotations around the ẑ axis are implemented by changing the rotating reference frame in software [22].
We use the CPhase gate as the native two-qubit gate. An exchange interaction J(ε) is switched on by pulsing the detuning ε (electrochemical potential difference) between the two quantum dots, such that the respective electron wave functions overlap. Due to the large difference in qubit energy splittings, the flip-flop terms in the exchange Hamiltonian are ineffective and an Ising interaction remains [15,16,23,24]. The resulting time evolution operator in the standard {|00⟩, |01⟩, |10⟩, |11⟩} basis is given by
$$U_J(t) = \mathrm{diag}\left(1,\; e^{iJ(\varepsilon)t/2\hbar},\; e^{iJ(\varepsilon)t/2\hbar},\; 1\right). \qquad (1)$$
Choosing t = πħ/J(ε) and adding single-qubit ẑ rotations on both qubits, we obtain a CPhase operator
$$U_{\mathrm{CPhase}} = Z_1(-\pi/2)\, Z_2(-\pi/2)\, U_J(\pi\hbar/J) = \mathrm{diag}(1,\, 1,\, 1,\, -1), \qquad (2)$$
with Z_i(θ) a ẑ rotation of qubit i over an angle θ.
Spin initialization and single-shot readout of Q2 are realized by energy-selective tunnelling [25]. Q1 is initialized to its ground spin state by fast spin relaxation at a hotspot [26]. For read-out, the state of Q1 is mapped onto Q2 using a conditional π rotation [15,24], which enables extracting the state of Q1 by measuring Q2. Further details on the measurement setup are provided in Appendix A.
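The construction of the CPhase gate from the exchange evolution and single-qubit ẑ rotations can be checked numerically. The sketch below is only illustrative and rests on conventions that are assumptions here: the Ising evolution is taken as diag(1, e^{iφ}, e^{iφ}, 1) with φ = J(ε)t/2ħ, and Z(θ) = exp(−iθσ_z/2). With these choices, rotating both qubits by −π/2 gives diag(1, 1, 1, −1) up to a global phase. All function names are hypothetical.

```python
import cmath
import math

def ising_diagonal(phi):
    """Diagonal of the assumed exchange evolution in the {00, 01, 10, 11} basis."""
    return [1, cmath.exp(1j * phi), cmath.exp(1j * phi), 1]

def z_rotation(theta):
    """Diagonal of Z(theta) = exp(-i*theta*sigma_z/2) for one qubit (assumed convention)."""
    return [cmath.exp(-1j * theta / 2), cmath.exp(1j * theta / 2)]

def two_qubit_z(theta1, theta2):
    """Diagonal of Z_1(theta1) tensor Z_2(theta2)."""
    z1, z2 = z_rotation(theta1), z_rotation(theta2)
    return [a * b for a in z1 for b in z2]

def cphase_from_exchange():
    """Compose the Z rotations with the exchange evolution and strip the global phase."""
    u_j = ising_diagonal(math.pi / 2)  # t = pi*hbar/J corresponds to phi = pi/2
    z = two_qubit_z(-math.pi / 2, -math.pi / 2)
    combined = [a * b for a, b in zip(z, u_j)]
    global_phase = combined[0]
    return [entry / global_phase for entry in combined]

cphase_diagonal = cphase_from_exchange()
print([round(entry.real, 6) for entry in cphase_diagonal])
```

Only diagonal operators appear, so plain lists of diagonal entries are enough and no matrix library is needed.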

INDIVIDUAL AND SIMULTANEOUS RANDOMIZED BENCHMARKING
In standard randomized benchmarking, sequences of random Clifford operations are applied to a number of target qubits, followed by a final Clifford operation that, in the absence of errors, maps the qubits' state back to the initial state. Twirling one or more qubits via random Clifford operations symmetrizes the effects of noise such that the qubits are effectively subject to a depolarizing channel. The probability P that the qubits return to the initial state then decays exponentially with the number of Clifford operations m, under broad assumptions [27][28][29]. By fitting the decay curve to
$$P = A\,\alpha^{m} + B, \qquad (3)$$
where only A and B depend on the state preparation and measurement, the average fidelity of a Clifford operation can be extracted in terms of the depolarizing parameter α as
$$F_{\mathrm{avg}} = 1 - (1 - \alpha)\,\frac{d-1}{d}, \qquad (4)$$
where d = 2^N and N is the number of qubits.
FIG. 2. For the blue squares, random Clifford operations are applied to Q1 and Q2 simultaneously (C ⊗ C). For each data point, we sample 32 different random sequences, which are each repeated 100 times. Dashed lines are fits to the data with a single exponential. A constant offset of -0.06 is added to the blue curve in order to compensate for a change in read-out fidelities between the two data sets, making comparison of the two traces easier. Without SPAM errors, the data points would decay from 1 to 0.5. (c) Analogous single-qubit RB data for Q2, with Q1 idle (red circles) and subject to random Clifford operations (blue squares). A constant offset of -0.05 is added to the blue data points. Throughout, single-qubit Clifford operations are generated by the native gate set {I, X(π), Y(±π), X(±π/2), Y(±π/2)}.
In the present two-qubit system, we first perform standard RB on each individual qubit (red data points in Fig. 2), finding F_avg = 98.50 ± 0.05% for Q1 and F_avg = 97.72 ± 0.03% for Q2 (all uncertainties are standard deviations). Dividing the error rate by the average number of single-qubit gates needed for a Clifford operation, we extract average single-qubit gate fidelities of 99.20 ± 0.03% for Q1 and 98.79 ± 0.02% for Q2.
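The conversion from a Clifford fidelity to a per-gate fidelity is simple arithmetic, sketched below. Two things are assumptions: the standard relation F = 1 − (1 − α)(d − 1)/d between depolarizing parameter and average fidelity, and an average of 1.875 native gates per single-qubit Clifford. The latter value is not stated in the text; it is used here because it reproduces the quoted per-gate numbers. Function names are hypothetical.

```python
def fidelity_from_alpha(alpha, n_qubits=1):
    """Average fidelity for depolarizing parameter alpha on n_qubits qubits."""
    d = 2 ** n_qubits
    return 1 - (1 - alpha) * (d - 1) / d

def alpha_from_fidelity(fidelity, n_qubits=1):
    """Inverse of fidelity_from_alpha."""
    d = 2 ** n_qubits
    return 1 - (1 - fidelity) * d / (d - 1)

def per_gate_fidelity(clifford_fidelity, gates_per_clifford=1.875):
    """Spread the Clifford error evenly over the (assumed) average gate count."""
    return 1 - (1 - clifford_fidelity) / gates_per_clifford

q1_gate = per_gate_fidelity(0.9850)  # Clifford fidelity quoted for Q1
q2_gate = per_gate_fidelity(0.9772)  # Clifford fidelity quoted for Q2
print(f"Q1: {100 * q1_gate:.2f}%  Q2: {100 * q2_gate:.2f}%")
```

With the rounded inputs this gives 99.20% for Q1 and about 98.78% for Q2, which matches the quoted values to within rounding of the Clifford fidelities.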
In order to assess the effects of cross-talk, we next perform single-qubit RB while simultaneously applying random Clifford operations to the other qubit (Fig. 2, blue data points). Following [9], we denote the corresponding depolarizing parameter for qubit i while twirling qubit j as α_{i|j}. In contrast to standard RB, which is insensitive to SPAM errors, we have to assume here that operations on one qubit do not affect the read-out fidelity of the other qubit [9]. Comparing with individual single-qubit randomized benchmarking results, we find that simultaneous RB decreases the average Clifford fidelity for Q1 by 0.8% to 97.67 ± 0.04% while the fidelity for Q2 decreases by 3.5% to 94.26 ± 0.10%. Since the difference in qubit frequencies of 1.38 GHz is almost three orders of magnitude larger than the Rabi frequencies (∼ 2 MHz), this cross-talk is not due to limited addressability. Furthermore, the cross-talk on Q2 persists when the drive on Q1 is applied off-resonantly, hence it is an effect of the excitation and not a result of twirling Q1. Attempting to understand how the excitation leads to undesired cross-talk, we performed detailed additional studies (see [15] and Appendix F), ruling out a number of other possible sources of cross-talk, including the AC Stark effect, heating and residual coupling between the qubits. Finally, cross-talk in the experimental setup is likely to be symmetric, so the observed asymmetry indicates that the microscopic details of the quantum dots must also play a role.
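The addressability argument can be made quantitative with the textbook two-level Rabi formula, in which a drive with Rabi frequency Ω detuned by Δ flips the spin with peak probability Ω²/(Ω² + Δ²). The sketch below applies this to the quoted frequencies. It is a rough estimate under the assumption of a square pulse on an ideal two-level system, not a model of the actual experiment; the function name is hypothetical.

```python
def max_offresonant_excitation(rabi_mhz, detuning_mhz):
    """Peak spin-flip probability for a detuned drive (two-level Rabi formula)."""
    return rabi_mhz ** 2 / (rabi_mhz ** 2 + detuning_mhz ** 2)

rabi_mhz = 2.0                            # approximate Rabi frequency from the text
detuning_mhz = (19.72 - 18.34) * 1e3      # qubit frequency difference, 1.38 GHz
ratio = detuning_mhz / rabi_mhz
p_max = max_offresonant_excitation(rabi_mhz, detuning_mhz)
print(f"detuning / Rabi frequency = {ratio:.0f}")
print(f"peak off-resonant excitation = {p_max:.1e}")
```

The resulting excitation probability is of order 10^-6 per pulse, far too small to account for percent-level drops in Clifford fidelity, in line with the conclusion that limited addressability is not the cause.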

TWO-QUBIT RANDOMIZED BENCHMARKING
To characterize two-qubit gate fidelities, the Clifford group is expanded to a four-dimensional Hilbert space. We first implement standard two-qubit RB, sampling Clifford operations from the 11520 elements in the two-qubit Clifford group. Each two-qubit Clifford operation is compiled from single-qubit rotations and the two-qubit CPhase gate, requiring on average 8.25 single-qubit rotations around x̂ or ŷ and 1.5 CPhase gates. The measured probability to successfully recover the initial state is shown in Fig. 3. From a fit to the data using Eq. 3 and Eq. 4, we extract an average two-qubit Clifford fidelity of about 82%.
FIG. 3. Two-qubit Clifford Randomized Benchmarking. Probability for obtaining outcome 11 upon measurement in the σ_z ⊗ σ_z basis, starting from the initial state |11⟩, as a function of the number of two-qubit Clifford operations. As the native gate set, we use {I, X(π), Y(±π), X(±π/2), Y(±π/2), CPhase}. The elements of the two-qubit Clifford group fall into four classes of operations: the parallel single-qubit Clifford class, the CNOT-like class, the iSWAP-like class and the SWAP-like class. They are compiled from single-qubit gates plus 0, 1, 2 and 3 CPhase gates respectively. For each data point, we sample 30 random sequences, which are each repeated 100 times. The dashed line is a fit to the data with a single exponential.
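The average of 1.5 CPhase gates per two-qubit Clifford follows from the sizes of the four classes. The sketch below uses the standard counting of these classes (576, 5184, 5184 and 576 elements), which is not given in the text and is taken here as an assumption, together with the 0, 1, 2 and 3 CPhase gates per class stated above.

```python
SINGLE_QUBIT_CLIFFORDS = 24  # size of the single-qubit Clifford group

# (class name, number of elements, CPhase gates per element)
CLASSES = [
    ("parallel single-qubit", SINGLE_QUBIT_CLIFFORDS ** 2, 0),
    ("CNOT-like", SINGLE_QUBIT_CLIFFORDS ** 2 * 9, 1),
    ("iSWAP-like", SINGLE_QUBIT_CLIFFORDS ** 2 * 9, 2),
    ("SWAP-like", SINGLE_QUBIT_CLIFFORDS ** 2, 3),
]

group_size = sum(size for _, size, _ in CLASSES)
mean_cphase = sum(size * count for _, size, count in CLASSES) / group_size
print(group_size, mean_cphase)
```

The class sizes sum to the 11520 elements quoted in the text, and the weighted mean reproduces the 1.5 CPhase gates per Clifford.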
The large number of native gates needed to implement a single two-qubit Clifford gate causes the decay to saturate quickly, within about eight Clifford operations, which results in a large uncertainty on the two-qubit Clifford fidelity estimate. In addition, this fast saturation makes it difficult to assess whether gate-dependent errors are present [29][30][31]. Importantly, interleaving a specific gate in a fast decaying reference sequence also yields a rather unreliable estimate of the interleaved gate fidelity. In the present case, interleaving a CPhase gate in the reference sequence of two-qubit Clifford operations is not a viable strategy to extract the CPhase gate fidelity. Furthermore, the compilation of Clifford gates into two different types of native gates (single-qubit gates and the CPhase gate) makes it impossible to confidently extract the fidelity of any of the native gates, such as the CPhase gate, by itself. This is different from a recent experiment on silicon spin qubits where only a single physical native gate was used, the conditional rotation, in which case the error per Clifford operation can be divided by the average number of conditional rotations per Clifford operation for estimating the error per conditional rotation [17].
As a first step to obtain quantitative information on the CPhase gate fidelity, we implement a simplified version of interleaved RB, which provides the fidelities of the two-qubit gate projected in various single-qubit subspaces, as was done earlier for superconducting transmon qubits [12] and hybrid gatemon qubits [13]. In this protocol, the CPhase gate is interleaved in a reference sequence of single-qubit Clifford operations. When applying a CPhase gate, we can (arbitrarily) consider one qubit the control qubit and the other the target qubit. When the control qubit is |1⟩, the target qubit ideally undergoes a π rotation around the ẑ axis. With the control in |0⟩, the target qubit ideally remains fixed (identity operation). Therefore, projected in the subspace corresponding to the target qubit, this protocol interleaves either a Z(π) rotation or the identity operation in a single-qubit RB reference sequence applied to the target qubit. The decay of the return probability for interleaved RB is also expected to follow Eq. 3. The fidelity of the interleaved gate is then found from the depolarizing parameters α_int and α_ref of the interleaved and reference sequences, as
$$F_{\mathrm{gate}} = 1 - \left(1 - \frac{\alpha_{\mathrm{int}}}{\alpha_{\mathrm{ref}}}\right)\frac{d-1}{d}. \qquad (5)$$
From the experimental data, we find CPhase fidelities projected in single-qubit space of 91% to 95%, depending on which qubit acts as the control qubit for the CPhase, and which eigenstate it is in (see Appendix E).

CHARACTER RANDOMIZED BENCHMARKING
In order to properly characterize the two-qubit CPhase fidelity, we experimentally demonstrate a new approach to RB called character randomized benchmarking (CRB) [18]. CRB is a powerful generic method that extends randomized benchmarking in a rigorous manner, making it possible to extract average fidelities from groups beyond the multi-qubit Clifford group while keeping the advantages of standard RB such as resistance to SPAM errors. The generality of CRB allows one to start from (a subset of) the native gates of a particular device and then design an RB experiment tailored to that set. This can strongly reduce compilation overhead and gate-dependent noise, a known nuisance factor in standard RB [29][30][31]. Moreover, since the accuracy of interleaved randomized benchmarking depends on the fidelity of the reference gates [8,11], performing (through CRB) interleaved RB with a reference group generated by high-fidelity gates can significantly improve the utility of interleaved RB.
Character randomized benchmarking requires us to average over two groups (the second one usually being a subgroup of the first). The first group is the "benchmark group". It is for the gates in this group that CRB yields the average fidelity. The second group is the "character group". CRB works by performing standard randomized benchmarking using the benchmark group but augments this by adding a random gate from the character group before each RB gate sequence. By averaging over this extra random gate, but weighting the average by a special function known from representation theory as a character function, it guarantees that the average over random sequences can always be fitted to a single exponential decay, even when the benchmark group is not the multiqubit Clifford group and even in the presence of SPAM errors.
Guided by the need for high reference fidelities, we choose for our implementation of CRB the benchmark group to be the parallel single-qubit Clifford group (C ⊗ C, the same as in standard simultaneous single-qubit RB) and the two-qubit Pauli group as the character group (see [18] for more information on why this is a good choice for the character group). It is non-trivial that the C ⊗ C group allows us to get information on two-qubit gates, since parallel single-qubit Clifford operations cannot fully depolarize the noise in the full two-qubit Hilbert space. In fact, for simultaneous single-qubit RB there are three depolarizing channels, each acting in a different subspace of the Hilbert space of density matrices, spanned by I ⊗ σ_i, σ_i ⊗ I, and σ_i ⊗ σ_j, with I the identity operator and σ_i one of the Pauli operators. The three decay channels are reflected in the recovery probability for the final state, which is now described by
$$P = A_1\,\alpha_{1|2}^{m} + A_2\,\alpha_{2|1}^{m} + A_{12}\,\alpha_{12}^{m} + B, \qquad (6)$$
where α_{i|j} is again the depolarizing parameter for qubit i while simultaneously applying random Clifford operations to qubit j, and α_{12} is the depolarizing parameter for the two-qubit parity ({|00⟩, |11⟩} versus {|01⟩, |10⟩}). We note that if the errors acting on both qubits are uncorrelated, then α_{12} = α_{1|2} α_{2|1} [9]. The question now is how to separate the three decays. Fitting the data using a sum of three exponentials will be very imprecise. Existing approaches combine the decay of specific combinations of the probabilities of obtaining 00, 01, 10 and 11 upon measurement, but suffer from SPAM errors [9]. As discussed above, CRB offers a clean procedure for extracting the individual decay rates that is immune to SPAM errors and does not incur additional overhead.
Concretely, CRB here proceeds as follows: (1) the two-qubit system is initialized to |00⟩, then (2) one random Pauli operator on each qubit is applied to prepare the system in a state |φ_1 φ_2⟩ (one of |00⟩, |01⟩, |10⟩, and |11⟩), followed by (3) a random sequence of simultaneously applied single-qubit Clifford operators. In practice, the random Pauli operator is absorbed in the first Clifford operation, making the Pauli gates effectively noise-free. A final Clifford operation is applied which ideally returns the system to the state |φ_1 φ_2⟩ and finally (4) both qubits are measured. Each random sequence is repeated to collect statistics on the probability P_{φ1φ2} of obtaining measurement outcome 00 when starting from |φ_1 φ_2⟩ (note that each P_{φ1φ2} averages over 4 Pauli operations). We combine these probabilities according to their character (see Appendix B for more details) to obtain three fitting parameters, P_1 = P_00 − P_01 + P_10 − P_11, P_2 = P_00 + P_01 − P_10 − P_11, P_3 = P_00 − P_01 − P_10 + P_11.
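The effect of these signed combinations can be illustrated on synthetic data. The sketch below assumes a toy model, not taken from the experiment, in which each probability is a constant offset plus three exponentials whose signs depend on the two labels of the starting state. Under that assumption each combination cancels the offset and isolates one exponential, whose decay constant is then recovered with a log-linear fit. Which physical qubit each combination tracks depends on the labelling convention, which is deliberately left open here. All names, amplitudes and decay values are hypothetical.

```python
import math

def character_combinations(p00, p01, p10, p11):
    """Return (P1, P2, P3) using the signed sums defined in the text."""
    p1 = p00 - p01 + p10 - p11
    p2 = p00 + p01 - p10 - p11
    p3 = p00 - p01 - p10 + p11
    return p1, p2, p3

def fit_single_exponential(lengths, values):
    """Least-squares fit of log(value) = log(A) + m*log(alpha); returns (A, alpha)."""
    logs = [math.log(v) for v in values]
    n = len(lengths)
    mean_m = sum(lengths) / n
    mean_y = sum(logs) / n
    slope = (sum((m - mean_m) * (y - mean_y) for m, y in zip(lengths, logs))
             / sum((m - mean_m) ** 2 for m in lengths))
    return math.exp(mean_y - slope * mean_m), math.exp(slope)

def toy_probability(m, s_first, s_second, offset=0.25,
                    amps=(0.20, 0.15, 0.10), decays=(0.97, 0.89, 0.87)):
    """Toy model: offset plus three exponentials with label-dependent signs."""
    return (offset
            + s_first * amps[0] * decays[0] ** m
            + s_second * amps[1] * decays[1] ** m
            + s_first * s_second * amps[2] * decays[2] ** m)

# Sign attached to each label of the starting state (0 -> +1, 1 -> -1).
SIGNS = {"00": (1, 1), "01": (1, -1), "10": (-1, 1), "11": (-1, -1)}
lengths = list(range(1, 21))
combos = [character_combinations(*(toy_probability(m, *SIGNS[key])
                                   for key in ("00", "01", "10", "11")))
          for m in lengths]
fits = [fit_single_exponential(lengths, [c[i] for c in combos]) for i in range(3)]
for name, (amplitude, alpha) in zip(("P1", "P2", "P3"), fits):
    print(f"{name}: A = {amplitude:.3f}, alpha = {alpha:.4f}")
```

In this toy model P1 follows the exponential tied to the second label, P2 the one tied to the first label, and P3 the parity term, each without any constant offset.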
Each of these three fitting parameters is expected to decay as a single exponential, isolating one of the decay channels in Eq. 6:
$$P_1 = A_1\,\alpha_{1|2}^{m}, \qquad P_2 = A_2\,\alpha_{2|1}^{m}, \qquad P_3 = A_{12}\,\alpha_{12}^{m}. \qquad (7)$$
Note that there is no constant offset B. This is also a feature of CRB. The three experimentally measured probabilities are shown in Fig. 4a. These contain a lot of useful information, including not only the separate depolarizing parameters but also the averaged CRB reference fidelity and information on error correlations. The blue (red) curve shows the decay in the subspace corresponding to Q1 (Q2), spanned by σ_i ⊗ I (I ⊗ σ_i). The green curve shows the decay in the subspace spanned by σ_i ⊗ σ_j. This decay can be interpreted as the parity decay. The fitted depolarizing parameters are α_{1|2} = 0.9738 ± 0.0008, α_{2|1} = 0.8902 ± 0.0020 and α_{12} = 0.8652 ± 0.0022. The average CRB depolarizing parameter can be found from the separate depolarizing parameters as
$$\alpha = \frac{3}{15}\,\alpha_{1|2} + \frac{3}{15}\,\alpha_{2|1} + \frac{9}{15}\,\alpha_{12}, \qquad (8)$$
where the weights are proportional to the dimension of the corresponding subspaces of the 16-dimensional Hilbert space of two-qubit density matrices. We obtain a reference CRB fidelity of 91.9 ± 0.1%, which represents the fidelity of two simultaneous single-qubit Clifford operators (C ⊗ C). Finally, from the three depolarizing parameters in Eq. 6, we can infer to what extent errors occur independently on each qubit or exhibit correlations between the two qubits. The fact that α_{12} − α_{1|2} α_{2|1} = −0.0017 ± 0.0031 indicates that the errors are essentially independent.
Next we perform the interleaved version of CRB, for which we insert a CPhase gate after each single-qubit Clifford pair. Fig. 4b shows the three corresponding experimentally measured decays. The fitting parameters we extract now reflect the combined errors from a single-qubit Clifford pair followed by a CPhase gate. The fitted depolarizing parameters are α_{1|2} = 0.7522 ± 0.0060, α_{2|1} = 0.7623 ± 0.0053, and α_{12} = 0.8226 ± 0.0030. As can be expected, the three decays lie closer together than those for reference CRB: not only does the additional CPhase gate contribute directly to all three decays, it also mixes the three subspaces. From the depolarizing parameters in the interleaved and reference CRB measurements, we use Eq. 5 to isolate the fidelity of the CPhase gate, now in two-qubit space as desired, yielding 92.0 ± 0.5%.
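The quoted fidelities can be reproduced from the fitted depolarizing parameters with a few lines of arithmetic. The sketch below assumes weights of 3/15, 3/15 and 9/15 for the three subspaces (their dimensions 3, 3 and 9 out of 15), the relation F = 1 − (1 − α)(d − 1)/d with d = 4, and the interleaved form F = 1 − (1 − α_int/α_ref)(d − 1)/d. Uncertainties are not propagated and the function names are hypothetical.

```python
def average_alpha(alpha_1, alpha_2, alpha_12):
    """Dimension-weighted average of the three depolarizing parameters."""
    return (3 * alpha_1 + 3 * alpha_2 + 9 * alpha_12) / 15

def fidelity_from_alpha(alpha, d=4):
    """Average fidelity for depolarizing parameter alpha in dimension d."""
    return 1 - (1 - alpha) * (d - 1) / d

def interleaved_fidelity(alpha_int, alpha_ref, d=4):
    """Fidelity of the interleaved gate from interleaved and reference decays."""
    return 1 - (1 - alpha_int / alpha_ref) * (d - 1) / d

reference = (0.9738, 0.8902, 0.8652)    # alpha_1|2, alpha_2|1, alpha_12 (reference CRB)
interleaved = (0.7522, 0.7623, 0.8226)  # same parameters with CPhase interleaved

alpha_ref = average_alpha(*reference)
alpha_int = average_alpha(*interleaved)
f_reference = fidelity_from_alpha(alpha_ref)
f_cphase = interleaved_fidelity(alpha_int, alpha_ref)
correlation = reference[2] - reference[0] * reference[1]

print(f"{f_reference:.3f} {f_cphase:.3f} {correlation:.4f}")  # prints 0.919 0.920 -0.0017
```

The three printed numbers correspond to the reference CRB fidelity, the CPhase fidelity and the correlation measure α_12 − α_1|2 α_2|1 quoted in the text.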
The dominant errors in the CPhase gate arise from nuclear spin noise and charge noise. In natural silicon, the abundance of 29Si atoms is about 4.7%, and the 29Si nuclear spins dephase the electron spin states due to the hyperfine interaction [19]. Charge noise modulates the overlap of the two electron wave functions, and thus also the two-qubit coupling strength. In the present device, we could not access the symmetry point where the coupling strength is to first order insensitive to the detuning of the double dot potential [32,33], hence charge noise directly (to first order) affects the two-qubit coupling strength.

CONCLUSIONS
Character randomized benchmarking provides a new method to effectively characterize multi-qubit behaviour. It combines the advantages of simultaneous randomized benchmarking and interleaved randomized benchmarking, and gives tighter bounds on the fidelity number than standard interleaved randomized benchmarking due to its simpler compilation. CRB is useful in a wide variety of settings, far beyond the particular case studied here. The general approach to exploiting CRB is to start from a set of native gates that can be implemented easily and with high fidelity, and to construct a suitable reference sequence based on this set. The decay for the reference sequence contains any number of exponentials, which can be separated without suffering from SPAM errors and which provide relevant additional information, in the present case on the fidelity of simultaneously applied gates, cross-talk and on noise correlations. Comparison with interleaved CRB allows one to extract the fidelity of specific gates of interest.
We perform the first comprehensive study of the single-qubit, simultaneous single-qubit and two-qubit gate fidelities for semiconductor qubits, where the use of CRB, which allows for a compact reference sequence, was essential for extracting a reliable two-qubit gate fidelity. In summary, independent single-qubit gate fidelities are around 99% in this system; these drop to 98.8% for qubit 1 and to 96.9% for qubit 2 when simultaneously twirling the other qubit, and the two-qubit CPhase fidelity is around 92%. We expect that by working in an isotopically purified 28Si/SiGe substrate and performing the two-qubit gate at the symmetry point, a CPhase gate fidelity above the fault-tolerant threshold (> 99%) can be reached. A recent report on the fidelity of controlled rotations in Si/SiO2 quantum dots already comes close to this threshold [17]. With further improvements in charge noise levels, two-qubit gate fidelities above 99.9% are within reach.

ACKNOWLEDGMENTS
This research was sponsored by the Army Research Office (ARO) under grant numbers W911NF-17-1-0274 and W911NF-12-1-0607. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ARO or the US Government. The US Government is authorized to reproduce and distribute reprints for government purposes notwithstanding any copyright notation herein. Development and maintenance of the growth facilities used for fabricating samples is supported by DOE (DE-FG02-03ER46028). We acknowledge the use of facilities supported by NSF through the University of Wisconsin-Madison MRSEC (DMR-1121288). J.H. and S.W. are supported by an NWO VIDI Grant (SW), an ERC Starting Grant (SW), and the NWO Zwaartekracht QSC. We acknowledge useful discussions with the members of the Vandersypen group, and technical assistance by R. Schouten and R. Vermeulen.

Appendix A: MEASUREMENT SETUP
The measurement setup is the same as the one used in [15]. We summarize here a few key points. The gates P1 and P4 are connected to arbitrary waveform generators (AWG, Tektronix 5014C) via coaxial cables. Applying DC voltage pulses to these two gates moves the system through different positions in the charge stability diagram for initialization, operation and read-out. Voltage pulses applied to these two gates are used to pulse the detuning between the two quantum dots, thereby turning on and off the controlled-phase gate. Gates P2 and P3 are connected to vector microwave sources (Keysight E8267D) for achieving EDSR. Each microwave source has two I/Q input channels, connected to two channels on the master AWG, which controls the clock of the entire system and triggers all the other instruments. The frequency, phase and duration of the microwave bursts are thus controlled by I/Q modulation. In addition, we use pulse modulation to obtain a larger on/off ratio of the microwave bursts than is possible using I/Q modulation only. A digitizer card (Spectrum M4i.44) installed inside the measurement computer is used to record the current traces of the sensing quantum dot at a sampling rate ∼ 60 kHz. Each time trace is converted into a single bit value (0 or 1) by the measurement computer using threshold detection. The average over many repetitions gives us the spin-up and spin-down probabilities (0 and 1).

Appendix B: MATHEMATICAL BACKGROUND OF CRB
Character randomized benchmarking is a generic method for performing randomized benchmarking with finite groups other than the multi-qubit Clifford group. As mentioned in the main text, CRB requires the user to specify two finite groups: the benchmark group and the character group. In this work we chose the benchmark group to be the simultaneous single-qubit Clifford group on two qubits and the character group to be the two-qubit Pauli group. Standard RB and CRB rely on the framework of representation theory. Central to the use of RB and CRB is a powerful result called Schur's lemma. In the context of this paper Schur's lemma states that, assuming for simplicity that every gate is subject to an identical noise map E, the average noisy RB operator M is of the form
$$M = 1 \oplus \alpha_{1|2}\, I_{1|2} \oplus \alpha_{2|1}\, I_{2|1} \oplus \alpha_{12}\, I_{12},$$
where we are describing all quantum channels in the Pauli Transfer Matrix picture, i.e. M_{i,j} = Tr(σ_i M(σ_j))/2 where σ_i, σ_j are Pauli matrices. One can think of the matrix entry M_{i,j} as describing how much the noise map M maps the generalized Bloch sphere axis labeled σ_j to the one labeled σ_i. The submatrices I_{1|2}, I_{2|1} and I_{12} of the matrix M are defined as the identity matrix on the sets of two-qubit Paulis of the form {σ_i ⊗ I}, {I ⊗ σ_i} and {σ_i ⊗ σ_j} respectively. We would like to estimate the numbers α_{1|2}, α_{2|1} and α_{12} individually in a way that does not depend on state preparation and measurement. To do this CRB adds an extra average over another group called the character group, which we choose to be the two-qubit Pauli group. This average is weighted by a so-called character function. This average over the Pauli group projects any initial state onto a single axis of the Bloch sphere. Which axis is projected on depends on the character function used for the weights. By selecting the correct Bloch sphere axes, we can single out the individual blocks of the matrix M. In order to isolate the parameter α_{1|2} we choose to project onto the Bloch sphere axis associated to σ_z ⊗ I.
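Concretely this means that the character averaged RB operator M becomes
$$M^{m}\,\frac{1}{16}\sum_{\sigma \in P} \chi_{\sigma_z I}(\sigma)\,\sigma = M^{m} P_{\sigma_z I},$$
where the sum runs over the two-qubit Pauli group P (each σ acting as a channel in the Pauli Transfer Matrix picture), the function χ_{σzI}(σ) is given in the first row of Table I and the matrix P_{σzI} has all zero entries except on the diagonal entry corresponding to the Pauli σ_z ⊗ I. By matrix multiplication we see that M^m P_{σzI} = α_{1|2}^m P_{σzI}. This means that the average measured survival probability in CRB, with input state ρ and measurement operator Q, is of the form
$$\mathrm{Tr}\left[Q\, M^{m} P_{\sigma_z I}(\rho)\right] = A\,\alpha_{1|2}^{m},$$
where A is a function of Q and ρ. Similarly we can obtain estimates of α_{2|1} and α_{12} by constructing projectors onto the Pauli operators I ⊗ σ_z and σ_z ⊗ σ_z respectively. The character functions for these projectors are given in rows 2 and 3 of Table I respectively.
TABLE I. Character functions χ_σ(P) of the two-qubit Pauli group for σ = σ_z I (row 1), I σ_z (row 2) and σ_z σ_z (row 3). Columns P: II, σ_z I, I σ_z, σ_z σ_z, σ_x I, I σ_x, σ_x σ_x, σ_y I, I σ_y, σ_y σ_y, σ_z σ_x, σ_x σ_z, σ_z σ_y, σ_y σ_z, σ_x σ_y, σ_y σ_x. Of the table body, only the first two entries of the σ_z I row (1, 1) are recoverable.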
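The projection property of the character-weighted Pauli average can be verified by brute force. The sketch below assumes that the character function χ_τ(σ) equals +1 when the Paulis τ and σ commute and −1 when they anticommute, and uses the fact that conjugation by a Pauli σ acts diagonally in the Pauli Transfer Matrix picture with exactly these signs. The string encoding of Paulis and all function names are hypothetical.

```python
from itertools import product

# Binary symplectic form of the one-qubit Paulis: (x bit, z bit).
BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
PAULIS = ["".join(pair) for pair in product("IXYZ", repeat=2)]

def commutation_sign(p, q):
    """+1 if the two-qubit Pauli strings p and q commute, -1 if they anticommute."""
    parity = 0
    for a, b in zip(p, q):
        (xa, za), (xb, zb) = BITS[a], BITS[b]
        parity ^= (xa & zb) ^ (za & xb)
    return -1 if parity else 1

def character_average(target):
    """Diagonal of (1/16) * sum_sigma chi_target(sigma) * PTM(sigma), keyed by Pauli."""
    return {
        tau: sum(commutation_sign(target, s) * commutation_sign(tau, s)
                 for s in PAULIS) / 16
        for tau in PAULIS
    }

projector = character_average("ZI")
print({tau: value for tau, value in projector.items() if value != 0})
```

Only the entry for σ_z ⊗ I survives the weighted average, which is the projector described above; replacing "ZI" by "IZ" or "ZZ" selects the other two blocks in the same way.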
As noted in the main text, CRB is a generic procedure, which can be used beyond its application in this manuscript. Another notable example of where we suspect CRB can offer a benefit is when the device native gates are not single-qubit gates but rather two-qubit gates, as happens in [17]. In this case compiling multi-qubit Cliffords is very cumbersome. In the theoretical RB literature, benchmarking groups are discussed that are more suitable to this scenario, such as the CNOT-dihedral group (for native CNOT gates) [34] and the real Clifford group (for native CPhase gates) [35]. Both of these groups lead to benchmarking data that mixes two exponential decays, but using the CRB approach these can be fitted individually in a reliable manner (in both cases the Pauli group is a good choice for character group, see the example in [18] for more information).
Although it often goes unmentioned, the estimate for the fidelity of an interleaved gate given in Eq. 5 is only exact when the qubit noise is exactly depolarizing. In the presence of other types of noise (such as dephasing or calibration errors) this number gives only upper and lower bounds on the fidelity of the interleaved gate. First upper and lower bounds were given in [8] and recently optimal upper and lower bounds were given in [11]. These bounds depend strongly on the fidelity of the gates in the reference sequence; in particular, they tighten as α_ref, the reference RB decay constant, approaches one. This means that our implementation of CRB, which uses only single-qubit gates for the reference experiment, has a significant advantage over standard two-qubit interleaved RB also in this respect. We can illustrate this advantage by considering a hypothetical standard two-qubit interleaved experiment with an interleaved CZ gate. Recall from Eq. 3 that standard two-qubit RB (here considered as a reference experiment) yielded a reference fidelity of 82% and thus a depolarizing parameter of α_{2,ref} = 0.76 (suppressing uncertainty for the sake of this exercise). Assuming an interleaved CPhase fidelity of 92% (which is what we extracted from the CRB experiment) and assuming that the error on a hypothetical interleaved two-qubit RB experiment scales multiplicatively (optimistic given the possibility of calibration errors), we estimate that a hypothetical two-qubit interleaved RB experiment would have a depolarizing parameter of α_{2,int}. Using the optimal bounds calculated in [11] this would mean we can only guarantee that the fidelity of the interleaved gate lies in the range [0.58, 1]. From the CRB experiment we can however guarantee that the fidelity of the interleaved gate lies in the range [0.69, 1], a significant improvement even in the absolute worst case scenario discussed in [11].
We would also like to note that the bounds given in [8,11] significantly overestimate the range of possible interleaved gate fidelities if more is known about the noise process. If for instance the noise on the reference gates is assumed to be dominated by stochastic errors (as opposed to coherent errors due to mis-calibration) then the upper and lower bounds can be made significantly tighter. This coincides with experimental consensus that interleaved RB generally gives good estimates of the interleaved gate fidelity. However, since single-qubit gates will typically suffer less from calibration errors than compiled two-qubit gates, we argue that interleaved CRB will yield sharper upper and lower bounds on the interleaved gate fidelity than standard interleaved RB when more is known about the noise process.
Fig. 5 shows experimental results for the experiment discussed in the main text where a CPhase gate is interleaved in a standard single-qubit RB sequence applied to one qubit, while the other qubit is in either |0⟩ or |1⟩. This experiment provides the CPhase fidelity projected in single-qubit space [12,13], summarized in the table below for the four possible cases.
FIG. 5. Interleaved Randomized Benchmarking projected in single-qubit space. (a) Probability for obtaining outcome 0 upon measurement in the σ_z ⊗ I basis as a function of the number of single-qubit Clifford operations, interleaved with the CPhase operation. For the red circles (blue squares), Q2 is in |0⟩ (|1⟩) so Q1 is expected to undergo the identity operation (a Z(π) rotation). For each data point, we sample 30 different random sequences, which are each repeated 100 times. Dashed lines are fits to the data with a single exponential. (b) Analogous data for Q2.

Appendix F: CROSSTALK
We here provide more information on the cross-talk effects that occur on one qubit when applying a microwave drive to the other (see also [15] and the supplementary information therein). First, when we perform spectroscopy on Q2 while driving Q1, we find that the frequency of Q2 shifts by an amount on the order of 2 MHz (depending on the power applied to Q1). We compensate for this known frequency shift by shifting the drive frequency applied to Q2 when we simultaneously drive Q1. We note that a frequency shift by a known amount is not expected to contribute to decoherence. However, Fig. 6 shows Rabi oscillations for both qubits in the absence and presence of an excitation to the other qubit. Clearly, when simultaneously driving Rabi oscillations on both qubits, we find a faster decay on Q2 compared to driving Q2 by itself. The effect of simultaneous driving on Q1 is less pronounced. This is consistent with the observed effects of simultaneous driving on the measured single-qubit gate fidelities reported in the main text. The cross-talk effect on Q2 persists when the drive on Q1 is applied off-resonantly or when dot 1 is emptied. We do note that the microwave power used to drive Q1 (∼ 20 dBm) is substantially higher than that used for Q2 (∼ 8 dBm). This difference is needed to compensate for the tighter confining potential of dot 1 compared to dot 2.
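To put the two drive powers in perspective, the quoted dBm values can be converted to linear units using the definition P[mW] = 10^(P[dBm]/10). The sketch below does only this unit conversion. It assumes the quoted values refer to the same point in the signal chain for both qubits, which the text does not specify; the function name is hypothetical.

```python
def dbm_to_mw(power_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)

power_q1_mw = dbm_to_mw(20)  # approximate drive power quoted for Q1
power_q2_mw = dbm_to_mw(8)   # approximate drive power quoted for Q2
power_ratio = power_q1_mw / power_q2_mw
print(f"Q1: {power_q1_mw:.0f} mW, Q2: {power_q2_mw:.1f} mW, ratio: {power_ratio:.1f}")
```

The 12 dB difference corresponds to roughly a factor of 16 in power, or about a factor of 4 in drive amplitude.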