Experimental implementation of non-Clifford interleaved randomized benchmarking with a controlled-S gate

Hardware-efficient transpilation of quantum circuits to a quantum device's native gate set is essential for the execution of quantum algorithms on noisy quantum computers. Typical quantum devices utilize a gate set with a single two-qubit Clifford entangling gate per pair of coupled qubits; however, in some applications access to a non-Clifford two-qubit gate can result in more optimal circuit decompositions and also allows more flexibility in optimizing over noise. We demonstrate calibration of a low-error non-Clifford Controlled-$\frac{\pi}{2}$ phase (CS) gate on a cloud-based IBM Quantum computer using the Qiskit Pulse framework. To measure the gate error of the calibrated CS gate we perform non-Clifford CNOT-Dihedral interleaved randomized benchmarking. We obtain a gate error of $5.9(7) \times 10^{-3}$ at a gate length of 263 ns, which is close to the coherence limit of the associated qubits and lower than the error of the backend's standard calibrated CNOT gate.

I. INTRODUCTION
Quantum computation holds great promise for speeding up certain classes of problems; however, near-term applications are heavily restricted by the errors that occur on present-day noisy quantum devices [1]. Running a computation on a quantum processor requires first calibrating a universal gate set with low error rates (a small set of gates that can be used to implement an arbitrary quantum circuit), and then transpiling the circuit to this set of gates. This transpilation should be done in a hardware-efficient manner to reduce the overall error by minimizing the use of the highest-error gates [2]. Two of the most significant error sources on current devices are incoherent errors due to interactions with the environment, quantified by the coherence times of the device qubits, and calibration errors in the gates used to implement a quantum computation [3,4].
If a gate set could be perfectly calibrated, the coherence time of the qubits would set the fundamental limit on error rates without active error correction. Thus the goal of gate calibration is to get as close to the coherence limit as possible. Current quantum hardware typically uses a gate set consisting of arbitrary single-qubit rotations and a single entangling two-qubit gate [5]. State-of-the-art single-qubit gate error rates in these systems approach $2 \times 10^{-4}$ [6], whereas two-qubit gate errors are around $10^{-3}$ [7-9]; see also Appendix A. In superconducting qubit systems using fixed-frequency transmon qubits, a microwave-only two-qubit entangling gate may be implemented using the cross-resonance (CR) interaction [10]. The CR interaction can be used to implement a high-fidelity Controlled-NOT (CNOT) gate [11].
In some cases it may be favourable to introduce an additional two-qubit gate to a gate set if it enables more hardware-efficient compilation of relevant circuits; however, this adds the overhead of additional calibration and characterization of the gate errors. One such gate is the Controlled-S (CS) gate, a controlled-$\frac{\pi}{2}$ phase gate, which is a non-Clifford two-qubit entangling gate that is universal when combined with the Clifford group [17]. The CS gate is particularly attractive for fixed-frequency transmon qubit systems because it is locally equivalent to $\sqrt{\mathrm{CNOT}}$ and can therefore be implemented using the CR interaction. This means it can be calibrated using the same techniques as the CNOT gate, but with a shorter gate duration or lower power, potentially leading to a higher-fidelity two-qubit gate when calibrated close to the coherence limit. Furthermore, the CS gate is a member of the CNOT-Dihedral group and can be benchmarked using CNOT-Dihedral randomized benchmarking [17]. Recently an optimal decomposition algorithm for two-qubit circuits into Clifford + CS gates was developed [18]. This method minimizes the number of non-Clifford (CS) gates, which is important in the context of quantum error correction as non-Clifford gates require additional resources such as magic-state distillation to prepare fault-tolerantly [19]. However, in non-fault-tolerant near-term devices it is often preferable to minimize the total number of two-qubit gates in a decomposition rather than the number of non-Clifford gates. An optimal decomposition for gates generated by the CNOT-Dihedral group in terms of the number of CNOT and CS gates has also recently been developed [20]. Another example is the Toffoli gate, which can be decomposed into 6 CNOT gates and single-qubit gates, but requires only 5 two-qubit gates in its decomposition if the CS and $\mathrm{CS}^{-1}$ gates are also available [21].
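As a numerical check of the gate count quoted above, the following NumPy sketch builds the Toffoli gate from two CNOT gates, two CS gates and one $\mathrm{CS}^{-1}$ gate. It uses the textbook controlled-$\sqrt{X}$ construction, which is an assumption on our part; the decomposition in [21] may be arranged differently. The helper names `kron` and `controlled` are illustrative only.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])
P0 = np.diag([1, 0]).astype(complex)
P1 = np.diag([0, 1]).astype(complex)


def kron(*ops):
    """Tensor product of the given operators, leftmost factor first."""
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out


def controlled(u, control, target, n=3):
    """Controlled-u on n qubits (qubit 0 is the leftmost tensor factor)."""
    ops0 = [I2] * n
    ops1 = [I2] * n
    ops0[control] = P0
    ops1[control] = P1
    ops1[target] = u
    return kron(*ops0) + kron(*ops1)


# Hadamard on the target turns CS into controlled-sqrt(X), i.e. sqrt(CNOT).
h_c = kron(I2, I2, H)
cv_bc = h_c @ controlled(S, 1, 2) @ h_c             # uses one CS
cvdg_bc = h_c @ controlled(S.conj().T, 1, 2) @ h_c  # uses one CS^-1
cv_ac = h_c @ controlled(S, 0, 2) @ h_c             # uses one CS
cx_ab = controlled(X, 0, 1)                         # used twice

# Five two-qubit gates in total (rightmost factor acts first).
toffoli_built = cv_ac @ cx_ab @ cvdg_bc @ cx_ab @ cv_bc

toffoli_ref = np.eye(8, dtype=complex)
toffoli_ref[[6, 7]] = toffoli_ref[[7, 6]]  # swap |110> and |111>

print(np.allclose(toffoli_built, toffoli_ref))
print(np.allclose(cv_bc @ cv_bc, controlled(X, 1, 2)))
```

The second check illustrates the local equivalence mentioned in the text: conjugating CS by a Hadamard on the target gives a gate whose square is CNOT.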
In this work we calibrate CS and $\mathrm{CS}^{-1}$ gates of varying durations on an IBM Quantum system and benchmark the gate error rates by performing the first experimental demonstration of interleaved CNOT-Dihedral randomized benchmarking. For specific gate durations we are able to obtain a high-fidelity CS gate approaching the coherence limit, which due to the shorter CR interaction time results in a lower error rate than can be obtained for a CNOT gate. In addition to RB we also compute the average gate error of the CS gate using two-qubit quantum process tomography (QPT) and compare it to the values obtained from RB. Pulse-level calibration was done using Qiskit Pulse [22], and the RB and QPT experiments were implemented using the open-source Qiskit computing software stack [23] through the IBM Quantum cloud provider.

II. CNOT-DIHEDRAL RANDOMIZED BENCHMARKING
We describe the protocol for estimating the average gate error of the CS gate using interleaved CNOT-Dihedral randomized benchmarking (RB). It is a natural generalization of the CNOT-Dihedral RB procedure described in [17], combined with interleaved RB [13] to estimate the fidelity of an individual gate, here the CS gate. In the following we let $G$ denote the CNOT-Dihedral group on $n$ qubits, $g \in G$ denote a unitary element of $G$, and $I, X, Y, Z$ denote the single-qubit Pauli matrices.
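Step 1: Standard CNOT-Dihedral sequences.
Randomly sample $l$ elements $g_{j_1}, \ldots, g_{j_l}$ uniformly from $G$, and compute the $(l+1)$th element from the inverse of their composition, $g_{j_{l+1}} = (g_{j_l} \circ \cdots \circ g_{j_1})^{-1}$. Denote by $\mathbf{j}_l$ the $l$-tuple $(j_1, \ldots, j_l)$. For each sequence, we prepare an input state $\rho$, apply the composition of the $l+1$ gates that ideally would be $S_{\mathbf{j}_l} := g_{j_{l+1}} \circ g_{j_l} \circ \cdots \circ g_{j_1}$, and then measure the expectation value of an observable $E$.
Assuming each gate $g_i$ has an associated error $\Lambda_i(\rho)$, the sequence $S_{\mathbf{j}_l}$ is implemented as
$$\tilde{S}_{\mathbf{j}_l} := \bigcirc_{i=1}^{l+1} \left( \Lambda_{j_i} \circ g_{j_i} \right).$$
The expectation value of $E$ is $E_{\mathbf{j}_l} = \mathrm{Tr}[E \tilde{S}_{\mathbf{j}_l}(\rho)]$. Averaging this overlap over $K$ independent sequences of length $l$ gives an estimate of the average sequence fidelity
$$F_{\mathrm{seq}}(l, E, \rho) := \mathrm{Tr}[E \tilde{S}_l(\rho)],$$
where $\tilde{S}_l(\rho) := \frac{1}{K} \sum_{\mathbf{j}_l} \tilde{S}_{\mathbf{j}_l}(\rho)$ is the average quantum channel.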
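We decompose the input state and the final measurement operator in the Pauli basis $\mathcal{P}$ (an orthonormal basis of the space of $n$-qubit Hermitian operators, constructed from tensor products of single-qubit Pauli matrices). This gives $\rho = \sum_{P \in \mathcal{P}} x_P P / 2^n$ and $E = \sum_{P \in \mathcal{P}} e_P P$. Given that the gate errors are close to the average of all errors [17], the average sequence fidelity is
$$F_{\mathrm{seq}}(l, E, \rho) = A_Z \alpha_Z^l + A_R \alpha_R^l + e_I,$$
where $A_Z = \sum_{P \in \mathcal{Z} \setminus \{I\}} e_P x_P$ and $A_R = \sum_{P \in \mathcal{P} \setminus \mathcal{Z}} e_P x_P$, with $\mathcal{Z}$ being the set of tensor products of $Z$ and $I$.
Each of the two exponential decays $\alpha_Z^l$ and $\alpha_R^l$ can be observed by choosing appropriate input states. For example, if we choose the input state $|0 \cdots 0\rangle$ then $F_{\mathrm{seq}} = e_I + A_0 \alpha_Z^l$, where $A_0 = \sum_{P \in \mathcal{Z} \setminus \{I\}} e_P$. On the other hand, if we choose $|{+} \cdots {+}\rangle$ then $F_{\mathrm{seq}} = e_I + A_+ \alpha_R^l$, where $A_+ = \sum_{P \in \mathcal{X} \setminus \{I\}} e_P$, with $\mathcal{X}$ being the set of tensor products of $X$ and $I$.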
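Each single-exponential decay above has the form $F(l) = A \alpha^l + B$. A minimal sketch of how such a curve might be fitted is given below; it is not the fitting routine used in this work. The function name `fit_decay`, the grid-search strategy and the synthetic data (including the values 0.978, 0.75 and 0.25) are assumptions made for illustration.

```python
import numpy as np


def fit_decay(lengths, fidelities, alphas=None):
    """Least-squares fit of F(l) = A * alpha**l + B.

    For a fixed trial alpha the model is linear in (A, B), so those two
    are solved exactly and alpha itself is chosen by a grid search.
    """
    lengths = np.asarray(lengths, dtype=float)
    fidelities = np.asarray(fidelities, dtype=float)
    if alphas is None:
        alphas = np.linspace(0.5, 0.9999, 5000)
    best = None
    for alpha in alphas:
        design = np.column_stack([alpha ** lengths, np.ones_like(lengths)])
        coeffs, *_ = np.linalg.lstsq(design, fidelities, rcond=None)
        sse = float(np.sum((design @ coeffs - fidelities) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(alpha), float(coeffs[0]), float(coeffs[1]))
    _, alpha, amplitude, offset = best
    return alpha, amplitude, offset


# Synthetic, noise-free decay with hypothetical parameters.
lengths = [1, 5, 10, 20, 30, 50, 75, 100, 125, 150]
data = 0.75 * 0.978 ** np.array(lengths) + 0.25

alpha_fit, amplitude_fit, offset_fit = fit_decay(lengths, data)
print(f"alpha = {alpha_fit:.4f}, A = {amplitude_fit:.3f}, B = {offset_fit:.3f}")
```

Running the fit once on data from the $|0 \cdots 0\rangle$ input and once on data from the $|{+} \cdots {+}\rangle$ input would give estimates of $\alpha_Z$ and $\alpha_R$, respectively.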
The channel parameters $\alpha_Z$ and $\alpha_R$ can be extracted by fitting the average sequence fidelity to an exponential. From $\alpha_Z$ and $\alpha_R$, the average depolarizing channel parameter $\alpha$ for a group element $g$ is given by
$$\alpha = \frac{\alpha_Z + 2^n \alpha_R}{2^n + 1}, \tag{3}$$
and the corresponding average gate error is given by
$$r = \frac{(2^n - 1)(1 - \alpha)}{2^n}.$$
Step 2: Interleaved CNOT-Dihedral sequences.
Choose a sequence of unitary gates where the first element $g_{j_1}$ is chosen uniformly at random from $G$, the second is always chosen to be $g$, and alternate between uniformly random elements from $G$ and the fixed $g$ up to the $l$-th random gate. The $(l+1)$th element is chosen to be the inverse of the composition of the first $l$ random gates and $l$ interlaced $g$ gates, $g_{j_{l+1}} = (g \circ g_{j_l} \circ \cdots \circ g \circ g_{j_1})^{-1}$. We adopt the convention of defining the length of a sequence by the number of random gates $l$.
For each sequence, we prepare an input state $\rho$, apply
$$\nu_{\mathbf{j}_l} := g_{j_{l+1}} \circ g \circ g_{j_l} \circ \cdots \circ g \circ g_{j_1},$$
and measure an operator $E$.
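The interleaved construction can be illustrated on a toy example. The sketch below uses the single-qubit CNOT-Dihedral group generated by $X$ and $T$ (16 elements up to a global phase) instead of the two-qubit group benchmarked in this work, and interleaves the $T$ gate. The function names and the choice of toy group are assumptions made purely for illustration.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
T = np.diag([1, np.exp(1j * np.pi / 4)])


def group_element(a, k):
    """Single-qubit CNOT-Dihedral element X^a T^k (toy stand-in for G)."""
    return np.linalg.matrix_power(X, int(a)) @ np.linalg.matrix_power(T, int(k))


def interleaved_sequence(length, interleaved, rng):
    """Return `length` random gates alternated with `interleaved`,
    followed by the single gate that inverts the whole composition."""
    gates = []
    total = np.eye(2, dtype=complex)
    for _ in range(length):
        g_random = group_element(rng.integers(2), rng.integers(8))
        for gate in (g_random, interleaved):
            gates.append(gate)
            total = gate @ total  # later gates act from the left
    gates.append(total.conj().T)  # the inverting gate g_{j_{l+1}}
    return gates


rng = np.random.default_rng(7)
sequence = interleaved_sequence(5, T, rng)

# In the absence of errors the full sequence composes to the identity.
composed = np.eye(2, dtype=complex)
for gate in sequence:
    composed = gate @ composed
print(len(sequence), np.allclose(composed, np.eye(2)))
```

With the convention used in the text, this sequence has length $l = 5$ even though it contains $2l + 1 = 11$ gates.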
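Assuming that the gate $g$ has an associated error $\Lambda_g(\rho)$ and that each gate $g_i$ has an associated error $\Lambda_i(\rho)$, the sequence $\nu_{\mathbf{j}_l}$ is implemented as
$$\tilde{\nu}_{\mathbf{j}_l} := \Lambda_{j_{l+1}} \circ g_{j_{l+1}} \circ \left( \bigcirc_{i=1}^{l} \left[ \Lambda_g \circ g \circ \Lambda_{j_i} \circ g_{j_i} \right] \right).$$
The overlap with $E$ is $\mathrm{Tr}[E \tilde{\nu}_{\mathbf{j}_l}(\rho)]$. Averaging this overlap over $K$ independent sequences of length $l$ gives an estimate of the new sequence fidelity
$$F_{\overline{\mathrm{seq}}}(l, E, \rho) := \mathrm{Tr}[E \tilde{\nu}_l(\rho)],$$
where $\tilde{\nu}_l(\rho) := \frac{1}{K} \sum_{\mathbf{j}_l} \tilde{\nu}_{\mathbf{j}_l}(\rho)$ is the average quantum channel. Similarly to Step 1, we fit $F_{\overline{\mathrm{seq}}}(l, E, \rho)$ and obtain the depolarizing parameter $\alpha_{\bar{g}}$ according to Eq. (3). Using the values obtained for $\alpha$ and $\alpha_{\bar{g}}$, the gate error of $\Lambda_g$ is estimated by
$$r_g^{\mathrm{rb}} = \frac{(2^n - 1)(1 - \alpha_{\bar{g}} / \alpha)}{2^n},$$
and must lie in the range $[r_g^{\mathrm{rb}} - \epsilon, r_g^{\mathrm{rb}} + \epsilon]$, where $\epsilon$ is estimated in [13], Eq. (5). Note that one has to be careful in interpreting the results of an interleaved experiment, as in some cases $\epsilon$ might be large compared to $r_g^{\mathrm{rb}}$.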
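The relations between the fitted decay parameters and the gate errors can be written as a few small helper functions. This is a sketch of the standard CNOT-Dihedral and interleaved RB relations as we read them from [17] and [13]; the function names and the numerical inputs are hypothetical and are not data from this experiment.

```python
def depolarizing_parameter(alpha_z, alpha_r, n_qubits):
    """Combine the two decay parameters into the depolarizing parameter."""
    dim = 2 ** n_qubits
    return (alpha_z + dim * alpha_r) / (dim + 1)


def average_gate_error(alpha, n_qubits):
    """Average error per group element from the depolarizing parameter."""
    dim = 2 ** n_qubits
    return (dim - 1) * (1 - alpha) / dim


def interleaved_gate_error(alpha, alpha_interleaved, n_qubits):
    """Error of the interleaved gate from the ratio of the two decays."""
    dim = 2 ** n_qubits
    return (dim - 1) * (1 - alpha_interleaved / alpha) / dim


# Hypothetical two-qubit decay parameters.
alpha = depolarizing_parameter(alpha_z=0.98, alpha_r=0.97, n_qubits=2)
print(f"alpha = {alpha:.4f}")
print(f"r     = {average_gate_error(alpha, 2):.4f}")
print(f"r_g   = {interleaved_gate_error(0.98, 0.97, 2):.5f}")
```

Because the interleaved estimate depends on the ratio of two fitted parameters that are both close to one, small fit uncertainties translate into comparatively large relative uncertainties on the gate error.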

III. IMPLEMENTING THE CONTROLLED-S GATE
We calibrate CS gates of varying gate durations using Qiskit Pulse and measure the average gate error using the interleaved CNOT-Dihedral RB protocol described in Sec. II. We use the CR pulse sequence as a generator of two-qubit entanglement [10,24]. The CR pulse is realized by irradiating one (control) qubit with a microwave pulse at the transition frequency of another (target) qubit. The stimulus drives the quantum state of the target qubit, with the direction of rotation depending on the quantum state of the control qubit. This controlled rotation is used to create two-qubit entangling gates such as CNOT and CS.
The two-qubit system driven by the CR pulse with amplitude $A$ and phase $\phi$ can be approximated by an effective block-diagonal time-independent Hamiltonian [25,26]
$$H_{\mathrm{CR}}(A, \phi) = \sum_{P \in \{I, X, Y, Z\}} \frac{\omega_{ZP}(A, \phi)}{2} Z \otimes P + \sum_{Q \in \{X, Y, Z\}} \frac{\omega_{IQ}(A, \phi)}{2} I \otimes Q, \tag{7}$$
where the qubit ordering is control $\otimes$ target, and $\omega_{ZP}$ and $\omega_{IQ}$ represent the interaction strengths of the corresponding Pauli Hamiltonian terms. In the absence of noise, the ideal CR evolution for a constant-amplitude pulse is written as a unitary operator
$$U_{\mathrm{CR}}(A, \phi) = \exp\left( -i t_{\mathrm{CR}} H_{\mathrm{CR}}(A, \phi) \right),$$
where $t_{\mathrm{CR}}$ is the length of the CR pulse. We also define the unitary operator created by an arbitrary two-qubit generator as
$$[BC]_\theta = \exp\left( -i \frac{\theta}{2} B \otimes C \right),$$
where $B, C$ are arbitrary single-qubit operators, and we use $[BC] \equiv [BC]_\pi$. As can be seen by examining Eq. (7), the CR pulse induces three entangling interaction terms ($ZX$, $ZY$, and $ZZ$), in addition to potentially many unwanted local rotations with different amplitudes. By appropriately calibrating the phase $\phi$ of the CR drive, the $ZX$ term becomes the dominant interaction term and is the key term for executing two-qubit gates in this system. As with the standard CNOT gate, we can compose a CS gate by isolating the $ZX$ interaction with a refocusing sequence and single-qubit pre- and post-rotations. Up to a global phase,
$$\mathrm{CS} = [ZI]_{\pi/4} \, (I \otimes H) \, [IX]_{\pi/4} \, [ZX]_{-\pi/4} \, (I \otimes H), \tag{10}$$
where $H$ is the Hadamard operator. As shown in Eq. (10), we need to develop a calibration procedure to find an amplitude $A$ and a phase $\phi$ where $|\omega_{ZX}| t_{\mathrm{CR}} = \pi/4$ and the other terms become zero.
The CR Hamiltonian includes a large $ZI$ term as a result of the off-resonant driving of the control qubit; $IX$, $ZZ$ and $IZ$ can also be large for transmon qubits [25]. However, the strengths of the $ZZ$ and $IZ$ terms are expected to be negligibly weak in our device. We note that both the $ZI$ and $IX$ terms commute with the $ZX$ term of interest, while the $ZI$ and $ZX$ terms anti-commute with the inversion of the control qubit $XI$. In addition, the $ZI$ term is an even function and both the $IX$ and $ZX$ terms are odd functions of the drive amplitude $A$.
Accordingly, we can effectively eliminate the impact of those unwanted terms with the two-pulse echoed CR sequence [27] expressed as
$$U_{\mathrm{echo}}(A, \phi) = [XI] \, U_{\mathrm{CR}}(-A, \phi) \, [XI] \, U_{\mathrm{CR}}(A, \phi). \tag{11}$$
This sequence consists of two CR pulses with opposite drive amplitudes, each one followed by a $\pi$-rotation refocusing pulse $XI$ on the control qubit. Here we also assume that the impact of the $IY$ term, which is generally introduced by physical crosstalk between the control and target qubits [11], is negligible.
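The echo argument can be checked numerically with a toy model. In the sketch below the rates `w_zx`, `w_ix` and `w_zi` are made-up numbers with the stated parity in the drive amplitude, the rotation convention $[BC]_\theta = \exp(-i \frac{\theta}{2} B \otimes C)$ is assumed, and the CS decomposition used at the end is one choice of local rotations consistent with the description in the text, not necessarily the exact form used by the authors.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CS = np.diag([1, 1, 1, 1j])


def rot(b, c, theta):
    """[BC]_theta = exp(-i * theta / 2 * B (x) C) for Pauli operators B, C."""
    return np.cos(theta / 2) * np.eye(4) - 1j * np.sin(theta / 2) * np.kron(b, c)


def u_cr(amp, t_cr=1.0):
    """Toy CR evolution: ZX and IX odd, ZI even in the drive amplitude."""
    w_zx, w_ix, w_zi = 0.3 * amp, 0.2 * amp, 0.5 * amp ** 2  # made-up rates
    # ZX, IX and ZI commute, so the exponential factorizes exactly.
    return rot(Z, X, w_zx * t_cr) @ rot(I2, X, w_ix * t_cr) @ rot(Z, I2, w_zi * t_cr)


def equal_up_to_phase(u, v):
    """True if the unitaries u and v differ only by a global phase."""
    return bool(np.isclose(abs(np.trace(u.conj().T @ v)), u.shape[0]))


# Amplitude chosen so that the two echoed pulses accumulate a ZX angle of -pi/4.
amp = -(np.pi / 8) / 0.3
xi = rot(X, I2, np.pi)  # pi rotation on the control qubit
u_echo = xi @ u_cr(-amp) @ xi @ u_cr(amp)

# The ZI and IX contributions cancel; only the ZX rotation survives.
print(np.allclose(u_echo, -rot(Z, X, -np.pi / 4)))

# Local rotations turn the echoed ZX rotation into CS (up to a global phase).
i_h = np.kron(I2, H)
cs_from_echo = rot(Z, I2, np.pi / 4) @ i_h @ rot(I2, X, np.pi / 4) @ u_echo @ i_h
print(equal_up_to_phase(cs_from_echo, CS))
```

Flipping the sign of `amp` yields a $ZX$ rotation of $+\pi/4$, from which the inverse gate can be built with the opposite local rotations.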

A. Gate Calibration and Benchmarks
To experimentally implement the CS and $\mathrm{CS}^\dagger$ gates we use the 27-qubit IBM Quantum system ibmq_paris with fixed-frequency, dispersively coupled transmon qubits. Qubit 0 and qubit 1 of this system are assigned as the control and the target qubit, respectively. The resonance frequency and anharmonicity of the control (target) qubit are 5.072 (5.020) GHz and $-336.0$ ($-321.0$) MHz.
The pulses realized in practice are not constant-amplitude pulses; rather, the amplitude is increased and decreased smoothly. We implement the CR pulse as a flat-top Gaussian, with flat-top length $\tau_{\mathrm{sq}}$ and Gaussian rising and falling edges each of length $\tau_{\mathrm{edge}}$ ($\tau_{\mathrm{CR}} = \tau_{\mathrm{sq}} + 2\tau_{\mathrm{edge}}$). We use a constant Gaussian edge with $\tau_{\mathrm{edge}} = 28.16$ ns and a standard deviation of 14.08 ns, and vary the duration of the square flat-top pulse $\tau_{\mathrm{sq}}$. The minimum pulse duration is $\tau_{\mathrm{sq}} = 0$ ns, yielding a pure Gaussian shape. The overhead of single-qubit gates in the echoed CS sequence in Eq. (10) for the ibmq_paris backend is 106.7 ns, giving a total echoed CS gate time of $\tau_{\mathrm{CS}} = 2\tau_{\mathrm{CR}} + 106.7$ ns. The single-qubit gates are optimized by merging consecutive rotations using the Qiskit circuit transpiler with optimization_level = 1, followed by conversion to a pulse schedule [22].
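The timing relations above amount to simple arithmetic, sketched below. The constants are the values quoted in the text; the function names are illustrative.

```python
TAU_EDGE_NS = 28.16               # length of each Gaussian edge
SINGLE_QUBIT_OVERHEAD_NS = 106.7  # echo pulses and local rotations


def cr_pulse_length(tau_sq_ns):
    """Length of one flat-top Gaussian CR pulse."""
    return tau_sq_ns + 2 * TAU_EDGE_NS


def cs_gate_time(tau_sq_ns):
    """Total echoed CS gate time: two CR pulses plus single-qubit overhead."""
    return 2 * cr_pulse_length(tau_sq_ns) + SINGLE_QUBIT_OVERHEAD_NS


for tau_sq in (0.0, 355.6):
    print(f"tau_sq = {tau_sq:5.1f} ns -> tau_CS = {cs_gate_time(tau_sq):.1f} ns")
```

These two flat-top lengths give total gate times of 219.3 ns and 930.5 ns. Durations realized on hardware may differ slightly from this formula if pulse lengths are rounded to the sample granularity of the control electronics, which this sketch does not model.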
We performed calibration to a CR rotation angle $\omega_{ZX}(A, \phi) \tau_{\mathrm{CR}} \approx \pi/4$ for different values of $\tau_{\mathrm{sq}}$. This was done by first performing a rough calibration of $(A, \phi)$ by scanning those parameters, followed by closed-loop fine calibration with standard error amplification sequences (see Appendix B for details). The calibrated pulse schedule of the CS gate with $\tau_{\mathrm{sq}} = 21.3$ ns ($\tau_{\mathrm{CS}} = 263.1$ ns) is shown in Fig. 1(a).
The average gate error of the calibrated CS gate is evaluated using interleaved CNOT-Dihedral RB with 10 sequence lengths $l \in (1, 5, 10, 20, 30, 50, 75, 100, 125, 150)$ and 10 samples for each $l$. Each experiment is executed 1024 times for both input states $|00\rangle$ and $|{+}{+}\rangle$, both with and without interleaving the CS gate. An example of measured RB decay curves for $\tau_{\mathrm{sq}} = 21.3$ ns is shown in Fig. 1(b). The exponential fit of the decay curves yields $\alpha = 9.78(1) \times 10^{-1}$ and $\alpha_{\bar{g}} = 9.73(1) \times 10^{-1}$, giving an estimated average gate error of the CS gate of $r_g^{\mathrm{rb}} = 5.9(7) \times 10^{-3}$. According to [13], the theoretical bound of the error is calculated to be $[0, 2.8 \times 10^{-2}]$. In addition to RB we also perform quantum process tomography (QPT) [29] and compute the average gate fidelity from the reconstructed process. QPT was done using maximum likelihood estimation with the tomography module of Qiskit Ignis [30], using a preparation basis of $\{|0\rangle, |1\rangle, |{+}\rangle, |{+i}\rangle\}$ and a measurement basis of $\{X, Y, Z\}$ for each qubit. We performed 1024 repetitions (shots) per basis configuration and corrected for measurement errors using the readout error mitigation technique [31]. The average gate error calculated from the tomographic fit for $\tau_{\mathrm{sq}} = 21.3$ ns was $r_g^{\mathrm{qpt}} = 8.8 \times 10^{-3}$, which is comparable to the value estimated from the interleaved CNOT-Dihedral RB experiment.

B. Gate Duration Dependence
We perform the same calibration and benchmarking procedures for different flat-top widths $\tau_{\mathrm{sq}}$ from 0 ns to 355.6 ns ($\tau_{\mathrm{CS}}$ from 219.3 ns to 930.5 ns) and measure the average gate errors with both the interleaved CNOT-Dihedral RB experiment and QPT. In this experiment, we use a reduced set of RB sequence lengths $l \in (1, 10, 25, 50, 100, 150)$ to reduce the total number of experiments while keeping the accuracy of the estimated gate error high.
We measure the qubit coherence times with a long characteristic time [33].
Nevertheless, as Fig. 2 shows, our calibration method provides highly accurate results and allows us to approach the coherence limit for appropriately chosen gate times. It can also be seen that $r_g^{\mathrm{rb}}$ and $r_g^{\mathrm{qpt}}$ show a similar trend as a function of $\tau_{\mathrm{sq}}$. This dependence on $\tau_{\mathrm{sq}}$ agrees well with the slope predicted by the coherence limit for $\tau_{\mathrm{sq}} \geq 21.3$ ns. We emphasize that $r_g^{\mathrm{rb}}$ gives a more robust estimate of the gate error than $r_g^{\mathrm{qpt}}$, as it is not as sensitive to state preparation and measurement errors. The interleaved CNOT-Dihedral RB experiment also requires only 24 circuit executions per single error measurement, while the two-qubit QPT requires 144 circuit executions with readout error mitigation. The smaller experimental cost of measuring $r_g^{\mathrm{rb}}$ enables us to average the result over 10 different random circuits, which is empirically sufficient to obtain a reproducible outcome, at a practical queuing time with ibmq_paris.
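The circuit counts quoted above can be reproduced with a short calculation. The breakdown of the 24 RB circuits into 6 sequence lengths, 2 input states, and standard plus interleaved sequences is our reading of the experiment description; the function names are illustrative.

```python
def interleaved_rb_circuits(n_lengths, n_input_states=2):
    """Circuits per random seed: every sequence length is run for each
    input state, once as a standard and once as an interleaved sequence."""
    return n_lengths * n_input_states * 2


def qpt_circuits(n_qubits, n_preparations=4, n_measurements=3):
    """Circuits for process tomography with the given per-qubit bases."""
    return (n_preparations * n_measurements) ** n_qubits


print(interleaved_rb_circuits(n_lengths=6))  # reduced set of lengths
print(qpt_circuits(n_qubits=2))
```

With these counts, ten random seeds of the reduced RB experiment need 240 circuits, which is still less than twice the cost of a single two-qubit QPT run.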
The nearly stable offset of $r_g^{\mathrm{rb}}$ from the coherence limit possibly indicates the presence of coherent errors due to imperfect calibration. The measured $r_g^{\mathrm{qpt}}$ gave consistently larger error values than $r_g^{\mathrm{rb}}$. In the region $\tau_{\mathrm{sq}} < 21.3$ ns, both gate errors show a significant increase from the coherence limit. In this regime the drive amplitude of the CR pulse rapidly increases in order to guarantee that the total accumulated rotation angle is $\pi/4$ for shorter $\tau_{\mathrm{CR}}$. The amplitude of the crosstalk $\sqrt{\omega_{IX}^2 + \omega_{IY}^2}$ measured at $\tau_{\mathrm{sq}} = 0$ ns is 176.2 kHz, while that at $\tau_{\mathrm{sq}} = 355.6$ ns is 19.4 kHz. Although the $IX$ term is refocused and has a negligible contribution, the remaining $IY$ term can still affect the measured gate errors. Thus, at $\tau_{\mathrm{sq}} = 0$ ns we calibrate a CS gate with a compensation tone on the target qubit to suppress the physical crosstalk between the qubits (see Appendix C for details). The calibrated pulse sequences with and without the compensation tone yield $r_g^{\mathrm{rb}}$ of $2.1(3) \times 10^{-2}$ and $2.2(2) \times 10^{-2}$, respectively. These comparable results indicate that the physical crosstalk is relatively suppressed in this quantum device and that other noise sources are dominant for $\tau_{\mathrm{sq}} < 21.3$ ns. For example, at high power the perturbation theory used to obtain the average CR Hamiltonian may break down, and with it the calibration scheme based on this decomposition.
The reasons for the imperfection of two-qubit gates in superconducting qubits have been investigated and associated with various mechanisms such as non-ideal signal generation, residual $ZZ$ coupling, CR-induced $ZZ$ interaction [34-36], and leakage to higher energy levels [37,38]. Although a further analysis of the error mechanisms in this regime of high-power pulses is beyond the scope of this study, initial results indicate that coherent population transfer out of the two-qubit manifold into the higher levels, and $ZZ$ interaction terms, are not the relevant mechanisms [39]. At the same time, the coherence limit can be further lowered by reducing the time spent on single-qubit gates. At $\tau_{\mathrm{sq}} = 21.3$ ns, with the minimum $r_g^{\mathrm{rb}}$ of $5.9(7) \times 10^{-3}$, the refocusing pulses and local rotations occupy 40% of the total gate time $\tau_{\mathrm{CS}}$, yielding a non-negligible impact on the gate error.

IV. CONCLUSION
We have demonstrated calibration of a high-fidelity non-Clifford CS gate on the 27-qubit IBM Quantum system ibmq_paris. This gate is not currently included in the standard basis gates of IBM Quantum systems, and it was calibrated and benchmarked entirely using open-source software available in Qiskit. Since the CS gate is non-Clifford, robust characterization of the average gate error cannot be done using standard RB. To benchmark the performance of the non-Clifford gate we performed the first experimental demonstration of two-qubit interleaved CNOT-Dihedral RB, which allows efficient and robust characterization of a universal gate set containing the CS gate.
We obtained a minimal gate error of $5.9(7) \times 10^{-3}$ with appropriately shaped echoes and a total gate time of 263.1 ns. The gate error reported for the standard two-qubit CNOT gate provided by ibmq_paris is $1.3 \times 10^{-2}$. Thus the presented CS gate error is roughly half the CNOT error. By performing RB and QPT for a variety of gate lengths we were also able to study the performance of the CS gate in different regimes, and observed a breakdown in performance when the gate length was reduced below 263.1 ns, the length at which the best value was obtained. This is consistent with previous literature on CNOT calibration using the cross-resonance interaction in the high-power regime.
The expansion of the native two-qubit gate set of a cloud quantum device with additional low-error calibrated gates allows for improved hardware-efficient transpilation of quantum circuits. This is important for executing quantum algorithms on noisy quantum devices without error correction, and for reducing the error correction overhead when fault-tolerant devices with active error correction are available.

Appendix B: Calibrating CS Gate
The single-qubit gates used for the echo sequence and local rotations are provided by ibmq_paris. We calibrate the CR pulse amplitude $A$ and its phase $\phi$ by a rough parameter scan followed by a closed-loop calibration. These parameters are determined based on the two-pulse echoed CR sequence $U_{\mathrm{echo}}$ shown in Eq. (11). This approach simplifies the calibration: we do not need to take the non-negligible $ZI$ and $IX$ terms into account when we fit the experimental results for the calibration parameters. The calibrated sequence $U_{\mathrm{echo}} \sim [ZX]_{\pi/4}$ is used to realize the CS gate with the local rotations shown in Eq. (10).

Rough Parameter Scan
We initialize both qubits in the ground state and perform a rough scan of the CR pulse amplitude with the pulse schedule $S^{\mathrm{scan}}_A$, followed by a measurement of the target qubit in the $Z$-basis. A sinusoidal fit of the measured target-qubit population obtained with $S^{\mathrm{scan}}_A$ for different values of $A$ gives an estimate of the CR amplitude $A_0$ where the angle of controlled rotation is approximately $\pi/4$. A typical experimental result for $\tau_{\mathrm{sq}} = 21.3$ ns is shown in Fig. 4(a).
Using this $A_0$, we scan the CR phase with two pulse schedules $S^{\mathrm{scan}}_{\phi g}$ and $S^{\mathrm{scan}}_{\phi e}$. The schedule $S^{\mathrm{scan}}_{\phi g}$ ($S^{\mathrm{scan}}_{\phi e}$) drives the echo sequence $U_{\mathrm{echo}}(A_0, \phi)$ twice with the control qubit in the ground (excited) state. Note that the last two operations correspond to the projection onto the $Y$-basis for the following measurement. Flipping the state of the control qubit leads to a controlled rotation of the target qubit state in the opposite direction, as illustrated in Fig. 4(b). This opposite rotation of $\pi/2$ around an azimuthal angle $\theta = \theta_0 - \phi$ of the target qubit Bloch sphere yields a measured outcome of $\mp 1$ for $S^{\mathrm{scan}}_{\phi g}$ and $S^{\mathrm{scan}}_{\phi e}$, respectively, at the optimal phase $\phi = \phi_0$ where $\theta = 0$. Here $\theta_0$ is the phase offset from the unknown transfer function of the coaxial cable assembly [42]. The phase $\phi_0$ gives a rough estimate of the CR phase where the $ZX$ term of interest is maximized while the unwanted $ZY$ term is eliminated.

Closed-loop Fine Calibration
We use the roughly estimated parameters $(A_0, \phi_0)$ as an initial guess for the closed-loop calibrations. We first optimize the CR pulse amplitude with an error amplification schedule $S^{\mathrm{fine}}$, in which $N$ is the number of repeated sequences. This schedule prepares the target qubit in a superposition state and repeats the echo sequence $4N$ times to apply a controlled rotation of $N\pi$. Because the initial guess of $A_0$ is estimated by the parameter scan with coarse precision and a finite error $\delta_A$, repeating $S^{\mathrm{fine}}$ where $U_{\mathrm{CR}} = U_{\mathrm{CR}}(A_1, \phi_1)$. Here, the CR pulse with the same sign is repeatedly applied while changing the state of the control qubit. This pulse sequence refocuses (and hence eliminates) controlled rotation terms such as $ZX$ and $ZY$, allowing us to precisely estimate the strength of the weak local rotation terms $\sqrt{\omega_{IX}^2 + \omega_{IY}^2}$, amplified in the absence of strong two-qubit interactions.
This technique can be used to calibrate a compensation tone that eliminates the $IY$ term caused by the physical crosstalk between qubits [11]. The compensation tone is applied to the drive channel of the target qubit, d1, in parallel with $U_{\mathrm{CR}}$. This single-qubit pulse is shaped as a flat-top pulse with Gaussian edges of the same duration as the $U_{\mathrm{CR}}$ pulse, with its own calibrated amplitude and phase $(A', \phi')$. First, we repeat $S^{\mathrm{xy4}}$ for $N = 0, 2, 4, \ldots, 32$ without the compensation tone and measure the Pauli $Z$ expectation value of the target qubit. The fit of the oscillation over the total CR gate time $8\tau_{\mathrm{CR}} N$ yields the strength of the total unwanted local rotation terms. At $\tau_{\mathrm{sq}} = 0$ ns, an unwanted local rotation strength of 176.2 kHz was observed. This strength was reduced to 6.7 kHz with the calibrated compensation tone with $A' = 0.00102$ and $\phi' = -0.962$ rad. The experimental result is shown in Fig. 4(d).