Quantum Instruction Set Design for Performance

A quantum instruction set is where quantum hardware and software meet. We develop new characterization and compilation techniques for non-Clifford gates to accurately evaluate different quantum instruction set designs. We specifically apply them to our fluxonium processor that supports the mainstream instruction $\mathrm{iSWAP}$ by calibrating and characterizing its square root $\mathrm{SQiSW}$. We measure a gate fidelity of up to $99.72\%$ with an average of $99.31\%$ and realize Haar random two-qubit gates using $\mathrm{SQiSW}$ with an average fidelity of $96.38\%$. This is an average error reduction of $41\%$ for the former and $50\%$ for the latter compared to using $\mathrm{iSWAP}$ on the same processor. This shows that designing the quantum instruction set to consist of $\mathrm{SQiSW}$ and single-qubit gates on such platforms leads to a performance boost at almost no cost.

A quantum instruction set is the interface that quantum hardware vendors provide to software. This differs subtly from a native gate set, although the two terms are often used interchangeably. A native gate set contains the quantum operations that are physically realizable on a quantum computing platform. In contrast, a quantum instruction set is a carefully chosen set of gates, selected with consideration for the cost or performance of the processor. Although one can always add more quantum gates to an instruction set to implement algorithms more efficiently, there are additional considerations. First of all, the added instruction must be of sufficient accuracy. In addition, since we need to calibrate each instruction, it is preferable to limit the number of gates [1]. This is reminiscent of the debate between RISC (Reduced Instruction Set Computer) and CISC (Complex Instruction Set Computer) in classical computing [2]. Now, any quantum instruction set capable of universal quantum computing contains at least one entangling two-qubit gate. Given that errors of single-qubit gates are usually much smaller than those of two-qubit gates, the design of a quantum instruction set is mainly about the selection of two-qubit gates. However, evaluating quantum instruction sets with different two-qubit gates poses the following challenges.
1. The set of native two-qubit gates and how accurately each gate can be realized are highly dependent on the hardware implementation. Therefore, it is difficult to make a direct comparison between quantum instruction sets at a software level.
2. It is a non-trivial task to faithfully characterize a general quantum gate. In particular, the conventional approach of using Clifford-based randomized benchmarking (RB) [3] cannot be used to benchmark non-Clifford gates.
3. The compilation of quantum algorithms into quantum instructions has only been extensively studied for CNOT [4][5][6]. Although there are studies of compiling into general gates [1,7], these approaches use heuristics and so cannot give reliable data for designing an instruction set.
With these challenges in mind, in this letter we re-evaluate the design of the mainstream quantum instruction set for superconducting circuits, which is based on CNOT, iSWAP, or gates that differ from either only by single-qubit gates [8]. Given all the focus on maximizing the fidelities of these two specific types of gates [9,10], it is worthwhile to critically analyze these particular choices of quantum instruction sets. In particular, we consider the instruction set with iSWAP and single-qubit gates. We numerically compute and experimentally demonstrate that, for a quantum processor with such an instruction set, replacing iSWAP with its matrix square root SQiSW can both reduce the gate error and improve compilation capabilities substantially. This new design can therefore significantly improve the quantum instruction set.
arXiv:2105.06074v3 [quant-ph] 28 Jun 2022

More specifically, the square root of the iSWAP gate, abbreviated as SQiSW, whose matrix form can be written as

$$\mathrm{SQiSW} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & \frac{i}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

is in the iSWAP family and has been proposed and experimentally realized in earlier works [11][12][13][14][15][16][17][18]. The most common technique to realize iSWAP is to bring two transversely coupled qubits into resonance, and SQiSW is realized by halving the gate time for iSWAP. Thus, SQiSW is expected to be less susceptible to errors from decoherence and stray interactions. Despite hints of promise, few works have systematically studied SQiSW from a holistic perspective, which we aim to accomplish in this work. In particular, we establish its control accuracy and compilation efficiency as follows.

• Control accuracy: Taking into account decoherence, stray coupling between qubits, and instrumental limitations on experimental control parameters, we conduct a numerical simulation with experimentally achievable circuit parameters and decoherence times and estimate that we can realize SQiSW with an error of about $5 \times 10^{-4}$, which is only half of the error of iSWAP on the same system.
• Compilation efficiency: We prove that an arbitrary two-qubit gate can be compiled using at most three uses of the SQiSW gate interleaved with arbitrary single-qubit gates, and give an analytical scheme to compile a two-qubit gate with the optimal number of SQiSW gates. Moreover, we prove that 79% of two-qubit gates (under the Haar measure) can be generated with only two uses of the SQiSW gate. In comparison, only a zero-measure set of two-qubit gates can be generated using two uses of common two-qubit gates such as CPHASE family gates, SWAP family gates or the iSWAP gate. As a result, using SQiSW to compile two-qubit Haar random or random Clifford gates leads to error reductions of about 63% or 35%, respectively, compared to using iSWAP, assuming that the gate error of SQiSW is half of that of iSWAP and that single-qubit gate errors are negligible.
Furthermore, we experimentally demonstrate these advantages of SQiSW on a superconducting quantum processor consisting of two capacitively coupled fluxonium qubits. The calibration of SQiSW can be performed by following an efficient procedure for general excitation number conserving gates in [11]. The detailed calibration on our specific platform can be found in the Supplementary Material. In order to characterize SQiSW, a non-Clifford gate, we develop a framework for benchmarking a general quantum gate, which we call interleaved fully randomized benchmarking (iFRB). Fully randomized benchmarking (FRB) is similar to RB, but instead of using Clifford gates we go back to the original proposal of using Haar random gates [3]. iFRB is the corresponding interleaved variant and does not require the target gate to lie in the Clifford group or exhibit any group structure (another example is the CNOT-dihedral group [19]). It is applicable whenever an efficient compilation scheme for an arbitrary gate is available, which we establish through our explicit compilation scheme for SQiSW. We experimentally benchmark SQiSW using iFRB and compare the results to iSWAP in the same experimental run. We observe a SQiSW fidelity of up to 99.72% with an average of 99.31%. The latter is a 41% error reduction compared to the corresponding iSWAP fidelity. We also realize Haar random two-qubit gates using SQiSW and measure an average fidelity of 96.38%, the error reduction being 50% compared to that of iSWAP. This additional reduction demonstrates the superior compilation capabilities of SQiSW.
The SQiSW gate scheme.-Two-qubit gates in the iSWAP family can be implemented by turning on a transverse coupling between qubits for a given time $t$ [9]. In the weak coupling limit, where the coupling energy is much smaller than the qubit energy, we can neglect the transition between $|00\rangle$ and $|11\rangle$ by performing the rotating wave approximation (RWA), so that the interaction Hamiltonian takes the well-known form of the XY spin interaction: $g(\sigma_x \sigma_x + \sigma_y \sigma_y)$, where $g$ is the XY coupling strength between the two qubits. During the interaction, the population exchange of the two-qubit system can be described by the unitary

$$U(t) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -i\sin\theta & 0 \\ 0 & -i\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

where $\theta \equiv tg/4$. When $g > 0$, we can implement $\mathrm{SQiSW}^\dagger$ by $U(t = \pi/g)$ and $\mathrm{iSWAP}^\dagger$ by $U(t = 2\pi/g)$. When $g < 0$, we instead implement $U(t = -\pi/g) = \mathrm{SQiSW}$. Note that SQiSW is equivalent to $\mathrm{SQiSW}^\dagger$ under local unitaries: we have $\mathrm{SQiSW}^\dagger = (Z \otimes I)\,\mathrm{SQiSW}\,(Z \otimes I)$. The properties of these two gates are essentially identical other than minor sign differences in compilation. In the subsequent sections we will consider the SQiSW gate even though $\mathrm{SQiSW}^\dagger$ is more common in physical implementations.
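The relations in this paragraph can be checked numerically. The following is a minimal sketch, assuming the population-exchange unitary acts as a rotation by $\theta = tg/4$ on the $\{|01\rangle, |10\rangle\}$ subspace with off-diagonal entries $-i\sin\theta$ (the form consistent with $U(t = \pi/g) = \mathrm{SQiSW}^\dagger$ for $g > 0$); the function name `xy_unitary` and the variable names are our own illustrative choices.

```python
import numpy as np

def xy_unitary(theta):
    """Population-exchange unitary on two qubits, with theta = t*g/4 (assumed form)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, -1j * s, 0],
                     [0, -1j * s, c, 0],
                     [0, 0, 0, 1]], dtype=complex)

r = 1 / np.sqrt(2)
SQISW = np.array([[1, 0, 0, 0],
                  [0, r, 1j * r, 0],
                  [0, 1j * r, r, 0],
                  [0, 0, 0, 1]], dtype=complex)
ISWAP = np.array([[1, 0, 0, 0],
                  [0, 0, 1j, 0],
                  [0, 1j, 0, 0],
                  [0, 0, 0, 1]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
ZI = np.kron(Z, np.eye(2))  # Z on the first qubit, identity on the second

# Two applications of SQiSW compose to iSWAP.
sq_squared_is_iswap = np.allclose(SQISW @ SQISW, ISWAP)
# g > 0: theta = pi/4 gives SQiSW^dagger, theta = pi/2 gives iSWAP^dagger.
quarter_is_sqisw_dag = np.allclose(xy_unitary(np.pi / 4), SQISW.conj().T)
half_is_iswap_dag = np.allclose(xy_unitary(np.pi / 2), ISWAP.conj().T)
# g < 0: theta = -pi/4 gives SQiSW itself.
neg_quarter_is_sqisw = np.allclose(xy_unitary(-np.pi / 4), SQISW)
# SQiSW and its adjoint differ only by a local Z on the first qubit.
local_equiv = np.allclose(ZI @ SQISW @ ZI, SQISW.conj().T)
```

All five flags evaluate to `True` under these assumptions, which is the sense in which SQiSW and $\mathrm{SQiSW}^\dagger$ are interchangeable up to single-qubit gates.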
In realistic systems, this gate scheme is complicated by possible stray longitudinal coupling (ZZ interaction), inaccurate flux tuning (to bring qubits into resonance) and decoherence [20][21][22]. Since SQiSW takes roughly half of the gate time of iSWAP, the corresponding contributions from the major error sources are approximately halved as well. A more detailed analysis of the error contributions, together with numerical simulations, can be found in Section III of the Supplemental Material. Considering the recent progress on fluxonium fabrication [23] and the precision of the arbitrary waveform generator, our simulation indicates that we can expect an infidelity as small as $5 \times 10^{-4}$ for the SQiSW gate.
Since $\mathrm{SQiSW}^2 = \mathrm{iSWAP}$, replacing iSWAP in a quantum instruction set with SQiSW in principle results in negligible fidelity loss, possibly nonzero due to imprecise control on real devices. Assuming this, we next show that replacing iSWAP with SQiSW yields a strict performance gain by proving that SQiSW is superior at certain compilation tasks.
Compilation capabilities of SQiSW.-We study how effectively SQiSW can compile quantum circuits, in particular arbitrary two-qubit gates, assuming single-qubit gates are free resources. We compare this with the capabilities of other widely used two-qubit gates. We give a more detailed analysis of the ability of SQiSW to compile multiqubit quantum circuits, such as W state preparation, in Section II of the Supplemental Material.
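To minimize the two-qubit gate count when compiling circuits, there is a convenient tool classifying the equivalence classes of two-qubit gates modulo single-qubit gates:

Theorem 1 (KAK decomposition [24]). For an arbitrary $U \in SU(4)$, there exists a unique $\eta = (x, y, z)$ with $\pi/4 \ge x \ge y \ge |z|$, single-qubit rotations $A_1, A_2, B_1, B_2 \in SU(2)$ and a global phase $\lambda \in \{1, i\}$ such that

$$U = \lambda\,(A_1 \otimes A_2)\,\exp\{i\,\eta \cdot \Sigma\}\,(B_1 \otimes B_2),$$

where $\Sigma \equiv [\sigma_x \otimes \sigma_x,\ \sigma_y \otimes \sigma_y,\ \sigma_z \otimes \sigma_z]$.

The equivalence class of a two-qubit unitary $U$ modulo local, or single-qubit, unitaries is characterized by the interaction coefficients $\eta(U)$, which live in a 3-dimensional tetrahedron called the Weyl chamber [25], $W \equiv \{(x, y, z) : \pi/4 \ge x \ge y \ge |z|\}$. We say that two unitaries $U, V \in SU(4)$ are locally equivalent, or $U \sim V$, if $\eta(U) = \eta(V)$, that is, they can be transformed into each other with only single-qubit gates.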
We prove that other than trivial compilations of gates that are locally equivalent to I or SQiSW, a two-qubit gate can be realized with 2 applications of SQiSW if and only if its interaction coefficients (x, y, z) satisfy x − y ≥ |z|.All other two-qubit gates can be realized with 3 applications of SQiSW.
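The criterion above translates directly into a gate-count rule. The sketch below is illustrative only: the function name `sqisw_gate_count` and the tolerance `tol` are our own choices, and it assumes the interaction coefficients are already given in the Weyl chamber convention $\pi/4 \ge x \ge y \ge |z|$, in which SQiSW sits at $(\pi/8, \pi/8, 0)$.

```python
import math

def sqisw_gate_count(x, y, z, tol=1e-9):
    """Minimal number of SQiSW gates for interaction coefficients (x, y, z).

    Assumes pi/4 >= x >= y >= |z| (Weyl chamber convention, an assumption here).
    """
    if not (math.pi / 4 + tol >= x >= y - tol and y + tol >= abs(z)):
        raise ValueError("coefficients are not in the assumed Weyl chamber")
    if max(abs(x), abs(y), abs(z)) <= tol:
        return 0  # locally equivalent to the identity
    if abs(x - math.pi / 8) <= tol and abs(y - math.pi / 8) <= tol and abs(z) <= tol:
        return 1  # locally equivalent to SQiSW itself
    if x - y >= abs(z) - tol:
        return 2  # inside the two-SQiSW region x - y >= |z|
    return 3      # everything else needs a third SQiSW

q = math.pi / 4
counts = {
    "identity": sqisw_gate_count(0, 0, 0),
    "SQiSW": sqisw_gate_count(q / 2, q / 2, 0),
    "CNOT": sqisw_gate_count(q, 0, 0),
    "iSWAP": sqisw_gate_count(q, q, 0),
    "B": sqisw_gate_count(q, q / 2, 0),
    "SWAP": sqisw_gate_count(q, q, q),
}
```

With these inputs, CNOT, iSWAP and the B gate fall in the two-gate region, while SWAP, which violates $x - y \ge |z|$, needs three SQiSW gates.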
An explicit compilation scheme for all two-qubit gates minimizing the SQiSW gate count, the corresponding proofs, and specific compilation schemes for special families of two-qubit gates can be found in Section I of the Supplemental Material. To complete the compilation scheme, arbitrary single-qubit gates can be implemented in hardware using, for example, the PMW-3 scheme in [26].
We also compare the ability of SQiSW to compile two-qubit gates with that of other two-qubit gates. The average two-qubit gate counts required to generate Haar random two-qubit gates, $L_H$, and random Clifford gates, $L_C$, together with other properties, are summarized in Table I. It can be seen that, although it is only half of the iSWAP gate, the SQiSW gate is actually much more efficient in compiling Haar random two-qubit gates than the iSWAP gate, while not being much worse in terms of compiling random Clifford gates. If we assume that SQiSW has half the error of iSWAP, by using SQiSW we expect a 63% reduction in the average infidelity for Haar random gates and a 35% reduction for random Clifford gates. Note that other two-qubit gates have been proposed in the literature, such as the B gate [27] and the $\mathrm{iSWAP}^{3/4}$ gate [16]. However, the former does not have a high-fidelity experimental gate scheme, while the latter does not have an explicit compilation scheme. SQiSW is the first gate to our knowledge that outperforms iSWAP and CNOT in compilation abilities while still admitting a high-fidelity gate scheme and an explicit compilation scheme.
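The 63% figure for Haar random gates can be reproduced with a short calculation. The sketch below rests on the assumptions stated in the text plus two simplifications of our own: the infidelity of a compiled gate is taken to be the two-qubit gate count times the per-gate error (single-qubit errors neglected), and the two-SQiSW fraction is rounded to 0.79. Variable names are illustrative.

```python
# Fraction of Haar random two-qubit gates reachable with two SQiSW gates (from the text).
p_two = 0.79

# Average SQiSW count: two gates with probability p_two, three gates otherwise.
l_h_sqisw = 2 * p_two + 3 * (1 - p_two)

# iSWAP needs three uses for all but a zero-measure set of two-qubit gates.
l_h_iswap = 3.0

# Assumption from the text: the SQiSW gate error is half of the iSWAP gate error.
error_ratio = 0.5

# Relative reduction of the average Haar random gate infidelity.
reduction = 1 - (l_h_sqisw * error_ratio) / l_h_iswap
```

Here `l_h_sqisw` evaluates to 2.21 and `reduction` to roughly 0.63, matching the quoted estimate. The analogous Clifford estimate requires the $L_C$ values of Table I.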
Experiment.-We experimentally realize SQiSW and iSWAP on a fluxonium quantum processor. Fluxonium is an attractive candidate for the next generation of superconducting qubits because of its large anharmonicity and long coherence times. The anharmonicity is an order of magnitude larger than that of the transmon, which allows for performing low-leakage gates without the need for complex pulse shaping. With proper calibration, the fidelities of quantum gates are mainly limited by decoherence. Therefore, fluxonium is particularly suitable for demonstrating the advantages of SQiSW.
The experimental setup and quantum processor used in this experiment are the same as those of [29], but this experiment is performed in a different cool-down cycle. For individual single-qubit gates, the two qubits $Q_A$ and $Q_B$ are fixed at their sweet spots, corresponding to 1.09 GHz and 1.33 GHz, respectively, and a resonant microwave pulse generated by an arbitrary waveform generator (AWG) is applied to each qubit through its own charge line.
The two fluxonium qubits in our processor are capacitively coupled, with a transverse coupling strength $g$. To realize an iSWAP-like gate, we modulate the external flux of $Q_A$ to bring it into resonance with $Q_B$. The flux modulation pulse has a simple error-function shape and is also generated by an AWG and applied to $Q_A$ through its individual flux line. We use the same method as in Ref. [29] to calibrate the flux amplitude $\Phi_{\mathrm{amp}}$ that brings the two qubits into resonance. With a coupling strength of $g/2\pi \approx 11.2$ MHz, the gate times of iSWAP and SQiSW are 48.5 ns and 24.6 ns, respectively. The detailed calibration of the SQiSW gate is summarized in Section IV of the Supplementary Material.
We apply a variant of interleaved randomized benchmarking (iRB) [30] to characterize the fidelity of SQiSW and compare it with iSWAP. The iRB protocol characterizes the fidelity of a particular target gate by measuring the difference between the fidelity of a random Clifford gate and that of a random Clifford gate followed by the target gate, both through the standard RB protocol [31]. The iRB protocol requires that the target gate be chosen from the Clifford group, as the reference gate set (the random gates) is drawn entirely from the Clifford group; it is therefore not directly applicable to SQiSW. Instead, in our experiments we choose the random gates from the Haar random distribution on all two-qubit rotations $SU(4)$. We call randomized benchmarking based on the whole (special) unitary group fully randomized benchmarking (FRB), and the corresponding interleaved version interleaved fully randomized benchmarking (iFRB). The reason we apply FRB and iFRB in our experiments is two-fold: first, benchmarking arbitrary gates requires sampling the gates from an entire special unitary group; second, Clifford-group RB cannot directly benchmark non-Clifford gates such as SQiSW. FRB and iFRB are applicable for characterizing SQiSW because we have an explicit compilation scheme and the required gates are all realized with high fidelities.
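To make the FRB ingredients concrete, the following sketch samples Haar random elements of $SU(4)$ and builds a short benchmarking sequence that ideally returns to the identity. It is not the analysis code used for the experiment: the QR-based sampler is a standard construction we substitute for illustration, `interleaved_gate_fidelity` uses the standard interleaved RB estimate $1 - \frac{d-1}{d}(1 - p_{\mathrm{int}}/p_{\mathrm{ref}})$ as an assumed stand-in for the iFRB analysis, and the decay parameters below are hypothetical numbers, not measured values.

```python
import numpy as np

def haar_su4(rng):
    """Sample a Haar random element of SU(4) via QR of a complex Gaussian matrix."""
    g = (rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))) / np.sqrt(2)
    q, r = np.linalg.qr(g)
    d = np.diagonal(r)
    q = q * (d / np.abs(d))               # fix column phases so q is Haar on U(4)
    return q / np.linalg.det(q) ** 0.25   # strip the global phase so that det = 1

def interleaved_gate_fidelity(p_ref, p_int, dim=4):
    """Standard interleaved-RB estimate of the target gate fidelity (assumed here)."""
    return 1 - (dim - 1) / dim * (1 - p_int / p_ref)

rng = np.random.default_rng(seed=7)
gates = [haar_su4(rng) for _ in range(5)]

total = np.eye(4, dtype=complex)
for gate in gates:
    total = gate @ total          # later gates multiply from the left
recovery = total.conj().T         # final gate that inverts the whole sequence
returns_to_identity = np.allclose(recovery @ total, np.eye(4))

# Hypothetical reference and interleaved decay parameters, for illustration only.
example_fidelity = interleaved_gate_fidelity(p_ref=0.95, p_int=0.94)
```

In an interleaved sequence the target gate (SQiSW or iSWAP) would be inserted after each random gate before computing the recovery gate; on hardware, every sampled unitary must additionally be compiled into SQiSW or iSWAP plus single-qubit gates.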
In order to compile arbitrary two-qubit gates, we first need to compile arbitrary single-qubit gates. To do this, we employ the results in [26] to compile an arbitrary single-qubit gate using 3 phase-shifted microwave (PMW) pulses. In particular, we use the PMW-3 scheme. In Figure 1(a), we show the results of simultaneous single-qubit FRB using the PMW-3 scheme. An arbitrary single-qubit gate is compiled with 3 primary Pauli rotations with rotation angles $\pi/2$ or $\pi$. The extracted average fidelity of the Pauli rotations of $Q_A$ ($Q_B$) is 99.91% (99.96%). We also include data for simultaneous single-qubit RB in Section IV of the Supplementary Material.
The Clifford RB provides a standard baseline for single-qubit gate fidelities, and we see that the results are comparable to the high-fidelity implementations on superconducting qubits [9]. We also see that FRB can achieve comparable fidelities of more than 99.9% per Pauli rotation. This in particular demonstrates that the PMW-3 scheme can achieve state-of-the-art fidelities. Each microwave pulse is 10 ns long and is calibrated using known techniques developed, for example, in [32][33][34] and detailed in [29].
Theoretically, iFRB and iRB should be measuring the same quantity [35]. We run iFRB for SQiSW and iSWAP and compare the results in Figure 2. We show the sequence fidelity exponential decays for the highest measured SQiSW fidelity in Figure 1(b). We also give average values and the standard deviation for each metric in Table II. We include in the table our measurements of iRB for iSWAP, which, as expected, we find very similar to the iFRB [36]. We also find that the average iFRB infidelity of SQiSW is reduced by 41% compared to iSWAP, while the FRB infidelity is reduced by 50%. This is similar to what is expected from the theoretical predictions (50% and 63% reduction, respectively). In particular, the additional reduction for FRB is testament to SQiSW's superior compilation abilities. More details on our benchmarking schemes are provided in Section V of the Supplementary Material. In Section IV of the Supplementary Material we also discuss a fluctuation we observed in the coherence times of our qubits, which we can possibly attribute to undesired coupling with a drifting two-level system.

FIG. 2. Cumulative histogram of two-qubit FRB and iFRB fidelities of iSWAP and SQiSW from 14 experiments.$^{a}$

$^{a}$ Although we measure a SQiSW iFRB of 99.80% in one of the runs, a closer look into the data reveals a poor fitting due to highly fluctuating data. We include this data point here for the sake of completeness, but do not report the corresponding fidelity value as our highest.

TABLE II. FRB is the fidelity of arbitrary two-qubit gates, while iFRB and iRB are the fidelities of the target two-qubit gate using the FRB and RB schemes, respectively. We also give the corresponding infidelity to make the error reduction more transparent.
Conclusions.-In this letter we re-evaluate the design of the quantum instruction set for superconducting qubits or any quantum computing platform that offers iSWAP as a native gate. By taking only roughly half of the time of iSWAP, the SQiSW gate is expected to be implemented with much higher fidelity. Moreover, it has superior compilation capabilities compared to iSWAP in the task of compiling arbitrary two-qubit gates. An iFRB experiment, which can benchmark non-Clifford gates, on our capacitively coupled fluxonium quantum processor shows that the gate error is reduced by 41% and the Haar random two-qubit gate error is reduced by 50% compared to iSWAP on the same chip.
Our work indicates that while two-qubit gates such as CNOT and iSWAP are usually chosen to be the two-qubit gates in quantum instruction sets, it is actually beneficial to investigate other gates that are natively supported on the platform. On our fluxonium processor, choosing SQiSW improves our system performance substantially, with the average error rate of a Haar random two-qubit gate decreased by 50%. A series of works [1,15,37] have studied how one can make full use of alternative two-qubit gates for compiling quantum circuits. We therefore advocate for a systematic study of hardware native gates as possible candidates to include as quantum instructions.
Additionally, the FRB and iFRB schemes are general schemes for benchmarking arbitrary gates, assuming that a Haar random unitary can be implemented efficiently. Although such schemes face the same scaling issues as ordinary Clifford-based RB, they are still useful for characterizing hardware native gates, which are usually either single-qubit or two-qubit. Our work is the first to demonstrate the applicability of the iFRB scheme as a general solution for benchmarking a non-Clifford gate. A recent work [35] established a framework for analyzing matrix exponential decay behaviors of RB schemes that involve a continuous gate set under gate-dependent noise. We leave the theoretical interpretation of the iFRB decay exponent in terms of the target gate fidelity for future work.
We would like to thank Casey Duckering, Eric Peterson and Jun Zhang for insightful discussions. We thank all those Alibaba Quantum Laboratory members who contributed to the development of the quantum hardware. DD would like to thank God for all of His provisions. LK is supported by NSF grant CCF-1452616.

In this section, we first show some basic mathematical properties of the SQiSW gate and then study the information processing capabilities of the SQiSW gate, in particular its ability to compile two-qubit and higher operations. Specifically, we prove that an arbitrary two-qubit gate can be decomposed into at most three applications of the SQiSW gate interleaved by single-qubit rotations, and give explicit decompositions for certain families of two-qubit rotations. The CNOT gate and the iSWAP gate also generate all two-qubit gates with three applications; however, we prove that a majority (∼ 79%) of two-qubit gates, under the Haar measure, can be generated using only two uses of the SQiSW gate, whereas the gates generated by two uses of the CNOT gate or the iSWAP gate only span zero-measure sets. We lastly prove that SQiSW has an advantage over the CNOT and iSWAP gates in the task of preparing W-like states.

We summarize some useful mathematical properties of the SQiSW gate. Besides being the square root of the iSWAP gate, SQiSW satisfies the following properties:
• SQiSW lies in the third level of the Clifford hierarchy: just like the T gate and the Controlled-S gate, the SQiSW gate conjugates Pauli matrices to Clifford matrices. Also, it is not in the second level of the Clifford hierarchy, meaning that it is not itself a Clifford gate.
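• SQiSW is a perfect entangler, that is, it maps a product state into a maximally entangled state. Explicitly, $\mathrm{SQiSW}\,|01\rangle = (|01\rangle + i|10\rangle)/\sqrt{2}$.
• SQiSW is an excitation number preserving gate, meaning that for all $\theta$, $[\mathrm{SQiSW}, Z^{\theta} \otimes Z^{\theta}] = 0$.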
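The last two properties are easy to confirm numerically. The sketch below is a minimal check under our own conventions: `rz` is a standard single-qubit Z rotation used as a stand-in for the Z-type operator in the commutator, and the test angle 0.37 is arbitrary.

```python
import numpy as np

r = 1 / np.sqrt(2)
SQISW = np.array([[1, 0, 0, 0],
                  [0, r, 1j * r, 0],
                  [0, 1j * r, r, 0],
                  [0, 0, 0, 1]], dtype=complex)

def rz(theta):
    """Single-qubit rotation about the Z axis."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

# Excitation number preservation: SQiSW commutes with the same Z rotation on both qubits.
zz = np.kron(rz(0.37), rz(0.37))
commutes = np.allclose(SQISW @ zz, zz @ SQISW)

# Perfect entangler: the product state |01> is mapped to (|01> + i|10>)/sqrt(2).
ket01 = np.array([0, 1, 0, 0], dtype=complex)
out = SQISW @ ket01
psi = out.reshape(2, 2)               # psi[a, b] is the amplitude of |ab>
rho_first = psi @ psi.conj().T        # reduced density matrix of the first qubit
maximally_entangled = np.allclose(rho_first, np.eye(2) / 2)
```

A maximally mixed reduced state on one qubit is equivalent to the two-qubit pure state being maximally entangled, which is what `maximally_entangled` tests.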
To explore further properties, we first introduce some mathematics.

KAK decomposition and the Weyl chamber
The KAK decomposition and the Weyl chamber provide mathematical tools to characterize two-qubit gates up to single-qubit gates. That is, they give the "non-local" information of a two-qubit gate. In particular, the KAK decomposition characterizes equivalence classes of two-qubit unitaries, or elements in the group $SU(4)$, under actions by single-qubit rotations in $SU(2) \otimes SU(2)$ before and after. This perspective is particularly useful when we can experimentally regard single-qubit local operations as free resources that introduce little error compared to two-qubit gates. We here directly state the results and refer the reader to [1,2] for more detailed expositions.

Define the magic basis change matrix
The KAK decomposition theorem can be equivalently stated in the magic basis, in which the single-qubit rotations before and after correspond to matrices $A, B \in SO(4)$. The equivalence class of a unitary $U$ under local unitaries is characterized by the interaction coefficients $\eta(U)$, which live in a 3-dimensional tetrahedron called the Weyl chamber [37]. We say that two unitaries $U, V \in SU(4)$ are locally equivalent, or $U \sim V$, if $\eta(U) = \eta(V)$. We give the interaction coefficients for some common gates:
• I: $\eta(I) = (0, 0, 0)$;
• CNOT, CZ: $\eta(\mathrm{CNOT}) = \eta(\mathrm{CZ}) = (\pi/4, 0, 0)$. Note that CNOT ∼ CZ by a local Hadamard conjugation on the target qubit;
• iSWAP: $\eta(\mathrm{iSWAP}) = (\pi/4, \pi/4, 0)$;
• SQiSW: $\eta(\mathrm{SQiSW}) = (\pi/8, \pi/8, 0)$.
These gates and their positions in the Weyl chamber are given in Fig. 1.

FIG. 1: An illustration of the Weyl chamber and the positions of common gates. Note that SQiSW lies at the midpoint between the identity and iSWAP. The point SWAP† is to be identified with the point SWAP but is drawn separately for easier visualization.
Definition 1. Let $L(x, y, z) \equiv \exp\left(i\,[x, y, z] \cdot \Sigma\right)$ be the canonical element of the equivalence class.
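The canonical element can be constructed explicitly. The sketch below assumes $\Sigma = [X \otimes X, Y \otimes Y, Z \otimes Z]$ (the usual choice, stated here as an assumption) and uses the fact that these three operators commute and square to the identity, so that each factor is $\cos(\cdot)\,I + i \sin(\cdot)\,P \otimes P$. The function name `canonical_gate` is ours.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

def canonical_gate(x, y, z):
    """L(x, y, z) = exp(i (x XX + y YY + z ZZ)), built from commuting factors."""
    identity = np.eye(4, dtype=complex)
    out = identity
    for angle, pauli in ((x, X), (y, Y), (z, Z)):
        pp = np.kron(pauli, pauli)
        # exp(i * angle * P(x)P) = cos(angle) I + i sin(angle) P(x)P since (P(x)P)^2 = I
        out = out @ (np.cos(angle) * identity + 1j * np.sin(angle) * pp)
    return out

r = 1 / np.sqrt(2)
SQISW = np.array([[1, 0, 0, 0],
                  [0, r, 1j * r, 0],
                  [0, 1j * r, r, 0],
                  [0, 0, 0, 1]], dtype=complex)
ISWAP = np.array([[1, 0, 0, 0],
                  [0, 0, 1j, 0],
                  [0, 1j, 0, 0],
                  [0, 0, 0, 1]], dtype=complex)

sqisw_is_canonical = np.allclose(canonical_gate(np.pi / 8, np.pi / 8, 0), SQISW)
iswap_is_canonical = np.allclose(canonical_gate(np.pi / 4, np.pi / 4, 0), ISWAP)
```

Under this convention SQiSW and iSWAP coincide exactly with their canonical elements, with no single-qubit corrections needed, which is why SQiSW sits at the midpoint between the identity and iSWAP in the Weyl chamber.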

Local invariants and the character polynomial
The KAK decomposition geometrically characterizes the equivalence class of a unitary $U \in SU(4)$; however, it requires diagonalization of matrices and thus can sometimes be difficult to study analytically. Local invariants [4] characterize the equivalence class of the unitary $U$ while still being easy to solve. There are many different choices of local invariants. We choose ours to be a degree-4 polynomial defined in terms of the (element-wise) real and imaginary parts, $\Re[\cdot]$ and $\Im[\cdot]$, of a matrix. We call this the character polynomial. The polynomial is locally invariant, and it is a complete characterization of the equivalence classes, as the zeros of the polynomial are $-\cot(x - y + z)$, $-\cot(x + y - z)$, $-\cot(-x - y - z)$ and $-\cot(-x + y + z)$, which can be seen by evaluating it on the canonical element. Hence, one only needs to check the corresponding character polynomial coefficients in order to determine whether two unitaries are locally equivalent. Furthermore, since $U, U^* \in SU(4)$, the coefficients are constrained, leaving the character polynomial with three free coefficients. For $U$ with interaction coefficients $(x, y, z)$, these coefficients can be written explicitly in terms of $x$, $y$ and $z$ (eq. (3)).

Effective target size
An interesting way to quantify how easy it is to realize a two-qubit gate with quantum control is its effective target size, as put forth in [5]. Intuitively, the effective target size is the invariant volume of the region around a two-qubit gate that corresponds to a small perturbation of its interaction coefficients. We show in this section that the effective target size of SQiSW is larger than that of CNOT and iSWAP, having a target size that scales with the perturbation better than any other common two-qubit gate, save for the B gate.
Let $U \in SU(4)$ have interaction coefficients $\eta(U) = (x, y, z) \in W$. Furthermore, let $\mathcal{U}$ be the neighborhood of $\eta(U)$ given by a box with edge length $a$ centered on $\eta(U)$ and with sides parallel to the $x, y, z$ axes. Then, the effective target size of $U$ is defined as the volume of $\mathcal{U}$ with respect to the normalized Haar measure over $W$ induced by the Haar measure $d\mu$ over $SU(4)$. We note that the effective target size is the same for mirror gates (gates that differ by a SWAP), since its definition is symmetric under exchanging which qubits we deem as the first and the second. Now, the Weyl coordinates of SQiSW are $(\pi/8, \pi/8, 0)$. Its mirror gate has coordinates (e.g. see [6]) locally equivalent to $(\pi/4, \pi/8, \pi/8)$; the equivalences used are subtracting $\pi/2$ from the first coordinate and flipping the signs of the first and third coordinates [2]. Hence, the effective target size of SQiSW is the same as that of its mirror gate $(\pi/4, \pi/8, \pi/8)$, which is given [38] in (47) of [5]. This is larger than that of CNOT and iSWAP [5]. Note that the effective target sizes of CNOT and iSWAP are the same since they are mirror gates up to local equivalence.

B. Compiling two-qubit gates into SQiSW and single-qubit rotations
We first study the problem of compiling arbitrary two-qubit gates.In particular, we prove the following theorem.
Theorem 2. Every two-qubit unitary can be expressed by at most 3 SQiSW gates interleaved by single qubit gates.
The proof of the theorem consists of two steps. First, we completely characterize the set $W(S^2)$ of all two-qubit gates that can be generated using only 2 uses of the SQiSW gate. Second, we show how to decompose a gate outside of $W(S^2)$ into one use of SQiSW and one use of a gate in $W(S^2)$. This completes the proof. We end by providing an explicit decomposition algorithm.

Weyl chamber region spanned by two SQiSW gates
We now study the region in the Weyl chamber that can be generated by two SQiSW gates interleaved with single-qubit rotations. This will later help us establish compilation schemes of arbitrary two-qubit gates using SQiSW and single-qubit rotations.
Lemma 1. The Weyl chamber region spanned by two SQiSW gates is the region described by the inequalities $\pi/4 \ge x \ge y \ge |z|$ and $x - y \ge |z|$.

Proof. Denote the subset of the Weyl chamber spanned by two SQiSW gates by $W(S^2)$ and let $W' \equiv \{(x, y, z) \in W : x - y \ge |z|\}$. The proof proceeds in two steps: we first prove that $W' \subset W(S^2)$ by giving an analytical solution to the interleaving single-qubit rotations for a general element in $W'$, and then prove that $W(S^2) \subset W'$ by investigating the character polynomial coefficients associated to a general element in $W(S^2)$.
$W' \subset W(S^2)$: We prove this statement constructively by giving explicit analytical forms for the interleaving single-qubit rotations. For $(x, y, z) \in W'$, we consider a gate $U(\alpha, \beta, \gamma)$ in $W(S^2)$ whose parameters $\alpha, \beta, \gamma$ are given analytically in terms of $x, y, z$. Here we define $\mathrm{sgn}(z) = 1$ if $z \ge 0$ and $-1$ otherwise. Note that $C$ in terms of $x, y, z$ was given in eq. (3). One can verify that all operations, including the square root and the inverse cosine functions, are legal when $(x, y, z) \in W'$, and one can also verify that the interaction coefficients associated to $U(\alpha, \beta, \gamma)$ are indeed $(x, y, z)$ by comparing the coefficients in the character polynomial.
$W(S^2) \subset W'$: Up to local equivalence, a general element in $W(S^2)$ can be parameterized by six parameters, $U(\alpha, \beta, \gamma_1, \gamma_2, \delta_1, \delta_2)$, where $S$ is shorthand for SQiSW. Combining the corresponding coefficient $C$ in the character polynomial (eq. (8)) with eq. (3) gives a constraint on $(x, y, z) = \eta(U(\alpha, \beta, \gamma_1, \gamma_2, \delta_1, \delta_2))$. Since $(x, y, z) \in W$ ensures that $\sin(x + y - z), \sin(x + y + z) \ge 0$, combining this constraint with the ones from $W$ gives $(x, y, z) \in W'$. In Fig. 2 we show the region $W' \subset W$. We also show a schematic of how an element of $W(S^2)$ can be decomposed in Fig. 3.

Decomposing arbitrary two-qubit gate into ≤ 3 SQiSW gates
We now consider unitaries whose interaction coefficients lie outside of the region $W'$. Those gates include the SWAP family $(x, x, \pm x)$, the Sycamore fSim gates and so on. We show below that a third SQiSW gate is sufficient to span the whole Weyl chamber.
Given that all gates in the Weyl chamber region W can be generated using 2 SQiSW gates by Lemma 1, it suffices to prove the following.
Proof. Before proceeding to the proof, we first visualize the constraints imposed by the regions $W$ and $W'$ in terms of the eigenphases $\{a_0, a_1, a_2, a_3\}$ of $L(x, y, z)$, where

$$a_0 = x + y - z, \quad a_1 = x - y + z, \quad a_2 = -x + y + z, \quad a_3 = -x - y - z.$$

The constraint that $(x, y, z) \in W$ can be equivalently stated as $a_0 \ge a_1 \ge a_2 \ge a_3$ together with $a_0 + a_1 \le \pi/2$. It can be deduced that $a_0 \ge 0 \ge a_3$. $(x, y, z) \in W'$ imposes an additional constraint: $a_1 \ge 0 \ge a_2$. Assuming that $z = \frac{1}{2}(a_1 + a_2) \ge 0$ (the other case can be reduced to this by observing that $\mathrm{SQiSW} \sim \mathrm{SQiSW}^\dagger$ and $L(x, y, z) \sim L^\dagger(x, y, -z)$), $(x, y, z) \in W \setminus W'$ indicates that this additional constraint is violated via the sign violation $a_2 > 0$. We show that the following is true: we can always select $a_i, a_j$, $i \ne j$, and append to them phases $\pi/4$ and $-\pi/4$ such that the resulting eigenphases correspond to a point in $W'$. This indicates that there is a way of decomposing $L(x, y, z)$ into one SQiSW gate and $L(x', y', z')$ associated to the modified eigenphases, with $(x', y', z') \in W'$. Explicitly, we argue via the following two cases. We also give a visual argument in Fig. 4.
Here "sort" means that the set is sorted in descending order. Note that when an eigenphase crossing happens, i.e. $a_2 - \frac{\pi}{4} < a_3$ in case 1 and $a_2 - \frac{\pi}{4} < a_3 + \frac{\pi}{4}$ in case 2, additional single-qubit gates need to be applied to switch the two-qubit unitary to the canonical form for compilation purposes; see Algorithm 1.
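The eigenphase picture used in this proof can be made concrete with a few lines of code. The sketch below assumes the eigenphases of $L(x, y, z)$ are $x + y - z$, $x - y + z$, $-x + y + z$ and $-x - y - z$ (the eigenvalues of $x\,XX + y\,YY + z\,ZZ$ on the Bell basis), which are already in descending order when $\pi/4 \ge x \ge y \ge |z|$. The function names are illustrative.

```python
import math

def eigenphases(x, y, z):
    """Eigenphases (a0, a1, a2, a3) of L(x, y, z), descending inside the Weyl chamber."""
    return (x + y - z, x - y + z, -x + y + z, -x - y - z)

def needs_third_sqisw(x, y, z, tol=1e-12):
    """True when a2 > 0 (sign violation), i.e. the gate is outside the two-SQiSW region."""
    z = abs(z)  # the case z < 0 reduces to z >= 0, as noted in the proof
    a0, a1, a2, a3 = eigenphases(x, y, z)
    return a2 > tol

q = math.pi / 4
sqisw_phases = eigenphases(q / 2, q / 2, 0)     # (pi/4, 0, 0, -pi/4)
swap_violates = needs_third_sqisw(q, q, q)      # SWAP: a2 = pi/4 > 0
cnot_violates = needs_third_sqisw(q, 0, 0)      # CNOT: a2 = -pi/4
iswap_violates = needs_third_sqisw(q, q, 0)     # iSWAP: a2 = 0
```

For $z \ge 0$ the condition $a_2 > 0$ is the same as $x - y < z$, so this test agrees with the inequality $x - y \ge |z|$ of Lemma 1. The eigenphases of SQiSW itself, $(\pi/4, 0, 0, -\pi/4)$, show why one extra SQiSW shifts two eigenphases by $\pm\pi/4$.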

Decomposition algorithms for two-qubit gates into SQiSW gates
The full decomposition algorithm for an arbitrary two-qubit gate into sequences of single-qubit rotations and the SQiSW gate is summarized in Algorithm 1 and visualized in Fig. 5. We also list compilation schemes of some common two-qubit gates or gate families into SQiSW below and summarize the results in Fig. 6.
In this section and throughout the rest of the paper, we use $||$ to denote the concatenation of two quantum gates. For example, $A || B$ represents a composite quantum gate where $A$ is applied before $B$, resulting in an overall operation of $B \cdot A$.

FIG. 4: Illustration of the eigenphases $a_0 \ge a_1 \ge a_2 \ge a_3$. Being in $W$ requires that $x \le \frac{\pi}{4}$, and $a_0, a_1$ and $a_2, a_3$ lie symmetrically with respect to $x$ and $-x$ respectively. Fig. 4a corresponds to no sign violation of $a_2$ and hence can be generated using 2 SQiSW gates. Fig. 4b and Fig. 4c are possible value assignments corresponding to the two eigenphase modifications of case 1 and case 2 in the proof, respectively.

FIG. 5 (caption, partially recovered): […] < 0, the < case indicated in blue. The proof only deals with the case $z > 0$, where the case $z < 0$ follows similarly with appropriate inversions. (c) Whether there is eigenphase crossing, i.e. whether the order of $a_0, \ldots, a_3$ is preserved after the phase modification. The purple region shows when it is not, and a corresponding correction needs to be made in order to transform the gate to its canonical form. (d) The corresponding modifications are the green $R_z$-conjugations around the SQiSW gate, the violet $R_x$-conjugations around the $L(x', y', z')$ in red, and the cyan $Z$ gates on the first qubit.

Algorithm 1 (final lines only; the rest of the listing was lost in extraction): if s < 0 then […]; return $(x', y', z')$, A1, A2, B1, B2, C1, C2; end procedure.
Before proceeding, note that $[R_z(\alpha) \otimes R_z(\alpha), \mathrm{SQiSW}] = 0$ for all $\alpha$. This introduces a gauge freedom in the compilation of the circuit and enables us to choose the single-qubit gates with the simplest form in our compilation.
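The commutation relation behind this gauge freedom can be checked numerically. The sketch below assumes the usual matrix convention for SQiSW in the basis |00⟩, |01⟩, |10⟩, |11⟩ and $R_z(\alpha) = e^{-i\alpha Z/2}$; neither convention is spelled out in this section, so treat both as assumptions. For contrast, it also evaluates the commutator with $R_z(\alpha) \otimes R_z(-\alpha)$, which does not vanish.

```python
import numpy as np

# Assumed convention: SQiSW in the basis |00>, |01>, |10>, |11>.
s = 1 / np.sqrt(2)
SQISW = np.array([[1, 0, 0, 0],
                  [0, s, 1j * s, 0],
                  [0, 1j * s, s, 0],
                  [0, 0, 0, 1]], dtype=complex)
ISWAP = np.array([[1, 0, 0, 0],
                  [0, 0, 1j, 0],
                  [0, 1j, 0, 0],
                  [0, 0, 0, 1]], dtype=complex)

def rz(alpha):
    """Single-qubit Z rotation exp(-i * alpha * Z / 2) (assumed convention)."""
    return np.diag([np.exp(-1j * alpha / 2), np.exp(1j * alpha / 2)])

def commutator_norm(a, b):
    """Frobenius norm of the commutator [a, b]."""
    return np.linalg.norm(a @ b - b @ a)

alphas = np.linspace(0, 2 * np.pi, 25)
# Same rotation on both qubits: commutes with SQiSW for every alpha.
max_symmetric = max(commutator_norm(np.kron(rz(a), rz(a)), SQISW) for a in alphas)
# Opposite rotations: generally does not commute (evaluated at alpha = 1).
antisymmetric = commutator_norm(np.kron(rz(1.0), rz(-1.0)), SQISW)

print(f"max ||[Rz(a) x Rz(a), SQiSW]||  = {max_symmetric:.2e}")
print(f"||[Rz(1) x Rz(-1), SQiSW]||     = {antisymmetric:.3f}")
```

The symmetric rotation acts as a multiple of the identity on the {|01⟩, |10⟩} block where SQiSW is non-trivial, which is why it can be moved freely through the gate.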
Therefore, one can check by applying the gauge freedom that where In the case of special orthogonal gates, the single qubit corrections can be solved analytically.Let Then Specific examples of special orthogonal gates include: • Therefore, In this case we can also explicitly solve for the single qubit corrections.Let )).
Decomposition for the other half of the improper orthogonal gates can be obtained by first decomposing it into one SQiSW and a gate in W (S 2 ), then decomposing the gate in W (S 2 ) by observing that it is an improper orthogonal gate.

Comparison with other two-qubit gates
It can be observed from the visualization in Fig. 2 that W takes up 1/2 of the entire Weyl chamber, similar to the set of all perfect entanglers [1]. However, the measure in the Weyl chamber does not reflect the Haar measure of the unitary group SU(4) [5]. Indeed, the probability that a Haar random element in SU(4) can be decomposed with two SQiSW gates can be calculated as […]. It is well known that the B gate with interaction coefficients $(\pi/4, \pi/8, 0)$ spans the whole Weyl chamber with only two uses [3]. It is also well known that many two-qubit gates, including the CNOT gate, the iSWAP gate and the other gates in the super controlling gate family, generate the whole Weyl chamber with three uses [8]. We find that SQiSW lies in between: although two uses of it cannot generate the whole Weyl chamber, they generate a subset of nonzero measure. Although this holds for general two-qubit gates [9], we show that two uses of standard gates such as CNOT, iSWAP or the SWAP family actually generate a subset of the Weyl chamber with zero measure, even though three uses of either gate span the whole Weyl chamber.
FIG. 6: Summary of compilations of common two-qubit gates into SQiSW gates.
Proposition 1. For any two gates $U_1, U_2 \in SU(4)$ in the CPHASE gate family, $\eta(U_1) = (x_1, 0, 0)$, $\eta(U_2) = (x_2, 0, 0)$, and any two single-qubit gates $A_1, A_2 \in SU(2)$, define the gate $V = U_2 (A_1 \otimes A_2) U_1$. Then the last element in $\eta(V)$ is always zero. Equivalently, V must lie in the I-CNOT-iSWAP plane in the Weyl chamber.
Proof. Let $\eta(V) = (x', y', z')$. It can be checked from the characteristic polynomial $F_V(t)$ that the corresponding polynomial coefficient $B = \sin 2x' \sin 2y' \sin 2z' = 0$, regardless of how $U_1, U_2, A_1, A_2$ are chosen. This indicates that $z' = 0$ given $\pi/4 \ge x' \ge y' \ge |z'|$.
Two CPHASE family gates only generate a two-dimensional submanifold because they are U(1)-covariant, or "leaky" [9,10]; we have $[R_z(\theta_1) \otimes R_z(\theta_2), \mathrm{CPHASE}(\phi)] = 0$ for all $\theta_1$, $\theta_2$ and $\phi$. By commuting the Z-rotations, the interleaving single-qubit gates, which can each be decomposed into a Z − X − Z sequence of rotations, can only generate a two-dimensional manifold in the Weyl chamber, as illustrated in Fig. 7.
By making use of the properties of mirror gates, we can extend this result to gates on the iSWAP − SWAP line as well.The results are visualized in Fig. 8.

FIG. 7: Decomposing each single-qubit gate as a Z − X − Z sequence of rotations, $R_z(\alpha_1) R_x(\beta_1) R_z(\gamma_1)$, and commuting the Z-rotations with the CPHASE gates, it can be seen that two CPHASE gates only generate a two-dimensional submanifold of the Weyl chamber.
Corollary 1. For any two gates […] and any two single-qubit gates $A_1, A_2 \in SU(2)$, define the gate […]. Then the last element in $\eta(V)$ is always zero.
Corollary 2. For any two gates $U_1, U_2 \in SU(4)$ such that […] and any two single-qubit gates $A_1, A_2 \in SU(2)$, define the gate […]. Then the first element in $\eta(V)$ is always $\pi/4$, i.e. the gate V always lies inside the CNOT − iSWAP − SWAP plane of the Weyl chamber.
We can also prove a result for the case when we have two gates in the SWAP family.This result is visualized in Fig. 9.

Proposition 2. For any two gates $U_1, U_2 \in SU(4)$ in the SWAP family and any two single-qubit gates $A_1, A_2 \in SU(2)$, define the gate […]. Then $(x', y', z') \equiv \eta(V)$ must satisfy $y' = x'$ or $y' = |z'|$. Equivalently, V must either lie in the I − CNOT − SWAP plane, or the I − SWAP† − SWAP plane in the Weyl chamber.
Proof. Let $\eta(V) = (x', y', z')$. It can be checked that the characteristic polynomial $F_V(t)$ has a zero with multiplicity two: […] regardless of how $U_1, U_2, A_1, A_2$ are chosen. This indicates that there must be at least one equality in the inequalities $x' + y' - z' \ge x' - y' + z' \ge -x' + y' + z' \ge -x' - y' - z'$, or equivalently, one equality in $x' \ge y' \ge |z'|$.

II. COMPILATION: CIRCUITS WITH MORE THAN TWO QUBITS
To further demonstrate the information processing superiority of SQiSW, we prove a linear separation between the number of gates needed to generate an n-qubit W-like state from the product state |0 ⊗n using the CNOT gate and the SQiSW gate.Considering the corresponding family of circuits with the SQiSW gates, our proof extends to a linear separation between the gate counts in the task of compiling this family of circuits using the SQiSW gate and the CNOT gate.(Note that iSWAP is equivalent to CNOT for this purpose, since they are mirror gates up to local equivalence.)Throughout we consider single qubit rotations as free resources.

A. W state and W-like states
The n-qubit W state is defined as $|W_n\rangle = \frac{1}{\sqrt{n}}\left(|10\cdots 0\rangle + |01\cdots 0\rangle + \cdots + |00\cdots 1\rangle\right)$. An interesting property of the W state is that it is robust against the disposal of qubits [11]: even after tracing out any subset of n − 2 qubits, the marginal state of the remaining two qubits is still an entangled state. In contrast, the most common multipartite generalization of maximally entangled states, the n-qubit GHZ state $|\mathrm{GHZ}_n\rangle = (|0\rangle^{\otimes n} + |1\rangle^{\otimes n})/\sqrt{2}$, does not satisfy this condition when n ≥ 3. This special property of the W state can be abstracted as follows:
Definition 2. An n-partite state $|\Psi\rangle$ is W-like if the marginal state on any two subsystems is an entangled bipartite state.
It is easy to see that n-qubit states of the form $\sum_{i=1}^{n} \alpha_i |0 \cdots 0 1_i 0 \cdots 0\rangle$ with all $\alpha_i \neq 0$ (9) are special cases of W-like states. Now, in order to generate any n-qubit W-like state from $|0\rangle^{\otimes n}$, it takes at least n − 1 two-qubit gates, since the generating circuit as a graph needs to be connected. Surprisingly, n − 1 SQiSW gates are also sufficient to generate a particular W-like state of the form in eq. (9): it can be verified that […] $|0\rangle^{\otimes n}$ is a state of the above form where $\alpha_i = 2^{-i/2}$ for i ≤ n − 1, and $\alpha_n = \alpha_{n-1}$.
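The amplitude pattern quoted above can be reproduced with a small state-vector simulation. The generating circuit did not survive extraction, so the sketch assumes one natural candidate: flip the first qubit, then apply SQiSW to neighbouring pairs (1,2), (2,3), …, (n−1,n). The helper names and the SQiSW matrix convention are likewise assumptions. With this circuit the moduli $|\alpha_i|$ follow $2^{-i/2}$; the amplitudes also pick up powers of $i$, which single-qubit Z rotations can remove.

```python
import numpy as np

s = 1 / np.sqrt(2)
SQISW = np.array([[1, 0, 0, 0],
                  [0, s, 1j * s, 0],
                  [0, 1j * s, s, 0],
                  [0, 0, 0, 1]], dtype=complex)

def apply_adjacent(state, gate, q, n):
    """Apply a two-qubit gate to qubits (q, q+1) of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    g = gate.reshape(2, 2, 2, 2)            # indices: out_q, out_q+1, in_q, in_q+1
    psi = np.tensordot(g, psi, axes=([2, 3], [q, q + 1]))
    psi = np.moveaxis(psi, [0, 1], [q, q + 1])
    return psi.reshape(-1)

def sqisw_chain_state(n):
    """Assumed circuit: start from |10...0>, then SQiSW on (0,1), (1,2), ..., (n-2,n-1)."""
    state = np.zeros(2 ** n, dtype=complex)
    state[2 ** (n - 1)] = 1.0               # qubit 0 is the leftmost bit
    for q in range(n - 1):
        state = apply_adjacent(state, SQISW, q, n)
    return state

def single_excitation_amplitudes(state, n):
    """Amplitude of each basis state that has a single 1, on qubit i."""
    return np.array([state[2 ** (n - 1 - i)] for i in range(n)])

n = 5
alphas = single_excitation_amplitudes(sqisw_chain_state(n), n)
print(np.round(np.abs(alphas), 4))
```

Every single-excitation amplitude is nonzero and the squared moduli sum to one, so the output state has the form of eq. (9) using exactly n − 1 two-qubit gates.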
In contrast, we show that two-qubit gates that are equivalent to a diagonal gate up to local unitaries, such as CNOT ∼ CZ, are ill-suited for generating any W-like state. We have the following result.
Theorem 3. An n-qubit W-like state cannot be generated using single-qubit gates and fewer than (15n − 3)/14 CNOT gates.
This is evidence that SQiSW has better information processing capabilities beyond just compiling two-qubit gates, but it is unclear how general the statement can be made.
In the rest of the section, we will provide a proof to Theorem 3. In Section II B we will provide a graph theoretical perspective towards quantum circuits.We will first establish the limitations on the entangling power of CNOT gate in Lemma 3, which leads to nontrivial constraints on circuits that generate W-like states using CNOT gates and single-qubit gates.To be more precise, we will show in Lemma 4 that the graph given by a circuit generates a W-like state only if it is a "good graph" defined in Definition 4. Then in Section II C we will prove a lower bound for the number of edges in a good graph, which ultimately leads to Theorem 3.

B. Conversion to graph problem
The key observation is that entanglement generated by a lone diagonal two-qubit gate is inherently non-robust against the disposal of qubits: Lemma 3. Consider a tripartite system ABC where AB is not entangled with C. If we apply a diagonal two-qubit gate between B and C and then immediately trace out B, then A will still not be entangled with C (i.e. the quantum state on system AC is a separable state).
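Proof. Note that any diagonal two-qubit gate U can be written in the block diagonal form $U = |0\rangle\langle 0| \otimes U_0 + |1\rangle\langle 1| \otimes U_1$, where the projectors act on the first qubit and $U_0, U_1$ are diagonal single-qubit unitaries on the second. First we suppose that the initial state of the system ABC is a product state $\rho_{AB} \otimes \rho_C$. Then after applying U on BC and tracing out B, the state of AC becomes $\sum_{b \in \{0,1\}} \langle b|_B \rho_{AB} |b\rangle_B \otimes U_b \rho_C U_b^\dagger$, which is a mixture of two product states between A and C and thus is not entangled. If the initial state of the system ABC is a mixture of multiple product states between AB and C, then the final state will become a mixture of such mixtures, so A and C are still not entangled.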
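Lemma 3 can be illustrated on a three-qubit example. The sketch below prepares a Bell pair on AB and a product state on C, applies a gate on BC, traces out B, and tests the resulting two-qubit state $\rho_{AC}$ with the partial-transpose criterion, which is necessary and sufficient for separability of two qubits. The choice of initial states, the SQiSW matrix convention and the function name are assumptions made for illustration.

```python
import numpy as np

s = 1 / np.sqrt(2)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
SQISW = np.array([[1, 0, 0, 0],
                  [0, s, 1j * s, 0],
                  [0, 1j * s, s, 0],
                  [0, 0, 0, 1]], dtype=complex)

def min_pt_eigenvalue_ac(gate_bc, c_state):
    """Bell pair on AB and c_state on C; apply gate_bc on (B, C), trace out B,
    and return the smallest eigenvalue of the partial transpose of rho_AC."""
    bell_ab = np.array([1, 0, 0, 1], dtype=complex) * s
    psi = np.kron(bell_ab, c_state)                     # qubit order A, B, C
    psi = np.kron(np.eye(2), gate_bc) @ psi             # gate acts on (B, C)
    t = psi.reshape(2, 2, 2)                            # indices a, b, c
    rho_ac = np.einsum('abc,dbe->acde', t, t.conj())    # sum over b traces out B
    pt = rho_ac.transpose(0, 3, 2, 1).reshape(4, 4)     # transpose subsystem C
    return np.linalg.eigvalsh(pt).min()

zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) * s

results = {
    name: [min_pt_eigenvalue_ac(gate, c) for c in (zero, plus)]
    for name, gate in (("CZ", CZ), ("SQiSW", SQISW))
}
for name, vals in results.items():
    print(name, np.round(vals, 4))
```

A negative eigenvalue signals entanglement between A and C. The diagonal gate CZ never produces one, in line with the lemma, whereas SQiSW acting on C = |0⟩ does.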
Corollary 3. A 3-qubit W-like state cannot be generated with only 2 diagonal two-qubit gates.
Proof.If such a generation scheme exists, then without loss of generality we can assume that the first two-qubit gate is between qubits A and B and the second is between qubits B and C, but Lemma 3 shows that after the second two-qubit gate, A is still not entangled with C, and single-qubit gates thereafter will not help.
In fact, this argument can be generalized to give a non-trivial bound for any number of qubits. Represent a circuit on n qubits by an undirected graph on n vertices with distinct edge weights, where each edge corresponds to a two-qubit gate and a larger edge weight means that the gate occurs later in the circuit.
Definition 3. An edge C-D is considered a useful edge with respect to the vertices A and B if there exists a path C-D-$D_1$-$D_2$-⋯-$D_t$ such that all edges in the path have strictly increasing weights and $D_t \in \{A, B\}$, and the same condition also holds for the other direction of the edge.
Note that if D ∈ {A, B} then the condition in one direction is automatically satisfied by t = 0. Also, D 1 can be the same as C, but then by the strictly increasing condition there must be at least two edges between C and D. Definition 4. A graph is good if for all vertex pairs A and B, there is a path between A and B that consists entirely of useful edges with respect to the vertices A and B.
Note that this trivially implies a good graph must be connected.Lemma 4. Consider a circuit on n qubits in which all two-qubit gates are diagonal.If that circuit can generate an n-qubit W-like state from |0 ⊗n , then the corresponding graph must be good.
Proof.Consider any vertex pair A and B in the graph.Since the final state of the circuit is W-like, in the final state A and B must be entangled.We will show that this implies that A and B are connected by a path consisting entirely of useful edges (with respect to the vertices A and B).First, we trace out all qubits except A and B from the final state to get the marginal state ρ AB .Then we remove some edges from the graph corresponding to the circuit in two sequential steps: 1. Any single-qubit gates followed immediately by a trace operator, as well as two-qubit gates followed by two trace operators on both qubits involved, can be removed without affecting the marginal state.Repeat this step until there are no more gates to remove.
2. Then, for each diagonal two-qubit gate U followed immediately by a trace operator on one of the qubits involved (say, the first qubit), we write U as $U = |0\rangle\langle 0| \otimes U_0 + |1\rangle\langle 1| \otimes U_1$. As in the proof of Lemma 3, after tracing out the qubit, the final state can be written as a mixture of two components; in each component the two-qubit gate is replaced by two single-qubit operations. Therefore, we remove all edges corresponding to such gates from the graph. (Note that this step cannot be repeated because this step removes the trace operator, too.)
In what is left of the graph, there must still be a path connecting A and B; otherwise, A and B will not be entangled in any of the components of the mixture, and thus they will not be entangled in ρ AB .It suffices to show that this path consists entirely of useful edges in the beginning.
In fact, consider any edge C-D left in the graph. Since this edge was not removed in the second step, either D ∈ {A, B} or there was at least one other two-qubit gate on D between C-D and the final trace operator on D. In the second case, let the last of those gates be D-$D_1$. This gate must have been removed in the second step since it is followed immediately by a trace operator on D, but it must not have been removed in the first step, either because $D_1$ ∈ {A, B} or because there was at least one other two-qubit gate on $D_1$ between it and the final trace operator on $D_1$. Repeating this argument, since there are a finite number of gates, we must end at some $D_t$ ∈ {A, B}, which gives a path satisfying the condition in Definition 3. This argument can be similarly applied to C. Therefore, C-D was a useful edge in the original graph.
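Definitions 3 and 4 translate directly into a brute-force check for small graphs. In the sketch below a circuit is given as a list of `(u, v, weight)` edges with distinct weights. The function names and data layout are illustrative choices, and the recursion is only meant for small examples.

```python
from itertools import combinations

def reaches_target(edges, vertex, last_weight, targets):
    """True if a path with strictly increasing weights leads from `vertex`
    (reached through an edge of weight `last_weight`) to a vertex in `targets`."""
    if vertex in targets:
        return True                      # the t = 0 case of Definition 3
    for u, v, w in edges:
        if w > last_weight and vertex in (u, v):
            nxt = v if vertex == u else u
            if reaches_target(edges, nxt, w, targets):
                return True
    return False

def useful_edges(edges, a, b):
    """Edges that satisfy Definition 3 with respect to (a, b) in both directions."""
    targets = {a, b}
    return [(u, v, w) for u, v, w in edges
            if reaches_target(edges, v, w, targets)
            and reaches_target(edges, u, w, targets)]

def connected(edges, a, b):
    """Depth-first search for a path from a to b that uses only `edges`."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        if x == b:
            return True
        for u, v, _ in edges:
            if x in (u, v):
                y = v if x == u else u
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return False

def is_good(vertices, edges):
    """Definition 4: every vertex pair is joined by a path of useful edges."""
    return all(connected(useful_edges(edges, a, b), a, b)
               for a, b in combinations(vertices, 2))

triangle = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]
path3 = [(0, 1, 1), (1, 2, 2)]
cycle4 = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 0, 4)]
print(is_good(range(3), triangle), is_good(range(3), path3), is_good(range(4), cycle4))
```

The three examples mirror cases discussed in the text: the 3-cycle is good, two gates on three qubits are not (Corollary 3), and a 4-cycle is not.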

C. Bound of edge numbers in a good graph
We first consider the case where G doesn't have any parallel edges and deal with the parallel edge case in the proof of Proposition 3.
Lemma 5. Consider a good graph without any parallel edges and with at least 3 vertices.Then, there can be at most one vertex with degree 1.
Proof. Suppose there are at least 2 vertices with degree 1, and let X and Y be two of them. Let $e_X$ and $e_Y$ be the edges incident to X and Y respectively, and without loss of generality suppose $w(e_X) < w(e_Y)$. ($e_X$ and $e_Y$ cannot be the same edge, because otherwise X and Y would be disconnected from the other vertices.) Then $e_Y$ cannot be a useful edge with respect to X and Y, as the path in the direction starting from Y has to end on X (it cannot end on Y because there is no parallel edge) and has to go through $e_X$, but $e_X$ has a smaller weight. Then there cannot be any path of useful edges that connects X and Y.
We now prove the core result which, together with Lemma 4, leads directly to Theorem 3. Given a connected graph G without any parallel edges, we can consider the subgraph generated by its set of vertices with degree 2. Each connected component in this subgraph consists of vertices connected one after another. We call each component a chain. We now consider two cases:
1. In the degenerate case where a chain is a cycle, we can directly argue:
• Suppose one component is a cycle and there are other connected components; then the cycle must be disconnected from other vertices in the original graph, and the graph cannot be good.
• Otherwise, suppose the cycle is the only connected component; then the original graph itself must be a cycle. This can be further divided into two cases based on the total number of vertices.
- A cycle with 3 vertices is a good graph and obeys m ≥ (15n − 3)/14.
- A cycle with at least 4 vertices cannot be a good graph. Consider any 2 vertices that are not neighbors in the cycle. There are 2 paths connecting this pair of vertices, and each path contains at least 2 edges. The edge with the largest weight in each path cannot be useful with respect to this pair, so these 2 vertices are not connected by useful edges.
2. No chains are cycles.Then, we first establish the following lemma: Lemma 6.For a good graph without any parallel edges or any cycle chains, each chain can have at most 4 vertices.
Proof. Suppose a chain has at least 5 vertices. Denote the first 5 vertices starting from one end by $P_1, P_2, \ldots, P_5$. Let $M \neq P_2$ and $N \neq P_4$ be the vertices connected to $P_1$ and $P_5$ respectively. Then we define $e_k$ as the edge connecting $P_k$ and $P_{k+1}$, 1 ≤ k ≤ 4. The edges $(M, P_1)$ and $(P_5, N)$ are denoted by $e_0$ and $e_5$ respectively. Now we consider the useful edges with respect to M and $P_2$. $e_0$ and $e_1$ cannot both be useful, so the path of useful edges between M and $P_2$ should go through the edges $e_2, e_3, \ldots$, which implies $w(e_2) < w(e_3)$.
A similar analysis applied to $P_4$ and N yields $w(e_2) > w(e_3)$, which leads to a contradiction.
Now, suppose that every vertex has degree at least 2. We can replace each chain by a single edge between the pair of vertices that the chain connects and remove all the disconnected vertices. In this new graph each vertex has degree at least 3, so we have $m' \ge \frac{3}{2} n'$, where $n'$ and $m'$ are the number of vertices and edges respectively of the new graph. Letting $n_2$ be the number of vertices of degree 2, by Lemma 6, $n_2 \le 4m'$. The numbers of vertices and edges in the original graph are $n = n' + n_2$ and $m = m' + n_2$, so we have $\frac{m}{n} = \frac{m' + n_2}{n' + n_2} \ge \frac{m' + n_2}{\frac{2}{3}m' + n_2} \ge \frac{m' + 4m'}{\frac{2}{3}m' + 4m'} = \frac{15}{14}$. Hence, $m \ge \frac{15}{14} n$. By Lemma 5 there can be at most 1 vertex with degree 1. In that case we can remove that vertex and apply the calculation above, which shows that $m \ge \frac{15}{14}(n - 1) + 1 = \frac{15n - 1}{14}$. In summary, if there are no parallel edges, we must have $m \ge \min\{\frac{15}{14} n, \frac{15n - 1}{14}, \frac{15n - 3}{14}\} = \frac{15n - 3}{14}$. Note that we have this bound purely for the n = 3 cycle case. For n > 3, we can improve this to $\frac{15n - 1}{14}$.
Allowing parallel edges: we prove the theorem by induction. As already shown, the statement is true for n = 3. For n > 3, if there are no parallel edges, the theorem was proved above. Otherwise, we can contract a pair of vertices connected by parallel edges; the resulting graph with n − 1 vertices is still a good graph and thus, by the inductive hypothesis, has at least (15(n − 1) − 3)/14 edges. Then the number of edges in the original graph is at least (15(n − 1) − 3)/14 + 2 > (15n − 3)/14.
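To make the linear separation concrete, the short sketch below tabulates the n − 1 SQiSW gates that suffice for a W-like state against the smallest integer CNOT count allowed by Theorem 3. The function names are illustrative.

```python
def cnot_lower_bound(n):
    """Smallest integer m with m >= (15n - 3)/14, the bound of Theorem 3."""
    return -((3 - 15 * n) // 14)   # integer ceiling division

def sqisw_count(n):
    """Gate count of the n - 1 SQiSW construction."""
    return n - 1

for n in (3, 10, 17, 100):
    print(n, sqisw_count(n), cnot_lower_bound(n))
```

The gap between the two counts grows roughly as n/14 + 1, so it is linear in the number of qubits.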

III. CONTROL: NUMERICAL SIMULATION FOR THE ERROR RATE
In this section, we give numerical evidence that we can implement SQiSW on superconducting platforms with ultra-high fidelity.
Two-qubit gates in the iSWAP family can be implemented by tuning two superconducting qubits with transversal coupling into resonance.This can be demonstrated in a two-qubit system such as two tunable transmons or tunable fluxonia capacitively coupled directly or through a bus resonator.Such an implementation suffers from possible stray longitudinal coupling (ZZ interaction), inaccurate flux tuning and decoherence [12][13][14].Here we show how these errors will affect the fidelity of implementation.
First we will investigate the coherent error in the gate.Without loss of generality, we consider a two-qubit system with Y Y coupling [15], with only the lowest two levels of each qubit included.The relevant Hamiltonians near resonance are given by where H 1 and H 2 are the single-qubit Hamiltonians, H c is the coupling term, H is the Hamiltonian of the whole two-qubit system, ω and ω + ∆ are frequencies of two qubits, and g yy and g zz are the corresponding coupling strengths.g yy is the term that contributes to the iSWAP family gate.∆ and g zz are introduced to account for inaccurate flux tuning and stray longitudinal coupling, respectively.We move into the frame rotating with both qubits at frequency ω.The rotating frame transformation R(t) and the rotated Hamiltonian H R (t) are given by Assuming ω is much larger than any other frequency in the Hamiltonian, we can take the rotating wave approximation and eliminate all fast-oscillation terms with e i2ωt since the time-average is approximately zero.We thus have an approximate timeindependent Hamiltonian of the system given by The time evolution operator corresponding to this Hamiltonian is It is easy to verify that when all error terms are zero, that is ∆ = 0 and g zz = 0, U (t) is given by the well-known form of iSWAP family gate: where θ ≡ tg yy /4.Note that we assume above g yy > 0 since the gate time must be positive.When g yy < 0, we instead implement U (t = −π/g yy ) = SQiSW.Note that SQiSW is equivalent to SQiSW † under local unitaries: we have SQiSW † = (Z ⊗ I)SQiSW(Z ⊗ I).The properties of the two gates are very similar despite minor sign differences in compilation.In the subsequent sections we will consider the SQiSW gate even though SQiSW † is more common in physical implementations.
In realistic systems g zz and ∆ are generally not zero.This causes a nonzero error, which we can quantify using the average fidelity F between two unitary matrices U, V [16]: where d = 4 is the dimension of our matrices.We can estimate the infidelity E = 1 − F between the simulated physical implementation of SQiSW † with U (t = π/g yy ) and the ideal SQiSW † unitary induced by a specific error term by turning on the given error while turning off all others.We find that the infidelity induced by the ZZ interaction g zz and detuning ∆ can be written as a power series of g zz /g yy and ∆/g yy respectively, assuming each error term is small: Therefore, the infidelity of SQiSW scales quadratically with the ratio of ZZ to Y Y coupling and the ratio of ∆ to Y Y coupling.We also expect there to be a cross-interaction between these two errors, and this is investigated below.To include the decoherence of the system, we perform numerical simulations to estimate the effect of T 1 and T φ on the fidelity.The time evolution is based on the Lindblad master equations: where ρ(t) is the time-dependent density matrix, H(t) is the time-dependent Hamiltonian, Γ 1,j is the dissipation rate of j-th qubit, Γ φ,j is the dephasing rate of j-th qubit, and is the Lindblad superoperator.Note this model includes only the Markovian noise.The gate fidelity is affected little by the non-Markovian noise when the gate time is short [17], so we neglected the non-Markovian noise in the discussion.We can then compute the average gate fidelity via the process fidelity and χ-matrix as follows [16]: We choose the following range of parameters, which can be experimentally realized in fabricated fluxonium qubits, to perform the numerical simulations.For simplicity, we assume • g yy /h = 25MHz, corresponds to a SQiSW gate time of 20ns We check the effect of nonzero temperature as the thermal population of the first excited state |e may be large.However, we find no major difference between the infidelity at 
zero temperature and that of a typical fluxonium system (T = 50mK and frequency ω ge = 2π × 1GHz).
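The quadratic dependence of the coherent infidelity on $g_{zz}/g_{yy}$ and $\Delta/g_{yy}$ can be checked with a few lines of linear algebra. The Hamiltonians of this section were lost in extraction, so the sketch uses an assumed rotating-wave form, $H = \frac{g_{yy}}{8}(XX + YY) + \frac{g_{zz}}{4} ZZ - \frac{\Delta}{2} IZ$ with $\hbar = 1$, chosen only so that $U(t = \pi/g_{yy})$ reproduces SQiSW† and $\theta = t g_{yy}/4$ as stated in the text. The prefactors of the error terms are assumptions. The fidelity is the standard average gate fidelity between unitaries, $F = (|\mathrm{Tr}(U^\dagger V)|^2 + d)/(d(d+1))$.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def hamiltonian(g_yy, g_zz=0.0, delta=0.0):
    """Assumed RWA Hamiltonian (hbar = 1): exchange, stray ZZ, detuning on qubit 2."""
    exchange = (g_yy / 8) * (np.kron(X, X) + np.kron(Y, Y))
    return exchange + (g_zz / 4) * np.kron(Z, Z) - (delta / 2) * np.kron(I2, Z)

def evolve(h, t):
    """U = exp(-i h t) for a Hermitian h, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(h)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T

def average_fidelity(u, v):
    """Average gate fidelity between two unitaries of dimension d."""
    d = u.shape[0]
    return (abs(np.trace(u.conj().T @ v)) ** 2 + d) / (d * (d + 1))

g_yy = 1.0
t_gate = np.pi / g_yy                       # SQiSW-dagger gate time in this model
ideal = evolve(hamiltonian(g_yy), t_gate)

def infidelity(g_zz=0.0, delta=0.0):
    """Infidelity with one error source switched on and the other off."""
    return 1 - average_fidelity(ideal, evolve(hamiltonian(g_yy, g_zz, delta), t_gate))

for eps in (0.01, 0.02):
    print(eps, infidelity(g_zz=eps), infidelity(delta=eps))

ratio_zz = infidelity(g_zz=0.02) / infidelity(g_zz=0.01)
ratio_delta = infidelity(delta=0.02) / infidelity(delta=0.01)
print(ratio_zz, ratio_delta)
```

Doubling either error ratio multiplies the infidelity by close to four, which is the quadratic scaling described above. The absolute numbers depend on the assumed prefactors and should not be compared with the paper's coefficients.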
After computing the infidelity via numerical simulation, we perform a polynomial regression up to 2nd order in the different parameters to identify the error sources (3rd order polynomial regression does not add any new important terms as shown in Fig. 11).In particular, g zz , ∆, Γ 1 and Γ φ , are features in the regression as they are expected to increase the infidelity linearly or quadratically.The regression accuracy is plotted in Fig. 10.The polynomial regression works very well in this case: the root mean square error of the regression is on the order of 10 −9 .
To identify the key error sources contributing to the infidelity, we check the permutation feature importance as shown in Fig. 11. There are 5 dominant features in the polynomial regression. The infidelity depends linearly on $\Gamma_\phi$ and $\Gamma_1$, which is the general behavior of decoherence. The infidelity also depends quadratically on $g_{zz}$ and $\Delta$, which agrees with the symbolic results in eq. (10). Finally, an additional term $g_{zz}\Delta$ emerges, which is the cross-term between $g_{zz}$ and $\Delta$. This term makes it unclear what the error contributions are due to $g_{zz}$ and $\Delta$ individually. In our experiments, $g_{zz}$ is fixed at the design stage and $\Delta$ is microwave tunable, so we can choose an optimal $\Delta$ in the experiments to minimize the infidelity contributed by the terms $c_{\Delta^2}\Delta^2$, $c_{g_{zz}\Delta}\, g_{zz}\Delta$ and $c_{g_{zz}^2}\, g_{zz}^2$. This isolates the contribution to the infidelity from $g_{zz}$. This optimal $\Delta$ will be our operating point. However, the precision of the frequency is limited by our instruments, and there is an error caused by the deviation from the optimal point, which we call $\Delta_p$. Overall, we define the infidelity from each error source as follows:
• $c_{\Delta^2}\Delta^2 + c_{g_{zz}\Delta}\, g_{zz}\Delta + c_{g_{zz}^2}\, g_{zz}^2$ at the optimal $\Delta$ as the error from the stray coupling $g_{zz}$;
• $c_{\Delta^2}\Delta_p^2$ as the error from the instrumental limitation of the frequency $\Delta_p$;
• $c_{\Gamma_1}\Gamma_1$ and $c_{\Gamma_\phi}\Gamma_\phi$ as the error from the decoherence processes.
These errors are plotted for both SQiSW and iSWAP in Fig. 11. Note that because we are working at the optimal $\Delta$, the above errors add up to the total error.
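The additivity claim follows from a one-line minimization of the quadratic part of the regression model. The worked steps below assume $c_{\Delta^2} > 0$ and use the shorthand $E(\Delta)$ and $\Delta_{\mathrm{opt}}$, which are introduced here for illustration and do not appear in the original text.

```latex
\begin{align}
E(\Delta) &= c_{\Delta^2}\,\Delta^2 + c_{g_{zz}\Delta}\, g_{zz}\,\Delta + c_{g_{zz}^2}\, g_{zz}^2, \\
\frac{\partial E}{\partial \Delta} = 0 \;&\Rightarrow\;
  \Delta_{\mathrm{opt}} = -\frac{c_{g_{zz}\Delta}}{2\,c_{\Delta^2}}\, g_{zz}, \\
E(\Delta_{\mathrm{opt}}) &= \left( c_{g_{zz}^2} - \frac{c_{g_{zz}\Delta}^2}{4\,c_{\Delta^2}} \right) g_{zz}^2, \\
E(\Delta_{\mathrm{opt}} + \Delta_p) &= E(\Delta_{\mathrm{opt}}) + c_{\Delta^2}\,\Delta_p^2 .
\end{align}
```

Because the first derivative vanishes at $\Delta_{\mathrm{opt}}$, a deviation $\Delta_p$ contributes no term linear in $\Delta_p$, so the $g_{zz}$ error and the $\Delta_p$ error simply add.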
Considering the recent progress on fluxonium fabrication [18,19] and the precision of the arbitrary wave generator, we compute for a realistic set of parameters g yy /h = 25MHz, T 1 = 100µs, T φ = 100µs, g zz /h = −0.3MHz,∆ p /h = 0.18MHz, which is what we take as input for the results in Fig. 11, SQiSW can be realized with about 5 × 10 −4 infidelity.Note in the fluxonium the T φ at the operation point is limited by 1/f flux noise.Its error can be estimated by t 2 /3T 2 φ , which is < 10 −4 when T φ > 2.5µs for a 40 ns iSWAP gate and it is even smaller for a 20 ns SQiSW gate.So the error from 1/f flux noise can be neglected and it is not included in the discussion.For an iSWAP-like gate, the matrix form of the unitary operator can be written as [20] U where γ is a common single-qubit phase induced by the flux modulation, χ is the relative phase between |10 and |01 states with population swapping, ζ is relative phase without population swapping, and φ is the controlled-phase induced by ZZ coupling, and is believed to be negligible according to the analysis in Section III.We assume φ = 0 throughout the section.The swap angle θ equals to π/2 for iSWAP and equals to π/4 for SQiSW.To activate the two-qubit interaction, a fast flux pulse brings the qubit with lower frequency into resonance with the other qubit.Technically, the iSWAP-like gate is realized when the qubits are kept in the same frequency.However in practice, the two qubits are biased at their own operation points with different frequencies The swap angle θ determines the gate implemented up to local equivalence.All the other phase terms can be absorbed in single-qubit phase gates for corrections: where Therefore, θ needs to be characterized and calibrated to π/4 with high accuracy, and the phase terms ζ, γ, χ only needs to be characterized with high accuracy, and can be cancelled out in the compilation stage.

B. Coarse calibration
The calibrations of the bias crosstalk, the flux-pulse correction and the single-qubit gates should be implemented before the coarse calibration [19]. Without the need for special pulse shaping to suppress leakage to higher energy levels, we use an error-function-shaped pulse to modulate the external flux $\Phi_{ext}$: […] with σ = 0.5 ns. The first step is to measure the pulse amplitude $\Phi_{amp}$ of the resonance point with both qubits initially biased at their respective sweet spots and prepared in the state |10⟩. We measure $P_{10}$ as a function of the gate duration and amplitude. The measured $P_{10}$ oscillates versus the gate duration with an angular frequency $\omega = \sqrt{4g^2 + \left(\frac{d\omega_1}{d\Phi_{ext}}\right)^2 (\Phi_{amp} - \Phi_{res})^2}$, where g is the coupling strength of the two qubits. The measurement results can be found in [19]. The resonance point $\Phi_{res}$ corresponds to the minimal oscillation frequency. When the pulse amplitude is fixed at $\Phi_{res}$, the population as a function of gate duration is plotted in Fig. 12. The probability $P_{10}$ can be fitted with the function $P_{10}(t) = (1 + \cos(2gt + \phi_0))/2$. The duration of the iSWAP-like gate is estimated as $T = (2\theta - \phi_0)/(2g)$. The detailed calibration process of the iSWAP gate has been described in [19]. Here we will focus on the calibration of SQiSW.
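The fit and the duration estimate above can be sketched on synthetic data. The example below generates a noisy $P_{10}(t)$ trace, fits $g$ and $\phi_0$ by scanning trial frequencies with a linear least-squares solve at each one, and evaluates $T = (2\theta - \phi_0)/(2g)$ for $\theta = \pi/4$ and $\theta = \pi/2$. The numerical values, the noise level and the fitting method are illustrative assumptions and not the procedure used in the experiment.

```python
import numpy as np

def fit_p10(t, p10, omegas):
    """Scan trial angular frequencies w; at each, solve the linear least-squares
    problem p10 ~ c + a*cos(w t) + b*sin(w t) and keep the best fit."""
    best = None
    for w in omegas:
        basis = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
        coef, *_ = np.linalg.lstsq(basis, p10, rcond=None)
        err = np.sum((basis @ coef - p10) ** 2)
        if best is None or err < best[0]:
            best = (err, w, coef)
    _, w, (c, a, b) = best
    g = w / 2                       # model: P10 = (1 + cos(2 g t + phi0)) / 2
    phi0 = np.arctan2(-b, a)        # a ~ cos(phi0)/2, b ~ -sin(phi0)/2
    return g, phi0

def gate_duration(theta, g, phi0):
    """T = (2*theta - phi0) / (2 g), as in the text."""
    return (2 * theta - phi0) / (2 * g)

# Synthetic data with time in ns; g_true and phi0_true are made-up values.
rng = np.random.default_rng(0)
g_true, phi0_true = 2 * np.pi * 0.005, 0.1
t = np.arange(0.0, 200.0, 1.0)
p10 = (1 + np.cos(2 * g_true * t + phi0_true)) / 2 + rng.normal(0, 0.01, t.size)

g_fit, phi0_fit = fit_p10(t, p10, np.linspace(0.02, 0.2, 3601))
t_sqisw = gate_duration(np.pi / 4, g_fit, phi0_fit)
t_iswap = gate_duration(np.pi / 2, g_fit, phi0_fit)
print(f"g = {g_fit:.5f} rad/ns, T_SQiSW = {t_sqisw:.2f} ns, T_iSWAP = {t_iswap:.2f} ns")
```

At $t = T$ with $\theta = \pi/4$ the fitted model gives $P_{10} = 1/2$, the half-swap expected of SQiSW. The fine calibration then refines this coarse estimate.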

C. Fine calibration for θ and ζ
The gate duration of SQiSW is approximately 25 ns estimated from the previous coarse calibration.To minimize the control error, we use the following scheme to measure more accurate values of θ and ζ.A gate sequences (U • (R z (ϕ) ⊗ R z (−ϕ))) N is applied to the initial state |10 .The repeat count N is chosen to be 8 to amplify the sensitivity of final P 01 to θ and ζ.The final swapped probability can be written as δt = T + T R is the separation of two SQiSW gate, where T R is the duration of the phase gates R z (ϕ) ⊗ R z (−ϕ).The single-qubit phase gates are realized with where

D. Calibration for γ and χ
To further measure the phases γ and χ, we apply a similar sequence to the initial state $|\psi_i\rangle = (|00\rangle - i|01\rangle)/\sqrt{2}$. To simplify the measurement, Ω is set to π/4 with $\varphi = \Delta \cdot \delta t - \zeta$. The accumulated phase of the first qubit after multiple gates can be written as […] where $t_1$ and $t_N$ are the start times of the first and last SQiSW gates. We use a Ramsey-type experiment to extract the single-qubit phase $\varphi_{sq}$. After a second half-π pulse is applied to the first qubit, $P_{00}$ depends on $\varphi_{sq}$ and on the varied phase of the second half-π pulse. Subtracting the known phase term $\frac{\pi}{2} \cdot \mathrm{sgn}\left(\sin\frac{N\pi}{4}\right) + \Delta(t_1 + t_N)/2$, the residual phase $N\gamma + \chi$ can be extracted by single-qubit tomography. We present an experiment with N = 1 as an example in Fig. 15(a). The measured $P_{00}$ can be well fitted with a cosine function, giving (γ + χ)/π ≈ −0.4615. We then increase the gate number N and measure the corresponding phases $N\gamma + \chi$. Relatively accurate values of γ/π = 0.3012 and χ/π = −0.7628 can be extracted by fitting the measured phases presented in Fig. 15(b) with a linear function.

E. Drift of T1 and T2
The coherence times $T_1$ and $T_2$ of our qubits fluctuated during our measurements. The highest values we measured are T […] point. Based on the estimation of error in Ref. [17], we find that the decoherence error of iSWAP from the relaxation and the white noise dephasing is $4.7 \times 10^{-3}$ and that from 1/f flux noise dephasing is $0.5 \times 10^{-3}$. The coherent error is estimated to be $0.5 \times 10^{-3}$ [19]. This corresponds to the best iSWAP fidelity 0.993 in Figure 4 of the main text. Also, we observed that $T_1$ of $Q_A$ sometimes drops significantly, as low as 1 µs at the sweet spot, possibly due to coupling with a fluctuating two-level system. In such cases, the decoherence error from white noise dephasing and relaxation is as large as $2 \times 10^{-2}$, which corresponds to the worst fidelity 0.98. Overall, the decoherence error from white noise dephasing and relaxation dominates the infidelity of the iSWAP gate. As this error is proportional to the gate time, we expect the ratio of the infidelities of the iSWAP gate and the SQiSW gate to be approximately 2 : 1. This roughly agrees with the measured fidelities of the two gates in Figure 1 of the main text.

F. Single-Qubit Gates
The single-qubit Clifford gates are compiled with a primary set of gate operations, denoted as $\{I, X_\pi, Y_\pi, X_{\pm\pi/2}, Y_{\pm\pi/2}\}$. The measured average fidelities of this set of operations on $Q_A$ and $Q_B$ are 99.90% and 99.96%, respectively.

V. CHARACTERIZATION: INTERLEAVED FULLY RANDOMIZED BENCHMARKING
We now give a variant of the interleaved randomized benchmarking (iRB) framework [21] for benchmarking the SQiSW gate.The iRB framework was first proposed to benchmark the average fidelity of a target gate given the ability to implement arbitrary Clifford gates with high fidelity.However, under the iRB framework, the target gate, i.e. the gate to be benchmarked, needs to be Clifford too.For this reason, the iRB framework is usually used on two-qubit gates such as the iSWAP gate or the CNOT gate, but not on non-Clifford gates such as the Controlled-S gate , much of the fSim gate family, the matchgates [22], or SQiSW.Our variant, called interleaved fully randomized benchmarking (iFRB), relies on the efficient implementation of Haar random gates as the reference gate set.Compared to Clifford-based iRB, the iFRB scheme is readily applicable to benchmarking of arbitrary quantum gates (not necessarily Clifford) and especially useful when benchmarking on a small quantum system where efficiency of implementing Haar random gates is not an issue.

A. FRB and iFRB
Before introducing iFRB, we first briefly recall the vanilla randomized benchmarking (RB) and the iRB frameworks. Randomized benchmarking [23] was first proposed to study the amplitude of the gate-independent, time-independent, average noise level in a quantum system, while isolating out the errors caused by imperfect state preparation and measurement (SPAM error). The experimental protocol goes as follows: for a d-dimensional quantum system, choose an appropriate gate sequence length m, choose m gates $U_1, \cdots, U_m$ i.i.d. from the Haar random distribution, and compute the recovery gate $U_{m+1}$ such that $U_{m+1} U_m \cdots U_1 = I$ assuming there are no errors. The gate sequence $U_1 || U_2 || \ldots || U_m || U_{m+1}$ is then performed on an initial state |0⟩ and subsequently measured in the computational basis. A survival probability $p_{m,j}$ of measuring 0 is estimated from repeated experiments with sequence j of length m + 1. By averaging over many different sequences with the same length and collecting data over several different gate sequence lengths, one can extract the average gate fidelity $r = 1 - (1 - u)(d - 1)/d$, where d is the dimension of the system ($2^n$ for an n-qubit system) and u is obtained from fitting the curve $\bar{p}_m = A u^m + B$, with $\bar{p}_m$ the average of $p_{m,j}$ over the sequences j. Here, the parameters A and B are supposed to capture the SPAM error, leaving u to represent solely the imperfection in gate implementation. The requirement for Haar randomness was subsequently found to be unnecessary and unscalable to larger quantum systems and was relaxed to a unitary 2-design, most commonly a uniform distribution on the Clifford gates [24]. To avoid confusion, we refer to the Haar random randomized benchmarking as fully randomized benchmarking (FRB) and the Clifford-based one as RB.
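The fitting step of the protocol can be sketched as follows. The example assumes the standard single-exponential model $A u^m + B$ for the averaged survival probability, generates synthetic data with made-up values of A, B and u, and recovers u by scanning trial values with a linear solve for (A, B) at each one. The fitting method is an illustrative choice and is not necessarily the one used for the experimental data.

```python
import numpy as np

def fit_decay(lengths, survival, u_grid):
    """Fit survival ~ A * u**m + B by scanning u and solving for (A, B)
    with linear least squares at each trial value."""
    best = None
    for u in u_grid:
        basis = np.column_stack([u ** lengths, np.ones_like(lengths, dtype=float)])
        coef, *_ = np.linalg.lstsq(basis, survival, rcond=None)
        err = np.sum((basis @ coef - survival) ** 2)
        if best is None or err < best[0]:
            best = (err, u, coef[0], coef[1])
    return best[1:]                 # (u, A, B)

def average_fidelity_from_decay(u, d):
    """r = 1 - (1 - u)(d - 1)/d, as in the text."""
    return 1 - (1 - u) * (d - 1) / d

# Synthetic two-qubit data; u_true and the SPAM constants are made-up values.
rng = np.random.default_rng(1)
d, u_true = 4, 0.98
lengths = np.arange(1, 201, 5)
survival = 0.72 * u_true ** lengths + 0.25 + rng.normal(0, 0.003, lengths.size)

u_fit, a_fit, b_fit = fit_decay(lengths, survival, np.linspace(0.9, 0.9999, 2000))
r_fit = average_fidelity_from_decay(u_fit, d)
print(f"u = {u_fit:.4f}, A = {a_fit:.3f}, B = {b_fit:.3f}, r = {r_fit:.4f}")
```

The constants A and B absorb the synthetic SPAM offsets, so only the decay parameter u enters the fidelity estimate.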
After the first proposal of FRB, follow-up works flourished [21, 24-29], most of which were based on the Clifford gate set. In particular, interleaved randomized benchmarking [21] was proposed to benchmark the average fidelity of a specific gate, referred to as the target gate, with the hope of excluding not only the SPAM error, but also the errors of other gates in a gate set. An iRB experiment consists of two parts. The first part is an ordinary RB protocol on a gate set, referred to as the reference gate set. The second part performs RB with interleaved sequences. Given a target gate $T$, a random gate sequence of length $m$, $U_1, U_2, \cdots, U_m$, is generated i.i.d. from the Clifford group, but the final recovery gate is chosen to be $U_{m+1} = (T U_m \cdots T U_2 T U_1)^{-1}$. The gate sequence $U_1 \| T \| U_2 \| T \| \ldots \| U_m \| T \| U_{m+1}$ is then performed as in the RB experiment. A different error quantity $v$ can be calculated from the decay rate of the average survival probability with respect to the sequence length, similar to $u$ in the ordinary RB experiment. The average fidelity of the target gate can then be calculated as $r_T = 1 - (1 - v/u)(d-1)/d$. In order to be able to carry out the iRB experiment, it is crucial that the final recovery gate, $U_{m+1}$, lies in the reference gate set, i.e. the Clifford group. Although this holds when the target gate itself lies in the Clifford group, this is not true for many common gates. That the iRB framework cannot be applied to non-Clifford gates is a serious outstanding issue, as a non-Clifford gate is necessary for universal quantum computing, by the Gottesman-Knill Theorem [30]. Several alternatives have been proposed, including choosing finite groups other than the Clifford group [29]. Altogether different benchmarking experiments were also proposed [31,32], but these alternatives either rely on extensive algebraic studies of the target gate or lack a rigorous theoretical framework for analyzing their interpretation and applicability.

FIG. 17: Illustration of the Clifford-based iRB and the iFRB.
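To make the interleaved estimate concrete, here is a small sketch of the standard iRB post-processing step: the reference decay $u$ and the interleaved decay $v$ are combined as $1 - (1 - v/u)(d-1)/d$, in the same convention as the RB fidelity $r = 1 - (1-u)(d-1)/d$. The function name and the numerical decay values are hypothetical and serve only as an illustration.

```python
def target_gate_fidelity(u, v, d):
    """Average fidelity of the interleaved target gate.

    u: decay parameter of the reference (non-interleaved) experiment
    v: decay parameter of the interleaved experiment
    d: Hilbert space dimension (2**n for n qubits)
    """
    if not 0 < u <= 1:
        raise ValueError("reference decay u must lie in (0, 1]")
    return 1 - (1 - v / u) * (d - 1) / d


# Hypothetical decay parameters for a two-qubit experiment (d = 4).
u_ref = 0.970
v_int = 0.962
print(f"target fidelity = {target_gate_fidelity(u_ref, v_int, d=4):.4f}")
```

The ratio $v/u$ is what isolates the target gate: if the target were perfect, the interleaved decay would equal the reference decay and the estimate would be exactly 1.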
To resolve this issue through iFRB, we simply apply iRB, except that instead of using random Clifford gates, we return to the original FRB proposal and use Haar random gates. As there are no restrictions on the recovery gate, iFRB applies to any target gate. Since Haar random gates trivially form a unitary 2-design, iFRB carries the same theoretical guarantees as RB and iRB when the noise is gate-independent. For more general Markovian and possibly gate-dependent noise, whether the iFRB framework works as expected requires further investigation.
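The construction of an interleaved Haar random sequence can be sketched as follows. This is a minimal, dependency-free illustration: Haar random unitaries are drawn by Gram-Schmidt orthonormalization of complex Gaussian vectors, the target gate is interleaved after each random gate, and the recovery gate is the inverse of the accumulated product. The helper names (`haar_unitary`, `ifrb_sequence`, and so on) are hypothetical, and the sketch works at the level of $4 \times 4$ matrices only; it does not perform the decomposition of each random gate into SQiSW and single-qubit gates that a hardware run would require.

```python
import math
import random


def identity(dim):
    return [[complex(i == j) for j in range(dim)] for i in range(dim)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def haar_unitary(dim, rng):
    """Haar random unitary: orthonormalize i.i.d. complex Gaussian columns."""
    cols = []
    for _ in range(dim):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
        for w in cols:  # remove components along earlier columns
            overlap = sum(wi.conjugate() * vi for wi, vi in zip(w, v))
            v = [vi - overlap * wi for vi, wi in zip(v, w)]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        cols.append([vi / norm for vi in v])
    return [[cols[j][i] for j in range(dim)] for i in range(dim)]


def ifrb_sequence(target, m, rng):
    """Return [U_1, T, U_2, T, ..., U_m, T, U_{m+1}] with net action I."""
    dim = len(target)
    total = identity(dim)
    seq = []
    for _ in range(m):
        u = haar_unitary(dim, rng)
        seq += [u, target]
        total = matmul(target, matmul(u, total))  # gates act right to left
    seq.append(dagger(total))  # recovery gate: no group restriction needed
    return seq


s = 1 / math.sqrt(2)
SQISW = [[1, 0, 0, 0],
         [0, s, 1j * s, 0],
         [0, 1j * s, s, 0],
         [0, 0, 0, 1]]

rng = random.Random(7)
seq = ifrb_sequence(SQISW, m=5, rng=rng)
net = identity(4)
for gate in seq:
    net = matmul(gate, net)
deviation = max(abs(net[i][j] - (1 if i == j else 0))
                for i in range(4) for j in range(4))
print(f"{len(seq)} gates, max deviation from identity = {deviation:.2e}")
```

Because the recovery gate is an arbitrary element of SU(4) rather than a member of a finite group, the same construction works whether or not the target gate is Clifford.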

B. iFRB on the SQiSW gate
The biggest challenge of implementing such a fully randomized scheme is of course the efficient generation and implementation of arbitrary rotations. It has long been known that the complexity of implementing Haar random gates grows exponentially with the number of qubits [33]. However, FRB/iFRB can still be a very useful tool to benchmark single-qubit or two-qubit gates, or even unitaries acting on a small number of qubits. In superconducting systems, single-qubit FRB/iFRB is readily realized via the virtual Z compilation scheme [34]. On two-qubit systems, the FRB/iFRB framework requires the efficient generation of arbitrary two-qubit unitaries from native two-qubit gates. Luckily, for many families of gates, such as the super-controlling gates [7] and the SQiSW gate, an efficient decomposition of arbitrary two-qubit gates into an optimal number of native two-qubit gates exists. Hence we can realize two-qubit FRB/iFRB in such cases.

$U_i, S_i \leftarrow$ GENRANDGATE()

The information processing capabilities of SQiSW that we have proved all point to its superiority in actual experimental realizations. To strengthen this claim, in this section we conduct a series of numerical experiments comparing SQiSW to iSWAP with respect to different metrics in a noisy setting. For simplicity we assume a simple depolarizing noise model.

A. Fidelity of Compiling Two-Qubit Gates
In our first experiment we compare SQiSW to iSWAP by computing the fidelity of generating arbitrary two-qubit gates in a noisy setting. An arbitrary two-qubit gate is compiled using SQiSW according to Algorithm 1 and using iSWAP according to [7]. We consider a simple error model: each gate is followed by a depolarizing channel with error rate $p_{\mathrm{iswap}} = 2p_{\mathrm{sqisw}} = 0.005$ and $p_{\mathrm{single}} = 0.0005$. For the two-qubit gates, each of the qubits undergoes a depolarizing channel of the corresponding error rate.
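A rough back-of-the-envelope version of this error model can be sketched as follows. It assumes that a depolarizing channel of rate $p$ leaves a qubit untouched with probability $1-p$, that errors on different qubits and gates are independent, that a circuit with $k$ two-qubit gates contains $k+1$ layers of single-qubit gates on both qubits, and that the probability of no error at all approximates the process fidelity. The function name and these counting conventions are assumptions of this sketch; its numbers are only meant to show how the two-qubit gate count and error rate drive the total error, and need not match the simulated values.

```python
def estimated_infidelity(n_two_qubit, p_two, p_single, d=4):
    """First-order estimate of the average infidelity of a compiled gate.

    n_two_qubit: number of native two-qubit gates in the circuit
    p_two:       per-qubit depolarizing rate after each two-qubit gate
    p_single:    depolarizing rate after each single-qubit gate
    """
    n_layers = n_two_qubit + 1  # assumed single-qubit layers around the gates
    no_error = ((1 - p_two) ** (2 * n_two_qubit)
                * (1 - p_single) ** (2 * n_layers))
    # Convert process infidelity to average infidelity: (1 - F_pro) d / (d + 1).
    return (1 - no_error) * d / (d + 1)


p_iswap, p_sqisw, p_single = 0.005, 0.0025, 0.0005
for name, p_two in (("iSWAP", p_iswap), ("SQiSW", p_sqisw)):
    for k in (2, 3):
        err = estimated_infidelity(k, p_two, p_single)
        print(f"{name} x{k}: estimated error ~ {100 * err:.2f}%")
```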
As the family of all two-qubit unitaries SU(4) has 15 real degrees of freedom, we choose one element for each Weyl chamber coordinate and append randomly chosen single-qubit gates to it. The results show that the errors are dominated by the two-qubit gates. We use an interleaved version of fully randomized benchmarking, or iFRB [35], to compute the fidelity value for each Weyl chamber coordinate (please refer to Section V A for more details on iFRB). The corresponding results are shown in fig. 18.
It can be seen from the figure that, under this particular noise model, all gates can be compiled using SQiSW with an error rate below 1.8%. Meanwhile, although gates in the I-CNOT-iSWAP plane can be compiled with 2 applications of the iSWAP gate, reaching an error rate of about 2%, general gates requiring 3 applications of the iSWAP gate have an error rate of about 3%. This significant difference indicates an appreciable advantage to using SQiSW for compiling quantum algorithms.

B. Achievable Quantum Volume
Quantum volume [6] is a measure of the largest random quantum circuit of equal width and depth that a quantum computer can successfully implement. It is an all-around measure, taking into account gate fidelities, expressibility of native gate sets, quality of compilers, and even qubit connectivity. We conduct numerical experiments computing the quantum volume that directly compare using SQiSW to using iSWAP as the native two-qubit gate, ceteris paribus, under different noise levels and connectivities. Note that unlike the compiled two-qubit gate fidelity, this compares the gates in a multi-qubit setting beyond just two qubits.
For the sake of being self-contained, we repeat here the definition of quantum volume. Given the number of qubits and depth $d$, we generate a random circuit $U$ of the form shown in fig. We first numerically compute the probability distribution over bit strings $x \in \{0,1\}^d$ measured if we implement $U$ on $|0\rangle^{\otimes d}$: $p_U(x) = |\langle x|U|0\rangle^{\otimes d}|^2$. Using this, we can define the heavy outputs as the bit strings whose probability is higher than the median: $H_U = \{x \in \{0,1\}^d : p_U(x) > p_{\mathrm{med}}\}$, where $p_{\mathrm{med}}$ is the median of the probabilities of the bit strings. Next, we compute the probability distribution obtained using imperfect gates, compilation and such: $q_U(x)$ [39]. We then define $h_U = \sum_{x \in H_U} q_U(x)$. We average over unitaries $U$ according to the above distribution to obtain $h_d = \mathbb{E}_U[h_U]$. The quantum volume is defined as $\log_2 V_Q = \max\{d : h_d > 2/3\}$. In our particular case, we generate $q_U(x)$ as follows. As before, an arbitrary two-qubit gate is compiled using SQiSW according to Algorithm 1 and using iSWAP according to [7]. The two-qubit gates are then subject to depolarizing noise with rate $p_{\mathrm{iSWAP}} = 2p_{\mathrm{SQiSW}}$, and the single-qubit gates with rate $5 \times 10^{-4}$.
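The heavy-output computation in the definition above can be sketched in a few lines. This is a minimal illustration that takes the ideal and noisy output distributions as plain dictionaries; the function names and the toy distributions are assumptions of this example, and the $2/3$ threshold is the standard quantum volume pass criterion.

```python
from statistics import median


def heavy_outputs(ideal_probs):
    """Bit strings whose ideal probability exceeds the median probability."""
    p_med = median(ideal_probs.values())
    return {x for x, p in ideal_probs.items() if p > p_med}


def heavy_output_probability(ideal_probs, noisy_probs):
    """h_U: total noisy probability mass on the ideal heavy outputs."""
    heavy = heavy_outputs(ideal_probs)
    return sum(noisy_probs.get(x, 0.0) for x in heavy)


def passes_quantum_volume(h_values, threshold=2 / 3):
    """Check whether the average of h_U over sampled circuits beats 2/3."""
    return sum(h_values) / len(h_values) > threshold


# Toy two-qubit example (hypothetical distributions, for illustration only).
ideal = {"00": 0.50, "01": 0.30, "10": 0.15, "11": 0.05}
# Noisy output modeled as 90% ideal distribution plus 10% uniform noise.
noisy = {x: 0.9 * p + 0.1 * 0.25 for x, p in ideal.items()}

h_U = heavy_output_probability(ideal, noisy)
print(sorted(heavy_outputs(ideal)), round(h_U, 3), passes_quantum_volume([h_U]))
```

A fully depolarized device outputs the uniform distribution and gives $h_U = 1/2$ here, which falls below the threshold.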
We now show the results of our numerical experiments, conducted using the quantum volume module in Cirq [36]. $h_d$ is approximated by averaging over 1001 different $U$ for increasing $d$ (we find numerically that $h_U$ is approximately the same for all 1001 samples), under different noise levels and connectivities. In fig. 20, a complete graph is assumed, and fig. 21 assumes a 1-D chain graph. The chain case is done by computing a list of SWAP gates needed to implement the random permutations using the corresponding function in the quantum volume module of Cirq. Finally, in the figures we only compute even $d$ for simplicity; the odd values show a similar trend. We see that for all the error rates we consider, SQiSW clearly outperforms iSWAP, consistently achieving a higher quantum volume. This indicates that simply changing from iSWAP to SQiSW can appreciably change the quantum volume boasted by a quantum computer.

FIG. 1. (a) Simultaneous single-qubit FRB. The legend presents the average fidelity of a single Pauli rotation. (b) Sequence fidelity of FRB and iFRB for SQiSW for the run with highest fidelity. Each data point is averaged from 20 random sequences.

FIG. 4: (a) Eigenphases without sign violation of $a_2$. (b) $a_2$ has a sign violation and $x \le \pi/8$. (c) $a_2$ has a sign violation and $x > \pi/8$.

FIG. 5: Visualization of the full compilation scheme. When a gate is outside of the region $W$, there are eight cases corresponding to different circuit compilations, indicated by three inequalities. (a) $x \overset{?}{>} \pi/8$, the $>$ case indicated in green. This

FIG. 8: The area spanned by 2 CPHASE family gates or their mirror gates. Two gates on the I-CNOT line, or the SWAP†-SWAP line, span the red area, whereas one gate on each line spans the green area.

FIG. 9: The area spanned by 2 SWAP family gates. Two gates on the SWAP†-I-SWAP line span either the red area (I-SWAP-SWAP†) or the green area (I-SWAP-CNOT or I-SWAP†-CNOT).

Proposition 3. Any good graph with $n \ge 3$ vertices should have at least $\frac{15n-3}{14}$ edges.
Proof. No parallel edges: We first consider graphs without any parallel edges.

FIG. 10: Infidelity via polynomial regression against simulated infidelity of SQiSW gates. The black dashed line is the $x = y$ line for visual effect. The root mean square error of the regression is on the order of $10^{-9}$.

FIG. 11: Left: The permutation feature importance for the polynomial regression in Fig. 10. Only features with importance > 0.01 are included in the figure. Right: Comparison of features' contributions to the infidelity of SQiSW and iSWAP with the parameters $g_{yy}/h = 25$ MHz, $T_1 = 100$ µs, $T_\phi = 100$ µs, $g_{zz}/h = -0.3$ MHz, $\Delta_p/h = 0.18$ MHz.

rotation of single-qubit, and can be realized with a single microwave pulse. The duration of each pulse for single-qubit operation is set to 10 ns, which gives $T_R = 20$ ns. We measure $P_{01}$ as a function of ϕ := ϕ − ∆ · δt. The experimental data and a fit to Eq. (13) yielding $\theta/\pi = 0.2507$ and $\zeta/\pi = -0.2992$ are shown in Fig. 13. To further reduce the error in $\theta$, we fix the pulse amplitude $\Phi_{\mathrm{amp}}$ and sweep the gate duration over a small range. We measure $\theta$ and $\zeta$ for each specified gate duration. The measured results versus gate duration are plotted in Fig. 14. Both $\theta$ and $\zeta$ vary almost linearly with gate duration. By interpolating the data, we can extract a gate duration of $T = 24.64$ ns and $\zeta/\pi = -0.2709$ when $\theta$ equals $\pi/4$. In order to facilitate the experiment and reduce the impact of possible flux distortion, we add an idle length to each flux pulse to make the gate duration $T = 30$ ns.

FIG. 15: (a) Ramsey-type experiment under $N = 1$; the measured $P_{00}$ oscillates versus the phase of the second half-π pulse. (b) The extracted phase as a function of gate number $N$.

FIG. 18: iFRB fidelity value projected onto the Weyl chamber. Data points are taken where $\eta_x$ are multiples of $\pi/20$, and $\eta_y$ and $\eta_z$ are multiples of $\pi/60$. Each data point is collected using iFRB on a gate with the Weyl chamber coordinate, with a randomly chosen set of single-qubit operations applied before and after. For demonstration, we consider a simple error model: each gate is followed by a depolarizing channel with error rate $p_{\mathrm{iswap}} = 2p_{\mathrm{sqisw}} = 0.005$, $p_{\mathrm{single}} = 0.0005$. It can be seen from the figure that the effects of the randomly chosen single-qubit operations are negligible, as the predominant error sources are the two-qubit gates.

FIG. 20: $h_d$ as a function of $d$ for different depolarizing noise rates assuming we use SQiSW or iSWAP as our native two-qubit gate. We assume here a complete connectivity graph.

TABLE I. Comparison between SQiSW and conventional CNOT/iSWAP in terms of compiling Haar random two-qubit gates and random Clifford gates.

TABLE II. Average values and standard deviations of different benchmarking metrics for SQiSW and iSWAP over 14 experiments.
Supplemental Material for 'Quantum Instruction Set Design for Performance'

The region $W = W(S^2)$ spanned by 2 SQiSW gates. It is a pyramid with vertices I, CNOT, ( π

FIG. 3: Illustration of decomposition of a two-qubit gate in $W(S^2)$ into two SQiSW gates up to local equivalence. The special form of the interleaving single-qubit gates is due to the proof that the three parameters $\alpha, \beta, \gamma$ running over $[0, 2\pi]$ are sufficient to generate the whole region $W$.
    return S || DECOMP(U†)    ▷ Append the recovery gate
end procedure
procedure GENRANDGATE    ▷ Generate a Haar random SU(4) gate and its corresponding decomposition into SQiSW sequence