Fast logic with slow qubits: microwave-activated controlled-Z gate on low-frequency fluxoniums

Quentin Ficheux, ∗ Long B. Nguyen, ∗ Aaron Somoroff, Haonan Xiong, Konstantin N. Nesterov, Maxim G. Vavilov, and Vladimir E. Manucharyan Department of Physics, Joint Quantum Institute, and Center for Nanophysics and Advanced Materials, University of Maryland, College Park, MD 20742, USA Department of Physics and Wisconsin Quantum Institute, University of Wisconsin Madison, Madison, WI 53706, USA (Dated: November 6, 2020)


I. INTRODUCTION
Macroscopic superconducting circuits have emerged as a leading platform for implementing a quantum computer [1]. Currently available small-scale superconducting quantum processors [2][3][4][5][6][7][8][9] have achieved a number of important milestones, including the break-even point in quantum error correction of a single logical qubit [10], digital quantum simulation [11][12][13][14][15][16], non-trivial optimization algorithms [17][18][19], and, an example demonstration of quantum supremacy with 53 qubits [20]. This progress is even more spectacular as it is solely based on a single qubit type, the transmon, which is essentially a weaklyanharmonic electromagnetic oscillator [21]. Although the transmon's simplicity makes it an exceptionally robust quantum system, its relatively weak non-linearity has become a major limiting factor for current performance and further scaling of quantum processors. Irrespective of implementation details, a weak anharmonicity inevitably leads to slower two-qubit gates, which makes them prone to decoherence errors. Therefore, a motivation has built up for exploring strongly-anharmonic alternatives to transmons that would ideally have higher intrinsic coherence and be compatible with the transmonbased scaling architectures.
In recent years, long coherence times (up to T 2 = 500 µs) were repeatedly observed in superconducting fluxonium qubits [22,23]. In this paper we describe the first logical operation on a pair of capacitively-coupled fluxoniums [ Fig. 1(a)]. Each fluxonium has one weak Josephson junction and a large capacitor, which are the * Equal contribution author same components as in the transmon, but the junction is additionally shunted by a large-value inductance [24], made from a chain of about 100 stronger junctions. Inductive shunting makes the charges on the fluxonium capacitors continuous, and hence, contrary to transmons, fluxoniums can have highly anharmonic spectra insensitive to offset charges. We operate fluxoniums near the half-integer flux bias ("sweet spot"), where the qubit transition frequency is first order insensitive to flux noise and belongs to a 100 − 1000 MHz range. The order of magnitude qubit slowdown, compared to typically 5 − 6 GHz transmons, dramatically reduces the rate of energy relaxation due to dielectric loss, an important decoherence mechanism in superconducting circuits. This straightforward trick proved largely responsible for the long coherence time of fluxoniums [22]. However, how can one slow qubits down without ending up with slower two-qubit gates?
To answer this general question, we consider the simplest form of circuit-circuit coupling by means of a mutual capacitance. In analogy with transmons, we get a coupling term proportional to n A n B , where n A and n B are the charge operators of qubits A and B, normalized to the Cooper pair charge 2e. First, we notice that capacitive coupling indeed produces little effect on the computational states |00 , |01 , |10 , |11 , because the transition matrix elements of n A(B) vanish with the transition frequency. For the same reason, the much higher energy non-computational states |12 and |21 can experience a noticeable repulsion, while the only nearby states |20 and |02 remain unaffected due to the parity selection rule, see Fig. 1(b). Therefore, connecting the two subspaces with radiation can induce an on-demand qubit-qubit interaction. For example, Ref. [25] describes a controlled-Z (CZ) gate obtained by applying a 2π-pulse to transition |11 − |21 while the closest transition |10 − |20 stays unexcited. This condition can always be met if the gate pulse is much longer than 1/∆, where ∆ is the shift of level |21 due to the n A n B -term, shown in Fig. 1(b). Yet, we show that with an optimal combination of drive detuning and amplitude, the CZ-gate can be completed in a time close to 1/∆ with zero leakage outside the computational subspace, thanks to synchronization of Rabi rotations of both non-computational transitions. For our specific device parameters (see Tables I,  II), we get ∆ = 22 MHz (1/∆ = 45.5 ns) and the optimal gate time is 61.6 ns. Remarkably, since ∆ is not tied to the qubit frequencies (here 72.3 MHz and 136.3 MHz), the logical operation takes just a few qubit Larmor periods. To our knowledge, such a high relative gate speed is unmatched across quantum computing platforms.
Population transit through non-computational states is common in gates realized with transmons [26][27][28][29][30][31][32][33]. For example, repulsion of states |11 and |20 enables a CZgate via flux-tuning of these states in and out of resonance [27]. Recent high-fidelity versions of this gate rely on diabatic flux pulses [30,33], resulting in a significant population of state |20 for a short time, which draws parallels to our microwave-controlled scheme. In the case of fixed-frequency qubits, repulsion between states |30 and |21 was used in Ref. [34] to implement a CZ-gate activated by a microwave pulse at a frequency near the transition |11 − |12 . However, the insufficient transmon anharmonicity introduces many nearby transitions (there is only one relevant transition |10 − |20 for fluxoniums), and in the end such a gate proved impractical. Even in gates designed to operate entirely within the computational subspace, e.g. the flux-activated |10 − |01 swap gate [35][36][37] or the cross-resonance gate [38,39], uncontrolled population leakage to non-computational states remains an important factor limiting gate speed [40]. Yet, such coherent errors can be practically eliminated in fluxoniums owing to their highly anharmonic spectra, as exemplified by the gate scheme reported here.
Another important advantage of fuxoniums over fixedcoupling transmons is that the static ZZ-shift, coming from the repulsion of computational and noncomputational states, can be suppressed by over an order of magnitude compared to typical values in capacitively coupled transmons (here about 40 kHz), largely thanks to the low qubit frequencies. For transmons, the static ZZshift is an important source of gate error, the mitigation of which draws additional resources. Thus, in the case of the cross-resonance gate, the ZZ-term is suppressed by a combination of circuit parameter matching, additional echo pulse sequences incorporated into the gate protocol [40], and additional qubit rotations [41]. An alternative strategy to eliminate the ZZ-shift is to use flux-tunable couplers [20,42,43], which in practice act as separate quantum systems and hence increase the circuit complexity. More recently, the ZZ-shift was suppressed in capacitively-shunted flux qubits using an additional drive, but at the expense of operating away from the flux  [22] except the antennas are intentionally made asymmetric for optimal coupling to the readout cavity [not shown]. (b) Diagram of the lowest energy states of the interacting twofluxonium system. Capacitive coupling induces a shift of level |21 by ∆ due to repulsion from level |12 . The shift ∆ leads to a CZ-operation in time approximately given by 1/∆ when either qubit is driven at a frequency in between the transitions |10 − |20 and |11 − |21 . sweet-spot [44].
The gate error in our scheme is largely limited by decoherence outside the computational subspace. Randomized benchmarking yields the averaged gate error of 8 × 10 −3 , which is consistent with the measured few microseconds coherence time of transitions |10 − |20 and |11 − |21 (see Table II). Because these transitions are transmon-like, we expect their coherence will improve by an order of magnitude in the next generation experiments with improved fabrication and noise filtering procedures. This step would lower the gate error into the 10 −4 -range. The presently achieved fidelity is on par with the lowest reported values in microwave-activated gates [41,45] (cross-resonance gate by IBM). Additionally, our gate is considerably faster than cross-resonance type gates with comparable errors [40,46]. Our result illustrates the potential of highly-anharmonic circuits for quantum information processing and it motivates the exploration of large-scale quantum processors based on fluxoniums.
The paper is organized as follows, in Sec. II we describe the details of our experimental setup, including spectroscopy of the two-fluxonium device, the joint singleshot readout of fluxoniums, and state initialization procedures. In Sec. III we detail the concepts behind our fast microwave-activated CZ-gate. Section IV presents the results of gate characterization, including quantum process tomography and randomized benchmarking. In Sec. V we review the technical limitations of the present experiment, and project the near term improvements. Section VI concludes the work.

II. TWO-FLUXONIUM SYSTEM CHARACTERIZATION
Our device is composed of two fluxonium artificial atoms with a circuit design introduced in [22] coupled via a shared capacitance [see Fig. 1(a)]. The system obeys the Hamiltonian [25,47] , and E J,α are the charging energy, the inductive energy and the Josephson energy of fluxonium indexed by α = A, B, respectively. The operatorsφ α andn α are the phase twist across the inductance L α and the charge on the capacitor C α , and they commute according to [φ α ,n α ] = i. The experimentally extracted parameters of the Hamiltonian (1) are given in Table I. When biased at φ ext,α = π, the fluxoniums are at their sweet spots with respect to external flux. The qubit frequencies f A = 72.3 MHz and f B = 136.3 MHz are almost two orders of magnitude lower than the typical transition frequency in cavity resonators and transmon qubits [48]. These low frequency transitions exhibit a long energy relaxation times T 1,A = 347 µs and T 1,B = 282 µs owing to their decoupling from dielectric loss mechanisms [22]. A slight non-uniformity in the magnetic field, provided by a single external coil, prevents biasing qubits precisely at their sweet-spots simultaneously, with the offset being about 0.14% of the flux quantum. We operate at a bias coil current of 33.7 µA, approximately halfway between the two sweet spots, which reduced the spin echo coherence times to T E 2,A = 31 µs and T E 2,B = 64 µs (see Table II) compared to their sweet spot values of 47 µs and 67 µs, respectively. The sweet-spot values of coherence times are likely limited by photon shot-noise due to imperfect thermalization of the readout cavity [49]. This common issue can be improved with better filtering of the measurement lines. We note in advance that the small deviation from the sweet-spots does not modify appreciably the operation of our two-qubit gates, and the qubit coherence times are not limiting the gate error.
Higher energy states are separated from the computational states by an approximately 4.5 GHz gap (  (Table I)  the interaction creates a frequency splitting between transitions |10 − |20 and |11 − |21 of ∆ = 22 MHz. Note that the extracted value of J C corresponds to the mutual capacitance between the qubit circuits of about 1.2 fF, which is comparable to the typical mutual capacitance used for coupling transmons in quantum processors. Owing to the matrix element hierarchy n α 0−1 n α 1−2 , where n α i−j = | i|n α |j | is the charge matrix element of fluxonium α, the transverse interaction in the computational subspace is negligible. Interaction involving higher levels also introduces a static longitudinal ZZinteraction in the computational subspace [42,44] given by ξ ZZ = (E 00 + E 11 − E 01 − E 10 )/h = 46 kHz, where E ij is the energy of state |ij . We note that this value of ξ ZZ is at least an order of magnitude smaller than in typical transmon circuits and does not require extra spectrum optimizations beyond having low qubit-transitions  1) in the presence of a Gaussian flat-topped pulse and without decoherence. For every gate duration, drive frequency and amplitude are optimized to minimize the coherent leakage error (dashed line). The dots represent the gate error caused by incorrect phase accumulation ∆ϕ = ϕ11 − ϕ10 = π. With the optimal pulse duration around ∼ 62 ns, the infidelity due to coherent errors is well below 10 −3 .

frequencies.
The device is embedded in a 3-dimensional copper cavity with a resonant frequency of f c = 7.4806 GHz. Prior to each experiment, the qubits are initialized by applying a strong microwave pulse on the cavity (see Appendix C 3). This procedure increases the excited state probability of the qubits to 86% and 88% which is sufficient for metrology of gate operations. The remaining qubit entropy corresponds to temperatures of T A = 3.7 mK and T B = 5.9 mK well below the fridge temperature of 14 mK. We achieve a single-shot joint readout of the two qubits by sending a cavity tone of optimal frequency, power, and duration. The outgoing signal is further amplified by a Josephson Traveling Wave Parametric Amplifier (JTWPA) [50] and commercial amplifiers before down-conversion and numerical demodulation. We finally correct for readout imperfections by a procedure described in Appendix C 2.

III. FAST CZ-GATE BY EXACT LEAKAGE CANCELLATION
In this section, we describe how to exploit the fluxonium interaction to implement a CZ gate in the shortest possible time. Let us start by considering the gate transitions |10 − |20 and |11 − |21 as two-level systems. Our gate exploits the geometric-phase accumulations during round trips in these two systems. The accumulated phase can be divided into two parts: the dynamical phase, which is proportional to the evolution time and the energy of the system, and the geometric phase, which depends only on the trajectory followed in the Hilbert space. Applying a microwave tone drives the system with the Hamiltonian When the drive frequency f d is nearly resonant with the gate transitions, we observe Rabi oscillations [ Fig. 3(a))] with the resonance Rabi frequencies Ω 10−20 = | 10|Ĥ drive |20 |/h and Ω 11−21 = | 11|Ĥ drive |21 |/h, respectively. A strong hybridization of the |12 and |21 states creates an imbalance between the rotation speeds given by the ratio r = Ω 11−21 /Ω 10−20 1.36.
In order to eliminate leakage to higher states, one needs to synchronize off-resonance Rabi oscillations determined by the two transitions to ensure that the state vector always comes back to the computational subspace. This is achieved by matching the generalized Rabi frequencies where δ = f 11−21 − f d is the detuning between the |11 − |21 transition and the drive frequency [ Fig. 3

(a)].
A full rotation is then preformed in the shortest gate time t gate = 1/Ω. During the gate operation, the state vector trajectory can be depicted in a Bloch sphere representation [ Fig. 3(b)] when the system starts in |10 or |11 . These circular trajectories, which are traveled in opposite directions with respect to the centers of the Bloch spheres, define two cones inside the spheres. The cones and directions of travel define the solid an- /Ω] and Θ 11 = 2π(1 + δ/Ω), which correspond to a geometric phase accumulation ϕ ij = −Θ ij /2 on state |ij . Our gate thus implements a unitary operation U = diag(1, 1, e iϕ10 , e iϕ11 ). Using virtual Z rotations [51], the phase difference can be assigned to any computational state such as |11 to realize a controlled-Phase operation U = diag(1, 1, 1, e i∆ϕ ) with ∆ϕ = ϕ 11 − ϕ 10 . A CZ gate is obtained when ∆ϕ = −(Θ 11 − Θ 10 )/2 = −π∆/Ω = ±π. Using this condition in Eq. (3) we obtain the optimal drive frequency [brown arrow on Fig. 3 For δ given by Eq. (4) and Ω = ∆, a CZ gate with zero leakage is achieved in time exactly t gate = 1/∆. Our understanding of the gate process is validated by simulating the full Hamiltonian Eq. (1) in the presence of the drive Hamiltonian, Eq. (2). The full Hamiltonian takes into account the dynamical phase φ dyn = 2πξ ZZ t gate 10 −2 π which is negligible after one gate operation thanks to the small ZZ-interaction term. Starting with the driving frequency given by Eq. (4), we minimize the final populations leakage out of the computational subspace by adjusting the drive amplitude and frequency for every gate duration. Because of the rising and lowering edges of the pulse in the simulation, we find a slightly longer optimal gate duration ∼ 61 ns (compared to 1/∆ = 45.5 ns) for which coherent errors on the gate fall below < 10 −3 , see Fig. 3(c). Note, even at the error level of 10 −4 , the optimal point does not require an excessive fine-tuning of the gate pulse parameters: it corresponds to a fraction of a percent variation in terms of the gate time, gate pulse amplitude, and frequency (see Appendix F). Our procedure can readily be extended to any other phase accumulation by imposing a different phase-accumulation condition leading to a different gate duration and modified single-qubit virtual Z rotations.

IV. GATE CALIBRATION AND METROLOGY
Using our understanding from the previous section, we start with the following initial gate parameters: we use a 60 ns flat-top Gaussian pulse with a driving amplitude corresponding to a full rotation at the gate frequency given by Eq. (4). To measure the accumulated phase difference ∆ϕ, we perform a Ramsey-type experiment on qubit B to compare the phases of the superpositions |11 + |10 and |01 + |00 after the application of a CZ gate. This is achieved by inserting a CZ gate between two successive π/2 pulses with a relative phase β created by a virtual Z rotation [51]. When the angle β is varied, we observe Ramsey fringes with an initial phase encoding the excited state probability of the control qubit A. The phase difference when qubit A is prepared in the ground or excited state yield the relative phase accumulation ∆ϕ/π 0.95 [in Fig. 4(a)] close to the phase required for the CZ gate.
In order to gain more insight in the physics of our gate, we perform quantum process tomography by preparing 16 independent input states, applying the CZ gate to each of them before performing their state tomography.  The state tomography is obtained by a maximum likelihood estimation similar to the one in Ref. [52] using an overcomplete set of 36 tomography pulses. We represent the quantum process [53] in the Pauli basis through the process tomography matrix χ [46] [ Fig. 4(b)] and adjust the single qubit phases with virtual Z rotations to attribute the relative phase accumulation to the |11 state only. We obtain process fidelity F QPT = 0.97. This result shows that our scheme can be used to implement a CZ gate but the value of the fidelity is probably dominated by state preparation and measurement (SPAM) errors (Appendix C 2 and D) and cannot be used to optimize the gate parameters further.
Randomized benchmarking (RB) provides a reliable and robust metric of gate fidelity. The sequence is composed of m randomly chosen Clifford operations followed by a recovery gate aimed at bringing back the quan- tum state to the initial state. The average population of |11 state decays as a + bp m + c(m − 1)p m−2 where m is the number of Clifford operations, p is the depolarizing parameter and a, b, c are fitting parameters used to absorb state preparation and measurement errors [54]. The average fidelity of a Clifford operation is given by where d = 2 n is the dimension of the Hilbert space with n = 2 the number of qubits. Interleaving a target gate yields a decay with depolarizing parameter p gate and a gate error of 1−F = d−1 d (1−p gate /p). To reach the optimal gate fidelity, we perform a parameter search optimizing the sequence fidelity of fixed length interleaved randomized sequences [54] using a method known as 'Optimized Randomized Benchmarking for Immediate Tune-up' (ORBIT) [55] with a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [56]. In practice, we first measure an interleaved randomized benchmarking curve [see Fig. 5(b)] before fixing the number of gates to n = floor[−1/ log(p)] = 5 providing the optimal sensitivity. The survival probability p(|11 ) is maximized by adjusting the 6 gate parameters [ Fig. 5(a)]: amplitude, duration, width of the edges, frequency, and single-qubit rotation angles. Fig. 5(c-h) shows the stochastic evolution of gate parameters during 1700 search steps leading to an improvement of gate fidelity from 0.967 to 0.992 for final gate duration of 30.0(plateau) + 2 × 15.8(edges) = 61.6 ns. In our experiment, CMA evolution strategy leads systematically to the same (global) minimum contrary to non-stochastic algorithm such as Nelder-Mead method that gets more easily trapped in local minima.
Finally, we assess the quality of the optimized CZ gate by iterative interleaved randomized benchmarking [54]. We obtain a reference fidelity F Clifford2 = (96.0 ± 0.1)%.
Interleaving the CZ gate [yellow curve in Fig. 6(a)] yields a gate fidelity of F (CZ) = (99.2 ± 0.1)%. This measurement is repeated for an increasing number of interleaved CZ gates [ Fig. 6(b)] to separate the contribution of systematic coherent errors from decoherence effects [57]. In our experiment, the total error of the CZ gate repeated n times grows linearly with the number of gates [Fig 6(b)] demonstrating that the remaining error is incoherent and thus not correctable with coherent controls.

V. OUTLOOK
Although the benchmarked fidelity of our CZ-gate is already high, the current experimental setup contains a number of imperfections, most of which can be readily eliminated in the next generation experiments. Let us start the discussion by briefly summarising these imperfections.
Taking into account the Ramsey coherence times of transitions |10 − |20 and |11 − |21 (see Table II), one expects the gate error bounded by 0.7% (see Appendix F 2). This estimate is compatible with the gate error (0.8 ± 0.1)% measured with randomized benchmarking. We believe that the limited coherence time of our higherlevel transitions originates in the first order sensitivity to external flux noise caused by the sweet spot misalignment. This effect can be corrected by either improving magnetic field uniformity, likely through better magnetic shielding, or by using two independent coils. Importantly, fluxonium's |1 − |2 transition is very similar in term of decoherence mechanisms to a regular transmon transition [22] and we therefore expect no fundamental obstacles in reaching coherence times around 50−100 µs, on par with optimally fabricated flux-tunable transmons. In this case, the Hamiltonian simulations presented in Fig. 3 shows that coherent gate errors in the low 10 −4range are possible for our device without sophisticated pulse optimization.
Coherence time in the computational subspace can exceed 500 µs, given the measured energy relaxation times of 250 − 350 µs of states |10 and |01 . The proof-of-principle demonstration of T 2 ≈ 500 µs coherence time was reported in a previous single-fluxonium experiment. Yet, here the qubit coherence time is reduced to 50 − 60 µs, even at their flux sweet-spots. This is likely due to a residual average thermal photon population n th ∼ 5 × 10 −3 in the readout resonator. Coherence time of the gating transitions |10 − |20 and |11 − |21 will be limited eventually by thermal dephasing as well. This dephasing source is generic to all superconducting circuit experiments and it can be mitigated in the future with improved cryogenic filtering of the microwave measurement lines [58,59]. After this technical task is accomplished, the qubits in our gate scheme would be considerably better than transmons at storing quantum information in between the two-qubit gate operations.
Turning to our microwave packaging choice, we used a single input port in a single 3D-box resonator to perform all the gates, for the sake of technical simplicity (see Appendix B). In Appendix D, we characterize the reduction of addressability in the system and find that simultaneous single qubit gates have about a ∼ 3% error rate dominated by classical cross-talks. Using a dedicated driving port per qubit would enable selective addressing of each qubit and hence improve significantly the single qubit gates fidelity [60]. More generally, individual addressing and readout is essential for scaling beyond the two-qubit experiments. We believe that scaling of our gate scheme can be done by simply moving to 2D-circuit designs, in complete analogy with processors based on capacitively-coupled transmon circuits [2][3][4]. Indeed, our experiment does not benefit from the usually high quality factors of 3D-resonators, and hence it is compatible with a traditional 2D-circuit technology without conceptual modifications. The only explicit price of replacing fixed-frequency transmons by fluxoniums would be the requirement of either a highly homogeneous global magnetic field or a static flux-bias line per qubit.
As a final remark, we stress that our CZ-gate does not require excessive parameter matching, unlike the majority of microwave-activated gates [34,61]. For example, the level diagram in Fig. 1 has no fine-tuned transitions, and the described gate protocol would work for a large class of fluxonium spectra and interaction strengths. Essentially, one only need to make sure the states |12 and |21 are not too far from each other (around 300 MHz here) and that qubit frequencies are well resolved (about 70 MHz here). This property can mitigate the effects of fabrication imperfections and improve qubit connectivity in the large-scale quantum processors.

VI. CONCLUSIONS
In this paper, we demonstrated that low-frequency fluxonium qubits are not only good at storing quantum information, but they also allow for fast and high-fidelity logical operations with a minimal engineering overhead. Our implementation of the microwave-activated CZ gate can be decomposed into two ideas applicable to other quantum systems, each harnessing the strong anharmonicity of fluxoniums. First, the energy scale ∆ limiting the gate speed comes from the repulsion of noncomputational states (here the repulsion of states |21 and |12 ), and hence the gate time ∼ 1/∆ can be made, in principle, shorter than the qubit's Larmor periods. Second, by synchronizing Rabi rotations of the only two relevant non-computational transitions (here |10 − |20 and |11 − |21 ), a conditional geometric phase is accumulated at the end of each Rabi period with zero leakage outside the computational manifold. Notably, the synchronization is possible with a single microwave pulse of proper amplitude, frequency, and duration, applied to either qubit. We adjusted the conditional phase to π to get the CZ-gate, but the phase can be changed to any other value, resulting in the CPhase-gate.
We exhaustively characterized the CZ gate using both quantum process tomography and randomized benchmarking techniques. While the current combination of gate time, 61.6 ns, and gate infidelity, 8 × 10 −3 , is already competitive with gates on transmons, our analysis indicates that the error can be reduced by an order of magnitude on improving the fabrication procedures, magnetic shielding, and line filtering. Given the compatibility of our capacitive coupling scheme with transmonbased scaling technology, we believe that all necessary demonstrations have been made to start exploring largescale fluxonium-based quantum processors.  Qubit pulses are directly generated by an Arbitrary Waveform Generator (see Fig. 7) instead of modulating a radio frequency tone. Our ability to multiplex and synthesize qubit pulses with a single digital-to-analog converter significantly reduces the hardware cost of the experiment. We use the internal mixing and pulse modulation capabilities of two Rhode and Schwarz®SMB100A sources to generate the readout pulse and the two qubit operations. The output signal is amplified by a Josephson Traveling Wave Parametric Amplifier (JTWPA) provided by Lincoln Labs [50] followed by commercial amplifiers before heterodyne detection.
Appendix C: Initialization and readout 1. Single-shot readout Both fluxoniums are strongly coupled to a 3D cavity. We perform a joint readout [27,62] of the qubit states. In the dispersive regime [63], the readout operator can be written where β ij are complex coefficients and σ i are Pauli matrices. In the single-shot limit, the integrated heterodyne signal distribution can be modeled as the sum of four Gaussian distributions associated to the computational states |00 , |10 , |01 , |11 (see Fig. 8). Surprisingly, we observe that the populations (Fig. 8) extracted from fitting the readout distribution by four Gaussians are affected by the readout amplitude and duration indicating that our readout scheme is not Quantum Non-Demolition (QND) [64][65][66]. Fig. 9 represents the evolution of the population with the duration of a cavity pulse preceding the readout pulse. Single qubit pulses are generated directly at the qubit frequencies using one analog output port of a Tektronix®Arbitrary Waveform Generator AWG5014C (not represented). The signal is combined with CZ gate pulses and readout pulses before reaching the input port of the 3D cavity. The outgoing signal is amplified using a Traveling Wave Parametric Amplifier followed by cryogenic and room temperature amplifiers before down-conversion by a local oscillator, digitization by an Alazar®acquisition board (not represented), and numerical demodulation. Although recent studies demonstrated the QND readout of granular aluminum fluxonium circuits with hundreds of photons [67], the non-QNDness of the readout of junction based fluxoniums was observed in the past either directly [68] or in the form of a qubit lifetime suppression by readout photons [69] even for average stationary photon number of the order of unity. These mechanisms are not investigated here and will be the subject of further studies.

Readout cross-talks compensation
We adopt an empirical readout cross-talk compensation introduced in Ref. [37]. This approach was used to compensate incorrect state mapping during a bifurcation readout but we believe that it is relevant in our case. First, we compensate for an incorrect mapping |0 → |1 or |1 → |0 of qubit α by correcting qubit population of states |ij with a tensorial products of two In addition, we use a pure cross-talk matrix which takes into account the possibility to swap excitations between the qubits during the readout process. The total readout correction applied to the qubit populations reads    p 00 p 10 p 01 p 11 where p ij are the corrected qubit populations, b = 7% and c = 3.5%. The impact of this calibration is best exemplified when performing Rabi experiments. After the calibration (Fig. 10 right column), we observe that the oscillations are centered around 0.5 (center of the Bloch sphere) and only the targeted qubit displays oscillations. Finally, we remind the reader that the two-qubit gate fidelities quoted in the main text are not affected by this procedure since randomized benchmarking is not sensitive to readout errors [70].

Initialization
We describe how we turned the qubit transitions induced by cavity photons observed in Fig. 9 to our advantage to initialize the system. The rate of these transitions is known to generally increase with the number of circu- lating photons in the cavity [67,69]. We thus expect to be able to induce incoherent transition rates between the qubit states.
We model the photon-induced relaxation using the incoherent rate equations where the total excitation and de-excitation rates are Γ A ↑ = 38.6 ± 0.6 kHz, Γ A ↓ = 6.4 ± 0.6 kHz and Γ B ↑ = 53.8 ± 0.3 kHz, Γ B ↓ = 7.4 ± 0.3 kHz. Notably, increasing the number of circulating photons by a factor 2.3 2 ∼ 5 (from Fig. 9 to Fig. 11) increases the energy relaxation rates Γ A ↓ + Γ A ↑ and Γ B ↓ + Γ B ↑ by about a factor ∼ 5 mainly by increasing the excitation rates. The amplitude of the initialization pulse was chosen to maximize the steady state purity while keeping the qubit populations in the computational space. This procedure leads to the preparation of the qubit states in a statistical mixture with the excited state probabilities p A 1 = 0.86 and p B 1 = 0.88 (see Fig. 11).
Appendix D: Single qubit gate fidelities Single qubit gate fidelities are calibrated by single qubit randomized benchmarking (see Fig. 12). Qubit A (B) gates are generated using Gaussian edge pulses with a total duration of 150 ns (75 ns). Reducing the gate duration of single qubit gates deteriorates the simultaneous randomized benchmarking fidelity because of cross-talks.
The addressability of each qubit can be characterized by the numbers in table III using the definitions in Ref. [60]. The reduction of addressability in our system is related to the fact that our control field aimed at one target qubit is also influencing the other qubit (classical cross-talk) due to our choice of a sample design with a single input port.   [60]. The error rates rα are extracted from the independent randomized benchmarking experiments of Fig. 12.
The error rate r α|β are extracted by looking at the depolarization of qubit α during a simultaneous randomized benchmarking experiment. The error metric of error on α due to the unwanted control of β is given by δr α|β = |rα − r α|β |. The correlations in errors are flagged by δp = pAB − p A|B p B|A where pAB is obtained from fitting the decay of qubit-qubit correlations p00 + p11.

Appendix E: Error bars
The error bars displayed in Fig. 6 are the standard deviations on the population obtained from the population extracted from the method described in Appendix C. All the curves in Fig. 6(a) are fitted simultaneously to a first order model a + bp(n) m + c(m − 1)p(n) m−2 where a, b, c are fitting parameters used to absorb state preparation are measurement (SPAM) errors, m is the number of Clifford gates and n is the number of interleaved CZ gates. The fidelity of a n iteration of a CZ gate is given by The error on the gate fidelity is then estimated by the propagation of error formula (E1) to obtain the error bars in Fig. 6(b). Similar procedures are used to obtain all the error bars given in the paper. and (2) with the additional account for a Gaussian flattopped pulse in the drive term (2). Thus, we multiply the drive term by a pulse-shaping function with the rising edge given by The lowering edge of the pulse is given by a symmetric expression at t width + t plateau < t < t gate , where t plateau is the duration of the flat part and t gate = 2t width + t plateau is the total gate duration. In all the simulations, we keep t width = 15 ns and vary t plateau in order to vary t gate . To match the measured ratio between resonance Rabi frequencies Ω 10−20 and Ω 11−21 , we choose A / B = 0.9 in the drive term (2).
We find the evolution operatorÛ by calculating the evolution of four basis computational states, projecting the result into the computational subspace, and performing single-qubit Z rotations as described in Ref. [25], which ensures that the only diagonal matrix element that has a nonzero phase is 11|Û |11 . We then calculate the average gate fidelity as [72] whereÛ CZ = diag(1, 1, 1, −1) is the diagonal operator for the ideal CZ gate. This expression corresponds to the randomized benchmarking fidelity [73]. When the real gate operator isÛ = diag(1, 1, 1, e i∆ϕ ), we find that 1 − F = (3/10)(1 + cos ∆ϕ). For a generic gate operator, we define ∆ϕ = arg 11|Û |11 and use the last expression to calculate the phase error, which is shown by dots in Fig. 3(c). Once projected into the computational subspace, the operatorÛ in Eq. (F2) is generally nonunitary, which describes coherent leakage to noncomputational levels. The average probability of such leakage is given by We show this leakage error in Fig. 13(a) as well as by the dashed line in Fig. 3(c). The total gate error calculated using Eq. (F2) is shown in Fig. 13(b) and by the solid line in Fig. 3(c). Results shown in these figures illustrate that we can achieve coherent gate and leakage errors below 10 −4 with the current amplitude and frequency stability of the experiment.

Decoherence effects
We estimate incoherent errors resulting from driving the |10 − |20 and |11 − |21 transitions as follows. We add the errors coming from cycling each transition independently. We first estimate the state fidelity for an arbitrary initial state when driving a full rotation on one of the transitions and average this fidelity over a set of 16 independent initial states before summing the two contributions. We solve the master equation by treating incoherent processes as a perturbation and write the density matrix as an expansion ρ = ρ (0) + ρ (1) + .... For the initial state |ψ(0) = |ψ ⊥ + c|11 , where |ψ ⊥ describes the amplitudes of the remaining computational basis states, we consider the Hilbert space with only three levels |ψ ⊥ , |11 , and |21 . Here, ψ ⊥ |ψ ⊥ = 1 − |c| 2 . We consider the drive Hamiltonian on the |11 − |21 transition in the interaction picture and under the rotating wave approx-imationĤ where we have introduced the basis |± = (|11 ±|21 / √ 2 that diagonalizesĤ Ω . The unitary evolution under this Hamiltonian results in which gives us the zeroth-order approximation to the density matrix ρ (0) (t) = |ψ(t) ψ(t)|. We find the firstorder correction ρ (1) (t) from the iterative master equation where the Lindblad super-operator Lρ = k L k ρL † k − (1/2)(L † kL k ρ + ρL † kL k ) describes nonunitary processes. We use the following collapse operators The first operator describes relaxation of the |11 − |21 transition with a rate Γ 1 = 1/T 1 (11 − 21), and the second operator describes pure dephasing with a rate Γ ϕ = 1/T R 2 (11 − 21) − 1/2T 1 (11 − 21) (see Table II). This gives the matrix element of the correction in the basis diagonalizingĤ Ω m ρ (1) (t) n = t 0 m Lρ (0) (t) n e −iνmn(t−t ) dt , (F8) where hν mn = E m − E n is the difference between eigenvalues ofĤ Ω corresponding to |m and |n . These eigenvalues belong to the set {±hΩ/2, 0}.
As a sanity check, we have also integrated numerically the master equation in the six-level Hilbert space that consists of the computational subspace and two upper levels |20 and |21 . We considered the Hamiltonian generalized from Eq. (F4) to include the drive of the |10 − |20 transition and to account for detunings between the drive frequency and two transition frequencies as described in Sec. III. Using collapse operators given in Eqs. (F7a) and (F7b) and similar operators for the |10 − |20 transition, we find the incoherent error to be 0.64%, which is close to the analytic estimate of 0.67%.