Demonstrating a Continuous Set of Two-qubit Gates for Near-term Quantum Algorithms

Quantum algorithms offer a dramatic speedup for computational problems in machine learning, material science, and chemistry. However, any near-term realizations of these algorithms will need to be heavily optimized to fit within the finite resources offered by existing noisy quantum hardware. Here, taking advantage of the strong adjustable coupling of gmon qubits, we demonstrate a continuous two-qubit gate set that can provide a 3x reduction in circuit depth as compared to a standard decomposition. We implement two gate families: an iSWAP-like gate to attain an arbitrary swap angle, $\theta$, and a CPHASE gate that generates an arbitrary conditional phase, $\phi$. Using one of each of these gates, we can perform an arbitrary two-qubit gate within the excitation-preserving subspace allowing for a complete implementation of the so-called Fermionic Simulation, or fSim, gate set. We benchmark the fidelity of the iSWAP-like and CPHASE gate families as well as 525 other fSim gates spread evenly across the entire fSim($\theta$, $\phi$) parameter space achieving purity-limited average two-qubit Pauli error of $3.8 \times 10^{-3}$ per fSim gate.


I. INTRODUCTION
Quantum computing is a potentially transformative technology, but challenges remain in identifying a path towards solving practical problems with a quantum advantage [1]. Continued progress towards this goal may be made on many fronts including qubit coherence or scalability [2,3], measurement or gate fidelities [4,5], and algorithmic improvements that reduce the required circuit depth through compilation [6]. In superconducting qubits, single-qubit gate errors are usually lower than two-qubit gate errors by a factor of two or more. Consequently, a typical strategy has been to demonstrate a minimally universal gate set consisting of arbitrary single-qubit rotations and a single two-qubit gate [7]. This is an efficient approach for some algorithms, e.g. surface code error correction, which compiles optimally with such a gate set [8]. However, many noisy intermediate-scale quantum (NISQ, [9]) algorithms require a more diverse set of two-qubit gates. A direct implementation of these gates could take the place of the six to eight single-qubit gates and three CZ_φ gates required per arbitrary two-qubit gate with an optimal decomposition into a minimally universal gate set [10].
In the NISQ era, we need the largest two-qubit gate set that may be implemented with high fidelity. A general two-qubit unitary gate allows independent control over the strength of the $\sigma_X \sigma_X$, $\sigma_Y \sigma_Y$, and $\sigma_Z \sigma_Z$ couplings between qubits, requiring both DC and microwave control of gmon qubits [11]. However, models of interacting particles typically conserve the number of excitations, corresponding to a simpler model where the $\sigma_X \sigma_X$ and $\sigma_Y \sigma_Y$ couplings have equal coefficients. This reduces the number of control parameters from three to two and eliminates the need for microwave control during an algorithm. This set of excitation-conserving gates has been appropriately termed the Fermionic Simulation, or fSim, gate set since it maps electron conservation in a chemistry problem to photon conservation in qubits [12]. An fSim gate can be defined with two control angles: θ, the |01⟩ ↔ |10⟩ swap angle, and φ, the phase of the |11⟩ state, with a matrix representation in the |00⟩, |01⟩, |10⟩, |11⟩ basis given by:

$$\mathrm{fSim}(\theta, \phi) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -i\sin\theta & 0 \\ 0 & -i\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & e^{-i\phi} \end{pmatrix} \qquad (1)$$

[...]ing from arbitrary flux control of gmon qubits. Notably, promising low-depth algorithms using this gate set have been proposed, including the quantum approximate optimization algorithm [13] and an algorithm for linear-depth circuits simulating the electronic structure of molecules [12]. Additionally, algorithms performed with just z-rotations and fSim gates enable error mitigation techniques including post-selection and zero-noise extrapolation [14], further improving this gate set's prospects on NISQ processors.

FIG. 1. Demonstration of the tunable coupling between gmon qubits. a, Pulse sequence used to measure swapping as a function of coupler bias. We initialize one qubit, perform an fSim gate, defined by a set of three flux pulses that control the qubit frequencies and the coupling between the qubits, and measure the population of the other qubit. b, We vary the fSim gate as a function of the length and amplitude of the coupler pulse to measure the swap rate as a function of the bias amplitude. c, By taking the Fourier transform of the oscillations in b, we extract the coupling strength, |g|, as a function of coupler bias. The coupling changes sign at the two "OFF" biases, ensuring we can turn the coupling off.
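As an illustration of the two-angle definition above, the following minimal Python sketch builds an fSim matrix from θ and φ. It assumes the convention with −i sin θ off-diagonal elements in the single-excitation block and a phase e^{−iφ} on |11⟩; the function names `fsim` and `is_unitary` are illustrative and not part of this work.

```python
import cmath
import math


def fsim(theta, phi):
    """Return a 4x4 fSim(theta, phi) matrix (angles in radians).

    Basis order: |00>, |01>, |10>, |11>. The sign convention used here
    (-i*sin(theta) off-diagonal, exp(-i*phi) on |11>) is an assumption.
    """
    c = complex(math.cos(theta))
    s = -1j * math.sin(theta)
    one, zero = 1 + 0j, 0j
    return [
        [one, zero, zero, zero],
        [zero, c, s, zero],
        [zero, s, c, zero],
        [zero, zero, zero, cmath.exp(-1j * phi)],
    ]


def is_unitary(u, tol=1e-12):
    """Check that U^dagger U equals the identity, element by element."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            acc = sum(u[k][i].conjugate() * u[k][j] for k in range(n))
            if abs(acc - (1 if i == j else 0)) > tol:
                return False
    return True
```

Under this convention, θ = 90° with φ = 0° gives a full |01⟩ ↔ |10⟩ swap with a −i phase, and θ = 0° with φ = 180° gives a controlled-Z gate.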
Here, we first demonstrate the strong flux-tunable coupling between gmon qubits, which we use to perform fast two-qubit gates. Then, to describe our calibration and control strategy, we use shallow circuits to illuminate the natural correspondence of the coupled transmon Hamiltonian and the fSim gate set. We use cross-entropy benchmarking (XEB, [15]) to characterize two linearly independent and continuous families of entangling gates: the iSWAP-like family corresponding to fSim(θ, φ ∝ θ²), and the CPHASE family corresponding to fSim(θ ≈ 0°, φ). We then combine these two continuous gate sets to calibrate and benchmark 525 fSim gates spread evenly across the entire (θ, φ) parameter space.

II. STRONG COUPLING WITH GMON QUBITS
The quantum processors used in this work each consist of four gmon transmon qubits in a chain, together with three couplers. Both the qubit frequencies and their coupling can be independently controlled, providing several advantages over fixed coupling designs [11,16,17]. Firstly, since we can turn off the coupling at any detuning, both qubits may idle and perform single-qubit gates while operating closer to their flux insensitive point. This improves dephasing and decreases our sensitivity to flux settling tails. Secondly, since entangling gates are performed by bringing the qubit states near resonance, idling the qubits closer together means that gates require much smaller dynamic detunings, further reducing the amplitude of flux settling tails [18,19]. Thirdly, since the on/off coupling ratio is not dependent on the maximum qubit-qubit detuning, we are able to increase the overall coupling strength enabling faster gates with reduced decoherence error.
In Figure 1 we characterize the qubit-qubit coupling strength as a function of the coupler flux bias. Using the pulse sequence in Figure 1a, we initialize one qubit, apply an fSim gate, and measure the population transferred to the other qubit. Each fSim gate is defined by the amplitude and duration of three nominally rectangular flux bias pulses. Two pulses control the qubit frequencies and set their relative detuning, ∆, while the third pulse controls the coupling strength between the qubits, g. In Figure 1b we repeat this pulse sequence using the qubit flux biases to place them on resonance (∆ = 0 MHz) while varying the coupler bias amplitude and the shared duration of all three pulses. By taking the Fourier transform of the oscillating population data in Figure 1b we extract the swap rate as a function of coupler bias, which is equivalent to twice the qubit-qubit coupling, g, plotted in Figure 1c. We measure g/2π = 6 MHz when the coupler is biased to zero $\Phi_0$, and a coupling exceeding −50 MHz as the coupler bias approaches $\Phi_0/2$. The net coupling changes sign between these two regions, ensuring we can turn the coupling off. During general operation, we idle and perform single-qubit gates with the coupler at the "OFF" bias and make excursions to stronger couplings ("ON" region) to perform fSim gates. In this work we use $g_{\max}/2\pi \approx -45$ MHz, which is three times stronger than is typical for fixed coupling devices.
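The extraction of the coupling from swap oscillations can be sketched in a few lines of Python. This is only an idealized illustration: it assumes noiseless on-resonance data of the form sin²(g t), uniform time sampling, and a plain discrete Fourier transform; the function names and the numbers in the usage note are hypothetical and not taken from the experiment.

```python
import cmath
import math


def swap_population(g_mhz, t_ns):
    """Ideal on-resonance transferred population, sin^2(g t), with g/2pi in MHz."""
    return math.sin(2 * math.pi * g_mhz * 1e-3 * t_ns) ** 2


def dominant_frequency_mhz(samples, dt_ns):
    """Return the non-DC frequency (in MHz) with the largest DFT magnitude."""
    n = len(samples)
    mean = sum(samples) / n
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k * j / n)
                    for j, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k / (n * dt_ns) * 1e3  # cycles per ns converted to MHz


def coupling_from_swaps(samples, dt_ns):
    """Estimate |g|/2pi in MHz as half of the swap-oscillation frequency."""
    return dominant_frequency_mhz(samples, dt_ns) / 2
```

For example, sampling `swap_population(20.0, t)` at 1 ns steps for 200 ns and passing the result to `coupling_from_swaps` recovers a coupling of about 20 MHz. Only the magnitude |g| is recovered this way; the sign of the coupling has to be inferred separately, as with the sign change between the two bias regions described above.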

III. COUPLED TRANSMON PHYSICS AND THE fSim GATE SET
In the absence of a resonant microwave drive, coupled transmon qubits naturally evolve within the excitation-preserving subspace. The specific time evolution is determined by three parameters: the qubit nonlinearity, η, the qubit-qubit frequency detuning, ∆, and the coupling between qubits, g. While η is fixed at 240 MHz by qubit capacitance, the gmon architecture allows for time-dependent control of both ∆ and |g| using DC to ≈ 200 MHz bandwidth flux waveforms. The qubit center frequency, $(f_{q1} + f_{q2})/2$, is a free parameter that may be used to avoid coupled two-level system (TLS) defects present in the frequency spectrum of either qubit [20-22].

FIG. 2. Exploring the parameter space of two-qubit gates. Each pixel represents one experiment. We use a set of 15 ns rectangular current bias waveforms to perform some fSim unitary by setting the qubit-qubit detuning, ∆, and the coupling strength, g. a, To identify the low-leakage gates described by the fSim model, we measure leakage by initializing the |11⟩ state and measuring the |02⟩ state. When the detuning is near the qubit nonlinearity, we observe the expected Rabi oscillations. b, We measure the conditional phase, φ, by performing a Ramsey experiment where we initialize one qubit with an X/2 gate and perform tomography to measure the difference in accumulated phase (φ) with and without initializing the other qubit to the |1⟩ state. By choosing combinations of ∆ and g as indicated by the CPHASE dash-dotted line (chosen as the low-leakage coupling strength from a), we are able to achieve any φ : [−180°, 180°]. c, We measure the swap angle, θ, by initializing the |01⟩ state and measuring the |10⟩ state. By placing the qubits on resonance and varying the coupling strength along the iSWAP-like dashed line, we are able to achieve any θ : [0°, 90°].
For simplicity, we limit our fSim control pulses to synchronous, nominally rectangular waveforms defined by four parameters: a shared length, typically 13 ns to 15 ns, and three control amplitudes that set g and ∆. While further pulse shaping may improve gate performance in the future, these basic waveforms were sufficient to approach the decoherence limit of our qubits, which have a $T_1$ of 25.3 ± 7.3 µs (supplement IX B).
The full fSim control model describes any low-leakage two-qubit unitary evolution with five parameters: θ and φ, discussed previously, in addition to three parameters describing single-qubit phases as detailed in the supplement (VII). Here, we focus on the parameters that describe the two-qubit interaction and use the three experiments described in Figure 2 to measure leakage to the |02⟩ state and map out the φ and θ control landscape (complete unitary tomography procedure outlined in supplemental section IX C). Each experiment follows the same pattern: initialize a relevant state, apply fSim control pulses, and then perform either population or tomographic measurements to extract the desired qubit's population or phase. Within the fSim model, leakage is the dominant error. In Figure 2a, we map out leakage by initializing |11⟩ and measuring the |0⟩ population of the lower frequency qubit as a proxy for leakage in the higher frequency qubit. In Figure 2b we explore the φ parameter space by performing a Ramsey experiment where we take the difference in the accumulated phase with and without the second qubit initialized to the |1⟩ state. Finally, in Figure 2c we explore the θ parameter space by initializing one qubit to the |1⟩ state and measuring the |1⟩ population of the other qubit after the fSim gate. The Rabi oscillation physics explored with these measurements is reproduced with fairly rudimentary numerics in supplemental VIII, but these experiments serve to demonstrate our fSim control strategy.

IV. BENCHMARKING iSWAP-like AND CPHASE GATES
The data presented in Figure 2 provides a map for implementing an arbitrary fSim: each pixel defines a set of three control amplitudes, and any control amplitudes yielding low leakage should result in a high-fidelity gate described by the fSim control model (Eq. 1). While it may be possible to perform an arbitrary fSim gate with a single set of flux pulses using either very strong coupling or more complex control waveforms, we have chosen to implement an arbitrary fSim gate as a composition of two continuous gate families using simple rectangular control pulses to minimize the gate length. The first gate family completes a diabatic |11⟩ ↔ |02⟩ swap to perform a gate with an arbitrary conditional phase, φ, using control amplitudes denoted by the dot-dashed line labeled "CPHASE" in Figures 2a and 2b. The dominant control angle in the CPHASE gate family is the conditional phase, but we do accumulate a small swap angle θ due to the strong coupling necessary to perform a fast CPHASE gate (θ ≤ 5° for a 13 ns CPHASE gate; this may be reduced by increasing the gate duration). The second gate family places the qubits on resonance (∆ = 0 MHz) and varies g to reach the desired swap angle, θ, using control amplitudes along the dashed line labeled "iSWAP-like" in Figure 2c. We have deemed this gate family "iSWAP-like" since the swap angle varies from θ : [0°, 90°] and because this gate accumulates a conditional phase φ ∝ θ² due to the dispersive interaction with the |02⟩ and |20⟩ states. Both of these gates are a subset of the fSim group individually, and, compiled together, they can reach the full fSim parameter space.

FIG. 3. Characterizing the iSWAP-like and CPHASE gate families with cross-entropy benchmarking. We plot the optimized fSim control angles, θ and φ, on the left y-axes and the Pauli gate error per two-qubit gate on the right y-axes, conservatively assuming $7.5 \times 10^{-4}$ single-qubit Pauli gate errors. a, Characterization of the CPHASE gate family corresponding to fSim(θ ≈ 0°, φ). Each gate is 15 ns long, consisting of control pulses that vary the qubit detuning, ∆, around the qubit nonlinearity, η, with a coupler bias amplitude chosen to complete one full swap: |11⟩ → |02⟩ → |11⟩. We measure an average two-qubit Pauli error of $1.9 \times 10^{-3}$ for the CPHASE family. b, Characterization of the iSWAP-like gate family corresponding to fSim(θ, φ ∝ θ²). Each gate is 13 ns long, consisting of control pulses that place the qubits on resonance and vary the coupling strength, |g|, to achieve an arbitrary swap angle θ between the |01⟩ and |10⟩ states. We measure an average two-qubit Pauli error of $1.2 \times 10^{-3}$ for the iSWAP-like family.
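Within the idealized two-angle model (single-qubit phases set to zero), composing two fSim gates simply adds their swap angles and conditional phases, so a small residual θ from the CPHASE gate or a residual φ from the iSWAP-like gate can be absorbed into the target of the other. The sketch below checks this numerically; the matrix convention, helper names, and example angles are assumptions chosen for illustration.

```python
import cmath
import math


def fsim(theta, phi):
    """4x4 fSim matrix in the |00>, |01>, |10>, |11> basis (assumed convention)."""
    c = complex(math.cos(theta))
    s = -1j * math.sin(theta)
    return [
        [1 + 0j, 0j, 0j, 0j],
        [0j, c, s, 0j],
        [0j, s, c, 0j],
        [0j, 0j, 0j, cmath.exp(-1j * phi)],
    ]


def matmul(a, b):
    """Plain matrix product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def max_abs_diff(a, b):
    """Largest element-wise difference between two matrices."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))


# Hypothetical angles: a CPHASE-family gate with a small residual swap angle,
# followed by an iSWAP-like gate with a small residual conditional phase.
cphase = fsim(math.radians(3), math.radians(100))
iswap_like = fsim(math.radians(60), math.radians(12))
composite = matmul(iswap_like, cphase)  # CPHASE acts first
target = fsim(math.radians(63), math.radians(112))
residual = max_abs_diff(composite, target)  # zero up to rounding
```

Because the two factors commute in this model, the order of the CPHASE and iSWAP-like gates does not change the composite angles.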
In Figure 3 we characterize both the iSWAP-like and CPHASE gate families using cross-entropy benchmarking (XEB) [15]. On the left y-axes we plot the optimized values of θ and φ for a range of CPHASE and iSWAP-like gates, and on the right y-axes we plot the Pauli error per two-qubit gate (see supplemental IX B), achieving average errors of $1.9 \times 10^{-3}$ and $1.2 \times 10^{-3}$ for the CPHASE and iSWAP-like gate families, respectively.

V. BENCHMARKING fSim GATES
In Figure 4a we present the Pauli error of 525 distinct fSim(θ, φ) gates where the values of θ and φ have been constrained to be exactly the values indicated by the xy-coordinates at the center of each pixel (where ex situ optimization has been used only to optimize the single-qubit phases). Each 28 ns long fSim gate in Figure 4 is a composition of a 15 ns CPHASE gate followed by a 13 ns iSWAP-like gate. While the fSim fidelity is largely independent of the values of θ and φ, there are a few features of note. First, as discussed in supplement X C, we most directly calibrated line cuts at θ = 0°, 90° and φ = 180°. The regions of higher error where φ is near 0° (360°) involve the most extrapolation from the directly calibrated control amplitudes. Second, there is a faintly visible band of higher error near φ ≈ 240°, which we believe is due to a weakly interacting TLS defect in the spectrum of one of the qubits; in the future we hope to avoid such defects by shifting the frequencies of both qubits while maintaining their relative detuning. In Figure 4b we histogram these results in addition to the purity [23] per fSim and confirm a purity-limited average Pauli error of $3.83 \times 10^{-3}$ per fSim gate.

VI. CONCLUSIONS
We have implemented continuous iSWAP-like and CPHASE gate families with average Pauli error rates of $1.2 \times 10^{-3}$ and $1.9 \times 10^{-3}$, respectively. These fast (13-15 ns) gates take advantage of the strong, tunable, qubit-qubit coupling offered by our gmon transmon qubit architecture, achieving error rates more than a factor of two lower than the best previously reported two-qubit gates for superconducting qubits [24]. Additionally, we have combined these two gate sets to demonstrate a complete implementation of the two-qubit fSim gate set with an average Pauli error of $3.83 \times 10^{-3}$ per gate. This direct implementation of the fSim gate offers roughly an additional factor of three in compilation efficiency for NISQ algorithms over a minimally universal gate set.

ACKNOWLEDGMENTS
This work was supported by Google LLC. The UC Santa Barbara Nanofabrication Facility, part of the National Nanotechnology Infrastructure Network funded by NSF, fabricated the gmon qubits.

* These authors contributed equally to this work.

VII. FSIM CONTROL MODEL
A generic representation of a Fermionic Simulation (fSim) gate corresponding to a two-qubit photon-conserving unitary requires five parameters. We may separate out the single- and two-qubit parameters as follows: a |01⟩ ↔ |10⟩ swap angle, θ, a |11⟩ state conditional phase, φ, and three single-qubit phases, $\Delta_+$, $\Delta_-$, and $\Delta_{-,\mathrm{off}}$, yielding a generic fSim parameterization, fSim(θ, φ, $\Delta_+$, $\Delta_-$, $\Delta_{-,\mathrm{off}}$). We are interested in performing a two-qubit gate, which is independent of the single-qubit rotations. Therefore, we can focus on the matrix where $\Delta_+$, $\Delta_-$, and $\Delta_{-,\mathrm{off}}$ are all zero, leading to the notation fSim(θ, φ), used to designate an arbitrary gate within the excitation-preserving subspace.

VIII. FSIM GATE NUMERICS
The qubit dynamics presented in the main paper (Figure 2) are well described by numerics simulating two interacting qutrits (i.e. a pair of coupled three-level anharmonic oscillators) evolving with a time-dependent detuning, ∆(t), and coupling, g(t). We truncate the full two-qutrit Hamiltonian, limiting our simulation to states with one or two excitations. Operating with the basis |01⟩, |10⟩, |11⟩, |20⟩, |02⟩, the system is described by a Hamiltonian H(t), where η is the nonlinearity of each qubit, which we assume is the same for both qubits (240 MHz). Using this model, we may estimate the unitary operation enacted by arbitrary time-domain control of the coupling strength and the qubit detuning by discretizing these time-domain control waveforms and performing a time-ordered integral of H(t).
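The discretize-and-multiply procedure can be illustrated on the single-excitation block alone. The sketch below is not the five-state model used for the figures: it assumes a 2×2 Hamiltonian [[0, g], [g, ∆]] in the |01⟩, |10⟩ basis with g and ∆ in angular units, and the function names are hypothetical. Each time slice is propagated exactly and the slices are multiplied in time order.

```python
import cmath
import math


def step_propagator(g, delta, dt):
    """Exact exp(-i H dt) for the assumed H = [[0, g], [g, delta]]."""
    half = math.sqrt(g * g + delta * delta / 4.0)  # half the Rabi frequency
    phase = cmath.exp(-0.5j * delta * dt)  # from the (delta / 2) * identity term
    if half == 0.0:
        return [[phase, 0j], [0j, phase]]
    c = math.cos(half * dt)
    s = math.sin(half * dt)
    nx = g / half  # sigma_x component of the rotation axis
    nz = -delta / (2.0 * half)  # sigma_z component of the rotation axis
    return [
        [phase * (c - 1j * s * nz), phase * (-1j * s * nx)],
        [phase * (-1j * s * nx), phase * (c + 1j * s * nz)],
    ]


def propagate(g_of_t, delta_of_t, duration, steps):
    """Time-ordered product of piecewise-constant step propagators."""
    dt = duration / steps
    u = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for k in range(steps):
        t_mid = (k + 0.5) * dt  # sample each waveform at the slice midpoint
        step = step_propagator(g_of_t(t_mid), delta_of_t(t_mid), dt)
        # Later slices multiply from the left, which enforces time ordering.
        u = [[sum(step[i][m] * u[m][j] for m in range(2)) for j in range(2)]
             for i in range(2)]
    return u


def swap_probability(u):
    """Population transferred from |01> to |10> by the propagator u."""
    return abs(u[1][0]) ** 2
```

For a rectangular on-resonance pulse with g times the duration equal to π/2, `swap_probability` returns a full swap, and for a detuned rectangular pulse it matches the standard Rabi formula. Extending the same loop to the five-state basis would require the full Hamiltonian, which is not reproduced in this sketch.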
In Figure 5 we qualitatively reproduce the experimental results in Figure 2 by simulating 15 ns rectangular control pulses defining both g and ∆. In Figure 6 we illustrate the broadening effect that using shorter pulse lengths has on the Rabi interactions of both the |01⟩ ↔ |10⟩ and |11⟩ ↔ |02⟩ interactions by simulating rectangular pulses that are 10 ns, 15 ns, and 20 ns long. In Figures 6 and 7, we have omitted points where the leakage exceeds a 1% threshold, which identifies the parameter space where we can perform fSim gates with low error. Experimentally we have chosen to implement our CPHASE gates with 13 ns long rectangular pulses with a 1 ns pad on either side; when we made the gate length shorter, leakage increased (data not shown). Here, in Figure 6a, we qualitatively see that the width of the 1% leakage band where we perform the CPHASE gate begins to pinch off and the |2⟩ state Rabi interaction reaches all the way to the on-resonance iSWAP-like parameter space (dotted white line) when the gate length is 10 ns. Both these results qualitatively reproduce what we observed experimentally when attempting iSWAP-like gates shorter than 11 ns or the CPHASE gate shorter than 13 ns. Finally, in Figure 7 we simulate the effect of smoothing the control pulses by simulating 20 ns long coupler pulses that are rectangular, rectangular with 3 ns Gaussian smoothing, and cosine shaped (all detuning pulses are rectangular and have the same length). Here we see that smoothing reduces the extent of leakage from the second and third |11⟩ ↔ |02⟩ swap lobes, expanding the available low-error fSim control space. This indicates that pulse smoothing may be an important consideration for any future fSim implementation that aims to perform an arbitrary fSim using a single coupler pulse instead of the two discrete rectangular pulses we have used in this work.
FIG. 5. We simulate qubits with a fixed nonlinearity (240 MHz) with 15 ns long rectangular control pulses defining the qubit detuning, ∆, and their coupling, g.

IX. GATE CHARACTERIZATION
We use a variety of techniques to characterize the performance of our single and two-qubit gates which we detail in this section. In lieu of full process tomography, we use depth one population based measurements to perform unitary tomography to quickly assess the unitary operation performed by a given set of control pulses. We then turn to benchmarking techniques that amplify gate errors and allow for the characterization of small error rates. We use Clifford based benchmarking to characterize our single-qubit microwave gates and cross-entropy benchmarking (XEB) to characterize our two-qubit entangling gates.

A. Computing and reporting Pauli error rates
Before turning to gate characterization, a brief aside on Pauli error rates. We report Pauli error rates, which are independent of the Hilbert space dimension and thus add linearly as the circuit's Hilbert space grows. In the past, many have reported average single- and two-qubit error, $e_r$, as exponential decay constants of a sequence fidelity,

$$F = A e^{-m e_r} + B,$$

where A and B are fit parameters to compensate for state preparation and measurement (SPAM) errors, m is the number of gate repetitions in the sequence, and $e_r$ is the error per cycle. The Pauli error, $e_p$, is related to $e_r$ by the dimension of the Hilbert space:

$$e_p = e_r \left(1 + \frac{1}{D}\right),$$

where $D = 2^n$ is the dimension of the Hilbert space for an n-qubit gate. We note that this results in an increase in the reported error by a factor of 1.5 for single-qubit gates (n = 1) and by a factor of 1.25 for two-qubit gates (n = 2). When performing two-qubit XEB, we measure the exponential decay constant per cycle, $e_{r,\mathrm{cycle}}$, where each cycle consists of the application of one single-qubit gate per qubit and one fSim entangling gate involving both qubits. In order to extract the error per fSim gate, we convert this to a Pauli error per cycle, $e_{p,\mathrm{cycle}}$, and subtract off the two single-qubit Pauli gate errors, $e_{p,q1}$ and $e_{p,q2}$, which we estimate using single-qubit Clifford-based randomized benchmarking:

$$e_{p,2q} = e_{p,\mathrm{cycle}} - (e_{p,q1} + e_{p,q2}).$$

For simplicity, all two-qubit Pauli errors have been computed assuming single-qubit Pauli errors of $7.5 \times 10^{-4}$ per gate per qubit, consistent with our typical single-qubit error rates immediately following a successful run of our standard single-qubit gate calibration procedure (see supplement IX B).

FIG. 6. [...] where leakage is less than 1% (white regions are where leakage exceeded this threshold). Experimentally we chose to perform our CPHASE gate with 13 ns long pulses and the iSWAP-like gate with 11 ns control pulses (both of which had 1 ns pads on either side), as we found that shorter implementations of either gate increased leakage and the overall gate error. Here, these numerics demonstrate that for 10 ns long gates, the low-leakage lobe where we perform the CPHASE gate narrows considerably and the |2⟩ state Rabi interaction reaches the on-resonance iSWAP-like line cut near θ = 90°, both of which agree with our experimental results.
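The Pauli error bookkeeping described in this subsection can be written as a short Python sketch. It assumes the conversion factor (1 + 1/D), which reproduces the stated factors of 1.5 and 1.25, and uses illustrative function names that are not part of this work.

```python
def pauli_error(e_r, n_qubits):
    """Convert a decay-constant error e_r into a Pauli error, e_p = e_r * (1 + 1/D)."""
    dim = 2 ** n_qubits  # Hilbert space dimension D = 2^n
    return e_r * (1 + 1 / dim)


def two_qubit_pauli_error(e_p_cycle, e_p_q1=7.5e-4, e_p_q2=7.5e-4):
    """Pauli error per fSim gate: the per-cycle Pauli error minus the two
    single-qubit Pauli errors (defaults follow the 7.5e-4 assumption)."""
    return e_p_cycle - (e_p_q1 + e_p_q2)
```

As an illustrative input rather than a measured value, a per-cycle Pauli error of 5.33 × 10⁻³ with the default single-qubit errors corresponds to a two-qubit Pauli error of 3.83 × 10⁻³.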

B. Single-qubit coherence and gates
Qubit coherence, in conjunction with gate duration, places a lower bound on both our single- and two-qubit gate error rates. In Figure 8 we characterize $T_1$ for four qubits over a frequency range of 5 to 6 GHz. To perform this measurement we calibrate single-qubit gates, readout, and flux bias frequency control for a given qubit idle frequency. We then excite the qubit to the |1⟩ state and detune the qubit to another frequency for a variable amount of time before detuning back to the idle frequency for readout. For each detuned frequency, $T_1$ is extracted as an exponential decay of the population over time, $P_{|1\rangle} \propto A e^{-t/T_1} + B$, where A and B are fit parameters to compensate for state preparation and measurement errors. We find $T_1$ = 25.3 ± 7.3 µs averaging data from all four qubits over a frequency range of 5 to 6 GHz. Since $f_{\max}$ for the second qubit was anomalously low, we averaged data for this qubit from 5 to 5.61 GHz.

FIG. 7. Numeric simulation of a, a 20 ns rectangular coupler pulse, b, a 3 ns rise time rectangular pulse, and c, a cosine coupler pulse, showing the fSim parameter space where leakage is less than 1%. We observe that as the coupler pulses become smoother, the fSim parameter space where leakage is less than 1% expands considerably. This indicates that pulse shaping and/or smoothing may play an important role in any future implementation of the fSim gate set that aims to implement the gate set with a single pulse.
We use single-qubit purity [23] and Clifford-based randomized benchmarking [25,26] to characterize the average error of our single-qubit gates. In Figure 9 we present representative results for a pair of qubits demonstrating purity-limited (incoherent error-limited) performance. These gate errors drift over time, but immediately following a successful run of our standard calibration procedure we typically observe single-qubit error rates at or slightly higher than the $7.5 \times 10^{-4}$ level [27]. As such, we use this estimate in computing two-qubit error rates throughout this paper. These error rates are consistent with the coherence limit for $T_{\mathrm{gate}}$ = 15 ns and $T_1$ = 30 µs, giving $e_{p,\mathrm{inc}} \approx 1.5 \times T_{\mathrm{gate}}/(3T_1) = 2.5 \times 10^{-4}$, with the remainder of the error coming from leakage and $T_2$ [28].

TABLE I. [...] The two additional measurements ($u_{12,\mathrm{excited}}$ and $u_{22,\mathrm{excited}}$) are repeated measurements of $u_{12}$ and $u_{22}$ but with the other qubit placed into the excited state. This additional information is used to construct the conditional phase, φ. Columns: matrix element, initial state, measured qubit.
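The coherence-limit estimate quoted in this subsection is a one-line calculation, sketched here in Python. The function name is illustrative, and the expression simply mirrors the formula in the text.

```python
def incoherent_pauli_error(t_gate_s, t1_s):
    """Coherence-limited Pauli error estimate, 1.5 * T_gate / (3 * T1)."""
    return 1.5 * t_gate_s / (3.0 * t1_s)


# Values quoted in the text: a 15 ns gate and T1 = 30 microseconds.
estimate = incoherent_pauli_error(15e-9, 30e-6)
```

With these inputs the estimate evaluates to 2.5 × 10⁻⁴, below the 7.5 × 10⁻⁴ single-qubit error assumed elsewhere, with the difference attributed in the text to leakage and $T_2$.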

C. Unitary tomography
Section II of the main text describes shallow circuits used to characterize leakage and the two-qubit control parameters, θ and φ. Here, we detail the procedure used to directly measure all the non-zero matrix elements composing an arbitrary photon conserving unitary operation and the algebra used to convert these matrix elements into the five fSim control parameters (in Eq. 2). We use the resulting fSim model to compute the XEB sequence fidelity which we may then use as a cost function to optimize some, or all, of the fSim model parameters.
In order to efficiently characterize the unitary operation performed by a given set of control pulses, we initialize and measure a set of circuits as summarized in Table I. If we consider a general photon-conserving unitary, the non-zero matrix elements will take the form:

$$U = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & u_{11} & u_{12} & 0 \\ 0 & u_{21} & u_{22} & 0 \\ 0 & 0 & 0 & u_{33} \end{pmatrix},$$

where $u_{nm}$ denotes a non-zero element. We measure $u_{nm}$ by initializing the excited qubit in the basis ket of column m with an X/2 gate, and measuring the expectation value of $\sigma_x + i\sigma_y$ of the excited qubit in the basis ket denoted by row n; e.g. for $u_{21}$ we initialize the left qubit, apply the fSim gate, and then measure $\sigma_x + i\sigma_y$ of the right qubit, which gives the complex value of $u_{21}$. This procedure works for the single-excitation subspace (i.e. n, m in [1,2]), but $u_{33}$ is computed from repeated measurements of $u_{12,\mathrm{excited}}$ and $u_{22,\mathrm{excited}}$ where the previously uninitialized qubit is instead placed into the |1⟩ state as summarized in Table I. This procedure is similar to process tomography, but requires considerably fewer measurements to characterize the fSim matrix. We note that an optimal measurement sequence would require only 2n−1 circuits (for an n × n matrix) [29]. Even with several thousand repetitions of each circuit, characterizing the matrix with this method takes only a few seconds. Our series of six circuits is intentionally over-complete to avoid singular behavior when some matrix elements are small. In Table II we list the conversion from matrix elements to the five parameters of our fSim control model. These are useful measurements for building an fSim model, but we cannot characterize small gate errors (≈ $10^{-3}$) using this method due to the limitations of state preparation and measurement (SPAM) errors, which are a few percent.

TABLE II. Computing fSim model parameters from the results of our unitary tomography protocol. The "condition" column is present because we compute $u_{33} = u_{22,\mathrm{excited}}/u_{11}^*$ or $u_{33} = u_{12,\mathrm{excited}}/u_{21}^*$ depending on whether $u_{11}$ or $u_{21}$ is larger, to ensure the result is non-singular. $\psi_{10}$ is the phase difference accumulated between the two qubits over the gate duration. Columns: fSim parameter, value.

D. Cross-entropy error benchmarking
Cross-entropy benchmarking (XEB) is a powerful technique for characterizing the error of an arbitrary gate [15]. It is particularly useful when implementing non-Clifford gates like the continuous fSim gate set we use here. XEB uses a repetitive gate sequence to amplify small errors, where each cycle consists of a random single-qubit gate from the set {X/2, Y/2, ±X/2±Y/2} applied to each qubit followed by the fSim gate we are benchmarking. We extract the error per cycle as an exponential decay in the XEB sequence fidelity, $F_{\mathrm{XEB}}$. The sequence fidelity is computed using the cross-entropy between two probability distributions P and Q, $S(P, Q) = -\sum_i p_i \ln(q_i)$, by comparing the expected, measured, and incoherent probability distributions for a given gate sequence,

$$F_{\mathrm{XEB}} = \frac{S(P_{\mathrm{incoherent}}, P_{\mathrm{expected}}) - S(P_{\mathrm{measured}}, P_{\mathrm{expected}})}{S(P_{\mathrm{incoherent}}, P_{\mathrm{expected}}) - S(P_{\mathrm{expected}})} \qquad (9)$$

The numerator is the difference between the measured and expected cross-entropy and the denominator serves as a normalization so that $F_{\mathrm{XEB}}$ takes a value from [0, 1]. We then use $1 - F_{\mathrm{XEB}}$ as a cost function to optimize the five parameters of our fSim control model. For a given random sequence, we compute the expected probability distribution using perfect single-qubit gate models and the fSim model obtained from our unitary tomography experiment (supplement IX C). Since the sequence fidelity is dependent on the single- and two-qubit gate models used in the cross-entropy calculation, we can use $1 - F_{\mathrm{XEB}}$ as a cost function to optimize some or all of our fSim gate model parameters, a process termed ex situ optimization.
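The sequence-fidelity expression can be sketched in Python as follows. This is a minimal illustration for a single gate sequence: it assumes the incoherent distribution is uniform over the measured bitstrings, reads S(P_expected) as the self cross-entropy S(P_expected, P_expected), and requires every expected probability to be non-zero. The function names and example distributions are hypothetical.

```python
import math


def cross_entropy(p, q):
    """S(P, Q) = -sum_i p_i ln(q_i); terms with p_i = 0 contribute nothing."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def xeb_fidelity(p_measured, p_expected):
    """XEB sequence fidelity with a uniform (assumed) incoherent reference."""
    dim = len(p_expected)
    p_incoherent = [1.0 / dim] * dim
    s_inc = cross_entropy(p_incoherent, p_expected)
    s_meas = cross_entropy(p_measured, p_expected)
    s_exp = cross_entropy(p_expected, p_expected)
    return (s_inc - s_meas) / (s_inc - s_exp)


# Hypothetical two-qubit output distribution over 00, 01, 10, 11.
expected = [0.50, 0.25, 0.15, 0.10]
# A measured distribution that is a 70/30 mix of the expected and uniform ones.
measured = [0.7 * p + 0.3 * 0.25 for p in expected]
fidelity = xeb_fidelity(measured, expected)
```

Because the cross-entropy is linear in its first argument, the mixed distribution in this example yields a fidelity equal to the 0.7 mixing weight, while a perfectly reproduced distribution gives 1 and a uniform one gives 0.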

E. RB vs XEB
As a sanity check, one may ask that we compare the results of Clifford-based randomized benchmarking (RB) and cross-entropy benchmarking (XEB). Clifford-based RB requires an inversion gate, inverting a random gate sequence to map the total ideal gate sequence starting in the |0⟩ state back to |0⟩. For most of the fSim gates, the inversion gate is non-trivial, but for the special case of CZ_φ = fSim(0°, 180°), which is part of the Clifford gate set, this comparison is possible.

F. Error budgeting
In this section, we use various techniques to provide a more thorough budget of our XEB per-cycle errors. As we have discussed, XEB measures the total error per cycle, $e_{p,\mathrm{cycle}}$. This includes coherent and incoherent errors for one single-qubit gate per qubit and one fSim gate. We use single-qubit Clifford-based randomized benchmarking to characterize the average total error for single-qubit gates, we use purity benchmarking to characterize the incoherent error of both the single-qubit and fSim gates, and we use |2⟩ state readout in conjunction with XEB to characterize per-cycle leakage (which is included in the incoherent error). Here we focus on the two-qubit gate errors by assuming purity-limited single-qubit Pauli gate errors of $7.5 \times 10^{-4}$ as described in supplement IX B; this effectively means we subtract $1.5 \times 10^{-3}$ from $e_{p,\mathrm{cycle}}$ to obtain $e_{p,2q}$ for both error and purity measurements.
In Figure 11a we perform purity benchmarking for each XEB gate sequence and obtain an average purity-limited Pauli error of $3.76 \times 10^{-3}$ per fSim gate. In Figure 11b we plot $e_{p,2q,\mathrm{unitary\ tomography}}$, the Pauli error per fSim gate using the fSim gate model obtained from unitary tomography.

FIG. 10. [...] b, Two-qubit Clifford-based randomized benchmarking with (blue) and without (red) an interleaved CZ_φ gate, allowing us to extract the Pauli error per CZ_φ of 0.41%. c, Two-qubit cross-entropy benchmarking where each cycle includes two single-qubit gates and a CZ_φ gate, yielding a Pauli error per cycle of 0.59%. Here we find that the sum of the single- and two-qubit errors measured with Clifford-based RB (0.09% + 0.07% + 0.41% = 0.57%) corresponds well to the XEB error per cycle (0.59%).
The average $e_{p,2q,\text{unitary tomography}}$ is $5.07 \times 10^{-3}$, indicating a coherent error of $1.31 \times 10^{-3}$ per fSim gate. In Figure 11c we perform an ex situ optimization of our fSim gate model to reduce the coherent error by changing the three single-qubit detuning model parameters. We hold the values of $\theta$ and $\phi$ fixed to the sampling grid, but allow the single-qubit phases in the fSim model to be optimized. With this improved gate model, the coherent error is nearly eliminated: the average error $e_{p,2q,\text{ex situ}}$ is $3.83 \times 10^{-3}$, reducing the average coherent error to $7 \times 10^{-5}$ per gate.
We characterize leakage by directly measuring the $|2\rangle$ state population as a function of the XEB sequence depth. In Figure 12 we perform this measurement for a line cut of fSim control pulses that sweep the coupler bias on either side of the low-leakage bias used to perform a CPHASE gate. We find leakage to be minimized to a value of $5$ to $6 \times 10^{-4}$ for a range of coupler biases spanning nearly 10 "clicks" of our 13-bit bipolar DAC ($2/2^{13} \approx 0.0002$).
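The budget in this section is plain subtraction, and a short sketch makes the bookkeeping explicit. The function and variable names below are illustrative only (they are not taken from the authors' software), the numbers are the ones quoted in this section, and the DAC step assumes a normalized full-scale range of 2.

```python
# Error-budget arithmetic using the values quoted in this section.
# All names are illustrative; none come from the authors' code.

E_1Q_PURITY = 7.5e-4  # assumed purity-limited single-qubit Pauli error


def two_qubit_error(e_cycle, e_1q=E_1Q_PURITY):
    """Remove two single-qubit gates (one per qubit) from a per-cycle error."""
    return e_cycle - 2 * e_1q


def coherent_error(e_total, e_purity):
    """Coherent part = total Pauli error minus the purity-limited part."""
    return e_total - e_purity


e_purity = 3.76e-3      # Figure 11a, per fSim gate
e_tomography = 5.07e-3  # Figure 11b, unitary-tomography gate model
e_ex_situ = 3.83e-3     # Figure 11c, after ex situ optimization

coh_tomography = coherent_error(e_tomography, e_purity)
coh_ex_situ = coherent_error(e_ex_situ, e_purity)

# One "click" of a 13-bit bipolar DAC, assuming a full-scale range of 2.
dac_step = 2 / 2**13

print(f"coherent error, tomography model: {coh_tomography:.2e}")
print(f"coherent error, ex situ model:    {coh_ex_situ:.1e}")
print(f"DAC step (normalized units):      {dac_step:.6f}")
```

The two coherent-error values reproduce the $1.31 \times 10^{-3}$ and $7 \times 10^{-5}$ figures quoted in the text.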
In total, these metrics indicate that we have achieved incoherent-error-limited gates with fairly low leakage (if necessary, leakage may be reduced further by optimizing the gate length). Additionally, we find that we are able to perform the desired fSim($\theta$, $\phi$) gate without incurring additional coherent error. A critical component in achieving these results was eliminating the non-gate-like behaviors induced by long settling tails on our flux bias pulses. As such, we will now detail the procedure used to calibrate our flux control pulses.

G. Unitary overlap
The unitary overlap of two unitary matrices, e.g. some target fSim, $U_\mathrm{target}$, and the actual fSim, $U_\mathrm{actual}$, is defined as $\mathrm{Tr}(U_\mathrm{target}^\dagger \cdot U_\mathrm{actual})/D$, where $D$ is the dimension of the Hilbert space. The unitary overlap is related to the Pauli error by $e_p = 1 - |\mathrm{Tr}(U_\mathrm{target}^\dagger \cdot U_\mathrm{actual})/D|^2$. The Pauli error in an fSim gate for small deviations in either $\theta$ or $\phi$ is proportional to the square of the deviation angle. In Figure 13 we plot the additional coherent error incurred if some actual gate, $\mathrm{fSim}_\mathrm{actual} = \mathrm{fSim}(\theta + \delta\theta, \phi + \delta\phi)$, is instead treated as the target gate, $\mathrm{fSim}_\mathrm{target} = \mathrm{fSim}(\theta, \phi)$. This plot indicates that a deviation of either $2.5^\circ$ in $\theta$ or $4^\circ$ in $\phi$ will result in an additional coherent error of $1 \times 10^{-3}$. In our case (Figure 11), after a constrained optimization where $\theta$ and $\phi$ were fixed to a grid, our average error was approximately $1 \times 10^{-4}$ higher than the purity limit, which corresponds to a deviation of about $1^\circ$ in either $\theta$ or $\phi$.
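The deviation figures quoted above can be checked numerically with a minimal sketch. It assumes the commonly used fSim matrix convention (swap block with entries $\cos\theta$ and $-i\sin\theta$, and a phase $e^{-i\phi}$ on $|11\rangle$), which this section does not spell out, and it takes the overlap with the adjoint of the target and an absolute value before squaring. All function names are illustrative.

```python
import cmath
import math


def fsim(theta, phi):
    """4x4 fSim matrix in the basis |00>, |01>, |10>, |11> (radians).

    The sign convention is an assumption, not taken from this section.
    """
    c, s = math.cos(theta), -1j * math.sin(theta)
    return [
        [1, 0, 0, 0],
        [0, c, s, 0],
        [0, s, c, 0],
        [0, 0, 0, cmath.exp(-1j * phi)],
    ]


def pauli_error(u_target, u_actual):
    """e_p = 1 - |Tr(U_target^dagger U_actual) / D|^2."""
    d = len(u_target)
    # Tr(A^dagger B) is the sum over all entries of conj(A_ij) * B_ij.
    overlap = sum(
        complex(u_target[i][j]).conjugate() * u_actual[i][j]
        for i in range(d)
        for j in range(d)
    ) / d
    return 1 - abs(overlap) ** 2


def deviation_error(theta_deg, phi_deg, d_theta_deg=0.0, d_phi_deg=0.0):
    """Extra coherent error from treating fSim(theta+dtheta, phi+dphi) as fSim(theta, phi)."""
    rad = math.radians
    target = fsim(rad(theta_deg), rad(phi_deg))
    actual = fsim(rad(theta_deg + d_theta_deg), rad(phi_deg + d_phi_deg))
    return pauli_error(target, actual)


print(f"2.5 deg in theta: {deviation_error(45, 90, d_theta_deg=2.5):.2e}")
print(f"4.0 deg in phi:   {deviation_error(45, 90, d_phi_deg=4.0):.2e}")
print(f"1.0 deg in theta: {deviation_error(45, 90, d_theta_deg=1.0):.2e}")
```

Under this convention the error depends only on the deviations, not on the target angles, and for small deviations it scales as $\delta\theta^2/2$ and $3\,\delta\phi^2/16$ (angles in radians), consistent with the quadratic dependence stated above.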

X. CONTROL PULSE CALIBRATION
In a world without flux settling tails, we would be able to implement an arbitrary fSim gate, with an error equal to the sum of the errors of the requisite CPHASE and iSWAP-like gates, just by merging the control pulses into a composite fSim gate. Unfortunately, due to flux settling tails, further calibration, described in X C, was required. The keystones of this calibration were two-fold: 1) When performing two flux-control-based gates back to back (e.g. 2 ns separation), adjust the amplitude of the second pulse based on the first. 2) When implementing a composite gate, perform the CPHASE gate followed by the iSWAP-like gate so that bleed through is well behaved; in the reverse order, bleed through of the iSWAP-like coupler pulse into the CPHASE gate pulses will result in leakage to the $|2\rangle$ state, which is an error in the fSim model. Using these two principles, we were able to implement a robust calibration of the complete fSim gate set.
FIG. 11. Comparison of purity benchmarking and cross-entropy benchmarking with and without a constrained ex situ optimization of the fSim control angles. a, Purity benchmarking. b, XEB error per gate using the fSim gate model obtained from unitary tomography (supplement IX C). c, XEB error after a constrained ex situ optimization of the fSim gate parameters where $\theta$ and $\phi$ were held fixed to the grid and the single-qubit phases were optimized.
As we have demonstrated numerically in supplement VIII, our desired implementation of the fSim gate set is possible with less than 1% error using simple rectangular control pulses. Unfortunately, the system transfer function (electronics and wiring) is imperfect and cannot produce these ideal waveforms exactly. Fortunately, as explored numerically in Figure 7, our fSim implementation is mostly sensitive to the integral of our control pulses rather than the shape. This likely remains true unless the spectral content of our flux control pulses approaches the qubit frequency. However, we must be very careful to ensure our control pulses do not bleed into each other which requires careful calibration of our flux bias settling tails.
FIG. 13. We may choose to interpret some $\mathrm{fSim}_\mathrm{actual} = \mathrm{fSim}(\theta + \delta\theta, \phi + \delta\phi)$ as some $\mathrm{fSim}_\mathrm{target} = \mathrm{fSim}(\theta, \phi)$ by accepting additional coherent error. For small deviations in either $\theta$ or $\phi$ the error is proportional to the square of the deviation.
We can consider settling non-idealities at two time scales: 1) pulse distortion during the duration of a gate (roughly 15 ns), and 2) pulse settling that occurs after the intended gate duration. Distortion at short times may, for instance, make it difficult to place the qubits exactly on resonance during a gate; this may make it difficult to achieve a swap angle, $\theta$, of $90^\circ$ (the Rabi oscillation amplitude, $g^2/(g^2 + \pi\Delta^2/2)$, equals 1 if and only if the qubits are on resonance), but fortunately these distortions do not have a huge impact on the rest of the fSim parameter space. Due to the periodic nature of Rabi oscillations, the resulting fSim is mostly dependent on the integral of the control pulses. Pulse settling that occurs outside the intended gate interval means that adjacent gates will bleed into each other. If the tails are relatively short (a few ns), it is possible to mitigate this error just by placing a short idle time between gates. Pulse settling at longer times is particularly nefarious because it is no longer feasible to pad gates with idle times, and settling times of 5-1000+ ns have been observed in superconducting qubit systems. If left uncompensated, the performance of the $m$-th 15 ns long gate would be dependent on the preceding 1-60+ gates. This runs contrary to the entire notion of gate-based local operations and certainly would not fit within our static fSim control model used with XEB. As such, it is this long-time settling in particular that requires a careful calibration to enable the sensible control strategy employed throughout this letter.
The full fSim gate calibration happens in three stages. In the first stage, we calibrate the electronics to eliminate the long-time flux settling. In the second stage, we describe the calibration procedure for the CPHASE and iSWAP-like gate sets. Then, for the fSim gate family, we perform further calibrations of the composite fSim gates to achieve the best possible gate performance by adjusting the control amplitude of the second pulse dependent on the first rather than adding longer buffer times between flux pulses.
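The estimate of how many gates an uncompensated settling tail can affect, made earlier in this section, is a one-line calculation. The sketch below assumes that a tail of a given duration disturbs every gate that starts before it has died out; the function name is illustrative.

```python
import math


def gates_affected(settling_ns, gate_ns=15.0):
    """Number of following gates that start before a tail of this length ends."""
    return math.ceil(settling_ns / gate_ns)


for settling_ns in (5, 100, 1000):
    print(f"{settling_ns:>5} ns tail -> {gates_affected(settling_ns)} gate(s)")
```

With 15 ns gates, settling times between 5 ns and 1000 ns span from a single gate to more than 60 gates, which is the range quoted in the text.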

A. Electronics calibration
On this device there are a total of seven flux bias lines, four for the qubits and three for the couplers. Each channel is driven by a dedicated 1 GS/s, 14-bit DAC controlled by an FPGA to form an arbitrary waveform generator. Each line uses nominally identical cabling, attenuation, and filtering from room temperature down to the sample's chip mount. To compensate for non-idealities in each line, we first measure the qubit's response to a flux pulse, fit the response using three exponential decay time constants, and then use this model to pre-distort our control pulses as in previous work [19,30]. This allows us to directly measure and compensate for the transfer function of each qubit's flux bias wiring. Implementing a similar in situ calibration of the coupler bias lines is the subject of on-going work. For now, we have found it sufficient to simply apply the average of the two adjacent qubit settling models to the coupler. The pulse calibration parameters for the pair of qubits and the coupler used to benchmark the fSim gate set are summarized in Table III.
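As a rough illustration of the pre-distortion step, the sketch below models the line's response to a unit step as $1 + \sum_k a_k e^{-t/\tau_k}$ with three time constants and then solves for the drive waveform whose filtered version equals the intended pulse. The functional form, the sample-by-sample inversion, and all parameter values are assumptions made for this example; they are not the authors' implementation or the values of Table III.

```python
import math


def step_response(n, amps, taus_ns, dt_ns=1.0):
    """Sampled response to a unit step: 1 + sum_k a_k * exp(-t / tau_k)."""
    return [
        1.0 + sum(a * math.exp(-i * dt_ns / tau) for a, tau in zip(amps, taus_ns))
        for i in range(n)
    ]


def impulse_response(step):
    """Discrete impulse response h[n] = s[n] - s[n-1], with h[0] = s[0]."""
    return [step[0]] + [step[i] - step[i - 1] for i in range(1, len(step))]


def convolve(x, h):
    """Causal convolution of x with h, truncated to len(x) samples."""
    return [sum(h[k] * x[i - k] for k in range(i + 1)) for i in range(len(x))]


def predistort(target, h):
    """Solve convolve(x, h) == target one sample at a time (causal deconvolution)."""
    x = []
    for i, y in enumerate(target):
        tail = sum(h[k] * x[i - k] for k in range(1, i + 1))
        x.append((y - tail) / h[0])
    return x


# Made-up settling parameters for illustration only.
amps = (-0.03, -0.02, -0.01)
taus_ns = (5.0, 50.0, 500.0)

n = 60  # samples at 1 ns spacing (1 GS/s)
target = [0.0] * 5 + [1.0] * 15 + [0.0] * (n - 20)  # 15 ns rectangular pulse
h = impulse_response(step_response(n, amps, taus_ns))

uncompensated = convolve(target, h)
drive = predistort(target, h)
delivered = convolve(drive, h)

worst_before = max(abs(d - t) for d, t in zip(uncompensated, target))
worst_after = max(abs(d - t) for d, t in zip(delivered, target))
print(f"max error without pre-distortion: {worst_before:.3f}")
print(f"max error with pre-distortion:    {worst_after:.1e}")
```

In practice the fitted amplitudes and time constants would come from the measured qubit response, and the inversion would more likely be done with a digital filter than with this quadratic-time loop.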
After performing the electronics calibration we find the unitary gate interactions of our fSim gates to be well characterized by either unitary tomography, performed with a depth-1 circuit, or cross-entropy benchmarking using a depth-$N$ circuit where $N$ varies from 5 to 700. This fact is illustrated by Figure 11 panels b and c, where the average error of all 525 fSim gates differs by only $1.2 \times 10^{-3}$ with and without optimizing the single-qubit unitary parameters; this provides an upper bound on the effects of pulse bleed through on gate fidelity. If we consider the gate timing of the cross-entropy benchmarking sequence in Figure 11, which used 28 ns fSim gates interleaved with 15 ns single-qubit gates, this result indicates that our settling is well compensated for at times longer than 15 ns. This result also indicates that the qubit biases are settled enough to have a minimal impact on the single-qubit gate errors. If this were not the case then we would require a circuit-depth-dependent gate model to reach the purity limit. However, the settling of the coupler bias flux signal at times less than 15 ns becomes non-negligible and merits special consideration when calibrating fSim gates composed of a CPHASE gate in close proximity to an iSWAP-like gate. So, we will first detail the calibration procedure for each of the component gate families in the next section and finish our calibration discussion with a description of the composite fSim calibration procedure.

B. CPHASE and iSWAP-like calibrations
We calibrate the CPHASE interaction by repeating the leakage experiment described in Figure 2a to fine-tune the coupler bias amplitudes and to identify combinations of qubit detunings, $\Delta$, and corresponding coupler bias amplitudes that yield low-leakage gates. We use the qubit frequency bias transfer function to choose qubit biases that set the desired qubit detuning, $\Delta$, in the vicinity of $\eta$. The frequency range around $\eta$ is set by the width of the Rabi interaction which, for a fixed pulse length, is inversely proportional to gate length since shorter gates require stronger coupling, $g$ (see Figure 6 in supplement VIII). We use 15 ns pulses (13 ns rectangular pulses with a 1 ns padding on either side), which makes the Rabi interaction span about 75 MHz on either side of the qubit nonlinearity, $\eta$. For each detuning in this range, we repeat the experiment in Figure 2a, varying the coupling strength to minimize leakage. An example of the raw data from this experiment is provided in Figure 14, where the dotted line indicates the low-leakage coupler bias amplitude that achieves one full swap from $|11\rangle$ to $|02\rangle$ and back. We initialize the $|11\rangle$ state, apply the CPHASE control pulses, and measure the $|1\rangle$ state population of the lower-frequency qubit to identify when the population has completed a full swap. Then, for each combination of $\Delta$ and the corresponding low-leakage coupler bias, we repeat the experiment from Figure 2b to measure the conditional phase. This procedure works well for $\phi \in [-130^\circ, 130^\circ]$ until the Rabi swap amplitude becomes small and the peak is broad along the coupling strength line cut. At that point, we extrapolate towards the zero coupling bias while measuring the conditional phase to fill out the rest of the conditional phase control space.
In Figure 15 we calibrate a 13 ns iSWAP-like gate (11 ns rectangular pulses, with 1 ns padding) by repeating the experiment from Figure 2c three times with the qubits on resonance ($\Delta = 0$ MHz) to fine-tune the pulse amplitudes needed to reach $\theta = 90^\circ$. Then, for $0^\circ < \theta < 90^\circ$ we simply interpolate the coupler bias between the "OFF" bias and the $\theta = 90^\circ$ bias. For each iSWAP-like tune-up experiment we initialize one qubit to the $|1\rangle$ state, apply the iSWAP-like pulses to the qubits and coupler, and then measure the $|1\rangle$ state population of the other qubit. In Figure 15a, we first use our precalibrated qubit frequency bias DC transfer functions to choose qubit bias amplitudes, $\mathrm{amp}_{q0}$ and $\mathrm{amp}_{q1}$, that place both qubits at the same frequency, and we sweep the coupler bias from the "OFF" bias to the maximum coupling bias to identify the amplitude that achieves exactly one swap from the first to the second qubit, corresponding to $\theta = 90^\circ$ (dotted line). In Figure 15b, we repeat the experiment using the $\theta = 90^\circ$ coupler bias from 15a while sweeping the bias of one qubit to maximize the amplitude of the swapped population, thus placing the qubits on resonance. Finally, in 15c, we repeat the experiment using the new qubit biases to fine-tune the coupler bias.
FIG. 15. In each experiment we initialize the $|01\rangle$ state and measure the population of the first qubit to identify the control biases that complete a full swap to $|10\rangle$. a, We use prior calibrations to bias the qubits on resonance and scan the coupler bias amplitude, $\mathrm{amp}_c$, to find the bias that completes one swap. b, Using the new $\mathrm{amp}_c$, we scan the bias of one qubit, $\mathrm{amp}_q$, to place the qubits on resonance. c, Using the updated qubit biases, we again scan $\mathrm{amp}_c$ to fine-tune the coupler bias.

C. fSim calibration
Once we have calibrated the iSWAP-like and CPHASE gates, we should nominally be able to use one of each to implement any fSim gate. Unfortunately our pulse response is imperfect at short times, as described in supplement X A. This was less of an issue for the iSWAP-like and CPHASE gates in section IV because the XEB gate sequence alternated (10 ns) single and (13 ns or 15 ns) two-qubit gates; in those cases, the uncompensated flux bias settling tails resulted in a small detuning at the qubit idle frequencies. However, when we perform an fSim gate as a composition of a CPHASE followed immediately by an iSWAP-like gate, the tail of the first coupler pulse bleeds into the second coupler pulse. Even a small settling tail adding to the amplitude of the coupler pulse can drastically change the coupling during the second gate due to the large coupler flux sensitivity at strong couplings (remembering Figure 1c). In the future, this problem may be mitigated by identifying and removing the physical origin of these settling tails, with a more thorough in situ calibration procedure for the couplers, or by placing longer idle times between gates.
In this work, we deal with pulse bleed through by calibrating composite fSim gates where the amplitudes of the second set of pulses in the composite fSim sequence are dependent on the first, thus eliminating the need for excessive idle times between gates. Conveniently, the tune-up procedure for each gate in the composition is the same as in the isolated iSWAP-like or CPHASE case, just with the two sets of pulses played back to back; this works because each experiment in our usual bring-up procedure operates within an isolated manifold (e.g. one excitation for $\theta$ or two excitations for $\phi$) when performing fSim gates. The ordering of the gates within the fSim gate is chosen to place the CPHASE gate before the iSWAP-like gate. Since both coupler pulses have the same sign, if the CPHASE coupler amplitude bleeds into the iSWAP-like coupler amplitude, this results in slightly more swapping, which is easily measured and adjusted for by reducing the iSWAP-like coupler amplitude to compensate. If we ordered the gates in the reverse order, pulse bleed through would generate leakage during the CPHASE gate, which is much more difficult to characterize and remove.
For the purpose of building a robust registry of gates, we erred on the side of over-calibration for this demonstration. However, we find these control parameters to be well behaved and it should be possible to sample more sparsely in the future to simplify calibration of the full fSim gate set. Figure 16 outlines the three steps used to calibrate our composite fSim gates. In Figure 16a, we first calibrate many CPHASE gates spaced every $1^\circ$ using control pulses for just the CPHASE gate, as shown on the right, following the procedure outlined in supplement X B. Then, in Figure 16b, for each preceding CPHASE gate we follow the iSWAP-like calibration procedure (also supplement X B) to identify qubit and coupler bias amplitudes to achieve both a $\theta = 0^\circ$ and a $\theta = 90^\circ$ gate. Finally, in Figure 16c, for a CPHASE conditional phase of $\phi_\mathrm{CPHASE} = 180^\circ$, we tune up iSWAP-like gates for $\theta$ from $0^\circ$ to $90^\circ$ in $1^\circ$ increments by interpolating between the minimum and maximum amplitudes determined in 16b. We use this calibration to produce a spline for $\theta_\text{iSWAP-like} \rightarrow \%(\mathrm{bias}_{90^\circ} - \mathrm{bias}_{0^\circ})$ and another for $\theta_\text{iSWAP-like} \rightarrow \phi_\text{iSWAP-like}$.
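The mapping from swap angle to coupler bias can be sketched as a lookup of the fraction of the bias range followed by a rescale between the $\theta = 0^\circ$ and $\theta = 90^\circ$ endpoints. The example below uses piecewise-linear interpolation as a simple stand-in for the spline, and the calibration points and bias values are made up for illustration.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be sorted in ascending order."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the calibrated range")
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def coupler_bias(theta_deg, bias_0, bias_90, thetas, fractions):
    """Bias for a swap angle: bias_0 + fraction(theta) * (bias_90 - bias_0)."""
    return bias_0 + interp(theta_deg, thetas, fractions) * (bias_90 - bias_0)


# Hypothetical calibration table: swap angle -> fraction of the bias range.
thetas = [0.0, 30.0, 60.0, 90.0]
fractions = [0.0, 0.55, 0.85, 1.0]

bias = coupler_bias(45.0, bias_0=0.10, bias_90=0.50, thetas=thetas, fractions=fractions)
print(f"coupler bias for theta = 45 deg: {bias:.3f}")
```

Storing the fraction of the range rather than the absolute bias lets one table be reused for every preceding CPHASE gate, since only the two endpoint biases from step b change.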
FIG. 16. Three steps to calibrating the fSim gate to account for pulse distortion of the first (CPHASE) pulses bleeding into the second (iSWAP-like) pulses. a, Follow the usual CPHASE calibration procedure to bring up a full CPHASE gate family corresponding to fSim($\theta \approx 0^\circ$, $\phi \in [0^\circ, 360^\circ]$). b, Follow the iSWAP-like tune-up procedure, but play the CPHASE control pulses before the iSWAP-like pulses. Use the sequence to identify the flux bias amplitudes that achieve fSim gates with $\theta = 0^\circ$ and $\theta = 90^\circ$ for each preceding CPHASE gate. c, For a preceding CPHASE gate with $\phi = 180^\circ$, bring up gates corresponding to fSim($\theta \in [0^\circ, 90^\circ]$, $\phi = 180^\circ$).
With the fSim gate registry in hand we set out to benchmark specific fSim gates. For a given target fSim, we first looked up the iSWAP-like pulse amplitudes that achieve the correct swap angle, $\theta_\text{iSWAP-like}$, and subtracted the conditional phase due to the iSWAP-like gate from the total target to choose pulse amplitudes for the desired CPHASE gate (e.g. $\phi_\mathrm{CPHASE} = \phi_\mathrm{target} - \phi_\text{iSWAP-like}$). We then performed unitary tomography (supplement IX C) using the pulse amplitudes we looked up in the registry to quickly assess the resulting fSim control angles of the composite gate. If either control angle was off by more than $1^\circ$, we used the registry to adjust the corresponding iSWAP-like ($\theta$) or CPHASE ($\phi$) control amplitudes by $\pm 1^\circ$ accordingly. This process converged to an fSim gate within $1^\circ$ of both $\theta_\mathrm{target}$ and $\phi_\mathrm{target}$ with fewer than 9 adjustments for each of the 525 fSim gates we benchmarked. Once the unitary tomography experiment indicated that the composite fSim gate produced a unitary operation near the target fSim gate, we performed purity and cross-entropy benchmarking.
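The lookup-and-adjust loop described above can be sketched as follows. The `measure` callback stands in for unitary tomography, and the mock response with constant angle offsets, like every name here, is hypothetical; the real registry maps angles to pulse amplitudes rather than adjusting commanded angles directly.

```python
import math


def wrap_deg(angle):
    """Wrap an angle difference into [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def cphase_angle(phi_target, phi_iswap_like):
    """Conditional phase left for the CPHASE pulse: phi_target - phi_iSWAP-like."""
    return (phi_target - phi_iswap_like) % 360.0


def converge(theta_target, phi_target, measure, tol=1.0, step=1.0, max_iter=20):
    """Nudge the commanded angles in 1-degree steps until tomography agrees.

    measure(theta_cmd, phi_cmd) must return the measured (theta, phi) in degrees.
    Returns the final commanded angles and the number of adjustment rounds.
    """
    theta_cmd, phi_cmd = float(theta_target), float(phi_target)
    for n_rounds in range(max_iter):
        theta_meas, phi_meas = measure(theta_cmd, phi_cmd)
        d_theta = theta_meas - theta_target
        d_phi = wrap_deg(phi_meas - phi_target)
        if abs(d_theta) <= tol and abs(d_phi) <= tol:
            return theta_cmd, phi_cmd, n_rounds
        if abs(d_theta) > tol:
            theta_cmd -= math.copysign(step, d_theta)
        if abs(d_phi) > tol:
            phi_cmd -= math.copysign(step, d_phi)
    raise RuntimeError("did not converge within max_iter rounds")


def mock_tomography(theta_cmd, phi_cmd):
    """Hypothetical hardware response with constant angle offsets."""
    return theta_cmd + 3.0, phi_cmd - 2.0


print(cphase_angle(180.0, 25.0))
print(converge(45.0, 90.0, mock_tomography))
```

Because each angle is adjusted only when it is out of tolerance, the two corrections proceed independently, which mirrors the separate $\theta$ and $\phi$ adjustments in the procedure.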

XI. SYSTEM STABILITY
As the size of quantum processors grows (number of qubits), so too does the time it takes to calibrate a device (at least until fully parallel calibrations are possible). As the system drifts from these calibrations over time, the performance of a processor will fall and calibrations must be revisited. If the required calibration time is long compared to the scale of drift, then the device becomes unusable in practice. While electronics drift with both time and temperature must be considered when designing a system, one particularly worrisome issue is the time dependence of two level system (TLS) defects entering and leaving the qubit spectrum [22].
Here we present a promising snapshot of the stability of our system. In the process of calibrating the fSim gate set, we started by calibrating the single-qubit gates and readout. We then operated with the same single-qubit calibrations for several days while we were working on the fSim gate, ultimately obtaining our primary fSim benchmarking dataset about a week after the initial single-qubit calibration. Shortly after this, a TLS showed up near one of the qubits' idle frequencies, significantly limiting its coherence. Then, after about another week, we returned to the original calibration parameters to benchmark a subset of the same fSim($\theta$, $\phi$) gates presented in the main text. We were pleasantly surprised to find that the original calibration was still good enough to produce high-fidelity gates.
In Figure 17 we used the two-week-old iSWAP-like and CPHASE calibrations to benchmark a less dense grid of fSim gates. While the average performance has degraded by a factor of two from the initial calibration, the average error is still less than 1%, but that is not the whole story. These fSim gates were benchmarked in a random order; if we look at a plot of the gate error as a function of time for these 91 gates (Figure 17b), we see a strong time dependence where the first 50 gates (benchmarked over the course of an hour) have an average error much lower than gates #50 to #80. This would seem to indicate that the two-week-old electronics calibration is stable enough to maintain high-fidelity gates for weeks, and that the decreased fidelity is likely due to the residual and/or intermittent presence of a TLS interacting with one of the qubits. In an ideal world, we would be able to prevent or remove TLS defects, but, at least presently, we do not know how to do this. Instead, relying on the stability of our electronics, an optimal strategy for maintaining up-time on a large-scale quantum processor will likely involve calibrating a number of idle frequency configurations and being able to quickly vet and switch to an old configuration if and when a TLS shows up.
FIG. 17. [...] Figure 4, a TLS showed up at one of the qubit idle frequencies, effectively breaking the calibration. After about another week, we returned to the original calibration and repeated the fidelity measurement on a subset of 91 fSim gates, which we present here (top and middle rows). We find that the average gate fidelity had decreased somewhat, but is still above 99%. Furthermore, if we look at the gate error rates sorted in the order they were measured (bottom row), a strong time dependence becomes apparent. Many of the gates present low errors ($\approx 5 \times 10^{-3}$), as they did after the initial calibration. It is not until gates numbered 60 to 80 or so that large errors show up. This indicates that our control electronics are stable enough to maintain a high-fidelity calibration on the timescale of weeks, and that TLSs are likely the biggest threat to maintaining long-term calibrations.