Implementing two-qubit gates at the quantum speed limit

The speed of elementary quantum gates, particularly two-qubit gates, ultimately sets the limit on the speed at which quantum circuits can operate. In this work, we experimentally demonstrate commonly used two-qubit gates at nearly the fastest possible speed allowed by the physical interaction strength between two superconducting transmon qubits. We achieve this quantum speed limit by implementing experimental gates designed using a machine learning inspired optimal control method. Importantly, our method only requires the single-qubit drive strength to be moderately larger than the interaction strength to achieve an arbitrary two-qubit gate close to its analytical speed limit with high fidelity. Thus, the method is applicable to a variety of platforms including those with comparable single-qubit and two-qubit gate speeds, or those with always-on interactions. We expect our method to offer significant speedups for non-native two-qubit gates that are typically achieved with a long sequence of single-qubit and native two-qubit gates.


I. INTRODUCTION
Increasing the speed of elementary quantum gates boosts the "clock speed" of a quantum computer. For noisy, intermediate-scale quantum computers [1] with finite coherence times [2,3], speeding up single- and two-qubit quantum gates also increases the achievable circuit depth, which is needed for solving useful computational problems [4,5]. In most experimental platforms, single-qubit gates are achieved via electromagnetic fields that drive individual qubit transitions. The maximum speed of these gates is often limited by the strength of the driving fields [6,7]. A two-qubit entangling gate, necessary for universal quantum computation, can however only operate at a speed proportional to the interaction strength between the qubits [8][9][10], which is typically weaker than available single-qubit drive strengths and cannot be easily increased.
Assuming a limited interaction strength, one can analytically obtain the maximum speed for any particular two-qubit gate in the limit of arbitrarily fast single-qubit gates [11,12]. In practice, all single-qubit gates have finite speeds, and in platforms such as superconducting qubits, single-qubit gates may not be much faster than two-qubit gates due to limited anharmonicity [13][14][15]. The speed limits of two-qubit gates in such scenarios have not been studied. Therefore, we seek to both theoretically and experimentally investigate these practical speed limits, which are not only relevant to the optimal design of quantum gates and quantum circuits, but also directly related to the speed limits of entanglement generation, a fundamental topic of high interest in quantum information theory, condensed matter physics, and black-hole physics [16][17][18][19][20].
In this paper, we report a new method for designing two-qubit gates that are speed optimized, and we implement the gates experimentally using superconducting transmon qubits. We find that the protocol for achieving the fastest two-qubit gates in Refs. [11,12] can be far from optimal with a finite single-qubit gate time. Our method differs from previous protocols [11,12,21] in that we apply single-qubit drives simultaneously with the two-qubit interaction, a crucial strategy for speed optimization. We optimize the pulse shapes of the single-qubit drives using a method that combines the well-known GRAPE algorithm [22,23] with state-of-the-art machine learning techniques. Importantly, our method works as long as there exists a physical interaction between two qubits and one can drive each qubit with controllable pulse shape and phase. This applies to almost any platform suitable for quantum computing. Furthermore, we experimentally demonstrate that our method can achieve maximally entangling two-qubit gates, such as the CNOT gate, close to their analytical speed limits found in Refs. [11,12] with modest single-qubit drive strengths and up to 98.3% average gate fidelity determined from quantum process tomography. This largely eliminates the impractical assumption of infinitely fast single-qubit gates. The same applies to the SWAP gate we implemented, which is crucial for remote quantum gates but notoriously hard to achieve with a short gate sequence [24]. We have also implemented a non-Clifford gate, the $\sqrt{\mathrm{SWAP}}$ gate, close to its quantum speed limit at 97.0% average gate fidelity.
We emphasize that the above-mentioned gates we implemented are all non-native gates, meaning that they cannot be achieved by evolving the static interaction Hamiltonian without single-qubit gates or time-dependent drives. These non-native gates are crucial for efficient implementation of many useful quantum circuits [25][26][27]. With our method, one could even realize an arbitrary two-qubit gate close to its theoretical speed limit. The conventional method of using a universal gate set to achieve a general SU(4) unitary would instead require a sequence of up to 3 two-qubit gates and 12 single-qubit gates [28], which is not only much slower in practice, but also likely of lower fidelity due to error accumulation.
Although certain two-qubit gates with 99.5% fidelity or higher have already been achieved with superconducting qubits [13][14][15], the lower fidelities of the gates reported here are largely due to limitations of our experimental hardware, as the theoretical gate fidelities associated with our demonstrated gates are all above 99.9%. The main purpose of this work is to demonstrate that the theoretical two-qubit gate speed limits can be largely achieved using a general optimal control method with minimal hardware requirements. Our speed-optimized gates achieve the same level of fidelity as that achieved previously on the same hardware with only fidelity optimization [21], showing that our gate speed optimization does not rely on sacrificing gate fidelity. Moreover, our method is scalable to a large number of qubits provided that the two-qubit interactions can be switched on and off (such as via tunable couplers [29]), since we can perform the speed optimization for each two-qubit gate independently. With most of the research on quantum gates focusing on improving gate fidelities, our work represents an orthogonal direction that aims to improve the fidelity of the whole quantum circuit [30] by optimizing the speed of any two-qubit gate.

II. EXPERIMENTAL SETUP
Our experimental platform consists of strongly coupled fixed-frequency superconducting transmon qubits with static capacitive couplings in a hanger readout geometry [31]. An intrinsic silicon substrate is used on which aluminum oxide tunnel junctions are fabricated via an overlap technique [32]. The remaining circuit components are made of niobium. The full chip design and corresponding circuit model are shown in Fig. 1. The two transmon qubits' transition frequencies are 5.10 GHz and 5.26 GHz, their anharmonicities are −270 MHz and −320 MHz, their $T_1$ decay times are 40 µs and 21 µs, and their $T_2^*$ decay times are 12 µs and 10 µs, respectively. In the rotating frame of the two qubit frequencies and assuming $\hbar = 1$ from now on, the static Hamiltonian of the two qubits can be written as [21,33]
\[ H_0 = g\, \sigma^z \otimes \sigma^z, \tag{1} \]
where $g \approx 2\pi \times 1.75$ MHz represents a fixed Ising coupling strength between the qubits. To interact with the qubits, we deliver two microwave drives, each resonant with one qubit's transition frequency, simultaneously through the feedline.
Each of the drive fields contains two adjustable quadratures (X or Y) and can be described by the drive Hamiltonian $H_1(t)$ of Eq. (2), where $\Omega^{x,y}_i(t)$ denotes the Rabi frequency of the drive resonant with qubit $i$'s transition in the X or Y quadrature at time $t$, and $\tilde{\sigma}^\gamma_i$ denotes the corresponding drive operator. For perfect single-qubit drives, $\tilde{\sigma}^\gamma_i = \sigma^\gamma_i$. However, due to the strong Ising coupling between the two qubits in $H_0$, the drive strength on one qubit is dependent on the other qubit's state, resulting in $\tilde{\sigma}^\gamma_1 = \sigma^\gamma \otimes (|0\rangle\langle 0| + r_2 |1\rangle\langle 1|)$ and $\tilde{\sigma}^\gamma_2 = (|0\rangle\langle 0| + r_1 |1\rangle\langle 1|) \otimes \sigma^\gamma$, with $r_1 \approx 1.1$ and $r_2 \approx 0.7$ for our current chip. We note that with a weaker coupling strength or with a tunable coupler [34,35], both $r_1$ and $r_2$ can be made closer to or equal to 1.

III. ANALYTICAL SPEED LIMIT
In the limit of arbitrarily strong single-qubit drives, i.e. $\Omega_{\max} \equiv \max |\Omega^{x,y}_{1,2}(t)| \to \infty$, one can derive an analytical speed limit for any target two-qubit unitary with the above-mentioned static Hamiltonian $H_0$ and control Hamiltonian $H_1$. Note that in this limit, the speed limit is only well defined when $r_1 = r_2 = 1$, since otherwise $H_1$ will lead to arbitrarily strong interactions. As detailed in [12], any two-qubit target unitary $U$ can always be decomposed as
\[ U = (U_1 \otimes U_2)\, U_d\, (V_1 \otimes V_2), \tag{3} \]
where $U_1, V_1$ ($U_2, V_2$) are some single-qubit gates on the first (second) qubit, and $U_d = \exp\big[-i \sum_{\gamma} \lambda_\gamma\, \sigma^\gamma \otimes \sigma^\gamma\big]$ with $\gamma = x, y, z$. $\{\lambda_x, \lambda_y, \lambda_z\}$ form a canonical vector that uniquely specifies any given two-qubit gate $U$ up to single-qubit rotations, and their exact values can be obtained with the knowledge of $U$ based on Ref. [11]. To obtain the analytical speed limit, we assume that single-qubit gates are arbitrarily fast and thus take a negligible amount of time. The total gate time for implementing $U$ therefore reduces to the time spent on realizing $U_d$, which is responsible for any entanglement generation. Based on $H_0$, together with instantaneous single-qubit rotations, $U_d$ can be realized with a minimum time of
\[ T_{\min} = \frac{|\lambda_x| + |\lambda_y| + |\lambda_z|}{g} = \begin{cases} \pi/(4g), & \text{CNOT} \\ 3\pi/(4g), & \text{SWAP} \\ 3\pi/(8g), & \sqrt{\mathrm{SWAP}} \end{cases} \tag{4} \]
Eq. (4) is the analytical speed limit for a two-qubit gate $U$. Its values for the CNOT, SWAP, and $\sqrt{\mathrm{SWAP}}$ gates are shown above and derived in Appendix B.
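As a quick numerical illustration of the analytical speed limit, the following sketch converts canonical vectors into gate times for the coupling strength quoted in Sec. II. It assumes the form $T_{\min} = (|\lambda_x| + |\lambda_y| + |\lambda_z|)/g$ for the Ising coupling, and the canonical vectors listed in `CANONICAL` are assumptions chosen to be consistent with the $T_{\min}$ values quoted in the text; the names `t_min` and `CANONICAL` are hypothetical.

```python
import math

G = 2 * math.pi * 1.75e6  # coupling strength g in rad/s (value quoted in Sec. II)

# Canonical vectors (lambda_x, lambda_y, lambda_z); assumed values that
# reproduce the T_min figures quoted in the text.
CANONICAL = {
    "CNOT": (math.pi / 4, 0.0, 0.0),
    "SWAP": (math.pi / 4, math.pi / 4, math.pi / 4),
    "sqrt(SWAP)": (math.pi / 8, math.pi / 8, math.pi / 8),
}


def t_min(lam, g=G):
    """Minimum gate time in seconds for an Ising coupling g, assuming free local gates."""
    return sum(abs(x) for x in lam) / g


for name, lam in CANONICAL.items():
    print(f"{name}: T_min = {t_min(lam) * 1e9:.1f} ns")
```

Under these assumptions the script gives roughly 71.4 ns for CNOT, 214.3 ns for SWAP, and 107.1 ns for $\sqrt{\mathrm{SWAP}}$, in line with the experimental gate times quoted later in units of $T_{\min}$.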

IV. OPTIMAL CONTROL METHOD
In practice, single-qubit gate speeds are limited by finite drive strengths and the analytical speed limit in Eq. (4) does not apply. To the best of our knowledge, no analytical speed limit has been found with finite single-qubit gate time. In this case, we can no longer rely on the decomposition in Eq. (3) to reduce the problem to just finding the minimum time in implementing $U_d$. The true time-optimal protocol may not feature the structure of such a decomposition and is challenging to find analytically for a general target gate. In fact, if we still follow Eq. (3) for realizing a target two-qubit gate, the resulting gate time can be much longer than $T_{\min}$. To realize the universal single-qubit gates needed in Eq. (3), we use a two-axis gate (TAG) protocol first developed in Ref. [21], which employs an analytically obtained 3-segment drive pulse for $\Omega^{x,y}_{1,2}(t)$ to exactly cancel the effects of the static interaction for any values of $r_1$ and $r_2$. A single-qubit gate implemented this way has a gate time of at least $\pi/(2g)$ [33]. Apart from the native CZ gate that can be directly realized via an evolution of $H_0$ over a time $t = \pi/(4g) \approx 71.4$ ns [21,36,37], any two-qubit gate design that involves the use of single-qubit gate(s) realized via TAG requires a gate time of at least $T_{\min} + \pi/(2g)$ (since single-qubit gates cannot shorten $T_{\min}$), and is thus far from optimal.
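The statement that the CZ gate is native can be checked numerically. The sketch below assumes the Ising form $H_0 = g\,\sigma^z \otimes \sigma^z$ (the form implied by the quoted CZ time) and verifies that evolving for $t = \pi/(4g)$ yields CZ up to a global phase and single-qubit phase gates; the variable names are illustrative.

```python
import numpy as np

g = 2 * np.pi * 1.75e6          # coupling strength in rad/s
t_cz = np.pi / (4 * g)          # evolution time quoted in the text

Z = np.diag([1.0, -1.0])
ZZ = np.kron(Z, Z)              # diagonal, so the exponential acts on the diagonal
U_ising = np.diag(np.exp(-1j * g * t_cz * np.diag(ZZ)))

S_dag = np.diag([1.0, -1.0j])   # single-qubit phase gate S^dagger
CZ = np.diag([1.0, 1.0, 1.0, -1.0])

# CZ = e^{i pi/4} (S^dagger x S^dagger) exp(-i (pi/4) ZZ)
candidate = np.exp(1j * np.pi / 4) * np.kron(S_dag, S_dag) @ U_ising
print(f"t_CZ = {t_cz * 1e9:.1f} ns, matches CZ: {np.allclose(candidate, CZ)}")
```

The remaining single-qubit phase gates are rotations about the z axis, which is why they do not add to the entangling time in this picture.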
Consequently, to approach the analytical speed limit with finite $\Omega_{\max}$ (or finite single-qubit gate time), we adopt an alternative approach that avoids the use of any single-qubit gate and generates the target two-qubit gate directly. Specifically, we directly optimize the pulse shapes $\Omega^{x,y}_{1,2}(t)$ in our control Hamiltonian $H_1(t)$ in order to minimize the gate time for achieving a certain target gate with sufficiently high fidelity. For a given set of pulse shape functions $\Omega^{x,y}_{1,2}(t)$, we numerically find the evolution operator
\[ U(T) = \mathcal{T} \exp\left[-i \int_0^T \big(H_0 + H_1(t)\big)\, dt\right], \tag{5} \]
where $\mathcal{T}$ denotes time ordering and $T$ denotes the total evolution time. Note that $U(T)$ can achieve any two-qubit gate with properly engineered pulse shapes, as the Hamiltonian and the commutators of the Hamiltonian at different times span all SU(4) generators.
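For piecewise-constant pulses, the time-ordered exponential reduces to an ordered product of matrix exponentials. The following NumPy sketch illustrates this under several assumptions that are not confirmed by the text: $H_0 = g\,\sigma^z \otimes \sigma^z$, a drive Hamiltonian of the form $\sum_{i,\gamma} \Omega^\gamma_i \tilde{\sigma}^\gamma_i$ with no extra prefactor, and equal-length segments. The function names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)


def dressed_paulis(r1=1.0, r2=1.0):
    """Drive operators whose strength depends on the other qubit's state."""
    p1 = np.diag([1.0, r1]).astype(complex)   # |0><0| + r1 |1><1| on qubit 1
    p2 = np.diag([1.0, r2]).astype(complex)   # |0><0| + r2 |1><1| on qubit 2
    return [np.kron(X, p2), np.kron(Y, p2), np.kron(p1, X), np.kron(p1, Y)]


def expm_herm(H, dt):
    """exp(-i H dt) for Hermitian H via eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * vals * dt)) @ vecs.conj().T


def propagate(omegas, T, g, r1=1.0, r2=1.0):
    """Evolution operator for pulse amplitudes of shape (M, 4).

    Columns are ordered as (qubit 1, x), (qubit 1, y), (qubit 2, x), (qubit 2, y).
    """
    ops = dressed_paulis(r1, r2)
    H0 = g * np.kron(Z, Z)
    dt = T / len(omegas)
    U = np.eye(4, dtype=complex)
    for row in omegas:                      # later segments multiply from the left
        H = H0 + sum(w * op for w, op in zip(row, ops))
        U = expm_herm(H, dt) @ U
    return U
```

With all amplitudes set to zero and $T = \pi/(4g)$, `propagate` returns $\exp[-i(\pi/4)\,\sigma^z \otimes \sigma^z]$, the native CZ-equivalent evolution.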
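Next, we calculate the average gate fidelity between the target unitary $U$ and the evolution operator $U(T)$ using [38]
\[ F = \frac{\sum_j \mathrm{Tr}\big[U U_j^\dagger U^\dagger\, U(T)\, U_j\, U(T)^\dagger\big] + d^2}{d^2 (d+1)}, \qquad d = 4, \tag{6} \]
where $U_j \in \{\sigma^\gamma \otimes \sigma^{\gamma'}\}$ and $\sigma^\gamma \in \{\sigma^x, \sigma^y, \sigma^z, I\}$. For efficient numerical optimization, we will assume piecewise-constant pulse shapes with $M$ segments, i.e. $\Omega^\gamma_i(t) = \Omega^\gamma_{i,m}$ for $t$ in the $m$-th segment, where $m = 1, \dots, M$ indexes the segments sequentially. Our goal is to maximize $F$ over all possible values of $\{\Omega^\gamma_{i,m}\}$ subject to the constraints $|\Omega^\gamma_{i,m}| \le \Omega_{\max}$ for a given time $T$. The numerical speed limit $T_F$ is then defined as the minimum $T$ that can achieve $F > 1 - \epsilon$, where $\epsilon$ is the infidelity we can tolerate (set to 1% in the following).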
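When both the target and the implemented operation are unitary, the average gate fidelity has the standard closed form $F = (|\mathrm{Tr}(U^\dagger V)|^2 + d)/(d(d+1))$, which is equivalent to the Pauli-sum expression for unitary channels. A minimal sketch follows, with the function name chosen for illustration.

```python
import numpy as np


def average_gate_fidelity(U_target, U_actual):
    """Average gate fidelity between two unitaries of dimension d."""
    d = U_target.shape[0]
    overlap = np.trace(U_target.conj().T @ U_actual)
    return float((abs(overlap) ** 2 + d) / (d * (d + 1)))


CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
X_on_first = np.kron(np.array([[0, 1], [1, 0]], dtype=complex), np.eye(2))

print(average_gate_fidelity(CNOT, np.exp(0.3j) * CNOT))   # global phase is irrelevant
print(average_gate_fidelity(np.eye(4), X_on_first))       # orthogonal unitaries
```

The second call illustrates that the fidelity does not drop to zero even for trace-orthogonal unitaries: the minimum for two qubits is $d/(d(d+1)) = 0.2$.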
Since $F$ is a highly nonlinear function of $\{\Omega^\gamma_{i,m}\}$, simple numerical optimization methods will not work well in finding the global maximum of $F$. Here we develop a new method that combines the standard GRAPE algorithm [23] with state-of-the-art machine learning techniques. Using the backward propagation method in the widely used machine learning library PyTorch [39], we calculate the gradients of $F$ with respect to each pulse parameter $\Omega^\gamma_{i,m}$ automatically. We then run a stochastic gradient descent (SGD) algorithm with the Nesterov momentum method [40] to maximize $F$ over the pulse parameters. To avoid obtaining only a local maximum of $F$, we repeat each gradient descent process with 200 random seeds used for both initialization and SGD, and then select the global maximum among all repetitions. Further increasing the number of random seeds does not lead to noticeable improvement in maximizing $F$, showing that the optimization has converged.
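The structure of such an optimization can be sketched without PyTorch. The example below is not the authors' implementation: it replaces automatic differentiation with central finite differences, uses full (non-stochastic) gradients so that randomness enters only through the initialization, assumes $r_1 = r_2 = 1$, $H_0 = g\,\sigma^z \otimes \sigma^z$, and a drive Hamiltonian without extra prefactors, and works in units where $g = 1$. All function names and hyperparameters are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)
OPS = [np.kron(X, I2), np.kron(Y, I2), np.kron(I2, X), np.kron(I2, Y)]
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)


def fidelity(params, T, g, target):
    """Average gate fidelity of a piecewise-constant pulse; params has shape (M, 4)."""
    U = np.eye(4, dtype=complex)
    for row in params:
        H = g * np.kron(Z, Z) + sum(w * op for w, op in zip(row, OPS))
        vals, vecs = np.linalg.eigh(H)
        U = (vecs * np.exp(-1j * vals * T / len(params))) @ vecs.conj().T @ U
    return (abs(np.trace(target.conj().T @ U)) ** 2 + 4) / 20


def grad_fd(params, T, g, target, eps=1e-6):
    """Central finite-difference gradient (stand-in for automatic differentiation)."""
    grad = np.zeros_like(params)
    for idx in np.ndindex(params.shape):
        shift = np.zeros_like(params)
        shift[idx] = eps
        grad[idx] = (fidelity(params + shift, T, g, target)
                     - fidelity(params - shift, T, g, target)) / (2 * eps)
    return grad


def optimize(T, g, omega_max, target=CNOT, M=4, restarts=3, steps=150,
             lr=0.5, momentum=0.9, seed=0):
    """Projected gradient ascent with Nesterov momentum and random restarts."""
    rng = np.random.default_rng(seed)
    best_f, best_p = -1.0, None
    for _ in range(restarts):
        p = rng.uniform(-omega_max, omega_max, size=(M, 4))
        v = np.zeros_like(p)
        for step in range(steps + 1):
            f = fidelity(p, T, g, target)
            if f > best_f:                      # keep the best point seen so far
                best_f, best_p = f, p.copy()
            if step == steps:
                break
            look = np.clip(p + momentum * v, -omega_max, omega_max)  # look-ahead point
            v = momentum * v + lr * grad_fd(look, T, g, target)
            p = np.clip(p + v, -omega_max, omega_max)  # enforce |Omega| <= Omega_max
    return best_f, best_p
```

A call such as `optimize(T=1.5 * np.pi / 4, g=1.0, omega_max=3.0)` searches for a 4-segment CNOT pulse at $1.5\,T_{\min}$; the fidelity it reaches depends on the hyperparameters and the number of restarts, so no particular value is claimed here.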
To benchmark our numerical optimization method, we choose the target gate to be the CNOT gate and find the above-mentioned numerical speed limit $T_F$ for $F = 99\%$ as a function of $\Omega_{\max}$. We set $M = 16$, which allows the calculation to be done within a few hours on a small HPC cluster, and larger $M$ does not lead to noticeable improvements. As shown in Fig. 2, we clearly see that as $\Omega_{\max}/g$ increases, $T_F$ approaches the analytical speed limit $T_{\min}$, indicating that the optimization succeeded in reaching the theoretical speed limit. Importantly, the maximum single-qubit drive strength $\Omega_{\max}$ does not need to be significantly larger than the interaction strength $g$ to get close to the analytical speed limit. For example, setting $\Omega_{\max} = 3g$ already gives us a minimum gate time of $1.24\,T_{\min}$ with $F > 99\%$. We also compare our method with the standard GRAPE algorithm in the widely used QuTiP software [41]. With the same number of iterations and random initializations, the GRAPE algorithm in QuTiP can closely match our optimization results for $\Omega_{\max}/g < 2.5$. However, it fails to achieve $F > 99\%$ for $\Omega_{\max} > 2.5g$, and the same happens even with double the number of iterations or random initializations. This is likely because the algorithm struggles to escape local minima due to a larger parameter space for a larger value of $\Omega_{\max}$.
In Appendix C, we show that our optimization method also works well for different interaction Hamiltonians, such as the flip-flop interaction common in superconducting qubit systems.With such interaction, the speed optimization can significantly speed up CNOT and CZ gates since they can operate as fast as the native iSWAP gate.In contrast, most existing experiments with flip-flop interacting Hamiltonians report much slower CZ gates compared to iSWAP gates [13,15,42].

V. EXPERIMENTAL RESULTS
We now proceed to demonstrate the speed limits of the two-qubit CNOT, SWAP, and $\sqrt{\mathrm{SWAP}}$ gates experimentally. The procedure for this is as follows. First, for each gate, the total evolution time $T$ is varied from 0 to $\gtrsim T_{\min}$ in 20 steps and the optimized pulse sequence is obtained numerically for each value of $T$. Next, this pulse sequence is applied to the transmon qubits experimentally by modulating the microwave drive signals. Finally, the average gate fidelity $F$ is measured at time $T$ by performing quantum process tomography (QPT) [43]. Our QPT involves applying 36 different prerotations to an initial state with both qubits in the state $|0\rangle$, applying the optimized pulse sequence for time $T$, and then measuring 9 different Pauli operators (see Appendix D for details), resulting in 324 different experimental protocols, each of which is further repeated 500 times to ensure low statistical errors. After correcting the state preparation and measurement (SPAM) errors as well as performing a maximum likelihood estimation to ensure a completely positive and trace-preserving quantum map (see Appendix D for details), the QPT allows us to find a Pauli transfer matrix [44] for the corresponding quantum process, which can be further used to infer $F$ (Appendix D). This process allows us to find the value of $T$ above which we can get sufficiently high gate fidelity. Such $T$ is the experimental speed limit for the target gate.
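To illustrate how a Pauli transfer matrix (PTM) can be turned into an average gate fidelity, the sketch below uses the standard relations $F_{\mathrm{pro}} = \mathrm{Tr}(R_U^T R_{\mathcal{E}})/d^2$ and $F = (d F_{\mathrm{pro}} + 1)/(d+1)$. It builds the PTM of an ideal unitary rather than reconstructing one from data, so it is only a stand-in for the tomography pipeline of Appendix D; the function names are hypothetical.

```python
import itertools
import numpy as np

PAULIS_1Q = [np.eye(2, dtype=complex),
             np.array([[0, 1], [1, 0]], dtype=complex),
             np.array([[0, -1j], [1j, 0]], dtype=complex),
             np.diag([1.0, -1.0]).astype(complex)]
# All 16 two-qubit Pauli products, ordered II, IX, IY, IZ, XI, ...
PAULIS_2Q = [np.kron(a, b) for a, b in itertools.product(PAULIS_1Q, repeat=2)]


def ptm_of_unitary(U):
    """R[i, j] = Tr(P_i U P_j U^dagger) / d for a unitary channel."""
    d = U.shape[0]
    R = np.empty((len(PAULIS_2Q), len(PAULIS_2Q)))
    for i, Pi in enumerate(PAULIS_2Q):
        for j, Pj in enumerate(PAULIS_2Q):
            R[i, j] = np.trace(Pi @ U @ Pj @ U.conj().T).real / d
    return R


def fidelity_from_ptm(R_target, R_measured, d=4):
    """Average gate fidelity from the PTMs of the target and measured processes."""
    f_process = np.trace(R_target.T @ R_measured) / d**2
    return float((d * f_process + 1) / (d + 1))
```

In an experiment, `R_measured` would come from the SPAM-corrected, maximum-likelihood reconstruction, while `R_target` is computed from the ideal gate as above.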
There are several experimental limitations in this procedure. First, as strong microwave drives can heat up the superconducting qubits and cause decoherence, we only send microwave pulses of at most $2\pi \times 6$ MHz in Rabi frequency, roughly 3 times the coupling strength $g$. But as we have shown in Fig. 2, this limitation should not prevent us from getting close to the analytical speed limit. A more noticeable limitation is that we can only generate smoothly varying pulse shapes that approximate the segmented (and thus discontinuous) pulse shapes used in the numerical optimization. As the number of segments $M$ increases, this approximation deteriorates while the gate speed increases (and eventually converges). For our setup, we choose $M = 4$ for the experiment as a sweet spot for balancing the error and speed. We note that this limitation can be addressed by numerically optimizing smooth pulse shape functions (such as a train of Gaussian envelopes), although such optimization is more resource intensive. Finally, with $r_1, r_2 \neq 1$ experimentally, our single-qubit drives will induce a small amount of extra interaction that would in principle allow us to go above the analytical speed limit for sufficiently large $\Omega_{\max}$. Our numerical optimizer accounts for this artifact. The amount of speedup over the scenario of $r_{1,2} = 1$ varies for different target gates.
Our experimental results are shown in Fig. 3. The measured gate fidelity $F$ (red curves) closely matches the one obtained from the numerical simulation of the experiment with no error (blue curves). The deviations between the two grow as the gate fidelity gets close to 1 for reasons we discuss in the next section. For the CNOT gate, we were able to achieve $F \approx 96.5\%$ experimentally with a gate time of $T = 93.7$ ns $\approx 1.32\,T_{\min}$ (Fig. 3a). We emphasize that this outperforms the CNOT gate implemented using the SWIPHT protocol [45] performed on the same hardware ($F \approx 94.6\%$ for a gate time of $1.87\,T_{\min}$ [21]), which is a protocol designed specifically for our hardware. The highest fidelity we achieve is $F \approx 98.3\%$ at time $T \approx 1.84\,T_{\min}$.
For the SWAP and $\sqrt{\mathrm{SWAP}}$ gates, the extra interactions caused by non-unity $r_1$ and $r_2$ values have a more noticeable effect in speeding up the gates. For the SWAP gate (Fig. 3b), we obtain an experimental gate fidelity of $F \approx 95.9\%$ at $T = 216$ ns $\approx 1.01\,T_{\min}$, where theoretically $F \approx 99.997\%$. A SWAP gate with such a short time is hard to achieve via a gate sequence using a typical universal gate set, making our method particularly useful given the importance of SWAP gates in many quantum algorithms [24]. For the $\sqrt{\mathrm{SWAP}}$ gate (Fig. 3c), we obtain an experimental gate fidelity of $F \approx 97.0\%$ at $T = 126$ ns $\approx 1.18\,T_{\min}$, with $F \approx 99.999\%$ in theory.
For all gates, the demonstrated experimental speed limits are reasonably close to the analytical speed limits.We note that the fidelities achieved here are lower than state-of-the-art due to limitations of the hardware platform (discussed in the next section) and not due to the optimal control algorithm.Even without optimal control, the fidelities obtained on this setup are close to or lower than what we are getting here [21].

VI. ERROR ANALYSIS
We have calculated the fidelity between the experimental process and the exact time evolution operator $U(T)$ in Eq. (5) for each point in Fig. 3, which is in general > 95% (see Fig. 4). As seen from Fig. 3, the experimental errors get larger at large values of $T$. This is possibly due to the following reasons.
First, the qubits decohere as time increases. This is evidenced by our measurement of a dark evolution (i.e. with the drive Hamiltonian $H_1$ turned off) process fidelity that drops from ≈ 99.3% to ≈ 96.3% from $T = 0$ to $T = 3\pi/(4g)$ (the theoretical minimum time for the SWAP gate), as shown in Fig. 4. This large loss in fidelity is unrelated to optimal control or errors in the control pulses, and likely results from the finite $T_1$ time of the qubits, measurement errors, and low-fidelity (about 98% on average) single-qubit gates used in our QPT [21,33]. With better hardware designs, these errors can be largely eliminated. For example, single-qubit gates with > 99.9% fidelities have already been achieved with superconducting qubits [46].
Second, when $T$ is large enough to allow the numerically optimized $F$ to approach 1, imperfect calibration or fluctuations in the microwave drive amplitudes or phases tend to create a larger discrepancy between the experiment and the theory, as we discuss in detail in Appendix E. This can account for up to 0.1% loss in fidelity for ≈ 1% deviations in the pulse parameters $\Omega^\gamma_{i,m}$.
Finally, there are also systematic errors coming from the leakage to higher excited states (in particular the $|2\rangle$ state for each transmon), cross talk between the drives of each qubit, rotating wave approximations, and the deviation of the experimental pulse shapes from the ideal square waves used in our numerical optimization. We can characterize these errors with a Hamiltonian in the lab frame for two qutrits [Eq. (7)], where $m \in \{00, 01, 10, 11, 02, 20, 12, 21, 22\}$ labels the energy eigenstates of the static Hamiltonian, and $d_{m,m'}$ represents the dipole moment for the transition between states $|m\rangle$ and $|m'\rangle$. $E_1(t)$ and $E_2(t)$ denote the electric fields of the two microwave drives we applied at frequencies $\omega_1 = E_{10} - E_{00}$ and $\omega_2 = E_{01} - E_{00}$, respectively. Their values are set by the actual experimentally applied electric fields that follow the pulse shapes from our optimization method but have finite rising and falling edges between different pulse segments. The Hamiltonian in Eq. (7) then fully models the leakage outside the qubit subspace, the cross talk between the two drive fields, and realistic pulse shapes without rotating wave approximations. We then calculate the fidelity between the exact evolution operator of this Hamiltonian and the one in Eq. (5) (which was used for Figs. 3-4). As shown in Fig. 5, these errors only add up to about 0.3% infidelity on average. And since these errors are explicitly modelled by the Hamiltonian in Eq. (7), we can further minimize their impact on gate fidelities using the same optimization method we built. However, this is beyond the scope of this work, as our experiment hardly benefits from such effort because other error sources dominate.

VII. CONCLUSION AND OUTLOOK
There are primarily three advancements made in this work. First, we have studied the speed limits for two-qubit gates under realistic experimental conditions and shown that these limits are close to the analytical speed limits derived under ideal conditions. Second, we have developed an optimal control algorithm to generate a realistic pulse sequence to achieve these speed limits. Our algorithm performs better than a standard GRAPE algorithm and can be used to design speed-optimized two-qubit gates in a variety of quantum computing platforms with different types of interactions. It can offer significant speedups for non-native two-qubit gates especially when single-qubit gate times are not negligible. Finally, we have experimentally demonstrated the quantum speed limits for various two-qubit gates using superconducting qubits.
We have also carefully characterized the error sources for our experimental gates. Most of the gate errors come from characterization/calibration errors, imperfect measurements, qubit decoherence, and low-fidelity single-qubit gates. While the strong drive pulses in the optimal control could lead to more leakage and cross-talk errors, we show that these errors only add up to about 0.3% infidelity on average, and they can be further mitigated by optimizing a more accurate Hamiltonian. It is also worth pointing out that by optimizing the speed of two-qubit gates, errors from qubit decoherence will be suppressed. We therefore expect our method to be able to improve the fidelity of the whole quantum circuit.
An important future direction is to generalize this work to a multi-qubit scenario where additional qubits are used to speed up a two-qubit gate.Previous work has shown that significant scaling speedups may be obtained in performing remote quantum gates or preparing useful many-body entangled states [18,19] with long-range interacting qubits.However, questions regarding the speed limit of entangling gates when interactions are strongly long-ranged are still largely open [17].Such interactions play important roles in quantum information scrambling [20] and the development of fully-connected quantum computers [47].Another interesting direction is to study the speed limit of entangling gates when higher excited states outside the qubit subspace are utilized [48], where experimental and analytical results are both lacking.
Our experimental device is operated at 10 mK in a Bluefors LD dilution refrigerator. Full schematics of the experimental setup are shown in Fig. 6. All qubit drive and readout microwave tones are delivered via the feedline, which has an output amplification chain consisting of a Raytheon BBN Josephson parametric amplifier (JPA) preamp at base, a high-electron-mobility transistor (HEMT) amplifier at 4 K, and a high-gain room-temperature amplifier.
Each experimental cycle consists of a state initialization, a time evolution under the engineered Hamiltonian flanked by process tomography rotations [43], and a heterodyne state readout (see Fig. 6). The state initialization occurs by waiting 500 µs ≈ 12 $T_1$ between two experimental cycles, which is long enough to guarantee that each qubit is in the $|0\rangle$ state. All gates consist of microwave tones from a Holzworth HS9008B, pulse shaped by a BBN arbitrary pulse sequencer (APS) in a quadrature modulation scheme. Readout consists of a simultaneous 2 µs probe (Agilent N5183Ms) of the two readout resonators to detect shifts in their frequencies due to their respective qubit states. The I/Q components of the readout signal shift are extracted via down conversion and a digital lock-in routine with a reference tone. They are then used to identify the two-qubit states as $|00\rangle$, $|01\rangle$, $|10\rangle$, $|11\rangle$ via a classification algorithm using support vector machines. Total state preparation and measurement errors, quantified by the basis-state preparation confusion matrix [43], stayed under 5%, with the errors dominated by readout errors associated with qubit state relaxation during the measurement.
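One common way to use a basis-state confusion matrix is to invert it on measured outcome probabilities. The sketch below illustrates that generic technique only; it is not confirmed to be the SPAM correction used in this experiment, and the function name and the example numbers are hypothetical.

```python
import numpy as np


def correct_readout(p_measured, confusion):
    """Estimate true state probabilities from measured ones.

    confusion[i, j] = P(read state i | prepared state j); each column sums to 1.
    """
    p = np.linalg.solve(confusion, p_measured)
    p = np.clip(p, 0.0, None)       # remove small negative values caused by noise
    return p / p.sum()


# Illustrative single-qubit confusion matrix; a two-qubit matrix is built with a
# Kronecker product, which assumes independent readout errors on the two qubits.
C1 = np.array([[0.98, 0.05],
               [0.02, 0.95]])
C2 = np.kron(C1, C1)                # ordering |00>, |01>, |10>, |11>

p_true = np.array([0.1, 0.2, 0.3, 0.4])
p_meas = C2 @ p_true
print(correct_readout(p_meas, C2))
```

Direct inversion can amplify statistical noise when the confusion matrix is poorly conditioned, which is one reason a maximum likelihood step is often applied afterwards.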
The computational subspace spectrum of the transmons was determined via a combination of spectroscopy (directly probing excitations with a 10 µs square pulse) and Ramsey experiments (driving 2 MHz off-resonant, running a typical Ramsey sequence, and noting the deviation of the fitted frequency from 2 MHz) [49]. The strength of the drive fields on the qubits was inferred from the frequency of Rabi oscillations of the excited state population incurred by driving at uniform strength for a linearly increasing duration. The linearity of the pulse shaping quadrature channels on the pulse sequencer was characterized by measuring the Rabi oscillation frequency resulting from a sweep over pulse amplitudes, analyzing it via Fourier filtering [33], and correcting for it at the software level.

Appendix B: Obtaining Analytical Speed Limits
We provide details on how to obtain the analytical speed limit $T_{\min}$ defined in Eq. (4) of the main text. Given a two-qubit unitary operator $U$, the key step is to find the decomposition $U = (U_1 \otimes U_2)\, U_d\, (V_1 \otimes V_2)$ of Eq. (3), where $U_d = \exp\big[-i \sum_{\gamma} \lambda_\gamma\, \sigma^\gamma \otimes \sigma^\gamma\big]$, and $U_1, V_1$ ($U_2, V_2$) are some single-qubit gates on the first (second) qubit. This decomposition is non-trivial, and the detailed procedure can be found in Ref. [12]. Here we provide the results of the decomposition for the three target gates we studied in Table I. The values of $\lambda_{x,y,z}$ directly lead to the $T_{\min}$ values for the three target gates shown in Eq. (4) of the main text. Note that an overall phase difference is tolerated for the decomposition of $U$.
Table I. Detailed decompositions of the CNOT, SWAP, and $\sqrt{\mathrm{SWAP}}$ gates based on Eq. (3).
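For the SWAP and $\sqrt{\mathrm{SWAP}}$ gates, the decomposition can be checked numerically without any single-qubit gates, since both gates are generated by the Heisenberg operator $\sigma^x \otimes \sigma^x + \sigma^y \otimes \sigma^y + \sigma^z \otimes \sigma^z$. The sketch below assumes the sign convention $U_d = \exp[-i \sum_\gamma \lambda_\gamma \sigma^\gamma \otimes \sigma^\gamma]$ and the canonical vectors $(\pi/4, \pi/4, \pi/4)$ and $(\pi/8, \pi/8, \pi/8)$, which are consistent with the quoted $T_{\min}$ values; the function names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)


def u_d(lam):
    """exp(-i sum_gamma lam_gamma sigma^gamma (x) sigma^gamma), minus sign assumed."""
    H = lam[0] * np.kron(X, X) + lam[1] * np.kron(Y, Y) + lam[2] * np.kron(Z, Z)
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * vals)) @ vecs.conj().T


def equal_up_to_phase(A, B):
    """True if the unitaries A and B differ only by a global phase."""
    return bool(np.isclose(abs(np.trace(A.conj().T @ B)), A.shape[0]))


SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0],
                 [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
SQRT_SWAP = np.array([[1, 0, 0, 0],
                      [0, (1 + 1j) / 2, (1 - 1j) / 2, 0],
                      [0, (1 - 1j) / 2, (1 + 1j) / 2, 0],
                      [0, 0, 0, 1]], dtype=complex)

print(equal_up_to_phase(u_d([np.pi / 4] * 3), SWAP))
print(equal_up_to_phase(u_d([np.pi / 8] * 3), SQRT_SWAP))
```

The CNOT gate is not included because its decomposition requires non-trivial single-qubit gates $U_{1,2}$ and $V_{1,2}$, which are not reproduced here.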

Appendix C: Optimization for different static Hamiltonians
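To demonstrate the universality of our optimal control method, here we apply it to two different static Hamiltonians commonly seen in superconducting qubit platforms [13,25,35,50,51]: a flip-flop (also known as XY) Hamiltonian $H^{XY}_0$ and an XXZ Hamiltonian $H^{XXZ}_0$ that contains both a flip-flop interaction and a ZZ (Ising-type) interaction,
\[ H^{XY}_0 = g\,(\sigma^x \otimes \sigma^x + \sigma^y \otimes \sigma^y), \qquad H^{XXZ}_0 = g\,(\sigma^x \otimes \sigma^x + \sigma^y \otimes \sigma^y + \eta\, \sigma^z \otimes \sigma^z). \]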
As an example, we set the parameter $\eta = 1/2$ in the following, and similar results are expected for different $\eta$ values. We now perform our gate speed optimization described in Section IV with our experimental static Hamiltonian $H_0$ in Eq. (1) replaced by the above $H^{XY}_0$ or $H^{XXZ}_0$, and the target gate being either SWAP or CNOT.
In Fig. 7, we show the minimum gate time $T$ (in units of the corresponding analytical speed limit $T_{\min}$) as a function of the maximum drive strength $\Omega_{\max}$ (in units of the interaction strength $g$), similar to Fig. 2. Note that for the CNOT gate (as well as the CZ gate), $T_{\min} = \pi/(4g)$ holds for $H_0$, $H^{XY}_0$, and $H^{XXZ}_0$. But for the SWAP gate, $T_{\min} = 3\pi/(8g)$ for $H^{XY}_0$ and $T_{\min} = 3\pi/(10g)$ for $H^{XXZ}_0$. In other words, the XY or XXZ interaction can generate a SWAP gate faster than the Ising interaction used in our experiment at the same strength.
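The SWAP speed limits quoted above can be compared directly. The short sketch below evaluates them for the experimental coupling strength $g = 2\pi \times 1.75$ MHz; applying that value to the XY and XXZ Hamiltonians is purely illustrative, since those interactions were not realized on this chip.

```python
import math

g = 2 * math.pi * 1.75e6   # rad/s; experimental value, reused here for illustration

# SWAP speed limits as quoted in the text for each static Hamiltonian
swap_t_min = {
    "Ising (H_0)": 3 * math.pi / (4 * g),
    "XY": 3 * math.pi / (8 * g),
    "XXZ (eta = 1/2)": 3 * math.pi / (10 * g),
}

for name, t in swap_t_min.items():
    speedup = swap_t_min["Ising (H_0)"] / t
    print(f"{name}: T_min = {t * 1e9:.1f} ns ({speedup:.1f}x relative to Ising)")
```

At equal coupling strength, the SWAP speed limit is therefore 2 times shorter for the XY interaction and 2.5 times shorter for the XXZ interaction with $\eta = 1/2$ than for the Ising interaction.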
We have also set a higher fidelity threshold of $F > 99.99\%$ to better show the potential of our optimization method in Fig. 7, while the number of pulse segments, random seeds, and iterations remain identical to those in Fig. 2 (for our experiment such a high fidelity threshold is unnecessary due to experimental error sources). We find that for both $H^{XY}_0$ and $H^{XXZ}_0$, one can reach $T \approx T_{\min}$ with $\Omega_{\max}/g < 4$. For the CNOT gate, one needs larger drive strengths to reach the theoretical speed limit, but at moderate drive strengths ($\Omega_{\max} \approx 3g$ as in our experiment), one can still get close to the theoretical speed limit ($T \approx 1.2\,T_{\min}$).

[...] this error, we simulate additional measurements by adding Gaussian-distributed random noise with zero mean and unit standard deviation to each Pauli operator measured during our QPT [44]. This allows us to set an upper bound on the statistical error of the mean that would be obtained on re-performing the full experiment with all other error sources held fixed. As shown in Fig. 3 of the main text, this statistical error on the measured $F$ is less than 1% in all cases.
We have also numerically simulated the effects of imperfect calibration or noise on the optimized pulse shapes (either for amplitudes or phases) by adding random perturbations to each optimized pulse parameter $\Omega^\gamma_{i,m}$ (see main text). We expect such perturbations to be present in our experimental setup with magnitudes of a few percent of $\Omega_{\max}$. The simulated average gate fidelity $F$ is shown in Fig. 9, where all other parameters are identical to the exact $F$ curves in Fig. 3 of the main text. We see that our optimization method is robust to a small amount (1%) of noise on the pulse shapes, where the fidelity can still exceed 99.9% for gate times close to $T_{\min}$. For larger noise (5%), the infidelity caused by errors on the drive pulses can be around 1−2%, but such large noise is rare in most experimental platforms.
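A robustness study of this kind can be sketched as follows. The example perturbs every pulse parameter with Gaussian noise whose standard deviation is a fixed fraction of $\Omega_{\max}$ and reports the mean fidelity relative to the noiseless evolution. The Gaussian distribution, the choice $r_1 = r_2 = 1$, the form $H_0 = g\,\sigma^z \otimes \sigma^z$, and the drive Hamiltonian without extra prefactors are all assumptions, and the function names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)
OPS = [np.kron(X, I2), np.kron(Y, I2), np.kron(I2, X), np.kron(I2, Y)]


def propagate(params, T, g):
    """Piecewise-constant evolution; params has shape (M, 4), r1 = r2 = 1 assumed."""
    U = np.eye(4, dtype=complex)
    for row in params:
        H = g * np.kron(Z, Z) + sum(w * op for w, op in zip(row, OPS))
        vals, vecs = np.linalg.eigh(H)
        U = (vecs * np.exp(-1j * vals * T / len(params))) @ vecs.conj().T @ U
    return U


def mean_fidelity_under_noise(params, T, g, omega_max, rel_noise, trials=200, seed=0):
    """Mean average gate fidelity to the noiseless gate under parameter noise."""
    rng = np.random.default_rng(seed)
    U_ideal = propagate(params, T, g)
    fids = []
    for _ in range(trials):
        noisy = params + rng.normal(0.0, rel_noise * omega_max, size=params.shape)
        overlap = np.trace(U_ideal.conj().T @ propagate(noisy, T, g))
        fids.append((abs(overlap) ** 2 + 4) / 20)
    return float(np.mean(fids))
```

Here `params` would be the optimized pulse amplitudes; comparing `rel_noise=0.01` with `rel_noise=0.05` gives a rough feel for how quickly the fidelity degrades, although the numbers depend on the specific pulse and are not expected to reproduce Fig. 9.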

Figure 1. (a) Optical micrograph of the experimental chip including qubits, readout resonators, test Josephson junctions, and test resonators. (b) Zoomed-in view of the two floating qubits. Each qubit consists of two identical pads (red for the left qubit and blue for the right qubit) and a Josephson junction connecting the two pads. Each qubit is coupled to its own readout resonator (blue). (c) Grounded circuit model of the capacitively coupled qubits.

Figure 2. The minimum time $T_F$ (in units of $T_{\min}$) it takes to achieve a CNOT gate with $F > 99\%$ as a function of $\Omega_{\max}$ (in units of $g$) using either our optimization algorithm (blue) or the GRAPE algorithm in QuTiP (red). Both algorithms use 16 segments of the drive pulses and 200 random restarts. The GRAPE algorithm in QuTiP fails to reach $F > 99\%$ for larger $\Omega_{\max}$ values, where the gate time approaches its theoretical limit $T_{\min}$.

Figure 3. Experimental measurements of the average gate fidelity $F$ using optimized 4-segment drive pulses, with the target gate being (a) CNOT, (b) SWAP, and (c) √SWAP. The red curves represent experimental measurements, while the blue curves represent the exact numerical calculation of $F$ without considering any experimental error. $\Omega_{\max} = 6$ MHz for the CNOT gate and $\Omega_{\max} = 5$ MHz for the SWAP and √SWAP gates. The error bars represent an upper bound on the statistical error of the mean for 500 repeated measurements at each point.

Figure 4. Fidelity $F$ between the experimental quantum process (characterized by the QPT) and the corresponding exact time evolution operator in Eq. (5) using the optimized pulse shapes for a given gate time $T$, with the target gate being CNOT, SWAP, or √SWAP. The blue curve represents the fidelity between the experimental evolution without the drives (i.e., dark evolution) and the ideal evolution operator $e^{-iH_0 T}$.

Figure 5. Average fidelity $F$ between the evolution operators calculated using the Hamiltonian in Eq. (7) and using Eq. (5) for the speed-optimized CNOT, SWAP, and √SWAP gates shown in Fig. 3.

Figure 6. Our experimental setup, composed of qubit drives and heterodyne state readout. Each qubit drive (green section) is shaped via quadrature modulation by a BBN APS1 and Polyphase Microwave AM4080A. Readout (orange) consists of digitally locking in the signal passing through the device with a reference and extracting I/Q shifts used to classify the ground/excited states.

Figure 8. Quantum process tomography based on the Pauli transfer matrix for the three target gates we performed experimentally. Top row, from left to right: Pauli transfer matrices for an ideal CNOT gate, SWAP gate, and √SWAP gate. Bottom row, from left to right: examples of SPAM-error-corrected Pauli transfer matrices obtained experimentally for the CNOT, SWAP, and √SWAP gates, with average gate fidelities of 95.6%, 93.1%, and 95.7%, respectively.

Figure 9. Average gate infidelity $1 - F$ calculated using the optimized pulse shapes in Fig. 3 of the main text but with random Gaussian noise added to each pulse shape parameter $\Omega^{\gamma}_{i,m}$ (see main text), with the target gate being CNOT (a), SWAP (b), and √SWAP (c). The blue (red) curves correspond to a standard deviation of the Gaussian noise of $0.01\,\Omega_{\max}$ ($0.05\,\Omega_{\max}$).
Figure 7. Minimum gate times at different maximum drive strengths achieved by our optimization method for an XY or XXZ interacting Hamiltonian shown in Eq. (C1), with the target gate being SWAP or CNOT.Every point here has average gate fidelity F > 99.99%.