Comparing planar quantum computing platforms at the quantum speed limit

An important aspect that strongly impacts the experimental feasibility of quantum circuits is the ratio of gate times and typical error time scales. Algorithms with circuit depths that significantly exceed the error time scales will result in faulty quantum states and error correction is inevitable. We present a comparison of the theoretical minimal gate time, i.e., the quantum speed limit (QSL), for realistic two- and multi-qubit gate implementations in neutral atoms and superconducting qubits. Subsequent to finding the QSLs for individual gates by means of optimal control theory we use them to quantify the circuit QSL of the quantum Fourier transform and the quantum approximate optimization algorithm. In particular, we analyze these quantum algorithms in terms of circuit run times and gate counts both in the standard gate model and the parity mapping. We find that neutral atom and superconducting qubit platforms show comparable weighted circuit QSLs with respect to the system size.


I. INTRODUCTION
Quantum computers promise to solve computational problems that are deemed hard or even intractable for classical computers. Their potential applications include prime-factoring of large integers [1], quantum simulation [2], quantum chemistry [3], combinatorial optimization [4], and even problems in finance [5]. Currently, quantum computing is in the so-called noisy intermediate-scale quantum (NISQ) era [6], characterized by imperfect qubit control and qubit numbers that prohibit quantum error correction [7] for relevant problem sizes. Nevertheless, recent proof-of-principle experiments [8-11] demonstrated that a computational quantum advantage over classical computers can be reached already with NISQ hardware. However, it remains a crucial challenge to go beyond the proof-of-principle stage, i.e., to demonstrate a quantum advantage for practically relevant computational tasks on resource-limited present-day devices.
To reach a practical quantum advantage regime [2] in NISQ-era digital quantum computing, it is of crucial importance to execute quantum algorithms as efficiently as possible in order to minimize the time for noise mechanisms to impair the quantum information processing. This effectively makes a minimization of the quantum algorithm run times and gate counts desirable, a task that can be addressed in various ways. One option is to find an algorithm's optimal circuit representation, i.e., a circuit requiring a minimal circuit depth together with a minimal gate count for a given set of available gates. Quantum circuit optimization has, e.g., been done heuristically [12,13] or by machine learning techniques [14,15], and several open-source packages are readily available [16-18].
Another option for minimizing the algorithm run time is to minimize the time for each elemental quantum gate of a given quantum circuit. While protocols for fast, high-fidelity quantum gates are nowadays routinely available and implemented on all major quantum computing platforms like neutral atoms [19,20], superconducting circuits [8,21] or trapped ions [22,23], the ideal would be to execute every quantum gate at its quantum speed limit (QSL). In general, the QSL denotes the shortest time needed to accomplish a given task [24]. It constitutes a fundamental limit in time and depends on the system under consideration, i.e., its Hamiltonian and the control knobs available to steer the dynamics.
* daniel.basilewitsch@uibk.ac.at
Here, we determine the QSLs of quantum gates for two major quantum computing platforms that allow for two-dimensional (2D) qubit arrangements: neutral atoms and superconducting circuits [25]. Our study reveals how close the current experimental gate protocols are to their QSLs and thus exemplifies what can theoretically still be gained from further speeding up gate protocols. Moreover, provided that every gate could be experimentally realized at the QSL, our analysis gives an estimate of how many gates can be executed realistically before decoherence takes over and renders longer quantum circuits practically infeasible. To this end, we consider two prototypical quantum computing algorithms: (i) the quantum Fourier transform (QFT), required for Shor's algorithm for integer factorization [1], and (ii) the quantum approximate optimization algorithm (QAOA), used to solve combinatorial optimization problems [4]. Considering standard NISQ devices for both neutral atoms and superconducting circuits with qubits arranged in a 2D grid architecture with only nearest-neighbor connectivity, we calculate the circuit run times with gates at the QSL for both algorithms. This allows for a direct comparison of both platforms in terms of the maximal problem sizes that should currently be feasible on their NISQ representatives.
A common challenge arising in 2D platforms with nearest-neighbor connectivity is the requirement to perform gates between non-neighboring qubits. In the standard gate model (SGM), such gates can be replaced by sequences of universal single- and two-qubit gates using the available local connectivity. However, this comes at the price of increasing the circuit depths and gate counts. As an alternative to the SGM, we also examine circuit representations using the so-called parity mapping (PM). In brief, the PM for quantum computing [26] and quantum optimization [27,28] is a problem-independent hardware blueprint that only requires nearest-neighbor connectivity at the cost of increased qubit numbers. Since for QAOA circuits in the PM it is beneficial to use local three- and four-qubit gates [29,30], we also determine their QSLs on both platforms.
It should be noted that this work focuses entirely on the determination of the QSLs for various quantum gates and how to turn these into a fair comparison of circuit run times across platforms. A more detailed discussion of gate protocols for specific quantum gates or platforms can, e.g., be found in Refs. [31-33] for neutral atoms or in Refs. [34-36] for superconducting circuits. Moreover, it should be noted that we only consider the gate times, respectively circuit run times, as well as the number of gates as indicators for the feasibility of quantum circuits. While both quantities are doubtlessly important, they are by no means the only quantities impacting a circuit's feasibility. A holistic figure of merit assessing a circuit's feasibility would also need to account for state preparation and measurement errors and various other error sources.
The paper is organized as follows. In Sec. II we present our main result, i.e., an overview of circuit run times when using gate times from the literature and gate times at the QSL, evaluated both for neutral atoms and superconducting circuits as well as for circuits in the SGM and the PM. In Sec. III we then present the details of our numerical model. Section IV introduces the basic notion of QSLs and how quantum optimal control theory (OCT) can be used for determining QSLs. A detailed discussion of the QSLs, as well as of which combinations of available control fields allow reaching them, is given in Sec. V. Section VI concludes.

II. MAIN RESULT: QUANTUM ALGORITHMS AT THE QUANTUM SPEED LIMIT
In this section, we compare the run times of QFT and QAOA quantum circuits using gate set implementations available on neutral atom and superconducting circuit hardware. To do so we consider two scenarios. In the first scenario, we use literature values for gate times of state-of-the-art gate implementations of the minimal universal gate set (referred to as "standard gate set" (SGS) from now on) native to each platform, which thus represents the canonical way of converting quantum algorithms into executable quantum circuits. In the second scenario, we use an extended set of gates available at each platform with gate times at the QSL (referred to as "QSL gate set" (QGS) from now on).
In the SGM, each quantum algorithm is converted into quantum circuits using gates from a universal set of quantum gates. Such a universal set typically contains all the single-qubit gates and at least one entangling two-qubit gate, e.g., the CNOT gate [38]. The row named "standard gate set" in Table I summarizes the native universal gate sets and their typical gate times for both platforms at comparable gate fidelities. For neutral atoms, we consider the controlled-Z gate, CZ = diag{1, 1, 1, −1}, as the typical entangling two-qubit gate. This is a common choice for neutral atoms [20] as it has been successfully used for implementing quantum algorithms [39,40]. For superconducting circuits, we have chosen the iSWAP-like Sycamore gate as an entangling two-qubit gate, motivated by its short gate time and successful usage in recent quantum advantage experiments with quantum processors based on tunable couplers [8,9].
In addition, we consider an extended, platform-independent set of quantum gates operated at the QSL (see "QSL gate set" in Table I). This set consists of all single-qubit gates as well as several multi-qubit gates implementable on both platforms: CZ, CNOT, SWAP, ZZZ, and ZZZZ. The availability of a wider range of gates allows for more flexibility in finding circuit representations with fewer gates and shorter circuit run times. In addition, we also assume that every gate in this set is executed at the QSL, which allows for another speed-up of circuit run times. The run times obtained for the QGS should therefore be viewed as an estimate for the QSL of the circuit itself, i.e., the circuit QSL, provided that no other, potentially faster and/or better suited gates are available [41]. Note that the details regarding the method to determine the QSLs and their results are presented in Secs. IV and V. The row "QSL gate set" in Table I summarizes the gates of the extended set and lists the QSL times for both platforms.
FIG. 1. Comparison of the circuit run times and gate counts for the quantum Fourier transform (QFT) and quantum approximate optimization algorithm (QAOA) between neutral atoms and superconducting circuits. The left (right) column corresponds to the QFT (QAOA) and the orange (purple) markers correspond to neutral atoms (superconducting circuits). The circuit run times for both algorithms and various problem instances of different size, N = 9, 16, ..., 81 qubits for the QFT and N = 9, 16, ..., 121 qubits for the QAOA, are given in panels (a) and (c), respectively. The numbers of two- and multi-qubit gates in the corresponding circuits are given in panels (b) and (d), respectively. The data for the squares [circles] is obtained using the standard gate model (SGM) with the gates and times from the standard gate set (SGS) [QSL gate set (QGS)], cf. Table I. In contrast, the plus signs [crosses] represent the results when the same algorithms are realized in the parity mapping (PM) using the SGS [QGS]. The pentagons correspond to an altered 2D architecture allowing for next-to-nearest neighbor (NNN) coupling between qubits. The run time for each circuit is given in units of typical platform-specific error times. The gray area in panels (a) and (c) indicates where the circuit run times exceed the error times.
Table I indicates that absolute circuit run times for neutral atoms will be longer compared to those for superconducting circuits since their elemental gate times differ by more than an order of magnitude. However, in order to ensure a fair comparison of circuit run times, the absolute gate times need to be weighted by the finite coherence time and lifetime of qubits and other levels involved in the gate mechanisms.
For neutral atoms, we take both the coherence time of the qubit states, given by the dephasing time $T_2^* = 4$ ms [39], and the lifetime of the Rydberg state, $T_{\mathrm{Ryd}} = 150$ µs [39], as typical error time scales against which we compare circuit run times. We take the former as reference for single-qubit gates, since they do not occupy the Rydberg level [42], and the latter for two-qubit gates, where controlled transitions via Rydberg levels constitute the primary gate mechanism [43]. For superconducting circuits, we take their intrinsic $T_1$ time, $T_1 = 15$ µs [8], as typical error time scale since it applies for both single- and two-qubit gates. However, note that these choices are rather conservative. For neutral atoms, any two-qubit gate dynamics will naturally also involve the qubit levels, which have much longer coherence times. Weighting two-qubit gates exclusively by the Rydberg lifetime thus overestimates the lifetime-induced error probability. For superconducting circuits, longer $T_1$ times up to 500 µs have already been reported [44].
In the following, we calculate the circuit run times for QFTs and single QAOA steps in the SGM using gates from both the SGS and the QGS. As already outlined above, we assume both hardware platforms to consist of qubits arranged in 2D arrays with only nearest-neighbor connectivity (see Sec. III for details regarding the model). This requires replacing all gates between non-connected qubits with gate sequences between physically connected qubits.
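The weighting described above can be illustrated with a minimal sketch. It assumes that the weighted run time is obtained by dividing each gate duration along a purely sequential gate list by the error time scale assigned to its gate type and summing the results. The error time scales are those quoted in the text, whereas the gate durations and the helper name `weighted_run_time` are hypothetical placeholders and not the Table I values.

```python
# Error time scales quoted in the text (in microseconds).
T2_STAR_US = 4000.0  # neutral atoms: qubit dephasing time, 4 ms
T_RYD_US = 150.0     # neutral atoms: Rydberg-state lifetime
T1_US = 15.0         # superconducting circuits: T1 time

# Which error time scale weights which gate type on each platform.
ERROR_TIME_US = {
    "neutral_atoms": {"1q": T2_STAR_US, "2q": T_RYD_US},
    "superconducting": {"1q": T1_US, "2q": T1_US},
}


def weighted_run_time(layers, platform):
    """Sum durations of sequential (gate_type, duration_us) layers,
    each divided by the error time scale of its gate type."""
    scales = ERROR_TIME_US[platform]
    return sum(duration / scales[kind] for kind, duration in layers)


# HYPOTHETICAL gate durations in microseconds (placeholders only).
example_layers_na = [("1q", 2.0), ("2q", 0.5), ("1q", 2.0)]
example_layers_sc = [("1q", 0.025), ("2q", 0.012), ("1q", 0.025)]

w_na = weighted_run_time(example_layers_na, "neutral_atoms")
w_sc = weighted_run_time(example_layers_sc, "superconducting")
print(f"neutral atoms:   {w_na:.4f}")
print(f"superconducting: {w_sc:.4f}")
```

A weighted run time approaching one corresponds to the boundary of the gray area in Fig. 1 (a) and (c).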

Standard gate set circuits
Using gates from the SGS, the circuit's gate sequences consist of single-qubit gates plus CZ gates in case of neutral atoms and single-qubit gates plus Sycamore gates in case of superconducting circuits (cf. Table I). Since the circuit for the QFT does by construction not require any gate between non-neighboring qubits [45], we only replace the controlled-phase gate and SWAP gate by gate sequences from the SGS. The situation changes for the QAOA circuits, as they assume an all-to-all connected architecture, and finding a circuit representation with minimal depth while only requiring nearest-neighbor connectivity is likely NP-hard [46]. As a remedy, we use the open-source pytket compiler [16] to translate the QAOA's bare quantum circuits to executable quantum circuits that are in agreement with the hardware's nearest-neighbor connectivity and SGSs. We furthermore use the compiler's quantum circuit optimization features to optimize the circuits in order to minimize gate counts and circuit depths.
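To give a feeling for why all-to-all interactions are costly on a nearest-neighbor grid, the following sketch counts SWAP gates under a deliberately crude assumption: every qubit pair is brought into adjacency independently by moving one qubit along a shortest path, which takes the Manhattan distance minus one SWAPs. This is not the routing strategy of pytket, which reuses qubit movements and optimizes globally; all function names are hypothetical.

```python
from itertools import combinations


def grid_position(qubit, n_cols):
    """Row and column of a qubit index on a row-major 2D grid."""
    return divmod(qubit, n_cols)


def manhattan_distance(q1, q2, n_cols):
    """Number of nearest-neighbor steps between two grid qubits."""
    r1, c1 = grid_position(q1, n_cols)
    r2, c2 = grid_position(q2, n_cols)
    return abs(r1 - r2) + abs(c1 - c2)


def naive_swap_overhead(n_rows, n_cols):
    """Total SWAP count if each qubit pair is made adjacent independently."""
    n_qubits = n_rows * n_cols
    return sum(
        max(manhattan_distance(a, b, n_cols) - 1, 0)
        for a, b in combinations(range(n_qubits), 2)
    )


# Naive overhead for square grids of increasing size.
for side in (3, 4, 5):
    print(side * side, naive_swap_overhead(side, side))
```

Even this simplistic count grows much faster than the qubit number, which is why compiler-based circuit optimization matters for the QAOA in the SGM.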
The squares in Fig. 1 (a) and (c) show the resulting run times for quantum circuits corresponding to QFTs and QAOA steps, respectively, when executed on neutral atoms (orange) and superconducting circuits (purple). Within each line, the size of the problem instance increases from the lower left to the upper right. Note that for a fair comparison between platforms, all gate times have been weighted by each platform's intrinsic error time scales as described previously. Circuits with run times significantly exceeding the platform's intrinsic, level-dependent error time scales, highlighted by the gray area in Fig. 1 (a) and (c), will most likely yield unreliable results. We observe that despite the different time scales for gates, the weighted circuit run times are almost identical for both platforms and for both the QFT and QAOA, see squares in Fig. 1 (a) and (c), respectively. However, only problem instances with relatively small qubit numbers seem to be currently feasible on both platforms.
While the circuit run time is one deciding factor in whether a circuit is feasible on current NISQ hardware, this measure neglects the fact that every gate comes with an intrinsic error probability. Assuming gate errors on the order of 0.1% to 1%, which are realistic both for neutral atoms [20,40] and superconducting circuits [37], it is clear that the gate count also limits the feasibility of quantum circuits and lower gate counts are thus preferable. To this end, the squares in Fig. 1 (b) and (d) show the number of two-qubit gates required for QFT and single-step QAOA circuits, respectively. Note that circuit representations in terms of the SGS require the same number of two-qubit gates both on neutral atom and superconducting qubit hardware. Hence, both cases are represented by a single line in Figs. 1 (b) and (d).
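How strongly gate errors of this size restrict the gate count can be estimated with a short sketch. It assumes independent, identical gate errors, so that the success probability of a circuit with n gates is roughly (1 − p)^n; this is a simplification introduced here for illustration, and the target success probability of 0.5 is an arbitrary choice.

```python
import math


def estimated_success_probability(n_gates, gate_error):
    """Success probability assuming independent, identical gate errors."""
    return (1.0 - gate_error) ** n_gates


def max_gates_for_target(gate_error, target_success=0.5):
    """Largest gate count whose estimated success probability stays
    at or above the target."""
    return math.floor(math.log(target_success) / math.log(1.0 - gate_error))


# Gate errors at both ends of the range quoted in the text.
for gate_error in (0.001, 0.01):
    n_max = max_gates_for_target(gate_error)
    print(gate_error, n_max, estimated_success_probability(n_max, gate_error))
```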

QSL gate set circuits
The circles in Fig. 1 (a) and (c) show the circuit run times for the same quantum algorithms and complexity levels as used for the squares but when the QGS is employed to generate circuit representations. The corresponding two-qubit gate counts are illustrated by the circles in Fig. 1 (b) and (d). We observe an overall reduction in circuit run times and gate counts for both platforms, both algorithms and any considered problem instance. This improvement is a combined effect of using maximally fast quantum gates at the QSL and taking advantage of the increased flexibility provided by the extended set of gates. Especially the availability of SWAP gates needs to be stressed as it directly reduces the gate count and thus leads to a reduction in circuit run time even without an additional speedup in gate times. In order to quantify the improvements achievable through the QGS, Fig. 2 (a) lists the average reduction in circuit run times and gate counts for each algorithm and platform.
Our analysis demonstrates to which extent circuit run times and gate counts can, from a theoretical perspective, still be improved if standard gates are replaced by an extended gate set with gate times at the QSL. However, even with the improved gate set and its reduction in run time and gate count, most of the quantum circuits, i.e., circles in Fig. 1, remain likely infeasible using current NISQ hardware. Quantum error correction codes could in principle address the issue of circuit run times and gate counts exceeding the limits set by finite lifetimes and gate errors. Nevertheless, since both lifetimes [44,47] and qubit numbers [48] are constantly increasing, more complex quantum circuits will likely reach the feasible regime in the near future.

C. Circuit times in the parity mapping
Besides the representation of quantum algorithms using the SGM, we now consider the representation of the same algorithms within the PM. While the PM was originally designed to tackle combinatorial optimization problems via quantum annealing [27], it can also be utilized for digital quantum optimization algorithms such as QAOA [30] as well as to achieve universal quantum computing [26]. At its core, the PM circumvents the need for long-range interactions between qubits, which in turn renders gates between non-adjacent qubits obsolete. However, this comes at the expense of requiring more physical qubits and many-body constraints on 2 × 2 plaquettes of qubits as specified in detail in Appendix A 2. Similar to our analysis for the SGM (see Sec. II B), we use either gates from the SGS or the QGS in the following.
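The trade-off between qubit overhead and local constraints can be made concrete with a small sketch. Since Appendix A 2 is not reproduced in this section, the layout below is an assumption that follows the standard construction of the parity architecture [27] for all-to-all connectivity: each pair of logical spins is represented by one physical parity variable, and three- and four-body plaquette products must equal +1. The function names are hypothetical.

```python
import math
from itertools import combinations, product


def parity_encode(spins):
    """Map logical spins s_i = +/-1 to parity variables s_i * s_j, i < j."""
    return {
        (i, j): spins[i] * spins[j]
        for i, j in combinations(range(len(spins)), 2)
    }


def plaquette_constraints(n):
    """Three- and four-body plaquettes of the assumed triangular layout."""
    three_body = [
        [(i, i + 1), (i + 1, i + 2), (i, i + 2)] for i in range(n - 2)
    ]
    four_body = [
        [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
        for i in range(n - 3)
        for j in range(i + 2, n - 1)
    ]
    return three_body + four_body


def all_constraints_satisfied(n):
    """Check that every logical configuration fulfills every plaquette."""
    constraints = plaquette_constraints(n)
    for spins in product((-1, 1), repeat=n):
        parity = parity_encode(spins)
        for plaquette in constraints:
            if math.prod(parity[q] for q in plaquette) != 1:
                return False
    return True


n_logical = 5
n_physical = n_logical * (n_logical - 1) // 2
n_constraints = len(plaquette_constraints(n_logical))
print(n_physical, n_constraints, all_constraints_satisfied(n_logical))
```

In this construction, N logical qubits require N(N − 1)/2 physical qubits, and the number of plaquette constraints equals the number of physical qubits minus N − 1.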

Standard gate set circuits
The plus signs in Fig. 1 (a) [(b)] show the circuit run times [two-qubit gate counts] of QFTs in the PM for the same problem instances as used for the SGM (squares). In both cases and for both platforms, the gates and times of the SGS have been used. We observe a reduction in circuit run times and gate counts for both platforms with the comparison being made with respect to the values for the SGM [see also Fig. 2 (c)]. The same comparison can be done for the QAOA steps with their circuit run times and gate counts given by the plus signs in Fig. 1 (c) and (d), respectively. Regarding single-step QAOA resource requirements, we observe a significant reduction in circuit run times [see also Fig. 2 (c)] due to the constant circuit depth in the PM [29,30]. However, note that the circuit depth differs for neutral atoms and superconducting circuits due to the different coupling mechanisms between qubits, with neutral atoms having the deeper circuits, cf. Appendix A 2. The reduction in circuit run time is accompanied by an increase in two-qubit gate counts compared to the SGM, which we attribute to the decomposition of the constraint gates, cf. Eq. (A6), into the natively available gates within each platform.

QSL gate set circuits
The crosses in Fig. 1 show the results for circuit representations in the PM when employing gates and times from the QGS. For the QFT on neutral atoms, we do not observe any further reduction in circuit time or gate counts compared to its representation utilizing the SGS. This is because its circuit representations [49] for neutral atoms contain only single-qubit and CZ gates, i.e., gates for which the gate times are identical within the SGS and the QGS. In contrast, for superconducting circuits, we still observe an improvement in circuit time since the CZ gate becomes directly available in the QGS and must no longer be replaced by Sycamore and single-qubit gates. However, the representations in the SGS and QGS only differ by single-qubit gates and hence their two-qubit gate count is identical and also identical to that of the neutral atoms. This is reflected in Fig. 1 (c) by a single line of superimposed plus signs and crosses.
For the single-step QAOA circuits in the PM, we observe a significant reduction both in circuit run time and gate counts on both platforms when the QGS is used instead of the SGS. This is due to the availability of the multi-qubit gates ZZZ(γ) and ZZZZ(γ), cf. Eq. (A6), where the gate-count reduction originates from avoiding single- and two-body gate decompositions. In addition, it turns out to be much faster to use control pulses that directly implement these multi-qubit gates as opposed to serially applying control pulses to implement the required single- and two-qubit gates. Figure 2 (b) summarizes the average run time and gate count reductions (in terms of two- and multi-qubit gates) when replacing the SGS with the QGS within the PM.
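The overhead of decomposing a three-body constraint gate can be illustrated numerically. Because Eq. (A6) is not reproduced in this section, the sketch assumes the convention ZZZ(γ) = exp(−iγ Z⊗Z⊗Z); the textbook CNOT-ladder decomposition used below is a generic construction and not necessarily the decomposition employed for the SGS circuits.

```python
import numpy as np


def zzz_gate(gamma):
    """exp(-i*gamma*ZZZ) as a diagonal 8x8 matrix (assumed convention)."""
    parities = np.array([bin(k).count("1") % 2 for k in range(8)])
    return np.diag(np.exp(-1j * gamma * (-1.0) ** parities))


def cnot(control, target, n_qubits=3):
    """CNOT as a permutation matrix; qubit 0 is the most significant bit."""
    dim = 2 ** n_qubits
    matrix = np.zeros((dim, dim))
    for k in range(dim):
        bits = [(k >> (n_qubits - 1 - q)) & 1 for q in range(n_qubits)]
        if bits[control]:
            bits[target] ^= 1
        j = sum(b << (n_qubits - 1 - q) for q, b in enumerate(bits))
        matrix[j, k] = 1.0
    return matrix


def rz_on_last(gamma):
    """exp(-i*gamma*Z) acting on qubit 2 only."""
    rz = np.diag([np.exp(-1j * gamma), np.exp(1j * gamma)])
    return np.kron(np.eye(4), rz)


gamma = 0.37  # arbitrary test angle
# Circuit order: CNOT(0,1), CNOT(1,2), Rz on qubit 2, CNOT(1,2), CNOT(0,1).
decomposed = (
    cnot(0, 1) @ cnot(1, 2) @ rz_on_last(gamma) @ cnot(1, 2) @ cnot(0, 1)
)
print(np.allclose(decomposed, zzz_gate(gamma)))
```

Under these assumptions, a single native ZZZ(γ) gate replaces four CNOT gates and one single-qubit rotation.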
In contrast to panels (a) and (b) of Fig. 2, where the gate set changes and the circuits stay in the SGM or PM, panels (c) and (d) examine the opposite scenario, i.e., the gate sets are kept constant but the circuit models change. In detail, Fig. 2 (c) and (d) show the average circuit run time and gate count reduction when the SGM is replaced by the PM while the gate set is given by the SGS and QGS, respectively. Especially Fig. 2 (d) needs to be emphasized as it reveals to which extent the PM allows reducing circuit run times and gate counts compared to representations of the same circuits in the SGM. As mentioned above and detailed in Appendix A 2, the PM-specific improvements come at the sole expense of requiring more qubits. For the current state of NISQ hardware and independent of the platform, we believe the SGM to be better suited for QFTs, as more problem instances seem to be feasible, judging by circuit run time and required qubits. In contrast, for QAOA using the QGS, we find the described parity representation of circuits (despite its overhead-induced inferior success probabilities compared to direct SGM-QAOA [50]) to be an advantageous option, as the run time and gate count are drastically reduced for each QAOA step compared to the SGM. In particular, the need for more PM-QAOA steps compared to SGM-QAOA steps might be compensated by the resource reduction per PM-QAOA step. Given the ongoing upscaling of quantum computers in terms of qubit numbers, see, e.g., IBM's quantum roadmap [48], the PM seems to be a viable option for QAOA.
As a final remark, it should be noted that we only discussed optimization problems with all-to-all connectivity, cf. Eq. (A1). However, many realistic optimization problems have rather sparse connectivity, which would lead to a significant reduction in required qubits [28] while maintaining the PM's strength of constant circuit depth.

III. MODELING NEUTRAL ATOM AND SUPERCONDUCTING CIRCUIT PLATFORMS
In Sec. II we have presented the main result of our work, namely a calculation and comparison of circuit run times and gate counts using neutral atoms and superconducting circuits as quantum computing platforms. In this section, we now introduce the detailed physical models used for both platforms. Since our focus is the description of dynamics, we focus on the respective Hamiltonians including the various control knobs typically available to steer the systems and implement quantum gates.

A. Neutral atoms
Arrays of trapped neutral atoms laser-coupled to highly excited Rydberg states are a promising platform for quantum computing [51,52] and quantum simulation [53], as qubits can for example be encoded in long-lived hyperfine ground states. High-fidelity single-qubit gates can be achieved using microwave fields [54], two-photon Raman transitions [55], or a combination of microwaves and gradient fields for individual-qubit addressing [56-58]. In contrast, entangling operations between atoms, i.e., many-body gates, are typically realized via strongly interacting Rydberg levels [59,60], and various control schemes for two- and multi-qubit gates have been experimentally demonstrated [20,61-64].
Such gates have been used in recent experiments [39,40] with up to several hundreds of atoms arranged in a planar geometry. If we consider the smallest building block of such a 2D array, it consists of N = 4 atoms with each atom described by three relevant levels. In the rotating frame and for $\hbar = 1$, its Hamiltonian is given by Eq. (1), where $|\downarrow_n\rangle$ and $|\uparrow_n\rangle$ denote the qubit levels of the nth atom and $|r_n\rangle$ its Rydberg level. The Rabi frequencies of the two laser fields coupling the $|\downarrow_n\rangle$ and $|\uparrow_n\rangle$ states of the nth atom to their respective Rydberg level $|r_n\rangle$ are denoted by $\Omega_{\downarrow,n}(t)$ and $\Omega_{\uparrow,n}(t)$ [65]. The phases of the respective laser fields are given by $\phi_{\downarrow,n}(t)$ and $\phi_{\uparrow,n}(t)$, while $\Delta_n(t)$ denotes the detuning of the laser field from the exact transition frequency. We assume $\Delta_n(t)$ to be identical for both fields. The van der Waals (vdW) interaction between atoms n and m is denoted by $V_{nm}$. Note that in Hamiltonian (1) we have omitted fields responsible for single-qubit gates, i.e., fields coupling the two qubit levels, since our focus is on multi-qubit gates for which the Rydberg levels and their interaction are the primary gate mechanism.
If not stated otherwise, we assume the laser fields to act identically on all atoms, i.e., $\Omega_{\downarrow,n}(t) = \Omega_{\downarrow}(t)$, and analogously for the remaining fields, for all n. In the following, we moreover assume by default a pseudo-2D architecture with $V_{nm} = V$ as described in more detail in Ref. [66]. Effectively, this corresponds to an architecture with equally strong nearest neighbor (NN) and next-to-nearest neighbor (NNN) coupling. However, it should be noted that we do not exploit the NNN couplings when constructing the quantum circuits in Sec. II. The pseudo-2D assumption thus does not affect any of the single- or two-qubit gate times presented in Table I, as these times would be identical for an actual, planar 2D architecture where V is simply the coupling strength between nearest neighbors. The assumption of the pseudo-2D architecture only affects the multi-qubit constraint gates ZZZ(γ) and ZZZZ(γ), cf. Eq. (A6), which are, within our study, only relevant for the PM using the QGS, cf. the crosses in Fig. 1 (c).

B. Superconducting circuits
Qubits encoded in the lowest energy levels of superconducting circuits are another promising platform for quantum computing [67]. Since their parameters can be to some extent chosen during their fabrication process, superconducting circuits come in various variants and parameter regimes, with transmon qubits [68] being currently the most prominent ones for quantum computing. Their qubit levels can be manipulated via microwave fields to either implement single-qubit gates [69,70] or two-qubit gates [71,72] with high fidelity. In contrast to neutral atom qubits, which interact directly via their Rydberg levels, qubits encoded in superconducting circuits often interact indirectly via intermediate coupling elements [73]. In an architecture where such couplers are made tunable [74], the effective interaction strength between the qubits becomes tunable as well. Such qubit architectures have been successfully used in recent quantum advantage experiments [8,9] and thus are a prototypical NISQ quantum computing platform.
The qubits in such a tunable coupler architecture are arranged on a 2D lattice with nearest-neighbor couplings. In the following, we take the architecture from Ref. [8] as a reference. The smallest building block within such a system consists of N = 4 qubits. The Hamiltonian for this subsystem follows Ref. [8]. In it, $\omega_n(t)$ is the frequency-tunable level splitting of transmon n, $\alpha_n$ its anharmonicity and $b_n$ its annihilation operator. The tunable coupling between transmons n and m is denoted by $g_{nm}(t)$ and $\eta$ is the non-linearity of the involved transmons, which is roughly constant, $\alpha_n \approx \eta$ for all n. The time-dependent amplitude and frequency of a local X-type control field on transmon n, e.g., some microwave field, are denoted by $\Omega_n(t)$ and $\bar{\omega}_n(t)$, respectively. For numerical reasons, it is advantageous to change into a rotating frame. We change into a rotating frame with frequency $\omega_{\mathrm{rot}}$ and obtain the Hamiltonian of Eq. (5), in which we have introduced the auxiliary control fields
$$\tilde{\Omega}_{n,\mathrm{re}}(t) = \mathrm{Re}\left[\Omega_n(t)\, e^{-i(\omega_{\mathrm{rot}} - \bar{\omega}_n(t))t}\right], \tag{6a}$$
$$\tilde{\Omega}_{n,\mathrm{im}}(t) = \mathrm{Im}\left[\Omega_n(t)\, e^{-i(\omega_{\mathrm{rot}} - \bar{\omega}_n(t))t}\right]. \tag{6b}$$
While $\Omega_n(t)$ and $\bar{\omega}_n(t)$ are the actual physical control fields, we may take $\tilde{\Omega}_{n,\mathrm{re}}(t)$ and $\tilde{\Omega}_{n,\mathrm{im}}(t)$ as auxiliary control fields which capture the time-dependent nature of $\Omega_n(t)$ and $\bar{\omega}_n(t)$ in the rotating frame, and the latter can be recovered from $\tilde{\Omega}_{n,\mathrm{re}}(t)$ and $\tilde{\Omega}_{n,\mathrm{im}}(t)$.
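The relation between the physical and the auxiliary control fields in Eqs. (6a) and (6b) can be sketched numerically as follows. The pulse envelope, the drive frequency, the rotating-frame frequency and the function name are hypothetical illustration values, and the amplitude is assumed to be real and non-negative.

```python
import numpy as np


def auxiliary_controls(amplitude, drive_freq, omega_rot, t):
    """Real and imaginary parts of the drive in the rotating frame,
    following the structure of Eqs. (6a) and (6b)."""
    rotated = amplitude * np.exp(-1j * (omega_rot - drive_freq) * t)
    return rotated.real, rotated.imag


t = np.linspace(0.0, 50.0, 501)                    # time grid (arb. units)
amplitude = 0.1 * np.sin(np.pi * t / t[-1]) ** 2   # hypothetical envelope
drive_freq = np.full_like(t, 5.02)                 # hypothetical frequency
omega_rot = 5.0                                    # hypothetical frame

omega_re, omega_im = auxiliary_controls(amplitude, drive_freq, omega_rot, t)

# The physical amplitude is recovered as the modulus of the complex field.
recovered_amplitude = np.hypot(omega_re, omega_im)
print(np.allclose(recovered_amplitude, amplitude))
```

The drive frequency is encoded in the phase of the complex field, so both physical controls can in principle be obtained from the two auxiliary ones.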

IV. DETERMINING QUANTUM SPEED LIMITS VIA QUANTUM OPTIMAL CONTROL
After having introduced the physical model for neutral atoms and superconducting circuits in Sec. III, we now review the basic notion of QSLs in Sec. IV A and of quantum optimal control theory in Sec. IV B, since both build the theoretical and methodological foundation for the results presented in this work. Our method is described in Sec. IV C.

A. Quantum speed limits
The notion of quantum speed limits (QSLs) naturally arises in the context of quantum control problems. To this end, let us consider a quantum system described by the Hamiltonian $H(t) = H(\{E_k(t)\})$, which depends on a set of control fields, $\{E_k(t)\}$, that can be externally tuned, e.g., the time-dependent amplitudes, phases or detunings in Eqs. (1) or (5). A quantum control problem is then defined by a set of initial states, $\{|\psi_l^{\mathrm{in}}\rangle\}$, that should be transferred into a set of target states, $\{|\psi_l^{\mathrm{trgt}}\rangle\}$,
$$|\psi_l^{\mathrm{trgt}}\rangle = U(T, 0; \{E_k(t)\})\, |\psi_l^{\mathrm{in}}\rangle \quad \text{for all } l, \tag{7}$$
where $U(T, 0; \{E_k(t)\})$ is the system's time-evolution operator and T the total time. Any choice of $\{E_k(t)\}$ which fulfills Eq. (7) is considered a solution to the control problem. It is important to note that solutions to quantum control problems are usually not unique. Even for a fixed protocol duration T there typically exist many and often infinitely many solutions.
The QSL for a given control problem is defined by the shortest protocol duration $T_{\mathrm{QSL}}$ for which at least one solution exists, i.e., for which at least one set of control fields $\{E_k(t)\}$ exists that fulfills Eq. (7). In the context of quantum computing and the NISQ era, where time is a limited resource due to decoherence, it is desirable to implement quantum gates at the QSL. In order to calculate $T_{\mathrm{QSL}}$ analytically, $U(T, 0; \{E_k(t)\})$ must be analytically calculable for any set $\{E_k(t)\}$ of conceivable control fields, a requirement that typically limits an analytical calculation of $T_{\mathrm{QSL}}$ to simple systems [75].
Besides an analytical determination, there are various methods to approximate $T_{\mathrm{QSL}}$. One prominent method is to calculate a lower bound $T_{\mathrm{bound}} \leq T_{\mathrm{QSL}}$ in order to get an estimate for $T_{\mathrm{QSL}}$ itself. For the simplest case of a state-to-state control problem, lower bounds can be calculated analytically [24]. In contrast, in case of multiple pairs of initial and final states, which describe the implementation of quantum gates, such lower bounds only exist for very simple systems [76]. In most cases, one needs to resort to numerical tools for estimating $T_{\mathrm{QSL}}$. In that context, quantum optimal control theory has proven to be very useful [77] as it not only estimates $T_{\mathrm{QSL}}$ quite accurately but additionally yields the control fields that implement the desired dynamics, i.e., realizing the transition from initial to target states, cf. Eq. (7). Since this is our method of choice, we introduce it in more detail in the following.
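As an illustration of an analytical lower bound for a state-to-state problem, the sketch below evaluates the well-known Mandelstam-Tamm bound, $T \geq \pi/(2\Delta E)$ for a transfer between orthogonal states under a time-independent Hamiltonian, for a resonantly driven two-level system. The choice of this particular bound, the Hamiltonian and the Rabi frequency are assumptions made here for illustration.

```python
import numpy as np

omega = 2.0  # hypothetical Rabi frequency (arbitrary units, hbar = 1)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
hamiltonian = 0.5 * omega * sigma_x

psi_in = np.array([1, 0], dtype=complex)
psi_trgt = np.array([0, 1], dtype=complex)

# Energy uncertainty of the initial state.
mean_e = np.vdot(psi_in, hamiltonian @ psi_in).real
mean_e2 = np.vdot(psi_in, hamiltonian @ hamiltonian @ psi_in).real
delta_e = np.sqrt(mean_e2 - mean_e ** 2)

# Mandelstam-Tamm bound for reaching an orthogonal state.
t_bound = np.pi / (2.0 * delta_e)


def propagate(duration):
    """Time-evolution operator exp(-i*H*duration) via diagonalization."""
    evals, evecs = np.linalg.eigh(hamiltonian)
    return evecs @ np.diag(np.exp(-1j * evals * duration)) @ evecs.conj().T


def transfer_fidelity(duration):
    """Population in the target state after the given duration."""
    return abs(np.vdot(psi_trgt, propagate(duration) @ psi_in)) ** 2


fidelity_at_bound = transfer_fidelity(t_bound)
fidelity_before = transfer_fidelity(0.9 * t_bound)
print(t_bound, fidelity_at_bound, fidelity_before)
```

In this simple example the bound is saturated, i.e., the transfer is completed exactly at $T = \pi/\Omega$ and not earlier; for gates on realistic multi-level systems, no such closed-form statement is generally available.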

B. Quantum optimal control theory
Quantum optimal control theory (OCT) [78] is a toolbox of analytical and numerical tools for deriving optimized control fields that solve a given control problem, e.g., in the shortest time or with minimal error. Mathematically, an optimal control problem is formulated by introducing the cost functional
$$J[\{\psi_l\},\{E_k\}] = \varepsilon_T[\{\psi_l(T)\}] + \int_0^T J_t[\{\psi_l(t)\},\{E_k(t)\},t]\,\mathrm{d}t, \tag{8}$$
where $\{\psi_l(t)\}$ is a set of time-evolved states and $\{E_k(t)\}$ a set of control fields to be optimized. The error measure $\varepsilon_T$ quantifies the distance between the time-evolved states $|\psi_l(T)\rangle = U(T, 0; \{E_k(t)\})\,|\psi_l^{\mathrm{in}}\rangle$ and the desired target states $|\psi_l^{\mathrm{trgt}}\rangle$ at the protocol's final time $T$, cf. Eq. (7). The term $J_t$ in Eq. (8) captures time-dependent running costs. In most cases, the error measure $\varepsilon_T$ is the crucial figure of merit. In order to optimize for quantum gates, we use the error measure $\varepsilon_T$ of Ref. [79], referred to as Eq. (9) in the following, and take $|\psi_l^{\mathrm{trgt}}\rangle = O\,|\psi_l^{\mathrm{in}}\rangle$ as the desired target states for the target gate $O$. The set $\{\psi_l^{\mathrm{in}}\}$ runs over the $N_{\mathrm{trgt}}$ logical basis states affected by $O$, with $N_{\mathrm{trgt}} = 4, 8, 16$ for two-, three- or four-body gates, respectively.
Since the cost functional J is formulated such that smaller values correspond to better solutions of the control problem, solving an optimal control problem becomes essentially a minimization task, i.e., to find a set of control fields {E opt k (t)} that minimizes J, respectively ε T .This is an optimization problem for which several numerical algorithms have been developed [80][81][82][83][84].Many of them are readily available in open-source software packages [85][86][87][88][89].
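To make the role of the error measure concrete, the following minimal sketch evaluates one common gate-error functional, $\varepsilon_T = 1 - \frac{1}{N_{\mathrm{trgt}}}\,\mathrm{Re}\sum_l \langle\psi_l^{\mathrm{trgt}}|\psi_l(T)\rangle$, for a CZ target. This particular functional is an assumption made for illustration and is not necessarily the one of Ref. [79]; the helper names `inner`, `apply` and `gate_error` are hypothetical.

```python
# Illustrative gate-error functional (an assumption, not necessarily Eq. (9)):
#   eps_T = 1 - (1/N_trgt) * Re sum_l <psi_l^trgt | psi_l(T)>

def inner(bra, ket):
    """Complex inner product <bra|ket> of two state vectors given as lists."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

def apply(matrix, state):
    """Matrix-vector product, used to build target states O|psi_in>."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]

def gate_error(target_gate, evolved_states, basis_states):
    """Error between time-evolved states and the targets O|psi_in>."""
    targets = [apply(target_gate, psi) for psi in basis_states]
    overlap = sum(inner(t, e) for t, e in zip(targets, evolved_states))
    return 1.0 - overlap.real / len(basis_states)

# CZ gate in the two-qubit logical basis, so N_trgt = 4.
CZ = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, -1]]
BASIS = [[1 if i == j else 0 for i in range(4)] for j in range(4)]

perfect = [apply(CZ, psi) for psi in BASIS]   # ideal time evolution
print(gate_error(CZ, perfect, BASIS))         # vanishing error
print(gate_error(CZ, BASIS, BASIS))           # "doing nothing" instead of CZ
```

Note that this choice is sensitive to the global phase of the implemented gate; other functionals relax that requirement.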

C. Optimization procedure and method
In the context of determining $T_{\mathrm{QSL}}$ numerically via OCT, we search for the shortest protocol duration $T$ for which the error measure $\varepsilon_T$, cf. Eq. (9), is still sufficiently small. In mathematical terms, we therefore define an error threshold $\varepsilon_{\max}$ and search for the shortest time $T$ for which
$$\min_{\{E_k(t)\}} \varepsilon_T \le \varepsilon_{\max} \tag{10}$$
has a solution. However, the minimization over all conceivable control fields cannot be done numerically, as there are infinitely many fields to check, and thus must be replaced in practice by a sampling over finitely many fields. In order to explore the function space efficiently by finite sampling, optimization algorithms as described in Sec. IV B can be used. Since we are interested in the fundamental QSL, we put only minimal limitations, apart from physically motivated limitations on amplitudes, on the form of each control field $E_k(t)$.
Hence, we need an optimization algorithm that is capable of exploring a function space of almost arbitrary field shapes.
As our method of choice we use Krotov's method [90], a gradient-based optimization algorithm for time-continuous control fields. While a more detailed description of Krotov's method is given in Appendix B, its basic working principle is outlined in the following. It consists of an iterative update of the control fields $\{E_k(t)\}$. Starting from a set of guess control fields $\{E_k^0(t)\}$, Krotov's method updates them until either $\varepsilon_T \le \varepsilon_{\max}$ or a maximum number of iterations is reached. This procedure can be viewed as a local but structured search within the space of all conceivable sets of control fields, the so-called control landscape. The locally searched area is thereby determined by the choice of the guess fields $\{E_k^0(t)\}$, which set the starting point of the search. While the local nature of this search might appear to contradict the global search required for evaluating Eq. (10), it can be turned into an approximate global search by using various sets of randomized guess fields. The combined effect of all these local searches is to "cover" a larger fraction of the control landscape. The total procedure for approximating $T_{\mathrm{QSL}}$ using this method is thus to start with a protocol duration $T$ for which the optimization algorithm finds solutions, i.e., optimized fields giving rise to $\varepsilon_T \le \varepsilon_{\max}$, and then to successively lower $T$ until none of the various sets of randomized guess fields yields a solution anymore. Appendix C summarizes the details regarding the generation of random guess fields as well as each field's parametrization within Krotov's method.
Similar applications of numerical optimal control techniques have previously shown excellent agreement with analytically provable QSLs [91,92]. At worst, this method could overestimate the actual QSLs, in which case the actual QSLs would be even smaller and the circuit and gate times in Sec. II that use the QGS would be even better.
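The scan over protocol durations described above can be sketched on a toy problem. The example below is a deliberately simplified stand-in, not the setup of this work: a single resonantly driven qubit with $H = \Omega(t)\,\sigma_x/2$ and $|\Omega(t)| \le \Omega_{\max}$, for which the transfer $|0\rangle \to |1\rangle$ has the known QSL $\pi/\Omega_{\max}$ (125 ns for $\Omega_{\max}/2\pi = 4$ MHz). Krotov's method is replaced by a plain clipped gradient descent, and all function names and the grid of durations are hypothetical.

```python
import math
import random

OMEGA_MAX = 2 * math.pi * 0.004   # rad/ns, i.e. Omega_max / 2pi = 4 MHz
EPS_MAX = 1e-3                    # error threshold, as in the main text
N_STEPS = 50                      # piecewise-constant pulse segments

def error(pulse, dt):
    """Toy transfer error for |0> -> |1> under H = Omega(t)/2 * sigma_x."""
    theta = sum(pulse) * dt       # accumulated rotation angle
    return math.cos(theta / 2) ** 2

def local_search(T, rng, max_iter=1000):
    """Clipped gradient descent from a random guess (stand-in for Krotov)."""
    dt = T / N_STEPS
    pulse = [rng.uniform(-OMEGA_MAX, OMEGA_MAX) for _ in range(N_STEPS)]
    for _ in range(max_iter):
        if error(pulse, dt) <= EPS_MAX:
            break
        theta = sum(pulse) * dt
        # Negative gradient of the error w.r.t. each amplitude, rescaled by 1/(T*dt).
        step = math.sin(theta) / (2 * T)
        pulse = [min(OMEGA_MAX, max(-OMEGA_MAX, w + step)) for w in pulse]
    return error(pulse, dt)

def estimate_qsl(durations, n_guesses=5, seed=0):
    """Lower T until no randomized guess reaches the error threshold."""
    rng = random.Random(seed)
    t_qsl = None
    for T in sorted(durations, reverse=True):
        solved = any(local_search(T, rng) <= EPS_MAX for _ in range(n_guesses))
        if not solved:
            break                 # no guess converged: previous T is the estimate
        t_qsl = T
    return t_qsl

print(estimate_qsl([200, 175, 150, 125, 100, 75]), "ns")
```

As in the full procedure, the resolution of the estimate is set by the grid of sampled durations, and a failed search at a given $T$ only shows that none of the sampled guesses converged, not that no solution exists.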

V. BENCHMARKING QUANTUM GATE TIMES ON 2D ARCHITECTURES
In this final section, we present the detailed results for the QSLs obtained via the methods described in Sec. IV for the various gates listed in Table I, which have been used to calculate the circuit run times in Sec. II.

A. Neutral atoms
In this section, we determine the QSLs for different gates on neutral atoms by utilizing the control knobs available in Hamiltonian (1). To this end, we set the vdW interaction strength between Rydberg levels to $V/2\pi = 40$ MHz and assume a maximally achievable Rabi frequency of $\Omega_{\max} = 0.1V$, i.e., $\Omega_{\max}/2\pi = 4$ MHz, for both $\Omega_\downarrow(t)$ and $\Omega_\uparrow(t)$ as well as a maximal detuning of $\Delta_{\max} = 0.3V$, i.e., $\Delta_{\max}/2\pi = 12$ MHz, for $\Delta(t)$. These parameters are in the same regime as those reported in recent experiments [20,39,93].
The markers in Fig. 3 (left column) show the achievable gate error $\varepsilon_T$, cf. Eq. (9), for various gates and various gate times $T$ on neutral atoms. Each individual marker thereby indicates the result of a single optimization with Krotov's method (cf. Appendix B), i.e., the final error $\varepsilon_T$ after 1500 iterations when starting from random guess fields generated via Eq. (C1). In the following we set $\varepsilon_{\max} = 10^{-3}$ for all gates and stop any optimization as soon as this threshold is reached. The circles correspond to the "parallel" configuration, where all five possible control fields $\Omega_\downarrow(t)$, $\Omega_\uparrow(t)$, $\varphi_\downarrow(t)$, $\varphi_\uparrow(t)$ and $\Delta(t)$ have been optimized. In contrast, the crosses correspond to the "phase" configuration, where only $\varphi_\uparrow(t)$ has been optimized while $\Omega_\uparrow(t) = \Omega_{\max}$, $\Delta(t) = \Delta_{\max}$ and $\Omega_\downarrow(t) = \varphi_\downarrow(t) = 0$ have been kept fixed. In both cases, we obtain a QSL of $T_{\mathrm{QSL}}^{\mathrm{CZ}} = 350$ ns for the CZ gate in Fig. 3 (a). The third field configuration, which we call the "sequential" configuration (diamonds), consists of a sequential use of $\Omega_\downarrow(t)$, $\varphi_\downarrow(t)$ and $\Omega_\uparrow(t)$, $\varphi_\uparrow(t)$, i.e., in the first half of the protocol we have $\Omega_\uparrow(t) = \varphi_\uparrow(t) = 0$ and in the second half $\Omega_\downarrow(t) = \varphi_\downarrow(t) = 0$. This configuration is inspired by an adiabatic protocol for implementing $ZZZZ(\gamma)$ gates, cf. Eq. (A6) and Ref. [66]. Its QSL of $T_{\mathrm{QSL}}^{\mathrm{CZ}} = 700$ ns is twice as long as that of the other two configurations.

Two-qubit gates
In order to compare the results from the three configurations in terms of how successful the optimization has been in finding solutions, the histogram on the right side of Fig. 3 (a) provides the probability density for obtaining final errors $\varepsilon_T$ within certain ranges. For obtaining a solution with errors $\varepsilon_T \le \varepsilon_{\max}$, we find the lowest probability for the "sequential" configuration (diamonds), coinciding with the highest QSL, and the highest and almost identical probability for the other two configurations. Among those two, the "phase" configuration (crosses) deserves particular emphasis. From a physical perspective, setting $\Omega_\downarrow(t)$ and $\varphi_\downarrow(t)$ to zero automatically ensures that $|{\downarrow\downarrow}\rangle$ is mapped onto itself, as required by the CZ gate. This is not automatically guaranteed by the other two configurations, for which the optimization needs to ensure it explicitly and therefore has to solve a slightly more complex optimization problem. However, the advantage of having one basis state automatically mapped correctly does not translate into an advantage regarding the QSL of $T_{\mathrm{QSL}}^{\mathrm{CZ}} = 350$ ns or the reachable error in general. Interestingly, both the "parallel" and "phase" configurations yield the same achievable lowest errors for each gate time $T$. This is visually highlighted by the lines connecting the lowest errors per $T$ in Fig. 3 (a). Moreover, it should be stressed that these errors $\varepsilon_T$ are reached for almost every set of initial guess fields, i.e., independent of the initial starting point within the problem's control landscape, and obtained independently for both configurations. This supports the conjecture that $T_{\mathrm{QSL}}^{\mathrm{CZ}} = 350$ ns is the actual QSL for a CZ gate and generally validates our method in determining the QSL. From an optimal control perspective, it is interesting to see that the flexibility originating from the extended set of available control fields in the "parallel" configuration cannot be turned into an advantage in error or time compared to the "phase" configuration. From a practical perspective, the latter is advantageous for experimental realizations as it requires fewer physical resources.
While the three configurations discussed so far should only be viewed as examples, we did not find any configuration giving rise to faster CZ gates.Hence, we assume T CZ QSL = 350 ns to be the fundamental QSL across all configurations.A natural comparison for T CZ QSL with a value from the literature would be the gate time from the analytical protocol introduced in Ref. [20], especially because it uses the same control fields as the "phase" configuration to implement the gate.For our parameters, we find T CZ lit ≈ 340 ns as analytical gate time, which we assume identical with our QSL T CZ QSL = 350 ns given the rather coarse sampling of gate times T in Fig. 3 (a).However, it should be noted that the analytical protocol of Ref. [20] implements a CZ gate only up to local operations -operations that are already contained in our optimized gate protocols at the QSL.Nevertheless, for a fair comparison of analytical and QSL gate times, as needed in Sec.II, as well as for simplicity, we set both times to 350 ns in Table I.
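The literature gate time quoted above can be roughly cross-checked with a short calculation. The sketch assumes that the analytical protocol of Ref. [20] consists of two global pulses of equal duration $\tau$ with a pulse area of $\Omega\tau \approx 4.293$ per pulse, driven at $\Omega = \Omega_{\max}$. This pulse area is taken from the literature as an assumption and is not stated in the present text.

```python
import math

OMEGA_MAX = 2 * math.pi * 4e6   # rad/s, i.e. Omega_max / 2pi = 4 MHz
PULSE_AREA = 4.293              # Omega * tau per pulse (assumed, see lead-in)

# Two pulses of equal duration tau = PULSE_AREA / OMEGA_MAX.
t_gate_ns = 2 * PULSE_AREA / OMEGA_MAX * 1e9
print(f"analytical CZ gate time: {t_gate_ns:.1f} ns")
```

Under these assumptions the result lies close to the value of about 340 ns quoted above, and hence within one step of the sampled gate times around 350 ns.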
The remaining panels (b)-(e) in the left column of Fig. 3 show the results for other gates from the QGS of Table I. The results for the CNOT gate are shown in Fig. 3 (b). It is the only gate for neutral atoms (among those we considered) that requires individual instead of global fields, i.e., it is the only gate for which we did not assume $\Omega_{\downarrow,n}(t) = \Omega_\downarrow(t)$, $\Omega_{\uparrow,n}(t) = \Omega_\uparrow(t)$, $\varphi_{\downarrow,n}(t) = \varphi_\downarrow(t)$, $\varphi_{\uparrow,n}(t) = \varphi_\uparrow(t)$ and $\Delta_n(t) = \Delta(t)$ for all $n$ but actually assume individual fields with unique field shapes directed at each atom. We nevertheless consider the same three field configurations as for the CZ gate in panel (a) but now applied to the individual fields $\Omega_{\downarrow,n}(t)$, $\Omega_{\uparrow,n}(t)$, $\varphi_{\downarrow,n}(t)$, $\varphi_{\uparrow,n}(t)$ and $\Delta_n(t)$ instead of their global versions. Even with this more general setting of control fields, we find only the "parallel" configuration (circles) to allow for the realization of a CNOT gate, with a QSL of $T_{\mathrm{QSL}}^{\mathrm{CNOT}} = 300$ ns. In contrast, the "phase" and "sequential" configurations do not allow realizing a CNOT gate at all. These results demonstrate that CNOT gates can be implemented using exclusively the site-dependent laser couplings of the qubit and Rydberg levels and no control knobs for single-qubit gates. However, an experimentally more convenient option, requiring no full site-dependent control of the coupling of qubit to Rydberg states, is to realize CZ gates with global laser pulses and convert them into CNOTs via local operations. We nevertheless include the CNOT gate and its QSL for completeness in our analysis as well as in the QGS in Table I but exclude it from any quantum circuit for neutral atoms in Sec. II for the reasons just mentioned.

FIG. 3. Overview of the QSLs for the various gates of the "QSL gate set" (QGS) in Table I. The left (right) column shows the results for neutral atoms (superconducting circuits) for three different configurations of control fields (specified in the main text), respectively. Each marker represents the gate error $\varepsilon_T$, cf. Eq. (9), after either reaching $\varepsilon_T \le \varepsilon_{\max} = 10^{-3}$ or 1500 iterations of Krotov's method, cf. Appendix B, after starting from a set of random guess fields generated via Eq. (C1). While the lines connect the lowest errors reached for each gate time $T$ within a given field configuration, the shaded background color indicates the range between the lowest and highest error. The marker density is shown at the right side of each panel as a histogram. The parameters for neutral atoms are $V/2\pi = 40$ MHz, $0 \le \Omega_\downarrow(t), \Omega_\uparrow(t) \le \Omega_{\max} = 0.1V$ and $\Delta(t) \le \Delta_{\max} = 0.3V$. The parameters for superconducting circuits are $-40$ MHz $\le g_{nm}(t)/2\pi \le 5$ MHz, 6700 MHz $\lesssim \omega_n(t) \lesssim$ 7100 MHz (exact values depending on $n$ and taken from Ref. [8]) and $-50$ MHz $\le$ Ωn,re(t), Ωn,im(t) $\le 50$ MHz. The anharmonic ladder for each transmon has been truncated after five levels, with population in the highest level suppressed during optimization. The optimization results for $ZZZ(\gamma)$ and $ZZZZ(\gamma)$, cf. Eq. (A6), have been obtained for the maximally entangling gate at $\gamma = \pi/4$. The results for the CNOT gate on neutral atoms have been obtained using site-dependent control fields for each atom.
Figure 3 (c) shows the results for a SWAP gate. As for the CNOT gate, we find the "parallel" configuration, assuming again global fields that are identical for each atom, to be the only one capable of realizing a SWAP gate. We obtain $T_{\mathrm{QSL}}^{\mathrm{SWAP}} = 400$ ns as its QSL. The other two configurations are not capable of realizing SWAP gates. However, since the SWAP gate, in contrast to the CNOT gate, can be realized with global control fields, we believe it to be experimentally feasible and thus include it as a viable gate in the QGS in Table I.

FIG. 4. [...] In panel (c), the impact of $\Delta_{\max}/\Omega_{\max}$ is visualized. In the latter case, we have $\gamma = \pi/4$ and a fixed $\Omega_{\max}$ while $\Delta_{\max}$ is modified. The QSLs in panels (b) and (c) have been determined using the parameters and "phase" configuration described in Fig. 3.

Three-and four-qubit constraint gates
So far, we have discussed the results for two-qubit gates in Fig. 3 (a)-(c). These gates and their respective QSLs have been used in determining the quantum circuits and calculating the corresponding circuit run times in Sec. II, especially for those circuits in the SGM discussed in Sec. II B. In contrast, for QAOA circuits in the PM [29], circuit representations without two-qubit gates exist, e.g., when the required three- and four-qubit constraint gates $ZZZ(\gamma)$ and $ZZZZ(\gamma)$, cf. Eq. (A6), are available natively and thus need not be decomposed into single- and two-qubit gates. In the following, we determine and discuss their QSLs.
It should first be noted that, from an algorithmic point of view, it is irrelevant whether, e.g., ZZZ(γ) or e iα ZZZ(γ), with α some arbitrary phase, is realized in experiments.The latter just changes the global phase of the quantum state during circuit execution.The same holds for the ZZZZ(γ) gate.In both cases, we may choose α arbitrarily but, for practical reasons, choose it such that the states ↓↓↓⟩ and ↓↓↓↓⟩ do not acquire a phase from the respective constraint gates.In the following, we therefore consider the phase-shifted constraint gates instead of the ones from Eq. (A6).
In the context of QAOA circuits in the PM, these constraint gates need to be realized for various $\gamma$, cf. Eq. (A3). However, since we cannot determine their QSL for each value of $\gamma$, we first analyze the gates' entangling power [94] as a function of $\gamma$ in Fig. 4 (a). We observe maximal entangling power at $\gamma = \pi/4$ for both $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ and thus decide to first benchmark their QSLs for that particular value, as we expect the control problem in that case to be the hardest to solve and consequently the QSLs to be the largest.
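The qualitative dependence on $\gamma$ can be illustrated with a small calculation. The sketch below assumes that the constraint gate has the form $\exp(-i\gamma\, Z\otimes\dots\otimes Z)$ (Eq. (A6) is not reproduced here, so this form is an assumption) and uses a much simpler quantity than the entangling power of Ref. [94]: the linear entropy of the first qubit after applying the gate to the product state $|{+}\dots{+}\rangle$. The function names are hypothetical.

```python
import cmath
import math
from itertools import product

def parity_sign(bits):
    """Eigenvalue (+1 or -1) of Z x ... x Z on a computational basis state."""
    return -1 if sum(bits) % 2 else 1

def linear_entropy_first_qubit(gamma, n_qubits=3):
    """Apply exp(-i*gamma*Z...Z) to |+...+> and return 1 - Tr(rho_1^2)."""
    norm = 2 ** (-n_qubits / 2)
    # The gate is diagonal, so each amplitude only picks up a phase.
    amp = {bits: norm * cmath.exp(-1j * gamma * parity_sign(bits))
           for bits in product((0, 1), repeat=n_qubits)}
    # Reduced density matrix of the first qubit (trace over all others).
    rho = [[sum(amp[(a,) + rest] * amp[(b,) + rest].conjugate()
                for rest in product((0, 1), repeat=n_qubits - 1))
            for b in (0, 1)] for a in (0, 1)]
    purity = sum(abs(rho[a][b]) ** 2 for a in (0, 1) for b in (0, 1))
    return 1.0 - purity

for gamma in (0.0, math.pi / 8, math.pi / 4, 3 * math.pi / 8, math.pi / 2):
    print(f"gamma = {gamma:.3f}: {linear_entropy_first_qubit(gamma):.3f}")
```

For this input state the linear entropy equals $\sin^2(2\gamma)/2$, which vanishes at $\gamma = 0$ and $\gamma = \pi/2$ and is maximal at $\gamma = \pi/4$, in line with the behavior of the entangling power described above.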
Figure 3 (d) and (e) show the optimization results for the phase-shifted constraint gates $ZZZ(\gamma)$ and $ZZZZ(\gamma)$, cf. Eq. (11), for $\gamma = \pi/4$, respectively. We use the same three configurations of control fields as for the two-qubit gates in panels (a)-(c) and find all configurations to be capable of implementing the constraint gates, but observe the best performance for the "parallel" and "phase" configurations. While both configurations yield the same QSLs of $T_{\mathrm{QSL}}^{ZZZ} = 400$ ns and $T_{\mathrm{QSL}}^{ZZZZ} = 500$ ns for $ZZZ(\gamma)$ and $ZZZZ(\gamma)$, respectively, the "phase" configuration exhibits the better convergence behavior. As for the CZ gate in Fig. 3 (a), we observe every set of guess fields for this configuration to reliably converge towards the same final error $\varepsilon_T$ for every $T$. Since the "parallel" configuration yields the same achievable errors as a function of gate time $T$ and uses by definition a different control strategy than the "phase" configuration, we believe that we have reliably identified the QSLs for the constraint gates. In terms of experimental feasibility, the "phase" configuration is advantageous as it requires fewer hardware and control resources.
The fact that using $\varphi_\uparrow(t)$ as the only time-dependent control field suffices for implementing $ZZZ(\gamma)$ or $ZZZZ(\gamma)$ originates from considering the gates' phase-shifted versions of Eq. (11). In detail, since the states $|{\downarrow\downarrow\downarrow}\rangle$ and $|{\downarrow\downarrow\downarrow\downarrow}\rangle$ are the only states among the 8 or 16 basis states of $ZZZ(\gamma)$ or $ZZZZ(\gamma)$ that technically require non-zero $\Omega_\downarrow(t)$ and $\varphi_\downarrow(t)$ in order to be phase-configurable, we simply avoid this requirement by considering the gates' phase-shifted versions. For the remaining 7 or 15 basis states, which all have at least one atom initially in the $|{\uparrow}\rangle$ state, the phase $\varphi_\uparrow(t)$ together with a constant $\Omega_\uparrow(t) = \Omega_{\max}$ is sufficient to correctly adjust all phases. The "phase" configuration thus represents a hardware-efficient control scheme to realize fast $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ gates.
Our results furthermore reveal that the "sequential" configuration, which was recently introduced in Ref. [66] and designed to implement high-fidelity $ZZZZ(\gamma)$ gates, seems not to be ideal when it comes to gate time, as its configuration-specific QSL is roughly twice as long as the QSLs for the other configurations. However, it should be noted that a comparison of the protocol from Ref. [66] with protocols at the QSL is not a fair comparison. On the one hand, the control scheme of Ref. [66] is based on adiabaticity, a regime that we are far away from in our numerical calculations. On the other hand, while the parameter $\gamma$ is a tunable variable in the adiabatic control scheme, the results in Fig. 3 (d) and (e) are only valid for $\gamma = \pi/4$. If gates with a different $\gamma$ are required, one explicitly needs to optimize control fields for that purpose. While it is beyond the scope of this work to examine whether there exists an analytical control scheme with configurable $\gamma$ at the QSL, in the following, we nevertheless provide an analysis of the QSLs for $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ beyond $\gamma = \pi/4$. In detail, after having identified the "phase" configuration as the most reliable configuration to determine a gate's QSL, we provide the QSLs for other $\gamma$ values in Fig. 4 (b). We find the QSLs to be almost constant for $\gamma > 0$ and only zero for $\gamma = 0$, in which case $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ coincide with the identity operation. Interestingly, we do not observe a decrease in the QSLs for $\gamma = \pi/2$, in which case the constraint gates are no longer entangling, cf. Fig. 4 (a), and should therefore theoretically be implementable with local operations only. We suspect that we do not see a decrease of the QSLs since we do not consider any control fields for local operations in Hamiltonian (1) and thus need to implement the local gates by means of the Rydberg levels.
We moreover analyze the dependence of the QSLs on the ratio $\Delta_{\max}/\Omega_{\max}$ in Fig. 4 (c). Surprisingly, we observe the QSLs for both $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ to be independent of this ratio and find $\Delta_{\max} = 0$ to be a viable option.
In general, we observe that the QSLs for $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ are only slightly larger than those of the two-qubit gates. In view of quantum circuits for PM-QAOA, where such constraint gates are required, it is thus advantageous to have these gates natively available, since their representation via single- and two-qubit gates [29] consumes significantly more time. This effect can be seen in Fig. 1 (c), where the plus signs illustrate the data for constraint gates expanded in single- and two-qubit gates and the crosses the data for native constraint gates. One possible explanation for the short QSLs for $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ compared to those of the two-qubit gates might be that the permutation symmetry of atoms within the pseudo-2D architecture [66], i.e., $V_{nm} = V$, matches the permutation symmetry of the gate operation itself.
Finally, we therefore examine the impact of the pseudo-2D architecture on the constraint gates $ZZZ(\gamma)$ and $ZZZZ(\gamma)$. For the three- and four-qubit constraint gates, the change to an actual, planar 2D architecture implies that, while the couplings between nearest neighbors remain $V$, diagonal couplings, i.e., couplings between next-to-nearest neighbors or, in other words, qubits on opposite corners of a $2\times 2$ square plaquette, are replaced by $V/8$. The stars in Fig. 5 (a) and (b) show the corresponding results for $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ gates and $\gamma = \pi/4$, respectively. While their QSLs within the pseudo-2D architecture are $T_{\mathrm{QSL}}^{ZZZ} = 400$ ns and $T_{\mathrm{QSL}}^{ZZZZ} = 500$ ns, they become $T_{\mathrm{QSL,2D}}^{ZZZ} = T_{\mathrm{QSL,2D}}^{ZZZZ} = 600$ ns in the actual, planar 2D architecture. Interestingly, this corresponds only to a relatively small increase in the QSLs for both gates. A possible explanation might be that the gate speed for neutral atoms is primarily determined by the maximal Rabi frequency $\Omega_{\max}$, which is identical for both examples, and not so much by the interatomic interaction strength, which is different for both architectures.
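The reduction of the diagonal couplings by a factor of 8 follows from the distance dependence of the vdW interaction, assuming the usual $C_6/r^6$ scaling. The coefficient $C_6$ and the lattice spacing $a$ are symbols introduced here for illustration only.

```latex
V = \frac{C_6}{a^6},
\qquad
V_{\mathrm{diag}} = \frac{C_6}{(\sqrt{2}\,a)^6}
                  = \frac{C_6}{8\,a^6}
                  = \frac{V}{8}.
```

For $V/2\pi = 40$ MHz this corresponds to a diagonal coupling of $V_{\mathrm{diag}}/2\pi = 5$ MHz, which is still slightly larger than $\Omega_{\max}/2\pi = 4$ MHz.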
Although we believe the pseudo-2D architecture to be viable in experiments due to the great flexibility in arranging neutral atoms [95], we nevertheless take the QSLs for the actual, planar 2D architecture to be the reference gate times within the QGS in Table I and Sec. II. However, recall that the constraint gates are, within our study, only relevant for the QAOA circuits in the PM, cf. Fig. 1 (c). In order to nevertheless allow for a comparison of the run times in the actual, planar 2D architecture (orange crosses) with those using the pseudo-2D architecture, we add the latter as orange pentagons to Fig. 1 (c).

B. Superconducting circuits
Similar to neutral atoms in Sec.V A, we now determine and analyze the QSLs for the same quantum gates but for superconducting circuits.The available control knobs to implement these gates are the tunable qubit frequencies ω n (t), the tunable coupling strength g nm (t) between qubits and the (auxiliary) X-type local control fields Ωn,re (t) and Ωn,im (t) in Hamiltonian (5).In order to remain experimentally realistic, we take parameters from Ref. [8].To this end, we single out a 2 × 2 plaquette consisting of four qubits from the generally larger 2D architecture.We take the qubit frequencies and their tunable range to be given by 6700 MHz ≲ ω n (t) ≲ 7100 MHz and their anharmonicities given by α n ≈ 200 MHz, with exact values depending on n.The tunable coupling strength is given by −40 MHz ≤ g nm (t) ≤ 5 MHz, as reported in Ref. [8].Moreover, we assume the (auxiliary) X-type control fields Ωn,re (t) and Ωn,im (t), which encode the physical X-type control fields Ω n (t) and their tunable driving frequencies ωn (t), to satisfy −50 MHz ≤ Ωn,re (t), Ωn,im (t) ≤ 50 MHz.

Two-qubit gates
Figure 3 (f) shows results for a CZ gate on superconducting circuits using three different configurations of control fields, which are of course different from those used for neutral atoms. In the "full" configuration (circles), all available control fields, $\omega_n(t)$, $g_{nm}(t)$, Ωn,re(t) and Ωn,im(t), are time-dependent and optimized. In the "no-X" configuration (crosses), only $\omega_n(t)$ and $g_{nm}(t)$ are optimized while the X-type control fields are set to zero, Ωn,re(t) = Ωn,im(t) = 0. Finally, in the "interaction" configuration only $\omega_n(t)$ is time-dependent and optimized while $g_{nm}(t) = -40$ MHz $= g_{\max}$ is set to its maximum magnitude and Ωn,re(t) = Ωn,im(t) = 0. For the CZ gate in Fig. 3 (f), we observe all three configurations to indicate the same QSL of $T_{\mathrm{QSL}}^{\mathrm{CZ}} = 10$ ns. In terms of convergence behavior, the "interaction" configuration shows the best performance, as indicated by the probability density on the right side of Fig. 3 (f). In general, the three configurations show slightly worse convergence behavior than the three configurations for the neutral atoms, cf. Fig. 3 (a). Nevertheless, since all three configurations indicate the same QSL and, in general, yield the same achievable error $\varepsilon_T$ depending on $T$, we believe our method to determine the QSL to yield reliable results also for superconducting circuits and conjecture $T_{\mathrm{QSL}}^{\mathrm{CZ}} = 10$ ns to be the fundamental QSL for CZ gates. While there are reference implementations for CZ gates on similar architectures with tunable couplers [72,74,96,97], none of these architectures matches our architecture and parameter regime. Hence, we compare our QSL to the gate time of the fastest two-qubit gate, the Sycamore gate, reported in Ref. [8]. We find $T_{\mathrm{QSL}}^{\mathrm{CZ}} = 10$ ns to be slightly faster than $T_{\mathrm{lit}}^{\mathrm{Syc.}} = 12$ ns [37].

In Fig. 3 (g), the results for a CNOT gate are shown using the same three configurations as for the CZ gate. We find only the "full" configuration to be capable of realizing a CNOT gate, yielding the QSL $T_{\mathrm{QSL}}^{\mathrm{CNOT}} = 14$ ns, while the other two configurations are not capable of it. Among the available control knobs, the X-type control fields Ωn,re(t) and Ωn,im(t) are crucial for a CNOT gate to be feasible. For the SWAP gate, we again find all three configurations to converge, cf. Fig. 3 (h), with the "no-X" and "interaction" configurations showing the best convergence behavior. We find a QSL of $T_{\mathrm{QSL}}^{\mathrm{SWAP}} = 12$ ns.

FIG. 5. [...] The left column shows results for neutral atoms, obtained using the "phase" configuration of Fig. 3, for the pseudo-2D architecture (triangles) and an actual, planar 2D architecture (stars). The right column compares the results for superconducting circuits using the "no-X" configuration from Fig. 3. The data correspond to the physical architecture of Ref. [8] where only nearest neighbor (NN) couplings between qubits are present (triangles) and where diagonal couplings, i.e., next-to-nearest neighbor (NNN) couplings, are added (stars).

Three-and four-qubit constraint gates
We now turn towards the three-and four-qubit constraint gates ZZZ(γ) and ZZZZ(γ).However, note that in the following and in contrast to neutral atoms, we do not consider their phase-shifted versions, cf.Eq. ( 11), but their original versions, cf.Eq. (A6).
Figure 3 (i) and (j) show the results for the constraint gates $ZZZ(\gamma)$ and $ZZZZ(\gamma)$, respectively. We use the same three configurations of control fields as for the two-qubit gates of panels (f)-(h). We observe the "interaction" configuration to have the best convergence behavior, while the "no-X" configuration gives rise in both cases to the shortest QSLs of $T_{\mathrm{QSL}}^{ZZZ} = 24$ ns and $T_{\mathrm{QSL}}^{ZZZZ} = 80$ ns for $ZZZ(\gamma)$ and $ZZZZ(\gamma)$, respectively. Both QSLs have been determined for $\gamma = \pi/4$. The QSL for $ZZZ(\gamma)$ has thereby been confirmed independently by both the "full" and "no-X" configurations, as both yield almost identical achievable errors $\varepsilon_T$ as a function of gate time $T$. We thus assume the QSL $T_{\mathrm{QSL}}^{ZZZ} = 24$ ns to be well backed up. The situation is different for $ZZZZ(\gamma)$, for which we observe very different convergence behaviors for the three configurations, cf. Fig. 3 (j). While the "no-X" configuration yields the shortest QSL of $T_{\mathrm{QSL}}^{ZZZZ} = 80$ ns, the "full" configuration shows slightly better performance for $T < T_{\mathrm{QSL}}^{ZZZZ}$, which might suggest that even shorter gate protocols for $ZZZZ(\gamma)$ exist but that our method did not find them due to, e.g., the limited number of guess fields for exploring the control landscape. In the following, as well as for the calculation of circuit run times in Sec. II, we nevertheless assume $T_{\mathrm{QSL}}^{ZZZZ} = 80$ ns to be the QSL for the $ZZZZ(\gamma)$ gate, as it is the fastest gate time $T$ among the three configurations of control fields for which Krotov's method was able to find a solution with $\varepsilon_T \le \varepsilon_{\max}$.
Interestingly, while we observe the QSLs for $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ to be almost identical for neutral atoms and only slightly longer than the QSLs for the two-qubit gates, we observe the same only for the $ZZZ(\gamma)$ gate for superconducting circuits. The $ZZZZ(\gamma)$ gate has a much longer QSL compared to the other QSLs on that platform. To rigorously decide whether this is due to the non-ideal convergence behavior observed in Fig. 3 (j) or has some deeper physical origin is beyond the scope of this study. In an attempt to tackle this question nevertheless, we consider the scenario of having additional diagonal couplings, i.e., next-to-nearest neighbor couplings, among the transmons in the superconducting circuit architecture. For Hamiltonian (3), this implies adding two additional rows with couplings $g_{13}(t)$ and $g_{24}(t)$ in the same form as the already present couplings $g_{12}(t)$, $g_{23}(t)$, $g_{34}(t)$ and $g_{41}(t)$. This scenario is inspired by the pseudo-2D architecture for neutral atoms, which also exhibits identical nearest neighbor (NN) and next-to-nearest neighbor (NNN) couplings and where having these couplings is advantageous. Figure 5 (c) and (d) show the results for the two cases with only NN couplings (triangles) and with NN plus NNN couplings (stars). Besides observing much better convergence properties for the latter case, we also obtain improved QSLs of $T_{\mathrm{QSL,NNN}}^{ZZZ} = 20$ ns and $T_{\mathrm{QSL,NNN}}^{ZZZZ} = 60$ ns for the $ZZZ(\gamma)$ and $ZZZZ(\gamma)$ gate, respectively. However, despite improvements in this scenario, the QSL of the $ZZZZ(\gamma)$ gate does not get close to the QSL of the $ZZZ(\gamma)$ gate as it does for neutral atoms. The presence of NNN couplings might therefore be just a partial explanation of the QSL differences for the constraint gates between neutral atoms and superconducting circuits. Despite the scenario with NNN couplings not reflecting the actual architecture for superconducting circuits, we nevertheless add the circuit run times for the QAOA circuits in the PM using these faster constraint gates to Fig. 1 (c) for reference purposes as purple pentagons.

VI. CONCLUSIONS
In this study, we have determined the QSLs for several common two-qubit and two specific multi-qubit quantum gates for two promising quantum computing platforms that allow for a 2D arrangement of qubits -neutral atoms and superconducting circuits.We have used OCT to determine the QSLs, as it provides a generally applicable tool that warrants a fair comparison of both platforms.
On the level of individual quantum gates, our study allows assessing how close gate protocols from the literature are compared to their fundamental QSLs or, in other words, how much can (theoretically) still be gained in time if known gate protocols are replaced by numerically optimized ones.We find the QSLs for all investigated two-qubit gates, encompassing CNOT, CZ and SWAP, to be very similar within each platform and close to the reference gate times for a CZ gate in case of neutral atoms [20] and a Sycamore gate in case of superconducting circuits [8].
On the level of quantum algorithms, our study has moreover allowed us to determine the "QSLs" for entire quantum circuits.To this end, we have assumed a 2D grid architecture for both platforms and qubit connectivity that allows physical quantum gates only between neighboring qubits.However, we have assumed these gates to be executable at the QSL.This has allowed us to calculate the circuit run times at the "QSL" for two paradigmatic quantum algorithms -the QFT and a single step of the QAOA.We find that the corresponding weighted circuit run times scale comparably with respect to the system size.Furthermore, we observe this to be independent of the chosen gate set used to translate the quantum algorithms into executable quantum circuits, i.e., independent of whether the SGS or QGS is used.We observe platform-independently that the QGS yields circuit run times and gate counts that are roughly half compared to those in the SGS.On the one hand, this demonstrates that further speedup of circuit run times is theoretically possible on both platforms.On the other hand, it also shows that both platforms perform equally well when running prototypical quantum circuits using typical, present-day NISQ hardware.
Besides a representation of the quantum circuits in the SGM, we have also explored the representation of the same quantum algorithms in the PM [26,27,29]. We observe a reduction in circuit run times as well as in gate counts in most cases. This reduction comes at the expense of requiring more physical qubits but without the need to change the geometrical layout or the control hardware. In this context, we want to specifically emphasize the circuit run times of a single QAOA step in the PM. Compared to its representation in the SGM, it offers a constant circuit depth independent of the problem complexity but requires the implementation of local three- and four-qubit constraint gates [29]. For superconducting circuits, we find their gate times at the QSL to be roughly similar to the run times of their decompositions into single- and two-qubit gates. In contrast, for neutral atoms we find the direct implementation of the constraint gates to be only slightly slower than any single two-qubit gate and, in particular, much faster than their decompositions into single- and two-qubit gates. We nevertheless observe for both platforms run times for a single QAOA step on the order of 2-4% of the platform's intrinsic coherence time. While this corresponds to an improvement of one order of magnitude for a problem size with N = 9 logical qubits, this grows to an improvement of three orders of magnitude for N = 121. Since the gate counts, when the native implementations of the constraint gates are used, are also smaller in the PM than in the SGM, we believe the PM to be advantageous in terms of the described resources for a single QAOA step. It minimizes errors due to finite coherence time and allows for large-depth QAOA. However, we want to emphasize that deeper PM-QAOA circuits are in general necessary to reach success probabilities similar to those of lower-depth SGM-QAOA implementations.
While we want to emphasize that the feasibility of a quantum circuit depends on more than just its run time and gate count, our benchmark study demonstrates that at least these two factors can be theoretically further improved using optimized gate protocols. This holds for both neutral atoms and superconducting circuits.

[...] by the Austrian Science Fund (FWF) through a START grant under Project No. Y1067-N27 and I 6011. This research was funded in whole, or in part, by the Austrian Science Fund (FWF) SFB BeyondC Project No. F7108-N38. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. This project was funded within the QuantERA II Programme that has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 101017733.
Appendix A: Quantum circuits for QFT and QAOA

1. Standard gate model
The quantum Fourier transform (QFT) is a key ingredient in Shor's algorithm for integer factorization [1] and thus a prototypical application for quantum computers. While a typical circuit representation of a QFT contains exclusively Hadamard gates, H, and controlled-phase gates, $R_n = \mathrm{diag}\{1, 1, 1, \exp\{2\pi i/2^n\}\}$, a more efficient representation with fewer gates and lower circuit depths can be constructed using SWAP gates [45].
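As a small illustration of the controlled-phase gate defined above, the following sketch builds the diagonal of $R_n$ and counts the gates of the textbook QFT circuit. It assumes all-to-all connectivity and omits the final qubit reordering, so it is not the SWAP-based nearest-neighbor circuit of Ref. [45]; the function names are hypothetical.

```python
import cmath

def controlled_phase_diagonal(n):
    """Diagonal entries of R_n = diag{1, 1, 1, exp(2*pi*i / 2^n)}."""
    return [1, 1, 1, cmath.exp(2j * cmath.pi / 2 ** n)]

def textbook_qft_gate_counts(n_qubits):
    """Gate counts of the textbook QFT (assumption: all-to-all
    connectivity, no final qubit reordering): one Hadamard per qubit
    and one controlled-phase gate per pair of qubits."""
    return {"H": n_qubits, "R": n_qubits * (n_qubits - 1) // 2}

r1 = controlled_phase_diagonal(1)  # phase exp(i*pi) = -1, i.e. a CZ gate
r2 = controlled_phase_diagonal(2)  # phase exp(i*pi/2) = i
counts = textbook_qft_gate_counts(4)
print(counts)  # {'H': 4, 'R': 6}
```

The quadratic growth of the controlled-phase count is what makes the routing overhead on a nearest-neighbor grid relevant for the circuit run time.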
The quantum approximate optimization algorithm (QAOA) aims at finding approximate solutions to combinatorial optimization problems [4]. For instance, let us consider the task of finding the ground state of the $N$-qubit spin glass Hamiltonian

$$H_z = \sum_{n<m} J_{nm}\, \sigma_z^{(n)} \sigma_z^{(m)}, \qquad \text{(A1)}$$

where $J_{nm}$ denotes the interaction strength between qubits $n$ and $m$ and $\sigma_z^{(i)}$ the Pauli-z operator on qubit $i$. The QAOA allows one to find an approximate solution, i.e., an approximate ground state of Hamiltonian (A1), by applying the procedure

$$|\psi_{\mathrm{out}}\rangle = \prod_{k=1}^{p} e^{-i\alpha_k H_x}\, e^{-i\beta_k H_z}\, |\psi_{\mathrm{in}}\rangle, \qquad \text{(A2)}$$

where $|\psi_{\mathrm{in}}\rangle$ is the ground state of the so-called mixing Hamiltonian $H_x$ and $\alpha_k, \beta_k \in [0, 2\pi)$ are angles (physically corresponding to evolution times) that are iteratively optimized via a classical, closed-loop feedback optimization, with the energy expectation value of $|\psi_{\mathrm{out}}\rangle$ as the objective to minimize. The number of steps is denoted by $p$. A single step in the QAOA is thus given by the application of the spin glass or problem Hamiltonian $H_z$ followed by the mixing Hamiltonian $H_x$. While the latter corresponds to a parallel application of $\alpha_k$-dependent single-qubit X rotations, $R_x^k$, in the associated quantum circuit, and is thus negligible time-wise, the circuit implementation of the former requires multiple CNOT gates as well as single-qubit phase gates, $R_z^{nm} = \exp\{-i\beta_k J_{nm} \sigma_z\}$, containing information about $J_{nm}$ and $\beta_k$.
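To make the optimization target concrete, the sketch below evaluates a classical spin glass energy by brute force for a small instance. It assumes the usual pairwise form $H_z = \sum_{n<m} J_{nm} \sigma_z^{(n)} \sigma_z^{(m)}$ with all-to-all couplings, and the coupling values (powers of two) are chosen purely for illustration. The CNOT count uses the standard textbook decomposition of each two-qubit phase term into CNOT, single-qubit z rotation, CNOT, and ignores any SWAP overhead from limited connectivity; whether the circuits analyzed here use exactly this decomposition is an assumption.

```python
from itertools import combinations, product

def energy(spins, couplings):
    """Classical energy sum_{n<m} J_nm * s_n * s_m for spins s_i = +/-1."""
    return sum(j * spins[n] * spins[m] for (n, m), j in couplings.items())

def spectrum(n_qubits, couplings):
    """Energy of every computational basis state (brute force, small N only)."""
    return {s: energy(s, couplings) for s in product((1, -1), repeat=n_qubits)}

n_qubits = 4
pairs = list(combinations(range(n_qubits), 2))
# Illustrative couplings (assumption): distinct powers of two per pair.
couplings = {pair: 2 ** i for i, pair in enumerate(pairs)}

levels = spectrum(n_qubits, couplings)
ground_energy = min(levels.values())
ground_states = [s for s, e in levels.items() if e == ground_energy]
distinct_energies = len(set(levels.values()))

# Two CNOTs per coupled pair in the textbook decomposition (assumption).
cnots_per_step = 2 * len(pairs)
print(distinct_energies, len(ground_states), cnots_per_step)  # 8 2 12
```

Flipping all spins leaves every product $s_n s_m$ unchanged, so each energy appears at least twice and at most $2^{N-1}$ distinct values exist; the ground state found here is correspondingly twofold degenerate.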

2. Parity mapping
In the SGM, as described in Appendix A 1, every logical qubit is given by exactly one physical qubit and gates on logical qubits are equivalent to gates on physical qubits.This is different for the PM, where K > N physical qubits are required for a problem of N logical qubits and gates on logical qubits become different gates or even gate sequences on the physical qubits.However, due to the arrangement of physical qubits according to the PM, all gates between physical qubits are strictly local and thus require only nearest-neighbor connectivity.
For the QFT, we need $K = N(N+1)/2$ physical qubits in the PM and find that Hadamard gates, H, on logical qubits become equivalent to several single- and two-qubit gates on neighboring physical qubits. In contrast, the logical two-qubit controlled-phase gate, $R_n$, between any pair of logical qubits is given by exactly three parallel single-qubit gates on physical qubits [26,49]. Hence, while the logical Hadamard gates require more resources in the PM, the logical controlled-phase gates require significantly fewer resources, especially for those gates where the logical qubits are far away from each other.
For the QAOA, we need $K = N(N-1)/2$ physical qubits in the PM and Eq. (A2) becomes [29] $|\psi_{\mathrm{out}}\rangle =$ [...] $\sigma_z^{(m)}$ [27]. In order to solve optimization problems using the PM, we additionally need to constrain the dynamics to the $2^{N-1}$-dimensional subspace within the $2^K$-dimensional physical Hilbert space that corresponds to the $2^{N-1}$ eigenstates of Eq. (A1) that have unique eigenvalues [98]. Hence, since not every eigenstate of $H_z^{\mathrm{phys}}$ has a logical counterpart in $H_z$, any eigenstate without one needs to be energetically penalized, as it would not be a valid solution to the optimization problem. This is achieved by realizing $C = K - N + 1$ local three- and four-qubit constraints via the gates of Eq. (A6) [27], with $\gamma$ the effective constraint strengths, which, as in the original QAOA scheme of Eq. (A2), are optimized in a classical, closed-loop feedback optimization. The gates in Eq. (A6) can be either realized directly [66] or by decomposing them into single- and two-qubit gates, e.g., by using four or six CNOT gates plus one $\gamma$-dependent single-qubit phase gate for ZZZ(γ) or ZZZZ(γ), respectively [29].
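The resource formulas quoted in this appendix are easy to evaluate for the two problem sizes mentioned in the conclusions, N = 9 and N = 121. The following sketch does only this arithmetic; the function names are hypothetical. Since the split between three- and four-qubit constraints is not given here, the CNOT count of the decomposed variant is reported only as a range between four and six CNOTs per constraint.

```python
def pm_qft_physical_qubits(n_logical):
    """Physical qubits for the QFT in the PM: K = N (N + 1) / 2."""
    return n_logical * (n_logical + 1) // 2

def pm_qaoa_resources(n_logical):
    """Physical qubits K = N (N - 1) / 2 and constraints C = K - N + 1
    for a single QAOA step in the PM."""
    k = n_logical * (n_logical - 1) // 2
    c = k - n_logical + 1
    # Range only: 4 CNOTs per ZZZ gate, 6 CNOTs per ZZZZ gate.
    cnot_range = (4 * c, 6 * c)
    return {"K": k, "C": c, "CNOT_range": cnot_range}

for n in (9, 121):
    print(n, pm_qft_physical_qubits(n), pm_qaoa_resources(n))
# 9 45 {'K': 36, 'C': 28, 'CNOT_range': (112, 168)}
# 121 7381 {'K': 7260, 'C': 7140, 'CNOT_range': (28560, 42840)}
```

Note that $C = K - N + 1$ simplifies to $(N-1)(N-2)/2$, so both the qubit and the constraint count grow quadratically with the number of logical qubits.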
It is important to note that the constraint gates are the only multi-qubit gates in Eq. (A3). Independent of N, their implementation can be parallelized with at most nine [66] or four [29] consecutive layers of constraint gates for neutral atoms or superconducting circuits, respectively. However, note that shallower circuits may be feasible by now [99]. The difference between the two platforms originates from their different qubit-qubit coupling mechanisms. The tunable coupler architecture of superconducting circuits [8] allows switching off the coupling between any pair of qubits. As a consequence, all constraint plaquettes that do not share a common qubit, i.e., next-to-nearest neighbor plaquettes, can be implemented in parallel. A single QAOA step thus requires at most four layers of constraint gates to realize all of them. Neutral atoms, in contrast, interact via their Rydberg levels and therefore require additional spatial separation between atoms that are in their Rydberg levels but are not supposed to interact, i.e., to suppress unwanted interactions. Assuming that a single line of atoms in non-Rydberg levels suffices as a buffer between plaquettes that are to be implemented in parallel, this corresponds to two lines of plaquettes as a buffer in each spatial direction. This yields a maximal number of nine layers of constraint gates. Despite the platform-dependent differences, all quantum circuits corresponding to Eq. (A3) have constant circuit depth.
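The layer counts of four and nine can be reproduced with a simple periodic scheduling sketch. It assumes, purely for illustration, a full rectangular grid of plaquettes labeled by (row, column); the actual PM layout may cover only part of such a grid, which can only lower the number of layers. Plaquettes are grouped by their indices modulo a stride: a stride of 2 keeps simultaneously driven plaquettes from sharing a qubit (superconducting circuits), and a stride of 3 leaves two lines of idle plaquettes between them (neutral atoms).

```python
from itertools import product

def constraint_layers(rows, cols, stride):
    """Group plaquettes (r, c) into layers by (r % stride, c % stride)."""
    layers = {}
    for r, c in product(range(rows), range(cols)):
        layers.setdefault((r % stride, c % stride), []).append((r, c))
    return list(layers.values())

def min_separation(layer):
    """Smallest Chebyshev distance between two plaquettes of one layer."""
    dists = [max(abs(a[0] - b[0]), abs(a[1] - b[1]))
             for i, a in enumerate(layer) for b in layer[i + 1:]]
    return min(dists) if dists else None

sc_layers = constraint_layers(10, 10, stride=2)    # superconducting circuits
atom_layers = constraint_layers(10, 10, stride=3)  # neutral atoms
print(len(sc_layers), len(atom_layers))  # 4 9
```

The number of layers equals the stride squared and does not change when the grid is enlarged, which is the constant-depth property stated above.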
Appendix B: Krotov's method for quantum optimal control

Krotov's method [90] is an iterative, gradient-based optimization algorithm for time-continuous control fields featuring a built-in monotonic convergence [82]. To achieve the latter, Krotov's method requires a specific choice of the total optimization functional $J$, cf. Eq. (8). In detail, while the error measure $\varepsilon_T$ at final time $T$ remains the relevant figure of merit that we want to minimize, Krotov's method achieves its minimization only indirectly by minimizing the total functional $J$, where the time-dependent running costs $J_t$ are given by Eq. (B1) [79]. The superscripts $i$ and $i+1$ in Eqs. (B2)-(B4) indicate whether the corresponding quantity is calculated using the "old" fields from iteration $i$ or the updated fields from iteration $i+1$, respectively.
In order to turn Eq. (B2) into a proper update equation, the reference field $E_k^{\mathrm{ref}}(t)$ is taken to be the field $E_k^{(i)}(t)$ from the previous iteration, in which case the second term of the left-hand side of Eq. (B2) becomes its update. This choice causes the time-dependent costs $J_t$, cf. Eq. (B1), to gradually vanish as the iterative procedure converges. Hence, the error measure $\varepsilon_T$ at final time $T$ becomes the dominant term within the total optimization functional $J$, cf. Eq. (8), and is thus predominantly minimized. Equation (B2) also reveals that while $\lambda_k$ can be used to control the general size of the update, $S_k(t)$ can be used to suppress updates at certain times.
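The roles of $\lambda_k$ and $S_k(t)$ can be illustrated with a toy sketch. It assumes that the field update takes the commonly quoted form of an old field plus $S_k(t)/\lambda_k$ times a gradient-like term; that term (built from co-states and propagated states in the actual method) is replaced here by a constant placeholder, so this is not an implementation of Krotov's method. The flat-top shape with sine-squared ramps and all names and parameter values are illustrative assumptions.

```python
import math

def flat_top(t, total_time, t_rise):
    """Shape function: sin^2 ramps of length t_rise, equal to 1 in between."""
    if t < t_rise:
        return math.sin(0.5 * math.pi * t / t_rise) ** 2
    if t > total_time - t_rise:
        return math.sin(0.5 * math.pi * (total_time - t) / t_rise) ** 2
    return 1.0

def updated_field(old_field, gradient_term, total_time, lam, t_rise):
    """Assumed update: E_new(t) = E_old(t) + S(t) / lam * gradient_term(t).
    Fields are sampled at interval midpoints, so that S(t) > 0 everywhere."""
    dt = total_time / len(old_field)
    new_field = []
    for j, (e_old, g) in enumerate(zip(old_field, gradient_term)):
        t_mid = (j + 0.5) * dt
        new_field.append(e_old + flat_top(t_mid, total_time, t_rise) / lam * g)
    return new_field

old = [0.0] * 10
placeholder_gradient = [1.0] * 10  # stand-in, not computed from any dynamics
small_step = updated_field(old, placeholder_gradient, 1.0, lam=4.0, t_rise=0.2)
large_step = updated_field(old, placeholder_gradient, 1.0, lam=2.0, t_rise=0.2)
print(small_step[4], large_step[4])  # 0.25 0.5
```

Doubling $\lambda_k$ halves the update at every time, whereas the shape function damps the update only near the beginning and the end of the protocol, which keeps the optimized field smoothly switched on and off.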
We refer to Ref. [82] for a more detailed introduction of Krotov's method and to Ref. [85] for a detailed discussion about its numerical implementation.

Figure 3(a) shows the results for a CZ gate for three different, paradigmatic configurations of control fields. The circles correspond to the "parallel" configuration, where all five possible control fields $\Omega_\downarrow(t)$, $\Omega_\uparrow(t)$, $\varphi_\downarrow(t)$, $\varphi_\uparrow(t)$ and $\Delta(t)$ have been optimized. In contrast, the crosses correspond to the "phase" configuration, where only $\varphi_\uparrow(t)$ has been optimized while $\Omega_\uparrow(t) = \Omega_{\max}$, $\Delta(t) = \Delta_{\max}$ and $\Omega_\downarrow(t) = \varphi_\downarrow(t) = 0$ have been kept fixed. In both cases, we obtain a QSL of $T_{\mathrm{QSL}}^{\mathrm{CZ}} = 350$ ns. The third field configuration, which we call the "sequential" configuration (diamonds), consists of a sequential use of $\Omega_\downarrow(t)$, $\varphi_\downarrow(t)$ and $\Omega_\uparrow(t)$, $\varphi_\uparrow(t)$, i.e., in the first half of the protocol we have $\Omega_\uparrow(t) = \varphi_\uparrow(t) = 0$ and in the second half $\Omega_\downarrow(t) = \varphi_\downarrow(t) = 0$. This configuration is inspired by an adiabatic protocol for implementing ZZZZ(γ) gates, cf. Eq. (A6) and Ref. [66]. Its QSL of $T_{\mathrm{QSL}}^{\mathrm{CZ}} = 700$ ns is twice as long as that of the other two configurations. In order to compare the results from the three configurations in terms of how successful the optimization has been in finding solutions, the histogram on the right

FIG. 4. Panel (a) shows the entanglement power [94] of the three- and four-qubit constraint gates ZZZ(γ) and ZZZZ(γ), cf. Eq. (A6), as a function of γ. In contrast, panels (b) and (c) examine the QSLs for these gates under various conditions on neutral atoms. In panel (b), the dependence of the QSL on the parameter γ is shown. In panel (c), the impact of $\Delta_{\max}/\Omega_{\max}$ is visualized. In the latter case, we have $\gamma = \pi/4$ and a fixed $\Omega_{\max}$ while $\Delta_{\max}$ is modified. The QSLs in panels (b) and (c) have been determined using the parameters and "phase" configuration described in Fig. 3.

FIG. 5. Overview of different QSLs similar to Fig. 3 but exclusively for the three- and four-qubit constraint gates ZZZ(γ) and ZZZZ(γ), cf. Eq. (A6). The left column shows results for neutral atoms, obtained using the "phase" configuration of Fig. 3, for the pseudo-2D architecture (triangles) and an actual, planar 2D architecture (stars). The right column compares the results for superconducting circuits using the "no-X" configuration from Fig. 3. The data correspond to the physical architecture of Ref. [8] where only nearest neighbor (NN) couplings between qubits are present (triangles) and where diagonal couplings, i.e., next-to-nearest neighbor (NNN) couplings, are added (stars).

[...] $|\chi_l(t)\rangle$ are backward-propagated co-states and solutions to the equations $\frac{d}{dt}|\chi_l(t)\rangle =$ [...]

TABLE I. Overview of different gate sets and corresponding gate times on neutral atoms and superconducting circuits. While the row "standard gate set" (SGS) represents typical gates and times used on the respective platform, the multi-qubit gates in the row named "QSL gate set" (QGS) correspond to an extended gate set with gate times at the QSL.
set" (QGS) from now on), which thus yields the current fundamental limits in circuit run times. We describe both gate sets in Sec. II A and use them to analyze circuit run times of QFT and QAOA quantum circuits both in the SGM in Sec. II B and the PM in Sec. II C. A brief review of the basic concepts and quantum circuits of a QFT and a single QAOA step within the SGM and the PM is given in Appendix A.

A. Standard and QSL gate sets

[...] The results for a CNOT gate are shown in panel [...]

Legend of Fig. 3: "parallel": $\Delta(t)$ + parallel $\Omega_\downarrow(t)$, $\Omega_\uparrow(t)$; "phase": only $\varphi_\uparrow(t)$ with $\Omega_\uparrow(t) = \Omega_{\max}$, $\Delta(t) = \Delta_{\max}$; "sequential": $\Delta(t)$ + sequential $\Omega_\downarrow(t)$, $\Omega_\uparrow(t)$.

[82] $J_t[\{\psi_l(t)\}, \{E_k(t)\}, t] = \sum_k \frac{\lambda_k}{S_k(t)} \left(E_k(t) - E_k^{\mathrm{ref}}(t)\right)^2$ (B1), where $E_k^{\mathrm{ref}}(t)$ is a reference field for the control field $E_k(t)$ that is to be optimized, $S_k(t) \in (0, 1]$ is a shape function and $\lambda_k > 0$ a numerical parameter. With the choice of Eq. (B1), the update equation for field $E_k(t)$ becomes [82] [...] $\frac{d}{dt}|\psi_l^{(i+1)}(t)\rangle = -iH^{(i+1)}(t)|\psi_l^{(i+1)}(t)\rangle$ [...]