Fast multi-qubit gates through simultaneous two-qubit gates

Near-term quantum computers are limited by qubit decoherence to running only low-depth quantum circuits with acceptable fidelity. This severely restricts what quantum algorithms can be compiled and implemented on such devices. One way to overcome these limitations is to expand the available gate set from single- and two-qubit gates to multi-qubit gates, which entangle three or more qubits in a single step. Here, we show that such multi-qubit gates can be realized by the simultaneous application of multiple two-qubit gates to a group of qubits where at least one qubit is involved in two or more of the two-qubit gates. Multi-qubit gates implemented in this way are as fast as, or sometimes even faster than, the constituent two-qubit gates. Furthermore, these multi-qubit gates do not require any modification of the quantum processor, but are ready to be used in current quantum-computing platforms. We demonstrate this idea for two specific cases: simultaneous controlled-Z gates and simultaneous iSWAP gates. We show how the resulting multi-qubit gates relate to other well-known multi-qubit gates and demonstrate through numerical simulations that they would work well in available quantum hardware, reaching gate fidelities well above 99 %. We also present schemes for using these simultaneous two-qubit gates to swiftly create large entangled states like Dicke and Greenberger-Horne-Zeilinger states.

arXiv:2108.11358v1 [quant-ph] 25 Aug 2021
All quantum algorithms can be decomposed into a sequence of universal single- and two-qubit gates [2,18]. Current quantum computers are usually able to implement a universal gate set with arbitrary single-qubit rotations and one or two entangling two-qubit gates. However, many quantum algorithms, e.g., for optimization problems or quantum simulations, require the creation of large-scale entanglement or many-body interactions. Such interactions between three or more qubits result in a large overhead in terms of circuit depth if they are to be decomposed into and compiled from two-qubit gates [2]. For example, decomposing the three-qubit Fredkin gate requires at least five two-qubit gates [19].
In this article, we show how various multi-qubit gates can be constructed by simply applying multiple two-qubit gates simultaneously to several qubits such that at least one of the qubits is involved in two or more of the two-qubit gates. The multi-qubit gates we propose can thus be implemented in existing quantum hardware adapted to standard single- and two-qubit gates, without any additional components, complicated pulse shapes, or changes in hardware design being required. Furthermore, our multi-qubit gates are as fast as, or faster than, the two-qubit gates from which they are constructed. Although our examples and discussion of experimental feasibility focus on implementations of quantum computing in superconducting circuits [10,13,15,52-56], our ideas are applicable to any other quantum-computing platforms that implement two-qubit gates in similar ways. The results presented here thus open up avenues for speeding up quantum computation across many different algorithms and systems.
We illustrate our general idea with two specific examples, simultaneous controlled-Z (CZ) gates and simultaneous iSWAP gates, but note that the simultaneous application of other gates also should be explored. In the first example, we consider CZ gates created by activating the transition between states $|11\rangle$ and $|02\rangle$ (or $|20\rangle$) [57,58], where $|0\rangle$ is the ground state and $|1\rangle$ is the first excited state of a qubit, and $|2\rangle$ is the second excited state, which typically is outside the computational subspace. Simultaneously activating the interactions required for two such gates between nearest neighbours in a linear chain of three qubits, with the middle qubit being the one where the second excited state $|2\rangle$ is populated during the gates, results in a three-qubit gate where both CZ and SWAP are applied to the outer qubits conditioned on the middle qubit being in $|1\rangle$. This three-qubit gate, which we denote CCZS [59], takes less time than a single CZ and would require at least three sequential two-qubit gates if it were to be decomposed. The well-known three-qubit iFredkin gate [60] can be realized by adding a two-qubit gate after the CCZS gate. Furthermore, by changing the relative strengths of the constituent CZ gates and their detuning, a whole family of three-qubit gates can be created. These gates can be used to create many-qubit entangled states, e.g., a Greenberger-Horne-Zeilinger (GHZ) state [61,62] in a single step or large Dicke states [63-65] in a few steps, and have applications in phase estimation [2,66], Hamiltonian simulation [67-69], and swap tests [70] for quantum machine learning [71].
In our second example, we consider iSWAP gates created by coupling the states $|01\rangle$ and $|10\rangle$. Just like for the CCZS gate above, simultaneous activation of the interaction for two such gates in a linear chain of three qubits creates a three-qubit gate, which we denote DIV for "divider" gate. The DIV gate distributes excitations among all three qubits within the subspaces with fixed excitation number in the computational subspace. Similar to the CCZS gate, the DIV gate is faster than the two-qubit gates created by activating the same interactions and can be used to create both GHZ and large Dicke states. By changing the relative strengths of the constituent iSWAP gates and the gate time, a family of different three-qubit DIV gates can be realized. Since both the DIV gates and the CCZS gates conserve the number of excitations, they may find applications in quantum-chemistry calculations with a fixed number of electrons [72] or in the mixing layer of the quantum alternating operator ansatz [73] for constrained combinatorial optimisation with conserved Hamming weights [74].
This article is organized as follows. In Section II, we present the details for generating a family of multi-qubit gates through simultaneous application of multiple CZ gates. We show how this family of three-qubit gates can be decomposed into a sequence of three two-qubit gates and how the three-qubit gates can be used to implement other well-known three-qubit gates through some additional operation. We then present schemes for rapidly generating large entangled states using our three-qubit gates. Finally, we show, through numerical simulations with parameters from state-of-the-art superconducting quantum-computing platforms, that our three-qubit gates are ready to be implemented with high fidelity and short gate times in currently available quantum hardware. In Section III, we repeat these steps for simultaneous application of iSWAP gates instead of CZ gates. We conclude in Section IV and give an outlook for future work and applications in Section V. Some further analytical calculations for the simultaneous CZ gates with an additional coupling between the outer qubits in the linear chain are given in Appendix A.
Figure 1. (a) The setup considered is a linear chain of three qubits with nearest-neighbour coupling. Going from left to right in the chain, we denote the qubits $q_1$, $q_0$, and $q_2$. The CZ gates CZ$_{0j}$ between qubits 0 and $j \in \{1, 2\}$ are applied simultaneously by activating a coupling between the $|1_0 1_j\rangle$ and $|2_0 0_j\rangle$ states with the coupling strength $\lambda_j(t)$. (b) The transitions in the three-qubit system activated by the application of the CZ gates. With the three-qubit states denoted by $|q_0 q_1 q_2\rangle$, the transitions $|11x\rangle \leftrightarrow |20x\rangle$ with $x \in \{0, 1\}$ are activated by CZ$_{01}$ (red), and the transitions $|1x1\rangle \leftrightarrow |2x0\rangle$ are activated by CZ$_{02}$ (green). We assume that both CZ gate operations are detuned by $\delta$ from resonance. (c) We denote the three-qubit operation resulting from the simultaneous application of the two CZ gates by CCZS (controlled-CZS), since it applies both CZ and SWAP gates to the target qubits $q_1$ and $q_2$ conditioned on the control qubit $q_0$.

A. Setup and gate operation
We here consider simultaneous application of CZ gates that are based on making the states $|11\rangle$ and $|02\rangle$ (or $|20\rangle$) resonant [57,58]. In such gates, the states $|00\rangle$, $|01\rangle$, and $|10\rangle$ do not couple to other states and remain unchanged, while the state $|11\rangle$ acquires a $\pi$ phase shift when its population is transferred to $|02\rangle$ (or $|20\rangle$) and back. In superconducting circuits, this can be achieved either by rapidly tuning the frequencies of the two qubits in and out of the desired resonance [58,75-81] or by parametric modulation of a coupler connecting the two qubits, activating the interaction between $|11\rangle$ and $|02\rangle$ (or $|20\rangle$) [82-85]. For both methods, high gate fidelities have been demonstrated for short gate times. In Ref. [78], a gate fidelity of 99.9 % was reached for a gate time of 60 ns.

1. Hamiltonians and time evolution
We first treat the case of three qubits with simultaneous application of two CZ gates (the case of more qubits is discussed further in Section II D below). We consider the setup shown in Fig. 1(a), with the three qubits arranged in a linear chain such that qubit $q_0$ is in the middle, qubit $q_1$ on the left, and qubit $q_2$ on the right. The more complicated case with an additional direct coupling existing between $q_1$ and $q_2$ is discussed in Appendix A.
In the setup of Fig. 1(a), the transitions between states $|1_0 1_j\rangle$ and $|2_0 0_j\rangle$ are coupled with a strength $\lambda_j(t)$ to implement the standard two-qubit gates CZ$_{0j}$ between qubits 0 and $j \in \{1, 2\}$ by activating the coupling for a time corresponding to a complete transfer of population from $|1_0 1_j\rangle$ to $|2_0 0_j\rangle$ and back. If both these CZ gates are applied simultaneously, the transitions $|110\rangle \leftrightarrow |200\rangle \leftrightarrow |101\rangle$ and $|201\rangle \leftrightarrow |111\rangle \leftrightarrow |210\rangle$, where the states are ordered as $|q_0 q_1 q_2\rangle$, are activated. This creates a Λ-type three-level system and a V-type three-level system, as shown in Fig. 1(b).
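With all other transitions except the ones shown in Fig. 1(b) far off resonance, the Hamiltonian for the three-qubit system can be written in the interaction picture as ($\hbar = 1$ throughout this article)
$$H = \left[\lambda_1(t)\left(|110\rangle\langle 200| + |111\rangle\langle 201|\right) + \lambda_2(t)\left(|101\rangle\langle 200| + |111\rangle\langle 210|\right) + \text{H.c.}\right] + \delta\left(|200\rangle\langle 200| - |111\rangle\langle 111|\right),$$
where H.c. denotes Hermitian conjugate and $\delta$ is the detuning, assumed to be the same, for the transitions of both CZ gates. To analyze the time evolution generated by $H$, it is convenient to deal with the two effective three-level systems in Fig. 1(b) separately. For the effective Λ-type three-level system, i.e., the subspace spanned by $|101\rangle$, $|200\rangle$, and $|110\rangle$, we can introduce a new basis: the bright state $|B\rangle$, the dark state $|D\rangle$, and the excited state $|E\rangle$. These states are given by
$$|B\rangle = \frac{\lambda_1 |110\rangle + \lambda_2 |101\rangle}{\Omega}, \qquad |D\rangle = \frac{\lambda_2^* |110\rangle - \lambda_1^* |101\rangle}{\Omega}, \qquad |E\rangle = |200\rangle,$$
with
$$\Omega = \sqrt{|\lambda_1|^2 + |\lambda_2|^2}.$$
In this basis, the Hamiltonian of this subspace becomes
$$H_\Lambda = \frac{\delta}{2} + \Omega\, \sigma_x^{(B,E)} - \frac{\delta}{2}\, \sigma_z^{(B,E)},$$
where $\sigma_i^{(B,E)}$ are the Pauli matrices in the basis of $|B\rangle$ and $|E\rangle$; the dark state $|D\rangle$ is decoupled and has zero energy.
We now consider the simple case where $\lambda_1$, $\lambda_2$, and $\delta$ are time-independent. In that case, the time evolution for the three-level system only affects the two-level subspace spanned by $|B\rangle$ and $|E\rangle$. The time-evolution operator becomes
$$U_\Lambda(t) = |D\rangle\langle D| + e^{-i\delta t/2} \exp\left[-\frac{i t}{2}\sqrt{4\Omega^2 + \delta^2}\; \mathbf{n}^{(B,E)} \cdot \boldsymbol{\sigma}^{(B,E)}\right],$$
where
$$\mathbf{n}^{(B,E)} = \frac{1}{\sqrt{4\Omega^2 + \delta^2}}\left(2\Omega, 0, -\delta\right).$$
For this time evolution to yield a useful gate, we need to eliminate any leakage to the state $|E\rangle = |200\rangle$, since it is outside the computational subspace. When starting in the computational subspace, the shortest evolution time which fulfils this condition is
$$t_{\text{gate}} = \frac{2\pi}{\sqrt{4\Omega^2 + \delta^2}}.$$
After this time, the states $|B\rangle$ and $|E\rangle$ both acquire a phase factor $-e^{-i\gamma}$, where
$$\gamma = \frac{\delta\, t_{\text{gate}}}{2} = \frac{\pi\delta}{\sqrt{4\Omega^2 + \delta^2}},$$
while the dark state $|D\rangle$ remains unchanged. Since $|B\rangle$ and $|D\rangle$ also constitute a basis for the subspace spanned by $|101\rangle$ and $|110\rangle$, the effect of the time evolution can be written [86,87]
$$U = |D\rangle\langle D| - e^{-i\gamma}|B\rangle\langle B| = e^{i(\pi - \gamma)/2} \exp\left[-i\,\frac{\pi - \gamma}{2}\, \mathbf{n} \cdot \boldsymbol{\sigma}\right], \tag{12}$$
where $\mathbf{n} = (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$, $\boldsymbol{\sigma} = (\sigma_x, \sigma_y, \sigma_z)$, $\tan(\theta/2) = |\lambda_2|/|\lambda_1|$, and $\phi = \arg(-\lambda_1\lambda_2^*)$. Here, the Pauli matrices $\sigma_i$ are in the basis of $|101\rangle$ and $|110\rangle$, such that $\mathbf{n} \cdot \boldsymbol{\sigma} = |D\rangle\langle D| - |B\rangle\langle B|$.
A similar analysis can be performed for the effective V-type three-level system, i.e., the subspace spanned by $|111\rangle$, $|210\rangle$, and $|201\rangle$. Introducing the new basis states
$$|B'\rangle = \frac{\lambda_1^* |201\rangle + \lambda_2^* |210\rangle}{\Omega}, \qquad |D'\rangle = \frac{\lambda_2 |201\rangle - \lambda_1 |210\rangle}{\Omega}, \qquad |E'\rangle = |111\rangle,$$
the Hamiltonian of this subspace can be written as
$$H_V = -\frac{\delta}{2} + \Omega\, \sigma_x^{(B',E')} + \frac{\delta}{2}\, \sigma_z^{(B',E')}.$$
Thus, time evolution until the gate time $t_{\text{gate}}$ will lead to both $|B'\rangle$ and $|E'\rangle$ acquiring a phase factor $-e^{i\gamma}$. However, $|B'\rangle$ and $|D'\rangle$ span the subspace of the states $|201\rangle$ and $|210\rangle$, neither of which is in the computational subspace, and thus will not be populated in the initial or final states of the gate. The only effect of the gate in the effective V-type three-level system is thus to bestow a phase factor $-e^{i\gamma}$ on $|111\rangle$.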
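The gate time and the phase $-e^{-i\gamma}$ can be checked numerically. The following is a minimal sketch, not the simulation code of this article: it assumes a time-independent Hamiltonian $\lambda_1|110\rangle\langle 200| + \lambda_2|101\rangle\langle 200| + \text{H.c.} + \delta|200\rangle\langle 200|$ for the Λ-type system, which is one form consistent with the gate time and phases quoted above. The function name `evolve_lambda` and the parameter values are hypothetical.

```python
import cmath
import math


def evolve_lambda(lam1, lam2, delta, psi0, steps=2000):
    """Integrate i d|psi>/dt = H|psi> with RK4 up to t_gate.

    Basis order: (|110>, |101>, |200>). Returns (t_gate, gamma, psi_final).
    """
    lam1, lam2 = complex(lam1), complex(lam2)
    omega = math.sqrt(abs(lam1) ** 2 + abs(lam2) ** 2)
    t_gate = 2 * math.pi / math.sqrt(4 * omega ** 2 + delta ** 2)
    gamma = delta * t_gate / 2
    # Assumed Hamiltonian: both couplings to |200>, detuning delta on |200>.
    h = [
        [0, 0, lam1],
        [0, 0, lam2],
        [lam1.conjugate(), lam2.conjugate(), delta],
    ]

    def rhs(psi):
        return [-1j * sum(h[r][c] * psi[c] for c in range(3)) for r in range(3)]

    psi = [complex(a) for a in psi0]
    dt = t_gate / steps
    for _ in range(steps):
        k1 = rhs(psi)
        k2 = rhs([p + 0.5 * dt * k for p, k in zip(psi, k1)])
        k3 = rhs([p + 0.5 * dt * k for p, k in zip(psi, k2)])
        k4 = rhs([p + dt * k for p, k in zip(psi, k3)])
        psi = [p + dt / 6 * (a + 2 * b + 2 * c + d)
               for p, a, b, c, d in zip(psi, k1, k2, k3, k4)]
    return t_gate, gamma, psi


# Equal couplings, no detuning: |110> should map to -|101> without leakage.
t_sym, gamma_sym, psi_sym = evolve_lambda(1.0, 1.0, 0.0, [1, 0, 0])

# Unequal couplings with detuning, starting in the bright state |B>.
lam1, lam2, delta = 1.0, 0.5, 0.7
norm = math.sqrt(lam1 ** 2 + lam2 ** 2)
bright = [lam1 / norm, lam2 / norm, 0]
t_det, gamma_det, psi_det = evolve_lambda(lam1, lam2, delta, bright)
bright_overlap = sum(b * p for b, p in zip(bright, psi_det))  # <B|psi(t_gate)>
expected_phase = -cmath.exp(-1j * gamma_det)
```

In this sketch, `bright_overlap` should agree with `expected_phase`, and the final population of $|200\rangle$ (the last entry of `psi_det`) should vanish up to integration error.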

2. The family of three-qubit gates
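Summarizing the results from the analysis above, we see that the eight states in the computational subspace of the three qubits are affected as follows: $|101\rangle$ and $|110\rangle$ obey the time evolution given by Eq. (12), $|111\rangle$ will acquire a phase factor $-e^{i\gamma}$, and all the other states are unchanged. This is similar to the three-qubit Fredkin gate (controlled-SWAP) [2,19,88,89], which swaps the states of two target qubits conditioned on the state of a control qubit, i.e., $|101\rangle$ and $|110\rangle$ are swapped if the first qubit is the control qubit. Our gate, which we denote CCZS [see Fig. 1(c)], also implements a SWAP-like operation on the outer qubits $q_1$ and $q_2$, conditioned on the middle qubit $q_0$, but adds phase factors to $|101\rangle$, $|110\rangle$, and $|111\rangle$. The gate can be written as
$$\text{CCZS}(\theta, \phi, \gamma) = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U_{\text{CZS}}(\theta, \phi, \gamma),$$
where the first factor acts on $q_0$, and, in the basis $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$ of $q_1 q_2$,
$$U_{\text{CZS}}(\theta, \phi, \gamma) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos^2\frac{\theta}{2} - e^{-i\gamma}\sin^2\frac{\theta}{2} & \frac{1}{2}\left(1 + e^{-i\gamma}\right) e^{-i\phi}\sin\theta & 0 \\ 0 & \frac{1}{2}\left(1 + e^{-i\gamma}\right) e^{i\phi}\sin\theta & \sin^2\frac{\theta}{2} - e^{-i\gamma}\cos^2\frac{\theta}{2} & 0 \\ 0 & 0 & 0 & -e^{i\gamma} \end{pmatrix}, \tag{20}$$
and the parameters $\theta$, $\phi$, and $\gamma$ are set by the coupling strengths $\lambda_1$, $\lambda_2$ and the detuning $\delta$ according to the relations
$$\theta = 2\arctan\frac{|\lambda_2|}{|\lambda_1|}, \qquad \phi = \arg\left(-\lambda_1\lambda_2^*\right), \tag{21}$$
$$\gamma = \frac{\pi\delta}{\sqrt{4\left(|\lambda_1|^2 + |\lambda_2|^2\right) + \delta^2}}.$$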
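As an illustration, the controlled block and the full three-qubit gate can be assembled as plain matrices. This is a sketch under stated assumptions: the block is built as $|D\rangle\langle D| - e^{-i\gamma}|B\rangle\langle B|$ on $\{|101\rangle, |110\rangle\}$ with $\mathbf{n}\cdot\boldsymbol{\sigma} = |D\rangle\langle D| - |B\rangle\langle B|$, plus a phase $-e^{i\gamma}$ on $|111\rangle$, as described in the text. The function names `u_czs`, `cczs`, and `is_unitary` are hypothetical.

```python
import cmath
import math


def u_czs(theta, phi, gamma):
    """Controlled 4x4 block in the q1 q2 basis (|00>, |01>, |10>, |11>)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    g = cmath.exp(-1j * gamma)  # the bright state acquires the factor -g
    off = (1 + g) * s * c
    return [
        [1, 0, 0, 0],
        [0, c * c - g * s * s, off * cmath.exp(-1j * phi), 0],
        [0, off * cmath.exp(1j * phi), s * s - g * c * c, 0],
        [0, 0, 0, -cmath.exp(1j * gamma)],
    ]


def cczs(theta, phi, gamma):
    """Full 8x8 gate on |q0 q1 q2>, with index = 4*q0 + 2*q1 + q2."""
    block = u_czs(theta, phi, gamma)
    full = [[1 if r == c else 0 for c in range(8)] for r in range(8)]
    for r in range(4):
        for c in range(4):
            full[4 + r][4 + c] = block[r][c]
    return full


def is_unitary(m, tol=1e-12):
    """Check that the rows of m are orthonormal."""
    n = len(m)
    for r in range(n):
        for c in range(n):
            dot = sum(m[r][k] * complex(m[c][k]).conjugate() for k in range(n))
            if abs(dot - (1 if r == c else 0)) > tol:
                return False
    return True


swap_like = u_czs(math.pi / 2, math.pi, 0.0)  # equal couplings, no detuning
cz_01 = u_czs(0.0, 0.0, 0.0)                  # lambda_2 = 0: plain CZ on q0, q1
```

With these conventions, `swap_like` exchanges $|101\rangle$ and $|110\rangle$ with a factor $-1$, and `cz_01` reduces to the diagonal matrix with entries $(1, 1, -1, -1)$.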

3. Examples of three-qubit gates
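It is illuminating to study a few of the simplest parameter choices for the CCZS gate. If we set $\lambda_1 = \lambda$, $\lambda_2 = 0$, and $\delta = 0$, we recover the two-qubit CZ gate acting on $q_0$ and $q_1$. In the same way, if instead $\lambda_1 = 0$, $\lambda_2 = \lambda$, and $\delta = 0$, we obtain the two-qubit CZ gate acting on $q_0$ and $q_2$. The gate time for these gates is $t_{\text{gate}} = \pi/\lambda$. If we instead apply both these CZ gates simultaneously, i.e., $\lambda_1 = \lambda_2 = \lambda$ and $\delta = 0$, we obtain CCZS($\theta = \pi/2$, $\phi = \pi$, $\gamma = 0$), for which
$$U_{\text{CZS}}(\pi/2, \pi, 0) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \tag{24}$$
The gate time for this gate is $t_{\text{gate}} = \pi/(\sqrt{2}\lambda)$, i.e., this three-qubit gate is a factor $\sqrt{2}$ faster than the two-qubit CZ gates generated by the two interactions from which the CCZS gate is constructed. We can set the phase $\phi$ by adjusting the relative phase of the coupling strengths $\lambda_1$ and $\lambda_2$. For $\lambda_1 = \lambda$, $\lambda_2 = -\lambda e^{-i\phi}$, and $\delta = 0$, the controlled part of the gate becomes
$$U_{\text{CZS}}(\pi/2, \phi, 0) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & e^{-i\phi} & 0 \\ 0 & e^{i\phi} & 0 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \tag{25}$$
The gate time remains $t_{\text{gate}} = \pi/(\sqrt{2}\lambda)$. Further tuning can be achieved by changing the relative amplitudes of the coupling strengths $\lambda_1$ and $\lambda_2$. Setting $\lambda_1 = \lambda$, $\lambda_2 = -K\lambda e^{-i\phi}$, and $\delta = 0$, we have $\theta = 2\arctan K$ and the controlled part of the gate becomes
$$U_{\text{CZS}}(\theta, \phi, 0) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & e^{-i\phi}\sin\theta & 0 \\ 0 & e^{i\phi}\sin\theta & -\cos\theta & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \tag{26}$$
The gate time becomes $t_{\text{gate}} = \pi/(\sqrt{1 + K^2}\,\lambda)$. This is faster than the corresponding individual CZ gates, which would take $t_{\text{gate}} = \pi/\lambda$ and $t_{\text{gate}} = \pi/(K\lambda)$, respectively, on their own.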

4. Time-dependent parameters
In the derivation of Eqs. (19)-(23), which constitute the main result of this section, we assumed for simplicity that the coupling strengths $\lambda_1$, $\lambda_2$ and the detuning $\delta$ were constants. However, in actual experiments, at least the coupling strengths $\lambda_1$ and $\lambda_2$ will need to be turned on and off, and this will not be done with a perfect step function. Fortunately, it is feasible to vary these parameters in time as long as they have the same time dependence. The principle is the same as for nonadiabatic holonomic gates [87,90-92]. As Eq. (12) shows, we can thus construct arbitrary rotations in the space spanned by the states $|101\rangle$ and $|110\rangle$, where the angles $\theta$, $\phi$ are controlled by the relative strengths of the two CZ gates being applied simultaneously [see Eq. (21)].

B. Decomposition into two-qubit gates
The three-qubit CCZS gate entangles all three qubits. It can thus not be written as the simultaneous application of a two-qubit gate to two of the qubits and a single-qubit gate to the third qubit. Instead, decomposing CCZS into single- and two-qubit gates requires the consecutive application of several such gates. Inspired by decompositions for the quantum-optical Fredkin gate [2,89], we find that CCZS can be realized through the consecutive application of three two-qubit gates: and Here, the two-qubit XY($\theta$, $\phi$) gate [93,94] is generated by an exchange-type interaction, e.g., XY($\pi$, 0) = iSWAP. The decomposition in Eq. (27) is illustrated in Fig. 2. From that illustration, it becomes clear that this decomposition requires re-labelling the qubits in a linear chain to work. For the case of a linear chain, the control qubit in the CCZS gate is the middle qubit, but in the decomposition given here, the control qubit must be one of the outer qubits.
Figure 2. Decomposition of the three-qubit CCZS gate into two-qubit gates. Note that for the case of a linear chain [see Fig. 1(a)], the control qubit $q_0$ in the CCZS gate is the middle qubit, but the decomposition into two-qubit gates requires $q_1$ to be the middle qubit, since it has to interact with both $q_0$ and $q_2$.
unitary operations:
$$U_{\text{Fredkin}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{30}$$
$$U_{\text{iFredkin}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & i & 0 \\ 0 & i & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{31}$$
$$U_{\text{Toffoli}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \tag{32}$$
Comparing with $U_{\text{CZS}}(\theta, \phi, \gamma)$ in Eq. (20), it is clear that it never coincides with $U_{\text{Toffoli}}$ in Eq. (32), since the two off-diagonal elements in the lower right corner of $U_{\text{CZS}}(\theta, \phi, \gamma)$ are zero for all values of $\theta$, $\phi$, and $\gamma$. Noting that a Toffoli gate is formed by sandwiching a controlled-controlled-Z (CCZ) gate [96] between two Hadamard gates on qubit 2 does not help. The controlled unitary of the CCZ gate is
$$U_{\text{CCZ}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}.$$
To obtain the $-1$ in this matrix from $U_{\text{CZS}}(\theta, \phi, \gamma)$ requires $\gamma = 0$, which is easy, but we see from Eq. (26) that the two middle diagonal elements in $U_{\text{CZS}}(\theta, \phi, \gamma)$ then always will have opposite signs, which does not match the CCZ gate. Indeed, for $\theta = 0$, we simply have the two-qubit CZ gate acting on qubits 0 and 1. The Fredkin gate in Eq. (30) also cannot be directly implemented by the CCZS gate. For the 1 in the lower right corner of $U_{\text{Fredkin}}$ to match $U_{\text{CZS}}(\theta, \phi, \gamma)$, $\gamma = \pm\pi$ is necessary, but then all off-diagonal elements in $U_{\text{CZS}}(\theta, \phi, \gamma)$ become zero. However, we note that the Fredkin gate can be constructed by combining a CCZ gate and CCZS($\pi/2$, 0, 0) (i.e., $\lambda_1 = -\lambda_2$ and $\delta = 0$), as shown in Fig. 3(a). Since an implementation of the Fredkin gate using only single- and two-qubit gates requires at least five two-qubit gates [19], and a CCZ gate can be implemented using three two-qubit gates [96], the construction with the CCZS gate (which on its own is at least as fast as a two-qubit gate) constitutes an improvement.
Figure 4. Quantum circuits for generating the three-qubit GHZ state $(|000\rangle + |111\rangle)/\sqrt{2}$. (a) A circuit using single-qubit gates and two-qubit CZ gates. (b) A circuit using single-qubit gates and one CCZS gate.
Using exactly the same reasoning as for the Fredkin gate in the preceding paragraph, we see that the iFredkin gate in Eq. (31) also cannot be directly implemented by the CCZS gate. However, it is sufficient to add a single two-qubit CZ gate after CCZS($\pi/2$, $\pi/2$, 0) to fix this, as shown in Fig. 3(b). Since implementing the iFredkin gate using only single- and two-qubit gates requires at least four two-qubit gates, while the CCZS gate is at least as fast as a two-qubit gate, our construction at least halves the time required for the iFredkin gate, which is a natural operation in, e.g., simulations of the Fermi-Hubbard model [60].
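The two constructions in the preceding paragraphs can be checked with a few matrix products on the controlled $4 \times 4$ blocks (all gates involved act trivially when $q_0$ is in $|0\rangle$). This is a minimal sketch with assumed conventions: the $U_{\text{CZS}}$ block is $|D\rangle\langle D| - e^{-i\gamma}|B\rangle\langle B|$ with a phase $-e^{i\gamma}$ on $|11\rangle$, the iFredkin block carries $+i$ on its off-diagonal elements, and the added CZ gate is taken to act on $q_0$ and $q_2$. Fig. 3 is not reproduced here, so that last choice is an assumption tied to these sign conventions. All helper names are hypothetical.

```python
import cmath
import math


def u_czs(theta, phi, gamma):
    """Controlled 4x4 block in the q1 q2 basis (|00>, |01>, |10>, |11>)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    g = cmath.exp(-1j * gamma)
    off = (1 + g) * s * c
    return [
        [1, 0, 0, 0],
        [0, c * c - g * s * s, off * cmath.exp(-1j * phi), 0],
        [0, off * cmath.exp(1j * phi), s * s - g * c * c, 0],
        [0, 0, 0, -cmath.exp(1j * gamma)],
    ]


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def diag(*entries):
    n = len(entries)
    return [[entries[r] if r == c else 0 for c in range(n)] for r in range(n)]


FREDKIN = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
IFREDKIN = [[1, 0, 0, 0], [0, 0, 1j, 0], [0, 1j, 0, 0], [0, 0, 0, 1]]
CCZ_BLOCK = diag(1, 1, 1, -1)    # phase -1 on |11> of q1 q2 when q0 = 1
CZ02_BLOCK = diag(1, -1, 1, -1)  # Z on q2 when q0 = 1 (assumed qubit pair)

# Gates are applied right to left: CCZS first, then the extra gate.
fredkin_built = matmul(CCZ_BLOCK, u_czs(math.pi / 2, 0.0, 0.0))
ifredkin_built = matmul(CZ02_BLOCK, u_czs(math.pi / 2, math.pi / 2, 0.0))
```

Under these assumptions, `close(fredkin_built, FREDKIN)` and `close(ifredkin_built, IFREDKIN)` both hold. With the opposite sign convention for $\phi$, the extra CZ gate would instead act on $q_0$ and $q_1$.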

D. Rapid creation of large entangled states
Having demonstrated that the CCZS gate is fast and that it entangles three qubits, we now show how this gate and its generalization to more than three qubits can be applied to rapidly generate some particular large entangled states. The ability to create entanglement [97] is crucial for both quantum information processing [2,98,99] and quantum communication [100]. Lately, the creation of entanglement between (several) tens of qubits has been used to demonstrate the capabilities of quantum processors on multiple platforms: superconducting circuits [15,101-105], photonic systems [101], ion traps [106,107], and neutral atoms [108].

1. Greenberger-Horne-Zeilinger states
Greenberger-Horne-Zeilinger (GHZ) states [61,62] are entangled states of $N$ qubits of the form
$$|\text{GHZ}_N\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle^{\otimes N} + |1\rangle^{\otimes N}\right).$$
To generate a GHZ state with $N = 3$ qubits using only single- and two-qubit gates requires at least two two-qubit gates, e.g., two CNOT gates, two iSWAP gates [109], or two CZ gates [see Fig. 4(a)]. However, this state can also be generated using only a single CCZS gate and a few single-qubit gates, using the circuit shown in Fig. 4(b). Starting from $|000\rangle$, applying single-qubit Hadamard and X gates to the first and second qubit, respectively, creates the state
$$|\psi\rangle = \frac{1}{\sqrt{2}}\left(|010\rangle + |110\rangle\right).$$
Then, applying CCZS($\theta = \pi/2$, $\phi = 0$, $\gamma = 0$), which is achieved for $\lambda_1 = -\lambda_2$ and $\delta = 0$, results in a controlled SWAP of the second and third qubit [see Eq. (25)], yielding the state
$$\frac{1}{\sqrt{2}}\left(|010\rangle + |101\rangle\right),$$
which is transformed to the three-qubit GHZ state by applying an X gate on the second qubit. We remark that the phase acquired by the doubly excited state of the second and third qubits does not affect the state $|\psi\rangle$ that we apply the CCZS gate to.
Since the CCZS($\pi/2$, 0, 0) gate is a factor $\sqrt{2}$ faster than the CZ gates that can be implemented with the interactions from which it is constructed (see Section II A 3), the circuit with the CCZS gate in Fig. 4(b) generates the three-qubit GHZ state $2\sqrt{2}$ times faster than the circuit with the two CZ gates in Fig. 4(a), provided that the time required for single-qubit gates is negligible. If one instead uses the circuit with two iSWAP gates [109], the circuit with the CCZS gate is twice as fast, since the iSWAP is a factor $\sqrt{2}$ faster than a CZ gate in the setups we consider here (the coupling between $|11\rangle$ and $|02\rangle$ is $\sqrt{2}$ stronger than the coupling between $|01\rangle$ and $|10\rangle$ for weakly anharmonic qubits, but the iSWAP only requires half the oscillation between states that CZ does; see, e.g., Ref. [85]).
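The circuit of Fig. 4(b) and the timing comparison can be traced on an eight-component state vector. This is a small sketch, assuming that CCZS($\pi/2$, 0, 0) swaps $|101\rangle$ and $|110\rangle$ and flips the sign of $|111\rangle$, as described above, and that single-qubit gates take negligible time. The helper names are hypothetical.

```python
import math


def apply_1q(state, gate, qubit):
    """Apply a 2x2 gate to one qubit of |q0 q1 q2> (index = 4*q0 + 2*q1 + q2)."""
    shift = 2 - qubit
    out = list(state)
    for i in range(8):
        if not (i >> shift) & 1:
            j = i | (1 << shift)
            a0, a1 = state[i], state[j]
            out[i] = gate[0][0] * a0 + gate[0][1] * a1
            out[j] = gate[1][0] * a0 + gate[1][1] * a1
    return out


def apply_cczs_swap(state):
    """CCZS(pi/2, 0, 0): swap |101> <-> |110> and flip the sign of |111>."""
    out = list(state)
    out[0b101], out[0b110] = state[0b110], state[0b101]
    out[0b111] = -state[0b111]
    return out


R = 1 / math.sqrt(2)
HADAMARD = [[R, R], [R, -R]]
X = [[0, 1], [1, 0]]

state = [0.0] * 8
state[0] = 1.0                          # |000>
state = apply_1q(state, HADAMARD, 0)    # (|000> + |100>)/sqrt(2)
state = apply_1q(state, X, 1)           # (|010> + |110>)/sqrt(2)
state = apply_cczs_swap(state)          # (|010> + |101>)/sqrt(2)
ghz = apply_1q(state, X, 1)             # (|000> + |111>)/sqrt(2)

# Entangling time in units of 1/lambda: two sequential CZ gates versus one CCZS.
t_two_cz = 2 * math.pi
t_cczs = math.pi / math.sqrt(2)
speedup = t_two_cz / t_cczs             # equals 2*sqrt(2)
```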

2. Dicke states
Another class of entangled states is the W states, which cannot be converted into GHZ states by local operations and classical communication [110]. The W states are in turn a subset $|D_N^1\rangle$ of the symmetric Dicke states $|D_N^k\rangle$ [63-65], which are equally weighted superpositions of all permutations of $N$-qubit states with $k$ excitations:
$$|D_N^k\rangle = \binom{N}{k}^{-1/2} \mathcal{S}\left(|1\rangle^{\otimes k} \otimes |0\rangle^{\otimes (N-k)}\right), \tag{38}$$
with $\mathcal{S}$ the symmetrization operator, which here sums over all distinct permutations. Dicke states have important applications in quantum metrology [111,112] and quantum networks [113-116].
Recently, it has also been shown that, for combinatorial optimization problems, symmetric Dicke states representing a superposition of all feasible solutions can give advantages when used as the initial state in the quantum alternating operator ansatz [117,118].
The Dicke states arise naturally when $N$ identical atoms are collectively coupled to a harmonic mode [64,119]. However, since the photon or phonon number of the harmonic mode is difficult to control, alternative protocols for Dicke-state generation have been proposed [120-124]. For deterministic preparation of a symmetric Dicke state on a quantum computer, using a sequence of single- and two-qubit gates, it has been shown that constructing $|D_N^k\rangle$ requires a quantum circuit with depth $O(N)$ containing at least $O(kN)$ gates [125-127].
In this subsection, we show how to rapidly create large symmetric Dicke states by generalizing the interaction underpinning the CCZS gate to more qubits. As a concrete example, we show that we can create the state $|D_5^3\rangle$ using only two rounds of simultaneous CZ gates and two single-qubit operations, while the quantum circuit in Ref. [126] for creating the same state includes five three-qubit gates and 21 two-qubit gates applied sequentially.
a. Hamiltonian and dynamics We consider a system composed of a central qubit 0 and its $N$ nearest neighbours $\{j\}$ (the three-qubit system in Fig. 1 corresponds to $N = 2$). For each qubit $i$, we take into account the three lowest energy levels $|0_i\rangle$, $|1_i\rangle$, and $|2_i\rangle$, with energies 0, $\omega_i$, and $2\omega_i + \alpha_i$, respectively, where $\alpha_i$ is the anharmonicity. In this system, a CZ gate between qubit 0 and one of its neighbours $j$ can be applied by activating the $|2_0 0_j\rangle \leftrightarrow |1_0 1_j\rangle$ transition. Assuming these transitions are resonant, the system Hamiltonian with the transitions switched on is, in the interaction picture,
$$H = \sum_{j=1}^{N} \lambda_j \left(\sigma_0^{21}\sigma_j^{01} + \text{H.c.}\right), \tag{39}$$
where $\sigma_i^{nm} = |n\rangle\langle m|_i$ and $\lambda_j$ is the interaction strength for the $|2_0 0_j\rangle \leftrightarrow |1_0 1_j\rangle$ transition. For simplicity, we assume all interactions equally strong: $\lambda_j \equiv \lambda$. We can then introduce the collective spin operator $J_- = \sum_{j=1}^{N} \sigma_j^{01}$ and rewrite Eq. (39) as
$$H = \lambda\left(\sigma_0^{21} J_- + \sigma_0^{12} J_+\right), \tag{40}$$
which is reminiscent of the Tavis-Cummings (Dicke) model [128,129], where a harmonic oscillator couples to $N$ identical atoms. Due to the anharmonicity of qubit 0, the models are equivalent in the limit where the harmonic oscillator only hosts a single photon. The neighbouring qubits 1 to $N$ are symmetric under permutation and can thus be described by the Dicke states in Eq. (38). In this basis, the matrix elements of the operator $J_+ = (J_-)^\dagger$ are [130]
$$\langle D_N^{k+1}| J_+ |D_N^k\rangle = \sqrt{(k+1)(N-k)} \equiv G_N^k. \tag{41}$$
We can thus interpret the second term in Eq. (40) as qubit 0 being de-excited from $|2_0\rangle$ to $|1_0\rangle$ while the state of the neighbouring qubits changes from $|D_N^k\rangle$ to $|D_N^{k+1}\rangle$. The total excitation number $k + 2$ is conserved.
Since subspaces with different numbers of excitations are decoupled, we can limit ourselves to the subspace with $k + 2$ excitations, which is spanned by the basis $\{|1_0 D_N^{k+1}\rangle, |2_0 D_N^k\rangle\}$. From Eq. (41), we see that the dynamics in this subspace is generated by
$$H_k = \lambda G_N^k \left(|1_0 D_N^{k+1}\rangle\langle 2_0 D_N^k| + \text{H.c.}\right). \tag{42}$$
b. Generalizing the CCZS gate to more than three qubits We can now understand how the CCZS gate generalizes to more qubits. For $N = 2$ neighbouring qubits, starting in the computational subspace, i.e., with qubit 0 in state $|1_0\rangle$, we can always find a time $t$ when the occupation of $|2_0\rangle$ is zero, since
$$G_2^0 = G_2^1 = \sqrt{2}.$$
This is the case analyzed in the preceding subsections. For $N > 2$, the coefficients $G_N^k$ are not equal or commensurate, making it impossible to confine the central qubit to the computational subspace and create an $(N + 1)$-qubit gate according to the same principle as the CCZS gate. What could be done is to apply a single-qutrit operation on qubit 0 that takes $|0_0\rangle$ to $|1_0\rangle$ and $|1_0\rangle$ to $|2_0\rangle$, let the system evolve for some time $t$ according to Eq. (42), and then apply the inverse of the single-qutrit operation to qubit 0 to bring it back to the computational subspace. While this would be an $(N + 1)$-qubit gate, it appears too complicated to find immediate applications.
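The collective enhancement factor $\sqrt{(k+1)(N-k)}$ can be confirmed by brute force for small $N$. This is a short sketch in which each Dicke state is stored as a dictionary from the set of excited qubits to its amplitude; the function names are hypothetical.

```python
from itertools import combinations
from math import comb, isclose, sqrt


def dicke(n, k):
    """Symmetric Dicke state: maps each set of k excited qubits to its amplitude."""
    amp = 1 / sqrt(comb(n, k))
    return {frozenset(c): amp for c in combinations(range(n), k)}


def raise_collective(state, n):
    """Apply J_+ = sum_j |1><0|_j to a state in the representation above."""
    out = {}
    for excited, amp in state.items():
        for j in range(n):
            if j not in excited:
                key = excited | {j}
                out[key] = out.get(key, 0.0) + amp
    return out


def coupling(n, k):
    """Matrix element <D_n^{k+1}| J_+ |D_n^k>."""
    raised = raise_collective(dicke(n, k), n)
    target = dicke(n, k + 1)
    return sum(target[key] * amp for key, amp in raised.items())


couplings_n2 = [coupling(2, k) for k in range(2)]  # both equal sqrt(2)
couplings_n4 = [coupling(4, k) for k in range(4)]  # 2, sqrt(6), sqrt(6), 2
```

For $N = 2$ the two coefficients coincide, which is what allows a common return time, whereas for $N = 4$ the values 2 and $\sqrt{6}$ are incommensurate.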
c. Creating a five-qubit Dicke state Instead of constructing a general $(N + 1)$-qubit gate, we therefore focus on preparing symmetric Dicke states by starting in a specific subspace. As an illustrative example, we consider the case $N = 4$ and the target state
$$|D_5^3\rangle = \sqrt{\frac{3}{5}}\, |1_0 D_4^2\rangle + \sqrt{\frac{2}{5}}\, |0_0 D_4^3\rangle, \tag{43}$$
which also was used as an example in Ref. [126]. First, we note that since $|D_4^3\rangle$ can be created by applying X gates to all qubits in the state $|D_4^1\rangle$, and the state $|D_4^2\rangle$ is unchanged by those gates, the problem reduces to preparing the state $\sqrt{3/5}\, |1_0 D_4^2\rangle + \sqrt{2/5}\, |0_0 D_4^1\rangle$. The procedure for doing so is illustrated in Fig. 5. We first explain how to obtain $|D_4^2\rangle$:
1. First, we prepare the initial state $|20000\rangle = |2_0 D_4^0\rangle$ by single-qutrit operations on qubit 0. This puts the system in the subspace $k = 0$ spanned by $|1_0 D_4^1\rangle$ and $|2_0 D_4^0\rangle$. Turning on the interaction and letting the system evolve for a time $t = \pi/(4\lambda)$, four times faster than a two-qubit CZ gate, we arrive at $|1_0 D_4^1\rangle$. We remark that if we have the system tuned to have different interaction strengths $\lambda_j$ like in tripod systems [131,132], we can create arbitrarily weighted superpositions of the $N$-qubit states with one excitation instead of the symmetric superposition that is the symmetric Dicke state.
2. Next, we flip qubit 0 to $|2_0\rangle$ such that the system state becomes $|2_0 D_4^1\rangle$. This puts the system in the $k = 1$ subspace spanned by $|1_0 D_4^2\rangle$ and $|2_0 D_4^1\rangle$. Turning on the interaction again for a time $t = \pi/(2\sqrt{6}\lambda)$, as follows from Eq. (42) with $G_4^1 = \sqrt{6}$, we arrive at $|1_0 D_4^2\rangle$.
To create the superposition state in Eq. (43), we carry out step 1 as above. Then we rotate qubit 0 to the superposition state $\sqrt{2/5}\, |0_0\rangle + \sqrt{3/5}\, |1_0\rangle$ and flip $|1_0\rangle$ to $|2_0\rangle$, yielding the system state
$$\left(\sqrt{\frac{2}{5}}\, |0_0\rangle + \sqrt{\frac{3}{5}}\, |2_0\rangle\right) |D_4^1\rangle.$$
Turning on the interaction as in step 2 above, the part of the superposition containing $|0_0\rangle$ is decoupled from the dynamics, while the part containing $|2_0\rangle$ reaches $|1_0 D_4^2\rangle$ as before. Finally, applying X gates to the four neighbouring qubits yields the state in Eq. (43).
In total, our scheme requires seven single-qubit operations (four of them simultaneous) and two applications of the interaction that yields CZ gates. The total time spent on these CZ interactions is less than that of a single two-qubit CZ gate. This fast creation of the entangled state in Eq. (43) should be contrasted with the quantum circuit for the same task given in Ref. [126], which contained five three-qubit gates and 21 two-qubit gates, applied sequentially.
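The timing claim can be made concrete with a little arithmetic. This is a minimal sketch, assuming that each round is a resonant two-level transfer generated by $\lambda G_N^k$ with $G_N^k = \sqrt{(k+1)(N-k)}$, so that a full transfer takes $\pi/(2\lambda G_N^k)$. The function names and the choice $\lambda = 1$ are illustrative.

```python
import math


def g_coeff(n, k):
    """Collective coupling factor G_n^k = sqrt((k + 1)(n - k))."""
    return math.sqrt((k + 1) * (n - k))


def transfer_time(n, k, lam=1.0):
    """Time for a full transfer |2_0 D_n^k> -> |1_0 D_n^{k+1}>."""
    return math.pi / (2 * lam * g_coeff(n, k))


lam = 1.0
t_step1 = transfer_time(4, 0, lam)   # pi / (4 lam)
t_step2 = transfer_time(4, 1, lam)   # pi / (2 sqrt(6) lam)
t_total = t_step1 + t_step2
t_cz = math.pi / lam                 # single two-qubit CZ gate

# Weights of |D_5^3> when split according to the state of the central qubit.
weight_central_one = math.comb(4, 2) / math.comb(5, 3)   # 3/5, with |D_4^2>
weight_central_zero = math.comb(4, 3) / math.comb(5, 3)  # 2/5, with |D_4^3>
```

With these numbers the two interaction rounds together take about 45 % of a single CZ gate time. In this simple picture each transfer also contributes a factor $-i$ to the transferred amplitude; such relative phases could be compensated by the phase of the single-qubit rotation on qubit 0, which is an observation about the sketch rather than a statement from the scheme above.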
d. Rapid creation of a large W state The size of the Dicke state that can be efficiently prepared is determined by the number of neighbouring qubits to which the central qubit is coupled. In general, scaling up arbitrary Dicke states is hard [133]. However, in our scheme, large W states are easy to construct even with limited connectivity, e.g., in a square grid of qubits, as shown in Fig. 6. The method outlined there is straightforward to adapt for other connectivities.

E. Experimental feasibility
To determine how well the CCZS gate is likely to work in actual experiments, we now turn to simulating two specific possible experimental implementations of the gate. We consider two gate schemes commonly used to perform CZ gates in superconducting circuits by turning on and off an interaction between the two-qubit states $|11\rangle$ and $|02\rangle$ or $|20\rangle$. Similar schemes used for CZ gates on other quantum-computing platforms should be equally feasible for realizing the CCZS gate.
Figure 6. Creating large W states rapidly on a square grid of qubits. We first prepare qubit 0 ($q_0$) in its second excited state and carry out the rest of step 1 in Section II D 2 c to create $|D_4^1\rangle$ on the neighbouring qubits (red circles). Next, we swap each of the neighbouring qubits with the qubit next to them that is farthest from the centre qubit (filled blue circles except $q_0$). Flipping the $|1\rangle$ part of the state for these new centre qubits to $|2\rangle$ and having them interact with their nearest neighbours creates $|D_4^1\rangle$ on those neighbours (green and red circles). For brevity, the new cells starting from $q_3$ and $q_4$ are not shown. The result after these steps (two rounds of single or simultaneous single-qubit operations, three rounds of simultaneous two-qubit gates) is the 16-qubit W state $|D_{16}^1\rangle$.
In the first scheme, the two outer qubits in the three-qubit chain are tunable [58,76-78]. To activate the gate, they are tuned such that the $|110\rangle$ and $|101\rangle$ states both become resonant with the $|200\rangle$ state. In some implementations, tunable couplers between the qubits are also adjusted to further control the coupling [79-81]. In all these cases, the interaction strengths $\lambda_1$ and $\lambda_2$ will be limited to being in phase, i.e., $\phi = \pi$ [see Eq. (21)]. In the cases without tunable couplers, the parameter $\theta$ [see Eq. (21)] is fixed by the coupling strengths in the hardware and cannot be tuned in situ.
In the second scheme, the neighbouring qubits in the chain are connected via a tunable coupler, which itself is a qubit [82-85]. To activate a CZ gate, the coupler, which is detuned from the qubits it is connected to, is parametrically modulated with a modulation frequency close to the difference in frequency between the states $|11\rangle$ and $|02\rangle$ or $|20\rangle$. In this case, the interaction strengths $\lambda_1$ and $\lambda_2$ are determined by the phase and amplitude of the modulation of the coupler; they can thus be tuned over a wide range to implement different parameters for the CCZS gate. We note that a CZ gate also can be implemented in a similar fashion between a fixed-frequency qubit and a parametrically modulated qubit [134,135], but we do not simulate that case here.
To characterize the performance of the gates, we calculate the average gate fidelity [136,137]
$$F = \frac{\text{Tr}\left(M M^\dagger\right) + \left|\text{Tr}\left(U^\dagger M\right)\right|^2}{n(n+1)},$$
where $U$ is the ideal gate operation that we wish to implement, $M$ is the gate operation that we actually implement, and $n$ is the dimension of the computational subspace ($n = 2^2 = 4$ for the CZ gates and $n = 2^3 = 8$ for the CCZS gate). The aim of the numerical simulations below is to show that high-fidelity three-qubit CCZS gates can be obtained in a straightforward way, without optimizing pulse shapes, etc., compared to the constituent two-qubit gates.
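As a small worked example of the fidelity measure, the sketch below evaluates the commonly used expression $F = [\text{Tr}(MM^\dagger) + |\text{Tr}(U^\dagger M)|^2]/[n(n+1)]$ for matrices given as nested lists. It assumes that $M$ has already been projected onto the computational subspace; the function names and the test matrices are illustrative and are not taken from the simulations in this article.

```python
def trace_dagger_product(a, b):
    """Tr(a^dagger b) for square matrices given as nested lists."""
    n = len(a)
    return sum(complex(a[r][c]).conjugate() * b[r][c]
               for r in range(n) for c in range(n))


def average_gate_fidelity(u, m):
    """Average gate fidelity between the ideal unitary u and the implemented map m."""
    n = len(u)
    tr_mm = trace_dagger_product(m, m).real        # Tr(M M^dagger)
    overlap = abs(trace_dagger_product(u, m)) ** 2  # |Tr(U^dagger M)|^2
    return (tr_mm + overlap) / (n * (n + 1))


CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
IDENTITY = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
LEAKY_CZ = [[0.99 * x for x in row] for row in CZ]  # uniform amplitude loss

f_perfect = average_gate_fidelity(CZ, CZ)        # 1.0
f_identity = average_gate_fidelity(CZ, IDENTITY)  # (4 + 4) / 20 = 0.4
f_leaky = average_gate_fidelity(CZ, LEAKY_CZ)     # 0.99**2 = 0.9801
```

The `LEAKY_CZ` case illustrates why the first term is needed: when population leaves the computational subspace, $M$ is no longer unitary and $\text{Tr}(MM^\dagger) < n$.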

Tunable qubits
We first consider the setup with two tunable qubits 1 and 2 on each side of the fixed-frequency qubit 0 as shown in Fig. 7(a). To test the performance under realistic conditions, we use parameters close to the experiment in Ref. [77]. Before the gate is turned on, the qubit energies are For qubits 0 and 1, these are also their maximum energies; for qubit 2, the maximum energy is set to 2π × 4.927 GHz. The qubit anharmonicities are and the couplings between the qubits are The factor √2 appears in Eq. (48) since we use λ 1 and λ 2 to denote the coupling strengths for |11⟩ ↔ |20⟩ transitions instead of the coupling strengths for |10⟩ ↔ |01⟩ transitions, which are the parameters that are actually given as input to the simulation.
To activate the gate, the energies ω 1 and ω 2 of qubits 1 and 2 are tuned into resonance with ω 0 + α 0 as shown in Fig. 7(b). To account for the finite response time of the drive line, the pulse used for tuning the qubit energies is the convolution of a rectangular pulse of length t gate (the gate time) and a Gaussian pulse centered in the middle of the rectangular pulse with standard deviation σ = 1 ns.
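The pulse shape described above can be sketched numerically as follows. This is an illustration only: the pulse is normalized to unit plateau height, and the grid spacing, padding, and the function name `smoothed_pulse` are assumptions rather than details of the simulations.

```python
import numpy as np

def smoothed_pulse(t_gate, sigma=1.0, dt=0.01, pad=10.0):
    """Rectangular pulse of length t_gate (ns) convolved with a unit-area Gaussian."""
    t = np.arange(-pad, t_gate + pad + dt / 2, dt)
    rect = ((t >= 0.0) & (t <= t_gate)).astype(float)
    # Gaussian kernel centred on zero, truncated at +/- 5 sigma and normalized
    tk = np.arange(-5 * sigma, 5 * sigma + dt / 2, dt)
    kernel = np.exp(-tk**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    return t, np.convolve(rect, kernel, mode="same")

# Example with the CCZS gate time quoted in the text and sigma = 1 ns.
t, amp = smoothed_pulse(66.8)
mid_value = amp[np.argmin(np.abs(t - 33.4))]   # plateau height
edge_value = amp[np.argmin(np.abs(t))]         # halfway up the rising edge
area = amp.sum() * (t[1] - t[0])               # preserved by the convolution
print(mid_value, edge_value, area)
```

In a simulation, this normalized shape would be scaled by the frequency shift needed to bring the tunable qubit from its idle frequency to the resonance.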
We first check that we can tune up CZ gates between qubits 1 or 2 and qubit 0 by tuning just one of qubits 1 and 2 to the relevant resonance. We find that we can achieve F > 99.99 % for the CZ gate between qubits 0 and 2 with a gate time t gate,CZ02 = 93.0 ns and, similarly,

Fig. 1(a). The superconducting qubits are transmon qubits [138], which are nonlinear LC oscillators where the nonlinear inductances are provided by Josephson junctions (boxes with crosses in the sketch). When two Josephson junctions are combined in a loop [a superconducting quantum interference device (SQUID)], the effective inductance, and thus the qubit frequency, can be tuned by controlling the magnetic flux through the loop. (b) The tuning of the qubit energies used to implement a high-fidelity CCZS(π/2, π, 0) gate. (c) Population of the states |110⟩, |200⟩, and |101⟩ during the CCZS(π/2, π, 0) gate when the initial state is |110⟩. As Eq. (24) shows, the main effect of this gate is to swap the population between qubits 1 and 2 if qubit 0 is in its excited state.
We then tune up the CCZS(π/2, π, 0) gate [see Eq. (24)] by reducing the gate time and synchronizing the tuning of both qubits 1 and 2 to the resonance for the gate. Note that θ = π/2 is set by the fixed coupling strengths λ 1 = λ 2 and cannot be changed. The best gate fidelity we find is F = 99.42 % for the gate time t gate,CCZS = 66.8 ns ≈ t gate,CZ / √2 [see Fig. 7(b)- We attribute the deviation from 100 % gate fidelity for the three-qubit gate to a combination of factors. One is imperfections arising when tuning qubits 1 and 2 in and out of the resonance. During that time, qubit 1 crosses the frequency of qubit 0, which may cause leakage by briefly activating the |01⟩ ↔ |10⟩ transition for these qubits instead of the desired |11⟩ ↔ |20⟩. We note that qubit 2, which is below qubit 0 in frequency, does not have the same potential problem; this may explain why the CZ 02 gate has higher fidelity than the CZ 01 gate. Furthermore, tuning qubits 1 and 2 from different frequencies into the resonance appears to affect the parameter φ, making it deviate from π and thus lowering the gate fidelity. To improve the tuning of the qubit energies, one could try methods developed for nonadiabatic holonomic gates [139].
An additional source of error may be that the states |001⟩, |100⟩, and |010⟩ form a Λ system during the gate operation, with |001⟩ and |010⟩ having the same energy. Although |100⟩ is detuned from the other two states by the anharmonicity α 0 , there will still be a small effective coupling between |001⟩ and |010⟩ that can contribute to lowering the gate fidelity. This effect can be reduced by increasing the anharmonicity.
To further put the time gained by performing the three-qubit gate in perspective, we show in Fig. 8 the decomposition of the CCZS(π/2, π, 0) gate into single-qubit gates and two-qubit CZ gates between the middle qubit and its neighbours. Note that this is different from the decomposition in Fig. 2, which assumes access to a parameterized XY gate in addition to the CZ gates we have here. From the decomposition in Fig. 8, we see that five sequential CZ gates would be needed to implement this three-qubit gate in the setup at hand. Even if we assume that single-qubit gates take negligible time compared to two-qubit gates, this still means that we gain more than a factor of 7 in gate time by implementing the three-qubit gate using our scheme.

Tunable couplers
a. Setup and operation We next consider the setup with tunable couplers as shown in Fig. 9. This setup is

Figure 8. Decomposition of the CCZS(π/2, π, 0) gate into single-qubit and CZ gates, obtained using Qiskit [140]. For the single-qubit gates, we use the notation S = √Z = Z π/2 and √X = X π/2 . Note that this decomposition requires qubit 2 being placed in the middle of the linear chain, since it has to perform CZ gates with both qubit 0 and qubit 1.
where a_i and a_i† (b_j and b_j†) are the annihilation and creation operators, respectively, of qubit i (coupler j), ω i (ω cj ) is its transition frequency, α i (α cj ) its anharmonicity, and g ij is the strength of the capacitive coupling between qubit i and coupler j. We use parameter values similar to recent updates of the design in Ref. [84]. These values, which are kept fixed throughout all simulations, are given in Table I. To activate the gate, the magnetic flux Φ j (t) through the superconducting quantum interference device (SQUID) of coupler j (see Fig. 9) is modulated as where Θ j is the DC bias, δ j (t) is an envelope function with sinusoidal rise and fall of 25 ns and a constant value δ 0j for a time t p in-between such that t gate = t p + 50 ns, ω Φj is the modulation frequency [close to resonance with the transition frequency between the states that are coupled by the CZ gate (see Fig. 1)], and ϕ j is the initial phase of the drive, which is kept equal to zero until we need to calibrate different values of φ in the CCZS gate family. Modulating a symmetric SQUID like this results in a time-dependent coupler frequency [138] where Φ 0 is the flux quantum. Our control parameters for the CCZS gate are thus t p , δ 0j , Θ j , and the detuning between ω Φj and the expected resonant frequency.

b. Calibration procedure and results We calibrate gates in the CCZS family by the following procedure:

1. We first tune up high-fidelity CZ 01 and CZ 02 gates with equal gate times. Both CZ gates must be implemented such that the second excited state used is that of qubit 0, as shown in Fig. 1(b). We begin by exploring the parameter space spanned by Θ j and δ 0j to find values that yield high population transfers from |1_0 1_j⟩ to |2_0 0_j⟩. We then plot the population in |2_0 0_j⟩ as a function of t gate and ω Φj as in Fig. 10(a) and 10(c) and go along the value of ω Φj that corresponds to the tip of the resulting chevron pattern to the first value of t gate that returns all population to |1_0 1_j⟩. Finally, we confirm that the CZ gate fidelity around this point in parameter space is close to 100 % and pick the parameter values in this area that give the highest gate fidelity.

2. Next, we apply pulses to both couplers with the same DC biases and amplitudes as for the good CZ gates found in the previous step, but sweep the modulation frequency of both pulses around the values for the CZ gates. The smoking gun for the CCZS gates is a maximal population transfer between |101⟩ and |110⟩, which corresponds to θ = π/2. We expect such a point in parameter space to show up at gate times around √2 shorter than those found for the CZ gates. Having found such a point, we check that the gate fidelity around that point is close to 100 % for the CCZS(π/2, φ, 0) gate. We then pick the parameter values in this area that give the highest gate fidelity for the desired value of φ.

3. Other elements of the CCZS gate family can also be found, but none of them have such a clear signature as the |101⟩ ↔ |110⟩ population transfer. In particular, other values for the phase φ are found by changing the relative initial phase between the two pulses, (ϕ 1 − ϕ 2 ). Other values of θ can, in principle, be found by combining the controls of individual CZ gates with different gate strengths and adjusting the gate time accordingly [see the discussion below Eq. (26)].

Figure 10. Calibrating the CCZS gate with tunable couplers. (a) Population in |200⟩ as a function of gate time and the detuning between the modulation frequency ω Φ1 and the frequency of the transition |110⟩ ↔ |200⟩ when calibrating the CZ01 gate by initializing the system in |110⟩. (b) The gate fidelity for the CZ01 gate for the parameters in (a). (c) Population in |200⟩ as a function of gate time and the detuning between the modulation frequency ω Φ2 and the frequency of the transition |101⟩ ↔ |200⟩ when calibrating the CZ02 gate by initializing the system in |101⟩. (d) The gate fidelity for the CZ02 gate for the parameters in (c). (e) Population in |110⟩ as a function of gate time and the detuning between the modulation frequency ω Φ1 and the frequency of the transition |110⟩ ↔ |200⟩ when calibrating the CCZS(π/2, π, 0) gate by initializing the system in |101⟩. During the calibration, we also vary ω Φ2 , but in this plot it is kept fixed. (f) The gate fidelity for the CCZS(π/2, π, 0) gate for the parameters in (e).
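The signature used in step 2, and the expected √2 reduction in gate time, can be illustrated with an idealized resonant three-level model of the states |110⟩, |200⟩, and |101⟩. The sketch below is not the full transmon-plus-coupler simulation: it assumes ħ = 1, real in-phase couplings λ 1 = λ 2 = λ, no detuning, and a Hamiltonian convention in which a single CZ gate takes a time π/λ; the function name `evolve` is hypothetical.

```python
import numpy as np

def evolve(lam1, lam2, t):
    """exp(-iHt) in the basis (|110>, |200>, |101>), idealized resonant model."""
    H = np.array([[0.0, lam1, 0.0],
                  [lam1, 0.0, lam2],
                  [0.0, lam2, 0.0]])
    energies, modes = np.linalg.eigh(H)
    return modes @ np.diag(np.exp(-1j * energies * t)) @ modes.conj().T

lam = 1.0
t_cz = np.pi / lam                      # single coupling: full |110> -> |200> -> |110> cycle
t_cczs = np.pi / (np.sqrt(2) * lam)     # both couplings on: effective rate sqrt(2) * lam

U_cz = evolve(lam, 0.0, t_cz)           # |110> returns with a sign flip (CZ phase)
U_cczs = evolve(lam, lam, t_cczs)       # |110> is mapped to -|101> (theta = pi/2 signature)

print(np.round(U_cz[:, 0], 6))
print(np.round(U_cczs[:, 0], 6))
print(t_cz / t_cczs)
```

In this model the population starting in |110⟩ ends up entirely in |101⟩ at the shorter time, which is the maximal-transfer point the calibration searches for.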
We now show what such a calibration procedure can look like in practice. We first plot the population in |2_0 0_j⟩ as a function of t gate and ω Φj for the two individual CZ gates in Fig. 10(a) and 10(c). We find high-fidelity (> 99.7 % and > 99.9 %) CZ gates with similar gate times, around 405 and 396 ns, by choosing a DC bias Θ j = 0.275Φ 0 for both couplers and an amplitude δ 0j ≈ 0.08Φ 0 . In Fig. 10(b) and 10(d), we show the corresponding maps of gate fidelity as a function of gate time and ω Φj .
Figure 11. Calibrating the CCZS(π/2, φ, 0) gate by tuning the phase difference ϕ 1 − ϕ 2 between the signals modulating the two tunable couplers. The plot shows the gate fidelity for the CCZS(π/2, φ, 0) gate as a function of φ and ϕ 1 − ϕ 2 . All other parameters are the same as those that gave the highest gate fidelity for the CCZS(π/2, π, 0) in Fig. 10. For each value of φ shown in the plot, the highest gate fidelity exceeds 99 %.

We then try applying the same pulses simultaneously. The parameters we vary are now the two modulation frequencies (and the gate time), so we need to look at different projections of the resulting 2-dimensional parameter space. In Fig. 10(e), we show one such projection, fixing ω Φ2 and plotting the population in |101⟩ as a function of ω Φ1 and t gate . The corresponding gate fidelity for CCZS(π/2, π, 0) as a function of the same parameters is plotted in Fig. 10(f). Selecting the parameters that yield the highest population transfer, and optimizing for fidelity around those values, we find a CCZS(π/2, π, 0) gate with t gate = 295 ns, whose plateau time t p is a factor ∼ √2 shorter than that of the individual CZ gates. The gate fidelity is > 99.3 %.
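The comparison of plateau times can be checked with a few lines of arithmetic, using only the gate times quoted in the text and the relation t gate = t p + 50 ns. The variable names are illustrative, and the labels "a" and "b" for the two CZ gates are placeholders.

```python
import math

RAMP_TOTAL_NS = 50.0  # 25 ns sinusoidal rise plus 25 ns fall

def plateau_time(t_gate_ns):
    """Plateau time t_p obtained from the total gate time."""
    return t_gate_ns - RAMP_TOTAL_NS

t_p_cz_a = plateau_time(405.0)   # first CZ gate quoted in the text
t_p_cz_b = plateau_time(396.0)   # second CZ gate quoted in the text
t_p_cczs = plateau_time(295.0)   # simultaneous (CCZS) gate

ratio_a = t_p_cz_a / t_p_cczs
ratio_b = t_p_cz_b / t_p_cczs
print(ratio_a, ratio_b, math.sqrt(2))
```

Both ratios lie within a few percent of √2, whereas the ratio of total gate times (405/295 or 396/295) is smaller because the 50 ns of ramps does not shrink.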
Note that, just as for the simulations with tunable qubits above, we have not included any effects of decoherence in these simulations. The impact of decoherence will be less the faster the gates are. It is possible to calibrate faster CZ and CCZS gates than the examples shown here in Fig. 10 by choosing other values of Θ j and δ 0j , but we have chosen to show these examples since they illustrate the calibration and workings of the gates more clearly than some of the faster gates.
c. Tuning the gate parameter φ To further demonstrate the extensive control of gate parameters afforded by the setup with tunable couplers, we calibrate the CCZS(π/2, φ, 0) gate for many values of φ in the range [0, 2π]. We do this by starting from the optimized parameters for CCZS(π/2, π, 0) found above and then tuning the phase difference ϕ 1 −ϕ 2 between the signals modulating the two tunable couplers. The resulting gate fidelities for CCZS(π/2, φ, 0) are shown in Fig. 11. For all values of φ we try, we find gate fidelities above 99 %. These high gate fidelities are achieved along the line φ = π + ϕ 1 − ϕ 2 mod 2π, as expected from Eq. (21).
d. Error sources Just as for the setup with tunable qubits in Section II E 1, we do not reach perfect 100 % gate fidelity in our simulations of the setup with tunable couplers, despite neglecting decoherence effects. The remaining error has multiple contributions. Firstly, the pulse shape δ j (t) is chosen to be very simple; no optimal control is applied to improve it. Secondly, higher-order interactions between the qubits mediated by the off-resonant couplers result in ZZ interactions that disturb the three-qubit gate. We observe higher gate fidelities if we allow ourselves to correct phases like those produced by such interactions. This suggests that schemes for reducing unwanted ZZ interactions in two-qubit gates (see, e.g., Refs. [79,80]) could be helpful also for the three-qubit gate considered here.
Thirdly, we note that we restricted ourselves to calibrating gates of the form CCZS(π/2, φ, 0). It is possible that some of the gates we produced had higher gate fidelities with CCZS(θ, φ, γ) for other values of θ and γ, but we preferred tuning up and showing fidelities for a gate with clearer functionality rather than searching the space of parameters θ and γ to find the highest possible fidelity. Finally, the simulations with five three-level transmon qubits are quite computationally heavy and we needed to search a 10-dimensional parameter space (plateau times t p , modulation frequencies ω Φj , modulation phases ϕ j , modulation amplitudes δ 0j , and DC biases Θ j ). There is thus certainly room for improvement in exploration of this parameter space.

III. SIMULTANEOUS ISWAP GATES
In this section, we show that the idea of applying simultaneous two-qubit gates to create multi-qubit gates is not limited to the CZ gates studied in Section II. Here, we investigate what happens when the simultaneous gates are iSWAP gates instead. The treatment in this section will be more condensed than in the previous one, since some parts turn out to be quite similar. We note that several other combinations of two-qubit gates are possible, but the detailed study of such possibilities is left for future work.

A. Setup and gate operation
We consider simultaneous application of iSWAP gates that are based on activating a coupling between the states |01⟩ and |10⟩. In such gates, the states |00⟩ and |11⟩ do not couple to other states and remain unchanged while the states |01⟩ and |10⟩ are swapped and acquire a phase factor −i. In superconducting circuits, just as for the CZ gate, the required coupling can be achieved either by tuning the frequencies of the two qubits into resonance [75,76], possibly in conjunction with tuning a coupler [79,80], by parametrically modulating a tunable coupler between two fixed-frequency qubits [82,85,93], or by parametrically modulating one of the qubits [94,135]. In Ref. [79], a gate fidelity of 99.86 % was reported for a gate time of 30 ns. We denote the three-qubit operation resulting from the simultaneous application of the two iSWAP gates by DIV, since it is a "divider" gate that can distribute one or two excitations among all three qubits.

Hamiltonians and time evolution
We consider the same linear chain of three qubits as for the CCZS gate in Fig. 1, but now coupling the transitions between the states |1_0 0_j⟩ and |0_0 1_j⟩ with strengths g j , as illustrated in Fig. 12(a). A simplification compared to the simultaneous CZ gates is that no second excited state of any qubit becomes part of the dynamics. For the simultaneous iSWAP gates, only the transitions shown in Fig. 12(b) are activated.
Assuming for simplicity that the transitions shown in Fig. 12(b) are resonant (δ = 0), and that all other transitions are far off resonance, the Hamiltonian for the system can in the interaction picture be written as with operators defined as in Section II D 2 a. This Hamiltonian conserves the number of excitations in the system and we also see that the transitions in Fig. 12(b) occur within the subspaces determined by the excitation number.
In the subspaces with one and two excitations, Λ or V systems are formed. Just as in Section II A 1, it is thus convenient to introduce new basis states that include dark and bright states. Defining where the unitary dynamics generated by the system Hamiltonian in Eq. (52) can be expressed as where I and σ x are defined in the subspace spanned by |100⟩ and |0B_1⟩, and I and σ x are defined in the subspace spanned by |011⟩ and |1B_2⟩.

The family of three-qubit gates
The results above show that the states in the computational subspace of the three qubits are affected as follows: |000⟩ and |111⟩ are unchanged, while states are swapped around in the single- and double-excitation subspaces. By introducing the notation tan θ = g 2 /g 1 (we assume for simplicity that g 1 and g 2 are in phase) and ϕ = Ωt, we can write the three-qubit gate, which we denote DIV [see Fig. 12(c)], as where U j acts on the j-excitation subspace. Here, U 0 = U 3 = 1, while U 1 and U 2 are found by transforming from the basis with bright and dark states used in Eq. (58) (see Appendix A) to the computational basis; in the single-excitation subspace spanned by |010⟩, |100⟩, and |001⟩, we obtain and U 2 (θ, ϕ) for the double-excitation subspace spanned by |101⟩, |011⟩, and |110⟩ has exactly the same form. An important difference between the DIV gate from simultaneous iSWAP gates and the CCZS gate from simultaneous CZ gates in the previous section is that the operation of the DIV gate never makes any population leave the computational subspace. This is why we can vary the parameter ϕ freely by choosing the evolution time t. In the CCZS gate, the evolution time is heavily constrained by the need to ensure that the temporary population in the middle qubit's second excited state returns to the computational subspace at the end of the gate.

Examples of three-qubit gates
We now study some simple parameter choices for the DIV gate. If we set g 1 = g, g 2 = 0, and ϕ = π/2, i.e., t gate = π/(2g), we recover the two-qubit iSWAP gate acting on qubits 0 and 1. In the same way, if g 1 = 0, g 2 = g, and ϕ = π/2, we have the two-qubit iSWAP gate acting on qubits 0 and 2.
If we activate both these iSWAP interactions simultaneously, i.e., g 1 = g 2 = g such that θ = π/4, and choose the gate time t gate = π/(2√2 g) such that ϕ = π/2, we obtain DIV(π/4, π/2), for which This gate, which is a factor √2 faster than the individual two-qubit iSWAP gates, thus takes a single excitation in the middle qubit and divides it evenly between the two outer qubits. A single excitation in one of the outer qubits ends up divided across all three qubits: half in the middle qubit and a quarter in each of the outer qubits.
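The population division described here can be reproduced with a small numerical sketch. It assumes a resonant exchange Hamiltonian of the form H = Σ_j g_j (σ_0^- σ_j^+ + h.c.) with ħ = 1, which is one concrete way to write the couplings in Fig. 12(b); the state ordering |q0 q1 q2⟩ and the helper names `embed` and `div_unitary` are choices made for this illustration.

```python
import numpy as np

LOWER = np.array([[0, 1], [0, 0]], dtype=complex)  # |0><1| for one qubit
ID2 = np.eye(2, dtype=complex)

def embed(op, site):
    """Place a single-qubit operator on qubit `site` of the register |q0 q1 q2>."""
    factors = [ID2, ID2, ID2]
    factors[site] = op
    return np.kron(np.kron(factors[0], factors[1]), factors[2])

def div_unitary(g1, g2, t):
    """exp(-iHt) for the assumed exchange Hamiltonian between qubit 0 and qubits 1, 2."""
    H = np.zeros((8, 8), dtype=complex)
    for g, j in ((g1, 1), (g2, 2)):
        hop = g * embed(LOWER, 0) @ embed(LOWER.conj().T, j)
        H += hop + hop.conj().T
    energies, modes = np.linalg.eigh(H)
    return modes @ np.diag(np.exp(-1j * energies * t)) @ modes.conj().T

g = 1.0
U = div_unitary(g, g, np.pi / (2 * np.sqrt(2) * g))   # DIV(pi/4, pi/2)

labels = ["100", "010", "001"]                        # middle qubit is qubit 0
index = {s: int(s, 2) for s in labels}
pop_from_middle = {s: abs(U[index[s], index["100"]]) ** 2 for s in labels}
pop_from_outer = {s: abs(U[index[s], index["010"]]) ** 2 for s in labels}
print(pop_from_middle)
print(pop_from_outer)
```

Working in the full eight-dimensional space, rather than only the single-excitation block, also lets one confirm directly that |000⟩ and |111⟩ are left unchanged.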
If we keep the two coupling strengths the same (g 1 = g 2 = g such that θ = π/4), but vary the parameter ϕ by varying the gate time t = ϕ/(√2 g), the gate becomes (62) If we instead fix ϕ = π/2, but vary θ by varying the ratio of the coupling strengths g 1 and g 2 , the resulting gate is given by

B. Decomposition into two-qubit gates
We note that all the three-qubit gates in Section III A 3 entangle all three qubits. Although finding a decomposition of the DIV gate into single- and two-qubit gates is less straightforward than for the CCZS gate in Section II B, we can from this entanglement conclude that at the very least two sequential two-qubit gates are necessary for such a decomposition. Since the three-qubit gate already is faster than a single two-qubit gate, this guarantees a significant speed-up.

Figure 13. Quantum circuit for generating the three-qubit W state (|100⟩ + |010⟩ + |001⟩)/√3 using the DIV gate. The phase gate S is √Z.

C. Constructing other three-qubit gates
Unlike the CCZS gate, the DIV gate cannot be interpreted as one qubit controlling a two-qubit operation on the other two qubits. The most well-known three-qubit gates, the Fredkin, iFredkin, and Toffoli gates considered in Section II C, are all such gates. It is thus clear that the DIV gate cannot be equivalent to any of these three-qubit gates for any choice of parameters θ and ϕ. Furthermore, it does not appear possible to change any DIV gate into such a form by adding a single two-qubit gate before or after the DIV gate.

D. Creating large entangled states
We now turn to how the DIV gate and its generalizations to more qubits can be used to rapidly create large entangled states, similar to what we showed for the CCZS gate in Section II D. We first note that arbitrary superpositions of all permutations of three-qubit states with one excitation can easily be created by starting from |000⟩, exciting qubit 0, and then applying the DIV gate for suitable values of θ and ϕ, yielding In particular, the three-qubit W state can be constructed by choosing θ = π/4 (i.e., g 1 = g 2 ) and ϕ = arctan √2 in Eq. (64), and following that by applying single-qubit gates to qubits 1 and 2, as shown in Fig. 13.
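The W-state construction can be verified within the single-excitation subspace. The sketch below assumes the same resonant exchange model as the DIV gate, written in terms of θ and ϕ with g 1 = Ω cos θ and g 2 = Ω sin θ, and takes S = diag(1, i) acting on qubits 1 and 2 as the final single-qubit gates; the function name `div_single_excitation` is hypothetical.

```python
import numpy as np

def div_single_excitation(theta, phi):
    """DIV gate restricted to the basis (|100>, |010>, |001>), assumed exchange model."""
    h = np.array([[0.0, np.cos(theta), np.sin(theta)],
                  [np.cos(theta), 0.0, 0.0],
                  [np.sin(theta), 0.0, 0.0]])   # H / Omega
    energies, modes = np.linalg.eigh(h)
    return modes @ np.diag(np.exp(-1j * energies * phi)) @ modes.conj().T

theta = np.pi / 4
phi = np.arctan(np.sqrt(2))

psi0 = np.array([1.0, 0.0, 0.0], dtype=complex)   # |000> with qubit 0 excited
psi = div_single_excitation(theta, phi) @ psi0    # equal populations, phases -i on outer qubits

# S on qubits 1 and 2 multiplies |010> and |001> by i and leaves |100> alone.
s_phases = np.array([1.0, 1j, 1j])
w_state = s_phases * psi
print(np.round(w_state, 6))
```

The choice ϕ = arctan √2 gives cos²ϕ = 1/3, so one third of the population stays in the middle qubit and the remaining two thirds are split evenly between the outer qubits.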
We note that the experiment in Ref. [109] showed how to construct a W state by single-qubit rotations and a single application of simultaneous iSWAP interactions between all three qubits in a triangular setup instead of the linear chain considered here. A generalization of this protocol to more qubits is given in Ref. [141]. Furthermore, a protocol to construct the three-qubit GHZ state using single-qubit rotations and a single application of simultaneous iSWAP interactions in a linear chain like we consider here was presented in Ref. [142].
The multi-qubit version of the simultaneous iSWAP interaction, where the dynamics of a central qubit 0 and its N nearest neighbours {j} are governed by the interaction-picture Hamiltonian is the same as the simultaneous CZ interaction given in Eq. (39) except that it is the |0⟩ ↔ |1⟩ transition that couples to the surrounding qubits instead of the |1⟩ ↔ |2⟩ transition. We can thus reuse much of what we derived in Section II D 2 about how to rapidly create large Dicke states. For example, the procedure for creating W states with many qubits described in Fig. 6 can also be implemented with simultaneous iSWAP gates. It is actually even easier, since the initial qubit only needs to be prepared in state |1⟩ instead of |2⟩ and the later step of flipping |1⟩ to |2⟩ for other qubits described there can be omitted.
To create superpositions of Dicke states like in Section II D 2 c, only a minor modification of the protocol presented there is needed to adapt it to simultaneous iSWAP gates. We simply change the states |0_0⟩, |1_0⟩, and |2_0⟩ to |2_0⟩, |0_0⟩, and |1_0⟩, respectively, during the execution of the protocol. At the end, we change them back to obtain the state in Eq. (43).

E. Experimental feasibility
In the same way as for the CCZS gate (see Section II E), there are several experimental setups with superconducting qubits that could be used to implement the simultaneous iSWAP gates that make up the DIV gate. This includes setups with tunable qubits, where the states |01⟩ and |10⟩ are tuned into resonance [75,76]. This activation of the iSWAP gate can be further enhanced with a tunable coupler [79,80]. The other type of setup uses parametric modulation of either a tunable coupler [82,85,93] or one of the qubits [94,135].
For brevity and simplicity, we here limit our simulations to the implementation with tunable qubits. In this implementation, the parameter θ is fixed by the coupling strengths set in the hardware and cannot be changed in experiment. An implementation with parametric modulation of tunable couplers instead would enable controlling θ in situ. Such an implementation can be calibrated in similar fashion as the CCZS gate with tunable couplers in Section II E 2.
For the implementation with tunable qubits, we consider the same setup and parameters as in Section II E 1 except that we increase the maximum energy of qubit 2 to ω 0 ; see Fig. 7(a) and Eqs. (46)-(48) with g j = λ j /2. We further use the same pulse shapes as in Fig. 7(b), but now tuning ω 1 and ω 2 into resonance with ω 0 instead of ω 0 + α 0 and adapting the gate times to yield the iSWAP and DIV gates, resulting in the tuning shown in Fig. 14(a).
We first tune up the individual iSWAP 01 and iSWAP 02 gates by tuning just one of the outer qubits into resonance with the middle qubit. For the iSWAP 01 gate, we find a gate fidelity of 99.8 % with a gate time t gate,iSWAP01 = 66.8 ns, and for the iSWAP 02 gate, we find a gate fidelity of 99.6 % using the same gate time.
We then tune up the DIV(π/4, π/2) gate [see Eq. (61)] by tuning both outer qubits into resonance with the middle qubit in a synchronized fashion, reducing the gate time by around a factor √2. Note that other values of ϕ are easily implemented by decreasing or increasing the gate time, while θ = π/4 is fixed by the choice of g 1 = g 2 used here. For the DIV(π/4, π/2) gate, we find a gate fidelity of 99.1 % for the gate time t gate,DIV = 47.5 ns ≈ t gate,iSWAP / √2. Just as in Section II E, the gate fidelity is calculated without including any effects of decoherence in the simulation. To illustrate the calibrated gate, we plot in Fig. 14(b) the population transfers in the single-excitation subspace when the system is initialized in the state |010⟩.
The deviation from perfect gate fidelity can be attributed to several factors, similar to the case of the CCZS gate with tunable qubits in Section II E 1. As we tune qubit 2 into resonance with qubit 0, we pass the point where ω 2 = ω 0 + α 0 , i.e., where the |1_0 1_2⟩ ↔ |2_0 0_2⟩ transition is resonant. This can cause leakage from the computational subspace. We note that qubit 1 does not have the same issue, which likely helps explain why we find a higher gate fidelity for the iSWAP 01 gate than the iSWAP 02 . Just like for the CCZS gate, it would likely be beneficial to apply more advanced methods for optimizing the tuning of the qubits in and out of resonance.

IV. CONCLUSION
We have shown how multi-qubit gates can be constructed and implemented by simultaneously applying two-qubit gates to a group of qubits such that at least one qubit is affected by the operation of two or more of these two-qubit gates. The resulting multi-qubit gates are as fast as, and in many cases clearly faster than, the individual two-qubit gates on their own. Furthermore, the multi-qubit gates can have larger entangling power than the sequential application of the constituent two-qubit gates, in addition to being much faster than such a sequential application.
Since our scheme for multi-qubit gates only relies on control operations corresponding to two-qubit gates, our ideas are ready to be implemented in existing quantum hardware without the need for any additional components, complicated pulse shapes, hardware redesign, or other changes beyond some recalibration of the lengths (and in some cases, the phases) of the control pulses already optimized for two-qubit gates. This means that the multi-qubit gates presented in this article, and other multi-qubit gates using the same principles, could become useful immediately across mature quantum-computing platforms like superconducting circuits, trapped ions, and others.
We illustrated our ideas for multi-qubit gates with two specific examples: simultaneously applied interactions for CZ gates and simultaneously applied interactions for iSWAP gates. For the simultaneous CZ gates, based on activating the |11⟩ ↔ |02⟩ transition, we showed that applying them to the nearest neighbours in a linear chain of three qubits, with the middle qubit being the one excited to |2⟩, resulted in a three-qubit gate that we denoted CCZS. This CCZS gate applies a combination of the CZ and SWAP gates to the outer qubits in the chain conditioned on the middle qubit being in its excited state. By controlling the ratio of amplitudes and the relative phases of the pulses for the constituent CZ gates, and also the detuning from resonance of the |11⟩ ↔ |02⟩ transition, we gain access to a whole family of CCZS gates. For the case where the CZ control pulses are in phase, on resonance, and of equal amplitude, the gate time for the CCZS gate is a factor √2 shorter than for a single CZ gate. Exploring the entangling power of the CCZS gates, we showed that a decomposition of a gate in the CCZS family in general requires three sequential two-qubit gates. We also demonstrated that gates from the CCZS family can be used to construct other three-qubit gates: the iFredkin and Fredkin gates are equivalent to a CCZS gate followed by a two-qubit CZ gate or a three-qubit CCZ gate, respectively. For the iFredkin gate, this suggests that we can implement it twice as fast using a construction with a CCZS gate than a standard decomposition into two-qubit gates. Furthermore, we showed that a single CCZS gate combined with a few single-qubit gates can be used to construct an entangled three-qubit GHZ state. Finally, we generalized the CCZS gate operation to more qubits and showed that, in combination with a few single-qubit gates, it can create large entangled Dicke states in very few steps.
For the simultaneous iSWAP gates, based on activating the |01⟩ ↔ |10⟩ transition, we showed that when they are applied in a linear chain of three qubits, a three-qubit gate which we denoted DIV is created. The DIV gate distributes excitations among the three qubits within the one- and two-excitation subspaces while leaving the states |000⟩ and |111⟩ unchanged. We showed that we can create a large family of DIV gates by controlling the gate time and the relative strength of the two constituent iSWAP gates. Furthermore, similar to the CCZS gates, we showed that the DIV gates are in general faster than single iSWAP gates and can be used to rapidly construct large entangled states like Dicke and GHZ states.
For both the CCZS gate and the DIV gate, we performed numerical simulations using parameters from existing state-of-the-art quantum hardware with superconducting qubits to demonstrate that these three-qubit gates are ready to be implemented with high fidelity in experiments. For the CCZS gate, we showed that it can be implemented with both tunable qubits and with tunable couplers, where the latter gives some more freedom to control parameters and realize the whole family of gates. We found that both setups enable gate fidelities exceeding 99.3 %, with the tunable qubits reaching a gate fidelity above 99.4 %. For the DIV gate, we limited the simulations to the setup with tunable qubits and demonstrated a gate fidelity exceeding 99.1 %. We emphasize that all these simulations used quite simple and straightforward methods for optimization and calibration, indicating that these high gate fidelities should be within reach in experiment. Furthermore, we identified factors contributing to the deviations from perfect gate fidelity, e.g., lack of optimal control applied to gate parameters varying in time and the presence of unwanted ZZ coupling. This allowed us to suggest several improvements to the operation of the three-qubit gates, which should enable even higher gate fidelities than demonstrated here.
In conclusion, we have introduced a general method for creating multi-qubit gates using two-qubit gates already in use in current quantum hardware. We have shown that these multi-qubit gates are fast, powerful, and ready to be implemented in existing experimental setups without any significant modifications needed. This opens up a wealth of possible applications by making quantum circuits more compact and faster to run, which is crucial for unleashing the potential of NISQ devices that are limited by coherence times.

V. OUTLOOK
We see at least five directions for further research building on the results presented in this article. The first is to test the ideas detailed here in actual experiments. Such experiments could be performed using various setups with superconducting qubits, as we have analyzed in Sections II E and III E, but also on other quantum hardware platforms. We note that experimental implementations would benefit from further developing calibration methods compared to what we showed in Sections II E and III E. In the numerical simulations there, we had access to the full propagator associated with the gate. This allowed us to simplify the calibration process substantially, since we could easily check the average gate fidelity for those points in parameter space that showed population transfers between the computational states corresponding to the gate we sought.
The question of experiments ties into the second research direction, which is to analyze how well the schemes from this article will perform on platforms other than superconducting qubits. Furthermore, it should be investigated whether there are two-qubit gate implementations native to these other platforms that can be run simultaneously to create new multi-qubit gates.
This last part can also be viewed as part of the third research direction, which is to find more multi-qubit gates realized through simultaneous application of two-qubit gates other than CZ and iSWAP, which were used as examples in this article. Candidates for such two-qubit gates include controlled-NOT (CNOT) gates implemented through cross-resonance driving [143,144]. It may also be possible to simultaneously apply different two-qubit gates to different pairs of qubits to create yet other multi-qubit gates.
The fourth research direction we envision is to compile or transpile various quantum algorithms anew with the novel multi-qubit gates included in the native gate set of the device that the algorithm is to be executed on. We expect this to lead to a significantly reduced circuit depth and run time for some algorithms. As mentioned in the introduction, the CCZS gate seems particularly suited to improve phase estimation and spectrum qubitization, but there are likely many more algorithms that would benefit from its inclusion. For example, one could investigate the use of our multi-qubit gates in the entangling layers of variational quantum algorithms [12] like the quantum approximate optimization algorithm [145], the quantum alternating operator ansatz [73], or the variational quantum eigensolver [146].
Finally, we also believe that tools from optimal control and insights from other works on optimizing pulse shaping for gates should be applied to the multi-qubit gates developed here. This could help achieve even higher gate fidelities and shorter circuit run times by reducing leakage to states outside the computational subspace and further decreasing the gate time.
For setups where the CZ gates are performed by tuning the states |1₀1ⱼ⟩ and |2₀0ⱼ⟩ into resonance, the states |x01⟩ and |x10⟩ with x ∈ {0, 1, 2} will also become resonant. Then, a direct coupling between qubits 1 and 2 will activate an iSWAP between them. If the CZ gates instead are performed through parametric modulation of tunable couplers between the three qubits, such an iSWAP gate will be activated if there is a frequency component of the modulation in the coupler connecting qubits 1 and 2 that matches the energy difference between the states |x01⟩ and |x10⟩.
We thus have three additions to the diagram in Fig. 1(b): a coupling between |101⟩ and |110⟩, transforming the effective Λ system in the upper part of the figure into a ∆ system; a coupling between |201⟩ and |210⟩, transforming the effective V system in the lower part of the figure into a ∇ system; and a coupling between |001⟩ and |010⟩. The effect of the last part is simply to change the first term in Eq. (19), and the angle β = gt is determined by the interaction strength g between |001⟩ and |010⟩ and the gate time t.
In the following, we therefore investigate the dynamics of the effective three-level systems, which will determine the gate time, since we must make sure to return all population to the computational subspace at the end of the gate.
Since we want to eliminate leakage to the |200⟩ = |2⟩ state, which is outside of the computational subspace, we need the off-diagonal elements in Eq. (A16) to be zero. This is achieved when Ωt = π. Then, the time evolution for |B⟩, |2⟩, and |D⟩ is given by
We see from Eq. (A19) that a full population transfer between |101⟩ and |110⟩ requires either t = 4π/α₁₃ or α₁₃ = 0. The latter is the case treated in the main text. The former condition shows that a three-qubit gate similar to the CCZS gate in Section II can be implemented also when there is an additional nonzero direct coupling α₁₃ between qubits 1 and 2.
We finally note that the analysis here for the ∆ system also applies to the ∇ system, if we identify the states |1⟩, |2⟩, and |3⟩ with |210⟩, |111⟩, and |201⟩, respectively. This means that the condition Ωt = π imposed above also ensures that no leakage from the computational subspace takes place in the ∇ system, since the only effect on the state |111⟩ is that it acquires a phase factor −exp(−iα₁₃t/2), as shown by Eq. (A19).
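The no-leakage condition Ωt = π can be checked numerically with a minimal sketch. The Hamiltonian below is an assumed model and is not taken from the equations of this appendix: the states |1⟩ and |3⟩ couple to |2⟩ with equal strength `lam` and to each other with strength `alpha13`, and `omega` is the effective rate for this particular model. The normalization of the couplings may therefore differ from the one used in Eqs. (A16) and (A19), and all numerical values are arbitrary.

```python
import numpy as np

def propagator(h, t):
    """Return exp(-i h t) for a Hermitian matrix h (units with hbar = 1)."""
    energies, vecs = np.linalg.eigh(h)
    return vecs @ np.diag(np.exp(-1j * energies * t)) @ vecs.conj().T

def three_level_hamiltonian(lam, alpha13):
    """Assumed model in the basis (|1>, |2>, |3>): |1> and |3> couple to |2>
    with strength lam and directly to each other with strength alpha13."""
    return np.array([[0.0, lam, alpha13],
                     [lam, 0.0, lam],
                     [alpha13, lam, 0.0]], dtype=complex)

lam, alpha13 = 1.0, 0.3                           # illustrative values only
omega = np.sqrt(2 * lam**2 + alpha13**2 / 4)      # effective rate of this model
t_gate = np.pi / omega                            # condition Omega * t = pi

u = propagator(three_level_hamiltonian(lam, alpha13), t_gate)

# Delta system: population starting in |1> or |3> must not end up in |2>.
leakage_into_2 = abs(u[1, 0])**2 + abs(u[1, 2])**2

# Nabla system: population starting in |2> returns to |2> with a phase.
phase_on_2 = u[1, 1]
expected_phase = -np.exp(-1j * alpha13 * t_gate / 2)

# Case without direct coupling (alpha13 = 0): full transfer from |1> to |3>.
u0 = propagator(three_level_hamiltonian(lam, 0.0), np.pi / (np.sqrt(2) * lam))
transfer_1_to_3 = abs(u0[2, 0])**2

print(leakage_into_2, phase_on_2, expected_phase, transfer_1_to_3)
```

Within this assumed model, the middle state decouples from the other two at the end of the gate for any value of `alpha13`, which is the property that both the ∆ and the ∇ system rely on.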