Improving the Performance of Deep Quantum Optimization Algorithms with Continuous Gate Sets

Variational quantum algorithms are believed to be promising for solving computationally hard problems and are often composed of repeated layers of quantum gates. An example thereof is the quantum approximate optimization algorithm (QAOA), an approach to solve combinatorial optimization problems on noisy intermediate-scale quantum (NISQ) systems. Gaining computational power from QAOA critically relies on the mitigation of errors during the execution of the algorithm, which for coherence-limited operations is achievable by reducing the gate count. Here, we demonstrate an improvement of up to a factor of 3 in algorithmic performance as measured by the success probability, by implementing a continuous hardware-efficient gate set using superconducting quantum circuits. This gate set allows us to perform the phase separation step in QAOA with a single physical gate for each pair of qubits instead of decomposing it into two CZ-gates and single-qubit gates. With this reduced number of physical gates, which scales with the number of layers employed in the algorithm, we experimentally investigate the circuit-depth-dependent performance of QAOA applied to exact-cover problem instances mapped onto three and seven qubits, using up to a total of 399 operations and up to 9 layers. Our results demonstrate that the use of continuous gate sets may be a key component in extending the impact of near-term quantum computers.


I. INTRODUCTION
Quantum computers have the potential to outperform classical computers on a range of computational problems such as prime factoring [1] and quantum chemistry [2]. Although many of these applications will require quantum error correction [3] to provide a quantum advantage, there is an increasing interest in exploring quantum applications on noisy intermediate-scale quantum (NISQ) devices [4] available in the near term. Recent experiments have demonstrated a computational advantage of quantum computers [5], explored many-body physics [6,7] and simulated small-scale quantum chemistry problems [8-10]. Moreover, there is a significant interest in solving optimization problems on quantum computers, in particular with the quantum approximate optimization algorithm (QAOA) [11-13]. This variational algorithm has been used to study a range of discrete [11,14-16] and continuous [17] optimization problems, and may have applications for unstructured search [18]. While there is currently no proof that it can provide an asymptotic quantum advantage, QAOA is an emerging approach for benchmarking quantum devices and is a candidate for demonstrating a practical quantum speed-up on near-term NISQ devices.
To find an approximate solution to a combinatorial problem with QAOA, a problem Hamiltonian is formulated, whose ground state corresponds to the solution of the combinatorial problem. To approximate this ground state, a quantum computer prepares an ansatz state with a parameterized gate sequence, whose parameters are iteratively updated by a classical optimizer. The gate sequence consists of layers, each characterized by two variational parameters, $\gamma_q$ and $\beta_q$, see Fig. 1. The number of layers, p, sets the depth of the algorithm and QAOA can reach the global optimum of any cost function for p → ∞ [11]. It is therefore expected that the computational power of QAOA increases with p. In practice, however, the number of layers that can be executed reliably on near-term quantum computers is limited due to finite gate errors induced by relaxation, dephasing and pulse imperfections [14,19].
Small-scale implementations of QAOA, while restricted to solving problems that can also be efficiently solved on classical computers, provide crucial insights into the feasibility and challenges related to the execution of the algorithm on NISQ devices. Previous studies of QAOA with superconducting qubits [12,13,19-21], photonics [22] and trapped ions [23] highlight the applicability of QAOA on a range of platforms and illustrate the breadth of problems that can be addressed with QAOA. The work presented in Ref. [12] studied the MaxCut problem, which is the canonical problem for QAOA [11], with up to 19 qubits, Ref. [20] studied a channel decoding problem, Ref. [23] searched the eigenstate of all-to-all connected Ising models with up to 40 qubits and Ref. [21] considered an exact-cover problem with 2 qubits. Many of these experiments consider problems that can be solved with shallow QAOA circuits (p = 1 or 2). However, these examples may not be representative of the broad range of problems that can be addressed with QAOA. Indeed, studies of all-to-all connected Ising models show that deep circuits may be needed [13].
When implementing quantum algorithms on a quantum device, it is common to decompose the gate sequence into a discrete set of gates available on the hardware. To improve performance, recent experiments have explored continuous gate sets motivated by applications in quantum simulations [24,25], quantum chemistry [26,27] and for QAOA using XY interactions [28]. In this work, we benchmark QAOA with a continuous hardware-efficient gate set. We present a controlled arbitrary-phase gate (C-ARB gate), which allows us to execute each QAOA layer with only one two-qubit gate per ZZ-term in problem Hamiltonians formulated as Ising models, see Fig. 1(a). We demonstrate how our gate set shortens the QAOA sequence and, thus, leads to better performance for a fixed QAOA depth compared to a decomposed implementation of the algorithm with a discrete gate set. In particular, we demonstrate with two concrete examples that the reduction in gate sequence duration outweighs errors originating from the interpolation of parameters necessary for implementing the continuous gate set. Taking advantage of this gain in performance, we investigate the trade-off between experimental noise, which favors shallow circuits, and increasing the number of layers, which is needed to solve complex problem instances.

II. IMPLEMENTATION
The objective function of many NP-complete discrete optimization problems can be mapped to an Ising Hamiltonian [29,30],

$$\hat{C} = \sum_{i<j} J_{ij} Z_i Z_j + \sum_i h_i Z_i, \qquad (1)$$

where $Z_i$ is the Pauli-Z operator for spin $i$. QAOA can find the ground state of this Hamiltonian by minimizing the expectation value of $\hat{C}$ for the ansatz state $|\vec{\gamma}, \vec{\beta}\rangle$, where $\vec{\gamma} = (\gamma_1, \ldots, \gamma_p)$ and $\vec{\beta} = (\beta_1, \ldots, \beta_p)$ are variational parameters. In particular, the quantum circuit preparing $|\vec{\gamma}, \vec{\beta}\rangle$ consists of p layers, each containing a phase-separation operator $U_C = e^{-i\gamma_q \hat{C}}$ and a mixing operator $U_B = e^{-i\beta_q \hat{B}}$, where $\hat{B} = \sum_i X_i$, with $q = 1, \ldots, p$ [11]. Since all terms of $\hat{C}$ commute, we can implement the unitary $U_C^{ij}$ of each two-qubit term separately, with a rotation angle $\Gamma_{ij}$ proportional to $\gamma_q J_{ij}$.

A common approach is to decompose $U_C^{ij}$ into a gate sequence consisting of two conditional phase rotations of π, i.e. standard CZ-gates, combined with several single-qubit gates [12,21]. We present such a decomposition in Fig. 1(b), where the dependence on the continuous parameters $\Gamma_{ij}$ is introduced via an arbitrary-angle single-qubit Z-rotation. An alternative approach is to use a single controlled arbitrary-phase gate (C-ARB gate), which can add any desired phase factor $e^{-i\phi}$ to the $|11\rangle$ state. This gate naturally applies the angle $2\Gamma_{ij}$ and, together with two single-qubit Z-rotations, realizes the unitary $U_C^{ij}$, see Fig. 1(a).
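As an illustrative check of the direct implementation described above, the following Python sketch verifies the diagonal identity behind it: a ZZ rotation $e^{-i\theta Z \otimes Z}$ equals a global phase times one controlled-phase gate and one phase gate on each qubit. The angle name `theta`, the function names and the convention $Z|1\rangle = -|1\rangle$ are assumptions of this sketch. In this normalization the conditional phase is $4\theta$, which need not coincide with the normalization of $\Gamma_{ij}$ used in the text.

```python
import cmath

def zz_rotation_diagonal(theta):
    """Diagonal of exp(-i*theta*Z(x)Z) in the basis |00>, |01>, |10>, |11>."""
    diag = []
    for b1 in (0, 1):
        for b2 in (0, 1):
            z1, z2 = 1 - 2 * b1, 1 - 2 * b2  # assumed convention: Z|1> = -|1>
            diag.append(cmath.exp(-1j * theta * z1 * z2))
    return diag

def direct_diagonal(theta):
    """Same diagonal built from one controlled-phase gate and two phase gates."""
    diag = []
    for b1 in (0, 1):
        for b2 in (0, 1):
            global_phase = cmath.exp(-1j * theta)
            # single-qubit phase gates diag(1, e^{2i*theta}) on each qubit
            single = cmath.exp(2j * theta * b1) * cmath.exp(2j * theta * b2)
            # conditional phase acts on |11> only
            conditional = cmath.exp(-4j * theta * b1 * b2)
            diag.append(global_phase * single * conditional)
    return diag

def max_deviation(theta):
    """Largest entry-wise difference between the two constructions."""
    a, b = zz_rotation_diagonal(theta), direct_diagonal(theta)
    return max(abs(x - y) for x, y in zip(a, b))
```

Because the single-qubit phases can be absorbed into virtual Z-gates, only the conditional phase costs a physical two-qubit operation in this picture.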
In QAOA, the number of unitaries $U_C^{ij}$ grows linearly with the number of two-qubit terms in $\hat{C}$ and with the number of QAOA layers, p. Thus, it is essential that each $U_C^{ij}$ is implemented with high fidelity. The direct implementation we present in this work significantly reduces both the physical gate count and the sequence duration. Thus, this approach is expected to find correct solutions to complex problems with higher probability.
We run QAOA on a quantum device with 7 superconducting transmon qubits, see Appendix A for device parameters and a false-colored micrograph of the device. The qubits are pairwise connected as illustrated in Fig. 2(a). Single-qubit X- and Y-rotations are implemented with microwave pulses, while Z-rotations are performed as virtual gates [31], which take zero time as they are implemented through a redefinition of the reference frame. To realize CZ-gates, we use a standard approach relying on a flux pulse which shifts the transition frequency of one of the qubits to bring the $|11\rangle$ state of a pair of coupled qubits in resonance with the non-computational $|20\rangle$ state [32-34]. The resulting hybridization leads to a coherent population oscillation between the two states. The frequency detuning between the $|11\rangle$ and $|20\rangle$ states, ∆, is 0 during the gate, and after an interaction of the duration corresponding to one oscillation period, the population returns to the $|11\rangle$ state with an added phase of π, see green diamond in Fig. 1(c) and (d). We generalize the CZ-gate to a C-ARB gate on our device by exploiting near-resonant interactions of the $|11\rangle$ and $|20\rangle$ states [24], i.e. ∆ ≠ 0, to acquire conditional phase angles ranging from 0 to 2π, see Fig. 1(d). We vary ∆ by sweeping the flux pulse amplitude and simultaneously adapting the pulse length to maximize population recovery in the computational subspace, see blue dots in Fig. 1(c). Details about the gate implementation are provided in Appendix B.
We compare the performance of both approaches on two example instances of the NP-complete exact-cover problem [29]. The aim of exact cover is to decide whether it is possible to cover all elements in a set S exactly once by an appropriate selection of subsets $\{V_i\}$ from a given collection of subsets V. In the example visualized in Fig. 2(b), each row corresponds to an element of a three-element set S, while each column corresponds to a subset $V_i$ out of three given subsets. The dots visualize which elements (rows) are included in a subset (column). In this picture, the task is to find a selection of columns such that each row is covered by exactly one dot. This condition is fulfilled by two solutions: selecting the first two columns or selecting the last column. In a mathematical formulation of the exact-cover problem (see Appendix C), the grid in Fig. 2(b) corresponds to a visual representation of an incidence matrix K, where a dot in row $\ell$ and column $i$ indicates an entry $K_{\ell i} = 1$ while empty cells in the grid indicate entries equal to 0. When mapping an instance of exact cover to an Ising Hamiltonian [30,35], the $i$-th qubit encodes whether a subset $V_i$ is selected or not, see Appendix C. In the visualization in Fig. 2(b), the qubit that represents a subset is indicated by the label above the column and by the color used for the dots. Fig. 2(c) shows an example of a larger instance of exact cover with seven subsets, requiring seven qubits.
To focus on the comparison between the two methods for realizing the two-qubit unitaries $U_C^{ij}$, these two problem instances are chosen such that the resulting Ising Hamiltonians respect the hardware connectivity graph of our device, see Fig. 2(a), and that all single-qubit terms vanish, i.e. $h_i = 0$. The three-qubit problem instance depicted in Fig. 2(b) yields an Ising Hamiltonian with $J_{A_1B_2} = 0.5$ and $J_{A_2B_2} = 1$. In the basis $|A_1 A_2 B_2\rangle$, the two possible selections of columns covering all rows, namely $A = \{A_1, A_2\}$ and $B = \{B_2\}$, are encoded with the states $|110\rangle$ and $|001\rangle$, respectively, where a 1 in position $i$ indicates that the $i$-th column of K is included in the selection of subsets. For the seven-qubit problem instance of Fig. 2(c), we have $J_{A_3B_2} = 0$ and $J_{ij} = 0.5$ for all other physically connected qubit pairs. This instance also possesses two solutions, $A = \{A_1, A_2, A_3, A_4\}$ and $B = \{B_1, B_2, B_3\}$, corresponding to the states $|1111000\rangle$ and $|0000111\rangle$, respectively, using the basis $|A_1 A_2 A_3 A_4 B_1 B_2 B_3\rangle$. Note that we have labeled the qubits in Fig. 2(a) such that the solutions always correspond to either selecting the qubits labeled with A or the qubits labeled with B, see Appendix C.
The QAOA circuit solving the three-qubit problem instance consists of 15 (32) operations per layer while the corresponding circuit for the seven-qubit problem instance consists of 42 (98) operations per layer, for the direct (decomposed) implementation, respectively. Thus, the seven-qubit problem instance, which requires deep QAOA circuits, yields a circuit comprising 259 (399) operations in total for the direct (decomposed) implementation at p = 6 (p = 4), see Appendix G for more details.
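The quoted totals can be reproduced with simple arithmetic if one assumes that, in addition to the repeated layers, the circuit contains one further operation per qubit. This offset is only an inference from the numbers above (259 − 6·42 = 399 − 4·98 = 7); the function name and its parameters are hypothetical, and Appendix G remains the reference for the actual breakdown.

```python
def total_operations(ops_per_layer, layers, n_qubits):
    """Operation count of a QAOA circuit under the stated assumption."""
    # Assumption: one extra operation per qubit outside the repeated layers.
    return ops_per_layer * layers + n_qubits

seven_qubit_direct = total_operations(42, 6, 7)       # direct, p = 6
seven_qubit_decomposed = total_operations(98, 4, 7)   # decomposed, p = 4
```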

III. PERFORMANCE OF QAOA
A single-layer QAOA implementation (p = 1) is a useful intermediate benchmark towards implementing multilayer QAOA circuits since there are only two variational parameters $\vec{\gamma} = (\gamma_1)$ and $\vec{\beta} = (\beta_1)$, hereafter referred to as $\gamma$ and $\beta$ for ease of notation, which allows us to map out the full optimization landscape experimentally. As further discussed in Appendix D, when p = 1, the cost-function landscape is $\pi/2$-periodic in $\beta$ for a problem without single-qubit terms. Moreover, since all eigenvalues of $\hat{C}$ are odd multiples of 1/2, the landscape is $2\pi$-periodic in $\gamma$. Finally, the landscape is always point-symmetric around the center point of a period. We can thus reduce our considerations to $\gamma \in [0, \pi[$ and $\beta \in [0, \pi/2[$. For each pair of parameters, we prepare the state $|\gamma, \beta\rangle$ 20000 times, see Appendix G for the full pulse sequence, and we evaluate the cost function $C(\gamma, \beta) = \langle\gamma, \beta|\hat{C}|\gamma, \beta\rangle$. We use a three-level readout scheme discussed in Appendix A, which allows us to discard the measurement outcomes with leakage outside of the computational space, see Appendix H. In the context of QAOA, discarding leakage events corresponds to reducing the effective number of shots available for evaluating the cost function by rejecting outcomes that are not valid bit-strings. In this regard, leakage is different from other undetectable errors, for which such a post-selection cannot be done.
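The periodicity and symmetry statements above can be checked numerically with a small noise-free state-vector simulation of the three-qubit instance. The sketch below is not the authors' simulation code: all names are hypothetical, the qubit order (A1, A2, B2) and the couplings are taken from Section II, and the spin convention z = 1 − 2b is an assumption that does not affect the cost when all $h_i$ vanish.

```python
import cmath
import math

# Couplings quoted in Section II, qubit order (A1, A2, B2); all h_i = 0.
COUPLINGS = {(0, 2): 0.5, (1, 2): 1.0}
N_QUBITS = 3

def ising_cost(bits):
    """Classical Ising cost of a bit tuple, assuming z = 1 - 2*b."""
    z = [1 - 2 * b for b in bits]
    return sum(J * z[i] * z[j] for (i, j), J in COUPLINGS.items())

def bits_of(index):
    """Bit tuple of a basis-state index, qubit 0 as most significant bit."""
    return tuple((index >> (N_QUBITS - 1 - q)) & 1 for q in range(N_QUBITS))

def apply_mixer(state, beta):
    """Apply exp(-i*beta*X) = cos(beta)*I - i*sin(beta)*X to every qubit."""
    c, s = math.cos(beta), -1j * math.sin(beta)
    for q in range(N_QUBITS):
        mask = 1 << (N_QUBITS - 1 - q)
        state = [c * state[k] + s * state[k ^ mask] for k in range(len(state))]
    return state

def qaoa_cost(gammas, betas):
    """Noise-free expectation value of the cost for given layer parameters."""
    dim = 2 ** N_QUBITS
    state = [1 / math.sqrt(dim)] * dim  # initial state |+>^N
    costs = [ising_cost(bits_of(k)) for k in range(dim)]
    for gamma, beta in zip(gammas, betas):
        # phase separation is diagonal in the computational basis
        state = [a * cmath.exp(-1j * gamma * e) for a, e in zip(state, costs)]
        state = apply_mixer(state, beta)
    return sum(abs(a) ** 2 * e for a, e in zip(state, costs))
```

Evaluating `qaoa_cost([g], [b])` on a grid over $\gamma \in [0, \pi[$ and $\beta \in [0, \pi/2[$ gives a noise-free p = 1 landscape for this model, and passing longer parameter lists simulates deeper circuits.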
We observe that the resulting cost-function landscapes, see Fig. 3, are odd functions of $\beta$ with a line symmetry axis at $\beta = \pi/4$, see Appendix D. The locations of all extrema in the measured landscape, see Fig. 3(a), are in good agreement with noise-free simulations, see Fig. 3(b), which suggests that the coherent errors are small in our implementation. Errors due to decoherence mostly affect the contrast of the landscapes [36], see Fig. 3(c) and Appendix E. The distortions of the local extrema located at $\gamma > \pi/2$ are attributed to the residual ZZ-coupling between the qubits [37], which we confirm with master-equation simulations, see Appendix E.
By embedding the evaluation of $C(\gamma, \beta)$ measured on the quantum device into a classical Nelder-Mead optimizer, we demonstrate that the landscape is suitable as a cost function for a classical optimizer. The closed-loop classical optimizer finds the optimal parameters for most random initialization parameters, see Fig. 3(d-f); however, some convergence traces get trapped in local minima. Note that in this single-layer implementation, the cost never reaches the ground-state energy $C_{gs} = -1.5$, neither in the measurement nor in the noise-free simulation, which indicates that QAOA circuits of larger depth are indeed required for this problem.
To obtain better approximate solutions to the combinatorial problem instance, we execute QAOA circuits with additional layers and study the effect of the depth p on the output state distribution. To investigate the performance of the quantum part of QAOA rather than the performance of the classical optimizer, we initialize the algorithm with optimal parameters obtained from noise-free simulations. We then optimize these parameters locally to correct for small coherent errors, and we estimate the resulting state distribution as a function of depth from 20000 single-shot measurements, see Fig. 4 for the three-qubit case. Three layers are required in noise-free simulations (black wire-frames) to fully concentrate the probability distribution on the two solution states $|110\rangle$ and $|001\rangle$ corresponding to the selection of subsets A and B, respectively.
We quantify the experimental outcomes using the classical fidelity [38], $F(P, \tilde{P})$, between the output state probability distributions arising from the measurements, $P$, and from noise-free simulations, $\tilde{P}$, where $P_i$ and $\tilde{P}_i$ correspond to the probabilities of the $i$-th basis state in the Hilbert space. Note that $0 \le F(P, \tilde{P}) \le 1$ with $F(P, \tilde{P}) = 1$ if and only if $P = \tilde{P}$. The state distributions of the implementation using C-ARB gates, see filled bars in Fig. 4(a-c), have fidelities of 98.93 %, 95.93 %, and 86.20 % for p = 1, 2, and 3 respectively, with respect to the corresponding distribution obtained with noise-free simulations. As expected, the reduction of the fidelity with the number of layers p illustrates the accumulation of errors in circuits of increasing depth. However, the concentration of probability on solution states as p increases is stronger than the detrimental effect of the additional errors, such that overall, the probability of measuring a solution increases with p. By contrast, in the implementation using CZ-gates, see Fig. 4(d-f), the concentration of probability on solution states only compensates the additional errors for p = 2 while the errors outweigh the gain of an additional layer for p = 3. This is also reflected by lower fidelities of 96.37 %, 87.43 %, and 64.52 % for p = 1, 2, and 3 respectively, and is explained by the fact that decoherence and residual ZZ-coupling accumulate over the longer gate sequence.
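For readers who want to compute such a fidelity from measured histograms, a minimal sketch is given below. It assumes the common convention $F(P, \tilde{P}) = \big(\sum_i \sqrt{P_i \tilde{P}_i}\big)^2$; some references omit the square, so the convention of Ref. [38] should be checked before comparing numbers. The function name is hypothetical.

```python
import math

def classical_fidelity(p, q):
    """Squared Bhattacharyya overlap of two probability distributions.

    Assumed convention: F = (sum_i sqrt(p_i * q_i))**2.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return overlap ** 2
```

With either convention the fidelity lies between 0 and 1 and equals 1 only for identical distributions, as stated above.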
Master-equation simulations (red wire-frames) are in excellent agreement with the measured distributions, see Appendix E for details. We confirm from these simulations that decoherence is the main limitation in this experiment while residual ZZ-coupling causes additional errors, in particular for the decomposed implementation.
The landscape and state probability distributions of the seven-qubit problem instance presented in Appendix F lead to a similar conclusion, i.e. the direct implementation exploiting C-ARB gates is in better agreement with noise-free simulations than the decomposed implementation.
It is expected that additional layers in QAOA circuits increase the number of reachable states, thereby leading to better approximate solutions in the absence of experimental noise. To gain further insight into the trade-off between the extended reachable state space and the additional noise resulting from increased depth, we determine the enhancement of the success probability provided by the output state distribution over a uniform state distribution as a function of p for both problem instances, see Fig. 5. We define the enhancement as $P_s/P_u$, where $P_s$ is the success probability, i.e. the sum of the probabilities of all solution states, and $P_u = 2/(2^N - 1)$ is the probability of sampling a solution from a uniform probability distribution over all possible states. Note that we exclude the state $|0\rangle^{\otimes N}$, which is never a solution in the context of exact cover. We indicate the sequence duration for both implementations with additional axes in Fig. 5, where sequence duration is defined as the time between the start of the initialization pulse and the start of the readout.

FIG. 4. Output state probability distribution for the three-qubit problem instance implemented with controlled arbitrary phase gates (a,b,c), and decomposed using CZ-gates (d,e,f). States are measured at optimal parameters for depth of p = 1 (a,d), p = 2 (b,e), and p = 3 (c,f). The filled bars correspond to the measured state probabilities in which we highlight the problem solutions in blue (direct implementation) and green (decomposed implementation), respectively. The black wire-frames are the expected QAOA outcome from noise-free simulations and the red wire-frames are from master-equation simulations.
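The enhancement $P_s/P_u$ defined in the text is straightforward to evaluate. The sketch below uses hypothetical function names and takes the number of solutions as a parameter, defaulting to the two solutions of the instances considered here. It also makes the upper bounds explicit: a sampler that always returns a solution would reach an enhancement of 3.5 for three qubits and 63.5 for seven qubits.

```python
def uniform_success_probability(n_qubits, n_solutions=2):
    """Chance of hitting a solution when sampling uniformly,
    excluding the all-zero state as in the text."""
    return n_solutions / (2 ** n_qubits - 1)

def enhancement(success_probability, n_qubits, n_solutions=2):
    """Ratio P_s / P_u of measured to uniform success probability."""
    return success_probability / uniform_success_probability(n_qubits, n_solutions)
```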
In the three-qubit case, Fig. 5(a), the direct (decomposed) implementation shows a maximal enhancement of success probability of 3 (2.4) at p = 3 (p = 2).The direct implementation (blue dots) provides a higher enhancement than the decomposed implementation (green dots) because the sequence duration is shorter for a fixed p, and the problem instance requires at least p = 3 to reach maximal enhancement in an ideal setting (black squares).
When the problem increases in complexity, the number of layers required to reach maximal enhancement in a noise-free scenario also increases.For the seven-qubit instance, we find that p = 6 is required to reach a success probability above 90%, see Fig. 5(b).Consequently, the ability to execute more layers in shorter time provides an even more pronounced advantage.In particular, for the direct implementation, we find that increasing p from 1 to 2 increases the enhancement of the success probability to 9.5.However, when further increasing to p = 3, the extended reachable state space does not compensate the additional noise arising from the increased sequence duration.For the decomposed version, going beyond p = 1 does not provide any benefits.Thus, for the seven-qubit problem we only benefit from adding layers when taking advantage of the directly implemented C-ARB gates, which improves the performance by a factor of 3 compared to the decomposed implementation.
Finally, to emphasize that the limitations for deeper circuits are directly related to the increased sequence duration rather than to the depth itself, we note that for a fixed sequence duration of L = 5 µs, both implementations of the seven-qubit instance show a similar enhancement of success probability despite being of depth p = 6 (direct) and p = 2 (decomposed).

IV. DISCUSSION
In this work, we show that controlled arbitrary phase gates (C-ARB gates) enable a significant reduction of the number of physical gates required to implement QAOA circuits of any depth on quantum hardware. We demonstrate the advantage of this approach by comparing it to a standard QAOA decomposition on two problem instances of the exact-cover problem, with three and seven qubits, respectively. Despite a more demanding calibration scheme requiring interpolation of gate parameters, C-ARB gates in QAOA circuits systematically outperform the decomposed alternative for a fixed depth and are able to benefit from the extended reachable state space of more layers.
We foresee an even more pronounced advantage for larger-scale combinatorial optimization problems because the number of layers required to solve problems with QAOA is expected to scale with the number of qubits involved in the experiment [39,40], in particular for dense problem graphs. In addition, the number of physical two-qubit gates saved within each layer also scales with the number of two-qubit terms in the cost Hamiltonian. Our results demonstrate that hardware-efficient gate sets are key components in extending the impact of near-term quantum applications, which may become even more relevant when solving problem instances that do not match the connectivity of the hardware. For example, it has recently been observed that the need for swap-gates can significantly reduce the performance of a QAOA implementation if a decomposed implementation of swap gates is used [13]. A direct, hardware-efficient implementation combining a controlled arbitrary phase and a swap-gate may therefore be another key component to improve the performance in these cases, and should be considered in future research.
the National Centre of Competence in Research Quantum Science and Technology (NCCR QSIT), a research instrument of the Swiss National Science Foundation (SNSF), by the SNSF R'equip grant 206021-170731 and by ETH Zurich. This work was undertaken thanks in part to funding from NSERC, Canada First Research Excellence Fund and ARO W911NF-18-1-0411. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government.

Appendix A: Experimental Setup and Device Parameters
The experiments described in this manuscript are performed in a cryogenic setup [41,42], the wiring scheme of which is summarized in Fig. 6. Each qubit is controlled by a flux line for frequency tuning, which enables two-qubit gates, and a microwave drive line for realizing single-qubit gates. The pulses are generated with arbitrary waveform generators (AWGs). The drive pulses are generated at an intermediate frequency of 100 MHz and upconverted to microwave frequencies. Multiplexed readout is performed via two feedlines [42,43] with the readout pulses generated by an ultra-high frequency quantum analyzer (UHFQA).
The measurement signals at the output ports of the sample are first amplified with a wide-bandwidth near-quantum-limited traveling-wave parametric amplifier (TWPA) [44], then with a high-electron-mobility transistor (HEMT) amplifier and finally with low-noise, room-temperature amplifiers (WAMP). Thereafter, the signals are downconverted and processed using the weighted integration units of the UHFQAs. The quantum device [42], shown in Fig. 7, is fabricated on a high-resistivity intrinsic silicon substrate. Photolithography and reactive ion etching are used to define resonators, signal lines and qubit structures in a 150 nm thin niobium film sputtered onto the substrate. We also add air bridges to the device to establish a well-defined ground plane and for cross-overs in signal lines. The Al/AlOx/Al Josephson junctions of the transmon qubits are fabricated using electron-beam lithography and shadow evaporation.
The parameters of the device, listed in Table I, are measured using standard spectroscopy and time-domain methods. For the readout, we characterize the ability to identify the correct qubit state as well as the second excited state of each transmon qubit. In particular, we use two weighted-integration units per qubit to distinguish $|0\rangle$ from $|1\rangle$ and $|1\rangle$ from $|2\rangle$, respectively. A standard Gaussian mixture model is then used to classify the resulting integrated weights of each single-shot measurement.

FIG. 8. Single-qubit error per gate (vertices) and two-qubit error per gate (edges) in percent, measured with randomized benchmarking. We indicate the residual ZZ-coupling between each pair of coupled qubits below the corresponding two-qubit gate infidelity. The gate between A3 and B2 is not needed for the problem instances considered in this work (dashed line).
To characterize the gate performance, we perform randomized benchmarking on all qubits to find the error per single-qubit Clifford, see Fig. 8. For the two-qubit gates, we only characterize the gate at a fixed conditional phase of π, such that the two-qubit gate is in the Clifford group, and we measure the error per gate from interleaved randomized benchmarking, see infidelities next to lines indicating the coupling elements in Fig. 8.
We also characterize the residual ZZ-coupling, $\alpha_{ij}$, between pairs of coupled qubits, as the frequency shift of qubit $i$ when qubit $j$ is in the excited state. We verify that $\alpha_{ij} = \alpha_{ji}$. The experimentally determined residual ZZ-couplings are shown below each error rate in Fig. 8.

Appendix B: Controlled Arbitrary Phase Gate
The goal of a C-ARB gate is to apply a unitary in a two-qubit subspace which adds a desired phase $\phi$ to the $|11\rangle$ state while leaving the other computational basis states unchanged. We realize this unitary by exploiting near-resonant interactions of the $|11\rangle$ and $|20\rangle$ states. In particular, to span a conditional phase in the range $[0, 2\pi[$, we vary the frequency detuning ∆ between the $|11\rangle$ and $|20\rangle$ states with a flux pulse of amplitude $a$ and length $l$. During the flux pulse, the population of the $|11\rangle$ state is transferred to the $|20\rangle$ state and after a time $\tau_g$, the population returns to the $|11\rangle$ state with a phase determined by ∆ and by $J$, the constant coupling strength between the $|11\rangle$ and $|20\rangle$ states.
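The mechanism described above can be illustrated with a textbook two-level model of the $|11\rangle$ and $|20\rangle$ states. The sketch below assumes the Hamiltonian $H = [[0, J], [J, \Delta]]$ with $\hbar = 1$, where `coupling` is the matrix element between the two states; this normalization, the sign convention of the phase and all function names are assumptions of the sketch and may differ from the conventions of the experiment. Single-qubit dynamic phases are not included.

```python
import cmath
import math

def return_amplitude(delta, coupling, duration):
    """Amplitude <11|U|11> after evolving under H = [[0, J], [J, delta]]."""
    omega = math.sqrt(4 * coupling ** 2 + delta ** 2)  # oscillation frequency
    half = omega * duration / 2
    envelope = math.cos(half) + 1j * (delta / omega) * math.sin(half)
    return cmath.exp(-1j * delta * duration / 2) * envelope

def gate_point(delta, coupling):
    """Gate duration, recovered |11> population and phase in [0, 2*pi)."""
    omega = math.sqrt(4 * coupling ** 2 + delta ** 2)
    duration = 2 * math.pi / omega  # one full oscillation period
    amp = return_amplitude(delta, coupling, duration)
    phase = cmath.phase(amp) % (2 * math.pi)
    return duration, abs(amp) ** 2, phase
```

In this model the population returns fully after one oscillation period for any detuning, the phase equals π on resonance, and it sweeps the open interval between 0 and 2π as the detuning is varied from large positive to large negative values, while the required duration shortens away from resonance.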
To calibrate the flux pulse amplitudes and lengths, we start by measuring the population in the $|11\rangle$ state as a function of pulse amplitude and length, yielding a characteristic Chevron pattern (Fig. 1(c)). For each measured amplitude, we fit the $|11\rangle$ population to a cosine to extract the pulse length maximizing the population recovery in the computational subspace. Next, we measure the conditional phase for 45 flux pulse amplitudes. For each amplitude, the control qubit is brought to the excited state with a π-pulse while the target qubit is brought to a superposition state with a π/2-pulse. Then, a flux pulse is applied to the control qubit and finally a π-pulse and a π/2-pulse are applied to the control and the target qubit, respectively. We extract the conditional phase by varying the phase of the second π/2-pulse and comparing the phase of the target qubit with and without the initial π-pulse applied to the control qubit. We use the calibrated pulse lengths to ensure high population recovery in the computational subspace for each amplitude. We interpolate the pulse length linearly in between calibration points if required to reach all phases in the range 0 to 2π. Note that on our device, we acquire phase from 0 to −2π but reverse the sign in Fig. 1(d) for convenience.

Finally, the flux-biased qubit also acquires a dynamic phase $\phi_D$ [33]. We compensate for this single-qubit phase shift with a virtual Z-gate after the C-ARB gate. To calibrate $\phi_D$, we compare the phase of the qubit with and without the flux pulse, for amplitudes in the range of interest. The number of calibration points is gate-dependent and is set such that we can unwrap the dynamic phase as a function of flux pulse amplitude without ambiguity. This allows us to interpolate (with cubic splines) the dynamic phase between calibration points. Gates between specific pairs of qubits also include additional flux pulses on neighboring qubits to avoid undesired interactions, see Appendix G. We calibrated the dynamic phase acquired by these neighboring qubits simultaneously to the dynamic phase of the qubit directly involved in the gate.
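The two numerical steps mentioned in this calibration, phase unwrapping and interpolation between calibration points, can be sketched as follows. This is an illustration with hypothetical function names, not the calibration software used in the experiment. To stay within the Python standard library it uses piecewise-linear interpolation throughout, whereas the text uses cubic splines for the dynamic phase.

```python
import math

def unwrap(phases):
    """Remove 2*pi jumps between consecutive phase samples (radians).

    Unambiguous only if neighboring samples differ by less than pi,
    which is why the calibration grid must be dense enough.
    """
    unwrapped = [phases[0]]
    for phase in phases[1:]:
        step = phase - unwrapped[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped

def interpolate_linear(xs, ys, x):
    """Piecewise-linear interpolation on strictly increasing points xs."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the calibrated range")
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            slope = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])
            return ys[k] + slope * (x - xs[k])
```

In such a scheme `xs` would hold the calibrated flux pulse amplitudes and `ys` either the fitted pulse lengths or the unwrapped phases.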
The calibration procedures are automated such that human interaction is only required to verify the quality of the fits. The approach is thus scalable to larger devices. Note that gate architectures allowing the acquisition of conditional phase as a linear function of the flux pulse length could further simplify and speed up the calibration procedure [45].

The exact-cover problem is mathematically formulated as follows [29]. Given a collection of subsets $V = \{V_i\}_{i \in 1,\ldots,n}$ with $V_i \subseteq S$, the task is to verify whether there exists a set of indices $I \subseteq \{1, \ldots, n\}$ such that $\{V_i\}_{i \in I}$ forms a partition of $S$, i.e., the sets in $\{V_i\}_{i \in I}$ are disjoint and their union equals $S$. This is the case if

$$0 = \min_{(b_1,\ldots,b_n) \in \{0,1\}^n} \sum_{\ell} \Big( \sum_{i=1}^{n} K_{\ell i} b_i - 1 \Big)^2, \qquad (C1)$$

where the element $K_{\ell i}$ of the incidence matrix $K$ is 1 if the $\ell$-th element of $S$ is contained in subset $V_i$ and 0 otherwise. A bit value $b_i = 1$ indicates that the subset $V_i$ is selected. Using spins $z_i \in \{\pm 1\}$ instead of bits $b_i = \frac{z_i + 1}{2}$, multiplying out, and dropping additive constants, the optimization problem can be formulated as [30,35,46]

$$\min_{(z_1,\ldots,z_n) \in \{\pm 1\}^n} \Big( \sum_{i<j} J_{ij} z_i z_j + \sum_i h_i z_i \Big), \qquad (C2)$$

where

$$J_{ij} = \frac{1}{2} \sum_{\ell} K_{\ell i} K_{\ell j}, \qquad (C3)$$

$$h_i = \frac{1}{2} \sum_{\ell} K_{\ell i} \Big( \sum_{j} K_{\ell j} - 2 \Big). \qquad (C4)$$

Solving this optimization problem is equivalent to finding the ground-state energy of the Ising Hamiltonian, see Eq. (1), with $J_{ij}$ and $h_i$ values given by Eq. (C3) and Eq. (C4), respectively. In the visual representations of the incidence matrices depicted in Fig. 2(b) and (c), the bullets represent the entries with $K_{\ell i} = 1$ while empty cells correspond to $K_{\ell i} = 0$. By substituting these values of $K_{\ell i}$ into the above equations, we see that $h_i = 0$ for all $i$ in both problem instances, and we obtain the values of $J_{ij}$ given in Section II.
To run QAOA without requiring swaps, we need all-to-all physical connectivity between qubits that occur jointly in any row of the incidence matrix K. For the physical connectivity graph shown in Fig. 2(a), this means that each row can contain only up to two nonzero entries. The positive sign in Eq. (C3) reveals that the spins corresponding to a row with two nonzero entries have an antiferromagnetic coupling. This is in line with the exact-cover constraint, which requires that exactly one of them is selected in a valid solution, but not both. Thus, any problem instance that does not require swaps on our device and that does not decompose into a set of isolated subgraphs must correspond to a lattice of antiferromagnetically coupled spins. Then, either all qubits labeled with A or all qubits labeled with B have to be in an excited state in a valid solution. In the presence of a row (or rows) with a single nonzero entry, some external field term(s) $h_i$ of the Ising Hamiltonian become(s) nonzero and the solution that fulfills the exact-cover condition also for this row (these rows) is favored. Otherwise, both solutions are valid, which is the case for the problem instances considered in our experiments.
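The mapping from an incidence matrix to Ising parameters can be sketched in a few lines of Python. The sketch assumes the expressions $J_{ij} = \frac{1}{2}\sum_\ell K_{\ell i}K_{\ell j}$ and $h_i = \frac{1}{2}\sum_\ell K_{\ell i}\big(\sum_j K_{\ell j} - 2\big)$, which follow from multiplying out the squared constraint with $b_i = (z_i + 1)/2$. The matrix `K` below is a hypothetical incidence matrix, chosen so that it reproduces the couplings and solutions quoted in Section II for the three-qubit instance; it is not read off Fig. 2(b). All function names are illustrative.

```python
from itertools import product

# Hypothetical incidence matrix: rows are elements of S,
# columns are the subsets in the order (A1, A2, B2).
K = [
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
]

def ising_parameters(incidence):
    """Couplings J_ij (for i < j) and fields h_i from an incidence matrix."""
    n = len(incidence[0])
    couplings = {}
    fields = [0.0] * n
    for row in incidence:
        row_sum = sum(row)
        for i in range(n):
            fields[i] += 0.5 * row[i] * (row_sum - 2)
            for j in range(i + 1, n):
                if row[i] and row[j]:
                    couplings[(i, j)] = couplings.get((i, j), 0.0) + 0.5
    return couplings, fields

def exact_covers(incidence):
    """Brute-force list of selections covering every element exactly once."""
    n = len(incidence[0])
    return [bits for bits in product((0, 1), repeat=n)
            if all(sum(k * b for k, b in zip(row, bits)) == 1
                   for row in incidence)]
```

Since every row of this `K` has exactly two nonzero entries, all fields vanish, in line with the statement above that both antiferromagnetic configurations are valid solutions.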
Appendix D: Properties of QAOA Landscapes

Following Ref. [11], the parameter $\gamma_q$ can be restricted to $[0, 2\pi[$ if the problem Hamiltonian $\hat C$ has integer eigenvalues, while $\beta_q$ can always be restricted to $[0, \pi[$. In this appendix, we discuss further periodicity and symmetry properties of the QAOA cost function, which enable us to reduce the parameter space and better understand the cost-function landscapes we observe. To this end, we consider the cost of a $p$-layer QAOA circuit,

$$C(\vec\gamma, \vec\beta) = \langle +|^{\otimes n}\, U^\dagger(\vec\gamma, \vec\beta)\, \hat C\, U(\vec\gamma, \vec\beta)\, |+\rangle^{\otimes n}, \tag{D1}$$

in which $U(\vec\gamma, \vec\beta) = \prod_{q=1}^{p} e^{-i\beta_q \hat B} e^{-i\gamma_q \hat C}$ is the $p$-layer QAOA unitary with $\vec\gamma = (\gamma_1, \dots, \gamma_p)$ and $\vec\beta = (\beta_1, \dots, \beta_p)$.
If the eigenvalues of $\hat C$ are integer multiples of $\alpha$, then by setting $\gamma'_q = \gamma_q + 2\pi/\alpha$ in Eq. (D1) and noting that $e^{\pm i (2\pi/\alpha) \hat C} = I$ is the identity, we find that $C(\vec\gamma, \vec\beta)$ is $(2\pi/\alpha)$-periodic in $\gamma_q$.
In addition, if all eigenvalues of $\hat C$ are odd multiples of $\alpha$, we have $e^{\pm i (\pi/\alpha) \hat C} = -I$, where the minus sign is a global phase, so that the cost is $(\pi/\alpha)$-periodic. For both problem instances considered in this work, the eigenvalues of $\hat C$ are odd multiples of $1/2$, so that the landscapes are $2\pi$-periodic in $\gamma_q$.
Inserting $\beta'_q = \beta_q + \pi$ into Eq. (D1) and noting that $e^{\pm i\pi \hat B} = \prod_i e^{\pm i\pi X_i} = \prod_i (-I)$ yields the $\pi$-periodicity in $\beta_q$ mentioned in [11]. Moreover, since $e^{-i(\pi/2) \hat B} = \prod_i e^{-i(\pi/2) X_i}$ corresponds to an $X_\pi$ rotation of all qubits, setting $\beta'_p = \beta_p + \pi/2$ in the last layer $p$ corresponds to flipping the sign of all spins before estimating the energy of $\hat C$. If the Ising Hamiltonian $\hat C$ does not contain single-qubit terms ($h_i = 0$ for all $i$), this sign flip does not change the energy, and the cost landscape is $(\pi/2)$-periodic in $\beta_p$. As this applies to the examples considered in this paper, we measure the landscapes for $p = 1$ only up to $\beta = \beta_1 = \pi/2$.

By simultaneously setting $\gamma'_q = -\gamma_q$ and $\beta'_q = -\beta_q$ in Eq. (D1), and noting that $\hat C$, $\hat B$, and $|+\rangle$ are real-valued, we have $C(-\vec\gamma, -\vec\beta) = (C(\vec\gamma, \vec\beta))^\dagger = C(\vec\gamma, \vec\beta)$. Therefore, the cost landscape is point-symmetric with respect to the origin, which implies that it is also point-symmetric with respect to the center point of a period. When measuring a landscape, we can thus restrict either $\beta$ or $\gamma$ to half a period without losing information about the landscape.
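In the examples shown in this paper, we restrict $\gamma$ to half a period, i.e., to the interval $[0, \pi[$.

Finally, when choosing $\gamma'_q = -\gamma_q$ and $\beta'_q = \beta_q$ in Eq. (D1), we obtain

$$-C(-\vec\gamma, \vec\beta) = \langle +|^{\otimes n}\, U'^\dagger(\vec\gamma, \vec\beta)\, (-\hat C)\, U'(\vec\gamma, \vec\beta)\, |+\rangle^{\otimes n}, \tag{D2}$$

where $U'(\vec\gamma, \vec\beta) = \prod_q e^{-i\beta_q \hat B} e^{-i\gamma_q (-\hat C)}$. This is equivalent to the QAOA cost function for a problem Hamiltonian $\hat C' = -\hat C$. Thus, in cases for which running QAOA with $\hat C$ and with $-\hat C$ leads to the same landscape, the landscape is an odd function of $\gamma$. In particular, this occurs for both problem instances considered in this work.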
Due to the point symmetry observed above, the landscape is also an odd function of $\beta$ if it is an odd function of $\gamma$. In the landscape plots for $p = 1$, this manifests as line symmetries (with a change of the sign of the energy) about both coordinate axes and with respect to the center line of each period. Within the chosen range of $\beta$, we observe this type of symmetry with respect to the horizontal line $\beta = \pi/4$.
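The periodicity and symmetry properties discussed in this appendix can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: it uses a hypothetical three-spin chain with couplings $J_{01} = 1/2$ and $J_{12} = 1$ and no fields (chosen only so that all eigenvalues of $\hat C$ are odd multiples of $1/2$; it is not claimed to be the instance used in the experiment), and a plain state-vector evaluation of the $p$-layer cost. All function and variable names are made up for this example.

```python
import numpy as np

def ising_diagonal(J, n):
    """Diagonal of C = sum_{i<j} J_ij Z_i Z_j in the computational basis."""
    diag = np.zeros(2 ** n)
    for idx in range(2 ** n):
        z = [1 - 2 * ((idx >> k) & 1) for k in range(n)]   # bit 0 -> +1, bit 1 -> -1
        diag[idx] = sum(Jij * z[i] * z[j] for (i, j), Jij in J.items())
    return diag

def mixer(beta, n):
    """exp(-i beta sum_i X_i), built as a Kronecker product of identical rotations."""
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])
    U = np.array([[1.0 + 0j]])
    for _ in range(n):
        U = np.kron(U, rx)
    return U

def cost(gammas, betas, diag, n):
    """Expectation value of C after p QAOA layers applied to |+>^n."""
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    for g, b in zip(gammas, betas):
        psi = mixer(b, n) @ (np.exp(-1j * g * diag) * psi)  # phase separation, then mixing
    return float(np.real(np.vdot(psi, diag * psi)))

# Hypothetical bipartite instance without fields; eigenvalues are +-1/2 and +-3/2.
N_DEMO = 3
diag_demo = ising_diagonal({(0, 1): 0.5, (1, 2): 1.0}, N_DEMO)

g, b = 0.37, 0.21                      # arbitrary test point for p = 1
c0 = cost([g], [b], diag_demo, N_DEMO)
checks = {
    "2pi-periodic in gamma": np.isclose(cost([g + 2 * np.pi], [b], diag_demo, N_DEMO), c0),
    "pi/2-periodic in beta": np.isclose(cost([g], [b + np.pi / 2], diag_demo, N_DEMO), c0),
    "point symmetry": np.isclose(cost([-g], [-b], diag_demo, N_DEMO), c0),
    "odd in gamma": np.isclose(cost([-g], [b], diag_demo, N_DEMO), -c0),
}
```

The oddness in $\gamma$ holds here because the chain is bipartite: flipping the spins of one sublattice maps $\hat C$ to $-\hat C$ while leaving the mixer and the initial state unchanged.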

Appendix E: Master-Equation Simulations
We model the dynamics of our system by a master equation given by

$$\dot\rho = -\frac{i}{\hbar} [H(t), \rho] + \sum_k \Big( \hat c_k \rho\, \hat c_k^\dagger - \frac{1}{2} \big\{ \hat c_k^\dagger \hat c_k, \rho \big\} \Big), \tag{E1}$$

where $\rho$ is the density matrix describing the system at time $t$ and $H(t)$ is the Hamiltonian, the time dependence of which models the applied gate sequence. The collapse operators $\hat c_k$ model incoherent processes. We solve the master equation numerically [47] in the rotating frame of the qubits. Incoherent errors are described by the Lindblad terms in Eq. (E1), with collapse operators set by $T_{1,i}$ and $T_{2,i}$, the lifetime and decoherence time (Ramsey decay time) of qubit $i$ as listed in Table I.
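To illustrate how Lindblad terms of this kind translate $T_1$ and $T_2$ into decay of populations and coherences, the following is a minimal single-qubit sketch. It is not the simulation code used for the paper. It assumes $\hbar = 1$, an idle qubit in its rotating frame ($H = 0$), and a commonly used choice of collapse operators, $\sqrt{1/T_1}\,\sigma^-$ and $\sqrt{\Gamma_\varphi/2}\,\sigma_z$ with $\Gamma_\varphi = 1/T_2 - 1/(2T_1)$, which may differ from the operators used by the authors. The coherence times are hypothetical and are not taken from Table I.

```python
import numpy as np

def lindblad_rhs(rho, H, c_ops):
    """Right-hand side of a Lindblad master equation (hbar = 1 assumed)."""
    drho = -1j * (H @ rho - rho @ H)
    for c in c_ops:
        cd = c.conj().T
        drho += c @ rho @ cd - 0.5 * (cd @ c @ rho + rho @ cd @ c)
    return drho

def evolve(rho0, H, c_ops, t_final, n_steps=2000):
    """Fixed-step fourth-order Runge-Kutta integration for a time-independent H."""
    rho = rho0.astype(complex)
    dt = t_final / n_steps
    for _ in range(n_steps):
        k1 = lindblad_rhs(rho, H, c_ops)
        k2 = lindblad_rhs(rho + 0.5 * dt * k1, H, c_ops)
        k3 = lindblad_rhs(rho + 0.5 * dt * k2, H, c_ops)
        k4 = lindblad_rhs(rho + dt * k3, H, c_ops)
        rho = rho + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho

def qubit_collapse_ops(T1, T2):
    """Assumed collapse operators: energy relaxation and pure dephasing."""
    sm = np.array([[0, 1], [0, 0]], dtype=complex)   # lowers |1> (excited) to |0>
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    gamma_phi = 1 / T2 - 1 / (2 * T1)                # pure-dephasing rate
    return [np.sqrt(1 / T1) * sm, np.sqrt(gamma_phi / 2) * sz]

# Hypothetical coherence times and idle duration, in microseconds.
T1_DEMO, T2_DEMO, T_IDLE = 20.0, 15.0, 10.0
H_idle = np.zeros((2, 2), dtype=complex)
c_ops_demo = qubit_collapse_ops(T1_DEMO, T2_DEMO)

rho_excited = evolve(np.diag([0.0, 1.0]), H_idle, c_ops_demo, T_IDLE)
rho_plus = evolve(np.full((2, 2), 0.5), H_idle, c_ops_demo, T_IDLE)
```

With these assumptions, the excited-state population decays as $e^{-t/T_1}$ and the off-diagonal element of an initial $|+\rangle$ state decays as $e^{-t/T_2}$, which is a useful sanity check before adding gate Hamiltonians or coupling terms.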
In addition to the incoherent errors introduced by the Lindblad terms, it is important to also consider the impact of coherent errors on the algorithm. In our experiment, the main source of coherent errors is residual ZZ-coupling between neighboring qubits [37]. To model this coupling in the numerical simulations, we include a Hamiltonian term summed over connected pairs of qubits, with the residual ZZ-couplings listed in Appendix A. We notice from simulations of the full QAOA circuit that the residual ZZ-couplings give rise to the distortions observed in the cost landscapes, see Fig. 9. The main effect of decoherence is to reduce the overall contrast of the landscape. In particular, for the direct implementation we find a minimum [...]

We use a single-layer QAOA circuit with C-ARB gates to measure the cost-function landscape of the seven-qubit problem instance, see Appendix G for the full pulse sequence. The measured landscape, see Fig. 10(a), is in good qualitative agreement with noise-free simulations, see Fig. 10(b). Due to decoherence, the absolute values of the global extrema are smaller than in noise-free simulations, see Fig. 10(c). Starting from random initialization, the convergence traces of the separating angle, the mixing angle, and the corresponding cost are displayed in Fig. 10(d), (e), and (f), respectively.
The output state distributions at optimal parameters for the direct and decomposed implementations of C-ARB gates are shown in Fig. 11(a) and (b), respectively. We display the distributions yielding the highest success probability for each implementation, i.e., $p = 2$ for the direct implementation and $p = 1$ for the decomposed implementation. For the direct implementation, the two most likely measured states are $|1111000\rangle$ and $|0000111\rangle$, corresponding to the respective selections of subsets A and B forming exact covers of the considered problem instance. Conversely, the solution states are not the two most likely measured states for the decomposed implementation. For both implementations, the measured data matches well with expectations from master-equation simulations (red wire-frame).
Appendix G: QAOA gate sequences

Fig. 12(a) shows the pulse sequence generated by the AWGs for a single layer of QAOA in the seven-qubit instance using the direct implementation of C-ARB gates. Since the length of the flux pulses depends on the required phase, see Fig. 1(c) and (d), the pulse sequence is shown for a representative flux pulse length (close to the average) that we obtain for $\gamma = \pi/5$. After an initial $\pi/2$-pulse on each qubit to prepare a $|+\rangle^{\otimes 7}$ state, the phase-separation operator $U_C$ of the first QAOA layer starts with two parallel C-ARB gates corresponding to the couplings $J_{A_3B_1}$ and $J_{A_2B_2}$, while qubit $B_3$ is detuned by an additional flux pulse to avoid an unwanted interaction when the ef-transition frequency of $A_2$ crosses the parking frequency of qubit $B_3$. Since the additional $Z_{\Gamma_{ij}}$ rotations, see Fig. 1(a), are implemented as virtual gates [31] through a redefinition of the reference frame, they are not shown in the pulse sequence. After the last round of flux pulses, the final two $\pi/2$-pulses for each qubit (plus a virtual gate between them) implement the mixing operator $U_B = e^{-i\beta \hat B}$, where we have decomposed each term $e^{-i\beta X_i}$ as shown in Fig. 1(a). After the end of the shown pulse sequence, we perform qubit readout.
In the significantly longer pulse sequence shown in Fig. 12(b), the controlled arbitrary-phase gates are decomposed as described in Fig. 1(b). Each Hadamard gate is implemented by a $\pi/2$-pulse and a $Z_\pi$ rotation via a virtual gate, and the $Z_{\Gamma_{ij}}$ in the center of the gate decomposition is another virtual gate. Pulse sequences for the direct and the decomposed implementation of the three-qubit problem instance are shown in Fig. 12(c) and (d), where analogous explanations apply.
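The equivalence between the two implementations of the two-qubit phase-separation step can be verified with small matrices. The sketch below is an illustration under assumed sign conventions (the target is taken as $e^{-i\gamma J Z\otimes Z}$, and $\Gamma = 2\gamma J$); it compares (i) one controlled arbitrary-phase gate acting on $|11\rangle$ together with single-qubit $Z$ rotations and (ii) a sequence of two CZ gates, Hadamard gates, and a central $Z$ rotation. The helper names are hypothetical, and the signs may differ from those realized on the device.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CZ = np.diag([1, 1, 1, -1]).astype(complex)

def phase(phi):
    """Single-qubit Z rotation up to a global phase: diag(1, e^{i phi})."""
    return np.diag([1, np.exp(1j * phi)])

def target(gamma, J):
    """exp(-i gamma J Z (x) Z) in the basis |00>, |01>, |10>, |11>."""
    zz = np.array([1, -1, -1, 1])
    return np.diag(np.exp(-1j * gamma * J * zz))

def direct(gamma, J):
    """One controlled arbitrary-phase gate plus single-qubit Z rotations."""
    Gamma = 2 * gamma * J
    c_arb = np.diag([1, 1, 1, np.exp(-2j * Gamma)])   # phase of magnitude 2*Gamma on |11>
    return c_arb @ np.kron(phase(Gamma), phase(Gamma))

def decomposed(gamma, J):
    """Two CZ gates, Hadamards on the second qubit, and a central Z rotation."""
    Gamma = 2 * gamma * J
    IH = np.kron(I2, HAD)
    cnot = IH @ CZ @ IH                                # CNOT built from CZ and Hadamards
    return cnot @ np.kron(I2, phase(Gamma)) @ cnot

def equal_up_to_phase(A, B):
    """True if the unitaries A and B agree up to a global phase."""
    return bool(np.isclose(abs(np.trace(A.conj().T @ B)) / A.shape[0], 1.0))
```

For example, `equal_up_to_phase(direct(0.4, 0.5), target(0.4, 0.5))` and `equal_up_to_phase(decomposed(0.4, 0.5), target(0.4, 0.5))` can be used to confirm that both constructions realize the same unitary, while the direct version uses a single two-qubit gate instead of two.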
To implement additional layers, the pulses between the end of the initialization pulses and the start of the readout are repeated $p - 1$ times. For the configurations considered in Fig. 5, this leads to the gate counts shown in Table II. For both QAOA implementations, we discard all measured states containing at least one leakage event. We show the percentage of single-shot measurements we keep as a function of the number of layers in the top half of Table III.

FIG. 1. (a) Quantum circuit of a layer $q$ of QAOA for the two-qubit subspace $|Q_iQ_j\rangle$, using the controlled arbitrary-phase gate (blue) to rotate the $|11\rangle$ state by an angle $2\Gamma_{ij}$, where $\Gamma_{ij} = 2\gamma_q J_{ij}$. (b) A QAOA layer with the phase-separation unitary $U_C^{ij}$ decomposed into CZ gates (green) and additional Hadamard gates and single-qubit Z gates. (c) Excited-state population $P_e$ of the control qubit $Q_i = A_1$ brought into interaction with the target qubit $Q_j = B_2$ via a flux pulse. We perform a two-dimensional sweep of flux pulse amplitude $a$ and flux pulse length $l$, and indicate the maximum population recovery with blue dots. (d) Conditional phase for the dots indicated in (c). The green diamond corresponds to the CZ gate. The right axis indicates the detuning between $|11\rangle$ and $|20\rangle$. The inset depicts the pulse sequence used to measure the conditional phase. Single-qubit $\pi$-pulses and $\pi/2$-pulses are shown in dark blue and purple, respectively. The flux pulse (light blue) of amplitude $a$ and length $l$ is applied to the control qubit $Q_i$, see Appendix B for more details.
FIG. 2. (a) Hardware connectivity graph of the quantum device. Dots correspond to qubits and edges indicate between which pairs of qubits two-qubit gates can be realized. The grey dashed line indicates the subset of qubits used for the three-qubit problem instance depicted in (b). (b) Visual representation of the incidence matrix $K$ (dots indicating entries $K_{\ell i} = 1$) for a chosen three-qubit exact-cover problem instance. The labels above the columns (and the colors) indicate which physical qubits are used to represent the corresponding subset. The two solution states are indicated below the grid. (c) Visual representation of the incidence matrix for a chosen seven-qubit problem instance.

FIG. 3. Cost function evaluated for p = 1 on the three-qubit problem instance, using C-ARB gates.(a) Cost-function landscape as a function of variational parameters measured with direct implementation of C-ARB gates.(b) Cost-function landscape obtained from noise-free simulations.(c) Experimental evaluation (blue) and simulation (black) of the cost function for two horizontal line cuts of (a) and (b), with β = π/8 (dotted lines) and β = 3π/8 (dashed lines) respectively.(d,e) 10 convergence traces of the separation angle and the mixing angle, respectively, for end-to-end optimization starting from random parameter initialization.(f) Average energy (solid blue line) and individual convergence traces (faded lines) of the energy corresponding to parameters shown in (d,e).

FIG. 5. Performance of QAOA for (a) three qubits and (b) seven qubits. The blue points are implemented with the direct controlled arbitrary-phase gate and the green points are implemented with the gate sequences decomposed into CZ gates. The black squares indicate the highest success probabilities found with noise-free simulations. The top axes indicate the sequence duration for the direct implementation and the decomposed implementation in blue and green, respectively. The gray areas indicate success probabilities above 1.

FIG. 6. Experimental setup. The experiment is controlled by AWGs whose signals are routed to the quantum device through a series of bandpass filters (BP), lowpass filters (LP), and Eccosorb filters. The flux pulses are combined with a voltage source using a bias-T. The IQ signal from the control AWG is upconverted (UC) to a microwave signal using an IQ mixer. The readout signal is generated by the UHFQA, and the output from the quantum device is amplified by a chain of amplifiers before being downconverted (DC) and analyzed by the UHFQA.

FIG. 7. Optical micrograph of the device. Each qubit is colored as in Fig. 2 and connected to neighboring qubits by coupling resonators (white). Each qubit is connected to a readout resonator (red), a flux line (green), and a drive line (pink).
Appendix C: Exact Cover to Ising

FIG. 9. Simulated and experimental cost-function landscapes of the three-qubit problem instance for $p = 1$. The direct implementation is displayed in (a), (b), and (c). The decomposed version is shown in (d), (e), and (f). The master-equation simulations are performed including errors from residual ZZ-coupling only (a, d), and with both residual ZZ-coupling and decoherence (b, e). The experimental data is shown in (c) and (f).
Appendix F: Seven-qubit problem instance

FIG. 10 .

FIG. 11. Output state probability distribution for the seven-qubit problem instance implemented with C-ARB gates (a) and decomposed using CZ gates (b). States are measured at optimal parameters for a depth of $p = 2$ for (a) and $p = 1$ for (b). The filled bars correspond to the measured problem solutions, while the black (red) wire-frames are the expected QAOA outcome from noise-free (master-equation) simulations.
FIG. 12. Pulse sequences for implementing QAOA with p = 1.The shaded area around flux pulses illustrates which interaction they implement and how long buffer times before and after the flux pulse are chosen.(a) Direct implementation of the seven-qubit instance.(b) Decomposed implementation of the seven-qubit instance.(c) Direct implementation of the three-qubit instance.(d) Decomposed implementation of the three-qubit instance.

TABLE II. Number of two-qubit gates (first row), single-qubit gates (second row), and virtual gates (third row) in the implemented QAOA sequences.

TABLE III. (First 4 rows) Percentage of data kept after discarding all measured states containing at least one leakage event. (Bottom 4 rows) Corresponding average leakage per two-qubit gate in percent.

We estimate the corresponding average leakage per gate as $\lambda \approx 1 - P_{\mathrm{post}}^{1/n_g}$, where $P_{\mathrm{post}}$ is the fraction of data left after post-selection and $n_g$ is the number of two-qubit gates in the sequence. All average leakage per gate values lie between 0.2% and 2.1%.
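The leakage estimate can be written as a short helper. This is a minimal sketch, assuming that each two-qubit gate independently keeps a fraction $1 - \lambda$ of the population in the computational subspace, so that $P_{\mathrm{post}} = (1 - \lambda)^{n_g}$. The function name and any inputs passed to it are hypothetical; the numbers of Table III are not reproduced here.

```python
def leakage_per_gate(p_post, n_gates):
    """Average leakage per two-qubit gate, lambda = 1 - p_post**(1 / n_gates).

    p_post:  fraction of shots kept after discarding leakage events (0 < p_post <= 1)
    n_gates: number of two-qubit gates in the sequence (>= 1)
    """
    if not 0.0 < p_post <= 1.0:
        raise ValueError("p_post must lie in (0, 1]")
    if n_gates < 1:
        raise ValueError("n_gates must be at least 1")
    return 1.0 - p_post ** (1.0 / n_gates)
```

Because the estimate attributes all discarded shots to two-qubit gates, it is best read as an average over the sequence rather than as a property of any individual gate.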