Improved Success Probability with Greater Circuit Depth for the Quantum Approximate Optimization Algorithm

Present-day, noisy, small or intermediate-scale quantum processors, although far from fault tolerant, support the execution of heuristic quantum algorithms, which might enable a quantum advantage, for example, when applied to combinatorial optimization problems. On small-scale quantum processors, validations of such algorithms serve as important technology demonstrators. We implement the quantum approximate optimization algorithm on our hardware platform, consisting of two superconducting transmon qubits and one parametrically modulated coupler. We solve small instances of the NP (nondeterministic polynomial time)-complete exact-cover problem, with 96.6% success probability, by iterating the algorithm up to level two.


I. INTRODUCTION
Quantum computing promises exponential computational speedup in a number of fields, such as cryptography, quantum simulation, and linear algebra [1]. Even though a large, fault-tolerant quantum computer is still many years away, impressive progress has been made over the last decade using superconducting circuits [2][3][4], leading to the noisy intermediate-scale quantum (NISQ) era [5]. It was predicted that NISQ devices should allow for "quantum supremacy" [6], that is, solving a problem that is intractable on a classical computer in a reasonable time. This was recently demonstrated on a 53-qubit processor by sampling the output distributions of random circuits [7].
Two of the most prominent NISQ algorithms are the quantum approximate optimization algorithm (QAOA) for combinatorial optimization problems [8][9][10] and the variational quantum eigensolver (VQE) for the calculation of molecular energies [11][12][13]. The QAOA is a heuristic algorithm that could bring a polynomial speedup to the solution of specific problems encoded in a quantum Hamiltonian [14,15]. Moreover, the QAOA should produce output distributions that cannot be efficiently calculated on a classical computer [16].
* bylander@chalmers.se
The QAOA is a hybrid algorithm, as it is executed on both a classical and a quantum computer. The quantum part consists of a circuit with p levels, where better approximations to the solution of the encoded problem are generally achieved with higher p. In this work, we report on using our superconducting quantum processor to demonstrate the QAOA with up to p = 2, enabled by adequately high gate fidelities. We solve small toy instances of the NP (nondeterministic polynomial time)-complete exact-cover problem with 96.6% success probability. For p > 1, the QAOA solution cannot be efficiently calculated on a classical computer, as the computational complexity scales doubly exponentially in p [8].
Our interest in solving the exact-cover problem originates from its use in many real-world applications. For instance, the exact-cover problem can provide feasible solutions to airline planning problems such as tail assignment [17]. Currently, this problem is solved by well-developed optimization techniques in combination with heuristics. By leveraging heuristic quantum algorithms such as the QAOA, the current approach can be augmented and might provide high-quality solutions while reducing the running time. Applying the QAOA to instances of the exact-cover problem extracted from real-world data in the context of tail assignment has been numerically studied with 25 qubits, corresponding to 25 routes and 278 flights [18].

II. QAOA
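All NP-complete problems can be formulated in terms of finding the ground state of an Ising Hamiltonian [22]. The QAOA aims at finding this state by applying two noncommuting Hamiltonians, $\hat{B}$ and $\hat{C}$, in an alternating sequence (with length p) to an equal superposition state of n qubits [visualized in Fig. 1(a)],

$$|\vec{\gamma}, \vec{\beta}\rangle = \prod_{i=1}^{p} \left[ e^{-i \beta_i \hat{B}} e^{-i \gamma_i \hat{C}} \right] \left( \frac{|0\rangle + |1\rangle}{\sqrt{2}} \right)^{\otimes n}, \tag{1}$$

where $\gamma_i$ and $\beta_i$ are (real) variational angles. The first Hamiltonian in the sequence is the Ising (cost) Hamiltonian specifying the problem,

$$\hat{C} = \sum_{i=1}^{n} h_i \hat{\sigma}^z_i + \sum_{i<j} J_{ij} \hat{\sigma}^z_i \hat{\sigma}^z_j, \tag{2}$$

and the second is a transverse field (mixing) Hamiltonian defined by

$$\hat{B} = \sum_{i=1}^{n} \hat{\sigma}^x_i, \tag{3}$$

where $h_i$ and $J_{ij}$ are real coefficients, and the $\hat{\sigma}^{x(z)}_i$ are the Pauli X (Z) operators applied to the ith qubit.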
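The alternating sequence of Eq. (1) can be illustrated with a small state-vector simulation. The following is a minimal sketch, not the authors' implementation: it assumes the mixer is the plain sum of Pauli X operators, takes Z with eigenvalue +1 on |0⟩ (the source does not state its labeling convention), and uses hypothetical helper names (`op_on`, `ising_diagonal`, `qaoa_state`, `cost`).

```python
import numpy as np

# Single-qubit operators. Assumption: Z has eigenvalue +1 on |0>.
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def op_on(single, qubit, n):
    """Embed a single-qubit operator on `qubit` (0-indexed) of an n-qubit register."""
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, single if k == qubit else I2)
    return out


def ising_diagonal(h, J):
    """Diagonal of C = sum_i h_i Z_i + sum_{i<j} J_ij Z_i Z_j (computational basis)."""
    n = len(h)
    diag = np.zeros(2 ** n)
    for i in range(n):
        diag += h[i] * np.diag(op_on(Z, i, n))
    for (i, j), coupling in J.items():
        diag += coupling * np.diag(op_on(Z, i, n) @ op_on(Z, j, n))
    return diag


def qaoa_state(gammas, betas, h, J):
    """Apply p levels of exp(-i beta B) exp(-i gamma C) to the equal superposition."""
    n = len(h)
    c_diag = ising_diagonal(h, J)
    state = np.full(2 ** n, 1 / np.sqrt(2 ** n), dtype=complex)
    for gamma, beta in zip(gammas, betas):
        # C is diagonal, so exp(-i gamma C) is an elementwise phase.
        state = np.exp(-1j * gamma * c_diag) * state
        # exp(-i beta sum_i X_i) factorizes into identical single-qubit rotations.
        rx = np.cos(beta) * I2 - 1j * np.sin(beta) * X
        mixer = np.array([[1.0 + 0.0j]])
        for _ in range(n):
            mixer = np.kron(mixer, rx)
        state = mixer @ state
    return state


def cost(gammas, betas, h, J):
    """Energy expectation value of C in the QAOA state."""
    probabilities = np.abs(qaoa_state(gammas, betas, h, J)) ** 2
    return float(np.sum(probabilities * ising_diagonal(h, J)))
```

For two qubits, a call such as `cost([0.3], [0.7], [-0.5, 0.0], {(0, 1): 0.5})` evaluates a single level (p = 1). The dense matrices used here grow exponentially with n, so the sketch is only meant for a handful of qubits.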
The ground state of Eq. (2) corresponds to the lowest-energy state. We therefore define the energy expectation value of Eq. (1) as a cost function,

$$F(\vec{\gamma}, \vec{\beta}) = \langle \vec{\gamma}, \vec{\beta} | \hat{C} | \vec{\gamma}, \vec{\beta} \rangle = \sum_{i=1}^{n} h_i \langle \hat{\sigma}^z_i \rangle + \sum_{i<j} J_{ij} \langle \hat{\sigma}^z_i \hat{\sigma}^z_j \rangle. \tag{4}$$

This cost function is evaluated by repeatedly preparing and measuring $|\vec{\gamma}, \vec{\beta}\rangle$ on a quantum processor. To find the state that minimizes Eq. (4), a classical optimizer is used to find the optimal variational angles $\vec{\gamma}^*, \vec{\beta}^*$. For a high enough p, $|\vec{\gamma}^*, \vec{\beta}^*\rangle$ is equal to the ground state of $\hat{C}$ and hence yields the answer to the optimization problem [8]. However, for algorithms executed on real hardware without error correction, noise will inevitably limit the circuit depth, implying that there is a trade-off between algorithmic errors (too low p) and gate errors (too high p).
Note that, in order to find the solution to the optimization problem, it is not necessary for $|\vec{\gamma}^*, \vec{\beta}^*\rangle$ to be equal to the ground state: as long as the ground-state probability is high enough, the quantum processor can be used to generate a shortlist of potential solutions that can be checked efficiently (in polynomial time) on a classical computer. For instance, even if the success probability of measuring the ground state is only 5%, we could measure 100 instances and still attain a probability greater than 99% of finding the correct state. Moreover, the angles $\vec{\gamma}^*, \vec{\beta}^*$ themselves are not interesting, as long as they yield the lowest-energy state. This gives some robustness against coherent gate errors, since any over- or under-rotations can be compensated for by a change in the variational angles [12].
We apply the QAOA to the exact-cover problem, which reads: given a set X and several subsets $S_i$ containing parts of X, which combination of subsets includes all elements of X just once? Mathematically speaking, this combination of subsets should be disjoint, and their union should be X. This problem can be mapped onto an Ising Hamiltonian, where the number of spins equals the number of subsets, while the size of X can be arbitrary.
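The shortlist argument given earlier in this section (a 5% ground-state probability and 100 repetitions) can be checked with a few lines of arithmetic. The sketch below assumes independent repetitions; the function names are illustrative only.

```python
import math


def prob_at_least_one_success(p_success, shots):
    """Probability that at least one of `shots` independent runs returns the solution."""
    return 1.0 - (1.0 - p_success) ** shots


def shots_needed(p_success, target):
    """Smallest number of independent runs whose hit probability reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))


print(prob_at_least_one_success(0.05, 100))  # roughly 0.994
print(shots_needed(0.05, 0.99))              # 90
```

Under this independence assumption, 100 repetitions give about 99.4%, and 90 repetitions already suffice to exceed 99%.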
Let us consider n = 2, for which the two-spin Ising Hamiltonian is

$$\hat{C} = h_1 \hat{\sigma}^z_1 + h_2 \hat{\sigma}^z_2 + J \hat{\sigma}^z_1 \hat{\sigma}^z_2. \tag{5}$$

The exact-cover problem is mapped onto this Hamiltonian by choosing $h_i$ and J as [23]

$$h_1 = J - 2c_1, \quad h_2 = J - 2c_2, \quad J > \min(c_1, c_2), \tag{6}$$

where $c_i$ is the number of elements in subset i, and J > 0 if the two subsets share at least one element. We are free to choose J, as long as it fulfills the criterion in Eq. (6).

[TABLE I. Columns: Problem, Subsets; remaining table contents not recovered.]
For example, consider $X = \{x_1, x_2\}$ and two subsets $S_1 = \{x_1, x_2\}$ and $S_2 = \{x_1\}$. This gives $c_1 = 2$ and $c_2 = 1$, and we could choose J = 2, yielding $h_1 = -2$ and $h_2 = 0$. It is easy to check that the corresponding ground state is $|10\rangle$ (i.e., $S_1$ is the solution). Finally, we normalize J and $h_i$ such that the Ising Hamiltonian has integer eigenvalues, allowing us to restrict $\gamma_i$ and $\beta_i$ to the interval $[0, \pi)$. For two subsets, four different problems exist, which all yield different sets of $h_i$ and J. These are summarized in Table I. Problem A is the example given above; it is the most interesting, as the other problems are trivial. Problems B and C are trivial since they do not contain any qubit-qubit interaction (J = 0). Problem D is also trivial since both subsets are equal. Additionally, the ground states are degenerate for problems B and D.
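The worked example can be verified by brute force over the four basis states. This is a sketch under two assumptions that are not spelled out in the text: the coefficients follow $h_i = J - 2c_i$, which reproduces the quoted values $h_1 = -2$ and $h_2 = 0$ for J = 2, and a bit value of 1 (subset selected) corresponds to the $\hat{\sigma}^z$ eigenvalue +1, a labeling chosen here so that the ground state comes out as $|10\rangle$. The function names are hypothetical.

```python
from itertools import product


def ising_coefficients(c1, c2, J):
    """Two-subset exact-cover coefficients, assuming h_i = J - 2 c_i."""
    return J - 2 * c1, J - 2 * c2, J


def energies(h1, h2, J):
    """Energy of every two-bit state; assumed labeling: bit 1 <-> spin +1."""
    table = {}
    for b1, b2 in product((0, 1), repeat=2):
        s1, s2 = 2 * b1 - 1, 2 * b2 - 1
        table[f"{b1}{b2}"] = h1 * s1 + h2 * s2 + J * s1 * s2
    return table


# Example from the text: S1 = {x1, x2}, S2 = {x1}, J = 2.
h1, h2, J = ising_coefficients(c1=2, c2=1, J=2)
table = energies(h1, h2, J)
ground = min(table, key=table.get)
print(h1, h2, J)   # -2 0 2
print(table)       # {'00': 4, '01': 0, '10': -4, '11': 0}
print(ground)      # 10

# Dividing by the magnitude of the lowest energy puts the ground state at -1.
scale = abs(table[ground])
print(h1 / scale, h2 / scale, J / scale)  # -0.5 0.0 0.5
```

With this scaling the eigenvalues are the integers −1, 0, 0, and 1, consistent with the normalization described in the text.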

III. REALIZATION ON QUANTUM HARDWARE
We implement Eq. (1) on our quantum processor using the circuit in Fig. 1(b). The circuit can be simplified somewhat at compile time by using simple identities (e.g., two consecutive Hadamard gates act as an identity gate). We stress that our implementation of the QAOA is scalable in that we do not use any exponentially costly precompilation (e.g., calculating the final circuit unitary and using Cartan decomposition to minimize the number of two-qubit gates).
Our quantum processor is fabricated using the same processes as in Ref. [24] and consists of two fixed-frequency transmon qubits with individual control and readout. Both qubits are coupled to a common frequency-tunable coupler used to mediate a controlled-phase (CZ) gate between the qubits. The CZ gate is realized by a full coherent oscillation between the $|11\rangle$ and $|02\rangle$ states. The interaction is achieved by parametrically modulating the resonant frequency of the coupler at a frequency close to the difference frequency between the $|0\rangle \leftrightarrow |1\rangle$ and $|1\rangle \leftrightarrow |2\rangle$ transitions of qubit 1 and 2, respectively [25,26]. We have benchmarked such a gate on the same device during the same cooldown to a fidelity above 99%; however, the benchmark performed closest in time to the experiments presented here showed a fidelity of 98.6%. These kinds of fidelity fluctuations might be related to fluctuations in the qubits' coherence times [24]. Single-qubit X rotations are driven by microwave pulses at the qubit transition frequencies with fidelities of 99.86% and 99.93% for the respective qubits. Z rotations are implemented in software as a shift in drive phase and thus have unity fidelity [27]. All the reported gate fidelities are measured by (interleaved) randomized benchmarking [28]. More experimental details, a measurement setup along with a device schematic, and benchmarking results are given in Appendices A and B.

IV. APPLYING THE QAOA TO FOUR PROBLEMS
For p = 1, we apply a simple grid (61 × 61) search of $\beta_1, \gamma_1 \in [0, \pi)$ while recording 5000 measurements of each qubit. From these, we calculate $\langle \hat{\sigma}^z_i \rangle$, $\langle \hat{\sigma}^z_1 \hat{\sigma}^z_2 \rangle$, the cost function F, and the occupation probability for each of the four possible states, while accounting for the limited, but calibrated, readout fidelity (86% and 95% for the two qubits). By collecting sufficiently many samples, the statistical error on the estimated quantities can be made small.
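How the averages, the cost function, and the state probabilities follow from a list of two-qubit measurement outcomes can be sketched as below. This is an illustration only: it ignores the readout-fidelity correction, assumes the labeling bit 1 ↔ spin +1, and uses hypothetical function names. The standard-error formula is the usual one for the mean of independent ±1 outcomes.

```python
import math
from collections import Counter


def estimate_from_shots(shots, h1, h2, J):
    """Estimate <z1>, <z2>, <z1 z2>, the cost F, and state probabilities.

    `shots` is a list of (b1, b2) bit pairs. Assumed labeling: bit 1 <-> spin +1.
    No readout-error correction is applied in this sketch.
    """
    n = len(shots)
    spins = [(2 * b1 - 1, 2 * b2 - 1) for b1, b2 in shots]
    z1 = sum(s1 for s1, _ in spins) / n
    z2 = sum(s2 for _, s2 in spins) / n
    z12 = sum(s1 * s2 for s1, s2 in spins) / n
    cost = h1 * z1 + h2 * z2 + J * z12
    counts = Counter(f"{b1}{b2}" for b1, b2 in shots)
    probs = {state: counts.get(state, 0) / n for state in ("00", "01", "10", "11")}
    return z1, z2, z12, cost, probs


def standard_error(mean, n):
    """Standard error of the mean of n independent +/-1 outcomes."""
    return math.sqrt(max(0.0, 1.0 - mean ** 2) / n)


# Worst case (mean = 0) for 5000 shots: about 0.014.
print(standard_error(0.0, 5000))
```

With 5000 shots per setting, the statistical uncertainty on each single-qubit average is therefore at most about 0.014 in this idealized model.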
The grid search allows us to explore the shape of the optimization landscape, which may bring important understanding of the difficulty of finding global minima for black-box optimizers. In Fig. 2, we show measured cost functions for the four problems in Table I. Because of the normalization of $h_i$ and J, the ground state for each problem corresponds to F = −1. In Fig. 2(a), the cost function for problem A never reaches below −0.5. To achieve costs approaching −1, additional levels (p > 1) are needed. Moreover, the existence of a local minimum [...] we see clear minima where F ≈ −1, indicating that we have found the optimal variational angles $(\vec{\gamma}^*, \vec{\beta}^*)$ corresponding to the ground state.
In Fig. 3, we take linecuts along the dashed lines in Fig. 2 and benchmark our measured cost functions and state probabilities against those of an ideal quantum computer without any noise. We see excellent agreement between measurement and theory: the measured positions of each minimum and maximum are aligned with those of the theory, consistent with low coherent-error rates. In addition, we observe excellent agreement between the absolute values at the minima and maxima, indicating low incoherent-error rates as well.
Even with high gate fidelities, a high algorithmic fidelity is not guaranteed. Randomized benchmarking gives the average fidelity over a large number of random gates, which transforms any coherent errors into incoherent errors. For real quantum algorithm circuits, the gates are generally not random. Therefore, any coherent errors can quickly add up and yield algorithmic performance far lower than expected from randomized benchmarking fidelities alone [29,30].
To quantify the performance of the QAOA with p = 1, we compare the highest-probability state at the minima of F with the solutions in Table I. Problem A [Fig. 3(a)] does not reach its ground state (F ≈ −0.5); however, the probability of measuring the correct state ($|10\rangle$) is approximately 50%, which is still better than random guessing. For problem C [ Fig. 3
FIG. 3. [...] Table I. The linecuts are taken at the vertical dashed lines in Fig. 2. The theory curves are calculated assuming an ideal quantum processor, whereas each experimental data point is derived from the average of 5000 measurements on our quantum processor.

V. INCREASING THE SUCCESS PROBABILITY
To increase the success probability for problem A, we add an additional level (p = 2). For p > 1, a grid search to map out the full landscape becomes infeasible due to the many parameters (equal to 2p). Therefore, we instead use black-box optimizers to find the optimal variational angles. We try three different gradient-free optimizers: Bayesian optimization with Gaussian processes (BGPs), Nelder-Mead, and covariance matrix adaptation evolution strategy (CMA ES). We choose BGPs for its ability to find global minima, Nelder-Mead because it is common and simple, and CMA ES for its favorable scaling with the number of optimization parameters.
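The growth of a full grid search with the number of levels can be made concrete with a short calculation. The sketch below extrapolates the p = 1 settings (61 points per angle, 5000 shots per point) to higher p; that extrapolation is a hypothetical illustration, not something done in the experiment.

```python
def grid_evaluations(points_per_axis, p):
    """Number of grid points when each of the 2p angles is sampled on its own axis."""
    return points_per_axis ** (2 * p)


for p in (1, 2):
    n_points = grid_evaluations(61, p)
    # Hypothetical: same 5000 shots per grid point as in the p = 1 search.
    print(p, n_points, n_points * 5000)
# p = 1 -> 3721 grid points
# p = 2 -> 13845841 grid points
```

Going from p = 1 to p = 2 thus multiplies the number of circuit settings by 3721, which is why black-box optimizers are preferable at higher p.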
We evaluate the optimizer performances by running 200 independent optimizations with random starting values ($\vec{\gamma}, \vec{\beta} \in [0, \pi)$) for each optimizer. For each set of variational angles, we repeat the circuit and measure 5000 samples to accurately estimate the expectation values. We set a threshold for convergence at F < −0.95 and count the number of converged optimization runs as well as the number of calls to the quantum processor (function calls) required to converge. We also record the success probability of measuring the problem solution ($P_{|10\rangle}$). The results are summarized in Table II. We observe that the success probabilities after convergence are similar for all three optimizers. However, there is a difference in convergence probability, of which BGPs has the highest and Nelder-Mead has the lowest. The lower performance of Nelder-Mead is most likely due to its sensitivity to local minima, a well-known problem for most optimizers. In contrast, one of the strengths of Bayesian optimization is its ability to find global minima, which could explain why it performs better than Nelder-Mead and CMA ES. Additionally, Bayesian optimization is designed to handle optimization where the time of each function call is high (costly), such that the number of calls is kept low. However, for more optimization parameters (higher p), the performance of BGPs is generally decreased due to an increasing need for classical computation. CMA ES, on the other hand, excels when the number of parameters is high, and thus might be a good optimizer for the QAOA with tens or hundreds of parameters. Here, with just four parameters, CMA ES has a convergence probability similar to that of BGPs, although with a greater number of function calls on average.
To quantify the optimization further, we study the trajectories of each optimization run (Fig. 4). For each run, we plot the costs F. The trajectories for BGPs and Nelder-Mead [Figs. 4(a)-4(b)] corroborate the indications about local minima. We see groups of horizontal lines corresponding to different local minima, especially clear at F ≈ −0.55 for both BGPs and Nelder-Mead. We also see that BGPs tries, and sometimes succeeds, to escape these local minima, which is one of the advantages of Bayesian optimization. In comparison, Nelder-Mead rarely gets out of a local minimum once it has found it. For the third optimizer, CMA ES [Fig. 4(c)], it is hard to draw any conclusions from the trajectories other than that the convergence is slower than for the other optimizers. However, we include the CMA ES trajectories for completeness. For each optimizer, we also plot the averaged (over all the converged) trajectories for F and the probability of finding the solution state $P_{|10\rangle}$.
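The optimization procedure described in this section (random starting angles, a convergence threshold at F < −0.95, and a count of function calls) can be mimicked on a noiseless simulation. The sketch below is an assumption-laden illustration rather than the experiment: it uses SciPy's Nelder-Mead on an ideal two-qubit simulator, takes the normalized problem-A coefficients $h_1 = -1/2$, $h_2 = 0$, $J = 1/2$, assumes the labeling bit 1 ↔ spin +1 and the mixer $\sum_i \hat{\sigma}^x_i$, and counts simulator evaluations instead of calls to a quantum processor.

```python
import numpy as np
from scipy.optimize import minimize

# Normalized problem-A coefficients (assumed from the worked example).
H1, H2, J = -0.5, 0.0, 0.5

# Basis order 00, 01, 10, 11; assumed labeling: bit 1 <-> spin +1.
SPINS = [(2 * b1 - 1, 2 * b2 - 1) for b1 in (0, 1) for b2 in (0, 1)]
C_DIAG = np.array([H1 * s1 + H2 * s2 + J * s1 * s2 for s1, s2 in SPINS])

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)


def qaoa_probabilities(angles):
    """State probabilities for angles = (gamma_1..gamma_p, beta_1..beta_p)."""
    p = len(angles) // 2
    gammas, betas = angles[:p], angles[p:]
    state = np.full(4, 0.5, dtype=complex)  # equal superposition of two qubits
    for gamma, beta in zip(gammas, betas):
        state = np.exp(-1j * gamma * C_DIAG) * state    # cost layer
        rx = np.cos(beta) * I2 - 1j * np.sin(beta) * X  # exp(-i beta X)
        state = np.kron(rx, rx) @ state                 # mixing layer
    return np.abs(state) ** 2


def cost(angles):
    """Noiseless cost function F for the given angles."""
    return float(qaoa_probabilities(np.asarray(angles)) @ C_DIAG)


def run_once(rng, p=2, threshold=-0.95):
    """One Nelder-Mead run from random starting angles in [0, pi)."""
    x0 = rng.uniform(0.0, np.pi, size=2 * p)
    result = minimize(cost, x0, method="Nelder-Mead")
    return {
        "cost": float(result.fun),
        "calls": int(result.nfev),
        "converged": bool(result.fun < threshold),
        "p_solution": float(qaoa_probabilities(result.x)[2]),  # index 2 is |10>
    }


rng = np.random.default_rng(0)
runs = [run_once(rng) for _ in range(20)]
print(sum(r["converged"] for r in runs), "of", len(runs), "runs below threshold")
```

Because the simulation is noiseless and uses exact expectation values, the convergence statistics it produces are not comparable to the hardware results in Table II; the sketch only shows the bookkeeping of threshold, function calls, and solution probability.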
At the end of the optimization, the highest recorded probability of generating the correct state is 96.6%. The success probability is limited by imperfect gates (we have verified that an ideal quantum computer and p = 2 can achieve $P_{|10\rangle} = 1$). We compare our measured success probability to what we would expect from the randomized-benchmarking fidelities. The quantum circuit for p = 2 consists of 6 X, 4 Hadamard, 4 Z, and 3 CZ gates, which, when multiplied together with the fidelities for each gate, predicts a total fidelity of 96.3%, in good agreement with the measured fidelity considering experimental uncertainties (e.g., fluctuations in qubit coherence and gate fidelities). Note that p = 3 would not yield a higher success probability, since adding more gates would lower the total fidelity further (predicted to be 94.2%).
Finally, we examine histograms over the success probabilities at the end of each optimization run for the three different optimizers; see [...] BGPs has the most converged runs out of the three. We see clusters around 55% and 95% success probabilities for all three optimizers, possibly corresponding to one local minimum and the global minimum. For CMA ES, the success probabilities are more scattered, where some runs even have below 40% success. All in all, Bayesian optimization performs the best; however, further studies will be needed to determine which classical optimizer is the most suitable for variational quantum algorithms, such as the QAOA and VQE.

VI. CONCLUSION
In conclusion, we implement the quantum approximate optimization algorithm with up to p = 2 levels. Using a superconducting quantum processor with state-of-the-art performance, we successfully optimize four instances of the exact-cover problem. For the nontrivial instance (problem A), we use p = 2 and black-box optimization to reach a success probability of 96.6% (up from 50% with p = 1), in good agreement with a prediction from our gate fidelities. Even if many more qubits are needed to solve problems that are intractable for classical computers, algorithmic performance serves as a critical quantum-processor benchmark since performance can be much lower than what individual gate fidelities predict. Although further experiments with larger devices are needed to explore whether the QAOA can have an advantage over classical algorithms, our results show that the QAOA can be used to solve the exact-cover problem.

ACKNOWLEDGMENTS
We are grateful to the Quantum Device Lab at ETH Zürich for sharing their designs of sample holder and printed circuit board, and Simon Gustavsson and Bruno Küng for valuable support with the measurement infrastructure. We also thank Morten Kjaergaard, Devdatt Dubhashi, and Kevin Pack for insightful discussions. This work was performed in part at Myfab Chalmers. We acknowledge financial support from the Knut and Alice Wallenberg Foundation, the Swedish Research Council, and the EU Flagship on Quantum Technology H2020-FETFLAG-2018-03 project 820363 OpenSuperQ.

APPENDIX A: MEASUREMENT SETUP
The experimental measurement setup used here is a standard circuit quantum electrodynamics setup; see the schematic in Fig. 6. The quantum processor consists of two xmon-style transmon qubits coupled via a frequency-tunable anharmonic oscillator. The tunability is provided by two Josephson junctions in a superconducting quantum interference device (SQUID) configuration. The two qubits are capacitively coupled to individual control lines and quarter-wavelength resonators for readout. There is also a readout resonator for the coupler, which is only used as a debugging tool (i.e., it is not involved during any algorithm execution). The SQUID for the tunable coupler is inductively coupled to a waveguide to allow for both static and fast modulation of the resonant frequency.
FIG. 6. (a) Cryogenic setup and electrical circuit of the quantum processor. All lines are attenuated and filtered to minimize the amount of noise reaching the qubits. The readout output contains cryogenic isolators and a high electron mobility transistor amplifier. (b) False-colored micrograph of the processor. The colors match the circuit elements in (a). The three waveguides at the bottom are for control over the qubits and the coupler.

The processor is fabricated on a high-resistivity intrinsic silicon substrate. After initial chemical cleaning, an aluminium film is evaporated. All features except the Josephson junctions are patterned by direct-write laser lithography and etched with a warm mixture of acids. The Josephson junctions are patterned by electron-beam lithography, and evaporated from the same target as previously. A third lithography and evaporation step (with in-situ ion milling) is performed to connect the Josephson junctions to the rest of the circuit. Finally, the wafer is diced into individual dies and subsequently cleaned by a combination of wet and dry chemistry.
A die is then selected and packaged in a copper box and wire bonded to a palladium- and gold-plated printed circuit board with 16 nonmagnetic coaxial connectors. For the present device, we use five of these connectors: two for readout, two for single-qubit control, and one for control of the magnetic flux through the SQUID loop of the coupler. These are connected to filtered and attenuated coaxial lines leading up to room temperature. We point out that the dc current for the static flux bias is also provided through the coaxial line. Finally, the processor is attached to the mixing chamber of a Bluefors LD250 cryo-free dilution refrigerator. There, it is shielded from stray magnetic fields by two high-permeability shields and two superconducting shields.
We perform multiplexed readout by using the Zurich Instruments UHFQA for generating and detecting the readout signals, together with a Rohde & Schwarz SGS100A continuous-wave signal generator and two Marki IQ mixers for up- and down-conversion. The single-qubit pulses are synthesized by the Zurich Instruments HDAWG and up-converted using Rohde & Schwarz SGS100A vector signal generators. The flux drive is generated directly by the HDAWG since the modulation frequency is within the bandwidth of the instrument. Finally, all instruments are controlled and orchestrated by the measurement and automation software Labber. Labber also does cost-function evaluations and calls external Python packages for the three different optimizers. All three optimizers are run using publicly available packages: Scikit-Optimize for BGPs, scipy for Nelder-Mead, and pycma for CMA ES.

APPENDIX B: CHARACTERIZATION AND TUNE-UP
Initially, we perform basic spectroscopy and decoherence benchmarking of each qubit individually. This allows us to extract readout frequencies, qubit frequencies and anharmonicities, relaxation and dephasing times, and static couplings between qubit and resonator, as well as between qubit and coupler. The extracted parameters are found in Table III. After the initial characterization, we tune up high-fidelity single-qubit gates. The drive pulses have cosine envelopes together with first-order derivative removal by adiabatic gate (DRAG) [...]
Next, we calibrate our readout fidelities. By collecting raw voltages of the readout signals (as measured by the digitizer in the UHFQA), without and with a calibrated π pulse applied to the qubit ($|0\rangle$ and $|1\rangle$ states, respectively), and as a function of readout frequency and amplitude, we can find the optimal readout parameters. Because of our rather low coupling strengths, we cannot achieve short readout times in this device. However, the QAOA does not require any measurement feedback, so a long readout time is not an issue as long as the time is shorter than the relaxation times of the qubits. Also, longer readout times give greater signal-to-noise ratios, which allow us to achieve high readout fidelities even in the absence of a quantum-limited amplifier. Here, the readout is 2.3 μs long, well below our relaxation times (several tens of microseconds). After finding the optimal readout parameters, a voltage threshold is used to differentiate between the $|0\rangle$ and $|1\rangle$ states of the measured qubit.
To accurately extract state probabilities in the presence of limited readout fidelity, we collect statistics of the measured qubit population as a function of qubit drive amplitude (Rabi oscillations). Since the measured population increases monotonically with the expected population, we can renormalize the populations, similarly to Ref. [31]. This calibration allows us to accurately measure the average quantities $\langle \hat{\sigma}^z_i \rangle$, $\langle \hat{\sigma}^z_1 \hat{\sigma}^z_2 \rangle$ and state probabilities even in the presence of limited readout fidelities.
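One simple way to picture such a renormalization for a single qubit is a linear rescaling between the measured populations obtained after preparing the ground and excited states. The sketch below is only an assumed model (the text states monotonicity, not linearity), the function and parameter names are hypothetical, and it does not cover the two-qubit correlator.

```python
def renormalize_population(p_measured, p_meas_ground, p_meas_excited):
    """Map a measured excited-state population onto [0, 1].

    Assumption (not from the source): the measured population depends linearly
    on the true population, with end points calibrated by preparing the ground
    state (p_meas_ground) and the excited state (p_meas_excited).
    """
    corrected = (p_measured - p_meas_ground) / (p_meas_excited - p_meas_ground)
    # Clip small excursions caused by statistical noise.
    return min(1.0, max(0.0, corrected))


# Hypothetical calibration values, for illustration only.
print(renormalize_population(0.5, p_meas_ground=0.1, p_meas_excited=0.9))
```

In this model, a measured population halfway between the two calibration points maps to a true population of one half.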

Our two-qubit gate of choice is the CZ. This interaction is induced by parametrically modulating the resonant frequency of the coupler at a frequency close to the difference frequency between the $|11\rangle$ and $|02\rangle$ states. For our device, this frequency is 255 MHz. However, due to the frequency modulation and the nonlinear relationship between flux and frequency, the transition frequencies are slightly lowered. This frequency shift will also induce deterministic phase shifts on the individual qubits, which we compensate for by applying Z gates on both qubits after each CZ gate. We choose a static bias point and a modulation amplitude that yield a moderate effective coupling strength of 5 MHz between the two states. From here, we find the modulation frequency and time that yield a full oscillation between the $|11\rangle$ and $|02\rangle$ states. We then fine-tune the frequency and time such that the controlled phase is π and the leakage to $|02\rangle$ is minimal. Here, the final gate frequency and duration are 253 MHz and 271 ns.
We benchmark our single- and two-qubit fidelities using randomized benchmarking. A sequence of random gates drawn from the Clifford group is applied together with a final recovery gate that should take the system back to the ground state. The number of random gates is varied and the probability of measuring the ground state is recorded. In Fig. 7, we plot these probabilities for each qubit individually and for the two-qubit case. In the single-qubit case, it is important to note that the benchmarking is done simultaneously for both qubits. Generally, the gate fidelities are higher if the gates are applied in isolation. However, to reduce the total run time of algorithms, we usually run single-qubit gates in parallel. Therefore, simultaneous randomized benchmarking fidelities are more relevant metrics than isolated ones.
FIG. 7. Randomized benchmarking of single- and two-qubit gates. Plotted is the probability of measuring the ground state as a function of the number of Clifford gates applied. Circles are data, and lines are fits to extract the gate fidelity. For qubits 1 and 2, the extracted single-qubit fidelities (averaged over all possible single-qubit Clifford gates) are 0.9986 and 0.9993. For benchmarking of the two-qubit gate, we take a reference (random Clifford gates) and an interleaved (a CZ gate between each Clifford gate) trace to extract the CZ fidelity (0.986).
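For reference, the conversion from fitted decay constants to fidelities can be sketched as follows. It assumes the standard randomized-benchmarking model in which the ground-state probability decays as $A\alpha^m + B$ with the number of Cliffords m; the text does not spell out the fit model, and the decay constants used below are made-up illustration values, not the measured ones.

```python
def average_fidelity_from_decay(alpha, n_qubits):
    """Average fidelity per Clifford from the decay constant alpha.

    Standard relation: F = 1 - (1 - alpha) * (d - 1) / d with d = 2**n_qubits.
    """
    d = 2 ** n_qubits
    return 1.0 - (1.0 - alpha) * (d - 1) / d


def interleaved_gate_fidelity(alpha_reference, alpha_interleaved, n_qubits):
    """Fidelity of the interleaved gate from reference and interleaved decays."""
    d = 2 ** n_qubits
    return 1.0 - (1.0 - alpha_interleaved / alpha_reference) * (d - 1) / d


# Made-up decay constants for illustration only.
print(average_fidelity_from_decay(0.99, n_qubits=1))
print(interleaved_gate_fidelity(0.96, 0.94, n_qubits=2))
```

The ratio of the interleaved to the reference decay constant isolates the contribution of the interleaved gate, which is why both traces are recorded for the CZ benchmark.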