Low-depth Clifford circuits approximately solve MaxCut



I. INTRODUCTION
Near-term quantum processors have found a niche in the hybrid quantum-classical model of computation with Variational Quantum Algorithms (VQAs) [1][2][3]. These algorithms perform classical optimization of a problem-specific objective function that is evaluated by measuring the output of a parametrized quantum circuit. Through variational search over circuit parameters, a VQA thus seeks a solution circuit that transforms a simple input state in the Hilbert space of problem variables into a superposition of approximate solutions, i.e., configurations that yield near-optimal values of the objective function. In attempts to determine whether they can lead to a speedup over classical algorithms for any useful task, VQAs have been applied to a variety of combinatorial optimization problems.
These are "dequantized" classical versions of quantum or hybrid quantum-classical algorithms that unveil and exploit previously unrecognized properties or structures in a problem, leading to solution strategies that outperform the best known classical algorithm. In this work, we introduce a quantum-inspired approximation algorithm for the MaxCut problem. The algorithm, which we dub ADAPT-Clifford, is motivated by the observation that the solution circuits found by an adaptive QAOA variant for MaxCut on weighted complete graphs are (almost) Clifford circuits, a well-known restricted class of quantum circuits that are easy to simulate classically [19][20][21][22][23]. ADAPT-Clifford builds an entangled state with a number of unitary operations equal to the number of nodes N, adding at every step a known two-qubit gate to the circuit. The algorithm is polynomial both in time and space, with worst-case runtime complexity O(N^4) and space complexity O(N^2). We characterize the performance of the algorithm on several families of graphs. For graph sizes up to N = 30 nodes we report the exact approximation ratios. For larger problem sizes, up to hundreds of nodes, and depending on the graph family, we assess the performance either by direct comparison with the solution found by the best classical algorithm for MaxCut [24,25], or by comparing with the known value of the mean energy density in the thermodynamic limit.
The rest of this manuscript is organized as follows. In Sec. II we present a short summary of the MaxCut problem, quantum approximate optimization and its adaptive variant, and Clifford circuits, and introduce the tools we use to characterize the structure of solution circuits. In Sec. III we present the origin of the algorithm by analyzing the structure of solution circuits to MaxCut on weighted complete graphs obtained with QAOA and ADAPT-QAOA. In Sec. IV we present the ADAPT-Clifford algorithm, discuss its details, and analyze its runtime and space complexities on different types of graphs. In Sec. V we present numerical results on the algorithm's performance on weighted complete graphs with positive and signed weights, using the Sherrington-Kirkpatrick model as a specific example of the latter. In Sec. VI we explore the performance of the algorithm beyond complete graphs, including weighted and unweighted K-regular graphs and Erdös-Rényi graphs. Finally, in Sec. VII we conclude with a discussion of our results in the context of near-term quantum optimization algorithms and present an outlook on future work.

A. The MaxCut problem
Given a graph G = (V, E), where V is the vertex set and E ⊆ V^2 is the set of edges (E = V^2 for a complete graph), and edge weights ω_{i,j} ∈ R for (i, j) ∈ E, the MaxCut problem asks to partition V into two complementary subsets A, Ā ⊂ V such that the total weight of the edges between A and Ā is maximized. We use binary variables z_i ∈ {0, 1}, i ∈ V, to identify each subset, so that z_i = 1 if vertex i ∈ A and z_i = 0 if i ∈ Ā. The maximal cut can then be formally expressed as the assignment z′ that maximizes the cost function

C(z) = Σ_{i<j} ω_{i,j} [z_i(1 − z_j) + z_j(1 − z_i)],    (1)

where z = z_1...z_N is an N-bit binary string and ω_{i,j} = ω_{j,i} for all (i, j) ∈ E.
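As a concrete reference for the definition above, the cut value can be evaluated and brute-forced on toy instances; a minimal sketch (the helper names are ours, not the paper's code):

```python
import itertools

def cut_value(weights, z):
    """Cut value of assignment z: sum of w_ij over pairs with z_i != z_j."""
    return sum(wij for (i, j), wij in weights.items() if z[i] != z[j])

def brute_force_maxcut(weights, n):
    """Exhaustive search over all 2^n assignments; viable only for small n."""
    best = max(itertools.product((0, 1), repeat=n),
               key=lambda z: cut_value(weights, z))
    return best, cut_value(weights, best)

# Triangle with unit weights: any bipartition cuts at most 2 of the 3 edges.
w = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
z_star, c_star = brute_force_maxcut(w, 3)
assert c_star == 2.0
```

The exhaustive search is of course exponential in N; it only serves as a ground truth against which approximation ratios can be computed for small instances.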
Beyond the special cases mentioned above, and due to the difficulty of solving the problem exactly, one often aims instead for approximation algorithms that yield reasonably good solutions in polynomial time for all problem instances. That is, we search for an algorithm that outputs an assignment z* such that the approximation ratio

α(z*) = C(z*)/C(z′),    (2)

equals some desired value, ideally as close to 1 as possible, on all instances of MaxCut. However, in some cases the gap between approximate and optimal solutions cannot be reduced arbitrarily in polynomial time [37], a phenomenon known as hardness of approximation. For MaxCut on general graphs, the best known approximation algorithm is that of Goemans and Williamson (GW) [24], which has a worst-case performance guarantee of α ≃ 0.878 [24,25,38,39]. Below we present extensive performance comparisons between ADAPT-Clifford and the GW algorithm; in order to be as self-contained as possible, we review the details of the GW algorithm in App. A. One might hope to achieve better approximation ratios by focusing on specific families of graphs. For unweighted graphs, Ref. [40] showed that finding an algorithm yielding an approximation ratio better than 16/17 is NP-hard. Nearly optimal algorithms are known both for cubic graphs [41], guaranteeing α = 0.9326, and for K-regular graphs of large degree [42]. Another interesting example is the case of dense graphs, i.e., graphs with O(N^2) edges. Polynomial-time approximation schemes (PTAS) are known for both unweighted [43,44] and weighted [45] dense graphs, although for the latter there is only an existence result. A PTAS guarantees an approximate solution whose cost is within 1 − ϵ of the optimal. Although these schemes have a provable polynomial runtime in N, the runtime might not be polynomial in ϵ; see for example Ref. [44]. We will come back to this point in Sec. VII.
B. Quantum approximate optimization

In quantum approximate optimization, the objective function of a combinatorial optimization problem defined on binary variables z_i, such as MaxCut, is expressed as an Ising Hamiltonian through the mapping σ_i = 2z_i − 1, with connectivity dictated by the graph G [46]. That is, the entries ω_{i,j} of the adjacency matrix set the coupling between the i-th and j-th spins. In this setting the optimum z′ is encoded in the ground state of the Ising Hamiltonian. The Ising Hamiltonian is then promoted to a Hamiltonian operator via the identification σ_i → Z_i, with Z_i a Pauli-z operator acting on the qubit that corresponds to the i-th spin. For the MaxCut problem, the corresponding Hamiltonian is

H_C = Σ_{i<j} (ω_{i,j}/2) Z_i Z_j.    (3)

In writing Eq. (3) we have dropped a constant equal to Σ_{i<j} ω_{i,j}/2 and added a minus sign to turn the maximization problem defined by Eq. (1) into a minimization one.
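The bookkeeping between the cut cost and the Ising energy can be sanity-checked numerically. The sketch below (our own helper names, not the paper's code) verifies, for every assignment of a small graph, that the cut value equals the dropped constant minus the Ising energy under σ_i = 2z_i − 1:

```python
import itertools

def cut_value(w, n, z):
    """Total weight of edges crossing the cut defined by bit string z."""
    return sum(w[i][j] for i in range(n) for j in range(i + 1, n) if z[i] != z[j])

def ising_energy(w, n, sigma):
    """E(sigma) = sum_{i<j} (w_ij / 2) sigma_i sigma_j, cf. the Hamiltonian above."""
    return sum(0.5 * w[i][j] * sigma[i] * sigma[j]
               for i in range(n) for j in range(i + 1, n))

n = 4
w = [[0, 1, 2, 0], [1, 0, 1, 3], [2, 1, 0, 1], [0, 3, 1, 0]]  # symmetric weights
const = sum(w[i][j] for i in range(n) for j in range(i + 1, n)) / 2
for z in itertools.product((0, 1), repeat=n):
    sigma = [2 * zi - 1 for zi in z]          # sigma_i = 2 z_i - 1
    assert abs(cut_value(w, n, z) - (const - ising_energy(w, n, sigma))) < 1e-12
```

In particular, maximizing the cut is exactly equivalent to minimizing the Ising energy, which is why the ground state of H_C encodes the optimum.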
In analogy with classical approximation algorithms, quantum approximate optimization yields approximate solutions in the form of a state |ϕ⟩ whose energy expectation is as close as possible to the ground-state energy of the Ising Hamiltonian. Thus, the approximation ratio, Eq. (2), takes the form

α = ⟨ϕ|H_C|ϕ⟩ / E^C_min,    (4)

where E^C_min is the smallest eigenvalue of H_C. To achieve advantageous performance, a quantum algorithm must produce an approximate solution with a desired α faster than any classical algorithm.
The Quantum Approximate Optimization Algorithm (QAOA) is a type of variational algorithm [2] that aims to solve combinatorial optimization problems [4]. It is defined by a parametrized quantum circuit with a periodic structure. Each layer of the circuit is given by a product of two unitaries: time evolution under H_C, followed by time evolution under a mixer Hamiltonian

H_M = Σ_j X_j,    (5)

where X_j is a Pauli-x on the j-th qubit. For p layers, QAOA prepares the state

|ψ(γ, β)⟩_p = Π_{l=1}^{p} e^{−iβ_l H_M} e^{−iγ_l H_C} H^{⊗N} |0⟩^{⊗N},    (6)

where γ = γ_1, ..., γ_p, β = β_1, ..., β_p, and H is the Hadamard gate. In order to find approximate solutions, the set of 2p parameters is optimized so as to minimize ⟨ψ(γ, β)|H_C|ψ(γ, β)⟩_p. After executing the circuit with optimized parameters, a measurement in the computational basis returns a candidate solution in the form of a bit string z*. Ideally, one would sample a good approximate solution with high probability. We denote the optimal parameters found by numerical experiments as (γ*, β*), and the associated solution unitary U(γ*, β*) = Π_l e^{−iβ*_l H_M} e^{−iγ*_l H_C}. Not much is known regarding performance guarantees and hardness of approximation for QAOA. The case of constant p has so far been the main focus, as it is the regime of interest for current quantum devices [47].
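For small N, the state in Eq. (6) can be simulated with a dense statevector. The following sketch implements a single QAOA layer for a triangle graph in plain numpy (the names and the naive grid search are ours, not the paper's Yao.jl implementation):

```python
import itertools
import numpy as np

n = 3
w = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}   # unweighted triangle
dim = 2 ** n

# Diagonal of H_C = sum_{i<j} (w_ij / 2) Z_i Z_j; qubit q lives in bit q.
diag = np.array([sum(0.5 * wij
                     * (1 - 2 * ((idx >> i) & 1))
                     * (1 - 2 * ((idx >> j) & 1))
                     for (i, j), wij in w.items())
                 for idx in range(dim)])

X = np.array([[0.0, 1.0], [1.0, 0.0]])

def qaoa_energy(gamma, beta):
    """<H_C> after one layer e^{-i beta H_M} e^{-i gamma H_C} on |+>^n."""
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)   # H^{(x)n} |0...0>
    psi = np.exp(-1j * gamma * diag) * psi                # diagonal cost layer
    # e^{-i beta sum_j X_j} factorizes into a tensor power of e^{-i beta X}.
    u = np.cos(beta) * np.eye(2) - 1j * np.sin(beta) * X
    U = u
    for _ in range(n - 1):
        U = np.kron(U, u)
    psi = U @ psi
    return float(np.real(np.vdot(psi, diag * psi)))

grid = np.linspace(0, np.pi, 40)
best = min(qaoa_energy(g, b) for g, b in itertools.product(grid, grid))
# One layer already improves on the |+>^n starting point (energy 0), and the
# energy is bounded below by the ground-state energy of H_C.
assert diag.min() <= best < 0.0
```

A real implementation would of course replace the grid search with a classical optimizer over (γ, β), as described above.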
Ref. [48] gives evidence for a possible quantum advantage for MaxCut on 3-regular graphs with shallow QAOA. Refs. [49,50] provide evidence that both p = 1 and large-depth QAOA output states with bit-string probabilities following Boltzmann distributions, rendering sampling classically hard. At the same time, it is known that constant-p QAOA is bounded away from optimality on sparse graphs [51][52][53], as well as on some dense problems where the overlap gap property [54] is known to exist [55]. These results were recently extended to the case of p ∼ log(N) [56]. However, these results do not apply directly to the case of p ∼ poly(N). As a consequence, there are no conclusive results on the runtime required for p ∼ poly(N) QAOA to reach a given approximation ratio, with only loose lower bounds appearing recently [57]. Most studies of QAOA so far have been numerical experiments on different families of problem instances; two examples are Erdös-Rényi graphs [6] and 3- and 4-regular graphs (weighted and unweighted) [58]. Importantly, any indication of a putative advantage in this type of study has been inconclusive due to the small problem sizes accessible to either quantum implementations or classical simulation [59].
To alleviate some of the roadblocks explored above, variants of the original QAOA ansatz have been developed; see Ref. [60] for a review. Of interest to us here is the ADAPT [61] variant, which was proposed as a way to find ansätze that are tailored to the specifics of the problem under consideration. ADAPT-QAOA is an iterative variational algorithm which replaces the fixed mixer Hamiltonian in Eq. (6) by a suitably chosen one, A_l, at each layer l ≤ p. Thus, p-layer ADAPT-QAOA prepares the state

|ψ(γ, β)⟩_p = Π_{l=1}^{p} e^{−iβ_l A_l} e^{−iγ_l H_C} H^{⊗N} |0⟩^{⊗N}.    (7)

The l-th mixer Hamiltonian is chosen as the one which maximizes the energy gradient, that is,

A_l = argmax_{A ∈ P_OP} |⟨ψ_{l−1}| e^{iγ_l H_C} [H_C, A] e^{−iγ_l H_C} |ψ_{l−1}⟩|,    (8)

where the new variational parameter γ_l is set to a predefined small positive value γ_0 ∼ 0 [61], P_OP is an operator pool, and |ψ_{l−1}⟩ is the state resulting from the application of the ADAPT-QAOA solution circuit with only l − 1 layers. The choice of pool is not unique, with different pools being advantageous in different situations [62,63]. Below we restrict ourselves to a pool which is sufficient for our purposes.
In contrast to QAOA, ADAPT-QAOA grows the circuit layer by layer until the desired number of layers p is reached. As such, we begin with a single layer, find the corresponding mixer according to Eq. (8), then optimize to find the best parameters. We then add a second layer, find the corresponding mixer according to Eq. (8), initialize the new pair of parameters to zero [64] and the rest of the parameters to the best values already found, and optimize all of them. This procedure is repeated until p layers are added. For a fair comparison between QAOA and ADAPT-QAOA in our numerical simulations, we construct the QAOA solution circuit following the same iterative strategy, but with a fixed mixer.

C. Clifford circuits and their efficient simulation
In this subsection we review some concepts of the stabilizer formalism which will be used later in the manuscript. For a general presentation see Ref. [65].
The single-qubit Pauli group is given by the operators {I, X, Y, Z} together with the multiplicative factors ±1, ±i. The N-qubit Pauli group P̂_N is given by all N-fold tensor products of these operators together with the multiplicative factors. Given a pure state |ψ⟩ on N qubits, we say that P̂_i ∈ P̂_N stabilizes |ψ⟩ if the state is an eigenvector of P̂_i with eigenvalue +1: P̂_i|ψ⟩ = |ψ⟩. An N-qubit pure state is a stabilizer state if it can be completely specified, up to a global phase, by its N stabilizers.
Quantum circuits which map stabilizer states to stabilizer states define a large class of nontrivial quantum circuits, called stabilizer circuits, which can be simulated in polynomial time on a classical computer [22,23]. This is the content of the celebrated Gottesman-Knill theorem [20,21]. These quantum circuits can be written entirely in terms of controlled-NOT, Hadamard, and phase gates, and single-qubit measurements. Importantly, efficient classical simulability does not imply these circuits are uninteresting. On the contrary, they have extensive applications in quantum information science, for instance encoding and decoding in quantum error correction [19,21,66,67], dense quantum coding [68], quantum teleportation [69], quantum simulation [70,71], proofs of principle of quantum advantage with nonlocal games [72,73], as well as in quantum many-body physics [74][75][76][77][78].
In the absence of measurements, stabilizer circuits are referred to as Clifford circuits or Clifford unitaries. They form a group C, defined as the set of unitaries which normalize the Pauli group, that is, the unitaries which map Pauli operators to Pauli operators. Following from the Gottesman-Knill theorem, this group has three generators: the controlled-NOT, Hadamard, and phase gates. Naturally, the Pauli operators are elements of the group, as they are generated by the Hadamard and phase gates.
Both the QAOA and ADAPT-QAOA ansätze are defined as products of unitaries generated by Pauli strings. When are unitary transformations generated by Pauli strings Clifford unitaries? To answer this question, take P̂_i, P̂_j ∈ P̂_N, two distinct Pauli strings, which by definition either commute or anticommute. Further consider the unitary W(θ) = e^{−iθP̂_j}. Then

W†(θ) P̂_i W(θ) = P̂_i if [P̂_i, P̂_j] = 0,
W†(θ) P̂_i W(θ) = cos(2θ) P̂_i + i sin(2θ) P̂_j P̂_i if {P̂_i, P̂_j} = 0,    (9)

and in the anticommuting case the image reduces to a single Pauli string, ±iP̂_jP̂_i, precisely when θ = ±mπ/4 with m an odd natural number (for even m it is ±P̂_i). We thus see that when a quantum circuit is composed of products of unitaries generated by Pauli strings that do not necessarily commute, it is a Clifford circuit if and only if the parameters of these transformations are integer multiples of ±π/4. Therefore, if QAOA solution circuits U(γ*, β*) are to be Clifford, the circuit parameters γ* and β* must be integer multiples of ±π/4.
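This conjugation rule is easy to verify numerically. The sketch below checks, with plain numpy, that e^{−iθ Y⊗Z} maps X⊗I to a single Pauli string exactly at θ = π/4 but not at a generic angle:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

Pi = np.kron(X, I2)     # X (x) I
Pj = np.kron(Y, Z)      # Y (x) Z; anticommutes with Pi

def W(theta):
    """e^{-i theta Pj}; since Pj^2 = I this is cos(theta) I - i sin(theta) Pj."""
    return np.cos(theta) * np.eye(4) - 1j * np.sin(theta) * Pj

# Clifford angle: the image of Pi is again a single Pauli string, Z (x) Z.
out = W(np.pi / 4).conj().T @ Pi @ W(np.pi / 4)
assert np.allclose(out, np.kron(Z, Z))

# Generic angle: the image is a sum of two Pauli strings, hence not Clifford.
out_generic = W(0.3).conj().T @ Pi @ W(0.3)
paulis = [np.kron(a, b) for a in (I2, X, Y, Z) for b in (I2, X, Y, Z)]
assert not any(np.allclose(out_generic, s * P)
               for s in (1, -1, 1j, -1j) for P in paulis)
```

Note that the Clifford-angle image X⊗I → Z⊗Z is exactly the gate action exploited by the algorithm introduced below.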

D. Characterizing the structure of solution unitaries
Here we introduce the tools we use in the next section to characterize the structure of QAOA solution circuits U(γ*, β*). Consider the Hilbert space H of N qubits, with dimension d = 2^N, and define the N-qubit Pauli basis as P_N = P̂_N/⟨±iI⟩, the quotient group containing D = 4^N − 1 Pauli strings, all with multiplicative factor equal to +1. Furthermore, any pair of Pauli strings obeys Tr[P_i P_j] = dδ_{ij}. Therefore, P_N defines a basis for all Hermitian operators on H.
Consider some Hermitian operator O acting on H. If O evolves under some unitary transformation V, we write

O′ = V†OV = Σ_j c_j P_j,    (10)

with

p_j(O; V) = (1/N_O) (Tr[O′P_j]/d)^2,    (11)

where N_O is a normalization factor. It is easy to see that Σ_j p_j = 1. Eq. (11) thus denotes the probability of finding O′ to be the j-th Pauli string P_j. In the case of O = P_l, the normalization factor in Eq. (11) equals one. We analyze the transformation V as an "input-output" channel, with O the input and O′ the output, and we are interested in characterizing the locality, in the Pauli basis, of the output. This can be inferred from the localization properties of p_j(O; V), which we investigate with the second Renyi entropy (see Ref. [79] and Sec. 2.7 of Ref. [80]),

S(O; V) = −log Σ_j p_j(O; V)^2.    (12)

Eq. (12) bears a resemblance to the stabilizer Renyi entropy [81]. Although the latter quantifies the nonstabilizerness of a multi-qubit state, the expression in Eq. (12) directly looks at the non-Cliffordness of the transformation. As such, one might interpret it as the operator-space counterpart of the stabilizer Renyi entropy, and we expect both quantities to behave similarly: if the stabilizer Renyi entropy of a multi-qubit state V|ψ⟩^{⊗N} is high (low), then Eq. (12) for some input Pauli string O and the same unitary V will be high (low).

The P_l ∈ P_N can be ordered by their "weight", i.e., the number of nonidentity elements in the Pauli string. This ordering allows us to systematically study the Clifford character of the transformation V on Pauli strings. Naturally, the first step is to check it for strings of weight one, which is done by setting O = Y_n, where Y_n denotes a Pauli string with a Pauli-y at the n-th qubit position and identity everywhere else. In particular, we denote S(Y_n; V) = S_n(V). Since we can place the initial Pauli-y at any of the N positions representing the nodes of the graph, we consider the node-averaged Renyi entropy of the operator distribution, S̄ = (1/N) Σ_n S_n(V), as our figure of merit. Since Clifford unitaries map Pauli strings to Pauli strings, S(O; V) = 0 for all O ∈ P_N. Since we are only checking the behavior of V as a "channel" for Pauli strings localized on one qubit, a vanishing S̄ is necessary (but not sufficient) for V to be Clifford. We thus use S̄ as a witness of Cliffordness. We supplement this witness with an examination of the optimal parameters (γ*, β*). Observing γ*, β* = ±mπ/4 with m ∈ N then provides the sufficient condition for V to be Clifford. This observation is made quantitative via the distance of the vector of parameters v to the discrete set of interest. We define this distance as

D(v) = Σ_l (8/π) min_{m ∈ Z} |v_l − mπ/4|,    (13)

where the normalization by π/8, the largest possible distance to the nearest multiple of π/4, ensures that each term in the sum is bounded to the interval [0, 1]; thus 0 ≤ D(v) ≤ |v|. The instance-averaged distances E[D(γ*)] → 0 and E[D(β*)] → 0 will then signal solution circuits which are close to Clifford.
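The distance of a parameter vector to the nearest Clifford values takes only a few lines; a minimal sketch, assuming the normalization π/8 (the largest possible distance to the nearest multiple of π/4):

```python
import numpy as np

def clifford_distance(v):
    """Summed distance of each parameter to the nearest multiple of pi/4,
    with each term normalized by pi/8 so it lies in [0, 1]."""
    v = np.asarray(v, dtype=float)
    # Fold each entry into (-pi/8, pi/8] around the nearest multiple of pi/4.
    residues = np.abs((v + np.pi / 8) % (np.pi / 4) - np.pi / 8)
    return float(np.sum(residues / (np.pi / 8)))

# Clifford parameter values give distance zero...
assert clifford_distance([0.0, -np.pi / 4, np.pi / 2]) < 1e-12
# ...and pi/8 is the worst case, contributing exactly 1 per entry.
assert abs(clifford_distance([np.pi / 8]) - 1.0) < 1e-12
```

The per-instance averages reported below would then simply be means of this quantity over the optimized γ* and β* vectors.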

III. ORIGIN OF THE ADAPT-CLIFFORD ALGORITHM
To understand the origin of the ADAPT-Clifford algorithm, it is instructive to examine the solution circuits obtained with QAOA and ADAPT-QAOA for MaxCut on small weighted complete graphs. We implemented both variational algorithms in the extensible Julia framework Yao.jl [82] and used the COBYLA optimizer. The analysis of the operator distribution in the Pauli basis was implemented using QuantumOptics.jl [83].
We consider first the case of graphs with positive weights, with ω_{i,j} drawn either from U[0, 1], where by U[a, b] we denote the uniform distribution on the interval [a, b], or from Exp(1), the exponential distribution with mean 1. In Fig. 1a we show the mean approximation ratios for 50 problem instances with N = 6, for circuits of up to p = 10 layers. Similar to the observation in Ref. [61], ADAPT-QAOA (green diamonds and circles in Fig. 1a) finds a solution arbitrarily close to the exact solution at sufficiently high but finite p, away from the p → ∞ limit where QAOA is guaranteed to reach the exact solution. In Fig. 1b we show the expectation value of S̄ over instances, E[S̄]. The ADAPT-QAOA solution circuits that lead to α → 1 in Fig. 1a display E[S̄] → 0, indicating they might be Clifford circuits. In contrast, the QAOA solution unitaries show E[S̄] > 0 with a tendency towards the typical value, −log(4^{−N}), with increasing depth, in agreement with previous works using other indicators [84][85][86].
To verify the Cliffordness of the ADAPT-QAOA solution circuits, we examine the optimized parameters, (γ*, β*), at p = 10. An example is shown in Fig. 1c, where dashed and solid lines correspond to γ* and β*, respectively. We observe γ* → 0 and β* → −π/4 in all layers, and the mixer Hamiltonians selected by the adaptive step are almost always Y_l Z_m for some pair of qubits (l, m). Furthermore, the distances of the optimal parameters for p = 10 to ±sπ/4 with s ∈ N, averaged over all instances with α → 1 (ω_{i,j} ∈ U[0, 1]), are E[D(γ*)] = 0.522 ± 0.270 and E[D(β*)] = 0.247 ± 0.322, indicating the optimized parameters are close, on average, to the Clifford values. This is to be contrasted with E[D(γ*)] = 3.53 ± 0.94 and E[D(β*)] = 3.41 ± 0.62 for the optimized parameters of the QAOA solution circuits with p = 10. Finally, we extensively checked that the properties of the ADAPT-QAOA solution unitary discussed here do not change as long as the edge weights are all positive.
Next we consider the case of signed weights, with ω_{i,j} sampled either from U[−1, 1] or from N(0, 1), the normal distribution with mean 0 and variance 1. In Fig. 1a we compare the averaged approximation ratio of the ADAPT-QAOA solutions with that of the QAOA solutions for the same problem instances. Similar to the case of strictly positive weights, ADAPT-QAOA solutions get arbitrarily close to the exact solution when enough layers are considered. As seen in Fig. 1b, E[S̄] ≠ 0 for the ADAPT-QAOA solutions (green crosses and squares). Although the circuits found are therefore not Clifford, E[S̄] ∼ 1 at p = 10 for the small problem size under study, in contrast to the QAOA solution circuits (purple crosses and hexagons), for which E[S̄] tends towards the typical value.
The small value of E[S̄] for the ADAPT-QAOA solutions raises the question: how far are these solutions from the Clifford manifold? To answer this, in Fig. 1d we show the optimized parameters (γ*, β*) found for one of the problem instances solved. The β*'s are either 0 or −π/4, indicating the mixer unitaries are Clifford, with mixer Hamiltonians almost always Y_l Z_m for some pair of qubits (l, m), and most of the γ*'s are zero, with only a few, ∼2, being nonzero. Furthermore, the distances of the optimized parameters for p = 10 to ±sπ/4 with s ∈ N, averaged over all instances with α → 1 (ω_{i,j} ∈ N(0, 1)), are E[D(γ*)] = 1.51 ± 0.72 and E[D(β*)] = 0.83 ± 0.96, indicating that, on average, the solution circuits are further away from the Clifford manifold than in the case of ω_{i,j} > 0. This is to be contrasted with E[D(γ*)] = 2.77 ± 0.78 and E[D(β*)] = 2.75 ± 0.64 for the optimal parameters of the QAOA solution circuits with p = 10. Therefore, the overall structure of the mixer unitaries of the ADAPT-QAOA U(γ*, β*) found for positive ω_{i,j} persists when the ω_{i,j} are signed, complemented with a nontrivial non-Clifford action of a few of the cost layers. We have checked that this structure is common to all ADAPT-QAOA solutions reaching α → 1.
We summarize the observations of this section:

• The mixer part of all layers is Clifford, with parameters either 0 or −π/4. The mixer Hamiltonian at a given step is of the form Y_l Z_m for some pair of qubits (l, m).

• The cost part of most layers acts trivially, with parameters equal to 0.

• Only N steps are required to find an approximate solution. Consequently, only N mixer layers of the form described in the first point are needed.

IV. ADAPT-CLIFFORD APPROXIMATION ALGORITHM FOR MAXCUT
A bit string z* is a good approximate solution to MaxCut if α(z*) is as close to 1 as possible. Thus, finding good approximate solutions to this problem using only Clifford circuits means preparing a stabilizer state |Ψ⟩ whose energy expectation satisfies |⟨Ψ|H_C|Ψ⟩ − E^C_min| ≤ ϵ, with ϵ a small positive constant, ideally equal to 0. A measurement in the computational basis then returns z* with the desired value of α.
Consider the bit string z′ which maximizes the cost in Eq. (1). A stabilizer state satisfying the conditions discussed above is

|Ψ′⟩ = (1/√2)(|z′⟩ − |z̄′⟩),    (15)

where z̄′ is the complement of z′, and we have chosen the state to be antisymmetric under the Ising symmetry, [H_C, X^{⊗N}] = 0, of the cost Hamiltonian. The state |Ψ′⟩ is completely determined by its N stabilizers. One of them is −X_1X_2X_3...X_N, while the remaining N − 1 are of ZZ type, and their signs encode the maximal cut of the graph. In this setting, an approximation algorithm based on Clifford circuits must be able to determine an assignment of the signs of the ZZ stabilizers leading to either z′ or a z* with α(z*) as close to one as possible.
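Because H_C is diagonal and the Ising symmetry makes a bit string and its complement degenerate, the antisymmetric superposition of the two has exactly the Ising energy of the bit string itself. A small numpy check (our own qubit-ordering conventions):

```python
import numpy as np

n, w = 3, {(0, 1): 1.0, (1, 2): 2.0, (0, 2): -0.5}

def diag_HC():
    """Diagonal of H_C = sum_{i<j} (w_ij / 2) Z_i Z_j; qubit q is bit q."""
    d = np.zeros(2 ** n)
    for idx in range(2 ** n):
        b = [(idx >> q) & 1 for q in range(n)]
        d[idx] = sum(0.5 * wij * (1 - 2 * b[i]) * (1 - 2 * b[j])
                     for (i, j), wij in w.items())
    return d

d = diag_HC()
z = 0b011                       # some assignment
zbar = z ^ (2 ** n - 1)         # bitwise complement
assert abs(d[z] - d[zbar]) < 1e-12   # Ising symmetry: degenerate pair

psi = np.zeros(2 ** n)
psi[z], psi[zbar] = 1 / np.sqrt(2), -1 / np.sqrt(2)
assert abs(psi @ (d * psi) - d[z]) < 1e-12   # <Psi|H_C|Psi> = E(z)
```

In other words, preparing the superposition state is energetically equivalent to preparing either basis state, which is why the stabilizer form above loses nothing relative to a classical assignment.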

A. Details of the algorithm
We design ADAPT-Clifford to exploit the observations summarized at the end of Sec. III to prepare a stabilizer state |Ψ⟩ with the general form given in Eq. (15). This is done in a greedy manner: after the choice of an initial seed, at every step the best local update is performed. At an intermediate step 0 < r ≤ N we label qubits as active and inactive. A qubit is active if a Pauli gate has been applied to it; otherwise it is inactive. We use a^{(r)} ∈ 𝐚^{(r)} and b^{(r)} ∈ 𝐛^{(r)} for indices denoting the positions of individual active and inactive qubits, respectively, where 𝐚^{(r)} and 𝐛^{(r)} are vectors storing the positions of all the active and inactive qubits at step r.
ADAPT-Clifford prepares |Ψ⟩ starting from the k-th qubit and grows this entangled state qubit by qubit, in such a way that at step r the state is a product of two parts: an entangled state of all the |𝐚^{(r)}| active qubits, and all the |𝐛^{(r)}| inactive qubits in the product state |+⟩^{⊗|𝐛^{(r)}|}. To specify the pair (a^{(r)}, b^{(r)}) of qubit indices at each step, we use a "gradient" criterion similar to that of ADAPT-QAOA. Specifically, at steps r > 1 we compute

g^{(r)}_{a^{(r−1)}, b^{(r−1)}} = Σ_{l : (l, b^{(r−1)}) ∈ E} ω_{l, b^{(r−1)}} ⟨Z_l X_{b^{(r−1)}} Z_{a^{(r−1)}}⟩_{r−1},    (16)

where ⟨.⟩_{r−1} = ⟨ψ_{r−1}|.|ψ_{r−1}⟩ is taken on the state at step r − 1. We then choose the pair of qubits (a^{(r−1)}, b^{(r−1)}) that maximizes g^{(r)}. The case of r = 1 is special, and we discuss it below alongside the steps of the algorithm.
ADAPT-Clifford returns a candidate maximal cut z* of a graph G after completing the following N steps:

0. At step r = 0 we begin by selecting a position k and preparing the product state

|ψ_0⟩ = Z_k H^{⊗N} |0⟩^{⊗N} = |+⟩^{⊗(k−1)} ⊗ |−⟩_k ⊗ |+⟩^{⊗(N−k)}.

At this point the active and inactive qubits are 𝐚^{(0)} = {k} and 𝐛^{(0)} = {1, .., N}\{k}.
1. At step r = 1, given that 𝐚^{(0)} = {k}, we can find the largest gradient analytically; in fact, max_{b^{(0)}} g^{(1)}_{k,b^{(0)}} = max_{b^{(0)}} [ω_{k,b^{(0)}}], thus the pair we are looking for is the edge (k, j) of G with j = argmax_{b^{(0)}} ω_{k,b^{(0)}}. After applying the gate e^{iπ/4 Y_k Z_j}, the state is

|ψ_1⟩ = (1/√2)(|0⟩_k|1⟩_j − |1⟩_k|0⟩_j) ⊗ |+⟩^{⊗(N−2)}.

The vectors of active and inactive qubits are updated to 𝐚^{(1)} = {k, j} and 𝐛^{(1)} = {1, .., N}\{k, j}, respectively.
2. For r = 2, ..., N − 1, we find the pair of qubits (l̃, b^{(r−1)}), with l̃ ∈ {k, j}, which maximizes g^{(r)}_{l̃, b^{(r−1)}}; apply the gate e^{iπ/4 Z_{l̃} Y_{b^{(r−1)}}}; and update the vectors of active and inactive qubits. In the case of more than one pair (l̃, b^{(r−1)}) leading to the same largest value of g^{(r)}, we break the tie arbitrarily.
3. After all N steps are completed, we perform a measurement in the computational basis. From the output bit string z_out we read out the approximate maximal cut of the graph as A = {i : z_out,i = 1} and Ā = {i : z_out,i = 0}.

After the above N steps are completed, the resulting stabilizer state |Ψ⟩ encoding the solution has the form

|Ψ⟩ = (1/√2)(|z*⟩ − |z̄*⟩).

While it may seem that restricting the search to pairs of the form (l̃, b^{(r−1)}) in step 2 may lead to missing the true largest gradient, in App. B we show that this is not the case. Furthermore, this restriction has a simple interpretation. After step r = 1, we have effectively selected the edge (k, j) as a reference with respect to which we are going to partition the graph. Nodes k and j are thus representatives of the disjoint subsets of the cut. Thus, from that step onward, we can pick a^{(r−1)} ∈ {k, j} without loss of generality in order to decide which qubit to move into the active set, i.e., to include in the entangled state.
Some further comments are in order: (i) Given the type of two-qubit gate we are considering, the form of the initial product state |ψ_0⟩ is chosen so as to guarantee that max_{b^{(0)}} [g^{(1)}_{k,b^{(0)}}] is positive. (ii) For r > 1, and independently of the graph connectivity, not all the terms in the sum in Eq. (16) are nonzero; in fact, the expectation values in g^{(r)} are nonzero only for those values of l for which ±Z_l Z_{a^{(r−1)}} is a stabilizer of |ψ_{r−1}⟩. This observation allows us to find the largest gradient without explicitly computing the expectation values in Eq. (16), as we show in App. C. At the same time, this observation establishes a direct connection between ADAPT-Clifford and a family of existing MaxCut heuristics [87,88], as was recently pointed out in Ref. [89]. (iii) The relevant two-qubit gate can be written in terms of Clifford gates, namely the phase gate S_l and Hadamard gate H_l acting on the l-th qubit, the controlled-NOT gate CNOT_{l,m}, with qubits l and m as control and target, respectively, and R^{(l)}, a variant of the Hadamard gate which swaps the y- and z-axes. For the interested reader, we work through the operations of our algorithm for two small examples in App. F.
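Since the relevant expectation values are ±1 correlations between active qubits, the greedy loop admits a purely classical reading. The sketch below is our own rendition under that assumption, not the paper's Stim implementation: side labels s_i = ±1 play the role of the ZZ correlations, each inactive node is attached opposite to the side it is most strongly tied to, mirroring the gradient rule of Eq. (16), and ties are broken toward +1:

```python
import numpy as np

def adapt_clifford_cut(w, k):
    """w: symmetric (n x n) weight matrix; k: seed node.
    Returns side labels s_i in {+1, -1} encoding the candidate cut."""
    n = w.shape[0]
    s = np.zeros(n)                               # 0 marks inactive nodes
    masked = np.where(np.arange(n) == k, -np.inf, w[k])
    j = int(np.argmax(masked))                    # step r = 1: heaviest edge at k
    s[k], s[j] = 1.0, -1.0                        # k and j seed opposite sides
    for _ in range(n - 2):
        inactive = np.flatnonzero(s == 0)
        field = w[:, inactive].T @ s              # sum_l w_{l,b} s_l per inactive b
        idx = int(np.argmax(np.abs(field)))
        b, val = inactive[idx], field[idx]
        s[b] = -1.0 if val > 0 else 1.0           # attach against its net pull
    return s

def cut_value(w, s):
    n = w.shape[0]
    return sum(w[i, j] for i in range(n) for j in range(i + 1, n) if s[i] != s[j])

w = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # unit triangle
s = adapt_clifford_cut(w, 0)
```

On the unit-weight triangle this recovers an optimal cut of value 2. The sketch runs in O(N^2) per seed for dense matrices; the actual algorithm instead performs these updates on a stabilizer tableau.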

A stabilizer perspective on the algorithm
We can gain further understanding of the inner workings of the algorithm by looking at the way the stabilizers of the state change from step r = 0 to step r = N − 1. At step r = 0, the product state |ψ_0⟩ has N − 1 stabilizers equal to X_l, l = 1, .., N, l ≠ k, and the remaining stabilizer equal to −X_k. At step r = 1 the action of the gate between qubits (k, j), with j found as described previously, increases the weight of the −X stabilizer by one and replaces one of the +X stabilizers with a ZZ stabilizer. The state |ψ_1⟩ is hence stabilized by −X_k X_j and −Z_k Z_j (with identities on all other positions), while the remaining N − 2 stabilizers are still X_l with l ≠ k, j. This process continues until r = N − 1; with every new gate the weight of the −X stabilizer increases by one and one of the +X stabilizers gets replaced by a ZZ stabilizer. We see then that the Clifford gate e^{iπ/4 Y_l Z_m} was not chosen arbitrarily: ADAPT-QAOA finds it because it is the gate that maps X_l to Z_l Z_m.
As such, we can phrase the goal of the algorithm as the correct assignment of the signs of the ZZ stabilizers. After all N − 1 steps are completed, the state |ψ_{N−1}⟩ has one stabilizer equal to −X_1X_2X_3...X_N, and the remaining N − 1 stabilizers are of ZZ type with signs that were determined in the previous steps. If this sign assignment is done correctly, it encodes the approximate maximal cut produced by the algorithm. One can read it out directly by setting the value of any one spin to either +1 or −1 arbitrarily and using the measured signs of the ZZ stabilizers to fix the values of the spins at the other N − 1 positions relative to the first one.
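The readout rule can be sketched as sign propagation over the measured ZZ stabilizers. Function and variable names are ours, and we assume the stabilizer pairs connect all qubits (which holds by construction, since every new qubit is entangled with an already active one):

```python
def spins_from_zz_signs(n, zz):
    """zz: list of (a, b, sign) with sign in {+1, -1}, meaning sign * Z_a Z_b
    stabilizes the state, i.e. s_a * s_b = sign.  Spin 0 is fixed to +1."""
    s = {0: +1}
    pending = list(zz)
    while pending:
        a, b, sign = pending.pop(0)
        if a in s:
            s[b] = sign * s[a]
        elif b in s:
            s[a] = sign * s[b]
        else:
            pending.append((a, b, sign))   # neither end fixed yet; retry later
    return [s[i] for i in range(n)]

# Chain 0-1-2 stabilized by -Z0Z1 and +Z1Z2:
assert spins_from_zz_signs(3, [(0, 1, -1), (1, 2, +1)]) == [1, -1, -1]
```

The arbitrary choice s_0 = +1 reflects the Ising symmetry: flipping every spin yields the same cut.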

B. Runtime and space complexities
Although finding the qubit with the largest gradient at a given step does not require the explicit computation of expectation values of Pauli strings (see App. C for details), it is defined through a double sum with indices running over portions of the vertex set. As such, this part of the algorithm incurs the leading runtime cost.
At step r > 1, and before applying the two-qubit gate, there are r − 1 active qubits and N − r + 1 inactive qubits. In order to decide on which pair of qubits to act with the gate, we compute Eq. (16) via Eq. (C3) for all pairs (l̃, b^{(r−1)}) where l̃ ∈ {k, j} and b^{(r−1)} ∈ 𝐛^{(r−1)}. There are 2(N − r + 1) such pairs. For a given pair, the sum in Eq. (16) runs over all l such that (l, b^{(r−1)}) ∈ E. However, the expectation value ⟨Z_l X_{b^{(r−1)}} Z_{l̃}⟩_{r−1} is nonzero only when l ∈ 𝐚^{(r−1)}. Hence, at step r, there are δ = min(r − 1, K), with K the maximum degree of the graph, nonzero terms in the sum. For bounded-degree graphs, such as K-regular graphs, δ = O(K) at most, whereas for dense graphs with K = O(N), δ = O(N).
For a fixed initial position k, the algorithm executes N − 1 steps before reaching a candidate solution. The total number of nonzero terms involved in the computation of the largest gradients is

Σ_{r=2}^{N−1} 2(N − r + 1) min(r − 1, K) = O(KN^2) and O(N^3)

for bounded-degree and dense graphs, respectively, leading to a runtime complexity of O(N^2) for bounded-degree graphs (constant K) and O(N^3) for dense graphs.
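This counting is easy to check numerically: summing 2(N − r + 1) min(r − 1, K) over the steps reproduces the quoted scalings (a quick self-contained check, not from the paper):

```python
def total_terms(N, K):
    """Total nonzero gradient terms over all greedy steps r = 2, ..., N - 1:
    2(N - r + 1) candidate pairs per step, min(r - 1, K) terms per pair."""
    return sum(2 * (N - r + 1) * min(r - 1, K) for r in range(2, N))

# Dense graphs (K = N - 1): doubling N multiplies the work by ~2^3 = 8.
ratio_dense = total_terms(200, 199) / total_terms(100, 99)
assert 6 < ratio_dense < 10

# Bounded degree (K = 3): doubling N multiplies the work by ~2^2 = 4.
ratio_sparse = total_terms(200, 3) / total_terms(100, 3)
assert 3 < ratio_sparse < 5
```

The per-term work on the tableau is O(1) amortized in our accounting, so these counts translate directly into the O(N^2) and O(N^3) runtimes per seed quoted above.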
Since in general the initial position k leading to the best approximate solution is not known, we propose and explore two complementary approaches. In the first approach, we choose the initial position at random. This algorithm, which we refer to as randomized ADAPT-Clifford, leads to the runtime complexities of O(N^2) and O(N^3) for bounded-degree and dense graphs, respectively, described above. Second, we introduce a deterministic version, deterministic ADAPT-Clifford, where the best initial position k* is determined by exhaustive search. That is, we run ADAPT-Clifford N times, each time with a different initial position k, and return the cut of lowest energy found. The runtime complexity of this deterministic approach is thus O(N^3) and O(N^4) for bounded-degree and dense graphs, respectively. Naturally, the deterministic approach is guaranteed to return solutions of equal or lower energy expectation than the randomized approach, at the cost of a larger runtime. Whether there exist graph families for which any initial position is as good as any other is a question for future work. Finally, it is easy to see that for both the randomized and deterministic approaches the space complexity of the algorithm is O(N^2), corresponding to the memory required to store the tableau.

V. ALGORITHM PERFORMANCE ON WEIGHTED COMPLETE GRAPHS
We have implemented the ADAPT-Clifford algorithm using the fast stabilizer circuit simulator Stim [23]. Our implementation is available at [90]. Although we have chosen this simulator to implement our algorithm, any stabilizer circuit simulator which supports interactivity, that is, where expectation values of Pauli strings can be computed and the circuit modified according to the results, could be used to implement the algorithm. We follow the presentation of Sec. III and discuss separately our algorithm's performance for MaxCut on weighted complete graphs with positive and signed weights. For the latter case, we focus on the Sherrington-Kirkpatrick model.

A. The case of positive weights
The results of Sec. III indicate that the precise choice of positive weight distribution may be immaterial. We have verified numerically that this is indeed the case for a few different weight distributions. In this subsection, we focus the discussion on ω_i,j sampled from U[0, 1] and leave an exhaustive investigation for future work.
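For concreteness, a minimal sketch (not the paper's code) of the instances used in this subsection, together with the relation between the weight of a cut and the Ising energy minimized by the algorithm:

```python
import random

# Weighted complete graph with w_ij ~ U[0,1], and evaluation of a +/-1
# assignment. Cut weight and Ising energy E(s) = sum_{i<j} w_ij s_i s_j are
# related by cut = (W - E)/2, with W the total edge weight, so minimizing
# the energy maximizes the cut.
def random_complete_graph(N, seed=0):
    rng = random.Random(seed)
    return {(i, j): rng.random() for i in range(N) for j in range(i + 1, N)}

def cut_weight(w, s):
    return sum(wij for (i, j), wij in w.items() if s[i] != s[j])

def ising_energy(w, s):
    return sum(wij * s[i] * s[j] for (i, j), wij in w.items())

w = random_complete_graph(6, seed=1)
s = [1, -1, 1, -1, 1, -1]
W = sum(w.values())
assert abs(cut_weight(w, s) - (W - ising_energy(w, s)) / 2) < 1e-12
```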
We begin by studying the performance of the randomized approach. We draw a parallel between the random initialization of ADAPT-Clifford and the rounding step of GW, and thus assess the performance of randomized ADAPT-Clifford by direct comparison with GW. We solved 100 different problem instances for graph sizes up to N = 1000 with both algorithms. In Fig. 2a we show the normalized mean minimum energy, E[E_min]/N, of the solutions obtained with randomized ADAPT-Clifford (green circles) and the ones obtained with GW (light grey circles). Notice that randomized ADAPT-Clifford almost always produces a solution of lower energy expectation than GW. These observations can be further verified with the mean difference of the minimum energies found, E[E_min^Cliff − E_min^GW], which we show in Fig. 2b. Since randomized ADAPT-Clifford consistently beats GW, we expect it to have a performance guarantee for typical instances of positively weighted complete graphs above that of GW for the general problem. We discuss the methodology used to estimate it in App. G. We find α_r ≈ 0.8986, a value which confirms our intuition and sets a lower bound for the expected performance of deterministic ADAPT-Clifford.
We now focus on the deterministic approach. First, we benchmark this algorithm for instances with size up to N = 30, for which the exact solution can be found exhaustively. Fig. 3a shows the exact approximation ratios α obtained for 100 problem instances. For these small problems, our algorithm performs, on average, above α = 0.997, with the value of the minimum α increasing as N → 30. We notice that the number of instances for which our algorithm finds the exact ground state slightly decreases with the problem size. The success rate, defined as the fraction of instances for which the algorithm finds a cut with energy E_min^Cliff − E_min^C < 10^−10, is shown in Fig. 3b as a function of the problem size. We observe a success rate ∼ 80% for N = 30.
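The exhaustive baseline used for these small instances can be sketched as follows (illustrative only; a much smaller N is used here than the N = 30 reachable in practice):

```python
import itertools
import random

# Exhaustive MaxCut over all bipartitions, used as the exact baseline for
# small instances. Fixing s_0 = 1 halves the search space, since a cut and
# its global spin flip are identical.
def exact_max_cut(w, N):
    best = 0.0
    for bits in itertools.product([1, -1], repeat=N - 1):
        s = (1,) + bits
        best = max(best, sum(wij for (i, j), wij in w.items() if s[i] != s[j]))
    return best

rng = random.Random(2)
N = 8
w = {(i, j): rng.random() for i in range(N) for j in range(i + 1, N)}
opt = exact_max_cut(w, N)
# The approximation ratio of a heuristic cut of weight c is then c / opt,
# and the success rate counts instances where c matches opt to ~1e-10.
```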
For problem sizes beyond N = 30, when we cannot access the exact value of the ground state energy, we resort to a direct comparison with the GW algorithm. We find that the cuts obtained with deterministic ADAPT-Clifford are of superior quality to those found with the standard GW algorithm. To sharpen the comparison, we systematically increase the number of times I the rounding step is performed in GW and return the best cut found; see App. A for details. The standard GW algorithm thus corresponds to I = 1. In Fig. 3c we show the normalized mean energies E[E_min]/N for 60 problem instances up to a problem size of N = 200, produced by our algorithm (orange circles), standard GW (light grey circles), and GW with I = 10^5 (black circles). Notice that our algorithm produces cuts which are always, not merely on average, better than those produced with standard GW, and only when we reach I = 10^5 does the GW algorithm begin to produce a cut whose quality is, on average, superior to that of the cut produced by our algorithm.
To further verify this observation, the inset of Fig. 3c shows the mean energy difference, E[E_min^Cliff − E_min^GW], between the solution found with our algorithm and the one found with GW, with the magenta dotted line indicating E[E_min^Cliff − E_min^GW] = 0, that is, equal quality cuts on average. It is seen that ADAPT-Clifford performs increasingly better than GW with fixed I as the problem size is increased. To quantify the approximation quality of ADAPT-Clifford, we estimate the average approximation ratio of deterministic ADAPT-Clifford on this family of graphs to be α = 0.9686; see App. G for details.
To complement our benchmarks, we performed a time-to-solution (TTS) study of GW with variable I, randomized ADAPT-Clifford, and deterministic ADAPT-Clifford; the details are shown in App. D. While Fig. 3c shows that I = 10^5 rounding steps are needed for the GW algorithm to match the approximation quality of deterministic ADAPT-Clifford for problem sizes up to N = 200, the data in the inset imply that I may in fact need to scale with N for the GW algorithm to compete with ADAPT-Clifford. However, in App. D we show that the TTS of GW remains constant with I up to I = 10^3 and only slightly increases for larger values of I; even at I = 10^5 the TTS of GW is faster than that of deterministic ADAPT-Clifford. In fact, we empirically verify the O(N^3) runtime scaling of GW (see App. D and [24, 91] and references therein).
However, we also find an empirical runtime scaling for randomized ADAPT-Clifford of O(N^2.7), indicating an advantage. The comparison here is more involved, as GW with I > 10^1 already produces a better solution than randomized ADAPT-Clifford. However, the Cholesky decomposition performed as part of GW is usually executed as a multi-core operation in most numerical linear algebra libraries, whereas our implementation of ADAPT-Clifford runs on a single core. One could then easily improve the quality of the solution without sacrificing the TTS or runtime scaling by executing the algorithm for different initial positions in parallel. We thus believe this variant of the algorithm does offer an advantage over GW.

B. Signed weights: the Sherrington-Kirkpatrick model
The Sherrington-Kirkpatrick (SK) model [92] has played a fundamental role in the advancement of the understanding of the physics of spin glasses and disordered systems [93][94][95][96]. It describes N classical spins with all-to-all couplings of both ferromagnetic and antiferromagnetic character. The Hamiltonian is given by

H_SK = (1/√N) Σ_{i<j} ω_{i,j} σ_i σ_j,  (22)

where σ_i ∈ {−1, 1} is a classical spin and the couplings ω_{i,j} are sampled from a distribution with zero mean and unit variance, for instance the normal distribution N(0, 1). A milestone result by Parisi [97, 98] gave an explicit expression for the ground state energy density of this model in the thermodynamic limit, which we refer to as the Parisi value,

lim_{N→∞} E[E_min^SK]/N = Π*,  (23)

where the expectation value is over realizations of the random couplings, and E_min^SK refers to the ground state energy of the Hamiltonian in Eq. (22). The most accurate numerical value of Eq. (23) to date was computed in Ref. [99]. The limit on the LHS of Eq. (23) has been formally shown to both exist and be equal to the Parisi value [100, 101].
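A minimal sketch of such an instance (not the paper's code), assuming the normalization H = (1/√N) Σ_{i<j} ω_ij σ_i σ_j written above:

```python
import math
import random

# Energy density E(s)/N of an SK configuration, with couplings ~ N(0,1)
# and the 1/sqrt(N) normalization assumed above. The Parisi value ~ -0.763
# is the N -> infinity ground-state energy density averaged over disorder.
def sk_energy_density(w, s, N):
    return sum(wij * s[i] * s[j] for (i, j), wij in w.items()) / (math.sqrt(N) * N)

rng = random.Random(0)
N = 50
w = {(i, j): rng.gauss(0.0, 1.0) for i in range(N) for j in range(i + 1, N)}

# A uniformly random configuration has mean energy density 0; any useful
# heuristic should land well below that, though above the Parisi value.
s = [rng.choice([1, -1]) for _ in range(N)]
print(sk_energy_density(w, s, N))
```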
Recently, the SK model has been used as a benchmark in the study of quantum approximate optimization algorithms [9, 102, 103]. Motivated by these works, we focus our attention on this model to characterize the performance of our algorithm on complete graphs with signed weights. A word of caution: the ADAPT-QAOA solution circuits for the signed case, including small instances of the SK model, are not completely Clifford; see Fig. 1b,d and Sec. III. As such, we do not expect our algorithm to match the solution quality of the best classical algorithm, due to Montanari [104, 105], which produces a σ* with energy below (1 − ϵ) times the lowest energy for typical instances, with ϵ a small positive constant [106]. Nevertheless, we are interested in seeing how close the σ*'s produced by our algorithm get to the Parisi value, both for the randomized and deterministic variants of ADAPT-Clifford.
In order to utilize ADAPT-Clifford we promote the classical spin in Eq. (22) to σ_i → Z_i and use the resulting Hamiltonian as our cost. Following the presentation of the previous subsection, we discuss first the performance of randomized ADAPT-Clifford. The green circles in the inset of Fig. 4c show E[E_min]/N for this algorithm with problem sizes up to N = 1000. To obtain its value in the thermodynamic limit we fit the data for N ∈ [40, 1000] to a model of the form qN^(−2/3) + Π_ri^Cliff [107], where Π_ri^Cliff corresponds to the mean energy density in the thermodynamic limit obtained with randomized ADAPT-Clifford. We find Π_ri^Cliff ≈ −0.682, which corresponds to

∼ 89% of the Parisi value (black star in the inset of Fig. 4c). This value is below what is obtained with convex relaxation methods, for instance semidefinite programming, which is known to give E[E_min]/N = −2/π + o(1) ≈ −0.6366, with o(1) a number which vanishes for N → ∞ [108, 109]. For comparison we display E[E_min]/N = −2/π as the horizontal dotted line both in the inset and in Fig. 4c.
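The finite-size extrapolation used above amounts to ordinary least squares in the variable x = N^(−2/3); a sketch with synthetic data (illustrative numbers, not the paper's data):

```python
# Fit mean energy densities e(N) to the model e(N) = q * N**(-2/3) + Pi,
# i.e. a straight line in x = N**(-2/3); the intercept Pi is the estimated
# thermodynamic-limit energy density.
def fit_energy_density(sizes, densities):
    xs = [n ** (-2.0 / 3.0) for n in sizes]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(densities) / n
    q = sum((x - xbar) * (y - ybar) for x, y in zip(xs, densities)) / \
        sum((x - xbar) ** 2 for x in xs)
    pi = ybar - q * xbar
    return q, pi

# Synthetic check: data generated exactly by the model recovers the intercept.
sizes = [40, 100, 200, 500, 1000]
data = [0.9 * n ** (-2.0 / 3.0) - 0.741 for n in sizes]
q, pi = fit_energy_density(sizes, data)
```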
Let us now consider deterministic ADAPT-Clifford. For small problems N ∈ [10, 30] we computed the exact approximation ratios α over 100 problem instances, shown as empty circles in Fig. 4a. Notably, we do not observe α < 0.94 for any instance, and the average over instances is always above 0.997. To complement this observation we compute the success rate, defined as the fraction of instances for which the difference E_min^SK − E_min^Cliff < 10^−10. These are shown in Fig. 4b, with the smallest one being ∼ 82% at N = 30.
To fully explore the performance of the deterministic ADAPT-Clifford algorithm, we solve 100 instances for problems up to N = 200. The normalized energies E_min^Cliff/N are shown as empty circles in Fig. 4c for all the instances considered; the red full circles show the respective E[E_min^Cliff]/N and the error bars correspond to the standard deviation of the normalized energies. To assess the quality of the solutions found, we consider the data in the interval N ∈ [40, 200] and fit it to a model of the form qN^(−2/3) + Π^Cliff, with Π^Cliff the estimated mean energy density in the thermodynamic limit of the solutions found by our algorithm. In particular, for N = 200 we find E[E_min^Cliff]/N ≈ −0.727, and from the linear fit we find Π^Cliff ≈ −0.7409, shown by a red star in Fig. 4c. These values correspond to ∼ 94% and ∼ 97% of the Parisi value, respectively (the latter is shown with a black star in Fig. 4c). These values are below what is obtained with convex relaxation methods (horizontal dotted line in Fig. 4c). Notably, the value reached by our algorithm for N = 200 is already better than what can be obtained with zero-temperature simulated annealing, which gives E[E_min]/N ∼ −0.71 (as quoted in Ref. [102]), and Π^Cliff is comparable to what is achievable with simulated annealing on large problem instances.

VI. ALGORITHM PERFORMANCE ON OTHER FAMILIES OF GRAPHS
In this section, we characterize the performance of ADAPT-Clifford, in both its variants, for the MaxCut problem on K-regular graphs (unweighted and weighted) and unweighted Erdös-Rényi graphs with various edge probabilities. For randomized ADAPT-Clifford, we directly compare the quality of the cuts found with standard GW, while for deterministic ADAPT-Clifford we discuss the exact approximation ratios for small problems and compare against GW with variable I, the number of times the rounding step is performed.

A. Performance on K-regular graphs
We consider 3-regular and 8-regular graphs, unweighted and weighted. In the weighted cases, edge weights are sampled from U[0, 1].
In Fig. 5a-c we show the normalized instance-averaged minimum energy of the solutions found with randomized ADAPT-Clifford and standard GW. For large (N > 200) unweighted 3-regular graphs, GW finds better solutions, on average, than randomized ADAPT-Clifford; see Fig. 5e. The situation is markedly reversed with the inclusion of nontrivial edge weights, with randomized ADAPT-Clifford outperforming standard GW (see Fig. 5b), and the performance margin widens with increased connectivity, see Fig. 5c. These observations are verified with the averaged minimum energy differences shown in Fig. 5f,g for weighted 3- and 8-regular graphs, respectively. Thus, GW performs better than randomized ADAPT-Clifford only for unweighted 3-regular graphs, while the comparative performance of our algorithm consistently improves with both the inclusion of edge weights and higher connectivity.
We now move to the performance of deterministic ADAPT-Clifford. In Fig. 6a we show the mean approximation ratios over 100 problem instances for each of these types of graphs. For the unweighted 3-regular graphs we consider problem sizes N ∈ [10, 28], and for the weighted problems we consider problem sizes N ∈ [12, 28]. We have omitted the error bars from the figure for the sake of clarity. Deterministic ADAPT-Clifford shows the poorest performance for unweighted 3-regular graphs (circles in Fig. 6a), with a decreasing mean α as N increases. Interestingly, the performance of ADAPT-Clifford improves upon inclusion of edge weights (diamonds in Fig. 6a), with a mean α above 0.995 for all problem sizes considered. Further improved performance is observed with higher edge connectivity, as evidenced by the mean approximation ratio for weighted 8-regular graphs (squares in Fig. 6a). In Fig. 6b we show the success rate of the algorithm, i.e., the fraction of instances for which ADAPT-Clifford found the maximal cut. The 3-regular graphs (unweighted and weighted) show a success rate which consistently decays with problem size. On the contrary, for weighted 8-regular graphs our algorithm shows a success rate above 90% up to N = 28.
For larger problem sizes, we compare the solution quality of deterministic ADAPT-Clifford with that of GW with variable I (I = 1 is the standard GW algorithm). Fig. 7a,b,c shows the normalized mean minimum energy E[E_min]/N for the K-regular graphs studied. Notably, deterministic ADAPT-Clifford produces solutions of lower energy than standard GW for all three graph ensembles under consideration; compare the colorful markers with the light grey markers in Fig. 7a,b,c. For the unweighted 3-regular graphs, Fig. 7a, already at I = 10 GW consistently finds a cut of lower energy than deterministic ADAPT-Clifford, signaling a reduced performance of the latter method compared to the case of weighted complete graphs. This observation can be further verified with the mean difference of the minimum energies found, E[E_min^Cliff − E_min^GW], shown in Fig. 7e.
Similarly to its randomized counterpart, deterministic ADAPT-Clifford performs more competitively when edge weights are included. In Fig. 7b we show the E[E_min]/N obtained with our algorithm (red diamonds), standard GW (light grey diamonds), and GW with I = 10^3 (black diamonds) for the weighted 3-regular graphs. Further inspection of the corresponding E[E_min^Cliff − E_min^GW], shown in Fig. 7f, shows that at least I = 10^2 rounding steps are necessary for the GW solution to be, on average, superior to that found by our algorithm. The performance margin widens as we move to regular graphs with higher connectivity. Fig. 7c shows E[E_min]/N obtained with deterministic ADAPT-Clifford (orange squares), standard GW (light grey squares), and GW with I = 10^3 (black squares) for weighted 8-regular graphs. After inspecting the E[E_min^Cliff − E_min^GW] in Fig. 7g, we observe that I = 10^4 is necessary for the GW solution to be consistently better than the deterministic ADAPT-Clifford solution. Thus, for sparse graphs the performance of both randomized and deterministic ADAPT-Clifford improves with the inclusion of edge weights and/or higher node connectivity.

B. Performance on unweighted Erdös-Rényi graphs
We now wish to characterize the performance of ADAPT-Clifford for MaxCut on dense graphs with variable density. For this task we focus on unweighted Erdös-Rényi graphs.
First, we fix the edge probability to 1/2 and study the performance with respect to the problem size. In Fig. 5d we show the instance-averaged minimum energy of solutions obtained with randomized ADAPT-Clifford (green) and standard GW (light grey) for graphs up to N = 1000. The randomized version of our algorithm produces better solutions, on average, than GW, an observation that is verified by the instance-averaged minimum energy differences shown in Fig. 5h.
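A minimal sketch (not the paper's code) of the instance generation, together with the random-cut baseline that any MaxCut heuristic must beat:

```python
import random

# Sample an unweighted Erdos-Renyi graph G(N, p) and check that a uniformly
# random bipartition cuts about half of the edges -- the trivial baseline.
def erdos_renyi(N, p, seed=0):
    rng = random.Random(seed)
    return [(i, j) for i in range(N) for j in range(i + 1, N) if rng.random() < p]

def cut_size(edges, s):
    return sum(1 for i, j in edges if s[i] != s[j])

rng = random.Random(1)
N, p = 120, 0.5
edges = erdos_renyi(N, p)
s = [rng.choice([1, -1]) for _ in range(N)]
frac = cut_size(edges, s) / len(edges)   # close to 1/2 for a random cut
```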
Next, we analyze the performance of deterministic ADAPT-Clifford on small problems N ≤ 28. Fig. 6a,b show the mean approximation ratios (exes), which are above α ∼ 0.997, and the success rates, respectively. Notably, deterministic ADAPT-Clifford shows a higher success rate for this family of graphs, finding the maximal cut on all instances considered for the sizes N = 10, 12 (whereas it only achieves the same for the 19 nonisomorphic 3-regular graphs at N = 10). For larger problems, in Fig. 7d we compare the normalized instance-averaged minimum energy found by our algorithm (orange solid line), standard GW (light grey dashed line), and GW with I = 10^3 (black dashed line). Our algorithm (orange) finds a solution of lower energy, on average, than that found with GW (light grey). We explore the extent of this advantage by inspecting the mean difference of the minimum energies found, E[E_min^Cliff − E_min^GW], as a function of N and with I as a control parameter. The results are shown in Fig. 7h. It is seen that only at I = 10^4 are the GW solutions consistently of lower energy than those found by deterministic ADAPT-Clifford.
Now we turn our attention to benchmarking both randomized and deterministic ADAPT-Clifford on Erdös-Rényi graphs with varying edge inclusion probability. We focus on problems with N = 120 and consider edge probabilities in [0.1, 0.9]. We solve 100 problem instances of MaxCut per edge inclusion probability. In Fig. 8a we show the normalized mean energies found with randomized ADAPT-Clifford (green) and with standard GW (light grey). Our randomized algorithm returns, on average, a cut of better quality than GW (see also the instance-averaged minimum energy differences in Fig. 8b). The normalized mean energies of the solutions found with deterministic ADAPT-Clifford (orange) are shown in Fig. 8c, alongside those for standard GW (light grey) and GW with I = 10^4 (black). With the exception of edge probabilities smaller than 0.15 and larger than 0.85, the solutions found by our algorithm are always, not merely on average, better than the ones found with GW, with the largest advantage observed for edge probabilities around 1/2. Only at I ∼ 10^4 does GW produce solutions on average comparable to those found by ADAPT-Clifford. This is seen more clearly in the instance-averaged energy difference of the solutions found, E[E_min^Cliff − E_min^GW], shown in Fig. 8d. Only at I = 10^4 do we find E[E_min^Cliff − E_min^GW] ∈ (0, 0.5] for all edge probabilities, indicating our algorithm no longer offers an advantage over GW.
The results discussed here suggest that ADAPT-Clifford offers an advantage over GW in the quality of the cut found for dense graphs, with the largest gap for graphs with density ∼ 1/2.

VII. DISCUSSION AND OUTLOOK
We introduce ADAPT-Clifford, a quantum-inspired classical approximation algorithm for MaxCut. For each problem instance, ADAPT-Clifford builds a low-depth Clifford circuit to prepare a stabilizer state that encodes an approximate solution. The algorithm was inspired by the observation of the (almost) Clifford character of the ADAPT-QAOA solution circuits for MaxCut on weighted fully connected graphs. A comparison between the two methods and resource estimates for ADAPT-Clifford are given in App. E. We introduce a randomized and a deterministic variant of this algorithm. Their respective runtime complexities are O(N^2) and O(N^3) for sparse graphs, and O(N^3) and O(N^4) for dense graphs; in all cases the space complexity is O(N^2). Naturally, the deterministic variant always outperforms the randomized variant, albeit at the cost of an increased runtime.
We have studied the performance of ADAPT-Clifford on MaxCut for various families of graphs, both dense and sparse, and both unweighted and weighted. On weighted complete graphs with positive weights, ADAPT-Clifford finds very high quality cuts, reaching the absolute maximum in the majority of small instances. Moreover, the algorithm is scalable, allowing us to easily find solutions for instances with up to 1000 nodes. ADAPT-Clifford also performs well for signed weights, finding good approximations to the ground state of the SK model with an energy that extrapolates to 97% of the Parisi value in the thermodynamic limit. To investigate performance as a function of density, we applied ADAPT-Clifford to MaxCut on unweighted Erdös-Rényi graphs with variable edge inclusion probability. We again find that ADAPT-Clifford finds the absolute maximum cut for the majority of small instances and easily scales to hundreds of nodes. Finally, we study the performance of ADAPT-Clifford for sparse graphs. Even though these graphs are far from the context that gave rise to the algorithm, we find that ADAPT-Clifford still performs well, producing the absolute maximum cut with high probability for small instances and easily scaling to 1000 nodes. Only for the case of 3-regular graphs, the sparsest category of graphs we studied, do we observe a noticeable deterioration in solution quality with increasing size. Counter-intuitively, performance improves somewhat with the inclusion of edge weights.
To assess the performance of ADAPT-Clifford for large problem instances whose exact solution is intractable, we compare its performance with the GW algorithm, which represents the state of the art in the approximate solution of MaxCut. For all graph families studied, ADAPT-Clifford outperforms the standard GW algorithm in the quality of the cut found. Only for very sparse unweighted graphs, such as 3-regular graphs, does the performance of the GW algorithm become comparable to that of ADAPT-Clifford, but even in this case the inclusion of edge weights favors the latter. Finally, ADAPT-Clifford solves problems to which the GW algorithm is not directly applicable, as exemplified by our results on the SK model.
The Clifford or near-Clifford character of the ADAPT-QAOA solution circuits is a key observation which was missed in previous work [61]. This observation, as laid out in Sec. III, allowed us to devise a quantum-inspired, polynomial-time approximation algorithm for MaxCut. While it is known that MaxCut on dense graphs admits Polynomial Time Approximation Schemes (PTAS), leading to approximate solutions within a factor 1 − ϵ of the optimum in time polynomial in N [44, 45], the scaling of the runtime as a function of ϵ may render these algorithms impractical. In contrast, in this work we showed empirically that ADAPT-Clifford performs better than an algorithm that offers a guaranteed approximation ratio. Notably, based on the gradient criterion used as the update rule in ADAPT-Clifford, a connection between this algorithm and a family of heuristics for the MaxCut problem, known as Sahni-Gonzalez algorithms [87, 88], can be established, as was recently pointed out in Ref. [89]. It is remarkable to see that when ADAPT-QAOA performs best, the adaptive approach builds solution circuits which share this property with well-known classical heuristics.
We hope the results reported here will help delimit the subset of graphs where a quantum speedup could be expected, and thus where current efforts should focus, in a similar spirit to previous results obtained with a different subuniversal family of gates [110]. While our work indicates that ADAPT-Clifford has a guaranteed approximation ratio, we do not yet have a proof. Our algorithm showed the poorest performance on fully connected graphs with signed weights. This was anticipated in Sec. III, since the ADAPT-QAOA solution circuits are not fully Clifford. However, they are near-Clifford, motivating a resource-centered design of variational ansätze, with a Clifford mixer part constructed following a scheme like the one introduced in this work, similar in spirit to the optimal mixers restricted to subspaces [111], and a cost part with few variational parameters adding just the right amount of non-Cliffordness necessary to approximate the problem up to a desired ratio. Furthermore, our algorithm could aid in reducing the cost of parameter optimization in QAOA when used to warm-start it [112]. More broadly, Clifford circuits can be leveraged to construct a framework for the efficient state initialization in variational quantum algorithms beyond the product-state paradigm. An example of this applied to quantum chemistry problems was introduced in Ref. [113].

FIG. 1. QAOA and ADAPT-QAOA results for MaxCut on weighted complete graphs with N = 6. (a) Instance average of α for QAOA (purple) and ADAPT-QAOA (green). (b) Instance average of S in Eq. (13) for solution circuits with p layers for QAOA (purple) and ADAPT-QAOA (green). For both (a) and (b) the distributions of the weights used are indicated in the figure. (c,d) Examples of the parameters γ* (cost, solid line) and β* (mixer, dashed line) of the solution unitary U(γ*, β*) for an instance with weights drawn from U[0, 1] (c) and N(0, 1) (d).

FIG. 2. Performance of randomized ADAPT-Clifford on weighted complete graphs. (a) Normalized energy found by randomized ADAPT-Clifford (green circles) and GW (light grey circles), averaged over 100 instances. (b) Instance-averaged minimum energy difference between the solution found with randomized ADAPT-Clifford and GW as a function of problem size. Notice that randomized ADAPT-Clifford is almost always superior to GW. The magenta dotted line indicates a mean energy difference of zero. We have omitted the error bars for the sake of clarity.

FIG. 3. (a) Approximation ratios α (empty circles) and mean approximation ratios (orange full circles) of solutions to MaxCut on 100 weighted complete graphs per graph size, found with deterministic ADAPT-Clifford. (b) Success rate on the 100 problem instances per graph size considered in (a). (c) Instance-averaged minimum energy over 60 problems up to graphs with 200 nodes, obtained with our algorithm (orange solid line), the Goemans-Williamson algorithm (light grey dashed line), and Goemans-Williamson with I = 10^5 (black dashed line). The inset shows the mean difference in the minimum energies found by our algorithm and the GW algorithm as a function of the problem size and for different values of I. The magenta dotted line indicates a mean energy difference of zero. In the inset we have omitted the error bars for the sake of clarity.

FIG. 4. (a) Approximation ratios α (empty circles) and mean approximation ratios (red full circles) of deterministic ADAPT-Clifford for 100 different disorder realizations of the SK model per system size. (b) Success rate on the 100 problem instances per graph size considered in (a). (c) Ground-state energy density for each of the 100 problem instances (empty circles) per problem size up to N = 200, and their mean (full circles). The dash-dotted line shows the best linear fit and the red star the respective mean energy density in the thermodynamic limit. The inset shows the average ground-state energy density up to N = 1000 as obtained with randomized ADAPT-Clifford, with its corresponding linear fit (see main text) and the value in the thermodynamic limit (green star). The grey dotted line shows the mean energy density obtained with semidefinite programming and the black star shows the Parisi value Π*.

FIG. 5. Performance of randomized ADAPT-Clifford vs GW. (a-d) Normalized instance-averaged minimum energy found with randomized ADAPT-Clifford (colorful markers and solid lines) and standard GW (light grey markers and dashed lines). The different graph types studied are: (a) unweighted 3-regular graphs, (b) weighted 3-regular graphs, (c) weighted 8-regular graphs, (d) unweighted Erdös-Rényi graphs with edge probability 1/2. For the weighted cases we take ω_i,j in U[0, 1]. (e-h) Instance-averaged minimum energy differences between the solutions found with ADAPT-Clifford and standard GW for (e) unweighted 3-regular (circles), (f) weighted 3-regular (diamonds), (g) weighted 8-regular (squares), and (h) Erdös-Rényi with edge probability 1/2 (exes). The magenta dotted line indicates equal energy of the solutions found on average. We have omitted the error bars to avoid saturating the figure. All averages were computed over 100 randomly generated instances.

FIG. 7. (a-d) Normalized instance-averaged energy found over 100 instances for problem sizes up to N = 100 and different graph types, obtained with deterministic ADAPT-Clifford (colorful markers and solid lines), standard GW (light grey markers and dashed lines), and GW with I = 10^3 (dark grey markers and dashed lines). The different graph types studied are: (a) unweighted 3-regular graphs, (b) weighted 3-regular graphs, (c) weighted 8-regular graphs, (d) unweighted Erdös-Rényi graphs with edge probability 1/2. For the weighted cases we always take ω_i,j in U[0, 1]. (e-h) Instance-averaged minimum energy differences between the solutions found with our algorithm and the solution found with GW for different values of I, for the graph types: (e) unweighted 3-regular (circles), (f) weighted 3-regular (diamonds), (g) weighted 8-regular (squares), and (h) Erdös-Rényi with edge probability 1/2 (exes). As a reference, the magenta dotted line indicates equal energy of the solutions found on average. We have omitted the error bars to avoid saturating the figure.

FIG. 8. (a) Normalized instance-averaged minimum energy of the solutions found with randomized ADAPT-Clifford (green) and GW (light grey). (b) Instance-averaged minimum energy differences between the solutions found with our algorithm and the solution found with GW. (c) Normalized instance-averaged minimum energy of the solutions found with deterministic ADAPT-Clifford (orange), standard GW (light grey), and GW with I = 10^4 (black), as a function of the edge probability, which we take in [0.1, 0.9]. (d) Instance-averaged minimum energy differences between the solutions found with our algorithm and the solution found with GW for different values of I, as a function of edge probability. The magenta dotted line indicates a mean energy difference of zero. The averages are taken over 100 different problem instances and for N = 120.
Finally, our Clifford algorithm was tailored to solve the MaxCut problem.It remains an open question to what extent other combinatorial optimization problems admit Clifford approximation algorithms with practical polynomial runtimes.

FIG. 9. (a) Problem-instance-averaged time to solution for randomized ADAPT-Clifford (orange solid line) and GW with I = 10^0, 10^5 (dashed light grey and black), for the weighted complete graphs considered in Fig. 2. (b) Time to solution of GW as a function of the number of times the rounding step I is performed. Results are shown for three different system sizes, N = 300, 400, 500, from bottom to top, respectively. The empty circles correspond to the individual TTS for the 100 problem instances solved per problem size, the full circles to the mean TTS, and the error bars show the standard deviation. The dashed lines are guides for the eye.

FIG. 10. (a,d) Example graphs with N = 5 and N = 4 nodes, respectively. (b,e) Partitioned graphs according to the cuts produced by our algorithm; different colors denote the nodes in each of the disjoint subsets. (c,f) Approximation ratios of the states produced by our algorithm in the search process for the maximal cut of the graphs shown in (a) and (d). In both cases the algorithm reaches an approximation ratio of 1, indicating a maximal cut has been found.