Exploring the neighborhood of 1-layer QAOA with Instantaneous Quantum Polynomial circuits

We embed 1-layer QAOA circuits into the larger class of parameterized Instantaneous Quantum Polynomial circuits to produce an improved variational quantum algorithm for solving combinatorial optimization problems. The use of analytic expressions to find optimal parameters classically makes our protocol robust against barren plateaus and hardware noise. The average overlap with the ground state scales as $\mathcal{O}(2^{-0.31 N})$ with the number of qubits $N$ for random Sherrington-Kirkpatrick (SK) Hamiltonians of up to 29 qubits, a polynomial improvement over 1-layer QAOA. Additionally, we observe that performing variational imaginary time evolution on the manifold approximates low-temperature pseudo-Boltzmann states. Our protocol outperforms 1-layer QAOA on the recently released Quantinuum H2 trapped-ion quantum hardware and emulator, where we obtain an average approximation ratio of $0.985$ across 312 random SK instances of 7 to 32 qubits, from which almost $44\%$ are solved optimally using 4 to 1208 shots per instance.


I. INTRODUCTION
Since its introduction by Farhi et al. [1] in 2014, the Quantum Approximate Optimization Algorithm (QAOA) has been explored in the quantum computing literature as one of the most promising heuristics for achieving quantum advantage on near-term devices [2,3].This is only one example of a larger class of variational quantum optimization algorithms, which attempt to produce good solutions to combinatorial optimization problems by sampling a parameterized quantum circuit [4][5][6][7][8].In the absence of full quantum error correction [9] the required circuits must be sufficiently shallow to withstand noise, yet expressive enough to find states with high overlap onto the ground state.QAOA is a particularly good choice for satisfying these criteria, as it has an adjustable number of layers p.It can be understood as a Trotterized version of the quantum adiabatic algorithm (QAA), for which compelling theoretical evidence of performance exists [10].Additionally, it was shown that even for small numbers of layers, sampling from the QAOA ansatz is a hard task for classical computers [11].
In this regime of a small number of layers, the form of the Trotterized QAOA operators may not be the best choice.This has motivated [12][13][14][15] the addition of extra parameters to the QAOA ansatz so that, instead of evolving the state according to the problem Hamiltonian, each parameter in the ansatz has the freedom to evolve independently.By doing this, an ansatz of the same depth may incorporate corrections that would otherwise require multiple layers.
In particular, 1-layer QAOA circuits-with and without the additional parameterization-belong to the class FIG. 1. Diagrammatic representation of the algorithm.The 1-layer QAOA ansatz is a submanifold of the IQP ansatz and provides a warm start in the optimization protocol.The trajectory between the QAOA optimum and the IQP optimum is defined via the McLachlan variational principle and is computed classically.Color coding the optimization landscape represents the effective temperature of the associated state, with lower temperature states (blue) having a higher chance of sampling the ground state.The quantum computer is only used during the sampling step, which is known to be difficult classically.
of parameterized quantum circuits known as Weighted Graph States (WGS) used to simulate condensed matter systems [16][17][18][19][20][21][22].For these states, the reduced density matrix in a subsystem of fixed size can be computed classically, allowing the efficient evaluation of local observables on a classical computer.This property permits the derivation of analytic and exact expressions for 1-layer QAOA on arbitrary local Hamiltonians [23] and for extra-parameterized circuits on some restricted local Hamiltonians [12,13].Such expressions are used to train the model classically, bypassing typical limitations such as the appearance of barren plateaus [24].
In this manuscript, we explore the embedding of 1-layer QAOA into the broader class of parameterized Instantaneous Quantum Polynomial (IQP) circuits, for which similar hardness of sampling theorems exist [25,26], even in the presence of moderate noise [27].IQP circuits also belong to the class of WGS, but compared to QAOA and existing extra-parameterized variants our ansatz uses allto-all two-qubit interactions, making its implementation problem-independent and most natural for trapped-ion quantum computers.We additionally show that analytic and exact expressions can be obtained for arbitrary local Hamiltonians, and use them to train the model via robust classical techniques like the Runge-Kutta method [28].We emphasize the role of starting the training from the optimal QAOA and finding a nearby local minimum rather than aiming for a global optimum, which avoids the challenging exploration of non-trivial landscapes [29].
This leaves only the key ingredient of sampling from the final quantum state to be performed on the quantum device, as illustrated in Fig. 1.A recent investigation of the states produced by 1-layer QAOA [30] shows that sampling produces a distribution close to a Boltzmann distribution, at temperatures beyond the reach of classical sampling techniques such as Markov Chain Monte Carlo (MCMC) [31].We improve on this result by lowering the temperature further, using variational quantum imaginary time evolution (VarQITE) [32,33].However, the constraint of keeping the state in the variational manifold limits our ability to follow exact imaginary time evolution, distorting the distribution.
The manuscript is structured as follows.Section II provides a brief review of QAOA.Our IQP ansatz is introduced in Section III, where we make the connection to 1-layer QAOA, describe the derivation of analytical expressions and how to use them for classical training, and discuss a previous work [34] that challenges the possibility of quantum advantage with IQP circuits.In Section IV we describe our protocol for approximating thermal distributions and solving combinatorial optimization problems, while Section V presents numerical performance results.First, the average overlap with the ground state obtained with an exact state-vector simulator is polynomially better than for 1-layer QAOA on random Sherrington-Kirkpatrick (SK) Hamiltonians of up to 29 qubits.Second, when approximating thermal distributions we can reach lower temperatures than 1-layer QAOA but the approximation quality reduces.Third, we demonstrate a better performance than 1-layer QAOA at solving random SK Hamiltonians of up to 32 qubits in the recently released Quantinuum's trapped-ion H2 quantum hardware and emulator.Using a reduced number of shots, the best solution per instance presents a large approximation ratio and is optimal for a large fraction of instances.Finally, Section VI discusses the methods, results, and future research directions.

II. THE QUANTUM APPROXIMATE OPTIMIZATION ALGORITHM
The standard implementation of the QAOA [1] attempts to create states with large overlap onto the ground eigenspace of some optimization problem, typically defined through an Ising Hamiltonian, where the Z i variables can be interpreted as the projections onto the Z-axis of a classical or quantum mechanical ensemble of N spin- where p is called the level of the QAOA and the sets β, γ of real coefficients β k , γ k are used as variational parameters.The most commonly used cost function in the optimization of the ansatz is the expectation value of the problem Hamiltonian although alternative objective functions have been proposed [35,36].For the rest of this work, we will only consider the 1-layer QAOA, which is sufficiently shallow to withstand the effects of moderate noise and obtains an enhanced average probability of sampling the ground state quadratically larger than random guessing [30], i.e., scaling as 2 −0.5N .

III. THE INSTANTANEOUS QUANTUM POLYNOMIAL CIRCUIT
The IQP is a non-universal model of quantum computation with similar roots to the boson sampling problem, whose aim is to strengthen the general belief that quantum computers are more powerful than classical machines [25,26].Under certain widely believed complexity-theoretic assumptions, sampling from the IQP state H ⊗N exp(−iH IQP ( ⃗ θ)) |+⟩ ⊗N in the computational basis of all qubits is a hard task for a classical computer [26].Here the IQP Hamiltonian is defined as The IQP ansatz employed in this work is a generalization where Hadamard gates are replaced with independent parameterized single-qubit rotations R x (ϕ) = exp(−iϕX/2), leading to the quantum circuit where θ = ( ⃗ ϕ, ⃗ θ) are free, real parameters.The IQP state is recovered by setting ϕ i = π/2 and making the transformation θ i −→ θ i − π/2.Since the IQP state can be brought to this form by modifying the final layer of single qubit rotations, we expect generic states of this form to be difficult to sample classically as well.We also make the important observation that, up to single qubit rotations and energy rescaling, the IQP state in Eq. ( 4) is the same as that produced by a 1-layer QAOA designed to solve for the ground state of H IQP .
This ansatz generalizes the optimization cost function of Eq. ( 3) to which we refer to as the optimization landscape.The task of computing the cost function defined in Eq. ( 5) is then reduced to estimating the expectation values of the spins ⟨Z i ⟩ θ and correlators ⟨Z i Z j ⟩ θ in an arbitrary state |Ψ(θ)⟩.In the Supplemental Material we show that the latter expression can be reduced to calculating partition functions of reduced Ising Hamiltonians of the form where e's are single or two qubit subsets.The reduced generator H e retains only the terms in H IQP that anticommute with the operator X e = i∈e X i .This leads to a highly restricted graph topology, for which partition functions can be evaluated exactly.We generalize this method to show that IQPs have simple analytic expressions for all expectation values of the form ⟨Z e ⟩ θ , with a number of terms that scales like O(2 |e| ).
These properties of the IQP ansatz make it a good candidate for solving optimization problems, as it is guaranteed to be at least as powerful as 1-layer QAOA and the training can be performed efficiently using only classical resources.The access to exact, analytic expressions for the cost function also means we do not need to worry about finite sampling or device errors during training.Barren plateau issues can also be ruled out, as we can evaluate gradients to arbitrary precision and use adaptive step sizes.Access to a quantum computer is only necessary during the final sampling step, so we expect our protocol to perform well under moderate hardware noise.
As opposed to the standard QAOA ansatz, the IQP is sufficiently flexible to produce all computational states.In particular, this means that, if a classical algorithm were able to find the global optimum of Eq. ( 5), it would also find the exact ground state of H.In [34], it is shown that the optimization landscapes of IQP ansatze with only polynomially many terms (like our ansatz) are generally non-convex and computational states other than the solution may form local minima, which we call trivial minima.Consequently, converging to such local minima would imply the algorithm does not need access to a quantum computer, as the bits x i of the solution corresponding to the optimal parameters are given by ⟨Z i ⟩ θ , which can be efficiently computed classically.
We prove that the optimization landscapes can contain non-trivial minima, and give a minimal example of this in the Supplemental Material.Remarkably, we provide numerical evidence that for the SK model such a local minimum is located in the vicinity of the QAOA parameters, and show that sampling the IQP circuit at this point greatly enhances the chance of finding the ground state compared to QAOA.

IV. METHODS
A remarkable result of [30] is that for a wide range of optimization problems that can be formulated as in Eq. ( 1), the 1-layer QAOA is capable of approximating pseudo-Boltzmann states proportional to exp(−βH/2) |+⟩ ⊗N , with large inverse temperature β, up to relative phases that do not affect the distribution.This is important because sampling this state produces the same distribution as sampling the mixed thermal state ρ β = e −βH /Z for classical Hamiltonians, which is useful for a variety of optimization tasks.
In our work, we use this result to justify the QAOA as a good starting point in optimizing the IQP ansatz.Since the 1-layer QAOA ansatz can be recovered by restricting the parameters of the full IQP, we find the optimal QAOA position classically, using the BFGS [37] algorithm on the submanifold.To find a local optimum in the vicinity of this position, it is sufficient to use simple gradient descent.However, we also explore the feasibility of our algorithm for producing low-energy thermal states, which is achieved using a different approach called VarQITE [32,33].This protocol aims to find the trajectory on the manifold that best approximates the action of exp(−τ H) on the state.If the initial state is pseudo-Boltzmann, then applying this operator leads to a decrease in temperature.The parameters in the ansatz are evolved according to the McLachlan variational principle [38]: where the coupling matrix A describes the geometry of the variational manifold (i.e. it is the Gram matrix of the tangent vectors corresponding to each parameter) and τ is the imaginary time variable.In the Supplemental Material we show that the coefficients of the Gram matrix can be expressed as expectation values of low-weight Pauli operators in the IQP, for which we find simple analytic expressions.However, this calculation is computationally expensive, so when the focus is on finding a local minimum rather than preserving a thermal profile, we set A = I and perform simple gradient descent.
In both cases, this linear system of ODEs defines a flow on the variational manifold, that we solve numerically using the Runge-Kutta method [28].We stop this procedure when we arrive at a local minimum, or when A becomes non-invertible.This typically happens after a long plateau in the energy profile, which we illustrate in the Supplemental Material.Such event becomes a rare occurrence when we increase the number of qubits, but for problems that exhibit this behavior, we choose the optimal parameters in the middle of the plateau.After finding the optimal parameters, we sample the circuit and compute the probability of finding the ground state.We share the code used for implementing this protocol in [39].
We characterize our distributions using an effective inverse temperature β.This is obtained by minimizing the Kullbach-Leibler divergence of the IQP distribution to the family of thermal distributions.Here, we compute the KL divergence exactly, but in practice, this would be estimated from samples [40].

V. RESULTS
We test our method on Sherrington-Kirkpatrick (SK) Hamiltonians [41][42][43] of up to N = 29 spins using Qiskit exact state-vector quantum simulators [44].These Hamiltonians are of the form of Eq. ( 1) with h i = 0 and J ij independent and identically distributed Gaussian random variables of 0 mean and a standard deviation of 1/ √ N .The unbiased SK model presents a Z 2 symmetry, so the ground state is unique up to flipping all qubits.This is a well-understood spin model with compelling classical solvers [45].In quantum optimization it is one of the most studied benchmark problems [30,[46][47][48][49][50].
In Fig. 2, we show how the overlap of the optimized IQP state onto the ground eigenspace varies with the problem size, and how it compares to the overlap achieved by the initial QAOA.Both plots show a clear exponential trend with relatively low and slowly increasing variance.This confirms that our algorithm has a significantly better exponential scaling compared to 1-layer QAOA.
We also study how the temperature of the distribution changes as we perform imaginary time evolution on our variational manifold up to time τ = 10, close to convergence.In Fig. 3 we show how the optimal normalized temperatures achieved by the final optimized IQP state are lower than those achieved by the starting QAOA state.However, the KL divergence between the optimized IQP state and the best-fitting thermal state is higher and presents more dispersion than QAOA across Hamiltonian instances.This indicates that IQP states might be beneficial for the task of sampling low-energy eigenstates while QAOA provides a better approximation to thermal distributions.
In Fig. 4 we plot example distributions produced by the QAOA and the optimized IQP ansatz.From the qualitative aspect of the IQP distribution, we see that the performance of our algorithm in increasing the ground state overlap cannot be entirely explained as a consequence of having a lower temperature.The distribution becomes arched, and the probabilities of sampling the low-energy eigenstates rise orders of magnitude above the predictions of the thermal fit.Future theoretical work is necessary to understand how this effect emerges, and whether it is recovered in more general optimization problems.
Our algorithm is also studied in a more realistic setting, where quantum circuits are affected by hardware noise.We use the recently released Quantinuum H2 trapped-ion quantum hardware and emulator [51].The emulator performs exact state-vector simulation under a noise model that replicates the noisy behavior of the real device.The device presents all-to-all connectivity and high-fidelity parameterized gates of the form exp(−iθZZ), making it ideal for our protocol and for QAOA on densely connected Hamiltonians.
For this analysis, we study biased SK models with the coefficients h i independently sampled from the same Gaussian distribution as the coefficients J ij .The presence of the bias breaks the Z 2 symmetry, halving the initial overlap with the ground eigenspace and making the problem slightly more general.The bias adds an additional slope in the vicinity of QAOA, that sometimes dissolves the local minimum that we exploit in the previous study, leaving no obvious method to pick a point in the gradient-descent trajectory.Our aim for this analysis is however to study the performance in the neighborhood of QAOA, rather than providing the most optimized form of our protocol.For this purpose we pick the optimized 1-layer QAOA as the first circuit, and three equally-spaced circuits corresponding to three of the first gradient-descent steps.The Supplemental Material describes the criterion we used to pick these circuits.
Figure 5 compares the quality of the best solutions obtained by the corresponding four circuits.From the 312 instances, we optimally solve 5, 21, 59, and 86, respectively for the four circuits.The best solution sampled for each instance has an average approximation ratio and standard deviation of (0.87, 0.10), (0.935, 0.083), (0.948, 0.083), and (0.970, 0.060), respectively.When considering for each instance only the best solution obtained from the four circuits as the output of our algorithm, 136 instances are solved optimally (almost 44%) and the distribution has an average approximation ratio and standard deviation of (0.985, 0.029).

VI. DISCUSSION
The algorithm we introduce explores the natural connection between the 1-layer QAOA state and IQP circuits.Studying the vicinity of the QAOA in this broader variational manifold leads to a better understanding of its optimality as a shallow-depth quantum heuristic, as well as how it can be improved.
We show that, for the case of SK Hamiltonians, our approach amplifies the probability of sampling the ground state, beyond what can be obtained using classical tools such as MCMC.The hardware implementation is as resource-demanding as it is for 1-layer QAOA, and parameter training can be performed classically in time O(N 3 ).Results on the Quantinuum H2 show the reliability of our protocol to solve large instances with scarce quantum resources.
We leave as a future work the development of an optimized strategy to pick points along the gradient-descent trajectory where sampling from the quantum computer might yield even better performance.
The results presented motivate the development of strategies to compare the performance of our protocol first IQP For each instance we pick four steps along the gradient-descent trajectory, corresponding to the standard 1-layer QAOA, and three IQP circuits.Then take ∼ 2 0.32N ∈ [4, 1208] shots, equally distributed across the four circuits.Each data point in the figure corresponds to the best solution sampled for each instance.If the best solution is optimal the point is placed in the lower row, while for sub-optimal solutions we place the point in the upper row to visualise the approximation error.
against state-of-the-art classical algorithms at the scale of real-world combinatorial optimization problems.For example, the access to the analytical expectation value of the problem Hamiltonian and higher powers of it might provide an efficient way to estimate the probability of sampling the low-energy tail for large-scale problems.
In this supplemental material we present some technical aspects of the work that are not shown in the main text.Section VII contains a full derivation of the analytic expressions of expectation values of operators in the SD-IQP ansatz, as well as the application of this general formula to the particular case of a problem Hamiltonian.In Section VIII we prove that it is possible for an IQP optimization landscape to have local minima that do not correspond to eigenstates of the problem Hamiltonian, by constructing a minimal example.Section IX shows how to obtain analytic expressions of the Gram matrix elements, which are essential to perform VarQITE.In Section X we show examples of the cost function evolution until convergence and discuss the implications of the choice of sampling at different locations.Section XI discusses a criterion for selecting IQP circuits when the energy profile displays no plateaus.

VII. ANALYTIC EXPRESSION FOR EXPECTATION VALUES IN THE IQP STATE
In this appendix, we derive exact analytic expressions of expectation values of low-weight Pauli operators in the IQP ansatz.A similar derivation is presented in [52] for 1-layer QAOA circuits, that contain only two parameters, so that resulting expressions are a particular case of the ones derived in this Appendix.In contrast to the expressions derived in [12] for extra-parameterized 1-layer QAOA circuits, our expressions apply to arbitrary local Hamiltonians.During the write-up of the second version of this manuscript similar analytical expressions for arbitrary local Hamiltonians were obtained [53].
Let P e be some Pauli string that applies non-identity Pauli operators to a subset of qubits e ∈ N = {1, 2, . . ., N } and let w e = |e| be the weight of P e .If we identify Pauli strings that differ only by a phase, we can characterize them using two length-N boolean vectors a, b ∈ Z ⊗N 2 by the decomposition P e = Z a X b , where we used the notation Z a = N i=1 Z ai .Denote by e Z and e X the subsets of N corresponding to the nonzero elements of a and b respectively, and let w a = |e Z | and w b = |e X | be corresponding weights.Our goal is then to compute ⟨Z a X b ⟩ θ .First, we show how the layer of single qubit X rotations transforms this operator This can be expanded to a sum of 2 wa Pauli strings, whose expectation values are then to be calculated in the state exp(−iH IQP ) |+⟩ ⊗N .To simplify notation we will denote all expectation values in this state by ⟨•⟩, which differs from the expectation value in the full ansatz by omitting the subscript θ.If we recycle previous notation for brevity, we are now interested in computing expectation values of the form Since we are only working with IQPs we can expand the Hamiltonian as The terms in the Hamiltonian that commute with Z a X b can be straight-forwardly canceled out, while those that anti-commute with Z a X b can be moved through with a flipped sign.Then we have where in the last equality we expanded the state |+⟩ ⊗N as a sum over all spin configurations x i ∈ {+1, −1} and made use of the fact that the central operator is manifestly diagonal in this basis.We can absorb the Z a in the propagator by noting that exp(−iπZ/2) = −iZ and using the transformed angles θi = θ i − a i π/2, giving us the simple expression This form has the interpretation of a partition function over the bipartite graph formed by splitting the set of all qubits N into e X and its complement.This suggests we should separate the spins corresponding to different subsets, so we will denote by r the configurations of spins in e X and by s configurations of the complement.The expression is then rewritten as which is now a sum over the configurations of spins in e X only.For simplified notation, we introduced the Q function, which is defined as We can simplify this even further by grouping configurations that differ only by the Z 2 operation of flipping the sign of all spins to obtain where a P = j / ∈e X a j mod 2. We can now merge the two complex phases to obtain the final expression This expresses the expectation value as a sum of 2 w b −1 terms and can be computed efficiently when the operators we are interested in have small weight w ≪ N .In particular, to perform the optimization of the ansatz as described in the main text, we only make use of this expression with w b up to 2. Note that when w b = 0 we have a vanishing expectation value.
We will now show how Eq. ( 17) can be used to efficiently compute the expectation of the Hamiltonian in the IQP We may now use Eq. ( 17) to expand each term in this expression.First, we note that expectation values with no X operators simply vanish, so ⟨Z i ⟩ = 0 and ⟨Z i Z j ⟩ = 0. Then we can compute the remaining terms individually If we plug these expressions into Eq.( 18) we arrive at the final analytic form for our Hamiltonian expectation value Note that the computational time of evaluating this expectation value, as well as the gradient in θ, is O(N 3 ), if the problem Hamiltonian has all to all connectivity, as is the case for the SK model.It can be reduced to O(DN 2 ) if our problem can be formulated on a graph whose degree is bounded by D. Analogous efficient expressions may be obtained for problem Hamiltonians that include many-body interactions, as long as the weights scale at most like O(log N ) with the problem size.

VIII. EXAMPLE OF NON-TRIVIAL MINIMA OF THE OPTIMIZATION LANDSCAPE
In this appendix, we give a minimal example of a problem for which the IQP ansatz leads to non-trivial local minima, where by non-trivial we mean that the state produced is not an eigenstate of the Hamiltonian.Additionally, the state is shown to have an overlap of 0.5 onto the degenerate ground eigenspace, so the problem solution can be recovered by sampling.
Consider the 4-qubit Hamiltonian We explore this Hamiltonian using the IQP ansatz given by all 1-body and 2-body operators in the X-basis which is equivalent to the IQP state defined in the main text.We compute the expectation value ⟨Z 0 Z 1 ⟩ in the IQP state.The other 2 terms in H are obtained by cyclic permutations in 1, 2, and 3.
We provide a symbolic implementation of this expression in Python [39].Using symbolic differentiation, we show that the line given by equations θ 0 = π/2, θ 1 = π/2, θ 3 = π/2, θ 01 = π/2, θ 02 = π, θ 12 = 0, θ 03 = π/2, θ 13 = π/2, θ 23 = 0 (note that θ 2 is free and parameterizes the line) is a critical line of local minima.This is done by verifying that the gradient is 0 for all θ 2 and the hessian is positive semi-definite, with a single null eigenvalue corresponding to the direction going along the curve (except at the isolated point θ 2 = π/2 which we exclude from our analysis).In addition, the expected value of the Hamiltonian on this line is ⟨H⟩ = −2, while it is easy to check that all eigenvalues of the Hamiltonian must be odd integers.Therefore, the states created by the ansatz with the specified parameters must be superpositions of eigenstates with different eigenvalues.We claim that this is sufficient to show that all points on the line are non-trivial local minima and give the following proof: Proof.Consider an optimization space parameterized by (ϕ, ⃗ θ) variables, with the usual parameter range of 0 to 2π.Let us call the cost function defined on this space by J(ϕ, ⃗ θ).Assume J is infinitely smooth, so we can freely Taylor expand around all points (this can be verified from the analytic form of J in terms of sums of products of smooth functions).Assume the line of critical points found in the counterexample is defined by the condition ⃗ θ = ⃗ θ 0 for some constant ⃗ θ 0 .ϕ can then be considered a parameterization of the critical line (changing its value moves us along the line).On this line, we showed that the cost function is constant and the gradient is 0 At the Hessian level, we find that all second-order derivatives that contain ϕ are zero and the restriction of the Hessian to the ⃗ θ subspace is positive definite on some interval [ϕ < , ϕ > ] with 0 < ϕ < < ϕ > < π/2: for all nonzero vectors ⃗ v and ϕ ∈ [ϕ < , ϕ > ].Einstein summation convention is employed.In particular, for all unit vectors n we have that with λ min (ϕ) the smallest eigenvalue of the Hessian at (ϕ, ⃗ θ 0 ) when restricted to the θ i variables.Note that the role of ϕ is played by θ 2 in our example, but we changed the name for brevity.Suppose we want to focus our attention on the point (ϕ 0 , ⃗ θ 0 ) with ϕ 0 ∈ [ϕ < , ϕ > ] and show that this is indeed a local minimum.The coordinates of every point in the vicinity of this critical point that does not lie on the critical line can be written as (ϕ 0 + ϵ∆ϕ, ⃗ θ 0 + ϵ∆ ⃗ θ), where ∆ ⃗ θ is a unit vector and ϵ is some small quantity.We can write the value of the cost function at this point as where from the theory of Taylor series we know that R(ϵ) is continuous and The linear term in the equation above has been dropped due to Eq. ( 31) stating that all points of ⃗ θ = ⃗ θ 0 have vanishing gradients.Since the cost function is constant under shifts in ϕ this is equivalent to Since the minimum eigenvalue is strictly positive in the ⃗ θ subspace and R(ϵ)/ϵ 2 is a continuous function that decays to 0 for ϵ → 0 we conclude that there must exist some ϵ 0 > 0 such that the RHS of the above inequality is strictly positive for all |ϵ| < ϵ 0 .Since this implies that the LHS must also be positive for all values of ϵ in this ball around 0, then this proves that the value of the cost function in the neighborhood must be strictly larger than its value on the critical line.The above prescription can be applied to all perturbations around the critical point except those with ∆ ⃗ θ = 0, for which we know that the cost function must be constant.This proves that all points on the critical line represent local minima (similar to the minima of the Mexican hat potential).

IX. DERIVATION OF THE GRAM MATRIX
In order to perform VarQITE we must compute the Gram matrix A corresponding to the tangent vectors of our variational manifold.According to its definition in [32] we have where we used greek indices µ, ν that run over all variational parameters in the ansatz.We can now separate this matrix into several blocks based on the type of variational parameter From the definition given in the main text we can explicitly compute the tangent vectors as where e is a subset of N present in the IQP ansatz.For the θθ part of the matrix we get where we again employ the notation ⟨•⟩, which stands for the expectation value in the state exp (−iH IQP ) |+⟩ ⊗N .This result states that varying the parameters θ e one at a time with the same starting point on the manifold will take us along orthogonal directions.For the θϕ part and |e| = 1 we have and for |e| = 2 we get Since A must be hermitian, we can get the other off-diagonal block matrix as A ϕθ = (A θϕ ) T .Finally in the ϕϕ sector we have

X. ENERGY LANDSCAPE
In the main text, we claim that, for small problem sizes of the unbiased SK model, we sometimes do not find a non-trivial local minimum and instead the ansatz loses its overlap with the ground state.In this case, we still hit a plateau during the optimization and we overcome this issue by choosing to sample the IQP ansatz close to the middle of the plateau.This situation is illustrated in Fig. 6(b).In the attached histogram we see that the state obtained at the end of the optimization (orange) has non-zero support only on a small number of states and does not find the ground state, despite having much lower average energy.Sampling in the middle of the first plateau (blue) results in a wider spread of states, including a high overlap onto the ground state.In Fig. 7 we show that the cases where the algorithm does not converge to a good local optimum become very unlikely as we increase the number of qubits.

XI. CRITERION TO SELECT IQP CIRCUITS
In the case of biased SK Hamiltonians the local minimum in the vicinity of QAOA is usually dissolved by the bias and the average energy does not present an intermediate plateau.This leaves no obvious strategy to pick a point along the gradient-descent trajectory that provides some intuition on good performance.However, rather than optimizing this step of our protocol, the aim of this work is to study the performance in the neighborhood of QAOA.We then follow a simple criterion to pick four circuits in this vicinity.
The first circuit is precisely the optimized 1-layer QAOA, which serves as a warm-start to our protocol.To pick the other three circuits we observe the evolution of the parameters θ ij in the two-qubit gates of the ansatz.When these parameters reach values close to 0 or π the implementation of the corresponding gate can be replaced by single-qubit gates: the identity or two single-qubit π-rotations, respectively.The remaining two-qubit gates define the connectivity graph of the ansatz at every step.
The motivation to remove two-qubit gates with parameters close to 0 or π is that the noise inserted by their hardware implementation can be larger than the error incurred when they are replaced by their approximation as the identity or single-qubit gates, respectively.We decide to remove a two-qubit gate with parameter θ ij close to 0 (or π) if the effect of the gate is smaller than the reported infidelity p ∼ 10 −3 in Quantinuum's devices, i.e., if |sin(θ ij /2)| < p (or |cos(θ ij /2)| < p).
Our criterion for a fair comparison to 1-layer QAOA is that this graph is at least connected, i.e. formed by a single graph component.This ensures that, as for QAOA, the IQP circuit can not be split and implemented via separate unconnected circuits.
We pick the fourth circuit as the last circuit in the trajectory where the graph is still connected, and the second and third circuits at equidistant steps in that range.The fraction of entangling gates left in the three IQP circuits compared to QAOA has an average of 0.83 and standard deviation across problem instances of 0.25.

FIG. 2 .
FIG. 2. Optimization results for 300 randomly generated Sherrington-Kirkpatrick Hamiltonians of up to 29 spins.a) Probability of sampling the ground state configuration in the optimal IQP ansatz.b) Enhancement factor pIQP/pQAOA for finding the ground state in the optimized IQP ansatz compared to the original QAOA.The IQP was optimized until convergence using simple gradient descent.Using a linear fit, we find the average probability of sampling the ground state pIQP ∼ 2 −αN with α = 0.31 ± 0.02 and the average enhancement factor pIQP/pQAOA ∼ 2 δN with δ = 0.23 ± 0.02.The errors indicate the variability in gradient at one standard deviation.

FIG. 3 .
FIG.3.Normalized effective inverse temperatures β∥J∥ in the QAOA state and the IQP state after VarQITE evolution for a time τ = 10, for 20 randomly generated Sherrington-Kirkpatrick Hamiltonians of each size from 10 to 20 qubits.We also show the average and standard deviations for the KL divergences of each problem size.

FIG. 4 .
FIG. 4. Overlap of the state produced by our ansatz onto different Hamiltonian eigenvalues as a function of energy for the QAOA parameters (top), and optimized IQP parameters (bottom), for a randomly generated 20 qubit Sherrington-Kirkpatrick Hamiltonian.Brighter color indicates higher coarse-grain point density.Red line illustrates the thermal distribution model that minimizes the KL divergence.A red circle marks the location of the ground state.

FIG. 5 .
FIG.5.Optimization results on the Quantinuum H2 trapped-ion quantum hardware and emulator for randomly generated biased Sherrington-Kirkpatrick Hamiltonians of 7 to 32 qubits: two instances per problem size on the device (stars) and ten instances on the emulator (circles).For each instance we pick four steps along the gradient-descent trajectory, corresponding to the standard 1-layer QAOA, and three IQP circuits.Then take ∼ 2 0.32N ∈ [4, 1208] shots, equally distributed across the four circuits.Each data point in the figure corresponds to the best solution sampled for each instance.If the best solution is optimal the point is placed in the lower row, while for sub-optimal solutions we place the point in the upper row to visualise the approximation error.

FIG. 6 .
FIG.6.Energy plot during simple gradient descent, starting from the optimal QAOA parameters for randomly generated 25 qubit optimization problems.a) Nearby non-trivial local minimum is found and the probability of sampling the ground state is amplified, b) Ansatz becomes degenerate after a long plateau.Orange shows samples collected from the final step and blue shows samples collected in the middle of the first plateau.All histograms are obtained from a total of 200 samples.
1  2particles and (h i , J ij ) are real coefficients.This is achieved by starting with the ground state |+⟩ ⊗N of the trivial transverse field mixing Hamiltonian H x = − i X i and evolving the state under the alternating application of the propagators of H and H x .The final trial state is of the form