Tunneling and speedup in quantum optimization for permutation-symmetric problems

Tunneling is often claimed to be the key mechanism underlying possible speedups in quantum optimization via quantum annealing (QA), especially for problems featuring a cost function with tall and thin barriers. We present and analyze several counterexamples from the class of perturbed Hamming-weight optimization problems with qubit permutation symmetry. We first show that, for these problems, the adiabatic dynamics that make tunneling possible should be understood not in terms of the cost function but rather the semi-classical potential arising from the spin-coherent path integral formalism. We then provide an example where the shape of the barrier in the final cost function is short and wide, which might suggest no quantum advantage for QA, yet where tunneling renders QA superior to simulated annealing in the adiabatic regime. However, the adiabatic dynamics turn out not to be optimal. Instead, an evolution involving a sequence of diabatic transitions through many avoided level-crossings, involving no tunneling, is optimal and outperforms adiabatic QA. We show that this phenomenon of speedup by diabatic transitions is not unique to this example, and we provide an example where it provides an exponential speedup over adiabatic QA. In yet another twist, we show that a classical algorithm, spin vector dynamics, is at least as efficient as diabatic QA. Finally, in a different example with a convex cost function, the diabatic transitions result in a speedup relative to both adiabatic QA with tunneling and classical spin vector dynamics.


I. INTRODUCTION
The possibility of a quantum speedup for finding the solution of classical optimization problems is tantalizing, as a quantum advantage for this class of problems would provide a wealth of new applications for quantum computing. The goal of many optimization problems can be formulated as finding an $n$-bit string $x_{\rm opt}$ that minimizes a given cost function $f(x)$, which can be interpreted as the energy of a classical Ising spin system whose ground state is $x_{\rm opt}$. Finding the ground state of such systems can be hard if, e.g., the system is strongly frustrated, resulting in a complex energy landscape that cannot be efficiently explored with any known algorithm due to the presence of many local minima [1]. This can occur, e.g., in classical simulated annealing (SA) [2], when the system's state is trapped in a local minimum.
Thermal hopping and quantum tunneling provide two starkly different mechanisms for solving optimization problems, and finding optimization problems that favor the latter continues to be an open theoretical question [3,4]. It is often stated that quantum annealing (QA) [5][6][7][8][9] uses tunneling instead of thermal excitations to escape from local minima, which can be advantageous in systems with tall but thin barriers that are easier to tunnel through than to thermally climb over [4,9,10]. It is with this potential tunneling-induced advantage over classical annealing that QA and the quantum adiabatic algorithm [11] were proposed. Our goal in this work is to address the question of the role played by tunneling in providing a quantum speedup, and to elucidate it by studying a number of illustrative examples. We shall demonstrate that the role of tunneling is significantly more subtle than what might be expected on the basis of the "tall and thin barrier" picture.
In order to make progress on this question, the potential with respect to which tunneling occurs must be clearly specified. Tunneling is defined with respect to a semi-classical potential which delineates classically allowed and forbidden regions. In QA, one typically initializes the system in the known ground state of a simple Hamiltonian and evolves the system towards a Hamiltonian representing the final cost function. We shall argue that when one takes a natural semi-classical limit, the semi-classical potential does not become the final cost function. Instead one obtains a potential appearing in the action of the spin-coherent path-integral representation of the quantum dynamics. This potential, which here we call the spin-coherent potential, has been used profitably before [12-15]. We provide comprehensive evidence that multi-spin tunneling can be understood with respect to this spin-coherent potential.
We analyze the spin-coherent potential for several examples from a well-known class of problems known as perturbed Hamming weight oracle (PHWO) problems. These are problems for which instances can be generated where QA either has an advantage over classical random search algorithms with local updates, such as SA [12,16], or has no advantage [16,17]. Moreover, because PHWO problems exhibit qubit permutation symmetry, their quantum evolutions are easily classically simulatable, and furthermore, their spin-coherent potential is one-dimensional. Tunneling becomes clear and explicit for these problems when using the spin-coherent potential.

arXiv:1511.03910v1 [quant-ph] 12 Nov 2015
We focus on a particular PHWO problem that has a plateau in the final cost function (henceforth, "the Fixed Plateau"). This problem offers a counter-example to two commonly held views: (1) QA has an advantage, due to tunneling, over SA only on problems where the barrier in the final cost function is tall and thin; (2) tunneling is necessary for a quantum speedup in QA. We refute the first statement by showing that for the Fixed Plateau, which is a short and wide cost function, QA significantly outperforms SA by using tunneling. Indeed, we find numerically that adiabatic QA (AQA) needs a time of $O(n^{0.5})$ to find the ground state, where $n$ is the number of spins or qubits. Moreover, using the spin-coherent potential, we observe the presence of tunneling during the quantum anneal. On the other hand, we prove that single-spin update SA takes a time of $O(n^{\text{plateau width}})$. Thus, we have essentially an arbitrary polynomial tunneling speedup of QA over SA on a cost function that is not tall and thin. We remark that the result about SA's performance is also a rigorous proof of a result due to Reichardt [16] that classical local search algorithms will fail on a certain class of PHWO problems and is of independent interest.
We refute the second statement by showing that, for the Fixed Plateau, it is actually optimal to run QA diabatically (henceforth, DQA for diabatic quantum annealing). The system leaves the ground state, only to return through a sequence of diabatic transitions associated with avoided-level crossings. In this regime, the runtime for QA is $O(1)$. Moreover, in this regime, we do not observe any of the standard signatures of tunneling. We show that this feature (that the optimal evolution time $t_f$ for QA is far from being adiabatic) is present in a few other PHWO problems and that this optimal evolution involves no multi-qubit tunneling.
Given that the optimal evolution involves no tunneling, we are inspired to investigate a classical algorithm, spin vector dynamics (SVD), which can be interpreted as a semi-classical limit of the quantum evolution with a product-state approximation. We observe that SVD evolves in an almost identical manner to DQA, and is able to recover the speedup seen by DQA. Thus, in these problems, we show that what may be suspected to be a highly quantum-coherent process (diabatic transitions) can be mimicked by a quantum-inspired classical algorithm.
The structure of this paper is as follows. In Sec. II, we list the PHWO problems we study. In Sec. III, we use these problems to present evidence that tunneling can be understood with respect to the spin-coherent potential. In Sec. IV, we focus on the Fixed Plateau PHWO problem, and exhaustively analyze the performance of various algorithms for this problem. In particular we numerically characterize AQA (Sec. IV A), provide a rigorous proof of SA's performance (Sec. IV B), and numerically analyze DQA (Sec. IV C), SVD (Sec. IV D), and a quantum Monte Carlo algorithm (Sec. IV E). We conclude in Sec. V by discussing the implications of our work and possible directions for future work. Additional background information and technical details can be found in the Appendix.

II. PERTURBED HAMMING WEIGHT OPTIMIZATION PROBLEMS AND THE EXAMPLES STUDIED
The cost function of a PHWO problem is defined as

$$f(x) = \begin{cases} |x| + p(|x|) & \text{if } l < |x| < u, \\ |x| & \text{otherwise,} \end{cases} \tag{1}$$

where $|x|$ denotes the Hamming weight of the bit string $x \in \{0,1\}^n$. For SA, this is the cost function. For QA, this will be the final Hamiltonian. More precisely, we define QA as the closed-system quantum evolution governed by the time-dependent Hamiltonian

$$H(s) = \frac{1}{2}(1-s)\sum_{i=1}^{n}\left(\mathbb{1}_i - \sigma^x_i\right) + s\sum_{x\in\{0,1\}^n} f(x)\,|x\rangle\langle x|, \tag{2}$$

where we have chosen the standard transverse field "driver" Hamiltonian $H(0)$ that assumes no prior knowledge of the form of $f(x)$, and a linear interpolating schedule, with $s \equiv t/t_f$ being the dimensionless time parameter. The initial state is the ground state of $H(0)$. Below, we list several PHWO examples that we study in greater detail. We refer to the case with $p = 0$ as the Plain Hamming Weight problem.

1. Fixed Plateau:

$$p(|x|) = u - 1 - |x|. \tag{3}$$

Clearly, this forms a plateau in Hamming weight space. We take $l, u = O(1)$. Since the location of the plateau does not change with $n$, we refer to it as "fixed." An instance of this cost function with $l = 3$ and $u = 8$ is illustrated in Fig. 1. By numerical diagonalization we find that QA has a constant gap for this cost function.

2. Reichardt:

$$p(|x|) = h(n), \tag{4}$$

with $h\,\frac{u-l}{\sqrt{l}} = o(1)$. For this case, Reichardt [16] proved a constant lower bound on the minimum spectral gap during the quantum anneal. In Appendix A we provide a pedagogical review of this proof and fill in some details not explicitly provided in the original proof.

3. Moving Plateau: with $l(n) = n/4$, and $u(n) = O(1)$. This is termed "moving" since the location of the plateau changes with $n$. Note that this is a special case of the Reichardt class.

4. Grover: This is a minor modification of the standard Grover problem: the marked state is the all-zeros string with energy 0, and the energy of all the other states is $n$. Scaling the energy by $n$ keeps the maximum energy of all the PHWO problems we consider comparable.

5. Spike: This was studied by Farhi et al. in [12], where it was argued that the quantum minimum gap scales as $O(n^{-1/2})$ and that SA will take exponential time to find the ground state. However, we show below (Fig. 8) that SVD is more efficient than QA for this problem.

6. Precipice: This was studied by van Dam et al. in [17], where it was proved that the quantum minimum gap for this problem scales as $O(2^{-n/2})$.

7. α-Rectangle: We call this an α-Rectangle because the width of the perturbation ($c n^{\alpha}$) is $c$ times the height. This was studied in [18], where evidence was presented for a conjecture on the scaling of the quantum minimum gap $g_{\min}$. Note that $\alpha < 1/4$ is a member of the Reichardt class, and thus the constant lower-bound on the minimum gap is a theorem, and not a conjecture. We restrict ourselves to the case of $c = 1$.
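The cost functions in this list depend on a bit string only through its Hamming weight, which makes them easy to tabulate. The following is a minimal sketch, not the authors' code. The function names are hypothetical, and the Fixed Plateau perturbation $p(|x|) = u - 1 - |x|$ is an assumption chosen to be consistent with the description of a flat section strictly between weights $l$ and $u$. The Grover cost follows the verbal description in the list.

```python
def phwo_cost(weight, l, u, perturbation):
    """Cost of any bit string with the given Hamming weight.

    The perturbation acts only strictly inside the window l < weight < u;
    outside it the cost is the plain Hamming weight.
    """
    if l < weight < u:
        return weight + perturbation(weight)
    return weight


def plain_cost(weight):
    # Plain Hamming Weight problem: no perturbation (p = 0).
    return weight


def fixed_plateau_cost(weight, l, u):
    # Assumed perturbation: lifts every weight inside the window to u - 1,
    # producing a flat section (the plateau).
    return phwo_cost(weight, l, u, lambda k: u - 1 - k)


def grover_cost(weight, n):
    # All-zeros string has energy 0; every other string has energy n.
    return 0 if weight == 0 else n
```

For example, `[fixed_plateau_cost(k, 3, 8) for k in range(12)]` tabulates the instance with $l = 3$ and $u = 8$: the cost follows the Hamming weight up to weight 3, is flat at 7 for weights 4 through 7, and follows the Hamming weight again from weight 8 onward.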
We remark that all the problems listed above are representative members of a large family of problems: if the input bit-string to any of the above problems is transformed by an XOR mask, then all of our analysis below will hold. For QA, the XOR mask can be represented as a unitary transformation $W(a) = \bigotimes_{k=1}^{n} (\sigma^x_k)^{a_k}$, with $a \in \{0,1\}^n$ being the mask string. As this unitary commutes with the QA Hamiltonian at all times, none of our subsequent analysis is affected. Similar arguments go through for SA and all the other algorithms that we consider.
We note that PHWO problems are strictly toy problems since these problems are typically represented by highly non-local Hamiltonians (see Appendix B) and thus are not physically implementable, in the same sense that the adiabatic Grover search problem is unphysical [19,20]. Nevertheless, these problems provide us with important insights into the mechanisms behind a quantum speed-up, or lack thereof.

III. THE SEMI-CLASSICAL POTENTIAL AND TUNNELING
In order to study tunneling, we need a potential arising from a semi-classical limit, which defines classically allowed and forbidden regions. One approach to writing a semi-classical potential for quantum Hamiltonians is to use the spin-coherent path-integral formalism [21]. This semi-classical potential has been used profitably in various QA studies, e.g., Refs. [12-15], and we extend its applications here. For the quantum evolution, since the initial state [the ground state of $H(0)$] is symmetric under permutations of qubits and the unitary dynamics preserves this symmetry (it is a symmetry of $H(s)$ for all $s$), we can consistently restrict ourselves to spin-1/2 symmetric coherent states $|\theta,\phi\rangle$:

$$|\theta,\phi\rangle = \bigotimes_{i=1}^{n}\left[\cos\frac{\theta}{2}\,|0\rangle_i + e^{i\phi}\sin\frac{\theta}{2}\,|1\rangle_i\right]. \tag{11}$$

The spin-coherent potential is then given by:

$$V_{SC}(\theta,\phi,s) = \langle\theta,\phi|\,H(s)\,|\theta,\phi\rangle. \tag{12}$$

We show that for all the examples defined above except the Reichardt class (we address this below), this potential captures important features of the quantum Hamiltonian [Eq. (2)] and reveals the presence of tunneling. Specifically:

1. The spin-coherent potential displays a degenerate double-well almost exactly at the point of the minimum gap. In Fig. 2(a) we plot, for the Fixed Plateau, the potential near the minimum gap. The potential transitions from having a single minimum on the right to a single minimum on the left. In between, it becomes degenerate and displays a degenerate double well. Since the instantaneous ground state corresponds to the position of the global minimum, which exhibits a discontinuity, the degeneracy point is where tunneling should be most helpful. In Fig. 3(a), we show that the location of the minimum gap of the quantum evolution is very close to the location of the degenerate double-well in the spin-coherent potential.

2. The ground state predicted by the spin-coherent potential is a good approximation to the quantum ground state except near the degeneracy point. As expected from a potential that arises in a semi-classical limit, the ground state predicted by the spin-coherent potential (i.e., the spin-coherent state corresponding to the instantaneous global minimum in $V_{SC}$) agrees well with the quantum ground state, except where tunneling is important. In particular, delocalization when the spin-coherent potential is a degenerate double-well (or is close to being one) should imply that approximating the ground state with a wavefunction localized in one of the wells fails. Indeed, we find this to be the case. We illustrate this for the Fixed Plateau in Fig. 2.

3. There is a sharp change in the ground state of the adiabatic quantum evolution at the degeneracy point. Tunneling should be accompanied by a sharp change in the properties of the ground state at the degeneracy point as the state shifts from being localized in one well to the other. We quantify this change by calculating the expectation value of the Hamming weight operator, defined as $HW = \frac{1}{2}\sum_{i=1}^{n}\left(\mathbb{1}_i - \sigma^z_i\right)$. We expect a discontinuity in the spin-coherent ground state expectation value $\langle HW \rangle$, because the spin-coherent ground state changes discontinuously at the degeneracy point. We find that there is a nearly identical change in the quantum ground state expectation value $\langle HW \rangle$, for all of the examples listed above. This is illustrated explicitly for the Fixed Plateau in Fig. 2(c). In Fig. 3(b), we show that there is close and increasing agreement (as a function of $n$) between the position of the sudden drop in $\langle HW \rangle$ and the position of the degeneracy point, for all of the problems considered.

4. The scaling of the barrier height in the spin-coherent potential is positively correlated with the scaling of the minimum gap of the quantum Hamiltonian. In Fig. 4, we see that as the barrier height increases, the inverse of the quantum minimum gap also increases.
Note that the Reichardt class is absent from the discussion above. The reason is that for these problems, the barrier in the spin-coherent potential is very small, which makes its numerical detection difficult. Fortunately, we can make some analytical claims about this class of problems. By adapting Reichardt's proof (reviewed in Appendix A) that these problems have a constant minimum gap, we are able to prove that the barrier height in the spin-coherent potential for this class vanishes as $n \to \infty$. Therefore, for these easy-for-AQA problems, there is a vanishing barrier in the spin-coherent potential. More precisely, we can show, for any perturbed Hamming weight problem,

$$\left|V_{SC}^{\rm pert}(\theta,\phi,s) - V_{SC}^{\rm unpert}(\theta,\phi,s)\right| = O\!\left(h\,\frac{u-l}{\sqrt{l}}\right), \tag{13}$$

where the unperturbed case refers to $h(n) = 0$ in Eq. (4).
Recall that $h\,\frac{u-l}{\sqrt{l}} = o(1)$ for the Reichardt class. Thus asymptotically, the spin-coherent potential for this class approaches the spin-coherent potential of the unperturbed Hamming weight problem. It is easy to check that the latter has a single minimum throughout the evolution, and hence no barriers.
Taken together, these observations indicate that the spin-coherent potential (not the cost function alone) is the appropriate potential with respect to which tunneling is to be understood for these problems.
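Because of the permutation symmetry, the spin-coherent potential reduces to a sum over Hamming weights and can be evaluated directly. The sketch below is illustrative only. It assumes the driver $H(0) = \frac{1}{2}\sum_i (\mathbb{1}_i - \sigma^x_i)$, a linear schedule, and product coherent states in which each qubit is found in $|1\rangle$ with probability $\sin^2(\theta/2)$. The function names and the grid-based counting of local minima are hypothetical helpers, not part of the original analysis.

```python
import math


def spin_coherent_potential(theta, phi, s, n, cost):
    """V_SC(theta, phi, s) for a cost that depends only on Hamming weight k.

    Assumes H(s) = (1 - s) * (1/2) * sum_i (1 - sigma^x_i) + s * H_cost.
    """
    # Driver term: every qubit has <sigma^x> = sin(theta) * cos(phi).
    driver = 0.5 * n * (1.0 - math.sin(theta) * math.cos(phi))
    # Each qubit is in |1> with probability q, so the Hamming weight of a
    # measured string is binomially distributed.
    q = math.sin(theta / 2.0) ** 2
    problem = sum(
        cost(k) * math.comb(n, k) * q**k * (1.0 - q) ** (n - k)
        for k in range(n + 1)
    )
    return (1.0 - s) * driver + s * problem


def count_local_minima(values):
    """Number of strict interior local minima of a sampled curve."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    )
```

One way to look for the double well described in item 1 is to sample `spin_coherent_potential` on a grid of $\theta$ at $\phi = 0$ for several values of $s$ and pass the samples to `count_local_minima`; two minima of nearly equal depth would indicate the degeneracy point. For the plain Hamming weight cost the sum collapses to $s\,n\sin^2(\theta/2)$, which gives a quick consistency check.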

IV. FIXED PLATEAU: PERFORMANCE OF ALGORITHMS
Having motivated the spin-coherent potential for understanding tunneling, we now exhaustively analyze the Fixed Plateau. We choose this problem because it forces us to confront some intuitions about the performance of certain algorithms. Considering the final cost function, the Fixed Plateau has neither local minima nor a barrier going from large to small |x|: it just has a long, flat section before the ground state at |x| = 0. This might suggest that it is easy for an algorithm such as SA, and is not a candidate for a quantum speedup. Moreover, given the absence of a barrier, one might suspect that the quantum evolution would not even involve multi-qubit tunneling.
We dispel both of these intuitions and summarize our findings first. In the previous section, we already provided evidence that tunneling is unambiguously present for this problem. The spin-coherent potential involves energy barriers, despite their absence in the final cost function, and the adiabatic quantum evolution is forced to tunnel in order to follow the ground state. By a simulation of the Schrödinger equation, we find that AQA needs a time of $O(n^{0.5})$ in order to reach a given success probability (see Sec. IV A). Therefore, the adiabatic algorithm, via tunneling, is able to solve this problem efficiently.
Turning to SA, an algorithm which performs a local stochastic search on the final cost function, we prove that simulated annealing with single spin-updates will take time $O(n^{u-l-1}) = O(n^{\text{plateau width}})$ to find the ground state (see Sec. IV B). This result is due to the fact that a random walker on the plateau has no preferred direction and becomes trapped there. More precisely, the probability of a leftward transition while on the plateau is proportional to the probability of flipping one of a constant number of bits (given by the Hamming weight) out of $n$, which scales as $\sim 1/n$ if $l, u = O(1)$. And since the walker needs to make as many consecutive leftward transitions as the width of the plateau in order to fall off the plateau, the time taken for this to happen scales as $O(n^{\text{plateau width}})$. Consequently, we obtain a polynomial speedup of AQA over SA that can be made as large as desired. Therefore, using the Fixed Plateau, we are able to demonstrate that a quantum speedup over SA is possible via tunneling in the adiabatic regime.

However, is the adiabatic evolution optimal? In order to find the optimal evolution time, we employ the optimal time to solution ($\mathrm{TTS}_{\rm opt}$), a metric that is commonly used in benchmarking studies [22] (also see Appendix C). It is defined as the minimum total time such that the ground state is observed at least once with desired probability $p_d$:

$$\mathrm{TTS}_{\rm opt} = \min_{t_f > 0}\left\{ t_f\,\frac{\ln(1-p_d)}{\ln\left[1-p_{GS}(t_f)\right]} \right\}, \tag{14}$$

where $t_f$ is the duration (in QA) or the number of single spin updates (in SA) of a single run of the algorithm, and $p_{GS}(t_f)$ is the probability of finding the ground state in a single such run. The use of $\mathrm{TTS}_{\rm opt}$ allows for the possibility that multiple short runs of the evolution, each lasting an optimal annealing time $(t_f)_{\rm opt}$, result in a better scaling than a single long (adiabatic) run with an unoptimized $t_f$. The quantum evolution that gives the optimal annealing time relative to this cost function is actually DQA, with an asymptotic scaling of $O(1)$.
Importantly, this diabatic evolution does not contain any of the signatures of tunneling discussed in the previous section. Therefore, for the Fixed Plateau, tunneling does not give rise to the optimal quantum performance. Motivated by the fact that the optimal quantum evolution involves no multi-qubit tunneling, we consider spin-vector dynamics [23] (see also Refs. [24,25]), a model that evolves according to the spin-coherent potential in Eq. (12). SVD can be derived as the saddle-point approximation to the path integral formulation of QA in the spin-coherent basis [25]. The SVD equations are equivalent to the Ehrenfest equations for the magnetization under the assumption that the density matrix is a product state, i.e., $\rho = \bigotimes_{i=1}^{n} \rho_i$, where $\rho_i$ denotes the state of the $i$th qubit. This algorithm is useful since it is derived under the assumption of continuity of the angles $(\theta, \phi)$, so tunneling, which here would amount to a discrete jump in the angles, is absent.
We also consider a quantum Monte Carlo based algorithm, often called simulated quantum annealing (SQA) [26,27]. We show that SQA has a scaling that is better than SA's. Indeed, this is consistent with the fact that SQA thermalizes not just relative to the final cost function, but also during the evolution.
We provide further details of our implementations of each of these algorithms in Appendix D. We now turn to each of the algorithms individually and detail their performance for the Fixed Plateau problem.
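The optimal time-to-solution metric introduced above trades the length of a single run against the number of repetitions needed. The following sketch illustrates that trade-off under stated assumptions: the target probability `p_d = 0.99` is a placeholder default, the repetition count is not rounded to an integer, and a single run is taken to suffice whenever $p_{GS} \ge p_d$ (an assumed convention, chosen so that the metric saturates at one run). The function names are hypothetical.

```python
import math


def time_to_solution(t_f, p_gs, p_d=0.99):
    """Total time needed to see the ground state at least once with prob. p_d.

    t_f  : duration (or number of updates) of one run
    p_gs : single-run ground-state probability, in (0, 1]
    """
    if not 0.0 < p_gs <= 1.0:
        raise ValueError("p_gs must lie in (0, 1]")
    if p_gs >= p_d:
        return t_f  # assumed convention: one run already meets the target
    repetitions = math.log(1.0 - p_d) / math.log(1.0 - p_gs)
    return t_f * repetitions


def optimal_tts(samples, p_d=0.99):
    """Minimise the TTS over an iterable of (t_f, p_gs) pairs.

    Returns the pair (best_tts, best_t_f).
    """
    return min((time_to_solution(t, p, p_d), t) for t, p in samples)
```

Given measured pairs such as `[(1.0, 0.05), (10.0, 0.5), (100.0, 0.999)]`, `optimal_tts` picks the intermediate run length: many very short runs and one very long run both cost more total time than a moderate number of medium-length runs.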

A. Adiabatic dynamics
In order to study the scaling of adiabatic dynamics, we consider the minimum time $\tau_0$ required to reach the ground state with some probability $p_0$, where we choose $p_0$ to ensure that we are exploring a regime close to adiabaticity for QA. We call this benchmark metric the "threshold criterion," and set $p_0 = 0.9$. As seen in Fig. 5, we observe a scaling for AQA that is approximately $n^{0.5}$. As is to be expected given that the tunneling for the Fixed Plateau problem is controlled by the width of the plateau, which is constant (does not scale with $n$), we find that $\tau_0$ scales in the same way for the Fixed Plateau and the Plain Hamming Weight problems (see Appendix A). This suggests that the dominant contribution to the scaling at large $n$ is not the time associated with tunneling but rather the time associated with the Plain Hamming Weight problem.
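As also seen in Fig. 5, we find that the textbook adiabatic criterion [28] given by

$$t_f \gg \max_{s\in[0,1]} \frac{\left|\langle \varepsilon_0(s)|\,\partial_s H(s)\,|\varepsilon_1(s)\rangle\right|}{\left|\varepsilon_1(s) - \varepsilon_0(s)\right|^2} \tag{15}$$

serves as an excellent proxy for the scaling of AQA [29]. Here $|\varepsilon_0(s)\rangle$ and $|\varepsilon_1(s)\rangle$ denote the instantaneous ground and first excited states, with energies $\varepsilon_0(s)$ and $\varepsilon_1(s)$. The scaling of AQA is matched by the scaling of the numerator of the adiabatic condition, which is explained by the fact that we find a constant minimum gap for the case $l, u = O(1)$. This numerator turns out to be well approximated in our case by the matrix element of $\partial_s H(s)$ between the ground and first excited states, leading to $t_f \sim n^{0.5}$ in the adiabatic limit. Note that calculating this matrix element can easily be done for arbitrarily large systems, and is hence much easier to check directly than the scaling of AQA.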
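The gap entering the adiabatic condition can be computed cheaply because the evolution stays in the $(n+1)$-dimensional permutation-symmetric subspace, where $H(s)$ is tridiagonal in the Hamming-weight basis. The sketch below is an illustration rather than the authors' implementation. It assumes the driver $\frac{1}{2}\sum_i(\mathbb{1}_i - \sigma^x_i)$ with a linear schedule, and it finds eigenvalues with a standard Sturm-sequence bisection so that only the Python standard library is needed. All function names are hypothetical.

```python
import math


def symmetric_hamiltonian(n, s, cost):
    """Diagonal and off-diagonal of H(s) in the Hamming-weight (Dicke) basis."""
    diag = [(1.0 - s) * n / 2.0 + s * cost(k) for k in range(n + 1)]
    # The transverse field couples weights k and k + 1 with sqrt((k+1)(n-k)).
    off = [-(1.0 - s) / 2.0 * math.sqrt((k + 1) * (n - k)) for k in range(n)]
    return diag, off


def count_below(diag, off, x):
    """Sturm count: number of eigenvalues strictly below x."""
    count = 0
    q = diag[0] - x
    if q < 0:
        count += 1
    for i in range(1, len(diag)):
        if q == 0.0:
            q = 1e-30  # nudge to avoid division by zero
        q = diag[i] - x - off[i - 1] ** 2 / q
        if q < 0:
            count += 1
    return count


def eigenvalue(diag, off, index, iterations=100):
    """index-th smallest eigenvalue (0-based) by bisection."""
    radius = [0.0] * len(diag)
    for i, b in enumerate(off):
        radius[i] += abs(b)
        radius[i + 1] += abs(b)
    # Gershgorin discs bracket the whole spectrum.
    lo = min(d - r for d, r in zip(diag, radius))
    hi = max(d + r for d, r in zip(diag, radius))
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > index:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def gap(n, s, cost):
    """Gap between the two lowest symmetric-subspace levels at parameter s."""
    diag, off = symmetric_hamiltonian(n, s, cost)
    return eigenvalue(diag, off, 1) - eigenvalue(diag, off, 0)


def minimum_gap(n, cost, grid=201):
    """Smallest gap on a uniform grid of s in [0, 1]."""
    return min(gap(n, i / (grid - 1), cost) for i in range(grid))
```

For the plain Hamming weight cost the qubits decouple and the gap is $\sqrt{(1-s)^2 + s^2}$, with minimum $1/\sqrt{2}$ at $s = 1/2$ independent of $n$, which serves as a check. Passing a plateau-type cost instead allows the gap to be tracked as a function of $n$.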

B. Simulated annealing using random spin selection
We consider a version of SA with random spin-selection as the rule that generates candidates for Metropolis updates. Our main motivation is to understand the behavior of a local, stochastic search algorithm which has access only to the final cost function. We note that our analysis below is general for any Plateau problem, and is not limited to the Fixed Plateau or the Moving Plateau.
If we pick a bit-string at random, then for large n we will start with very high probability at a bit-string with Hamming weight close to n/2. The plateau may be to the left or to the right of n/2; if the plateau is to the right, then the random walker is unlikely to encounter it and can quickly descend to the ground state. Thus, the more interesting case is when the random walker arrives at the plateau from the right. We proceed to analyze these two cases separately.

Walker starts to the right of the plateau
In this case, how much time would it take, typically, for the walker to fall off the left edge? It is intuitively clear that traversing the plateau will be the dominant contribution to the time taken to reach the ground state, as after that the random walker can easily walk down the potential. We show below (for the walker that starts to the left of the plateau) that this time can be at most $O(n^2)$ if $\beta = \Omega(\log n)$.

FIG. 5. [...] The time for SQA and SA is measured in single-spin updates (for SQA this is $N_\tau$ times the number of sweeps times the number of spins, whereas for SA this is the number of sweeps times the number of spins), where both are operated in 'solver' mode as described in Appendix D. Also shown is the scaling of the numerator of the adiabatic condition as defined in Eq. (15). The scaling for AQA and the adiabatic condition extracted by a fit using $n \gtrsim 10^2$ is approximately $n^{0.44}$. However, the true asymptotic scaling is likely to be $\sim n^{0.5}$ since the scaling for the Fixed Plateau problem is clearly lower-bounded by the Plain Hamming Weight problem, for which we have verified $\tau_0 \sim n^{0.5}$ (see Appendix A), and we expect the effect of the plateau to become negligible in the large $n$ limit. SQA scales more favorably ($\sim n^{1.5}$) than SA ($\sim n^{5}$). We have checked that the scaling of SQA does not change even if we double the number of Trotter slices $N_\tau$ and keep the temperature $1/\beta$ fixed.
To evaluate the time to fall off the plateau, note that the perturbation is applied on strings of Hamming weight l + 1, l + 2, . . . , u − 1, so the width of the plateau is w ≡ u − l − 1. Consider a random walk on a line of w + 1 nodes labelled 0, 1, . . . w. Node i represents the set of bit strings with Hamming weight l + i, with 0 ≤ i ≤ w. We may assume that the random walker starts at node w. Only nearest-neighbor moves are allowed and the walk terminates if the walker reaches node 0.
Our analysis will provide a lower bound on the actual time to fall off the left edge, because in the actual PHWO problem one can also go back up the slope on the right, and in addition we disallow transitions from strings of Hamming weight l to l + 1. This is justified because the Metropolis rule exponentially (in β) suppresses these transitions.
The transition probabilities $p_{i\to j}$ for this problem can be written as a $(w+1)\times(w+1)$ row-stochastic matrix $p_{ij} = p_{i\to j}$. Here $p$ is a tridiagonal matrix with zeroes on the diagonal, except at $p_{00}$ and $p_{ww}$. First consider $1 \le i \le w-1$. If the walker is at node $i$, then the transition to node $i+1$ (which has Hamming weight $l+i+1$) occurs with probability $\frac{n-(l+i)}{n}$ (the chance that the bit picked had the value 0). Similarly, for $1 \le i \le w$, the Hamming weight will decrease to $l+i-1$ with probability $\frac{l+i}{n}$ (the chance that the bit picked had the value 1). Combining this with the fact that a walker at node 0 stays put, we can write:

$$p_{ij} = \begin{cases} 1 & \text{if } i = j = 0, \\ \frac{l+i}{n} & \text{if } j = i-1,\ 1 \le i \le w, \\ \frac{n-(l+i)}{n} & \text{if } j = i+1,\ 1 \le i \le w-1, \\ \frac{n-(l+w)}{n} & \text{if } i = j = w, \\ 0 & \text{otherwise.} \end{cases} \tag{16}$$

Let $X(t)$ be the position of the random walker at time-step $t$. The random variable measuring the number of steps the random walker starting from node $r$ would need to take to reach node $s$ for the first time is

$$\tau_{r,s} = \min\{t > 0 : X(t) = s,\ X(0) = r\}. \tag{17}$$

The quantity we are after is $E\tau_{w,0}$, the expectation value of the random variable $\tau_{w,0}$, i.e., the mean time taken by the random walker to fall off the plateau. Since only nearest neighbor moves are allowed we have

$$E\tau_{w,0} = \sum_{k=0}^{w-1} E\tau_{w-k,\,w-k-1}. \tag{18}$$

Stefanov [30] (see also Ref. [31]) has shown that where $c_{w+1} \equiv 0$. Evaluating the sum term by term, we obtain: . . .
Now consider the following cases:

1. Since the leading order term is $E\tau_{w-(w-1),\,w-w} = E\tau_{1,0}$, the time to fall off the plateau is $O(n^{w}) = O(n^{u-l-1})$. This result about SA's performance is confirmed numerically in Fig. 5.

2. In order for Reichardt's bound (see Appendix A) to give a constant lower-bound to the quantum problem, we need $u = l + o(l^{1/4})$. Since at most we can . Therefore, the time to fall-off becomes we can see that $E\tau_{w,0} = O(1)$, which is a constant time scaling. More
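The mean time to fall off the plateau can also be obtained exactly from a first-step analysis of the simplified walk, which gives a direct check on the $O(n^{w})$ scaling. The sketch below is illustrative and does not reproduce the Stefanov formula. It assumes the walk described above: node $i$ has Hamming weight $l+i$, a move left from node $i$ occurs with probability $(l+i)/n$, interior nodes otherwise move right, and node $w$ is reflecting. The function name is hypothetical.

```python
from fractions import Fraction


def plateau_escape_time(n, l, w):
    """Exact mean number of single-spin updates to go from node w to node 0.

    Uses T_i = (1 + right_i * T_{i+1}) / left_i, where T_i is the mean time
    to step from node i to node i - 1 for the first time.
    """
    if w < 1 or l + w > n:
        raise ValueError("need w >= 1 and l + w <= n")
    total = Fraction(0)
    step_left_time = Fraction(0)  # T_{i+1}; unused at the reflecting node w
    for i in range(w, 0, -1):
        left = Fraction(l + i, n)                    # flip one of the l + i ones
        right = Fraction(0) if i == w else 1 - left  # flip one of the zeros
        step_left_time = (1 + right * step_left_time) / left
        total += step_left_time
    return total
```

For $l = 0$ and $w = 3$ the recursion gives $(n^3 + n^2 + 8n)/6$, so doubling $n$ multiplies the escape time by a factor approaching $2^3$, in line with the $O(n^{w})$ scaling and with the observation that the last step, $E\tau_{1,0}$, dominates.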

Walker starts to the left of the plateau
Note that this case is equivalent to the unperturbed Hamming weight problem, which is a straightforward gradient descent problem. We may therefore consider a simple fixed temperature version of SA (i.e., the standard Metropolis algorithm). We will show that the performance of SA on this problem provides an upper bound of $O(n^2)$ on the time for a random walker to arrive at the plateau, and on the time for a random walker to reach the ground state after descending from the plateau. Moreover, our analysis provides a lower bound of $O(n \log n)$ on the efficiency of such algorithms.
For this problem, the transition probabilities are:

$$p_{i\to i-1} = \frac{i}{n}, \qquad p_{i\to i+1} = \frac{n-i}{n}\,e^{-\beta}, \qquad p_{i\to i} = \frac{n-i}{n}\left(1 - e^{-\beta}\right),$$

with $i = 1, 2, \dots, n$ denoting strings of Hamming weight $i$, and $\beta$ is the inverse temperature. Using the Stefanov formula (19), we can write (after much simplification): We will bound the expected time to reach the all-zeros string starting from the all-ones string. This is the worst-case scenario as we are assuming that we are starting from the string farthest from the all-zeros string. Note again that if we start from a random spin configuration, then with overwhelming probability we will pick a string with Hamming weight close to $n/2$. Thus, most probably, $E\tau_{n/2,0}$ will be the time to hit the ground state. We first show that $\beta = O(1)$ will lead to an exponential time to hit the ground state, irrespective of the walker's starting string. Toward that end, which is clearly exponential in $n$ if $\beta = O(1)$. Next, let $\beta(n) = \log n$, i.e., we decrease the temperature logarithmically in system size. In this case, Now it is intuitively clear that $E\tau_{1,0} > E\tau_{r,r-1}$ for all $r > 1$, which implies that $n E\tau_{1,0} \ge E\tau_{n,0}$. Thus, if $\beta = \log n$, then $E\tau_{n,0} = O(n^2)$ at worst.
To obtain a lower-bound on the performance of the algorithm, we take $\beta \to \infty$. Thus, for each $k$ in Eq. (23), only the $l = 0$ term will survive. Hence, for large $n$,

$$E\tau_{n,0} = n\sum_{k=1}^{n}\frac{1}{k} \approx n\log n + \gamma n,$$

with $\gamma$ being the Euler-Mascheroni constant. The scaling here is $O(n \log n)$. This is the best possible performance for single-spin update SA with random spin-selection on the plain Hamming weight problem. Therefore, if $\beta = \Omega(\log n)$, the scaling will be between $O(n \log n)$ and $O(n^2)$. Of course, this cost needs to be added to the time taken for the walker starting to the right of the plateau.

Two clarifications are in order regarding the comparison between our theoretical bound on SA's performance and the associated numerical simulations we have presented. First, while Fig. 5 displays the time to cross a threshold probability, our theoretical bound of $O(n^{u-l-1})$ is on the expected time for the random walker to hit the ground state [Eq. (18)]. However, we found that both metrics show identical scaling. Second, while the SA data in Fig. 5 was generated using sequential spin updates, the theoretical bound assumes random spin updates (see Appendix D 1 for more details on the update schemes). However, we found that the asymptotic scaling for both cases is nearly identical in the long-time regime, and thus have plotted only the former.
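The dependence of the descent time on the inverse temperature can be checked with the same first-step recursion used for birth-death chains. The sketch below is an illustration, not the closed-form expression of the analysis. It assumes Metropolis updates with random spin selection on the plain Hamming weight cost, so that a downhill flip from weight $k$ is proposed with probability $k/n$ and always accepted, while an uphill flip is proposed with probability $(n-k)/n$ and accepted with probability $e^{-\beta}$. The function name is hypothetical.

```python
import math


def metropolis_descent_time(n, beta):
    """Mean number of single-spin updates to go from weight n to weight 0."""
    accept_up = math.exp(-beta)  # Metropolis acceptance for a +1 energy move
    total = 0.0
    step_down_time = 0.0  # mean time for weight k + 1 -> k
    for k in range(n, 0, -1):
        down = k / n                   # pick one of the k ones
        up = (n - k) / n * accept_up   # pick a zero and accept the uphill move
        # First-step analysis: T_k = (1 + up * T_{k+1}) / down.
        step_down_time = (1.0 + up * step_down_time) / down
        total += step_down_time
    return total
```

With `beta = math.inf` the uphill moves vanish and the result is $n\sum_{k=1}^{n} 1/k$, the coupon-collector value quoted above. Evaluating the function at fixed `beta` and at `beta = math.log(n)` for increasing `n` gives a numerical view of the difference between the constant-temperature and logarithmic-cooling regimes.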

C. Optimal QA via Diabatic Transitions
Having established that for the Fixed Plateau AQA enjoys a quantum speedup over local search algorithms such as SA via tunneling, we are motivated to ask: Is tunneling necessary to achieve a quantum speedup on these problems? In order to answer this question, we demonstrate using the optimal TTS criterion defined in Eq. (14) that the optimal annealing time for QA is far from adiabatic. Instead, as shown in Fig. 6(a), the optimal TTS for QA is such that the system leaves the instantaneous ground state for most of the evolution and only returns to the ground state towards the end. The cascade down to the ground state is mediated by a sequence of avoided energy level-crossings as seen in Fig. 7. We consider this a diabatic form of QA (DQA) and call this mechanism through which DQA achieves a speedup a diabatic cascade.
As n increases for fixed u, repopulation of the ground state improves for fixed (t f ) opt , hence causing TTS opt to decrease with n, as seen in Fig. 6(b), until it saturates to a constant at the lowest possible value, corresponding to a single run at (t f ) opt . At this point the problem is solved in constant time (t f ) opt , compared to the $O(n^{0.5})$ scaling of the adiabatic regime. Moreover, as shown in Fig. 6(c), there are no sharp changes in the expected Hamming weight ⟨HW⟩, suggesting that the non-adiabatic dynamics do not entail multi-qubit tunneling events, unlike the adiabatic case. This establishes that speedups in QA need not involve multi-qubit tunneling.
One may worry that for this diabatic evolution to be successful, the optimal annealing time may need to be very finely tuned. We address this concern in Appendix E, where we show that if ε is the precision desired in p GS , we need only have a precision of polylog(1/ε) in setting t f , which means that the diabatic speedup is robust. Figure 8 shows that the speedup of DQA and SVD over AQA exists for three other PHWO problems: the Moving Plateau, the Spike, and the 0.5-Rectangle problems. Importantly, DQA and SVD have an exponential speedup over AQA for the 0.5-Rectangle problem. We do not observe a diabatic speedup for the Precipice or Grover problems.

D. Spin Vector Dynamics
Given the absence of tunneling in the time-optimal quantum evolution, we are motivated to consider the behavior of Spin-Vector Dynamics (SVD), which arise in a semi-classical limit (see Appendix D 3 for an overview of this algorithm). As we show in Fig. 6(b), the scaling of SVD's optimal TTS also saturates to a constant time, i.e., (t f ) opt . Moreover, it reaches this value earlier (as a function of problem size n) than DQA, thus outperforming DQA for small problem sizes, while for large enough n both achieve O(1) scaling. As seen in the inset, SVD's advantage persists as a function of u at constant n.

FIG. 7. The eigenenergy spectrum along the evolution for the Fixed Plateau with n = 512, l = 0, and u = 6. Note the sequence of avoided level crossings that unmistakably line up in the spectrum to reach the ground state. This is the pathway through which DQA is able to achieve a speedup over AQA.
The dynamics of DQA are well approximated by SVD until close to the end of the evolution, as shown in Fig. 6(c): the trace-norm distance between the instantaneous states of DQA and SVD is almost zero until t/t f ≈ 0.8, after which the states start to diverge. This suggests that SVD is able to replicate the DQA dynamics up to this point, and only deviates because it is more successful at repopulating the ground state than DQA.
In Fig. 8, we show that SVD's speedup over AQA is replicated for the Spike, Moving Plateau, and 0.5-Rectangle problems as well. Remarkably, while the 0.5-Rectangle problem has an exponentially small gap [see Eq. (10) and Fig. 4(b)], SVD and DQA both achieve O(1) scaling, and hence the diabatic cascade provides an exponential speedup relative to AQA.
It is important to note that SVD is ineffective if one desires to simulate the adiabatic evolution. In the absence of unitary dynamics (which allow for tunneling) or thermal activation (to thermally hop over the barrier), SVD gets trapped behind the barrier that forms in the semi-classical potential separating the two degenerate minima [see Fig. 2(a)] and is unable to reach the new global minimum. In this sense, SVD does not enjoy the guarantee provided by the quantum adiabatic theorem for the unitary evolution [32][33][34], that for sufficiently long t f dictated by the adiabatic condition, the ground state can be reached with any desired probability.
Likewise, it is important to keep in mind the distinction between a classical algorithm being able to match, or sometimes outperform, a quantum algorithm (as SVD does here), and the classical algorithm approximating the evolution or instantiating the physics of the quantum algorithm (as SVD fails to do here). Indeed, in both the diabatic and adiabatic regimes, SVD provides a poor approximation to the instantaneous quantum state. For example, in the diabatic regime, it is clear from Fig. 6(c) that the trace-norm distance between the instantaneous SVD state and the instantaneous quantum state starts to increase significantly for s ≳ 0.8. In the same spirit, consider the instantaneous semi-classical ground state, i.e., the spin-coherent state evaluated at the minimum of the spin-coherent potential, which may be suspected to provide a good approximation to the instantaneous quantum ground state, but does not, as shown in Fig. 2(b). Thus the unentangled semi-classical ground state also fails to provide a good approximation to the quantum ground state.

E. Simulated Quantum Annealing
Simulated Quantum Annealing (SQA) is a quantum Monte Carlo algorithm performed along the annealing schedule (see Appendix D 4 for further details). It is often used as a benchmark against which QA is compared (though see Ref. [4] for caveats). SQA scales better than SA for the Fixed Plateau problem using the threshold criterion (see Fig. 5). In order to understand why SQA enjoys an advantage over SA using this benchmark metric, it is useful to study the behavior of the state of SQA along the annealing schedule. We show the behavior of HW for SQA in Fig. 9, where we observe that SQA at the optimal number of sweeps (the case of 1500 sweeps shown in Fig. 9) does not follow the instantaneous ground state. Instead it reaches the threshold success probability by thermally relaxing to the ground state after the minimum gap point (and tunneling event) of the quantum Hamiltonian. Therefore, SQA's advantage over SA stems from the fact that it thermalizes in a different energy landscape than SA.
We also contrast the behavior of SQA and AQA using the threshold criterion. While SQA is able to follow the instantaneous ground state for a sufficiently large number of sweeps and thus mimic the tunneling of AQA (see Fig. 9), this is not the optimal way for it to reach the threshold criterion. For a fixed threshold success probability, the process of thermal relaxation after the minimum gap point uses fewer sweeps (and hence is more efficient) than following the instantaneous ground state closely throughout the anneal [35]. This is in contrast to AQA, where tunneling is the only means for it to reach a high success probability and nevertheless is more efficient than SQA, as seen in Fig. 5.
We note that SQA's threshold criterion advantage over SA does not carry over to the optimal TTS criterion. In fact, we find that using the optimal TTS criterion, SQA scales as $O(n^{1.5})$, while SA scales as O(n), as seen in Fig. 6(b). The reason for the latter scaling is that the optimal number of sweeps for SA is 1, simply because there is a small but non-zero probability that in the first sweep all the 1s are flipped to 0s.

FIG. 9. [...] to reach the threshold ground state probability of p0 = 0.9, and similarly for the annealing time value of t f = 4931.16 for AQA. While AQA is able to approximately follow the quantum ground state (i.e., the evolution is very close to being adiabatic), the optimal SQA evolution (i.e., that requires the fewest sweeps) for achieving the threshold criterion involves not following the ground state at the minimum gap point and instead thermally relaxing towards the ground state after this point. As shown using the higher Nsw values, only after increasing the number of sweeps by more than two orders of magnitude does SQA follow the instantaneous ground state closely.

V. DISCUSSION
It is often assumed that the shape of the final cost function determines how hard it is for QA to solve the problem (in fact, this was partly the motivation for the Spike problem in Ref. [12]), and that potentials with tall and thin barriers should be advantageous for AQA, since this is where tunneling dominates over thermal hopping (e.g., [4, p.215], [9, p.1062], [10, p.226]). It is then assumed that problems where the final potential has this feature are those for which there should be a quantum speedup. We have given several counterexamples to such claims, and shown that tunneling is not necessary to achieve the optimal TTS. Instead, the optimal trajectory may use diabatic transitions to first scatter completely out of the ground state and return via a sequence of avoided level crossings. That diabatic transitions can help speed up quantum algorithms has also been noted and advantageously exploited in Refs. [36][37][38][39]. Moreover, we have shown that the instantaneous semi-classical potential provides important insight into the role of tunneling, while the final cost function can be rather misleading in this regard. While both adiabatic and diabatic QA outperform SA for the Fixed Plateau problem, the faster quantum diabatic algorithm is not better than the classical SVD algorithm for this problem. The PHWO problems due to Reichardt [16], which include problems very similar to the Fixed Plateau, have widely been considered an example where tunneling provides a quantum advantage; we have shown that this holds if one limits the comparison to SA, but that there is in fact no quantum speedup in the problem when one compares the quantum diabatic evolution (which outperforms adiabatic quantum annealing) to SVD.

FIG. 10. [...] QA outperforms SVD over the range of problem sizes we were able to check. The reason can be seen in the inset, which displays the ground state probability for SVD and QA for different annealing times t f , with n = 512. The optimal annealing time for SVD occurs at the first peak in its ground state probability (t f ≈ 8.98), whereas the optimal annealing time for QA occurs at the much larger second peak in its ground state probability (t f ≈ 10.91).
These results of the diabatic optimal evolution extend beyond the plateau problems: even the Spike problem studied in Ref. [12], which is in some sense the antithesis of the plateau problem since it features a sharp spike at a single Hamming weight, also exhibits the diabatic-beats-adiabatic phenomenon, indicating that tunneling is not required to efficiently solve the problem. Thus diabatic evolution, especially via diabatic cascades, is an important and relatively unexplored mechanism in quantum optimization that is different from tunneling. The fact that we observe a speedup relative to AQA for several problems, especially an exponential speedup for the 0.5-Rectangle, motivates the search for algorithms exploiting this mechanism and may yield fruitful results. However, we already know that diabatic cascades are not generic. E.g., we have checked that this mechanism is absent in the Grover and Precipice problems, even though the Grover problem is equivalent to a 'giant' plateau problem.
In summary, our work provides a counterargument to the widely made claims that tunneling should be understood with respect to the final cost function, that speedups due to tunneling require tall and thin barriers, and that tunneling is needed for a quantum speedup in optimization problems. Which features of Hamiltonians of optimization problems favor diabatic or adiabatic algorithms remains an open question, as does the understanding of tunneling for non-permutation-symmetric problems.
We finish on a positive note for QA. We have given several examples where SVD outperforms QA, e.g., the Spike problem [12]. However, we make no claim that SVD will always have an advantage over QA. A simple and instructive example comes from the class of cost functions that are convex in Hamming weight space, which have a constant minimum gap [40]: We have observed similar diabatic transitions for this problem as for the Fixed Plateau (not shown), but find that DQA outperforms SVD, as shown in Fig. 10. This results because the optimal TTS for QA occurs at a slightly higher optimal annealing time, i.e., there is an advantage to evolving somewhat more slowly, though still far from adiabatically. Thus, this provides an example of a "limited" quantum speedup [22].
which has $|+\rangle^{\otimes n}$ as the ground state. The final Hamiltonian for the cost function $f_{HW}(x)$ is which has $|0\rangle^{\otimes n}$ as the ground state. We interpolate linearly between $H_D$ and $H_P$: We note that $H_i(s)$ in Eq. (A5) is similar to a variant of the Landau-Zener (LZ) Hamiltonian with finite coupling duration [43,44], for which the Schrödinger equation has an analytical solution, except that there it is assumed that the $\sigma^x$ term is constant and only the $\sigma^z$ term has a (linear) time dependence over a finite interval. The analytical solution of the problem obtained in Ref. [43] is rather complicated, and for our purposes a simpler approach suffices. Since there are no interactions between the qubits, the adiabatic problem can be solved exactly by diagonalizing the Hamiltonian acting on each qubit separately. For each term, we have the energy eigenvalues $E_\pm(s)$, and associated eigenvectors, The ground state of H(s) is The gap is given by, The gap is minimized at s = 1/2 with minimum value The minimum gap is independent of n and hence does not scale with problem size. Therefore we can predict an adiabatic run time to be given by, where the n-dependence is solely due to $\partial_s H$ (see Appendix D 2). However, this is actually a loose upper bound. We next provide separate numerical and analytical arguments to demonstrate that the actual scaling for AQA is $O(n^{0.5})$.

a. Numerical argument
Suppose the adiabatic algorithm runs long enough so as to attain a desired success probability, $p_0$. Let this time be $t_f$. Using the fact that the quantum evolution of the plain Hamming Weight problem is the evolution of n non-interacting qubits, we can express the global ground-state probability in terms of the ground-state probabilities of single qubits. So, if the single qubit ground-state probability for this run-time is $p_{GS}(t_f)$, then we must have $p_0 = [p_{GS}(t_f)]^n$.
We find numerically (see Fig. 11) that p GS (t f ) has an envelope that is excellently approximated by: for sufficiently large t f . We therefore can write: and upon expanding the ln, we extract a tighter scaling for our adiabatic time:
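As a sketch of how the expansion of the logarithm produces the tighter scaling, suppose (this specific envelope is our assumption for illustration, with a hypothetical n-independent constant c) that the single-qubit failure probability decays as $c/t_f^2$ at large $t_f$:

```latex
% Assumed envelope (illustrative): p_GS(t_f) \approx 1 - c / t_f^2, c independent of n
\begin{align}
  \ln p_0 &= n \ln p_{GS}(t_f) \approx n \ln\!\left(1 - \frac{c}{t_f^2}\right)
           \approx -\frac{n c}{t_f^2}, \\
  t_f &\approx \sqrt{\frac{c\, n}{\ln(1/p_0)}} = O\!\left(\sqrt{n}\right).
\end{align}
```

Any envelope whose deviation from 1 falls off as $1/t_f^2$ would give the same $\sqrt{n}$ dependence; only the prefactor depends on c and on the target probability $p_0$.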

b. Analytical argument
Here, we invoke a result due to Boixo and Somma [45]. This result states:
Theorem 1 ([45]). To adiabatically prepare a final eigenstate using a Hamiltonian evolution H(s) requires time that scales at least as $O(L/\Delta)$. Here L is the eigenpath length, $L = \int_0^1 \left\| \partial_s |\psi(s)\rangle \right\| ds$, where $|\psi(s)\rangle$ is the eigenpath traversed to reach the final eigenstate.
We analytically compute L for the ground-state path in the plain Hamming weight problem, and show that it scales as $O(\sqrt{n})$. Since we know that in this case Δ = O(1), we conclude the adiabatic algorithm will require at least $O(\sqrt{n})$ time. Recall that the instantaneous ground state is Differentiating: The term $\frac{d}{ds}|v^i_-(s)\rangle$ does not have any scaling with n, and the second term vanishes because it is equal to where we use the fact that $|v^i_-(s)\rangle$ is real-valued and normalized. Thus, taking the square root on both sides and integrating from 0 to 1, we obtain the $\sqrt{n}$ scaling of L. If we desire to fix the constant in front of L, a straightforward calculation will show that
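The $\sqrt{n}$ scaling of the eigenpath length can be checked numerically. The sketch below assumes a single-qubit Hamiltonian of the form $H_i(s) = \frac{1}{2}[1 - (1-s)\sigma^x - s\sigma^z]$ (our assumed normalization; the result only relies on the ground state rotating from $|+\rangle$ to $|0\rangle$ in a real plane). It uses the fact stated above that the cross term vanishes, so the n-qubit path length is $\sqrt{n}$ times the single-qubit one. Function names are illustrative.

```python
import numpy as np

def single_qubit_ground_state(s):
    """Real ground state of the assumed H_i(s), with a fixed sign convention."""
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])
    sz = np.array([[1.0, 0.0], [0.0, -1.0]])
    h = 0.5 * (np.eye(2) - (1.0 - s) * sx - s * sz)
    _, vecs = np.linalg.eigh(h)
    v = vecs[:, 0]
    # The |0> amplitude is positive along the whole path; use it to fix the sign.
    return v if v[0] >= 0 else -v

def eigenpath_length(n, steps=2000):
    """sqrt(n) times the single-qubit path length, summed over small chords."""
    s_grid = np.linspace(0.0, 1.0, steps + 1)
    states = np.array([single_qubit_ground_state(s) for s in s_grid])
    segment_lengths = np.linalg.norm(np.diff(states, axis=0), axis=1)
    return np.sqrt(n) * segment_lengths.sum()

print(eigenpath_length(1), np.pi / 4)
print(eigenpath_length(16) / eigenpath_length(1))
```

Under this assumption the single-qubit length comes out close to π/4, the angle swept by the state vector between $|+\rangle$ and $|0\rangle$.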

Reichardt's bound for PHWO problems
Here we review Reichardt's derivation of the gap lowerbound for general PHWO problems, but provide additional details not found in the original proof [16].
We use the same initial Hamiltonian [Eq. (A1)] and linear interpolation schedule as before,H(s) = (1 − s)H D + sH P , and choose the final Hamiltonian to bẽ wheref where p(x) ≥ 0 is the perturbation. Note that here we have not assumed that the perturbation, p(x), respects qubit permutation symmetry. We wish to bound the minimum gap ofH(s). Unlike the Hamming weight problem H(s), this problem is no longer non-interacting. Define Below, we suppress the s dependence of all the terms for notational simplicity. We know that E 0 = v ⊗n − |H|v ⊗n − . Using this, where n k is the number of strings with Hamming weight k, we used the fact that if we measure in the computational basis, the probability of getting outcome x is v ⊗n − |x 2 = q(s) |x| (1 − q(s)) n−|x| , and q(s) is given in Eq. (A15). Consider the partial binomial sum (dropping the h k 's), Using the fact that the binomial is well-approximated by the Gaussian in the large n limit (note that this approximation requires that q(s) and 1 − q(s) not be too close to zero), we can write: . Note that σ and µ depend on n, and also on s via q(s). The parameters l and u are specified by the problem Hamiltonian, and are therefore allowed to depend on n as long as l(n) < u(n) < n is satisfied for all n.
Let us define: $$B(s, n, l(n), u(n)) \equiv \frac{1}{\sqrt{2\pi}} \int_{(l(n)-\mu(n,s))/\sigma(n,s)}^{(u(n)-\mu(n,s))/\sigma(n,s)} e^{-x^2/2}\, dx \qquad \text{(A27)}$$ We seek an upper bound on this function. We observe that q(s) decreases monotonically from 1/2 to 0 as s goes from 0 to 1. Thus, the mean of the Gaussian µ(n, s) = nq(s) decreases from n/2 to 0. Depending on the values of l(n), u(n) and µ(n, s), we thus have three possibilities: (i) l(n) < µ(n, s) < u(n), (ii) µ(n, s) < l(n) < u(n), and (iii) l(n) < u(n) < µ(n, s). Note that (ii) and (iii) are cases where the integral runs over the tails of the Gaussian and so the integral is exponentially small. We focus on (i), as this induces the maximum values of the integral. In this case the lower limit of the integral Eq. (A27) is negative, while the upper limit is positive. Thus, the integral runs through the center of the standard Gaussian, and we can upper-bound the value of the integral by the area of the rectangle of width $[u(n) - l(n)]/\sigma(n,s)$ and height $1/\sqrt{2\pi}$. Hence where we have used the fact that l(n) < µ(n, s) = nq(s). Thus, we obtain the bound:

Since the PHWO problems, including the plateau, are quantum oracle problems, they cannot generically be represented by a local Hamiltonian. For completeness we prove this claim here and also show why the (plain) Hamming weight problem is 1-local.
Let r be a bit string of length n, i.e., $r \in \{0, 1\}^n$, and let $\sigma_r \equiv \bigotimes_{i=1}^{n} \sigma_i^{r_i}$ with $\sigma^0_i \equiv I_i$ and $\sigma^1_i \equiv \sigma^z_i$. This forms an orthonormal basis for the vector space of diagonal Hamiltonians. Thus: $H_P = \sum_r J_r \sigma_r$, with $$J_r = \frac{1}{2^n} \mathrm{Tr}(\sigma_r H_P) \qquad \text{(B3a)}$$ Note that generically $J_r$ will be non-zero for arbitrary-weight strings r, leading to |r|-local terms in $H_P$, even as high as n-local.
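The expansion above can be evaluated by brute force for small n. The following sketch computes $J_r = 2^{-n}\,\mathrm{Tr}(\sigma_r H_P)$ for a diagonal $H_P$ defined by a cost function of the Hamming weight. The plateau form used here, f(x) = u − 1 for l < |x| < u and |x| otherwise, is our assumed reading of Eq. (3), and the function names are illustrative.

```python
import itertools

def plateau_cost(w, l, u):
    """Assumed Fixed Plateau cost: flat at u - 1 for l < w < u, else w."""
    return u - 1 if l < w < u else w

def pauli_z_coefficients(cost, n):
    """J_r = 2^{-n} sum_x cost(|x|) (-1)^{r.x} for every bit string r."""
    strings = list(itertools.product((0, 1), repeat=n))
    coeffs = {}
    for r in strings:
        total = 0.0
        for x in strings:
            # <x| sigma_r |x> = product over i of (-1)^{r_i x_i}
            sign = (-1) ** sum(ri * xi for ri, xi in zip(r, x))
            total += sign * cost(sum(x))
        coeffs[r] = total / 2 ** n
    return coeffs

def max_locality(coeffs, tol=1e-12):
    """Largest Hamming weight |r| among the non-zero coefficients J_r."""
    return max(sum(r) for r, j in coeffs.items() if abs(j) > tol)

n = 4
plain = pauli_z_coefficients(lambda w: w, n)
plateau = pauli_z_coefficients(lambda w: plateau_cost(w, 0, 3), n)
print(max_locality(plain), max_locality(plateau))
```

For this small instance the plain Hamming weight yields only identity and single-$\sigma^z$ terms, while the plateau perturbation already generates a term acting on all four qubits.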
E.g., substituting the plateau Hamiltonian [Eq. (3)] into this we obtain: On the other hand, if f (x) = |x| (i.e., in the absence of a perturbation), the Hamiltonian is only 1-local: $H_P = \frac{1}{2}\sum_{i=1}^{n} (I_i - \sigma^z_i)$.

Appendix C: Derivation of Eq. (14)
Equation (14) is easily derived as follows: the probability of successively failing k times is $[1 - p_{GS}(t_f)]^k$, so the probability of succeeding at least once after k runs is $1 - [1 - p_{GS}(t_f)]^k$, which we set equal to the desired success probability $p_d$; from here one extracts the number of runs k and multiplies by $t_f$ to get the time-to-solution TTS. Optimizing over $t_f$ yields TTS opt , which is natural for benchmarking purposes in the sense that it captures the trade-off between repeating the algorithm many times vs optimizing the probability of success in a single run. The adiabatic regime might be more attractive if one seeks a theoretical guarantee to have a certain probability of success if the evolution is sufficiently slow.
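The derivation above translates directly into a small helper. In this sketch the number of runs is $k = \ln(1 - p_d)/\ln(1 - p_{GS})$; clipping k at a single run when $p_{GS} \ge p_d$ is our assumption, chosen to match the saturation at a single run described in the main text. The names `time_to_solution` and `optimal_tts` and the sample data are illustrative.

```python
import math

def time_to_solution(t_f, p_gs, p_d):
    """TTS = t_f * k, with k runs needed to succeed at least once w.p. p_d."""
    if not 0.0 < p_d < 1.0:
        raise ValueError("p_d must lie strictly between 0 and 1")
    if p_gs <= 0.0:
        return math.inf            # the run never succeeds
    if p_gs >= p_d:
        return t_f                 # assumption: at least one run is always needed
    runs = math.log(1.0 - p_d) / math.log(1.0 - p_gs)
    return t_f * runs

def optimal_tts(curve, p_d):
    """Minimize TTS over (t_f, p_GS) pairs; returns (best t_f, best TTS)."""
    return min(((t_f, time_to_solution(t_f, p, p_d)) for t_f, p in curve),
               key=lambda pair: pair[1])

# Made-up (t_f, p_GS) samples, purely for illustration.
samples = [(1.0, 0.1), (10.0, 0.5), (100.0, 0.99)]
print(optimal_tts(samples, 0.75))
```

The example shows the trade-off mentioned above: a short anneal with a low success probability, repeated many times, can beat a long anneal that succeeds almost surely.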
Appendix D: Methods

Simulated Annealing
SA is a general heuristic solver [2], whereby the system is initialized in a high temperature state, i.e., in a random state, and the temperature is slowly lowered while undergoing Monte Carlo dynamics. Local updates are performed according to the Metropolis rule [47,48]: a spin is flipped and the change in energy ∆E associated with the spin flip is calculated. The flip is accepted with probability $P_{\mathrm{Met}} = \min\{1, \exp(-\beta \Delta E)\}$, where β is the current inverse temperature along the anneal. Note that there could be different schemes governing which spin is to be selected for the update. We consider two such schemes: random spin-selection, where the next spin to be updated is selected at random; and sequential spin-selection, where one runs through all of the n spins in a sequence. Random spin-selection (including just updating nearest neighbors) satisfies detailed balance and thus is guaranteed to converge to the Boltzmann distribution. Sequential spin-selection does not satisfy strict detailed balance (since the reverse move of sequentially updating in the reverse order never occurs), but it too converges to the Boltzmann distribution [49]. In sequential updating, a "sweep" refers to all the spins having been updated once. In random spin-selection, we define a sweep as the total number of spin updates divided by the total number of spins. When it is possible to parallelize the spin updates, the appropriate metric of time-complexity is the number of sweeps $N_{SW}$, not the number of spin updates (they differ by a factor of n) [22]. However, in our problem this parallelization is not possible and hence the appropriate metric is the number of spin updates, and this is what is plotted in Fig. 6(b). After each sweep, the inverse temperature is incremented by an amount ∆β according to an annealing schedule, which we take to be linear, i.e., $\Delta\beta = (\beta_f - \beta_i)/(N_{SW} - 1)$. We can use SA both as an annealer and as a solver [50].
In the former, the state at the end of the evolution is the output of the algorithm, and can be thought of as a method to sample from the Boltzmann distribution at a specified temperature. For the latter, we select the state with the lowest energy found along the entire anneal as the output of the algorithm, the better technique if one is only interested in finding the global minimum. We use the latter to maximize the performance of the algorithm.
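The procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used for the figures: it uses sequential spin-selection, the Metropolis rule, a linear schedule in β, and the "solver" output (lowest energy seen along the anneal). The function name, the parameter values, and the use of a cost function of the Hamming weight only are our choices.

```python
import math
import random

def simulated_annealing(n, cost, n_sweeps, beta_i, beta_f, rng):
    """Sequential-update SA on a cost function of the Hamming weight.

    Returns the lowest energy encountered along the anneal (solver mode).
    """
    spins = [rng.randint(0, 1) for _ in range(n)]   # random initial state
    weight = sum(spins)
    best = cost(weight)
    d_beta = (beta_f - beta_i) / (n_sweeps - 1) if n_sweeps > 1 else 0.0
    beta = beta_i
    for _ in range(n_sweeps):
        for i in range(n):                          # one sweep: every spin once
            new_weight = weight + (1 - 2 * spins[i])
            delta_e = cost(new_weight) - cost(weight)
            # Metropolis rule: accept with probability min(1, exp(-beta * dE)).
            if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                spins[i] ^= 1
                weight = new_weight
                best = min(best, cost(weight))
        beta += d_beta                              # linear schedule in beta
    return best

best_energy = simulated_annealing(16, lambda w: w, 50, 0.1, 5.0, random.Random(0))
print(best_energy)
```

Tracking the best energy inside the inner loop is what distinguishes the solver mode from the annealer mode, which would instead report the energy of the final state.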

Quantum Annealing
Here we consider the most common version of quantum annealing: where $s \equiv t/t_f$ is the dimensionless time parameter and $t_f$ is the total anneal time. The initial state is taken to be $|+\rangle^{\otimes n}$, which is the ground state of H(0).
The initial ground state and the total Hamiltonian are symmetric under qubit permutations (recall that f (x) = f (|x|) for our class of problems). It then follows that the time-evolved state, at any point in time, will also obey the same symmetry. Therefore the evolution is restricted to the (n+1)-dimensional symmetric subspace, a fact that we can take advantage of in our numerical simulations. This symmetric subspace is spanned by the Dicke states $|S, M\rangle$ with S = n/2, M = −S, −S + 1, . . . , S, which satisfy: where $S^{x,y,z} \equiv \frac{1}{2}\sum_{i=1}^{n} \sigma_i^{x,y,z}$ and $S^2 = (S^x)^2 + (S^y)^2 + (S^z)^2$. We can denote these states by: where w ∈ {0, . . . , n}.
In this basis the Hamiltonian is tridiagonal, with the following matrix elements: The Schrödinger equation with this Hamiltonian can be solved reliably using an adaptive Runge-Kutta Cash-Karp method [51] and the Dormand-Prince method [52] (both with orders 4 and 5).
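To illustrate the tridiagonal structure, the sketch below builds H(s) in the basis of symmetric states labeled by Hamming weight w. The matrix elements are our own derivation under the assumption that the driver is $H_D = \frac{1}{2}\sum_i (1 - \sigma^x_i) = n/2 - S^x$ and that $H(s) = (1-s)H_D + sH_P$ with $H_P$ diagonal with entries f(w); they are not copied from the text, and the function names are illustrative.

```python
import numpy as np

def symmetric_hamiltonian(s, n, cost):
    """(n+1) x (n+1) tridiagonal H(s) in the Hamming-weight (Dicke) basis."""
    w = np.arange(n + 1)
    # Diagonal: constant from the driver plus the cost function f(w).
    diag = (1.0 - s) * n / 2.0 + s * np.array([cost(k) for k in w], dtype=float)
    # Off-diagonal: -(1-s) <w+1| S^x |w> = -(1-s)/2 * sqrt((n-w)(w+1)).
    off = -0.5 * (1.0 - s) * np.sqrt((n - w[:-1]) * (w[:-1] + 1.0))
    return np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

def spectral_gap(s, n, cost):
    """Gap between the two lowest levels within the symmetric subspace."""
    energies = np.linalg.eigvalsh(symmetric_hamiltonian(s, n, cost))
    return energies[1] - energies[0]

n = 20
gap_mid = spectral_gap(0.5, n, lambda k: k)   # plain Hamming weight problem
print(gap_mid, 2 ** -0.5)
```

With these assumptions the plain Hamming weight problem has gap $\sqrt{(1-s)^2 + s^2}$, independent of n and minimized at s = 1/2, in line with the constant-gap statement in Appendix A. The matrix size grows only linearly with n, which is what makes n = 512 simulations feasible.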
If the quantum dynamics is run adiabatically the system remains close to the ground state during the evolution, and an appropriate version of the adiabatic theorem is satisfied. For evolutions with a non-zero spectral gap for all s ∈ [0, 1], an adiabatic condition of the form is often claimed to be sufficient [53] [however, see the discussion after Eq. (21) in Ref. [32]]. In our case ∂ s H(s) = H(1) − H(0) is upper-bounded by n; since we are considering a constant gap, the adiabatic algorithm can scale at most linearly by condition (D6). This is true for the plateau problems. We showed in the main text that the following version of the adiabatic condition, known to hold in the absence of resonant transitions between energy levels [33], estimates the scaling we observe very well: where ε 0 (s) and ε 1 (s) are the instantaneous ground and excited states in the symmetric subspace respectively. The permutation symmetry is explicitly enforced only in our numerical simulations of the quantum evolution. Since, of course, we do not have quantum hardware that can implement the problems under consideration, we must explicitly enforce this symmetry in order to be able to perform numerical simulations at large problem sizes. Note that even if we were to simulate the quantum system without explicitly imposing this symmetry, the symmetry would be automatically preserved in the dynamics, and we would draw the same lessons about the performance of the quantum algorithm (but our classical simulations would quickly become intractable).

Spin-Vector Dynamics
Starting with the spin-coherent path integral formulation of the quantum dynamics, we can obtain Spin Vector Dynamics (SVD) as the saddle-point approximation (see, for example, Ref. [25, p.10] or Refs. [23,24]). It can be interpreted as a semi-classical limit describing coherent single qubits interacting incoherently. In this sense, SVD is a well motivated classical limit of the quantum evolution of QA. SVD describes the evolution of n unit-norm classical vectors under the Lagrangian (in units of ħ = 1): where $|\Omega(s)\rangle$ is a tensor product of n independent spin-coherent states [54]: (D9) We can define an effective semi-classical potential associated with this Lagrangian: with the probability of finding the all-zero state at the end of the evolution (which is the ground state in our case), as $\prod_{i=1}^{n} \cos^2(\theta_i(1)/2)$. The quantum Hamiltonian obeys qubit permutation symmetry: $P H P^\dagger = H$, where P is a unitary operator that performs an arbitrary permutation of the qubits. This implies that our classical Lagrangian obeys the same symmetry: where the derivative operator is trivially permutation-symmetric. Therefore, the Euler-Lagrange equations of motion derived from this action will be identical for all spins. Thus, if we have symmetric initial conditions, i.e., $(\theta_i(0), \varphi_i(0)) = (\theta_j(0), \varphi_j(0))\ \forall i, j$, then the time evolved state will also be symmetric: $$(\theta_i(s), \varphi_i(s)) = (\theta_j(s), \varphi_j(s)) \quad \forall i, j\ \ \forall s \in [0, 1]. \qquad \text{(D12)}$$ As we show below, under the assumption of a permutation-symmetric initial condition we only need to solve two (instead of 2n) semi-classical equations of motion: where we have defined the symmetric effective potential $V^{\mathrm{sym}}_{SC}$ as: and $|\Omega^{\mathrm{sym}}(s)\rangle$ is simply $|\Omega(s)\rangle$ with all the θ's and ϕ's set equal. Note that in the main text [see Eq. (12)], we slightly abuse notation for simplicity, and use $V_{SC}$ instead of $V^{\mathrm{sym}}_{SC}$. The probability of finding the all-zero bit string at the end of the evolution is accordingly given by $\cos^{2n}(\theta(1)/2)$. We would have arrived at the same equations of motion had we used the symmetric spin coherent state in our path integral derivation, but that would have been an artificial restriction. In our present derivation the symmetry of the dynamics naturally imposes this restriction.
Note that the object in Eq. (D10) involves a sum over all $2^n$ bit-strings and is thus exponentially hard to compute; on the other hand, the object in Eq. (D14) only involves a sum over n terms and is thus easy to compute. Therefore, just as in the quantum case (where due to permutation symmetry the quantum evolution is restricted to the (n + 1)-dimensional subspace of symmetric states instead of the full $2^n$-dimensional Hilbert space), given knowledge of the symmetry of the problem we can efficiently compute the SVD potential and efficiently solve the SVD equations of motion.
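The contrast between the two sums can be made concrete for the cost-function part of the potential (the term multiplying s). The sketch below assumes that each spin-coherent state assigns probability $\sin^2(\theta/2)$ to the outcome 1, consistent with the all-zero probability $\cos^{2n}(\theta(1)/2)$ quoted above, and compares a brute-force sum over all bit strings with the binomial sum over Hamming weights for equal angles. The function names and the plateau parameters are illustrative, and the plateau form is our assumed reading of Eq. (3).

```python
import itertools
import math

def cost_term_bruteforce(theta, n, cost):
    """Sum over all 2^n bit strings of cost(|x|) times the product-state weight."""
    p = math.sin(theta / 2.0) ** 2          # probability that a spin reads 1
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        w = sum(x)
        total += cost(w) * p ** w * (1.0 - p) ** (n - w)
    return total

def cost_term_symmetric(theta, n, cost):
    """Same quantity using only the n + 1 Hamming-weight classes."""
    p = math.sin(theta / 2.0) ** 2
    return sum(math.comb(n, w) * cost(w) * p ** w * (1.0 - p) ** (n - w)
               for w in range(n + 1))

def plateau(w, l=0, u=4):
    """Assumed Fixed Plateau cost: flat at u - 1 for l < w < u, else w."""
    return u - 1 if l < w < u else w

theta, n = 0.7, 8
print(cost_term_bruteforce(theta, n, plateau), cost_term_symmetric(theta, n, plateau))
```

The brute-force version is only feasible for small n, whereas the symmetric version scales linearly in n; for the plain Hamming weight both reduce to $n \sin^2(\theta/2)$.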
We also remark that the computation of the potential in Eq. (D10) is significantly simplified if our cost function, f (x), is given in terms of a local Hamiltonian. For example, if H(1) = i,j J ij σ z i σ z j , then: which is easy to compute as it is a sum over poly(n) number of terms.
Let us now derive the symmetric SVD equations of motion (D13). Without any restriction to symmetric spincoherent states, the SVD equations of motion, for the pair θ i , ϕ i , read: As can be seen by comparing Eqs. (D13) and (D16), it is sufficient to show that: and an analogous statement holding for derivatives with respect to ϕ. This claim is easily seen to hold true for the term multiplying (1 − s) in Eq. (D10): where in the last line we used Eq. (D14). Next we focus on the term multiplying s in Eq. (D10). This term has no ϕ dependence and thus we only consider the θ derivatives. First note that Now, we set all the θ i 's equal. Let us define p(θ) ≡ sin 2 θ 2 . Using this and the fact that f is only a function of the Hamming weight (which is equivalent to the qubit permutation symmetry), we can rewrite the last expression, after a few steps of algebra, as: Similar to the quantum case, we can perform SVD without explicitly imposing the permutation symmetry, and obtain the same results. Here too, we are forced to explicitly exploit the symmetry due to the non-local nature of the problem under consideration, which makes directly implementing the SVD oracle (without the symmetry) exponentially hard. For local problems we can efficiently implement the SVD oracle.
In the results presented in the main text, it is the implementation of SA that does not share this symmetry. However, while the quantum algorithms and SVD can be implemented without knowledge of the symmetry and still retain their advantage, an implementation of SA that uses the symmetry would require intimate knowledge of the problem. This would be an unfair advantage for SA, not for the quantum evolution.

Simulated Quantum Annealing
An alternative method to simulated annealing, simulated quantum annealing (SQA, or Path Integral Monte Carlo along the Quantum Annealing schedule) [26,27] is an annealing algorithm based on discrete-time path-integral quantum Monte Carlo simulations of the transverse field Ising model using Monte Carlo dynamics. At a given time t along the anneal, the Monte Carlo dynamics samples from the Gibbs distribution defined by the action: where $\Delta(t) = \beta B(t)/N_\tau$ is the spacing along the time-like direction, $J_\perp = -\ln[\tanh(A(t)/2)]/2$ is the ferromagnetic spin-spin coupling along the time-like direction, and µ denotes a spin configuration with a space-like direction (the original problem direction, indexed by i) and a time-like direction (indexed by τ). For our spin updates, we perform Wolff cluster updates [55] along the imaginary-time direction only. For each space-like slice, a random spin along the time-like direction is picked. The neighbors of this spin are added to the cluster (assuming they are parallel) with probability When all neighbors of the spin have been checked, the newly added spins are checked. When all spins in the cluster have had their neighbors along the time-like direction tested, the cluster is flipped according to the Metropolis probability using the space-like change in energy associated with flipping the cluster. A single sweep involves attempting to update a single cluster on each space-like slice. As in SA, we can use SQA both as an annealer and as a solver [50]. In the former, we randomly pick one of the states on the Trotter slices at the end of the evolution as the output of the algorithm, while for the latter, we pick the state with the lowest energy found along the entire anneal as the output of the algorithm. We use the latter to maximize the performance of the algorithm.