An energetic perspective on rapid quenches in quantum annealing

There are well developed theoretical tools to analyse how quantum dynamics can solve computational problems by varying Hamiltonian parameters slowly, near the adiabatic limit. On the other hand, there are relatively few tools to understand the opposite limit of rapid quenches, as used in quantum annealing and (in the limit of infinitely rapid quenches) in quantum walks. In this paper, we develop several tools which are applicable in the rapid quench regime. Firstly, we analyse the energy expectation value of different elements of the Hamiltonian. From this, we show that monotonic quenches, where the strength of the problem Hamiltonian is consistently increased relative to fluctuation (driver) terms, will yield a better result on average than random guessing. Secondly, we develop methods to determine whether dynamics will occur locally under rapid quench Hamiltonians, and identify cases where a rapid quench will lead to a substantially improved solution. In particular, we find that a technique we refer to as"pre-annealing"can significantly improve the performance of quantum walks. We also show how these tools can provide efficient heuristic estimates for Hamiltonian parameters, a key requirement for practical application of quantum annealing.

However, in this paper, we focus on the coherent regime of operation, for which the effects of thermal dissipation and decoherence can be neglected. Such a regime could be experimentally reached either by reducing noise, implementing quantum error correction (Jordan et al., 2006;Sarovar and Milburn, 2005;Sarovar and Young, 2013;Freeman et al., 2018;Atalaya et al., 2020;Pudenz et al., 2014;Bookatz et al., 2015;Lidar, 2008), or quenching on a timescale which is much faster than the decoherence time. For instance, quantum annealing has been implemented in atomic settings where coherence is easier to maintain than in superconducting circuits (Bernien et al., 2017), and efforts have been made to reduce noise in superconducting circuit settings (D-Wave Systems Inc., 2019a). There have been experimental implementations of simple forms of error correction in quantum annealing (Pudenz et al., 2014(Pudenz et al., , 2015Vinci et al., 2015Vinci et al., , 2016Vinci and Lidar, 2018), and efforts have been made to circumvent experimental limitations on quench rates in superconducting systems (Lanting, 2018). A complementary approach to reduce noise is to implement dynamics which reduce or eliminate the interaction between the system and its environment though quantum interference effects, known as dynamical decoupling (Lidar, 2008;Paz-Silva et al., 2012;Quiroz and Lidar, 2012).
In a fully coherent regime, the dynamics are straightforward to model theoretically, since they can be described by a set of qubits (two state quantum systems) under the action of a Hamiltonian, evolving according to the Schrödinger equation. Conventionally, the Hamiltonian for this evolution is written as the sum of a problem Hamiltonian H prob , which is diagonal in the computational basis, and encodes the classical problem being solved, and a driver Hamiltonian H drive which implements quantum dynamics to explore the solution space. We use two equivalent forms for the total Hamiltonian. First, where A(t) and B(t) are positive, time-dependent control functions. However, typically the crucial feature is what happens to the ratio of driver to problem strength A(t) /B(t) as the algorithm progresses. As such, we define an alternative parametrization of the Hamiltonian, up to an overall (time-dependent) scaling factor B(t), as where there is a single control function Γ(t) > 0 for the ratio A(t) /B(t). Since (1) and (2) are equivalent, up to a rescaling of the time parameter, results for one form of Hamiltonian will generalize to results for the other. We use both forms, choosing the most convenient for the specific problem or example. Hamiltonians of the form (1) and (2), which begin with A(t) > 0 and B(t) = 0 and end with A(t) = 0 and B(t) > 0, or equivalently, begin with Γ(t) 1 and end with Γ(t) = 0, are used for most types of continuous-time quantum computing. When run on a much shorter timescale than required for adiabatic quantum computing, we call this a rapid quench.
The simplest form of continuous-time quantum computing in the coherent regime is continuous time quantum walk (QW) introduced by Farhi and Gutmann (1998); Childs and Goldstone (2004), in which the control functions are time-independent and set so that Γ(t) = γ where γ is a constant hopping rate. This can be viewed as the limit of an infinitely fast quench, in which B(0) jumps from zero to A(0)/γ at t = 0 and A(t f ) drops to zero at the final time t f . The other pure state continuous-time quantum computing which is commonly considered is adiabatic quantum computing (AQC) introduced by Farhi et al. (2000), for which the control functions A(t) and B(t) are varied slowly from A(0) = 1 and B(0) = 0 to A(t f ) = 0 and B(t f ) = 1. By the adiabatic theorem of quantum mechanics, this achieves a success probability (probability of finding the ground state of the problem Hamiltonian H prob ) which approaches 1 as t f → ∞. For a review of AQC see Albash and Lidar (2018). For a thorough discussion of the relationship between AQC and QW, see the introductions of Morley et al. (2019); . The fully coherent regime has provable quantum speedups in the case of both AQC and QW. For instance, unstructured search, the continuous time analog of Grover's search, can yield the same speedup in the AQC (Roland and Cerf, 2002) and QW (Childs and Goldstone, 2004) settings as the gate based counterpart. It is possible to interpolate between these two techniques while preserving the speedup (Morley et al., 2019).
For problems which are closer to real world optimisation, theoretical studies have mostly focused on AQC (Albash and Lidar, 2018), likely because the adiabatic theorem provides a general way to show that such algorithms could in principle succeed with high probability. While theoretically tractable, the adiabatic regime is difficult to reach experimentally, and contains some counter-intuitive effects in the deep adiabatic regime (Wiebe and Babcock, 2012;Campos Venuti and Lidar, 2018;Passos et al., 2020). Solving NP-hard problems adiabatically will at most obtain a polynomial speed up (assuming P = NP). Since AQC requires the system to remain coherent throughout, exponentially long runtime requires exponentially long coherence time, which is experimentally challenging for near-term quantum computing. When the runtime is limited by a constant or mildly scaling coherence time, such an algorithm could only solve the problem with an exponentially low probability, and therefore require exponentially many repeats to succeed with high probability. This approach, however, is a valid one for problems other than search. Recent numerical results on spin-glasses using QW show favourable scaling from many short run repeats . It has also been numerically demonstrated that rapid quenches can be superior to long quenches for AQC-like algorithms (Crosson et al., 2014). Finally, for single shot, high success probability algorithms for NP-hard problems, achieving even a polynomial speedup typically requires setting, with exponential precision, the control functions to values which lead to exponential small gaps in the Hamiltonian spectrum. This was shown to be necessary for unstructured search in Childs et al. (2002); Roland and Cerf (2002); Morley et al. (2019) and for the random energy model (Farhi et al., 2008) in . This requirement is problematic, as there are no general methods for determining where these gaps occur and because such precise control settings can be difficult to achieve in real hardware. Recent work by Chakraborty et al. (2018) demonstrates that some of the fine tuning requirements in unstructured search can be avoided by formulating the Hamiltonian differently, it is unclear whether this approach would extend to the random energy model of Farhi et al. (2008).
Given the near term importance of methods which can succeed with limited coherence time, in this paper, we develop mathematical tools to increase our understanding of how computation is achieved in both the rapid quench regime and quantum walks. These tools are important not only for theoretical understanding of when adiabatic searches and rapid quenches will be effective, but also for choosing parameters for Hamiltonians. While some theoretical arguments Hastings, 2019) can be made for why QW with short runtimes seems to perform well, a theoretical understanding of rapid quenches with time-dependent Hamiltonians, but far from the regime where the adiabatic theorem applies, is essentially missing. It has been recently shown numerically in (Brady et al., 2020) that the optimal protocol for solving problems often involves an annealing step, as opposed to bangbang controls where driver and problem Hamiltonains are not active simultaneously. Previous theoretical work based on the Pontryagin's minimum principle showed that optimal control patterns would always take the bang-bang form (Yang et al., 2017), but that these controls would sometimes require un-physically switching between the driver and problem an infinite number of times in a finite time span. The work done by (Brady et al., 2020) restricted to cases with a finite number of "bangs" and found that in this more realistic setting protocols involving annealing may be superior.
We begin in section 2 with some numerical examples to illustrate the performance gains that can be obtained from well-chosen rapid quenches in quantum annealing. This provides motivation to understand why rapid quenches work, and how to exploit the effects more systematically. We then analyse the energy flow between different quantum states, altering the expectation values of driver and problem terms in the Hamiltonian, as laid out in section 3. Next, we provide a general set of conditions (essentially requiring that quenches be monotonic) under which rapid quenches will preferentially seek out high quality solutions. We augment this analysis by studying the transitions between different computational basis states, to deduce the level of dynamics which will occur, in section 4, and, we apply our tools to different problem settings, including discussing the conditions for general optimisation problems to yield a significant level of dynamics. Then, in section 5 we show how the tools developed here can be used to construct heuristics for setting the parameters for continuous time quantum walks and rapid quenches. Section 6 provides details of our numerical methods, and we summarise and discuss our results in section 7.

Rapid quench examples
To motivate our theoretical tools, we start with three illustrative examples showing the power of rapid quenches to solve problems. For simplicity and concreteness, we focus on monotonic quenches; that is, quenches for which the control parameter Γ(t ) ≤ Γ(t) ∀t > t.

Two stage quantum walk
This is a minimal modification to the time-independent continuous time quantum walk. It consists of two time-independent stages of evolution separated by an infinitely fast quench. Because each stage is effectively a continuous time quantum walk, we refer to this as a two-stage quantum walk. We use a simple transverse field driver Hamiltonian where 1 is the identity operator andX j is the PauliX operator acting in qubit j, but, instead of using a constant control function (Γ(t) = γ), we use the time-dependent schedule which consists of two consecutive evolution stages with two different time independent Hamiltonians. Each of these stages is effectively a quantum walk, although the second stage uses non-standard starting conditions as its initial state is the final state of the first stage. The standard initial state is the equal superposition of all basis states, |ψ 0 = 2 −n/2 j |j , chosen because it is the ground state of the driver Hamiltonian, and also represents our ignorance of which basis state is the solution to the problem. The schedules we use for the two stage quantum walks are shown in Fig. 1 for each of our three examples.
As discussed in  a quantum walk can be understood from an energetic perspective according to a mechanism referred to there as the energy conservation mechanism. Being time-independent, quantum walks conserve the total energy of the system. To show the effect of changing the hopping rate γ part way through the walk, thus disrupting the energy conservation, our first example is a simple two qubit problem Hamiltonian whereẐ j is the PauliẐ operator acting on qubit j. We start the system at t = 0 in the state |ψ 0 = 1 2 (|00 + |01 + |10 + |11 ), the two qubit ground state of the driver Hamiltonian H drive in (3). To simplify notation, we define H prob ψ(t) ≡ ψ(t) | H prob | ψ(t) , the instantaneous expectation value of the problem Hamiltonian with respect to the state ψ(t) at time t. Likewise, H drive ψ(t) ≡ ψ(t) | H drive | ψ(t) for the driver Hamiltonian. We have the total energy E Γ (t) = Γ(t) H drive + H prob . Fig. 2 (top) shows that the expectation value H drive for the transverse field is zero initially (t = 0). As in , the energy conservation mechanism Two stage quantum walk using Hamiltonian in (5) with γ 1 = 2 and γ 2 = 1 2 . The instantaneous quench occurs at time t 1 = 10 (vertical dotted line). Top: energy expectation values E Γ = Γ H drive + H prob (gold), Γ H drive (green), H prob (blue). Also shown (black, dashed) is a guide to the eye at 0, and the minimum eigenvalue of H prob (red, dashed). Bottom: probability P (t) of being in the ground state of H prob at time t (blue), probability of random guessing (red, dashed).
then decreases the expectation value of the problem Hamiltonian at the expense of increasing the expectation value of the driver Hamiltonian. When the instantaneous quench is performed, the problem Hamiltonian expectation value is unchanged, but the driver Hamiltonian expectation value (and therefore the total energy expectation value E Γ (t)) is reduced. As the minimum eigenvalue of H drive is zero, the total energy expectation value E Γ (t) acts as an effective upper bound on H prob ψ(t) . The net effect is that, even if all of the energy stored in the transverse field were returned to the problem Hamiltonian, its expectation value would still be less than it was at the beginning of the algorithm. What actually happens, however, is that the transverse field is able to capture even more of the energy, thereby reducing the problem Hamiltonian expectation value further, and increasing the average probability of finding the ground state, Fig.2

(bottom).
A more realistic problem is the Sherrington-Kirkpatrick spin-glass (Sherrington and Kirkpatrick, 1975) ground-state problem investigated in . This has the problem Hamiltonian the couplings J ab and fields h b are drawn independently from the normal distribution N (0, σ 2 ) with mean 0 and variance σ 2 . Figure 3 shows a two stage quantum walk performed on a nine qubit Sherrington-Kirkpatrick Hamiltonian ‡ from the public repository in  which is associated with . In the setting of this larger problem, the fluctuations after each stage of the quantum walk are smaller relative to the dynamical range than in the two qubit case, a very early sign of the approach to the thermodynamic limit. Apart from this, the behaviour is qualitatively similar to the ‡ We chose instance ovcjhwbhtcpcvwicoxpdpvjzqojril and used it throughout this paper for all single SK problem examples.  prob of (5) shown in Fig. 2, and produces a significant increase in the probability of finding the ground state.

Biased two stage quantum walk
We introduce is a biased driver Hamiltonian, similar to the one used in Duan et al. (2013); Graß (2019). We formulate our biased driver Hamiltonian slightly differently as where g i ∈ {−1, 1} is a candidate (or guess) solution, and takes the value 1 if the ith bit of the guess solution is 0, and −1 if it is 1. The certainty of the guess is parametrized by 0 ≤ θ ≤ π 2 ; if θ = 0 the guess goes unused and the driver reduces to a transverse field of (3). In the other extreme, if θ = π 2 , then the ground state of H bias (g, θ = π 2 ) is the candidate solution and there are no dynamics. The ground state of the biased driver Hamiltonian has zero energy for all allowed values of θ and g, and is a tensor product of spin states which are each anti-parallel to the fields in (7); this state is used as the initial state. For simplicity, in this example we only consider biasing toward the most optimal solution (i.e., correct guesses), and we use the same nine-qubit SK spin glass as in the previous subsection.
As Fig. 4 shows, the effect of biasing toward the optimal solution is to lower the initial values of E Γ and H prob ψ(t) ; biasing toward a well chosen guess effectively gives the algorithm a 'head start' with respect to energy expectation values. This is qualitatively similar to what happens at the beginning of the second stage of the two stage quantum walk, except that the driver energy H bias (g, θ) ψ(t) starts at exactly zero, rather than having some initial energy left over from a previous stage. The bias improves the initial stage success probability by a factor of ten compared with the Biased two stage quantum walk on a 9 qubit Sherrington-Kirkpatrick spin glass, ID code ovcjhwbhtcpcvwicoxpdpvjzqojril from the public repository in  with γ 1 = 3 and γ 2 = 1, using a biased driver (7), biased towards the optimal solution of H prob using θ = π 8 . The instantaneous quench occurs at time t 1 = 10 (vertical dotted line). Top: energy . Also shown (black, dashed) is a guide to the eye at 0, and the minimum eigenvalue of H prob (red, dashed). Bottom: probability P (t) of being in the ground state of H prob at time t. (blue).
unbiased walk in Fig. 3, while the second stage again provides a (further) factor of three improvement. This biased two-stage quantum walk example provides proof-ofconcept that the mechanism we describe can be leveraged on top of a biased search. A thorough analysis of biased (single stage) quantum walks as a subroutine for hybrid quantum/classical computing is forthcoming (Nita et al., 2020).

Pre-annealed quantum walk
Our final example is again in two stages, but this time the first stage is a quantum anneal, and the second stage is a quantum walk that starts from the point where the anneal stops. The motivating intuition is that the initial time-dependent annealing stage will prepare an initial state for the quantum walk that has a lower average problem energy H prob ψ(t) than the usual uniform superposition state. If performed too slowly, such a quench will put the system into its instantaneous ground state, by the adiabatic theorem of quantum mechanics, and there will be no quantum walk dynamics. If performed too rapidly, the state will not evolve much during the anneal stage and the resulting quantum walk will be similar to one without a preannealing stage. However, if the anneal is performed at an intermediate rate, it leads to significant quantum walk dynamics, starting from a lower problem Hamiltonian expectation value H prob ψ(t) .
Using the H AB parametrization defined in (1), we consider pre-annealing with a quadratic schedule for a time t 1 , and then a steady state quantum walk afterwards; specifically, we define the schedule which is plotted in Fig. 5 for the values of t 1 we use.
Using the same nine-qubit SK problem as before, with its optimal γ value of approximately 1.004, the results for three different values of t 1 are shown in Fig. 6. Pre-annealing both decreases the average problem expectation value H prob ψ(t) and increases the success probability, but causes the peak values to be reached more slowly. In the longest pre-anneal with t 1 = 4, the success probability undergoes small amplitude, approximately sinusoidal, oscillations suggesting that the dynamics are dominated by a two level subspace. For t 1 = 0.5 and t 1 = 0, the oscillations are less structured, indicating that more than two energy levels are playing a non-trivial role in the dynamics. The increases in the success probability seen in Fig. 6 are relatively modest for this example. To determine the typical improvement in success probability due to pre-annealing, we use all 10, 000 Sherrington-Kirkpatrick instances from  at each size from n = 5 to n = 11 and compare the quantum walk success probability averaged over the quantum walk stage using 20 different linearly spaced pre-annealing times up to t 1 = 4. In Fig. 7 (top), we see that the success probability increases with pre-anneal time, up to a plateau, and the relative effect of pre-annealing becomes larger as n increases. To quantify this effect, we calculate the scaling exponent at each pre-annealing time by fitting a linear model on log-linear axes. We find a scaling exponent κ such that the success probability p success ∝ 2 κn . The fitted values of κ are plotted in Fig. 7 (bottom). As the inset of Fig. 7 (bottom) shows, the success probability is modelled well by a simple exponential function, as in . We find that pre-annealing significantly improves the scaling from κ = −0.418 for a pure quantum walk, in agreement with , to a maximum of κ ≈ −0.278. It is, of course, an open question whether or not this scaling will continue to problem sizes which are of practical interest, but the lack of visible finite size effects in Fig. 7 suggests that it might. Since very fast quenches can be experimentally challenging to implement, although methods are being explored (Lanting, 2018), determining the effects of quenching at a finite rate is of practical  . Top: success probability P for n = 5 to n = 11 for 21 different linearly spaced pre-anneal times from t 1 = 0 to t 1 = 4, darker magenta colour indicates higher n. All data are averaged over all 10, 000 Sherrington-Kirkpatrick instances from  at each size. Bottom: Scaling exponent κ for a model where psuccess ∝ 2 κn extracted from the linear fit on log-linear axes for different pre-annealing times in the inset. Inset: Scaling of success probability versus n, for the same t 1 values, with t 1 = 0 in red, and t 1 = 4 in dark blue (same colour coding as the bottom main figure).
importance. Our results show that such quenches are potentially a better strategy than trying to speed up or slow down to approach QW or adiabatic extremes.

Energy redistribution mechanism
In all the examples in section 2, we observe that the total energy expectation value E Γ (t) never increases during a rapid quench, and that E Γ (t) serves as an upperbound to the problem expectation value H prob ψ(t) , assuming that the groundstate of H drive is arranged to be at zero energy (the identity term in (3) ensures this). In this section, we formalise these observations into a mechanism that we refer to as the energy redistribution mechanism. Our analysis extends the energy conservation arguments made in Hastings (2019);  to quenches where the Hamiltonian is not time invariant, and therefore total energy is not conserved. Consider a closed system quantum annealing schedule on a system with a Hamiltonian H(t) defined by (2): We show that (for duration t f ≥ 0) the energy expectation value with respect to the problem Hamiltonian at the end is never higher than at the initial time t = 0, provided the following conditions are satisfied: (i) (initial ground state) the initial state |ψ(t = 0) is a ground state of the driver Hamiltonian H drive (ii) (positivity) the control function is non-negative: Γ(t) ≥ 0 ∀t (iii) (monotonicity) the control function is monotonically decreasing: Condition (i) is simply that the system is initially prepared in the ground state of the driver Hamiltonian. This condition is necessary for AQC, and is also standard for QW. Condition (ii) prevents pathological behaviour where the driver spectrum is effectively inverted by taking negative values of the control function Γ(t). This condition is satisfied in all traditional AQC and QW settings. Condition (iii) is that the quench is monotonic; this condition excludes methods such as reverse annealing, both the dissipatively driven form proposed in Chancellor (2017) (2019) is compatible with condition (iii). Our results do not rely on the adiabatic theorem and the control function Γ(t) does not need to be a continuous function. Without loss of generality, the driver Hamiltonian H drive can be chosen such that its ground-state eigenvalue is zero. Let E Γ (t) = ψ(t) | H Γ (t) | ψ(t) be the expectation value of the energy at time t. Then, it follows immediately from condition (i) that, at time t = 0, Furthermore, it follows from conditions (i) and (ii) that, at any later time t > 0, Taken together, the statements in (11), (12) and (13) imply for final time t f . In other words, the energy expectation with respect to the problem Hamiltonian can only decrease compared with the initial state. If the energy expectation of the problem Hamiltonian decreases, then the probability of measuring low energy states increases. Appendix A.2 shows (14) holds for quenches, parameterized with the single control function Γ(t), in the form of (2). However, since the control function Γ(t) is identified with the ratio A(t) /B(t) of control functions for quenches in the form of (1), the result in (14) follows automatically for quenches in A(t), B(t) form, except for when B(0) = 0, when Γ(0) is not well-defined. In Appendix A.3, we extend to the case where B(0) = 0, with the additional condition that the driver Hamiltonian H drive has a finite gap between its ground and first-excited manifolds (which is automatically true for all Hamiltonians on Hilbert spaces of finite dimension).
The key result is that, for quenches where the control function Γ(t) decreases monotonically, the energy expectation value of the problem Hamiltonian H prob cannot be higher than its initial value. Put another way, on average, a monotonic quench can never perform worse than random guessing. This result is important for two reasons. Firstly, although not being harmful to average solution quality is a rather weak statement, it applies very generally to a broad class of algorithms. Secondly, and more importantly, this result can be built upon to determine control functions that can provide a significant improvement, which is important for algorithm design. To do this, we need to combine the result in this section with criteria for when the transfer of amplitude between computational basis states will be significant, which we obtain in the next section.

Ensuring significant dynamics
In section 2, we showed examples of a quantum quench giving significantly better performance than pure quantum walks. In this section, we consider theoretically how a significant improvement can occur. We know from section 3 that dynamics will never be detrimental; this means that, if dynamics occur, in general it will be beneficial. What remains is to determine the circumstances in which significant dynamics will occur.

Quantifying the strength of short-time dynamics
In the analytical solutions for unstructured search in a continuous-time setting (Roland and Cerf, 2002;Childs and Goldstone, 2004), the method involves analysing the dynamics in a two dimensional subspace. To obtain significant dynamics in this setting, the hopping rate γ or schedule functions A(t), B(t) must carefully balance the relative strengths of the driver and problem Hamiltonians, such that the off-diagonal terms in the two dimensional subspace are maximized. Motivated by this, but being interested in shorter timescales, we instead investigate local subspaces spanned by a pair of basis states. To analyse whether significant dynamics will occur, we look at both how far the initial state of the system is from local eigenstates in these different subspaces, and how strong the transitions are to locally redistribute amplitude. If these are both large, for most of the transitions mediated by the driver, then the system will generate a high level of dynamics on a short timescale; otherwise, it will not, although dynamics may still occur on longer timescales.
As we want a measure of dynamics that can be efficiently estimated at all sizes, we analyse individual pairs of computational basis states connected by the driver, to determine whether significant transfer occurs between them, assuming the rest of the system remains in its initial state. Note that for classical problems in the setting we are considering, the problem Hamiltonian is diagonal in the computational basis, hence all of its subspaces are, too. Consider two basis states |j and |k connected by the driver, i.e., j | H drive | k = 0, and define an effective two-level system Hamiltonian where E (j) = j | H prob | j is the energy of computational basis state |j with respect to the problem Hamiltonian (similarly for k), and with the local driver Hamiltonian H (jk) driver defined as The extent to which the local subspace Hamiltonian H (jk) Γ (t) can transfer amplitude between the basis states |j and |k can be characterised by comparing the off-diagonal energy scale to the diagonal one. Define a local transfer coefficient, which takes values 0 ≤ T (jk) ≤ 1, as where is the difference between the diagonal elements in the diagonal basis of the problem Hamiltonian. Similarly, as implied by the energy redistribution mechanism described in section 3, transfer between driver eigenstates is also important. To capture this, we define a local driver coefficient D (jk) by transforming the local subspace Hamiltonian in H where U (jk) is a unitary such that U (jk) † H (jk) drive U (jk) is diagonal. It is easily shown that, for unbiased drivers such as (3), the local driver coefficient D (jk) and local transfer coefficient T (jk) are related by D (jk) = 1 − T (jk) . This makes it clear there is a trade off between the two quantities to obtain significant dynamics under the combined Hamiltonian. We quantify the overall level of amplitude transfer we expect by the product of the transfer and driver coefficients T (jk) and D (jk) , which we call the dynamic coefficient, For unbiased drivers, since D (jk) = 1 − T (jk) , and 0 ≤ D (jk) , T (jk) ≤ 1, it follows that Dyn (jk) satisfies 0 ≤ Dyn (jk) ≤ 0.25. The dynamic coefficient Dyn (jk) captures the level of algorithmically useful local dynamics experienced by the system. In particular, if Γ 1, then the driver Hamiltonian dominates and the problem Hamiltonian H prob will have little effect on the dynamics of the system. Since the initial state is the ground state of the driver Hamiltonian, the dynamics are driven by the much smaller problem Hamiltonian on short timescales. This limit is captured by the dynamical coefficient, as D (jk) ≈ 0, and hence Dyn (jk) ≈ 0. In the opposite extreme, if Γ 1, then the problem Hamiltonian dominates, but since it is diagonal, the dynamics will consist almost entirely of phase rotations in the computational basis, and the amplitudes will change very little. This limit is captured by the transfer coefficient, as T (jk) ≈ 0, and hence Dyn (jk) ≈ 0.
To characterise the level of dynamics in the entire system, we can simply take a mean value of Dyn (jk) over the values of j and k which correspond to a non-zero off diagonal element in H drive . That is, we define the average dynamic coefficient where · jk represents the mean over all pairs of computational basis states j, k connected by the driver Hamilton H drive . Although (22) cannot be exactly calculated efficiently, it should in general be possible to approximate it efficiently (up to additive error) by sampling. This follows from the fact that the values of Dyn (jk) are bounded 0 ≤ Dyn (jk) ≤ 0.25, and therefore the error δDyn can be reduced to the range this value can take, multiplied by the statistical noise in the sample, which scales as the square root of the number of samples, i.e., Equipped with the definition of the average dynamic coefficient Dyn, we can investigate when it is possible to find a value of Γ(t) such that Dyn is large enough for significant short time dynamics to be generated. For simplicity, we restrict ourselves to the unbiased driver case, when the local driver coefficient D (jk) and local transfer coefficient T (jk) are related by D (jk) = 1 − T (jk) . In this case, the local dynamic coefficient Dyn (jk) can be written in terms of the driver strength Γ(t) and a single scaled gap parameter ζ jk = |∆ jk | 2| k|H drive |j | as If we write p ζ for the probability density function that governs the distribution of ζ jk in the particular problem and driver Hamiltonians under consideration, then it can be shown that the maximum value attained by the average dynamic coefficient Dyn for any choice of driver strength Γ(t) has a lower bound which can be stated formally as where µ 1 (p ζ ) (µ 2 (p ζ )) is the first (second) central moment of the distribution governed by the probability density function p ζ . Note that this bound is obtained by choosing the specific driver strength Γ = µ 1 (p ζ ), i.e., the mean of the rescaled local gaps, which is not necessarily optimal, but serves to produce a non-trivial lower bound. We give the proof of the formal lower bound (25) in Appendix B.2. It is illuminating to look at the shape of this bound, which can be easily computed numerically for any given value of the ratio of moments µ2(p ζ ) /µ 2 1 (p ζ ). The bound is plotted for the interesting range of the ratio of moments in Fig. 8. It can be seen that the lower-bound is non-trivial when µ2(p ζ ) /µ 2 1 (p ζ ) < 1.0, but is trivially zero otherwise. This shows that there is a continuous range where Dyn is bounded away from zero, and hence dynamics will definitely happen on short timescales, even for non-optimal choices of Γ(t). This bound is in general far from tight, but still allows us to produce some interesting examples. We next illustrate the calculation of Dyn and the lower bound in (25) for some specific cases.

Example: Sherrington-Kirkpatrick spin-glass
We consider the Sherrington-Kirkpatrick spin-glass problem Hamiltonian given in (6). We take the driver Hamiltonian H drive to be the transverse field defined in (3). Due to the promising results found in  for solving this problem with quantum walks, as well as for the more general quenches presented in section 2, we expect intuitively that it should be generally possible to find values of Γ for which the average dynamic coefficient Dyn takes an appreciable value.
The transverse field driver only connects pairs of states j, k that differ by a single bit flip. Thus, it can be seen from (6) that, for all such pairs, the energy difference can be written where a is the index of the spin that is flipped between states |j and |k , the sum runs over b which indexes the other spins, s (j) ab is the eigenvalue (±1) of the operator Z a Z b on the state |j and s (j) a is the eigenvalue (±1) of the operator Z a on the state |j . The gaps ∆ jk in (27) is a sum of normally distributed variables with mean 0, and so ∆ jk is itself a normally distributed variable with mean 0, and can be shown to have a standard deviation ς = 2(n + 1)σ. Then, since k | H drive | j = 1 for the unbiased transverse field driver, the scaled gap ζ jk is distributed according to the half-normal distribution with probability density function For this distribution, it can be shown that the ratio of moments is which we emphasise is independent of the width ς of the distribution of the scaled gap ζ jk . For this value of the ratio, the lower bound shown in Fig. 8 is While this value is small compared to the maximum possible value of Dyn = 0.25, which is not unexpected for a hard problem (NP hard), we emphasise that it is independent of the width ς of the distribution of the scaled gap ζ jk and thus does not scale with the system size. Bounding Dyn away from zero for all sizes proves that dynamics will occur over short timescales for suitable control parameters, thus providing evidence that the scaling found in  may continue to useful problem sizes.

Example: unstructured search
As a contrasting example, we consider the problem of unstructured search on n qubits, in which a single computational basis state |m , out of the total N = 2 n basis states, is marked by being given a lower energy. The Hamiltonian for this problem is and again we take the driver Hamiltonian H drive to be the transverse field defined in (3). While unstructured search is a well known example with a provable quantum advantage, the algorithms which yield this advantage all involve coherent operations on time scales of order √ N = 2 n 2 rather than the short-time dynamics we are discussing in this paper. As such, we would intuitively not expect the lower bound in (25) to be large in this case.
Of the n 2 n−1 total off diagonal matrix element pairs in the transverse field driver, only n of these will connect a pair of computational basis states with non-zero energy difference, having energy difference ∆ jk = 1, with the remaining n 2 n−1 − n pairs having zero-energy difference ∆ jk = 0. Therefore, the distribution of scaled gaps ζ jk can be written as p ζ (ζ) = n n 2 n−1 δ(ζ − 1) + 1 − n n 2 n−1 δ(ζ) Calculating the first and second central moments of this distribution gives and so the relevant ratio of moments is Looking at the plot of the lower bound in Fig. 8, we can see that, for unstructured search the bound is trivially zero for all n > 1 We can also calculate the exact value using (24). For each of the n 2 n−1 −n pairs of states j, k with |∆ jk | = 0, Dyn (jk) = 0 ∀Γ. For the remaining n pairs of states j, k with |∆ jk | = 0, the choice of driver strength Γ = 1.0 will maximise Dyn (jk) = 0.25 for all remaining pairs of states. Thus, the average dynamic coefficient for unstructured search is which tends toward the lower bound of zero in the limit as n → ∞. This tells us that, for search, most two-level subspaces do not exhibit dynamics and probability enhancement of the marked state can only happen through finely tuned control. For an adiabatic algorithm, this is achieved by slowly adjusting the Hamiltonian within a precise range so that the system can follow a very delicate path, whereas for quantum walk this is achieved by reaching a finely tuned resonance between the marked state and the rest of a symmetric subspace of the Hilbert space. While interpolations between these two extremes are possible (Morley et al., 2019), all of the interpolated algorithms also rely on dynamics of a two level system with a gap proportional to √ N = 2 n 2 . In such a system, significant dynamics cannot occur in the timescales of rapid quenches, O(1) or O(poly(n)).

Using dynamics to find heuristic quench parameters
As mentioned in section 4, the average dynamic coefficient Dyn can in general be efficiently estimated by sampling. In this section, we show via two practical examples that this estimate can be used to develop heuristic methods for setting the control function Γ(t), or equivalently, A(t) and B(t), for a rapid quench, in both quantum walk and quantum annealing settings. In both cases, we use the unbiased transverse field driver Hamiltonian defined in (3). First, we consider the quantum walk algorithm, starting with a simplified example of a two qubit system. We then develop a heuristic for the Sherrington-Kirkpatrick spin-glass, and show that it performs almost as well as the numerically fine-tuned heuristic described in , without needing any fine-tuning. Second, we develop a simple heuristic method for defining a schedule for a time-dependent rapid quench, also applied to the Sherrington-Kirkpatrick spinglass, that outperforms a linear ramp.
In all the examples discussed in this section, we computed the average dynamic coefficient Dyn numerically using all non-zero j,k pairs, rather than estimating it by sampling such pairs. This is computationally easy to do at these problem sizes, and allows us to separate the effectiveness of the heuristic from errors due to sampling.

Heuristic hopping rate for a quantum walk
For a quantum walk, the average dynamic coefficient Dyn is a function of the chosen hopping rate Γ(t) = γ. Informed by the result in section 3 that dynamics will typically be useful, it follows that by maximizing Dyn we can obtain a heuristic hopping rate γ Dyn , that should ensure significant dynamics occur over short timescales. For the two qubit Hamiltonian from (5), Fig. 9 shows how the average success probaility within 100 dimensionless time units P 100 varies with γ. For this two qubit system, we can exactly calculate Dyn, see section 4.2, shown in Fig. 9. The maximum value of Dyn gives a value for γ Dyn which is a good quality estimate for the value of γ opt . Using bisection and a numerically calculated derivative, we find that γ Dyn ≈ 0.864, while the peak of P 100 occurs around γ = 1. Since the peak of P 100 is quite broad, the . Log-linear plot of average short time success probability P short against number of qubits n for quantum walks on the spin-glass dataset from , using the heuristic hopping rate γ dyn derived for each instance by optimizing the average dynamic coefficient Dyn (red). Also shown for comparison, P short obtained using the fine-tuned heuristic hopping rate γ heur (blue) described in . discrepancy between γ Dyn and γ opt only reduces P 100 by a small amount, as can be seen in Fig. 9.
To test how well this heuristic hopping rate works for a more realistic example, we numerically calculated γ Dyn for each instance of size 5 ≤ n ≤ 15 of the spin glass problems from . This was done by performing a bisection optimization to maximise the value of Dyn as a function of γ for each instance. Following the methods in , we performed a short-time quantum walk and calculated the success probability P short , which is time-averaged over a short run time. Averaging over all instances of a given size, we obtain the average short time success probability defined in ), for measuring the problem ground-state. This is shown (red line) for each size in Fig. 10. Included for comparison (blue line) are the results from  using the fine-tuned heuristic γ heur defined there, using properties of the eigenvalue distribution for the spin glass problem Hamiltonian. It can be seen that, despite γ heur being numerically finetuned specifically for the Sherrington-Kirkpatrick spin-glass problem, it performs only marginally better the general method we have used here. Fitting the data produces P short ∼ O(N (−0.411±0.002) ) for γ heur compared to P short ∼ O(N (−0.425±0.001) ) for γ Dyn . The eigenvalue distribution used in  would not generally be available to calculate γ for real problems; this comparison shows that using Dyn is a viable method for determining a useful value for γ in this case. For the small size instances we are using, we have used all the values of Dyn (jk) to calculate the average in the definition of Dyn in (22). We can show that the error in γ Dyn due to sampling a subset of Dyn (jk) values stays manageable for larger sizes. Consider a small error δγ in γ. Doing a Taylor expansion of Dyn(γ) around its peak value Dyn max gives where γ Dyn is the value of γ our heuristic would find § with the exact Dyn max . Using the sampling error in Dyn from (23) and rearranging yields This is a general expression that can be used for any problem Hamiltonian. For the Sherrington-Kirkpatrick spin glass, we can use the distribution of the scaled gaps from (28), and the definition of Dyn (j,k) from (24), to obtain the average value of Dyn(γ) for SK instances, Dyn(γ) Making the substitution z = ζ /(2 √ 2ς), to remove the ς dependence in the exponential, and differentiating twice w.r.t. γ gives (42) § Note that the first order term is absent, and the second order term is negative, because we are sampling at the maximum, and we have implicitly assumed that there are no other peaks in Dyn which take similar values. If This needs to be evaluated at γ = γ Dyn , at the peak of Dyn (γ), which doing the substitution z = ζ /(2 √ 2ς) in (41) shows occurs at a fixed value of γ /ς. Hence, the scaling with n of the double derivative at γ = γ Dyn is determined solely by the ς −2 prefactor in (42). Recalling from section 4.3 that ς = 2(n + 1) for these SK spin glasses, and putting it back into (40)    determined numerically that the peak in the success probability as a function of γ is very broad for SK spin glasses, and that the width of this peak decreases as 1 /n. Combined with (43), this means the sampling rate to calculate Dyn needs to increase by a poly(n) factor as n increases, in order to determine γ Dyn to sufficient accuracy. Since n corresponds to the number of qubits, this can be done efficiently.

Heuristic schedule for quantum annealing
For a time-dependent rapid quench of the form H AB (t) defined in (1) and total duration t f , a common choice of control functions, inspired by the adiabatic algorithm, is A(t) = 1 − s(t) and B(t) = s(t), where s(t) is a schedule function with boundary conditions s(0) = 0 and s(t f ) = 1. In the absence of any knowledge about where along the schedule useful computation can happen, the schedule function is often set to be the linear function s(t) = t /t f . The average dynamic coefficient Dyn provides a measure of the level of dynamics at each point along the schedule. Intuition gained from section 3 suggests that the linear schedule can in general be improved by spending less time in regions where Dyn is small and more time in regions where Dyn is large. A straightforward way to do this is to choose ds dt ∝ 1 Dyn (the constant of proportionality is set by the boundary conditions s(0) = 0 and s(T ) = 1). We have approximated such a schedule for a typical nine qubit Sherrington-Kirkpatrick spin-glass instance, as shown in Fig. 11(a) (red line). We have done this by fixing the value of the points marked by circles according to ∆s ∝ ∆t Dyn , subject to the boundary conditions, and then linearly interpolating between them. A linear schedule s(t) = t /t f (blue line) is also shown for comparison. Figure 11(b) shows the instantaneous success probability P (t) for measuring the problem ground-state as the quench progresses along the heuristic schedule (red line) and the linear schedule (blue line) for quench duration of t f = 2. It can be seen that the simple heuristic we've used here has resulted in a significant improvement in success probability at the end of the schedule. We have checked sufficiently many of the instances to determine that this level of improvement is typical for this size of problem and total time duration t f = 2. Further improvements may be available by varying t f or choosing a different function of Dyn for ds dt .

Numerical methods
Numerical simulation and optimization were used extensively throughout this work, as much of the analysis we have performed is not analytically tractable. The simulations and plots were performed using the Python language (Van Rossum and Drake, 2003), aided extensively by the NumPy (Oliphant, 2006), SciPy (Jones et al., 2001-), quimb, (Gray, 2018), andMatplotlib (Hunter, 2007) libraries. We also used the IPython interpreter (Pérez and Granger, 2007) and Jupyter notebook system (Kluyver et al., 2016). MATLAB was used for some early numerical experiments, but not for any results which directly appear in the manuscript. The numerical optimization used to produce Figs. 8, 9, 10 and 11, as well as the curve fitting used in Figs. 7 and 10, was performed using the optimization tools in SciPy (Jones et al., 2001-).
The Sherrington-Kirkpatrick spin glass instances in the data repository at  have been used extensively.
In any cases where a single example Sherrington-Kirkpatrick spin-glass instance has been used, it is the instance ovcjhwbhtcpcvwicoxpdpvjzqojril. The plot of average short time success probability P short against number of spins n in Fig. 10 uses all of the Sherrington-Kirkpatrick spin glass instances in the repository.

Summary and further work
In this paper, we have generalised and extended work begun in  to time-varying quantum annealing schedules.  provide numerical evidence for the ability of quantum walks to solve NP hard problems using many repeats of short runs. This strategy scales better than quantum search, by exploiting the correlations in the problem Hamiltonian. The energy conservation mechanism identified in  explains how energy conserving quantum walks can find lower energy states with better than guessing probability. In section 3, we generalised the energy conservation mechanism to an energy redistribution mechanism that holds for all monotonic quenches which start in the ground state of the driver Hamiltonian and have non-negative control functions.
The improvements leveraged by time-varying rapid quenches can be considerable, as we illustrated in section 2. To generate significant energy redistribution, there needs to be significant dynamics driving the system away from the initial state. To characterise the dynamics, in section 4 we defined the average local dynamic coefficient that balances the contributions from both the driver and problem Hamiltonians. This allows the control functions in the Hamiltonian to be optimised for fast dynamics, and provides a very general way to estimate good values to use for specific problems. For the spin glass data (Chancellor, 2019), we showed in Fig. 10 that such estimates are almost as good as the numerically optimised values used in .
Taken together, the energy redistribution mechanism and the average dynamic coefficient are powerful tools for understanding, designing, and optimally controlling rapid quench quantum annealing algorithms. While adiabatic quantum computing and quantum walk search have long had theoretical underpinnings, this represents a significant step in understanding how to exploit quantum annealing schedules run for short times. For current state-of-the-art noisy quantum computers, short run times are a big advantage over the long coherence times required for adiabatic quantum computing, or quantum walk search.
We have shown that our tools apply to the biased drivers proposed in Duan et al.  Inc., 2019b), are by definition not monotonic, so the tools and mechanisms identified here cannot be applied. Since reverse annealing is a powerful tool, extending our results to nonmonotonic cases is a worthwhile direction for further research.
Since the initial state |ψ(t = 0) is a ground state of the driver Hamiltonian H drive , it follows immediately that the expectation value of the driver Hamiltonian is at its lowest at t = 0, that is, ψ(t) | H drive | ψ(t) ≥ ψ(t = 0) | H drive | ψ(t = 0) , since the expectation value of the driver Hamiltonian H drive for any quantum state cannot be less than that of the ground state.
The total energy expectation as a function of time can be written where the notation . ψ is used to denote the expectation value with respect to the state ψ has been adopted. Since energy is conserved for 0 < t < t fin , it follows that, for → 0, E( ) = E(t f − ), and therefore rearranging terms, and recalling that ψ(t = 0) is the ground state of H drive and γ ≥ 0, we observe that, and therefore H prob ψ(t f ) ≤ H prob ψ(t=0) . Since ψ(t = 0) is not an eigenstate of the full Hamiltonian, some dynamics are guaranteed to happen, and thus there will be times t > 0 when H prob ψ(t) is strictly less than H prob ψ(t=0) . In the next subsection we show that by discretizing and using a rescaled representation, we can use a combination of the energy conservation mechanism and arguments about the stages where the Hamiltonian is modified, to generalize a version of the energy redistribution mechanism to all cases of time dependent Hamiltonian evolution which obey the conditions given in section 3.

A.2. Energy redistribution mechanism
We aim to show that the energy expectation value decreases monotonically with time. To do so, we first introduce a discretized approximation to the evolution as for 1 ≤ k ≤ q and where the symbol T is added to emphasise that the time order of the product must be preserved, since the Hamiltonians at different times are noncommuting. This discretized approximation becomes exact in the limit q → ∞. The evolution of a quantum system under the time dependent Hamiltonian given in (1) from time t = 0 to time t = t f from the initial state |ψ(0) is broken down as follows: The initial state is evolved under the constant Hamiltonian H( . This kind of discretization can be thought of as an extension of the Suzuki-Trotter decomposition (Trotter, 1959;Suzuki, 1993) and is therefore sometimes informally referred to as Trotterization. In the same manner, we can define a discretized version of the energy expectation value as Quantum states are only defined up to a constant phase, which is equivalent to choosing an arbitrary 'zero' for the energy. Hence, without loss of generality, we choose to set ψ(0) | H drive | ψ(0) = 0; in other words, we impose semidefiniteness on H drive by defining its ground state |ψ(0) to have eigenvalue 0.
During each time-independent evolution step, the energy expectation value E (q) Γ,k is conserved. Furthermore, since by definition H drive is positive semidefinite and Γ( Repeated application of this inequality results in the more useful inequality Since E (q) Γ,1 is the rescaled energy during the whole of the first evolution step, it follows that E (q) Furthermore, we have that lim which means Since this equation holds for all t f > 0, we have shown that E Γ (t) monotonically decreases with t.
A.3. The case of B(t) → 0: divergence of Γ The result in section 3 is that the inequality (14) holds for any quench with a Hamiltonian in the form of (2) that satisfies the three conditions listed in section 3. We now consider quenches with a Hamiltonian in the form of (1). Any Hamiltonian of the form (1) with B(0) > 0 can be put in the form of (2) by identifying the ratio A(t) /B(t) with Γ(t) and rescaling by a factor 1 /B(t), which can be formally compensated for by rescaling time by a factor of B(t). Thus, the inequality (14) holds also for any quench with a Hamiltonian in the form of (1) with B(0) > 0 and which otherwise satisfies the three conditions listed in section 3. Here, we show that this can be extended to to case where B(0) = 0. In the case that B(0) = 0, consider the modified Hamiltonians and the modified control functions where 1. It can be seen that that total Hamiltonian is unchanged, but we have that We define Γ (t) = Γ(t)
It can be immediately seen that Γ (t) is non-negative if Γ(t) is non-negative, and so condition (ii) is satisfied. Furthermore, dΓ (t) dΓ(t) = 1 Thus, Γ (t) is monotonically-decreasing if Γ(t) is is monotonically decreased, and so condition (iii) is satisfied. If we were to start the protocol in the state |ψ gs , a ground-state of H drive , condition (i) would be satisfied and the result would be proven. However, the original protocol we are considering starts in the the state |ψ(0) , ground-state of H drive . Applying first order perturbation theory in to H drive , we find that H drive has a ground-state where |ψ ⊥ is a normalized state vector orthogonal to |ψ(0) and ∆ is the energy gap between the ground and first-excited manifolds of the actual driver Hamiltonian H drive . Thus, assuming the driver Hamiltonian H drive is not gapless (which is automatically true for all Hamiltonians on Hilbert spaces of finite dimension), the inequality in (14) is satisfied in the limit as → 0.
B. Lower bound on the average dynamic coefficient B.1. Bound on probabilities in a range based on second moment Here, we prove a useful bound that will be applied in the following subsection. Assume that the distribution p(x) has a finite second moment where is the first moment (mean). Let us choose some values x max > x min such that µ 1 (p) = 1 2 (x max + x min ). The distribution q(x) = 1 2 δ(x min − ) + 1 2 δ(x max + ) has the minimum possible second moment while having no support in the interval [x min , x max ], where δ is the Dirac delta distribution. In the limit → 0, the second moment of this distribution is µ 2 (q) = 1 4 (x max − x min ) 2 . Thus, if µ 2 (p) < µ 2 (q), then p(x) must have some support within the range [x min , x max ]. In particular, because second moment µ 2 (p) can be lower bounded as the probability for x to be in the interval [x min , x max ] can also be lower bounded as

B.2. A simple lower bound
Let ζ jk = ∆ jk H drive jk and let η jk = ζ jk Γ . Furthermore, let p ζ and p η be probability density functions that govern the distribution of the values ζ jk and η jk , respectively, over a set of problem instances. Let µ 1 (p) and µ 2 (p) refer to the first and second moments, respectively, of a distribution governed by the probability density function p.
The dynamic coefficient is so we will consider the function where x > 0. Let x be distributed according to the probability density function p η . We know that the expectation value f (x) x is then where x max > x min , P η (. . . ) is the probability of its argument being true if η is distributed according to p η , and f (x) b a is the expectation value of f (x) if x is distributed according to a (renormalized) version of p η with all support on x < a and x > b removed. As f (x) is positive for all x > 0, we get the lower bound on Since f (x) is also convex, we know that Now, let the interval [x min , x max ] be of width 2c (for some c > 0), and centred on the mean µ 1 (p η ) (thereby also constraining c < µ 1 (p η )). That is, x min = µ 1 (p η ) − c and x max = µ 1 (p η ) + c, and we must find out which of f (µ 1 (p η ) − c) and f (µ 1 (p η ) + c) is smaller. To do this, we consider under what conditions it is true that f (µ 1 (p η ) − c) < f (µ 1 (p η ) + c).
Applying the result in subsection B.1, we have P ζ (µ 1 (p ζ ) (1 − c) < x < µ 1 (p ζ ) (1 + c)) ≥ 1 − 4µ 2 (p ζ ) Putting this all together gives While this inequality gives a valid lower bound on Dyn, the greatest lower bound can be written which can be found numerically for any given value of µ2(p ζ ) µ 2