Designing Quantum Annealing Schedules using Bayesian Optimization

,


I. INTRODUCTION
Over the last few decades, quantum annealing (QA) has emerged as a novel computational paradigm [1,2].According to the adiabatic theorem, (ideal) QA has theoretically been shown to be computationally equivalent to the more prevalent circuit model of quantum computing [3].However, QA has garnered significant interest mostly due to its potential to encode and solve combinatorial optimization problems [4][5][6][7][8], with important applications across science and industry.Still, several pitfalls remain -most notably, the problem of vanishing gaps during the anneal, which results in the run time of QA scaling exponentially for some problems [9,10].Several extensions have been proposed to overcome this problem, such as leveraging diabatic effects [11], counter-diabatic driving [12][13][14] and reverse annealing [15,16], to name a few.
Recently, variational quantum adiabatic algorithms have been proposed in which a classical optimizer designs QA schedules within a larger hybrid (quantum-classical) computation [17][18][19][20][21].The goal is then to find schedules with the ability to automatically adjust these to the gap structure of the Hamiltonian at hand; for example, by adjusting the annealing speed with which different parameter regions are traversed, or in the form of diabatic paths, which circumvent the need to adhere to the adiabatic speed limit [11].Several methods have previously been proposed to find such optimized schedules, including optimal control techniques [22][23][24], genetic algorithms [25], machine learning techniques [26][27][28][29], as well as several other gradient-based and gradient-free optimizers [19,30].
In this work, we consider the use of Bayesian optimization (BO) to design high-quality QA schedules with minimal user requirements.BO is a widely used global optimization strategy [31], especially useful when optimizing (black box) cost functions that are expensive to evaluate, be it in terms of run time or actual monetary cost.This frugal nature of BO makes it an attractive candidate to optimize quantum annealing schedules, especially in the current noisy intermediate scale quantum (NISQ) era, where access to quantum devices is limited and expensive.Previously, BO has already been used to prepare desired states in ultra-cold gases [32,33], to find optimal angles in the quantum approximate optimization algorithm (QAOA) [34], and to improve QA on D-Wave quantum annealers [35].
Here, we investigate the use of BO for preparing ground states (or low-energy states) of two paradigmatic spin models.We first show that simple, hands-off BO schemes (as schematically illustrated in Figure 1) are able to design QA schedules resulting in fidelities several orders of magnitude better than simple linear schedules.Additionally, we design reverse annealing (RA) schedules for the p-spin model and show that BO can identify schedules that successfully circumvent the first-order phase transition.Furthermore, we present numerical evidence that Bayesian Optimization QA Schedules Evaluation Numerical simulations Quantum hardware Figure 1.Illustration of the proposed schedule optimization scheme.Panel (a) illustrates Bayesian optimization: a surrogate model f (θ) is built, which represents our belief about the true merit function f (θ).The surrogate model is then used to compute the acquisition function (lower panel in (a)) -the parameter set for which the acquisition function is maximized is chosen as the next set to probe θnext.These parameters are then used to specify the schedules u(t), visualized in (c) (with u(•) referring to some relevant Hamiltonian parameter such as, for example, detuning or driving).The schedules are then evaluated in numerical simulations or on actual quantum hardware.As the outcome can be subject to (e.g., sampling) noise, the obtained values (black circles in (a)) fluctuate slightly around the ground truth (black dashed line in (a)).The evaluation outcome is then used to update f (θ), and the cycle can then start anew.We provide further details in the main text.
these BO-designed RA schedules are robust against dephasing noise, thereby providing an improvement to previous limitations while using RA [36].Finally, we turn our attention to the NP-hard maximum independent set (MIS) problem.In particular, we demonstrate (both in numerical simulations and on actual quantum hardware available on Amazon Braket) that BO can be used to design efficient protocols to solve the MIS problem on neutral atom quantum devices.
The remainder of this paper is organized as follows.In Section II we introduce Quantum Annealing and Bayesian Optimization as the two essential techniques underlying this work.In Section III we then introduce the model systems on which we test our approach, followed by Section IV where we present the results of numerical simulations of the p-spin model and results on solving the MIS problem on an array of Rydberg atoms.In Section V we discuss the implications of these results and in Section VI we lay out some of the future research.

II. METHODS
In this section we outline the main methods used in this paper.First, we briefly introduce QA and define several families of schedule parametrizations as used throughout this manuscript.We then review BO and describe in detail the application thereof in the context of QA schedules.
A. Quantum Annealing

General description
In QA we aim to prepare the ground state of a target Hamiltonian Ĥt by exploiting our ability to control the schedules u j (t) of the time-dependent QA Hamiltonian which governs the time evolution of target system.Note that, in general, the Hamiltonian Ĥ(t) can be composed of n H individual terms.Typically, we begin the procedure with the state of the system |ψ(0)⟩ equal to the ground state of Ĥi ≡ Ĥ(0), i.e., we choose the initial Hamiltonian Ĥi such that its ground state is easy to prepare.We then evolve the state with the time-dependent Hamiltonian Eq. ( 1) for a time t f such that ultimately Ĥ(t f ) = Ĥt .If the rate of change of the Hamiltonian is slow enough compared to the minimum of the spectral gap ∆(t) between the instantaneous ground and first excited states of Ĥ(t), then the success of QA is guaranteed by the adiabatic theorem; see, for example [37].If the requirement of adiabaticity is abandoned, one may (for example) leverage diabatic effects, which can potentially yield shortcuts to the ground state of Ĥt [11,38].However, such paths might be difficult to find in practice.Finally, one can additionally forgo the requirement that Ĥ(t f ) = Ĥt ; this has been explored in the context of QAOA-inspired so-called bang-bang schedules (see Sec. II A 2).

Schedule parametrizations
Here, we describe the schedule parametrizations used in this work, as summarized in Table I.For convenience, we first focus on the first five parametrizations and address the bang-bang protocol individually at the end of this section.All of these five Ansaetze are designed such that they fulfill the boundary conditions u(0) = 0 and u(t f ) = 1, and can be used as basic building blocks for the design of more sophisticated schedules by transforming or combining them according to the required boundary conditions, e.g., u(t) → 1 − u(t).Thus, one can use these parametrizations to construct the desired QA Hamiltonian Eq. (1).

Schedule
Function The real, cubic, and low-pass parametrizations all take a collection of n points θ j as inputs.These specify the value of the schedule at equidistant points in time θ j = u(jt f /(n + 1)), where j = 1, ..., n.In addition u(0) = 0 and u(t f ) = 1.In the real (cubic) parametriza-tion linear (cubic) interpolation is then used to obtain the intermediate values, resulting in a piecewise linear (cubic) schedule.For the low-pass parametrization, the real schedule is additionally passed through a low-pass filter, which smoothens the piece-wise linear schedule (see Fig. 3 for an example).This has been suggested previously to ensure that the schedules comply with the limitations of the neutral atom experimental platform, and to mitigate potential artifacts (e.g., excitations) at the points where the original schedule is not differentiable [30].The degree to which the schedules are allowed to deviate from the linear schedule is controlled by the ζ parameter (see right column of Table I).We use ζ = 2 unless stated otherwise.
In the case of the Fourier parametrization, sinusoidal deviations from the linear schedule are considered, with θ j setting the magnitude of the term with frequency ν j = j/(2t f ).As we found in practice that high-frequency components are less important, we decrease their allowed magnitude with increasing j, thus reducing the search space for the optimizer.
Finally, we turn our attention to the bang-bang schedule, which is qualitatively different from the others, as it is inspired by QAOA-type algorithms designed for gatebased architectures.As shown in Fig. 1c, it consists of constant-magnitude discrete pulses.For the experiments documented in this manuscript, it is sufficient to consider pulses that attain the nonzero value of 2θ1 t f in the intervals [0, t f /2] and [t f /2, t f ] (i.e., either in the first half or the second half of the protocol), corresponding to setting the parameters (t 1 , t 2 ) in Table I to either (0, t f /2) or (t f /2, t f ), respectively.Thus, the parameter θ 1 directly corresponds to the area of the pulse.We only fix the duration of the pulses for convenience.However, in principle, other parametrizations are possible.Consequently, evolving a system with the Hamiltonian u(t) Ĥ, where Ĥ is constant in time, is equivalent to applying the unitary U (θ 1 ) = exp(−i Ĥθ 1 ), thus making the connection to QAOA immediately evident; see, for example, Ref. [39].

Preliminaries
Let us review how BO can be used to optimize a scalar objective function f : D θ → R, where D θ ⊂ R n is the parameter domain.In the most standard variant of BO, Gaussian Processes (GPs) are used to build a surrogate model f (θ) of the optimization objective.We say that a function f (θ) is distributed as a GP f (θ) ∼ GP ⃗ µ(θ), k(θ, θ ′ ) , where ⃗ µ(θ) is the mean and k(θ, θ ′ ) is the kernel function, when for every finite subset of points Θ = {θ 1 , ..., θ m } it holds that f (Θ) ∼ N (⃗ µ, Σ).
Here, N (⃗ µ, Σ) is a multivariate normal distribution with an m-dimensional mean vector (⃗ µ) i = µ(θ i ) and an m×m covariance matrix Σ ij = k(θ i , θ j ).In other words, ⃗ µ contains our belief about the mean values of f evaluated on points in Θ.Similarly, Σ holds the information about the correlation between the values of f on Θ.
We can incorporate our prior beliefs about the objective function through the choice of a kernel function k.In our case, we choose a Matérn 5/2 kernel, which favors continuous functions [40].The chosen kernel is isotropic, meaning that it only depends on the Euclidean distance d := d(θ, θ ′ ) between two points as where the length scale ℓ determines how quickly the correlations between values at two different points decay as a function of the Euclidean distance d between them.
For instance, if one is interested in the values of the model function f at a set of points Θ, and one assumes that one already knows the values for some set of observations Θ obs ⊂ Θ.We can condition the probability distribution of f on the observations f (Θ obs ).A key feature of the normal distribution is that it is closed under conditioning, meaning that the resulting distribution on the remaining points of interest Θ rem = Θ \ Θ obs is also normally distributed as f (Θ rem ) = N (⃗ µ c , Σ c ).Here ⃗ µ c and Σ c are the conditioned mean vector and covariance matrix, respectively, which can be obtained by elementary matrix algebra (see, e.g., Ref. [40]).The straightforward way in which observations can be included into the model is one of the main reasons why GPs are typically used as the surrogate model.Another benefit of BO is that uncertainty about the observations can be easily incorporated into the model, if we assume that f (θ) is a normal random variable centered at the ground truth value, with a standard deviation σ obs .This can be useful, for example, to incorporate statistical noise resulting from a finite number of shots obtained from a quantum device.For a visualization see Fig. 1a, where the observations (black dots) deviate slightly from the ground truth (black dashed line).The updated GP can then be used to determine the next point to probe, which is selected to be the point where the so-called acquisition function φ(θ) attains its maximal value θ next = arg max φ(θ).This is shown in the bottom of Fig. 1a.
Several heuristics for φ are used in the literature [41]; in this work, we use the upper confidence bound [42] φ(θ; κ) = µ(θ) + κ • σ(θ). ( Here µ(θ) = E[ f (θ)] is the mean value of f (θ) (indicated by the teal (solid) line in Fig. 1a) and σ(θ) = (E[ f (θ) 2 ] − µ(θ) 2 ) 1/2 is its standard deviation (light teal shading in Fig. 1a).The hyperparameter κ sets the relative importance of the mean value of f (θ) and its standard deviation and thus controls the amount of exploration permitted.In this work we decrease the value of κ as the optimization progresses.Intuitively, we want the algorithm to explore more initially and leverage the acquired information about the optimization landscape at later stages, thus gradually switching from exploration to exploitation.

BO Pipeline
Let us now provide some more detail about the endto-end pipeline-illustrated in Fig. 1-used to produce the results shown in this paper.At the beginning of the process, a schedule Ansatz (or a set thereof) must be specified.This schedule template takes in a set of parameters, which are elements of bounded intervals.We then need to pass the bounds of these parameters to the optimizer to limit the domain on which a surrogate model of the objective function is constructed.Additionally, a figure of merit (FoM), e.g., expected energy, must be specified, which can be obtained by evaluating the schedule at a particular value of the parameters.The evaluation can then be carried out either in numerical simulations or on an actual quantum device.As we use a finite sample of size N shots to estimate the FoM, we scale the standard deviation of the observations as σ obs ∝ N −1/2 shots .Unless specified otherwise, our BO pipeline then proceeds as follows.We initialize the optimizer by probing the linear schedule, i.e., by evaluating the parameters corresponding to the linear schedule for each parametrization.Note that for the bang-bang parametrization, where this is not possible, we start with random parameters.We then evaluate nine additional random parameter sets, after which the next points are chosen by the maximum of the acquisition function Eq. ( 2), starting with κ = 2 (see Fig. 1a).This is then repeated for a total of 50 iterations.As we would like the optimizer to focus on exploiting the information it has acquired during the initial exploration, we specify a decay schedule for κ.After 25 iterations, we begin decreasing κ as a geometric sequence, such that its final value is κ = 0.01.In the BO implementation we use in this study [43], additionally, hyperparameters (such as the kernel length scale ℓ) of the kernel function are progressively tuned to improve the quality of our surrogate for the objective, making this particular implementation powerful out-of-the-box, without any need for hyperparameter tuning.

III. BENCHMARK SYSTEMS
In what follows, we introduce the two paradigmatic model systems studied in this work -quantum annealing (QA) and reverse annealing (RA) of the p-spin model, and a variational adiabatic protocol for finding the maximum independent set (MIS) of a graph using a neutral atom quantum processor based on Rydberg atoms.

A. p-spin model
The p-spin model is a well-studied spin system with all-to-all p-body terms.Using the total spin operators where σx i (σ z i ) is the usual Pauli-X (Pauli-Z) matrix acting on site i, we can write the p-spin Hamiltonian for N spins as For p odd the unique ground state of the p-spin Hamiltonian is |↑⟩ ⊗N , where σz |↑⟩ = +1 |↑⟩.Here, we focus on the case where p = 3, as higher values p ≥ 4 exhibit qualitatively similar behavior [16].We use ℏ = 1 throughout this section.

Quantum Annealing
In the case of QA, we start the protocol in the ground state of the transverse field Hamiltonian where Γ is the strength of the transverse field.The initial ground state is then just |+⟩ ⊗N , where |+⟩ := 2 −1/2 (|↑⟩+ |↓⟩) is the +1 eigenstate of σx .We then evolve the system via where we demand s(t f ) = 1 − s(0) = 0.Here we have implicitly restricted ourselves to only considering parametrizations where the control functions are dependent.It is particularly convenient to use such a parametrization, as fewer parameters are required.Hence, we employ this Ansatz for QA in the p-spin model throughout the main text and defer results with independent controls to Appendix A.
It has been shown, by means of a static [15] and a dynamic [16] analysis, that the system governed by the Hamiltonian in Eq. ( 6) exhibits a first-order phase transition as s(t) is varied between 0 and 1.A first-order phase transition is typically (but not exclusively, see e.g.[44,45]) associated with an exponentially small (in the size of the system N ) gap between the ground state and the first excited state of the instantaneous Hamiltonian [46,47].Consequently, the adiabatic timescale t f likewise suffers from exponential scaling.
One possibility to circumvent the exponentially long time scale was proposed in Ref. [48], where QAOA circuits for preparing the ground state of the p-spin Hamiltonian are proposed, which can be turned into constant time annealing schedules [49].These consist of two constant valued pulses; for t ∈ [0, t f /2) with Ĥt and subsequently for t ∈ (t f /2, t f ] with ĤTF .We use the bang-bang schedules defined in Sec.II A 2 and learn the parameters with BO.Additionally, several other potential modifications to the basic protocol in Eq. 6 have been proposed in the literature [50][51][52][53][54][55].Here, we focus on Reverse Annealing (RA) [15,16].

Reverse Annealing
In RA the system is initialized in a classical state, corresponding to an approximate solution of the target problem, which we have obtained by e.g., a classical approximation algorithm.Here, we consider states |ψ c ⟩ that have a certain fraction c ∈ [0, 1] of spins aligned with the ground state |↑↑ . . .↑⟩ of the p-spin Hamiltonian.Due to the permutation symmetry of the Hamiltonians in Eq. ( 4) and Eq. ( 5), we can without loss of generality choose a configuration where the first N c := ⌊N c⌉ spins are aligned with the ground state of Ĥt and the rest are antialigned as This is the ground state of where Ŝz j is the total z projection of the spin in the aligned (j = 1) and antialigned (j = 2) sectors, comprising the first N c and the remaining N −N c spins, respectively.We can then define the RA Hamiltonian as with s(0) = λ(0) = 0 and s(t f ) = λ(t f ) = 1.Note that for λ(t) = 1 we recover standard QA as defined in Eq. ( 6).It has been shown that for sufficiently good initial approximations (i.e., for c sufficiently close to 1), the firstorder phase transition in the p-spin model can be avoided by RA [15].This has been leveraged to design polynomialtime adiabatic protocols to prepare the ground state of the p-spin model [16].However, as the gap structure of the RA Hamiltonian Eq. ( 9) is a priori unknown, such a speedup is subject to finding the correct parameters (e.g., the transverse field strength Γ, see Fig. 2), especially if we consider exclusively linear schedules for λ(t) and s(t) (see background in Fig. 7a).Hence, finding a way to design s(t) and λ(t) to successfully navigate the ∆(s, λ) landscape is essential to harnessing the full speedup obtainable in RA.
An essential observation is that ∀(s, λ) the RA Hamiltonian Eq. ( 9) commutes with the squared total spin operator S 2 j in both sectors j = 1, 2. Because the initial state belongs to the largest eigenvalues of S 2 j in both sectors, the evolution is constrained to this subspace.Therefore, only a Hilbert space of dimension (N c + 1)(N − N c + 1) needs to be considered, which makes larger system sizes accessible by numerical simulation.Note that this also holds for QA, with a Hilbert space dimension N + 1 obtained by setting c = 0.In this work, numerical simulations are performed using the QuTiP library [56,57].

Noise in the p-spin model
It has been shown that RA loses some of its computational advantage over QA when dephasing noise is considered [36].To analyze this statement in more detail within our approach, we set up simulations of the p-spin model in the mathematical framework of the Adiabatic Master Equation (AME) [58]: Here, H LS is the Lamb-Shift Hamiltonian, and D is the dissipator -for details we refer the reader to Appendix B.
To be able to simulate larger systems, we limit ourselves to the study of the AME with collective dephasing, which does not break permutation symmetry but was shown to hinder the performance of RA [36].We employ the HOQST library to perform numerical simulations of the AME [59].
B. Maximum Independent Set and Neutral Atom Quantum Computers

Maximum Independent Set Problem
Given a graph G = (V, E) with a set of vertices V and an edge set E an independent set is a subset of vertices S ⊆ V such that no pair of vertices v i , v j ∈ S in the subset are connected by an edge The largest independent set is called the maximum independent set (MIS).Finding the MIS is known to be NP-hard for general graphs [60], making the existence of an efficient (i.e., polynomial time) algorithm unlikely.However, applications of the MIS problem can be found across a broad range of industries [61].Here, we focus our attention on solving the MIS problem on a family of graphs called unit-disk graphs (dubbed UDGs in the following).These are generated by positioning a collection of vertices in the plane and only connecting those that are less than a certain distance R d apart.Despite being a special family of graphs, finding the MIS of a general UDG is still an NP-Hard problem [62].For details on the graph generation in this work, refer to Appendix C.

MIS with Rydberg Atom Arrays
The MIS problem on UDGs naturally emerges in the context of neutral atom quantum computers based on Rydberg atom arrays [30,62].In what follows, we briefly summarize the aspects of neutral atom quantum computation relevant to the present study -for a comprehensive review, we refer the reader to Ref. [63].
Visualizations of the schedules for the Rabi frequency Ω(t) and the laser detuning ∆(t).The Ω schedule (solid line) takes in its maximal value Ωmax, and the ramp-up time τΩ as parameters.The detuning schedule is specified by a collection of (in this case 2) equally spaced points ∆j.A linear interpolation between these points is then passed into a low-pass filter, yielding the dashed teal schedule depicted above.
The dynamics of neutral atom quantum processors are governed by the following Hamiltonian Ĥ(t) = Ĥdr + Ĥcost with where Ω is the Rabi frequency, ∆ is the laser detuning, and , with C 6 the van der Waals coefficient, which depends on the particular atomic species used.The number operator ni := 1 2 − σz i counts the number of Rydberg excitations at site i.
A crucial feature of the Hamiltonian in Eq. ( 11) is the so-called Rydberg blockade phenomenon, in which two atoms cannot simultaneously be in the (excited) Rydberg state |1⟩ if their spatial separation is smaller than the Rydberg blockade radius R b ≡ (C 6 /ℏΩ)1/6 [30].Therefore, if one sets the UDG length scale as R d = R b , ground states of Ĥcost naturally obey the independence constraint on UDGs if one associates atoms that are in the |1⟩ with membership in the MIS [62].If we furthermore set 0 < ∆ < V ij , in the ground state of H cost , the number of excitations will be maximized, while not violating the independence constraint, thus corresponding to solutions to the original MIS problem.We thus start the adiabatic protocol with ∆ < 0, such that the ground state is |0⟩ ⊗|V | and then (slowly) increase the detuning until some final positive value, as shown in Fig. 3, such that the ground states of the final Hamiltonian correspond to the solutions of the MIS problem.
In our experiments, we run the protocols identified by the BO algorithm and then use the top 50% of the bit strings in terms of the following MIS energy (cost) function where x i is the i-th component of the bit string x ∈ {0, 1} |V | , which indicates if the i-th vertex is present in the independent set (x i = 1).We have found the value α = 1.2 to work well in practice, and thus use it throughout this manuscript unless specified otherwise.

IV. RESULTS
In what follows, we present results for BO-designed annealing schedules.First, we present results based on numerical simulations of the p-spin model, which allow us to study (relatively) large system sizes of a paradigmatic model that exhibits a first-order phase transition.Then, we turn our attention to the MIS problem on neutral atom quantum computers.Here, we perform numerical (classical) simulations for small graphs and compare our results with results obtained from the QuEra Aquila device 1 .Finally, we implement and execute our end-to-end optimization pipeline on the QuEra device.
As BO is a stochastic optimizer, averages over several runs are required to assess its (average) performance.We report median values of the performance metrics, with the error bars denoting the upper and lower quartiles.In all experiments at least 80 repetitions have been performed.

A. p-spin model
We first analyze the p-spin model in (classical) numerical simulations and set the strength of the transverse field  Γ = 5 and use p = 3, unless stated otherwise.We present results for both QA and RA schedules and study the effects of noise.
We begin with a characterization of BO and different annealing schedules by considering the system size scaling of the ground-state fidelities obtained at the end of the protocol for the case where ground state fidelity is likewise used as the figure of merit for the optimizer.While this scenario might not be realistic for many realworld situations because the ground state is typically not known a priori, it provides the easiest setup for the optimizer to learn, because the feedback is obtained directly on the relevant FoM.In Fig. 4a, the fidelity for different parametrizations is shown.All parametrizations depend on four parameters, apart from the bang-bang protocol, which uses just two.

Quantum Annealing
The fact that the system undergoes a first-order phase transition manifests itself in the dramatic decrease in fidelity for the naïve linear schedule.The fidelity can be improved by several orders of magnitude (when the number of variables increases) using BO, for all of the parametrizations considered.The best results are obtained when the bang-bang parametrization is used -in line with previous results showing that it is possible to prepare the ground state of the p-spin model with an optimal t f ∼ O(1) schedule [48,49].However, similarly, our results based on the real and cubic parametrizations are able to improve dramatically on the linear QA schedule as well.
As mentioned previously, a key reason to consider BO for the optimization of annealing schedules lies in its efficient use of resources (i.e., queries of the cost function).We showcase this feature of BO explicitly through complementary benchmark results obtained with the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm [64], a common choice when optimizing variational quantum algorithms (see, for example, Refs.[30,65,66]).Specifically, for QA in the p-spin model with N = 15 spins and t f = 3, we compare the performance of SPSA and BO in terms of the fidelity achieved as the number of allowed evaluations is increased (see Fig. 4b).Here, hyperparameter optimization was performed for both SPSA and BO to allow for a fair comparison.For reference, the same number of random parameter evaluations is shown.In the intermediate regime (with ∼ 50 -200 evaluations), BO is able to outperform SPSA-based schedules, in line with previous studies using BO to control quantum systems [32].As we increase the number of evaluations, SPSA performance improves (as expected) and converges to schedules of the same quality as BO.Because, SPSA uses a proxy for the gradient to suggest the next parameters, we expect SPSA to achieve higher accuracy in finetuning the parameters, ultimately making it the preferred method in cases where many evaluations are possible.In addition, in Appendix A we compare the efficiency of BO with results from Hedge et al. [25], based on a genetic algorithm to optimize annealing schedules for QA in the p-spin model.As shown in Fig. 12 we find that BO is able to find schedules outperforming those found by the genetic algorithm, using an order of magnitude fewer iterations.
We next investigate the mechanisms underlying the improved schedules found by BO.Because we use a dependent parametrization of the schedules (see Eq. ( 6)), the minimal instantaneous gap that these parametrizations encounter during the protocol is the same as in the linear schedule; see Fig. 5b.The improvement must therefore stem either from the ability of BO to learn adiabatic schedules that adapt their rate of change to the spectral gap profile, or alternatively, by finding diabatic paths, forgoing the requirement of adiabaticity.We find in our simulations that the latter is the predominant case for the p-spin model; see a typical example in Figure 5. Here, the path found is in fact highly non-adiabatic, as the schedule traverses the region with a minimal spectral gap multiple times due to its nonmonotonicity.This results in diabatic  transitions in and out of the instantaneous ground state, as seen in panel (c) of Fig. 5.However, BO is able to exploit these diabatic transitions such that the obtained fidelity is approximately 35× larger than the one obtained by the linear path.
Finally, we analyze the importance of selecting an appropriate figure of merit (FoM) for BO.In a realistic setting (e.g., when solving a combinatorial optimization problem), using the ground-state fidelity as the FoM is not possible because the ground state of the target Hamiltonian is typically not known a priori.Therefore, we also analyze the performance when proxies for the fidelity, such as the energy, are used as the FoM for the optimizer.In Fig. 6a, we show results when the energy serves as the feedback signal seen by the optimizer.Strikingly, for all parametrizations except for the optimal bang-bang parametrization, the optimizer is unable to learn schedules that would improve on the linear schedule in terms of fidelity.This finding led us to consider different FoMs, which we constructed by taking only the top x-th quantile of the measured bit strings in terms of energy -we denote this by E x .Results from optimizing the real parametrization in terms of different FoMs are shown in Fig. 6b.We find that taking the top 30% or 10% of the measured bit strings improves the obtained fidelities dramatically.Intuitively, it is reasonable to expect that biasing the optimizer to increase the probability of measuring very low energy states as opposed to optimizing for the global energy expectation value for the observed state is a better proxy for the fidelity.Similar observations about the importance of choosing a good FoM in variational quantum algorithms were made in Ref. [67], where Conditional Value at Risk was proposed as a possible alternative.

Reverse Annealing
We next turn our attention to Reverse Annealing (RA) as applied to the p-spin model.Here we limit ourselves to the case of c = 0.8, meaning that in the initial state 80% of the spins are aligned with the final ground state.While RA has been shown to enable an exponential speedup compared to QA, careful fine-tuning of the parameters is required to fully unlock this speedup, if only linear schedules λ(t) = s(t) = t/t f are considered [16].To see this, consider the phase diagram in Fig. 7a.Here, the linear path from λ = s = 0 to λ = s = 1 fails to leverage the part of the phase diagram where the gap is larger.In contrast, the BO-designed schedules shown in Fig. 7a consistently find paths going through the opening in the gap landscape.This opening corresponds to the part of the phase diagram where the first-order phase transition is avoided in the thermodynamic limit -in the case of finite system sizes, this corresponds to the exponential speedup demonstrated in Ref. [16], which we are able to harness using BO.This results in improved fidelities over QA (corresponding to a path with λ(t) = 1) and nonoptimized RA.
The importance of finding a path with favorable spectral properties is apparent in the system size scaling results presented in Fig. 7b.At the largest system size considered here (with N = 150 spins), the median fidelity of the real space BO runs is found to be ∼ 0.76, compared to 3.7 • 10 −6 in the case of a linear RA schedule, yielding an improvement of more than 5 orders of magnitude.

Effects of noise
We now analyze the effects of noise.For context, it has previously been shown by Passarelli et al. [36] that RA loses some of its computational edge over QA in the presence of noise.In particular, in Ref. [36] the effects of dephasing noise were analyzed.The authors showed that QA appears to be more robust against both collective and independent dephasing, with QA outperforming RA in terms of time-to-solution in the presence of these noise channels.
To this end we set up simulations of the adiabatic master equation given in Eq. (10), and analyze the ability of BO to find good annealing schedules in the presence of collective dephasing of various strengths.Figure 8 displays results for several values of the noise parameter η controlling the strength of the coupling to the environment (larger η corresponds to noisier systems).Here we use a total annealing time of t f = 20, and a real parametrization with a total of 4 parameters, for both RA and QA.In line with the results of Ref. [36], we find that with increasing η, unoptimized RA fidelities approach those obtained by QA (see dotted lines).Additionally, we see that QA may even benefit from the presence of noise: notice how fidelities at the largest system size (with N = 30 spins) increase with increasing η.This effect has previously been documented in Ref. [68].Most importantly for the present study, we find that BO is able to find paths that overcome this limitation of RA, thus further motivating the adoption of BO in experimental real-world scenarios typically prone to noise.
Interestingly, the improvement of BO for QA schedules is diminishing for larger values of η.This could be due to the limited flexibility of the dependent parametrization used in the simulations or the qualitatively different spectral properties of the QA and RA Hamiltonians.To investigate the underlying mechanisms, we perform numerical simulations of the same system, with independent controls for QA.Results, summarized in Figure 13, indicate that the additional flexibility offered by an independent parametrization indeed enables BO to design better schedules in the presence of noise.

B. Maximum Independent Set
Next, we turn our attention to solving the MIS problem on Rydberg atom arrays, as available online through Amazon Braket via the QuEra device.We first analyze numerical simulations of the experimental platform and then compare these to actual runs on the QuEra Aquila device.Finally, we extend our analysis to graph sizes  for which classically simulating the quantum many-body evolution as described by the Hamiltonian in Eq. ( 11) becomes infeasible.
We first direct our attention to small graphs, to gain an intuitive insight into efficient schedule Ansaetze.Specifically, we generate all 11 non-isomorphic instances of UDGs with radius √ 2a < R d < 2a, where a is the lattice constant of the underlying 4 × 3 square lattice on which 9 or 10 nodes are placed (leaving between 2 and 3 vacancies), such that the instances have a unique ground state (verified via a brute-force search of all possible configurations).We show three randomly-selected graphs in Fig. 9a and choose the lattice constant to be a = 5.3 µm, which we have found to work best with the considered quantum hardware.
Aiming for low-complexity Ansaetze with a modest number of parameters, it is vital to utilize the corresponding parameters efficiently.In particular, this holds for Rydberg atoms, where in principle (at minimum) two schedules (for global detuning and Rabi drive) can be tuned largely independently.Therefore, we first consider a relatively simple parametrization that uses the initial and final values (∆ i and ∆ f , respectively) of the detuning schedules, and the ramp-up time τ Ω as well as the maximal Rabi frequency Ω max specifying Ω(t) -see Figure 3.In all experiments on the toy graphs the total duration of the protocol was set to t f = 0.7 µs.Because both the ∆ and the Ω schedules consist of exclusively linear line segments, we refer to this Ansatz as the BO-Linear parametrization.The initial parameters for this linear parametrizations are partially inspired by Ref. [30] and partially chosen such that the Rydberg blockade radius R b (see Sec. III B 2) is in the desired range √ 2a < R b < 2a; see Appendix C for details.We optimize the linear parametrization ten different times for each of the 11 small graphs using classical simulations and then evaluate the schedules on quantum hardware for comparison.As the figure of merit optimized by BO, we use the energy of the top 50 th percentile (dubbed H 0.5 ).Finally, we emphasize that in this work we focus on the performance of the quantum device and have thus refrained from using classical postprocessing as done in Ref. [30].
We compare the different schedules in terms of the probability P MIS of measuring the solution to the MIS problem.Results shown in Fig. 15a show that significant improvements can be made over the naive choice of parameters for the linear schedule (blue-shaded lines, average P MIS ∼ 0.08) by just optimizing its parameters (green-shaded lines, average P MIS ∼ 0.35).Additionally, we show in Fig. 14 how the improvement in the schedules manifests itself in that the expectation value of the Rydberg excitations ⟨n i ⟩ of the i-th node matches the MIS (shown in Fig. 9a) ever closer.Finally, the numerical simulation results are found to be in good agreement with the results from the QuEra hardware.Moreover, we observe that the optimal parameters of the Ω(t) schedule concentrate significantly at the maximal allowed value by the hardware for Ω max and for shorter values of τ Ω (see Fig. 15b in Appendix C).The latter value may intuitively be understood as shorter values of τ Ω correspond to extended time intervals t f − 2τ Ω during which ∆ is varied (see Fig. 3).Therefore, the system has a higher chance of adhering to the (quasi-)adiabatic limit, because the rate of change of ∆ is smaller.
Motivated by these findings, we consider yet another parametrization, in which only the ∆(t) schedule is varied, and the Ω(t) parameters are fixed to Ω max = 15.8MHz and τ Ω = 0.1t f .We parametrize the ∆(t) schedule using a real-space parametrization with two intermediate points as described in Sec.II A 2 interpolating between ∆ i and ∆ f , as shown in Fig. 3.We further pass the obtained schedule through a low-pass filter, as previously suggested in Ref. [30].The results obtained using this parametrization are shown in Fig. 9b, indicating that improvements of an order of magnitude in the probability of measuring the MIS state P MIS are possible using this approach.Furthermore, we again observe good agreement between numerical simulations of the optimized schedules and the corresponding results obtained on the real hardware.
We now test the pipeline described above on larger instances.These are generated in a similar fashion to the small graphs in Fig. 9a, in that an underlying square lattice is generated and subsequently filled with nodes, connecting only neighboring and diagonal nodes (i.e., nearest and next-nearest neighbors).Specifically, we consider a grid of size 14 × 14 with a lattice constant of a = 5.3 µm, positioning nodes on a randomly chosen 137 sites, corresponding to a filling of ∼ 70%.It was previously shown that the performance of simulated annealing [69], a commonly used classical technique for combinatorial opti-Figure 10.Example data from the QuEra Aquila device.Above: Unit-disk graph instances with 137 nodes partially filling underlying square lattices with a lattice constant of 5.3 µm, and nearest and next-nearest coupling.The instances are sorted according to their hardness parameter HP (from left to right).The node's color indicates the probability ⟨ni⟩ of the i-th node being assigned to the independent set (⟨ni⟩ = 1 when the node is purple), as determined by the BO protocol.Several nodes appear (approximately) equally likely in and out of the independent set.This is due to the degeneracy of the low-energy solutions which require different nodes to be part of the independent set.Bottom: Histograms of the top 50% of the obtained energies (as defined in Eq. ( 12)) with an unoptimized linear schedule (blue), and the BO schedule (orange).The optimal solution of size |MIS| is indicated by the vertical dotted lines.rBO and rL correspond to the approximation ratios of the linear and BO schedules, respectively.There is a clear improvement on the data when BO is used.mization, is limited by the ratio of the degeneracies of nearly optimal and optimal configurations [30].This is captured by the so-called hardness parameter where N M is the degeneracy of configurations corresponding to independent sets of size M ≤ |MIS|.Using the tropical tensor network algorithm described in Refs.[70,71] we can compute the hardness parameter HP, and quantify the difficulty on a per instance basis.In Fig. 10 a collection of graphs with 137 nodes is shown, which have dramatically different hardness parameters, despite appearing qualitatively similar to the human eye.We run the BO pipeline to optimize the ∆(t) schedule for these graphs and use 100 shots to obtain an estimate of H 0.5 .
Here, we use t f = 4.0 µs, which is the maximal duration implementable on the QuEra Aquila device.While we are not able to solve the instances to optimality, the obtained solutions (taken from 800 samples) are significantly improved compared to the linear schedule; see bottom row in Fig. 10.We quantify this by reporting the approximation ratios where M is the set of bit strings obtained from the device, and H(x) is the MIS cost function, as defined in Eq. (12).For each graph, r BO and r L denote the approximation ratios of the best bit string obtained by the BO and Linear protocols, respectively.It should be noted that the approximation ratio r includes the independence constraint as a soft constraint with a penalty α -as such, the energies reported in Figure 10 can in principle correspond to non-independent sets.However, for BO-designed protocols, the best sets used to compute r BO are always independent, while for the linear schedule the obtained bit strings correspond either to very small independent sets, or exclusively to solutions that violate the independence constraint.This further highlights the dramatic impact of BO on the quality of the obtained solutions.While it is difficult to draw conclusions from a comparably small data set, our results indicate that increasing difficulty (in terms of HP) has a more detrimental effect on the quality of bit strings obtained in the linear protocol than in the BO protocol.This is in line with previous studies [30], which found substantial correlation between HP and the minimum spectral gap of the Rydberg Hamiltonian in Eq. ( 11).It appears as if the BO-designed schedules can (partially) overcome this by adjusting the speed of the evolution to the spectral gap profile or by leveraging diabatic transitions as in the case of the p-spin model.
Although BO is specifically designed for resourceefficient optimization, we find at the time of writing this manuscript that deploying the algorithm on real hardware is difficult due to the relatively long readout times [63].To reproduce the procedure devised in Ref. [30] for a single graph would require several weeks, because their pipeline requires approx.10 5 measurements from the quantum device to optimize the schedules.This simultaneously gives merit to our BO pipeline, which uses an order of magnitude less resources and highlights the need for a more holistic evaluation of quantum hardware, which includes wall-clock running times [72].

V. DISCUSSION
In this work we demonstrated how BO can be used to efficiently design quantum annealing schedules in a diverse set of scenarios, using a relatively low number of evaluations (queries of the cost function).While BO is able to learn suitable parameters for a variety of schedules, even in the presence of noise, stark differences in performance between different parametrizations indicate that the particular choice of a schedule Ansatz is important and depends on the problem (class) in consideration.Similarly, the choice of the figure of merit passed to the optimizer strongly influence the quality of the results.These design choices will depend not only on the specific problem class but also on the resources available (e.g., time requirements).
For the p-spin model BO-designed QA schedules are able to learn diabatic mechanisms, which dramatically improve the success probability.However, the degree to which this can be generalized to other systems is unclear, especially because the model is highly symmetric and has been shown to be amenable to diabatic ground state preparation in the past [48,49].Here, we provide numerical evidence that BO-designed RA schedules can overcome the previously identified shortcomings of RA in the presence of noise [36].This potentially motivates a new research direction of optimized RA schedules.It would likewise be interesting to identify the underlying reasons for the inability of BO to significantly improve QA schedules when noise is present.This may indicate that the diabatic mechanisms are not robust against dephasing noise or that noise somehow hampers the ability of BO to find them.Nonetheless, results shown in Fig. 13 suggest that it is possible to enhance the robustness of QA schedules by considering a more flexible parametrization with independent controls, as demonstrated in Appendix A. While these results might be due to the specifics of the p-spin model, they nonetheless offer an interesting probe into the mechanisms underlying the noise-robustness of (a)diabatic paths.
Furthermore, we have shown how to apply our BO pipeline for solving the MIS problem using neutral atom quantum computers.Specifically, we were able to achieve substantial improvements over the naive linear adiabatic protocol, even for relatively complex graphs with more than 100 nodes.We also find that schedules designed in numerical simulations possess a high degree of transferability to real quantum hardware, where they exhibit comparable performance.This is in line with the relatively low error rates of this qubit modality [63], especially in the analog mode of operation utilized here.Our results along with other recent experimental findings [30] highlight the promise of variational quantum adiabatic algorithms, such as those considered in this manuscript, as a direction for future research in quantum optimization.In particular, our BO pipeline has demonstrated its ability to optimize quantum protocols resource-efficiently, making it a valuable practical tool for exploring variational quantum algorithms.

VI. OUTLOOK
Finally, we highlight possible extensions of research going beyond our present work.The BO pipeline proposed here can be amended in various ways, e.g., by using different choices of kernel and acquisition functions, considering different FoMs (e.g., conditional value at risk [67]), or different parametrizations.Moreover, one could envision schemes combining BO with other optimizers with complementary properties.One such possibility would be to combine it with an optimizer better suited to fine-tune the parameters, which would derive its initialization from the model of the FoM constructed by BO.
As BO has proven to be capable of efficiently optimizing QA schedules, it is natural to consider how it may aid the design of protocols based on extensions of QA, such as including non-stoquastic [73] or inhomogeneous transverse field [74] driver Hamiltonians.Additionally, extending the analysis here beyond the p-spin model would be worthwhile, in particular to determine whether its ability to find diabatic and RA paths can be transferred to other systems.In addition to numerical simulations, experiments on quantum annealers based on superconducting flux qubits could be performed [75,76].Over in the realm of classical optimization algorithms, one might even consider designing the temperature schedules for annealing and tempering algorithms using BO.
Finally, a systematic analysis of BO-designed schedules for solving the MIS problem on UDGs in terms of the hardness parameter is deferred to future work.Our inabil-ity to conduct such systematic studies could be remedied by devising yet more efficient protocols to optimize the schedules, e.g., by tailoring the schedule parametrization to the typical spectral gap profiles of the Rydberg Hamiltonian or by an increased capacity to obtain measurement results from neutral atom devices.Rydberg-based special purpose quantum devices are at their infancy and, as such, there is still a lot to be learned.stronger collective dephasing-as seen in the case of η = 10 −4 in Fig. 8-we show here results of numerical simulations for independent controls.In Fig. 13, the obtained fidelities are shown and compared to the dependent case.We only show results for η = 10 −4 because dependent and independent controls exhibit similar behaviour for lower noise levels.The results in Fig. 13 hint towards the fact that the additional flexibility of the independent parametrization is indeed beneficial.However, the systematic analysis of such effects that would be required to underpin the explanation for these observations is outside the scope of this paper.9a.Here, color denotes the probability of the node being assigned to be in the independent set (purple indicates probability of 1, see colorbar), for the different protocols described in Section IV B. Notice how an improved matching of the colors with the reference schematics in Fig. 9a corresponds to an improvement in PMIS, as shown in Fig. 15a.Because the ground state of all of these graphs is unique, disagreement with the ground truth (as shown in Fig. 9a) stems from experimental noise or from excitations into higher energy states.
device is large enough to accommodate many copies of the small graphs in parallel, and because the parameters of the adiabatic protocol are global (i.e., the same for every atom in the register), we deployed the runs with 12 copies of the graphs in the register.This parallelization allows us to effectively perform more runs, as a single measurement effectively yields 12 shots.These shots are additionally postselected on the correct initialization of the corresponding atoms.
For the more complex graphs, the underlying lattice comprises 14 × 14 sites.
We generate graphs with nodes at approximately 70% of the sites, resulting in graphs with 137 nodes, and filter for the instances that have large hardness parameters HP using the GenericTensorNetworks Julia library [70].To limit the number of shots required for the optimization of the schedules for hard graphs, no postselection on the correct initialization of the device is performed.
The parameters for the linear parametrization are chosen as follows: ∆ i = −30 MHz, ∆ f = 60 MHz, Ω max = 9 MHz and τ Ω = 0.1t f .These values were chosen because they comply with the experimental platform and ensure that the ground state of the Rydberg Hamiltonian in Eq. ( 11) matches the solution to the MIS problem, as discussed in Sec.III B 2).These parameters are also used as the initial guess supplied to the Bayesian optimizer.

Figure 2 .
Figure 2. Minimal spectral gap ∆min for the linear protocol s(t) = t/t f for QA and for the linear path s(t) = λ(t) = t/t f in RA.The results indicate that the exponential closing of the minimal spectral gap in QA can only be circumvented in RA if Γ is chosen correctly.RA results shown are for c = 0.8.

Figure 4 .
Figure 4. (a) System size scaling of the fidelity for BO-QA for the p-spin model, with different parametrizations of control functions (see Sec. II A 2 for details).(b) Scaling of the fidelity obtained by quantum annealing schedules designed with a varying number of iterations of BO and SPSA.For comparison, we also show the fidelities obtained by evaluating the same number of random parameter sets.A real-space parametrization with 4 parameters was used for a p-spin model with N = 15 spins and t f = 3.For reference, the results of the linear schedule are shown, which achieves a fidelity of ∼ 0.03 only.All strategies were trained using fidelity as the figure of merit for the optimizer.

Figure 5 .
Figure 5. (a) Illustration of a prototypical cubic schedule, which is obtained by cubic interpolation between the points (purple dots) determined by BO, and the naive linear (baseline) schedule.The cubic schedule is specified by the purple dots.(b) Instantaneous spectral gap ∆(t) for both schedules.(c) Instantaneous ground-state fidelity for both schedules, indicating that diabatic mechanisms are facilitating the improvements of the BO schedule.Results refer to the p-spin model with p = 3, N = 50 spins and t f = 50.

Figure 6 .
Figure 6.(a) Scaling of the fidelity, when energy is used as the figure of merit.Only the Bang-bang schedule is able to improve on the linear schedule.(b) Scaling of the groundstate fidelity when different figures of merit are used.Here Ex corresponds to the energy of the top x-th quantile (in terms of energy) of the measured bit strings, E to the energy, and F to the fidelity.

Figure 7 .
Figure 7. (a) The BO-RA, RA and QA (λ = 1) paths drawn on top of the spectral gap ∆ landscape, for a system with 30 spins.A real-space parametrization is used with 6 parameters specifying the three intermediate points.BO successfully finds a path through the region where the gap is larger, leading to a larger fidelity (see inset).The median fidelity of the 10 runs shown in the diagram (light teal) is 0.996.(b) Scaling of the ground state fidelity for different parametrizations.BO is able to find schedules with good fidelities for large system sizes, where standard RA fails (see black dashed line).

Figure 8 .
Figure 8. Results on RA for the p-spin model with 30 spins and collective dephasing noise, the strength of which is captured in the magnitude of the η parameter.η = 0 correspond to unitary evolution.A total annealing time of t f = 20 is used.
Figure (a) Example of small toy graphs, with 9 -10 nodes.Nodes colored in purple represent the (unique) solution of the MIS problem.(b) Success probabilities PMIS of solving the MIS problem using a neutral atom quantum device, for both a linear schedule (blue hues) and BO schedules (red hues).Lighter and darker shades correspond to results obtained in numerical simulations (NS) and on actual quantum hardware (QC), respectively.

Figure 12 .
Figure 12.Histogram of the ground-state fidelities for 50 runs of BO with independent controls.Using the same parameters, Hedge et al.[25] report a median fidelity of approximately 0.90, which is surpassed by a cubic parametrization with both four and six parameters.Additionally, BO requires approximately an order of magnitude fewer evaluations of the optimization objective.

Table I .
The functional form of the various schedule parametrizations used in this work; see Fig.1cfor a visualization.Here H(•) denotes the standard Heaviside step function.The rightmost column gives the parameter bounds for the individual parameters.