Scaling overhead of embedding optimization problems in quantum annealing

In order to treat all-to-all-connected quadratic unconstrained binary optimization (QUBO) problems with hardware quantum annealers, an embedding of the original problem is required due to the sparsity of the hardware's topology. Embedding fully connected graphs, which are typical of industrial applications, incurs a quadratic space overhead and thus a significant overhead in the time to solution. Here we investigate this embedding penalty for established planar embedding schemes, namely minor embedding on a square lattice, minor embedding on a Chimera graph, and the Lechner-Hauke-Zoller scheme, using simulated quantum annealing on classical hardware. Large-scale quantum Monte Carlo simulations suggest a polynomial time-to-solution overhead. Our results demonstrate that standard analog quantum annealing hardware is at a disadvantage compared to classical digital annealers, as well as gate-model quantum computers, and could also serve as a benchmark for improvements of the standard quantum annealing protocol.

Despite the formidable engineering challenges overcome in building superconducting quantum optimization machines, some fundamental limitations will remain seemingly insurmountable for years to come [37]. Leaving analog noise aside, both the mapping of (binary) optimization problems onto 2-local Hamiltonians and the embedding of the resulting 2-local Hamiltonian onto the hardwired topology of the quantum annealer represent potentially large bottlenecks for solving application problems, which typically do not fit the hardware layout. Already the mapping of general Hamiltonians onto a 2-local model, such as a quadratic unconstrained binary optimization (QUBO) problem, can result in a large overhead in the number of variables. In addition, the lack of reliable arbitrary long-range couplers between qubits in superconducting systems results in a quasiplanar layout of quantum annealing devices. While newer generations of superconducting quantum annealing devices have increased the qubit connectivity, the underlying topology remains local and therefore unable to natively accommodate long-range couplers between variables, which are often present in applications.
Such intrinsic limitations of analog computing devices have played a major role in the shift to programmable digital computers in classical computing. Digital approaches, either on classical hardware or on future gate-model quantum computers, circumvent the embedding problem because the programmable logic of these devices allows for arbitrary Hamiltonians on arbitrary graphs. While scalable gate-model quantum computers are not yet available, quantum-inspired optimization methods [24] digitally implemented on classical hardware have gained traction for solving application problems [38][39][40][41][42]. For current superconducting quantum annealing hardware, however, highly connected 2-local logical problems with N variables need to be embedded onto the sparse-connectivity hardware graph using O(N^2) physical qubits. The additional degrees of freedom of the embedded problem need to be restricted using additional constraints in the form of extra coupling terms. While this space overhead (which, incidentally, can sometimes be used for error-correction schemes [4,19]) has been well studied, here we demonstrate that there is an even more significant overhead in the time to solution [8,24].
Specifically, in this work, we focus on three embedding schemes: embedding on a square lattice, embedding on a chimera lattice, and the parity adiabatic quantum optimization (PAQO) or Lechner-Hauke-Zoller (LHZ) scheme. As an example, we illustrate these embedding approaches using the fully connected six-variable binary problem depicted in Fig. 1(a). Our focus in this work is on the embedding overhead. As such, we do not discuss the locality-reduction overhead introduced by converting a binary k-local problem to a 2-local one. Also, without loss of generality, we assume that the problem does not have local biases (i.e., a field or 1-local term), because these can always be implemented as 2-local terms with a fixed auxiliary variable. For square- and chimera-lattice embedding we use majority decoding, while belief propagation [43] is used for the PAQO scheme.
The paper is structured as follows. In Sec. II, we discuss the different embedding strategies analyzed in this work, and then, in Sec. III, we outline the technical details of the benchmarking. Section IV shows our main results, followed by Sec. V on embedding penalties. We summarize our findings in Sec. VI. Appendices A and B contain two technical notes about PAQO constraints and avoided level crossings.

[Figure 1. (a) Direct, (b) square-lattice, (c) LHZ, and (d) chimera (c = 3) representations of the fully connected six-variable example problem.]

II. DETAILS ON EMBEDDING SCHEMES
In this section we describe in detail the different embedding schemes, as well as the corresponding decoding strategies. In general, we aim to optimize a binary quadratic optimization problem on a complete graph, i.e.,

$$\mathcal{H} = -\sum_{i<j} J_{ij}\,\sigma_i\,\sigma_j, \qquad (1)$$

where the variables $\sigma_i \in \{\pm 1\}$ and the problem is specified by the values of the couplers $J_{ij}$. Note that, without loss of generality, we do not study linear terms in Eq. (1) in this work.
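To make the cost function concrete, the following minimal sketch evaluates Eq. (1) for a given spin configuration. It is an illustration under our reconstruction of Eq. (1) above; the names `energy` and `J` are ours, not from the original study.

```python
import numpy as np

def energy(J, sigma):
    """Energy of Eq. (1), H = -sum_{i<j} J_ij s_i s_j, for spins s_i in {-1, +1}.

    J is a symmetric N x N coupling matrix with zero diagonal;
    only the upper triangle is summed to avoid double counting.
    """
    N = len(sigma)
    return -sum(J[i, j] * sigma[i] * sigma[j]
                for i in range(N) for j in range(i + 1, N))

# Example: a frustrated antiferromagnetic triangle (J = -1 on all edges).
J = -(np.ones((3, 3)) - np.eye(3))
print(energy(J, np.array([1, -1, 1])))   # -1.0, one of the degenerate ground states
```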
A. Embedding on a square lattice

Square-lattice embedding reduces an all-to-all-connected logical system to a physical system with only local connectivity. This is achieved by representing each logical variable as a chain of strongly coupled physical qubits. Figure 1(b) shows the square embedding for the example presented in Fig. 1(a). Each logical variable is represented by a chain of qubits of length N − 1. With a careful chain layout, each chain of physical qubits has a coupler to every other chain, where the logical couplings (red, light color) are located. The black (dark) couplers along the chains are the constraints that ensure that all variables on a chain take the same value, i.e., either all up or all down. If the logical problem has special structure or limited connectivity, the embedding of the logical problem can be optimized to use as few physical qubits as possible via minor embedding [44]. Finding an optimal embedding, however, risks being computationally more expensive than solving the original optimization problem itself. Since we use heavily connected as well as all-to-all-connected problems, minor embedding yields no benefit here.

We choose the constraints for each logical variable i as

$$C_i(\omega, \gamma) = \omega + \gamma \sum_j |J_{ij}|, \qquad (2)$$

where ω is a constant determining a fixed base constraint, while γ is the so-called "sum constraint" that is multiplied by the sum of the absolute values of the couplers J_ij involving variable i. This allows us to constrain variables with more or larger couplings more strongly than variables with fewer or smaller couplings, thereby ensuring that the system is not overconstrained. In general, one wants to use constraints that are as small as possible, because large penalty terms tend to degrade the performance of quantum annealing hardware [45]. However, if the constraints (penalties) are not strong enough, the chains that link physical variables into logical ones can "break," losing the logical information about the problem. If the decoding strategy can correct such errors (broken chains), we might even deliberately choose weak constraints and fix the errors later, as this is beneficial for solving the problems faster.

The chain length scales linearly with the number of variables. Similarly, if γ ≠ 0, the constraints also scale linearly with the number of variables, because an increased number of variables typically increases the value of the sum in Eq. (2). Once the logical problem is embedded in the physical system, the optimization procedure is executed. The physical solution then needs to be decoded to obtain the solution of the underlying logical system. If no constraints are broken, there is a trivial mapping from the chains to the logical variables. If constraints are broken, there are many decoding strategies for square-lattice embedding. We choose a straightforward and computationally inexpensive one: for every chain, majority voting determines the value of the logical variable from the value taken by the majority of physical variables in that chain.
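As an illustration of Eq. (2) and of the majority-vote decoder described above, consider the following sketch; the function names and the tie-breaking convention are our own choices, not those of the original study.

```python
import numpy as np

def chain_constraint(J, i, omega, gamma):
    """Per-variable chain coupling of Eq. (2): C_i = omega + gamma * sum_j |J_ij|."""
    return omega + gamma * np.sum(np.abs(J[i]))

def majority_decode(chains):
    """Map each chain of physical spins to one logical spin by majority vote.

    `chains` is a list of 1D arrays of +/-1 physical spins, one per logical
    variable. Ties are broken toward +1 here (an arbitrary convention).
    """
    return np.array([1 if np.sum(c) >= 0 else -1 for c in chains])

J = np.array([[ 0.0, -1.0, 2.0],
              [-1.0,  0.0, 0.5],
              [ 2.0,  0.5, 0.0]])
print(chain_constraint(J, 0, omega=0.0, gamma=1.1))  # 1.1 * (1 + 2) = 3.3

# A broken chain (one flipped spin) still decodes to the intended logical value.
print(majority_decode([np.array([1, 1, -1, 1]), np.array([-1, -1, -1, -1])]))  # [ 1 -1]
```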

B. Embedding on a chimera lattice
Chimera-lattice embedding [3], which is used in the D-Wave 2000Q device, is a variant of the aforementioned square-lattice embedding. The chimera lattice consists of bipartite unit cells of size 2c for some fixed integer c; for the D-Wave 2000Q, c = 4. A logical variable is again represented by a chain of physical qubits, but fewer qubits are needed per variable due to the higher connectivity within a single K_{4,4} cell. Figure 1(d) shows the embedding on a c = 3 chimera lattice for the example shown in Fig. 1(a). The cells at the bottom and on the right contain the same c qubits, while all the other cells (in this example, only one) contain 2c unique qubits. As for embedding on the square lattice, we use majority voting for decoding. Note that the embedding strategy outlined here can also be used for the new generation of D-Wave devices, i.e., the D-Wave Advantage with the new Pegasus topology. Because the connectivity is higher, shorter chain lengths suffice; however, the lattice remains quasiplanar.

C. Parity adiabatic quantum optimization
An alternative way to map the logical problem has recently been proposed in Ref. [46]. The PAQO or LHZ scheme encodes the logical problem differently from square- and chimera-lattice embedding. Instead of having the notion of logical variables in the physical system, each physical variable encodes the product of two logical variables, α_{i,j} = σ_i σ_j, i.e., it stores whether they are equal or opposite. The main advantage of this approach is the mapping from logical couplings to physical local biases, which can be controlled better than two-qubit physical couplers. Figure 1(c) shows the PAQO embedding for the example in Fig. 1(a). While the locality of the Hamiltonian goes from quadratic to quartic, a fully connected nonplanar graph turns into a square lattice. This embedding will be of considerable relevance once four-way couplers become available on quantum annealers.
The constraints are modeled as 4-local terms [47] between the four variables that form a tile. Fixed positive variables can be introduced at the lower-right edge to complete tiles [not shown in Fig. 1(c)]. Because every logical variable appears twice in a constraint term, the product of the four physical variables on a tile must be 1; e.g., for a tile containing $\alpha_{1,3}$, $\alpha_{1,4}$, $\alpha_{2,3}$, and $\alpha_{2,4}$,

$$\alpha_{1,3}\,\alpha_{1,4}\,\alpha_{2,3}\,\alpha_{2,4} = (\sigma_1\sigma_3)(\sigma_1\sigma_4)(\sigma_2\sigma_3)(\sigma_2\sigma_4) = 1,$$

where α denotes physical variables (qubits) and σ denotes logical variables. This means that every tile needs an even number of negative physical variables. The Hamiltonian is therefore defined as

$$\mathcal{H} = -\sum_{i<j} J_{ij}\,\alpha_{i,j} - \sum_l C_l(\omega,\gamma) \prod_{(i,j)\in l} \alpha_{i,j},$$

where l iterates over all 4-local constraints that have one physical spin in each corner, and $C_l(\omega,\gamma) = \omega + \gamma\,(|J_1| + |J_2| + |J_3| + |J_4|)$ sums the absolute values of the four couplings on the tile, analogous to Eq. (2).
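The parity structure of the tiles is easy to verify numerically. The sketch below, with hypothetical helper names of our own, checks a single 4-local constraint: parity spins derived from any logical configuration satisfy it, while a single bit flip breaks it.

```python
import numpy as np

def tile_satisfied(alpha, tile):
    """Check one PAQO 4-local constraint: the product of the four parity
    spins alpha_ij on a tile must be +1 (each logical spin enters twice)."""
    return np.prod([alpha[e] for e in tile]) == 1

# Parity spins derived from a logical configuration always satisfy the tiles.
sigma = {1: 1, 2: -1, 3: 1, 4: -1}
alpha = {(i, j): sigma[i] * sigma[j]
         for i in sigma for j in sigma if i < j}
tile = [(1, 3), (1, 4), (2, 3), (2, 4)]  # each of sigma_1..sigma_4 appears twice
print(tile_satisfied(alpha, tile))       # True

alpha[(1, 3)] *= -1                      # a single bit flip breaks the tile
print(tile_satisfied(alpha, tile))       # False
```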
In general, the scaling of the constraints depends on the problem class and can be understood analytically for the ferromagnetic, antiferromagnetic, and spin-glass cases, with exponents between 0.5 and 2 [48]. Here, we choose to scale the constraints according to the worst-case scenario, i.e., ∝ N^2, as motivated in Appendix A. An alternative is to set the constraints, perform the optimization, and then verify whether any of the constraints are broken; if so, increase the value and iterate. This, however, can be computationally costly.
The PAQO embedding shares similarities with a low-density parity-check code, which can be decoded using belief propagation, as also proposed in Ref. [43]. The original choice of constraints is not unique: there are many combinations of physical variables that form loops in which each logical variable occurs an even number of times, so that their product equals 1. Belief propagation uses this fact and creates, for example, all possible three-variable loops, e.g., $\alpha_{A,B}\,\alpha_{A,C}\,\alpha_{B,C} = 1$, and determines which of the physical variables are most likely to be wrong in a broken constraint. While the random bit-flip error resilience of the PAQO scheme might be very useful in an implementation, it cannot correct for single broken constraints, as shown in Fig. 8. Belief propagation then decodes the wrong logical state if started from Fig. 8(e) rather than Fig. 8(b). Hence, reducing the constraints is undesirable.
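A full belief-propagation decoder is beyond the scope of a short snippet, so the sketch below only constructs the three-variable loop checks mentioned above and counts, for each parity spin, how many of its checks are violated: the basic signal any such decoder builds on. The helper names and the simple violation count are our own simplification, not the decoder of Ref. [43].

```python
from itertools import combinations
import numpy as np

def triangle_checks(n):
    """All three-variable loops alpha_ij * alpha_ik * alpha_jk = 1 on n logical spins."""
    return [[(i, j), (i, k), (j, k)] for i, j, k in combinations(range(n), 3)]

def violation_counts(alpha, checks):
    """Count, for each parity spin, how many loop checks it fails.
    A spin appearing in many violated checks is a likely error candidate."""
    counts = {e: 0 for e in alpha}
    for loop in checks:
        if np.prod([alpha[e] for e in loop]) != 1:
            for e in loop:
                counts[e] += 1
    return counts

# Flip one parity spin of a valid configuration and see it stand out.
sigma = np.array([1, -1, -1, 1, 1])
alpha = {(i, j): sigma[i] * sigma[j] for i, j in combinations(range(5), 2)}
alpha[(0, 1)] *= -1
counts = violation_counts(alpha, triangle_checks(5))
print(max(counts, key=counts.get))  # (0, 1): the flipped spin fails the most checks
```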

III. METHODS
We compare the different embedding schemes using the time to solution (TTS) [8] for quantum annealing, with the goal of determining the cost overhead compared to a direct optimization of the logical problem. The TTS s is defined via

$$s_{p_{\rm tar}}(T) = T\,R, \qquad R = \left\lceil \frac{\ln(1 - p_{\rm tar})}{\ln\!\left(1 - p_{\rm g.s.}(T)\right)} \right\rceil,$$

where R is the number of repetitions needed to obtain a given target success probability $p_{\rm tar}$, T is the anneal time, and $p_{\rm g.s.}(T)$ is the probability that the ground state is found. There are two types of ground states that interest us: the physical one and, more importantly, the logical one. It is valuable to investigate the physical ground state because it can be decoded trivially; the result can then be compared to that obtained by the decoder to check the decoder's performance. We use a simple grid search to determine the optimal parameters and thus extract the true scaling. This works well because $s_{p_{\rm tar}}(T)$ is typically a convex function, as shown in Fig. 2(b). To obtain the optimal $s^{\rm opt}$ that describes the actual scaling, the TTS is minimized with respect to the anneal time T, i.e.,

$$s^{\rm opt}_{p_{\rm tar}} = \min_T\, s_{p_{\rm tar}}(T).$$

To keep the notation concise, we use $s_P = s^{\rm opt,physical}_{90\%}$ for the optimal time to solution of the physical ground state and $s_L = s^{\rm opt,logical}_{90\%}$ for the logical ground state after the decoder is run.
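The TTS bookkeeping can be written in a few lines. The sketch below implements the repetition count R and the grid search over anneal times under the standard TTS convention reconstructed above; the toy success probabilities are invented for illustration.

```python
import numpy as np

def tts(T, p_gs, p_tar=0.9):
    """Time to solution: anneal time T times the repetitions R needed to
    reach the target success probability p_tar given per-run probability p_gs."""
    R = np.ceil(np.log(1.0 - p_tar) / np.log(1.0 - p_gs))
    return T * np.maximum(R, 1.0)

# Grid search over anneal times: longer anneals succeed more often, but
# repeating short runs can still win, which is why s(T) has a minimum.
T_grid = np.array([5., 50., 500., 5000.])
p_grid = np.array([0.001, 0.02, 0.4, 0.95])   # illustrative success probabilities
s = tts(T_grid, p_grid)
print(s, "-> optimal T =", T_grid[np.argmin(s)])
```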
As a benchmark problem and a proxy for real-world applications, we use two instance types. First, we use unweighted MaxCut instances characterized by their connectivity density p, where p = 1 (100%) corresponds to an all-to-all-connected graph. Given a graph of size N and density p, we generate instances with N vertices and $N_e = p\,N(N-1)/2$ edges, rounded to the nearest integer. The seed s initializes the random number generator that draws $N_e$ random edges from the set of all possible edges. All edges have an antiferromagnetic weight, i.e., J = −1. The second problem class consists of all-to-all-connected (p = 1) instances with weights drawn from a Gaussian distribution, such that there are both ferromagnetic and antiferromagnetic weights, i.e., a Gaussian Sherrington-Kirkpatrick [49] spin glass.
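A minimal sketch of the two instance generators, under our reading of the construction above (edge count $N_e = \mathrm{round}(p\,N(N-1)/2)$, antiferromagnetic weight J = −1 for MaxCut, unit-variance Gaussian weights for the spin glass); the function names are ours.

```python
import numpy as np
from itertools import combinations

def maxcut_instance(N, p, seed):
    """Unweighted MaxCut: draw N_e = round(p * N(N-1)/2) random edges,
    all with antiferromagnetic weight J = -1."""
    rng = np.random.default_rng(seed)
    edges = list(combinations(range(N), 2))
    n_e = int(round(p * len(edges)))
    picked = rng.choice(len(edges), size=n_e, replace=False)
    return {edges[k]: -1.0 for k in picked}

def sk_instance(N, seed):
    """Fully connected Sherrington-Kirkpatrick instance with Gaussian
    weights (mean 0, variance 1), mixing ferro- and antiferromagnetic bonds."""
    rng = np.random.default_rng(seed)
    return {e: rng.normal(0.0, 1.0) for e in combinations(range(N), 2)}

print(len(maxcut_instance(10, 0.3, seed=42)))  # 14 edges = round(0.3 * 45)
```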
We simulate 31 different logical problem sizes with N = 5 to 35 variables. For each system size N, we generate 30 random instances of three types: p = 0.3 MaxCut, p = 0.5 MaxCut, and fully connected Gaussian spin glasses with mean µ = 0 and unit variance. Using simulated quantum annealing, we study the logical problem directly, the problem embedded on a square lattice, the problem embedded on a chimera lattice, and the problem encoded in the PAQO scheme. For each instance, we use 30 repetitions of the transverse-field simulated quantum annealing [31,[50][51][52] algorithm with 29 different annealing times between 5 and 28000 Monte Carlo sweeps. Furthermore, we use 1024 Trotter slices and an inverse temperature of β = 1024. The transverse field (annealing schedule) is varied from 0.5 to 0.001 quadratically, with the steep edge at the beginning. Multiplying all dimensions yields approximately 1.1 × 10^7 simulations to be performed. To reduce the search space, we drop selected parameter combinations, e.g., a small logical problem size with a long annealing time and vice versa. This, in turn, reduces the search space to approximately 9 million simulations. Finally, for square- and chimera-lattice embedding we use ω = 0 and γ = 1.1 in Eq. (2), while for the PAQO scheme we use ω = N^2/50 and γ = 1.1 for the constraints.
As an example, Fig. 2(a) shows the ground-state (g.s.) hit probability for a single specific instance and different annealing times. By counting the Trotter slices that have the ground-state energy, we can compute the ground-state hit probability, because a measurement corresponds to choosing one random Trotter slice. We then convert this quantity to $s_{0.9}$ and display the result in Fig. 2(b), where $s^{\rm opt}_{0.9}$ is marked by an arrow. Following the same procedure, we calculate $s^{\rm opt}_{0.9}$ for every individual instance. It is quite possible that different MaxCut instances of the same size and density attain their $s^{\rm opt}_{0.9}$ at different T. Figure 3(a) shows these $s^{\rm opt}_{0.9}$ versus different instance sizes. As an example, we show data for the logical problems embedded onto the chimera lattice.
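Since a measurement corresponds to sampling one Trotter slice at random, the hit probability is simply the fraction of slices at the ground-state energy. A minimal sketch, with invented toy data:

```python
import numpy as np

def gs_hit_probability(slice_energies, E0, tol=1e-9):
    """Fraction of Trotter slices whose (classical) energy equals the
    ground-state energy E0; a measurement samples one slice at random,
    so this fraction estimates the ground-state hit probability."""
    slice_energies = np.asarray(slice_energies)
    return np.mean(np.abs(slice_energies - E0) < tol)

# Toy data: 1024 slices, of which some have relaxed to E0 = -12.
energies = np.where(np.random.default_rng(1).random(1024) < 0.3, -12.0, -10.0)
print(gs_hit_probability(energies, -12.0))  # roughly 0.3
```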
To estimate the embedding overhead, we compare the data in Fig. 3 with the corresponding data for the direct optimization of the logical problem.

[Figure 2. (a) Ground-state hit probability (g.s. hit) for various annealing times T using simulated quantum annealing; the slower we anneal, the more likely we are to find the ground state. The specific instance used in this example is a MaxCut instance of size N = 25 and density p = 0.3. For each annealing time, we repeat the algorithm 30 times. The box plot shows the first to third quartile, while the whiskers represent 5% and 95% of the measured data. (b) The time to solution (with a target probability of 90%) versus the annealing time T. Because the ground-state hit probability is not a linear function, it is faster to repeat short-annealing-time runs than to invest in one long run; therefore, we measure the mean and not the median. Both panels share the same horizontal axis.]

Figure 4 shows the $s_L$ of the direct simulation and the three embedding schemes versus the logical system size N. The data clearly show a time penalty of the embedding schemes over the direct simulation of the logical problem. To better quantify the embedding overhead, we study the ratio between the TTS of the logical and physical problems. The data are shown in Fig. 5 as an instance-by-instance scatter plot. The data follow an approximately linear trend in the log-log plot, indicating a polynomial overhead in the time to solution.

IV. QUANTUM MONTE CARLO RESULTS
Fitting the data in Fig. 6 to a linear function of the form y = mx + c yields the parameters shown in Table I. We show the fit for both $s_L$ and $s_P$. While it may seem counterintuitive that $s_L$ scales worse than $s_P$, Fig. 7 shows that the decoding yields larger benefits at small sizes and hence distorts the scaling. However, $s_L$ is bounded by $s_P$, because the physical ground state is trivially decodable; therefore, the scaling of $s_L$ is also eventually bounded by that of $s_P$, and we assume the smaller scaling of the two. A stronger decoder might lead to better scaling, but most likely at a corresponding level of complexity regarding its run time.

[Figure 4. Optimal time to solution $s_L$ versus logical system size N for the three embedding schemes, as well as a direct simulation of the logical problem (bottom data set). The $s_L$ for each instance is determined by averaging over 30 random starts of the algorithm. The largest system size is absent for the PAQO scheme because the ground state could not be found, likely due to insufficient constraint scaling. Both panels share the same vertical axis.]
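The fit itself is a straight line in log-log space. A minimal sketch with synthetic data (the exponent 3 and the noise level are invented for illustration):

```python
import numpy as np

def loglog_fit(s_direct, s_embedded):
    """Fit log(s_embedded) = m * log(s_direct) + c; a straight line in a
    log-log plot means s_embedded scales polynomially, as s_direct**m."""
    m, c = np.polyfit(np.log(s_direct), np.log(s_embedded), deg=1)
    return m, c

# Synthetic example: a cubic overhead with multiplicative noise.
rng = np.random.default_rng(0)
s_d = np.logspace(1, 4, 50)
s_e = 2.0 * s_d**3 * rng.lognormal(0.0, 0.1, size=50)
print(loglog_fit(s_d, s_e))  # slope close to 3
```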

V. A NOTE ON EMBEDDING PENALTIES
Both an all-to-all spin glass and the encoded spin glass suffer from an exponentially closing gap [53]. Due to the chain-like nature of the physical spins representing a logical spin, all three embeddings are subject to the fact that the gap of a ferromagnetic chain in a weak field is exponentially suppressed [51] with system size N. More precisely, Ref. [51] has shown the gap to be $\Delta \approx \Gamma^N / J^{N-1}$, where Γ is the field strength and J the ferromagnetic coupling of the chain. The gap is related to the tunneling rate to flip the chain, which determines the dynamics and hence the required annealing time T. The reason why the physical spin chain must be able to flip is motivated in Appendix B with a three-spin example. Furthermore, the constraint-to-logical coupling ratio worsens with growing chains, because longer chains require stronger constraints to prevent breaks. However, an embedded system using O(N^2) spins is not as slow as a native O(N^2) problem, because the constraint structure is ferromagnetic, which is easier to solve.

[Figure 6. The same data as in Fig. 4, but displayed as a scatter plot of log(s_L) for the embedded versus the direct optimization of every independent instance. The black crosses show the first to third quartiles for each logical size. The linear trend in the log-log plot indicates polynomial scaling. All panels share the same vertical and horizontal axes.]
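To get a feeling for how quickly the quoted chain gap closes, one can evaluate the estimate $\Delta \approx \Gamma^N / J^{N-1}$ directly; the field and coupling values below are illustrative, not taken from the simulations of this work.

```python
def chain_gap(N, Gamma=0.1, J=1.0):
    """Quoted estimate of the gap of a ferromagnetic chain of N spins in a
    weak transverse field Gamma: Delta ~ Gamma**N / J**(N-1) [Ref. 51]."""
    return Gamma**N / J**(N - 1)

for N in (5, 10, 20):
    print(N, chain_gap(N))  # 1e-05, 1e-10, 1e-20: exponential suppression
```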

VI. CONCLUSIONS
We study the time overhead when solving quadratic binary optimization problems with quantum annealing after embedding the problems using three different embedding schemes. In all cases there is a sizable overhead in the time to solution, with exponents between m = 2.35 and m = 5.03, and hence an exponential slowdown with respect to the system size N. Any constant-speed advantage of analog quantum annealers is quickly erased and turned into a slowdown by this run-time penalty for embedding. Therefore, in the absence of long-range physical couplers that would allow superconducting quantum annealing machines to address problems without embedding schemes, simulated quantum annealing scales better in time to solution than analog quantum annealers with quasiplanar graph topologies. Similarly, programmable gate-model quantum computers could also implement arbitrary problems with arbitrary connectivity with only a polynomial, rather than exponential, overhead in the problem size N.

[Figure 7. $s_L$ plotted versus $s_P$ for the three embedding schemes and MaxCut (p = 0.3, left column) and Gaussian spin glasses (right column). A decoder can only make $s_L$ faster; a trivial decoder that accepts no errors gives $s_L$ equal to $s_P$, shown as the black circles. A trend toward smaller benefits at larger sizes (i.e., larger $s_P$) is visible, which leads to slightly worse scaling when fitting the data. Furthermore, the decoder for the PAQO scheme is visibly more efficient than majority voting for embedding on both the square and chimera lattices. While it reduces $s_L$ by orders of magnitude compared to $s_P$, which could be invaluable for real-world applications, it does not improve the scaling.]
In light of these results, as far as analog quantum annealers are concerned, it is also of great interest to develop novel approaches to potentially overcome these limitations. These could include variable connectivity [54], nonlinear driving [55], variational quantum annealing [56], as well as counterdiabatic driving [57]. The present results could serve as a benchmark for future developments of hardware quantum annealers.

Appendix A: A Note on PAQO Constraints

The constraints for the PAQO embedding need to be selected more carefully, because a safe lower bound for the constraint strength is not immediately evident. Consider the following logical problem, which consists of N all-to-all-connected spins divided into three groups, A, B, and C. Within each group, the spins are coupled with J = 1 and want to be parallel. Each spin is connected with J = −1 to every spin of the other groups, i.e., groups A, B, and C all want to be antiparallel to one another, making them mutually frustrated. Because it is not possible for A, B, and C to be pairwise antiparallel, at least one pairing must be parallel. Figure 8 shows the corresponding PAQO embedding, where allowing one broken constraint lowers the energy by E = (1/9)N^2. Note that this argument can be repeated recursively within the regions A, B, and C to reach a bound of E = (1/6)N^2. This lower bound dictates that the constraints grow at least as fast in order not to be broken in the physical ground state. Because this is just one example, it is not a safe theoretical lower bound for arbitrary problems; as such, calculating the necessary minimal constraints is more difficult than for square- and chimera-lattice embedding. We expand the analysis of Fig. 8 in Fig. 9, where we modify the problem of Fig. 8(a) such that we can observe the break at every point and plot the strength required to prevent it. Again, this is one problem family, with one antiferromagnetic and one ferromagnetic coupling, and not a guaranteed lower bound. But if any one constraint is below this bound, we can construct an instance for which the constraints are too weak to find the proper physical ground state. The decoder might still recover from this error, but it is straightforward to construct problems where this is not the case.

[Figure 9. The minimal 4-local-term constraint strength required. Using the example from Fig. 8 and changing the number of spins in groups A, B, and C (and therefore the size of the blocks I, II, and III) allows one to move the breaking constraint to any position and observe at which strength it breaks, mapped out in this contour plot in units of N^2. The 4-local constraints must be stronger in the middle of the PAQO embedding and can be weaker toward the borders. This is, however, just one problem family and not a rigorous lower bound.]

Appendix B: A Note on Avoided Level Crossings
In hard optimization problems, avoided level crossings [53] arise in the annealing spectrum because the driver Hamiltonian may initially favor certain configurations that become unfavorable once the driver is weak enough. To remain in the instantaneous ground state and follow the adiabatic anneal path, the spins have to flip. In contrast, problems where the initially horizontal spins (due to the transverse-field driver) only rotate toward up or down and do not need to flip during the anneal are much easier to solve. We include a three-spin example [58], with the spectrum shown in Figs. 10(a) and 10(b), that displays such an avoided crossing at t ≈ 0.92, where T is the annealing time and t is the progress in the annealing schedule. The gap in Figs. 10(c) and 10(d) shrinks linearly to near zero and then grows again. The instantaneous ground state, displayed in computational-basis probabilities in Figs. 10(e) and 10(f), clearly shows that the first two spins flip from up (p_up ≈ 1) to down (p_down ≈ 1), while the third spin changes from a superposition to down.

Figures 1(a)-1(d) show the spins that need to be flipped if one flips the logical spin marked C. For square- and chimera-lattice embedding, this corresponds to a ferromagnetic chain that grows linearly with the system size N. In the PAQO scheme, we also require a number of flips that grows linearly with N; the physical spins are connected by 4-local terms and hence form a chain structure as well, as can be seen in Fig. 1(c). Any embedding needs to accommodate the flips of logical spins during the anneal, ideally in an efficient manner.

[Figure 10. Spectrum, energy gap, and instantaneous ground state for a 3-spin ferromagnetic chain with individual local fields [58]. (a) Spectrum of transverse-field annealing, starting from the pure transverse field at t = 0 and transforming linearly into the pure problem Hamiltonian at t = 1. (b) Avoided level crossing between the two lowest states. (c) First energy gap, with a zoom in log scale shown in panel (d); the gap reaches ∆ ≈ 10^-3 at its smallest point. (e) Instantaneous ground state shown as probabilities in the computational basis, of which we label only the three dominant components and display the remaining five as black lines. (f) The sudden rise and fall of the ground-state components, representing a flip of the first two spins and the change of the third spin from a superposition to up.]
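The analysis of this appendix can be reproduced numerically by diagonalizing the annealing Hamiltonian along the schedule and tracking the instantaneous gap. The sketch below does this for a 3-spin ferromagnetic chain with local fields; since the exact fields of Ref. [58] are not given here, the values are illustrative, so the location and size of the minimal gap will differ from the t ≈ 0.92 and ∆ ≈ 10^-3 quoted above.

```python
import numpy as np

# Pauli matrices and a helper for a 3-qubit Hilbert space.
sx = np.array([[0, 1], [1, 0]], dtype=float)
sz = np.array([[1, 0], [0, -1]], dtype=float)
I2 = np.eye(2)

def op(single, site, n=3):
    """Embed a single-qubit operator at `site` in an n-qubit tensor product."""
    mats = [I2] * n
    mats[site] = single
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

# Problem: ferromagnetic 3-spin chain plus competing local fields
# (illustrative values, not those of Ref. [58]).
h = [1.0, 1.0, -0.9]
H_problem = -op(sz, 0) @ op(sz, 1) - op(sz, 1) @ op(sz, 2) \
            - sum(h[i] * op(sz, i) for i in range(3))
H_driver = -sum(op(sx, i) for i in range(3))

# Linear schedule: H(t) = (1 - t) * H_driver + t * H_problem, t in [0, 1].
ts = np.linspace(0.0, 1.0, 201)
gaps = []
for t in ts:
    E = np.linalg.eigvalsh((1 - t) * H_driver + t * H_problem)
    gaps.append(E[1] - E[0])

k = int(np.argmin(gaps))
print(f"minimal gap {gaps[k]:.4f} at t = {ts[k]:.2f}")
```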