Quantum Optimization of Fully-Connected Spin Glasses

The Sherrington-Kirkpatrick model with random $\pm1$ couplings is programmed on the D-Wave Two annealer featuring 509 qubits interacting on a Chimera-type graph. The performance of the optimizer compares and correlates to simulated annealing. When considering the effect of the static noise, which degrades the performance of the annealer, one can estimate an improvement on the comparative scaling of the two methods in favor of the D-Wave machine. The optimal choice of parameters of the embedding on the Chimera graph is shown to be associated to the emergence of the spin-glass critical temperature of the embedded problem.

NP problems, such as classical paradigmatic computer science problems [1] as well as practical engineering problems [2] can often be formulated efficiently as Quadratic Unconstrained Binary Optimizations (QUBO). These computational challenges can be seen as the task of finding the ground states of a disordered Ising spin glass often defined on a potentially highly-connected graph [3].
One tantalizing approach to solve QUBOs in their Ising formulation is provided by programmable quantum annealing. While the founding principles of the technique have been investigated numerically [4], analytically [5] and experimentally [6] in the last decade, the disordered, interacting, time-dependent and open nature of the many-body problem makes it very hard to draw universal conclusions about the power of the technique [7].
One very recent development that boosted scientific activity in this field has been the commercialization of D-Wave Two optimizers, which implements the annealing approach by means of a solid state architecture consisting of hundreds of interlaced superconducting flux qubits [8]. While the manufacturing methods and the computing technology are well documented, understanding the power of the machine is a formidable challenge for the aforementioned reasons, with the additional hindrance that the heavy integration of the circuitry entails the existence of static and dynamical sources of noise that are in part unknown. For these reasons groups around the world have started to experimentally benchmark the machine [9][10][11], nurturing a lively discussion on whether the device is making functional use of quantum mechanics for computation [12] and how to properly measure speedups between different computational/experimental algorithms [13,14]. On a more pragmatic level, the chip was also tested to evaluate its performance on toy-application problems in the fields of network diagnostics [15], artificial intelligence [16], computational biology [17], mathematics [18] and machine learning [19]. One typical occurrence in applied problems is when the QUBO to be solved is derived from a linear binary optimization problem with large number of constraints such as enforced equalities or inequalities between linear relations of variables. In this case the resulting penalty terms in the objective function form intersecting cliques whose minimization might be a hard computational problem for classical algorithms such as simulated annealing.
Motivated by the great value of quantifying the power of quantum optimization on valuable applications, in this work we report on the optimal programming guidelines and performance expectation of the D-Wave Two Vesuvius chip, applied to problems defined on fully-connected graphs with random couplings in the absence of longitudinal local fields. This Hamiltonian corresponds to the Sherrington-Kirkpatrick Model (SKM) with couplings randomized from a bimodal distribution of values ±1 [20]. The SKM is directly related to the graph partitioning problem [3] which is known to be NP-hard, and supports a spin-glass phase at finite temperature with transverse fields. For these reasons, it represents one of the most interesting benchmarks to evaluate the performance of the optimizer on structured problems. Moreover, the encoding of the SKM on the D-Wave hardware has very interesting symmetry properties, allowing us to elegantly investigate general procedures common to all structured optimizations on annealers, such as the parameter setting of embedding and the error-correction, which in the general case require heuristic numerical preprocessing [21].
The D-Wave Two "Vesuvius" chip hosted at NASA Ames Research Center features 509 working flux-qubits connected by 1455 tunable composite qubits acting as Ising-interaction couplings [22], arranged in a non-planar lattice known as a "Chimera graph" [23]. In order to implement general Hamiltonians which are defined on arbitrary graphs, it is customary to employ the graph-minor embedding [24] technique. This procedure consists of finding a set of connected subgraphs (logical bits or LB, corresponding to different colors in Fig. 1) of the original graph such that each LB can be associated to a node arXiv:1406.7553v1 [cond-mat.dis-nn] 29 Jun 2014 Figure 1: (Color online) Illustration of the iterative embedding procedure of the SKM in the Chimera graph. Different colors represent the N logical bits, which are arranged in N/4 groups of colors (reds, violets and cyans, indexed by k). The corresponding images of fully connected graphs on top show that logical bits in the same group of colors have two different ways to be connected by a physical coupling on the Chimera graph by having a thicker edge between them. The arrows indicate two qubits with their respective indices convention with reference to the labeling of Eq. 2, where l indicates position within a given color group and i is a running label of the position of the qubit in the chain starting from the uppermost (top-left) i = 1 up to the last i = N/4 + 1.
in the original graph. This association needs to be such that for each two connected nodes there exists at least one edge between the qubits belonging to the associated LBs. While the problem of finding an optimal (i.e. minimizing the number of required nodes) graph minor is itself NP-hard [25] and is typically tackled with heuristic approaches [26], for many graphs with a regular struture an efficient embedding can be found systematically. Fig. 1 shows an embedding of SKM in a triangular portion [27,28] of the Vesuvius processor: each LB in the original problem of size N is represented by (N/4)+1 qubits connected in a line. This means that this embedding procedure encompasses an overhead of N 2 /4 + N hardware qubits for encoding fully-connected graphs. Note that a quadratic scaling of the embedding resources for SKM is expected for any hardware graph with fixed degree. The embedding procedure is useful for the encoding of the problem Hamiltonian into the hardware processor as long as the qubits in each LB are collapsed on the same z-value at the end of the annealing. The basic idea is to ferromagnetically couple all qubits with a negative weight J F within a LB in such a way to energetically penalize discordant qubit states. The remaining couplings can be assigned to reflect the logical Hamiltonian of the problem to be solved.
With reference to the index convention illustrated in Fig. 1, the actual Hamiltonian which is encoded in the annealing machine is then: where S * i and σ * (kl) i are respectively the LB and the Pauli operators corresponding to the qubits along the * -direction. The logical SKM couplings J ij have been explicitly divided among the inter-cell couplings J (kl,kl ) and the couplings between different groups of colors J (kl,k l) and the bounds on the summed variables are implied. Since the maximum allowed energy coupling in the D-Wave Hamiltonian is 1, increasing J F is equivalent to rescaling the logical couplings. A(t) and B(t) are the time-dependent coefficients that define the annealing schedule performed by the machine [29]. It is immediately apparent that from the dynamical perspective that the optimal prescription on the value of J F might be tricky to evaluate despite the fact that it is always possible to set its magnitude to be sufficiently high to make sure that the target ground state still lies at the bottom of the embedded classical spectrum [30]. Fig. 2 shows the median probability P GS for the analog optimizer (run at fixed annealing time τ =20 µs) to reach the ground state. For a given problem size N it depends significantly on J F and goes to zero for large and small value of J F . For J F 1, the ferromagnetic couplings are not energetically stronger than the logical couplings and we expect that the problem is not well encoded. Indeed many chains representing LBs are found in excited states (i.e. having 1 or more kinks) as illustrated by the colored bands of the plot which displays the improvement on P GS obtained by a post-processing procedure which tries to recover logical states from broken chains by doing majority voting (similarly to errorcorrection/repetition codes [31,32] [53]). Conversely, for sufficiently large J F , defects in the LBs are suppressed, but the overall annealing success probability decreases after an optimal J F . The appearance of this maximum can be connected to the expectation that the anneal- The average is taken over 80 instances per size, and runs are performed using 10 random gauges [21]. The black line and the inset indicates the optimal JF for a given size, which increases with N as a power law close to √ N . Errorbars are obtained through resampling.
ing dynamics is more efficient when the ferromagnetic LBs become correlated at the same time as the described SKM enters the spin-glass phase. This is because once the chains feel the ferromagnetic fixed point (for a transverse field of A(t) B(t)) their dynamics slows down and might reasonably impede the development of correlations between the logical states, while in the paramagnetic state they are more easily subject to the formation of kinks. This argument is also supported by the scaling of the optimal coupling, which can be fit as a power law with an exponent close to 1/2, which is consistent with the critical transverse field of the embedded SKM, which goes proportionally to B(t) √ N /J F [33] (connected to Fig. 4 described later on). Comparisons with embedding and runs on embedded 2D-lattices also support the above theory (as detailed in the Supplemental Material (SM)). Figure 3 shows the median expected runtime (in seconds) T RUN for the annealing device to find the groundstate with 99% probability, for different J F and the experimentally shortest possible τ = 20µs. The thicker blue line shows the scaling of complexity with the system size assuming the optimization on J F with a precision of ∆J = 0.25. This exponential scaling and the absolute runtime seems very similar (and correlates well [SM]) to the performance of simulated annealing (SA) on the same logical instance set extended up to N = 50 whose runtime is optimized over τ measured on Intel Xeon E5-2680v2 processors. However, it is well known that the Hamiltonian parameters programmed on the analog optimizer are subject to low-frequency noise that can be modeled as static gaussian disorder realization for each instance [34]. One could argue that this noise (whose presence is not fundamental but rather an engineering issue) introduces an artificial handicap in the evaluation of the performance of the D-Wave machine, as the programmed problem might significantly differ from the target objective function to be minimized. We introduced the noise effect in the logical instance runs with SA in order to compare the scalings with on fair grounds. While on the scale of the maximum physical energy programmed in the problem Hamiltonian (i.e. 3.2 GhZ) this model of noise has a negligible effect [34][54], the rescaling of the absolute energy of the logical parameters due to the introduction of J F proportionally amplifies the relevance of the unwanted disorders. The considered noise model spoils the J (kl,k l ) couplings of Eq. 2 and introduces artificial longitudinal local fields [55]. More specifically, as the logical couplings J ij are chosen to be ±1, this implies that the problem Hamiltonian to be compared with D-Wave II runs at fixed J F must be spoiled as follows: where ξ ij J and ξ i h are disorder realizations with gaussian distribution around zero of respective standard deviations σ ξ J =0.035 and σ ξ h =0.05 [34]. Results are averaged over 1000 realizations for every instance and new optimal speeds have been computed for the final scaling [SM]. What is observed is that starting from N=12 the noise significantly affects the probability for the spoiled sys-tem to find the ground state of the ideal Hamiltonian. As detailed in the SM, for every fixed level of noise ∝ J F there is indeed a problem size above which the noise tends to shift the ground state of the noisy Hamiltonian outside the manifold of the degenerate ground states of the ideal Hamiltonian, independently from the algorithm used to compute the ground state. We note that this effect is likely to be dominant over the slow-down of the LB dynamics conjectured to be responsible of the sharp decrease in performance of the device for large J F observed in Fig.2, and more analysis is needed to establish if this is the case.
While unsurprisingly the current limitations on the number of qubits do not allow us to draw final conclusions on whether the machine has a sound speedup with respect to classical digital methods, the scaling results are encouraging. While it is now established that speedup might emerge artificially due to suboptimal annealing speed [13] (τ = 20µs would supposedly become optimal only for larger N) as well as due to correlation between different subsequent runs [35], we have shown evidence that this is likely to be masked by the detrimental effect of the noise (which is expected to be significantly reduced in future generations of the device [34]). Most importantly, our work elucidates how evaluating the comparative performance of analog optimization with respect to algorithmic methods on necessarily embedded problems is more delicate than it is on natively structured problems. This is largely because the correct representation of the target problem requires an optimal tuning of the analog optimizer, which is dependent on the hardware architecture and the programmability precision. The statistical reasonings behind benchmarks [13] and complexity estimation [36] on natively structured problems performed by previous works need to be extended considering that in embedded problems the number of LBs does not reflect the number of qubits for the comparison of required resources [56]. Moreover, the fact that the logical Hamiltonian is emergent from a corse-graining of the hardware Hamiltonian, which has ferromagnetic correlations due to embedding, carries with it potentially profound consequences regarding the expected complexity of the annealing procedure on the logical problem. In the SKM this also means that the shape and the location of the critical region associated with the spin-glass phase is dependent on the internal representation parameters such as embedding topology and optimal J F .
In order to gain insights on these issues, in Fig. 4 we examined, by means of SA simulations, the emergence of the spin-glass phase of the embedded SKM model (Eq. 2), i.e. the appearance of a pseudo-critical (normalized) spin-glass temperature T SG as a function of α F = J F / √ N . Our findings are compatible with a scaling exponent κ 3 (as expected from the exact SKM) for the universal spin-configuration overlap Binder ratio behavior i between two replicated runs A and B at various temperatures T for each αF in the figure. Error bars correspond to error-propagation over the intersecting region, and the dashed black line indicates the intersection relative to the logical problem (TSG(60 − 24) saturates in the limit αF → ∞ to a value of (0.76 ± 0.08)/αF , which is smaller than the known exact value of 1/αF . This discrepancy is due to finite size effect for the ensemble sizes studied in this work. See SM for more details.
and with an increase of the pseudo-critical temperature with α F towards the theoretical value of the unembedded model, which in the thermodynamic limit is T SG = 1/α F (but for finite N the Binder curves intersect at a smaller value [37]). This means that embedded problems can belong to a different universality class than random chimera problems, answering a question at the center of the current discussion in the quantum annealing community [36].
These results support the intuition that the ferromagnetic couplings need to increase as √ N (up to logarithmic corrections) in order to properly represent the SKM for large sizes. Interestingly, the experimentally optimal α F (see Fig. 3) in our runs is close to 1.0, meaning that the machine is better off optimizing the spectrum of an embedded representation of the SKM model whose critical temperature is appreciable. Moreover, the critical temperature explored by the D-Wave machine at optimal parameter setting is larger than the experimental one, which might have profound consequences on the asymptotic computational complexity of quantum annealing on the embedded SKM.

Supplemental Material: Quantum optimization of fully connected spin glasses
Optimal parameter for the classical simulated annealing In order to compare the performances of the D-Wave II device with respect to other classical methods, we studied the probability of success of simulated annealing (SA) heuristics [38] on the same instances that we run on the annealing machine. Since classical algorithms are not limited by hardware connectivity, we performed the classical simulations using the "logical" Hamiltonians, namely the original problem Hamiltonian without any further embedding. The number of logical spins for the Sherrington-Kirkpatrick model (SKM) considered varies from N = 4 to N = 60, with N = 30 the upper limit of the maximum number of spins that can be actually embedded on a 512-spins Chimera Hamiltonian without breaking symmetries.
As "fair" quantity to compare the performances of both the quantum (D-Wave II) and classical (SA) devices, we used the "expected" runtime defined as with m the number of sweeps and p(m) the success probability after m sweeps, while s is the probability of success that one wants to achieve (from here s = 0.99) [13].
Here τ (m) is the cpu time it takes for a classical core to perform a single sweep. The "optimal" computational time T opt will be defined as where m * = argmin m T exp (m) is the optimal number of sweeps, or "speed".
The SA heuristic depends on two parameters: The initial temperature T 0 and the total number of sweeps m. At the beginning, the configuration is initialized to a high temperature configuration. Starting from an initial temperature T 0 we slowly cool down the system until T = 0 is reached. The cooling down is performed using a linear schedule in m sweeps. In the case of SKM, the initial temperature is chosen equal to the critical spin glass temperature of the model, aka T 0 = √ N [39]. For any T 0 , m and instance, we repeated SA schedules 1000 times.

Role of the noise for the classical simulated annealing
The DWave II runs are plagued by uncontrollable noise originated by the non ideality of manufactured qubits, the low-frequency fluctuations, the unwanted flux offsets, the low-frequency noise, on-chip crosstalk as well as by errors associated to the many digital-to-analog converters. In this paper, we used the simplest model where noise on couplings and local field are uncorrelated and Gaussian distributed (see Eq. (3) of the main text), with respectively σ ξ J = 0.035 and σ ξ h = 0.050, from D-Wave measurements [34], expressed in units of the maximum energy scale J max 3.2GHz. This noise is expected to be present even if couplings and external fields are set to zero. To understand the effects of the noise on classical annealers, we created a spoiled Hamiltonian by adding uncorrelated noise to all couplings and to all local fields of random instances of the SKM Hamiltonian, Eq. (1) of the main text.
Since the optimal J F for the D-Wave device scales with the number of logical spins, as reported in Fig. (2) of the main text, larger instances will be noisier than smaller instances. Therefore, to correctly reproduce the effects of the noise on the classical SA, in the logical spoiled SK Hamiltonian the noise is chosen to be proportional to J F , i.e.
The effects of the noise on the performance of SA are shown in Fig. (6), where data has been obtained by using the optimal number of sweeps, considering the effect of the noise, for any fixed number of logical spins N and J F . Black line on the right panel indicates the optimal J F of the D-Wave II device (see Fig. (2) of the main text). As one can see, the performances of the classical SA quickly drop by increasing J F . Observe that the probability of success of the spoiled SKM for N = 30 is almost half than the probability of success of the unspoiled SKM.
Speeds for the spoiled Hamiltonian are displayed in the top panel of Fig. (7): Interestingly, the speed decreases by increasing the noise. This scaling is in accordance with the fact that it is unlikely that the spoiled and the unspoiled Hamiltonian share the same ground state for large noise, as depicted in the bottom panel of Fig. (7), so that larger annealing times actually reduce the performance of SA in the presence of noise.

Correlation plots of the effective computational times
In this section we compare the median optimal computational time T opt in SA simulations, as defined in Eq. (5), with the median optimal runtime of the D-Wave II (DW2) device considering optimization over J F and fixed annealing time of τ = 20µs, on the noise-free problem. To compare DW2 and SA, we use either a direct comparison of T opt instance by instance or the "copula" of T opt , namely the correlation of the ordered rankings [13] of the value of T opt , which are respectively the left panel and the right panel of Fig. (8). The linear coefficient R is computed as: with n and n the rank positions of the T opt for the two different devices. The D-Wave II device has a slightly better performance than the classical simulated annealing for instances which requires larger annealing time, even if the correlation coefficient R is rather high (R ≈ 0.77).

Calculation of the critical temperature
Spatially random systems can undergo a spin-glass phase transition where local metastable states dominate the thermodynamics of the system. The classical (B(t) = 0) SKM was one of the first model for which the existence of the spin glass phase transition has been proven [39] to occur at T c = |J| √ N , where |J| 2 is the variance of the random couplings. As exploited in Ref. [40][41][42] the spin glass phase transition in a system with N spins can be detected using the spin configuration overlap defined as where σ (1) and σ (2) represent two independent replicas with the same disorder. In the case of the embedded SKM, the replicas refer to the actual physical spin configurations of the embedded SKM Hamiltonian, and the  number of spins is accordingly chosen to be the total number of physical spins used. In the high temperature limit, where the system is in its paramagnetic phase, the overlap distribution P (q) follows a Gaussian distribution of width √ N , where N is the number of spins. On the contrary, in the spin glass phase, P (q) converges to a distribution with a non trivial support [43].
In order to detect a spin glass phase transition for the embedded SKM Hamiltonian (Eq. (2) of the main paper) we numerically studied the Binder ratio of the Although the D-Wave II device has a slightly better performance than the classical simulated annealing, the correlation coefficient R is rather high.
spin configuration overlap [37,44,45] where · denote both the statistical mechanics average at fixed temperature T and the disorder average J. The Binder ratio g is a dimensionless parameter defined so that g → 0 in the paramagnetic phase and 0 ≤ g ≤ 1 in the spin glass phase, observing a scaling in terms of T c (the critical temperature for which the system undergoes to a spin glass phase transition) and κ (the critical exponent corresponding to 2-α, standard notation [46], by means of Josephson's identity of hyperscaling). In Fig. (9), we show the Binder ratio g computed for the embedded SKM Hamiltonian in Eq. (2) of the main text, by varying number of logical spins N for the embedded SK Hamiltonian at fixed J F = α F √ N , where α F is chosen respectively α F = 1.25 (left panel), α F = 2.0 (middle panel) and α F = 3.0 (right panel).
As one can see, for sufficiently large α F the curves of g show an intersection in T = T c , which indicates the presence of a spin glass phase transition. Figure (4) of the main text shows T c computed as the intersection of the Binder ratios g for N = 60 and N = 24 logical spins. Since the full equilibration is obtained only for T > 0.2 (see next Section for more details), only points for reliable intersections are shown. However, we cannot exclude a priori the existence of intersections of the Binder ratio for small α F at very small temperature. Figure (10) shows the same data presented in Fig. (9) but properly rescaled, in order to better appreciate the critical temperature T c and the critical exponent κ.
Noteworthy, the curves of the Binder ratio g scale well around T c when κ = 3 is chosen as critical exponent, which is the same critical exponent κ of the logical SK model [47][48][49]. Finally, Fig. (11) shows the difference of the Binder ratios using N = 60, 24 (left panel) and N = 60, 36 (right panel). The intersection with the zero axis are used to compute T c as a function of α F , as show in Fig. (4) of the main text.

Equilibration of the embedded SK Hamiltonian
As described in the main text, the embedding of the SKM creates long ferromagnetic chains of "physical" spins, which correspond to a single "logical" spin (LB). When the intra-chain ferromagnetic coupling J F becomes comparatively very large, the whole chain behaves like a true logical spin and then, we expect for the embedded SKM to show the same thermodynamics properties of the logical SKM. Unfortunately, due to the presence of these long ferromagnetic chains, equilibration of the embedded SK Hamiltonian happens to be extremely long if the a standard single spin flip Metropolis-Harris is used to perform SA simulations. Indeed, for low temperature and J F |J ij |, where J ij are the couplings or the original logical SK model, spins belonging to the same chain prefer to stay aligned. Since the probability to create a defect in a polarized chain is proportional to exp(−2β|J F |), large part of the equilibration time is spent to try to flip a whole chain. Left panel of Fig. (12) shows the Binder ratio g for the embedded SK Hamiltonian (solid symbols) computed by using a Metropolis-Harris single spin-flip update, by varying the number of equilibration sweeps T eq for each temperature. Curves are compared with the Binder ratio g for the logical SKM. As one can see, the correct equilibration is obtained only for large temperature, above the spin glass critical temperature T c = 1.
In order to reach a faster equilibration for the embedded SKM, we propose a variant of the Wolff cluster method [50] (a generalization of the original Swendsen-Wang cluster method [51]), which takes into account the existence of the logical superstructures. In the Wolff cluster method, a cluster is created at any time step by using the following rules: 1. Chose a random spin i which represents the "cen-ter" of the cluster C.
2. Create clusters: neighbors of the center are included as members of the cluster with a probability: New neighbors are considered if that particular pair were not considered before.
3. The creation process continues until no new neighbors are added.
4. Flip the whole cluster.
Given a cluster C created by using the above rules, the transition probability from a given configuration of the system σ to a configuration where the cluster C is flipped can be written as [51] W (σ → σ , C) = w bulk (σ, C) In the above equation, ∂C indicates the "border" of the cluster C, namely those spins inside the cluster C that share a coupling with spins outside C, and w bulk (σ, C) is the probability to create the "bulk" of the cluster C without its border. Since W (σ → σ, C) p(σ ), with p(σ) = e −βH(σ) /Z, is trivially satisfied.
As described in [50,51], the Wolff cluster method works well in the presence of many domain-walls: in this case, flipping clusters reduces the equilibration time in simulated annealing simulations by quickly removing borders between two neighbors clusters with opposite sign. Unfortunately, the Wolff cluster method perform poorly for fully connected spin-glass model like the SK model, since the Wolff procedure typically creates clusters which contain almost all the spins. Therefore, the Wolff procedure cannot be used "as it" on the embedding SK Hamiltonian since we expect the same thermodynamics properties as the logical SKM for large J F .
To overcome this limitation, we devised a variant of the Wolff cluster method for the embedded SK Hamiltonian which takes into account the existence of the logical spins as a chain of physical spins, but it does not interfere with the equilibration of the underlying SKM. In our variant, clusters in the embedded SK Hamiltonian are created by using the following rules: 1. Chose a random spin i which represents the "center" of the cluster C.
2. Neighbors of the center which belong to the same logical spin/chain are included as members of the cluster with a probability p(σ i , σ j ) = 1 − exp(−β|J F |(1 + σ i σ j )) (recall that all the intrachain couplings are always ferromagnetic and of magnitude J F ). New neighbors are considered if that particular pair were not considered before.
3. The creation process continues until no new neighbors are added. Since C can grow only inside the logical spin/chain, any cluster can be seen as a connected sub-chain which contains the center of the cluster i.

Flip the whole cluster with a probabilitỹ
where σ is the spin configuration of the system and ∂ C consists in all the couplings between spins in C and spins which does not belong to the same logical spin/chain of the spins in C.
Following the same analysis used for the Wolff method [51], it is straightforward to show that the above procedure still satisfies the detailed balance: indeed, the limitation that a cluster can grow only inside a logical spin/chain is balanced by adding the probabilityp(σ, C) to flip the cluster, which involves only extra-chain couplings. To make this point more clear, consider the simple system depicted in Fig. (13) and described by the Hamiltonian where σ i represent spins in the same logical spin/chain while σ α represent spins in other logical spins/chains. Since clusters can be only created within the chain, clusters can be seen as sub-chain as depicted in Fig. (13). Therefore, the probability to create a certain cluster C can be easily computed and results to be where a and b are the ends of the sub-chain. It is important to observe that p cluster (σ i , C) depends only on the intra chain spins and that the term in Eq. (12a) is invariant by flipping the whole cluster C. The transition probability to flip the cluster C using our modified cluster algorithm is where σ and σ are respectively the spin configurations of the system before and after to flip the cluster C. Therefore, the detailed balance is satisfied if with ∆H = H(σ ) − H(σ) and H(σ) as in Eq. (11). Since only spins inside the cluster C are flipped, after some calculation one finds that and therefore, the detailed balance in Eq. (14) is satisfied.
In the right panel of Fig. (12) we show the convergence of the Binder ratio g for the embedded SK model with N = 24 logical spins and J F = √ 24, by using our variant of the Wolff cluster method. In our cluster method, a sweep is defined as a complete update of the system where all the spins have been chosen as center of a cluster. Correctly, for a sufficiently large number of sweeps, the Binder ratio g computed by using our cluster method converges to the same Binder ratio g of the logical SK model.
Finally, in order to be sure that our cluster method has reached the full equilibration for the calculation of the Binder ratio g, we used the same test as originally proposed in [37]. In particular, we compute both a lower-bound estimation χ SG = N hw q 2 , where N hw is the number of the hardware spins (i.e. N 2 /4 + N ), and an upper-bound estimation χ SG = 1 N hw σ (1) i (T ann + t 0 ) σ (1) i (t 0 ) , with t 0 and T ann respectively the equilibration and the measurement time, of the true spin glass susceptibility. Results are shown in Fig. (11). Since the two curves do not coincide for T < 0.2 (shaded regions in the figures), results on the Binder ratio are reliable only for rescaled temperatures T > 0.2.

Comparison of Embedding of the Sherrington-Kirkpatrick Model with
Edwards-Anderson 2D model Many of the properties which are discussed in the main text concerning optimal parameter setting are relevant only for embeddings whose average component size increases with the system size. In this section we program on the D-Wave device the Edwards Anderson 2D model (2D-EAM), which consists of a disordered Ising model on a square lattice without local fields: We follow the same steps as in the main text to shed light on the differences. The embedding is straightforward as shown in Fig. 14, where we chose to encode each LB in ferromagnetic chains of 2, 3 or 4 qubits. The embedding overhead this time is linear instead of quadratic (as in the SKM) because of the finite connectivity of the lattice to be embedded. The optimal J F for a given embedding does not scale with the system size, and it is J F = 1.25, 1.6 and 2.5 respectively for the embeddings with 2, 3 and 4 qubits. These values do not match the optimal values found in Fig. 2 of the main paper for the same embedding chain length (i.e. yellow, green and gray lines of Fig. 2 of the main paper), in accordance with the conjecture that the optimal J F depends on the interplay between the LB criticality and the problem criticality, which for disordered 2D Ising models doesn't scale with the system size [52] (however note that for these small sizes of LB chains we are dominated by finite-size effects so criticality is very loosely defined).  Unsurprisingly, the effect of the error-correction is not as dramatic as in the SKM, but it becomes more pronounced as the number of qubits in each LB increases. The scaling of performance (see Fig. 15) does not seem affected by the embedding choice although larger size ought to be considered in order to make definite statements (which is not possible in current machines). Comparison with SA (optimized in the linear schedule similarly as we did for the SKM. We checked that the starting temperature T=1 was close to an optimal choice) is in-cluded for sake of completeness. The largest number of logical spins we considered is N = 400.