Entropic barriers as a reason for hardness in both classical and quantum algorithms

We study both classical and quantum algorithms to solve a hard optimization problem, namely 3-XORSAT on 3-regular random graphs. By introducing a new quasi-greedy algorithm that is not allowed to jump over large energy barriers, we show that the problem hardness is mainly due to entropic barriers. We study, both analytically and numerically, several optimization algorithms, finding that entropic barriers affect classical local algorithms and quantum annealing in a similar way. For the adiabatic algorithm, the difficulty we identify is distinct from that of tunnelling under large barriers, but it does, nonetheless, give rise to exponential running (annealing) times.


I. INTRODUCTION
Hard discrete optimization problems are ubiquitous in scientific disciplines and practical applications. The problem of minimizing a complex cost function (or equivalently maximizing a reward function) naturally appears in many different contexts: e.g. in physics in the computation of ground state configurations, in statistics in the maximization of the likelihood, in machine learning in the training of artificial neural networks, and so on.
Although real-world problems usually have local structures that make their analysis difficult, it is commonly believed that the main source of computational hardness arises from the strong long-range correlations that exist among variables, and this effect can also be studied in more idealized and simple-to-solve models. In other words, in hard optimization problems, starting from an optimal or near-optimal configuration, the change of a single variable (or a small subset of variables) often requires the rearrangement of many more variables in order to remain close to optimality; often the variables to be rearranged are not even close to the modified variable. This property makes the search for the optimal configuration a challenging task even for sophisticated algorithms (see for example the case of backtracking algorithms, like DPLL [1]).
An ideal setting for studying this kind of hard optimization problems is provided by constraint satisfaction problems defined on sparse random graphs. Such problems have a twofold benefit: they can be solved analytically using the cavity method, a tool from statistical physics of disordered systems, and they can be efficiently handled on a computer, as the finite mean degree of the graph makes the computational resources required (CPU and memory) grow only linearly with the problem size.
Random constraint satisfaction problems (rCSP) are optimization problems where N discrete variables need to be assigned in order to satisfy M = αN constraints, each one involving a small subset of variables. The most famous among rCSP is maybe random K-SAT [2]. Recently these rCSP have been the subject of intense studies based on statistical physics ideas with the aim of understanding the origin of their computational hardness [3][4][5][6][7][8][9][10][11][12][13][14][15][16]. Indeed, a common feature of all the hard rCSP is the presence of a broad range of the constraints-per-variable ratio α such that solutions to the problem exist with high probability (in the large N limit), but all known solving algorithms are unable to find any solution in a time growing polynomially with the problem size N. In this hard region it is expected that any solving algorithm requires a time growing exponentially with the problem size, t ∼ exp(aN). By defining an energy function that counts the number of violated constraints, one can visualize the rCSP as the problem of searching for a zero-energy configuration in a complex energy landscape. The hard phase in rCSP actually corresponds to an energy landscape with exponentially many (in the system size) local minima that can trap the searching dynamics. The energy barriers between these minima are usually considered the main source of computational complexity, as any local dynamics is required to jump over these barriers in order to proceed further in the search for the optimal configuration. Based on the above picture, it is often believed that a quantum evolution, which allows for tunneling events, may escape local minima more efficiently than a classical stochastic dynamics.
This physically reasonable expectation implies that quantum algorithms may be faster in the search for optimal configurations than their classical counterparts, and has fueled interest in quantum algorithms that could benefit from this phenomenon: Quantum Annealing [17][18][19][20][21][22], its more recent variant, the Quantum Approximate Optimization Algorithm [21], and Population Transfer [23][24][25] are some well-known examples. All these algorithms typically show a complexity growth comparable with the best classical algorithms but, despite their initial promise, it is entirely possible that the limitations they exhibit are insurmountable, and it is unlikely that they could solve NP-hard problems in polynomial time. The limitations might come from the exponentially small tunneling rate out of a local minimum (a phenomenon linked to many-body localization and the existence of an emergent integrable dynamical phase [20,[26][27][28]) or might come from other dynamical phenomena. In this paper we identify one such phenomenon.
We consider one of the hardest sparse rCSP, namely random 3-XORSAT, and show that the problem hardness can be interpreted as coming essentially from entropic barriers: we introduce a quasi-greedy algorithm, unable to jump over large energy barriers, and notice that it is able to solve the problem as efficiently as the state-of-the-art algorithms, which are designed to be efficient in problems with large energy barriers. This peculiar property suggests that this model is a perfect candidate for understanding the effect of entropic barriers.
We investigate several algorithms, both classical and quantum, in order to better understand the effect of entropic barriers. For all the algorithms analyzed, we find that the time to reach a solution scales exponentially with the system size, and quantum dynamics seem to suffer from the presence of entropic barriers as much as the classical algorithmic dynamics do. As the efforts to build a quantum computer are finally yielding results [29], we believe it is important to identify all possible stumbling blocks for quantum architectures.

II. MODEL DEFINITION AND ITS KNOWN SOLUTION
The random 3-XORSAT problem is among the simplest rCSP [3]: it is made of N binary variables x_i ∈ {0, 1} that have to satisfy M = αN parity checks of the kind

x_i ⊕ x_j ⊕ x_k = b_ijk ,    (1)

where the variables entering each constraint are randomly chosen and the parity check bit b_ijk is 0 or 1 with equal probability. In a K-XORSAT problem each constraint involves K variables, but for the sake of simplifying the presentation we restrict to the random 3-XORSAT problem, where each constraint involves exactly 3 randomly chosen variables. Increasing α, the typical problem becomes more and more difficult to solve. Solutions exist with high probability in the large N limit until the sat-unsat threshold α_s [30]. However, the most interesting transition from the point of view of searching algorithms is the clustering (or dynamical) transition that takes place at α_d before α_s [6]. For α ∈ [α_d, α_s] the space of solutions is shattered into exponentially many clusters of solutions [6,31] and this is what makes the search for solutions much more difficult [3,6,15]. This picture has been proven rigorously to a large extent [32]. The hardness of the problem of finding a solution for some classes of local algorithms in the region α ∈ [α_d, α_s] has been proven [33], and found to depend on the so-called 'overlap gap property', which in practice corresponds to the clustering of solutions taking place after the dynamical transition α_d. In other words, for α > α_d the geometry of solutions is such that the Hamming distance d between any pair of solutions is either very small, d < d_1 (for pairs of solutions in the same cluster), or very large, d > d_2 (for pairs of solutions in different clusters) [6]. It is exactly the existence of such a range of distances with no solutions that creates an algorithmic bottleneck, while for α < α_d local algorithms can sample the space of solutions [32]. So we are going to focus our attention on a 3-XORSAT problem beyond the dynamical threshold α_d.
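As a concrete reference, a random 3-XORSAT instance and its number of violated parity checks can be sketched in a few lines of Python (function and variable names here are our own illustration, not code from the paper):

```python
import random

def random_3xorsat(n, m, planted=True, rng=random):
    """Sample a random 3-XORSAT instance: m clauses (i, j, k, b)
    meaning x_i XOR x_j XOR x_k = b.  With planted=True every b is 0,
    so the all-zeros assignment satisfies the instance by construction."""
    clauses = []
    for _ in range(m):
        i, j, k = rng.sample(range(n), 3)       # three distinct variables
        b = 0 if planted else rng.randint(0, 1)
        clauses.append((i, j, k, b))
    return clauses

def violated(x, clauses):
    """Number of unsatisfied parity checks for assignment x (list of 0/1)."""
    return sum((x[i] ^ x[j] ^ x[k]) != b for i, j, k, b in clauses)
```

For a planted instance the zero assignment has zero violated checks, which is the e = 0 ground state discussed below.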
Given that we are interested in using this model as a benchmark for optimization, we need two more ingredients. (i) We need to define an energy function whose ground state configurations are the solutions to the problem; the simplest choice consists in just counting the number of violated parity checks via the following Hamiltonian

H = Σ_a (1 − J_a ∏_{i∈∂a} s_i) / 2 ,    (2)

where s_i = (−1)^{x_i} are Ising spins and J_a = (−1)^{b_a} the couplings, a being an index running over all interactions (triplets for 3-XORSAT) and ∂a the set of variables entering the a-th interaction. (ii) At least one solution must always exist, and this can be ensured by enforcing a specific configuration to satisfy all the constraints: for example, by setting all b = 0, the configuration x_i = 0 ∀i is always a solution. One may think this way of building the model naturally favors the imposed or planted configuration, but this is not the case for the XORSAT problem. As noticed since Ref. [35], finding the imposed or planted solution is like finding the crystal in a model of a liquid that upon cooling spontaneously forms a glass: it is well known that crystallization requires an activated dynamical process (nucleation), which is exponentially rare in models with long range interactions, such as random XORSAT.¹ Planted models which are hard to solve are the most natural candidates for optimization benchmarks, and planted XORSAT turns out to be the hardest among them [5]. Properties of planted models have been studied in great detail [36,37] and reviewed in [38]. The Hamiltonian for the planted model simplifies to

H = Σ_a (1 − ∏_{i∈∂a} s_i) / 2 ,    (3)

which is indeed minimized by the configuration s_i = 1 ∀i. This is the energy function we are going to minimize in order to test classical and quantum optimization algorithms.

¹ Strictly speaking, XORSAT is in P, as the solution to a set of αN linear equations in N variables can always be found in time O(N³) by a simple Gaussian elimination algorithm (more complicated algorithms can do marginally better). However, such an algorithm is very special to XORSAT and cannot be generalized to other CSPs, being inherently non-local (i.e. the operations performed in general involve many variables, possibly very distant on the interaction network). On the contrary, when local algorithms are run on XORSAT they are found to be very slow and strongly affected by the dynamical transition at α_d. Even the very same Gaussian elimination algorithm has an algorithmic phase transition at α_d from linear to cubic behavior [34] and it does not work as soon as the problem is slightly perturbed (e.g. by the addition of a tiny fraction of constraints with the OR operator).
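The remark that XORSAT reduces to linear algebra over GF(2) can be made concrete with a bitmask-based Gaussian elimination; the following is a minimal sketch (an "xor basis" variant, our own illustration, not the paper's code), where each equation is a bitmask of the variables it involves:

```python
def gf2_solve(rows, b, n):
    """Solve a linear system over GF(2).  rows[i] is a bitmask of the
    variables in equation i, b[i] its right-hand side bit.  Returns one
    solution as a list of 0/1, or None if the system is unsatisfiable."""
    pivots = {}                         # pivot column -> (row bitmask, rhs)
    for r, bi in zip(rows, b):
        # reduce the new equation by existing pivots, leading bit first
        while r:
            col = r.bit_length() - 1
            if col not in pivots:
                break
            pr, pb = pivots[col]
            r ^= pr
            bi ^= pb
        if r == 0:
            if bi == 1:
                return None             # reduced to 0 = 1: contradiction
            continue                    # redundant equation
        pivots[r.bit_length() - 1] = (r, bi)
    x = [0] * n
    # back-substitution: solve pivot variables in increasing column order,
    # free variables stay at 0
    for col in sorted(pivots):
        pr, pb = pivots[col]
        val = pb
        for c in range(col):
            if (pr >> c) & 1:
                val ^= x[c]
        x[col] = val
    return x
```

This runs in polynomial time on any XORSAT instance, illustrating the footnote's point; the operations are manifestly non-local, since a single row reduction can mix variables far apart on the interaction graph.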
The last relevant choice regards the interaction hypergraph, that is, the set of M triplets. In the model where the M triplets are chosen randomly, the degree of each variable is a Poisson random variable of mean 3α. A different choice is the one where the interaction hypergraph is chosen such that each variable has the same degree d: this is called a random regular hypergraph and can be generated via the configurational model, where the N variables are given d legs each and the M = N d/3 interactions are given 3 legs each, and then variable and interaction legs are matched in a random way, only avoiding that the same variable enters more than once in the same interaction. We are going to use the random regular version for the numerical simulations, while the random Poisson version is used for some analytic computations. The two versions share the same physical behavior.
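The configurational model just described can be sketched as follows; whole-matching rejection sampling (our own illustrative choice, fine for fixed d at large N) discards matchings where an interaction touches the same variable twice:

```python
import random

def regular_hypergraph(n, d=3, k=3, rng=random, max_tries=1000):
    """Configurational model: each of the n variables gets d 'legs', each
    of the m = n*d//k interactions gets k legs; legs are matched uniformly
    at random, rejecting matchings where an interaction touches the same
    variable more than once."""
    assert (n * d) % k == 0
    m = n * d // k
    for _ in range(max_tries):
        legs = [v for v in range(n) for _ in range(d)]
        rng.shuffle(legs)
        triples = [tuple(legs[k * a:k * (a + 1)]) for a in range(m)]
        if all(len(set(t)) == k for t in triples):
            return triples              # every variable has degree exactly d
    raise RuntimeError("too many rejections; increase max_tries")
```

For fixed d and k the acceptance probability tends to a constant as N grows, so the rejection loop terminates quickly in practice.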
The statistical properties of the random regular XORSAT problem are well known [36,[39][40][41]]. Hereafter we are going to focus our studies on the d = 3 case. In this case the random and the planted models are equivalent in the large N limit for any positive temperature (at T = 0 the sub-extensive differences between the two models may lead to some discrepancy in the number of solutions, discussed in Refs. [42,43]). We are going to use the planted model in order to be sure that a solution always exists even for finite (and small) values of N.
Having fixed the degree of the hyper-graph such that the XORSAT problem is in its hard phase, we can consider now the Gibbs measure corresponding to the Hamiltonian H in Eq. (3) at any temperature T. The XORSAT problem corresponds to the problem of finding a zero energy ground state, so it is somehow related to the T = 0 physics of the model, but the behaviour of the model at T > 0 is interesting as well. Indeed it is well known [35] that when a model is beyond the dynamical transition point at T = 0 (e.g. for α > α_d) it undergoes a dynamical phase transition at a positive temperature T_d, and for T < T_d it has an exponentially large number of metastable states dominating the thermodynamics, N ∼ exp[N Σ], where Σ is the so-called complexity. This is the case for the 3-regular 3-XORSAT model, which shows a dynamical phase transition at T_d = 0.255 and a non-zero complexity of states at T = 0 that extends from e ≡ H/N = 0 to e = e_d = 0.0206705. Actually not all these states are expected to play a relevant role in the relaxation dynamics searching for ground states: from previous studies [40,41] we expect states above the marginal energy e_marg = 0.018203 to be unlikely to trap smart searching algorithms.

FIG. 1. Schematic picture for the energy relaxation in hard optimization problems. On short times the energy relaxes to a threshold value, while the ground state (solution) is reached only for times growing exponentially with the system size. The first regime can be described by ordinary differential equations taking the large N limit at t/N fixed, while the second requires the estimation of rare and large fluctuations.
Unfortunately, a precise connection between relaxation algorithms searching for low energy configurations (e.g. T = 0 Langevin dynamics) and the energy landscape that we can describe in a precise way via the computation of the complexity is still missing (and recent results have clarified that the situation is much more complicated than previously expected [44][45][46]). So we cannot make an analytical claim about the threshold energy below which searching algorithms struggle to go, but this threshold energy is certainly positive and close to e_marg. Reaching a solution, that is, an e = 0 configuration, is a very hard problem and in general requires times scaling exponentially with the system size N.

III. THE OPTIMIZATION ALGORITHMS
In the study of the optimization algorithms, that is, the out-of-equilibrium processes that try to minimize the energy, the order in which the large size limit (N → ∞) and the large times limit (t → ∞) are taken is extremely important. We expect an algorithm-dependent threshold energy, e_thr > 0, to exist such that configurations with e > e_thr can be reached in an "easy" way (e.g. in a time scaling linearly with N), while to reach a solution (i.e. a configuration with e = 0) a time growing exponentially in N is in general required for hard problems (see the schematic picture in Fig. 1).
We will see that for some algorithms we are able to provide an approximate description of the dynamics in the regime where the N → ∞ limit is taken before the t → ∞ limit, thus estimating e_thr (which in the best cases is close to e_marg). The presentation of our results about optimization algorithms is split into two parts. In Sections III A and III B we discuss the regime of linear times, where large sizes can be studied and analytical solutions in the large N limit can be obtained (thus estimating the threshold energy for various algorithms). In Sections III C and III D we study the regime where times are made exponentially large in the system size N, and estimate the exponential growth rate of the time to reach a solution.
A. Simulated Annealing: a warm-up with the most widely used optimization algorithm

Simulated Annealing (SA) is maybe the most widely used optimization algorithm. It consists in implementing a Monte Carlo Markov Chain sampling from the Gibbs-Boltzmann distribution P_GB(s) ∝ exp(−H(s)/T) with a temperature T slowly decreasing towards zero. In Fig. 2 we report the results of Simulated Annealing run on samples of size N = 10^5 with a cooling schedule where the temperature is decreased by ∆T after each Monte Carlo Sweep (MCS): the four curves correspond to ∆T = 10^−3, 10^−4, 10^−5, 10^−6 (from top to bottom).
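A minimal sketch of such a Simulated Annealing schedule on the planted Hamiltonian of Eq. (3) could look as follows (helper names and parameters are our own illustration; note that, for this Hamiltonian, flipping spin v changes the energy by the sum of the interaction products containing v):

```python
import math, random

def energy(s, triples):
    # planted Hamiltonian: each interaction contributes (1 - s_i s_j s_k)/2
    return sum((1 - s[i] * s[j] * s[k]) // 2 for i, j, k in triples)

def simulated_annealing(triples, n, T0=1.0, dT=1e-4, rng=random):
    """Metropolis single-spin-flip dynamics with T lowered by dT per sweep,
    in the spirit of the schedule described in the text (a sketch, not the
    authors' exact code)."""
    s = [rng.choice((-1, 1)) for _ in range(n)]
    nbr = [[] for _ in range(n)]
    for t in triples:
        for v in t:
            nbr[v].append(t)
    T = T0
    while T > 0:
        for _ in range(n):                              # one Monte Carlo sweep
            v = rng.randrange(n)
            # flipping v turns each product p_a into -p_a, so ΔH = Σ_{a∋v} p_a
            dE = sum(s[i] * s[j] * s[k] for i, j, k in nbr[v])
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                s[v] = -s[v]
        T -= dT
    return s, energy(s, triples)
```

As T → 0 the accepted moves become purely greedy, so the final configuration sits at or near a local minimum, consistent with SA relaxing towards a threshold energy rather than the ground state.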
From the Simulated Annealing results it is clear that an efficient, but still linear-time, algorithm is not able to solve the 3-regular 3-XORSAT problem, and seems to converge to configurations with a threshold energy close to e marg . To achieve the zero energy configuration we need to use an algorithm that can go below the threshold energy, overcoming energetic and/or entropic barriers.

B. A broad class of stochastic search algorithms
Given a particular spin configuration, let us classify its variables according to the number of unsatisfied interactions they belong to: we say a variable is of type k if it belongs to k unsatisfied interactions. In the present model k ∈ [0, 3] since d = 3 for all variables. We call f_k(t) the fraction of variables of type k at time t, which satisfy 0 ≤ f_k(t) ≤ 1 and Σ_{k=0}^{3} f_k(t) = 1 at any time. The searching algorithm we propose is extremely simple and works as follows: at each time step it chooses one variable of type k with probability p_k(t) ∝ w_k f_k(t) and flips it. The time is then incremented by 1/N in order to have a well defined continuous process in the large N limit.
Starting from a random configuration the behavior of the algorithm is determined only by the vector of weights w = (w 0 , w 1 , w 2 , w 3 ). The stopping condition depends on the weights: if all weights are non-null the algorithm never stops; while if w 0 = 0 the algorithm cannot flip variables participating only in satisfied interactions and thus any solution is a stopping configuration. We fix w 0 = 0 hereafter so as to make any solution a stopping configuration for the algorithm. We study several choices for the vector of weights (the weights need not be normalized, but the probabilities p k ∝ w k f k are).
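A direct simulation of this weighted search is straightforward. The sketch below (our own illustration) reclassifies all variables at each step, which is O(N) per step; a serious implementation would keep the types updated incrementally. It assumes every variable has degree 3, so that types range in [0, 3]:

```python
import random

def stochastic_search(triples, n, w, t_max, rng=random):
    """Weighted local search: repeatedly flip a variable of type k
    (k = number of unsatisfied interactions it belongs to) chosen with
    probability proportional to w[k] f_k.  With w[0] = 0 every solution
    is a stopping configuration; w = (0, 0, w2, w3) gives the greedy
    version, which stops at blocked configurations."""
    s = [rng.choice((-1, 1)) for _ in range(n)]
    unsat = lambda t: s[t[0]] * s[t[1]] * s[t[2]] == -1
    nbr = [[] for _ in range(n)]
    for t in triples:
        for v in t:
            nbr[v].append(t)
    for _ in range(int(t_max * n)):          # time t advances by 1/N per flip
        types = [sum(unsat(t) for t in nbr[v]) for v in range(n)]
        weights = [w[types[v]] for v in range(n)]
        tot = sum(weights)
        if tot == 0:                         # solved, or blocked (greedy case)
            break
        # picking a variable with weight w[type] is the same as picking
        # type k with probability ∝ w_k f_k, then uniformly within the type
        r = rng.random() * tot
        acc = 0.0
        for v in range(n):
            acc += weights[v]
            if acc >= r:
                break
        s[v] = -s[v]
    return s, sum(unsat(t) for t in triples)
```

Flipping a degree-3 variable of type k changes the energy by 3 − 2k, so with w = (0, 0, 1, 1) every flip strictly lowers the energy and the run necessarily ends in a blocked configuration.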

Analytic description
Before presenting the actual performance of this algorithm, we would like to stress that the evolution of the algorithm can be described analytically under some assumptions, similar to those already used in the literature to approximately describe the relaxation dynamics in models defined on a Bethe lattice [31,47,48].
At each time step the fractions {f_k(t)} change depending on the variable chosen. For example, if a variable of type k = 3 is chosen and flipped, then that variable changes its type from 3 to 0 (3 → 0), i.e. the fraction f_3 decreases by 1/N and the fraction f_0 increases by 1/N; at the same time its 6 neighboring variables decrease their type by one, and one needs to compute the numbers of changes n_{3→2}, n_{2→1} and n_{1→0} in order to properly update the fractions {f_k} (for example ∆f_2 = (n_{3→2} − n_{2→1})/N). The three numbers n_{3→2}, n_{2→1} and n_{1→0} are random variables distributed according to the multinomial distribution Mult({p_{3→2}, p_{2→1}, p_{1→0}}, 6). Under the approximation that no correlation exists between the types of neighboring variables once the common interaction is removed (cavity approximation) we can compute the parameters of the multinomial distribution,

p_{3→2} ∝ 3 f_3 ,    p_{2→1} ∝ 2 f_2 ,    p_{1→0} ∝ f_1 ,    (4)

where the proportionality constant is fixed by p_{3→2} + p_{2→1} + p_{1→0} = 1. One more example: if the variable chosen to be flipped is of type 2, then, apart from the changes just described for the neighbors reached through the two previously unsatisfied interactions, the two neighbors reached through the previously satisfied interaction increase their type by one, with probabilities

p_{0→1} ∝ 3 f_0 ,    p_{1→2} ∝ 2 f_1 ,    p_{2→3} ∝ f_2 ,    (5)

with again the normalization condition p_{0→1} + p_{1→2} + p_{2→3} = 1. So the variations of the fractions due to a single spin flip are random variables written in terms of the numbers n^{(d)}_{j→k}, where the superscript (d) in n^{(d)} refers to the total number of events in the multinomial distribution, and we fix n_{−1→0} = n_{0→−1} = n_{3→4} = n_{4→3} = 0. The generalization of the above equations to other random graphs or interaction types is straightforward. Rescaling the time such that a spin flip happens every ∆t = 1/N, and taking the average over a small, but finite, time interval, corresponding to O(N) single spin flips, the stochastic equation above can be converted in the N → ∞ limit into an ordinary first order differential equation, Eq. (6), written in terms of the mean values E[n^{(d)}_{j→k}]. For simplicity, we keep using the same notation, but now the {f_k(t)} are not single trajectories, but mean values over the trajectories. An easy check is that d/dt Σ_{k=0}^{3} f_k(t) = 0, so the total probability is conserved. The solution to Eq. (6) can be easily obtained by any integration algorithm for ordinary differential equations. This analytic solution will be compared in the following with the actual evolution of the algorithm.

FIG. 3. Evolution of the energy for several greedy versions (w_1 = 0) of the algorithm. Points are numerical data and lines are the analytic description.

Actual performances
Let us start from a greedy version of the algorithm, that is, an algorithm that can never increase the energy (it is the equivalent of gradient descent, but here the space of configurations is discrete, so gradients are not well defined). This greedy version of the algorithm requires w_1 = 0, because flipping a variable of type 1 would increase the energy (i.e. the number of unsatisfied interactions); only the weights w_2 and w_3 can be non-null in the greedy case. For this greedy version the number of stopping configurations is very large (their entropy is computed in Appendix A): any configuration where each variable participates in 0 or 1 unsatisfied interactions is a stopping configuration. These configurations are local minima of the energy function and we call them blocked.
In Fig. 3 we compare the evolution of the energy for several greedy versions of the algorithm to the solution of the ODE in Eq. (6). We notice that the agreement between analytics and numerics is almost perfect, because at those energy values correlations are very weak. The same almost perfect agreement can be seen also at the level of the fractions of variables f_k participating in k unsatisfied constraints (see Fig. 4 for the case w_2 = w_3 = 1). Let us consider now non-greedy versions of this algorithm, that is, cases with w_1 > 0 where the energy can sometimes increase during the evolution, although we expect the energy to relax on average to the threshold value.
The first interesting case is the one that corresponds to WalkSAT [49]. A detailed description of this algorithm is given in Sec. III C; for the present purposes it is enough to say that in WalkSAT variables are flipped according to the number of unsatisfied constraints they belong to, that is w_k ∝ k. The numerical results and the comparison with the analytical description are provided in Fig. 5: the agreement is excellent. Although this version of the algorithm would run forever, after a finite time a stationary regime is reached and nothing interesting happens any more in the large N limit. Once the threshold energy has been reached, the search for the solution takes place via rare fluctuations, as discussed in Sec. III C.
However, having achieved a good analytic description of the WalkSAT algorithm is not enough, because this algorithm works at very high energies: its threshold energy is 1/4 for 3-regular 3-XORSAT, more than twice the energy that the greedy version achieves and more than 10 times the marginal energy we expect to be the relevant threshold energy for smart searching algorithms.
So we are interested in studying more efficient versions of this algorithm, and we explore the case where w_1 is much smaller than w_2 and w_3 (in practice we fix w_2 = w_3 = 1 and explore the range w_1 ≪ 1). Notice that if w_1 is made very small, the algorithm in practice is allowed to make an energy-increasing spin flip only when no other move is allowed, that is, when the configuration is blocked. We have also simulated a version of the algorithm where this is made explicit and the results are equivalent.
In Fig. 6 we report the evolution of the energy for WalkSAT, the greedy version and several quasi-greedy versions of the algorithm. We observe that the threshold energy is 1/4 for WalkSAT and close to 0.1 for the greedy version, but it approaches very closely the marginal energy e_marg for w_1 ≪ 1. So the quasi-greedy versions of this algorithm seem very effective in reaching the same energy values Simulated Annealing can achieve.
The analytic description of the algorithm changes a lot depending on the energy value. For high enough energies, correlations are very weak and the analytic solution matches the numerical data perfectly: this is true for WalkSAT and the greedy version, as already shown in Figs. 3, 4 and 5. For w_1 = 0.1 we start seeing some deviations at times t ∼ 1, which luckily disappear for larger times.
The most interesting cases are those with w_1 ≤ 10^−2. First of all we observe a very weak dependence on w_1 (the three values used, w_1 = 10^−2, 10^−3, 10^−4, give practically the same results), so we are working in the limit where the algorithm makes an energy-increasing spin flip only when no other move is available. So this algorithm is not able to jump over barriers of height larger than 1 (or a few units at most), which correspond to ∆e = O(1/N).
Indeed this quasi-greedy algorithm follows closely the greedy version until the time, t * ∼ 0.3, when the latter reaches a blocked configuration that stops it. For t > t * the quasi-greedy algorithm keeps decreasing the energy at a much slower pace, eventually approaching an energy very close to the marginal one (that we expect to be a lower bound for the threshold energy in linear time algorithms).
In the regime t > t * the analytical approximation based on the lack of correlations fails dramatically in many important aspects: it keeps predicting a very fast energy decrease and it estimates a too small asymptotic energy. The reason for such a failure is clear: low energy configurations are very peculiar and have strong correlations among variables, even among variables which are far apart. In this energy range, our approximation fails and the analytic solution is meaningless.
It is worth stressing that we are still working on timescales growing linearly with N (the first regime depicted in Fig. 1). So the failure of the analytic approximation is not due to the fluctuations that become important in the second regime of exponentially large timescales.
Here we are still working in a regime such that every spin variable has been flipped a finite number of times. Nonetheless, the evolution of the quasi-greedy algorithm brings the system to configurations correlated up to some distance, such that the hypotheses leading to Eq. (6) are no longer valid.
From a preliminary study we have understood that in this energy regime the algorithm proceeds by performing collective rearrangements of variables forming a local treelike structure, and the size of these collective rearrangements seems to grow while approaching the threshold energy (this is strongly reminiscent of what happens in similar problems approaching the dynamical transition [50,51]). The analytical description of the quasi-greedy algorithm in this low energy regime is deferred to future work.
We move now to the problem of estimating the large deviation rate to reach a solution via a rare fluctuation from the threshold energy. Given the differences in the threshold energies clearly visible in Fig. 6 we expect quite different rates for WalkSAT and the several quasi-greedy versions of the algorithm.

C. WalkSAT: the large deviations rate to reach a solution from a simple argument
WalkSAT is a popular randomized (or stochastic) algorithm for solving constraint satisfaction problems [49]. In its simplest version it works as follows: it starts from a random configuration; at each time step, if all constraints are satisfied it returns the solution, otherwise it picks uniformly at random an unsatisfied constraint and flips a randomly chosen variable participating in that constraint. The flip certainly satisfies the chosen constraint (interaction) but may unsatisfy many other constraints: so the algorithm may increase the energy during the evolution.
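The simplest WalkSAT just described can be sketched as follows (a naive version, our own illustration, which rescans all clauses at each step; real implementations keep an incremental list of unsatisfied clauses):

```python
import random

def walksat(clauses, n, max_steps, rng=random):
    """Simplest WalkSAT: pick a uniformly random unsatisfied clause, flip a
    uniformly random variable in it.  Clauses are parity checks (i, j, k, b)
    meaning x_i ^ x_j ^ x_k = b."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for step in range(max_steps):
        bad = [c for c in clauses if (x[c[0]] ^ x[c[1]] ^ x[c[2]]) != c[3]]
        if not bad:
            return x, step                   # solution found
        i, j, k, b = rng.choice(bad)
        x[rng.choice((i, j, k))] ^= 1        # may increase the energy
    return None, max_steps
```

On easy instances (α well below α_d) this finds a solution quickly; in the hard phase the residual energy stalls at the threshold value discussed above and solutions arrive only through rare fluctuations.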
An approximate analytic description of this algorithm exists [2,47,52], from which we learn that for the hard problem we are studying the algorithm relaxes to a positive energy in a linear time and then reaches a solution by a rare fluctuation taking an exponential time. However, it is not clear which kind of barrier the algorithm crosses, given that the energy increase is in principle unbounded.
Actually the analytic description of the WalkSAT algorithm can be made even simpler than in previous approaches [2,47,52] by noticing that (i) the algorithm stays most of the time on configurations of very high energy where correlations are very weak and (ii) the fluctuation leading to a solution is so rapid that correlations do not arise.
These insights suggest that keeping track of the energy alone should capture the essential features of the algorithm. In practice, we think of the energy as a stochastic process, and write down a stochastic differential equation that describes its dynamics during the computation. The details of the calculation are presented in the following section.

Analytic description
In this section we focus on random p-XORSAT, with α constraints per spin: every spin participates in a Poisson distributed number of interactions (with average and variance pα), while every interaction involves exactly p spins. Random p-XORSAT is known to be less correlated than the regular version (see the comment after Eq. 3), so we expect it to be more amenable to a simple description.
Consider a uniformly random spin configuration s. Every spin is connected to a Poisson distributed number of broken constraints with average pα_u, where α_u = H[s]/N. The same spin is also connected to a Poisson distributed number of satisfied constraints with average p(α − α_u). The fundamental idea of our approach is to assume that WalkSAT is incapable of building any correlations that violate this property, which is strictly true only on fully random configurations. This is reasonable, as most of the time is spent in high energy states.
Given an initial random configuration s, we run WalkSAT for T steps and denote by H(T) the number of broken constraints at this time. When we take one more step, H(T) changes by

H(T+1) − H(T) = −1 − u(T) + s(T),

where u(T) is the number of excess broken constraints connected to the flipped spin (i.e. excluding the one that was selected by WalkSAT) and s(T) is the number of satisfied constraints connected to it. As explained earlier, we assume that at all times they are distributed as if the configuration were random. After a number ∆T of steps, larger than one but small compared to the number of spins N, the total energy change is

H(T+∆T) − H(T) = −∆T − Σ_{i=1}^{∆T} u(T+i−1) + Σ_{i=1}^{∆T} s(T+i−1),

and we approximate the two sums using the central limit theorem:

Σ u ≈ pα_u ∆T + sqrt(pα_u ∆T) R_1 ,    Σ s ≈ p(α − α_u) ∆T + sqrt(p(α − α_u) ∆T) R_2 ,

where R_1 and R_2 are two independent standard Gaussian random variables. In these expressions we assume ∆T is short enough that the energy density α_u does not appreciably change over the interval. The sum of the two terms involving R_1 and R_2 is again a Gaussian variable, so that the total energy change after ∆T steps is approximately

H(T+∆T) − H(T) ≈ [p(α − 2α_u) − 1] ∆T + sqrt(pα ∆T) R .

We are interested in the large N limit, so it is convenient to change variable from T to t = T/N. Dividing the previous equation by N we obtain a stochastic differential equation describing an Ornstein-Uhlenbeck process:

dα_u = [p(α − 2α_u) − 1] dt + sqrt(pα/N) dW .   (11)

To estimate the scaling with N of the time necessary to reach a solution, it is best to study the Fokker-Planck equation associated to Eq. (11):

∂_t P(α_u, t) = −∂_{α_u} { [p(α − 2α_u) − 1] P } + (pα/2N) ∂²_{α_u} P .   (12)

Regardless of the initial condition P(α_u, t = 0), after an initial transient the energy is distributed according to the stationary solution of Eq. (12),

P_st(α_u) = c_N exp[ −(2N/α) (α_u − α_0)² ]   (13)

(the normalization constant c_N is irrelevant here), which is centered around the finite value (if pα > 1)

α_0 = (pα − 1)/(2p) .   (14)

This indicates that the time necessary to reach the solution grows more rapidly than O(N), but also gives us a concrete way to estimate it: the typical time over which a transition from energy α_0 to zero happens is the reciprocal of the rate

P_st(0) ∝ exp[ −N (pα − 1)²/(2p²α) ] .   (15)

[Fig. 7: points are the rates measured numerically in Ref. [47]; the blue line is the analytic approximation derived in that reference, based on large deviation theory; the red line is our approximation, Eq. (15).]
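The resulting picture can be checked with a direct Euler-Maruyama integration of the energy process; the drift p(α − 2α_u) − 1 and diffusion pα/N used below follow our reading of the Ornstein-Uhlenbeck process of Eq. (11), so the snippet is an illustration rather than a reproduction of the paper's code:

```python
import numpy as np

def simulate_energy_density(p=3, alpha=1.0, N=1000, t_max=10.0, dt=1e-3, seed=0):
    """Euler-Maruyama integration of the Ornstein-Uhlenbeck process for the
    broken-constraint density alpha_u (coefficients follow our reading of
    the derivation: drift p*(alpha - 2*alpha_u) - 1, diffusion p*alpha/N)."""
    rng = np.random.default_rng(seed)
    au = 0.5 * alpha        # random initial configuration: half the constraints broken
    traj = [au]
    for _ in range(int(t_max / dt)):
        drift = p * (alpha - 2.0 * au) - 1.0
        au += drift * dt + np.sqrt(p * alpha / N * dt) * rng.standard_normal()
        au = max(au, 0.0)   # the density cannot go negative
        traj.append(au)
    return np.array(traj)
```

After a short transient the trajectory fluctuates around α_0 = (pα − 1)/(2p), e.g. 1/3 for p = 3, α = 1, with fluctuations of order 1/sqrt(N).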
As expected, the solution time scales exponentially with the problem size. This analytic solution shows, in its simplicity, the mechanism by which a solution is found and why we speak of entropic barriers. At equilibrium, reached after O(N) steps, the system fluctuates around a very high energy threshold α_0, well above any local minimum where the system could get stuck. With exponentially small probability, however, a rapid fluctuation of O(N) spin flips can bring the system to an actual solution. Only an exponentially small fraction of all the possible sequences of N spin flips picks the right direction towards the solution: hence the barrier is of an entropic nature, not an energetic one. We will discuss this in more detail after comparing with numerical results.

Comparison with numerics
A comparison with the actual rate measured in the numerical experiments of Ref. [47] is provided in Fig. 7. The accuracy of the rate reported in Eq. (15), obtained under the assumption of a complete lack of correlations, is surprising and supports the idea that WalkSAT performs a random search without building any relevant correlation in the problem.
It is interesting to notice that the same simple argument can also be made in the 3-regular 3-XORSAT case, where it provides the following scaling for the time to reach a solution: t_sol ∼ exp(N/6).
The rate µ = 1/6 ≈ 0.167 is again close to the numerics reported in Ref. [53], where µ ≈ 0.124 was measured. Notice that the physical interpretation of this calculation is that the equilibrium distribution is centered around a relatively large non-zero value of the cost function, as seen in Fig. 8. The solution is found through one rare fluctuation which brings the system out of equilibrium via a "lucky" series of O(N) flips pointing in the exact direction of the ground state. During this series, no significant correlation is created between the values of the various random variables, so the central limit theorem we used remains valid. We expect this to be the behavior of any random algorithm working at high temperature on a problem with entropic barriers.
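As a sanity check, the value µ = 1/6 can be reproduced from the Gaussian (Ornstein-Uhlenbeck) approximation; the single-step drift 1 − 4e and variance 8e(1 − e) used below are our own reading of the 3-regular case (the flipped variable touches exactly two other constraints, each broken with probability e):

```python
from fractions import Fraction

def escape_rate_regular():
    """Gaussian (Ornstein-Uhlenbeck) estimate of the fluctuation rate for
    3-regular 3-XORSAT under WalkSAT.  Assumed single-step statistics:
    energy change 1 - 2u with u ~ Binomial(2, e), i.e. mean drift
    a(e) = 1 - 4e and variance v(e) = 8 e (1 - e)."""
    e0 = Fraction(1, 4)                # zero-drift (threshold) energy density
    v0 = 8 * e0 * (1 - e0)             # variance at the threshold: 3/2
    # integral of the drift from 0 to e0:  [e - 2 e^2] at e0  =  1/8
    drift_integral = e0 - 2 * e0 ** 2
    mu = 2 * drift_integral / v0       # rate in t_sol ~ exp(mu * N)
    return mu                          # exactly 1/6
```

The exact-rational arithmetic makes the cancellation transparent: µ = 2 · (1/8) / (3/2) = 1/6.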
In the analysis of more complex algorithms, those working at lower temperatures, this assumption is most probably violated. Correlations between the random variables need to be taken into account when computing the rate of the rare fluctuation that leads to the ground state. To make progress in the analysis of such algorithms, one needs to go beyond what is done in this paper.

D. Times to reach a solution via the quasi-greedy algorithm
Given that the quasi-greedy version of the algorithm has a much lower threshold energy than WalkSAT, we expect rare fluctuations leading to a solution to occur with a better exponential rate. Unfortunately, in the quasi-greedy case the computation cannot be carried out analytically because, as seen above, the approximation based on the lack of correlations gives very poor results and is therefore of no use. We thus resort to a numerical computation.
We observed above that in the limit w_1 ≪ 1 the dependence on w_1 is extremely weak in the first regime of linear times. The same is true in the second regime of exponential times: the exponential rate µ determining the mean time to reach a solution, t_sol ∼ exp(µN), does not show any visible dependence on w_1. We have therefore fixed w_1 = 0.05 in our numerical experiments with the quasi-greedy version of the algorithm.
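For concreteness, here is a minimal sketch of how such a dynamics can be implemented; the interpretation of the weights w_k as acceptance probabilities for flipping a variable that sits in k violated constraints is our reading of the algorithm (on a 3-regular graph such a flip changes the energy by 3 − 2k, so w_0 = 0 together with a small w_1 forbids the largest uphill moves while allowing small ones):

```python
import random

def violated(s, clause):
    (i, j, k), parity = clause
    return s[i] * s[j] * s[k] != parity

def quasi_greedy(clauses, n_vars, weights, max_sweeps, seed=0):
    """Sketch of a quasi-greedy dynamics for 3-XORSAT.
    weights[k] is the (assumed) probability of accepting the flip of a
    variable that currently participates in k violated constraints."""
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(n_vars)]
    clauses_of = [[] for _ in range(n_vars)]
    for c in clauses:
        for i in c[0]:
            clauses_of[i].append(c)
    for sweep in range(max_sweeps):
        if not any(violated(s, c) for c in clauses):
            return s, sweep                     # solution reached
        for _ in range(n_vars):                 # one sweep = n_vars proposals
            i = rng.randrange(n_vars)
            k = sum(violated(s, c) for c in clauses_of[i])
            if rng.random() < weights[k]:
                s[i] *= -1
    if not any(violated(s, c) for c in clauses):
        return s, max_sweeps
    return None, max_sweeps
```

With weights = (0, w_1, 1, 1) a solution is an absorbing state (all variables have k = 0), while small w_1 lets the dynamics cross the small barriers discussed in the text.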
In Fig. 9 we report with blue points the average time to reach a solution running the quasi-greedy algorithm. The time we report is the mean number of sweeps, where a sweep corresponds to N spin flips. The average is taken over 10^6 different runs and the error is smaller than the symbol size. For large N the data are very well fitted by an exponential growth with rate µ_QG = 0.0835(5).
In the same figure we also report the mean time to reach a solution with the WalkSAT algorithm. Its rate is clearly much larger, although at these sizes we are still observing preasymptotic behavior; for larger sizes the rate should converge to µ_WS ≈ 0.124 [53].
A much more meaningful comparison is the one with the Parallel Tempering (PT) algorithm. PT is considered the state of the art for thermalization (and optimization) in the field of disordered systems, as it can deal efficiently with very rough energy landscapes. For spin glass models, which are similar to the problem we are studying here, it has recently been shown that PT performs comparably to Population Annealing (PA) [54], another standard algorithm for optimizing complex energy functions. We prefer to study the performance of PT, rather than PA, in finding a solution, because in PT it is straightforward to compute the time to solution once the temperature scheduling is fixed, while in PA one needs to increase the size of the population during the run, and if the solution is not found a new run with a larger population is required; moreover, comparing population sizes and running times requires some extra care.
We have run PT with the optimal temperature scheduling derived in Appendix B. The results, in terms of Monte Carlo sweeps per replica, are shown in Fig. 9. The comparison with the quasi-greedy algorithm is very favorable to the latter: the growth rate for PT is slightly larger than µ_QG (but still decreasing, and could eventually converge to the same value), and the prefactor for PT is larger by two orders of magnitude than the one for the quasi-greedy algorithm.

E. Quantum annealing
Originally proposed twenty years ago [17,55], Quantum Annealing (QA) is a quantum algorithm designed to solve classical optimization problems by exploiting the adiabatic theorem of quantum mechanics [56,57]. First, we encode the given classical problem in a problem Hamiltonian H_P, so that solutions correspond to its ground states. The problem Hamiltonian typically takes the form of a cost function with all terms commuting among themselves (in this sense it is a classical problem). A convenient and customary choice is to have the problem Hamiltonian diagonal in the σ^z basis. Then, we choose a fluctuation Hamiltonian H_F, an arbitrary operator that should provide "quantum fluctuations" and must have a known and simple ground state. A popular choice is a uniform field in the σ^x direction:

H_F = −Σ_i σ_i^x ,   (17)

where the sum runs over all spin variables in the system. The Quantum Annealing algorithm then consists in time-evolving the ground state of

H(t) = (1 − t/T) H_F + (t/T) H_P   (18)

from t = 0 to t = T. For a long enough annealing time T, the adiabatic theorem guarantees that a system initially prepared in the ground state of H(0) = H_F, known by construction, will evolve into a state belonging to the ground state manifold of H(T) = H_P. Measuring this state in the σ^z basis, we obtain a solution to the original problem. Notice that choosing the fluctuation Hamiltonian as in Eq. (17) guarantees that the initial ground state has finite overlap with every state in the computational basis; the algorithm will never miss a solution merely because the corresponding state had no initial amplitude. The adiabatic theorem also provides a lower bound on T: for the algorithm to succeed, the annealing time should satisfy

T ≫ max_{s∈[0,1]} |⟨ψ_1(s)| ∂_s H |ψ_0(s)⟩| / ∆(s)² ,  s = t/T ,   (19)

where |ψ_0(s)⟩ is the instantaneous ground state, |ψ_1(s)⟩ is the first excited state, and ∆(s) is the energy gap between them. We have already encountered a classical Hamiltonian encoding XORSAT in Eq. (3).
The quantum mechanical version is simply

H_P = Σ_{a=1}^{M} (1 − J_a σ^z_{i_a} σ^z_{j_a} σ^z_{k_a}) / 2 ,   (20)

obtained from Eq. (3) by promoting each classical spin to a Pauli operator σ^z. In our numerics we use this as the problem Hamiltonian and a uniform transverse field as the fluctuation one. Once we fix the annealing time T and the initial state |ψ(0)⟩, QA ends up in some final state |ψ(T)⟩. Let ε be the probability of measuring, in this state, an energy equal to the ground state energy, which in our model is E_GS = 0. As T increases, ε approaches unity. The complexity of the algorithm is not expected to change with ε, as long as ε = O(1): since ε is the probability to find the ground state after a time T, one can enhance this probability by repeating the algorithm. A success probability of ε becomes 1 − (1 − ε)² = 2ε − ε² after two repetitions, and even a small ε can be made close to 1 after n = O(1/ε) repetitions. This repetition, or restart, technique is commonly used in algorithms for CSPs [2], like WalkSAT. One commonly adopted definition of the time complexity of QA inverts the relation between ε and the annealing time: one fixes ε and N, and asks what is the corresponding annealing time T(N, ε). For large enough N it is reasonable to expect an exponential scaling of the form

T(N, ε) = A(N, ε) exp(µN) ,   (21)

where the prefactor A(N, ε) is allowed to have a polynomial dependence on N, while the rate µ does not depend on ε. In practice, T(N, ε) is rarely estimated from real-time unitary evolution, since the required computational power quickly becomes unmanageable as the number of spins grows above N ≈ 20. Two strategies are commonly adopted to sidestep this problem: estimate the energy gap between the ground and first excited states via Exact Diagonalization or Quantum Monte Carlo [19,42,43] and invoke the adiabatic theorem to bound T(N, ε), or perform the evolution in imaginary time, assuming that the imaginary-time analog of T(N, ε) scales in the same way as the original quantity [18].
Both methods find an exponential scaling of the time for hard classical problems, a fact which has been connected to the order of the thermodynamic transition [42] (however, see [58]). While both methods provide reasonable bounds on the scaling of the annealing time, it is unclear how the entropy of excited states affects those estimates. Since entropic effects are the primary focus of this work, we decided to perform real-time evolution so as to minimize confounding factors, even if this limits the simulations to moderate sizes. We integrated the time-dependent Schrödinger equation via an explicit high-order adaptive Runge-Kutta method [59] implemented in the QuSpin library [60].
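The real-time procedure can be illustrated at a toy scale without QuSpin: the sketch below integrates the Schrödinger equation for a single 3-spin XORSAT clause with SciPy's adaptive Runge-Kutta, assuming the linear interpolation H(t) = (1 − t/T)H_F + (t/T)H_P (the actual schedule and instances used in the paper may differ):

```python
import numpy as np
from scipy.integrate import solve_ivp

sx = np.array([[0., 1.], [1., 0.]])
sz = np.array([[1., 0.], [0., -1.]])

def op(single, site, n):
    """Embed a single-qubit operator acting on `site` into n qubits."""
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, single if k == site else np.eye(2))
    return out

def anneal(HP, HF, T):
    """Integrate i d|psi>/dt = H(t)|psi> with the (assumed) linear schedule
    H(t) = (1 - t/T) HF + (t/T) HP, starting from the ground state of HF."""
    dim = HP.shape[0]
    psi0 = np.full(dim, 1.0 / np.sqrt(dim), dtype=complex)  # uniform superposition
    rhs = lambda t, psi: -1j * (((1 - t / T) * HF + (t / T) * HP) @ psi)
    return solve_ivp(rhs, (0.0, T), psi0, rtol=1e-8, atol=1e-10).y[:, -1]

# toy instance: a single 3-spin clause, H_P = (1 - sz sz sz)/2, whose
# ground space (E_GS = 0) is spanned by the four even-parity basis states
n = 3
HP = 0.5 * (np.eye(2 ** n) - op(sz, 0, n) @ op(sz, 1, n) @ op(sz, 2, n))
HF = -sum(op(sx, i, n) for i in range(n))
psi = anneal(HP, HF, T=30.0)
eps = np.sum(np.abs(psi) ** 2 * (np.real(np.diag(HP)) < 0.5))  # success probability
```

For growing T the success probability ε approaches 1, and a small ε can be boosted by the O(1/ε) restarts discussed above.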
The results are presented in Fig. 10: instances with a unique solution (dark blue dots) are clearly harder than instances with multiple solutions (lighter blue dots), but not exponentially so. For any choice of the solution probability ε, the growth rate is compatible with

µ = 0.25 ± 0.07 .   (22)

This value is higher than the one predicted in [19], which in our notation would read µ ≈ 0.167. The simulations in that reference are run only on instances with a unique solution, and use a Quantum Monte Carlo method to estimate the gap.

IV. CONCLUSIONS
In Figs. 11 and 12 we compare the spectrum of a typical instance with a unique solution to the spectrum of an instance with an eight-fold degenerate ground state, as a function of the transverse field. When the ground state is unique (Fig. 11), a single state from the first excited manifold peels off and has an avoided crossing with the ground state. The performance of the Quantum Adiabatic Algorithm is limited by the size of this gap. When the ground state is degenerate (Fig. 12) the picture is qualitatively different: as the transverse field is turned off, a few excited states go through a cascade of avoided crossings, eventually ending in the ground state manifold. In this case, knowledge of the minimum gap between the ground state and the first excited state is not enough to understand the scaling of the annealing time: we should know how many states end up in the ground state manifold, and what avoided crossings they have. Typically, these states start high in the spectrum, where there are many crossings, and so where entropy is important. These instances are not naturally described by the picture of trapped states which tunnel to the solution (like a single level crossing), and we cannot avoid noticing the similarity with the phenomenon of entropic barriers, in the sense that only a "lucky" sequence of flips from one state to the other can lead to the ground state manifold. And this, too, occurs with exponentially small probability.
Last, we must notice that the threshold energy for this problem is e_marg ≈ 0.02, so a more quantitative understanding of the difference, in the context of Quantum Annealing, between excited states below and above the threshold energy requires studying instances with N ≈ 100 or larger. For the small sizes we can currently study via exact integration of the Schrödinger equation, the ground state degeneracy seems the best indicator for identifying hard instances.
For rCSP problems, some algorithms encounter entropic barriers rather than energetic barriers. This happens in particular when an algorithm works at energies higher than the threshold energies of the given problem and finds the ground state through a rare fluctuation which picks the right sequence of O(N) spin flips among exponentially many. We have shown this explicitly for a family of almost-greedy algorithms which includes WalkSAT. These algorithms evolve in a time t = O(N) to their equilibrium state (this part of the evolution can be followed by a system of coupled differential equations), and then must benefit from a rare stochastic fluctuation to find the ground state. In the case of WalkSAT we found the fluctuation rate (and hence the time to find a solution) with a simple Brownian motion analysis of the algorithm and the corresponding Fokker-Planck equation. This was possible precisely because the algorithm's equilibrium state is at high energy, so the region where strong correlations exist between the variables is never explored.
We have also introduced a new class of quasi-greedy algorithms which are not able to jump over large energy barriers. We have shown that algorithms in this class are very effective both in reaching the threshold energy in times O(N) and in reaching the ground state through rare fluctuations in times O(exp(µN)), with an optimal value of the exponent µ. Given that these algorithms cannot jump over large energy barriers, they provide direct evidence that the most effective search for solutions in hard XORSAT problems is limited mainly by entropic barriers, as is likely to happen in many other hard optimization problems.
Very recently, an algorithm from this class of quasi-greedy searches has been used in a challenge whose participants were asked to find the ground state of a 3-regular 3-XORSAT problem [61]. At present, a highly optimized version of this quasi-greedy algorithm, running on Nvidia VT100 GPUs, is leading the challenge [62], supporting our claim that the quasi-greedy algorithms introduced here are very effective, the real barrier being entropic in nature.
We have shown, by direct integration of the time-dependent Schrödinger equation, that a similar situation is encountered by quantum annealing algorithms. If one requires finding the solution of the problem with O(1) probability as N grows, the same growth rate of the solution time is found for problems with one solution (for which one can apply the adiabatic theorem connecting the gap with the time, as in previous studies [63]) and for those with many solutions, for which most of the action occurs at finite energy density, where entropy dominates. Here, in particular, it is not difficult to draw a parallel with the difficulties attributed to localization phenomena [20], where many small avoided crossings occur between states which are O(N) spin flips apart.
We therefore conclude that, in situations in which algorithms work in regions of phase space where trapping in local minima is not a real problem, the true obstacle is entropic barriers, which take upon themselves the task of making the solution time exponential. The fact that two phenomena apparently so different from each other can trade places in enforcing the exponential solution times expected if P ≠ NP is a fascinating and, we believe, not widely appreciated aspect of complexity theory. That quantum algorithms might suffer from a similar trade-off is, if possible, even more surprising.

This work was supported by the European Research Council under the European Union's Horizon 2020 research and innovation program (Grant No. 694925-Lotglassy). We warmly thank Guilhem Semerjian for the careful reading and suggestions.

Appendix A: Blocked configurations of the greedy dynamics

The greedy version of the algorithm we have introduced has many possible stopping configurations. Indeed, any configuration where each variable participates in at most one violated interaction is a local minimum of the energy function and can thus block the greedy dynamics. We call these configurations blocked because, viewing the dynamics as the evolution of energy defects that can only annihilate in pairs, in a blocked configuration the energy defects cannot evolve, since they are isolated.
We start from the observation that all greedy dynamics tend to a threshold energy around 0.11; more precisely in Fig. 13 we plot the energy of the stopping configurations for several greedy dynamics as a function of the ratio w 3 /w 2 (the only degree of freedom of the algorithm once we fix w 0 = w 1 = 0).
It is thus natural to ask whether it is possible to relate the stopping energy of the greedy dynamics to the entropy of blocked configurations. In order to compute the latter we use the replica symmetric cavity method, that is the Bethe approximation.
We consider the dual lattice of a 3-regular random 3-hypergraph, which is again a 3-regular random 3-hypergraph, where the variables are now the constraints: we assign variables n_i ∈ {0, 1} indicating whether a constraint is satisfied (n_i = 0) or not (n_i = 1). The interaction among triplets of constraints (those sharing a variable in the original model) forbids any configuration with more than one violated constraint per triplet (n_1 + n_2 + n_3 ≤ 1). The energy of a spin configuration now corresponds to the density ρ of variables with n_i = 1. Under the Bethe approximation the joint probability distribution can be factorized as follows:

P({n_i}) = Π_i p_1(n_i)^{−2} Π_{(ijk)} p_3(n_i, n_j, n_k) ,

where the second product is over the N randomly chosen triplets forming the 3-regular 3-hypergraph. Given that the random hypergraph is regular, we can assume the one-particle and three-particle marginal probabilities, p_1(n_i) and p_3(n_i, n_j, n_k), to be site independent. These can be written explicitly in terms of the energy ρ as

p_1(1) = 1 − p_1(0) = ρ ,  p_3(0,0,0) = 1 − 3ρ ,  p_3(1,0,0) = p_3(0,1,0) = p_3(0,0,1) = ρ ,

so that the entropy density of blocked configurations can be written as

S_0(ρ) = −(1 − 3ρ) log(1 − 3ρ) − ρ log ρ + 2(1 − ρ) log(1 − ρ) ,

which is plotted in Fig. 14. S_0 is non-negative for ρ ≤ ρ* = 0.315742 and has a maximum at ρ = 0.170209. The threshold energy for the greedy algorithms we have studied lies in the range [0, ρ*], but we cannot estimate it from S_0 alone, because each blocked configuration has a basin of attraction whose size matters as much as the entropy of blocked configurations.
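The quoted values can be checked numerically from the entropy density of blocked configurations; the closed form below is our reconstruction from the Bethe marginals, S_0 = s_3 − 2 s_1:

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

def S0(rho):
    """Replica-symmetric (Bethe) entropy density of blocked configurations,
    S0 = s3 - 2*s1, with p1(1) = rho and the triplet marginal supported on
    the four states allowed by n1 + n2 + n3 <= 1 (our reconstruction)."""
    return (-(1 - 3 * rho) * np.log(1 - 3 * rho)
            - rho * np.log(rho)
            + 2 * (1 - rho) * np.log(1 - rho))

# where S0 vanishes again (~0.315742) and where it is maximal (~0.170209)
rho_star = brentq(S0, 0.25, 1 / 3 - 1e-9)
rho_max = minimize_scalar(lambda r: -S0(r), bounds=(0.01, 0.3),
                          method="bounded").x
```

Both characteristic densities quoted in the text come out of this single closed form, which supports the reconstruction.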
Given the important role of the basins of attraction in predicting the large-time limit of the relaxation dynamics, we have measured their sizes in problems of small size, N ≤ 30, where an exact enumeration can be performed. For each blocked configuration we have measured the size of the basin of attraction as the number of initial conditions from which the greedy dynamics flows to that blocked configuration. These sizes are in general exponentially large in N, and we thus define b = log(size of basin)/N.
We report in Fig. 15 the results for b as a function of the energy of the blocked configuration for different sizes. The data have been averaged over different samples and different blocked configurations at fixed energy. It is remarkable that even for such small sizes the data show a rather weak size dependence and are thus reliable.
The dotted line in the figure is a guide for the eye, meant to show that the observed b(e) is not far from a linear behavior up to the energy of the most numerous configurations, e_max = 1/2. According to this linear behavior, the log-size b of a basin of attraction grows linearly with the depth of the energy minimum (the blocked configuration). The simplest picture compatible with these data is the one schematically represented in Fig. 16, where each energy minimum corresponds to a pit whose edge is close to e_max, such that the size of the pit grows exponentially with its depth, that is, with the distance from the edge. According to this simplified picture, every pit is in principle accessible from the initial configuration (which typically has energy e_max). However, the pits leading to a solution correspond to a tiny minority of the configurations at e_max, and it is thus extremely unlikely that the dynamics enters one of them (this is essentially the origin of the entropic barrier).
Combining the number of energy minima, exp(N S_0), with their size, exp(N b), we can compute the large deviation rate of finding one of the energy minima as a function of their energy. This is shown in Fig. 17, where indeed the maximum is achieved at e ≈ 0.11 (marked by a vertical dotted line), which is the typical energy reached by the greedy algorithm in the large N limit. For finite values of N these data also provide the exponential rate to find a solution via the greedy algorithm: this is just the probability of randomly choosing one of the pits with its bottom at e = 0, and corresponds to µ = max_e (S_0 + b) − (S_0 + b)|_{e=0} ≈ 0.2.
Appendix B: An optimal temperature scheduling for Parallel Tempering

The main problem in setting up an optimized PT is the choice of the temperature set. In the present model, however, we are in a lucky situation, because we know that 3-regular random 3-XORSAT has no thermodynamic phase transition: after convergence, the energy sampled by PT must be that of the paramagnetic solution, e(β) = [1 − tanh(β/2)]/2. In the large N limit we can assume that the extensive energy at inverse temperature β is a Gaussian variable with mean E(β) = N e(β) and variance σ²(β) = −N e′(β). This Gaussianity assumption (which is rather well satisfied, except in the vicinity of the ground state) allows us to compute the probability of swapping two replicas at temperatures β_1 and β_2, which has to be a function of the ratio of the energy difference to the energy fluctuation sqrt(σ²(β_1) + σ²(β_2)).
In the large N limit such a ratio can be written as ∆β sqrt(−N e′(β)). The best way to let replicas wander quickly between temperatures is to fix a constant swap probability p_swap between any pair of successive temperatures, that is, to use temperature intervals set by ∆β sqrt(−N e′(β)) = r, with r = f⁻¹(p_swap), which implies in the large N limit the following recursive equation for the temperatures of the optimized Parallel Tempering:

β_{n+1} = β_n + 2r cosh(β_n/2)/√N .

The solution of this recursive equation converges in the large N limit to the following set of temperatures:

β_n = 2 arcsinh[ tan(rn/√N) ] ,   (B1)

with the index n running up to √N π/(2r).
The optimal value of r is the one minimizing the mean traveling time between the extremal temperatures, which is proportional to [r² f(r)]⁻¹. The minimum is achieved at r_opt ≈ 1.68, corresponding to an optimal swapping rate f(r_opt) ≈ 0.23 (this result is known as the 0.23 rule). We ran all our PT simulations with the temperature set defined in Eq. (B1) with r = r_opt.
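The schedule of Eq. (B1) is straightforward to generate; the check below, that consecutive temperatures keep the swap-controlling ratio ∆β sqrt(−N e′(β)) approximately constant, follows our reading of the factor conventions (using e′(β) = −sech²(β/2)/4 for the paramagnetic energy):

```python
import numpy as np

def pt_temperatures(N, r=1.68):
    """Temperature set of Eq. (B1): beta_n = 2 arcsinh(tan(r n / sqrt(N))),
    with the index n running up to sqrt(N) * pi / (2 r)."""
    n = np.arange(int(np.sqrt(N) * np.pi / (2 * r)))
    return 2.0 * np.arcsinh(np.tan(r * n / np.sqrt(N)))

# consistency check: consecutive temperatures should keep the ratio
# delta_beta * sqrt(-N e'(beta)) close to r, with e'(beta) = -sech^2(beta/2)/4
N = 10000
betas = pt_temperatures(N)
mid = 0.5 * (betas[1:] + betas[:-1])
ratio = np.diff(betas) * np.sqrt(N / 4.0) / np.cosh(mid / 2.0)
```

The temperature set starts at infinite temperature (β_0 = 0), grows monotonically, and becomes sparser at low temperature exactly so as to keep the swap acceptance constant.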