Limitations of Variational Quantum Algorithms: A Quantum Optimal Transport Approach

The impressive progress in quantum hardware in recent years has raised the interest of the quantum computing community in harvesting the computational power of such devices. However, in the absence of error correction, these devices can only reliably implement very shallow circuits, or comparatively deeper circuits at the expense of a nontrivial density of errors. In this work, we obtain extremely tight limitation bounds for standard noisy intermediate-scale quantum proposals in both the noisy and noiseless regimes, with or without error-mitigation tools. The bounds limit the performance of both circuit model algorithms, such as the quantum approximate optimization algorithm, and continuous-time algorithms, such as quantum annealing. In the noisy regime with local depolarizing noise $p$, we prove that at depths $L = O(p^{-1})$ it is exponentially unlikely that the outcome of a noisy quantum circuit outperforms efficient classical algorithms for combinatorial optimization problems like Max-Cut. Although previous results already showed that classical algorithms outperform noisy quantum circuits at constant depth, these results only held for the expectation value of the output. Our results are based on newly developed quantum entropic and concentration inequalities, which constitute a homogeneous toolkit of theoretical methods from the quantum theory of optimal mass transport whose potential usefulness goes beyond the study of variational quantum algorithms.


I. INTRODUCTION
Recent years have seen remarkable progress in both the size and quality of available quantum devices, reaching the point at which even the best classical computers cannot easily simulate them [4,27,73,83]. In spite of these achievements, current devices lack error correction and, thus, are inherently noisy. Considering the significant overheads required to implement error correction [16,70], this has raised the quantum computing community's interest in investigating whether such noisy quantum devices can nevertheless outperform classical computers at tasks of practical interest [69].
One class of algorithms considered suited for this task is variational quantum algorithms [11,21]. In most cases, these hybrid quantum-classical algorithms work by optimizing the parameters of a shallow quantum circuit to minimize a cost function [11,21]. Prominent examples of such algorithms include the variational quantum eigensolver (VQE) [68] and the quantum approximate optimization algorithm (QAOA) [32]. As variational algorithms only require the implementation of shallow circuits and simple measurements, it was expected that they could unlock the computational potential of near-term devices.
However, recent results have highlighted several obstacles to achieving a practical quantum advantage through variational quantum algorithms. For instance, some works have shown that optimizing the parameters of the circuit is computationally expensive in various settings [1,12,57,81]. Other works have shown that constant-depth quantum circuits cannot outperform classical algorithms for certain combinatorial optimization problems [15,22,31,59]. Furthermore, it has been observed [34,40,81] that such variational quantum algorithms are less robust to noise than previously expected: already a small density of errors is sufficient to ensure that classical algorithms outperform the noisy device.
In this article, we further investigate the limitations of variational quantum algorithms. Our contributions are two-fold. First, we obtain extremely tight limitation bounds for standard NISQ proposals in both the noisy and noiseless regimes, with or without error mitigation tools. Second, we provide a new homogeneous toolkit of theoretical methods whose potential usefulness goes beyond the present topic of variational quantum algorithms. Our methods originate from the emerging field of quantum optimal transport [18-20, 24, 25, 36, 37, 65, 66, 71]. As we will see, optimal transport techniques have the combined advantages of simultaneously simplifying, unifying, and qualitatively refining previously known statements regarding fundamental properties of the output state of shallow and noisy circuits.

Limitations of noisy variational quantum algorithms
More precisely, we obtain two new complementary sets of results providing a better understanding of the limitations of variational quantum algorithms both at very shallow depths, when the effect of noise is negligible, and for a small density of errors. In section III, we first derive new properties for the output probability of (potentially noisy) shallow quantum circuits initiated in the state $|0\rangle^{\otimes n}$ and measured in the computational basis. These findings directly improve upon celebrated recent results on the limitation of certain variational quantum algorithms to solve the Max-Cut problem for certain classes of bipartite $D$-regular graphs. We prove that QAOA requires a depth $L$ at least logarithmic in the system size to outperform efficient classical algorithms in some instances [15,28,31,59]: We notice that our bound in (1) exponentially improves upon the dependence on the degree $D$ of the graph previously found in [15]. For instance, for $D = 55$ (the minimum value for which Ref. [15] can prove that shallow quantum circuits cannot outperform the classical algorithm by Goemans and Williamson), our bound implies that the QAOA requires a depth larger than 1 as soon as $n \gtrsim 10^{6}$, whereas Ref. [15] gave $n \gtrsim 10^{54}$.
Next, section IV is concerned with the concentration profile of the output measure of noisy circuits at any depth L = Ω(1) for simple noise models, e.g. layers of circuits interspersed by layers of one-qubit depolarizing noise of parameter $p$. For instance, we are able to prove that, with a realistic depolarizing probability $p = 0.1$ applied independently to each qubit, the number of vertices of the graph has to be smaller than $10^{9}$ in order for the noisy algorithm to outperform the best known classical algorithm (see Theorem VI.2). Moreover, we prove that at depths $L = O(p^{-1})$ it is exponentially unlikely that the outcome of a noisy quantum circuit outperforms efficient classical algorithms for combinatorial optimization problems like Max-Cut. Although previous results already showed that noisy quantum circuits are outperformed by classical algorithms at constant depth [34], those results only held for the expectation value of the output. In contrast, our methods imply that the probability of observing a single string with better energy than the one output by an efficient classical algorithm is exponentially small in the number of qubits. This is a significantly stronger statement, although it comes at the cost of slightly worse constants than in [34].
In addition, in section V, we show that certain error mitigation protocols cannot reverse our conclusions unless we allow for a number of samples that is exponential in the number of qubits. First, in section V A we show that virtual distillation or cooling protocols [42,47] only have an exponentially small success probability at constant depth. Furthermore, for mitigation procedures that have as their goal to estimate expectation values of observables, we show stringent limitations at O(log(n)) depth in section V B. At this depth, any error mitigation procedure that takes as input m = poly(n) copies of the output of a noisy quantum circuit is exponentially unlikely to yield an estimate that deviates significantly from the estimate we would obtain by providing m copies of a trivial product state as input. Thus, the copies of the noisy quantum circuit do not provide significantly more insight than sampling from trivial product states. Our results strengthen recent results on limitations of error mitigation [74,80] both in terms of the required depth for them to apply and by providing concentration inequalities instead of results in expectation.

Quantum optimal transport toolkit
The second main contribution of the present article is the development of a new set of simple methods from quantum optimal transport whose potential use is likely to exceed the problem of finding tighter limitations on variational quantum algorithms. Our first main tool leading to the results of section III is an optimal transport inequality introduced by Milman [58] in his study of the concentration and isoperimetric profile of probability measures on Riemannian manifolds with positive curvature (see also [44,50,52] for discussions on some related optimal transport inequalities). Adapted to the present setting of $n$-bit strings $\{0,1\}^n$ endowed with the Hamming distance $d_H(x, y) := \sum_{i=1}^{n} |x_i - y_i|$, Milman's so-called (2, ∞)-Poincaré inequality is a property of a probability measure µ on the set $\{0,1\}^n$ which asks for the existence of a constant C > 0 such that, for any function where denotes the Lipschitz constant of f with respect to the Hamming distance. Besides its natural application to bounding the probability that the function f deviates from its mean by means of Chebyshev's inequality, namely the (2, ∞)-Poincaré inequality further implies by duality the following symmetric concentration inequality: for any two sets For instance, we prove in Proposition III.2 that in the case of a noiseless circuit, the output measure $\mu_{\mathrm{out}}$ satisfies the (2, ∞)-Poincaré inequality with constant $C = B^2$, where $B$ denotes the light-cone of the circuit, i.e. the maximal number of output qubits influenced by the value of an arbitrary input qubit through the application of the circuit. In that case, the resulting symmetric concentration inequality (4) quadratically improves over the one previously derived in [28, Corollary 43]: Moreover, the (2, ∞)-Poincaré inequality turns out to be a very simple and versatile tool as compared to the nontrivial proof of [28, Corollary 43], which required the use of Chebyshev polynomials and approximate projections. It can also be easily adapted to noisy shallow quantum circuits and continuous-time local Hamiltonian evolutions. In this latter setting, it unifies and refines the main results of [59].
The tools described in the previous paragraph are adapted to the study of quantum circuits of depth L = O(log(n)) and related short-time continuous-time evolutions. In contrast, our second set of fundamental results in section IV concerns the concentration profile of the output measure of noisy circuits at any depth L = Ω(1) for simple noise models, e.g. layers of circuits interspersed by layers of one-qubit depolarizing noise of parameter $p$. In this case, we appeal to recently developed tools such as contraction coefficients for sandwiched Rényi divergences [62,82] in order to prove that the probability under the output measure $\mu_{\mathrm{out}}$ of the circuit that an arbitrary $n$-bit function $f : \{0,1\}^n \to \mathbb{R}$ deviates from its mean by a constant fraction $an$ of the total number of qubits satisfies the sub-Gaussian property: for some constants K, c > 0 and $a \ge a_0 \ge 0$.
Interestingly, such strong concentration inequalities are known to be equivalent to a strengthening of the (2, ∞)-Poincaré inequality (2) known as the transportation-cost inequality [13,55]. The latter states that for any measure ν that is absolutely continuous with respect to µ, where $D(\nu\|\mu)$ denotes the relative entropy of ν with respect to µ, whereas is the Wasserstein distance of order 1 between ν and µ, also called Monge-Kantorovich distance or earth mover's distance. In summary, our results clearly illustrate the potential of optimal transport methods such as the (2, ∞)-Poincaré and the stronger transportation-cost inequality to study the performance of variational algorithms. We also believe that the discussed methods can have broad applications beyond that of understanding the computational power and limitations of near-term quantum devices.
Indeed, variations of inequalities like (6) and (2) have recently found applications in different areas of quantum information theory. For instance, in [72] they were used to obtain exponential improvements for the sample complexity in quantum tomography. In [25] they were used to derive concentration bounds for commuting Gibbs states and show a strong version of the eigenstate thermalization hypothesis, a topic of intense research in physics. Thus, we believe that the new inequalities and techniques developed here could pave the way to extending such results to larger classes of states.
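The two classical ingredients used above, the Hamming distance and the Lipschitz constant of an $n$-bit function with respect to it, can be made concrete with a minimal Python sketch. It assumes the standard definition of the Lipschitz constant as the largest ratio $|f(x) - f(y)| / d_H(x, y)$ and evaluates it by brute force; the helper names and the 4-cycle graph are hypothetical choices made only for illustration.

```python
from itertools import product


def hamming_distance(x, y):
    """d_H(x, y) = sum_i |x_i - y_i| for two bit strings of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))


def lipschitz_constant(f, n):
    """Largest ratio |f(x) - f(y)| / d_H(x, y) over all pairs x != y.

    Brute force over all 2^n strings, so only suitable for small n.
    """
    strings = list(product((0, 1), repeat=n))
    return max(
        abs(f(x) - f(y)) / hamming_distance(x, y)
        for x in strings
        for y in strings
        if x != y
    )


def cut_value(edges):
    """Max-Cut objective: x -> number of edges cut by the bipartition x."""
    return lambda x: sum(1 for (u, v) in edges if x[u] != x[v])


# Hypothetical instance: a 4-cycle, which is bipartite and 2-regular.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
f = cut_value(edges)

print(hamming_distance((0, 1, 1, 0), (1, 1, 0, 0)))
print(lipschitz_constant(f, 4))
```

In this instance flipping a single bit changes the cut value by at most the degree of the corresponding vertex, which is why the Lipschitz constant of the Max-Cut objective on a $D$-regular graph cannot exceed $D$.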

II. NOTATIONS AND DEFINITIONS
In this section, we introduce the main concepts discussed in the remainder of the paper. We also refer the reader to section A for a complete list of notations.

A. Basic notions
Given a set $V$ of $|V| = n$ qudits, we denote by $H_V = \bigotimes_{v \in V} \mathbb{C}^d$ the Hilbert space of $n$ qudits and by $B(H_V)$ the algebra of linear operators on $H_V$. $O_V$ corresponds to the self-adjoint linear operators on $H_V$, whereas $O_V^T \subset O_V$ is the subspace of traceless self-adjoint linear operators. $O_V^+$ denotes the subset of positive semidefinite linear operators on $H_V$ and $S_V \subset O_V^+$ denotes the set of quantum states. Similarly, we denote by $P_V$ the set of probability measures on $[d]^V$. For any subset $A \subseteq V$, we use the standard notations $O_A$, $S_A$, ... for the corresponding objects defined on subsystem $A$. Given a state $\rho \in S_V$, we denote by $\rho_A$ its marginal on subsystem $A$. For any region $A \subset V$, the identity on $O_A$ is denoted by $I_A$, or more simply $I$. Given an observable $O$, we define $\langle O \rangle_\sigma = \mathrm{tr}[\sigma O]$. We denote the probability of measuring an eigenvalue of $O$ greater than $a \in \mathbb{R}$ in the state $\sigma$ as $P_\sigma(O \ge a)$. Given two probability measures µ, ν over a common measurable space, $\mu \ll \nu$ means that µ is absolutely continuous with respect to ν and $\frac{d\mu}{d\nu}$ denotes the corresponding Radon-Nikodym derivative.

B. Wasserstein distance
We will make extensive use of notions of quantum optimal transport. The Lipschitz constant of the self-adjoint linear operator $H \in O_V$ is defined as [65, Section V]: where the infimum above is taken over operators It is worth mentioning that the latter are considered in the fundamental problem in quantum statistical mechanics regarding the equivalence between the micro-canonical and canonical ensembles [14,25,48,75].
The quantum $W_1$ distance proposed in Ref. [65] admits a dual formulation in terms of the above quantum generalization of the Lipschitz constant: the quantum $W_1$ distance between the states $\rho, \omega \in S_V$ is expressed as [65, Section V]: Whereas the trace distance measures the global distinguishability of states, the Wasserstein distance measures distinguishability w.r.t. extensive, quasi-local observables.

C. Local quantum channels
In this work, we consider evolutions provided with a local description.
Definition II.1. A (noisy) quantum circuit $N_V$ on $n$ qudits of depth $L$ is a product of $L$ layers $N_1, \dots, N_L$, where each layer $N_\ell$ can be written as a tensor product of quantum channels $N_{\ell,e}$ acting on a set $e \subset V$ of vertices: for some sets $E_\ell$ of disjoint subsets of vertices. The circuit is called unitary (or noiseless) whenever each of the channels $N_{\ell,e}$ is unitary. We call the set the architecture of the circuit $N_V$ and denote $E = \cup_\ell E_\ell$.
A key concept associated with the notion of a local evolution is that of a light-cone. In the case of a quantum circuit, the light-cone of a vertex $v \in V$ is the smallest set of vertices We then denote the light-cone of the circuit by $I_{N_V} := \max_{v \in V} |I_v|$. In section III B, we extend this notion to the case of a continuous-time Hamiltonian evolution, where light-cones are defined thanks to Lieb-Robinson bounds [51].
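The size of the light-cone of a circuit depends only on its architecture, and a short Python sketch can make this tangible. The sketch below assumes the forward-propagation reading of the light-cone (the set of output qubits that can be influenced by one input qubit, as described in the introduction) and uses a hypothetical one-dimensional brickwork architecture; the function names are illustrative and not taken from the paper.

```python
def light_cone(vertex, layers):
    """Set of vertices that `vertex` can influence when the layers are applied in order.

    `layers` is a list of layers; each layer is a list of disjoint tuples of
    vertices, one tuple per gate (assumed forward light-cone convention).
    """
    cone = {vertex}
    for layer in layers:
        touched = set()
        for gate in layer:
            # A gate spreads the influence to all its vertices if it overlaps the cone.
            if cone & set(gate):
                touched |= set(gate)
        cone |= touched
    return cone


def brickwork(n, depth):
    """Hypothetical 1D architecture: two-qubit gates on alternating neighbouring pairs."""
    return [
        [(i, i + 1) for i in range(layer % 2, n - 1, 2)]
        for layer in range(depth)
    ]


def circuit_light_cone(n, layers):
    """Maximum light-cone size over all vertices."""
    return max(len(light_cone(v, layers)) for v in range(n))


for depth in (1, 2, 3):
    print(depth, circuit_light_cone(8, brickwork(8, depth)))
```

For this particular geometry the light-cone grows only linearly with the depth; architectures with all-to-all connectivity can instead double the light-cone with every layer.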

III. CONCENTRATION AT THE OUTPUT OF SHORT-TIME EVOLUTIONS
In this section, we obtain concentration inequalities for the outputs of short-time evolutions. Our main tool is an inequality between the variance and the Lipschitz constant of an observable. For any $O \in B(H_V)$ and $\omega \in S_V$, the variance of $O$ in the state $\omega$ is defined as We denote the KMS inner product associated to the state $\sigma$ as $\langle A, B \rangle_\sigma := \mathrm{tr}[\sigma^{1/2} A^\dagger \sigma^{1/2} B]$, and its corresponding norm as $\|H\|_\sigma$. We have for any $H \in O_V$ that [77, Equation (20)]: With a slight abuse of notation, we will use the same terminology for the analogous functionals for classical probability distributions.
In analogy with the classical literature [58], we say that a state For instance, tensor product states $\rho \equiv \bigotimes_{v \in V} \rho_v$ satisfy the (2, ∞)-Poincaré inequality with constant C = 1 (see section F). The main motivation for introducing these inequalities is the following direct consequences of the (2, ∞)-Poincaré inequality. We leave their proof to section E.
Theorem III.1. Assume that the state σ satisfies a (2, ∞)-Poincaré inequality of constant C > 0. Then,
1. Non-commutative transport-variance inequality: for any two states $\rho_1, \rho_2 \in S_V$ with corresponding densities
2. Measured transport-variance inequality: denote by $\mu_\sigma \in P_V$ the probability measure induced by the measurement of σ in the computational basis. Then, for any probability measure $\nu \ll \mu_\sigma$, Moreover, for any two sets $A, B \subset [d]^V$, their Hamming distance $d_H(A, B)$ satisfies the following symmetric concentration inequality:
3. Concentration of observables: for any observable $O \in O_V$ and r > 0,
Note that the Wasserstein distance and the Lipschitz constant are invariant under product unitaries. Thus, the same results hold for measuring the state in any product basis, not necessarily only the computational basis.
Although it might not be obvious from the outset, the inequalities in item 2 are known to imply no-go results for outputs of shallow quantum circuits [15,28,59]. Consider for example the output distribution µ we obtain when measuring the GHZ state in the computational basis, i.e. the all zeros or the all ones string. If we take A to contain the all zeros string and B to contain the all ones string, we clearly have µ(A) = µ(B) = 0.5 and $d_H(A, B) = n$. Thus, the GHZ state does not satisfy a (2, ∞)-Poincaré inequality with C = O(1).
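The GHZ example can be checked numerically with a small Python sketch. It compares the measurement distribution of the GHZ state with the uniform product distribution through the variance of the Hamming weight, a function that changes by exactly 1 under a single bit flip and therefore has Lipschitz constant 1 with respect to the Hamming distance. The choice of the Hamming weight as test function and the helper names are assumptions made for illustration.

```python
from itertools import product


def hamming_weight_variance(mu):
    """Variance of the Hamming weight under mu, a dict {bit string: probability}."""
    mean = sum(p * sum(x) for x, p in mu.items())
    return sum(p * (sum(x) - mean) ** 2 for x, p in mu.items())


def set_distance(A, B):
    """d_H(A, B): smallest Hamming distance between a string in A and a string in B."""
    return min(sum(a != b for a, b in zip(x, y)) for x in A for y in B)


n = 6
zeros, ones = (0,) * n, (1,) * n

# Outcome distribution of the GHZ state measured in the computational basis.
ghz = {zeros: 0.5, ones: 0.5}
# Outcome distribution of a product state: the uniform distribution on n bits.
uniform = {x: 2.0 ** -n for x in product((0, 1), repeat=n)}

print(set_distance({zeros}, {ones}))      # distance between the two sets A and B
print(hamming_weight_variance(ghz))       # equals n^2 / 4
print(hamming_weight_variance(uniform))   # equals n / 4
```

The variance grows quadratically in $n$ for the GHZ distribution but only linearly for the product distribution, which is the kind of gap between a Lipschitz observable and its fluctuations that the argument above exploits.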

A. Poincaré inequalities at the output of noisy circuits
We will now bound the constant C in various settings. It turns out that noisy shallow circuits satisfy a (2, ∞)-Poincaré inequality:
Proposition III.1. For any tensor product input state ρ, the output $N_V(\rho)$ satisfies a (2, ∞)-Poincaré inequality with constant where, given a set $e \in E$ and $m \in \mathbb{N}$, $I(e, m)$ denotes the set of all vertices in $V$ in the light-cone of the set $e$ for the circuit constituted of the last $m$ layers of $N_V$ (here $m = L - \ell$).
The proof of this proposition is left to section F. When $N_V \equiv U_V$ is noiseless, we get the following tightening of Proposition III.1:
Proposition III.2. For any tensor product input state ρ, the output $U_V(\rho)$ satisfies a (2, ∞)-Poincaré inequality with constant Note that for any circuit the light-cone can grow at most exponentially in $L$.

B. Poincaré inequality for continuous-time quantum processes
We now consider the continuous-time setting, and restrict ourselves to a system whose interactions are modeled by a graph G = (V, E) whose vertices V correspond to a system of |V| = n qudits, and denote by $D := \max_{v \in V} |\{v' \mid (v, v') \in E\}|$ the maximum number of nearest neighbours of a vertex. In the (noiseless) continuous-time setting, one replaces the notion of a circuit by that of a local time-dependent Hamiltonian evolution:
Definition III.1. A (noiseless) continuous-time local quantum process is a unitary evolution $\{U_V(t)\}_{t \ge 0}$ generated by the time-dependent Hamiltonian $H_V(t) := \sum_{e \in E} \alpha_e(t)\, H_e$, where $H_e$ is a time-independent self-adjoint operator that acts non-trivially only on the edge $e \in E$ with norm $\|H_e\|_\infty \le \frac{1}{2}$. We also assume that $b := \sup_{t,e} |\alpha_e(t)| < \infty$ independently of the size of the system. In what follows, for any subregion $A \subset V$, we also denote the Hamiltonian restricted to $A$ by $H_A(t) := \sum_{e \subset A} \alpha_e(t)\, H_e$, and its corresponding unitary evolution by $\{U_A(t)\}_{t \ge 0}$.
For continuous-time unitary evolutions, the concept of light-cone is formalised by the existence of a Lieb-Robinson bound [51]. Since their introduction, Lieb-Robinson bounds have been extensively studied in various levels of generality for unitary [63] as well as dissipative Markovian evolutions [6]. In what follows, we define a distance $\mathrm{dist} : E \times E \to \mathbb{R}_+$ on the edge set $E$ which for any two edges $e = (v_1, v_2)$ and $e' = (v_1', v_2')$ takes the value $\mathrm{dist}(e, e') = 0$ if and only if $e = e'$, and otherwise is equal to the length of the shortest path connecting the sets of vertices $\{v_1, v_2\}$ and $\{v_1', v_2'\}$. Next, we denote by $S_e(k)$ the sphere around any edge $e \in E$ of radius $k$, i.e. $S_e(k) := \{e' \in E : \mathrm{dist}(e, e') = k\}$.
Then, the set $E$ is said to be of spatial dimension δ if there is a constant M > 0 such that for all $e \in E$, The following result is taken from [46, Theorem 2] (see also [59, Theorem 1] for a similar result).
Next, we order the vertices $\{1, \dots, n\}$, $n = |V|$, by their graph distance to an arbitrarily chosen vertex $v_0 \equiv 1$, and denote the graph distance $\mathrm{dist}(\{1\}, \{i, \dots, n\}) \equiv d(i)$. Then, in the notations of section II, we have that for any $H \in O_V$ (see section G), where $i_0$ stands for the first vertex such that $d(i_0) \ge 2\delta - 1$. By a reasoning that is identical to that leading to Proposition III.1, we have
Proposition III.3. Let ρ be a product input state. For any t ≥ 0, the output state $U_V(t)(\rho)$ satisfies a (2, ∞)-Poincaré inequality with constant For a simpler version of the bound found in Proposition III.3, we refer the reader to Proposition VI.1.
The bounds obtained in Proposition III.1, Proposition III.2 and Proposition III.3 can be combined with Theorem III.1 (3) to get Chebyshev-type concentration bounds. This improves for instance over [59, Theorem 2], where the concentration bound was obtained only in the continuous-time Hamiltonian setting and for a specific 1-local observable measuring the Hamming weight. Moreover, the bound obtained in Equation (12) on the Hamming distance between two sets in terms of their probabilities in the state σ is an improvement over the symmetric concentration inequality found in [28, Corollary 43], namely as well as its continuous-time analogue in [59, Theorem 3]. In summary, the (2, ∞)-Poincaré inequality is a versatile tool that we use to derive the strongest concentration-type bounds for general short-time quantum evolutions currently available in a simple, basis-free manner.

IV. LIMITATIONS AND CONCENTRATION INEQUALITIES FROM NOISE
In section III we discussed how to use optimal transport methods to analyse the concentration profile of quantum circuits at small depths, even in the absence of noise. We now turn our attention to the case where the circuit is also subject to local noise and prove concentration inequalities for their outputs. As in the noiseless case, these can then be used to estimate the potential of noisy quantum circuits to outperform classical algorithms. However, unlike in Theorem III.1, we here obtain stronger Gaussian concentration inequalities.
For this, we make use of the sandwiched Rényi divergences [62,82] of order α ∈ (1, +∞). For two states ρ, σ such that the support of ρ is included in the support of σ they are defined as $D_\alpha(\rho\|\sigma) := \frac{1}{\alpha - 1} \log \mathrm{tr}\big[\big(\sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}}\big)^{\alpha}\big]$. We also consider the relative entropy we obtain by taking the limit α → 1, $D(\rho\|\sigma) := \mathrm{tr}[\rho(\log \rho - \log \sigma)]$. In case the support of ρ is not contained in that of σ, all the divergences above are defined to be +∞. We start from the assumption that the noise is driving the system to a quantum state σ on $H_V$ that satisfies a Gaussian concentration inequality of parameter c > 0. That is, there is a constant K such that for any a > 0 and observable O: where the quantum Lipschitz constant of a non-self-adjoint matrix Z is defined as Note that inequalities of the form (19) hold for product states [9,65,71], commuting high-temperature Gibbs states [25,84] and in slightly weaker form for all high-temperature Gibbs states [49] and gapped ground states on regular lattices [2]. Moreover, in the case where σ and O commute, we clearly have $\sigma^{-1/2} O \sigma^{1/2} = O$. We then have the following concentration result, proved in Lemma B.1 of the Supplemental Material:
Theorem IV.1. Let σ satisfy Eq. (19). Then for any state ρ and a > 0 and α > 1 we have: It immediately follows that if we have that for a noisy circuit and a value of a: then the probability of observing an outcome outside of the interval $\langle O \rangle_\sigma \pm a|V|$ when measuring $N_V(\rho)$ is exponentially small in $|V|$. Thus, given a bound on $D_\alpha(N_V(\rho)\|\sigma)$, we can solve for a in Equation (21) and establish a such that the probability of observing outcomes outside of $\langle O \rangle_\sigma \pm a|V|$ is exponentially small. In section VI we discuss this more concretely to analyse the potential performance of QAOA under noise.
For now, let us discuss how to obtain the bounds on $D_\alpha(N_V(\rho)\|\sigma)$ needed to apply Theorem IV.1 effectively. One straightforward way to derive such bounds is to resort to so-called strong data-processing inequalities (SDPI) [9,10,17,38,41,45,61,64,84]. A quantum channel $N$ with fixed point σ is said to satisfy an SDPI with constant $q_\alpha > 0$ with respect to σ and $D_\alpha$ if for all other states ρ we have: Then, assuming that the noisy quantum circuit $N_V$ we wish to implement is of the form of (9) and each layer $N_\ell$ satisfies Eq. (22) for some constant $q_\alpha$, we show in Lemma C.1 of the Supplemental Material that: Thus, as long as the fixed point of the noise is left approximately invariant by the channels at the end of the circuit, Eq. (23) implies that the relative entropy will decay as the depth increases. As we argue in section C 2, this will be the case for both QAOA and annealing circuits for most one-qubit noise models. Furthermore, this will hold for any circuit whenever the fixed point of the noise is the maximally mixed state.
It is also possible to derive similar inequalities for continuous-time evolutions with a time-dependent Hamiltonian $H_t$ and the noise given by some Lindbladian $L$. In that case, the assumption in Eq. (22) is replaced by for some constant $r_\alpha > 0$. In Lemma C.2 we show the continuous-time version of Eq. (23).
To illustrate the power of the bound in Eq. (21), let us analyse the case where $N_V$ consists of a concatenation of layers of unitary gates with layers of noise where $D_p$ is a qubit depolarizing channel with depolarizing probability $p$. One can then show that Eq. (22) holds for α = 2 and $q_2 = 2p + p^2$ [61, Sec. 3.3] and, thus, for any circuit of depth $L$ in this noise model: Moreover, the maximally mixed state satisfies Eq. (19) with c = K = 1 [66]. By combining Eq. (25) with Eq. (21) we arrive at: Hamiltonian and $N_V$ be a depth $L$ unitary circuit interspersed by 1-qubit depolarizing noise with depolarizing probability $p$. Then for any initial state ρ and $\epsilon > 0$: Let us exemplify the power of Eq. (26). For an $H$ of practical interest, say $H$ an Ising Hamiltonian, efficient classical algorithms are known to find solutions whose energy is a constant fraction from the ground state energy [5]. That is, there exists an $a_c = \Omega(1)$ such that efficient classical algorithms can sample states ρ that satisfy tr (ρH It then follows from Eq. (21) that at a constant depth $L > \log(a_c^{-1})/(2p)$, the probability of the noisy quantum circuit outperforming the classical algorithm is exponentially small in system size.
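To get a feeling for the depth $\log(a_c^{-1})/(2p)$ quoted above, the following Python sketch evaluates it for a few noise rates. The value $a_c = 0.5$ is a purely hypothetical classical approximation parameter, the logarithm is assumed to be the natural one, and the function name is illustrative.

```python
import math


def depth_threshold(a_c, p):
    """Depth log(1 / a_c) / (2 p) from the text (natural logarithm assumed)."""
    if not (0.0 < a_c < 1.0 and 0.0 < p <= 1.0):
        raise ValueError("expected 0 < a_c < 1 and 0 < p <= 1")
    return math.log(1.0 / a_c) / (2.0 * p)


a_c = 0.5  # hypothetical value, chosen only for illustration
for p in (0.1, 0.01, 0.001):
    # Smallest integer depth at or above the threshold.
    print(p, math.ceil(depth_threshold(a_c, p)))
```

The threshold scales as $1/p$: a noise rate ten times smaller allows circuits roughly ten times deeper before the concentration argument applies.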
Note that other results in the literature already showed that quantum advantage is lost at constant depth for such problems [34]. However, these results only showed bounds for the expectation value of the output of the circuit, whereas bounds like that in Proposition IV.1 provide concentration inequalities, a significantly stronger result. We do, however, pay the price of having slightly worse constants for the depth at which advantage is lost compared to the results of [34]. We will discuss concrete examples for the bounds we obtain on the depth in section VI.
Above we illustrated our concentration bounds for depolarizing noise only, as it corresponds to the simplest noise model that we can analyse. But our result can be generalized to all noise models that contract the relative entropy uniformly w.r.t. a fixed point of full rank. However, this generalization comes at the expense of the bounds not being circuit-independent unless the noise is unital. As before, the first step to obtain concentration results is to control the decay of the relative entropy under the noise for Rényi divergences.
Lemma IV.1 (Lemma 1 of [34]). Let $N : B(H_V) \to B(H_V)$ be a quantum channel with unique fixed point σ > 0 that satisfies a strong data-processing inequality with constant $p_\alpha > 0$ for some α > 1. That is, for all states ρ. Then for any other quantum channels $\Phi_1, \dots, \Phi_m : B(H_V) \to B(H_V)$ we have: We refer to section C for a more detailed discussion of this result and Lemma C.1 for a proof. In section C 2 we evaluate the expression in Eq. (28) for the special case of QAOA circuits converging to diagonal product states. Furthermore, in section C 3 we discuss the performance of the resulting bounds for random graphs.
In the same appendix we also prove the continuous-time version of the Lemma above that is relevant to quantum annealers, which we now also state for completeness:
Proposition IV.2. Let $L : B(H_V) \to B(H_V)$ be a Lindbladian with fixed point $\sigma_q$ defined as before with $q \ge \frac{1}{2}$. Suppose that for some α > 1 we have for all t > 0 and initial states that there is an $r_\alpha > 0$ such that: Moreover, for functions f, g . Let $T_t$ be the evolution of the system under the Lindbladian $S_t = L + H_t$ from time 0 to t ≤ T. Then for all states ρ: Note that the expression in Eq. (28) will converge to 0 as long as $\Phi_t(\sigma) \approx \sigma$ for t close to T. As we will argue in more detail in section C, this is expected to be satisfied for good QAOA circuits. Furthermore, we explicitly evaluate the bound in Eq. (28) in terms of the parameters of the QAOA circuit in Corollary C.1 or for a given annealing schedule. These results can then be combined with Theorem IV.1 to understand the concentration properties of the output. The same holds in principle for Eq. (30), where this can be visualized more easily: as long as the function f satisfies f(1) = 0, the second term in Eq. (30) will converge to 0.
We illustrate this concretely in the case of noisy annealers with a linear schedule in Proposition VI.3.

V. LIMITATIONS OF ERROR MITIGATED NOISY VQAS: CONCENTRATION BOUNDS
A possible criticism of bounds like that of Proposition IV.1 is that they do not take error-mitigation techniques [29,30,56,76] into account. Although there does not seem to be a widely accepted definition of what error mitigation entails, the overarching goal of such protocols is to extract information about noiseless circuits by sampling from noisy ones. Such proposals are expected to be useful before the advent of fault tolerance to reduce the level of noise present in the data output by NISQ devices.
The majority of existing mitigation protocols require a significant overhead in the number of samples to extract the noiseless signal from noisy ones, potentially making error mitigation prohibitively expensive. Thus, one of the main questions regarding the viability of error mitigation strategies is the scaling of the sampling overhead they require in terms of number of qubits, depth and error rate.
There already exist some results in the literature discussing limitations of error mitigation, such as [74,80]. They show that certain error mitigation protocols require a sampling overhead that is exponential in system size at linear circuit depth. Our results in the next sections suggest that at significantly lower depths it is already difficult to extract information about the noiseless output state, while also providing concentration bounds for error-mitigated circuits.
In what follows, we will distinguish sampling and weak error mitigation. To the best of our knowledge this distinction has not been made before in the literature. But in analogy with the terminology for the simulation of quantum circuits, we will call an error mitigation strategy a sampling protocol if it allows us to approximately sample from the output of a noiseless circuit. In contrast, weak error mitigation techniques only allow for approximating expectation values of the outputs of noiseless circuits. Note that the latter is a weaker condition.

A. Sampling error mitigation and the effect of error mitigation on classical optimization problems
We start by discussing the effect of noise on known sampling error mitigation procedures. We believe that these are particularly relevant for classical combinatorial optimization problems. This is because for such problems one is often not necessarily interested in estimating the ground state energy, but rather in obtaining a string of low energy that corresponds to a good solution. For this, it would be necessary to obtain a sample.
To the best of our knowledge, the only error mitigation technique that allows for sampling from the noiseless state is virtual distillation or cooling [42,47]. Going into the details of this procedure is beyond the scope of this manuscript. It suffices to say that it takes as input k copies of the output ρ of a noisy quantum circuit and aims at preparing the state $\rho^k/\operatorname{tr}\rho^k$. Under some assumptions, one can then show that the overlap of this state with the output of the noiseless circuit is exponentially larger in k. However, as this is clearly not a linear transformation, it can only be implemented stochastically. The success probability of the transformation is given by $\operatorname{tr}\rho^k \le \operatorname{tr}\rho^2$. As before, for simplicity we will state our no-go results for the case of local depolarizing noise and leave the proof and the more general case to section C 2. We then have:
Proposition V.1. Let $\mathcal{N}_V$ be a depth L unitary circuit interspersed by 1-qubit depolarizing noise with depolarizing probability p. Then for any initial state ρ and k ≥ 2, the probability that virtual cooling or distillation succeeds is bounded as in Eq. (32).
The proof of Proposition V.1 can be found in Proposition D.1. Thus, we conclude from Eq. (32) that unless the local noise rate is $p = \mathcal{O}(n^{-1})$, virtual distillation protocols will require a number of samples that is exponential in system size to be successful even after one layer of the circuit. We remark that our results essentially imply the same conclusions for general local, unital noise. For nonunital noise driving the system to a product state we obtain the following statement:
Lemma V.1. Let $\tau_q = q|0\rangle\langle 0| + (1-q)|1\rangle\langle 1|$ and assume w.l.o.g. that $q \le \tfrac{1}{2}$. Then for any state we have that the probability that virtual cooling or distillation succeeds is bounded by $2^{-\epsilon n}$.
That is, for more general fixed points virtual cooling or distillation will succeed with exponentially small probability if the relative entropy has decayed by a factor of log(2q). As was the case with the concentration bounds, we see that our bounds become weaker as the fixed point becomes purer. Furthermore, it is also possible to immediately apply the results derived in section C to estimate when the entropy has contracted enough such that the success probability becomes exponentially small.
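To get a feeling for the success probability $\operatorname{tr}\rho^k$, the following minimal sketch evaluates it for a toy case that is easy to compute by hand: an n-qubit product state in which every qubit has passed through one depolarizing channel. The choice of state, the value p = 0.1 and the function names are illustrative assumptions; this is not the bound of Proposition V.1, only an example of the exponential decay in n that it formalizes.

```python
import numpy as np

def depolarize(rho, p):
    """Single-qubit depolarizing channel: rho -> (1 - p) rho + p I/2."""
    return (1.0 - p) * rho + p * np.eye(2) / 2.0

def success_probability(rho_1, n, k):
    """tr(rho^k) for the n-qubit product state rho = rho_1 (x) ... (x) rho_1."""
    eigenvalues = np.linalg.eigvalsh(rho_1)
    # tr(rho^k) factorizes over the tensor factors of a product state.
    return float(np.sum(eigenvalues ** k) ** n)

# One layer of depolarizing noise (p = 0.1, illustrative) acting on |0><0|.
rho_1 = depolarize(np.array([[1.0, 0.0], [0.0, 0.0]]), p=0.1)
for n in (10, 50, 100):
    print(n, success_probability(rho_1, n, k=2))
```

For this toy state the single-qubit purity is a constant strictly below 1, so the success probability of the k = 2 protocol shrinks exponentially with the number of qubits.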

B. Weak error mitigation with regular estimators
We will now see that the techniques of the last sections also readily apply to weak error mitigation techniques that are regular in a sense that will be made precise later. To the best of our knowledge, all weak error mitigation techniques have the following basic building blocks:
1. Take the outcome of m (noisy) quantum circuits.
2. Add auxiliary qubits and perform a collective noisy circuit Φ on the output of the m circuits.
3. Perform a measurement on the m systems.
4. Postprocess the outcomes of the measurements and output an estimate.
This is illustrated in Figure 1. It is easy to see that points (2), (3) and (4) can all be collectively modelled by applying a global projective measurement $M := \{M_s\}_{s\in S}$ on the state $\bigotimes_{i=1}^{m} \mathcal{E}_i(\rho_i) \otimes |0\rangle\langle 0|^{\otimes k}$, where we assume we have access to k auxiliary systems. Here the PVM is indexed by some classical sample space S and is followed by a classical procedure mapping each measured output s ∈ S to a real value f(s) through a function f : S → R. The hope is then that f(s) provides a good estimate for some property of the noiseless circuit.
Equivalently, we are interested in the probabilistic properties of the observable in the output state $\rho_{\mathrm{out}} := \bigotimes_{i=1}^{m} \mathcal{E}_i(\rho_i)$ of the original noisy circuit, where we have traced out the auxiliary systems used in the mitigation process.

FIG. 1. Schematic of an error mitigation protocol
In order to obtain concentration inequalities for error mitigation protocols, we will impose a bit more structure on the estimators. To make the motivation for our further assumptions clear, we will use as our guiding example the most naive of all error mitigation protocols for an optimization task: sampling m times from the quantum device, evaluating the energy of each outcome and outputting the minimum, i.e., just repeating the experiment often enough. First, we will assume that the PVM is indexed by labels $s \in \mathbb{R}^m$. In the case of the minimum strategy discussed before, the individual entries of this vector would correspond to the energy we observed on each one of the m copies. Furthermore, we will assume that f is $L_f$-Lipschitz w.r.t. the $\ell_\infty$ norm on $\mathbb{R}^m$, i.e.:
$$\sup_{s \neq s' \in \mathbb{R}^m} \frac{|f(s) - f(s')|}{\|s - s'\|_{\ell_\infty}} \le L_f.$$
In the case of the minimum strategy, f would correspond to the minimum in $\mathbb{R}^m$, for which we have $L_f = 1$. Let us justify the assumption that f is Lipschitz by looking in a bit more detail into the case of the POVM measuring copies independently. In that case, f being Lipschitz w.r.t. $\ell_\infty$ corresponds to requiring that the error-mitigated estimate should not depend too strongly on any individual sample, a robustness condition that is desirable in the presence of noise. Finally, we will assume that the error mitigation procedure concentrates when given trivial, product states, in the sense of Eq. (35), for some function K(m).
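The claim that the minimum strategy is 1-Lipschitz with respect to the $\ell_\infty$ norm can be checked numerically. The sketch below is only an empirical sanity check under assumed Gaussian test data; the function names and the sampling scheme are hypothetical and not part of the protocol itself.

```python
import numpy as np

def min_estimator(samples):
    """Naive mitigation: output the lowest energy among the m samples."""
    return float(np.min(samples))

def empirical_lipschitz_ratio(f, m, trials=1000, seed=0):
    """Largest observed |f(s) - f(s')| / ||s - s'||_inf over random pairs."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        s = rng.normal(size=m)
        s_prime = s + rng.normal(scale=0.5, size=m)
        distance = np.max(np.abs(s - s_prime))
        worst = max(worst, abs(f(s) - f(s_prime)) / distance)
    return worst

ratio = empirical_lipschitz_ratio(min_estimator, m=20)
print(ratio)
```

The observed ratio never exceeds 1: changing every sample by at most some amount changes the minimum by at most that amount.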
Let us discuss this assumption once again in the case of taking the minimum of measuring the energy of a Hamiltonian H, m times. In that case, each measurement satisfies Gaussian concentration for some c and L. Thus, it follows from a union bound that for taking independent measurements, Eq. (35) holds with K(m) = m. Now that we have formulated the error mitigation protocol in this way, we can immediately apply the same reasoning as in Proposition IV.1 to understand the concentration properties of the error mitigation procedure. We get:
Theorem V.1. For an error mitigation observable X as in Eq. (34), assume that for a given state σ Eq. (35) holds for some function K(m). Furthermore, let r, ε > 0 be given and assume that the corresponding bound on the Rényi entropy holds. Then the concentration inequality of Eq. (36) holds.
We leave the proof of Theorem V.1 to section I. We see that the amount by which the Rényi entropy has to decrease to ensure we are in the regime where we obtain concentration from Eq. (36) is connected to the Lipschitz constant of X and the number of copies m. For instance, under local depolarizing noise with depolarizing probability p, this happens at depth $\mathcal{O}(p^{-1}\log(m\,l_0))$.
One way of interpreting the bound in Eq. (36) is that the probability that the estimate we obtain from the error mitigation algorithm with the noisy states as input differs significantly from the estimate obtained with the fixed point of the noise as input is exponentially small. Thus, the noisy outputs were useless: we could have just sampled from the product state $\sigma^{\otimes m}$ instead and observed similar outcomes.
However, it might be hard to control the Lipschitz constant L f in general scenarios.Moreover, many mitigation protocols in the literature [29,30,56,76] involve estimating the mean of random variables that take exponentially large values.Thus, their Lipschitz constant will typically also be exponentially large, constraining the applicability of Theorem V.1.
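The union-bound step behind K(m) = m for the minimum strategy can be made concrete: the minimum of m samples falls below a threshold exactly when at least one sample does. The sketch below compares the exact probability for m independent events with the union bound. The single-sample tail exp(-c r^2) and the values of c and r are hypothetical placeholders for the Gaussian concentration assumed in the text.

```python
import math

def single_sample_tail(r, c):
    """Hypothetical Gaussian tail bound exp(-c r^2) for one measurement."""
    return math.exp(-c * r * r)

def any_of_m_exact(q, m):
    """Exact probability that at least one of m independent events of probability q occurs."""
    return 1.0 - (1.0 - q) ** m

def any_of_m_union_bound(q, m):
    """Union bound: the same probability is at most m * q (capped at 1)."""
    return min(1.0, m * q)

q = single_sample_tail(r=3.0, c=1.0)
for m in (1, 10, 1000):
    print(m, any_of_m_exact(q, m), any_of_m_union_bound(q, m))
```

The prefactor grows only linearly in m, which is why a number of repetitions polynomial in n cannot compensate for a tail that is exponentially small in n.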

VI. EXAMPLE: FINDING THE GROUND STATE OF ISING HAMILTONIANS IN THE NISQ ERA
Given a matrix $A \in \mathbb{R}^{n\times n}$ and a vector $b \in \mathbb{R}^n$ we define the Hamiltonian $H_I$ in Eq. (37). It is well known how to formulate various NP-complete combinatorial optimization problems as finding a string that minimizes the energy of $H_I$. This has motivated the pursuit of NISQ algorithms for this task, including the quantum approximate optimization algorithm [32] (QAOA) or the closely related quantum annealing algorithm.
Let us briefly describe the QAOA algorithm. Given a P ∈ N and vectors of parameters $\gamma, \beta \in \mathbb{R}^P$, the QAOA unitary is given by
$$V_{\gamma,\beta} = \prod_{k=1}^{P} e^{i\beta_k H_X} e^{i\gamma_k H_I}, \qquad (38)$$
where $H_X = -\sum_i X_i$. The hope of QAOA is that by optimizing over the parameters γ, β, measuring $V_{\gamma,\beta}|+\rangle^{\otimes n}$ in the computational basis will yield low-energy strings for the Hamiltonian in Eq. (37) even for moderate values of P. In what follows we will distinguish the depth of the QAOA Ansatz (denoted by P) from the physical depth of the circuit being implemented in the device (denoted by L).
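The following is a minimal state-vector sketch of a QAOA circuit for a handful of qubits. It assumes the layer structure $e^{i\beta_k H_X} e^{i\gamma_k H_I}$ with $H_X = -\sum_i X_i$, and, as a further assumption about sign and normalization conventions, an Ising Hamiltonian of the form $\sum_{ij} A_{ij} Z_i Z_j + \sum_i b_i Z_i$. All function names, the example matrix A and the angles are illustrative.

```python
import numpy as np

def ising_energies(A, b):
    """Diagonal of H_I = sum_ij A_ij Z_i Z_j + sum_i b_i Z_i (convention assumed)."""
    n = len(b)
    energies = np.zeros(2 ** n)
    for index in range(2 ** n):
        # z_i = +1 if bit i of the string is 0 and -1 if it is 1.
        z = np.array([1 - 2 * ((index >> (n - 1 - i)) & 1) for i in range(n)])
        energies[index] = z @ A @ z + b @ z
    return energies

def apply_mixer(state, beta, n):
    """Apply e^{i beta H_X} with H_X = -sum_i X_i, i.e. e^{-i beta X} on each qubit."""
    u = np.array([[np.cos(beta), -1j * np.sin(beta)],
                  [-1j * np.sin(beta), np.cos(beta)]])
    psi = state.reshape([2] * n)
    for qubit in range(n):
        psi = np.moveaxis(np.tensordot(u, psi, axes=([1], [qubit])), 0, qubit)
    return psi.reshape(-1)

def qaoa_state(A, b, gammas, betas):
    """Output state of the QAOA circuit applied to |+>^n, plus the energies of H_I."""
    n = len(b)
    energies = ising_energies(A, b)
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # |+>^n
    for gamma, beta in zip(gammas, betas):
        state = np.exp(1j * gamma * energies) * state  # e^{i gamma H_I} is diagonal
        state = apply_mixer(state, beta, n)            # e^{i beta H_X}
    return state, energies

# Three-qubit chain with illustrative couplings and angles.
A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
b = np.zeros(3)
state, energies = qaoa_state(A, b, gammas=[0.4], betas=[0.3])
probabilities = np.abs(state) ** 2
print("expected energy:", probabilities @ energies)
```

Sampling a bit string from `probabilities` corresponds to the computational-basis measurement described above; the quantity of interest for optimization is the energy of the sampled string, not only its expectation.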
In recent years, several works have identified limitations on the performance of constant depth circuits in outperforming classical algorithms for this problem [15,31], even in the absence of noise.These results were then later extended to short-time quantum annealing [59].
Taking the noise into consideration, recent works have shown that QAOA is outperformed by efficient classical algorithms at a depth that is inversely proportional to the local noise rate [34]. However, those works only considered the expected value of the output string. Considering that the goal of QAOA is to obtain one low-energy string, to completely discard exponential advantages of QAOA and other related algorithms at a depth that only depends on local noise rates, it is important to also obtain concentration inequalities for the outputs.
As mentioned before, Proposition IV.1 already allows us to conclude that quantum advantage will be lost against classical algorithms at constant depth. With the techniques presented in this work, it is also straightforward to obtain concentration bounds for concrete instances. Indeed, given that a classical algorithm found a string with given energy $-a_C n$, we can easily bound the depth at which the bound in Proposition IV.1 kicks in and the quantum device is exponentially unlikely to yield a better result.

A. Max-Cut
In this subsection, we analyze the performance of quantum circuits for the Max-Cut problem. Let G = (V, E) be a graph. The cut of a bipartition of V is the number of edges that connect the two parts. The Max-Cut problem consists in finding the maximum cut of G, which we denote with $C_{\max}$. The best classical algorithm for Max-Cut is due to Goemans and Williamson [39] and can obtain a string whose cut is at least $0.878\, C_{\max}$. As in [15], we consider circuits that commute with $\sigma_x^{\otimes n}$, which include the QAOA circuit. We prove that the algorithm by Goemans and Williamson cannot be outperformed by:
• Noiseless circuits with shallow depth (Theorem VI.1);
• Noisy circuits with any depth (Theorem VI.2).
We assume that G is bipartite, i.e., $C_{\max} = |E|$, and is regular with degree D, i.e., each vertex belongs to exactly D edges. Without loss of generality, we assume We denote with C(x) the cut of such bipartition. We also assume that G satisfies the bound (39) for any $x \in \{0,1\}^n$, where |x| denotes the Hamming weight of x, i.e., the number of components of x that are equal to 1. For any D ≥ 3, Ramanujan expander graphs constitute an example of graphs with such property [53,54,60]. Moreover, random D-regular bipartite graphs approach the bound (39) with high probability [35]. The Max-Cut problem for G is equivalent to maximizing the n-qubit Hamiltonian (40), where for any j ∈ [n], $\sigma_z^j$ is the Pauli Z matrix acting on the qubit j.
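As a small worked example of these definitions, the sketch below computes the cut C(x) on the complete bipartite graph $K_{3,3}$, which is bipartite and regular with D = 3, and confirms by brute force that $C_{\max} = |E|$. The graph, the function names, and the energy function written in the standard form $\frac{1}{2}\sum_{(i,j)\in E}(1 - z_i z_j)$ are illustrative assumptions; the source's Hamiltonian may be normalized differently.

```python
from itertools import product

def cut_value(edges, x):
    """Number of edges whose endpoints lie on different sides of the bipartition x."""
    return sum(1 for (i, j) in edges if x[i] != x[j])

def maxcut_energy(edges, x):
    """Standard Max-Cut energy (1/2) sum_{(i,j)} (1 - z_i z_j) with z_i = 1 - 2 x_i."""
    z = [1 - 2 * bit for bit in x]
    return sum((1 - z[i] * z[j]) / 2 for (i, j) in edges)

def max_cut_brute_force(n, edges):
    """Maximum cut by exhaustive search over all 2^n bipartitions."""
    return max(cut_value(edges, x) for x in product((0, 1), repeat=n))

# K_{3,3}: left vertices {0, 1, 2}, right vertices {3, 4, 5}.
n = 6
edges = [(i, j) for i in range(3) for j in range(3, 6)]
x_opt = (0, 0, 0, 1, 1, 1)
flipped = tuple(1 - bit for bit in x_opt)

print(max_cut_brute_force(n, edges), cut_value(edges, x_opt), cut_value(edges, flipped))
```

Flipping every bit leaves the cut unchanged, which is the classical counterpart of the symmetry under $\sigma_x^{\otimes n}$ assumed for the circuits.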
Theorem VI.1 (noiseless Max-Cut). Let G be a regular bipartite graph with n vertices satisfying (39), and let H be the associated Max-Cut Hamiltonian (40). Let ρ be the output of a noiseless quantum circuit as in Definition II.1 made by L layers, where each layer consists of a set of unitary gates acting on mutually disjoint pairs of qubits. We assume that the input state of the circuit and each unitary gate commute with $\sigma_x^{\otimes n}$. Then, if we must have Furthermore, if ρ is generated by the QAOA circuit (38) with depth P, we must have
Remark VI.1. For any D ≥ 55 we have Our result (43) provides an exponential improvement over (45) with respect to D. Already for D = 55, the right-hand side of (45) is larger than 1 only for $n = \Omega(10^{54})$, while the right-hand side of (43) is larger than 1 already for $n = \Omega(10^{6})$.
Proof. Circuit made of two-qubit gates: From Proposition III.2, ρ satisfies a (2, ∞)-Poincaré inequality with constant and let X be the random outcome obtained measuring ρ in the computational basis. Proposition H.1 of section H implies and (12) of Theorem III.1 together with (46) imply The claim (42) follows.
QAOA circuit: From Proposition III.2, ρ satisfies a (2, ∞)-Poincaré inequality with constant Proceeding as in the previous case we get and the claim (43) follows.
Theorem VI.2 (noisy Max-Cut). Under the same hypotheses of Theorem VI.1, let each layer of the circuit be followed by depolarizing noise with depolarizing probability p applied to each qubit. Then, For p = 0.
Let us consider the following operator associated to the Hamming distance from $x_{\mathrm{opt}}$: We have Tr K = 0 and $\|K\|_L = 1$, therefore Proposition IV.1 implies that for any ε > 0, upon measuring K on ρ we have Proposition H.1 implies and choosing in (55) we get hence The claim follows combining (53) and (59).

B. Short-time evolution of local Hamiltonians
Quantum annealing constitutes another family of heuristic algorithms to solve optimization problems. Similar to the variational algorithms discussed earlier, the goal in quantum annealing is to find the lowest energy of a classical Hamiltonian that encodes the optimization problem. To find the lowest energy of the optimization Hamiltonian, we can start from a local Hamiltonian whose ground state is easy to prepare, for example $-\sum_i X_i$, and continuously change the Hamiltonian to the desired optimization Hamiltonian $H_I$:
$$H(t) = -a(t)\sum_i X_i + b(t) H_I, \qquad (60)$$
where a(0) = b(T) = 1, a(T) = b(0) = 0 and T is the final evolution time. The adiabatic theorem [43] guarantees that if we start from the ground state of the initial Hamiltonian and evolve the system slowly enough, the final state would be close to the ground state of the optimization Hamiltonian, which can be found by measurement in the computational basis at the final time. Since noise restricts the total time that coherence in the system is preserved, understanding the limitations of short-time evolution of local Hamiltonians seems crucial. The presented (2, ∞)-Poincaré inequality provides bounds on the performance of short-time quantum annealers.
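The interpolation between the driver $-\sum_i X_i$ and the optimization Hamiltonian can be sketched for a few qubits as follows. The linear schedule a(t) = 1 - t/T, b(t) = t/T is one assumed choice satisfying the boundary conditions a(0) = b(T) = 1 and a(T) = b(0) = 0; the two-qubit $H_I = Z_0 Z_1$ and all function names are illustrative.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def on_qubit(op, qubit, n):
    """Embed a single-qubit operator on the given qubit of an n-qubit register."""
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, op if k == qubit else I2)
    return out

def annealing_hamiltonian(t, T, H_I, n):
    """H(t) = a(t) H_X + b(t) H_I with H_X = -sum_i X_i and a linear schedule (assumption)."""
    a, b = 1.0 - t / T, t / T
    H_X = -sum(on_qubit(X, k, n) for k in range(n))
    return a * H_X + b * H_I

n, T = 2, 10.0
H_I = on_qubit(Z, 0, n) @ on_qubit(Z, 1, n)  # toy optimization Hamiltonian Z_0 Z_1
for t in (0.0, 5.0, 10.0):
    print(t, np.linalg.eigvalsh(annealing_hamiltonian(t, T, H_I, n))[0])
```

At t = 0 the ground state is $|+\rangle^{\otimes n}$ with energy -n, and at t = T the Hamiltonian is diagonal in the computational basis, so its ground state can be read out by a computational-basis measurement.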
Proposition VI.1 (Short-time evolution of local Hamiltonians). Let σ be the quantum state generated by evolving a product state with a continuous-time local quantum process as in section III B for time t ≥ 0. Let $\mu_\sigma$ be the probability distribution of the outcome of the measurement in the computational basis performed on σ. Then, for any $A, B \subseteq \{0,1\}^V$ we have where $d_H$ denotes the Hamming distance, $v = e\,b\,(2D - 1)$, D is the maximum degree of the interaction graph, b is the maximum interaction strength and where $\mathrm{Li}_s(z)$ is the polylogarithm function of order s and argument z and δ is the spatial dimension of the interaction graph.
Remark VI.3. Crucially, both $c_0$ and $c_1$ are independent of the number of qubits.
Proof. We start by deriving an upper bound on $C_t$ of Theorem III.1. We note that by the definition of $i_0$, we have $2\delta - 1 \le d(i_0) \le 2\delta$, and therefore using Putting these two bounds together, we have The claim follows by applying Theorem III.1.
Considering the example of generating a generalized GHZ state, where and, therefore, a time of at least order log(n) is required to generate generalized GHZ states using local Hamiltonians.
Note that this bound also provides a minimum time required by local Hamiltonians to simulate unitaries that are capable of generating generalized GHZ states starting from product states, such as n-qubit fan-out gates.
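The reason GHZ-type states are a natural test case for a bound involving two sets A, B of outcomes and their Hamming distance can be seen from the measurement statistics alone. The sketch below assumes the standard n-qubit GHZ state $(|0\dots0\rangle + |1\dots1\rangle)/\sqrt{2}$ and takes A and B to be the two singleton sets containing the all-zeros and all-ones strings; these choices and the function names are illustrative, not taken from the source's proof.

```python
import numpy as np

def ghz_distribution(n):
    """Computational-basis outcome distribution of (|0...0> + |1...1>)/sqrt(2)."""
    probs = np.zeros(2 ** n)
    probs[0] = probs[-1] = 0.5
    return probs

def hamming_distance(x, y):
    """Hamming distance between two n-bit strings encoded as integers."""
    return bin(x ^ y).count("1")

n = 8
mu = ghz_distribution(n)
A, B = [0], [2 ** n - 1]  # all-zeros and all-ones strings
print(mu[A].sum(), mu[B].sum(), hamming_distance(A[0], B[0]))
```

Both sets carry probability 1/2 while being at the maximal Hamming distance n, so the output distribution is as far from concentrated as possible, which is incompatible with a concentration bound that holds at short times.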
The short-time evolution of local Hamiltonians also limits their performance in solving the Max-Cut problem discussed in section VI A. Note that both the initial state and the annealing Hamiltonian of (60) with the final Hamiltonian $H_I$ corresponding to the Max-Cut problem commute with $\sigma_x^{\otimes n}$, and therefore the techniques of Theorem VI.1 directly lead to a proof for limitation of short-time evolution of local Hamiltonians for the optimization task.
Proposition VI.2. Consider the Max-Cut problem Hamiltonian $H_I$ as discussed in Theorem VI.1, and the corresponding annealing Hamiltonian in the form of (60). Let ρ be the evolved state after time T. Then, if we must have
Proof. From (49) and (12) of Theorem III.1 we have which can be combined with (63) to get The claim follows.

C. Noisy QAOA beyond unital noise
In this subsection we discuss the performance of our bounds for QAOA beyond the case of unital noise.As mentioned before, if the noise is not unital our bounds on the relative entropy decay are not independent of the circuit being implemented.Thus, we need to pick a promising family of QAOA parameters to apply our results.
A natural candidate of instances to analyse is Max-Cut on random regular graphs of high girth. This is because in [7] the authors derive the optimal parameters for QAOA for such graphs in the large n limit for up to 17 layers. Furthermore, they show that these QAOA circuits achieve an expected value for the cut that is higher than what known provably efficient classical algorithms achieve. Although these parameters are only optimal in the absence of noise, we analyse their performance in the presence of non-unital noise driving the system to the classical state $\tau_q^{\otimes n}$ with $\tau_q = q|0\rangle\langle 0| + (1-q)|1\rangle\langle 1|$.
As explained in section C 3, we show that as long as the output ρ of a noisy QAOA circuit satisfies the threshold (69) for a D-regular graph, the probability that the noisy circuit outperforms classical methods is exponentially small. When this is achieved in terms of the contraction coefficient is displayed in Figure 2.
FIG. 2. Relative entropy density of the output of a QAOA circuit of P = 17 layers for various fixed points $\tau_q^{\otimes n}$ (q = 0.5, 0.6, 0.7, 0.99) as a function of the contraction coefficient and D = 50. We used the optimal parameters found in [7] for our circuit. The threshold we used is the one in (69) and we used Corollary C.1 to estimate the relative entropy decay. Although we see that our bounds have a worse performance as q → 1, the amount of noise we can tolerate is still independent of the system's size.
Although Figure 2 seems to suggest that advantage is only lost at high noise levels as the fixed point becomes purer, recall that when implementing the QAOA circuit on the actual device, the circuit depth will be significantly larger than 17. Indeed, in the plot, we took D = 50, which means that a circuit of depth at least 50 of two-qubit gates is required to implement each layer of $e^{i\gamma_i H_I t}$. If we further incorporate the compilation of gates and the fact that NISQ devices are unlikely to have all-to-all connectivity, which imposes extra layers of SWAP gates, the depth required to implement each layer of QAOA with D = 50 will conservatively be of order at least $10^2$. Thus, it is also reasonable to assume that the effective noise rate when implementing a layer of the QAOA circuit will be two orders of magnitude larger than the physical noise rate.
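The "two orders of magnitude" estimate can be illustrated with a back-of-the-envelope calculation. The sketch below uses the fact that composing d single-qubit depolarizing channels of strength p, with nothing in between, gives a depolarizing channel of strength 1 - (1 - p)^d. Treating one QAOA layer this way ignores the gates between the noise layers, so it is only a heuristic; the values p = 10^-3 and d = 100 are assumed for illustration.

```python
def effective_noise_rate(p, physical_depth):
    """Strength of `physical_depth` consecutive depolarizing channels of strength p."""
    return 1.0 - (1.0 - p) ** physical_depth

p, depth_per_layer = 1e-3, 100  # illustrative physical error rate and depth per QAOA layer
p_eff = effective_noise_rate(p, depth_per_layer)
print(p_eff, p_eff / p)
```

For small p the effective rate is close to d times p, which matches the scaling argument in the text.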
More generally, our bounds predict that quantum advantage will be lost whenever the QAOA parameters satisfy $\beta_k \to 0$ as k → ∞, because for such parameters the relative entropy between the output of the circuit and $\tau_q^{\otimes n}$ decays to 0. This is the case for the optimal parameters found in [7]. This is illustrated more clearly in the continuous-time case of quantum annealing we discuss now.

D. Noisy quantum annealing beyond unital noise
In this subsection, we will illustrate the bound in Proposition IV.2 for the case of noisy annealers with a linear schedule. That is, the function f in the statement is just given by f(t) = (1 − t). Furthermore, we will assume that the time-independent Lindbladian of spectral gap 1 is driving the system to the product state $\tau_q^{\otimes n}$ with $\tau_q = q|0\rangle\langle 0| + (1-q)|1\rangle\langle 1|$ for $q < \tfrac{1}{2}$.
Proposition VI.3. For $0 < q \le \tfrac{1}{2}$ and T > 0 let and h(T) be Furthermore, let $\mathcal{T}_t$ be defined as in Proposition IV.2 and f(t) = (1 − t). Then for the initial state $|+\rangle^{\otimes n}$ and $\rho_T = \mathcal{T}_T(|+\rangle\langle +|^{\otimes n})$ we have:
We refer to section C 4 for a discussion of this result and Proposition C.3 in the same section for a proof. The take-away message from Proposition VI.3 is that we can still derive concentration inequalities beyond unital noise. However, the bounds get looser as q → 0 (i.e., the fixed point becomes pure) and the decay of the relative entropy is polynomial instead of exponential.
We can reach similar conclusions for the purity of the output and, thus, for the probability that virtual cooling succeeds.
Proposition VI.4. For $0 < q \le \tfrac{1}{2}$ and T > 0 let and h(T) be as in Eq. (71). Furthermore, let $\mathcal{T}_t$ be defined as in Proposition C.1 and f(t) = (1 − t). For the initial state $|+\rangle^{\otimes n}$ let T be large enough for $h(T) \le 1 - \log(2(1-q)) - \epsilon$ to hold for some ε > 0. Then the probability that virtual cooling or distillation succeeds is at most $\exp(-\epsilon n)$.
We refer to section C 4 for a proof.

VII. CONCLUSION AND OPEN PROBLEMS
In this work we have used techniques of quantum optimal transport to derive various concentration inequalities for quantum circuits.In particular, we showed quadratic concentration for shallow circuits and Gaussian concentration for noisy circuits at large enough depth and Lipschitz observables.
By applying such inequalities to variational quantum algorithms such as QAOA or quantum annealing algorithms, we showed that for most instances, the probability that these algorithms outperform classical algorithms is exponentially small whenever the circuit has a nontrivial density of errors.Furthermore, we obtained self-contained and simplified proofs of previous results on the limitations of QAOA.
Our work demonstrates the relevance of quantum optimal transport methods to near-term quantum computing.Furthermore, it closes a few important gaps in previous results on limitations of variational quantum algorithms.
An important problem that is left open by our work is whether it is also possible to obtain Gaussian concentration inequalities for the outputs of shallow circuits. After the posting of the first version of the present work, the authors of [3] found a different method based on polynomial approximations for showing that the output distributions in fact satisfy a stronger Gaussian concentration bound, hence answering this question.

VIII. ACKNOWLEDGMENTS
GDP is a member of the "Gruppo Nazionale per la Fisica Matematica (GNFM)" of the "Istituto Nazionale di Alta Matematica "Francesco Severi" (INdAM)". MM acknowledges support by the NSF under Grant No. CCF-1954960 and by IARPA and DARPA via the U.S. Army Research Office contract W911NF-17-C-0050. DSF acknowledges financial support from the VILLUM FONDEN via the QMATH Centre of Excellence (Grant no. 10059) and the QuantERA ERA-NET Cofund in Quantum Technologies implemented within the European Union's Horizon 2020 Program (QuantAlgo project) via the Innovation Fund Denmark. CR acknowledges financial support from a Junior Researcher START Fellowship from the DFG cluster of excellence 2111 (Munich Center for Quantum Science and Technology), from the ANR project QTraj (ANR-20-CE40-0024-01) of the French National Research Agency (ANR), as well as from the Humboldt Foundation.
Proof. We have: by an application of Hölder's inequality, with α′ here being the Hölder conjugate of α. Next, by the Araki-Lieb-Thirring inequality: where in the last inequality we used the fact that α > 1 and E ≤ I. Furthermore, as α′ is the Hölder conjugate of α, we have that $\frac{1}{\alpha'} = \frac{\alpha - 1}{\alpha}$ and then: The claim in Eq. (B1) then follows from a simple manipulation and by noting that tr σ Eq. (B2) also immediately follows from plugging in the Gaussian concentration bound.

Appendix C: Entropic convergence results
In this section we will collect some results that allow us to estimate the sandwiched Rényi divergence between the output of a noisy quantum circuit or annealer and the fixed point of the noise affecting the device. In essence, these results are a generalization of the results of [34, Lemma 1 and Theorem 1]. In that work, the authors show precisely the same bounds as here, but only for the Umegaki relative entropy. However, their proofs can immediately be adapted to our setting with Rényi divergences. Thus, we will restrict ourselves to showing how to obtain a convergence result for discrete-time circuits and do not describe the same proof for continuous time in full detail.
Lemma C.1 (Lemma 1 of [34]). Let $\mathcal{N} : B(H_V) \to B(H_V)$ be a quantum channel with unique fixed point σ > 0 that satisfies a strong data-processing inequality with constant $p_\alpha > 0$ for some α > 1. That is, for all states ρ. Then for any other quantum channels $\Phi_1, \ldots, \Phi_m : B(H_V) \to B(H_V)$ we have:
Proof. For m = 1, this follows from the data-processed triangle inequality of [23, Theorem 3.1]. In their notation, it states that for any quantum channel P, states ρ, σ, σ′ and α ≥ 1 we have: Setting $P = \Phi_1$ and σ = σ in their notation it implies that: Let us now assume the claim to be true for some m = k. Then for m = k + 1 we have: by our induction hypothesis. Applying Eq. (C3) to the first term in Eq. (C4), the strong data-processing inequality, we obtain the claim.
Note that Lemma C.1 implies that the Rényi divergence will converge to 0 whenever $\Phi_t(\sigma) \to \sigma$ as t → ∞. This is always the case for unitary circuits under unital noise, as the fixed point is the maximally mixed state and is invariant under unitaries, but is also expected to hold for QAOA circuits. See section VI for examples of such circuits.
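The unital case can be illustrated on a single qubit. The sketch below implements the sandwiched Rényi divergence from its standard definition, $D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log\operatorname{tr}[(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}})^\alpha]$, and tracks $D_2$ relative to the maximally mixed state through alternating Hadamard gates and depolarizing noise. The gate choice, p = 0.2, the natural logarithm and the function names are illustrative assumptions; the code does not reproduce the inequality of Lemma C.1 itself.

```python
import numpy as np

def psd_power(sigma, exponent):
    """Power of a positive definite Hermitian matrix via its eigendecomposition."""
    values, vectors = np.linalg.eigh(sigma)
    return (vectors * values ** exponent) @ vectors.conj().T

def sandwiched_renyi(rho, sigma, alpha=2.0):
    """Sandwiched Renyi divergence D_alpha(rho || sigma) for full-rank sigma (natural log)."""
    s = psd_power(sigma, (1.0 - alpha) / (2.0 * alpha))
    inner = np.linalg.eigvalsh(s @ rho @ s)
    inner = np.clip(inner, 0.0, None)  # remove tiny negative rounding errors
    return float(np.log(np.sum(inner ** alpha)) / (alpha - 1.0))

def depolarize(rho, p):
    """Single-qubit depolarizing channel: rho -> (1 - p) rho + p I/2."""
    return (1.0 - p) * rho + p * np.eye(2) / 2.0

hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
sigma = np.eye(2) / 2.0
rho = np.array([[1.0, 0.0], [0.0, 0.0]])
p = 0.2

divergences = [sandwiched_renyi(rho, sigma)]
for layer in range(5):
    rho = depolarize(hadamard @ rho @ hadamard.T, p)  # unitary gate, then noise
    divergences.append(sandwiched_renyi(rho, sigma))
print(divergences)
```

Because the maximally mixed state is invariant under the unitary, only the noise changes the divergence, and in this example it decreases monotonically with the number of layers regardless of which gate is applied.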
We can also show similar statements for continuous-time evolutions under noise to also study quantum simulators or annealers:
Lemma C.2 (Theorem 1 of [34]). Let $\mathcal{L} : B(H_V) \to B(H_V)$ be a Lindbladian with fixed point σ. Suppose that for some α > 1 we have for all t > 0 and initial states that there is a $r_\alpha > 0$ such that: for some time-dependent Hamiltonian $H_t$. Moreover, let $\mathcal{T}_t$ be the evolution of the system under the Lindbladian $S_t = \mathcal{L} + H_t$ from time 0 to t. Then for all states ρ and times t > 0:
Thus, armed with contraction inequalities like those in Eq. (C5) or Eq. (C1) it is straightforward to obtain estimates on Rényi entropies. For completeness, we will collect some known results and techniques to obtain such contraction inequalities in the next section.

Contraction results for sandwiched Rényi divergences
Let us now collect some known results to obtain inequalities like Eq. (C5) or Eq. (C1). We will focus on the case where the noise has a product form, i.e., $\mathcal{N} = \bigotimes_{i=1}^{n} \mathcal{N}_i$, where $\mathcal{N}_i$ acts only on qubit i. Although it is straightforward to generalize the results to the case in which there is a different channel acting on each qubit, we will make the simplifying assumption that all local channels are the same. Furthermore, we will focus on inequalities that tensorize. This means that $q_\alpha$ will not scale with the size of the system n. To the best of our knowledge, strong data processing inequalities are not available for Rényi entropies beyond product channels.
Let us start with the continuous-time setting, as more is known there. For continuous time, the contraction of Rényi entropies was systematically studied in [61]. In particular, in [61, Theorem 4.3] the authors relate bounds on the optimal decay rate $r_\alpha$ to so-called logarithmic Sobolev inequalities [17,45,64,78]. It is beyond the scope of this article to review logarithmic Sobolev inequalities and we focus instead on the contraction rate these tools give to the problem at hand.
If we have a Lindbladian of the form holds with where λ(L) is the spectral gap of the local Lindbladian L. For instance, for generalized depolarizing noise we have λ(L) = 1. The take-home message of Eq. (C8) is that as long as $\|\sigma^{-1}\| = \mathcal{O}(1)$, the rate with which the sandwiched Rényi-2 divergence contracts is constant as well. It is also possible to use similar tools to derive the contraction for other values of α > 1 and we refer to [41,61] for a more detailed discussion. However, to the best of our knowledge, all known results exhibit a similar scaling as that in Eq. (C8) and we do not discuss this further.
In discrete time, the best results available are to the best of our knowledge those of [41, Corollary 5.5, 5.6]. To parse their results we first need to introduce some notation. For a given σ we will denote by $\Gamma_\sigma^{\alpha} : B(H) \to B(H)$ the map $X \mapsto \sigma^{\frac{\alpha}{2}} X \sigma^{\frac{\alpha}{2}}$ and by $\mathcal{D}_{p,\sigma}$ the generalized depolarizing channel converging to the state σ (i.e., ρ → (1 − p)ρ + pσ). It follows from [41, Corollary 5.6] that if for a quantum channel $\mathcal{N}_i$ with fixed point σ we have then for any state ρ on n qudits:
The expressions in Eq. (C9) and Eq. (C10) may seem daunting at first, so let us digest them a bit further and summarize their message. First, note that Eq. (C9) only involves one copy of the quantum channel, whereas the expression in Eq. (C10) involves arbitrarily many. Thus, this is an example of an inequality that tensorizes. Furthermore, note that Eq. (C9) can be verified efficiently. This is because it just corresponds to checking whether the operator norm of a linear operator is smaller than or equal to one or not, which can be computed in polynomial time. Thus, by performing a binary search on the values of p for which the inequality holds, we can approximate the largest p for which it holds. Then Eq. (C10) tells us that once we establish such an inequality, the Rényi-2 divergence will contract by a rate that is independent of the system size. The take-home message of Eq. (C10) is essentially the same as that of Eq. (C7). As long as $\|\sigma^{-1}\|_\infty = \mathcal{O}(1)$, the Rényi-2 divergence will contract with a constant rate. This corresponds to the setting in which each local fixed point does not have a purity scaling with system size.
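The binary search mentioned above can be sketched generically. The code below bisects for the largest p at which a condition of the form "operator norm at most one" holds, assuming the condition is true up to some threshold and false beyond it. The operator family `p * B` is a purely hypothetical stand-in chosen so that the answer is known; it is not the operator appearing in Eq. (C9), and the function names are illustrative.

```python
import numpy as np

def largest_feasible_p(holds, lo=0.0, hi=1.0, tol=1e-9):
    """Bisection for the largest p in [lo, hi] with holds(p) True.

    Assumes holds(p) is True for all p up to some threshold and False beyond it.
    """
    if not holds(lo):
        raise ValueError("the condition fails already at the lower end")
    if holds(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if holds(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical stand-in operator: the norm condition holds exactly for p <= 1/2.
B = np.array([[2.0, 0.0], [0.0, 1.0]])

def toy_condition(p):
    """Check whether the operator (spectral) norm of p * B is at most one."""
    return np.linalg.norm(p * B, 2) <= 1.0

p_star = largest_feasible_p(toy_condition)
print(p_star)
```

Each step costs one operator-norm evaluation, so the number of evaluations grows only logarithmically with the desired precision in p.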

Specializing Lemma C.1 to QAOA and quantum annealing
In the main text we only considered quantum circuits that are affected by unital noise. The reason for that is that then one can use Lemma C.1 to obtain the exponential decay of the relative entropy to the maximally mixed state independently of the circuit that is being implemented. However, it is still possible to obtain closed formulas for the relative entropy decay for QAOA-like circuits, as we will show now. We still start from the assumption that the noise affecting the device has a product state $\sigma_q = \bigotimes_{i=1}^{n} \tau_q$ as its fixed point, with
$$\tau_q = q|0\rangle\langle 0| + (1-q)|1\rangle\langle 1| \qquad (C11)$$
for some q ∈ [0, 1].
Recall that for $H_I$ the Ising Hamiltonian whose energy we wish to minimize, $H_X = -\sum_i X_i$ and $\gamma, \beta \in \mathbb{R}^P$, the QAOA unitary is given by:
$$V_{\gamma,\beta} = \prod_{k=1}^{P} e^{i\beta_k H_X} e^{i\gamma_k H_I}.$$
In order to obtain an estimate of the relative entropy decay under a noisy version of this circuit, we need to analyse the expressions: We then have:
Lemma C.3. Let $\beta, \gamma \in \mathbb{R}^P$ be given and for q ∈ [0, 1] $\sigma_q$ as in Eq. (C11). Moreover, for $\beta_k$, q define $z(\beta_k, q)$ as Then:
Proof. As both $e^{i\beta_k H_X}$ and σ are of tensor product form, we obtain by the additivity of the max relative entropy that A simple yet tedious computation shows that: Taking the logarithm yields the claim.
Before we state the entropy decay we obtain for QAOA circuits, let us briefly comment on the scaling of Eq. (C13). First, note that in either of the limits $q \to \tfrac{1}{2}$ or $\beta_k \to 0$ the r.h.s. of Eq. (C13) goes to 0. The first case corresponds to the fixed point being the maximally mixed state, and the second corresponds to mixer unitaries for which the total time evolution is small.
On the other hand, if we let q → 0 or q → 1, then we see that the r.h.s. of Eq. (C13) goes to infinity. We then have:
Corollary C.1 (Relative entropy decay for QAOA). Let $\beta, \gamma \in \mathbb{R}^P$ be given and $\tau_q$ and z defined as before. Moreover, let $\mathcal{N}$ be such that Then for any initial state ρ we have: Furthermore, for the case of $\rho = |+\rangle\langle +|^{\otimes n}$ and α = 2, we have
Proof. The first step is to observe that $D_\infty(e^{i\gamma_k H_I} \sigma_q e^{-i\gamma_k H_I} \| \sigma_q) = 0$. This follows from the fact that $e^{i\gamma_k H_I}$ is a diagonal unitary and, thus, commutes with $\sigma_q$. The claim then follows from combining Lemma C.1 and the result of Lemma C.3. To obtain the expression in Eq. (C17), note that $D_2$ tensorizes and the two underlying states are product. Thus, we only need to compute $D_2(|+\rangle\langle +| \,\|\, \tau_q)$, a simple computation.
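The single-qubit quantity $D_2(|+\rangle\langle +| \,\|\, \tau_q)$ can be worked out explicitly. For a pure state $|\psi\rangle$ and full-rank σ, the definition $D_2(\rho\|\sigma) = \log\operatorname{tr}[(\sigma^{-1/4}\rho\,\sigma^{-1/4})^2]$ reduces to $2\log\langle\psi|\sigma^{-1/2}|\psi\rangle$, which for $|+\rangle$ and $\tau_q$ gives $2\log[(q^{-1/2} + (1-q)^{-1/2})/2]$. The sketch below checks this closed form numerically. It is our own evaluation with the natural logarithm and illustrative function names, and its form may differ from the expression in Eq. (C17).

```python
import numpy as np

def renyi_2(rho, sigma):
    """Sandwiched Renyi-2 divergence log tr[(sigma^{-1/4} rho sigma^{-1/4})^2], full-rank sigma."""
    values, vectors = np.linalg.eigh(sigma)
    s = (vectors * values ** -0.25) @ vectors.conj().T
    inner = s @ rho @ s
    return float(np.log(np.trace(inner @ inner).real))

def renyi_2_plus_vs_tau(q):
    """Closed form 2 log[(q^{-1/2} + (1-q)^{-1/2}) / 2] for rho = |+><+| and sigma = tau_q."""
    return 2.0 * np.log(0.5 * (q ** -0.5 + (1.0 - q) ** -0.5))

plus = np.full((2, 2), 0.5)  # |+><+|
for q in (0.5, 0.3, 0.1):
    tau = np.diag([q, 1.0 - q])
    print(q, renyi_2(plus, tau), renyi_2_plus_vs_tau(q))
```

For q = 1/2 the value is log 2, and it grows without bound as q → 0, in line with the observation that the bounds degrade as the fixed point becomes purer. Since $D_2$ is additive on product states, the n-qubit value is n times the single-qubit one.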
From our previous discussion, it is straightforward to identify the conditions under which Eq. (C16) converges to 0 as $P \to \infty$. The first is the case $q = \frac{1}{2}$, which corresponds to unital noise and which we already covered at length in the main text. The second is whenever $\beta_k \to 0$ as $k \to \infty$. This is because the relative entropy terms in Eq. (C16) at depth $k$ are suppressed by $(1 - p_\alpha)^{2(P-k)}$. Thus, the relative entropy is not suppressed only at depths $k$ close to $P$.
Interestingly, parameters $\beta, \gamma$ for which QAOA is expected to perform well fulfill this condition [8,33]. To see this, it is fruitful to interpret QAOA as a Trotterized version of quantum annealing, where we start with the Hamiltonian $H_X$ and adiabatically modify it to $H_I$. It is then clear that at late times of the computation, the Hamiltonian will approximate $H_I$ and the fixed point of the noise will be approximately preserved by the unitary evolution.
We can make this precise by deriving the analogous version of Corollary C.1 for quantum annealing:
Proposition C.1. Let $\mathcal{L} : \mathcal{B}(\mathcal{H}_V) \to \mathcal{B}(\mathcal{H}_V)$ be a Lindbladian with fixed point $\sigma_q$ defined as before with $q \geq \frac{1}{2}$. Suppose that for some $\alpha > 1$ we have for all $t > 0$ and initial states that there is an $r_\alpha > 0$ such that:
Moreover, for functions $f, g : [0, 1] \to \mathbb{R}$ and $T > 0$ let $\mathcal{H}_t : \mathcal{B}(\mathcal{H}_V) \to \mathcal{B}(\mathcal{H}_V)$ be given by $\mathcal{H}_t(X) = i[X, f(t/T) H_X + g(t/T) H_I]$. Let $\mathcal{T}_t$ be the evolution of the system under the Lindbladian $\mathcal{S}_t = \mathcal{L} + \mathcal{H}_t$ from time 0 to $t \leq T$. Then for all states $\rho$:
Proof. From Lemma C.2 we see that all we need to obtain the claim is to estimate
As before, because $[H_I, \sigma_q] = 0$, this simplifies to
where in the last step we applied a triangle inequality using $H_X = -\sum_i X_i$ and the fact that $\sigma_q = \bigotimes_{i=1}^{n} \tau_q$. The claim follows after noting that:
As adiabatic theorems require that $f(1) = 0$ to make sure that we observe a good overlap with the ground state [43], it follows that the Rényi entropy will typically decay to 0 even under nonunital noise for quantum annealers. However, note once again that our bounds perform poorly whenever the fixed point is close to pure and whenever the function $f$ does not decay fast enough to 0 around 1.

QAOA and quantum annealing on random regular graphs of high girth
In the previous section we established estimates on the relative entropy decay of QAOA circuits (Corollary C.1) and quantum annealers (Proposition C.1) under non-unital noise. Such estimates can then be combined with Theorem IV.1 to obtain concentration inequalities for the outputs of these circuits. One important caveat is that Corollary C.1 and Proposition C.1 depend on the actual circuit being implemented. Thus, we cannot give universal bounds on the performance of such circuits that depend only on the depth and the noise level, as was the case for unital noise.
However, Corollary C.1 can still be readily applied for a given choice of QAOA parameters, and we will exemplify the performance of the bounds for QAOA on the Max-Cut of random $D$-regular graphs under noise. There are several motivations for studying this particular class of instances. First, the asymptotic values of both the ground-state energy and the standard SDP relaxation are known. It is known [26,67] that for the Ising model on a random $D$-regular graph on $n$ nodes the ground-state energy density scales like $-\Pi_* \sqrt{D}$ to leading order in $D$, with $\Pi_* = 0.763166\ldots$ the Parisi constant. The value that assumption-free efficient classical algorithms [79] achieve is given by $-\frac{2}{\pi}\sqrt{D}$, with $2/\pi \approx 0.6366$. The fact that these values are known makes it straightforward to analyse at which energies the output of a noisy quantum algorithm will be outperformed by efficient classical algorithms.
Furthermore, there is a natural choice for the values of the QAOA parameters to pick for the circuit. Indeed, in [7] the authors computed the optimal parameters for QAOA on such graphs for depths up to $P = 17$. Note, however, that these are the optimal values as the system's size goes to infinity and in the absence of noise. Nevertheless, they provide a good testing ground for our bounds.
To start our analysis, note that if we define the one-qubit state $\tau_q = q|0\rangle\langle 0| + (1-q)|1\rangle\langle 1|$ as before and let $H_{I,D}$ be the Ising Hamiltonian on a $D$-regular graph, then we have:
$\mathrm{tr}\left[H_{I,D}\, \tau_q^{\otimes n}\right] = (1-2q)^2 \frac{nD}{2}$. (C21)
To see this, note that the expectation value of each $Z_i Z_j$ term will be $(1-2q)^2$ and the graph is assumed to be $D$-regular. We then obtain Eq. (C21) by noting that there are $\frac{nD}{2}$ edges in the graph. As the expected value of the energy achieved by classical algorithms is $-2n\sqrt{D}/\pi$, quantum advantage is lost if we deviate by less than $(1-2q)^2 \frac{nD}{2} + \frac{2n\sqrt{D}}{\pi}$ from the mean under $\tau_q^{\otimes n}$. We then have:
Proposition C.2. Let $\rho$ be a quantum state on $n$ qubits and assume that for some $q \in (0, 1)$, $\epsilon > 0$ and $D > 0$ we have
Then the probability that the outcome of measuring $\rho$ in the computational basis provides a lower energy than efficient classical algorithms for Max-Cut on random $D$-regular high-girth graphs is at most $e^{-\epsilon^2 n}$.
Proof. Note that we have $\|H_{I,D}\|_{\mathrm{Lip}} = D$, as the graph is $D$-regular. By Theorem IV.1 we have:
By our previous discussion, we know that the output needs to deviate from the mean at the state $\tau_q^{\otimes n}$ by at least $(1-2q)^2 \frac{nD}{2} + \frac{2n\sqrt{D}}{\pi}$; otherwise quantum advantage is lost.
In [8, Table 4] the authors give optimal parameters that in the noiseless case outperform known efficient classical algorithms. We can then insert these parameters into the bound obtained in Corollary C.1 to estimate at which noise levels advantage is lost.
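The energy scales entering this comparison are simple to evaluate. The sketch below computes the mean energy of the fixed point, $(1-2q)^2 nD/2$, the classical benchmark $-2n\sqrt{D}/\pi$, and the deviation that separates the two, using only quantities stated above; the function names are illustrative choices and not part of the original analysis.

```python
import math


def mean_energy_fixed_point(n: int, degree: int, q: float) -> float:
    """Energy of tau_q^(tensor n) for the Ising Hamiltonian on a D-regular graph.

    Each of the n*D/2 edges contributes (1 - 2q)^2.
    """
    return (1.0 - 2.0 * q) ** 2 * n * degree / 2.0


def classical_energy(n: int, degree: int) -> float:
    """Energy -2 n sqrt(D) / pi quoted above for efficient classical algorithms."""
    return -2.0 * n * math.sqrt(degree) / math.pi


def required_deviation(n: int, degree: int, q: float) -> float:
    """Gap between the fixed-point mean and the classical benchmark.

    An output must fall at least this far below the fixed-point mean to beat
    the classical value.
    """
    return mean_energy_fixed_point(n, degree, q) - classical_energy(n, degree)
```

For example, `required_deviation(100, 4, 0.5)` reduces to the classical term alone, since the fixed-point mean vanishes at $q = \frac{1}{2}$. For any other $q$, the first term grows linearly in $nD$.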
It is important to stress once again that these parameters are only known to be optimal in the absence of noise and in the limit of the number of nodes and the degree going to infinity. However, we believe that they still provide a natural choice of parameters to analyse under noise. Importantly, note that Eq. (C24) once again only requires the relative entropy to contract by a constant factor before advantage is lost, as long as $q \neq \frac{1}{2}$. In Figure 2 of the main text we plot the performance of QAOA with the parameters for $P = 17$ as predicted by our bounds. In the absence of noise these QAOA circuits outperform efficient classical algorithms, but we show that this is not necessarily the case in the presence of noise. Note that the values of $\gamma_i$ are irrelevant for the analysis. The values of $\beta_i$ we used are
$\beta$ = [0.6375, 0.5197, 0.4697, 0.4499, 0.4255, 0.4054, 0.3832, 0.3603, 0.3358, 0.3092, 0.2807, 0.2501, 0.2171, 0.1816, 0.1426, 0.1001, 0.0536]. (C25)
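The qualitative condition discussed earlier, namely that the mixer angles shrink towards the end of the circuit while early layers are suppressed by $(1 - p_\alpha)^{2(P-k)}$, can be inspected directly on the schedule in Eq. (C25). The following sketch only checks monotonicity and tabulates the suppression weights. The parameter `p` is a hypothetical stand-in for $p_\alpha$, and the code does not evaluate the bound of Corollary C.1 itself.

```python
# Mixer angles beta_1, ..., beta_17 from Eq. (C25).
BETAS = [0.6375, 0.5197, 0.4697, 0.4499, 0.4255, 0.4054, 0.3832, 0.3603,
         0.3358, 0.3092, 0.2807, 0.2501, 0.2171, 0.1816, 0.1426, 0.1001,
         0.0536]


def is_strictly_decreasing(values):
    """True if every entry is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


def suppression_weights(p, depth):
    """Weights (1 - p)^(2 (P - k)) for layers k = 1, ..., P.

    p is an ASSUMED placeholder for the contraction parameter p_alpha;
    no particular value is suggested by the text.
    """
    return [(1.0 - p) ** (2 * (depth - k)) for k in range(1, depth + 1)]
```

With these definitions, `is_strictly_decreasing(BETAS)` holds for the listed angles, and `suppression_weights(p, 17)` increases towards the last layer, where the weight is exactly 1. This matches the observation that only the layers with $k$ close to $P$, where $\beta_k$ is smallest, contribute without suppression.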

Computations required for section VI D
In this subsection we collect some auxiliary computations required to arrive at the conclusion of the example discussed in section VI D. Our goal is to evaluate the formula in Eq. (C19) for the case where the initial state is given by $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and the annealing schedule is linear, $f(t) = 1 - t$. Furthermore, for simplicity, we will assume that the local Lindbladians $\mathcal{L}_i$ have as fixed point the state $\tau_q$ for $q \leq \frac{1}{2}$ and spectral gap $\lambda = 1$. The results can then be easily rescaled to obtain the bounds for other values of the spectral gap.
The first observation we make is that under these assumptions Eq. (C8) implies that:
Furthermore, a simple yet tedious calculation shows that:
and the integral in Eq. (C19) evaluates to
Putting all of these elements together we obtain the bound
where $r_2$ is lower-bounded in Eq. (C26). Furthermore, by combining [65, Theorem 2] and [25, Theorem 7] we conclude that for $\tau_q$ and $O$ satisfying $[O, \tau_q^{\otimes n}] = 0$ we have:
(C29)
The one-sided bound also holds without the prefactor 2. Now that we have a contraction result for the Rényi divergence and a concentration inequality for the fixed point of the noise, it is straightforward to also obtain concentration bounds for the output of the noisy quantum annealer with Theorem IV.1. Indeed, we conclude that:
Proposition C.3. For $0 < q \leq \frac{1}{2}$ and $T > 0$ let r 2 = 2 1 − q log(q −1 ) . (C30)
and $h(T)$ be
where (E4) follows from (10). Therefore, by the triangle inequality, for any two states $\rho_1, \rho_2$ with corresponding densities $X_j = \sigma^{-1/2} \rho_j \sigma^{-1/2}$:
The proof of (E2) is standard [55]: denote by $\nu_A$, resp. $\nu_B$, the probability measures
Next, we consider the noisy circuit introduced in (9). For any noisy gate $N_{\ell,e}$, we denote by $\sigma_{\ell,e'}$ the environment state of the copy $e'$ of the set $e$, and by $U_{\ell,\{e,e'\}}$ the unitary dilation of $N_{\ell,e}$ acting on the set $e$ and its copy $e'$, so that $N_{\ell,e}(\rho) = \mathrm{tr}_{e'}\left[U_{\ell,\{e,e'\}}(\rho \otimes \sigma_{\ell,e'})\right]$.
We also denote by $U_{VA}$ the composition of the tensor products of dilations $U_{\ell,\{e,e'\}}$, where the system $A$ represents the total environment resulting from all the dilations previously defined. In other words, defining $\sigma_A := \bigotimes_{\ell,e} \sigma_{\ell,e'}$, we have
We denote by $I_{U_{VA}}$ the light-cone of $U_{VA}$ with respect to the decomposition
Proof. From [65], the Wasserstein distance $W_1$ arises from a norm $\|\cdot\|_{W_1}$, i.e. $W_1(\rho, \sigma) = \|\rho - \sigma\|_{W_1}$. Moreover, the norm $\|\cdot\|_{W_1}$ is uniquely determined by its unit ball, which in turn is the convex hull of the set of differences between pairs of neighboring quantum states: $\mathcal{N}_n = \bigcup_{i=1}^{n} \mathcal{N}_n^{(i)}$, $\mathcal{N}_n^{(i)} = \{\rho - \sigma : \rho, \sigma \in S_V,\ \mathrm{tr}_i(\rho) = \mathrm{tr}_i(\sigma)\}$.
Now by convexity, the contraction coefficient for this norm is equal to
Let then $X \in \mathcal{N}_n$. By the expression (G1), and choosing without loss of generality an ordering of the vertices such that $\mathrm{tr}_1(X) = 0$, we have
where $\mu$ denotes the Haar measure on one qudit, and where (1) follows from the fact that $\mathrm{tr}_1(X) = 0$, with $U_{\{i-k, \ldots, n\}}(t)$ defined as in Theorem III.2 with $k < i - 1$. Next, by the variational formulation of the trace distance and Theorem III.2, we have for $i \geq i_0$ that
For any $O \in \mathcal{O}_V$ and any product state $\rho \in S_V$ with $\rho_{\mathrm{out}} := N_V(\rho)$:
$\mathrm{Var}_{\rho_{\mathrm{out}}}(O) \leq 4 \|O\|_L^2 \left( |V|\, I_{N_V}^2 + \max_{\ell} |E_\ell| \sum_{\ell=1}^{L} \max_{e \in E_\ell} I(e, L - \ell)^2 \right)$,
where given a set $e \in E_\ell$ and $m \in \mathbb{N}$, $I(e, L - \ell)$ denotes the set of all vertices in $V$ in the light-cone of the set $e$ for the circuit constituted of the last $L - \ell$ layers of $N_V$.
Remark F.1. In the noiseless setting where there are no ancilla systems, by a closer look into the proof below, we can get rid of the sum over layers and hence recover the bound in Proposition III.2.
Proof. Given the tensor product input state $\rho$ and for any $O \in \mathcal{O}_V$, we consider the variance
$\mathrm{Var}_{\rho_{\mathrm{out}}}(O) = \mathrm{tr}\left[ N_V(\rho) \left(O - \mathrm{tr}[N_V(\rho) O]\, I\right)^2 \right] = \mathrm{tr}\left[ (\rho \otimes \sigma_A)\, U_{VA}^\dagger\left( \left(O - \mathrm{tr}[(\rho \otimes \sigma_A)\, U_{VA}^\dagger(O)]\, I\right)^2 \right) \right] = \mathrm{Var}_{\rho \otimes \sigma_A}\left( U_{VA}^\dagger(O) \right)$.
Proposition G.1. Assume that the continuous-time evolution $\{U_V(t)\}_{t \geq 0}$ defined on the graph $G = (V, E)$ with $|V| = n$ satisfies the bound in Theorem III.2. Then, for any $H \in \mathcal{O}_V$,
‖U V (t) † (H)‖ L ≤ 2(i 0 − 1) δ−1 e vt−d(i) ‖H‖ L , (G2)
where $\mathrm{dist}(\{1\}, \{i, \ldots, n\}) \equiv d(i)$, and $i_0$ stands for the first vertex such that $d(i_0) \geq 2\delta - 1$.