Sparse Random Hamiltonians Are Quantumly Easy

A candidate application for quantum computers is to simulate the low-temperature properties of quantum systems. For this task, there is a well-studied quantum algorithm that performs quantum phase estimation on an initial trial state that has a non-negligible overlap with a low-energy state. However, it is notoriously hard to give theoretical guarantees that such a trial state can be prepared efficiently. Moreover, the heuristic proposals that are currently available, such as adiabatic state preparation, appear insufficient in practical cases. This paper shows that, for most random sparse Hamiltonians, the maximally mixed state is a sufficiently good trial state, and phase estimation efficiently prepares states with energy arbitrarily close to the ground energy. Furthermore, any low-energy state must have non-negligible quantum circuit complexity, suggesting that low-energy states are classically nontrivial and phase estimation is the optimal method for preparing such states (up to polynomial factors). These statements hold for two models of random Hamiltonians: (i) a sum of random signed Pauli strings and (ii) a random signed d-sparse Hamiltonian. The main technical argument is based on some new results in nonasymptotic random matrix theory. In particular, a refined concentration bound for the spectral density is required to obtain complexity guarantees for these random Hamiltonians.


I. INTRODUCTION
What are quantum computers good at? The earliest (and still most compelling) candidates are factoring [52] and simulation of quantum systems [24,39,40]. While Shor's celebrated quantum algorithm for factoring [52] settles the quantum complexity of factoring, the complexity of quantum simulation at low energies has not been resolved. Indeed, we know from complexity theory that the ground energy problem for general local Hamiltonians is QMA-hard in the worst case [35], so we anticipate that preparing ground states is generally intractable, even for quantum computers. This worst-case hardness persists for systems with additional physical constraints, including nearest-neighbor interaction in 1D [1] and translation invariance [29]. Although these results send a pessimistic signal, they merely indicate that any proof that the ground-energy problem is quantumly easy must further constrain the class of Hamiltonians, or else it can apply only for typical instances. Indeed, one may construct random families of Hamiltonians (Section I A) in the hope that their average-case complexity might be more favorable than the worst case.
Aside from complexity theory, the problem of preparing low-energy states arises in efforts to apply quantum computers to computational chemistry (for example, see [44]) and to condensed matter physics. To prepare a state of sufficiently low energy on a quantum computer, which can be used, e.g., for understanding chemical reaction pathways, a proposed quantum algorithm simply runs phase estimation on an initial trial state [6,16,37,58]. This method is efficient if the initial state has a nonnegligible overlap with a low-energy state. Although the phase estimation part of the algorithm is well understood, we have an incomplete understanding of the time required to prepare a good initial state. In fact, recent numerical tests [38] suggest that, for some chemical systems, easily preparable initial states may have exponentially small (in system size) overlap. Moreover, preparing states with good overlap using the adiabatic algorithm may take exponential time, significantly impacting the end-to-end performance of the proposed quantum algorithm.
The search for tasks that are easy for quantum computers, in quantum chemistry or otherwise, is often implicitly a quest for quantum advantage: quantum computers can be particularly helpful if the task is also classically hard. Unfortunately, proving classical hardness is challenging, and many once-promising candidates for classically hard problems have now been dequantized. For example, under certain classical access models, recent progress eliminates exponential quantum advantage in low-rank linear algebra tasks [19,27,53,54]. Still, hope remains that the Hamiltonian low-energy problem could provide a quantum advantage [26]. With these thoughts in mind, the guiding question of this work is the following.
"Is there a classically nontrivial Hamiltonian whose low-energy states are provably easy to prepare?" Our work argues in the affirmative. In particular, we will show that the textbook phase estimation method (discussed above) works well for preparing low-energy states of typical random sparse Hamiltonians. Meanwhile, the low-energy states must have a large quantum circuit complexity, so they are plausibly nontrivial for classical computers to simulate.

arXiv:2302.03394v1 [quant-ph] 7 Feb 2023
The paper is organized as follows. First, we review relevant classes of Hamiltonians (Section I A) before presenting the main Hamiltonian model and the main results (Section II). Our proof strategy (Section III) exploits tools from nonasymptotic random matrix theory; Section IV contains further details and context. Last, we discuss the classical complexity of the low-energy problem, and we lay out future research directions in the search for quantum advantage (Section V).

A. Related Models
Before we give a statement of our main results, let us discuss how some familiar models fall short of answering our question.We focus on random ensembles where quantitative statements are available.
• The few-body Pauli models [22]. As a natural generalization of the classical spin glasses (e.g., the Sherrington-Kirkpatrick (SK) model [51]), one replaces classical (commuting) constraints with noncommuting Pauli operators. A representative is the ensemble of Hamiltonians given by random linear combinations of few-body products of Pauli operators, where σ^x_i, σ^y_i, σ^z_i denote the Pauli operators on qubit i. Heuristically, this model exhibits spin glass behavior at low temperatures [8], which suggests that finding low-energy states could be hard, even for quantum computers.¹ In the high-temperature regime, this model becomes classically easy: there exists an efficient algorithm that outputs a product state approximating the operator norm of the Hamiltonian H to within a constant ratio [31]. We do not know whether there is a temperature range where the state remains quantumly easy but classically hard, nor do we know how to attack this question.
• The SYK model. Nonrigorous arguments rooted in physics suggest that the Sachdev-Ye-Kitaev (SYK) model remains chaotic (instead of becoming a spin glass) at very low temperatures [8,30,43]. If true, this is a strong hint that the SYK model answers our question in the affirmative.² Unfortunately, it is challenging to sharpen the physics arguments into actual proofs.
The only rigorous statement known to us is the recent work of Hastings and O'Donnell [32], which showed that an efficient quantum algorithm can prepare a low-energy witness achieving a constant-ratio approximation of the ground energy. An extension of this result to arbitrarily low energies would also answer our question. Right now, we do not know any analytic method suitable for the low-temperature regime of the SYK model.
• The Gaussian unitary ensemble (GUE). This nonlocal and nonsparse Hamiltonian seems unphysical, but it nevertheless served as a mathematical model for heavy nuclei [45]. Together with other random matrix ensembles (see [3] for a textbook introduction), the GUE provides a useful model for strongly interacting systems and for quantum information problems thanks to its well-established properties [21]. Since the Hamiltonian itself has exponentially many degrees of freedom, a counting argument shows that the computational complexity (of low-energy state preparation or Hamiltonian evolution) is exponential, $e^{\Omega(n)}$, with high probability, even for quantum computers; this is at most polynomially faster than exact diagonalization.
The ensemble we study in this work shares the nice properties of the few-body Pauli and SYK models: it is sparse, and instances can be specified efficiently. At the same time, as for the GUE, we can accurately approximate the minimal energy and the density of states. Moreover, we show that polynomial-size quantum circuits are necessary and sufficient to generate low-energy states, up to arbitrarily good approximation ratios of the ground state energy. The main downside of our ensemble is that, like the GUE, it is nonlocal, so it does not closely resemble the Hamiltonians that readily appear in nature.

II. MAIN RESULTS
Now, let us present the model for which we will establish average-case quantum complexity for low-energy states. Consider an independent sum of a few random Pauli strings with random sign coefficients:
$$ H_{PS} := \sum_{j=1}^{m} \frac{r_j}{\sqrt{m}}\,\sigma_j, \qquad \sigma_j \overset{\text{i.i.d.}}{\sim} \mathrm{unif}\{I, \sigma^x, \sigma^y, \sigma^z\}^{\otimes n}, \qquad r_j \overset{\text{i.i.d.}}{\sim} \mathrm{unif}\{+1, -1\}. \tag{2} $$
The parameter m will be polynomial in the number of qubits n, rather than exponential as in the GUE model. Our main technical results show that its low-energy states enjoy two-sided bounds on circuit complexity.
Theorem II.1 (Low-energy states have low complexity). For any accuracy $\epsilon \ge 2^{-n/c_1}$, let $H_{PS}$ be drawn from the Pauli string ensemble (2) with $m = c_2 n^5/\epsilon^4$ terms. Then, the following statement holds with probability at least $1 - e^{-c_3 n^{1/3}}$ over a random draw of $H_{PS}$ from the Pauli string ensemble. We can prepare a low-energy state ρ such that
$$ \operatorname{Tr}(H_{PS}\,\rho) \le (1 - \epsilon)\,\lambda_{\min}(H_{PS}) \tag{3} $$
using a circuit of size $G = \mathrm{Poly}(n, \epsilon^{-1})$. The quantities $c_1$, $c_2$, and $c_3$ are absolute constants, and $\lambda_{\min}(H_{PS})$ denotes the smallest eigenvalue of $H_{PS}$ (which is typically negative).
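To make the ensemble (2) concrete, here is a minimal NumPy sketch (our own illustration, not code from this work) that samples $H_{PS}$ as a dense $2^n \times 2^n$ matrix. The helper names `pauli_string` and `sample_pauli_string_ensemble` are hypothetical. Each term consumes 2n random bits for the Pauli labels and one bit for the sign, which is where a count of m(2n + 1) random bits per instance comes from.

```python
import numpy as np
from functools import reduce

# Single-qubit Pauli matrices, indexed 0 = I, 1 = X, 2 = Y, 3 = Z.
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def pauli_string(labels):
    """Dense matrix of the Pauli string with per-qubit labels in {0, 1, 2, 3}."""
    return reduce(np.kron, (PAULIS[a] for a in labels))


def sample_pauli_string_ensemble(n, m, rng):
    """Dense H = sum_j r_j sigma_j / sqrt(m) with uniform Pauli strings and uniform signs."""
    labels = rng.integers(0, 4, size=(m, n))  # 2 random bits per qubit per term
    signs = rng.choice([-1.0, 1.0], size=m)   # 1 random sign bit per term
    H = np.zeros((2**n, 2**n), dtype=complex)
    for r, lab in zip(signs, labels):
        H += r * pauli_string(lab)
    return H / np.sqrt(m)


rng = np.random.default_rng(0)
H = sample_pauli_string_ensemble(n=8, m=200, rng=rng)
evals = np.linalg.eigvalsh(H)
print(evals.min(), evals.max())  # compare with the semicircle edges -2 and +2
```

Dense storage restricts this sketch to small n (say n ≤ 10); it is meant for inspecting spectra, not for anything resembling the quantum algorithm itself.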
See Appendix C 2 for the proof of Theorem II.1. Here, we elaborate on the interesting complexity aspects of this problem:
• Arbitrarily good approximation of the ground energy. For any polynomially small error $\epsilon \sim \mathrm{Poly}(n^{-1})$, there is a polynomially large choice m = Poly(n) for which a state that ε-approximates the ground energy can be prepared efficiently, at gate complexity G = Poly(n). We hypothesize that the order of quantifiers can be exchanged, which would imply for large enough m = Poly(n) that the low-energy states remain easy for any $\epsilon = \mathrm{Poly}(n^{-1})$. For further discussions, see point 2 in Section V.
• Phase estimation works. As we will show, the quantum algorithm that produces the low-energy state ρ is very simple. Performing phase estimation on the maximally mixed state has a decent chance, at least $\Omega(\epsilon^{3/2})$, of returning a low-energy state obeying (3). A higher success probability is achieved by repeating the phase estimation step. The Hamiltonian simulation costs at most $\mathrm{Poly}(m, \epsilon^{-1})$ gates using off-the-shelf quantum simulation algorithms (e.g., Trotter [39] or qDRIFT [15]).
• End-to-end complexity. This Hamiltonian problem is oracle-free and input-state-free, giving a complete picture. Further, the model description is entirely classical, and an instance can be generated using only m(2n + 1) bits of randomness.
• Average-case.The statement holds with high probability over the Hamiltonian ensemble.Indeed, this model can produce an arbitrary local Hamiltonian in the worst case, and we have no control over those instances.
• Nonlocal, noncommuting Hamiltonians. Most Pauli strings in $\{I, \sigma^x, \sigma^y, \sigma^z\}^{\otimes n}$ act nontrivially on Θ(n) sites, and thus the Hamiltonian is nonlocal. The Hamiltonian is highly noncommutative, since two independent random Pauli strings anticommute with probability $(1 - 4^{-n})/2 \approx 1/2$. Intuitively, the Pauli string ensemble is closer to a random matrix than to a local Hamiltonian.
• Sparse matrices.From a linear algebra perspective, this model is a sparse, high-rank matrix (which has not been dequantized; see Section V).In general, a sparse matrix may not admit a simple Pauli decomposition; nevertheless, the same result extends to signed random d-sparse matrices (see Section IV B 1).However, the quantum easiness then requires access to a block encoding [28] of the Hamiltonian.
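The anticommutation probability quoted in the list above can be checked exactly for small n. The sketch below is our own illustration: it encodes single-qubit Paulis as 0 = I, 1 = X, 2 = Y, 3 = Z, uses the fact that two single-qubit Paulis anticommute exactly when both are non-identity and distinct, and counts anticommuting pairs of strings by brute force. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def sites_anticommute(a, b):
    """Single-qubit Paulis (0=I, 1=X, 2=Y, 3=Z) anticommute iff both are non-identity and differ."""
    return a != 0 and b != 0 and a != b


def strings_anticommute(s, t):
    """Two Pauli strings anticommute iff an odd number of their sites anticommute."""
    return sum(sites_anticommute(a, b) for a, b in zip(s, t)) % 2 == 1


def anticommute_probability(n):
    """Exact probability that two independent uniform n-qubit Pauli strings anticommute."""
    strings = list(product(range(4), repeat=n))
    hits = sum(strings_anticommute(s, t) for s in strings for t in strings)
    return Fraction(hits, len(strings) ** 2)


for n in range(1, 5):
    print(n, anticommute_probability(n))  # equals (1 - 4**-n) / 2
```

Per site, the anticommutation probability is 6/16 = 3/8, and a pair of strings anticommutes when an odd number of sites do, which gives $(1 - (1 - 2 \cdot 3/8)^n)/2 = (1 - 4^{-n})/2$.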
On the other hand, we argue this problem is "very quantum" by proving a lower bound on the complexity of preparing low-energy states.As a disclaimer, we do not prove classical hardness for state preparation (see Section V), which is an intriguing open problem that we leave for future work.
Theorem II.2 (Small circuits give bad energy). Fix a circuit architecture with G two-qubit gates (e.g., a 1D brickwork layout) acting on the initial state $|0\rangle$, and consider the family Circ(G) of all reachable states. For any $\epsilon_1 \ge 0$, suppose $m \le \epsilon_1^2 \cdot 2^n$. Then, with high probability over the random draw of the instance $H_{PS}$ from the Pauli string ensemble (2), every $|\phi\rangle \in \mathrm{Circ}(G)$ satisfies $\langle\phi| H_{PS} |\phi\rangle \ge \epsilon_1 \cdot \mathbb{E}\,\lambda_{\min}(H_{PS})$.
That is, every state $|\phi\rangle \in \mathrm{Circ}(G)$ parameterized by the circuit architecture fails to be a low-energy state. The notation õ(·) suppresses log(m) prefactors.

[Figure: spectral density ρ(E) versus energy E.]
See Appendix D for the proof of Theorem II.2. In other words, we very often need a large circuit to describe the low-energy states; they are highly entangled and far from product states.³ Further, our circuit-size lower bound uses a direct counting argument, and it suggests that the circuit must change from one random instance to another. Nevertheless, Theorem II.1 states the complementary result: an appropriate instance-dependent state can be prepared efficiently using the simplest quantum algorithms (Hamiltonian simulation and phase estimation).
The main caveat for our model is that it is nonlocal, unlike most physical Hamiltonians, and our argument is not immediately applicable to local Hamiltonians. Indeed, the spectral properties of the two types of models are different. As we will show, the Pauli string ensemble has a (compact) semicircular spectrum, while local Hamiltonians tend to have a tail in the spectrum.⁴ Phase estimation on the maximally mixed state would not be able to access the low-energy states deep in this low-probability tail. Of course, we hope our results ultimately inspire a better understanding of preparing the low-energy states of local Hamiltonians. For further discussions, see point 3 in Section V.
Regardless, from a linear algebra and algorithm perspective, random sparse matrices are natural models to study. We emphasize that the main goal is to give a transparent toy model showcasing what quantum computers are good at, especially in light of recent developments in dequantization.

III. PROOF IDEAS
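Given a general strongly interacting Hamiltonian, it seems daunting to control its behavior. However, certain random matrix ensembles are an exception, because their spectral properties are predictable. For example, it is well known that GUE matrices (see Appendix B) have a definite maximal eigenvalue and a semicircular spectral density ρ(x). In the normalization $\frac{1}{N}\mathbb{E}\operatorname{Tr} H^2 = 1$ shared with the Pauli string ensemble,
$$ \lambda_{\max}(H_{GUE}) \approx 2 \qquad \text{and} \qquad \rho(x) = \frac{1}{2\pi}\sqrt{4 - x^2} \quad \text{for } x \in [-2, 2] $$
(up to negligible fluctuation and with high probability).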
See Figure 1. Indeed, this fact alone hints that the low-energy states have a nonnegligible density, independent of the system size:
$$ \int_{-2}^{-2(1-\epsilon)} \rho(x)\, dx = \Omega(\epsilon^{3/2}). $$
In terms of complexity, directly running phase estimation on the maximally mixed state returns a low-energy state with decent probability, $\Omega(\epsilon^{3/2})$. The core of our argument is that the spectrum of the Pauli string ensemble (2) looks a lot like the spectrum of a GUE matrix (Figure 2). As a consequence, it is also "easy" to find the low-energy states of the Pauli string ensemble. How can we prove that the Pauli string ensemble also has a semicircular spectral density? The entire argument boils down to a universality principle: the Pauli string ensemble, at moderately large m, mimics "smooth" properties of the GUE ensemble, including the maximum eigenvalue and the coarse-grained spectral density.

[Figure 2: spectral density ρ(E) versus energy E. For a generic matrix, the phase estimation strategy would not find a low-energy state if there are spectral outliers or if the (fluctuating) spectral density becomes too small near the ground energy. (Left) Almost a semicircle. For the Pauli string ensemble, we control the ground energy by Schatten p-norms and the spectral density by the resolvent. Both quantities are comparable to those of the GUE, which has a favorable semicircular spectrum. Therefore, enough states remain near the ground energy, and phase estimation finds them efficiently.]
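The $\Omega(\epsilon^{3/2})$ success probability can be read off from the semicircle law. The sketch below is our own back-of-the-envelope check under two idealizations (assumptions, not claims from the text): phase estimation measures the energy exactly, and $\lambda_{\min} \approx -2$. Then the success probability on the maximally mixed state is the semicircle mass below $-2(1-\epsilon)$, whose small-ε behavior is $\frac{4\sqrt{2}}{3\pi}\epsilon^{3/2}$. The helper names are hypothetical.

```python
import numpy as np


def semicircle_cdf(x):
    """CDF of the semicircle density rho(x) = sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x**2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi


def low_energy_mass(eps):
    """Idealized success probability: semicircle mass below (1 - eps) * lambda_min, lambda_min = -2."""
    return semicircle_cdf(-2.0 * (1.0 - eps))


def empirical_low_energy_fraction(evals, eps):
    """Fraction of a sampled spectrum at or below (1 - eps) * min(evals); assumes min(evals) < 0."""
    evals = np.asarray(evals)
    return np.mean(evals <= (1.0 - eps) * evals.min())


for eps in (1e-1, 1e-2, 1e-3):
    exact = low_energy_mass(eps)
    leading = 4.0 * np.sqrt(2.0) / (3.0 * np.pi) * eps**1.5  # small-eps asymptotic
    print(eps, exact, exact / leading)  # the ratio approaches 1 as eps -> 0
```

Feeding `empirical_low_energy_fraction` the eigenvalues of a sampled Pauli string Hamiltonian gives a finite-size counterpart of the same quantity.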
The mathematical argument is based on techniques from nonasymptotic random matrix theory (Section IV).Several novel results are required to address some of the particular challenges that arise in the quantum information problem.
Our first main result states that the trace polynomial moments of the Pauli string ensemble almost coincide with the corresponding moments of the GUE. In particular, by choosing a large enough moment, we can also compare the spectral norms of the two matrices. Throughout this work, we consider the normalized p-norms
$$ |||A|||_p := \Big( \frac{1}{N} \operatorname{Tr} |A|^p \Big)^{1/p}, \qquad N = 2^n. \tag{4} $$
Theorem III.1 (p-norms and operator norm). Let p ∈ 2N be an even natural number. The random Pauli string ensemble (2) satisfies the norm bound (5). The symbol ≲ suppresses constant factors. Furthermore, for 0 ≤ ε ≤ 1/2 and $m \le 2^{2n}$, there exist constants …
See Appendix C 1 for the proof of Theorem III.1. For a fixed moment p that may depend on the number n of sites, the right-hand side of (5) decays with the number m of terms in the Hamiltonian. Applying Markov's inequality for p = Ω(log N) = Ω(n) and choosing m = Poly(n), we obtain a tail bound for the spectral norm, because $\|A\| \le N^{1/p}\,|||A|||_p$ and $N^{1/p} = O(1)$ when $p = \Omega(\log N)$.
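The reason for taking p = Ω(log N) can be seen numerically. For any Hermitian A, $N^{-1/p}\|A\| \le |||A|||_p \le \|A\|$, and $N^{-1/p} \ge e^{-1}$ once $p \ge \ln N$, so a bound on the normalized p-norm transfers to the operator norm up to a constant factor. The sketch below (our own; the test matrix is an arbitrary GUE-like sample, and `normalized_schatten` is a hypothetical helper) checks this sandwich.

```python
import numpy as np


def normalized_schatten(A, p):
    """|||A|||_p = ((1/N) Tr |A|^p)^(1/p) for Hermitian A, computed from its eigenvalues."""
    evals = np.linalg.eigvalsh(A)
    return np.mean(np.abs(evals) ** p) ** (1.0 / p)


rng = np.random.default_rng(1)
N = 256
G = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = (G + G.conj().T) / np.sqrt(4 * N)  # GUE-like sample with (1/N) E Tr A^2 = 1
op_norm = np.abs(np.linalg.eigvalsh(A)).max()

p = 2 * int(np.ceil(np.log(N) / 2))  # smallest even p with p >= ln N (here p = 6)
val = normalized_schatten(A, p)
# Sandwich: N^(-1/p) * ||A|| <= |||A|||_p <= ||A||, with N^(-1/p) >= 1/e.
print(p, N ** (-1.0 / p) * op_norm, val, op_norm)
```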
Comparing the spectral densities of the two ensembles requires a more difficult argument. Ideally, we would work with projectors $|\phi\rangle\langle\phi|$ onto Hamiltonian eigenstates. However, exact eigenstate projectors are tricky to handle. Instead, we consider the resolvent, which probes the "coarse-grained" energy projector at energies ω ± O(η). We define the resolvent $R_{\omega,\eta}(H)$ in (6), where 1 is the indicator function. We often suppress parameter dependencies by writing $R := R_{\omega,\eta}(H)$. For intuition, the resolvent is diagonal in the Hamiltonian basis and spikes at energy ω with width O(η). See Figure 3.
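As a sketch of what "diagonal in the Hamiltonian basis" buys us, assume the common convention $R = (H - (\omega + i\eta) I)^{-1}$ (an assumption for illustration; the exact normalization in (6) may differ). Then $\frac{1}{N}\operatorname{Tr}|R|^p = \frac{1}{N}\sum_E |E - \omega - i\eta|^{-p}$, so each eigenvalue E is weighted by a peak centered at ω. The NumPy code below, with hypothetical helper names, checks this identity for even p.

```python
import numpy as np


def resolvent(H, omega, eta):
    """Assumed convention for this sketch: R = (H - (omega + i*eta) I)^{-1}."""
    N = H.shape[0]
    return np.linalg.inv(H - (omega + 1j * eta) * np.eye(N))


def normalized_trace_abs_power(R, p):
    """(1/N) Tr |R|^p for even p, using |R|^p = (R^dagger R)^(p/2) (R is normal here)."""
    M = np.linalg.matrix_power(R.conj().T @ R, p // 2)
    return np.trace(M).real / R.shape[0]


def spectral_formula(H, omega, eta, p):
    """The same quantity from the eigenvalues E of H: mean of |E - omega - i*eta|^(-p)."""
    E = np.linalg.eigvalsh(H)
    return np.mean(np.abs(E - omega - 1j * eta) ** (-p))


rng = np.random.default_rng(2)
N = 64
G = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
H = (G + G.conj().T) / np.sqrt(4 * N)  # GUE-like test matrix
omega, eta, p = -1.5, 0.1, 4
R = resolvent(H, omega, eta)
print(normalized_trace_abs_power(R, p), spectral_formula(H, omega, eta, p))
```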

[Figure 3: spectral density ρ(E) versus energy E, together with the filter $|R|^p$, whose peak has width $O(\eta/\sqrt{p})$.]
However, if we are especially interested in the states near a particular energy ω, the resolvent is not localized enough, because the filter $E \mapsto 1/|E - \omega|$ decays too slowly as a function of the energy.⁵ Instead, we can take the trace of resolvent powers, so that the tail decays at the faster rate $\sim |E - \omega|^{-p}$. The energy window $\eta/\sqrt{p}$ is then roughly the range in which the weight $\eta^p |R|^p$ remains large, Ω(1).
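The $\eta/\sqrt{p}$ window follows from a one-line calculation. Under the assumed convention $R = (H - (\omega + i\eta)I)^{-1}$, an eigenvalue E contributes $\eta^p |E - \omega - i\eta|^{-p} = (1 + (E-\omega)^2/\eta^2)^{-p/2}$ to $\eta^p \operatorname{Tr}|R|^p$. This weight falls to 1/2 at $|E - \omega| = \eta\sqrt{2^{2/p} - 1} \approx \eta\sqrt{2\ln 2/p}$, while far from ω it decays like $(\eta/|E - \omega|)^p$. The sketch below tabulates the half-width; the helper names are hypothetical.

```python
import numpy as np


def filter_weight(E, omega, eta, p):
    """eta^p |E - omega - i*eta|^(-p): weight of eigenvalue E (assumed resolvent convention)."""
    return (1.0 + ((E - omega) / eta) ** 2) ** (-p / 2)


def half_width(eta, p):
    """Distance |E - omega| at which filter_weight drops to 1/2."""
    return eta * np.sqrt(2.0 ** (2.0 / p) - 1.0)


eta = 0.1
for p in (2, 8, 32, 128):
    hw = half_width(eta, p)
    print(p, hw, hw * np.sqrt(p) / eta)  # last column tends to sqrt(2 ln 2), about 1.177
```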
Theorem III.2 (Comparing the resolvent moments). Let p ∈ 2N be an even natural number. The resolvent (6) of the random Pauli string ensemble (2), written $R_{PS}$, compares with the resolvent $R_{GUE}$ of the GUE via the bound (7). The symbol ≲ suppresses absolute constants.
See Appendix A 2 for the proof of Theorem III.2 in a more general setting. For moderately large m (depending on the distance η from the real line and the power p), the formula (7) controls the expected spectral density, filtered by the resolvent. Since we want to make a statement that holds with high probability over realizations of the Pauli string ensemble, we also need to prove that the quantity $\operatorname{Tr}|R|^p$ concentrates near its expectation $\mathbb{E}\operatorname{Tr}|R|^p$ (i.e., the spectral density does not fluctuate too much); see Theorem A.1. Lastly, since individual resolvents probe the local density, we can probe the integrated spectral density by combining resolvents centered at consecutive energies. The abundance of low-energy states then implies that phase estimation succeeds with a decent chance.

IV. NEW RESULTS IN NONASYMPTOTIC RANDOM MATRIX THEORY
Our results for the Pauli string ensemble fall into the category of nonasymptotic universality laws for random matrices.This section provides some context for these results, as well as some details about the argument.
Asymptotic universality laws are among the celebrated classical achievements of random matrix theory (RMT). For example, Wigner showed that the semicircle law is the limiting spectral distribution of a (standardized) symmetric matrix with i.i.d. Rademacher entries above the diagonal. The universality law for the Wigner matrix states that the detailed distribution of the entries does not affect the limiting spectral distribution, provided the first four moments are bounded. Subsequently, researchers obtained nonasymptotic comparisons between the spectrum of a Wigner-type matrix and the semicircle distribution. For surveys, see the monographs [7,48].
Our approach depends on a nonasymptotic comparison between the spectrum of the Pauli string ensemble (2) and a GUE matrix, whose spectral distribution approximately follows a semicircle law. This type of result does not fall within the scope of classical universality laws because the Pauli string ensemble barely has any randomness, let alone independent entries. To implement our program, we first observe that the low-order moments of the Pauli string ensemble match the low-order moments of a GUE matrix. For a smooth statistic f of the random matrices, we can take advantage of this coincidence by means of Lindeberg's exchange principle. Each of the random matrix models can be expressed as a sum of i.i.d. random matrices, and we can interpolate between the two models by swapping one summand at a time. At each step, we can control the change between the two models by expanding f as a Taylor series to expose the polynomial moments. The terms in these expansions cancel through the third order, leaving a fourth-order error. Our argument is quite different from recent applications [17,36] of the Lindeberg principle in RMT.
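A toy calculation shows where the fourth-order error comes from. For $H = \sum_j r_j \sigma_j/\sqrt{m}$, the random signs kill every cross term of the second moment, so $\frac{1}{N}\mathbb{E}\operatorname{Tr}H^2 = 1$, exactly as for a GUE matrix normalized with $\mathbb{E}|G_{ij}|^2 = 1/N$. At fourth order, pairing the indices gives $\frac{1}{N}\mathbb{E}\operatorname{Tr}H^4 = 2 - 1/m + (1 - 1/m)4^{-n}$, whereas the GUE gives $2 + 4^{-n}$; the gap $-(1 + 4^{-n})/m$ vanishes as m grows. These closed forms are our own calculation, not taken from the text; the sketch below confirms the Pauli-side values by brute-force enumeration for tiny n and m (function names are hypothetical).

```python
import numpy as np
from functools import reduce
from itertools import product

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def all_pauli_strings(n):
    """Dense matrices of all 4**n Pauli strings on n qubits."""
    return [reduce(np.kron, (PAULIS[a] for a in labels))
            for labels in product(range(4), repeat=n)]


def exact_moment(n, m, k):
    """(1/N) E Tr H^k for H = sum_j r_j sigma_j / sqrt(m), averaging over all strings and signs."""
    N = 2**n
    strings = all_pauli_strings(n)
    total, count = 0.0, 0
    for choice in product(range(len(strings)), repeat=m):
        for signs in product((-1.0, 1.0), repeat=m):
            H = sum(r * strings[c] for r, c in zip(signs, choice)) / np.sqrt(m)
            total += np.trace(np.linalg.matrix_power(H, k)).real / N
            count += 1
    return total / count


def pauli_fourth_moment(n, m):
    return 2.0 - 1.0 / m + (1.0 - 1.0 / m) * 4.0 ** (-n)


def gue_fourth_moment(n):
    return 2.0 + 4.0 ** (-n)


for n, m in [(1, 2), (1, 3), (2, 2)]:
    print(n, m, exact_moment(n, m, 2), exact_moment(n, m, 4),
          pauli_fourth_moment(n, m), gue_fourth_moment(n))
```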
In more detail, we consider two random Hermitian matrices $H$ and $\tilde{H}$ that can be written as sums of independent, centered random matrices (all of the same dimension):
$$ H = \sum_{i=1}^{m} A_i \qquad \text{and} \qquad \tilde{H} = \sum_{i=1}^{m} \tilde{A}_i. \tag{8} $$
Although less familiar than the classical random matrix ensembles, the independent sum model is much more flexible and has a wide scope of applicability; see [55] for examples. Suppose that the low-order polynomial moments of the summands match. That is,
$$ \mathbb{E}\big[A_i^{\otimes k}\big] = \mathbb{E}\big[\tilde{A}_i^{\otimes k}\big] \qquad \text{for } k = 1, \dots, t \text{ and each } i. \tag{9} $$
For example, the first three moments of a random signed Pauli string match the first three moments of a GUE matrix. More generally, constructive models in quantum information theory can match an arbitrary number t of moments, similar to the case of a unitary t-design. Our work shows how to compare the spectral properties of models with many matching moments.
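For a single summand, the matching claim can be verified exactly. With $A = r\sigma$ (dropping the common $1/\sqrt{m}$ scaling), the first and third tensor moments vanish because $\mathbb{E}r = \mathbb{E}r^3 = 0$, and the second moment is the standard identity $\frac{1}{4^n}\sum_P P \otimes P = \mathrm{SWAP}/N$. A GUE matrix normalized so that $\mathbb{E}[G_{ij}G_{kl}] = \delta_{il}\delta_{jk}/N$ has the same $\mathbb{E}[G \otimes G] = \mathrm{SWAP}/N$ and, by the symmetry $G \mapsto -G$, vanishing odd moments. The sketch below (our own; helper names hypothetical) checks the Pauli side by enumeration.

```python
import numpy as np
from functools import reduce
from itertools import product

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def swap(N):
    """SWAP on C^N (x) C^N, mapping |i>|j> to |j>|i> (np.kron index i*N + j)."""
    S = np.zeros((N * N, N * N))
    for i in range(N):
        for j in range(N):
            S[j * N + i, i * N + j] = 1.0
    return S


def signed_pauli_tensor_moment(n, k):
    """E[(r sigma)^{(tensor k)}] over uniform n-qubit Pauli strings sigma and signs r = +-1."""
    N = 2**n
    total = np.zeros((N**k, N**k), dtype=complex)
    count = 0
    for labels in product(range(4), repeat=n):
        sigma = reduce(np.kron, (PAULIS[a] for a in labels))
        for r in (-1.0, 1.0):
            total += reduce(np.kron, [r * sigma] * k)
            count += 1
    return total / count


n = 2
N = 2**n
M1 = signed_pauli_tensor_moment(n, 1)
M2 = signed_pauli_tensor_moment(n, 2)
M3 = signed_pauli_tensor_moment(n, 3)
print(np.allclose(M1, 0), np.allclose(M2, swap(N) / N), np.allclose(M3, 0))
```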
Our first universality result compares the trace polynomial moments of the two random matrices.These results allow us to control the spectral norm of the random matrices.
Theorem IV.1 (Universality for moments). Consider two families $(A_i)$ and $(\tilde{A}_i)$ of independent random Hermitian matrices whose moments match (9) up to order t ≥ 2, and introduce the sums $H$ and $\tilde{H}$ as in (8). Define the statistics … for k ≥ 1 and p ≥ 2. Then, for each even natural number p ∈ 2N, we have the bounds (10). The p-norm is defined in (4).
The proof of Theorem IV.1 appears in Appendix A 1. Theorem III.1 follows when we instantiate this result for the Pauli string ensemble (2) and the GUE.
We can obtain simpler versions of this result if we pass to the uniform bound $L_{p,\infty} = \max_i \{|||A_i|||_p, |||\tilde{A}_i|||_p\}$ on the summands. For example, … Here, the symbol ≲ suppresses absolute constants only. Heuristically, we should think of p ≪ m, so matching more moments (i.e., increasing t) reduces the error.
The random matrices $H$ and $\tilde{H}$ are defined in (8).
Theorem IV.2 (Universality for resolvent moments).Instate the assumptions and notation of Theorem IV.1.
For each even natural number p ∈ 2N, the polynomial moments of the resolvent (11) are related by … The symbol ≲ suppresses constants depending only on t.
See Appendix A 2 for the proof of Theorem IV.2.We obtain Theorem III.2 by instantiating the result for the Pauli string ensemble and the GUE.
The resolvent moment comparison (Theorem IV.2) is not sufficient to guarantee that a random realization $H_{PS}$ of the Pauli string ensemble places significant density on the low-energy states. To achieve this goal, we must also show that $|||R_{PS}|||_p$ concentrates near its expected value. This claim requires a separate argument (Theorem A.1). The results on concentration of the trace moments of the resolvent are new.

A. Related work
The field of RMT has historically focused on asymptotic limit laws for the spectral density of matrices from the classical ensembles (Wigner, Wishart, Jacobi, etc.). In this setting, there has also been a significant amount of research on rates of convergence, and some of these results can be interpreted as nonasymptotic universality laws. For example, see Bai & Silverstein [7, Chap. 8].
In the last few years, researchers have recognized that the scope of the universality phenomenon extends well beyond the classical matrix ensembles.In particular, we have started to develop a deeper understanding of the independent sum model.Tropp obtained the first general result of this type [57].His theory covers a sum of independent Gaussian random matrices, and it provides conditions under which the polynomial moments approximate the moments of the semicircle distribution.Building on Tropp's work, Bandeira et al. [9] developed a method for comparing a sum of independent Gaussian random matrices with a free probability model, which can capture a wider range of spectral distributions.With some effort, the techniques from these two papers can likely be applied to the Gaussian variant of the Pauli string ensemble (2) to obtain results similar to our main theorems.
The most immediate precedent for our work is a recent preprint by Brailovskaya & van Handel [12]. Their paper compares an independent sum H of random matrices with an independent sum G of Gaussian random matrices, where corresponding summands share the same mean and covariance:
$$ H = \sum_i A_i, \qquad G = \sum_i G_i, \qquad \mathbb{E}\,G_i = \mathbb{E}\,A_i, \qquad \operatorname{Cov}(G_i) = \operatorname{Cov}(A_i), \tag{12} $$
where the $G_i$ are independent Gaussian random matrices. The main result of the paper [12] provides conditions to guarantee that the two random matrix models have similar polynomial moments and polynomial resolvent moments.
Theorem IV.3 (Universality of moments and resolvent moments [12]). Consider two random matrix models as in (12). Define the statistics … Then, for every even natural number p ∈ 2N, the polynomial moments and resolvent moments satisfy the bounds (13). The p-norm is defined in (4), and the symbol ≲ suppresses absolute constants.
The proof of this result uses a version of Stein's method, inspired by [42].The basic technique is to interpolate smoothly between the two random matrix models, preserving the second moments along the interpolation path.To control the derivative of a spectral function along the path, the authors use a cumulant expansion along with bounds on the higher derivatives of the function.
It is fruitful to compare the bounds (10) and (13). The variance parameter v in Theorem IV.3 is never larger than the variance parameter σ² in Theorem IV.1 because the norm is inside the sum in σ². The two quantities σ² and v coincide for i.i.d. sums, but they can differ by a factor as large as the ambient dimension N in general. The differences between the tail parameters ($L_{p,p}$, $L_{p,\infty}$, and $L_\infty$) are not an essential feature of the analysis; we have stated the simplest versions of the results, rather than the optimal versions.
On the other hand, the approach in Theorem IV.3 cannot provide more refined comparisons for random matrix models that match beyond the second moment (except perhaps when the third moments are identically zero).There are intrinsic reasons that continuous interpolation does not seem to extend beyond second moments (Appendix F).In contrast, the method based on Lindeberg exchange gracefully handles matching moments of any order.
As we will see (Section IV B), there are some natural settings where higher-order moments coincide.The resulting higher-order error bounds improve over the second-order bounds.In the setting of quantum information, we often need to take the moment parameter p ∼ log N ∼ n, so this improvement is significant.
In addition, our argument is conceptually and technically simpler than the approach based on Stein's method and cumulant expansions.As a consequence, it may be easier to extend to other settings, and it may have a different scope of application.Altogether, our work contributes to the emerging toolkit for nonasymptotic RMT.

B. Further examples
Our universality results apply to many different families of random matrix models, including examples that may not resemble the Gaussian models that are central to the comparison in [12].For quantum computing applications, these families could potentially capture realistic sparse matrices better than random Pauli string sums.However, they generally require access to an additional block-encoding, which we do not discuss in this work.

Comparing sparse matrices with GUE
In addition to the random Pauli string ensemble, we can describe another family of sparse random matrices that also matches the low moments of the GUE. Therefore, the universality results (Theorems IV.1 and IV.2) show that these models nearly follow a semicircular distribution.
Definition IV.1 (Permutations with complex signs). A random complex signed permutation matrix is the product of a uniformly random permutation matrix P and a diagonal matrix D with complex signs: ∼ {1, −1}.
Proposition IV.1 (Complex signed permutations). Consider random matrices A and Ã that take the form where Q is a complex signed permutation. For these models, the first three moments match: See Appendix E 1 for the calculation. One may also consider random real signed permutations,6 which match the first three moments of the Gaussian Orthogonal Ensemble (GOE).

Higher moments matching
Even though higher-moment matching examples are less common in the wild, we can describe several pairs of models that match up to arbitrarily high moments. The first example considers conjugating a fixed matrix by random unitaries: where $\tilde{U}_i \sim$ Haar; where $U_i \sim$ unitary t-design. Indeed, if we take the unitaries $U_i$ to be Clifford circuits (an exact 3-design) and σ to be a fixed Pauli string, we nearly obtain the Pauli string ensemble (up to the identity element, which cannot be produced by conjugation). However, beyond Clifford circuits, we do not know other examples where the matrices $A_i$ remain sparse.
6 The random signed permutation is defined by $Q := DP$, where $D_{ab} = \delta_{ab} r_a$ and $r_a$
If we insist on sparse matrices, here is another example:
where $\tilde{Q}_i \sim$ i.i.d. complex signed permutations; where $Q_i \sim$ i.i.d. t-wise independent complex signed permutations.
In this context, t-wise independence of the permutations is exactly the t-th moment matching condition $\mathbb{E}\, Q_i^{\otimes t} = \mathbb{E}\, \tilde{Q}_i^{\otimes t}$. Exact and approximate constructions for both t-designs [14,46] and t-wise independent permutations [2] are available in the literature. We leave for future work a careful analysis of the approximate case, where very few random bits are needed.
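The moment-matching form of t-wise independence can be checked by brute force on a small example. The sketch below is illustrative only: it ignores the complex signs, works with permutations of {0, ..., 4}, and compares two hypothetical families of our own choosing (cyclic shifts, and affine maps x ↦ ax + b mod 5) against the uniformly random permutation by computing $\mathbb{E}\,P^{\otimes t}$ exactly for t = 1, 2, 3. The helper names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def tensor_moment(perms, n, t):
    """Return E[P^{⊗t}] for a permutation drawn uniformly from `perms`.

    The entry of P^{⊗t} indexed by (ys, xs) is prod_k [pi(xs[k]) == ys[k]],
    so E[P^{⊗t}] is the joint law of (pi(x_1), ..., pi(x_t)); it is stored
    as a dict {(xs, ys): probability} over the nonzero entries.
    """
    counts = Counter()
    for pi in perms:
        for xs in product(range(n), repeat=t):
            counts[(xs, tuple(pi[x] for x in xs))] += 1
    return {key: Fraction(c, len(perms)) for key, c in counts.items()}


def affine_maps(n):
    """Maps x -> a*x + b (mod n) with a != 0; for prime n they are 2-transitive permutations."""
    return [tuple((a * x + b) % n for x in range(n)) for a in range(1, n) for b in range(n)]


n = 5                                                              # prime modulus
uniform = list(permutations(range(n)))                             # all 120 permutations
shifts = [tuple((x + b) % n for x in range(n)) for b in range(n)]  # 5 cyclic shifts
affine = affine_maps(n)                                            # 20 affine permutations

results = {
    name: [tensor_moment(family, n, t) == tensor_moment(uniform, n, t) for t in (1, 2, 3)]
    for name, family in [("shifts", shifts), ("affine", affine)]
}
print(results)   # {'shifts': [True, False, False], 'affine': [True, True, False]}
```

Cyclic shifts are 1-wise but not 2-wise independent, while the affine maps match the uniform permutation up to t = 2 but not at t = 3.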

V. COMMENTS ON DEQUANTIZATION AND QUANTUM ADVANTAGE
In this section, we comment on the classical complexity of the low-energy problem. The flavor differs from local Hamiltonian problems because our model is highly nonlocal and has a semicircular spectrum.
1. How far does dequantization go? As we mentioned, recent developments in dequantization show that many linear algebra tasks can be solved efficiently, assuming certain classical access to a quantum state. In particular, existing results consider low-rank matrices for various tasks [19,20,53,54] or high-rank matrices but with constant accuracy [26].
In the setting of Theorem II.1, we provide an efficient classical witness for the optimum if the accuracy ε > 0 is an arbitrary fixed constant (with polynomially large m). The idea is a simple polynomial approximation. However, the cost of manipulating the witness is $\exp(\Omega(1/\sqrt{\epsilon}))$, which scales poorly with the constant ε. This indicates that eigenstates "far from the ground state" have polynomial classical complexity; this is reminiscent of the cost of dequantization methods [26] in the context of ground energy estimation given good trial states. Still, the above classical polynomial witness gets stuck at a constant approximation ratio, while the quantum algorithm has no problem reaching better and better accuracy7.

Give me a decision problem!
To really talk about quantum advantage, ideally one wants a problem with classical inputs and outputs, especially a decision problem. A candidate problem is to compute an approximation to the ground energy of our model. However, since our Hamiltonian is random, we expect its spectrum to be concentrated around the semicircle. If the spectrum were exactly the semicircle, a classical algorithm could simply output the deterministic value. Therefore, the classical hardness, if it exists, must originate from the instance-to-instance fluctuation of the spectrum away from the semicircle density, and that is why we need the accuracy ε = 1/Poly(n) to be small while the number of terms m = Poly(n) is not too large (otherwise the fluctuation becomes too small and predictable).
Acknowledging the above, a candidate problem for quantum advantage is deciding the density of states to high precision. It also converts to a binary decision problem by setting a threshold.
Question V.0.1 (Task: Deciding the density of states). Given a Hamiltonian sampled from the Pauli string ensemble and a small parameter ε, output the number of states in a small energy interval 2] up to multiplicative error ε.

Is it classically hard for some
There is a quantum algorithm that succeeds with gate complexity Poly(ε⁻¹, δ⁻¹, m): our concentration argument for the low-energy density of states ((C3) in the proof of Theorem II.1) also implies that for each δ, there is a polynomially large m = Poly(δ⁻¹, n) such that the local density in [−δ, δ] is at least half of that of the semicircle. Therefore, phase estimation samples from this interval with Ω(δ) success probability. Repeated trials8 give a high-confidence estimate of the density of states to error ε with Poly(1/ε) algorithmic cost.
Why consider the problem of approximating the density of states rather than approximating the ground state energy? Right now, we do not have control over the spectrum very close to the extreme eigenvalues: for fixed m and n, our current results do not rule out a spectral gap of size Ω(m^{−1/4} n^{5/4}). While we believe the spectral gap is exponentially small, a proof will require further developments in nonasymptotic random matrix theory.
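A back-of-the-envelope version of this cost estimate can be sketched under two simplifying assumptions that are ours, not the text's: the spectrum follows the semicircle on [−2, 2] exactly (the text only guarantees at least half of that mass), and phase estimation returns an exact eigenvalue. The success probability for [−δ, δ] is then about 2δ/π, and the standard multiplicative Chernoff bound gives the number of repetitions for a (1 ± ε) estimate. Function names are hypothetical.

```python
import math


def semicircle_cdf(x):
    """CDF of the semicircle density sqrt(4 - x^2) / (2*pi) supported on [-2, 2]."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def interval_mass(delta):
    """Probability that an idealized phase-estimation sample lands in [-delta, delta]."""
    return semicircle_cdf(delta) - semicircle_cdf(-delta)


def trials_needed(p, eps, fail_prob):
    """Trials T so that the empirical frequency of a Bernoulli(p) event lies within a factor
    1 +/- eps of p with probability >= 1 - fail_prob, using the multiplicative Chernoff
    bound P(|p_hat - p| >= eps * p) <= 2 * exp(-T * p * eps**2 / 3) for 0 < eps < 1."""
    return math.ceil(3.0 * math.log(2.0 / fail_prob) / (p * eps ** 2))


for delta in (0.1, 0.01):
    p = interval_mass(delta)
    print(f"delta={delta}: mass={p:.5f}, 2*delta/pi={2 * delta / math.pi:.5f}, "
          f"trials={trials_needed(p, eps=0.05, fail_prob=0.01)}")
```

The number of trials scales like 1/(δ ε²), polynomial in both parameters, consistent with the Poly(1/ε) cost stated above.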
Proving classical hardness for the density of states problem, e.g. by reduction from a problem already known to be hard, is more elusive.A general proof might be too much to ask for as it would give a computational quantum advantage for an oracle-free average-case decision problem, something for which no other examples are known.Still, it would be interesting to provide arguments for it.A concrete step is proving that the spectrum has a large enough instance-to-instance fluctuation away from the semicircle distribution such that the classical algorithm cannot succeed simply by always outputting the average value.We believe this to be true and it would be interesting to test it numerically.However, a proof of it would require further developments in nonasymptotic random matrix theory.
3. Quantum chaos and quantum advantage. Our work fits into the broader question of whether quantum chaos could be a source of quantumly easy problems and perhaps of a quantum computational advantage. As we mentioned, Hastings and O'Donnell [32] made concrete progress on the SYK model, a prominent toy model of quantum chaos, by providing a low-energy witness where Gaussian states are known to fail. Their results would serve the question at hand even better if the classical hardness argument could be improved or if the Hamiltonian were shown to remain provably easy near the ground state. The latter seems plausible on physical grounds, since the model remains "chaotic" near the ground energy. Indeed, if one were to formally assume quantum chaos in terms of the Eigenstate Thermalization Hypothesis (ETH), one could prove that preparing low-energy states is quantumly easy because Gibbs sampling at low temperatures is efficient on a quantum computer [18].
Our work made progress in capturing quantum chaos and its consequences by studying random matrix models where nonasymptotic treatment of spectral properties is possible even near the ground energy.Still, we acknowledge that our model is nonlocal and perhaps deviates from local Hamiltonian problems in some aspects: the quantum easiness stems from the semicircular spectrum and does not directly explain why the low-energy problem of chaotic local Hamiltonians (whose spectral density has a tail instead) should also be easy.Nevertheless, we expect the following findings to extrapolate to local chaotic Hamiltonians: random matrix behavior can emerge from very few bits of randomness, and the spectrum is smooth and free of outliers (Figure 2).
Still, there is a wealth of quantum chaos phenomenology that requires formal treatment before its implications for quantum advantage can be drawn. One direction is to show ETH (e.g., for the SYK models), which roughly means that nearby energy eigenstates are well connected to each other. We believe this can be formalized for the GUE, which should also extend to our Pauli string ensemble by the universality principle. Another direction is to reduce the locality of the Pauli string ensemble. In fact, our circuit complexity lower-bound argument remains nontrivial even when the locality k of each Hamiltonian term is reduced from k = Θ(n) to k = log(n), which at least gives a hint of classical hardness.
The remainder of this work begins with proofs for the comparison principle (Section A), including the moments and the resolvent. We instantiate the nonasymptotic properties of the GUE in Section B. The comparison results and GUE properties together allow us to calculate the properties of the Pauli string ensemble (Section C). In Section D, we prove the circuit size lower bounds for the Pauli string ensemble, whose argument is independent of the comparison principle. Section E contains short proofs omitted from the main text. Section F contains an argument for why interpolation methods do not immediately exploit higher matching moments.
Appendix A: Calculations for the Lindeberg principle
In this section, we apply a version of the Lindeberg exchange principle to the pth moments and the resolvent moments. The main assumption we use is that two sums of independent matrices share the same lower-order moments. The main technical argument is readily illustrated in the moment calculation. The resolvent calculation is more involved because the resolvent is a nonconvex function of the random matrix. We also have to establish concentration of a random realization of the resolvent moment around its expected value.

Moments
We recapitulate the statement for moments.
Theorem IV.1 (Universality for moments). Consider two families $(A_i)$ and $(\tilde{A}_i)$ of independent random Hermitian matrices whose moments match (9) up to order t ≥ 2, and introduce the sums H and $\tilde{H}$ as in (8). Define the statistics for k ≥ 1 and p ≥ 2. Then, for each even natural number p ∈ 2ℕ, we have the bounds The p-norm is defined in (4).
Our proof of Theorem IV.1 is based on the Lindeberg exchange principle. Roughly, we interpolate between the two sums by replacing one summand at each step. Since the low moments of the summands match, each replacement changes the p-norm only slightly, with an error that involves only terms of order t + 1 and higher. The calculation is straightforward, but it implicitly exploits noncommutativity properties of the random matrices in the moment matching. The error is a noncommutative polynomial in the matrices, and we treat it by a crude application of Hölder's inequality that entirely ignores noncommutativity. Once we have replaced all the summands, we tie the estimates together using a self-bounding argument. To execute this step, we must solve a difference equation by passing to a continuous differential equation.
Lindeberg's method has recently been applied to RMT in the papers [17,36,47].Some of the other ideas and methods in this argument have roots in the papers [9,12,56,57].
The basic argument relies on a general form of Hölder's inequality for Schatten norms.As we will see, we can introduce more refined moment inequalities to obtain some improvements.
Fact A.1 (Multivariate Hölder for random matrices). For any family $(X_1, \dots, X_k)$ of square random matrices, possibly statistically dependent, the product satisfies the trace inequality
Proof sketch. In the deterministic setting, with two matrices, the result appears in [10, Corollary 4.2.6]. Use induction to extend the bound to more than two deterministic matrices. To incorporate the normalized trace, note that the weighted geometric mean $(p_1, \dots, p_k) \mapsto \prod_{i=1}^{k} a_i^{p_i}$ is homogeneous for fixed $a_j \ge 0$. To incorporate the expectation, recall that the weighted geometric mean is concave, and invoke Jensen's inequality.
Proof of Theorem IV.1: Two matching moments. To illustrate the concept behind the argument, we carefully establish the first bound for the t = 2 case. Afterward, we describe the modifications required to extend the bound to t > 2 and to introduce the variance parameter $\sigma^2$.
Fix an even natural number p ∈ 2ℕ. Our goal is to compare the pth moments $|||\cdot|||_p^p$ of the two independent sums $S = \sum_{i=1}^{m} A_i$ and $\tilde{S} = \sum_{i=1}^{m} \tilde{A}_i$. The main idea is to update one summand at a time from $A_j$ to $\tilde{A}_j$, controlling the change in the p-norm at each step.
In detail, for each index j = 0, 1, 2, …, m, we can define the hybrid matrix
$S_j := \sum_{i=1}^{j} \tilde{A}_i + \sum_{i=j+1}^{m} A_i$, where $S_0 = S$ and $S_m = \tilde{S}$.
Express the difference between the pth moments of S and $\tilde{S}$ as a telescoping sum:
$|||\tilde{S}|||_p^p - |||S|||_p^p = \sum_{j=1}^{m} \big( |||S_j|||_p^p - |||S_{j-1}|||_p^p \big)$. (A1)
For even p, we can express the p-norm9 in terms of a trace power: $|||S|||_p^p = \mathbb{E}\operatorname{Tr} S^p$. To bound the telescoping sum, we first analyze the single update error and then solve a recursion.
Step I: Single update error. Fix an index j = 1, …, m. Let us give a bound for the change in the p-norm when we update $A_j$ to $\tilde{A}_j$. Define the unchanged part of the sum $S_j$ by
$S_- := \sum_{i=1}^{j-1} \tilde{A}_i + \sum_{i=j+1}^{m} A_i$, so that $S_{j-1} = S_- + A_j$ and $S_j = S_- + \tilde{A}_j$.
We can control the change in the polynomial moment by performing a Taylor expansion of the polynomial moment at the unchanged part $S_-$.
When expanding powers of a sum of matrices, keep in mind the scalar binomial expansion:
$(s + a)^p = \sum_{k=0}^{p} \binom{p}{k} s^{p-k} a^k$.
For matrices, the expansion takes the form $(S_- + A_j)^p = \sum_{k=0}^{p} M_k$, where $M_k$ collects the products with k factors of $A_j$ and p − k factors of $S_-$, and
$|\mathbb{E}\operatorname{Tr} M_k| \le \binom{p}{k}\, |||A_j|||_p^{k}\, |||S_-|||_p^{p-k}$. (A2)
Likewise, $(S_- + \tilde{A}_j)^p = \sum_{k=0}^{p} \tilde{M}_k$, with analogous bounds for the summands. The bound in (A2) follows from Hölder's inequality (Fact A.1), applied for all possible relative positions of $S_-$ and $A_j$ with all parameters $p_i = p$. Note that this general bound ignores the noncommutativity of the matrices. The binomial coefficient $\binom{p}{k}$ is exactly the number of relative positions of the k appearances of $A_j$ among the p − k appearances of $S_-$.
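As a sanity check of the deterministic case of Fact A.1, the sketch below compares $|\operatorname{tr}(XYW)|$ with the product of normalized Schatten 3-norms for random 2 × 2 complex matrices (k = 3 and all $p_i = 3$, so the reciprocals sum to 1). It uses only the standard library; the helper names are our own, and the expectation in Fact A.1 is omitted.

```python
import math
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def adjoint(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]


def singular_values(X):
    """Singular values of a 2x2 complex matrix from the eigenvalues of X^dagger X."""
    G = matmul(adjoint(X), X)
    a, d, b = G[0][0].real, G[1][1].real, G[0][1]
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return [math.sqrt(max(mid + rad, 0.0)), math.sqrt(max(mid - rad, 0.0))]


def schatten(X, p):
    """Normalized Schatten norm (tr |X|^p)^(1/p) with tr = Tr / 2."""
    return (sum(s ** p for s in singular_values(X)) / 2) ** (1 / p)


def random_matrix(rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(2)]


rng = random.Random(0)
worst = 0.0
for _ in range(2000):
    X, Y, W = (random_matrix(rng) for _ in range(3))
    P = matmul(matmul(X, Y), W)
    lhs = abs(P[0][0] + P[1][1]) / 2                         # |tr(X Y W)|
    rhs = schatten(X, 3) * schatten(Y, 3) * schatten(W, 3)   # 1/3 + 1/3 + 1/3 = 1
    worst = max(worst, lhs / rhs)
print(worst)   # stays <= 1, as Hölder's inequality predicts
```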
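The noncommutative expansion and the count of relative positions can be verified directly. The sketch below (helper names hypothetical; real 2 × 2 matrices for simplicity, with S standing in for $S_-$ and A for $A_j$) multiplies out $(S + A)^p$, groups the $2^p$ ordered words by the number k of A factors, and checks that the group sizes are the binomial coefficients, which in turn satisfy $\binom{p}{k} \le p^k$.

```python
import itertools
import math
import random


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]

rng = random.Random(1)
S = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]   # plays the role of S_-
A = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]   # plays the role of A_j
p = 5

# Left-hand side: (S + A)^p multiplied out directly.
lhs = IDENTITY
for _ in range(p):
    lhs = matmul(lhs, matadd(S, A))

# Right-hand side: all ordered words, grouped by the number k of A factors (the terms M_k).
M = [ZERO] * (p + 1)
counts = [0] * (p + 1)
for word in itertools.product((S, A), repeat=p):
    k = sum(1 for factor in word if factor is A)
    term = IDENTITY
    for factor in word:
        term = matmul(term, factor)
    M[k] = matadd(M[k], term)
    counts[k] += 1

rhs = ZERO
for Mk in M:
    rhs = matadd(rhs, Mk)

error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(error, counts)   # counts == [1, 5, 10, 10, 5, 1]
```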
To proceed, since the order parameter t = 2, the first and second moments of the random matrices $A_i$ and $\tilde{A}_i$ match for each index i. As a consequence, Crucially, this implies that subtracting the expected moments completely cancels the first-order terms $M_1$ and $\tilde{M}_1$ and the second-order terms $M_2$ and $\tilde{M}_2$. Thus, The notation $(A_j \to \tilde{A}_j)$ denotes a replica of the first term with $A_j$ replaced by $\tilde{A}_j$. To reach the second line, we collect the higher-order terms $M_3, \dots, M_p$ and $\tilde{M}_3, \dots, \tilde{M}_p$, we bound the binomial coefficient $\binom{p}{k} \le p^k$, and we use the convexity of the p-norm The last inequality bounds the geometric series using the elementary numerical inequality We have successfully established a comparison between the quantities $|||S_j|||_p^p$ and $|||S_{j-1}|||_p^p$.
Step II: Solving the recursion.We have expressed the difference between the moments of S and S as a telescoping sum (A1) of moments of hybrid matrices S j .The first part of the argument yields a bound on the change in moments at each step in terms of a smaller moment of the hybrid matrices.We can use these results to develop coupled difference inequalities, which we must solve.
Define the scalar quantities The boundary values of the sequence $(x_j)$ are the moments of the original independent sums that we seek to compare: $x_m = \mathbb{E}\operatorname{Tr} \tilde{S}^p$ and $x_0 = \mathbb{E}\operatorname{Tr} S^p$. To bound the differences of this sequence, we introduce the notation Using the inequality (A3) for the jth step of the exchange, we arrive at the coupled difference inequalities: Our task is to produce bounds for the terminal value $x_m$ in terms of the initial value $x_0$ and the coefficients $a_j$ and $b_j$.
To do so, we pass to a differential equation.The proof appears at the end of this section.For a fixed integer 1 ≤ k ≤ p, consider the differential inequality Then each solution x(s) to the differential inequality overestimates the solution (x j ) to the coupled difference inequalities (A4) in the sense that x(j) ≥ x j ≥ 0 for each j = 0, . . ., m.
The following ansatz provides a solution to the differential inequality (A5).The proof of this lemma appears at the end of the section.Then y solves the differential inequality (A5).
We are now prepared to solve the coupled difference inequalities. Instantiate Lemma A.1 and Lemma A.2 with parameters s = m and k = 3 to arrive at the one-sided inequality This inequality provides an upper bound for the difference $x_m^{3/p} - x_0^{3/p}$, where we recall that $x_m$ and $x_0$ are the pth moments of the two independent sums. Taking the third root and bounding the $\ell_3$ norm by the $\ell_1$ norm, we also have the estimate This statement is slightly weaker, but it may be easier to interpret and apply. We may repeat the same argument switching the roles of $A_j$ and $\tilde{A}_j$, noting that the coefficients $a_j$ and $b_j$ remain the same. This yields the desired two-sided estimate: Introduce the values of $a_j$ and $b_j$ and evaluate the numerical constants to complete the proof of the theorem for random matrix models with matching second moments (t = 2).
Proof of Theorem IV.1: More matching moments. Using an analogous argument, we can obtain related results comparing random matrix models where more moments match. Fix t ≥ 2. Suppose that each pair $A_i$ and $\tilde{A}_i$ of summands has matching moments up to order t. In this case, the terms $M_1, \dots, M_t$ cancel with $\tilde{M}_1, \dots, \tilde{M}_t$, so the error depends only on the higher-order terms $M_k$ and $\tilde{M}_k$ for k ≥ t + 1.
Pursuing this observation, we arrive at the bound where Using the notation L p,k from the statement of the theorem, we reach the estimate For 2 ≤ t ≤ p, each of the leading constants is bounded above by 2. This completes the argument.
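The cancellation of the low-order terms can be seen exactly in a commutative toy model. In the sketch below, which is our own scalar illustration rather than the matrix setting of Theorem IV.1, the summands are $A_i = r_i/\sqrt{m}$ with random signs $r_i$ and $\tilde{A}_i = g_i/\sqrt{m}$ with standard Gaussians $g_i$, whose moments agree up to order t = 3. Using exact rational arithmetic, it performs the exchange one summand at a time and checks that each increment of $x_j = \mathbb{E}\, S_j^p$ equals the sum of the terms of order k ≥ t + 1 alone. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rademacher_moment(k):
    return 1 if k % 2 == 0 else 0


def gaussian_moment(k):
    """E g^k for a standard Gaussian: (k - 1)!! for even k, 0 for odd k."""
    if k % 2:
        return 0
    out = 1
    for j in range(k - 1, 0, -2):
        out *= j
    return out


def scaled_moments(moment_fn, m, kmax):
    """Exact moments E[(X / sqrt(m))^k] for k <= kmax (odd moments vanish for both laws)."""
    return [Fraction(moment_fn(k), m ** (k // 2)) for k in range(kmax + 1)]


def add_independent(mu, nu):
    """Moments of X + Y for independent scalars X, Y via the binomial expansion."""
    return [sum(comb(q, k) * mu[k] * nu[q - k] for k in range(q + 1)) for q in range(len(mu))]


def sum_moments(num_gaussian, num_rademacher, p, a, a_tilde):
    mu = [Fraction(1)] + [Fraction(0)] * p      # moments of the zero random variable
    for _ in range(num_gaussian):
        mu = add_independent(mu, a_tilde)
    for _ in range(num_rademacher):
        mu = add_independent(mu, a)
    return mu


m, p = 10, 8
a = scaled_moments(rademacher_moment, m, p)       # summands A_i
a_tilde = scaled_moments(gaussian_moment, m, p)   # summands Ã_i
t = max(k for k in range(p + 1) if a[: k + 1] == a_tilde[: k + 1])   # matching order

# x_j = E S_j^p, where the hybrid S_j has its first j summands swapped to Gaussians.
x = [sum_moments(j, m - j, p, a, a_tilde)[p] for j in range(m + 1)]
for j in range(1, m + 1):
    rest = sum_moments(j - 1, m - j, p, a, a_tilde)   # moments of the unchanged part S_-
    high_order = sum(comb(p, k) * rest[p - k] * (a_tilde[k] - a[k]) for k in range(t + 1, p + 1))
    assert x[j] - x[j - 1] == high_order              # orders 1..t cancel exactly
print(t, float(x[0]), float(x[m]))
```

Because scalars commute, the binomial coefficients appear exactly here; in the matrix case, Hölder's inequality only bounds each of the ordered products.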
Proof of Theorem IV.1: Refined statistics.Last, we establish the result with more precise statistics of the random summands.To do so, we simply replace Hölder's inequality (Fact A.1) by a more refined moment inequality.
Fact A.2 (Trace inequality for random matrices). Let A and Y be random Hermitian matrices that are statistically independent. Consider a product with k copies of A and (p − k) copies of Y in any order, where In this expression, We invoke this result with $Y = S_-$ and with $A = A_j$ or $A = \tilde{A}_j$. We can also take the minimum of this bound with the bound via Hölder's inequality to see that the tail parameter $L_{p,p}$ does not get worse. The rest of the proof is the same.
Finally, we complete the proofs of the two lemmas that were required in the argument.
Proof of Lemma A.1. By assumption $x(0) \ge x_0 = 0$. For an induction, assume that $x(j-1) \ge x_{j-1}$ for an index j ≥ 1. Then To reach the second line, note that the function x(s) is increasing because the right-hand side of the differential inequality is positive. Then observe that the coefficients $a(s) = a(\lceil s \rceil)$ and $b(s) = b(\lceil s \rceil)$ are constant on the domain of integration. By induction, we obtain $x(j) \ge x_j$ for each j = 0, …, m. This is the stated result.
Proof of Lemma A.2.To verify that the ansatz satisfies the differential inequality, first note the initial condition y(0) = x(0).Using this fact, we take the derivative: The inequality depends on the fact that (k/p) − 1 ≤ 0 and . This is a direct calculation using the fact that all the terms are positive.Therefore, the ansatz solves the differential inequality.
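The comparison behind Lemmas A.1 and A.2 can be illustrated numerically. Since the exact forms of (A4) and (A5) are not reproduced here, the sketch assumes a hypothetical simplified recursion $x_j \le x_{j-1} + a\, x_{j-1}^{1-k/p}$ (constant a, no b term). The matching differential equation $y' = a\, y^{1-k/p}$ has the closed-form solution $y(s) = (x_0^{k/p} + (k/p)\, a s)^{p/k}$, which plays the role of the ansatz; the check confirms $y(j) \ge x_j$ and the resulting bound on $x_m^{k/p} - x_0^{k/p}$.

```python
def worst_case_sequence(x0, a, k, p, m):
    """Largest sequence allowed by the hypothetical recursion x_j <= x_{j-1} + a * x_{j-1}^(1 - k/p)."""
    xs = [x0]
    for _ in range(m):
        xs.append(xs[-1] + a * xs[-1] ** (1 - k / p))
    return xs


def ode_solution(x0, a, k, p, s):
    """Exact solution of y' = a * y^(1 - k/p), y(0) = x0; the substitution u = y^(k/p) gives u' = (k/p) * a."""
    return (x0 ** (k / p) + (k / p) * a * s) ** (p / k)


x0, a, k, p, m = 2.0, 0.3, 3, 8, 50
xs = worst_case_sequence(x0, a, k, p, m)

# Lemma A.1-style comparison: the ODE solution dominates the discrete sequence at integer times
# (the relative slack 1e-12 only absorbs floating-point rounding at j = 0).
dominated = all(ode_solution(x0, a, k, p, j) >= xs[j] * (1 - 1e-12) for j in range(m + 1))

# Consequence of the closed form: x_m^(k/p) - x_0^(k/p) <= (k/p) * a * m.
gap = xs[m] ** (k / p) - x0 ** (k / p)
print(dominated, gap, (k / p) * a * m)
```

The comparison works because the right-hand side $a\,y^{1-k/p}$ is nondecreasing in y for k ≤ p and the solution y(s) is increasing, so integrating over [j − 1, j] can only overshoot one discrete step.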

The resolvent
The comparison principle extends to other functions besides polynomial moments. In this section, we study moments of the resolvent: As usual, H and $\tilde{H}$ are defined in (8). The parameters satisfy ω ∈ ℝ and η > 0.
Theorem IV.2 (Universality for resolvent moments). Instate the assumptions and notation of Theorem IV.1.
For each even natural number p ∈ 2ℕ, the polynomial moments of the resolvent (11) are related by The symbol ≲ suppresses constants depending only on t.
Whenever the right-hand side is small (≪ $(p\eta)^{-1}$), we may take the pth power to obtain the expected density of states (filtered by the resolvent) up to a multiplicative error. For our Pauli string ensemble, we can achieve this outcome because $L_{3p,\infty} = 1/\sqrt{m}$ and the number m of summands is chosen sufficiently large.
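To see why the resolvent moment tracks the density of states, note the heuristic that for small η, $\operatorname{tr}|H - \omega + i\eta|^{-p} \approx \rho(\omega)\, \eta^{1-p} C_p$ with $C_p = \int (1 + u^2)^{-p/2}\, du = \sqrt{\pi}\,\Gamma((p-1)/2)/\Gamma(p/2)$. The sketch below checks this on an idealized spectrum placed at the quantiles of the semicircle on [−2, 2], an assumption standing in for a sampled Pauli string Hamiltonian; the function names are hypothetical.

```python
import math


def semicircle_cdf(x):
    """CDF of the semicircle density sqrt(4 - x^2) / (2*pi) on [-2, 2]."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def semicircle_quantile(u, iters=60):
    """Invert the semicircle CDF by bisection on [-2, 2]."""
    lo, hi = -2.0, 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def resolvent_moment(eigs, omega, eta, p):
    """Normalized tr |H - omega + i*eta|^(-p), computed from the eigenvalues of H."""
    return sum(((lam - omega) ** 2 + eta ** 2) ** (-p / 2) for lam in eigs) / len(eigs)


def kernel_mass(p):
    """C_p = integral over R of (1 + u^2)^(-p/2) du = sqrt(pi) Gamma((p-1)/2) / Gamma(p/2)."""
    return math.sqrt(math.pi) * math.gamma((p - 1) / 2) / math.gamma(p / 2)


N, omega, eta, p = 4000, 0.0, 0.05, 4
eigs = [semicircle_quantile((i + 0.5) / N) for i in range(N)]   # idealized spectrum
rho = math.sqrt(4.0 - omega ** 2) / (2.0 * math.pi)             # semicircle density at omega
ratio = resolvent_moment(eigs, omega, eta, p) * eta ** (p - 1) / (rho * kernel_mass(p))
print(ratio)   # close to 1 when eta is small but much larger than the eigenvalue spacing
```

The resolvent acts as a smoothing kernel of width η, which is why the density of states is recovered only "filtered by the resolvent".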
As compared with the polynomial moments, universality for the resolvent involves some additional technical challenges.They stem from the fact that the resolvent has an infinite Taylor series, and it is a nonconvex function of the random matrix.To address the first concern, we follow [12] and truncate the Taylor series at a carefully chosen order.To that end, let us recall the statement of Taylor's theorem with an integral remainder.

Fact A.3 (Taylor with integral remainder). If the function
The Taylor expansion of the resolvent has a rather involved expression.Fortunately, we merely need bounds for the higher-order terms.
Proposition A.1 (Expanding the resolvent).For Hermitian matrices S and A of the same order, consider the matrix Z = S + iηI where η ∈ R.Then, for each even natural number p ∈ 2N, for k = 0, . . ., 3p.

(A6)
The term $M_k$ is a noncommutative polynomial of degree k in the variable A and degree p + k in the variables $Z^{-1}$ and $Z^{-\dagger}$, where $Z^{-\dagger}$ refers to the conjugate transpose of the inverse.
Proof of Proposition A.1. First, we expand The first line depends on the fact that Z + A and $Z^\dagger + A$ commute. The second line uses the expansion Next, we collect into the matrix $M_k$ all terms with total power k on the matrix A and total power p + k on the matrices $Z^{-1}$ or $Z^{-\dagger}$. For 0 ≤ k < 3p, there are $\binom{p+k-1}{k} \le (4p)^k$ such terms. Then we apply Hölder's inequality (Fact A.1) to each term contributing to $M_k$. This step yields for $q_1 = \frac{3p}{3p-k}$ and $q_2 = \frac{3p}{k}$.
To treat additional powers of $Z^{-1}$, use the uniform bound for the resolvent, $\|Z^{-1}\| \le \eta^{-1}$. We reach the advertised bound for terms of order 0 ≤ k < 3p.
For the remainder term in the Taylor expansion (K = 3p), we may compute the Kth derivative using (A7) and invoke the same method to obtain a bound.
We have applied the uniform bound $\|(Z + sA)^{-1}\| \le \eta^{-1}$. Introduce the last display into the integral remainder term in the Taylor expansion (Fact A.3). We reach the required estimate for K = 3p.
Proof of Theorem IV.2. To obtain a comparison of the resolvents, we apply Lindeberg's method again. For clarity, we will assume that there are t = 2 matching moments; the general case is similar. Define hybrid matrices and their resolvents:
$H_j := \sum_{i=1}^{j} \tilde{A}_i + \sum_{i=j+1}^{m} A_i$ and $R_j := \frac{1}{H_j - \omega + i\eta}$ for j = 0, …, m.
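The finite resolvent expansion with an exact remainder, $(Z + A)^{-1} = \sum_{k=0}^{K-1} (-Z^{-1}A)^k Z^{-1} + (-Z^{-1}A)^K (Z + A)^{-1}$, is a standard identity of the type used in the proof of Proposition A.1 (we do not claim it is the exact form of (A7)). The sketch below verifies it for random 2 × 2 Hermitian S and A with Z = S + iηI, together with the uniform bound $\|(Z + sA)^{-1}\| \le \eta^{-1}$ for s ∈ [0, 1]; helper names are hypothetical.

```python
import math
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def add(X, Y, scale=1.0):
    """X + scale * Y for 2x2 matrices."""
    return [[X[i][j] + scale * Y[i][j] for j in range(2)] for i in range(2)]


def inverse(X):
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def op_norm(X):
    """Spectral norm of a 2x2 matrix: square root of the top eigenvalue of X^dagger X."""
    G = [[sum(X[k][i].conjugate() * X[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    a, d, b = G[0][0].real, G[1][1].real, G[0][1]
    return math.sqrt((a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2))


def random_hermitian(rng):
    a, d = rng.gauss(0, 1), rng.gauss(0, 1)
    b = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    return [[complex(a), b], [b.conjugate(), complex(d)]]


rng = random.Random(2)
eta, K = 0.5, 6
S, A = random_hermitian(rng), random_hermitian(rng)
Z = add(S, [[1j * eta, 0], [0, 1j * eta]])            # Z = S + i*eta*I
Z_inv = inverse(Z)
exact = inverse(add(Z, A))                            # (Z + A)^{-1}

step = add([[0, 0], [0, 0]], matmul(Z_inv, A), scale=-1.0)   # T = -Z^{-1} A
power = [[1, 0], [0, 1]]                                      # T^k, starting at k = 0
series = [[0, 0], [0, 0]]
for _ in range(K):
    series = add(series, matmul(power, Z_inv))                # adds T^k Z^{-1}
    power = matmul(power, step)
remainder = matmul(power, exact)                              # T^K (Z + A)^{-1}
expansion = add(series, remainder)
error = max(abs(exact[i][j] - expansion[i][j]) for i in range(2) for j in range(2))

# Uniform bound ||(S + sA + i*eta*I)^{-1}|| <= 1/eta along the segment s in [0, 1].
worst_norm = max(op_norm(inverse(add(Z, A, scale=s / 10))) for s in range(11))
print(error, worst_norm * eta)   # error is at rounding level; worst_norm * eta <= 1
```

The identity follows by iterating $(Z + A)^{-1} = Z^{-1} - Z^{-1} A (Z + A)^{-1}$ K times, so no convergence condition on the series is needed.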
Consider the telescoping sum We must bound each of the terms in the telescope.
Step I: Single update error. Fix an index j = 1, …, m. The jth update replaces the summand $A_j$ with $\tilde{A}_j$. Define the unchanged part of the matrix and its resolvent: Since the moments of $A_j$ and $\tilde{A}_j$ match up to second order and these matrices are independent of $H_-$, the terms $M_0, M_1, M_2$ cancel the terms $\tilde{M}_0, \tilde{M}_1, \tilde{M}_2$ in the Taylor expansion of the resolvent powers (Proposition A.1). Thus, The first inequality bounds the geometric series of the error terms (A6): The second inequality combines the bounds for the two different summands $A_j$ and $\tilde{A}_j$. The third inequality requires some comment. By another Taylor expansion, we may control the moments of $R_-$ using the moments of $R_{j-1}$: Bounding the geometric series and noting that $|||A_j|||_{3p} \le c_j$, we find that Last, raise both sides to the (p − 1)/p power, and use the numerical inequality $(a + b)^{(p-1)/p} \le a^{(p-1)/p} + b^{(p-1)/p}$ for a, b ≥ 0 to reach (A8). A similar bound holds when t ≥ 2 moments match.
Step II: Solving the recursion.The recursion is similar to the proof of Theorem IV.1.We will present this argument for a general choice of t ≥ 2. First, introduce the scalar variables (2(t + 2)pc j ) (t+1)p η (t+2)p .

Concentration for resolvent trace
In this section, we study the qth moments of the resolvent trace, which, by Markov's inequality, give the concentration of the local density of states needed for Theorem II.1. The concentration argument fundamentally differs from the calculation for the expectation and does not explicitly refer to an ideal random matrix ensemble (e.g., the GUE). It suffices to introduce an independent copy by the convexity of the q-norm where $|x|_q := (\mathbb{E}|x|^q)^{1/q}$, which allows us to utilize powerful concentration inequalities for martingales. The estimate depends on an expected moment Tr |R| p q, which we bound independently in Section A 3 a.
Theorem A.1 (Concentration for resolvent trace). For independent centered matrices $A_1, \dots, A_m$, consider identically distributed copies $A'_j$ of $A_j$. Then, the resolvent trace concentrates Crucially, the estimate depends on a variance-like quantity $\sigma_*^2(A)$ that more fully reflects the randomness of the random matrix A. For the Pauli string ensemble, this quantity is significantly smaller than the ordinary matrix variance: The quantity $\sigma_*^2(A)$ arises from the following bound. Fact A.4. Consider a random matrix A and a fixed matrix B with compatible dimensions; then Proof of Fact A.4. Consider the singular value decomposition $B = \sum_j s_j |v_j\rangle\langle u_j|$. Then, The first inequality pushes the expectation inside the sum and applies Hölder's inequality to the sum. The second inequality is Cauchy-Schwarz. This is the advertised result.
Also, the proof of Theorem A.1 employs a refined scalar martingale inequality as follows. (For an introduction to martingales, see [60].) Theorem A.2 ([33]). For a scalar martingale difference sequence $d_j$ (i.e., $\mathbb{E}_{j-1} d_j = 0$), Significantly, the conditional expectation $\mathbb{E}_{j-1}$ appears inside the norm, which then allows us to exploit the second-moment properties of $A_i$ via Fact A.4. Otherwise, applying a cruder martingale inequality, such as uniform smoothness, gives a looser bound in terms of $\mathbb{E}\|A_j\|^2$ instead of $\sigma_*^2(A)$. The weaker bound does not properly reflect the randomness in $A_j$.
Proof of Theorem A.1. As usual, we write the telescoping sum We evaluate the individual q-norms Tr R (r) Tr R (r) The second inequality further expands the resolvent difference. The third inequality uses Hölder's inequality for the trace. The last inequality uses Hölder's inequality, counts the combinations of r, s, t by $\binom{p+1}{2}$, uses that A and A′ have the same distribution, and uses that $R_j$ and $R_{j-1}$ have the same distribution as R.
For the sequence $b_j$, we apply the scalar martingale inequality (Theorem A.2). We calculate the predictable quadratic variation using Fact A.4 and the uniform bound on the resolvent, $\|R\| \le \eta^{-1}$. We calculate the maximum by The bound for $a_j$ is completely analogous. Combine the above estimates to obtain the advertised result.
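The symmetrization step rests on Jensen's inequality: for an independent copy X′ of X, $\mathbb{E}|X - \mathbb{E}X|^q = \mathbb{E}|\mathbb{E}'(X - X')|^q \le \mathbb{E}|X - X'|^q$, while the triangle inequality gives $|X - X'|_q \le 2|X - \mathbb{E}X|_q$. The sketch below checks both inequalities by exact enumeration for a hypothetical scalar statistic of three random signs (not the resolvent trace itself).

```python
from itertools import product


def centered_q_norm(values, q):
    """|X - E X|_q for X uniform over `values`."""
    mean = sum(values) / len(values)
    return (sum(abs(v - mean) ** q for v in values) / len(values)) ** (1 / q)


def symmetrized_q_norm(values, q):
    """|X - X'|_q where X' is an independent copy of X."""
    diffs = [v - w for v in values for w in values]
    return (sum(abs(d) ** q for d in diffs) / len(diffs)) ** (1 / q)


# Hypothetical nonlinear statistic of three random signs: X = (r1 + 2 r2 + r3)^2.
values = [(r1 + 2 * r2 + r3) ** 2 for r1, r2, r3 in product((-1, 1), repeat=3)]
for q in (2, 4, 8):
    lower, upper = centered_q_norm(values, q), symmetrized_q_norm(values, q)
    print(q, lower, upper)   # lower <= upper <= 2 * lower
```

For q = 2 the two sides differ by exactly a factor √2, since $\mathbb{E}(X - X')^2 = 2\operatorname{Var} X$.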

a. Expected moments
To make use of Theorem A.1, we also need to estimate the expected moments via the comparison argument.
Theorem A.3 (Expected resolvent moments). For independent centered matrices $A_i$, suppose the moments match those of idealized matrices $\tilde{A}_i$. The symbol ≲ suppresses constants depending only on t.
Proof.We begin with a telescoping sum We move on to control the moments of trace Tr |R| p q q , which uses a similar argument as Theorem IV.1 and Theorem IV.2.We present the calculation for t = 2, but the general case is analogous.
Step I: Single update error. We again start with the telescoping sum The Taylor expansions satisfy the bound from Fact A.1: The first inequality is analogous to the calculation (A6). Recall that the p-norms are normalized: $\|O\|_p = (\operatorname{Tr}|O|^p)^{1/p}$. The second inequality proceeds with an additional q k factor. We then bound the expected increments by canceling the first- and second-order terms $f_1$ and $f_2$.
Step II: Solving the recursion. Again, we simplify the recursion by defining scalar variables (and also consider matching moments up to order t).
and that $a_0 := b_0 := 0$. The updates (A8) can then be written as a scalar recursion j−1 + $b_j$ for each j = 1, …, m, which, in fact, takes exactly the same form as in Theorem IV.2 up to the substitution p → qp. Regardless, we write down the remaining calculation for completeness. The same arguments as before (Lemma A.1, Lemma A.2) give the bound for the endpoints The second inequality uses the uniform bound $c_j = \max(|||\tilde{A}_j|||_{3p}, |||A_j|||_{3p}) \le L$. The last inequality drops the denominator $2^{(t-1)/qp} \ge 1$ and uses Young's inequality in the form $m^{1/qp} \le 1 + m/qp$. This is the second advertised result.
Then, there is an absolute constant c such that we have with probability at least 1 − 1/N. Proof. In the setting of [23, Corollary 11.4], set D = 1 and = 1/2. The range |E| ≤ 10 extends to infinity since the semicircle density $\rho_{sc}$ is supported on [−2, 2] and the error must be decreasing for |E| ≥ 2.
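The form of Young's inequality used in the last step follows from the weighted AM-GM inequality. A short derivation, writing r := qp ≥ 1 and assuming m ≥ 0:

```latex
% r := qp \ge 1, m \ge 0; weighted AM-GM with weights 1/r and 1 - 1/r
m^{1/r} = m^{1/r}\cdot 1^{1-1/r}
  \le \frac{m}{r} + \Bigl(1-\frac{1}{r}\Bigr)
  \le 1 + \frac{m}{r},
\qquad\text{hence}\qquad
m^{1/(qp)} \le 1 + \frac{m}{qp}.
```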

Corollary B.1 (Resolvent moments for the GUE).
There is an absolute constant c such that, for each ω and η, we have The first inequality uses integration by parts and the boundary value The third line uses Fact B.1 to handle the high-probability event (B1). To reach the last line, we compute the integral using the fact that the resolvent power $f(E) = |E - \omega + i\eta|^{-p}$ is increasing for E < ω and decreasing for E > ω, so the integral $\int |f'(E)|\, dE$ equals $2f(\omega) = 2\eta^{-p}$. To bound the maximum, note that $0 \le f(E) \le \eta^{-p}$. Finally, increase the constant c as needed to combine the terms.
$\frac{\sqrt{q}\, p}{\eta^{p+1}}\, \sigma_*$ where $\sigma_* := \sup$ For a GUE matrix, we have $\sigma_*$
Proof of Proposition B.1. This is a standard application of Gaussian concentration inequalities. We could not find the particular function of interest elsewhere, so we include a derivation adapted from [12]. Some of the estimates could be loose in general but suffice for our purposes. Consider the function We bound the Lipschitz constant The second equality is a telescoping sum. The first inequality uses the identity A The last inequality uses the triangle inequality, the uniform bound $\|R\| \le \eta^{-1}$, and the coarse bound
[Figure: $|R|^p$ as a function of E, with the interval [−2, 2] and the point $E_0$ marked.]
log(N) so that the numerator is bounded by 2(1 + ε/2). The last inequality uses the elementary estimate $\frac{1 + \epsilon/2}{1 + \epsilon} \le e^{-\epsilon/4}$ for ε ≤ 1/2. Note $N = 2^n$ to obtain the advertised result.

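Two elementary facts used in this appendix can be checked numerically: in the proof of Corollary B.1, the total variation $\int |f'(E)|\, dE$ of the unimodal function $f(E) = |E - \omega + i\eta|^{-p}$ equals $2f(\omega) = 2\eta^{-p}$, and later the estimate $(1 + \epsilon/2)/(1 + \epsilon) \le e^{-\epsilon/4}$ holds for 0 ≤ ε ≤ 1/2 (we write ε for the small parameter). The sketch uses a finite grid on [−50, 50] and arbitrary sample values ω = 0.3, η = 0.1, p = 4; the helper names are ours.

```python
import math


def total_variation(f, lo, hi, n):
    """Sum of |f(E_{i+1}) - f(E_i)| on a uniform grid: a discrete stand-in for the integral of |f'(E)|."""
    h = (hi - lo) / n
    values = [f(lo + i * h) for i in range(n + 1)]
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


omega, eta, p = 0.3, 0.1, 4


def resolvent_power(E):
    """f(E) = |E - omega + i*eta|^(-p): increasing for E < omega, decreasing for E > omega."""
    return ((E - omega) ** 2 + eta ** 2) ** (-p / 2)


tv = total_variation(resolvent_power, -50.0, 50.0, 40000)
print(tv / (2 * eta ** -p))   # close to 1: the total variation is 2 f(omega) = 2 eta^(-p)

# Elementary estimate (1 + eps/2) / (1 + eps) <= exp(-eps/4) on a grid of eps in [0, 1/2].
grid = [0.5 * i / 1000 for i in range(1001)]
estimate_holds = all((1 + e / 2) / (1 + e) <= math.exp(-e / 4) for e in grid)
print(estimate_holds)
```

For a unimodal function, the total variation is twice the peak value minus the values at the two ends, which here are negligible.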
Abundance of low-energy states and success of phase estimation
In this section, we combine the bounds on the minimal eigenvalue (Theorem III.1) and the density of states (Theorem A.1) to obtain the low-energy density of states.This immediately implies applying phase estimation on the maximally mixed state returns a low-energy witness with a nonnegligible success probability.Extracting the density of states.Assuming the claim holds, it remains to extract the spectral density from the low-energy subspace proxy Q(E 0 ).We crudely estimate the function q E0 (E) by spliting the spectrum by a step function 1(E ≤ E 0 + η) + e −Ω(p) 1(E ≥ E 0 + η).
The second inequality uses concentration (C2) and plugs in the GUE value of E Tr Q(E_0) (C1), which is approximately the semicircle integral. The third inequality imposes the additional constraint log(N) ≥ const · log(1/ε). Together with N ≥ const/ε^4, this can be combined into N ≥ ε^{−c_1} for some constant c_1. Combine with the tail bound (Theorem III.1) for the operator norm ‖H_PS‖ at accuracy ε/6 (using that λ_min(H_PS) ≥ −‖H_PS‖ and that … which costs Poly(m, 1/ε) using any off-the-shelf quantum simulation algorithm, such as Trotter [39], Qubitization [41], or qDrift [15], for Hamiltonian simulation within phase estimation. We may amplify the success probability to … The last inequality is Markov's. We proceed by bounding the denominator …

[Footnote 10] Strictly speaking, the process we illustrate is a quantum channel involving both quantum gates and classical randomness (i.e., repeating until success is observed). A fixed, deterministic circuit could be constructed by performing phase estimation on half of an input maximally entangled state (for which the reduced density matrix is maximally mixed), and performing fixed-point amplitude amplification [61] to coherently boost the probability of success.
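Two facts used above can be checked directly. First, repeat-until-success with per-trial success probability q fails k times in a row with probability (1 − q)^k. Reaching failure probability δ therefore needs k = ⌈ln δ / ln(1 − q)⌉ repetitions, which is roughly ln(1/δ)/q. Second, footnote 10 relies on the fact that half of a maximally entangled state is maximally mixed. The sketch below checks both. The helper names are hypothetical, and the amplitude amplification of [61] is not implemented here.

```python
import math

import numpy as np


def repetitions_needed(q: float, delta: float) -> int:
    """Smallest k with (1 - q)^k <= delta for repeat-until-success with success probability q."""
    if not (0.0 < q <= 1.0 and 0.0 < delta < 1.0):
        raise ValueError("need 0 < q <= 1 and 0 < delta < 1")
    if q == 1.0:
        return 1
    # Both logarithms are negative, so the ratio is positive.
    return math.ceil(math.log(delta) / math.log1p(-q))


def reduced_state_of_max_entangled(dim: int) -> np.ndarray:
    """Reduced density matrix of |Phi> = dim^{-1/2} sum_i |i>|i> on the first register."""
    # Amplitudes arranged as a matrix psi[i, j] = <i, j | Phi>.
    psi = np.eye(dim, dtype=complex) / np.sqrt(dim)
    # Tracing out the second register gives rho_A = psi psi^dagger.
    return psi @ psi.conj().T


print(repetitions_needed(0.05, 1e-3))
print(np.allclose(reduced_state_of_max_entangled(8), np.eye(8) / 8))
```

For example, a per-trial success probability of order ε^{3/2} (as in the Figure 1 caption) leads to O(ε^{−3/2} log(1/δ)) repetitions under this simple scheme.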
In other words, any deterministic input state (one that does not correlate with the Hamiltonian) is very likely to have small energy.

Union bound over an epsilon net. By a union bound, good concentration implies that the energies must be simultaneously small for a large family of deterministic input states, namely the states in an ε-net over the states prepared by small circuits. For a circuit consisting of G gates, there exists an … Plug in the assumption that m ≤ ε²N² to obtain the advertised result.
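Here is a small illustration of why a fixed input state has small energy. Take the product state |0…0⟩. Only Pauli strings built from I and Z have a nonzero expectation (equal to +1) in it, and they make up a fraction 2^n/4^n = 1/N of all strings. The energy is therefore a sum of about m/N random signs divided by √m. The sketch assumes the normalization H = m^{−1/2} Σ_j r_j σ_j (our reading of ensemble (2)) and cross-checks the shortcut against a dense matrix for small n. The function names are hypothetical.

```python
from functools import reduce

import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def z_type(word: str) -> bool:
    """True if the Pauli string contains only I and Z factors."""
    return all(c in "IZ" for c in word)


def energy_of_all_zeros(words, signs):
    """<0...0| H |0...0> for H = m^{-1/2} sum_j r_j sigma_j: only I/Z strings contribute (+1 each)."""
    m = len(words)
    return sum(r for w, r in zip(words, signs) if z_type(w)) / np.sqrt(m)


rng = np.random.default_rng(1)
n, m = 5, 300
words = ["".join(rng.choice(list("IXYZ"), size=n)) for _ in range(m)]
signs = rng.choice([-1.0, 1.0], size=m)
n_z = sum(z_type(w) for w in words)  # roughly m / 2^n such strings
e_fast = energy_of_all_zeros(words, signs)

# Cross-check against the dense Hamiltonian (feasible only for small n).
H = sum(r * reduce(np.kron, [PAULIS[c] for c in w]) for w, r in zip(words, signs)) / np.sqrt(m)
zero = np.zeros(2 ** n)
zero[0] = 1.0
e_dense = float(np.real(zero @ H @ zero))
print(n_z, e_fast, e_dense)
```

The variance of this energy over the random signs is n_z/m ≈ 1/N, which is the "variance suppressed by dimension" effect that the union bound exploits.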
We suspect the true circuit complexity to be Ω(m), but the current union-bound argument can only give Ω(√m). The concentration inequality needs to handle the event in which the same Pauli string occurs Ω(√m) times. There, the variance is much larger, and the optimum is roughly H = Θ(√(vn)). Plugging into the union bound yields … The union bound only supports size-O(n) circuits, roughly the circuit size of product states.

Appendix E: Missing proofs
In this section, we collect missing proofs.

Proof. The first and third moments vanish for both sets of matrices A_i and Ã_i. We calculate the second moment …

Figure 1: Abundance of low-energy states. The contour illustrates the density ρ versus the energy level E for a semicircular distribution, which is (in the large-dimension limit) the spectral distribution of the GUE. The semicircular spectral density implies an abundance of states near the ground energy. Performing phase estimation on the maximally mixed state gives a state with low energy −2(1 − ε) with a decent probability Ω(ε^{3/2}).
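The Ω(ε^{3/2}) probability in the caption comes from the square-root edge of the semicircle ρ(x) = √(4 − x²)/(2π). Near x = −2 we have ρ(−2 + t) ≈ √t/π. The mass within 2ε of the edge is therefore about (2/(3π))(2ε)^{3/2}. The sketch below computes this mass exactly from the closed-form semicircle CDF and compares it with the leading term. It concerns the limiting GUE density only, not the finite-N Pauli ensemble.

```python
import math


def semicircle_cdf(x: float) -> float:
    """CDF of the semicircle density rho(x) = sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def edge_mass(eps: float) -> float:
    """Semicircle mass in [-2, -2(1 - eps)]: the probability that phase estimation on I/N
    (in the large-N limit) lands within relative accuracy eps of the ground energy -2."""
    return semicircle_cdf(-2.0 * (1.0 - eps))


# Leading-order constant: edge_mass(eps) ~ LEADING * eps^{3/2} as eps -> 0.
LEADING = 2.0 * 2.0 ** 1.5 / (3.0 * math.pi)

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    mass = edge_mass(eps)
    print(f"eps={eps:.0e}  mass={mass:.3e}  mass/eps^1.5={mass / eps ** 1.5:.4f}  (LEADING={LEADING:.4f})")
```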


Figure 2: (Right) What could have gone wrong. For a generic matrix, the phase estimation strategy would not find a low-energy state if there are spectral outliers or if the spectral density becomes too small near the ground energy. (Left) Almost a semicircle. For the Pauli string ensemble, we control the ground energy by Schatten p-norms and the spectral density by the resolvent. Both quantities are comparable to their GUE values, and the GUE has a favorable semicircular spectrum. Therefore, enough states remain near the ground energy, and phase estimation efficiently finds them.

‖O‖_p := (Tr |O|^p)^{1/p} and |||O|||_p := (E Tr |O|^p)^{1/p}

[Figure 3 labels: |R|, |R|^p, O(η), O(η/√p), ∼ 1/|E − ω|]

Figure 3: Probing the spectrum by resolvents.The resolvent (black curve) centered at energy ω with resolution parameter η filters out energies distant from ω. Taking powers of the resolvent (red curve) focuses the filter on a narrower region around ω.
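The narrowing in Figure 3 can be quantified. Evaluated on an eigenvalue E and normalized by η^p as in the proxy Q(E_0), the filter is (η/|E − ω + iη|)^p = (1 + (E − ω)²/η²)^{−p/2}. It drops to 1/2 at |E − ω| = η√(2^{2/p} − 1). For large p this half-width is ≈ η√(2 ln 2/p), i.e., of order η/√p. For p = 1 the tail decays like η/|E − ω|. A minimal sketch with hypothetical helper names:

```python
import numpy as np


def filter_value(E, omega, eta, p):
    """eta^p |R_{omega,eta}|^p on an eigenvalue E, i.e. (eta / |E - omega + i eta|)^p."""
    return (eta / np.abs(E - omega + 1j * eta)) ** p


def half_max_width(eta, p):
    """Distance |E - omega| at which the normalized filter equals 1/2."""
    return eta * np.sqrt(2.0 ** (2.0 / p) - 1.0)


eta = 0.05
for p in (1, 4, 16, 64, 256):
    w = half_max_width(eta, p)
    # The last column approaches sqrt(2 ln 2) ~ 1.177, i.e. width ~ eta / sqrt(p).
    print(f"p={p:4d}  half-width={w:.4f}  half-width*sqrt(p)/eta={w * np.sqrt(p) / eta:.3f}")
```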

Proposition V.1 (Efficient classical witness at arbitrary constant accuracy). For any ε and large enough m = Ω(Poly(n, ε^{−1})), there is a polynomial p_d(x) of degree d = O(1/√ε) such that the associated ansatz state ρ ∝ p_d(H)² has low energy, Tr[ρH] ≤ (1 − ε)λ_min(H). Further, this can be efficiently verified classically in runtime O((dm)^{2d} nd).

Proof of Proposition V.1. Using a power-series approximation (i.e., Taylor expansion) of the Gibbs state gives a suboptimal degree d = O(1/ε). A better degree d = O(1/√ε) can be achieved using a Chebyshev polynomial approximation of the Gibbs state. The verification algorithm simply evaluates all the (dm)^{2d} terms in the ansatz state ρ. Each of the (dm)^{2d} terms requires 2d multiplications of Pauli strings, each with cost O(n).

Lemma A.1 (From differences to derivatives). Define coefficient functions a(s) := a^s and b(s) := b^s for s ∈ [0, m].

Proposition B.1 (GUE: Concentration for resolvent moments). For a matrix with jointly Gaussian entries H := Σ_i g_i A_i, where g_i ∼ N(0, 1), and even p, the spectral density (probed by resolvent powers) concentrates: Tr|R|^p − E Tr|R|^p …

Figure 4: Probing the low-energy states E ≤ E0 via consecutive resolvent powers.
The second inequality converts the operator norm to p-th moments via ‖H‖^p ≤ Tr H^p (for even p), and the third inequality keeps the leading-order terms via the notation O(·). The third line uses m ≤ N² and chooses appropriate parameters p = c log(N)/ε and m = c_1 p^4 …
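The step ‖H‖^p ≤ Tr H^p (even p) loses little when p grows like log N. For Hermitian H, ‖H‖ ≤ ‖H‖_p ≤ N^{1/p}‖H‖. With p ≥ 2 ln N, the factor N^{1/p} is at most e^{1/2}. The sketch below checks these inequalities on a random Hermitian test matrix normalized so that its spectrum is roughly [−2, 2]. This test matrix is our choice and is not the paper's ensemble.

```python
import math

import numpy as np


def schatten_norm(H: np.ndarray, p: float) -> float:
    """||H||_p = (Tr |H|^p)^{1/p}, computed from the eigenvalues of a Hermitian H."""
    evals = np.linalg.eigvalsh(H)
    return float(np.sum(np.abs(evals) ** p) ** (1.0 / p))


rng = np.random.default_rng(2)
N = 256
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
H = (A + A.conj().T) / np.sqrt(4 * N)  # Hermitian, E|H_ij|^2 = 1/N
op_norm = float(np.linalg.norm(H, 2))
p_log = 2 * math.ceil(math.log(N))  # even p of order log N

for p in (2, 4, 8, p_log):
    print(f"p={p:2d}  ||H||={op_norm:.4f}  ||H||_p={schatten_norm(H, p):.4f}  N^(1/p)={N ** (1 / p):.3f}")
```

For p = 2 the Schatten norm overestimates the operator norm by about √N. At p = 2⌈ln N⌉ the overestimate is bounded by a constant, which is why moment methods take p proportional to log N.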

Theorem II.1 (Low-energy states have low complexity). For any accuracy ε ≥ 2^{−n/c_1}, let H_PS be drawn from the Pauli string ensemble (2) with m terms. Then the following statement holds with probability at least 1 − e^{−c_3 n^{1/3}} over a random draw of H_PS from the Pauli string ensemble: we can prepare a low-energy state ρ such that Tr[ρH_PS] ≤ (1 − ε) · λ_min(H_PS) using a circuit of size G = Poly(n, ε^{−1}). The quantities c_1, c_2, and c_3 are absolute constants, and λ_min(H_PS) denotes the smallest eigenvalue of H_PS (which is typically negative).

Proof of Theorem II.1. The resolvent probes the local density of states, and we are interested in controlling the integrated density of states in an energy window. The idea is to construct a proxy for the low-energy projector from consecutive local resolvents (Figure 4). Consider

Q(E_0) := Σ_{−2 ≤ ω_ℓ ≤ E_0} η^p |R_{ω_ℓ,η}|^p =: Σ_E |E⟩⟨E| q_{E_0}(E)

as a proxy for the projector Σ_E |E⟩⟨E| 1(E ≤ E_0) at low energy E_0 := −(1 − ε/3) · 2. The resolvents are spaced appropriately: |R_{ω,η}|^p := 1/|H − ω + iη|^p, with ω_ℓ := ℓ · η/√p for ℓ ∈ Z and p = c_1 log(N)/ε. We will calculate E_GUE Tr Q(E_0) for the GUE, show that Tr Q(E_0) for the Pauli string ensemble takes comparable values, and then extract the low-energy density of states.

GUE values. Recall the GUE resolvent values (Corollary B.1): E_GUE Tr|R_{ω,η}|^p ≥ …, where S_{ω,η,p} is an integral over the semicircle defined in Corollary B.1. The second inequality uses the GUE estimate (Corollary B.1) and imposes the simplifying constraint N ≥ const/ε^4, so that S_{ω,η,p}/2 ≥ const · N^{−1/2} even near the spectral edge. The third inequality evaluates S_{ω,η,p}. Applying a crude Riemann sum over the semicircular density near the edge and dropping constants gives a bound on the expected value

E_GUE Tr Q(E_0) := Σ_{−2 ≤ ω_ℓ ≤ E_0} η^p E_GUE Tr|R_{ω_ℓ,η}|^p = Ω(…).

Pauli string ensemble values. Taking a crude union bound over the local resolvents, we ensure that all of them are at least half of the GUE expectation with high probability:

Pr[Tr Q(E_0) ≤ ½ E_GUE Tr Q(E_0)] ≤ Σ_{ω_ℓ ≤ E_0} Pr[Tr|R_{ω_ℓ,η}|^p ≤ ½ E_GUE Tr|R_{ω_ℓ,η}|^p] ≲ √p · e^{−Ω(log(N)^{1/3})} ≪ 1 (Claim). (C2)
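The proxy Q(E_0) acts on each eigenvector |E⟩ through the scalar profile q_{E_0}(E) = Σ_ℓ η^p/|E − ω_ℓ + iη|^p. The sketch below evaluates this profile. It assumes resolvent centers ω_ℓ = −2 + ℓ · η/√p kept while ω_ℓ ≤ E_0; this spacing is our reading of the garbled source. It also uses illustrative values ε = 0.3, η = 0.05, p = 40, which are not the theorem's parameters. The profile is of order one inside [−2, E_0] and already of order e^{−Ω(p)} at E_0 + η, which is the step-function behavior used when extracting the density of states.

```python
import numpy as np


def resolvent_centers(E0: float, eta: float, p: int) -> np.ndarray:
    """Hypothetical grid omega_l = -2 + l * eta / sqrt(p), kept while omega_l <= E0."""
    spacing = eta / np.sqrt(p)
    n_points = int(np.floor((E0 + 2.0) / spacing)) + 1
    return -2.0 + spacing * np.arange(n_points)


def q_proxy(E: float, E0: float, eta: float, p: int) -> float:
    """q_{E0}(E) = sum_l eta^p / |E - omega_l + i eta|^p, the eigenvalue profile of Q(E0)."""
    omegas = resolvent_centers(E0, eta, p)
    return float(np.sum((eta / np.abs(E - omegas + 1j * eta)) ** p))


eps, eta, p = 0.3, 0.05, 40  # illustrative parameters only
E0 = -(1.0 - eps / 3.0) * 2.0
for E in (-1.9, E0, E0 + eta, E0 + 3 * eta, 0.0):
    print(f"E={E:+.3f}  q={q_proxy(E, E0, eta, p):.3e}")
```

Because neighboring filters overlap, q_{E_0} inside the window is a constant of order one rather than exactly 1. The constant only affects the step-function estimate up to constant factors.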

Still, we obtain a growing circuit-size lower bound Ω(√m) by an elementary argument. In retrospect, it crucially depends on the noncommutativity of Pauli strings: the variance is suppressed by the dimension. In contrast, the argument only gives Ω(n) circuit-size lower bounds (which are useless) for random complete k-local Hamiltonians with fixed k. Concretely, let P_k be the set of Pauli strings of weight k and consider the ensemble H = Σ_{σ∈P_k} r_σ σ, where r_σ are uniform random signs. Then, as in the proof of Lemma D.1, define a_σ = r_σ ⟨σ⟩_ψ, and compute (viewing k = O(1)) v = Σ_{σ∈P_k} E[a_σ²] = Θ(|P_k|) = Θ(n^k), and |a_σ| ≤ L = 1.
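The scaling |P_k| = Θ(n^k) follows from the exact count |P_k| = C(n, k) · 3^k: choose the k non-identity sites, then one of X, Y, Z on each. The sketch below checks the formula by brute force for small n and prints the ratio |P_k|/n^k, which tends to 3^k/k! (4.5 for k = 2). The function names are hypothetical.

```python
from itertools import product
from math import comb


def count_weight_k(n: int, k: int) -> int:
    """Number of n-qubit Pauli strings with exactly k non-identity factors."""
    return comb(n, k) * 3 ** k


def count_weight_k_bruteforce(n: int, k: int) -> int:
    """Same count by enumerating all 4^n strings (small n only)."""
    return sum(1 for word in product("IXYZ", repeat=n) if sum(c != "I" for c in word) == k)


for n in (10, 100, 1000):
    print(n, count_weight_k(n, 2), count_weight_k(n, 2) / n ** 2)
```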
ε/(2√m)-net {|ψ_i⟩} for Circ(G) with cardinality #{|ψ_i⟩} ≤ exp(O(G log(G√m/ε))). This is justified as follows. Any circuit with G two-qubit gates is equivalently given by a product of fixed CNOT gates interspersed with KG single-parameter single-qubit rotation gates by certain angles, with K = O(1). If we cast an (ε/(2√m KG))-net over the interval [0, 2π] for each of these KG rotation angles, the set of circuits we generate forms an ε/(2√m)-net over states in Circ(G), and the cardinality of the set is at most (4π√m KG/ε)^{KG}. One of the elements of this net is guaranteed to approximate the state |ψ⟩ ∈ Circ(G) that achieves the supremum of ⟨H⟩_ψ up to error ε/(2√m), and since ‖H‖ ≤ √m holds, we have sup_{|ψ⟩} ⟨H⟩_ψ ≤ max_i ⟨H⟩_{ψ_i} + ε/2. We have therefore reduced the supremum over states prepared by size-G circuits to the maximum over the (ε/(2√m))-net, where the union bound applies: … (D1) Therefore, there exist constants c_1, c_2 such that G ≤ c_1 min(√m, ε²N) · log^{−1}(m) implies Pr[sup_{|ψ⟩∈Circ(G)} …
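The net construction has two ingredients. The first is a per-angle grid on [0, 2π] with ⌈2π/δ⌉ points, so every angle lies within δ of a grid point. The second is the bound ‖exp(−iθZ/2) − exp(−iθ′Z/2)‖ = 2|sin((θ − θ′)/4)| ≤ |θ − θ′|/2. Telescoping this bound over the KG rotations turns angle errors into a state error. The sketch below checks both and evaluates log (4π√m KG/ε)^{KG}. The choice of Z rotations and the parameter values are illustrative assumptions, and the helper names are hypothetical.

```python
import math

import numpy as np


def angle_net(delta: float) -> np.ndarray:
    """Uniform grid on [0, 2*pi) with ceil(2*pi/delta) points (spacing <= delta)."""
    count = math.ceil(2 * math.pi / delta)
    return np.arange(count) * (2 * math.pi / count)


def covering_radius(grid: np.ndarray, thetas: np.ndarray) -> float:
    """Largest distance from a sample angle in [0, 2*pi) to its nearest grid point."""
    diff = np.abs(thetas[:, None] - grid[None, :])
    return float(diff.min(axis=1).max())


def rotation_z(theta: float) -> np.ndarray:
    """Single-qubit rotation exp(-i theta Z / 2)."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])


def log_net_cardinality(G: int, m: int, eps: float, K: int) -> float:
    """log of (4*pi*sqrt(m)*K*G/eps)^(K*G), i.e. KG angles each at resolution eps/(2 sqrt(m) K G)."""
    return K * G * math.log(4 * math.pi * math.sqrt(m) * K * G / eps)


rng = np.random.default_rng(4)
delta_demo = 0.01  # coarse resolution so the demo grid stays small
grid = angle_net(delta_demo)
thetas = rng.uniform(0.0, 2 * math.pi, size=1000)
print(len(grid), covering_radius(grid, thetas))

G, m, eps, K = 50, 10_000, 0.1, 3
print(f"log #net = {log_net_cardinality(G, m, eps, K):.1f}")
```

The logarithm of the net size grows like KG · log(G√m/ε), matching the exp(O(G log(G√m/ε))) cardinality above. The union bound succeeds as long as this stays below the concentration exponent.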