An improved 1D area law for frustration-free systems

We present a new proof for the 1D area law for frustration-free systems with a constant gap, which exponentially improves the entropy bound in Hastings' 1D area law, and which is tight to within a polynomial factor. For particles of dimension $d$, spectral gap $\epsilon>0$ and interaction strength of at most $J$, our entropy bound is $S_{1D}\le O(1)\cdot X^3\log^8 X$, where $X := (J\log d)/\epsilon$. Our proof is completely combinatorial, combining the detectability lemma with basic tools from approximation theory. Incorporating locality into the proof when applied to the 2D case gives an entanglement bound that is at the cusp of being non-trivial, in the sense that any further improvement would yield a sub-volume law.


I. INTRODUCTION
One of the striking differences between quantum and classical systems is the number of parameters needed to describe them. A classical system of $n$ particles can generally be described by $O(n)$ parameters, whereas an arbitrary state of a similar quantum system would generally require $2^{O(n)}$ parameters. This exponential gap is directly related to the phenomenon of entanglement: quantum states need not be simple product states, but can be arbitrary superpositions of such states.
But how genuine is this exponential gap? Is it an artifact of the fact that we are considering arbitrary quantum states, or is it an inherent characteristic of physical states that occur in nature, which are, of course, a much more restricted set of states? Among the best physical systems one may consider with respect to this question are quantum many-body systems on a lattice, which are ubiquitous in condensed matter physics. These systems are generally described by a local Hamiltonian that models the local interaction between neighboring particles. In particular, one is interested in the entanglement properties of the ground state of these systems. If the system has a spectral gap $\epsilon > 0$, we can approach this state by cooling the system to temperatures below that gap. Such cold systems would be in an almost perfectly coherent state, and one expects their quantum nature to be most pronounced. What, then, are the entanglement properties of such states?
Area laws constitute one of the most important tools for bounding entanglement in such systems.$^{1}$ Starting from Bekenstein's seminal result that the entropy of a black hole is proportional to the surface area of its horizon,$^{2}$ it was later conjectured that the origin of this entropy is the quantum entanglement between the inner part of the black hole and its surroundings.$^{3,4}$ This conjecture led researchers to consider the scaling of entanglement entropy in the ground (vacuum) states of quantum field theories,$^{5,6}$ where it was demonstrated that there, too, the entropy scales like the surface area of the region rather than its volume, albeit with logarithmic corrections in critical cases. The same behavior was then demonstrated in the context of spin chains by Vidal et al.,$^{7}$ which led to the formulation of the area-law conjecture that we now describe.
Consider a local-Hamiltonian system of $d$-dimensional particles that sit on a $D$-dimensional grid, as illustrated in Fig. 1. We say that a state $|\psi\rangle$ of this system obeys an area law if, for any contiguous region $L$ of the grid, the entanglement between the particles inside $L$ and the particles outside $L$ is upper bounded by a constant times the surface area of $L$:
$$S(\rho_L) \le O(1)\cdot|\partial L| ,$$
where $\rho_L$ is the reduced density matrix of $|\psi\rangle$ on $L$ and $S$ is the von Neumann entropy. The area-law conjecture then states that if the system has a constant spectral gap $\epsilon > 0$, its ground state necessarily obeys an area law. This is clearly much stronger than the trivial bound on the entropy (known as a volume law), which is proportional to the number of particles inside $L$.
On the face of it, there are several reasons to believe that ground states of gapped local Hamiltonians obey an area law. One might intuitively expect the entanglement to be local because it is generated by local interactions. More formally, Hastings has shown that in the ground state of a gapped system, the correlation between two local observables decays exponentially in their lattice distance.$^{8}$ It is, therefore, tempting to conclude that only the degrees of freedom near the boundary are entangled with those outside the region. However, the existence of data hiding states$^{9}$ implies that such simple reasoning is probably insufficient to prove an area law.$^{10}$ Indeed, even though the area-law conjecture was shown to hold in many specific models, mostly in 1D (see, for example, Refs. 1, 7, 11, and 12 and references therein), it was not until a few years ago, in a seminal paper,$^{13}$ that Hastings proved that it holds for all 1D systems with a spectral gap. In this case, an area law says that the ground-state entropy across any cut in the 1D chain is bounded by a constant, independent of the system size. From this, one can deduce that, for all practical purposes, the ground state of such systems can be described by a polynomial number of parameters (say, by a Matrix Product State), instead of an exponential number (see Ref. 13).

FIG. 1. An illustration of a quantum many-body system on a grid. The particles sit on the vertices and the edges denote 2-body interactions. When the system is described by a state that obeys an area law, the entanglement entropy between the particles inside the region $L$ and the particles outside of it is proportional to the surface area $|\partial L|$.
Unfortunately, Hastings' 1D proof does not scale up to higher dimensions. One reason lies in the form of the entropy bound. For a 1D system with a bounded interaction strength $J$, a spectral gap $\epsilon > 0$, and a particle dimension $d$, the upper bound is $2^{O(X)}$, where $X := (J\log d)/\epsilon$. This exponential dependence on $\log d$ is catastrophic if we want to use this formula in higher dimensions by fusing together particles along surfaces parallel to $\partial L$: the fused particles have dimension $d^{O(|\partial L|)}$, so $\log d$ grows like $|\partial L|$, and the formula implies an entropy bound of $2^{O(|\partial L|)}$ instead of $O(|\partial L|)$. This is exponentially larger than the trivial volume-law bound of $O(|L|)$. Proving area laws in two or more dimensions therefore remains a wide open problem and a holy grail of quantum Hamiltonian complexity.
A combinatorial approach to proving the area law for 1D frustration-free systems was introduced in Ref. 14. The proof replaced Hastings' analytical machinery, including the Lieb-Robinson bound and spectral Fourier analysis, with the detectability lemma,$^{15}$ a combinatorial lemma about local Hamiltonians. However, the resulting bound was no better than Hastings' bound because, at its heart, the argument followed the same outline as Hastings', including the use of a "mutual information saturation"-type argument (or, in effect, a kind of "monogamy of entanglement"-type argument) that leads to an exponential slack. Nevertheless, the combinatorial nature of the detectability lemma opened up the possibility of a new, inherently combinatorial proof of the area law.

A. Our results
In this paper we give a new proof of the area law in 1D for the case of gapped frustration-free Hamiltonians. Our proof yields an exponentially better bound on the entanglement entropy than the bound in Ref. 13. Specifically, we prove

Theorem I.1 Consider a 1D, frustration-free local Hamiltonian $H = \sum_i H_i$ over a system of $n$ particles of local dimension $d$, with the $H_i$ being 2-local nearest-neighbor interactions of bounded strength $\|H_i\| \le J$. Assume further that the system has a unique ground state $|\Omega\rangle$ and a spectral gap $\epsilon > 0$, and define
$$X := \frac{J\log d}{\epsilon}. \tag{2}$$
Then along any cut in the chain, the von Neumann entanglement entropy between the two sides is bounded by
$$S_{1D} \le O(1)\cdot X^3\log^8 X. \tag{3}$$

The importance of this work is threefold. First, in 1D, we exponentially improve the upper bound as a function of $\epsilon^{-1}$: the bound in Eq. (3) implies an upper bound of $O(\epsilon^{-3}\log^8\epsilon^{-1})$. This is within a polynomial factor of recent lower bounds found by Hastings and Gottesman$^{16}$ and by Irani.$^{17,18}$ The former exhibited a 1D system with fixed $d$ and $J$ in which the entropy along a cut is at least $\Omega(\epsilon^{-1/4})$, and the latter showed a lower bound of $\Omega(\epsilon^{-1/12})$ for a translation-invariant system. Second, and more important, is the relevance of this work for proving area laws in higher dimensions. A naive application of our bound to higher-dimensional systems would yield an entropy bound of $S \le |\partial L|^3\,\mathrm{poly}(\log|\partial L|)$, which is still worse than a volume law but much better than the previous exponential bound. Moreover, as we show in Sec. V, in two or more dimensions one can further exploit the local properties of the system along the boundary $\partial L$ and improve the bound to $|\partial L|^2\cdot\mathrm{poly}(\log|\partial L|)$. This bound is at the cusp of being nontrivial: any further improvement that would bound the entropy by $|\partial L|^{2-\delta}$ for some $\delta > 0$ would prove a sub-volume law for 2D. In fact, this gives a sub-volume law for fractals with dimension $1 < D < 2$, but we shall not pursue that direction here.
Finally, our approach differs from Hastings' original proof. Here, the "monogamy of entanglement"-type argument is replaced with an iterative procedure that finds product states with increasingly higher overlap with the ground state. The central quantity used to analyze the progress of this procedure is the Schmidt rank, rather than more advanced tools like relative entropy and its monotonicity. While the iterative procedure is based on the detectability lemma, a much more intricate combinatorial structure is necessary to ensure that the procedure makes progress.
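As a rough illustration of the scaling comparison above (not part of the paper's argument), the sketch below evaluates $\log_2$ of three entropy scalings for a 2D square region. It assumes, purely for illustration, that every hidden $O(1)$ constant equals 1, that $d = 2$ and $\epsilon = 1$, and that fusing the particles along each boundary-parallel surface turns $X$ into $|\partial L|\log d/\epsilon$; the function name `log2_entropy_scalings` is hypothetical.

```python
import math

def log2_entropy_scalings(boundary, d=2, eps=1.0):
    """log2 of three entropy scalings for a 2D region, all O(1) constants set to 1.

    Fusing the particles along each surface parallel to the boundary gives
    particles of dimension d**boundary, so X = boundary * log2(d) / eps.
    """
    X = boundary * math.log2(d) / eps
    return {
        "hastings": X,                                       # log2 of 2**X
        "theorem_I1": math.log2(X**3 * math.log2(X) ** 8),   # log2 of X^3 log^8 X
        "volume": math.log2(boundary**2),                    # |L| ~ |dL|^2 for a 2D square
    }

for b in (16, 100, 1000):
    s = log2_entropy_scalings(b)
    print(b, {k: round(v, 1) for k, v in s.items()})
```

For small boundaries the polylogarithmic factor dominates (at $|\partial L| = 16$ the polynomial bound is the larger one); the exponential separation from Hastings' bound, and the polynomial gap above the volume law, are asymptotic statements.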
We now give a high-level overview of the proof.
B. High-level overview of the proof

Consider a 1D chain of $n$ $d$-dimensional particles with nearest-neighbor interactions, described by the gapped, frustration-free Hamiltonian $H = \sum_i Q_i$ with a unique ground state $|\Omega\rangle$. For the sake of simplicity, we assume that the $Q_i$ are projections and, therefore, $P_i := 1 - Q_i$ are projections onto the local ground spaces of the different terms.
The key to proving an area law across a cut is to find a product state $|\phi\rangle = |\phi_L\rangle\otimes|\phi_R\rangle$, with respect to the bipartition of the system, that has a large overlap with $|\Omega\rangle$. Our approach to finding such a product state is to start with any product state that has a nonzero overlap with $|\Omega\rangle$ and act on it with an operator that increases its overlap with $|\Omega\rangle$ without increasing its Schmidt rank much. Specifically, we construct an operator $K$ with the following property: $K$ fixes $|\Omega\rangle$, but when applied to any state $|\psi\rangle$, it shrinks the component orthogonal to $|\Omega\rangle$ by a factor of $\Delta$ while increasing the Schmidt rank of $|\psi\rangle$ by at most a factor of $D$. Clearly, there is a race between the two factors $D$ and $\Delta$. It turns out that when $D\cdot\Delta < 1/2$, we can amplify the overlap with $|\Omega\rangle$ by replacing $|\phi\rangle$ with one of the Schmidt vectors of $K|\phi\rangle$. This amplification continues until the overlap reaches $1/\sqrt{2D}$. A few more applications of $K$ to this product state yield a state with Schmidt rank $D^{O(1)}$ that has a constant ($D$-independent) overlap with $|\Omega\rangle$. Further applications of $K$ show that the remaining Schmidt coefficients of $|\Omega\rangle$ carry a rapidly vanishing mass, and, therefore, the entanglement entropy of $|\Omega\rangle$ can be bounded by $O(\log D)$.
The task of proving an area law therefore reduces to finding an operator $K$ with $D\cdot\Delta < 1/2$. Our starting point is the detectability lemma (DL). Denote the spectral gap by $\epsilon > 0$. We can partition the projections $\{P_i\}$ into two subsets of even and odd projections, which are called "layers" (see Fig. 2). Inside each layer, the projections commute because they do not intersect. Consequently, $\Pi_{\mathrm{odd}} := P_1P_3P_5\cdots$ and $\Pi_{\mathrm{even}} := P_2P_4P_6\cdots$ are the projections onto the common ground spaces of the odd and even layers, respectively. Then, by the DL, the operator $A := \Pi_{\mathrm{even}}\Pi_{\mathrm{odd}}$ is an approximation to the ground-state projection. It preserves the ground state while shrinking its orthogonal complement by an $n$-independent factor $\Delta_0(\epsilon) \approx 1 - c\epsilon$, where $c$ is a geometrical factor. Moreover, each application of $A$ increases the Schmidt rank of our state by at most a constant factor $D_0 := d^2$ (due to the projection that intersects the cut in the chain; see Fig. 2). Unfortunately, we generally expect $D_0\cdot\Delta_0 > 1$, so the operator $A$ does not, by itself, suffice to carry out our plan.
To construct the operator $K$ we need several new ideas. First, we observe that $D_0$ and $\Delta_0$ can be replaced by $D_0^k$ and $\Delta_0^k$, respectively, by coarse graining: fuse $k$ adjacent particles into a single particle of dimension $d^k$. Although this only increases the value of the product, it creates room for the next step, which is to modify the operator $A$ so as to decrease the factor by which it blows up the Schmidt rank. For concreteness, assume that the even layer contains the projection that intersects the cut. We will focus on a segment of $m$ projections around the cut and denote their product by $\Pi_m$, so that $\Pi_{\mathrm{even}} = \Pi_m\Pi_{\mathrm{rest}}$. We will replace the operator $\Pi_{\mathrm{even}}$ with $\hat{\Pi}_m\Pi_{\mathrm{rest}}$, which closely approximates $\Pi_{\mathrm{even}}$ while increasing the Schmidt rank by much less than $D_0^k$ (when amortized over several applications).
One of the great benefits of using the DL is that all projections in a given layer commute, and, hence, much of the following analysis becomes almost classical. Indeed, the $m$ projections $\{P_i\}_{i=1}^m$ around the cut define a decomposition of the Hilbert space of the system into a direct sum of $2^m$ joint eigenspaces, called sectors. Each sector is defined by a string $s = (s_1,\ldots,s_m)$, such that if $|\psi_s\rangle$ is in the $s$ sector, $P_i|\psi_s\rangle = (1-s_i)|\psi_s\rangle$. A site with $s_i = 1$ is called a violation, since it corresponds to a nonzero energy of the corresponding local Hamiltonian term, and $|s| := \sum_{i=1}^m s_i$ is the total number of violations in the sector $s$. Now, an arbitrary state $|\psi\rangle$ can be decomposed as $|\psi\rangle = |\psi_0\rangle + |\psi_1\rangle$, where $|\psi_0\rangle$ is its projection onto the zero-violations sector and $|\psi_1\rangle$ is its projection onto the violating sectors. Clearly $\Pi_m|\psi\rangle = |\psi_0\rangle$. To approximate this behavior, we will use the $\{P_i\}$ projections to construct an operator $\hat{\Pi}_m$ that is diagonal in the sector decomposition and satisfies $\hat{\Pi}_m|\psi\rangle = |\psi_0\rangle + |\hat{\psi}_1\rangle$, with $|\hat{\psi}_1\rangle$ in the violating sectors and $\|\hat{\psi}_1\|^2 \le \delta\|\psi_1\|^2$. It follows that the operator $\hat{A} := \hat{\Pi}_m\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}}$ approximates $\Pi_{\mathrm{even}}\Pi_{\mathrm{odd}}$ in the sense that $\hat{A}|\Omega\rangle = |\Omega\rangle$, $\hat{A}|\Omega^\perp\rangle \in \mathcal{H}^\perp$, and $\|\hat{A}|\Omega^\perp\rangle\|^2 \le \|A|\Omega^\perp\rangle\|^2 + \delta\,\||\Omega^\perp\rangle\|^2$. To construct the operator $\hat{\Pi}_m$, first consider the operator $N := \sum_{i=1}^m (1-P_i)$. The operator $N$ counts the number of violations in a sector: if $|\psi_s\rangle$ belongs to the $s$ sector, $N|\psi_s\rangle = |s|\cdot|\psi_s\rangle$. The operator $\hat{\Pi}_m$ will be a polynomial $P$ in $N$, with $P(0) = 1$ and $|P(|s|)|^2 \le \delta$ whenever $1 \le |s| \le m$. Three ideas play a critical role in the construction of this polynomial and in bounding the increase in Schmidt rank. The first is the use of a Chebyshev polynomial, which achieves the desired behavior at the $m+1$ points with a degree of only $j = O(\sqrt{m}\log\delta^{-1})$. The second idea is that it suffices to bound the entanglement across any one of the $m$ cuts and then to pay a further penalty of at most $D_I := (D_0^k)^m$ to bound the entanglement across the cut of interest.
So, if we consider the operator $\hat{A}^\ell$, each term has degree $j\ell$ (i.e., is a product of $j\ell$ of the $P_i$'s), and so the typical cut is crossed $j\ell/m$ times, resulting in a Schmidt rank increase by a factor of $(D_0^k)^{j\ell/m}$. This means that the incremental Schmidt rank per application of a term of $\hat{A}$ is $D_0^{kj/m}$, which, since $j = O(\sqrt{m}\log\delta^{-1})$, is roughly $D_0^{k/\sqrt{m}}$; its exponent can be made arbitrarily small by choosing $m$ large enough. Finally, a recursive grouping argument shows that we do not have to pay a price in Schmidt rank proportional to the number of terms in the polynomial (which would have been catastrophic); instead, we can decompose the operator $\hat{A}^\ell$ as a sum of only $2^{O(\log^2 j)}$ operators, for each of which there is a (possibly different) cut with a correspondingly small entanglement increase. Putting it all together, we have an operator $K = \hat{A}^\ell$, which increases the Schmidt rank by a factor of $D = D_I\hat{D}^\ell$, where $\hat{D}$ is the per-application factor described above, and which achieves a shrinking factor of $\Delta = \hat{\Delta}^\ell$ for $\hat{\Delta} = \Delta_0^k + \delta$. It is now a matter of simple algebra to fix the parameters $m$, $\delta$, $k$ and $\ell$ such that $D\cdot\Delta < 1/2$. The end result turns out to be $\log D = O(1)\cdot X^3\log^8 X$ for $X = (\log D_0)/\epsilon$, and this completes the proof.

Paper Organization:
We begin with some preliminary definitions and known results from mathematics and quantum information in Sec. II. In Sec. III we give the proof of our main result, Theorem I.1. The heart of the proof, which is its most technical part, is the diluting lemma. It is proved separately in Sec. IV. In Sec. V we provide an outline for our entanglement bound in 2D and beyond (the "almost volume law" result), and in Sec. VI we sketch the implication of our area law to the existence of a matrix product state approximation for the ground state. In Sec. VII we offer our summary and conclusions.

II. NOTATION AND PRELIMINARIES
Throughout this paper, $\log(\cdot)$ denotes the base-2 logarithm.

A. Local Hamiltonians
We consider a $k$-local Hamiltonian $H$ acting on $\mathcal{H} = (\mathbb{C}^d)^{\otimes n}$, the Hilbert space of $n$ particles (spins, qudits) of dimension $d$ that sit on a $D$-dimensional grid. Our main result concerns the 1D case with $D = 1$, but in Sec. V we will consider higher dimensions. We assume $H = \sum_i H_i$, where each $H_i$ is a non-negative bounded operator that acts non-trivially on a constant number $k$ of particles (hence the term local Hamiltonian). We further assume that $H$ has a unique ground state $|\Omega\rangle$ with ground energy 0 and a constant spectral gap $\epsilon > 0$. Since the $H_i$ are all non-negative, the ground state must be a common zero eigenstate of each of the $H_i$; a Hamiltonian with this feature is known as frustration free. We denote by $\mathcal{H}^\perp \subset \mathcal{H}$ the orthogonal complement of the ground space of $H$. Thus, $\mathcal{H}^\perp$ is an invariant subspace of $H$ and $H|_{\mathcal{H}^\perp} \ge \epsilon$. We further assume that the $H_i$ terms are projections, and, henceforth, we will denote them by $Q_i$ as a reminder. We define $P_i := 1 - Q_i$ to be the projection onto the ground space of $Q_i$. The assumption that $H$ is made of projections is not actually a restriction, as is demonstrated, for example, in Sec. 2 of Ref. 14. Indeed, given a general frustration-free system with a spectral gap $\epsilon > 0$ and interaction strength bounded by $\|H_i\| \le J$, one can always pass to an equivalent system that shares the same ground space, is made of projections, and has a spectral gap of at least $\epsilon/J$. Therefore, throughout the paper, we drop the $J$ dependence: we assume $J = 1$ and work in dimensionless units.

B. The Schmidt decomposition
Given a state $|\phi\rangle$ and a bipartition of the system into two non-intersecting sets, $L$ and $R$, with corresponding Hilbert spaces $\mathcal{H}_L$, $\mathcal{H}_R$ such that $\mathcal{H} = \mathcal{H}_L\otimes\mathcal{H}_R$, we can consider the Schmidt decomposition of the state along this cut: $|\phi\rangle = \sum_j \lambda_j|L_j\rangle\otimes|R_j\rangle$. Here $\lambda_1 \ge \lambda_2 \ge \ldots$ are the Schmidt coefficients. Their squares sum to 1 (if $|\phi\rangle$ is normalized) and are equal to the nonzero eigenvalues of the reduced density matrix on either side of the cut.
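Numerically, the Schmidt decomposition is the singular value decomposition of the state's coefficient matrix. The minimal sketch below (helper names `schmidt_decomposition` and `schmidt_rank` are hypothetical, and the row-major index convention is an assumption) reshapes a bipartite vector into a $\dim\mathcal{H}_L\times\dim\mathcal{H}_R$ matrix and reads off the coefficients; it also checks that their squares match the spectrum of the reduced density matrix.

```python
import numpy as np

def schmidt_decomposition(psi, dim_left, dim_right):
    """Schmidt decomposition of a bipartite pure state via the SVD.

    psi is a vector of length dim_left * dim_right in the product basis |a>|b>,
    with index a * dim_right + b (numpy's row-major order).  Returns (lam, U, Vh)
    such that psi = sum_j lam[j] * kron(U[:, j], Vh[j, :]), lam sorted descending.
    """
    M = np.asarray(psi).reshape(dim_left, dim_right)
    U, lam, Vh = np.linalg.svd(M, full_matrices=False)
    return lam, U, Vh

def schmidt_rank(psi, dim_left, dim_right, tol=1e-10):
    """Number of Schmidt coefficients above a numerical tolerance."""
    lam, _, _ = schmidt_decomposition(psi, dim_left, dim_right)
    return int(np.sum(lam > tol))

rng = np.random.default_rng(0)
psi = rng.normal(size=16) + 1j * rng.normal(size=16)   # 4 qubits, cut 2|2
psi /= np.linalg.norm(psi)
lam, U, Vh = schmidt_decomposition(psi, 4, 4)
M = psi.reshape(4, 4)
rho_L = M @ M.conj().T                                 # reduced density matrix of L
```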
The number of nonzero Schmidt coefficients in the Schmidt decomposition of |φ is called the Schmidt rank (SR), which we shall denote as SR(φ). The usefulness of the SR stems from it being a "worst case" estimate for the entanglement. As such, it is often easy to bound. The following facts are easy to verify:

1. If $O$ is a $k$-local operator whose support intersects both $\mathcal{H}_L$ and $\mathcal{H}_R$, then it can increase the SR (with respect to the bipartition) of any state by at most a factor of $d^k$: $\mathrm{SR}(O\phi) \le d^k\,\mathrm{SR}(\phi)$.
2. If $O$ acts on only one part of the system, it cannot increase the SR.
3. Consider a 1D system. If $r_i$ and $r_j$ are the SRs of $|\phi\rangle$ that correspond to the cuts between particles $i, i+1$ and $j, j+1$, then $r_j \le d^{|i-j|}\, r_i$.

An important fact about the SR is the following corollary of the Eckart-Young theorem,$^{19}$ which states that the truncated Schmidt decomposition provides the best approximation to a vector in the following sense:

Fact II.2 Let $|\psi\rangle$ be a vector on a bipartite Hilbert space $\mathcal{H}_L\otimes\mathcal{H}_R$, and let $\lambda_1 \ge \lambda_2 \ge \ldots$ be its Schmidt coefficients. Then the largest inner product between $|\psi\rangle$ and a normalized vector with Schmidt rank $r$ is $\sqrt{\sum_{j=1}^r \lambda_j^2}$.
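The SR facts above are easy to check numerically on a small chain. The sketch below is an illustration under assumptions (qubits, $n = 6$, a cut after the third particle, random operators; all helper names are hypothetical): a two-site operator straddling the cut multiplies the SR by at most $d^2$, one acting only on $L$ does not increase it, SRs at different cuts differ by at most $d^{|i-j|}$, and the normalized rank-$r$ Schmidt truncation attains the overlap $\sqrt{\sum_{j\le r}\lambda_j^2}$ of Fact II.2.

```python
import numpy as np
from functools import reduce

def schmidt_coeffs(psi, d, n, cut):
    """Schmidt coefficients of an n-particle state across the cut after `cut` particles."""
    return np.linalg.svd(psi.reshape(d**cut, d**(n - cut)), compute_uv=False)

def schmidt_rank(psi, d, n, cut, tol=1e-10):
    return int(np.sum(schmidt_coeffs(psi, d, n, cut) > tol))

def apply_two_site(op, psi, d, n, site):
    """Apply a (d^2 x d^2) operator to particles site and site+1 (0-based)."""
    full = np.kron(np.kron(np.eye(d**site), op), np.eye(d**(n - site - 2)))
    return full @ psi

def random_product_state(d, n, rng):
    vecs = [rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(n)]
    psi = reduce(np.kron, vecs)
    return psi / np.linalg.norm(psi)

def truncation_overlap(psi, d, n, cut, r):
    """Overlap of a normalized psi with its normalized rank-r Schmidt truncation,
    together with the value sqrt(sum_{j<=r} lam_j^2) predicted by Fact II.2."""
    U, lam, Vh = np.linalg.svd(psi.reshape(d**cut, d**(n - cut)), full_matrices=False)
    trunc = (U[:, :r] * lam[:r]) @ Vh[:r, :]   # keep the r largest Schmidt terms
    trunc /= np.linalg.norm(trunc)
    return abs(np.vdot(trunc.ravel(), psi)), np.sqrt(np.sum(lam[:r] ** 2))

d, n, cut = 2, 6, 3
rng = np.random.default_rng(7)
# Superposition of two product states: Schmidt rank <= 2 across every cut.
phi = random_product_state(d, n, rng) + random_product_state(d, n, rng)
phi /= np.linalg.norm(phi)
op = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))

before = schmidt_rank(phi, d, n, cut)
across = schmidt_rank(apply_two_site(op, phi, d, n, cut - 1), d, n, cut)  # straddles the cut
left = schmidt_rank(apply_two_site(op, phi, d, n, 0), d, n, cut)          # acts on L only
ranks = [schmidt_rank(phi, d, n, c) for c in range(1, n)]
overlap, predicted = truncation_overlap(phi, d, n, cut, 1)
```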

C. The Detectability Lemma
One of our main technical tools in this paper is the Detectability Lemma (DL). Originally proved in Ref. 15 in the context of quantum constraint satisfaction and promise gap amplification, a simpler and stronger version of the DL was proved in Ref. 14, where it was used in the context of gapped, frustration-free local Hamiltonians. This is the form that will be used here. To state it, consider a gapped 1D frustration-free Hamiltonian with nearest-neighbor interactions defined on a chain of $n$ $d$-dimensional particles, $H = \sum_{i=1}^{n-1} Q_i$. As explained previously, we assume that the 2-local interaction terms $Q_i$ are projections, and we set $P_i := 1 - Q_i$ to be the projection onto the local ground space of each term. We partition the projections into two sets, which we call layers: the odd layer $Q_1, Q_3, Q_5,\ldots$ and the even layer $Q_2, Q_4, Q_6,\ldots$. Within each layer the projections are non-intersecting and, therefore, commute. It follows that $\Pi_{\mathrm{odd}} := P_1P_3P_5\cdots$ and $\Pi_{\mathrm{even}} := P_2P_4P_6\cdots$ are the projections onto the ground spaces of the odd and even layers, respectively. An illustration of this construction is shown in Fig. 2. We then have:

The Detectability Lemma: Let $A := \Pi_{\mathrm{even}}\Pi_{\mathrm{odd}}$, and let $\mathcal{H}^\perp$ be the orthogonal complement of the ground space. Then for every $|\Omega^\perp\rangle\in\mathcal{H}^\perp$,
$$\|A|\Omega^\perp\rangle\|^2 \le \Delta_0(\epsilon)\,\||\Omega^\perp\rangle\|^2, \qquad \Delta_0(\epsilon) := \left(1+\frac{\epsilon}{2}\right)^{-2/3}.$$

For brevity, we will often drop the $\epsilon$ dependence and simply write $\Delta_0$. We note that the DL is not restricted to the 1D case and can easily be generalized to any dimension by using more than two layers. In the commuting case, $\Pi_{\mathrm{even}}$ and $\Pi_{\mathrm{odd}}$ commute with each other, and their product $A = \Pi_{\mathrm{even}}\Pi_{\mathrm{odd}}$ is the projection onto the ground space of the system. Generally, however, they do not commute, and as a result $A$ is only an approximation to the ground-space projection: it leaves the ground space invariant while shrinking its orthogonal complement by some factor.
The DL quantifies this approximation: it tells us that the shrinking factor is at most $\Delta_0(\epsilon) < 1$, a constant that depends on the spectral gap but not on the system size.
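To see the DL at work, the sketch below builds a small toy frustration-free chain and compares the worst-case shrinking $\max_{|\Omega^\perp\rangle}\|A|\Omega^\perp\rangle\|^2$ with $\Delta_0(\epsilon)$ as stated above. The model is an assumption chosen only so that frustration-freeness is guaranteed: each $Q_i$ is a random rank-2 projection on particles $(i, i+1)$ orthogonal to $|v_i\rangle|v_{i+1}\rangle$, so the product state $|v_1\cdots v_n\rangle$ has zero energy (the ground space may be degenerate, which the lemma allows). All function names are hypothetical.

```python
import numpy as np

def embed(op, site, n, d):
    """Embed a two-particle operator acting on (site, site+1), 0-based."""
    return np.kron(np.kron(np.eye(d**site), op), np.eye(d**(n - site - 2)))

def random_ff_projections(n, d, rank, rng):
    """Toy frustration-free chain: Q_i is a random rank-`rank` projection on
    particles (i, i+1) orthogonal to |v_i>|v_{i+1}>, so |v_1 ... v_n> is a
    common zero-energy state of every term."""
    vs = [rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(n)]
    vs = [v / np.linalg.norm(v) for v in vs]
    Qs = []
    for i in range(n - 1):
        w = np.kron(vs[i], vs[i + 1])
        M = rng.normal(size=(d * d, rank)) + 1j * rng.normal(size=(d * d, rank))
        M = M - np.outer(w, w.conj() @ M)   # remove the component along w
        B, _ = np.linalg.qr(M)              # orthonormal basis of span(M)
        Qs.append(embed(B @ B.conj().T, i, n, d))
    return Qs

def layer_projection(Qs):
    """Product of P_i = 1 - Q_i over one layer (the factors commute)."""
    dim = Qs[0].shape[0]
    Pi = np.eye(dim)
    for Q in Qs:
        Pi = Pi @ (np.eye(dim) - Q)
    return Pi

def detectability_check(n=6, d=2, rank=2, seed=0, tol=1e-8):
    rng = np.random.default_rng(seed)
    Qs = random_ff_projections(n, d, rank, rng)
    evals, evecs = np.linalg.eigh(sum(Qs))
    ground = evecs[:, evals < tol]                 # (possibly degenerate) ground space
    eps = evals[evals >= tol].min()                # spectral gap
    # The text's odd layer Q_1, Q_3, ... is 0-based indices 0, 2, ...
    A = layer_projection(Qs[1::2]) @ layer_projection(Qs[0::2])   # Pi_even Pi_odd
    perp = np.eye(d**n) - ground @ ground.conj().T
    shrink = np.linalg.norm(A @ perp, 2) ** 2      # worst-case ||A|Omega_perp>||^2
    return A, ground, eps, shrink, (1 + eps / 2) ** (-2 / 3)

A, ground, eps, shrink, bound = detectability_check()
print(f"gap={eps:.3f}  shrink={shrink:.3f}  Delta_0={bound:.3f}")
```

A random instance like this is typically far from the worst case, so the check illustrates the inequality rather than its tightness.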

III. PROOF OF THE MAIN THEOREM
Recall from the outline of the proof presented in the introduction that our goal is to construct an operator $K$ whose effect is to rapidly increase the overlap with the ground state while only slowly increasing the Schmidt rank. We formalize this property of $K$ in the notion of a $(D,\Delta)$-AGSP below, and then show that if the trade-off between $D$ and $\Delta$ is favorable, i.e., $D\cdot\Delta < 1/2$, then there is a product state that has a large overlap with the ground state, which, in turn, leads to a bound on the entanglement entropy of the ground state. Once this is established, we can move on to the central construction of this work, which is performed in the diluting lemma (Lemma III.5). This is where an operator with the proper trade-off between $D$ and $\Delta$ is constructed.
We begin with a quantitative definition of an operator that moves any vector toward the ground state.

FIG. 2. The setting of the detectability lemma for a 1D system with 12 particles, $H = \sum_{i=1}^{11} Q_i$. The local interaction terms $Q_i$ are divided into two layers: the even layer and the odd layer. The dashed red line denotes a cut in the system between particles 6 and 7.
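Definition III.1 (Approximate Ground-Space Projection (AGSP)) Consider a local Hamiltonian $H = \sum_i H_i$ on a 1D chain, together with a cut between particles $i^*$ and $i^*+1$ that bipartitions the system. We say that an operator $K$ is a $(D,\Delta)$-Approximate Ground-Space Projection (with respect to the cut) if the following hold:
• Ground space invariance: for any ground state $|\Omega\rangle$, $K|\Omega\rangle = |\Omega\rangle$.
• Shrinking: for any $|\Omega^\perp\rangle \in \mathcal{H}^\perp$, $K|\Omega^\perp\rangle \in \mathcal{H}^\perp$ and $\|K|\Omega^\perp\rangle\|^2 \le \Delta\,\||\Omega^\perp\rangle\|^2$.
• Entanglement: for any state $|\phi\rangle$, $\mathrm{SR}(K|\phi\rangle) \le D\cdot\mathrm{SR}(|\phi\rangle)$.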
We refer to D as the SR factor and ∆ as the shrinking factor.
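We note that with this definition, the DL combined with Fact II.1 implies that $A = \Pi_{\mathrm{even}}\Pi_{\mathrm{odd}}$ is a $(D_0,\Delta_0) := (d^2, (1+\epsilon/2)^{-2/3})$-AGSP. In its bare form, however, this operator is not useful to us, since its SR factor is too large relative to its shrinking factor. Specifically, it turns out that the important quantity to consider is the product $D\cdot\Delta$. The following lemma shows that having a $(D,\Delta)$-AGSP with $D\cdot\Delta \le \frac{1}{2}$ implies the existence of a product state whose overlap with the ground state is at least $1/\sqrt{2D}$.

Lemma III.2 If there exists a $(D,\Delta)$-AGSP with $D\cdot\Delta \le \frac{1}{2}$, then there exists a product state $|\phi\rangle = |L\rangle\otimes|R\rangle$ with $|\langle\Omega|\phi\rangle| \ge 1/\sqrt{2D}$.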
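Proof: Let $K$ be a $(D,\Delta)$-AGSP with $D\cdot\Delta \le \frac{1}{2}$, and let $|\phi\rangle := |L\rangle\otimes|R\rangle$ be a product state whose overlap with the ground state is $\mu := |\langle\Omega|\phi\rangle| < 1/\sqrt{2D}$. Below we show that there exists another product state with a larger overlap with the ground state. Choosing the phase of $|\Omega\rangle$ so that $\langle\Omega|\phi\rangle = \mu$, write $|\phi\rangle = \mu|\Omega\rangle + |\Omega^\perp\rangle$ with $|\Omega^\perp\rangle\in\mathcal{H}^\perp$. Then $K|\phi\rangle = \mu|\Omega\rangle + K|\Omega^\perp\rangle$ with $K|\Omega^\perp\rangle\in\mathcal{H}^\perp$, so $\langle\Omega|K|\phi\rangle = \mu$ and $\|K|\phi\rangle\|^2 \le \mu^2 + \Delta(1-\mu^2) \le \mu^2+\Delta$. Since $|\phi\rangle$ is a product state, $K|\phi\rangle$ has SR at most $D$; let $K|\phi\rangle = \sum_{i=1}^{D}\lambda_i|L_i\rangle\otimes|R_i\rangle$ be its Schmidt decomposition. By the Cauchy-Schwarz inequality,
$$\mu = \sum_{i=1}^D \lambda_i\,\langle\Omega|L_i,R_i\rangle \le \Big(\sum_i\lambda_i^2\Big)^{1/2}\Big(\sum_i|\langle\Omega|L_i,R_i\rangle|^2\Big)^{1/2} \le \sqrt{\mu^2+\Delta}\,\sqrt{D}\,\max_i|\langle\Omega|L_i,R_i\rangle| ,$$
so for some $i$ the product state $|L_i\rangle\otimes|R_i\rangle$ has overlap at least $\mu/\sqrt{D(\mu^2+\Delta)}$ with $|\Omega\rangle$.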
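But since $D\cdot\Delta \le 1/2$ and, by assumption, $\mu < 1/\sqrt{2D}$, it follows that $D(\mu^2+\Delta) < 1$, and so the overlap of $|L_i\rangle\otimes|R_i\rangle$ with the ground state is larger than $\mu$. Applying this to a product state of maximal overlap (which exists by compactness) shows that the maximal overlap is at least $1/\sqrt{2D}$.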
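The proof turns one AGSP application into the overlap update $\mu \mapsto \mu/\sqrt{D(\mu^2+\Delta)}$. Iterating this lower bound makes the race between $D$ and $\Delta$ concrete: while $\mu$ is small the overlap grows by roughly $1/\sqrt{D\Delta}$ per step, and it passes $1/\sqrt{2D}$ because the fixed point $\mu_*^2 = 1/D - \Delta$ lies above $1/(2D)$ when $D\Delta < 1/2$. This is a sketch of the arithmetic only (it tracks the bound, not actual states); `amplify_overlap` is a hypothetical name, and it requires the strict inequality so that the loop terminates.

```python
import math

def amplify_overlap(mu0, D, Delta, max_steps=10_000):
    """Iterate the overlap lower bound mu -> mu / sqrt(D * (mu**2 + Delta))
    from the proof of Lemma III.2 until it reaches 1/sqrt(2D)."""
    # With D * Delta == 1/2 the fixed point equals the target and is never
    # reached in finitely many steps, so this sketch requires strictness.
    if not D * Delta < 0.5:
        raise ValueError("need D * Delta < 1/2")
    target = 1 / math.sqrt(2 * D)
    mus = [mu0]
    while mus[-1] < target and len(mus) <= max_steps:
        mu = mus[-1]
        mus.append(mu / math.sqrt(D * (mu**2 + Delta)))
    return mus

mus = amplify_overlap(1e-3, D=16, Delta=1 / 64)   # D * Delta = 1/4
print(len(mus) - 1, [round(m, 4) for m in mus])
```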
With this bound in place, we start from the product state with the maximal overlap with the ground state and use any AGSP to obtain controlled approximations of the ground state, from which an upper bound on its entropy can be found. A very similar argument was used in Hastings' proof of the 1D area law.$^{13}$

Lemma III.3 If there exists a product state whose overlap with the ground state is at least $\mu$, together with a $(D,\Delta)$-AGSP, then the entanglement entropy of $|\Omega\rangle$ is bounded by an explicit function of $\mu$, $D$ and $\Delta$.

The proof can be found in the appendix. In brief, we begin with the asserted product state and repeatedly apply the AGSP to it. The result is a sequence of vectors whose SR grows by a factor of $D$ at each step and which approach the ground state at a rate quantified by powers of $\Delta$. Using these vectors and the Eckart-Young theorem (Fact II.2) yields adequate upper bounds on the tail Schmidt coefficients of the ground state to bound its entropy.
Lemmas III.3 and III.2 can be combined to give

Corollary III.4 If there exists a $(D,\Delta)$-AGSP such that $D\cdot\Delta \le \frac{1}{2}$, the ground-state entropy is bounded by
$$S \le O(1)\cdot\log D.$$

We are left, therefore, with the challenge of designing an operator $K$ that is a $(D,\Delta)$-AGSP with $D\cdot\Delta \le 1/2$. This is the driving construction of this work. It is stated in the following lemma and proved in the next section.
Lemma III.5 (The diluting lemma) Consider a 1D gapped frustration-free Hamiltonian with spectral gap $\epsilon > 0$ and particle dimension $d$, and define $X := (\log d)/\epsilon$. Then for any cut in the chain there exists a $(D,\Delta)$-AGSP with $D\cdot\Delta < 1/2$ and
$$\log D \le O(1)\cdot X^3\log^8 X.$$
Substituting the result of this lemma into Corollary III.4 proves Theorem I.1.

IV. PROVING THE DILUTING LEMMA (LEMMA III.5)
We will prove the diluting lemma by modifying the DL operator $A := \Pi_{\mathrm{even}}\Pi_{\mathrm{odd}}$ to a new operator $\hat{A}$, which is not an AGSP but has similar properties (see Def. III.1):
• Ground space invariance: for any ground state $|\Omega\rangle$, $\hat{A}|\Omega\rangle = |\Omega\rangle$.

A. General settings
Without loss of generality, we assume that the bipartitioning cut in the chain intersects an even projection (see Fig. 2). As a result, when applying $A$, only the $\Pi_{\mathrm{even}}$ portion of the operator increases the SR. Therefore, our construction of $\hat{A}$ will modify $\Pi_{\mathrm{even}}$, leaving $\Pi_{\mathrm{odd}}$ intact.
We begin by considering the piece of $\Pi_{\mathrm{even}}$ consisting of the $m$ even projections closest to the cut. We denote this set of projections by $I_m$, and, abusing notation a little, we relabel them $P_1,\ldots,P_m$ and define $\Pi_m := P_1P_2\cdots P_m$, the projection onto the common ground space of these projections. Note that in this notation the cut intersects $P_{i^*}$, where $i^* = m/2$ (see Fig. 3). Denoting by $\Pi_{\mathrm{rest}}$ the product of all the remaining projections in $\Pi_{\mathrm{even}}$, we have $\Pi_{\mathrm{even}} = \Pi_m\cdot\Pi_{\mathrm{rest}}$. We will approximate $\Pi_m$ by an operator $\hat{\Pi}_m$, and then define $\hat{A} := \hat{\Pi}_m\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}}$. The analysis of the amount of entanglement created by powers of $\hat{A}$ will focus exclusively on the structure of $\hat{\Pi}_m$, since $\Pi_{\mathrm{rest}}$ and $\Pi_{\mathrm{odd}}$ do not increase the SR across the cut. One of the great benefits of using the DL is that all projections in a given layer commute, and, hence, the analysis becomes almost classical. Indeed, as we said in the outline of the proof, the projections in $I_m$ define a decomposition of the Hilbert space of the system into a direct sum of $2^m$ eigenspaces, called sectors. Each sector is defined by a string $s = (s_1,\ldots,s_m)$, such that if $|\psi_s\rangle$ is in the $s$ sector, $P_i|\psi_s\rangle = (1-s_i)|\psi_s\rangle$. A site with $s_i = 1$ is called a violation, since it corresponds to a nonzero energy of the corresponding local Hamiltonian term. We denote by $|s| := \sum_{i=1}^m s_i$ the total number of violations in the sector $s$. Finally, we will also consider a coarse-grained decomposition in which we group together all $\binom{m}{k}$ sectors with $k$ violations. The direct sum of these subspaces is called the $k$-violations sector.
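Because the projections of one layer commute, the sector decomposition can be computed directly. The toy sketch below is an assumption-laden stand-in for $I_m$ (random rank-1 violation projections $Q_i = 1 - P_i$ on the disjoint particle pairs $(2i, 2i+1)$, names hypothetical): it checks that the violation counter $N = \sum_i Q_i$ used later in this section has integer spectrum $0,\ldots,m$, that the $k$-violations sector has dimension $\binom{m}{k}r^k(d^2-r)^{m-k}$ for rank-$r$ projections, and that $\Pi_m$ is exactly the projection onto the zero-violations sector.

```python
import numpy as np
from math import comb

def random_projection(dim, rank, rng):
    """Projection onto a random rank-`rank` subspace of C^dim."""
    M = rng.normal(size=(dim, rank)) + 1j * rng.normal(size=(dim, rank))
    B, _ = np.linalg.qr(M)
    return B @ B.conj().T

def sector_toy(m=3, d=2, rank=1, seed=0):
    """m commuting 'violation' projections Q_i = 1 - P_i on the disjoint particle
    pairs (2i, 2i+1), mimicking the m even-layer projections in I_m.
    Returns the Q_i, the violation counter N = sum_i Q_i, and Pi_m = prod_i P_i."""
    rng = np.random.default_rng(seed)
    n, dim = 2 * m, d ** (2 * m)
    Qs = []
    for i in range(m):
        q = random_projection(d * d, rank, rng)
        Qs.append(np.kron(np.kron(np.eye(d ** (2 * i)), q), np.eye(d ** (n - 2 * i - 2))))
    N = sum(Qs)
    Pi_m = np.eye(dim)
    for Q in Qs:
        Pi_m = Pi_m @ (np.eye(dim) - Q)
    return Qs, N, Pi_m

m, d, rank = 3, 2, 1
Qs, N, Pi_m = sector_toy(m, d, rank)
evals = np.linalg.eigvalsh(N)
# Dimension of the k-violations sector: C(m, k) * rank^k * (d^2 - rank)^(m - k)
for k in range(m + 1):
    print(k, int(np.sum(np.isclose(evals, k))), comb(m, k) * rank**k * (d * d - rank) ** (m - k))
```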

B. Constructing $\hat{\Pi}_m$: a general discussion
Before actually constructing the operator $\hat{\Pi}_m$, it might be useful to describe what we hope to accomplish. Recall that we would like to show that the increase in SR due to application of $\hat{A}^\ell$ is bounded by $D_I\hat{D}^\ell$, where $\hat{D}\cdot\hat{\Delta} \le \frac{1}{2}$.
FIG. 3. An illustration of the decomposition $A = \Pi_{\mathrm{even}}\Pi_{\mathrm{odd}} = \Pi_{\mathrm{rest}}\Pi_m\Pi_{\mathrm{odd}}$ that is used to define $\Pi_m$ and its dilution $\hat{\Pi}_m$. $\Pi_m$ is the product of the $m$ projections $P_1,\ldots,P_m$ that are found in the even layer around the cut.
The rough idea is to show that there is some cut between particles $i$ and $i+1$ within the support of $I_m$ such that the SR across this cut does not grow much due to application of $\hat{A}^\ell$. This would account for the factor $\hat{D}^\ell$ in the above bound. Moreover, by Fact II.1, the SR across the middle cut $(i^*, i^*+1)$ can be larger than the SR across $(i, i+1)$ by a factor of at most $d^{|i-i^*|} \le D_0^m =: D_I$.

There are several approaches to constructing $\hat{\Pi}_m$. Perhaps the most obvious one is "diluting": instead of using a product of $m$ projections, we may use a product of $rm$ randomly chosen projections for some $0 < r < 1$. After applying such a $\hat{\Pi}_m$ for $\ell$ layers, there would be columns with fewer than $r\ell$ entangling projections, and so the SR along these columns will be bounded by $D_0^{r\ell}$. Applying Fact II.1 as described above, the SR in the middle cut would be at most $D_0^{r\ell}D_0^m$, implying $\hat{D} = D_0^r$ and $D_I = D_0^m$. What is the $\hat{\Delta}$ factor of such a construction? Intuitively, sectors with a high number of violations are easier to "catch", because there is a higher chance of collision between one of their violations and the $rm$ projections. Indeed, on average, only a fraction $\binom{m-k}{rm}/\binom{m}{rm} \le (1-r)^k$ of the mass in the $k$-violations sector survives. But this means that the low-violation sectors, and in particular the one-violation sector, are barely shrunk. The latter can be shrunk by as little as $1-r$, which means that $\hat{\Delta} = \Delta_0 + 1 - r$. This can never get us to $\hat{D}\cdot\hat{\Delta} < 1/2$.
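The survival fraction in the dilution argument is a simple hypergeometric quantity: a $k$-violations component survives only if none of the $rm$ randomly chosen projections lands on one of its $k$ violations. The sketch below evaluates it with exact integer binomials for hypothetical values $m = 100$, $r = 0.3$, and compares it with $(1-r)^k$; for $k = 1$ it equals $1 - r$ exactly, which is why dilution barely shrinks the one-violation sector. `survival_fraction` is a hypothetical name.

```python
from math import comb

def survival_fraction(m, s, k):
    """Average fraction of the k-violations mass that survives when s of the m
    projections are chosen uniformly at random: C(m-k, s) / C(m, s)."""
    return comb(m - k, s) / comb(m, s)

m, r = 100, 0.3
s = int(r * m)                  # number of randomly chosen projections (30)
for k in (1, 2, 5, 10):
    print(k, survival_fraction(m, s, k), (1 - r) ** k)
```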
To overcome this problem, we take a different approach to the construction of $\hat{\Pi}_m$, using the operator
$$N := \sum_{i=1}^m (1-P_i).$$
The operator $N$ counts the number of violations in a sector: if $|\psi_s\rangle$ belongs to the $s$ sector, $N|\psi_s\rangle = |s|\cdot|\psi_s\rangle$. We can use the $N$ operator to annihilate the mass in

C. Constructing $\hat{\Pi}_m$ using Chebyshev polynomials
To construct $\hat{\Pi}_m$ entirely from $N$, we want a polynomial $P(x)$ of minimal degree such that $P(0) = 1$ and $|P(x)|^2 \le \delta < 1$ for every integer $1 \le x \le m$. In such a case, $\hat{\Pi}_m := P(N)$ will leave the zero-violations sector intact, while multiplying the squared norm of every other sector by at most $\delta$. One naive approach is to take $P(x) = (1-x)(1-x/2)\cdots(1-x/m)$. This gives $\delta = 0$ at the price of a polynomial of degree $m$, the same degree as the original $\Pi_m$ (thereby creating too much entanglement). For a lower-degree polynomial with the desired properties, we turn to the Chebyshev polynomial, a central object in approximation theory. As noted in Sec. II D, the Chebyshev polynomial of the first kind $T_n(x)$ oscillates between $-1$ and $1$ in the region $[-1,1]$ and increases rapidly outside that region. The idea is therefore to use the Chebyshev polynomial mapped from $[-1,1]$ to $[1,m]$ and rescaled to be 1 at $x = 0$:
$$\hat{C}_m(x) := T_{\sqrt{m}}\!\left(\frac{m+1-2x}{m-1}\right), \qquad C_m(x) := \frac{\hat{C}_m(x)}{\hat{C}_m(0)}. \tag{26}$$
Figure 4 shows $C_m(x)$ for $m = 36$. It is easy to verify that (1) $C_m(0) = 1$, and (2) $|C_m(x)|^2 \le \frac{1}{9}$ for every $1 \le x \le m$. The first claim follows from the definition, while the second follows from $|\hat{C}_m(x)| \le 1$ on $[1,m]$ together with $|\hat{C}_m(0)| \ge 3$, which follows from Eq. (13). Notice that $C_m(x)$ is a polynomial of degree $\sqrt{m}$, a huge improvement over the naive construction of degree $m$. In fact, this construction is optimal up to a constant factor: as shown in Ref. 23, any polynomial that satisfies the above two properties must have degree $\Omega(\sqrt{m})$.
To improve the shrinking factor, we can apply $[C_m(x)]^q$, where $q$ is a parameter to be determined later. We can now define the operator $\hat{A}$ that appears in the diluting lemma:

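Definition IV.3 (The Chebyshev-based operator $\hat{A}$)
Given integers $m$ and $q$, the Chebyshev-based operator $\hat{A}$ is
$$\hat{A} \EqDef \tilde{\Pi}_m\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}}, \qquad \tilde{\Pi}_m \EqDef [C_m(N)]^q.$$
We note that $\tilde{\Pi}_m$ is a polynomial of degree $j \EqDef q\sqrt{m}$ in $N$, which leaves the zero-violations sector intact and shrinks the violating sectors by at least $\delta = (1/9)^q$. Consequently, by Eq. (23), we get $\tilde{\Delta} = \Delta_0 + (1/9)^q$. Let us now turn to the task of upper bounding the SR that is generated by $\hat{A}^\ell$.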

D. Upper bounding the SR factor of $\hat{A}$
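We bound the SR generated by $\hat{A}^\ell$ in the following lemma:

Lemma IV.4 Let $\hat{A}$ be the Chebyshev-based operator of Definition IV.3. Then for any state $|\phi\rangle$,
$$\mathrm{SR}(\hat{A}^\ell|\phi\rangle) \le D_I\tilde{D}^\ell\cdot\mathrm{SR}(\phi),$$
where $D_I \EqDef D_0^m$ and $\tilde{D} \EqDef 4(j+1)^2\cdot(j/2+1)\cdot(j/4+1)\cdots(1+1)\cdot D_0^{j/m}$.

Proof: Recall that $\tilde{\Pi}_m$ is a polynomial of degree $j$ in $N$ and therefore can be written as a sum of $j+1$ terms:
$$\tilde{\Pi}_m = \sum_{i=0}^{j} a_i N^i.$$
Consequently, $\hat{A}^\ell$ can be written as a superposition of $(j+1)^\ell$ terms of the form
$$(N^{i_1}\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})(N^{i_2}\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})\cdots(N^{i_\ell}\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}}),$$
with $i_1,\ldots,i_\ell$ between $0$ and $j$. When applied to a general state $|\phi\rangle$, the term that potentially generates the highest SR is $(N^j\cdot\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})^\ell$. We will therefore upper bound the total SR by upper bounding its SR and multiplying the end result by $(j+1)^\ell$.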
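The expansion $\tilde{\Pi}_m = \sum_{i=0}^{j} a_i N^i$ used in the proof can be made concrete by converting $[C_m(x)]^q$ to the monomial basis. The NumPy sketch below (the function name is ours) does this for $m = 36$ and $q = 2$, so $j = q\sqrt{m} = 12$ and there are $j+1 = 13$ coefficients, with $a_0 = 1$ because $C_m(0) = 1$. It is meant for small $m$ and $q$ only: for large degrees the monomial basis becomes numerically ill-conditioned, which does not matter for the proof, since only the number of terms is used.

```python
import numpy as np
from numpy.polynomial import Chebyshev, Polynomial


def pi_tilde_coefficients(m: int, q: int) -> np.ndarray:
    """Monomial coefficients a_0, ..., a_j of [C_m(x)]^q, so that Pi~_m = sum_i a_i N^i."""
    n = int(round(np.sqrt(m)))
    assert n * n == m
    T = Chebyshev.basis(n, domain=[1, m])  # T_n on the affine map [1, m] -> [-1, 1]
    C = T / T(0.0)                         # normalized: C(0) = 1
    return (C ** q).convert(kind=Polynomial).coef


a = pi_tilde_coefficients(36, 2)  # degree j = 2 * 6 = 12, hence 13 coefficients
```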

Definition IV.5 (A min-entangling operator)
We say that an operator $C$ is min-entangling with respect to some cut in the support of $I_m$ if it is of the form
$$C = (\mathcal{M}_1\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})(\mathcal{M}_2\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})\cdots(\mathcal{M}_\ell\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}}),$$
where each $\mathcal{M}_k$ is a product of factors $M_i$, and the $M_i$ are polynomials in the projections of $I_m$ such that
1. At most $j\ell/m$ of the $M_i$'s contain the projection $P \in I_m$ that intersects the cut.
2. Each of the remaining $M_i$'s only contains projections that are either strictly to the left of $P$ or strictly to the right of $P$ (in $I_m$).
It follows that only those $M_i$ that contain the "entangling" projection $P$ increase the SR across the cut (every other projection lies entirely on one side of it), and therefore the total SR increase is bounded by $D_0^{j\ell/m}$. The bound on the SR follows from the following decomposition of $\hat{A}^\ell$:

Claim IV.6 $(N^j\cdot\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})^\ell$ can be written as a sum of at most $[4(j+1)\cdot(j/2+1)\cdot(j/4+1)\cdots(1+1)]^\ell$ min-entangling operators.
Proof: Say that an operator $C$ is $t$-min-entangling if there is a contiguous interval $I \subseteq I_m$ of $m/2^t$ projections such that $C$ is of the same form as in Definition IV.5, where at most $j\ell/2^t$ of the $M_i$'s are equal to $N_I \EqDef \sum_{i\in I}(1 - P_i)$, while each of the other $M_i$'s is made of projections that are either to the left of $I$ or to the right of $I$ (in $I_m$), and in particular does not include any projection from $I$.
The proof relies on a recursive construction that after $t$ rounds decomposes $(N^j\cdot\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})^\ell$ into a sum of $t$-min-entangling operators. After $\log m$ rounds of recursion, the interval $I$ consists of a single projection, and we end up with the desired decomposition into min-entangling operators.
We begin by splitting $N$ into two terms, $N = N_L + N_R$, where $N_L$ and $N_R$ are the restrictions of $N$ to the left and right halves of $I_m$. Since the projections of $I_m$ all belong to the even layer, they commute, and so in every layer
$$N^j = \sum_{n=0}^{j}\binom{j}{n} N_L^{j-n}N_R^{n}.$$
Expanding across all $\ell$ layers, we end up with $(j+1)^\ell$ terms. Let us focus on one such term, and assume that at layer $i$ it has the powers $N_L^{j-n_i}N_R^{n_i}$. Since the total degree in $N_L$ and $N_R$ across the layers is $j\ell$, the degree of one of the two across the layers must be at most $j\ell/2$, and hence every term is a 1-min-entangling operator. Consequently, $(N^j\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})^\ell$ can be written as a sum of at most $(j+1)^\ell$ 1-min-entangling operators. Proceeding recursively, we now pick one such term. It contains either at most $j\ell/2$ $N_L$ operators or at most $j\ell/2$ $N_R$ operators. Assume without loss of generality that it is $N_L$, and let $n_i$ now denote the power of $N_L$ in layer $i$, so that $\sum_{i=1}^{\ell} n_i \le j\ell/2$. We write $N_L = N_{LL} + N_{LR}$, and consequently every $N_L^{n_i}$ can be written as the sum of $n_i+1$ terms:
$$N_L^{n_i} = \sum_{k=0}^{n_i}\binom{n_i}{k}N_{LL}^{n_i-k}N_{LR}^{k}.$$
Therefore, upon opening the product, we obtain a sum of $\prod_{i=1}^{\ell}(n_i+1)$ terms, each of which has a maximal degree of $j\ell/4$ either in $N_{LL}$ or in $N_{LR}$. These are all 2-min-entangling operators. Moreover, it is now easy to verify that, subject to the constraint $\sum_{i=1}^{\ell} n_i \le j\ell/2$, the total number of terms $\prod_{i=1}^{\ell}(n_i+1)$ is maximized when all $n_i$ are equal:
$$\prod_{i=1}^{\ell}(n_i+1) \le (j/2+1)^\ell.$$
To summarize, we have just shown that $(N^j\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})^\ell$ can be written as a sum of at most $(j+1)^\ell(j/2+1)^\ell$ 2-min-entangling operators.
Continuing in this fashion for $\log j$ rounds, we end up with a total of at most $[(j+1)\cdot(j/2+1)\cdot(j/4+1)\cdots(1+1)]^\ell$ $(\log j)$-min-entangling operators. At this stage, the total degree of each such operator in $N_I$ is at most $j\ell/2^{\log j} = \ell$. To bound the increase in the number of terms in the remaining $\log(m/j)$ rounds, we observe that in the $k$-th of these rounds ($k = 0, 1, 2, \ldots$), subject to the constraint $\sum_{i=1}^{\ell} n_i \le \ell/2^k$, the expression $\prod_{i=1}^{\ell}(n_i+1)$ is maximized when $\ell/2^k$ of the $n_i$'s are $1$ and the rest are $0$ (since $n+1 \le 2^n$ for every non-negative integer $n$):
$$\prod_{i=1}^{\ell}(n_i+1) \le 2^{\ell/2^k}.$$
It follows that the total increase in the number of terms over these rounds is bounded by $2^{\ell}\cdot 2^{\ell/2}\cdot 2^{\ell/4}\cdots \le 4^\ell$. This completes the proof of the claim.
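The two counting steps in the proof of Claim IV.6 can be sanity-checked by brute force for small $\ell$. The Python sketch below (standard library only; `max_terms` is our name) enumerates all non-negative integer vectors $(n_1,\ldots,n_\ell)$ within a degree budget and confirms that $\prod_i(n_i+1)$ never exceeds the equal-split value $(\text{budget}/\ell + 1)^\ell$, and that $\prod_i(n_i+1) \le 2^{\sum_i n_i}$, which drives the $4^\ell$ bound in the later rounds.

```python
from itertools import product
from math import prod


def max_terms(ell: int, budget: int) -> int:
    """Max of prod(n_i + 1) over non-negative integers n_1..n_ell with sum(n_i) <= budget."""
    return max(prod(n + 1 for n in ns)
               for ns in product(range(budget + 1), repeat=ell)
               if sum(ns) <= budget)


# Early rounds: the maximum is attained at equal n_i (AM-GM), so it is at most (budget/ell + 1)^ell.
for ell in range(1, 5):
    for budget in range(0, 9):
        assert max_terms(ell, budget) <= (budget / ell + 1) ** ell + 1e-9

# Later rounds: n + 1 <= 2**n gives prod(n_i + 1) <= 2**sum(n_i); the budgets
# ell/2**k summed over k = 0, 1, 2, ... stay below 2*ell, hence the factor 4**ell.
for ns in product(range(5), repeat=3):
    assert prod(n + 1 for n in ns) <= 2 ** sum(ns)
assert sum(8 / 2 ** k for k in range(64)) <= 2 * 8
```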
We can now finish off the proof of the main lemma of this section. Claim IV.6 gives a decomposition of the operator $(N^j\cdot\Pi_{\mathrm{rest}}\Pi_{\mathrm{odd}})^\ell$ as a sum of at most $[4(j+1)\cdot(j/2+1)\cdot(j/4+1)\cdots(1+1)]^\ell$ min-entangling operators. As noted above, to apply this result to $\hat{A}^\ell$ we must further multiply this number by $(j+1)^\ell$, so $\hat{A}^\ell$ can be written as a sum of no more than $[4(j+1)^2\cdot(j/2+1)\cdot(j/4+1)\cdots(1+1)]^\ell$ terms. It is easy to verify that for $j \ge 2$ (which is always the case), $4(j+1)^2\cdot(j/2+1)\cdot(j/4+1)\cdots(1+1) = 2^{O(\log^2 j)}$. The SR contribution of each such min-entangling operator at the cut that passes through its diluted column is at most $D_0^{j\ell/m}$, and since this column is at most $m$ particles away from the bi-partitioning cut, it follows from Fact II.1 that its SR contribution to the bi-partitioning cut is at most $D_0^{j\ell/m}\cdot D_0^m$. Therefore $\mathrm{SR}(\hat{A}^\ell|\phi\rangle) \le D_I\tilde{D}^\ell\cdot\mathrm{SR}(\phi)$, with
$$D_I \EqDef D_0^m, \qquad \tilde{D} \EqDef 4(j+1)^2\cdot(j/2+1)\cdot(j/4+1)\cdots(1+1)\cdot D_0^{j/m}.$$
This concludes the proof of Lemma IV.4.
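A short check, assuming $j$ is a power of two so that the product $(j/2+1)(j/4+1)\cdots(1+1)$ is well defined, that the per-layer count $4(j+1)^2(j/2+1)\cdots(1+1)$ is $2^{O(\log^2 j)}$. The comparison bound $16j^2\,2^{L(L+1)/2}$ with $L = \log_2 j$ follows from $j+1 \le 2j$ and $j/2^k + 1 \le 2j/2^k$ for $k \le L$; it is our own elementary estimate, not a formula taken from the original text, and the function names are hypothetical.

```python
def layer_prefactor(j: int) -> int:
    """4 (j+1)^2 (j/2+1)(j/4+1)...(1+1), for j a power of two (j >= 2)."""
    assert j >= 2 and j & (j - 1) == 0
    p = 4 * (j + 1) ** 2
    t = j // 2
    while t >= 1:          # factors (j/2 + 1), (j/4 + 1), ..., (1 + 1)
        p *= t + 1
        t //= 2
    return p


def prefactor_bound(j: int) -> int:
    """16 j^2 2^{L(L+1)/2} with L = log2 j, i.e. log2 of it is 4 + 2L + L(L+1)/2 = O(log^2 j)."""
    L = j.bit_length() - 1
    return 16 * j * j * 2 ** (L * (L + 1) // 2)


for L in range(1, 21):
    assert layer_prefactor(2 ** L) <= prefactor_bound(2 ** L)
```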
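At this point it is worthwhile to pause and take inventory of where we are. We have shown that the operator $\hat{A}^\ell$ is an AGSP with characteristic factors $\{D_I\tilde{D}^\ell, (\Delta_0 + (1/9)^q)^\ell\}$. We are searching for an AGSP whose product of characteristic factors is below $1/2$. This will be the case in our situation for some $\ell$ as long as the product $\tilde{D}\cdot[\Delta_0 + (1/9)^q]$ is less than one, since then $D_I(\tilde{D}\tilde{\Delta})^\ell < 1/2$ for large enough $\ell$. Our work so far has shown that
$$\tilde{D} = 2^{O(\log^2 j)}\cdot D_0^{j/m}, \qquad \tilde{\Delta} = \Delta_0 + (1/9)^q.$$
Recalling that $j = q\sqrt{m}$, it is clear that we can find constants $q, m$ (depending on $D_0$) such that $\tilde{D}\cdot(1/9)^q < 1/2$; however, the term $\tilde{D}\cdot\Delta_0$ may still be bigger than 1.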
To solve this problem, another idea is needed: coarse-graining. As we shall see in the next subsection, fusing together $k$ adjacent particles allows us to move to a new local Hamiltonian system with $D_0, \Delta_0$ replaced by $D_0^k, \Delta_0^k$. Taking $k = O(q/\log\Delta_0^{-1})$ would then yield $\Delta_0^k \le (1/9)^q$, and consequently lead to $\tilde{D}\cdot\tilde{\Delta} < 1/2$.
An illustration of a $k$-coarse-grained system in 1D with $k = 4$. The elongated rectangles denote the coarse-grained projections and the ovals denote the original projections. Underneath even coarse-grained projections one can "pull" a pyramid of original projections, and similarly above an odd coarse-grained projection. Together, they form $k$ layers of the original projections. This shows that the coarse-grained shrinking factor is in fact $\Delta_0' = \Delta_0^k$.

E. Coarse-graining
Consider a $k$-coarse-grained system, in which we fuse together $k$ adjacent particles, making them a single particle of dimension $d^k$. The new Hamiltonian of the system would now be a 2-local Hamiltonian on a chain, consisting of projections $Q'_i = 1 - P'_i$, where $P'_i$ denotes the projection into the common ground space of the $2k$ particles that form the coarse-grained particles $i$ and $i+1$. We define the odd/even layer projections $\Pi'_{\mathrm{odd}}, \Pi'_{\mathrm{even}}$ accordingly, and notice that every application of $\Pi'_{\mathrm{even}}\Pi'_{\mathrm{odd}}$ increases the SR by at most a factor of $D_0^k$ (the new particles have dimension $d^k$, so $D_0 = d^2$ becomes $d^{2k} = D_0^k$). To estimate the shrinking factor, we may use the DL on the new Hamiltonian with the new spectral gap.
However, there is a much stronger bound that we can obtain by using the DL on the original system. Based on an idea that has already appeared in a related form in Ref. 14, we show that $k+1$ layers of the original projections can be extracted from every application of $\Pi'_{\mathrm{even}}\Pi'_{\mathrm{odd}}$. This immediately implies that the shrinking factor of the coarse-grained system is at most $\Delta_0^k$. The claim follows from two observations:
1. For any coarse-grained constraint $P'_i$, we can always "pull" from it a product of the original projections $P_i$ that act on the support of $P'_i$. The reason is that $P'_i$ projects into the common ground space of all the original $2k$ particles in its support, and the original projections $P_i$ act trivially on that space.
2. Due to their pyramid-like shape, the original projections pulled from the even and odd coarse-grained projections can be commuted past each other and re-arranged as $k+1$ layers of the original projections (see the figure above).
We note that applying $k+1$ original layers of the DL shrinks the perpendicular space by a factor of $\Delta_0^k$ (see Ref. 24).
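Choosing $k$ in practice: given a value of $\Delta_0$, the smallest $k$ with $\Delta_0^k \le (1/8)^q$ is $\lceil q\ln 8/\ln(1/\Delta_0)\rceil$, so $k$ scales like $q/\log\Delta_0^{-1}$, while the per-layer SR factor grows to $D_0^k = d^{2k}$. The Python sketch below treats $\Delta_0$ as a plain input (its dependence on $\epsilon$ through Eq. (6) is not reproduced here), and the function names are hypothetical.

```python
import math


def coarse_graining_k(delta0: float, q: int) -> int:
    """Smallest k with delta0**k <= (1/8)**q, for a DL shrinking factor 0 < delta0 < 1."""
    assert 0.0 < delta0 < 1.0
    return math.ceil(q * math.log(8) / math.log(1 / delta0))


def coarse_grained_D0(d: int, k: int) -> int:
    """SR factor per layer after fusing k particles of dimension d: (d^k)^2 = D_0^k."""
    return (d ** k) ** 2


k = coarse_graining_k(0.9, 5)   # hypothetical Delta_0 = 0.9 and q = 5
```

The trade-off is visible in the two functions: pushing $\Delta_0^k$ down by a factor $8^{-q}$ costs a factor $D_0^k$ in SR per layer, which is why $m$ must then be chosen large enough to dilute the exponent $kj/m$.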

F. Gluing it all together
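If we first $k$-coarse-grain the system and then construct $\hat{A}$, we obtain the following factors:
$$D_I = D_0^{km}, \qquad \tilde{D} = 4(j+1)^2\cdot(j/2+1)\cdot(j/4+1)\cdots(1+1)\cdot D_0^{kj/m}, \qquad \tilde{\Delta} = \Delta_0^k + (1/9)^q.$$
We have 3 free parameters: $m, q, k$ (recall that $j = q\sqrt{m}$).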
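To finish the proof, we show how these can be chosen to obtain $\tilde{D}\cdot\tilde{\Delta} < 1/2$. Our first step is to demand that $k$ is large enough that $\Delta_0^k \le (1/8)^q$. Looking at Eq. (6), it is easy to verify that as long as $\epsilon \le 10$, this can be achieved by taking $k$ proportional to $q/\epsilon$. Then $\tilde{\Delta} \le 2(1/8)^q$. A sufficient condition for $\tilde{D}\cdot\tilde{\Delta} < 1/2$ is therefore $\tilde{D}\cdot(1/8)^q < 1/4$, or equivalently,
$$\log_2\tilde{D} - 3q < -2, \tag{43}$$
where $\log_2\tilde{D}$ is expressed using the definition $X \EqDef (\log d)/\epsilon = (\log D_0)/(2\epsilon)$ (see Theorem I.1).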
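To satisfy this condition, we demand that $40X j/m \le 2$, so that the leading term in the LHS of Eq. (43) is $-q$. Substituting $j = q\sqrt{m}$ leads us to define
$$m \EqDef (20Xq)^2.$$
Going back to Eq. (43), we now have to satisfy a condition in which $-q$ competes only with terms that are polylogarithmic in $j$ (coming from the prefactor $4(j+1)^2(j/2+1)\cdots(1+1) = 2^{O(\log^2 j)}$ of $\tilde{D}$), where now $j = q\sqrt{m} = 20Xq^2$.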
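It is easy to see that this condition can always be satisfied for large enough $q$, since the logarithmic factors are weaker than the $(-q)$ factor. A straightforward analysis yields $q = O(\log^2 X)$, and therefore $m = O(X^2\log^4 X)$ and $j = O(X\log^4 X)$. Then $D_I = D_0^{km}$, and as $\log D_0^k = O(qX)$, we get $\log D_I = O(mqX) = O(X^3\log^6 X)$. Choosing $\ell$ of order $\log D_I$ makes $D_I(\tilde{D}\tilde{\Delta})^\ell \le 1/2$, and since $\log\tilde{D} < 3q = O(\log^2 X)$, the resulting AGSP satisfies $\log D = \log D_I + \ell\log\tilde{D} = O(X^3\log^8 X)$. This concludes the proof of Lemma III.5.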
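The parameter choices above can be traced numerically, but only under an assumption: the exact form of Eq. (43) is not spelled out in this section, so the Python sketch below supposes that its left-hand side reads $\log_2(\text{prefactor}) + 40Xq\,j/m - 3q$, where the prefactor is the count $4(j+1)^2(j/2+1)\cdots(1+1)$, bounded here by $4 + 2\log_2 j + L(L+1)/2$ with $L = \log_2 j$. With $m = (20Xq)^2$ the middle term equals $2q$, and the search returns the smallest integer $q$ with $-q + \log_2(\text{prefactor}) < -2$. Integer $X$ is used for simplicity, and all function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt, log2


def log2_prefactor_bound(j: int) -> float:
    """Assumed bound log2[4(j+1)^2 (j/2+1)...(1+1)] <= 4 + 2 log2 j + L(L+1)/2, L = log2 j."""
    L = log2(j)
    return 4 + 2 * L + L * (L + 1) / 2


def choose_parameters(X: int):
    """Smallest q satisfying the (assumed) Eq. (43), with m = (20 X q)^2 and j = q sqrt(m).

    Hypothetical reconstruction: 40 X q j/m = 2q, so the condition becomes
    q - log2_prefactor_bound(j) > 2.
    """
    q = 1
    while True:
        m = (20 * X * q) ** 2      # m = (20 X q)^2
        j = q * isqrt(m)           # j = q sqrt(m) = 20 X q^2
        if q - log2_prefactor_bound(j) > 2:
            return q, m, j
        q += 1


params = {X: choose_parameters(X) for X in (1, 10, 100, 1000)}
```

The returned $q$ grows slowly with $X$, consistent with the polylogarithmic dependence stated above; the constants depend entirely on the assumed form of Eq. (43).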

V. 2D AND BEYOND
Can Theorem I.1 be extended to the 2D case and beyond? Currently, we do not have such a proof. Nevertheless, it is possible to make a small step in the right direction, as we sketch in this section.
For the sake of clarity, we will restrict ourselves to the 2D case, and consider a bi-partitioning of the system along one dimension, with a boundary of length $I$, as depicted in Fig. 6. In terms of the discussion in the Introduction, $I = |\partial L|$ is the area of the separating surface between $L$ and $\bar{L}$.
To prove an area law for this system, one would like to show that the von Neumann entropy between the two parts of the system satisfies $S \le O(1)\cdot I$.
A straightforward approach for obtaining a bound on $S$, which was also mentioned in Ref. 13, is to treat the 2D system as a 1D system by considering the particles along a column as one huge particle of dimension $d^I$. Then, to get a bound from Theorem I.1, we replace $d \to d^I$, or equivalently, $X \EqDef (\log d)/\epsilon \to I\cdot X$. This gives us $S \le O(1)\cdot(I\cdot X)^3\log^8(I\cdot X)$.
The above derivation completely fails to take into account the local aspects of the problem along the cut. We now show how one can make use of them to lower the leading power of $I$ from $I^3$ to $I^2$, and get a bound of $S \le O(1)\cdot I^2\cdot X^3\log^8(I\cdot X)$.
We first note that, up to some unimportant geometrical factors, the DL also works in 2D (and in any dimension for that matter); see Ref. 14 for more details. For simplicity, let us assume that also in the present case we have only two layers, and that only one layer increases the SR with respect to the cut, which we will still refer to as the "even layer". The idea is then to mimic the 1D case and replace a segment of $m$ columns around the cut, which we denote by $I_m$, with the operator $\tilde{\Pi}_m$. The difference is that now $I_m$ contains $m\cdot I$ projections instead of $m$. Therefore, the polynomial that we use is $[C_{mI}(x)]^q$, where $C_{mI}(x)$ is based on the Chebyshev polynomial of degree $\sqrt{mI}$. Just as in the 1D case, the shrinking factor $\tilde{\Delta}$ of $\hat{A}$ is given by $\tilde{\Delta} = \Delta_0 + (1/9)^q$.
What is the SR exponent $\tilde{D}$ of this construction? A very similar analysis to that in Sec. IV D can be done here: we recursively divide $I_m$ by cutting it parallel to the cut. Just as in the 1D case, there are $m$ such possible cuts in $I_m$. The difference is that now the SR contribution of the restriction of $N$ to a given cut is not $D_0$ but $O(1)\cdot I\cdot D_0$, because $N$ contains a sum of $O(1)\cdot I$ projections along a cut that is parallel to the boundary. Therefore, after $\ell$ layers the SR grows by a factor of at most $D_I\tilde{D}^\ell$, with $D_I = D_0^{mI}$, with $\tilde{D}$ given by the 1D expression in which $D_0$ is replaced by $O(1)\cdot I\cdot D_0$, and with $j \EqDef q\sqrt{mI}$. In comparison with the 1D indices in Eqs. (34, 35), we see that $D_0 \to D_0^I$ in the formula for $D_I$, but $D_0 \to I D_0$ in the formula for $\tilde{D}$, an exponential saving in the latter.

VI. MATRIX PRODUCT STATES
We briefly sketch the implications of these results for the efficient approximation of the ground state via Matrix Product States. Specifically, Theorem I.1 implies the following corollary.

Corollary VI.1 Under the same conditions as in Theorem I.1, for any integer $k > 0$ there exists a matrix product state (MPS) $|\psi_k\rangle$ with bond dimension $k$ that approximates the ground state of an $n$-particle chain to within an error of at most
$$\frac{2n}{k}\cdot 2^{O(X^3\log^8 X)}. \tag{55}$$

This result follows from first noting that across any cut, we can use the properties of the AGSP $K$ to bound the norm of the tail of the Schmidt coefficients:
$$\sum_{i>D^\ell}\lambda_i^2 \le \frac{1}{\mu^2}\Delta^\ell \le 2D\Delta^\ell,$$
where $\mu$ is the overlap of the initial product state with the ground state (see Lemma III.2 in Sec. III and the proof of Lemma III.3 in the appendix). Then, letting $k \EqDef D^\ell$, we find that if we truncate the Schmidt coefficients of a given cut at $k$, we introduce an error of $\delta = \sum_{i>k}\lambda_i^2 \le 2D/k$, where we used the fact that $D\cdot\Delta \le 1/2$ (indeed, $2D\Delta^\ell = 2D(D\Delta)^\ell/D^\ell \le 2D/k$). Then applying the MPS construction procedure of Vidal (Ref. 25), and truncating the SR across all cuts to $k$, yields an MPS of bond dimension $k$ that approximates the ground state to within the accumulated error $n\delta = 2nD/k$. Finally, recalling from Eq. (16) in Lemma III.5 that $\log D \le O(1)\cdot X^3\log^8 X$ gives Eq. (55).
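The truncation step rests on the Eckart-Young theorem (Fact II.2): for a normalized bipartite state written as a $d_A \times d_B$ matrix, keeping the $k$ largest Schmidt coefficients gives the best rank-$k$ approximation, and the squared error equals the discarded weight $\sum_{i>k}\lambda_i^2$. The NumPy sketch below checks this on a random $16\times 16$ state (a toy example, not a ground state) and evaluates the tail bound $2D\Delta^\ell \le 2D/k$ for $k = D^\ell$ with illustrative values $D = 8$, $\Delta = 1/16$ satisfying $D\Delta \le 1/2$; the function names are ours.

```python
import numpy as np


def schmidt_truncation(psi: np.ndarray, k: int):
    """Best rank-k approximation of a bipartite state given as a dA x dB matrix.

    Returns the truncated (unnormalized) state and the discarded weight sum_{i>k} lambda_i^2.
    """
    U, s, Vh = np.linalg.svd(psi, full_matrices=False)  # s holds the Schmidt coefficients
    trunc = (U[:, :k] * s[:k]) @ Vh[:k, :]
    return trunc, float(np.sum(s[k:] ** 2))


def tail_bound(D: float, Delta: float, ell: int) -> float:
    """2 D Delta^ell, which is at most 2D/k for k = D^ell whenever D * Delta <= 1/2."""
    return 2 * D * Delta ** ell


rng = np.random.default_rng(1)
psi = rng.standard_normal((16, 16))
psi /= np.linalg.norm(psi)            # normalized: the lambda_i^2 sum to 1
trunc, tail = schmidt_truncation(psi, 4)
```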

VII. CONCLUSIONS
In conclusion, we have given a new proof of the area law for 1D, gapped, frustration-free systems. The proof uses the DL and Chebyshev polynomials to upper bound the entropy by $O(1)\cdot X^3\log^8 X$, for $X = (\log d)/\epsilon$, which is exponentially better than the bound in Ref. 13. It brings us much closer to the recent lower bound of $\Omega(\epsilon^{-1/4})$ (for fixed $d$) of Hastings and Gottesman (Ref. 16) and Irani (Ref. 17).
There are two immediate directions in which one might hope to improve this result. First, it seems very plausible that the proof can be extended to the frustrated case. Indeed, already in Hastings' 1D proof (Ref. 13), one of the first steps is to reduce the frustrated system to an almost frustration-free system by a coarse-graining procedure. It is possible that a similar technique can also be deployed here. Moreover, one might try to take a more direct approach and construct the AGSP directly from the Hamiltonian $H$, by replacing $N$ with $H$. Both are sums of projections, and a similar SR analysis can be performed on operators of the form $\mathrm{poly}(H)$.
The second direction, which is much more interesting and probably more difficult, is to try to generalize the area law to 2D and beyond. In fact, any sub-volume law for 2D would be an extremely interesting result. One possibility is to improve the $\log d$ dependence of the bound in Eq. (3). A bound linear in $\log d$ would imply an area law in all dimensions, whereas anything like $(\log d)^{2-\delta}$ would imply a sub-volume law for low dimensions. However, it seems to us that the right approach is to better understand and exploit the locality of the problem in the direction parallel to the cut. This was done in a very weak way in Sec. V, and led to an improved bound in the 2D case. We believe there are more local aspects of the problem that can be used. For example, in the current AGSP construction we do not assume anything about the underlying distribution of violations. Yet, these arise from a very specific local operation, namely the application of the previous AGSPs. If one could show that the distribution of violations decays exponentially in $k$, it may be enough to use a Chebyshev polynomial of degree smaller than $\sqrt{m}$, which may lead to an area law. More generally, one might want to prove some notion of independence, or decay of correlations, along the cut, thereby reducing the 2D problem to a stack of nearly independent 1D problems.

VIII. ACKNOWLEDGMENTS
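We are grateful to Dorit Aharonov, Matt Hastings, Sandy Irani, Tobias Osborne and Bruno Nachtergaele for inspiring discussions about the above and related topics.

Proof of Lemma III.3: Let $|\Omega\rangle = \sum_{i\ge 1}\lambda_i|L_i\rangle\otimes|R_i\rangle$ be the Schmidt decomposition of the ground state $|\Omega\rangle$, and let $|\phi\rangle = |L\rangle\otimes|R\rangle$ be a product state such that $|\langle\phi|\Omega\rangle| = \mu$. Define the sequence of states $|v_\ell\rangle$ to be the normalization of the vectors $K^\ell|\phi\rangle$. Since $K$ is a $(D,\Delta)$-AGSP, it follows that $|v_\ell\rangle$ has the following properties:
$$\mathrm{SR}(|v_\ell\rangle) \le D^\ell, \qquad |\langle v_\ell|\Omega\rangle|^2 \ge \frac{\mu^2}{\mu^2 + \Delta^\ell}.$$
Then by Fact II.2 (Eckart-Young theorem),
$$\sum_{i>D^\ell}\lambda_i^2 \le 1 - |\langle v_\ell|\Omega\rangle|^2 \le \frac{\Delta^\ell}{\mu^2}.$$
We will use this bound to upper bound the entropy of the $\{\lambda_i^2\}$. Choose $\ell_0 = \log\mu^2/\log\Delta$ so that $p_0 < 1$. For $\ell \ge 2\ell_0 + 1$, we will upper bound the entropy of the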