How much entanglement is needed to reduce the energy variance?

We explore the relation between the entanglement of a pure state and its energy variance for a local one dimensional Hamiltonian, as the system size increases. In particular, we introduce a construction which creates a matrix product state of arbitrarily small energy variance $\delta^2$ for $N$ spins, with bond dimension scaling as $\sqrt{N} D_0^{1/\delta}$, where $D_0>1$ is a constant. This implies that a polynomially increasing bond dimension is enough to construct states with energy variance that vanishes with the inverse of the logarithm of the system size. We run numerical simulations to probe the construction on two different models, and compare the local reduced density matrices of the resulting states to the corresponding thermal equilibrium. Our results suggest that the spatially homogeneous states with logarithmically decreasing variance, which can be constructed efficiently, do converge to the thermal equilibrium in the thermodynamic limit, while the same is not true if the variance remains constant.


I. INTRODUCTION
Entanglement plays a central role in several phenomena in many-body quantum systems. Ground and low excitation states of local lattice Hamiltonians typically have very little entanglement as they fulfill an area law: 1,2 the entanglement of a connected region with the rest scales with the number of particles (area) at its boundary. As a consequence, those states can be efficiently described with tensor networks, with a number of parameters that only scales polynomially with the total system size. [3][4][5] However, generic eigenstates may possess a lot of entanglement, as they are expected to obey a volume law. This has relevant implications, like the fact that the dynamics of quenched systems is hard to describe in terms of tensor networks, at least for sufficiently long times. 6,7 This volume law is also closely related to the eigenstate thermalization hypothesis, [8][9][10][11] namely the fact that generic eigenstates are able to capture the local properties of systems in thermal equilibrium, when the number of lattice sites N → ∞: the entropy of any finite region in the thermodynamic limit must be extensive, and thus entanglement has to obey a volume law. [12][13][14] Indeed, for a large variety of local Hamiltonians it is expected that local expectation values in any eigenstate with energy in the interval ∆ E = [E − aN α , E + aN α ] converge to the same value in the thermodynamic limit for any constant a and α < 1.
The observation that all eigenstates in an energy interval lead to the same properties in the thermodynamic limit naturally implies that convex combinations thereof fulfill the same property. Those mixed states may have an energy variance δ 2 that scales according to δ ∼ N α and still give rise to thermal averages. However, this is not necessarily the case for linear combinations of those eigenstates or, more generally, for arbitrary pure states with δ ∼ N α . In fact, product pure states typically possess a variance that scales as δ ∼ N 1/2 and do not have any entanglement at all, so that they cannot describe thermal properties of a system. This raises some nat-ural questions: Are there states with δ below √ N but still with little entanglement? Is there a general relation between those two quantities? How does the energy variance of a pure state need to scale with N in order to ensure that the state describes local thermal properties? Some related problems have been studied in the literature. Typicality arguments 13 can be invoked to argue that most pure states compatible with any macroscopic constraints will exhibit very similar local expectation values. Under appropriate considerations, this is enough to ensure that energy eigenstates look locally thermal. 15 However, if the most strict sense of typicality is considered, 16,17 this is, if all eigenstates within an energy shell need to have expectation values that are exponentially close (in the system size) to the shell average, this requires exponentially small width of the energy shell for most few-body Hamiltonians. 16 For chaotic systems, a polynomial relation was found 17 between the energy width and the maximal deviation from thermal behavior for any state supported on it, in particular, in terms of the subsystem entropy. Additionally, the typical entanglement of energy eigenstates has been the focus of recent theoretical studies for chaotic and integrable systems. [18][19][20][21][22] Less is known about the typical behavior of entanglement for a pure state that is a superposition of energy eigenstates within a narrow energy shell, although results exist that characterize the typical entropy of physical states, understood as those that can be reached by unitary evolution with local Hamiltonians. [23][24][25] A related topic is the concept of thermal pure quantum states [26][27][28] (TPQ), introduced to develop a pure state formulation of statistical mechanics. TPQ are random states for which expectation values of local observables probabilistically converge, in the thermodynamic limit, to their values in a given statistical equilibrium ensemble. The variance of their energy density distribution vanishes as 1/N . In 29 TPQ states were constructed starting with a state drawn from a random matrix product state (RMPS) ensemble 30 with fixed bond dimension, what allows for a more efficient sampling in many-body systems. However, if the resulting TPQ needs to approach the entanglement content (for a subsystem) that characterizes the equilibrium ensemble, the accurate MPS approximation of the TPQ will be unfeasible in most cases.
In this paper we address the question of how much entanglement is required to reduce the energy variance of a local Hamiltonian in one dimensional spin chains. In particular, we construct states with an arbitrary value of the variance δ 2 and with entanglement that scales as (k/δ) + log √ N , where k is a constant. We use matrix product state (MPS) techniques 31,32 for the deterministic construction and also to compute the entanglement, which we estimate through the bond dimension of the states. We also extensively check numerically this prediction with the Ising model in a transverse and longitudinal field, and with the Heisenberg model. Our results imply that it is indeed possible to build states of constant variance δ = O(1) but with little entanglement, which grows only logarithmically with N and a bond dimension of the MPS that grows polynomially. In fact, one can even take δ 1/ log(N ) while keeping such polynomial scaling with N . However, if we want to obtain a scaling δ ∝ 1/N , the entanglement will grow linearly with the size. We also investigate to what extent the states we construct can recover the thermal properties as N grows. Since the entropy in thermal equilibrium is extensive, a necessary condition for a region of size L to be thermal is that its entanglement entropy is O(L). Thus, the bond dimension of the MPS must at least scale exponentially with L. If one restricts the bond dimension to grow only polynomially with N , the largest thermal region can be at most L ∼ O(log N ). According to our bounds, the required O(L) entropy can be thus achieved with a variance that decreases as δ ∼ 1/ log(N ). Our numerical results confirm that we can decrease the variance as δ ∼ 1/ log(N ) keeping a polynomially scaling bond dimension and, for fixed value of L, all local observables in the region of size L converge to their thermal values in the thermodynamical limit.
The rest of the paper is organized as follows. In section II we introduce the general aspects of our setup and our notation. Then, section III presents two explicit constructions of pure states which can attain arbitrarily small energy variance. In IV we derive our main result: the bounds on bond dimension and entropy of such states as a function of the variance. We also discuss the limitations and implications of our results. Section V presents our numerical results for the Ising and Heisenberg spin chains. Besides checking the scaling of the variance, and the verification of the theoretically derived bounds, we probe the convergence of reduced density matrices towards the thermal equilibrium states. In section VII we compare the results with those from variational minimization of the variance for the same sets of parameters. Finally in section VIII we discuss our results and summarize our conclusions.

II. PRELIMINARIES
We are interested in analyzing the entanglement required to achieve a given energy variance δ 2 . We consider a spin chain of local dimension, d, and a local Hamiltonian where h n acts on spins n and n+1. We will consider open boundary conditions, i.e. h N acts only on the N -th spin, although our results can be easily extended to the case of periodic boundary conditions. Without loss of generality, we will assume that tr(h n ) = 0, and tr n (h n ) = 0. Note that if the latter is not the case, we can always include the part that does not vanish in the term h n+1 . We will normalize the Hamiltonian such that where we took the operator norm, so that its spectrum, When we consider sequences of Hamiltonians with increasing number of spins, will also assume that min h n = h min > 0, where h min is some constant independent of N . In particular, for the numerical computations we will take h n = h m (except for n = N ), so that this is automatically fulfilled. The energy variance of a pure state Ψ is In order to analyze the entanglement present in the state, one can consider a "cut" of the -th link of the chain, with ∈ {1, . . . N − 1}. This divides the chain in two regions, which we denote by the number of spins they contain, respectively and N − . The entanglement entropy with respect to this bipartition is where the reduced state ρ = tr +1,...N |Ψ Ψ|. This quantity is bounded by log 2 d times the minimum between the number of spins in both regions. Typically, when the sizes are large S is difficult to compute (as it requires diagonalizing ρ ), and even to bound (because two states that are arbitrarily close may have very different values). Alternatively, we will also consider the bonddimension required to describe the state Ψ in terms of a matrix product state (MPS). That is, the size, D, of the matrix A s [n] such that |Ψ ≈ |Φ D with Typically, one would expect that the maximum entropy with respect to all possible cuts, S = max S , fulfills S ∼ c log(D) (6) for some c = O(1).
In the following sections, we will construct states with an arbitrary energy variance δ 2 and that (for large system sizes and small δ) can be approximated by an MPS with bond dimension where c and D 0 are some constants. One can thus estimate the entanglement of the state across any cut to be bounded (6) where k 1,2 are constants.

III. CONSTRUCTING STATES WITH ARBITRARILY SMALL ENERGY VARIANCE
In order to explore how much entanglement is there in pure states with small energy variance, we present here two explicit constructions for families of states with welldefined variance. In the next section we will show that they also have controlled entanglement.

A. Product states
We start considering the trivial case of product states where p n are single spin states. These states have no entanglement for any bipartition, and their variance reads This tells us that δ p ≤ √ 6N . Actually, one can always construct a product state with δ ≥ y √ N for some constant y, since the averaged variance over all product states can be easily seen to be O(N ). Thus, δ ∼ √ N can be obtained with zero entanglement.
We will make use of the following result from. 33 In the case of a product state |p with mean energy E p = p|H|p and energy variance σ 2 p such that σ p = a √ N , with a > 0, the local density of states (or energy distribution) converges in the thermodynamic limit to a Gaussian By local density of states we mean that for any interval ∆ = [E 1 , E 2 ] ∈ σ(H), if we take P ∆ to be the projector onto the subspace spanned by the eigenstates of H with eigenvalue in that interval, then 34

B. Entangled states
We will consider here states of the form where p is a product state with E p = p|H|p = 0, σ p = a √ N with a = O(1), and N is the normalization factor. In the following paragraphs we specify two explicit choices for the coefficients c m of the sum in (13), such that the variance of the resulting state |Ψ systematically decreases with the number of terms in the sum M 0 . The sum above can actually be restricted to −x √ M ≤ m ≤ x √ M , with the error scaling as a Gaussian function of x, so that it can be made arbitrarily small by a judicious choice of x = O(1). We have indeed checked that taking x = 2 the relative error is smaller than 10 −3 for N ≤ 1000 and M ≤ (100N ) 2 . The action of this operator on |p can thus be written in the form (13) with M 0 = x √ M terms. Using the fact that cos M (X) e −M X 2 /2 for |X| < 1, and the Gaussian form (11) of the local energy density of |p , the variance of the resulting state is found to be δ 2 = (1/σ 2 p + 2M/N 2 ) −1 which for large enough systems scales as 2. Chebyshev filter.
We found that for the numerical implementation, it is more convenient to use the alternative construction we describe here, which attains a similar scaling, but allows more efficient simulations.
A piecewise continuous function f (x) for −1 ≤ x ≤ 1 can be expanded as f (x) = m p m T m (x) in terms of Chebyshev polynomials of the first type. The coefficients of the expansion can be computed using the orthogonality properties of the polynomials as p m = is the weight function for this family of polynomials, and C m are the normalization factors, C m = 1 −1 dxw(x)T n (x) 2 , namely C m>0 = π/2 and C 0 = π. The truncation of the sum to a finite M provides an approximation to the function, which exhibits characteristic (Gibbs) oscillations near a discontinuity. The kernel polynomial method 35 reduces this effect and improves the convergence of the truncated series by multiplying the coefficients by specific factors g (M ) m . In particular, we use the Jackson kernel, for which In the case of the Dirac delta function, all coefficients for odd polynomials vanish, and the M −th order approximation using the KPM reads By applying the same truncated series to the rescaled Hamiltonian H/N , we obtain the operator The result of applying this operator to the product state |p can also be written in the form (13). The Chebyshev polynomials fulfil T n (cos x) = cos(nx). Then Choosing a constant α 1 and rescaling H → αH ensures that the approximation holds for all eigenvalues of the argument. In the case we study here it is enough to consider α = 1, since the contributions of large eigenvalues (for which in principle the relation may fail) are rapidly suppressed.
So, finally we can write which is of the form (13) where the last step results from considering the limit N 1.

IV. RELATION BETWEEN ENTANGLEMENT AND ENERGY VARIANCE
The entanglement of any state with the form (13) can be upper bounded by a function of M 0 and N . Using a result from, 33 in the thermodynamic limit, the overlap between two terms in the sum, m and m decreases with their separation as Terms for which the separation m − m N/(2σ p ) = √ N /(2a) are almost proportional to each other. Therefore we can reduce the number of terms in the sum by a factor √ N by defining a constant γ a and grouping each set of √ N /γ consecutive terms, as where The entanglement generating capacity of a local Hamiltonian as (1) is bounded, [36][37][38][39] so that, when acting with the evolution operator e irH on a product state, the entanglement for any given cut can only increase linearly with r, independently of N .
Thus, each (normalized) term e 2ikH/γ √ N |p appearing in the sum can be approximated by a MPS with bond dimension D 2|k|/γ √ N 1 , for some D 1 = O(1). Since the bond dimension of a sum of MPS is at most the sum of the bond dimensions of its terms, |Ψ can be approximated by a MPS with (24) or, in the limit of large system size N 1, From equations (15) and (21), we see that the energy variance in both constructions presented in the previous sections decreases precisely as δ ∝ N/M 0 , so that in (25) we may substitute D 2M0/N 1 = D 1/δ 0 , with a value D 0 , specific for each case, that absorbs the corresponding constant exponents. Then the bond dimension in both cases scales (for large systems) as which is one of our main results. The expression reduces to (7) when δ 1.
Correspondingly, using (6), in the large N limit, the entanglement is bounded by which reduces to S ≤ k 1 /δ+log √ N +k 2 for small enough δ.

A. Implications and limitations
Let us briefly comment the conditions and consequences of the results derived above.
Regarding the conditions in our derivations: (i) Equations (26) and (7) have to be taken as upper bounds: the estimations we made to compute the bond dimension of the approximations to the state (13) consider a worst-case scenario, where the Hamiltonian H generates as much entanglement as possible, and the bond dimension of a linear combination of states is the sum of the bond dimensions of each of them.
(ii) The constructions introduced in this section prepare states of a given variance, and we have shown that they possess bounded entanglement, but there may well be other states with less entanglement for the same variance. Indeed, we cannot expect to find tight bounds, since it is possible to construct examples of exact product states in the middle of the spectrum for specific interacting systems. For instance, just take a staggered ferromagnetic Hamiltonian, H = n (−1) n σ n σ n+1 , for which any product state |p ⊗N is an eigenstate with zero energy (for N even).
In order to try to obtain better general bounds, In that case, the bond dimension of the state |Φ = |φ L ⊗|φ R would be upper bounded by a sublinear function of N , but its variance may be as large as N 2 : just taking (iii) We have shown that, in the limit N 1, our constructions yield a variance of the form δ ∼ N/M 0 , for M 0 terms in the sum, but for finite systems we expect some corrections to appear. In particular, to make the derivation of the cosine operator rigorous, we should scale x and γ with N and δ; however, since the error we made by truncating the series (14) is exponentially small in x, the corrections will only depend logarithmically on those quantities.
(iv) Although we have described how to use our construction to obtain states in the middle of the spectrum, with E 0, it can also be used for other energies E 0 , as long as there exists a product state with mean energy p|H|p = E 0 on which we can apply the filter, after replacing H→H −E 0 . In particular for qubits, as considered here, such an initial product state can be shown to exist for any energy |E 0 | ≤ N h min /6. This can be seen as follows. First, for each odd term h 2n−1 in H, we define where the maximization is over all product states, and we define local unitaries U n such that m 2n−1 = 00|h 2n−1 |00 , whereh 2n−1 is h 2n−1 conjugated with U 2n−1 ⊗ U 2n . It is easy to show that m 2n−1 ≥ h min /3, where the bound is tight (for instance, for h 2n−1 = σ 2n−1 σ 2n ). If we conjugate H with the product of all U n , we obtaiñ H = n a n σ z n σ z n+1 + b n σ z n + H , where |a 2n−1 | + |b 2n−1 | = m 2n−1 , and H does not contain terms of the form: 2n . The first two cannot appear since they are already included in (29) and tr n h n = 0, so they cannot come from h 2n . The terms σ z 2n−1 σ x,y 2n cannot appear either, as if they do, then there would exist a product state such that the expectation value ofh 2n−1 would be larger than the maximum, m 2n−1 . Now, we can take a product state of the form ⊗ i |p i , where we choose p i ∈ {0, 1} from left to right making sure that all expectation values give a positive value, and we call, for the even terms, m 2n := p 2n p 2n+1 |h 2n |p 2n p 2n+1 ≥ 0. Then the corresponding energy is m ≥ n m 2n−1 ≥ N h min /6 = E 0 . Similarly, we can construct product states with energy −E 0 .
Regarding the consequences: (i) Eq. (26) implies that there are states with constant energy variance and with a bond dimension that only scales polynomially with N . Those states have at most log N entanglement along any cut, and thus only slightly violate the area law. Conversely, if we keep the bond dimension constant, the variance must increase with δ ∼ √ N .
(ii) It is possible to build states with a variance decreasing as δ ∼ 1/ log N but still keeping a polynomial bond dimension. Notice however that this case may be affected by the corrections mentioned above. For instance, in the cosine construction in III B 1, if the factor x needs to grow logarithmically with N , δ would have to decrease as some power of 1/ log N .
(iii) For the state (13) to reproduce the thermal properties in the thermodynamic limit (i.e. for the expectation values of all local observables -with any support -to converge to their thermal values) the state needs to have an extensive value of the entanglement entropy, and thus δ ≤ 1/N . However, if one is only interested in the observables in a region of fixed size L, it may be enough that the entropy of that region is L, so that keeping δ constant (or slightly decreasing with N ), may be enough to locally thermalize for sufficiently large N , and in this case the local temperature may vary on length scales much larger than L. 17 Notice that we can also apply the arguments leading to (27) to a subsystem in the middle of the chain, by simply considering a double chain obtained by folding the original one in two around the center of the subsystem. Thus a similar bound (27) (with different constants) holds for the entropy of the subchain of length L.
In the next sections we investigate numerically all the points raised above. We give an explicit construction to build the MPS, and numerically check to what extent (21), (26) and (27) are obtained for moderate values of N and δ. We also explore how close to thermal are the reduced density matrices of these states for small subsystems.

A. Numerical implementation
In order to achieve a fixed variance δ 2 , both constructions presented in the previous section would require a number of states in the sum (13) proportional to N/δ. To implement the first strategy III B 2 numerically we can use standard MPS time evolution techniques in order to approximate each term of the sum starting from the product state |p . We have observed, nevertheless, that the truncation error accumulates fast when compressing the terms of the sum. Iteratively applying the cosine operator exp[iH/N ]+exp[iH/N ] produces numerically more stable results, but requires (N/δ) 2 iterations. A more efficient strategy is to implement the Chebyshev construction from III B 2 by approximating with a MPS the action of each term in (18) on the initial state.
Starting from the product state, |p , we construct the terms of the sum (18) as follows. The first two Chebyshev polynomials are exact matrix product operators 42 (MPO) with small bond dimension, T 0 (H) = 1, and T 1 (H) =H, whereH = αH/N is the properly rescaled Hamiltonian, so that its spectrum is within the convergence domain of the Chebyshev expansion. In practice, to choose α it is enough to obtain an estimate of the edges of the spectrum, E min and E max , which can be done efficiently using DMRG, and then to take α < N/ max(|E min |, |E max |), where the inequality is guaranteed by fixing for instance α as 0.9 times the rhs).
The 0-th order approximation corresponds to |Ψ 0 = g (M ) 0 c 0 |p . Then, we iterate until the desired order M , where c n can be read from (18). In principle, the higher order polynomials could be also computed as MPO using the recurrence T n+1 (H) = 2HT n (H)−T n−1 (H) using the standard algorithms. However, since only the action of each polynomial on the initial state, |T n (p) := T n (H)|p , appears in the sum, it is more efficient to compute these vectors, which satisfy the same recurrence relation and use them to update the sum as |Ψ n+1 = |Ψ n−1 + g (M ) n c n |T n (p) . This allows us to operate directly with MPS, avoiding the more costly operators. Notice that the use of Chebyshev polynomials of the Hamiltonian combined with tensor network techniques was first suggested in 43 for approximating spectral functions and has been later used for time evolution [44][45][46] and density of states calculations. 47 The technical details involved in our computation of the intermediate |T n+1 (p) vectors are virtually the same described in those references.

Models and initial states.
We consider two spin 1/2 quantum chains to probe our construction, namely the Ising model with longitudinal and transverse field, and the XYZ Hamiltonian in a magnetic field, We consider non-integrable points, and choose the sets of parameters (J = 1, g = −1.05, h = 0.5) for H I and (J x = 1.1, J y = −1, J z = 0.9, h = 1.2) for H XYZ . In both cases, we select a product initial state that has energy close to 0. In the case of the Ising model,

⊗N
, which has Y + |H I |Y + = 0 for every system size. In the case of the XYZ chain, we use a staggered configuration |p = |Z st2 := (|0 |0 |1 |1 ) ⊗N/4 (if the chain length is even, but not a multiple of 4, the last pair of sites is in |0 ), with constant energy J z (for even chain lengths), and thus vanishing energy density in the thermodynamic limit. Note that these initial states are spatially homogeneous on large length scales, so that as they approach local thermal equilibrium at large M , the local temperature is spatially uniform. Below, in section VI, we also explore spatially nonuniform initial states.
Using the method described above, we compute the Chebyshev sum to M -th order for different values of M , and for system sizes between N = 20 and 100. In each case, we allow bond dimensions between D = 200 and 1000 (notice that D = 1024 is exact for our smallest system N = 20).

B. Scaling of the variance
To probe whether the energy variance decreases with the number of terms in the sum according to the asymptotic behavior in (21) already for the moderate system sizes accessible to the numerics, we run the procedure described above for both models using truncation parameters M that vary with the system size. As shown in figure 1, the scaling is close to the asymptotic one, as long as the truncation error is not important. Fitting the final variance for the converged calculations for each system size to δ = AM −η , we find η close to 1 for all cases, with only small deviations (see caption of figure 1).
Another way to check the behavior predicted in III B is to examine the scaling of the final variance when the truncation order scales as a certain function of the system size M = f (N ) (see figure 2). Since we expect δ ∝ N/M , a linearly growing M ∝ N should maintain constant variance for increasing system size. Our simulations show that this is practically the case, and the behavior is consistent in both models. Nevertheless, we observe a slight increase δ N 0.03 for Ising ( fig. 2a) and δ N 0.05 for XYZ ( fig. 2b). If M grows faster than linear (see the figure for M ∝ N log N and M ∝ N 2 ), the variance decreases with increasing system size. In the first case, the results are compatible with a descent δ ∼ 4/ log N , but also with a power law N −0.223 (Ising), respectively δ ∼ 1.53/ log N or N −0.22 (XYZ). In the quadratic case, the variance drops much faster, compatible with δ ∝ N −0.97 (Ising) and N −0.88 (XYZ) within the converged range of sizes (notice that in that case the largest system size is not converged even with D = 1000).

C. Scaling of entropy and truncation error
Equations (26) and (27) estimate the upper bounds of the scaling of the entanglement entropy and the bond dimension with system size for each set of states defined by a function M = f (N ). In particular, if 1/δ scales as log N or slower, which is the case for all functions discussed above except M ∝ N 2 , the bound scales asymptotically as log N .
We check this prediction plotting the numerical results vs. the logarithm of the system size, as shown in figure 3. Our data show that in almost all cases, the growth of the entropy is compatible with log N . We fit the resulting  entropies for the largest bond dimension (discarding the smallest system sizes) for each model ( fig. 3a, 3b), and find the following forms 48 . We can further probe to which extent the asymptotic behavior for large systems (26) is satisfied within our data. Without taking the large N limit in the sum (24), we obtain for arbitrary sizes

Ising
where we have defined g(N ) = (D 2/γ √ N 1 − 1) −1 . In the limit of large system size, g(N ) ∼ γ √ N /2 log D 1 1, and (26) is recovered. We thus estimate D 0 from the moderate sizes available by fitting the data for each system size to a function 2 S = aD  figure 4. As expected, the largest deviations are observed for the smallest system size N = 20. For larger systems, the Ising data fits well the expected behavior, while in the XYZ case the finite size effects seem to be more important, and deviations can be appreciated also for N = 40. From our simulations, run with constant bond dimension, we can estimate the truncation error and thus the minimal bond dimension required to maintain a given precision in the MPS representation of the states we construct. The MPS form gives access to the Schmidt decomposition {λ k } D k=1 across any cut. For the one corresponding to the middle of the chain we define D tr as the minimum bond dimension required to ensure a small error = 10 −2 , namely 1 − Dtr i=1 |λ k | 2 ≤ . We then analyze the scaling of D tr as we did for the entropy. Notice that these estimates are obtained for the largest D = 1000 (discarding the largest sizes which may not be fully converged).
Repeating the analysis we described for the entropy, we can fit the data for each system size to a curve D tr = aD √ N , the distance grows and approaches the maximum (0.996). For constant variance, instead, the distance seems to stay almost constant in the Ising case, and to go slightly down in the XYZ case. A number of steps M ∝ N log N or M ∝ N 2 seems to be enough to have convergence of the local reduced density matrices to the thermal one. As in previous figures, solid symbols correspond to the maximum bond dimension D = 1000, and for the largest number of steps we show in lighter shades the non-converged results for D = 300 − 500. The right column shows the entropy of the corresponding reduced density matrix for the same cases.
For all the states explored in figure 2a, i.e. sets of states defined by a truncation order M = f (N ) ≥ √ N , the scaling of the variance is compatible with δ ∼ N η , with η < 1, so that the energy density variance δ/N ∼ N η−1 vanishes in the thermodynamic limit. This is however not enough to guarantee that the reduced state for a subsystem is close to thermal.
In the setups we consider, the (target) mean energy is zero, which corresponds to infinite temperature. We thus compare the reduced density matrix for the central L c sites (up to L c = 10) with the corresponding reduced density matrix for the maximally mixed state. We study how this distance varies as a function of the system size for the different sets of states described above, with different scalings of the variance. The results for L c = 8 are shown in figure 6.
We observe that for M ∝ √ N , which yields δ ∼ √ N , the local distance for all sizes L c grows with the system size, approaching the maximum possible value, 1 − 2 −Lc . For linear truncation M ∝ N , which results into almost constant (or very slightly decreasing) δ, the local distance seems to become constant, growing a bit for the smallest systems in the Ising case, and decreasing in the XYZ case, but seemingly stabilizing for larger sizes. These states, thus, do not locally resemble thermal equilibrium either in the thermodynamic limit. Our results suggest however that an energy variance decreasing as log N or faster would guarantee local convergence to thermal equilibrium in the limit of large N (as the curves for M = N log N and M = 0.1N 2 illustrate). We observe the same qualitative behavior in both setups. Also different subsystem sizes (except the smallest ones) behave similarly. A necessary condition for the reduced state of L c sites to resemble thermal equilibrium (which here corresponds to the maximally mixed state) is that the entropy of the corresponding subsystem grows as L c . We have evaluated the entropy of the central subchain in these states for L c = 1, . . . 10. Our arguments bound the growth of this entropy with the same form (27), but since the maximum entropy of the block is L c , if L c < log N/2, the right hand side of (27) just gives a trivial bound. Thus we illustrate the behavior for the case L c = 8 in the right panels of fig. 6, although qualitatively similar plots are obtained for the other sizes explored, if L c 4. We observe that, while for large entropy of the block, approaching the upper bound L c , the distance decreases much faster than L c − S(ρ Lc ), when the entropy is comparatively smaller, there is a clear correlation between both quantities, and the entropy fulfills the form (27) qualitatively. In order to decrease the energy variance δ 2 = n,m Ψ|h n h m |Ψ − Ψ|h n |Ψ Ψ|h m |Ψ , the system needs to arrange the local energy fluctuations and their correlations, such that the sum nearly vanishes. For a translationally invariant system with zero mean energy, and taking into account that h 2 n ≥ a for some constant a > 0, a constant δ 2 can thus be attained by either some short range terms h n h n+ of O(1), or if all terms h n h m become O(1/N ). By inspecting the spatial distribution of such correlators in the states constructed in this section, we can thus better understand how our method constructs the states with small energy variance (the local energy operators h n are chosen to fulfill tr n h n = 0, as specified in section II).

E. Variance and correlations
As illustrated in figure 7 for the case N = 40, very early a certain amount of long range correlations develops. In the Ising case (left panel), where the initial state is translationally invariant, these correlations are homogeneous, while in the XYZ model (right panel) they reflect the periodicity of the initial state. 49 By comparing the (largest) magnitude of the long range correlations for different system sizes, at a number of steps M ∝ N , which, as we have discussed, corresponds to a constant δ 2 , we observe that these long-range terms scale indeed as 1/N . This strategy is thus not the most entanglement effective, which explains why, when we search for the states minimizing the variance at fixed energy and bond dimension (see fig. 12), we encounter different structure. The scaling of the variance in our constructions (15) and (21) follows from considering an initial product state with narrow enough (σ p ∼ √ N ) energy distribution, but does not require translational invariance, and thus must also hold for initial states that have an inhomogeneous energy density. Simulating this situation allows us to ask how such an initial imbalance diminishes as the variance is reduced.
To probe this scenario numerically we consider a product initial state with a step-wise energy density, and zero mean energy, with respect to the non-integrable Ising Hamiltonian, and apply the Chebyshev filter as described above. Figure 8 show the results of our simulation, using exact diagonalization, for systems up to N = 24 sites (although we examined larger systems with MPS, the truncation error became important much before the effects on the spatial profile are noticeable). We observe in the inset that the scaling of the variance decreases with the number of steps as δ 2 ∝ 1/M 2 , as expected from (21). The inhomogeneity of the energy density, nevertheless, survives much longer, and it remains noticeable even when the variance has decreased by several orders of magnitude.  9. Relation between the bond dimension and the final energy variance attained via variational minimization for several system sizes for the Ising (above) and XYZ (below) setups. For large enough system sizes we observed a behavior compatible with log D ∼ 1/δ.

VII. VARIATIONAL OPTIMIZATION
Applying the strategy described in the paper with a fixed bond dimension manages to produce a MPS with small energy variance at the given mean energy. This variance can be systematically reduced by increasing the order of the Chebyshev expansion, before truncation error appears. But this does not need to be the MPS with the smallest possible variance for the given bond dimension and mean energy. Instead, we can directly search for such optimal MPS by a variational optimization using as cost function ∆H 2 = Ψ|H 2 |Ψ − Ψ|H|Ψ 2 (plus a penalty term to ensure the desired mean energy). This variational problem can be formulated as the optimization of a MPS, and be solved using a sequential iteration over tensors, similar to DMRG algorithms, 31,32 with the difference that the local problem to be solved for Entanglement entropy (half chain) of the MPS resulting from the variational minimization of the variance, as a function of 1/δ, for different system sizes for the Ising (above) and XYZ (below) setups.  each individual tensor is the minimization of a quartic expression. A similar problem appears also when optimizing purifications and can be solved using some iterative numerical scheme, e.g. a gradient descent algorithm (see e.g. 50 ), but the problem has local minima (even at the level of the individual tensors) and the convergence severely depends on the parameters of this local optimizer.
We have performed the variational search for the same setups and system sizes discussed in the first part of the paper, and using bond dimensions 20 ≤ D ≤ 500, in order to compare the results to the ones discussed in the previous section. We observe that the values of the energy variance reached for a certain bond dimension can be much smaller than with the Chebyshev sum. Similar to that case, the bond dimension seems to grow exponentially with 1/δ, with a coefficient that depends on the system size, as shown in figure 9. but which does not correspond to √ N . From our data we could not identify a clear scaling of the coefficients. Additionally, the behavior of the variance is not smooth when increasing D, what we attribute to the imperfect convergence of the non-linear optimizations, which may sometimes be trapped in local minima for a certain value of the bond dimension, while the state may change completely when the bond dimension is varied. The entanglement entropy of the states found in the optimization reflects an even stronger non-systematic behavior, specially in the case of the XYZ model, as shown in figure 10. The average distance of the subsystems to the thermal equilibrium does not behave monotonically either (see figure 11), with the large changes corresponding to the abrupt variations in δ appreciated in figure 9. Interestingly, although the variance decreases monotonically when increasing D, the distance does not behave in the same way. Overall, the distances obtained with the variational minima are (except for the smallest system) larger than the best distances achieved with the filter.
Also the energy density correlations in these states differ from the systematic behavior encountered in figure 7. Now we find that the states achieve much lower energy variance by modifying the short range correlations, and long range ones are developed only at much larger bond dimensions (see figure 12).

VIII. DISCUSSION
We have introduced a method that, starting from a product state, systematically constructs states with decreasing energy variance and controlled entanglement for any local one-dimensional Hamiltonian. This allows us to extract conclusions about the minimal entanglement (or bond dimension) guaranteeing the existence of a state with a certain variance, as a function of the system size. We have found that it is possible to prepare states with arbitrarily small variance (vanishing as δ ∼ 1/ log N ) with a bond dimension that scales polynomially with the system size.
Using MPS algorithms, we have implemented the construction numerically for the Ising and XYZ models, and we have confirmed that the asymptotic scalings hold already for system sizes N ∈ {20, 100}. Using the numerical simulation we have also analyzed how close these states are to thermal equilibrium, in terms of the local reduced density matrices. Our results suggest that a variance that vanishes as δ ∼ 1/ log N is enough to obtain local thermal behavior for small subsystems.
For comparison, we have run a variational search for the MPS that minimizes the energy variance at fixed bond dimension. Although the variational method we use may find convergence problems, the states we find enable a qualitative comparison with the results from the systematic construction. The variational search finds states with smaller variance for the same entanglement, although in general the corresponding reduced density matrices are further from thermal. Nevertheless, the exponential scaling of the bond dimension with the variance also seems to hold in this case.
Finally, notice that the result about the scaling of the variance holds independent of the dimensionality of the problem. Furthermore, since the bound on the rate of entanglement generation by a local Hamiltonian is general as well, we expect that the bound of the entropy can also be generalized to higher dimensional systems.