Improved thermal area law and quasi-linear time algorithm for quantum Gibbs states

One of the most fundamental problems in quantum many-body physics is the characterization of correlations among thermal states. Of particular relevance is the thermal area law, which justifies the tensor network approximations to thermal states with a bond dimension growing polynomially with the system size. In the regime of sufficiently low temperatures, which is particularly important for practical applications, the existing techniques do not yield optimal bounds. Here, we propose a new thermal area law that holds for generic many-body systems on lattices. We improve the temperature dependence from the original $\mathcal{O}(\beta)$ to $\tilde{\mathcal{O}}(\beta^{2/3})$, thereby suggesting diffusive propagation of entanglement by imaginary time evolution. This qualitatively differs from the real-time evolution which usually induces linear growth of entanglement. We also prove analogous bounds for the R\'enyi entanglement of purification and the entanglement of formation. Our analysis is based on a polynomial approximation to the exponential function which provides a relationship between the imaginary-time evolution and random walks. Moreover, for one-dimensional (1D) systems with $n$ spins, we prove that the Gibbs state is well-approximated by a matrix product operator with a sublinear bond dimension of $e^{\sqrt{\tilde{\mathcal{O}}(\beta \log(n))}}$. This allows us to rigorously establish, for the first time, a quasi-linear time classical algorithm for constructing an MPS representation of 1D quantum Gibbs states at arbitrary temperatures of $\beta = o(\log(n))$. Our new technical ingredient is a block decomposition of the Gibbs state, that bears resemblance to the decomposition of real-time evolution given by Haah et al., FOCS'18.


A. Background
One of the most important challenges in quantum many-body physics is to understand their thermal equilibrium properties.Recently, with the advent of large quantum simulators [1][2][3][4][5], the size and controllability of quantum Gibbs states accessible for experiments have dramatically improved.In fact, recent experiments have even succeeded in implementing imaginary time evolution [6].These developments are of considerably interest in terms of quantum computation because quantum Gibbs states play crucial roles in quantum machine learning [7][8][9][10][11][12][13][14] and quantum algorithms such as semidefinite program solvers (SDP) [15][16][17].Beyond quantum computation, understanding and characterizing quantum Gibbs states is relevant to many open problems in quantum statistical physics and condensed matter physics.Thus, understanding i) the nature of entanglement structures in quantum Gibbs states and ii) their simulability via tensor network methods is of great interest.
It is now widely accepted that the area law plays a crucial role [18,19] in the characterization of lowtemperature physics of many-body systems.This states that the entanglement entropy between two subsystems is at most as large as the size of their boundaries.A similar notion also applies to finite temperature systems.Although a rigorous proof of the area law at zero temperatures appears to be a notoriously challenging problem [20][21][22][23][24][25][26][27], an analogous area law at finite temperatures has been proved by Wolf et al. [28] in a simple and elegant manner.The authors proved the following inequality: with • • • being the operator norm and ∂L being the surface region of L, where I(L : R) ρ β is the mutual information between the subsets L and R (see Eq. ( 7) below) and H ∂L denotes the boundary interaction Hamiltonian.The upper bound (1) roughly denotes that the correlations between two complementary regions is concentrated around a distance O(β) of their boundary.The thermal area law ( 1) is optimal at high temperatures (β ≈ O (1)) because the dependence on |∂L| cannot be improved.One may similarly expect that at low temperatures (β 1), linear dependence on β should be optimal.This is suggested by the theory of belief propagation [29], which indicates that the nonlocal quantum effects can be induced in a length scale of O(β).However, there are no definite numerical or theoretical examples that achieve the upper bound (1).Indeed, for specific systems [30][31][32], we can get much better area-law bounds than (1).This motivates the possibility of the following improvement of the thermal area law: Any improvement along these lines is intimately associated with new advances in our understanding of the low-FIG.1.Schematic depiction of our problem.By decomposing the total system into L and R, we consider the mutual information I(L : R)ρ β between L and R.Then, the thermal area law in Ref. [28] gives I(L : R)ρ β β|∂L| (γ = 1 in the above picture).We aim to establish a new thermal area law in the form of I(L : R)ρ β β γ |∂L| with γ < 1.In particular, it is a highly non-trivial and fundamental question to identify the best exponent γc for which the thermal area law holds in generic many-body systems.Our main result provides the non-trivial upper bound of γc ≤ 2/3.temperature physics.For instance, the widely known relation between area laws and tensor networks suggests that the identification of the minimum γ c would also lead to optimal representations of Gibbs states.This would result in faster algorithms for computing local expectation values and evaluating the partition functions.
Unfortunately, at lower temperatures, computational complexity theory results severely limit the applicability of these algorithms discussed above.Indeed, computing the partition function of Gibbs states in dimensions D > 1 is already known to be NP-hard [73,74] except for special cases (e.g., ferromagnetic spin systems [75,76]).This is a serious bottleneck for several practical applications in which the Gibbs states are employed at low temperatures.For example, in the quantum algorithm for semi-definite programming [15], the quantum Gibbs states with β = O(log(n)) (n: system size, β: the inverse temperature) must be sampled.Similar challenges are faced in the imaginary time evolution, the implementation of which is a central aim of near-term quantum devices [6,[77][78][79][80][81][82][83].In fact, below a threshold temperature, little is known about the universal properties of Gibbs states that may hold independent of the system's details.This provides a strong motivation to identify the optimal thermal area laws.

B. Description of the main results
For the first main result of the present study, we prove the inequality (2) for γ = 2/3.On the other hand, we also prove the lower bound of γ c ≥ 1/5, using the example constructed in [84] (see Appendix B), which means 1/5 ≤ γ c ≤ 2/3.
To understand why this is counter-intuitive at first sight, let us consider the case of real-time evolution e iHt .The small-incremental-entangling (SIE) theorem [85][86][87][88] predicts the linear increase of the entanglement with respect to time, which translates to the fact that the Schmidt rank of the operator e iHt grows as e O(t) .This suggests the same linear dependence for the imaginary time evolution operator e −βH .However, the inequality (2) shows that the scaling of the exponent is sublinear in β.This means that the entropy growth due to the imaginary time evolution is more diffusive in nature.We explain this difference in Subsection III A, which can be traced back to a better polynomial approximation to e −x compared with e −ix [89].This polynomial approximation is caused by a random walk interpretation of the Chebyshev basis expansion of e −x (see Sec. III A), which is not available for e −ix .This random walk interpretation further suggests that the entropy production in the imaginary time evolution is diffusive.
The improved area law is not only of fundamental interest but also provides important insights regarding the efficient representation of the quantum Gibbs states.In previous studies [64,90,91], the approximations by matrix product operators/projected entangled pair operators (MPO/PEPO) have been investigated through cluster expansion techniques.Furthermore, Ref.
[91] has explicitly given the PEPO/MPO construction scheme with the bond dimensions of D = (n/ ) O(β) ( : approximation error). ( If we use the cluster expansion technique, this is expected to be the best estimation.However, the polynomial-size bond dimension of n O(β) may still be a significant overestimation.Improvements are strongly motivated by the practical use of tensor network techniques in approximating thermal states [33,41,48], which appears to be much more successful than that guaranteed by the current analytical bounds.
Our second main result focuses on classical algorithms for approximating thermal states in one dimension (1D).By applying our new analyses, we establish a sub-linear dependence of the bond dimension of the MPO approximation to the thermal state as D = e Õ(β 2/3 )+ Õ √ β log(n/ ) (4) with the approximation error, where we write O(n log(n)) as Õ(n) by using the notation Õ.The estimated bond dimension is smaller than any power of (n/ ) and is well suited for numerical simulations.
Finally, we consider the computational complexity of the construction of the MPO, which approximates 1D quantum Gibbs states.Establishing provably efficient quasi-linear algorithms for physical systems is relevant to the field of Hamiltonian complexity [92,93].The general difficulty lies in that the existence of an efficient MPO description (4) does not necessarily imply an efficient algorithm to find such a description [94,95].So far, the state-of-the-art algorithm [91] is based on cluster expansion, and MPO construction requires a computation cost which is proportional to n × (n/ ) O(β) , where the estimated exponent of (n/ ) is usually impractically large.However, most classical heuristic algorithms employed practically usually require only (quasi-)linear computational time with respect to the system size [29,[33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50].We, for the first time, give a quasilinear time algorithm that constructs the approximate MPO, with a runtime of n × e which is quasi-linear in (n/ ) for arbitrary β = o(log(n)).
The rest of this paper is organized as follows.In Sec.II, we formulate the precise setting and notations used throughout the paper.In Sec.III, we state the main theorems on the area law and the MPO approximation.In addition, in Sec.III A, we show the relationship between imaginary time evolution and the random walk.In Sec.IV, we give the quasi-linear algorithm to compute the MPO approximation of the 1D quantum Gibbs states.We also provide a brief explanation regarding why the algorithm works well.In Sec.V, we discuss several physical implications from our analytical techniques.Finally, in Sec.V, we summarize the paper, along with a brief discussion.The proofs of the main statements are given in the Appendices.To concentrate on the physics, we have provided the intricate aspects of the proofs in Supplementary materials [96].

II. SETUP AND NOTATION
We consider a quantum system with n qudits, each of which has a d-dimensional Hilbert space.We denote the Hilbert space dimension of a subset S ⊆ Λ, where Λ is a lattice, by D S .For the present discussion, let us restrict ourselves to the case of 1D lattice; we consider higher dimensional lattices later (see Sec. II A).We define the Hamiltonian H as follows: where h i,i+1 contains interactions between i and i + 1, and where {O ≤i,m } and {O >i,m } are operators acting on subsets {j} j≤i and {j} j≥i+1 , respectively.Note that the Schmidt rank SR(h i,i+1 , i) is always smaller than the local Hilbert dimension d (i.e., SR(h i,i+1 , i) ≤ d).
Throughout the paper, we have focused on the Gibbs state ρ β with an inverse temperature β: .
For an arbitrary decomposition of the total system Λ into Λ = L ∪ R, we focus on the mutual information I(L : R) ρ β : where S( is the reduced density matrix in subsets L (R).We define the subsets L and R as L = {1, 2, . . ., i 0 } and R = {i 0 +1, i 0 +2, . . ., n}, respectively.Then, the boundary Hamiltonian H ∂L is given by h i0,i0+1 , which gives the previously known thermal area law (1) of For a more detailed characterization of the structure of the quantum Gibbs state, we focus on the matrixproduct-operator (MPO) representation.We aim to approximate the Gibbs state ρ β by the following operator: (9) where each of the matrices {A [si,s i ] i } i,si,s i is described by the D × D matrix.We refer to the matrix size D as the bond dimension.By choosing D to be sufficiently large as D = e O(n) , we can describe arbitrary operators in the form of MPO; however, only a relatively small bond dimension is often required in practical applications (e.g., D = o(n)).Because the Schmidt rank of M D is smaller than that of D, the mutual information cannot exceed log(D) for an arbitrary decomposition Λ = L R. The primary problem is to estimate how large the bond dimension needs to be to achieve a certain precision error.
In quantitatively estimating the approximation error, we utilize the Schatten p norm, which is defined for arbitrary operator O as follows: Note that O 1 corresponds to the trace norm and O ∞ corresponds to the standard operator norm, which we denote by O for simplicity.The state-ofthe-art results [91] ensure the existence of M D such that ρ β − M D 1 ≤ with the bond dimension as in Eq. ( 3).The bond dimension D is roughly related to the mutual information I(L : R) ρ β as I(L : R) ρ β log(D), and thus, the estimation (3) implies the area-law bound of (8).

A. High-dimensional setup
In extending to the high-dimensional systems, we consider a quantum system on a D-dimensional rectangular lattice with D the spatial dimension (we note that our analysis can also be applied to other lattices).For simplicity of notations, we consider nearest-neighbor interactions as follows: where i, j denotes the pairs of adjacent qudits, and • • • is the operator norm.By taking the energy unit appropriately, we set g = 1.
For convenience, we consider a vertical cut of the total system (see Fig. 1 for example); however, the same argument can be applied to a rectangular cut.For any partition Λ = L R, we define an upper bound with the size of the surface region as |∂Λ|, which is as large as O(n 1−1/D ).Note that |∂Λ| = 1 in the 1D lattice.

III. IMPROVED THERMAL AREA LAW
We first show our main result in the thermal area law.The following theorem holds for arbitrary lattice dimensions: Theorem 1.For an arbitrary cut Λ = L ∪ R, the mutual information I β (L : R) is upper-bounded by where C is a constant of O(1).In particular, for onedimensional systems (|∂Λ| = 1), we have We show the proof in Appendix C 1.The above result has a logarithmic correction of log(|∂Λ|) to the area law in high dimensions.Even then, for β log 2 (|∂Λ|), our result provides a qualitatively better upper bound than the previous one (1).We expect that this correction should be removed using refined analyses on the Schmidt rank for polynomials of the Hamiltonian.
If we consider 1D quantum Gibbs states, we can derive a much stronger statement in the form of a bound to the entanglement of purification (see Theorem 6 in Sec.V A).Moreover, for the MPO representation of the 1D quantum Gibbs states, we obtain the following theorem Theorem 2. For arbitrary 1D quantum Gibbs state ρ β , there exists a MPO M D as in Eq. (9) such that for the Schatten-p norm with p = 1 and p = 2 with where q * := C 0 max This proof is shown in Appendix C 2. We believe that the above MPO could also be used to construct a quantum circuit with depth polynomial in the stated bond dimensions.Now, we discuss the key principles that allow us to improve the original thermal area law (See Appendix C for the details).Our analysis utilizes various recent techniques employed in the proofs of the area law for ground states [23,26,27].Inspired by these studies, we construct an approximation of the quantum Gibbs state using an appropriate polynomial of low degree [89] and then perform a Schmidt rank analysis adapted from [23].As mentioned in the introduction, the main insight is that the polynomial used by us satisfies the random walk property, which we explain below.

A. Physical intuition from the random walk behavior
Before the main discussion, let us consider an illustrative example of the random walk behavior in imaginarytime evolution.We here consider a one-particle tightbinding model as (16) where |x is the state of the particle on site x.Then, the real-time Schrödinger equation gives the ballistic propagation of the particle.We consider a time-evolved quantum state |0(t) = e −iHt |0 , where the initial state |0 is the localized state on x = 0.In Fig. 2 (a), we show the fluctuation of the position, which is given by the square root of the variance Var(X) := 0(t)|X 2 |0(t) − ( 0(t)|X|0(t) ) 2 , where X =  16) with R = 500.In (a) and (b), we plot the fluctuation of the position Var(X) after the real and imaginary time evolutions, respectively.Here, the state at the initial time is given by |0 .The fitting functions for (a) and (b) are given by √ 2t and 0.998769β 1/2 , respectively.This clearly indicates that the real-time evolution induces a ballistic propagation, whereas the imaginary-time evolution induces a diffusive propagation.
in quantum many-body systems.In the following sections, we mathematically justify this intuition. .Below, we will show that the exponential function e −b(1+y)/2 (b ∈ N) can be expanded in terms of the Chebyshev polynomials as (see also Fig. 3): with r 0 = 0, where T r (x) is the Chebyshev polynomial and p(r|r ) is a random walk probability from r b−1 to r b which is defined below.
For application to e −βH , we choose b = β H .Because the Schmidt rank and polynomial degree are closely related [23], we get a diffusive interpretation of the Schmidt rank of e −βH .We thus infer a sublinear βdependence of the mutual information, namely γ c < 1.There are two main issues while achieving this value.First, the above polynomial gives an approximation to e −βH only in the operator norm, whereas we are searching for an approximation in a family of norms.Second, even for a constant error approximation in the operator norm, degree √ b = β H scales with the system size.We solve both the problems using the quantum belief propagation in 1D and a refined version of Suzuki-Trotter decomposition in higher dimensions, which allows us to reduce the problem to a local Hamiltonian H S , where S is a much smaller region.The loss incurred because of the belief propagation and the conversion from the operator norm to other norms leads to our main result of γ c ≤ 2/3.17).The probability P (r b ) is generated from the b-step random walk.In each step, the probability from r to r is given by p(r |r), which is a symmetric function around r.In the picture, we give the numerical plot of p(r|2), where the shape of p(r|r ) does not depend on r .

Derivation of Eq. (17)
As a first step, we expand which is an expectation of (−y) j according to the distribution q(j) := e −1/2 2 j j! .Next, we introduce Chebyshev polynomials T r (y) (for an integer r) and utilize the observation from [89] that for j > 0 and integer k, where we set j s+1/2 = 0 (s ∈ N) and j s = 0 for s < 0 and s > j.Here, B j (r |r) is the binomial distribution which is centered at r with a variance of √ j (see also the footnote [97]).Now, we have all tools to set-up the random walk over integers.By combining Eqs.(18) and (19), we start with the first random walk step of where the symmetric distribution p(r 1 |0) (with mean 0 and variance O(1)) is defined using p(r 1 |0) := ∞ j=0 q(j)B j (r 1 |0).The subsequent steps are obtained by writing with p(r 2 |r 1 ) := ∞ j=0 q(j)B j (r 2 |r 1 ).One can show that the function p(r 2 |r 1 ) is symmetric around its mean r 1 and has a variance of 0.5 (see Fig. 3 for the shape of p(r|2)).By repeating the process, we can arrive at the equation (17).Thus, e − 1 2 (1+y) b is an expectation over T r (y), according to a distribution obtained by performing b steps of a symmetric random walk with constant variance.It is now clear that the degree is strongly concentrated around O(b 1/2 ).This random walk behavior is not available for e ix because the distribution p(r 2 |r 1 ) is not given by a real number.It leads to O(b) approximate degree for real-time evolution.

A. Main statement
Here, we show that the classical algorithm generating an MPO approximation of the Gibbs state ρ β is possible with a run time of O(n 1+o (1) ) as long as β = o(log(n)) We prove the following theorem: Theorem 3.For arbitrary β, we can efficiently compute a matrix product operator M β which approximates e −βH in the sense that where the bond dimension of M β is given by exp(Q ).Also, the computational time to calculate M β is nβ exp(Q ) with where C is an O(1) constant.When β log(n/ ) and = 1/poly(n), the time complexity is given by We compare the bond dimensions of M (β/β0) β0 with that of the theoretical bound in (15).For β log(n), the both estimations are in the form of e Õ √ β log(n) , whereas for β log(n), the estimation (15) gives a slightly better bound.

B. MPO for ground space
We also discuss the consequences regarding the calculation of quantum ground states.Let us assume the following condition for the density of states in an energy shell (E − 1, E] for the low-energy regime [98][99][100]: with c a constant of O(1), where N E,1 is the number of eigenstates within the energy shell of (E −  (23), we obtain the time complexity of an almost polynomial form, as n O(log log(n)) .This result rigorously justifies the empirical success of the imaginary TEBD methods in the computation of the ground states [43,45,47,48].Our estimation, however, is still slightly worse than the polynomial form (i.e., n O (1) ).In the case of the gapped ground states, the existing algorithms [94, 95] have already achieved polynomial computational costs without the assumption (24).Any small improvement of (23) will allow us to obtain a quasi-linear time algorithm for the computation of the ground states under the assumption of (24).

C. Details of the algorithm and proof of Theorem 3
The algorithm proceeds as follows.Suppose we are at a high temperature β 0 ≤ 1/16.First, the 1D Hamiltonian is split into blocks of length l 0 = O(log(n/ )), as We then write e −βH as follows: e β0H1:j−1 e −β0H1:j =: where H 1:j = s≤j H s and H 1:0 = 0. Here, the operator Φ j is the non-local operator on the qudits {1, 2, . . ., j 0 }.We first approximate Φ j by the following operator on the local region: The second approximation is the low-degree polynomial expression of Φj : where Using the above notations, we can approximate the high-temperature Gibbs state by We have illustrated this construction in Fig. 4. We notice that our construction resembles the decomposition of the real-time evolution developed in [101]  where M β has the bond dimension of e Õ √ log(n/ ) .The sufficient computational time for the construction is given by We notice that the inequality (30) immediately reduces to for an arbitrary positive p The computational time (31) is qualitatively explained as follows.The operator M β0 is a product of degreem polynomials T m (x).From Ref. [23], the Schmidt rank of each of {Φ } n j=1 are locally defined, for every cut, constant number of operators in {Φ (m) j } n j=1 contribute to the Schmidt rank.Therefore, the computational time to construct M β0 is at most ne To extend this to arbitrary β, we utilize the following upper bound (see Lemma 10 in Appendix A), which slightly extends the analyses in Ref. [91]: for arbitrary positive integers q and p, where M β0 satisfies the inequality (32) with = 0 for arbitrary p ∈ N.
We get e −βH = e −β0H (β/β0) and then multiply the above MPO construction β/β 0 times, where β 0 is appropriately chosen so that β/β 0 becomes an even integer (i.e., q = β/(2β 0 )).To make 3 0 qe 3 0q ≤ (≤ 1), we need to choose 0 = /(6q) = β 0 /(3β).By extending the Schmidt rank estimation in Ref. [23], we can ensure that the Schmidt rank of M (β/β0) β0 is at most as large as exp(Q ).In more detail, we can prove the following lemma (see Sec. S.IV.F of Supplementary materials [96] for the proof): Lemma 5. Let M β be an approximate operator that has been defined in Eq. (29).Then, for arbitrary q ∈ N, the Schmidt rank of the power of M β is upper-bounded by for an arbitrary cut, where C is an O(1) constant.
Because m has been chosen as m = O(log(n/ )), this upper bound is proportional to Q for q = O(β).Therefore, the quantum Gibbs state e −βH is well approximated by the MPO, with its bond dimensions of exp(Q ).
We have already prepared the MPO form of M β0 in Proposition 4. Using the standard results regarding the canonical form of MPOs [102,103], we can efficiently calculate M q β0 (q β) from M q−1 β0 in a computational time of at most poly(exp(Q )).We notice that in each of the steps, we can compress the MPO without any truncation error so that the bond dimension of M q β0 is smaller than the bound in (34).By recursively constructing M q β0 , the computation of M (β/β0) β0 requires the time steps as many as (23).We thus prove Theorem 3.
Finally, let us compare our method with the imaginary-time-evolving block decimation (TEBD) methods [33,41,48], which proceed by truncation of the Schmidt rank at each imaginary-time Trotter step.A major limitation of these studies is the lack of rigorous justification of the Schmidt rank truncation, as explained below.
In the TEBD algorithms [48], we start with a matrix product operator M 1 which gives the approximation of e −β1H for a certain β 1 .We then connect two MPOs as M † 1 M 1 , which is expected to approximate e −2β1H .To ensure the precision of approximation, we use Lemma 9 for the MPO M 1 , which necessitates the approximation in terms of general Schatten p norm.Now, the main technical difficulty comes from the Schmidt rank truncation of M † 1 M 1 , which gives MPO M 2 in the next step.After the Schmidt rank truncation, we connect M † 2 M 2 to approximate the Gibbs state e −4β1H .However, to ensure the good approximation from Lemma 9, we have to truncate the Schmidt rank of 1 M 1 in terms of the general Schatten p norm.The Schmidt rank truncation based on the singular value decomposition only ensures the approximation in terms of Schatten 2-norm, as in Ref. [104,Lemma 1].So far, we have no mathematical tools to perform Schmidt rank truncation, which guarantees approximation in terms of the general Schatten p-norm.In summary, even though the quantum Gibbs state can be approximated by an MPO with a small bond dimension, it is highly nontrivial to show whether the truncation of the Schmidt rank retains the good approximation.
We can circumvent this problem by constructing e −β0H as a product of local polynomial approximations, which covers the whole chain.Thus, the operator M β0 (or ) has a finitely bounded Schmidt rank, and we do not need to approximate it further using the Schmidt rank truncation.

A. Rényi entanglement of purification
To characterize the bipartite correlations beyond mutual information, we also consider the Rényi entanglement of purification E p,α [105], defined as follows: Definition 1.Let Λ be a copy of the total system with the Hilbert space H .For an arbitrary quantum state σ, we define E p,α (σ) for the partition Λ = L ∪ R as where |φ is the purification of σ (i.e., tr A bound on the Rényi entanglement of purification imposes a stronger restriction to the structure of the quantum state than the mutual information in Eq. (7).For instance, [106] showed that an upper bound on the entanglement of purification of a 1D system guarantees an efficient approximation by MPOs.The mutual information I(L : R) ρ β is related to the Rényi entanglement of purification with α = 1 (see Ref. [107]): We also present an upper bound on this quantity as follows (see Appendix C 1 for the proof): Theorem 6.For arbitrary non-zero 0 < α ≤ 1, the Rényi entanglement of purification E p,α is upperbounded as follows: The above upper bound implies that for α < 1, the entanglement scaling may be linear to β instead of . This can be explained as follows.To calculate the Rényi entanglement of purification, we need to obtain the MPO which has an approximation error such that D α 1, where D is the bond dimension to achieve error .From the MPO with this property, we have E p,α (ρ) log(D ).Because of Let us compare our result to those of previous studies [64,90,91].The bond dimension scales as D = (1/ ) O(β) .By using this estimation, there exists a critical α c (= 1 − O(β −1 )) that violates the finite upper bound of E p,α (ρ) for α < α c .

B. Convex combination of matrix product states
Ref. [108] showed that the thermal state can be expressed as a convex combination of MPS with bond dimension scaling doubly exponentially with β.We can prove the following corollary, which substantially improves their main result (see Appendix E for the proof): Corollary 7. The quantum Gibbs state ρ β is given by a convex combination of the matrix product states in the following sense: where {|M i } are matrix product states with the bond dimension of where q * has been defined in Theorem 2.
This result can be used to justify the METTS algorithm [33] and the algorithm of [109] (see also Ref. [108] for more detailed motivations to study the convex combinations of MPS).Using our bounds, we provide further analytical evidence regarding why these work in practice.
A related quantity in the study of mixed-state entanglement is the entanglement of formation.It captures the 'average bond dimension' in the convex combination shown in Equation (38) and can be defined as follows.
Definition 2. Let Λ be a copy of the total system with the Hilbert space H .For arbitrary quantum state σ, we define E f,α (σ) for the partition Λ = L ∪ R as where again S α (•) is the Rényi entropy, and the minimization is over all pure-state decompositions σ = i p i σ (i) .Entanglement of formation is upper bounded by the entanglement of purification [105] (see also the footnote [110]): where the equality holds for pure states.When σ is given by the quantum Gibbs state ρ β , an upper bound follows from Theorem 6,

C. Real time evolution
Our analyses can be partially applied to real-time evolution.In this case, we approximate the unitary time evolution e −iHt instead of the quantum Gibbs state e −βH .The most essential difference is that the random-walk-like behavior [i.e., Eq. ( 17)] cannot be justified.Mathematically, Lemma 14 below is only applicable to imaginary time evolution.Hence, the MPO approximation of e −iHt requires the bond dimensions of e O(t) instead of e O(t 2/3 ) .This is expected and consistent with the numerical calculations and the theoretical upper bound [85][86][87][88].
Still, our results on the quasi-linear time algorithm can be also applied to real-time evolution, where we utilize only the Taylor expansion (28).By applying Theorem 3 to the case of β = it and p = ∞, we can obtain the following corollary: Corollary 8.For arbitrary t, we can efficiently compute a matrix product operator M t that approximates e −iHt in the sense that where the bond dimension of M t is given by exp Õ(|t|) + Õ |t| log(n) .The computational time to calculate M t is given by For |t| log(n), our result gives a quasi-linear computational cost for n; thus, it is better than the previous computational cost e O(t)+O(log(n/ )) , which is derived from the Lieb-Robinson bound [111,112].However, for |t| log(n), the computational cost (43) grows exponentially with t, and has the same limitation as the previous methods.

D. Entanglement rate by imaginary time evolution
The quantum Gibbs state is regarded as an imaginary time evolution of the uniformly mixed state, namely ρ β ∝ e −β/2 ρ β=0 e −β/2 .Thus, the entropy-production rate of the imaginary time evolution is sublinear with respect to β. Can we extend it to general quantum states instead of the uniformly mixed state?Clearly, when we consider the arbitrary quantum state |ψ , the answer is no; that is, the entanglement generation by e −βH for a given cut (e.g., Λ = L R) is usually unbounded.Even if there are no interactions between L and R or e −βH = e −βH L ⊗ e −βH R , the entanglement rate can be non-zero.
Here, we consider the imaginary time evolution for a product state |P L,R as This setup is feasible for experimental realizations [6,80].When we consider the real-time evolution (i.e., β = it), the SIE theorem [87] gives the upper bound of O(t).In contrast, no theoretical studies have given an upper bound of the entanglement generation by the imaginary time evolution.It is an intriguing open problem whether or not the entanglement rate is finitely bounded for large β.
Using our current analyses, we can partially answer this question.To approximate |P L,R (β) , we use an operator . We aim to estimate the approximation error of |P L,R (β) depending on the Schmidt rank D. Let us set the ground-state energy of H equal to zero.Then, from the inequality (C1) in Proposition 12 with p = ∞, there exists O D such that , the entanglement entropy of P L,R satisfies the same inequality as ( 12) and scales as β 2/3 .However, in general, the quantity e −βH |P L,R is exponentially small for n, and hence, the value of should be as small as e O(n) , which gives the entanglement scaling as √ nβ.This is still non-trivial but is rather worse than the expected scaling of β 2/3 .
To improve the bound, a refined approximation error is required, which is given by the following form of instead of the approximation e −βH − O D p ≤ e −βH p for Schatten-p norm.The approximation of the form of ( 46) can be derived for sufficiently high temperatures (see Proposition 4).If we can extend the Proposition 12 in Appendix C to the form (46), we will be able to prove that the entanglement rate by the imaginary time evolution ( 44) is upper-bounded by Õ β 2/3 .

VI. CONCLUSION
We have shown two main results in this work.The first one is the improved thermal area law that gives a scaling of Õ β 2/3 over all lattices (Theorem 1).This scaling behavior is qualitatively explained by the fact that the imaginary time evolution is intrinsically related to the random walk as in Eq. ( 17).In the 1D case, we also give an MPO representation of the quantum Gibbs state with a sublinear bond dimension with respect to the system size n (Theorem 2).The second one is a quasi-linear time algorithm for preparing an MPO approximation to the 1D thermal state (Theorem 3), which improves upon all the prior rigorous constructions in this context.It also justifies the quasilinear runtime of several heuristic algorithms inspired by the MPO-based techniques.Moreover, our algorithm can be applied to the computation of the ground state under the low-energy-density assumption of (24).Our first technical insight is the use of polynomial approximations of the exponential function, which are based on Taylor truncation and Chebyshev expansion (17).The second technical contribution is a Trotter-Suzuki type decomposition of the Gibbs state (see Fig. 4).It would be interesting to see the possibility to further develop our approximation by using the results in Ref. [113].
We leave the following questions to be considered in future work.
• High-dimensional PEPO representation with sublinear bond dimension: Our analytical approach has improved the bond dimension of the MPO for 1D quantum Gibbs states.
Here, the point is to utilize the estimation in Ref. [23] to efficiently encode the polynomial of the Hamiltonian to the MPO representation.We expect that the same improvement should be possible in the PEPO approximation for the highdimensional Gibbs state.The key question is how to encode the polynomial of the Hamiltonian to a PEPO representation with a non-trivial bond dimension.Such a representation will also be useful in the context of area laws for ground states in higher dimensions.
• Improving the runtime of the algorithm: Our algorithm presented in Theorem 3 has a runtime of ne We expect that this could be improved to ne Õ(β 2/3 )+ Õ √ β log(n) because this matches the bond dimension of the MPO constructed in Theorem 2. Another challenge is to improve the runtime to the subexponential form with respect to log(n) for β = O(log(n)).This improvement would lead to quasilinear time algorithms for ground states under the assumption (24).The main difficulty lies in constructing a better polynomial approximation to the quantum Gibbs state than M • Circuit complexity of preparing 1D quantum Gibbs state: As discussed after Theorem 2, we believe that our MPO approximation could be used to construct a quantum circuit for preparing the quantum Gibbs state.So far, the best estimation requires n O(β) to prepare the 1D quantum Gibbs states on the quantum computer [52].The quantum preparation of the quantum Gibbs state is expected to be easier than the MPO construction on the classical computer.Hence, we conjecture that the sufficient number of the elementary quantum gates should be also quasi-linear as in (23).
For instance, the adiabatic algorithm presented in [55] could be used in this context, by estab-lishing the injectivity of the MPO in (14).As another route, we may be able to employ the techniques in [17, Appendix B], which implemented the smooth-function of a Hamiltonian (see also [59,Sec. 5.3] for further discussions).By using this method, which relies on polynomial approximations to e −βH , the polynomial presented in Theorem 3 could be efficiently implemented on a quantum computer.
• Improving the thermal area law to β 1/2 |∂L|: In this work, we identified the critical γ c satisfying (2) as 1/5 ≤ γ c ≤ 2/3.From the random walk behavior in Sec.III A, we expect that γ c may be equal to 1/2 or even smaller, which would suggest the diffusive propagation of information by the imaginary-time evolution.For the characterization of entanglement structures of quantum many-body systems at finite temperatures, identification of the optimal γ is one of the most fundamental future problems.

Appendix A: Basic analytical tools
In the next sections, we use the following lemmas.We have shown their details in Sec.S.I. of Supplementary materials [96].
The first lemma connects the closeness between two operators to that between square of the two operators: Lemma 9. Let O and Õ be operators that are close to each other in the following sense: Then, the square of the operator O, which is O † O, is close to Õ † Õ as follows: The proof is straightforward by extending the result in Ref. [91], where the positivity of O has been assumed.

Lemma 10.
Let O and Õ be operators that satisfy the inequality Then, the pth power of the operator O † O is close to ( Õ † Õ) p , as follows: This is a simple generalization of Proposition 1 in Ref.
Let us consider a normalized state |ψ and give its Schmidt decomposition as where the Eckart-Young theorem gives the following inequality: where | ψ can be unnormalized.

Appendix B: Lower bound on the critical γc γc γc
We here show that the exponent γ in (2) is at least larger than 1/5.According to Ref. [84], there exists a frustration-free local Hamiltonian system with n qudits (d = 3) such that the half-chain entanglement entropy is linear in system size n, and the spectral gap ∆ is given as where c ∆ is a constant of Ω(1).For this Hamiltonian, let us consider a quantum Gibbs state at the inversetemperature of β = 2c −1 ∆ log(d)n 5 log n.Then, the total weight of the excited state is at most as large as d n e −β∆ = e −n log (d) .Therefore, this Gibbs state is exponentially close to the ground state.Using the Fannes inequality [115], the half-chain mutual information in the Gibbs state is which implies that I β (L|R) should be at least larger than Ω(β 1/5 ).

Appendix C: Proofs of the main theorems
We here show the proofs of Theorem 1 (Theorem 6) and Theorem 2. For simplicity, we focus on onedimensional systems; however, the essence of the proof is the same in high-dimensional cases (see Sec. S.III. in Supplementary materials [96]).
Both theorems are based on the following basic propositions: We aim to approximate the Gibbs state ρ β by another operator ρβ which has a smaller Schmidt rank for a given cut Λ = L ∪ R.This is contained in the following theorem, which plays a central role in deriving our main results: Proposition 12. Let be an arbitrary error such that ≤ e.Then, there exists an operator ρβ which approximates ρ β as follows: for arbitrary p ∈ N, and The proof is shown in Appendix D. For sufficiently small , this estimation gives a sublinear dependence of the Schmidt rank with respect to (1/ ).For example, for = 1/poly(n), we have SR(ρ β , i 0 ) ≤ n log −1/2 (n) , which is slower than any power of n.If the Schmidt rank of the approximating operator ρβ exceeds e Õ(β 2/3 ) , the error decays super-polynomially as a function of the Schmidt rank.

Proof of Theorems 1 and 6
We here show the proofs of Theorems 1 and 6, which give the upper bounds of the mutual information and the Rényi entanglement of purification, respectively.Because of the inequality (36), Theorem 6 includes the Theorem 1 considering E p,1 (ρ).Hence, we only need to prove Theorem 6.
We start from the purification in the form of where {|j } D Λ j=1 is an arbitrary orthonormal basis, and we denote the partition function tr(e −βH ) by Z.Note that from the above definition tr Λ (|ψ ψ|) = e −βH /Z.Then, from the definition of the Rényi entanglement of purification (35), we have where we use the triangle inequality in the first inequality.We then obtain the fidelity between |ψ and | ψ as follows: In the following, using the above upper bound, we estimate the upper bound of Rényi entanglement entropy for arbitrary α > 0. We consider the cases of α = 1 and α < 1 separately.

a. Case of α = 1
We first consider the case of α = 1.We define | ψs as an approximation of |ψ which satisfies where we use Eq.(C7) for the representation of | ψs .
From Theorem 1, the Schmidt rank of | ψs , say D s , is upper-bounded from above by with q s = C max β 2/3 , [β log(s)] 1/2 .We define s as an integer such that where s is in the order of exp[O(β 1/3 )].
Let us denote the Schmidt decomposition of |ψ in Eq. (C4) as follows: where |ψ L,L ,m and |ψ R,R ,m are defined on the Hilbert space of L L and R R , respectively.From the above representation, we obtain the Rényi entropy with α = 1 as which is equal to the standard entanglement entropy.
To estimate S 1 (|ψ ), we utilize the Eckart-Young theorem.By applying the inequality (A7) to |ψ and | ψs , we obtain the following inequality: where, in the second inequality, we use the condition (C13).To upper-bound the Rényi entropy, we first define where we define D 0 = 0. We then obtain where we use the fact that the uniform distribution maximizes where we apply the inequality −x log x ≤ −x log(x/3) ≤ −y log(y/3) for 0 < x ≤ y ≤ 1 to −Γ 2 s log(Γ 2 s ).Using s = exp[O(β 1/3 )], the second and the third terms become less dominant in comparison with the first term when β is large.We thus obtain the main inequality (37) in the theorem for α = 1.

b. Case of α < 1
We follow the same analyses as in the case of α = 1.In this case, we define | ψ s as an approximation of |ψ which satisfies where the Schmidt rank of | ψ s , say D s , is upperbounded from above by D s ≤ e q s log(q s ) (C21) with q s = C max β 2/3 , [α −1 β log(s)] 1/2 .We define s as an integer such that for s > s , Using the Schmidt decomposition as in Eq. (C15), the α-Rényi entropy is given by For α < 1, we obtain the upper bound of where we adopt the similar notation (C18) for Γ s , and to derive Γ 2 s ≤ s −2/α , we use the condition (C20) and the Eckart-Young theorem as in (C17).Therefore, we have the following upper bound for the summation s 2  , where we define cα := C β/α.For the estimation of the summation for s≥s , we also use the inequality of where C1 is a constant of O(1).By combining the above inequalities together, we obtain This gives the main inequality (37) in the theorem for α < 1.This completes the proof.

Proof of Theorem 2
Here, we prove Theorem 2, which gives the MPO approximation of quantum Gibbs state.
We first prove the case of p = 2. Let ρβ be an approximation of ρ β such that for a given cut Λ = L R. We define the Schmidt rank of ρβ as D 0 , which satisfies the inequality (C2) with the approximation error 0 , namely where q * has been defined in Eq. (C3).For the cut, we define the Schmidt decomposition of ρ β as follows: where {Φ L,m } ({Φ R,m }) are orthonormal operator bases which satisfy for m = m .Note that from the above definition, we have By applying the Eckart-Young theorem (C17) to ρ β and ρβ , we obtain where we use the inequality (C23) with p = 2.Then, from Lemma 1 in Ref. [104], there exists an MPO M D such that Therefore, by choosing 0 = [ /(2n)] 1/2 , we obtain the desired approximation error (14), and the bond dimension D 0 satisfies the inequality (15).Second, we prove the case of p = 1.For this, we consider the purification of the quantum Gibbs state ρ β/2 as in Eq. (C4), which is denoted by |ψ : Where, in the second equation, we use an expression of the Schmidt decomposition similar to that of Eq. (C15).
If we can obtain a matrix product state (MPS) |M D such that where we define given by a MPO with the bond dimension of D2 .
Our task is now to find an MPS |M D which satisfies (C31).For this purpose, we consider the purification of ρ † β/4 ρβ/4 as in Eq. (C7), which we denote by | ψ .Here, ρβ/4 gives the approximation of ρ β/4 as ] for a given cut Λ = L R. The Schmidt rank of | ψ along the cut is upper-bounded by D 2 1 .In contrast, from the inequality (C12), we obtain and hence, the Eckart-Young theorem gives the same inequality as (C28): Thus, from Lemma 1 in Ref. [104], there exists an MPS To obtain the approximation error , we need to choose 1 = /(20n).Therefore, if we choose D = D 2 /(400n 2 ) , there exists an MPO M D that satisfies the inequality (14) with p = 1.Note that the bond dimension D 2 /(400n 2 ) satisfies the inequality (15) by choosing C 0 appropriately.This completes the proof of Corollary 3.

Appendix D: Proof of Proposition 12
In this section, we show the proof outline of Proposition 12 which is shown in Appendix C. We prove it based on several essential Lemmas 13, 14, 15 and 16, and defer the details to Sec.S.II. of Supplementary materials [96].Throughout the proof, while considering the Schmidt rank for the target decomposition Λ = L ∪ R, we denote SR(O, i 0 ) by SR(O) for simplicity.

Proof strategy
We here relabel each site such that L = {i} i≤ /2 and R = {i} i≥ /2+1 , where the length is a multiple of 4 to be chosen later [116].We then decompose the total system into three pieces L 0 , S and R 0 (see Fig. 5), where L 0 = {i} i≤0 , S = {i} 1≤i≤ and R 0 = {i} i≥ +1 .Accordingly, we also decompose the Hamiltonian as follows: where H L0 = i<0 h i,i+1 and H R0 := i≥ +1 h i,i+1 .Note that H S , H L0 and H R0 commute with each other.By shifting the energy origin appropriately, we set where means that H S is positive semidefinite.Next, we divide β into 2q pieces (q ∈ N) and introducing ρ 0 := e −β0H , β 0 := β/(2q).
The second step is to estimate the upper bound of the Schmidt rank of ρ2q 0 , which is given by Then, the sufficient Schmidt rank to achieve the inequality (C1) is given by a function of q (see Ineq. (D27) below).By choosing q so that the Schmidt rank is minimum, we show that the Schmidt rank is upper bounded by (C2).We thus prove Proposition 12.In the following sections, we show the details of the above arguments.

Approximation to ρ0
First, we define a parameter ν as follows: In addition, we choose q such that q 2 ≥ β.

(D7)
Let H 0 := H S + H L0 + H R0 .We first relate the two operators ρ 0 = e −β0H and e −β0H0 .We can formally write the following: where Φ is usually a highly non-local operator.The first lemma ensures that the Φ is approximated by an operator supported on L 1 ∪ R 1 : 5. The decomposition of the system considered in the proof.
The proof of this lemma is based on the belief that propagation [29,69] and the Lieb-Robinson bound [117,118].The lemma implies that as the length increases, the approximation error decays exponentially with e −O( /β0) .Thus, to achieve the inequality we need to choose as By using the parameter ν in Eq. (D7), we can write Second, we approximate e −β0H0 by an operator with a small Schmidt rank.Thus, we use the fact that H S , H L0 and H R0 commute with each other, and write e −β0H0 = e −β0(H L 0 +H R 0 ) e −β0H S .Then, we approximate e −β0H S by a low degree polynomial of H S .The most straightforward approximation is given by the truncation of the Taylor expansion, which gives a good approximation of e −β0H S by taking a polynomial degree as large as β 0 H S + log(1/δ 0 ) with δ 0 the precision error.Unfortunately, we cannot improve the thermal area law if we utilize the Taylor expansion.
One of the key aspects of our proof is the use of the following Lemma from Ref. [89, Theorem 4.1], which allows us to achieve the improved thermal area law: Lemma 14.Let δ 0 ∈ (0, 1).For any m satisfying When β 0 H S log(1/δ 0 ), the above estimation gives a significantly better polynomial degree than that from the Taylor expansion.
We recall that this polynomial approximation is obtained from the Chebyshev polynomial expansion (17) in Sec.III A, which is characterized by the random walk behavior (see Fig. 3).
Using the polynomial F m (x) defined above, we approximate the operator ρ 0 in Eq. (D9) as Because of (D1), the spectrum of β 0 H S is included in the span of [0, β 0 H S ], and hence, the inequality (D15) gives The next problem is to estimate the approximation error ρ 0 − ρ0 p0 for arbitrary Schatten p 0 -norm.We prove the following lemma: Lemma 15.Let p 0 ∈ N and δ 0 ∈ (0, 1).Under the choice of Φ L1 ⊗ Φ R1 in Lemma 13, in Eq. (D12) and m, F m (x) in Lemma 14, we have where c 3 is an O(1) constant.
Let us substitute p 0 = 2qp in Lemma 15 and choose δ 0 that satisfies This ensures that ρ 0 − ρ0 2qp ≤ δ/2 ρ 0 2qp , and we conclude where we use the inequality (D11).Therefore, the choice of ρ0 as in Eq. (D16) achieves the inequality (D3).Let us simplify the expression for all the parameters appearing so far.We consider where we define D S = d =: e c d and use the expression of in Eq. (D13).From the assumption (D7), we get for some constant c2 .From the choice of in (D12), we have where we use H S ≤ g because of h i,i+1 ≤ g from Eq. ( 5).Hence, we obtain the following simpler form of m:

Schmidt rank analysis
The remaining task is to estimate the Schmidt rank of the operator ρ2q 0 which is given by (D5).Therefore, we consider the following more general problems for the simplicity of the notation.Let us define a decomposition of the total system into L, S and R (see Figure 6).We then aim to estimate the Schmidt rank of an operator of the form Ĝm where G m (x) is an arbitrary degree m polynomial, the operators Φ 1 and Φ 2 are supported on L and R, respectively, and H S is a local Hamiltonian on the subset S (|S| = ).The Schmidt rank estimation for an arbitrary polynomial of H has been given in Ref. [23].However, in the present case, the additional operators Φ 1 and Φ 2 prohibit the direct application of that results in (D25).
In the following lemma, we can obtain the modified version of the Schmidt rank estimation in Ref. [23].
Now, we specify the choice of q by solving for where we use the definition of ν in Eq. (D6).This gives the result of where we choose q appropriately so that the condition (D7) may be satisfied (i.e., β ≤ q 2 ).By applying the notation of q * in Eq. (C3) to (D27), we finally obtain SR(ρ 2q 0 ) ≤ e q * log(q * ) . (D30) This completes the proof of Proposition 12.

Appendix E: Proof of Corollary 7
Here, we prove that the quantum Gibbs state is well approximated by a convex combination of matrix product states as in (38): We then show that the approximation error is achieved by taking the bond dimension as in Eq. ( 39).The proof is based on Theorem 2. We first consider the MPO approximation of e −βH/2 as follows: where the bond dimension of M β/2 is given by Eq. ( 15) [or Eq. ( 39)].By using Lemma 9 with p = 1, we get where we use e −βH 1 = tr(e −βH ).We now define where σ β is the normalized quantum state and satisfies . The MPO M β/2 has the bond dimension of (39), and hence, the quantum state M β/2 |P i is also given by a matrix product state with (39).We obtain the norm difference between ρ β and σ β as tr(e −βH ) tr(e −βH ) tr(e −βH ) where we use (E4) for the first term, and for the second term we use σ β 1 = 1 and We thus prove the inequality (38).This completes the proof.Let Y be a random variable taking values 1 or −1 with the probability 1/2.We then obtain xTr( We here recall the setup.We consider a quantum spin system with n spins, where each of the spin sits on a vertex of the D-dimensional graph (or D-dimensional lattice) with Λ the total spin set, namely |Λ| = n.We assume that a finite dimensional Hilbert space (d-dimension) is assigned to each of the spins.For a partial set X ⊆ Λ, we denote the cardinality, that is, the number of vertices contained in X, by |X| (e.g.X = {i 1 , i 2 , . . ., i |X| }).We also denote the complementary subset of X by X c := Λ \ X.We denote the Hilbert space of a subset X ⊆ Λ and its dimension by H X and D X , respectively.
For arbitrary subsets X, Y ⊆ Λ, we define d X,Y as the shortest path length on the graph that connects X and Y ; that is, if X ∩ Y = ∅, d X,Y = 0.When X is composed of only one element (i.e., X = {i}), we denote d {i},Y by d i,Y for the simplicity.We also define diam(X) as follows: diam(X) := 1 + max i,j∈X (d i,j ). (S.8) For arbitrary operator O, we use the Schatten p norm: Note that O 1 corresponds to the trace norm and O ∞ corresponds to the standard operator norm.We often denote O ∞ by O for simplicity.

One-dimensional Hamiltonian
Let us now focus on one-dimensional systems, where the Hamiltonian H is given by the general k-local operator: where h X are the interaction terms acting on the subset X.Here, X:X i means the summation which picks up all the subsets X ⊂ Λ such that X i.In the main text, we have considered the Hamiltonian in the form of (S.11) By choosing k = 2 and g = 1, the Hamiltonian (S.10) reduces to the above form.We here define Λ ≤i (Λ >i ) for an arbitrary i ∈ Λ as the subset {j} j≤i ({j} j>i ).For arbitrary operator O, we define the Schmidt rank SR(O, i) as the minimum integer such that where {O ≤i,m } and {O >i,m } are operators acting on subsets Λ ≤i and Λ >i , respectively.We define v i as the interaction between Λ ≤i and Λ >i : We denote the Schmidt rank SR(v i , i) as D loc : where D loc is at most of d O(k) .

High-dimensional Hamiltonian
In considering D-dimensional systems, we also consider the k-local operator: We slice the total system Λ into l Λ pieces: where l Λ is the system length, namely l Λ = O(n 1/D ), and we define |∂Λ| as an integer which gives the upper bounds for |Λ j |.Similar to the one-dimensional case, we define Λ ≤i (Λ >i ) for an arbitrary i ∈ Λ as the subset j≤i Λ j ( j>i Λ j ).We then define the Schmidt rank SR(O, i) in the same way as Eq.(S.12).We also define v i as the interaction between Λ ≤i and Λ >i : (S.17) Here, each of the {v i } l Λ i=1 consists of at most of O(|∂Λ|) local interaction terms h X .We define D loc as the upper bound for the Schmidt ranks of {v i }: (S.18) We are interested in the Gibbs state ρ β with an inverse temperature β: .

B. Generalized Hölder inequality for Schatten norm
For a general Schatten p norm, we can prove the following generalized Hölder inequality (see Prop. 2.5 in Ref. [119]): where s j=1 1/p j = 1/p.From the inequality, we can immediately obtain where we set p 1 = p and p 2 = ∞ in (S.21).

C. The Eckart-Young theorem
We here show the Eckart-Young theorem [114] without the proof: Lemma 17 (The Eckart-Young theorem).Let us consider a normalized state |ψ and give its Schmidt decomposition as where the Eckart-Young theorem gives the following inequality: where | ψ can be unnormalized.
We note that the Eckart-Young theorem can be also applied to operator by regarding it as the vector with D 2 Λ elements.For an operator O, we can obtain the Schmidt decomposition as where we defined . We note that in applying the operator the Eckart-Young theorem is only applied to the Schatten 2-norm.As far as we know, the Eckart-Young theorem cannot be extended to general Schatten p-norm.

D. Approximation of square operators
In the analyses, we often use the following lemma, which connects the closeness between two operators to that between square of the two operators: Lemma 18.Let O and Õ be operators which are close to each other in the following sense: (S.28) Then, the square of the operator O, which is O † O, is close to Õ † Õ as follows: The proof is straightforward by extending the result in Ref. [91], where the positivity of O has been assumed.We show the proof in the following.

Proof of Lemma 18
Following Ref.
[91], we start from where the inequality is derived from the triangle inequality.By using the Hölder inequality (S.21) with p 1 = p 2 = 2p, we obtain where we use the inequality (S.28) and O † 2p = O 2p .In the same way, we obtain where the last inequality is derived from The definition (S.9) implies where we use hermiticity of O † O.By combining all the above inequalities, we arrive at the inequality of where we use δ ≤ 1 in the last inequality.This completes the proof of the inequality (S.29).

E. Approximation of qth power of operators
The statement in Sec.S.I D is extended to arbitrary powers: Let O and Õ be operators which satisfy the inequality Then, the pth power of the operator O † O is close to ( Õ † Õ) p as follows: The proof is a simple generalization of Proposition 1 in Ref.

Proof of Lemma 19
Following Ref.
[91], we start from the equation as follows: We can easily check that the above equation holds for arbitrary q.By using the triangle inequality for the Schatten norm, we have Then, our task is to estimate the upper bound of the norm of (O From the generalized Hölder inequality (S.21), we obtain where the equations (O † O) q−s pq q−s = O † O q−s pq and ( Õ † Õ) s−1 pq s−1 = Õ † Õ s−1 pq are straightforwardly derived from the definition (S.9), and we use the inequality (S.29) for By applying the inequality (S.35) to (S.33), we obtain the main inequality (S.31).This completes the proof of the inequality (S.31).

F. Upper bound of the commutator norm
For the norm of multi-commutators, we can prove the following lemma (see Lemma 3 in Ref. [120]): Then, for an arbitrary operator O X supported in a subset X, the norm of the multi-commutator is bounded from above by where For (S.38)

A. Restatement
We here show the full proof of Proposition 12, which plays key roles in the proofs of the main results (Theorem 1, Theorem 2 and Theorem 6 in the main text).For the convenience of the reader, we restate the proposition as follows: Proposition 12 Let be an arbitrary error such that ≤ e.Then, there exists an operator ρβ which approximates ρ β as follows: 7. The decomposition of the system considered in the proof.
for arbitrary p ∈ N, and with where C 0 is a constant of O(1).
This following proof overlaps with Appendix D in the main text, but here we show all the details of the key lemmas.We consider general k-local Hamiltonian (S.10).

B. Proof strategy
We here relabel each of the sites such that L = {i} i≤ /2 and R = {i} i≥ /2+1 , where the length is an even integer to be chosen later.We then decompose the total system into three pieces L 0 , S and R 0 (see Fig. 7), where L 0 = {i} i≤0 , S = {i} 1≤i≤ and R 0 = {i} i≥ +1 .Accordingly, we also decompose the Hamiltonian as follows: where v 0 and v have been defined by Eq. (S.13).We note that H S , H L0 and H R0 commute with each other.By shifting the energy origin appropriately, we set H S 0, (S. 43) where means that H S is positive semidefinite, We will divide β into 2q pieces (q ∈ N) and introduce The first step of the proof is the approximation of ρ 0 , which is in the following form: where Φ L1 and Φ R1 are operators supported on L 1 and R 1 , respectively (i.e., L 1 = {i} i≤ /4 and L 1 = {i} i≥3 /4+1 ), and the degree m polynomial F m (x) approximates the exponential function e −β0x .For every δ ≤ 1/(3q), we will estimate the length and the degree m such that: with = 3eqδ, where we use ρ 2q 0 = e −βH and 3δqe 3δq ≤ 3eqδ from δ ≤ 1/(3q).Therefore, by choosing ρ = ρ2q 0 /tr(e −βH ), we can achieve the bound (S.39).Note that the condition ≤ e is due to the equations = 3eqδ and δ ≤ 1/(3q).
The second step is to estimate the upper bound of the Schmidt rank of ρ2q 0 , which is given by Then, the sufficient Schmidt rank to achieve the inequality (S.39) is given by a function of q (see Ineq. (S.97) below).By choosing q so that the Schmidt rank is minimum, we will show that the Schmidt rank is upper bounded by (S.40).We thus prove Proposition 12.In the following, we are going to show the details of the above arguments.

C. Approximation of ρ0
In the following, we define a parameter ν as follows: In addition, we choose q such that q 2 ≥ β. (S.49) We first relate the two operators ρ 0 = e −β0H and e −β0H0 .We can formally write where Φ is usually highly non-local operator.The first lemma ensures that the Φ is approximated by an operator supported on L 1 R 1 : Lemma 21.There exists an operator Φ = Φ L1 ⊗ Φ R1 such that for we have for arbitrary p 0 ∈ N, where we assume −c 1 /β 0 + c 2 β 0 ≤ 0, and c 1 and c 2 are O(1) constants.

Proof of Lemma 21
For the proof, we start from the belief propagation [29], which gives where the operator Φ 0 is defined as where and g(t) is defined as Note that the function g(t) decays exponentially with t and hence the operator φ(τ ) is quasi-local due to the Lieb-Robinson bound [117,118].We aim to obtain the approximation Φ 0 ≈ Φ L1 ⊗ Φ R1 =: Φ0 , and consider the norm difference of for arbitrary p 0 ∈ N.
In order to quantitatively evaluate the quasi-locality of φ(τ ), we first define v 0 (t, H τ , L 1 ) as an approximation of v 0 (t, H τ ) in the region L 1 : We define v (t, H τ , R 1 ) in the same way.By utilizing the Lieb-Robinson bound [118], we obtain the approximation error of where c, c and v are constants of O(1) and we obtain the same upper bound for v (t, H τ ) − v (t, H τ , R 1 ) .By using the notations of v 0 (t, H τ , L 1 ) and v (t, H τ , R 1 ), we define φL1 (τ ) and φR1 (τ ) as follows: We notice that φL1 (τ ) and φR1 (τ ) are supported on the subsets L 1 and R 1 , respectively.We then approximate φ(τ ) by φ(τ ) = φL1 (τ ) + φR1 (τ ) with an error of where the inequality is derived from the approximation error in (S.57) and the exponential decay of g(t) as in Eq. (S.55).

[ End of Proof of Lemma 21]
The lemma implies that as the length becomes large, the approximation error decays exponentially with e −O( /β0) .Thus, in order to achieve the inequality (S.66) Second, we approximate e −β0H0 by an operator with small Schmidt rank.For this purpose, we use the fact that H S , H L0 and H R0 commute with each other, and write e −β0H0 = e −β0(H L 0 +H R 0 ) e −β0H S .Then, we approximate e −β0H S by a low degree polynomial of H S .The most straightforward approximation is given by the truncation of the Taylor expansion, which gives a good approximation of e −β0H S by taking the polynomial degree as large as β 0 H S + log(1/δ 0 ) with δ 0 the precision error.Unfortunately, we cannot get any improvement of the thermal area law if we utilize the Taylor expansion.
One of the keys of our proof is to use the following Lemma from Ref. [89, Theorem 4.1], which allows us to achieve the improved thermal area law: ) there exists a polynomial F m (x) with degree m that satisfies When β 0 H S log(1/δ 0 ), the above estimation gives a significantly better polynomial degree than that from the Taylor expansion.
By using the polynomial F m (x) defined above, we approximate the operator ρ 0 in Eq. (S.51) as (S.69) Because of (S.43), the spectrum of β 0 H S is included in the span of [0, β 0 H S ], and hence the inequality (S.68) gives The next problem is to estimate the approximation error ρ 0 − ρ0 p0 for arbitrary Schatten p 0 -norm.We prove the following lemma: where c 3 is an O(1) constant.

Proof of Lemma 23
From the definitions (S.51) and (S.69) of ρ 0 and ρ0 , respectively, we start from the inequality where we used Hölder's inequality (S.22).From the definition (S.60) of Φ L1 ⊗ Φ R1 , we obtain We next consider where { Ẽs } D LR s=1 and {ε s } D S s =1 are eigenvalues of H L0 + H R0 and H S , respectively.Note that the Hamiltonians H L0 , H R0 and H S commute with each other and are diagonalizable simultaneously.From the assumption (S.43), we have ε 1 = 0, and ε D S ≤ H S .
Let us simplify the expression for all the parameters appearing so far.We consider where we define D S = d =: e c d and use the expression of in Eq. (S.66).From the assumption (S.49), we have where we use H S ≤ g because of h i ≤ g from Eq. (S.10).Hence, we obtain the following simpler form of m: 8.The decomposition of the system in the Schmidt rank analysis

D. Schmidt rank analysis
The remaining task is to estimate the Schmidt rank of the operator ρ2q β0 which is given by (S.47).For this purpose, we consider the following more general problem for the simplicity of notation.We also utilize the lemma in the subsequent sections.Let us define a decomposition of the total system into L, S and R (see Figure 8).We then aim to estimate the Schmidt rank of an operator of the form Ĝm where G m (x) is an arbitrary degree m polynomial, the operators Φ 1 and Φ 2 are supported on L and R respectively and H S is a local Hamiltonian on the subset S (|S| = ).The Schmidt rank estimation for an arbitrary polynomial of H has been given in Ref. [23].However, in the present case, the additional operators Φ 1 and Φ 2 prohibit the direct application of that results to (S.86).In the following lemma, we can obtain the modified version of the Schmidt rank estimation in Ref. [23].For the generalization to high-dimensional systems in Sec.S.III we consider the high-dimensional Hamiltonian (S.15).

Lemma 24.
For an arbitrary operator in the form of (S.86), the Schmidt rank across the bi-partition of the system to the left and right at the point i ∈ S is upper bounded by where ∂Λ and D loc are defined in (S. 16) and (S.18), respectively.If we consider one-dimensional Hamiltonian with two-body interactions (k = 2), we have |∂Λ| = 1 and D loc ≤ d.
We can further extend Lemma 24 to the following operator where S j ⊆ S (j = 1, 2, . . ., p) with |S| = .We then obtain the following corollary: Corollary 25.For an arbitrary operator in the form of (S.86), the Schmidt rank across the bi-partition of the system to the left and right at the point i ∈ S is upper-bounded by where ∂Λ and D loc are defined in (S. 16) and (S.18), respectively.
Proof of Corollary 25.The proof is the same as that of Lemma 24.The difference is that the inequality (S.92) is replaced by ,s m,M in (S.92) for the Hamiltonian H S .We then obtain the same inequality as (S.95), and prove the inequality (S.89).This completes the proof.

Proof of Lemma 24
We apply an analysis similar to that in Ref. [26], which modified the proof in [23] for the Schmidt rank estimation.First, we decompose S into l 0 blocks {B s } l0+1 s=0 with |B s | = k (s = 1, 2, . . ., l 0 ) and l 0 = ˜ /k (see Fig. 9).Here, ˜ is a control parameter such that ˜ ≤ .We then decompose the Hamiltonian H S as 9. The decomposition of the subset S into blocks.
where h Bs is comprised of the internal interactions in B s and block-block interactions between B s and B s+1 .Note that the interaction length is at most k, and hence only adjacent blocks can interact with each other.Also, from the inequality(S.18),the Schmidt rank of h Bs is upper-bounded by We expand G m (H S ) = m j=0 a j (H S ) j by using the decomposition (S.91).Using the polynomial interpolation argument in [23], it holds that (see [26,  ,s m,M is derived from Ĝm,M by considering only those terms in which h Bs occurs at most (mM/l 0 ) times.Let us H S = P + h Bs + Q, where P is to the "left" of h Bs and Q is to the "right" of h Bs and expand the powers H S .Any particular power (H S ) T −1 is a linear combination of the following terms: with T i=1 (p i + q i ) ≤ T and T ≤ T .This allows us to expand Ĝm,M as a linear combination of the following terms: Φ 1 (P p1,1 Q q1,1 ) h Bs (P p1,2 Q q1,2 ) h Bs . . .(P p 1,T 1 −1 Q q 1,T 1 −1 ) h Bs (P p 1,T 1 Q q 1,T 1 ) Φ 2 Φ 1 (P p2,1 Q q2,1 ) h Bs (P p2,2 Q q2,2 ) h Bs . . .(P p 2,T 2 −1 Q q 2,T 2 −1 ) h Bs (P p 2,T 2 Q q 2,T 2 ) . . . (S.93) Above, the positive integers T i and the powers p i,k , q i,k ≥ 0 are such that since the total degree is mM .But, recall that we are interested in Ĝ≤ mM l 0

,s m,M
where h Bs occur at most (mM/l 0 ) times, which enforces the following constraint M i=1 The number of the combinations of positive integers {T 1 , T 2 , . . .T M } satisfying this inequality is smaller than ( mM l0 + M )-multicombination from a set of M elements, and hence is upper-bounded by When a tuple {T 1 , T 2 , . . .T M } is given, the number of the non-zero integers in {p i,k , q i,k } i∈[M ],k∈[Ti] which appears in Eq. (S.93) is equal to Therefore, for a fixed {T 1 , T 2 , . . .T M }, the number of the combinations of positive integers {p i,k , q i,k } i∈[M ],k∈ [Ti] satisfying Eq. (S.94) is upper-bounded by (mM )-multicombination from a set of ( For each of the non-zero integers in {p i,k , q i,k } i∈[M ],k∈ [Ti] , the Schmidt rank of the expression in Eq. (S.93), across the cut between B s and B s+1 , is at most D .Therefore, we finally arrive at the inequality of where we use l 0 = ˜ /k in the last equation.By applying the above inequality to (S.92), we obtain where we use mM 2 ≤ (mM ) 2 ˜ .This completes the proof.

[ End of Proof of Lemma 24]
In considering one-dimensional systems in Lemma 24 with ˜ = , we have Now, we specify the choice of q by solving for where we use the definition of ν in Eq. (S.48).This gives the result of where we choose q appropriately so that the condition (S.49) may be satisfied (i.e., β ≤ q 2 ).By applying the notation of q * in Eq. (S.41) to (S.97), we finally obtain SR(ρ 2q 0 ) ≤ e q * log(q * ) .(S.100) This completes the proof of Proposition 12.

S.III. PROOF OF PROPOSITION 12 IN HIGH DIMENSIONAL CASES
We here prove the improved thermal area law for high-dimensional Hamiltonians (S.15).

A. Restatement
For the convenience of the reader, we restate the statement in the form of the following theorem: Theorem 26.Let us consider D-dimensional lattice and a vertical cut of the total system: where we use the notation in Eq. (S. 16).Then, we obtain the improved area law for the mutual information as follows: where C is a constant which depends on k, g, d and D.

Remark.
The above upper bound is qualitatively better than the established thermal area law of I ρ (L : R) β|∂Λ| for β log 2 (|∂Λ|).For the simplicity, we here consider a vertical cut of the total system, but the generalization to rectangular cut is straightforward.We notice that the logarithmic correction originates from the super-exponential dependence of m in Lemma 24.If we can improve the m-independence in Lemma 24 we can prove the improved area law in the form of I ρ (L : R) ≤ C|∂Λ|β 2/3 .

B. High-level overview
We, in the following, restrict ourselves to the inverse temperature such that since the regime of β < log 2 (|∂Λ|) in (S.101) has been already covered by the previous thermal area law [28].The proof strategy is very close to that in one-dimensional case.We here relabel each of the sites such that L = {Λ i } i≤ /2 and R = {Λ i } i≥ /2+1 (see Eq. (S.16) for the definition of Λ i ), where the length is an integer which is multiple of 4 to be chosen later [121].We then decompose the total system into three pieces L 0 , S and R 0 (see Fig. 7), where L 0 = {Λ i } i≤0 , S = {Λ i } 1≤i≤ and R 0 = {Λ i } i≥ +1 .Accordingly, we also decompose the Hamiltonian as follows: where v i is defined in Eq. (S.17).We note that H S , H L0 and H R0 commute with each other.As in the one dimensional case, by shifting the energy origin appropriately, we set H S 0, (S. 105) where means that H S is positive semidefinite, We divide β into 2q pieces (q ∈ N) and introducing ρ β0 := e −β0H , β 0 := β/(2q). (S.106) The first difference from the one dimensional case is that we cannot derive Lemma 21, since we cannot utilize the belief propagation technique [29] in high-dimensional systems.In high-dimensional cases, the operator φ(τ ) in Eq. (S.54) has the norm of O(β 0 |∂Λ|), while in one dimensional case, it has the norm of O(β 0 ).This fact reduces the approximation error in (S.52) to e −O( /β0)+O(β0|∂Λ|) ρ 0 p0 in high-dimensional systems.Hence, we need to choose = O(β 2 0 |∂Λ|) to ensure a good approximation error, but this is too large to be utilized in the derivation of the improved thermal area law.
In order to overcome this difficulty, we choose q = O(β) such that As shown in Lemma 27 below, this condition allows us to construct the operator ρβ0 as in (S.45) (i.e., ρ β0 − ρβ0 2qp ≤ δ ρ β0 2qp ) in the following form: The mutual information is determined by the upper bound of the Schmidt rank of ρ2q β0 , which is given by Φe The Schmidt rank of the above operator is (mq) O(q)+O( |∂Λ|)+O(mq/ ) from Lemma 24.In the one-dimensional case, for q = O(β), this estimation gives the Schmidt rank of e O(β) and spoils the improved thermal area law.However, in high-dimensional systems, the contribution of e O(β) is much smaller than e β 2/3 |∂Λ| as long as β ≤ |∂Λ| 3 .Therefore, it is still possible to derive an improved area law from the approximation by (S.110).This point is the second difference between 1D case and high-dimensional cases.
In the following, we show how the basic lemmas in one-dimensional case are extended to the high-dimensional cases.
C. Approximation of ρ β 0 of (S.108) We relate the two operators ρ β0 = e −β0H and e −β0H0 .We can formally write where Φ = e −β0H e β0H0 is usually highly non-local operator.The lemma below ensures that the Φ is approximated by an operator supported on L 1 R 1 : Lemma 27.Let H and H0 be Hamiltonians as follows: We then define Then, for β ≤ 1/(32gk) and ≥ 2k log(|∂Λ|), the approximated operator for arbitrary p 0 ∈ N.
As has been mentioned in Sec.S.III B, this decomposition has an advantage over the belief propagation method used in Lemma 21.By using the decomposition (S.115), we can achieve the upper bound of |∂Λ|e −O( ) instead of e −O( )+O(|∂Λ|) .

Proof of the inequality (S.124)
We start from the following equation: V − e τ H0 e −τ H0 V e τ H0 e −τ H0 = − From Lemma 20 or inequality (S.38), the norm of the multi-commutator is upper-bounded by where h X is supported on X such that |X| ≤ k.Then, because of the definition of v i in Eq. (S.17), we have where we use By using the inequality From max(τ, x) ≤ β ≤ 1/(32gk), the above inequality reduces to where we use τ ≤ β ≤ 1/(32gk) in the second inequality.This completes the proof.

[ End of Proof of Lemma 27]
The lemma implies that as the length becomes large, the approximation error decays exponentially with e −O( ) .Thus, in order to achieve the inequality we need to choose as ≥ 2k log(4|∂Λ|/δ).(S.141) We approximate e −β0H0 by an operator with small Schmidt rank.For this purpose, we use the fact that H S , H L0 and H R0 commute with each other, and write e −β0H0 = e −β0(H L 0 +H R 0 ) e −β0H S .Then, we approximate e −β0H S by using the polynomial of H S in Lemma 22.By using the polynomial F m (x) defined there, we approximate the operator ρ β0 in Eq. (S.115) as ρβ0 := Φe −β0(H L 0 +H R 0 ) F m (β 0 H S ). (S.142) Because of (S.43), the spectrum of β 0 H S is included in the span of [0, β 0 H S ], and hence the inequality (S.68) gives The next problem is to estimate the approximation error ρ β0 − ρβ0 p0 for arbitrary Schatten p 0 -norm.We prove the following lemma, which is similar to Lemma 23:  where we use tr(e −p0β0H0 ) = e −β0H0 p0 p0 , β 0 ≤ 1/(32gk), and derive the upper bound of v i from the definition (S.17)We aim to give an explicit algorithm to obtain the MPO approximation of e −βH for β ≤ 1/(8gk).We decompose the total system into small blocks {B s } n0 s=1 with length 0 (i.e., |B s | = 0 ), which gives Λ = n 0 0 (see Fig. 10).In fact, we may not be able to find an integer n 0 satisfying n = n 0 0 , but we can arbitrary extend the system size Λ → Λ δΛ without changing the Hamiltonian.We only have to add 0 operators and the form of (S.10) is still retained as follows: where h X = 0 if X ∩ δΛ = ∅.

RFIG. 2 .
FIG. 2.Comparison between real-time evolution and imaginary time evolution in the tight-binding model(16) with R = 500.In (a) and (b), we plot the fluctuation of the position Var(X) after the real and imaginary time evolutions, respectively.Here, the state at the initial time is given by |0 .The fitting functions for (a) and (b) are given by √ 2t and 0.998769β 1/2 , respectively.This clearly indicates that the real-time evolution induces a ballistic propagation, whereas the imaginary-time evolution induces a diffusive propagation.
Suppose x is fixed to be in a range [0, b].As shown in [89], e −x can be approximated by a polynomial of degree O(b 1/2 ), for a constant error.This is the consequence of a random walk that is concentrated around degree O(b 1/2 ) after b steps.Let us introduce y ∈ (−1, 1) such that x = b(1 + y)/2 and e −x = e − 1 2 (1+y) b

4 p(r 4 |r 3 )FIG. 3 .
FIG.3.Schematic picture of the random walk.The exponential function e −b(1+y)/2 is given by the expectation of Tr b (x) with the probability P (r b ), as in Eq. (17).The probability P (r b ) is generated from the b-step random walk.In each step, the probability from r to r is given by p(r |r), which is a symmetric function around r.In the picture, we give the numerical plot of p(r|2), where the shape of p(r|r ) does not depend on r .

FIG. 4 .
FIG. 4. Our algorithm proceeds by iterated approximations of e −β 0 H , performed β/β0 times.In each step, we approximate the Gibbs operator e −β 0 H by the operator M β 0 .For this, we establish a decomposition of e −β 0 H as a product of operators shown on the right-hand side.This uses an imaginary-time version of the Lieb-Robinson bound and the Taylor truncation of the exponential function.
in Eq. (29) is upper-bounded by m O( √ m) ∼ log(n) √ log n along every cut.Because {Φ (m) j (β/β0) β0 in Eq. (29) • Stronger norm inequality for imaginary time evolution: As discussed in Sec.V D, we observed that an approximation of the form O D e βH − 1 ≤ instead of the current one e −βH − O D p ≤ e −βH p would lead to an imaginary-time version of the SIE theorem.
ACKNOWLEDGMENTS A.A. would like to thank David Gosset for introducing the excellent survey [89] on polynomial approximations.The work of T.K. is supported by the RIKEN Center for AIP and JSPS KAKENHI Grant No. 18K13475.Part of the work was done when T.K. was visiting the Perimeter Institute.A.A. is supported by the Canadian Institute for Advanced Research, through funding provided to the Institute for Quantum Computing by the Government of Canada and the Province of Ontario.This research was supported in part by the Perimeter Institute for Theoretical Physics.Research at the Perimeter Institute is supported in part by the Government of Canada through the Department of Innovation, Science and Economic Development Canada and by the Province of Ontario through the Ministry of Colleges and Universities.

FIG. 6 .
FIG.6.Decomposition of the system in the Schmidt rank analysis