Locality of temperature

This work is concerned with thermal quantum states of Hamiltonians on spin and fermionic lattice systems with short-range interactions. We provide results leading to a local definition of temperature, thereby extending the notion of "intensivity of temperature" to interacting quantum models. More precisely, we derive a perturbation formula for thermal states. The influence of the perturbation is exactly given in terms of a generalized covariance. For this covariance, we prove exponential clustering of correlations above a universal critical temperature that upper bounds physical critical temperatures such as the Curie temperature. As a corollary, we obtain that above the critical temperature, thermal states are stable against distant Hamiltonian perturbations. Moreover, our results imply that above the critical temperature, local expectation values can be approximated efficiently in the error and the system size.


I. INTRODUCTION
The ongoing miniaturization of devices, with structures reaching the nanoscale, has led to the development of extremely small thermometers [1,2], some of which are so small that they can only be read out with powerful electron microscopes [3]. Even small thermal machines working in the quantum regime have been suggested [4,5]. In order to understand the working of such devices, it is necessary to formulate a theory of statistical mechanics and thermodynamics at the microscopic and mesoscopic scales. A prerequisite for this is a good understanding of the limitations of the concept of temperature at small scales.
The problem with assigning locally a temperature to a small subsystem of a globally thermal system is the following: Interactions between the subsystem and its environment that generate correlations can lead to noticeable deviations of the state of the subsystem from a thermal state (see Figure 1). Hence, given only a subsystem state, there is no canonical way to assign a temperature to the subsystem. We call this the locality of temperature problem.
The first steps toward a solution of the locality of temperature problem have been taken in Refs. [6][7][8], and more recently, within the mindset of quantum information theory, in Ref. [9]. The general locality of temperature problem is, however, still open. In this work, we conclusively solve it for spin and fermionic lattice systems.
More precisely, we first show that the locality of temperature problem is equivalent to a decay of correlations measured by an averaged generalized covariance that precisely captures the response of expectation values to perturbations of the Hamiltonian. We expect the corresponding equality to be useful for applications beyond the scope of this article.
We then provide conditions under which the generalized covariance decays exponentially with the distance, including a detailed analysis of the preasymptotic, and of the finite-size regime. In particular, this exponential decay holds above a universal critical temperature that only depends on the "connectivity" of the underlying graph of the model and is an upper bound on physically relevant critical temperatures such as the Curie temperature.
Figure 1. The locality of temperature problem: Subsystems of thermal states are themselves, in general, not in a state with a locally well-defined temperature. Down to which length scale can temperature be an intensive quantity?
While, in the low-temperature regime, quantum lattice models exhibit a great diversity of phases, many of which involve the emergence of long-range or topological order [10], in the high-temperature regime, exponential clustering of correlations is expected. Our rigorous results help to delineate the boundary between these two regimes. They build upon and go significantly beyond previous results on the clustering of correlations in classical systems [11], for quantum gases [12], i.e., translation-invariant Hamiltonians in the continuum, and cubic lattices [13][14][15].
Mathematically, we significantly contribute to the problem of whether and under which precise conditions thermal quantum states are stable against distant Hamiltonian perturbations. This is particularly relevant in the broader scheme of phase transitions in classical and quantum lattice models [14,16] as well as for the foundations of statistical mechanics and the equilibration and thermalization behavior of closed quantum systems [17][18][19][20][21][22][23][24][25]. In the light of the recent surge of interest in these topics, developing a better understanding of the properties of thermal states has become a timely issue.
A major obstacle to progress on some of the most interesting open questions in this context, such as equilibration time scales in closed quantum systems, is the limited set of mathematical tools available for exploiting the structure of locally interacting Hamiltonians [25]. Our results are among the first that explicitly exploit properties of local Hamiltonians, without being limited to very specific models.
For quantum Monte Carlo simulations [26], our results provide a guideline as to how large the finite system size has to be taken in order to be able to sample from the right partition function and, conversely, to identify observables that are best suited to detect long-range correlations.
arXiv:1309.0816v4 [quant-ph] 1 Aug 2014
In fact, our results are reminiscent of known statements about ground states. If a Hamiltonian has a unique ground state and is gapped, correlations in its ground state cluster exponentially and faraway regions become essentially uncorrelated. This is rigorously proven using information theory inspired methods such as Lieb-Robinson bounds and quasiadiabatic continuation [27][28][29]. These rigorous results allow for certified algorithms that efficiently approximate ground states of gapped Hamiltonians on classical computers [30]. In the same spirit, we are able to show that an exponential decay of correlations renders thermal states locally efficiently simulatable.
The rest of this paper is structured as follows: In Section II, we formulate the precise setting and explain the main results and their implications. In Section III, we discuss connections to known results on phase transitions, thermalization in closed quantum systems, and matrix product operator approximations. Then, in Section IV, we discuss basic properties of the generalized covariance, explain how our results can be made applicable to finite-range k-body interactions, and state the results for fermionic lattices. We proceed with proving all theorems in Section V and conclude in Section VI. In the Appendix, we provide a detailed proof of two bounds on truncated cluster expansions, one of which is an important ingredient to the proof of clustering of correlations.

II. SETTING AND MAIN RESULTS
In this section, we introduce the setting, state the locality of temperature problem more formally, and state our results.

A. Perturbation formula for thermal states
As the first result, we state a perturbation formula, which is a general statement about the response of the expectation value of an observable in the thermal state, upon changes in the system Hamiltonian. It does not make any reference to the locality structure of the Hamiltonian but turns out to be especially useful when correlations between local observables decay rapidly with distance.
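Throughout the paper, we assume the Hilbert space to be finite dimensional [31] and denote the thermal state, or Gibbs state, of a Hamiltonian H at inverse temperature β by
$$ g[H](\beta) := \frac{e^{-\beta H}}{\operatorname{Tr}\bigl(e^{-\beta H}\bigr)}. \tag{1} $$
We measure correlations by the (generalized) covariance that we define for any two operators A and A′, full-rank quantum state ρ, and parameter τ ∈ [0, 1] as
$$ \operatorname{cov}^\tau_\rho(A, A') := \operatorname{Tr}\bigl(\rho^\tau A\, \rho^{1-\tau} A'\bigr) - \operatorname{Tr}(\rho A)\, \operatorname{Tr}(\rho A'). \tag{2} $$
We discuss various properties of this covariance and generalizations to arbitrary-rank quantum states in Section IV A.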
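The generalized covariance appears naturally in our first theorem about the response of expectation values to perturbations. More precisely, when we are given an unperturbed Hamiltonian $H_0$ and perturbed Hamiltonian H, then the difference of expectation values in the corresponding thermal states is captured by that covariance:
Theorem 1 (Perturbation formula). Let $H_0$ and H be Hamiltonians acting on the same Hilbert space. For s ∈ [0, 1], define the interpolating Hamiltonian $H(s) := H_0 + s\,(H - H_0)$ and denote its thermal state by $g_s := g[H(s)]$. Then,
$$ \operatorname{Tr}\bigl(A\, g_0(\beta)\bigr) - \operatorname{Tr}\bigl(A\, g(\beta)\bigr) = \beta \int_0^1 \mathrm{d}\tau \int_0^1 \mathrm{d}s\; \operatorname{cov}^\tau_{g_s(\beta)}(H - H_0, A) \tag{3} $$
for any operator A.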
The proof of the theorem, which is presented in Section V A, relies on the fundamental theorem of calculus and Duhamel's formula. We refer to the double integral over the covariance in Eq. (3) as the averaged (generalized) covariance.
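The perturbation formula lends itself to a quick numerical sanity check. The sketch below assumes that the generalized covariance has the form Tr(ρ^τ A ρ^(1−τ) A′) − Tr(ρA) Tr(ρA′) and that the formula reads Tr(A g_0) − Tr(A g) = β ∫dτ ∫ds cov^τ_{g_s}(H − H_0, A), with g_s the thermal state of H_0 + s(H − H_0). The random 4×4 Hermitian matrices, the Gauss-Legendre quadrature, and all function names are illustrative choices and not part of the paper.

```python
import numpy as np

def gibbs(H, beta):
    """Thermal state e^{-beta H} / Tr(e^{-beta H}) of a Hermitian matrix H."""
    evals, evecs = np.linalg.eigh(H)
    weights = np.exp(-beta * (evals - evals.min()))  # shift for numerical stability
    weights /= weights.sum()
    return (evecs * weights) @ evecs.conj().T

def cov(rho, tau, A, B):
    """Assumed form: Tr(rho^tau A rho^(1-tau) B) - Tr(rho A) Tr(rho B)."""
    p, U = np.linalg.eigh(rho)
    p = np.clip(p, 1e-300, None)  # rho is full rank; guard against round-off
    rho_tau = (U * p**tau) @ U.conj().T
    rho_rest = (U * p**(1.0 - tau)) @ U.conj().T
    return np.trace(rho_tau @ A @ rho_rest @ B) - np.trace(rho @ A) * np.trace(rho @ B)

def averaged_cov(H0, H, A, beta, order=12):
    """Double integral over tau and s in [0, 1] by Gauss-Legendre quadrature."""
    x, w = np.polynomial.legendre.leggauss(order)
    nodes, weights = 0.5 * (x + 1.0), 0.5 * w  # map [-1, 1] to [0, 1]
    total = 0.0
    for s, ws in zip(nodes, weights):
        g_s = gibbs(H0 + s * (H - H0), beta)
        for tau, wt in zip(nodes, weights):
            total += ws * wt * cov(g_s, tau, H - H0, A)
    return total

def random_hermitian(dim, rng):
    M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (M + M.conj().T) / 2

rng = np.random.default_rng(0)
H0, V, A = (random_hermitian(4, rng) for _ in range(3))
H, beta = H0 + 0.5 * V, 0.7

lhs = np.trace(A @ gibbs(H0, beta)) - np.trace(A @ gibbs(H, beta))
rhs = beta * averaged_cov(H0, H, A, beta)
print("difference between both sides:", abs(lhs - rhs))
```

Since the integrand is analytic in both τ and s, a modest quadrature order should already reproduce the identity to near machine precision; no locality structure of H enters at this stage.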

B. Spin lattice systems
In the remainder of this work, we will be concerned with spin and fermionic lattice systems. We will only write out everything for spin systems and then later, in Sections IV C and V C, explain the necessary modifications for fermionic systems. In the case of spin lattice systems, the Hilbert space is given by $\mathcal{H} = \bigotimes_{x \in V} \mathcal{H}_x$, where V is called the vertex set and is assumed to be finite. To make the presentation more accessible, many of the following definitions are highlighted in Figure 2. A local Hamiltonian with interaction (hyper)graph (V, E) is a sum
$$ H = \sum_{\lambda \in E} h_\lambda \tag{4} $$
of local terms $h_\lambda$ supported on the (hyper)edges λ ∈ E, and for a subsystem B ⊂ V the truncated Hamiltonian $H_B$ is the sum over only those edges that are contained in B.
Given some subsystem S ⊂ V there are two natural thermal states associated with it: (i) $g_S(\beta) := g[H_S](\beta)$ denotes the thermal state of S alone, i.e., the thermal state of the truncated Hamiltonian $H_S$. (ii) $g^S(\beta) := \operatorname{Tr}_{S^c}(g(\beta))$ denotes the full thermal state reduced to S. For a non-interacting Hamiltonian, these two states coincide, but, in general, this is not the case due to correlations between S and its environment. This discrepancy raises the question of how to define temperature locally as an intensive quantity, i.e., the locality of temperature problem.

C. Locality of temperature
In order to locally assign a temperature to the subsystem S ⊂ V it was suggested, e.g., in Ref. [9], to extend S by a buffer region and define the temperature of S via the thermal state of the Hamiltonian truncated outside the extended region B, see Figure 2. The role of the buffer region B is to remove the boundary effects and the correlations with the rest of the system that are intuitively the reason for the locality of temperature problem. Nevertheless, it is not obvious how these correlations should be quantified and how large this buffer region needs to be. We will see shortly that Theorem 1 answers these questions.
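By ∂B ⊂ E, we denote the set of boundary edges of B, i.e., the edges having overlap with both B and its complement $B^c := V \setminus B$. Then, by choosing $H_0 = H - H_{\partial B}$ in Theorem 1, using that $g_0 = g_B \otimes g_{B^c}$, and tracing over $B^c$, we obtain the following:
Corollary 1. Let H be a local Hamiltonian, let B ⊂ V, and denote by $g_s$ the thermal state of the interpolating Hamiltonian $H(s) := H - (1 - s)\, H_{\partial B}$, where $H_{\partial B}$ is the sum of the terms on the boundary edges of B. Then, for any operator $A = A_B \otimes 1_{B^c}$ supported on B,
$$ \operatorname{Tr}\bigl(A_B\, g_B(\beta)\bigr) - \operatorname{Tr}\bigl(A\, g(\beta)\bigr) = \beta \int_0^1 \mathrm{d}\tau \int_0^1 \mathrm{d}s\; \operatorname{cov}^\tau_{g_s(\beta)}(H_{\partial B}, A). \tag{5} $$
Now we choose S ⊂ B ⊂ V (see Figure 2). If, for a given inverse temperature β, correlations over the distance between S and ∂B are negligible, then the corollary clearly implies that
$$ \operatorname{Tr}\bigl(A_S\, g^S_B(\beta)\bigr) \approx \operatorname{Tr}\bigl(A_S\, g^S(\beta)\bigr) \tag{6} $$
for any observable $A_B = A_S \otimes 1_{B \setminus S}$ on S, where $g^S_B$ denotes the thermal state of B reduced to S. Also note that such an approximate equality does not hold whenever average correlations over lengths exceeding the distance between S and ∂B are non-negligible. Hence, we have the following equivalence for the temperature defined via thermal states:
Implication 1 (Locality of temperature). Temperature is intensive on a given length scale if and only if correlations (measured by the averaged generalized covariance) are negligible compared to 1/β on that length scale.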
In order to fully exploit Corollary 1, it is necessary to bound the generalized covariance, which we will do for high temperatures in the next section.
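The buffer-region idea can be illustrated on a small spin chain. The sketch below compares the reduced state of the full thermal state on the first site S with the reduced thermal state of the Hamiltonian truncated to the first `width` sites B, and records the trace-norm error as the buffer grows. The transverse-field Ising chain, the parameter values, and all function names are assumptions made purely for illustration; the paper's statements are not restricted to this model.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def site_op(op, x, n):
    """Embed a single-qubit operator at site x of an n-qubit chain."""
    return np.kron(np.kron(np.eye(2**x), op), np.eye(2**(n - x - 1)))

def chain(n, J=1.0, h=1.0):
    """Open chain H = sum_x h X_x + sum_x J Z_x Z_{x+1} (illustrative model)."""
    H = sum(h * site_op(X, x, n) for x in range(n))
    for x in range(n - 1):
        H = H + J * site_op(Z, x, n) @ site_op(Z, x + 1, n)
    return H

def gibbs(H, beta):
    """Thermal state e^{-beta H} / Tr(e^{-beta H})."""
    evals, evecs = np.linalg.eigh(H)
    weights = np.exp(-beta * (evals - evals.min()))
    weights /= weights.sum()
    return (evecs * weights) @ evecs.conj().T

def reduce_to_first_site(rho):
    """Partial trace over everything except the first qubit."""
    rest = rho.shape[0] // 2
    return np.einsum('ajbj->ab', rho.reshape(2, rest, 2, rest))

def trace_norm(M):
    return np.abs(np.linalg.eigvalsh(M)).sum()

beta, n = 0.3, 8
target = reduce_to_first_site(gibbs(chain(n), beta))  # full thermal state reduced to S
errors = []
for width in range(1, n):  # B = first `width` sites, truncated Hamiltonian chain(width)
    approx = reduce_to_first_site(gibbs(chain(width), beta))
    errors.append(trace_norm(approx - target))

for width, err in zip(range(1, n), errors):
    print(f"|B| = {width}: error = {err:.3e}")
```

For `width = 1` the error is the discrepancy between the thermal state of S alone and the full thermal state reduced to S; at this comparatively high temperature the error is expected to shrink rapidly as the buffer region widens.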

D. Clustering of correlations at high temperatures
For low temperatures, correlations can be arbitrarily long-ranged, as is, e.g., the case for the ferromagnetic Ising model in two or higher dimensions below the Curie temperature. On the other hand, above a universal critical temperature, depending only on a local property of the interaction graph, correlations cluster exponentially, as we will see next. Given the combinatorial nature of parts of the arguments leading to this result, we need additional notation related to edges and vertices of the lattice. Most of the following definitions can be understood intuitively, as is shown in Figure 2.
We say that two subsystems X, Y ⊂ V overlap if X ∩ Y ≠ ∅, a set X ⊂ V and a set F ⊂ E overlap if F contains an edge that overlaps with X, and two sets F, F′ ⊂ E overlap if F overlaps with any of the edges in F′. A subset of edges F ⊂ E connects X and Y if F contains a sequence of pairwise overlapping edges such that the first overlaps with X and the last overlaps with Y and similarly for the case where X and/or Y are just vertices.
The graph distance on V, and also the induced distance on subsets of V, are denoted by d. The distance d(X, F) of a subset X ⊂ V and a subset F ⊂ E is 0 if X and F overlap and otherwise equal to the size of the smallest subset of E that connects X and F. Sometimes, we denote the support of an operator by the operator itself, e.g., for two operators A and A′, their distance is d(A, A′) := d(supp A, supp A′) and ∂A ⊂ E are the edges across the boundary of supp(A).
A subset of edges F ⊂ E that connects all pairs of its elements λ, λ′ ∈ F is called connected. Such a connected set F is also called an (edge) animal. The size |F| of an animal F is given by the number of edges contained in F. The results presented here apply to Hamiltonians with interaction graphs (V, E) whose number $a_m$ of lattice animals of size m containing some fixed edge is exponentially bounded. With
$$ a_m := \sup_{\lambda \in E} \bigl|\{F \subset E \text{ connected} : \lambda \in F,\ |F| = m\}\bigr|, \tag{7} $$
the growth constant α is the smallest constant satisfying
$$ a_m \le \alpha^m \tag{8} $$
for all m. For example, the growth constant of a D-dimensional cubic lattice can be bounded as α ≤ 2De (Lemma 2 in Ref. [32]), where e is Euler's number. Moreover, α is finite for any regular lattice [33]. Upper bounds to growth constants for so-called spread-out graphs [32] render our results applicable for the case of bounded-range two-body interactions. By a simple embedding argument, one can also bound the growth constant for the case of local k-body interactions on a regular lattice, which we explain in Section IV B in detail. For any operator A and p ∈ [1, ∞], we denote by $\|A\|_p$ its Schatten p-norm; e.g., $\|A\|_\infty$ is the operator norm and $\|A\|_1$ is the trace norm of A. We call $J := \max_{\lambda \in E} \|h_\lambda\|_\infty$ the local interaction strength of a local Hamiltonian, as given in Eq. (4).
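The number of edge animals containing a fixed edge can be counted by brute force for small sizes. The following sketch does this for the square lattice (D = 2) and compares the counts with the exponential bound (2De)^m quoted above. The enumeration strategy (growing animals one overlapping edge at a time) and the function names are illustrative assumptions, not the method of Ref. [32].

```python
import math

def edges_at(v):
    """The four edges of the square lattice Z^2 that contain vertex v."""
    x, y = v
    nbrs = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [frozenset((v, u)) for u in nbrs]

def neighbours(edge):
    """All edges that overlap with (share a vertex with) the given edge."""
    return {e for v in edge for e in edges_at(v)} - {edge}

def count_animals(m, root=frozenset(((0, 0), (1, 0)))):
    """Number of connected edge sets of size m that contain the edge `root`."""
    level = {frozenset([root])}
    for _ in range(m - 1):
        # Every connected set of size k+1 containing root arises from one of
        # size k by adding an edge that overlaps with an edge already present.
        level = {animal | {e}
                 for animal in level
                 for edge in animal
                 for e in neighbours(edge)
                 if e not in animal}
    return len(level)

D = 2
counts = {m: count_animals(m) for m in range(1, 6)}
for m, a_m in counts.items():
    print(f"m = {m}: a_m = {a_m}, bound (2De)^m = {(2 * D * math.e) ** m:.1f}")
```

By lattice symmetry all edges of the square lattice are equivalent, so fixing one root edge suffices for the supremum in the definition of a_m.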

E. Universal locality and stability at high temperatures
If one is interested in the state $g^S(\beta)$ of some subsystem S, then one can truncate the Hamiltonian to S extended by some buffer region and obtain the approximation via the thermal state of the truncated Hamiltonian. The following theorem implies that the approximation error is exponentially small in the width of the buffer region. For any operator ρ, we denote its reduction to a subsystem S ⊂ V by $\rho^S := \operatorname{Tr}_{S^c}[\rho]$ and note that
$$ \|\rho^S\|_1 = \sup_{A = A_S \otimes 1_{S^c},\ \|A\|_\infty = 1} |\operatorname{Tr}(A\, \rho)|. $$
Then, as a consequence of Corollary 1 and Theorem 2, we obtain the following:
Corollary 2 (Universal locality at high temperatures). Let H be a Hamiltonian satisfying the conditions of Theorem 2, let |β| < β*, and let S ⊂ B ⊂ V be subsystems with $d(S, \partial B) \ge L_0(\beta, |\partial S|)$. Then, (13) where $g^S_B$ denotes the thermal state of B reduced to S and $v := 4\, |\partial S|\, |\partial B| / \ln(3)$.
Similarly, as a corollary of Theorems 1 and 2 we obtain the following:
Implication 2 (Stability). Below the critical inverse temperature β* [from Eq. (9)], thermal states of local Hamiltonians are exponentially stable against distant locally bounded perturbations.

F. Efficient approximation
Corollary 2 on the universal locality of thermal states also has the following complexity-theoretic consequence:
Implication 3 (Efficient approximation). For |β| < β*, local expectation values can be approximated with a computational cost independent of the system size and bounded polynomially in the reciprocal error.
In this sense, the error bound (see Figure 4) of Corollary 2 is reminiscent of the quasi-locality of dynamics, as, e.g., presented in Ref. [34], which is a consequence of Lieb-Robinson bounds [35,36]. The quasi-locality theorem [34] allows for an approximation of time evolved local observables by truncating the Hamiltonian in the time evolution operator at a distance L > 0 far away from the space time cone of the observable's support and has an approximation error that is exponentially small in L.

G. Fermions
In Ref. [37], it was shown for fermionic systems that two-point functions of observables that are odd polynomials in the fermionic operators decay exponentially with a correlation length proportional to the inverse temperature. Here, we obtain an exponential decay of the covariance above the critical temperature for all operators.
Observation 1 (Fermions). All results also hold for locally interacting fermions on a lattice. See Theorem 4 and Corollaries 4 and 5 in Section IV C for the precise statements.

III. RELATIONS TO KNOWN RESULTS
In this section, we discuss the critical temperature from the clustering theorem, the connection of this work to concepts related to thermalization, and approximations of thermal states with so-called matrix product operators. As a last point, we briefly mention similarities with local topological quantum order.

A. Critical temperatures and phase transitions
Our results show that the quantity β*, as defined in Eq. (9), provides a potentially coarse but universal and completely general upper bound on physical critical temperatures like the Curie temperature. For the ferromagnetic two-dimensional isotropic Ising model without external field, our bound yields, for example, 1/(β* J) = 2/ln((1 + √(1 + 1/e))/2) ≈ 24.58, whereas the phase transition between the disordered paramagnetic and the ordered ferromagnetic phases is known to happen at 1/(β_c J) = 2/ln(1 + √2) ≈ 2.27 [16]. Our universal bound is about an order of magnitude higher than the actual value for this example. To put this discrepancy into perspective, it is worth pointing out that it is generally a very difficult task to estimate physical critical temperatures, whether numerically or analytically. In fact, analytic expressions for critical temperatures or even just bounds on their values are known only for very few models.
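The two numbers quoted for the square-lattice Ising model can be reproduced in a few lines. The sketch assumes that the critical inverse temperature has the form β* = ln[(1 + √(1 + 4/α))/2]/(2J); this form is inferred here from the quoted value together with the bound α ≤ 2De for D = 2 and should be checked against Eq. (9). The function name is hypothetical.

```python
import math

def inverse_critical_temperature(alpha, J=1.0):
    """Assumed form beta* = ln[(1 + sqrt(1 + 4/alpha)) / 2] / (2 J)."""
    return math.log((1 + math.sqrt(1 + 4 / alpha)) / 2) / (2 * J)

alpha_square = 2 * 2 * math.e  # growth constant bound alpha <= 2 D e with D = 2
T_bound = 1 / inverse_critical_temperature(alpha_square)  # universal bound, units of J
T_onsager = 2 / math.log(1 + math.sqrt(2))                # exact 2D Ising value

print(f"universal bound : {T_bound:.2f}")
print(f"exact transition: {T_onsager:.2f}")
print(f"ratio           : {T_bound / T_onsager:.1f}")
```

The ratio of roughly ten between the two temperatures is the "order of magnitude" mentioned in the text.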
One of the few known general statements is the Mermin-Wagner-Hohenberg theorem [38]. It states that in certain low-dimensional systems with short-range interactions there cannot be any phase transition involving the spontaneous breaking of a continuous symmetry at any non-zero temperature. However, such systems can still have a low-temperature phase with quasi-long-range order characterized by power-law-like decaying correlations. Consequently, even for systems covered by the Mermin-Wagner-Hohenberg theorem, our Theorem 2 is nontrivial. For example, it implies an upper bound on the critical temperature of the Kosterlitz-Thouless transition in the two-dimensional XY model [39].
In this work, we have concentrated on the general picture, but it seems likely that refinements of the methods employed and developed here can yield much tighter bounds on critical temperatures if more specific properties of a model are taken into account. At the same time, it remains an open problem to actually find a model with a phase transition with long-range order at the universal highest possible temperature.

B. Foundations of statistical mechanics
Recent years have seen a large number of numerical and experimental (see Ref. [21] for a review) as well as analytical investigations (see, for example, Refs. [17][18][19][22][23][24]) of equilibration and thermalization in closed quantum systems. These works focus on the approach to equilibrium or on properties of energy eigenstates. The current work complements this body of literature in that it shows fundamental properties of systems in thermal equilibrium. A feature that makes the current work unique is that, contrary to essentially all other works, the results derived here explicitly use the structure of locally interacting systems (noteworthy exceptions are Ref. [24] and, albeit in a very special setting, Ref. [18]).
The locality of thermal states is also of interest for recent results [24] on the dynamical thermalization of translation-invariant lattice models: Our Corollary 2 guarantees the existence of a "unique phase" [24] for all temperatures above our critical temperature. Hence, it implies that at sufficiently high temperatures, Theorems 1, 2, and 3 of Ref. [24] are applicable for any translation-invariant Hamiltonian.
There is also an interesting connection of our locality and stability results to the so-called eigenstate thermalization hypothesis (ETH) [20,21]. The ETH essentially conjectures that the expectation values of certain physically relevant observables (for example local ones) in energy eigenstates of sufficiently complex Hamiltonians should be very similar to the expectation values in thermal states with the same average energy. Corollary 2 and Implication 2 thus imply that the eigenstates of a Hamiltonian in the center of the spectrum (which correspond to high-temperature thermal states) must, if the Hamiltonian fulfills the ETH, also be locally stable against perturbations of the Hamiltonian. This insight could put constraints on the class of Hamiltonians that fulfills the ETH, provide new insights into the properties of their eigenstates, and open up new ways to test the ETH.

C. MPO approximation of thermal states
Matrix Product Operators (MPOs) are a certain class of operators that are tractable on classical computers for one-dimensional systems. Therefore, they play an important role in numerical simulations based on so-called tensor networks.
An important ingredient in our proof of Theorem 2 on clustering of correlations will be a bound on a truncated cluster expansion (Lemma 1). The original result on the cluster expansion (Lemma 2 in the Appendix) is due to Hastings and was first used to approximate thermal states with inverse temperature below 2β* by MPOs [40]. This approximation is summarized in the next theorem.
In one spatial dimension, this MPO approximation yields a tensor size bounded polynomially in the system size and the inverse approximation error (see the subsequent corollary). In higher dimensions, however, the MPO approximation yields a tensor size bounded only subexponentially in the system size and is hence not computationally efficient, albeit exponentially cheaper than storing the full density matrix g(β). In order to explain this in more detail, we start the discussion with a slightly non-standard definition of MPOs: Let $\{\sigma^{(x)}_j\}_{j=1}^{d^2}$ be a basis for the operators on $\mathcal{H}_x$ and write an arbitrary operator A on $\mathcal{H}$ in the product basis as
$$ A = \sum_{k \in [d^2]^{V}} A_k \bigotimes_{x \in V} \sigma^{(x)}_{k_x} \tag{14} $$
with expansion coefficients $A_k \in \mathbb{C}$ and where $[d^2] := \{1, 2, \dots, d^2\}$. If the $A_k$ are of the form
$$ A_k = \prod_{x \in V} a[x](k), \tag{15} $$
where every a[x](k) only depends on at most r of the |V| indices $k_x$, then A is called an MPO with tensor size $d^{2r}$.
Thermal states can be approximated by such MPOs. The following theorem is a consequence of Lemma 2, which we will prove in the Appendix along with Lemma 1.
is the number of vertices within a distance less than L. The approximation error is bounded as In particular, the theorem implies the following:
Corollary 3 (Bound on the tensor size). Let D be the spatial dimension of the Hamiltonian's interaction graph (V, E), let n := |E| be the system size, and β < 2β* with β* from Eq. (9). Then, the MPO approximation in Theorem 3 gives rise to a tensor size of the MPO ρ(β, L) scaling as with some β-dependent constant C. In particular, for D = 1, the bound on the tensor size scales polynomially with n/ε.
Let us consider a one-dimensional system and suppose we are explicitly given the MPO tensors a [x] [see Eq. (15)] of an approximation to a state ρ and, similarly, an observable A of MPO form with MPO tensors a[x]. If the tensor sizes of both MPOs scale at most polynomially in the system size, then one can compute the corresponding approximation to the expectation value Tr(ρA) with a computational cost scaling polynomially in the system size. This means that, for instance, global product observables can be approximated efficiently, which is not guaranteed by our Implication 3. The problem with the MPO approximation, however, is that Theorem 3 only guarantees the existence of the MPO tensors but it is not obvious how they can be computed (efficiently).
Proof of Corollary 3. The condition β < 2β* is equivalent to b(βJ) < 1. Let us denote the bound to the approximation error in Eq. (17) by ε. Note that the upper bound in Eq. (17) satisfies for distances L being at least logarithmically large in n = |E| and some β-dependent constant C. Then, the distance L necessary to reach ε must asymptotically be at least as large as Bounding N(L) in terms of the spatial dimension D as $N(L) \le M L^D$ with some constant M yields a tensor size bounded as (21)

D. Local topological quantum order
It is worth mentioning that Corollary 2 and Implication 2 are very reminiscent of the local topological quantum order condition for open quantum systems introduced in Ref. [41] and the results on the local stability of stationary states of local Liouvillians in Ref. [42]. A slightly different family of local topological quantum order conditions for closed quantum systems [41][42][43][44] has played a very important role in the theory of locally stable (topological) lattice systems and for rigorous proofs of entropic area laws. Corollary 2 similarly characterizes the regime where local perturbations cannot drive any thermal phase transition.

IV. DETAILS
In this section, we first discuss the generalized covariance and then provide details concerning the applicability of our results to Hamiltonians with k-body interactions. Finally, we justify Observation 1 by stating the fermionic versions of our results.

A. The generalized covariance
The generalized covariance defined in Eq. (2), which depends on a parameter τ ∈ [0, 1], provides more information about the correlations between two observables than the standard covariance in a similar way as the class of Rényi entropies characterizes more completely the entanglement properties of a state than simply the von Neumann entropy [45]. While it occurs quite naturally in the perturbation formula Theorem 1, other possible applications are to be explored. Here, we discuss possible generalizations of the generalized covariance to states of arbitrary rank, show that for operators A and A′ they are always bounded by $\|A\|_\infty \|A'\|_\infty$, and comment on convexity and a symmetrized version of the generalized covariance.
A definition of the generalized covariance for states of arbitrary rank is not relevant for this work because for non-zero temperatures thermal states are full-rank operators. However, the discussion of possible generalizations also hints at the behaviour of $\operatorname{cov}^\tau$ at the end points of the unit interval. On the open interval τ ∈ ]0, 1[, it is natural to simply keep the definition from Eq. (2). There are two natural ways to define $\rho^0$: either as $\rho^0 := 1$ or as $\rho^{0+} := \lim_{\tau \to 0} \rho^\tau$, where $\rho^{0+}$ turns out to be the projector onto the image of the operator ρ. For each end point τ = 0 and τ = 1, there are hence two natural ways to define $\operatorname{cov}^\tau$, either such that the generalized covariance is continuous or such that $\operatorname{cov}^0$ defines the standard covariance. Note that for product states and operators with disjoint support, all versions of the generalized covariance vanish. Moreover, for pure states, the continuous version of the generalized covariance vanishes also, meaning that classical correlations are needed to yield a non-zero value.
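Next, we show that the generalized covariance is always bounded as
$$ |\operatorname{cov}^\tau_\rho(A, A')| \le \|A\|_\infty \|A'\|_\infty \tag{23} $$
irrespective of which definitions are chosen for $\operatorname{cov}^0$ and $\operatorname{cov}^1$.
We consider a state ρ and define $\bar{A} := A - \operatorname{Tr}(\rho A)$ and, analogously, $\bar{A}'$. Then, Hölder's inequality generalized to several operators and the fact that $\|X\|_p = \bigl\| |X|^p \bigr\|_1^{1/p}$ then imply that
$$ |\operatorname{cov}^\tau_\rho(A, A')| = \bigl|\operatorname{Tr}\bigl(\rho^\tau \bar{A}\, \rho^{1-\tau} \bar{A}'\bigr)\bigr| \le \|\rho^\tau\|_{1/\tau}\, \|\bar{A}\|_\infty\, \|\rho^{1-\tau}\|_{1/(1-\tau)}\, \|\bar{A}'\|_\infty = \|\bar{A}\|_\infty \|\bar{A}'\|_\infty $$
and, by noting that $\|\bar{A}\|_\infty = \|A\|_\infty$, the bound (23) is proven for the continuous version of the generalized covariance. For the non-continuous versions, the bound follows similarly. The variance $\operatorname{cov}^\tau_\rho(A, A)$ induced by the continuous version of the covariance is convex in τ. This can be seen by writing out ρ in its eigenbasis. As one can change the sign of $\operatorname{cov}^\tau_\rho(A, A')$ by just changing the sign of A′, the generalized covariance is not convex in τ. Whether its magnitude $|\operatorname{cov}^\tau_\rho(A, A')|$ is convex is unclear. If this were the case, it would be enough to prove the clustering Theorem 2 only for the end points τ ∈ {0, 1}, and hence the proof could be significantly simplified.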
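Similarly, as there is a symmetrized version of the standard covariance, one can also symmetrize the generalized covariance with respect to the two operators. Because of the cyclicity of the trace, the generalized covariance satisfies the symmetry property
$$ \operatorname{cov}^\tau_\rho(A, A') = \operatorname{cov}^{1-\tau}_\rho(A', A). $$
Hence, one can define the symmetrized version of the generalized covariance as follows:
$$ \overline{\operatorname{cov}}^\tau_\rho(A, A') := \tfrac{1}{2} \bigl( \operatorname{cov}^\tau_\rho(A, A') + \operatorname{cov}^\tau_\rho(A', A) \bigr) = \tfrac{1}{2} \bigl( \operatorname{cov}^\tau_\rho(A, A') + \operatorname{cov}^{1-\tau}_\rho(A, A') \bigr). $$
Our results can also be phrased in terms of this symmetrized version, since the averaged generalized covariance in the perturbation formula Theorem 1 can easily be rewritten in terms of $\overline{\operatorname{cov}}$, and a bound analogous to the clustering Theorem 2 holds also for the symmetrized quantity.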
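Several of the properties discussed in this subsection can be checked numerically on random instances. The sketch below assumes the covariance form Tr(ρ^τ A ρ^(1−τ) A′) − Tr(ρA) Tr(ρA′) and tests three statements: the symmetry under exchanging the operators together with τ ↔ 1 − τ, the vanishing for product states and operators with disjoint support, and the Hölder-type bound in terms of the centred operators Ā = A − Tr(ρA). The random test data and the helper names are illustrative assumptions.

```python
import numpy as np

def state_power(rho, t):
    """rho^t for a positive semidefinite matrix via its eigendecomposition."""
    p, U = np.linalg.eigh(rho)
    return (U * np.clip(p, 0.0, None)**t) @ U.conj().T

def cov(rho, tau, A, B):
    """Assumed form: Tr(rho^tau A rho^(1-tau) B) - Tr(rho A) Tr(rho B)."""
    mixed = np.trace(state_power(rho, tau) @ A @ state_power(rho, 1 - tau) @ B)
    return mixed - np.trace(rho @ A) * np.trace(rho @ B)

def random_state(dim, rng):
    M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = M @ M.conj().T
    return rho / np.trace(rho).real

def random_hermitian(dim, rng):
    M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (M + M.conj().T) / 2

rng = np.random.default_rng(1)
tau = 0.3
rho, A, B = random_state(4, rng), random_hermitian(4, rng), random_hermitian(4, rng)

# Symmetry from the cyclicity of the trace.
symmetry_gap = abs(cov(rho, tau, A, B) - cov(rho, 1 - tau, B, A))

# Product state and operators with disjoint support.
rho1, rho2 = random_state(2, rng), random_state(2, rng)
A1, B2 = random_hermitian(2, rng), random_hermitian(2, rng)
product_cov = abs(cov(np.kron(rho1, rho2), tau,
                      np.kron(A1, np.eye(2)), np.kron(np.eye(2), B2)))

# Hoelder-type bound with the centred operators.
A_bar = A - np.trace(rho @ A) * np.eye(4)
B_bar = B - np.trace(rho @ B) * np.eye(4)
bound = np.linalg.norm(A_bar, 2) * np.linalg.norm(B_bar, 2)
value = abs(cov(rho, tau, A, B))

print(symmetry_gap, product_cov, value, bound)
```

Such checks do not replace the proofs, but they are a cheap way to catch sign or ordering mistakes when implementing the covariance.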

B. Bound on the growth constant for local k-body interactions
In this section, we show that regular hyperlattices also have a finite growth constant, which renders our results applicable to Hamiltonians with local k-body interactions.
In the case of k-body interactions, the Hamiltonian is again a sum of local terms $h_\lambda$ whose supports are hyperedges $\lambda = \operatorname{supp}(h_\lambda) \subset V$ with |λ| ≤ k. As before, V denotes the vertex set and E the set of hyperedges.
We assume that the interaction hypergraph (V, E) is a regular hyperlattice, i.e., that it can be embedded into a regular hypercubic lattice of a certain dimension D with hyperedges of hypercubic form. Let us denote by R the edge length of the resulting hypercubes. Note that such an embedding is, in general, not unique and changes both the number of terms in the Hamiltonian and the local interaction strength of H. Moreover, the grouping changes the values of the metric d in our results.
In order to find an exponential upper bound to the number $a_m$ of hyperanimals composed of m hypercubes, let us define a spread-out graph of range R as the graph with the edge set consisting of all pairs {x, y} with $0 < \|x - y\|_\infty \le R$ and $x, y \in \mathbb{Z}^D$ (see Ref. [32]). Notice that as any hypercube is uniquely specified by the coordinates of its "lower left corner", any hyperanimal of size m corresponds to a lattice animal of size m − 1 and range R in the spread-out graph. It follows from Lemma 2 in Ref. [32] that $a_m \le (K e)^m$ with $K = (2R + 1)^D - 1$ being the coordination number. Hence, the hyperlattice has a growth constant bounded by $\alpha \le \bigl((2R + 1)^D - 1\bigr)\, e$.
The bound obtained is, for most models, far from optimal, in particular in situations where the supports of the local Hamiltonian terms are very different from hypercubes. For such cases, one can derive tighter but more specific bounds from known results about lattice animals in a similar way.
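To get a feeling for how quickly the hyperlattice bound grows with the range R and the dimension D, the expression α ≤ ((2R + 1)^D − 1) e can be tabulated directly. The function name is a hypothetical helper introduced only for this illustration.

```python
import math

def growth_constant_bound(R, D):
    """Upper bound alpha <= K e with K = (2R + 1)^D - 1 (spread-out coordination number)."""
    K = (2 * R + 1) ** D - 1
    return K * math.e

for R, D in [(1, 1), (1, 2), (2, 2), (1, 3)]:
    print(f"R = {R}, D = {D}: alpha <= {growth_constant_bound(R, D):.2f}")
```

For R = 1 and D = 1 the expression reduces to 2e, which coincides with the cubic-lattice bound 2De in one dimension; in higher dimensions it is larger because the spread-out graph also contains diagonal neighbours.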

C. Fermionic versions of the main results
To make Observation 1 about fermions precise, we introduce the setting of interacting fermions on lattices. For each site x ∈ V, the corresponding fermionic operators, i.e., the creation and annihilation operators $f_x^\dagger$ and $f_x$, act on the fermionic Fock space and satisfy
$$ \{f_x, f_y\} = 0 = \{f_x^\dagger, f_y^\dagger\}, \qquad \{f_x, f_y^\dagger\} = \delta_{x,y}, $$
where {A, B} := A B + B A is the anti-commutator. For such systems, all operators can be given in terms of polynomials in the fermionic operators. A monomial of fermionic operators is called even (odd) if it can be written as a product of an even (odd) number of fermionic operators $f_x$ and $f_y^\dagger$. A polynomial of fermionic operators is called even (odd) if it can be written as a linear combination of only even (odd) monomials, and an operator is called even (odd) if it can be written as an even (odd) polynomial of fermionic operators. According to the fermion number parity superselection rule, only operators that are even polynomials in the fermionic operators are physical observables and Hamiltonians.
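The algebraic setting can be made concrete with explicit matrices. The sketch below uses the Jordan-Wigner representation of the fermionic operators on three modes, which is a choice made here for illustration and is not used in the paper, and checks the canonical anticommutation relations as well as the behaviour of even and odd operators under the fermion number parity.

```python
import numpy as np
from functools import reduce

Z = np.diag([1.0, -1.0])
a = np.array([[0.0, 1.0], [0.0, 0.0]])  # single-mode annihilation operator
I2 = np.eye(2)

def annihilator(x, n):
    """Jordan-Wigner matrix for f_x on n modes: Z on sites before x, then a, then identities."""
    return reduce(np.kron, [Z] * x + [a] + [I2] * (n - x - 1))

def anticommutator(A, B):
    return A @ B + B @ A

n = 3
dim = 2**n
f = [annihilator(x, n) for x in range(n)]  # real matrices, so the adjoint is the transpose

car_ok = all(
    np.allclose(anticommutator(f[x], f[y]), 0)
    and np.allclose(anticommutator(f[x], f[y].T), (x == y) * np.eye(dim))
    for x in range(n) for y in range(n)
)

parity = reduce(np.kron, [Z] * n)            # product of (1 - 2 f_x^dagger f_x)
even_op = f[0].T @ f[1] + f[1].T @ f[0]      # hopping term, an even polynomial
odd_op = f[0]                                # a single fermionic operator, odd

even_commutes = np.allclose(parity @ even_op, even_op @ parity)
odd_anticommutes = np.allclose(parity @ odd_op, -odd_op @ parity)
print(car_ok, even_commutes, odd_anticommutes)
```

Even operators commute with the parity operator while odd ones anticommute with it, which is the matrix-level counterpart of the parity superselection rule mentioned above.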
As with spin lattice systems, we have again a finite interaction graph (V, E); however, the support of an operator is now to be understood in the picture of second quantization as follows: The support of any operator A being a polynomial in the fermionic operators is the set of vertices of the fermionic operators that occur in the polynomial. Correspondingly, we denote the algebra of the even operators supported on a region X ⊂ V by G X and denote G := G V for short. The Hamiltonian of a fermionic lattice system is of the form with h λ ∈ G λ . For B ⊂ V , the truncated Hamiltonian H B is similarly the sum only over the edges contained in B. As for spin systems, H ∂B is the sum over the boundary edges of B. Theorem 1 also holds for such fermionic lattice systems, and we can prove statements analogous to Corollary 1, Theorem 2, and Corollary 2. Hence, all implications stated in Section II, also hold. All proofs are presented in Section V C.

V. PROOFS
We start this section with the proofs of Theorems 1 and 2. One important stepping stone for the proof of the latter is a tailored version of a bound on a truncated cluster expansion (Lemma 1) from Ref. [40]. Both versions are proven in the Appendix. In the last part of the section, we prove the fermionic versions of our main results, Theorem 4 and Corollaries 4 and 5.

A. Proof of the perturbation formula (Theorem 1)
The two main ingredients in the proof of Theorem 1 are the fundamental theorem of calculus and Duhamel's formula.
The generalized covariance appears as a natural measure of correlations.
Proof of Theorem 1. Using the fundamental theorem of calculus, we obtain After applying Duhamel's formula to both derivatives, i.e., using that Together with the cyclicity of the trace and the definition of the generalized covariance in Eq. (2), this finishes the proof.
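The role of Duhamel's formula in this argument can be made explicit with a short worked computation. The notation here is an assumption made for illustration only: $H(s)$ is any differentiable one-parameter family of Hamiltonians, $H'(s) := dH(s)/ds$, and $g_s := e^{-\beta H(s)} / \operatorname{Tr} e^{-\beta H(s)}$ is its thermal state. Whether the resulting bracket agrees term by term with the generalized covariance of Eq. (2) depends on conventions fixed elsewhere in the paper.

```latex
\begin{align}
  \frac{d}{ds}\, e^{-\beta H(s)}
    &= -\beta \int_0^1 d\tau\;
       e^{-\tau \beta H(s)}\, H'(s)\, e^{-(1-\tau)\beta H(s)} ,
  \\
  \frac{d}{ds}\, \operatorname{Tr} e^{-\beta H(s)}
    &= -\beta\, \operatorname{Tr}\!\big[ e^{-\beta H(s)} H'(s) \big] ,
  \\
  \frac{d}{ds}\, \operatorname{Tr}\!\big[ A\, g_s \big]
    &= -\beta \int_0^1 d\tau\,
       \Big( \operatorname{Tr}\!\big[ g_s^{\tau}\, H'(s)\, g_s^{1-\tau}\, A \big]
             - \operatorname{Tr}\!\big[ g_s H'(s) \big]\,
               \operatorname{Tr}\!\big[ g_s A \big] \Big) .
\end{align}
```

The first line is Duhamel's formula, the second follows from it by cyclicity of the trace, and the third combines both via the quotient rule. Integrating the third line over $s$ is the step supplied by the fundamental theorem of calculus.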

B. Proof of Theorem 2 on clustering of correlations
The proof of Theorem 2 builds on and further develops a cluster expansion of the power series of $e^{-\beta H}$ in terms of summands of the form $h(w) := h_{w_1} h_{w_2} \cdots h_{w_{|w|}}$, where $w_j \in E$. For the sake of a compact presentation, we refer to edges as letters, to the edge set $E$ as an alphabet, and call sequences of edges words. For any sub-alphabet $F \subset E$, we denote by $F^* := \bigcup_{l=0}^{\infty} F^l$ the set of words with letters in $F$ and arbitrary length $l$, where the length $|w|$ of a word $w \in E^*$ is the total number of letters it contains. For two words $w, v \in E^*$, their concatenation is denoted by $w \circ v := (w_1, w_2, \dots, w_{|w|}, v_1, v_2, \dots, v_{|v|})$. We call a word $c \in E^*$ connected or a cluster if the set of letters in $c$ is an animal, i.e., connected. So, clusters are connected sequences of edges where the edges can also occur multiple times, while animals are sets of edges without any order or repetition. A word $v$ is called a sub-sequence of $w \in E^*$ if $v$ can be obtained from $w$ by omitting letters, i.e., if there is an increasing sequence $j_1 < j_2 < \dots < j_{|v|}$ such that $v_i = w_{j_i}$. This will be denoted by $v \subset w$. A connected sub-sequence $c \subset w$ is called a maximal cluster of $w$ if $c$ is not a sub-sequence of any other connected sub-sequence of $w$. Importantly, for any word $w \in E^*$, one can permute its letters to a new word $w'$ such that $h(w') = h(w)$ irrespective of the choice of the local terms $h_\lambda$ and such that $w' = c_1 \circ c_2 \circ \dots \circ c_k$ is a concatenation of maximal clusters $c_j \subset w$ of $w$. Note that this decomposition is unique up to the order of the $c_j$.
In the following, we will consider systems that are either $n = 2$ or $n = 4$ copies of the original system with Hilbert space $\mathcal{H}$. For any operator $A$ on $\mathcal{H}$, we denote by $A^{(j)}$ the operator on $\mathcal{H}^{\otimes n}$ that acts as $A$ on the $j$th copy, e.g., $A^{(2)} := \mathbb{1} \otimes A$ for $n = 2$. By $S_{i,j}$, we denote the swap operator on $\mathcal{H}^{\otimes n}$ that swaps the $i$th and $j$th tensor factors, e.g., $S_{1,2} |k_1, k_2, k_3, k_4\rangle = |k_2, k_1, k_3, k_4\rangle$ for $n = 4$. For $n = 2$, we write $S$ instead of $S_{1,2}$.
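The decomposition of a word into maximal clusters can be sketched in a few lines. The example below is an illustration under stated assumptions, not the construction used in the proofs: hyperedges are modeled as `frozenset`s of integer vertex labels, two edges count as adjacent when they share a vertex, and the name `maximal_clusters` is hypothetical. The letters of each cluster keep their relative order from the original word, which is what allows the reordering without changing $h(w)$ when terms on disjoint supports commute.

```python
from typing import Dict, FrozenSet, List, Sequence

Edge = FrozenSet[int]  # a hyperedge, modeled as a set of vertex labels (assumption)


def maximal_clusters(word: Sequence[Edge]) -> List[List[Edge]]:
    """Split a word into its maximal clusters, ordered by first appearance."""
    letters = list(dict.fromkeys(word))  # distinct letters of the word
    parent = list(range(len(letters)))   # union-find forest over the letters

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge letters that overlap; the classes are the connected components
    # (animals) of the set of letters occurring in the word.
    for i in range(len(letters)):
        for j in range(i + 1, len(letters)):
            if letters[i] & letters[j]:
                parent[find(i)] = find(j)

    index = {edge: i for i, edge in enumerate(letters)}
    clusters: Dict[int, List[Edge]] = {}
    for edge in word:  # preserves the order of letters inside each cluster
        clusters.setdefault(find(index[edge]), []).append(edge)
    return list(clusters.values())


if __name__ == "__main__":
    a, b, c = frozenset({0, 1}), frozenset({1, 2}), frozenset({5, 6})
    # a and b overlap in vertex 1, c is disconnected from both.
    print(maximal_clusters([a, c, b, a, c]))
```

Concatenating the returned lists gives a reordered word $w' = c_1 \circ c_2 \circ \dots \circ c_k$ of the kind described above.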
We can now state the subsequent lemma, which is a bound on a truncated cluster expansion that is based on a more general, but for our purposes not tight enough bound, used previously in Ref. [40] (see Lemma 2 in the Appendix). The lemma will play an important role in the subsequent proof of Theorem 2.
λ . Consider two operators $A$ and $B$ on $\mathcal{H}$, define $b(x) := \alpha\, e^{|x|} (e^{|x|} - 1)$, and let $|\beta|$ be small enough such that $b(\beta J) < 1$. For some set of edges $F \subset E$, let $C_{\geq L}(F) \subset E^*$ be the set of words containing at least one cluster $c$ that contains at least one letter of $F$ and has size $|c| \geq L$, and let us denote the corresponding truncated cluster expansion of $e^{-\beta H}$ by with $\tilde h(w) := \tilde h_{w_1} \tilde h_{w_2} \cdots \tilde h_{w_{|w|}}$. Then, for all $\tau \in [0, 1]$,
In the Appendix, we provide a detailed proof of this lemma. The terms resulting from the expansion of the exponential series of $e^{-\beta H}$ are classified according to whether they contain a cluster of size at least $L$ that contains a letter from $F$. One can then show that there is a percolation transition at $\beta^* = b^{-1}(1)/(2J)$ such that for $|\beta| < \beta^*$, the contribution of long clusters is exponentially suppressed.
In the following proof of the exponential clustering, we will use the so-called swap-trick: For any two operators $A$ and $B$, it holds that $\operatorname{Tr}(A B) = \operatorname{Tr}[S\, (A \otimes B)]$ (39), which can be checked by a straightforward calculation.
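Since $b$ is strictly increasing for $x \geq 0$, the critical inverse temperature $\beta^* = b^{-1}(1)/(2J)$ can be written in closed form: setting $y = e^{x}$ in $\alpha\, y (y - 1) = 1$ gives $y = \big(1 + \sqrt{1 + 4/\alpha}\big)/2$. The sketch below evaluates this; the function names and the example value of $\alpha$ are assumptions made for illustration.

```python
import math


def b(x: float, alpha: float) -> float:
    """b(x) = alpha * e^{|x|} * (e^{|x|} - 1)."""
    y = math.exp(abs(x))
    return alpha * y * (y - 1.0)


def critical_inverse_temperature(alpha: float, J: float) -> float:
    """beta* = b^{-1}(1) / (2 J), with b inverted on x >= 0.

    Solving alpha * y * (y - 1) = 1 for y = e^x yields
    y = (1 + sqrt(1 + 4 / alpha)) / 2.
    """
    if alpha <= 0 or J <= 0:
        raise ValueError("alpha and J must be positive")
    y = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 / alpha))
    return math.log(y) / (2.0 * J)


if __name__ == "__main__":
    alpha = 8 * math.e  # example value: generic bound for R = 1, D = 2 (assumption)
    beta_star = critical_inverse_temperature(alpha, J=1.0)
    print(beta_star, b(2 * beta_star, alpha))  # second value is 1 up to rounding
```

For any $|\beta| < \beta^*$, the quantity $b(2\beta J)$ is below one, which is the regime in which the exponential suppression of long clusters applies.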
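The swap-trick, in its standard form $\operatorname{Tr}(AB) = \operatorname{Tr}[S (A \otimes B)]$, can be verified directly on small matrices. The following sketch uses plain nested lists with integer entries so that the comparison is exact; the helper names are hypothetical, and the basis ordering $|i, j\rangle \mapsto i d + j$ is a convention chosen for this example.

```python
from typing import List

Matrix = List[List[int]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Matrix product of nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def kron(X: Matrix, Y: Matrix) -> Matrix:
    """Kronecker (tensor) product X (x) Y."""
    return [[x * y for x in row_x for y in row_y] for row_x in X for row_y in Y]


def trace(X: Matrix) -> int:
    return sum(X[i][i] for i in range(len(X)))


def swap(d: int) -> Matrix:
    """Swap operator S|i, j> = |j, i> on two copies of a d-dimensional space."""
    S = [[0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            S[j * d + i][i * d + j] = 1  # column |i, j> is mapped to row |j, i>
    return S


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 5], [6, 7]]
    lhs = trace(matmul(A, B))
    rhs = trace(matmul(swap(2), kron(A, B)))
    print(lhs, rhs)  # both traces agree
```

Without the swap operator one would instead obtain $\operatorname{Tr}(A \otimes B) = \operatorname{Tr}(A) \operatorname{Tr}(B)$, which in general differs from $\operatorname{Tr}(AB)$.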
Proof of Theorem 2. Fix some $\tau \in [0, 1]$. For any operator As the first step, we write the covariance as Using the swap-trick (39) yields (see Figure 6) where $\rho_4 := \rho^{\tau} \otimes \rho^{\tau} \otimes \rho^{1-\tau} \otimes \rho^{1-\tau}$. For the case $\rho = g(\beta)$, the operator $\rho_4$ turns out to be Writing out $\rho_4$ as a power series yields w |w| .
Next, we argue that $t(w)$ vanishes whenever $w$ does not contain a cluster connecting the supports of $A$ and $B$. Without loss of generality, we assume that $|\partial A| \leq |\partial B|$ and consider $C_{\geq L}(\partial A)^c = E^* \setminus C_{\geq L}(\partial A)$, the set of words that do not contain a cluster of size $L := d(A, B)$ or larger that contains an edge in $\partial A$. The set $C_{\geq L}(\partial A)^c$ hence contains no words with clusters that connect $\operatorname{supp}(A)$ and $\operatorname{supp}(B)$. Any word $w \in C_{\geq L}(\partial A)^c$ can be replaced by a concatenation of two words $w_A$ and $w_B$ such that $\tilde h^{(+)}(w) = \tilde h^{(+)}(w_A)\, \tilde h^{(+)}(w_B)$, where $w_A$ contains all maximal clusters of $w$ that overlap with $\operatorname{supp}(A)$ and $w_B$ all other maximal clusters of $w$. The operators $\tilde h^{(+)}(w_A)$ and $\mathbb{1} \otimes \mathbb{1} \otimes B^{(-)} =: \hat B$, and $\tilde h^{(+)}(w_B)$ and $A^{(-)} \otimes \mathbb{1} \otimes \mathbb{1} =: \hat A$ then have disjoint supports, respectively, and the trace in Eq. (43) factorizes into a product of two traces, one over the subsystem $X := \operatorname{supp}(\hat A) \cup \operatorname{supp}(\tilde h^{(+)}(w_A))$ and the other over the rest of the system. It turns out that both vanish: By using the symmetries $\hat A = -S_{1,2} \hat A S_{1,2}$, $\tilde h^{(+)}(w_A) = S_{1,2} S_{3,4}\, \tilde h^{(+)}(w_A)\, S_{3,4} S_{1,2}$, $\hat A S_{3,4} = S_{3,4} \hat A$, and that $S_{i,j}^2 = \mathbb{1}$, one can show, e.g., that This implies that for every $w \in C_{\geq L}(\partial A)^c$, $t(w) \propto \operatorname{Tr}[S_{1,3} S_{2,4}\, \hat A\, \tilde h^{(+)}(w_A)] = 0$.
After applying Lemma 1 and using that $\|\hat A\|_\infty \leq 2 \|A\|_\infty$, and similarly for $B$, we obtain The fact that the condition $\beta < \beta^*$ is equivalent to $b(2 \beta J) < 1$ implies that $b(2 \beta J)^L$ decays exponentially with $L$. In order to obtain the desired exponential bound (11), we apply the bound ∀x ∈ [0, This guarantees the exponential bound (11) and finishes the proof.

C. Proofs of the fermionic versions of the main results
In order to also establish our main results for fermionic systems, we go through the proofs for spin systems and discuss the necessary modifications.
Proof of Theorem 4. We use the same tensor copy trick as in the proof of Theorem 2. Eq. (40) still holds in the fermionic setting. Note that the Hilbert space over which the trace is performed in Eq. (40) is not the Fock space of a system of 4 times the number of modes but the tensor product of four identical fermionic Fock spaces with the canonical inner product. This Hilbert space can be interpreted as that of a system of four types of fermionic particles that are each mutually indistinguishable and subject to (up to $\tau$-dependent prefactors) identical Hamiltonians but do not interact with each other and can be distinguished from each other. It is spanned by tensor products of Fock states. The state $g[H^{(+)}](\beta)$ is the thermal state of this system. Eq. (42) with $t(w)$ as defined in Eq. (43) still holds. Note that the swap operators swap tensor factors, not fermionic modes. Thus, they still satisfy the symmetry relations that are used to prove that only terms corresponding to words $w \in C_{\geq L}(\partial A)$ can contribute to the covariance.
It remains to show that Lemma 1 still holds in the fermionic setting. Lemmas 3 and 8 are purely combinatorial. Lemmas 4, 5, 6, 7, 9, and 10 only use the local boundedness of the Hamiltonian and that Hamiltonian terms with disjoint support commute. The same holds in the fermionic setting because the Hamiltonian terms must be physical operators, i.e., even polynomials in the fermionic operators. Hence all lemmas used in the proof of Lemma 1 carry over to the fermionic setting. It is then straightforward to see that the proof itself also goes through without any modifications.
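The key fact used here, that even fermionic operators with disjoint supports commute while odd ones do not, can be checked numerically. The sketch below represents the fermionic operators as matrices via a Jordan-Wigner transformation, which is an assumption introduced only for this illustration (the proofs do not rely on any particular representation); all function names are hypothetical.

```python
from functools import reduce
from typing import List

Matrix = List[List[int]]

LOWER: Matrix = [[0, 1], [0, 0]]  # single-mode annihilator in the basis (|0>, |1>)
Z: Matrix = [[1, 0], [0, -1]]     # parity operator of a single mode
I2: Matrix = [[1, 0], [0, 1]]


def kron(X: Matrix, Y: Matrix) -> Matrix:
    return [[x * y for x in row_x for y in row_y] for row_x in X for row_y in Y]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X: Matrix, Y: Matrix, sign: int = 1) -> Matrix:
    return [[a + sign * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def dagger(X: Matrix) -> Matrix:
    """Adjoint; all matrices here are real, so this is the transpose."""
    return [list(col) for col in zip(*X)]


def commutator(X: Matrix, Y: Matrix) -> Matrix:
    return add(matmul(X, Y), matmul(Y, X), sign=-1)


def anticommutator(X: Matrix, Y: Matrix) -> Matrix:
    return add(matmul(X, Y), matmul(Y, X))


def is_zero(X: Matrix) -> bool:
    return all(v == 0 for row in X for v in row)


def annihilator(k: int, n: int) -> Matrix:
    """Jordan-Wigner annihilator f_k on n modes: Z^(k) (x) LOWER (x) I^(n-k-1)."""
    return reduce(kron, [Z] * k + [LOWER] + [I2] * (n - k - 1))


def hopping(f: List[Matrix], x: int, y: int) -> Matrix:
    """Even operator f_x^dagger f_y + f_y^dagger f_x supported on {x, y}."""
    return add(matmul(dagger(f[x]), f[y]), matmul(dagger(f[y]), f[x]))


if __name__ == "__main__":
    f = [annihilator(k, 4) for k in range(4)]
    # Even terms on the disjoint supports {0, 1} and {2, 3} commute ...
    print(is_zero(commutator(hopping(f, 0, 1), hopping(f, 2, 3))))
    # ... whereas the odd operators f_0 and f_2 anticommute instead.
    print(is_zero(anticommutator(f[0], f[2])), is_zero(commutator(f[0], f[2])))
```

This is why restricting the Hamiltonian terms to even polynomials is enough for the lemmas that only use commutation of terms with disjoint support.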
Proof of Corollary 5. Tracing out $B^c$ in the second trace in Eq. (32) and bounding the integral yields Taking the supremum over all $A$ with $\|A\|_\infty = 1$ and $\operatorname{supp}(A) \subseteq S$ and using Theorem 4 finishes the proof.

VI. CONCLUSIONS
In this work, we clarify the limitations of a universal concept of scale-independent temperature by showing that temperature is intensive on a given length scale if and only if the correlations are negligible. The corresponding correlation measure turns out to also quantitatively capture the stability of thermal states against perturbations of the Hamiltonian. Moreover, we find a universal critical temperature above which correlations always decay exponentially with the distance. We compare our results to known results on phase transitions, comment on recent advances concerning thermalization in closed quantum systems (e.g., concerning the eigenstate thermalization hypothesis), and discuss known matrix product operator approximations of thermal states. More concretely, our results imply that at high enough temperatures, the error made when truncating a Hamiltonian at some distance away from the system of interest is exponentially suppressed with the distance. As a computational consequence, expectation values of local observables can be approximated efficiently.
The following discussion of cluster expansions is expected to be interesting in its own right, as it contains a rigorous formulation of the ideas outlined in Ref. [40]. We will provide a proof of the original statement used to establish Theorem 3 as well as of the tailored statement in Lemma 1, which is used to prove Theorem 2 on the clustering of correlations.
1. The original cluster expansion from Ref. [40]
The original cluster expansion is similar to Lemma 1 with just one copy of the system instead of two weighted ones:
Lemma 2 (Truncated cluster expansion [40]). Let $H = \sum_{\lambda \in E} h_\lambda$ be a local Hamiltonian with finite interaction graph $(V, E)$ having growth constant $\alpha$ and local interaction strength $J = \max_{\lambda \in E} \|h_\lambda\|_\infty$, and define $b(x) := \alpha\, e^{|x|} (e^{|x|} - 1)$. Moreover, let $\beta$ be small enough such that $b(\beta J) < 1$. For some subset of edges $F \subset E$, let $C_{\geq L}(F) \subset E^*$ be the set of words containing at least one cluster $c$ that contains at least one letter of $F$ and has size $|c| \geq L$, and denote the corresponding truncated cluster expansion by

Then
If one applies this lemma to the setting of Lemma 1, one obtains a bound similar to the one in Eq.
The purpose of this section is to prove Lemma 1. But, along the way, we also prove Lemma 2. In order to do so, we start with the introduction of some more notation, mainly concerning clusters and lattice animals. For $w \in E^*$ and any sub-alphabet $G \subset E$, we write $G \subset w$ if every letter in $G$ also occurs in $w$. By $G^c := E \setminus G$, we denote the complement of $G \subset E$. The extension of $G$ is defined to be $\bar G := \{\lambda \in E \mid \exists \lambda' \in G : \lambda' \cap \lambda \neq \emptyset\}$ and, similarly as for subsystems, its boundary is $\partial G := \bar G \setminus G$. Throughout the proof, we fix some subset of edges $F \subset E$. We denote by $C_{\geq L}(F) \subset E^*$ the set of words that contain at least one cluster $c$ with $c \cap F \neq \emptyset$ and $|c| \geq L$, and we denote by $C^k_{\geq L}(F)$ the set of words that contain exactly $k$ such clusters. Note that for an animal $G \subset E$, there exists a cluster $c \in E^*$ such that $G = \{\lambda \in c\}$, and if one imposes some order on $G$, one obtains a cluster. We denote by $A_{=l}(F)$ and $A_{\geq L}(F)$ the sets of animals that contain at least one edge of $F$ and are of size exactly $l$ or at least $L$, respectively. Moreover, we denote by $A^k_{\geq L}(F)$ the corresponding sets of $k$-fold animals, i.e., For a more compact notation, we write the terms in the exponential series as We will frequently use the following fact: For any Hamiltonian with a finite interaction graph $(V, E)$, the partial series over any set of words $W \subseteq E^*$ converges absolutely, i.e., In particular, this bound implies that the order of the terms in the series over any subset of words $W$ does not matter.
In the following proofs of Lemmas 1 and 2 we use several technical auxiliary lemmas, which we will only state and prove subsequently.
Proof of Lemma 1. During this proof, we indicate quantities corresponding to $\tilde H$ by a tilde accent, e.g., $\tilde f(w)$ is defined as in Eq. (A.4) but with respect to the local terms $\tilde h_\lambda$ of $\tilde H$, while $f(w)$ is defined with respect to the local terms $h_\lambda$ of $H$.
We start the proof by rearranging the terms in the series over $C_{\geq L}(F)$ in Eq. (37) according to the number of relevant clusters they contain and use Lemma 3 with $b_k$ being the series over $C^k_{\geq L}(F)$ to obtain Lemmas 5, 6, and 9 are the core of the proof. They define a series of operators $(\rho_m)_{m=1}^{\infty}$ that have a particularly useful form given in Lemma 10. This form exactly matches the series over $k$ in Eq. (A.8), which leads to the following identity: In the previous steps, the series over words has been rewritten as a series over $m$-fold animals. Lemma 7 provides a bound on $\tilde\rho(G)$ that, together with Eqs. (A.9) and (A.10), yields Now, a counting argument for lattice animals from Lemma 8 allows us to bound the series over $m$-fold animals $G$ in terms of a series of animals Using that the number of lattice animals $G$ with $G \cap F \neq \emptyset$ and of size $|G| = l$ is bounded by $|F|\, a_l$ [see Eq. (7)] and that $a_l \leq \alpha^l$ [see Eq. (8)], we obtain Performing the partial geometric series over $l$ with argument $b(\beta J) < 1$ and the exponential series over $m$ yields Eq. (A.2) and completes the proof.
Similarly, we prove Lemma 2:
Proof of Lemma 2. By the same argument that led us to Eq. (A.9) in the proof of Lemma 1, we obtain Applying the triangle inequality and using the bound on $\rho_m$ from Lemma 9 yields Performing the partial geometric series over $l$ with argument $b(\beta J) < 1$ and the exponential series over $m$ yields Eq. (A.2) and completes the proof.
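The two resummations named at the end of the proof are a geometric tail, $\sum_{l \geq L} b^l = b^L/(1-b)$ for $0 \leq b < 1$, and an exponential series without its zeroth term, $\sum_{m \geq 1} x^m/m! = e^x - 1$. The sketch below checks both numerically. It assumes, purely for illustration, that the series to be summed has the schematic form $\sum_{m \geq 1} \frac{1}{m!} \big(|F| \sum_{l \geq L} b^l\big)^m$; the displayed equations of the proof are not reproduced here, so this form and the function names are assumptions.

```python
import math


def geometric_tail(b: float, L: int) -> float:
    """Closed form of sum_{l >= L} b^l for 0 <= b < 1."""
    if not 0.0 <= b < 1.0:
        raise ValueError("the geometric series requires 0 <= b < 1")
    return b ** L / (1.0 - b)


def resummed(b: float, L: int, F_size: int) -> float:
    """exp(|F| b^L / (1 - b)) - 1, computed stably with expm1."""
    return math.expm1(F_size * geometric_tail(b, L))


def brute_force(b: float, L: int, F_size: int,
                l_max: int = 2000, m_max: int = 60) -> float:
    """Truncated double sum over l and m (assumed schematic form)."""
    tail = sum(b ** l for l in range(L, l_max))
    return sum((F_size * tail) ** m / math.factorial(m) for m in range(1, m_max))


if __name__ == "__main__":
    for L in (3, 6, 12):
        print(L, resummed(0.5, L, 2), brute_force(0.5, L, 2))
```

For fixed $b < 1$ the resummed expression decays exponentially in $L$, since $e^x - 1 \approx x$ for small $x$.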
We now prove various lemmas that are used in the previous proofs of Lemmas 1 and 2.
which we will use. We prove the identity by induction. $A_1 = B_1$ is easy to see. Under the assumption that $A_K = B_K$ for some $K \in \mathbb{N}$, we obtain where we have used Eq. (A.16) in the last step. This proves the lemma.
The goal of the following lemmas is to show that $\rho_m$ is well-defined and to upper bound it in 1-norm. The order of the lemmas is chosen in a way that makes clear that the two quantities $\rho_m$ and $\rho(G)$, which will be defined shortly, are actually well-defined.
We start with a 1-norm bound on the perturbed exponential series.
Lemma 4 (Eq. (21) from Ref. [40]). Let $H$ be a Hamiltonian with finite interaction graph $(V, E)$. For any sequence $(G_j)_{j=1}^{k}$ of sub-alphabets $G_j \subset E$,
Proof. The lemma is essentially a consequence of the Golden-Thompson inequality and the fact that the 1-norm of a positive operator coincides with its trace. Using first the Golden-Thompson and then Hölder's inequality, we obtain Iteration completes the proof.
We will use the following lemma to bound the operator norm of certain subseries of f (w).
The following lemma provides a factorization of the series $\rho(G)$ in Eq. (A.33) over words that have no letters on the boundary of an $m$-fold animal $G \in A^m_{=l}(F)$ and contain all letters in $G$, into $\exp(-\beta H_{(\bar G)^c})$, whose norm we have bounded in Lemma 4, times a product of operators $\eta(G_j)$.
The $\eta(G_j)$ are supported on the single animals $G_j$ composing the $m$-fold animal $G$. As we will see, a norm bound for $\eta(G_j)$ follows immediately from the previous lemma, which, in turn, also yields an upper bound on $\rho(G)$. The form of $\rho(G)$ given in Eq. (A.33) together with this upper bound plays an important role in the main cluster expansion.
Proof. To simplify the notation, we denote the relevant set of words that contain no letters in $\partial G$ and each letter in $G$ at least once by The idea is to group these words into subsets $[w] \subset W_{\supset G}$ that coincide on the connected components of $G$ and on $(\bar G)^c$ and correspondingly split up the series (A.33). We formalize this idea by introducing an equivalence relation on where, for any sub-alphabet $G' \subset E$, the restriction $w|_{G'}$ of a word $w \in E^*$ is obtained from $w$ by omitting all letters that are not in $G'$. Then, the size of each equivalence class $[w] \in W_{\supset G}/\!\sim$ is given by the multinomial coefficient .
is well-defined as a function on the classes. Let us denote the set of words over the alphabet $G_j$ that contain all letters at least once by Then, the quotient set can be identified with a Cartesian product of these sets For each equivalence class $K \in W_{\supset G}/\!\sim$ we pick an arbitrary representative $w_K \in W_{\supset G}$, use the definition of $f$ in Eq. (A.4), and determine that $k$ is the number of connected components of $G$. This yields Using the definition of $\eta$ from Eq. (A.32) on the last factors yields
The following lemma is a tighter variant of some of the original arguments leading to Lemma 2 for Hamiltonians consisting of two weighted copies of a local Hamiltonian. Its purpose is to provide a specialized tighter bound on $\rho(G)$, which turns out to be sufficient for our purposes. The central idea of the lemma is to expand $\rho(G)$ in the left-hand side of Eq. (A.41) in order to be able to bound the trace using the generalized Hölder's inequality.
Proof. Let us denote k Bounding the trace by the 1-norm and applying Hölder's inequality generalized to several operators yields where in the second step, we have used that X p = |X| where in the second to last step, we have factorized the series and in the last step, we have used Lemma 5.
We will need the following combinatorial lemma:
In the following lemma, we define a family of operators $\rho_m$ and bound their 1-norms. The bounds, in particular, guarantee that the $\rho_m$ are well-defined. In addition, they are useful for the proof of Lemma 2, although they are not explicitly needed for the proof of Lemma 1.
Proof. Consider a $k$-fold animal $G \in A^k_{\geq L}(F)$ and decompose it into its $k$ non-overlapping animals $G_j \in A_{\geq L}(F)$ as
While the last lemma provides a bound on $\rho_m$ and, in particular, implies that $\rho_m$ is well-defined, the next lemma provides a useful form of $\rho_m$.
i.e., into sets of words having exactly $k \geq m$ such clusters. Then, the observation that for every $w \in W^k(G)$ there are exactly $\binom{k}{m}$ many $m$-fold animals $G \in A^m_{\geq L}(F)$ with $w \subset G$ completes the proof.
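The counting step at the end of this proof is the elementary fact that $k$ pairwise non-overlapping clusters admit exactly $\binom{k}{m}$ selections of $m$ of them. The sketch below enumerates these selections explicitly; it treats an $m$-fold animal simply as a choice of $m$ out of $k$ non-overlapping animals, which is a simplifying assumption for illustration, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb
from typing import FrozenSet, List, Sequence, Tuple

Animal = FrozenSet[str]  # an animal, modeled as a set of edge labels (assumption)


def m_fold_selections(animals: Sequence[Animal], m: int) -> List[Tuple[Animal, ...]]:
    """All ways to pick m of the given pairwise non-overlapping animals."""
    return list(combinations(animals, m))


if __name__ == "__main__":
    # k = 4 non-overlapping animals, e.g. the letter sets of the relevant
    # maximal clusters of some word.
    animals = [frozenset({"a", "b"}), frozenset({"c"}),
               frozenset({"d", "e"}), frozenset({"f"})]
    for m in range(1, len(animals) + 1):
        print(m, len(m_fold_selections(animals, m)), comb(len(animals), m))
```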