Quantum many-body systems in thermal equilibrium

The thermal or equilibrium ensemble is one of the most ubiquitous states of matter. For models comprised of many locally interacting quantum particles, it describes a wide range of physical situations, relevant to condensed matter physics, high energy physics, quantum chemistry and quantum computing, among others. We give a pedagogical overview of some of the most important universal features about the physics and complexity of these states, which have the locality of the Hamiltonian at its core. We focus on mathematically rigorous statements, many of them inspired by ideas and tools from quantum information theory. These include bounds on their correlations, the form of the subsystems, various statistical properties, and the performance of classical and quantum algorithms. We also include a summary of a few of the most important technical tools, as well as some self-contained proofs.

We are currently at the dawn of the age of synthetic quantum matter.Increasingly better experiments on a variety of quantum platforms are improving in size and arXiv:2204.08349v2[quant-ph] 21 Jul 2023 controllability at unprecedented rates, aided by the current impulse of quantum information science and technology.This gives very good prospects to the exploration of the physics of complex quantum many body systems.Our aspiration to better understand these systems is very well motivated from a scientific perspective, but also potentially from the industrial one: unlocking the potential of complex quantum systems may bring surprising advances to the engineering of new materials or chemical compounds in the future.It may also yield computational tools with unprecedented capabilities for a still unknown range of applications.
Many of the most commonly studied materials and current experimental platforms are described by an arrangement of quantum particles in some sort of lattice configuration.Due to the spatial decay of electromagnetic forces, each of these particles only interacts appreciably with their immediate vicinity, which causes the interactions between them to be local.
In this tutorial, we focus on the properties of these important systems when at thermal equilibrium, so that they are accurately described by the so-called thermal or Gibbs state.We review and explain some of their most important universal properties, covered from a mathematical perspective.That is, we focus on statements that can be proven about states of the Gibbs form where H = l E l |E l E l | is the Hamiltonian, β is the inverse temperature and Z ≡ Tr[e −βH ] is the partition function.The Hamiltonian describes the interactions between the N particles, which are restricted to shortranged or local.A "local Hamiltonian" is a Hermitian operator H in the finite-dimensional Hilbert space of N d-dimensional particles (C d ) ⊗N .It is defined as a sum of terms each of which has support (i.e.acts non-trivially) on at most k particles, and bounded strength, such that For a definition of the operator norm || • || see Sec.II A below.Typically, the Hamiltonians are scaled so that h = O (1).
In what follows we just write the terms as h i for simplicity.These constitute the individual interactions, which are typically arranged in a lattice of a small dimension.A simple example is e.g. the transverse-field Ising model in one dimension with open boundary conditions Here, k = 2 and the interactions are arranged on a 1D chain.
The idea of a local Hamiltonian is very general, and involves many different models describing a wide range of situations, of interest for many fields of physics, chemistry and computer science.The only thing they have in common is the locality of the interactions.We aim to understand mathematically how this fact alone constrains both the physics and the computational complexity when combined with thermal fluctuations.
The thermal states of these general local Hamiltonians appear in many different contexts, and are interesting for a wide variety of reasons.Some of the main ones are: • It is one of the most ubiquitous states of quantum matter: typical experiments happen at finite temperature, where the quantum system at hand is weakly coupled to some external radiation field or phonon bath, that drive it to the thermal state.For completeness, we sketch the standard argument of how the weak coupling assumption leads to states of the Gibbs form in App.A 1.
• The thermal state is also important when studying not just systems with an external bath, but also in the evolution of isolated quantum systems, even when their global state is pure: in very generic cases, these end up being "their own bath", and the individual subsystems thermalize to the Gibbs ensemble [1,2].
• From a general condensed matter/material science standpoint, we are very interested in numerous questions about the physics at finite temperature: How are conserved quantities (e.g.charge, energy) propagated in a state close to equilibrium?How does the system respond to small or large perturbations away from equilibrium?
• Systems at thermal equilibrium (both quantum and classical) display interesting phase transitions in certain (low) temperature regimes (e.g.classical Ising model in 2D).It is thus relevant to study what are their universal properties both in and away from the critical points.
• They are also important from the point of view of quantum phases of matter and topological order at zero temperature.It has been widely established that thermal states of local models in dimension D − 1 appear in the entanglement spectrum of Ddimensional ground states [3].As such, understanding their structure should also help us in elucidating the low energy behaviour of many interesting systems.
• They very naturally appear in information theory and inference as the distributions that best reproduce partial current knowledge of a system.This is justified by Jaynes' principle, which we explain in App.A 2.
• These states are also important for computation.
For instance, being able to sample from the thermal distribution of local models is a typical subroutine for certain classical and quantum algorithms [4][5][6][7].They are also a very naturally occurring data structure in both classical and quantum machine learning [8][9][10][11][12] (often under the name of Boltzmann machines).
• It is known from quantum computational complexity that the low energy subspace of local Hamiltonians is able to encode the solution to very hard computational problems: finding the ground state energy is QMA complete [13].Thus, it is widely believed that even a quantum computer should not be able to do it in polynomial time.This then at least also applies to the thermal state at very low temperature, and motivates the study of how the complexity changes as the temperature rises [14,15].
There are many different specific aspects that one could explore, but here we focus on the following, which we believe to be of particular importance: • The correlations between the particles at different parts of the lattice.In particular, how does the structure of those correlations are structured in relation to the geometry of the lattice.
• The states of the subsystems that a thermal state can take, and how are they related to few-particle Gibbs states.
• The statistical physics properties of these systems at equilibrium, including Jaynes' principle, concentration bounds and equivalence of ensembles.
• The efficiency of classical and quantum algorithms for the generation and manipulation of thermal states, and the computation of expectation values and partition functions.
More specifically, we focus on these topics for a broad family of Gibbs states that can be understood as being away from phase transitions within the phase space.It is for this region of the parameter space of Hamiltonians that the mathematical results described here are typically more tractable and give insightful results.We describe this further in Sec.I B.
The general topic of this tutorial, and the particular results explained here, are a small part of the exciting past, present and future efforts to understand the physics and complexity of quantum many-body systems.We hope to contribute to the understanding and cross-fertilization of the many different angles that the quantum many-body problem can take.See also e.g.[1,16] for previous references with partially overlapping content.

A. Scope and content
Throughout this tutorial, we cover statements that have a precise mathematical formulation, many of them motivated by a quantum information theoretic perspective.This notably includes a short exposition of a few key technical tools in Sec.III.These have not previously appeared together, but rather separately explained in the literature with various levels of detail, depending on the context and usage.We hope that this encourages new, potentially unexpected, applications thereof.
Along the sections with the actual physical and computational results, we write the proofs of some of the simpler or more important ones explained throughout.This includes at least one main result per section, which should serve as a pedagogical example.For the rest, some of which have more detailed or involved derivations, we refer the reader to the original works cited along the text.One of our main hopes is that after reading this tutorial even the more technical works will be more easily accessible to a wider range of researchers.Because of this, rather than the traditional Theorem-Proof structure of most mathematical physics writing, we have chosen a more streamlined style for the presentation which allows for more physical explanations and intuitions of the steps.This will hopefully contribute to a wider readability.
Most of what we describe are known results, with at most some small improvements, with the proofs either being the same or slightly simplified versions of previous ones.The relevant references are included, but this does not mean that all of the previous relevant ones are listed here: we are certainly missing to mention a very large body of work.This includes many relevant papers on mathematical physics, but also a lot of important physics literature that covers these topics from perspectives that are beyond our scope: based on numerical methods, theory work on experimental implementations, as well as all experimental results.

B. Completely analytical interactions
Before we proceed, we should put the content of this tutorial into context more precisely.In the mathematical study of many body systems, one of the biggest points of interest are phase transitions, including the study of order parameters, symmetry breaking, and other very well established ideas which aim at classifying the possible kinds of phase transitions.For instance, there are important models, such as the paradigmatic Ising in two and three dimensions, that have a very well understood phase transition at a given temperature.
However, for many classes of models and most regions of their parameter space, local Hamiltonians do not have a thermal phase transition.This "one phase region" (as it is sometimes referred to [17]) likely contains the "simplest" cases of thermal equilibrium, in-cluding those of non-interacting gases.These situations are typically characterized by the analiticity of the partition function and other closely related facts, such as: • The convergence of the cluster expansion.
• The localization of correlations in the lattice, and absence of long range order.
• The approximation of marginals with local Gibbs states, and the idea of locality of temperature.
• Concentration properties of local observables.
• Efficiency of approximation, either with quantum or classical algorithms.
• The existence and boundedness of log-Sobolev constants.
It is expected that many or all of these simplifying facts are equivalent, in that a model that obeys one (such as the analiticity of the partition function) will also display the other features.In the classical case, a large number of conditions are known to be equivalent to the analiticity of the partition function.The study of this problem was initiated by the seminal work of Dobrushing and Shlosman [18], aiming at characterizing these "completely analytical interactions" in terms of many diferent equivalent conditions (12 in the original article).In the quantum case, much less is known about the equivalence of the analiticity of the partition function with other physical facts, although some important steps have been taken (see e.g.[19,20]).
The main aim of this tutorial is to cover results that show that Gibbs states have simplifying features with respect to generic quantum states.Following that spirit, most (although not all) of the results explained here apply to this "phase" or universality class of Gibbs states, in which those simplifying facts are expected to hold.In fact, every element of the list above is individually considered in each of the sections below.Because of that, we do not cover an important part of the literature where analytical results are typically much harder to obtain.For instance, those studying the effect of phase transitions in e.g. the simulability of Gibbs states, the types of correlations that can arise, and others.

A. Operator norms
A basic but very important mathematical tool in this context are the Schatten p-norms for operators, as well as the different inequalities between them.These norms are maps from the space of operators to R, as M → ||M || p , that obey the following properties: For a given operator M with singular values {α M l } and p ∈ [1, ∞), they are defined as The more important ones are the operator norm and the 1-norm or trace norm Thus Typically we measure the "strength" of an observable with the operator norm, and the closeness of two quantum states with the trace norm ||ρ − σ|| 1 , since it is related to the probability of distinguishing them under measurements.The 2-norm, on the other hand, is often the easiest one to compute in practice.Also note the very important H ölder's inequality which holds for1 p = 1 q1 + 1 q2 (e.g.p = q 1 = 1, q 2 = ∞).A particularly useful corollary is the Cauchy-Schwarz inequality, when q 1 = q 2 = 2 and p = 1,

B. Information-theoretic quantities
Let us define the von Neumann entropy of a quantum state which, roughly speaking, quantifies the uncertainty we have about the particular state.It is bounded by 0 ≤ S(ρ) ≤ log d.The lower bound is obtained by choosing ρ pure, and the upper bound by the identity ρ = I/d.Another important quantity is the Umegaki relative entropy which is a measure of distinguishability of quantum states.It obeys Pinsker's inequality which relates the relative entropy with the 1-norm.It is strictly positive for ρ = σ, and vanishes otherwise.It is also closely related to the non-equilibrium free energy which also shows that the equilibrium free energy is This distance measure also naturally appears in the derivation of Jayne's principle, as shown in App.A 2.
From these quantities we can also define the quantum mutual information, which, given a bipartite state ρ AB on subsystems A and B, with

quantifies the correlations between
A and B as In particular, it is zero if and only if ρ AB = ρ A ⊗ ρ B .For all these three functions we can also define their corresponding Rényi generalizations.See [21,22] for details.
A further, perhaps more refined quantity is the conditional mutual information (CMI), defined as This perhaps less known quantity is behind many nontrivial statements in quantum information theory (see Section 11.7 in [23] for more details).In a nutshell, it measures how much A and C share correlations that are not mediated by B. That is, if this quantity is small, most of the correlations between A and C (which may be weak) are in reality correlations between A and B and B and C.

C. Lattice notation
In what follows we need some technical definitions regarding the properties of the Hamiltonian and the lattice.The lattice is a hypergraph which we denote by Λ = {V, E} with vertex set V and hyperedges E. To each vertex we associate a Hilbert space of dimension d, C d .The number of particles is N = |V |, and the number of hyperedges is |E|.The locality of the Hamiltonian can be expressed by a parameter d, defined as the as the largest number of hyperedges adjacent to any individual hyperedge.
We can separate the vertices into subregions, such as V A , and we denote with ∂ A ∈ V A the sites at the boundary of that region (that is, with at least one hyperedge connecting to \Λ A ), of which there are |∂ A |.For simplicity, we often refer to regions as A, B, .. instead of V A , V B , ....We also need the notion of "distance" between two regions, dist(A, B), defined as the smallest number of overlapping hyperedges that connect a vertex from A with a vertex from B.
To define the Hamiltonian, we associate local interactions to hyperedges, such that H = i∈E h i .For an operator h i , the set of vertices on which it has non-trivial support is supp(h i ).We have already specified that each h i is such that |supp(h i )| ≤ k (that is, the hyperedges have size at most k), so that for constant k, N ∝ |E|.We also note that that ||h i || ≤ h and introduce the following quantity that is, J upper bounds the norm of the interactions that act on an individual vertex.

D. Asymptotic notation
The so-called asymptotic or Bachmann-Landau notation succinctly describes the asymptotic behaviour of a function when the argument grows large.It is typically used when in a particular expression there are constant factors that we are happy to omit, that are unnecessarily cumbersome, or when we only have partial knowledge of the exact expression but know the asymptotic behaviour.We say that , given functions f (N ), g(N ) ≥ 0: ) is similar to O(g(N )) but with possible additional poly-logarithmic factors, so that instead ∀N > N 0 , f (N ) ≤ M g(N )polylog(g(N )).
• f (N ) = o(g(N ) if for every ε > 0 there exists a These are the most commonly used symbols of this notation, all of which appear below.

III. AN OVERVIEW OF TECHNICAL TOOLS
When studying quantum thermal states from a mathematical point of view, what we often need is some way of simplifying the operator e −βH , in a way that makes the particular problem at hand mathematically tractable.This is usually achieved by expressing the relevant function of e −βH in simpler terms.Potential issues that complicate this are: 1.The exponential of a local operator is not a local operator, due to the high order terms in the expansion, and could in principle be arbitrarily complicated.
2. The individual elements in the Hamiltonian Eq.
(2) do not commute with each other.Thus we cannot divide the exponential of the Hamiltonian into smaller pieces by iterating simple identities like e −β(H1+H2) ?= e −βH1 e −βH2 .
The locality of the Hamiltonian helps make these two problems often not as serious as they could be in general situations.There is a number of tools to deal with this, and we now describe some of the most relevant ones.Below, we explain how the cluster expansion in Sec.III A helps with issue 1, while there are at least two different techniques in Sec.III B and III C that help us with issue 2.

A. Connected cluster expansion
This is a powerful set of ideas whose origins can be traced back to a wide set of the classic (and classical) literature on mathematical physics and statistical mechanics (see e.g.[24]) , initiated in [25].It has traditionally been used to prove the analyticity of the partition function at high temperatures and other regimes, so it serves as an ideal tool to characterize the completely analytical interactions from Sec.I B.More recently, it has also been used to study the existence of computationally efficient approximation schemes to it (see e.g.[26]).The technique is flexible and general enough that it can also cover objects beyond partition functions, such as characteristic functions and other related quantities.
For simplicity we here focus on the high temperature expansion 2 .The starting point is the logarithm of the partition function log Z ≡ log Tr[e −βH ].Let us consider its Taylor expansion around One can then ask: what is the radius of convergence of this Taylor series?More precisely, we would like to know whether there is some β * independent of the system size such that for 0 ≤ β < β * we have that: • The function log Z is analytic.
• The m-th derivative at β = 0 is such that FIG. 1: Illustration of the clusters, defined as a multiset of interaction terms or hyperedges.In this example, the interaction is a graph on a square lattice.The cluster on the left is connected W ∈ G m , while the one on the right is disconnected.The thickness of the lines represents the multiplicities µ W i of the edges, which may appear any number of times in a cluster as long as i µ W i = m.
for some constant C 1 .
• The truncated Taylor series gives a good approximation as There are various ways to narrow down the radius of convergence of this series, but they all revolve around the idea of writing log Z in terms of connected clusters.
A cluster is a multiset (that is, a set counting multiplicities) of Hamiltonian terms h i (or alternatively, of hyperedges {i ∈ E}), which can appear more than once.A given cluster W has size |W| equal to the number of elements in the multiset (counting multiplicities µ W i , so that |W| = {i∈W} µ W i ).Moreover, W is connected if the hypergraph with hyperedges i ∈ W is connected.Let us define the set of all clusters of size at most |W| = m with C m , and the set of all connected clusters as G m .For instance, G 1 is the set of {h i }, G 2 are the pairs {h i , h j } provided that i, j are adjacent or i = j (in which case µ W i = 2).See Fig. 1 further illustrations of a connected and a disconnected cluster.Now, let us define the Hamiltonian with auxiliary variables {λ i } as H(λ) = i λ i h i .We use this to introduce the cluster derivative Here, the subscript λ = 0 means to set λ i = 0 for all i after taking the derivatives.We thus write What we have done here is to simply write each moment of the Taylor series as a sum of the contributions of all clusters W, without further specifying what each contribution looks like.We now prove the key simplification stemming from this expression.Let W / ∈ G m , so that we have W = W 1 ∪ W 2 where W 1 , W 2 are non-overlapping clusters.This allowsus to define h W1 , h W2 as the Hamiltonian terms in those clusters, so that supp(h W1 ) ∩ supp(h) W2 = Ø.We then have This means we can write the moments in terms of connected clusters only This reduces the number of contributions to K m dramatically, and makes it possible to estimate them.One way to show the convergence of the series (see [28][29][30]) is to prove the following: • The number of connected clusters of size m is bounded by N c m 1 for some constant c 1 [31,32].
• The size of each cluster derivative for a cluster of size |W| = m is at most for some constant c 2 [29,30].
The constants here can usually be taken to be simple functions of the lattice parameters d, k, J, h and of some property of the interaction graph.For instance, in [29,30], it is shown that c 1 = ed, and that c 2 = 2eh(d + 1) (see Eq. (3) for the definition of h).These facts together imply that |K m | ≤ 2e 2 hd(d + 1) ≡ (β * ) −1 O(h), so that the partition function is analytic within a disk in the complex plane of radius β < β * , and is also well approximated by its Taylor series.
Beyond this argument for the convergence of the series, there are other more general abstract methods for proving convergence of this type of quantity, in terms of the so-called polymer models [26,[32][33][34].See Chapter 5 of [27] for an introduction.
So far we have only discussed convergence of the series.However, the cluster expansion can be used to device efficient approximation schemes.The main idea is to prove that the individual Taylor terms can be computed efficiently.This requires two separate steps: • The set of all clusters of size m can be enumerated in time poly(N ) × exp(O(m)) [29,35] • Each cluster derivative can be computed exactly in time poly(N ) × exp(O(m)) [26,29,36].
We can thus add all the contributions from the different derivatives to obtain K m in time poly(N ) × exp(O(m)).This, together with Eq. ( 19), implies that by calculating the Taylor series up to a degree M = O (log(N/ ) × log(β * /β)) there exists an -close additive approximation to log Z that can be computed in time poly(N, −1 ).
We do not expect to be able to prove many general statements at all temperatures, due to the presence of thermal phase transitions, and to the fact that the ground state energy is computationally hard to estimate [13].However, there are specific models in the literature for which the convergence can be guaranteed for larger ranges of temperatures (see e.g.[37] and references therein).See Sec.VII B for more details.
This same technique also allows for e.g. the computation of expectation values such as Tr[h i e −βH /Z] by differentiating by an extra λ i in the cluster derivative.It can be also applied to other similar objects such as characteristic functions of the form Tr[e αA e −βH /Z] for some local observable A, which allows for the derivation of probability theory statements, as explained further in Sec.VI A.

B. Thermal locality estimates
We now show the first method to decompose the thermal state into a product of smaller operators despite the non-commutativity, which is related to the general idea of operator growth.Consider an operator A with local support on some small region on the lattice.For simplicity, let this region be such that |supp(A)| ≤ k.
An interesting quantity to study is the operator evolved in Euclidean or imaginary time β under the Hamiltonian H, This is in analogy with the Heisenberg-picture operator A(t) = e itH Ae −itH , which can be understood in terms of the well-known Lieb-Robinson bounds [38], that state that the support of A(t) is mostly confined to a linear lightcone.In many situations, one will want to choose A here to be one of the h i operators.
It then makes sense to ask the following question: what is the locality of the Euclidean-evolved operator A(iβ)?Perhaps surprisingly, this can be dramatically different to the real-time case: there is no general linear growth with the inverse temperature β, but a much wilder dependence on it.
The main difference is that e −itH is a unitary matrix, while e −βH is not.This means that results that exploit unitarity, such as the aforementioned Lieb-Robinson bounds, do not apply straightforwardly.Our best way forward seems then to analyze A(iβ) in terms of nested commutators It is easy to see that the m-th term in this expansion has support on a connected region whose furthermost point is a distance m away from A. The question then becomes: how does this expansion in terms of β converge?
We now discuss how the answer to this question (either high temperatures and 1D) appears to also be restricted to the one-phase region of completely analytical interactions from Sec.I B.
It can be shown that, for general interaction graphs, [24,39] with J, k as defined in Sec.II C.This statement is very much related to the bound on the number of connected clusters in Sec.III A above, since only connected clusters contribute to the nested commutators.Eq. ( 27) implies that as long as β < (2Jk), the expansion can be controlled as a geometric series, from which we obtain Given that the m-th nested commutator can have support on at most k×m sites, the latter equation means that A(iβ) is, roughly speaking, localized within the subset of vertices a distance at most k × m away from supp(A).
It is known that one cannot extend this result to temperatures lower than β O(1), since there exists a 2D lattice in which the terms in the nested commutators in Eq. ( 26) add up constructively, in a way that the norm of A(iβ) grows with system size, and diverges as N → ∞ [40].
On the other hand, it has been known for some time [41] that when the lattice is a one-dimensional chain, the nested commutators grow more slowly, so that this type of convergence happens for all temperatures.For simplicity, we show explicitly the result for k = 2 combining [42] and [40], which is Here, we have defined f (β, J) ≡ 16βJ exp(1 + 8βJ) and g(β, J) ≡ exp(240e 2 βJ) − 13 .The intuitive reason for these is that the geometric bound of Eq. ( 27) can be improved in this case as [40] (again, for k = 2) Notice that, because of the logarithm, the series in Eq. ( 26) is not geometric, and converges for all β.For further explanations of these points see also [43].
So far, we have described how does A(iβ) approximate its Taylor expansion.An alternative approximation commonly considered is to the operator e −βHΛ m Ae βHΛ m , where H Λm = supp(hi)∈Λm h i and Λ m is a subset of the hypergraph corresponding to some region V m of the full lattice Λ.One can then consider how the norm decays with m in terms of how Λ m is defined (typically, some hyper-sphere centered around A).The analysis and convergence turn out to be almost the same as the one for the moments C m (A) above.The reason is that the difference between M m=0 β m C m (A) and e −βHΛ m Ae βHΛ m are essentially the higher order terms in β of the latter, which are also suppressed.See e.g.[42] for a detailed analysis of the 1D case or e.g.Lemma 20 in [44] for a proof in higher dimensions.
One of the main reasons why both of these approximations are interesting is that they are related to the following propagator where A(s) = e −sH Ae sH and T denotes the usual timeordered integral.This is such that This operator E A can be used, for instance, to decompose e −βH as a product of its parts by e.g.choosing A as the Hamiltonian at the boundary of two regions.This operator can be analyzed through a usual Dyson series in terms of powers of e −xH Ae xH .Assuming β < (2Jk) −1 , it can be shown that E A has bounded norm as it follows from Eq. ( 28) and ( 34) that In Appendix B we also show that it is approximately localized in a similar way as e −xH Ae xH is.This means that there exist an operator E A (l) with support restricted to a distance at most l away from A such that for β < Also, notice that if [H, A] = 0, then E A = e −βA .With the right choice of H, A, the operator E A can thus be thought of as a "transfer operator".Corresponding results also exists for 1D using Eq. ( 30) and ( 31).Alternatively, one can also define the following operator with the difference that H and A are now treated on equal footing.In this case, E A is just the multiplicative error term in the first order Trotter product formula, which can be similarly analyzed through the expansion of A(iβ) (see the thorough analysis of Trotter errors in [45] for more details).These Trotter errors are most commonly analyzed in the context of digital quantum simulation [46], for which it is often convenient to go to higher orders in the decomposition.We finish this subsection outlining a result in 1D related to this discussion, which follows from bounds on the quantity in Eq. (33).It appeared first in [41], and it features in Sections V A 1 and VII A. Let us define E l A = e −β(H l +A) e βH l , where H l are the interaction terms a distance at most l away from supp(A).It can be shown that where C 1 , C 2 and q > 1 are constants depending on k, J, β which we do not show explicitly for simplicity, although notice that C 1 will be essentially the exponential of Eq. ( 30) .The proof is similar to that of Appendix B, together with a bound on Eq. (33).We refer the reader to e.g.[41,42] for further details.
The approximations E A (l) and E l A to the operator E A are important in that they allow us to decompose e −βH into a product of smaller local operators despite the Hamiltonian being non-commuting.For instance, they will be useful in the arguments of Sec.V A 1.

C. Quantum belief propagation
An idea related to the previous locality estimates appeared first [47], and has more recently featured in several results about Gibbs states on lattices [12,19,44,[48][49][50][51].It is a tool similar to that of the previous Sec.III B, in that it also allows us to decompose the thermal state as a product of smaller, localized operators, which makes certain calculations more tractable.The goal is to be able to divide the big operator e −βH into smaller pieces, that allow, for instance, to prove that the Gibbs state can be sequentially generated, or that local perturbations only have effect in the near vicinity.This is part of a celebrated series of works including the decay of correlations for gapped ground states [52] or the area law of entanglement in one dimension [53] which show how Lieb-Robinson bounds (a dynamical statement) can be used to prove static properties about ground and thermal states.
The derivation here is a particular example of that idea, but see [54,55] for overviews that go beyond Gibbs states.
The goal is to construct a quasi-local operator O m A (to be defined below) with support near supp(A) such that Notice the difference with Eq. ( 35), where we only multiply e −βH with an operator from the left.We start by considering the "perturbed" Hamiltonian H(s) = H + sA and the following derivative where, if where βπ log e π|t| β+1 e π|t| β−1 the Fourier transform of fβ (ω) (see Appendix B of [12]), we can also write The proof leading to Eq. ( 42) that explains hte appear- (A)|| ≤ ||A|| by the triangle inequality and Eq.(B16).Moreover, it can also be approximated by a localized operator around the support of A by using Lieb-Robinson bounds [38].In particular, when H(s) is local, it can be shown that [56] where H Λm (s) is the restriction of H(s) to the sum of local terms that are at most a distance m away from the support of A, v, b and c are constants, and D is the dimension of the interaction lattice.This should be reminiscent of the H Λm appearing in Eq. ( 33), with the only difference being that now the evolution is for real times.
Eq. ( 45) allows us to establish that after time-evolving A for a short time, the support is still approximately localized, with a radius growing with time.This also allows us to define Φ where in the first line, after the triangle inequality, we divided the integral into two different ranges, and used Eq. ( 45) in the first range and ||Φ

H(s) β
(A)|| ≤ ||A|| in the second.The bounds on the integrals were obtained using the properties of f β (t) from Eq. ( B16) and (B18).As a result, for m large enough, the difference between the two operators is exponentially decaying in m.
We can now integrate Eq. ( 42) between s = 0 and s = 1 to obtain where Similarly to Eq. ( 34) above (see the analyisis of the operator E A in Appendix B), this operator has a bounded norm, since Additionally, it is also approximately localized exactly around the support of A. Let us define the operator O m A in the natural way We can use an argument analogous to that used in App.B to show that O A and O m A are exponentially close, as, given Eq. ( 46), for large enough m, ≤ β||A||e β||A|| 2 −Ω(m) .
These bounds can be compared to Eq. ( 39) and ( 40), which are of a very similar nature.There are, however, two important differences between O A and E A in Eq. (34) above: • Since it is based on the Lieb-Robinson bound, the operator O A is well-behaved in all lattices and at all temperatures, in the sense that it has a bounded norm and is approximately localized.This is as opposed to E A , which is likely a large operator in high dimensions and low temperatures.That is, this result holds for all Gibbs states, irrespective of whether they are completely analytical interactions or not.
• On the other hand, to recover e −β(H+A) from e −βH we require left and right multiplication with O A , O † A , as opposed to Eq. ( 35), which may be problematic in some applications.In particular, we should not expect it to be a key ingredient in proving results that do not hold at all temperatures such as the decay of correlations or the analyticity of the partition function.

D. Selected trace inequalities
In the past two subsections we have explored how to analyze perturbations to the Hamiltonian in the Gibbs operator e −β(H+A) via E A and O A .When considering traces, simpler identities hold.We exemplify this with two with very elementary implications and proofs, which can be found in (at least) [17].The first one is about the stability of partition functions.Let H 1 , H 2 be Hermitian operators.Then we have The proof is just as follows: log where in the last inequality we have simply used H ölder's inequality Eq. ( 7) with q 1 = 1, q 2 = ∞.If we take e.g.
The proof can be found in Appendix B. This norm can then be bounded with the results from Sec. III B, to scale as ∝ ||H 2 ||.The resulting expression can for instance be used for analyzing characteristic functions of observables F by taking C = e αF for α ∈ R [17].
More generally, in the practice of mathematical quantum physics, whether it is from the many body, the QI, or any other perspective, many important proof ingredients take the form of inequalities, either between operators, traces, or norms (such as those already mentioned in Sec.II A).There are too many to give a reasonably complete overview here but we refer the reader to e.g.[57,58].

IV. CORRELATIONS
One of the more important questions when studying many body systems is: how and how much are the different parts correlated?Intuitively, the stronger these correlations, and the longer their range, the more complex a state is -the reason being that we cannot think of the large system as a collection of simpler, weakly correlated parts.The obvious extreme example is that of an uncorrelated gas, in which the particles do not interact and have completely independent properties.
For thermal states with local interactions, we can expect that locality will make the state far from generic, in a way that constraints its complexity.Intuitively, it should cause the correlations to be "localized", meaning that particles are only correlated with their vicinity as given by the lattice geometry.For a rough intuition, consider the first terms of the Taylor series That is, at very high temperatures we approach the trivial uncorrelated state ∝ I and the leading order term includes only k-local couplings, with only higher order terms coupling far away particles.We thus expect that the correlations between particles will generally be weaker i) the higher the temperature and ii) the larger their distance on the interaction graph.This is one of the main ways of understanding how the Gibbs states of the "one phase region" of Sec.I B are very different from generic states.An important motivation for this is that, as we will see in later sections, the situations in which the correlations are weaker or short range roughly correspond to those in which we expect better algorithms for the description of thermal states.This is perhaps most clearly the case in the context of tensor network methods.We now proceed to describe (and even prove) the more important ways in which these correlations are constrained.

A. Correlations between neighbouring regions: Thermal area law
One of the more important statements about correlations in quantum many-body systems is the area law.
This roughly states that a measure of correlations between two adjacent regions is upper-bounded by a number proportional to the size of their mutual boundary.
Traditionally, this has been mostly studied in the context of ground states, which are pure.There, the relevant measure of correlations is the entanglement entropy, or some Rényi version of it.In that context, an area law for the entanglement entropy is believed to hold for all ground states of models with a gap [59].This can be proven in 1D [53,60] and in some cases in 2D [61].The interest in it is largely due to its relation to other phenomena, such as phase transitions [62], the decay of long-range correlations [63] or the effectiveness of certain tensor network algorithms [64,65].
For thermal states, a very general area law can be shown to hold for systems in any dimension, at all temperatures.We now give a short proof of this statement, and then discuss its significance (see [66] for the original reference).In this case, since it is a mixed state, an appropriate measure of correlations is the mutual information in Eq. ( 13).
Let us partition our interaction graph into two subsets of particles A, B, with a thermal state ρ AB β .We start with the very simple thermodynamic observation that the free energy F from Eq. ( 12) of the thermal state is lower than that of any other state (this follows from Eq. ( 12)), and in particular Writing out the free energy explicitly as F β (ρ) = Tr[ρH] − β −1 S(ρ) and rearranging yields ) Given that the entropy is additive S(ρ⊗σ) = S(ρ)+S(σ) notice that the LHS is exactly the mutual information I(A : B) ρ AB β from Eq. (13).Since the Hamiltonian is local, we can write it as where H A , H B have support only on A, B respectively, and H I is the interaction between them (with support on both).By definition, the expectation values of H A and H B coincide on both states Tr[(

Now we can use a few of the operator inequalities from
Section II A to obtain . FIG. 2: Under an area law, the correlations between regions A and B grow at most as the size of their mutual boundary ∂ AB .
Putting Eq. ( 58) and ( 61) together we have the final result This is the area law for the mutual information of a thermal state: it implies that the strength of the correlations of systems A, B cannot depend on their size, but that it grows at most as their common boundary.For a local Hamiltonian, we have that where What this strongly suggests (although it does not quite prove) is that the correlations between A and B are localized around the mutual boundary, and that the bulks of A and B are mostly uncorrelated.That is, the only relevant information about A that B contains is about the region of A that is near their boundary.
This statement, as can be seen from the proof, holds for all temperatures and all interaction graphs, which is likely as general as it can be , at least for systems with finite-dimensional Hilbert spaces [67].The drawback of that generality, however, is that it will be unable to signal important phenomena that happens only at specific temperature ranges, such as thermal phase transitions, or an efficient classical or quantum simulability.That is, Eq. ( 62) does not narrow down the set of "completely analytical interactions" for Sec.I B in any meaningful way.Other more specific versions of the thermal area law in the literature may have more potential in this regard [22,68].
The temperature dependence of Eq. ( 62) can be improved to Õ(β 2/3 ) [44].This can be proven with a variety of techniques, including those of Sec.III B and Sec.III C, as well as methods originated in the study of ground states [60].This dependence is not far from optimal, since there exists a 1D model for which the scaling of the MI is at least O(β 1/5 ) at low temperatures [69].
Many important physical models have a very different temperature dependence, such as log(β + 1) [70,71].Classical systems, on the other hand, have an upper bound that is independent of the temperature, as I(A : [66].All these suggests that the scaling of the mutual information with β in the low temperature regime is related to the computational complexity of the ground space of the models.

B. Decay of long-range correlations
An important fact about thermal states is that often their spatially separated parts are very weakly correlated.Let C, D be regions such that their distance is dist(C, D) (see Fig. 3).We focus on measures of correlations evaluated at the marginals on these regions Tr \(CD) [ρ β ] = ρ CD β .For instance, taking the mutual information, we expect that in general where f is some rapidly decaying function.In fact, we expect that for completely analytical interactions f (l) ≤ K|∂ C ||∂ D |e −l/ξ , where K > 0 is some constant, ∂ C,D is the size of the boundary of each region, and ξ is the thermal correlation length that depends on the temperature and other parameters, but not on l or the system's size.This has been proven in translation invariant 1D systems at all temperatures [72].The main idea behind it is to use the locality estimates from III B, and in particular the properties of the operator E A in Eq. 35.That this also holds for high enough temperatures in all dimensions follows the cluster expansion applied to the mutual information [44].
A more commonly stated but weaker condition is the decay of the connected two-point correlators.This usually takes the form where here M C and M D have support on regions C, D, respectively.That this is weaker than the decay of the mutual information follows from Pinsker's inequality applied to the marginal on regions C, D.
It is known that correlations decay exponentially at large temperatures for arbitrary interaction graphs [73,  74].This can be shown with the cluster expansion [75].
Instead of showing the proof in full generality, we can already see a simple but instructive case by noticing that, with the notation of Sec.III A, At the same time, when one differentiates over two variables λ i , λ j , the nonzero contributions come from clusters that contain both However, connected clusters such that i, j ∈ W belong to G m with m ≥ dist(i, j).This means that the lowest moment that appears in the correlation function is K m with m = dist(i, j).If Eq. 18 holds, then the correlation function will decay at least as (β/β * ) dist(i,j) , mirroring (65).A similar argument also holds for arbitrary few body-observables h C , h D if one consider an appropriate cluster expansion of the perturbed Hamiltonian H + h C + h D .See e.g.[74,75] for more general results.
The connection between this type of correlation decay and the analiticity of the partition function is very well understood in the classical case, where they are known to be equivalent [18].In the quantum case, it is only known that a condition stronger than the analiticity of log Z (called analiticity after measurement) implies decay of correlations.See [19] for details.
Intuitively, both these properties are related to the absence of phase thermal phase transitions: at those critical points, the correlation function diverges and the partition function becomes non-analytic.Since there are known phase transitions at finite temperature (e.g.2D classical Ising model), the exponential decay does not hold for all thermal states at all temperatures.As such, this can typically be thought of as a characteristic property of the set of completely analytical interactions from Sec.I B. The decay of correlations is an important fact: it shows that the different parts of the system behave almost completely independently.A state with this property should then share many large-scale features with an uncorrelated gas, in which the particles are not interacting at all.This has as a wealth of related physical consequences.For instance, it is associated with basic statistical physics facts covered in Sec.VI, in particular the validity of the central limit theorem and related results on concentration properties of thermal states [76,77] and the phenomenon of equivalence of ensembles [76,78,79].It also features in the proof of local indistinguishabiliy in Sec.V A 1.

C. A refined correlation decay: Conditional mutual information
In Sec.IV A, we mentioned that the area law itself does not quite imply that the correlations in a system are localized, in the sense that a particular subsystem is only appreciably correlated with its vicinity.There is, however, a significantly stronger statement about correlations that does imply it in a clear way. 4his is the property of being an approximate quantum Markov state [81], which is defined in terms of the decay of the CMI in Eq. ( 15).Let us consider three regions A, B, C such that B shields A from C. A simple example is given in Fig. 4, or in Fig. 5 for 1D.
Since this quantifies how many of the correlations between A and C are not mediated through B, we thus expect that it becomes small as the size of B grows, and A, C are further apart.This is perhaps the strongest sense in which correlations can be localized.
This is studied in one dimensional systems in [49].By choosing A, B, C to be adjacent regions of the chain (see Fig. 5), [49] shows that It is expected that the decay is e −Ω(|B|) as opposed to Eq. ( 68), which may be important for certain applications [49,51,82].The key technique is the quantum belief propagation from Sec. III C, but the locality bounds from Sec. III B are also sufficient.The idea is to use those results to define a completely positive map corresponding to a particular POVM outcome, that can be used in a "measure until success" strategy.This result, however, relies on the exponential decay of correlations in 1D, and also on bound on the correlation length of the form ξ ≤ e O(β) , which is currently not known.Interestingly, [49] also shows a converse statement: any state with a sufficiently fast decaying CMI approximates the thermal state of some local Hamiltonian.
In larger dimensions, the work [83] shows that a non-commutative analogue of the cluster expansion (in which there are no traces or expectation values, but rather operators) suffices to study this problem.In particular, if the cluster expansion corresponding to the object log Tr \A [e −βH ] converges, exponentially, one obtains an exponential decay of the form which only works for high enough temperatures β < β * NC = O(1).This expansion is more involved than the one described in Sec.III A in that the individual terms of the expansion may not commute (since no trace is being taken when expanding log Tr \A [e −βH ]).Thus, the constant β * NC need not be the same as the β * .The significance of a fast decay of the CMI is highlighted by the idea of the Petz map [84,85].An important result in this regard states that, given a tripartite state ρ = ρ ABC , there exists a CPTP map N (•) B→BC (that is, acting on B, and with output on BC) such that [86,87] The map N usually goes under the name of recovery map.See [88] for an overview.
A fast decay of the CMI thus guarantees that the Gibbs state on A, B, C can be reconstructed from ρ AB by acting locally on B (and importantly, not on A), such that I A ⊗R B→BC (ρ AB ) ρ ABC , with R B→BC some CP map taking only B as input.This gives a way of sequentially preparing the whole thermal state from its smaller components, which can potentially be used e.g. for quantum algorithms (see Sec. VII).

V. LOCALITY OF TEMPERATURE
In the previous section we focused on the correlations between different parts.Now, we move the spotlight to features of individual subsystems.That is, if we divide the system into A and its compliment \A, what does Tr \A [ρ β ] look like?In the rest of the section we drop the subscript β for simplicity of notation.
Consider first the trivial case: if the particles are noninteracting, it holds that the marginal on A is the thermal state of H A which is the Hamiltonian that acts only on subsystem A. That is Now, how does Eq.( 71) change when we introduce local (and potentially strong) interactions?Can we identify the state of a subsystem with some thermal state?How different is it from e −βH A Z A ?This general question sometimes goes under the name of locality of temperature [73].
There are (to the author's knowledge) two different but related answers to this: the idea of local indistinguishability and also the notion of Hamiltonian of mean force.We now explain both of them, elaborate on their significance for thermodynamics, and also give a proof of the simplest instance of the first (in 1D).

A. Local indistinguishability
Given the above discussion on the decay of correlations, we expect that the state of a local subsystem will not depend much on the parts that are far away enough from it.A possible way to phrase this is that the local marginal ρ A is indistinguishable from the marginal of a much smaller thermal state, with a Hamiltonian that acts only in the vicinity of A. We now make this intuition precise.
Let us refer to partitions into ABC such as those in Fig. 4 or Fig. 5, and write the Hamiltonian with the following terms: We now have the full thermal state ρ, as well as a ther-mal state supported on A, B defined as that is, without the terms in H that have support in C.
One can also think of this as the marginal of the thermal state ρ AB 0 ⊗ρ C 0 ≡ e −β(H AB +H C ) /Z AB Z C in which we have removed the interactions H BC between AB and C. Notice that ρ AB 0 = ρ AB due to the presence of H BC .This is, however, just a small local term.
The main idea is that if B is large enough, these two states are almost indistinguishable on A. Let us assume that the connected correlations from Eq. ( 65) decay with function f (dist(C, D)).Then, the following upper bound holds for some constant K > 0 [51] ||Tr The first term in the RHS comes from the decay of correlations assumption.The second comes from using the QBP technique in Sec.III C. The exponential decay of this quantity thus holds whenever both the correlations decay fast enough, and Lieb-Robinson bounds hold.An alternative proof for high temperatures using the cluster expansion can also be found in [73].
A straightforward consequence of this is that we do not need to know the whole state to compute local quantities.If we care about some kind of local order parameter, or want to compute currents or else between some part and its surroundings, we can calculate them without having to diagonalize a huge matrix of size exp(N ), but rather just focus on a much smaller region.This is particularly useful in translation-invariant systems.

Proof in 1D
We now show the full proof of this result in the case of one dimension.The more general one, however, is essentially the same and can be found in [51].It uses previously mentioned results, and shares some steps and ideas that appear in other fundamental questions including the proof of the absence of phase transitions in 1D [19,41] or of decay of correlations [41,72].It will also be a key ingredient in the algorithm of Sec.VII A.
We focus on the restricted setting of a chain, that we divide into three parts A, B, C, such that B is in the middle and A is a small subsystem at the end of the chain, as in Fig. 6.The aim is a small upper bound on The parameter l is free, so we can choose to our convenience.We refer now to the result from [41] in Eq. ( 39) and ( 40), from which it follows that That is, the operator E BC has bounded norm and, since we can approximate it by E l BC with some l < dist(A, C), its support on region A is super-exponentially suppressed in l (due to the factorial, which always dominates over q l ).In what follows, we choose l = |B|/2.

Let us define M *
A to be the operator that optimizes the RHS of Eq. (75).With the triangle inequality we can write ≤ Let us now upper-bound these two terms independently.The second can be bounded with Eq. ( 77) and H ölder's inequality applied twice. = Given Eq. ( 53), , which is a constant that only depends on β, k, J. Thus, this second term is super-exponentially suppressed in |B|.
For the first term, we require the decay of correlations property Eq. ( 65) (which holds in 1D under the assumption of translation invariance).Since l = |B|/2, where for the last inequality we used , which holds for sufficiently large l.We can now write ≤ where we used the triangle inequality in the first line, and H ölder's inequality Tr[M * A ρ] ≤ ||M * A || ≤ 1 to get to the second.Finally, we can use Eq. ( 77) again after another application of H ölder's inequality and since Tr This finishes the proof.Putting everything together, we see that we have upper-bounded our target quantity in Eq. ( 75) by a small number related to the error term in the decay of correlations and Araki's result.Without writing the constants explicitly, and just on the leading exponential error, the final result is where Ω(x) is defined in Sec.II D.
For simplicity, we have only dealt with the case of a 1D chain, where A is at the end of it.To generalize the proof, one just needs to define an analogous partition ABC in higher dimensions (see Fig. 4 for 2D) and then remove all the different interaction terms from H BC one by one.Here, we have done it with the operator E BC , but this can also be done with the (suitably defined) QBP operator O A from Sec. III C, and the result Eq. ( 74) is essentially unchanged.

B. Hamiltonian of mean force
The state ρ A is obviously the thermal state of some Hamiltonian on A, since we can always define which is in general different from H A .This is the socalled Hamiltonian of mean force [89].Notice that it can be defined up to some additive constant chosen at will.
How does this Hamiltonian compare to the "bare" Hamiltonian H A , which disregards the interactions between A and the rest?That is, we would like to understand the norm and locality of the operator Φ A ≡ HA − H A .This turns out to be a difficult problem, very much related to both the decay of mutual information and of the conditional mutual information from Sec. IV.
One potential result is as follows.Since the interactions are local, it makes sense that, if the size of A is much larger than the number of nearest neighbours k, most of the weight of Φ A is localized around its boundary with the rest of the system, of size |∂A|.
If the size of A is much larger than the number of nearest neightbours k, most of the weight of Φ A is localized around its boundary with the rest of the system, of size |∂A|.The precise question is: can we approximate Φ A with another operator Φ l A that only has support on sites a distance l away from the boundary?Using a noncommutative cluster expansion, as in Sec.IV C, Theorem 2 in [83] shows that, for any temperature β above a threshold one β * NC > 0, one can define a Φ l A such that That is, Φ A can be exponentially well approximated with an operator localized around the boundary.See Fig. 7 for an illustration.A similar result is expected to hold in 1D, but this is a so far open problem.See [90] for a recent overview on this topic for a different set of models, and its implications.

C. Non-equilibrium thermodynamics with strong coupling
The ideas of this section may help understand how a priori complex non-equilibrium thermodynamic pro-cesses may be tractable in practice.In many situations of interest, the starting point at t = 0 is an equilibrium state with time-dependent Hamiltonian This includes a system Hamiltonian H S (t), a bath Hamiltonian H B , and an interaction H I between them 5 .The driving of H S (t) for t > 0 takes the system away from equilibrium, and different thermodynamic quantities can then be studied.Textbook thermodynamics are typically centered around macroscopic systems such as gases, where one can take the weak coupling limit H I H S , H B .In many body quantum systems, such as the ones considered here, this limit may no longer apply, which creates a number of difficulties for thermodynamic considerations (see e.g.[89,91,92]).These, however, can be dealt with if one considers the effective state on the system where it is convenient to define the Hamiltonian of mean force HS (t) such that ZS (t) ≡ Tr SB [e −βH(t) ] Tr B [e −βH B ] .This way, for instance, the non-equilibrium free energy (and thus the second law) is defined as See e.g.[92] for details.These quantities may be difficult to calculate.While HS (t) may be inferred from the system alone, ZS (t) may depend on the system but also potentially on its relation to the whole bath.Consider the rather general situation in which the system to be a small part of a lattice Hamiltonian of a Gibbs state with short-ranged correlations, and the bath to be the rest of the lattice.In that case, the situation simplifies dramatically.First, the discussion from Sec. V B suggests that typically the Hamiltonian of mean force HS does not necessarily depend on the whole bath, but has some corrections which only depend on the region around the boundary between S and B.
Then, for the effective partition function ZS , the same fact follows from local indistinguishability.To show this, notice that ZS (t) where O I is the belief propagation operator from Eq. ( 48) with A = H I and O l I is its local approximation in Eq. (50).Defining B l to be the region of the bath that is at most a distance 2l away from S, we have that This shows that the effective partition function can be approximated to multiplicative error by computing the expectation value of O † I O I on the system and a region of the bath a distance l = log( −1 ) + O(β||H I ||) away from the small system.In D dimensions, assuming ||H I || = O(1), the computational cost of exact diagonalization is e O(l) = e O(log D −1 ) , independent of the size of the bath.

VI. STATISTICAL PROPERTIES
We now explain and prove some important statistical features of thermal states.These are central statements of the field of statistical physics and characterize the ensembles involved: the thermal (or canonical) and the microcanonical, as well as the grand canonical or others, when relevant.In contrast to the results of other sections, all those shown here (as well as their proofs) apply equally to classical models.

A. Measurement statistics and concentration bounds
In Sec.IV we saw how in many instances of thermal states, in particular for those in the "one phase region", the different subsystems tend not to have strong correlations.This has a number of consequences, and we now explore an important one that shows that their large-scale statistical properties resemble those of noninteracting/statistically independent systems.These are concentration bounds, akin to the (perhaps more widely known) central limit theorem.
The setting is as follows: let us consider a k-local observable A = j A j , such that A j has support on at most k sites.The best example is the energy, but also other properties like magnetization j σ Z j .The expectation value of any such observable can be thought of as a macroscopic property of the system (such as the average magnetization of the material).While we expect that there will be thermal fluctuations around that average value, our intuition from thermodynamics tells us that any such large-scale property should have a definite value, almost free of fluctuations.This is due to one of the most basic ideas from probability theory: the measurement statistics of sums of independent random variables greatly concentrate around the average.The main conclusion is that if we measure an observable A on a thermal state, the outcome will be very close to the average A β with overwhelmingly large probability.That is, the distribution which is the probability of obtaining outcome x when measuring A, is highly peaked around the average This has important implications for the validity of thermodynamic descriptions of these systems, in that averaged macroscopic quantities characterize the large system of many particles whose properties we do not know with any certainty.
In the theory of probability, there are various types of results characterizing distributions comprised of many independent (or close to independent) variables.Their proofs most often involve constraining the characteristic function e λA β , where λ may be real or imaginary.We now describe some of them.

Chernoff-Hoeffding bound
This is a concentration bound that reads where Ā ≡ j ||A j ||.Thus if δ 2 c Ā, the probability of measuring A to be away from A β by at least δ is exponentially small.The most common proof technique is via a bound on the characteristic function of the form for some O(1) constant c > 0 (which might depend on β and the parameters of the Hamiltonian) and a wide enough range of τ .From this it follows that One can follow the same steps for the range Then, choosing τ = δ/(2c Ā) yields Eq. ( 99).This result can be very easily shown for independent random variables or for independent spins.For interacting spins, Eq. ( 100) was shown in [28] with the cluster expansion technique from Sec. III A, which holds for all dimensions and all temperatures β < β * .To see this, notice that a bound of the form of Eq. ( 100) follows from proving the convergence of the expansion of log e τ (A− A β ) β to second order.The main result from [77] proves a slightly weaker version of Eq. ( 99) with a different technique, only assuming the decay of correlations from Sec. IV B.

Large deviation bound
A related important type of concentration bound is given by large deviation theory.This is the branch of probability theory concerned with understanding the likelihood of very rare events, and has a long history as one of the most important mathematical frameworks for studying statistical physics.For instance it gives a way of describing the equilibrium properties of large ensembles (as is also the case here), or for predicting the long-time behaviour of non-equilibrium processes such as Brownian motion.See [93] for an excellent overview of the main results and their consequences for classical systems.
The basic idea is that given any set of measurement outcomes A, we would like to identify whether there always exists a rate function If this is the case, the dominant behaviour of P A,β (x ∈ A) is essentially a decaying exponential P A,β (x ∈ A) e −N I A +o(N ) , unless I A = 0.This means that, in the thermodynamic limit, the measurement statistics of A are extremely peaked around the points where the rate function vanishes I A = 0.This is slightly stronger than the Chernoff-Hoeffding inequality, in that it can in principle give an exact expression of the probability distribution for large enough N .However, we do not always know how large an N is "enough", and for finite N , it often does not give an expression as explicit as Eq.(99).
Again, the proof strategy most often involves the characteristic function.In particular, the Gärtner-Ellis theorem states that a sufficient condition is that the function exists and is differentiable.This has been shown using the cluster expansion in [94] for 1-local observables, and upper bounds on the rate for general observables have been shown using the locality estimates from Sec. III B in [17,95].The full large deviation principle was shown in 1D in [96].An alternative proof can be found in [97].

Berry-Esseen theorem
Another interesting probability theory result is the Berry-Esseen theorem [76,98,99], which can be thought of as a refinement of the central limit theorem for a finite sample size (which in this case is the system size N ).Let us define the cumulative distribution function as well as the equivalent for a Gaussian with the same average and variance where The distance between these two functions is bounded by Esseen's inequality [100], which states that, for all T > 0 That is, the right hand side is small if the characteristic function inside the integral is close to a Gaussian for rather long times t.This can be shown by bounding the logarithm of the characteristic function Z ] with the cluster expansion from Sec. III A. Assuming h = O(1), for short times t/σ A ≤ t * , with t * some O(1) constant, it can be shown that it is close to the second order Taylor expansion as so that To prove this, see for instance Theorem 13 in [30].This allows us to bound the integral in Eq. ( 109) choosing which, considering that σ A = Ω( √ N ), means that ∆ = O(N −1/2 ).This means that the cumulative functions F (x) and G(x) become increasingly similar with system size, which shows that the probability P A,β (x) approaches a Gaussian in the thermodynamic limit.
A different proof starting from the assumption of decay of correlations, can be found in [101].

B. Equivalence of ensembles
We now prove an important statement in the study of statistical physics, which goes back all the way to Boltzmann and Gibbs.In large systems, the average macroscopic properties of both the thermal or canonical state, and of the microcanonical ensemble, are essentially the same.This means that both canonical and ergodic averages coincide in the thermodynamic limit, and shows that the particular ensemble used for calculations does not necessarily matter.
There are various similar statements in the literature [28,76,78,79,[102][103][104], but the proof that we now show follows that of [28,78,79] and relies on the Chernoff-H öffding bound.Let us define the extensive observable A = j A j (such as e.g. the total magnetization N j σ Z j ) with thermal/canonical average A β which for simplicity we will set to A β = 0, while the microcanonical average is where E is the energy and ∆ the width of the microcanonical window (which might depend on N ), and |E j is the energy eigenstate of energy E j .D N (E, ∆) is a normalization constant counting the number of eigenstates within the window.This motivates the following probability distribution which gives the probability of measuring x = E j |A|E j when sampling eigenstates from the microcanonical ensemble.
First, we need to determine what is the energy that corresponds to temperature β and thus characterizes the microcanonical ensemble.Given the temperature β, the microcanonical energy E 0 is such that Assuming that the width is significantly different than the energy scales of the system, ∆ H β (which is most typically the case), this roughly implies that E 0 is the energy of the microstates {|E j } that have the dominant weight in the canonical ensemble (when the density of states is weighted by the factor e −βE ).We have written the dependence on β, ∆ explicitly in Eq. ( 115), but let us now drop them for simplicity of notation.
We start by upper bounding the m-th (even) moment of where we we used the convexity of x m with m even.The bound can easily be expressed in terms of a canonical average as, since A m is positive, The factor Ze βE 0 D N (E0,∆) can now be upper bounded using the definition of the microcanonical ensemble and the concentration bound.Let us define the following modified partition function Z ≡ |Ej −E0|≤δ e −βEj .If we also set δ = KN 1/2 with K = O (1) it follows from Eq. ( 99) that Now divide the energy range in the sum in equal parts of width ∆ * ≡ min{∆, β −1 }, such that the largest energy of each interval is E ν , so that with K = O(1), where the last inequality follows from the fact that D N (E 0 , ∆) is monotonic on ∆.We thus have To finish this part of the proof we bound A m β .It was shown in [28,79] that the concentration inequality Eq. ( 99) implies that For completeness, we reproduce the proof in Appendix B. We are now in a position to bound the tail of P E,∆ (x) as Thus, choosing m = x 2 0 4ce Ā .and repeating for x ≤ −x 0 , leads to (let us now bring back the average A β explicitly, previously taken to be zero) We are almost done.We now bound the difference between canonical and microcanonical as and so choosing so that the difference vanishes in the thermodynamic limit.Notice that ∆ * ≡ min{∆, β −1 }, so that in principle even rather low temperatures and very small (up to exponentially small) microcanonical windows are allowed.This is the final result.It states that average properties are essentially the same, provided that the average energy E 0 is determined by Eq. ( 115), and that the width ∆ is not too small.The fact that it can be up to exponentially small in system size is rather strong, and related to weak statements of the eigenstate thermalization hypothesis (see [79,105]).

VII. ALGORITHMS AND COMPLEXITY OF THERMAL STATES
When addressing specific problems in many-body physics, we would most often like to understand whether they are fundamentally complex or not, in the precise sense established by theoretical computer science.This can typically done in two complementary ways: • By showing that there exists an algorithm with a provable performance and run-time.Additionally, it is interesting if the algorithm can be explicitly constructed, and implemented in practice.
• By establishing that a problem, or a set of them, belong to or are complete or hard for a certain complexity class.
This applies to both classical and quantum computation, and their respective complexity classes.Problems related to quantum thermal states can also be studied under this light.The relevant ones include most notably the estimation of the partition function, or the generation of either approximations to the thermal states (in quantum computers) or their classical representations (in classical computers).
As an illustrative example of what can be proven, we start with a simple explicit algorithm that approximates the quantum partition functions in 1D in polynomial time [106].We then briefly review some other important known results about the hardness of approximating partition functions.The rest of the section includes an explanation of the current best tensor network results, which are provably efficient in a wide range of situations, and a short review of quantum algorithms for preparing thermal states.

A. An efficient classical algorithm for the 1D partition function
Using some of the results from the previous sections, we now show that, assuming that h, β = O(1), and that exponential decay of correlations holds, we can efficiently approximate the partition function in 1D.This is done with an algorithm with runtime poly(N, ε −1 ) that outputs Z , where This section follows the result and proof strategy from [106], with some minor modifications.
In one dimension, let us consider the partial Hamiltonian H j = j−1 i=1 h i , which includes the first j − 1 interaction terms as counted from the left, starting from the left-most h 1 .Then, define the partial partition function where O hi is the quantum belief propagation from Sec. III C and A i = O † hi O hi .Now, rewriting Eq. ( 134) notice the simple iterative relation where ρ i = e −βHi /Z i−1 .Thus we can write where Z ≡ Z |E| and d N = Z 0 .The key now is to use results from Sec. III C to approximate A i , and local in- ≤ e O(βh) e −Ω(l) , where in the first line we used the triangle inequality and in the second we used both Eq. ( 49) and (51).Now, let us label by Λ l * to be the rightmost part of the chain of length l * , with vertex set V l * and in which H i+1 has support.Choose l * ∈ R so that A l * i has support in the right side of V 2l * and define ρ The expectation value can be approximated as ≤ This follows from the triangle inequality.The partial trace \V l * is over the support of ρ i excluding vertices V l * .Eq. ( 141) now has a form that we can upper bound.
Since βh) , we can use Eq. ( 140) to bound the first term, and Eq. ( 89) with |B| = l * to bound the second.With these, we conclude that there exists constants c 1 , c 2 depending on all the constants involved (i.e.β, h, J, k, c , v) such that The key feature of Tr[ρ is that it is an expectation value of an operator whose form we known explicitly, as per Eq. ( 50), evaluated in a thermal state of size 2l * .This can be computed exactly (or rather, with a subleading error) in a time exp (O(l * )).Let us now choose a precision ε/N in Eq. ( 142), so that l * = O log N/ε −1 .This way, we have The last equation comes from the fact that N ∝ |E| and that all eigenvalues of A i are O(1), as per the definition in Eq. ( 48).The algorithm thus consists of exactly calculating the numbers {Tr[ρ } exactly, and then multiplying them, so that Since there are |E| ∝ N terms in Z , and each takes time poly(N × ε −1 ), the final runtime is poly(N, ε −1 ), as desired.See also [107] for a related result in the translation invariant setting.

B. Hardness of approximating partition functions
In the previous section we have seen how the partition function can be approximated in 1D in the sense of Eq. ( 133) as long as the temperature is β = O(1).Moreover, through the cluster expansion we briefly explained in Sec.III A how it can be approximated for any local model as long as β < β * , where β * is some fixed constant independent of system size.
On the other hand, in the limit of β → ∞, the logpartition function equals the energy of the ground state.For classical models, approximating this to a certain precision is an NP-complete problem.For local quantum Hamiltonians, it is QMA hard.This means that there should be no efficient classical or quantum algorithm to approximate log-partition functions for low enough temperatures, both for classical and quantum models.In fact, it is known that the classical problem is only slightly harder than NP [108]  6 , and that it is at least #P hard if complex interactions are allowed [110].For the quantum case, the exact complexity class to which this belongs or is complete for is not yet clear (see [109] for more details and results).
There is still the expectation that for certain classes of interesting models we can still compute the partition function efficiently, even with classical algorithms and at very low temperatures.One notable example are quantum Monte Carlo methods [111][112][113], which are restricted to Hamiltonians without the so-called "sign problem" (or stoquastic [114]).Other results cover different specific kinds of models [37,109].Quantum algorithms for approximating general partition functions also exists [109,[115][116][117], but often come with exponential run-times.
Another relevant angle of this problem is the connection of efficient algorithms to the idea of completely analytical interactions from Sec.I B, as well as the physics of phase transitions.The intuition is that a physical phase transition in the system may come together with a computational phase transition in which approximating log Z becomes fundamentally harder.Along these lines it has been shown that in quite a general setting [19], that the analiticity of the log-partition function implies the existence of an efficient algorithm.This can be understood in terms of the setting of Sec.III A: as long as the Taylor expansion converges well, we can compute the individual coefficients of Eq. ( 17) efficiently.

C. Tensor network methods
Tensor network (TN) techniques are perhaps the most successful way of classically computing physical properties of quantum systems in 1D, and sometimes 2D and often come with rigorous theoretical guarantees.This includes most notably the regime of low energy physics [53,64,[118][119][120] and, as we now review, that of finite temperature too.See e.g.[121][122][123][124] for introductory texts to this topics. 6More specifically, it is in the class BPP NP , see [109].

Product of operators
Tensor network ∝ log   .
FIG. 8: Schematically, the way to prove that a 1D thermal state is a tensor network is by decomposing it as a product of smaller operators.It then follows from standard methods that the bond dimension of the tensor network representation is related to the size of those operators.
The main aim is to obtain a TN representation of an operator M D such that ||e −βH −M D || 1 ≤ εZ, which then allows us to compute all thermal expectation values up to error ε as per Eq. ( 6).The index D labels the bond dimension which, roughly speaking, quantifies the complexity of representing M D .A TN of bond dimension D requires a memory ∝ N × D 2 to be stored.Intuitively, the approximation operator M D should be made out of a sum or low-depth product of operators with smaller support i.e. of size at most ∝ log D. This is graphically described for 1D in Fig. 8.
One possible way to do this is by adding piece by piece from right to left, aided by the results from Sec. III B. To do this, first define segments of the chain of length l such that the Hamiltonian up to segment j − 1 is defined by H j .This allows us to define the operator such that e −βH = e −βh1 j=1 Ψ j .Each operator Ψ j can then be approximated by a localized operator Ψ l j with support in a region of length l as in Eq. ( 40), exponentially well in l.This can take the form There are O(N ) of those operators, and the error e −Ω(l) of each approximation can be shown to contribute additively.Thus, choosing l ∝ log N/ε gives the desiredgood approximation to e −βH .By construction, the product of operators resembles that of Fig. 8.In that case, the bond dimension can be straightforwardly assumed to be D ≤ e O(l) = poly(N, ε −1 ), which is already computationally efficient and likely close to optimal.A non-trivial improvement to this can be found in [44].Roughly speaking, one can define an operator Ψl j in which the exponential functions of Eq. ( 147) are approximated by their Taylor series.In that case, we can put forward results about the bond dimension required to represent polynomials of Hamiltonians [60].This leads to an improvement of the bond dimension to D ≤ e Õ( √ l) = exp Õ( log(N/ε)) .This is sub-linear in system size, much more computationally efficient.
In higher dimensions, the best-known method is a variation of the cluster expansion proposed in [125].There, the expansion is treated in a slightly different way, to approximate the exponential e −βH rather than the log-partition function as in Sec.III A. Instead of counting the number of individual clusters of at most size m, one has to consider arbitrary products i∈W h i ≡ h(W ) of terms h i from a multiset W of size |W |, that can be divided into connected clusters.Let us label the multisets W = {h i } in which the biggest cluster has size M to be C M .That this is consistent with the cluster expansion can be seen from taking the exponential of Eq. ( 17) given the expression in terms of clusters of the powers in Eq. ( 21).
The following result was proven in detail in [73].It reads where b(β) < 1 for all β < β * = O(1).In [126] the sum over clusters on the RHS of Eq. ( 148) was shown to be a tensor network (in fact, a so-called PEPO) of bond dimension e O(M ) .Thus, by setting the RHS to be ε, we achieve a TN approximation to e −βH with bond dimension poly(N, ε −1 ).This only holds for inverse temperatures below β * , but the result can be extended to arbitrary temperatures simply by taking powers of the operator.This means the bond dimension grows as [126] for the details).This scheme has recently been numerically implemented in practice [127].
These results might seem surprising, since they show that there are in principle efficient TN representations for all dimensions and all temperatures β = O(1).This contradicts the intuition (justified by numerical works [128][129][130][131][132][133]) that, at phase transitions, when long-range correlations are present, such efficient schemes should not exist.The caveat, however, is that in dimensions higher than one, a TN representation is not enough to be able to extract numerical data efficiently.This is because the contraction of TN can be a computationally demanding task by itself [134,135].In fact, what we expect is that the ability to reliably contract a higher dimensional tensor network is related to facts such as local indistinguishability [136,137], which allows us to obtain reliable results by contracting suitably smaller regions.
Finally, let us note that by using the local indistinguishability from Sec. V A (or even without it in 1D [138]) it can be shown that a much smaller bond dimen-sion is needed to simulate local properties [139] .

D. Quantum algorithms for preparing thermal states
One of the most promising applications of quantum computers is the generation of exotic states of matter in complex many-body models.The expectation is that this should allows us to discover a potentially wide variety of physics, and also serve as a subroutine in certain quantum algorithms, such as those performing optimization tasks.
Because of this, a question that has been very much explored lately is that of how to prepare thermal states of local Hamiltonians with a quantum computer.This could be either a fully fledged fault-tolerant one or one more suitable for the so-called NISQ (Noisy Intermediate Scale Quantum) devices.In the following we review some of the currently existing ones, and also explain the ideas that highlighting the complexity of the problem.We mostly focus on those that have some provable performance guarantees.There are many others we will not cover (such as e.g.[140][141][142] and others), which often rely on some level of heuristic arguments.These also include approaches such as variational algorithms [143][144][145][146] or those based on quantum versions of metropolis sampling [147][148][149].These may nonetheless be more efficient in many physically relevant settings.

General considerations
We have very strong evidence pointing that preparing thermal states is, in its most general setting, not an easy task.The results on QMA hardness of the local Hamiltonian problem [13] show that there are vanishingly small temperatures (scaling quickly with system size) at which the preparation of ρ β is QMA complete.That is, not even a quantum computer can do it efficiently [150].This is the case even for 1D systems [151].
There are also reasons to believe that the problem is not easy even at slightly higher temperatures.For instance, it was recently shown [152] (following a famous conjecture [15]) that there are local models for which preparing states below a certain energy density (including low temperature Gibbs states) requires a circuit of depth at least log N .This property, however, does not apply to lattices [14].
In that sense, we expect that large interesting classes of models and temperature ranges will have efficient algorithms.The locality of the model, and some of its consequences from the previous sections, should often simplify this task.

Algorithms based on purifications
These algorithms work for general Hamiltonians, and are often designed to be run in a fully fault tolerant quantum computer, capable of applying any quantum circuit without large errors.They aim to construct the following state where the second subsystem A is made of auxiliary particles with an orthogonal basis {|l } such that, upon tracing out, yield Tr A [|ρ β ρ β |] = ρ β .The first stage of the algorithm involves preparing a state |ψ with |ρ β as a component such that where we have included a possible additional register R.
The first works proposing this scheme [115,153,154] instead apply the phase estimation algorithm [155].To a good approximation, this algorithm acts as follows 7U that is, it "measures" the energy of system A into the register A. Inputting a uniform superposition Now we add an additional qubit register on the state |0 R and rotate it to |θ = cos θ|0 + sin θ|0 by an angle θ(E l , β) = arccos(e − βE l 2 ) conditioned on the system, obtaining which is the target state with N = d N Z .Other approaches use more recent quantum simulation ideas, such as the technique based on sums of unitaries [156], which leads to a better dependence on the approximation error in many cases of interest.
Finally, to obtain |ρ β with high precision, one must then apply amplitude amplification of the state |ψ , to output the component of Eq. ( 153) corresponding to the register state |0 R .The gate complexity of this, however, grows linearly in N , which sets the leading (almost) exponential gate cost of the algorithm.
There already exists improvements to this type of scheme.In one dimension, one can instead implement this same algorithm connecting subsequent segments of the chain, which can reduce the gate complexity to a polynomial ∼ N O(β) [154].Also, recent progress shows that the phase estimation and amplitude amplification steps in these schemes can instead be replaced by random circuits with post-selection [157], making them more amenable to current technologies.For commuting Hamiltonians, a purification in the form of a tensor network state (a PEPS) can be very efficiently prepared through an adiabatic algorithm [158].See also the recent [159], which produces a purification of a thermal state ∝ e −βH1 starting from that of another Hamiltonian H 0 , and is efficient when ||H 0 − H 1 || is not too large.A potentially efficient scheme along these lines is the one presented in [140].

Efficient algorithms from physical features
Perhaps the main caveat of most of the aforementioned algorithms is that they are constructed for very general Hamiltonians: they do not always make a very clear use of the physical features that we expect could simplify the problem, such as locality or any one of its consequences.However, we expect that there exists efficient schemes to prepare Gibbs states of local model that belong to the class of "completely analytical interactions", aided by facts such as decay of correlations.This is the case for the proposal in [51], whose efficiency depends on two such factors: the speed of decay of CMI from Sec. IV C, and the error in the local indistinguishability from Sec. V A. The algorithm uses iterations of the recovery map that appeared in Eq. ( 70), which are guaranteed to yield a low error if the CMI decays quickly enough.The main idea is that one can construct local decoupled parts of the thermal state independently, and then join them together to make up the whole ρ β via subsequent applications of the recovery map.The local indistinguishability guarantees that the local parts used in the recovery are also accurate parts of the whole thermal state the algorithm constructs.The results on the exponential decay of correlations described in Sec.IV B and of exponential decay in CMI from Sec. IV C thus guarantee that there exists efficient algorithms for local models with a high enough temperature β ≤ β * NC , and also for 1D systems as long as the corresponding assumptions on the correlation decay are satisfied.
Another potential alternative route along these lines is to find out under which conditions the dissipative dynamics (that is, when the system is coupled weakly to some external bath) associated to a Gibbs state converge quickly.Then, tools to engineer dissipative dynamics can be in principle implemented in a quantum computer [160,161].The challenge is to find under which conditions these dynamics have a fast convergence or mixing rate.Rigorous results along these lines are so far mostly limited to commuting Hamiltonians, as we explain in Sec.VIII.

VIII. COMMUTING HAMILTONIANS
There is a much simpler and yet physically relevant class of Hamiltonians that merits a specific mention: those in which all the {h i } commute with each other.This includes many interesting models for quantum many-body physics and quantum computation.It includes all stabilizer Hamiltonians, including the toric code and other widely studied examples, as well as many other models describing various topological phases of matter.
Notice that these are not the same as classical Hamiltonians: even if we can diagonalize all the h i simultaneously, the energy eigenbasis will in general be highly entangled.In contrast, classical Hamiltonians have a product eigenbasis.At the same time, we have e −β(H−hi) = e −βH e βhi = e βhi/2 e −βH e βhi/2 , ( so the tools in Sec.III B and III C are unnecessary.This means that many of the results described above take much simpler forms and easier proofs, as we now briefly explain.
Let us divide the lattice into two complementary regions D, E, with boundary Clearly Tr E [e −β(H B +H I ) ] has non-trivial support on the region ∂ D only.This means that the local indistinguishability from V A holds with no error by choosing A = D, B = E ∩ ∂ DE , C = E \ B, so that dist(A, C) is roughly the width of the boundary.A similar exact result applies to the Hamiltonian of mean force.We now briefly show the proof, which is elementary and can be found in [162].If we define e −βΦ ≡ Tr E [e −β(H E +H I ) ], we see that where α is some constant, and Φ is localized in D ∪ ∂ DE and has bounded norm, as implies that which upon tracing E out and multiplying by e βH D , implies that ||Φ|| ≤ 2h|∂ DE |.It should also be no surprise then that the Markov property of Sec.IV C also holds exactly.This means that if we define regions A, B, C such that A, C are shielded by region B, we have that I(A : C|B) = 0 [163,164].In fact, a converse statement holds (vanishing CMI implies the state is a thermal state of a local Hamiltonian) when the interaction graph Λ is triangle-free [165].As mentioned in Sec.IV C, this is the quantum equivalent of the Hammersley-Clifford theorem [80].
All these exact results strongly suggest that algorithms such as those described in Sec.VII are much more efficient in this setting.For instance, it is immediate from a repeated application of Eq. ( 154) that the thermal states can be expressed exactly as tensor networks with constant bond dimension D ≤ e O(k) .There also exists quantum algorithms for commuting Hamiltonians that are significantly more efficient than the general ones in Sec.VII D [158].In fact, the exact Markov property guarantees that Gibbs states of finite temperature can always be prepared efficiently (in linear time), simply by iterating applications of the Petz recovery map [84].
Another important fact about commuting Hamiltonians is that, when weakly coupled to an external heat bath, the dissipative dynamics is known to remain local.This process is modeled by a Lindblad equation of the form where L α are local "jump" operators and α indexes the energy gaps of H.The best known example are the Davies generators [166].See e.g.[167,168] for introductory references.
The interesting cases are those for which ρ β is the unique fixed point, such that L(ρ β ) = 0.The important question then is how long does this local dissipative evolution e tL (ρ) take to approach the Gibbs state?This can be tackled by analyzing the spectral gap and the log-Sobolev constant of L. A bound on the spectral gap was proven assuming the decay of correlations in [169], and for specific models in [170][171][172][173].This shows that it takes poly(N ) time to thermalize.On the other hand, a bound on the so-called log-Sobolev contant [174] instead constraints that to O(log N ).This was recently proven for 1D chains [175,176] and in [20,177] for other models of dissipation.

IX. CONCLUSIONS AND OPEN QUESTIONS
There are many different models and systems for which we would like to know their properties at equilibrium.This is due to their pervasive presence in physics, but also due to their appearance in learning and sampling algorithms.
It may appear at first that studying thermal states of general complex quantum models is a very challenging task.We hope to have illustrated the fact that this is not always the case: for a large array of situations involving local Hamiltonians many non-trivial analytical statements can be made.These are both about universal physical features of the models at hand, but also about the computational complexity of the problems the physics poses.The connections found motivate a timely research program, largely inspired by quantum information theory: to understand the links between fundamental physical features and their computational complexity.
In the present context, much of the technical difficulty lies in working with with the matrix exponential of any such a Hamiltonian, in which typically the individual terms do not commute.As seen in Sec.III, however, we have a number of mathematical tools to deal with these in many physically relevant regimes.

A. List of open questions
We have covered a number of statements in different areas and summarized many of the existing results on the topic.However, plenty of relevant questions are still open.We now summarize some of them, which we believe to be of particular physical or technical interest: • In Sec.III A we have explained the technical concept of cluster expansion, and their large number of applications in this context.Understanding its convergence further, in particular when considering the expansion of operators as done in [83], seems crucial for understanding relevant ideas such as the Hamiltonian of mean force (Sec.V B), and the decay of the conditional mutual information (Sec.IV C).It could also be interesting to extend them to long-range interacting systems [178].
• The ideas of Sec.V, and in particular the Hamiltonian of mean force, have in the past few years features in the study of thermodynamic quantities for strongly coupled systems [89,91,92,179,180].Many existing results on this topic focus on simpler models than those considered here, such as individual spins coupled to quadratic baths [90,181].In Sec.V C we have outlined how one can also answer thermodynamic questions about strongly coupled spin systems.It would be interesting to further explore whether the results from Sec. V have further non-trivial consequences, such as those found in [182].
• With the advent of quantum computing, there are multiple ongoing efforts aiming to find more efficient quantum algorithms for thermal sampling and partition functions.As we have seen in Sec.VII D, many of the existing ones are designed for very general situations, and as such have performance bounds that will often be too conservative.Some existing schemes make use of relevant physical features to simplify them [51,154,158,183], but it seems that there is still plenty of room for exploring the kinds of regimes in which explicit and efficient algorithms can be proven.Since preparing thermal states is presumably an easier task than a general quantum computation (at least in certain regimes), it may be possible to tailor them to the limited capabilities of near-term noisy devices [157,184].
• An important question when dealing with large quantum systems is to construct efficient ways to verify and characterize them.In the present context, the question is that of the complexity of the problem of thermal state tomography [12,29,162,185].The basic question is: can we learn the Hamiltonian from a small number of simple (local) measurements of few copies of e −βH /Z? Optimal sample and computational complexity bounds exists in the high temperature regime, in which the cluster expansion applies [29], but beyond that our theoretical understanding is not complete (for instance, in 1D).This problem has a number of applications, including the verification of quantum computation in which thermal sampling is involved [4][5][6][7], or the characterization of many-body entanglement [186,187].
H S + H B + H I , in which the interaction H I is arbitrarily weak.In that limit, we can approximate so that the eigenstates are product between system and bath.
A common and often relevant assumption is that the dynamics is ergodic, in the sense that we can describe the system-bath by the microcanonical ensemble, where all configurations of the same energy E have equal probability.This is The bath is typically understood as an infinitely large system, with an unbounded heat capacity dβ .The bath also obeys the very weak constraint that its entropy is extensive with system size.Both these facts translate into the density of states of the bath B having the following exponential form (see e.g.[189]) (A3) That is, the number of bath eigenstates with energy E B is exponential in that energy.
We can use this to obtain the expression for the reduced density matrix on the system which are exactly the Gibbs weights.

Jaynes' maximum entropy principle
A well-known property that uniquely characterizes thermal states is the so-called maximum entropy principle.This specifies that of all the states with a given energy (or the expectation value of some other quantity) they are the state of largest possible entropy.To see this, let us choose ρ = ρ β such that Tr (A9) Notice that these steps are unchanged if instead of considering just the Hamiltonian H we take into account a higher number of charges Q i with their chemical potentials µ i , and the state exp(− j µ j Q j )/Tr[exp(− j µ j Q j )].This simple principle is often interpreted as follows: if there is some state of which we only have partial information (in this case, its average energy), it is very often a good guess to assume it is the thermal state of that energy.Since it is the state with maximum entropy (which we can associate with "maximum ignorance"), its choice makes the fewest assumptions about the structure of the actual state at hand.This idea is often applied in fields like statistical inference and optimization problems, as well as certain quantum algorithms [4][5][6][7].It can also be seen as a variational definition that uniquely singles out thermal states.This allows for the application of this principle in different types of algorithms for finding or characterizing them [12,190].Considering that Eq. ( 28) also applies to the generator A l (iβ), using ( 29) and ( 36) and the triangle inequality repeatedly yields Proof of Quantum Belief Propagation Eq. (42) The aim of this section is to give an expression for the derivative of the matrix exponential de −βH(s) ds , where we assume H(s) = H + sA.These steps are elementary and have been omited in some previous relevant references [12,48,49], but here we reproduce them in full as they appear in [188] First, using DuHamel's identity, we can write .
Proof of Eq. (55) This can also be found in [17].Let F (t) be a differentiable and bounded operator.DuHamel's identity for a general operator function F (t) states that .This follows from H ölder's inequality Eq. ( 7) and the positivity of C, C .where the last step follows from the triangle inequality, Eq. (B23) and the change of variable u − 1/2 = s.
Proof of Eq. (126) This can also be found in [79].Let p(x) be an arbitrary probability distribution with ∞ −∞ xp(x)dx = a, and the condition that p(x) be Lebesgue integrable.We aim to bound where in the second line the first term vanishes by definition, and in the third line we used the concentration bound Eq. ( 99).

FIG. 3 :
FIG.3: Regions C, D in the lattice are separated by a distance dist(C, D).The mutual information between these two regions typically decays exponentially with their distance.

FIG. 4 :FIG. 5 :
FIG.4:In this configuration, the region B shields A from C, such that the minimum distance between A and C is given by the shortest path from A to C through B.

FIG. 6 :
FIG. 6: Choice of regions for the proof in Sec.V A 1, and depiction of the distance l on which the operator E l BC

FIG. 7 :
FIG. 7: Illustration of the regions of the Hamiltonian of mean force.The correction term Φ A is exponentially well approximated by Φ lA , which has support on the region ∂A only.

Appendix B :
Miscellaneous proofsLocality of operator EAIn Sec.III B we defined the operatorE A = e −β(H+A) e βH = T e − β 0 dse −sH Ae sH , (B1)which is the solution of the differential equationdE A dβ = −E A A(iβ),(B2)with A(iβ) = e −βH Ae βH .We can also define the localized generatorA l (iβ) = l m=0 β m C m (A),(B3)and also the corresponding operator E A (l) as the solution of dE A (l) dβ = −E A (l)A l (iβ).

2 − 1 / 2
Finally, log Tr[Ce H1+H2 ] − log Tr[Ce H1 ] ds||e s(H1+tH2) H 2 e −s(H1+tH2) ||, (B26) p(x + a) + p(−x + a))dx (last step we used the fundamental theorem of calculus.This can now be integrated by parts as − this implies that changing the Hamiltonian by A changes the logpartition function at most by β||A||.The second is a similar result that holds for expectation values of positive operators.Let H 1 , H 2 be as before, and let C > 0. Then log Tr[Ce H1+H2 ] − log Tr[Ce H1 ]