Optimal Quantum Thermometry with Coarse-grained Measurements

Precise thermometry for quantum systems is important to the development of new technology, and understanding the ultimate limits to precision presents a fundamental challenge. It is well known that optimal thermometry requires projective measurements of the total energy of the sample. However, this is infeasible in even moderately-sized systems, where realistic energy measurements will necessarily involve some coarse graining. Here, we explore the precision limits for temperature estimation when only coarse-grained measurements are available. Utilizing tools from signal processing, we derive the structure of optimal coarse-grained measurements and find that good temperature estimates can generally be attained even with a small number of outcomes. We apply our results to many-body systems and nonequilibrium thermometry. For the former, we focus on interacting spin lattices, both at and away from criticality, and find that the Fisher-information scaling with system size is unchanged after coarse-graining. For the latter, we consider a probe of given dimension interacting with the sample, followed by a measurement of the probe. We derive an upper bound on arbitrary, nonequilibrium strategies for such probe-based thermometry and illustrate it for thermometry on a Bose-Einstein condensate using an atomic quantum-dot probe.


I. INTRODUCTION
Thermometry is a basic metrological task, vital throughout science and technology. Estimating temperature is important on all scales, ranging from astronomical bodies with temperatures in the millions of Kelvin to atomic systems near absolute zero. In particular, applications of thermometry in nano- or micro-scale devices are becoming increasingly relevant as technology advances [1][2][3][4]. Examples include, for instance, accurate temperature estimation of ultra-cold gases [5][6][7][8][9] and of electronic systems [10][11][12], or the use of atomic-size devices, such as colour centers in diamond or quantum dots, as probes to be employed in a variety of systems [13][14][15][16]. At these scales, quantum effects have a significant influence on the achievable precision. It is therefore important to understand what the fundamental limits for temperature estimation in quantum systems are.
When the measurement resolution is unlimited, the ultimate precision of temperature estimation allowed by quantum mechanics is obtained by performing projective measurements of energy [32][33][34]. However, for large (or even moderately-sized) many-body systems, one seldom has access to measurements which distinguish individual energy levels. Instead, one usually measures only a local subsystem of the sample [19,35,36] or performs a global measurement with only a finite resolution [30,31] [see Fig. 1(a)]; alternatively, one addresses the sample indirectly, by measuring a probe that has interacted with it [18,19,27,29,37] [see Fig. 1(b)]. All of these cases are examples of a coarse-grained measurement, which from an abstract point of view can be described by a d-outcome generalized quantum measurement of a D-dimensional system, with d < D. The fact that d ≪ D in most physically relevant cases may reduce the precision significantly. It is hence natural to ask what the optimal measurement strategy and the associated precision of temperature estimation are under such limitations. In this paper, we put forth a framework for addressing this question in detail. The framework is based on ideas from signal processing and parameter-estimation theory, and provides a simple, easy-to-use toolbox for studying coarse-grained thermometry of both few- and many-body systems. We illustrate the framework by applying it to paradigmatic many-body models of spin lattices, both close to and far from criticality.
In the second part of the article, these abstract ideas are applied to probe-based temperature measurements. Here, the temperature of a sample is estimated by letting it interact with a probe (and possibly some auxiliary system), and then measuring the probe, as illustrated in Fig. 1(b). This type of measurement is particularly appealing since it provides a natural way to overcome one of the main challenges in thermometry: the design of noninvasive measurements, e.g., for ultra-cold atomic gases [9,38-42]. In such probe-based measurements, a natural strategy is to let the probe reach thermal equilibrium with the sample [32]. Yet, it has been shown that the precision can be considerably enhanced by nonequilibrium strategies, where the probe either interacts with the sample for a finite time [9, 22-28, 43, 44], couples strongly to the sample [18,19], or uses an ancillary catalytic system [29]. Here, we use our framework to obtain a fundamental bound on any such nonequilibrium strategy. We map the problem of probe-based thermometry to that of coarse-grained thermometry, and determine the maximal sensitivity that can be obtained with a probe of dimension d. We construct a specific fine-tuned sample-probe interaction that always saturates the bound, and notably show that it can also be reached in relevant experimental situations. In particular, when the sample and the probe are described by a harmonic oscillator and a qubit, respectively, the optimal nonequilibrium estimation scheme in the low-temperature regime can be obtained via the Jaynes-Cummings Hamiltonian. This is of direct relevance to temperature measurements of Bose-Einstein condensates [45] or micromechanical resonators [22,23] via qubit probes.

II. FRAMEWORK
To be specific, we consider a system S living in a D-dimensional Hilbert space, described by a Hamiltonian H, and initially in a canonical thermal state

τ = e^{−βH}/Z,  (1)

where β = 1/T is the inverse temperature (we adopt units such that k_B = 1) and Z = Tr e^{−βH} is the partition function. This is a family of states parameterised by the temperature, and the smallest variance in estimating this parameter, based on any measurement, is hence lower bounded by the quantum Cramér-Rao bound [46] (see also [47,48]),

var(T̃) ≥ 1/(nF),  (2)

where T̃ is the temperature estimator, n is the number of repetitions of the measurement, and F is the quantum Fisher information with respect to T, which we refer to as the 'thermal Fisher information'. It is given by

F = β^4 var(H) = β^2 C,  (3)

with var(H) := ⟨H^2⟩_τ − ⟨H⟩_τ^2, where the angle brackets denote averaging over the quantum state, ⟨Ô⟩_τ := Tr(Ôτ), and C is the heat capacity of the system (C := d⟨H⟩/dT).

FIG. 1. (a) Thermometry with coarse-grained energy measurements. The measurement can be understood as resulting from post-processing of a fine-grained, projective energy measurement. Energies are grouped into bins and a single outcome is assigned to each bin. (b) Thermometry with probe-based measurements. A probe interacts unitarily with a target system. A measurement is then performed on the probe alone and used to infer the temperature of the target.

The optimal measurement, attaining the thermal Fisher information, is a projective measurement onto the eigenbasis of H, i.e., a projective measurement of the system energy. In this optimal scenario, the more the energy fluctuates, the more precise the measurement can be in principle. The optimal measurement saturates the inequality (2) for arbitrary n when the temperature estimator is unbiased. When an unbiased estimator is not available, a large class of generic biased estimators will still asymptotically saturate the inequality (2) in the limit of many repetitions (n → ∞) [49].
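The relation between the thermal Fisher information and the heat capacity can be checked numerically. The sketch below, with an arbitrary example spectrum and temperature (illustrative choices, not taken from the text), verifies that β^4 var(H) agrees with β^2 d⟨H⟩/dT:

```python
import numpy as np

# Numerical check: for tau = exp(-beta*H)/Z, the thermal Fisher information
# F = beta^4 * var(H) equals beta^2 * C, with C = d<H>/dT the heat capacity.
rng = np.random.default_rng(0)
E = np.sort(rng.uniform(0.0, 10.0, size=50))  # example energy levels (arbitrary)
T = 2.0
beta = 1.0 / T

def mean_energy(temp):
    w = np.exp(-E / temp)
    p = w / w.sum()
    return np.dot(p, E)

q = np.exp(-beta * E); q /= q.sum()           # thermal populations
var_H = np.dot(q, E**2) - np.dot(q, E)**2
F = beta**4 * var_H                           # thermal Fisher information, Eq. (3)

# Heat capacity via a symmetric finite difference of <H> in T
dT = 1e-5
C_heat = (mean_energy(T + dT) - mean_energy(T - dT)) / (2 * dT)
print(F, beta**2 * C_heat)                    # the two should agree closely
```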
Moreover, in the specific case of temperature estimation in many-body systems consisting of N ≫ 1 particles, one can go even further and explicitly construct an estimator that, despite being biased for any finite N, will saturate the Cramér-Rao bound (2) as N → ∞, even for n = 1 [50].
However, when fine-grained measurements of energy are not available, the bound (2), while remaining valid, will in general become too loose. We formalise the problem as follows. Suppose the resolution of the experiment is limited to d < D measurement outcomes. What, then, is the maximal precision for estimating the temperature of S, and what is the optimal d-outcome measurement achieving it? Below, we provide a systematic way to identify the optimal measurement. Moreover, in Sec. V, we prove that the highest Fisher information achievable by a probe of dimension d < D undergoing an arbitrary interaction with the sample is equal to the optimal d-outcome coarse-grained Fisher information in the above sense. We thus provide a fundamental benchmark for any conceivable protocol of probe-based thermometry. In particular, this includes any standard thermometric technique in current experimental setups.

III. OPTIMAL COARSE GRAINING
We consider coarse-grained thermometry on a D-dimensional system.
We take coarse graining to mean that only generalised measurements, i.e., positive operator-valued measures (POVM) with at most d < D outcomes are available. We then construct a framework for identifying the optimal POVM for thermometry in two steps.
First, we show that the optimal POVM is a projection onto energy subspaces of the system. This means that we can split the system spectrum into d subsets and study measurements which project onto the corresponding eigensubspaces.
Second, we show that the optimal choice of subsets consists of consecutive "bins," i.e., the sets are not interspersed. We then lay down a method for constructing the optimal choice of bins, for any given system spectrum. This is done by casting the problem in the language of an analogous signal-processing problem, known as Lloyd-Max quantization [51].

A. Optimal POVM
We take a d-outcome POVM M = {M_α}_{α=1}^d and a system in the thermal state τ of Eq. (1). Each outcome α then occurs with probability

p_α = Tr(M_α τ).  (4)

This distribution contains information about the temperature T, as quantified by the Fisher information [52]

F_M = Σ_α (∂_T p_α)^2 / p_α,  (5)

which, for a thermal state, becomes [30]

C = β^4 Σ_α [Tr(H M_α τ) − ⟨H⟩ p_α]^2 / p_α.  (6)

Note that, after coarse graining, p_α is no longer in the so-called exponential family with respect to T, which means that no temperature estimator can satisfy the Cramér-Rao bound for any finite number of repetitions n [49]. However, the Fisher information C is still a key precision quantifier as, for all unbiased and certain generic biased estimators, such as the maximum likelihood estimator, the Cramér-Rao bound is saturated asymptotically in the n → ∞ limit. Moreover, in the same limit, the Fisher information retains its key role even in the Bayesian estimation approach. We thus set as our goal to determine the optimal d-outcome POVM which maximises the Fisher information C.
We first observe that the POVM elements M α can be taken to be diagonal in the energy eigenbasis. Because τ is diagonal in this basis, only diagonal elements of M α contribute to the probability p α in Eq. (4). Dropping all off-diagonal elements from each M α results in a valid POVM since the operators remain positive and still sum to identity. Hence, for every POVM, there exists a diagonal POVM which achieves the same p α and thus the same Fisher information. It is therefore sufficient to consider diagonal POVMs when looking for an optimal POVM maximising C.
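This dephasing argument can be illustrated numerically. The following sketch, using a randomly generated two-outcome POVM (an illustrative construction, not from the text), checks that dropping the off-diagonal elements of each POVM element leaves every outcome probability p_α = Tr(M_α τ) unchanged:

```python
import numpy as np

# Dephasing a POVM in the energy eigenbasis preserves all outcome probabilities,
# because the thermal state tau is diagonal in that basis.
rng = np.random.default_rng(1)
D = 6
E = np.sort(rng.uniform(0, 5, D))         # example spectrum (arbitrary)
beta = 1.0
q = np.exp(-beta * E); q /= q.sum()
tau = np.diag(q)                          # thermal state, diagonal in energy basis

# A random 2-outcome POVM: M1 = A, M2 = I - A, with 0 <= A <= I
X = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
A = X @ X.conj().T
A /= np.linalg.eigvalsh(A).max() + 1e-9   # rescale so that A <= I
povm = [A, np.eye(D) - A]

for M in povm:
    p_full = np.trace(M @ tau).real       # probability with the full element
    p_diag = np.dot(np.diag(M).real, q)   # probability with the dephased element
    assert np.isclose(p_full, p_diag)
```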
Next, let us note that C is convex with respect to the POVM. That is, denoting by C_M the Fisher information corresponding to a particular POVM and considering two POVMs M and N and a mixing parameter 0 ≤ λ ≤ 1, we have

C_{λM + λ̄N} ≤ λ C_M + λ̄ C_N,  (7)

where λ̄ = 1 − λ. This can be seen by rewriting Eq. (6) as C_M = β^4 Σ_α W_α^2 / p_α, with W_α := Tr(H M_α τ) − ⟨H⟩ p_α. Equation (7) is then an immediate consequence of the fact that W^2/p is a jointly convex function of p and W (see, e.g., Ref. [54]).
Finally, we show that the optimal POVM is necessarily a collection of nonoverlapping projectors onto eigensubspaces of H. Indeed, take a POVM M such that, for some eigenstate |k⟩ of H, there are at least two POVM elements for which ⟨k|M_α|k⟩ > 0. Define

ς_α := ⟨k|M_α|k⟩.  (8)

Now, construct d new POVMs N^{(γ)}, with elements

N_α^{(γ)} := M_α + (δ_{αγ} − ς_α) |k⟩⟨k|.  (9)

Each N^{(γ)} has the property that only N_γ^{(γ)} has a nonzero k'th diagonal element, namely ⟨k|N_γ^{(γ)}|k⟩ = 1. Since the M_α form a POVM, we have ς_α ≥ 0 and Σ_α ς_α = 1, so that M = Σ_γ ς_γ N^{(γ)}. By the convexity (7) of the Fisher information,

C_M ≤ Σ_γ ς_γ C_{N^{(γ)}},

and it follows that there must be at least one γ for which C_{N^{(γ)}} ≥ C_M. This means that the optimal POVM will have to be one that consists of nonoverlapping projectors on eigensubspaces of H. Given a (possibly degenerate) Hamiltonian H = Σ_i E_i |i⟩⟨i|, the optimal POVM will thus be of the form

M_α = Σ_{i ∈ I_α} |i⟩⟨i|,  (13)

where the I_α define a partition of the set of all eigenenergies into nonoverlapping subsets ("bins").
To summarize, optimal, coarse-grained thermometry can always be achieved by considering projective measurements onto nonoverlapping eigenenergy subspaces.

B. Optimal binning
We now construct a method for determining the optimal eigenenergy subsets, defining the optimal POVM.
For convenience, we choose the basis of H such that the energies are ordered,

E_1 ≤ E_2 ≤ · · · ≤ E_D,  (14)

and write the probability of finding the system in bin I_α as [cf. Eq. (4)]

p_α = Σ_{i ∈ I_α} q_i,  (15)

where q_i = exp(−βE_i)/Z. Next, we introduce the "bin energies" (normalized average energy within each bin),

ε_α = (1/p_α) Σ_{i ∈ I_α} E_i q_i,  (16)

and, with these definitions, reexpress the Fisher information in Eq. (6) for the corresponding measurement as

C = β^4 Σ_α p_α (ε_α − ⟨H⟩)^2.  (17)

We shall henceforth refer to this as the coarse-grained Fisher information. Compared with Eq. (3), the expression for C describes how each of the bin energies ε_α fluctuates away from the average. As a first step towards finding the optimal sets I_α, in Appendix A, applying a result from Ref. [55], we prove that the best choice of I_α is given by a binning into consecutive intervals, I_α = [b_{α−1}, b_α), with b_0 ≤ E_1 and b_d = ∞ (the last boundary, b_d, will not enter into any physical quantity and is introduced just to give the last interval an open end). Note that, for discrete spectra, the boundaries b_α need not be exactly at energy eigenvalues. Positioning a boundary anywhere between neighboring eigenvalues results in the same POVM in Eq. (13).
The remaining task is then to find the optimal intervals I_α = [b_{α−1}, b_α) which maximize C. This will give the best strategy for temperature estimation using a d-outcome POVM.
Before carrying out this optimization, it is useful to recast the problem in terms of the density of states (DOS)

Ω(E) = Σ_i δ(E − E_i),  (18)

where δ denotes the Dirac delta function (note that this definition does not assume a continuous spectrum). Expectation values of any function g(H) of the Hamiltonian may then be written as

⟨g(H)⟩ = (1/Z) ∫ dE Ω(E) g(E) e^{−βE}.  (19)

This means that we can define the distribution of energy as

q(E) = Ω(E) e^{−βE}/Z,  (20)

so that expectation values can be computed simply in terms of integrals over q(E). This way, the probabilities p_α in Eq. (15) and the bin energies in Eq. (16) can be conveniently written as

p_α = ∫_{b_{α−1}}^{b_α} dE q(E),   ε_α = (1/p_α) ∫_{b_{α−1}}^{b_α} dE E q(E).  (21)

These quantities are therefore all expressed explicitly as functions of the boundaries b_α. The advantage of introducing the DOS is twofold. First, it allows for a unified treatment of Hamiltonians with discrete and quasi-continuous spectra (as one would expect in quantum many-body systems). Second, it allows us to frame the problem in the language of signal processing. A common task in signal processing is to transmit a continuous function q(E) through a channel. Often, in order to do so, the function must be discretized into a finite set of bins I_α = [b_{α−1}, b_α). The question is then which choice of bins leads to the optimal transmission. This problem is solved by the scheme known as Lloyd-Max quantization (see Ref. [51], Chapter 3). If one uses the mean squared variations of energy as a figure of merit, the maximization of the Fisher information [see Eq. (17)] becomes entirely analogous to this signal-processing problem.
To proceed, we introduce the quantity

D = β^4 Σ_α ∫_{b_{α−1}}^{b_α} dE q(E) (E − ε_α)^2.  (22)

It can be directly verified that the thermal Fisher information given by Eq. (3) can be decomposed as

F = C + D,  (23)

which means that the task of maximizing C, for a fixed F, is tantamount to that of minimizing D.
The minimization can be carried out in the usual way, by equating to zero the derivatives of D with respect to b_α. A straightforward calculation shows that the minima of D occur when the boundaries b_α satisfy the implicit (and generally nonlinear) equations

b_α = (ε_α + ε_{α+1})/2,   α = 1, . . . , d − 1.  (24)

These equations are implicit because ε_α itself is a function of the {b_α} [Eq. (16)]. This summarizes the core of our framework: Equation (24) provides a recipe for how to optimize the energy bins in a d-outcome POVM in order to maximize the thermometric precision.
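In practice, Eq. (24) can be solved with a simple fixed-point iteration in the spirit of Lloyd-Max quantization. The sketch below is our own minimal implementation (not from the text): it alternates between computing bin centroids and placing each boundary midway between neighboring bin energies; the Gaussian test distribution at the end is an illustrative choice.

```python
import numpy as np

# Lloyd-Max-style fixed-point iteration of Eq. (24) on a discretized q(E).
def optimal_bins(E, q, d, iters=300):
    """E: energy grid; q: probability masses on the grid (sum to 1); d: #bins."""
    cdf = np.cumsum(q)
    b = np.interp(np.arange(1, d) / d, cdf, E)    # start boundaries at quantiles
    for _ in range(iters):
        edges = np.concatenate(([E[0] - 1.0], b, [E[-1] + 1.0]))
        idx = np.digitize(E, edges) - 1           # bin index of each grid point
        p = np.array([q[idx == a].sum() for a in range(d)])
        eps = np.array([np.dot(q[idx == a], E[idx == a]) / p[a] for a in range(d)])
        b = 0.5 * (eps[:-1] + eps[1:])            # Eq. (24)
    return b, p, eps

# Example: a discretized Gaussian energy distribution with d = 2 outcomes
E = np.linspace(-6, 6, 4001)
q = np.exp(-E**2 / 2); q /= q.sum()
b, p, eps = optimal_bins(E, q, 2)
C = np.dot(p, (eps - np.dot(q, E))**2)            # Eq. (17) with beta = 1
F = np.dot(q, E**2) - np.dot(q, E)**2             # thermal FI with beta = 1
print(b, C / F)                                   # boundary near 0, C/F near 2/pi
```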

C. Illustrative examples
Let us now present two examples using our framework for optimal coarse-grained thermometry.

Noninteracting qubits
A simple but illustrative example is a system of N identical, noninteracting qubits. The system is in a thermal state, and we take the ground- and excited-state energies to be 0 and 1, respectively. The energy levels of the system thus range from 0 to N in integer steps. The probability to find the system in a state with energy j is then

q_j = [N!/(j!(N − j)!)] s^j r^{N−j},  (25)

where we have defined the excited-state population s = 1/(e^β + 1) and r = 1 − s. For a d-outcome measurement, the probabilities and bin energies take the form

p_α = Σ_{j ∈ I_α} q_j,   ε_α = (1/p_α) Σ_{j ∈ I_α} j q_j,  (26)

where the bin positions b_α, which are integers in this case, are determined from Eq. (24) (with b_0 = 0). Note that, for this system, the average energy is simply ⟨H⟩ = N s, while the thermal Fisher information (3) is F = β^4 N r s.

[Footnote to Eq. (24): The solutions to this equation are also guaranteed to be an actual minimum of D, never a maximum. First, ∂^2 D/∂b_α ∂b_{α'} = 0 for α' ≠ α. Second, at b_α = (ε_{α+1} + ε_α)/2, we have ∂^2 D/∂b_α^2 = 2β^4 (ε_{α+1} − ε_α) q(b_α) ≥ 0, since q(b_α) ≥ 0 by construction and ε_{α+1} ≥ ε_α by hypothesis.]
In Fig. 2, we show the ratio C/F as a function of the bin positions b_α, for the cases d = 2, 3. The bin positions b_α which maximize C/F are precisely the solutions of Eq. (24). According to the De Moivre-Laplace theorem, in the N ≫ 1 limit, the distribution q_j becomes Gaussian. Using this, we show in Appendix B that, for binary measurements (d = 2) and N ≫ 1, the optimal binning strategy leads to

C/F = 2/π,  (27)

and is achieved when the boundary is placed at b = ⟨H⟩ = N s. This result is noteworthy: it shows that, irrespective of the number of qubits N, it is always possible to construct a dichotomic measurement strategy which captures 2/π ≈ 0.63 of the full thermal Fisher information.
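This limit can be checked directly from the binomial distribution q_j. The sketch below (the parameters N = 2000 and β = 1 are illustrative choices) places the single boundary at ⟨H⟩ = Ns and compares the resulting C with F = β^4 Nrs:

```python
import numpy as np
from math import lgamma

# Two-outcome coarse-grained Fisher information for N noninteracting qubits,
# with the boundary at the mean energy <H> = N*s; C/F should approach 2/pi.
N, beta = 2000, 1.0
s = 1.0 / (np.exp(beta) + 1.0)           # excited-state population
r = 1.0 - s
j = np.arange(N + 1)
logq = np.array([lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1) for k in j]) \
       + j * np.log(s) + (N - j) * np.log(r)
q = np.exp(logq)                         # binomial energy distribution q_j

F = beta**4 * N * r * s                  # thermal Fisher information
b = int(round(N * s))                    # boundary at the mean energy
mask = j <= b
p = np.array([q[mask].sum(), q[~mask].sum()])
eps = np.array([np.dot(q[mask], j[mask]) / p[0],
                np.dot(q[~mask], j[~mask]) / p[1]])
C = beta**4 * np.dot(p, (eps - N * s)**2)
print(C / F)                             # close to 2/pi ~ 0.6366 for large N
```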

Linear density of states
To contrast with the N-qubit case, we now consider an example of a system with a continuous spectrum. Namely, we assume the system has a linear density of states, Ω(E) ∝ E. Such a DOS is met, for instance, in a noninteracting, ultra-relativistic gas in two dimensions. If the system is in a thermal state [Eq. (1)], the average energy is simply ⟨H⟩ = 2/β and the variance is var(H) = 2/β^2. Thus, the thermal Fisher information given in Eq. (3) is F = 2β^2.
We first consider the case of binary measurements, d = 2, defined by a single boundary b. The probabilities p_1 and p_2 [Eq. (21)] are then given by

p_1 = 1 − (1 + βb) e^{−βb},   p_2 = (1 + βb) e^{−βb},  (28)

and the corresponding bin energies become

ε_2 = [(βb)^2 + 2βb + 2] / [β(1 + βb)],   ε_1 = (⟨H⟩ − p_2 ε_2)/p_1.

Thus, the Fisher information for the measurement, Eq. (17), is

C = β^4 [p_1 (ε_1 − ⟨H⟩)^2 + p_2 (ε_2 − ⟨H⟩)^2].

To find the optimal partition, we solve Eq. (24) for b, i.e., b = (ε_1 + ε_2)/2. This is a nonlinear equation which can be solved numerically. A plot of C/F is shown in the inset of Fig. 3(a). It attains a maximum at βb ≈ 2.589. At this point C ≈ 0.643 F, i.e., the binary measurement reaches approximately 64% of the maximal Fisher information attainable with any possible measurement (slightly higher than in Eq. (27)).
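The numerical solution can be sketched as follows. Instead of iterating Eq. (24), the code below scans the boundary b on a grid, which is an equivalent route for d = 2 (β = 1 for concreteness):

```python
import numpy as np

# Scan of C/F over the boundary b for the linear-DOS example,
# q(E) = beta^2 * E * exp(-beta*E), with beta = 1, so <H> = 2 and F = 2.
beta = 1.0
bs = np.linspace(0.1, 8.0, 2000)
p2 = (1 + bs) * np.exp(-bs)                  # upper-bin probability
p1 = 1 - p2
eps2 = (bs**2 + 2 * bs + 2) / (1 + bs)       # upper-bin energy
eps1 = (2.0 - p2 * eps2) / p1                # lower-bin energy (from <H> = 2)
C = p1 * (eps1 - 2)**2 + p2 * (eps2 - 2)**2  # Eq. (17) with beta = 1
ratio = C / 2.0                              # F = 2*beta^2 = 2
i = np.argmax(ratio)
print(bs[i], ratio[i])                       # ~2.59 and ~0.643
```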
The dependence of C on the number of outcomes d, for optimal binnings, is shown in Fig. 3(a). Quite remarkably, even with as few as 8 bins, one can already reach a precision of ≈ 97% of F, the maximal possible precision. An illustration of the probabilities p_k and the corresponding bins b_k is given in Fig. 3(b).

D. General remarks and extension to imperfect measurements
The two examples in Sec. III C carry an important general message: even measurements so coarse-grained as to have only two outcomes can estimate the temperature of a generic system with precision (as quantified by the Fisher information) that is proportional to the ultimate precision, the thermal Fisher information [Eq. (3)]. In Appendix C, we show that this is not a coincidence, by proving that any system whose energy distribution is unimodal and has sufficiently fast decaying tails displays the proportionality C ∝ F. More specifically, we prove that there exists a strictly positive number Ξ ∈ (0, 1] such that C ≥ ΞF.
As will be discussed in Sec. IV, the relevance of these results lies in the fact that unimodal energy distributions with quickly decaying tails are the generic behaviour expected in finite-temperature many-body systems with short-range interactions, both at and away from criticality. In fact, we will show that Ξ = 2/π, as in Eq. (27), is actually attained for a large variety of interacting lattice models. It is of course possible to conceive nonunimodal energy distributions for which the proportionality C ∝ F breaks down. We illustrate this in Appendix D, where we construct an example for which C/F → 0 when N → ∞.
Our framework can also be adapted to scenarios where the ideal energy binning cannot be implemented and imprecisions are present. In the simplest case, one could have some imprecision in determining the optimal bins through Eq. (24), which would lead to a decrease in C. For the case d = 2, this is illustrated in the inset of Fig. 3(a), which shows how C/F decreases as the bin position deviates from its optimal value.
Another way in which imprecisions can enter our framework is when the POVMs in Eq. (4) are noisy. For instance, experimental errors may cause energies close to a bin edge to sometimes result in outcomes corresponding to neighbouring bins. Such effects can be accounted for within our framework by modifying Eq. (21). To see this, we first rewrite Eq. (21) as

p_α = ∫ dE B_α(E) q(E),  (29)

and similarly for ε_α. Here B_α(E) is a boxcar function, with value 1 when E ∈ [b_{α−1}, b_α), and 0 otherwise. A similar analysis can also be done in the discrete representation of Eq. (15). In this case, we would have p_α = Σ_i B_{αi} q_i, where B_{αi} is a matrix with entries 1 when E_i ∈ I_α and zero otherwise. However, it is more convenient to work with the continuous-energy representation (29).
It is now straightforward to generalize Eq. (29) to include the effects of noise by replacing B_α(E) with a different function, for instance, a smoothed boxcar, as depicted in Fig. 4(a). Since Σ_α p_α = 1 for any initial distribution q(E), it follows that Σ_α B_α(E) = 1 for all E. This can be viewed as a normalization condition for B_α(E). In fact, B_α(E) acts as a combination of a stochastic matrix (whose columns add up to one) and an isometry, which reduces a continuous energy E to a discrete set of outcomes α. The precise form of B_α(E) will depend on the details of the experiment.
Measurement errors can cause not only a loss of precision, but also systematic shifts in energy, e.g., by falsely displacing the bin energies ε_α by a certain amount. For simplicity, we shall study these kinds of imprecision separately. We defer the discussion of robustness to energy shifts to Sections IV A and IV B, while here we choose B_α(E) to be symmetric in the interval [b_{α−1}, b_α) and centered at the midpoint (b_{α−1} + b_α)/2, so that the ε_α's are not displaced.
The remaining feature to describe is errors associated with imperfect binning. This can again be done using the smoothed boxcar of Fig. 4(a),

B_α(E) = (1/2) { erf[(E − b_{α−1})/(√2 σ)] − erf[(E − b_α)/(√2 σ)] },  (30)

where erf(x) is the error function and σ is a parameter measuring the degree of imprecision (a sharp boxcar is recovered when σ → 0). A function of this form defines an energy window of width ~2σ around each boundary, where outcomes associated with one bin can be recorded in another. For this reason, wider bins tend to be less affected than thinner ones, which is physically reasonable.
We illustrate the above ideas with the linear DOS example of Sec. III C 2. In Fig. 4(b) we present C/F as a function of d for binning strategies computed using the smoothed boxcars (30), with different values of σ. This is contrasted with the ideal case, shown in red, which coincides with the red curve in Fig. 3(a). As can be seen, unsharp bin edges necessarily decrease the coarse-grained Fisher information. That said, C/F is surprisingly robust: even when the smearing extends over a large part of the bin width (e.g., 30%), C/F decreases only slightly (by about 7%).
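This robustness can be reproduced with a few lines of code. The sketch below (the boundary b = 2.589 and the smearing width are illustrative choices) evaluates C/F for the linear DOS with smoothed boxcars of the error-function type discussed above:

```python
import numpy as np
from math import erf

# Effect of unsharp bin edges on the linear-DOS example with d = 2 and beta = 1.
E = np.linspace(0.0, 40.0, 20001)
q = E * np.exp(-E); q /= q.sum()          # discretized q(E) = beta^2 E e^{-beta E}
Emean = np.dot(q, E)                      # ~2 = <H> at beta = 1
F = np.dot(q, E**2) - Emean**2            # ~2 = thermal FI at beta = 1
b = 2.589                                 # near-optimal sharp boundary

def cf_ratio(sigma):
    """C/F for a two-outcome measurement with edge smearing sigma."""
    if sigma == 0:
        B2 = (E >= b).astype(float)       # sharp boxcar
    else:
        B2 = 0.5 * (1 + np.vectorize(erf)((E - b) / (np.sqrt(2) * sigma)))
    C = 0.0
    for B in (1 - B2, B2):                # B_1 + B_2 = 1 for all E
        p = np.dot(B, q)
        eps = np.dot(B * E, q) / p
        C += p * (eps - Emean)**2
    return C / F

print(cf_ratio(0.0), cf_ratio(0.5))       # smearing lowers C/F only mildly
```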

IV. MANY-BODY LATTICE MODELS
We now proceed to analyze quantum systems on a lattice, which is one of the most physically relevant settings where coarse-grained measurements can be useful. We start with general considerations and a tight-binding chain as an illustrative example. Then we show a general result for all noncritical spin models and conclude with an analysis of a system undergoing a thermal phase transition.

A. Gaussian density of states
In many-body lattice models, the energy distribution (20) often displays an approximately Gaussian form in the thermodynamic limit [56][57][58] (see also the detailed discussion in Sec. IV B and Appendix E). As a simple, illustrative example of this principle, consider a fermionic one-dimensional tight-binding chain with N sites, under periodic boundary conditions:

Ĥ = Σ_{k=1}^{N} [ ε ĉ†_k ĉ_k − (t/2)(ĉ†_k ĉ_{k+1} + ĉ†_{k+1} ĉ_k) ],  (31)

where ĉ_k is the fermionic annihilation operator at site k, ε is the on-site energy, t is the hopping (tunneling) strength, and ĉ_{k+N} = ĉ_k ensures periodic boundary conditions. When diagonalized, the Hamiltonian of this model takes the form [30,59]

Ĥ = Σ_{a=1}^{N} ε_a Ĉ†_a Ĉ_a,  (32)

with the (linearly) transformed Ĉ_a's satisfying standard fermionic anti-commutation relations, and with eigenenergies given by ε_a = ε − t cos(2πa/N). In Fig. 5(a), we numerically compute the energy distribution (20) of this model, and compare the results to a continuous Gaussian distribution with average energy ⟨H⟩ and variance μ^2 ≡ var(H) = −∂⟨H⟩/∂β (both of which depend implicitly on T); i.e.,

q(E) ≈ (2πμ^2)^{−1/2} exp[−(E − ⟨H⟩)^2/(2μ^2)].  (33)

We observe that the Gaussian distribution is a good approximation already with a modest number of sites and a modest hopping strength. The approximation, in fact, improves with the number of sites and becomes exact in the thermodynamic limit (see Refs. [57,58]).
Let us now take a Gaussian distribution as a given and compute the Fisher information for different coarse-grainings of the continuous distribution (33). In this case, the probabilities and bin energies in Eq. (21) become

p_α = (1/2)[erf(b̃_α) − erf(b̃_{α−1})],   ε_α = ⟨H⟩ + (μ/√(2π)) (e^{−b̃_{α−1}^2} − e^{−b̃_α^2})/p_α,  (34)

where b̃_α := (b_α − ⟨H⟩)/(√2 μ) are the shifted and rescaled bin positions. The full Fisher information is simply F = β^4 μ^2. For a d-outcome measurement, one can then numerically perform the optimization according to Eq. (24) to find the best such measurement and the corresponding coarse-grained Fisher information (17). The results for C/F for different numbers of bins d are shown in Fig. 5(b). As in the linear density-of-states case, one sees a quick growth of C with d towards the maximum value F.
The particular case of d = 2 can be obtained by setting b_0 = −∞, b_2 = ∞, and b_1 = b. We then get

p_{1,2} = (1/2)[1 ± erf(b̃)],   ε_{1,2} = ⟨H⟩ ∓ (μ/√(2π)) e^{−b̃^2}/p_{1,2},

so that the coarse-grained Fisher information [Eq. (17)] becomes

C = β^4 μ^2 (2/π) e^{−2b̃^2} / [1 − erf^2(b̃)].  (35)

The result is expressed solely in terms of the shifted bin position b̃; therefore, the maximization procedure is independent of ⟨H⟩ or μ. In fact, as one may readily verify, the function in Eq. (35) is maximized at b̃ = 0. That is, the boundary should be placed symmetrically, at b = ⟨H⟩. The corresponding maximum is

C/F = 2/π.  (36)

This relation is robust with respect to imprecise identification of the optimal boundary (which can be understood as a systematic error in the energy measurement, as mentioned in Sec. III D). Indeed, Taylor-expanding the right-hand side of Eq. (35) with respect to small b̃ around its optimal value, b̃ = 0, we find that

C/F ≈ (2/π)[1 − (1 − 2/π)(b − ⟨H⟩)^2/μ^2].  (37)

Even for a significant deviation of |b − ⟨H⟩| = 0.3μ, C/F degrades only by ≈ 3.3%. Not coincidentally, the relation in Eq. (36) also appears in the case of noninteracting qubits in the limit of large N (Sec. III C 1). This is because the energy distribution in that case also becomes Gaussian in the N ≫ 1 limit, due to the central limit theorem.
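The closed-form expression (35) and the 3.3% robustness estimate can be verified directly (the deviation 0.3μ is the value quoted above):

```python
from math import erf, exp, pi, sqrt

# C/F for a Gaussian energy distribution with one boundary, as a function of
# the rescaled boundary position bt = (b - <H>)/(sqrt(2)*mu), per Eq. (35).
def cf_ratio(bt):
    return (2 / pi) * exp(-2 * bt**2) / (1 - erf(bt)**2)

print(cf_ratio(0.0) * pi / 2)            # 1.0: the maximum C/F = 2/pi at b = <H>

# |b - <H>| = 0.3*mu corresponds to bt = 0.3/sqrt(2); the relative loss
# should be ~3.3%, consistent with the Taylor expansion in Eq. (37).
loss = 1 - cf_ratio(0.3 / sqrt(2)) / (2 / pi)
print(loss)
```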
In Fig. 5(c), we illustrate the optimal bins and the corresponding probabilities for the distribution (33) in the case of d = 8. In this case, the optimal bins have to be located numerically. Unsurprisingly, it is found that the optimum is symmetric around the average energy.

B. Noncritical, interacting systems on lattices
Now we show how some of the conclusions of Sec. IV A hold universally in the thermodynamic limit. Intuitively speaking, the idea is that generic lattice models with finite-range interactions, when away from criticality, tend to have a Gaussian energy distribution due to the many-body Berry-Esseen theorem [56][57][58]. Therefore, the same behaviour as in Fig. 5 is expected to occur when coarse-graining such lattice models into different partitions.
In fact, in Appendix E, we prove that the maximal C/F achievable by two-outcome measurements (d = 2) is 2/π + O(ln^{−1} N), and that the boundary of the optimal partition lies at b = ⟨H⟩, up to subleading corrections. In the thermodynamic limit, this coincides with the results for the exact Gaussian distribution (Sec. IV A) and for independent qubits (Sec. III C 1). We expect that, for d ≥ 3 partitions, one should be able to prove results identical to those obtained for the exact Gaussian energy distribution in Sec. IV A by using arguments along the lines of those in Appendix E.
In order to prove Eq. (36), we need to assume that the thermal state of the lattice satisfies the following two generic conditions. (i) Exponential decay of correlations: for arbitrary observables O_X and O_Y supported on regions X, Y separated by a distance l on the lattice, and some constant ξ,

|⟨O_X O_Y⟩ − ⟨O_X⟩⟨O_Y⟩| ≤ ‖O_X‖ ‖O_Y‖ e^{−l/ξ}.  (38)

(ii) The variance in energy scales linearly with the number of lattice sites: var(H) = ⟨H^2⟩ − ⟨H⟩^2 = s^2 N. Assumption (i) is expected to hold for a very large class of systems away from criticality. Indeed, it has been rigorously proven for 1D translation-invariant thermal states [60], finite-range fermionic lattice systems of arbitrary spatial dimension at nonzero temperatures [61], and all finite-range lattice systems above a threshold lattice-dependent temperature [62]. Assumption (ii) is expected to hold for most systems at a high enough temperature. In fact, note that (i) already implies that var(H) = O(N) [63].
The detailed proof of Eq. (36) in Appendix E is based on the Berry-Esseen theorem for local Hamiltonians, which relies on the two assumptions above and is proven in Refs. [57,58] (see also Appendix F 1). This result can be seen as a strengthening of the central limit theorem, giving a precise notion of how the energy distribution of lattice models converges to a Gaussian in the thermodynamic limit. It allows us to estimate quantities of the form of Eq. (19), in this case, the bin energies ε_α.
Lastly, the Gaussian behavior of noncritical many-body lattice systems also extends to the problem of how robust C/F = 2/π is to imprecise identification of the optimal binning boundary. Indeed, as we show in Appendix E, C/F ≈ (2/π)[1 − (1 − 2/π)(b − ⟨H⟩)^2/var(H)], so, as in Sec. IV A, for, e.g., |b − ⟨H⟩| = 0.3 √var(H) (and N ≫ 1), C/F will degrade only by ≈ 3.3%.

C. Critical systems
The thermal Fisher information (3) is proportional to the heat capacity of the system, which scales as C = N c(β), where c(β) is the specific heat. For noncritical systems, c(β) is intensive. However, at a finite-temperature phase transition, it diverges as the temperature of the system approaches the critical temperature T_c > 0, according to c(β) ∝ |t|^{−α}, where t := (β − β_c)/β_c and α ≥ 0 is the so-called critical exponent [64]. When α = 0, the divergence is logarithmic: c(β) ∝ ln |t|^{−1} [64]. In large but finite systems, there are of course no infinities; instead, at the phase transition point, the specific heat grows with the system size, as c_N ∝ N^{α/(2−α)}. This growth with N implies that critical systems do not satisfy the condition (ii) of Sec. IV B. In general, critical systems also feature diverging correlation lengths [64], thereby violating the condition (i) of Sec. IV B as well. Therefore, the many-body Berry-Esseen theorem becomes inapplicable for critical systems.
In Appendix F, building on several rigorous results on translation-invariant lattices with finite-range interactions in Refs. [59,67,68], we develop an alternative approach towards determining the energy distribution of such lattices in arbitrary spatial dimensions. First of all, for noncritical lattices, we show that the energy distribution approximates a Gaussian in a way that complements the many-body Berry-Esseen theorem [57,58]. Moreover, for this wide but specific class of lattices, our approach allows us to access the energy distribution even at criticality.
For critical lattices with α = 0, we show in Appendix F 2 a that the energy distribution still tends to a Gaussian in the N → ∞ limit; however, the convergence does not include the tails of the distribution, which are O(√N) standard deviations away from ⟨H⟩. In a sense, for translation-invariant lattices, this result suggests that the Gaussianity of the distribution holds beyond assumptions (i) and (ii) above [57,58]. Thus, Eq. (36) applies in the thermodynamic limit, both at criticality (with α = 0) and away from it. We illustrate these ideas below in Sec. IV C, with a detailed study of the classical 2D Ising model on a square lattice, a paradigmatic model with α = 0.
The case 1 > α > 0 is treated in Appendix F 2 b. We show that the energy distribution is Gaussian only in a neighbourhood of the peak that is much smaller than the standard deviation; hence, it is not Gaussian as a whole. Notwithstanding, we show that it is unimodal with exponentially decaying tails, which means that the considerations in Appendix C are applicable; that is, a two-bin measurement with the boundary placed at ⟨H⟩ will yield a C that scales proportionally to F. In other words, since F = β²Nc_N, we will have C ∝ β²N^{2/(2−α)}.
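In compact form, the scaling chain behind this last claim (using the finite-size relations ξ_N ∝ N^{1/d} and the hyperscaling relation νd = 2 − α; see Appendix F 2) reads:

```latex
c_N \propto \xi_N^{\alpha/\nu} \propto N^{\alpha/(\nu d)} = N^{\alpha/(2-\alpha)}
\quad\Longrightarrow\quad
F = \beta^2 N c_N \propto \beta^2 N^{2/(2-\alpha)},
\qquad C \propto F .
```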

2D Ising model
The square-lattice 2D Ising model is defined on an L × L square lattice where each site i is characterized by a Pauli matrix σ_i^z, with i = 1, ..., N (N = L²). The spins interact according to the Hamiltonian H = −J Σ_{⟨i,j⟩} σ_i^z σ_j^z, where the sum is over all nearest neighbors. Since the interactions only involve σ^z operators, the Hamiltonian is already diagonal in the computational basis, with energy eigenvalues E(σ) = −J Σ_{⟨i,j⟩} σ_i σ_j, where σ = (σ_1, ..., σ_N) and σ_i = ±1 are the eigenvalues of σ_i^z. Here, we will impose periodic boundary conditions. The model presents a phase transition at T_c/J = 2/ln(1 + √2) [69,71,72]. This can be seen, for instance, in terms of the magnetization m = (1/N) Σ_i σ_i, as plotted in Fig. 6(a).
For moderately large N, the full energy distribution q(E) can be computed exactly using a method developed in Ref. [70]. Results for L = 8, 16, 32, and 64 are shown in Fig. 6(b)-(e). Although irregular for small sizes, the distribution visually appears to approach a Gaussian as the lattice size increases.
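For very small lattices, q(E) is easy to reproduce by brute-force enumeration (the sizes in Fig. 6 require the method of Ref. [70]); a sketch for L = 4 with periodic boundary conditions:

```python
import numpy as np

# Brute-force energy distribution of the 4x4 Ising model with periodic
# boundary conditions (2^16 = 65536 configurations; larger L needs the
# transfer-matrix-based method of Ref. [70]).
L, J = 4, 1.0
N = L * L
bits = (np.arange(2 ** N)[:, None] >> np.arange(N)) & 1
spins = (2 * bits - 1).reshape(-1, L, L)          # sigma_i = +/-1

# nearest-neighbour bonds: each site paired with its right and down neighbour
E = -J * ((spins * np.roll(spins, -1, axis=1)).sum(axis=(1, 2))
          + (spins * np.roll(spins, -1, axis=2)).sum(axis=(1, 2)))

Tc = 2 * J / np.log(1 + np.sqrt(2))               # critical temperature
levels, degeneracy = np.unique(E, return_counts=True)
q = degeneracy * np.exp(-(levels - levels.min()) / Tc)
q /= q.sum()                                      # q(E) at T = Tc
```

Even at this tiny size one can check basic features, e.g., that the ground level E = −2JN is doubly degenerate, while the thermal distribution over the discrete levels is visibly irregular.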
In order to rigorously prove that this is indeed the case, one needs to show that the higher-order (≥ 3) cumulants of the energy distribution, κ_k, become irrelevant as N becomes large. As discussed in Appendix F 3, the cumulants of the energy distribution can be found from the free energy F_N through the simple relation κ_k = (−1)^k ∂^k ln Z_N/∂β^k = (−1)^{k+1} ∂^k(βF_N)/∂β^k. Moreover, for the 2D Ising model, an exact expression for F_N, for finite N, is available [73]. Using these facts, we show in Appendix F 3 that, at criticality, κ₂ ∝ N ln N [64,66], while κ_k ∝ N^{k/2} for k ≥ 3 (these results are also illustrated in Fig. 6(f)-(h)). As a consequence, κ_k/κ₂^{k/2} ∝ ln^{−k/2} N → 0 for k ≥ 3, showing that the higher-order cumulants do indeed become negligible as compared to κ₂ = var(H); i.e., the distribution tends to a Gaussian as N → ∞.
Note that the previous discussion refers to the scaling in the vicinity of the critical point. Away from it, due to the extensivity and analyticity of F_N in the limit N → ∞, we simply have from Eq. (40) that κ_k ∝ N for k ≥ 1. Hence, κ_k/κ₂^{k/2} ∝ N^{1−k/2}; i.e., away from criticality, q(E) approaches its Gaussian limit polynomially, as compared to the slow logarithmic convergence at criticality.
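The polynomial approach to Gaussianity away from criticality can be checked on the simplest noncritical example, N noninteracting qubits (cf. Appendix B), whose energy distribution is binomial and whose cumulants are all exactly proportional to N. A small sketch (function names and the unit gap are our choices):

```python
import numpy as np
from math import comb, exp

def cumulants_direct(Nq, beta, gap=1.0):
    """First three energy cumulants computed directly from the distribution q(E)."""
    p = exp(-beta * gap) / (1 + exp(-beta * gap))     # single-qubit excitation prob.
    k = np.arange(Nq + 1)
    q = np.array([comb(Nq, j) for j in k], float) * p ** k * (1 - p) ** (Nq - k)
    E = gap * k
    mu = (q * E).sum()
    return mu, (q * (E - mu) ** 2).sum(), (q * (E - mu) ** 3).sum()

def cumulants_binomial(Nq, beta, gap=1.0):
    """Closed-form binomial cumulants: each kappa_k is proportional to Nq."""
    p = exp(-beta * gap) / (1 + exp(-beta * gap))
    return (Nq * gap * p,
            Nq * gap ** 2 * p * (1 - p),
            Nq * gap ** 3 * p * (1 - p) * (1 - 2 * p))

# kappa_3 / kappa_2^{3/2} decays as Nq^{-1/2}: Gaussianity is reached polynomially
for Nq in (100, 400, 1600):
    _, k2, k3 = cumulants_binomial(Nq, beta=1.0)
    print(Nq, k3 / k2 ** 1.5)
```

Doubling the linear scale (quadrupling N) exactly halves the normalized third cumulant, in line with the N^{1−k/2} scaling above.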

V. PROBE-BASED MEASUREMENTS
Whereas the previous section allowed for arbitrary global measurements, in this section we look at measurement schemes (both idealized and more realistic) that can be realized by letting a probe interact with the system, and possibly with some auxiliary system of arbitrary size, followed by a measurement of the probe alone. We compare the performance of such probe-based measurements to the upper bounds obtained in Section III B.
First of all, we observe that the maximal thermometric precision achievable by measuring a d-dimensional probe P that has unitarily interacted with the system S in a thermal state τ, and an auxiliary system A in some state ρ_A, is the same as the maximal precision of a d-outcome measurement on S. Here we assume that d < D, because otherwise one can simply transfer the whole state of S (and, with it, the Fisher information F on β) to P; however, when d < D, even the best possible strategy of encoding the state of S into that of P will incur losses. Indeed, if the initial state of P is some σ, then, whatever the optimal unitary U, standard quantum metrology tells us [74] that the optimal POVM on the evolved probe state will have d outcomes. On the other hand, the probability distribution generated by any d-outcome POVM on the probe can also be generated by a d-outcome POVM on S. Thus, denoting the quantum Fisher information on β of the evolved probe state by C^(P), we have C^(P) ≤ C.
To show that C^(P) = C, let us note that C is delivered by a projective measurement on the system corresponding to some binning I₁ ∪ ··· ∪ I_d, which yields measurement statistics p_α = Σ_{E_j∈I_α} q_j. Now, let us choose σ = |1⟩⟨1|, so that, in the {|α⟩ ⊗ |E_j⟩} basis, σ ⊗ τ = diag(q, 0, ..., 0). Here, q = (q₁, ..., q_D) and each 0 is made of D zeroes. Then, the permutation unitary acting on σ ⊗ τ that permutes all the q_j's in I₂ from q with some of the zeroes from the 0 block next to q, all the q_j's in I₃ with zeroes from the second 0 block, etc., will render the reduced state of the probe equal to diag(p₁, ..., p_d). And this distribution will produce a C^(P) that is equal to C. Note that, to show that C^(P) can be made equal to C, there was no need to involve any auxiliary systems.
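The permutation argument can be made concrete on a toy example of our own choosing (D = 4, d = 2, consecutive bins I₁ = {1, 2}, I₂ = {3, 4}); since all states involved are diagonal, it suffices to track the diagonals:

```python
import numpy as np

# System: D = 4 thermal populations q; probe: d = 2, initialized in |1><1|.
beta = 1.0
Evals = np.array([0.0, 0.3, 0.8, 1.5])
q = np.exp(-beta * Evals); q /= q.sum()

# Diagonal of sigma (x) tau in the {|alpha> (x) |E_j>} basis: (q, 0, 0, 0, 0)
joint = np.zeros(8)
joint[:4] = q

# Permutation moving the q_j's of the second bin into the second probe block
perm = np.arange(8)
perm[[2, 4]] = perm[[4, 2]]   # swap one bin-2 population with a zero
perm[[3, 5]] = perm[[5, 3]]   # swap the other bin-2 population with a zero
joint = joint[perm]

# Reduced probe state: diag(p1, p2) with p_alpha the total weight of bin alpha
probe = joint.reshape(2, 4).sum(axis=1)
print(probe)
```

Tracing out the system after the permutation leaves the probe in diag(p₁, p₂), so a projective measurement of the probe reproduces the binned statistics exactly, as claimed.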
For the case of d = 2, where the optimal POVM on S is defined by the bins I₁ = {E_j : E_j < b} and I₂ = {E_j : E_j ≥ b}, with b being the boundary, we will now show that such a permutation can be generated by the quantum-optics-inspired Hamiltonian H = H_S + H_P + λ Σ_k (|⇑⟩⟨⇓| ⊗ |E_k⟩⟨E_k + b| + h.c.), where |⇓⟩ and |⇑⟩ are the eigenstates of the probe spin, with the corresponding eigenvalues E_⇓ = 0 and E_⇑ = b. This Hamiltonian may not be easily realizable in practice. However, the point is that, as we will show, it is guaranteed to provide the best possible precision using a two-level probe. This can then be used as a benchmark against which to compare other interactions. Furthermore, we take the system's ground state to be at energy E₁ = 0, and, since ultimately we are going to be interested in the case where the system's energy spectrum is effectively continuous, we will also assume that |E_k + b⟩ is a valid eigenstate.
Let us initialize the system and the probe in the joint state ρ(0) = |⇓⟩⟨⇓| ⊗ τ. The simplest way to characterize its evolution under H is to describe how the pure states comprising it, |Ψ_j(0)⟩ = |⇓⟩ ⊗ |E_j⟩, evolve under H. It is easy to show that, in the interaction picture (labelled by the superscript I), |Ψ_j^I(t)⟩ = |⇓⟩ ⊗ |E_j⟩ for E_j < b, and |Ψ_j^I(t)⟩ = cos(λt) |⇓⟩ ⊗ |E_j⟩ − i sin(λt) |⇑⟩ ⊗ |E_j − b⟩ for E_j ≥ b. Thus, transitioning back to the Schrödinger picture, from ρ(t) = Σ_j (1/Z) e^{−βE_j} |Ψ_j(t)⟩⟨Ψ_j(t)| we find the probability of finding the probe qubit in the spin-up state when measuring it at time t to be P_⇑(t) = sin²(λt) Σ_{E_j ≥ b} q_j = p₂ sin²(λt). Hence, for t_meas = π/(2λ), the ideal projective measurement of the probe qubit's spin produces a probability distribution identical to that produced by the ideal binary measurement of the system corresponding to the binning I₁ ∪ I₂.
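The claim P_⇑(t_meas) = p₂ can be verified numerically; the sketch below assumes an exchange coupling λ(|⇑⟩⟨⇓| ⊗ |E_k⟩⟨E_k + b| + h.c.), consistent with the interaction-picture dynamics stated above, and uses illustrative choices E_j = j, D = 8, b = 4, λ = 1 of our own:

```python
import numpy as np
from scipy.linalg import expm

D, b, lam, beta = 8, 4, 1.0, 0.5
E = np.arange(D, dtype=float)            # system spectrum E_j = j, ground state 0
q = np.exp(-beta * E); q /= q.sum()      # thermal populations

# Interaction-picture exchange coupling: lam * sum_k |up,E_k><down,E_k+b| + h.c.
# Basis ordering: index j = |down, E_j>, index D + j = |up, E_j>.
H = np.zeros((2 * D, 2 * D))
for k in range(D - b):
    H[D + k, k + b] = H[k + b, D + k] = lam

t_meas = np.pi / (2 * lam)
U = expm(-1j * H * t_meas)

# rho(0) = |down><down| (x) tau; probability of finding the probe "up"
rho0 = np.zeros((2 * D, 2 * D), dtype=complex)
rho0[np.arange(D), np.arange(D)] = q
rho_t = U @ rho0 @ U.conj().T
P_up = np.real(np.trace(rho_t[D:, D:]))

p2 = q[E >= b].sum()                     # ideal binary-measurement outcome
print(P_up, p2)
```

Each coupled pair (|⇓, E_j⟩, |⇑, E_j − b⟩) undergoes a full Rabi transfer at λt = π/2, so the two printed probabilities coincide, reproducing the ideal binary measurement.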
Note that realizing this idealized scheme experimentally is far from being straightforward; we study a specific model realization in the next subsection, describing both its capabilities and limitations.
A. Jaynes-Cummings model
As a specific illustration of quantum probe-based thermometry, we consider an experimentally relevant system consisting of a superfluid Bose-Einstein condensate (BEC) reservoir in a shallow confining trap interacting with an atomic quantum dot [45]. Generally, the physics of this system is well captured by a spin-boson model, in which the atomic quantum dot interacts with the phononic excitations of the BEC superfluid. Given suitably engineered boundary conditions, the spectral density will be such that a quantum dot with frequency ω_d will in effect couple only to the phonon modes that come closest to being resonant with the quantum-dot frequency (for simplicity, we suppose all relevant phonon modes have the same frequency ω_a).
We can then ask how well one can estimate the BEC temperature by measurements on the quantum-dot probe. In Fig. 7(a) we plot the ratio of the coarse-grained to the thermal Fisher information for the optimal binary measurement within the effectively resonant subspace. The ratio is given as a function of the number of effectively resonant modes, which would be expected to increase in proportion to the width of the spectral density. From the figure we see that the ratio approaches a value of 0.64 as the number of modes increases. This provides an optimal value against which to compare specific binary measurement strategies. Interestingly, the obtained ratio agrees with the optimal ratio found for a binary measurement on a system described by a Gaussian density of states.
If we now consider the specific case where the effectively resonant subspace consists of a single phononic mode, and furthermore make a rotating-wave approximation, the resulting system is modelled by the paradigmatic Jaynes-Cummings Hamiltonian [75], H = ω_a a†a + ω_d σ†σ + g(a σ† + a†σ), where a† and a are the creation and annihilation operators of the bosonic mode, g is the coupling strength, and σ = |g⟩⟨e|, where the excited and ground states of the quantum dot are denoted by |e⟩ and |g⟩, respectively. Experimental work has shown how such models arise for specific thermometry protocols [6], and our results make it possible to evaluate how close such a strategy is to being optimal. Consider a measurement protocol in which the quantum dot is initialized in its ground state. The quantum dot then evolves jointly with the BEC for a time t, after which the probability of finding the quantum dot in its excited state is given by Eq. (46), where we have defined λ_n = √(δ²/4 + g²(n + 1)) in terms of the detuning δ = ω_d − ω_a. From this probability we can compute the coarse-grained Fisher information; see also [22]. In Fig. 7(b) we show the ratio of the coarse-grained Fisher information computed from Eq. (46), optimized over the measurement time at each temperature, to the thermal Fisher information of the phononic mode itself. Inspection of the results shows that the probe-based measurement gives a Fisher information that never falls below 45% of the optimal binary measurement strategy. In Fig. 7(c) we plot the optimal measurement time as a function of temperature, and we observe an inverse relationship between the optimal measurement time and the temperature. Notice that, as the temperature approaches absolute zero, the optimal measurement time for zero detuning approaches π/(2g), in agreement with known results [22].
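The excitation probability can be cross-checked by a brute-force simulation of the Jaynes-Cummings dynamics. The sketch below treats the resonant case δ = 0, where our reading of Eq. (46) reduces to a sum of Rabi oscillations over thermally weighted Fock states; the truncation n_max and all parameter values are illustrative choices of our own:

```python
import numpy as np
from scipy.linalg import expm

# Resonant Jaynes-Cummings dynamics: dot starts in |g>, mode is thermal.
n_max, g, w, beta = 30, 0.1, 1.0, 2.0
n = np.arange(n_max)
a = np.diag(np.sqrt(np.arange(1, n_max)), 1)      # annihilation operator
sp = np.array([[0.0, 1.0], [0.0, 0.0]])           # |e><g| (ordering: e = 0, g = 1)
sm = sp.T                                         # sigma = |g><e|
I2, Im = np.eye(2), np.eye(n_max)

H = w * np.kron(I2, a.T @ a) + w * np.kron(sp @ sm, Im) \
    + g * (np.kron(sp, a) + np.kron(sm, a.T))

qn = np.exp(-beta * w * n); qn /= qn.sum()        # thermal mode populations

t = np.pi / (2 * g)
U = expm(-1j * H * t)
rho0 = np.kron(np.diag([0.0, 1.0]), np.diag(qn))  # |g><g| (x) thermal mode
rho_t = U @ rho0 @ U.conj().T
P_e = np.real(np.trace(rho_t[:n_max, :n_max]))    # excited-dot block

# Closed-form sum at delta = 0: P_e(t) = sum_n q_n sin^2(g sqrt(n) t)
P_e_analytic = (qn * np.sin(g * np.sqrt(n) * t) ** 2).sum()
print(P_e, P_e_analytic)
```

Each doublet {|g, n⟩, |e, n − 1⟩} is a closed two-level system with Rabi frequency g√n, so the numerical and closed-form values agree to machine precision; finite-difference derivatives of P_e with respect to T then give the coarse-grained Fisher information.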
It is interesting to note that similar considerations for the Fisher information have been obtained for temperature measurements of micromechanical resonators via a qubit probe, whose interaction can also be described by the Jaynes-Cummings Hamiltonian (45) [22]; more general interaction Hamiltonians, either dropping the rotating-wave approximation or considering interactions far off resonance, have also been considered [13,23]. In all such cases, our considerations provide upper bounds on the maximal estimation precision with a qubit probe, as shown in Fig. 7. Indeed, the strength of our bounds is that they apply to arbitrary nonequilibrium strategies.

VI. CONCLUSIONS
We considered the precision limits on temperature estimation when one only has access to coarse-grained measurements with at most d outcomes. Using tools from signal processing, we derived the structure of the optimal POVM. These abstract considerations were then applied to two physically relevant scenarios: temperature measurements of many-body systems and nonequilibrium thermometry.
For many-body systems, we considered spin lattices, both away from and at criticality, and found that the Fisher information C can grow extensively with the system size even when d does not. In particular, for d = 2, we obtain that it is in principle possible that C/F ≈ 2/π in the thermodynamic limit N → ∞, even for systems at criticality. While this ratio will decrease for realistic strategies where the POVMs are smoothed out (see the discussion in Sec. III D and Fig. 4 specifically), we expect that the extensive scaling will be preserved as long as the binary measurement can distinguish system energies that are O(√var(H)) apart (see the discussion on the displaced boundary in Sections IV A and IV B).
Along the way, we also derived new results on the energy distribution of many-body systems in the regime of criticality, which might be of independent interest. These generic considerations were illustrated on the 2D Ising model, the energy distribution of which becomes well-approximated by a Gaussian distribution except in the tails. We expect more pronounced non-Gaussian features in the energy distribution of other critical models, which we leave as an interesting direction for future research.
For nonequilibrium thermometry, we used our results to devise the optimal probe-system interaction and interrogation time, thus providing general guidelines for the design of optimal nonequilibrium thermometry strategies. This result also provides an upper bound on specific experimentally motivated protocols, which we illustrated for a temperature measurement of a Bose-Einstein condensate through a quantum dot via a Jaynes-Cummings interaction [22,23]. It remains an exciting open question to find a realistic implementation of the optimal probe-sample interaction (43).
Lastly, in this work, we focused on asymptotic metrology, where one has access to full measurement statistics and can possibly run the experiment many times. This may not always be feasible in practice, and the Fisher information may then no longer be an adequate precision quantifier. In such cases, alternative approaches are needed, such as global Bayesian estimation [76]. Analyzing the effect of coarse-graining in such situations is another interesting research direction.

ACKNOWLEDGMENTS
We sincerely thank Raffaele Salvia for pointing us to Ref. [55], from which it follows that consecutive binnings are optimal, as previously conjectured. We also gratefully acknowledge fruitful discussions with Raam Uzdin.

APPENDIX A: OPTIMALITY OF CONSECUTIVE BINNINGS

In this appendix, we apply Theorem 1 of Ref. [55] to prove that the sets I_α maximizing the coarse-grained Fisher information C can be chosen to be consecutive. First, we will state the mentioned theorem, and then we will show how it applies to our problem.
Theorem 1 (Theorem 1 in Ref. [55]). Let a₁, ..., a_D and b₁, ..., b_D be real numbers such that either a_i ≥ 0 ∀i or b_i ≥ 0 ∀i, with the ratios a_i/b_i ordered as in Eq. (A1). If some b_i = 0, then a_i/b_i is defined as sgn(a_i) × ∞ when a_i ≠ 0; when a_i is also 0, a_i/b_i is defined arbitrarily so as to satisfy Eq. (A1). Next, for a partitioning of the set {1, ..., D} into bins I₁, ..., I_d, one considers jointly convex objective functions of the bin sums; the theorem states that the maximum of such a function over all partitionings is attained on a partitioning that is consecutive with respect to the ordering (A1). In view of the joint convexity of the function a²/b [54], we immediately see that C(a_{I₁}, b_{I₁}, ..., a_{I_d}, b_{I_d}) is a jointly convex function of all of its 2d arguments. So, having shown that all the conditions of Theorem 1 are met for our problem, we have thus proven that the maximum of C over all partitions (≡ nonoverlapping bins) {I_α}_{α=1}^d is delivered by a consecutive partitioning. Lastly, let us note that the optimality of consecutive partitioning does not preclude the possibility that some other, nonconsecutive partitioning delivers the same maximum. In fact, direct numerical checks show that, in our particular case, such coincidences do indeed happen.

APPENDIX B: NONINTERACTING QUBITS IN THE LARGE N LIMIT
In the limit of a large number of qubits, we can approximate the sums in Eqs. (26) by integrals (using the De Moivre-Laplace theorem) and obtain the approximate probabilities and bin energies. The case of binary measurements, d = 2, is defined by a single bin boundary b = b₁. Assuming the system to be at a temperature such that N s/r ≫ 1, and substituting into Eq. (17), we find that the resulting ratio takes its maximum value for b = N s, which is also a solution to Eq. (24). In this case we therefore arrive at Eq. (27). This is consistent with Eq. (36) in view of the De Moivre-Laplace theorem, which almost straightforwardly states that the energy distribution of N ≫ 1 thermal identical noninteracting qubits is a discrete Gaussian. Thus, we find that, in the large-N limit for a binary measurement, the optimal partition is at the average energy, and a very appealing feature is that such a measurement provides a Fisher information which is already 2/π ≈ 64% of the maximal possible value, independently of the system size N.

APPENDIX C: GENERIC SCALING C ∝ F FOR FAST-DECAYING ENERGY DISTRIBUTIONS

In this appendix we show that a proportionality of the form C ∝ F is generically true for a two-bin measurement in which the separation of the bins is at the average energy ⟨H⟩, for any distribution that decays sufficiently quickly (see Eq. (C4)). Note that, by energy distribution, we understand q(E) = Ω(E) e^{−βE}/Z [Eq. (20)], and not just e^{−βE}. Indeed, since β > 0, e^{−βE} must decay monotonically with energy. But Ω(E) may concentrate energy in different sectors, which can cause the overall distribution q(E) to have an arbitrary shape. Indeed, as discussed in Sec. IV, quite often q(E) will have a Gaussian shape or, more generally, a short-tailed distribution.
For a dichotomic measurement with bin boundary at b = ⟨H⟩, the coarse-grained Fisher information (17) can be written, after some rearrangements, in a compact form, where, to simplify notation, we have introduced q̃(E) = q(E + ⟨H⟩). Now, we require, and this is the only requirement we impose, that there exists a fixed (i.e., independent of N) number λ > 1 such that the tail conditions (C4) hold. These conditions are generically expected to be satisfied by unimodal distributions with fast-decaying tails; however, unimodality is not a requirement as long as the above condition is met. Let us further pick a 0 < λ₀ < 1/2 (C7) and define the corresponding intervals; we designate the remaining interval accordingly and further define the associated quantities. Now, by Eq. (C8), and then, again, by the definition in Eq. (C9), the required estimates follow. Invoking Eqs. (C4) and (C11), we rewrite Eq. (C12) accordingly, which renders the desired bound. With this, we immediately find the scaling of C and thus, keeping in mind Eq. (3), the proportionality C ∝ F. Note that the bound above, the purpose of which is solely to prove the scaling of C, may be rather loose, and it is of course desirable to find tighter bounds. We leave this task for future work.
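The claim can be illustrated numerically on model densities of states of our own construction: for a Gaussian Ω(E) the two-bin measurement at b = ⟨H⟩ recovers C/F = 2/π, while for a flat Ω(E) (so that q(E) is exponential) the ratio is ≈ 0.58, still of order one:

```python
import numpy as np

def binary_ratio(Omega, E, beta, h=1e-5):
    """C/F for a two-bin measurement at b = <H>, given a density of states Omega."""
    def thermal(b_):
        q = Omega * np.exp(-b_ * E); q /= q.sum()
        return q
    q = thermal(beta)
    mean = (q * E).sum()
    var = (q * (E - mean) ** 2).sum()                 # thermal Fisher info (in beta)
    def p1(b_):
        return thermal(b_)[E < mean].sum()            # boundary fixed at <H>(beta)
    dp1 = (p1(beta + h) - p1(beta - h)) / (2 * h)     # finite-difference derivative
    p = p1(beta)
    return dp1 ** 2 * (1 / p + 1 / (1 - p)) / var

E = np.linspace(0, 40, 200001)
print(binary_ratio(np.exp(-(E - 20) ** 2 / 2), E, beta=1.0))   # Gaussian: ~2/pi
print(binary_ratio(np.ones_like(E), E, beta=1.0))              # exponential q(E)
```

Both short-tailed shapes yield an O(1) fraction of the thermal Fisher information, in line with the generic C ∝ F scaling proven above.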

APPENDIX D: EXAMPLE OF A SMALL FISHER INFORMATION
In this appendix we construct an example for which the scaling C ∝ F, found in several examples throughout the main text, breaks down. Albeit rather artificial, this construction illustrates that, although it occurs generically, the relation C ∝ F is not universal.
To construct our example, let us take N two-level systems whose total Hamiltonian is such that the density of states has four sharp peaks, located at energies ±ε and ±tN in the N → ∞ limit, where t > 0 and ε > 0 are some fixed values of energy, and the weights ϑ_{N,±ε} and ϑ_{N,±tN} are positive. By definition, ∫dE Ω_N(E) = 2^N, which fixes the overall normalization. We emphasize that the distributions are discrete and become continuous only in the N → ∞ limit, and the delta functions above are also to be understood as sharp peaks that approach delta functions in the N → ∞ limit. Finally, we require that the effectively discrete energy distribution q_N(E) = (1/Z) e^{−βE} Ω_N(E) [cf. Eq. (20)] takes a symmetric four-peak form, which corresponds to a specific choice of the quantities ϑ_{N,E}, with Z fixed, according to Eq. (D2), by normalization.

(D6)
Now, it is straightforward to see that the average energy corresponding to q_N(E) is zero, from which the thermal Fisher information follows. Furthermore, considering two-outcome measurements with the optimal boundary at the average energy (in this case, 0), the bin probabilities will be p₁ = p₂ = 1/2 [cf. Eq. (15)], and the bin energies [cf. Eq. (16)] follow accordingly; hence, C is obtained from Eq. (17). Thus, the ratio C/F, for fixed ε and t, scales as ∝ 1/N, which strongly breaks the C ∝ F relation.

APPENDIX E: PROOF OF EQ. (36)
To prove Eq. (36), we first state the quantum Berry-Esseen theorem, proven in Refs. [57,58] for lattices with finite-range interactions. The theorem requires the following two assumptions about the state of the system ρ:
• i) The state has exponentially decaying correlations: for arbitrary regions X, Y separated by a distance l in the lattice, the correlations decay as e^{−l/ξ}, for some constant ξ.
• ii) The variance in energy scales with the number of sites as var(H) = ⟨H²⟩ − ⟨H⟩² = s²N.
The system is an N-vertex lattice L_N (the vertices of which correspond to "particles") and is described by a locally bounded, finite-range interacting Hamiltonian H = Σ_{v∈L_N} H_v. Each H_v acts only on vertices whose Manhattan distance from v is ≤ z, where z is some fixed natural number that sets the (finite) range of the interactions within the lattice. Lastly, the theorem requires the Hamiltonian to be locally bounded: there exists a constant h > 0 such that ‖H_v‖ ≤ h for all v, where ‖·‖ can be chosen to be, e.g., the spectral norm (h will of course depend on the norm we choose).
Theorem 2 (Lemma 8 of Ref. [57]). Let ρ be a state, of a system of N particles on a d-dimensional lattice, such that assumptions i) and ii) hold, with a local Hamiltonian with uniformly bounded local terms. Given the cumulative function of the energy distribution and the corresponding Gaussian cumulative function, the difference between the two is bounded as in Eq. (E4), where C₀ is a constant.
Crucially, the constant C₀ does not depend on the system size, but only on parameters such as the range of the Hamiltonian (z), the lattice structure, and the correlation length ξ. Note that we can always set the energy of the ground state of H to be zero. Moreover, keeping in mind Eqs. (E2) and (E3), we have ‖H‖ ≤ Σ_{v∈L_N} ‖H_v‖ ≤ hN, meaning that the largest E_i is ≤ hN. Thus, the real range of energies in Eq. (E4) can be summarized as E_min := min{E_i} = 0 and E_max := max{E_i} ≤ hN; for any E ∉ [E_min, E_max] one sets q(E) = 0, so that integrals with infinite energy ranges are meaningful.
We now bound C, assuming two bins I₁ = (−∞, b] and I₂ = (b, ∞) (i.e., d = 2). Let us first estimate the probabilities p₁ and p₂ = 1 − p₁, defined in Eq. (15), using Theorem 2. Next, we note the form that the bin energies, as given by Eq. (16), will take; therefore, according to Eq. (17), for the coarse-grained Fisher information we obtain Eq. (E10). Now, taking into account Eq. (E7), we can write the corresponding estimate. Thus, introducing b̃ and noticing that, similarly to Eq. (E12), an analogous estimate holds, we can proceed. Turning to E, let us rewrite Eq. (E10) as the decomposition (E17), where Λ > 0 is an arbitrary constant. With this, by choosing an arbitrary natural number R, we can lower-bound E as in Eq. (E19). Next, using Theorem 2, we arrive at the estimate (E20). Substituting Eq. (E20) into Eq. (E19), we obtain Eq. (E21). Since Λ and R have so far been free, let us choose them such that the conditions (E22) hold. With this choice, we have Eq. (E23). Let us further specify our choice of Λ and R to Λ = ln⁻¹ N and R = ln² N, in full accordance with Eq. (E22) (note that this choice is not unique). With this Λ and R, we have that e^{−R²Λ²} = e^{−ln² N} = o(Λ) as N → ∞; therefore, we can absorb the second big-O term into the first big-O term in Eq. (E23). Then, substituting Eq. (E23) into Eq. (E21), we arrive at Eq. (E26). Since, for sufficiently large N, N^{−1/2} ln^{2d+3} N = o(ln⁻¹ N), we can absorb the second big-O term into the first big-O term in Eq. (E26), thereby obtaining Eq. (E27). When |b̃| ≤ ln^{4/5} N, then, by the same logic as above, the second big-O term in Eq. (E27) can be absorbed into the first, producing Eq. (E28). In order to estimate E in the |b̃| ≤ ln^{4/5} N range more precisely, let us find an upper bound for E akin to the bound (E28). To that end, we split the decomposition in Eq. (E17) as in Eq. (E29), with R and Λ satisfying the conditions in Eq. (E22). Let us first deal with the second term in Eq. (E29). To do so, recall that our system has exponentially decaying correlations, and therefore Theorem 4.2 of Ref. [77] applies.
It states that, for arbitrary d-dimensional lattices with exponentially decaying correlations, there exist constants ℵ > 0 and ℵ₀ > 0 such that, whenever |E − ⟨H⟩| > ℵ√var(H), the tail of the energy distribution is exponentially suppressed, as per Eq. (E31). The bounds provided in Ref. [77] are somewhat tighter, but the bound (E31) will be sufficient for our needs here. Now, turning to the second sum in Eq. (E29), we note that q(E)dE = −dJ(E) and perform integration by parts. Keeping in mind that −ln^{4/5} N ≤ b̃ ≤ ln^{4/5} N and that, according to Eq. (E22), RΛ = ln N, we have that b̃ + RΛ > 0 and b̃ + RΛ ≈ ln N, which means that the corresponding energies lie more than ℵ standard deviations away from ⟨H⟩. Hence, inequality (E31) applies, and we can therefore bound the sum in terms of Γ(x, y), the incomplete gamma function.
For y ≫ 1, Γ(x, y) = y^{x−1} e^{−y} [1 + O(y⁻¹)]. Therefore, the second term in the last line of Eq. (E33) is exponentially small. For N → ∞, since b̃ + RΛ ≈ ln N → ∞, the right-hand side of Eq. (E34) decays to zero. However, we note that, when d = 3, it starts becoming small only when N ≳ 10⁶ (this is for ℵ = 1; for smaller ℵ it is even slower); for d = 1, 2, it becomes small much earlier. To obtain a "safe" estimate for the big-O term in Eq. (E34), let us note the bound in which we have introduced the quantity defined in Eq. (E36). Now, substituting Eq. (E36) into Eq. (E29), we obtain a bound which, keeping in mind Eqs. (E14), (E18), and (E20), we transform into Eq. (E37). Proceeding as in Eq. (E23) and taking into account Eq. (E24), we can rewrite Eq. (E37) as Eq. (E38). Thus, when |b̃| ≤ ln^{4/5} N, equations (E28) and (E38) determine E up to the stated corrections. Substituted into Eq. (E16), Eq. (E39) produces Eq. (E40), where we used the fact that F = β⁴ var(H). Note that Eq. (E40) resembles Eq. (36) and shows that the maximum of C/F over the boundary position, as quantified by b̃, is reached for some small b̃. To be more specific, Taylor-expanding Eq. (E40) around b̃ = 0, and considering only b̃ ≪ 1, we see that: (i) some b̃² = O(ln^{−1/2} N) yields the maximum; and (ii) C/F is robust to variations of the bin boundary around its optimal position, in the sense discussed in Sections IV A and IV B.

APPENDIX F: ENERGY DISTRIBUTION OF CRITICAL LATTICE SYSTEMS
In this appendix, we show that, at a finite-temperature phase-transition point, the energy distribution of a translation-invariant, finite-range quantum lattice is Gaussian when the critical exponent α = 0 (α is the exponent corresponding to the specific heat; see Sec. IV and Subsection F 2 below), albeit with a larger variance as compared to the noncritical case (see Appendix E and the forthcoming Subsection F 1). When α > 0, we show that the distribution is unimodal with exponentially decaying tails but not, in general, Gaussian (Subsection F 2 b). The appendix starts with a derivation of the general formalism for obtaining the energy distributions; then, in Subsection F 1, we apply the formalism to the known case of noncritical systems, deriving results consistent with the many-body Berry-Esseen theorem [57,58] presented in Appendix E. Finally, in Subsection F 2 and its two subsubsections, we analyze the energy distribution of critical systems.
In order to understand what the energy distribution near the critical point looks like, we will invoke Lemma 12 from Ref. [68] (see also Theorem III.4.15 of Ref. [59]), stating that the cumulative density of states of a translation-invariant, finite-range lattice in arbitrary spatial dimensions is an exponential of the canonical entropy of the lattice. To cast this in more precise terms, we first have to introduce some notation. Fixing periodic boundary conditions, let H_N denote the translation-invariant Hamiltonian of an N-site lattice (N ≫ 1). Then, for an arbitrary translation-invariant state Υ_N on the lattice, we define the energy and entropy densities. Now, introducing the free energy density f_N(β), we can formulate the variational principle for finite N, the minimizer of which is the Gibbs state τ_N(β). The finite-N principle is read off straightforwardly from the identity involving the relative entropy, where S(Υ_N) = −Tr(Υ_N ln Υ_N) is the von Neumann entropy, S(Υ_N‖τ_N(β)) = Tr(Υ_N [ln Υ_N − ln τ_N(β)]) is the relative entropy, and F_N(β) = −T ln Z_N = N f_N(β) is the free energy of the lattice at temperature T. As can be seen, the finite-N situation always yields a thermal state as the unique solution of the minimization in Eq. (F4), and therefore cannot account for second-order phase transitions at finite temperatures. This is of course in accord with the general understanding that finite-temperature phase transitions appear only in the thermodynamic limit (N → ∞). In this limit, the Hilbert space has infinite dimensions, and the simple finite-dimensional argument breaks down at certain points. However, as is proven in Refs. [59,60], first of all, the densities still exist and the variational principle still holds, where the infimum is sought over the set of translation-invariant states. The infimum is delivered by state(s), which we will call "equilibrium state(s)", satisfying the Kubo-Martin-Schwinger (KMS) condition (see, e.g., Refs.
[59,60] for a definition; we will not go into its details, since we will not use the definition in what follows) at inverse temperature β. The states that satisfy the KMS condition for a given β generalize the Gibbs state, always coinciding with it when the Hilbert space is finite-dimensional. In infinite dimensions, the KMS state is unique and coincides with the Gibbs state only at and below the critical β_c, if β_c is finite. Above β_c, the KMS state will generally not be unique, with different KMS states representing different phases (see the discussion in Chapter V of Ref. [78]). When β_c = +∞, the KMS state is a Gibbs state for arbitrary β < +∞; at β_c, i.e., when the system is in the degenerate ground state, the set of KMS states coincides with the ground eigensubspace. Here, we will only deal with lattices for which β_c is finite (that is, only with thermal phase transitions). Defining the minimal and maximal possible energy densities u_min and u_max, where d is the local Hilbert-space dimension of a single node of the lattice, we invoke Lemma 9 of Ref. [68] (also proven in Ref. [59]). It states that, for any u ∈ (u_min, u_max], there exists a unique β = β(u) for which at least one equilibrium state at temperature β(u) yields energy density u. Moreover, the entropy density s(u) is thereby fixed, and an analogue of the maximum entropy principle holds: s(u) is the highest entropy density among translation-invariant states with energy density u.
With this, we are ready to state the result of Lemma 12 of Ref. [68] (which is a clarification and generalization of Theorem III.4.15 of Ref. [59]), given as Eq. (F12). Assuming differentiability of ln Q_N(u)/N with respect to the small parameter 1/N, we can write Eq. (F12) as ln Q_N(u)/N = s(u) + O(1/N), or Q_N(u) = e^{N s(u) + O(1)}. Furthermore, also assuming differentiability of s_N(u) with respect to 1/N, we can write s_N(u) = s(u) + O(1/N) (this can be rigorously proven for 1D and 2D systems even without assuming analyticity [67]), thereby arriving at the corresponding finite-N statement. Now, introducing the density of states (cf. Eq. (18)), where E = Nu, we obtain the corresponding expression from Eq. (F12). With these, we can write the energy distribution, in the sense of Eq. (20), as Eq. (F16), where, again, E = Nu, and, in order to obtain the second equality, we used Eq. (F2). The normalization condition thus takes the corresponding form, and, regardless of the exact form of q_N(E), by construction it satisfies this normalization. Fixing an arbitrary β₀ ≤ β_c, we observe that the infimum in Eq. (F8) will be given by the Gibbs state at temperature T₀ (see the discussion below Eq. (F8)) and, since N, however large, is finite, the infimum will in fact be a minimum. Therefore, in the vicinity of u_N(τ_N(β₀)) (which we will simply call u_N(β₀)), we can write the expansion (F20). Now, writing the second derivative in this expansion accordingly and noting that ds_N(u)/du = β(u), we find Eq. (F22), where c_N(β) is the specific heat of the N-site lattice at the inverse temperature β. Observing the preceding relations, we thus obtain the quadratic coefficient K₂. The cubic term is obtained by differentiating Eq. (F22), which yields Eq. (F26). The higher-order terms in Eq. (F20) (K_{j≥4}) can be obtained by further differentiating Eq. (F26). Now, separating the u-dependent part of the O(1) in Eq. (F16), where K₁ and all the ζ_j's are O(1) (since they only depend on intensive quantities), we can write Eq. (F16) as Eq. (F29), where, as usual, E = uN. K is a u-independent quantity that absorbs all u-independent factors; it is nothing but the normalization factor for q_N(E).

Away from criticality: β0 < βc
When the lattice is away from criticality, c_N(β) and its derivatives are finite (i.e., do not scale with N). Therefore, var(u) = c_N(β_0) β_0^{−2} N^{−1}, meaning that, when deviating from the average ⟨u⟩ by a number of standard deviations that is large but o(√N), u − ⟨u⟩ remains o(1). Hence, up to rather far into the tails of the distribution, the quadratic term in the exponent in Eq. (F29) dominates the higher-order terms. Neglecting those higher-order terms, combining the linear and quadratic terms, and absorbing the resulting u-independent factor into K, we arrive at a Gaussian whose peak is slightly displaced from u_N(β_0). Noticing that the difference between u_N(β_0) and ⟨u⟩_{β_0} is negligible on the scale of the standard deviation, and rescaling u to E = N u, we find that

q_N(E) ∝ exp{ −β_0² [E − ⟨H_N⟩_{β_0}]² / [2 N c_N(β_0)] }. (F32)

Our result in this subsection thus complements the many-body Berry-Esseen theorem (Theorem 2 in Appendix E). In particular, the fact that the tails of q_N(E) decay exponentially [Eq. (F32)] cannot be directly deduced from the many-body Berry-Esseen theorem; in this sense, our result directly connects to Theorem 4.2 of Ref. [77] (discussed in Appendix E) by strengthening it for the particular case of translationally invariant lattices. Equation (F32) reflects the "common wisdom" that the thermal state is located in a typical subset of energy levels of width O(√var_{β_0}(H_N)) = O(√N), centered at the average energy; the energy levels within the typical subset have approximately equal probabilities, so that the entropy of the state is essentially the log of the number of the energy levels in the subset (which we see by invoking Eq. (F15) and noting that the number of energy levels in the typical subset is ∝ √N e^{N s_N(β_0)}).
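The var(E) ∝ N scaling and the accompanying Gaussianization can be checked exactly on a toy non-critical system: N non-interacting spins in a field h, for which the thermal energy distribution is available in closed form. This is a minimal numerical sketch; the model and all names in it are illustrative stand-ins, not the interacting lattices of the main text:

```python
import math

def energy_distribution(N, beta, h=1.0):
    """Exact thermal distribution of the total energy E_k = h*(N - 2k)
    of N independent spins: q(E_k) = C(N, k) e^{-beta E_k} / Z."""
    logw = [math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
            - beta * h * (N - 2 * k) for k in range(N + 1)]
    shift = max(logw)                       # stabilize the exponentials
    w = [math.exp(x - shift) for x in logw]
    Z = sum(w)
    energies = [h * (N - 2 * k) for k in range(N + 1)]
    probs = [x / Z for x in w]
    return energies, probs

def moments(energies, probs):
    """Mean, variance, and dimensionless skewness of a discrete distribution."""
    mean = sum(e * p for e, p in zip(energies, probs))
    var = sum((e - mean) ** 2 * p for e, p in zip(energies, probs))
    skew = sum((e - mean) ** 3 * p for e, p in zip(energies, probs)) / var ** 1.5
    return mean, var, skew

beta = 0.3
for N in (100, 400):
    mean, var, skew = moments(*energy_distribution(N, beta))
    # var/N is N-independent; skewness halves when N quadruples
    print(N, var / N, skew)
```

For this model var(E)/N is exactly constant and the skewness decays as N^{−1/2}, mirroring the statement that, away from criticality, the higher-order terms in Eq. (F29) are suppressed in the bulk of the distribution.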

At criticality: β0 = βc
The specific heat of a critical lattice diverges with N, and its derivatives are even more divergent. This necessitates a careful bookkeeping of all the terms in the series in Eq. (F29). In order to do so, we first need to find how the specific heat and its temperature derivatives scale with N at criticality.
First, we recall that, as the system approaches the critical temperature, with the approach parameterized by the reduced temperature t = (T − T_c)/T_c, the specific heat and correlation length (ξ) in the thermodynamic limit scale as

c_∞(β) ∝ |t|^{−α}, ξ(β) ∝ |t|^{−ν}, (F34)

where 1 > α ≥ 0 and ν > 0 are the corresponding critical exponents. When α = 0 (e.g., in the 2D Ising model), c_∞(β) ∝ ln |t|^{−1}. When N is finite, neither ξ(β) nor c_N(β) can diverge as t → 0. In this case, we note that, since the correlation length diverges in the thermodynamic limit, when the lattice is large but finite, it will simply become proportional to the linear size of the lattice (for spatial dimension < 4) [65]. Therefore, in d spatial dimensions, ξ ∝ N^{1/d}. On the other hand, Eq. (F34) suggests that c_N ∝ ξ^{α/ν} (F36). Therefore, c_N(β_c) ∝ N^{α/(νd)}, and, when α = 0, c_N(β_c) ∝ ln N; for the proof of this in the case of the 2D Ising model, see Ref. [66]. By the same logic, for any α ≥ 0, we have

d^j c_N(β)/dβ^j |_{β_c} ∝ β_c^{−j} N^{(α+j)/(νd)},

since, in view of Eqs. (F34) and (F36), d^j c_∞(β)/dβ^j ∝ β^{−j} |t|^{−α−j}. Looking into the structure of Eq. (F26) and its derivatives with respect to u, we see that the term containing the highest β-derivative of c_N is dominant in K_j, j ≥ 3. Therefore,

N K_j ∝ N β_c^{2(j−1)} [d^{j−2} c_N(β)/dβ^{j−2}]_{β_c} / c_N(β_c)^j. (F42)

Taking into account the scaling relation νd = 2 − α (F43) (see, e.g., Ref. [64]), we can simplify Eq. (F42) to

N K_j ∝ β_c^j N^{j(1−α)/(2−α)}. (F44)

In order to determine the regime of validity of the Gaussian approximation to q_N, we will now compare the j ≥ 3 terms to the quadratic term in Eq. (F29):

(N K_j + ζ_j) |u − u_N(β_c)|^j ≪ (N K_2 + ζ_2) [u − u_N(β_c)]², j ≥ 3. (F45)

Since N K_2 and all N K_j's diverge with N, the ζ's, being O(1), are not going to play a role in the above inequality, and therefore we will omit them. We will analyze the α = 0 and α > 0 cases separately.
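The chain of substitutions behind these finite-size scalings can be collected in one display (a compact restatement, using the hyperscaling relation νd = 2 − α):

```latex
c_N \;\propto\; \xi^{\alpha/\nu}\Big|_{\xi\,\propto\,N^{1/d}}
\;\propto\; N^{\alpha/(\nu d)} \;=\; N^{\alpha/(2-\alpha)},
\qquad
\frac{d^{j} c_N}{d\beta^{j}}\bigg|_{\beta_c}
\;\propto\; \beta_c^{-j}\, N^{(\alpha+j)/(2-\alpha)} .
```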
a. Energy distribution for α = 0. In this case, taking Eq. (F44) into account (and restoring the logarithmic factors, c_N ∝ ln N), Eq. (F45) takes the form

N^{j/2} ln^{−j} N |u − u_N(β_c)|^j ≪ N ln^{−1} N [u − u_N(β_c)]²,

which reduces to |u − u_N(β_c)| ≪ T_c ln^{(j−1)/(j−2)} N / √N; the most restrictive case, j → ∞, gives |u − u_N(β_c)| ≲ T_c ln N / √N. The latter simply means that, as long as u stays within this window, the quadratic term in Eq. (F29) will dominate the higher-order terms. Noting that, for α = 0, the standard deviation of u, √var(u), is ∝ T_c √ln N / √N, we conclude that the energy distribution is Gaussian up until ∝ √ln N standard deviations into the tails.
Just like when away from criticality, the linear term in Eq. (F29) shifts the tip of the distribution function by ∝ T_c c_N (see Eq. (F31)), implying an asymmetry of the distribution as a whole. However, this shift, being ∝ T_c ln N, is much smaller than the standard deviation √var(E) ∝ T_c √(N ln N), meaning that the energy distribution, in the energy range in which it is Gaussian, is close to Eq. (E5) (and Eq. (33)).
We will quantify the proximity of q_N(E) to a Gaussian more precisely for the classical 2D Ising model in Appendix F 3.

b. Energy distribution for 1 > α > 0. For strictly positive α's, Eq. (F45) becomes

N^{j(1−α)/(2−α)} |u − u_N(β_c)|^j ≪ N^{2(1−α)/(2−α)} [u − u_N(β_c)]²,

which leads to |u − u_N(β_c)| ≪ T_c N^{−(1−α)/(2−α)}, uniformly in j. Taking into account that the standard deviation of u is itself ∝ T_c N^{−(1−α)/(2−α)}, we see that the situation with Gaussianity here is more tricky than for noncritical lattices or those that are critical but with α = 0. Indeed, once we depart one standard deviation away from the average, we already find ourselves in a situation where both the quadratic term and the higher-order terms are O(1). However, as long as |E − ⟨H_N⟩_{β_c}| ≪ T_c N^{1/(2−α)}, q_N(E) tends to a Gaussian as N → ∞. Lastly, since α < 1, c_N/N → 0 as N → ∞, and the shift of the peak of the Gaussian caused by the linear term in Eq. (F29), being ∝ c_N/N (see Eq. (F31)), is much smaller than the standard deviation of u, which is ∝ √(c_N/N). Therefore, as in the previous cases, we can neglect that effect, while, of course, keeping in mind that it indicates a certain asymmetry of the overall energy distribution, with the asymmetry becoming the more significant, the higher the α.
To sum up, for |E − ⟨H_N⟩_{β_c}| ≪ T_c N^{1/(2−α)}, q_N(E) ∝ exp{ −β_c² (E − ⟨H_N⟩_{β_c})² / [2 N c_N(β_c)] }. That q_N(E) significantly deviates from a Gaussian when α > 0 is not surprising in the light of Appendix F 3. Furthermore, since s_N(u) is a strictly concave function of u (see Theorem III.4.13 of Ref. [59]), and therefore so is s_N(u) − β_c u, we see from Eq. (F16) that −(1/N) ln q_N(E) is a strictly monotonically increasing function of |u − u_N(β_c)| once one departs from its unique minimum near ⟨H_N⟩_{β_c}. The strict concavity of s_N(u) − β_c u in particular means that, once |u − u_N(β_c)| = Θ(1), where the big Θ is according to standard asymptotic notation, q_N(N u) = e^{−N Θ(1)}. In other words, the distribution q_N(E) is unimodal, with exponentially decaying tails.
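In display form, the conclusions of this subsection can be collected as follows (a compact restatement of the statements above, with the β_c² factor in the Gaussian written out explicitly):

```latex
q_N(E) \;\propto\; \exp\!\left[-\,\frac{\beta_c^{2}\left(E-\langle H_N\rangle_{\beta_c}\right)^{2}}{2\,N\,c_N(\beta_c)}\right]
\quad\text{for}\quad \left|E-\langle H_N\rangle_{\beta_c}\right| \ll T_c\, N^{1/(2-\alpha)},
\qquad
q_N(Nu) \;=\; e^{-N\,\Theta(1)} \quad\text{for}\quad \left|u-u_N(\beta_c)\right| = \Theta(1).
```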

Classical 2D Ising model at phase transition
Let us now look into the most salient example of a finite-temperature phase transition with α = 0: the classical square-lattice 2D Ising model. To make the inevitably complicated analysis as easy as possible, we choose the model to have symmetric, nearest-neighbour couplings, all equal to 1 (J = 1); be at zero magnetic field; and have periodic boundary conditions on both boundaries. In such a case, the free energy per particle in the thermodynamic limit, f = lim_{N→∞} F_N/N, where F_N = −β^{−1} ln Z_N (cf. Eq. (F7)), is given by [73]

βf = −(ln 2)/2 − ln[cosh(2β)] − (1/2π) ∫_0^π dθ ln[1 + √(1 − ι² cos²θ)], where ι = 2 sinh(2β)/cosh²(2β),

with the critical inverse temperature β_c = ln(1 + √2)/2 determined by the condition ι(β_c) = 1. Differentiating βf(β), we obtain the average energy and, upon further differentiation, the higher moments, where, as before, c(β) is the specific heat and E_n := ⟨(E − ⟨E⟩_β)^n⟩_β is the n'th central moment. Now, using the easy-to-derive formula relating consecutive cumulants of the energy distribution, κ_{n+1} = −∂κ_n/∂β (F57), we immediately obtain E_2 = N c(β) β^{−2}. Near the critical temperature [64], c(β) ∝ ln |t|^{−1}, which, at criticality, translates into [66] E_2 ∝ N ln N.
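The Onsager expression can be evaluated numerically to make the logarithmic divergence of c(β) visible. The sketch below is not the transfer-matrix machinery of the references; it is plain midpoint integration plus finite differences, with ad hoc step sizes, checked against two exact benchmarks: βf(0) = −ln 2 and the critical internal energy u(β_c) = −√2.

```python
import math

def beta_f(beta, M=20000):
    """Onsager's beta*f per site for the square-lattice 2D Ising model
    (J = 1, zero field) in the thermodynamic limit."""
    iota = 2.0 * math.sinh(2 * beta) / math.cosh(2 * beta) ** 2
    s = 0.0
    for i in range(M):  # midpoint rule on [0, pi]
        theta = (i + 0.5) * math.pi / M
        s += math.log(1.0 + math.sqrt(max(0.0, 1.0 - (iota * math.cos(theta)) ** 2)))
    return -0.5 * math.log(2.0) - math.log(math.cosh(2 * beta)) - s / (2 * M)

def specific_heat(beta, d=1e-3):
    """c(beta) = -beta^2 d^2(beta*f)/dbeta^2, via a central second difference."""
    return -beta ** 2 * (beta_f(beta + d) - 2 * beta_f(beta) + beta_f(beta - d)) / d ** 2

beta_c = 0.5 * math.log(1.0 + math.sqrt(2.0))   # iota(beta_c) = 1
print(beta_f(0.0))                               # approx -ln 2 = -0.6931...
for t in (0.1, 0.03, 0.01):
    # approaching criticality from the high-temperature side:
    # the specific heat grows roughly like ln(1/|t|)
    print(t, specific_heat(beta_c * (1.0 - t)))
```

The step size d must stay small compared with the distance β_c t to the critical point, which is why t is not taken below 0.01 here.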
Using

E_3 = −∂E_2/∂β = 2Nβ^{−3} c(β) − Nβ^{−2} ∂c(β)/∂β, (F61)

we see that, near criticality, E_3 is dominated by the term −Nβ^{−2} ∂c(β)/∂β, which diverges with opposite signs on the two sides of β_c; see Fig. 8(a). This vaguely suggests that, at β_c, E_3 nullifies for finite but large N. However, Eq. (F61) (and therefore Fig. 8(a)) is inconclusive since it is not defined at β_c. Therefore, in order to understand what really happens, we need to consider the exact, finite-N solution of the 2D Ising model. This can be easily done using transfer matrices [73], and we made use of the ready formulas presented in Refs. [66,70]. Feeding these formulas to Mathematica, we find the behaviour shown in Fig. 8(a): E_3 develops two peaks of opposite sign on either side of β_c, with magnitudes growing ∝ N^{3/2}. Importantly, there is a certain asymmetry between the peaks in that they have slightly differing magnitudes and distances from β_c. This suggests that E_3 is not necessarily 0 at β_c, which would of course not be surprising, as E_3 is not zero even far away from criticality [see Fig. 8(a)]. At most, E_3(β_c) may scale proportionally with the peaks, i.e., as N^{3/2}; therefore, we can write E_3(β_c) = O(N^{3/2}). Keeping in mind that, away from criticality, E_3 ∝ N, for the asymmetry, which we quantify by A = |E_3|^{1/3} / E_2^{1/2}, we find that

A(β ≠ β_c) ∝ N^{−1/6}, A(β_c) ∝ ln^{−1/2} N. (F69)
In both cases, the asymmetry tends to zero in the thermodynamic limit. Note that E_3 peaking near the critical point but not at it fits very well into the general picture drawn above. Indeed, as the inverse temperature approaches β_c, the typical subset of energy levels approaches the less dense region characterized by the increased specific heat [see Eq. (F29), keeping in mind that K_2 is given by Eq. (F25)]. At some point, part of the subset will be in the critical, "sparse," zone, whereas the other part will be in the noncritical, "dense," zone, which will necessarily make the distribution asymmetric. Then, as one gets even closer to the critical point, most of the typical subset will be contained in the "sparse" zone, thereby mitigating the asymmetry.
In order to assess the Gaussianity of the energy distribution even further, let us invoke the fact that a distribution is Gaussian if and only if its second cumulant κ_2 is nonzero whereas all the cumulants starting from the third (κ_{k≥3}) are zero. Now, we know that κ_1 = ⟨E⟩_β, κ_2 = E_2, and κ_3 = E_3. Therefore, if we quantify the Gaussianity by the relative weight of a cumulant as compared to κ_2, i.e., by |κ_k|^{1/k} / κ_2^{1/2}, then Eq. (F69) for the asymmetry A already gives the answer for the third cumulant. As for the fourth cumulant, κ_4 = E_4 − 3E_2² = −∂E_3/∂β, where the second equality is due to Eq. (F57); we assess its behaviour numerically by calculating ∂E_3/∂β.
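As a sanity check on these cumulant-based measures, E_2, E_3, κ_4, and hence A can be computed exactly for a tiny lattice by brute-force enumeration rather than the transfer-matrix formulas of Refs. [66,70]. A sketch for a 4 × 4 lattice follows; sizes this small say nothing about the N-scalings above, of course:

```python
import math
from itertools import product

def ising_energy_cumulants(L, beta):
    """Exact low-order energy cumulants (kappa_1..kappa_4) of the L x L
    classical Ising model (J = 1, zero field, periodic boundaries) via
    enumeration of all 2^(L*L) configurations; feasible only for tiny L."""
    N = L * L
    hist = {}  # energy -> number of configurations (density of states)
    for spins in product((-1, 1), repeat=N):
        E = 0
        for i in range(L):
            for j in range(L):
                s = spins[i * L + j]
                E -= s * spins[((i + 1) % L) * L + j]  # vertical bond
                E -= s * spins[i * L + (j + 1) % L]    # horizontal bond
        hist[E] = hist.get(E, 0) + 1
    # thermal energy distribution q(E) proportional to g(E) e^{-beta E}
    logw = {E: math.log(g) - beta * E for E, g in hist.items()}
    shift = max(logw.values())
    Z = sum(math.exp(x - shift) for x in logw.values())
    q = {E: math.exp(x - shift) / Z for E, x in logw.items()}
    mean = sum(E * p for E, p in q.items())
    E2, E3, E4 = (sum((E - mean) ** k * p for E, p in q.items()) for k in (2, 3, 4))
    return mean, E2, E3, E4 - 3 * E2 ** 2

beta_c = 0.5 * math.log(1.0 + math.sqrt(2.0))
k1, k2, k3, k4 = ising_energy_cumulants(4, beta_c)
A = abs(k3) ** (1 / 3) / math.sqrt(k2)  # the asymmetry measure A of Eq. (F69)
print(k1, k2, A)
```

At β = 0 the sublattice-flip symmetry of the even-L periodic lattice makes the energy distribution exactly symmetric (κ_1 = κ_3 = 0), which serves as a built-in correctness check for the enumeration.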