Accuracy of the finite-temperature Lanczos method compared to simple typicality-based estimates

We study trace estimators for equilibrium thermodynamic observables that rely on the idea of typicality and derivatives thereof such as the finite-temperature Lanczos method (FTLM). As numerical examples quantum spin systems are studied. Our initial aim was to identify pathological examples or circumstances, such as strong frustration or unusual densities of states, where these methods could fail. Instead we failed with the attempt. All investigated systems allow such approximations, only at temperatures of the order of the lowest energy gap the convergence is somewhat slower in the number of random vectors over which observables are averaged.


I. INTRODUCTION
One way to approximate thermodynamic quantities is to rely on trace estimators. These schemes became very popular in recent years since they turn out to be rather (or astonishingly) accurate and in addition a valuable alternative in cases where Quantum Monte Carlo suffers from the sign problem. Trace estimators approximate a trace by a simple evaluation of an expectation value with respect to a random vector [1][2][3][4][5][6][7][8][9][10], i.e., Here O ∼ is the operator of interest, and | r is a vector (pure state) drawn at random from a high-dimensional Hilbert space. The factor on the r.h.s. takes care of the dimension. If not mentioned otherwise the vector | r will be normalized. The complex components r ν of the vector | r with respect to a chosen orthonormal basis { | ν }, are supposed to follow a Gaussian distribution with zero mean (Haar measure [11][12][13]). Under unitary basis transforms this distribution remains Gaussian, i.e, also in the energy eigenbasis. The latter fact is the essential difference to the eigenstate thermalization hypothesis (ETH) [14][15][16], which assumes that the approximation in Eq. (1) also holds if | r is replaced by an individual energy eigenstate (with the trace operation being performed in a microcanonical energy shell). While it is well established that the ETH is indeed satisfied for few-body observables in generic nonintegrable models [17,18], counterexamples are also known to exist, with integrable models [17] and manybody localized systems [19,20] as the most prominent ones. However, this discussion does not interfere with the investigations presented in our paper simply for the reason that individual energy eigenstates are obviously no Gaussian random superpositions of energy eigenstates, a prerequisite we require for (1) to work.
The traces to be delt with in this article appear in equilibrium statistical physics where they are evaluated for static operators such as exp{−βH ∼ } and O ∼ exp{−βH ∼ } yielding the partition function and thermal expectation values of observables O ∼ , respectively. In this context, different schemes relying on trace estimators, sometimes termed typicality or (microcanonical) thermal pure quantum states [21][22][23][24], have been very successfully employed in the field of correlated electron systems to evaluate magnetic observables, see e.g. [4,, but also elsewhere [45,46]. Although some estimates for the accuracy of such schemes have been provided analytically [2,[8][9][10][47][48][49] as well as numerically [9,24,32,40,50], more confidence into the approximation seems to be desirable in particular in view of some scepticism [7,51].
We therefore present large scale numerical calculations for bipartite and geometrically frustrated archetypical spin systems, together with a detailed analysis of the statistical errors. The use of conserved quantities (good quantum numbers) and a related decomposition of the full Hilbert space according to irreducible representations is as well addressed. We find that the gross estimation of the relative variance of an observable O ∼ in leading order of system properties (thermally occupied levels), with E 0 being the ground state energy, is indeed roughly fulfilled for α = 1 and not too low temperatures, compare [29,47,48]. While Eq. (3) is found to hold approximately, it is worth pointing out that additionally an upper bound for δ O ∼ can be derived, see e.g. [22]. This upper bound arXiv:1911.08838v3 [cond-mat.str-el] 21 Jan 2020 does not only justify trace estimators for static quantities but also for dynamic quantities [12,52] such as timedependent correlation functions [37,49,[53][54][55]. The concept of dynamical typicality further plays a central role for the foundations of statistical mechanics and thermodynamics [56][57][58]. In this paper, however, we focus on static observables such as heat capacity and susceptibility. For such quantities one can also show that δ O ∼ = 0 for T = 0 for systems with non-degenerate ground states [29,47]. For a deeper understanding of the actual numerical method applied later in this paper it is helpful to note that two rather different approximations are employed for the evaluation of thermal averages. The first approximation is provided by the trace estimator in (1). The second approximation is provided by the Krylov space expansion of the exponential which yields a spectral representation that covers the true spectrum in a coarse grained manner [40]. These issues will be addressed in the following. Note that the second approximation might be replaced by other approaches which solve the imaginary-time Schrödinger equation iteratively such as fourth-order Runge-Kutta [37,54,55] or Chebyshev polynomials [7].
The paper is organized as follows. In Section II we recapitulate typicallity-based estimators. In Section III we present our numerical examples both for frustrated and unfrustrated spin systems. The article closes with a discussion in Section IV.

II. METHOD
There are many sound motivations and derivations of the idea that traces can be accurately approximated by expectation values with respect to several or single random vectores [2,[8][9][10] as in (1), that we do not want to repeat it here. We want to motivate the older idea of trace estimators using the more recent concept of typicality. The idea of typicality in the context of trace estimators for physical quantities such as the partition function means that the overwhelming majority of all random vectors that one can draw in a high-dimensional Hilbert space consists of virtually equivalent vectors and corresponds to a situation of infinite temperature, where all expansion coefficients of the density matrix with respect to an orthonormal basis would be just about the same. In one of the very first realizations by Hutchinson [2] this was explicitely built into the method by generating random vectors with unit entries but random sign (Rademacher random vectors). Later it turned out that unbiased estimators can also be set up with other distributions, as for instance Gaussian distributions [10] (cf. Haar measure [11][12][13]).
In the ideal case one could compute a typicallity-based expectation value, depending on temperature T and mag-netic field B, using just one single random state | r , i.e.
Numerical examples indicate that one single random state indeed works well for dense spectra and large enough Hilbert spaces [40], where the notion of "large" clearly depends on temperature. For high temperatures, already dimensions of oder 10 3 can be sufficient, while for low temperatures, the required dimension increases substantially, cf. (3). One could naively assume that this approximation could be improved by the following mean with respect to a set of R different random states, but this is not true, as we will see in the next section. It is instead more accurate to improve the involved traces separately, albeit with respect to the same set of random vectors in numerator and denominator, The latter corresponds to the scheme that is used in the finite-temperature Lanczos method (FTLM) [4,29,47]. Technical details particularly concern the evaluation of r | e −βH ∼ | r , i.e. the application of the exponential. FTLM employs a Krylov space expansion, i.e. a spectral representation of the exponential in a Krylov space grown from | r as starting vector. A similar idea can be realized using Chebyshev polynomials [7].
In addition, if the Hamiltonian H ∼ possesses symmetries, these can be employed by decomposing the full Hilbert space into mutually orthogonal subspaces according to the irreducible representations of the employed symmetry [32,35] leading for the partition function to H(γ) denotes the subspace that belongs to the irreducible representation γ, N L is the dimension of the generated Krylov space, and | n(r) is the n-th eigenvector of H ∼ in this Krylov space with seed | r and energy eigenvalue (r) n . N L is chosen such, that the ground state energy in the respective subspace is converged to numerical accuracy. For large subspaces this requires N L of the order of 300 . . . 500. The method is publicly available with the program spinpack [59,60]. Effectively, the method provides a coarse grained coverage of the density of states by means of energy representatives (r) n that come together with weight factors w n (r) = | n(r) | r | 2 , thus emulating the true density of levels in the vicinity of the energy representative. In this respect the method has got some similarities to the classical Wang-Landau sampling [61][62][63].
Approximations of the type discussed here essentially suffer from two sources of error: (i) the choice of random vectors in a Hilbert space of large but finite dimension and (ii) the expansion of the exponential in the respective Krylov spaces grown from the random vectors. The error due to choosing a single random vector is upper bounded by O(1/ √ Z eff ). In addition, one can prove minimal numbers R of random vectors if a certain minimal probability (1 − δ) is required to stay below a certain relative deviation ε [9] Pr |tr R Here tr R D (A ∼ ) denotes a trace with respect to a mathematical distribution D of random numbers, that is averaged over R realizations. For the Hutchinson method one obtains R ≥ 6ε −2 ln(2/δ), for a Gaussian distribution R ≥ 8ε −2 ln(2/δ) [9].
The error related to the specific approximation of the exponential, e.g. Krylov or Chebyshev, is not so simple to quantify, which is one of the reasons for our numerical study. The central question is how accurate the coarse grained Lanczos spectrum represents the true spectrum in the partition function for various temperatures. It is evident from the schematic representation given in Fig. 1 that the Lanczos spectrum does not provide accurate resonance frequencies for more than the lowest energy gaps, even if these are indeed very accurate thanks to the (in N L ) exponentially fast convergence of extremal eigenvalues in the Lanczos method [64]. In addition, symmetries of the Hamiltonian can not only be used to yield smaller orthogonal subspaces H(γ) according to Eq. (7) which helps to access larger system sizes and/or to make calculations faster and less memory consuming, but also to generate a larger number of very accurate Lanczos energy values since they are extremal in their respective subspaces. However, even if the lowest Lanczos energies are very accurate, this does not automatically imply that the related weights w n (r) share the same superb accuracy. The low-temperature behavior thus remains to be further explored. Only at T = 0 the observables are bound to be correct as long as the the random vector has got some overlap with the true (non-degenerate) ground state [47,57]. One the other hand, for high temperatures one can show that the expansion of the exponential in Krylov space up to order N L corresponds to a high-temperature series expansion of equilibrium expectation values up to the same order [47]. Overall, relation (3) could be derived in Ref. [47] under rather reasonable assumptions. This relation will be our reference in the upcoming part of the paper.
In the following we estimate the uncertainty of a physical quantity approximated by a trace estimator by repeating the numerical evaluation N S times. The generated set of results is considered as a statistical sample, for which we define the standard deviation of the observable in the following way: O m (T, B) is either evaluated according to Eq. (4) or to Eq. (6), depending on whether the fluctuations of approximations with respect to one random vector or with respect to an average over R vectors shall be investigated (cf. the following examples). In the following numerical examples two observables are considered, zero-field susceptibility and heat capacity. Both are evaluated as variances of magnetization and energy, respectively, i.e.

III. NUMERICAL RESULTS
In this section we investigate various spin systems. They are of finite size and modeled by Heisenberg Hamiltonians augmented with a Zeeman term, i.e.
where the first sum runs over ordered pairs of spins ("−2J" convention used). Our original intention was to identify systems and circumstances where the approach of Eq. (6) fails. But none of the investigated systems turned out to be (systematically) intractable. Figure 2. Spin ring N = 10, s = 5/2: The light-blue curves depict 100 different estimates of the differential susceptibility as well as the heat capacity using single vectors. Mean values as well as the exact result are also presented.

A. Spin ring
As a first example we examine a spin ring with N = 10 spins s = 5/2 and nearest-neighbor antiferromagnetic interaction. This system exists as a magnetic molecule (abbreviated Fe 10 ), and it is called a "ferric wheel" [65]. Although the dimension of the total Hilbert space is 60,466,176, the Heisenberg Hamiltonian can be diagonalized completely thanks to the high symmetry (SU(2) and C 10 ) [66]. Figure 2 shows N S = 100 calculations of the differential susceptibility as well as the heat capacity according to Eq. (4), i.e. using a single random vector for each estimate together with the means according to Eqs. (5) and (6). For the calculation S ∼ z -symmetry was employed. In addition the exact result is depicted. One can clearly see that the estimates using a single random vector fluctuate largely for temperatures less than ten times the coupling. Nevertheless, if the estimates are joined in an FTLM fashion according to Eq. (6), the result for R = 100 can be hardly distinguished from the exact calculation. A simple mean according to Eq. (5) fails.
A statistical analysis of the set of estimates O r (T, B) in Fig. 3 reveals that the error estimate (3) with α = 1 indeed accurately describes the standard deviation. Only for temperatures smaller than the exchange coupling larger deviations can be observed, but they do not exceed the error estimate much (e.g. by orders of magnitude).

B. Cuboctahedron & Icosidodecahedron
As a second example we choose two frustrated polytopes: the cuboctahedron as well as the icosidodecahedron with antiferromagnetic nearest-neighbor interactions. Not only do both exist as magnetic molecules [67][68][69][70][71][72][73], they are also itimately related to the kagomé lattice antiferromagnet [74][75][76]. In contrast to the bipartite spin ring discussed above, these spin systems possess a rather dense spectrum with for instance several to many singlet states below the first triplet state (a hallmark of geometric frustration).
For the cuboctahedron, that has 12 spin sites, we choose a single-spin quantum number of s = 3/2 since we can still completely diagonalize the Hamiltonian using symmetries although the dimension of the total Hilbert space is 16.777.216 [77]. As can be seen in Fig. 4, the magnetic observables fluctuate below a temperature of five times the coupling when evaluated with respect to a single random vector. Aggregating them into an FTLM estimate with R = 100 again yields a very good approximation compared to the exact result.
Since the system is not too big we repeated this analysis for N S = 100 samples of FTLM estimates with R = 100 each (in this case O m (T, B) of Eq. (9) equals O FTLM (T, B) of Eq. (6)). The result is shown in Fig. 5. One immediately recognizes the much smaller spread of the estimates. Only at (sharp) features of the respective functions deviations are still visible. The origin can be found in strong variations of the true density of states with energy and/or external magnetic field; such variations seem to be hard to emulate by the coarse grained coverage through the trace estimator. The related standard deviations are expected to further decrease in a Monte-Carlo-fashion by a factor of 1/ √ R, compare [47]. This is indeed found as depicted in Fig. 6. The solid curves display the true standard deviation as well as the estimate for R = 1, whereas the dashed curves do the same but for R = 100. Since √ 100 = 10, the fluctuations of the trace estimator should be ten times smaller, which agrees with the numerical study.
The spectrum of the icosidodecahedron features similar properties as that of the cuboctahedron or that of finite-size realizations of the kagomé lattice [40,74,78]: the spectrum is rather dense, which in particular means that many singlets populate the energy spectrum below    the lowest triplet level. The latter has a stark impact on the low-temperature behavior of the heat capacity. On the other hand, for a given temperature a dense spectrum leads to a much larger effective partition function Z eff in (3) compared to e.g. a bipartite spin system with pronounced low-lying energy gaps and thus to smaller fluctuations at this temperature. Comparing Figs. 2 and 7, one notices that the fluctuations of the estimators were visible below k B T ≈ 10|J| for the ferric wheel, whereas this value is only k B T ≈ 1|J| for the icosidodecahedron. This means, that a single random vector is sufficient for the evaluation of these observables k B T 1|J|, which constitutes a drastic reduction of the computational effort. One could thus sloppily say that frustration works in favor of trace estimators, cf. Ref. [41].
The respective standard deviations support these impressions. Only at the lowest temperatures -corresponding to the low-lying level structure in particular of the singlet states -the specific heat estimates express large fluctuations. The susceptibility is not affected, since the low-lying singlets are non-magnetic.

C. Delta chain
The above discussed spin systems have a pretty regular (dome-shaped, close to Gaussian) density of states. As our next example we would like to investigate a delta chain (also sawtooth chain) close to the quantum critical point [79,80]. We choose a chain of N = 32 sites with a ferromagnetic nearest-neighbor interaction J 1 and a next-nearest neighbor antiferromagnetic interaction J 2 between spins on adjacent odd sites, i.e. i and i + 2 with i = 1, 3, 5, . . . . Periodic boundary conditions are applied. At the quantum critical point (QCP), |J 2 /J 1 | = 1/2, the system features a massive degeneracy due to multimagnon flat bands. Therefore, close to the QCP an additional small energy scale is created, around which the density of states exhibits an additional low-energy maximum. It is worth mentioning that such a compound, that is very close to the QCP, could be synthesized recently [81].
When evaluating the estimates one notices that fluctuations of observables appear only for temperatures of the order of the emergent small energy scale, as can be seen in Fig. 9. This energy scale, which is approximately 10 −2 |J 1 | and corresponds to the low-temperature maximum, is much smaller than the dominant scale |J 1 |, that corresponds to the high-temperature Shottky peak.
The reason for this behavior can be traced back to the enormous number of low-lying levels assembled at the low-energy scale, compare density of states in Fig. 9, that at temperatures elevated above the low-temperature scale contribute to the effective partition function (3) and thus lead to a very small estimate for the fluctuations of any observable. This is clearly seen for δ(C) in Fig. 9, which virtually drops to zero above the low-temperature scale. Also in this case a single random vector suffices to evaluate the thermal behavior above the low-temperature scale.

D. An integrable spin system
We already mentioned that the use of the concept of typicality for trace estimators is not connected to the question whether ETH holds for the respective system or not. Here we present a simple example of a spin-1/2 chain with nearest-neighbor antiferromagnetic interaction, that would be integrable via the Bethe ansatz [82][83][84][85]. We investigate a spin chain of N = 24 spins s = 1/2 with periodic boundary conditions that can also be solved numerically exactly when employing the symmetry groups SU (2) and C N [86]. For the FTLM investigation a reduced symmetry was used, namely S z ∼ -symmetry as well as translational symmetry C N .  In Figs. 10 and 11 we present our results for the heat capacity. The susceptibilty (not shown) behaves similarly. The results are very similar to the already discussed examples. The largest deviations again occur at and below a temperature scale of the order of the exchange interaction |J|. A sharp peak of the standard deviation δ(C) at very low temperatures occurs at temperatures coressponding to the lowest (singlet-triplet) gap. However, one would expect for this class of spin systems that the temperature above which the approximation is good drops with increasing system size since the lowest gaps shrink for this system that is gapless in the thermodynamic limit.

E. A Haldane spin system
The question how the lowest gap influences the low temperature quality of the approximation will be addressed in this section. To this end we choose a Haldane spin chain of N = 20 and s = 1 with nearest neighbor antiferromagnetic exchange as an example for systems  where the lowest gap is sizable and does not even close in the thermodynamic limit.
As one can see in Fig. 12 both susceptibility as well as heat capacity fluctuate largely about and below a temperature scale that is provided by the lowest energy gap (dashed vertical line). In essence this means -here and for all previous examples -that a single random vector, Eq. (4), does not work for temperatures of this order and below, except for T = 0 where the method is bound to be exact [29,47]. This is also clearly reflected by the respective standard deviations shown in Fig. 13. In addition, here we encounter an example where the standard deviation δ(dM/dB) assumes values much larger than our estimator (3) for T > 0. The reason of the strong fluctuations of lowtemperature observables lies in a poor coverage of w n (r) for the lowest energy eigenvalues. Although the lowest eigenvalues, thanks to the properties of power methods, are very accurate, i.e. possess a standard deviation of < 10 −4 or better, the corresponding weight factors fluctuate largely. This can be seen in Fig. 14 where the weigths w n (r) are displayed for the ground state and the first excited state as they are evaluated in the respective Hilbert subspaces H (M ). These fluctuations of the weights are conserved by a power method, this is particularily important for the low-lying levels. In order to yield an accurate low-temperature partition function the weights of the lowest states should equal one in (7). The naturally occuring variation of the weights in a random vector are amplified at low temperature by the Boltzmann factor. One can derive an estimate for the relative error of the specific heat assuming (for simplicity) that at low temperatures only the ground state as well as the first excited state contribute to the partition function, Here ∆ is the gap between ground and first excited state, d the degeneracy of the first excited state, and w 0 and w 1 the weights of the ground and the first excited state. and H ∼ separately.
Although every random vector possesses these fluctuations, a proper averaging as outlined in (7) decreases the fluctuations drastically. Figure 14 demonstrates that after averaging over 100 random vectors, the averaged weights (thick red bars) approach an equal magnitude although not yet one, but ∼ 1.2. For most observables it is sufficient that the averaged weights of the lowest states are about the same since they appear simultaneously in numerator and denominator of Eq. (6). Quite recently new ideas have been formulated how to improve the lowtemperature estimates of FTLM also for small numbers R of random vectors by taking special care of the weights for low-lying levels [87].
The investigation of the averaged weights of low-lying levels also sheds light on the failure of the naive mean according to Eq. (5). This kind of a mean does not average the individual weights, but the individual single-vector expectation values, which converges either very slowly or even to a different function at low temperatures.

IV. DISCUSSION AND CONCLUSIONS
Finally, we can conclude that typicality-based methods allow an astonishingly accurate approximation of static thermodynamics observables sometimes using just one random vector. This qualifies methods such as FTLM for a reliable treatment of large quantum systems, in particular of those that cannot be delt with by quantum Monte Carlo due to the sign prolem and of those where approximations using matrix product states converge slowly such as the two-dimensional kagomé lattice [40].
The simple idea of typicality to replace a trace by an expectation value with respect to just one random vector works indeed for large enough temperatures. 1/ √ Z eff provides the estimate for the relative error to be expected for temperatures well above the lowest excitation gap. An additional average over many random vectors according to (6) further increases the accurcy in a Monte-Carlo fashion and reduces the error by another factor 1/ √ R, where R is the number of employed random vectors. The simple average (5) of single-vector approximations does not converge properly especially at low temperatures.
Although power methods such as the Lanczos method yield exact ground state expectation values for systems with non-degenerate ground states, and should thus be accurate at T = 0, the large fluctuations of estimates using a single random vector surprise. We could clarify the latter problem by elucidating the important role jointly played by the energy gap between ground state and first excited state as well as the weight factors of both states. Although both energies are spectroscopically accurate it needs sufficient averaging to tame the strongly fluctuating weight factors. In view of this, and with 1/ √ Z eff in mind, one can state that the discussed approximations work better for systems with small gap and larger density of low-lying states. Therefore, frustration works in favor of trace estimators.
Overall, we conclude that methods such as FTLM, which rely on trace estimators, are astonishingly accurate. We could demonstrate with several prototypical examples that the standard deviations of observables can be systematically reduced via averaging. In addition, we are convinced that we could provide a valuable contribution in order to trust these methods by presenting realistic standard deviations [88].