Predicting Gibbs-State Expectation Values with Pure Thermal Shadows

The preparation and computation of many properties of quantum Gibbs states is essential for algorithms such as quantum semidefinite programming and quantum Boltzmann machines. We propose a quantum algorithm that can predict $M$ linear functions of an arbitrary Gibbs state with only $\mathcal{O}(\log{M})$ experimental measurements. Our main insight is that for sufficiently large systems we do not need to prepare the $n$-qubit mixed Gibbs state explicitly but, instead, we can evolve a random $n$-qubit pure state in imaginary time. The result then follows by constructing classical shadows of these random pure states. We propose a quantum circuit that implements this algorithm by using quantum signal processing for the imaginary time evolution. We numerically verify the efficiency of the algorithm by simulating the circuit for a ten-spin-1/2 XXZ-Heisenberg model. In addition, we show that the algorithm can be successfully employed as a subroutine for training an eight-qubit fully connected quantum Boltzmann machine.

The preparation and computation of many properties of quantum Gibbs states is essential for algorithms such as quantum semidefinite programming and quantum Boltzmann machines. We propose a quantum algorithm that can predict M linear functions of an arbitrary Gibbs state with only O(log M ) experimental measurements. Our main insight is that for sufficiently large systems we do not need to prepare the n-qubit mixed Gibbs state explicitly but, instead, we can evolve a random n-qubit pure state in imaginary time. The result then follows by constructing classical shadows of these random pure states. We propose a quantum circuit that implements this algorithm by using quantum signal processing for the imaginary time evolution. We numerically verify the efficiency of the algorithm by simulating the circuit for a ten-spin-1/2 XXZ-Heisenberg model. In addition, we show that the algorithm can be successfully employed as a subroutine for training an eight-qubit fully connected quantum Boltzmann machine.

I. INTRODUCTION
Gibbs states are mixed quantum states that describe quantum systems in thermodynamic equilibrium with their environment at finite temperature. They play a central role in quantum statistical mechanics [1] and their properties are of importance for a wide range of academic and industry-relevant applications [2]; these include the design of complex quantum materials in condensed-matter physics, optimization with quantum semidefinite programming, and machine learning with quantum Boltzmann machines.
The preparation of Gibbs states and the computation of Gibbs-state expectation values are highly nontrivial tasks. Existing algorithms can have rather complicated implementations and may apply only to a limited set of systems. Classical algorithms suffer from the exponentially growing Hilbert space or have a sign problem for some fermionic systems [3]. Fault-tolerant quantum algorithms have better asymptotic scaling [4,5] but have qubit and noise requirements that exceed the capabilities of current and near-term quantum processors. Variational quantum algorithms can prepare Gibbs states [6,7] and cope with some hardware limitations but require many experimental measurements for each optimization step and may suffer from barren plateaus [8]. Other quantum approaches based on minimally entangled typical thermal states (METTS) [9,10] set up a Markov chain that potentially has a long thermalization time.
Gibbs states can be represented mathematically by a density matrix of the form ρ β = e −βH /Z, where H is the system Hamiltonian, β is the inverse temperature, and Z = Tre −βH is the partition function. In this work, we propose an efficient quantum algorithm to estimate a large number of Gibbs-state expectation values without preparing and measuring the mixed state ρ β . This * luuk.coopmans@quantinuum.com The classical shadows of a Gibbs state ρ β , here represented as a quantum system (red sphere) in thermal equilibrium with its environment (red bath), are equal to the classical shadows constructed from a thermal pure quantum state |ψ β (purple sphere). The random measurement direction of the shadows is determined by the Clifford unitary V Cl . Bottom: A circuit diagram of the quantum circuit that implements a pure thermal shadow with quantum signal processing (QSP).
is achieved by combining thermal pure quantum (TPQ) states [11][12][13][14][15][16][17][18][19][20][21] and classical shadow tomography [22][23][24][25][26][27][28][29]. One way to generate a thermal pure quantum state is by imaginary time evolution, e −βH/2 |φ , of a n-qubit random state |φ [30,31]. We show that if the probability distribution over the initial states, |φ , forms at least a quantum 2-design, the imaginary-time-evolved states approximate the expectation values of ρ β up to an error that falls off exponentially with the system size, n. Classical shadow tomography is then exploited to construct an efficient classical representation of these TPQ states from outcomes of randomized measurements. Using the results from Ref. [25], we show that for sufficiently large systems, we can estimate M Gibbs-state expectation values using O(log M ) measurements of a single prepared TPQ state. This is remarkable, because the arXiv:2206.05302v4 [quant-ph] 26 Jun 2023 required number of measurements is similar to the case where we have actually prepared the Gibbs state (e.g., via a purification [5], which incurs a higher cost of 2n qubits).
We also propose a practical implementation of the algorithm. We begin by preparing the initial state using a polynomial-depth Clifford circuit. This suffices to produce a quantum 2-design with the additional benefit of a reduced circuit depth compared to some of the other random-state preparation techniques. We then approximate the imaginary time evolution using quantum signal processing (QSP) [32][33][34][35]. This approach is very general (i.e., it applies to any quantum Hamiltonian H, in principle) andit offers great control since we can systematically trade off circuit depth for accuracy. For the randomized measurement, we use another Clifford circuit. The complete circuit, comprising a Clifford circuit, a QSP circuit, and a randomized measurement, is shown in Fig. 1 and uses a total number of qubits linear in the system size, n.
In order to support our theoretical findings, we classically simulate the quantum circuit for a quantum system of ten spin-1/2 particles in the XXZ-Heisenberg model [36,37]. Moreover, we use the simulated algorithm to numerically train a fully connected quantum Boltzmann machine for the generative modeling of both classical and quantum data to high accuracy.

II. THERMAL PURE QUANTUM STATES
A TPQ state is any pure state, |ψ , that is able to estimate a fixed set of properties (expectation values) of a sufficiently large mixed thermal state, specified by some specific statistical ensemble. For the thermodynamic (canonical) Gibbs ensemble, following Ref. [16], we define it to be any pure state |ψ , which is drawn at random, that satisfies for all O j in some predefined set of Hermitian operators {O j }. The constants C and α are specified later. We now show that the pure states generated by imaginary time evolution, where U ∈ Cl(2 n ) is a random unitary drawn from the n-qubit Clifford group, satisfy Eq. (1). These TPQ states are slightly different from the ones originally proposed in Ref. [16] and also differ from the ones used in Ref. [19] for classical numerical simulations. There, a particular form of Haar-random states for U |0 is used, which on a quantum computer would require exponential circuit depth. Our choice of U ∈ Cl(2 n ) yields a unitary 3design [38] and thus it replicates Haar integrals up to the third moment. This has the advantage of requiring only O(n 2 / log n) quantum gates [39]. First, as shown in detail in Appendix B, the expectation value of an arbitrary Hermitian operator O in the random pure states |ψ β is, on average, Here, E U denotes the ensemble average with respect to the n-qubit Clifford group U ∈ Cl(2 n ). The approximate equality follows from computing a Taylor expansion up to second order in the variance of the normalization N of |ψ β (for details, see Appendix B). This leads to the same approximation as in previous works based on Haarrandom states [16,19]. A similar calculation for the variance of the expectation value with respect to U yields Both the bias in Eq. (3) and the variance in Eq. (4) are proportional to the purity of the Gibbs state, Trρ 2 β . Since the Gibbs state ρ β minimizes the Helmholtz free energy, F β = − log Z/β, we can write The last equality follows from properties of the free energy, i.e., the extensivity, F β ∝ n, [40] and the monotonicity with respect to the inverse temperature, F 2β > F β . The terms that multiply the purity in Eqs. (3) and (4) Hence, the expectation values with respect to the random pure states |ψ β can be used as estimators for Gibbsstate expectation values of polynomially sized operators O, for a sufficiently large system and finite β. Importantly, for β = ∞, i.e., when the Gibbs state approaches the ground state of H, the Gibbs state becomes pure, Trρ 2 β = 1, and the error remains finite for any system size n. For all other β, the rate of exponential decay, and thus how large n needs to be, is dictated by F 2β −F β . The exact error, therefore, depends on which specific system Hamiltonian H one considers, the inverse temperature β, and also the spectral norm of the observable O . This means that in some specific instances [16,19],the use of only a single TPQ state is sufficient (see Appendix E). In contrast, approaches that do not have a vanishing variance, such as algorithms based on METTS [9,10], may require multiple pure states.

III. PURE THERMAL SHADOWS
Classical shadow tomography [22][23][24][25] is a method to efficiently predict many properties of a prepared quantum state, ρ, by performing randomized measurements and constructing efficient classical representations (shadows) from the measurement outcomes. In Ref. [25], it has been proven that one can predict M properties of ρ from O(log M ) measurements (for a review, see Appendix A). For this reason, classical shadow tomography is particularly beneficial in scenarios where one needs to compute many observables, such as in variational quantum algorithms and quantum fidelity estimation [29].
To predict many properties of a Gibbs state ρ β , one can prepare a purification of ρ β on 2n qubits and apply classical shadow tomography. Here, instead, we simplify the process by constructing shadows of the n-qubit TPQ states |ψ β . We refer to these as pure thermal shadows. Notably, in the limit of large system size, the number of state preparations and measurements required for predicting M properties to a certain accuracy becomes similar to the case where we prepare and measure ρ β directly.
The first step of the algorithm is the application of a random unitary, V , to the TPQ states, |ψ β → V |ψ β . The operation V can either be a random n-qubit Clifford unitary, V ∼ Cl(2 n ), for a random Clifford measurement, or a tensor product of single-qubit Clifford unitaries, V ∼ Cl(2) ⊗n , for a random Pauli measurement. Subsequently, a computational-basis measurement of V |ψ β is performed to obtain a n-qubit bit-string outcome, |b . The thermal shadows, can be constructed with M −1 (X) = (2 n + 1)X − 1TrX.
As V is a Clifford circuit and |b a bit string, the shadow can be efficiently constructed and stored classically. By averaging over both the unitary V of the randomized measurements and the unitary U of the TPQ states |ψ β , we find that the expectation values of the pure thermal shadows satisfy Here, E V,b includes both the classical average over the random circuits V as well as the quantum average of the measurement outcomes |b via Born's probabilities. The expectation value of the shadowη V,b is thus the same as the expectation value of |ψ β . By Eqs. (3) and (5), this becomes equal to the Gibbs-state expectation value, Trρ β O, as we increase n.
The success probability of the algorithm depends on the mean squared error, σ 2 , of the pure thermal-shadow expectation value from the true Gibbs-state expectation value. For random Clifford measurements, V ∼ Cl(2 n ), we find (see Appendix C) that this error evaluates to Here, we define O 0 = O − 1TrO/2 n as the traceless part of the operator O. Similarly, for random Pauli measurements, V ∼ Cl(2) ⊗n , we obtain where O is now a k-local Pauli operator [42] Remarkably, these mean squared errors are, up to an exponentially small bias, approximately the same as the σ 2 of shadows constructed from randomized Pauli and Clifford measurements of a prepared ρ β (see Appendix C). This means that the construction of shadows of the TPQ state performs approximately as well as the construction of shadows of the true Gibbs state. However, the latter scenario is only theoretical, since up to now no quantum device has been able to prepare a desired Gibbs state exactly.
In the scenario in which we wish to predict a total of M Gibbs-state expectation values, we invoke a specific form of Theorem 1 in Ref. [25] for a median-of-means estimator. This gives the main result of this paper, which can be summarized by the following theorem.

Pr max
Here, µ j (N, K) is the predicted expectation value for observable O j obtained from a median-of-means estimation on K subsets of N classical shadows. A detailed proof of this Theorem and a review of Theorem 1 in Ref. [25] are given in Appendices A and C. We conclude that for M Gibbs-state expectation values, we require O(log M ) preparations of TPQ states |ψ β and randomized measurements. The exact performance of the algorithm depends on which type of measurement one uses (Pauli or Clifford) and the considered observables. For k−local Pauli observables, with k n, the variance of Pauli measurements [Eq. (9)], is lower and hence a better choice. On the other hand, for tasks such as fidelity estimation, the Clifford measurements perform better [25].

IV. IMPLEMENTING THE QUANTUM CIRCUIT
In order to implement the pure thermal-shadow algorithm on a quantum device we need several different elements. First, we need a method to sample uniformly from the n-qubit Clifford group to generate the random Clifford circuits U . An efficient polynomial-time algorithm exists for this [43], which has been implemented in the quantum computing library qiskit. We can use this for both the generation of the random pure states at the beginning of the algorithm as well as the randomized measurements at the end. Second, we need a method for approximating the nonunitary operator e −βH/2 in Eq. (2). To this end, we use quantum signal processing [34] (QSP), a framework for performing matrix arithmetics on quantum computers.
QSP can generate polynomials of a Hamiltonian given access to a block encoding of the Hamiltonian into a larger unitary matrix (see Appendix D). There exist block-encoding schemes for generic matrices [33,44]. Here, we employ the linear combination of unitaries (LCU) method [45], under the assumption that the Hamiltonian is given in the form H = k a k P k , where a k ∈ R and P k are n-qubit Pauli operators. We also require a preprocessing step that rescales the spectrum of H to the interval [0, 1]. We achieve this with a min-max rescalingH = (H − λ min 1)/(λ max − λ min ), where λ min (λ max ) is the smallest (largest) eigenvalue. In practice, the extremal eigenvalues are unknown and one resorts to a lower bound for λ min and to an upper bound for λ max . We also define the imaginary time τ = β(λ max − λ min )/2 so that Thus, we can evolveH for time τ and obtain the desired nonunitary operator up to a constant factor. This factor is irrelevant, as it cancels out in Eq. (2). We then use QSP to implement a polynomial approximation to the exponential function and apply it to the eigenvalues ofH [44,46]. The method in Corollary 64 of Ref. [44] approximates the exponential function to accuracy with a polynomial of degree O( β(λ max − λ min ) log(1/ )). The degree is proportional to the number of uses of the block-encoding circuit and thus determines the overall depth of the circuit. In our implementation, we instead numerically find a polynomial approximation to e −τ |x| using the pyqsp library [35] and then construct a suitable QSP circuit.
At the bottom of Fig. 1, we show the circuit diagram for the pure thermal-shadow algorithm. The success probability e −τH U |0 2 , i.e., the probability of measuring the ancilla qubits in the all-zero state, depends on the random Clifford circuit U . Averaging over the Clifford group and using Eq. (12), we find that the probability of success is O(e βλmin Tre −βH /2 n ). It can be verified that the success probability decreases as β increases. This is expected from the intuition that low-temperature sampling is computationally harder [47]. By including O(e −βλmin/2 2 n /Tre −βH ) iterations of amplitude amplification, the protocol is enhanced and succeeds with probability O(1). Amplitude amplification can be implemented via yet another layer of QSP [35]. Note that while this protocol works for any Hamiltonian and temperature, the total circuit depth is expected to be polynomial in n only for certain choices of H and β.

V. NUMERICAL SIMULATIONS
A. Verification of the efficiency of the algorithm We verify the efficiency of the pure thermal-shadow algorithm by simulating the circuit for the one-dimensional XXZ-Heisenberg Hamiltonian [36,37] given by Here, J is the nearest-neighbor x, y-coupling constant and ∆ is the nearest-neighbor z-coupling strength. This is a paradigmatic model for the study of (quantum) magnetism and it maps to the Hubbard model for spinless interacting fermions after a Jordan-Wigner transformation. We set the system size to n = 10 and the coupling parameters to J = 0.5 and ∆ = 0.75, which means that the ground state is in the antiferromagnetic phase [37]. First, in Fig. 2(a), we show the maximum error of the estimated expectation values of all one-and two-qubit Pauli observables (435 in total) using the TPQ states in Eq. (3) generated with QSP. Note that here we use the full TPQ state vectors to estimate the expectation values and do not construct thermal shadows yet. We observe that the error decreases as we increase the degree of the polynomial approximation used in QSP. This shows that we can use QSP to tune the accuracy. In addition we see that, as expected, the method performs better for higher temperatures, i.e., smaller β.
In Fig. 2(b), we show the same error computed for the full algorithm, in which we use the mean expectation values of the thermal shadows to estimate the true Gibbs-state expectation values. As we are computing Pauli observables, we use random Pauli measurements for this. We compare the performance of shadows constructed from the exact Gibbs state ρ β with β = 1, the exact TPQ state, and the TPQ state obtained from a QSP of degree 32. As we increase the number of shadows used to estimate the expectation values, we observe that the maximum error becomes similar, O(10 −1 ), for all three methods. This confirms our theoretical finding that shadows constructed from |ψ β perform as well as shadows constructed from ρ β .

B. Training of quantum Boltzmann machines
In order to demonstrate the power of the pure thermalshadow algorithm for practical applications, we exploit it for the training of a quantum Boltzmann machine (QBM). A QBM is a quantum machine-learning model, ρ θ = e −H(θ) /Z (see Appendix E), that can be used for various learning tasks [48][49][50][51], e.g., the generative modeling of classical and quantum data and Hamiltonian learning via Gibbs states [52]. The training of a QBM is, however, highly complex and intractable, since most optimization algorithms require the evaluation of many Gibbs-state expectation values, Trρ θ O j . Here, we use the pure thermal-shadow algorithm in this subroutine.
We focus on an eight-qubit fully connected QBM, where the model Hamiltonian has the form H(θ) = k=x,y,z n i,j>i λ k ij σ k i σ k j + n i γ k i σ k i , with θ ≡ (λ, γ) the model parameters. As objective function, we take the quantum relative entropy, S(η ρ θ ) = Tr(η log η) − Tr(η log ρ θ ), where η is the target density matrix that encodes our data.
First, focusing on quantum data, we set η equal to the Gibbs state of the XXZ Hamiltonian from Eq. (13) at β = 1 and minimize S with respect to θ using vanilla gradient descent. In Fig. 3, we show results for three different training methods. When using the exact gradient, both the relative entropy (main panel) and the error in the expectation values (inset) decrease monotonically during training. When using either the exact TPQ state or thermal shadows, we observe a similar behavior but with slower rates. This is expected, since these methods provide approximations to the gradient only up to a certain accuracy.
Importantly, the maximum error in the expectation values of the final QBM from the target η for the thermalshadow method is max = O(10 −1 ). The error can be further reduced by increasing the number of shadows and/or the degree of polynomial approximation used for QSP. This shows that thermal shadows can be used to train a QBM to model quantum data to high accuracy. More- over, our theoretical analysis indicates that the accuracy may become higher for increased system sizes. This is because the exact TPQ states are expected to better approximate the true Gibbs state for larger systems.
As a second example, we train a fully connected QBM on a multielectrode array recording of eight salamander retinal-ganglion cells [53] [54]. In order to encode this classical data set into a quantum state η, we follow Ref. [51]. First, we construct the empirical probability distribution, q(s) = 1 N N µ=1 δ s,s µ , where the s are bit strings of length 8 and the {s µ } N µ=1 are binary vectors of length 8 corresponding to the N salamander data samples. This distribution is shown in Fig. 4(c) for the first 20 bit strings. Then, we turn q(s) into a rank-1 density matrix, η = |ψ ψ|, with |ψ = s q(s)|s . The advantage of this encoding is that the required target expectation values can be efficiently computed classically and it imposes nontrivial quantum statistics in the classical data [51].
In Fig. 4(a), we show the results for a trained QBM with this classical data encoding. We observe that the training algorithm based on pure thermal shadows performs almost as well as the one based on exact gradients. In contrast with the monotonically decreasing relative entropy observed for the quantum target state, the relative entropy now saturates at a small but finite value. This is expected, because the target η is not a Gibbs state itself, i.e, there is a model mismatch. Note that this is not related to our training method, but a feature of the chosen model.
In Fig. 4(b), we plot the maximum and mean error in the predicted expectation values, = | O j η − O j ρ θ |, of the QBM during training. Similar to the XXZ-Heisenberg-model results, we see that during training the maximum error decreases and reaches a finite value for the thermal-shadow method. The final error is small, O(10 −2 ), and can be further improved by increasing the number of shadows. As a last indicator for the performance of our QBM training algorithm, in Fig. 4(c), we inspect the final probability distribution of the QBM. It can be seen that the probability distribution obtained from the QBM approximates the true q(s) to high accuracy. This shows that the algorithm can be exploited for the training of quantum machine-learning models on real-world classical data sets. We leave the question of how well a QBM performs compared to other generative machine-learning models for future studies.

VI. CONCLUSION AND OUTLOOK
In this work, we show that one can predict M linear properties of a n-spin Gibbs state at finite temperature with only O(log M ) preparations of a thermal pure quantum state. In practice, this means that one only needs to prepare a n-qubit random pure state on a quantum computer, imaginary time evolve it with the system Hamiltonian, and finally perform randomized measurements. Compared to algorithms that prepare a purification of the Gibbs state, our approach reduces the required qubits by half; and compared to variational Gibbs-state preparation, it reduces the number of experimental measurements.
We also propose a practical implementation of the algorithm where the imaginary time evolution is performed by quantum signal processing. By simulating a XXZ-Heisenberg model we demonstrate the performance of the algorithm and circuit and show that the predicted expectation values are in excellent agreement with the exact values. In addition, as a relevant real-world application, we show that the algorithm can be exploited to train a fully connected quantum Boltzmann machine to model quantum and classical data with high accuracy. This problem is known to be computationally intractable.
For future studies, a few interesting avenues can be explored. First, on a theoretical level, one can look at constructing shadows of other types of thermal pure states, such as states that satisfy the eigenstate-thermalization hypothesis [1,55] and minimally entangled typical thermal states [9]. Another option is to generalize the algorithm to other thermodynamic ensembles and look at the efficiency.
More practically, one can investigate further improvements of the algorithm by derandomization [56][57][58] of the measurements and also the prediction of nonlinear functions of ρ β . Furthermore, it will be interesting to see if one can improve the imaginary-time-evolution approximation by quantum signal processing; e.g., by using fragmentation as described in Ref. [59]. We believe that this may lead to a practical realization of our algorithm, and the training of quantum Boltzmann machines, on early-term quantum signal processors.

Appendix A: Review of classical shadows
In this appendix we review the classical-shadow-tomography protocol as introduced by Huang et al. in Ref. [25]. [60] This method uses randomized measurements to construct efficient classical representations (shadows) of an arbitrary quantum state ρ. The power of this randomized approach is that it can be used to predict many properties (such as observables) of ρ with very few measurements. We start by outlining all the steps in this randomized measurement method (by following Ref. [25]) and defining what the classical shadows are. Afterward, we prove how many shadows are required to compute M expectation values, TrρO j , for a set of arbitrary observables {O j } M j=1 .

Randomized measurements and classical shadows
Definition 1 (Randomized measurements). Given an arbitrary n-qubit quantum state ρ and a random unitary operator U , which is drawn uniformly from some ensemble U, we define a random measurement process as follows: where in the second step we measure in the computational basis, i.e., b is a bit string. The probability of obtaining bit string b is given by Born's rule: The interpretation of this measurement process is straightforward: we rotate the quantum state to some random basis and then perform a computational-basis measurement. Taking these two steps together, we thus perform a measurement in a random basis.
This random measurement process can then be used to construct efficient classical representations of ρ by performing some (classical) postprocessing steps. First, one applies the inverse of U to the outcome |b b|. This can be done efficiently classically if U is drawn, for example, from the n-qubit Clifford group Cl(2 n ) or a tensor product of random single-qubit Cliffords, U = U 1 ⊗ ... ⊗ U n , where U i ∈ Cl(2). We then repeat this measurement process many times, each time for a different unitary U , and look at the expectation value, E[U † |b b|U ]. We can see that this complete process defines a quantum channel M on the original state ρ. Specifically, we obtain As M is a linear map, we can obtain information about ρ from the measurement results by inverting the channel M −1 . We remark that the channel is invertible provided that the map is one to one. This means that there exist a unitary U ∈ U and a corresponding b for which b|U ρU † |b = b|U σU † |b for ρ = σ. The application of M −1 to both sides of Eq. (A3) gives Note that here we use the fact that M −1 is a linear map and hence can be absorbed into the expectation value. From this, we can define the classical shadows.
Definition 2 (Classical shadow). Given a n-qubit quantum state ρ and a bit string |b obtained from a randomized measurement of ρ with random unitary U ∈ U, a classical shadow of the state ρ is defined to be the unit-trace operator provided that there exists a unitary U ∈ U and a corresponding b for which b|U ρU † |b = b|U σU † |b for ρ = σ.
By definition, these classical shadows reproduce the quantum state ρ if we average over the unitaries U and measurement outcomes b, i.e., E[ρ U,b ] = ρ. However, in order to efficiently store and compute these shadows classically, we need to derive the explicit form of the channel M. We do this in the next subsection for random Clifford unitaries U ∈ Cl(2 n ). To construct M −1 , we evaluate the effect of the channel M on an arbitrary state ρ, The evaluation of this expression for any arbitrary ensemble U is very hard, if possible at all. However, we recognize that the argument of the average over the ensemble U is a polynomial of order 2 in (U, U † ). This means that if the unitary U is at least a unitary 2-design [61], we can replace the inner average by an integral with respect to the Haar measure dµ Haar (U ), where E U ∈U k≥2 is the average over a unitary k-design. An example of a category of unitary 3-designs, and thus also 2-designs, is the random Clifford unitaries, U ∈ Cl(2 n ) [38], which can be classically simulated and sampled efficiently by the Gottesmann-Knill theorem [62,63]. For this reason, from now on we use random n-qubit Clifford measurements. However, in reality we can use any randomized measurements with a unitary k ≥ 2-design. The advantage of writing the average as a Haar integral is that expressions exist to evaluate some integrals analytically. In particular, below we repeatedly make use of the following three Haar-integral identities [25]: Here, O 1 is an arbitrary Hermitian operator and O 2 and O 3 are traceless operators. Using the second-moment Haar integral given in Eq. (A9), we find that b∈{0,1} n U ∈U dµ Haar (U ) b|U ρU † |b U † |b b|U = ρ + 1Trρ (2 n + 1) .

(A11)
This leads to the following inverse quantum channel for U ∈ U k≥2 : where we use the fact that the channel is unital, M(1) = 1, and X is a Hermitian operator of dimension 2 n × 2 n . As such, the classical shadows (A5) for random n-qubit Clifford measurements of the state ρ are given bŷ

Computing expectation values with classical shadows
Having defined the classical shadows and shown how to construct them from outcomes of randomized measurements, we can now look into the problem of computing expectation values. Suppose that we are interested in expectation values of the form TrρO j , for the set of Hermitian operators {O j } M j=1 . In this subsection, we show that we can compute these expectation values by taking the classical shadows,ρ U,b , as estimators for the density matrix. To keep this review concise, we only focus on the randomized Clifford measurements, for which the shadows are given in Eq. (A13).
In order to show that we can useρ U,b as an estimator for the expectation values of ρ, we need to compute the average, E [Trρ U,b O j ], and the variance, Var [Trρ U,b O j ], with respect to the randomized measurement process. As the estimatorsρ U,b are obtained from randomized Clifford measurements, we can compute these averages by again exploiting the fact that U ∈ Cl(2 n ) is a unitary 3-design. First, by using the linearity of the trace and Eq. (A4), we find that for the average of the shadow expectation value of observable O j . For the computation of the variance, we note that it only depends on the traceless part, O 0 j = O j − 1TrOj 2 n , of O j . This can be seen from This important observation simplifies the variance calculation enormously, as we now only need to compute one Haar integral. Focusing on the first term in the variance, we obtain Here, we use the identity for the third-moment Haar integral (A10) for traceless operators after swapping the trace and integral in Eq. (A17). Subsequently, we perform the sum over all bit strings to obtain the result in Eq. (A18). The variance is thus equal to

Predicting M observables with O(log M ) classical shadows
For the final part of our review of the classical-shadow-tomography algorithm, we show that we only need O(log M ) independent classical shadows,ρ U,b , to predict M expectation values {TrρO j } M j=1 . To simplify the notation, we replace the U and b subscripts with an index; i.e., nowρ i denotes the ith shadow. We follow the approach laid out in Ref. [25] closely and perform a median-of-means estimation. The median-of-means estimator of TrρO j is defined as with This means that we construct n shadows = N K independent shadowsρ i in total, divide them into K subsets of size N , compute the mean of each subset, and finally take the median of the subset means. Although this procedure is slightly more complicated than computing the mean directly on all the shadows, there are scenarios in which the median-of-means estimator is a more robust estimator than the standard mean estimator (see, e.g., Ref. [64]). Moreover, results are known for how many samples (independent estimators) are required to predict a function up to a certain accuracy. Specifically, for a random variable X with finite sample variance σ 2 , the median-of-means estimator,μ(N, K), has the following failure probability [64,65]: where the number of shadows in each subset needs to satisfy N > 2σ 2 / 2 for this inequality to hold. Then, by setting K = (9/2) log(M/δ) and N = 6σ 2 / 2 , we find that The values of the constants N and K are found by minimizing the number of shadows for a fixed upper bound δ/M and with N > 2σ 2 / 2 . In this way, we obtain a number of shadows n shadows = N K smaller than the (reserved) one found in Ref. [25]. Now, by applying a median-of-means estimation to the problem of computing M observables O j , we find that As this is the failure probability for each individual observable O j , the failure probability of all of them together (i.e., that at least one observable fails) follows from Boole's inequality, From this, we arrive at the following theorem [25]. This means that in total we need O (log M ) randomized measurements. Importantly, the mean squared error for observable O j , σ 2 j depends on the operator that one tries to measure, and which specific unitary ensemble is used. As a final remark, note that the constants in Eq. (A26) result from bounding the failure probability. This means that it is a worst-case scenario (upper bound) and in practice one could likely get away with fewer shadows. In this appendix we show that the random state where the unitary U ∈ U k≥2 is at least a quantum 2-design, satisfies Eq. (1) and therefore is a thermal pure quantum (TPQ) state. We start by deriving the expectation value and variance of an observable O in the state |ψ β , for which the results are given in Eqs. (3) and (4). Afterward, we apply Markov's inequality to prove the exponentially vanishing failure probability.

Deriving the variance and expectation value of the thermal pure quantum state
First, we note that the variance of the normalization constant, 0|U † e −βH U |0 , of the TPQ state, |ψ β , is exponentially suppressed. This can be derived by evaluating the variance explicitly and invoking the Haar-integral identities in Eqs. (A8) and (A9): Since 0 < Tre −2βH ≤ 2 n for a positive semidefinite H [66], and for all finite β > 0, this variance will always be exponentially small in the system size, n.
In the following, we use the shorthand notation f ≡ 0|U † e −βH/2 Oe −βH/2 U |0 and g ≡ 0|U † e −βH U |0 . As Var [g] is small, we expand the expectation value of O with a multivariate Taylor expansion up to first order in Var [g]: where Cov[f, g] denotes the covariance between f and g, defined by Cov . Note that here we do not explicitly write the higher-order terms in the Taylor expansion. These higher-order terms need to be evaluated, or bounded, to turn the approximate equality into an equality. Now, by evaluating each of the terms in this Taylor expansion with the Haar-integral identities in Eqs. (A8) and (A9), we arrive at Eq. (3): For the variance of the expectation value, we exploit a similar multivariate Taylor expansion, and obtain Eq. (4) after the evaluation of the Haar integrals

Application of Markov's inequality
In order to show that |ψ β satisfies Eq. (1), we start from the following Markov inequality: Here, E U denotes the ensemble average with respect to the n-qubit Clifford group U ∈ Cl(2 n ). This average can be computed by using the unitary 2-design property of U and writing it in terms of Haar integrals. After evaluating the integrals, this results in The terms that multiply the purity, Trρ 2 β , in this expression, have the form of expectation values, which can be upper bounded by the spectral norm, O 2 . For the first term, Tr(Oe −βH ) 2 Tre −2βH , this can be proven as follows. Inserting the complete set of eigenstates {|λ } of H, i.e., H|λ = λ|λ , we have where we use the inequality of arithmetic and geometric means, √ xy ≤ (x + y)/2 for x, y ∈ R ≥0 .
We thus arrive at which means that |ψ β satisfies Eq. (1) with C ≈ 4 O 2 / 2 and e −αn = Trρ 2 β . The fact that Trρ 2 β can be written as an exponential is explained below Eq. (5) in the main text. In this appendix, we provide the detailed proof of Theorem 1. We construct classical shadows of TPQ states (pure thermal shadows) by using two independent random unitaries. The first unitary, U ∼ Cl(2 n ), is followed by an application of e −βH/2 to create a TPQ state, |ψ β . The second unitary, V , combined with the measurement outcomes in the form of a bit string, |b , is used to construct the corresponding classical shadow,η V,b = M −1 (V † |b b|V ). We consider the cases where V is drawn from Cl(2 n ), a random n-qubit Clifford measurement, and Cl(2) ⊗n , a random Pauli measurement, respectively. We start by deriving the mean squared errors of the thermal shadows, Eqs. (8) and (9), for both measurement protocols. Then, from a direct application of Theorem 2, we obtain Theorem 1, which completes the proof.

Random Clifford measurements
The random-Clifford-measurement protocol consists of the application of a Clifford circuit V ∼ Cl(2 n ) to the TPQ state, |ψ β , followed by a computational-basis measurement with bit-string outcome |b . We first compute the average, E [Trη V,b O], and then the mean squared error, σ 2 , with respect to the random Clifford measurement process. For the average, we find that where we use the fact that the averaged shadow reproduces the TPQ state, In order to compute the mean squared error, we first write it in terms of the variance of the thermal-shadow expectation value: The thermal-shadow variance can be computed from where O 0 is the traceless part of the observable O. Focusing on the first term and invoking Eq. (A18) for the random measurement process, we obtain From this, we find that the variance is equal to The combination of this expression with Eq. (C2) then results in Eq. (8) for the mean squared error. We note that the mean squared error of the thermal shadows is, up to an exponential correction, equal to the σ 2 that one obtains by plugging ρ = ρ β in Eq. (A19). As such, for sufficiently large systems, shadows obtained from randomly measuring the TPQ state perform approximately as well as shadows obtained from measuring the true Gibbs state.

Random Pauli measurements
Random Pauli measurements can be performed by applying a tensor product of random single-qubit Clifford gates, V = n i=1 V i , to the TPQ state, |ψ β , followed again by measurements in the computational basis with bit-string outcomes, |b = n i=1 |b i . For these random Pauli measurements, the quantum channel M [Eq. (A3)] is given by the depolarizing channel, (D 1/3 ) ⊗n (for the derivation, see, e.g., Ref. [25]). This means that the inverse channel is given by M −1 (X) = (D −1 1/3 ) ⊗n (X) and the classical shadow can be constructed from, for the expectation value of the Pauli shadows in Eq. (C6). Thus, shadows from randomized Pauli measurements can, like Clifford measurements, be used as estimators for Gibbs-state expectation values. For the mean squared error, we use Eq. (C2) again and evaluate the first term in the variance [Eq. (C3)] for the shadows constructed from Pauli measurements. For this we assume that we are interested in an observable O, which is a k-local Pauli operator, O = P 1 ⊗ · · · ⊗ P k ⊗ I ⊗(n−k) . For more general observables, our calculation below can be modified by using the results presented in the appendix of Ref. [25]. We find that the first term in the variance is given by Here, we use the second-moment Haar integral [Eq. (A9)] for a single qubit to compute the average over E Vj and the fact that the contribution from I ⊗(n−k) evaluates to 1. The combination of this with the second term in Eq. (C3), and with Eq. (C2), leads to Eq. (9).
Then, Theorem 2 holds with σ 2 j replaced by Eq. (C2) for the mean squared error of observable O j . These errors are for random Clifford and Pauli measurements given by Eqs. (8) and (9). This completes the proof of Theorem 1. encoding the input matrix H on a primary n-qubit system, i.e., embedding H into a unitary matrix W by adding a m-qubit ancillary register. For any state |ψ of the primary system, W is defined such that where |Ψ ⊥ satisfies ( 0 m | ⊗ 1)|Ψ ⊥ = 0 and a is some constant obeying a ≥ H . To construct such a block encoding, we employ the linear combination of unitaries (LCU) method [45]. The LCU provides a way to block encode H when it is expressed as a linear combination of unitaries {P k } K k=1 , H = K k=1 a k P k with a k ∈ R. Following the main text, we take the unitary operators to be Pauli operators P k ∈ {1, σ x , σ y , σ z } ⊗n . The LCU consists of two unitary operators: i a unitary operator A acting on the ancillary register with m = log K such that A|0 m = 1 √ a k √ a k |k with a = k a k and ii a controlled unitary operator B = K k=1 sign(a k )|k k| ⊗ P k with the sign function, sign(a) = +1(−1) for a ≥ 0(a < 0) [67] Then, one can readily show that W = A † BA gives a block encoding of H, i.e., it satisfies Eq. (D4).
In order to clarify the connection with QSP, we expand the input state, |ψ = λ c λ |λ , in the eigenbasis of H, (H/a)|λ = λ|λ . In each eigensubspace spanned by the basis vectors, the block encoding W has the matrix representation Therefore, the block encoding W can be identified with the signal-processing operator W [Eq. (D2)] in the qubitized space spanned by Eq. (D5). The signal-processing rotation operator S(φ) in the qubitized space is constructed as where we have added a single-qubit ancillary register and defined C |0 NOT = σ x ⊗ |0 m 0 m | + 1 ⊗ (1 ⊗m − |0 m 0 m |). This operator S(φ) acts trivially on the primary system. Combining them, we find that the following sequential application of unitaries,  FIG. A1. The purity, Trρ 2 β , as a function of the system size, n, for (a) the XXZ-Heisenberg model (with J = 0.5, ∆ = 0.7, and closed boundary conditions) and (b) a random Hamiltonian with all-to-all connectivity (the XYZ-Heisenberg model with random couplings between all qubits and a random field on each qubit). Note that for (b), we use a different random initialization for each n. The jump observed at n = 8 and β = 2 is an outlier.
FIG. A2. The mean error of an ensemble of size nTPQ ∈ {1, 5, 10, 100} thermal pure states and different system sizes. For this, we compute the mean error on all possible one-and two-qubit Pauli observables. We see that for larger n, the required size of the ensemble, for a fixed mean error, becomes smaller, in accordance with the vanishing of the variance. For these simulations, we use the XXZ-Heisenberg model with J = 0.5, ∆ = 0.7, and closed boundary conditions.
In practice, the first term in this derivative is usually given (experimental measurement data) or can be computed efficiently classically (for density matrices based on a classical data set, as discussed in Ref. [51]). The second term, H ij ρ θ , requires the evaluation of expectation values of the parametrized Gibbs state ρ θ . This is where we apply