Quantum Error Mitigated Classical Shadows

Classical shadows enable us to learn many properties of a quantum state $\rho$ with very few measurements. However, near-term and early fault-tolerant quantum computers will only be able to prepare noisy quantum states $\rho$, and it is thus a considerable challenge to efficiently learn properties of an ideal, noise-free state $\rho_{id}$. We consider error mitigation techniques, such as Probabilistic Error Cancellation (PEC), Zero Noise Extrapolation (ZNE) and Symmetry Verification (SV), which have been developed for mitigating errors in single expected-value measurements, and generalise them for mitigating errors in classical shadows. We find that PEC is the most natural candidate and thus develop a thorough theoretical framework for PEC shadows with the following rigorous guarantees: PEC shadows are an unbiased estimator of the ideal quantum state $\rho_{id}$; the sample complexity for simultaneously predicting many linear properties of $\rho_{id}$ is identical to that of the conventional shadows approach, up to a multiplicative factor which is the sample overhead due to error mitigation. Due to the efficient post-processing of shadows, this overhead does not depend directly on the number of qubits but rather grows exponentially with the number of noisy gates. The broad set of tools introduced in this work may be instrumental in exploiting near-term and early fault-tolerant quantum computers: we demonstrate in detailed numerical simulations a range of practical applications of quantum computers that will significantly benefit from our techniques.


I. INTRODUCTION
Quantum computers are developing rapidly and can already perform certain demonstration tasks that are impossible or very difficult even for the largest supercomputers [1][2][3][4][5]. It is, however, still to be seen whether the technology can achieve true practical quantum advantage, i.e., the point when these machines can solve an otherwise impossible computational task that is of value to industry or to researchers in other fields such as quantum field theory [6], quantum gravity [7] or drug development and materials science [8][9][10][11].
Quantum computers are highly vulnerable to noise and, while quantum error correction provides a comprehensive solution, implementing it poses an extreme engineering challenge [12]. It is generally expected that some form of early practical quantum advantage, just beyond the reach of classical computing, could be achieved even with noisy quantum computers [13][14][15][16]. This prospect has motivated the development of a broad range of quantum error mitigation protocols, which has grown into an entire subfield. While these error mitigation techniques are very diverse, they collectively aim to mitigate the effect of gate errors in an expected-value measurement process, a key subroutine in quantum computing.
Another major challenge is that near-term quantum algorithms typically require an extreme number of circuit repetitions in order to suppress shot noise [17][18][19][20]. Classical shadows were introduced relatively recently [21] and represent another promising angle for achieving practical quantum advantage. The approach allows one to extract many properties of a quantum state without having to repeat the measurement many times. This is achieved by performing measurements in randomised bases. The measurement outcomes as bitstrings, along with the indices of the measurement bases, form a classical shadow, which is an efficient classical representation of the entire quantum state. Shadows have become an entire subfield and various promising applications have been proposed [22,23] that greatly benefit from the rich information one can access via shadows. For instance, in shadow spectroscopy [23], we estimate many time-dependent expected values from time-evolved quantum states, which then allows us to reveal accurate spectra through efficient classical post-processing.
The focus of the present work is to amalgamate quantum error mitigation techniques with classical shadows. It is worth noting that prior works have considered fruitful connections between quantum error mitigation and classical shadows. First, refs. [24,25] use classical shadows obtained from a noisy quantum state to perform purification-based error mitigation [26][27][28] offline, with access to only a single copy of the state but at an exponential complexity in the number of qubits. Second, the mitigation of errors in the randomised measurements has similarly been addressed in [29,30].
In the present work we address a distinct problem: previous methods have assumed that the task involves extracting information from a predetermined quantum state ρ, such as the output of a quantum device. However, we consider the practically more relevant scenario where the state ρ is generated by a noisy quantum circuit, and our aim is to mitigate the impact of errors induced by the noisy quantum gates. Our focus is thus to extract properties of an ideal state ρ_id, which would be generated by a noise-free quantum computer. This approach is a generalisation of quantum error mitigation techniques, which generally aim to extract an ideal expected value Tr[Oρ_id] when only noisy measurements Tr[Oρ] are available. In contrast, the techniques we present are not restricted to a single expected value but instead provide efficient classical representations of the ideal quantum state ρ_id through powerful classical shadows, as illustrated in Fig. 1. While we cover most classes of conventional error mitigation techniques, such as Probabilistic Error Cancellation (PEC), Zero Noise Extrapolation (ZNE) and Symmetry Verification (SV), we find that PEC is the most amenable to being used in combination with classical shadows. We thus dedicate most attention to PEC shadows, for which we establish a comprehensive theory: assuming the error model of the quantum gates has been appropriately learned, we rigorously prove that our PEC shadow is an unbiased estimator of the ideal quantum state ρ_id. We also furnish explicit, efficient classical reconstruction algorithms that enable the simultaneous prediction of linear and non-linear properties of ρ_id.

FIG. 1. In the present work we assume we only have access to a noisy quantum computer (left) such that every circuit we run (left, yellow area) gets corrupted by gate noise (unwanted red gate elements). We aim to extract properties of a state that would be prepared by an ideal quantum computer (right) with the use of powerful error mitigation techniques. We provide a rigorous theoretical framework for PEC shadows which effectively allows us to obtain a classical shadow of the ideal quantum state (noise-free shadow), from which we can predict many ideal properties in classical post-processing (middle blue area, classical computer). In our formalism we run a series of distinct quantum circuit variants (left, yellow area) that cast different classical shadows (noisy shadows) due to gate noise and due to our intentional recovery operations (red gate elements and their shadows). Under the assumption that the device's error characteristics have been appropriately learned, we can estimate the noise-free shadow (middle) via classical post-processing.
Similarly to conventional error mitigation techniques, the ability to estimate properties of the noise-free scenario comes at the cost of an increased statistical variance, which implies an increased number of circuit repetitions. We establish rigorous bounds on sample complexities and our results indicate that: (a) the sample complexity of PEC shadows is identical to that of conventional shadows up to a multiplicative factor, and (b) this multiplicative factor ∥g∥₁ has already been known in the literature as the sample overhead of the PEC approach [13]. Thus the present techniques are efficient in the sense that sample complexities are independent of the number of qubits, but of course the overhead grows exponentially with the number of noisy gates.
In numerical simulations we showcase a broad range of useful practical applications that will play a crucial role both in the near term and in the early fault-tolerance era. These examples comprise: (a) determining error-mitigated energies in variational quantum circuits, which constitutes a fundamental subroutine in near-term applications; (b) predicting many properties simultaneously in ground-state preparation to extract two-point correlators or to accelerate the training of circuit parameters [21][22][23]; (c) extracting error-mitigated local entanglement entropies of a ground state that is prepared by a noisy quantum circuit. Moreover, we discuss several other applications that will significantly benefit from our efficient amalgams of quantum error mitigation and classical shadows.
This paper is organised as follows. In the following section we first briefly review the formalism of classical shadows. Then in Section III we introduce our main result, Probabilistic Error Cancelled shadows. In Section IV we discuss how to combine further error mitigation techniques with classical shadows but conclude that PEC shadows admit the most natural formalism. Finally, in Section V we demonstrate powerful applications of our approach and then conclude in Section VI.

II. PRELIMINARIES: CLASSICAL SHADOWS
The original idea of classical shadow tomography is to apply to a quantum system of N qubits, prepared in a specific state ρ, a unitary Q_j randomly sampled from a certain ensemble Q; typically the ensemble corresponds to just rotating the individual qubits with single-qubit unitaries (Pauli basis measurements) or applying Clifford rotations. This is followed by a measurement in the computational basis, yielding a bitstring b ∈ {0, 1}^N as the outcome; this bitstring is logged along with the measurement basis, forming the index l = (j, b). The collection of these indices from many independent runs of the protocol then allows us to construct a classical shadow of the state. A classical shadow provides a description of the quantum state that can be classically efficiently stored and manipulated, bypassing the computationally expensive reconstruction of the full density matrix [21].

A. Classical shadows via idealised measurements
We mathematically describe a particular measurement outcome l = (j, b) by the positive operator E_l = p_j Q_j† |b⟩⟨b| Q_j; the probability q_l = Tr(ρE_l) of this outcome is a product of the (classical) probability p_j of choosing a unitary Q_j and the probability of observing the bitstring b in the rotated measurement basis. The shadow protocol can therefore be compactly described by a set E of N_E = 2^N |Q| positive operators given by

E := { E_l = p_j Q_j† |b⟩⟨b| Q_j }_{l=(j,b)}.    (1)

In the literature, such a collection E of positive operators E_l that sums up to the identity is referred to as a generalised measurement (positive operator-valued measure, POVM) and the E_l are called its effects [31]. It has been shown that formulating shadow tomography using POVMs brings various advantages [32]. Particularly relevant to our purpose, this formulation allows one to automatically account for errors in measurements, which include both read-out errors and gate errors in the implementation of the random unitaries Q_j [12,29,30,33,34]. This is carried out by simply adjusting the effects E_l appropriately [32,35], as we detail towards the end of this section.
Given the above generalised measurement, a single outcome l = (j, b) can be used to construct a snapshot ρ̂_l = C_E^{−1}(E_l), where the channel C_E is defined by

C_E(X) := Σ_l Tr[E_l X] E_l,    (2)

which is invertible if E spans the whole space of observables [21,32]. The snapshot can be thought of as a single-shot estimator of the prepared state ρ. In an experiment one repeats the above single-shot procedure N_s times, which produces a collection of outcomes {l_1, l_2, . . ., l_{N_s}}. Accordingly, a collection of snapshots can be constructed, which is called a classical shadow of ρ. The classical shadow allows us to obtain an unbiased estimate of the density operator in the sense E[ρ̂_l] = ρ. Crucial to the advantage of shadow tomography is that when the measurement E consists of independent measurements on individual qubits, the snapshots ρ̂_l also factorise into a tensor product over the qubits. It is therefore sufficient to store single-qubit tensoring factors of ρ̂_l, instead of the exponentially large matrix itself [21]. Functions of the density operator with appropriate locality, such as correlation functions or the Rényi entropy, can also be efficiently estimated [21]. As an example, for the experimentally friendly case of randomised noiseless Pauli basis measurements on the qubits, the snapshot corresponding to l = (j, b) is given explicitly by

ρ̂_l = ⊗_{i=1}^N ( 3 Q_j^{(i)†} |b^{(i)}⟩⟨b^{(i)}| Q_j^{(i)} − 𝟙 ).    (3)

Above, b^{(i)} is the i-th bit of the N-qubit measurement outcome bitstring b, and Q_j^{(i)} is the i-th single-qubit basis transformation in the applied N-qubit Pauli basis transformation Q_j. In the following, we focus on this practically pivotal randomised Pauli-measurement scheme. However, our general formalism can immediately be applied to other unitary ensembles such as matchgates [36], Clifford circuits [21] and beyond [37].
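As an illustration of the single-qubit version of Eq. (3), the unbiasedness E[ρ̂_l] = ρ can be checked numerically. The following minimal sketch (not from the paper; all helper names are our own) samples random Pauli-basis measurements on a single-qubit state and averages the snapshots 3 Q†|b⟩⟨b|Q − 𝟙:

```python
import numpy as np

# Basis rotations mapping the X, Y and Z eigenbases to the computational basis
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)            # measure X
HSdag = np.array([[1, -1j], [1, 1j]]) / np.sqrt(2)      # measure Y
ROTATIONS = [H, HSdag, np.eye(2)]                       # measure X, Y, Z

def single_qubit_snapshot(rho, rng):
    """One randomised Pauli measurement -> snapshot 3 Q†|b><b|Q - I."""
    Q = ROTATIONS[rng.integers(3)]
    probs = np.clip(np.real(np.diag(Q @ rho @ Q.conj().T)), 0, None)
    b = rng.choice(2, p=probs / probs.sum())            # Born-rule outcome
    ket = Q.conj().T[:, b:b+1]                          # Q†|b>
    return 3 * (ket @ ket.conj().T) - np.eye(2)

rng = np.random.default_rng(0)
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])  # some valid state
est = sum(single_qubit_snapshot(rho, rng) for _ in range(20000)) / 20000
# est converges to rho at the shot-noise rate
```

Note that each individual snapshot has trace one but is not positive semidefinite, which is exactly the sense in which snapshots are not physical density matrices.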

B. Mitigating readout errors
An advantage of having introduced classical shadows through generalised measurements (POVMs) is that it is now straightforward to incorporate readout-error mitigation techniques [32]. In particular, readout errors refer to the classical process of incorrectly assigning the labels b to the measurement outcome. While in ion-trap devices readout errors may not be significant, i.e., below error levels of gate operations [38,39], in solid-state devices these errors can be quite substantial and can be on the order of several percent [40]. We illustrate the approach by considering a simple readout-error model where entries in the bitstring b undergo random and independent bit flips (the i-th bit is flipped as 0 → 1 with probability α_i^+ and as 1 → 0 with probability α_i^−), while existing techniques for addressing correlated readout errors as well as gate errors in the shadow basis rotations are similarly applicable [13,32,33,41]. As a consequence, the idealised measurement operators |0⟩⟨0| and |1⟩⟨1| on the i-th qubit are replaced according to the readout error model, in the present case by

|0⟩⟨0| → (1 − α_i^+)|0⟩⟨0| + α_i^−|1⟩⟨1|,   |1⟩⟨1| → α_i^+|0⟩⟨0| + (1 − α_i^−)|1⟩⟨1|,

which then allows us to explicitly build the effects in Eq. (1) and invert the measurement channels in Eq. (2).
For example, we can analytically obtain a simple formula for the Pauli-measurement snapshot from Eq. (3) for the specific readout-error model with symmetric flip rates α_i^+ = α_i^− =: α_i, whereby each single-qubit tensoring factor in Eq. (3) is replaced as

3 Q_j^{(i)†} |b^{(i)}⟩⟨b^{(i)}| Q_j^{(i)} − 𝟙  →  (3/(1 − 2α_i)) ( Q_j^{(i)†} |b^{(i)}⟩⟨b^{(i)}| Q_j^{(i)} − 𝟙/2 ) + 𝟙/2.

In the more realistic case α_i^+ ≠ α_i^−, the snapshots can still be computed straightforwardly by numerically (rather than analytically) inverting the single-qubit channel in Eq. (2); model-free readout-error mitigation is also applicable [42]. In conclusion, as measurement-error mitigation techniques are completely decoupled from mitigating state-preparation errors, our mathematical theorems will quantify the sample complexity of the latter, while we demonstrate in numerical simulations that it is indeed straightforward to combine measurement-error mitigation techniques with the present approach.
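The numerical inversion of the single-qubit measurement channel is a small linear-algebra exercise. A minimal sketch, assuming the symmetric bit-flip model above (function names are illustrative, not the paper's): we build the noisy effects, write the channel of Eq. (2) as a 4×4 matrix in the normalised Pauli basis, and invert it to obtain readout-error-mitigated snapshots.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
PAULIS = [I2, X, Y, Z]
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
HSdag = np.array([[1, -1j], [1, 1j]]) / np.sqrt(2)
BASES = [H, HSdag, I2]                               # measure X, Y, Z

def noisy_effects(alpha):
    """POVM effects E_l for Pauli measurements with symmetric bit-flip rate alpha."""
    effects = []
    for Q in BASES:
        for b in (0, 1):
            M = (1 - alpha) * np.outer(I2[b], I2[b]) \
                + alpha * np.outer(I2[1 - b], I2[1 - b])
            effects.append(Q.conj().T @ M @ Q / 3)   # p_j = 1/3 per basis
    return effects

def channel_matrix(effects):
    """C(X) = sum_l Tr[E_l X] E_l in the orthonormal basis {P/sqrt(2)}."""
    C = np.zeros((4, 4))
    for E in effects:
        v = np.array([np.trace((P / np.sqrt(2)) @ E).real for P in PAULIS])
        C += np.outer(v, v)
    return C

alpha = 0.05
effects = noisy_effects(alpha)
Cinv = np.linalg.inv(channel_matrix(effects))

def snapshot(l):
    """Readout-error-mitigated snapshot C^{-1}(E_l)."""
    v = np.array([np.trace((P / np.sqrt(2)) @ effects[l]).real for P in PAULIS])
    w = Cinv @ v
    return sum(wi * P / np.sqrt(2) for wi, P in zip(w, PAULIS))
```

Averaging snapshots with the noisy outcome probabilities reproduces the state exactly, since Σ_l Tr[E_l ρ] C^{−1}(E_l) = C^{−1}(C(ρ)) = ρ.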

III. PROBABILISTIC ERROR CANCELLED SHADOWS
While PEC has been used in the literature to remove the bias in expected-value measurements [13], here we apply it to classical shadows to obtain an efficient classical representation of the entire ideal, noise-free state ρ_id. While this procedure even allows us to estimate the full density matrix ρ_id, we will focus on efficient practical applications such as simultaneously predicting many properties of ρ_id. At a technical level, PEC shadows combine two random processes: sampling circuit variants G_k, and sampling the bitstrings and basis transformations that form a shadow.

A. Probabilistic error cancellation
PEC is one of the most broadly studied error mitigation techniques [13,43,44] and has indeed been implemented experimentally [13][14][15][16]. It is performed by decomposing the channel U of an ideal unitary gate into a linear combination of noisy physical gate operations as

U = Σ_k γ_k G_k,  with real quasiprobabilities γ_k.

Negative quasiprobabilities γ_k < 0 are required to formally implement the inverse of a noise channel. Thus the above operation is nonphysical, similarly to how the inverse measurement channels of the shadow protocols are nonphysical operations. For this reason, PEC only applies the decomposition in classical post-processing, at the level of expected values, and allows us to compute ideal expected values as a linear combination of noisy ones as

Tr[O U(ρ)] = Σ_k γ_k Tr[O G_k(ρ)].

Let us first give an overview of efficient methods that have been developed to accurately identify the coefficients {γ_k} for the quasiprobability decomposition [13,[45][46][47][48]. The simplest such approach exploits the fact that noise models are approximately local: one can efficiently characterise the local noise channel of each gate and invert it classically. Recent advances allow non-local noise models of the form Λ_G = e^L to be efficiently learned for the case of sparse Pauli generators L, resulting in a trivial inverse of the channel as Λ_G^{−1} = e^{−L} [47]. Making the assumption that U is supported only on the space spanned by the noisy operations G_k, one then randomly applies circuit variants that implement the operations G_k in the inverse noise channel. Under this assumption, theoretical guarantees have been derived on the sample complexity of learning the error model [47]. Given that the noise model can be learned prior to the experiment with a constant overhead, we assume in this work that a sufficiently high-precision estimate is available [49]. Indeed, a considerable experimental challenge is posed by the possible drift of the error models over time, which may necessitate repeating the learning procedure every so often, further increasing its sample budget. Let us finally note that our scope goes beyond the mitigation of physical gate errors and the formalism developed here can be immediately applied to other scenarios, such as the following: (a) overcoming finite rotation-angle resolution, whereby the quasiprobability decomposition is known exactly [50]; (b) mitigating logical errors in early fault-tolerant devices, whereby the dominant source of noise may arise from imperfect magic-state distillation but noise characterisation and mitigation are effectively identical to the presently considered formalism [51][52][53].
We consider that an ideal quantum state ρ_id := U_circ |0⟩⟨0| is prepared by an ideal circuit of ν gates whose channel we denote as U_circ = U_ν ∘ · · · ∘ U_2 ∘ U_1. By introducing the vector notation k = (k_1, k_2, . . ., k_ν), we can compactly represent the decomposition of this circuit into noisy gate sequences as U_circ = Σ_k g_k G_k. Here the index k indexes all possible gate sequences as

G_k = G_{k_ν}^{(ν)} ∘ · · · ∘ G_{k_2}^{(2)} ∘ G_{k_1}^{(1)},   g_k = γ_{k_ν}^{(ν)} · · · γ_{k_2}^{(2)} γ_{k_1}^{(1)},    (4)

and as shown above the corresponding quasiprobabilities g_k factorise (the superscript indexes individual gates, e.g., G_{k_1}^{(1)} stands for the decomposition of U_1). We now define the quasiprobability decomposition of a quantum circuit.
Definition 1. We define the quasiprobability decomposition of an ideal circuit U_circ via the set G := {(g_k, G_k)}_k. We also define the associated probability distribution p(k) := |g_k|/∥g∥₁, where the norm factorises as ∥g∥₁ = ∏_{k=1}^ν ∥γ^{(k)}∥₁ into a product of individual gate norms.

The above quasiprobability decomposition has been used for estimating the ideal expected value of an observable O, Tr[O U_circ |0⟩⟨0|], by randomly sampling the noisy expected values sign(g_k) Tr[O G_k |0⟩⟨0|] according to the probability distribution p(k) and linearly combining them in a classical post-processing step [13,43,44]. The norm ∥γ^{(k)}∥₁ can be evaluated straightforwardly for any probabilistic error model: assuming that during the execution of the k-th gate an error happens with probability p_k, e.g., a Pauli error, we obtain the single-gate norm ∥γ^{(k)}∥₁ = 1 + 2p_k + O(p_k²) [13]. Thus, the cost of error mitigation, as the product of these individual norms from Definition 1, grows as ∥g∥₁ ∈ O(e^{2ξ}) with the expected number ξ = Σ_k p_k of errors in the full circuit, rendering the approach impractical when ξ ≫ 1 [13,43,44].
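For a concrete single-gate example of such a quasiprobability decomposition, consider single-qubit depolarising noise, whose inverse can be written exactly as a quasiprobability mixture of Pauli conjugations. The sketch below (our own illustration, not the paper's code) computes the quasiprobabilities γ_k, the overhead ∥γ∥₁ and the sampling distribution p(k), and verifies that the decomposition undoes the noise:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
OPS = [np.eye(2, dtype=complex), X, Y, Z]

def depolarising(rho, p):
    """Depolarising channel: identity w.p. 1-p, a uniform Pauli error w.p. p."""
    return (1 - p) * rho + (p / 3) * sum(P @ rho @ P for P in OPS[1:])

def inverse_quasiprobs(p):
    """gamma_k with N^{-1}(rho) = sum_k gamma_k P_k rho P_k for depolarising N."""
    lam = 1 - 4 * p / 3                  # shrinking factor of the Bloch vector
    g_pauli = (1 - 1 / lam) / 4          # negative for any p > 0
    return [1 / lam + g_pauli, g_pauli, g_pauli, g_pauli]

p = 0.05
gammas = inverse_quasiprobs(p)
norm1 = sum(abs(g) for g in gammas)      # PEC sampling overhead ||gamma||_1 > 1
probs = [abs(g) / norm1 for g in gammas] # sampling distribution p(k)

# exact check: the quasiprobability sum undoes the noise channel
rho = np.array([[0.8, 0.1 + 0.2j], [0.1 - 0.2j, 0.2]])
recovered = sum(g * P @ depolarising(rho, p) @ P for g, P in zip(gammas, OPS))
```

The coefficients sum to one (trace preservation) while ∥γ∥₁ > 1, which is precisely the source of the sampling overhead in Definition 1.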

B. Details of the protocol
We start by applying the PEC protocol in a more general setting such that the quasiprobability decomposition allows us to obtain an unbiased estimator of the full density matrix.
Lemma 1. Given a quasiprobability decomposition G from Definition 1, by sampling the noisy circuits G_k according to the probability distribution p(k) we obtain an unbiased estimator of the ideal density matrix ρ_id := U_circ |0⟩⟨0| as

ρ_id = ∥g∥₁ E_{k∼p(k)}[ sign(g_k) G_k |0⟩⟨0| ].

The above estimator has a clear operational meaning: (1) choose a multi-index k randomly according to the probability distribution p(k) and run the noisy quantum circuit G_k; (2) the output state G_k |0⟩⟨0| is a density matrix that we multiply by sign(g_k) and by the norm ∥g∥₁; (3) formally, the mean of these matrices is an estimate of the ideal density matrix ρ_id.
Regrettably, the above protocol is purely formal, as multiplication by negative quasiprobabilities is nonphysical and could only be achieved in post-processing, e.g., after fully reconstructing the density matrix. We thus exploit classical shadows as a powerful tool for obtaining an efficient classical description of the states, which can then naturally be assigned negative quasiprobabilities in classical post-processing. Indeed, snapshots are not physical density matrices either, as is apparent in Eq. (3). We now state our protocol that serves as an unbiased estimator of the ideal state.
Theorem 1 (PEC shadows). Given a quasiprobability decomposition G of the ideal circuit U_circ from Definition 1, and a classical shadow protocol with the POVM measurement E from Eq. (1), we define PEC shadows as the set H := {(g_k, G_k, E_l)}_{k,l} and define the corresponding PEC snapshot as

ρ̂_{k,l} := ∥g∥₁ sign(g_k) C_E^{−1}(E_l),    (5)

where the outcome l is obtained by measuring E on the output of the sampled noisy circuit G_k. We will often use the notation ρ̂_id to abbreviate ρ̂_{k,l} as it is an unbiased estimator of the ideal density matrix, E[ρ̂_id] = ρ_id. Above, the averaging E[•] happens not only over the effects E_l indexed by l (all basis transformations and all measurement outcomes), but additionally over all circuit variants indexed by k. The reason is that the measurement E = {E_l}_l is not performed on a fixed input density matrix ρ as in conventional shadows, but rather on the quasiprobability decomposition of the ideal state via ρ_id ∝ G_k |0⟩⟨0|. Let us now summarise the resulting experimental protocol.
• Choose randomly a multi-index k according to the probabilities p(k) and store sign(g_k).
• Choose uniformly at random a unitary rotation Q_j ∈ Q and store its index j.
• Execute on a quantum computer the gate sequence G_k followed by the unitary rotation Q_j, perform a measurement in the standard basis and register its outcome b.
• Repeat the procedure and collect N_s classical snapshots to build a classical shadow S(ρ_id, N_s) of the ideal state.

The classical dataset S(ρ_id, N_s) can then be classically post-processed offline and we detail explicit algorithms for predicting local properties in Section III D.
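The steps above can be sketched end to end in a density-matrix simulation. The following minimal single-qubit example is our own illustration, assuming an ideal X gate followed by depolarising noise of known rate p; it samples recovery Paulis with the quasiprobability distribution, collects randomised Pauli-basis snapshots, and recovers the ideal expected value Tr[Z ρ_id] = −1:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)
PAULIS = [I2, X, Y, Z]
BASES = [np.array([[1, 1], [1, -1]]) / np.sqrt(2),          # X basis
         np.array([[1, -1j], [1, 1j]]) / np.sqrt(2),        # Y basis
         I2]                                                # Z basis

p = 0.1                                    # depolarising rate after the gate
lam = 1 - 4 * p / 3
g_pauli = (1 - 1 / lam) / 4
gammas = np.array([1 / lam + g_pauli] + [g_pauli] * 3)      # inverse-noise quasiprobs
norm1 = np.abs(gammas).sum()
pk = np.abs(gammas) / norm1

def noisy_state(recovery):
    """|0> -> ideal X gate -> depolarising noise -> sampled recovery Pauli."""
    rho = X @ np.diag([1.0, 0]).astype(complex) @ X
    rho = (1 - p) * rho + (p / 3) * sum(P @ rho @ P for P in PAULIS[1:])
    return recovery @ rho @ recovery

rng = np.random.default_rng(1)
estimates = []
for _ in range(20000):
    k = rng.choice(4, p=pk)                          # sample circuit variant
    rho = noisy_state(PAULIS[k])
    Q = BASES[rng.integers(3)]                       # random Pauli basis
    prob = np.clip(np.real(np.diag(Q @ rho @ Q.conj().T)), 0, None)
    b = rng.choice(2, p=prob / prob.sum())
    ket = Q.conj().T[:, b:b+1]
    snap = 3 * (ket @ ket.conj().T) - I2             # conventional snapshot
    pec_snap = norm1 * np.sign(gammas[k]) * snap     # PEC snapshot, Eq. (5)
    estimates.append(np.trace(Z @ pec_snap).real)
# mean(estimates) converges to Tr[Z rho_id] = -1 for the ideal X gate
```

Note that the noisy circuit alone would give the biased value (1 − 4p/3)·(−1); the sign and norm factors of the PEC snapshot remove this bias at the cost of a larger spread of the single-shot estimates.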
Note that PEC shadows produce a distribution of snapshots that is different from directly applying conventional shadows to a noise-free state ρ_id, albeit with an identical mean. The reason is that each circuit variant G_k in Eq. (5) yields a different distribution of classical snapshots. For example, in the next section we prove bounds on variances of PEC shadows and find that they are indeed increased compared to conventional shadows applied directly to ρ_id.

C. Rigorous performance guarantees
We first consider the pivotal practical application of predicting error-mitigated expected values of observables O via the estimator ô = Tr[O ρ̂_id]. A key observation is that in error mitigation techniques the ability to predict noise-free expected values comes at the cost of an increased statistical variance, which implies an increased number of circuit repetitions. We now bound the variance of any operator's expected value.
Lemma 2 (variance of linear properties). Given an observable O and the PEC snapshot ρ̂_id from Theorem 1, the variance of ô = Tr[O ρ̂_id] can be upper bounded as

Var[ô] ≤ ∥g∥₁² ∥O∥²_E,

where ∥•∥²_E is the shadow norm of the observable O as defined in Lemma 3. When O is a q-local Pauli string and we use Pauli basis measurements then ∥O∥²_E = 3^q, as we detail in Lemma 3.
We explain in Appendix D that we can account for the cost of readout-error mitigation via the above shadow norm, which becomes 3^q (1 − 2α)^{−2q} when considering a simple readout-error model with flip probability at most α. Observe that the above variance depends on two factors: the first is the squared shadow norm ∥O∥²_E, which determines the sample complexity of conventional shadows [21]; the second is the multiplicative term ∥g∥₁², which accounts for the well-known measurement overhead associated with the conventional PEC protocol [13,43,44]. We defer further discussion to Appendix A 2, where we also explain how the shadow norm depends on the unitary ensemble Q: while we only state explicitly the shadow norm for the practically most important ensemble of Pauli basis measurements, bounds for other ensembles are immediately available in the literature [21,36,37]. Furthermore, we explain in Appendix B 1 that the above bound is expected to be pessimistic, with an even more significantly overestimated constant prefactor than in conventional shadows.
Following the approach of [21], we use concentration properties of the median-of-means estimator to derive rigorous sample complexities: for the simultaneous prediction of many observables O_1, . . ., O_M we exponentially suppress statistical outliers by splitting the PEC shadows S(ρ_id, N_s) into independent batches and then computing a median of the means, as we detail in Appendix B 2. The resulting bounds depend on two performance metrics: the accuracy ϵ and the success probability δ.
Theorem 2 (informal summary). Given the PEC shadows S(ρ_id, N_s), a number of samples

N_s ∈ O( ∥g∥₁² log(M/δ) max_k ∥O_k∥²_E / ϵ² )

suffices to simultaneously predict all expected values Tr[O_1 ρ_id], . . ., Tr[O_M ρ_id] up to accuracy ϵ with success probability at least 1 − δ, where we use the largest shadow norm max_k ∥O_k∥²_E. Refer to Theorem 4 for a formal statement of this theorem.
Finally, we consider predicting non-linear properties of the state of the form Tr[O(ρ_id)^m]. Following ref. [21], we use the fact that a polynomial function of the quantum state can be written as a linear function of tensor products of the state, for example,

Tr[(ρ_id)²] = Tr[ S (ρ_id ⊗ ρ_id) ],

where the SWAP operator S swaps the two copies. In Appendix B 3 we detail our construction using U-statistics to derive unbiased estimators in terms of the classical snapshots, e.g., for m = 2 we select all distinct pairs of snapshots (ρ̂_id)_i ⊗ (ρ̂_id)_j with i ≠ j. We can bound the variance of any non-linear property as follows.
Theorem 3 (variance of nonlinear properties). Given our PEC snapshots ρ̂_id from Theorem 1, we can estimate polynomial properties of degree m of the ideal state ρ_id via U-statistics of tensor products of all distinct snapshots. The number of samples required to predict the non-linear property scales as N_s ∈ O(∥g∥₁^{2m}/ϵ²) for a desired accuracy ϵ. Refer to Appendix B 3 for details.

FIG. 2. Illustration of the light cone of an observable O, which is represented by the measurement apparatus and only acts non-trivially on the second (from top) qubit. The orange area indicates the qubits which are contained in the light cone I of the observable with respect to the ideal quantum circuit (orange boxes) U_3 U_2 U_1. To simplify derivations we assume the gate noise channels N_k are local (light green boxes) such that they are contained within the light cone I, but our results can be extended to non-local models via [54].
One can similarly apply a median-of-means estimator to enable the simultaneous prediction of many non-linear properties: we detail an explicit protocol in the next section for simultaneously estimating many local Rényi entropies. Note that the measurement overhead ∥g∥₁^{2m} grows with the 2m-th power of the quasiprobability norm, consistent with our effective construction of m copies of the original noisy circuit, which leads to an effective m-fold increase in the number of noisy gates.

D. Classical post-processing algorithms
In this section we summarise reconstruction algorithms for the practically pivotal scenario of Pauli basis measurements from Section II.
Algorithm 1 (local Pauli strings). The expected value of a q-local Pauli observable P in a PEC snapshot can be calculated analytically as

Tr[P ρ̂_{k,l}] = ∥g∥₁ sign(g_k) 3^q f(b, Q_j).

The reconstruction algorithm iterates over all snapshots in the shadow S(ρ_id, N_s) and calculates the median of means of the above expression using a number of batches provided by the user. Here f(b, Q_j) ∈ {±1, 0} results in 0 if the measurement bases in Q_j are incompatible with P, and ±1 if the measurement bases are compatible with P, where the sign is determined by the bitstring b. The algorithm has runtime O(qN_s).
We defer the detailed derivation to Appendix C 1. Note that the above error-mitigated reconstruction algorithm deviates from that of [21], as the individual snapshot outcomes are multiplied by the norm ∥g∥₁ and the signs of the quasiprobabilities sign(g_k).
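The median-of-means step used by the reconstruction algorithm is itself a few lines. A minimal sketch (our own helper, not the paper's code) that also illustrates its robustness against the heavy-tailed single-snapshot estimates produced by PEC shadows:

```python
import numpy as np

def median_of_means(values, n_batches):
    """Split single-snapshot estimates into batches and take the median of batch means."""
    batches = np.array_split(np.asarray(values, dtype=float), n_batches)
    return float(np.median([batch.mean() for batch in batches]))

# illustration: a few heavy outliers barely shift the median of means,
# while they dominate the plain empirical mean
rng = np.random.default_rng(0)
values = rng.normal(0.0, 1.0, 1000)       # estimates of a quantity whose true value is 0
values[:5] += 1000.0                      # a few corrupted estimates
mom_est = median_of_means(values, n_batches=10)
```

Here the plain mean is pulled to roughly 5 by the outliers, while the median of means stays near the true value 0, which is exactly the exponential outlier suppression invoked in Theorem 2.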
Since we reconstruct q-local Pauli observables, we can significantly reduce the sample variance via light-cone arguments [54,55]. In Fig. 2 we illustrate the light cone that an observable creates with respect to the ideal unitary circuit U_circ. To simplify the following arguments we assume local noise models to guarantee that the same light cone is valid for all gate sequences G_k; however, it is straightforward to extend the arguments to non-local noise following [54].
We observe that for each gate that is not within the light cone of the observable P we can "turn off" PEC, thereby not wasting our measurement budget on mitigating noisy gates that do not affect our observable.

Algorithm 2 (light cones). Given a q-local Pauli string P, we define the set of indices of all gates in the light cone of the observable as I := {l | U_l is in the light cone of P}. We then simply use Algorithm 1 with a modified set of quasiprobabilities from Definition 1, whereby the decomposition γ^{(l)} of every gate outside the light cone, l ∉ I, is replaced by the trivial (unmitigated) one with ∥γ^{(l)}∥₁ = 1, such that ∥g∥₁ = ∏_{l∈I} ∥γ^{(l)}∥₁. The algorithm has the same asymptotic runtime O(qN_s) as Algorithm 1 and only incurs a negligible pre-processing time to determine the index set I specifically for each P.
Refer to Appendix C 2 for a derivation. The measurement cost ∥g∥₁ is thus determined by the number of gates in the light cone of P rather than by the total number of gates ν. Imagine, for example, noisy quantum gates with ∥γ^{(l)}∥₁ = 1 + p; the measurement cost is then determined by (1 + p)^{|I|} as opposed to the worst case ∥g∥₁ = (1 + p)^ν, where ν is the total number of noisy gates, as detailed in Ref. [54]. A significant advantage of this procedure is that it does not require one to modify the experimental protocol, i.e., the noise in all gates can be mitigated in the shadows.
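The index set I can be computed by a simple backward sweep over the circuit layers. The following sketch is our own illustration (the (layer, qubits) circuit representation is illustrative, not the paper's notation): starting from the support of the observable, we sweep from the last layer backwards and add every gate that overlaps the current support.

```python
def light_cone_gates(gates, n_layers, obs_qubits):
    """Indices of gates inside the backward light cone of an observable.

    gates: list of (layer, qubits) pairs; layers are applied in increasing order.
    """
    support = set(obs_qubits)
    in_cone = set()
    for layer in range(n_layers - 1, -1, -1):        # sweep from the last layer back
        new_support = set(support)
        for idx, (l, qubits) in enumerate(gates):
            if l == layer and support & set(qubits):
                in_cone.add(idx)                     # gate can influence the observable
                new_support |= set(qubits)           # and it enlarges the cone
        support = new_support                        # same-layer gates act in parallel
    return in_cone

# brickwork circuit on 6 qubits, observable supported on qubit 2
gates = [(0, (0, 1)), (0, (2, 3)), (0, (4, 5)),
         (1, (1, 2)), (1, (3, 4)),
         (2, (0, 1)), (2, (2, 3)), (2, (4, 5))]
cone = light_cone_gates(gates, n_layers=3, obs_qubits={2})
```

In this small example only the two last-layer gates on distant qubits fall outside the cone; for shallow circuits and very local observables the fraction of gates that can be left unmitigated, and hence the reduction in ∥g∥₁, grows quickly.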
Finally, we consider estimating Rényi entropies via the purities Tr[ρ_Q²] as R_Q := − log Tr[ρ_Q²], where ρ_Q is the reduced density matrix of the subsystem Q.

Algorithm 3 (local purities). Given a subsystem as the set of qubits Q = {q_1, ..., q_m}, an unbiased estimator for the respective purity is obtained as

Tr[ρ_Q²] ≈ [N_s(N_s − 1)]^{−1} Σ_{i≠j} ∥g∥₁² sign(g_{k_i}) sign(g_{k_j}) f(i, j, Q).

Here i and j abbreviate indices of the snapshots, e.g., i = (k, l), and i ≠ j. The algorithm iterates over all distinct pairs of snapshots in the shadow S(ρ_id, N_s) and calculates the median of means of the above expression. The factors f(i, j, Q) depend only on whether the measurement bases and outcome bitstrings are identical within the subsystem Q. The algorithm has a runtime O(|Q|N_s²). We defer the detailed derivation to Appendix C 1. Note that the runtime is linear in the subsystem size and quadratic in the number of shots. For sufficiently small subsystems and large numbers of shots it might be preferable to use the exponentially scaling O(4^{|Q|} N_s) algorithm of [21].
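The pair-averaging structure of the purity estimator can be illustrated on a single qubit with conventional (noiseless) snapshots, for which the sign and norm factors are trivially one; this is our own minimal sketch, not the paper's implementation. It uses the identity Σ_{i≠j} Tr[s_i s_j] = Tr[S²] − Σ_i Tr[s_i²] with S = Σ_i s_i to avoid an explicit double loop:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
BASES = [np.array([[1, 1], [1, -1]]) / np.sqrt(2),      # X basis
         np.array([[1, -1j], [1, 1j]]) / np.sqrt(2),    # Y basis
         I2]                                            # Z basis

def pauli_snapshot(rho, rng):
    """Conventional single-qubit snapshot 3 Q†|b><b|Q - I."""
    Q = BASES[rng.integers(3)]
    prob = np.clip(np.real(np.diag(Q @ rho @ Q.conj().T)), 0, None)
    b = rng.choice(2, p=prob / prob.sum())
    ket = Q.conj().T[:, b:b+1]
    return 3 * (ket @ ket.conj().T) - I2

rng = np.random.default_rng(2)
rho = np.array([[0.5, 0.5], [0.5, 0.5]], complex)       # |+><+|, purity 1
snaps = [pauli_snapshot(rho, rng) for _ in range(4000)]

# U-statistic over all distinct pairs: sum_{i != j} Tr[s_i s_j] / (N(N-1))
N = len(snaps)
S = sum(snaps)
purity = (np.trace(S @ S).real
          - sum(np.trace(s @ s).real for s in snaps)) / (N * (N - 1))
```

Each pair i ≠ j contributes an unbiased estimate Tr[ρ̂_i ρ̂_j] of Tr[ρ²]; for PEC shadows one would additionally carry the ∥g∥₁² sign(g_{k_i}) sign(g_{k_j}) prefactor of Algorithm 3.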

IV. FURTHER ERROR MITIGATION TECHNIQUES

A. Error extrapolated shadows
The key idea behind zero-noise extrapolation resides in the possibility of increasing the noise in the circuit and extrapolating expected values back to the case of zero noise. The approach is intuitive to use and requires fewer resources than PEC, but yields a biased estimator. A non-trivial aspect, however, is choosing the correct model function for the extrapolation, which has been extensively discussed in the literature [13,14,44]; typical models include a linear function, an exponential function or a linear combination of multiple exponentials.
We consider extrapolation as a means for mitigating errors in properties extracted from classical shadows.
The key ingredient we require is the ability to generate a collection of shadows at different noise strengths S(ρ_p0, N_s), ..., S(ρ_pn, N_s) such that p_k ≥ p_0, where p_0 is the device's lowest possible noise strength. These shadows enable us to extract the expected values f_m(p) = Tr[O_m ρ_p] at a given noise level p. By fitting a suitable model function f̂_m(p), e.g., a linear model, to this dataset we can approximate ideal properties of the state via the extrapolated zero-noise limit f̂_m(p → 0). While we could certainly leverage existing techniques for physically increasing noise rates in a circuit to obtain S(ρ_p, N_s) [13,14], we can also exploit the power and flexibility of the previously derived PEC shadow approach: instead of considering the quasiprobability representation of the ideal circuit in Definition 1, we can rather decompose the noise-boosted circuits as G_circ(p) = Σ_k g_k(p) G_k with non-negative probabilities g_k(p). For example, in the case of local depolarising noise the circuit variants G_k are simply obtained by randomly inserting Pauli X, Y or Z operations after each noisy gate with probability p − p_0. Furthermore, Lindblad-Pauli learning directly gives access to the continuous set of circuits G_circ(p) [47].
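The extrapolation step itself is a simple curve fit. The following sketch assumes a linear model f(p) = a + b·p (one of the typical choices named above) and toy input data; the noise levels and estimates would in practice come from the shadows S(ρ_{p_k}, N_s).

```python
import numpy as np

# Hedged sketch of zero-noise extrapolation applied to shadow-derived expectation values.
def zne_linear(noise_levels, estimates):
    """Fit f(p) = a + b*p and return the zero-noise intercept a."""
    b, a = np.polyfit(noise_levels, estimates, deg=1)   # coefficients, highest degree first
    return a

p = np.array([0.01, 0.02, 0.03])
f = 1.0 - 5.0 * p                  # toy linear decay of a Pauli expectation value
zne_linear(p, f)                   # close to the ideal value 1.0
```

Other model functions (exponential, multi-exponential) would replace the `polyfit` call, at the cost of a non-linear fit.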
Let us now state a corollary to Theorem 1 that allows us to obtain the shadows of error-boosted states ρ_p := G_circ(p) ρ_ref.
Corollary 1 (error-boosted shadows). We consider the parametric quasiprobability decomposition G as noise-boosted circuits G_circ(p) with p ≥ p_0. It follows that the resulting PEC shadows yield an unbiased estimator ρ̂_p of the noise-boosted density matrix ρ_p. A significant advantage in boosting noise via p ≥ p_0 rather than reducing it is that now every quasiprobability is non-negative, g_k(p) ≥ 0, and thus we do not incur a measurement overhead in Theorem 2 since ∥g(p)∥_1 = 1. Nevertheless, the extrapolated value indeed suffers from an increased variance, which implies an increased number of samples; details can be found in the literature [13].
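Sampling a noise-boosted circuit variant G_k is straightforward for local depolarising noise. The sketch below uses an assumed convention in which each of X, Y, Z is inserted independently with probability p − p_0 after every noisy gate; the gate representation as (name, qubit) tuples is purely illustrative.

```python
import random

# Hedged sketch: sampling a noise-boosted circuit variant G_k for local depolarising noise.
# After each noisy gate, each Pauli X, Y, Z is inserted on the gate's qubit with
# probability p - p0 (assumed convention; the weights g_k(p) are then non-negative).
def boost_circuit(gates, p, p0, rng=random.Random(0)):
    variant = []
    for gate, qubit in gates:
        variant.append((gate, qubit))
        for pauli in ("X", "Y", "Z"):
            if rng.random() < p - p0:
                variant.append((pauli, qubit))
    return variant
```

Running the shadow protocol on such sampled variants then produces the error-boosted shadows S(ρ_p, N_s) without any physical modification of the noise.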
Note that the above scheme can be applied beyond the estimation of expectation values. For instance, one can in principle use shadows to reconstruct partial density matrices ρ̂_p at different noise strengths p and apply ZNE to individual matrix entries. However, note that ZNE might require different kinds of model functions f(p) for different properties, e.g., non-linear models for predicting non-linear properties of the state. In contrast, the great advantage of PEC shadows is that they provide an unbiased estimator for the entire quantum state.

B. Symmetry verified shadows
Symmetry verification is another leading quantum error mitigation technique [56,57]; it exploits the fact that the ideal states to be prepared ρ_id are often pure states that obey certain problem-specific symmetry group operations described by S ∈ S. The fact that the ideal state is symmetric then implies that it "lives in" the subspace defined by the projection operator Π_S, which satisfies Π_S^2 = Π_S. Given a noisy state ρ, one might be able to measure the above symmetries (in fact their generators are sufficient) via, e.g., Hadamard-test circuits, and retain only circuit runs that produce the correct symmetry outcomes [13,58]. Such post-selection projects the noisy state back into this symmetry subspace, producing the effective output state ρ_sym = Π_S ρ Π_S / Tr(Π_S ρ). We can apply conventional shadow tomography to this symmetry-verified state ρ_sym, thereby effectively obtaining error mitigated shadows, i.e., an unbiased estimator of ρ_sym. The sampling overhead of this post-selection technique is Tr(Π_S ρ)^−1, the inverse of the fraction of circuit runs that pass the symmetry verification process.
Instead of post-selection, we can also perform symmetry verification at the post-processing stage. Suppose we are interested in the expectation value of the ideal state with respect to the target observable O; the target expectation value can be written as the ratio Tr[O Π_S ρ Π_S] / Tr[Π_S ρ], which expands into terms of the form Tr[S O S' ρ] and Tr[S ρ]. Conventional shadow tomography works well in practice for estimating S O S' and S for all S, S' ∈ S when the symmetries are sufficiently local, i.e., they are supported on at most weight-s Pauli operators. Then, given a Pauli observable O of weight at most q, the effective observable S O S' is at most of weight 2s+q. However, the sample complexity of conventional shadows with Pauli measurements grows exponentially with the weight of the Pauli string and it is thus crucial that the total weight 2s+q be reasonably small.
For example, a typically used symmetry in fermionic simulation is the fermionic particle number parity, which is, however, usually a high-weight operator for standard encodings such as the Jordan-Wigner encoding. Nevertheless, one can use encodings that come with inherent local symmetry generators, like Majorana loop encodings [59], or even implement the circuit using some small quantum codes with local stabilisers [60]. However, even if these symmetry generators are local, the number of generators scales with the number of qubits and thus some of the symmetries they generate are still high-weight. Hence, in order to use shadow techniques efficiently, we can apply verification using a constant number of local symmetry generators, such that the highest weight of any symmetry that can be generated is upper-bounded by a constant.
We also note that the sampling cost can be reduced when the target observable O commutes with the symmetry projector Π_S, which is often the case in typical applications. In such a scenario, Π_S O Π_S = O Π_S and thus the target value reduces to Tr[O Π_S ρ] / Tr[Π_S ρ].
This way the effective observables we need to estimate from shadows are SO and S for all S ∈ S which have a reduced weight s+q compared to the previous 2s+q.
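The post-processing combination of shadow estimates can be sketched in a few lines of Python. We assume the commuting case above, an assumed projector convention Π_S = (1/|S|) Σ_S S (so the 1/|S| factors cancel in the ratio), and purely hypothetical numerical inputs standing in for shadow estimates of Tr[S O ρ] and Tr[S ρ].

```python
# Hedged sketch: symmetry verification in post-processing when [O, Pi_S] = 0.
# est_OS[i] ~ Tr[S_i O rho] and est_S[i] ~ Tr[S_i rho] are assumed to be estimates
# obtained from one set of classical shadows; the 1/|S| normalisation cancels.
def sym_verified_expectation(est_OS, est_S):
    """<O>_sym ~ (sum_S Tr[S O rho]) / (sum_S Tr[S rho])."""
    return sum(est_OS) / sum(est_S)

# Toy numbers for a two-element symmetry group {I, S} (hypothetical estimates):
sym_verified_expectation([0.8, 0.6], [1.0, 0.9])
```

Since all effective observables S O and S are estimated from the same shadow, no additional circuit runs are needed beyond the conventional shadow acquisition.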

V. APPLICATIONS
In this section we showcase how our approach can effectively extend the reach of noisy quantum computers and explore its practical applications. Recall that fully fault-tolerant quantum computers will enable executions of (in principle) arbitrarily deep circuits, thus allowing users to extract expected values via, e.g., amplitude estimation, whose sample complexity O(M/ϵ) is superior in precision but comes at the price of a circuit depth proportional to 1/ϵ. In contrast, we focus on application areas where these coherent techniques are prohibitive due to circuit depth limitations, e.g., due to non-negligible logical error rates expected in early fault-tolerant devices. The advantage of classical shadows is that they only require an increase of circuit depth that is independent of the state preparation circuit, and this increase is negligible for Pauli shadows. Since the present approach has a sample complexity O(log(M)/ϵ^2), it is particularly well suited for applications where the aim is to extract a large number M of properties.
For instance, noisy quantum computers in either the late NISQ era or in the early fault-tolerance era will enable us to simulate the time evolution of quantum states or to prepare ground or eigenstates [17-19, 22, 23, 61, 62]. Our approach can then be used to accurately and efficiently extract a large number of properties of these states.

FIG. 3. (left) A noisy ansatz circuit prepares the ground state of Eq. (10), but our aim is to learn properties of the noise-free state. (middle) Energy estimation errors for different noise strengths with conventional shadows (dashed blue, dashed red) and with PEC shadows including readout error mitigation (solid blue, solid red). Bias (grey solid lines) is introduced when the ground state energy Tr(ρH) is directly estimated from the noisy quantum state ρ. Error mitigated shadows are unbiased as they estimate Tr(ρ_id H). Increasing the circuit error rate ξ (blue vs. red) increases the bias in standard shadows (dashed blue vs. dashed red) and increases the variance of the error mitigated shadows (solid blue vs. solid red). Each data point is an average over 10^4 experiments with a fixed shot budget N_s. (right) Error in simultaneously estimating all 3-local Pauli strings without (blue) and with (red) PEC and readout error mitigation; only the 200 observables of the highest estimation error are shown and a circuit error rate ξ ≈ 0.72 is assumed. Errors are significantly below our rigorous bounds from Theorem 2 (which also take into account the overhead due to readout errors) for PEC shadows, but the errors for conventional shadows can be above their respective bounds from [21] due to bias and readout errors (right end of blue).

A. Ground-state preparation
We first consider a spin-ring Hamiltonian with coupling J = 0.3 and on-site interaction strengths uniformly randomly generated in the range −1 ≤ ω_k ≤ 1, where T is a vector of single-qubit Pauli matrices. This spin problem is relevant in condensed-matter physics for understanding many-body localisation [63] but is challenging to simulate classically for large N [64,65]. A broad range of techniques are available in the literature for finding eigenstates of such quantum Hamiltonians using near-term or early fault-tolerant quantum computers [17-19, 22]. Here we prepare the ground state of this model using a variational Hamiltonian ansatz in Fig. 3 (left) of l = 5 layers on 12 qubits and, as we detail in Appendix D, we assume a biased Pauli noise model that can be learned efficiently using techniques from [47], together with a readout error model from Section II B.

Ground state energy with PEC: Fig. 3 (middle) shows the error in the ground state energy estimated using conventional shadows (dashed blue and dashed red) and PEC shadows (solid blue and solid red) for an increasing number of shots N_s.
Our ansatz circuit would ideally prepare the ground state ρ_id, but due to gate noise we actually prepare the noisy state ρ. Thus, conventional shadows (dashed blue, dashed red) converge to a plateau corresponding to the biased energy Tr(ρH) (solid grey). This bias is significantly increased as we increase the circuit error rate from ξ ≈ 0.15 to ξ ≈ 0.26 (dashed blue vs. dashed red), where ξ = Σ_k p_k is the expected number of errors in the full circuit as explained in Section III. In contrast, PEC shadows that include measurement-error mitigation (solid blue and solid red) converge to the true energy Tr(ρ_id H) with the standard shot-noise scaling O(1/√N_s).
Local properties with PEC: Besides Hamiltonian energy estimation, which is one of the typical subroutines in quantum computing, there is also significant value in simultaneously determining the expectation values of many local observables. For example, the rich information from classical shadows can be used to significantly improve parameter training or to directly estimate Hamiltonian energy gaps through efficient classical post-processing [22,23]. In Fig. 3 (right), we plot errors when simultaneously estimating all 3-local Pauli operators for an increasing number of shots N_s. Fig. 3 (red) shows that the errors in PEC shadows are always significantly below the theoretical bounds (black line) from Theorem 2, confirming the looseness of the bounds (assuming success probability δ = 10^−3 and M = 3^3 (12 choose 3) = 5940 observables). Fig. 3 (blue) shows that the errors in conventional shadows are below their bounds (with ∥g∥_1 = 1) only for a small number of shots but then asymptotically reach a plateau due to circuit noise.
Local properties with extrapolation: We now consider the same task of simultaneously estimating expectation values of Pauli operators, but we use error extrapolation. Here we start by generating shadows S(ρ_p1, N_s), ..., S(ρ_pn, N_s) at different noise strengths, which we use to compute the noisy Pauli expectation values. Fig. 4 shows 10 examples of expected values (crosses) as a function of noise strength and the respective linear models we fit (dashed lines). The intercept of the fitted model (dashed lines) is our estimate of the exact expected value (disks) and is indeed reasonably close in the example. While ZNE has been very effective and typically has a lower measurement overhead than PEC, it is generally biased.

B. Error mitigated estimation of entanglement entropies
Finally, we consider an application for which classical shadows are a primary enabler but for which error mitigation techniques have been less explored [13]. As opposed to studying entanglement properties or verifying the presence thereof in mixed quantum states [66-69], here our primary goal is to extend the reach of noisy quantum computers: we aim to study entanglement properties of ideally pure states which are prepared by quantum algorithms, such as phase estimation or VQE. For example, near-term quantum computers will enable us to prepare eigenstates [17-19, 22] of quantum Hamiltonians, and error mitigated entanglement measures can be used for, e.g., characterising phase transitions. Similarly, one could simulate the time evolution of a collision of two molecules with an early fault-tolerant quantum computer and investigate how entanglement builds up across the individual subsystems. Furthermore, efficiently characterising many local correlations in a state can be used to train DFT models for accurate classical simulations [70].
We consider the Heisenberg chain of Eq. (11) with uniformly random couplings −1 ≤ J_k ≤ 1 and prepare its ground state with a variational Hamiltonian ansatz of l = 8 layers on 12 qubits. This system was used in Ref. [21] to illustrate the power of classical shadows in predicting entanglement entropies. However, there the ground state was approximated by a set of noise-free singlet states [71,72], whereas we assume a noisy quantum computer is used for state preparation. We use PEC shadows to extract purities Tr(ρ_Q^2) for all single- and two-qubit subsystems Q; these purities then define Rényi entropies as R_Q := −log Tr(ρ_Q^2). In Fig. 5 (left), we plot the exact purities in the noiseless case; disjoint blocks involving two qubits confirm that the ground state could be approximated by a tensor product of noise-free singlet states.
Fig. 5 (middle) shows the errors in estimating local purities using shadows of size N_s = 10^5 for a circuit error rate ξ = 0.6. Even for this moderate error rate, conventional shadows are significantly impacted by imperfections and result in errors as large as 0.27; indeed, for an increasing noise rate all purities converge to the constant value 1/d, where d is the subsystem dimension. In contrast, PEC shadows drastically improve the accuracy in Fig. 5 (right): the largest error is approximately 7 × 10^−2 at a number of samples N_s = 10^5.

C. Further applications
The techniques presented in this work enable us to approximate an unbiased estimator of an ideal, noise-free state ρ_id, which can be enabling for a broad range of further practical applications that we defer to follow-up works. For example, Ref. [21] proposed that classical shadows with randomised Clifford measurements can be used to predict fidelities, such as the fidelity of ρ with respect to a known state ψ. One can imagine applications where the fidelity ⟨ψ|ρ|ψ⟩ is not a relevant indicator due to the impact of noise on ρ and one rather aims to predict ⟨ψ|ρ_id|ψ⟩, e.g., to quantify how well a variational quantum circuit or phase estimation can prepare a known ground state, thereby verifying a circuit structure under the presence of gate noise.
FIG. 5. A noisy variational Hamiltonian ansatz is used to prepare the ground state of Eq. (11), whose ideal, noise-free Rényi entropies R_Q we can learn with PEC shadows. We plot purities Tr(ρ_Q^2) as a proxy for R_Q := −log Tr(ρ_Q^2); axes show qubit indices i and j. (left) Purity heat map in the noiseless case and infinite shot limit. An increasing value indicates that the subsystem Q is less entangled with the remaining qubits. (middle-right) Absolute error in the purities due to gate noise for a circuit error rate ξ = 0.6 and due to finite repetition using N_s = 10^5. (middle) Although the entanglement pattern is approximately recovered with conventional shadows, in some instances we observe substantial errors, i.e., the largest error is 0.27. (right) Absolute errors with PEC shadows are significantly smaller, i.e., the largest error is 7 × 10^−2, but this figure could be further reduced by increasing N_s.

Furthermore, the quantum Fisher information (QFI), which is a key quantity in quantum metrology, can be bounded and approximated using classical shadows via
techniques of Ref. [73]. Indeed, in certain applications the relevant quantity might not be the QFI of the noisy state ρ but rather the QFI of the noise-free state ρ_id, which can be approximated with our techniques [74].
We also finally note possible "reverse" applications where classical shadows can be used to improve error mitigation techniques; an obvious one is perhaps the use of shadow tomography in explicitly reconstructing the noisy gate channels from Section III; typically one assumes the gate channels are local and thus the approach is efficient. In contrast, in learning-based error mitigation one does not reconstruct the noisy gate channels but rather aims to directly learn the quasiprobabilities g_k by running classically simulable circuits on a noisy quantum computer and comparing observable expectation values to classically simulated ones [13,45,46,48]. While one can exclusively train a model for one specific observable with conventional measurement schemes, classical shadows allow for simultaneously estimating a large number of expected values. We can thereby efficiently train error models that mitigate the impact of errors in all local operator measurements. The approach might similarly be useful in measuring many Pauli operators in the case of learning sparse Pauli models [47].

VI. DISCUSSION AND CONCLUSION

In this work we consider the powerful classical shadows methodology, which allows us to obtain an efficient classical representation of a quantum state ρ and thus to simultaneously predict many of its properties in classical post-processing. A major difficulty concerning near-term and early fault-tolerant quantum computers is that they can only prepare noisy quantum states ρ from which we would estimate corrupted properties; this challenge motivated the field to develop quantum error mitigation techniques that allow us to estimate expected values Tr[O ρ_id] of observables O in an ideal noise-free state ρ_id while having access only to noisy expected values.
We consider a range of typical quantum error mitigation techniques and generalise them from single expected-value measurements to the case of mitigating errors in classical shadows. We find that Probabilistic Error Cancellation is the most well-suited candidate, which motivates us to develop a thorough theory of PEC shadows. In the conventional PEC approach one learns error characteristics of the device and counters them by a probabilistic implementation of the inverse noise channel; thus the only sources of noise are a possibly imperfect knowledge of gate-error characteristics and finite circuit repetition. Under the assumption that the error model of the quantum device has been appropriately learned such that a quasiprobability representation is known, we prove that PEC shadows are an unbiased estimator of the ideal state ρ_id. We additionally prove the following rigorous performance guarantees.
First, we prove bounds on the number of samples required to simultaneously predict many linear properties of the ideal quantum state ρ_id. Second, the fact that we use noisy quantum circuits to predict ideal properties manifests in a multiplicative measurement overhead; this overhead is identical to the cost of the conventional PEC approach and grows exponentially with the number of noisy gates. Third, we prove rigorous sample complexities for predicting non-linear properties of the ideal states.
We note that our results are completely general and apply to any shadow ensemble E via Eq. (1) and to any linear or non-linear property of the quantum state. Furthermore, we provide practical post-processing protocols for the pivotal scenario of randomised measurements in Pauli bases. Finally, we demonstrate in numerical simulations the usefulness of PEC shadows and error extrapolated shadows, and conclude that these techniques may be instrumental in practical applications of near-term and early fault-tolerant machines.
We note that previous works have already explored applying error mitigation techniques to classical shadows, focusing on errors in the POVMs E [29,30] under the assumption that the aim is to estimate shadows of an input state ρ. Furthermore, classical shadows of ρ have been used to classically estimate expected values Tr[O ρ^n]/Tr[ρ^n], thereby classically performing Error Suppression by Derangements (ESD) [26] or Virtual Distillation (VD) [27]. This ultimately allows us to estimate expected values in the dominant eigenvector of ρ, which is an approximation [75] to ρ_id, but at an exponential cost in the number of qubits and in the number of noisy gates. In contrast, the techniques we present are efficient in the sense that the sample complexity does not directly depend on the number of qubits but rather depends exponentially on the number of noisy gates, here via the norm ∥g∥_1^2. As with usual error mitigation techniques, the approach is limited to a number of gates ν ∈ O(p^−1) given by the inverse of the per-gate error rate; beyond this threshold the effect of errors escalates exponentially [76-78]. State-of-the-art theoretical lower bounds suggest a slightly more optimistic picture whereby circuits of poly log log N depth provably yield an exponential decrease of fidelity, as opposed to the constant depth suggested by ∥g∥_1^2. In summary, the present work leverages an existing, rich toolbox of quantum error mitigation ideas and generalises powerful classical shadows to the pivotal scenario of approximating properties of an ideal quantum state ρ_id. As we demonstrate in a broad range of examples, these quantum error mitigated classical shadows are very intuitive, easy to use in practice and may play a central role in exploiting near-term and early fault-tolerant quantum computers. We discuss a broad range of further possible use cases and anticipate that the present work will stimulate further advancements in the field.

ACKNOWLEDGMENTS
The authors thank Simon Benjamin, Dan Browne, Otfried Gühne, Ryuji Takagi, and Zoltán Zimborás for helpful comments on drafts of the manuscript. B.K. thanks the University of Oxford for a Glasstone Research Fellowship and Lady Margaret Hall, Oxford for a Research Fellowship. J.S. and H.C.N. are supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, project numbers 447948357 and 440958198), the Sino-German Center for Research Promotion (Project M-0294), the ERC (Consolidator Grant 683107/TempoQ), and the German Ministry of Education and Research (Project QuKuK, BMBF Grant No. 16KIS1618K). J.S. also acknowledges support from the House of Young Talents of the University of Siegen. Z.C. is supported by the Junior Research Fellowship from St John's College, Oxford. The numerical modelling involved in this study made use of the Quantum Exact Simulation Toolkit (QuEST) and the recent developments QuESTlink [79], which permits the user to use Mathematica as the integrated front end, and pyQuEST [80], which allows access to QuEST from Python. We are grateful to those who have contributed to all of these valuable tools. The authors would like to acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility [81] in carrying out this work and specifically the facilities made available from the EPSRC QCS Hub grant (agreement No.
EP/T001062/1). The authors also acknowledge funding from the EPSRC projects Robust and Reliable Quantum Computing (RoaRQ, EP/W032635/1) and Software Enabling Early Quantum Advantage (SEEQA, EP/Y004655/1).

Proof of Lemma 1. The statement directly follows from the fact that ∥g∥_1 sign(g_k) G_k, with k sampled according to the probability distribution p(k), is an unbiased estimator for the ideal operation U_circ. In particular, taking the expectation over k and collecting the constant factors as p(k) ∥g∥_1 sign(g_k) = sign(g_k) |g_k| = g_k, we recover the quasiprobability decomposition Σ_k g_k G_k = U_circ.

Proof of Theorem 1. Using the abbreviation ρ̂_id ≡ ρ̂_{k,l}, we calculate the expected value, where p_k = |g_k|/∥g∥_1 is the probability of choosing the circuit variant G_k from Definition 1 and q_l = Tr[G_k(|0⟩⟨0|) E_l] is the probability of observing the POVM outcome l. We obtain the expected value by substituting these into Eq. (A1). Here we can collect and simplify all constant factors via p_k ∥g∥_1 sign(g_k) = g_k and simplify the expected value. In the first equality we simply used the linearity of the trace operation, while in the second equality we used that by definition Σ_k g_k G_k(|0⟩⟨0|) = ρ_id. We finally substituted the definition of the measurement channel C_E(·) given by Eq. (2).

Shadow norms
Before proving Lemma 2, let us define the shadow norm of an operator O and calculate it explicitly for the practically important scenario when O is a local Pauli string.

Lemma 3. We define the shadow norm ∥O∥_E with respect to the generalized measurement E, where ∥·∥_∞ denotes the maximal eigenvalue of the corresponding operator and ρ̂_l = C_E^−1(E_l). For the specific case of Pauli-basis measurements and observables that are q-local Pauli strings, the squared shadow norm is given as ∥O∥_E^2 = 3^q.
Proof. When formulating shadow tomography with generalized measurements, the case of uniformly sampled Pauli-basis measurements corresponds to the so-called octahedron POVM [32], where the effects on a single qubit are of the form (1/3) Q_j^† |b⟩⟨b| Q_j, with b ∈ {0,1} a single bit and Q_j one of the three basis-transformation unitaries that allow us to measure in the bases of the Pauli X, Y and Z operators. Thus the effect is equivalent to (1/3)|t_±⟩⟨t_±| for t ∈ {x, y, z}, where |t_±⟩ denotes the eigenvector corresponding to eigenvalue ±1 of the single-qubit Pauli-t operator. It follows from the symmetry of the measurement [32] that the shadows can be computed directly from the effects. For the case of a system consisting of n qubits, where one aims to estimate local observables of the form O = O_1 ⊗ ... ⊗ O_n and the measurement is given by the tensor product of local measurements with E^(j) denoting the POVM acting on the jth qubit, the shadow norm similarly factorizes as ∥O∥_E^2 = Π_j ∥O_j∥_{E^(j)}^2. We now consider the case when the single-qubit operator O_j acting on the jth qubit is a Pauli operator X, Y or Z and thus Tr[O_j] = 0. By the previous discussion, it is sufficient to consider a single qubit, so we suppress the index j. Now observe that if O, T ∈ {X, Y, Z} and |t_±⟩ are the normalized eigenvectors of T with eigenvalues ±1, the anticommutation relations imply that the sum in Eq. (A10) collapses to the identity. Hence we obtain ∥O∥_E^2 = 3. When the single-qubit observable is the identity O = 1 we obtain a shadow norm ∥O∥_E^2 = 1. Consequently, for q-local Pauli strings acting on n qubits the squared shadow norm is ∥O∥_E^2 = 3^q, independent of n. We explain in Appendix D how this bound is modified when considering the effect of readout errors.
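The single-qubit values ∥O∥_E^2 = 3 (Pauli) and ∥O∥_E^2 = 1 (identity) can be checked numerically. The sketch below builds Γ = Σ_l Tr[O ρ̂_l]^2 E_l for the octahedron POVM with effects E_l = (1/3)|t_±⟩⟨t_±| and snapshots ρ̂_l = 3|t_±⟩⟨t_±| − 1, and returns ∥Γ∥_∞ (the function name is our own).

```python
import numpy as np

# Hedged numerical check of Lemma 3 on a single qubit.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def shadow_norm_sq(O):
    """Gamma = sum_l Tr[O rho_l]^2 E_l with E_l = |t+-><t+-|/3, rho_l = 3|t+-><t+-| - I."""
    Gamma = np.zeros((2, 2), dtype=complex)
    for T in (X, Y, Z):
        _, vecs = np.linalg.eigh(T)               # eigenvectors |t+->, |t-->
        for k in range(2):
            proj = np.outer(vecs[:, k], vecs[:, k].conj())
            snapshot = 3 * proj - I
            Gamma += np.trace(O @ snapshot).real ** 2 * proj / 3
    return np.linalg.norm(Gamma, 2)               # largest singular value of Gamma

shadow_norm_sq(Z)   # -> 3 for a Pauli, while shadow_norm_sq(I) -> 1
```

Only the matching measurement basis contributes (Tr[O ρ̂_l] = ±3 there and 0 otherwise), so Γ = 3·𝟙 for a Pauli observable, reproducing ∥O∥_E^2 = 3.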
In practice it is often the case that the set of targeted observables possesses a certain structure. If this is the case, small variations to the classical shadow protocol, in which the measurement basis is sampled uniformly at random, can yield a substantial improvement with respect to sample complexity [82].
For instance, in electronic structure problems where one aims to, e.g., determine the ground state of molecules using a quantum algorithm, one typically starts by transforming the molecular Hamiltonian into a qubit Hamiltonian, i.e., a sum of Pauli observables, by means of an appropriate mapping. Common types of such mappings are the Jordan-Wigner (JW), Bravyi-Kitaev (BK) and parity (P) transformations [9]. It is important to note that depending on the encoding, the different Pauli operators X, Y, Z appear with different frequencies in the corresponding qubit observable. For instance, in the BK encoding the appearance of Pauli-Y operators is suppressed compared to X and Z. Consequently, measuring the different Pauli bases uniformly on each qubit, i.e., using the octahedron measurement, would be very wasteful.
A similar statement concerning sample complexity as in Lemma 3 can be made for the case of locally biased shadows [82,83]. Let us assume that the bias is (p_x, p_y, p_z), where p_t is the probability of performing the measurement in the Pauli-t basis. The corresponding POVM effects are E_{t_±} = p_t |t_±⟩⟨t_±|, and the classical shadow based on the measurement outcome then takes an analogous form involving μ = p_x^2 + p_y^2 + p_z^2. With this, given a Pauli string, one can directly calculate the shadow norm.
We can now calculate the expectation by recalling that p_k = |g_k|/∥g∥_1 is the probability from Definition 1 of choosing the circuit variant G_k and q_l is the probability of observing the POVM outcome l.
Thus the above expectation can be calculated by introducing Ω := ∥g∥_1^−1 Σ_k |g_k| G_k, which is a permissible quantum channel [31], i.e., a CPTP map, since by definition it is a convex combination of the CPTP maps G_k. This expression is similar to the one in Ref. [21].
The above expression can be upper bounded by replacing the initial state |0⟩⟨0| by a maximization over all states σ, where we have moved the summation inside the trace. By introducing the abbreviation Γ := Σ_l Tr[O ρ̂_l]^2 E_l we obtain the upper bound: since Ω(σ) is a valid density matrix, we can bound the trace via the operator norm, Tr[σΓ] ≤ ∥Γ∥_∞, the largest singular value of Γ, which is by definition the squared shadow norm from Lemma 3. Since ⟨O⟩^2 ≥ 0, we obtain Var ≤ ∥g∥_1^2 ∥O∥_E^2.

Let us make a few observations. (a) Recall that conventional classical shadows make no assumption about the input state ρ [21]. In contrast, in our case a circuit description U_circ(|0⟩⟨0|) of the "input state" ρ_id is actually part of the protocol. Of course, knowing such a description of the input state does not allow one to classically efficiently predict its properties without using classical shadows, unless the circuit U_circ has some special properties permitting efficient classical simulation, such as being a Clifford circuit.
(b) The proof of Lemma 2 involves a maximization over density matrices such that our bounds are independent of the particular quasiprobability decomposition and depend only on the norm ∥g∥_1^2. (c) It can be expected that the upper bound in Lemma 2 is very pessimistic. A similar, constant-factor looseness of the bounds was already observed for conventional shadows [21]; however, the discrepancy is expected to be even larger for PEC shadows. This is due to (b), as we do not take into account properties of the individual circuits in the quasiprobability decomposition but rather apply a pessimistic global bound.

Even though this method requires an increased number N_batch K of independent classical shadows, it is much more robust against outlier corruption. The idea is that if μ_{K,b}(O_j) deviates by more than ϵ from Tr[O_j ρ_id], more than K/2 of the individual empirical mean values must deviate by more than ϵ, which is an exponentially suppressed event. This can be made more formal by the concentration inequality for the median of means estimator [84,85],
where σ denotes the standard deviation.
Then a collection of N = K N_batch independent classical shadows allows for accurately predicting all ideal expectation values via median of means estimation.

Proof. This is a direct consequence of the concentration property of the median of means estimator together with the bound on the variance from Lemma 2. Because Var[μ̂] ≤ ∥g∥_1^2 ∥O∥_E^2, each batch mean achieves accuracy ϵ with constant probability. Further, as we have M observables that we want to accurately predict with at most failure probability δ, we need for each individual observable exp(−K/8) ≤ δ/M. Thus the choice K = 8 log(M/δ) yields the desired bound.

3. Predicting non-linear properties (Theorem 3)

In order to obtain rigorous performance guarantees for our estimator, two ingredients are needed. First, note that any polynomial function in the quantum state can be written as a linear function in tensor products of the quantum state; more precisely, suppose we want to estimate a polynomial function of degree m of the quantum state ρ. The second tool needed is so-called U-statistics, which often provide a uniformly minimum-variance unbiased estimator for polynomial functions. Suppose we have access to N independent snapshots ρ̂_1, ..., ρ̂_N generated by an underlying state ρ and that f(ρ̂_1, ..., ρ̂_m) is a polynomial function in the shadows such that our parameter of interest θ is given by θ = E[f(ρ̂_1, ..., ρ̂_m)]. The U-statistic [86] of order m averages f over C_{N,m}, the set of all combinations of m distinct elements that one can build out of the N different snapshots.
The variance of this estimator has a closed form depending on the function $f$: for a U-statistic $U_N$ given by Eq. (B7) the variance obeys the bound in Eq. (B8) [86]. In order to understand the scaling of $\mathrm{Var}[U_N]$ it is sufficient to consider a particular instance $\mathrm{Var}[f^{(d)}(\hat\rho_1, \dots, \hat\rho_d)]$. First notice that for $A \in \mathcal{B}(\mathcal{H}^{\otimes m})$ as defined in Eq. (B6) one obtains an expression with $p = m - d$. Similarly as in the proof of Lemma 2, we denote the operator $\Omega := \|g\|_1^{-1} \sum_{k_j} |g_{k_j}| \mathcal{G}_{k_j}$ for all $1 \le j \le d$ and write the variance in terms of $\Gamma = \sum_l \mathrm{Tr}[A \hat\rho_l]^2 E_l$. Finally, we can evaluate an upper bound on the summation in Eq. (B8) analytically. Given that the factorial formula is of $O(1/N)$ with $N \equiv N_s$, the above bound implies that the number of samples needed to predict polynomial functions of degree $m$ scales as $O(\|g\|_1^{2m}/\epsilon^2)$.

Estimating Pauli observables (Algorithm 1)

For an arbitrary observable $P$ we can calculate the estimator from Eq. (6). In the idealised measurement case $\mathcal{E}$ simplifies as detailed in Eq. (1), and our classical shadow is then a collection of $N_s$ measurement outcomes $b \in \{0,1\}^N$ and corresponding single-qubit Pauli measurement bases defined by the single-qubit rotations $Q_k$. Since $P$ is a $q$-local Pauli string it admits the product form $P = \bigotimes_{i \in Q} P^{(i)}$, while the snapshot $\mathcal{C}^{-1}_{\mathcal{E}}(E_l)$ is similarly of a product form via Eq. (3). Note that we use the index set $Q$ to abbreviate the set of qubits on which $P$ acts non-trivially, with $|Q| = q$. The trace then factorises into a product of single-qubit traces: on every qubit $i$ for which $P^{(i)} \equiv \mathbb{1}$ the single-qubit expression evaluates to one, while on the qubits in $Q$ we obtain a nontrivial single-qubit expression.
The expression in Eq. (C2) evaluates to $\{\pm 3^q, 0\}$, as we explain now. The expression evaluates to $\pm 3^q$ if the measurement bases defined by $Q_k^{(i)}$ coincide with the single-qubit Pauli matrices $P^{(i)}$ on the qubits $i \in Q$. The sign is then determined by the bits $b^{(i)}$ of the bitstring $b$: it is negative if the Hamming weight of the bitstring is odd on the qubits in $Q$. Otherwise, if the measurement bases are not compatible with $P$ on the qubits in $Q$, the expression evaluates to zero.
Thus we obtain the simplified estimator with $f(b, Q_k) \in \{\pm 1, 0\}$. The reconstruction algorithm takes as input the classical shadow data, i.e., the collection of bitstrings and Pauli measurement bases $\{b_k, Q_k\}_{k=1}^{N_s}$, as well as the Pauli observable $P$, and computes the values $f(b, Q_k)$: $0$ if the measurement bases are incompatible with $P$ and $\pm 1$ otherwise. The algorithm has runtime $O(qN_s)$.
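A minimal Python sketch of this reconstruction, for the conventional-shadow case (the PEC version additionally multiplies each term by $\mathrm{sign}(g_k)$ and the final mean by $\|g\|_1$). The data layout is an assumption on our part: each snapshot is a pair (bitstring, per-qubit bases), with bases and $P$ written as strings over {'X','Y','Z','I'}:

```python
# Sketch of the Algorithm 1 estimator for a q-local Pauli string P.
# f(b, Q_k) is 0 when the bases are incompatible with P on its support,
# and (-1)^(Hamming weight of b on the support) otherwise; each
# compatible snapshot contributes 3^q times that sign.

def pauli_estimate(snapshots, pauli):
    support = [i for i, p in enumerate(pauli) if p != 'I']  # qubits where P acts
    q = len(support)
    total = 0.0
    for bits, bases in snapshots:
        # f(b, Q_k) = 0 unless every basis on the support matches P
        if any(bases[i] != pauli[i] for i in support):
            continue
        parity = sum(bits[i] for i in support) % 2  # Hamming weight mod 2
        total += (3 ** q) * (-1 if parity else 1)
    return total / len(snapshots)
```

For example, a single snapshot that measured qubit 0 in the $Z$ basis with outcome 0 contributes $+3$ to the estimate of $\langle Z \otimes \mathbb{1} \rangle$, while a snapshot with an incompatible basis contributes 0.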

Improved estimation of local observables with light cones (Algorithm 2)
For example, imagine a circuit with a single noisy gate with quasiprobability decomposition $|\gamma_1|\mathcal{G}_1 - |\gamma_2|\mathcal{G}_2$ and an observable $O$ whose light cone does not contain this gate. In Algorithm 1 we modify the sign as $\mathrm{sign}(\gamma_k) \to +1$ and renormalise the coefficients such that $\|\gamma\|_1 \to 1$. Thus in expectation we implement the channel $|\gamma_1|\mathcal{G}_1 + |\gamma_2|\mathcal{G}_2$, where we used that the renormalised coefficients satisfy $|\gamma_1| + |\gamma_2| = 1$. Furthermore, as the gate is not in the light cone of the observable $O$, we used the identity $\mathrm{Tr}[O\,\mathcal{G}_1(\rho_{in})] = \mathrm{Tr}[O\,\mathcal{G}_2(\rho_{in})] = \mathrm{Tr}[O\,\mathcal{U}(\rho_{in})]$, where $\mathcal{U}$ is the ideal gate.
In general, for circuits with local noise models we can redefine the signs and norms in Eq. (4): if the corresponding gate is not contained in the light cone, i.e., $l \notin I$, then $\mathrm{sign}(\gamma_{k_l}) = +1$ and $\|\gamma^{(l)}\|_1 = 1$; otherwise the coefficients $\gamma^{(l)}_{k_l}$ are unchanged. This allows us to reduce the original sampling cost $\|g\|_1 = \prod_{l=1}^{\nu} \|\gamma^{(l)}\|_1$, where $\nu$ is the total number of noisy gates, to $\prod_{l \in I} \|\gamma^{(l)}\|_1$, since we have redefined $\|\gamma^{(l)}\|_1 \to 1$ for every gate whose index $l$ is outside of the light cone. A significant advantage of this approach is that it takes place entirely at the stage of classical post-processing: at the time of measurement we simply implement the conventional quasiprobability approach assuming all gates are noisy. For a generalisation to non-local noise models refer to [54].
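The rescaling of the sampling cost can be sketched in a few lines of Python (the gate norms and light-cone set below are illustrative placeholders):

```python
# Light-cone rescaling of the PEC sampling cost: gates outside the light
# cone I of the observable get norm ||gamma^(l)||_1 -> 1, so the total cost
# ||g||_1 = prod_l ||gamma^(l)||_1 shrinks to a product over l in I only.
from math import prod

def rescaled_norm(gate_norms, light_cone):
    """||g||_1 after setting ||gamma^(l)||_1 -> 1 for gates outside the cone."""
    return prod(norm for l, norm in enumerate(gate_norms) if l in light_cone)

norms = [1.1, 1.1, 1.1, 1.1]          # per-gate quasiprobability 1-norms
full = prod(norms)                     # cost if every gate is counted
cone = rescaled_norm(norms, {0, 2})    # only 2 of the 4 gates are in the cone
```

Since the cost grows multiplicatively per noisy gate, shrinking the product to the light cone can reduce the sampling overhead exponentially in the number of excluded gates.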

Local purities (Algorithm 3)
For ease of notation we abbreviate the indices of PEC snapshots as $\hat\rho_i := \hat\rho_{k,l}$ with $i = (k,l)$. Given a subsystem specified as a set of qubits $Q = \{q_1, \dots, q_m\}$, we obtain an unbiased estimator for the Rényi entropy via pairs of distinct snapshots $i \ne j$. Here $\mathrm{SWAP}_{Q,Q'}$ swaps all pairs of qubits $q_k$ and $q_{k+N}$ in the system of $2N$ qubits in $\hat\rho_i \otimes \hat\rho_j$. The factor $f(i,j,Q)$ can be computed analytically using that the snapshots are of a product form, where $\mathrm{SWAP}$ is the standard 2-qubit SWAP operator.
Here we have used that traces for qubits not in the subsystem $Q$ evaluate to $1$ and that the trace of a tensor product simplifies to a product of traces. We can evaluate the remaining expression analytically, as it only involves 2 qubits. When the single-qubit measurement bases $Q_k^{(q)}$ and $Q_l^{(q)}$ in the two snapshots are non-identical, the expression evaluates to $1/2$. When the measurement bases are identical, the expression evaluates to $5$ if the measurement outcome bits are identical, and to $-4$ if the outcome bits are non-identical. $f(i,j,Q)$ is then just the product of these values evaluated for all qubits in $Q$.
The algorithm simply iterates over all distinct pairs of snapshots and evaluates $f(i,j,Q)$. We further multiply each term by the corresponding signs $\mathrm{sign}(g_i)\mathrm{sign}(g_j)$ and by the squared norm $\|g\|_1^2$. Finally, we compute the median of means of these individual outcomes. The algorithm has runtime $O(|Q| N_s^2)$.
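The pair iteration can be sketched as follows, for the conventional-shadow case (the PEC version additionally carries the factors $\mathrm{sign}(g_i)\mathrm{sign}(g_j)$ and $\|g\|_1^2$, and uses median of means rather than a plain mean). The per-qubit constants $1/2$, $5$ and $-4$ are taken from the text above; the snapshot layout (bitstring, per-qubit bases) is our illustrative assumption:

```python
# Sketch of the local-purity estimator (Algorithm 3): average f(i, j, Q)
# over all distinct snapshot pairs, where f is a product of per-qubit
# factors: 1/2 for non-identical bases, 5 for identical bases with
# identical outcome bits, -4 for identical bases with differing bits.
from itertools import combinations

def qubit_factor(b1, q1, b2, q2):
    if q1 != q2:
        return 0.5
    return 5.0 if b1 == b2 else -4.0

def purity_estimate(snapshots, subsystem):
    """Mean of f(i, j, Q) over all distinct pairs of snapshots."""
    terms = []
    for (bi, qi), (bj, qj) in combinations(snapshots, 2):
        f = 1.0
        for q in subsystem:
            f *= qubit_factor(bi[q], qi[q], bj[q], qj[q])
        terms.append(f)
    return sum(terms) / len(terms)
```

The quadratic runtime in $N_s$ arises from the pair loop, matching the $O(|Q| N_s^2)$ scaling stated above.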

Appendix D: Details of numerical simulations
In this work we use exact quantum state simulators to simulate noisy quantum circuits of up to 12 qubits. Given that the present approach is effectively a Monte-Carlo sampling scheme, we efficiently simulate the effect of noise using a Monte-Carlo approach: for example, a Pauli channel is simulated by randomly choosing a Pauli event according to its corresponding probability and applying the corresponding Pauli operator to the state.
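A minimal sketch of such Monte-Carlo sampling for a biased single-qubit Pauli channel (the parametrisation follows the biased channel used in our simulations, with $\eta = 1/3$ recovering depolarising noise; the function name and probabilities are illustrative):

```python
# Monte-Carlo simulation of a single-qubit Pauli channel: sample one Pauli
# event per shot instead of evolving the full density matrix. The channel is
# (1-p) I + p [ (1-eta)/2 X + (1-eta)/2 Y + eta Z ] acting by conjugation.
import random

def sample_pauli_event(p, eta, rng=random):
    """Return 'I', 'X', 'Y' or 'Z' sampled from the channel probabilities."""
    r = rng.random()
    if r < 1 - p:
        return 'I'                    # no error with probability 1 - p
    r = (r - (1 - p)) / p             # condition on an error having occurred
    if r < (1 - eta) / 2:
        return 'X'
    if r < 1 - eta:
        return 'Y'
    return 'Z'
```

Each sampled label is then applied to the state vector as the corresponding Pauli gate, so a single trajectory costs no more than a noiseless circuit simulation.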
As state-of-the-art experiments [40] apply Pauli twirling to guarantee that the noise model is well approximated by a Pauli channel, which can be learned efficiently [47], we assume Pauli noise models. In particular, we assume that two-qubit gates are the dominant source of noise and that they are affected by Pauli errors with possibly different probabilities $p_k$ for each gate.
1. Details of Fig. 3

As discussed in the main text, we simulate the 12-qubit ansatz circuit in Fig. 3 (left) that prepares the ground state of the spin-ring Hamiltonian in Eq. (10).

Gate error mitigation: We assume that each noisy quantum gate is affected by the following biased, local Pauli channel
$\Phi_k(\rho) := (1-p_k)\rho + p_k\left[\tfrac{1-\eta_k}{2}(X\rho X + Y\rho Y) + \eta_k Z\rho Z\right]$, (D1)
where $\eta_k$ is a bias parameter. Furthermore, we consider that the error probability $p_k$ is specific to each gate; we randomly draw these values from the distribution $\mathcal{N}(\mu = p, \sigma = p)$ and, as we explain below, we explore two different regimes via $p \in \{10^{-3}, 2 \times 10^{-3}\}$. We randomly choose the probabilities to reflect the high variability of two-qubit error rates in typical superconducting devices [40]; of course, this variability may be lower in ion-trap devices [38, 39]. Given that in most platforms $T_2$ relaxation timescales are significantly faster than $T_1$ relaxation timescales, we consider a biased noise channel, specific to each gate, by choosing the bias parameters $\eta_k$ from the distribution $\mathcal{N}(\mu = 0.9, \sigma = 0.015)$. We can straightforwardly find the inverse noise channel $\Phi_k^{-1}$ analytically. In our simulations we thus randomly apply Pauli $X$, $Y$, $Z$ or $\mathbb{1}$ gates to the qubits in the support of the relevant gate, with probabilities defined by $\gamma$.
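The random application of recovery Paulis with probabilities defined by $\gamma$ can be sketched as follows (the coefficient values below are illustrative placeholders, not the analytically inverted channel):

```python
# Sampling one PEC recovery operation from a quasiprobability decomposition
# gamma = (gamma_I, gamma_X, gamma_Y, gamma_Z) of an inverse channel:
# draw operation k with probability |gamma_k| / ||gamma||_1 and record its
# sign, which later multiplies the snapshot, together with ||gamma||_1.
import random

def sample_recovery(gamma, rng=random):
    ops = ['I', 'X', 'Y', 'Z']
    norm = sum(abs(g) for g in gamma)        # ||gamma||_1, the sampling cost
    r = rng.random() * norm
    acc = 0.0
    for op, g in zip(ops, gamma):
        acc += abs(g)
        if r < acc:
            return op, (1 if g >= 0 else -1), norm
    return ops[-1], (1 if gamma[-1] >= 0 else -1), norm

op, sign, norm = sample_recovery([1.02, -0.005, -0.005, -0.01])
```

Averaging sign-weighted snapshots and multiplying by $\|\gamma\|_1$ then reproduces the inverse channel in expectation, which is the mechanism behind Lemma 1.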
We additionally note that, while we found the coefficients $\gamma$ analytically, they can also conveniently be computed numerically at the pre-processing stage. The numerical inversion is efficient given that the noise models are local, while state-of-the-art noise-model learning techniques achieve trivial inversion via Pauli-Lindblad channels even for non-local (but sparse) models [47]. Indeed, the present noise channel captures the dominant terms in these experimentally learned error models [47].

Circuit error rate: In Fig. 3 we repeat our simulations for two different noise levels to demonstrate that the bias (as well as the sampling overhead) grows exponentially. We implement two different circuit error rates, $\xi \approx 0.15$ and $\xi \approx 0.26$. As we detail in Section III, the circuit error rate $\xi = \sum_k p_k$ expresses the average number of errors in the full circuit; our circuit contains $\nu = 60$ noisy two-qubit gates.
Readout error mitigation: We also consider readout errors: while readout errors may not be significant in ion-trap devices [38,39] as they are typically below gate operation errors, we consider readout error rates that are consistent with typical superconducting systems.In particular, we assume the noise model detailed in Section II B whereby the readout of each qubit is affected by the same bitflip probability α = 0.01.We mitigate the effect of readout errors using the analytical inverse of the measurement channel detailed in Section II B.
Resampling scheme: In Fig. 3 (middle) we perform ground-state energy estimation for an increasing number $N_s$ of snapshots; for each fixed shot budget $N_s$ we estimate the average error from the exact energy by averaging over $10^4$ different experiments. We efficiently simulate this sampling task via the usual resampling scheme: we generate a very large pool of $10^7$ snapshots (an order of magnitude more than the largest simulated shot budget $N_s \le 10^6$) using the quantum-mechanical simulations described above. We then estimate the ground-state energy by randomly choosing $N_s$ snapshots each time from this large pool.
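The resampling scheme amounts to the following few lines (the pool here holds plain per-snapshot estimates for illustration):

```python
# Resampling scheme: instead of rerunning the quantum simulation for each
# of the n_experiments repetitions, draw N_s snapshots (with replacement)
# from one large pre-generated pool and average them.
import random

def resampled_means(pool, n_s, n_experiments, seed=0):
    rng = random.Random(seed)
    means = []
    for _ in range(n_experiments):
        sample = [pool[rng.randrange(len(pool))] for _ in range(n_s)]
        means.append(sum(sample) / n_s)
    return means
```

Keeping the pool an order of magnitude larger than the largest shot budget ensures the resampled experiments remain nearly independent.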
Error bounds: In Fig. 3 (right) we compare against the analytical error bounds of Theorem 2. Recall that our bounds in Theorem 2 depend on the largest shadow norm $\max_{1 \le k \le M} \|O_k\|_E^2$, and in Lemma 3 we evaluate the shadow norm of a $q$-local Pauli string as $3^q$ assuming ideal measurements via snapshots of the form in Eq. (3). We now take into account the effect of readout errors by explicitly calculating the shadow norm under the above readout-error model. It is straightforward to modify our proof of Lemma 3 by considering the noisy-readout snapshots from Section II B, obtaining the bound $\|O\|_E^2 \le 3^q (1-2\alpha)^{-2q}$ for all $q$-local Pauli strings. Indeed, readout-error mitigation has an associated measurement overhead of $(1-2\alpha)^{-2q}$.
2. Details of Fig. 4

As our main objective is to demonstrate that extrapolation-based error mitigation is indeed compatible with classical shadows, we assume a simple error model whereby every two-qubit gate has the same error probability $p$, which we can magnify perfectly to $\lambda p$. Of course, in reality magnifying the error rates precisely is rather involved, but it has been successfully demonstrated even in large-scale experiments [40]. In particular, in our simulations for extrapolation-based error mitigation we assume the above Pauli noise with a bias parameter $\eta_k = 1/3$, which is effectively a local depolarising channel with probability $p$.

FIG. 1. In the present work we assume we only have access to a noisy quantum computer (left), such that every circuit we run (left, yellow area) gets corrupted by gate noise (unwanted red gate elements). We aim to extract properties of a state that would be prepared by an ideal quantum computer (right) with the use of powerful error mitigation techniques. We provide a rigorous theoretical framework for PEC shadows which effectively allows us to obtain a classical shadow of the ideal quantum state (noise-free shadow) from which we can predict many ideal properties in classical post-processing (middle blue area, classical computer). In our formalism we run a series of distinct quantum circuit variants (left, yellow area) that cast different classical shadows (noisy shadows) due to gate noise and due to our intentional recovery operations (red gate elements and their shadows). Under the assumption that the device's error characteristics have been appropriately learned, we can estimate the noise-free shadow (middle) via classical post-processing.
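The extrapolation step itself can be sketched as a least-squares fit of the noise-amplified expectation values against $\lambda$, evaluated at $\lambda = 0$ (a minimal linear-fit sketch; the data points are illustrative and this is not the specific fit model used in Fig. 4):

```python
# Zero-noise extrapolation: measure an expectation value at amplified noise
# levels lambda * p and extrapolate the results to lambda = 0 with a linear
# least-squares fit; the intercept is the zero-noise estimate.
def zne_extrapolate(lams, values):
    n = len(lams)
    mx = sum(lams) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in lams)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lams, values))
    slope = sxy / sxx
    return my - slope * mx  # intercept = lambda -> 0 estimate

est = zne_extrapolate([1.0, 2.0, 3.0], [0.9, 0.8, 0.7])
```

Higher-order polynomial or exponential fit models follow the same pattern with more fit parameters.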
In Theorem 1 we want to simultaneously estimate the expected values of $M$ operators in the ideal state, $\mathrm{Tr}[O_1 \rho_{id}], \dots, \mathrm{Tr}[O_M \rho_{id}]$. Using a median-of-means estimator, we bound the number of shots required to achieve performance parameters $\epsilon, \delta \in [0, 1]$.

Appendix A: Details of PEC Shadows

1. Unbiased estimators (Lemma 1 and Theorem 1)

2. Sample complexity for predicting linear properties (Theorem 2)

In order to predict the expected values of $M$ independent observables $\{O_1, \dots, O_M\}$, we group $N_s = KN_{batch}$ independent snapshots into $K$ batches $B_1, \dots, B_K$, each of size $N_{batch}$. Then for each subset $B_i$ one uses the empirical mean $\hat\mu_i(O_j) = N_{batch}^{-1} \sum_{l \in B_i} \mathrm{Tr}[O_j (\hat\rho_{id})_l]$. The final estimate of the expectation value of $O_j$ is then obtained as the median of the individual empirical means, i.e., $\hat\mu_{K,b}(O_j) := \mathrm{median}\{\hat\mu_1(O_j), \dots, \hat\mu_K(O_j)\}$.
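The batching and median step can be sketched directly (a minimal sketch; the input is assumed to be a flat list of per-snapshot estimates $\mathrm{Tr}[O_j (\hat\rho_{id})_l]$):

```python
# Median-of-means: split N_s = K * N_batch values into K batches, average
# each batch, and return the median of the K batch means. Unlike a plain
# mean, a few extreme outliers cannot move the estimate far.
from statistics import median

def median_of_means(values, k):
    n_batch = len(values) // k
    means = [sum(values[i * n_batch:(i + 1) * n_batch]) / n_batch
             for i in range(k)]
    return median(means)

# A single extreme outlier barely moves the estimate:
data = [1.0] * 29 + [1000.0]
est = median_of_means(data, k=5)  # the outlier corrupts only one batch
```

This robustness to outliers is exactly the property exploited in the proof of Theorem 2, where a deviation of the median requires more than $K/2$ batch means to deviate simultaneously.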
FIG. 3. (left) A noisy variational Hamiltonian ansatz is used to prepare the ground state of Eq. (10).

Theorem 4 (Formal version of Theorem 2). Let $\mathcal{U}_{circ}$ be the ideal quantum circuit producing the ideal output state $\rho_{id}$ from Definition 1. Suppose that we want to predict $M$ linear properties $O_1, \dots, O_M$ of the ideal state, i.e., $\langle O_j \rangle = \mathrm{Tr}[O_j \rho_{id}]$. For fixed performance metrics $\epsilon, \delta \in [0, 1]$, set