Robust shadow estimation

Efficiently estimating properties of large and strongly coupled quantum systems is a central focus in many-body physics and quantum information theory. While quantum computers promise speedups for many such tasks, near-term devices are prone to noise that will generally reduce the accuracy of such estimates. Here we show how to mitigate errors in the shadow estimation protocol recently proposed by Huang, Kueng, and Preskill. By adding an experimentally friendly calibration stage to the standard shadow estimation scheme, our robust shadow estimation algorithm can obtain an unbiased estimate of the classical shadow of a quantum system and hence extract many useful properties in a sample-efficient and noise-resilient manner given only minimal assumptions on the experimental conditions. We give rigorous bounds on the sample complexity of our protocol and demonstrate its performance with several numerical experiments.


I. INTRODUCTION
We are in the process of building large-scale and controllable quantum systems. This not only provides new insights and tool kits for fundamental research in quantum many-body systems [1] and the quantum nature of spacetime [2], but also yields fruitful applications in computing [3][4][5][6], communication [7][8][9], and sensing [10,11]. Learning the properties, e.g., fidelity [12,13], entanglement [14,15], and energy [16] of generated quantum states is usually a major step in many quantum benchmarking protocols and quantum algorithms. Among various figure of merits, robustness and efficiency are two key factors to assess the practicality of any property learning protocol.
In the Noisy Intermediate-Scale Quantum (NISQ) era [17], quantum circuits inevitably suffer from noise. The robustness of a property learning protocol then refers to the ability to tolerate such noise. In a typical property estimation process, we generate several identical copies of the target quantum states, and then measure them using some devices which might be noisy and uncharacterized. To verify the property estimates, one has to introduce new benchmarking devices, which (in the NISQ era) will also be noisy. Consequently, we will be trapped into a loop of benchmarking. To get rid of this, at least two approaches have been proposed: One is to introduce extra assumptions on the noise model, in which case we might be able to mitigate the error [18][19][20][21], but such assumptions may not be verifiable. The other is to use device-independent protocols [22][23][24] which do not have any assumptions on the devices, but such protocols are * These authors contributed equally to this work. † qubitpei@gmail.com mostly designed for some specific property learning tasks (e.g. entanglement detection), and their requirements on devices and computational/sample complexity can be too strict to produce anything informative in practice.
Thus, while property learning and testing leads to large efficiency gains in sample and computational complexity, one must in general have a well-characterized device for these methods to be applicable. Quantum tomography [25,26] is a standard method to extract complete characterization information, but it requires exponentially many samples with respect to the number of qubits. Several efficient tomographic schemes were proposed based on some properties of the prepared states, such as the low rank property [27,28], permutation symmetry [29,30], and the locality of Schmidt decomposition [31,32]. Nevertheless, such assumptions are restrictive and not applicable in many cases. Another line of research focuses on efficiently extracting partial information of a quantum state without any prior knowledge. An example is the quantum overlapping tomography [33,34] which can simultaneously estimate all k-qubit reduced density matrix of an arbitrary quantum state in a sample-efficient manner for small k. The simplest version of this idea is to measure uniformly random Pauli strings [35], which leads to a sample complexity of O(k3 k log n) for estimating all k-body Pauli observables to fixed precision. Machinelearning based approaches are also proposed [36] in this direction.
Recently, a new paradigm for efficient and universal quantum property estimation has been proposed named quantum shadow estimation. Shadow estimation was first put forward in Ref. [37]. Roughly speaking, this scheme can simultaneously estimate the expectation values with respect to N observables of an unknown d-dimensional quantum state with order log d log N number of samples, arXiv:2011.09636v2 [quant-ph] 25 Jun 2021 which is usually more efficient than either conducting full tomography or measuring the N observables one by one. Later on, a more experiment-friendly shadow estimation scheme was proposed [38], which is able to estimate many useful properties of a quantum system with a small number of samples (see also [39]). This protocol is also proven to be worst-case sample-optimal in the sense that any other protocol that is able to accurately estimate any collection of arbitrary observables must consume a number of samples at least comparable to this one. Although promising for a broad spectrum of applications, the shadow estimation scheme in Ref. [38] (as well as the random Pauli scheme from [35]) assume perfect implementation of a group of unitary gates as well as ideal projective measurement on the computational basis. It remains unclear how experimental noise can affect the performance of this scheme.
In this work, we reexamine the shadow estimation scheme and regard it as a twirling and retrieval procedure of the measurement channel. In this way, we extend shadow estimation to the case when the unitaries and measurements are noisy. With similar techniques used in the study of randomized benchmarking [40][41][42][43][44], we propose a modified shadow estimation strategy which is noise-resilient. When the noise in the unitary operations and measurements is small, the robust shadow tomography scheme is able to faithfully estimate the required properties with a small additional cost, subject only to the assumption that one can prepare the initial ground state |0 ⊗n with high fidelity. The proposed scheme is both robust and efficient, and hence highly practical for property estimation of a quantum system.

II. PRELIMINARIES
We first introduce the Pauli-transfer-matrix (PTM) representation (or Liouville representation) to simplify the notation. Note that all the linear operators L(H d ) on the underlying n-qubit Hilbert space H d with d = 2 n can be vectorized using the n-qubit (normalized) Pauli operator basis {σ a := P a / √ d} a , where P a are the usual Pauli matrices. For a linear operator A ∈ L(H d ), we define a column vector |A ∈ H d 2 with the a-th entry to be |A a = Tr(P a A)/ √ d. The inner product on the vector space H d 2 is defined by the Hilbert-Schmidt inner product as A|B := Tr(A † B). The normalized Pauli basis {σ a } a is then an orthonormal basis in H d 2 . Superoperators on H d are linear maps taking operators to operators L(H d ) → L(H d ). In the vector space H d 2 , a superoperator E can be represented by a matrix in the Pauli basis, with the entries given by E ab = σ a |E(σ b ) = σ a |E|σ b . Choosing the Pauli basis for the superoperator is sometimes called the Pauli transfer matrix. With a slight abuse of notation, we sometimes denote a superoperator and its PTM using the same notation. A detailed introduction to the PTM is given in Appendix A 3.
In this work, we focus on the task of estimating the expectation values {Tr(O i ρ)} i of a set of observables {O i } i on an underlying unknown quantum state ρ, When the number of observables N is large, a direct exhaustive measurement of the (generally incompatible) observables {O i } on ρ is expensive. Besides, in many cases we may want to perform tomographic experiments on ρ before deciding which observables {O i } should be estimated. To realize this, a natural idea is to insert an extra prepare-and-measure superoperator between O i | and |ρ , In an experiment, we first apply a POVM measurement {E x } x at ρ. Then, conditioned on the outcome x, we calculate O i |A x via classical post-processing. If we repeat this procedure, then the sample average over these experiments gives an estimator for O i |ρ . As long as the inserted superoperator x |A x E x | equals to I, this estimator will be unbiased.
To construct a realization of such a superoperator, we consider the dephasing channel in the computational basis (Z-basis) M Z := z |z z|, where |z is the vectorization of the Z-basis eigenstate |z z|, with z ∈ {0, 1} ⊗n . Expanding M Z in the Pauli operator basis {|σ 0 , |σ x , |σ y , |σ z } ⊗n , we have where diag(a 1 , a 2 , · · · ) is a diagonal matrix with the diagonal elements a 1 , a 2 , · · · . If M Z were invertible, we could insert the superoperator M −1 Z M Z = z |M −1 Z (z) z| = I. However, M Z is not invertible due to the lack of X, Y -basis information in a Z-basis measurement. To make M Z invertible, we can introduce an extra unitary twirling [38], Here, G is a subset of the unitaries {U } in U (d) to be specified later, and U is the PTM representation of U . When G forms a group, the PTMs {U} forms a representation of G. A direct application of Schur's Lemma [45] (see Appendix A 1) allows us to calculate the explicit form of M, where R G is the set of irreducible sub-representations of the group G, and Π λ is the corresponding projector onto the invariant subspace. Since the projectors are complete and orthogonal to each other, M is invertible if and only Algorithm 1 Shadow Estimation (Shadow) [38] Input: Unknown n-qubit quantum state ρ, observables Prepare ρ, uniformly sample U ∈ G and apply to ρ.

4:
Measure in the computational basis, outcome |b .
if all the coefficients are non-zero. Therefore the twirling group G needs to satisfy Once Eq. (6) is satisfied, we can construct a shadow estimation protocol based on the equation To implement shadow estimation, one can repeat the following experiment: generate a single copy of ρ, act via a randomly sampled unitary U , and then perform a Z-basis measurement to return an output bit string b. Then O i |M −1 U † |b is calculated on a classical computer. Thanks to this decoupled processing of ρ with respect to O i , the estimation of different observables can be done in parallel with a relatively small increase in sample complexity. The quantum shadow estimation procedure can be summarized as in Algorithm 1. We refer toô (r) i as the singleround estimator. The subroutine MedianOfMeans divides the R = N K single-round estimators into K groups, calculates the mean value of each group, and takes the median of these mean values as the final estimator. As a formula: For the standard shadow estimation algorithm [38], the input quantum channel M is decided by Eq. (4).

III. ROBUST SHADOW ESTIMATION
In practice, the unitary operations and measurements used in the standard shadow estimation algorithm will be noisy. We want to mitigate the effect of this noise on the output estimate of the shadow. Our strategy to do this is simple: we first learn the noise as a simple stochastic model and then compensate for these errors using robust classical post-processing.
In general, noise in quantum devices is not stochastic, and coherent errors must be addressed. However, thanks to the unitary twirling in shadow estimation, the stochastic nature of the noise is inherent to the protocol itself. For example, any noise map that is twirled over a Clifford group that contains the Pauli group as a subgroup will reduce the noise to a purely stochastic Pauli channel [46]. The complete characterization of such noise channels can be efficiently and accurately performed [47][48][49]. It is then straightforward to compensate for such errors by modifying the classical post-processing, although a lengthy analysis is required to show the efficacy of this strategy.
In order to pursue a rigorous analysis of this strategy, we make the following two assumptions on the noise in the experimental device implementing the shadow estimation.

A2
The experimental device can generate the computational basis state |0 ≡ |0 ⊗n with sufficiently high fidelity.
Our first assumption is used throughout to ensure that there exists a completely positive trace-preserving (CPTP) map such that the noisy gateŨ can be decomposed into ΛU, where U is the ideal gate while Λ is the noise channel. The noise map Λ is independent of the unitary U and the time t. It also implies that the noise map occurring in the measurement is fixed independent of time and hence can be absorbed into Λ. We remark that assumption A1 is widely used in the analysis of randomized benchmarking protocols. The gate-independent part of the assumption is especially appropriate when the experimental unitaries are single-qubit gates, but it has been shown that the effect of weak gate dependence (a form of non-Markovianity) generally leads to weak perturbations [50,51]. We also provide numerical evidences in Sec. VIII showing that our scheme is still quite robust against realistic gate-dependent noise models in experiment.
For our second assumption A2, from Sec. III to Sec. V we initially make the stronger assumption that the experimental device can prepare the |0 state exactly. In Sec. VI we relax this to show that when |0 is not precisely prepared, but is prepared with sufficiently high fidelity, our protocol still gives a good estimation. Fortunately, the computational basis state |0 is relatively easy to generate faithfully in many experimental platforms.
To see how unitary twirling helps to reduce the number of noise parameters, we calculate the noisy version of the random measurement channel M, where the {f λ } are expansion coefficients of the twirled channel. Note that the channel Λ describes both the noise in the gate U and in the measurement M Z , which is always possible under our assumption A1. The number of {f λ } is related to the number of irreducible representations in the PTM representation of the twirling group G.
Later we will show that the coefficients {f λ } can be estimated in parallel, similar to the normal shadow estimation procedure (referred to as the calibration procedure). Prepare |0 , sample (noisy) U ∈ G and apply to |0 .

4:
Measure in the computational basis, return |b .
bust quantum shadow estimation (RShadow) protocol to faithfully estimate {Tr(O i ρ)} i even with noise. The algorithm is depicted by Fig. 1 and it works as follows. We first estimate the noise channel M of Eq. (9) with the calibration procedure, and then use the estimator M as the input parameter M of Algorithm 1 to predict any properties of interest (referred to as the estimation procedure). The procedure is shown in Algorithm 2, where the subroutine NoiseEst is decided by G and is given later.
In the following discussion, we will focus on two specific groups G: the n-qubit Clifford group Cl(2 n ) and the n-fold tensor product of the single-qubit Clifford group Cl ⊗n 2 . We will give a specific construction of the NoiseEst subroutine and show the correctness and efficiency of our RShadow algorithm with these two groups.

IV. ROBUST SHADOW ESTIMATION USING GLOBAL CLIFFORD GROUP
We first present a robust shadow estimation protocol using the n-qubit global Clifford group, Cl(2 n ). The nqubit Clifford group has many useful properties such as being a unitary 3-design [52][53][54], which is widely used in many tasks of quantum information and quantum computation. It is a standard result that the n-qubit Clifford group has two irreducible representations in the Liouville representation whose projectors are given by |σ 0 σ 0 | and I − |σ 0 σ 0 |. Assuming the M channel defined in Eq. (9) is trace preserving, it can be written as for some f ∈ R, i.e. as a depolarizing channel. It is easy to obtain f = 1/(2 n + 1) for the noiseless case using Eq. (9). The noise characterization subroutine with Cl(2 n ) is defined as follows, where |b is the Liouville representation of the computational basis state |b b| and similar for |0 . Next, define the Z-basis average fidelity of a noise channel Λ as F Z (Λ) = 1 2 n b∈{0,1} n b|Λ|b . The following theorem demonstrates the correctness and sample efficiency of our protocol. We remark that the validity of this theorem relies on Assumptions 1.
if the number of samples for the calibration procedure satisfies where F Z ≡ F Z (Λ) and we assume F Z 2 −n , then the subsequent estimation procedure with high probability satisfies for any observable O and quantum state ρ, whereô (r) is the single-round estimator defined as in Algorithm 1.
Here and throughout the paper, we useÕ to represent the Big-O notation with poly-logarithmic factors suppressed. The more rigorous version of Theorem 1 is Theorem 7 in Appendix B. We see that our protocol indeed eliminates the systematic error of shadow estimation in a sample-efficient manner, since without the calibration step the empirical expectation value would converge to a value that conflated the noise map Λ into the estimate, whereas Λ does not appear in Eq. (13). More specifically, if the Z-basis average fidelity of the noise channel Λ is lower bounded by some constant (e.g. constant-strength depolarizing noise), then the sample complexity of our calibration stage is approximately independent of the system size n.
A more realistic noise model to consider is that of local noise with fixed strength, where Λ := n i=1 Λ i and each single-qubit noise channel Λ i satisfies F Z (Λ i ) ≥ 1 − ξ.
In that case, we have F Z (Λ) −2 ≈ exp(2nξ) for small ξ, so we can efficiently deal with a system size n that is comparable to ξ −1 .
Next, we consider the sample complexity of the estimation procedure. Following a similar methodology of bounding the sample complexity in the noise-free standard shadow estimation scheme [38], we bound the sample complexity of our RShadow estimation procedure as follows.
Theorem 2 (Informal). For RShadow with Cl(2 n ), if the number of calibration samples R C and the number of estimation samples R E satisfies The rigorous version of Theorem 2 is Theorem 8 in Appendix B. Compared with results in Ref. [38], one can see that the RShadow scheme has nearly the same sample complexity order as the noise-free standard shadow estimation methods in a low-noise regime.
Finally, we comment on the computational complexity of RShadow. The computational complexity of our calibration procedure is favorable since the single-round fidelity estimator can be calculated efficiently with the Gottesman-Knill theorem [55,56]. However, a efficient computation using the Gottesman-Knill theorem for the estimation procedure would require the observable O to have additional structure, such as being a stabilizer state or being a Pauli operator. The standard shadow estimation scheme of Ref. [38] or the fast Pauli expectation estimation method of Ref. [35] also have such a requirement.

V. ROBUST SHADOW ESTIMATION USING LOCAL CLIFFORD GROUP
Despite the useful properties the global Clifford group possesses, it is often challenging to implement the full n-qubit Clifford group under current experimental conditions. The local Clifford group Cl ⊗n 2 , which is the n-fold tensor product of the single-qubit Clifford group, is an experimentally friendly alternative. We now present an robust shadow estimation protocol based on the local Clifford group which can efficiently calibrate and mitigate the error in estimating any local property.
It is known that the n-qubit local Clifford group has 2 n irreducible representations [57]. Being twirled by the local Clifford group, the channel M becomes a Pauli channel that is symmetric among the x, y, z indices, and the Pauli-Liouville representation is for f z ∈ R which is called the Pauli fidelity. Here, for any string m ∈ {0, 1} n , we define |m to be the Liouville representation of the computational basis state |m m|, and define P m := n i=1 P mi Z and σ m to be the corresponding normalized Pauli operators. In the noiseless case, one can obtain f z = 3 −|z| using Eq. (9), where |z| is the number of 1s in z.
The noise characterization subroutine with Cl ⊗n 2 is defined as follows NoiseEst Cl ⊗n 2 (z, U, b) := b|U|P z , ∀z ∈ {0, 1} n . (16) In the standard shadow estimation using Cl ⊗n 2 [38] (and in the earlier work [35] then the subsequent estimation procedure with high probability satisfies for any k-local observable O and quantum state ρ, wherê o (r) is the single-round estimator defined as in Algorithm 1.
The rigorous version of Theorem 3 is Theorem 9 in Appendix C. Indeed, this protocol can calibrate the shadow estimation process for all k-local observables using a small number of samples that only depends on k (but basically not on the system size n). Note that, Theorem 3 holds for any gate-independent noise model, even for global unitary noise.
Now we investigate the sample complexity of the estimation procedure. We are currently unable to bound the sample complexity against the most general noise channel, but we do have a bound for a local noise model, as shown in the following theorem: respectively, then the protocol can estimate M arbitrary linear functions The rigorous version of Theorem 4 is Theorem 10 in Appendix C. Again, we see RShadow using Cl ⊗n 2 has a sample complexity similar to the noiseless standard shadow estimation protocol when the noise is local and not too strong. We also remark that, although we do not have a sample complexity bound against a more general noise model, our numerical results show that RShadow can still perform well in that case (see Appendix E). Furthermore, in realistic experiments, one can monitor the standard deviation of estimators in real time, which means they can still suppress statistical fluctuations to an acceptable level even without a theoretical sample complexity bound.
Regarding the computational complexity, it is obviously impractical to calibrate all 2 n parameters f z . However, since we only care about k-local observables, onlyf (r) z such that |z| ≤ k needs to be computed, the number of which is no greater than n k . Further note thatf can be computed within O(n k ) time using dynamic programming. If there is extra structure of the observables to be predicted (e.g. spatially local), the number of necessarŷ f (r) z can be further reduced. In practice, one may store the raw data of the calibration procedure and see what observables are to be predicted, before deciding which set of f z need to be calculated. An example is given below in our numerical experiments. The computational complexity for the estimation procedure is therefore low when the observables are k-local for reasonably small k.

VI. ROBUSTNESS AGAINST STATE PREPARATION NOISE
In the last two sections, we prove the performance of the RShadow protocol based on the assumption of perfect |0 preparation. Although |0 is relatively easy to prepare on most current quantum computing platforms, state preparation (SP) noise is still inevitable. In this section, we show that the RShadow protocol is also robust against small SP noise in the following sense: when |0 can be prepared with high fidelity during the calibration procedure, the estimators for the estimation procedure will not be too biased, and the sample complexity will not increase drastically.
Formally, in a realistic calibration procedure, one prepares some ρ 0 instead of |0 0| for each round. We assume ρ 0 is time-independent, which is reasonable if the experimental conditions do not change much during this process. We have the following theorems: then with the same number of calibration samples as in Theorem 1, the subsequent estimation procedure with high probability satisfies up to the first order of ε and ε SP .
Theorem 6. For RShadow using Cl ⊗n 2 , if the state is prepared as a product state ρ 0 = n i=1 ρ 0,i and the singlequbit state-preparation fidelity satisfies then with the same number of calibration samples as in Theorem 3, the subsequent estimation procedure with high probability satisfies up to the first order of ε and kξ SP . k is the locality of observable O.
The proof is given in Appendix D. The above two theorems show that the effect of state-preparation noise can indeed be bounded for RShadow. They also enable an experimentalist to decide a practical sample number according to how well his device can prepare |0 0|.

VII. NUMERICAL RESULTS
Here, we design several numerical experiments to demonstrate the practicality of the robust shadow estimation (RShadow) protocol. We first benchmark the robustness of the RShadow protocol under various types of noise models in the task of estimating the fidelity of the GHZ state. After that, we show the application of RShadow in estimating the 2-point correlation as well as the energy of the ground state of the anti-ferromagnetic transverse-field Ising model (TFIM). These tasks frequently appear in the field of quantum computational chemistry [58]. In all the numerical experiments, we assume that the states to be tested are perfectly prepared while the shadow estimation circuits are noisy. We compare the performance of RShadow protocol with the standard quantum shadow estimation scheme (standard Shadow) [38] in all the tasks. Our numerical simulation makes use of Qiskit [59], an open-source python-based quantum information toolkit.
For all the plots in this section, the error bars represent the standard deviation of the estimation procedure (which means we ran the calibration procedure of RShadow only once for each data point), and are calculated via the empirical bootstrapping method [60], where we randomly samples the same size of data points with replacement from the original data and calculate the estimator as a bootstrap sample. Repeat this for B = 200 times, and take the standard deviation among these bootstrap samples as an approximation to the standard deviation of our RShadow estimator.
In the first experiment, we numerically prepare a 10qubit GHZ state, and use the shadow estimation protocol to estimate its fidelity with the ideal GHZ state. Each protocol use R = 10 5 (N = 10 4 , K = 10) samples for the estimation stage, while our RShadow uses an extra R = 10 5 (N = 10 4 , K = 10) samples for its calibration stage. We simulate the following three noise model: depolarizing, amplitude damping, and measurement bit-flip, each with several different levels of strength. The random circuits are set to be global Clifford gates. Fig. 2 shows the results. One can see that, for all these three noise models, when the noise level increases, the standard shadow estimation deviates from the true value, while the robust shadow estimation remains faithful. Note that there exist some numerical results from our robust procedure exceeding the ground truth in the above figure. That is due to the nature of the shadow protocol to eliminate the effect of noisy fidelity parameters {f λ }. To eliminate this fidelity parameters and extract the estimation of a desired observable, the protocol will use a ratio estimator, which tends to have results with systematically biased errors. Moreover, the statistical fluctuation will affect the estimation of our procedure. Fortunately, Theorem 2 and 4 allow us to bound the size of fluctuation errors along with the systematic biases. From practical consideration, the estimation results of the observables {ô (r) } are allowed to be truncated given some physical ranges from prior knowledge, and this can help to improve the accuracy and circumvent some non-physical estimation. A caution is that we cannot On the same task of estimating the GHZ-state fidelity, we further test the performance of our RShadow method when the size of system increasing from 4 qubits to 12 qubits. During the measurement procedure, we set a noise model where all the qubits undergo a local X-rotation U X (θ) = e −iθX . We remark that such kind of coherent noise can not be modeled as a classical error occurred in the measurement results. We fix the number of trials to be R = 10 5 (N = 2500, K = 40) for both the calibration and estimation stages. Meanwhile, we choose the rotation angle to be θ = π 25 , 2π 25 , and 3π 25 . In Fig. 3, we compare the fidelity estimation result of standard Shadow and RShadow. When local noises occur, the performance of standard Shadow decreases when the system size increases. In contrast, the estimation of RShadow is still accurate. This highlights the necessity of noise suppression especially when the system size gets larger. Here, we assume that all the qubits will experience a local X-rotation error UX (θ) = e −iθX with θ = π/25, 2π/25, and 3π/25. In the experiment, we set the number of trials R = 10 5 (N = 2500, K = 40) for both calibration and estimation stages.
The next experiment is designed for shadow estimation with local Clifford group. We estimate the 2-point ZZ-correlation functions and energy expectation of the ground state of an anti-ferromagnetic transverse field Ising model (TFIM) in one dimension with open boundary, whose Hamiltonian is H = J i Z i Z i+1 + h i X i and we focus on the case J = h = 1. The ground state is approximated using density matrix re-normalization group method, represented by a matrix-product state (MPS). Codes from [36] are modified here to sample random Pauli measurements on the MPS. We compare the performance of RShadow and the standard shadow estimation [38] scheme in the presence of measurement bit-flip noise, which means each qubit measurement outcome has an independent probability p to be flipped. Our RShadow uses R = 500000 (N = 20000, K = 25) calibration samples and R = 500000 (N = 10000, K = 50) estimation samples, while standard shadow estimation uses R = 500000 (N = 10000, K = 50) samples.
We first generate a 50-spin TFIM ground state, and es-timate the ZZ-correlation functions between the leftmost spin and all other spins Z 0 Z i , where the bit flip probability is set to be 5%. Fig. 4 shows the estimation values and absolute errors of both RShadow and standard Shadow.
It can been seen that RShadow in general gives a much more precise estimation than standard Shadow. We then estimate the energy expectation. In Fig. 5 we plot the energy estimation results on a 50-spin TFIM ground state under three different noise models (similar as above numerical experiments of GHZ fidelity estimation). One can see that the estimation error of standard Shadow increases when the noise level increases, while RShadow remains giving precise results. Then we fix the noise model to be 5% measurement bit flip and conduct estimation on different sizes of systems. In Fig. 6 we plot the absolute estimation error. This error increases when the system size grows for the standard Shadow, but it remains close to zero for RShadow scheme. This provides a strong reason why the RShadow scheme should be applied as the size of quantum system becomes increasingly large.
As a remark regarding the computational complexity, we do not calibrate all f z such that |z| ≤ 2, the number of which scales as O(n 2 ). Instead, we only calibrate the nearest-neighbor terms of f z for the energy expectation estimation, and the f z terms such that acts on the first qubit and any other qubit for the correlation function estimation. In both case, there are only O(n) parameters to be calibrated. Therefore, when the system size gets large, the RShadow protocol remains efficient.
To demonstrate the noise-resilience of RShadow scheme against 2-qubit correlated noise, we present more numerical results in Appendix E in the task of estimating the 2-point correlation function of the n-qubit GHZ state. These numerical experiments justify that the RShadow scheme can indeed mitigate the experimental errors and  reproduce faithful estimation with a small number of benchmarking trials.

VIII. GATE-DEPENDENT NOISE
Perhaps the strongest assumption we made is the gateindependence of the noise channel Λ with respect to the unitary gate U being sampled. In this section, we present numerical evidence showing that even with an experimentally realistic gate-dependent noise model, RShadow can still greatly reduce noise bias. Throughout this section, we focus on RShadow with the local Clifford group, which is experimentally implementable on most near-term platforms. The task we consider here is the electronic structure problem: decide the ground state energy of a molecule with an unknown electronic structure. This is a important problem in quantum chemistry, and is viewed as one of the most promising applications of near-term quantum algorithms, see e.g. [58]. Several recent works have already applied shadow estimation related methods to study this problem [61,62].
Specifically, we choose a benchmark molecule and use a certain encoding scheme to map the molecular Hamiltonian into a qubit Hamiltonian. Then, given the ground state of this Hamiltonian, we numerically run the (standard and robust) shadow estimation protocols to estimate its energy, and compare the estimation with the classically computed true value, in the presence of noise. In our setting, we choose H 2 and apply the Bravyi-Kitaev encoding [63] to map it to a 4-qubit Hamiltonian.
To come up with a realistic gate-dependent noise model, we first need to decide how the local Clifford group is implemented on real experimental platforms. One common approach is to decompose all unitary gates into a small set of generators. Here, we consider the generating set consisting of the following three single-qubit generators which can be understood as π/2 rotations along the X,Y,Z axes respectively. Every single-qubit Clifford gate can be decomposed into two subsequent rotations along two out of these three axes. For example, the Hadamard gate can be implemented by first applying a π/2 rotation pulse along the Y axis, and then a π rotation pulse along the X axis, which is in turn implemented by concatenating two π/2 X pulses. (See App. E 2 for more details.) This generating set is wildly used in real experiments.
Our numerical simulations will deal with the following two kinds of errors that naturally appear in experiments: 1. Pulse mis-calibration: The π/2-pulses have some fixed error due to e.g. an uncharacterized constant magnetic field. These noisy generators would look like for P = X, Y, Z and some traceless Hermitian operator ∆ 0 representing the uncalibrated Hamiltonian. Although ∆ 0 is the same for all three generators, the commutator [P, ∆ 0 ] is in general different for different P . Thus, one can verify that this is a gatedependent noise model by expanding R P using the Baker-Campbell-Hausdorff formula.
2. Random over-rotation: Due to imperfect pulse control, the actual rotation angle for each generator could be modeled as π/2 + δ for some zero-mean Gaussian random variable δ. The noisy generators look like We assume that the value of δ is re-sampled every time a generator is applied. One can verify that this noise model is equivalent to a dephasing noise on the eigenbasis of Pauli P following the noiseless generator R P , thus it is a gate-dependent noise model. (See App. E 2.) Our numerical results are presented in Fig. 7, where we plot the energy estimation outcome of both standard Shadow and RShadow in the presence of different levels of noise strength. The noise model is Pulse mis-calibration for the upper figure and Random over-rotation for the lower one. For both noise models, we use R = 30000 (N = 3000, K = 10) calibration samples and R = 10000 (N = 1000, K = 10) estimation samples for RShadow, and R = 10000 (N = 1000, K = 10) samples for standard Shadow [64]. The data points and the error bars are the average values and the standard deviations over 30 independent runs [65].  These numerical results provide evidence for the ad-vantage of RShadow over standard shadow estimation even with realistic gate-dependent noise. Specifically, for Pulse mis-calibration noises, it seems that RShadow cannot eliminate all biases when the noise strength becomes very large. (Note that, noise intensity ∆ 0 = 0.1π is already fairly high in practice.) Yet, even in this regime RShadow still significantly outperforms standard shadow estimation and greatly suppresses the bias in the estimated ground state energy. For Random over-rotation noise, RShadow completely eliminates all bias even at large noise strengths.
So why does RShadow work for these gate-dependent noise models? One possible explanation is as follows. The key subroutine of RShadow is to learn a Pauli channel. Even though the noise has strong gate-dependence, if the M channel defined in Eq. (15) is approximately a Pauli channel, and the random unitary gates are "good enough" to twirl the input probe state |0 into an approximate complex projective 2-designs, then we expect RShadow to still perform well. A more rigorous and comprehensive analysis of RShadow's noise-resilience against general gate-dependent noises is left for future work.

IX. CONCLUDING REMARKS
We have analyzed the shadow estimation protocol proposed in Ref. [38] by considering the gate and measurement errors occurring during the process, and have proposed a modified protocol that is robust against such noise. We have proven that, in both the global and the local random Clifford group version of the robust shadow protocol, we can efficiently benchmark and suppress the effects caused by the noise. On account of the broad application prospects of the shadow estimation protocol in predicting various important properties of quantum states, e.g., entanglement witness, fidelity estimation, correlation functions, etc., we expect that our robust protocol is practical and feasible for current experiments.
While we only focus on estimating linear properties in this work, RShadow can also be used to calibrate the estimation of higher-order properties such as the subsystem Rényi-2 entropy with similar methods shown in [38]. An exploration into the corresponding sample complexity bound is left for future studies. It is also interesting to explore how RShadow can be incorporated with other variants of shadow estimation such as the locallybiased classical shadow [61] and derandomized classical shadow [62]. These two methods can greatly improve the sample efficiency of shadow estimation when one has prior knowledge about which properties are to be predicted, like in the electronic structure problem. We believe these techniques can also be applied to RShadow and have been actively developing these methods.
The idea of using additional calibration processes and classical post-processing to eliminate noise effects also appears in the field of error mitigation [18][19][20][21]. Among them, it is particularly interesting to compare our work with Ref. [20] and Ref. [21], which mitigate the measurement readout error on multi-qubit devices using a measurement calibration (or detector tomography) process. The spirit of these works is quite similar to ours, but their assumptions on the noise model are much stronger, and their calibration algorithm is more like a heuristic one without an explicit bound on the sample complexity. One reason why we are able to obtain a useful sample complexity bound against a more general noise model is that the random twirling in RShadow greatly simplifies our analysis of the noise estimation. For future research, it is interesting to explore the relationship between RShadow and other error mitigation schemes, and see if any general results for error mitigation [66] can be applied to our scenario. Very recently, error mitigation has been shown to be helpful even for fault-tolerant quantum computing [67]. We expect RShadow to be a useful protocol in the fault-tolerant regime as well.
For our performance guarantee of the robust shadow estimation protocol, the noise in the random gates is allowed to be coherent and highly correlated, but cannot depend on the unitary gate to be implemented. This assumption is reasonable in many cases, especially in the protocol with local Clifford gates, where the noise is mainly caused by amplitude damping and decoherence of the system to the environment [68]. Nevertheless, it is important to analyze how gate-dependency and non-Markovianity of the noise can affect the performance of RShadow. We have provide some numerical evidences for RShadow's resilience against gate-dependent noise in Sec. VIII, and left more rigorous analysis for future research.
In the PTM representation, the picture of quantum state shadow estimation can be easily extended to the shadow estimation of quantum measurements and quantum channels. For example, in order to estimate O i |E|ρ j for some unknown n-qubit quantum channel E and a set of given observables {O i } and states {ρ j }, one may insert two random measurement channels into the expression, In the experiment, one can randomly prepare a computational basis state |y , apply a random unitary V and send to the channel E, then apply another random unitary U and measure in the computational basis, getting outcome |x . Then 2 −n O i |M −1 U † |x y|V † M −1 |ρ j is an unbiased estimator of O i |E|ρ j . This is only the most straightforward way to extend robust shadow estimation to quantum channels; there may exist other schemes that have even better performance. We believe a complete analysis of the channel version of shadow estimation will be an interesting direction for further study.
Finally, one can also consider applying (standard or robust) shadow estimation to qudit systems, Boson/Fermion systems and other continuous-variable systems using the techniques developed in this work.
Note added. -After posting this paper to arXiv, two related but independent works subsequently appeared. The independent work by Koh and Grewal [69] also studies how to mitigate noise in the shadow estimation protocol. In their work, the noise channel is assumed to be completely pre-characterized. The main results Theorem 1.1 and Theorem 1.2 is similar to our Theorem 2 and Theorem 4 if our noise calibration procedure is assumed to be done perfectly. The other independent work by Berg, Minev, and Temme [70] also contains some similar ideas as presented in our work. We thank the authors for communicating their work with us.

Groups and representations
The group representation theory plays an important role in the shadow estimation protocol. Denote a generic group as G = {g i } i , where g i is one of the group elements. Denote a unitary representation of G to be a map with the homomorphism Moreover, we denote all the irreducible representations (irreps.) of the group G as R G = {φ λ (G)} λ . The Maschke's Lemma ensures that, every representation of a group can be written as a direct sum of irreps, where m λ is an integer implying the multiplicity of the irrep φ λ .
In the later discussion, we will frequently come across the twirling of an linear operator O on Hilbert space H with respect to a group representation φ(G), As a result of the group structure G, the twirling result T φ (O) owns a simple structure, which is related to the irreps in φ(G). The following lemma is a corollary of Schur's lemma. [45], rephrased by [44]) For a finite group G and a representation φ of G on a complex vector space H with decomposition

Lemma 1. (Lemma 1.7 and Prop 1.8 in
where {φ λ } are the irreps of φ(G), m λ is the multiplicity of φ λ . Then for any linear map O ∈ GL(H), the twirling of O with respect to φ can be written as where Π j λ j λ is a linear map from the support of the j λ -th copy of φ λ to the support of the j λ -th copy of φ λ . In this work, we focus on the group representation φ with no multiplicities, In this case, Eq. (A6) can be simplified as where Π λ is the projector onto the support of φ λ . Here, we introduce some common groups that will be frequently used. Note that all the linear operators in L(H d ) form a Lie group GL(d, C). The unitaries in L(H d ) also form a Lie group U (d).
Denote Z 2 = {0, 1} to be the 2-element cyclic group. Z n 2 := (Z 2 ) ⊗n is the n-copy tensor of Z 2 group. Denote A = {a i } with {a i } the generators of the group. In the later discussion, we will also slightly abuse Z n 2 to denote the set of n-bit binary string.
For n-qubit quantum system, the Pauli group is with I, X, Y, Z the qubit Pauli matrices. Denote the quotient of P n to be P n = P n / i , which is an Abelian group and isomorphic to Z 2n 2 . Therefore, we will use a 2n-bit string to denote the elements in P n and choose the elements to be The multiplication and commutation of elements in P n follows, with a binary symplectic product. This symplectic product owns the following properties The n-qubit Clifford group Cl(2 n ) is defined to be Cl(2 n ) = {g|gP a g −1 ∈ P n , ∀P a ∈ P n }/U (1), where the U (1) represents the global phase. Obviously, P n is a subgroup of C n . The single-qubit Clifford group is then Cl 2 := Cl(2). Later we will also come across the tensor-ed n-fold single-qubit Clifford group Cl ⊗n 2 .

Random unitaries and t-designs
The shadow estimation is a direct application of twirling in random unitaries. The ideal "uniformly distributed" randomized unitaries over the Lie group GL(d, C) is characterized by Haar measure µ(H d ) [71]. The Haar measure is defined to be the unique countably additive, nontrivial measure of the group U such that, where f (U ) is any matrix function of U .
In practice, to sample unitaries with respect to Haar measure is challenging due to its continuity. Alternatively, one may choose to sample from a finite subset K = {U k } |K| k=1 over the unitaries in GL(d, C).

Definition 1. A finite subset
for all the polynomial f (t,t) (U ) of degree at most t in the matrix elements of U and at most t in the matrix elements of U * .
It has been proven that, the Clifford gate set Cl(d) ⊂ U(H) is a unitary 3-design [52,53], while fails to be a unitary 4-design [72].

Quantum channel and the representations
Quantum channels are the linear maps E : L(H d ) → L(H d ) which are completely positive and trace-preserving (CPTP).

Definition 2. Let E : L(H d ) → L(H d ) be a linear map. We say that 1. E is positive if E(ρ) ∈ D(H d ) for any ρ ∈ D(H d ).
2. E is completely positive (CP) if I d ⊗ E is positive, for all the dimension d .

E is a quantum channel if it is both CP and TP.
In this work, we will come across two representations of the quantum channels : Kraus representation and Liouville representation. For a quantum channel E : L(H d ) → L(H d ), its action on a linear operator O ∈ L(H d ) can be expressed as where {K t } k t=1 are the Kraus operators satisfying k t=1 K † t K t = I. To represent the effect of quantum channels in a convenient way, we first introduce the Pauli basis P n on L(H d ) to vectorize the linear operators in L(H d ). Define the inner product between two operators to be the Hilbert-Schmidt product Q, W := Tr(QW † ), ∀Q, W ∈ GL(H d ). (A18) In this case, the operators in P n form an orthogonal basis. We introduce the operators as the orthonormal basis. To vectorize the linear space spanned by {σ a }, we introduce the notation {|σ a }. For the single-qubit case, we will also use the following notations, Then the operators on L(H d ) can be vectorized as The quantum channel E can then be represented as with The matrix E is the Pauli-transfer matrix (PTM) or Pauli-Liouville representation. In this work, we slightly abuse the notation of a superoperator E to represent its PTM. For a unitary matrix U , we use the calligraphic U to represent its PTM.
For a quantum channel E with state ρ input, and POVM measurement M = {M b } with b M b = I, the probability to get the measurement result b is Under the PTM representation, the composition and tensor product of channels E 1 and E 2 can be naturally expressed as The PTM of the unitaries in U (d) forms a natural group representation of U (d). Denote the PTM of a given unitary U as φ P (U ) := U, we have The PTM representation φ P (U (d)) can be decomposed to two irreps, Here, the projectors Π I and Π σ are The n-qubit Clifford group Cl(2 n ), as the subset of n-qubit unitary group, can also be represented by the PTM matrices. The PTM representation φ P (Cl(2 n )) can be decomposed similarly, where φ P I and φ P σ are two irreps on the support Π I and Π σ , respectively.

Weingarten Function
In this part, we introduce the Weingarten function as a tool to calculate general Haar integrals [73][74][75]. The following presentation owes a lot to Section 2 of [76].
For an operator A acting on H ⊗k d , define the k-fold Haar twirling of A as Φ (k) Using Schur-Weyl duality, one can show that Φ (k) Here, S k is the k-element permutation group, and W π is the permutation operator defined as follows and the coefficients c π,σ are the Weingarten matrix [74] which can be calculated as where Q is called the Gram matrix. Q + stands for the Moore-Penrose pseudo inverse of Q, which is Q −1 when Q is invertible. (Note that, when Q is not invertible, c is not uniquely determined. It is only a conventional choice to take c = Q + [75,77]).
In this basis, the Gram matrix becomes For d ≥ 3, one can show that the Weingarten matrix becomes while for d = 2, Q is singular, so we take its pseudo inverse as follows

Appendix B: Sample Complexity of RShadow with Global Clifford Group
In this section, we study our robust shadow estimation protocol with G chosen to be the n-qubit Clifford group Cl(2 n ).

Calibration Procedure: Global
Recall that the channel M can be written on the Pauli basis as for some f ∈ R depending on Λ. Note that f = (d + 1) −1 when the noise channel is trivial, i.e. Λ = id. We rewrite the RShadow protocol from the main text as below

5.
After the above steps, apply the standard classical shadow protocol of [38] on ρ with the inverse channel M −1 replaced by in the Liouville representation.
In Protocol 1, the unitary operations and the measurement are assumed to contain gate-independent noise, and the preparation of |0 is assumed to be perfect. The next theorem shows thatf (r) is an unbiased estimator of f and its variance can be bounded.
Moreover, the single-round estimatorf satisfies Before we provide the proof of Proposition 1, we first introduce two lemmas.

Lemma 2. (see e.g. [72, Proposition 4]) If a group G ⊆ U (d) forms a unitary t-design, then
where P sym t is the projector onto the t-fold symmetric space, or equivalently, P sym t = 1 |St| π∈St W π where W π is the permutation operator defined in Eq. (A33).

Lemma 3. For two operators
We also have the following relation between the average fidelity of a channel M and the trace of its Pauli transformer matrix (see e.g. [44]), Combining the above two equations, we get hence Eq. (B3) follows directly from Eq. (B2). We only need to calculate the expectation and variance ofF (r) .
Denote the Kraus operators of the noise channel Λ as {K t }. The average fidelity of M can be explicitly written as follows and recalling the fact that Cl(2 n ) is a unitary 3-design [52][53][54], we are able to do the following calculations.
where we write F Z ≡ F Z (Λ) as the Z-basis average fidelity of Λ.
Now we can bound the variance ofF as follows (B17) where we use the fact that F Z ≤ 1. This completes the proof.
Now we analyse the sample complexity of Protocol 1 in order to guarantee the protocol to succeed within a given level of precision. Specifically, we consider using the protocol to estimate a linear function of ρ, i.e. O|ρ . Given that one makes sufficiently many samples in the estimation procedure, the estimation of this function will be close to O| M −1 M|ρ . Hence, we are concerned about the following error where O 0 = O − Tr(O) d I is the traceless part of O. Now we want to upper bound |f −1 f − 1| by some ε > 0. Suppose with high probability the estimator in Protocol 1 satisfies |f − f | ≤ γ for some 0 ≤ γ ≤ |f |. Then we have, where the last inequality is by the triangular inequality. Now if we have then we obtain the bound |f −1 f − 1| ≤ ε with high success probability. Now is the time to calculate the number of rounds R in order to bound |f − f | as we want with high confidence. As noted before, we will uses the median of means estimator [79,80] in order to get a preferable scaling with respect to the failing probability. Similar techniques are also applied in [38]. Specifically, we conduct R = KN rounds of the procedure in Protocol 1, calculate K estimators each of which is the average of N single-round estimatorsf , and take the median of these K estimators as our final estimatorf . In formula,f (r) , k = 1, 2, ..., K.

(B21)
The performance of this estimator is given in the following lemma.

Lemma 4.
( [79,80], rephrased by [38]) For the estimator described by Eq. (B21) wheref (r) is identical and independent sample of f , if N = 34Var(f )/γ 2 for any given γ > 0, then Further, by taking K = 2 ln(2δ −1 ) for any δ > 0, one have Thanks to this Lemma and the above discussion, we reach the following theorem which summarize the trade-off between precision and the sample complexity of our main protocol. This theorem is the rigorous version of Theorem 1 in the main text. Theorem 7. Given ε, δ > 0, the following number of rounds of calibration in Protocol 1 is enough for the asymptotic error of the subsequent estimation procedure to satisfy with a success probability at least 1 − δ, where F Z ≡ F Z (Λ) is the Z-basis average fidelity of the noise channel Λ.

Estimation Procedure: Global
Till now, we have proved the efficiency of the calibration procedure, but have not addressed the efficiency of the estimation procedure. In the noiseless case, the performance of the standard quantum shadow estimation protocol has been characterized in [38]. Here, we extend their methods to show the performance of the RShadow estimation procedure. For When Λ = id, the function · shadow,Λ degrades to the norm · shadow defined in [38].
Proof. First observe that the variance ofô (r) from Algorithm 2 only depends on the traceless part of O: which is because M is diagonal in Pauli transfer matrix representation, and M is a trace-preserving map. Therefore, In the special case that G := Cl(2 n ), we can obtain the following bound on the shadow norm · shadow,Λ . Lemma 6. For RShadow using Cl(2 n ), if the calibration procedure guaranteesf ≥ δf for some δ > 0, and we assume for any observable O.
Proof. From the definition of the noisy shadow norm and using the Weingarten functions from Eq. (A32) we have where in the last equation we use the Weingarten function to expand the Haar intergral (see Eq. (A32)). Now using Eq. (B8) to compute the traces appearing above, we have Recall that F Z (Λ) is the Z-basis average fidelity of Λ as defined in Prop. 1, and we denote it simply as F Z in the following. Then we also have where the first inequality is by the fact that Tr(σO 2 0 ) ≤ O 2 0 ∞ ≤ Tr(O 2 0 ) and the assumption F Z ≥ 1 d , and in the second equality we use the expression of f from Proposition 1.
Compared to Proposition 1 from [38] which states that O 0 2 shadow ≤ 3 Tr(O 2 0 ), we conclude the following. As long as the noise channel Λ has a Z-basis fidelity that is not too low and the noise calibration procedure is conducted with sufficiently many rounds, then the estimation procedure of our RShadow protocol using Cl(2 n ) is as efficient as the noiseless standard quantum shadow estimation protocol [38] up to a small multiplicative factor. That is to say, the expectation value of any observable O that has small Hilbert-Schmidt norm can be efficiently estimated by RShadow.
To complete the discussion, we give the following theorem as a rigorous version of Theorem 2 in the main text.
Proof. First, according to Theorem 7, for the given number of samples R C one have Meanwhile, from the proof of Theorem 7 (see Eq. (B19)), one also have Both of the above equations hold simultaneously with probability at least 1 − δ 1 . Now, by Lemma 5 and Lemma 6, the single-round estimators in the estimation procedure satisfy: So we set the median of mean estimatorsô i of the estimation procedure with the following parameters: Then Lemma 4 combined with the union bound gives that the following holds for all i with probability at least 1 − δ 2 : Combining Eq. (B39) and Eq. (B43) using the triangular inequality gives which holds with probability at least 1 − δ 1 − δ 2 . This completes the proof. and f z is the Pauli fidelity. In the noiseless case, one can obtain f z = 3 −|z| where |z| is the number of 1 in z.
Notation: For any string m ∈ {0, 1} n we define |m to be the Liouville representation of the computational basis state |m , while |σ m stands for the normalized Pauli operator corresponding to P m := n i=1 P mi Z . On the other hand, the notation of z in this section consistently stands for an n-bit string and should not be confused with the Pauli-Z index.
The RShadow protocol using local Clifford group can be written as follows.
in Protocol 2 above, we have that the expectation value over the experiments is given by To derive the expression forf z that depends on the noise channel Λ, we can alternatively expand the expectation as follows, To evaluate this expression, we first consider the single-qubit case. By direct calculation we obtain Hence, for any X ∈ Herm(2) and b ∈ {0, 1}, by Lemma 3, (C8) Applying this to the n-qubit case, one can then verify that To compute the variance, we compute Again, first consider the single-qubit case. One can verify that Hence, for any X ∈ Herm(2) and b ∈ {0, 1}, by Lemma 3, Tr(X). (C12) One can also verify these equations using the Weingarten matrix. Applying to the n-qubit case, one can verify that Tr(Λ † (|b b|)) Since E(f (r) 2 z By setting γ z = ε 1 + ε |f z |, the above equation is upper bounded by ε. Therefore, if we set for the median of mean estimator in Eq. (C14) and Eq. (C15), by Lemma 4 we have |1 −f −1 z f z | ≤ ε with a success probability at least 1 − δ. Now we want all z ∈ {0, 1} n such that |z| ≤ k to statisfy this inequality. The number of such strings is no larger than n k , so we set and apply the union bound. Now we have |1 −f −1 z f z | ≤ ε for all |z| ≤ k with probability at least 1 − δ. Our final upper bound of the sample complexity is which completes the proof.
The quantity Γ Λ (z) can be lower bounded when Λ is close to an identity channel, as shown by the following lemma.

Lemma 7. if the Z-basis average fidelity of
Proof.
where the second equality is by the fact that Λ is trace-preserving, and hence b∈{0,1} n b|Λ|x = 1.
Specifically, if we substitute the bound for Γ Λ (z) from Lemma 7 into the above theorem, we get Theorem 3 in the main text. We conclude that our Protocol 2 can mitigate the noise in the computation of the expectation of any k-local observable efficiently, given that k is small and the noise is weak.

Estimation Procedure: Local
Now we consider the RShadow estimation procedure using Cl ⊗n 2 . Thanks to Lemma 5, we only need to characterize · 2 shadow,Λ . Due to technical difficulties, we are currently not able to bound · 2 shadow,Λ for the most general noise channel Λ, but we do have results for local noise channel Λ (hence also for any separable Λ by linearity). Suppose Λ ≡ n i=1 Λ i , and denote the Z-basis fidelity of the qubit channels Λ i as F Z,i . Further assume O is k-local, which means it is non-trivially supported on only k qubits. We have Consider the single-qubit case, one have where we use the Weingarten function to expand the Haar integral, see Eq. (A32), and the value of the Weingarten matrix is from Eq. (A38).
For any X ∈ Herm(2 n ) and single-qubit Pauli operators P p , P q , we want to calculate the following quantity Tr [(X ⊗ P p ⊗ P q )Φ i ]. By direct calculation using Eq. (C30), one can verify that there are following four different cases This indicates that, the value Tr [(X ⊗ P p ⊗ P q )Φ i ] is non-zero if and only if the two single-qubit Pauli operators P p and P q commute.
Now we return to the evaluation of Eq. (C29). Our strategy is similar to [38]. We first decompose O into the Pauli operator basis (Note that, we use un-normalized Pauli operators here) Since O is k-local, one have α p = 0 for all |p| > k, where for any p ∈ Z 2n 2 we denote the Pauli weight of P p as |p|. Also recall from Eq. (C3) that where we define z as the following mapping The intuition is that after twirling over the local Clifford group the Pauli X, Y, Z indexes are symmetrized.
Here, for the second equality, we apply the single-qubit result from Eq. (C31). The functional δ(p, q) equals to 1 if P pi commutes with P qi for all i ∈ [n] and equals to 0 otherwise, and we have the following definitions |p ∨ q| := #{i ∈ [n] : P p,i = I or P q,i = I}. |p ∧ q| := #{i ∈ [n] : P p,i = I and P q,i = I}.
The third equality is by the dual characterization of the operator norm. The first inequality is by the fact that the operator norm of a Pauli operator is 1. The second inequality is by relaxing F Z,i to 1 and noticing that |p ∧ q| = |p ∨ q| − |p| − |q|. The last inequality uses the k-local property of O.
The first factor of Eq. (C35) can be bounded using the same method as in [38]. We reproduce their proof here for the convenience of the reader. Without loss of generality, suppose O is supported on the first k qubits, and hence can be written as O =Õ ⊗ I 2 n−k . The decomposition ofÕ is denoted as For any two q, s ∈ Z 2n 2 we write q s if one can obtain P q from P s by replacing some single-qubit Paulis of P s with I. Then, where in the first equality we restrict our attention to the first k qubits, the second equality can be verified by checking the coefficients of every |α p ||α q |, the first inequality is by Cauchy-Schwarz inequality, the third and fourth equality is by simple combinatoric arguments. For the last line, the first equation follows from the definition ofÕ, the inequality follows from the relationship between the Hilbert-Schmidt norm and the operator norm, and the last equality is by the fact that the largest eigen value of O equals to that ofÕ. On the other hand, suppose the preceding calibration procedure guaranteesf z ≥ δf z for all |z| ≤ k for some postive number δ close to 1. Then the second term of Eq. (C35) can be bounded as follows by Proposition 2, Since Λ is assumed to be local noise, we have the following bound for Γ Λ (z), which could be better than Lemma 7, Proof.
where the third equality is by the fact that Λ i is trace-preserving, and hence x,δ∈{0,1} x ⊕ δ|Λ|x = 2, so we can eliminate those indexes i such that z i = 0.
Combine Lemma 8 with Eq. (C39), we get the following lemma: (Note that we substitute O with its traceless part O 0 in order to use Lemma 5 later.) Lemma 9. For RShadow using Cl ⊗n 2 , suppose the noise is local, i.e. Λ := n i=1 Λ i , and satisfies F Z (Λ i ) ≥ 1 − ξ for all i ∈ [n] and some 0 ≤ ξ < 1 2 . Then, if the calibration procedure guaranteesf z ≥ δf z for all |z| ≤ k and some δ > 0, we have for any k-local observable O.
Compared to Proposition 2 from [38] that O 0 2 shadow ≤ 4 k O 2 ∞ , we conclude that, when the separable noise channel Λ has not too low Z-basis fidelity per qubit and the noise calibration procedure is conducted sufficiently many rounds, the estimation procedure of our RShadow protocol using Cl ⊗n 2 is as efficient as the noiseless standard quantum shadow estimation protocol [38] up to a small multiplicative factor. That is to say, expectation value of any observable O located on a k-qubit subsystem can be efficiently estimated.
To complete the discussion, we give the following theorem as a rigorous version of Theorem 4 in the main text.  Proof. First, according to Theorem 9, for the given number of samples R C one have Note that we apply the bound for Γ Λ (z) from Lemma 8. Meanwhile, from the proof of Theorem 9 (see Eq. (B19)), one also have Both equations hold simultaneously with probability at least 1 − δ 1 . Now, by Lemma 5 and Lemma 9, the single-round estimators in the estimation procedure satisfy So we set the median of mean estimatorsô i of the estimation procedure with the following parameters: Then Lemma 4 combined with the union bound gives that the following holds for all i with probability at least 1 − δ 2 : Combining Eq. (C44) and Eq. (C48) using the triangular inequality gives which holds with probability at least 1 − δ 1 − δ 2 . This completes the proof.
4kξ ≈ e 4kξ . That is how we get the bound in Theorem 4.

Appendix D: The effect of state preparation noise
In this section, we will prove Theorem 5 and Theorem 6 in the main text establishing the robustness of RShadow against state preparation noise in the calibration procedure. Let's first fix the notations: We assume |0 is experimentally prepared as some other state ρ 0 which is fixed over time, and we will use a subscript "SP" to denote the state-preparation noisy version of our estimators. For example, M SP = λ∈R Gf λ,SP Π λ is our estimation for the physical channel M := λ∈R G f λ Π λ when the calibration process suffers from state preparation error.

Robustness of RShadow with Global Clifford Group
Lemma 10. For RShadow using Cl(2 n ), if the state-preparation fidelity satisfies then the SP-noisy single-round estimatorf Proof. According to the calibration procedure described in Algorithm 2 or Protocol 1 of App. B , we have .
One can immediately conclude that f ≥ E(f (r) The second moment ofF where we define F 0 := 0| ρ 0 |0 and F Z := F Z (Λ). Therefore, The following theorem is a more detailed formalisation of Theorem 5 in the main text.
Theorem 11. For RShadow using Cl(2 n ), if the state-preparation fidelity satisfies then with R =Õ(ε −2 F −2 Z ) calibration samples, the subsequent estimation procedure with high probability satisfies up to the first order of ε and ε SP for any observable O. We have assumed F Z := F Z (Λ) 1/d. (D8) According to Lemma 4, by taking the parameters of the median of mean estimators as N = 34Var(f (r) SP )ε −2 f −2 , K = 2 ln(2δ −1 ), (D9) the following holds with probability at least 1 − δ, We also have, from Lemma 10, that Therefore, our final bound is as claimed The sample complexity is for F Z := F Z (Λ) 1/d. Here we have used Lemma 10 and Proposition 1 to bound Var(f (r) SP ) and f , respectively.

Robustness of RShadow with Local Clifford Group
Note that, we consider a local state-preparation noise model for the results in this section, i.e., no cross-talk between qubits.
Lemma 11. For RShadow using Cl ⊗n 2 , if the prepared state is in a product form, i.e., ρ 0 = n i=1 ρ 0,i , and the single-qubit state-preparation fidelity satisfies for some ξ SP < 1/2, then the SP-noisy single-round estimatorf (r) z,SP satisfies Proof. According to the calibration procedure described in Algorithm 2 or Protocol 2 of App. C , we have E(f To calculate the second moment, we can first investigate the single-qubit case: Haar (ρ 0,i ⊗ P Z ⊗ P Z ), To further simplify the second expressions, one can verify that Tr( W (ρ 0,i ⊗ P Z ⊗ P Z )) = 0 0 0 2 1 1 , where W is defined in Eq. (A35). Calculating the Haar integral using Eq. (A32), one immediately notice that the form of ρ 0,i has nothing to do with the result. So we can safely replace all ρ 0,i with |0 0| and retrieve the result with no state preparation error: E(f The sample complexity is where we have used Lemma 11 and Proposition 2 to bound Var(f (r) z,SP ) and f z , respectively. The second inequality is by Lemma 7. We remark that one can alternatively use a stronger bound given in Lemma 8 when the noise model is assumed to be local.

Gate-dependent noise: more details
Here, we present more details about the gate-dependent noise simulations in Sec. VIII of the main text.
Gate decomposition: In our simulations, we decompose each single-qubit Clifford gate using the following three generators R P π 2 = exp(−i π 4 P ), P = X, Y, Z .
To see how this works, note that each single-qubit Clifford gate can be uniquely specified by its conjugation on Pauli Z and Pauli X. In notations, any single-qubit Clifford gate C can be equivalently described by , as shown in Fig. 9. Gate-dependence of the noise model: We claim in the main text that both pulse mis-calibration and random over-rotation are gate-dependent noise model. Here we explain in more details. The noisy generators for pulse mis-calibration error are { R P ( π 2 ) = exp(−i 1 2 ( π 2 P + ∆ 0 )), P = X.Y.Z}.
By the Baker-Campbell-Hausdorff formula, this can be expanded as As long as [P, ∆ 0 ] is different for different P , the noise channel is different for different generators. We also notice that the gate-dependent effect only appears in higher-order terms.
Different directions of ∆ 0 : In the numerical results of Fig. 7, we fixed a random direction for ∆ 0 and only varied its magnitude. Here we show that there is nothing special about the direction we picked. We uniformly sample 10 unit vectors v and set ∆ 0 to be 0.1π × (v 1 X + v 2 Y + v 3 Z). For each of these noise settings, we use R = 30000 (N = 3000, K = 10) calibration samples and R = 10000 (N = 1000, K = 10) estimation samples for RShadow, and R = 10000 (N = 1000, K = 10) samples for standard Shadow. The performance is shown in Fig. 10 in which the average and standard deviation is calculated over 10 independent runs. For all cases, RShadow gives a more accurate estimate than standard Shadow.  Table I .