Shadow tomography on general measurement frames

We provide a new perspective on shadow tomography by demonstrating its deep connections with the general theory of measurement frames. By showing that the formalism of measurement frames offers a natural framework for shadow tomography, in which ``classical shadows'' correspond to unbiased estimators derived from a suitable dual frame associated with the given measurement, we highlight the intrinsic connection between standard state tomography and shadow tomography. This perspective allows us to examine the interplay between measurements, reconstructed observables, and the estimators used to process measurement outcomes, while paving the way to assess the influence of the input state and of the dimension of the underlying space on estimation errors. Our approach generalizes the method described in [H.-Y. Huang {\it et al.}, Nat. Phys. 16, 1050 (2020)], whose results are recovered in the special case of covariant measurement frames. As an application, we demonstrate that a sought-after target of shadow tomography can be achieved for the entire class of tight rank-1 measurement frames: namely, it is possible to accurately estimate a finite set of generic rank-1 bounded observables without the number of required samples growing with the state dimension.


I. INTRODUCTION
The reliable reconstruction of the information encoded in a quantum register is one of the stepping stones of any quantum information processing device. In this respect, quantum state tomography (QST), that is, the task of estimating quantum states from a measured dataset, is the gold standard for verification and benchmarking of quantum devices [1-3]. QST has been performed in countless experiments by measuring a complete set of observables whose expectation values determine the quantum state.
As the typical representation of density matrices involves a number of coefficients exponential in the number of constituent subsystems, the standard formulation of tomography [4] of a generic state requires time exponential in the system size. Alternative methods based on efficient representations of multiparty quantum states, such as matrix product states [5], have led to improved schemes for state tomography. Such an advantage, however, is achieved only for those states that are efficiently represented in the chosen ansatz. On the other hand, performing QST of d-dimensional quantum states within error ε (in trace distance) requires a number of copies of the unknown state that scales polynomially with d [2,4]. In this context, tight lower bounds for single-copy non-adaptive state reconstruction have been proven [6-9].
Often, though, fully reconstructing the state of a register is excessive, as knowledge of specific features or figures of merit suffices for tasks such as process validation and performance characterization. In such cases, fewer copies of the state might be required [10], thus easing the demands on a given experimental implementation. In particular, the number of measurements required to estimate the expectation values of M observables within error ε was shown to scale logarithmically with M and, most importantly, not to depend explicitly on the dimension of the space; this task is referred to as "shadow tomography" in the literature [11].
An explicit way to implement shadow tomography has been constructed using random projective measurements, realized via random Clifford circuits, to efficiently estimate expectation values and other quantities of interest of unknown many-qubit states [12,13]. A recent review discussing some of the relations between state tomography and shadow tomography can be found in Ref. [14]. It was recently pointed out that shadow tomography protocols can be worked out, and the main scaling results obtained, for general quantum measurements [15,16].
Here, we further ground shadow tomography as a tool for agile property reconstruction by highlighting its deep connection with the approach of state tomography via measurement frames [17-23]. Our formalism reduces to the standard approach of [13] in special cases, and is compatible with its generalizations proposed in [15,16]. We demonstrate that this general formalism provides a simple framework to understand the relationship between the measurement, the target observable, and the estimator used to post-process measurement outcomes, as well as how the input state and the dimension of the underlying space affect the estimation error. This approach also connects directly with general metrological considerations, showing how the techniques used to compute classical shadows can be seen as optimal unbiased linear estimators. In turn, these are connected to the Fisher information matrix for a specific choice of parametrization. This formalism can also potentially be of great use to study the efficiency of state estimation schemes involving generalized measurements and single-setting measurement schemes, which have recently attracted significant attention [24,25].
More specifically, we take the analysis of measurement frames developed for state tomography and specialize it to analyze estimation errors for shadow tomography tasks. We discuss how the MSE matrix, a quantity defined to study state tomography whose trace gives the estimation error, also proves to be a powerful tool to study errors in shadow tomography. We show how, for any choice of measurement, multiple unbiased estimators can be used to post-process the data to recover the target observables, and discuss how to find the optimal estimators with respect to the prior knowledge on the input state. For example, our approach clearly shows how the shadow norm of an observable introduced in Ref. [13] is the variance for a particular (optimal) estimator and measurement, maximized over initial states to capture the worst-case scenario. Furthermore, having fixed a choice of optimal estimator, we discuss how errors scale with respect to different choices of measurement protocol. A crucial feature of shadow tomography is the favorable scaling of estimation errors with the dimension of the state. Focusing on this aspect, we derive explicit bounds for best- and worst-case estimation errors corresponding to different measurement choices, which in particular show how to choose the measurement so as to remove the dependence on the state dimension from the variance of the optimal estimators.
The remainder of this manuscript is organized as follows. In section II we present a reformulation of shadow tomography using the formalism of measurement frames, highlighting the tight connections between the two approaches, as well as with more standard tomographic considerations. In section III we show how this formalism allows one to analyze error scalings in general scenarios, as well as to find the optimal error scalings for both tomography and shadow tomography tasks. Finally, in section IV we show explicitly how the formalism introduced in [13] can be seen as a special case of our approach when covariant measurements and canonical estimators are used. Additional in-depth discussions about derivations and the formalism used throughout the paper can be found in the appendices.

II. SHADOW TOMOGRAPHY ON MEASUREMENT FRAMES
In this section, we demonstrate explicitly how the formalism of measurement frames provides a natural framework for discussing shadow tomography on general quantum measurements. The central idea of shadow tomography [11] is to compute unbiased estimators for the target observables that operate in the "single-shot regime", that is, on individual measurement outcomes. Because they do not require recovering a tomographically complete description of the state, such specialized estimators allow the target quantities to be estimated much more efficiently. As mentioned, an explicit protocol to perform shadow tomography with Clifford circuits was recently proposed in [12,13], and some generalizations to arbitrary measurements were proposed in [15,16]. Here, we demonstrate that frame theory [26,27], and in particular the formalism of measurement frames [17, 28-30], provides a remarkably simple conceptual framework for shadow tomography, and allows us to directly view the "classical shadows" as the unbiased estimators that constitute the elements of a dual measurement frame.
Notation - In this section, we consider finite-dimensional states and measurements with a finite number of outcomes; this constraint can be relaxed without significantly changing the formalism. Following the notation of [31], we denote by $\mathrm{Herm}(\mathbb C^d)$ the real vector space of Hermitian operators acting on $\mathbb C^d$, by $\mathrm{Pos}(\mathbb C^d)$ the set of positive semidefinite operators acting on the same space, and by $D(\mathbb C^d)\subset \mathrm{Pos}(\mathbb C^d)$ the subset of density matrices. To focus on the linear-algebraic properties involved in the calculations, we use the notation $\langle X, Y\rangle \equiv \mathrm{tr}(X^\dagger Y)$ for the Hilbert-Schmidt inner product between a pair of operators $X, Y$. We denote POVMs with $\ell$ outcomes as $\mu \equiv (\mu_a)_{a=1}^{\ell}$, where $\mu_a \in \mathrm{Pos}(\mathbb C^d)$ and $\sum_a \mu_a = I$. Given a state $\rho \in D(\mathbb C^d)$, the associated outcome probabilities are $p_a(\rho) = \langle \mu_a, \rho\rangle$. Any procedure involving an arbitrary evolution followed by a measurement in some basis can be concisely modeled using such POVMs.
Frame theory - In linear algebra, a frame [26,27,32] for a vector space $V$ is a collection of vectors $(v_k)_k \subset V$ such that
$$ A\|v\|^2 \le \sum_k |\langle v_k, v\rangle|^2 \le B\|v\|^2 \qquad \forall v \in V, $$
for some $0 < A \le B < \infty$. Frames can informally be thought of as overcomplete bases: sets of vectors spanning the space that provide a straightforward linear decomposition for all other vectors. For finite frames in finite-dimensional spaces, a set $(v_k)_k$ is a frame iff it spans $V$ [26], although this is not true in general. Given any frame $(v_k)_k$, any $v \in V$ can be linearly decomposed as
$$ v = \sum_k \langle v_k, v\rangle\, \tilde v_k, \tag{1} $$
where $(\tilde v_k)_k$ is another frame, referred to as a dual frame of $(v_k)_k$. A frame $(v_k)_k$ admits infinitely many dual frames iff it is not linearly independent, i.e., iff it is "overcomplete". If we want to estimate a given unknown state $\rho$ from measurement outcomes, a natural class of objects to study are unbiased estimators. In this context, these are functions $f:\Sigma \to \mathrm{Herm}(\mathbb C^d)$, with $\Sigma$ the set of measurement outcomes, that reproduce the input state on average:
$$ \sum_{a\in\Sigma} p_a(\rho)\, f(a) = \rho \qquad \forall \rho \in D(\mathbb C^d). \tag{2} $$
The elements of a POVM $\mu \equiv (\mu_a)_{a\in\Sigma}$ are vectors in $\mathrm{Herm}(\mathbb C^d)$, and they linearly span the space of Hermitian operators iff the POVM is informationally complete (IC) [31]. We can therefore think of $\mu$ as a frame of operators in the real space $\mathrm{Herm}(\mathbb C^d)$ equipped with the Hilbert-Schmidt inner product. Such frames of operators are referred to as measurement frames [17-19, 21, 33, 34]. Comparing eqs. (1) and (2), the task of finding unbiased estimators is thus equivalent to that of finding dual measurement frames for a given IC-POVM $\mu$. A notable choice of dual frame is the canonical dual frame $(\mu^\star_a)_{a\in\Sigma}$, defined via the frame superoperator $F$ as
$$ \mu^\star_a \equiv F^{-1}(\mu_a), \qquad F(X) \equiv \sum_{a\in\Sigma} \langle \mu_a, X\rangle\, \mu_a. \tag{3} $$
The frame superoperator can also be conveniently written as $F = \sum_a P(\mu_a)$, where $P(Y) \equiv |Y\rangle\!\rangle\langle\!\langle Y|$ denotes the superoperator $X \mapsto \langle Y, X\rangle\, Y$. As discussed in section III, for the class of so-called "tight measurement frames", it is possible to find an optimal choice of unbiased estimator such that the estimation errors do not explicitly depend on the dimension of the states. This discussion highlights that the task of choosing a suitable unbiased estimator to post-process the data is completely equivalent to the task of finding a suitable dual measurement frame. As it turns out, for the purposes of estimating the state or its properties, the estimator given by the canonical dual frame in eq. (3) is in fact not the ideal choice: different dual measurement frames provide smaller estimation errors. This is discussed in detail in section III.
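The objects in eqs. (1)-(3) can be checked numerically on a small example. The following sketch is illustrative only and rests on assumptions not made in the text: superoperators are represented as $d^2\times d^2$ matrices acting on row-major vectorized operators, and the IC-POVM is taken to be the qubit measurement made of the six Pauli-eigenstate projectors, each weighted by 1/3. The helper names `pauli6_povm`, `frame_superoperator`, and `canonical_dual` are hypothetical. The code builds $F=\sum_a P(\mu_a)$, computes the canonical dual frame, and checks the unbiasedness condition of eq. (2).

```python
import numpy as np

def pauli6_povm():
    """Qubit POVM: projectors onto the six Pauli eigenstates, each weighted by 1/3."""
    s = 1 / np.sqrt(2)
    kets = [np.array([1, 0]), np.array([0, 1]),
            np.array([s, s]), np.array([s, -s]),
            np.array([s, 1j * s]), np.array([s, -1j * s])]
    return [np.outer(k, k.conj()) / 3 for k in kets]

def frame_superoperator(povm):
    """F = sum_a P(mu_a), as a d^2 x d^2 matrix acting on row-major vectorized operators."""
    return sum(np.outer(m.ravel(), m.ravel().conj()) for m in povm)

def canonical_dual(povm):
    """Canonical dual frame elements mu*_a = F^{-1}(mu_a)."""
    d = povm[0].shape[0]
    F_inv = np.linalg.inv(frame_superoperator(povm))
    return [(F_inv @ m.ravel()).reshape(d, d) for m in povm]

mu = pauli6_povm()
dual = canonical_dual(mu)

# Unbiasedness: sum_a p_a(rho) mu*_a should reproduce rho for any state rho.
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
probs = [np.trace(m @ rho).real for m in mu]          # p_a(rho) = <mu_a, rho>
recon = sum(p * f for p, f in zip(probs, dual))
print(np.allclose(sum(mu), np.eye(2)), np.allclose(recon, rho))  # True True
```

For this highly symmetric POVM the canonical dual elements come out as $3P(\psi_a) - I$ (e.g. `dual[0]` is `diag(2, -1)`); for a generic IC-POVM no such closed form is expected.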
Applying the framework in practice - To obtain the target estimates in a practical scenario, the idea is to (1) collect a sample of experimental data in the form of a finite set of outcomes $\{b_1, \dots, b_N\}$, (2) compute the "classical shadow" corresponding to each outcome, that is, the operators $f(b_k)$ for all $k = 1, \dots, N$, and finally (3) compute the sample average of the shadows, $\frac1N \sum_{k=1}^N f(b_k)$. Assuming each outcome is drawn independently from the same probability distribution (which is the case in this context, where we employ non-adaptive single-shot measurements), this sample average has variance $\frac1N \mathrm{Var}[f]$, where
$$ \mathrm{Var}[f] \equiv \sum_{b\in\Sigma} p_b(\rho)\, \|f(b) - \rho\|_2^2. \tag{4} $$
If we can suitably bound $\mathrm{Var}[f]$, we then obtain performance guarantees for the additive estimation error using standard statistical bounds, such as Chebyshev's, Hoeffding's, or Bernstein's inequalities, or by employing median-of-means estimators to obtain improved tail bounds. A recent discussion of these statistical bounds and their applications to quantum state estimation can be found in [14].
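The three-step procedure above can be sketched as a short simulation. This is a minimal illustration under assumed inputs: the same hypothetical qubit POVM of six Pauli-eigenstate projectors weighted by 1/3, whose canonical dual elements (used here as the classical shadows $f(b)$) are $3P(\psi_b) - I$, a fixed test state `rho`, and a seeded NumPy generator standing in for the experiment. The function name `estimate` is hypothetical. It compares the sampled squared error with $\mathrm{Var}[f]/N$.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical qubit example: six Pauli-eigenstate projectors weighted by 1/3.
# For this POVM the canonical dual elements are 3|psi_a><psi_a| - I.
s = 1 / np.sqrt(2)
kets = [np.array([1, 0]), np.array([0, 1]), np.array([s, s]),
        np.array([s, -s]), np.array([s, 1j * s]), np.array([s, -1j * s])]
mu = np.array([np.outer(k, k.conj()) / 3 for k in kets])
shadows = np.array([3 * np.outer(k, k.conj()) - np.eye(2) for k in kets])

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
probs = np.array([np.trace(m @ rho).real for m in mu])
probs /= probs.sum()  # guard against floating-point round-off

def estimate(n_samples):
    # (1) draw i.i.d. outcomes b_1, ..., b_N from p_a(rho) = <mu_a, rho>
    outcomes = rng.choice(len(mu), size=n_samples, p=probs)
    # (2) look up the classical shadow f(b_k) of each outcome, (3) average them
    return shadows[outcomes].mean(axis=0)

# Exact Var[f] = sum_b p_b ||f(b) - rho||_2^2, for comparison with the sampled error
var_f = sum(p * np.linalg.norm(f - rho) ** 2 for p, f in zip(probs, shadows))

for n in (100, 10_000):
    err2 = np.linalg.norm(estimate(n) - rho) ** 2   # squared Hilbert-Schmidt error
    print(f"N={n}: squared error {err2:.2e}, Var[f]/N = {var_f / n:.2e}")
```

Here $\mathrm{Var}[f] = 5 - \mathrm{tr}(\rho^2) = 4.32$, so the squared error is expected to fluctuate around $4.32/N$.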

III. ERROR SCALING
Optimal estimators for tomography - It was shown [17,21,28] in the context of state tomography that the operators $\tilde\mu^{(\rho)}_b$ defined as
$$ \tilde\mu^{(\rho)}_b \equiv \frac{F_\rho^{-1}(\mu_b)}{\langle \mu_b, \rho\rangle}, \qquad F_\rho \equiv \sum_b \frac{P(\mu_b)}{\langle \mu_b, \rho\rangle}, \tag{5} $$
give an unbiased estimator that minimizes the $L_2$ state estimation error if the input state is $\rho$ and the measurement is $\mu$. Here $F_\rho$ is the frame superoperator associated with the rescaled measurement frame with elements $\mu_b/\sqrt{\langle \mu_b, \rho\rangle}$. Note that $\tilde\mu^{(\rho)} \equiv (\tilde\mu^{(\rho)}_b)_b$ is a dual measurement frame for $\mu$, but not its canonical dual measurement frame: it is a suitably rescaled version of the canonical dual of the rescaled measurement frame with elements $\mu_b/\sqrt{\langle \mu_b, \rho\rangle}$. To use $\tilde\mu^{(\rho)}$, one needs to already have a good guess of the underlying state $\rho$ being measured. We can interpret $\rho$ as the prior information on the input state, so that $\tilde\mu^{(\rho)}$ is the optimal unbiased estimator given this prior information [35]. A convenient tool to study the precision of an estimator is the MSE matrix. Following [21], this is defined with respect to a generic dual frame $\tilde\mu$ and state $\rho$ as
$$ C_\rho \equiv \sum_b \langle \mu_b, \rho\rangle\, P(\tilde\mu_b) - P(\rho). \tag{6} $$
While we do not write the functional relationship explicitly, $C_\rho$ depends on the choice of $\mu$, $\tilde\mu$, and $\rho$. The expected $L_2$ state estimation error associated with the estimator $f(b) = \tilde\mu_b$ can be written concisely using the MSE matrix as
$$ \mathcal E \equiv \sum_b \langle \mu_b, \rho\rangle\, \|\tilde\mu_b - \rho\|_2^2 = \mathrm{tr}(C_\rho). \tag{7} $$
When using the optimal unbiased estimator $\tilde\mu_b = \tilde\mu^{(\rho)}_b$, the MSE matrix simplifies to
$$ C^{\rm opt}_\rho = F_\rho^{-1} - P(\rho), \tag{8} $$
which therefore gives the state estimation error when using the optimal estimator built using prior knowledge of the state $\rho$ to be estimated [36]. A standard scenario is the lack of any prior information about the input state. In such cases, because the error generally depends on the input state, it is common to consider as optimal the estimator that minimizes the average $L_2$ estimation error, which corresponds to the optimal estimator with respect to the reference state $\rho = I/d$. Following [21], we refer to the optimal estimator in this special case as the canonical estimator, denoted $\tilde\mu^{\rm can} \equiv \tilde\mu^{(I/d)}$, which is thus written explicitly as
$$ \tilde\mu^{\rm can}_b = \frac{d\, F_{I/d}^{-1}(\mu_b)}{\mathrm{tr}(\mu_b)}, \qquad F_{I/d} = d \sum_b \frac{P(\mu_b)}{\mathrm{tr}(\mu_b)}. \tag{9} $$
It is worth noting that this is not the same as the canonical dual with respect to the measurement frame $\mu$ [37].
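To see eqs. (5)-(8) at work, the sketch below compares, for a fixed prior state, the optimal estimator of eq. (5) with the canonical dual frame of eq. (3). It is a sketch under stated assumptions: the POVM is a hypothetical random 6-outcome qubit IC-POVM (random positive operators $G_a$ rescaled as $S^{-1/2}G_aS^{-1/2}$ with $S=\sum_a G_a$), operators are vectorized row-major, and all helper names (`random_povm`, `proj`, `hs`, `dual_from`, `mse_matrix`) are illustrative. It checks that $\mathrm{tr}\,C_\rho$ is not larger for the optimal estimator and that its MSE matrix equals $F_\rho^{-1} - P(\rho)$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
d, n_out = 2, 6

def random_povm(d, n_out, rng):
    # Hypothetical generic IC-POVM: random positive G_a rescaled as
    # mu_a = S^{-1/2} G_a S^{-1/2} with S = sum_a G_a, so that sum_a mu_a = I.
    gs = []
    for _ in range(n_out):
        a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
        gs.append(a @ a.conj().T)
    w, v = np.linalg.eigh(sum(gs))
    s_inv_sqrt = v @ np.diag(w ** -0.5) @ v.conj().T
    return [s_inv_sqrt @ g @ s_inv_sqrt for g in gs]

def proj(x):
    """Superoperator P(X) = |X>><<X| acting on row-major vectorized operators."""
    v = x.ravel()
    return np.outer(v, v.conj())

def hs(x, y):
    """Hilbert-Schmidt inner product <X, Y> = tr(X^dag Y) (real for Hermitian X, Y)."""
    return np.vdot(x, y).real

def dual_from(superop_inv, povm, scales):
    """Dual elements superop_inv(mu_b) / scale_b."""
    return [(superop_inv @ m.ravel()).reshape(d, d) / c for m, c in zip(povm, scales)]

def mse_matrix(povm, dual, rho):
    """C_rho = sum_b <mu_b, rho> P(dual_b) - P(rho)."""
    return sum(hs(m, rho) * proj(f) for m, f in zip(povm, dual)) - proj(rho)

mu = random_povm(d, n_out, rng)
rho = np.array([[0.8, 0.3j], [-0.3j, 0.2]])
p = [hs(m, rho) for m in mu]

# Optimal estimator for prior rho: F_rho^{-1}(mu_b) / <mu_b, rho>
F_rho_inv = np.linalg.inv(sum(proj(m) / pb for m, pb in zip(mu, p)))
opt = dual_from(F_rho_inv, mu, p)
# Canonical dual of mu itself: F^{-1}(mu_b)
can = dual_from(np.linalg.inv(sum(proj(m) for m in mu)), mu, [1.0] * n_out)

err_opt = np.trace(mse_matrix(mu, opt, rho)).real
err_can = np.trace(mse_matrix(mu, can, rho)).real
print(f"tr C_rho: optimal {err_opt:.4f} <= canonical dual {err_can:.4f}")
```

Both estimators are unbiased; the optimal one only redistributes the weight among the dual elements according to the prior.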
More precisely, the canonical estimator minimizes the $L_2$ error averaged over unitarily equivalent input states [17,28,38]. This average $L_2$ error turns out to depend only on the purity $P \equiv \mathrm{tr}(\rho^2)$ of the input state, and will be denoted $\mathcal E_P$. As discussed in Ref. [21], this quantity is lower bounded as
$$ \mathcal E_P \ge d^2 + d - 1 - P, \tag{10} $$
with the lower bound saturated iff the measurement is composed of projectors onto subnormalized pure states that form a weighted 2-design. Such measurements are referred to as tight rank-1 IC-POVMs, and have elements $\mu_b = w_b P(\psi_b)$, with $P(\psi_b) \equiv |\psi_b\rangle\langle\psi_b|$, the weights satisfying $\sum_b w_b = d$, and
$$ \sum_b w_b\, P(\psi_b)^{\otimes 2} = \frac{d\,\Pi_{\rm sym}}{\binom{d+1}{2}}, \tag{11} $$
with $\Pi_{\rm sym}$ the projection onto the symmetric subspace, which can be written explicitly as $\Pi_{\rm sym} = (I + W)/2$ with $W$ the swap operator. The corresponding estimator can be written as
$$ \tilde\mu^{\rm can}_b = (d+1)\, P(\psi_b) - I. \tag{12} $$
For rank-1 tight IC-POVMs, the MSE matrix with respect to the reference state $I/d$ equals
$$ C^{\rm opt}_{I/d} = \frac{d+1}{d}\, \Pi_{H_0}, \tag{13} $$
where $\Pi_{H_0} \equiv \mathrm{Id} - P(I/\sqrt d)$ is the superoperator that projects onto the subspace of traceless linear operators. A more in-depth discussion of these results, and more generally of the connection between weighted 2-designs and tight IC-POVMs, is given in appendix D.
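The closed form of eq. (12) and the value $\mathrm{tr}\,F_{I/d}^{-1} = d^2+d-1$ (so that $\mathrm{tr}\,C^{\rm opt}_{I/d} = d^2+d-1-1/d$, saturating eq. (10) at $P=1/d$) can be verified numerically. As an assumed example of a tight rank-1 IC-POVM, not one used in the text, this sketch takes a complete set of $d+1$ mutually unbiased bases in odd prime dimension (the standard construction with vectors $\omega^{kn^2+jn}/\sqrt d$), which is known to be a uniformly weighted 2-design, with $w_b = 1/(d+1)$. The helpers `mub_kets` and `canonical_estimator` are hypothetical names.

```python
import numpy as np

def mub_kets(d):
    """Complete set of d+1 mutually unbiased bases for an odd prime d:
    the computational basis plus the vectors omega^(k n^2 + j n) / sqrt(d)."""
    omega = np.exp(2j * np.pi / d)
    n = np.arange(d)
    kets = [np.eye(d)[j].astype(complex) for j in range(d)]
    kets += [omega ** ((k * n**2 + j * n) % d) / np.sqrt(d)
             for k in range(d) for j in range(d)]
    return kets

def canonical_estimator(povm):
    """mu_can_b = F_{I/d}^{-1}(mu_b) / <mu_b, I/d>, with F_{I/d} as in eq. (9)."""
    d = povm[0].shape[0]
    q = [np.trace(m).real / d for m in povm]              # <mu_b, I/d>
    F = sum(np.outer(m.ravel(), m.ravel().conj()) / qb for m, qb in zip(povm, q))
    F_inv = np.linalg.inv(F)
    est = [(F_inv @ m.ravel()).reshape(d, d) / qb for m, qb in zip(povm, q)]
    return est, F_inv

results = {}
for d in (3, 5, 7):
    kets = mub_kets(d)
    # Tight rank-1 POVM: mu_b = w_b P(psi_b) with uniform weights w_b = 1/(d+1)
    povm = [np.outer(v, v.conj()) / (d + 1) for v in kets]
    est, F_inv = canonical_estimator(povm)
    closed_form = [(d + 1) * np.outer(v, v.conj()) - np.eye(d) for v in kets]
    results[d] = {
        "matches_eq12": all(np.allclose(a, b) for a, b in zip(est, closed_form)),
        "tr_F_inv": np.trace(F_inv).real,            # compare with d^2 + d - 1
    }
    print(d, results[d])
```

Any other weighted 2-design (e.g. a SIC-POVM) could be substituted; only the list of kets and the weights would change.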
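Estimation of observables - In the context of shadow tomography, we are interested in estimating a finite set of target observables, or other properties of the state, rather than fully characterizing the state. We can readily obtain an unbiased estimator $\hat o$ for any observable $O \in \mathrm{Herm}(\mathbb C^d)$ from any dual measurement frame $\tilde\mu$ via
$$ \hat o(b) \equiv \langle O, \tilde\mu_b\rangle. \tag{14} $$
Given a finite set $b_1, \dots, b_N$ of observed outcomes, the associated estimate of the expectation value of $O$ is obtained by averaging the individual estimators: $\frac1N \sum_{k=1}^N \hat o(b_k)$. The usefulness of shadow tomography lies in the potentially favorable scaling of the associated estimation errors with the state dimension $d$. More specifically, we are interested in the variance of $\hat o$ for different choices of $\rho$, $\mu$, $\tilde\mu$, and $O$. For notational convenience, we indicate explicitly only the dependence of the variance on $\rho$:
$$ \mathrm{Var}_\rho[\hat o] \equiv \sum_b \langle \mu_b, \rho\rangle\, \langle O, \tilde\mu_b\rangle^2 - \langle O, \rho\rangle^2. \tag{15} $$
This can be conveniently written using the MSE matrix $C_\rho$ as
$$ \mathrm{Var}_\rho[\hat o] = \langle O, C_\rho(O)\rangle. \tag{16} $$
As discussed in detail in appendix C, we can derive a general expression for the optimal estimator with respect to a given target observable and input state, and this is found to equal the optimal estimator for state tomography on the support of the observable. More precisely, if $\tilde\mu^{(\rho)}$ is an optimal estimator for state tomography with respect to the state $\rho$, then any $\tilde\mu$ such that $\langle O, \tilde\mu_b\rangle = \langle O, \tilde\mu^{(\rho)}_b\rangle$ for all $b$ is optimal to estimate $O$. Although derived using different methods and notation, this result is similar to the one reported in [18,19]. If we want an estimator that is optimal for arbitrary target observables, we need to find the one minimizing the variance averaged over the observables, and in this case the optimal estimator turns out to be the same one found for state estimation. Given that in shadow tomography we do not generally want to fix beforehand the observables to estimate, we can safely fix as optimal estimators the $\tilde\mu^{(\rho)}$ derived for state tomography. We furthermore focus on the scenario where only the purity of the input state is known beforehand, and we will thus in the following always use the canonical estimator $\tilde\mu^{\rm can}$ given in eq. (9). This has the added advantage of being independent of both $\rho$ and $O$. Nevertheless, the estimation variance will in general depend on these quantities. The variance averaged over unitarily equivalent input states is
$$ \mathbb E_U\, \mathrm{Var}_{U\rho U^\dagger}[\hat o] = \langle O, F_{I/d}^{-1}(O)\rangle - \beta, \qquad \beta \equiv \mathbb E_U\, \langle O, U\rho U^\dagger\rangle^2. \tag{17} $$
The coefficient $\beta$ can be computed explicitly using known formulas to integrate polynomials in the components of unitary matrices over the uniform Haar measure [39,40], and does not depend on $\mu$. Explicitly, $\beta = \mathrm{tr}(O)^2/d^2 + V(dP-1)/(d^2-1)$, where $V \equiv \mathrm{tr}(O^2)/d - \mathrm{tr}(O)^2/d^2$ is the variance of $O$ with respect to the maximally mixed state. On the other hand, as shown in detail in appendix A, the other term decomposes as
$$ \langle O, F_{I/d}^{-1}(O)\rangle = \frac{\mathrm{tr}(O)^2}{d^2} + \langle O, \tilde F_{I/d}^{-1}(O)\rangle, \tag{18} $$
where $\tilde F_{I/d}$ is the restriction of $F_{I/d}$ to the subspace of traceless operators, and its inverse is taken on that subspace. The second term can furthermore be expressed in terms of the eigenvalues $\lambda_k$ of $\tilde F_{I/d}$, with corresponding orthonormal eigenvectors $E_k$, as
$$ \langle O, \tilde F_{I/d}^{-1}(O)\rangle = \sum_{k=1}^{d^2-1} \frac{\langle E_k, O\rangle^2}{\lambda_k}. \tag{19} $$
These relations define best- and worst-case performance estimates with respect to the choice of observables for any given $\mu$. Focusing on the worst-case scenario, and using $\sum_k \langle E_k, O\rangle^2 = \mathrm{tr}(O^2) - \mathrm{tr}(O)^2/d = dV$, we find the general upper bound
$$ \mathbb E_U\, \mathrm{Var}_{U\rho U^\dagger}[\hat o] \le V\left(\frac{d}{\lambda_{\min}(\tilde F_{I/d})} - \frac{dP-1}{d^2-1}\right), \tag{20} $$
with $\lambda_{\min}(\tilde F_{I/d})$ the smallest eigenvalue of $\tilde F_{I/d}$. This expression gets larger the closer $\tilde F_{I/d}$ is to being singular, which corresponds physically to $\mu$ getting closer to not being informationally complete. At the opposite end of the spectrum, we have tight measurements, for which the eigenvalues of $\tilde F_{I/d}$ are fully degenerate, $\lambda_k = \lambda$ for all $k$. The upper bound then becomes an equality,
$$ \mathbb E_U\, \mathrm{Var}_{U\rho U^\dagger}[\hat o] = V\left(\frac{d}{\lambda} - \frac{dP-1}{d^2-1}\right), \tag{21} $$
which for tight rank-1 measurements, where $\lambda = d/(d+1)$, simplifies to
$$ \mathbb E_U\, \mathrm{Var}_{U\rho U^\dagger}[\hat o] = V d\, \frac{d^2 + d - 1 - P}{d^2 - 1}. \tag{22} $$
We recognize in particular the term $d^2 + d - 1 - P$, which is the optimal state estimation $L_2$ error discussed in appendix D.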
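Eq. (22) can be sanity-checked by Monte Carlo integration over Haar-random unitaries. This sketch rests on assumptions not in the text: the tight rank-1 POVM is the set of $d+1$ mutually unbiased bases for $d=3$ (a known 2-design), the Haar unitaries are sampled via the QR decomposition of a complex Gaussian matrix with the usual phase correction, and the observable and input state are hypothetical choices. The helpers `variance` and `haar_unitary` are illustrative names. The Monte Carlo mean should agree with eq. (22) up to statistical error.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
d = 3

# Tight rank-1 POVM from the d+1 MUBs of an odd prime d (assumed example).
omega = np.exp(2j * np.pi / d)
n = np.arange(d)
kets = [np.eye(d)[j].astype(complex) for j in range(d)]
kets += [omega ** ((k * n**2 + j * n) % d) / np.sqrt(d) for k in range(d) for j in range(d)]
mu = np.array([np.outer(v, v.conj()) / (d + 1) for v in kets])
can = np.array([(d + 1) * np.outer(v, v.conj()) - np.eye(d) for v in kets])  # eq. (12)

def variance(rho, obs):
    """Var_rho[o_hat] = sum_b <mu_b,rho> <O,mu_can_b>^2 - <O,rho>^2 (canonical estimator)."""
    p = np.einsum("bij,ji->b", mu, rho).real       # <mu_b, rho>
    o_hat = np.einsum("ij,bji->b", obs, can).real  # <O, mu_can_b>
    return p @ o_hat**2 - np.trace(obs @ rho).real ** 2

def haar_unitary(d, rng):
    """Haar-random unitary via QR of a complex Gaussian matrix, with phase fix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))

# Hypothetical inputs: a random rank-1 observable and a fixed mixed state.
phi = haar_unitary(d, rng)[:, 0]
obs = np.outer(phi, phi.conj())
rho = np.diag([0.6, 0.3, 0.1]).astype(complex)
purity = np.trace(rho @ rho).real
V = np.trace(obs @ obs).real / d - np.trace(obs).real ** 2 / d**2

# Monte Carlo average over unitarily equivalent inputs U rho U^dag
mc = np.mean([variance(u @ rho @ u.conj().T, obs)
              for u in (haar_unitary(d, rng) for _ in range(10_000))])
predicted = V * d * (d**2 + d - 1 - purity) / (d**2 - 1)  # eq. (22)
print(f"Monte Carlo {mc:.4f} vs eq. (22) {predicted:.4f}")
```

The agreement is statistical, so a tolerance of a few percent is appropriate for $10^4$ samples.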
Equation (22) shows that, for tight rank-1 measurements, the variance increases with the state dimension only through the variance $V$ of the observable calculated with respect to the maximally mixed state. Note that for rank-1 observables of the form $O = P_\psi$, for any $|\psi\rangle$, we have $Vd = 1 - 1/d$, while for observables normalized as $\mathrm{tr}(O) = 0$ and $\mathrm{tr}(O^2) = 1$, we have $Vd = 1$. It immediately follows that in all such cases $Vd \to 1$ for large $d$, and thus the variance does not increase with $d$, converging asymptotically to 1. In addition to estimating best- and worst-case scenarios for the variance, we show in appendix F how to also compute it by averaging over unitarily equivalent observables. We have thus shown that, for the entire class of tight rank-1 measurement frames, which includes but is not limited to covariant measurements, the sampling statistics required to estimate arbitrary rank-1 observables with bounded norm does not increase with the state dimension, in direct contrast with the corresponding results for state tomography. More generally, we can explicitly characterize the class of observables that enjoy such favorable scalings. This directly implies that all these measurements can equivalently be used to implement shadow tomography schemes. While not all such measurements will allow an efficient circuit decomposition like the one presented in [13], this will depend on the experimental context being considered. Having a good characterization of the general class of viable measurements can greatly help in finding measurement schemes that efficiently implement shadow tomography in different experimental scenarios.
Shadow tomography vs state tomography - It is worth stressing the tight relation between shadow and state tomography emerging from the above discussion. The general formalism of measurement frames clarifies how these can be viewed as one and the same experimental protocol, the only difference being how estimation errors are evaluated. Both state tomography and shadow tomography can be performed with arbitrary IC-POVMs (albeit, as discussed previously, not always with favorable error scalings), and the same post-processing procedure can be applied in both cases. The core difference lies in the problem setting: whether the target is recovering an approximation of the full density matrix, or just recovering the expectation values of finitely many observables.
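The contrast between the two problem settings can be made concrete with a small numerical comparison. As an assumed example, this sketch again uses the $d+1$ mutually unbiased bases in odd prime dimension as a tight rank-1 POVM, the maximally mixed input $\rho = I/d$ (purity $P=1/d$), and the rank-1 target $O = |0\rangle\langle 0|$. It computes the variance of $\hat o$ for the canonical estimator directly from the POVM, which should equal $(d^2-1)/d^2 < 1$, and prints next to it the optimal state-estimation error $d^2+d-1-1/d$. The function names are hypothetical.

```python
import numpy as np

def mub_kets(d):
    """d+1 mutually unbiased bases for an odd prime d (assumed construction)."""
    omega = np.exp(2j * np.pi / d)
    n = np.arange(d)
    kets = [np.eye(d)[j].astype(complex) for j in range(d)]
    kets += [omega ** ((k * n**2 + j * n) % d) / np.sqrt(d) for k in range(d) for j in range(d)]
    return kets

def observable_variance(d, obs, rho):
    """Var_rho[o_hat] for the canonical estimator (d+1)P(psi_b) - I of a tight rank-1 POVM."""
    total = 0.0
    for v in mub_kets(d):
        p_b = (v.conj() @ rho @ v).real / (d + 1)            # <mu_b, rho>, w_b = 1/(d+1)
        o_b = (d + 1) * (v.conj() @ obs @ v).real - np.trace(obs).real  # <O, mu_can_b>
        total += p_b * o_b**2
    return total - np.trace(obs @ rho).real ** 2

table = {}
for d in (3, 5, 7, 11, 13):
    obs = np.zeros((d, d), dtype=complex)
    obs[0, 0] = 1.0                         # rank-1 target O = |0><0|
    rho = np.eye(d) / d                     # maximally mixed input, purity 1/d
    shadow_var = observable_variance(d, obs, rho)
    state_err = d**2 + d - 1 - 1 / d        # optimal L2 state-estimation error at P = 1/d
    table[d] = (shadow_var, state_err)
    print(f"d={d:2d}  Var[o_hat]={shadow_var:.4f}  state L2 error={state_err:.1f}")
```

The first column stays below 1 as $d$ grows while the second grows roughly as $d^2$, which is the qualitative difference between the two problem settings described above.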

IV. RELATION WITH CONSTRUCTION OF REF. [13]
We now specialize our discussion in section II to the formalism presented in Ref. [13]. The goal is to show that the latter can be viewed and studied from the general perspective of measurement frames, and corresponds to the special case where the employed IC-POVM is a covariant measurement [21,29,41,42].
Description of the formalism - The procedure to build classical shadows introduced in Ref. [13] involves the following steps:
1. Perform a random unitary rotation $\rho \to U\rho U^\dagger$ on the state, and then measure the evolved state in the computational basis $\{|b\rangle\}$.
2. Define the map
$$ \mathcal M(\rho) \equiv \mathbb E\left[U^\dagger |\hat b\rangle\langle \hat b| U\right], \tag{23} $$
where $|\hat b\rangle$ is a random variable associating with each outcome $b$ the corresponding state $|b\rangle$. The expectation value is taken with respect to the uniform Haar measure on the group of unitary matrices $U$, and with respect to the possible outcomes $b$ for each choice of unitary.
3. Compute and store the operators $\hat\rho \equiv \mathcal M^{-1}(U^\dagger |\hat b\rangle\langle \hat b| U)$, which are then referred to as the "classical shadows" of the state.
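The three steps above can be sketched as a short simulation. The sketch assumes Haar-random unitaries (sampled via QR of a complex Gaussian matrix with phase correction) and uses the closed form $\mathcal M^{-1}(Y) = (d+1)Y - \mathrm{tr}(Y)I$, which holds for the Haar ensemble (and, more generally, for unitary 2-designs such as the Clifford group) and matches the form of eq. (12). The unknown state is a hypothetical random pure state, and the helper names are illustrative. Averaging the shadows should approach $\rho$.

```python
import numpy as np

rng = np.random.default_rng(seed=11)
d = 3

def haar_unitary(d, rng):
    """Haar-random unitary via QR of a complex Gaussian matrix, with phase fix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))

def inverse_channel(y):
    """M^{-1}(Y) = (d+1) Y - tr(Y) I, valid for the Haar (or unitary 2-design) ensemble."""
    return (d + 1) * y - np.trace(y) * np.eye(d)

def classical_shadow(rho, rng):
    u = haar_unitary(d, rng)                              # step 1: random rotation
    probs = np.clip(np.diag(u @ rho @ u.conj().T).real, 0, None)
    b = rng.choice(d, p=probs / probs.sum())              # computational-basis outcome
    ket = u.conj().T[:, b]                                # U^dag |b>
    return inverse_channel(np.outer(ket, ket.conj()))     # step 3: classical shadow

# Hypothetical unknown state: a random pure state
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

shadows = [classical_shadow(rho, rng) for _ in range(20_000)]
estimate = np.mean(shadows, axis=0)
print("HS error of averaged shadows:", np.linalg.norm(estimate - rho))
```

Each shadow has unit trace but is not positive semidefinite; only the average is expected to approximate a density matrix.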
To estimate the expectation value of an observable $O$, one then uses the estimator $\hat o \equiv \langle O, \hat\rho\rangle$ built from the classical shadows. We focus here on the task of estimating expectation values, although in [13] the estimation of other kinds of quantities is also discussed. Another important aspect discussed in [13] is the efficiency of computing and storing the classical shadows for large many-qubit Hilbert spaces, which can be addressed by leveraging Clifford circuits and the formalism of stabilizer states. We do not focus on these aspects here, but rather on the general structure underlying this type of shadow tomography protocol.
Equivalence: step 1 - The equivalence between the formalism thus outlined and our approach is seen by observing that measuring in the computational basis $\{|b\rangle\}$ after a random unitary rotation $U$ is equivalent to directly measuring with a POVM with elements
$$ \mu_{U,b} \equiv U^\dagger |b\rangle\langle b| U. \tag{24} $$
This is a measurement with (uncountably) infinitely many outcomes, whose normalization thus reads
$$ \int_{U(d)} dU \sum_{b=1}^{d} \mu_{U,b} = I, \tag{25} $$
where the integral is performed with respect to the Haar measure over the unitary group of suitable dimension, normalized so that $\int_{U(d)} dU = 1$.
Equivalence: step 2 - The map $\mathcal M$ introduced above is precisely the frame operator corresponding to the measurement frame $\{\mu_{U,b}\}_{U,b}$, as it can be written as
$$ \mathcal M(X) = \int_{U(d)} dU \sum_b \langle \mu_{U,b}, X\rangle\, \mu_{U,b}, \tag{26} $$
which matches the structure of the frame superoperator defined in eq. (3).
Equivalence: step 3 - From the considerations above, it is now clear that the classical shadows, which read $\hat\rho = \mathcal M^{-1}(\mu_{U,b})$ in terms of the POVM $(\mu_{U,b})$, are the elements of the canonical dual frame of the measurement frame. This shows that the formalism to compute classical shadows with random unitary rotations and projective measurements follows as a special case of the general procedure for measurement frames outlined in section II.
Covariant measurements and tight frames - At first glance, this procedure might still appear slightly different from the one discussed in section III, as we did not explicitly use rescaled measurement frames here. This is, however, simply due to the particular choice of covariant measurement, for which $\mathrm{tr}(\mu_{U,b}) = 1$ for all $U, b$, so that $\mathcal M$ and $F_{I/d}$ differ only by the proportionality constant $d$: $F_{I/d} = d\,\mathcal M$.
In other words, one is performing a random projection onto a state $|\psi\rangle$ chosen according to some distribution. More specifically, one can consider measurements of the form $\mu_\psi \equiv d\, p_\psi P_\psi$, where, when the set $X$ of outcomes is finite, $\psi \mapsto p_\psi \in [0,1]$ is a probability distribution on it, while when infinitely many outcomes are allowed one defines a measure over a suitably defined space of outcomes, a standard example being the uniform Haar measure on the set of all pure states. Either way, provided that the ensemble of states $\{(p_\psi, \psi)\}_{\psi\in X}$ is a weighted 2-design, as discussed in detail in appendix D, the associated POVM $\mu$ constitutes a tight measurement frame, and thus the results discussed in section III ensure the efficiency of the reconstruction. If furthermore the probability distribution is completely balanced, as is the case in [13], then all the terms $\mathrm{tr}(\mu_b)$ in eq. (9) are identical, and our definition of canonical estimator coincides precisely with the $\hat\rho$ used in [13].
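Variance and shadow norm - In Ref. [13], the variance of the estimators for observables is bounded in terms of their so-called "shadow norm", which is there defined as
$$ \|O\|^2_{\rm shadow} \equiv \max_\sigma \int_{U(d)} dU \sum_b \langle b|U\sigma U^\dagger|b\rangle\, \langle b|U\mathcal M^{-1}(O)U^\dagger|b\rangle^2, \tag{27} $$
where the maximization is performed over all possible states $\sigma$. This expression can be rewritten in terms of the canonical estimator associated with this measurement, as given in eq. (9), which in this case reads $\tilde\mu^{\rm can}_{U,b} = F_{I/d}^{-1}(\mu_{U,b})/\langle \mu_{U,b}, I/d\rangle$. Note that the explicit expression for $F_{I/d}$ for this POVM is
$$ F_{I/d}(X) = d \int_{U(d)} dU \sum_b \langle \mu_{U,b}, X\rangle\, \mu_{U,b}, $$
where we used $\mathrm{tr}(\mu_{U,b}) = 1$ for all $U, b$. Therefore, in terms of the map $\mathcal M$ defined in eq. (26), we have $F_{I/d} = d\,\mathcal M$ and $\tilde\mu^{\rm can}_{U,b} = \mathcal M^{-1}(\mu_{U,b})$. We finally recover eq. (27) by observing that $\mathcal M$, or equivalently $F_{I/d}$, is Hermitian as a superoperator, so that $\langle b|U\mathcal M^{-1}(O)U^\dagger|b\rangle = \langle \mu_{U,b}, \mathcal M^{-1}(O)\rangle = \langle O, \tilde\mu^{\rm can}_{U,b}\rangle$, and thus
$$ \|O\|^2_{\rm shadow} = \max_\sigma \int_{U(d)} dU \sum_b \langle \mu_{U,b}, \sigma\rangle\, \langle O, \tilde\mu^{\rm can}_{U,b}\rangle^2. \tag{28} $$
Relation between worst-case and average variances - Rewriting the shadow norm as in eq. (28) highlights that it amounts to the (nontrivial component of the) variance in the worst-case scenario with respect to the choice of input states. On the other hand, our discussion in section III focused on the average variance. Nonetheless, we can write explicitly the connection between these quantities by observing that the quantity of interest is linear in $\sigma$, that is, we can write
$$ \sum_b \langle \mu_b, \sigma\rangle\, \langle O, \tilde\mu_b\rangle^2 = \langle A, \sigma\rangle, \qquad A \equiv \sum_b \langle O, \tilde\mu_b\rangle^2\, \mu_b, $$
where for the covariant measurement above the sum over $b$ also includes the integral over $U$. This implies that averaging it over $\sigma$ amounts to replacing $\sigma \to I/d$, and $\langle A, I/d\rangle = \langle O, F_{I/d}^{-1}(O)\rangle$, which leads to the results in section III. Maximizing over $\sigma$, we get potentially different results. However, in the case of rank-1 measurements that also form a weighted 3-design, we can find a remarkably simple expression for the state- and observable-dependent variance even in the non-averaged scenario. To see this, we start by observing that, since $\langle O, \tilde\mu_b\rangle^2 = \langle O\otimes O, \tilde\mu_b\otimes\tilde\mu_b\rangle$,
$$ A = \mathrm{tr}_{2,3}\Big[(I\otimes O\otimes O) \sum_b \mu_b \otimes \tilde\mu_b \otimes \tilde\mu_b\Big]. $$
For any tight rank-1 POVM with elements $\mu_b = w_b P(\psi_b)$, using the canonical estimators $\tilde\mu^{\rm can}_b$ given in eq. (12), we can also write
$$ \sum_b \mu_b \otimes \tilde\mu^{\rm can}_b \otimes \tilde\mu^{\rm can}_b = (d+1)^2 S_3 - (d+1) S_2 + I^{\otimes 3}, $$
where
$$ S_3 \equiv \sum_b w_b\, P(\psi_b)^{\otimes 3}, \qquad S_2 \equiv \sum_b w_b \big[P(\psi_b)\otimes P(\psi_b)\otimes I + P(\psi_b)\otimes I\otimes P(\psi_b)\big]. $$
If furthermore the states $|\psi_b\rangle$ form a complex projective 3-design, then $S_3 = d\,\Pi_{\rm sym,3}/\binom{d+2}{3}$, with $\Pi_{\rm sym,3} \in \mathrm{Lin}((\mathbb C^d)^{\otimes 3})$ the projection onto the completely symmetric subspace of $(\mathbb C^d)^{\otimes 3}$, while, by eq. (11),
$$ S_2 = \frac{2}{d+1}\left(\Pi^{(1,2)}_{\rm sym,2} + \Pi^{(1,3)}_{\rm sym,2}\right) $$
is, up to the factor $2/(d+1)$, the sum of the projections onto the symmetric subspace of $(\mathbb C^d)^{\otimes 2}$ acting on the first and second, and on the first and third tensor factors, respectively. These projections can be written more explicitly as $\Pi_{\rm sym,2} = (I\otimes I + W)/2$, with $W$ the swap operator, and $\Pi_{\rm sym,3} = \frac{1}{3!}\sum_{\pi\in S_3} W_\pi$, with $S_3$ denoting the symmetric group over 3 elements and $W_\pi$ the unitary operators defined as [31]
$$ W_\pi (x_1 \otimes x_2 \otimes x_3) = x_{\pi^{-1}(1)} \otimes x_{\pi^{-1}(2)} \otimes x_{\pi^{-1}(3)}. $$
With these, we can work out explicit expressions for the state- and observable-dependent variances, and obtain
$$ \mathrm{Var}_\sigma[\hat o] = \frac{d+1}{d+2}\Big[\mathrm{tr}(O)^2 + \mathrm{tr}(O^2) + 2\,\mathrm{tr}(O)\langle O, \sigma\rangle + 2\langle O^2, \sigma\rangle\Big] - \mathrm{tr}(O)^2 - 2\,\mathrm{tr}(O)\langle O, \sigma\rangle - \langle O, \sigma\rangle^2. \tag{36} $$
This expression shows explicitly that, for any rank-1 measurement that forms a 3-design, we get a closed-form expression for the variance even in the non-averaged regime. This dramatically simplifies the study of the relations between best, worst, and average cases with respect to both input state and target observable. Note that the Clifford circuits considered in [13] give an example of a measurement whose elements form a 3-design [43,44], and eq. (36) can in fact be considered a generalization of some of the expressions for the shadow norm reported in [13] for Clifford circuits. The complete set of single-qubit mutually unbiased bases (MUBs) also provides an example of a 3-design for which this expression is valid. More generally, in terms of the scaling with the state dimension, we can easily see that the guarantees on the scaling of the average variance translate into guarantees for the worst-case variance considered here: if the average variance does not increase with the dimension $d$, the maximum of the variance with respect to input states also cannot increase with $d$, as it is always lower bounded by 0.
TABLE I. Summary of the frame operators, optimal estimators, and variances of the optimal estimators (columns: Frame operator; Optimal estimator; Variance of optimal estimator; first row: Tomography (standard)).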
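Eq. (36) lends itself to a direct numerical check against the defining sum $\sum_b \langle\mu_b,\sigma\rangle\langle O,\tilde\mu^{\rm can}_b\rangle^2 - \langle O,\sigma\rangle^2$. This sketch assumes the single-qubit example mentioned above: the six eigenstates of the Pauli operators (the complete set of qubit MUBs), which form a complex projective 3-design, with uniform weights $w_b = d/6$. Random observables and states are drawn from a seeded generator, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
d = 2

# Six Pauli eigenstates: a complex projective 3-design for d = 2 (assumed example)
s = 1 / np.sqrt(2)
kets = [np.array([1, 0]), np.array([0, 1]), np.array([s, s]),
        np.array([s, -s]), np.array([s, 1j * s]), np.array([s, -1j * s])]
w = d / len(kets)                                                   # uniform weights, sum = d
mu = [w * np.outer(k, k.conj()) for k in kets]
can = [(d + 1) * np.outer(k, k.conj()) - np.eye(d) for k in kets]   # eq. (12)

def tr(x):
    return np.trace(x).real

def variance_direct(obs, sigma):
    """sum_b <mu_b, sigma> <O, mu_can_b>^2 - <O, sigma>^2."""
    return sum(tr(m @ sigma) * tr(obs @ c) ** 2 for m, c in zip(mu, can)) - tr(obs @ sigma) ** 2

def variance_eq36(obs, sigma):
    """Closed form of eq. (36) for rank-1 measurements forming a 3-design."""
    t, t2 = tr(obs), tr(obs @ obs)
    e1, e2 = tr(obs @ sigma), tr(obs @ obs @ sigma)
    return ((d + 1) / (d + 2) * (t**2 + t2 + 2 * t * e1 + 2 * e2)
            - t**2 - 2 * t * e1 - e1**2)

def random_hermitian(rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def random_state(rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / tr(rho)

pairs = [(random_hermitian(rng), random_state(rng)) for _ in range(5)]
for obs, sigma in pairs:
    print(f"{variance_direct(obs, sigma):.6f}  {variance_eq36(obs, sigma):.6f}")
```

For instance, $O = Z$ and $\sigma = |0\rangle\langle 0|$ give a variance of 2 from both expressions.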

V. CONCLUSIONS AND FORWARD LOOK
We have demonstrated how the general theory of measurement frames embodies a natural framework for shadow tomography. In doing so, we have thoroughly assessed the interplay between general measurements and the associated optimal estimators used to recover expectation values of target observables. Our results advance the current knowledge in this context, recovering previously reported seminal results (cf. Ref. [13]) as special cases of our general framework, and making it possible to estimate a finite set of generic rank-1 bounded observables while avoiding an expensive growth of the number of samples as the dimension of the state increases. In Table I we provide a summary of the expressions for the frame operators, optimal estimators, and associated variances needed for state and property reconstruction.
Moreover, by demonstrating how the techniques used to compute classical shadows can be seen as optimal unbiased linear estimators, our approach establishes useful connections with metrology and estimation theory. In particular, estimating only certain properties of an unknown quantum state is formally a quantum semiparametric estimation problem [45] (also known in the finite-dimensional case as estimation with nuisance parameters [46]). While quantum estimation is most commonly studied in a local/asymptotic scenario, we hope our approach will lead to further connections between shadow tomography and semiparametric estimation in the non-asymptotic regime. Another intriguing area where an approach based on infinite-dimensional measurement frames could provide useful insights is continuous-variable shadow tomography, which has only very recently been proposed [47,48]. Finally, the agility of the framework put forward here holds promise to inform experimental efforts aimed at demonstrating a resource-inexpensive route to quantum state and property reconstruction.

Appendix A: Properties of frame superoperators
We briefly review in this section, for the sake of completeness, the most important properties of the frame superoperators used in the paper.
General definition - The frame superoperator that provides the optimal state estimator when the true input is some reference state $\rho$ is
$$ F_\rho(X) \equiv \sum_b \frac{\langle \mu_b, X\rangle}{\langle \mu_b, \rho\rangle}\, \mu_b. \tag{A1} $$
Being a linear function defined on linear operators, $F_\rho$ can be thought of as a quantum map (though not generally a CPTP one). To connect to the more general theory of frames in linear algebra, this map is the frame operator corresponding to the rescaled frame of operators with elements $\{\mu_b/\sqrt{\langle \mu_b, \rho\rangle}\}_b$.
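Properties in the general case - Thinking of $F_\rho$ as a linear operator, we can define its trace in the standard way. In this case it reads
$$\operatorname{tr}(F_\rho)=\sum_{\alpha=1}^{d^2}\langle\sigma_\alpha,F_\rho(\sigma_\alpha)\rangle$$
for an arbitrary orthonormal basis of Hermitian operators $\{\sigma_\alpha\}_{\alpha=1}^{d^2}$. More explicitly, this results in the expression
$$\operatorname{tr}(F_\rho)=\sum_b\frac{\operatorname{tr}(\mu_b^2)}{\langle\mu_b,\rho\rangle}.\tag{A3}$$
We can furthermore verify by direct substitution that $F_\rho(\rho)=\sum_b\mu_b=I$.

Properties of the inverse - As discussed in the main text, the optimal unbiased estimator is provided by
$$\tilde\mu^{(\rho)}_b=\frac{F_\rho^{-1}(\mu_b)}{\langle\mu_b,\rho\rangle}.\tag{A5}$$
In particular, this means that the canonical dual frame corresponding to this frame operator has elements $\{\sqrt{\langle\mu_b,\rho\rangle}\,\tilde\mu^{(\rho)}_b\}_b$, and
$$F_\rho^{-1}=\sum_b\langle\mu_b,\rho\rangle\,\mathcal P(\tilde\mu^{(\rho)}_b).$$
Taking the trace, we obtain
$$\operatorname{tr}(F_\rho^{-1})=\sum_b\langle\mu_b,\rho\rangle\operatorname{tr}\big[(\tilde\mu^{(\rho)}_b)^2\big].$$
This expression is particularly useful because it directly enters the corresponding MSE matrix.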
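The sketch below makes these properties concrete. It builds $F_\rho$ as a $d^2\times d^2$ matrix acting on vectorized operators. It then checks numerically that $F_\rho(\rho)=I$, the trace formula (A3), the unbiasedness of the estimator (A5), and the identity $\operatorname{tr}(F_\rho^{-1})=\sum_b\langle\mu_b,\rho\rangle\operatorname{tr}[(\tilde\mu^{(\rho)}_b)^2]$. It is a minimal sketch assuming NumPy. The single-qubit Pauli POVM, the test state, and all helper names are hypothetical choices made for illustration only.

```python
import numpy as np

# Single-qubit Pauli matrices (illustrative example, not taken from the paper).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def pauli_povm(weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical helper: rank-1 IC-POVM measuring X, Y or Z with the given probabilities."""
    return [w * (I2 + s * P) / 2 for w, P in zip(weights, (X, Y, Z)) for s in (+1, -1)]


def vec(A):
    # Any fixed vectorization works, since <A, B> = tr(A^dag B) = vec(A)^dag vec(B).
    return A.reshape(-1)


def inner(A, B):
    # Hilbert-Schmidt inner product tr(A^dag B); np.vdot conjugates its first argument.
    return np.vdot(A, B)


def frame_superoperator(effects, rho):
    """Matrix of F_rho = sum_b P(mu_b) / <mu_b, rho>, acting on vec(X)."""
    return sum(np.outer(vec(mu), vec(mu).conj()) / inner(mu, rho).real for mu in effects)


def optimal_duals(effects, rho):
    """Estimator of eq. (A5): tilde mu_b = F_rho^{-1}(mu_b) / <mu_b, rho>."""
    F_inv = np.linalg.inv(frame_superoperator(effects, rho))
    d = effects[0].shape[0]
    return [(F_inv @ vec(mu)).reshape(d, d) / inner(mu, rho).real for mu in effects]


rho = np.array([[0.7, 0.1 + 0.2j], [0.1 - 0.2j, 0.3]])  # full-rank test state
effects = pauli_povm()
F = frame_superoperator(effects, rho)
duals = optimal_duals(effects, rho)
probs = [inner(mu, rho).real for mu in effects]  # outcome probabilities <mu_b, rho>
```

Any full-rank state works here, since it guarantees $\langle\mu_b,\rho\rangle>0$ for every outcome.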
Canonical estimator - The optimal unbiased estimator when no prior knowledge about the true input state is assumed is obtained by setting $\rho_0=I/d$ in the frame superoperator. We show in appendix D that the unbiased state estimator that minimizes the $L_2$ error averaged over unitarily equivalent states is
$$f(b)=\tilde\mu^{\rm can}_b\equiv\frac{d}{\operatorname{tr}(\mu_b)}\,F^{-1}_{I/d}(\mu_b).$$
The map $F_{I/d}$ has some further properties compared with its general counterpart. In particular, $F_{I/d}(I)=dI$, which means that $I$ is an eigenvector of $F_{I/d}$. This observation can be exploited to write the general decomposition
$$F_{I/d}=d\,\mathcal P(I/\sqrt d)+\tilde F_{I/d}.$$
Here $\tilde F_{I/d}\equiv\Pi F_{I/d}\Pi$ is the projection of $F_{I/d}$ onto the subspace of traceless operators, and $\Pi\equiv\mathrm{Id}-\mathcal P(I/\sqrt d)$ is the (superoperator) projector onto that subspace. We use the rescaled identity operator $I/\sqrt d$ in these expressions so that the corresponding operator is normalized with respect to the Hilbert-Schmidt inner product: $\langle I/\sqrt d,I/\sqrt d\rangle=1$. This decomposition also yields correspondingly simplified expressions for the inverse and the trace,
$$F^{-1}_{I/d}=\frac1d\,\mathcal P(I/\sqrt d)+\tilde F^{-1}_{I/d},\qquad\operatorname{tr}(F^{-1}_{I/d})=\frac1d+\operatorname{tr}(\tilde F^{-1}_{I/d}),$$
where $\tilde F^{-1}_{I/d}$ denotes the inverse of $\tilde F_{I/d}$ on the traceless subspace. As discussed in more detail in appendix D, these expressions simplify even further in the special case of tight rank-1 measurement frames.

MSE matrix - Following [21], we define the MSE matrix corresponding to a state $\rho$, measurement $\mu$, and estimator $\tilde\mu$ as
$$C_\rho\equiv\sum_b\langle\mu_b,\rho\rangle\,\mathcal P(\tilde\mu_b-\rho)=\sum_b\langle\mu_b,\rho\rangle\,\mathcal P(\tilde\mu_b)-\mathcal P(\rho).\tag{A12}$$
Using the optimal dual estimator given in eq. (A5), for $\rho_0=\rho$, the MSE matrix takes the simplified form
$$C^{\rm opt}_\rho=F^{-1}_\rho-\mathcal P(\rho).$$
For an arbitrary, possibly suboptimal, choice of estimator we have the inequality $C_\rho\ge C^{\rm opt}_\rho$. A remarkable property of the MSE matrix is that its trace equals the average $L_2$ state estimation error, as discussed further in the following sections. The optimal MSE matrix is also closely related to the (classical) Fisher information matrix. When the states are parametrized via their coefficients in some orthonormal basis, the Fisher information matrix of the outcome distribution coincides with $F_\rho$.
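The sketch below assumes NumPy and a hypothetical single-qubit POVM that measures X, Y, Z with unequal probabilities, so that it is not tight. It checks three things:

- $I$ is an eigenvector of $F_{I/d}$ with eigenvalue $d$.
- $F_{I/d}$ decomposes into the identity and traceless sectors, and its inverse decomposes accordingly.
- The MSE matrices of the canonical and optimal estimators satisfy $C_\rho\ge C^{\rm opt}_\rho$.

The inverse on the traceless sector is computed as $(\tilde F_{I/d}+\mathcal P(I/\sqrt d))^{-1}-\mathcal P(I/\sqrt d)$. This is a numerically safe alternative to a pseudo-inverse. All helper names are illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
d = 2

# Rank-1 IC-POVM measuring X, Y, Z with unequal probabilities (illustrative, not tight).
effects = [w * (I2 + s * P) / 2 for w, P in zip((0.5, 0.3, 0.2), (X, Y, Z)) for s in (+1, -1)]
rho = np.array([[0.7, 0.1 + 0.2j], [0.1 - 0.2j, 0.3]])


def vec(A):
    return A.reshape(-1)


def outer(A):
    # Superoperator P(A) as a matrix acting on vec(X).
    return np.outer(vec(A), vec(A).conj())


def frame(rho0):
    # F_{rho0} = sum_b P(mu_b) / <mu_b, rho0>
    return sum(outer(mu) / np.trace(mu @ rho0).real for mu in effects)


def duals(rho0):
    # tilde mu_b = F_{rho0}^{-1}(mu_b) / <mu_b, rho0>; rho0 = I/d gives the canonical estimator
    F_inv = np.linalg.inv(frame(rho0))
    return [(F_inv @ vec(mu)).reshape(d, d) / np.trace(mu @ rho0).real for mu in effects]


def mse_matrix(estimator):
    # C_rho = sum_b <mu_b, rho> P(tilde mu_b) - P(rho), eq. (A12)
    return sum(np.trace(mu @ rho).real * outer(D) for mu, D in zip(effects, estimator)) - outer(rho)


F_can = frame(I2 / d)
P_id = outer(I2 / np.sqrt(d))                       # projector onto span{I}
Pi = np.eye(d * d) - P_id                           # projector onto traceless operators
F_tilde = Pi @ F_can @ Pi
F_tilde_inv = np.linalg.inv(F_tilde + P_id) - P_id  # inverse of F_tilde on the traceless subspace
C_can = mse_matrix(duals(I2 / d))
C_opt = mse_matrix(duals(rho))
```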

Appendix B: Derivation of optimal state estimators
Let us consider a generic unbiased estimator (or equivalently, as discussed before, a generic dual measurement frame) and ask what the associated average estimation error is. Measuring the error in the Hilbert-Schmidt distance, we find
$$\mathbb E\,\|f-\rho\|_2^2=\sum_b\langle\mu_b,\rho\rangle\operatorname{tr}(\tilde\mu_b^2)-\operatorname{tr}(\rho^2)\equiv\Delta^2-\operatorname{tr}(\rho^2),\tag{B1}$$
where we introduced the notation $\Delta^2\equiv\Delta^2(\rho,\mu,\tilde\mu)$ for the component of the average error that depends on the choice of measurement $\mu$ and dual $\tilde\mu$. To ease the notation, this dependence will not be shown explicitly in the following.
Optimal dual frame -As previously mentioned, different dual frames generally exist, and from eq. (B1) we can see that the choice of dual frame μ can affect the associated average estimation error.It is then natural to ask what is the optimal choice of dual frame.This issue is addressed in [17][18][19][28][29][30].We include here a different approach to proving that the optimal estimators are obtained from the rescaled frame superoperator, using the method of Lagrange multipliers to directly perform the optimization with respect to all possible linear unbiased estimators.
Problem definition in vectorized notation - Finding the optimal choice of $\tilde\mu$ amounts to optimizing a quadratic function under linear constraints. To see this more clearly, we temporarily neglect the fact that the various objects in eq. (B1) are operators. Instead, we think of them as vectors, upon some choice of orthonormal basis of the real vector space of Hermitian operators. The error term $\Delta^2$, which is what we need to minimize, can be written in vectorized notation as
$$\Delta^2=\sum_{b,i,j}\mu_{bi}\,\rho_i\,\tilde\mu_{bj}^2,\tag{B3}$$
and the minimization must be performed with respect to the real parameters $\tilde\mu_{bj}$. More explicitly, this notation amounts to decomposing the operators as
$$\mu_b=\sum_i\mu_{bi}\sigma_i,\qquad\tilde\mu_b=\sum_j\tilde\mu_{bj}\sigma_j,\qquad\rho=\sum_i\rho_i\sigma_i$$
for some fixed choice of orthonormal operatorial basis $\{\sigma_i\}$.
We need to take into account that not all sets of parameters $\tilde\mu_{bj}$ correspond to a valid dual frame of $\mu$. The definition of dual frame can be written in vectorized notation as
$$\sum_b\Big(\sum_i\mu_{bi}\rho_i\Big)\tilde\mu_{bj}=\rho_j\qquad\forall j,$$
and this must hold for all possible choices of $\rho$. These are in principle infinitely many constraints. However, they are equivalent to the finite set of constraints obtained by using as $\rho$ the elements of the chosen operatorial basis $\{\sigma_i\}$. These constraints read
$$\sum_b\mu_{bi}\tilde\mu_{bj}=\delta_{ij}\qquad\forall i,j.\tag{B5}$$
Let us denote this set of constraints as $\phi_{ij}\equiv\phi_{ij}(\mu,\tilde\mu)=0$, having defined $\phi_{ij}\equiv\sum_b\mu_{bi}\tilde\mu_{bj}-\delta_{ij}$.

Lagrange multipliers to find stationary points - To find the minimum of eq. (B3) under the constraints in eq. (B5), we can use the general method of Lagrange multipliers. For the cost function to have a stationary point under the given constraints, its gradient must lie in the linear span of the gradients of the constraints. More explicitly, there must be a set of coefficients $\lambda_{ij}$ such that, for all $b,k$,
$$\frac{\partial\Delta^2}{\partial\tilde\mu_{bk}}=\sum_{ij}\lambda_{ij}\frac{\partial\phi_{ij}}{\partial\tilde\mu_{bk}}.\tag{B7}$$
Computing the derivatives explicitly we find
$$\frac{\partial\Delta^2}{\partial\tilde\mu_{bk}}=2\langle\mu_b,\rho\rangle\,\tilde\mu_{bk},\qquad\frac{\partial\phi_{ij}}{\partial\tilde\mu_{bk}}=\mu_{bi}\,\delta_{jk},$$
and thus eq. (B7) becomes
$$2\langle\mu_b,\rho\rangle\,\tilde\mu_{bk}=\sum_i\lambda_{ik}\,\mu_{bi}.\tag{B9}$$
Think of $\lambda$, $\mu$, $\tilde\mu$ as matrices, and define the diagonal matrix $\Lambda$ with components $\Lambda_{ab}\equiv\delta_{ab}\langle\mu_b,\rho\rangle$. Then eqs. (B5) and (B9) can be written concisely as
$$\mu^T\tilde\mu=I,\qquad2\Lambda\tilde\mu=\mu\lambda.\tag{B10}$$
We put these together and assume $\Lambda$ to be invertible, which amounts to using $\rho$ such that $\langle\mu_b,\rho\rangle>0$ for all $b$. This gives $2I=2\mu^T\tilde\mu=\mu^T\Lambda^{-1}\mu\lambda$. We thus conclude that the set of coefficients $\lambda_{ij}$ must have the form
$$\lambda=2\,(\mu^T\Lambda^{-1}\mu)^{-1}.$$
In writing this, we interpret $\lambda$ as a matrix, that is, as a linear operator on the real vector space of Hermitian operators. In other words, in this context we can interpret the set of Lagrange multipliers as a quantum map satisfying the given relations. We can safely talk about the inverse of $\mu^T\Lambda^{-1}\mu$ because the corresponding map is invertible provided that $\mu$ is an IC-POVM. Indeed, going back to the original formalism in terms of operators, $\mu^T\Lambda^{-1}\mu$ corresponds to the map
$$X\mapsto\sum_b\frac{\langle\mu_b,X\rangle}{\langle\mu_b,\rho\rangle}\,\mu_b.$$
If $\{\mu_b\}$ is an IC-POVM, its elements span the space, and the quantum map thus defined is invertible. With this solution for $\lambda$, we can now find the optimal dual frame $\tilde\mu$ using eq. (B10) as
$$\tilde\mu=\frac12\Lambda^{-1}\mu\lambda=\Lambda^{-1}\mu\,(\mu^T\Lambda^{-1}\mu)^{-1}.$$
Note that $\mu$ is not, in general, an invertible (nor even a square) matrix. We therefore cannot simplify the inverse $(\mu^T\Lambda^{-1}\mu)^{-1}$ using the inverses of its factors.
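The following sketch assumes NumPy and works in the real basis $\{I,X,Y,Z\}/\sqrt2$ for a hypothetical non-uniform qubit POVM. It implements the matrix solution $\tilde\mu=\Lambda^{-1}\mu(\mu^T\Lambda^{-1}\mu)^{-1}$, checks the constraints $\mu^T\tilde\mu=I$ and the relation $\tilde\mu^T\Lambda\tilde\mu=(\mu^T\Lambda^{-1}\mu)^{-1}$, and compares the cost $\Delta^2$ of eq. (B3) with that of another valid dual frame. The alternative dual is obtained by adding a random perturbation in the kernel of $\mu^T$. Names and numbers are illustrative.

```python
import numpy as np

# Orthonormal Hermitian basis {sigma_i} = {I, X, Y, Z} / sqrt(2) for a single qubit.
paulis = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]
basis = [P / np.sqrt(2) for P in paulis]


def coords(A):
    """Real coordinates A_i = <sigma_i, A> of a Hermitian operator A."""
    return np.array([np.vdot(s, A).real for s in basis])


# Illustrative rank-1 IC-POVM: measure X, Y, Z with (hypothetical) unequal probabilities.
effects = [w * (paulis[0] + s * P) / 2
           for w, P in zip((0.5, 0.3, 0.2), paulis[1:]) for s in (+1, -1)]
rho = np.array([[0.7, 0.1 + 0.2j], [0.1 - 0.2j, 0.3]])

mu = np.array([coords(m) for m in effects])        # m x d^2 matrix with entries mu_{bi}
Lam = np.diag(mu @ coords(rho))                    # Lambda_bb = <mu_b, rho>
Lam_inv = np.linalg.inv(Lam)
M = mu.T @ Lam_inv @ mu                            # matrix representation of F_rho
mu_opt = Lam_inv @ mu @ np.linalg.inv(M)           # optimal dual; rows are tilde mu_b


def delta2(dual):
    """Cost Delta^2 = sum_{b,j} <mu_b, rho> tilde mu_{bj}^2, eq. (B3)."""
    return float(np.sum(np.diag(Lam)[:, None] * dual**2))


# Any other dual frame differs from mu_opt by some N with mu^T N = 0.
rng = np.random.default_rng(1)
Q = np.eye(len(effects)) - mu @ np.linalg.inv(mu.T @ mu) @ mu.T   # projector onto ker(mu^T)
other = mu_opt + Q @ rng.normal(size=mu.shape)
```

The comparison works because the cross term $\operatorname{tr}(\tilde\mu^T\Lambda N)=\operatorname{tr}(M^{-1}\mu^TN)$ vanishes whenever $\mu^TN=0$. Any other dual therefore pays an extra $\operatorname{tr}(N^T\Lambda N)\ge0$.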
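Going back to the notation with operators, the optimal dual frame we just found corresponds to the operators
$$\tilde\mu^{(\rho)}_b=\frac{F_\rho^{-1}(\mu_b)}{\langle\mu_b,\rho\rangle},\qquad F_\rho\equiv\sum_b\frac{\mathcal P(\mu_b)}{\langle\mu_b,\rho\rangle}.\tag{B12}$$
Here $F_\rho$ is the map associated with the Lagrange multipliers through $\lambda=2F_\rho^{-1}$. It can also be seen as the frame operator of the rescaled frame with elements $\mu_b/\sqrt{\langle\mu_b,\rho\rangle}$. An explicit expression of $F_\rho^{-1}$ in terms of $\tilde\mu$ can be obtained using eq. (B10) again: we get $\lambda=2\tilde\mu^T\Lambda\tilde\mu$, and therefore
$$F_\rho^{-1}=\sum_b\langle\mu_b,\rho\rangle\,\mathcal P(\tilde\mu^{(\rho)}_b),$$
to be compared with $F_\rho$ in eq. (B12).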
It is worth stressing the precise kind of "optimality" we just derived. The above optimal dual frame $\tilde\mu_b$ is an unbiased estimator with respect to all states, meaning $\sum_b\langle\mu_b,\rho\rangle\tilde\mu_b=\rho$ for all $\rho$. However, the associated estimation error and its optimality depend on the specific state $\rho$ being examined. Different choices of $\rho$ provide different optimal unbiased estimators, although all of these estimators are unbiased with respect to all states. Suppose one is looking for the estimator that is optimal on average over all possible input states, according to the uniform Haar measure. Then the above reasoning applies with the choice $\rho=I/d$, and the corresponding optimal estimator reads
$$\tilde\mu^{\rm can}_b=\frac{F^{-1}_{I/d}(\mu_b)}{\langle\mu_b,I/d\rangle}=\frac{d}{\operatorname{tr}(\mu_b)}\,F^{-1}_{I/d}(\mu_b),$$
where $F_{I/d}=d\sum_b\mathcal P(\mu_b)/\operatorname{tr}(\mu_b)$.

Appendix C: Optimal observable estimators
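An analogous calculation can be performed to find the optimal estimator for the expectation value of a target observable $O$. Let $\tilde\mu_b$ be a generic dual frame, with corresponding estimator $\hat o(b)\equiv\langle O,\tilde\mu_b\rangle$, and let $\rho$ be the true state. The variance reads
$$\mathrm{Var}[\hat o]=\sum_b\langle\mu_b,\rho\rangle\,\langle O,\tilde\mu_b\rangle^2-\langle O,\rho\rangle^2.$$
We focus on minimizing the first term with respect to $\tilde\mu$, as the second term depends only on $\rho$ and $O$. In vectorized notation, the first term can be rewritten as
$$\sum_b\langle\mu_b,\rho\rangle\Big(\sum_j\tilde\mu_{bj}O_j\Big)^2=O^T\tilde\mu^T\Lambda\,\tilde\mu\,O.\tag{C2}$$
Note that in this notation $\tilde\mu$ and $\mu$ are matrices, $\Lambda$ is a diagonal matrix, and $O$ is a vector. The constraints on the estimators remain $\tilde\mu^T\mu=\mu^T\tilde\mu=I$, which amounts to the set of constraints $\sum_b\mu_{bi}\tilde\mu_{bj}=\delta_{ij}$. We take the derivative with respect to $\tilde\mu_{bk}$ of both the cost function (C2) and the constraints. This shows that there must be coefficients $\lambda_{ij}$ such that
$$2\langle\mu_b,\rho\rangle\Big(\sum_j\tilde\mu_{bj}O_j\Big)O_k=\sum_i\lambda_{ik}\,\mu_{bi}\qquad\forall b,k.$$
In more compact matrix notation, denoting with $\lambda$ the matrix with components $\lambda_{ij}$, we obtain the condition
$$2\Lambda\tilde\mu\,OO^T=\mu\lambda.$$
We multiply both sides from the left first by $\Lambda^{-1}$ and then by $\mu^T$. Since $\mu^T\Lambda^{-1}\mu$ is the matrix representation of $F_\rho$, which is invertible for IC-POVMs, we find
$$\lambda=2F_\rho^{-1}OO^T.$$
We thus conclude that the optimal estimators are given by $\Lambda\tilde\mu\,OO^T=\mu F_\rho^{-1}OO^T$. Explicitly, this amounts to $\tilde\mu\,O=\Lambda^{-1}\mu F_\rho^{-1}O$. In operator notation this reads
$$\langle O,\tilde\mu_b\rangle=\frac{\langle O,F_\rho^{-1}(\mu_b)\rangle}{\langle\mu_b,\rho\rangle}\qquad\forall b.$$
We thus get essentially the same result as before. The only difference is that the optimal estimator is now required to equal $F_\rho^{-1}(\mu_b)/\langle\mu_b,\rho\rangle$ only on the span of $O$. Equivalently, in terms of the optimal unbiased estimator given in eq. (A5), the unbiased estimators $\tilde\mu_b$ that are optimal with respect to $O$ satisfy $\langle O,\tilde\mu_b\rangle=\langle O,\tilde\mu^{(\rho)}_b\rangle$ for all $b$. The associated variance can be written in terms of the MSE matrix as
$$\mathrm{Var}[\hat o]=\operatorname{tr}\big[\mathcal P(O)\,C^{\rm opt}_\rho\big]=\langle O,F_\rho^{-1}(O)\rangle-\langle O,\rho\rangle^2.$$
Here $\mathcal P(O)$ denotes, as for similar projectors, the map $X\mapsto\langle O,X\rangle O$ for all $X\in\mathrm{Lin}(\mathbb C^d)$.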
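As an illustration, the sketch below uses NumPy with a hypothetical qubit POVM, state and observable. It computes the optimal single-shot estimates $\langle O,F_\rho^{-1}(\mu_b)\rangle/\langle\mu_b,\rho\rangle$ and checks three properties:

- The estimates are unbiased.
- Their variance equals $\langle O,F_\rho^{-1}(O)\rangle-\langle O,\rho\rangle^2$.
- The canonical, state-independent estimator does no better.

```python
import numpy as np

d = 2
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(d, dtype=complex)

# Hypothetical rank-1 IC-POVM (X, Y, Z measured with unequal probabilities), state and observable.
effects = [w * (I2 + s * P) / 2 for w, P in zip((0.5, 0.3, 0.2), (X, Y, Z)) for s in (+1, -1)]
rho = np.array([[0.7, 0.1 + 0.2j], [0.1 - 0.2j, 0.3]])
O = 0.5 * X + Z


def vec(A):
    return A.reshape(-1)


def hs(A, B):
    # Hilbert-Schmidt inner product tr(A^dag B), real for Hermitian A, B
    return np.vdot(A, B).real


probs = np.array([hs(mu, rho) for mu in effects])
F = sum(np.outer(vec(mu), vec(mu).conj()) / p for mu, p in zip(effects, probs))
F_inv = np.linalg.inv(F)
# Optimal single-shot estimates o(b) = <O, F_rho^{-1}(mu_b)> / <mu_b, rho>
o_opt = np.array([np.vdot(vec(O), F_inv @ vec(mu)).real / p for mu, p in zip(effects, probs)])

# Canonical (state-independent) estimates o(b) = d <O, F_{I/d}^{-1}(mu_b)> / tr(mu_b)
F_can_inv = np.linalg.inv(d * sum(np.outer(vec(mu), vec(mu).conj()) / np.trace(mu).real
                                  for mu in effects))
o_can = np.array([d * np.vdot(vec(O), F_can_inv @ vec(mu)).real / np.trace(mu).real
                  for mu in effects])

true_value = hs(O, rho)
var_opt = probs @ o_opt**2 - true_value**2
var_formula = np.vdot(vec(O), F_inv @ vec(O)).real - true_value**2
var_can = probs @ o_can**2 - true_value**2
```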
We thus conclude that finding the state estimator that is optimal for a target observable amounts to finding an estimator that acts like the overall optimal state estimator on the span of the observable. If the goal is estimating a number of a priori unknown observables, we can thus directly use the optimal state estimator $\tilde\mu^{(\rho)}_b$, which is then also optimal from the point of view of shadow tomography.

In this section we prove the equivalence between weighted complex projective 2-designs and tight measurement frames. We also discuss the general properties of tight measurement frames and prove the known lower bounds on the $L_2$ average estimation error corresponding to canonical state estimators. Although we use a slightly different formalism, the idea behind the proof reported here is analogous to the one reported in [38].

For a rank-1 measurement with elements $\mu_b=w_b\mathcal P(\psi_b)$, the canonical frame superoperator reads $F_{I/d}=d\sum_b w_b\,\mathcal P(\mathcal P(\psi_b))$. Here $\mathcal P(\mathcal P(\psi_b))$ is a linear operator acting on the space of linear operators, which projects onto the linear operator $\mathcal P(\psi_b)$. This object is a quantum map, which acts on any $X\in\mathrm{Lin}(\mathbb C^d)$ as
$$\mathcal P(\mathcal P(\psi_b))(X)=\langle\psi_b|X|\psi_b\rangle\,\mathcal P(\psi_b).$$
Being a quantum map, it admits a Choi representation. Given any map $\Phi:\mathrm{Lin}(H_A)\to\mathrm{Lin}(H_B)$, we define its Choi representation as the operator $J(\Phi)\in\mathrm{Lin}(H_B\otimes H_A)$ such that
$$J(\Phi)=\sum_{ij}\Phi(|i\rangle\langle j|)\otimes|i\rangle\langle j|.$$
For an arbitrary map of the form $\Phi(X)=\langle A,X\rangle B$ the Choi representation is $J(\Phi)=B\otimes\bar A$. It follows that
$$J\big(\mathcal P(\mathcal P(\psi_b))\big)=\mathcal P(\psi_b)\otimes\overline{\mathcal P(\psi_b)}=\big[\mathcal P(\psi_b)\otimes\mathcal P(\psi_b)\big]^{T_B},$$
and thus, for the frame superoperator,
$$J(F_{I/d})=d\sum_b w_b\,\big[\mathcal P(\psi_b)^{\otimes2}\big]^{T_B},$$
where $T_B$ denotes the partial transpose on the second space. This expression is useful because it provides a direct connection with the defining property of weighted 2-designs. The vectors $|\psi_b\rangle$ form a complex projective 2-design with weights $w_b$ iff
$$\sum_b w_b\,\mathcal P(\psi_b)^{\otimes2}=d\,\frac{\Pi_{\rm sym}}{\operatorname{tr}(\Pi_{\rm sym})}=\frac{2\,\Pi_{\rm sym}}{d+1}.\tag{D6}$$
The factor $d$ on the right-hand side of this equation comes from $\sum_b w_b=d$, whereas in the standard definition of weighted 2-designs the weights are normalized to 1.
Using this relation we get
$$J(F_{I/d})=\frac{2d}{d+1}\,\Pi_{\rm sym}^{T_B}=\frac{d}{d+1}\,(I\otimes I+W)^{T_B}=\frac{d}{d+1}\,(I\otimes I+P_m),$$
where we expressed the projector in terms of the swap operator $W$ via $\Pi_{\rm sym}=(I+W)/2$. We observe that $W^{T_B}=P_m\equiv\sum_{ij}|ii\rangle\langle jj|$, $J(\mathcal P(I))=I\otimes I$, and $J(\mathrm{Id})=P_m$. Since the Choi representation is a linear isomorphism between maps and operators, we conclude that
$$F_{I/d}=\frac{d}{d+1}\big(\mathrm{Id}+\mathcal P(I)\big).\tag{D8}$$
This derivation shows that, for any rank-1 IC-POVM with elements $\mu_b=w_b\mathcal P(\psi_b)$, the frame superoperator $F_{I/d}$ has this form if and only if the vectors $|\psi_b\rangle$ and weights $w_b$ form a weighted 2-design. Equation (D8) differs by a factor of $d$ from the expressions for tight frames found, e.g., in [17]. This is simply because the two definitions of the frame superoperator differ by a factor $d$, and it does not affect our results.
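Eqs. (D6) and (D8) can be checked numerically, assuming NumPy. The illustrative example is the six eigenvectors of the single-qubit Pauli matrices with weights $w_b=1/3$, a standard example of a 2-design. These states satisfy the 2-design condition and give $F_{I/d}=\frac{d}{d+1}(\mathrm{Id}+\mathcal P(I))$. Unequal weights on the same states do not. The helper names are hypothetical.

```python
import numpy as np

d = 2
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(d, dtype=complex)

# Eigenvectors of X, Y, Z: six states forming a qubit 2-design (standard example).
states = [v for P in (X, Y, Z) for v in np.linalg.eigh(P)[1].T]
projs = [np.outer(v, v.conj()) for v in states]


def vec(A):
    return A.reshape(-1)


def frame(weights):
    # F_{I/d} = d * sum_b w_b P(P(psi_b)), eq. (D1)
    return d * sum(w * np.outer(vec(P), vec(P).conj()) for w, P in zip(weights, projs))


# Swap operator W and symmetric projector Pi_sym = (I + W) / 2 on C^d (x) C^d.
W = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        W[j * d + i, i * d + j] = 1.0
Pi_sym = (np.eye(d * d) + W) / 2

F_D8 = d / (d + 1) * (np.eye(d * d) + np.outer(vec(I2), vec(I2).conj()))
tight = [1 / 3] * 6                          # sums to d = 2
skewed = [0.5, 0.5, 0.3, 0.3, 0.2, 0.2]      # also sums to d, but not a 2-design
```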
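Structure of MSE matrix for tight measurements - Suppose now that $\mu$ is a tight rank-1 IC-POVM, so that the frame superoperator satisfies eq. (D8). We can rewrite this expression as
$$F_{I/d}=\frac{d}{d+1}\Big(\mathrm{Id}-\frac{\mathcal P(I)}{d}\Big)+\mathcal P(I).\tag{D9}$$

Lower error bounds for tight measurement frames - Consider the $L_2$ estimation error averaged over unitarily equivalent input states, using any unbiased estimator that does not depend on the input state. We show here that it is lower bounded by $d^2+d-1-\operatorname{tr}(\rho^2)$, with the inequality saturated for rank-1 tight measurements. This was first proven in [17,28]. To estimate the average state estimation errors we use the MSE matrix $C_\rho$ discussed in eq. (A12). Assume the estimators $\tilde\mu_b$ do not depend on the input state (as is the case for the canonical estimator, but not for the optimal ones). Taking the uniform average over states unitarily equivalent to $\rho$, we get
$$\int dU\,C_{U\rho U^\dagger}=\sum_b\frac{\operatorname{tr}(\mu_b)}{d}\,\mathcal P(\tilde\mu_b)-\int dU\,\mathcal P(U\rho U^\dagger),$$
where the integral is taken with respect to the uniform Haar measure on the group of unitary matrices. Taking the trace, we get the average error as
$$\int dU\,\operatorname{tr}\big(C_{U\rho U^\dagger}\big)=\sum_b\frac{\operatorname{tr}(\mu_b)}{d}\operatorname{tr}(\tilde\mu_b^2)-\operatorname{tr}(\rho^2)\ \ge\ \operatorname{tr}(F^{-1}_{I/d})-\operatorname{tr}(\rho^2).$$
Equality holds for the canonical estimator, which is optimal for the reference state $I/d$. For tight rank-1 measurement frames we know from eq. (D10) that $\operatorname{tr}(F^{-1}_{I/d})=d^2+d-1$. Let us now show that this is also the lower bound for an arbitrary measurement. From eq. (A3) we see that, for any $\mu$,
$$\operatorname{tr}(F_{I/d})=d\sum_b\frac{\operatorname{tr}(\mu_b^2)}{\operatorname{tr}(\mu_b)}\le d\sum_b\operatorname{tr}(\mu_b)=d^2,$$
where we used the inequality $\operatorname{tr}(X^2)\le\operatorname{tr}(X)^2$ for $X\ge0$, which is saturated iff $\operatorname{rank}(X)=1$. Hence $\operatorname{tr}(\tilde F_{I/d})=\operatorname{tr}(F_{I/d})-d\le d(d-1)$. Denote by $\lambda_k>0$ the $d^2-1$ eigenvalues of $\tilde F_{I/d}$ on the traceless subspace. The arithmetic-harmonic mean inequality then gives
$$\operatorname{tr}(\tilde F^{-1}_{I/d})=\sum_{k=1}^{d^2-1}\frac1{\lambda_k}\ge\frac{(d^2-1)^2}{\operatorname{tr}(\tilde F_{I/d})}\ge\frac{(d^2-1)(d+1)}{d}.$$
Equality holds iff the measurement is rank-1 and all the eigenvalues have the same value, that is, iff $\tilde F_{I/d}$ is a multiple of the identity when acting on the $(d^2-1)$-dimensional subspace of traceless Hermitian matrices. We conclude that for any measurement we have the lower bound $\operatorname{tr}(F^{-1}_{I/d})=1/d+\operatorname{tr}(\tilde F^{-1}_{I/d})\ge d^2+d-1$, with the inequality saturated for tight rank-1 measurements. We have therefore proven that, for any measurement, the average $L_2$ estimation error when using the canonical estimator is lower bounded as
$$\int dU\,\mathbb E\,\|f-\rho_U\|_2^2\ \ge\ d^2+d-1-\operatorname{tr}(\rho^2),\qquad\rho_U\equiv U\rho U^\dagger.$$
It is also possible to study the errors corresponding to more general, not necessarily rank-1, tight IC-POVMs. This analysis can be found in [30], where the smallest possible average $L_2$ estimation error is derived for POVM elements with average purity $\wp$.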
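The bound can be illustrated numerically with the sketch below, which assumes NumPy; the POVM constructions and helper names are hypothetical. The tight six-outcome qubit POVM attains $\operatorname{tr}(F^{-1}_{I/d})=d^2+d-1=5$. A non-uniform rank-1 POVM exceeds it. So does a full-rank noisy version of the tight POVM, which also has $\operatorname{tr}(F_{I/d})<d^2$.

```python
import numpy as np

d = 2
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(d, dtype=complex)


def pauli_povm(weights, noise=0.0):
    """Hypothetical helper: measure X, Y, Z with the given probabilities; `noise` mixes each
    effect with tr(mu_b) I / d, which keeps sum_b mu_b = I but makes the effects full rank."""
    rank1 = [w * (I2 + s * P) / 2 for w, P in zip(weights, (X, Y, Z)) for s in (+1, -1)]
    return [(1 - noise) * mu + noise * np.trace(mu).real * I2 / d for mu in rank1]


def vec(A):
    return A.reshape(-1)


def canonical_frame(effects):
    """F_{I/d} = d * sum_b P(mu_b) / tr(mu_b) as a d^2 x d^2 matrix."""
    return d * sum(np.outer(vec(mu), vec(mu).conj()) / np.trace(mu).real for mu in effects)


def trace_inverse(effects):
    """tr(F_{I/d}^{-1}), i.e. the averaged L2 error of the canonical estimator plus tr(rho^2)."""
    return np.trace(np.linalg.inv(canonical_frame(effects))).real


bound = d**2 + d - 1
tight = pauli_povm((1 / 3, 1 / 3, 1 / 3))
skewed = pauli_povm((0.5, 0.3, 0.2))
noisy = pauli_povm((1 / 3, 1 / 3, 1 / 3), noise=0.2)
```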

Appendix E: Errors to estimate single observables
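As discussed in appendix D, the MSE matrix $C_\rho$ is a useful tool for studying the estimation errors associated with a given state estimator. Suppose now we want to estimate the expectation value of some observable $O$ on a state $\rho$, using an unbiased estimator of the form $\hat o(b)\equiv\langle O,f(b)\rangle=\langle O,\tilde\mu_b\rangle$. The associated average squared error is
$$\mathrm{Var}[\hat o]=\sum_b\langle\mu_b,\rho\rangle\,\langle O,\tilde\mu_b\rangle^2-\langle O,\rho\rangle^2.$$
This quantity can be conveniently expressed via the MSE matrix as $\mathrm{Var}[\hat o]=\langle O,C_\rho(O)\rangle=\operatorname{tr}[\mathcal P(O)\,C_\rho]$. Let us focus on the behavior of this variance when using the canonical state-independent estimator $\tilde\mu^{\rm can}_b$. With this choice, the variance reads explicitly
$$\mathrm{Var}[\hat o]=\sum_b\langle\mu_b,\rho\rangle\,\langle O,\tilde\mu^{\rm can}_b\rangle^2-\langle O,\rho\rangle^2.$$
We average over unitarily equivalent input states, assuming the input state to have purity $P\equiv\operatorname{tr}(\rho^2)$. This gives the averaged variance
$$\overline{\mathrm{Var}}[\hat o]=\langle O,F^{-1}_{I/d}(O)\rangle-\beta,$$
where $\beta$ is the squared expectation value $\langle O,\rho\rangle^2$, averaged over all states unitarily equivalent to $\rho$. This quantity is computed using the known formulas for integrating polynomials in the components of unitary matrices over the uniform Haar measure [39]. It can be cast in the following convenient form:
$$\beta=\langle O\rangle^2+\frac{dP-1}{d^2-1}\,V.$$
Here $V\equiv\langle O^2\rangle-\langle O\rangle^2$, with $\langle O\rangle\equiv\operatorname{tr}(O)/d$ and $\langle O^2\rangle\equiv\operatorname{tr}(O^2)/d$, is the variance of the observable on the maximally mixed state. We therefore have the general expression for the variance averaged over states unitarily equivalent to $\rho$:
$$\overline{\mathrm{Var}}[\hat o]=\langle O_0,F^{-1}_{I/d}(O_0)\rangle-\frac{dP-1}{d^2-1}\,V,\qquad O_0\equiv O-\frac{\operatorname{tr}(O)}{d}\,I.$$
The first term can furthermore be bounded in terms of the eigenvalues of $\tilde F_{I/d}$, as
$$\frac{Vd}{\lambda_{\max}(\tilde F_{I/d})}\le\langle O_0,F^{-1}_{I/d}(O_0)\rangle\le\frac{Vd}{\lambda_{\min}(\tilde F_{I/d})}.\tag{E8}$$
Here $\lambda_{\min}(\tilde F_{I/d})$ and $\lambda_{\max}(\tilde F_{I/d})$ denote the smallest and largest eigenvalues of $\tilde F_{I/d}$ on the traceless subspace. $\tilde F_{I/d}$ is positive semidefinite as an operator, and positive definite on that subspace for IC-POVMs. The bound follows from observing that $\tilde F_{I/d}$, and therefore also $F^{-1}_{I/d}$, acts as a Hermitian linear operator on the subspace of $\mathrm{Herm}(\mathbb C^d)$ spanned by traceless Hermitian operators. In general, let $H\in\mathrm{Lin}(\mathcal V)$ be a Hermitian operator acting on some vector space $\mathcal V$ such that $W\le\mathcal V$ is an invariant subspace. Then for any $v\in\mathcal V$ we have
$$\lambda_{\min}(H|_W)\,\|v_W\|^2\le\langle v_W,Hv_W\rangle\le\lambda_{\max}(H|_W)\,\|v_W\|^2,$$
where $v_W\equiv\Pi_W v$ is the orthogonal projection of $v$ onto $W$. To identify which measurements optimize these bounds, we exploit the fact that the canonical frame superoperator is positive semidefinite. We write its eigenvalues on the traceless subspace as $\lambda_k$. We then optimize the maximum and minimum eigenvalues under the additional constraints $\sum_k\lambda_k=A$ and $\sum_k\lambda_k^2=B$, for some fixed $A$, $B$.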
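The closed form $\beta=\langle O\rangle^2+(dP-1)V/(d^2-1)$ can be sanity-checked by Monte Carlo sampling of Haar-random unitaries. Here $V=\langle O^2\rangle-\langle O\rangle^2$ is evaluated on the maximally mixed state. The sketch below assumes NumPy and uses the standard QR-based sampler with a phase correction. The observable, state, sample size, and seed are arbitrary illustrative choices.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random unitary: QR of a complex Gaussian matrix, with column phases fixed."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # multiplies column k of q by phases[k]


def beta_closed_form(O, rho):
    """beta = <O>^2 + (d P - 1) V / (d^2 - 1), with <.> taken on the maximally mixed state."""
    d = O.shape[0]
    purity = np.trace(rho @ rho).real
    mean_O = np.trace(O).real / d
    V = np.trace(O @ O).real / d - mean_O**2
    return mean_O**2 + (d * purity - 1) * V / (d**2 - 1)


def beta_monte_carlo(O, rho, samples=20000, seed=0):
    """Average of <O, U rho U^dag>^2 over Haar-random U."""
    rng = np.random.default_rng(seed)
    d = O.shape[0]
    total = 0.0
    for _ in range(samples):
        U = haar_unitary(d, rng)
        total += np.trace(O @ U @ rho @ U.conj().T).real ** 2
    return total / samples


O = np.diag([2.0, 0.0, -1.0])    # illustrative observable
rho = np.diag([0.6, 0.3, 0.1])   # illustrative state with purity P = 0.46
```

For these choices $P=0.46$ and $V=14/9$, so the closed form gives $\beta=0.185$. The Monte Carlo estimate should agree with it up to sampling error.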
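For this purpose, we use the method of Lagrange multipliers [49]. The eigenvalues of $F^{-1}_{I/d}$ are the reciprocals $1/\lambda_k$, and all eigenvalues enter the constraints symmetrically. It therefore suffices to optimize a single eigenvalue $\lambda_1$ under the above constraints and take the reciprocals of the solutions at the end. Assuming all eigenvalues to be strictly positive, we define the Lagrangian function $L\equiv L(\lambda,\alpha,\beta)$ as
$$L=\lambda_1+\alpha\Big(\sum_k\lambda_k-A\Big)+\beta\Big(\sum_k\lambda_k^2-B\Big),$$
and imposing $\nabla L=0$ we obtain the conditions
$$1+\alpha+2\beta\lambda_1=0,\qquad\alpha+2\beta\lambda_j=0,\quad j>1.\tag{E11}$$
These imply in particular that $\lambda_j=\lambda_k$ for all $j,k>1$. By construction, there are $d^2-1$ eigenvalues. Denoting with $\lambda$ the common value of the $\lambda_{j>1}$, the constraints become
$$\lambda_1+(d^2-2)\lambda=A,\qquad\lambda_1^2+(d^2-2)\lambda^2=B,$$
which solve to
$$\lambda_1=\frac{A\pm\sqrt{(d^2-2)\big[(d^2-1)B-A^2\big]}}{d^2-1}.$$
The two solutions coincide only when $(d^2-1)B=A^2$, that is, when all eigenvalues are equal. Since $\lambda_{\min}\le A/(d^2-1)\le d/(d+1)$, the upper bound in eq. (E8) is never smaller than $(d+1)V$. This minimum is achieved when the measurement is a tight rank-1 IC-POVM.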
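The stationary values of $\lambda_1$ found above can be checked against random spectra. For any spectrum with the same $A=\sum_k\lambda_k$ and $B=\sum_k\lambda_k^2$, every eigenvalue must lie between the two solutions. The sketch assumes NumPy. The random spectrum is purely illustrative and does not come from an actual measurement.

```python
import numpy as np


def extreme_eigenvalues(d, A, B):
    """Stationary values of lambda_1 for n = d^2 - 1 eigenvalues constrained by
    sum(lambda) = A and sum(lambda**2) = B, with all other eigenvalues equal (eq. (E11))."""
    n = d**2 - 1
    # The discriminant is nonnegative because n * B >= A**2 (Cauchy-Schwarz).
    root = np.sqrt(max((n - 1) * (n * B - A**2), 0.0))
    return (A - root) / n, (A + root) / n


d = 3
n = d**2 - 1
rng = np.random.default_rng(2)
lams = rng.uniform(0.1, 1.0, size=n)   # hypothetical spectrum of tilde F_{I/d}
A, B = lams.sum(), (lams**2).sum()
lo, hi = extreme_eigenvalues(d, A, B)
```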
In contrast to the case of projector-like observables, this lower bound is not bounded from above by a constant independent of the dimension $d$. Indeed, for all $\rho$,
$$d+1-\frac1{d+1}\le\min_\mu\overline{\mathrm{Var}}[\hat o]_{\rm Pauli}\le d+1.$$

We proved that the canonical dual frame is, in general, not the optimal choice of unbiased estimator when the goal is reconstructing the state from a finite number of measurements. Nonetheless, the optimal dual frame can be seen as the canonical dual frame computed with respect to a rescaled frame with elements $\mu^N_b\equiv\mu_b/\sqrt{\langle\mu_b,\rho\rangle}$. The set of operators $\{\mu^N_b\}_b$ is a frame iff $\{\mu_b\}_b$ is. Furthermore, the corresponding frame operator is precisely the frame superoperator $F_\rho$ derived above.
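We briefly show in this section the consequences of considering frames of operators defined in terms of rescaled POVM elements, for arbitrary rescalings. In particular, we show what the expansion of a generic state looks like in this formalism, and the unbiased estimators associated with each choice of rescaling. These observations are not pivotal to the main results of the paper, but are presented here for the sake of completeness. Consider the rescaled frame operator $S_\alpha\equiv\sum_b\mathcal P(\mu_b)/\alpha_b$, the estimator $f(b)=S_\alpha^{-1}(\mu_b)/\alpha_b$, and the choice $\alpha_b=\langle\mu_b,\rho\rangle$. With these, we have
$$\mathbb E\,\operatorname{tr}(f^2)=\sum_b\frac1{\alpha_b}\operatorname{tr}\big[(S_\alpha^{-1}(\mu_b))^2\big]=\operatorname{tr}(S_\alpha^{-1}).\tag{G7}$$
This reduces the problem to that of looking for the measurement $\mu$ that minimizes $\operatorname{tr}(S_\alpha^{-1})$ with the above choice of $\alpha_b$.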
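The sketch below assumes NumPy and a hypothetical non-uniform qubit POVM. It checks three properties of the rescaled estimator $f(b)=S_\alpha^{-1}(\mu_b)/\alpha_b$, with $S_\alpha=\sum_b\mathcal P(\mu_b)/\alpha_b$:

- The estimator is unbiased for several choices of positive weights $\alpha_b$.
- $\mathbb E\,\operatorname{tr}(f^2)=\operatorname{tr}(S_\alpha^{-1})$ for $\alpha_b=\langle\mu_b,\rho\rangle$, as in eq. (G7).
- The canonical choice $\alpha_b=\operatorname{tr}(\mu_b)/d$ does no better.

The random weights are an arbitrary illustration.

```python
import numpy as np

d = 2
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(d, dtype=complex)

# Hypothetical rank-1 IC-POVM measuring X, Y, Z with unequal probabilities, and a test state.
effects = [w * (I2 + s * P) / 2 for w, P in zip((0.5, 0.3, 0.2), (X, Y, Z)) for s in (+1, -1)]
rho = np.array([[0.7, 0.1 + 0.2j], [0.1 - 0.2j, 0.3]])
probs = [np.trace(mu @ rho).real for mu in effects]


def vec(A):
    return A.reshape(-1)


def rescaled_estimator(alphas):
    """Return S_alpha^{-1} and f(b) = S_alpha^{-1}(mu_b) / alpha_b, with S_alpha = sum_b P(mu_b) / alpha_b."""
    S = sum(np.outer(vec(mu), vec(mu).conj()) / a for mu, a in zip(effects, alphas))
    S_inv = np.linalg.inv(S)
    return S_inv, [(S_inv @ vec(mu)).reshape(d, d) / a for mu, a in zip(effects, alphas)]


def moments(estimates):
    """Mean sum_b p_b f(b) and expected squared norm sum_b p_b tr(f(b)^2)."""
    mean = sum(p * f for p, f in zip(probs, estimates))
    sq_norm = sum(p * np.trace(f @ f).real for p, f in zip(probs, estimates))
    return mean, sq_norm


S_inv_opt, f_opt = rescaled_estimator(probs)                               # alpha_b = <mu_b, rho>
_, f_can = rescaled_estimator([np.trace(mu).real / d for mu in effects])   # alpha_b = tr(mu_b) / d
_, f_rand = rescaled_estimator(np.random.default_rng(3).uniform(0.5, 2.0, size=len(effects)))
```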

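Weighted 2-designs and measurement frames - Consider a rank-1 measurement with elements $\mu_b=w_b\mathcal P(\psi_b)$, $b=1,\dots,m$. Here $w_b\in\mathbb R$ is some set of weights such that $\sum_b w_b=d$, and $|\psi_b\rangle\in\mathbb C^d$ is some set of vectors. The corresponding canonical frame superoperator is by definition equal to
$$F_{I/d}=d\sum_b\frac{\mathcal P(\mu_b)}{\operatorname{tr}(\mu_b)}=d\sum_b w_b\,\mathcal P(\mathcal P(\psi_b)),\tag{D1}$$
where we used $\operatorname{tr}(\mu_b)=w_b$, and we denoted with $\mathcal P(\mathcal P(\psi_b))$ the projector onto the projector $\mathcal P(\psi_b)$. Here $\psi_b\in\mathbb C^d$ is a vector, and $\mathcal P(\psi_b)\equiv|\psi_b\rangle\langle\psi_b|\in\mathrm{Herm}(\mathbb C^d)$ is a linear operator projecting onto $|\psi_b\rangle$. Thus $\mathcal P(\mathcal P(\psi_b))$ is the superoperator projecting onto $\mathcal P(\psi_b)$.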

This rewriting is useful because it splits the action of $F_{I/d}$ into two invariant subspaces. The superoperator $\mathcal P(I)/d$ projects onto the one-dimensional subspace spanned by $I$. The superoperator $\mathrm{Id}-\mathcal P(I)/d$ projects onto the $(d^2-1)$-dimensional subspace of traceless Hermitian matrices. This allows us to see immediately that the inverse of the superoperator is
$$F^{-1}_{I/d}=\frac{d+1}{d}\Big(\mathrm{Id}-\frac{\mathcal P(I)}{d}\Big)+\frac{\mathcal P(I)}{d^2},\tag{D10}$$
so that $\operatorname{tr}(F^{-1}_{I/d})=(d+1)(d^2-1)/d+1/d=d^2+d-1$.

Estimators for tight measurement frames - We know the general structure of the frame superoperator corresponding to a tight measurement with elements $\mu_b=w_b\mathcal P(\psi_b)$. We can therefore explicitly compute the corresponding canonical estimator $f(b)\equiv\tilde\mu^{\rm can}_b$, which gives
$$\tilde\mu^{\rm can}_b=\frac{d}{\operatorname{tr}(\mu_b)}\,F^{-1}_{I/d}(\mu_b)=(d+1)\,\mathcal P(\psi_b)-I.\tag{D11}$$
For a general measurement, instead, $\operatorname{tr}(\tilde F^{-1}_{I/d})=\sum_{k=1}^{d^2-1}1/\lambda_k$, where $\lambda_k$ are the eigenvalues of $\tilde F_{I/d}$. There are $d^2-1$ terms in the sum because $\operatorname{rank}(\tilde F_{I/d})=d^2-1$. A direct application of Lagrange multipliers then yields the minimum value of $\operatorname{tr}(\tilde F^{-1}_{I/d})$ under the constraints $\lambda_k\ge0$ and $\operatorname{tr}(\tilde F_{I/d})=d(d-1)$, which reads
$$\operatorname{tr}(\tilde F^{-1}_{I/d})\ge\frac{(d^2-1)(d+1)}{d}.$$
In the eigenvalue bound of appendix E, $v_W\equiv\Pi_W v$ denotes the orthogonal projection of $v$ onto $W$, with $\Pi_W$ the orthogonal projection operator onto $W$. Applying this bound with $H\to F^{-1}_{I/d}$ and $v\to O$ gives eq. (E8). The reason is that the orthogonal projection of $O$ onto the subspace of traceless Hermitian operators is $O-\operatorname{tr}(O)I/d$, and $\|O-\operatorname{tr}(O)I/d\|^2=Vd$.

Optimization of smallest and largest eigenvalues - Eq. (E8) ties the estimation error associated with an observable closely to an intrinsic property of the frame superoperator of the chosen canonical estimator. We now analyze which class of measurements minimizes these errors, and in particular which classes of measurements provide dimension-independent error scalings.
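The sketch below assumes NumPy and uses the tight six-outcome qubit POVM as an illustrative choice. It first checks eq. (D11) against the general formula $d\,F^{-1}_{I/d}(\mu_b)/\operatorname{tr}(\mu_b)$. It then uses the resulting estimator $3\mathcal P(\psi_b)-I$ to estimate $\langle Z\rangle$ from simulated outcomes, in the spirit of single-qubit classical shadows. The state, sample size, and seed are arbitrary.

```python
import numpy as np

d = 2
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(d, dtype=complex)

# Tight rank-1 POVM: eigenvectors of X, Y, Z with weights w_b = 1/3.
states = [v for P in (X, Y, Z) for v in np.linalg.eigh(P)[1].T]
projs = [np.outer(v, v.conj()) for v in states]
effects = [P / 3 for P in projs]


def vec(A):
    return A.reshape(-1)


F = d * sum(np.outer(vec(mu), vec(mu).conj()) / np.trace(mu).real for mu in effects)
F_inv = np.linalg.inv(F)
canonical = [d / np.trace(mu).real * (F_inv @ vec(mu)).reshape(d, d) for mu in effects]
closed_form = [(d + 1) * P - I2 for P in projs]   # eq. (D11)

# Simulated single-qubit classical shadow for <Z> (illustrative state).
rho = np.array([[0.7, 0.1 + 0.2j], [0.1 - 0.2j, 0.3]])
probs = np.array([np.trace(mu @ rho).real for mu in effects])
probs = probs / probs.sum()                       # guard against rounding before sampling
rng = np.random.default_rng(4)
outcomes = rng.choice(len(effects), size=20000, p=probs)
single_shot = np.array([np.trace(Z @ f).real for f in closed_form])
estimate = single_shot[outcomes].mean()
```

Only the Z-basis outcomes contribute to this particular estimate, with values $\pm3$. This illustrates why shadow variances scale with the norm of the estimator rather than with that of the observable.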

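To provide some examples, let us consider observables of the form $O=\mathcal P(\psi)$ for some $|\psi\rangle$, for which $Vd=(d-1)/d$. Averaging also over $|\psi\rangle$, we obtain
$$\overline{\mathrm{Var}}[\hat o]_{\rm wit}=\frac{1}{d(d+1)}\Big[\operatorname{tr}\big(\tilde F^{-1}_{I/d}\big)-P+\frac1d\Big],$$
whose minimum over measurements is an increasing function of $d$ but bounded from above, as expected for this type of $O$. In a completely similar fashion, consider Pauli observables of the form $O=\sigma_{i_1}\otimes\cdots\otimes\sigma_{i_n}$ acting on $n$ qubits ($d=2^n$). For these we have $Vd=d$, and therefore, averaging over the non-identity Pauli strings,
$$\overline{\mathrm{Var}}[\hat o]_{\rm Pauli}=\frac{d}{d^2-1}\Big[\operatorname{tr}\big(\tilde F^{-1}_{I/d}\big)-P+\frac1d\Big].$$
As before, this quantity is bounded from below by
$$\overline{\mathrm{Var}}[\hat o]_{\rm Pauli}\ge\min_\mu\overline{\mathrm{Var}}[\hat o]_{\rm Pauli}=d+1-\frac{dP-1}{d^2-1},\tag{F8}$$
so that
$$\min_\mu\overline{\mathrm{Var}}[\hat o]_{\rm Pauli}\sim O(d),\qquad d\to\infty.\tag{F9}$$
Comparing the average variances for the two observables we just saw, we can also write, for general measurement frames,
$$\overline{\mathrm{Var}}[\hat o]_{\rm Pauli}=\frac{d^2}{d-1}\,\overline{\mathrm{Var}}[\hat o]_{\rm wit}.\tag{F10}$$

Appendix G: Optimal dual frame and rescaled frames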

Consider a rescaled measurement frame with elements $\mu_b/\sqrt{\alpha_b}$, for some set of positive real coefficients $\alpha_b$. The associated rescaled frame operator is thus $S_\alpha\equiv\sum_b\mathcal P(\mu_b/\sqrt{\alpha_b})=\sum_b\mathcal P(\mu_b)/\alpha_b$. Let $\mu^{(\alpha)\star}_b\equiv S_\alpha^{-1}(\mu_b)/\sqrt{\alpha_b}$ denote the corresponding canonical dual frame. The associated decomposition of a state $\rho$ reads
$$\rho=\sum_b\frac{\langle\mu_b,\rho\rangle}{\sqrt{\alpha_b}}\,\mu^{(\alpha)\star}_b=\sum_b\frac{\langle\mu_b,\rho\rangle}{\alpha_b}\,S_\alpha^{-1}(\mu_b).\tag{G3}$$
We can then define a corresponding estimator for the state as $f(b)\equiv S_\alpha^{-1}(\mu_b)/\alpha_b$, which satisfies $\mathbb E[f]=\rho$. Let $\mu^\star_b$ denote the canonical dual of the non-rescaled frame. In general, $\mu^{(\alpha)\star}_b\neq\sqrt{\alpha_b}\,\mu^\star_b$, and thus different frame scalings provide nontrivially different canonical estimators. Nonetheless, eq. (G3) means that each set of operators $\{\mu^{(\alpha)\star}_b/\sqrt{\alpha_b}\}_b$ is a valid (generally non-canonical) dual frame for the non-rescaled measurement frame $\{\mu_b\}_b$.

Average error with rescaled frames - The main usefulness of considering rescaled measurement frames lies in the associated average $L_2$ error, $\mathbb E\,\|f-\rho\|_2^2=\mathbb E\,\operatorname{tr}(f^2)-\operatorname{tr}(\rho^2)$. If we rescale the operators via $\alpha_b=\langle\mu_b,\rho\rangle$, its first term reduces to $\operatorname{tr}(S_\alpha^{-1})$.

Estimators from measurement frames - The above statements can be rephrased as follows. For any IC-POVM $\mu$ and any dual measurement frame $\tilde\mu$, we have an unbiased estimator $f(b)\equiv\tilde\mu_b$ for the unknown input state $\rho$. Vice versa, any such unbiased estimator can be obtained from a dual measurement frame of $\mu$. A core feature of shadow tomography schemes is to build estimators for target observables (or other properties of the state) via these state estimators. For example, if the goal is estimating the expectation value of some $O$, we use the estimator $\hat o(b)\equiv\langle O,f(b)\rangle$.

Here $\mathcal P(Y)\in\mathrm{Pos}(\mathrm{Herm}(\mathbb C^d))$ denotes the outer product of $Y\in\mathrm{Herm}(\mathbb C^d)$ with itself, i.e., the superoperator that acts as $\mathcal P(Y):\rho\mapsto\langle Y,\rho\rangle Y$ on any $\rho\in\mathrm{Herm}(\mathbb C^d)$. In vectorized bra-ket notation, this is also written as $\mathcal P(Y)\equiv|Y\rangle\rangle\langle\langle Y|$.

Table I. Summary of some of the discussed quantities involved in the characterization of quantum states and expectation values of target observables. The first two rows present the relevant quantities (frame operator, state estimator, and associated variances) for estimating the input state in $L_2$ distance. The following three rows refer to the cases where the goal is instead recovering the expectation value of some target observable $O$. The last row presents the explicit form of the frame operator, state estimator, and averaged variance for retrieving expectation values in the special case of tight rank-1 measurement frames.
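For rank-1 effects the inequality $\operatorname{tr}(\mu_b^2)\le\operatorname{tr}(\mu_b)^2$ is saturated. Thus $\operatorname{tr}(F_{I/d})\le d^2$ and $\operatorname{tr}(\tilde F_{I/d})=\operatorname{tr}(F_{I/d})-d\le d(d-1)$, with equality for rank-1 measurements. Moreover, $\tilde F_{I/d}$ is Hermitian and non-singular as a linear (super)operator on the traceless subspace. Hence $\operatorname{tr}(\tilde F^{-1}_{I/d})=\sum_k1/\lambda_k$ in terms of its eigenvalues $\lambda_k$. In the averaged variances, $V\equiv\langle O^2\rangle-\langle O\rangle^2$ is the variance of the observable computed on the maximally mixed state, with $\langle O\rangle\equiv\operatorname{tr}(O)/d$ and $\langle O^2\rangle\equiv\operatorname{tr}(O^2)/d$. Note that the averaged variance depends only on the purity $P$, and not on the specific choice of initial state $\rho$.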