Dual frame optimization for informationally complete quantum measurements

Randomized measurement protocols such as classical shadows represent powerful resources for quantum technologies, with applications ranging from quantum state characterization and process tomography to machine learning and error mitigation. Recently, the notion of measurement dual frames, in which classical shadows are generalized to dual operators of POVM effects, resurfaced in the literature. This brought attention to additional degrees of freedom in the post-processing stage of randomized measurements that are often neglected by established techniques. In this work, we leverage dual frames to construct improved observable estimators from informationally complete measurement samples. We introduce novel classes of parametrized frame superoperators and optimization-free dual frames based on empirical frequencies, which offer advantages over their canonical counterparts while retaining computational efficiency. Remarkably, this comes at almost no quantum or classical cost, thus rendering dual frame optimization a valuable addition to the randomized measurement toolbox.


I. INTRODUCTION
As our abilities to build and operate large quantum information processing systems steadily progress [1], the question of finding the most effective strategies to interrogate such devices becomes more and more pressing. Indeed, it is well known that to fully reconstruct and store quantum states produced during, say, a quantum simulation, one would need to afford exponentially many physical measurements and a similarly large amount of classical memory.
Among these, approaches based on randomization [11,12,30–33], as well as general positive operator-valued measures (POVMs) [34,35], received substantial attention. Upon input of a target quantum state, these strategies return a statistical estimator for it, which is often referred to as a classical shadow [36]. Although, in general, shadows are not valid quantum states by themselves, they can be efficiently stored and processed, and can be used to reconstruct several incompatible expectation values simultaneously [11]. Notably, shadows allow for a separation between the data acquisition phase, which can be carried out without fixing a target observable, and the classical processing and reconstruction stage. The power and flexibility of classical shadows led to the development of numerous applications beyond the simple task of operator averaging. These include the reconstruction of fidelity measures [37] and of genuine quantum properties of states [38–40], the characterization of quantum processes [41], classical and quantum machine learning [42–44] and error mitigation techniques [45–47]. Often, shadows conveniently serve as a bridge between quantum and classical representations such as tensor networks.
Focusing on the data collection step, several works have considered the optimization of informationally complete (IC) measurement operators to yield estimators with favorable statistical properties [7,12,15]. However, little emphasis has so far been put on studying how the post-processing stage governs the quality of these estimators, particularly of those constructed from overcomplete measurement schemes. In fact, it was only in a recent publication that Innocenti et al. [48] raised awareness of this point, highlighting the existence of often neglected degrees of freedom associated with the so-called measurement dual frames [49,50]. In other words, while classical shadows protocols embody the principle "measure first, ask questions later" [30], a lot remains to be said about how such questions should be asked.
In this work, we dive deeper into the application of frame theory to IC quantum measurements for digital quantum computing architectures. We present efficiently computable classes of parametrized dual frames, together with the corresponding optimization routines. Indeed, while a number of results are known concerning optimal choices of measurement settings and dual frames for both tomography and observable estimation [48], these are often impractical or impossible to realize at scale due to inherent technical (e.g., lack of connectivity, device noise) or fundamental (e.g., memory or data processing requirements) limitations. By leveraging a product structure, our proposed methods ensure consistent improvements over standard settings while remaining applicable, in principle, up to large sizes of the target qubit registers. We support our analytical findings with numerical investigations. These suggest that dual frame optimization (even when subject to certain pragmatic constraints) can significantly boost the quality of shadow estimators for generic operator averaging tasks. In particular, it reduces the performance gap between randomized projective measurements and local dilation POVMs, which are substantially more demanding to implement.
The paper is organized as follows. In Sec. II, we review the theory of observable estimation with generalized measurements from a frame theory point of view. In Sec. III, we develop methods to optimize dual frame operators to improve the variance of overcomplete POVM estimators. Finally, in Sec. IV, we showcase our methods on paradigmatic numerical examples.

A. Generalized measurements
The most general class of measurements in quantum mechanics is described by the POVM formalism. An n-outcome POVM is a set of n positive semi-definite Hermitian operators $M = \{M_k\}_{k\in\{1,\dots,n\}}$ that sum to the identity, i.e., $\sum_{k=1}^{n} M_k = \mathbb{1}$. Given a d-dimensional state ρ, the probability of observing outcome k is given by Born's rule as $p_k = \mathrm{Tr}[\rho M_k]$. Standard projective measurements (PMs) are a special case of POVMs, where each POVM operator is a projector such that $M_k = |\phi_k\rangle\langle\phi_k|$ for some pure states $\phi_k$. A POVM is said to be informationally complete (IC) if it spans the space of Hermitian operators [49]. Then, for any observable O, there exist coefficients $\omega_k \in \mathbb{R}$ such that
$$O = \sum_{k=1}^{n} \omega_k M_k. \tag{1}$$
Given such a decomposition of O, the expectation value $\langle O\rangle_\rho$ can be written as
$$\langle O\rangle_\rho = \mathrm{Tr}[\rho O] = \sum_{k=1}^{n} \omega_k \mathrm{Tr}[\rho M_k] = \sum_{k=1}^{n} \omega_k p_k. \tag{2}$$
In other words, $\langle O\rangle_\rho$ can be expressed as the mean value of the random variable $\omega_k$ over the probability distribution $\{p_k\}$. Given a sample of S measurement outcomes $\{k^{(1)},\dots,k^{(S)}\}$, we can thus construct an unbiased Monte-Carlo estimator of $\langle O\rangle_\rho$ as
$$\hat{o} : \{k^{(1)},\dots,k^{(S)}\} \mapsto \frac{1}{S}\sum_{s=1}^{S} \omega_{k^{(s)}}. \tag{3}$$
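As a minimal numerical sketch of Eqs. (1)-(3), one may take a single-qubit six-outcome POVM made of the X, Y and Z eigenprojectors, each weighted by 1/3. The state, the observable, the shot number and the use of a least-squares solve to obtain one valid set of coefficients are illustrative assumptions, not prescriptions of the text.

```python
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Six-outcome IC POVM: eigenprojectors of X, Y, Z, each weighted by 1/3.
povm = [(I2 + s * P) / 6 for P in (X, Y, Z) for s in (+1, -1)]

# Illustrative state and observable (arbitrary choices).
rho = 0.5 * (I2 + 0.3 * X + 0.2 * Y + 0.5 * Z)
obs = 0.7 * X - 1.2 * Z

# One real solution of O = sum_k w_k M_k, Eq. (1). Real and imaginary
# parts are stacked so that the least-squares solution is real.
A = np.array([M.reshape(-1) for M in povm]).T
A_real = np.vstack([A.real, A.imag])
b_real = np.concatenate([obs.reshape(-1).real, obs.reshape(-1).imag])
omega = np.linalg.lstsq(A_real, b_real, rcond=None)[0]

# Born probabilities and exact expectation value, Eq. (2).
p = np.array([np.trace(rho @ M).real for M in povm])
exact = np.trace(rho @ obs).real

# Monte-Carlo estimator of Eq. (3) from S simulated shots.
rng = np.random.default_rng(0)
S = 20_000
outcomes = rng.choice(len(povm), size=S, p=p / p.sum())
estimate = omega[outcomes].mean()
```

Because this POVM is overcomplete, the least-squares call returns only one of many admissible coefficient vectors (the minimum-norm one).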

B. PM-simulable POVMs
Digital quantum computers typically only give access to projective measurements (PMs) in a specified computational basis. More general POVMs can be implemented through additional quantum resources, e.g., by coupling to a higher-dimensional space in a Naimark dilation [51] with ancilla qubits [52] or qudits [8,34] or through controlled operations with mid-circuit measurements and classical feed-forward [53]. While these techniques have been demonstrated in proof-of-principle experiments, their full-scale high-fidelity implementation remains a challenge for current quantum devices [8]. Of particular interest are thus POVMs that can be implemented without additional quantum resources, i.e., only through projective measurements in available measurement bases.
More complex POVMs can be built from available projective measurements through convex combinations of POVMs: For two n-outcome POVMs $M_1$ and $M_2$ acting on the same space, their convex combination with elements $M_k = p M_{1,k} + (1-p) M_{2,k}$ for some $p \in [0,1]$ is also a valid POVM. This can be achieved in practice by a randomization of measurements procedure, which simply consists of the following two steps for each measurement shot. First, randomly pick $M_1$ or $M_2$ with probability p or 1 − p, respectively, then perform the measurement associated with the chosen POVM. We call POVMs that can be achieved by randomizations of projective measurements PM-simulable. On digital quantum computers the easiest basis transformations are single-qubit transformations of the computational basis. POVMs that consist of single-qubit PM-simulable POVMs are thus the most readily accessible class of generalized measurements and have found widespread application. These include classical shadows and most of their derivatives, see Appendix A.
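The two-step randomization can be sketched as follows for a single qubit and three Pauli bases. The basis probabilities `q`, the state and the shot number are hypothetical. Each basis is treated as a six-outcome POVM with vanishing effects on the outcomes of the other bases, so that the convex combination has effects $q_b \Pi_{b,s}$.

```python
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def projectors(P):
    """Eigenprojectors of a Pauli operator P (eigenvalues +1 and -1)."""
    return [(I2 + P) / 2, (I2 - P) / 2]

bases = [projectors(P) for P in (X, Y, Z)]
q = np.array([0.5, 0.2, 0.3])  # hypothetical basis-selection probabilities

# Effects of the resulting PM-simulable POVM, labelled by k = 2 * b + s.
effects = [q[b] * Pi for b in range(3) for Pi in bases[b]]

def sample_outcome(rho, rng):
    """One shot: pick a basis with probability q[b], then measure in it."""
    b = rng.choice(3, p=q)
    probs = np.array([np.trace(rho @ Pi).real for Pi in bases[b]])
    s = rng.choice(2, p=probs / probs.sum())
    return 2 * b + s

rng = np.random.default_rng(1)
rho = 0.5 * (I2 + 0.3 * X + 0.2 * Y + 0.5 * Z)
shots = [sample_outcome(rho, rng) for _ in range(5000)]
freqs = np.bincount(shots, minlength=6) / len(shots)
born = np.array([np.trace(rho @ M).real for M in effects])
```

The observed frequencies `freqs` should agree with the Born probabilities `born` of the combined POVM up to statistical fluctuations.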
Importantly, PM-simulable informationally-complete POVMs are overcomplete [54]. The decomposition of observables from Eq. (1) is thus not unique. In this work, we leverage these additional degrees of freedom to build better observable estimators, see Fig. 1.

C. Frame theory and dual space
We will now outline a formal approach to obtain the coefficients $\omega_k$ in Eq. (1) for a given observable O. First, we note that the minimal number of linearly independent POVM elements for an IC-POVM is $n = d^2$. We call such POVMs minimally informationally complete. In that case, the coefficients $\omega_k$ are unique. However, for POVMs with $n > d^2$, such as those that arise from IC PM-simulable POVMs, the decomposition in Eq. (1) is not unique. This redundancy is described by frame theory, as outlined in Ref. [48] and detailed in Appendix B.
The set of POVM operators $M = \{M_k\}_{k\in\{1,\dots,n\}}$ forms a frame for the space of Hermitian operators if and only if it is IC. For any frame, there exists at least one dual frame $D = \{D_k\}_{k\in\{1,\dots,n\}}$, such that
$$O = \sum_{k=1}^{n} \mathrm{Tr}[O D_k]\, M_k \tag{4}$$
for any Hermitian operator O. Therefore, the coefficients $\omega_k$ can simply be obtained from the duals D as
$$\omega_k = \mathrm{Tr}[O D_k]. \tag{5}$$
Notably, dual operators generalize the concept of classical shadows of a quantum state [11], thus providing a direct connection to the popular randomized measurement toolbox [30]. For a minimally IC POVM, only one dual frame exists. It can be constructed from the POVM elements as
$$|D_k\rangle\rangle = \mathcal{F}^{-1} |M_k\rangle\rangle, \tag{6}$$
with the canonical frame superoperator
$$\mathcal{F} = \sum_{k=1}^{n} |M_k\rangle\rangle\langle\langle M_k|, \tag{7}$$
where we have used the widespread vectorized "double-ket" notation detailed in Appendix C. Thus, the frame superoperator can be used to transform between the POVM space and the dual space. For an overcomplete POVM, the canonical frame superoperator creates one of infinitely many possible dual frames. We will further explore this point in Sec. III B.
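The construction of canonical duals through the frame superoperator can be sketched numerically as follows. The row-major vectorization and the uniform single-qubit Pauli POVM are assumptions made for illustration. For this particular POVM, the canonical dual of a projector $|\psi\rangle\langle\psi|/3$ evaluates to $3|\psi\rangle\langle\psi| - \mathbb{1}$, i.e., the familiar single-qubit classical shadow.

```python
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Uniform six-outcome Pauli POVM (illustrative choice).
povm = [(I2 + s * P) / 6 for P in (X, Y, Z) for s in (+1, -1)]

def vec(A):
    """Row-major vectorization: vec(A).conj() @ vec(B) = Tr[A^dagger B]."""
    return A.reshape(-1)

# Canonical frame superoperator F = sum_k |M_k>><<M_k| as a d^2 x d^2 matrix.
F = sum(np.outer(vec(M), vec(M).conj()) for M in povm)

# Canonical duals |D_k>> = F^{-1} |M_k>>, reshaped back into operators.
duals = [np.linalg.solve(F, vec(M)).reshape(2, 2) for M in povm]

# Coefficients w_k = Tr[O D_k] and reconstruction O = sum_k w_k M_k.
obs = 0.7 * X - 1.2 * Z
omega = np.array([np.trace(obs @ D).real for D in duals])
reconstructed = sum(w * M for w, M in zip(omega, povm))
```

Solving the linear system for each effect avoids forming the inverse of the frame superoperator explicitly.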

D. Observable estimation
The theory of frames and duals enables a systematic approach to estimate observable expectation values from a given set $\{k^{(1)},\dots,k^{(S)}\}$ of POVM measurement outcomes: First, one picks a valid dual frame D and constructs the dual operators $\{D_{k^{(1)}},\dots,D_{k^{(S)}}\}$ for the observed outcomes. Then, one computes the corresponding operator coefficients $\{\omega_{k^{(1)}},\dots,\omega_{k^{(S)}}\}$ through Eq. (5). Finally, Eq. (3) yields an estimate for $\langle O\rangle_\rho$. The statistical uncertainty of this estimator is given by the standard error of the mean
$$\sigma(\hat{o}) = \sqrt{\frac{\mathrm{Var}[\omega_k]}{S}}. \tag{8}$$
The numerator $\mathrm{Var}[\omega_k]$, also known as the single-shot variance (SSV), depends explicitly on the POVM M, the duals D (when they are not unique), the observable O and the state ρ as
$$\mathrm{Var}[\omega_k] = \sum_{k=1}^{n} \mathrm{Tr}[\rho M_k]\, \mathrm{Tr}[O D_k]^2 - \langle O\rangle_\rho^2. \tag{9}$$
Throughout this work, the SSV is used as a performance measure for POVM-based estimators. Note that the second term $\langle O\rangle_\rho^2$ depends neither on the POVM nor on the dual frame. However, the first term can be decreased both by adjusting the POVM M and by optimizing the duals D (if they are not unique) while the POVM itself remains unchanged, see the schematic in Fig. 1.
The minimal SSV is achieved by performing a PM in the eigenbasis of O [55,56]. While this measurement is usually not easily implementable, it serves as a lower bound for all estimations of $\langle O\rangle_\rho$ with a $\sqrt{S}$-scaling. In practice, one typically chooses a specific type of POVM measurement that the quantum hardware can implement, e.g., PM-simulable POVMs or dilation POVMs. The POVM operators can then be parametrized and classically optimized to minimize the SSV. This can either happen during repeated measurement rounds in an adaptive quantum-classical feedback loop [7,14], or a priori [12,15]. Since the SSV depends both on the observable and the generally unknown state, the POVM operators can be optimized either under knowledge of the targeted observable only or by taking into account an approximation of the state obtained from a classical reference calculation. However, the SSV depends on both the POVM operators and the chosen duals whenever these are not unique. Crucially, the dependence on the dual frame can be controlled purely during the post-processing phase and thus comes with no additional cost in the quantum resources. Moreover, an optimization of the dual operators can also be individually tailored to different observables that one might want to estimate from the same set of IC data.
For a fixed IC-POVM M and state ρ, the dual frame that minimizes the SSV as a function of D, irrespective of the observable O, is obtained from Eq. (6) when using a modified frame superoperator given by
$$\mathcal{F}_{\mathrm{opt}} = \sum_{k=1}^{n} \frac{1}{\mathrm{Tr}[\rho M_k]}\, |M_k\rangle\rangle\langle\langle M_k| \tag{10}$$
and re-scaling each dual operator by $1/\mathrm{Tr}[\rho M_k]$ [48]. This, however, requires knowledge of the state ρ. If no prior information on ρ is available, the best choice of duals can be considered the one that minimizes the SSV as a uniform Haar average over all states. In that case, the optimal duals are obtained from a modified frame superoperator given by
$$\mathcal{F}_{\mathrm{avg}} = \sum_{k=1}^{n} \frac{d}{\mathrm{Tr}[M_k]}\, |M_k\rangle\rangle\langle\langle M_k|,$$
with the analogous re-scaling of each dual operator by $d/\mathrm{Tr}[M_k]$. It is important to point out that this choice is the one generally employed in classical shadows protocols [30]. More explicitly, a standard classical shadow $\hat{\rho}_s$, namely a single-shot estimate of the state obtained by constructing an inverse measurement channel, is equivalent to the "average-optimal" dual $D_k$ obtained from inversion of the frame superoperator $\mathcal{F}_{\mathrm{avg}}$ introduced above.
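To make the comparison between dual frames concrete, the following sketch evaluates the SSV of Eq. (9) for canonical, average-optimal and state-optimal duals of a locally biased single-qubit Pauli POVM. The bias `q`, the state and the observable are arbitrary assumptions, and the helper names `rescaled_duals` and `ssv` are hypothetical.

```python
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Biased Pauli POVM: basis P is chosen with probability q_P (assumed values).
q = (0.5, 0.2, 0.3)
povm = [qb * (I2 + s * P) / 2 for qb, P in zip(q, (X, Y, Z)) for s in (+1, -1)]

def vec(A):
    """Row-major vectorization of an operator."""
    return A.reshape(-1)

def rescaled_duals(povm, weights):
    """Duals D_k = w_k F_w^{-1} M_k with F_w = sum_k w_k |M_k>><<M_k|."""
    F = sum(w * np.outer(vec(M), vec(M).conj()) for w, M in zip(weights, povm))
    return [w * np.linalg.solve(F, vec(M)).reshape(M.shape)
            for w, M in zip(weights, povm)]

def ssv(povm, duals, rho, obs):
    """Single-shot variance, Eq. (9)."""
    p = np.array([np.trace(rho @ M).real for M in povm])
    w = np.array([np.trace(obs @ D).real for D in duals])
    return float(p @ w**2 - (p @ w) ** 2)

rho = 0.5 * (I2 + 0.3 * X + 0.2 * Y + 0.5 * Z)
obs = 0.7 * X - 1.2 * Z
p = np.array([np.trace(rho @ M).real for M in povm])
d = 2

canonical = rescaled_duals(povm, np.ones(len(povm)))
average = rescaled_duals(povm, [d / np.trace(M).real for M in povm])
optimal = rescaled_duals(povm, 1 / p)  # requires all p_k > 0

ssv_can = ssv(povm, canonical, rho, obs)
ssv_avg = ssv(povm, average, rho, obs)
ssv_opt = ssv(povm, optimal, rho, obs)
# Variance of the projective measurement in the eigenbasis of O.
ssv_pm = (np.trace(rho @ obs @ obs) - np.trace(rho @ obs) ** 2).real
```

One expects `ssv_pm <= ssv_opt <= min(ssv_can, ssv_avg)`, in line with the optimality statements above.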

A. Local POVMs and duals
A POVM-based estimation of observables is only feasible if the POVM operators themselves as well as the dual operators can be efficiently handled classically. Therefore, most POVM-based estimation schemes employ local POVMs, where every $M_k$ acts non-trivially only on a few qubits. This is crucial in order to keep a local structure in the frame superoperator from Eq. (7), such that it can be constructed and inverted efficiently. Moreover, the duals themselves need to be efficiently processable in order to compute the observable coefficients $\omega_k$ via Eq. (1). However, this is not guaranteed by the optimal frame superoperator $\mathcal{F}_{\mathrm{opt}}$, even when the POVM operators have a product structure. This implies that the optimality results presented in the previous section cannot be applied in general.
In the following we consider N-qubit systems where the POVM effects are tensor products of single-qubit n-outcome POVM effects. That is, each global effect can be written as
$$M_k = M_{k_1,k_2,\dots,k_N} = \bigotimes_{i=1}^{N} M^{(i)}_{k_i}, \tag{11}$$
where $\{M^{(i)}_{k_i}\}_{k_i=1}^{n}$ is an n-outcome single-qubit POVM acting on qubit i. To fully leverage the ability of optimizing duals for the SSV, we require a parametrization of suitable dual frames, such that they remain efficiently processable. We provide this through a parametrized frame superoperator in the following.

Weighted frame superoperator
If M is overcomplete, the set of all valid duals can be explicitly parametrized through a singular value decomposition [57], see Appendix D. In principle, this parametrization could be used to optimize the dual frame for a minimal SSV. However, it is not straightforward to impose a product structure on the dual operators in this way. For a more practical, albeit non-exhaustive parametrization of the dual frames, we thus define a weighted frame superoperator
$$\mathcal{F}_\alpha = \sum_{k=1}^{n} \alpha_k\, |M_k\rangle\rangle\langle\langle M_k|, \tag{12}$$
which resembles the canonical frame superoperator $\mathcal{F}$, but with the contribution of each effect $M_k$ rescaled by a factor $\alpha_k \in \mathbb{R}$. If $\mathcal{F}_\alpha$ is invertible, the operators given by
$$|D_k\rangle\rangle = \alpha_k\, \mathcal{F}_\alpha^{-1} |M_k\rangle\rangle \tag{13}$$
form a valid dual frame which is invariant under uniform scaling of the coefficients. Notice that if all the coefficients $\{\alpha_k\}$ are positive, then $\mathcal{F}_\alpha$ will be positive definite and hence invertible. We can hence think of the parameters $\{\alpha_k\}$ as a probability distribution when restricting them to positive values.
Assuming the POVM has a product structure as in Eq. (11), it is the degree of correlations in the multivariate probability distribution $\alpha_k$ that determines what kind of product structure the resulting duals from Eq. (13) will have. In the simplest case, when $\alpha_k$ fully factorizes, i.e., $\alpha_{k_1,k_2,\dots,k_N} = \alpha_{k_1}\alpha_{k_2}\cdots\alpha_{k_N}$, the dual frame will be of product form as in Eq. (11). More generally, if $\alpha_k$ is a product of distributions that each act on at most m qubits, then the duals will be tensor products of terms that act on at most m qubits. In this case, the traces to compute $\omega_k$ in Eq. (5) factorize into blocks that involve constructing matrices of at most size $2^m \times 2^m$. This way, the complexity of the dual operators in the post-processing can be tuned by imposing restricted correlations in the distribution $\alpha_k$.
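The factorization property can be checked numerically on two qubits. In the sketch below, the weights `a` and `b` are random positive numbers chosen purely for illustration, and `weighted_duals` is a hypothetical helper implementing Eqs. (12) and (13) with a row-major vectorization.

```python
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def vec(A):
    """Row-major vectorization of an operator."""
    return A.reshape(-1)

def weighted_duals(povm, alpha):
    """Duals |D_k>> = alpha_k F_alpha^{-1} |M_k>>, Eqs. (12) and (13)."""
    F = sum(a * np.outer(vec(M), vec(M).conj()) for a, M in zip(alpha, povm))
    return [a * np.linalg.solve(F, vec(M)).reshape(M.shape)
            for a, M in zip(alpha, povm)]

# Single-qubit Pauli POVM and its two-qubit product POVM (36 effects).
povm1 = [(I2 + s * P) / 6 for P in (X, Y, Z) for s in (+1, -1)]
povm2 = [np.kron(A, B) for A in povm1 for B in povm1]

# Fully factorized weights alpha_{k1,k2} = a_{k1} * b_{k2}.
rng = np.random.default_rng(2)
a = rng.uniform(0.5, 2.0, size=6)
b = rng.uniform(0.5, 2.0, size=6)
alpha_product = np.array([ai * bj for ai in a for bj in b])

# Duals from the global 16 x 16 superoperator vs. products of local duals.
global_duals = weighted_duals(povm2, alpha_product)
local_duals = [np.kron(Da, Db)
               for Da in weighted_duals(povm1, a)
               for Db in weighted_duals(povm1, b)]
max_deviation = max(np.abs(G - L).max()
                    for G, L in zip(global_duals, local_duals))

# Invariance under a uniform rescaling of the weights.
rescaled = weighted_duals(povm1, 3.0 * a)
```

In practice, only the local construction would be used at scale, since the global superoperator grows as $4^N \times 4^N$.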
We thus propose the following general procedure to improve statistical estimators based on overcomplete POVMs such as classical shadow methods. First, a collection of shots $\{k^{(1)},\dots,k^{(S)}\}$ is measured from a fixed POVM. For a given parametrization of duals through Eq. (13), the single-shot variance in Eq. (9) is estimated with the (corrected) sample variance of the values $\{\omega_{k^{(1)}},\dots,\omega_{k^{(S)}}\}$. An optimizer will then minimize the SSV as a function of the parameters entering the weighted frame superoperator, yielding an estimator of $\langle O\rangle_\rho$ with the smallest possible variance. This can be repeated independently for each observable of interest starting from the same collection of samples, harnessing the true power of IC measurements. As this dual optimization does not require changing the quantum circuits to be executed nor increasing the sample size, we consider it to be a "free lunch" improvement over standard classical shadows techniques.
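The procedure can be illustrated with the following sketch, in which the weights are parametrized as $\alpha_k = e^{\theta_k}$ to keep them positive and the corrected sample variance is minimized by a plain accept-if-better random search. The random search is a deliberately simple stand-in for a proper optimizer, and the POVM, state, observable, step size and iteration count are all assumptions.

```python
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def vec(A):
    """Row-major vectorization of an operator."""
    return A.reshape(-1)

def weighted_duals(povm, alpha):
    """Duals |D_k>> = alpha_k F_alpha^{-1} |M_k>>, Eqs. (12) and (13)."""
    F = sum(a * np.outer(vec(M), vec(M).conj()) for a, M in zip(alpha, povm))
    return [a * np.linalg.solve(F, vec(M)).reshape(M.shape)
            for a, M in zip(alpha, povm)]

# Fixed POVM, unknown-to-the-optimizer state, and target observable.
q = (0.5, 0.2, 0.3)
povm = [qb * (I2 + s * P) / 2 for qb, P in zip(q, (X, Y, Z)) for s in (+1, -1)]
rho = 0.5 * (I2 + 0.3 * X + 0.2 * Y + 0.5 * Z)
obs = 0.7 * X - 1.2 * Z

# Step 1: collect S shots from the fixed POVM (simulated here).
rng = np.random.default_rng(3)
p = np.array([np.trace(rho @ M).real for M in povm])
samples = rng.choice(len(povm), size=2000, p=p / p.sum())

def coefficients(theta):
    """Coefficients w_k = Tr[O D_k] for weights alpha = exp(theta) > 0."""
    duals = weighted_duals(povm, np.exp(theta))
    return np.array([np.trace(obs @ D).real for D in duals])

def sample_variance(theta):
    """Corrected sample variance of the observed coefficients."""
    return coefficients(theta)[samples].var(ddof=1)

# Step 2: minimize the sample variance, starting from the canonical duals.
theta = np.zeros(len(povm))
var_canonical = sample_variance(theta)
var_best = var_canonical
for _ in range(300):
    trial = theta + 0.2 * rng.normal(size=theta.size)
    var_trial = sample_variance(trial)
    if var_trial < var_best:
        theta, var_best = trial, var_trial

# Step 3: estimate <O> with the optimized duals from the same samples.
estimate = coefficients(theta)[samples].mean()
exact = np.trace(rho @ obs).real
```

Repeating steps 2 and 3 for another observable reuses `samples` unchanged, which is the point of the procedure.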
In practice, the optimization landscape of the dual parameters $\alpha_k$ could be difficult to navigate, due to the complicated dependency of the duals on $\alpha_k$ in Eq. (13). Also, a simultaneous optimization of the duals and the POVM operators themselves can be cumbersome, as it requires a quantum-classical feedback loop. As an alternative to optimizing a parametrization of the dual operators, we thus propose the following procedure to obtain suitable dual frames for a fixed and overcomplete IC POVM, which we refer to as empirical frequencies dual frames.

Empirical frequencies dual frames
When no knowledge about the state is available, the average-optimal dual frame should be used, as discussed in Sec. II D. As the POVM measurement is repeated and the number of shots S increases, we gain some knowledge about the state, which we can leverage to approximate the optimal dual from Eq. (10). More precisely, the measured frequencies $f_k = \#k/S$ (where $\#k$ is the number of times the outcome k was obtained) follow a multinomial distribution and converge to the true measurement probabilities $p_k = \mathrm{Tr}[\rho M_k]$ with a standard deviation of $\sqrt{p_k(1-p_k)/S}$. One could thus replace the outcome probabilities $p_k = \mathrm{Tr}[\rho M_k]$ in the optimal dual frame with the empirical frequencies $f_k = \#k/S$. That is, we use the global empirical dual frame obtained from
$$\mathcal{F}_\alpha \big|_{\alpha_k = 1/f_k} = \sum_{k=1}^{n} \frac{1}{f_k}\, |M_k\rangle\rangle\langle\langle M_k|, \tag{14}$$
which is a weighted frame superoperator with $\alpha_k = 1/f_k = S/\#k$. However, an obvious issue arises if an outcome is not obtained ($f_k = 0$). We address this by adding a regularization to the empirical frequencies with a bias term. This biases the outcome probabilities with respect to the fully mixed state $\frac{1}{d}\mathbb{1}$, borrowing an idea from Ref. [56]. The resulting biased empirical frequencies are given by
$$\tilde{f}_k\big(\{k^{(1)},\dots,k^{(S)}\}, S_{\mathrm{bias}}\big) = \frac{\#k + \mathrm{Tr}\!\left[\tfrac{1}{d}\mathbb{1} M_k\right] S_{\mathrm{bias}}}{S + S_{\mathrm{bias}}}. \tag{15}$$
If we assume that all effects are non-null, then $\alpha_k = 1/\tilde{f}_k > 0$, which ensures the frame superoperator is invertible. Note that, for S = 0, we recover the average-optimal dual frame, while for $S \to \infty$ the empirical dual frame converges to the optimal dual frame from Eq. (10).
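A short sketch of the biased empirical frequencies of Eq. (15) is given below. The function name `biased_frequencies`, the toy sample and the value of the bias are hypothetical. The traces correspond to a uniform single-qubit Pauli POVM, for which $\mathrm{Tr}[M_k] = 1/3$ and $d = 2$.

```python
from collections import Counter

def biased_frequencies(samples, traces, dim, s_bias):
    """Biased empirical frequencies of Eq. (15).

    samples: observed outcome labels k (integers from 0 to n - 1)
    traces:  Tr[M_k] for each of the n effects
    dim:     Hilbert-space dimension d
    s_bias:  number of fictitious shots drawn from the fully mixed state
    """
    counts = Counter(samples)          # missing outcomes count as zero
    total = len(samples) + s_bias
    return [(counts[k] + traces[k] / dim * s_bias) / total
            for k in range(len(traces))]

# Toy data: four shots, two outcomes never observed.
traces = [1 / 3] * 6
samples = [0, 0, 1, 4]
f_tilde = biased_frequencies(samples, traces, dim=2, s_bias=6)

# Weights of the empirical weighted frame superoperator.
alpha = [1 / f for f in f_tilde]

# With no shots at all, the prior Tr[M_k] / d is recovered.
prior = biased_frequencies([], traces, dim=2, s_bias=6)
```

Since the prior terms sum to one, the biased frequencies remain normalized for any sample. The sketch does not guard against the degenerate case of zero shots and zero bias.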
The global empirical dual frame still suffers from two issues: Firstly, for sizable qubit numbers, the number of different POVM outcomes n eventually becomes much larger than the available shot budget S. In this regime, it is difficult to improve over the average-optimal dual with the above global empirical dual frame. Secondly, the dual matrices can become exponentially large when the correlations in $\tilde{f}_k$ are not restricted, as discussed in Sec. III B 1. Both of these issues are overcome when relaxing the task from learning the global distribution $p_k$ to recovering only the most relevant few-qubit correlations of this multivariate distribution. In the simplest case, the (potentially biased) outcome probabilities $f_k$ can be approximated with the product of marginal probabilities
$$f_k \approx \prod_{i=1}^{N} f_{k_i}. \tag{16}$$
While this does not model correlations between POVM outcomes of different qubits, it still presents an advantage over the average-optimal dual frame, while ensuring the dual frame is of product form, see Sec. IV. The correlations captured by the empirical frequencies can be systematically tuned up by partitioning k into marginals of larger sizes. Let $\Lambda = \{\lambda_1,\dots,\lambda_l\}$ be a partitioning of the qubit indices $\{1,\dots,N\}$ into subsets $\lambda_i$ that each contain up to m terms. We can then approximate the global distribution $f_k$ (or $\tilde{f}_k$) as a product of m-body marginals $f_{\lambda_i}$,
$$f_k \approx \prod_{i=1}^{l} f_{\lambda_i}. \tag{17}$$
This leads to dual frame operators that are tensor products of m-local terms. The question arises how to optimally choose the partitioning Λ. Ideally, pairs of qubits whose POVM outcomes are highly correlated should preferably be grouped into the same set. We quantify this through the empirical mutual information I(i, j) shared by two qubits, given as
$$I(i,j) = \sum_{k_i,k_j} f_{\{i,j\}} \log\!\left(\frac{f_{\{i,j\}}}{f_{k_i} f_{k_j}}\right). \tag{18}$$
This quantifies the price one has to pay when approximating the joint distribution $f_{\{i,j\}}$ through the product of marginal distributions $f_{k_i} f_{k_j}$, given by their Kullback-Leibler divergence.
In a practical setting, the maximally-allowed degree $m_{\max}$ should be chosen such that the classical cost in computing the traces of the resulting $2^m \times 2^m$ dual matrices is deemed tolerable, and sufficient statistics are gathered to capture the m-body marginals, which becomes exponentially more difficult as m increases. Once $m_{\max}$ is chosen, one can define a cost function C for Λ that is given by the sum of the pairwise mutual information within all sets, i.e.,
$$C(\Lambda) = \sum_{\lambda\in\Lambda} \sum_{\substack{i,j\in\lambda \\ i<j}} I(i,j). \tag{19}$$
While the optimal partitioning can be straightforwardly computed from this cost function for small qubit numbers, this becomes infeasible for larger N due to the super-polynomial scaling of the number of different partitionings. In such a setting, one can construct a well-performing partitioning by first computing I(i, j) for all pairs of qubits, and then putting pairs of highest values into the same subset with a greedy allocation strategy.
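The text does not fix the details of the greedy allocation, so the following is only one possible sketch: pairs are visited in order of decreasing empirical mutual information, and the groups containing the two qubits are merged whenever the merged group does not exceed $m_{\max}$ qubits. The function names and the synthetic samples (outcomes perfectly correlated on qubits 0, 1 and on qubits 2, 3) are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations

def mutual_information(samples, i, j):
    """Empirical mutual information I(i, j) of Eq. (18), in nats."""
    S = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    f_i = Counter(s[i] for s in samples)
    f_j = Counter(s[j] for s in samples)
    return sum((c / S) * math.log((c / S) / ((f_i[a] / S) * (f_j[b] / S)))
               for (a, b), c in joint.items())

def greedy_partition(samples, n_qubits, m_max):
    """Greedily group qubits with large pairwise mutual information."""
    info = {pair: mutual_information(samples, *pair)
            for pair in combinations(range(n_qubits), 2)}
    group_of = {qubit: {qubit} for qubit in range(n_qubits)}
    for i, j in sorted(info, key=info.get, reverse=True):
        g_i, g_j = group_of[i], group_of[j]
        if g_i is not g_j and len(g_i) + len(g_j) <= m_max:
            merged = g_i | g_j
            for qubit in merged:
                group_of[qubit] = merged
    groups = []
    for g in group_of.values():
        if g not in groups:
            groups.append(g)
    return sorted(sorted(g) for g in groups)

# Synthetic four-qubit outcomes with six outcomes per qubit.
rnd = random.Random(0)
samples = []
for _ in range(2000):
    a, b = rnd.randrange(6), rnd.randrange(6)
    samples.append((a, a, b, b))

partition = greedy_partition(samples, n_qubits=4, m_max=2)
```

For these samples the partition groups qubits 0 and 1 together, and likewise qubits 2 and 3. A greedy strategy of this kind is not guaranteed to optimize the cost function of Eq. (19).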

IV. NUMERICAL BENCHMARKS
We now showcase the methods for dual frame optimization developed in Sec. III on numerical examples. In the following, in Sec. IV A, we first benchmark the performance of different classes of POVM operators and dual frames by optimizing their single-shot variance for generic random states and observables. Then, in Sec. IV B, we demonstrate how the explicit optimization of the operators can be circumvented by using only the empirical frequencies of the outcomes to obtain a well-performing dual frame.

A. Performance limit of different POVM classes
Here, we investigate which class of single-qubit POVMs and dual frames yields the best possible estimators. In this idealized setting, we assume full knowledge of the underlying state ρ. Our procedure is the following: We first sample a Haar-random pure state ρ. We construct random observables O by sampling eigenvalues $\lambda_1,\dots,\lambda_d$ uniformly at random from [−5, 5] and applying a Haar-random unitary U that yields $O = U\,\mathrm{diag}(\lambda_1,\dots,\lambda_d)\,U^\dagger$. As discussed in Sec. II D, the optimal measurement would be the projective measurement in the eigenbasis with a SSV of $\langle O^2\rangle_\rho - \langle O\rangle_\rho^2$. For a given observable O, we therefore evaluate a class of POVMs M and duals D by their class performance F of Eq. (20), with $\mathrm{Var}[\omega_k]$ given as in Eq. (9). Similarly, we quantify the ability to estimate several observables $\{O_i\}_{i\in\{1,\dots,N_{\mathrm{obs}}\}}$ from the same IC POVM data through the cumulative class performance $F_C$ of Eq. (21). Note that the duals D in Eqs. (21) and (20) are implicitly defined through the POVM operators M but carry free parameters as in Eq. (13). We use a BFGS optimizer to compute F and $F_C$ for five classes of single-qubit POVMs detailed in Appendix A, namely, classical shadows, locally-biased classical shadows, mutually-unbiased bases (MUB) POVMs, and general PM-simulable POVMs (all 6 outcomes each), as well as 4-outcome dilation POVMs. The distributions of the achieved performance limit for 200 random states ρ and observables O (or sets of $N_{\mathrm{obs}} = 5$ observables $\{O_i\}$ for $F_C$) are shown in Fig. 2 for single-qubit and two-qubit systems.
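The sampling of random states and observables described above can be sketched as follows. Haar-random unitaries are drawn here with the standard QR-based recipe (QR decomposition of a complex Gaussian matrix followed by a phase correction), which is an assumption about the implementation rather than a detail given in the text. The function names and the seed are hypothetical.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random d x d unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Fix the phase ambiguity of the QR decomposition column by column.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases

def random_pure_state(d, rng):
    """Density matrix of a Haar-random pure state."""
    psi = haar_unitary(d, rng)[:, 0]
    return np.outer(psi, psi.conj())

def random_observable(d, rng, low=-5.0, high=5.0):
    """O = U diag(lambda) U^dagger with eigenvalues uniform in [low, high]."""
    eigenvalues = rng.uniform(low, high, size=d)
    U = haar_unitary(d, rng)
    return U @ np.diag(eigenvalues) @ U.conj().T

rng = np.random.default_rng(4)
d = 4  # two qubits
rho = random_pure_state(d, rng)
obs = random_observable(d, rng)

# SSV of the projective measurement in the eigenbasis of O (lower bound).
optimal_ssv = (np.trace(rho @ obs @ obs) - np.trace(rho @ obs) ** 2).real
```

The value `optimal_ssv` serves as the reference against which the SSV of any POVM and dual frame can be compared.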
In all cases, the class performance improves significantly when moving beyond the canonical dual frame. Therefore, the additional degrees of freedom leveraged by our dual optimization improve POVM estimators beyond what can be achieved by optimizing the POVM operators alone. Trivially, on a single-qubit observable, the MUB, PM-simulable and dilation POVMs all reach the optimal performance F = 1, as the eigenbasis projectors are included in this class of POVMs, see Fig. 2a. However, when estimating several observables from the same POVM data, the cumulative performance is again improved by adapting the dual frame for each observable, see Fig. 2c. For two-qubit observables, no measurement setting will consistently reach the optimal performance as we restrict ourselves to single-qubit POVM operators, see Fig. 2b. The optimized local duals perform slightly worse than the optimized global duals but still considerably better than canonical duals. When estimating several two-qubit observables, optimizing the dual operators gives a more significant performance improvement than adding more complexity to the measured POVM operators by going from classical shadows to more general PM-simulable POVMs, see Fig. 2d. The optimal local dual frame depends on the observable, hence it should be re-optimized for every observable, offering an additional slight improvement.
A common trend in the results of Fig. 2 is the following: As more degrees of freedom are optimized in a PM-simulable POVM, the performance gains become increasingly smaller and reach a plateau when using the canonical dual frame (see, e.g., blue violins in Fig. 2d). However, these gains become increasingly larger when using optimized dual frames. In other words, it becomes less and less worth it to add further degrees of freedom to the POVM operators when using the canonical dual frame, which is the opposite of what is observed when using optimized dual frames. This is especially true when estimating several observables from the same POVM data. In all cases, PM-simulable POVMs with optimized duals (even local ones) come close to or surpass the performance of optimized dilation POVMs. Interestingly, the average-optimal dual frame (see Sec. II D) does not offer reliable performance improvements, indicating that this result might not be practical in a realistic setting.

B. Empirical frequencies dual frames
Next, we showcase how to bypass the explicit optimization of the dual frame with the use of empirical frequencies dual frames as introduced in Sec. III B 2. We first investigate the performance of the m-body marginal distributions in the infinite shot limit, i.e., we construct the marginal probabilities in Eq. (17) from the exact outcome distributions $p_k$. In Fig. 3, we show how the improvement over canonical duals scales with an increase in the system size. Here, we plot the ratio of the variance of classical shadows estimators when using optimized duals compared to canonical duals. Distributions in violin plots are obtained from 200 random samples of states and (single) observables. Remarkably, as the system size increases from one to four qubits, the improvement of the optimal global duals becomes more and more pronounced. At four qubits, the variance of this optimal estimator is less than half of the variance of the canonical dual estimator. This can be understood as a consequence of increasing the classical resources that go into the construction of the dual operators. The duals derived from the single-qubit marginal distributions (one-local duals) become less performant as the system size increases. This comes as no surprise, as the product of the marginal distributions will capture the true correlated distribution less and less successfully with increasing qubit number. The performance of the marginal duals can be systematically improved by including higher-order correlations, as shown by the two-local and three-local duals. These are constructed by choosing the optimal partitioning of the four qubits into subsets of sizes (2, 2) and (3, 1) according to Eq. (19). Overall, these numerical results show that marginal frequencies dual frames can offer a straightforward improvement over canonical dual frames. Their performance can be systematically boosted by introducing higher-order correlations.

FIG. 2. For blue distributions a fixed dual frame is used. For orange distributions, the optimal dual frame is used. The red dashed line represents the optimal lower bound which is saturated for the projective measurement in the eigenbasis of the observable. a) Single-qubit system with one observable. b) Two-qubit system with one observable. c) Single-qubit system with five observables. d) Two-qubit system with five observables. For two-qubit systems, the class performance is computed for the cumulative variance of all observables and the optimized dual frames are limited to product form for the green distributions. For the light-green distributions, the duals are re-optimized for every observable.
Finally, we investigate how well empirical frequencies perform in the practically relevant setting of finite samples. In Fig. 4, we show how the SSV improves with increasing sample size S for different types of marginal distributions when estimating a single four-qubit observable. Note that we use the biased empirical frequencies introduced in Eq. (15) to choose the dual frame, but plot the true underlying SSV according to Eq. (9) instead of estimating the variance from the finite sample. To illustrate the role of $S_{\mathrm{bias}}$, we show the convergence with one lower value of $S_{\mathrm{bias}} = 128$ and one larger value of $S_{\mathrm{bias}} = 1296$ (the number of different POVM outcomes).
In the regime where $S \le S_{\mathrm{bias}}$, the duals remain close to the canonical duals by design. As S increases, the empirical frequencies eventually converge to their true underlying values. The bias controls the rate and stability of this statistical convergence. The smaller bias is sufficient to give a smooth convergence for the more restricted marginal distributions (dotted lines). In fact, the empirical frequencies with the one-local, two-local, and three-local marginals already offer a concrete improvement over the canonical dual variance with only a few hundred measurement samples, which is well below the total number of POVM outcomes. However, when approximating the global outcome distribution (blue curves), choosing too small a bias will render the empirical frequencies unstable (dashed blue line). On the other hand, for the one-local and two-local duals, the larger bias comes at the price of a significantly slower convergence, illustrating a trade-off between stability and speed. In practice, $S_{\mathrm{bias}}$ can nevertheless always be chosen large enough such that the empirical frequencies dual frame gives a performance that is at least as good as the canonical duals.
Our results indicate that a reasonable choice for S bias is on the order of the degrees of freedom in the marginalized probability distributions.

V. DISCUSSION
In this paper, we investigated the connection between frame theory and randomized overcomplete quantum measurements [48]. In particular, we developed and tested scalable dual frame optimization strategies that allow us to significantly improve the performance of "measure-first" schemes while acting entirely at the data post-processing stage.
Inspired by known analytical results which, albeit optimal, are hardly realizable in practice, we identified a minimal set of constraints that guarantee an efficient and effective computational pipeline. More specifically, we proposed a marginalized version of dual frames offering advantages even if restricted to a structure of limited correlations. This class of duals can, in principle, be parameterized and iteratively refined in combination with, e.g., adaptive POVMs [7]. Furthermore, we described an optimization-free dual frame, based on empirical outcome frequencies, that converges (in the statistical sense) to the best m-local one. This approach does not require any prior knowledge or assumptions on the state being measured, and can be tuned up systematically to capture the most relevant correlations in the measurement outcomes of individual qubits. By removing the need to explicitly optimize the dual frame, this solution significantly simplifies the navigation of the measurement space when searching for the most suitable POVM/dual combination. Our techniques are especially relevant for use-cases that require estimating several observables for the same state, as the dual operators can be optimized for every observable independently. Beyond reconstructing expectation values, all our proposed methods are applicable to an extensive set of tasks, including reduced state tomography, machine learning, and error mitigation.
To support our analysis, we performed numerical simulations in both the infinite statistics limit and the finite sample regime. Remarkably, our results suggest that, with a judicious selection of the dual frame, overcomplete PM-simulable POVMs can come close to the best results obtained with dilation ones in the context of operator estimation. Due to the inherent simplicity of their implementation, PM-simulable POVMs might then be preferable in practical applications. Our simulations (albeit collected at modest scales and for generic, Haar-random states and observables) also indicate that the advantage unlocked by using global duals increases with the system size, while the relative improvement brought by simple single-qubit marginal dual frames decreases. We note that, in most practical settings, observables are seldom dense and are rather linear combinations of Pauli strings with finite weight, e.g., Hamiltonians encountered in condensed matter, lattice problems and quantum chemistry. For such local Pauli strings, one can construct a dual that acts globally on the qubits that make up the non-trivial support of the observables. Furthermore, physically relevant states most often exhibit a distinctive and restricted correlation structure, which can be exploited in the construction of our proposed mutual-information-based dual frames. We leave the in-depth study of these aspects to future work.
As a next step, one could aim at better characterizing the gap between ideal and marginal dual frames in a systematic way, possibly using tools from probability theory [58,59]. In parallel, it would also be interesting to focus on the development of alternative classes of efficiently computable dual frames which could help reduce such performance separation. Examples might draw inspiration from Clifford [31,33] and matchgate [32,60] shadows, or make use of classical techniques such as tensor networks [46] and neural network quantum states [61].
Combining estimates obtained with different dual frames, for example through a median of means [11], could also improve the overall quality of the estimators.
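As a rough illustration of the median-of-means idea, the sketch below splits a list of single-shot estimates into batches, averages each batch, and returns the median of the batch means. The function name, the number of batches, and the synthetic data are assumptions for illustration only; how the batches would be assigned to different dual frames is left open here.

```python
import numpy as np

def median_of_means(samples, n_batches):
    """Median of the means of `n_batches` (nearly) equal-sized consecutive batches."""
    samples = np.asarray(samples, dtype=float)
    batches = np.array_split(samples, n_batches)
    return float(np.median([batch.mean() for batch in batches]))

# Synthetic single-shot estimates containing one large outlier.
estimates = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1000.0]
robust = median_of_means(estimates, n_batches=3)
plain = float(np.mean(estimates))
print(robust, plain)
```

In this toy example, the single outlier shifts the plain mean substantially, whereas it affects only one batch mean and therefore leaves the median unchanged.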
To conclude, and as a key takeaway, our work confirms that dual frames deserve greater attention whenever overcomplete IC techniques are employed to reconstruct properties of quantum states. In fact, the freedom they offer only concerns the classical processing of outcomes and can be straightforwardly leveraged to improve any shadow-based protocol without overheads in sampling or quantum circuit complexity. We therefore expect that the careful selection of duals will soon become a standard component of randomized and IC measurement toolboxes, significantly enhancing their performance.
Note added. Recently, we became aware of related work by J. Malmi et al. [62], in which the authors report estimation improvements using dual optimization in the context of Hamiltonian simulation and quantum chemistry. A. Caprotti et al. [63] also report complementary results concerning optimised shadow inversion maps.
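Let $V$ be a Hilbert space and $F = \{f_k\}_{k \in K} \subset V$ a set of vectors. If there exist constants $A, B \in \mathbb{R}_{>0}$ such that
$$A \|v\|^2 \leq \sum_{k \in K} |\langle f_k, v \rangle|^2 \leq B \|v\|^2$$
for all $v \in V$, then the set $F$ is called a frame. Given a frame $F$, the tightest bounds $A$ and $B$ are called the frame bounds of $F$. This definition implies that a frame spans the vector space $V$, which is a necessary condition for a set to be a frame. Note that this condition is also sufficient for finite sets in finite-dimensional vector spaces, but not in general [65].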
If the vectors constituting the frame are linearly independent, then the frame is a usual basis and it is referred to as minimally-complete. Otherwise there is some redundancy and the frame is said to be overcomplete. A frame can thus be seen as a generalization of the notion of a basis. Let $F = \{f_k\}_{k \in K}$ be a frame for $V$. A set $D = \{d_k\}_{k \in K} \subset V$ such that
$$v = \sum_{k \in K} \langle d_k, v \rangle f_k$$
for all $v \in V$ is called a dual frame to $F$. We see from the definition that if $D$ is a dual frame to $F$, then $F$ is a dual frame to $D$. In other words, duality is a reciprocity relation. A fundamental result of functional analysis is the existence of a dual frame for any frame [66]. More precisely, if $F$ is minimally-complete, it has exactly one dual frame. If $F$ is overcomplete, it has infinitely many dual frames.
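To summarize, a frame $F$ spans $V$ and it always has a dual frame $D$. This offers a very convenient and natural way to find suitable coefficients to express any vector using the frame $F$. For any $v \in V$, the coefficients $c_k = \langle d_k, v \rangle$ satisfy $v = \sum_{k \in K} c_k f_k$.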

Frame Operator
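After establishing the existence of a dual frame $D$ for any frame $F$, let us now consider explicit constructions for it. Given a frame $\{f_k\}_{k \in K} \subset V$, the linear operator
$$\mathcal{F}: V \to V, \qquad \mathcal{F}(v) = \sum_{k \in K} \langle f_k, v \rangle f_k$$
is called the canonical frame operator. It is self-adjoint, and the lower frame bound gives $\langle v, \mathcal{F}(v) \rangle = \sum_{k \in K} |\langle f_k, v \rangle|^2 \geq A \|v\|^2 > 0$ for all $v \neq 0$, so it is positive definite. The canonical frame operator $\mathcal{F}$ is thus invertible and its inverse $\mathcal{F}^{-1}$ is linear, self-adjoint and positive definite.

Consider a frame $\{f_k\}_{k \in K} \subset V$ and its canonical frame operator $\mathcal{F}$. The set of vectors
$$\{d_k\}_{k \in K}, \qquad d_k = \mathcal{F}^{-1}(f_k)$$
is called the canonical dual frame to $\{f_k\}_{k \in K}$. To see that it is indeed a dual frame to $\{f_k\}_{k \in K}$, simply note that we can write any $v \in V$ as
$$v = \mathcal{F}\left(\mathcal{F}^{-1}(v)\right) = \sum_{k \in K} \langle f_k, \mathcal{F}^{-1}(v) \rangle f_k = \sum_{k \in K} \langle \mathcal{F}^{-1}(f_k), v \rangle f_k,$$
where we used the fact that $\mathcal{F}^{-1}$ is self-adjoint.

Consider now a frame $F = \{f_k\}_{k \in K} \subset V$ and a set of real coefficients $\{\alpha_k\}_{k \in K} \subset \mathbb{R}$. The operator
$$\mathcal{F}_\alpha(v) = \sum_{k \in K} \alpha_k \langle f_k, v \rangle f_k$$
is called an $\alpha$-frame operator. If the operator $\mathcal{F}_\alpha$ is invertible, the set
$$\{d_k\}_{k \in K}, \qquad d_k = \alpha_k \mathcal{F}_\alpha^{-1}(f_k)$$
is a valid dual frame to $F$. If all the coefficients $\{\alpha_k\}_{k \in K}$ are positive, then the $\alpha$-frame operator is positive definite and hence invertible. Lastly, this $\alpha$-parametrization of the dual frame is invariant under uniform scaling of the coefficients. That is, the sets of coefficients $\{\alpha_k\}_{k \in K}$ and $\{C \alpha_k\}_{k \in K}$, with $C > 0$, give the same dual frame.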
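The construction can be checked numerically on a small example. The following is a minimal sketch, assuming a real vector space $V = \mathbb{R}^2$ and an overcomplete frame of three unit vectors; the function name `alpha_dual_frame` and the specific weights are illustrative choices, with duals taken as $d_k = \alpha_k \mathcal{F}_\alpha^{-1}(f_k)$.

```python
import numpy as np

def alpha_dual_frame(frame, alpha):
    """Return the dual vectors d_k = alpha_k * F_alpha^{-1} f_k as rows.

    frame: array of shape (n, dim), one frame vector f_k per row.
    alpha: array of shape (n,), real weights (positive weights guarantee invertibility).
    """
    frame = np.asarray(frame, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # F_alpha = sum_k alpha_k f_k f_k^T, a (dim, dim) matrix.
    f_alpha = (frame.T * alpha) @ frame
    # Columns of solve(...) are F_alpha^{-1} f_k; transpose and weight by alpha_k.
    return alpha[:, None] * np.linalg.solve(f_alpha, frame.T).T

# Overcomplete frame for R^2: three unit vectors 120 degrees apart.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
frame = np.stack([np.cos(angles), np.sin(angles)], axis=1)

weights = np.array([1.0, 2.0, 3.0])
canonical = alpha_dual_frame(frame, np.ones(3))      # alpha_k = 1: canonical dual
weighted = alpha_dual_frame(frame, weights)          # a different valid dual
rescaled = alpha_dual_frame(frame, 5.0 * weights)    # uniform rescaling of alpha

v = np.array([0.3, -1.2])
# Reconstruction v = sum_k <d_k, v> f_k for each dual frame.
recon_canonical = (canonical @ v) @ frame
recon_weighted = (weighted @ v) @ frame
print(recon_canonical, recon_weighted)
```

Both dual frames reconstruct the same vector even though their elements differ, which reflects the non-uniqueness of duals for overcomplete frames, and rescaling all weights by a common positive factor leaves the dual frame unchanged.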

Application to POVMs
As we only consider Hilbert spaces $H$ of finite dimension, $d < \infty$, the set of Hermitian operators is an operator-valued vector space $V = \mathrm{Herm}(H)$. Together with the Hilbert-Schmidt inner product, $\langle O_1, O_2 \rangle = \mathrm{Tr}[O_1^\dagger O_2]$, it forms a real Hilbert space of dimension $d_{\mathrm{op}} = d^2$. Therefore, the definition of a frame can be applied to the vector space $V = \mathrm{Herm}(H)$. In this case, a frame is a set $\{M_k\} \subset \mathrm{Herm}(H)$, for which there exist $A, B \in \mathbb{R}_{>0}$ such that
$$A \|O\|^2 \leq \sum_k |\mathrm{Tr}[M_k O]|^2 \leq B \|O\|^2$$
for all $O \in \mathrm{Herm}(H)$, where $\|O\|^2 = \mathrm{Tr}[O^2]$. In the following, we will only consider finite sets of operators $\{M_k\}$ acting on a finite-dimensional space $H$. Hence, a set $\{M_k\}_{k=1}^n$ is a frame if and only if it spans $\mathrm{Herm}(H)$, and an $n$-outcome POVM is a frame for $\mathrm{Herm}(H)$ if and only if it is informationally complete.
However, note that a general frame for $\mathrm{Herm}(H)$ is not necessarily a POVM: indeed, it does not necessarily have positive semi-definite elements nor does it necessarily sum up to the identity. We can relax the IC condition if we are only interested in operators $O \in \mathrm{span}(M)$. Then, there still exists a dual frame on $\mathrm{span}(M)$ for the following reasons. Note that $\mathrm{span}(M)$ is a vector space and $M$ is trivially a frame on the vector space $\mathrm{span}(M)$. Therefore, there exists a dual frame $D$ on this vector (sub-)space.
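To make the correspondence between IC POVMs and frames concrete, the sketch below checks the spanning condition for a single-qubit six-outcome Pauli POVM and computes its canonical dual operators by vectorizing the effects. The choice of POVM, the test state `rho`, the observable `obs`, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Single-qubit identity and Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Six-outcome POVM: projectors onto the +/-1 eigenstates of X, Y, Z, each weighted by 1/3.
povm = np.array([(I2 + s * P) / 6 for P in (X, Y, Z) for s in (+1, -1)])

# Vectorize each effect; rows of `vecs` are the frame vectors |M_k>>.
vecs = povm.reshape(len(povm), -1)

# IC check: the effects must span the d^2 = 4 dimensional operator space.
is_ic = np.linalg.matrix_rank(vecs) == 4

# Canonical frame operator F = sum_k |M_k>><<M_k| and canonical duals D_k = F^{-1}(M_k).
frame_op = vecs.T @ vecs.conj()
duals = np.linalg.solve(frame_op, vecs.T).T.reshape(-1, 2, 2)

# Illustrative state and observable.
rho = 0.5 * (I2 + 0.6 * X + 0.3 * Z)
obs = 0.7 * X - 0.2 * Y + 1.1 * Z

probs = np.real(np.einsum("kij,ji->k", povm, rho))     # p_k = Tr[M_k rho]
omegas = np.real(np.einsum("ij,kji->k", obs, duals))   # omega_k = Tr[O D_k]
estimate = probs @ omegas                              # sum_k p_k omega_k
exact = np.real(np.trace(rho @ obs))
print(is_ic, estimate, exact)
```

Because the decomposition $O = \sum_k \omega_k M_k$ holds exactly, the weighted sum $\sum_k p_k \omega_k$ equals $\mathrm{Tr}[\rho O]$ in the infinite-statistics limit; in an experiment, the probabilities $p_k$ would be replaced by sampled outcomes.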

FIG. 1. Schematic of dual frame optimization. Generalized measurements are performed on the quantum system. Upon obtaining outcome $k$, the corresponding canonical dual operator $D_k$, also known as classical shadow, can be efficiently computed and stored on a classical computer. The expectation value of any observable $O$ can be estimated from a sample of dual operators. Leveraging additional degrees of freedom, one can optimize these dual operators through classical postprocessing, effectively reducing the estimation variance.

FIG. 2. Performance of different classes of POVMs and dual frames for estimating random observables, shown as violin plots. Color indicates the employed class of duals. Each distribution is built from 200 repetitions of sampling a Haar-random pure state and an observable (or set of observables) and subsequently minimizing the class performance $F$ (or cumulative performance $F_C$) for each combination of POVM and dual frame. Horizontal markers show the median of each distribution. For blue distributions, a fixed dual frame is used. For orange distributions, the optimal dual frame is used. The red dashed line represents the optimal lower bound, which is saturated for the projective measurement in the eigenbasis of the observable. a) Single-qubit system with one observable. b) Two-qubit system with one observable. c) Single-qubit system with five observables. d) Two-qubit system with five observables. For two-qubit systems, the class performance is computed for the cumulative variance of all observables and the optimized dual frames are limited to product form for the green distributions. For the light-green distributions, the duals are re-optimized for every observable.

FIG. 3. Variance reduction compared to classical shadows with canonical duals for different types of empirical frequencies dual frames.Violin plots show the distribution over 200 random pairs of states and observables.

FIG. 4. Convergence of the single-shot variance with increasing shot number for estimators based on empirical frequency dual frames on random four-qubit states and observables. The POVM operators are single-qubit classical shadows. For the solid line data, a bias of $S_{\mathrm{bias}} = 1296$ is used, while dashed lines are obtained with $S_{\mathrm{bias}} = 128$. Error bars are the standard deviation over 15 repetitions of sampling the indicated number of shots from the underlying POVM distribution.
For an IC-POVM $M = \{M_k\}_{k=1}^n$, a dual frame $D = \{D_k\}_{k=1}^n \subset \mathrm{Herm}(H)$ exists such that
$$O = \sum_{k=1}^n \mathrm{Tr}[O D_k] M_k = \sum_{k=1}^n \omega_k M_k \tag{B10}$$
for all $O \in \mathrm{Herm}(H)$, where the coefficients $\omega_k = \mathrm{Tr}[O D_k]$ are real as $O, D_k \in \mathrm{Herm}(H)$.
Finally, as $O \in \mathrm{span}(M)$, we can write $O = \sum_{k=1}^n \mathrm{Tr}[O D_k] M_k = \sum_{k=1}^n \omega_k M_k$ by definition of a dual frame.