Experimental Estimation of Quantum State Properties from Classical Shadows

Full quantum tomography of high-dimensional quantum systems is experimentally infeasible due to the exponential scaling of the number of required measurements with the number of qubits in the system. However, several ideas have recently been proposed for predicting a limited number of features of these states, or estimating the expectation values of operators, without the need for full state reconstruction. These ideas go under the general name of shadow tomography. Here we provide an experimental demonstration of property estimation based on classical shadows proposed in [H.-Y. Huang, R. Kueng, J. Preskill. Nat. Phys. https://doi.org/10.1038/s41567-020-0932-7 (2020)] and study its performance in a quantum optical experiment with high-dimensional spatial states of photons. Using experimental data, we show how this procedure outperforms conventional state reconstruction in fidelity estimation from a limited number of measurements.


I. INTRODUCTION
A full description of a quantum system state is provided by its density matrix ρ, and conventional quantum tomography aims to provide an estimate ρ̂ of the density matrix of an unknown quantum state given the measurement data [1]. The measurements have to be tomographically complete in the sense that they should allow unambiguous determination of all density matrix elements. Simple parameter counting shows that for a general mixed state of a system in a D-dimensional Hilbert space, the required number of measurements is at least D² [2]. This number may be reduced to O(RD log² D) if some prior information about the state rank R is known, by using the techniques of compressed sensing [3]; otherwise one has to resort to incomplete state tomography [4]. For pure states, protocols requiring as few as 5D measurements are known [5]. In any case, for an n-qubit system the number of measurements scales exponentially, since D = 2^n, which is known as the curse of dimensionality. One of the ways around this problem is to assume some model for the quantum state that allows for an efficient representation, such as a matrix-product state model [6,7] or a neural-network-based model [8][9][10]. In general, however, there may be no a priori reason to assign such a model to an unknown state.
On the other hand, the exponential amount of information contained in a full density matrix may be redundant. Typically a researcher is interested in a restricted number of state properties, such as the fidelity with respect to the state one intended to prepare, or the mean value of some observable. This fact has led to a different approach, called shadow tomography, pioneered in Ref. [11]. It promises accurate estimation of exponentially many linear functions of ρ using only a polynomial number of state copies. However, the original method of Ref. [11] is very demanding for hardware implementation, as it involves measurements that act collectively on all copies. So despite significant experimental progress in approximate quantum learning [12], direct realization of the original shadow tomography is beyond current technology.

* struchalin.gleb@physics.msu.ru
Fortunately, the authors of Ref. [13] proposed another procedure that requires only separable measurements on each copy yet is powerful in estimating an exponentially large number of state properties. Here we report an experimental realization of this procedure, demonstrating estimation of mean values of operators and fidelity estimation from classical shadows of quantum states introduced in [13]. We experimentally access Hilbert spaces of dimensionality up to 32 and clearly demonstrate that the approach is applicable in the region of incomplete measurement sets, where traditional tomography fails completely.

II. METHOD
Shadow tomography is a tool for the effective prediction of quantum state properties. Let us note that, understanding the term in this broader sense, we will refer to the protocol of Ref. [13] as shadow tomography as well. While it is capable of recovering both linear and higher-order polynomial target functions of the matrix elements of ρ, in the present work we focus solely on linear ones. We explicitly describe the algorithm we used as applied to our experiment. The reader is referred to the original paper [13] for details on the general framework, nonlinear feature prediction, and proofs of performance guarantees.
The goal of the algorithm is to predict the expectation values {o_i} for a set of M observables {O_i}:

o_i(ρ) = Tr(O_i ρ), i = 1, …, M,  (1)

where ρ is the n-qubit true state. Obviously, o_i(ρ) are linear in the matrix elements of ρ.
In the data-gathering stage, ρ is transformed by a unitary operator U, ρ → U ρU†, and then each qubit is measured in the computational basis. This procedure is repeated many times for different U ∈ U, chosen randomly from some matrix ensemble U. The choice of U affects the tomography performance and determines the class of observables O_i that can be effectively estimated. The authors of [13] mainly consider two ensembles: stabilizer circuits, i. e., U belonging to the n-qubit Clifford group [14], and Pauli measurements, where each U is a tensor product of single-qubit operations. We have selected the first option as the more extensive alternative, although our experimental setup can carry out any measurement.
The random unitary transformation, ρ → U ρU†, followed by a measurement in the computational basis {|b_i⟩} is equivalent to the projection onto a random vector |ψ_i⟩ = U†|b_i⟩ ∈ S. Since U is a Clifford circuit, by definition |ψ_i⟩ is a random stabilizer state and S is the set of all n-qubit stabilizer states. Below, such measurements are referred to as Clifford or stabilizer measurements. We resort to vectors, rather than Clifford gates, because our experiment lacks a natural decomposition of unitary transformations into gate sequences. The algorithm for uniform sampling of random stabilizer states |ψ⟩ ∈ S is presented in Appendix A.
When the measurement results are obtained, the classical shadow ρ̂ of the n-qubit true state ρ is calculated:

ρ̂ = Σ_{i=1}^{P} f_i [(D + 1)|ψ_i⟩⟨ψ_i| − I],  (2)

where D = 2^n, P is the number of projections, and f_i is the observed frequency of the outcome corresponding to |ψ_i⟩ (Σ_i f_i = 1). Expression (2) is nothing more than an explicit form of the linear inverse (least squares) estimator for any spherical 2-design POVM [15]. Our choice, stabilizer states, forms a 3-design [16], so the expression is applicable as well.
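As an illustration, the shadow construction (2) and its use can be sketched for a single qubit, where a random stabilizer measurement amounts to measuring in the eigenbasis of a random Pauli operator. This is a minimal simulation, not the experimental pipeline; the true state, shot count, and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# The three single-qubit stabilizer bases: eigenbases of Z, X, and Y.
BASES = [
    [np.array([1, 0], complex), np.array([0, 1], complex)],                              # Z
    [np.array([1, 1], complex) / np.sqrt(2), np.array([1, -1], complex) / np.sqrt(2)],   # X
    [np.array([1, 1j], complex) / np.sqrt(2), np.array([1, -1j], complex) / np.sqrt(2)], # Y
]

def classical_shadow(rho, n_shots):
    """Average of linear-inversion snapshots (D+1)|psi_i><psi_i| - I, cf. Eq. (2)."""
    D = rho.shape[0]
    acc = np.zeros_like(rho)
    for _ in range(n_shots):
        basis = BASES[rng.integers(len(BASES))]
        p = np.array([np.real(v.conj() @ rho @ v) for v in basis])
        v = basis[rng.choice(2, p=p / p.sum())]   # Born-rule measurement outcome
        acc += (D + 1) * np.outer(v, v.conj()) - np.eye(D)
    return acc / n_shots

plus = np.array([1, 1], complex) / np.sqrt(2)
rho_true = np.outer(plus, plus.conj())            # true state |+><+|
rho_hat = classical_shadow(rho_true, 20000)

# Estimate o = Tr(O rho_hat) for the projector O = |0><0| (true value 0.5).
o_hat = np.real(np.trace(np.diag([1.0, 0.0]) @ rho_hat))
```

Each snapshot has unit trace by construction, so Tr ρ̂ = 1 exactly, while ρ̂ itself need not be positive semidefinite, in line with the discussion of negative eigenvalues below.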
We emphasize that initially, in the work [13], each projection is assumed to be performed on a single copy of ρ. Therefore, the number of projections P coincides with the number N of measured copies, P = N (f_i = 1/N). On the other hand, in our quantum optical experiment several photons can be detected for the same |ψ_i⟩ during the acquisition time, so P < N. Moreover, we worked in the regime of overexposure, for which P ≪ N (typically, N/P ∼ 10⁴-10⁵ depending on the system dimensionality). This setting is common in compressive-sensing experiments, where shot noise in the outcome probability estimation should be diminished [17][18][19]. Preliminary tomography simulations showed that the feature prediction accuracy was limited by finite P even when N = ∞ (the observed frequency f_i was substituted with the exact outcome probability). In this sense, P is more important than N. When P < N, at least P copies are measured with dissimilar projectors, so the theorems presented in Ref. [13] stay valid if N is replaced by P. However, the theorem statements can become pessimistic, and the proofs may require further justification for the case P < N.
Once a classical shadow (2) is obtained, an estimator ô_i of o_i is simply

ô_i = Tr(O_i ρ̂).  (3)

Here comes another discrepancy with the original algorithm: the authors of Ref. [13] propose to use a median-of-means estimator. However, we omit the median evaluation and use a simple mean estimator throughout this work, because no valuable difference was found between the two approaches [20]. When P = N, the shadow tomography protocol has the following sampling complexity [13]: it predicts all expectation values o_i, i = 1, …, M, within an additive error ε given that

N = O(log M · max_i Tr(O_i²) / ε²).  (4)

The number of copies N depends on the target operators O_i rather implicitly, via Tr O_i². In our experiments we used rank-1 projectors, therefore Tr O_i² = 1, and this factor vanishes from (4).

III. EXPERIMENT
We use the spatial degrees of freedom of photons to produce high-dimensional quantum states. The corresponding continuous Hilbert space is typically discretized using a basis of transverse modes. We have chosen Hermite-Gaussian (HG) modes HG_nm(x, y), which are solutions of the paraxial Helmholtz equation in Cartesian coordinates (x, y) and form a complete orthonormal basis. The mode order k is defined as the sum of the mode indices: k = n + m. There exist (k + 1)(k + 2)/2 HG modes from zero to kth order inclusive. We bound the beam order to prepare a D-dimensional system, i. e., the order k is limited by k_max, k ≤ k_max, where k_max is the minimal integer fulfilling the inequality (k_max + 1)(k_max + 2)/2 ≥ D. We test shadow tomography for dimensions D = 2, 4, 8, 16, and 32, which corresponds to one to five qubits.
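The mode-counting rule above is easy to check numerically; the following sketch finds the minimal cutoff k_max for each dimension used in the experiment:

```python
def k_max(D):
    """Minimal mode order k such that (k + 1)(k + 2)/2 >= D Hermite-Gaussian modes exist."""
    k = 0
    while (k + 1) * (k + 2) // 2 < D:
        k += 1
    return k

for D in (2, 4, 8, 16, 32):
    print(D, k_max(D), (k_max(D) + 1) * (k_max(D) + 2) // 2)
```

For example, for D = 32 one obtains k_max = 7, i. e., 36 available modes, of which D = 32 are used as the computational basis (our reading of the truncation).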
In our setup (Fig. 1) attenuated light from an 808-nm diode laser is spatially filtered by a single-mode fiber (SMF-1) and collimated by an aspheric lens L2. The top half of a spatial light modulator (SLM, Holoeye Pluto) serves to prepare the desired state of the photon, and the bottom half, followed by focusing into a single-mode fiber (SMF-2) and single-photon detection, implements projective measurements [21,22]. Lenses L3 and L4 have equal focal lengths F = 100 mm and are mounted 200 mm apart. Since the holograms displayed on the SLM use a blazed grating for amplitude modulation [23], a pinhole in the focal plane is used for state selection in the first diffraction order. After a double pass through the telescope and a quarter-wave plate (QWP), the beam is reflected by a polarizing beam splitter (PBS) and directed back to the SLM without any additional alterations.
Note that the detected state differs from the prepared one due to the Gouy phase incursion during beam propagation from one half of the SLM to the other. The Gouy phase ϕ_G depends solely on the geometric parameters of the experimental setup, e. g., the beam Rayleigh range and traveling distance. It causes the following transformation of the basis states: |HG_nm⟩ → e^{i(n+m+1)ϕ_G}|HG_nm⟩. We use the Gouy phase as a fitting parameter to determine the "true" state.

A. Correlation analysis
Expectations o_i can be estimated by shadow tomography via (3). On the other hand, expression (1) has a form similar to Born's rule, so the quantities o_i can be measured directly. This provides a way of independent experimental verification of shadow tomography predictions. We denote the estimates given by shadow tomography as ô_i^est. and the directly measured expectations as ô_i^meas.. Both values are subject to experimental imperfections and shot noise due to the finite statistics N. However, the latter factor is negligible, since in all experiments the total exposure corresponding to the value o_i = 1 was approximately 3 × 10⁵ photons, with proportional scaling for other values of o_i.
At first, we performed 10⁴ stabilizer measurements to obtain the classical shadow ρ̂. Then, 5000 projectors O_i = |ϕ_i⟩⟨ϕ_i| onto random Haar-distributed vectors |ϕ_i⟩ were measured, resulting in an array of ô_i^meas.. For each investigated dimension D = 2^n, n = 1, …, 5, we probed five different Haar-distributed random pure true states to ensure that the procedure is state agnostic. We observed a high Pearson correlation coefficient between the two quantities in all scenarios, signaling the consistency of shadow tomography (see Table I). A typical correlation plot is depicted in Fig. 2 for system dimension D = 8. The solid black line shows perfect matching, the dependence ô_i^est. = ô_i^meas.. As one can see, all points tend to concentrate near this line (Pearson correlation coefficient r = 0.9758). Note the existence of a small "nonphysical" region, where ô_i^est. < 0. It appears because the classical shadow ρ̂ is not forced to be positive semidefinite, unlike in conventional tomography such as maximum likelihood estimation. And, indeed, ρ̂ contains negative eigenvalues due to experimental imperfections. Apparently, the values of ô_i^meas. are shifted towards zero. This is a mere artifact of our choice of O_i. The probability density function (PDF) for ô_i^meas. coincides with the PDF p(x) for the squared dot product x ≡ |⟨ψ|ϕ⟩|² between a fixed vector |ψ⟩, reflecting the true state, and a random Haar-distributed vector |ϕ⟩, corresponding to a projector O_i: p(x) = (D − 1)(1 − x)^{D−2} [24]. As the dimensionality D increases, the mean value x̄ = 1/D decreases.
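The quoted overlap statistics can be checked with a short Monte Carlo sketch (the dimension, sample count, and seed are arbitrary choices): for a Haar-random |ϕ⟩, the squared overlap x = |⟨ψ|ϕ⟩|² has mean 1/D and cumulative distribution 1 − (1 − x)^{D−1}.

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_vector(D):
    """Haar-random pure state: a normalized complex Gaussian vector."""
    g = rng.normal(size=D) + 1j * rng.normal(size=D)
    return g / np.linalg.norm(g)

D = 8
psi = haar_vector(D)                   # plays the role of the fixed true state
x = np.array([abs(np.vdot(psi, haar_vector(D))) ** 2 for _ in range(20000)])

print(x.mean())                        # close to 1/D = 0.125
print(np.mean(x < 0.2), 1 - (1 - 0.2) ** (D - 1))   # empirical vs exact CDF
```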

B. Effect of median of means estimator
It was said earlier that the authors of Ref. [13] suggest using the median-of-means estimator [25], which proceeds as follows:

1. A sequence of P measurement results is divided into K batches of length ⌊P/K⌋.

2. A separate mean estimate ô_i^(k) is computed from each batch k = 1, …, K.

3. The final assessment ô_i is the median:

ô_i = median(ô_i^(1), …, ô_i^(K)).
The median of means estimator is robust against outliers in the measured data. The number of batches K depends on the number of target operators M and the confidence probability 1 − δ: K = 2 log(2M/δ). For example, if the failure level is chosen to be δ = 0.01 and M = 5000, the number of batches is K ≈ 28.
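The estimator admits a few-line sketch (the batch count and the synthetic data below are illustrative):

```python
import numpy as np

def median_of_means(samples, K):
    """Split samples into K batches, average each batch, return the median of batch means."""
    batches = np.array_split(np.asarray(samples), K)
    return np.median([b.mean() for b in batches])

rng = np.random.default_rng(2)
samples = rng.normal(0.5, 0.1, size=2800)   # stand-in for P measurement results
mom = median_of_means(samples, 28)          # K = 28, as for delta = 0.01, M = 5000
mean = samples.mean()
```

For well-behaved data the two estimates nearly coincide, which anticipates the flat r(K) curves discussed next.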
In order to investigate how the number of batches K influences the overall tomography performance, we computed the median-of-means predictions ô_i^est. for various K. The obtained dependencies r(K) are presented in Fig. 3 for different system dimensions D. Each curve is averaged over five tomography runs. The case K = 1 corresponds to the ordinary mean estimator, as used before. The reader can see that the dependencies r(K) are almost flat, and the correlation even becomes slightly lower as K increases. This implies that, as applied to our experiment, the effect of the median-of-means estimator is negligible compared to the mean alone. We connect the independence of the accuracy on K with two facts. Firstly, the statistics per measurement in our experiments are huge, and outliers hardly ever occur; see Appendix B for more detailed reasoning. Secondly, systematic, deterministic errors in the measurement projectors dominate over the statistical noise, and medians cannot smooth away this source of imperfections.

C. Fidelity estimation
One of the important use cases for shadow tomography is the estimation of fidelity with respect to some given pure state |ψ⟩. In this case, the target operator O is simply a projector onto this state: O = |ψ⟩⟨ψ|. In particular, one can find the fidelity of state preparation. However, in our experiment the prepared state |ψ_prep.⟩ and the detected one |ψ_det.⟩ differ significantly due to the Gouy phase incursion during beam propagation, and we have to perform the corresponding correction (see Appendix C for details). The obtained fidelities F are listed in Table I.
The results presented above were obtained using overcomplete measurement sets, since we used P = 10⁴ projectors to construct the classical shadow ρ̂. This number is far greater than the size of a minimal complete set, which has D² − 1 POVM elements, even for D = 32. However, the main distinguishing feature of shadow tomography is its ability to predict expectation values using much less than a tomographically complete set of measurements. Hence, we also studied the performance of shadow tomography for intermediate values of P, including the incomplete scenario, where P < D² − 1. Fig. 4a shows the averaged dependencies of the preparation fidelity F, estimated using shadow tomography, on the number of stabilizer measurements P for various system dimensions D. The fidelity is calculated with respect to the compensated prepared state, where the compensatory Gouy phase is found using the full data sequence (i. e., for P = 10⁴). The averaging is done over five different states for each dimension.
In the beginning, for low P, the volatility of the curves is vast, and the fidelity F can even lie outside the physical region 0 ≤ F ≤ 1, because the shadow matrix ρ̂ is not constrained to be positive semidefinite. As P increases, the fidelities start to stabilize near their final values. Nevertheless, the fidelity estimators are unbiased for any number of projectors P, because shadow tomography is based on linear inversion, which is unbiased. And indeed, as one can see from Fig. 4a, the error bars cover the final values of fidelity reasonably well for any P, which experimentally confirms the unbiasedness property.
It is interesting to see how the above fidelity estimates change if the shadow matrix ρ̂ [see Eq. (2)] is forced to be positive semidefinite. To achieve this, we project the eigenvalues λ_i of ρ̂ onto the canonical simplex {λ_i ≥ 0, Σ_i λ_i = 1}, using the recipe from Ref. [26], while leaving the eigenvectors untouched. The obtained results are shown in the inset of Fig. 4a. Now the estimators are biased: for incomplete measurement sets, P ≪ D², the fidelity is underestimated and significantly shifted towards zero. When P becomes comparable in order of magnitude to D², the assessments attain their final values. Note the apparent dependence on the system dimension D, which is not the case for ordinary shadow tomography.
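The projection of the eigenvalue vector onto the canonical simplex can be sketched with the standard sort-and-threshold algorithm (our sketch of the recipe of Ref. [26]; the input eigenvalues below are made up):

```python
import numpy as np

def project_to_simplex(lam):
    """Euclidean projection of a real vector onto {l_i >= 0, sum_i l_i = 1}."""
    s = np.sort(lam)[::-1]                            # eigenvalues in decreasing order
    css = np.cumsum(s)
    j = np.arange(1, len(lam) + 1)
    rho = np.nonzero(s + (1 - css) / j > 0)[0][-1]    # last index kept positive
    theta = (1 - css[rho]) / (rho + 1)                # common shift
    return np.maximum(lam + theta, 0)

lam = np.array([0.9, 0.4, -0.1, -0.2])   # unit-trace shadow spectrum with negative eigenvalues
print(project_to_simplex(lam))           # projects to [0.75, 0.25, 0, 0]
```

The negative eigenvalues are clipped to zero and the surplus weight is removed from the remaining ones, which is exactly the mechanism that biases fidelities towards zero for incomplete data.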
The bias of the estimator leads to poor accuracy when a measurement set is incomplete. For example, consider the point with D = 32 and P = 251 in Fig. 4a. Shadow tomography has already converged, since the fidelity F = 0.81 ± 0.04 coincides with the final value for P = 10⁴ within the error bars, but after the projection of eigenvalues onto the positive simplex the fidelity drops to F = 0.36 ± 0.03. Unfortunately, the bias is unavoidable for any procedure that always yields positive semidefinite density matrices [27]. A maximum likelihood estimate (MLE) is no exception. Fig. 4b presents the fidelity dependencies for the same measurement data, processed with an accelerated projective-gradient MLE [28]. Qualitatively, the performance is the same as that in the inset of Fig. 4a.

D. Estimator biasedness
Preparation fidelity is not the only quantity estimated with a heavy bias by the MLE method when the number P of stabilizer measurements is low. All projectors O_i with near-unity mean values o_i ≈ 1 will be underestimated. To check this hypothesis, we carried out another correlation-like test, similar to that in Fig. 2. Both a classical shadow ρ̂_CS and a maximum likelihood estimate ρ̂_MLE are calculated using the same stabilizer measurement outcomes. Then, as usual, these estimators are substituted into Eq. (3) to give ô_i^est.. To populate the whole range of mean values uniformly, the target projectors O_i = |ϕ_i⟩⟨ϕ_i| are built from random vectors of the form

|ϕ⟩ = √a |ψ⟩ + √(1 − a) (1 − |ψ⟩⟨ψ|)|g⟩ / ‖(1 − |ψ⟩⟨ψ|)|g⟩‖,

where |g⟩ is a vector with the real and imaginary parts of its elements being independent Gaussian random variables with zero mean and unit variance, and a is distributed uniformly on the interval [0, 1]. It is easy to verify that |⟨ψ|ϕ⟩|² = a, so if |ψ⟩ is the true state, then, indeed, o_i = a has a uniform distribution. We take a close approximation, a compensated prepared state, as the vector |ψ⟩. The choice of distribution for |g⟩ ensures that a "circle" determined by the equation |⟨ψ|ϕ⟩|² = const is also populated uniformly [29, Appendix C].

The obtained predictions ô_i^est. are plotted against the directly measured mean values ô_i^meas. in Fig. 5 for system dimensions D = 8 and 32. We investigated two cases: estimates for a small number of measurements P (100 for D = 8 and 300 for D = 32) and a large P = 10⁴. As expected, shadow tomography gives unbiased estimates in all situations: ô_i^est. ≈ ô_i^meas.. MLE performs differently, since it is only an asymptotically unbiased estimator. For small P (Fig. 5a and 5c), although the MLE predictions are more condensed compared to classical shadows (there is less volatility), they are highly biased and concentrate near a line ô_i^est. = β ô_i^meas. with proportionality constant β < 1, underestimating ô_i^meas. (see Table II for best-fit parameters). For large P (Fig. 5b and 5d) the behavior equalizes: MLE approaches its asymptotics, and both methods result in the same unbiased predictions with a proportionality coefficient β ≈ 1.

IV. CONCLUSION

We have experimentally demonstrated that classical shadows, i. e., linear inversion estimators for quantum states, can be used to faithfully predict expectation values of observables from very few measurements. Specifically, we have shown that the estimator obtained from the classical shadow is unbiased and provides correct expectation values even when the number of measurements used for estimation is significantly less than required for full state reconstruction. As a special case, we performed estimation of the fidelity with the "true" state and have shown that it is also possible with few measurements.
Our treatment reformulates the results of [13] in terms of a typical quantum optical experiment and applies them to experimental data for high-dimensional spatial states of photons. The versatility of the chosen experimental platform allows us to realize arbitrary projective measurements; however, we have demonstrated that, in full accordance with the theoretical predictions, the procedure works well when the measurement set is restricted, for example, to projections onto stabilizer states. This is an important feature of the protocol, making it a scalable approach to quantum property estimation.
The framework of shadow tomography was recently extended with online learning protocols [30,31], which from an operational point of view are close in spirit to the one implemented in this work. Comparing the performance of these approaches on real experimental data is an interesting direction for further research.

Appendix A: Explicit sampling of random stabilizer states

In this appendix, we describe the procedure for the explicit generation of random, uniformly distributed stabilizer states. By "explicit" we mean that the whole n-qubit state vector |ψ⟩ of 2^n amplitudes is calculated. This requirement comes from the fact that in photonic experiments like the one performed here, the preparation and measurement stages have no natural decomposition in terms of quantum gates and require the explicit specification of the state vectors.
The set S of all stabilizer states is finite; its cardinality C(n) is [32]:

C(n) = 2^n ∏_{k=1}^{n} (2^k + 1).  (A1)

Uniform sampling means that each state |ψ_i⟩ ∈ S is selected with equal probability. A naive approach would be to generate a random index i = 1, …, C(n) and pick the corresponding state |ψ_i⟩ from a pre-generated set S. However, the huge cardinality makes this infeasible. When working with stabilizer states on a classical computer, one usually resorts to their stabilizer operators rather than state vectors, since this implicit description allows a very efficient (polynomial in the number of qubits n) storage scheme and simulation of Clifford gate actions. This fact is known as the Gottesman-Knill theorem [32,33].
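For small n the cardinality (A1) is easy to evaluate directly; a few lines reproduce the well-known counts of 6 single-qubit and 60 two-qubit stabilizer states:

```python
def stabilizer_count(n):
    """C(n) = 2^n * prod_{k=1}^{n} (2^k + 1), Eq. (A1)."""
    c = 2 ** n
    for k in range(1, n + 1):
        c *= 2 ** k + 1
    return c

print([stabilizer_count(n) for n in range(1, 6)])
# [6, 60, 1080, 36720, 2423520]
```

Already at n = 5 there are more than two million states, which illustrates why storing a pre-generated set S is impractical.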
Therefore, an evident practical approach for constructing a random |ψ⟩ is to generate a set of its stabilizers {g_i}_{i=1}^{n}: g_i|ψ⟩ = |ψ⟩. This can be done efficiently by utilizing, e. g., a method from Ref. [34], which enumerates all possible stabilizer circuits U: |ψ⟩ = U|0⟩. Then, given the stabilizers {g_i}, the state |ψ⟩ is obtained using the relation

|ψ⟩⟨ψ| = ∏_{i=1}^{n} (I + g_i)/2.  (A2)

Thus, the conversion from the stabilizer formalism to an explicit form involves three exponentially hard routines: 1. explicit construction of the stabilizer matrices g_i, O(n · 2^{2n}) operations; 2. evaluation of the product (A2), O(n · 2^{3n}); 3. recovery of |ψ⟩ from |ψ⟩⟨ψ|, O(2^n).
The overall complexity is dominated by the second stage (note the power 3n). Of course, the complexity of an explicit n-qubit state generation is always exponential and cannot be lower than O(2 n )-the number of operations required to address every element in the vector. But the power index dramatically affects the performance. For the method above, it is 3n, while a decrease to n is possible. Below, we describe an algorithm that requires O(2 n poly(n)) operations.
Let us start with the universal form of any |ψ⟩ ∈ S [35,36]:

|ψ⟩ = 2^{−k/2} Σ_{x ∈ F_2^k} (−1)^{q(x)} i^{l(x)} |Rx + t⟩,  (A3)

where x ∈ F_2^k and t ∈ F_2^n are, respectively, k- and n-dimensional binary vectors, k ≤ n, q(x) is a quadratic form on F_2^k, l(x) is a linear one, and R ∈ F_2^{n×k} is an n × k binary matrix with rank k. Summation and multiplication in (A3) are carried out modulo two, since we work in the Galois field F_2. Also, we identify the binary representation of a given integer x with the corresponding binary vector and vice versa.
Representation (A3) reveals some properties of stabilizer states. Up to normalization, each element of the state vector is either ±1, ±i, or 0. The number of nonzero elements is always 2^k, 0 ≤ k ≤ n, which is simply the number of different vectors x in F_2^k. By convention, we define F_2^0 = {0}. In our sampling algorithm an important role is played by the set S_k, which by definition contains all n-qubit stabilizer states with 2^k nonzero elements. In Theorem 2 we show how to sample |ψ⟩ ∈ S_k uniformly. Then in Theorem 3 we calculate the cardinality C(n, k) of S_k. Finally, we combine these results in Theorem 4, where an algorithm for uniform sampling of the whole set S = ∪_{k=0}^{n} S_k is presented.
Theorem 2. Fix k = 0, …, n. Let |ψ⟩ be an n-qubit state of the form

|ψ⟩ = 2^{−k/2} Σ_{x ∈ F_2^k} (−1)^{x^T Q x} i^{c^T x} |Rx + t⟩,  (A4)

where the matrix Q ∈ F_2^{k×k} and the vectors c ∈ F_2^k and t ∈ F_2^n are random with independent and identically distributed (i. i. d.) elements 0 or 1 appearing with probability 1/2, and R ∈ F_2^{n×k} (rank R = k) is a random matrix sampled uniformly from the set of all rank-k matrices. Then |ψ⟩ is uniformly sampled from S_k.
Proof. Again, consider (A3). Every quadratic form q(x) can be expressed as a sum q(x) = x^T Q x + b^T x + x_0. The constant x_0 affects only the global phase of |ψ⟩ and can be omitted. The linear term b^T x is already enclosed in x^T Q x: indeed, in the expansion of x^T Q x there is a diagonal term Q_ii x_i² = Q_ii x_i, since x_i² = x_i for any x_i ∈ F_2. Therefore, without loss of generality, q(x) = x^T Q x, where Q is an arbitrary k × k binary matrix. Analogously, a constant term can be neglected in the linear form: l(x) = c^T x, where c ∈ F_2^k is an arbitrary binary vector. Starting from the form (A3), we have thus arrived at the more specific expression (A4).
Let us prove that |ψ⟩ is sampled uniformly. The quantities Q, c, R, t are associated with their own structures, respectively: a quadratic form Q, a linear form L, a k-dimensional vector subspace V_k, and an affine subspace A_k:

Q = f_Q(Q), Q(x) = x^T Q x,  (A5)
L = f_c(c), L(x) = c^T x,  (A6)
V_k = f_R(R) = {Rx | x ∈ F_2^k},  (A7)
A_k = f_t(t) = V_k + t.  (A8)

The corresponding maps f_Q, f_c, f_R, f_t are in general surjective, i. e., they may map many different quantities to a single structure. The affine subspace A_k determines the positions of the nonzero elements in |ψ⟩, while the forms Q and L define the order in which ±1, ±i appear. In this sense, Q, L, and A_k act independently, so the state (A4) is uniformly distributed if each of these three structures is sampled uniformly.
Obviously, Q, c, and t are generated uniformly, because their elements are i. i. d. random variables with prob(0) = prob(1) = 1/2; R is sampled uniformly by the theorem condition. However, for a general surjective map ω = f(ξ), ξ ∈ Ξ, ω ∈ Ω, uniform sampling of the domain Ξ does not imply the same for its image Ω. Fortunately, for the maps (A5)-(A8) the cardinality of the preimage of each element in the codomain is the same (see below):

|f⁻¹(ω₁)| = |f⁻¹(ω₂)| for all ω₁, ω₂ ∈ Ω,  (A9)

where |·| denotes the cardinality evaluation. Any map f satisfying (A9) has the property that uniform sampling of the domain Ξ results in uniform sampling of its codomain Ω. Let us start by proving that the map f_Q complies with (A9). Only the sum (Q_ij + Q_ji) x_i x_j, i < j, matters in the expression x^T Q x. So adjoint nondiagonal elements Q_ij and Q_ji can be replaced by their equivalents: the pair (Q_ij, Q_ji) = (0, 1) is equivalent to (1, 0), and (0, 0) to (1, 1). There are k(k − 1)/2 such pairs in a k × k matrix Q and each pair has two equivalent values. Therefore, for every Q: |f_Q⁻¹(Q)| = 2^{k(k−1)/2}. There is a one-to-one correspondence between vectors c and linear forms L, so the map f_c is bijective, and |f_c⁻¹(L)| = 1. Any nondegenerate n × k matrix R defines a basis of some k-dimensional vector subspace V_k of the vector space V_n and vice versa. The number of different bases of a given subspace V_k depends solely on the dimension k, not on the contents of V_k. On the other hand, the number of bases is equal to |f_R⁻¹(V_k)|, so condition (A9) holds for the map f_R.
It also follows from the above reasoning that two affine subspaces coincide iff t₂ = t₁ + y, where y ∈ V_k. Therefore, for each t₁ ∈ F_2^n there exist 2^k vectors t₂ that result in the same affine subspace. So the cardinality |f_t⁻¹(A_k)| = 2^k is the same for all A_k, and condition (A9) is satisfied.
We have shown that affine subspaces A k , viewed as a shift of the fixed vector subspace V k by a random vector t, are uniformly distributed. Because, as we proved earlier, vector subspaces are also sampled uniformly, the set of all affine subspaces is sampled uniformly.
By pointing out that all necessary structures, namely, Q, L, and A k , are uniformly distributed, we complete the proof.
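A direct transcription of Theorem 2 into code might look as follows. This is a sketch: the phase convention follows our reading of (A4), the rank-k matrix R is rejection-sampled, and the seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)

def rank_f2(M):
    """Rank of a binary matrix over F_2 via Gaussian elimination."""
    M = M.copy() % 2
    r = 0
    for c in range(M.shape[1]):
        pivots = np.nonzero(M[r:, c])[0]
        if pivots.size == 0:
            continue
        M[[r, r + pivots[0]]] = M[[r + pivots[0], r]]   # bring pivot row up
        for i in range(M.shape[0]):
            if i != r and M[i, c]:
                M[i] = (M[i] + M[r]) % 2                # eliminate column c
        r += 1
        if r == M.shape[0]:
            break
    return r

def sample_stabilizer_state(n, k):
    """Theorem 2: sample an n-qubit stabilizer state with 2^k nonzero amplitudes."""
    Q = rng.integers(0, 2, size=(k, k))
    c = rng.integers(0, 2, size=k)
    t = rng.integers(0, 2, size=n)
    while True:                                   # rejection-sample a rank-k R
        R = rng.integers(0, 2, size=(n, k))
        if rank_f2(R) == k:
            break
    psi = np.zeros(2 ** n, complex)
    powers = 1 << np.arange(n)[::-1]              # bit weights of basis label |b>
    for idx in range(2 ** k):
        x = (idx >> np.arange(k)) & 1
        pos = (R @ x + t) % 2                     # support point R x + t
        psi[int(pos @ powers)] = (-1) ** int(x @ Q @ x % 2) * 1j ** int(c @ x % 2)
    return psi / np.sqrt(2 ** k)

psi = sample_stabilizer_state(3, 2)
```

By construction the state has exactly 2^k nonzero amplitudes of equal magnitude with phases ±1, ±i, in agreement with the discussion after (A3); since rank R = k, the support points Rx + t are all distinct.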
Theorem 2 does not tell us how to sample the matrices R. In our implementation we use the simplest possible method: first, fill an n × k matrix R with random bits, where prob(0) = prob(1) = 1/2, and compute the matrix rank over F_2. If rank R = k, then stop; otherwise, repeat the procedure. We have taken a routine for matrix rank calculation that requires O(n²k) operations.

Theorem 3. The cardinality C(n, k) of the set S_k is

C(n, k) = 2^{n + k(k+1)/2} [n k]_2,  (A10)

where [n k]_2 = ∏_{i=1}^{k} (2^{n−i+1} − 1)/(2^i − 1) is a 2-binomial (Gaussian) coefficient.
Proof. As in the proof of Theorem 2, we can divide the evaluation of C(n, k) into counting all distinct quadratic forms Q (A5), linear forms L (A6), and affine subspaces A_k (A8), and multiplying the results. The whole set of matrices Q ∈ F_2^{k×k} has cardinality 2^{k²}. But it is divided into groups of 2^{k(k−1)/2} matrices, where each group corresponds to the same quadratic form Q. Therefore, the total number of distinct forms is given by the ratio of these quantities and is equal to C_quad. = 2^{k(k+1)/2}.
There are C_lin. = 2^k different possible linear forms over F_2^k. The total number of k-dimensional vector subspaces V_k of an n-dimensional vector space over F_2 is equal to the 2-binomial coefficient [n k]_2 [37]. Each of these subspaces can be shifted in 2^k ways (by adding a vector t ∈ V_k, see the proof of Theorem 2) resulting in the same affine subspace A_k. This gives 2^n/2^k different cosets V_k + t. Therefore, the total number of affine subspaces A_k is equal to the product C_aff. = 2^{n−k} [n k]_2. Multiplying the numbers C_quad., C_lin., and C_aff., we obtain expression (A10).
Using the q-binomial theorem [37], it is easy to check that, indeed, Σ_{k=0}^{n} C(n, k) = C(n) [see Eq. (A1)].

Theorem 4. Choose an integer k randomly with probability prob(k) = C(n, k)/C(n), 0 ≤ k ≤ n, and generate |ψ⟩ ∈ S_k according to Theorem 2. Then |ψ⟩ is uniformly sampled from S.
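The consistency check Σ_k C(n, k) = C(n) can also be carried out numerically; integer arithmetic keeps the comparison exact (a sketch; the range of n is arbitrary):

```python
def q_binom(n, k, q=2):
    """Gaussian binomial coefficient [n k]_q: number of k-dim subspaces of F_q^n."""
    num, den = 1, 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def C_nk(n, k):
    """C(n, k) = 2^(n + k(k+1)/2) * [n k]_2, Eq. (A10)."""
    return 2 ** (n + k * (k + 1) // 2) * q_binom(n, k)

def C_total(n):
    """C(n) = 2^n * prod_{k=1}^{n} (2^k + 1), Eq. (A1)."""
    c = 2 ** n
    for k in range(1, n + 1):
        c *= 2 ** k + 1
    return c

for n in range(1, 6):
    assert sum(C_nk(n, k) for k in range(n + 1)) == C_total(n)
```

The same functions give the sampling probabilities prob(k) = C(n, k)/C(n) used in Theorem 4.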
Proof. The index k determines the set S_k to sample from; hence, prob(ψ ∈ S_k) = prob(k). The conditional probability of sampling |ψ⟩ from S_k is prob(ψ | ψ ∈ S_k) = 1/C(n, k), because the algorithm from Theorem 2 produces uniformly distributed states. Then the overall probability of obtaining a state ψ ∈ S is equal to

prob(ψ) = prob(ψ | ψ ∈ S_k) prob(ψ ∈ S_k) = 1/C(n).  (A12)

Every state is produced with the same probability; therefore, the sampling is uniform.
The overall complexity of the procedure from Theorem 4 is O(2^n poly(n)), because there are 2^n elements in |ψ⟩ (A4), and each element evaluation requires no more than poly(n) operations. Actually, in our implementation poly(n) = O(n³), since the complexity is dominated by the calculation of rank R.
We provide the Python code for the explicit sampling of random stabilizer states, which is available at GitHub [38].

Appendix B: Median of means estimator
Median-of-means estimation [25] is an enhancement over the empirical mean estimator that is robust against outlier corruption. Consider N samples x₁, …, x_N of a random variable x. The empirical mean x̄ is defined as

x̄ = (1/N) Σ_{i=1}^{N} x_i.  (B1)

According to the Chebyshev inequality, x̄ deviates from the expectation E x by more than ε with probability at most δ:

prob(|x̄ − E x| ≥ ε) ≤ Var x / (N ε²) ≡ δ,  (B2)

where Var x denotes the variance of x. Therefore, the sampling complexity for the mean estimator is

N = Var x / (δ ε²).  (B3)

To calculate the median-of-means estimator x̄_MM one first splits the N samples into K batches, each containing N/K representatives, and evaluates the empirical means x̄_k over the batch number k. Then x̄_MM is defined as follows:

x̄_MM = median(x̄₁, …, x̄_K).  (B4)

This estimator is substantially more robust, since for the choice K = log 1/δ the following inequality holds:

prob(|x̄_MM − E x| ≥ ε) ≤ δ.  (B5)

The sample complexity for the median of means is

N = (4 Var x / ε²) log 1/δ.  (B6)
Note the appearance of log 1/δ instead of 1/δ as compared to (B3). It turns out, however, that in the asymptotic limit N → ∞ the central limit theorem (CLT) holds, and the empirical mean x̄ is distributed normally: x̄ ∼ N(E x, Var x / N). One can calculate the probability of deviation:

prob(|x̄ − E x| ≥ ε) = 1 − erf(ε √(N / (2 Var x))) ≡ δ,  (B7)

where erf(y) is the Gauss error function, which satisfies the inequality 1 − erf(y) ≤ exp(−y²) for y ≥ 0. By expressing N, we obtain:

N ≤ (2 Var x / ε²) log 1/δ.  (B8)

In this case the empirical mean is also robust, because it contains the logarithmic dependence log 1/δ similar to the one for the median-of-means estimator. If the classical shadow ρ̂ (2) is substituted into (3), the estimator ô takes the form of a linear combination of the random variables f_i. Each frequency f_i can be viewed as an empirical mean of roughly N/P single-shot measurements. Since in our experiment we worked in the overexposure regime N/P → ∞, the CLT conditions are satisfied with high accuracy, and the above reasoning about the equal performance of the mean and median-of-means estimators becomes valid.
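The gap between the Chebyshev-based and CLT-based sample complexities is easy to quantify; e.g., for Var x = 1, ε = 0.1, δ = 0.01 (illustrative numbers, not taken from the experiment):

```python
import math

var, eps, delta = 1.0, 0.1, 0.01

n_chebyshev = var / (delta * eps ** 2)            # Chebyshev: N = Var/(delta eps^2)
n_clt = 2 * var / eps ** 2 * math.log(1 / delta)  # CLT: N = (2 Var/eps^2) log(1/delta)

print(round(n_chebyshev), round(n_clt))           # 10000 vs 921
```

The logarithmic dependence on 1/δ reduces the required sample size by an order of magnitude here, which is why the plain mean already performs on par with the median of means in the overexposure regime.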

Appendix C: Compensation of the Gouy phase
The prepared state |ψ_prep.⟩ and the detected one |ψ_det.⟩ are tied by a unitary transformation U(ϕ_G) with one unknown parameter, the Gouy phase ϕ_G: |ψ_det.⟩ = U(ϕ_G)|ψ_prep.⟩, where U(ϕ_G) is a diagonal matrix containing entries of unit magnitude only. The compensated fidelity of preparation F, or preparation fidelity for short, is determined by maximizing the fidelity to |ψ_det.⟩ over ϕ_G:

F = max_{ϕ_G} |⟨ψ_det.|U(ϕ_G)|ψ_prep.⟩|².  (C1)

The state U(ϕ_G^max)|ψ_prep.⟩, where ϕ_G^max maximizes (C1), is the compensated prepared state.
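The one-parameter maximization (C1) can be sketched as a simple grid search. Below is a toy four-mode example with made-up amplitudes and mode orders, in which the detected state is the prepared one rotated by ϕ_G = 1 rad by hand:

```python
import numpy as np

def gouy_unitary(orders, phi):
    """Diagonal Gouy-phase action |HG_nm> -> exp(i (n + m + 1) phi) |HG_nm>."""
    return np.exp(1j * (np.asarray(orders) + 1) * phi)

def compensated_fidelity(psi_prep, psi_det, orders, n_grid=2000):
    """Maximize |<psi_det| U(phi) |psi_prep>|^2 over a grid of phi, cf. Eq. (C1)."""
    phis = np.linspace(0, 2 * np.pi, n_grid)
    fids = [abs(np.vdot(psi_det, gouy_unitary(orders, p) * psi_prep)) ** 2
            for p in phis]
    best = int(np.argmax(fids))
    return fids[best], phis[best]

orders = [0, 1, 1, 2]                            # n + m for HG00, HG01, HG10, HG20
psi_prep = np.ones(4, complex) / 2               # made-up prepared state
psi_det = gouy_unitary(orders, 1.0) * psi_prep   # detected state, phi_G = 1 rad

F, phi_best = compensated_fidelity(psi_prep, psi_det, orders)
```

The grid search recovers the injected phase and a near-unity fidelity; in the experiment the same one-dimensional optimization is performed on the measured states.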
A typical dependence of the quantity under maximization in (C1) on ϕ_G is demonstrated in Fig. 6 for the system dimensionality D = 32. There is a low-valued ripple with local extrema (especially pronounced for higher D), but for all tested states and dimensions a global maximum with a near-unity value is observed around ϕ_G ≈ 1 radian.