Shadow tomography from emergent state designs in analog quantum simulators

We introduce a method that allows one to infer many properties of a quantum state -- including nonlinear functions such as R\'enyi entropies -- using only global control over the constituent degrees of freedom. In this protocol, the state of interest is first entangled with a set of ancillas under a fixed global unitary, before projective measurements are made. We show that when the unitary is sufficiently entangling, a universal relationship between the statistics of the measurement outcomes and properties of the state emerges, which can be connected to the recently discovered phenomenon of emergent quantum state designs in chaotic systems. Thanks to this relationship, arbitrary observables can be reconstructed using the same number of experimental repetitions that would be required in classical shadow tomography [Huang et al., Nat. Phys. 16, 1050 (2020)]. Unlike previous approaches to shadow tomography, our protocol can be implemented using only global operations, as opposed to qubit-selective logic gates, which makes it particularly well-suited to analog quantum simulators, including ultracold atoms in optical lattices and arrays of Rydberg atoms.

Introduction.-Theability to control interactions in a many-body quantum system allows one to simulate and study other complex quantum systems of interest [1,2].In a universal quantum computer, where logical gates can be selectively applied to a few qubits at a time, one can in principle mimic the dynamics of any Hamiltonian [3]; however at present such devices are limited by their size and noisiness [4].In contrast, analog quantum simulators-such as ultracold atoms in optical lattices [5,6] and arrays of Rydberg atoms [7][8][9][10]-typically possess global rather than site-specific control, and as such are more tailored to synthesizing specific classes of Hamiltonian.Despite their limitations in terms of programmability, such platforms are often more scalable and less noisy than computationally universal devices, and have already been used to shed light on a wide variety of many-body quantum phenomena [11][12][13][14][15][16][17][18][19][20].
In any such experiment, a key task is to infer the properties of some many-body state once it has been prepared.In computationally universal devices, a particularly powerful technique known as shadow tomography can be employed for this purpose [21][22][23], wherein random unitary rotations are applied before projective measurements of each qubit are made (see also [24][25][26]).Using this scheme, many properties of the state can be simultaneously estimated using a single set of experimental data, and nonlinear properties such as Rényi entropies can also be accessed.However, measurement strategies of this kind currently involve the application of spatially inhomogeneous sequences of site-selective gates.While these operations are natural in digital quantum computation, they are not available in analog quantum simulators, wherein all degrees of freedom evolve simultaneously under some global uniform Hamiltonian.Accordingly, the set of observables that can be directly accessed therein (efficiently or otherwise) is at present much more limited.
In this paper, we bridge this gap by introducing a new protocol that allows one to simultaneously infer many properties of a state (including Rényi entropies, etc.) without needing to address each degree of freedom individually.Rather than applying inhomogeneous unitaries drawn randomly and compiled from few-qubit gates, we propose to apply some fixed deterministic global unitary U to the system together with a set of ancillas, followed by measurements in the computational basis [see Fig. 1(a)].The unitary need not be fine-tuned, and so can be native to the system in question, making our protocol particularly well-suited to analog quantum simulators.Importantly, our scheme offers the same performance guarantees as classical shadow tomography [23], meaning that the number of measurements needed to estimate a wide range of expectation values does not grow with system size.
We show that for generic choices of U , a universal relationship between properties of the target state and the distribution of measurement outcomes emerges.Specifically, the procedure becomes equivalent to making measurements of the state in bases drawn randomly from the Haar ensemble.This equivalence is made precise later through our introduction of a construction called the tomographic ensemble: a probability distribution of wavefunctions that describes the overall measurement process [Eqs.(1,2)].For sufficiently scrambling U , integer moments of this ensemble agree closely with the Haar ensemble, i.e. an approximate quantum state design (QSD) is formed [27,28].Consequently, properties of the system density matrix can be reconstructed through appropriate post-processing of the measurement outcomes.This can be achieved with moderate resources, while allowing low errors in observables' estimates (≲ 1%).
The emergence of QSDs from a single global unitary (as opposed to random sequences of local gates [29]) can be related to the recently introduced concept of 'deep thermalization', where QSDs appear in the projected ensemble of many-body quantum states [30][31][32][33][34][35].By adapting analytical arguments developed in that context, we rigorously establish the existence of QSDs for particular representative cases.We supplement this with numerical evidence for generic choices of U , which allows us to benchmark the full tomography procedure, and understand the effect of symmetries.
Protocol.-Ouraim is to measure properties of some state of interest ρ S , which is prepared at the beginning of each run of the experiment in some register S. For concreteness, we consider systems of qubits, although similar considerations apply to more general setups.We assume that projective measurements of all qubits can be made in some computational basis {|m⟩}, which without loss of generality we take to be Z i -diagonal, where i labels the qubits and (X i , Y i , Z i ) are Pauli operators.
Projective measurements in the fixed basis {|m⟩} give us access to expectation values of diagonal observables, e.g.⟨Z i Z j ⟩.To learn off-diagonal observables, one can apply an appropriate unitary to the system qubits before measurement.For instance, if we rotate every qubit by e −iπ i Yi/4 , then observables such as ⟨X i X j ⟩ can be learned.However, in analog quantum simulators, where we have only global control, observables such as ⟨X i Y j ⟩ cannot be measured in this way, since different unitaries would have to be applied to qubits i and j separatelyan operation which we assume to be unavailable.(See also Refs.[36,37] for a discussion.) To overcome this limitation, we propose a protocol that employs a set of ancilla qubits A initialized in some predetermined state, which for convenience we assume to be a pure product state |0 ⊗N A ⟩ (this assumption is not strictly necessary).The system and ancilla qubits are jointly evolved using some fixed global unitary U , which is generated by a (possibly time-dependent) Hamiltonian that can be readily simulated on the platform in question.We refer to all such unitaries as native.Finally, all qubits are measured in the computational basis {|m⟩}.This is repeated M times, resulting in a collection of M bitstrings m Our claim is that if a native unitary U is sufficiently entangling (in a sense soon to be made precise), then any observable can be inferred from the distribution of measurement outcomes m, and-crucially-that the number of experimental repetitions M and amount of classical computation required to estimate most observables of interest can be bounded, in the same spirit as classical shadow tomography [23].Remarkably, this is possible using just a single, fixed choice of U each run (although later we will show that quantitative performance improvements can be obtained by sampling U from an ensemble of native unitaries in each run).The above claim can be more precisely specified using the formalism of positive operator-valued measures (POVMs).Most generally, any measurement scheme on a state ρ S whose possible outcomes are indexed by m can be captured by a set of positive Hermitian operators F m ≥ 0, known as a POVM, chosen such that the probability of obtaining outcome The constraint that all probabilities should sum to unity implies m F m = I S .For our protocol, the POVM operators are given by ( We have assumed that the initial state of the ancilla is pure and the evolution is unitary.Therefore, F m are proportional to rank-1 projectors where q m = P(m|I S /d S ) = Tr[F m ]/d S are the outcome probabilities for the maximally mixed state I S /d S , |ϕ m ⟩ are normalized wavefunctions, and d S = 2 N S .Since q m ≥ 0 and m q m = 1, formally we can define a probability distribution over pure states on S, where the normalized wavefunction |ϕ m ⟩ occurs with probability q m .We refer to this distribution, which contains complete information about the POVM, as the tomographic ensemble.
We argue that for generic choices of U generated by local interactions without conservation laws, the tomographic ensemble exhibits a useful universal property, namely that it forms an approximate QSD [27,28].This means that for small enough integers k, the kth moments of the ensemble agree with the kth moments of the Haar ensemble E If the dynamics U respects some symmetry, then E (k) will instead tend towards an alternative ensemble, where within each symmetry charge sector a k-design is formed; we discuss this case in the supplement [38].
We first provide evidence justifying the above claim, and then describe how this property can be leveraged to perform shadow tomography of target states ρ S .
Emergent quantum state designs.-Theformation of QSDs in the tomographic ensemble is reminiscent of the concept of deep thermalization.In the latter, a bipartite wavefunction |Ψ SA ⟩ is prepared by applying a unitary U to a product state, the qubits on A are measured projectively, therefore producing an ensemble of states on S. Deep thermalization is achieved if this ensemble reproduces the Haar ensemble up to the kth moment for some k > 1.While deep thermalization and QSDs in the tomographic ensemble are distinct concept, they bear many similarities.This connection is particularly fruitful since there are examples [30,32,33] where the emergence of deep thermalization can be rigorously established.We have adapted these proofs to show that the tomographic ensemble forms an (approximate) QSD when U is drawn from the Haar ensemble, or is a dualunitary circuit evolved for a time t ≥ N S [38].
These two cases are illustrative, albeit contrived, examples where rigorous results that support our claim can be obtained.For more practical purposes, we wish to illustrate that the same occurs for generic unitaries that arise in analog quantum simulators, and for this purpose we must turn to numerical simulations.As figure of merit, following Ref.[30], we employ the trace distance ∆ (k) := 1  2 Haar ∥ 1 , which quantifies how far the tomographic ensemble is from being a k-design is the trace norm).We study dynamics generated by Hamiltonians of the form H(t) = j X j X j+1 + h x (t)X j + h y (t)Y j +h z (t)Z j , which approximates the native dynamics of Rydberg atom quantum simulators [17]; here X j , Y j , Z j are Pauli matrices for qubit j.In certain parameter regimes, this model is known to give rise to fast scrambling of information [39][40][41][42].Furthermore, when the fields h x,y,z (t) are timedependent, there are no conserved quantities (including energy density), and we find that this encourages a rapid approach to k-design.In particular, we find that Floquet evolution works well, with h x = 0.8, h z = 0, and h y (t) toggling periodically between 0.9 for t ∈ [n, n + 0.5) and 1.8 for t ∈ [n − 0.5, n), with n ∈ Z.In the following, the system qubits are located at the centre of a chain with open boundary conditions.
The behaviour of the trace distance for k = 2 as a function of time is shown in Fig. 2  ing |ϕ m ⟩ with 2 N independently sampled Haar-random wavefunctions, indicating that the states making up the tomographic ensemble are effectively quasirandom.Accordingly, the plateau trace distance scales as ∼ 1/ √ 2 N .This behaviour is qualitatively similar behaviour to that seen in the projected ensemble of wavefunctions generated from non-energy-conserving dynamics [34].
Extracting properties of the state.-Havingestablished that the POVMs generated from our protocol generically form QSDs, we now describe how this property can be leveraged to efficiently learn properties of ρ S .While 2designs are known to be optimal for full reconstruction of the system density matrix [43] or process tomography [44,45], here we describe an explicitly shadow tomographic scheme for extracting information about ρ S , which in comparison keeps the sample complexity and classical computational cost bounded [21][22][23].
For a fixed unitary U , the distribution of measurement outcomes p m depends on the state ρ S through the POVM operators (1).It will be useful to treat operators on S as vectors over a d 2 S -dimensional space, denoted using double angled brackets |O⟫ and equipped with the inner product Similarly, the outcome distribution can be written as a 2 N -dimensional vector |p) = m p m |m) where |m) is an orthonormal basis for R 2 N , i.e. (m|m ′ ) = δ m,m ′ .One can then define a completely positive linear map, which we call the POVM channel The observed experimental outcomes { m(r) } (r = 1, . . ., M ) are evidently distributed according to the probability vector |p) = F|ρ S ⟫.
The inverse problem of learning properties of ρ S from experimental data { m(r) } can be solved by finding a left inverse G satisfying GF = id.This allows us to construct an unbiased estimator θO for any expectation value ⟨O⟩ = Tr[Oρ S ] according to θO = M −1 M r=1 ⟪O|G| m(r) ) In the spirit of shadow tomography [21][22][23], this estimator can be computed without needing to reconstruct the full density matrix ρ S , which would be sample-inefficient.Such an inverse G only exists when F has full row rank: a condition known as informational completeness, which is guaranteed when the tomographic ensemble forms a 2-design [38].
While G is non-unique in general, to minimize sample complexity we choose the inverse that minimizes the where we defined the normalized channel F = m |m)⟪ Fm |.The map M is a superoperator mapping the space of operators on S to itself.It has full rank whenever F is informationally complete, and therefore has a unique inverse.
At this point, recalling that the POVM operators (1) are rank-1 projectors, we notice that the superoperator M is equivalent to the second moment of the tomographic ensemble E (2) , Eq. ( 2) [38].Now, having established that QSDs generically appear in our protocol, we can replace M with its universal 2-design form M = (id + |I⟫⟪I|)/(d S + 1), which has an inverse (5) By using the fact that a 2-design is formed, we circumvent having to explicitly compute M −1 , which keeps the classical computational cost bounded. Using | Fm ⟫, we can express the variance of the estimator (4) as The factor in rounded brackets we identify as the third moment, E (3) in Eq. ( 2).Therefore, if the tomographic ensemble forms a 3-design, as we expect for generic unitaries U , then the variance (6) will be the same as for any other POVM for which the F m form a 3-design.One such POVM arises in classical shadow tomography with random global Clifford unitaries [38].Therefore we can conclude that our scheme can be used to estimate expectation values of ρ S using the same number of repetitions M as one would need when doing ordinary classical shadow tomography.The dependence of the variance on the observable in question is well-characterized in Ref. [23]: observables with bounded spectral norm ∥O∥ ∞ = max eigO † O can be efficiently estimated for any system size N S .The procedure can be generalised in the same way as classical shadow tomography to estimate nonlinear observables, e.g.Rényi entropies [38].
To summarise, we have shown that the inverse map (4) can be used to construct estimators of expectation values, and that the map M −1 can be replaced by its universal form (5) when the tomographic ensemble forms an approximate 2-design.The deviation from the 2-design will govern the systematic error, since |E m θO − ⟨O⟩ | ≤ (d S +1)∆ (2) ∥O∥ ∞ , where ∆ (2) is the trace distance, while the k = 3 moments E (3) determine the variance via (6).It is evidently favourable to have the tomographic ensemble as close to a 2-and 3-design as possible, which occurs for generic chaotic evolution as we saw above.
Benchmarking the protocol.-Wenow provide numerical simulations of our full protocol, including the joint evolution of the system and ancillas, the sampling of measurement outcomes, and the reconstruction of observables.We test our measurement scheme on a family of two-qubit target states ρ S (α The coherence parameter α ∈ [0, 1] allows us to interpolate between fully dephased (α = 0) and pure (α = 1) EPR pairs.For the purpose of demonstration, the observables we choose to reconstruct are the fidelity with the EPR state Tr[ρ S |EPR⟩ ⟨EPR|] and the purity Tr[ρ 2  S ].In one set of simulations, we generate U from Floquet evolution using the tilted-field Ising model as a generating Hamiltonian, as before.In a second set, we also add some randomness to U -that is, for each repetition r we generate a distinct U (r) by selecting random magnetic fields.Then, U (r) is used in the joint system-ancilla evolution, and in the construction of estimators.This helps to bring the tomographic ensemble closer to a 2-design, therefore further reducing systematic errors [38].To construct random unitaries U (r) , for each time interval of length τ = 1, we sample each field component h x,y,z independently from a normal distribution with zero mean and standard deviation √ 2. In Fig. 3, we plot estimations of the fidelity and purity for various different α and tomography schemes, using M = 5×10 3 repetitions each and evolving for a total time t = 10.We see closer agreement with the true fidelity as N A is increased, and when randomness is introduced.
Classical computations.-As in classical shadow tomography, the estimation of expectation values from experimental data requires a certain amount of classical post-processing, the complexity of which we wish to bound.Specifically, when an outcome m is observed we must evaluate ⟪O|G 0 |m), which requires computation of the backwards time evolution U † |m⟩.
When the number of system qubits N S is O(1), the evolution time required to obtain an approximate QSD is S ] (bottom panels) using the deterministic protocol (Det; fixed U ), and the semi-randomized protocol (Rand), see main text.In both cases the total evolution time is t = 10.The target state is the EPR pair state |EPR⟩ after dephasing with strength 1 − α.M = 5 × 10 3 repetitions are used for all data points.Errors relative to the true data are shown, with data artificially shifted horizontally for readability.also O(1), and hence efficient matrix product state techniques can be used even for large N A .For tomography of many-body states, the present strategy must be modified, since the time of evolution required to reach a QSD grows with N S .Instead of evolving all system qubits with a single collection of ancillas, one can instead block the system into n groups of N S /n = O(1) qubits, and evolve each block jointly with a separate collection of ancillas A j under a unitary U j , where j = 1, . . ., n.This scheme, illustrated in Fig. 1 mj is of the form (1). The tomographic ensemble for each separate block reaches an approximate 3-design in a O(1) time, allowing F m1,...,mn to be evaluated efficiently using matrix product methods as before.The tradeoff is that M −1 must be replaced by a n-fold tensor product of (5), and this will affect how the estimator variance (6) depends on the observable O.By analogy to shadow tomography with random local Pauli measurements [23], observables with support on a small number of blocks will still be accessible using a reasonable number of repetitions M , regardless of how big S is; we prove bounds on the variance in the supplement that confirm this [38].
Note that one could in principle compute the map M −1 without using the universal 2-design form (5), which would eliminate any systematic error in estima-tion.However, this is only feasible for a small number of ancillas N A , since 2 N S +N A separate terms must be summed to construct M.
Note added.-Duringcompletion of this work we became aware of a complementary study, to appear in the same arXiv posting, where a similar measurement scheme is presented [37].The protocol introduced in that work follows the same steps as ours, where the state is first entangled with ancillas, before measurements in the computational basis are made, data from which are postprocessed classically to infer properties of the state.In contrast to our proposal, no assumption is made about the formation of a QSD; instead the inverse map M −1 needs to be explicitly computed.
Supplemental Material for "Shadow tomography from emergent state designs in analog quantum simulators"

PROOFS OF QUANTUM STATE DESIGNS IN THE TOMOGRAPHIC ENSEMBLE
As mentioned in the main text, the construction of the tomographic ensemble [Eq.( 1)] resembles that of the projected ensemble for a many-body state |Ψ SA ⟩ = U |0⟩ using the same unitary U , as can be seen in Fig. S1.Comparing the two cases, we see that in the region A, all qubits begin in the same initial state and are measured at the end of the process.The two protocols therefore differ only in the inputs and outputs of U in the region S. (One could in principle also consider scenarios where measurement events occur throughout the dynamics, as in the study of 'retrodiction' in noisy quantum dynamics [46]; however we leave this possibility for future work.) In this section, we make use of this resemblance to establish the existence of (approximate) quantum state designs in the tomographic ensemble, based on arguments that prove the same for the projected ensemble.We consider two cases: 1) where U is a single unitary drawn at random from the Haar ensemble over U(2 N ), and 2) where U is a dual-unitary circuit containing t ≥ N S timesteps.

Haar-random unitary
The statement we wish to prove is as follows Theorem 1 For a unitary U chosen at random from the Haar ensemble over U(2 N ), the tomographic ensemble forms an ϵ-approximate k-design with probability 1 − δ if where Ω( • ) contains a constant multiplicative prefactor, as well as constant offset terms.
The above implies that approximate quantum state designs are realised with overwhelmingly high probability in the limit of large system sizes (tending to unity double-exponentially fast in N A ), provided the number of ancillas scales quickly enough with the size of the system (at least as fast as kN S ).
Proof of Theorem 1.-Our proof uses many of the same analytical tools as the proof of the analogous theorem for the projected ensemble given in Ref. [30].First, one shows that the average of E (k) over all choices of U matches the moments of the Haar distribution.Then, concentration of measure results can be used to bound the fluctuations of E (k) away from its average.This can be used to upper bound the probability that the trace distance Haar ∥ 1 exceeds an allowed tolerance ϵ.The main difference between the two proofs will be the derivation of an upper bound of the variation of E (k) considered as a function of U .
The calculation of the averaged moments E U E (k) can be achieved by first proving that the probabilities √ q m are independent random variables.To see this, note that the joint probability density for a pair (q m , |ϕ m ⟩) satisfies dP(q m , |ϕ m ⟩) = dP(q m , U S |ϕ m ⟩) by virtue of the fact that for any U realizing a given pair (q m , |ϕ m ⟩), there exists a unitary U (U † S ⊗ I A ) occurring with the same probability (thanks to the invariance of the Haar measure), which realizes the pair (q m , U S |ϕ m ⟩).This implies that the conditional probability density dP(|ϕ m ⟩ |q m ) is invariant under unitary rotations U S , and hence must be equal to the Haar measure for any q m .One can therefore separate averages of q m and |ϕ m ⟩, and using the definition of the moments (2) we have Haar . (S2) We now seek to upper bound the probability of finding the kth moment a distance at least ϵ away from its mean.This can be done using concentration of measure results, which describe the general phenomenon where probability SA is an extension of the product state |0⟩ A that also encompasses the system qubits.Note that the two protocols only differ in how the system qubits are prepared, and whether they are read out at the end.
distributions over high-dimensional manifolds become approximately uniform (see, e.g.Ref. [47]).Whereas the relevant quantity in Ref. [30] was a functional of a Haar-random state, here the moments of the tomographic ensemble are functionals of a Haar-random unitary.The particular lemma that we will need is therefore slightly different; it is stated as Lemma 3.2 in Ref. [48]: where ∥C∥ 2 := Tr[C † C] is the Frobenius norm, the probability that f deviates from its mean E U f by at least ϵ can be upper bounded as The constant η appearing in Eq. ( S3) is referred to as the Lipschitz constant of f .We will apply Lemma 1 to the scalar functional where ⟩ is a state in the k-fold replicated space, and each i (l) runs over a basis for the Hilbert space of S. First we need to compute the Lipschitz constant of f ij .This will proceed somewhat differently to the arguments of Ref. [30].
To bound the left hand side of (S3), we define a parametrization of matrices U (t) = (1 − t)U 1 + tU 2 , which lies within the convex hull of unitary matrices for t ∈ [0, 1].We then have where U = dU/dt, and the maximum is taken over all U in the convex hull of U(d).In the last step, we have used the  1)], the norm of the matrix derivative can be evaluted where we have defined Now recalling the definition Fm = F m / Tr[F m ], we can express the squared norms ∥X∥ 2 2 = Tr[X † X] in terms of matrix elements of POVM operators where |il⟩ is a tensor product of all |i (l ′ ) ⟩ for l ′ ̸ = l.Here we have invoked the representation of E (2k−1) in terms of the POVM operators; see the right hand side of Eq. ( 2).It is important to remember that U here can be any matrix in the convex hull of U(d), i.e.U = i λ i V i with λ i ≥ 0, V † i V i = I, and i λ i = 1.This set is equal to the space of We can still use the form , where |ϕ m ⟩ are normalized wavefunctions, and We will make use of the following matrix inequality which follows straightforwardly from the fact that ⟨ϕ|E (1)  1) |b⟩ for any |a⟩ , |b⟩ in the replicated Hilbert space.After applying this to the summand in (S10), we then use where in the last step we have used Eq.(S11).This implies that the summand of (S11) have modulus at most 1, and so summing over l, p we finally obtain ∥X∥ 2 2 ≤ k 2 .Similarly, we can express ∥Y ∥ 2 2 in terms of the moments E (2k) : Following a similar line of reasoning as before, we have ⟨i ⊗ j|E Putting everything together, we get The rest of the proof follows the same logic as Ref. [30].Defining ∆ Haar , we have Therefore the probability that the trace distance exceeds some value ϵ can be upper bounded FIG. S2.Summary of graphical notation used here, following the same conventions as Ref. [33].(a) A two-qudit unitary and its complex conjugate are represented as blue and red tensors, respectively.(b) Representation of the unitary condition.(c) Representation of the dual-unitary condition.(d,e) When l-fold replicas are constructed, as in Eq. (S18), thick lines are used to represent the l forward indices ai and l reverse indices a ′ i .Time evolution in the replicated space can be expressed using multiple copies of each unitary and its complex conjugate.(f) In Eq. (S18), n copies of the unnormalized POVM operator Fm are traced over, while k copies are left untouched; this operation is denoted using a semicircle.(g) The permutation tensor πσ pairs each forward index ai with the reverse index a ′ σ(i) associated via the permutation σ ∈ Σ l , where Σ l is the group of permutations of l elements.
Employing Lemma 1 and taking a union bound, we have For qubits we have d = 2 N S +N A and d S = 2 N S , and so the tomographic ensemble forms an ϵ-approximate k-design with probability 1 − δ whenever δ is less than the right hand side of the above, giving This is achieved whenever N A scales according to the relation quoted in Eq. (S1).□

Dual-unitary circuit
The second case where the existence of k-designs can be rigorously proven is when the unitary U is a brickwork circuit made up of two-site gates that are each dual-unitary.In this section, we will allow for arbitrary local Hilbert space dimension q ≥ 2, i.e. we consider systems of N qudits; accordingly, measurement outcomes m can take q distinct values.
A unitary gate acting on two qudits can be viewed as a rank-4 tensor U o1o2 i1i2 , with two indices i 1,2 for the initial state of the qudits and two indices o 1,2 for the corresponding outputs.The gate is dual-unitary if the components of this tensor also describe a unitary matrix when viewed as a map from inputs i 1 o 1 to outputs i 2 o 2 .This is illustrated in Fig. S2(a).The space of dual-unitary gates acting on qubits (q = 2) was classified in Ref. [49].Brickwork circuits made up of dual-unitary gates describe a form of many-body quantum dynamics wherein many properties can be calculated exactly, such as two-point correlation functions, entanglement entropies, and out-of-time-order correlators [50][51][52].
Recently it has been shown that under certain conditions, a many-body wavefunction |Ψ SA ⟩ = U |0⟩ generated by time evolution under a dual-unitary circuit U realises an exact k-design in its projected ensemble, in the limit of an infinite number of ancilla qudits N A → ∞ [32][33][34].In order for this result to hold, the dual-unitary must not be finetuned to an integrable point, and the initial state |0⟩ and measurement basis {|m⟩} must form a solvable measurement scheme (using the terminology of [33]), meaning that certain criteria that depend on the circuit in question must be met.We will leverage results that were proved in this context to show that the tomographic ensemble formed from such a unitary will also form exact k designs in the limit N A → ∞.
By analogy to the moments of the projected ensemble [33], here the moments of the tomographic ensemble can be studied analytically using a replica trick.We generalize the definition of the ensemble moments (2) to where F m are the unnormalized POVM operators defined in Eq. ( 1).This representation can be generalized to any number of layers t (t = 4 layers are shown above).The properly normalized kth moment is recovered by analytically continuing n and taking the replica limit n → 1 − k.
Each term in the sum in (S18) can be expressed in terms of an l-fold copy of the original circuit U := (U ⊗ U * ) ⊗l , where l = n + k.In all copies, the initial state of the ancillas are the same state |0 A ⟩, and the final states of all qudits are projected onto |m⟩ ⟨m|.For n of the copies, the inputs to the unitary in the region S are traced out, while for the remaining k copies, those inputs are left as free, constituting the components of the summand in (S18).We will focus on initial states and measurement bases that are product states here.The summand can be graphically represented using the notation described in Fig. S2 as We have highlighted the region corresponding to qudits in S to distinguish this part of the circuit from the part acting on ancillas A. The latter part of the diagram will simplify upon taking the limit N A → ∞.
Given that the initial state and measurement basis are product states, we will need to assume an additional property of U which ensures that the measurement scheme is solvable.This property is found in the kicked Ising model [49], as well as a family of gates introduced in Ref. [33]; we refer interested readers to that work for details.Here, we will simply state this property, and assume it in the following.For any computational basis states m 1 , m 2 , the two-site gates we consider here must satisfy When this property is obeyed, the part of the circuit (S19) that acts on ancilla qudits simplified considerably in the limit where the components of the rank-1 tensors π σ , defined for each element of the permutation group of l objects σ ∈ Σ l , are given in Fig. S2(g).The above holds for any integer k and any even number of layers t ≥ 2, provided that the circuit is not integrable.An analogous result for odd t can also be obtained, with different boundary conditions at the top.The factor of q 2(l−1)N A is required to ensure that the left hand side is finite and bounded in the limit N A → ∞.Generalized moments (S18) of the projected ensemble of the wavefunction |Ψ SA ⟩ = U |0⟩ can be computed with the help of Eq. (S21).By analytically continuing to n → 1 − k, the properly normalized moments can be obtained.
When the number of layers t is at least as large as N S , one finds that the projected ensemble forms an exact k-design [32][33][34].We will use similar arguments to show that the tomographic ensemble also forms a k-design for t ≥ N S .A key ingredient will be the following relations for any permutation σ.These follow from the unitarity and dual-unitarity of the two-qudit gates.In addition, the solvable measurement scheme condition (S20) can be rewritten in the replicated space as which will be made use of in the following.The generalized moment is obtained by summing Eq. (S19) over m and applying (S21), which leads to a significant simplification, and subsequent application of Eqs.(S22, S23) allows further reduction (a representative case t = 4, N S = 2 is shown in the following diagrams, but the arguments steps are readily generalised) Finally, by performing the trace over the n copies [see Fig. S2(f)], we find that the generalized moment is an equalweight sum of all permutation tensors over k elements E (n,k) ∝ σ∈Σ k π σ .The dependence on n is then entirely through a constant of proportionality, and so the analytic continuation n → 1 − k is readily taken.In fact, the value of this proportionality constant in the replica limit is fixed by the condition that Tr[E (k) ] = 1.Putting everything together, and noting that the sum over all permutation operators the k-fold replica space is precisely the kth moment of the Haar ensemble, we conclude that the tomographic ensemble forms an exact k-design for all k.□

INCLUDING RANDOMNESS
In the main text, we described a 'semi-randomized' version of our original protocol, where in each repetition of the experiment, U is chosen at random from some distribution of unitaries that are all native to the quantum device in question.Here we describe how this modification reduces the systematic error of the estimators, and reduces the overheads in terms of number of ancillas.
In this section, we will make reference to a scenario where the unitary is sampled from a discrete probability distribution {(p c , U c )}, where c = 1, . . ., C labels distinct unitaries U c , which occur with probability p c (note however that all results can be straightforwardly generalized to continuous probability distributions).When the unitary U itself is decided by a random process, we can write down a POVM {F m,c } such that the probability of both choosing a specific unitary U c , and obtaining the outcome m is Tr[ρ S F m,c ].Specifically, we have where F m|Uc is the POVM operator (1) with U replaced by U c .Since each {F m|Uc } m is itself a POVM, we have m,c F m,c = I as desired.Note that this formalism could be used to capture classical shadow tomography, where FIG.S3.Trace distance ∆ (2) of the tomographic ensemble where the unitary is chosen at random from a discrete probability distribution {(pc, Uc)}, where c = 1, . . ., C, and the moments formed as in Eq. (S26).We vary the number of different unitaries C, from the smallest value C = 8 (green), doubling each time to C = 16, 32, 64, . .., up to C = 256 (purple).We always choose a uniform distribution for pc = 1/C, and each unitary Uc is generated by the tilted-field Ising Hamiltonian, as discussed in the main text.Increasing C leads to better convergence towards a k-design-in particular, after a C-independent time, ∆ (2)  saturates at a value proportional to U c is sampled from the appropriate distribution of unitaries (global Clifford circuits or local Pauli rotations).The difference here is that a state design is approximately formed for each U c , whereas in shadow tomography the POVM for a single unitary is far from being a k-design for k ≥ 2, since the number of different possible measurement outcomes is not big enough (2 N S compared to the minimum number 4 N required to form a 2-design).Now, we observe that the kth moments of the full POVM F m,c [Eq.( 2)] are convex combinations of the moments for each individual POVM F m|Uc , namely where E (k) [F m|Uc ] is the kth moments of the tomographic ensemble for the POVM (1) where the unitary U c is used.
If each separate POVM {F m|Uc } m deviates from a perfect 2-design by an amount ∆ c (as quantified by the trace Haar ∥ 1 ), then by the triangle inequality E full will deviate from a state design by at most the average value of the trace distance c p c ∆ c .Roughly speaking, if the projectors that make up each F m|Uc are uncorrelated with one another, then we expect that the differences E (2) Haar will typically be in different directions in operator space, and so the errors will compound sub-additively, yielding a smaller value of the trace distance.Since the discrepancy between the k = 2 moments of the tomographic ensemble and the corresponding Haar moments governs the systematic error of the estimators, we see that adding randomness allows one to reduce any such error in our protocol.
To verify that the moments of the full ensemble E full are indeed closer to being a k-design than each separate POVM, we compute the k = 2 trace distance for a particular family of different distributions of unitaries {(p c , U c )}.In each case, we keep the probabilities uniform p c = 1/C, and vary the number of different unitaries C. Each U c is generated by the tilted-field Ising model Hamiltonian discussed in the main text, H(t) = j X j X j+1 + h x (t)X j + h y (t)Y j .The values of h x,y (t) change abruptly every τ = 0.5 time units, and each unitary U c has a different sequence of field values.Before running any simulations, we choose the actual field values for each c through independent random sampling from a normal distribution with zero mean and standard deviation √ 2, and use these field values to construct the moments (S26).In the limit C → ∞, this describes the protocol used to generate the data shown in Fig. 3.We see that increasing C does indeed reduce the trace distance, and empirically we find that the plateau value of ∆ (2)  scales as 1/ √ C. Notably, this is the behaviour that we would expect if we assumed that every wavefunction |ϕ m,c ⟩ in the ensemble was an independent randomly distributed vector.Of course, in the true C → ∞ limit, one can find two possible unitaries U c , U c ′ that are very close to one another such that this assumption of independence will fail; thus at large enough C the trace distance should saturate to a finite (but very small) value.Since constructing the tomographic ensemble for large values of C is computationally demanding, we find it easier to properly assess the performance of this randomized protocol by simulating the whole procedure, as in the data presented in Fig. 3.
Note that adding randomness does not increase the classical computational overhead because the inverse map M −1 is chosen to be the universal form (5). If, on the other hand, we were to compute M −1 exactly, we would have to compute 2 N × C individual wavefunctions, where C is the number of different unitaries in the distribution.Also, from our simulations we find that the full ensemble can approach a 2-design very closely even with a modest number of ancillas, because trace distance for each separate POVM {F m|Uc } m does not need to be particular small, provided that we can sample from a sufficiently diverse range of unitaries.

ACCOUNTING FOR SYMMETRIES
Some quantum simulators possess intrinsic symmetries that cannot be readily broken, e.g.number conservation in ultracold atomic gases.This restricts the space of unitaries U that are available, which in turn leads to constraints on the POVMs that can be realised with our protocol.Focusing on Abelian symmetry groups, in this section we will show that the moments of the tomographic ensemble tend towards different universal form that respects this symmetry: Specifically, within each symmetry charge sector an approximate state design is formed.Again this occurs provided that U is sufficiently entangling, and, in the case where the symmetry is continuous, the initial state of the ancillas must also have a small enough effective chemical potential (i.e.|0⟩ A is not close to being a maximum-or minimum-charge state).
The Hilbert space of a system that respects some Abelian symmetry can be decomposed into charge sectors q, spanned by orthogonal projectors P q , which are not coupled by symmetric unitaries: P q U P q ′ = 0 for q ̸ = q ′ .Any target density matrix ρ S that can be prepared using symmetric operators will also be constrained to have vanishing coherences between different charge sectors, i.e. [ρ S , P q ] = 0. (In the nomenclature of Ref. [53], this corresponds to a 'weak symmetry', in contrast to a state that has support in only one charge sector, which is 'strongly symmetric'.)Therefore, only charge-diagonal observables and states need be considered, since operators that couple different charge sectors have vanishing expectation values.Symbolically, we have We naturally presume that computational basis states and the ancilla initial states each have definite charge.Therefore each POVM operator F m [Eq.(1)] will lie in a particular charge sector q(m), where q(m) is the charge of the system that is required to match the total system plus ancilla charge before applying U to the final measured charge m.This constraint prevents the formation of full state designs, since superpositions of states with different charges are forbidden.Instead, we find that generic symmetry-respecting dynamics yields a POVM for which the tomographic ensemble forms an approximate state design within each charge sector-we call such a distribution a 'block-diagonal state design'.That is, we can decompose E (k) = q E (k) q , where E (k) q contains only terms in Eq. (2) for which q(m) = q, and we find that E (k) q approaches the kth moment of the Haar ensemble over span(P q ).Note that the block-diagonal Haar ensemble is the distribution that maximizes randomness subject to the constraints imposed by symmetry.
Provided that E (2) takes this universal form, expectation values of charge-diagonal operators can be estimated using (4), after replacing the expression in Eq. ( 5) with a block-diagonal superoperator where O q is the submatrix of the operator O contained within the charge sector q, as in Eq. (S27).
It is relatively straightforward to see that if the moments of the tomographic ensemble do converge to a universal form, then it must be the block-diagonal state design described above.This follows from considering the behaviour of a unitary U sampled from the maximally random distribution of charge-conserving unitaries, where each submatrix U q describing the behaviour of U within charge sector q is drawn from the Haar ensemble.By considering one block q at a time, one can use the same method as in Eq. (S2) to see that the mean value of E (k) q is equal to the kth moments of the Haar ensemble over the space spanned by P q .The concentration of measure results given in the previous section can also be used to bound the deviation of a given symmetry sector from being a state design, with the Hilbert space dimensions d, d S replaced with their appropriate charge-restricted values: Specifically, d S should be replaced with the FIG.S4.Trace distance between the k = 2 moments of the tomographic ensemble for dynamics generated by the XXZ model, and the fixed point distribution of states where wavefunctions are Haar-random within each symmetry charge block.We vary the parameters of the model periodically in time, such that the effective Floquet unitary describing evolution over one time period is UF = e −iτ H 2 e −iτ H 1 , with H1,2 both of the form (S29), and τ = 0.5.The staggered field h z = 0.6 throughout, while ∆ = 0.8 in H1 and ∆ = −1.7 in H2.Here, there are N2 = 3 system qubits, located at the centre of a chain with open boundary conditions.
number of system states with a fixed charge q, and d should be replaced with the number of measurement outcomes that could arise starting from a state where the system qubits have charge q, and the ancilla qubits have the charge determined by |0⟩ A .
For discrete symmetry groups, d increases exponentially with system size for each charge block as before, and so state designs are formed with overwhelmingly high probability.However, if the symmetry is continuous (such that there is a conserved charge density), then the effective dimension d will scale much more slowly with system size when the charge of the ancilla initial state has near-maximum or near-minimum charge, reflecting the fact that there are fewer possible final measurement outcomes m, that are compatible with the initial charge configuration.Because of this, the ancilla initial state should be initialized with a non-extremal charge distribution if one is to expect formation of block-diagonal quantum state designs.
We also provide numerical evidence that for representative symmetry-conserving unitaries, the tomographic ensemble approaches a block-diagonal state design.As a representative example, we consider dynamics generated by the XXZ Hamiltonian in a staggered longitudinal field This model possesses a U(1) symmetry generated by operators e iθ j Zj /2 , which implements a rotation of all spins by an angle θ about the z-axis.The staggered field is included to break the integrability of this model, which allows for chaotic dynamics.Again we use Floquet evolution U = U t F , with U F = e −iH2τ e −iH1τ , where H 1,2 have different values of the anisotropy parameter ∆.In our simulations, we pick τ = 0.5, and h z = 0.6, ∆ = 0.8 in H 1 , and ∆ = −1.7 in H 2 .We have verified that qualitatively similar behaviour is seen for other choices of parameters.
The initial state of the ancillas is chosen to be a staggered state j [|0⟩ 2j−1 ⊗ |1⟩ 2j ].This state is chosen because there are a large number of states with the same total charge as this, compared to states that have near-extremal magnetization, i.e. those that are close to all |0⟩, or all |1⟩.This ensures that there will be a large number of different possible measurement outcomes, which is necessary for the formation of a state design.We have found that a much larger number of ancillas are needed to form an approximate state design when the initial state has maximum magnetization.
We compute the trace distance ∆ (2) between the k = 2 moments of the tomographic ensemble and the corresponding moments of the block-diagonal Haar ensemble.The results are plotted in Fig. S4.Again we see similar trends to the trace distance for the tilted-field Ising model: After an initial transient period, the trace distance plateaus at a value that scales exponentially with the number of ancillas.This behaviour can be understood in the same way as before, by noting that the trace distance is a sum of contributions from each charge sector q, and that the number of measurement outcomes m that reside in each sector is exponentially small in N A for all q (assuming N A /N S is large).
In addition to conservation laws that are associated with unitary symmetries, one could in principle also consider conservation of energy due to time-translation symmetry.This applies when U is generated from evolution under a time-independent Hamiltonian U = e −iHt .However, from the results of Ref. [30], where the projective ensemble is studied, we anticipate that the evolution time required to reach the appropriate universal form will be much longer in this case.Since it is almost always possible to introduce some form of time-dependence in the Hamiltonian in experiments, we will not address this case here, instead leaving it to future work.

DETAILS OF CLASSICAL POST-PROCESSING
In this section we provide additional details on how properties of the target state ρ S can be estimated from experimentally observed measurement outcomes.

Optimality of the inverse map (4)
In the main text, we stated that the choice of inverse map G that minimizes the average-case variance is given by Eq. ( 4).Here we prove this statement.Our logic follows a similar line of reasoning to the arguments given in Ref. [43], with the difference that here-in the spirit of shadow tomography [21][22][23]-the goal is to estimate specific expectation values, rather than perform full tomography of the density matrix.
For a given informationally complete POVM channel F [Eq. ( 3)], a linear estimator θO for any expectation value ⟨O⟩ can be represented as a dual vector (w| ∈ R 2 N * satisfying (w|F = ⟪O|.In particular, when we have a left inverse of F, i.e. an operator G satisfying GF = id, then we can set (w| = ⟪O|G.Thanks to this condition, experimental data can be processed to form a quantity θO = M −1 M r=1 (w|m (r) ) (where m (r) is the measurement bitstring for repetition r) which is an unbiased estimator, since E(w|m (r) ) = (w|F|ρ S ⟫ = ⟪O|ρ S ⟫ ≡ ⟨O⟩.While performance could in principle be improved by harnessing more sophisticated nonlinear estimation schemes, e.g.maximum likelihood estimation, here we mainly analyse linear estimators.(One relatively simple example employed in Ref. [23] is to calculate a median-of-means, rather than just the mean used here, which reduces the chances of finding outliers.)Since (w| is non-unique whenever N A > N S , we wish to find a choice that is optimal.However, because the variance depends on the state ρ S itself, which in principle is not known in advance.As in Ref. [43], to reflect our lack of a priori knowledge of ρ S we first average over all unitarily equivalent states, and then minimize this averaged variance.This amounts to minimization of subject to the constraint (w|F = ⟪O|.We will show that the choice (w * | = ⟪O|G * , where the inverse map G * is given in Eq. ( 4), achieves this minimum.Firstly, we show that (w * | is a valid estimator.First observe that the channel M defined in Eq. ( 4) can be written as either F † F or F † F.Then, we have

□
Bounding the variance of estimators As we showed in the previous section, the dual vector (w 0 | produces the estimator with the smallest possible variance averaged over all possible input states.However, this does not give us complete information about the variance that one would find for specific choices of ρ S .Indeed, in principle there could exist particular adversarial input states for which the variance is exceptionally high, even if the average variance is small.To ascertain Var θO for any particular ρ S , we can use Eq. ( 6), which is expressed in terms of the third moments of the tomographic ensemble E (3) [see Eq. ( 2)].Here, we will use this expression to obtain upper bounds for the variance of estimators of expectation values, as well as nonlinear properties of ρ S .
We are particularly interested in cases where the moments of the tomographic ensemble approach their universal maximum-randomness forms (the form in question depends on whether blocking is used or not, and whether any symmetries are present).In these cases, analytic expressions for M −1 and E (3) can be obtained which allow the variance, treated as a joint functional of ρ S and O, to be specified explicitly.We aim to obtain simple upper bounds for the variance in such cases, focussing on setups with no conservation laws, either with or without blocking.As we will show, in the case with (without) blocking the functional form of the variance becomes the same as that of classical shadow tomography with random global (local) gates [23].
To prove this, it will be helpful to represent conventional shadow tomography using the POVM formalism employed in this work.There, a unitary U is applied to the system only (no ancillas are used), before measurement in the computational basis, giving an outcome m S .The unitaries are randomly sampled from an appropriate discrete set U with probabilities p U .We can then define POVM operators F m S ,U as Each operator of the above form corresponds to an event where the unitary chosen is U , and the subsequent measurement outcome is m.Again, these are rank-1 projectors, and so moments of the tomographic ensemble can be formed: In addition to Eq. ( 6), we will also prove a useful result that allows us to express the variance of nonlinear estimators in terms of the third moment of the tomographic ensemble.

No blocking
In the protocol with a single collection of ancillas and no conservation laws, the tomographic ensemble approaches a k-design over all wavefunctions in the Hilbert space of S. The second and third moments therefore take their universal form E (2) = 1 d S (d S + 1) (I + π (12) ) (S37) Because the POVM operators are rank-1 projectors F m = d S q m |ϕ m ⟩ ⟨ϕ m |, the second moment E (2) fully specifies the map M. Specifically, the two share the same matrix elements

FIG. 1 .
FIG. 1. (a)In our protocol, the target state ρS is evolved together with a set of ancillas under a fixed unitary U , before projective measurements are made in the computational basis.(b) For a many-body state, qubits can be subdivided into 'blocks', each of which interact with a separate set of ancillas.(c) Schematic of the process for constructing estimators θO of expectation values ⟨O⟩.
up to some small error.(Here, Fm = F m / Tr[F m ] are unit-trace positive operators.)Intuitively, closeness of a given ensemble to the Haar measure [as quantified by the moments (2)] implies that the probability distribution covers the space of states approximately uniformly.

FIG. 3 .
FIG. 3. Estimations of the fidelity Tr[ρS |EPR⟩ ⟨EPR|] (top panels) and purity Tr[ρ 2S ] (bottom panels) using the deterministic protocol (Det; fixed U ), and the semi-randomized protocol (Rand), see main text.In both cases the total evolution time is t = 10.The target state is the EPR pair state |EPR⟩ after dephasing with strength 1 − α.M = 5 × 10 3 repetitions are used for all data points.Errors relative to the true data are shown, with data artificially shifted horizontally for readability.
FIG. S1. (a)Circuit diagram for our tomography protocol, as in Fig.1(a).(b) Protocol for generating the projected ensemble for the state U |0⟩ SA , where |0⟩ SA is an extension of the product state |0⟩ A that also encompasses the system qubits.Note that the two protocols only differ in how the system qubits are prepared, and whether they are read out at the end.
for various different N A .We see approximately exponential decay with time, until a plateau is reached.The value of this plateau is close to the average trace distance that one obtains by replac-