Postselection-free learning of measurement-induced quantum dynamics

We address how one can empirically infer properties of quantum states generated by dynamics involving measurements. Our focus is on many-body settings where the number of measurements is extensive, making brute-force approaches based on postselection intractable due to their exponential sample complexity. We introduce a general-purpose scheme that can be used to infer any property of the post-measurement ensemble of states (e.g. the average entanglement entropy, or frame potential) using a scalable number of experimental repetitions. We first identify a general class of estimable properties that can be directly extracted from experimental data. Then, based on empirical observations of such quantities, we show how one can indirectly infer information about any particular given non-estimable quantity of interest through classical post-processing. Our approach is based on an optimization task, where one asks what are the minimum and maximum values that the desired quantity could possibly take, while ensuring consistency with observations. The true value of this quantity must then lie within a feasible range between these extrema, resulting in two-sided bounds. Narrow feasible ranges can be obtained by using a classical simulation of the device to determine which estimable properties one should measure. Even in cases where this simulation is inaccurate, unambiguous information about the true value of a given quantity realised on the quantum device can be learned. As an immediate application, we show that our method can be used to verify the emergence of quantum state designs in experiments. We identify some fundamental obstructions that in some cases prevent sharp knowledge of a given quantity from being inferred, and discuss what can be learned in cases where classical simulation is too computationally demanding to be feasible.


I. INTRODUCTION
In quantum mechanics, measurements serve both as a means to extract information about a system, and as a form of dynamics in and of themselves. Not only does the outcome of a measurement provide information about the pre-measurement state, but it is also used to update one's knowledge of the post-measurement state, effectively leading to a stochastic 'collapse' of the wavefunction. This effect is central to a number of longstanding ideas in quantum information science, including active error correction [1] and measurement-based quantum computation [2,3]. In recent years, a great deal of interest has emerged in the study of many-body quantum states that are generated by such dynamics, leading to the discovery of measurement-induced entanglement phase transitions [4][5][6][7][8][9][10][11][12][13], emergent quantum state designs [14][15][16][17][18], and protocols for generating long-range entangled states via non-unitary dynamics [19][20][21][22][23][24][25].
The probabilistic nature of quantum measurements makes probing such phenomena in experiment a considerable challenge. This is because the states of interest cannot be prepared deterministically; rather, in each repetition of the experiment, we will obtain a different randomly chosen outcome, and hence a different post-measurement state. Using conventional learning techniques, any property of a given quantum state can only be inferred through repeated preparation and measurement, which in this context would only be possible if we run the experiment sufficiently many times such that each state is realised on multiple occasions. Such a postselection-based approach has a sample complexity that is exponential in the number of measurements [16,26], which is infeasible for many-body systems.
In this paper, we introduce a method by which properties of the post-measurement ensemble of states can be learned from experimental data, without suffering from the exponential cost of postselection. Although a given ensemble-averaged quantity of interest (e.g. the mean entanglement entropy of the conditional states) may not be directly accessible in the sense one usually thinks of, we show that information about its value can still be inferred indirectly based on independent observations of certain other quantities, which we call 'estimable properties'. These latter quantities are constructed such that they can be directly computed using data obtained from a scalable number of experimental repetitions.
The basis of our method is to propose the following optimization task: What are the maximum and minimum values that the quantity of interest could take, based on the empirically observed values of a set of estimable properties? [See Fig. 1(c).] These extrema provide us with two-sided bounds for the desired property, i.e. we can conclude that the true value lies somewhere within this range. Despite the extremely high-dimensional nature of this optimization problem (scaling with the number of possible measurement outcomes), we show using analytical arguments how concrete bounds can be efficiently computed.
The success of our scheme, as quantified by the gap between minimum and maximum, depends on which estimable properties we choose to measure. We propose to use classical simulations of the quantum device to inform this decision. That is, when we run the experiment and obtain a particular measurement outcome, a corresponding simulation of the dynamics is run on a classical device to determine what the best observable to measure is, given that the outcome in question has occurred. As the fidelity of the simulation improves, the bounds constructed using our method become tighter. We benchmark the performance of the method for particular representative examples, demonstrating that tight bounds can be constructed even when there is appreciable mismatch between the simulated and true states. This usage of a parallel simulation of the dynamics can be thought of as analogous to constructing 'quantum-classical correlators', which have been recently introduced in the context of measurement-induced dynamics [13,[27][28][29][30][31][32] as an alternative to feedback-based approaches [12,33]. In contrast to those works, where the inference one makes depends on how closely the quantum and classical devices agree, our approach allows one to extract unambiguous information regarding properties that are intrinsic to the quantum device alone. While bounds of this kind have been proved for specific quantities on a case-by-case basis in Ref. [34], our method can in principle be used to estimate any ensemble-averaged quantity, with a guarantee that the sharpness of the bounds is optimal for a given set of estimable parameters. We stress that even though model-based simulations are employed, the inferences we make are not contingent on any assumptions regarding the accuracy of this simulation, thereby allowing one to definitively verify whether or not some phenomenon of interest is actually realised in experiment. As a concrete example, we show how the method introduced here can be used to verify the formation of quantum state designs in the projected ensemble [14,17], a class of states with potential applications in state tomography, benchmarking, and cryptography [35][36][37][38][39].

FIG. 1. (a) We consider ensembles of quantum states [Eq. (1)] that arise from some quantum dynamics featuring measurement. This could be a hybrid quantum circuit, the projected ensemble, or some other protocol of interest. To be general, we can think of the device as a black box which, on a given run of the experiment, outputs with probability p_z a label z (representing the outcome of all measurements during the dynamics), along with a corresponding quantum state ρ^Q_z (the post-measurement conditional state). To learn properties of the ensemble, we perform a subsequent measurement on the state, here denoted A_z, which itself can depend on z; after many repetitions this allows us to infer properties of the form (8). A classical simulation of the underlying dynamics can be used to help inform us how to best choose A_z (Section IV). (b) Having learned the values of these measurable properties, we can characterise the space of all quantum state ensembles (collections of conditional states ρ = {ρ^Q_z}) that are consistent with our findings, ρ ∈ K; see Section V. (c) For a given ensemble property of interest Ḡ [Eq. (3)], we can construct minimum and maximum values (green stars) over all ensembles in the empirically deduced set K. We can infer that the true value must lie between these extrema.
By analysing the nature of the information that one gains from performing experiments in general terms, we also identify certain fundamental limitations that in some cases preclude arbitrarily sharp knowledge about a certain property from being attained, even in principle. In particular, we prove a result (Theorem 1) which implies that if the conditional post-measurement states realised by the device are not close to being pure on average, then there will always be some residual uncertainty in the value of the desired quantity, even if the classical simulation is perfect. We also consider the possibility that classical simulations of the device may not be possible due to having too high a computational complexity. Our conclusion is that without the ability to perform some form of simulation, nothing can be learned about the ensemble of quantum states, besides the averaged state generated by the device (Theorem 2). Put simply, this suggests that we should only think of the post-measurement conditional quantum states as being physically accessible if we have some means to predict something about their structure in advance. Altogether, our results establish a fundamental separation between what can and cannot be learned about measurement-induced dynamics, and pave the way towards an understanding of how the various phenomena that arise in this context can be leveraged for other purposes, be it quantum communication, cryptography, or computation.

A. Setup
Our aim is to understand how one can infer properties of post-measurement quantum states from experimental data. Specifically, we consider settings where a quantum system Q is subjected to some dynamics featuring measurements, and we are interested in properties of the conditional states of the system at the end of the dynamics. Scenarios that fit into this category include (but are not limited to):

1. Hybrid quantum circuits: models of dynamics typically defined in discrete time, featuring measurements at various points in space and time [4-8, 10, 11]

2. The projected ensemble, where a fixed many-body state is prepared before measurements of a subset of degrees of freedom are made [14][15][16][17][25]

3. Continuously monitored quantum systems [8,[40][41][42][43][44][45], and settings where one is interested in the quantum jump trajectories of open quantum systems [46,47]

4. Systems undergoing error correction/detection, which involves measuring stabilizer operators [1]

5. Measurement-based approaches to quantum computation [2,3]

Since measurements are fundamentally stochastic processes, the conditional state of the system after measurements have occurred is itself a random variable. Thus, in contrast to familiar scenarios where the task is to learn about a fixed quantum state ρ that can be prepared deterministically, we are instead concerned with statistical ensembles of quantum states

E^Q = {(p_z, ρ^Q_z) : z ∈ Z}.  (1)

In the above, each z ∈ Z is a possible measurement outcome, which occurs with probability p_z, and ρ^Q_z is the (normalized) state of the system conditioned on this outcome: a unit-trace, Hermitian, positive semi-definite operator.
Such ensembles of states can be thought of as an abstracted description of any of the above-mentioned examples. We will remain indifferent to the exact nature of the underlying dynamics, and instead picture a scenario where in each run of an experiment, some oracle (i.e. a 'black box', the implementation of which we disregard) provides us with a label z (the measurement outcomes), sampled from the probability distribution p_z, along with a corresponding conditional state ρ^Q_z. In each run of the actual experiment, the dynamics of interest is executed, which we can think of as a single query to this oracle, after which we can apply some additional measurement whose purpose is to extract information about ρ^Q_z [see Fig. 1(a)].
If the outcome z is ignored, then this oracle is equivalent to a device that prepares, in every run, the averaged state

⟨ρ^Q⟩ := Σ_{z∈Z} p_z ρ^Q_z.  (2)

Properties of ⟨ρ^Q⟩ can therefore be learned using conventional approaches. In some settings, we may be concerned with properties of this averaged state, while in others (including many of the examples listed above) the physics being investigated may be manifest in the individual conditional states that make up the ensemble (1). In the latter case, our wish is to learn properties of this ensemble E^Q beyond those of the average state, using some fixed number of queries/samples M.
A well-appreciated issue that makes this objective difficult to achieve in the context of measurement-induced dynamics is the postselection problem, which we describe in detail later. In brief, the problem stems from the fact that in each run of the experiment, we only get a single copy of the conditional state ρ^Q_z, which is sampled randomly from the distribution p_z. In the regime where the number of outcomes |Z| is large (which is to be expected in many-body settings), the state we get will be different every time for any reasonable number of repetitions M. Evidently, we can immediately rule out the possibility of learning properties of any individual conditional state ρ^Q_{z*}, since the probability of this state never occurring is high. One might still hope to be able to estimate ensemble-averaged properties from a finite sample {z^(1), . . ., z^(M)}, i.e. we look to estimate quantities of the form

Ḡ := E_z[G(ρ^Q_z)],  (3)

where G(σ) is some function of a density matrix σ, and E_z denotes an expectation value over the distribution {p_z}. However, in contrast to classical physics, quantum states cannot be copied, and thus having only single-copy access to the conditional states limits the information we can extract about each ρ^Q_z. In particular, as has been discussed previously, there is an apparent obstacle to learning properties of the kind (3) where G(σ) is a nonlinear function of σ, since these cannot be expressed as functions of the average state ⟨ρ^Q⟩. A central aim of this paper is to critically examine this expectation, which is usually presented in somewhat heuristic terms, and to sharply determine exactly what can and cannot be inferred about a post-measurement quantum state ensemble from experimental data of a reasonable size M.

B. Results and structure of paper
Our first step to determine if and how the postselection problem can be circumvented is to establish a general class of ensemble properties that can be directly estimated using a reasonable number of repetitions M. In Section III, we demonstrate that expressions of the form Eq. (8) constitute such 'estimable properties' of the ensemble, in that one can construct a function of the experimental data which equals the property in question on average, without any additional assumptions being made.
Most properties of the ensemble that are of interest do not fall within this class, and hence cannot be directly estimated in the same way. Nevertheless, we demonstrate how information about some non-estimable property can be indirectly inferred using independent measurements of other estimable quantities, using the following logic. Given a collection of estimable quantities and some empirical observations of their values (which we get from running the experiment), we consider the space of all ensembles that are consistent with these observations. We refer to this as the feasible space K [Fig. 1(b)], and we can guarantee that the true ensemble realised in the experiment lies somewhere within K. We can characterise the best possible state of knowledge about some non-estimable average E_z[G] by looking at the extremes of this quantity over the space K. The maximum and minimum possible values of E_z[G] that are consistent with our observations can be represented as the solutions to an optimization problem [Fig. 1(c)]. By solving these optimization problems, we can construct two-sided bounds for the desired quantity, i.e. we infer that E_z[G] must lie between the minimum and maximum, both of which can be computed efficiently using a scalable number of repetitions M. Ideally the upper and lower bounds constructed using this approach will be close to one another, thereby giving us sharp knowledge about the value of E_z[G]. This approach is outlined in detail in Section V, and we apply this idea to construct explicit bounds for various quantities of interest in Section VI.
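The logic of this feasible-range construction can be illustrated with a deliberately small sketch. Here the probabilities p_z, the simulated values c_z, and the observed constraint value are all invented for illustration: the unknowns a_z play the role of Tr[Oρ^Q_z] for three outcomes, the estimable constraint fixes a weighted correlation between a_z and c_z, and we extremise the non-estimable average Σ_z p_z a_z² over everything consistent with the observation.

```python
import numpy as np

# Toy version of the feasible-range optimisation (all numbers illustrative).
# Unknowns: a_z = Tr[O rho^Q_z] for three outcomes z, each confined to [-1, 1].
# Observed estimable constraint:  sum_z p_z a_z c_z = g_obs,  where the
# c_z = Tr[O rho^C_z] are values supplied by a classical simulation.
# Target (non-estimable):  sum_z p_z a_z**2.
p = np.array([0.4, 0.3, 0.3])
c = np.array([0.9, -0.5, 0.2])
g_obs = 0.4

# Brute-force scan: grid a_1 and a_2, then solve the constraint for a_3.
grid = np.linspace(-1.0, 1.0, 801)
a1, a2 = np.meshgrid(grid, grid, indexing="ij")
a3 = (g_obs - p[0] * c[0] * a1 - p[1] * c[1] * a2) / (p[2] * c[2])
feasible = np.abs(a3) <= 1.0          # keep only physically allowed points
target = p[0] * a1**2 + p[1] * a2**2 + p[2] * a3**2
lo = target[feasible].min()
hi = target[feasible].max()
print(lo, hi)   # two-sided bounds on the non-estimable quantity
```

The gap between `lo` and `hi` is the residual uncertainty; in the paper's scheme this gap shrinks as more (and better-chosen) estimable constraints are added, whereas the brute-force scan here is only viable because the toy has three unknowns.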
We are naturally concerned with how successful this indirect inference scheme can be, as quantified by the width of the feasible range. To address this issue, one must first decide which estimable parameters to measure. While our analysis works for any such choice, we can make a decision based on an approach introduced in previous works, where one constructs cross-correlations between experimental data and an independent simulation of the underlying dynamics. These 'quantum-classical correlators', which we describe in Section IV, fall within the set of measurable properties, and hence can be efficiently estimated. (The nature of these simulations need not matter for the purposes of our inference scheme, but we address some specifics in the discussion.) Using this approach, we argue that the sharpness of the two-sided bounds depends on two key factors. Firstly, the accuracy of the simulation influences the gap between the minimum and maximum: naturally, as the simulation gets closer to the true behaviour of the system, the bounds become tighter. Secondly, the nature of the conditional states ρ^Q_z that are realised on the quantum device also plays an important role. We prove an important result, Theorem 1, which states that regardless of which measurable quantities we compute, there will always be a consistent ensemble which is made up of states that are almost all pure. The implication is that when the actual states ρ^Q_z realised by the device are mixed, we cannot necessarily constrain the range of a desired property E_z[G] to be within an arbitrarily small window, even if the simulations we use are perfect. This represents a fundamental obstruction to learning mixed-state ensembles without postselection, which we discuss in detail in Section VIII A.
As an immediate application of our results, we show how the inference scheme developed here can be used to verify the emergence of quantum state designs in chaotic many-body systems [14,17]. We show how one can constrain a quantity known as the frame potential, which can be used to quantify how far an ensemble is from being a k-design (namely, an ensemble whose kth moments coincide with those of the Haar ensemble [48]). In Section VII, we present a numerical simulation of an experiment that features both noise and miscalibrations in the Hamiltonian, and show that despite these imperfections (which are inevitably present in any experiment), the frame potential can be determined to within a reasonably narrow window, in the sense described above.
In practice, performing a simulation of the quantum device may be a computationally demanding task, and we discuss the feasibility of such simulations for various specific cases in Section VIII B. It is therefore natural to consider whether anything can be learned in the case where simulation is not possible. To address this question, we present a result, Theorem 2, the implication of which is that if we do not employ some sort of simulation of the dynamics (broadly defined), then we cannot learn anything about the ensemble of quantum states E^Q beyond properties of the averaged state (2) using a number of repetitions that scales polynomially with system size. To be specific, in this regime the true ensemble is indistinguishable from one where every conditional state ρ^Q_z is replaced by the averaged state (2). This suggests that an inability to simulate the device in question renders the conditional states inaccessible in experiment. We conclude by discussing this point, along with some of the other broader implications of our results, in Section IX.

A. The no-coincidence regime
In all the scenarios captured by our generalized setup, a natural task is to infer properties of the ensemble E^Q using some kind of learning scheme. In particular, for the purposes of this work we will be interested in estimating averaged properties of the states in the ensemble, i.e. quantities that can be expressed in the form of Eq. (3).
We start by addressing the postselection problem in detail. When it comes to learning properties of post-measurement quantum states from experimental data, a fundamental difficulty arises when the number of states in the ensemble |Z| is large; this is typically the case in the many-body setting, since |Z| scales exponentially with the number of measurements made, which itself typically scales with system size and/or time. This places us in a regime where, for any reasonable number of experimental repetitions M, the probabilities will satisfy p_z M ≪ 1, meaning that we typically only get access to at most one instance of each state ρ^Q_z over the whole experiment. The significance of this 'no-coincidence' regime can be appreciated relatively straightforwardly: if we are given a single copy of a given quantum state ρ^Q_z, then whatever measurement we subsequently perform on it, the distribution of outcomes will depend linearly on the density matrix ρ^Q_z. Hence, if there are no coincidences (no value of z occurs twice or more), then regardless of how we process the data, the only quantities that we can infer from the observed distribution of measurement outcomes are those that are themselves linear in ρ^Q_z. If we hastily apply this logic to quantities of the form (3), this forces us to restrict ourselves to functions of the form G(ρ^Q_z) = Tr[ρ^Q_z A] for some observable A. In this case, we write the property in question as

Ḡ = E_z[ Tr[ρ^Q_z A] ] = Tr[⟨ρ^Q⟩ A],  (4)

where ⟨ρ^Q⟩ is the average state (2). Such quantities evidently give us no information about the nature of individual states in the ensemble, and we only learn about the average state (2). In contrast, averages of nonlinear functions of the ensemble states, e.g. squared expectation values Tr[Oρ^Q_z]², do contain information beyond that contained in the average state, which is why these are the quantities that are of relevance to the various problems described in Section II A.
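How rarely coincidences occur can be checked with a short back-of-the-envelope simulation (the system sizes and shot counts below are purely illustrative). For |Z| = 2^30 distinct outcomes and M = 10^4 repetitions, the birthday estimate M²/(2|Z|) ≈ 0.05 predicts that, with high probability, no measurement record ever repeats:

```python
import random

def expected_coincidences(num_outcomes, num_shots, trials=100, seed=0):
    """Empirically estimate the mean number of repeated records when
    drawing `num_shots` outcomes uniformly from `num_outcomes` labels."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen = [rng.randrange(num_outcomes) for _ in range(num_shots)]
        total += num_shots - len(set(seen))   # shots minus distinct labels
    return total / trials

# 30 measured qubits -> |Z| = 2**30; M = 10**4 shots: essentially every
# run yields a fresh post-measurement state (expected collisions ~ 0.05).
print(expected_coincidences(2**30, 10_000))

# By contrast, with only 10 outcomes, 100 shots repeat almost every label.
print(expected_coincidences(10, 100))
```

(Real measurement records are not uniformly distributed, but a non-uniform p_z only reduces the effective number of outcomes; the exponential scaling of |Z| with the number of measurements is what makes postselection hopeless.)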
If we had access to multiple copies of each state ρ^Q_z, then we could in principle learn such nonlinear functions by looking at the full distribution of measurement outcomes for each z separately. However, in the regimes we are interested in, this demands a prohibitively large number of experimental repetitions. Our aim is to find a solution to this postselection problem while keeping the query complexity bounded.

B. Measurable quantities
To get a more precise picture of exactly what quantities are or are not experimentally accessible in the no-coincidence limit, let us consider a general procedure that can be used to extract information about the ensemble E^Q. In a given repetition r ∈ {1, . . ., M}, we obtain a sample from the ensemble z^(r), and subsequently apply some (generalized) measurement to the quantum system Q, which can in principle depend on the outcome z^(r). This z-dependent measurement scheme can be represented by a POVM F^Q(z) = {F^Q_x(z) : x ∈ X}, where X is a discrete set of measurement outcomes, and the operators F^Q_x(z) are Hermitian positive semi-definite, satisfying Σ_{x∈X} F^Q_x(z) = I for each z. Conditioned on the outcome z^(r), the result x^(r) ∈ X occurs with probability Tr[ρ^Q_{z^(r)} F^Q_{x^(r)}(z^(r))], which is linear in ρ^Q_{z^(r)}, as discussed above; together with the ensemble probabilities p_z this defines a joint probability distribution for the set of possible outcomes of a single run

P(x, z) = p_z Tr[ρ^Q_z F^Q_x(z)].  (5)

The full set of data that we acquire from the experiment {(x^(r), z^(r)) : r = 1, . . ., M} corresponds to a set of M independent, identically distributed samples of pairs X^(r) := (x^(r), z^(r)) drawn according to the probabilities (5). Most obviously, from such a sample we can estimate the average of an arbitrary function of these pairs f(X)

f(X)‾ = Σ_{x,z} P(x, z) f(x, z),  (6)

where the overline is used as a shorthand for expectation values with respect to the samples X. The function f(X^(r)) is said to be an unbiased estimator of the quantity on the right hand side of (6). We will focus on the above quantities for the most part, since they are particularly relevant to ensemble averages of the form (3). However, if we wish to be even more general, we could also use the sampled data to estimate functions of n ≤ M independently sampled pairs f_n(X_1, . . ., X_n)

f_n(X_1, . . ., X_n)‾ = Σ P(x_1, z_1) ⋯ P(x_n, z_n) f_n(X_1, . . ., X_n).  (7)

The quantities (6, 7) are referred to as estimable parameters of the distribution (5), because one can find a function of the sampled data {X^(1), . . ., X^(M)} that is equal to these quantities in expectation [49,50]. In fact, any functional over the space of probability distributions that has an unbiased estimator must be expressible in the forms written above [51,52]. Hence, these are the only classes of observables that we can experimentally learn in the no-coincidence limit. Returning to Eq. (6), we remark that the ensemble states ρ^Q_z only appear through the outcome probabilities (5), which as discussed in the previous section are linear in the density matrices. It is therefore helpful to rewrite the average (6) as

f(X)‾ = Σ_z p_z Tr[ρ^Q_z A_z] =: ⟨A_z⟩_Q,  (8)

where we define the family of operators A_z := Σ_{x∈X} f(x, z) F^Q_x(z). Eq. (8) gives us a succinct characterization of the class of quantities that can be learned experimentally without prohibitive postselection overheads. (The more general estimators (7) can always be decomposed in terms of the above.) We immediately see that Eq. (4) is the special case where we disregard the classical information z when choosing the measurement scheme and post-processing function f, i.e. f(x, z) = f(x) and F^Q_x(z) = F^Q_x, such that A_z becomes z-independent. Therefore, quite naturally, we conclude that in order to probe properties of the ensemble that cannot be characterized solely by the average density matrix (2), we must adopt a learning strategy that itself depends explicitly on z. Notably, to do so necessarily requires us to have some a priori knowledge about the relationship between the labels z and the states ρ^Q_z. The essence of our scheme, which we describe in more detail in the following sections, is to use idealised classical computer simulations of the quantum device to inform us as to how A_z should depend on z.
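A minimal numerical illustration of an estimable property of the form (8) is the following sketch (the two-outcome ensemble, conditional states, and observable are all invented for illustration): each run samples z, performs a single projective measurement of a fixed observable, and applies a z-dependent sign in post-processing, so that the sample mean converges to Σ_z p_z Tr[ρ^Q_z A_z] with A_z = (−1)^z Z.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ensemble (illustrative): outcome z in {0, 1} occurs with probability
# p[z]; conditional single-qubit states rho[z] are taken diagonal.
p = np.array([0.3, 0.7])
rho = np.array([[[0.9, 0.0], [0.0, 0.1]],
                [[0.2, 0.0], [0.0, 0.8]]])

Z = np.diag([1.0, -1.0])           # fixed observable measured on Q
A = np.array([Z, -Z])              # z-dependent observable A_z = (-1)^z Z

M = 100_000
zs = rng.choice(2, size=M, p=p)              # sample z from the ensemble
p0 = rho[zs, 0, 0]                           # Born probability of x = 0
xs = (rng.random(M) >= p0).astype(int)       # single-shot measurement of Z

# Post-processing f(x, z): eigenvalue of Z times the z-dependent sign.
f = np.where(xs == 0, 1.0, -1.0) * np.where(zs == 0, 1.0, -1.0)

estimate = f.mean()
exact = sum(p[z] * np.trace(rho[z] @ A[z]) for z in (0, 1))
print(estimate, exact)   # sample mean converges to sum_z p_z Tr[rho_z A_z]
```

Note that no outcome z needs to repeat: the estimator is unbiased run-by-run, which is precisely why quantities of the form (8) evade the postselection problem.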
C. Aside: Avoiding mid-circuit measurement and feed-forward using classical shadows

As written, the measurable quantities (8) appear to require a feed-forward mechanism, where the outcome z is used to decide what physical measurement to perform. Before describing our postselection-free inference scheme in detail, we briefly pause to explain how this aspect of the measurement procedure, which may be hard to implement in practice, can be avoided using ideas from classical shadow tomography [53]. A similar approach has been outlined in Ref. [34].
Without feed-forward, we must fix the POVM F^Q(z) to be z-independent. We will make the key assumption that this fixed POVM is informationally complete [54], i.e. the collection of operators {F^Q_x}_{x∈X} spans the full space of operators over H^Q. In this case, one can find a (not necessarily unique) complementary set of operators F̃^Q_x which together have the following property

Σ_{x∈X} F̃^Q_x Tr[F^Q_x B] = B for all operators B.  (9)

Using the nomenclature of Ref. [54], the POVM operators F^Q_x constitute an operator frame, while {F̃^Q_x} is the corresponding dual frame. With this construction in hand, any observable of the form (8) can be estimated from a set of experimentally measured data {(x^(r), z^(r)) : r = 1, . . ., M} by using the post-processing function

f(x, z) = Tr[F̃^Q_x A_z].  (10)

Combining Eqs. (9, 10), we see that the average of such a function equals the right hand side of (8) in expectation

f(X)‾ = Σ_z p_z Σ_x Tr[F^Q_x ρ^Q_z] Tr[F̃^Q_x A_z] = Σ_z p_z Tr[ρ^Q_z A_z].  (11)

The above prescription (11) gives us an explicit way of computing an unbiased estimator for any quantity of the form (8). Notably, we can do this without deciding on the operators A_z in advance of the physical measurement of the system, i.e. we can "measure first, ask questions later" [55]. This feature of informationally complete POVMs means we do not need to employ adaptive schemes, where the physical measurement procedure is decided based on the sample z.
A particularly straightforward way to implement an informationally complete POVM is to use classical shadow tomography [53,56], which was proposed as a useful way to study measurement-induced dynamics in Ref. [34]. In each run of the experiment, we apply a randomly chosen unitary U_c from a pre-chosen ensemble U = {(q_c, U_c)}_{c∈C}, where C is an arbitrary discrete set, before performing a projective measurement in a fixed basis {|m⟩⟨m|}. For certain choices of U, informational completeness is guaranteed, and the dual frame can be computed efficiently.
While there are many such possibilities, as a concrete example, one can take U to be a uniform distribution over all single-qubit Clifford gates, i.e. operations of the form U_c = u_{c_1} ⊗ u_{c_2} ⊗ ⋯ ⊗ u_{c_N}, with each u_{c_i} an independently and uniformly chosen single-qubit Clifford gate. Thus, the necessary measurement scheme can be implemented using single-qubit rotations and measurements. In many settings, this means we can perform all the measurements simultaneously at the end of the experiment (both those that generate the ensemble E^Q and those we use to extract information). This is illustrated for the case of the projected ensemble of a bipartite state in Fig. 2.

FIG. 2. Protocol for measuring arbitrary estimable properties (8) for the projected ensemble using classical shadows, with all measurements occurring simultaneously at the end of the circuit. The projected ensemble corresponds to the collection of post-measurement states that arise when the bipartite state |Ψ_AB⟩ = U(t)|0^⊗N⟩ is prepared, and then qubits in A are measured in the computational basis, with outcomes {z_i : i = 1, . . ., N_A}. The post-measurement states are thus defined on the qubits in B; this figure shows the case N_A = 5, N_B = 3. As described in the main text, a shadow-based scheme can be used to probe these states, which corresponds to performing random on-site unitaries u_{c_i} before measuring in the computational basis, with outcomes m_i (to be distinguished from z_i, which label the states we are trying to probe).

The number of repetitions M required to estimate ⟨A_z⟩_Q to a given accuracy can be expressed in terms of the variance Var_{x,z} f(x, z). In the more familiar setting where one wishes to learn the properties of a fixed state ρ, the variance of a shadow tomographic estimator can be bounded using a useful construction, the shadow norm ∥A∥²_shadow, which is a function of the observable to be estimated A, as well as the ensemble U; this is defined in Ref. [53]. In the present case, where we are instead dealing with an ensemble of states, it is straightforward to show that Var_{x,z} f(x, z) can be bounded as

Var_{x,z} f(x, z) ≤ E_z[ ∥A_z∥²_shadow ].

Thus, the property ⟨A_z⟩_Q can be estimated to a good accuracy using a reasonable number of experimental repetitions, provided that each of the operators A_z is (with high probability) an operator that itself can be efficiently estimated in ordinary shadow tomography.
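For a single qubit, the dual-frame reconstruction underlying this scheme has a simple closed form: measuring a uniformly random Pauli basis (X, Y, or Z; equivalent, for this purpose, to a uniformly random single-qubit Clifford) and applying the inverse channel ρ̂ = 3U†|m⟩⟨m|U − I yields snapshots that average back to ρ. The following sketch checks this numerically (the test state and sample count are chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
Sdg = np.diag([1.0, -1.0j])
ROTATIONS = [I2, H, H @ Sdg]   # measure in the Z, X and Y basis respectively

def shadow_snapshot(rho):
    """One classical shadow: rotate to a random Pauli basis, measure once,
    then apply the single-qubit inverse channel 3 U^dag |m><m| U - I."""
    U = ROTATIONS[rng.integers(3)]
    probs = np.real(np.diag(U @ rho @ U.conj().T))   # Born probabilities
    m = 0 if rng.random() < probs[0] else 1
    ket = U.conj().T[:, [m]]                         # column vector U^dag |m>
    return 3 * (ket @ ket.conj().T) - I2

# Arbitrary single-qubit test state (illustrative):
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
est = sum(shadow_snapshot(rho) for _ in range(50_000)) / 50_000
print(np.round(est, 2))   # the snapshots average to rho itself
```

For N qubits the snapshot factorises over sites, which is what makes the dual frame efficient to evaluate in the randomized Pauli scheme.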
The randomized Pauli measurements discussed above constitute one particular example of an informationally complete POVM, but we emphasise that there are many other alternatives, e.g. those based on global Clifford rotations [53], or even generalized methods that use chaotic Hamiltonian evolution and/or ancilla qubits [39,57]. In all the subsequent analysis, we remain agnostic to the exact measurement scheme used, and will simply assume that we have some way to measure the properties (8).

IV. CLASSICAL SIMULATIONS
We have now identified an explicit scheme for measuring quantities of the form ⟨A_z⟩_Q [Eq. (8)] without problematic postselection overheads.
However, as mentioned previously, our goal is to infer properties of the ensemble that take the form (3), specifically those for which G(ρ) is a nonlinear function. Here we describe the idea of 'quantum-classical correlators', where a classical simulation of the system in question is run in conjunction with the experiment, and describe what such correlators can tell us about nonlinear averages.
By 'classical simulations', we mean the following: each time we perform a run of the experiment, which gives us a label z ∈ Z and a measurement outcome x ∈ X, we also compute and store a representation of some corresponding state ρ^C_z on a classical computer. We use the superscript C to distinguish this 'classical' state, which represents the result of some idealised simulation of the experiment, from the 'quantum' state that is actually realised on the true device. We emphasise that this simulation can be done 'lazily', i.e. we only compute ρ^C_z for those values of z that happen to arise in the experiment, as opposed to pre-computing every state ρ^C_z in advance, the cost of which would scale with |Z|. We also do not need to classically sample from the distribution p_z, which in many cases is itself computationally demanding. The nature, accuracy, and computational cost of the classical simulation may depend on the specific physical setting being considered, and we will discuss several particular examples in Section VIII B. For the time being, however, we presume that some form of simulation is possible, and motivate our discussion on the basis that there is some partial correlation between the classical and quantum states, ρ^C_z and ρ^Q_z. The case where a classical simulation is impossible, either due to incomplete knowledge of how the quantum device operates or prohibitively high computational cost, is discussed in Section VIII C.
Let us take the simplest nontrivial case and suppose that our aim is to learn the average of a particular squared expectation value Tr[Oρ^Q_z]² over the ensemble E^Q. We introduce the following notation for such a quantity:

⟨O ⊗ O⟩^QQ := Σ_{z∈Z} p_z Tr[Oρ^Q_z]².  (15)

The superscript QQ is used to emphasise that the quantity in question is a linear functional of the state ρ^Q_z ⊗ ρ^Q_z. As explained in Section III, we cannot directly measure this quantity. However, a natural proxy that has been introduced in several recent works is the 'quantum-classical correlator' [13,27-32], which here we define as

⟨O ⊗ O⟩^QC := Σ_{z∈Z} p_z Tr[Oρ^Q_z] Tr[Oρ^C_z].  (16)

Note that the probabilities p_z appearing in Eq. (16) are the same as those appearing in the fully quantum expression (15). The above quantity can then be cast in the form of Eq. (8) with

A_z = Tr[Oρ^C_z] × O.  (17)

Thus, once we collect samples z^(r) taken on the quantum device, we can compute the corresponding classical observables Tr[Oρ^C_z] and construct an estimator of the quantum-classical correlator, using the methods prescribed in Section III. Finally, we can also consider a 'classical-classical' correlator defined by analogy to Eq. (15), again with the probabilities p_z set by the quantum device,

⟨O ⊗ O⟩^CC := Σ_{z∈Z} p_z Tr[Oρ^C_z]².  (18)

This is also of the form (8), with A_z = Tr[Oρ^C_z]² × I (the quantum states are disregarded here).
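The estimator construction just described can be sketched numerically. In this toy version (function names and data are ours, and we stand in for the shadow scheme with synthetic unbiased single-shot estimates), each run r supplies a single-shot estimate of Tr[Oρ^Q_{z_r}] together with the classically computed Tr[Oρ^C_{z_r}]:

```python
import numpy as np

rng = np.random.default_rng(0)

def qc_correlator_estimate(shadow_estimates, classical_values):
    """Estimate <O (x) O>_QC = E_z Tr[O rho^Q_z] Tr[O rho^C_z].

    shadow_estimates[r]: single-shot unbiased estimate of Tr[O rho^Q_{z_r}]
                         from run r (e.g. produced by classical shadows).
    classical_values[r]: Tr[O rho^C_{z_r}] computed by the classical simulation
                         for the label z_r observed in run r.
    Because z_r is sampled from p_z on the device, the empirical mean is unbiased.
    """
    shadow_estimates = np.asarray(shadow_estimates)
    classical_values = np.asarray(classical_values)
    return np.mean(shadow_estimates * classical_values)

# toy check: perfect simulation, so the QC correlator equals E_z Tr[O rho_z]^2
true_vals = rng.uniform(-1, 1, size=10_000)              # Tr[O rho_z] per sampled z
noisy = true_vals + rng.normal(0, 0.5, size=10_000)      # unbiased shot noise
est = qc_correlator_estimate(noisy, true_vals)
print(abs(est - np.mean(true_vals**2)) < 0.05)
```

The shot noise averages out because it is uncorrelated with the classical values, which is what makes the quantum-classical correlator estimable at scalable cost.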
Clearly, in the limit where the classical simulation perfectly matches the behaviour of the quantum device, ρ^C_z = ρ^Q_z, all these quantities are equal. Thus, we may hope that if the simulation is good, but not perfect, the proxy quantity (16) will be close in value to the true 'quantum-quantum' observable (15), which is the physically relevant quantity. However, at present we cannot make any definitive conclusions about the value of the quantum-quantum correlator without making unsubstantiated assumptions about the accuracy of our classical simulations. Our objective in the following two sections, which form the most technical parts of this paper, is to establish methods that allow one to construct rigorous two-sided bounds for the true value (15) based on experimental observations of the measurable quantities (16, 18), without making any a priori assumptions about how accurate the classical simulation is. That is, even though we are using our classical simulation as a form of prior 'guess' for the conditional states ρ^Q_z, we allow for the possibility that this guess is incorrect. This skeptical approach to learning means that any conclusions we draw about the ensemble will be entirely unambiguous.

V. CONVEX OPTIMIZATION APPROACH TO INFERRING AVERAGES
We have seen in detail how the no-coincidence limit gives rise to a distinction between properties of quantum state ensembles that can be measured straightforwardly, namely those of the form ⟨A_z⟩_{E^Q} [Eq. (8)], which include 'quantum-classical' correlators (16), versus those that cannot be directly measured with a reasonable number of experimental repetitions, e.g. the 'quantum-quantum' correlator (15). In the absence of direct estimation schemes for the latter class, we are interested in understanding the best possible state of knowledge that we could in principle have about such unobservable quantities, based on experimentally accessible data. Our intuition, based on the discussion of the previous section, is that by cross-correlating experimental outputs with classical simulations of the quantum system, we can gain some amount of knowledge about these quantities, even though we cannot measure them directly. To make this intuition concrete, in this and the following sections we aim to address the question: given knowledge of a set of observable quantities of the form (8), what range of values can a particular unobservable quantity take, while ensuring consistency with our observations?
More formally, suppose that from a set of experimental data we construct estimates of a family of R scalar quantities {⟨A^(i)_z⟩ : i = 1, ..., R}, the outcomes of which we denote b_i. For the moment, we presume that any statistical uncertainty in these observations can be neglected (an assumption which we will relax later). We wish to determine the maximum and minimum possible values that a particular average E_z G(ρ^Q_z) can take over all quantum state ensembles E^Q that satisfy

⟨A^(i)_z⟩ = b_i for all i = 1, ..., R.  (19)

Denoting the minimum and maximum values of E_z G(ρ^Q_z) as g*_∓, this will give us a two-sided bound for the desired average, namely

g*_− ≤ E_z G(ρ^Q_z) ≤ g*_+.  (20)

The determination of g*_± can be viewed as an optimization task, where the object being varied over is the ensemble E^Q itself, and the function being extremized is the average E_z G(ρ^Q_z). In this section, we study the structure of such optimization problems at a general level.
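A single-qubit caricature of this feasibility question already shows how constraints carve out a range of values. Suppose the only observation is b = Tr[Zρ]; in Bloch coordinates ρ = (I + r·σ)/2 the purity is (1 + |r|²)/2, so the feasible purities can be read off analytically (this toy is our own illustration, not a result quoted from the text):

```python
import numpy as np

def purity_range(b):
    """Feasible range of Tr[rho^2] for a qubit constrained to Tr[Z rho] = b.

    Bloch picture: the constraint fixes r_z = b, while |r| <= 1 keeps rho
    a valid state. Purity = (1 + |r|^2)/2 is minimized by r = (0, 0, b)
    and maximized by any pure state (|r| = 1) with r_z = b.
    """
    assert -1 <= b <= 1
    g_min = (1 + b**2) / 2
    g_max = 1.0
    return g_min, g_max

lo, hi = purity_range(0.6)
print(lo, hi)   # feasible range is [0.68, 1.0]
```

Note that the upper end is always 1 no matter what b is observed, a one-qubit foreshadowing of the vacuous purity upper bound discussed in Section VI B.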

A. Set of consistent ensembles
Our first step is to analyse the structure of the space of quantum state ensembles that satisfy Eq. (19). For succinctness of notation, we will find it useful to represent the collection of states {ρ^Q_z}_{z∈Z} in terms of a single large block-diagonal matrix ρ = ⊕_{z∈Z} ρ^Q_z, where each block contains the density matrix ρ^Q_z for a particular label z. We can then view ρ as an element of the linear space of matrices M := B(H)^⊕|Z|. As for the probabilities p_z, while these are not known to us in full in practice, our ability to run the experiment means we can sample from this distribution; therefore our approach will be to keep p_z fixed, while allowing the states ρ^Q_z themselves to vary. With all this in mind, we begin by formally specifying the space of valid quantum state ensembles as

K_0 := {ρ = ⊕_{z∈Z} ρ^Q_z : ρ^Q_z ∈ D} = D^⊕|Z|,

where D ⊂ B(H) denotes the space of density matrices for a single copy of the system Hilbert space H. We are interested specifically in ensembles that are consistent with the observations (19). Again using the direct sum representation, for each block-diagonal matrix X = ⊕_z X_z ∈ M, we define the linear functions

a_i(X) := Σ_{z∈Z} p_z Tr[A^(i)_z X_z],

and we define the space A as

A := {X ∈ M : a_i(X) = b_i for all i = 1, ..., R}.

Since A is defined by R linear constraints, it is an affine subspace of codimension R in M. Finally, the feasible space is given by the intersection K = K_0 ∩ A. Writing this out in full,

K = {ρ = ⊕_{z∈Z} ρ^Q_z : ρ^Q_z ∈ D, Σ_z p_z Tr[A^(i)_z ρ^Q_z] = b_i for all i}.  (23)

Each element of K corresponds to a particular ensemble that could describe the system, based on the empirical data (19). Crucially, we note that when viewed as a subset of the linear space M, the set K is convex, i.e. if ρ = ⊕_z ρ^Q_z and σ = ⊕_z σ^Q_z are two sets of states that both belong to K, then so too does λρ + (1 − λ)σ for any λ ∈ [0, 1]. This is a consequence of the convexity of both the space of density matrices D and the affine subspace A, along with the fact that the intersection of two convex sets is itself convex.
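The convexity argument can be spot-checked numerically: the constraint values are linear in the state, so they interpolate between the endpoints, and the mixture remains a valid density matrix. A quick illustrative check (random qubit states and a sample constraint operator of our choosing):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_density_matrix(d):
    # random mixed state from a Ginibre matrix
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

A = np.diag([1.0, -1.0]).astype(complex)     # a sample constraint operator
rho, sigma = random_density_matrix(2), random_density_matrix(2)
lam = 0.3
mix = lam * rho + (1 - lam) * sigma

# linearity: the constraint value of the mixture interpolates the endpoints
lhs = np.trace(A @ mix).real
rhs = lam * np.trace(A @ rho).real + (1 - lam) * np.trace(A @ sigma).real
print(np.isclose(lhs, rhs))
# the mixture is still a valid state: unit trace and positive semidefinite
print(np.isclose(np.trace(mix).real, 1.0), np.min(np.linalg.eigvalsh(mix)) >= -1e-12)
```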

B. Convex functions
Concretely, our aim is to determine the range of feasible values that a particular average E_z G can take over the space of feasible ensembles K. Namely, we wish to characterize the set

G := {Ḡ(ρ) : ρ ∈ K},

where we introduce the shorthand

Ḡ(ρ) := Σ_{z∈Z} p_z G(ρ^Q_z).  (26)

The convexity of K will prove useful for this purpose, in particular for determining the extremal feasible values g*_+ = max_{g∈G} g (and similarly for the minimum g*_−). Indeed, we can formulate such a task in terms of the following optimization problems (which we write out in longhand momentarily):

g*_± = max / min {Ḡ(ρ) : ρ ∈ K},  (27)

(with g*_+ corresponding to the maximum, and g*_− to the minimum). One can then immediately use these extrema to bound the average for the true ensemble E^Q on both sides as

g*_− ≤ Ḡ(ρ^Q) ≤ g*_+.

This reflects the best possible state of knowledge we could have about the average Ḡ, based on our observations.
Optimization problems over convex sets such as Eq. (27) have been well studied in a wide variety of contexts. Most prominently, much work has gone into the study of convex optimization, which is concerned with minimizing convex functions (equivalently, maximizing concave functions) over convex sets. Recall that a function Ḡ : K → R is convex if for any two ρ, ρ′ ∈ K and any λ ∈ [0, 1], we have

Ḡ(λρ + (1 − λ)ρ′) ≤ λḠ(ρ) + (1 − λ)Ḡ(ρ′).

Convex optimization tasks enjoy many useful properties, which can be exploited to gain analytical insight into their solutions, and to design efficient algorithms. For this reason, from here on we specialize to cases where the function Ḡ(ρ) is convex, unless mentioned otherwise.
From our construction of Ḡ [Eq. (26)], we see that Ḡ is convex if the function G(ρ^Q_z), the average of which we are interested in, is itself a convex function over density matrices. Some particularly pertinent functions that arise in quantum many-body physics include:

1. Moments of observables.-Powers of expectation values Tr[ρO]^k; the quantum-quantum correlator (15) is an example of this with k = 2. We would need to compute such a quantity if we wanted to know the variance of ⟨O⟩ over the ensemble E^Q. We note that Tr[ρO]^k is convex for even k ≥ 2, or for any positive integer k when O is positive semi-definite.

2. Entropies.-Often we are interested in an entropy associated with a quantum state ρ, or an entropy of a subsystem of Q. The von Neumann entropy S(ρ) = −Tr[ρ log ρ] is a concave function of ρ, as are the Rényi entropies S_α(ρ) = (1 − α)^{−1} log(Tr[ρ^α]) for α ∈ (0, 1) [58]. Since our subsequent analysis refers explicitly to convex functions, one can simply work with −S(ρ), which is convex. One should then bear in mind that the roles of minimization and maximization of the objective function are exchanged.

3. Purities.-A closely related object is the purity Tr[ρ²], along with its generalizations to higher powers Tr[ρ^k]. These are equal to exponentials of the Rényi entropies S_{α=k}(ρ), and hence give information on how mixed the states are, either globally or within a subsystem. It is straightforward to show that Tr[ρ^k] is convex in ρ for k ≥ 1.
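The convexity of Tr[ρ^k] asserted above is easy to spot-check numerically by sampling random pairs of states and a random mixing parameter (an illustrative check of ours, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)

def trace_power(rho, k):
    """Tr[rho^k] for a Hermitian matrix rho."""
    return np.trace(np.linalg.matrix_power(rho, k)).real

def random_state(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

# convexity check for k = 2, 3 on random qutrit pairs:
# Tr[(lam*rho + (1-lam)*sigma)^k] <= lam*Tr[rho^k] + (1-lam)*Tr[sigma^k]
for k in (2, 3):
    for _ in range(100):
        rho, sigma = random_state(3), random_state(3)
        lam = rng.uniform()
        mix = lam * rho + (1 - lam) * sigma
        assert trace_power(mix, k) <= (lam * trace_power(rho, k)
                                       + (1 - lam) * trace_power(sigma, k) + 1e-12)
print("Tr[rho^k] convex on all sampled pairs")
```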
While our specialization to convex functions may seem restrictive, we note that any function G(ρ) whose Hessian (matrix of second derivatives) is bounded can be decomposed as a sum of a convex and a concave function [59], and hence each component can be bounded separately using the methods described in the following.

C. Duality and certificates for convex optimization problems
Working with the understanding that Ḡ(ρ) is a convex function, we now describe our approach to finding the extrema g * ± , or approximations thereof.Several standard techniques and results from the field of convex optimization will be used in the following; we refer the interested reader to Ref. [60] for an introduction to the field and proofs of various pertinent results.
The standard approach to optimization problems with equality constraints of the form (19) is to make use of Lagrange multipliers. For each constraint i = 1, ..., R we introduce a scalar Lagrange multiplier λ_i, and define the Lagrange dual functions as follows [60]:

h_−(λ_i) := inf_{ρ∈K_0} L(ρ, λ_i),  h_+(λ_i) := sup_{ρ∈K_0} L(ρ, λ_i),  (30)

where the Lagrangian L(ρ, λ_i) is given by

L(ρ, λ_i) := Ḡ(ρ) + Σ_i λ_i (b_i − Σ_z p_z Tr[A^(i)_z ρ^Q_z]).

We emphasise that the domain in Eqs. (30) is K_0, without the linear constraints (19). This is significant because L(ρ, λ_i) is a sum over z, and thanks to the structure of K_0, we can perform the extremization for each block ρ^Q_z separately.
An important concept in the study of optimization problems is that of strong and weak duality, which relate the original optimization task (the 'primal problem') to a particular secondary task (the 'dual problem'). The dual problem is to find the extreme points

h*_− := max_{λ_i} h_−(λ_i),  h*_+ := min_{λ_i} h_+(λ_i).

Since h_±(λ_i) are by definition the pointwise supremum and infimum of a family of affine functions, they are convex and concave functions of λ_i, respectively. The dual problems are therefore convex optimization problems, even if the primal is not.
Strong duality is the statement that the solutions to the primal and dual problems are identical.In our case, by Slater's condition [60], strong duality holds for the minimization problem, i.e. h * − = g * − , provided that Ḡ(ρ) is convex.As for the maximization problem, this is not a convex optimization problem, since it is equivalent to minimizing − Ḡ, which is concave by assumption.There is still some useful structure exhibited by convex maximization problems, which we will discuss in Section V D, but for the time being we will instead rely on weak duality, which holds irrespective of the nature of Ḡ. Weak duality states that h * − ≤ g * − and h * + ≥ g * + .This general property can be proved using the min-max inequality.
While many efficient algorithms exist that allow one to solve convex optimization problems numerically, the high dimensionality of the primal problem (as well as the stochastic nature of the observations we make) means that we cannot directly employ such algorithms as they stand. One option is to apply such methods to the dual problem, which has dimensionality R, rather than |Z| × d², thus making the problem more manageable. While this numerical approach is perfectly feasible, here we choose to study these problems analytically, in order to gain more insight into their structure. While exact solutions to the optimization problems are not always obtainable using analytic methods, we can still invoke strong or weak duality, which allow us to find primal-dual certificates: that is, even if we only have an approximate solution {λ_i} to the dual problem, we always have the following series of inequalities

h_−(λ_i) ≤ h*_− ≤ g*_− ≤ Ḡ(ρ^Q) ≤ g*_+ ≤ h*_+ ≤ h_+(λ_i).  (32)

Thanks to these relations, if we can evaluate h_−(λ_i) for some set of Lagrange multipliers, we can certify that the true minimum is no less than h_−(λ_i). Thus, even if we cannot determine the exact value of the extrema, we can use suboptimal solutions of the dual problem to obtain conservative estimates of g*_±, i.e. rigorous bounds that are guaranteed not to overestimate the minimum (or underestimate the maximum). In lieu of an exact solution to either the primal or dual problems, our aim is then to find a way of obtaining h_±(λ_i) for near-optimal λ_i, so as to obtain as tight a bound as possible.
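The certificate logic can be made tangible on the single-qubit toy problem of minimizing the purity subject to Tr[Zρ] = b. In Bloch coordinates one can work out the dual function in closed form (the derivation below the comment is ours, for this toy only): every dual evaluation certifies a lower bound, and the optimal multiplier recovers the primal optimum.

```python
# Toy dual certificate for: minimize Tr[rho^2] subject to Tr[Z rho] = b (qubit).
# In Bloch coordinates, minimizing (1 + |r|^2)/2 - lam*(r_z - b) over |r| <= 1
# gives r = (0, 0, lam) for |lam| <= 1, hence
#   h_minus(lam) = (1 - lam**2)/2 + lam*b,
# while the primal optimum is g_star = (1 + b**2)/2, attained at lam = b.
def h_minus(lam, b):
    assert abs(lam) <= 1
    return (1 - lam**2) / 2 + lam * b

b = 0.6
g_star = (1 + b**2) / 2
for lam in (0.0, 0.3, b, 0.9):
    # weak duality: every dual value is a certified lower bound on the primal
    assert h_minus(lam, b) <= g_star + 1e-12
print(h_minus(b, b), g_star)   # dual optimum matches the primal: strong duality
```

Suboptimal λ (e.g. λ = 0.3) still yields a valid, merely looser, certificate, which is exactly how the analytic bounds below are used.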
By treating these optimization problems analytically, rather than numerically, we will gain valuable insight that informs us how to choose the operators A (i) z from the very beginning so as to obtain small feasible ranges; we will have this in mind in Section VI and beyond.

D. Minimization vs. Maximization
Before we begin the process of constructing explicit bounds for specific observables, it is important to point out a fundamental difference between minimization and maximization problems for a given convex average Ḡ(ρ). Namely, the former is a convex optimization problem (minimizing a convex function over a convex set), while the latter is not. This difference will turn out to have important consequences for the tightness of the bounds that one can infer based on experimental observations. While the maximization problem cannot be cast into a standard convex optimization problem, we can still invoke a useful property that results from its special structure: the maximum of a convex function over a convex set is always attained at at least one extreme point of the set. Recall that the extreme points of a convex set C are those elements τ ∈ C which cannot be expressed as a nontrivial convex combination of two other elements. Specifically, τ is extreme if and only if τ = λτ′ + (1 − λ)τ″ for some λ ∈ (0, 1) implies τ′ = τ″ = τ [60]. If G(τ) is a convex function over C, then for any non-extreme τ we can find elements τ′, τ″ ∈ C with τ = λτ′ + (1 − λ)τ″ and max{G(τ′), G(τ″)} ≥ G(τ), which proves the claim stated above.
By considering the spectral decomposition of a density matrix ρ, one can see that the extreme points of the space D are the pure states ρ = |ϕ⟩⟨ϕ|. This structure is naturally reflected in the extreme points of K. In Appendix A, we prove the following: Theorem 1. Any ensemble ρ that is an extreme point of the set K [Eq. (23)] has at most R states that are not globally pure, where R is the number of linear constraints in Eq. (19).
Since K is necessarily nonempty (the true ensemble E^Q lies within K), and is a closed, bounded subset of M = B(H)^⊕|Z|, at least one extreme point must exist, and hence we immediately have the following. Corollary. Given the values of R scalar measurable properties (19), there exists an ensemble consistent with these observations for which at most R states are not globally pure.
As a result, in the no-coincidence limit, where R p_z ≪ 1, we can infer that the solution to the maximization problem is extremely close to the solution of the same problem with the additional constraint that all states are pure. This fact imposes fundamental limitations on how narrow the feasible range (20) can be made in cases where the actual state ensemble E^Q is significantly mixed. We will consider this issue in more detail in Section VIII A.

VI. CONSTRUCTING TWO-SIDED BOUNDS
Having outlined the general structure of our optimization-based approach in the previous section, we can now consider a range of different averaged properties that are of particular physical relevance, and construct explicit upper and lower bounds for each.Because the minimization and maximization problems have distinct characters, it is helpful to consider the two separately for each observable.The collection of quantities considered here is by no means exhaustive, and analogous bounds can be constructed for other observables, but the examples we choose encompass a broad range of quantities that are pertinent to measurement-induced dynamics.
The logical arguments used to derive bounds for each quantity follow much the same pattern, and so after deriving the first several cases, we will simply quote the remainder of our results, leaving the proofs to Appendix B. Readers who are mainly concerned with the results of our calculations, rather than the detailed derivations, may skip the bulk of this section, and instead consult Table I, where references to specific bounds are listed.

A. Global purity lower bound
One of the simplest possible nonlinear averages that we can consider is the averaged global purity, G(ρ^Q_z) = Tr[(ρ^Q_z)²] (to be distinguished from the purity of a subsystem of Q, which we treat in Section VI C). As mentioned above, this is a convex function of ρ^Q_z, and hence the minimization problem can be solved using convex optimization techniques. The dual function (30a) for this problem can be formally defined by specializing the general construction of Section V C to this choice of G. By virtue of the product structure of K_0 ≡ D^⊕|Z|, we can minimize with respect to each conditional state ρ^Q_z separately, with the result depending only on the combinations C_z = Σ_i λ_i A^(i)_z. Here we have defined a function F_{2,−}(C) over the space of traceless Hermitian operators, given by the single-block minimization of the Lagrangian. Evidently, this function depends only on the eigenvalues of C. In principle, a fully general expression for it can be found; however, to make progress in the following we will use a simple lower bound on F_{2,−}(C). The right-hand side is readily maximized over the Lagrange parameters λ_i, which gives us a lower bound (38) for the solution of the dual problem, expressed in terms of a matrix built from the measured expectation values [Eq. (40)]. The lower bound (38) corresponds to a particular choice of the dual parameters λ_i,

and thus the bound becomes optimal, h_−(λ_i) = h*_−, when the corresponding optimality condition is satisfied. This condition may or may not be met for any given set of measurements; regardless, Eq. (38) constitutes a viable certificate by way of strong duality (32), in that the averaged global purity of the true ensemble E^Q can be no less than the right-hand side.

TABLE I. Examples of ensemble-averaged quantities for which we derive upper and lower bounds, with references to the relevant inequalities derived below: global purity [lower bound Eq. (45); upper bound vacuous*], quadratic observables and subsystem purities [Eqs. (49, 52)], von Neumann entropy [Eqs. (63, 64) and Eqs. (56, 58)], and the frame potential. Here, Q_1 denotes a subsystem of Q, N is an arbitrary linear map over the space of operators, and the notation N ⪰ 0 indicates that this map is positive semi-definite, i.e. ⟪X|N|X⟫ ≥ 0 for all operators X. In some cases, two bounds are quoted, one of which is simple to evaluate, with the other being more versatile and amenable to optimization. The asterisk indicates that the bound is vacuous.

Incorporating classical simulations
Having reached this point, we can revisit the original problem and ask: how could we have chosen the operators A_z in the first place in order to make our bound (38) as close as possible to the true value of the average E_z G(ρ^Q_z)? We can use the discussion of Section IV to guide our intuition. There, we posited that the quantum-classical correlator (16), which is a measurable quantity, would serve as a good proxy for the quantum-quantum correlator (15), on account of the fact that the two coincide in the limit of perfect classical simulation ρ^Q_z = ρ^C_z. Here, we can define an analogous 'quantum-classical purity' E_z Tr[ρ^Q_z ρ^C_z], by analogy to Eq. (40), which can evidently be cast in the form of an estimable quantity (8), and indeed is equal to the desired average when ρ^Q_z = ρ^C_z. Accordingly, it is instructive to consider an example where this is the only constraint we use, i.e. R = 1, with A_z = ρ^C_z, which yields the simple inequality (41). This inequality can also be proved by independent means using the Cauchy-Schwarz inequality, first applied to the Hilbert-Schmidt inner product, and then to the average over z. This serves as a useful sanity check for our series of bounds h_− ≤ g*_−. We observe that as the quality of the classical simulation improves, the corresponding lower bound should increase, resulting in tighter bounds.
We emphasise, however, that Eq. (38) is far more versatile as a bound than the simple inequality (41). In particular, we can incorporate multiple constraints (19), which allows us to use more information than just the averaged overlap between classical and quantum states. Since the purity is a quadratic function of the density matrix, this suggests using more general quantum-classical correlators that are bilinear in ρ^Q_z and ρ^C_z. For this purpose, we introduce the superoperators (linear maps over the space of operators)

η^QC := Σ_z p_z |ρ^Q_z⟫⟪ρ^C_z|,  η^CC := Σ_z p_z |ρ^C_z⟫⟪ρ^C_z|.

Thinking of these as (d² × d²)-dimensional matrices, one can see that all possible quantum-classical and classical-classical correlators can be extracted from the above objects: for any operators A, B, one has ⟪A|η^QC|B⟫ = ⟨A† ⊗ B⟩^QC, and similarly for η^CC. Therefore, if we were to construct a complete basis of operators {σ_μ} and measure all correlators of the form ⟨σ_μ ⊗ σ_ν⟩^QC for μ, ν = 1, ..., d², we could fully reconstruct η^QC. For each one of these measured correlators we will have a Lagrange multiplier λ_i, and these can also be organized into superoperator form, which we call ζ̃^CQ. When arranged in this way, the dual function (37) becomes an expression involving STr[η^QC ζ̃^CQ], where STr[η] denotes the trace of a superoperator η, which we can write in terms of a complete orthonormal basis of operators σ_μ as STr[η] = Σ_μ ⟪σ_μ|η|σ_μ⟫. While this holds for any choice of ζ̃^CQ, by analogy to Eq. (38) we can find the optimal choice ζ^CQ (written without a tilde), which is the solution to the superoperator equation (guaranteed to exist)

η^CC ζ^CQ = (η^QC)†.  (44)

(If η^CC has an inverse, we could write ζ^CQ = (η^CC)^{−1}(η^QC)†.) At this dual-optimal point, we obtain the bound

E_z Tr[(ρ^Q_z)²] ≥ STr[η^QC ζ^CQ].  (45)

Eq. (45) is our first concrete inequality that can be straightforwardly calculated using quantum-classical correlators. Evidently, in the case of perfect classical simulation ρ^C_z = ρ^Q_z, we have ζ^CQ = id, and the inequalities in (45) are both saturated. Hence, although we have lost some tightness in our calculations, we expect the bound to be near-optimal when the classical simulation is not quite perfect. We will benchmark how tight these inequalities are in various scenarios in Section VII.
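The dual-optimal purity bound can be checked numerically on a small synthetic ensemble. The sketch below (our own toy, with Ginibre-random states standing in for the device and a deliberately noisy 'simulation'; `pinv` is used in place of solving the superoperator equation, which coincides with it on the support of η^CC) represents the superoperators as d² × d² matrices acting on vectorized operators:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_state(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def vec(X):
    return X.reshape(-1)

# toy ensemble: "quantum" states and imperfect "classical" simulations of them
d, n_labels = 2, 6
p = np.full(n_labels, 1 / n_labels)
rho_Q = [random_state(d) for _ in range(n_labels)]
rho_C = [0.9 * r + 0.1 * random_state(d) for r in rho_Q]     # noisy simulation

# superoperators as d^2 x d^2 matrices: eta^QC = sum_z p_z |rho^Q_z>><<rho^C_z|
eta_QC = sum(pz * np.outer(vec(q), vec(c).conj()) for pz, q, c in zip(p, rho_Q, rho_C))
eta_CC = sum(pz * np.outer(vec(c), vec(c).conj()) for pz, c in zip(p, rho_C))

# dual-optimal zeta^CQ = (eta^CC)^+ (eta^QC)^dagger, and the certified bound
zeta_CQ = np.linalg.pinv(eta_CC) @ eta_QC.conj().T
bound = np.trace(eta_QC @ zeta_CQ).real
true_avg_purity = sum(pz * np.trace(q @ q).real for pz, q in zip(p, rho_Q))
print(bound <= true_avg_purity + 1e-10, round(bound, 3), round(true_avg_purity, 3))
```

The certified value never exceeds the true averaged purity, and approaches it as the simulated states approach the device states.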

B. Global purity upper bound
In searching for an upper bound for the global purity, we could in principle set up the maximization problem (27) in an entirely analogous way to the above, finding an upper bound for h*_+ and exploiting weak duality to obtain a corresponding bound for g*_+. However, the global purity has a special significance in this context, which means that an immediate answer can be obtained by invoking the corollary of Theorem 1: we know that there exists at least one ensemble consistent with the measurements (19) for which no more than R of the states ρ^Q_z are mixed. The existence of such an ensemble immediately implies that g*_+ must be at least 1 − R max_z p_z. In the no-coincidence limit p_z ≪ 1, this is very close to 1, which is itself a universally applicable upper bound for the averaged purity. We conclude that, regardless of which observables A_z we measure, we cannot obtain a non-vacuous upper bound for the averaged global purity. This fundamental obstruction is related to the special significance that global purity has for the structure of K, and in particular its extreme points. We discuss this issue in depth in Section VIII A.

C. Lower bound for subsystem purities and quadratic observables
Moving beyond the global purity, we can consider more general functions G(ρ^Q_z) that are quadratic in the conditional states. Most generally, one can write these as

G(ρ^Q_z) = ⟪ρ^Q_z|N|ρ^Q_z⟫,  (47)

where N is a Hermitian superoperator, i.e. a linear map over the space of operators satisfying ⟪C|N|D⟫ = ⟪D|N|C⟫* for operators C, D. To ensure the convexity of G, we will insist that N is positive semi-definite: ⟪C|N|C⟫ ≥ 0 for any operator C; we write this condition as N ⪰ 0. Examples include the quantum-quantum correlator ⟨O ⊗ O⟩^QQ [Eq. (15)], which corresponds to N = |O⟫⟪O|. Moreover, the purity of some subsystem of Q, which we will denote Q_1 with complement Q_2 = Q\Q_1, can be written in this form: one can express the purity of Q_1 as

Tr[(ρ^Q_{z,Q_1})²] = Σ_{ν∈Q_1} Tr[ρ^Q_z σ_ν]²,

where {σ_ν} is an orthonormal basis of operators that respects the tensor product structure of the Hilbert space H_Q = H_{Q_1} ⊗ H_{Q_2}, and the notation ν ∈ Q_1 denotes that the sum is restricted to those ν for which σ_ν acts as the identity on Q_2. The above implicitly defines the superoperator N corresponding to the subsystem purity.
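The quadratic-form expression for the subsystem purity can be verified directly on a two-qubit example, taking the normalized Pauli matrices as the orthonormal basis on Q_1 (an illustrative check of ours):

```python
import numpy as np

rng = np.random.default_rng(4)

# orthonormal Hermitian basis on one qubit: Pauli matrices / sqrt(2)
I2 = np.eye(2)
paulis = [I2, np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]),
          np.diag([1.0, -1.0])]
basis_Q1 = [P / np.sqrt(2) for P in paulis]

def random_state(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

# two-qubit state; Q1 = first qubit, Q2 = second qubit
rho = random_state(4)
rho_Q1 = np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)  # partial trace over Q2

# subsystem purity as a quadratic form:
# Tr[rho_Q1^2] = sum_{nu in Q1} Tr[rho (s_nu (x) I)]^2
quad = sum(np.trace(rho @ np.kron(s, I2)).real ** 2 for s in basis_Q1)
print(np.isclose(quad, np.trace(rho_Q1 @ rho_Q1).real))
```

Each term Tr[ρ(σ_ν ⊗ I)]² is of the form |O⟫⟪O| applied to ρ, so the sum defines the positive semi-definite superoperator N for the subsystem purity.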
To solve the minimization problem for this class of observables, we use similar logic to that described in Section VI A, with some modifications. Detailed arguments are presented in Appendix B 1, which lead to the bound (49), where ζ^CQ is the superoperator that was defined in Eq. (44). The global purity bound (45) corresponds to the special case where N = id, the identity superoperator. Again, the above inequality is saturated in the limit of perfect classical simulation.

D. Upper bound for quadratic observables
The corresponding maximization problem for averages of the form (47) is not a convex optimization problem, and so cannot be solved in a fully analogous way. Rather than directly solving the dual problem, we instead choose to re-express the problem by first trivially rewriting

⟪ρ^Q_z|N|ρ^Q_z⟫ = ∥N∥_∞ Tr[(ρ^Q_z)²] + ⟪ρ^Q_z|Ñ|ρ^Q_z⟫,  (50)

where we define Ñ := N − ∥N∥_∞ id, and ∥N∥_∞ = max_{⟪C|C⟫=1} ⟪C|N|C⟫ is the spectral norm of N when viewed as a matrix. The significance of the above is that Ñ ⪯ 0 by construction, and hence when we substitute this into Eq. (47), the second term constitutes a concave function of ρ^Q_z. The first term simply gives a contribution proportional to the purity Tr[(ρ^Q_z)²], which by Theorem 1 we know to be very close to unity at the point where the maximum is achieved. Hence, we lose little tightness by replacing the first term with the constant ∥N∥_∞.
Being concave, the average of ⟪ρ^Q_z|Ñ|ρ^Q_z⟫ can be upper bounded using the same method as the lower bound for the convex function (49). Altogether, we obtain the upper bound (52). From Eqs. (49, 52), it becomes clear that we can constrain the average of (47) to within a window whose width is determined by the quantity (1 − STr[η^QC ζ^CQ]). Indeed, from Sections VI A and VI B, this quantity is itself equal to the range of values that the global purity can take. This highlights the significance of Theorem 1: if the true states ρ^Q_z realised by the device are themselves mixed, then the purity cannot be constrained to be within an arbitrarily narrow window, and in turn, quadratic observables of this kind cannot be tightly constrained either. Indeed, even before performing any of the manipulations in this section, one could straightforwardly show that the feasible range of the observable (47), for which we use the shorthand ran(N) = g*_+ − g*_−, is lower bounded in terms of the feasible range of the purity. Hence, if there is large uncertainty in the purity, then there must also be large uncertainty in certain quadratic observables (see also Section VIII A).

E. von Neumann entropy upper bound

The von Neumann entropy S(ρ^Q_z) = −Tr[ρ^Q_z log ρ^Q_z] is another quantity that can be used to characterize the mixedness and/or entanglement of quantum states, and which has particular information-theoretic significance. This function is concave in ρ^Q_z, and so in this instance the maximization problem is most easily addressed. The dual function for the maximization problem is obtained by defining, by analogy to F_{2,−}(C), the corresponding single-block maximization, which is attained for ρ = e^{−C}/Tr[e^{−C}]. Hence, we have the upper bound (56). This upper bound can be applied quite generally, and can even be used to re-derive a result that was obtained in Ref. [34] using alternative arguments based on the quantum relative entropy: if we have a single constraint with A_z = −log ρ^C_z, then setting λ = 1 in the above, we obtain the certificate

E_z S(ρ^Q_z) ≤ E_z [−Tr[ρ^Q_z log ρ^C_z]],  (58)

the right-hand side of which was introduced in Ref. [34] as the 'quantum-classical entanglement entropy'. Evidently, we could use exactly the same inequality for the average von Neumann entropy of a subsystem of Q, provided one replaces all density matrices with the corresponding reduced density matrices. The significance of the choice λ = 1 can be understood by recognising that λ = 1 is dual-optimal in the limit of perfect simulation ρ^C_z = ρ^Q_z, since the inequality (58) is then saturated. If there are small discrepancies between the quantum and classical states, one expects that the optimal choice of λ will be shifted slightly, and hence this bound can in principle be tightened by optimizing over λ.
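The entropic certificate at λ = 1 is a direct consequence of the nonnegativity of the quantum relative entropy (Klein's inequality), which is easy to check numerically on random full-rank states (an illustrative check of ours; all function names are ours):

```python
import numpy as np

rng = np.random.default_rng(5)

def random_state(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def vn_entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def matrix_log(rho):
    ev, U = np.linalg.eigh(rho)
    return U @ np.diag(np.log(ev)) @ U.conj().T

# Klein's inequality behind the lambda = 1 certificate: for any full-rank
# classical guess rho_C, the cross term -Tr[rho_Q log rho_C] upper-bounds
# S(rho_Q), with equality iff rho_C = rho_Q.
for _ in range(50):
    rho_Q, rho_C = random_state(3), random_state(3)
    cross = -np.trace(rho_Q @ matrix_log(rho_C)).real
    assert vn_entropy(rho_Q) <= cross + 1e-9
print("S(rho_Q) <= -Tr[rho_Q log rho_C] on all sampled pairs")
```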
However, a more serious problem to address is the instability of the quantum-classical entanglement entropy for singular or near-singular classical states ρ^C_z, since log ρ^C_z diverges when the eigenvalues of ρ^C_z are small. This is a particularly important problem when the ρ^C_z are close to being pure, as noted in Ref. [34]. Here, our more general approach allows one to get around this issue in a systematic way, since the choice A_z = −log ρ^C_z can easily be altered in a way that guarantees numerical stability. A natural choice of regularization is to work directly with the eigenvalue decomposition ρ^C_z = Σ_n q_{z,n} |χ_{z,n}⟩⟨χ_{z,n}|, and to separate out the problematic eigenvalues, namely those that are below some threshold ϵ. We are then free to separately measure the following two quantities,

A^(1)_z = −Σ_{n : q_{z,n} > ϵ} log(q_{z,n}) |χ_{z,n}⟩⟨χ_{z,n}|,  A^(2)_z = Π^≤_z,

where we define the projector onto the near-singular subspace Π^≤_z := Σ_{n : q_{z,n} ≤ ϵ} |χ_{z,n}⟩⟨χ_{z,n}|. The first of these is similar to the original choice A_z = −log ρ^C_z, but with near-singular eigenvalues removed, while the second measures the average weight of ρ^Q_z lying within the near-singular subspace. We can then proceed as before and find an optimal bound based on empirical values of the above two observables. In Appendix B 2, we work out such an upper bound explicitly. As a particularly simple special case, if the classical states are all pure, ρ^C_z = |ϕ^C_z⟩⟨ϕ^C_z|, then this improved bound can be expressed in terms of the binary entropy H_2(p) = −p log p − (1 − p) log(1 − p), the Shannon entropy of a binary random variable.
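The regularized observables described above are straightforward to construct from the eigendecomposition of the simulated state. A minimal sketch (the function name is ours, and this is an illustration of the splitting, not the full bound of Appendix B 2):

```python
import numpy as np

def regularized_log_observables(rho_C, eps=1e-3):
    """Split A_z = -log(rho_C) into a well-conditioned piece and a projector.

    Returns (A1, Pi):
      A1 : -log(rho_C) restricted to eigenvalues above eps (stays bounded)
      Pi : projector onto the near-singular subspace (eigenvalues <= eps)
    """
    ev, U = np.linalg.eigh(rho_C)
    keep = ev > eps
    A1 = U[:, keep] @ np.diag(-np.log(ev[keep])) @ U[:, keep].conj().T
    Pi = U[:, ~keep] @ U[:, ~keep].conj().T
    return A1, Pi

# nearly pure classical state: -log(rho_C) has a huge eigenvalue, A1 does not
rho_C = np.diag([1 - 1e-9, 1e-9])
A1, Pi = regularized_log_observables(rho_C, eps=1e-3)
print(np.max(np.abs(A1)) < 1.0, np.isclose(np.trace(Pi).real, 1.0))
```

Both outputs remain bounded regardless of how singular ρ^C_z is, which is precisely what restores numerical stability to the entropy certificate.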

F. von Neumann entropy lower bound
If we focus on the global von Neumann entropy, then arguments similar to those in Section VI B can be used to show that a non-vacuous lower bound cannot be obtained: Theorem 1 implies the existence of a feasible ensemble whose average entropy is no greater than R max z p z log d, which is small in the no-coincidence limit p z ≪ 1.
As for the von Neumann entropy of a subsystem Q₁, this can in principle be lower bounded, but a direct analysis of the minimization problem is not straightforward, due to the concavity of S(ρ^{Q₁}_z). Instead we use an approach similar to that of Subsection VI D, where we exploited the fact that the maximum of a convex function is attained at an extreme point, which we know to be a (mostly) pure ensemble. There, we rewrote the desired function as the difference between the purity and a convex function [Eq. (50)], thereby allowing the maximization problem to be transformed into a convex optimization task. Here, we use a similar line of reasoning, by trivially rewriting the subsystem entropy in terms of S(ρ_{AB}|B) := S(ρ_{AB}) − S(ρ_B), the conditional quantum entropy for a bipartite state ρ_{AB}. From Theorem 1 and the concavity of S(ρ^{Q₁}_z), the desired minimum is attained for an ensemble that is mostly globally pure, and so the average of S(ρ^Q_z) will be close to 0 at this point. This suggests that we do not lose much tightness in employing the bound. Crucially, S(ρ^Q_z|Q₁) is a concave function of the global state ρ^Q_z (a statement equivalent to strong subadditivity of the von Neumann entropy [61]). Thus, we are left with a convex optimization problem, as desired. In Appendix B 3, we derive a concrete bound of this form for any choice of the Lagrange multipliers λ_i. Again, we can specialise to a particular choice of operators A_i, for which the relevant factor can be shown to be no greater than unity using Theorem 11.29 of Ref. [62]. As with the other optimization problems considered in this section, we can choose to use bounds that are simple and easy to evaluate, such as Eq. (64), or the more versatile expression (63), which is more cumbersome but can in principle be numerically optimized to obtain a tighter inequality.
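The conditional entropy S(ρ_AB|B) = S(ρ_AB) − S(ρ_B) used above is easy to evaluate numerically. The following sketch (our own helper names, assuming numpy) computes it via eigenvalues and a partial trace; note that, unlike its classical counterpart, it can be negative for entangled states.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr[rho log rho], in nats."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop numerical zeros
    return float(-np.sum(evals * np.log(evals)))

def conditional_entropy(rho_AB, dA, dB):
    """S(A|B) = S(rho_AB) - S(rho_B) for rho_AB on C^dA (x) C^dB."""
    # partial trace over A: reshape to (dA, dB, dA, dB) and trace axes 0, 2
    rho_B = np.trace(rho_AB.reshape(dA, dB, dA, dB), axis1=0, axis2=2)
    return von_neumann_entropy(rho_AB) - von_neumann_entropy(rho_B)
```

For a two-qubit Bell state the result is −log 2 (maximally negative), while for the maximally mixed two-qubit state it is +log 2.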

G. Frame potential
The frame potential is a property of a quantum state ensemble that characterises the variation between different quantum states in the ensemble; for any integer k, it is defined in Eq. (65). For pure state ensembles, the frame potential can be used to characterize how far an ensemble is from being a quantum state k-design [63,64], namely an ensemble for which the kth moments Σ_z p_z (ρ^Q_z)^{⊗k} coincide with those of the Haar ensemble [48,65] (see Section VII B).
Although the quantity (65) is not manifestly of the form of the averages (3) considered so far, we can use a simple trick to bring it into the appropriate form. Focusing on k = 2 for now, we can write it in terms of η_{QQ}, defined by analogy to Eq. (42). We do not have access to η_{QQ} in advance, but from the bounds (49, 52) we can infer two-sided bounds on it, where the notation N ⪯ N′ for two superoperators N, N′ means ⟪C|N|C⟫ ≤ ⟪C|N′|C⟫ for all operators C. Moreover, a corresponding relation follows from the definition of η_{QQ}. Hence, successive application of the bounds (49, 52) yields Eq. (67a). More generally, for higher k the frame potential can be expressed in terms of a new 'doubled' ensemble E_{QQ}. In words, samples from this ensemble correspond to independently sampled pairs (z, z′) of the original ensemble E_Q, and the corresponding states are tensor products. Then, the frame potential (65) of E_Q can be interpreted as an average of the form (3) for E_{QQ}, where the function G is expressed in terms of a density operator σ on the doubled space and the swap operator π_S = Σ_{a,b} |a ⊗ b⟩⟨b ⊗ a|. Bounds for averages of kth powers of expectation values, such as the above, can in principle be derived using approaches similar to those described in this section.
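For a pure-state ensemble, the frame potential reduces to a double sum over pairwise overlaps, F^(k) = Σ_{z,z′} p_z p_{z′} |⟨ψ_z|ψ_{z′}⟩|^{2k}, which can be evaluated directly when the states are known. A minimal sketch (our own function name, assuming numpy):

```python
import numpy as np

def frame_potential(probs, states, k=2):
    """F^(k) = sum_{z,z'} p_z p_{z'} |<psi_z|psi_{z'}>|^{2k}
    for a pure-state ensemble; `states` is a list of state vectors."""
    V = np.array(states)
    G = np.abs(V.conj() @ V.T) ** (2 * k)   # Gram matrix of |overlaps|^{2k}
    p = np.asarray(probs)
    return float(p @ G @ p)
```

As a sanity check, the uniform ensemble over the two orthogonal states |0⟩ and |1⟩ has F^(2) = (1/4)(1 + 0 + 0 + 1) = 1/2, while a single repeated state gives F^(2) = 1.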

VII. NUMERICAL BENCHMARKING
Having derived various bounds for certain ensemble-averaged quantities of interest, we now present results of numerical experiments in which we simulate the full procedure described in this work, from estimation of the measurable parameters (19) through to construction of the bounds.

A. Projected ensemble
As a test bench for our method, we consider the projected ensemble of a many-body state generated by (possibly noisy) finite-time evolution from a product state. For the noiseless case, we take as the pre-measurement state |Ψ(t)⟩ = U(t)|0^{⊗N}⟩, where t is an integer-valued discrete time, and U(t) = U_F^t is a unitary generated by Floquet evolution with Floquet unitary U_F = e^{−it₂H₂} e^{−it₁H₁}. We choose the Hamiltonians H_{1,2} to be given by the one-dimensional tilted-field Ising model with open boundary conditions, H_α = Σ_{j=1}^{N−1} J_{j,α} Z_j Z_{j+1} + Σ_j (h^x_{j,α} X_j + h^z_{j,α} Z_j) for α = 1, 2, where (X_j, Y_j, Z_j) are Pauli operators on site j. For the purposes of this subsection, we fix J_{j,α} = 1, h^z_{j,α} = (1 + √5)/2, h^x_{j,1} = 0.4, h^x_{j,2} = −0.6. The projected ensemble considered here is defined as the set of post-measurement states arising when N − 1 qubits are measured in the computational (Pauli-Z) basis, with the qubit on site j = 1 acting as the 'unmeasured' qubit. In cases where noise is present, we apply an amplitude damping channel of strength p_dec on every qubit after each Floquet period; concretely, in each period the system density matrix is evolved by U_F and then passed through the on-site damping channels. In the noisy case, where we must compute the full evolution of the density matrix, our exact diagonalization results are limited to relatively small system sizes. To ensure cross-comparability between all cases, we fix N = 10 throughout. Note that the specific form of evolution we choose, along with the parameters selected, does not have a great deal of bearing on the performance of our method; indeed, we have considered several different settings and found quantitatively similar performance. For the purposes of this section, we focus on a quantity for which we can use the bounds of Eqs. (49, 52); again, this choice is not particularly important.
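The noiseless construction above can be sketched in a few lines of numpy. This is an illustrative toy (not the paper's code): we use a small N for speed, and the Floquet half-periods t₁ = t₂ = 1 are our own assumption, since their values are not fixed in the text.

```python
import numpy as np

def pauli_chain(op, site, N):
    """Embed a single-site operator `op` at `site` in an N-qubit chain."""
    out = np.array([[1.0]], dtype=complex)
    for j in range(N):
        out = np.kron(out, op if j == site else np.eye(2))
    return out

def ising_hamiltonian(N, J=1.0, hx=0.4, hz=(1 + 5**0.5) / 2):
    """Open-boundary tilted-field Ising chain, as in the main text."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Z = np.diag([1.0, -1.0]).astype(complex)
    H = np.zeros((2**N, 2**N), dtype=complex)
    for j in range(N - 1):
        H += J * pauli_chain(Z, j, N) @ pauli_chain(Z, j + 1, N)
    for j in range(N):
        H += hx * pauli_chain(X, j, N) + hz * pauli_chain(Z, j, N)
    return H

def expmh(H, t):
    """exp(-i t H) for Hermitian H, via eigendecomposition."""
    e, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * e)) @ V.conj().T

def projected_ensemble(psi, N):
    """Measure qubits 2..N in the Z basis; return (p_z, |psi_z>) pairs
    for the single unmeasured qubit on site 1."""
    amps = psi.reshape(2, 2**(N - 1))     # (unmeasured qubit, rest)
    ensemble = []
    for z in range(2**(N - 1)):
        v = amps[:, z]
        p = float(np.vdot(v, v).real)
        if p > 1e-12:
            ensemble.append((p, v / np.sqrt(p)))
    return ensemble
```

Running two Floquet periods on |0^{⊗N}⟩ and projecting then yields a normalized ensemble of conditional single-qubit states.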
We simulate the full protocol using the shadow tomography-based method described in Section III C (Fig. 2) to extract the values of the estimable parameters (8). Specifically, the procedure involves: 1) preparation of the pre-measurement state; 2) applying random on-site Clifford unitaries to the unmeasured qubits; 3) projectively measuring all qubits; 4) processing the measurement outcomes via the dual frame (13), in terms of which estimators for the quantum-classical correlators can be obtained; and finally 5) using the bounds derived in the previous section to constrain the value of the desired quantity. This is repeated a finite number of times M.
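The single-qubit shadow estimation at the heart of steps 2)-4) can be illustrated as follows. This is a simplified sketch of our own, using random Pauli-basis measurements (a standard special case of random on-site Clifford shadows) and the familiar single-qubit inverse channel ρ̂ = 3U†|b⟩⟨b|U − I; it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# measurement bases: columns are the eigenvectors of X, Y, Z respectively
BASES = [np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2),   # X
         np.array([[1, 1], [1j, -1j]], dtype=complex) / np.sqrt(2), # Y
         np.eye(2, dtype=complex)]                                  # Z

def shadow_snapshot(rho):
    """One random-Pauli-basis measurement, inverted through the dual
    frame: rho_hat = 3 |u_b><u_b| - I for the observed eigenvector u_b."""
    U = BASES[rng.integers(3)]
    probs = np.real(np.diag(U.conj().T @ rho @ U))
    b = rng.choice(2, p=probs / probs.sum())
    ket = U[:, b][:, None]
    return 3 * (ket @ ket.conj().T) - np.eye(2)

def estimate_expectation(rho, A, M=20000):
    """Monte-Carlo estimate of Tr[A rho] from M shadow snapshots."""
    return float(np.mean([np.real(np.trace(A @ shadow_snapshot(rho)))
                          for _ in range(M)]))
```

Averaging M snapshots gives an unbiased estimate of Tr[Aρ] with statistical error decaying as 1/√M, matching the sample-complexity discussion below.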
The finite number of repetitions M means that there will be some residual uncertainty in the values of the chosen estimable parameters.
Taking the example of the correlators (42), we will obtain an estimate η̂_QC = η_QC + ε̂_QC, where η_QC is the true value and ε̂_QC is a zero-mean error whose standard deviation decays as 1/√M (hats are used to denote random variables here). Despite this uncertainty, we can still employ the bounds in a rigorous manner as follows. We use the estimated value η̂_QC as the parameters for finding the optimal Lagrange multipliers [in this case we substitute η̂_QC into Eq. (44) to find ζ̂_CQ]. This gives us a set of Lagrange multipliers that is approximately dual-optimal for the true problem, which features η_QC rather than η̂_QC. These Lagrange multipliers can then be substituted into the dual function [in this case Eq. (43)], which, by virtue of being linear in b_i (equivalently η_QC), can be evaluated with error bars based on estimates of the standard deviation of b̂_i, along with some pre-chosen confidence level; we pick 99% confidence throughout. One can then guarantee that the true value is no smaller or greater (as appropriate) than the constructed value with probability at least 99%.
As a first check, we consider the best-case scenario, where the dynamics is noiseless and the classical simulation is perfect, ρ^C_z = ρ^Q_z. We show upper and lower bounds for Ḡ as calculated via our method, along with the true value, in Fig. 3. Because the ensemble states are pure and the simulation is perfect, the only source of uncertainty in the inferred value of Ḡ is the finite number of samples M, which we deal with as described above. We show bounds empirically constructed for two different sample sizes M; both the lower and upper bounds tend towards the true value as M is increased indefinitely, with corrections scaling as 1/√M.

FIG. 3. Estimates of the averaged quantity Ḡ for the projected ensemble generated from noiseless dynamics as described in the main text. The true value is shown, along with bounds constructed using our method for two different choices of the number of experimental repetitions, M = 5000 and M = 50 000. In this case, classical simulation is perfect and the states in the ensemble are pure, so as M is increased both upper and lower bounds converge asymptotically towards the true value.

We can now introduce some inaccuracy in the classically simulated states ρ^C_z. For this purpose, we consider the projected ensemble at a fixed time t = 8, using the same 'quantum' states ρ^Q_z as before. To construct the imperfect classical states, we employ the same Floquet evolution as above, but now with some additional spatially-dependent randomness in each of the Hamiltonian parameters J_{j,α}, h^{x,z}_{j,α}. For each parameter, we pick a perturbed value J̃_{j,α} = J_{j,α}(1 + f n_{j,α}) (and similarly for h^{x,z}_{j,α}), where the n_{j,α} = ±1/2 are chosen independently at random for each parameter, but kept fixed in all of the data shown here. The free parameter f represents a fractional uncertainty in the Hamiltonian parameters; as it is increased, the quantum and classical states ρ^Q_z, ρ^C_z become less correlated. We keep track of the accuracy of the classical states in our numerical experiment by evaluating the quantity ∆_QC := Σ_z p_z ∥ρ^Q_z − ρ^C_z∥₁, where ∥ρ − σ∥₁ is the trace distance.
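The accuracy metric ∆_QC is straightforward to evaluate when the states are available as density matrices. A minimal sketch (our own helper names, assuming numpy; note that trace-distance conventions differ by a factor of 1/2, and we adopt the normalization in which orthogonal pure states are at distance 1, consistent with the random-state reference value quoted below):

```python
import numpy as np

def trace_distance(rho, sigma):
    """T(rho, sigma) = (1/2) ||rho - sigma||_1, i.e. half the sum of the
    absolute eigenvalues of the (Hermitian) difference."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

def ensemble_distance(probs, quantum_states, classical_states):
    """Delta_QC = sum_z p_z T(rho^Q_z, rho^C_z)."""
    return float(sum(p * trace_distance(q, c)
                     for p, q, c in zip(probs, quantum_states, classical_states)))
```

With this normalization, a perfect simulation gives ∆_QC = 0, and completely uncorrelated pure qubit states give values up to 1.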
Figure 4 shows data obtained using these inaccurate classical states. We show both the bounds constructed from a finite number of experimental samples, M = 50 000, along with 'asymptotic' (i.e. M → ∞) bounds, which we obtain by evaluating the right-hand sides of Eqs. (49, 52), thus eliminating any statistical uncertainty. We see that as the fractional uncertainty f is increased, the distance between the quantum and classical states increases, and the bounds become less tight. For reference, the value of ∆_QC that would be obtained if the ρ^C_z were chosen as independent random states would be 1; thus the data towards the right of this plot represents fairly poor simulation. In the absence of noise, and for the specific quantity considered here, crude estimates indicate that the deviation between the bounds and the true value is upper bounded by a quantity of order (∆_QC)², at least in the regime of small ∆_QC. This quadratic dependence on the distance between quantum and classical states explains why we see good performance even when there is an appreciable difference between the two states.
FIG. 4. Effect of the fidelity of the classically simulated states ρ^C_z on the tightness of the bounds for the noiseless projected ensemble at time t = 8. The classical states ρ^C_z are perturbed away from the ensemble states ρ^Q_z by introducing a fractional uncertainty f ∼ δJ/J in the Hamiltonian parameters, as described in the main text. In the top panel, we include both bounds constructed from shadow tomographic data with M = 50 000 repetitions, along with asymptotic bounds, where we evaluate the right-hand sides of the inequalities (49, 52) without any statistical uncertainty; this is the value that one would obtain as M → ∞. The bottom panel shows the average trace distance ∆_QC between quantum and classical states, which acts as a measure of how faithful the simulated states are.

Finally, we consider the case where both noise and inaccuracies in the classical states are present, which reflects the nature of realistic experiments. Again we fix t = 8, and now consider amplitude damping of strength p_dec = 0.002 for each qubit and each timestep. Although this value of p_dec appears small, the expected number of errors in the full circuit can be estimated as N × t × p_dec = 0.24, which has an appreciable effect on the conditional states, as can be seen from their mean purity, which here is 0.95. The classical states are generated in the same way as the ensemble states, including the noise, but with additional randomness in the Hamiltonian parameters of strength f, introduced in the same manner as above.

The results are shown in Fig. 5. The data follows most of the same trends as before, with higher fractional uncertainty leading to higher deviation ∆_QC, and in turn less tight bounds. However, because the dynamics is noisy, the states in the ensemble ρ^Q_z are no longer pure; in this case their average purity is Σ_z p_z Tr[(ρ^Q_z)²] = 0.95. Unlike the lower bound, the upper bound saturates at a value that is strictly above the true value of Ḡ, even in the limit of perfect simulation (f = 0). This can be seen as a consequence of Theorem 1: we saw already in Section VI B that the global purity cannot be tightly bounded, and the same goes for other quadratic observables. This issue is discussed in detail in Section VIII A.

FIG. 5. Same as Fig. 4, but with noisy dynamics (p_dec = 0.002), resulting in a projected ensemble of mixed states with average purity Σ_z p_z Tr[(ρ^Q_z)²] = 0.95. The lower bound approaches the true value in the limit of perfect simulation, but the upper bound does not, as a consequence of Theorem 1.

B. Application: Verifying emergent quantum state designs
One feature of the projected ensemble that has attracted much interest recently is that, under rather generic conditions, the ensemble of post-measurement states turns out to form an (approximate) quantum state k-design [14,15,17]. That is, the kth moments of the ensemble agree (closely) with the corresponding moments of the Haar ensemble, Π_sym/Tr[Π_sym], where Π_sym is the projector onto the permutation-symmetric subspace of (H_Q)^{⊗k}. Such ensembles of states are in a certain sense 'maximally random', and this makes them useful for certain tasks including quantum state and channel tomography [35,36,38,39,54] and cryptography [37]. Here we demonstrate how our method can be employed to verifiably conclude whether or not the projected ensemble realised in an experiment is an approximate k-design.
For ensembles of pure states, the frame potential constitutes a measure of how far the ensemble is from being a k-design. Specifically, in Ref. [64] it was shown that the frame potential (65) determines the (normalized) Frobenius distance between the moments (70, 71), with the scale set by F^(k)_Haar, the frame potential for the Haar ensemble. Thus, a k-design is formed precisely when the frame potential attains its minimum value. However, in experiment the ensemble states can be mixed, and this leads to a suppression of the frame potential which can mimic the effect of forming a quantum state design; indeed, the right-hand side of Eq. (72) can even be negative when the states are mixed. For this case, we need a more generally applicable bound. By expanding the left-hand side of (72), we find that the distance can more generally be re-expressed as a sum over permutations, where each c ∈ τ is a cycle of the permutation group element τ, of length |c|. In the case k = 2, which we will focus on here, the right-hand side contains the frame potential and the average global purity. Interestingly, these appear with opposite signs, and thus we have a difference between two convex functions, which itself is not convex. Nevertheless, we can employ the upper and lower bounds derived in Section VI for each term separately. We test this method using the same dynamics considered in the previous subsection, including both noise and inaccuracies in the classical states. The tilted-field Ising Hamiltonian that generates the Floquet unitary is understood to be chaotic [66], and hence we expect to see emergent state designs in the projected ensemble [17]. The results of our numerical simulations are shown in Fig. 6. We plot the true value of the distance (75) along with bounds constructed from simulated experimental data using M = 50 000 repetitions. In this case, the classically simulated states are generated from noiseless evolution, but with a fractional uncertainty in the parameters of f = 0.5%. Evidently, the distance does indeed decay towards zero as time increases. For small values of noise, the bounds we obtain are relatively tight, meaning that in a real experiment (where one would not have access to the true value), a definitive conclusion regarding the formation of approximate state designs could be made. As the noise rate and/or time t increases, the total number of errors accrued during the circuit increases, and the states in the ensemble become less pure, as evidenced in the bottom panels of Fig. 6. This leads to bounds that become less tight, in particular the upper bound.
These results demonstrate that even in the presence of noise and miscalibrations between the quantum and classically simulated states, one can make concrete inferences regarding the closeness of the projected ensemble realised in experiment to being a quantum state design.

FIG. 6. Distance (75) between the k = 2 moments of the projected ensemble and those of the Haar ensemble, as calculated using our method with M = 50 000 experimental repetitions (classical shadows), for various values of p_dec (top panel). Classical states are generated using a fractional parameter uncertainty f = 0.5%, and without noise. For reference, we also show the average trace distance ∆_QC between quantum and classical states (middle panel), along with the average purity (bottom panel), which measures how far the states are from being mixed on average.

VIII. FUNDAMENTAL LIMITATIONS
So far, we have introduced an optimization-based approach to inferring properties of post-measurement quantum state ensembles, constructed explicit two-sided bounds for various quantities, and demonstrated an immediate application of our method to the verification of emergent quantum state designs. Evidently, the method described in this paper gives us some ability to learn quantities of interest in the context of measurement-induced dynamics, but we have already seen that in some cases there are unavoidable limitations on what can be unambiguously inferred from experimental data. In this section, we discuss some of these limitations in detail, with the aim of more precisely characterising the boundary between properties of the post-measurement conditional quantum states that can and cannot be inferred from experiment.
are both examples of this; let us consider the former for concreteness. Suppose that the true ensemble E_Q features states that are appreciably mixed on average, E_z Tr[(ρ^Q_z)²] = 1 − δ, with δ > 0. Even in the best-case scenario, where we have perfect classical simulations, we cannot make the range of feasible values [Eqs. (20, 23)] significantly narrower than δ, since this range must include both the true value (1 − δ) and the value 1 − O(e^{−H_min(p)}) implied by Theorem 1. (Here H_min(p) = −log max_z p_z is the min-entropy of the distribution p_z, which typically scales linearly with the number of measurements.) So, if the ensemble being probed is not close to pure, δ > 0, then although we might hope to obtain a good lower bound for the averaged purity, we cannot obtain a good upper bound, and hence there will always be some uncertainty in our conclusions. We can hope to learn the purity to good accuracy only if the ensemble being measured is itself close to pure (something we would not know in advance, but could verify using our bounds). Indeed, for perfect simulation, the bound (45) tends to the true purity 1 − δ, and hence the range of feasible values (23) has an optimal width δ.
It may seem counter-intuitive that there remains an ambiguity in the average purity even when the simulation being used is perfect. This stems from the difference between knowing that the simulation is perfect, versus having a perfect simulation but not knowing (or assuming) that it is so. To learn something definitive about the ensemble, we cannot make such an assumption, and this means that we may not be able to draw sharp conclusions even for perfect classical simulation. Note also that this limitation is not intrinsic to the inference method we are proposing in this paper: it is a fundamental obstruction, in that whatever strategy we employ, we cannot rule out the possibility of a mostly-pure state ensemble.
As a simple example that allows one to appreciate this idea intuitively, consider an ensemble defined for a single qubit (d = 2), where every conditional state is the same, ρ^Q_z = ρ^Q_0, and this fixed state ρ^Q_0 is mixed. This is represented by the blue arrow on the Bloch sphere in Fig. 7(a). If our classical simulation were perfect, ρ^C_z = ρ^Q_0, then any quantum-classical correlator we measure corresponds to a z-independent choice of A_z in Eq. (8), since the classical states are themselves z-independent. Once we measure some collection of quantum-classical correlators, we can consider which ensembles are consistent with these observations. The true ensemble is of course one possibility, but there are also alternative ensembles, made up of pure states which vary between different z, such that they average out to the same mixed state ρ^Q_0. This is shown by the many blue arrows in Fig. 7(b). Put simply, Theorem 1 reflects the fact that, based on our experimental observations, we cannot determine whether the true states are mixed, or whether the states are pure but our classical simulations are inaccurate: indeed, both of these have the effect of reducing the values of the quantum-classical correlators compared to the case of pure states with perfect simulation. To rule out the possibility of inaccurate classical simulation, one would have to estimate a number of quantities R that itself scales with the cardinality of the ensemble |Z|, effectively one for each possible value of z. This is of course not possible unless one uses a number of experimental repetitions M that also scales with |Z|.
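The single-qubit ambiguity can be made concrete numerically. Below is an illustrative construction of our own (assuming numpy): ensemble (a) repeats one mixed state with Bloch length r < 1, while ensemble (b) uses pure states on a cone of the Bloch sphere whose average is the same mixed state. Any z-independent observable sees identical averages, yet the purities differ.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def bloch_state(n):
    """Qubit state with Bloch vector n: pure if |n| = 1, mixed if |n| < 1."""
    return 0.5 * (np.eye(2) + n[0] * X + n[1] * Y + n[2] * Z)

r = 0.6                                    # Bloch length < 1: mixed
# ensemble (a): the same mixed state for every outcome z
mixed = [bloch_state((0.0, 0.0, r))] * 8
# ensemble (b): pure states on a cone, averaging to the same mixed state
thetas = np.arange(8) * np.pi / 4
s = np.sqrt(1 - r**2)
pure = [bloch_state((s * np.cos(t), s * np.sin(t), r)) for t in thetas]
```

The eight cone angles are equally spaced, so the transverse components cancel exactly; the two ensembles share the same average state while ensemble (b) is made of pure states.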
Looking at Figure 7, we can roughly characterize the difference between panels (a) and (b) by saying that the mixed state in (a) is realised as a convex combination of the many different pure states in (b). When we consider more general averages of the form (3) with convex functions G (beyond those like the average global purity), the ensembles (b) will have a higher value for this average than the true scenario (a), since taking convex combinations of states reduces the value of Ḡ. Therefore, the existence of consistent pure-state ensembles implied by Theorem 1 means that when the true ensemble is mixed, the upper bounds for convex functions cannot be made arbitrarily tight. Lower bounds for convex averages, on the other hand, can indeed be made tight as the fidelity of classical simulation is improved. This is borne out in the various bounds we derived in Section VI: compare, e.g., Eqs. (49) and (52) in the limit ρ^Q_z = ρ^C_z. The lower bound tends to the true value, while the upper bound deviates from the true value by a finite amount. One can see this asymmetry between the upper and lower bounds explicitly in Fig. 5.
To conclude, the arguments presented above illustrate the conditions which must be satisfied if we wish to constrain the value of a particular nonlinear convex average (3) to within a small window: not only should the classical simulation be accurate, but the conditional states of the true ensemble should also be close to pure; otherwise, it will be impossible to make the upper bound tight.

FIG. 7. An illustrative example demonstrating how Theorem 2 implies a limitation on our ability to sharply determine the purity of an ensemble. We consider a case where the quantum ensemble is given by the same mixed state for each z, i.e. ρ^Q_z = ρ^Q_0, represented by a light blue arrow on the Bloch ball, with sub-unit length. (a) Perfect classical simulation also implies ρ^C_z = ρ^Q_0 (dark orange arrow). (b) When we measure quantum-classical correlations using ρ^C_z = ρ^Q_0, there is also another candidate ensemble made up of pure states which vary between each z (multiple light blue arrows), distributed such that the same correlations are observed. Since we cannot a priori know that our simulations are perfect (even if they are), we cannot use our experimental data to rule out the possibility that the true scenario is (b).

Purification transition. One case where bounds on the global purity of the system are required is when probing purification dynamics [9], and so this case deserves special attention. Here, an initially mixed state is subjected to a hybrid unitary-projective circuit, and one asks how the global purity increases over time. Since the states of interest are necessarily mixed even in the idealised case where the hybrid circuit is noiseless, one may conclude on the basis of Theorem 1 that purification dynamics cannot be probed experimentally. This is indeed the case if one actually prepares maximally mixed states as the inputs to the circuit; however, if one instead uses a maximally entangled state between the system and a set of ancillas as the initial state, then the global state will be pure. The purity of the system can then be thought of as the purity of a subsystem of the global state, which can be bounded using the approach outlined in Section VI B.
Conveniently, using methods based on classical shadow tomography, one can avoid having to use any ancillas in an actual experiment: by preparing randomized initial states and measuring the output states in random bases, it is possible to construct a classical shadow of the Choi state describing the hybrid dynamics. This Choi state is precisely the global state one would obtain by performing conventional shadow tomography on a purified system-plus-ancilla state. See, for instance, Refs. [67,68].

B. Computational cost of classical simulation
So far we have considered the classical states ρ^C_z in quite general terms, without specifying how (and whether) the states ρ^C_z can be constructed. Recall that the specific computational task required by our protocol is that, upon obtaining an outcome z in the experiment, we must compute and store a state ρ^C_z, constructed on the basis of some model of how we expect the quantum device to behave. Whether or not this task is (even approximately) achievable is an intricate question, and has been discussed to some extent in other recent works where quantum-classical correlators have been introduced [13, 27-32]. Here we explore some examples where we can answer this question in the affirmative or in the negative.
Most obviously, clear examples where such a classical simulation can readily be performed are few-body systems, or many-body systems where the full underlying dynamics is efficiently simulable, e.g. Clifford circuits or free-fermionic systems. Going beyond these simple cases, the first issue to address is whether a representation of the simulated state can be stored using a scalable amount of classical memory (irrespective of whether it can be computed or not). One case where this is certainly possible is when the conditional states are defined on an O(1) number of qubits, even if the full quantum device is a many-body system. For example, in the projected ensemble [14-17], or the teleportation transition [25], we can choose to measure many qubits, leaving only a few unmeasured. If, on the other hand, the post-measurement states consist of extensively many qubits, then the states themselves must have some appropriate structure which allows them to be represented using a variational ansatz, e.g. a tensor network.
In terms of the simulation strategy, the approach chosen will naturally depend on the specifics of the system, and so to make any further statements we must specialise to particular types of measurement-induced dynamics. One example where full simulation can be done efficiently is the projected ensemble for a wavefunction generated by evolving a product state under finite-time dynamics in one spatial dimension (see e.g. Refs. [14,64]). There, the state of the unmeasured qubits conditioned on a particular measurement outcome can be constructed using a transfer matrix technique, the cost of which is linear in system size for any fixed time of evolution. Since the convergence towards emergent state designs is exponentially fast in time in 1D [64], this means that for the scheme described in Section VII B, the computational cost can remain small. The same goes for many-body teleportation protocols in one dimension [25]. If the time of evolution were to scale with system size, or we move to higher dimensions, then efficient simulation may not be possible, although depending on the circuit being simulated there may still be viable options; see e.g. Ref. [69].
For the case of one-dimensional hybrid quantum circuits, results on the measurement-induced phase transition indicate that the dynamics is efficiently simulable in the area-law phase using matrix product state-based techniques, but not in the volume-law phase [5,70,71]. This suggests that experimentally learning properties of the post-measurement ensemble of conditional states without employing brute-force postselection is only possible in one phase. In the volume-law phase, where simulation is presumably not possible, one cannot use the scheme described in this paper in practice, and we must instead ask whether anything nontrivial can be learned in the absence of a simulation; see Section VIII C. Put simply, in these cases there exists a set of measurable properties (8) which could in principle be used to determine some ensemble property of interest, but determining the correct operators A_z is a computationally intractable task. See also Ref. [34], where the question of whether entanglement can be probed in the area- vs. volume-law phases is considered.
Beyond these examples, it is interesting to consider whether there may be scenarios in which the computational task required here (calculating conditional states for specific values of z) can be done efficiently, even if full simulation, which would also involve sampling from the probability distribution (a potentially hard task classically [72]), is not. One might also consider the possibility of using a second quantum device to perform the simulation itself: this would certainly be possible for the projected ensemble in 1D dual-unitary circuits by leveraging spacetime rotation [73], but whether it can be done efficiently in other cases is unclear. Finally, we highlight that in settings where the dynamics features some particular structure, such as a continuous global symmetry, it may be possible to make concrete inferences about the states in question by employing simulations that use only partial knowledge of the dynamics (e.g. the distribution of gates employed, rather than the exact gates chosen), and which are computationally scalable. Indeed, this was applied in Refs. [28,74] to demonstrate the feasibility of probing charge-sharpening transitions in hybrid U(1)-symmetric circuits. We leave these ideas to future work.

C. Can simulations be avoided?
Having highlighted that constructing the simulated states ρ^C_z may not always be possible, we now ask what can be learned without any advance knowledge of the structure of the conditional states. To address this question in a sharp way, we consider a thought experiment, the implication of which is that if we cannot perform such a simulation, then no information can be gained about the ensemble beyond the properties of the averaged state (2) using a reasonable number of experimental repetitions M.
Let us introduce the following hypothetical scenario: we are given access to a device (oracle) that, when queried, outputs a label z and a state ρ^Q_z sampled from an ensemble. The ensemble realised by the oracle is one of the following two, chosen with equal a priori probability:

E^Q_1 = {(p_z, ρ^Q_z)}_{z∈Z},    E^Q_2 = {(p_z, ⟨ρ^Q⟩)}_{z∈Z},

where ⟨ρ^Q⟩ is the average state (2). After querying the oracle M times, we are asked to determine whether the samples came from E^Q_1 or E^Q_2. We consider this problem because, if we cannot reliably distinguish these two ensembles, then we cannot hope to learn any properties of E^Q_1 besides those that depend only on ⟨ρ^Q⟩. For the present purpose, we will broadly define a simulation as any method that allows us to obtain some (partial) prior knowledge of how the labels z map onto properties of the individual states ρ^Q_z. Thus, in the scenario we are considering, where we do not have access to a simulation, we can only employ strategies that treat each z on an equal footing, i.e. we cannot exploit any contextual information about what the labels z represent. To make this idea concrete, we define a simulation-free strategy as one that works the same if all labels z output by the oracle are first permuted by some arbitrary τ ∈ Σ_{|Z|}. In Appendix C, we prove the following Theorem 2.
Given coherent access to M independent samples of quantum states from one of the two ensembles E^Q_1, E^Q_2, chosen with equal a priori probability, any simulation-free strategy to distinguish E^Q_1 from E^Q_2 succeeds with probability no greater than a bound that approaches the trivial value 1/2 as M/2^{H^{(2)}[p_z]} → 0.

Thus, in the regime where M ≲ 2^{H^{(2)}[p_z]}, where H^{(2)}[p_z] := −log_2(Σ_z p_z^2) is the collision entropy of the distribution p_z, which typically scales linearly with the number of measurements, we cannot reliably distinguish a given ensemble E^Q_1 from one where the average state ⟨ρ^Q⟩ is supplied independently of z. As a consequence, we cannot hope to learn any property of an ensemble that is not contained within the average state (2), unless we obtain a number of samples that scales exponentially with the collision entropy of p_z. This is the case even when we can access all M copies of the state simultaneously, which encompasses situations where we employ adaptive measurement strategies, i.e. protocols where the gates and measurements that we apply can be chosen in a way that depends on measurement outcomes that occurred earlier. Hence, Theorem 2 establishes that the postselection problem can only be avoided if we incorporate some prior knowledge about the states ρ^Q_z into our learning strategy.
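To get a feel for the scale of this obstruction, the following minimal sketch (the system size and outcome distribution are hypothetical) computes the collision entropy H^{(2)}[p_z] and the corresponding sample threshold 2^{H^{(2)}[p_z]} below which Theorem 2 applies:

```python
import numpy as np

def collision_entropy(p):
    """Collision (Renyi-2) entropy H2[p] = -log2(sum_z p_z**2)."""
    p = np.asarray(p, dtype=float)
    return -np.log2(np.sum(p**2))

# Hypothetical example: N binary measurements with unbiased outcomes
# give a uniform distribution over 2**N labels z, so H2 = N, and the
# sample threshold 2**H2 grows exponentially with N.
N = 20
p_uniform = np.full(2**N, 2.0**-N)
H2 = collision_entropy(p_uniform)
sample_threshold = 2.0**H2
```

For non-uniform distributions the threshold is smaller, but for generic monitored dynamics H^{(2)} remains extensive in the number of measurements, so the threshold is still exponentially large.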

IX. CONCLUSION AND OUTLOOK
In this work, we have introduced a scheme that allows one to infer properties of an ensemble of quantum states generated by dynamics that involve measurements. We avoid postselection by insisting that the quantities we directly measure from the experiment are 'estimable properties' [Eq. (8)], which can be computed using a scalable number of experimental repetitions. Then, information about an ensemble-averaged property (3) can be indirectly inferred by solving an optimization problem, as defined in Section V. Our method yields a lower and an upper bound for the desired quantity, which can be made narrow by employing simulations of the quantum device on a classical computer. Crucially, the conclusions we make regarding the ensemble of states generated by the quantum device are not contingent on any a priori assumptions about the accuracy of the simulation: any bounds we construct are entirely rigorous.
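The logic of these two-sided bounds can be illustrated with a deliberately tiny toy instance (all numbers below are hypothetical, and for simplicity the conditional qubit states are restricted to the Z-axis of the Bloch ball). Given one observed quantum-classical correlator, a brute-force scan of the feasible set yields a range that is guaranteed to contain the true value of a non-estimable quantity; a practical implementation would use a convex solver rather than a grid:

```python
import itertools
import numpy as np

# Hypothetical toy instance: two labels z, each conditional state a
# qubit on the Z-axis of the Bloch ball, so rho_z is specified by a
# single coordinate n_z in [-1, 1] with Tr[rho_z Z] = n_z.
p = np.array([0.5, 0.5])      # outcome probabilities p_z
m = np.array([1.0, -1.0])     # classical predictions entering the correlator
c_obs = 0.7                   # observed estimable quantity:
                              #   sum_z p_z * m_z * Tr[rho_z Z]
# (These data are consistent with true states n = (0.8, -0.6),
# for which the target below equals 0.5.)

def target(n):
    # Non-estimable quantity of interest: E_z Tr[rho_z Z]^2
    return float(np.dot(p, n**2))

# Brute-force scan of the feasible set {n : constraint met, |n_z| <= 1}
grid = np.linspace(-1.0, 1.0, 201)
lo, hi = np.inf, -np.inf
for n1, n2 in itertools.product(grid, grid):
    n = np.array([n1, n2])
    if abs(np.dot(p * m, n) - c_obs) < 1e-6:  # consistency with observation
        val = target(n)
        lo = min(lo, val)
        hi = max(hi, val)

# The true value of E_z Tr[rho_z Z]^2 must lie in [lo, hi].
```

In this instance the scan returns the feasible range [0.49, 0.58], which indeed contains the true value 0.5; more accurate classical predictions m_z would narrow the range further.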
Our results have immediate implications for near-term experiments that probe measurement-induced dynamics. In Section VII B, we showed how our method can be used to infer the emergence of quantum state designs in the projected ensemble. For other types of measurement-induced phenomena beyond this example, once the relevant order parameters and/or figures of merit are identified, and corresponding inequalities of the kind presented in Section VI are derived, one can immediately start using these bounds in the spirit of this work to indirectly infer the value of the relevant quantity realised in some experiment. In particular, by virtue of having post-measurement states defined on a small number of qubits, we anticipate that experiments for witnessing many-body teleportation transitions [13,25] should be ideal settings for the application of our method.
Thinking more generally, the arguments presented in Section VIII point to a sharp distinction between scenarios where properties of the ensemble of post-measurement conditional states can or cannot be inferred from experimental data. One such condition is on the states being probed: if these are not close to being pure, then there will inevitably be some uncertainty in the value of the desired quantity. The other pertains to the feasibility of classical simulation: if simulation is not possible, then the states generated by the dynamics are indistinguishable from the case where the average state (2) is realised every time. The implication is that if we hope to use the conditional states ρ^Q_z as resources for some useful task, then we must necessarily have some prior knowledge about their structure, i.e. some model to guide our expectation for how ρ^Q_z depends on z. This basic expectation should be borne in mind when considering possible extrinsic applications of measurement-induced physics in future studies.
While the approach we introduce here is designed with the aim of characterising properties of the post-measurement states, it is helpful to make comparisons to other scenarios where one wishes to make other kinds of inferences about processes that generate both quantum and classical data. One example that fits particularly well into this category is the problem of parameter estimation for continuously monitored few-body systems [75][76][77][78][79], which has applications in quantum metrology. There, one again has to deal with the fact that every repetition of the experiment results in a different nondeterministic outcome, which can be handled by using a parallel classical simulation of the dynamics conditioned on those quantum trajectories that arise in the experiment. A key difference is that in parameter estimation, a particular starting assumption is made that the generator of dynamics is given by some model with a fixed number of unknown parameters, which we wish to infer based on the outcomes of the measurements we make. Because of this strong assumption, inferences can be made purely from the observed distribution of outcomes, without measuring the state of the system at the end of the experiment.
In cases where such model-based assumptions can be reliably applied, it may well be possible to incorporate them into our scheme through modification of the space K, and this might allow one to overcome some of the limitations of Theorem 1. In fact, in the even more structured setting where one wishes to learn the best candidate out of a set of hypotheses for a quantum-classical process, general bounds on the required sample complexity are known [80,81], and insights from these works could suggest strategies for overcoming these limitations.
One type of scenario that has not been explicitly considered in this work is where the outcomes of measurements are used to determine some subsequent dynamics, i.e. feedback is employed. In such scenarios, which include error correction [1] and measurement-based computation [2,3] as special cases, nontrivial behaviour can arise even in the averaged state [19,[82][83][84][85][86][87], and this can be probed in experiment using conventional learning techniques (though one should note that the physics of the averaged state is still distinct from that of the individual trajectories ρ^Q_z [86][87][88]). Still, if one is interested in using such 'interactive dynamics' to generate states with some desired property, then the choice of feedback applied could in principle be made based on some knowledge of the post-measurement states prior to the conditional unitary operation. The scheme introduced here can in principle be used to characterise these 'pre-feedback' states, which can then be used to make a constructive choice of feedback algorithm. Indeed, this type of experimental flow arises in certain proposals of measurement-based computation [89].

To establish Theorem 1, we will need the following result, which is a restatement of claim (5.8) in Ref. [90].
Lemma (Dubins [90]). Let K_0 be a linearly bounded convex subset of a finite-dimensional vector space M. Let A be an intersection of R hyperplanes in M, i.e. a linear subspace of codimension R. Then the extreme points of the intersection K = K_0 ∩ A are elements of the R-skeleton of K_0, namely the union of all faces of K_0 of dimension less than or equal to R.
When applied to the feasible space K defined in Eq. (23), this reduces our problem to a study of the R-skeleton of K_0. Recall that a face of a convex set K_0 is a subset F ⊂ K_0 with the property that if x ∈ F is a convex combination of two other elements, x = λx′ + (1 − λ)x′′ with x′, x′′ ∈ K_0, then this implies that x′, x′′ ∈ F. All faces have a dimension, namely the dimension of the smallest affine set containing the face; for example, the faces of dimension zero are singletons, each containing an extreme point of K_0. Here we are interested in the faces of dimension at most R.
The set K_0 in question is a Cartesian product of |Z| copies of the space of density matrices D over a Hilbert space of dimension d. Then, our first step is to show that the faces of K_0 are products of faces of D. To demonstrate this, consider a face F ⊂ C of a product C = C_1 × C_2 of two convex sets, and for each x_1 ∈ C_1 define the slice F_2(x_1) := {x_2 ∈ C_2 : (x_1, x_2) ∈ F}. Given any (x_1, x_2), (x′_1, x′_2) ∈ F, their midpoint belongs to F, and this midpoint can equally be written as the average of (x_1, x′_2) and (x′_1, x_2); then, by the definition of a face, we have (x_1, x′_2), (x′_1, x_2) ∈ F. This implies that F_2(x_1) = F_2(x′_1), and hence for any x_1, F_2(x_1) is either empty or equal to a fixed set F_2, and we conclude that F = F_1 × F_2, where F_2 is a face of C_2. The same logic applied to the analogously defined set F_1(x_2) gives us that F_1 must be a face of C_1. Applying this to the multiple product K_0 = ×_{z∈Z} D, we conclude that faces of K_0 are products of |Z| faces of D, as claimed above.
With this understood, note that the n-dimensional faces of K_0 are products of faces of D whose dimensions sum to n. Thus, at least (|Z| − n) factors of any n-dimensional face of K_0 will be zero-dimensional. Since a zero-dimensional face is an extreme point, and the extreme points of D are pure states [91], any element of an n-dimensional face of K_0 is an ensemble with no more than n non-pure states. We conclude that extreme points of K, which according to the Lemma quoted above belong to the R-skeleton of K_0, must have no more than R non-pure states, thus completing the proof of Theorem 1.
□

The corollary quoted in the main text follows since K is a compact, convex subset of M = ×_{z∈Z} B(H), which is a finite-dimensional vector space, and hence the Krein-Milman theorem applies. Because K is necessarily nonempty, at least one extreme point of K must exist.
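The fact used above, that the extreme points of D are exactly the pure states, can be checked directly in the qubit case, where D is the Bloch ball: any strictly mixed state (Bloch vector of length |n| < 1) decomposes as an equal mixture of two distinct valid states, so it cannot be extreme. A minimal sketch (the particular Bloch vector is arbitrary):

```python
import numpy as np

n = np.array([0.3, 0.2, 0.4])   # arbitrary strictly mixed state, |n| < 1
r = np.linalg.norm(n)
u = n / r                       # direction of the Bloch vector
delta = 1.0 - r                 # room left before hitting the sphere

# Split n as the midpoint of two distinct, still-valid Bloch vectors:
n_a = n + delta * u             # lands exactly on the sphere, |n_a| = 1
n_b = n - delta * u             # stays strictly inside the ball

assert np.linalg.norm(n_a) <= 1 + 1e-12
assert np.linalg.norm(n_b) < 1
assert np.allclose(0.5 * (n_a + n_b), n)   # n is not an extreme point
```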

Quadratic observables
Here we prove the lower bound (49), which is obtained using a similar line of reasoning to that discussed in Section VI A. We start with the case where the superoperator N = |O⟫⟪O|, i.e. we are considering the average of the squared expectation value, E_z Tr[ρ^Q_z O]^2. The case of general positive semidefinite N ⪰ 0 will follow quickly from this case.
We can assume that Tr[O] = 0 without loss of generality, since any component of O proportional to I can be subtracted off, yielding an additional term ∝ Tr[⟨ρ^Q⟩O], which is a property of the average state and hence can be measured directly. Then, the dual function h_−(λ_i) corresponding to this particular primal problem takes the form of Eq. (B1), where a_i, Ã^{(i)}_z are as in Eq. (35), and we have defined the function F_{2,−} in Eq. (B2). Another useful aspect of the qubit problem that helps us here is that the space of density matrices D has a simple geometry that can be straightforwardly characterized: the conditions that ρ is Hermitian and satisfies Tr[ρ] = 1, Tr[ρ^2] ≤ 1 are necessary and sufficient for ρ to be a valid density matrix. This leads to the notion of the Bloch ball, which is a helpful visualization of the space D: any density matrix can be written as |ρ⟫ = (|I⟫ + n_1|X⟫ + n_2|Y⟫ + n_3|Z⟫)/2, with n⃗ = (n_1, n_2, n_3) a 3-dimensional vector that specifies the state, belonging to the unit ball |n⃗| ≤ 1. Since the function to be extremised is independent of n_2, we can reduce the problem to an optimization over the two variables (n_1, n_3). A solution to this problem can be formally written down in terms of the roots of a certain polynomial equation, but the resulting expression is rather cumbersome and not particularly informative. Rather, it is useful to consider the behaviour of the above function in the vicinity of α_1 = 0. By considering small perturbations around the α_1 = 0 solution, we obtain the expansion (B4). The behaviour in the directions α_1 and α_3 is markedly different near the point α_{1,3} = 0: the function decreases linearly along α_1, and quadratically along α_3. Bearing in mind that α_{1,3} are linear combinations of the Lagrange multipliers λ_i, we consider a stability analysis of the function h_−(λ_i), which is to be maximized, around the candidate point λ_i = 0.
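The asymmetry between the α_1 and α_3 directions is easy to verify numerically. The sketch below evaluates F_{2,−}(O, C) for O = Z by brute force over the Bloch ball (for α_1 ≥ 0 the infimum always takes the largest admissible n_1, so a one-dimensional scan over n_3 suffices; the grid resolution is arbitrary):

```python
import numpy as np

def F2_minus(alpha1, alpha3, num=200001):
    """Brute-force inf over the Bloch ball of Tr[rho Z]^2 - Tr[rho C],
    with C = alpha1*X + alpha3*Z.  The objective reads
        n3**2 - alpha1*n1 - alpha3*n3,
    and for alpha1 >= 0 it is minimized by the largest admissible
    n1 = sqrt(1 - n3**2), so scanning n3 alone suffices."""
    n3 = np.linspace(-1.0, 1.0, num)
    n1 = np.sqrt(np.clip(1.0 - n3**2, 0.0, None))
    return float(np.min(n3**2 - alpha1 * n1 - alpha3 * n3))
```

One finds F_{2,−}(ε, 0) ≈ −ε (linear decrease along α_1) but F_{2,−}(0, ε) ≈ −ε^2/4 (quadratic along α_3), and the bound F_{2,−}(0, α_3) ≥ −α_3^2/4 used below holds for all α_3.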
Qualitatively speaking, we will find that it is favourable to increase λ_i away from zero for those i where the operators A^{(i)}_z lie predominantly along the Z direction in operator space (α_3 dominates), but not for those along orthogonal directions (α_1 dominates). This is because, for the former subset of λ_i, the second term in Eq. (B1) will increase faster than the first term decreases as λ_i is varied away from zero.
This rough intuition can be made concrete most straightforwardly if we assume that for each i, A^{(i)}_z is either orthogonal or parallel to Z. (Such an assumption is not particularly restrictive: if A^{(i)}_z are measurable, then we can always decompose A^{(i)}_z = Σ_µ a_{iµz} σ_µ in terms of components along some basis of operators σ_µ, which we are free to choose such that one of the σ_µ is proportional to O. Then, using our shadow-based scheme, we can always choose to measure the enlarged set of operators A^{(i,µ)}_z := a_{iµz} σ_µ for each pair (i, µ) separately without losing any information, simply by altering the classical post-processing.) Let us denote the set of i for which ⟪A^{(i)}_z|Z⟫ = 0 as I_⊥, and those for which A^{(i)}_z is parallel to Z as I_∥ [see Eq. (B6)]. (Note that we have a_i = 0 for i ∈ I_∥.) From this point, our arguments follow a similar structure to those of Subsection VI A. We can use the bound F^{(d=2)}_{2,−}(0, α_3) ≥ −α_3^2/4 to obtain an expression analogous to Eq. (37), where we have introduced J_{ij}, an analogue of L_{ij} in Eq. (38). Again, optimality of this candidate solution to the dual problem is not guaranteed, i.e. h̃_− might be strictly smaller than the true optimal value g*_−. Even so, the above can be evaluated straightforwardly using experimental data, and can be used as a lower bound for the quantum-quantum correlator E_z Tr[ρ^Q_z O]^2.

Beyond a single qubit.-When we generalize beyond the case of a single qubit d = 2, the optimization over density matrices in higher-dimensional Hilbert spaces is made more complicated by the geometric structure of the space D, which does not have simple interpretations such as the Bloch ball. For instance, expressions analogous to Eq. (B3) are not so straightforward. Nevertheless, we can carry the intuition gained above forward to this case, anticipating that reasonable bounds can be found by optimizing only λ_i for i ∈ I_∥, where now I_∥ denotes the set of i for which A^{(i)}_z is proportional to O [cf. Eq. (B10)], and where ∥O∥_∞ denotes the spectral norm, equal to the largest singular value of O. This solution has the same structure as the qubit case (B4), and hence the same logic can be used as before to reproduce the bound Eq. (B9); that is, the same expression (B9) for h̃_− serves as a lower bound for g*_− for arbitrary d. Now, following the same approach as in Section VI A, we can use quantum-classical correlators to choose the operators A^{(i)}_z. The simplest case, where we use the standard quantum-classical correlator (16) as the only constraint, yields a bound which can be interpreted as a result of the Cauchy-Schwarz inequality, or equivalently the standard inequality Var(X) Var(Y) ≥ Cov(X, Y)^2 for classical random variables X, Y. If we make use of quantum-classical correlators beyond just ⟨O ⊗ O⟩_{QC}, then we can in principle use all the information contained in the superoperators η_{QC}, η_{CC} [Eq. (42)]. This allows us to improve the simple bound above to one involving ζ_{CQ}, the superoperator implicitly defined in (44). Now, for more general convex quadratic observables (N ⪰ 0), we can always perform a decomposition of the superoperator N = Σ_a µ_a |C_a⟫⟪C_a|, where µ_a ≥ 0 are eigenvalues and C_a are eigenoperators. Due to the nonnegativity of µ_a, we can apply the bound (B12) with O = C_a for each a separately, which gives Eq. (49).
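The classical inequality invoked here is easy to check numerically. The sketch below uses synthetic correlated data standing in for the quantum expectation values Tr[ρ^Q_z O] and their classical predictions Tr[ρ^C_z O] (the correlation strength is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic paired data: x plays the role of the quantum expectation
# values across outcomes z, y the classical predictions.
x = rng.normal(size=100000)
y = 0.8 * x + 0.3 * rng.normal(size=100000)

var_x = np.var(x)
var_y = np.var(y)
cov_xy = np.cov(x, y, bias=True)[0, 1]

# Cauchy-Schwarz: the covariance with the classical predictions
# certifies a lower bound on Var(X), mirroring how the quantum-classical
# correlator lower-bounds the quantum-quantum correlator.
lower_bound = cov_xy**2 / var_y
```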

Numerically stable upper bound for von Neumann entropy

In the main text, we derived a general upper bound for the average von Neumann entropy (56), which for the special case A_z = −log ρ^C_z reduces to the quantum-classical entropy introduced in Ref. [34]. Unfortunately, this leads to an expression that is numerically unstable when the classical states ρ^C_z are near-singular, due to the need to take the logarithm of the operators ρ^C_z. However, thanks to the flexibility of our optimization-based approach, we can devise a simple solution, where we modify the chosen operators A_z to obtain a better bound. Here we consider the case where we measure the two observables defined in Eq. (59), denoted A^{(1)}_z and A^{(2)}_z, which enter the dual function through the combination λ_1⟨A^{(1)}_z⟩ + λ_2⟨A^{(2)}_z⟩ (B13).
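The instability, and the spirit of the fix, can be illustrated on a single near-singular state. In the sketch below, flooring small eigenvalues is only a crude stand-in for the projector-based split of the text (the threshold value is arbitrary), but it shows how −log ρ^C blows up without such a modification:

```python
import numpy as np

def neg_log_density(rho, eps=None):
    """Compute -log(rho) via eigendecomposition.  If eps is given,
    eigenvalues below eps are floored at eps, a crude stand-in for
    splitting off the near-singular subspace with projectors."""
    vals, vecs = np.linalg.eigh(rho)
    if eps is not None:
        vals = np.clip(vals, eps, None)
    return -(vecs * np.log(vals)) @ vecs.conj().T

# A near-singular 'classical' qubit state:
rho_C = np.diag([1.0 - 1e-15, 1e-15])

unstable = neg_log_density(rho_C)           # entries of order |log(1e-15)| ~ 35
stable = neg_log_density(rho_C, eps=1e-6)   # bounded by |log(eps)| ~ 14
```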

FIG. 1. (a) In this work, we are interested in ensembles of quantum states E^Q [Eq. (1)] that arise from some quantum dynamics featuring measurement. This could be a hybrid quantum circuit, the projected ensemble, or some other protocol of interest. To be general, we can think of the device as a black box which, on a given run of the experiment, outputs with probability p_z a label z (representing the outcome of all measurements during the dynamics), along with a corresponding quantum state ρ^Q_z (with H, H_Y the Hadamard and Y-Hadamard gates, respectively). Then we can use the dual frame

1. Powers of expectation values.-While expectation values of observables ⟨O⟩ = Tr[ρO] are linear in ρ, and hence directly estimable without any additional analysis, often we are interested in integer powers of expectation values, Tr[ρO]^k. The quantum-quantum correlator
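For k = 2, an unbiased estimate of Tr[ρO]^2 can be formed by multiplying outcomes from two independent preparations, rather than squaring a single empirical mean (which is biased upward by shot noise). A minimal sketch with synthetic ±1-valued outcomes (the expectation value 0.6 and the sample count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic single-shot outcomes of a +/-1-valued observable O with
# Tr[rho O] = 0.8 - 0.2 = 0.6 (values chosen for illustration).
n_pairs = 200000
shots = rng.choice([1.0, -1.0], p=[0.8, 0.2], size=(n_pairs, 2))

# Product of two INDEPENDENT shots: E[x1 * x2] = Tr[rho O]^2 = 0.36,
# with no shot-noise bias, unlike the square of a single empirical mean.
unbiased = np.mean(shots[:, 0] * shots[:, 1])
```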

and b_1 = P_{QC}. Then, altogether, the bound (B9) reduces to a result of Ref. [34]: taking A^{(1)}_z = −log ρ^C_z and A^{(2)}_z = +log ρ^{C_1}_z, one can set λ_1 = λ_2 = 1. With this particular choice, the argument of the logarithm in the above becomes ∥Tr_{Q_2} e^{log ρ^C_z}

A. Interpretation and consequences of Theorem 1
Theorem 1 and its corollary immediately tell us something regarding what can be inferred about certain quantities, most obviously those that measure how mixed the conditional states ρ^Q_z are: the average global purity E_z Tr[(ρ^Q_z)^2] and the average global von Neumann entropy

ACKNOWLEDGMENTS

I am grateful to Sam Garratt and Wen Wei Ho for insightful discussions. Support from Trinity College, Cambridge is acknowledged.

Appendix A: Proof of Theorem 1

F_{2,−}(O, C) := inf_{ρ∈D} { Tr[ρO]^2 − Tr[ρC] }   (B2)

by analogy to Eq. (36). While we cannot exactly solve this problem analytically in full generality, it is possible in the case where H is a single qubit (Hilbert space dimension d = 2). Our solution to this case will prove instructive when it comes to treating the more general scenario d ≥ 2 later; we thus briefly specialise in the following.

Single qubit.-When we fix d = 2, since both C and O are traceless, one can rescale and perform rotations in Hilbert space appropriately to map the problem of evaluating F_{2,−}(O, C) to the case O = Z and C = α_1 X + α_3 Z, where (X, Y, Z) are Pauli matrices, and α_1, α_3 ≥ 0 are nonnegative scalars. Evidently, α_1 and α_3 are the components of C orthogonal and parallel to O, respectively. This effectively reduces F_{2,−} to a function of the two variables (α_1, α_3).

⟪A^{(i)}_z|Z⟫ = 0 as I_⊥, and those for which A^{(i)}_z = c_{zi} Z as I_∥. We temporarily fix λ_{i∈I_⊥} = 0 by hand and then optimize over λ_{i∈I_∥}. Using the expansion Eq. (B4) with the appropriate replacement of parameters, we obtain

h_−(λ_i)|_{λ_{i∈I_⊥}=0} = Σ_z

= c_{zi} O for scalars c_{zi}, whose complement I_⊥ is made up of operators orthogonal to O in operator space, namely Tr[A^{(i)}_z O] = 0 for i ∈ I_⊥. Following the same logic as before, we set λ_i = 0 for i ∈ I_⊥, after which the second argument of the function F_{2,−} in Eq. (B1) becomes proportional to O. The resulting minimization problem (B2) can then be directly evaluated for arbitrary d:

F_{2,−}(O, αO) = −∥O∥_∞^2 × min(α^2/4, |α|)   (B10)

For each of these two observables we have a Lagrange multiplier λ_{1,2}. In terms of the expectation values ⟨A^{(1,2)}_z⟩, the dual function can be obtained by minimizing with respect to ρ^Q_z, giving

h_+(λ_{1,2}) = E_z log Tr[(Π^>_z ρ^C_z Π^>_z)^{λ_1}] + e^{−λ_2} rank(Π^≤_z)/d

provided the relevant condition is met for all z. (Eq. (B7) actually becomes an equality if we have |Σ_{i∈I_∥} λ_i c_{zi}| ≤ 2 for all z.) As before, this can be maximized to obtain a certificate max_{λ_i : i∈I_∥}