Quantifying quantum speedups: improved classical simulation from tighter magic monotones

In the stabilizer circuit model of quantum computation, universality requires a resource known as magic. Here, we propose three new ways to quantify the magic of a quantum state using magic monotones, apply the monotones in the characterization of state conversions under stabilizer operations, and connect them with the classical simulation of quantum circuits. We first present a complete theory of these quantifiers for tensor products of single-qubit states, for which the monotones are all equal and all act multiplicatively, constituting the first qubit magic monotones to have this property. We use the monotones to establish several asymptotic and non-asymptotic bounds on state interconversion and distillation rates. We then relate our quantifiers directly to the runtime of classical simulation algorithms, showing that a large amount of magic is a necessary requirement for any quantum speedup. One of our classical simulation algorithms is a quasi-probability simulator with its runtime connected to a generalized notion of negativity, which is exponentially faster than all prior qubit quasi-probability simulation algorithms. We also introduce a new variant of the stabilizer rank simulation algorithm suitable for mixed states, while improving the runtime bounds for this class of simulations. Our work reveals interesting connections between quasi-probability and stabilizer rank simulators, which previously appeared to be unrelated. Generalizing the approach beyond the theory of magic states, we establish methods for the quantitative characterization of classical simulability for more general quantum resources, and use them in the resource theory of quantum coherence to connect the $\ell_1$-norm of coherence with the simulation of free operations.


I. INTRODUCTION
The effort to develop feasible quantum technologies relies on harnessing the power of intrinsic quantum phenomena to provide advantages in computation and information processing. A deeper understanding of the operational and practical aspects of such quantum resources therefore remains one of the most important goals of quantum information research. The framework of quantum resource theories aims to understand such phenomena in a unified fashion, providing methods for their characterization and utilization [1]. In quantum computation, an illuminating perspective is to view the quantum advantage as originating from a special resource known as magic. The free states and operations, those with no magic, include the stabilizer states, Clifford unitaries, and Pauli measurements. A magic monotone is a function which quantifies magic, and there are many suitable choices of such monotones, each with distinct merits and limitations. There are two strong motivations for the development of magic theory. First, in fault-tolerant quantum computation, the stabilizer operations are simple to perform, whereas magic operations have a much larger spacetime cost due to the need for magic state distillation [2,3]. Therefore, quantifying the magic of the target unitary or state can lower bound the cost of performing a particular computation or state conversion task (such as distillation) [4]. The second motivation is the Gottesman-Knill theorem, which implies that stabilizer operations can be efficiently classically simulated [5,6], so possession of magic is a necessary condition for a quantum computational advan-tage. Recent work has further linked some magic monotones to the runtime of classical simulation algorithms [7][8][9][10][11], so that one needs an exponential amount of magic for an exponential quantum speedup.
One peculiarity of magic theory is that it behaves differently depending on whether it models qubit quantum computers or qudit (with odd dimension) quantum computers. Our primary interest in this article is on qubit magic theory, but it is informative to first review the situation with qudits [12][13][14]. In odd dimensions, the discrete Wigner function or discrete phase space [15,16] conveniently represents quantum states. States with a positive Wigner function are called positive states and, interestingly, the class of positive states is larger than the set of stabilizer states. Veitch et. al. [17] introduced a qudit monotone called mana, which quantifies the amount of negativity in the Wigner function. Later, Pashayan et. al [7] showed that the mana of ρ quantifies the runtime of a classical simulation algorithm that estimates the probabilities of outcomes of Clifford circuit acting on ρ. More recently, Wang et. al. introduced some additional qudit magic monotones [18] based on the qudit Wigner function, but without obvious connections to classical simulation algorithms. Several of these monotones, including the mana, have the convenient property that they are additive (M(ρ ⊗ σ) = M(ρ) + M(σ)) or multiplicative (M(ρ⊗σ) = M(ρ)M(σ)). The additivity or multiplicativity of monotones is useful as it enables their evaluation on tensor products of many states, which helps us prove bounds on state conversion even in the asymptotic limit and, where relevant, estimates the runtime of classical simulation algorithms.
Curiously, for qubits there is no such clean notion of a qubit arXiv:2002.06181v1 [quant-ph] 14 Feb 2020 Wigner function. A straightforward application of techniques that work for qudits yields a Wigner function that can be positive for pure non-stabilizer states, and even negative for some pure stabilizer states. If one restricts the free operations and states to a subclass of the usual free operations, then a wellbehaved Wigner function can be defined [19,20]. Recently, another qubit phase-space representation was proposed which allowed for the full class of free operations with phase point operators defined using so-called closed non-contextual sets of Pauli operators [21]. However, it has the unusual property that associated monotones are super-multiplicative and that the class of positively represented states is not closed under tensor products. No known qubit phase-space representation captures every property one can easily discover in the qudit case, and severe no-go theorems prevent such a result [22]. While discrete phase space is the cornerstone of qudit magic theory, in the qubit setting many other ideas have been explored because of the limitations of all qubit phase-space representations. In particular, the stabilizer rank is a monotone for pure states, and relates to a powerful classical simulation technique [8]. Later, Ref. [9] developed the notion of approximate stabilizer rank and related it to a monotone called the extent [11]. While the stabilizer rank is typically not multiplicative (or additive), the extent is multiplicative on any tensor product of one-, two-, and three-qubit states [11]. Unfortunately, these concepts only apply to pure states. A parallel research direction considered quasi-probability distributions over stabilizer states as the basis of classical simulation algorithms [10,23,24] with the relevant state-based monotone called the robustness of magic. While this gives a qubit resource theory for mixed states, the robustness of magic is again not multiplicative and simulation algorithms based on quasi-probabilities are often slower than those based on stabilizer rank.

A. Summary of results
In this article, we introduce three new qubit magic monotones for mixed states. We find that for pure states all our monotones equal the extent, and so our monotones generalize the extent to mixed states.
We present in Sec. III a complete picture of how these monotones compare by showing that they are all equal for general, possibly mixed, single-qubit states. In Sec IV, we extend this result to cover all tensor products of single-qubit states and further show that in this setting the monotones are strictly multiplicative. This is the first multiplicativity result for any qubit magic monotone for mixed states, and the proof technique draws on the prior result on multiplicativity of the pure-state extent [11].
In Sec. V we compare our new monotones with the robustness of magic and show that they can be exponentially smaller in magnitude, with consequences for runtime scaling of related classical simulation algorithms. In Sec. VI, we discuss how these monotones can bound the rates of magic state distillation protocols. We compare our bounds against another recent lower bound due to Fang and Liu [25], and observe that our bounds are often stronger in practical regimes.
We relate our three magic monotones to the runtime of classical simulation algorithms in Sec VII. When multiplicativity holds, our magic monotones help us understand the runtime scaling precisely. Our first simulators (Sec. VII A) generalize quasi-probability and phase-space methods for estimating expectation values and Born rule probabilities. Instead of sampling from stabilizer states or phase-point operators, we sample from objects we call stabilizer dyads. We show the corresponding resource monotone to be smaller than the robustness of magic, which means that fewer samples are required. This performance gap can be exponentially large: for instance, for n copies of a T state the dyadic simulator has a runtime O(4 0.228443n ), whereas for simulators based on the robustness of magic [10,24] the runtime is Ω(4 0.271553n ).
Our second classical simulator (Sec. VII B) extends the stabilizer rank simulator to mixed states. Our simulator chooses a pure state randomly from some ensemble of states and executes a stabilizer rank simulator. The techniques involved are suitable for both estimating expectation values and also drawing samples from the same distribution as a quantum computer. Additionally, we tighten prior error bounds of stabilizer rank simulations, so that for a large class of states (including T states) we improve the runtime over the best prior art by a factor 1/δ where δ is the simulation approximation error. The key technical insight is that by considering the trace distance between a state to be simulated and an entire ensemble of states (a mixed density matrix), we can quadratically improve the error bounds. Our third simulator (Sec. VII C) is efficient, but often noisy, and approximates the magic state with a nearby mixed stabilizer state.
Finally, in Sec. VIII we outline how our results extend beyond the theory of magic states. Although classical simulation relying on specific properties of quantum systems has been studied in various contexts apart from magic theory [7,[26][27][28][29], there have been no known tools allowing one to easily adapt such approaches to the umbrella of quantum resource theories [1] in general. We provide a comprehensive recipe to apply our methodology to general quantum resources. We thus establish connections between a family of resource monotones and simulation tasks, and shed light on classical simulation algorithms in broader settings. For instance, in the resource theory of quantum coherence [30,31], the 1 -norm of coherence is a fundamental quantifier of coherence but lacks an operational meaning. Our results fill this gap by giving the 1 -norm of coherence an operational meaning within the context of simulating incoherent operations.

II. MAGIC MONOTONES
We use V n to denote the set of pure, n-qubit stabilizer states, and we use |φ j to denote a particular n-qubit stabilizer state in V n . We use F n for the set of mixed stabilizer states, which are convex combinations of projectors onto pure states. We use O n to denote the free operations, which we take to be the completely stabilizer-preserving channels [24]. These are defined as follows. Definition 1. We define the set of free operations O n as the set of channels E that are: (i) completely positive; (ii) trace preserving, so that Tr[E(ρ)] = Tr[ρ]; (iii) completely stabilizer preserving in the sense that This set can equivalently be defined via the Choi-Jamiołkowski isomorphism [24,32,33]: Lemma 1. Let E be an n-qubit completely positive tracepreserving (CPTP) map. Then E ∈ O n if and only if: where: and where the summation is over all n-qubit computational basis states |j .
The set of free operations O n includes the so-called (trace-preserving) stabilizer operations: Clifford unitaries, Pauli measurements, and composition of these operations using classical feedforward and convex combinations. The Gottesman-Knill theorem has long been known to show that these specific operations can be efficiently classically simulated. However, interestingly, it was recently shown that the class O n of all completely stabilizer-preserving operations also admits efficient simulation algorithms [24]. While it is clear that the stabilizer operations are contained in O n , it is not known whether all elements of O n can be realized by the standard stabilizer operations without post-selection.
It is also known that certain stabilizer-preserving but tracenon-increasing maps, such as post-selection on the outcome of a Pauli measurement, can also be efficiently simulated. For technical reasons we do not consider these as elements of the convex set of free operations O n in our resource theory, but we exploit their simulability in Section VII.
Although in our discussion we specialize to the theory of magic states, our basic considerations below can also be applied to more general resource theories in which the set of free mixed states is the convex hull of free pure states V n , including important examples such as coherence and entanglement. We elaborate on this in Sec. VIII.
We begin by introducing several magic monotones. For pure states, we define the following. Definition 2 ( [11]). The pure-state extent ξ is the quantity In magic theory, ξ is the stabilizer extent [11]. A related quantity appears in other resource theories such as entanglement, where it admits an analytical formula as the squared sum of the Schmidt coefficients of a state [34], or in coherence theory, where it is the square of the 1 -norm of coherence [30]. It is well known [11,35] that this can be recast as a dual optimization problem Here, we define an ω-witness to be any feasible solution to the optimization problem in Eq. (5).
We now consider four monotones, of which three can be regarded as mixed-state extensions of ξ. Firstly, one can extend the above to mixed states using the usual convex roof extension [36]: where every |Ψ j is a pure state and p j are non-negative coefficients such that j p j = 1.
Furthermore, if the minimum can be achieved with a decomposition where all ξ(Ψ j ) are equal, then we say the state admits an equimagical decomposition.
We also consider quasi-probability distributions over free states as follows.

Definition 4 ([10]
). The robustness R is the quantity where q j are real coefficients.
In magic theory, R is called the robustness of magic [10,37], inspired by the (standard) robustness of entanglement [38]. The robustness uses decompositions where the rank-one ket-bra terms are Hermitian. We can relax the robustness to obtain the following.
Definition 5. The dyadic negativity Λ is the quantity where the coefficients α j are complex numbers.
The name reflects the fact that each |L j R j | comprises of a pair of vectors, and so is a dyad. Within the resource theory of entanglement, a related quantity called the projective tensor norm was considered [34,39], and in the resource theory of coherence the dyadic negativity corresponds to the 1 -norm of coherence [30]. Viewing this quantity as the primal solution of a convex optimization problem, it is useful to state the equivalent dual formulation [35] in terms of witness operators. We define the set of W -witnesses, denoted W, to be the Hermitian operators such that which by strong duality leads to This bring us to our last monotone of interest. Definition 6. The generalized robustness Λ + is the quantity where W is the set of W -witnesses.
A corresponding quantity to Λ + appeared in several resource theories, including entanglement [38,40] and coherence [41]. Notice that this is similar to the dual formulation given in Eq. (10) except we further restrict to witnesses that are also positive semidefinite operators. We define a W + -witness to be any feasible solution to the optimization problem in Eq. (11). Since W + -witnesses are positive semidefinite, the condition | L|W |R | ≤ 1 simplifies to ψ|W |ψ ≤ 1 for all |ψ ∈ V n . Furthermore, the primal form of this monotone can be given as = min λ ≥ 1 : where the optimization in the second line is over all density matrices ρ . This form motivates the name of generalized robustness: rearranging Eq. (7), the robustness R can be similarly expressed as We stress that both Λ and Λ + are computable, in the sense that their evaluation corresponds to convex optimization problems -a second-order conic optimization for Λ, and a semidefinite program for Λ + -which can be evaluated using numerical solvers [42]. In practice, we were able to compute Λ + up to n = 4 and Λ up to n = 3, but one can certainly hope to make further progress in computing the quantities for states obeying some symmetry, just as in the case of R [37]. The evaluation of convex-roof-based quantities such as Ξ is notoriously hard in general [43], although one could again use symmetry to facilitate it in special cases [44]. Our results in Sec. III-IV further simplify the computation of all of the monotones for the practically important case of tensor products of single-qubit states.
The monotones have been considered from the perspective of general resource theories [35] and it is already known that they satisfy the following: 3. strong monotonicity (monotonicity on average under selective free measurements): where {K i } i are the Kraus operators of a quantum channel such that each K i is stabilizer preserving, i.e. K i |φ ∝ |φ ∈ V n ∀|φ ∈ V n , and p i = Tr(K i ρK † i ); 4. convexity: M( j p j ρ j ) ≤ j p j M(ρ j ); We remark that, although R and Λ + are monotones in any convex resource theory, the fact that Λ and Ξ obey monotonicity under all completely stabilizer-preserving operations O n is a consequence of two properties: the strong monotonicity of the measures [35] coupled with the fact that any operation O ∈ O n can be expressed in terms of Kraus operators {K i } i which preserve the set of stabilizer states [24]. If we instead work with logarithmic monotones, M log (ρ) = log[M(ρ)] then multiplicativity becomes additivity, faithfulness instead has a M log (ρ) = 0 condition, and due to concavity of the logarithm M log is no longer a convex function but still obeys strong monotonicity [45]. Here we find it convenient to work without the logarithm in most cases. Next, we present some general relations between these monotones that were previously known.
Lemma 2 (Regula [35]). For pure states Therefore, our monotones can be interpreted as mixed state extensions of ξ. For completeness, we provide an alternative proof using the notation of this work in App. A. We also observe the following.
Theorem 3. For any state ρ we have This is reminiscent of known results in general resource theories [35].
Proof of Thm. 3. Since the proof is straightforward, we review it here. Recall that Λ is the result of maximizing over all W -witnesses, whereas Λ + is limited to all W + -witnesses, which immediately leads to Λ + (ρ) ≤ Λ(ρ). To show Λ(ρ) ≤ Ξ(ρ), one simply takes the optimal decomposition w.r.t to Ξ, as follows Next, we insert this decomposition into Λ and use convexity Using Lem. 2 we have which completes the proof of Thm. 3.
Since Λ + is easier to evaluate than Λ and Λ + ≤ Λ, in practical settings, one can approximate Λ by evaluating Λ + .

III. SINGLE-QUBIT MAGIC STATES
In this section, we present a complete theory for singlequbit magic states. We recall that there are six single-qubit, pure stabilizer states, each the "+1" eigenstate of some Pauli operator and we reuse this notation throughout. A mixed stabilizer state is any convex combination of these. Important single-qubit magic states include |H and |F , where which we use throughout.
The key question is whether the inequalities of Thm. 3 can be tight, thus unifying the different approaches to the quantification of magic. Here we show that, indeed, the inequalities are tight for all single-qubit states.
We will see in the following section that this equivalence persists for tensor products of single-qubit states. The proof of Thm. 4 rests on a trio of lemmata. First, we have: Lemma 5 (The monotone equality lemma). For any ωwitness |ω , we define the set B ω to be the convex hull of all pure states Ψ for which | ω|Ψ | 2 = ξ(Ψ). It follows that for all ρ ∈ B ω we have Proof of Lem. 5. If ρ ∈ B ω , we can find a convex decomposition where | ω|Ψ j | 2 = ξ(Ψ j ) for all j. We can use this decomposition to obtain an upper bound on the mixed-state extent as follows On the other hand, W = |ω ω| ∈ W + and so can be used to lower bound the generalized robustness to show Combining Eq. (27) and Eq. (28) with Thm. 3, we have Therefore, these inequalities all collapse to equalities.
Making use of Lem. 5 requires us to first understand the structure of optimal ω-witnesses, which we shall discuss soon. However, first it is useful to define some different subsets of the Bloch sphere.
We further subdivide the positive octant as follows: where we use the shorthand M := Tr[ρM ]. See Fig. 1 for an illustration of P Y .
The sets P X , P Y and P Z further divide the positive octant into thirds and it is easy to verify that P = P X ∪ P Y ∪ P Z . These sets are not quite disjoint because of the following proposition.
Proposition 6. From Def. 7, we have the following This is straightforward to prove. For example, in P Z the smallest expectation value is for Z and for P Y the smallest expectation value is for Y . Therefore, in the intersection these two expectation values must be equal. We note that any state is Clifford equivalent to a state in the positive octant P . Furthermore, the Clifford satisfies Therefore, the sets P X , P Y and P Z are Clifford equivalent and therefore every state is Clifford equivalent to some ρ ∈ P Y . Now we are ready to characterize optimal ω-witnesses. For these boundary states, we know (by Lem. 7) that the optimal ωwitness is given by Eq. (34) with the parameter set to q = 2/3. For other pure states in PY , the ω-witness still has the form given by Eq. (34) but the parameter q may be greater than 2/3. However, interestingly, the majority of pure states in PY have an optimal ωwitness with q = 2/3 and these are shown in yellow in this plot. On the geodesic through |0 , |H and |+ , we have that q = 1. Between this geodesic and the yellow region, q varies continuously from 1 to 2/3 and this intermediate region is shown in green.
Lemma 7. Let |Ψ be any pure, single-qubit state in the set P Y . Then the ω-witness |ω that achieves | Ψ|ω | 2 = ξ(Ψ) has an operator representation of the form where 2/3 ≤ q ≤ 1 and H = (X + Z)/ √ 2. Furthermore, if |Ψ is in the set P Y ∩ P X or P Y ∩ P Z then q = 2/3 and the ω-witness takes the form The actual value of the variable q is easy to numerically compute, but is analytically complicated and not instructive to present. Rather, in Fig. 1, we illustrate the region P Y and highlight where q = 2/3 and q > 2/3.
Proof of Lem. 7. We begin by observing that for any |Ψ there exists a decomposition into stabilizer states such that |Ψ = j c j |φ j and ξ(Ψ) = ( j |c j |) 2 . Given an optimal ωwitness we have Therefore, Given that | ω|φ j | ≤ 1, the above equality can only hold if | ω|φ j | = 1 for every j with |c j | > 0. In particular, if Ψ is a non-stabilizer state it must have at least two non-zero c j terms, and there must exist at least two stabilizer states such that | ω|φ j | = 1. We return to use this fact shortly. Using the set of Pauli matrices as a matrix basis, we can write where the coefficients q x , q y and q z are real. Since |ω ω| is a rank-1 operator we know and since |ω ω| is a positive operator we have λ > 0. Given a valid ω-witness, we can always obtain another valid ω-witness by permuting any of {q x , q y , q z } or changing the signs. Therefore, the optimal ω-witness for a state in the set P Y has q x ≥ q y and q z ≥ q y , since this ordering maximizes | Ψ|ω | 2 . This means that the two stabilizer states with the largest overlap with |Ψ are |+ and |0 . We showed earlier there must be at least two stabilizer states for which | ω|φ j | = 1, so we conclude | ω|+ | = 1 and | ω|0 | = 1. It follows that q x = q z and we define q := √ 2q x = √ 2q z . Condition Eq. (39) implies that q y = 1 − q 2 so we have shown that the optimal ω-witness has the form Furthermore, | ω|+ | = 1 implies that Lastly, we note that only satisfies | ω|+ i | ≤ 1 if q ≥ 2/3. Therefore, we know the form of the ω-witness in the set P Y and proved that 2/3 ≤ q ≤ 1. Next, consider the special case when the state is at an intersection, such as P Y ∩ P Z . Then, the optimal ω-witness has the above form determined for the region P Y . However, the region P Z only differs by an F rotation, so the optimal ω-witness must have a similar form but with the Pauli operators permuted, so that The only way Eq. (34) and Eq. (43) can both be true, is if q = p = 2/3. A similar argument holds for P Y ∩ P X and this proves Lem. 7.
Our third lemma now shows that every mixed state is contained in an appropriate convex set. Lemma 8. For any single-qubit non-stabilizer state ρ, there exists a ω-witness |ω such that ρ ∈ B ω (as defined in Lem. 5).
This implies that for single-qubit states we can leverage Lem. 5 and Lem. 7 to prove Thm. 4.
Proof of Lem. 8. We consider individual slices of the Bloch sphere such that where σ F = (X + Y + Z)/ √ 3 and f is a constant labelling a particular slice. Let us denote S f as the set of all pure states inside this slice. For every non-stabilizer state in the positive octant we have 1/ √ 3 ≤ f , and for all normalized states we have f ≤ 1. Within this slice there are three special, pure states, which are where a obeys Crucially, these states are the unique pure states of the following set intersections.
Referring back to Prop. 6, it is clear that these states must have the form given in Eq. (45). Notice that these special states are Clifford rotations of each other. By Lem. 7 these three special states all have the same optimal ω-witness given by Eq. (35). Since they share their optimal ω-witness, Lem. 5 applies to all convex combinations Therefore, for these states, a mixture of states with the same amount of magic achieves the optimal convex roof extension.
That is, each of these states admit an equimagical decompositions (recall Def. 3) Next, we consider mixed states outside the convex hull of Fig. 2. Let us define a set of linearly independent, Hermitian unitary We encountered σ F earlier but reintroduce it here. The set {σ A , σ B , σ F } is unitarily equivalent to {X, Y, Z}, so every state can be decomposed in this basis where inside the slice S f we have r F = f . The variables r A and r B are used for the co-ordinate system in Fig. 2. Given a mixed state ρ, we can define a pair of pure states and the states are pure, so that There are two possible solutions for Φ ± ρ |σ B |Φ ± ρ , which leads to By construction, ρ is a convex combination of Φ + ρ and Φ − ρ . The geometry is illustrated in Fig. 2, where the pair of purified states are shown as green dots with ρ located on the line between them. To deploy Lem. 5, it remains to prove that Φ ± ρ share an optimal ω-witness. Let us assume for now that Φ ± ρ are both in the region P Y , then we can use Lem. 7 to determine the form of their optimal ω-witnesses. In Lem. 7, the witness ω(q) had a free parameter q that we had to maximize over. Since ω(q)|σ B |ω(q) = 0 for any q value, we have Performing the maximization over q, the optimal q value is the same for Φ + ρ and Φ − ρ due to Eq. (56). Therefore, Φ ± ρ share exactly the same optimal ω-witness.
This almost completes the proof, but we assumed above that Φ ± ρ are both in the region P Y and this has yet to be shown. It is perhaps clear to some readers from the geometry shown in Fig. 2, but let us also show it formally. For instance, we need to know that  (47), and the purple triangle denotes the convex hull of the set For states outside this convex set, we instead consider the state as a mixture of two pure states |Φ + ρ and |Φ − ρ as defined in Eq. (55) and shown with green dots.
The first condition is easy to confirm because Next, we tackle the second inequality (with the third inequality following in a similar fashion). One computes that which is positive whenever or more concisely For mixtures of Ψ X f and Ψ Z f , we find this holds with equality.
However, we are currently considering ρ outside this set, just outside the facet spanned by Ψ X f and Ψ Z f . Therefore, Eq. (61) indeed holds. This completes the proof of Lem. 8 and thus also of Thm. 4.

IV. MULTIPLICATIVITY
We now study the behavior of the monotones Ξ, Λ, and Λ + for tensor products of states. It was found by Bravyi et al. [11] that ω-witnesses of small dimension are closed under the tensor product, formalized as follows.
This is a rewording of Corollary 1 and Corollary 3 of Ref. [11]. From the above result, Ref. [11] further showed that the extent is multiplicative for such tensor products: Theorem 10 ( [11]). Let |ψ j be 1-, 2-, or 3-qubit states. Then Here, we give a related multiplicativity result for several mixed state monotones.
Theorem 11. Let σ j be single-qubit states. Then Furthermore, we know of no obstruction or counterexamples preventing multiplicativity in full generality, and indeed numerical results suggest that Λ + is also multiplicative for mixed two-qubit states. Prior to this work, there were no known strict multiplicativity results for resource monotones for mixed states in qubit magic theory. For instance, Howard and Campbell [10] found that the robustness of magic can be strictly sub-multiplicative, R(ρ ⊗ ρ) < R(ρ) 2 for all nonstabilizer ρ considered, and we discuss this later in this section. There does exist a multiplicative lower-bound on the robustness of magic, proved using the so-called stab-norm [10]. However, the lower bounds and upper bounds appear to always be loose and so we have no strict multiplicativity results. Additionally, Raussendorf et al. [21] introduced a qubit-based phase-space robustness R ps that can behave strictly supermultiplicatively, so that R ps (ρ ⊗ ρ) > R ps (ρ) 2 for some ρ.
Proof of Theorem 11. From the definition of Ξ we see that it is manifestly sub-multiplicative. Combining this observation with Thm. 4 we have that holds for all products of single-qubit states. Strengthening this to strict equality requires us to find a matching lower bound. The proof of Thm. 4 established that for every single-qubit state the optimal W + -witness has the form |ω j ω j | where ω j is an ω-witness. By Thm. 9, |Ω = ⊗|ω j is also an ω-witness, and consequently |Ω Ω| = ⊗|ω j ω j | is a W + -witness that can be used to lower bound Λ + as follows Combining Eq. (64), Eq. (65) and Thm. 3 we obtain Since the left-and rightmost quantities are the same, all these inequalities must collapse to equalities.

V. COMPARISON WITH ROBUSTNESS
Here we discuss how our new monotones scale compared to the robustness of magic (recall Def. 4). While Λ + , Λ and Ξ are often equal, the robustness of magic is typically much larger, as formalized in the following result.
Lemma 12. For any density matrix ρ we have Furthermore, if ρ is a single-qubit state we have the tighter bound We remark that a similar result to Eq. (67) for Λ is claimed in Refs. [35,39], but the proof contains an error. See Fig. 3 for a numerical comparison of R(ρ) and Λ + (ρ) for a class of 1-qubit states for which our bound is tight. However, because the robustness of magic is not multiplicative, Lem. 12 does not tell us much about how the different monotones scale. For this, we observe that the gap can scale exponentially.
Theorem 13. Given any single-qubit non-stabilizer state ρ, there exists positive real constants α and β where α > β and so that For example, for the Hadamard |H state we will show that α = 0.271553 and β = 0.228443.
Proof of Lem. 12. The dual formulation of the robustness of magic tells us that R(ρ) ≥ Tr[Rρ] for any R such that | φ|R|φ | ≤ 1 for all |φ that are stabilizer states. We call such an operator an R-witness. Note that an R-witness is not necessarily positive. Let W denote the W + -witness such that Λ + (ρ) = Tr[W ρ]. Now, we consider the operator where s = min φ∈Vn φ|W |φ .
Next, we show R is indeed an R-witness. We have for any |φ ∈ V n that where we have used φ|W |φ ≤ 1. We also have where we have used φ|W |φ ≥ s. Therefore, R is indeed an R-witness and we can lower bound the robustness as follows Since Λ + (ρ) ≥ 1, the right hand side is monotonically increasing with s on the relevant range s ∈ [0, 1). This prompts the question whether we can lower bound s. By definition s ≥ 0 for any W + -witness and so Eq. (67) holds in general.
In the special case of single-qubit states, and assuming for brevity that ρ ∈ P Y , we know the optimal witness has the form W = |ω ω| given by Lem. 7. Since |0 has the largest possible overlap with |ω , it follows that |1 must have the smallest possible overlap and one finds that Over the allowed range q ∈ [ 2/3, 1], we have for every optimal single-qubit W + -witness. Substituting this into Eq. (75) gives Eq. (68).
Proof of Thm. 13. The stab-norm D has been shown to provide a lower bound on the robustness of magic (see the supplementary material of Ref. [10] and also Ref. [46]) as follows where for single-qubit states the stab-norm is Defining α = log 2 (D(ρ)), we obtain Eq.  and also Ref. [46]) also provides a lower bound on the robustness of magic R. While the stab-norm bound is loose, it establishes an exponential gap for many copies of a single-qubit state (see Thm. 13).
For this class of states, the bound in Eq. (68) is tight.
To show α > β in general, we need to show that D(ρ) > Λ(ρ) for all non-stabilizer, single-qubit states. First, we note that for the single-qubit case the robustness is easily solved and found to be for any non-stabilizer state. Therefore, Combining this with Lem. 12, we get which simplifies to This reveals that D(ρ) > Λ + (ρ) whenever Λ + (ρ) > 1.
We further remark that the robustness of magic is not multiplicative and the known upper bounds on R(ρ ⊗n ) are loose compared to the lower bound in Eq. (69). For instance, Heinrich and Gross [37] showed that for the Hadamard state (or the equivalent T -state) R(|H H| ⊗n ) = O(2 0.368601n ) and this is the best known upper bound.

VI. DISTILLATION AND ASYMPTOTIC RATES
We now consider the scenario of distillation -that is, consuming many copies of an input resource state ρ to prepare copies of some target state -and show how the quantifiers we introduced characterize this task. Firstly, it is easy to see using the multiplicativity of the magic monotones Λ, Λ + , and Ξ for single-qubit systems together with their monotonicity that, whenever there exists a stabilizer operation taking ρ ⊗k → σ ⊗m for some single-qubit ρ and σ, we must have and analogously for the other magic monotones. This already allows one to obtain insightful no-go results on the transformations between stabilizer states and gate synthesis, along the lines considered in [10] but without the need to perform the difficult computation of the monotones for many copies of a state. However, in practical settings it is often desirable to go beyond such exact transformations and consider protocols which allow for imperfect conversion. Our quantifiers can yield bounds for the efficiency of more general distillation protocols and their asymptotic rates. We focus on the magic monotone Λ + as it is the most efficiently computable out of the three and gives us the tightest bounds. A useful property of Λ + is its monotonicity on average under general probabilistic protocols. More specifically, we have [35] where each O i is a stabilizer-preserving quantum operation that need not preserve trace Consider then any protocol which takes k copies of a given n-qubit input ρ to m copies of some desired pure output state ψ, up to error ε in fidelity, and succeeding with probability p. Specifically, assume that there exists a probabilistic (that is, not necessarily trace preserving) stabilizer operation taking ρ ⊗k → pτ , where τ is a state such that ψ ⊗m |τ |ψ ⊗m ≥ 1 − ε. Using F (ψ) = max |φ ∈Vn | ψ|φ | 2 to denote the stabilizer fidelity [11], we then have where the first inequality arises from the sub-multiplicativity of Λ + , the second inequality arises from Eq. (85), and the third inequality arises because |ψ ψ| ⊗m /F (ψ ⊗m ) is a W + -witness and hence a feasible solution to the dual form of Λ + in Eq. (11). Furthermore, if ψ is any single-qubit, twoqubit or three-qubit pure state, then F (ψ ⊗m ) = F (ψ) m (see Ref. [11] or Thm. 9). This implies that at least k copies of ρ are necessary to distill any such state ψ, where This characterizes the dependence on the resources contained in both ρ (as quantified by Λ + ) and in ψ (as quantified by stabilizer fidelity F ). When p = 1, Eq. (86) recovers a related recent bound of [47]. Another bound of this kind, which also explicitly depends on Λ + (ρ) and F (ψ) but exhibits a different scaling with respect to ε, was recently obtained in [25]. We compare the performance of the bounds in Fig. 4. Alternatively, if we use log Λ + instead of Λ + in the above derivation (noting that log Λ + also decreases on average under stabilizer protocols due to concavity of the logarithm), for any pure state ψ of at most three qubits we have which can give a better bound (see Fig. 4). When ε = 0, we obtain a bound on the "distillation efficiency", as considered for qudits in [17,18]: Additionally, the ultimate constraints on the convertibility between two states are often characterized in the asymptotic limit, where we are interested in the best achievable rate R(ρ → ψ) at which k copies of ρ can be approximately converted to kR(ρ → ψ) copies of ψ, with the error ε of this conversion vanishing in the limit k → ∞. Using Eq. (88) with p = 1, any such rate must satisfy which gives a semidefinite programming upper bound on the asymptotic rate of transformation between any state ρ and a pure state ψ of at most three qubits.
States of interest in magic state distillation include |H and |F [2]. These states obey a Clifford symmetry in the following sense; we say a state |ψ is Clifford symmetric if there exists an Abelian subgroup C ψ of the Clifford group such that: (i) C|ψ = |ψ for all C ∈ C ψ ; and (ii) |ψ is the unique state with this property up to a global phase. Crucially, any such state has extent equal to the inverse of its stabilizer fidelity [11], i.e.
When we already know the value of the extent ξ(ψ) is, we only need to evaluate Λ + (ρ) to determine the bound in (90). For instance, for the rate of transformation from any state to a Clifford symmetric state of up to three qubits, we get R(ρ → ψ) ≤ log Λ + (ρ) log Λ + (ψ) . Asymptotic distillation rates of the magic states |H and |F are bounded by and where we used the known values of ξ(|H ) and ξ(|F ) [4,11]. The above can be compared with the recent bounds obtained in [18] for qudit magic state theory, as our approach similarly yields computable upper bounds on the rates of distillation, although applicable to the fundamentally important case of qubit systems.
We can alternatively show these asymptotic results by bounding the achievable rates of transformations between states using stabilizer protocols using the regularized relative entropy of magic r ∞ (ρ) = lim n→∞ 1 n r(ρ ⊗n ) [17], where r(ρ) = min σ∈Fn D(ρ σ). The ratio r ∞ (ρ)/r ∞ (σ) provides a general upper bound on the rate R(ρ → σ) of the transformation from ρ to σ using stabilizer protocol. This upper bound is achievable whenever the states can be reversibly interconverted [17] or when the set of stabilizer protocols is relaxed to the class of operations which asymptotically preserve the set of stabilizer states [48]. Using the bounds r(ρ) ≤ log Λ + (ρ) for arbitrary states [49] and r(ψ) ≥ − log F (ψ) for pure states [49], we similarly obtain Eq. (90). Note that r(ψ) = log Λ + (ψ) for any Clifford symmetric state, and r ∞ (ψ) = log Λ + (ψ) for a Clifford symmetric state of at most three qubits.
Finally, we note that the best known magic state distillation protocols perform many orders of magnitude worse than our best bounds, to the extent that plots of performance versus bounds would not be informative. It remains a considerable challenge to close this gap.

VII. CLASSICAL SIMULATION OF MAGIC RESOURCE STATES
In this section, we present three classical algorithms that simulate the measurement statistics of various families of quantum states. Since the runtime and simulation error of our algorithms can be expressed in terms of the magic monotones described in Section II, we endow these magic monotones with direct operational meaning.
We define the notion of a quantum sub-theory to better understand the operational meaning of our magic monotones. We take the fundamental elements of a sub-theory (V, T ) to be a set of pure state vectors V and a set of operators T , such that if |φ ∈ V and K ∈ T , then K|φ = c|φ for some scalar c and |φ ∈ V.
We say that a sub-theory is free if: (i) each n-qubit state in V can be specified uniquely by poly(n) classical bits; (ii) for any K ∈ T we can efficiently compute the update K|φ = c|φ , including the prefactor c; and (iii) we can efficiently compute the complex-valued inner product φ|φ , for any pair of states in |V . Given this definition of an elementary free sub-theory one can always define three models for classical simulation that allow increasingly powerful simulation tasks: a pure state evolution model, a mixed state model, and a dyadic model. Although our work can be applied to any free sub-theory satisfying the above criteria, for concreteness we will focus on the stabilizer sub-theory, and discuss extensions later.
For the stabilizer sub-theory, the n-qubit pure state model takes the free states to be the n-qubit pure stabilizer states. We restrict the free operations to be the set of Clifford gates C n and the set of stabilizer projectors P n . Note that these are both subsets of T . Given an initial quantum state |ψ , a string of gates U (1) , . . . , U (T ) ∈ C n and a stabilizer projector Π, we can efficiently compute the associated Born rule probability, . This is essentially the setting of the original Gottesman-Knill theorem [5,6]. The pure state sub-theory extends naturally to a mixed state model, where the free states F n include all convex combinations σ = j p j |φ j φ j | of pure stabilizer states, and the free operations are the completely stabilizer-preserving channels (see Definition 1). Any channel E ∈ O n can be written in terms of elements of T via the Kraus representation:  [10,24]. We can further extend to a dyadic model of simulation, but we defer discussion to subsection VII A.
When our algorithms work outside the framework of free quantum sub-theories, they incur an additional cost manifested either as an increased runtime or a reduced precision in the final estimate. Our magic monotones can quantify this cost, and thereby determine how far a computational task de-parts from a given free sub-theory.
In subsection VII A, we generalize quasi-probability-based methods [7,10,23] for estimating Born rule probabilities or expectation values of bounded observables. By representing a non-free density matrix as a quasi-probability distribution over elements from the free sub-theory, one repeatedly samples from this distribution to estimate Born rule probabilities. The runtime of standard quasi-probability simulators is proportional to the square of the 1 -norm of the distribution. For us, this corresponds to the dyadic negativity squared of the initial state, Λ(ρ) 2 . As discussed in Section II, extension to the dyadic frame typically enables a reduction in the 1 -norm of the decomposition, and hence our simulator shows a speedup compared to previous related methods [10,23].
Next, we describe in subsection VII B an algorithm for sampling bit-strings at the output of a quantum circuit, given an initial mixed magic state ρ as input. We extend previous stabilizer rank-based methods [9,11], previously only defined for pure states, to arbitrary mixed states. A key step in the method of Ref. [11] approximates exact pure state decompositions with k-term sparsified decompositions, with the runtime of the simulator being O(k). The sparsification lemma (Lemma 6 in Ref. [11]) implies that to achieve target precision δ S in the trace-norm, one should set k ∝ c 2 1 /δ 2 S . Here, we improve this sparsification lemma so that under modest assumptions, we can set k ∝ c 2 1 /δ S . This dramatically improves the runtime in some important cases. We also show that the expected runtime of the simulator for optimal decompositions is O(Ξ(ρ)).
Finally, in subsection VII C, we give a simple method to efficiently estimate Pauli expectation values up to a finite precision on E(ρ) for stabilizer channel E and non-stabilizer state ρ, given a known primal solution to the generalized robustness problem, Eq. (12). The method works by approximating the magic state ρ with the stabilizer part of the decomposition. When the decomposition of ρ is optimal, the estimation error increases linearly with the generalized robustness Λ + (ρ), so that the technique eventually fails to give meaningful estimates when ρ has too much magic. Nevertheless, our algorithm can efficiently compute coarse bounds on measurement statistics for near-term noisy circuits, giving tightened bounds when the Pauli estimate is close to ±1.

A. Dyadic frame simulator
Howard and Campbell [10] proposed an algorithm similar to the quasi-probability simulator of Pashayan, Wallman and Bartlett [7], where the simulator's runtime scales quadratically with the robustness of magic. These algorithms produce additive error estimates of Born rule probabilities and under certain circumstances (see Ref. [50]), such algorithms can be lifted to simulators that approximately sample from the quantum output distribution. Ref. [10] expresses the initial states' density matrices as quasi-probability distributions over the set of pure, n-qubit stabilizer states {|φ j φ j | : |φ j ∈ V n }. The quasi-probabilities are real-valued, sum to unity, and can be negative. The algorithm of Ref. [10] proceeds by sampling pure stabilizer states from a probability distribution (constructed from the quasi-probability distribution). Sampled stabilizer states then propagate through a stabilizer circuit using the standard Gottesman-Knill tableaux method [5,6].
Here we extend the sub-theory simulation model to include dyads |L R| where |L and |R may be different stabilizer states. A 'state' σ is now considered free if it is in the convex hull of the dyads e iθ |L R|. Non-free density matrices are expressed as quasi-probability distributions over the set of n-qubit dyads, where quasi-probabilities are now complexvalued. As we shall see, the associated dyadic negativity will quantify the classical simulation overhead for estimating Born rule probabilities on a non-free state. Our algorithm proceeds by sampling dyads from a probability distribution (now constructed from the complex-valued quasiprobability distribution). Our extended algorithm must therefore track a complex pre-factor when updating a sampled dyad |L R| → K|L R|K † , whereas the original tableaux method used in the Gottesman-Knill theorem does not track this global phase. Subsequent extensions of the Gottesman-Knill method show that the update can be efficiently computed, including global phase, whenever K is a Clifford unitary or Pauli projection [8,9,11]. Moreover, we can efficiently compute the complex inner product L|R for any pair of stabilizer states [8,9,11]. For certain initial states, the 1 -norm of known quasi-probabilistic decompositions is dramatically reduced by extending the support to include all dyads, which improves the runtime of our algorithm exponentially. We capture the performance of our simulator with the following theorem.
Theorem 14. Let an n-qubit initial state with known dyadic decomposition be ρ = j α j |L j R j | for complex α j , where the associated probability distribution {|α j |/ α 1 } can be ef- is a completely stabilizer-preserving channel that acts non-trivially on at most b qubits. Suppose that every O (t) has a known decomposition into at most N K stabilizerpreserving Kraus operators. Then, given a stabilizer projector Π, we can estimate the Born rule probability with probability at least 1 − p fail and additive error within a runtime Furthermore, if the dyadic decomposition of ρ is optimal then α 1 can be replaced by Λ(ρ).
For concreteness, we assume that for each t, the decomposition of O (t) into at most N K stabilizer-preserving Kraus operators is given as a length ≤ N K list with each entry being a pair. The pair encodes a stabilizer-preserving Kraus operator and its associated weight factor. The stabilizer-preserving Kraus operator is described by giving an efficient description of the stabilizer state corresponding to the Choi state Φ O (t) as defined in Eq. (2).
Before proving the theorem, we briefly comment on the scaling of the runtime with α 1 and N K . The quadratic scaling of our simulator's runtime with α 1 is reminiscent of Howard and Campbell's algorithm associated with robustness of magic [10]. The key improvement arises from extending the free subtheory. In particular, the move from representing non-free states as real linear combinations of stabilizer projectors to complex combinations of stabilizer dyads can reduce the state's 1 -norm. This results in runtime improvements that are exponential in the number of non-free states. We proved this explicitly in Thm. 13 for products of single-qubit magic states. More generally, an exponential runtime gap will always be present as a consequence of only having access to optimal decompositions for a few qubits.
We see that the factor N K in the runtime arises from the number of transition probabilities that must be computed for each channel O, which is equal to the number of Kraus operators in its decomposition. Equivalently, the Choi state Φ O for each b-qubit channel O is decomposed as a convex combination of N K pure stabilizer states. The number of vertices in the optimization problem is equal to the number of 2b-qubit pure stabilizer states. Since the number of stabilizer states grows super-exponentially, one might worry that N K could also grow super-exponentially, even for channels that act on few qubits. For example, the two-qubit channel optimization problem already has 36720 vertices. However, the real vector space inhabited by 2b-qubit density matrices is (4 b − 1)dimensional. We can therefore completely partition the stabilizer polytope into simplices with 4 b vertices, where any mixed stabilizer state inhabits at least one simplex. Hence Φ O is a convex combination of at most 4 b pure stabilizer states (e.g. for 2-qubit channels, we need at most 16 vertices). In practice, the number of non-zero elements in channel decompositions obtained numerically is typically only a small frac- for r ← 1 to NK do 3: Pr ← qrKrσK †   To prove Theorem 14, we present the algorithm and prove its validity. Our algorithm has two subroutines: (i) Algorithm 1, which is an extended Gottesman-Knill-type subroutine that probabilistically updates an input stabilizer dyad given a Kraus decomposition of stabilizer channel O; and (ii) Algorithm 2, which is an outer quasi-probability sampling routine that samples an initial dyad from the initial nonstabilizer state and propagates the dyad through the circuit.
To simulate a quantum state within an additive error of with success probability p suc ≥ 1 − p fail , we require at least M samples in our algorithm, where To prove the validity of our algorithm we must: (i) explain how the stabilizer update q r K r σK † r can be carried out efficiently, (ii) show that the values P r in steps 2-5 of Algorithm 1 form a proper probability distribution, and (iii) show thatμ returned by Algorithm 2 is an unbiased estimator for Tr[ΠE(ρ)]. The runtime given in Theorem 14 then follows from standard arguments for quasi-probability simulators [7,10].
(i) Efficient stabilizer update with Kraus operators.
In Algorithm 1 we must perform the update |L R| → q r K r |L R|K † r and calculate the trace norm q r K r |L R|K † r 1 efficiently. Since only a single Kraus operator is involved, we compute the bra and ket updates |ψ → √ q r K r |ψ independently. We interpret this update as √ q r K r,A ⊗ 1l B |ψ AB , where the partition A contains the b qubits on which K r acts non-trivially, and B comprises of the remaining n − b qubits. By assumption, we are given K r in terms of a stabilizer state description of the Choi state |φ r φ r |: where K r (·) = K r (·)K † r , and |Φ b is the maximally entangled state defined in Eq. (3). By the Choi-Jamiołkowski isomorphism, the action of the operator K r can be reproduced by a post-selected teleportation circuit [51] consuming a stabilizer 'resource' state |φ r : (99) We prepare the input state |ψ ∈ {|L , |R } on a partition DB, where D comprises the b qubits on which the Kraus operator should act non-trivially. The resource state is prepared on the the ancillary subsystem AC. Following the update, the final state of the system of interest K r |ψ AB is product with the state |Φ b CD . The state on partition CD can therefore be reset without affecting AB, and the qubits re-partitioned for the next update. Since the right side of Eq. (99) only contains stabilizer states and stabilizer inner products, a classical description of the resulting (non-normalized) stabilizer state can easily be classically computed. Additionally, the unwanted component of the resultant product state can be discarded whilst describing K r |ψ classically. Throughout this update, we track the global phase using the phase-sensitive simulator described in Ref. [11], where it was shown that the update corresponding to the projection (1l + Q)/2, where Q is a Pauli operator, can be carried out for a system of N qubits in O(N 2 ) steps. Here we have N = n + 2b where b = O(1), so the runtime for a single step is O(n 2 ). Therefore the projection onto |Φ b Φ b | can be done in time O(bn 2 ). We need to compute the updates for the bra and ket separately, but this simply doubles the runtime for each Kraus operator. It is also possible to track the Euclidean norm of the state vector √ q r K r |ψ through this update. The trace norm is then The vector norms appearing on the right side are easy to classically compute by evaluating the (stabilizer) inner product between K r |ψ and its dual. We have to compute this norm twice for all N K Kraus operators, so a single call to the function defined in Algorithm 1 completes in time O(N K bn 2 ). We call the subroutine M times to generate each sample, so the total runtime is M poly(n, T, N K ).
(ii) Valid probability distribution. From the definition of P 0 in step 5 of Algorithm 1, it is clear that N K r=0 P r = 1 and P 1 , . . . , P N K ≥ 0. Hence, to show that {P r } is a probability distribution, it suffices to show that P 0 ≥ 0. We have |ψ ∈ {|L , |R } normalized and O is trace non-increasing. It is given that O(·) = r=1 q r K r (·)K † r is a CPTP map, so for any pure state |ψ , we have Let Q (ψ) be the N K -element real vector where the r-th entry is Q (ψ) r = √ q r K r |ψ . From Eq. (101), we have that Q (ψ) ≤ 1. Then we can express the sum of P r for r ≥ 1 as a dot product between Q (L) and Q (R) : where in the last line we used the Cauchy-Schwarz inequality to show that r≥1 P r ≤ 1, as promised. We note that the strategy of using an 'abort' outcome P 0 was deployed in the appendix of Ref. [52] to simulate post-selective channels. In our case the fact that P r for r ≥ 1 can sum to less than 1 instead arises from the non-Hermiticity of the initial 'state' σ.
(iii) Unbiased estimator. Finally we show that the expected value ofμ in Algorithm 2 is Tr[ΠE(ρ)]. Recall that the t-th circuit element O (t) has a known Kraus decomposition . Let the (T +1)-element vector r = (r 0 , r 1 , . . . , r T ) label a particular trajectory through the circuit, in the following sense. The first entry r 0 labels the initial dyad σ (0) r = |L r0 R r0 | sampled in step 3. For t ≥ 1, the entry r t gives the index of the Kraus operator chosen at the t-th circuit element and we write K  denote the current dyad updated up to the t-th Kraus operator along the trajectory r, so that we have the recursive relation σ is the probability of obtaining the outcome corresponding to the map K (t) r . The probability P r of choosing the trajectory r is given by where P (0) r = |α r0 |/ α 1 is the probability of sampling the initial dyad σ (0) r , and for t ≥ 1, P 1 is calculated in the t-th call to Algorithm 1. Then the final dyad σ (T ) r that we obtain from sampling the trajectory r is where r . This dyad is properly normalized according to the trace norm, but is only defined for those trajectories with non-zero probabilities P (t) r > 0. We write P to denote the set of all such non-zero probability trajectories. Now, there are two mutually exclusive possibilities for a given iteration of Algorithm 2: either we pick r t > 0 at each circuit element, choose some r ∈ P, and thus obtain a normalized dyad σ (T ) r , or at some step we choose r t = 0, and the iteration terminates with σ (T ) r = 0. Since these are the only possible outcomes, the total probability of the latter occuring must be p 0 = 1 − r∈P P r . We can now write down an explicit expression for the expectation value of the random variable µ m in step 12: where in the second line we have cancelled the factors P (t) r for t ≥ 1 with those in the denominator of Eq. (107). The real vectors r / ∈ P are never chosen when running the algorithm, since they correspond to paths where P (t) r = 0 for some t, and hence K (t) r (σ (t−1) r ) = 0. Since K r (σ (0) r ) = 0 for all r / ∈ P, we can add these zero-probability trajectories back into the summation in Eq. (109) without affecting the total value. Thus where in the second line we have written P (0) r outside of the inner sum since this probability is independent of r t for t ≥ 1. The inner expression sums over all Kraus trajectories, and by linearity we have r1,...,r T Hence where in the second line we used the definition P (0) r e iθr 0 = α r0 / α 1 . Hence μ = Tr[ΠE(ρ)], soμ is an unbiased estimator. Since for each individual sample we have |µ m | ≤ α 1 , standard arguments based on Hoeffding inequalities [7,10] show that the number of samples required M is as given in Eq. (97) resulting in the total runtime as given in Theorem 14.

B. A stabilizer rank simulator for mixed states
Any pure quantum state |ψ can be expressed as a linear combination of stabilizer states. Namely, we can always write where |φ j are stabilizer states and c j are quasi-probabilities that are complex numbers. The minimum number of constituent stabilizer states k is |ψ 's stabilizer rank k. A stabilizer rank simulator performs Clifford updates and Pauli measurements on |ψ with a runtime that is linear in k, and the norm of the vector c with components c j . More specifically, a fast norm estimation routine can approximate the probabilities of the measurement outcomes. When |ψ is a tensor product of T -states, Bravyi and Gosset [9] showed one can approximately efficiently simulate |ψ using a simulator on randomly generated states |Ω of low stabilizer rank, where the random states |Ω approximate |ψ well with high probability.
Bravyi. et al in Ref. [11,Lem. 6]) extend this result, so that for any pure state |ψ , there exist random states |Ω of stabilizer rank k so that with · denoting the Euclidean norm. We present a simple corollary of [11,Lem. 6]). This implies that For any target precision δ S > 0, choosing k so that: we get We call (121) the BBCCGH sparsification lemma [11].
Our simulators work not only on pure states, but also on mixed states, and our simulator's runtime relates to our monotone Ξ. First, we prove a new sparsification lemma that improves the runtime over that implied by Eq. (121) by a factor of 1/δ S with minor caveats. Second, we apply our new lemma to classically simulate bit-string sampling. While our ideas naturally apply to estimating Born probabilities, and can be extended to propagate an initial state through a stabilizer circuit prior to measurement, we omit this for brevity.

Sparsification lemma
We consider the subroutine SPARSIFY. The input to SPAR-SIFY is an integer k and pure state |ψ j with known stabilizer decomposition (117) with quasi-probability vector c. For every positive integer α, let |ω α be a random vector that is equal to (c j /|c j |)|φ j with probability p j = |c j |/ c 1 . Clearly, Clearly, E[|Ω ] = |ψ . Since the output |Ω of SPARSIFY is a random vector, this random vector need not have a unit norm. In [11], after obtaining a state |Ω from SPARSIFY, one estimates its Euclidean norm, and discards the state if its norm is not close to 1. For such a post-selected pure state, Eq. (118) holds with high probability. Here, we instead consider a sampling strategy that avoids post-selecting |Ω . After SPARSIFY gives a random |Ω , we renormalize so that it has unit norm. Furthermore, instead of bounding the error between an individual sample and the target state |ψ , we bound the error between |ψ and the whole ensemble as captured by the density matrix Intuitively, this is advantageous because coherent errors in each sample smooth out to a less harmful stochastic error. Similarly, randomizing coherent errors improves error bounds in the setting of circuit compilation [53][54][55][56].
The constant C ψ,c quantifies the performance of our sampling strategy on |ψ , where and depends only on the |ψ 's stabilizer decomposition (117). We can also equivalently write C ψ,c in terms of an expectation of the randomly sampled vectors |ω so that Our generalization to the BBCCGH sparsification lemma is then the following theorem.
Theorem 15. Let |ψ and c be an input state and a quasiprobability distribution as given in (117), and let ρ 1 be the mixed state in (123). Let C = C ψ,c be as given in (124).
There is a critical precision δ c = 8(C − 1)/ c 2 1 such that for every target precision δ S for which δ S ≥ δ c , we can sample from a mixed state ρ 1 . Moreover, every sampled pure state in ρ 1 has stabilizer rank at most 4 c 2 1 /δ S and When |ψ is a Clifford magic state as defined in Ref. [11], the critical precision becomes δ c = 0, and sampled pure states in ρ 1 have stabilizer rank at most The proof of Theorem 15 follows from two lemmata. We discuss these below, but first we consider how restrictive our lower bound on the precision δ S is in practice. Consider |ψ = |ψ ⊗N where |ψ are pure states. When |ψ is a product of N pure states, we can write the random vector as |ω = N α=1 |ω α , where |ω α are i.i.d. random vectors. It follows that E | ψ|ω | 2 = (E | ψ |ω α | 2 ) N . Since |ω α are always stabilizer states, when |ψ are non-stabilizer states, we have | ψ |ω α | 2 < 1. Therefore the lower bound 8(C − 1)/ c 2 1 < 8C/ c 2 1 = 8(E | ψ |ω α | 2 ) N vanishes for large N when |ψ is a tensor product of N pure states. More generally, since we can only find optimal decompositions for small numbers of qubits, known decompositions of many-qubit states are typically constructed as products of fewqubit decompositions, so we expect that C is small for many cases of interest. We now discuss the lemmata that lead to Theorem 15.
Lemma 16. Given a state |ψ = j c j |φ j where φ j are stabilizer states, we can sample from a mixed state ρ 1 such that every sampled pure state has stabilizer rank ≤ k and where |Ω is the random sparsified vector defined in Eq. (122).
Proof. First we introduce the operator where µ = E[ Ω|Ω ]. We insert 0 = (ρ 2 − ρ 2 ) as follows, and use the triangle inequality so that Now, Using Jensen's inequality we can bring the expectation value outside the norm so that That µ = E[ Ω|Ω ] = 1+( c 2 1 −1)/k comes from Ref. [11]. Loosening (133) with 1 µ ≤ 1 gives which is simply the average deviation of Ω|Ω from the mean. Using Jensen's inequality we get and so Next, we consider the term ρ 2 − |ψ ψ| 1 , by first finding an explicit form for ρ 2 . Observe that so taking the expectation value we have Let σ := E[|ω α ω α |]. We split (138) into two summations as follows: In the first contribution, we used the independence of ω α and ω β when α = β and E[|ω α ] = |ψ / c 1 . In the second contribution, we used σ 1 = 1. Next, there are k(k − 1) terms and k terms in the first and second summations respectively, so that Leading order Prior art Legend |ψ := (cos(θ)|0 + sin(θ)|1 ) ⊗100 k k k δ c δ c θ = π/8 θ = π/16 θ = 0.1187 5. A comparison of the trace norm error of target state |ψ and a sparsified state using k terms. The target state |ψ is a tensor product of 100 single-qubit states parametrized by θ, and we show three different choices of the parameter. Leading order refers to our Thm. 15 expression k = 4 c 2 1 /δS, and is valid provided δS ≥ δc with δc highlighted by a vertical line. Note θ = π/8 corresponds to the Clifford magic state |H , for which δc = 0. In the regime δS ≥ δc, our bound improves on the prior art [11] by a factor 1/δS. We also show an exact bound that is valid for all δS, which refers to Eq. (127) with the variance exactly bounded by Eq. (B33). The exact bound shows that (when C = 1), while we do not have a factor 1/δS improvement for very small δS, there is still a large saving. To better understand the deviations from the δS approximate scaling, we refer the reader to App. B and in particular Fig. 6, and to the discussion at the end of Section VII B. We also label the k value equal to the stabilizer rank χ upper bound that follows from Thm. 2 of Ref. [11]. At this k value we can obtain a δS = 0 simulation using the exact stabilizer rank decomposition.
Next, we use Lemma 17 to bound the variance of Ω|Ω .
Lemma 17. Using the notation of Lemma 16 the variance of Ω|Ω satisfies the bound where C = C ψ,c is as given in (124). When |ψ is a Clifford magic state as defined in Ref. [11], Since the proof of Lemma 17 is somewhat technical, we give it in App. B. By combining Lemmas 16 and 17 we can now prove Thm. 15. Substituting k = 4 c 2 1 /δ S and δ S ≥ 8(C − 1)/ c 2 1 into Eq. (144), we obtain and hence, using Using (147) with the expression for k and Lemma 16, we have This proves the main result of Theorem 15. When |ψ is a Clifford magic state, Eq. (145) combined with Lemma 16 gives This allows us to obtain Eq. (126) by setting k = (2 + √ 2) c 2 1 /δ S , completing the proof. We have shown that whenever the constraint on the target precision δ S is greater than a critical precision, one can sample from an ensemble of sparsified states ρ 1 that is δ S -close in the trace-norm to ψ|ψ , where the number of stabilizer terms k = 4 c 2 1 /δ S . Compared to the BBCCGH [11] sparsification lemma where k = 4 c 2 1 /δ 2 S , we see a factor 1/δ S improvement. If the target precision is smaller than the critical precision, one can compute C and obtain a sharp bound on the trace-norm error by using Lemmas 16 and 17 directly. In this case, the δ −2 S scaling of k is recovered, but with a prefactor often much smaller than in the original BBCCGH sparsification lemma. This is because one typically finds that (C − 1)/ c 2 1 1 for many-qubit magic states. We illustrate this in Fig. 5, where we compare the sharpened trace-norm bound of our Lemmata with that of Ref. [11] for states of the form |ψ N = |ψ ⊗N , where |ψ are single-qubit magic states, and N = 100. While δ S ≥ 8(C − 1)/ c 2 1 we have a quadratic improvement over Eq. (119), but even in the highprecision regime, we find a significant reduction in k.

Bit-string sampling
Theorem 18. Let ρ = j p j |ψ j ψ j | be an n-qubit state where every pure state |ψ j = r c (j) r |φ r has a known stabilizer decomposition. For every |ψ j , let C j = C ψj ,c (j) where C ψj ,c (j) is defined in (124). Let Ξ = j p j c (j) 2 1 , and let D = max{(C j − 1)/ c (j) 2 1 }. Then for any p fail > 0, and δ ≥ 24D there exists a classical algorithm that, with success probability (1 − p fail ), samples a bit-string x of length w with probability P (x) such that: where P (x) = Tr(Π x ρ), and Π x = |x x| ⊗ 1l n−w is a projector. The algorithm returns x with random runtime T where the expected runtime is If the decomposition of ρ is optimal with respect to the definition (6), then the expected runtime is O(Ξ(ρ)). Moreover, if the state decomposition is equimagical, then the right side of (151) also bounds the actual runtime T . If arbitrary precision δ ≤ 24D is required, this can be achieved at the cost of an increased runtime: All single-qubit states admit an equimagical decomposition (Thm. 4) that naturally extends to all tensor products of singlequbit states. We present the algorithm as Algo. 3 and prove its stated performance.
Algo. 3 uses two subroutines, SPARSIFY and FASTNORM, both of which originate from Ref. [11]. The SPARSIFY routine is the procedure described in the previous section. The second subroutine FASTNORM is the fast norm estimation algorithm described in Ref. [11]. FASTNORM takes as input error parameters p FN and FN , and the not necessarily normalized vector |Ω from SPARSIFY that has a known k-term stabilizer decomposition. Then with probability (1 − p FN ) FASTNORM outputs a random variable η that approximates |Ω 2 to within a multiplicative error of FN , in the sense that It is key that the error is multiplicative, because it is this that enables 1 -closeness between the ideal probability distribution, and that computed by the algorithm. While the bit-string sampling algorithm of Ref. [11] simulates measurements on pure states only, our Algo. 3 admits general mixed states. We now justify its validity and stated runtime. The proof strategy is to first show that the ensemble of states ρ that the algorithm samples from is close in the trace-norm to the ideal state ρ, before showing that the probability distribution P (x) computed by calls to FASTNORM is close in 1 -norm to P (x) = Tr[Π x ρ ] [9,11]. Algo. 3 first samples a state |ψ j from the ensemble with probability p j , and chooses a sparsification |Ω j,l with probability q j,l . if Px b =0 < 1/2 then 12:

17:
end if 18: Select y ∈ {0, 1} with probability P (x b = y|x), then x b ← y. 19: 20: x ← (x, x b ) 21: end for 22: return x FASTNORM then with high probability estimates the norm of |Ω j,l imperfectly up to multiplicative error. If we could compute each norm |Ω j,l exactly, we would be sampling bit strings x from a probability distribution defined by being the expected normalized sparsification of |ψ j , as defined in Eq. (123). The algorithm ensures that, by the result of Thm. 15 the number of terms for each sparsification is chosen so that ρ (j) 1 −|ψ j ψ j | 1 ≤ δ S +O(δ 2 S ), provided δ S ≥ δ c , which we can ensure by choosing δ S ≥ 8D. By the triangle inequality we have This analysis shows that Algo. 3 approximates the ideal density matrix ρ with a sparsified operator ρ that is δ S -close in the trace-norm. It remains to show that FASTNORM can generate probability distributions P (x) that approximate P (x) well in the for-loop of Algo. 3, where Here each Q j,l (x) is the probability of Algo. 3 returning x given the sparsification |Ω j,l . We now drop the subscript as we consider a single sparsification |Ω . Let x b denote the first b bits sampled of the string x. We take x 0 to be the empty string, so that Π x0 = 1l. Then at the b-th iteration, the computed conditional probabilities of sampling the b-th bit to be y given The probability of sampling the w-bit string x is the product of these conditional probabilities Therefore with probability at least (160) One can check that (1 + FN ) w /(1 − FN ) w ≤ 1 + 3w FN , whenever FN ≤ 1/5, and the analogous result holds for the lower bound. Therefore Q j,l (x) approximates Π x |Ω j,l 2 / |Ω j,l 2 up to multiplicative error 3w FN . Comparing (155) with (158), we therefore obtain: If we want to bound the total multiplicative error due to the sequence of calls to FASTNORM to , then we must set FN = /(3w). It then follows that In the first part of the proof we showed that ρ − ρ 1 ≤ δ S + O(δ 2 S ) and P (x) = Tr[Π x ρ ]. Combined with Eq. (162), we obtain where Similarly the error bound given above is only obtained with probability (1 − p FN ) 2w ≈ 1 − 2wp FN , so to obtain the above closeness in 1 -norm, with failure probability at most p fail , we must set p FN = p fail /(2w). If we select the state |ψ j in step 1, then k ≤ 4 c (j) 2 1 δ −1 S + 1. To return a single bit-string x there are at most 2w calls to FASTNORM, so the runtime is O( Recall that the statement of the theorem defined the quantity Ξ = j p j c (j) 2 1 . Therefore the expected runtime is O(w 3 n 3 Ξδ −1 S 2 log(w/p fail )). If the decomposition is optimal with respect to the monotone Ξ, then we have Ξ = Ξ(ρ) and the expected runtime is O(Ξ(ρ)). For equimagical states, Ξ(ρ) = ξ(ψ j ) for all j, and this expression becomes the actual runtime. We now optimize the choice of δ S and . Setting the total error budget δ = δ S + , by inspecting the runtime we find that the best constant is obtained by setting δ S = δ/3 and = 2δ/3. The constraint δ S ≥ 8D therefore becomes δ ≥ 24D. Substituting the optimal choice of δ S and into the expected runtime, we obtain Eq. (151). Now suppose that we want to achieve arbitrary precision, δ S ≤ 8D, where D = max{(C j − 1)/ c (j) 2 1 }. In this regime, one can amend the expression for k in step 2 to achieve any desired precision, at the cost of slightly poorer scaling in the runtime. We first use lemmata 16 and 17 to obtain a sharpened bound on the sparsification error: (164) When δ S 8D, we can achieve a precision of δ S by choosing Here we recover the same asymptotic δ −2 S scaling as derived from the original BBCCGH sparsification lemma [11]. However, the prefactor from this prior work was two, whereas our prefactor D is typically exponentially small in the number of qubits, as explained in the previous subsection. Therefore, at intermediate precision, the δ −1 S term may still dominate. When the target precision δ S is too small, our bound on the required k exceeds the number of terms in the exact decomposition of |ψ (i.e., the stabilizer rank of ψ). In this scenario, using a sparsified approximation in both our approach and in [11] has no benefit, and one should instead use an exact decomposition without any sparsification. Substituting the revised expression for k into the expected runtime, with δ S = δ/3 and = 2δ/3, we obtain Eq. (152).

C. Fast but noisy estimator
The definition of Λ + given in Eq. (13) implies that for a given state ρ, there exists σ ∈ F n , such that we can write ρ as a pseudomixture of the mixed stabilizer state σ and some general state ρ − :  Choosing E max and E min to be given in steps 3 and 4, we ensure that for all λ and E σ , holds with probability 1 − p fail . The major caveat is that there are certain regimes (for large λ and small E σ ) where the algorithm fails by trivially estimating the true expectation value to be anywhere in the range [−1, 1]. Nevertheless, in some regimes we efficiently obtain a noisy but non-trivial estimate. We first briefly explain steps 3 and 4, before analysing the error bound and runtime. When λ and σ are such that ρ ≤ λσ, using (13), there is some density matrix ρ − such that ρ can be written as Step 2 estimates E σ such that Similarly one obtains Tr(EO[ρ]) ≥ λ(E σ − − 1) + 1. Trivially we know that |Tr(EO[ρ])| ≤ 1, so in case either expression exceeds this (for example if E σ is close to ±1) we simply take either E max = 1 or E min = −1 as necessary. We now consider the regimes where the bounds are trivial, and give the size of the error otherwise. Case 1 (failure): Trivial bounds are obtained when both these conditions hold: that is, when E σ satisfies: In this case, we have Case 3 (error decreases with |E σ |): The remaining case occurs when E σ is sufficiently close to ±1, so that either E max = 1 or E min = −1. This limits the range of possible values of Tr(EO[ρ]), so that This occurs when exactly one of the inequalities (172) and (173) is satisfied, while the other is violated.
Estimating Tr(EO[σ]) using the dyadic frame simulator takes up the most time in the algorithm, as the other steps are trivial to evaluate. Since σ is a (mixed) stabilizer state, there is no sampling overhead due to negativity in the decomposition. When = c(1−1/λ), the number of samples T required to achieve accuracy in this estimate with probability at least 1 − p fail satisfies the bound For fixed c the runtime decreases with λ. For larger values of λ, the total error ∆ is dominated by our ignorance of the state ρ − , so it is unnecessary to estimate Tr(EO[σ]) to high precision. Our explicit algorithm for estimating Pauli expectation values easily adapts to estimate Born rule probabilities for stabilizer projectors Π by replacing the assumption |Tr(Eρ)| ≤ 1 for any ρ with 0 ≤ Tr(Πρ) ≤ 1.

VIII. APPLICATIONS TO OTHER RESOURCES
As remarked before, the monotones R, Λ, and Λ + can be defined in the context of general quantum resource theories [35]. An important question then concerns the operational significance of such quantifiers. What does the quantitative value of a resource represent, and do these quantities always admit a meaningful physical interpretation? Recent progress in this direction revealed that in all convex resource theories, both R(ρ) and Λ + (ρ) precisely measure the advantage provided by a resource state in a family of channel discrimination tasks [57,58], and in several classes of resources they can find use in distillation and dilution tasks [47,59]. However, the explicit operational interpretation of the dyadic negativity Λ has not been considered before, nor has a general connection between resource quantifiers and classical simulation been established.
The structure of our proofs in Sec.VII allows for a straightforward adaptation of our algorithms beyond magic-state quantum computation, thereby making them suitable in the context of other, more general quantum resources theories. From our discussion on the dyadic frame simulator in Sec. VII A, the proof clearly requires only three crucial properties, which we formalize in the following Theorem. Theorem 14'. Let b be a constant. Suppose that a resource theory with a set of free n-qubit pure states V n , free operations O n and free observables M n obeys the following conditions. 1. Only O(poly(n)) bits of information are necessary to index all n-qubit pure free states in the set V n .
2. Given a free operation O ∈ O n that acts non-trivially on at most b qubits, we can compute O(|L R|) = N K j=1 p j |L j R j | for any free states |L , |R ∈ V n in O(poly(n, N K )) time.
3. Given a free observable Π ∈ M n and any free states |L , |R ∈ V n , we can compute R|Π|L in O(poly(n)) time.
Let an n-qubit initial state with known dyadic decomposition be ρ = j α j |L j R j | for complex α j , where the associated probability distribution {|α j |/ α 1 } can be efficiently sam- is a free operation that acts non-trivially on at most b qubits. Suppose that every O (t) has a known decomposition into at most N K resource-preserving Kraus operators. Then, given a free projector Π, we can estimate the Born rule probability with probability 1 − p fail and additive error within a runtime α 2 1 2 log(p −1 fail )poly(n, T, N K ).
Furthermore, if the dyadic decomposition of ρ is optimal then α 1 can be replaced by Λ(ρ).
Theorem 14' establishes an efficient simulation algorithm which can be employed in any resource theory that satisfies the theorem's requirements. Theorem 14' also exactly connects the magic monotone Λ with the sampling overhead of the algorithm, and imbues an operational interpretation to Λ. The result also allows us to directly employ the fast but noisy simulator of Sec. VII C, which directly depends on the dyadic frame simulator above.
As an example where the result can be immediately applied, consider the resource theory of quantum coherence [30,31], where the free states V n are the vectors of the computational basis {|i }. The free measurements are in the computational basis, for instance projectors of the form Π = |x x|⊗1l where x is a fixed bit-string, which can be efficiently computed. The corresponding dyadic negativity Λ is then the (element-wise) 1 -norm, ρ 1 = i,j | i|ρ|j |. We remark that · 1 is trivially a multiplicative monotone in any dimension. Although · 1 is one of the most commonly employed measures in the resource theory of coherence [30,31,60], it has lacked an explicit operational interpretation thus far. Since the resource theory of coherence is not known to admit a unique, physically motivated choice of free operations [31,61], we explore the possible choices of O n and their classical simulability. From this, we use Thm. 14' to give · 1 an operational interpretation.
The most fundamental class of free operations within the resource theory of coherence are the incoherent operations (IOs) [30], defined to be maps which admit a decomposition into Kraus operators which preserve the set of incoherent states. Such Kraus operators can be expressed as [62] K = x∈S c x |f (x) x| for some set of bit-strings S, coefficients c x , and an arbitrary function f . Given such a K acting on no more than b qubits, where b is constant, we can efficiently compute (K ⊗ 1l n−b )|i . Since any Boolean f can be implemented by composing a set of universal classical logic gates (with b = 2) such as AND and XOR, such gates can generate an IO realising any Boolean function. Furthermore, IOs can simulate any quantum channel with sufficiently many coherent states [30,63].
One family of useful IOs in practice are the strictly incoherent operations (SIOs) [62,64], which can be efficiently implemented by quantum circuits using only incoherent ancillae [64]. As a subtheory of IO, all the updates are still efficiently computable. Furthermore, b = 3 suffices to provide the Toffoli gate which is universal for classical reversible logic, and so can generate any Kraus operator of the required form. The biggest difference between SIOs and IOs is that, while SIOs are better understood from the perspective of their practical implementation, they cannot be promoted to universal quantum operations through the use of ancillary resource states [63].
We conclude that the resource theory of coherence that uses either IOs or SIOs as free operations satisfies the conditions of Thm. 14'. Thus, the theorem endows the 1 -norm of coherence with an operational interpretation in the context of classical simulation of either IOs or SIOs.
Our dyadic frame simulator is especially useful in many resource theories where other simulation algorithms such as the Howard-Campbell simulator for magic states [10] cannot be readily adapted. For instance, in the resource theory of coherence, the free states F n form a zero-measure subset of all states, which means that no resourceful state ρ can be decomposed as ρ = j p j |φ j φ j | with |φ j ∈ V n and so the corresponding robustness quantifier R(ρ) diverges.
Note, however, that the dyadic frame does not work for all resource theories. While the dyadic frames for stabilizer and incoherent operations meet the conditions of Thm. 14, the requirements cannot hold for the theory of separable states under local operations and classical communication (LOCC). This is because the free states consist of an infinite number of inequivalent pure product states, which cannot be described using poly(n) bits. However, one can accurately compute local unitaries acting on product states efficiently. Indeed, our framework could encompass entanglement and similar theories using a suitable -net over the set of separable states, and we leave the precise statement of the relevant conditions for future work.

IX. CONCLUSIONS
We have introduced three resource monotones into the setting of magic state quantum computation: the dyadic negativity Λ, the generalized robustness Λ + , and the convex roof monotone Ξ. The first part of the paper focuses on resourcetheoretic results, including that: (i) for pure states, the monotones all equal the extent monotone ξ; (ii) for tensor products of single-qubit mixed states, they all coincide; and (iii) the monotones act multiplicatively on tensor products of singlequbit mixed states. The results significantly simplify the computation of the monotones for multiple copies of a single-qubit state, and allow us to completely understand the asymptotic behavior of our magic quantifiers, which contrasts with previously used monotones. While (i) is a universal property of all n-qubit pure states, it remains to determine if (ii) and (iii) hold in full generality. Furthermore, our magic monotones often tighten bounds on previously known distillation rates.
For each monotone, we introduce a related classical simulation algorithm. Our dyadic negativity simulator has a runtime proportional to Λ(ρ) 2 , which is similar to the Howard-Campbell simulator with runtime R(ρ) 2 where R is the robustness of magic. Additionally, we show that the dyadic negativity simulator works for circuits which use completely stabilizer-preserving operations. This class includes all the conventional stabilizer operations (Clifford unitaries, Pauli projections, etc.) and we believe it is likely to be strictly larger. If true, the situation would mirror entanglement theory in the separation between LOCC and separable operations [65]. We found that for tensor products of n single-qubit states, both Λ and R scale exponentially with n, but R is exponentially larger than Λ. This establishes our dyadic negativity simulator as the fastest known quasi-probability simulator for qubit magic states.
However, not all classical simulation algorithms are based on quasi-probability distributions, with the stabilizer rank methods representing a distinct paradigm. There are several crucial differences including that stabilizer rank methods enable a stronger notion of classical simulation, as they allow us to sample outputs of a quantum computation, not just estimate Born rule probabilities. Prior work on stabilizer rank simulations considered only pure states, but our simulator extends this to mixed states and demonstrates an expected runtime proportional to Ξ(ρ). Note the linear dependence on Ξ (largely due to fast norm estimation [9]), in contrast to the quadratic dependence encountered with quasi-probability simulators. , which would mean a runtime advantage for the quasi-probability methods. However, for products of single-qubit states the monontones are equal, so for such states our resource theory results show that the advantage clearly falls to the stabilizer rank simulators. Furthermore, we improve stabilizer rank bounds, with the runtime for sampling Clifford magic states (e.g. T states) improved to O(1/δ) from the prior O(1/δ 2 ) bound where δ is the sampling precision. For other magic states, the advantage is not as simple to describe using big-O notation, but Fig. 5 shows it to be considerable in practice.
Finally, by ensuring that our simulation algorithms can be easily generalized and providing a recipe to adapt our simulators to resource theories beyond magic states, we shed light on the simulation of quantum circuits using very general resources under suitable assumptions. This not only provides new insight into the practical uses of resource quantifiers in well-studied theories such as quantum coherence, but also opens an avenue for a further study of the connections between the theoretical frameworks of quantum resources and their operational applications in quantum computation.
A clear direction for further research is to extend our results to the channel picture, which would enable a more direct route to simulate circuits with no need to replace non-free operations with state injection gadgets. This is especially important in the context of stabilizer theory for the simulation of circuits with gates outside the Clifford hierarchy, as the gadgets then become more complex.  6. The variable C as introduced in Eq. (B32) as a function of the angle θ for a class of single-qubit states. This is the C value for one copy of the state, for n copies we must raise to the n th power. The prefactor C − 1 appears in Eq. (B33) and is important because when C = 1, the variance scales asymptotically as O(1/k 2 ). We highlight three specific angles θ = {π/8, π/16, π/32} that correspond to angles used in Fig. 5. For θ = π/8, we have C − 1 = 0 and so the O(1/k 2 ) is exact as can be seen in Fig. 5. For θ = π/32, we are close to the maximal possible value of C and Fig. 5 shows a deviation from O(1/k 2 ) scaling.
(A5) and in particular for any W satisfying these conditions we have Λ(ρ) ≥ Tr[W ρ]. We notice that the extent witness |ω can be used to build an operator W = |ω ω | that is a valid witness for Λ. Therefore,

Appendix B: Variance bound
We now prove Lemma 17, which leads to the new sparsification result of Theorem 15. Recall that given a state |ψ = j c j |φ j , where |φ j are stabilizer states, we can ob-tain a sparsified k-term approximation given by: where each |ω α is chosen randomly so that |ω α = (c j /|c j |)|φ j with probability p j = |c j |/ c 1 . In general |Ω may not be conventionally normalized, but Lemma 17 upper bounds the variance of Ω|Ω . We now prove Lem. 17.
where σ = E[|ω β ω β |] = j (|c j |/ c 1 )|φ j φ j |, since the probability of sampling |ω β ω β | = |φ j φ j | is defined as p j = |c j |/ c 1 . Next we consider E[B β,µ ]. Taking the modulus and using the triangle inequality we obtain which to leading order is which gives us the general bound. Clifford magic states were defined in Ref. [11] as those pure states |ψ that are stabilized by a group Q of Clifford unitary operators whose generators take the form U X j U † , where X j is the Pauli X operator that acts on the j-th qubit. For such states, there exists [11] an optimal decomposition |ψ = q∈Q c q |φ q = 1 |Q| ψ|φ 0 q∈Q q|φ 0 , where |φ 0 is some stabilizer state. If we take this decomposition as the basis for our sparsification, then we have c 1 = |Q| · (|Q|| ψ|φ 0 |) −1 = | ψ|φ 0 | −1 , and where p q = |Q| −1 . This yields ψ|σ|ψ = q∈Q p q ψ|q|φ 0 φ 0 |q † |ψ (B37) = q∈Q p q ψ|φ 0 φ 0 |ψ (B38) where in the second line we used the Hermiticity of q and q|ψ = |ψ . This shows that for optimal decompositions of Clifford magic states, C = 1, and leads to the simplified bound