Exponential Error Suppression for Near-Term Quantum Devices

As quantum computers mature, quantum error correcting codes (QECs) will be adopted in order to suppress errors to any desired level $E$ at a cost in qubit-count $n$ that is merely poly-logarithmic in $1/E$. However in the NISQ era, the complexity and scale required to adopt even the smallest QEC is prohibitive. Instead, error mitigation techniques have been employed; typically these do not require an increase in qubit-count but cannot provide exponential error suppression. Here we show that, for the crucial case of estimating expectation values of observables (key to almost all NISQ algorithms) one can indeed achieve an effective exponential suppression. We introduce the Error Suppression by Derangement (ESD) approach: by increasing the qubit count by a factor of $n\geq 2$, the error is suppressed exponentially as $Q^n$ where $Q<1$ is a suppression factor that depends on the entropy of the errors. The ESD approach takes $n$ independently-prepared circuit outputs and applies a controlled derangement operator to create a state whose symmetries prevent erroneous states from contributing to expected values. The approach is therefore `NISQ-friendly' as it is modular in the main computation and requires only a shallow circuit that bridges the $n$ copies immediately prior to measurement. Imperfections in our derangement circuit do degrade performance and therefore we propose an approach to mitigate this effect to arbitrary precision due to the remarkable properties of derangements. a) they decompose into a linear number of elementary gates -- limiting the impact of noise b) they are highly resilient to noise and the effect of imperfections on them is (almost) trivial. In numerical simulations validating our approach we confirm error suppression below $10^{-6}$ for circuits consisting of several hundred noisy gates (two-qubit gate error $0.5\%$) using no more than $n=4$ circuit copies.


I. INTRODUCTION
The control of errors, also called noise, is fundamental to the successful exploitation of quantum computers. The powerful and general theory of quantum fault tolerance, exploiting quantum error correcting codes (QECs), provides a theoretical blueprint for controlling errors in the era when quantum devices are large-scale [1][2][3][4][5][6][7]. Encoding qubits into collective states permits the suppression of the error rate on logical gates to an arbitrary small level at the cost of increasing the number of physical qubits. Below a threshold the error suppression is exponential in the hardware scaling. However, this powerful solution is prohibitive in the current era of noisy, intermediate scale quantum (NISQ) devices for the following reasons [8]. (a) the qubit-count scale factor is at least 5 for the simplest codes that protect against comprehensive noise types [9,10]. (b) the extra circuit complexity that is needed in order to monitor the stabilisers, or equivalent measures of code integrity, is very considerable and will boost the effective error rate. (c) in order to achieve a universal set of quantum operations on code-protected logical qubits, highly-non-trivial additional measures such as magic state purification must be undertaken, greatly increasing the hardware scale.
Here we present an approach to controlling errors that achieves the key benefit of true QEC in the specific (but pivotal) case of estimating expected values of operators, and does so without the three key drawbacks of QEC mentioned above. The present idea requires an * balint.koczor@materials.ox.ac.uk increased qubit-count (by some integer factor that is at least two), and therefore it is more hardware-expensive than many NISQ error mitigation schemes [11][12][13][14][15][16][17][18][19][20][21], but in return it provides exponential error suppression -which other NISQ solutions cannot. Therefore the approach might be seen as sitting between the established NISQera techniques and the full QEC domain, albeit nearer to the NISQ approaches. Moreover the present approach is compatible with other NISQ mitigation techniques such as extrapolation, quasi-probability or symmetry verification [12,[14][15][16][17][18][19][20]; In fact, extrapolation is used in the present analysis to negate the impact of errors in the derangement process.

A. Estimating Expectation Values
Estimating expectation values on a quantum device is of central importance and most near-term applications do need to estimate such expectation values. Many variants of the so-called variational quantum eigensolver have been proposed for solving classically intractable problems, such as simulating quantum systems described by Hamiltonians H [11-14, 16, 22-41]. Expectation values of Hamiltonian operators are typically decomposed as ψ id |H|ψ id = k c k ψ id |P k |ψ id , where P k are tensor products of Pauli operators, and we will collectively denote them as σ ≡ P k in the following. Various approaches have been proposed for estimating such expectation values ψ id |σ|ψ id using quantum computers [12,14,[42][43][44]. However, without comprehensive error correction, errors during the state preparation will contribute a bias as ψ k |σ|ψ k into the result, where |ψ k are erroneous states as shown in the next section. There exist numerous error mitigation techniques that potentially reduce the effect of such contributions without increasing the number N of qubits, but at the cost of a significantly increased number of measurements and increased numbers of circuit variants [12,[14][15][16][17][18][19][20]. Note that error mitigation techniques are also limited to correcting errors in measurements of observables as opposed to QECs.
Here we take a different route and introduce the Error Suppression by Derangement (ESD) approach: we introduce a high degree of symmetry by preparing n copies of the quantum state |ψ and use derangement operators (generalised SWAP operations) to protect collective permutation symmetry. Most noise events that occur during the imperfect preparation of |ψ break this permutation symmetry and they are effectively 'filtered out' by the ESD. We outline a possible construction for such a measurement process in Fig. 1 and thoroughly analyse its properties while supporting our claims with rigorous mathematical proofs.
A crucial element of typical NISQ applications is the accurate estimation of expectation values of observables. The present approach allows one to exponentially suppress errors in such estimations and thus enables to push the limits of a vast number of promising NISQ techniques. Let us name a few potential applications: variants of the variational quantum eigensolver for, e.g., finding ground states of molecular Hamiltonians in quantum chemistry or spin model Hamiltonians in materials science; quantum approximate optimisations of graph problems; quantum machine learning and beyond. Please refer to the review articles [11][12][13] and references therein for more examples. Since the present approach is completely general and can be applied to the estimation of any observable (as discussed above), we will present our results and proofs in complete generality without explicitly specifying or restricting the observable σ.
Our construction is certainly very well suited for NISQ hardware for the following reasons. First, the main computation is modular as the n copies of the computational state are prepared completely independently. Second, the derangement circuit that 'bridges' the n copies immediately prior to measurements is sufficiently shallow (as it can be decomposed into a linear number of primitive gates) and therefore picks up significantly less noise than the state-preparation stage. Third, the derangement measurement is highly resilient to noise, since most error events that occur during the derangement process do not contribute to the result. Let us now introduce basic concepts and explain the main idea in detail.

A. Noisy Quantum States and Entropies
Near-term quantum devices aim to prepare computational quantum states |ψ id for, e.g., simulating other Our derangement operator Dn is a generalisation of the SWAP operator and acts on n (not necessarily identical) copies of the quantum state ρ = λ|ψ ψ| + (1 − λ)ρerr. In the above circuit Dn permutes the n registers and allows only permutation symmetric combinations, e.g., |ψ ⊗n , to contribute to the expectation-value measurement process. The probability of measuring the ancilla qubit in the 0 state enables us to approximate the expectation value ψ|σ|ψ and errors are suppressed exponentially in n. This derangement operator can be implemented as a shallow circuit using a linear number N (n − 1) of primitive, controlled, two-qubit SWAP gates.
quantum systems or beyond. These quantum devices are, however, imperfect and can only prepare noisy, mixed quantum states which can be expressed generally via the spectral decomposition of a density matrix ρ = λ|ψ ψ| + (1 − λ) 2 N k=2 p k |ψ k ψ k |. (1) Here λ ≤ 1 and k p k = 1 is a probability distribution. It is important to recognise that the dominant eigenvector |ψ above is not necessarily equivalent to the state that one would obtain from an ideal computation; even purely incoherent error models result in a small coherent mismatch in the dominant eigenvector. A comprehensive analysis of this coherent mismatch is presented in ref. [45] and strong theoretical guarantees are provided that it can be exponentially smaller than the build-up of the erroneous contributions |ψ k . In the following we thus focus on estimating expectation values in the dominant eigenvector, while advantages of this approach are discussed below the Acknowledgements. We further stress that in principle λ can be arbitrarily small, e.g., λ = 10 −6 , as long as it is the dominant component and larger then any other eigenvalue as λ > (1−λ)p k for all k. Although, for extremely low λ other factors such as the sampling cost may of course become prohibitive in practice as we discuss in later text.
Furthermore, p k are probabilities of 'erroneous' contributions |ψ k , and we will refer to these (orthonormal) states as 'erroneous' eigenvectors in the following and we denote their probability vector as p. To keep our discus-sion completely general we do not restrict the probability distribution p at all, but we remark that Rényi entropies [46] as will have a crucial effect on the efficacy of the technique and, indeed, for typical experimental quantum systems one can expect that H n (p) are large.

B. Main Idea
As discussed above, most applications targeting early quantum devices aim to estimate expectation values ψ id |σ|ψ id in a quantum state |ψ id prepared by an ideal noiseless quantum device. Measuring expectation values in the dominant eigenvector ψ|σ|ψ from Eq. 1 would give in practical scenarios a very good approximation [45], however, erroneous eigenvectors during state preparation contribute bias ψ k |σ|ψ k to the estimated expectation values. Here we aim to suppress these contributions via the following novel principle. Let us prepare n copies of the state ρ from Eq. (1). The most likely event during state preparation is that we obtain the dominant eigenvector of the state: with a probability λ n the resulting state (immediately after state preparation) is |ψ, ψ, . . . ψ . Measuring the expectation value on the first register gives the desired result ψ, . . . ψ, ψ|σψ, ψ, . . . ψ = ψ|σ|ψ .
Here the SWAP operator changed the ordering of the registers as |ψ k , ψ, . . . ψ → |ψ, ψ, . . . ψ k and the result is 0 due to the orthogonality of the eigenvectors of the density matrix. We can straightforwardly generalise this idea to the case where all registers are swapped, allowing only permutation-symmetric states to contribute to the measurement of expectation values. We will refer to this permutation operation as 'derangement'. Let us emphasise that the above argument is completely general and holds for any noise model. While one can certainly realise the above measurement principle in various different ways, we propose one such circuit in Fig. 1. We rigorously prove properties of this particular construction in Result 1, Result 2 and Result 3, but we stress that the current proposal is not limited to the circuit in  Fig. 1 leaves room for various different physical implementations which we discuss in later text).

A. Exponential Error Suppression
Let us now formally state the main result of the present work. In particular, the circuit in Fig. 1 can be thought of as a Hadamard-test technique [1] that measures the expectation value of the product σD n , where the derangement operator D n permutes the n input registers; as we will explain in a later section and discuss that it only requires a linear number of primitive gates to construct. We prove in Theorem 1 that only permutation-symmetric combinations can pass through the derangement measurement in Fig. 1, such as the dominant eigenvector |ψ ⊗n (which happens with a probability λ n ) or states in which the same errors occured to all registers |ψ k ⊗n (which happen with probabilities (1 − λ) n p n k ). Our general result in Theorem 1 determines the probability prob 0 of measuring the ancilla qubit in Fig. 1 in the 0 state as where the erroneous contributions ψ k |σ|ψ k are exponentially suppressed as we increase n. Dividing by λ n allows one to approximate the expectation value of a unitary observable σ 2 = Id, or otherwise the real part of the expected value of a unitary operator. We work out two explicit results in Example 1 and Example 2 that demonstrate how the above scheme allows to exponentially suppress the noise as we increase the number n of copies of ρ and how its efficacy depends on properties of the probability distribution p k . Let us now state approximation errors of Methods A and B.
Result 1. Let us prepare n identical copies of the experimental quantum state ρ from from Eq. (1) and apply the derangement measurement from Fig. 1. Both Methods A and B approximate the expectation value ψ|σ|ψ by estimating prob 0 on the ancilla qubit. Method B only estimates prob 0 and assumes explicit knowledge of the dominant eigenvalue λ. In Method A we additionally estimate prob 0 by repeating the procedure but omitting the controlled-σ gate in Fig. 1. We denote their approximation errors as E A and E B , respectively, and these approximation errors generally decay exponen-Identical copies of ρ Non-identical copies of ρ  tially with the number n of copies via the sequence Q n which is bounded Q n ≤ const × Q n via the suppression factor Q < 1 as established in Theorem 2 and in Lemma 1.
Note that these error bounds naturally extend to observables H of unit norm that are linear combinations of Pauli strings. We shown in Lemma 1 that the errors also decay exponentially with the Rényi entropy H n (p) of the error probability distribution from Eq. (1) via Q n = (λ −1 −1) n exp[(1−n)H n (p)]. Even without knowing or having a good guess of the Rényi entropy of the error probabilities, we can state a general upper bound that only depends on the two largest eigenvalues of the state as Q n ≤ (λ −1 −1) n (p max ) n−1 , where p max is the largest of the error probabilities p k in Eq. (1). Note that these quantities, and thus the upper bounds, may be estimated experimentally [47][48][49][50][51][52][53].

B. Numerical Simulations
Let us now numerically verify the above bounds in a practical setting: We consider a 12-qubit quantum state that is produced by a noisy, parametrised quantum circuit typically used in variational quantum algorithmsour circuit consits of 10 alternating layers and overall 372 quantum gates. Refer to Sec. F for more details. Each two-qubit gate undergoes 2-qubit depolarising noise with 0.5% probability and each single-qubit gate undergoes depolarising noise with 0.05% probability. The resulting state has a dominant eigenvalue λ ≈ 0.51 and it has a high entropy, full-rank error probability distribution via the Rényi entropies that monotonically decrease with n as H 2 (p) = 4.69, H 3 (p) = 4.38, H 4 (p) = 4.23, and H ∞ (p) = 3.63. Refer to Appendix F for more details.
Let us remind the reader that despite the purely incoherent error model, the dominant eigenvector |ψ of ρ is slightly different than what one would obtain from a completely error-free computation and in Fig. 2 we compute errors using the dominant eigenvector, refer to Appendix F for more details.
In Fig. 2 (left) we plot our error suppression upper bounds from Result 1, i.e., solid lines represent the error bounds computed from the Rényi entropy of the quantum state's error-probability distribution and dashed lines represent the general upper bound Q n ≤ (λ −1 −1) n (p max ) n−1 where the largest error probability is p max = 0.026 and the suppression factor is Q = 0.026. Red and blue colours correspond to Method A and Method B, respectively. We have generated 500 Pauli strings as observables randomly and computed the errors in estimating their expectation values (there are overall 4 12 = 1.68 × 10 7 Pauli strings, and we randomly select 500). These samples (see horizontal lines in Fig. 2) are significantly below our upper bounds and seem to decrease in a similar exponential order as our bounds (i.e., slope is similar in the logarithmic plot).
Method B slightly outperforms Method A (slightly smaller errors as blue is slightly below red), but it requires an exact (or very precise) knowledge of the dominant eigenvalue λ. Nevertheless, this eigenvalue could be determined precisely by existing approaches in special cases, e.g., as in [54].

C. Effect of Non-Identical States
We now turn to the question of how the efficacy of our error suppression scheme is affected when the n copies of the state ρ are not identical.
Result 2. We assume that all copies of the quantum state are arbitrarily different via ρ 1 = ρ 2 = . . . ρ n except that their dominant eigenvector is |ψ . Our scheme via Lemma 3 still provides exponentially decreasing approximation errors when the dominant eigenvalue of the worst quality copy is λ min > 1/2 via In the special case when all copies of the quantum state commute (same eigenvectors but different eigenvalues) one can expect very similar approximation errors to Result 1 via an effective sequence Q ef f n . We can efficiently simulate the case when all copies of the quantum state commute. We disturbed every copy of the density matrix such that their trace distance is ρ k − ρ l ≈ 10 −2 for all k = l. Note that the approximation errors in Fig. 2

D. Complexity Analysis
Let us now analyse resource requirements of our ESD approach. In particular, one needs to prepare a suitable number n of copies of ρ in order to suppress its errors below a threshold level, that we will refer to as precision and denote as E. The overall number of qubits required is then nN + 1, where N is the number of qubits in the computational state ρ. Furthermore, one needs to repeat measurements many times to sufficiently reduce the effect of so-called shot noise, i.e., we estimate the probability only from a finite number of repetitions [55]. We denote the number of repetitions as N s . Let us now summarise our general results from Lemma 2.
Result 3. In order to reach a precision E in determining the expectation value ψ|σ|ψ , one requires a logarithmic number n = O(ln E −1 / ln Q −1 ) of copies of the quantum state ρ (up to rounding). Here Q < 1 is the suppression factor from Result 1 that depends on Rényi entropies. The number N s of measurements required to suppress shot noise below the threshold E grows polynomially as Method B: where f = ln[λ −1 / ln Q −1 ] increases the polynomial order compared to the standard shot-noise limit O(E −2 ) and we have derived a general upper bound on f in Lemma 2.
Dividing by the exponentially attenuated factor λ n in both Methods A and B, in Result 1 requires an increasingly large number of measurements to sufficiently suppress shot noise. Methods A and B are therefore less efficient than permitted by the standard shot noise limit N s = O(E −2 ). For example in the extreme, but still valid, case of λ = 10 −6 and Q = 1/2 we obtain f = 19.9 which increases the sampling costs prohibitively in practice. Nevertheless, the polynomial order of O(E −1 ) is only logarithmically increased via f and its effect might be negligible in practically relevant scenarios. For example in our simulations in Fig. 2 we obtain f = 0.18 using our expression Q = (λ −1 −1)p max in Lemma 1. Indeed, we recover the standard shot-noise limit O(E −2 ) for very good quality states λ ≈ 1 or for very high entropy probabilities.
In summary, the complexity of our ESD approach only depends on the largest eigenvalue λ of the state and on the suppression factor Q from Result 1 -which is determined by the Rényi entropy of the error probabilities. As expected, the number N s of samples grows polynomially with the target precision E −1 and the system size(via n) grows logarithmically with E −1 . Let us remark that in case of certain applications a global prefactor in observable expectation values does not matter -such as in case of VQE optimisations -and one can use method B but omitting the division by λ n . Using Method B significantly reduces the measurement costs and reduces errors from Result 1 when compared to Method A.

E. Derangements of Quantum Registers
Let us now discuss how to implement derangement circuits using a linearly growing number (in n and N ) elementary gate operations. In particular, our ESD circuit in Fig. 1 uses a generalisation of the SWAP operator that permutes subspaces of quantum registers. Recall that in general there exist n! permutations of a set of n ordered elements. Derangements are a subset of the collection of all permutations: they permute the n elements such that no element remains in place [56,57]. We define D n in Definition 1 as unitary representations such that they permute subspaces of n quantum registers. For example, for n = 2 our D 2 reduces to the usual SWAP operator as D 2 |ψ 1 , ψ 2 = SWAP 12 |ψ 1 , ψ 2 = |ψ 2 , ψ 1 .
For n = 4 one has 6 possibilities while in general there are (n−1)! possibilities for constructing distinct derangement operators -but choosing any one of these constructions is sufficient for our scheme to work. Indeed, one could construct derangements D n straightforwardly as cyclic shifts [47], but the large number of possibilities might offer more preferable constructions that take into account, e.g, hardware constraints such as connectivity. Please refer to [58,59] for illustrations of the corresponding circuits. Furthermore, we discuss in Appendix E 2 that the large number of symmetries in the derangement circuit can be exploited in order to, e.g., reduce errors that happen during the controlled-SWAP operations.
Regarding gate complexity, derangement operators can be implemented efficiently in general using N (n − 1) elementary controlled two-qubit SWAP gates, where N is the number of qubits in the register |ψ and n is the number of copies of |ψ . These minimal SWAP circuits (which optimally implement derangement operators) can be constructed by mapping the corresponding permutations to graph trees [60], refer to Definition 1.
It is important to recognise that while the number of elementary controlled-SWAP gates grows as O(N ), preparing the quantum state |ψ generally requires O[a(N )N ] gates, where a(N ) is the depth of the computation. It is generally expected that for practical problems one needs to go beyond constant-depth circuits such that the number of gates in the main computation grows faster than O(N ) [61][62][63][64][65]. Thus the gate count of the derangement circuit can be expected to be of diminishing relative significance when scaling up computations. Even if the controlled-SWAP operator is not a hardware-native gate, one needs at most 6 native entangling gates to implement the elementary controlled-SWAP operator, refer to Table I in the Appendix. We demonstrate this below on a practical example assuming a hardware-native gateset and also briefly discuss connectivity constraints.

A. Mitigating Experimental Imperfections
So far we have assumed that the derangement operator in Fig. 1 is perfect. Indeed, gates involved here are expected to be noisy in a realistic scenario which ultimately limits the precision of our approach and increases its complexity.
We show in Example 3 quite generally that the derangement operator is highly resilient to experimental imperfections and protects permutation symmetry even under experimental noise. This is nicely illustrated in our simulated noisy circuit: the unmitigated errors in determining prob 0 in Fig. 3 are quite low and are below 10 −2 for all 50 randomly selected states. The simulated circuit consist of 13 qubits, i.e, 3 copies of a 4 qubit state, and elementary controlled-SWAP gates undergo 3-qubit depolarisations with a probability 3 × 10 −3 . Refer to Appendix F for more details Most importantly, we show in Example 3 quite generally that most errors that occur during the derangement measurement will only trivially affect the final result by (almost) linearly attenuating the output probability prob 0 which can in principle be corrected by an extrapolation. We use extrapolation techniques [12,[14][15][16][17]66] which typically estimate prob 0 ( ) at different values of and extrapolate, e.g., linearly, to zero noise = 0. Due to the high degree of noise resilience of the derangement operator, the measurement probabilities prob 0 ( ) are closely approximated by a linear function in and Fig. 3 illustrates that indeed a linear extrapolation surprisingly well approximates the ideal probability with errors less than 10 −4 .
Here we aim to suppress errors arbitrarily by accounting for the slight non-linearity of the function prob 0 ( ). We prove in Theorem 3 quite generally that expectation values are exactly described by degree-ν polynomials as prob 0 ( ) = ν k=0 c k k and ν is the number of noisy gates. It follows that one can in principle determine the ideal probability by determining prob 0 ( ) at ν +1 different values of and fitting a degree ν polynomial. Fig. 3 (blue circles) demonstrates how the extrapolation error decreases exponentially with the degree of the fitted polynomial.
Furthermore, we analytically solve the dependence on in the limiting case of a large number of gates and obtain the approximation whereη is a constant. The above (3, 3) Padé approximation of the analytical dependence can be determined by fitting the coefficients a 1 , a 2 , a 3 , a 4 , a 5 . These Padé approximations appear to slightly outperform degree-k polynomial extrapolations in Fig. 3. Refer to Theorem 3 for more details. In summary, guided by analytical arguments in Example 3 we propose an efficient and straightforward approach to mitigate experimental errors that occur during the derangement circuit. Although in realistic scenarios an experimentalist may not be able to perfectly amplify all errors, we demonstrate below that extrapolation techniques can still significantly reduce the impact of noise. Note, however, that for an increasing number of qubits the noise in the controlled-SWAP gates accumulates and might attenuate the output probability prob 0 . Estimating this attenuated probability at increased error ratesas required for extrapolation-requires an increased number of measurements. For example, a factor of 0.1 atten-  uation threshold could be approximated via the formula 0.1 = (1 − ) N (n−1) , and at a gate error = 10 −3 it limits the maximal number of qubits as N (n − 1) ≤ 2301which is still an encouraging figure in practice. We note that other error mitigation schemes could also be applied straightforwardly to address errors happening during the derangement measurement.

B. Limitations of the Technique
There is one main limitation of the present approach: In a realistic experiment one can expect that coherent errors occur. As opposed to error correcting schemes, our ESD approach is completely oblivious to these and ultimately such errors will limit precision. Nevertheless, well-established techniques enable us to suppress these coherent errors, e.g., via converting them into incoherent errors by Pauli twirling [67][68][69][70]. Furthermore, as discussed above, even incoherent noise models introduce a mismatch in the dominant eigenvector which can be expressed via √ 1 − c|ψ id + √ c|ψ err . While the coherent mismatch c limits the precision of the present approach, we present a comprehensive analysis and provide strong theoretical guarantees in ref. [45] that its impact decreases when increasing the scale of the computation. Refer also to the Appendix for an illustration how this error can be mitigated. Furthermore, the present approach is expected to be particularly well suited for variational quantum algorithms: First, the impact of coherent mismatch is guaranteed to be quadratically smaller when the aim is to prepare eigenstates [45]. Second, variational algorithms are inherently robust to this kind of error as a variational optimisation implicitly minimises the impact of coherent errors. We also remark that in the context of variational algorithms one could slightly re-adjust variational parameters such that the overlaps between copies Tr[ρ k ρ l ] are maximal for every k = l -note that measuring such overlaps is possible with the setup in Fig. 1. This ensures us that the dominant eigenvector of every copy is (close to) identical. One could also use Clifford circuits to calibrate or validate the quantum device by comparing to expectation values obtained from (efficient) classical simulations [18,19].
We further remark that we have also neglected the effect of measurement errors, i.e., when the probability of collapsing into state 0 is biased. Nevertheless, there exist well-established techniques for mitigating the effect of such imperfections [12,71].

V. PRACTICAL APPLICATIONS
Recall that near-term quantum devices are limited to shallow quantum circuits due to their inability to implement quantum error correction. Nevertheless, such shallow circuits may still be of high practical value as, for example, they may allow one to approximate groundstate energies of Hamiltonians H, which cannot be estimated by other means [11][12][13][22][23][24][25][26][27]. Let us consider a spin-ring Hamiltonian with a constant coupling J = 0.1 and uniformly randomly generated on-site interaction strengths ω k ∈ [−1, 1] as for the following reasons: (a) this Hamiltonian is relevant in the context of condensed matter phenomena, such as manybody localisation [72], but its ground state cannot be approximated classically for large N [73,74]; (b) it has a very simple structure as well as a linearly scaling number of Pauli observables σ; (c) it is closely related to other important Hamiltonians, cf. approximate optimisation algorithms (QAOA) or spin systems in materials science [11-13, 75, 76]. We prepare the ground state via the usual variational Hamiltonian ansatz (VHA) [11][12][13], which was proposed in the context of QAOA [22,75,76], but has successfully been extended to and analysed in the context of, e.g., quantum chemistry, the Hubbard model as well as spin systems [65,[77][78][79]. It consists of alternating layers of discretised time evolutions as illustrated in Fig 4/a, refer to the Appendix for more details. We consider a quantum device that can natively implement single-qubit R y 10 -2. 10 -1.  Table I requires 5 (4) applications of the hardware-native entangling gates. (c) Error in estimating the ground state energy with and without mitigation as a function of the number of expected errors ξ in the ansatz circuit. We assume that the experimentalist can amplify the vast majority of the noise (94%) in the hardware-native gates, but not all of it, limiting extrapolation to a finite precision (orange diamonds). Although the derangement circuit is also degraded by noise, it can still drastically reduce errors both in combination with (black crosses) and without extrapolation (magenta dots). Dashed lines correspond to Tr[Hρ n ]/Tr[ρ n ] as obtained via noiseless derangement circuits. When increasing n, we approach in exponential order a non-zero error (dashed grey) which is due to the coherent mismatch in the dominant eigenvector. The present demonstration on 2 × 6 + 1 qubits should rather be viewed as a worst-case scenario since increasing the scale of the computation will favour the ESD approach.
and R z rotation gates as well as XX gates of the form exp[−iθX j X k ] between any pairs j = k of qubits, i.e., a gateset comparable to ion-trap systems [80]. Such a platform can efficiently implement the ansatz circuit of l layers using 3N l applications of the entangling gates. Using general techniques of ref. [81] we recompile the derangement circuit into hardware-native quantum operations. Table I summarises the number of entangling (ν e ) and single-qubit (ν s ) gates required to implement the elementary controlled-SWAP operator: we find more compact representations than previous ones [82,83].
We use l = 20 ansatz layers such that the ground state energy in a noise-free setting could be approximated to ∆E ≈ 10 −4 and explicitly simulate N = 6 qubits with n = 2 copies of the noisy computational state (equivalent of a 26-qubit pure-state simulation). We discuss in the Appendix that controlled-SWAP gates in the derangement circuit need only be recompiled up to a local SU (4) freedom as shown in Fig. 4(b), refer also to second and third columns in Table I. We thus need less than 5N = 30 entangling gates for the mitigation, which is significantly fewer than the 3N l = 360 entangling gates required for the the main computation. Fig. 4(c/red squares) shows unmitigated errors when estimating the ground state energy of H. We assume a noise model in which the vast majority of errors is due to dephasing and damping (relaxation), which the experimentalist can perfectly amplify. We additionally assume that a small depolarising noise, approximately 6% of the overall gate error rate, affects the qubits that the experimentalist cannot amplify. This limits extrapolation techniques [12,[14][15][16][17]66] to a finite precision as shown in Fig. 4(c/orange diamonds). In contrast, the present approach can suppress errors under arbitrary noise models. Indeed, even with a noisy derangement circuit, one can drastically reduce errors by orders of magnitude as shown in Fig. 4(c/magenta dots).
As discussed above, we can apply zero-noise extrapolation to mitigate the effect of errors in the derangement circuit. As such, extrapolation in Fig. 4(c/black crosses) can almost fully mitigate errors in the derangement circuit as black crosses approach the blue dashed line, i.e., the performance of the noiseless derangement circuit D 2 . Thus it would be advantageous to prepare a larger number of copies n > 2 to further suppress the errors as illustrated in Fig. 4(c/green and brown dashed lines). We remark, however, that going significantly beyond n = 4 copies may not be relevant in practice for the following reasons. (a) In the practically most important region with ξ 1, errors may be sufficiently suppressed below the level of other practical factors, such as shot noise, or the approximation error ∆E ≈ 10 −4 due to insufficient ansatz depth. (b) In the limit of a large number of copies, i.e., n → ∞, a constant error is approached which is due to the coherent mismatch Fig. 4(c/grey dashed line). (c) The region with ξ 2 is practically inaccessible due the to rapidly increasing measurement overhead from Result [45]. Note that it is generally the drawback of all mitigation techniques that their measurement cost grows exponentially with ξ and becomes prohibitive when ξ 2 [12].
Let us finally emphasise that one should look at the present demonstration as a worst-case scenario for the following reasons. (a) Practical value is expected when computations are scaled beyond N > 20 qubits [73,74], for which the ansatz layers need to be increased beyond the present l = 20, e.g., refer to [63,64,79]. This leads to an increasing ratio r e of the number of entangling gates in the main computation relative to the derangement circuit as r e = 3 5 l(N ). Here the number of layers l(N ) > O(N 0 ) needs to grow faster than a constant. (b) The impact of coherent mismatch in Fig. 4(c/grey dashed line) is guaranteed to decrease as the number of gates increases [45]. (c) Approximating ground states of Hamiltonians other than the one in Eq. 8 may require more complex ansatz circuits with more rapidly growing gate counts. For example, simulating the Hubbard model on N = 50 qubits-one of the promising candidates for demonstrating practical quantum advantage-requires ≈ 2 × 10 4 entangling gates [84] while the derangement circuit requires only a few hundred, resulting in the ratio of entangling gates as r e ≈ 10 2 . An even more pronounced example is the case of molecular Hamiltonians in which the number of Pauli terms may grow as O(N 4 ) [85]. (d) The ansatz was optimised in a noiseless, pure-state simulation and re-optimising the parameters may reduce the impact of coherent mismatch. (e) In the present case we assume 94% of gate errors can be amplified perfectly: the experimentalist may only have control of a smaller fraction of errors further limiting the precision of extrapolation techniques.
We also consider the example of a connectivity constrained architecture in the Appendix: the number of two-qubit gates to implement the derangement circuit is increased from 5N to 6N , while in the ansatz it is increased from 3N l to 9N l. Thus in such a scenario connectivity constraints work in our favour. Of course, in principle specific hardware may be fabricated to optimally accommodate the present technique as well as one may utilise long-range links between macroscopically separate quantum processors [86].

VI. DISCUSSION AND CONCLUSION
This work has introduced a novel principle for suppressing errors in near-term quantum devices. As opposed to error mitigation techniques, our ESD approach requires an increased system size: By preparing n identical copies of a computational state, our derangement circuit protects its permutation symmetry and suppresses errors in an expectation value measurement exponentially (in the number n of copies). Furthermore, the ESD is very NISQ friendly, since the n copies of the computational state can be prepared completely independently and they only need to be 'bridged' by a shallow derangement circuit immediately prior to measurement. Furthermore, the significant advantage of the ESD approach is that it is completely oblivious to the error model during the state preparation process and works (in principle) with arbitrarily high error rates. As such, the present approach could be compared to other mitigation techniques. While quasi-probability techniques [15,[17][18][19] may in principle be able to perfectly negate the effect of errors, they require an exponentially growing number of circuit variants together with a perfect knowledge of the error model. Any deviation from the assumed noise model results in errors, which may grow exponentially with the number of gates. Symmetry verification is another successful mitigation technique that could be used if exploitable symmetries are present [20,87], however, it cannot reduce errors that fall within the subspace of appropriate symmetry. Furthermore, zero-noise extrapolation [12,[14][15][16][17]66] can in principle be applied generally, however, the experimentalist may not be able to perfectly amplify all errors, see Fig. 4(c). In contrast, the present approach can be applied completely generally in any scenario. Note, however, that these existing mitigation techniques will be highly relevant as they can be used in combination with the present approach, as demonstrated above.
The main limitation of the ESD approach is that it cannot address coherent noise or a coherent mismatch in the dominant eigenvector, although those errors can be exponentially smaller than the incoherent decay of the fidelity and are guaranteed to decrease when increasing the scale of the computation [45]. As long as the derangement circuit is assumed to be perfect, the sample complexity of our ESD approach is polynomial in the inverse precision E −1 and comparable to the standard shot-noise limit in practically relevant scenarios, i.e., when the number of expected errors in the main computation is below ξ ≈ 1. Errors during the derangement process do degrade the performance of the present approach and one needs to rely on error mitigation techniques to reduce this impact. Nevertheless, it was shown above that the number of gates in the derangement circuit is expected to become negligible relative to the main computation when scaling up computations.
Let us now briefly comment on prior approaches that similarly consider identical copies of quantum states and similarly apply SWAP operators (or generalisations thereof). In fact, numerous prior works have considered and exploited the permutation symmetry of identical copies of mixed states in the context of, e.g., reconstructing spectral properties of mixed quantum states [47][48][49][50][51][52][53], probing their entanglement characteristics [88][89][90], for constructing universal quantum software [91], and for optimal state discrimination [92][93][94]. Indeed, in the special case of n = 2 copies, our scheme is comparable to a modification of the usual SWAP-test circuit [47]. However, as opposed to previous works, here we are not interested in the input mixed state ρ, but only in its dominant eigenvector |ψ that represents a computational quantum state. In fact, we regard any other contribution in the state ρ as 'noise' which we aim to exclude from the expectation-value measurement process. The present approach could also be compared to entanglement distillation protocols [95][96][97], however, our derangement circuit cannot exponentially improve the 'quality' of the input states, but only exclude erroneous contributions from the expectation-value measurement process.
Let us finally remark that the ESD approach leaves a lot of room for a large number of different physical implementations, beyond the circuit in Fig.1 that has been analysed in detail in this work. Our circuit in Fig.1 is only one possible realisation of the general principle outlined here and even this circuit has a large number of invariants. We only need to remark here that the results presented here are very general, and our example circuit could certainly be improved by combining it with advanced techniques for example, by simultaneously measuring groups of commuting observables [43,44] -but we expect these can only introduce constant factor improvements and will not change the main results in this work. In future work we will explore the numerous possibilities offered by the general principle introduced here.
Please also refer to the online repository [58, 59] for simulation and demonstration material.

ACKNOWLEDGMENTS
I would like to thank Simon C. Benjamin for his invaluable comments and challenging questions. His help and support was crucial for finalising this work. I would like to thank Earl Campbell, Robert Zeier, Suguru Endo and Ying Li for their very constructive and valuable comments on drafts of this work. I acknowledge funding received from EU H2020-FETFLAG-03-2018 under Grant Agreement No. 820495 (AQTION) and the QCS Hub (EPSRC Hub grant under the agreement number EP/T001062/1) for support including hardware provision. The numerical modelling involved in this study made use of the Quantum Exact Simulation Toolkit (QuEST), and the recent development QuESTlink [98] which permits the user to use Mathematica as the integrated front end. I am grateful to those who have con-tributed to both these valuable tools.
Note on subsequent works-A week after I had made my preprint available a paper appeared on the arXiv that proposes a very similar idea [99], while mostly focusing on the n = 2 scenario. The main difference is that ref. [99] uses the trace distance to quantify errors thereby also taking into account the effect of coherent mismatch. In contrast, in Result 1 I only quantify errors with respect to the dominant eigenvector, while I defer a comprehensive analysis of the coherent mismatch to the subsequent paper [45] for the following reasons.
(a) Ref. [99] numerically computed and plotted the trace distance in a comprehensive range of scenarios, and noted that the bound is "pessimistic" as it overestimates errors. As such, in ref. [45] I show that in most practical scenarios this trace distance should not be used since a quadratically smaller bound exists, i.e., the square c of the trace distance √ c. This is nicely illustrated in Fig. 4(c/gray dashed line): at a circuit error rate ξ = 0.1 the actual error is ∆E ≈ 7.5 × 10 −6 while the bound of ref. [99] is misleading as it is orders of magnitude larger 2 √ c H ∞ = 9.2 × 10 −3 . The relation between the two bounds is discussed in more detail in ref. [45].
(b) Going beyond the "pessimistic" bound of ref. [99] and realistically characterising the coherent mismatch is a very complex problem as it is related to important themes in mathematics, such as Weyl's inequalitiessolving of which was a major breakthrough. In ref. [45] I provide strong theoretical guarantees that the coherent mismatch can be exponentially small and decreases when increasing the scale of the computation. Thus practitioners need principally care about the errors with respect to measuring expectation values in the dominant eigenvector, cf. Fig. 4.
(c) It is also interesting to note that Result 1 only depends on spectral properties, i.e., eigenvalues λ k and Rényi entropies H n , that may be estimated experimentally. In contrast, estimating the trace distance of ref. [99] would require one to prepare the ideal, perfect, noiseless quantum state as well as the state ρ n /Tr[ρ n ], which is prohibitive.
In the interval since the present paper and then ref. [99] appeared, other studies have already reported ideas extending or varying these original concepts. For example, ref. [100] introduces a generalisation of the presented permutation-symmetry principles. Furthermore, ref. [101] proposes that the derangement circuit D n := SWAP 1,n · · · SWAP 1,3 SWAP 1,2 can be realised in a qubit-efficient manner by utilising qubit resets, thus drastically reducing resource requirements of the present approach. In this section we prove that the derangement circuit in Fig. 1 can be used to estimate expectation values. We then prove upper bounds on approximation errors with or without using Rényi entropies of quantum states. We finally prove sample complexities of our ESD approach.
Here all s ∈ S n are permutations of the index set {1, 2, . . . n} with no fixed point, i.e., s are derangements [56,57]. Here S n denotes the symmetric group. For n ≥ 4 we also demand that s are n-cycles (standard cyclic permutations of maximal length [57]), which are a subset of derangements. The number of unique (n-cycle) derangement operators is given as |D n | = (n − 1)!. Due to seminal results of Dénes, s can be decomposed into n − 1 transpositions [60] and therefore D n decomposes into n − 1 pair-wise SWAP operators of the quantum registers. One can therefore construct minimal SWAP circuits by (bijectively) mapping the corresponding permutations performed by D n ∈ D n to graph trees.
Theorem 1. We consider n identical copies of the same quantum register ρ in a separable state as ρ ⊗n . Methods A and B, as illustrated in Fig. 1, result in the probability of measuring the ancilla in the 0 state as Here σ is a unitary (Hermitian) observable (or otherwise the real part of a unitary operator is estimated).
Proof. We start by recapitulating that any density operator ρ admits the following spectral decomposition (note that here we use a different notation than what in the main text) where the second equation is the spectral decomposition of n copies of the same state.
Recall that the action of any unitary circuit U on a density matrix U ρU † represents a probabilistic mixture of its transformed eigenvectors U |ψ k that occur with probabilities p k . Similarly the action of a unitary circuit on the composite state ρ ⊗n can be written as a probabilistic mixture of the pure-states U |ψ k1 , ψ k2 , . . . ψ kn that occur with probabilities as products p k1 p k2 · · · p kn .
Let us now derive the action of the unitary circuit in Fig. 1 on the composite quantum system ρ ⊗n . Our proof works with any derangement operator D n from Definition 1 but here we only need to consider one example: we consider a cyclic shift (as originally proposed in [47]) of the registers via its explicit action on pure states as D n |ψ 1 , ψ 2 , . . . ψ n = |ψ n , ψ 1 , . . . ψ n−1 .
Our controlled derangement operator acts on the pure state |0, ψ k1 , ψ k2 , . . . ψ kn that occurs with a probability p k1 p k2 · · · p kn , and we denote as 0 the state of the additional ancilla qubit. Applying the sequence of gates from Fig. 1 yields the following transformations of the pure states.
Substituting the above results back we obtain the expression for the ancilla probability by using that only terms with coinciding indexes contribute to the sum via k 1 = k 2 = . . . k n Example 1. Using our definition of experimental quantum states from Eq. (1), our circuit in Fig. 1 can estimate the expectation value It is clear that the error probabilities p k are suppressed exponentially via p n k , but the dominant term gets slightly attenuated too via λ n . For example, let us assume that our dominant eigenvalue is λ = 0.8 and we have a high-entropy error in a subspace spanned by 100 eigenvectors via the uniform distribution p k = (1 − λ)/100 when k ≤ 101 and p k = 0 when k > 101. We then obtain the estimate Tr[ρ n σ] = 0.8 n ψ|σ|ψ + 101 k=2 (2 × 10 −3 ) n ψ k |σ|ψ k .
Since | ψ k |σ|ψ k | ≤ 1, we can upper bound the errors in Tr[ρ n σ] = 0.512 ψ|σ|ψ + E for, e.g., n = 3 as |E| ≤ 8 × 10 −7 . Hence our error contribution is at least 640000-times smaller than the desired expectation value. This high degree of error suppression is due to the large n = 3 Rényi entropy of the error probabilities p k as We will show in Theorem 2 and in Lemma 1 that the efficiency of the error suppression depends exponentially on this Rényi entropy. Indeed, in order to obtain an accurate estimate of ψ|σ|ψ we need to have a good knowledge of the largest eigenvalue of the density matrix λ that divides ψ|σ|ψ . We assume in Method B in Theorem 2 that this eigenvalue is known precisely. However, in Method A we just replace our observable σ with the identity in Fig. 1 and  (2 × 10 −3 ) n , for n = 3 we obtain the result as 0.8 3 + 8 × 10 −7 = 0.512001, which is a very good estimate of 0.8 3 = 0.512 as the error is 640000-times smaller than the ideal value.
Example 2. We consider now the worst-case scenario of 0-entropy error distributions. For example, let us consider the state ρ = λ|ψ ψ| + (1 − λ)|ψ err ψ err | which is a mixture of the ideal state |ψ that occurs with a probability λ and an erroneous state |ψ err which occurs with a probability (1 − λ). The error probability distribution from Eq. (1) is obtained as p 2 = 1 and p k = 0 for k > 2. It follows that the error distribution has a 0 entropy and our approach completely breaks down when λ ≤ 1/2 since the dominant eigenvector then becomes |ψ err . We can show that the errors in the expectation value are still exponentially suppressed, but much less efficiently than before in Example 1. Let us set λ = 0.8 and Tr[ρ n σ] = 0.8 n ψ|σ|ψ + 0.2 n ψ err |σ|ψ err .
For n = 3 we obtain Tr[ρ 3 σ] = 0.512 ψ|σ|ψ + 0.008 ψ err |σ|ψ err , and therefore the error is suppressed by a factor of 64. This is significantly lower that the factor of 640000 suppression from Example 1 which assumed a high-entropy error distribution.

Appendix B: Exponentially decreasing upper bounds on approximation errors
Theorem 2. We use Methods A/B from Fig. 1 to estimate the probability prob 0 = 1 2 + 1 2 Tr[ρ n σ] of the ancilla qubit. In Method A we use the same technique via σ = Id to estimate the probability prob 0 = 1 2 + 1 2 Tr[ρ n ] and our Method A yields the approximation In Method B we assume that the largest eigenvalue λ of the state ρ is known and therefore we have the approximation Method B: The approximation errors are bounded via |E A | ≤ 2Qn 1+Qn and |E B | ≤ Q n , and we prove in Lemma 1 that the bounding sequence Q n = (λ −1 − 1) n p n n generally decays exponentially when we increase n or when we increase the Rényi entropy of the probability vector p.
Proof. Let us recapitulate the explicit form of the density matrix from Eq. (1) as We can evaluate the expressions for the trace operation as where we have used that | ψ k |σ|ψ k | ≤ 1 due to unitarity of σ and p n is the n-norm of the probability vector p. Method B: Here our aim is to estimate Tr[ρ n σ] and λ n is known exactly. The error term can be calculated via and here we have defined the sequence Q n . Method A: In this case we estimate Tr[ρ n σ] and Tr[ρ n ], and we now calculate the error term using that Tr[ρ n ] = λ n + (1 − λ) n k p n k = λ n + (1 − λ) n p n n . Indeed, we obtain where we have used the notation Z = λ −n (1 − λ) n 2 N k=2 p n k ψ k |σ|ψ k for simplicity. It follows that the error term is bounded via Let us now upper bound this expression as where we have used the triangle inequality as |a − b| ≤ |a| + |b|. We can now use from before that |Z| ≤ Q n , which results in the error term This concludes our proof.
Lemma 1. The sequence Q n in our upper bounds in Theorem 2 decreases exponentially for a fixed n when we increase the Rényi entropy H n (p) of the error probability vector p from Eq. (1) as Q n = (λ −1 −1) n exp[−(n−1)H n (p)]. Furthermore, the sequence generally decays exponentially via Q n ≤ (p max ) −1 Q n where we define the suppression factor Q := (λ −1 −1)p max < 1 and p max is the largest error probability from Eq. (1).
Proof. The first part of the proof straightforwardly follows by substituting the expression for the Rényi entropy [46] H n (p) = 1 into the expression for Q as This concludes the first part of our proof.
Let us now prove that the sequence Q n decreases in exponential order when we increase n. Using the well-known series of inequalities satisfied by the Rényi entropies as H ∞ (p) · · · ≤ H n (p) ≤ H n−1 (p) ≤ · · · ≤ H 1 (p), we obtain the general bound − ln p max ≤ H n (p) for all n, and we define the largest error probability p max := max k p k . It follows that The upper bound Q < 1 holds due to our condition below Eq. (1) as (λ − 1)p k < λ for every probability k = {2, 3, . . . 2 N }. It follows that p max < (λ −1 −1) −1 and therefore Q = (λ −1 −1)p max < 1.
Lemma 2. Determining the expectation value ψ|σ|ψ from Theorem 2 to a fixed precision E requires n = ln Q −1 copies of the quantum state ρ (one needs to apply the ceiling function to round this up to the nearest integer). Here Q < 1 is the suppression factor from Theorem 2 and from Lemma 1.
Reducing shot noise to the desired precision E requires the following number of samples. In Method A one needs to assign N s,1 samples to determine prob 0 and N s,2 samples to determine prob 0 as Indeed, in both cases the measurement cost grows polynomially with the inverse precision E −1 and its polynomial order is determined by f := ln(λ −1 ) ln(Q −1 ) . The standard shot-noise limit E −2 is only slightly modified by f in the case of good quality quantum states or in the case of high-entropy probabilities.
Proof. Let us first compute the upper bound on the number of copies n required to achieve a fixed precision E 1. We use the upper bounds from Theorem 2 as |E A | ≤ E = 2Qn 1+Qn ≈ 2Q n and |E B | ≤ E = Q n . It is clear that the precision of Method A differs by a factor of 2 for E 1, and we will use this expression for both methods for simplicity. Let us use the exponentially decreasing upper bounds on Q n from Lemma 1 and write |E A | ≤ 2(p max ) −1 Q n , and |E B | ≤ 2(p max ) −1 Q n , where we have defined the suppression factor as Q := (λ −1 −1)p max < 1. It is straightforward to express n as Remark: Let us further expand the above equation by using our expression from Lemma 1 as Q n = (λ −1 −1) n exp[−(n−1)H n (p)], which results in We can express n as We remark that the denominator is positive due to the bound on Rényi entropies from Lemma 1 as ln[(λ −1 −1)] < H n (p). One should actually use the ceil function to round up the right-hand expression to the nearest integer. Note that the above expression implicitly depends on n via the Rényi entropy H n (p), but one could always use the series of inequalities 0 ≤ H n (p) ≤ H n−1 (p) ≤ . . . H 2 (p) ≤ H 1 (p) to bound the value of n. It is straightforward to show now that in the limiting scenarios H 2 (p) 1 or λ ≈ 1 we recover n → 1 (via the ceil function). Let us now express the scaling with respect to shot noise.
Method B: We estimate the probability prob 0 from Theorem 2 and we exactly know λ n . Our precision E is determined by the variance of our estimator which can be obtained as where we have used that the variance of the binomial distribution is prob 0 (1 − prob 0 )/N s and N s is the number of samples. We can explicitly express the number of shots N s required to reach a fixed precision E as Let us now simplify λ 2n by expressing the dependence of n on the precision above E as and it follows that We can finally express the number of samples explicitly as ] is a constant multiplication factor and 0 ≤ prob 0 ≤ 1 and we have introduced f := ln(λ −1 ) ln(Q −1 ) . Indeed, we obtain the expected limits due to lim λ→1 f = 0 and lim Q→0 f = 0. In general when λ > 1/2 we can use the expression Q ≤ (λ −1 − 1) from Theorem 2 as f ≤ ln(λ −1 ) ln[(λ −1 −1) −1 ] which is only saturated by 0-entropy distributions. For example when λ = 0.6 then we obtain f ≤ 1.26, and this value can be smaller depending on the entropy of the probability distribution. Interestingly, for sufficiently good quality states as λ ≥ 0.9, the polynomial overhead introduced is very small via f ≤ 0.16.
Method A: In this case we estimate both prob 0 and prob 0 . The variance of our estimator can be specified as Let us now use that 2prob 0 − 1 ≈ λ n and simplify the above expression as We can again substitute the variance of binomial distributions as Var[prob 0 ] = prob 0 (1−prob 0 )/N s,1 and Var[prob 0 ] = prob 0 (1 − prob 0 )/N s,2 . The measurement cost of determining both components to a precision E 2 /2 follows as We can now use our previous expression from Eq. (B13) for determining λ 2n and λ 4n , which finally yields our formula for the measurement costs as Total number of measurements required to determine the result is indeed N s,1 + N s,2 , and recall that f = ln(λ −1 ) ln(Q −1 ) .
It follows from the above definition that the dominant eigenvector |ψ is orthogonal to every error contribution in every eigenstate as ψ|ψ kµ = 0 for every k = {2, . . . 2 N } and for every µ = {1, . . . n}. Modifying accordingly the orthogonality relation in the proof of Theorem 1 in Eq. (A4) allows us to compute the leading term as expected, but every other non-zero term is multiplied with the prefactor n µ=1 (1 − λ µ ) which leads to the following error term (1 − λ µ )]. (C3) As shown previously, this allows us to compute the error of our Method A and Method B in Theorem 2 as in general for λ min > 1/2.

Coherent mismatch in incoherent error channels
As we discussed in the main text our approach cannot address coherent errors, i.e., when the dominant eigenvector of the density matrix is √ 1 − c|ψ id + √ c|ψ err , where ψ id is the ideal computational state and ψ err is some error. This is expected to happen when systematic errors, such as miscalibrated rotation angles, are present but it is straightforward to show that even a completely incoherent error channel (random unitary events) can introduce a slight mismatch in the eigenvectors.
We show this by considering a quite general noise channel as in which no errors happen with a probability (1− ) and some error happens with a probability . In complete generality, the eigenvectors of ρ can be different than the eigenvectors of ρ err unless the commutator vanishes [ρ, ρ err ] = 0. A typical example for a vanishing commutator is the single-qubit depolarising channel in single-qubit systems, in which case ρ = (1 − )ρ + Id and indeed [ρ, Id] = 0. However, for more than 1 qubits (or non-separable states) the above expression does not hold and even single qubit depolarising can introduce a coherent mismatch such that the dominant eigenvector of ρ is √ 1 − c|ψ + √ c|ψ err . The coherent mismatch due to incoherent errors is expected to be very small in practically relevant scenarios since the high entropy of the error probabilities from Eq. (1) ensures us that [ρ, ρ err ] 1. For example, in our numerical simulations in Fig. 2 the infidelity of the dominant eigenvector with respect to the pure state obtained from a noise-free computation was below 10 −4 .
In general, for a high entropy error distribution in Eq. (1) we obtain the spectral decomposition with λ k 1 for k ≥ 2 ρ = k=1 λ k |ψ k ψ k |.
One can compute the first order (in ) corrections to the eigenvectors of ρ via the usual perturbative series: the dominant eigenvector of ρ is approximately (up to normalisation) where we have used that λ 1 − λ k ≈ λ 1 . Indeed the result is constant bounded due to the norm of the fist order correction k=2 | ψ k |ρ err |ψ 1 | 2 = |Col 1 [ρ err ]| 2 ≤ 1, hence the scaling of the correction O( ). Here Col 1 [ρ err ] is the column vector of ρ err whose norm is bounded by the largest eigenvalue.
Let us now focus on the repeated application of the noise channel from Eq. (C4) which can be used to model a quantum circuit that applies a series of noisy quantum gates with error probability . The incoherent decay of the dominant eigenvalue is expected to decay exponentially with ν as (1 − ) ν . For large systems one needs to implement a large number ν of gates and therefore one requires to have a sufficiently low per-gate error in order to keep the dominant eigenvalue above a threshold λ min -and thus keep the sampling costs in Result 2 practical. Since the strength of the coherent mismatch is proportional to the per-gate error rate , it is expected to decrease when we decrease . Interestingly, our numerical simulations of the single-and two-qubit depolarising channel suggest that  in the investigated region (see Fig. 5) the coherent mismatch only grows linearly when we increase the number of gates ν. This suggest that when we increase the number of gates the incoherent (exponential) decay of the dominant eigenvalue is significantly more damaging than the (linearly) increasing coherent mismatch.
We illustrate this the following way. Let us define the following quantities. We define the fidelity between the dominant eigenvector ψ (ν) 1 (after the application of ν noisy gates) and the ideal state as η 1 := | ψ id |ψ Furthermore we define the fidelity between the dominant eigenvector and the density matrix ρ as η 2 := ψ Here η 1 decays due to the coherent mismatch while η 2 ≈ (1 − ) ν decays purely due to the incoherent effect of the noise channel. Fig. 6 (a) shows how the ratio η 2 /η 1 decreases when we increase the number of gates. Interestingly the scale at which this ratio decays appears to be exponential in the investigated region.
These results suggest that the coherent mismatch in the dominant eigenvector can be expected to be sufficiently small for large, complex quantum circuits. Refer to ref. [45] for a detailed analysis.

a. Mitigating the coherent mismatch
As discussed in the main text, well-established techniques can be used to mitigate the effect of coherent errors. We now focus on the above introduced coherent mismatch in the eigenvector due to incoherent error channels and demonstrate the effectiveness of an extrapolation approach in Fig. 6 (b). Similarly to Fig. 3 in the main text, we use extrapolation techniques, but here we vary the gate error rate in the state preparation stage (and not in the derangement process). We set gate errors such that two-qubit gates undergo a depolarising noise with probability = 10 −3 and assume that the experimentalist can increase this error in k = 2, 3, 4 . . . steps up to = 10 −2 . As expected from the above arguments based on a perturbative expansion of the dominant eigenvector, the measured expectation value should depend on the error levels as a polynomial that has rapidly decaying expansion coefficients due to the fact that the per-gate error level is low as 1. We have determined extrapolation errors using various fitting techniques as shown in Fig. 6 (b). We define the extrapolation error as the difference between the ideal, error free expectation value ψ id |σ|ψ id and the estimated expected value Tr[ρ n σ] Tr[ρ n ] from Method A of Theorem 2. Here |ψ id is the state that one would obtain from a perfect, noise-free evaluation of the circuit and in our simulation we consider the same 12-qubit circuit as in Fig. 2 (right) in the main text (refer to Appendix F) with n = 3 copies.
Indeed, Fig. 6 (b) confirms that the effect of the coherent mismatch can be straightforwardly mitigated by fitting low-order polynomials to the experimental data. The red horizontal line represents the error bound from Result 1 and one can demonstrably suppress the effect of the coherent mismatch below this error bound. As expected, when we increase the degree of the fitting polynomial, the error saturates as it reaches the level from Result 1 which we defined for the case when the coherent mismatch is neglected -and the errors could only be further suppressed by increasing the number n of copies.  6. (a) The fidelity η1 decreases purely due to the coherent mismatch while the fidelity η2 decays purely due to the incoherent effect of the noise channel as discussed in Appendix C 1. The ratio η2/η1 appears to decay exponentially in the investigated region (linear in the logarithmic plot) when we increase the number of gates for the fixed two-qubit gate error = 10 −2 and this ensures us that the coherent mismatch in the dominant eigenvector becomes negligible for large systems (large ν). We simulated the same circuit as in Fig. 5 (b). (b) Mitigating the error caused by a coherent mismatch in the dominant eigenvector of the density matrix as discussed in Appendix C 1. Gate error levels in the state preparation stage were varied in k = 2, 3, 4 . . . steps and the obtained expectation values were extrapolated to the zero error limit. The red horizontal line represents the error bound from Result 1 and one can straightforwardly suppress the effect of the coherent mismatch below this error bound by fitting low-order polynomials. The error saturates when increasing the degree of the fitting polynomial and can only be further reduced by increasing the number n of copies.
Appendix D: Noise resilience of derangements and error extrapolation Example 3. We now show examples why the derangement operator is highly resilient to errors. We proceed by recapitulating that quantum channels can be represented by a set of non-unique Kraus maps and, in particular, we consider the decomposition into the following sum of unitary transformations as where U is the ideal unitary transformation, k c m = 1 and 0 ≤ , c m ≤ 1, while the erroneous Kraus operators are unitary via U m U † m = Id. The deviation from the ideal transformation can be interpreted as unitary transformations that randomly affect the eigenvectors |ψ of the quantum state as U m |ψ with probability c m .
It follows that the non-symmetric combinations of input states do not contribute to the output even when the derangement operator is affected by random errors. Note that even though the errors do not directly contribute to the final output (as shown above), the probability that the circuit outputs an error-free result is decreased via the 1 − factor. This is, however, a trivial effect that only attenuates the output probabilities linearly and can be completely corrected by a linear extrapolation (i.e., estimating the output probabilities at different values and then extrapolating to = 0).
The second equation shows that we obtain the correct contribution despite all registers except for register 1 have undergone some random error U 2 , U 3 etc. Our previous argument again holds: despite the fact that these error events do not directly contribute to the final output of the circuit, the probability of an error-free output is attenuated linearly which, nonetheless, can be completely corrected by a linear extrapolation.
In summary, the derangement measurement is highly resilient to errors and completely protects the permutation symmetry of input states even when the derangement operator suffers from experimental noise. However, errors that affect the qubits to which the observable σ is applied will degrade the final result non-trivially via U 1 ψ|σ|U 1 ψ , where U 1 is some unitary noise process that occurs with a (possibly) low probability. Nevertheless, we show in the main text and in the following theorem that these erroneous contributions can be successfully mitigated with, e.g., extrapolation techniques.
Theorem 3. Assume that a circuit consists of a sequence of ν noisy quantum gates, and each gate's error model is of the form (1 − )Φ k + E k , where Φ k is the ideal, error-free quantum channel and E k is an arbitrary error channel (CPTP map) that occurs with probability . Most typical error models are of this form, including dephasing, depolarising, inhomogeneous Pauli errors, arbitrary unital channels and beyond (E k need not be local or two-local). In a circuit that consists of number ν such gates, any expectation value E will depend on the error probability as a degree ν polynomial via where E k are real polynomial coefficients. One can therefore exactly determine the ideal expectation value E(0) by estimating E( ) at ν +1 points in . The so-called Lagrange polynomial or the Newton polynomial provide explicit formulas for computing E( ) from the pointwise reconstructions E( k ). Furthermore, one can approximate the dependence on via, e.g., the (3, 3) Padé approximation as that only requires the coefficients a 1 , a 2 , a 3 , a 4 , a 5 to be fitted to experimental data.
Proof. Applying ν gates in a sequence will result in the product of channels where G k is a channel which decomposes into the sum of all terms in which k errors occur and ν k=1 Φ k is the ideal error-free circuit. We can introduce the circuit with no errors as G 0 := ν k=1 Φ k which simplifies our formula as.
It follows that any expectation value (with respect to some observable H) will be of the form therefore any expectation value can be expressed as a degree ν polynomial as a function of the error probability as where E 0 is the ideal, noise-free expectation value and E k are polynomial coefficients. Let us now write (without loss of generality) that the expectation values are of the form Tr{HG k ρ} =η + η k , wherẽ η is a mean value and η k expresses the deviation from the mean value. Let us assume that η k η, which in the case of the derangement operator is motivated by our argument in 3, that most errors do not contribute and thereforẽ η ≈ 0. In this case we can evaluate the summation analytically for the mean value We can obtain a Padé expansion of the above result at ≈ 0 by neglecting the term ν . For example the (3, 3) Padé approximation follows as where a(n), b(n), c(n), d(n) are the Padé expansion coefficients that depend on the number n of gates ν, for example a(n) = 2(62 + 11n − 8n 2 + n 3 ) 5(26 − 9n + n 2 ) .
Indeed this expansion is only valid when η k ≈ 0. Nevertheless, we propose to approximate the polynomial by fitting the coefficients a 1 , a 2 , a 3 , a 4 , a 5 to experimental data. Recall that derangement circuits permute registers via Definition 1. Permuting two registers is performed by the SWAP operator, which decomposes into a product of N elementary, two-qubit SWAP gates as SWAP N,N · · · SWAP 2,2 SWAP 1,1 , where N is the number of qubits in a register. Our aim is now to optimally recompile elementary, controlled-SWAP gates assuming various different hardware-native gatesets.
The controlled-SWAP, also called Fredkin, gate has been much investigated in the literature, but mostly in the context of fault-tolerant quantum computing. For example, ref. [102] provided a circuit that optimally implements the controlled-SWAP gate using 8 applications of CNOT gates and 9 applications of T gates. Early works have suggested that if one has the ability to natively implement any two-qubit gate, then one can in principle implement the controlled-SWAP gate with only 5 applications of arbitrary two-qubit gates [82,83]. These works have provided circuit representations using 7 applications of controlled-X rotation gates. We now use general techniques of ref. [81] to recompile the controlled-SWAP gate and find more compact representations under the assumption that only a limited set of hardware-native gates can be executed by the hardware. We also find analytical guarantees that the recompiler has found the most compact representation possible. Results as the number ν e of entangling gates and number ν s of single-qubit gates are summarised in Table I. While the corresponding detailed circuits can be found online [58,59], we show the resulting circuit diagrams in Fig. 7, Fig. 8 and Fig. 9.
Equivalence classes: Before discussing details of the recompilation, let us recapitulate basic definitions. A unitary U is fully recompiled into V if their action on every quantum state in the Hilbert space H is identical, i.e., there exists a global phase factor freedom ∃φ ∈ R such that We also consider the case of local SU (4) equivalent recompilations V which, in contrast, result in equivalence only up to a local SU (4) transformation as We apply this definition to the case of elementary controlled-SWAP gates where the SU (4) transformation acts locally on the two swapped qubits. The reason why these circuits are important is the following. We notice that after the derangement circuit D n and the observable σ in Fig. 1 we can apply any local unitary transformation W to the quantum registers without changing the outcome of the measurement on the ancillary qubit. This generally allows us to recompile those controlled-SWAP gates into more compact circuits that are not followed by any further operations. As such, when considering n = 2 copies, the entire derangement circuit can be recompiled into these more compact circuits. Let us now introduce the 3 types of recompilations used in this work. Type A: We consider the fully equivalent recompilation which may be necessary when n > 2 and when implementing controlled-SWAP operations that are followed by other controlled-SWAP operators acting on the same registers. The first column of Table I shows that we generally need 6 two-qubit operations to implement an elementary controlled-SWAP gate and corresponding compact circuits are illustrated in Fig. 7 and Fig. 8. We also need to consider fully equivalent recompilation if the observable is measured by post-selecting on the ancilla and sampling the output of the registers -and not by implementing the controlled-observable as in Fig. 1. The former scheme would allow us to estimate multiple observables simultaneously.
Type B: If a controlled-SWAP gate is not followed by any other gate we only need to recompile up to a local SU (4) freedom. For example in case of n = 3 copies and observable σ = Id, the derangement circuit consists of two swaps of pairs of registers as C[SWAP 2,3 SWAP 1,2 ]. We need to consider Type A recompilation for C[SWAP 1,2 ], and we can consider type B recompilation for C[SWAP 2,3 ], since the latter is not followed by any other operation on the main registers. The second column of Table I shows that we generally need 5 two-qubit operations to implement such an elementary gate. The resulting compact circuits are illustrated in Fig. 7 and Fig. 8. We note that the entire C[SWAP 2,3 SWAP 1,2 ] circuit could also be recompiled up to an SU (8) freedom.
Type C: Observables as Pauli strings σ ∈ {Id 2 , X, Y, Z} ⊗N act on some or all of the qubits non-trivially. For example, consider n = 2 copies and the Pauli string X 4 Y 9 , in which case we can consider Type B recompilation for all controlled-SWAP operators except for the ones that swap qubits 4 with 4 and 9 with 9 in the two registers. Similarly, we need only recompile the product of the elementary swap and the observable, C[X 4 SWAP 4,4 ] and C[Y 9 SWAP 9,9 ], up to a local SU (4) freedom. In the present work we fix an X basis: one can thus implement C[SWAP 9,9 Y 9 ] by first transforming the basis of qubits 9 and 9 using single-qubit rotations. The third column of Table I shows that we generally need 4 two-qubit operations to implement such an elementary gate. The resulting compact circuits are illustrated in Fig. 9.
Gatesets: Let us start by defining single qubit R α (θ) rotation gates that depend on a parameter that can be calibrated to any fixed value in experiments −2π ≤ θ ≤ 2π as where X, Y and Z are Pauli matrices. C[R x ] gates-Let us first consider the aforementioned case of controlled-X rotation gates. We assume that the hardware can natively implement single-qubit Y and Z rotations as well two-qubit controlled-X rotations which we define as This unitary generates the CNOT gate when θ = π. The controlled-SWAP gate can then be implemented via 6 (5) These allow for surprisingly compact representations: the controlled-SWAP gate can be implemented using a single application of the CC[P ] gate plus 2(1) applications of controlled-Z rotation gates as illustrated in the second row of Fig. 8

Exploiting symmetries in derangement circuits
As discussed in the main text, derangement circuits have a rapidly growing number of invariants when increasing the number of copies n or the number of qubits N . This includes the large number of distinct permutations from Definition 1. Let us illustrate how these symmetries can be exploited via the following three examples.
Example 1: In the first example we assume that the connectivity between registers is limited such that only nearest neighbour registers can be swapped. For example, this could be a quantum device with n individual quantum processors arranged in a line. A derangement operation in this case is preferred that swaps only nearest-neighbour registers. For example, for n = 4 we can first apply nearest neighbour SWAP operators between registers as SWAP 1,2 and SWAP 3,4 followed by SWAP 2,3 . In contrast, a derangement which uses, e.g., SWAP 1,4 would not be supported natively by the device. Refer to [58,59] for a demonstration of the various distinct derangement circuits.
Example 2: In the second example we assume that arbitrary connectivity is available. In this case one can in principle implement any of the distinct derangement (permutation) patterns. Although choosing and fixing any one of those is sufficient, it is also possible to randomly choose from these circuits since noise may affect them differently. This may also help in suppressing asymmetries in the potentially non-identical input density matrices thereby creating randomised, 'average' density matrices when n is large.
Note that further invariants exist due to the fact that the σ gate in Fig. 1 can be applied to any of the registers. Example 3: In the third example let us consider an approach that can exploit symmetries in the derangement operator in complete analogy with twirling techniques. We notice that the controlled-derangement operator is a very specific kind of operation: when the ancilla state is |0 then the registers are left invariant and when the ancilla state is |1 then the registers are permuted. We start by applying an operation U before the derangement operator, i.e., Pauli strings P k ∈ {Id 2 , X, Y, Z} ⊗N are applied to registers k with U := ⊗ n k=1 P k . We can undo this operation after the derangement operator by first applying the anti-controlled Pauli string U . This is then followed by the controlled Pauli string U , where in U we need to relabel the indexes k according to the permutation s(k) that the derangement operator implements, i.e., U := ⊗ n k=1 P s(k) . Applying the above (controlled) Pauli strings before (after) the controlled-derangement operator does not affect the expectation-value measurement in an ideal scenario. However, the Pauli strings in U applied to the registers do reflect the errors that happen during the swap process, thus randomly applying Pauli strings and averaging the measurement results can reduce and homogenise the impact of errors that happen in the derangement circuit. Note that this is analogous to twirling techniques.
Let us illustrate this on the particular case of n = 2 copies. We randomly select two Pauli string P 1 , P 2 ∈ {Id 2 , X, Y, Z} ⊗N , where we apply P 1 to the first input state ρ and apply P 2 to the second register. Note that these are just single-qubit X, Y and Z operations (or the identity) applied to individual qubits in the registers. We then perform the controlled-derangement operator D 2 which swaps the two registers. We can now undo the effect of Pauli strings by applying the anti-controlled Pauli string P 1 and P 2 to registers 1 and 2, respectively. Since the derangement operator swaps the two registers we apply the controlled P 1 operator to the second register and the controlled P 2 operator to the first register. 10 -3.

-2.
10 -1. Errors in measuring expectation values of 50 randomly selected observables σ ∈ {Id2, X, Y, Z} ⊗N were computed with (Ẽ) and without (E) twirling. The median of the ratios F of the two errors F :=Ẽ/E is plotted as a function of the number of expected errors in the derangement circuit (ξ) -shading represents the quantiles 1/4 and 3/4. Twirling reduces 30-50% of errors in the noisy derangement circuit, but it can be viewed as a worst-case scenario as discussed in Appendix E 2. (right) Finding the ground state energy E of the spin-ring Hamiltonian from Eq. 8 using the variational Hamiltonian ansatz with l layers as discussed in Appendix F 3. The ansatz parameters were optimised in 5 independent instances and the average of the distance from the ground state energy ∆E is plotted as a function of l. Shading represents the minimum of the 5 instances.
We have simulated the above example assuming a system of N = 3, 4 and 5 qubits using recompiled controlled-SWAP operators from Appendix F 3. We assume the noise model and the same native gateset as in Appendix F 3. We randomly generate input states ρ and compute the error in the derangement circuit as E = Tr[ρ 2 σ]/Tr[ρ 2 ] − 2prob 0 −1 2prob 0 −1 , where we estimate the probabilities prob 0 and prob 0 using the circuit in Fig. 1. We then randomly select and apply 50 pairs of Pauli strings P 1 and P 2 and average the estimated probabilities; we denote the resulting errors asẼ. The ratio of the errors with and without twirling F :=Ẽ/E is plotted in Fig. 10(left). Note that the controlled Pauli strings P 1 and P 2 introduce additional noise when compared to just applying the plain controlled-derangement operator. Nevertheless, Fig. 10(left) highlights that the above twirling scheme is still able to reduce 30-50% of errors in the practically most important region, i.e., when the circuit error rate is ξ < 2.
It is important to note that the simulations in Fig. 10(left) should be viewed as a worst-case scenario for the following reasons. (a) In the example we have considered n = 2 copies, in which case we need 6 entangling gates per qubit in a register to fully recompile controlled-SWAP operators. In the simulations we naively implemented the twirling technique resulting in 2 additional entangling gates per qubit. When we consider a larger number of copies, e.g., n = 5, the overhead of the twirling technique remains 2 entangling gates, but implementing the derangement operation requires proportionally more entangling gates. This leads to a decreasing overhead of the twirling technique. (b) As discussed above, the controlled-observable can be combined with elementary controlled-SWAP gates via recompilation resulting in no increase in the number of two qubit gates in the derangement circuit. Similarly, we can recompile the entire, twirled controlled-SWAP operator into one compact circuit. This will significantly reduce the gate-count overhead of the twirled circuit thereby increasing the efficacy of the error reduction factor F . and randomly generate and fix the on-site energies as ω = (−0.70983, −0.0517, 0.9065, −0.9265, 0.0950, −0.49597). We optimise the ansatz parameters β and γ by applying 1000 iterations of natural gradient evolution to a set of randomly chosen initial parameters in the vicinity of the parameters that approximate the adiabatic evolution. We perform 5 independent optimisations and plot the average and the minimum of the distance ∆E in Fig. 10 (right). The average and the minimum follow the expected scaling [64,79] and the minimum reaches ∆E = 10 −4 at l = 20 layers. Furthermore, regarding the ansatz depth we find that l = 20 is comparable to results of refs. [63,64,79]. It follows that we can implement the ansatz circuit with 3N l = 360 applications of the native entangling gate. Let us remark that even though one may be able to find more compact ansätze [105], the VHA has the strong benefit of being informed by the problem structure -and it is guaranteed to find the ground state for an increasing depth due to its convergence to an adiabatic evolution [22]. Furthermore, even when using more compact ansätze, it is generally expected that the depth of the computation needs to grow when increasing the scale of the computation as discussed in the main text.
We assume the following noise model. Single-qubit gates are followed by dephasing noise with probability and damping (relaxation) noise with a small probability 0.1 . We also assume that the experimentalist can amplify this noise by increasing the value of . Furthermore, the qubits also undergo a small depolarising noise with probability 0.07 but we assume this noise cannot be amplified by the experimentalist. The ratio of the non-extrapolatable noise to the total gate error rate can be expressed via the probabilities We assume the same noise model in case of two-qubit gates but with all probabilities magnified by a factor of 5, i.e., → 5 .
In selecting an error model, it was important to include key characteristics of real systems while retaining the ability to perform efficient simulations. The model chosen exhibits the key elements of dephasing, damping and depolarising, and therefore captures the core characteristics of systems such as ion-traps, encompassing finite T2 relaxation (via dephasing), T1 relaxation (via damping) and imperfect control, heating etc. (via depolarisation). It is typical in ion traps, superconducting systems, and other platforms that single-qubit gate infidelities are significantly less severe than those of two qubit gates and therefore this characteristic was incorporated. Moreover, no real system can be expected to support perfect extrapolation as this implies flawless scaling of all error contributions; here the chosen model makes the assumption that the non-extrapolatable component is small; this is favourable to established extrapolation techniques and therefore provides a rigorous test for our new protocol.
While the resulting comparison is therefore physically plausible, it is worth noting that the ESD technique should also be expected to be robust over a wide variety of other noise models. Specifically, the theoretical error bounds in Result 1 depend only on the eigenvalue distribution of the density matrix; indeed, additional simulations were performed (not reported here) using a depolarising noise model that confirm these theoretical expectations.
In the present example we consider N = 6 qubits and n = 2 copies, i.e., we need to simulate the density matrix of 13 qubits which is equivalent to a 26-qubit pure-state simulation. We recompile the derangement circuit into hardware native gates as discussed in Appendix E 1: we only need to recompile the elementary controlled-SWAP gates up to a local SU (4) freedom (that acts on the swapped qubits). When estimating the probability prob 0 we thus need to use type B circuits from the second column of Table I, see also second column in Fig 7. Implementing the corresponding derangement circuit thus requires overall 5N applications of the entangling gate. When estimating the probability prob 0 we either use circuits of type B or type C depending on whether the observable acts on the particular qubit. For example, when estimating the expectation value of the observable Z 1 , we use type B circuits for all controlled-SWAP gates except the one that swaps the first qubits in both registers as SWAP 1,1 . For the latter we use the type C circuit after rotating the basis of qubits 1 and 1 so that that the observable is effectively mapped Z 1 → X 1 , refer to third column in Table I