Probabilistic Interpolation of Quantum Rotation Angles

Quantum computing requires a universal set of gate operations; regarding gates as rotations, any rotation angle must be possible. However, a real device may only be capable of $B$ bits of resolution, i.e. it might support only $2^B$ possible variants of a given physical gate. Naive discretization of an algorithm's gates to the nearest available options causes coherent errors, while decomposing an impermissible gate into several allowed operations increases circuit depth. Conversely, demanding higher $B$ can greatly complicate hardware. Here we explore an alternative: Probabilistic Angle Interpolation (PAI). This effectively implements any desired, continuously parametrised rotation by randomly choosing one of three discretised gate settings and postprocessing individual circuit outputs. The approach is particularly relevant for near-term applications where one would in any case average over many circuit executions to estimate expected values. While PAI increases that sampling cost, we prove that a) the approach is optimal in the sense that PAI achieves the least possible overhead and b) the overhead is remarkably modest even with thousands of parametrised gates and only $7$ bits of resolution available. This is a profound relaxation of engineering requirements for first generation quantum computers where even $5$ to $6$ bits of resolution may suffice and, as we demonstrate, the approach is many orders of magnitude more efficient than prior techniques. Moreover we conclude that, even for more mature late-NISQ hardware, no more than $9$ bits will be necessary.


I. INTRODUCTION
Producing quantum computers of the scale and fidelity needed to solve practically useful problems requires development not just of the quantum processor itself, but of the analogue and digital electronics used for the control and readout of qubits. These electronics may include field-programmable gate array (FPGA) systems [1,2] and customised integrated circuits (ICs) [3][4][5][6], typically cooled to improve performance and integration with the qubits. Understanding and minimising the required specifications of the electronics supporting the quantum computer is an essential step in developing scalable systems in general; these issues, however, become even more acute when considering cryogenic electronics with limited power budgets [7], and/or quantum processor architectures targeting close integration of quantum systems with classical systems, such as for silicon-based spin qubits [8].
Consider, for example, the instruction to implement a parametrised Pauli gate in which a user has specified the kind of gate they want to implement and to which qubits. Parametrised Pauli gates are naturally implemented in most physical platforms and the ability to realise a continuous set of parametrised gates is required by most near-term quantum algorithms [9][10][11]. As the gate angles are defined and implemented using digital electronics, the gate angles must be discretised into $B$ bits of resolution. The choice of $B$ has significant impacts on elements such as the bandwidth of communication channels between different elements of the control stack; the memory requirements of any gate instruction cache; and the digital-to-analogue converters (DACs) used ultimately to produce the driving fields acting on the qubits. There is therefore a strong benefit in minimising $B$ to the point where it is just sufficient to provide the required gate fidelities for a given application or circuit. Most of the currently leading qubit hardware platforms operate optimally at cryogenic temperatures, including superconducting qubits [5,12], trapped ions [13,14], semiconductor spin qubits [15,16], and photonic qubits [17]. This has motivated significant effort on developing control systems which can also operate adjacent to the qubits, at low temperatures, where the motivation to minimise power consumption becomes even greater.
Previous studies have examined the required hardware specifications for qubit control, with the goal of optimising the fidelity of individual logic gates to the point that they are limited by factors other than (e.g.) the bit resolution of the control system [7]. However, focusing on the impact of control limitations on a single gate operation, rather than on the average output of a quantum calculation, risks significantly over-specifying the control hardware requirements. Our approach is to consider the output state of the quantum device, and examine its sensitivity to the number of bits used in the angular discretisation of the gates (which can ultimately be related to parameters such as the bit resolution and/or sampling rate of a qubit control DAC).
In principle even 3 bits of angular resolution would already guarantee a universal computing machine, as any continuously parameterised gate can be arbitrarily well approximated as a sequence of discrete Clifford and T gates [18]. However, realising the desired gate by a sequence of discrete options, one would significantly deepen the overall circuit; this is undesirable in general and particularly so for near-term quantum computers where one always prefers shallow circuits. Using a method of Probabilistic Angle Interpolation (PAI) we show that we can effectively upgrade the capabilities of a physical device with only a set of discrete angles such that on average it can implement any continuous rotation gate, and this is achieved without increasing circuit depths. We do so by randomly instructing the control infrastructure to perform one of the discretised rotation angles, and we subsequently combine the individual outputs of the device such that on average we obtain the same output as the ideal device with infinite resolution. The PAI approach thus allows one to fully exploit the power of algorithms that (nominally) require continuously parametrised gates. It does so with only a marginal increase in repetition (sampling) cost for any reasonable number of parameters as long as the aim is to estimate expected values of observables as relevant in most near-term quantum algorithms [9][10][11].
arXiv:2305.19881v2 [quant-ph] 14 Nov 2023
Note that our PAI technique addresses the challenge of discretisation, rather than the well-studied issue of gate infidelity, i.e. random and unknown variations in the implemented gate due to, e.g., imperfections or interactions with the environment. For the latter case techniques such as probabilistic error cancellation (PEC) are commonly used in the context of quantum error mitigation [19], whereby noise in a quantum gate is mitigated by learning the noise model of the particular gate or set of gates and randomly inserting recovery operations such that their average effect cancels out the effect of noise; we discuss further connections in Section V. In the following analysis, however, we assume perfect unitary gates and thus the only challenge we address is that the rotation angles of quantum gates are discretised for the sake of reducing the complexity of engineering. Needless to say, methods of error mitigation can be combined with our PAI techniques.

A. Summary of the protocol
FIG. 1. Continuously parametrised Pauli rotation gates $R(\theta)$ encompass most typical gates developed for quantum technologies. Due to the use of digital electronics, the rotation angles are divided into $2^B$ equiangular segments $\Theta_k$. In order to reduce engineering complexity, the number $B$ of bits is chosen as small as possible. PAI realises an arbitrary, continuous rotation $R(\Theta_k+\theta)$ by randomly instructing the quantum hardware to apply one of the two nearest notch settings $R(\Theta_k)$ and $R(\Theta_{k+1})$ or, with a small probability, to apply the antipolar rotation $R(\Theta_k+\pi)$.

Focusing on quantum systems of $N$ qubits, we consider parametrised quantum gates $R(\theta) = e^{-i\theta G/2}$ with gate generators $G$ of eigenvalues $\pm 1$; these include parametrised SWAP gates and Pauli gates for any Pauli string $G \in \{\mathbb{1}, X, Y, Z\}^{\otimes N}$. These gates encompass most gate sets developed for quantum technologies, such as single-qubit $X$, $Y$ or $Z$ rotations or two-qubit $XX$ entangling gates. As we presently explain, the PAI method can also be extended to other physical gate sets. We denote the superoperator of our parametrised gates as $\mathcal{R}(\theta)$, which acts by conjugation as $\mathcal{R}(\theta)\rho := e^{-i\theta G/2}\,\rho\, e^{i\theta G/2}$. Fig. 1 illustrates a physical device that can perfectly perform parametrised gates $\mathcal{R}(\Theta_k)$ but only with a finite angular resolution of $B$ bits as
$$\Theta_k = k\Delta, \qquad \Delta = 2\pi/2^B, \qquad k \in \{0, 1, \dots, 2^B-1\}, \tag{1}$$
and we detail below the generalisation to a non-uniformly distributed (non-linear) set of angles. We define any continuous rotation angle as an overrotation of one of the discrete settings, $\mathcal{R}(\Theta_k+\theta)$, by an angle $0 \le \theta < \Delta$. Given a relative position $\lambda = \theta/\Delta$ between two discrete settings, the simplest solution would be to round to the nearest notch; however, this leads to systematic coherent errors, and in the numerics that we later present we observe that this effect can be remarkably severe. A less naive approach would involve randomly switching between the two nearest notch settings $\Theta_k$ and $\Theta_{k+1}$ with probabilities $(1-\lambda)$ and $\lambda$, respectively, see Appendix E. However, on average one obtains a non-unitary operation that can result in an exponential decrease of the fidelity of the quantum state with the number of parametrised gates $\nu$ in the circuit. PAI randomly chooses one of three allowed notch settings for each parametrised gate in a circuit and exactly implements the desired continuous rotation angle by post-processing measurement outcomes. In particular, in each circuit execution we randomly choose either one of the nearest two notch settings $\Theta_k$ and $\Theta_{k+1}$ or, with a small probability, the antipolar angle setting $\Theta_k+\pi$ as illustrated in Fig. 1. When estimating expected values with PAI, the individual circuit outputs are multiplied by a sign $-1$ whenever the third rotation angle was chosen.
Thus, the expected value estimation yields a probability distribution in Fig. 3 (left, grey histogram) that is centred around the same mean value that one would obtain via an infinite angular resolution (blue). However, the ability to exactly implement continuous rotations while having access to only discrete rotation angles comes at the price of an increased number of circuit repetitions that scales in the worst case exponentially as $e^{\nu\Delta^2/4}$ with the number of gates $\nu$. We find in Fig. 2, however, that at $B = 7$ bits of resolution this overhead is still reasonable when the number of parametrised gates in the circuit is not more than a few thousand (of course, there can be arbitrarily many additional non-parameterised gates which align to the naturally available rotations).
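The discretisation and the two naive strategies discussed above can be illustrated with a minimal sketch. It assumes the uniform notch spacing $\Delta = 2\pi/2^B$ and uses the fact that, for a generator with eigenvalues $\pm 1$, the rotation superoperator has eigenvalues $1$ and $e^{\pm i\theta}$, so averaging two notch settings multiplies the corresponding components by a factor of modulus below one. The function names `split_angle` and `two_notch_contraction` are hypothetical helpers introduced only for this illustration.

```python
import cmath
import math

def split_angle(phi, bits):
    """Split phi into notch index k, overrotation theta, ratio lam = theta/delta."""
    delta = 2 * math.pi / 2**bits      # uniform notch spacing (assumed)
    phi = phi % (2 * math.pi)
    k = int(phi // delta)              # notch at or directly below phi
    theta = phi - k * delta            # overrotation, 0 <= theta < delta
    return k, theta, theta / delta, delta

def two_notch_contraction(lam, delta):
    """Modulus of the factor applied to the e^{i*theta} eigencomponents when
    the two nearest notches are mixed with probabilities (1 - lam) and lam."""
    return abs((1 - lam) + lam * cmath.exp(1j * delta))

k, theta, lam, delta = split_angle(1.0, bits=7)
shrink = two_notch_contraction(0.5, delta)   # worst case: halfway between notches
print("notch index:", k, "lambda:", lam)
print("contraction per gate:", shrink, "after 1000 gates:", shrink**1000)
```

The per-gate contraction is close to one, but it compounds multiplicatively with the number of gates, which is the exponential loss of fidelity that the two-notch scheme suffers and that the third, antipolar setting is designed to remove.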

B. PAI of a single rotation gate
Introducing the notation for the aforementioned discrete notch settings as
$$\mathcal{R}_1 := \mathcal{R}(\Theta_k), \qquad \mathcal{R}_2 := \mathcal{R}(\Theta_{k+1}), \qquad \mathcal{R}_3 := \mathcal{R}(\Theta_k+\pi), \tag{2}$$
the main observation we build on is that we can exactly express any overrotation $\mathcal{R}(\Theta_k+\theta)$ as a linear combination of the discrete gates as
$$\mathcal{R}(\Theta_k+\theta) = \gamma_1(\theta)\,\mathcal{R}_1 + \gamma_2(\theta)\,\mathcal{R}_2 + \gamma_3(\theta)\,\mathcal{R}_3. \tag{3}$$
By solving a system of trigonometric equations, we obtain the analytic form of the coefficients $\gamma_l(\theta)$ in Eq. (A4) as a function of the continuous angle $\theta$. In a fashion analogous to quasiprobability sampling methods [19][20][21] which mitigate non-unitary error effects, our angular decomposition leads to the following implementation.
Statement 1. We define a sampling scheme whereby we randomly choose one of the three discrete gate variants $\{\mathcal{R}_l\}_{l=1}^{3}$ from Eq. (2) according to the probabilities $p_l(\theta) = |\gamma_l(\theta)|/\|\gamma(\theta)\|_1$, which yields the unbiased estimator of the rotation gate as
$$\hat{\mathcal{R}}(\Theta_k+\theta) = \|\gamma(\theta)\|_1\,\mathrm{sign}[\gamma_l(\theta)]\,\mathcal{R}_l.$$
To intuitively understand the above approach, we expand the trigonometric probabilities to leading order in the small $\Delta$. With probability $p_1(\theta) = (1-\lambda) + O(\Delta^2)$ we apply the gate at the notch setting $\Theta_k$, and with probability $p_2(\theta) = \lambda + O(\Delta^2)$ we apply the gate at the next notch setting $\Theta_{k+1}$. In leading order this would be equivalent to the naive approach discussed in the previous section, which led to a non-unitary operation. In contrast, we obtain the desired, unitary operation by applying the antipolar rotation $\mathcal{R}(\Theta_k+\pi)$ with a small probability $p_3(\theta) = \frac{1}{4}\lambda(1-\lambda)\Delta^2 + O(\Delta^4)$. Additionally, we multiply any observable measurement outcome with the factor $\|\gamma(\theta)\|_1\,\mathrm{sign}[\gamma_l(\theta)]$.
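As an illustration of Statement 1, the following sketch evaluates one closed form of the three coefficients and checks it numerically. The closed form is an assumption of this sketch: it is obtained here by solving the linear conditions $\sum_l \gamma_l = 1$ and $\sum_l \gamma_l e^{i\alpha_l} = e^{i\theta}$ for the relative angles $\alpha = (0, \Delta, \pi)$, which follow from the eigenvalues $1, e^{\pm i\theta}$ of the rotation superoperator, and it is not copied from Eq. (A4). The names `pai_coefficients` and `pai_probabilities` are hypothetical.

```python
import cmath
import math

def pai_coefficients(theta, delta):
    """Quasiprobabilities for the gates at Theta_k, Theta_k + delta, Theta_k + pi
    (closed form derived for this sketch, see the lead-in)."""
    g1 = 0.5 * (1 + math.cos(theta) - math.sin(theta) / math.tan(delta / 2))
    g2 = math.sin(theta) / math.sin(delta)
    g3 = 0.5 * (1 - math.cos(theta) - math.sin(theta) * math.tan(delta / 2))
    return g1, g2, g3

def pai_probabilities(theta, delta):
    """Sampling probabilities p_l = |gamma_l| / ||gamma||_1 and the 1-norm."""
    gammas = pai_coefficients(theta, delta)
    norm = sum(abs(g) for g in gammas)
    return [abs(g) / norm for g in gammas], norm

delta = 2 * math.pi / 2**7          # B = 7 bits
lam = 0.3
theta = lam * delta
gammas = pai_coefficients(theta, delta)
probs, norm = pai_probabilities(theta, delta)

# The linear combination must reproduce the eigenvalue e^{i*theta} exactly.
angles = (0.0, delta, math.pi)
residual = abs(sum(g * cmath.exp(1j * a) for g, a in zip(gammas, angles))
               - cmath.exp(1j * theta))
print("gammas:", gammas)
print("probabilities:", probs, "norm:", norm, "residual:", residual)
```

The third coefficient comes out negative, which is why outcomes are multiplied by $-1$ whenever the antipolar setting is drawn, and its probability agrees with $\frac{1}{4}\lambda(1-\lambda)\Delta^2$ to leading order.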
In Theorem 1 we explicitly prove the above approach is optimal in the sense that it yields a minimal ∥γ∥ 1 and present a general solution that can be applied to nonuniform notch settings too.
Theorem 1 (informal summary of Theorem 2). Given any set $\{\mathcal{R}(\Theta_q)\}$ of discrete (possibly non-uniform) notch settings that a machine can realise, the optimal protocol that minimises $\|\gamma\|_1$ uses $\Theta_k$ and $\Theta_{k+1}$ as the two nearest notch settings to $\theta$, and we choose the third gate to be the notch setting nearest to $\Theta_k + \pi + \frac{\Delta}{2}$, where we defined the distance $\Delta := \Theta_{k+1} - \Theta_k$.
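The role of the third angle in Theorem 1 can be explored numerically. The sketch below solves the three linear conditions $\sum_l \gamma_l = 1$, $\sum_l \gamma_l \cos\alpha_l = \cos\varphi$, $\sum_l \gamma_l \sin\alpha_l = \sin\varphi$ for an arbitrary triple of available angles using Cramer's rule, and compares $\|\gamma\|_1$ for two candidate third settings. The availability of a notch at exactly $\pi + \Delta/2$ is an assumption made purely for illustration (it would correspond to a non-uniform machine), and `det3`, `solve_gammas` and `l1_norm` are hypothetical helper names.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve_gammas(target, angles):
    """Coefficients expressing a rotation by `target` through rotations by the
    three `angles` (all angles measured relative to Theta_k)."""
    rows = [[1.0, 1.0, 1.0],
            [math.cos(a) for a in angles],
            [math.sin(a) for a in angles]]
    rhs = [1.0, math.cos(target), math.sin(target)]
    d = det3(rows)
    gammas = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        replaced = [[rhs[r] if c == col else rows[r][c] for c in range(3)]
                    for r in range(3)]
        gammas.append(det3(replaced) / d)
    return gammas

def l1_norm(target, angles):
    return sum(abs(g) for g in solve_gammas(target, angles))

delta = 2 * math.pi / 2**5            # B = 5 bits, to make the gap visible
target = delta / 2                    # worst case: halfway between notches
norm_antipolar = l1_norm(target, (0.0, delta, math.pi))
norm_shifted = l1_norm(target, (0.0, delta, math.pi + delta / 2))
print("third angle pi:          ", norm_antipolar)
print("third angle pi + delta/2:", norm_shifted)
```

In this toy comparison the shifted third angle gives a marginally smaller norm, and the difference is of higher order in $\Delta$, so on a uniform grid the antipolar notch $\Theta_k+\pi$ loses very little.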

C. PAI of parametrised circuits
We now consider a quantum circuit $\mathcal{U}_{\mathrm{circ}}$ that contains $\nu$ continuously parametrised gates and additionally may also contain other non-parameterised gates. We apply Statement 1 to each of the continuously parametrised gates: given that the $j$th parametrised gate has a desired continuous rotation angle $\Theta_{k_j}+\theta_j$, we first determine the corresponding notch settings $(\Theta_{k_1}, \Theta_{k_2}, \dots, \Theta_{k_\nu})$ and corresponding overrotation angles $(\theta_1, \theta_2, \dots, \theta_\nu)$ in Eq. (3). At each execution of the circuit we randomly replace each parametrised gate with a corresponding discrete gate variant, i.e., the $j$th parametrised gate is replaced by one of the discrete gate variants $\mathcal{R}^{(j)}_{l_j}$ from Eq. (2) according to the probability distribution $p_{l_j}(\theta_j)$ from Statement 1.
The result is a set of circuit variants $\mathcal{U}_l$ that contain only discrete notch settings according to the multi-index $l = (l_1, l_2, \dots, l_\nu) \in \{1,2,3\}^{\nu}$.
Statement 2. Given a circuit $\mathcal{U}_{\mathrm{circ}}$ of $\nu$ parametrised gates we choose a multi-index $l \in \{1,2,3\}^{\nu}$ according to the probability distribution $p(l) = |g_l|/\|g\|_1$, where $g_l = \prod_{j=1}^{\nu}\gamma^{(j)}_{l_j}(\theta_j)$ are simply products of the single-gate factors from Statement 1. We obtain an unbiased estimator of the ideal circuit as
$$\hat{\mathcal{U}}_{\mathrm{circ}} = \|g\|_1\,\mathrm{sign}(g_l)\,\mathcal{U}_l$$
by executing the circuit variants $\mathcal{U}_l$ in which all continuously parametrised gates are replaced by the discrete ones according to the multi-index $l$. Thus, $\mathbb{E}[\hat{\mathcal{U}}_{\mathrm{circ}}] = \mathcal{U}_{\mathrm{circ}}$.
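The sampling step of Statement 2 can be sketched as follows. Because the distribution $p(l)$ factorises over gates, each gate's setting can be drawn independently. The coefficient formula is the closed form assumed for illustration (derived from the linear conditions on the relative angles $0$, $\Delta$, $\pi$, not quoted from the appendix), and `sample_variant` is a hypothetical name.

```python
import math
import random

def pai_coefficients(theta, delta):
    """Single-gate quasiprobabilities (closed form assumed for this sketch)."""
    g1 = 0.5 * (1 + math.cos(theta) - math.sin(theta) / math.tan(delta / 2))
    g2 = math.sin(theta) / math.sin(delta)
    g3 = 0.5 * (1 - math.cos(theta) - math.sin(theta) * math.tan(delta / 2))
    return g1, g2, g3

def sample_variant(thetas, delta, rng):
    """Draw one multi-index l and the factor ||g||_1 * sign(g_l) with which
    the measurement outcome of that circuit variant is multiplied."""
    index, factor = [], 1.0
    for theta in thetas:
        gammas = pai_coefficients(theta, delta)
        weights = [abs(g) for g in gammas]
        l = rng.choices(range(3), weights=weights)[0]   # p_l = |gamma_l| / norm
        index.append(l)
        # Per-gate contribution: the single-gate norm carrying the sign of gamma_l.
        factor *= math.copysign(sum(weights), gammas[l])
    return tuple(index), factor

rng = random.Random(0)
delta = 2 * math.pi / 2**7
thetas = [0.1 * delta, 0.5 * delta, 0.9 * delta]   # overrotations of three gates
index, factor = sample_variant(thetas, delta, rng)
print("chosen settings (0: Theta_k, 1: Theta_k+1, 2: antipolar):", index)
print("post-processing factor:", factor)
```

The magnitude of the factor is the same for every draw and equals the product of the single-gate norms; only its sign depends on how many antipolar settings were selected.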
The above scheme can be compared to Probabilistic Error Cancellation (PEC) [20,21], which removes non-unitarity of gates by randomly inserting recovery operations into the circuit. Despite the close formal connection, PAI is quite different conceptually: e.g., all gates involved in PAI are unitary, and PAI does not apply gate insertions but rather applies the same gate at different angle settings. Furthermore, the quasiprobability decomposition in PAI in Eq. (3) is known by construction, as opposed to the experimentally learned approximate models in PEC [19][22][23][24][25].
Thus one applies a state-preparation circuit to a fixed reference state as $\mathcal{U}_{\mathrm{circ}}|0\rangle\langle 0|$; one then performs a measurement whose outcome is generally a random variable; by averaging over many repeated measurement outcomes one obtains an empirical estimate of the expected value $o$. Without loss of generality we assume a normalised observable $\|O\|_\infty = 1$ and thus the number of repetitions required to determine the expected value to a precision $\epsilon$ scales as $N_s \le \epsilon^{-2}$.

FIG. 2. (solid lines) Worst-case measurement overhead $\|g\|_1^2$ of PAI as a function of the number of parametrised gates in the quantum circuit. The number of gates one can reasonably (with an overhead of at most 12) implement with PAI is approximately $2^{2(B_{\min}-1)}$, where $B_{\min}$ is the number of bits used to digitise the rotation angle in Fig. 1. As these estimates rely on worst-case bounds, we observe in numerical simulations that the actual number of gates can be significantly larger. (red vs. green lines) Our optimal scheme can achieve many orders of magnitude smaller overheads than prior techniques based on Clifford operations [26]. (yellow region) Very deep circuits will require quantum error correction and thus even for late-NISQ devices no more than 9 bits will be necessary.
Since we assume access to only discrete rotations, whenever the hardware is instructed to execute the state-preparation circuit $\mathcal{U}_{\mathrm{circ}}$ the parametrised gates are replaced by one of the discrete circuit variants $\mathcal{U}_l$. After performing a measurement, one multiplies the random outcome with a factor $\|g\|_1\,\mathrm{sign}(g_l)$ that can have negative signs. As a consequence, the variance of the estimator is magnified, which implies an increased number of circuit repetitions.
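Statement 3. Applying PAI to the estimation of an expected value results in an unbiased estimator $\hat{o}$ of the expected value of an observable as
$$\hat{o} = \|g\|_1\,\mathrm{sign}(g_l)\,\hat{o}_l,$$
where $\hat{o}_l$ denotes the measurement outcome obtained from the circuit variant $\mathcal{U}_l$. The number of repetitions required to determine the expected value $o$ to accuracy $\epsilon$ scales as
$$N_s \le \epsilon^{-2}\,\|g\|_1^2,$$
where $\|g\|_1$ is simply a product of the single-gate norms $\|\gamma^{(j)}(\theta_j)\|_1$ from Statement 1.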
Indeed, to achieve the same precision, PAI has an increased measurement cost compared to having physical access to continuously parametrised gates. In the worst case, when all gate angles are exactly halfway between two notches as $\theta_j = \Delta_j/2$, this overhead scales as $e^{\nu\Delta_{\max}^2/4}$, where $\Delta_{\max}$ is the largest discretisation across the different parametrised gates.
The overhead is actually quite reasonable as long as the exponent does not significantly exceed 1, as illustrated in Fig. 2. Thus, in order for the circuit repetitions to not exceed a 12-fold increase (grey dashed line in Fig. 2), the number $\nu$ of parametrised gates in a circuit that can be implemented with PAI is limited by the lowest resolution $B_{\min}$ of the gate discretisations as $\nu \le 2^{2(B_{\min}-1)}$.
For example, at B = 7 bits resolution 4096 gates can still reasonably be implemented, while 10 bits resolution allows over a quarter of a million gates, which is certainly sufficient for most near-term applications [9][10][11].Furthermore, one can significantly reduce these costs by "turning off" PAI for the gates that are not contained in the light cone of the analysed observable O [27,28].
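The gate budgets quoted above follow directly from the worst-case formula $e^{\nu\Delta^2/4}$ with $\Delta = 2\pi/2^B$. A minimal sketch that evaluates both quantities is given below; `worst_case_overhead` and `max_gates` are hypothetical helper names.

```python
import math

def worst_case_overhead(num_gates, bits):
    """Worst-case sampling overhead exp(nu * Delta^2 / 4) for uniform notches."""
    delta = 2 * math.pi / 2**bits
    return math.exp(num_gates * delta**2 / 4)

def max_gates(bits):
    """Gate budget nu <= 2^(2(B-1)) that keeps the overhead below about 12."""
    return 2 ** (2 * (bits - 1))

for bits in (5, 6, 7, 9, 10):
    nu = max_gates(bits)
    print(f"B = {bits:2d}: up to {nu:7d} gates, "
          f"overhead {worst_case_overhead(nu, bits):.2f}")
```

At the budget $\nu = 2^{2(B-1)}$ the exponent equals $\pi^2/4$ for every $B$, so the overhead at the limit is the same constant, slightly below 12, regardless of resolution.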

III. NUMERICAL SIMULATIONS
We consider a typical practical benchmarking task of simulating the spin-ring Hamiltonian with coupling $J = 0.3$ and uniformly random $-1 \le \omega_k \le 1$. This spin problem is relevant in condensed-matter physics for understanding many-body localisation, in which early quantum computers might be very useful [29][30][31].
Time evolution: We first consider simulating time evolution, which is one of the most natural applications of quantum computers [32][33][34], and focus on Trotterisation, which is a commonly applied simulation technique [35]; it approximates the time evolution $e^{-itH}$ as repeated layers of evolutions under the individual Hamiltonian terms for small times $\delta t$. Since the evolution under each Hamiltonian term in Eq. (7) is a Pauli rotation gate, e.g., $R(\omega_k\delta t)$, a layer of the time evolution circuit is just a series of rotation gates each tuned to its relevant small rotation angle; this layer is then repeated a large number $t/\delta t$ of times.
Rounding the rotation angles to nearest notch settings, e.g., $\omega_k\delta t \to \Theta_1$, leads to a significant coherent error as it implements incorrect evolution times and/or incorrect interaction terms $\omega_k$; thus near phase transitions the discrepancy might be radical. Measuring an expected value with a fixed number of shots leads to a biased probability distribution, as we illustrate in Fig. 3 (left, red). In contrast, PAI results in a probability distribution that is centred around the exact mean in Fig. 3 (left, blue and grey), while its distribution width is slightly increased. The increase in width is actually significantly lower than our worst-case estimates in Statement 3, and we quantify in the Appendix that this is generally the case when all rotation angles are small or are close to one of the notch settings, as in the case of trotterised time-evolution circuits.
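The contrast between rounding and PAI can be reproduced in a toy model that is much simpler than the spin-ring simulation: a single qubit prepared in $|+\rangle$ and subjected to $\nu$ small rotations about the same axis, for which the exact expectation value of $X$ is $\cos(\sum_j \varphi_j)$. The sketch below is not the simulation reported in Fig. 3; it enumerates all $3^\nu$ circuit variants to evaluate the mean of the PAI estimator exactly, using the coefficient closed form assumed for illustration. All names are hypothetical.

```python
import itertools
import math

def pai_coefficients(theta, delta):
    """Single-gate quasiprobabilities (closed form assumed for this sketch)."""
    g1 = 0.5 * (1 + math.cos(theta) - math.sin(theta) / math.tan(delta / 2))
    g2 = math.sin(theta) / math.sin(delta)
    g3 = 0.5 * (1 - math.cos(theta) - math.sin(theta) * math.tan(delta / 2))
    return g1, g2, g3

def notch_angles(phi, delta):
    """The three settings used for angle phi and the overrotation theta."""
    base = math.floor(phi / delta) * delta
    return (base, base + delta, base + math.pi), phi - base

bits = 5
delta = 2 * math.pi / 2**bits
phis = [0.07] * 5                       # five small, Trotter-like angles

exact = math.cos(sum(phis))
rounded = math.cos(sum(round(p / delta) * delta for p in phis))

# Pair every coefficient with the angle it multiplies, gate by gate.
per_gate = []
for phi in phis:
    angles, theta = notch_angles(phi, delta)
    per_gate.append(list(zip(pai_coefficients(theta, delta), angles)))

# Mean of the PAI estimator: sum over variants of g_l * <X> in that variant.
pai_mean = 0.0
for variant in itertools.product(*per_gate):
    weight = math.prod(g for g, _ in variant)
    pai_mean += weight * math.cos(sum(a for _, a in variant))

print("exact:", exact, "rounded:", rounded, "PAI mean:", pai_mean)
```

In this example every angle rounds to the zero notch, so the rounded circuit does nothing at all, whereas the PAI average coincides with the exact value up to floating-point error.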
Finding eigenstates: We next consider finding eigenstates of the Hamiltonian in Eq. (7); a broad range of techniques are available in the literature, including ones that target near-term and early fault-tolerant quantum computers [9][10][11][36][37][38][39]. We use the same trotterised circuit structure as we used for time evolution but we optimise the angles of the rotation gates so that the energy $\mathrm{Tr}[H\,\mathcal{U}_{\mathrm{circ}}|0\rangle\langle 0|]$ of the emerging state is minimal; this variant of the variational quantum eigensolver uses the Hamiltonian Variational Ansatz, as in the case of QAOA [9][10][11][40]. Fig. 3 (right, red lines) illustrates that a gradient descent optimiser does not manage to meaningfully lower the energy when the gradient is calculated using only nearest notch settings, due to the coherent discrepancy in the output state. In contrast, using the same quantum resources (same number of circuit repetitions and discretised gates) but estimating the gradient using PAI matches the performance of an ideal quantum circuit that has infinite angular resolution in Fig. 3 (right, green and blue lines). We also note that formally our PAI protocol applies a different, randomly chosen circuit variant at each circuit repetition. However, reconfiguring circuits will likely be a bottleneck for some quantum hardware platforms and thus it is desirable to run the same circuit variant multiple times. Indeed, Fig. 3 (right, green) only uses 100 different circuit variants, each of which is repeated $10^4$ times, which demonstrably does not compromise the optimiser's performance.

IV. GENERALISATIONS
A number of generalisations and further applications of our approach are apparent.
First, our results in Theorem 1 directly apply to settings where the discretisation in Fig. 1 is not uniform: Such a non-uniform discretisation of angular settings may arise from a non-linear relationship between the control field amplitude and the rotation angle achieved, for example when modulating the exchange interaction [41] between two spin qubits or applying a Stark shift [42].
Second, the approach can be generalised to quantum gates beyond gate generators of eigenvalues $\pm 1$. One then writes a system of equations similar to Eq. (3) but uses more variables and more discrete gate angle settings, and solves for the variables either analytically or numerically.
Third, as we detail in Appendix E, a variant of PAI would be to simply omit the rarely-occurring third, antipolar angle from the choices and instead select between only the nearest notches.Conceivably this would offer a slight simplification of hardware, however this would fundamentally limit the device's capacity to obtain expectation values that are unbiased with respect to the values obtained from an ideal (continuous angle) system.
Fourth, in our spin-ring simulations we assumed standard trotterisation is used; however, for quantum chemistry Hamiltonians one may significantly benefit from randomised compilation techniques, such as qDRIFT [43]. As we now detail, PAI can be seamlessly combined with such randomised compilers. In particular, given the Hamiltonian $H = \sum_{q=1}^{L} h_q H_q$ with coefficient $\ell_1$ norm $\lambda = \sum_{q=1}^{L} |h_q|$, a standard trotter circuit of $r$ steps consists overall of $N = Lr$ gates. In qDRIFT one randomly chooses a sequence of the rotation gates $R_q(\tau) = e^{i\tau H_q/2}$ with the benefit that the sequence length depends on the norm $\lambda$ rather than on the number of terms $L$, and that each gate in the sequence need only have a fixed, constant rotation angle $\tau = 2t\lambda/N$. These angles are indeed potentially small and the application of PAI is just as relevant as in the case of standard trotterisation: the combination with PAI proceeds by first randomly choosing a gate $R_q(\tau)$ according to qDRIFT and then implementing the continuous rotation angle $\theta = \tau$ by randomly choosing one of the relevant notch settings $R_q(\Theta_k)$ according to PAI.
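The two-stage sampling described above can be sketched as follows: a Hamiltonian term is drawn with probability $|h_q|/\lambda$, and the fixed angle $\tau$ is then realised by drawing one of three notch settings. The sketch assumes uniform notches, assumes that the sign of each $h_q$ is absorbed into the corresponding $H_q$, and uses the coefficient closed form assumed for illustration; `qdrift_pai_sequence` is a hypothetical name.

```python
import math
import random

def pai_coefficients(theta, delta):
    """Single-gate quasiprobabilities (closed form assumed for this sketch)."""
    g1 = 0.5 * (1 + math.cos(theta) - math.sin(theta) / math.tan(delta / 2))
    g2 = math.sin(theta) / math.sin(delta)
    g3 = 0.5 * (1 - math.cos(theta) - math.sin(theta) * math.tan(delta / 2))
    return g1, g2, g3

def qdrift_pai_sequence(coeffs, t, num_gates, bits, rng):
    """Return a list of (term index, notch angle) pairs and the overall
    post-processing factor for one randomly compiled circuit."""
    lam = sum(abs(h) for h in coeffs)          # l1 norm of the coefficients
    tau = 2 * t * lam / num_gates              # fixed qDRIFT rotation angle
    delta = 2 * math.pi / 2**bits
    k = math.floor(tau / delta)
    gammas = pai_coefficients(tau - k * delta, delta)
    weights = [abs(g) for g in gammas]
    notches = (k * delta, (k + 1) * delta, k * delta + math.pi)
    term_weights = [abs(h) for h in coeffs]
    sequence, factor = [], 1.0
    for _ in range(num_gates):
        q = rng.choices(range(len(coeffs)), weights=term_weights)[0]  # qDRIFT
        l = rng.choices(range(3), weights=weights)[0]                 # PAI
        sequence.append((q, notches[l]))
        factor *= math.copysign(sum(weights), gammas[l])
    return sequence, factor

rng = random.Random(1)
sequence, factor = qdrift_pai_sequence([0.5, -0.2, 0.3], t=1.0,
                                       num_gates=50, bits=7, rng=rng)
print(sequence[:5], factor)
```

Since every gate shares the same angle $\tau$, the PAI coefficients need to be computed only once per circuit, which is a small practical convenience of combining the two schemes.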
Finally, we note that the present approach is not limited to near-term applications and might also be useful in the early fault-tolerance regime. In particular, Ref. [26] considered the Solovay-Kitaev decomposition whereby one aims to approximate a continuous rotation $R(\theta)$ as a sequence of Clifford and T gates. Ref. [26] then considered PEC to mitigate the approximation error of this decomposition by randomly applying Clifford recovery operations and already noted that it leads to a significant measurement overhead. Indeed, our general solution in Theorem 2 can be applied to Clifford recovery operations $R(\pi/2)$ and $R(\pi)$ by substituting the (suboptimal) angles $A = \pi/2$, $B = \pi$ and $\theta = \Delta/2$. Given that in our approach we choose $A$ and $B$ optimally, our scheme achieves many orders of magnitude lower measurement overhead in typical practical scenarios, as illustrated in Fig. 2 (green vs. red lines).

V. DISCUSSION AND CONCLUSION
We present PAI which effectively upgrades the capabilities of a quantum hardware that can only realise discrete rotation angles to a device that can perform arbitrary, continuous rotation angles.We achieve this by randomly choosing one of three possible rotation angles in each parametrised gate such that on average the exact, desired unitary rotations are performed.The limitation of only being able to perform a discrete set of rotation angles manifests itself in an increased number of circuit repetitions when measuring expected values of observables.
We upper bound this measurement overhead and conclude it is negligible even for circuits consisting of hundreds or a few thousand parametrised gates at a resolution as low as B = 7 bits.Apart from the slightly increased number of circuit repetitions, the present approach requires no additional quantum resources.PAI can be compared with a number of well-established prior techniques.
Cooperative optimum control: It was proposed in [44] that shaped pulses $R_k$ that implement a desired rotation need not be individually accurate but rather need only satisfy the relaxed condition that the average of a series of pulse variants is accurate over many repeated measurement rounds. The present approach is indeed quite comparable; however, we allow for the additional freedom that different gate/circuit variants are weighted. For example, in a single notched gate implementation we have three gate variants in Eq. (3) but the third one, $\mathcal{R}(\Theta_k+\pi)$, is only rarely applied due to its small probability $O(\Delta^2)$. We expect that the present techniques will enable new developments in exploiting pulse-optimisation techniques for near-term applications [45,46].
Quantum error mitigation: These techniques [19] can be combined with the present approach seamlessly in order to mitigate experimental imperfections of the gates $\mathcal{R}_1$, $\mathcal{R}_2$ and $\mathcal{R}_3$ in Eq. (2), as we detail in Appendix I. The measurement overhead of PAI from Section II D then increases to $e^{\nu(4\epsilon+\Delta_{\max}^2/4)}$; thus the resolution $\Delta_{\max}$ should be engineered in accordance with the error rates $\epsilon$ of the physical gates via $4\epsilon \approx \Delta_{\max}^2/4$. Specifically, for first generations of devices ($10^{-3} \le \epsilon \le 10^{-2}$) $B = 5$ bits of precision may suffice, while one needs $B = 6$ to $7$ bits of precision for NISQ devices in the early practical quantum advantage regime ($10^{-4} \le \epsilon \le 10^{-3}$). Furthermore, our approach leverages a similar quasiprobability decomposition to PEC [19][20][21], and we can thus take advantage of a rich literature to, e.g., use light-cone arguments to significantly reduce sampling costs [27,28,47], while PAI can also be immediately combined with classical shadows via [27,48].
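The matching condition $4\epsilon \approx \Delta_{\max}^2/4$ can be turned into a quick estimate of the resolution at which discretisation and gate errors contribute equally to the overhead. The sketch below assumes uniform notches, $\Delta = 2\pi/2^B$, and returns a continuous value of $B$ that would be rounded to an integer in practice; `matched_bits` and `combined_overhead` are hypothetical helper names.

```python
import math

def matched_bits(eps):
    """Resolution B (as a real number) at which 4*eps = Delta^2 / 4."""
    delta = 4 * math.sqrt(eps)
    return math.log2(2 * math.pi / delta)

def combined_overhead(num_gates, eps, bits):
    """Overhead exp(nu * (4*eps + Delta^2/4)) of PAI with error mitigation."""
    delta = 2 * math.pi / 2**bits
    return math.exp(num_gates * (4 * eps + delta**2 / 4))

for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps = {eps:.0e}: matched B = {matched_bits(eps):.2f}")
print("overhead for 100 gates, eps = 1e-3, B = 6:",
      combined_overhead(100, 1e-3, 6))
```

Increasing $B$ beyond the matched value yields diminishing returns, because the term $4\epsilon$ in the exponent then dominates the overhead.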
Circuit knitting: Related quasiprobability decompositions have been used for replacing two-qubit entangling gates with classically post-processed random implementations of single-qubit operations [49,50], which was termed circuit knitting. The approach has a cost $O(9^{\nu}/\epsilon^2)$ for implementing $\nu$ two-qubit gates and is thus limited to only implementing a few, e.g., 3-4, quantum gates, whereas the present protocol works well in the regime of tens of thousands of gates assuming resolutions in the range of 7 to 9 bits.
To conclude, we expect the present technique will be an important and useful tool in designing optimal quantum hardware: Our analysis suggests that first generation quantum hardware, being practically limited to only a few thousand gate operations (e.g., due to limited coherence times), will need no more than 7 bits of resolution in the control systems.As the technology matures, future generations of hardware are expected to be able to execute tens of thousands of quantum gates without error correction which still, however, requires no more than 9 bits of angular resolution.

B.K. thanks the University of Oxford for a Glasstone Research Fellowship and Lady Margaret Hall, Oxford for a Research Fellowship. The numerical modelling involved in this study made use of the Quantum Exact Simulation Toolkit (QuEST), and the recent development QuESTlink [51] which permits the user to use Mathematica as the integrated front end, and pyQuEST [52] which allows access to QuEST from Python. We are grateful to those who have contributed to all of these valuable tools. The authors would like to acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility [53] in carrying out this work and specifically the facilities made available from the EPSRC QCS Hub grant (agreement No. EP/T001062/1). The authors also acknowledge funding from the EPSRC projects Robust and Reliable Quantum Computing (RoaRQ, EP/W032635/1) and Software Enabling Early Quantum Advantage (SEEQA, EP/Y004655/1).

Appendix A: Single rotation gate
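As the gates form one-parameter groups, any overrotated quantum gate can be written as
$$\mathcal{R}(\Theta_k+\theta) = \mathcal{R}(\Theta_k)\,\mathcal{R}(\theta), \tag{A1}$$
where $0 \le \theta \le \Delta$ is a small, arbitrary overrotation. Due to this property, we need only expand the rotation $\mathcal{R}(\theta)$ into a linear combination of rotations at different angles.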
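Focusing on Pauli gates of the form $e^{-i\theta P_k/2}$ for any Pauli string $P_k \in \{\mathrm{Id}, X, Y, Z\}^{\otimes N}$ in an $N$-qubit system, ref. [54] showed that any Pauli rotation can be decomposed as a linear combination of the same gate at different rotation angles. We also note that any gate generator with eigenvalues $\pm 1$ is covered by our formalism, e.g., our results similarly apply to parametrised SWAP gates. The decomposition follows as
$$\mathcal{R}(\theta) = a(\theta)\,\mathcal{R}(0) + b(\theta)\,\mathcal{R}(\pi/2) + c(\theta)\,\mathcal{R}(\pi), \qquad a(\theta) = \frac{1+\cos\theta-\sin\theta}{2}, \quad b(\theta) = \sin\theta, \quad c(\theta) = \frac{1-\cos\theta-\sin\theta}{2}. \tag{A2}$$
We use the above decomposition to obtain the quasiprobabilities $\gamma_l$ in Eq. (3). In particular, by combining Eq. (A2) with Eq. (3) we obtain the following system of equations
$$\gamma_1 + a(\Delta)\,\gamma_2 = a(\theta), \qquad b(\Delta)\,\gamma_2 = b(\theta), \qquad c(\Delta)\,\gamma_2 + \gamma_3 = c(\theta).$$
The above system of equations is solved by the set of coefficients $\gamma_1$, $\gamma_2$ and $\gamma_3$ as
$$\gamma_1(\theta) = \frac{1}{2}\Big[1+\cos\theta - \sin\theta\,\cot\frac{\Delta}{2}\Big], \qquad \gamma_2(\theta) = \frac{\sin\theta}{\sin\Delta}, \qquad \gamma_3(\theta) = \frac{1}{2}\Big[1-\cos\theta - \sin\theta\,\tan\frac{\Delta}{2}\Big]. \tag{A4}$$
We can also analytically compute the vector norm as
$$\|\gamma(\theta)\|_1 = \gamma_1 + \gamma_2 - \gamma_3 = \frac{\cos(\theta-\Delta/2)}{\cos(\Delta/2)}.$$
Finally, we verify that the estimator of the channel in Statement 1 is unbiased by substituting the probability $p_l(\theta) = |\gamma_l(\theta)|/\|\gamma(\theta)\|_1$ as
$$\mathbb{E}[\hat{\mathcal{R}}(\Theta_k+\theta)] = \sum_{l=1}^{3} p_l(\theta)\,\|\gamma(\theta)\|_1\,\mathrm{sign}[\gamma_l(\theta)]\,\mathcal{R}_l = \sum_{l=1}^{3}\gamma_l(\theta)\,\mathcal{R}_l = \mathcal{R}(\Theta_k+\theta),$$
where the last equation follows from Eq. (3).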

Appendix B: Asymptotic expansions
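We compute asymptotic expressions for small $\Delta$ by introducing the ratio $\lambda = \theta/\Delta$ as
$$\gamma_1 = (1-\lambda) + O(\Delta^2), \qquad \gamma_2 = \lambda + O(\Delta^2), \qquad \gamma_3 = -\tfrac{1}{4}\lambda(1-\lambda)\Delta^2 + O(\Delta^4). \tag{B1}$$
Thus the probabilities can be expanded as
$$p_1 = (1-\lambda) + O(\Delta^2), \qquad p_2 = \lambda + O(\Delta^2), \qquad p_3 = \tfrac{1}{4}\lambda(1-\lambda)\Delta^2 + O(\Delta^4).$$
Similarly, later we will make use of the squared norm $\|\gamma(\theta)\|_1^2$ and its expansion into leading terms as
$$\|\gamma(\theta)\|_1^2 = 1 + \lambda(1-\lambda)\Delta^2 + O(\Delta^4). \tag{B3}$$
We can also generally upper bound this norm by its worst-case value at $\lambda = 1/2$ and substitute the above expansion as
$$\|\gamma(\theta)\|_1^2 \le \|\gamma(\Delta/2)\|_1^2 = 1 + \frac{\Delta^2}{4} + O(\Delta^4).$$

[Figure caption, beginning missing] ... 3 showing how the root mean square (RMS) deviation decreases as we increase the number of shots. (blue) Standard shot-noise limit using the device of infinite resolution; (black) sample RMS deviation: the ideal device's standard deviation is slightly increased by PAI, as black is slightly above blue; (red) our theoretical worst-case bound based on the variance of PAI is well above the sample RMS deviation.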
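The leading-order expressions of this appendix can be checked numerically against a closed form of the coefficients. The sketch below assumes the closed form obtained by solving the linear conditions on the relative angles $0$, $\Delta$, $\pi$, and scans $\lambda$ on a grid to locate the worst case; all variable and function names are hypothetical.

```python
import math

def pai_coefficients(theta, delta):
    """Single-gate quasiprobabilities (closed form assumed for this sketch)."""
    g1 = 0.5 * (1 + math.cos(theta) - math.sin(theta) / math.tan(delta / 2))
    g2 = math.sin(theta) / math.sin(delta)
    g3 = 0.5 * (1 - math.cos(theta) - math.sin(theta) * math.tan(delta / 2))
    return g1, g2, g3

def l1_norm(lam, delta):
    return sum(abs(g) for g in pai_coefficients(lam * delta, delta))

delta = 2 * math.pi / 2**7
lam = 0.3
gammas = pai_coefficients(lam * delta, delta)
norm = l1_norm(lam, delta)
p1, p2, p3 = (abs(g) / norm for g in gammas)

# Leading-order predictions for the squared norm and the third probability.
norm_sq_leading = 1 + lam * (1 - lam) * delta**2
p3_leading = 0.25 * lam * (1 - lam) * delta**2

# Locate the worst case of the norm on a grid of lambda values in [0, 1].
grid = [i / 100 for i in range(101)]
worst_lam = max(grid, key=lambda x: l1_norm(x, delta))

print("p1, p2, p3:", p1, p2, p3)
print("squared norm:", norm**2, "leading order:", norm_sq_leading)
print("worst-case lambda on grid:", worst_lam)
```

The deviations from the leading-order values shrink rapidly as $B$ grows, and the maximum of the norm sits at the midpoint between two notches, in line with the worst-case bound used for the variance.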
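of positive operators $E_b$, called the effects. The simplest case is that of perfect projective measurements via the effects $E_b = |b\rangle\langle b|$, where $b \in \{0,1\}^N$ are bitstrings. The effects can be more general positive operators, which is useful when modelling measurement errors. The expected value of an arbitrary observable $O$ is then estimated via the estimator $x = \mathrm{Tr}[OE_b]$ and the expectation value is
$$\mathbb{E}[x] = \sum_b q_b\,\mathrm{Tr}[OE_b].$$
Generally, the probability of observing one of the outcomes in an arbitrary state $\rho$ is $q_b = \mathrm{Tr}[\rho E_b]$. Indeed, for the case of projective measurements in the eigenbasis of $O$ one obtains the usual
$$\mathbb{E}[x] = \sum_b \mathrm{Tr}[\rho E_b]\,\mathrm{Tr}[OE_b] = \mathrm{Tr}[O\rho],$$
where we used the equality $\sum_b \mathrm{Tr}[OE_b]\,E_b = O$.
Let us now assume that the state is prepared via the circuit as $\mathcal{U}_{\mathrm{circ}}|0\rangle\langle 0|$ and our aim is to estimate the expected value $\mathrm{Tr}[O\,\mathcal{U}_{\mathrm{circ}}|0\rangle\langle 0|]$. As described in Statement 2, we randomly choose and run the circuit variants $\mathcal{U}_l$ and we multiply the individual outcomes with the relevant prefactor $\|g\|_1\,\mathrm{sign}(g_l)$, thereby estimating $\|g\|_1\,\mathrm{sign}(g_l)\,\mathrm{Tr}[O\,\mathcal{U}_l|0\rangle\langle 0|]$. Formally, this results in the estimator
$$\hat{o} = \|g\|_1\,\mathrm{sign}(g_l)\,\mathrm{Tr}[OE_b], \tag{C3}$$
in which we multiply the individual outcomes $\mathrm{Tr}[OE_b]$ with the relevant prefactors. The probability of observing an outcome is $q_{l,b} = p_l\,\mathrm{Tr}[E_b\,\mathcal{U}_l|0\rangle\langle 0|]$, where $p_l$ is the probability of choosing the circuit variant $\mathcal{U}_l$. One can indeed verify that we obtain the correct expected value as
$$\mathbb{E}[\hat{o}] = \sum_{l,b} q_{l,b}\,\|g\|_1\,\mathrm{sign}(g_l)\,\mathrm{Tr}[OE_b] = \sum_l g_l\,\mathrm{Tr}[O\,\mathcal{U}_l|0\rangle\langle 0|] = \mathrm{Tr}[O\,\mathcal{U}_{\mathrm{circ}}|0\rangle\langle 0|],$$
where we used that $\|g\|_1\,p_l = |g_l|$ and substituted that $\sum_l g_l\,\mathcal{U}_l = \mathcal{U}_{\mathrm{circ}}$. Indeed we obtain the expected value $o = \mathrm{Tr}[O\,\mathcal{U}_{\mathrm{circ}}|0\rangle\langle 0|]$.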

Variance of estimators
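We consider the variance of the estimator of the quantum-mechanical expected value in Eq. (C3) for any observable $O$. Recall that we calculate the variance as $\mathrm{Var}[\hat{o}] = \mathbb{E}[\hat{o}^2] - \mathbb{E}[\hat{o}]^2 \le \mathbb{E}[\hat{o}^2]$, and since our estimator is $\hat{o} = \|g\|_1 \mathrm{sign}(g_l) \mathrm{Tr}[O E_b]$ we obtain

$$\mathrm{Var}[\hat{o}] \le \sum_{b,l} q_{l,b} \, \|g\|_1^2 \, \mathrm{Tr}[O E_b]^2 \le \|g\|_1^2 \, \|O\|_\infty^2.$$

Above we have used that $\mathrm{Tr}[O E_b]^2 \le \|O\|_\infty^2$ and that the probability distribution satisfies $\sum_{b,l} q_{l,b} = 1$.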
After repeating the single-shot procedure $N_s$ times and calculating the mean of the individual outcomes, $\mathrm{mean}(\hat{o}_1, \hat{o}_2, \dots, \hat{o}_{N_s})$, the variance $\epsilon^2$ of the empirical mean scales inversely with $N_s$. Thus, the sample complexity to achieve accuracy $\epsilon$ is $N_s \le \epsilon^{-2} \|g\|_1^2 \|O\|_\infty^2$.
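As a small illustration of this scaling, the sketch below evaluates the number of shots implied by the variance bound $\|g\|_1^2 \|O\|_\infty^2$ divided by the target variance $\epsilon^2$. The helper name and the example numbers are hypothetical.

```python
import math

def shots_required(g_norm, obs_norm, eps):
    """Shots sufficient for standard error eps, given the bound Var <= (g_norm * obs_norm)^2."""
    return math.ceil((g_norm * obs_norm / eps) ** 2)

# Example: the same target accuracy with and without a modest PAI overhead.
ideal = shots_required(1.0, 1.0, 0.1)   # continuous angles, ||g||_1 = 1
pai = shots_required(1.1, 1.0, 0.1)     # assumed ||g||_1 = 1.1, i.e. ~21% more shots
print(ideal, pai)
```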

Asymptotic expansion of the variance
Without loss of generality we assume that $\|O\|_\infty^2 = 1$. We can further expand the variance by recalling that the coefficient $\|g\|_1^2$ factorises into a product form in Eq. (C2) as $\|g\|_1^2 = \prod_{j=1}^{\nu} \|\gamma^{(j)}(\theta_j)\|_1^2$. We generally upper bound the norms via Eq. (B3) as $\|\gamma^{(j)}(\theta_j)\|_1^2 \le \|\gamma^{(j)}(\Delta_j/2)\|_1^2$ using the worst-case, maximal value, which is attained exactly halfway between two notch settings at $\theta_j = \Delta_j/2$. We can thus generally upper bound the coefficients and expand the bound for small notch discretisations $\Delta_j \le \Delta_{\max}$ as

$$\|g\|_1^2 \le \prod_{j=1}^{\nu} \|\gamma^{(j)}(\Delta_j/2)\|_1^2 \approx e^{\nu \Delta_{\max}^2/4},$$

where $\Delta_{\max}$ is the largest of the notch discretisations $\Delta_j$. By substituting the explicit form of the discretisations $\Delta_j$ from Eq. (1) we obtain the exponent $\pi^2 \nu 2^{-2B_{\min}}$. Thus, for the variance to remain bounded as $\mathrm{Var}[\hat{o}] \le e^{\pi^2/4} \le 12$ with $B_{\min}$ bits, the number of gates $\nu$ needs to satisfy $\nu \le 2^{2(B_{\min}-1)}$.
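The relation between the gate count and the number of bits can be tabulated directly. This is a minimal sketch of the worst-case estimate only, assuming the notch spacing $\Delta = 2\pi/2^B$ implied by the exponent $\pi^2 \nu 2^{-2B_{\min}}$; the function names are illustrative.

```python
import math

def notch_spacing(bits):
    """Assumed uniform notch spacing Delta = 2*pi / 2^bits."""
    return 2 * math.pi / 2 ** bits

def worst_case_overhead(num_gates, bits):
    """Leading-order worst-case bound exp(nu * Delta^2 / 4)."""
    return math.exp(num_gates * notch_spacing(bits) ** 2 / 4)

def max_gates(bits):
    """Largest gate count for which the bound stays at or below exp(pi^2 / 4)."""
    return 2 ** (2 * (bits - 1))

for bits in (5, 6, 7, 8, 9):
    nu = max_gates(bits)
    print(f"B = {bits}: nu_max = {nu}, bound = {worst_case_overhead(nu, bits):.2f}")
```

Each additional bit of resolution quadruples the admissible number of parametrised gates at fixed overhead.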
The above upper bound is attained only in the worst-case scenario when all rotation angles are exactly halfway between two notch settings, i.e., $\lambda_j = 1/2$. We can instead approximate the variance upper bound by expanding the vector norms using Eq. (B2), given that the discretisations are small in practice, $\Delta_j \ll 1$, as

$$\mathrm{Var}[\hat{o}] \lesssim e^{\bar{\lambda} \nu \Delta_{\max}^2/4}.$$

Above we have introduced the notation for the average deviation of the rotation angles from the nearest notch settings as

$$\bar{\lambda} = \frac{4}{\nu} \sum_{j=1}^{\nu} \lambda_j (1 - \lambda_j),$$

which has the property $0 \le \bar{\lambda} \le 1$. We thus find that the minimum number of bits $B_{\min}$ in the discretisation needs to satisfy $\nu \le \bar{\lambda}^{-1} 2^{2(B_{\min}-1)}$.
Indeed, in the worst case $\bar{\lambda} = 1$, but when no rotation angle deviates from the nearest notch setting by more than 25% of the notch spacing then $\bar{\lambda} \le 0.75$ and thus the maximal number of gates is increased inversely proportionally.
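The effect of the average deviation on the admissible gate count can be sketched as follows. The definition used here, $\bar{\lambda} = (4/\nu) \sum_j \lambda_j (1 - \lambda_j)$ with $\lambda_j$ the fractional position of each angle between its two neighbouring notch settings, is an assumption chosen to reproduce the two values quoted above (1 in the worst case, 0.75 at a 25% deviation); the function names are hypothetical.

```python
import math

def average_deviation(thetas, bits):
    """Assumed average deviation (4/nu) * sum_j lam_j * (1 - lam_j)."""
    delta = 2 * math.pi / 2 ** bits
    total = 0.0
    for theta in thetas:
        lam = (theta % delta) / delta      # fractional position between notch settings
        total += 4 * lam * (1 - lam)
    return total / len(thetas)

def max_gates_refined(thetas, bits):
    """Refined gate-count estimate 2^(2(B-1)) / average deviation."""
    lam_bar = average_deviation(thetas, bits)
    return math.inf if lam_bar == 0 else 2 ** (2 * (bits - 1)) / lam_bar

bits = 7
delta = 2 * math.pi / 2 ** bits
halfway = [(k + 0.5) * delta for k in range(10)]    # worst case
quarter = [(k + 0.25) * delta for k in range(10)]   # 25% away from a notch setting
print(average_deviation(halfway, bits), max_gates_refined(halfway, bits))
print(average_deviation(quarter, bits), max_gates_refined(quarter, bits))
```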
evolution circuit as in Fig. 3(left). At B = 7 bits of precision the fidelity drops by 15% when all ν = 1786 gates are executed using the approximate approach. In contrast, our PAI protocol yields perfect fidelity (the mean value matches the ideal one in Fig. 3(left)), but the width of the probability distribution of measurement outcomes is increased by a comparable factor, i.e., by about 20%. However, this increase in distribution width can be trivially overcome by repeating the measurement procedure a slightly increased number of times. In contrast, a 15% infidelity might imply a comparable error when measuring expected values, which would be prohibitive in practice. If one nevertheless prefers to use the approximate approach, then one can apply error mitigation techniques to mitigate the drop in the fidelity. These generally come at the cost of an increased sampling [19] and may thus yield a worse performance than PAI. A straightforward approach would be to use the present approximate scheme at B and B−1 bits of precision, estimate expected values and extrapolate to the case of ∆ → 0. While the individual expected values require no increased sampling, the shot noise in the extrapolated estimate is increased and thus zero-noise extrapolation overall requires an increased sampling.

Appendix F: Optimality of the solution
It is clear from Eq. (A2) that the minimal number of rotation gates needed is three, as Eq. (A2) has three degrees of freedom, namely, 1 ± cos(θ) and sin(θ). While we indeed use 3 rotation gates, namely the two nearest notch settings and a polar opposite rotation, we discuss in Appendix H that using more than 3 notch settings is suboptimal.
We now prove that our choice of 3 notch settings is optimal in the sense that any other choice of 3 rotation angles yields a solution with a higher ∥γ∥ 1 norm -which norm is of paramount importance for us to minimise.
Theorem 2. Consider a target rotation gate $R(\theta)$ given as the quasiprobability decomposition (cf. Eq. (3))

$$R(\theta) = \gamma_1 R_1 + \gamma_2 R_2 + \gamma_3 R_3,$$

where the three rotation gates are chosen from any set of (possibly non-uniform) discrete notch settings that a machine can realise, $R_1, R_2, R_3 \in \{R(\Theta_q)\}$. Let us denote by $\Theta_k$ and $\Theta_{k+1}$ the two nearest notch settings to $\theta$ and define the distance $\Delta := \Theta_{k+1} - \Theta_k$. The optimal choice of rotation gates that minimises $\|\gamma\|_1$ is

$$R_1 = R(\Theta_k), \quad R_2 = R(\Theta_{k+1}), \quad R_3 = R(B),$$

where the angle $B$ is the available rotation angle nearest to $\Theta_k + \pi + \Delta/2$. In the present work we consider a uniform discretisation of $\Theta_k$, thus there are two equivalent choices, $B = \Theta_k + \pi$ or $B = \Theta_{k+1} + \pi$, as we detail in Appendix H.
Proof. As explained in Eq. (A1), without loss of generality the notch setting can be defined as $\Theta_k := 0$. We then have two degrees of freedom to optimise, namely the rotation angles $A$ and $B$ of the second and third gates, $R_2 := R(A)$ and $R_3 := R(B)$, whereby the angles satisfy $0 < \theta < A < B < 2\pi$ by definition. The relevant quasiprobabilities $\gamma$ satisfy a system of three linear equations, which can be solved explicitly for the coefficients $\gamma_1$, $\gamma_2$ and $\gamma_3$. We now minimise the vector norm $\|\gamma\|_1$ as a function of the rotation angles $A$ and $B$.

Optimising the rotation angle A: Computing the derivative of the norm $\|\gamma\|_1$ with respect to $A$, we find that $\partial\|\gamma\|_1/\partial A > 0$ in the relevant region where $0 < \theta < A < B$, confirming that the norm increases monotonically as we increase $A$. Thus we can monotonically decrease the norm by decreasing $A$, ultimately approaching the limit $\lim_{A \to \theta} \|\gamma\|_1 = 1$. In order to minimise $\|\gamma\|_1$ we therefore need to choose $A$ as small as possible, and our best option is the nearest notch setting $A = \Theta_{k+1}$.

Optimising the rotation angle B: We now find the minimum of the vector norm with respect to the rotation angle $B$. To this end we calculate the derivative of the norm with respect to $B$ and solve the trigonometric equation $\partial\|\gamma\|_1/\partial B = 0$. The resulting product of trigonometric functions is zero only when $\sin(A/2 - B) = 0$, which is uniquely solved by $B = \pi + A/2$ in the interval $A < B \le 2\pi$ and indeed corresponds to the unique minimum along $B$.
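Both steps of the proof can be checked numerically. The sketch below is an independent illustration rather than the paper's derivation: it assumes the linear system is obtained by matching the components $1$, $\cos$ and $\sin$ of the rotations at angles $0$, $A$ and $B$ (equivalent to matching $1 \pm \cos$ and $\sin$), solves it by Cramer's rule, and scans $B$ on a grid. The values of $\Delta$ and $\theta$ are arbitrary choices.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def coefficients(theta, angles):
    """Solve sum_i g_i R(angles[i]) = R(theta) for three angles via Cramer's rule."""
    rows = [[1.0, 1.0, 1.0],
            [math.cos(a) for a in angles],
            [math.sin(a) for a in angles]]
    rhs = [1.0, math.cos(theta), math.sin(theta)]
    d = det3(rows)
    solution = []
    for k in range(3):
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][k] = rhs[i]          # replace column k by the right-hand side
        solution.append(det3(m) / d)
    return solution

def l1_norm(theta, a, b):
    """l1 norm of the coefficients for the angles (0, a, b)."""
    return sum(abs(g) for g in coefficients(theta, (0.0, a, b)))

delta = 2 * math.pi / 2 ** 5
theta = 0.3 * delta
a = delta                              # nearest notch setting above theta
grid = [a + (2 * math.pi - a) * k / 2000 for k in range(1, 2000)]
best_b = min(grid, key=lambda b: l1_norm(theta, a, b))
print("numerical optimum B :", best_b)
print("predicted pi + A/2  :", math.pi + a / 2)
print("norm at A = Delta   :", l1_norm(theta, delta, math.pi))
print("norm at A = 2 Delta :", l1_norm(theta, 2 * delta, math.pi))
```

The grid minimum should sit at $B \approx \pi + A/2$, and moving $A$ away from the nearest notch setting should increase the norm.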
However, one can straightforwardly show that the above solution is actually equivalent to Eq. (A4) up to the symmetry transformation $\theta \to \Delta - \theta$ and a relabelling of the gates, which yields the coefficient vector $\vec{\gamma}'$. This indeed confirms that we are free to choose whether we define $\theta$ as an overrotation of one of the two nearest notch settings or as an underrotation of the other nearest notch setting.
Furthermore, both $\vec{\gamma}$ and $\vec{\gamma}'$ are solutions to our problem with identical $\ell_1$ norm. Since linear combinations of solutions are similarly solutions, there are infinitely many solutions that use the four rotation angles $\Theta_k$, $\Theta_{k+1}$, $\Theta_k + \pi$ and $\Theta_{k+1} + \pi$; however, it is straightforward to check that any admissible linear combination of these solutions has the same $\ell_1$ norm, which is identical to Eq. (A5). We have verified these expectations: we set up the system of 3 equations in Eq. (A2) but with four unknowns $\gamma_1$, $\gamma_2$, $\gamma_3$ and $\gamma_4$, and analytically solved for the unknowns. Indeed there are infinitely many solutions, but the $\ell_1$ norm of all solutions is equivalent to Eq. (A5). Thus, in order to reduce engineering complexity, we strongly prefer our optimal solution that uses only 3 notch settings.
One could of course use more than $n = 3$ or $n = 4$ notch settings, which again results in an underdetermined system of equations, as Eq. (A2) would contain only three equations for the three degrees of freedom, namely, $1 \pm \cos(\theta)$ and $\sin(\theta)$, while the system would contain $n$ unknowns, namely $\gamma_1, \dots, \gamma_n$. Our best strategy is then to choose solutions that have minimal $\ell_1$ norms. In fact it is a well-established principle in the theory of underdetermined systems that the solutions with minimal $\ell_1$ norm are almost always the sparsest possible solutions. We confirmed numerically that indeed the solutions with least $\ell_1$ norm are always 3-sparse and correspond to the 3 rotation gates that we chose in the present work.
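A small numerical check of this statement is sketched below; it is not the authors' numerical procedure. It relies on the standard linear-programming fact that a minimum-$\ell_1$ solution of a system with three equality constraints can be found among solutions with at most three non-zero entries, so an exhaustive search over all triples of notch settings suffices. The resolution of 4 bits, the target angle and the helper names are illustrative assumptions.

```python
import itertools
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def triple_norm(angles, theta):
    """l1 norm of the coefficients solving sum_i g_i R(angles[i]) = R(theta)."""
    rows = [[1.0, 1.0, 1.0],
            [math.cos(a) for a in angles],
            [math.sin(a) for a in angles]]
    rhs = [1.0, math.cos(theta), math.sin(theta)]
    d = det3(rows)
    total = 0.0
    for k in range(3):
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][k] = rhs[i]          # Cramer's rule: replace column k
        total += abs(det3(m) / d)
    return total

bits = 4
n = 2 ** bits
delta = 2 * math.pi / n
notches = [q * delta for q in range(n)]
theta = 0.3 * delta                    # target between notch 0 and notch 1

best = min(itertools.combinations(range(n), 3),
           key=lambda t: triple_norm([notches[q] for q in t], theta))
best_norm = triple_norm([notches[q] for q in best], theta)
print("best notch indices:", best)
print("minimal l1 norm   :", best_norm)
```

In this example the search should return the two nearest settings (indices 0 and 1) together with one of the two polar opposite settings (index 8 or 9), which have equal norm up to rounding.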

Appendix I: Combining with quantum error mitigation
Error mitigation techniques are straightforwardly compatible with PAI. Given these techniques are completely decoupled from the present approach, error mitigation can be implemented seamlessly "on top" of PAI. As error mitigation yields measurement overheads that generally grow exponentially with both the number ν of gates and with the per-gate error rates ϵ, we detail that the discretisation ∆ should be engineered in accordance with the gate errors ϵ.

Common error mitigation techniques
In the present work we focused on ideal, unitary gates $R_1$, $R_2$ and $R_3$ that are used to implement an arbitrary rotation angle $R(\theta)$ in expectation via Eq. (2). In a real experiment, however, these gates are noisy, which we reflect via the tilde notation $\tilde{R}_j$, and as we now detail, nearly all existing error mitigation techniques can straightforwardly be combined with the present approach.
First, one can use efficient schemes, such as sparse Pauli-Lindblad or learning-based QEM techniques [19,24,55], to learn the noise model of the local gates $\tilde{R}_j$; one then implements standard Probabilistic Error Cancellation (PEC) techniques [19-21] to effectively obtain the noise-free unitary gate via the quasiprobability decomposition $R_j = \sum_q \Gamma_{jq} \tilde{U}_q$. Here $\tilde{U}_q$ are native noisy gate operations supported by the quantum hardware, e.g., noisy rotations $\tilde{R}_j$ followed by recovery operations [19]. This PEC approach introduces a measurement overhead due to the increased variance of observables, and thus each noisy gate variant $\tilde{R}_j$ has a measurement overhead governed by $\|\Gamma_j\|_1$, associated with the cost of mitigating incoherent errors. Thus, the measurement overhead $\|\gamma(\theta)\|_1^2$ of PAI for implementing the desired noise-free, continuous-angle rotation $R(\theta)$ is increased in the worst case to the product $\|\gamma(\theta)\|_1^2 \|\Gamma_{\max}\|_1^2$, where we denote by $\|\Gamma_{\max}\|_1 = \max_j \|\Gamma_j\|_1$ the largest norm due to gate noise.
Another commonly used error mitigation technique is zero-noise extrapolation [19]. In this approach one increases the noise level $\epsilon$ of the gates $\tilde{R}_j(\epsilon)$ (for example via the previously learned noise models [55]) to measure expectation values at increasing error rates; one then extrapolates the expected values to zero error to effectively obtain the expectation value one would measure having access to the noise-free gates $\tilde{R}_j(\epsilon \to 0)$. Finally, purification-based techniques can also be applied straightforwardly [19,56]: these techniques are oblivious to the error models of the gates, and by preparing $n$ copies of the noisy quantum state one can guarantee that on average the deviation from the desired ideal gates $R_j$ is effectively exponentially small in $n$.

Measurement overhead
The measurement overhead of the above error-mitigation techniques generally scales exponentially with the circuit error rate $\xi = \nu\epsilon$, which is the per-gate error rate $\epsilon$ multiplied by the number $\nu$ of gates. For example, the measurement overhead of PEC grows as $e^{4\nu\epsilon}$, as derived in [19] for a common error model. As we proved in Appendix C 3, the overhead associated with PAI grows as $e^{\nu\Delta_{\max}^2/4}$ and thus the total measurement overhead grows as the product $e^{4\nu\epsilon} e^{\nu\Delta_{\max}^2/4} = e^{\nu(4\epsilon + \Delta_{\max}^2/4)}$.
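The two contributions to the total overhead can be compared with a short calculation. The sketch below simply evaluates the two exponentials quoted above, assuming a uniform notch spacing $\Delta_{\max} = 2\pi/2^B$; the function names and the example parameters are illustrative.

```python
import math

def pec_overhead(num_gates, eps):
    """Overhead exp(4 * nu * eps) quoted for PEC under a common error model."""
    return math.exp(4 * num_gates * eps)

def pai_overhead(num_gates, bits):
    """Worst-case PAI overhead exp(nu * Delta^2 / 4) with Delta = 2*pi / 2^bits."""
    delta = 2 * math.pi / 2 ** bits
    return math.exp(num_gates * delta ** 2 / 4)

def total_overhead(num_gates, eps, bits):
    """Product of the two overheads."""
    return pec_overhead(num_gates, eps) * pai_overhead(num_gates, bits)

nu, eps = 1000, 1e-3                    # example circuit size and error rate
for bits in (5, 6, 7, 8):
    print(f"B = {bits}: PEC = {pec_overhead(nu, eps):.1f}, "
          f"PAI = {pai_overhead(nu, bits):.2f}, "
          f"total = {total_overhead(nu, eps, bits):.1f}")
```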
The above dependence on the number $\nu$ of gates motivates us to choose a discretisation $\Delta_{\max}$ that ensures the condition $\Delta_{\max}^2/4 \approx 4\epsilon$. In particular, choosing too many bits results in a fine discretisation with $\Delta_{\max}^2/4 \ll 4\epsilon$, which risks overengineering the quantum device, i.e., one could simply increase $\Delta_{\max}$ without significantly increasing the measurement overhead, which is dominated by the overhead associated with error mitigation. In contrast, choosing too few bits, $\Delta_{\max}^2/4 \gg 4\epsilon$, leads to the measurement overhead being dominated by the implementation of PAI.
Specifically, first generations of devices with error rates $10^{-3} \le \epsilon \le 10^{-2}$ can in principle be engineered with as few as $B = 5$ to $6$ bits of precision. In contrast, achieving practical quantum advantage will likely require per-gate error rates $10^{-4} \le \epsilon \le 10^{-3}$, for which $B = 6$ bits of precision may already be sufficient, while $B = 7$ (and $\epsilon \approx 10^{-4}$) would enable useful applications of early quantum computers.
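As a rough guide, the balance condition $\Delta_{\max}^2/4 \approx 4\epsilon$ can be solved for the number of bits. The sketch below assumes a uniform discretisation $\Delta_{\max} = 2\pi/2^B$ and returns a real-valued $B$; since the condition is only approximate, the result indicates an order of magnitude rather than a sharp requirement. The function name is hypothetical.

```python
import math

def balanced_bits(eps):
    """Real-valued B at which Delta^2 / 4 = 4 * eps, assuming Delta = 2*pi / 2^B."""
    delta = 4 * math.sqrt(eps)             # from Delta^2 / 4 = 4 * eps
    return math.log2(2 * math.pi / delta)

for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps = {eps:g}: balanced B = {balanced_bits(eps):.2f}")
```

Reducing the per-gate error rate by a factor of ten shifts the balanced resolution by about 1.7 bits, since $B$ depends on $\epsilon$ only through $\log_2(1/\sqrt{\epsilon})$.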

D. Estimating expected values

Typical near-term and early fault-tolerant quantum algorithms use quantum computers for estimating expected values $o = \mathrm{Tr}[O U_{\mathrm{circ}}|0\rangle\langle 0|]$ of an observable $O$ [9-11].

FIG. 2. The measurement cost of PAI is increased compared to the case when one has access to continuous rotation angles, see Eq. (6). (solid lines) Worst-case measurement overhead $\|g\|_1^2$ of PAI as a function of the number of parametrised gates in the quantum circuit. The number of gates one can reasonably (with an overhead of at most 12) implement with PAI is approximately $2^{2(B_{\min}-1)}$, where $B_{\min}$ is the number of bits used to digitise the rotation angle in Fig. 1. As these estimates rely on worst-case bounds, we observe in numerical simulations that the actual number of gates can be significantly larger. (red vs. green lines) Our optimal scheme can achieve many orders of magnitude smaller overheads than prior techniques based on Clifford operations [26]. (yellow region) Very deep circuits will require quantum error correction and thus even for late-NISQ devices no more than 9 bits will be necessary.
FIG. 4. (left) A single layer of the ansatz structure used in our simulation, shown for 6 qubits. (right) Same experiment as in Fig. 3, showing how the root mean square (RMS) deviation decreases as we increase the number of shots. (blue) Standard shot noise limit using the device of infinite resolution; (black) sample RMS deviation: the ideal device's standard deviation is slightly increased by PAI, as black is slightly above blue; (red) our theoretical worst-case bound based on the variance of PAI is well above the sample RMS deviation.
FIG. 5. The same simulation of a time-evolution circuit as in Fig. 3(left) but using our approximate scheme from Appendix E, whereby we only choose from the two nearest notch settings. This approach does not introduce a measurement overhead; however, the average gate implemented is non-unitary. The simulation shows that as we increase the number of gates to which we apply this approximate approach, the fidelity of the output state drops exponentially. In comparison, at B = 7 bits of precision and for ν = 1786 parametrised gates we observe a 20% increase in the distribution width in Fig. 3(left), whereas here we observe a 15% drop in the fidelity of the output state. We show extrapolated values for gate counts larger than ν = 1786.
FIG. 3. (left) Distribution of estimated expected values ⟨Z0⟩ using 1000 circuit repetitions (shots) in a deep, 12-qubit Trotter circuit of l = 50 layers that consists of ν = 1786 parametrised gates. (left, red) Using the nearest notch settings at 7 bits of resolution results in a shifted mean (black vertical line) due to over/under rotations. The experimentally estimated histogram (grey) of PAI at 7 bits of resolution is centred around the same mean as the ideal one (blue), which assumes infinite resolution, but its distribution width is slightly increased. (right) Energy distance ∆E from the ground state as a function of VQE iterations during a gradient descent search of a 12-qubit spin-ring problem (energy shown assuming infinite resolution to inform about the optimiser's progress). A relatively deep circuit of ν = 540 parametrised gates is used. Gradient estimation is performed with (blue) infinite rotation-angle resolution and an infinite number of shots; (red) using only the nearest notch settings at 7 bits of resolution and $10^6$ shots; (green) using PAI with $10^4$ shots at only 100 different circuit configurations. PAI (green) significantly outperforms the naive approach (red) despite using the same amount of quantum resources, and essentially recovers the performance of the ideal optimiser (blue). Additionally shown is the energy (dashed grey) at the notch settings nearest to the ground-state parameters.