Modeling the Performance of Early Fault-Tolerant Quantum Algorithms

Progress in fault-tolerant quantum computation (FTQC) has driven the pursuit of practical applications with early fault-tolerant quantum computers (EFTQC). These devices, limited in their qubit counts and fault-tolerance capabilities, require algorithms that can accommodate some degree of error, known as EFTQC algorithms. To predict the onset of early quantum advantage, a comprehensive methodology is needed to develop and analyze EFTQC algorithms, drawing insights from the methodologies of both noisy intermediate-scale quantum (NISQ) computing and traditional FTQC. To address this need, we propose such a methodology for modeling algorithm performance on EFTQC devices under varying degrees of error. As a case study, we apply our methodology to analyze the performance of Randomized Fourier Estimation (RFE), an EFTQC algorithm for phase estimation. We investigate the runtime performance and the fault-tolerant overhead of RFE in comparison to the traditional quantum phase estimation algorithm. Our analysis reveals that RFE achieves significant savings in physical qubit counts while having a much higher runtime upper bound. We anticipate even greater physical qubit savings when considering more realistic assumptions about the performance of EFTQC devices. By providing insights into the performance trade-offs and resource requirements of EFTQC algorithms, our work contributes to the development of practical and efficient quantum computing solutions on the path to quantum advantage.


I. Introduction
Despite significant experimental and theoretical progress [2,3], noisy intermediate-scale quantum (NISQ) devices have yet to exhibit the capacity to solve practical real-world problems with valuable outcomes. A promising avenue towards achieving practical quantum advantage lies in the development of architectures that can support large-scale fault-tolerant quantum computation (FTQC) [4]. By incorporating robust fault-tolerance capabilities, we can suppress errors in our computations to an arbitrary extent. However, this comes at the cost of resources that exceed the capabilities of present-day devices by several orders of magnitude. Projections indicate that millions to billions of physical qubits would be required to outperform classical computers in tasks such as factoring and ground state energy estimation [5-7].
There exists a substantial discrepancy between the capabilities of today's quantum devices and the projected resource requirements for practical large-scale fault-tolerant architectures. This discrepancy motivates the question: how will the intermediate generation of devices, positioned between NISQ and FTQC, deliver practical advantage? Such devices have recently been referred to as early fault-tolerant quantum computers (EFTQC) [8,9]. Notably, EFTQC devices would possess a limited number of physical qubits, thus imposing constraints on the distance of the error-correcting codes they can support. These constraints deviate from the conventional assumptions of fault-tolerant quantum computing, where such resources are presumed to be infinitely scalable.
A recent thrust in the field of quantum computing has been the development of EFTQC algorithms tailored to address the above limitations of EFTQC devices [1, 8-21]. So far, two key considerations are central to the pursuit of practical value using EFTQC algorithms. The first is developing quantum algorithms that reduce the number of qubits and operations per circuit, often at the expense of an increased number of circuit runs and, consequently, an extended runtime [8,10,12,13,17,18]. The second is designing quantum algorithms that are robust against gate and measurement errors [1,10,11,20,21]. These recent advancements showcase the potential applications of EFTQC devices, further motivating our previously posed question. An essential next step is to develop methodologies for assessing the performance of these algorithms, enabling a deeper understanding of how intermediate devices between NISQ and FTQC can be leveraged to attain practical value.
In this work, we conduct what is, to our best knowledge, the first analytical case study that links logical circuit error models to the performance of an EFTQC algorithm, the Randomized Fourier Estimation (RFE) algorithm [1]. Specifically, we develop such a methodology to achieve the following:
• A proof that quantifies the performance of the RFE algorithm as it interpolates between using a single oracle call per circuit (suited for the high-noise NISQ setting) and using many oracle calls per circuit (suited for the low-noise FTQC setting) (see §V A).
• A numerical demonstration of the suitability of this algorithm for EFTQC devices, in that it can reduce by an order of magnitude the number of physical qubits required for large instances of phase estimation, at the cost of an increase in runtime by several orders of magnitude (see §V B).
To establish these results, we introduce a modularized methodology designed to assess the impact of generic circuit-level errors on a broad class of quantum algorithms. As a case study, we demonstrate the effectiveness of our methodology on the RFE algorithm, which shares a similar structure with other EFTQC-suited algorithms, such as those used for ground state energy estimation [12,14,15,18] and property estimation [13]. The key feature common to these algorithms is the utilization of signals derived from Hadamard test outcomes, which makes our methodology applicable and adaptable to various cases within the EFTQC framework.
Our objective is to gain a comprehensive understanding of how errors impact the RFE algorithm. By doing so, we aim to leverage this understanding and extend it to algorithms designed for tasks beyond phase estimation. While previous studies have investigated the resilience of algorithms for quantum phase estimation [22], we choose to analyze the RFE algorithm for two key reasons. Firstly, while sharing a similar structure with many of the above-mentioned EFTQC algorithms, the RFE algorithm is analytically tractable, making it an ideal candidate for our study. Secondly, its ability to smoothly interpolate between the high- and low-noise settings makes it well-suited for the EFTQC regime.
The paper is organized as follows. We begin in §II by introducing the RFE algorithm and our modifications to it for the purpose of this study. In §III, we outline the framework of our methodology, which we apply to analyze the runtime performance of the RFE algorithm. Specifically, in §IV, we develop a chain of noise models from physical errors to algorithmic errors, and we study the impact of these errors on the performance of RFE in §V A. Based on these results, we provide a fault-tolerant resource estimation, comparing RFE to the traditional quantum phase estimation algorithm in §V B. Finally, in §VI, we conclude by highlighting our key results and providing an outlook for this work.

II. Randomized Fourier Estimation (RFE)
In this section, we introduce a variant of the RFE algorithm proposed in Ref. [1]. The RFE algorithm solves the task of phase estimation, in which the goal is to estimate the phase angle θ defined by U|ψ⟩ = e^{iθ}|ψ⟩, assuming the ability to prepare the eigenstate |ψ⟩ and to implement controlled unitaries c-U. The traditional approach to this problem is the quantum phase estimation (QPE) algorithm [23], which achieves the asymptotically optimal performance. However, the realization of the QPE algorithm requires multiple ancillary qubits and many high-fidelity quantum operations, both of which may be prohibitively costly given hardware constraints in the early fault-tolerant regime.
Given these constraints, alternative schemes for phase estimation have been proposed for near-term to early fault-tolerant devices [12,14,24,25]. In our study, we focus on the RFE algorithm [1], whose performance can be analytically studied. Specifically, we will analyze a variant of the original RFE algorithm with slight modifications that simplify our analysis, as introduced later in this section. As the modifications do not fundamentally change the mechanism of the algorithm, we will henceforth refer to the modified version as "RFE" throughout the paper.
FIG. 1. Diagram of the circuits used in the Randomized Fourier Estimation (RFE) algorithm. The parameter k is chosen uniformly at random from {0, . . ., K − 1} for each iteration. A key feature of the algorithm is that the parameter K controls the maximal circuit depth and is set to accommodate different degrees of error in the c-U operation: high error implies small K (low depth) and many repetitions, while low error warrants the use of large K (high depth) and fewer repetitions. The boxed elements in blue can be collectively interpreted as a measurement with respect to the observable σ_ϕ = cos(ϕ)σ_x − sin(ϕ)σ_y, where σ_x and σ_y are the conventional Pauli operators and S(ϕ) = diag(1, exp(iϕ)).
The basic intuition behind this algorithm is summarized as follows. Each measurement outcome z = ±1 is generated from a sampled parameterized Hadamard test circuit in Fig. 1. To estimate the θ encoded in the frequency of the signal g(k) = exp(ikθ), we construct from each outcome z an unbiased estimate f̂_j of the expected discrete Fourier transform f_j of the signal. In the Fourier domain j ∈ {0, . . ., J − 1}, the estimate is expressed as

f̂_j(k, ϕ, z) = 2z exp(−iϕ) exp(−i2πjk/J).   (1)

Given enough samples, we expect the magnitude of the averaged f̂_j to peak at a frequency close to θ. In the noiseless case, this peak occurs within the Fourier resolution 2π/J of the true θ, where we later set the parameter J such that the Fourier resolution matches the desired accuracy ϵ of the algorithm. This peak frequency is then used as an estimate of θ, denoted θ̂. Fig. 2(a) shows the real and imaginary components of the signal g(k) as a function of k. The magnitude of the Fourier transformed signal |f_j| is then plotted in Fig. 2(b) as a function of j, where the peak occurs at an index near θJ/2π, corresponding to the true frequency θ.
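For concreteness, the following is a minimal Python sketch of this noiseless sampling-and-peak-finding procedure, using the estimator form reconstructed in Eq. (1) and the outcome distribution of Eq. (2); the parameter values θ, J, K, and M are arbitrary choices for the demonstration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 1.234    # true phase (unknown to the algorithm; fixed here for the demo)
J, K, M = 256, 64, 5000

# Classical randomness: circuit depth k and measurement angle phi, per shot.
k = rng.integers(0, K, size=M)
phi = rng.uniform(0.0, 2.0 * np.pi, size=M)

# Hadamard-test outcomes z = +/-1 with P(z = +1 | k, phi) = (1 + cos(k*theta + phi)) / 2.
z = np.where(rng.uniform(size=M) < 0.5 * (1.0 + np.cos(k * theta + phi)), 1.0, -1.0)

# Single-shot Fourier estimates f_j = 2 z e^{-i phi} e^{-i 2 pi j k / J}, averaged over shots.
j = np.arange(J)
f_bar = np.mean(
    (2.0 * z * np.exp(-1j * phi))[:, None]
    * np.exp(-2.0j * np.pi * np.outer(k, j) / J),
    axis=0,
)

# Estimate theta from the peak of |f_bar|, as in Eq. (4).
theta_hat = 2.0 * np.pi * np.argmax(np.abs(f_bar)) / J
print(theta_hat, abs(theta_hat - theta))
```

With the illustrative values above, the printed error is at the scale of the Fourier resolution 2π/J.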
The algorithm that we introduce here differs from that of Ref. [1] in two regards. First, rather than taking samples corresponding to the real and imaginary parts of g(k) separately by setting ϕ = 0 or π/2, ϕ is chosen uniformly at random such that we can construct an unbiased estimator for g(k) with a single shot. This simplifies the analysis of the algorithm without any change in its performance.
Second, the more substantial change is introduced to accommodate a more realistic noise model, as developed in §IV. This noise model results in an exponential attenuation (as a function of k) of the outcome probabilities, converging towards a uniform two-outcome distribution at large k, similar to that of an unbiased coin toss. By appropriately setting the maximal value of k, i.e., K, we can minimize the impact of this attenuation on our estimate of θ from the signal.
In Ref. [1], the parameter K was used to set both the maximum value of k and the Fourier resolution 2π/K. This can be an issue when the attenuation is strong and high accuracy is required: measurement outcomes for large k are uninformative because they are drawn from a nearly uniform distribution. To address this issue, we allow these two values to differ; K still labels the maximum value of k, while a new parameter J is used to set the Fourier resolution. The high-accuracy, high-noise case is then accommodated by setting J large and K small.
FIG. 3. Framework of our methodology.

We now elaborate on why our algorithm works in the noiseless limit, which will provide intuition for its performance in the noisy case. One can calculate the probability of measuring the outcome z in the Hadamard circuit of Fig. 1 as

P(z|k, ϕ) = (1/2)[1 + z cos(kθ + ϕ)],   (2)

which is an oscillatory function of k with frequency θ. The two classically sampled variables are drawn uniformly: P(k) = 1/K and P(ϕ) = 1/2π, where the former distribution is discrete while the latter is continuous. Using these distributions, the expected value of our constructed estimator (1) is calculated to be

f_j := E[f̂_j] = (1/K) Σ_{k=0}^{K−1} a^k = (1/K) (1 − a^K)/(1 − a),   (3)

where a := e^{i(θ−2πj/J)}. For j ∈ R^+, the magnitude of the expectation (3) is maximized at j = θJ/2π. For j ∈ Z^+, i.e. in a discrete Fourier transform setting, and assuming K ≤ J, the expectation achieves its maximum magnitude at ⌊θJ/2π⌋ or ⌈θJ/2π⌉ (or precisely at θJ/2π if θJ/2π ∈ {0, . . ., J − 1}). Consequently, by setting J = 2π/ϵ we can guarantee that the maximum of the expected discrete Fourier transform occurs at a frequency that is less than ϵ away from the true θ. The maximal circuit depth K will be chosen according to the target accuracy ϵ and a parameter λ (introduced in Eq. (12)) that characterizes the error strength in the c-U operation. Qualitatively, K is chosen to increase monotonically with 1/ϵ and 1/λ, as described in further detail in §V A.
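For the reader's convenience, the expectation computation behind Eq. (3) can be written out step by step; this is our own spelling-out of the standard calculation, under the estimator and outcome distribution reconstructed in Eqs. (1) and (2):

```latex
\begin{align*}
f_j &= \mathbb{E}_{k,\phi,z}\big[\hat{f}_j(k,\phi,z)\big]
     = \frac{1}{K}\sum_{k=0}^{K-1} \frac{1}{2\pi}\int_0^{2\pi}
       \underbrace{\mathbb{E}[z \mid k,\phi]}_{=\,\cos(k\theta+\phi)}\;
       2\,e^{-i\phi}\, e^{-i 2\pi j k/J}\, d\phi \\
    &= \frac{1}{K}\sum_{k=0}^{K-1} e^{i k\theta}\, e^{-i 2\pi j k/J}
     = \frac{1}{K}\sum_{k=0}^{K-1} a^k
     = \frac{1}{K}\,\frac{1-a^K}{1-a},
     \qquad a := e^{i(\theta-2\pi j/J)} .
\end{align*}
```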
Our algorithm estimates the value of θ using an average over multiple Fourier signal estimates (1). With M estimates f̂_j^{(i)} generated independently, their average f̄_j = (1/M) Σ_{i=1}^{M} f̂_j^{(i)} will concentrate about the expected value f_j := E[f̂_j(k, ϕ, z)] (3), and a corresponding estimate of θ is given by

θ̂ = (2π/J) argmax_j |f̄_j|.   (4)

This concentration will be addressed quantitatively in §V A. The algorithm succeeds, i.e. yields an estimate such that |θ̂ − θ| ≤ ϵ, if one of the "adjacent" frequencies (⌊θJ/2π⌋ or ⌈θJ/2π⌉) achieves the largest magnitude among all Fourier estimates f̄_j. In expectation, one of these "adjacent" frequencies (shown as the two green dots in Fig. 2(b)) achieves the largest magnitude, with a finite gap between it and the magnitudes of the non-adjacent frequencies (points that fall outside of the green shaded region). Hence, with sufficiently many samples M, the probability of failure of the algorithm can be made smaller than any finite failure probability δ.

III. Framework
In this section we outline our proposed methodology for connecting the logical error model of an arbitrary quantum circuit to the success probability of a quantum algorithm, using the RFE algorithm introduced in the last section as a special-case example.
Our methodology, depicted in the flowchart of Fig. 3, can be summarized as follows. We begin by modeling the effect of error at the algorithmic level, building from the physical- and logical-level error models proposed in §IV. Specifically, in §IV A, we establish a fault-tolerant overhead model that relates the physical error rate p_phys to the logical error rate p_logical for a surface code of distance d. We then propose a generic N-qubit logical Pauli error channel in §IV B and statistically quantify its impact on an N-qubit quantum circuit of depth D. To achieve this, we assume that a random N-qubit Pauli error occurs after each layer of unitaries in our circuit, such that the resulting state is a mixture of random states drawn from a unitary 2-design. By computing the expected value and variance of measurement outcomes based on this probability distribution, we gain insight into their statistical properties, which allows us to develop an algorithmic noise model in §IV C.
In §V, we investigate the performance of RFE under the algorithmic noise model proposed in §IV C, namely, the exponential decay noise. In §V A, we give an upper bound on the algorithm runtime, and analyze its scaling with respect to the desired accuracy ϵ in the presence of various strengths of decay λ. Finally, in §V B, we provide a resource estimate for the RFE algorithm as compared to the standard QPE algorithm, based on the fault-tolerant overhead model proposed in §IV A.

IV. Noise Models

A. Fault-tolerant overhead model
We begin by establishing the connection between physical and logical error rates. Currently, physical error rates range from 10^{−3} to 10^{−4}, which is too large for reliable implementations of the RFE algorithm. To overcome this, we analyze the performance of our algorithm using lower logical error rates achievable through fault-tolerant computational protocols. Implementation errors at the circuit (logical) level arise from approximations in the operations (e.g., gate synthesis) and uncorrected errors in the fault-tolerant protocols. The failure probabilities in both of these cases can be systematically reduced by paying a cost in the number of physical operations and, therefore, a cost in runtime. In our analysis, we assume the operations to be exact, focusing solely on the uncorrected errors in fault-tolerant protocols as the source of implementation error.
In order to quantify the reduction in error rates from the physical to the logical level, we adopt the model proposed in Ref. [7]. This reduction in error rate comes at the cost of an increase in time and number of physical qubits; equivalently, these resources can be thought of as convertible into error rate reduction. The quality of this conversion is governed by the ability of the particular architecture to maintain low physical gate error rates at scale. A model for this conversion as a function of resource overhead d is expressed as

p_logical(d) = A exp(−Bd),   (5)

where the parameters A and B depend on the physical error rates [7]. The typical values for these parameters in the case of high (moderate) physical error rates are A = 0.5 (0.4) and B = 1.6 (1.1) [7]. The overhead parameter d can be thought of as the code distance in the context of a surface code [26]. In this model, the physical qubit overhead is approximately 2d^2. In our subsequent analysis, we approximate the logical error model as a composition of single-qubit depolarizing errors (13), where p_logical (5) represents the depolarizing rate. Based on these relationships, we can estimate the optimal performance of the RFE algorithm for given architecture parameters A and B, as we will elaborate in §V A.
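To illustrate how the overhead model is used downstream, here is a small Python sketch that inverts the exponential form reconstructed in Eq. (5) to find the code distance and physical qubit count needed for a target logical error rate; both the exponential form and the target value below are assumptions for the demonstration.

```python
import math

def logical_error_rate(d: int, A: float = 0.5, B: float = 1.6) -> float:
    """Logical error rate model p_logical(d) = A * exp(-B * d), as in Eq. (5)."""
    return A * math.exp(-B * d)

def required_distance(p_target: float, A: float = 0.5, B: float = 1.6) -> int:
    """Smallest code distance d with p_logical(d) <= p_target."""
    return max(1, math.ceil(math.log(A / p_target) / B))

def physical_qubits(d: int, n_logical: int) -> int:
    """Approximate physical qubit count, using the ~2*d^2 overhead per logical qubit."""
    return n_logical * 2 * d * d

d = required_distance(1e-10)            # illustrative target logical error rate
print(d, physical_qubits(d, n_logical=100))
```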

B. Logical Gate Error Model
We consider a generic N-qubit Pauli error channel in order to study the impact of logical errors on the RFE algorithm. A generic N-qubit Pauli error channel acting on an N-qubit density matrix ρ can be expressed by the following Kraus decomposition

𝒩(ρ) = Σ_j p_j A_j ρ A_j^†,   (6)

where A_j ∈ {I, X, Y, Z}^{⊗N} and p_j is the probability of error A_j occurring. We note that here we have assumed the most generic Pauli error channel for our logical errors, following the convention of most error-correction literature; this channel can later be specialized into different desirable error channels based on the choice of the set {p_j}.
Given a quantum circuit that implements an ideal unitary U = U_D · · · U_1 in D layers, we model its noisy implementation by applying the error channel (6) after each layer, where 𝒰_i(ρ) := U_i ρ U_i^† is the superoperator describing the action of the unitary U_i on the density operator ρ. The overall noisy circuit is illustrated in Fig. 4. The outcome state ρ_f upon applying this noisy circuit to the initial state ρ_i is given by

ρ_f = (𝒩 ∘ 𝒰_D) ∘ · · · ∘ (𝒩 ∘ 𝒰_1)(ρ_i).   (7)

After obtaining the noisy output quantum state (7), we now focus on the statistical properties of the final measurement with respect to an N-qubit Pauli observable. Here we introduce the notion of a "trajectory state," defined as

|ψ_j⟩ = A_{j_D} U_D · · · A_{j_1} U_1 |ψ_i⟩,   (8)

where ρ_i = |ψ_i⟩⟨ψ_i| and j is the index tuple comprising j_1, . . ., j_D. Each of these trajectory states |ψ_j⟩ corresponds to one combination of errors among the 4^{ND} possibilities, occurring with probability p_j = p_{j_1} · · · p_{j_D}. Expressing the expectation of an observable P with respect to the state ρ_f in Eq. (7) in terms of these trajectory states, we get

⟨P⟩ = Σ_j p_j ⟨ψ_j|P|ψ_j⟩,   (9)

whose mean and variance are given in Eqs. (10) and (11), respectively. By our assumption, each of the noisy trajectories shares the same mean. Consequently, as an increasing number of trajectories are averaged over to form the second term in Eq. (10), their collective average converges towards this mean. A detailed calculation of the above quantities is provided in Appendix §A. We would also like to point out that the unitary 2-design model used here to establish Eqs. (10) and (11) assumes the noisy trajectory states to be drawn uniformly at random from all possible states, which is a strong, worst-case assumption about the errors.
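The trajectory picture can be exercised numerically. Below is a toy Monte Carlo sketch (our own construction, not from the paper) for N = 2 qubits, with Haar-random layers standing in for a scrambling circuit: each trajectory applies a random Pauli error with probability r per qubit per layer, and the trajectory average is compared against the prediction that the mean equals the noiseless expectation damped by the no-error probability (1 − r)^{ND}.

```python
import numpy as np

rng = np.random.default_rng(1)

# Single-qubit Pauli operators.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def haar_unitary(dim: int) -> np.ndarray:
    """Haar-random unitary via QR decomposition of a complex Gaussian matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(g)
    return q * (np.diag(r) / np.abs(np.diag(r)))

N, D, r = 2, 8, 0.05                      # qubits, layers, per-qubit error probability
layers = [haar_unitary(2 ** N) for _ in range(D)]
P = np.kron(Z, I2)                        # observable: Pauli Z on the first qubit

def trajectory(noisy: bool) -> float:
    """One trajectory state: unitary layers with randomly interleaved Pauli errors."""
    psi = np.zeros(2 ** N, dtype=complex)
    psi[0] = 1.0
    for U in layers:
        psi = U @ psi
        if noisy:
            for q in range(N):
                if rng.uniform() < r:     # error drawn uniformly from {X, Y, Z} on qubit q
                    E = PAULIS[rng.integers(1, 4)]
                    psi = (np.kron(E, I2) if q == 0 else np.kron(I2, E)) @ psi
    return (psi.conj() @ P @ psi).real

ideal = trajectory(noisy=False)
noisy_mean = np.mean([trajectory(noisy=True) for _ in range(4000)])
print((1 - r) ** (N * D) * ideal, noisy_mean)   # ~equal if error branches average to ~0
```

The two printed numbers roughly agree when the errored branches average toward zero, which is precisely the 2-design intuition invoked in the text.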
An improved version of the model would likely benefit from system-specific knowledge of the noise in early fault-tolerant devices and the corresponding distributions of noisy trajectory states, which warrants further investigation.
Lastly, we briefly mention the potential impact of state preparation errors on our analysis. Small amounts of error in the initial state will not significantly affect the results of our paper. However, larger amounts of error can have a more pronounced effect. For example, if the initial state is not perfectly prepared in the ground state, then the peak in Fig. 2 will be suppressed in height and additional peaks may appear, each with a height proportional to the overlap between the imperfect initial state |ψ̃⟩ and other eigenstates |ψ⟩ of U, i.e. |⟨ψ̃|ψ⟩|² < 1. To avoid over-complicating the analysis, we choose to omit state preparation errors from our noise modeling. We further note that a recent study [19] investigates the performance, under imperfect state preparation, of an EFTQC phase estimation algorithm similar in spirit to RFE. Combining our work with the analysis of Ref. [19] would provide additional insights into the interplay between different error sources in the early fault-tolerant regime.

C. Logical Gate Error Model to Algorithmic Noise Model
Based on the statistics of ⟨P⟩ from Eqs. (10) and (11), we propose an algorithmic noise model that captures the effect of noise on our algorithm as a combination of exponential decay and random fluctuations. The exponential decay term stems from the p_0 term in the mean of ⟨P⟩, whereas the random fluctuation is related to the variance of ⟨P⟩. For the RFE algorithm, the impact of error is to alter the probability (2) of the measurement outcome z as

P(z|k, ϕ) = (1/2)[1 + z(e^{−λk} cos(kθ + ϕ) + η_{k,ϕ})],   (12)

where we introduce η_{k,ϕ} to represent a noise bias, i.e. random fluctuations, and λ to parameterize an exponential attenuation of the outcome probability with respect to circuit depth k. Letting η_{k,ϕ} and λ vary arbitrarily, this "algorithmic" noise model is completely general, and the algorithm would clearly not succeed in all settings. The key to establishing a reasonable algorithm performance guarantee is to limit, at least statistically, the magnitude of η_{k,ϕ} under different strengths of λ, as we elaborate later in this subsection.
We now analyze how RFE works in a noisy setting under our proposed algorithmic noise model (12). Given, for example, a single-qubit depolarizing error channel

𝒟(ρ) = (1 − r)ρ + (r/3)(XρX + YρY + ZρZ),   (13)

the N-qubit composite error channel is given by a special case of the N-qubit Pauli error channel from Eq. (6). Upon applying the operation c-U, where U is assumed to have D layers, k times in the Hadamard test circuit, the probability of the signal remaining noiseless at the end is

p_total = (1 − r)^{NDk},   (14)

which decreases as a function of the number of c-U repetitions k.
From Eq. (10), we learn that the expected value of the quantum expectation value ⟨P⟩ decays with the total probability p_total after k repetitions. This corresponds to a decay in the signal, i.e.,

E[g(k)] = (1 − r)^{NDk} e^{ikθ} = e^{−λk} e^{ikθ},   (15)

which fixes the exponential decay parameter λ introduced in Eq. (12) to be λ = −ND ln(1 − r). Similarly, we can substitute r from Eq. (13) into the variance expression Var[⟨P⟩] from Eq. (11) to obtain Eq. (16). We note that Var[⟨P⟩] here establishes the expected deviation β_j on our signal f_j in our algorithmic error model, to be discussed next.
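As a quick numerical aid, the mapping from the depolarizing rate to the algorithmic decay parameter can be coded directly from λ = −ND ln(1 − r); the parameter values below are illustrative only.

```python
import numpy as np

def decay_rate(r: float, n_qubits: int, depth: int) -> float:
    """Algorithmic decay parameter lambda = -N * D * ln(1 - r), per c-U repetition."""
    return -n_qubits * depth * np.log1p(-r)

lam = decay_rate(r=1e-8, n_qubits=100, depth=1000)   # illustrative EFTQC-scale values
k = np.arange(64)
envelope = np.exp(-lam * k)                          # e^{-lambda k} attenuation of g(k)
print(lam, envelope[-1])
```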
Under the effects of random fluctuations and exponential decay, we arrive at a noisy expression for the expected f̂_j(k, ϕ, z), similar to the calculation in Eq. (3):

f_j = (1/K) Σ_{k=0}^{K−1} e^{−λk} a^k + β_j,   (18)

where β_j is the expected deviation of our signal, related to the quantity Var[⟨P⟩]. We delay the detailed discussion of the variance to Appendix §A 2. There, we numerically show that the standard deviation of ⟨P⟩, σ_{⟨P⟩} := √(Var[⟨P⟩]), is negligible compared to the signal magnitude ∼1 for the parameter settings of interest in the EFTQC regime. This means that the variance term drops out of our algorithmic error model (12), leaving exponential decay of the ideal signal as the sole source of error to be considered in the remaining analysis. This result indicates that our approach of modeling the noisy trajectory states with a unitary 2-design is consistent with a global depolarizing noise, the effect of which has been thoroughly studied in many EFTQC algorithms [11,20,25]. Hence, we set the β_j term to 0 for the rest of the discussion.
The effect of exponential decay on the expected signal from Eq. (18) is shown as a function of decay strength λ in Fig. 5. We note that as λ increases, the signal g(k) decays exponentially as a function of circuit depth k, which results in a flatter spectrum in the expected Fourier signal amplitude |f_j|. This is problematic because the RFE algorithm relies on distinguishing the largest peak in the signal amplitude as the predicted phase index. A flattened spectrum means that the amplitude contrast between neighboring peaks decreases, and as a result, more measurements are needed to overcome shot noise and discern the highest-amplitude point in the spectrum.
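This flattening can be reproduced directly from the decayed expected signal of Eq. (18) (with β_j set to 0, as above); a short sketch with illustrative parameters:

```python
import numpy as np

theta, J, K = 1.234, 256, 64
j = np.arange(J)
a = np.exp(1j * (theta - 2.0 * np.pi * j / J))       # a = e^{i(theta - 2 pi j / J)}
k = np.arange(K)[:, None]

# Expected Fourier signal f_j = (1/K) sum_k e^{-lambda k} a^k for several decay strengths.
for lam in (0.0, 0.02, 0.1):
    f = np.mean(np.exp(-lam * k) * a[None, :] ** k, axis=0)
    print(f"lambda={lam:4.2f}  peak={np.abs(f).max():.3f}  median={np.median(np.abs(f)):.3f}")
```

As λ grows, the printed peak-to-median contrast shrinks, mirroring the flattening described above.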

V. Results

A. Link to Algorithm Performance Analysis
The goal of this section is to establish the connection between the algorithmic noise model proposed in the previous section and the algorithm performance guarantee. As a reminder, the algorithm is said to succeed when the estimate θ̂ is within ϵ of θ; see Fig. 2(b). Our analysis determines an upper bound on the number of samples needed to ensure success with probability greater than 1 − δ.
The estimate θ̂ is calculated from the discrete frequency point 2πj/J corresponding to the highest amplitude |f̄_j| (as in Eq. (4)). As a reminder, what constitutes a successful estimate of θ depends on whether θ coincides with one of the values 2πj/J or not. In the case when θ falls onto one of the discrete frequency points, there are a total of three values of j leading to successful estimates, i.e., j = Jθ/2π and Jθ/2π ± 1. In the case when θ falls between two points in the discrete frequency spectrum, as shown in Fig. 5(b), there are a total of two neighboring points that constitute successful guesses, namely ⌊Jθ/2π⌋ and ⌈Jθ/2π⌉.
Let j* indicate the index of the "best estimate," that is, the integer closest to Jθ/2π. Any index j that is more than 1 away from Jθ/2π will not correspond to an ϵ-accurate estimate according to our criteria for success; we refer to such estimates as "bad estimates." Hence, to guarantee that the algorithm succeeds, it suffices that |f̄_j|² < |f̄_{j*}|² for all j with |j − Jθ/2π| > 1. In other words, this condition ensures that no bad estimate has the largest |f̄_j|, so that the algorithm picks one of the ϵ-accurate estimates. Note that this is not a necessary condition; the algorithm could succeed even if |f̄_{j*}|² were smaller than that of one of the bad estimates, as long as another ϵ-accurate estimate had a magnitude larger than the rest.
The above success condition can be violated in the presence of exponential decay noise, the effect of which is shown in Fig. 5(b). The noise reduces the contrast between neighboring peaks, which can invalidate our success condition if not enough measurement samples are collected to overcome the uncertainty due to shot noise. One way to counteract this deleterious effect is to perform extra measurements. Thus, the performance of our algorithm can be quantified by establishing an upper bound on the number of measurements needed to guarantee a certain success probability, for various noise strengths and desired accuracies. We point interested readers to Appendix §B for a proof of the algorithm performance bound.
We now present the main result of our algorithm performance analysis. For target accuracy ϵ and exponential decay parameter λ, a success probability greater than 1 − δ can be ensured by using M samples satisfying

M ≥ (2J/δ) W(K, J, λ),   (19)

where W(K, J, λ) is a complicated function whose explicit form is described in Eq. (B51) of Appendix §B, and K is chosen as a function of ϵ and λ as described below.

FIG. 6. Runtime (measured in the total number of calls to the c-U operation) as a function of the inverse target accuracy 1/ϵ in the presence of exponential decay errors of strength λ = 0.1, 0.01, 0.001, 0.0001, and 0.00001. As λ decreases, the runtime transitions from a 1/ϵ² scaling to a 1/ϵ scaling. The upper and lower dotted lines show the O(1/ϵ²) and O(1/ϵ) scaling, respectively.

The runtime upper bound, in units of c-U operations, is plotted against 1/ϵ for various values of λ in Fig. 6. In the low-noise regime, the runtime scales as 1/ϵ, resembling that of the original QPE algorithm [23]. In the high-noise regime, the runtime scales as 1/ϵ². For a moderate amount of noise, the runtime performance interpolates between the low- and high-noise behavior. Due to the limited coherence time imposed by the exponential decay, we choose the maximal circuit depth K based on ϵ and λ, given by

K(ϵ, λ) = max{c ⌊1/(c(aλ + ϵb/2π))⌋, 2},

where a, b, c ∈ R^+. We set K to be tunable based on λ and ϵ such that the runtime of the algorithm interpolates between the Heisenberg-limit scaling (i.e. O(1/ϵ)) in the regime Jλ ≪ 1 and the shot-noise-limit scaling (i.e. O(1/ϵ²)) in the regime Jλ ≫ 1. For the specific functional form of K, we employ the floor function and set c = 10 to bypass values of K between 3 and 9, which were numerically found to be suboptimal. The parameters a = 2 and b = 1.5 were then chosen within this form to roughly minimize the runtime upper bound over a range of values of J and λ. We note that since this functional form of K is empirically derived to minimize the runtime upper bound, a more rigorous treatment for developing an optimal strategy of choosing K remains for future work.
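The depth rule K(ϵ, λ) is straightforward to implement; below is a small Python transcription of the stated rule with the quoted constants a = 2, b = 1.5, c = 10 (the loop values are illustrative).

```python
import math

def max_depth_K(eps: float, lam: float, a: float = 2.0, b: float = 1.5, c: int = 10) -> int:
    """K(eps, lam) = max{ c * floor(1 / (c * (a*lam + eps*b / (2*pi)))), 2 }."""
    x = a * lam + eps * b / (2.0 * math.pi)
    return max(c * math.floor(1.0 / (c * x)), 2)

# From high noise (K pinned at 2) to low noise (K approaching ~J/b):
for lam in (1e-1, 1e-3, 1e-5):
    print(lam, max_depth_K(eps=1e-3, lam=lam))
```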
Our analysis of the algorithm in Appendix §B differs from that of Ref. [1] in two regards: 1) we separate the roles of K and J, enabling high-accuracy estimates in the high-noise setting, and 2) we take into consideration the correlation between the values of neighboring f̂_j, enabling a reduction in sample complexity in the high-noise (low-K) setting. These two features enable us to establish a unifying expression that captures the performance of the algorithm in a wide variety of scenarios, ranging from (effectively) a Bernoulli estimation approach (i.e. K = 2) to a Heisenberg-limited phase estimation approach (i.e. K = O(1/ϵ)). A signature of the switch in the algorithm's strategy from the Bernoulli to the Heisenberg approach, to accommodate different degrees of error, is marked by the cusp in the runtime upper bound of RFE in Fig. 7.
Finally, we address the issue of determining the exponential decay parameter λ. Our algorithm relies on some knowledge of λ when setting the appropriate K(ϵ, λ). This raises the question of whether it is necessary to determine λ accurately and, if not, to what extent a discrepancy between the presumed and actual values of λ would compromise the algorithm's performance. The precise level of accuracy required to maintain the derived runtime remains an open question that we defer to future investigation. Nonetheless, intuitively, we anticipate that overestimating λ will result in using more samples than necessary, while underestimating λ will lead to using fewer samples than required, thus slightly increasing the probability of failure. We further highlight that several established benchmarking techniques, such as randomized benchmarking [28] and cross-entropy benchmarking [29], can be utilized to estimate essential noise parameters like the depolarizing error rate. These estimates can then be applied to determine the value of λ using Eq. (16) in our methodology.

B. FT Overhead Estimation Comparison
As an application of the performance bound established in Eq. (19), in this section we provide a cost analysis of implementing the RFE algorithm, accounting for the error correction overhead. Specifically, we compare the overall runtime (measured in units of QEC cycles) of the traditional QPE algorithm versus the RFE algorithm as a function of the number of physical qubits required, which relates to the surface code distance d as described in Eq. (5).
In Fig. 7, we show this comparison for the case of a logical c-U acting on 100 qubits with a unitary circuit depth of 1000, aiming for an accuracy of 0.1% and a failure probability below 1%. One notable characteristic of the traditional QPE algorithm is that there is a minimal number of physical qubits/code distance below which the desired algorithm success probability cannot be achieved (refer to the detailed derivation in Appendix §C). Therefore, in order to implement the traditional QPE algorithm reliably, a certain code distance or number of physical qubits must be attained. This is, however, not required by the RFE algorithm: in principle, a higher runtime cost can always be paid in exchange for fewer physical qubits or a lower code distance. This is particularly valuable in the EFTQC regime, where devices have a limited number of physical qubits, as depicted schematically in the green region of Fig. 7.

FIG. 7. Comparison of runtime upper bounds (in units of error correction cycles) between RFE (red) and the traditional QPE algorithm (blue) as a function of the number of physical qubits available. These upper bounds are proved in Appendix Sections B and C, respectively. While the upper bound on the runtime of RFE may be higher than that of the standard approach to QPE, RFE can be run using an order-of-magnitude fewer physical qubits. The cusp occurring in the runtime upper bound of RFE at ∼10^4 physical qubits marks the switch of the algorithm's strategy from K = 2 to K > 2. In the regime of few physical qubits, the runtime becomes extremely high due to the sampling overhead that compensates for the high degree of error. Note that the runtime of standard QPE may be larger than presented; we have optimistically assumed that a single sample of the QPE circuit suffices, though in practice one might have to take several samples to ensure the failure rate is below some tolerance.
The RFE algorithm offers the advantage of requiring an order-of-magnitude fewer qubits compared to the traditional QPE algorithm. However, Fig. 7 also shows that the runtime upper bound of RFE in the FTQC regime is approximately four orders of magnitude larger than that of traditional QPE. Nevertheless, we expect this upper bound to be conservative for two main reasons. Firstly, the analysis incorporates several analytical bounds that result in a more cautious choice of M than is likely necessary (a well-known phenomenon in algorithm analysis). We expect that the empirical runtime would be several orders of magnitude smaller than the derived upper bounds and leave an investigation of this to future work. Secondly, the algorithm itself can be optimized to enhance performance. Examples of potential improvements include: 1) tailoring the distribution from which k is sampled based on the noise characteristics of the device, and 2) employing more sophisticated fitting strategies for extracting θ from the Fourier transformed data.
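To make the trade-off mechanics concrete, the following sketch chains the overhead model reconstructed in Eq. (5) into the decay parameter of Eq. (16) and the depth rule of §V A, showing how spending code distance buys a larger usable K. The loop values are illustrative, and the full runtime bound additionally requires W(K, J, λ) from Eq. (B51), which is not reproduced here.

```python
import math

A, B = 0.5, 1.6                # architecture parameters of Eq. (5)
N, D = 100, 1000               # logical qubits and c-U layer depth (the Fig. 7 setting)
eps = 1e-3                     # target accuracy (0.1%)
J = math.ceil(2 * math.pi / eps)

for d in (9, 13, 17, 21, 25):
    p_log = A * math.exp(-B * d)                 # logical error rate at distance d
    lam = -N * D * math.log1p(-p_log)            # lambda = -N*D*ln(1 - p_logical)
    K = max(10 * math.floor(1 / (10 * (2 * lam + 1.5 / J))), 2)   # K(eps, lambda)
    print(f"d={d:2d}  physical qubits~{2 * d * d * N:8d}  lambda={lam:9.3e}  K={K}")
```

As d grows, λ falls and the allowed depth K saturates near its Heisenberg-limited value, reproducing the qualitative shape of the RFE curve in Fig. 7.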

VI. Conclusions and Outlook
In summary, we have developed a methodology for systematically analyzing the performance of a class of quantum algorithms suited for early fault-tolerant quantum computers. This is motivated by the need to understand the quantum resources necessary for these algorithms to achieve quantum advantage. Our approach can be extended to encompass various error models, fault-tolerant resource overhead models, and a wide range of quantum algorithms. By offering a generalized framework, our methodology paves the way for a comprehensive understanding of resource requirements and performance trade-offs in the realm of early fault-tolerant quantum computing.
As an application of our methodology, we analyzed the performance of the recently proposed RFE algorithm [1].
Studying the circuit-level error under our proposed Haar trajectory model, we found that the noise is best described by an exponential decay at the algorithmic level. We developed a variant of this algorithm that interpolates between low-depth and high-depth operation (based on the strength of the exponential decay) and analytically derived its runtime upper bound as a function of target accuracy and failure rate. Studying the algorithm under a continuum of noise strengths, we found that the runtime upper bound interpolates between O(1/ϵ²) (shot-noise-limited scaling) and O(1/ϵ) (Heisenberg scaling) in the high- to low-noise limits.
Based on the runtime upper bound, we then carried out a fault-tolerant overhead analysis, comparing RFE with traditional QPE for a problem instance of 100 logical qubits. Our analysis showed that the RFE algorithm can be implemented with an order-of-magnitude fewer physical qubits than the traditional QPE algorithm, albeit with a substantial increase in runtime. This trade-off allows for an earlier onset of practical quantum advantage in the EFTQC regime, where error correction remains expensive.
There are several crucial research directions that hold promise for delivering quantum computation with practical advantage. First and foremost, it is imperative to develop more realistic circuit-level error models. Our methodology relies on the assumption that the sampled Haar trajectory states are derived from a unitary 2-design, or at the very least, that their statistics can be well-approximated by those of a 2-design. To better understand the limitations of this model, further numerical investigations are required. In future studies, it would be valuable to develop an empirically derived model that captures the expected bias and variance of various circuit-level errors. We anticipate that the structure of the specific quantum circuits and error models of interest will introduce non-uniformity in the distribution of noisy trajectories. This non-uniformity could in principle shift E[⟨P⟩] and increase Var[⟨P⟩], leading to worse algorithm performance than that predicted by the current analysis. By addressing these aspects, future studies can provide more realistic noise settings and evaluate the performance of Randomized Fourier Estimation (RFE) under those settings.
Secondly, there is a need to develop fault-tolerant overhead models specifically tailored to early fault-tolerant quantum computers. Recent work [30] has shown that small deviations of fault-tolerant architectures from ideal assumptions can substantially impact the performance of quantum algorithms. We anticipate that by including such realistic models of fault-tolerant architectures in the methodology of this work, the importance of robust quantum algorithms will become increasingly apparent.
Thirdly, as briefly mentioned in §IV B, we have exclusively considered circuit-level errors while neglecting the effect of state preparation errors in our noise modeling. Extending our methodology to include state preparation errors is a plausible avenue for further exploration and would likely require an algorithm-specific analysis. We note that, parallel to our work, some related recent studies (e.g. [19]) provide complementary insights by studying the impact of state preparation errors on the performance of an EFTQC algorithm similar in spirit to RFE. A potential future direction is to study the expected performance of these relatively shallow-circuit algorithms under a mixture of the realistic error models that early fault-tolerant devices are known to suffer from. Furthermore, it is crucial to extend our methodology to analyze EFTQC algorithms beyond RFE towards solving practical problems. We have chosen to analyze the RFE algorithm as a proof-of-principle case study to demonstrate the viability of our methodology, since RFE is analytically tractable, which comes at the price of potentially sub-optimal performance. We note, however, that our methodology is readily applicable to more advanced EFTQC algorithms, such as the QCELS algorithm, a variant of the phase estimation algorithm proposed in [18]. For example, the noise modeling and FT overhead estimation detailed in our paper are directly transferable to a QCELS robustness analysis similar to that outlined in [20]. In addition, our noise modeling scheme allows for the substitution of depolarizing errors with alternative structured noise models, such as dephasing errors, with simple modifications to the statistics of the error channels modeled in §IV. We expect that these alternative algorithms will also be found to reduce quantum resources in the early fault-tolerant regime.
We envision that by extending our methodology to encompass existing algorithms and future developments in the EFTQC regime, we can enable their comprehensive evaluation and enhance the practical utilization of early fault-tolerant quantum computers.

The parameter values in this numerical study are chosen to reflect the performance of devices in the early fault-tolerant regime. Specifically, we are interested in the parameter regime characterized by r = p_logical ≲ 10^{−2}, N ≳ 100, and D × k ≳ 1000. In this regime, the standard deviation σ_{⟨P⟩} remains sufficiently small, justifying our assumption to omit it from the algorithmic error model. We point out that the performance of the RFE algorithm in the NISQ setting, where p_logical > 10^{−2}, N < 100, and D × k < 1000, falls outside the scope of our study and merits further investigation.

B. Algorithm Performance
In this section, we detail the analytical derivation of the performance upper bound of the RFE algorithm in the presence of exponential decay noise of strength λ. Rather than presenting these results in a theorem-proof format, we choose a more narrative presentation to facilitate broader accessibility. The algorithm fails when a non-adjacent frequency attains the largest Fourier magnitude; we will upper bound this worst-case failure probability using Chebyshev's inequality.
In the previous analysis of the RFE algorithm [1], the likelihood that every Fourier estimate f̂_j was within some distance of its mean f_j was analyzed. This approach does not capture the performance of the algorithm in the regime of small K, as the correlation between Fourier estimates f̂_j and f̂_{j′} is largely overlooked. As a result, the success probability bound becomes too loose to capture the actual performance scaling of the algorithm.
To account for the correlation among the Fourier estimates, we consider a change of variables,

ĉ_{j,j*} ≡ f̂_j + f̂_{j*},   (B4)
d̂_{j,j*} ≡ f̂_j − f̂_{j*}.   (B5)

The motivation for this choice is the fact that d̂_{j,j*} captures the correlation between f̂_j and f̂_{j*} and therefore has small variance for nearby j, j* when K is small compared to J. In contrast, ignoring the correlation between f̂_j and f̂_{j*} and treating their variances separately, as was done in [1], leads to an overestimation of the algorithm failure probability. We define the expectation values of these quantities to be

c_{j,j*} := E[ĉ_{j,j*}],   (B6)
d_{j,j*} := E[d̂_{j,j*}].   (B7)

The sufficient condition for success can be expressed as

Re(ĉ_{j,j*} d̂*_{j,j*}) < 0 for all j with |j − Jθ/2π| > 1.   (B8)

This condition also holds for c_{j,j*} and d_{j,j*}, the expected values of ĉ_{j,j*} and d̂_{j,j*}; it is the statistical fluctuation from finite sampling that can cause the condition to fail. As the number of samples is increased, the estimates ĉ_{j,j*} and d̂_{j,j*} are expected to concentrate increasingly about their means. We can quantify this concentration with a version of the central limit theorem proven by H. Chernov [31] for a continuous complex random variable q. We use the Chebyshev inequality to upper bound the likelihood that each estimator deviates from its expected value:

Pr(|ĉ − E[ĉ]| ≥ χ) ≤ Var(ĉ)/(Mχ²),   Pr(|d̂ − E[d̂]| ≥ η) ≤ Var(d̂)/(Mη²),   (B11)

where Var(ĉ) and Var(d̂) are the single-sample variances of ĉ and d̂. Thus, we have two free parameters (χ and η) to choose such that 1) |ĉ − E[ĉ]| ≤ χ and |d̂ − E[d̂]| ≤ η imply success (letting us upper bound the failure probability) and 2) the number of samples does not become too high. While it does not necessarily minimize the number of samples, we choose the value of χ such that the two upper bounds in Eq. (B11) become equal:

χ = η √(Var(ĉ)/Var(d̂)).

Next, we aim to establish a sufficiently large value for η, while still ensuring that |ĉ − E[ĉ]| ≤ η√(Var(ĉ)/Var(d̂)) and |d̂ − E[d̂]| ≤ η imply success. This will be achieved by recasting the success condition. Note that the condition Re(ĉ_{j,j*} d̂*_{j,j*}) < 0 is independent of the magnitudes of ĉ_{j,j*} and d̂_{j,j*}; it only depends on the relative phase angle between these two complex numbers. Thus, an equivalent condition for algorithm success is that, for all j with |j − Jθ/2π| > 1, the phase angle formed between the complex values ĉ_{j,j*} and d̂_{j,j*} is not within [−π/2, π/2]. For the time being, we ease the notation by dropping the j, j* subscripts.
We use this phase angle condition to establish a sufficient condition for success in terms of the allowable sizes of deviations from the mean. The largest possible angle formed between ĉ and c is given by sin(γ) = |ĉ − c|/|c|, and the largest possible angle formed between d̂ and d is given by sin(τ) = |d̂ − d|/|d|. We also define the angle between c and d according to |c||d| cos(α) = Re(cd*). With these definitions, the smallest possible phase angle formed between ĉ and d̂ is α − γ − τ; hence, the condition for success becomes

π/2 < α − γ − τ.   (B12)

We use the bounds x ≤ arcsin x ≤ πx/2 for 0 ≤ x ≤ 1 to establish a sufficient condition for success that is amenable to the Chebyshev inequality. First, we have that γ ≤ (π/2)|ĉ − c|/|c| and τ ≤ (π/2)|d̂ − d|/|d|. Next, we have that cos(α) = −sin(α − π/2), and so arcsin(−Re(cd*)/|c||d|) = α − π/2. Using the lower bound on arcsin, we then have −Re(cd*)/|c||d| ≤ α − π/2. From this chain of inequalities we can establish an implication (B14) whose contrapositive gives our sufficient condition for algorithm success (B15). We can then set the maximal allowable value for η according to Eq. (B15); defining ρ = √(Var(ĉ)/Var(d̂)) (so that χ = ηρ), this value is given in Eq. (B16).

Putting this all together, the probability of failure is upper bounded by

Pr(fail) ≤ J max_j Pr(fail_j),   (B17)

where we have reintroduced the indices j′, j* for clarity and use j′ to indicate the index that realizes the maximization. Let W(K, J, λ) be a parameterized upper bound on the quantity Var(d̂_{j′,j*})/η²_{j′,j*} that is independent of θ, j*, and j′. To ensure that Pr(fail) ≤ δ, it suffices to choose M such that the right-hand side of Eq. (B17), bounded through Eq. (B11), is at most δ; solving for such a value gives

M = ⌈(2J/δ) W(K, J, λ)⌉,

which ensures success with probability at least 1 − δ. As described in the main text, K and J are set as functions of λ and ϵ:

J = ⌈2π/ϵ⌉,   K(ϵ, λ) = max{c ⌊1/(c(aλ + b/J))⌋, 2},

where we see that K is (up to the constants a, b, c) a harmonic-mean-like combination of J and 1/λ, and K ≤ J. This makes the number of measurements M an implicit function of ϵ and λ.
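In summary form, the chain of bounds assembled above (using the reconstructed Eqs. (B11) and (B17), with χ chosen to equalize the two Chebyshev terms) reads:

```latex
\begin{align*}
\Pr(\mathrm{fail})
  \;\le\; J \max_{j} \Pr(\mathrm{fail}_j)
  \;\le\; J \max_{j}\!\left[
      \frac{\operatorname{Var}(\hat c_{j,j^*})}{M\chi_{j,j^*}^{2}}
    + \frac{\operatorname{Var}(\hat d_{j,j^*})}{M\eta_{j,j^*}^{2}}\right]
  \;=\; \frac{2J}{M}\max_{j}\frac{\operatorname{Var}(\hat d_{j,j^*})}{\eta_{j,j^*}^{2}}
  \;\le\; \frac{2J}{M}\,W(K,J,\lambda),
\end{align*}
```

so that choosing M ≥ (2J/δ) W(K, J, λ) guarantees Pr(fail) ≤ δ, matching the sample count quoted above.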
We now establish a function W that satisfies W ≥ Var(d̂)/η². From Eq. (B16), we can establish the upper bound (…), where in the second line we have used the fact that both σ_ĉ = √(Var(ĉ)) and |c| are upper bounded by 4. This follows from the fact that a single-sample estimate ĉ has magnitude at most 4, as it is the sum of two quantities that each have magnitude at most 2. (…) We then show that g(x; K, λ) is an upper bound of m(x; K, λ) when r = 0.89. Consider two cases: 1) K = 2 and 2) K > 2. In the case of K = 2, through use of trigonometric identities, it can be shown that g is an exact expression for m when r = 1. Upon setting r = 0.89 < 1, g can only increase, given that r multiplies a positive-valued function; therefore, setting r = 0.89 leads to g being an upper bound for m in the case of K = 2. In the case of K > 2, we will establish that 1 − cos(…) is a lower bound of

(1 − g(x; K, λ)) / (r(1 − tanh²(λ/2))).   (B46)

First, we use the fact that the above function is monotonically decreasing in λ > 0 and the fact that λ is upper bounded, to set λ to this value and establish a new λ-independent function of K and x that is a lower bound. Since, as set in the main text, K = max{10⌊…⌋, 2}, (…) and this function S(K, J, λ; j) decreases monotonically away from j = Jθ/2π on the interval −J/2 ≤ j − Jθ/2π ≤ J/2, which will be used later.

C. Comparison to cost of standard quantum phase estimation
In this section, we analyze the cost of the standard quantum phase estimation (QPE) algorithm as described in [32]. This allows us to compare the fault-tolerant overhead associated with the traditional approach to the frugal approach that we have taken using the RFE algorithm.
FIG. 2. (a) Real and imaginary components of the expected (noiseless) signal Re[g(k)] = cos(kθ) and Im[g(k)] = sin(kθ) as a function of the circuit depth k. (b) The magnitude of the expected Fourier transformed signal |f_j|. The green shaded region shows the acceptable range of values for θ̂, and the red dashed vertical line shows the true peak at index Θ = θJ/2π. The solid lines represent the analytical functional forms where k and j are treated as continuous variables for easier visualization. The discrete values of g(k) and |f_j| are marked with dots. The two green dots in (b) correspond to j = ⌊Jθ/2π⌋ and ⌈Jθ/2π⌉, which we refer to as "adjacent frequencies."

FIG. 8. Standard deviation of ⟨P⟩ as a function of D × k and r, plotted on a log scale for N = 1, 10, 100.