Quantum error mitigation as a universal error-minimization technique: applications from NISQ to FTQC eras

Abstract

In the early years of fault-tolerant quantum computing (FTQC), it is expected that the available code distance and the number of magic states will be restricted due to the limited scalability of quantum devices and the insufficient computational power of classical decoding units. Here, we integrate quantum error correction and quantum error mitigation into an efficient FTQC architecture that effectively increases the code distance and $T$-gate count at the cost of constant sampling overheads in a wide range of quantum computing regimes. For example, while we need $10^4$ to $10^{10}$ logical operations for demonstrating quantum advantages from optimistic and pessimistic points of view, we show that we can reduce the required number of physical qubits by $80\%$ and $45\%$ in each regime. From another perspective, when the achievable code distance is up to about 11, our scheme allows executing $10^3$ times more logical operations. This scheme will dramatically alleviate the required computational overheads and hasten the arrival of the FTQC era.

I. INTRODUCTION
Quantum computers are believed to be capable of implementing several tasks, such as factoring and Hamiltonian simulation, in exponentially smaller computational times than classical computers [1, 2]. However, quantum systems generally interact with their environments, which leads to physical errors that may destroy their quantum advantages. Since the physical error rates of quantum computers are still much higher than those of classical computers, it is vital to suppress these errors. As a solution, fault-tolerant quantum computing (FTQC) using quantum error-correcting codes has been studied [3][4][5][6][7]. Long-term FTQC allows executing conventional quantum algorithms such as Hamiltonian simulation algorithms [8]. According to current state-of-the-art resource estimations [9,10], the logical quantum operation count will need to be on the order of $10^{10}$ to observe clear quantum advantages grounded in computational complexity theory.
Towards the realization of long-term FTQC, we will experience several intermediate regimes, as shown in Fig. 1, because high-level encoding is not allowed due to restrictions on quantum resources such as qubit and magic-state counts [5,7]. Since quantum error correction (QEC) requires massive classical computation for repetitive error estimation, the available code distance would also be strongly limited in the near future [11][12][13]. As quantum technologies mature, computational quantum supremacy [14] will be achieved in the logical space. We will refer to the intermediate regime from the realization of logical quantum supremacy to the demonstration of long-term applications as the early-FTQC regime. The number of physical qubits will go beyond one thousand in this regime, and we anticipate that more than about $10^4$ reliable logical operations on $10^2$ logical qubits will be available. Even at the beginning of the early-FTQC regime, we may observe a quantum speed-up with heuristic quantum algorithms, for example, with the variational quantum eigensolver [15][16][17].
In this paper, to realize efficient and high-accuracy quantum computation in the early-FTQC era, we propose a novel framework of FTQC, where QEC and quantum error mitigation (QEM) are combined on an equal footing. While QEM has been considered an alternative error-minimization technique for noisy intermediate-scale quantum (NISQ) devices due to its low hardware overhead at the expense of a sampling cost, we show that, by integrating probabilistic error cancellation [19,20] into the FTQC framework, we can mitigate all the dominant types of errors in the logical space. We also note that our scheme can efficiently mitigate Pauli errors by virtually updating the quantum states with a classical memory called the Pauli frame [5]. In the conventional QEM formalism, the sampling cost of QEM increases exponentially with the number of physical error events [21,22]. Therefore, the sampling overheads of QEM become unrealistic in NISQ computing when the number of physical operations increases for a fixed error rate per quantum gate, and the number of error events that QEM can efficiently suppress is limited to the order of unity. In our framework, the sampling cost of QEM instead increases exponentially with the number of logical error events in the encoded space. Note that we can tune the number of logical error events by adjusting several parameters such as the code distance, distillation levels, and the precision of approximations in the Solovay-Kitaev decomposition. Thus, it is highly likely that we can find regions where the QEM techniques are the most effective, i.e., where the number of logical error events is of the order of unity. Accordingly, we can relax the hardware requirements with constant sampling overheads. Even after scalable FTQC is realized, taking QEM into account, we can optimize quantum computation by allocating computational resources at will to perform even more efficient quantum computing.

FIG. 1. To estimate these lines, we refer to the quantum supremacy experiments [14] and the existing state-of-the-art resource estimations [9,10,18]. The early-FTQC regime is defined as the region between these lines. In the main text, we assume that the number of error events during FTQC, $N_e$, is required to be smaller than $10^{-3}$, which is shown as the dotted black line. Our technique allows FTQC with a number of error events of the order of unity, $N_e \sim 1$, shown as a solid black line, to execute applications that originally require a much smaller error-event count. For example, at the beginning and the end of the early-FTQC regime, our technique allows simulating applications (white and yellow circles with black rims) with relaxed hardware requirements (white and yellow circles with red rims).

We need to overcome several fundamental difficulties to apply QEM in the logical space because the costs and restrictions of logical operations and the dominant sources of errors differ from those of the NISQ formalism. We resolve these difficulties one by one; solutions to the major problems are as follows. In FTQC, logical Clifford operations and Pauli measurements can be applied efficiently, while non-Clifford operations are costly because they involve a number of $T$-gate injection, distillation, and teleportation procedures [5,7]. These logical operations are affected by three types of logical errors: logical errors in each elementary gate operation due to restricted code distances, noise in non-Clifford logical gates arising from insufficient magic-state distillation, and errors induced by the Solovay-Kitaev decomposition [23,24]. We call the first two logical errors decoding errors and the last one approximation errors. We will discuss what types of errors are present when implementing logical operations and provide a hierarchical way to mitigate noisy and costly operations with clean and less costly ones. To detect and correct physical errors during computation, we store the estimated errors in the Pauli frame instead of physically applying recovery operations [5].
This means that actual physical states are almost never in the code space. We will provide concrete procedures for a universal set of logical operations incorporating QEM that are compatible with the Pauli frame. To apply probabilistic error cancellation, we need a good characterization of the noise model to construct QEM operations. We show that decoding errors can be efficiently characterized with gate set tomography [25,26] on the code space. Note that the approximation errors of the Solovay-Kitaev algorithm can be characterized efficiently on classical computers. Finally, while probabilistic error cancellation is a QEM technique for mitigating errors in algorithms that calculate expectation values, many FTQC algorithms are sampling algorithms using phase estimation [9,10,27]. We show that probabilistic error cancellation is compatible with the phase estimation algorithm; see Appendix H for details.
We perform resource estimation of FTQC under realistic scenarios with and without QEM, and we show that our scheme can dramatically alleviate the required computational overheads in FTQC. We assume that the mean number of logical error events $N_e$ is required to reach $N_e = 10^{-3}$ and that the sampling overhead of QEM is restricted to a reasonable level, i.e., within $10^2$ times more samples for achieving a given accuracy. We expect that at least $10^4$ logical operations are required to demonstrate classically intractable applications. In this case, the required number of qubits is reduced to approximately one-fifth with QEM compared to the original qubit count. We also expect that at least $10^{10}$ logical operations are necessary to perform conventional long-term applications. The required number of qubits is reduced to $55\%$ in this regime. From another perspective, our scheme can be used to increase the number of available logical operations when the available code distance is strongly restricted. The lifetime of current superconducting qubits is up to about 1 ms, and a cycle of error estimation during FTQC must be sufficiently faster than the lifetime, i.e., about 1 µs [11,28]. To cope with this strong restriction, efficient implementations of classical error-decoding architectures have been studied. According to recent state-of-the-art proposals [11][12][13], the available code distance would be limited to about 11 in the near future even with simplified decoding algorithms. When the available code distance is limited to 11, our scheme enables $10^3$ times more logical operations with the same hardware requirements. Thus, our technique can clearly accelerate the realization of applications in the early- and long-term FTQC regimes. This improvement is illustrated by red arrows in Fig. 1.
It is also worth noting that, to the best of our knowledge, these are the first examples in which the performance of useful quantum algorithms with clear quantum advantages is enhanced via QEM under realistic conditions; QEM has so far been investigated mainly for near-term heuristic quantum algorithms that rely on numerical optimization. This paper is organized as follows. In Sec. II, we review probabilistic error cancellation and the architecture of fault-tolerant quantum computing. In Sec. III, we describe how to evaluate decoding errors and approximation errors. Then we present our novel FTQC architecture with an analytical argument on the cost of QEM and explain the effect of model-estimation errors. In Sec. IV, we numerically analyze the sampling cost of QEM for decoding errors and approximation errors and demonstrate that we can effectively increase the code distance and the number of $T$-gates via QEM even when there are finite estimation errors. Finally, we conclude our paper with a discussion in Sec. V.

II. PRELIMINARIES
A. Quantum error mitigation and probabilistic error cancellation

Quantum processors are affected by a number of physical noise sources, which should be mitigated to obtain correct results. Here, for simplicity, we will assume that the gate errors are Markovian, i.e., the noise process $\mathcal{N}$ for a gate is totally independent of other gate errors. In this case, we have

$\rho_{\mathrm{out}} = \mathcal{N}_{N_G} \mathcal{U}_{N_G} \cdots \mathcal{N}_1 \mathcal{U}_1 (\rho_{\mathrm{in}})$, (1)

where $\rho_{\mathrm{out}}$ and $\rho_{\mathrm{in}}$ are the output and input quantum states, $\mathcal{U}_k$ and $\mathcal{N}_k$ denote the ideal and noisy parts of the process of the $k$-th gate, and $N_G$ is the number of gates. To ensure correct computations, it is necessary to mitigate the effect of $\mathcal{N}_k$ $(k = 1, 2, \ldots, N_G)$ and obtain

$\rho_{\mathrm{out}}^{\mathrm{ideal}} = \mathcal{U}_{N_G} \cdots \mathcal{U}_1 (\rho_{\mathrm{in}})$. (2)

Quantum error mitigation (QEM) has been proposed as a method for suppressing errors without encoding, and it is especially useful for NISQ devices with a restricted number of qubits [19,20,29]. Generally speaking, QEM methods recover not the ideal density matrix $\rho_{\mathrm{out}}^{\mathrm{ideal}}$ itself, but rather the ideal expectation value of an observable, $\langle M \rangle_{\mathrm{ideal}} = \mathrm{Tr}(\rho_{\mathrm{out}}^{\mathrm{ideal}} \hat{M})$, via classical post-processing. Note that QEM is not a scalable technique because it needs a number of circuit runs that increases exponentially with the number of error events in the quantum circuit [19,20]. Now let us explain the concept of probabilistic error cancellation, with which we can completely eliminate the bias from the expectation values of observables, given complete information on the noise model [19,20]. (Later, we will use this method to suppress errors in FTQC.) First, we identify the noise map $\mathcal{N}$ via either process or gate set tomography [25,26] and calculate the inverse $\mathcal{N}^{-1}$. Then, we find a set of processes $\{\mathcal{B}_i\}$ such that

$\mathcal{N}^{-1} = \sum_i \eta_i \mathcal{B}_i, \quad \eta_i \in \mathbb{R}$. (3)

Note that arbitrary operations can be represented as linear combinations of tensor products of single-qubit Clifford operations and Pauli measurements [20]. Here, we can rewrite Eq. (3) as

$\mathcal{N}^{-1} = \gamma_Q \sum_i \mathrm{sgn}(\eta_i)\, q_i \mathcal{B}_i$, (4)

where $\gamma_Q = \sum_i |\eta_i|$, $q_i = |\eta_i| / \gamma_Q$, $\gamma_Q \geq 1$, and $\mathrm{sgn}(\eta_i)$ is a parity taking $\pm 1$, corresponding to the operation $\mathcal{B}_i$.
We refer to $\gamma_Q$ as the QEM cost because it is related to the sampling overhead. Now let us suppose that we have measured an observable $\hat{M}$ and obtain

$\langle \hat{\mu}^{\mathrm{eff}} \rangle = \sum_i q_i \langle \hat{\mu}_i^{\mathrm{eff}} \rangle$. (5)

Here, $\hat{\mu}_i^{\mathrm{eff}} = \mathrm{sgn}(\eta_i) \hat{m}_i$, and $\hat{m}_i$ is a measurement outcome for a process $\mathcal{B}_i \mathcal{N} \mathcal{U}$. We generate the process $\mathcal{B}_i$ with probability $q_i$ and multiply the corresponding parity with the measurement result, which is denoted as $\hat{\mu}^{\mathrm{eff}}$. Then, the expectation value of the random variable $\hat{\mu}^{\mathrm{mit}} = \gamma_Q \hat{\mu}^{\mathrm{eff}}$ approximates the error-free expectation value $\langle M \rangle_U$. Note that since $\mathrm{Var}[\hat{\mu}^{\mathrm{mit}}] = \gamma_Q^2 \mathrm{Var}[\hat{\mu}^{\mathrm{eff}}]$ and a measurement outcome without QEM, which we denote $\hat{\mu}^{\mathrm{nmit}}$, has a similar variance, the variance of the error-mitigated value is amplified by approximately $\Gamma_Q = \gamma_Q^2$. Therefore we need $\Gamma_Q$ times more samples to achieve an accuracy similar to that before applying QEM.
In practice, we use probabilistic error cancellation for each gate in quantum circuits. The ideal process for the entire quantum circuit is described as

$\mathcal{U}_{N_G} \cdots \mathcal{U}_1 = \gamma_Q^{\mathrm{tot}} \sum_{i_1, \ldots, i_{N_G}} \left[ \prod_{k=1}^{N_G} \mathrm{sgn}(\eta_{i_k})\, q_{i_k} \right] \mathcal{B}_{i_{N_G}} \mathcal{N}_{N_G} \mathcal{U}_{N_G} \cdots \mathcal{B}_{i_1} \mathcal{N}_1 \mathcal{U}_1$, (6)

with $\gamma_Q^{\mathrm{tot}} = \prod_{k=1}^{N_G} \gamma_Q^{(k)}$. From Eq. (6), we can see that, in each gate, a process $\mathcal{B}_{i_k}$ is generated with probability $q_{i_k}$, and the product of parities $\prod_{k=1}^{N_G} \mathrm{sgn}(\eta_{i_k})$ is multiplied with the measurement results to obtain the outcome $\hat{\mu}^{\mathrm{eff}}$. This procedure is repeated, and the product of the mean of the outcomes $\langle \hat{\mu}^{\mathrm{eff}} \rangle$ and $\gamma_Q^{\mathrm{tot}}$ approximates the correct expectation value. Note that here $\gamma_Q^{\mathrm{tot}}$ is the QEM cost for the entire quantum circuit. Let us assume the cost for each gate is uniform and can be approximated as $\gamma_Q^{(k)} = \gamma_Q = 1 + a\varepsilon$, with $a$ and $\varepsilon$ being a positive constant and the effective error rate, respectively. Now the QEM cost and sampling overhead can be approximated as $\gamma_Q^{\mathrm{tot}} \approx e^{a\varepsilon N_G} = e^{(\gamma_Q - 1) N_G}$ and $\Gamma_Q^{\mathrm{tot}} = (\gamma_Q^{\mathrm{tot}})^2$, which increase exponentially with the mean number of error events in the quantum circuit, $\varepsilon N_G$. Note that for $\varepsilon N_G = O(1)$ and $\varepsilon \to 0$, since $\varepsilon^k N_G \to 0$ $(k \geq 2)$, the QEM cost can be exactly described as $\gamma_Q^{\mathrm{tot}} = e^{(\gamma_Q - 1) N_G}$.
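As a concrete illustration of the quasi-probability decomposition above, the following minimal sketch applies probabilistic error cancellation to a single-qubit depolarizing channel. The noise model and the error rate $p$ are illustrative assumptions chosen so that the inverse channel has a simple closed form; they are not part of the FTQC noise model discussed later.

```python
import numpy as np

# Probabilistic error cancellation for a single-qubit depolarizing channel.
# The noise model and error rate p are illustrative assumptions.
rng = np.random.default_rng(0)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

p = 0.01  # physical error rate (assumed)

def depolarize(rho):
    """N(rho) = (1 - p) rho + (p/3) (X rho X + Y rho Y + Z rho Z)."""
    return (1 - p) * rho + (p / 3) * sum(P @ rho @ P for P in paulis[1:])

# Quasi-probability decomposition of N^{-1}: the X/Y/Z Bloch components are
# damped by lam = 1 - 4p/3, so the inverse carries negative Pauli weights.
lam = 1 - 4 * p / 3
eta = np.array([(1 + 3 / lam) / 4, (1 - 1 / lam) / 4,
                (1 - 1 / lam) / 4, (1 - 1 / lam) / 4])
gamma_Q = np.abs(eta).sum()        # QEM cost gamma_Q >= 1
q = np.abs(eta) / gamma_Q          # sampling distribution over {I, X, Y, Z}
sign = np.sign(eta)

# Deterministic check: the quasi-channel exactly undoes the noise.
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)    # |0><0|
noisy = depolarize(rho0)
mitigated = sum(eta[i] * paulis[i] @ noisy @ paulis[i] for i in range(4))
assert np.allclose(mitigated, rho0)

# Monte Carlo estimator of <Z>: draw a recovery Pauli B_i with probability
# q_i and weight each sampled outcome by gamma_Q * sgn(eta_i); the mean is
# an unbiased estimate of the noiseless expectation value.
shots = 200_000
idx = rng.choice(4, size=shots, p=q)
ez = np.array([np.trace(Z @ P @ noisy @ P).real for P in paulis])
outcomes = np.where(rng.random(shots) < (1 + ez[idx]) / 2, 1.0, -1.0)
est = np.mean(gamma_Q * sign[idx] * outcomes)
assert abs(est - 1.0) < 0.05       # ideal <Z> on |0> is exactly 1
```

For this channel the cost evaluates to $\gamma_Q = (3/\lambda - 1)/2$ with $\lambda = 1 - 4p/3$, i.e., $\gamma_Q = 1 + 2p + O(p^2)$, illustrating the generic first-order behavior $\gamma_Q \approx 1 + a\varepsilon$ used above.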
B. Fault-tolerant quantum computing

Stabilizer formalism
In the framework of fault-tolerant quantum computing (FTQC), one prepares a redundant number of physical qubits and performs quantum computing in a code space defined as a subspace of the whole Hilbert space. By repetitively performing quantum error detection and correction, we can protect the logical qubits defined in the code space against physical errors. The state of the logical qubits is manipulated in a fault-tolerant manner with a set of logical operations.
The stabilizer formalism [3,30] is the most standard way to construct quantum error-correcting codes. Here, supposing that we construct $k$ logical qubits with $n$ physical qubits, a $2^k$-dimensional code space $\mathcal{C}$ is specified with a subgroup of $n$-qubit Pauli operators called the stabilizer group. Let the $n$-qubit Pauli group be

$\mathcal{G}_n = \left\{ c\, P_1 \otimes \cdots \otimes P_n \mid c \in \{\pm 1, \pm i\},\ P_j \in \{I, X, Y, Z\} \right\}$, (7)

where $I$ is the identity operator and $X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$, $Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ are the Pauli matrices. A subgroup of Pauli operators $\mathcal{S} \subset \mathcal{G}_n$ is called a stabilizer group if $\mathcal{S}$ is a commutative subgroup, the number of elements in $\mathcal{S}$ is $2^{n-k}$, and $-I \notin \mathcal{S}$. We denote the $(n-k)$-element generator set of a stabilizer group as $G = (g_1, \cdots, g_{n-k})$. The code space $\mathcal{C}$ is defined as the eigenspace with $+1$ eigenvalues for all the operators in the stabilizer group, i.e., $\mathcal{C} = \{ |\psi\rangle \mid \forall s_i \in \mathcal{S},\ s_i |\psi\rangle = |\psi\rangle \}$. In the code space, we can introduce a logical basis $\{|0_L\rangle, |1_L\rangle\}^{\otimes k}$ and logical Pauli operators $\{I_L, X_L, Y_L, Z_L\}^{\otimes k}$. The code distance $d$ is defined as the minimum number of physical qubits on which a logical operator, except the logical identity $I_L^{\otimes k}$, acts nontrivially.
During a quantum computation, physical errors that occur in the encoded state are detected by using the $(n-k)$ Pauli measurements $P_s^{(i)} = \frac{1}{2}\left(I + (-1)^s g_i\right)$ for $s \in \{0, 1\}$. These measurements are called stabilizer measurements, and their binary outcomes $s$ are called syndrome values. The original state is restored by applying appropriate feedback operations estimated from the syndrome values. These stabilizer measurements are performed repeatedly during a computation, and one repetition is called a code cycle of fault-tolerant quantum computing. If the effective error probability per physical qubit during a cycle is smaller than a certain threshold, we can estimate the Pauli operator that restores the original state with a failure probability that is exponentially small in the code distance $d$. Since the required number of physical qubits $n$ increases polynomially with the code distance $d$ in typical quantum error-correcting codes, we can exponentially decrease the error probability of logical qubits with a polynomial qubit overhead.
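The stabilizer-measurement and recovery-estimation loop can be made concrete with the smallest example, the 3-qubit bit-flip repetition code ($n = 3$, $k = 1$, $d = 3$). The lookup-table decoder below is a minimal toy, not the decoding architecture discussed in the main text.

```python
# Syndrome decoding for the 3-qubit bit-flip repetition code, with
# stabilizer generators Z1 Z2 and Z2 Z3.  X-error patterns are tracked
# as bit tuples; a toy instance of the stabilizer-measurement loop.

STABILIZERS = [(0, 1), (1, 2)]  # qubit pairs checked by each Z-type generator

def syndrome(x_errors):
    """Binary outcomes of the stabilizer measurements for an X-error pattern."""
    return tuple(x_errors[a] ^ x_errors[b] for a, b in STABILIZERS)

# Lookup decoder: the lowest-weight X error consistent with each syndrome.
DECODER = {
    (0, 0): (0, 0, 0),
    (1, 0): (1, 0, 0),
    (1, 1): (0, 1, 0),
    (0, 1): (0, 0, 1),
}

def correct(x_errors):
    """Residual error after applying the estimated recovery operation."""
    rec = DECODER[syndrome(x_errors)]
    return tuple(e ^ r for e, r in zip(x_errors, rec))

# Any single X error is returned to the code space with no residual error.
for i in range(3):
    err = tuple(1 if j == i else 0 for j in range(3))
    assert correct(err) == (0, 0, 0)

# A weight-2 error exceeds the code's correction capability: the decoder
# picks the wrong recovery, leaving a logical X (all-ones residual).
assert correct((1, 1, 0)) == (1, 1, 1)
```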

Logical operations
We must not only correct physical errors but also update the logical quantum state to perform quantum computation. To this end, a universal set of logical operations should be performed in a fault-tolerant manner. According to the Solovay-Kitaev theorem [23,24], we can approximate arbitrary one- and two-qubit gates with a finite set of local operations. For example, the Hadamard gate $H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, the CNOT gate $\Lambda(X) = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes X$, and the $T$-gate $T = \exp\left(-i \frac{\pi}{8} Z\right)$ form a universal gate set. Several logical operations can be performed by transversally applying the same one- or two-qubit operations on the physical qubits. Since transversal operations increase the effective physical error rate per qubit during a cycle only by a constant factor, transversal logical operations can be achieved fault-tolerantly. However, it is known that there is no stabilizer code for which the set of transversal gates is universal [31]. Thus, we need an additional technique to achieve fault-tolerant and universal quantum computing. The most promising solution is to create a quantum state called a magic state and perform non-transversal logical operations with gate teleportation [5]. For example, $|m\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$ is a typical magic state, and $T$-gate operations can be performed by consuming this state. This magic state encoded in a logical qubit can be constructed with a process called magic-state injection. While the infidelity of a magic state created by magic-state injection is generally larger than the logical error rate, we can create a high-fidelity magic state from several noisy magic states by using another quantum error-correcting code implemented on the logical space, a process called magic-state distillation. Since the application of $T$-gates requires a longer time than the other operations, the number of $T$-gates is the dominant factor affecting the computation time of FTQC.
Although we can estimate a Pauli operation for recovery from the syndrome values, we do not apply it immediately after estimation. Instead, we store the Pauli operations that should be applied to the physical qubits for recovery in a classical memory called the Pauli frame [5,32]. The stored operations are taken into account when logical measurements are performed; the outcome of a logical measurement is flipped according to the Pauli frame. A schematic figure is shown in Fig. 2. In the above construction of logical operations, the whole process, except for magic-state injection, consists only of Clifford operations and Pauli channels in the code space. Since a Pauli operator conjugated by a Clifford operator is also a Pauli operator, we can always track a recovery operator as a Pauli operator during a computation. In addition, when we need to apply a logical Pauli operator to a quantum state, we can do so simply by updating the Pauli frame, since a logical Pauli operator is a transversal physical Pauli operation. As long as classical computers are reliable, this operation is effectively noiseless.
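A minimal sketch of this Pauli-frame bookkeeping, assuming the standard conjugation rules of H, S, and CNOT on Pauli operators (global phases, which cannot affect measurement outcomes, are dropped):

```python
# Minimal Pauli-frame sketch: a recovery Pauli is stored classically as
# (x, z) bits per qubit and conjugated through each Clifford gate in
# software instead of being applied physically.

def apply_h(frame, q):
    x, z = frame[q]
    frame[q] = (z, x)                # H swaps X and Z

def apply_s(frame, q):
    x, z = frame[q]
    frame[q] = (x, z ^ x)            # S: X -> Y (= XZ up to phase), Z -> Z

def apply_cnot(frame, c, t):
    xc, zc = frame[c]
    xt, zt = frame[t]
    frame[c] = (xc, zc ^ zt)         # CNOT: X_c -> X_c X_t, Z_t -> Z_c Z_t
    frame[t] = (xt ^ xc, zt)

def measure_z(frame, q, raw_outcome):
    """A stored X (bit flip) on qubit q flips its Z-measurement outcome."""
    x, _ = frame[q]
    return raw_outcome ^ x

# An X recovery recorded on qubit 0 propagates through a CNOT and then
# reinterprets the readout of qubit 1, with no physical gate ever applied.
frame = {0: (1, 0), 1: (0, 0)}
apply_cnot(frame, 0, 1)
assert frame[1] == (1, 0)
assert measure_z(frame, 1, 0) == 1
```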

III. QUANTUM ERROR MITIGATION FOR FAULT-TOLERANT QUANTUM COMPUTING
In this section, we discuss how to integrate QEM into the FTQC architecture. Here, we consider two types of errors in FTQC: decoding errors due to failures in the error estimation and insufficiency of magic-state distillation and approximation errors in the Solovay-Kitaev decomposition. In Sec. III A, we explain how these errors in FTQC can be modeled. In Sec. III B, we discuss how these errors can be canceled and evaluate their QEM costs. Probabilistic error cancellation requires the errors to be estimated in advance. In Sec. III C, we also discuss the effect of estimation errors on probabilistic error cancellation and the characterization efficiency.
A. Errors in fault-tolerant quantum computing

Decoding error
Here, we describe noise due to failures of error estimation in elementary logical operations, i.e., stabilizer measurements and magic-state distillation. The first obstacle to applying probabilistic error cancellation to FTQC is how to characterize an effective map of the noise due to failures of error estimation. If we suppose that the physical errors can be modeled as a stochastic physical Pauli map and assume that there are no errors on the ancillary qubits for syndrome measurements, we can define a logical noise map for decoding errors that is Markovian and given by a logical stochastic Pauli map. Yet, these assumptions do not hold in practice. Nevertheless, here we will assume that we can define an effectively Markovian logical error map for each logical operation and also assume that this noise map is a stochastic logical Pauli map. It is known that even if the noise is unitary, a noise map in the logical space of surface codes can be well-approximated as stochastic Pauli noise when the code distance is sufficiently large [33]. Furthermore, the remaining coherent errors can be canceled by using pulse-optimization techniques. Thus, it is reasonable to suppose that the decoding errors due to failures of error estimation in surface codes are almost stochastic Pauli errors. In addition, we numerically verified that we can regard the decoding errors as Markovian errors even in the presence of measurement errors; see Appendix F for details. While we mainly describe and analyze the decoding errors in surface codes, a similar idea can be applied to decoding errors due to insufficient magic-state distillation. As for the logical noise map on a prepared magic state due to insufficient magic-state distillation, we can twirl the noise map by logical Clifford operations, so it can also be assumed to be stochastic Pauli noise.
Under the above assumptions, we can describe the noise map for an $l$-qubit logical operation, $\mathcal{N}_{\mathrm{dec}}$, as the following stochastic Pauli noise:

$\mathcal{N}_{\mathrm{dec}}(\rho) = \sum_g p_g\, g \rho g^\dagger$, (8)

where the sum runs over $l$-qubit logical Pauli operators $g$, with $p_g \in \mathbb{R}$, $\sum_g p_g = 1$, and $p_g \geq 0$. The sum of the probabilities of non-identity logical operations is called the logical error probability $p_{\mathrm{dec}}$, i.e., $p_{\mathrm{dec}} = \sum_{g \neq I_L^{\otimes l}} p_g$. It is known that when the physical error rate $p$ is smaller than a value called the threshold $p_{\mathrm{th}}$, the effective logical error probability decreases exponentially with respect to the code distance $d$. The effective logical error probability per syndrome-measurement cycle of surface codes, $p_{\mathrm{cyc}}$, decreases as

$p_{\mathrm{cyc}} = C_1 \left( \frac{p}{p_{\mathrm{th}}} \right)^{C_2 d}$, (9)

where $C_1, C_2$ are constants [34]. While these constants depend on the details of the error-correction scheme, $C_1 \approx 0.13$ and $C_2 \approx 0.61$ are expected in a typical construction of surface codes and noise model [34,35]. Suppose that a logical operation requires $m$ cycles; then, the logical error probability $p_{\mathrm{dec}}$ for the logical operation can be approximated as

$p_{\mathrm{dec}} = 1 - (1 - p_{\mathrm{cyc}})^m \approx m\, p_{\mathrm{cyc}}$. (10)

Note that the number of cycles per logical gate increases at most linearly with the code distance $d$.
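Plugging illustrative numbers into this error model shows the exponential suppression with code distance. The physical error rate, threshold, and cycle counts below are assumed example values, not device data.

```python
import math

# Illustrative evaluation of the decoding-error model: logical error rate
# per cycle and per m-cycle logical operation.  All parameters are assumed
# example values, not measured device data.
C1, C2 = 0.13, 0.61
p, p_th = 1e-3, 1e-2   # physical error rate and threshold (assumed)

def p_cyc(d):
    """Logical error probability per syndrome-measurement cycle at distance d."""
    return C1 * (p / p_th) ** (C2 * d)

def p_dec(d, m):
    """Logical error probability of a logical operation taking m cycles."""
    return 1.0 - (1.0 - p_cyc(d)) ** m   # ~ m * p_cyc(d) for small p_cyc

for d in (7, 11, 15):
    print(f"d = {d:2d}: p_cyc = {p_cyc(d):.2e}, p_dec(m=d) = {p_dec(d, d):.2e}")
```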
In order to apply probabilistic error cancellation, we need to know the logical error probabilities $\{p_g\}$ in advance. While we can estimate $\{p_g\}$ by using gate set tomography in the logical space, the estimates are not exact. The effect of estimation errors is discussed in Sec. III C, while the efficiency of our proposal, including noise characterization, is discussed in Appendix C.

Approximation error
Since we are only allowed to use a limited set of logical operations for achieving fault tolerance, we need to decompose an arbitrary unitary gate into a sequence of available gates. Any unitary operator can be decomposed into a product of CNOT gates and single-qubit gates. Thus, we need to approximate single-qubit gates with a given gate set to the desired accuracy. By using the improved Solovay-Kitaev algorithm [36], given a universal gate set such as $\{T, H, S\}$ and a single-qubit gate $U$ to be approximated, we can construct an approximated gate $\tilde{U}$ satisfying $\varepsilon = \|\tilde{U} - U\|$ for an arbitrary accuracy $\varepsilon$ as a sequence over the given gate set with length $\tilde{O}(\log(\varepsilon^{-1}))$, with $\|\cdot\|$ being the operator norm. The error of the approximated map is given by

$\mathcal{N}_{\mathrm{SK}}(\rho) = (\tilde{U} U^\dagger)\, \rho\, (\tilde{U} U^\dagger)^\dagger$. (11)

Since this decomposition involves only single-qubit operations, this error channel can be efficiently and exactly evaluated in advance.
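The operator-norm accuracy $\varepsilon$ and the residual unitary channel can indeed be evaluated exactly on a classical computer. In the small sketch below, the approximated gate is a hypothetical stand-in for a Solovay-Kitaev sequence, namely the target rotation with its angle rounded to a $\pi/128$ grid:

```python
import numpy as np

# Exact classical evaluation of the approximation error for a single-qubit
# Z-rotation.  U_tilde stands in for a Solovay-Kitaev sequence: here it is
# simply the target rotation with its angle rounded to a pi/128 grid.
def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

theta = 0.3
U = rz(theta)
theta_tilde = round(theta / (np.pi / 128)) * (np.pi / 128)
U_tilde = rz(theta_tilde)

eps = np.linalg.norm(U_tilde - U, ord=2)   # operator (spectral) norm ||U_tilde - U||
V = U_tilde @ U.conj().T                   # residual map N_SK is conjugation by V

assert 0 < eps < 0.01                      # small, exactly known in advance
assert np.allclose(V @ V.conj().T, np.eye(2))  # the residual error is unitary
```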
B. Quantum error mitigation for fault-tolerant quantum computing

Overview of our framework
Here, we show that decoding errors and approximation errors can be mitigated with probabilistic error cancellation. When we insert recovery operations for probabilistic error cancellation, it is assumed that the noise level of the recovery operations for QEM is much lower than that of the error-mitigated gates. In NISQ computing, for example, it is reasonable to assume that the error probabilities of two-qubit gates are much larger than those of single-qubit gates and measurements; therefore, the errors of two-qubit gates can be mitigated by using single-qubit recovery operations. However, this is not a reasonable assumption in FTQC, since the operations that are noisy and time-consuming differ from those of the NISQ architecture. More concretely, even Clifford operations involving only one logical qubit suffer decoding errors.
Here, we show an architecture of FTQC that implements QEM with small overheads. The keys are the two significant properties of FTQC architecture: logical Pauli operations are error-free and instantaneous due to the Pauli frame, and the noise map of the decoding errors can be assumed as stochastic Pauli noise. Thanks to these properties, we can mitigate errors in all the elementary logical operations simply by updating the Pauli frame. This means the error-mitigated Clifford operations and Pauli measurements are available for computation. Because they form a complete basis for mitigating arbitrary errors [20], we can mitigate approximation errors due to the Solovay-Kitaev decomposition. Since the approximation errors can be exactly known in advance, an unbiased estimator free from approximation error can be obtained, as will be explained in Sec. III B 3.
To make our QEM procedure work, the accuracy and efficiency of the decoding-error estimation are vital. We show that the decoding errors can be estimated with gate set tomography under an appropriate choice of gauge, taking state-preparation and measurement errors into account. We also show in Sec. III C that the cost of gate set tomography is acceptable compared with the main computation of FTQC for estimating expectation values. In this section, we further show a refined gate set tomography suited to our framework that significantly improves the estimation for logical Clifford gates.

Quantum error mitigation for decoding errors
We can express the inverse channel of the non-uniform depolarizing channel, Eq. (8), as a linear combination of Pauli operations:

$\mathcal{N}_{\mathrm{dec}}^{-1}(\rho) = \gamma_{\mathrm{dec}} \sum_g \mathrm{sgn}(\eta_g)\, q_g\, g \rho g^\dagger$. (12)

Refer to Appendix B for a concrete expression of each coefficient $\eta_g$, $\gamma_{\mathrm{dec}}$, and $q_g$. Thus, we can suppress the errors by applying probabilistic error cancellation only with Pauli operators after the decoding processes. The QEM cost for decoding errors in the entire circuit can be expressed as

$\gamma_{\mathrm{dec}}^{\mathrm{tot}} = \prod_{k=1}^{N_{\mathrm{dec}}} \gamma_{\mathrm{dec}}^{(k)}$, (13)

where $N_{\mathrm{dec}}$ is the number of logical gates and $\gamma_{\mathrm{dec}}^{(k)}$ is the QEM cost of the $k$-th operation.
Note that probabilistic error cancellation usually applies the recovery operations of QEM immediately after the noisy gates [19,20]; however, because we perform only logical Pauli operations as the recovery operations for decoding errors, they can be done simply by updating the Pauli frame instead of directly applying them after the noisy gates. Finally, the measurement result is post-processed according to the state of the Pauli frame, the parity corresponding to the applied recovery operations, and the QEM cost. Thus, unlike in probabilistic error cancellation for NISQ devices, the logical noise due to decoding errors can be mitigated without any additional noise from the recovery operations. A schematic figure is shown in Fig. 3. Note that the information on the QEM cost and the parity is used only when the final measurement result is obtained; the outcome of a destructive logical Pauli measurement is flipped depending only on the state of the Pauli frame. Whether we can mitigate decoding errors of complicated logical operations such as magic-state preparation, gate teleportation, and adaptive Clifford gates by simply updating the Pauli frame is not trivial; therefore, we provide a concrete procedure for actual devices and Pauli frames in Appendix E.
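For a single logical qubit, the quasi-probability inverse of a stochastic Pauli channel can be computed in closed form from the estimated probabilities $\{p_g\}$. A sketch with assumed example probabilities (stand-ins for tomographically estimated decoding errors):

```python
import numpy as np

# Closed-form quasi-probability inverse of a single-qubit stochastic Pauli
# channel N(rho) = sum_g p_g g rho g, g in {I, X, Y, Z}.  The probabilities
# below are assumed stand-ins for tomographically estimated decoding errors.
p_vec = np.array([0.995, 0.002, 0.001, 0.002])   # p_I, p_X, p_Y, p_Z

# chi[g, P] = +1 if Pauli g commutes with Pauli P, -1 if they anticommute.
chi = np.array([[1,  1,  1,  1],
                [1,  1, -1, -1],
                [1, -1,  1, -1],
                [1, -1, -1,  1]], dtype=float)

lam = chi.T @ p_vec            # Pauli-transfer eigenvalues of N
eta = (chi @ (1.0 / lam)) / 4  # quasi-probabilities of N^{-1} (uses chi @ chi = 4 I)

gamma_dec = np.abs(eta).sum()  # QEM cost of inverting one noisy operation
q = np.abs(eta) / gamma_dec    # sampling distribution over recovery Paulis

# Composing N^{-1} with N must give the identity channel eigenvalue-wise.
assert np.allclose((chi.T @ eta) * lam, 1.0)
```

For these numbers, $p_{\mathrm{dec}} = 0.005$, and the computed cost agrees with the first-order behavior $\gamma_{\mathrm{dec}} \approx 1 + 2 p_{\mathrm{dec}}$ used in the cost analysis.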
In the case of decoding errors in surface codes, by approximating the QEM cost to the first order of the logical error, we have Refer to Appendix. B for details. Under the assumption that the logical error rate is the same for all the logical operations and p dec N dec = O(1) with p dec → +0, the QEM cost γ tot dec for the entire quantum circuit can be shown to be exactly equal to e 2p dec N dec on the basis of the argument in Sec. II A. Thus, by using Eqs. (9) and (10), we obtain FIG. 3. Schematic figure for the Pauli frame incorporating QEM. If a QEM recovery operation is a Pauli operation, it is not directly applied to the quantum computer but rather the Pauli frame is updated instead. The parity is also updated in accordance with the generated recovery operations of QEM.
Here, we denote the parity corresponding to the QEM recovery operation as pa in the figure. If a QEM recovery operation is not a Pauli operator, it is performed physically. Then measurement outcomes are then post-processed depending on the Pauli frame, parity, and QEM cost.
which results in the total QEM sampling overhead $\Gamma_{\rm dec} = (\gamma_{\rm dec}^{\rm tot})^2 = \exp[4 N_{\rm dec}\, p\, (p/p_{\rm th})^{(d-1)/2}]$ (15). Notice that Eq. (15) clearly shows a trade-off relationship between the sampling overhead and the code distance, i.e., the number of physical qubits.
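A back-of-the-envelope calculation illustrates this trade-off. The numbers below ($p$, $p_{\rm th}$, $N_{\rm dec}$) are example values of our own choosing, not the paper's simulation parameters, but they show how the overhead $\Gamma_{\rm dec} = \exp[4 p_{\rm dec}(d) N_{\rm dec}]$ collapses as the code distance grows.

```python
# Sampling-overhead trade-off of Eq. (15): Gamma falls rapidly with the code
# distance d because p_dec(d) = p * (p/p_th)**((d-1)/2) decays exponentially.
# p, p_th, and N_dec below are illustrative example values.
import math

def p_dec(d, p=0.01, p_th=0.044):
    """Logical error rate at code distance d (Eq. (9)-style scaling)."""
    return p * (p / p_th) ** ((d - 1) / 2)

def sampling_overhead(d, n_dec=1e4, p=0.01, p_th=0.044):
    """Total QEM sampling overhead of Eq. (15) for N_dec decoding steps."""
    return math.exp(4 * p_dec(d, p, p_th) * n_dec)

for d in (5, 7, 9):
    print(d, p_dec(d), sampling_overhead(d))
```

For these example values the overhead is astronomically large at $d = 5$ but drops to order unity by $d = 9$, which is the regime $p_{\rm dec} N_{\rm dec} = O(1)$ in which QEM is practical.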

B. Quantum error mitigation for approximation errors
Unlike decoding errors due to the failure of error correction, we cannot describe errors due to the Solovay-Kitaev decomposition as stochastic Pauli errors. Nevertheless, we can still apply probabilistic error cancellation with negligible overheads. Denote the approximation error map as $\mathcal{N}_{\rm SK}(\rho) = \tilde{U} U^\dagger \rho\, (\tilde{U} U^\dagger)^\dagger$ for an ideal unitary $U$ and its approximation $\tilde{U}$; we invert this approximation error as $\mathcal{N}_{\rm SK}^{-1} = \sum_i q_i \mathcal{B}_i^{(L)}$ with the cost $\gamma_{\rm SK} = \sum_i |q_i|$ (16), where $\{\mathcal{B}_i^{(L)}\}$ denotes recovery operations in the logical space. Note that we can represent any map as a linear combination of Clifford operations and Pauli channels [20], and thus we do not need $T$-gates for mitigating approximation errors. Recovery operations are randomly chosen and applied immediately after each single-qubit logical operation if they are not Pauli operations. In the case of Pauli operations, we can again use the Pauli frame, so that physical operations on the quantum computer are not required, in a similar vein to QEM for decoding errors. Since a single-qubit logical unitary operation consists of several repetitions of Clifford gates and $T$-gate teleportation, the insertion of the recovery operation for probabilistic error cancellation negligibly increases the length of the quantum circuit. In the numerical simulations described in the next section, we will verify that the QEM cost can be approximated as $\gamma_{\rm SK} \simeq 1 + \beta_1 e^{-\beta_2 N_T}$ (17), where $\beta_1$ and $\beta_2$ are constants dependent on the quantum gate and $N_T$ is the number of available $T$-gates. The QEM cost due to approximation errors for the entire circuit can also be represented as $\gamma_{\rm SK}^{\rm tot} = \prod_{k=1}^{N_{\rm SK}} \gamma_{\rm SK}^{(k)}$ (18), where $N_{\rm SK}$ is the total number of recovery operations for mitigating approximation errors in the quantum circuit and $\gamma_{\rm SK}^{(k)}$ is the cost corresponding to the $k$-th recovery operation. By assuming that the cost does not depend on gates, we have the QEM sampling overhead $\Gamma_{\rm SK} = (\gamma_{\rm SK}^{\rm tot})^2 \simeq \exp(2 \beta_1 N_{\rm SK} e^{-\beta_2 N_T})$ (19). This shows there is a trade-off relationship between the sampling overhead and the number of available $T$-gates.
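As a minimal worked example of such a quasi-probability inversion (our own construction, not taken from the paper), consider a coherent $Z$-rotation error $\mathcal{N}(\rho) = V\rho V^\dagger$ with $V = e^{-i\delta Z/2}$. Its inverse decomposes exactly into the identity, conjugation by $Z$, and conjugation by the Clifford phase gate $S$, with real coefficients summing to one and cost $\gamma = 1 + 2\sin\delta$.

```python
# Quasi-probability cancellation of a coherent Z-rotation error using only
# Clifford recovery operations, in the spirit of probabilistic error
# cancellation.  The coefficients q_i are derived from the Pauli-transfer
# picture: q1 + q2 + q3 = 1, q1 - q2 = cos(delta), q3 = -sin(delta).
import numpy as np

delta = 0.1
Z = np.diag([1.0, -1.0])
S = np.diag([1.0, 1.0j])            # phase gate, a Clifford operation
V = np.diag([np.exp(-1j * delta / 2), np.exp(1j * delta / 2)])

q = [(1 + np.cos(delta) + np.sin(delta)) / 2,   # apply identity
     (1 - np.cos(delta) + np.sin(delta)) / 2,   # conjugate by Z
     -np.sin(delta)]                            # conjugate by S
ops = [np.eye(2), Z, S]
gamma = sum(abs(x) for x in q)                  # QEM cost = 1 + 2 sin(delta)

rho = np.array([[0.5, 0.5], [0.5, 0.5]])        # |+><+|, sensitive to Z errors
noisy = V @ rho @ V.conj().T
mitigated = sum(qi * (B @ noisy @ B.conj().T) for qi, B in zip(q, ops))
print(np.allclose(mitigated, rho), gamma)       # exact cancellation
```

Because one coefficient is negative, the mixture is sampled with signs in practice, and the cost $\gamma - 1 = 2\sin\delta$ vanishes as the approximation error $\delta$ shrinks, consistent with the exponential decay of $\gamma_{\rm SK} - 1$ in the $T$-gate count.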
C. Effect of estimation errors of the noise map

Effect of estimation errors on expectation values
While approximation errors can be exactly determined in advance, decoding errors have to be characterized experimentally. Since the logical error probabilities of the decoding errors are small, it is unavoidable that the characterization will contain finite and non-negligible estimation errors. Thus, we need to consider QEM under estimation errors and the efficiency of the characterization of decoding errors.
Let us discuss how estimation errors affect the performance of QEM. Given a perfect characterization of the noise model $\mathcal{N}_k$ for the $k$-th gate, we can realize the inverse operation $\mathcal{N}_k^{-1}$ with probabilistic error cancellation to achieve $\mathcal{N}_k^{-1}\mathcal{N}_k = \mathcal{I}$. If we obtain an incorrect estimation $\tilde{\mathcal{N}}_k \neq \mathcal{N}_k$ of the error process, it leads to an estimation error $\Delta\mathcal{N}_k \equiv \tilde{\mathcal{N}}_k^{-1}\mathcal{N}_k \neq \mathcal{I}$. Now, denoting the ideal process of the $k$-th gate as $\mathcal{U}_k$, the difference between the error-mitigated process $\mathcal{E}_{\rm QEM}$ and the error-free process $\mathcal{E}_{\rm ideal}$ for the entire quantum circuit can be bounded with the diamond norm as $\|\mathcal{E}_{\rm QEM} - \mathcal{E}_{\rm ideal}\|_\diamond \le \sum_k \|\Delta\mathcal{N}_k - \mathcal{I}\|_\diamond \le N_G\,\Delta\varepsilon$, where we used the fact that the diamond norm is subadditive and we denote $\Delta\varepsilon = \max_k \|\Delta\mathcal{N}_k - \mathcal{I}\|_\diamond$. Similarly, the discrepancy between the noisy and ideal processes can be upper-bounded as $\|\mathcal{E}_{\rm noise} - \mathcal{E}_{\rm ideal}\|_\diamond \le N_G\,\varepsilon$ with $\varepsilon = \max_k \|\mathcal{N}_k - \mathcal{I}\|_\diamond$. Because the deviation of the expectation values of an observable $M$ for two processes $\mathcal{E}_1$ and $\mathcal{E}_2$ with the input state $\rho$ can be bounded as $|\mathrm{Tr}[M\mathcal{E}_1(\rho)] - \mathrm{Tr}[M\mathcal{E}_2(\rho)]| \le \|M\|\,\|\mathcal{E}_1 - \mathcal{E}_2\|_\diamond$, where $\|\cdot\|$ is the operator norm, we have $\delta M_{\rm QEM} \le \|M\|\,\Delta\varepsilon\, N_G$ and $\delta M_{\rm noise} \le \|M\|\,\varepsilon\, N_G$. Here, $\delta M_{\rm QEM}$ and $\delta M_{\rm noise}$ are the deviations of the observable with and without error mitigation.
Thus, we can see that QEM is beneficial when we can achieve $r < 1$ for $r \equiv \Delta\varepsilon/\varepsilon$ (20). Note that this discussion does not include sampling errors; i.e., $\delta M$ is the error of the expectation value given infinite samples.

Efficiency of characterization of decoding errors
Regarding the causes of model estimation errors, when we use gate set tomography to characterize the noise model for decoding errors, we need to consider state preparation and measurement (SPAM) errors and the finite statistical error arising from an insufficient number of samples. It has been shown that the effect of SPAM errors can be eliminated in the case of probabilistic error cancellation based on gate set tomography [20]. While a general choice of gauge is not compatible with the Pauli frame, we can modify the scheme of gate set tomography so that it is compatible with QEM with the Pauli frame. Refer to Appendix C for details.
To achieve the accuracy $r$ given in Eq. (20), we need to perform $N_{\rm GST} = O((r\varepsilon)^{-2})$ samplings with gate set tomography [26,38]. Here, we show this cost is acceptable compared with the main part of FTQC; the time required for gate set tomography corresponds to $O(r^{-2} n_q N_G)$ runs of the whole logical circuit used to obtain expectation values, where $n_q$ is the number of logical qubits. Let the time for a single run of the logical circuit of FTQC be $\tau$. The depth of the logical quantum circuit is estimated as $O(N_G n_q^{-1})$, and the time per gate can be roughly approximated as $\tau_{\rm gate} = O(\tau n_q N_G^{-1})$. Then, the time for gate set tomography can be estimated as $T_{\rm GST} = N_{\rm GST}\tau_{\rm gate} = O(r^{-2}\varepsilon^{-2}\tau n_q N_G^{-1})$. In a situation where QEM is useful, we have $\varepsilon N_G = O(1)$ [39]. Thus, we can conclude that to use QEM to decrease the logical error rate $p_{\rm dec}$ to $r p_{\rm dec}$, we need gate set tomography as a pre-computation that takes $O(r^{-2} n_q N_G)$ times longer than a single circuit run of FTQC. The numbers of logical gates $N_G$ and logical qubits $n_q$ are expected to grow polynomially with the problem size, and FTQC circuits will be repeated on the order of $O(r^{-2})$ times to make the statistical fluctuation of expectation values smaller than the reduced bias. Accordingly, while the estimation cost of the noise map adds another overhead to FTQC depending on the required accuracy, it is performed in a time that grows polynomially with the problem size and without requiring additional physical qubits. We remark that when we assume the noise properties of the quantum devices are uniform, we can perform the sampling for gate set tomography in parallel; in the scenario that we use all the logical qubits for characterization and fully parallelize the sampling procedure, the time for gate set tomography is reduced by a factor of $n_q$.
To make the characterization of the noise even more efficient, we propose an improved gate set tomography for decoding errors of Clifford processes that is fast and compatible with the Pauli frame. See Appendix C for the details of this scheme. The number of measurements is reduced to $N_{\rm GST} = O(r^{-2}\varepsilon^{-1})$, which makes the cost of the pre-computation $O(n_q r^{-2})$. Thus, as long as $r$ is not too small, the time for characterization is expected to be relatively short. While our efficient gate set tomography cannot be applied to the characterization of the $T$-gate preparation, several ways to reduce the cost of estimating $T$-gate errors can be considered. Since the error of a logical $T$-gate depends on the physical $T$-gate, and the process of injection and distillation is constructed from a circuit with few $T$-gates, there may be an efficient way to numerically estimate the noise of the logical $T$-gate from the characterization of the physical $T$-gate together with efficient simulation of quantum circuits dominated by Clifford gates [40]. There may also be a way to mitigate $T$-gate errors by temporarily expanding the code distance or increasing the distillation depth for the $T$-gate. The cost of gate set tomography might also be reduced by utilizing long-sequence GST [38], i.e., repeating several $T$-gates to amplify a small error rate to a large value. Ref. [41] shows that if decoding errors of logical Clifford gates are negligible, one can reliably twirl the noise of a $T$-gate and perform efficient process tomography on it by repeating $T$-gates. Nevertheless, it is still an open problem whether there exists a more efficient gate set tomography on the logical space with imperfect logical Clifford gates.

Effective increase in code distance by quantum error mitigation under estimation errors
We can regard QEM as effectively increasing the code distance. Suppose that we can achieve an $r$ times smaller logical error rate $p_{\rm eff} = r p_{\rm dec}$ via QEM. Since the logical error rate is roughly approximated in terms of the code distance as $p_{\rm dec}(d) = p (p/p_{\rm th})^{(d-1)/2}$, QEM effectively achieves a larger code distance $d'$ with $p_{\rm eff} = p_{\rm dec}(d')$ without increasing the number of physical qubits. The effective increase in the code distance via QEM can be derived as $d' - d = 2\ln r/\ln(p/p_{\rm th})$. Therefore, by setting $r = (p/p_{\rm th})^x$, we can effectively increase the code distance by $2x$. Note that, as discussed in the previous sections, we need $\exp(O(N_{\rm dec} p_{\rm dec})) = \exp(O(1))$ times more repetitions to achieve the same precision as in the error-free case. It is also worth noting that we can increase the number of available logical qubits via QEM: if we are allowed to use a fixed number of physical qubits, the decrease in the required code distance means that we can allocate more logical qubits; that is, we can convert code distance into the number of logical qubits.
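The distance-gain formula above reduces to simple arithmetic; the sketch below (with example values of $p$ and $p_{\rm th}$ of our choosing) computes the effective gain $d' - d$ for a given error-rate reduction $r$.

```python
# Effective code-distance gain from QEM: with p_dec(d) = p*(p/p_th)**((d-1)/2),
# an r-fold reduction of the logical error rate is equivalent to an increase
# of d' - d = 2*ln(r)/ln(p/p_th).  p and p_th below are example values.
import math

def distance_gain(r, p=0.01, p_th=0.044):
    """Extra code distance equivalent to an r-fold logical error reduction."""
    return 2 * math.log(r) / math.log(p / p_th)

# r = (p/p_th)**x gives a gain of exactly 2x:
print(distance_gain((0.01 / 0.044) ** 2))   # exactly 4.0
print(distance_gain(0.1))                   # ~3.1 for a tenfold reduction
```

For these parameters a tenfold reduction of the logical error rate buys roughly three units of code distance, i.e., most of a $d \to d + 4$ upgrade without any extra physical qubits.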

IV. NUMERICAL ANALYSIS
We numerically evaluated how well error mitigation suppresses the qubit overhead in FTQC. (See Appendix G for the detailed settings and the definitions of the terms used in the numerical analysis.)

A. Quantum error mitigation for decoding errors

Cost analysis
We evaluated the performance of QEM on decoding errors occurring during logical operations, where we assumed FTQC with surface codes and lattice surgery. (See Appendix D for details about surface codes.) For simplicity, we assumed a single-qubit depolarizing noise model for each data and measurement qubit at the beginning of each cycle, which corresponds to a phenomenological noise model [42,43]. To determine the failure probability of decoding with faulty syndrome-measurement cycles, we further assumed perfect syndrome measurements in the 0-th and d-th cycles. Then, we checked whether any logical Pauli errors occurred during the d cycles. The recovery operations were estimated from the syndrome values by using the minimum-weight perfect-matching decoder [44,45]. We evaluated the logical error probabilities of Pauli-X, Y, and Z and computed the QEM cost for d cycles according to Eq. (B5). Despite our assumption of perfect syndrome measurements in the 0-th and d-th cycles, we expect that the numerical results are asymptotically equivalent to those without this assumption when d is sufficiently large. The logical error probabilities of Pauli-X, Y, and Z for a single logical qubit were calculated for several code distances. The sum of these probabilities is plotted against the physical error rate in Fig. 4(a). The logical error probability decreases exponentially with the code distance when the physical error probability p is smaller than a threshold value. Fig. 4(b) plots the logical error probabilities around the threshold value, which is approximately p_th = 0.044.
We computed the QEM costs for decoding errors γ_dec corresponding to d cycles for different code distances and compared them with the first-order approximation shown in Eq. (13). The numerical results are plotted in Fig. 4(c), where the solid lines correspond to the approximation of the QEM cost in Eq. (13). We can see that the QEM costs decay exponentially with the code distance and show a threshold behavior like that of the logical error probabilities, and that they agree well with the approximation when the physical error rate is sufficiently small.

Performance analysis
Next, we examined the performance of QEM on decoding errors in large-scale quantum circuits, using a 100-qubit logical random Clifford circuit with 100 layers. We remark that since a linear combination of Clifford operations can represent arbitrary quantum operations, it is sufficient to demonstrate the performance of QEM for Clifford operations [46]. In each layer, we simultaneously applied randomly generated single-qubit Clifford gates to all qubits and then applied 50 CNOT gates, each acting on two randomly chosen qubits. We can simulate these protocols efficiently by using an efficient algorithm for stabilizer circuits [30,47]. As an observable, we chose a Pauli operator whose measurement outcome is always unity for the final state vector if there are no physical errors; i.e., the final state is a +1 eigenstate of the chosen observable. The numerical simulations assumed a non-uniform single-qubit depolarizing logical error in the form of Eq. (8) for each layer. The logical error probabilities of the depolarizing channels were determined according to the numerical results of the previous section: we chose p = 0.01 and obtained the logical error probabilities by extrapolation. The estimated logical error probabilities are summarized in Table I. Note that while the required number of cycles for Clifford operations scales linearly with the code distance, the actual number of cycles and the logical error probability per logical gate depend on the Clifford operations. In particular, logical CNOT gates with lattice surgery may induce correlated logical Pauli errors on multiple logical qubits. Nevertheless, we used a simplified error model, since we expect this evaluation to capture the basic properties of QEM performance.
We numerically performed a series of $10^4$ experiments, each of which computed an expectation value from $10^4$ single-shot measurements. The results are shown in Fig. 5(a), and the data around the ideal expectation value are shown in Fig. 5(b). Fig. 5(c) shows the mean value of the $10^4$ samples for each logical error probability together with its standard deviation (the error bar). We can see that there was a large bias in the expectation value without QEM but no bias when the QEM technique was employed, although the standard deviation was amplified. The standard deviation of the expectation value for d = 5 was 14.2; the corresponding distribution is therefore too broad to be visible in the histogram. The mean number of Pauli errors in the whole quantum circuit was 3.6 for d = 5 and 0.28 for d = 7. Thus, as explained in Sec. II A, QEM is useful when the mean number of Pauli errors in the circuit is less than unity. These results show that the QEM technique is effective for large-scale quantum computing and that it enables us to increase the effective code distance.
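The unbiased-but-noisier behavior seen in Fig. 5 can be reproduced with a toy Monte Carlo of our own (a single stochastic X-error channel rather than the 100-qubit circuit): without mitigation the estimate of $\langle Z\rangle$ on $|0\rangle$ is biased to $1 - 2p$; with probabilistic error cancellation the bias vanishes while each shot carries the amplified weight $\pm\gamma$.

```python
# Toy probabilistic error cancellation for one stochastic X-error channel.
# Quasi-probabilities of the inverse: q1 = (1-p)/(1-2p), q2 = -p/(1-2p),
# cost gamma = |q1| + |q2| = 1/(1-2p).
import random

def estimate(p, shots, mitigate, rng):
    q1 = (1 - p) / (1 - 2 * p)       # "do nothing" quasi-probability
    q2 = -p / (1 - 2 * p)            # "apply X" quasi-probability (negative)
    gamma = abs(q1) + abs(q2)        # QEM cost
    total = 0.0
    for _ in range(shots):
        z = 1                        # ideal <Z> outcome on |0>
        if rng.random() < p:         # stochastic X error flips the outcome
            z = -z
        if mitigate:
            # sample a recovery with probability |q_i|/gamma, keep its sign
            if rng.random() < abs(q2) / gamma:
                z = -z               # recovery X flips the outcome again
                total += gamma * (-1) * z   # sign(q2) = -1
            else:
                total += gamma * z          # sign(q1) = +1
        else:
            total += z
    return total / shots

rng = random.Random(1)
print(estimate(0.05, 200000, False, rng))  # biased toward 1 - 2p = 0.9
print(estimate(0.05, 200000, True, rng))   # unbiased, near 1.0
```

The mitigated estimator averages to the ideal value because the signed, $\gamma$-weighted samples implement the inverse quasi-probability map exactly; the price is a variance inflated by roughly $\gamma^2$, mirroring the widened histograms in Fig. 5.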
B. Quantum error mitigation for approximation errors

Cost analysis
Next, we studied the performance of QEM when the Solovay-Kitaev decomposition is used. Since the actual QEM cost γ_SK depends on the target unitary operator, we drew samples of unitary operations $U$ from the Haar-random distribution $\mu_H$. Then, we decomposed each unitary gate in the form $U = R_Z(\theta_1)\sqrt{X} R_Z(\theta_2)\sqrt{X} R_Z(\theta_3)$ up to a global phase, where $\sqrt{X} = HSH$ is a Clifford operation. We used the improved Solovay-Kitaev algorithm from Ref. [36]. This algorithm enables us to approximate an arbitrary Pauli-Z rotation $R_Z(\theta) = \exp(i\frac{\theta}{2}Z)$ with an operator $\tilde{U}$ described as a sequence of Clifford operations and $T$-gates. We set the maximum count of $T$-gates for each decomposition of the three Pauli-Z rotations to check the trade-off relation between the $T$-gate count and the approximation accuracy. Fig. 6(a) shows the histogram of errors evaluated with the operator norm $\|U - \tilde{U}\|$. As expected, the approximation errors decrease exponentially.
Next, we calculated the QEM cost by using Eq. (16). Fig. 6(b) shows the histogram of QEM costs γ_SK, and Fig. 6(c) plots the QEM cost versus the number of allowed $T$-gates. We can see that $\gamma_{\rm SK} - 1$ decreases exponentially with the number of $T$-gates, and its variance also decreases exponentially. We fitted the QEM cost γ_SK with Eq. (17) and obtained $\beta_1 = 3.9(5)$ and $\beta_2 = 0.072(1)$.
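Because $\gamma_{\rm SK} - 1$ decays exponentially in the $T$ count, the fit of Eq. (17) reduces to a linear regression of $\log(\gamma_{\rm SK} - 1)$ against $N_T$. The sketch below demonstrates this on synthetic data generated from the fitted values; it is our own illustration, not the paper's fitting code or data.

```python
# Recovering (beta1, beta2) of gamma_SK ~ 1 + beta1*exp(-beta2*N_T) by a
# linear fit in log space.  Data are synthetic, seeded for reproducibility.
import numpy as np

beta1_true, beta2_true = 3.9, 0.072
n_t = np.arange(24, 61, 4)                      # allowed T-gate counts
rng = np.random.default_rng(0)
gamma = 1 + beta1_true * np.exp(-beta2_true * n_t) * rng.normal(1.0, 0.02, n_t.size)

slope, intercept = np.polyfit(n_t, np.log(gamma - 1), 1)
beta2_fit, beta1_fit = -slope, np.exp(intercept)
print(beta1_fit, beta2_fit)                     # close to (3.9, 0.072)
```

With only a few percent of multiplicative noise, the log-linear fit recovers both constants to within a few percent, which is why the exponential form of Eq. (17) can be verified directly from the numerical cost data.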

Performance analysis
Next, we evaluated the performance of QEM for approximation errors due to the Solovay-Kitaev decomposition in a simulation of a SWAP test circuit with 7 qubits. A SWAP test circuit evaluates the overlap of two input states ρ and σ as Tr[ρσ] by measuring ancilla qubits [48]. We set one of the input states to the ideal state and the other to the state affected by approximation errors. A schematic diagram is shown in Fig. 7.
The ideal state was generated by using random quantum circuits composed of three layers. In each layer, random single-qubit unitary operations were simultaneously applied; then a CNOT gate acted on two randomly chosen qubits. The same random quantum circuit was applied to the approximate state by applying the Solovay-Kitaev decomposition to each single-qubit rotation. In this case, if there are no approximation errors, we necessarily obtain +1 as measurement outcomes since the input states are the same; hence the expectation value is also +1 for the Pauli-Z operator of the ancilla qubit. On the other hand, the expectation value becomes smaller than unity when the inner product is reduced by approximation errors. Since approximation errors cannot be treated in the framework of stabilizer simulation, we simulated the quantum circuits by directly updating the state vector after each gate.
We numerically performed a series of $10^4$ experiments and computed the expectation values from $10^4$ single-shot measurements. The number of allowed $T$-gates in each Solovay-Kitaev decomposition of the single-qubit unitary operations was varied from 24 to 60. Fig. 8(a) shows the results, while Fig. 8(b) shows the data around the ideal expectation value. Moreover, Fig. 8(c) shows the mean value of the $10^4$ samples for each logical error probability together with its standard deviation (the error bar). Note that the standard deviation with 21 $T$-gates is 4.65. We can see that our QEM technique successfully removed the bias from the expectation value in the noisy cases. Compared with the QEM cost for decoding errors, we obtained a larger QEM cost in this case. This is consistent with the results reported in Ref. [20], which indicate that the QEM cost for unitary errors tends to be larger than that for stochastic errors.
This problem may be alleviated by performing several Solovay-Kitaev decompositions with the same accuracy, thereby randomizing the approximation errors and removing the coherent component of the noise. Note that, with a sufficiently large sample size, our QEM technique enables the effective number of $T$-gates to be increased by inserting additional Clifford gates and Pauli channels and conducting repeated sampling, with negligible additional hardware requirements.

C. Quantum error mitigation with estimation errors
Probabilistic error cancellation assumes that the noise maps to be canceled are known in advance. While we can determine the approximation error of the Solovay-Kitaev decomposition within numerical precision, it is hard to exactly characterize the noise maps of the decoding errors because gate set tomography is affected by finite sampling, as discussed in Sec. III C. In this section, we numerically evaluated the performance of our framework in the case of finite estimation errors.
For the benchmarks, we chose the same quantum circuit, noise model, and observable as in Sec. IV A 2: we evaluated the expectation value for a 100-qubit noisy Clifford circuit that is unity if there is no noise. A non-uniform depolarizing channel in which Pauli-X, Y, Z occur with probabilities $(p_x, p_y, p_z)$ was inserted after each gate, with the probabilities obtained from the simulation of surface codes shown in Table I. The point of difference from the previous simulations is that these probabilities were over- or under-estimated as $((1+r)p_x, (1+r)p_y, (1+r)p_z)$ $(r > -1)$ in probabilistic error cancellation. Here, r = 0 corresponds to an exact characterization and perfect error mitigation, while r = -1 corresponds to FTQC without error mitigation. As in the discussion in Sec. III C, we expect the distribution to be such that its mean value is the same as in a simulation with the residual noise model $(|r|p_x, |r|p_y, |r|p_z)$ and its variance is approximately Γ, the overhead of QEM determined by the estimated noise model $((1+r)p_x, (1+r)p_y, (1+r)p_z)$. The histograms for the case of finite estimation errors parametrized by r for d = 7 and d = 9 are plotted in Fig. 9(a), and the mean values and standard deviations are plotted as functions of r in Fig. 9(b). We can see that the residual bias increases exponentially with |r|. With infinite samples, QEM is beneficial when r is sufficiently smaller than unity. Comparing under-estimation (r < 0) and over-estimation (r > 0) with the same absolute value |r|, we find that over-estimation leads to a larger variance than under-estimation does, while the amount of bias in the expectation values is similar. Since QEM thus prefers under-estimation to over-estimation, we conclude that characterization methods with a weighted penalty may lead to a further improvement of QEM.
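The asymmetry between under- and over-estimation can be checked analytically in the single-channel toy model we used earlier (our own illustration, not the 100-qubit simulation): cancelling a stochastic X error of true rate $p$ with a misestimated rate $(1+r)p$ leaves a residual multiplicative bias, and over-estimation carries a strictly larger per-channel cost.

```python
# Residual bias and cost when PEC uses a misestimated error rate (1+r)*p.
# The true channel scales <Z> by (1-2p); the estimated inverse rescales it
# by 1/(1-2*(1+r)*p), so the mitigated value is the ratio of the two.
def residual_and_cost(p, r):
    p_est = (1 + r) * p
    residual = (1 - 2 * p) / (1 - 2 * p_est)  # mitigated <Z>, ideal value 1
    gamma = 1 / (1 - 2 * p_est)               # per-channel QEM cost
    return residual, gamma

for r in (-0.2, 0.0, 0.2):
    print(r, residual_and_cost(0.01, r))
```

For small $p$ the bias magnitude $|{\rm residual} - 1| \approx 2|r|p$ is symmetric in $\pm r$, while the cost $\gamma$ (and hence the sampling variance) is larger for $r > 0$, consistent with the behavior observed in Fig. 9.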
FIG. 7. We calculated the overlap between randomly generated states and their approximations generated by the Solovay-Kitaev algorithm. The random circuit was composed of three layers, each of which consisted of random single-qubit rotation gates and CNOT gates acting on two randomly chosen qubits.

D. Practical utility of quantum error mitigation for FTQC
While we have shown that our method effectively decreases the hardware requirements of FTQC with the examples of random Clifford and SWAP-test circuits, the practical utility of QEM in the regime where FTQC is used for useful applications with a quantum advantage is also vital. In this section, we discuss the enhancement of computation accuracy in this regime with our protocol.
We estimated how many logical Clifford operations $N_G$ are required in the practical regime from existing resource estimations. Note that there are other noise sources, such as imperfect $T$-gate preparation, that can be counted as the overhead of logical operations in distillation processes, and thus we counted their effects as decoding errors. We considered two scenarios for the evaluation: an optimistic one and a realistic one. The optimistic scenario is the case of lightweight applications mainly for showing a quantum advantage, i.e., applications with the minimum possible $N_G$ that cannot be simulated with existing classical computers. We referred to the analyses of Refs. [14,49] to estimate the maximum problem size tractable with existing classical computers. According to them, we expect that quantum circuits with depth 100 and 100 logical qubits are sufficient to go beyond the limits of classical simulation and that we can achieve this with $N_G \sim 10^4$. The other scenario is the ground-state energy estimation of spin models and chemical molecules, since quantum simulation is expected to be one of the most resource-efficient applications whose quantum advantage is well studied. We picked the expected numbers of gates from the recent state-of-the-art resource estimations in Refs. [9] and [10], which use Trotterization [50] and qubitization [51] as subroutines, respectively. Trotterization is a method to simulate quantum systems by approximating Hamiltonian dynamics with the Trotter decomposition [50], and qubitization [51] is a recently proposed method that constructs the state after time evolution by repetitive applications of Grover-like iterations. According to Tables 1 and 2 in Ref. [9] and Table IV in Ref. [10], approximately $10^8$ $T$-gates are required to simulate a Hubbard model that is hard to simulate with classical computers.
Note that while these algorithms use phase-estimation sampling to obtain the ground energy as binary digits and are not procedures for estimating expectation values, we can still apply QEM to them with small overheads, because these problems can be translated into a series of decision problems. See Appendix H for a detailed explanation. Considering the overhead of executing magic-state distillation and related processes, we adopt $N_G = 10^{10}$ as a pessimistic estimation. Next, we considered how QEM can reduce the effect of decoding errors of logical operations during FTQC.
Without QEM, the logical error probability $p_L$ must be much smaller than the inverse of the number of logical operations $N_G$; in other words, the mean number of logical errors must satisfy $N_G p_L \ll 1$. Denote the allowed mean number of errors without QEM for satisfying the required accuracy as $N_e$, which becomes smaller as the required accuracy becomes stricter. On the other hand, with QEM, the bias of expectation values caused by logical errors whose mean number is below unity can be mitigated with $e^{O(1)}$, i.e., about $e^4 \sim 55$, sampling overhead according to Eq. (15). Thus, the required logical error rate is relaxed from $p_L \sim N_e/N_G$ to $p_L \sim 1/N_G$. The effective error rates of elementary logical operations such as logical Clifford gates decrease exponentially with the code distance. Focusing on the advantage of QEM in the logical space, we can estimate the relation between the problem size $N_G$ and the code distance $d$ in terms of the required mean number of errors without QEM, $N_e$, as shown in Fig. 10(a). In this figure, the code distance required for reliable accuracy is plotted as a function of the number of logical gates required for executing an algorithm for several values of $N_e$. For the calculation, we used Eq. (9) with $p/p_{\rm th} = 0.1$. The number of physical qubits per logical qubit scales as $O(d^2)$ in the case of two-dimensional topological codes. We plot how the number of physical qubits is reduced with QEM, $d_{\rm mit}^2/d_{\rm nmit}^2$, in Fig. 10(b), where $d_{\rm mit}$ and $d_{\rm nmit}$ are the required code distances with and without QEM, respectively. While the improvement of the code distance is constant, the impact of the resource reduction depends on the expected technologies and the size of the problems of interest.
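The comparison behind Fig. 10 amounts to finding the smallest code distance whose logical error rate fits the per-gate error budget, which is $N_e/N_G$ without QEM but roughly $1/N_G$ with QEM. The sketch below is our own arithmetic with example parameters ($p = 2\times 10^{-3}$, $p_{\rm th} = 2\times 10^{-2}$, so $p/p_{\rm th} = 0.1$ as in the text); it is not the paper's resource-estimation code.

```python
# Smallest odd code distance d with p_dec(d) = p*(p/p_th)**((d-1)/2) below
# the per-gate error budget.  Example parameters with p/p_th = 0.1.
import math

def required_distance(budget, p=2e-3, p_th=2e-2):
    d = 3
    while p * (p / p_th) ** ((d - 1) / 2) > budget:
        d += 2
    return d

n_g, n_e = 1e4, 1e-3
d_nmit = required_distance(n_e / n_g)   # budget N_e/N_G without QEM
d_mit = required_distance(1.0 / n_g)    # budget ~1/N_G with QEM
print(d_nmit, d_mit, d_mit**2 / d_nmit**2)  # physical-qubit ratio
```

For these illustrative parameters the distance drops from 11 to 5, i.e., the physical-qubit count per logical qubit falls to about 21% of its unmitigated value, the same order of saving as quoted for the early-FTQC scenario below.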
When we consider applications in the early FTQC era with $N_G \sim 10^4$, according to Fig. 10(a), the required code distance is reduced from about 9 to 4 if $N_e = 10^{-3}$, and the required number of physical qubits is reduced to 21%. This advantage becomes larger as the required accuracy $N_e$ becomes stricter. Even when we take the costs of distillation and lattice surgery into account, FTQC with 100 logical qubits at code distance 4 is estimated to require about $10^4$ physical qubits with QEM, which can be a promising milestone for demonstrating computational supremacy in the logical space. While this advantage becomes comparatively small for promising long-term applications with $N_G \sim 10^{10}$, the code distance is still reduced from 19 to 14, which suppresses the number of physical qubits to 55%. Thus, we can conclude that in both cases, our proposal is expected to drastically alleviate the hardware requirements in practice.
It should be noted that the reduction of the code distance is vital not only for reducing the number of physical qubits but also for relaxing the requirements on error-decoding architectures. To estimate the Pauli errors that occur on physical qubits during FTQC, we need classical peripherals that decode with sufficiently small latency relative to the stabilizer-measurement cycles. However, recent results show that realistic implementations of tractable size can decode up to a code distance of about 11 [11-13]. While this value may be improved in the future, it is clear that the performance of decoding units is another restriction on FTQC. When we assume the tractable code distance is limited to 11, we can use at most $10^5$ logical gates with $N_e = 10^{-3}$, which is just around the limit of classical simulation. On the other hand, the use of QEM increases the number of available logical gates to $10^8$. Thus, our proposal can be the key to pushing the performance of FTQC from the classically simulatable region into the quantum supremacy regime.
While we have only discussed the reduction of the code distance, we can similarly reduce the effective required number of $T$-gates. Although the effective increase of this resource is also a constant factor, as in the case of the code distance, it can be used not only for reducing the total number of generated $T$-gates but also for mitigating fluctuations in the generation throughput of magic states during the FTQC protocol. Since magic-state generation succeeds probabilistically, the number of available magic states per unit time fluctuates statistically and cannot be estimated in advance. We expect our method can obviate this kind of difficult run-time scheduling and make long-time execution of FTQC reliable. Furthermore, in the case of distributed FTQC, logical CNOT gates between distant nodes require entanglement distribution and distillation, which also typically succeed probabilistically. Thus, QEM can be used to alleviate a wide range of difficulties in FTQC.

V. DISCUSSION
We have described a method to effectively decrease errors in FTQC by performing QEM in the logical space. In the case of decoding errors due to insufficient code distances and magic-state distillation, we can perform QEM with small modifications to the quantum circuits. In particular, owing to the Pauli frame, we can perform QEM without implementing any physical operations if the decoding errors are stochastic Pauli maps, whereas QEM operations may induce additional errors when implemented physically in general quantum circuits. In regard to the approximation errors due to the Solovay-Kitaev decomposition, we cannot always use the Pauli frame because QEM employs not only Pauli operations but also Clifford operations; however, since Clifford operations can be performed efficiently in FTQC and the number of decoding processes for error correction is much larger than the gate count in Solovay-Kitaev decompositions, this overhead is negligible. We have verified the trade-offs between the QEM cost and both the code distance and the number of $T$-gates. Furthermore, we have estimated the sampling cost of gate set tomography for obtaining the decoding noise map to the required accuracy and clarified that our approach enables quantum computing beyond the achievable code distance. We have numerically compared computation with and without QEM on FTQC with the same number of samples and have shown the advantage of QEM even in the presence of finite estimation errors. We have also estimated the required resources with and without QEM in the early-FTQC era and have shown that the required number of physical qubits can be reduced by tens of percent.
It should be noted that this is the first result to clearly show that QEM can dramatically improve the computation accuracy for useful applications under realistic assumptions, as we have demonstrated with the example of quantum simulation on FTQC. This is because the computational advantage of typical NISQ algorithms such as variational algorithms is only empirically assumed and the required runtime of such algorithms has not been revealed, whereas the accuracy of FTQC algorithms can be estimated reliably depending on the complexity of the problem. Accordingly, the usefulness of quantum error mitigation can be clearly discussed in the FTQC scenario.
Another important aspect of providing the theory and implementation of QEM for FTQC rather than for NISQ computing is as follows. While it is known that QEM is most effective when the mean number of errors during the computation is on the order of unity [20], this criterion is not necessarily satisfied in the NISQ regime, depending on the problem size. On the other hand, because we are allowed to tune code distances, magic-state distillation levels, and the number of $T$-gates per Solovay-Kitaev decomposition in FTQC, it is highly likely that we can satisfy this criterion. Thus, the main drawback of QEM, namely the exponential growth of the sampling overhead, can be circumvented, and we can find highly practical regimes where QEM helps enhance the computation accuracy. Accordingly, we can conclude that the theory of QEM on FTQC is more versatile than that for NISQ computing.
As mentioned in the main text, promising applications of our method are quantum phase estimation algorithms and Hamiltonian simulation algorithms for investigating quantum many-body dynamics. There are algorithmic errors in the Trotter decomposition [8] and in recently proposed methods such as the Taylor-series approach [52] and quantum signal processing [51,53]. In Refs. [54,55], it is shown that such algorithmic errors can be mitigated by extrapolation. Since algorithmic errors can be controlled by changing the simulation accuracy, this technique can also be naturally incorporated into an FTQC scenario. Thus, the dominant errors in FTQC can be compensated via QEM.
The first generation of FTQC may not be sufficiently large for naively solving large and useful problems. While distributed quantum computing is the most straightforward architecture for increasing the total number of qubits, it requires interconnections between quantum nodes, which induce additional overheads for entanglement distillation. Thus, a sufficient number of distilled entangled pairs may not always be available for distributed FTQC. In this context, techniques developed in the NISQ era [56,57] for solving larger problems with small NISQ computers may also be useful in the middle-term FTQC era. Our work is the first proposal that makes the best of a technique tailored for NISQ devices in the context of FTQC.
Finally, we should discuss the difference between our scheme and the related work of McClean et al. [58], which combines QEM with quantum error correction. Their method implements quantum error correction for NISQ devices via classical post-processing in the case that experimentalists cannot implement stabilizer measurements because of the limited connectivity and large error rates of NISQ devices. Although their method enables the state to be projected onto the code space via quantum subspace expansion [59], logical errors cannot be fully eliminated. On the other hand, our scheme assumes that FTQC can be performed but that the number of qubits and $T$-gates cannot be increased indefinitely. The remarkable advantage of our method is that, given a good characterization of the noise model, we can fully eliminate the decoding errors and approximation errors by using a greater number of measurements at negligible hardware overhead.

NOTE ADDED
After we uploaded this work to arXiv, three relevant works appeared that share the concept of incorporating quantum error mitigation into fault-tolerant quantum computing to relax hardware requirements at the cost of sampling overheads [41,60,61]. Ref. [60] shows quantum error mitigation for encoded qubits, but focuses on concatenated codes rather than topological codes. Ref. [41] uses quantum error mitigation to implement $T$-gates without magic-state distillation and shows efficient characterization methods for $T$-gate errors under the assumption that logical Clifford operations are perfect. Ref. [41] also discusses how to reduce the cost of implementing $T$-gates by using the concept of robustness of magic.
Compared to these works, we emphasize that our framework considers a different scenario in which logical Clifford operations are imperfect, so that FTQC suffers from decoding errors in the logical Clifford procedure, logical noise on prepared magic states, and an insufficient magic-state supply. This difference makes our framework versatile in the early-FTQC era. While we provided a consistent analysis of gate set tomography with noisy logical Clifford gates, we have refined the treatment of the noise of magic-state preparation in gate set tomography, motivated by Refs. [41,61]. We would like to thank Takanori Sugiyama for a fruitful discussion on gate set tomography. We acknowledge useful discussions with Zhenyu Cai, Xiao Yuan, Rui Asaoka, and Kaoru Yamamoto.

Appendix A: Pauli transfer matrix
Any quantum map can be represented as a matrix called a Pauli transfer matrix (PTM). Suppose we perform a quantum process $\mathcal{E}$ on $n$ qubits that maps an $n$-qubit density matrix to another density matrix. We denote the set of $n$-qubit Pauli operators as $P^{(n)} = \{I, X, Y, Z\}^{\otimes n}$. The set of $n$-qubit Pauli operators forms a basis of $4^n \times 4^n$ matrices, and the elements are mutually orthonormal, $\frac{1}{d} \mathrm{Tr}[P_i P_j] = \delta_{ij}$, for $d = 2^n$. The PTM representation of the process $\mathcal{E}$ is defined with the Pauli basis as
$$ M(\mathcal{E})_{ij} = \frac{1}{d} \mathrm{Tr}[P_i \mathcal{E}(P_j)], $$
where $P_i$ is the $i$-th element of $P^{(n)}$. Note that since any physical process maps a self-adjoint operator to another self-adjoint operator, all the elements of the Pauli transfer matrix are real. The density matrix can also be represented as a column vector in the Pauli basis,
$$ |\rho\rangle\rangle_j = \mathrm{Tr}[P_j \rho]. $$
Note that the element of the vector corresponding to $P_j = I$ is the trace of $\rho$. A measurement of an observable $O$ can be mapped to a row vector,
$$ \langle\langle O|_i = \frac{1}{d} \mathrm{Tr}[O P_i]. $$
The PTM representation then satisfies
$$ \mathrm{Tr}[O \mathcal{E}(\rho)] = \langle\langle O| M(\mathcal{E}) |\rho\rangle\rangle. $$
There are several important properties of the PTM representation. Given a composite map $\mathcal{E}_2 \circ \mathcal{E}_1$, we have $M(\mathcal{E}_2 \circ \mathcal{E}_1) = M(\mathcal{E}_2) M(\mathcal{E}_1)$. Due to the linearity of the PTM representation, for the map $\mathcal{E}: \rho \mapsto \sum_k q_k \mathcal{E}_k(\rho)$, we have $M(\mathcal{E}) = \sum_k q_k M(\mathcal{E}_k)$. When no confusion is possible, we will simply write the Pauli transfer matrix $M(\mathcal{E})$ as $\mathcal{E}$.
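The definitions above can be made concrete in a few lines of code. The following sketch (our own illustration using NumPy, not code from this paper) builds single-qubit PTMs and checks the composition and expectation-value identities for a Hadamard gate followed by a depolarizing channel:

```python
import numpy as np

# Single-qubit Pauli basis.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I, X, Y, Z]

def ptm(channel, d=2):
    """Pauli transfer matrix M_ij = (1/d) Tr[P_i E(P_j)]."""
    M = np.zeros((4, 4))
    for i, Pi in enumerate(PAULIS):
        for j, Pj in enumerate(PAULIS):
            M[i, j] = np.real(np.trace(Pi @ channel(Pj))) / d
    return M

def unitary_channel(U):
    return lambda rho: U @ rho @ U.conj().T

def depolarizing(p):
    return lambda rho: (1 - p) * rho + p / 3 * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)

H = (X + Z) / np.sqrt(2)
M_H = ptm(unitary_channel(H))
M_D = ptm(depolarizing(0.1))

# Composition property: M(E2 ∘ E1) = M(E2) M(E1).
composed = lambda rho: depolarizing(0.1)(unitary_channel(H)(rho))
assert np.allclose(ptm(composed), M_D @ M_H)

# Expectation-value identity: <<O| M(E) |rho>> = Tr[O E(rho)].
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)       # a valid state
rho_vec = np.array([np.trace(P @ rho).real for P in PAULIS])      # |rho>>
O_vec = np.array([np.trace(Z @ P).real / 2 for P in PAULIS])      # <<Z|
assert np.isclose(O_vec @ M_D @ rho_vec, np.trace(Z @ depolarizing(0.1)(rho)).real)
```

The convention here (column vectors $\mathrm{Tr}[P_j\rho]$, row vectors $\frac{1}{d}\mathrm{Tr}[OP_i]$) matches the definitions in the text, so the `1/d` factor appears only once in the pairing.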

Appendix B: Coefficients for quasi-probability decomposition
To perform probabilistic error cancellation on an arbitrary $m$-qubit noise map $\mathcal{N}$, we need to decompose the inverse $\mathcal{N}^{-1}$ into a linear combination of physical quantum processes. According to Ref. [20], any TPCP map can be represented as a linear combination of Clifford operations and Pauli channels, because the set of Pauli transfer matrices of Clifford operations and Pauli channels forms a complete basis. Here, we introduce a set $B$ of 16 single-qubit operators (defined explicitly in Ref. [20]), labeled by $j \in \{1, 2, 3\}$ with $(\sigma_1, \sigma_2, \sigma_3) = (X, Y, Z)$ and $\sigma_4 = \sigma_1$. The Pauli transfer matrices of $B^{\otimes m}$ comprise a complete basis of $m$-qubit Pauli transfer matrices; i.e., any quantum map can be represented as a linear combination of Clifford operations and Pauli channels. Since the application of non-Clifford operations requires complicated processes such as magic-state injection, distillation, and teleportation, this property is vital for performing probabilistic error cancellation on an arbitrary noise map in FTQC.
Specifically, when the noise can be modeled as stochastic Pauli errors, we can cancel it with Pauli operations alone. This is because the set of Pauli transfer matrices of Pauli operations forms a basis of the diagonal $m$-qubit Pauli transfer matrices. Since logical Pauli operations can be performed in FTQC solely by updating the Pauli frame, which is stored in a classical memory, we can cancel stochastic logical Pauli noise without acting on the actual quantum device. Suppose that the noise model is described by a Pauli transfer matrix acting on $m$ qubits,
$$ \mathcal{N}_{\mathrm{Pauli}} = (1 - p_{\mathrm{err}}) \mathcal{I} + \sum_{g \neq I^{\otimes m}} p_g \mathcal{P}_g, $$
where $\mathcal{I}$ is the identity map, $\mathcal{P}_g$ is the Pauli transfer matrix of the Pauli operator $g$, $p_g$ ($g \in \{I, X, Y, Z\}^{\otimes m}$) is the probability with which the Pauli error $g$ occurs, and $p_{\mathrm{err}} = \sum_{g \neq I^{\otimes m}} p_g$. This noise can be canceled with the map
$$ \mathcal{N}_{\mathrm{Pauli}}^{-1} = \sum_g \eta_g \mathcal{P}_g, \qquad \eta_g = 4^{-m} \sum_{g'} c(g, g') \Big( \sum_{g''} p_{g''} c(g'', g') \Big)^{-1}. \quad \text{(B4)} $$
Note that $c(g, g')$ is a function of two Pauli operators such that $c(g, g') = 1$ if $gg' = g'g$ and $c(g, g') = -1$ otherwise. In the case of single-qubit Pauli noise, the coefficients and the QEM cost $\gamma_Q$ can be evaluated explicitly from Eq. (B4).
Next, we show that the first-order approximation of the QEM cost for stochastic Pauli noise is given by Eq. (13). Consider the (generally unphysical) map
$$ \tilde{\mathcal{N}}_{\mathrm{Pauli}} = (1 + p_{\mathrm{err}}) \mathcal{I} - \sum_{g \neq I^{\otimes m}} p_g \mathcal{P}_g. $$
We can easily show that
$$ \| \tilde{\mathcal{N}}_{\mathrm{Pauli}} \mathcal{N}_{\mathrm{Pauli}} - \mathcal{I} \| = O(p_{\mathrm{err}}^2), $$
where $\|\cdot\|$ denotes the operator norm of the Pauli transfer matrix. Hence, $\tilde{\mathcal{N}}_{\mathrm{Pauli}}$ approximates the inverse map to first order, $\tilde{\mathcal{N}}_{\mathrm{Pauli}} = \mathcal{N}_{\mathrm{Pauli}}^{-1} + O(p_{\mathrm{err}}^2)$. Accordingly, the QEM cost can be approximated as
$$ \gamma_Q \approx (1 + p_{\mathrm{err}}) + \sum_{g \neq I^{\otimes m}} p_g = 1 + 2 p_{\mathrm{err}} \quad \text{(B8)} $$
when $p_{\mathrm{err}} \ll 1$.
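The coefficients $\eta_g$ and the first-order cost can be checked numerically. The sketch below (our own illustration; the noise probabilities are assumed values, not from this paper) computes the quasi-probabilities and $\gamma_Q = \sum_g |\eta_g|$ for a single-qubit stochastic Pauli channel and reproduces $\gamma_Q \approx 1 + 2 p_{\mathrm{err}}$:

```python
# Single-qubit Pauli labels; c(g, g') = +1 if g and g' commute, -1 otherwise.
LABELS = ["I", "X", "Y", "Z"]

def c(g, gp):
    if "I" in (g, gp) or g == gp:
        return 1
    return -1

def inverse_quasiprob(p):
    """Quasi-probabilities eta_g with N^-1 = sum_g eta_g P_g for the
    single-qubit Pauli channel with error probabilities p[g]."""
    # Diagonal PTM eigenvalues: lambda_{g'} = sum_g p_g c(g, g').
    lam = {gp: sum(p[g] * c(g, gp) for g in LABELS) for gp in LABELS}
    # Invert in the Pauli eigenbasis: eta_g = (1/4) sum_{g'} c(g, g') / lambda_{g'}.
    return {g: sum(c(g, gp) / lam[gp] for gp in LABELS) / 4 for g in LABELS}

p_err = 0.01
p = {"I": 1 - p_err, "X": 0.004, "Y": 0.003, "Z": 0.003}
eta = inverse_quasiprob(p)
gamma = sum(abs(v) for v in eta.values())

# First-order prediction: gamma_Q ≈ 1 + 2 p_err.
print(gamma, 1 + 2 * p_err)
assert abs(gamma - (1 + 2 * p_err)) < 10 * p_err ** 2
```

The inversion works because the PTM of a Pauli channel is diagonal in the Pauli basis with eigenvalues $\lambda_{g'} = \sum_g p_g c(g, g')$, and the characters $c(g, g')$ satisfy the orthogonality relation $\frac{1}{4}\sum_g c(g,g')c(g,g'') = \delta_{g'g''}$.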

Appendix C: Probabilistic error cancellation with gate set tomography
In this section, we explain how probabilistic error cancellation can be implemented using the results of gate set tomography, and we show that this method is compatible with the Pauli frame.

Gate set tomography
Suppose that our goal is to characterize the gate set $\{G_1, G_2, ..., G_{N_s}\}$, which involves at most $N$ qubits. To implement gate set tomography, we measure
$$ \tilde{G}_{ij} = \langle\langle O_i | G | \rho_j \rangle\rangle, $$
where $G$ is one of the gates in the gate set, and $\{\langle\langle O_i|\}$ and $\{|\rho_j\rangle\rangle\}$ are $4^N$ linearly independent observables and states. Note that the measurement results are generally noisy because of state-preparation and measurement (SPAM) errors, and that $\langle\langle I|$ corresponds to a trivial measurement whose outcome is always unity. Collecting the noisy observables into a matrix $A^{(\mathrm{out})}$ whose $i$-th row is $\langle\langle O_i|$, and the noisy states into a matrix $A^{(\mathrm{in})}$ whose $j$-th column is $|\rho_j\rangle\rangle$, we can write
$$ \tilde{G} = A^{(\mathrm{out})} G A^{(\mathrm{in})}. \quad \text{(C3)} $$
Note that $A^{(\mathrm{out})}$ and $A^{(\mathrm{in})}$ are affected by SPAM errors and cannot be measured individually, because their effects cannot be separated from each other. In addition, we replace $G$ with the identity operation to obtain the Gram matrix $g = A^{(\mathrm{out})} A^{(\mathrm{in})}$. In a typical scenario of gate set tomography, the estimate of the process is represented as
$$ G^{\mathrm{est}} = B g^{-1} \tilde{G} B^{-1} = B A^{(\mathrm{in})-1} G A^{(\mathrm{in})} B^{-1}, $$
with $B$ being an arbitrarily chosen invertible matrix. Denoting the error-free $A^{(\mathrm{in})}$ matrix as $A^{(\mathrm{in})(0)}$, a feasible choice is to set $B = A^{(\mathrm{in})(0)}$ when the initialization error is small. We estimate the initial states and measurements as
$$ |\rho_j^{\mathrm{est}}\rangle\rangle = B A^{(\mathrm{in})-1} |\rho_j\rangle\rangle, \qquad \langle\langle O_i^{\mathrm{est}}| = \langle\langle O_i| A^{(\mathrm{in})} B^{-1}, $$
which can be computed from the $B$ and $g$ matrices. Now, by implementing the same procedure for $G_k$ ($k = 1, 2, ..., N_s$) and assuming identical SPAM errors in each experiment, we can estimate $G_k^{\mathrm{est}} = B A^{(\mathrm{in})-1} G_k A^{(\mathrm{in})} B^{-1}$. Although the estimated gate set, initial states, and measurements may differ from the true ones $\{|\rho_j\rangle\rangle, \langle\langle O_i|, G_k\}$ due to SPAM errors, they give the correct expectation value for any gate sequence, since the gauge factors cancel in the product $\langle\langle O_i^{\mathrm{est}}| G_{k_1}^{\mathrm{est}} \cdots G_{k_L}^{\mathrm{est}} |\rho_j^{\mathrm{est}}\rangle\rangle = \langle\langle O_i| G_{k_1} \cdots G_{k_L} |\rho_j\rangle\rangle$. Throughout this paper, we refer to the transformation due to SPAM errors and the $B$ matrix as the gauge and denote it by $G_a$; in the aforementioned case, $G_a = B A^{(\mathrm{in})-1}$. Note that the choice of the gauge is crucial because it affects the set of required QEM operations; on the other hand, the QEM operations are restricted to Pauli operations because of the use of the Pauli frame. Therefore, we need to carefully choose the gauge so that only Pauli operations are used for QEM.
We propose instead to estimate the gate set, initial states, and measurements as
$$ G_k^{\mathrm{est}} = \tilde{G}_k g^{-1} = A^{(\mathrm{out})} G_k A^{(\mathrm{out})-1}, \qquad |\rho_j^{\mathrm{est}}\rangle\rangle = A^{(\mathrm{out})} |\rho_j\rangle\rangle, \qquad \langle\langle O_i^{\mathrm{est}}| = \langle\langle O_i| A^{(\mathrm{out})-1}, $$
where $|\rho_j^{\mathrm{est}}\rangle\rangle$ is the $j$-th column of $g$ and $\langle\langle O_i^{\mathrm{est}}|$ is the $i$-th row of the identity matrix. This formalism corresponds to the choice of gauge $G_a = A^{(\mathrm{out})}$. Later, we will see that this choice of gauge is compatible with QEM with the Pauli frame in the presence of stochastic Pauli errors.
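The gauge choice $G_a = A^{(\mathrm{out})}$ can be verified numerically. Below is a small self-contained sketch (our own toy model, not the paper's code): single-qubit linear-inversion gate set tomography in the PTM picture with stochastic Pauli SPAM and gate noise, in which $G^{\mathrm{est}} = \tilde{G} g^{-1}$ indeed equals $A^{(\mathrm{out})} G A^{(\mathrm{out})-1}$, and the estimated noise map of the gate remains diagonal (i.e., stochastic Pauli) in this gauge:

```python
import numpy as np

# PTM of Hadamard: X <-> Z, Y -> -Y (rows/cols ordered I, X, Y, Z).
U_H = np.array([[1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, -1, 0],
                [0, 1, 0, 0]], dtype=float)

def pauli_noise(px, py, pz):
    # Diagonal PTM of a single-qubit stochastic Pauli channel.
    return np.diag([1.0, 1 - 2 * (py + pz), 1 - 2 * (px + pz), 1 - 2 * (px + py)])

# Informationally complete ideal SPAM: states |0>,|1>,|+>,|+i> as PTM column
# vectors (Tr[P rho]); observables I, X, Y, Z as PTM row vectors (Tr[O P]/2).
A_in0 = np.array([[1, 1, 1, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [1, -1, 0, 0]], dtype=float)
A_out0 = np.eye(4)

# SPAM and gate errors, all modeled as stochastic Pauli noise (assumed rates).
A_out = A_out0 @ pauli_noise(0.01, 0.02, 0.015)  # noise before readout
A_in = pauli_noise(0.02, 0.01, 0.01) @ A_in0     # noise after preparation
G = pauli_noise(0.005, 0.005, 0.01) @ U_H        # noisy gate

# Linear-inversion GST with the gauge G_a = A_out:
g = A_out @ A_in            # Gram matrix  g_ij = <<O_i|rho_j>>
G_tilde = A_out @ G @ A_in  # gate matrix  G~_ij = <<O_i|G|rho_j>>
G_est = G_tilde @ np.linalg.inv(g)

# G_est equals A_out G A_out^{-1}; Pauli PTMs are invariant under this gauge
# because A_out is itself a diagonal (stochastic Pauli) PTM.
assert np.allclose(G_est, A_out @ G @ np.linalg.inv(A_out))
N_est = G_est @ np.linalg.inv(U_H)  # estimated noise map of the gate
```

Because `N_est` stays diagonal, its inverse can be decomposed into Pauli PTMs alone, which is the property exploited in the next subsection.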

Probabilistic error cancellation
Here, we discuss how to derive the quasi-probabilities needed to implement probabilistic error cancellation based on the results of gate set tomography, and we show that this method is compatible with the Pauli frame. Let us assume that we have obtained estimates of the gate set, initial states, and measurements,
$$ G_k^{\mathrm{est}} = G_a G_k G_a^{-1}, \qquad |\rho_j^{\mathrm{est}}\rangle\rangle = G_a |\rho_j\rangle\rangle, \qquad \langle\langle O_i^{\mathrm{est}}| = \langle\langle O_i| G_a^{-1}. $$
Let us also assume that gate set tomography has been applied to the basis operations $\{B_l\}$ for QEM, yielding the estimates $B_l^{\mathrm{est}} = G_a B_l G_a^{-1}$. Denoting the ideal operation by $U_k$, we attempt to invert the estimated noise process $\mathcal{N}_k^{\mathrm{est}} = G_k^{\mathrm{est}} U_k^{-1}$, i.e., we solve
$$ \sum_l q_l B_l^{\mathrm{est}} = \mathcal{N}_k^{\mathrm{est}-1}. $$
In reality, what we implement via probabilistic error cancellation corresponds to $\mathcal{N}_k^{\mathrm{mit}-1} = \sum_l q_l B_l = G_a^{-1} U_k G_a G_k^{-1}$. Therefore, the error-mitigated gate can be expressed as $G_k^{\mathrm{mit}} = \mathcal{N}_k^{\mathrm{mit}-1} G_k = G_a^{-1} U_k G_a$. Similarly, we construct the ideal initial states and measurements in the same gauge, i.e., $G_a^{-1} |\rho_j^{(0)}\rangle\rangle$ and $\langle\langle O_i^{(0)}| G_a$, where the superscript $(0)$ denotes the ideal objects. Accordingly, we obtain the error-mitigated expectation value for a sequence of quantum gates,
$$ \langle\langle O_i^{(0)}| G_a \Big( \prod_k G_a^{-1} U_k G_a \Big) G_a^{-1} |\rho_j^{(0)}\rangle\rangle = \langle\langle O_i^{(0)}| \prod_k U_k |\rho_j^{(0)}\rangle\rangle. $$
Now, let us discuss the compatibility of this method with the Pauli frame. When using the Pauli frame for QEM, only Pauli operations are allowed. This requires that the solution of $\sum_l q_l B_l^{\mathrm{est}} = \mathcal{N}_k^{\mathrm{est}-1}$ contains only Pauli operators. Here, we choose the gauge $G_a = A^{(\mathrm{out})}$. Note that in the presence of stochastic Pauli measurement errors, $A^{(\mathrm{out})}$ is described by a diagonal matrix corresponding to a stochastic Pauli error. We can easily check that Pauli operations are invariant under this gauge and that $\mathcal{N}_k^{\mathrm{est}-1}$ is also described by stochastic Pauli noise when the gates suffer stochastic Pauli errors; thus, only Pauli operators are required for QEM. Similarly, we can show that only Pauli operations are required to realize the ideal state preparations in the presence of stochastic Pauli errors. Therefore, this method is fully compatible with the Pauli frame.
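Once the quasi-probabilities $q_l$ are known, probabilistic error cancellation draws the operation $B_l$ with probability $|q_l|/\gamma$ and reweights each outcome by $\gamma\,\mathrm{sign}(q_l)$. A minimal Monte-Carlo sketch (our own illustration with an assumed single-qubit stochastic Pauli noise model) showing that this estimator is unbiased:

```python
import numpy as np

rng = np.random.default_rng(1)

def pauli_ptm(g):
    # Diagonal PTM of conjugation by a single-qubit Pauli g.
    signs = {"I": (1, 1, 1), "X": (1, -1, -1), "Y": (-1, 1, -1), "Z": (-1, -1, 1)}
    return np.diag([1.0, *map(float, signs[g])])

# Stochastic Pauli noise (assumed rates) and its quasi-probability inverse.
p = {"I": 0.97, "X": 0.01, "Y": 0.01, "Z": 0.01}
N = sum(p[g] * pauli_ptm(g) for g in p)
N_inv = np.linalg.inv(N)
# Project N^-1 onto the Pauli PTM basis: N^-1 = sum_g eta_g P_g.
eta = {g: np.trace(pauli_ptm(g) @ N_inv) / 4 for g in p}

gamma = sum(abs(v) for v in eta.values())
probs = [abs(eta[g]) / gamma for g in p]
keys = list(p)

# Monte-Carlo estimate of <<Z| N^-1 N |rho>> = <<Z|rho>> for a pure state
# with Bloch vector (0.6, 0, 0.8).
rho = np.array([1.0, 0.6, 0.0, 0.8])
Ovec = np.array([0.0, 0.0, 0.0, 1.0])
vals = {g: gamma * np.sign(eta[g]) * (Ovec @ pauli_ptm(g) @ N @ rho) for g in p}

samples = rng.choice(4, size=50000, p=probs)
est = np.mean([vals[keys[i]] for i in samples])
print(est, Ovec @ rho)   # the mitigated estimate converges to the ideal 0.8
```

Note that each per-shot value lies outside $[-1, 1]$ by the factor $\gamma$, which is the origin of the sampling-overhead scaling $\gamma^2$ discussed in the main text.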

Efficiency of gate set tomography for decoding errors
In this section, we discuss the efficiency of gate set tomography. Here, we assume that the noise of the elementary logical operations is modeled as stochastic Pauli noise, so that the noise map takes the form $\mathcal{N}(\rho) = \sum_{g \in \{I,X,Y,Z\}^{\otimes N}} p_g\, g \rho g$, where $p_g = O(p_L)$ for $g \neq I^{\otimes N}$. We evaluate the number of samplings required to estimate $p_g$ to within a precision of $(1 \pm r) p_g$.
When we estimate each element of $\tilde{G}$ and $g$ with standard error $\delta$, it is enough to perform $N_{\mathrm{GST}} = O(\delta^{-2})$ samplings. Suppose we have obtained $\tilde{G}$ and $g$ with small statistical fluctuations, as $\tilde{G} + \Delta\tilde{G}$ and $g + \Delta g$, respectively, where each element of $\Delta\tilde{G}$ and $\Delta g$ is of order $O(\delta)$. We can show that the gate set tomography estimate in Eq. (C8) can then be obtained with small standard errors: $|\rho_j^{\mathrm{est}}\rangle\rangle$ can be read off directly from $g$, and $\langle\langle O_i^{\mathrm{est}}|$ can be obtained without any statistical errors. Then, we obtain $G^{\mathrm{est}}$ as $G^{\mathrm{est}} = (\tilde{G} + \Delta\tilde{G})(g + \Delta g)^{-1} \simeq \tilde{G} g^{-1} + \Delta\tilde{G}\, g^{-1} - \tilde{G} g^{-1} \Delta g\, g^{-1}$. Thus, the elements of $G^{\mathrm{est}}$ have standard errors of the same order. The noise map of $G^{\mathrm{est}}$ can be obtained as $\mathcal{N} = G^{\mathrm{est}} (G^{(0)})^{-1}$, where $G^{(0)}$ is the error-free gate. Thus, the logical error probabilities can also be estimated with standard error $O(\delta)$. Let $p_g^{\mathrm{est}}$ for $g \in \{I, X, Y, Z\}^{\otimes N}$ be the estimated logical error probability. To achieve an estimation accuracy of the form $p_g^{\mathrm{est}} = (1 \pm r) p_g$, the standard error must be smaller than $r p_g$. Thus, the number of samples required to achieve an accuracy factor $r$ is $N_{\mathrm{GST}} = O((r p_L)^{-2})$. Note that when we evaluate the noise map of magic-state preparation, we perform state tomography on the magic state twirled with Clifford gates, which also requires $O((r p_L)^{-2})$ samplings for the same accuracy.
Next, we show that the required number of samplings can be reduced to $N_{\mathrm{GST}} = O(r^{-2} p_L^{-1})$ when the target gate of gate set tomography is a logical Clifford gate. When there is no noise, every element of $\tilde{G}$ and $g$ is obtained as the result of Pauli measurements on a stabilizer state. Each element of $\tilde{G}$ and $g$ is zero if the measured state is not an eigenstate of the observable, and $\pm 1$ otherwise. Since the noise is modeled as stochastic Pauli noise, the former elements remain zero even in the presence of noise, so we do not need to perform sampling for them. When an element is $\pm 1$ without noise, it becomes $\pm(1 - 2\mu)$ with noise, obtained as the mean value of a random variable $\tilde{m}$ such that $\tilde{m} = \pm 1$ with probability $1 - \mu$ and $\tilde{m} = \mp 1$ with probability $\mu$, where $\mu = O(p_L)$. If $N_{\mathrm{GST}}$ samples are drawn from this distribution, $\pm(1 - 2\mu)$ can be estimated with standard error $O(\sqrt{p_L N_{\mathrm{GST}}^{-1}})$. Therefore, to estimate each element of $\tilde{G}$ and $g$ with standard error $\delta$, it is enough to perform $N_{\mathrm{GST}} = O(\delta^{-2} p_L)$ samplings. This means that when we need to estimate the logical error probabilities with accuracy $\delta = r p_g$, we only need to perform $N_{\mathrm{GST}} = O(r^{-2} p_L^{-1})$ samplings.
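A quick numerical illustration of this scaling (our own sketch with an assumed flip probability $\mu$, not a value from the paper): since each sample is $\pm 1$ with flip probability $\mu$, the per-sample standard deviation is $2\sqrt{\mu(1-\mu)} = O(\sqrt{p_L})$ rather than $O(1)$, so $N_{\mathrm{GST}} = 4/(r^2\mu)$ samples suffice for relative accuracy $r$:

```python
import numpy as np

rng = np.random.default_rng(2)

mu = 1e-3   # flip probability, mu = O(p_L)
r = 0.1     # target relative accuracy
# N_GST = 4 mu / (r mu)^2 = O(r^-2 p_L^-1), using the per-sample
# standard deviation 2*sqrt(mu(1-mu)) ~ 2*sqrt(mu).
n_gst = int(4 * mu / (r * mu) ** 2)

samples = rng.choice([1.0, -1.0], size=n_gst, p=[1 - mu, mu])
mu_est = (1 - samples.mean()) / 2   # recover mu from the mean +-(1-2mu)
print(n_gst, mu_est)
```

With these values, $N_{\mathrm{GST}} = 4\times10^5$, whereas the generic estimate $O((r p_L)^{-2})$ would require on the order of $10^8$ samples: the improvement is exactly the advertised factor $O(p_L^{-1})$.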

Appendix D: Surface code and lattice surgery
While the scope of our proposal is not limited to a specific FTQC architecture, let us consider, as an example, FTQC with surface codes and lattice surgery. The surface code [44,62] is one of the most promising quantum error-correcting codes for integrated devices such as superconducting qubits: it has a large threshold value, its stabilizer measurements can be done in a short, constant depth, and it requires only physical qubits allocated on a two-dimensional grid that interact with their nearest neighbors. An array of logical qubits is shown in Fig. 11. There are nine boldly colored patches in the figure, each of which corresponds to a logical qubit. The data qubits are allocated on the vertices of the boldly colored red and blue squares. The red (blue) squares in each patch correspond to stabilizer operators that act on their vertices as Pauli-$Z$ (Pauli-$X$) operators. The width $d$ of each boldly colored patch is the code distance; thus, we use $O(d^2)$ physical qubits per logical qubit for a surface code with code distance $d$. Note that while the number of physical qubits can be reduced by using rotated surface codes [6], we used the surface codes shown in Fig. 11 for the sake of a simple numerical simulation. Meanwhile, lattice surgery [6,7] is a method to increase the number of logical qubits and perform logical two-qubit gates with physical qubits in a planar topology. When logical qubits are prepared as patches, a Hadamard gate can be performed transversally. Although a logical CNOT-gate is also transversal in surface codes, it cannot be performed fault-tolerantly in a planar topology. Instead, we implement multi-qubit Pauli measurements by merging and splitting the patches corresponding to logical qubits. By using multi-qubit Pauli measurements and feed-forward operations, we can indirectly perform logical two-qubit Clifford gates fault-tolerantly.
For non-transversal operations, we can proceed by injecting, distilling, and performing gate teleportation with the magic states $|A\rangle$ and $|Y\rangle = SH|0\rangle$, where $S = \exp(i\frac{\pi}{4}Z)$. Using these magic states and the technique of gate teleportation, we can indirectly apply non-transversal operations such as the $S$-gate and the $T$-gate. While an $S$-gate can be performed using $|Y\rangle$ without consuming it, a magic state $|A\rangle$ is consumed whenever a $T$-gate is performed. Therefore, the number of magic states required for $T$-gates is a dominant factor in the execution time.
Although the above strategy for FTQC was used in the main text, there are several other possible choices of codes and logical operations that may improve efficiency and feasibility. For instance, we can construct logical qubits as defect pairs in a single large patch and perform two-qubit Clifford gates by braiding [5]. We can use concatenated Steane codes or color codes if CNOT operations can be performed more flexibly. An $S$-gate can be achieved with code deformation [63] instead of magic-state injection and distillation. We can choose a CCZ magic state instead of a $T$-gate magic state [64]. We can estimate the recovery operations using decoding algorithms that achieve small latencies at the price of threshold degradation [11,43]. In any case, our method is general, and we expect it to be practical.
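As a rough back-of-the-envelope illustration of the resource counting in this appendix (our own sketch, not this paper's numerics), one can use the standard surface-code heuristic $p_L(d) \approx A\,(p/p_{\mathrm{th}})^{(d+1)/2}$, with assumed values $A = 0.1$, $p_{\mathrm{th}} = 10^{-2}$, and physical error rate $p = 10^{-3}$, to estimate the code distance and hence the physical-qubit count per logical qubit needed for a target logical error rate:

```python
# Assumed heuristic parameters (not from this paper's simulations).
A, p_th, p = 0.1, 1e-2, 1e-3

def logical_error_rate(d):
    # Standard surface-code scaling ansatz: p_L(d) ~ A (p/p_th)^((d+1)/2).
    return A * (p / p_th) ** ((d + 1) / 2)

def required_distance(target):
    d = 3
    while logical_error_rate(d) > target:
        d += 2   # code distances are odd
    return d

for target in (3e-6, 3e-11):
    d = required_distance(target)
    # Unrotated planar surface code: d^2 + (d-1)^2 data qubits per logical
    # qubit, plus a comparable number of syndrome-measurement qubits.
    n_phys = 2 * (d ** 2 + (d - 1) ** 2)
    print(target, d, n_phys)
```

The exponential suppression in $d$ is exactly why effectively increasing the code distance by a constant sampling overhead, as our scheme does, translates into large savings in physical qubits.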

Appendix E: Concrete process for logical operations
In this section, we explicitly show how our framework works by updating the quantum state of the experimental device and the Pauli frame, i.e., a classical memory for storing and updating the Pauli operators used to obtain error-corrected results via post-processing of measurement outcomes. Note that there is latency in collecting sufficient information to correct quantum states, since syndrome measurements are themselves noisy and feedback operations for error correction cannot be performed instantaneously; the required recovery Pauli operations must be continuously updated in the Pauli frame classically, because unitary Clifford operators are performed for the computation while the outcomes of the syndrome measurements are being accumulated. Finally, we obtain the error-corrected results from the outcomes of the final measurements, post-processed according to the state of the Pauli frame.
First, we describe a typical construction of FTQC without QEM as studied in Refs. [7,35]. We reformulate it with the superoperator representation so that we can introduce notations that will simplify the description of our framework. Then, we explain the FTQC framework incorporating both QEC and QEM. In this framework, the decoding errors are assumed to be not negligible but can be mitigated by probabilistic error cancellation in the logical space.
In FTQC, the process of making syndrome measurements on all the logical qubits is called a code cycle. For reliable decoding, we need to wait for several cycles depending on the code distance before processing the next logical operation. Here, we will call the unit of latency for the slowest logical operation (i.e. the maximum number of cycles that are required before being ready for the next logical operation) a step and assume that in each logical operation, every logical qubit waits until all the logical qubits become ready for the next logical operation. This leads to a simple definition of the states of the quantum devices and the classical memory at a certain step. We should emphasize that this unit is introduced only for the sake of illustration and that our scheme can be applied to asynchronous processing of logical operations.

FTQC architecture without QEM
At the $t$-th step of FTQC without QEM, we have to update two components: the quantum device, whose state is $|\rho_{\mathrm{dev}}^{(t)}\rangle\rangle$, and the Pauli frame $\mathcal{P}_{\mathrm{PF}}^{(t)}$. Let us denote the Pauli transfer matrix of the actual quantum process, which suffers noise, up to the $t$-th step as $\mathcal{U}_{\mathrm{dev}}^{(t)}$, and the ideal initial state as $|\rho_0\rangle\rangle$. Then, the state at the $t$-th step can be represented as $|\rho_{\mathrm{dev}}^{(t)}\rangle\rangle = \mathcal{U}_{\mathrm{dev}}^{(t)} |\rho_0\rangle\rangle$. Note that $\mathcal{U}_{\mathrm{dev}}^{(t)}$ is the superoperator of a trace-preserving and completely positive map, but not of a unitary process, since the actual process involves several intermediate measurements. Denote the ideal operations up to the $t$-th step as $\tilde{\mathcal{U}}^{(t)}$. Here, we aim to have
$$ \mathcal{P}_{\mathrm{PF}}^{(t)} |\rho_{\mathrm{dev}}^{(t)}\rangle\rangle = \tilde{\mathcal{U}}^{(t)} |\rho_0\rangle\rangle \quad \text{(E4)} $$
for an arbitrary step $t$. Note that since several successive syndrome values are required to calculate $\mathcal{P}_{\mathrm{PF}}^{(t)}$, the Pauli frame at the $t$-th step becomes available with a latency in cycles that is at least proportional to the code distance [35].
We focus on a simplified universal set of logical operations: preparation of logical $|0_L\rangle$ and $|A_L\rangle$ states, logical Clifford operations (including logical Pauli operations), logical single-qubit Pauli-$Z$ measurements, and logical gate teleportation for performing a logical $T$-gate. While these logical operations can be decomposed into more basic ones, such as the merge-and-split operations of lattice surgery [6], we use this set for simplicity. In the following, we illustrate how the physical states and the Pauli frame are updated in each case.

a. Preparation of logical states
There are two types of initialization in FTQC: preparation of a logical $|0_L\rangle$ state and of a logical magic state $|A_L\rangle$, implemented in the surface code. Here, we describe the operation of adding a clean logical qubit. Let us first explain the procedure for preparing $|0_L\rangle$. We join $n$ physical qubits to the system, where $n$ is the number of qubits used to construct a logical qubit. The initial state of the joined data qubits can be any random state. We measure all the joined data qubits in the Pauli-$Z$ basis. If there is no error in this measurement, we obtain a computational-basis state $|x\rangle$, where $x \in \{0,1\}^n$ is the $n$-bit outcome. Defining the Pauli operator $P_x = \prod_i X^{x_i}$, we find $P_x |x\rangle = |0\rangle^{\otimes n}$. The state $|0\rangle^{\otimes n}$ is the $+1$ eigenstate of all the $Z$-stabilizer operators and of the logical Pauli-$Z$ operator, but not an eigenstate of any $X$-stabilizer operator. We therefore perform $X$-stabilizer measurements to project $|0\rangle^{\otimes n}$ to the code space. If the state is projected to the $+1$ eigenspace of all the $X$-stabilizer operators, it becomes the $+1$ eigenstate of all the stabilizer operators and of the logical Pauli-$Z$ operator, which is the definition of $|0_L\rangle$. If any outcome is $-1$, we can find a Pauli-$Z$ operator $P_z$ that anti-commutes with all the $X$-stabilizer operators with $-1$ outcomes and commutes with the other stabilizer operators. Accordingly, $P_z P_x |\psi\rangle$, where $|\psi\rangle$ is the state after the $X$-stabilizer measurements on $|x\rangle$, is equal to $|0_L\rangle$. The procedure to add $|0_L\rangle$ is thus as follows: $\mathcal{U}_{\mathrm{dev}}^{(t+1)}$ is the sequence that joins $n$ qubits, measures all of them in the Pauli-$Z$ basis, and performs the $X$-stabilizer measurements, while the Pauli frame $\mathcal{P}_{\mathrm{PF}}^{(t+1)}$ is the tensor product of $\mathcal{P}_{\mathrm{PF}}^{(t)}$ and the superoperator of $P_z P_x$. In practice, Pauli errors may occur during the above measurements. Since the error rates are expected to be below the threshold value, these errors can be reliably detected in the succeeding stabilizer measurements and corrected by updating $P_z P_x$. See Sec. E 1 c for the update of the Pauli frame with latency.
The other initialization is the preparation of the logical magic state $|A_L\rangle$, which, as detailed later, is used for performing non-Clifford gates fault-tolerantly via gate teleportation. In order to use magic states for gate teleportation, the infidelity of the logical magic state must be comparable to or smaller than the required logical error rate. On the other hand, since $|A_L\rangle$ is an eigenstate of neither the logical Pauli-$Z$ operator nor the logical Pauli-$X$ operator, we cannot use the method described above for preparing $|0_L\rangle$. Instead, a logical magic state $|A_L\rangle$ with sufficient fidelity for gate teleportation can be constructed as follows: I) create a noisy magic state $|A_L\rangle$ whose code distance is a small constant $d_s$; II) expand the code distance from $d_s$ to $d$ fault-tolerantly; and III) create a clean magic state from several noisy magic states via magic-state distillation. If the fidelity of the noisy magic states is sufficiently high and the magic state is sufficiently distilled, we can obtain a clean magic state together with its Pauli frame. See Refs. [7,35,65] for details of these procedures.
Thus, we can perform an update $\tilde{\mathcal{U}}^{(t)} \to \tilde{\mathcal{U}}^{(t+1)}$ that appends the prepared magic state, where the corresponding part of the Pauli frame, $\mathcal{P}_{\mathrm{PF}}^{(Q)}$, becomes available with a latency because of the noisy syndrome measurements.

b. Logical Pauli Operation
Logical Pauli operations are special cases of logical Clifford operations and can be performed much more easily than general Clifford operations thanks to the Pauli frame. When we perform a logical Pauli operation $\mathcal{P}^{(t)}$ at the $t$-th step, we need to construct $\mathcal{P}_{\mathrm{PF}}^{(t+1)}$ and $\mathcal{U}_{\mathrm{dev}}^{(t+1)}$. Since a logical Pauli operation is a product of physical Pauli operations, there are two ways to achieve this. One is to update only the Pauli frame:
$$ \mathcal{P}_{\mathrm{PF}}^{(t+1)} = \mathcal{P}^{(t)} \mathcal{P}_{\mathrm{PF}}^{(t)}, \qquad \mathcal{U}_{\mathrm{dev}}^{(t+1)} = \mathcal{U}_{\mathrm{dev}}^{(t)}. $$
Note that this is an instantaneous and noiseless operation because it is processed only on the classical computer. We refer to this scheme as a Pauli operation by software update.
The other is to update the physical device instead of the Pauli frame:
$$ \mathcal{U}_{\mathrm{dev}}^{(t+1)} = \mathcal{P}^{(t)} \mathcal{U}_{\mathrm{dev}}^{(t)}, \qquad \mathcal{P}_{\mathrm{PF}}^{(t+1)} = \mathcal{P}_{\mathrm{PF}}^{(t)}. $$
Compared with the software update, this operation acts on the physical quantum device and hence may introduce errors. We call this scheme a hardware update. Note that this procedure is a transversal single-qubit Pauli operation, so we expect it to increase the physical error rate per code cycle only negligibly. Although the hardware update seems to have no advantage in a typical FTQC architecture, logical Pauli operations by hardware update are required for the Pauli twirling of logical errors in FTQC, as described in Appendix F.

c. Logical Clifford Operation
Unlike logical Pauli operations, general Clifford operations require physical operations to be performed on the quantum device. Here, we denote the Pauli transfer matrix of the recovery operation for the physical errors at the $t$-th step as $\mathcal{P}_{\mathrm{rec}}^{(t)}$. Note that since quantum states are projected onto the code space by syndrome measurements, we expect that $\mathcal{P}_{\mathrm{rec}}^{(t)}$ can be approximated as a Pauli error. If $\mathcal{P}_{\mathrm{rec}}^{(t)}$ cannot be assumed to be a Pauli error, we can twirl it with stochastic Pauli operations by hardware update and thereby project it onto stochastic Pauli errors. See Appendix F for details.
Pauli errors are detected via syndrome measurements. The recovery Pauli operation $\mathcal{P}_{\mathrm{rec}}^{(t)}$ can then be estimated with a high success probability, and the Pauli frame can be updated in the same way as in Eq. (E4). If there were no measurement errors, the recovery operation for a certain cycle could be estimated from the syndrome values of that cycle alone. However, when the syndrome measurements suffer physical errors, we need to idle the logical qubits for $d$ cycles to collect sufficient syndrome-measurement outcomes before starting the next Clifford operation, in order to guarantee exponential decay of the logical error rate [35]. Therefore, the recovery operations become available with a latency of at least $d$ cycles. In practice, the latency is larger than $d$ cycles, since we also need post-processing for signal discrimination and for the decoding algorithms on classical peripherals.
Suppose that we perform a logical Clifford operation $\mathcal{C}^{(t)}$ and that physical Pauli errors $\mathcal{P}_{\mathrm{rec}}^{(t)}$ occur during the Clifford operation, which are identified after decoding. Here, we can ignore recovery failures when the code distance is sufficiently large. The logical Clifford operation updates the state of the quantum device as
$$ \mathcal{U}_{\mathrm{dev}}^{(t+1)} = \mathcal{P}_{\mathrm{rec}}^{(t)} \mathcal{C}^{(t)} \mathcal{U}_{\mathrm{dev}}^{(t)}. $$
In addition, the Pauli frame is updated as
$$ \mathcal{P}_{\mathrm{PF}}^{(t+1)} = \mathcal{C}^{(t)} \mathcal{P}_{\mathrm{PF}}^{(t)} (\mathcal{C}^{(t)})^{-1} \mathcal{P}_{\mathrm{rec}}^{(t)}, $$
which satisfies Eq. (E4). Since a logical Clifford operation is also a Clifford operation on the physical qubits in the case of stabilizer codes, $\mathcal{P}_{\mathrm{PF}}^{(t+1)}$ is a Pauli operator.
In practice, $\mathcal{P}_{\mathrm{rec}}^{(t)}$ is expected to have a large latency, since estimating the recovery operation requires post-processing. Thus, another Pauli error might occur during the idling operations performed while waiting for the estimate of $\mathcal{P}_{\mathrm{rec}}^{(t)}$. Nevertheless, the next logical Clifford operation or logical Pauli measurement can be performed before $\mathcal{P}_{\mathrm{rec}}^{(t)}$ is estimated. As an example, consider the case in which $\mathcal{C}^{(t+1)}$ is processed before the last Pauli frame has been updated. Since Pauli actions commute, the part of the Pauli frame that depends on $\mathcal{P}_{\mathrm{rec}}^{(t)}$, namely $\mathcal{C}^{(t+1)} \mathcal{P}_{\mathrm{rec}}^{(t)} (\mathcal{C}^{(t+1)})^{-1}$, can be multiplied in at the next step once the recovery estimate arrives. We can use a similar technique when the next operation is a logical Pauli measurement, which is explained in the next section. Therefore, as long as all the successive operations are Clifford operations or Pauli measurements, we can postpone the application of the recovery operators and the updates of the Pauli frame. This technique is essential not only for improving the throughput of logical operations in FTQC but also for avoiding exponential growth in latency [11].
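The classical bookkeeping described above can be sketched in a few lines. The following is a minimal illustration (our own toy model, not the paper's implementation) of Pauli-frame tracking with overall phases ignored: each qubit's pending Pauli is stored as a bit pair $(x, z)$ for $X^x Z^z$ and updated by conjugation rules whenever a logical Clifford is performed, so recovery operators can be commuted forward and applied later:

```python
def apply_h(frame, q):
    x, z = frame[q]
    frame[q] = (z, x)            # H: X <-> Z

def apply_s(frame, q):
    x, z = frame[q]
    frame[q] = (x, z ^ x)        # S: X -> Y (= XZ up to phase), Z -> Z

def apply_cnot(frame, c, t):
    xc, zc = frame[c]
    xt, zt = frame[t]
    frame[c] = (xc, zc ^ zt)     # Z on target propagates to control
    frame[t] = (xt ^ xc, zt)     # X on control propagates to target

frame = {0: (1, 0), 1: (0, 1)}   # pending recovery X_0 Z_1
apply_cnot(frame, 0, 1)
apply_h(frame, 0)
print(frame)   # the frame absorbed the Cliffords without touching the device
```

Because these updates are pure classical bit operations, postponing a recovery Pauli through any sequence of Clifford gates costs only this bookkeeping, exactly as argued above.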
Note that in the lattice-surgery scheme, logical CNOTs are performed by preparing a logical $|0_L\rangle$ state, making logical Pauli measurements, and applying adaptive Pauli operations [7]. While none of these is itself a unitary Clifford operation, together they constitute a unitary process on the whole. The adaptive Pauli operations can be postponed for the same reason as the updates of the Pauli frame.

d. Single-qubit logical Pauli measurement
We can perform a destructive single-qubit logical Pauli-$Z$ measurement by making physical Pauli-$Z$ measurements on all the data qubits of the target logical qubit. Note that logical Pauli measurements in another Pauli basis can be performed by swapping $Z$ and $X$ or by combining the logical Pauli-$Z$ measurement with single-qubit logical Clifford operations. The outcome of a logical measurement is calculated as the parity of the measurement outcomes of the data qubits. In other words, let $n$ be the number of data qubits of the target logical qubit, and let $\Pi_x = \prod_i (I + (-1)^{x_i} Z_i)/2$ for $x \in \{0,1\}^n$ be the operator that projects all the data qubits onto the physical computational-basis state $|x\rangle$. The single-bit outcome of the single-qubit logical measurement is calculated as a parity of the outcomes $x$ of the physical measurements; we denote this function as $f: \{0,1\}^n \to \{0,1\}$. Therefore, we aim to estimate $x$ and obtain $f(x)$ fault-tolerantly.
When we perform the physical Pauli-$Z$ measurements, physical Pauli errors $\mathcal{P}_{\mathrm{rec}}^{(t)}$ occur on the data qubits. Also, as explained in the section on logical Clifford operations, the Pauli frame $\mathcal{P}_{\mathrm{PF}}^{(t)}$ is not yet up to date at the timing of the logical measurement. Even in this case, the recovery operation can be applied after the physical Pauli-$Z$ measurements have been performed on the data qubits, since
$$ \langle\langle \Pi_x | \mathcal{P} = \langle\langle \Pi_{x \oplus y} |, $$
where $\oplus$ represents an element-wise XOR, and $y \in \{0,1\}^n$ is the binary vector such that $y_i = 1$ if $\mathcal{P}$ acts on the $i$-th data qubit as a Pauli-$X$ or $Y$ operator and $y_i = 0$ otherwise. We denote this function as $y = \mathrm{mask}(\mathcal{P})$, with $\mathcal{P} = \mathcal{P}_{\mathrm{PF}}^{(t)} \mathcal{P}_{\mathrm{rec}}^{(t)}$. Therefore, we obtain $x$ as an outcome of the measurement on the uncorrected state $|\rho_{\mathrm{dev}}^{(t)}\rangle\rangle$. Once the decoding process catches up with the code cycle of the logical measurement, $y$ becomes available, and we can retrieve the error-corrected logical outcome as $f(x \oplus \mathrm{mask}(\mathcal{P}_{\mathrm{PF}}^{(t)} \mathcal{P}_{\mathrm{rec}}^{(t)}))$. The update rules can be written analogously to the previous cases, where $p_x$ is the probability of obtaining $x$ and $\mathrm{Dis}_M[\cdot]$ is an operation that discards the Pauli frame corresponding to the measured logical qubits. In contrast to logical initialization, the update for a logical measurement reduces the space of the target logical qubits. Several things should be noted in regard to the decoding algorithm at the measurement timing. The Pauli frame can be represented in the form $\prod_i X^{x_i} Z^{z_i}$ with $x_i, z_i \in \{0,1\}$; we call $\{x_i\}$ and $\{z_i\}$ the $X$-part and the $Z$-part of the Pauli frame, respectively. Since all the data qubits are directly measured in this logical operation, syndrome measurements are not performed in this code cycle. Instead, we calculate the values of the $Z$-stabilizer syndrome measurements, free of measurement errors, as parities of $x$, because we measure the data qubits directly without resorting to ancilla qubits.
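As a toy illustration of the classical post-processing $f(x \oplus \mathrm{mask}(\cdot))$ (our own sketch, using a three-qubit repetition code with $|0_L\rangle = |000\rangle$ rather than the surface code), only the $X$-part of the pending Pauli frame flips $Z$-basis outcomes:

```python
def f(x):
    # Logical Z outcome as the parity of the physical Z outcomes.
    return sum(x) % 2

def mask(frame):
    # Only the X-part of the frame flips Z-basis measurement outcomes.
    return [x for (x, z) in frame]

x = [1, 0, 0]                        # raw outcomes, corrupted by an X error
frame = [(1, 0), (0, 0), (0, 0)]     # pending recovery X on qubit 0
corrected = [xi ^ mi for xi, mi in zip(x, mask(frame))]
print(f(x), f(corrected))            # raw parity 1, error-corrected parity 0
```

The correction is applied entirely in software after the destructive measurement, which is why the recovery operation can arrive late without delaying the measurement itself.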
Since there is no effective measurement error in the Z-stabilizer syndrome measurements at the cycle of the logical measurement, the estimation of the X-part of $P^{(t)}_{\rm PF} P^{(t)}_{\rm rec}$ can be converted into an instance of a graph problem called minimum-weight perfect matching [35]. On the other hand, the information required to construct a perfect-matching problem for estimating the Z-part of the Pauli frame is lost by making direct Z-basis measurements. Nevertheless, this does not affect the computation, since only the X-part of the Pauli frame is relevant to determining $y = {\rm mask}(P^{(t)}_{\rm PF} P^{(t)}_{\rm rec})$.

For a universal FTQC, we need to perform non-Clifford operations on logical qubits encoded in the surface code. To this end, in a typical scenario, we create a logical qubit prepared in $|A\rangle_L$ and perform a $T$-gate on the target system by consuming the magic state $|A\rangle_L$.
First, we describe gate teleportation without noise. Let $|\rho\rangle\rangle \otimes |A\rangle\rangle$ be the tensor product of the target system and a magic state. It is known that a gate-teleportation identity holds for an arbitrary $\rho$ [35,66], in which $\Lambda$ is the Pauli transfer matrix of a logical CNOT gate acting on $|\rho\rangle\rangle$ as the control and $|A\rangle_L$ as the target, $T$ is the Pauli transfer matrix of the $T$-gate, and $S^{f(x)}$ is the Pauli transfer matrix of the adaptive $S$-gate that is applied to the state if $f(x) = 1$ (see the last section for the definition of the function $f$).
Using the above fact, we can construct update rules for the target system in which $P^{(t)}_{\rm rec}$ represents detectable physical errors, and ${\rm Dis}_Q[\cdot]$ (${\rm Dis}_\rho[\cdot]$) is an operation that discards the Pauli frame of the magic state (target state). Then, we can verify that the above update rule obeys Eq. (E4), as follows.
Note that we have used the property of the discard operation for an arbitrary Pauli operation $P$. Note as well that $y$ has latency due to the decoding process, so the $S^{f(x)}$ operation and the $\langle\langle\Pi_x|$ measurement cannot be performed simultaneously. Thus, there is an additional delay in determining $f(x \oplus y)$, which is required for deciding whether we perform $S$ or not.

FTQC architecture with QEM
The failure probability of decoding and residual errors on the prepared magic states are not negligible when the code distance of quantum error correction is insufficient and when we cannot perform sufficient magic-state distillation. Here, let us consider the case in which a decoding error $\mathcal{N}$ happens after each elementary logical operation. We will assume that $\mathcal{N}$ is a stochastic logical Pauli channel that is characterized in advance. While these assumptions do not strictly hold in practice, we nonetheless expect that they hold up to negligible errors in typical quantum devices; see Appendix F for a justification of this assumption. Note that even if there is a finite discrepancy because of this approximation, our scheme can decrease the bias in the expectation values as long as the discrepancy is small. In this section, we show that probabilistic error cancellation can be integrated into the FTQC architecture. More concretely, we show that each logical operation in the previous section can be modified so that probabilistic error cancellation removes the residual noise of QEC using only additional logical Pauli operations applied by software update.
Unlike in the NISQ scenario, FTQC operations are probabilistic, since intermediate measurements are involved in the gate teleportation of the magic states, and subsequent operations are adaptively chosen according to the measurement outcomes. We need to determine QEM operations accordingly, because the decoding of the noise processes may change depending on the measurement outcomes. Here, let us denote the set of outcomes of intermediate measurements and QEM operations up to the $t$-th step as $h(t)$, with the corresponding Pauli frame and the state of the hardware denoted as $P^{(h(t))}_{\rm PF}$ and $U^{(h(t))}_{\rm dev}$. Note that we do not define the measurement outcomes and QEM operations independently, since they affect each other. The probability of $h(t)$ can be expressed as $p_{h(t)} q_{h(t)}$, where $p_{h(t)}$ is the probability with which a certain sequence of intermediate measurement outcomes is observed and $q_{h(t)}$ is the probability with which a certain sequence of QEM operations is performed; both probabilities are functions of $h(t)$. Furthermore, we denote the parity of the QEM operations (the product of the parities of the generated QEM operations) and the QEM cost at the $t$-th step as $s^{(h(t))}$ and $\gamma^{(h(t))}_Q$. In our framework, we can construct a consistent FTQC architecture incorporating QEM by satisfying Eq. (E21). Here, we explain why satisfying this equation is sufficient: if we can construct $U^{(h(t))}_{\rm dev}$ as a product of physical processes, we can sample the density matrix $|\rho^{(h(t))}\rangle\rangle$ obtained by applying the Pauli frame $P^{(h(t))}_{\rm PF}$ to the device state, which shows that QEM can recover an unbiased expectation value of the observables when Eq. (E21) holds.

a. Logical Pauli operation
Since logical Pauli operations by software update are instantaneous and noiseless, we do not need to perform error mitigation on them.

b. Logical Clifford operation
Suppose that we want to apply a Clifford operation $C$ at the $(t+1)$-th step; here, $C$ is followed not only by detected physical Pauli errors $P^{(t+1)}_{\rm rec}$ but also by logical Pauli noise $\mathcal{N}$ reflecting a decoding failure with non-negligible probability, and the quantum device is updated accordingly. While $P^{(t)}_{\rm rec}$ is revealed with latency and canceled by updating the Pauli frame, $\mathcal{N}$ is not canceled and biases the expectation values without QEM. Here, we show that we can cancel $\mathcal{N}$ by applying QEM with a probabilistic update of the Pauli frame. Our goal is to find a set of update rules for $(\gamma^{(h(t))}_Q, s^{(h(t))}, q_{h(t)}, P^{(h(t))}_{\rm PF})$ that satisfies Eq. (E21). Since we know the stochastic logical Pauli noise $\mathcal{N}$ in advance, we can calculate the inverse of the noise map $\mathcal{N}^{-1} = \sum_i \eta_i \mathcal{P}_i$. Next, we decompose the non-zero coefficients $\eta_i$ into $\eta_i = \gamma_Q\,{\rm sgn}(\eta_i)\,q_i$, where $\gamma_Q = \sum_i |\eta_i|$ and $q_i = |\eta_i|/\gamma_Q$. We randomly choose $i$ with probability $q_i$, and the index $i$ is appended to give $h(t+1)$. We find that $q_{h(t+1)} = q_i q_{h(t)}$ and $p_{h(t+1)} = p_{h(t)}$. Then, we update the state of the Pauli frame and the classical coefficients with the corresponding update rules. Now, we can verify that Eq. (E21) is satisfied at the $(t+1)$-th step if it is satisfied at the $t$-th step.
As we will see later, we can construct a similar update rule for probabilistic error cancellation for a single-qubit logical Pauli measurement; therefore, we can inductively show that a decomposition specified by these update rules satisfies Eq. (E21).
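As an illustration, the following sketch (our own, assuming a simple single-qubit stochastic Pauli channel) computes the quasi-probability decomposition of $\mathcal{N}^{-1}$ in the Pauli-transfer-matrix picture and samples a QEM operation index $i$ with probability $q_i$:

```python
import numpy as np

# W[a][b] = +1 if Pauli a commutes with Pauli b, -1 otherwise (order I, X, Y, Z).
# It maps Pauli-error probabilities to PTM diagonals, and W/4 is its inverse.
W = np.array([[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, 1, -1],
              [1, -1, -1, 1]], dtype=float)

def pec_decomposition(p):
    """p = (p_I, p_X, p_Y, p_Z): probabilities of the stochastic Pauli noise N.
    Returns (gamma_Q, q, signs) for the decomposition N^{-1} = sum_i eta_i P_i."""
    lam = W @ p                    # diagonal of the Pauli transfer matrix of N
    eta = W @ (1.0 / lam) / 4.0    # quasi-probabilities of the inverse channel
    gamma_q = np.abs(eta).sum()    # sampling-overhead factor gamma_Q
    return gamma_q, np.abs(eta) / gamma_q, np.sign(eta)

gamma_q, q, signs = pec_decomposition(np.array([0.97, 0.01, 0.01, 0.01]))
rng = np.random.default_rng(0)
i = rng.choice(4, p=q)             # index of the sampled logical Pauli QEM operation
print(gamma_q)                     # QEM cost gamma_Q is about 1.0625 for this channel
```

The sampled index $i$ would be appended to $h(t+1)$, the Pauli frame multiplied by $\mathcal{P}_i$, and the running sign and cost updated by ${\rm sgn}(\eta_i)$ and $\gamma_Q$.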

c. Logical initialization and single-qubit logical Pauli measurement
When code distances and magic-state distillation processes are insufficient, errors in the logical state preparation are not negligible. The noise map for logical $|0\rangle$ state preparation can be assumed to be stochastic logical Pauli noise. While the noise map on magic-state preparation due to insufficient distillation may not be well approximated as stochastic Pauli noise, it can be twirled by error-mitigated logical Clifford operations. Thus, these errors can be modeled as stochastic logical Pauli noise maps inserted just after initialization: we can consider the preparations themselves to be ideal, with a virtual noisy idling operation placed just after initialization. We can mitigate these errors with the same update rules as in the case of logical Clifford operations by treating them as decoding errors. Similarly, the errors of a single-qubit logical Pauli measurement can be considered to be a probabilistic logical bit flip just before the logical measurement; as such, they can be processed in the same manner.

d. Gate teleportation with magic state
To perform gate teleportation for the $T$-gate, we can try the teleportation process above and then perform $S^{f(x)}$ depending on the measurement outcomes to indirectly perform $T$ on the target system. However, in practice, what is performed until the measurement is made differs: $P^{(t)}_{\rm rec}$ denotes the physical errors caused by the logical magic-state preparation, the logical CNOT, and the logical measurements. Compared with the case without QEM, Eq. (E14), there is additional logical Pauli noise $\mathcal{N}$ caused by the failure of the decoder to estimate $P^{(t)}_{\rm rec}$. Since this procedure involves an intermediate measurement, we obtain the outcome $x$ randomly. The procedure for the inverse decomposition is the same as that for a logical Clifford operation: we calculate the inverse of the noise map $\mathcal{N}^{-1} = \sum_i \eta_i \mathcal{P}_i$ and decompose each non-zero coefficient $\eta_i$ into $\eta_i = \gamma_Q\,{\rm sgn}(\eta_i)\,q_i$, where $\gamma_Q = \sum_i |\eta_i|$ and $q_i = |\eta_i|/\gamma_Q$. Note that the measurement result $x$ is added to $h(t)$ together with the choice of QEM operation $i$ to obtain $h(t+1)$. Next, $q_{h(t)}$ is updated to $q_{h(t+1)} = q_i q_{h(t)}$ and $p_{h(t)}$ is updated to $p_{h(t+1)} = p_x p_{h(t)}$, where $p_x$ is the probability that the measurement outcome is $x$. Accordingly, we can update the relevant elements with the corresponding rules. Note that an additional decoding error, dependent on whether we perform $S$ or not, would occur after the delayed application of $S^{f(x \oplus y)}$. While this map is omitted for simplicity, we can perform probabilistic error cancellation on it in the same way.
With the above update rules, we can verify that Eq. (E21) is obeyed at the $(t+1)$-th step by evaluating the product of $U^{(h(t+1))}_{\rm dev}$ and $P^{(h(t+1))}_{\rm PF}$ and then the resulting expectation value.
Therefore, we have verified that all the logical operations obey Eq. (E21).

Appendix F: On the noise model of the decoding errors
In the main text, we assumed that the decoding errors for elementary logical operations can be modeled as Markovian and stochastic Pauli noise obeying Eq. (8).
Here, we justify this assumption. First, we examine whether an actual noise model of syndrome-measurement cycles of surface codes can be treated as Markovian. When the syndrome measurements may output incorrect values, we need $d$ consecutive rounds of syndrome values to reliably estimate the recovery operations. Since the quantum states are in the logical code space only after the recovery operations, the actual quantum states are not in the code space during FTQC. This makes it hard to evaluate the logical noise map over several cycles during FTQC, because the map does not take a logical state to another logical state, while we need to evaluate the logical noise map in advance in order to perform QEM on the code space. To avoid this problem, we assume that the following noise model approximates the actual one well: suppose that we can perform perfect syndrome measurements at the $ld$-th cycle, where $l = 1, 2, \ldots$, and that we can perform recovery operations just after that. Then, the quantum state is in the logical code space at the $ld$-th cycle. In this case, we can define a logical error map $\mathcal{M}_{\rm dec}$ from the $(l-1)d$-th cycle to the $ld$-th cycle. Here, we assume that if we have a logical operation $\mathcal{U}$ requiring $\chi d$ cycles, the logical map including effective decoding errors can be approximated by $\mathcal{M}^{\chi}_{\rm dec}\,\mathcal{U}$ when the code distance $d$ is sufficiently large. If this assumption holds, we can cancel the noise map $\mathcal{M}^{\chi}_{\rm dec}$ by performing QEM on each logical operation. Although the actual $\chi$ depends on the logical operation, we will assume that $\chi = 1$ for simplicity.
The following numerical analysis shows that this assumption holds at least when the physical errors are stochastic Pauli noise. Let $\mathcal{M}_{{\rm dec},c}$ be the noise map for $c$-cycle idling of a single logical qubit with code distance $d$, and let $\Lambda^{(c)}$ be the Pauli transfer matrix of $\mathcal{M}_{{\rm dec},c}$. Here, $\mathcal{M}_{{\rm dec},c}$ is a stochastic noise map, since a stochastic Pauli error can only cause logical Pauli errors; thus, $\Lambda^{(c)}$ is a diagonal matrix. Our assumption can be rephrased as follows: there is an effective Pauli transfer matrix $\Lambda_{\rm eff}$ such that $\Lambda^{(c)} = \Lambda_{\rm eff}^{c}$ for sufficiently large $c$. Equivalently, we assume that each diagonal element decays exponentially with the number of cycles $c$. Since $\Lambda_{00}$ is always unity for stochastic Pauli errors, we are interested in the other diagonal elements. Fig. 12 plots the diagonal elements, except $\Lambda_{00}$, against the number of cycles. We use the same settings as in Sec. IV A 2, i.e., a depolarizing noise map with $p = 0.01$. Note that $\Lambda_{33}$ and $\Lambda_{11}$ are equal, since the behavior of surface codes is symmetric under the exchange of Pauli-X and Z errors. In the figure, the circles are the numerical results, and the dashed lines are fits with an exponentially decaying function; we used data points with $c > 20$ for the fitting. The results agree well with our assumption, except in the region where the cycle count $c$ is around 1. Thus, we conclude that if the physical errors are stochastic Pauli noise, we can define an effective logical noise map for each logical operation.
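The exponential-decay check described above can be reproduced in a few lines: given diagonal elements $\Lambda^{(c)}_{11}$, a log-linear fit over $c > 20$ recovers the per-cycle element of $\Lambda_{\rm eff}$. The decay data below are synthetic stand-ins for the simulated values, not the data of Fig. 12.

```python
import numpy as np

# Synthetic per-cycle decay lambda_eff = 0.995, mimicking Lambda_11^(c) = lambda_eff^c.
lam_eff = 0.995
cycles = np.arange(21, 101)       # use only c > 20, as in the fit described above
lam_c = lam_eff ** cycles

# Fit log(Lambda_11^(c)) = c * log(lambda_eff): slope of a degree-1 polynomial fit.
slope, _ = np.polyfit(cycles, np.log(lam_c), 1)
lam_fit = np.exp(slope)
print(round(lam_fit, 6))          # -> 0.995
```

With real simulation data, the fit residuals at small $c$ quantify the deviation from the Markovian model.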
Next, we discuss the case in which the physical noise cannot be modeled as stochastic Pauli noise. In practice, the noise of calibrated quantum operations is expected to be almost stochastic Pauli, since the unitary component of the physical errors can be canceled by echoing. In addition, the noise map per code cycle in the code space of surface codes is expected to be well approximated by stochastic Pauli noise at large distance [33]. While the logical noise map on a prepared magic state also suffers noise from magic-state injection and distillation, this noise can be twirled into stochastic Pauli noise with error-mitigated logical Clifford operations. Nevertheless, there may be non-negligible coherence in the quantum noise of logical Clifford gates. In this case, we can twirl the noise map per code cycle and remove the unitary component of the noise via logical Pauli operations with hardware update (see Appendix E for the definition of hardware update). Below, we show that we can perform twirling on logical noise caused by logical Clifford operations if logical Pauli operations implemented with physical operations, i.e., logical Pauli operations without updating the Pauli frame, can be performed with negligible error rates. Note that the logical error rates for logical Pauli operations are expected to be much smaller than those for logical Clifford operations: logical Pauli operations can be performed with transversal single-qubit operations completed in a single cycle, and the errors caused by these operations are negligible compared with those caused by the two-qubit operations needed for stabilizer measurements. Suppose that we perform a logical Clifford operation $\mathcal{C}$ followed by a logical noise map $\mathcal{M}^{\chi}_{\rm dec}$, and that we twirl the noise $\mathcal{M}^{\chi}_{\rm dec}$ with a set of Pauli operators $S$. The twirling process can be described as
$$\frac{1}{|S|}\sum_{\mathcal{P} \in S} \mathcal{P}\,\mathcal{M}^{\chi}_{\rm dec}\,\mathcal{C}\,(\mathcal{C}^{\dagger}\mathcal{P}\mathcal{C}) = \left(\frac{1}{|S|}\sum_{\mathcal{P} \in S} \mathcal{P}\,\mathcal{M}^{\chi}_{\rm dec}\,\mathcal{P}\right)\mathcal{C},$$
where $\mathcal{P}$ is the superoperator of the logical Pauli operation.
Since $\mathcal{C}^{\dagger}\mathcal{P}\mathcal{C}$ is a logical Pauli operation, $\mathcal{M}^{\chi}_{\rm dec}$ can be twirled simply with logical Pauli operations. The same argument holds for Pauli measurements and for feedback operations dependent on their outcomes. Since all the elementary logical operations except for magic-state injection are Clifford operations or Pauli channels, we can apply Pauli twirling to most of the quantum operations in FTQC. Note that while logical Pauli operations for computation can be done by updating the Pauli frame, the logical Pauli operations for twirling require a physical implementation on the quantum device. The reason is that if we attempted to perform the twirling Pauli operations via the Pauli frame, we would need to update the Pauli frame according to the actual logical operation $\mathcal{M}^{\chi}_{\rm dec}\mathcal{C}$ (see Appendix E for the formalism of the Pauli frame in the superoperator representation). However, since $\mathcal{M}^{\chi}_{\rm dec}$ is not a stochastic Pauli noise map, the frame would no longer remain a Pauli operator, and we could not continue tracking it as such. Thus, the Pauli twirling must be done physically.
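The effect of Pauli twirling can be checked numerically in the Pauli-transfer-matrix picture: conjugating a channel by each Pauli and averaging removes all off-diagonal PTM elements, leaving a stochastic Pauli channel. The example below (our own sketch, not from the paper's simulations) twirls a coherent $Z$-rotation error.

```python
import numpy as np

# Diagonal PTMs of conjugation by I, X, Y, Z in the basis (I, X, Y, Z):
# the entry is +1 if the basis Pauli commutes with the twirling Pauli, -1 otherwise.
D = [np.diag(row) for row in ([1, 1, 1, 1], [1, 1, -1, -1],
                              [1, -1, 1, -1], [1, -1, -1, 1])]

def twirl(ptm):
    """Average P . Lambda . P over the single-qubit Pauli group."""
    return sum(d @ ptm @ d for d in D) / 4.0

theta = 0.1  # coherent error: PTM of the rotation exp(-i theta Z / 2)
c, s = np.cos(theta), np.sin(theta)
rz = np.array([[1, 0, 0, 0],
               [0, c, -s, 0],
               [0, s, c, 0],
               [0, 0, 0, 1]], dtype=float)

twirled = twirl(rz)
print(np.allclose(twirled, np.diag([1, c, c, 1])))  # -> True
```

The off-diagonal elements vanish because the $\pm 1$ sign patterns of the four conjugations are mutually orthogonal, while the diagonal is untouched.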

Appendix G: Details of the numerical analysis
In the numerical simulations for evaluating the decoding errors, we used a uniform depolarizing noise model, in which noise occurs on each physical qubit independently and acts as $\mathcal{E}(\rho) = (1-p)\rho + \frac{p}{3}(X\rho X + Y\rho Y + Z\rho Z)$. This error acts on data qubits at the beginning of each cycle and on ancillary qubits just before the measurement. As indicated in the main text, we assumed perfect syndrome measurements at the $0$-th and $d$-th cycles, which guarantees that the quantum states at these cycles are in the logical space after recovery operations, regardless of whether the decoding is successful. Then, we evaluated the logical error probabilities during these cycles. To estimate the recovery operation, we used a minimum-weight perfect-matching decoder [44], which reduces the decoding problem to an instance of the minimum-weight perfect-matching problem. While this problem is NP-hard when there are Pauli-Y errors, we can approximately solve it by using Edmonds' blossom algorithm [45]; it is known that surface codes show threshold behavior even with this approximation. We used the implementation in Ref. [67] for solving this problem. To estimate the logical error rate, we evaluated $10^5$ samples for each data point in Fig. 4(b) and $10^6$ samples for the other figures.
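As a concrete toy of the matching step, the following brute-force sketch (our own illustration, not the decoder of Ref. [67]) pairs syndrome defects on a line by total distance; Edmonds' blossom algorithm solves the same problem in polynomial time, and real decoders also handle boundaries and two-dimensional lattices.

```python
def min_weight_perfect_matching(defects):
    """Brute-force MWPM over an even number of 1D defect positions
    (edge weight = distance). Only feasible for a handful of defects;
    blossom-based decoders scale polynomially."""
    if not defects:
        return 0, []
    first, rest = defects[0], defects[1:]
    best = (float("inf"), None)
    for j, partner in enumerate(rest):
        w, pairs = min_weight_perfect_matching(rest[:j] + rest[j + 1:])
        w += abs(first - partner)
        if w < best[0]:
            best = (w, [(first, partner)] + pairs)
    return best

# Four defects on a line of syndrome positions: the cheapest pairing is
# (1,2) and (7,9), with total weight 1 + 2 = 3.
weight, pairing = min_weight_perfect_matching([1, 2, 7, 9])
print(weight, pairing)  # -> 3 [(1, 2), (7, 9)]
```

Each matched pair corresponds to a chain of physical errors whose endpoints produced the two defects.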
In the performance evaluation of error mitigation for decoding errors, we assumed that the error channel for each logical gate is a non-uniform logical depolarizing channel obtained in the benchmark of the logical error probabilities in surface codes. Since there is no perfect syndrome measurement in practice, this assumption does not hold exactly. Nevertheless, this approximation is asymptotically correct, and thus, we used it to evaluate the performance of QEM in the case of logical errors.
For the simulation of the Clifford circuits, we used a stabilizer-circuit simulator whose memory allocations were optimized so that the updates for the actions of the Clifford operations become sequential. With this technique, the simulation of stabilizer circuits dominated by Clifford operations rather than by Pauli measurements becomes hundreds of times faster than with existing stabilizer simulators [47].
For the Solovay-Kitaev algorithm, we used the method and implementation proposed in Ref. [36]. While we need to limit the allowed number of $T$-gates, this method outputs a sequence of Clifford and $T$-gates according to the allowed error rate $\varepsilon$. Thus, we searched for the minimum error rate $\varepsilon^*$ with which the algorithm outputs a sequence whose $T$-gate count does not exceed the allowed number of $T$-gates. This search used a simple bisection method, repeated until the accuracy reached $10^{-14}$. For the simulation of the SWAP-test circuits, we used Qulacs [68], a simulator for general noisy quantum circuits that is fast especially when a huge number of simulations must be performed on small quantum circuits.

Appendix H: Quantum error mitigation for decision problems
Several important algorithms in FTQC, such as prime factoring and the calculation of ground-state energies, are procedures that obtain their results via phase-estimation sampling [9,28], not procedures that calculate expectation values of observables. Since typical QEM techniques are designed to reduce the bias in expectation values caused by noise, it is not clear whether QEM can be applied to such sampling algorithms. In this section, we show that QEM can be utilized for mitigating errors in a wider range of problems than the calculation of expectation values. In particular, we show that several promising long-term algorithms, such as ground-state energy estimation via phase-estimation sampling and Shor's factoring algorithm, can be decomposed into a series of decision problems, and that QEM can be applied to each such decision problem. To the best of our knowledge, this has not been noted in the context of QEM, although a similar concept is known in the context of quasi-probability sampling for classically simulating quantum circuits [69,70].
First, we show that ground-state energy estimation with phase-estimation sampling [9,10] can be decomposed into a series of decision problems with a bisection method. In a quantum phase estimation routine, we prepare an initial state and perform phase estimation based on quantum simulation. Then, we obtain a quantum state $|\psi\rangle = \sum_i \alpha_i |\tilde{E}_i\rangle |\psi_i\rangle$, where $|\tilde{E}_i\rangle$ encodes the energy of the $i$-th eigenstate (or a relevant value) in binary representation, $|\psi_i\rangle$ is an eigenstate of the given Hamiltonian, and $\alpha_i$ corresponds to the overlap between the $i$-th eigenstate and the initial state. The Pauli-Z basis measurement is performed on the first register of $|\psi\rangle$, and $\tilde{E}_i$ is sampled with probability $|\alpha_i|^2$. Now, we apply a bisection method to estimate the ground-state energy. The subroutine outputs 1 if the sampled energy is smaller than a given parameter $K$ and outputs 0 otherwise. Suppose that the prepared initial state has an overlap with the ground state larger than $1/{\rm poly}(n)$. If the ground energy is smaller than $K$, the subroutine outputs 1 with probability greater than $1/{\rm poly}(n)$; otherwise, the subroutine always outputs 0. We denote the process before the Pauli-Z measurements as $\mathcal{P}$, and the classical post-processing after the measurements that determines 0 or 1 as $f$. Then, we can describe the procedure as sampling a bit string $x$ with probability $\langle\langle \Pi_x | \mathcal{P} | 0 \rangle\rangle$, where $\Pi_x$ is the POVM element on the first register, and outputting a single bit $f(x)$ by classical post-processing. When the quantum process $\mathcal{P}$ suffers from errors, we can apply QEM to $\mathcal{P}$ to mitigate them.

[FIG. 13. Diagram of the circuit conversion from the original decision problem to the fully quantum picture. Here, $\rho$ is an $n$-qubit initial state, $\mathcal{E}$ is a noisy process, $\mathcal{D}$ is a perfect dephasing noise map, $f$ is a classical binary function on an $n$-bit string $x$, and $U_f$ is a quantum circuit that simulates $f$ as $U_f|0\rangle|x\rangle = |f(x)\rangle|x\rangle$. See the main text for details.]
While there may exist complicated classical post-processing of the sampled bit string after the quantum process $\mathcal{P}$, the combination of sampling $\Pi_x$ and the classical post-processing $f$ can be interpreted as a quantum process with strong dephasing noise. The evaluation of the decision bit $f(x)$ can then be interpreted as the evaluation of the Pauli-Z operator. Therefore, we can interpret the whole subroutine as a fully quantum process for calculating an expectation value, and we can apply QEM to the subroutine. The circuit conversion is shown in Fig. 13.
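The equivalence between "sample $x$, then post-process with $f$" and "measure Pauli-Z on the dephased decision qubit" can be checked with a toy distribution; the distribution and decision function below are arbitrary illustrations of our own.

```python
# Toy distribution p(x) over 2-bit strings and a decision function f.
p = {"00": 0.5, "01": 0.2, "10": 0.2, "11": 0.1}
f = lambda x: int(x.count("1") >= 2)          # decide 1 iff both bits are set

# Classical picture: Pr[f(x) = 1] from sampling and post-processing.
prob_one = sum(px for x, px in p.items() if f(x) == 1)

# Quantum picture: after perfect dephasing, the coherences vanish and the
# Pauli-Z expectation on the decision qubit |f(x)> is sum_x p(x) (-1)^f(x).
exp_z = sum(px * (-1) ** f(x) for x, px in p.items())

# The two pictures agree: <Z> = 1 - 2 Pr[f(x) = 1].
print(abs(exp_z - (1 - 2 * prob_one)) < 1e-12)  # -> True
```

Since $\langle Z\rangle$ is an ordinary expectation value, any QEM technique that debiases expectation values applies to the decision subroutine.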
With this subroutine, we can solve the original ground-state energy estimation with a bisection method. We maintain the minimum and maximum possible energies $E_{\min}$ and $E_{\max}$ as variables. Then, the subroutine is called several times to check whether the ground energy is smaller than $E_{\rm mid} = (E_{\max} + E_{\min})/2$. If the ground energy is smaller than $E_{\rm mid}$, a polynomial number of calls is enough to determine the inequality with high confidence. According to the output, we reduce the possible range of the ground energy $[E_{\min}, E_{\max}]$ by updating $E_{\max}$ or $E_{\min}$ to $E_{\rm mid}$. Since the range is halved in each iteration, we need $O(\log(1/\epsilon))$ iterations to achieve $E_{\max} - E_{\min} < \epsilon$.
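The bisection loop can be sketched as follows; here `noisy_subroutine`, the hidden energy `E0`, and the `overlap` value are hypothetical stand-ins for the phase-estimation decision subroutine, not the actual quantum routine.

```python
import random

E0 = -1.372          # hidden ground energy (used only by the stand-in subroutine)

def noisy_subroutine(k, rng, overlap=0.3):
    """Stand-in for phase-estimation sampling: outputs 1 with probability
    >= |alpha_0|^2 if the ground energy is below k, and 0 otherwise."""
    return 1 if (E0 < k and rng.random() < overlap) else 0

def bisect_ground_energy(e_min, e_max, eps, calls=100, seed=1):
    rng = random.Random(seed)
    while e_max - e_min > eps:
        e_mid = (e_max + e_min) / 2
        # Polynomially many calls suffice: a single 1 certifies E0 < e_mid.
        if any(noisy_subroutine(e_mid, rng) for _ in range(calls)):
            e_max = e_mid
        else:
            e_min = e_mid
    return (e_max + e_min) / 2

estimate = bisect_ground_energy(-2.0, 0.0, 1e-3)
print(abs(estimate - E0) < 1e-3)  # -> True
```

Each iteration is an independent decision problem, which is what allows QEM to be applied round by round at constant sampling overhead.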
When the number of logical errors in each iteration is $O(1)$ on average, the sampling overhead due to the application of QEM is also constant. While this conversion enables the application of QEM, it requires $O(\log(1/\epsilon))$ times more iterations for the binary search compared with the original algorithm. Nevertheless, since these iterations are independent computational tasks, the total number of logical operations and the hardware requirements per single execution do not change. Therefore, we can conclude that QEM can be applied not only to the evaluation of expectation values but also to long-term applications.
For a fair comparison, the following fact should also be noted. In the case of prime factoring, we can apply the same decomposition to the algorithm by finding the minimum non-trivial factor of a given integer. However, when the noise model is stochastic and the sampling overhead of QEM is constant, i.e., the mean number of logical errors during FTQC is $O(1)$, we can obtain a correct answer for prime factoring with a small overhead without using QEM. Since prime factoring is in the NP class, we can efficiently check whether a submitted answer is correct. When the mean number of errors in FTQC is $O(1)$, a noiseless sample occurs with constant probability, so we can obtain a correct non-trivial factor with constant sampling overhead even without QEM. Thus, QEM is not effective for problems in the intersection of the BQP and NP classes, such as prime factoring. In other words, QEM is useful for problems satisfying the following two conditions: (1) the problem is in the BQP class but not in the NP class, and (2) the problem can be reduced to a series of decision problems. Useful algorithms for long-term applications, such as quantum simulation and the estimation of the ground energy of spin models and molecules, satisfy these conditions.
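The efficient NP-style verification invoked above, namely checking a candidate factor by a single divisibility test, can be sketched as:

```python
def verified_factor(n, candidates):
    """Return the first candidate verified to be a non-trivial factor of n;
    corrupted samples from noisy runs are simply discarded. Verification is a
    single division, which is why factoring needs no QEM when the mean number
    of logical errors is O(1)."""
    for c in candidates:
        if 1 < c < n and n % c == 0:
            return c
    return None

# Noisy sampling of factors of 15: the corrupted outputs 7 and 4 are rejected.
print(verified_factor(15, [7, 4, 3]))  # -> 3
```

Repeating until a candidate verifies succeeds with constant expected overhead whenever a noiseless sample occurs with constant probability.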