Compilation of a simple chemistry application to quantum error correction primitives



I. INTRODUCTION
Quantum error correction (QEC), the study of how many noisy physical qubits can be used to represent a smaller number of less noisy logical qubits, has seen significant recent developments in a number of directions. One is the experimental demonstration of error correction successfully suppressing errors on a real-world quantum device [1]. Another is careful resource estimation, which has allowed for more accurate estimates of the resources a quantum computer requires to solve problems of significant interest, from estimating chemical properties [2][3][4][5][6] to factoring RSA integers [7,8]. Together, these developments have helped define both the current state of our ability to suppress noise on quantum devices and where we need to get to in order to solve key industrial problems.
There are some natural next steps following the experimental demonstration of a logical quantum memory. These include implementing basic logical gates: Pauli gates through transversal operations, non-Pauli Clifford gates through lattice surgery techniques [9][10][11][12][13][14][15], and non-Clifford gates initially through error mitigation techniques [16] and later through magic state distillation [17][18][19]. Eventually, a natural goal will be to demonstrate small-scale quantum algorithms, showing that these logical operations can be used to solve a toy application. Understanding the resources required for such an algorithm is important for knowing the point at which small applications can start being solved on fault-tolerant quantum computers, as well as helping us understand the constant factors in the scaling of large quantum algorithms. A number of algorithms have recently been proposed that are aimed specifically at this regime, referred to as "early fault-tolerant" algorithms [20][21][22][23][24]; it is therefore particularly relevant to assess how challenging even minimal applications will be to perform using fault-tolerant operations.

* alex.moylett@riverlane.com

In this work, we estimate the resources required for implementing a small quantum algorithm on a fault-tolerant quantum computer, including detailed consideration of how to perform each required operation using lattice surgery. The application we choose is quantum phase estimation (QPE) applied to finding the ground-state energy of the hydrogen molecule. This application is sufficiently small that related circuits without QEC have already been successfully run on current quantum hardware [25,26]. We investigate optimisations of this algorithm at a variety of levels, including algorithmic [2][27][28][29], gate decompositions [30], compilation to lattice surgery primitives [9][10][11][12][13][14] and generation of magic states [19]. Our final resource estimates are presented in Fig. 1, looking at different physical error rates and techniques which trade off time and space resource requirements. It is worth noting that when implemented on the surface code, even this small application requires hundreds of physical qubits and thousands of QEC rounds. This shows the significant prefactor associated with quantum error correction, and suggests that in early fault-tolerance further techniques will be required to yield small-scale algorithmic demonstrations [16].
The rest of this paper is laid out as follows. In Section II, we review the algorithms and chemical system to be considered, and present the logical quantum circuit. In Section III, we describe how to decompose the logical quantum circuits into operations from the Clifford+T gate set and how to implement these gates on the surface code using lattice surgery primitives. In Section IV we estimate the overhead introduced by quantum error correction. Finally, we conclude with some open questions and further directions for research in Section V.

FIG. 1: Estimated cost in physical qubits and time for calculating the ground-state energy of a hydrogen molecule on an error-corrected quantum computer with varying physical error rates, using iterative quantum phase estimation. Methods for implementing logical gates through either directly implementing Clifford and T gates or moving Clifford gates through the circuit are described in Sections III B and III C, respectively. Further details about estimating these resource requirements are presented in Section IV C.
of quantum phase estimation (QPE) [31]. QPE is one of the key proposed quantum algorithms for calculating ground- and excited-state energies in electronic structure problems. Provided an initial trial state can be prepared that has a sufficiently good overlap with the true ground state (which is usually the case for molecular systems), QPE is capable of obtaining energy estimates to a desired precision in time polynomial in the system size. However, the algorithm requires high circuit depths for non-trivial examples, and so has seen less attention than variational quantum algorithms in current NISQ applications. For fault-tolerant applications, however, it is often regarded as the algorithm of choice.
We focus on the "textbook" [32] and iterative (semiclassical) QPE algorithms [33][34][35][36]. The textbook QPE algorithm is perhaps the best known QPE approach, the circuit for which is presented in Fig. 2a. The algorithm allows one to measure the eigenphases of some unitary $U$ up to $m$ bits of precision; doing so requires $m$ ancilla qubits, in addition to the $n$ data qubits needed to represent $U$. At the end of the circuit, an inverse quantum Fourier transform (QFT) is performed and the ancilla qubits are measured. If the input state $|\psi\rangle$ is an exact eigenstate of $U$, then the measured bits will yield the bits of the corresponding eigenphase. For a non-exact $|\psi\rangle$, the probability of obtaining the desired phase will depend on the overlap between $|\psi\rangle$ and the corresponding exact eigenstate.
The inverse QFT can also be performed in a semiclassical manner [37]. Using such a semiclassical QFT, the resulting phase estimation algorithm is performed iteratively, obtaining one bit of information about the phase from each iteration. We refer to this approach as iterative quantum phase estimation [36]. Iterative QPE has many of the benefits of the textbook approach, including a Heisenberg-limited running time $O(\epsilon^{-1})$ for a precision of $\epsilon$, but has the significant benefit that it uses only a single ancilla qubit.

FIG. 2: QPE circuits used in this paper. In both cases, the state $|\psi\rangle$ is over $n$ qubits. In (b), the circuit is iterated backwards from $k = m$ (in the initial iteration) to $k = 1$ (in the final iteration). The rotation angle in iteration $k$ is $\omega_k = -\pi(0.x_{k+1}x_{k+2}\ldots x_m)$, with $\omega_m = 0$ in the initial iteration. While the ancilla is measured at the end of each iteration, the data qubits remain coherent throughout.
We briefly give some analysis of the iterative QPE approach here. We are interested in estimating the eigenvalues of a Hamiltonian
\[ H = \sum_{j=1}^{L} c_j P_j, \tag{1} \]
where $P_j$ are $n$-qubit Pauli operators and $c_j$ are coefficients. We denote the eigenvalues and eigenvectors of $H$ by $\{\lambda_j; |\Psi_j\rangle\}$. We multiply $H$ by a constant $t$ such that $-0.5 \le \lambda_j t \le 0.5$ for all $j$, which can always be achieved by choosing $1/t = 2\sum_j |c_j|$. We then work with the unitary
\[ U = e^{2\pi i H t}. \]
The eigenvalues of $U$ are $e^{2\pi i \phi_j}$, where the range $0 \le \phi_j \le 1$ can be chosen. It is then simple to obtain $\lambda_j t$ from $\phi_j$, which only differ due to the wrapping of phases; the normalization of $Ht$ above is chosen to avoid potential ambiguity in this wrapping. Therefore each $\phi_j$ can be written in binary as
\[ \phi_j = 0.\phi_{j1}\phi_{j2}\ldots = \sum_{k \ge 1} \phi_{jk}\, 2^{-k}, \qquad \phi_{jk} \in \{0, 1\}. \]
In iterative QPE the bits $\phi_{jk}$ are measured directly using the circuit in Fig. 2b. The circuit is performed for $m$ iterations in order to obtain $m$ bits of precision for $\phi_j$, starting with $k = m$ and iterating backwards to $k = 1$.
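After each controlled-unitary operation, an $R_z(\omega_k)$ gate is applied to the ancilla with angle
\[ \omega_k = -\pi\,(0.x_{k+1}x_{k+2}\ldots x_m), \]
which depends on the measurement results $x_{k+1}, \ldots, x_m$ from previous iterations (and $\omega_m = 0$ in the initial iteration). The data qubits are prepared in an initial state $|\psi\rangle$, which should be an approximation to the exact state whose energy is to be estimated. We write $|\psi\rangle$ in the eigenbasis of $H$ as
\[ |\psi\rangle = \sum_j \nu_j |\Psi_j\rangle. \]
In the first iteration ($k = m$) the controlled unitary contributes the phase $e^{i 2^m \pi \phi_j}$ to each eigenstate, so the state of the qubits before the first measurement is
\[ \frac{1}{2} \sum_j \nu_j \left[ \left(1 + e^{i 2^m \pi \phi_j}\right)|0\rangle + \left(1 - e^{i 2^m \pi \phi_j}\right)|1\rangle \right] |\Psi_j\rangle. \]
Consider the simple case where $\phi_j$ can be represented by exactly $m$ bits, so that $\phi_j = 0.\phi_{j1}\phi_{j2}\ldots\phi_{jm}00\ldots$. In this case $\exp(i 2^m \pi \phi_j) = \exp(i \pi \phi_{jm})$ exactly, and the state of the system before measurement is
\[ \frac{1}{2} \sum_j \nu_j \left[ \left(1 + e^{i \pi \phi_{jm}}\right)|0\rangle + \left(1 - e^{i \pi \phi_{jm}}\right)|1\rangle \right] |\Psi_j\rangle. \]
Thus the probabilities of measuring the ancilla as 0 or 1 are
\[ P(0) = \sum_{j:\, \phi_{jm} = 0} |\nu_j|^2, \qquad P(1) = \sum_{j:\, \phi_{jm} = 1} |\nu_j|^2. \]
Provided that $|\nu_j|$ is sufficiently large for the desired state $|\Psi_j\rangle$, the desired bit will be measured with high probability. The measurement will also project away the contribution from those states $|\Psi_j\rangle$ for which $\phi_{jm}$ does not match the measured result. It is simple to continue this process for subsequent iterations to $k = 1$. After the final iteration, the probability that all of the bits for the desired $\phi_j$ were measured is $|\nu_j|^2$. Therefore, for a sufficiently good initial state, and a sufficient number of repetitions, the ground-state energy can be measured with high probability. Further clear analysis is given in [36].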
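The feedback rule above can be illustrated with a short classical sketch in Python. It is not the circuit used in this paper: it assumes an exact eigenstate whose phase has exactly $m$ bits, and it assumes that iteration $k$ applies the controlled unitary $U^{2^{k-1}}$, so that the ancilla is measured as 0 with probability $\cos^2[(2^k \pi \phi + \omega_k)/2]$. The function name `iterative_qpe_bits` is hypothetical.

```python
import math

def iterative_qpe_bits(phi, m):
    """Recover the m bits of phi = 0.phi_1 phi_2 ... phi_m, last bit first.

    Assumption: iteration k uses the controlled unitary U^(2^(k-1)),
    followed by R_z(omega_k) and a Hadamard on the ancilla.
    """
    x = [0] * (m + 2)  # x[k] holds the measured bit of iteration k (1-indexed)
    for k in range(m, 0, -1):
        # Binary fraction 0.x_{k+1} x_{k+2} ... x_m built from earlier outcomes.
        fraction = sum(x[l] * 2.0 ** (k - l) for l in range(k + 1, m + 1))
        omega_k = -math.pi * fraction
        # Relative phase on the ancilla before the final Hadamard.
        angle = 2.0 * math.pi * 2 ** (k - 1) * phi + omega_k
        p_zero = math.cos(angle / 2.0) ** 2
        # For an exact m-bit phase p_zero is 0 or 1 up to rounding error.
        x[k] = 0 if p_zero > 0.5 else 1
    return x[1:m + 1]

# Example: phi = 0.101 in binary = 0.625
print(iterative_qpe_bits(0.625, 3))
```

For a phase that is not exactly representable in $m$ bits, or for a trial state that is not an eigenstate, the outcome of each iteration would be random and would have to be sampled rather than thresholded as done here.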
In addition to the textbook and iterative QPE methods, there has been recent progress on statistical phase estimation methods [21,23][38][39][40]. Compared to the above approaches, such statistical methods allow shorter circuit depth [22,23] and ready combination with error mitigation techniques [26], in exchange for performing many circuits. It has been suggested that these methods are particularly appropriate for early fault-tolerant quantum computers. We do not consider such methods here, but note that they would be interesting to investigate further in the context considered here.

B. Hamiltonian simulation via Trotterization
In this section we briefly discuss first- and second-order Trotterization, and present an optimization to the latter.
We consider $n$-qubit Hamiltonians of the form of Eq. 1. We specifically denote $H_j = c_j P_j$, so that
\[ H = \sum_{j=1}^{L} H_j. \]
In Trotter schemes more generally, each $H_j$ might correspond to a sum of commuting Pauli terms, rather than a single Pauli contribution. We are concerned with implementing an operator $U = e^{iHt}$, controlled on an ancilla qubit. The well-known first- and second-order Trotter approximations, $U_1$ and $U_2$, are
\[ U_1(t) = \prod_{j=1}^{L} e^{i H_j t} \]
and
\[ U_2(t) = \prod_{j=1}^{L} e^{i H_j t/2} \prod_{j=L}^{1} e^{i H_j t/2}, \]
which have errors $O(t^2)$ and $O(t^3)$ compared to the exact $U$, respectively. Let us consider the number of single-qubit rotations needed to implement the controlled $U_1$ and $U_2$ unitaries, as required to perform QPE. Each controlled Pauli rotation, which we shall denote by $W_j$, can be rewritten as
\[ W_j = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes e^{i c_j t P_j} = e^{i c_j t (I \otimes P_j)/2}\, e^{-i c_j t (Z \otimes P_j)/2}, \tag{16} \]
which is a product of two multi-qubit Pauli rotations. These can be reduced to a single-qubit rotation each after conjugation through an appropriate Clifford [40]. Therefore the cost of each controlled Pauli rotation is 2 single-qubit rotations plus Cliffords, and the number of single-qubit rotations for $U_1$ is $2L$ per Trotter step.

FIG. 3: Circuit diagrams demonstrating reduction of the controlled time evolution operator in QPE with second-order Trotterization. (3a) The time evolution operator controlled on an ancilla, and acting on an initial trial state $|\psi\rangle$, which can be equivalently replaced by (3b) in phase estimation circuits. For the second-order Trotter formula, this can be further reduced to the circuit (3c). Lastly, each pair of boxed terms can be expressed as a single multi-qubit Pauli rotation (3d).
At first glance it appears that for a given $t$, the second-order formula requires $4L$ single-qubit rotations to implement. In fact in QPE circuits this is not the case, and the second-order formula can also be implemented with $2L$ rotations, as for the first-order formula, but with better error suppression. This trick was introduced in Ref. [27], and is known as directionally-controlled phase estimation. It was expanded on in Refs. [2,28] and also used in Ref. [29].
We briefly give a derivation of the directionally-controlled approach. The general procedure is presented in Fig. 3. We consider the controlled time evolution operator in Fig. 3(a). With the ancilla prepared in $(|0\rangle + |1\rangle)/\sqrt{2}$, as in QPE, the state of the qubits at the end of this circuit is
\[ \frac{1}{\sqrt{2}} \left( |0\rangle |\psi\rangle + |1\rangle\, e^{iHt} |\psi\rangle \right). \]
Now, note that we can apply $e^{-iHt/2}$ to the data qubits in Fig. 3(a) without affecting any measurement outcomes; since this operator commutes with all controlled-$e^{iHt}$ gates, it can be moved to the end of the circuit where it has no effect on the measurement of the ancilla. With this additional operator applied, the final state of the qubits is
\[ \frac{1}{\sqrt{2}} \left( |0\rangle\, e^{-iHt/2} |\psi\rangle + |1\rangle\, e^{iHt/2} |\psi\rangle \right), \]
and we see that we can work with the circuit in Fig. 3(b) instead.
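We next expand $e^{iHt/2}$ via its Trotter formula,
\[ e^{iHt/2} \approx \prod_{k=1}^{K} V_k = V_1 V_2 \cdots V_K, \]
where each $V_k$ is the exponential of a single term $H_j$ and $K$ is the number of terms in the Trotter product formula, equal to $L$ for the first-order formula and $2L$ for the second-order formula. Then,
\[ e^{-iHt/2} \approx V_K^\dagger \cdots V_2^\dagger V_1^\dagger. \]
For even-order Trotter formulas the string of operators is symmetric, $V_k = V_{K+1-k}$, and the expansion is unchanged when the order of the terms is reversed. Therefore, for the second-order Trotter formula (but not the first-order formula) we can write
\[ e^{-iHt/2} \approx V_1^\dagger V_2^\dagger \cdots V_K^\dagger, \]
which is equivalent to the circuit in Fig. 3(c). Lastly, note that for $V_k = e^{i\theta P}$ with $P$ a Pauli operator, the paired operators in Fig. 3(c) can each be expressed as
\[ |0\rangle\langle 0| \otimes e^{-i\theta P} + |1\rangle\langle 1| \otimes e^{i\theta P} = e^{-i\theta\, Z \otimes P}, \]
which can be reduced to a rotation on a single qubit plus Clifford gates. Therefore, application of the second-order Trotter formula in QPE can be performed with $2L$ rotations, which is equal to the number required for the first-order Trotter formula. In addition, the Trotter expansion is applied to the operator $e^{iHt/2}$ instead of $e^{iHt}$, resulting in lower Trotter error.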
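The final step of this derivation, that a directionally-controlled pair of Pauli exponentials is a single multi-qubit Pauli rotation $e^{-i\theta Z \otimes P}$, can be checked numerically. The sketch below does so for $P = X$ using plain Python lists as matrices; the helper names are hypothetical, and the check relies only on the identity $e^{-i\theta P} = \cos\theta\, I - i \sin\theta\, P$, valid for any operator with $P^2 = I$.

```python
import math

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def pauli_exp(p, theta):
    """exp(-i*theta*P) = cos(theta)*I - i*sin(theta)*P, valid when P*P = I."""
    n = len(p)
    return [[math.cos(theta) * (i == j) - 1j * math.sin(theta) * p[i][j]
             for j in range(n)] for i in range(n)]

def directional_control(p, theta):
    """|0><0| (x) exp(-i*theta*P) + |1><1| (x) exp(+i*theta*P), ancilla first."""
    n = len(p)
    on_zero = pauli_exp(p, theta)
    on_one = pauli_exp(p, -theta)
    out = [[0j] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = on_zero[i][j]
            out[n + i][n + j] = on_one[i][j]
    return out

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j])
               for i in range(len(a)) for j in range(len(a)))

theta = 0.37
difference = max_abs_diff(directional_control(X, theta),
                          pauli_exp(kron(Z, X), theta))
print(difference < 1e-12)
```

The same check goes through for any Pauli string in place of `X`, since `kron(Z, P)` also squares to the identity.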

C. The hydrogen molecule
We next define the Hamiltonian that we will consider throughout this paper. As an application of QPE, we will consider the common task of finding the ground-state energy of an electronic structure Hamiltonian. Such a Hamiltonian can be defined in second-quantized form as
\[ H = h_0 + \sum_{pq} h_{pq}\, a_p^\dagger a_q + \frac{1}{2} \sum_{pqrs} h_{pqrs}\, a_p^\dagger a_q^\dagger a_r a_s, \tag{23} \]
where $p$, $q$, $r$ and $s$ label spin orbitals. The coefficient $h_0$ defines the nuclear-nuclear contribution (which is just a number due to the Born-Oppenheimer approximation), and $h_{pq}$ and $h_{pqrs}$ are one- and two-body integrals, respectively. The form of these integrals is well known from quantum chemistry [41].
In this paper we are concerned with compiling a minimal chemistry problem to lattice surgery operations, including visualization of the patch layout. We therefore consider the hydrogen molecule H$_2$ in a STO-3G basis, which is a prototypical minimal molecular example, consisting of 2 electrons in 2 spatial orbitals, or 4 spin orbitals. We use an equilibrium geometry with an internuclear distance of 0.7414 Å.
The fermionic Hamiltonian in Eq. 23 must be mapped to a qubit Hamiltonian for use in QPE. Because the minimal basis for H$_2$ consists of 4 spin orbitals, direct mappings will result in a Hamiltonian with 4 qubits. However, as shown by Bravyi et al. [42], the qubit Hamiltonian for this problem can be reduced to just a single-qubit operator. This can be seen from symmetry arguments; the H$_2$ Hamiltonian (in this non-relativistic approximation) commutes with spin and particle-number operators, and also has spatial symmetry. Each of these symmetries allows one qubit to be tapered. More precisely, labelling the bonding and antibonding orbitals as $\psi_g$ and $\psi_u$, and ordering the spin orbitals as $\psi_{g\uparrow}$, $\psi_{g\downarrow}$, $\psi_{u\uparrow}$, $\psi_{u\downarrow}$ (that is, using a spin-interleaved arrangement), the only determinants that contribute to the ground-state wave function are $|1100\rangle$ and $|0011\rangle$, and these two states can be represented by a single qubit. A more general approach for tapering qubits due to $\mathbb{Z}_2$ symmetries is given in Ref. [42].
The Hamiltonian used takes the form
\[ H = c_1 Z + c_2 X, \tag{24} \]
with $c_1 = 0.78796736$ and $c_2 = 0.18128881$, and we have neglected the identity contribution. Note that an identical qubit Hamiltonian was considered in Ref. [25], which performed textbook QPE on a neutral-atom quantum computer.

D. Overall logical circuit
Using the second-order Trotter formula techniques described in Section II B, we derive logical circuits for both textbook and iterative quantum phase estimation. We choose a time step $t = \pi/(c_1 + c_2)$, where $c_1$ and $c_2$ are defined in Eq. 24, in order to ensure that eigenvalues of $Ht$ are in the range $[-\pi, \pi]$. In this simple application, we take just a single time step in the Trotter expansion of $e^{iHt}$. We also perform QPE for just three bits of accuracy in the energy. These simplifications will of course lead to large errors in the final energy estimate; indeed, after removing rescaling factors, using three bits of precision means that the energy can only be estimated to precision $(c_1 + c_2)/4 = 0.242$ Ha. Here, we are primarily interested in understanding the required circuits in terms of lattice surgery primitives. Increasing the number of Trotter steps, or bits of precision, does not provide further insight beyond increasing the circuit depth (and number of ancilla qubits, in the case of textbook QPE).
To implement $e^{-i c_1 t\, Z \otimes Z/4}$ and $e^{-i c_2 t\, Z \otimes X/2}$ we use rotation operations $R_{Z \otimes Z}$ and $R_{Z \otimes X}$, defining $R_P(\theta) = e^{-iP\theta/2}$. Thus we have rotation angles $\theta_1 = t c_1/2$ and $\theta_2 = t c_2$ for the $Z \otimes Z$ and $Z \otimes X$ rotations, respectively. We also make a minor optimisation by combining pairs of $R_{Z \otimes Z}(\theta_1)$ rotations into a single $R_{Z \otimes Z}(2\theta_1)$ rotation where possible. Figures for the logical circuits are provided in Appendix A.
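The numerical parameters quoted above follow from $c_1$ and $c_2$ by simple arithmetic, which the Python sketch below reproduces. The function name `qpe_parameters` is hypothetical, and the energy resolution is computed under the assumption that $m$ bits divide the $2\pi$ range of phases of $Ht$ into $2^m$ equal bins, which reproduces the $(c_1 + c_2)/4$ figure for $m = 3$.

```python
import math

C1 = 0.78796736  # coefficient of Z in the qubit Hamiltonian
C2 = 0.18128881  # coefficient of X in the qubit Hamiltonian

def qpe_parameters(c1, c2, bits):
    """Return (t, theta_1, theta_2, energy resolution in Ha)."""
    t = math.pi / (c1 + c2)   # time step keeping eigenvalues of H*t in [-pi, pi]
    theta_1 = t * c1 / 2.0    # angle of the Z(x)Z rotation, R_P(theta) = exp(-i P theta / 2)
    theta_2 = t * c2          # angle of the Z(x)X rotation
    # Assumption: the 2*pi phase range is split into 2**bits bins of equal width.
    resolution = (2.0 * math.pi / 2 ** bits) / t
    return t, theta_1, theta_2, resolution

t, theta_1, theta_2, resolution = qpe_parameters(C1, C2, bits=3)
print(t, theta_1, theta_2, round(resolution, 3))
```

Note that $2\theta_1 + \theta_2 = \pi$ by construction, since $t(c_1 + c_2) = \pi$.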
For both iterative and textbook QPE circuits, the gates can be grouped into three types: Pauli gates such as the X gate, non-Pauli Clifford gates such as the Hadamard and S† gates, and non-Clifford gates such as the two-qubit rotations and T† gates. These different types of gates require different techniques to be implemented on the surface code, which we shall detail further in Section III.

III. IMPLEMENTING LOGICAL GATES
In this section we discuss how to implement the logical circuits presented in Section II D using operations available on the surface code. The surface code represents logical qubits as patches of $d \times d$ data qubits, where $d$ is the distance of the code [43,44]. Stabilisers consist of weight-4 X and Z measurements on the patch, and logical X and Z observables are defined along the horizontal and vertical boundaries of the patch [9]. The surface code has proven to be a popular candidate for fault-tolerant quantum algorithms, due to both its high threshold and low connectivity requirements. In particular, a number of resource estimation papers use the surface code as the basis for estimating the overhead from quantum error correction [4][5][6][7][8].
This section proceeds as follows. First, we approximately decompose the logical gates into a sequence of Clifford and T gates. Then we consider two potential methods for implementing these gates: in Section III B, we implement the Clifford and T gates directly using native lattice surgery operations; whereas in Section III C, we use commutation relations to remove Clifford operations from the circuit, at the cost of needing to implement more general T-like operations.

A. Decomposition to Clifford and T gates
Ideally we would want to implement logical quantum operations transversally on our error-correcting code, applying the operation to each physical qubit (or group of physical qubits) in turn. Unfortunately, the Eastin-Knill theorem shows that no quantum error-correcting code admits a universal set of transversal logical gates [45]. In the case of the surface code, the logical gates which can be implemented transversally are single-qubit Pauli gates if the code distance is odd. Other gates within the Clifford group can be implemented on the surface code via lattice surgery operations such as patch deformation [9], but non-Clifford gates such as the T gate cannot be implemented directly in an error-corrected fashion.
However, it is possible to approximately decompose an arbitrary unitary operation into a sequence consisting of Clifford gates and the single-qubit T gate. This was shown for arbitrary gates originally using the Solovay-Kitaev theorem [46], and a number of improvements have subsequently been shown for both single- and multi-qubit gates [30,47].
In the case of the QPE circuits in Section II D, the circuits contain a mixture of Clifford and non-Clifford gates. As Clifford gates can be implemented on the surface code via lattice surgery and patch-deformation techniques, such as the ones that we shall describe in Section III B, we only need to decompose the non-Clifford gates. Both the textbook and iterative QPE circuits consist of a series of two-qubit Pauli rotations as part of the Trotter expansion. In textbook QPE, there are controlled phase gates after the two-qubit rotations, to implement the inverse quantum Fourier transform. In iterative QPE, the two-qubit rotations are followed by (classically conditioned) single-qubit phase gates to implement a semiclassical version of the inverse Fourier transform.
We decompose the non-Clifford gates in two steps. First, we exactly compile the two-qubit operations into Clifford gates and single-qubit Z rotations and phase gates using circuit identities presented in Figure 4. Second, we use the gridsynth software package to approximately decompose the single-qubit Z rotations into sequences of one-qubit Clifford and T gates [30]. Note that the two-qubit $Z \otimes Z$ rotations require a different decomposition to the controlled phase gates, due to differences in local phases. In comparison, single-qubit Z rotations are equivalent to single-qubit phase gates up to a global phase, $R_Z(\theta) = e^{-i\theta/2} P(\theta)$, and can therefore be decomposed using the same techniques. An example of using gridsynth to approximately decompose a single-qubit Z rotation into the Clifford and T gate set is provided in Figure 5.
There is a trade-off to be made between the accuracy of decompositions generated by gridsynth and the number of gates required. gridsynth approximates a single rotation $R_Z(\theta)$ up to error $\epsilon$ in the operator norm with typically $3\log_2(1/\epsilon) + O(\log\log(1/\epsilon))$ non-Clifford gates [30]. To get an understanding of how this extends to a whole circuit, we ran simulations of the textbook and iterative QPE circuits with gridsynth decompositions of varying accuracy, from 1 bit to 32 bits. For each number of bits of accuracy, we generate 1000 circuits with the single-qubit rotations decomposed to that degree of accuracy, and simulate each circuit 10,000 times.
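As a rough guide to this scaling, the leading term of the T-count can be evaluated directly. The Python sketch below is only an estimate: it drops the $O(\log\log(1/\epsilon))$ correction, and it assumes that "$b$ bits of accuracy" corresponds to $\epsilon = 2^{-b}$, which is our reading rather than something stated in the text. The function name is hypothetical.

```python
import math

def leading_t_count(epsilon):
    """Leading-order T-count, 3*log2(1/epsilon), for one R_Z rotation.

    The O(log log(1/epsilon)) correction is deliberately omitted.
    """
    return 3.0 * math.log2(1.0 / epsilon)

# Assumption: b bits of accuracy means epsilon = 2**(-b).
for bits in (5, 10, 32):
    print(bits, leading_t_count(2.0 ** -bits))
```

Under these assumptions each rotation costs roughly 30 T gates at 10 bits of accuracy, before counting the Hadamard and other Clifford gates interleaved with them.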
The results are presented in Figure 6. In Figure 6a, we take the total variation distance between the output distributions of the decomposed circuits and that of the perfect QPE circuit. From this we see that for both textbook and iterative QPE, the total variation distance reduces quickly to approximately $8.3 \times 10^{-3}$ at 10 bits of precision per gate decomposition, but tails off beyond this value. This is due to the finite precision with which the total variation distance can be estimated from samples. In the following results, we choose 10 bits of precision for the decomposition of phase gates, as it provides a sufficiently small overall total variation distance for the purposes of this circuit.
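For reference, the total variation distance between two sampled output distributions can be estimated as in the following Python sketch. It is a generic estimator, not the analysis code used for Figure 6, and the function name is hypothetical. Because both distributions are estimated from a finite number of shots, the result does not reach zero in practice even for identical underlying distributions, which is the plateau effect described above.

```python
from collections import Counter

def total_variation_distance(samples_a, samples_b):
    """Empirical TVD, 0.5 * sum_x |p_a(x) - p_b(x)|, from two lists of outcomes."""
    counts_a, counts_b = Counter(samples_a), Counter(samples_b)
    n_a, n_b = len(samples_a), len(samples_b)
    outcomes = set(counts_a) | set(counts_b)
    # Counter returns 0 for outcomes that never occurred in one of the lists.
    return 0.5 * sum(abs(counts_a[x] / n_a - counts_b[x] / n_b) for x in outcomes)

# Toy example with three-bit QPE outcomes written as bit strings.
ideal = ["101"] * 8 + ["100"] * 2
decomposed = ["101"] * 7 + ["100"] * 2 + ["001"]
print(total_variation_distance(ideal, decomposed))
```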
We also present the number of gates required for each gate decomposition accuracy in Figure 6b. For 10 bits of precision, there are approximately 1,300 and 1,000 logical gates for textbook and iterative QPE, respectively. Fewer logical gates can also be used at the cost of increased error; for example, fewer than 1,000 logical gates can be achieved with 5 bits of precision per rotation: 870 gates for textbook QPE, and 740 gates for iterative QPE. The total variation distance at 5 bits of precision is 2.4%.
The results in Figure 6b also show that for this particular circuit iterative QPE requires fewer gates than textbook QPE regardless of decomposition accuracy. This is due to the fact that the inverse QFT step of textbook QPE requires two-qubit controlled phase rotations around fixed angles $\theta$. These are subsequently decomposed into smaller rotations $\theta/2$ and $-\theta/2$, which are then approximately decomposed using gridsynth. In comparison, iterative QPE works with single-qubit phase rotations $\theta$ which are classically controlled. As these angles are larger than those used for the single-qubit rotations in textbook QPE, fewer gates are required to decompose them up to a desired accuracy. For this particular QPE circuit, which is only performed to three bits of accuracy, the smallest rotation angle required beyond the Hamiltonian simulation step in the iterative QPE circuit is $-\pi/4$, which can be implemented as a single T† gate. Hence for the rest of this paper we shall primarily focus on iterative QPE.
Finally, for anyone curious to see an example of the complete logical circuit, we have included example QASM circuits in the supplementary material to this paper [48], including an iterative QPE circuit with phase gates decomposed up to 10 bits of precision. This circuit features a total of 1029 operations, of which there are 13 X gates, 169 Z gates, 34 CNOT gates, 411 Hadamard gates, 13 S/S† gates, 386 T/T† gates, and three measurements in the Z basis. This is the circuit we will estimate the resources for in Section IV. Note that gridsynth is a randomised process, and so different runs might produce different gate decompositions than presented here.

Next, we consider methods to implement Clifford and T gates in the logical circuit. These can either be applied directly, or can be moved to the end of the logical circuit [12]. In this section we first discuss the time and space cost of directly implementing both Clifford and T gates. Both of these estimates will be calculated in terms of the code distance $d$. The approach of moving Clifford operations will then be considered in Section III C.

The simplest gates to implement on the surface code are single-qubit Pauli gates. These operations can be implemented either by applying the corresponding Pauli gate to all data qubits if the distance $d$ is odd, or, if the distance $d$ is even, by tracking their values in software. Due to their simplicity, we shall not focus on how to implement them in this section. Likewise, preparation and measurement of a logical qubit in either the Z or X basis can be done in a single QEC round by preparing or measuring all data qubits in that basis. In many cases these operations can even be implemented at a cost of no additional QEC rounds, by preparing the data qubits at the start of the following round or measuring the data qubits at the end of the preceding round. Note however that preparing or measuring a logical qubit in the Y basis is more complicated [14].
It is important to note that not every Clifford gate presented in Section III will be directly implemented. Any sequence consisting of only Z, S/S†, and T/T† gates can be implemented at the cost of implementing a single T gate (as shown in Appendix B). Thus we can think of the sequences of gates generated by gridsynth, such as those shown in Figure 5, as equivalent to sequences of alternating Hadamards and T-like gates.
Before discussing how to implement non-Pauli gates, we present how our logical qubits are arranged on a quantum processor with nearest-neighbour connectivity. For iterative QPE, we have two logical qubits, each of which is represented by a $d \times d$ patch. The primary lattice surgery operations we utilise are for implementing joint $Z \otimes Z$ measurements. We arrange our logical qubits as $d \times d$ patches such that performing joint measurements with the horizontal observable is easy. We also introduce two additional spaces of $d \times d$ data qubits, which can be used both as routing space for performing joint measurements with the vertical operator, and for additional qubits required for implementing logical gates. We have the layout in Figure 7, which for distance $d$ uses a total of $(2d + 2)^2$ data qubits, or $2(2d + 2)^2$ physical qubits including those used for measurement.
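The qubit count for this layout is easily tabulated. The Python sketch below simply evaluates the two expressions quoted above for a few code distances; the function name is hypothetical, and the count excludes the additional qubits used to generate the states required for T gates.

```python
def layout_qubits(d):
    """Return (data qubits, physical qubits) for the layout at distance d.

    Uses the expressions quoted in the text: (2d + 2)^2 data qubits and
    2(2d + 2)^2 physical qubits including measurement qubits.
    """
    data = (2 * d + 2) ** 2
    return data, 2 * data

for d in (3, 5, 7):
    print(d, layout_qubits(d))
```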

1. CNOT gate
A CNOT gate between a control qubit $c$ and target qubit $t$ can be implemented based on two-qubit joint Pauli measurements [9,11], see Figure 8. Namely, an auxiliary qubit $a$ is initialised in the $|+\rangle$ state, followed by two joint measurements: $Z_c \otimes Z_a$ and $X_t \otimes X_a$. Finally, the auxiliary qubit is measured out in the Z basis, and Pauli corrections are applied based on the outcomes.

This operation can be implemented on our patches via the protocol shown in Figure 9. In Figure 9b, we use the routing space to initialise an additional patch in the $|+\rangle$ state. We then use a merge-and-split operation between the horizontal boundaries of the control patch and the auxiliary patch to perform the $Z \otimes Z$ measurement, and at the same time grow and shrink the target patch to move it into the routing space. Next, we use another merge-and-split operation between the vertical boundaries of the target patch and the auxiliary patch to implement the $X \otimes X$ measurement. Finally, we measure out the auxiliary patch and at the same time use patch growing and shrinking to move the target qubit back to its original space. The remaining Pauli operations can either be applied transversally at the start of the next operation if the distance $d$ is odd, or simply tracked in software if the patch distance $d$ is even. The operations for growing and joining patches require $d$ QEC rounds in order to protect the code from both qubit and measurement errors, the operations for splitting and shrinking patches as well as the single-qubit logical measurement of the auxiliary patch each require a single QEC round, and the Pauli operations at the end of the circuit are effectively free, meaning a total of $3d + 4$ QEC rounds are required to implement the CNOT gate.

FIG. 7: Layout of logical qubits as distance $d = 3$ surface code patches which can be used when directly implementing Clifford and T operations. Orange dots represent qubits used for measuring stabilisers, which are represented by squares and triangles. X and Z stabilisers are coloured in grey and blue, respectively. Data qubits are not shown, but lie on the corners of the stabilisers. Additional qubits lie outside this space for generating states required for T gates, as detailed in Section IV A.

2. Hadamard gate
The Hadamard gate is a Clifford gate whose role is to swap the X and Z observables of a qubit. Naïvely, this can be achieved on a surface code patch by applying a Hadamard operation transversally to all data qubits on the patch, as shown in Figure 10a. However, this has the side-effect of swapping the X and Z stabilisers as well as the logical observables, resulting in a different patch to the one we started with and making joint patch operations such as those used for the CNOT in Section III B 1 more complicated. This effect of swapping the stabilisers can be seen by comparing the patches in Figures 10a and 10b.
If we rotated the patch by 90 degrees around the central data qubit after applying the transversal Hadamard gates, then we would have implemented the logical Hadamard gate. However, this is not possible on a physical device. Instead, we use a patch deformation technique, which we present in Figures 10b-10f, to achieve the same effect [49]. First, in Figure 10b, we grow the patch into a longer one with length $2d + 1$. At the same time we move the corner at the top right in the original patch to the top left in the longer patch. Next, in Figure 10c, we use patch deformation to move the corner on the bottom-right up to the top-right. At this stage the logical observables have changed directions from vertical to horizontal and vice versa. Next, we shrink the patch down in Figure 10d. Now we have the X and Z logical observables swapped, with the stabilisers in their original positions, but the whole patch has been shifted upwards.
To move this patch back to its original position, we start by growing and shrinking the patch in Figures 10e and 10f, but this leaves the patch one row of stabilisers higher than it originally was. To correct this, we use two rounds of SWAP gates to swap the data qubits with neighbouring measurement qubits, as shown in Figure 10f.
The most expensive parts of this process are the stages that involve patch growing and corner movement, which require $d$ QEC rounds each. Since in general two-qubit gates are much noisier than one-qubit gates, the transversal Hadamard at the start of this sequence does not require any QEC rounds. Finally, patch shrinking and transversal SWAP gates each require a single QEC round. With three $d$-round stages, two shrinking steps and two rounds of SWAP gates, the Hadamard gate requires a total of $3d + 4$ QEC rounds.

S/S † gate
The S gate, also known as the √Z gate, is a Clifford gate that applies a phase of i to the |1⟩ state. Like the Hadamard gate, this gate also cannot be implemented transversely on the surface code.
There are various ways of implementing the S gate using patch deformation, similarly to implementing the Hadamard in Section III B 2 [14,49]. However, these require extending the X observable of a patch, and therefore require moving the patch into the routing space and back. Instead, we consider a different technique, which uses an additional patch in the $|Y_+\rangle = (|0\rangle + i|1\rangle)/\sqrt{2}$ state [12]. We then perform a joint Z ⊗ Z measurement between this qubit and our qubit, and measure this auxiliary qubit in the X basis. Finally, we apply a Z correction depending on the outcomes of the two measurements. A circuit describing this operation is presented in Figure 11.
Note that unlike Z and X basis states, the |Y+⟩ state cannot be generated in a single QEC round. Instead, we utilise a different technique to generate Y basis states in d/2 + 2 rounds with no additional qubits [14]. With this additional patch, we can implement the logical S gate using the process described in Figure 12. Generating the |Y+⟩ state in Figure 12a takes d/2 + 2 QEC rounds, the joint measurement in Figure 12b takes d QEC rounds, and measuring the |Y+⟩ state in the X basis takes a single QEC round. Any Z correction can be applied in software at no additional cost, so the total number of QEC rounds required is 3d/2 + 3. Finally, note that the S† = SZ gate can also be implemented at no additional cost, by simply inverting the conditions under which the Z correction is applied.
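The measurement-based S gate can be checked on a single-qubit state vector. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the correction rule used (apply Z when exactly one of the two outcomes is −1) is derived here for this particular sign convention rather than read off Figure 11.

```python
import cmath
import math


def measure_out_auxiliary(psi, phi, zz_odd, x_minus):
    """Unnormalised data state after the joint ZZ and auxiliary X measurements.

    psi     : (a, b), amplitudes of the data qubit a|0> + b|1>
    phi     : phase of the auxiliary state (|0> + e^{i phi}|1>)/sqrt(2)
    zz_odd  : True if the joint ZZ measurement found odd parity (outcome -1)
    x_minus : True if the auxiliary X measurement gave |-> (outcome -1)
    """
    a, b = psi
    w = cmath.exp(1j * phi)
    amp = [[a, a * w], [b, b * w]]  # amp[data_bit][aux_bit], factor 1/sqrt(2) dropped
    out = []
    for data_bit in (0, 1):
        aux_bit = data_bit ^ int(zz_odd)  # only this auxiliary value survives the ZZ projection
        sign = -1 if (x_minus and aux_bit == 1) else 1  # <-| flips the sign of the |1> part
        out.append(sign * amp[data_bit][aux_bit])
    return tuple(out)


def logical_s(psi, zz_odd, x_minus):
    a, b = measure_out_auxiliary(psi, math.pi / 2, zz_odd, x_minus)
    if zz_odd != x_minus:  # Z correction when exactly one outcome is -1
        b = -b
    return (a, b)


def same_up_to_phase(u, v, tol=1e-12):
    # Two non-zero 2-vectors are parallel iff this cross term vanishes.
    return abs(u[0] * v[1] - u[1] * v[0]) < tol


psi = (0.6, 0.8j)
target = (psi[0], 1j * psi[1])  # S|psi>
results = [same_up_to_phase(logical_s(psi, m1, m2), target)
           for m1 in (False, True) for m2 in (False, True)]
print(results)
```

Inverting the condition on the Z correction in `logical_s` yields S† instead, in line with the remark above.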

T /T † gate
The T gate is a non-trivial gate to implement on the surface code as it cannot be performed either transversely or via lattice surgery operations such as patch deformation. Instead, we introduce an auxiliary qubit initialised in the $|T\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2}$ state. With this |T⟩ state prepared, we can implement the T gate using techniques similar to those for implementing the S gate in Section III B 3. The circuit is presented in Figure 13 [12]. First, we perform a joint Z ⊗ Z measurement between the data qubit and the auxiliary qubit. Next, we perform an S gate conditioned on the outcome of this measurement. Finally, we measure out the auxiliary qubit in the X basis, and depending on this measurement outcome apply a final Z gate to the data qubit. However, while the |Y+⟩ state can be prepared on the surface code in a fault-tolerant way in d/2 + 2 QEC rounds, the |T⟩ state cannot be prepared on the surface code in an error-corrected fashion, and thus additional work is required in order to prepare a high-quality |T⟩ state. We shall detail this further in Section IV A.

FIG. 9: Implementation of a CNOT via lattice surgery operations, with the left patch of (9a) being the control qubit and the right patch being the target qubit. Green dots represent stabiliser measurements whose outcomes produce the result of the joint logical Pauli measurement.
We can implement this circuit on our patch layout using the process shown in Figure 14. Note that the patch for the |T⟩ state is not stored in the routing space like the |Y+⟩ is in Figure 12. This is because unlike the |Y+⟩ state, the |T⟩ state cannot be generated in a fault-tolerant process, and instead needs to be generated elsewhere and stored outside of the routing space until it is required. Also note that the patch for the |T⟩ state in Figure 14a is rotated compared to the patches for our data qubits, such that the vertical observable on the auxiliary patch matches the horizontal observable on our data patches. We use this to perform a joint Z ⊗ Z measurement between our auxiliary patch and our data patch via merge-and-split operations in Figure 14b. Finally, in Figure 14c we measure our auxiliary patch in the X basis, and at the same time we potentially apply an S correction using the methods described in Section III B 3. As with the CNOT gate presented in Section III B 1, the Z operation is effectively free as it can be either tracked in software or implemented transversely. The joint measurement requires d QEC rounds, the X measurement requires a single QEC round, and the S correction requires 3d/2 + 3 QEC rounds, leading to a total of 5d/2 + 4 QEC rounds to implement a logical T gate.
Finally, it is worth noting that other sequences of gates can also be implemented using these techniques with no extra cost. In general, any sequence consisting of only T/T†, S/S†, and Z gates can be implemented using the protocol above at the cost of implementing a single T gate. Further details are provided in Appendix B.
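Because the S correction is only needed for one of the two joint-measurement outcomes, the cost of a T-like gate varies between two values. A minimal sketch of that bookkeeping follows; the function names are hypothetical, and `Fraction` is used so that the d/2 term stays exact for odd distances.

```python
from fractions import Fraction


def s_gate_rounds(d: int) -> Fraction:
    # |Y+> preparation (d/2 + 2) + joint ZZ measurement (d) + X measurement (1)
    return Fraction(d, 2) + 2 + d + 1


def t_gate_rounds(d: int, s_correction: bool) -> Fraction:
    rounds = Fraction(d) + 1  # joint ZZ measurement (d) + X measurement (1)
    if s_correction:
        rounds += s_gate_rounds(d)  # conditional S correction
    return rounds


print(t_gate_rounds(12, False), t_gate_rounds(12, True))  # 13 34
```

The worst case reproduces the 5d/2 + 4 figure, while the best case is d + 1 rounds.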

C. Moving Clifford gates
In this section we will consider another way of implementing the logical circuit from Section II on the surface code based on [12]. This technique offers the benefit of only needing to think about how to implement the non-Clifford gates, but at the cost of increasing the complexity of implementing such gates.
state is required. Note that the S† gate can be implemented by inverting the condition under which the Z gate is applied.

Pauli product rotations
The key to this implementation method is that the logical gates we want to implement can be realised as rotations in a particular single- or multi-qubit Pauli basis. More formally, an n-qubit quantum gate can be implemented as a sequence of rotations $R_{P_j}(\theta_j) = e^{-iP_j\theta_j/2}$ for suitably chosen $P_j \in \{I, X, Y, Z\}^{\otimes n}$ and $\theta_j$ [50]. The simplest example of this phenomenon is the Pauli gates themselves, which can be implemented as $P = R_P(\pi)$. Similarly, the T and S gates are both single-qubit rotations in the Z basis, and can thus be realised as $T = R_Z(\pi/4)$ and $S = R_Z(\pi/2)$, respectively. Single-qubit Pauli measurements, although not rotations around a Pauli basis, can also be seen as operations which project a state into a Pauli basis. In the case of QPE for example, measurements project a state into the Z basis.
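These identities hold up to a global phase, which can be confirmed numerically. A small sketch follows, with hypothetical helper names, that works directly with the diagonal of R_Z(θ) = exp(−iZθ/2).

```python
import cmath
import math


def rz(theta):
    """Diagonal entries of R_Z(theta) = exp(-i Z theta / 2)."""
    return (cmath.exp(-1j * theta / 2), cmath.exp(1j * theta / 2))


def strip_global_phase(diag):
    """Rescale so that the first diagonal entry is exactly 1."""
    return tuple(x / diag[0] for x in diag)


t_gate = strip_global_phase(rz(math.pi / 4))  # compare with diag(1, e^{i pi/4})
s_gate = strip_global_phase(rz(math.pi / 2))  # compare with diag(1, i)
z_gate = strip_global_phase(rz(math.pi))      # compare with diag(1, -1)

print(abs(t_gate[1] - cmath.exp(1j * math.pi / 4)) < 1e-12)
print(abs(s_gate[1] - 1j) < 1e-12)
print(abs(z_gate[1] + 1) < 1e-12)
```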
The remaining gates to translate into this picture are the CNOT and Hadamard gates. Although not as easy to see as the gates listed above, both of these gates can be implemented as sequences of Pauli π/2 rotations given in Figure 15 [12]. The Hadamard gate can be decomposed as $H = R_Z(\pi/2)\,R_X(\pi/2)\,R_Z(\pi/2)$, up to a global phase. The CNOT can be written as a joint π/2 Z ⊗ X rotation, followed by a −π/2 Z rotation on the control qubit, and a −π/2 X rotation on the target qubit. This is similar to the circuit used in Figure 8, but with Pauli π/2 rotations rather than Pauli measurements.

FIG. 12: Implementing an S gate on a logical patch. In (12a), a patch in an S state has been initialised in the routing space using the methods provided in Ref. [14]. Green dots represent stabiliser measurements whose outcomes produce the result of the joint logical Pauli measurement.
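The Hadamard decomposition into π/2 rotations can be verified with plain 2×2 matrix arithmetic. The sketch below assumes the ordering R_Z(π/2) R_X(π/2) R_Z(π/2), which is a standard identity (H is proportional to S·√X·S); the helper names are hypothetical.

```python
import math

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rot(pauli, theta):
    """R_P(theta) = cos(theta/2) I - i sin(theta/2) P for a single-qubit Pauli P."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c * (i == j) - 1j * s * pauli[i][j] for j in range(2)]
            for i in range(2)]


half_pi = math.pi / 2
product = matmul(rot(Z, half_pi), matmul(rot(X, half_pi), rot(Z, half_pi)))

# Remove the global phase by forcing the top-left entry to equal 1/sqrt(2).
phase = product[0][0] * math.sqrt(2)
h = [[entry / phase for entry in row] for row in product]

hadamard = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]
max_error = max(abs(h[i][j] - hadamard[i][j]) for i in range(2) for j in range(2))
print(max_error < 1e-12)
```

The removed phase is −i, so the product equals −iH.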

Moving Pauli rotations
The benefit of describing operations as rotations in a Pauli basis is that it becomes easier to understand how to transform them without modifying the outcome of the circuit. For example, in Figure 16, a π/2 rotation in the X basis is moved past a π/4 rotation in the Z basis. The result is that the Z rotation is transformed into a π/4 rotation in the iXZ = i(−iY) = Y basis. These transformations can be applied more generally as well, the rules for which we discuss in Appendix C. The benefit of these transformations to the circuit is that we can move all π and π/2 Pauli rotations, which correspond to Pauli and Clifford operations, past the final measurement operation of the circuit. Operations beyond this point do not affect the outcome of our circuit, and therefore do not need to be implemented. Thus we have reduced our circuit to only involving π/4 Pauli rotations, which correspond to a generalisation of T gates, and joint Pauli measurements. We shall now look at how to implement these more general operations.
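The basis update can be mechanised by multiplying Pauli strings while tracking phases. The following is a minimal sketch, assuming the rule that a commuting basis is left unchanged and an anticommuting basis P′ becomes iPP′; the function names are hypothetical, and the sign returned depends on the direction in which the π/2 rotation is moved, so it should be read as illustrative.

```python
# Single-qubit products: PRODUCT[(a, b)] = (phase, c) means a * b = phase * c.
PRODUCT = {
    ("X", "Y"): (1j, "Z"), ("Y", "Z"): (1j, "X"), ("Z", "X"): (1j, "Y"),
    ("Y", "X"): (-1j, "Z"), ("Z", "Y"): (-1j, "X"), ("X", "Z"): (-1j, "Y"),
}


def multiply(p, q):
    """Multiply two equal-length Pauli strings, e.g. 'ZX' and 'XI'."""
    phase, out = 1, []
    for a, b in zip(p, q):
        if a == "I":
            out.append(b)
        elif b == "I":
            out.append(a)
        elif a == b:
            out.append("I")
        else:
            ph, c = PRODUCT[(a, b)]
            phase *= ph
            out.append(c)
    return phase, "".join(out)


def commutes(p, q):
    # Strings commute iff they differ (non-trivially) on an even number of qubits.
    anti = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return anti % 2 == 0


def move_clifford_past(p, q):
    """Sign and basis of a pi/4 rotation q after a pi/2 rotation p moves past it."""
    if commutes(p, q):
        return 1, q
    phase, basis = multiply(p, q)
    return 1j * phase, basis  # new basis is i * P * P'


print(move_clifford_past("X", "Z"))    # sign 1, basis 'Y'
print(move_clifford_past("ZX", "XI"))  # sign -1, basis 'YX'
print(move_clifford_past("ZI", "IX"))  # commuting case, basis unchanged
```

A sign of −1 corresponds to flipping the sign of the rotation angle.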

Implementing π/4 joint Pauli rotations
First we shall show how to reduce the π/4 joint Pauli rotations to joint Pauli measurements. These will then be implemented using a particular patch layout and lattice surgery operations in Section III C 4.
A circuit for implementing π/4 rotations is presented in Figure 17. This can be seen as a generalisation of the T gate circuit in Figure 13, where now the single-qubit Z basis has been replaced with a general multi-qubit basis P. The auxiliary qubit required for this operation is the same $|T\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2}$ state from Section III B 4. Because the rotation basis has generalised, so too have the corrective gates performed after the measurement. Now, instead of single-qubit S and Z gates we have more general π/2 and π rotations in an arbitrary Pauli basis P. The implementation of the π rotation is still a Pauli operation, and can be either tracked in software or implemented transversely as before. As for the π/2 rotation, one can account for this by employing the same techniques as described in Section III C 2 in an online fashion, moving the rotation past the final round of measurements to effectively remove it from the circuit and adjusting the subsequent operations accordingly [12].

Implementing joint Pauli measurements
Finally we discuss how to implement general Pauli measurements between patches on a surface code. The specific arrangement we use is given in Figure 18a. Note that this patch has more routing space than the one in Figure 7; this is because the more general operations require access to both the horizontal and vertical observables of the patches. This results in six logical patches arranged on a grid of (3d + 4) × (2d + 2) data qubits, or 2(3d + 4) × (2d + 2) physical qubits in total.

FIG. 14: Implementing a T gate on a logical patch. In (14a), a patch in a T state has been provided in some additional space generated by a magic state factory using methods described in Section IV A. Green dots represent stabiliser measurements whose outcomes produce the result of the joint logical Pauli measurement. The S correction is not shown, but occurs in (14c) after the Pauli X measurement.
The most challenging operations to implement are those which include the Y basis of a qubit. This is because the Y basis does not correspond to the horizontal or vertical observable on a surface code patch, but is instead a product of both the horizontal and vertical observables. One option is to decompose π/4 rotations which involve the Y basis of a qubit into a sequence of π/4 and π/2 rotations which only act on the X and Z bases [12]. However, doing so introduces π/2 rotations which cannot be moved past the π/4 rotation without reintroducing the Y basis, so such rotations would need to be implemented. Instead, we utilise another technique from [13] to implement Y basis measurements directly via lattice surgery operations. Some example measurements for implementing Pauli π/4 rotations in the Y ⊗ X and Z ⊗ Y bases are given in Figure 18. These joint measurements require d QEC rounds, followed by a single QEC round to measure the auxiliary patch in the X basis. These two sets of measurement results give us the corrections to move past future operations.
Here we utilise some lattice surgery techniques not used in Section III B. First, we add weight-five stabilisers, known as twist defects, which involve a Y Pauli term on one of the qubits. To ensure the surrounding stabilisers commute with the twist defects, we utilise two other lattice surgery techniques: first, we add domain walls, which are denoted by half-blue-half-grey squares and act as a combination of X and Z stabilisers; and second, we add elongated weight-four stabilisers, which are denoted by blue and grey rectangles. It is important to note that although these techniques allow for direct implementation of joint measurements involving the Y basis, there is an additional cost in that measuring these longer stabilisers requires additional connectivity compared to the layout used in Section III B. These extra connections between measurement qubits are not uniform, and are shown by arrows in Figure 18. In general, for distance d a total of 4d extra connections are required for implementing this algorithm, which connect four columns of adjacent measurement qubits.

IV. ERROR CORRECTION OVERHEADS
We are now ready to discuss the cost of implementing these logical gates on the surface code.There are two primary sources of error which contribute to the probability of a failure at the error-correction level: first, errors from generating |T ⟩ states, which we shall explore in Section IV A; and second, errors from a logical failure on a qubit, which we shall explore in Section IV B.

A. Generating |T ⟩ states
Both of the methods used in Section III require additional qubits initialised in the |T⟩ state. It is possible to initialise a surface code patch into an arbitrary qubit state |ψ⟩, by initialising one data qubit of the patch in the |ψ⟩ state, followed by d rounds of measurements [9]. However, initialising a data qubit into an arbitrary state means that this qubit is initially unprotected from errors, so this method cannot be implemented in a way that reduces the logical error probability below the physical error probability. In fact, it can be shown that there is no fault-tolerant way of initialising non-stabiliser states such as the |T⟩ state on the surface code [51].
Even though patches cannot be initialised in the |T⟩ state in a way that suppresses errors, it is possible to use distillation protocols to reduce the error probability of |T⟩ states. These protocols take multiple noisy |T⟩ states and output a smaller number of |T⟩ states with a reduced error probability [12,[17][18][19][52][53][54][55]. For example, if it is possible to generate 15 |T⟩ states each with error probability p, it is possible to distill these into a single |T⟩ state with error probability $35p^3$ [17]. It is also possible to concatenate these factories to reduce the error probability even further. For example, if the 15-to-1 protocol is used to generate 15 |T⟩ states each with error probability $35p^3$, these can then be used in another 15-to-1 protocol to generate a single |T⟩ state with error probability $35(35p^3)^3 = 1{,}500{,}625\,p^9$ [12]. The cost with these protocols is that reducing the error probability requires additional resources in terms of both time and number of qubits. A summary of several protocols and their associated costs is provided in Ref. [19]. We also provide some example resource estimates for 15-to-1 factories in Table I, generated using code from Ref. [19].
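The effect of concatenating 15-to-1 distillation can be illustrated with a few lines of arithmetic. This sketch uses only the leading-order 35p³ expression quoted above and ignores any additional error sources inside the factory, so it should not be read as a model of the factories in Table I; the function names are hypothetical.

```python
from fractions import Fraction


def fifteen_to_one(p):
    """Leading-order output error of one 15-to-1 distillation round."""
    return 35 * p ** 3


def concatenated(p, levels):
    """Apply the 15-to-1 protocol `levels` times in succession."""
    for _ in range(levels):
        p = fifteen_to_one(p)
    return p


p = Fraction(1, 100)            # exact arithmetic for an input error of 1%
print(float(concatenated(p, 1)))  # one level
print(float(concatenated(p, 2)))  # two levels
print(concatenated(p, 2) == 1_500_625 * p ** 9)
```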
When choosing a suitable protocol, there are multiple factors that we need to consider. First, we need to consider the overall logical failure probability from faulty |T⟩ state generation. This means that if our logical circuit uses m T gates, and therefore requires m |T⟩ states, we need to choose a probability of distilled state failure $p_{\mathrm{dist}}$ such that $m \times p_{\mathrm{dist}}$ is within our error bounds.
The second aspect we need to consider is the time required to generate each |T⟩ state. In order to avoid logical qubits remaining idle as we wait for |T⟩ states to be generated, we need to ensure that |T⟩ states are generated fast enough that they are available as and when they are needed. This depends on both the number of QEC rounds required to generate the |T⟩ states and the number of QEC rounds required to implement these logical operations. If we implement Clifford and T gates directly as described in Section III B, the circuit primarily consists of alternating sequences of Hadamard gates, which take 3d + 4 QEC rounds, and T-like gates, which take between d + 1 and 5d/2 + 4 QEC rounds, depending on whether or not an S gate correction is required. This means that when implementing Clifford and T gates directly, a |T⟩ state needs to be produced at least once every 4d + 5 QEC rounds. In comparison, when Clifford operations have been moved through the circuit as described in Section III C, the only operations required are a single joint Pauli measurement and a single X basis measurement, meaning that a |T⟩ state must be produced every d + 1 QEC rounds. If a single distillation protocol cannot generate states fast enough, multiple instances of the protocol can be run in parallel to generate states more frequently, at the cost of increasing the number of physical qubits [12]. As we show in Appendix D, up to four factories can be placed around the two corners at the top of the routing space. It is possible to add even more factories beyond these four, but doing so could require additional space for routing and storage of |T⟩ states. On the other hand, if a logical |T⟩ state can be generated faster than required, additional storage space is required to protect the state from errors while it waits to be consumed, which can be included as part of the routing space estimates.
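The number of factories needed to keep up with the circuit follows from comparing the factory's production interval with the interval at which |T⟩ states are consumed. A rough sketch of this comparison follows; the function names are hypothetical, and the production intervals in the usage lines (31.3 and 18.05 rounds) are the Table I values quoted in Section IV C.

```python
import math


def demand_interval(d: int, move_cliffords: bool) -> int:
    """Shortest gap, in QEC rounds, between two consecutive |T> state requests."""
    if move_cliffords:
        return d + 1      # joint Pauli measurement + X measurement
    return 4 * d + 5      # Hadamard (3d + 4) + fastest T-like gate (d + 1)


def factories_needed(rounds_per_state: float, d: int, move_cliffords: bool) -> int:
    """Parallel factories required so that supply keeps up with demand."""
    return math.ceil(rounds_per_state / demand_interval(d, move_cliffords))


print(factories_needed(31.3, 12, False))  # direct Clifford + T, p = 1e-3
print(factories_needed(31.3, 11, True))   # moved Cliffords,    p = 1e-3
print(factories_needed(18.05, 5, True))   # moved Cliffords,    p = 1e-4
```

This treats factories as perfectly parallel and ignores routing contention, so it gives a lower bound on the factory count rather than a layout.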

B. Estimating code distance
To reduce the probability of a logical error occurring on one of our logical qubits, we can tweak the code distance d. A higher distance will reduce the probability of getting a sequence of physical errors which lead to a logical error, but comes at the cost of increasing both the number of physical qubits per logical qubit, and the number of QEC rounds per logical operation. In the case of the surface code, the probability of a logical error on a single logical qubit per code cycle assuming a depolarising noise model can be estimated as

$$p_L(p, d) = 0.1\,(100p)^{(d+1)/2}, \qquad (25)$$

where p is the physical error probability [10,12,54]. For the purpose of this application, we want to choose a sufficiently high d that the probability of a logical error occurring on any qubit during any QEC round is within our error bound. We use Eq. 25 to approximate our probability of a logical error at any point in the computation as

$$p_{\mathrm{fail}} \approx (n_{\mathrm{data}} + n_{\mathrm{route}}) \times n_{\mathrm{meas}} \times p_L(p, d), \qquad (26)$$

where $n_{\mathrm{data}}$ is the number of surface code patches for our data qubits, $n_{\mathrm{route}}$ is the number of additional patches used for routing [56], and $n_{\mathrm{meas}}$ is the number of QEC rounds. Given these parameters and physical error probability p, we can pick a distance by choosing an appropriate d such that Eq. 26 is within our target failure probability.
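Distance selection can be sketched as a simple search. The code below assumes the widely used surface-code fit p_L = 0.1(100p)^((d+1)/2) for the per-patch, per-round logical error, and a total failure probability equal to (number of patches) × (number of rounds) × p_L; both are modelling assumptions of this sketch, as are the function names. The round-count formula in the usage lines is the one derived for the direct Clifford + T approach in Section IV C.

```python
def logical_error_per_round(p: float, d: int) -> float:
    """Assumed fit for the logical error of one patch in one QEC round."""
    return 0.1 * (100 * p) ** ((d + 1) / 2)


def failure_probability(p: float, d: int, patches: int, rounds: int) -> float:
    return patches * rounds * logical_error_per_round(p, d)


def smallest_distance(p, patches, rounds_for, budget, d_max=101):
    """Smallest d whose total failure probability fits within the budget."""
    for d in range(3, d_max + 1):
        if failure_probability(p, d, patches, rounds_for(d)) <= budget:
            return d
    raise ValueError("no distance up to d_max meets the budget")


def direct_rounds(d: int) -> int:
    return 2318 * d + 3363  # direct Clifford + T implementation


print(smallest_distance(1e-3, 4, direct_rounds, 0.005))
print(smallest_distance(1e-4, 4, direct_rounds, 0.005))
print(failure_probability(1e-3, 12, 4, direct_rounds(12)))
```

Because the number of rounds itself grows with d, the search re-evaluates the round count at every candidate distance rather than fixing it in advance.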

C. Results
We are now ready to estimate error correction overheads for our iterative quantum phase estimation circuit. As a recap, our circuit consists of 13 X gates, 169 Z gates, 34 CNOT gates, 411 Hadamard gates, 13 S/S† gates, 386 T/T† gates, and three Z basis measurements. As previously described, X and Z gates are free as they can be implemented transversely at the start of a QEC round. Of the S and S† gates, one is used in a sequence of T gates, and can therefore be implemented as a T-like gate. This leaves our costing as 411 Hadamard gates, 34 CNOT gates, 386 T-like gates, 12 S/S† gates, and three measurements.
We also need to make assumptions on the error correction requirements of our algorithm. We assume physical errors correspond to depolarising noise with a physical error probability ranging between $10^{-4}$ and $2 \times 10^{-3}$. We also assume a target failure probability of 1%, though a higher target probability can be used to reduce overheads [7,18]. This target failure probability is split evenly, so the probability of errors occurring from faulty |T⟩ state preparation is at most 0.5%, and the probability of logical errors happening on the qubits used in the logical circuit is also at most 0.5%. For 386 T-like gates, the required error rate per T gate in order to meet this error budget is $1.3 \times 10^{-5}$. This is a higher error probability than what is seen from many distillation techniques [18,19], so instead we use code from [19] to look for smaller factories which still fit within our target failure probability. Note that both factories presented in Table I suffice at error rates $10^{-3}$ and $10^{-4}$.
Our results are presented in Fig. 1. To help explain these resource estimates, the rest of this section will provide detailed costings for physical error rates of $10^{-3}$ and $10^{-4}$. These physical error rates are commonly used when estimating the resource requirements of fault-tolerant quantum algorithms [4,12,19]. For ease of reading, a summary of these results is presented in Table II.

Cost of directly implementing Clifford and T gates
Using the estimates described in Section III B, we note that there are four logical patches to consider when estimating code distance. In terms of time requirements, CNOT and Hadamard gates require 3d + 4 rounds, S/S† gates require 3d/2 + 3 rounds, T-like gates require up to 5d/2 + 4 rounds, and Z basis measurements require a single round. This brings our total number of rounds to 2318d + 3363. Using Eq. 26, we find that for a physical error probability of $10^{-3}$, distance d = 12 achieves a logical error probability of $3.9 \times 10^{-3}$, requiring 1,352 physical qubits for the patches and 31,179 QEC rounds. The factory in Table I produces a |T⟩ state with error probability $8.1 \times 10^{-6}$ on average once every 31.3 QEC rounds, meaning a single factory is sufficient. This factory uses 2,066 physical qubits, along with 288 physical qubits for storing |T⟩ states. Combined with our 1,352 qubits for the logical circuit and routing, this leads to a total of 3,706 physical qubits. The additional logical qubit for storing |T⟩ states increases the probability of a logical error to $4.9 \times 10^{-3}$, leading to a total error probability of $8.1 \times 10^{-3}$.
For a physical error probability of $10^{-4}$, an error probability of $6.9 \times 10^{-4}$ can be achieved with distance d = 6, which requires 17,271 QEC rounds and 392 physical qubits. A |T⟩ state needs to be generated every 29 rounds with an error probability of at most $1.3 \times 10^{-5}$. For this physical error probability, the factory in Table I produces a |T⟩ state on average every 18.05 rounds with error probability $4.7 \times 10^{-6}$. We use a single factory, which requires 522 physical qubits, along with 72 physical qubits for storing |T⟩ states. Adding in our 392 qubits for the logical circuit and routing, this gives us a total of 986 physical qubits. The additional storage space for |T⟩ states increases the probability of a logical error on qubits used in the quantum circuit to $8.6355 \times 10^{-4}$, leading to a total error probability of $2.6 \times 10^{-3}$.
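The total round count for the direct approach is a weighted sum of the per-gate costs over the gate counts given at the start of this section. A short sketch of that tally follows; the dictionary keys are hypothetical labels, the T-like gates are charged at their worst-case cost, and `Fraction` keeps the d/2 terms exact.

```python
from fractions import Fraction

# Gate counts after discarding the free Pauli gates (see the recap above).
GATE_COUNTS = {"H": 411, "CNOT": 34, "T": 386, "S": 12, "MEASURE": 3}


def rounds_per_gate(d: int) -> dict:
    d = Fraction(d)
    return {
        "H": 3 * d + 4,
        "CNOT": 3 * d + 4,
        "S": 3 * d / 2 + 3,
        "T": 5 * d / 2 + 4,  # worst case: S correction always applied
        "MEASURE": Fraction(1),
    }


def total_rounds(d: int) -> Fraction:
    cost = rounds_per_gate(d)
    return sum(GATE_COUNTS[gate] * cost[gate] for gate in GATE_COUNTS)


print(total_rounds(12))
```

For d = 12 this gives 31,179 rounds, matching the figure quoted above, and in general it reduces to 2318d + 3363.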

Cost of moving Clifford gates
If we choose to move Clifford gates through the circuit, we are left with a total of 386 Pauli π/4 rotations, each of which requires d + 1 QEC rounds, and three joint Pauli measurements, which require d QEC rounds each. Therefore our total number of QEC rounds is 389d + 386. We also have six logical patches allocated for both the logical circuit and routing.
For a physical error probability of $10^{-3}$, distance d = 11 achieves a logical error probability of $2.8 \times 10^{-3}$, requiring 1,776 physical qubits and 4,665 QEC rounds. A |T⟩ state needs to be produced once every 12 QEC rounds. We use the same 15-to-1 factory as in Table I; however, a single factory is not sufficient for producing one |T⟩ state every 12 rounds. Instead, we use three factories, which produce a single |T⟩ state on average once every 10.4 QEC rounds and require 6,198 physical qubits for implementing the factories. The additional three logical qubits for storing |T⟩ states increase the probability of a logical error on a qubit used in the quantum circuit to $4.2 \times 10^{-3}$, leading to a total error probability of $7.3 \times 10^{-3}$. The total number of physical qubits is 8,700.
For a physical error probability of $10^{-4}$, distance d = 5 achieves a logical error probability of $1.4 \times 10^{-3}$ at a cost of 2,331 QEC rounds and 456 physical qubits. Using the same factory as in Table I produces a |T⟩ state with sufficiently low failure probability every 18.05 rounds, but a |T⟩ state is required every 6 rounds. Arranging four factories around the data qubits is sufficient to remove this bottleneck. These four factories require 2,088 physical qubits, and 200 physical qubits for storing |T⟩ states. Adding this to our 456 physical qubits for the logical patches and routing leads to a total of 2,744 physical qubits. The extra four logical qubits for storing |T⟩ states increase the probability of a logical error on a qubit used in the quantum circuit to $2.3 \times 10^{-3}$, which means the total error probability is $4.1 \times 10^{-3}$.
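The qubit and round totals in this subsection can be reproduced with a few helper functions. In the sketch below the patch layout formula is the one given in Section III C 4, while the storage cost of 2d² physical qubits per stored |T⟩ state (one stored state per factory) is an assumption inferred from the totals quoted here, not a formula stated in the text; all function names are hypothetical.

```python
def patch_qubits(d: int) -> int:
    """Physical qubits for the six-patch layout used when moving Cliffords."""
    return 2 * (3 * d + 4) * (2 * d + 2)


def storage_qubits(d: int, n_states: int) -> int:
    # Assumption: each stored |T> state occupies one distance-d patch of 2*d^2 qubits.
    return n_states * 2 * d * d


def total_rounds(d: int) -> int:
    return 386 * (d + 1) + 3 * d  # pi/4 rotations plus final joint measurements


def total_qubits(d: int, n_factories: int, factory_qubits: int) -> int:
    return (patch_qubits(d)
            + n_factories * factory_qubits
            + storage_qubits(d, n_factories))


print(total_rounds(11), total_qubits(11, 3, 2066))  # p = 1e-3
print(total_rounds(5), total_qubits(5, 4, 522))     # p = 1e-4
```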

D. Analysis
As we can see from Fig. 1, there are still some significant overheads introduced from quantum error correction. The most optimistic error rates still require hundreds of physical qubits and thousands of QEC rounds, while at an error rate of 0.2% this circuit requires tens of thousands of qubits and QEC rounds.
From the detailed costings of Section IV C, we can identify several bottlenecks with these approaches. For physical qubits, the overhead mostly comes from |T⟩ state factories: at a physical error rate $p = 10^{-4}$ a single factory requires 522 physical qubits, more than the 392 required by the logical patches and routing when implementing Clifford and T gates directly. This is even more prominent when moving Clifford gates through the circuit, where of the 2,744 physical qubits required at error rate $10^{-4}$, 2,288 are for preparing and storing |T⟩ states. Although using fewer factories can reduce the number of physical qubits, this creates a time bottleneck as the data qubits need to remain idle while |T⟩ states are prepared.
Although it is expected that the overhead from such factories will become a less significant factor as we move towards larger quantum computations [19], for early fault-tolerant quantum circuits these overheads are likely to be more costly. This could be improved via more efficient small footprint factories like the ones presented in [19], as well as the use of error-mitigated T gates [16].
When moving Clifford operations through the circuit, another space overhead comes from joint Pauli operations in the Y basis. These require additional routing space and extra connectivity. Optimising the circuit to remove such measurements would also therefore reduce the routing overhead.
In time complexity, a significant bottleneck is the long sequences of Hadamard and T gates which come from gridsynth decompositions. Of the 846 logical operations implemented in this circuit, 797 are either Hadamard or T gates. The Hadamard gate is especially expensive, requiring 3d + 4 QEC rounds. In practice, this means that more than half the QEC rounds are spent implementing Hadamard gates: at a physical error rate of $10^{-4}$, 9,042 of the 17,271 QEC rounds are spent implementing logical Hadamard gates. Time requirements can also be further reduced in general by using gate-based teleportation to execute gates in parallel, though this comes at a cost of more physical qubits [12].
Finally it is worth emphasising that there are other ways in which these resource estimates can be improved above the quantum error correction layer, such as the use of different quantum algorithms [57] and decomposition techniques [47].

V. CONCLUSION
As we enter the era of early fault-tolerant quantum computers, where quantum error correction is able to suppress errors on a logical qubit and basic logical gates are demonstrable, it is essential for us to understand the progress required for large-scale fault-tolerant quantum algorithms. Understanding the requirements of small applications is an important step in the process. In this work, we have analysed a minimal application: estimating the ground-state energy of the hydrogen molecule. We have used several techniques to reduce the estimated resources to approximately 1,000 physical qubits and 17,000 QEC rounds through implementing Clifford and T operations directly, and approximately 2,700 physical qubits and 2,331 QEC rounds when implementing general Pauli π/4 rotations.
It is worth emphasising that even for this small application, the numbers of physical qubits and gates required are several orders of magnitude larger than what has been performed experimentally so far. There are a number of further optimisations which can be made across the quantum computing stack in the hope of reducing these estimates. At the algorithmic level, techniques such as qubitisation have been shown to produce asymptotically shorter quantum circuits [58][59][60][61], and could potentially offer improvements even for this minimal example [57]. Statistical phase estimation methods can allow reduced circuit depth in exchange for performing more samples [22,23,40], and are often stated as being particularly appropriate for the early fault-tolerant era for this reason. At the gate synthesis level, alternative techniques have produced circuits with a smaller T count, at the cost of additional logical qubits [47]. When implementing π/4 joint Pauli rotations, the number of QEC rounds can be further reduced by implementing non-commuting rotations in parallel on separate patches before using teleportation to combine them, though this comes at a cost of more physical qubits [12]. Finally, improvements can be made to the implementation of non-Clifford gates which are more targeted towards early fault-tolerant quantum devices, such as the use of error mitigation when implementing faulty T gates [16], avoiding the need for magic state distillation factories. Algorithms such as statistical phase estimation may remain well suited even in the presence of error mitigation [26].
A final note is that these estimates assume that quantum computers are affected specifically by depolarising noise [7,10,12]. While depolarising noise is easy to mathematically model, the physical noise that affects real-world devices is more complex and cannot necessarily be captured by such a model. An important direction of future work is investigating other more realistic noise models such as leakage and deriving similar scaling formulae to that presented in Eq. 25.

CODE AVAILABILITY
The source code for generating & running the logical circuits, and estimating resources, is available on GitHub [48].

FIG. 6: Performance of gridsynth on textbook and iterative QPE, for increasing bits of precision in the gridsynth decomposition (with 3 bits of precision used in each QPE circuit). (6a) Comparison of the decomposed circuits to the exact circuits in terms of total variation distance of the output distributions. (6b) The number of gates in the overall circuit.

B. Directly implementing Clifford and T gates

FIG. 8: Circuit for implementing a CNOT gate through joint Z ⊗ Z and X ⊗ X measurements. An additional qubit initialised in the |+⟩ state is required.

FIG. 10: Implementation of a Hadamard gate on the right logical qubit via a transversal Hadamard gate, a series of patch deformations, and two transversal SWAP gates. Arrows denote SWAP gates between pairs of neighbouring qubits. In (10f), the first QEC round is to shrink the patch, and the subsequent two QEC rounds occur after each round of SWAP gates.

FIG. 18: Layout of logical patches for implementing joint Pauli measurements. In (18a), the two qubits used in the logical circuit are at the bottom, and an auxiliary qubit is initialised in the |T⟩ state at the top. Example joint measurements required for implementing π/4 Pauli Y ⊗ X and Z ⊗ Y operations in d + 1 QEC rounds are presented in (18b) and (18c), respectively. The auxiliary |T⟩ state is always measured in the Z basis as part of the joint measurement. Green dots represent stabiliser measurements whose outcomes produce the result of the joint logical Pauli measurement. Twist defects are presented in yellow. Arrows between neighbouring measurement qubits show connectivity beyond what is required for the methods presented in Section III B.

FIG. 22: Moving n-qubit π/2 Pauli rotations past other Pauli operations. (22a) A π/2 rotation in Pauli basis P can be swapped with a π/4 rotation in a commuting Pauli basis P′. (22b) A π/2 rotation in a Pauli basis P can also be swapped with a π/4 rotation in a non-commuting basis P′, by modifying the basis of the π/4 rotation to iPP′. These rules also apply to moving π/2 rotations past Pauli measurements, as shown in (22c) and (22d).

FIG. 24: Arrangements of |T⟩ state factories around the routing spaces for (24a) implementing Clifford and T operations directly and (24b) commuting Clifford operations. Green space denotes storage space for |T⟩ states produced by the factories, which are denoted in yellow. Note that the full factories are not shown due to size.
FIG. 4: Decompositions of parameterised (4a) Z ⊗ Z rotations, (4b) Z ⊗ X rotations, and (4c) controlled-phase gates into Clifford operations and single-qubit Z rotations. Note that while single-qubit Z rotations are equivalent to single-qubit phase gates up to a global phase, the two-qubit Z ⊗ Z rotation is different from a controlled phase gate due to local phases.
FIG. 5: Example approximate decomposition of a π/8 Z rotation into a sequence of single-qubit Clifford gates and T gates using the gridsynth software package with three bits of precision. Global phases have been omitted, and some optimisations have been applied to combine multiple S and T gates.

TABLE I: Resource estimates for some example 15-to-1 |T⟩ state factories at physical error rates $10^{-3}$ and $10^{-4}$.

TABLE II: Detailed resource estimates required to perform the iterative QPE circuit described in the main text for the hydrogen molecule, considering physical error rates of $10^{-3}$ and $10^{-4}$. Resource estimates for the |T⟩ state factories are in Table I.