Quantum Computation for Periodic Solids in Second Quantization

In this work, we present a quantum algorithm for ground-state energy calculations of periodic solids on error-corrected quantum computers. The algorithm is based on the sparse qubitization approach in second quantization and developed for Bloch and Wannier basis sets. We show that Wannier functions require less computational resources with respect to Bloch functions because: (i) the L$_1$ norm of the Hamiltonian is considerably lower and (ii) the translational symmetry of Wannier functions can be exploited in order to reduce the amount of classical data that must be loaded into the quantum computer. The resource requirements of the quantum algorithm are estimated for periodic solids such as NiO and PdO. These transition metal oxides are industrially relevant for their catalytic properties. We find that ground-state energy estimation of Hamiltonians approximated using 200--900 spin orbitals requires {\it ca.}~$10{}^{10}$--$10^{12}$ T gates and up to $3\cdot10^8$ physical qubits for a physical error rate of $0.1\%$.


I. INTRODUCTION
Quantum mechanical simulation of molecules and materials is a promising application area of quantum computers [1][2][3] that will enable the calculation of key properties of chemical systems with controllable errors using physically accurate models. Following Feynman's original idea of modelling quantum systems on quantum computers [4] and the first formalized procedures for carrying out such simulations [5][6][7], a plethora of quantum algorithms for calculating energies of molecular systems have been developed in recent years . Similarly, but to a lesser degree, quantum algorithms taking into account the specifics of condensed matter applications have also been conceived. These include the development of different flavors of variational quantum eigensolvers (VQE) [35][36][37][38], the quantum imaginary time evolution algorithm [39], and fault-tolerant algorithms [24,34,[40][41][42][43] for simulation of model Hamiltonians, such as the Hubbard model, as well as first-principles Hamiltonians.
Quantum computers can provide a computational advantage over classical computers only for hard classical problems. These include the simulation of so-called strongly correlated systems and, more practically, problems that are not solved with sufficient accuracy using classical methods with low computation cost -such as Kohn-Sham density functional theory (KS-DFT)) [44][45][46] or coupled-cluster theory [47,48]. Notwithstanding varying definitions and interpretations of "strong correlation", and an ongoing debate regarding the extent to which KS-DFT can describe such systems [49,50], the general consensus is that molecular and solid state systems with a large number of localised d or f electrons present a significant challenge for classical simulations. Examples of such systems include transition metal oxides such as NiO and PdO used in heterogeneous catalysis applications. The number of localised sites in such systems is formally infinite as the solids should be simulated at the thermodynamic limit. In practice, one restricts calculations to a periodic finite-sized cell (also referred to as supercell) with ca. 30-100 unique transition metal atoms; all other atoms in the solid are replicas of those in this computational cell.
The ability to accurately model the electronic structure of materials such as NiO and PdO would no doubt prove extremely useful in the study of heterogeneous catalysis, a field with no shortage of materials that are poorly described by DFT. It is often the case that the interpretation of calculated results (e.g. regarding trends in activity) must be presented with significant caveats regarding the underlying nature of the models used.
In this work, we focus on the calculation of the ground state energy of electrons in materials within the Born-Oppenheimer approximation [51]. This corresponds to finding the lowest eigenvalue of the electronic Hamiltonian for a fixed position of the nuclei. Two main families of quantum algorithms can perform such calculation: VQE [13] and quantum phase estimation (QPE) [52][53][54]. While VQE might have its merits in certain use cases, it appears the emerging consensus is that QPE has a superior scaling with the system size [1,55]. In order to estimate the eigenvalues of the Hamiltonian with QPE, one has to implement a unitary operator encoding the spectrum of the Hamiltonian. QPE requires deep quantum circuits, and as such it will need to run on error-corrected quantum computers. In such error-corrected implementations one must strive to minimize the number of T gates needed to encode the Hamiltonian, high-temperature. Under certain operating conditions, a local oxidizing environment can form within the reactor, leading to the deactivation of Ni due to NiO being present. Thermodynamics can predict the conditions at which this can occur [68]. However, in general, it is still a challenge to obtain reliable or accurate thermodynamic parameters for strongly correlated oxide materials, especially when they deviate form the bulk limit such as in nanoparticles.
Methane has an estimated greenhouse warming potential (GWP 100) of 27.9 [69], meaning its emissions contribute significantly to global warming and climate change; it is therefore necessary to reduce them wherever possible. Among the many different technologies for methane abatement, methane combustion catalysts based on palladium can be found. Such technologies include after-treatment for combustion of natural gas engines (CNG) and diesel oxidation catalysts (DOC) as well as in mine ventilation systems.
In the above applications, methane is efficiently combusted over palladium (or alloyed) oxide catalyst to produce H 2 O and CO 2 , with activity in this process influenced by a number of factors. A technical target in practical catalysis is to reduce the temperature at which this occurs, allowing for a lower operating temperature and more efficient handling of emissions. Partial oxidation can sometimes occur, and may indeed be desirable in the development of processes to produce precursors for more complex chemicals. The ability to simulate accurately not only the activity but also the selectivity, which is a measure of a catalyst's ability to promote the formation of the desired product(s) over other possibilities, is crucial to the prediction of new catalysts. Figure 1(a) shows a schematic of a typical catalyzed reaction. The presence of a catalyst provides additional reaction coordinates, or reaction intermediates, with their own activation energies (E T S1 , E T S2 ). For an effective catalyst, these energies are necessarily lower than the uncatalyzed activation energy E a . In the case of heterogeneous catalysts, reaction intermediates are typically adsorption steps, where one or more of the reactants binds to a surface site of the catalyst. Depending on the complexity of the reaction mechanisms, there may be a large number of these intermediates as well as branches and side reactions that must be considered when studying a reaction in order to determine the key step(s). It is often necessary to find these steps, which govern the activity and/or selectivity of a catalyst, as in doing so, the problem is reduced to fewer dimensions and descriptors that facilitate a more rapid study. For example, in a kinetic analysis the largest activation energy is usually of most interest, as this will be the rate determining step. Whilst this knowledge may be well established in well known reactions, it can be necessary to perform many calculations in more novel applications. Furthermore, whilst the accuracy of current computational approaches may be good enough to predict trends in similar systems, obtaining chemical accuracy and absolute values for detailed kinetic studies remains a challenge. Figure 1(b) shows a typical set of model systems that would be used to estimate the energetics of a heterogeneous catalytic process.
When running simulations of a catalyst, consideration needs to be made of the question at hand and the level of accuracy that is needed. Broadly speaking, we are interested in activity, selectivity and stability. When simulating activity, we often need a kinetic model which can provide rates or turn-over frequencies. If we are interested in screening for materials, it is often sufficient to correlate these rates with descriptors [71].
For example, following the Sabatier principle [72], which is employed primarily for materials screening, calculating the (heterogeneous) catalytic activity of a material is performed by determining the binding strengths of the reactants, products and any important intermediates of a given reaction with the surface of that material. These binding strengths can be determined from energy calculations using a wide variety of models, each with their own trade-offs between accuracy, transferability and computational cost.
However, if we are interested in predicting reactor performance or process conditions then we need significantly greater precision in the simulated parameters. Likewise, simulating the often subtle differences in competing reactions (which result in different products) typically requires greater accuracy in calculations to predict selectivity.
Whilst the questions of activity and selectivity are crucial for a material's function as a catalyst, when looking for a technical solution, the question of stability becomes critical. Catalysts often need to operate over many years under harsh conditions (high temperature, pressure, contaminated conditions and, in the case of electrocatalysis, high potentials and corrosive environments). The simulation of stability introduces a whole range of other problems; for instance, predicting morphological changes and thermal degradation of a catalyst requires a large number of calculations, often of large model systems, to allow sintering of nanoparticles or ceramic supports to be conducted. Material complexity (e.g. simulation of realistic metal/ceramic interfaces), bridging time and length scales where accurate atomic-scale materials properties can be fed into multi-scale models, are all open challenges in this area.
DFT is one of the most successful and widely used models for calculating the energies of molecular and solid state systems relevant to industrial processes. It is an ab initio method that uses functionals of the electronic density to calculate energy rather than attempt to deal directly with the many-body wavefunction. In KS-DFT, the electronic density is constructed using a fictitious set of non-interacting single electron wavefunctions and approximating an unknown correction term. This term, known as the exchange-correlation (XC) functional, includes exchange and correlation effects as well as discrepancy between the real and non-interacting kinetic energy. There are many choices, though all of them approximated, for its form.
Ultimately, it is the use of single-particle wavefunctions in DFT that leads to some of its most prominent short- comings. In the case of NiO, and indeed most transition metal oxides, the strong electron-electron interactions of the d-electrons in these materials is poorly described by approximate KS-DFT, leading to over-delocalisation of these bands (and to the prediction of more metallic electronic structures than the reality). A Hubbard U [73] correction can be used alongside local density approximation (LDA) and generalised gradient approximation (GGA) XC functionals to mitigate this issue in some cases, although it is overly empirical in nature. While the use of hybrid XC functionals such as PBE0 [74] can sometimes perform better [75], due to the inclusion of Hartree-Fock exact exchange, the fraction of exact exchange to use can be varied (depending on the XC functional used), which again leads to empirical fitting. Hybrid functionals are also incomplete (and incorrect) in their description of the electronic structure, and are by no means a guaranteed improvement over GGA functionals in their prediction of transition metal oxide properties [76].
To model the bulk properties of materials effectively, the use of periodic boundary conditions (PBCs) is required, allowing for a simulation box to include only the primitive unit cell in highly ordered systems. Even in disordered systems, periodicity is still imposed (on a larger unit cell), as the approximation still provides more representative models than any non-periodic alternative, without extending the system far beyond practical limits.
The study of heterogeneous catalysis primarily concerns the properties of surfaces, so slab models are often used. These are also periodic, albeit in 2 dimensions rather than 3. Bulk calculations are also required in order to determine the surface energies of the facets of a material, which, for example, allow for the prediction of the expected shape of nanoparticles, as well as which facets are most predominant and relevant for catalysis. The stability of a material is another important aspect that can be predicted by energy calculations on bulk systems.

A. Hamiltonian for Periodic Systems
The Hamiltonian of interacting electrons in the Born-Oppenheimer approximation can be written as follows: whereĤ (0) is a constant term describing nuclear repulsion,Ĥ (1) andĤ (2) are one-body and two-body terms, respectively [77, p. 32]:Ĥ x denotes position and spin, (r, σ), of an electron and the integration domain is over the volume of the macroscopic crystal, V . In this work, we do not consider the external magnetic field or spin-orbit coupling and therefore, the oneand two-body kernels are diagonal w.r.t. spin degrees of freedom. The spatial part of one-body kernel is: where U (r) is the nuclei potential Z a and P a are the nuclear charge and position of nucleus a. The spatial part of two-body kernel is We assume Born-von-Kármán periodic boundary conditions at the boundaries of the macroscopic crystal which is defined by the vectors L 1 , L 2 , L 3 : In this case, the external potential and two-body kernel are defined in terms of their Fourier series: where K satisfies: Crystalline solids consist of unit cells and each unit cell V uc is defined by translation lattice vectors, a 1 , a 2 , a 3 . Each unit cell can be labeled with R indicating a node of the Bravais lattice: Let N α ≥ 1 be the number of unit cells along a α , n α = 0, 1, .., N α − 1 and thus, the total number of unit cells which spans the whole finite macroscopic crystal is N = N 1 N 2 N 3 . We also introduce the reciprocal lattice which is defined as: The vectors b 1 , b 2 , b 3 and a 1 , a 2 , a 3 satisfy the following relations: In the case of crystalline solids, the external potential can also be rewritten in terms of reciprocal lattice vectors, because it is has periodicity of the lattice: This is similar to Eq. (8) but written for unit-cell periodicity instead of periodicity within macroscopic crystal. In order to perform practical calculations, one can choose a single-particle basis which is suitable for the problem of interest:ψ where p can be a set of numbers describing a single particle state such as wave-vector index and band index, for example. The Hamiltonian in the new basis set can be written as: where two-body matrix elements (often referred to as the electron-repulsion integrals or just Coulomb integrals) are: From this definition, g pqrs obeys the 4-fold symmetry relations: If coefficients are real then g pqrs obeys the 8-fold symmetry relations: The set of orbitals {ψ p (r)} should be an orthonormal basis set which satisfies the periodic boundary conditions. Below we describe two basis sets which are commonly used in computational condensed matter physics. Further in the text, the number of spatial orbitals per unit cell is denoted as M (the number of bands) while the total number of spatial orbitals in the crystal is P = M N .

Bloch Functions as a Basis Set
Bloch functions are the solution of a mean-field problem in a periodic potential. The Bloch functions can be written as follows [78, p.167]: ψ kj (r) = e ikr u kj (r) (20) where u k (r) is periodic function with periodicity of the unit cell, u k (r) = u k (r + R), j is the band index, 0 ≤ j ≤ M and k = (k 1 , k 2 , k 3 ) T is the wave vector belonging to the first Brillouin zone which can be defined as: where g β is an integer such that The larger the size of the macroscopic crystal, the larger the number of k-points in the Brillouin zone as can be seen from (22). Using Bloch functions as the basis set the Hamiltonian can be written as: where we used the convention that the Bloch functions are normalized in the unit cell. Due to G δ q+q −k−k ,G , the number of two-body terms scales as O(N 3 ), the same as in the plane-wave basis set [24]. One-and two-body matrix elements are defined as:h where and V uc is the volume of the unit cell. Coefficients in the Hamiltonian are usually complex and the two-body terms obey 4-fold symmetry (18). The composite index from Sec. III A indicates both band and wave vector, p = (k, i), q = (q, j), r = (k , l), s = (q , m).

Wannier Functions as a Basis Set
Wannier functions are the set of localized orbitals which obey the translational symmetry of the crystal: Such localized orbitals can be obtained by carrying out a localization procedure in the supercell or applying a Fourier transformation to the Bloch orbitals [62]: where unitary matrices can be chosen according to localization criteria such as, for example, Foster-Boys [79] (Maximally Localized Wannier orbitals [60]) or Pipek-Mezey [80][81][82]. Contrary to Bloch functions, these functions can be chosen to be real valued [62]. In the Wannier basis set the Hamiltonian iŝ with matrix elements: and which satisfy the following relations: This is due to the fact that Wannier functions obey Eq. (28). In this paper we don't construct Wannier functions from Bloch orbitals but rather choose natural atomic orbitals in the supercell calculations (see Section III C for computational details). Since the two-body term is real, it satisfies Eq. (19). The composite index from Sec. III A indicates both band and unit cell indices, p = (R, j), q = (T, j), r = (R , l), s = (T , m).

Majorana Representation
Majorana operators represent a convenient choice for working with quantum computing algorithms. The reason is that each Majorana operator is Hermitian and can be mapped onto one Pauli string using a qubit representation and, as a result, any unique product of Majorana operators is one Pauli string. The actual qubit representation depends on the choice of transformation such as the Jordan-Wigner [7,83] or Bravyi-Kitaev [11,84]. However, some properties of the Hamiltonian which do not depend on the choice of qubit mapping can conveniently be obtained in Majorana representation. We will use this representation in order to generalize the sparse qubitization approach on Hamiltonians with complex coefficients. Majorana operators are defined as: with an additional binary index specifying the Majorana type. They satisfy the following anti-commutation relations: Following Refs. [32,85], the constant, one-body and two-body terms of the Hamiltonian (16) in Majorana representation can be written as follows:H p≤q,r≤s (p,q)≤(r,s) where the tensors B pqrs = Re(g pqrs + g pqsr ) 2 = g pqrs + g qpsr + g pqsr + g qprs 4 (48) B pqrs is symmetric w.r.t. (p, q) and symmetric w.r.t. (r, s), C pqrs is symmetric w.r.t. (p, q) and antisymmetric w.r.t. (r, s), F pqrs is antisymmetric w.r.t. (p, q) and antisymmetric w.r.t. (r, s), B and F are symmetric w.r.t. interchange of pairs (p, q), (r, s), while C pqrs is not. The reverse relation is This representation of the Hamiltonian is valid for both Bloch and Wannier orbitals and the only differences are indices labeling states and the value of coefficients. For real-valued Wannier functions, only the B pqrs tensor is nonzero while for complex-valued orbitals, such as Bloch functions, all tensors need to be taken into account as coefficients of the Hamiltonian in Eq. (16) can be complex. The Hamiltonian in Majorana representation (39)-(47) provides a decomposition in a linear combination of unitaries (LCU [86]) and the entry-wise L 1 norm of such a Hamiltonian, which is the same in any qubit representation [85], can be written as: where In the context of quantum computation, the magnitude of λ = λ 1 + λ 2 defines the number of repetitions of controlled-unitary in QPE as discussed below.

B. Quantum Algorithms
QPE [52,54] allows to determine the phase of a unitary operator which is implemented in the quantum circuit. Originally, Hamiltonian simulation with the choice of unitary was used, as the time evolution operator can be implemented with Trotterization [87,88] and its phases E j t are directly related to the system's energies E j . Other methods for Hamiltonian simulation have since been developed, including Taylor series [20,23,24] and randomised methods [89][90][91]. Instead of the time evolution operator (57), one can apply QPE to a unitary walk operator W with eigenvalues e ±i arccos(Ej /λ) , from whose phases the Hamiltonian energies can readily be retrieved. This is the approach of more recent qubitisation methods [25,26]. The normalization factor λ is the L 1 norm of Hamiltonian's coefficients in an LCU decomposition, and ensures that all energies correspond to phases between 0 and 2π. The walk operator is constructed from a reflection along with a block encoding of H/λ. The starting point for a block encoding is an LCU decomposition of the Hamiltonian into simpler unitary operators H i that can be readily implemented on a quantum computer. In our case, the LCU decomposition of the Hamiltonian is given in section III A 3, and the H i consist of strings of up to four Majorana fermions that can be implemented in the Jordan-Wigner representation with a ranged operation [27]. The two operators facilitate a block encoding of H/λ because The leading order cost of qubitization algorithms is typically proportional to O(λ √ d/ ). A multiplicative λ is needed to maintain a fixed energy accuracy in phase estimation despite the normalization of the Hamiltonian by λ. Meanwhile, the √ d stems from data loading. The values of the d coefficients of the LCU ω i can be loaded with Toffoli cost of order O( √ d) with a select-swap network [92] (also dubbed QROAM [28]) at the expense of O( √ d) ancilla qubits.
Various implementations of the qubitization walk operator [27,28,31,32] factorize the Hamiltonian to find alternative LCUs with lower d and/or λ. They also show bespoke implementations of PREPARE and SELECT matching the choice of LCU. We base our method on the sparse qubitization method [28,31], which does not factorize the Hamiltonian, but instead exploits sparsity. In the next section, we will give an overview of that method. Then we will describe how we have adapted and extended the method to deal with Bloch and Wannier basis sets. We focus on the asymptotically dominant costs and confine detailed Toffoli and qubit number costings to Appendix A. The number of T gates is 4 times the number of Toffoli gates [27,93].

Overview of Sparse Qubitization
In this section we outline the original sparse qubitization algorithm which has been used for simulation of real Hamiltonians which satisfy 8-fold symmetry relations (19). It was first developed in [28] and improved in [31], Appendix A, on which we base our exposition. Our modifications to this algorithm are presented in the Secs. III B 2 and III B 3. For simplicity of this summary, we focus on the Hamiltonian's two-body terms, of which there are significantly more than one-body terms. The main insight the sparse qubitization algorithm [28,31] uses is that the Hamiltonian's LCU decomposition into unitaries Q is very sparse with respect to the orbital indices, (p, q, r, s), and spin indices, (σ, τ ), especially if small coefficients V pqrs are approximated to zero. Thus one can save on quantum resources required for the data loading: Instead of loading the coefficients for all values of p, q, r, s, σ, τ , only a unique set of non-zero coefficients is loaded onto the quantum computer and the rest of the Hamiltonian can be restored using symmetry restoration circuits. Let these non-zero coefficients be indexed by l = 1 . . . d in an arbitrary way, such that we must load only d data items. While the scaling of d with the total number of spatial orbitals, P , is still expected to be the same as the full number of electron repulsion integrals, one can truncate small coefficients and reduce d. Each data item to be loaded consists of the value of the coefficient V pqrs as well as the corresponding indices p, q, r, s, σ, τ that allow to apply the correct unitary Q pqσ Q rsτ . QROAM allows to load these as qubit bitstrings. However, PREPARE requires the coefficient values as amplitudes of the state, not bitstrings. This gap is bridged by instead loading so-called "keep-probabilities" and performing coherent alias sampling [27]. Fig. 2 shows a sketch of the PREPARE operator.  [28,31], outlined in Sec. III B 1. It prepares a superposition of product states describing the Hamiltonian's LCU: The amplitudes in the superposition are the (square root modulus of the) LCU's coefficients ∝ Vpqrs, whilst the corresponding product state specifies the unitary operator QpqσQrsτ . The state is entangled to an additional junk register, such as the additional qubits needed for coherent alias sampling. Only the qubits indicated with SEL are used as controls by the SELECT operator. At the vertical dashed line (red), the prepared state describes just one symmetry sector w.r.t. spin symmetry and 8-fold symmetry. This reduces the amount of data to be loaded, which is the most expensive step in the circuit. The superposition is then enlarged to more product states covering all symmetry sectors with the symmetry restoration circuits. Note that ancilla qubits (such as in the QROAM or 8-fold symmetry restoration circuit) are not depicted. Despite becoming entangled with the other qubits, like |junk , they do not affect the rest of the circuit.
The number of data items d to load can be reduced further by leveraging symmetries of the Hamiltonian that cause multiple identical coefficients. First of all, coefficient values are independent of spin. Thus we must only load one coefficient, and the other identical terms can be restored in the quantum circuit. Likewise, for a given permutation of orbital indices, eight coefficients which posses 8-fold symmetries (see Eqs (19)) can also be restored in the quantum circuit using only one set of orbital indices. This reduces the number of terms to be loaded d by approximately a factor of 8.
The PREPARE operator is implemented in the following steps illustrated in Fig. 2 (see [28,31] for details): a. Equal superposition state Prepare 1 l=0 |l , where d is the number of non-zero LCU terms (up to 8-fold and spin symmetries). This uses ancillas for amplitude amplification not shown in the figure.
b. Data loading A QROAM loads data of width m qubits. In principle, these m qubits include the value of the coefficient indices p(l), q(l), r(l), s(l) along with the value w l = |V p(l)q(l)r(l)s(l) |/8 and its sign θ.
However, in practice, in order to perform coherent alias sampling, slightly different data items must be loaded [27]. Instead of the coefficient value V pqrs , a data field of ℵ qubits (so-called "keep-probability") is needed. ℵ determines the accuracy with which the coefficients V pqrs are ultimately loaded and can be computed with (A1). Further, coherent alias sampling requires two values of the other data to be loaded (indices p, q, r, s and θ and a qubit not mentioned here to distinguish between one-and two-body terms) [27]. Thus, m = ℵ + 2(4 log P + 2) where P is the number of spatial orbitals (2P is the number of spin orbitals). The QROAM is the asymptotically most expensive step with Toffoli cost Adjusting κ (which must be a power of 2) leads to a tradeoff between Toffoli cost and ancilla qubit count [92] mκ + log(d/κ) .
Choosing κ ∼ √ d to optimise Toffoli cost, both Toffoli and ancilla count of the data lookup asymptotically follow (dropping logarithmic factors) While the QROAM lookup will also have to be uncomputed in UNPREPARE, the cost is lower because it doesn't depend on the size m of the data items when using a measurement-based uncomputation scheme [28].
c. Coherent alias sampling From the information thus loaded, coherent alias sampling then creates the state d−1 l=0 w i λ |l |0/1 |p |q |r |s |θ |junk l , which is entangled to some |junk l that is not relevant. The second register, 0 or 1, flags one-or two-body terms, respectively. d. Symmetry restoration Now, the spin symmetry and 8-fold symmetry must be restored. Two new qubits encoding spin |σ and |τ in the state (|0 + |1 )/ √ 2 are added as further tensor product factors. When the tensor product is expanded, it quadruples the number of states in the superposition (67).
To restore 8-fold symmetry, similarly three qubits (|0 + |1 )/ √ 2 for each of the symmetries p ↔ q, r ↔ s, (p, q) ↔ (r, s) are added as tensor product factors. Swaps controlled on these qubits then swap p, q, r, s registers depending on the symmetry.
The result is a state describing the full LCU in (61). A subtlety of symmetry restorations is that slightly different values of w i must be loaded by the QROAM, because the symmetry restoration accumulates factors of 1/ √ 2. Yet this does not affect the overall subnormalisation λ of the Hamiltonian.
Next, a SELECT operator selects the correct unitary for the p, q, r, s, σ and τ indices. The UNPREPARE operator uncomputes PREPARE. Using measurement based uncomputation, this is much more efficient than PREPARE [28]. The total leading order Toffoli cost is the product of the QROAM cost for data loading (64) with the number of iterations of the walk operator πλ/(2 QPE ) . The normalisation factor, λ, together with the desired accuracy, QPE , determine the number of iterations of the walk operator required for phase estimation.

Generalization of Sparse Qubitization for Bloch Basis Functions
While the original sparse qubitization method supports Hamiltonians with real electron repulsion integrals only, Bloch orbitals usually lead to complex coefficients. We generalise the sparse method to complex Hamiltonians by expanding the Hamiltonian in Majorana strings III A 3 and instead of working with 8-fold symmetry restoration circuits, we introduce Majorana type restoration circuits. For a real Hamiltonian the expansion (40)-(47) only contains the terms (40), (42), and (45). The other terms arise for complex Hamiltonians. The coefficients of the Majorana strings are all real (or purely imaginary for the one-body terms) due to the Hamiltonian's Hermeticity.
We use a SELECT operator [27,32] that allows to select Majorana strings based on: the indices p, q, r, s; spin indices σ, τ ; and four Majorana type indices j 1 through j 4 -as they appear in the Hamiltonian (section III A 3). In addition, the correct sign of the LCU coefficient is selected based on a |θ qubit. In Fig. 3, the control qubits for SELECT are indicated by SEL . The indices p, q, r, s index the Bloch basis functions (20), and as such they are composite indices, each consisting of band index and k-wave-vector index. For the Bloch basis we do not need to split up the composite indices and arbitrarily enumerate them as p, q, r, s ∈ {0, . . . , P − 1}.
A main benefit of using Bloch basis functions even in classical methods is that momentum conservation causes many terms to be zero (see (20)). Therefore we can expect the number of non-zero terms to scale as for an LCAO basis set, while for PW basis sets M 4 can be reduced to M 3 . Since the number of bands M is defined per unit cell, the scaling of the algorithm with the system size, N , is cubic. The PREPARE operator is sketched in Fig. 3. The electron repulsion integrals in the original sparse method possess 8-fold symmetry in the p, q, r, s indices, and this is restored in PREPARE with controlled SWAPS (see Sec.III B 1). In our case, at first, the electron repulsion integrals merely have 4-fold symmetry (18) because the basis functions are complex. Once the Hamiltonian is expanded in Majorana strings (section III A 3), this results in different types of symmetry for different coefficients as explained in sec III A 3. Instead of restoring these symmetries with controlled SWAPS, we rewrite the LCU decomposition of the Hamiltonian such that the symmetries are not explicitly present anymore, see Sec. III A 3. The sums can be restricted to one branch of the symmetry by instead summing over the Majorana type. For example, the coefficient B pqrs in (42) has 8-fold symmetry. Yet the sum is restricted to p ≤ q, r ≤ s, (p, q) ≤ (r, s), such that only one branch of the symmetry is present in the LCU, and it does not have repeated coefficients for 8-fold permutations of p, q, r, s. The "missing" terms are compensated by summing over multiple values of the Majorana type indices j 1 , . . . , j 4 . The resulting symmetry in the Majorana type can then be more easily restored, because it does not involve CSWAPS of multi-qubit p, q, r, s registers. Further, even if we did not restrict the sums in this fashion, some Majorana type symmetry would be still present and have to be restored anyway due to the complex nature of the Hamiltonian.  2), it has Majorana type indices that are used by SELECT to apply the correct Majorana types for the Majorana string described by a product state. jt = 0, 1 and t = 1, 2, 3, 4 enumerates the positions of the Majorana operators in a Majorana string. The circuit for Majorana type symmetry restoration is given in the circuit (71); its input qubits must be loaded by QROAM. The spin symmetry restoration circuit is given in circuit (75). The Hamiltonian has been written such that no 4-fold or 8-fold symmetry is present any more (see Sec. III A 3). To simplify the sketch we have depicted that both spin qubits are loaded by QROAM (See the circuit (76) for details).

Majorana
The LCU has multiple repeated coefficients for different values of j 1 , . . . , j 4 of the Majorana type indices, up to signs like (−1) j1+j4 . Rather than repeatedly loading the same coefficient with a different sign multiple times, we restore the Majorana type in PREPARE with the following symmetry restoration circuit:

1-body/2-body
The circuit needs three input qubits |s 1 |s 2 |s 3 along with the qubit flagging one-body or two-body terms that together distinguish all of the types of terms and thereby Majorana type symmetry in the LCU. The four output qubits |j i are initialised as |0 and will indicate Majorana type to be used by SELECT (see Fig. 3). The initial values of the |s i qubits must be loaded from QROAM [94]. They identify terms in the Hamiltonian as follows: Let us give an example of Majorana type symmetry restoration for a term of type B, i.e. (42). The table above shows that the qubits are loaded such that at the vertical dashed line (red) in Fig. 3 the input qubits are |1 or 2 body |s 1 |s 2 |s 3 = |1 |0 |0 |0 .
The symmetry restoration circuit (71) acting on these then results in the state This selects the correct Majorana type indices and signs for the B-type term in (42). Note that when inverting the circuit for UNPREPARE, one must omit the CZs. Otherwise the −1 in the Majorana symmetries of the coefficients (like in (74)) introduced in both PREPARE and UNPREPARE would cancel. The cost of this circuit is subleading, except to the extent that the necessary data loading increases m. Its cost can be reduced by using a unary iteration over the |s i qubits. As in the original sparse method, the spin symmetry is also restored on the quantum computer. Identical coefficients corresponding to different spin configurations must only be loaded once. Here we have two spin qubits |σ and |τ for the first two and second two Majoranas, respectively. Because we have more possible combinations than in the original sparse method, we need a short spin symmetry restoration circuit Note that (contrary to the simplified Fig. 3), the qubit |σ is not loaded from QROAM but initialised as zero. As in the original sparse qubitization algorithm, the dominant cost is (68), the product of the number of iterations in phase estimation and the QROAM cost of data loading. Here, the parameters are as follows: • QPE is the error budget for the phase estimation.
• λ is the normalization factor of the Hamiltonian, i.e. the L 1 norm of the LCU (section III A 3). It depends on the specific material under consideration.
• d is the number of non-zero coefficients of the LCU to load from QROAM. (Up to spin symmetry and Majorana type symmetry, which are restored with circuits as discussed above.) • m is the size of each of the d data items that need to be loaded. Specifically, we have m = ℵ + 2 (4 log(P ) + 6) The values of the coefficients V pqrs are effectively encoded in the ℵ qubits and restored with coherent alias sampling [27]. The other qubits correspond to the necessary indices and further qubits to be loaded. For technical reasons of coherent alias sampling, they must be loaded with two values each, explaining the factor of 2 in above equation. First, 4 log(P ) is the total width of the four basis function indices p, q, r, s. A small number of qubits are needed for Majorana type restoration, spin symmetry restoration, distinguishing one-and two-body terms, and the sign of the coefficient.
• κ is a power of 2, and can be tuned to achieve a tradeoff between Toffoli and ancilla count (65). We choose it such that Toffoli count is minimized. Then the overall cost (dropping logarithmic factors) is

Generalization of Sparse Qubitization for Wannier Basis Functions
The algorithm closely follows the one for Bloch states. However, since Wannier functions can be chosen to be real, only the terms (40), (42) and (45) in LCU expansion from Sec. III A 3 are non-zero. Additionally, these terms possess the same translational symmetry as ERIs (35). This is taken into account through a translational symmetry restoration circuit which further reduces the cost of quantum computation. We sketch the PREPARE operator in Fig. 4. Translational symmetry can be leveraged to avoid loading repeated coefficient values. In order to restore it with a symmetry restoration circuit, the compound indices p, q, r, s must be split into an orbital index i t and cell index n t = (n t,1 , n t,2 , n t,3 ) T as indicated in Fig. 4, t = 1, 2, 3, 4 enumerates the positions of the Majorana operators in a Majorana string. The figure also shows that SELECT must now be controlled on all qubits constituting the compound indices. While i t ∈ {0, . . . , M − 1} for the number of orbitals per cell M , the cell index n t,i ∈ {0, . . . , N i − 1} for each spatial direction. Even though the total number of non-zero terms is larger than in the Bloch basis set, the translational symmetry reduces the number of unique non-zero terms d to the same asymptotic scaling as in the Bloch basis (70). The spin restoration circuits work identically to circuits used for the Bloch representation, while the Majorana type symmetry restoration circuit (71) can be simplified: Since the only terms appearing are the one-body term Eq. (40), and two two-body terms Eq. (42) and Eq. (45), we can remove all of the gates from (71) that are controlled for other contribution in the Hamiltonian. Further, we can remove the qubits |s 1 |s 2 keeping only |s 3 .
The translational symmetry of the LCU coefficients B from Eq. (42) can be expressed as The in-place additions [93] are to be performed separately for the three spatial directions of n t . In this circuit, n 4 is an ancilla and the index must not be loaded from QROAM, leading to a reduction in m compared to the Bloch basis.
The dominant cost of the algorithm in the Wannier basis has the same formula (68) and scaling (78) as in the Bloch basis. Yet, m is lower in the Wannier basis. This reduces the dominant Toffoli (64) and qubit costs (65) which stem from data loading. Further, d is expected to be lower (if only by a factor) because of the two-body terms, only Eq. (42) and Eq. (45) are non-zero for real orbitals, and, as we will find in Sec. III C 1, more small coefficients can be truncated. The dependence of the normalization factor, λ, which has a large effect on the dominant cost, on the choice of the basis set is discussed in Sec. IV.

Error Correction Scheme
The details of the error correction implementation we used in this work can be found in Sec. 4 of Ref. [55], but for completeness a short description is also provided here.
We estimate the error correction overheads following Litinski's Game of Surface Codes scheme [65]. The computational qubits are arranged in fast blocks [65], that enable the consumption of one magic state per time step -the clock rate of an error corrected quantum processor. We select magic state factories from a subsect of the ones described in [65] [96]. We allow ourselves as many magic state factories as necessary, so that computational qubits do not idle between magic state injections. Magic state factories are arranged around the computational qubits, accounting for the routing space necessary to move the distilled T states to the computational area.
The code distance d and magic state factory types are determined by allocating a failure rate budget of 1% to error correction; this budget is divided between magic state distillation failure (0.1%) and probability of logical circuit failure (0.9%). Logical circuit failure is estimated using the conventional formula for error rate in the surface code [56]: where p is the probability of physical error, that we take to be 10 −3 or 10 −4 . The former is an accepted rate for superconducting systems, while the latter is considered optimistic in these platforms. To obtain runtime from T-gate count, we assume serial implementation and multiply the number of T gates by the duration of the time step, which in the surface code is d times the duration of the code cycle. We take the code cycle (the time needed to carry out one round of stabiliser measurements) to be 1µs long -this is a conventional figure for superconducting qubits [97,98].

C. Classical Computational Details
In order to generate ERIs, we have carried out restricted Hartree-Fock calculations in the Gaussian basis set using the PySCF software [99][100][101][102]. The Fermi-Dirac distribution of occupation numbers has been used. In order to numerically study the scaling properties of the algorithms, we carry out calculations on a model hydrogen (H) crystal in body-centre cubic (BCC) structure. Quantum resource estimations have also been applied to realistic solids such as lithium hydride (LiH), NiO and PdO. In the case of H and LiH, all-electron calculations have been carried out with STO-3G [103][104][105][106][107] basis set, while for NiO and PdO the GTH pseudopotentials [108,109] with GTH-SZV basis set have been used [110,111]. Gaussian density fitting has been employed for the calculation of ERIs [112].
The pseudopotential setups with 18 valence electrons for Ni and Pd, and 6 valence electrons for O have been used. The geometry of H has been optimized with GPAW software [113][114][115], the geometry of PdO has been optimized with Quantum Espresso [116,117], while experimental structures have been used for LiH [118, p.7] and NiO [119]. We provide geometry files in the Supplemental Information. In order to obtain spatially localized orbitals satisfying translational symmetry (28), the natural atomic orbitals [120] in a supercell have been constructed. We have then validated that the translational symmetry (34), (35) of the integrals is satisfied (see Table I). Calculations in the supercell have been carried out using cubic cells of different sizes for H, LiH and NiO, while a tetragonal cell has been used for PdO. For calculations using Bloch functions, we used primitive 2-atom cells for LiH and NiO with varying k-point meshes (up to a 3x3x3 mesh). The unit cell used for H consists of two atoms while the tetragonal unit cell of PdO has been chosen such that it contains four atoms.

Errors and Truncation Strategies
The total additive error of the energy estimation can be split into three errors [31]: where QPE is the error due to limited precision of the QPE, trunc is the error due to discarding Hamiltonian coefficients with a certain precision, and prep is the error due to finite precision of state preparation amplitudes. Given the accuracy we distribute the total error as Since in condensed matter calculations one is often interested in the energy of the crystal per formula unit (f.u.), we allow the total error, , to grow proportionally to the system size, that is the error in energy estimation per f.u. is fixed when the size of the supercell increases. The required total accuracy of calculations strongly depends on the problem of interest. For example, the band gap of transition metal oxides is on the order of a few eV, while the energy difference between different crystal structures can be up to a few hundred meV/f.u. [121]. Therefore, we carry out truncation with different accuracy of 50 meV/f.u. (1.8 mHa/f.u., which is close to chemical accuracy), 5 meV/f.u., and 0.5 meV/f.u. One of the bottlenecks in the quantum algorithm is the number of data items, d, that must be loaded into a quantum computer. The number of data items can be reduced by truncating small terms in the Hamiltonian. In order to estimate the error trunc due to truncation of the Hamiltonian coefficients, one usually exploits classical heuristics based on the norm of the Hamiltonian's coefficients [32] or classical quantum chemistry calculations such as coupled-cluster theory [28,31]. For large systems, such as crystalline solids, high-order coupled-cluster calculations require large computational efforts and, therefore, we have used a simpler truncation methods based on the LCU's L 2 norm following Ref. [32]. Namely, if H = i w i U i as in Eq.(58) then we define the truncated Hamiltonian H trunc = i w trunc i U i based on the following equation: One can also estimate the error based on L 1 -norm truncation and even though such an estimation is rigorous, it does not allow truncating many coefficients. As shown in Fig. 5, there are many two-body terms in the Wannier representation with magnitude less then 10 −8 Ha, and as a result, the error estimate based on L 1 -norm would already be on the order of 0.01 Ha if all these terms are neglected. The L 2 -norm allows truncating much more coefficients but it is not a rigorous error estimate and, therefore, it might provide optimistic resource estimations. Instead of using L 1 -norm-based truncation, we will also present results for a truncation threshold of 10 −9 Ha in order to understand how much L 2 -norm truncation reduces resource as compared to an accurate representation of the Hamiltonian. We use this procedure for moderate size systems. However, large systems such as NiO and PdO with supercells made of 64 and 72 atoms with around 900 spin orbitals require large amount of memory. In this case, we save only those integrals whose absolute value is less then 10 −7 Ha and then the Hamiltonian coefficients in Majorana representation are truncated with the threshold (1.81 × 10 −6 Ha and 3.82 × 10 −7 Ha for NiO-64 and PdO-72, respectively) obtained from calculations on smaller supercells at a total accuracy, , of 5 meV/f.u. The absolute value distribution of two-body matrix elements for H using Bloch and Wannier functions. The Hamiltonian terms were truncated at 10 −16 Ha. The k-point mesh of (4, 4, 4) is used for Bloch functions with a 2-atom unit cell, and a 128-atom supercell is used for constructing Wannier functions.

A. Scaling
Both Wannier and Bloch functions provide the same spectrum of the Hamiltonian since they are related to each other through unitary transformation. However, the Hamiltonian have different properties in different basis sets and the main task is to chose such basis functions which minimize the total cost of quantum computation. It is proportional to the square root of the number of Hamiltonian coefficients, d, loaded into the quantum computer times the L 1 norm, λ, of the Hamiltonian divided by the precision of QPE simulation QPE (see Eq. (78)). With the example of model H in a BCC lattice structure, one can see that the number of non-zero terms in the Hamiltonian in Bloch representation scales as a cubic power of the system size, which is almost an order of magnitude better than in Wannier representation if no symmetries are taken into account (see Fig. 6(a)). However, the trend is opposite for the L 1 norm, λ, and the Hamiltonian has a much smaller norm using Wannier functions (see Fig. 6(b)) which in turn significantly reduces the number of controlled-unitary steps in Hamiltonian simulation. This is on par with molecular calculations where the set of localized orbitals reduces the L 1 norm [85]. The number of T gates shown on Fig. 7 is significantly lower in Wannier representation and has a better asymptotic scaling. This is achieved by (i) loading only unique Hamiltonian terms into the quantum computer as discussed in III B 3, and (ii) providing a lower L 1 norm, λ, than in Bloch representation. Further reduction in T-count can be accomplished by truncating the Hamiltonian coefficients using the L 2 -norm based truncation as shown on Fig. 7. After applying the L 2 -norm truncation, the number of T gates in model H systems scales as O(N 1.5 ) with the system size. Such a low asymptotic scaling can also be explained by the fact that the permissible error grows proportionally to the system size and the number of terms, remained after truncation in such model systems with Wannier functions, scales as O(N 3 ). In molecular systems, one can expect that localized orbitals produce Hamiltonians with quadratic scaling w.r.t. number of non-zero terms [122]. We can expect the same for materials with large band gaps while the model H system in our calculations have zero-gap at both PBE and Hartree-Fock level of theory.

B. Resource Estimations
The number of T gates required for a single shot of the qubitized QPE circuit is presented in Fig. 8. When Wannier functions are used as a basis set and the targeted accuracy is 50 meV/f.u., the number of T gates in the circuit is less than 10 11 for all materials, system sizes and truncation strategies used in this work. As one can see, reducing the total permissible error of the Hamiltonian simulation by a factor of 10 increases the number of T gates by approximately an order of magnitude in agreement with Eq. (78). The L 2 -norm truncation strategy reduces the number of T gates but not much except for H in Wannier basis, where the truncation of coefficients leads to an order of magnitude reduction of T gates. Similar trends are observed when Bloch functions are employed. However, in this basis set, the T-gate count is consistently higher than the T-gate count obtained with Wannier functions: for H the number of T gates is 25 times larger, for LiH with 54 k-points the T count is four times larger than for LiH with 64 atoms in the supercell whereas for NiO the difference between the two approaches is almost two orders of magnitudes. We can see that more than an order of magnitude comes from the fact that the L 1 norm in Wannier representation is smaller and the other improvement comes from truncation of small coefficients and symmetry considerations.  The number of physical and logical qubits for simulations described above are shown in Fig. 9(b,c). As can be seen from Fig. 9(b), the smallest simulations will require few million physical qubits if the physical error rate reaches 0.01%, while the largest simulations of NiO and PdO need about 65 million physical qubits. For the error rate of 0.1%, quantum error correction requires the number of physical qubits to be 4-5 times larger. The number of logical qubits [see Fig. 9(c)] required for the simulation of small cells is around few thousands while large super cells would need around 10 5 logical qubits. We note that the improvement in the physical error rate occasionally reduces the number of logical qubits. This is because the number of logical qubits is the sum of the computational qubits and the magic state factory qubits. The computational qubit count is a feature of the system we study, and does not depend on the error rate of the quantum computer. The number of logical qubits dedicated to magic state distillation could in principle change when the error rate changes -because the fidelity with which we need to distill magic states depends on the error rate. Yet, many of our resource estimations just require the highest fidelity factory (225-to-1) in our list [see footnote in Sec. III B 4] for either error rates (0.1% or 0.01%), and hence the logical qubit count does not change. For much smaller error rates, smaller factories would suffice and we would see such changes.  x-axis is the total number of spin orbitals. The number next to the name of the crystal indicates how many atoms in the supercell contains and how large the k-point mesh has been used for Wannier and Bloch basis sets, respectively.

V. DISCUSSION AND CONCLUSION
The estimation of the ground state energy of crystalline solids with a supercell of ca. 50-70 atoms requires ca. 10 10 -10 12 T gates when the size of the basis set is ca. 300-500 spatial orbitals. This is comparable to the T-gate count required for the estimation of the energy of molecular systems within the active space of several tens of orbitals. For example, simulation of a Ru complex with 65 spatial orbitals and using double factorization (DF) requires around 4.6 × 10 10 T gates [32] while simulation of cytochrome P450 with 58 spatial orbitals and using tensor hypercontraction (THC) [31] requires around 7.8 × 10 9 T gates [123]. Thus, if molecular Hamiltonians can be simulated within a reasonable time then so can a Hamiltonian describing crystalline solids. However, the number of logical qubits is larger for solids, 10 4 -10 5 , which is due to the fact that the number of orbitals considered in this work is larger almost by an order of magnitude as compared to molecular resource estimates.
We have considered the use of minimal Gaussian basis sets. However, for realistic solids one would need to use at least DZP or TZP basis sets, which would lead to a higher T-gate count. However, numerical studies for molecular systems indicate that methods like THC provide the best asymptotic scaling with respect to the number of orbitals for molecules [123]. Thus, using such approaches one might still obtain a reasonable resource estimates for crystalline solids. In order to carry out such estimations, one would have to (i) adapt such methods for periodic systems by also taking the translation symmetry into the consideration and (ii) develop classical electronic structure software for efficient generation of factorized Hamiltonians. For example, in order to generate ERIs for NiO with 64 atoms in the supercell and using GTH-DZP basis set one would need to use several TB of memory. Another approach which would allow to perform useful simulations of solids on error-corrected quantum computers with larger basis sets is by choosing the active space within the size of several hundred of orbitals or by using quantum embedding methods.
In this work, we have investigated how the translational symmetry of the Hamiltonian can be exploited in order to reduce the quantum resources. Similarly, other symmetries such as point group symmetry can be taken into account. However, we do not expect that this will lead to an order of magnitude reduction of T gates as qubitization-based algorithms scale as the square root of the number of terms. The use of Brillouin zone symmetry can also reduce the cost of quantum algorithms in Bloch basis set but we expect that Wannier representation will provide a better resource estimates for moderate-size systems. For example, 27 k-points were used for NiO and even if each k-point provided the same coefficients it would reduce the cost by a factor of 5.
The efficiency of the QPE to estimate the ground state energy depends also on the overlap between an initial state such as Hartree-Fock state and the true ground state wave function. We have not investigated this in this work. For molecular systems, this overlap can appear to be either sufficient [32,123] or small [124] and more investigation needs to be carried out for crystalline solids. In this work, we have focused only on the single-shot cost of the total QPE circuit.
In classical computations of ground state energy of periodic solids, using Bloch functions is currently considered to be the most efficient approach in both KS-DFT [125] and wave function methods such as coupled-cluster theory [102]. Wannier functions are often used as a post-processing tool for calculation of properties such as conductivity or band structure interpolation [62]. In quantum computing, however, the Wannier functions represent an efficient choice as a basis set for the ground state energy calculations. Other areas, such as linear-scaling DFT also uses the Wannier functions [126,127] as the primary basis set.
In conclusion, we have considered ground state energy estimation of crystalline solids on error-corrected quantum computers using qubitization based QPE. We present two materials such as NiO and PdO which are known to be challenging systems for electronic structure methods on classical computers and are relevant for heterogeneous catalysis. The investigation of properties beyond ground state energy calculations will be addressed in the future; this work is a first step towards practical algorithms for simulation of crystalline solids on error-corrected quantum computers. We have adapted the qubitization algorithm to solid-state systems by taking into account the symmetries of the integrals in Wannier representation and generalized sparse qubitization for use with complex Hamiltonians which are needed when Bloch functions are employed. Realistic resources estimations have been carried out and presented for error-corrected quantum computers. The simulation of crystalline solids in the minimal basis set on a quantum computer with the approach presented in this paper would require an order of 10-100 millions of physical qubits and 10 10 -10 12 number of T gates. We expect that these numbers can be reduced further by using, for example, different qubitization techniques such as DF [32] or THC [31] adapted for solid state systems.

VI. ACKNOWLEDGMENT
We thank RobertÍszak for valuable discussion and comments on the manuscript. We also thank Nick Blunt for helpful discussion as well as Earl Campbell for useful discussion on algorithms and error correction.

VII. SUPPLEMENTAL INFORMATION
The geometry and integral files for all systems considered in this work are available at Zenodo [128]. qubits (Bloch) or one qubit (Wannier) identifying term types for Majorana type restoration and one qubit for spin restoration. (Note that contrary to the simplified depiction in Figs. 3 and 4, the |σ qubit in (76) is added as an ancilla rather than loaded.) The factor of 2 stems from needing "ind" and "alt" values for coherent alias sampling.
• κ 1 and κ 2 are powers of 2. They determine the space-time tradeoff in the QROAM and QROAM uncompute.
We choose them to minimize Toffoli cost.
• I = πλ 2 QPE the number of repetitions of the walk operator for quantum phase estimation • 2 η is the maximal power of 2 that's a factor of d • b r are bits of precision for the equal state preparation, we take it 7 as suggested in Ref. [31].

Toffoli count
The total Toffoli count is the product of I (the number of iterations in phase estimation) and the the number of Toffolis needed to construct the walk operator. The walk operator consists of the following circuit elements: • PREPARE and UNPREPARE. The PREPARE operator is sketched in Figs. 3 and 4, while UNPREPARE uncomputes the state. We state the total cost of both, leading to a factor of two in most items.
-Equal state superposition over d basis states via amplitude amplification: 2(3 log d − 3η + 2b r − 9) (A4) -Data lookup via QROAM. This is the asymptotically dominant contribution to the walk operator. Uncomputing can be significantly simplified using a measurement based uncomputation scheme, such that the total cost is: -Coherent alias sampling. It can be uncomputed without Toffolis, giving: The ℵ is for an inequality test, and the register sizes to be swapped are (m − ℵ)/2. Swapping the sign qubits can be done without Toffolis, leading to the −2.
A controlled Hadamard can be implemented with a single Toffoli using a catalytic T state, see Fig. 17 in Ref. [31]. -Spin symmetry restoration: 0 -Translational symmetry restoration (Wannier basis only). The cost of computing and uncomputing the additions in (81) is: For this formula we have assumed that N 1 , N 2 , N 3 are all powers of 2. Then the in-place additions (or in the uncomputation, subtractions) modulo N i can be performed with log N i − 1 Toffolis each [93]. If N i is not a power of 2, the cost will be higher to perform the correct modular arithmetic. Yet this symmetry restoration circuit is a subleading contribution and for simplicity we use (A9) even if the cell numbers are not powers of 2.
• SELECT. We have two ranged operations that are uncontrolled and two that are controlled on the qubit flagging one-or two-body terms and thereby the number of Majoranas in the unitary. Each of these unary iterations is over 4P values. These are spin, Majorana type, and the indices specifying each Majorana [See Figs. 3,4]. The total cost is 2(4P − 2) + 2(4P − 1). (A10) • Reflection. The walk operator is built from the block-encoded Hamiltonian along with a reflection. This is implemented by a multicontrolled Z controlled on a number of qubits. These are log d for the QROAM index, one further qubit from preparing the equal superposition state, ℵ for the equal superposition state in coherent alias sampling, one qubit from spin, four qubits from Majorana type, and (only for Wannier) log N 1 + log N 2 + log N 3 . The gate controlled on c qubits can be implemented with c − 1 Toffolis, giving log d + ℵ + 5 + (Wannier only: Toffolis. • Two more Toffolis for each step (to make the reflection controlled, and for the unary iteration in the phase estimation) Note we do not include the cost of initial state preparation for the Heisenberg-limited phase estimation or of the inverse QFT. These are small additive costs that are not multiplied by I ∝ λ/ QPE in contrast to all the above contributions.