Quantum computing enhanced computational catalysis

The quantum computation of electronic energies can break the curse of dimensionality that plagues many-particle quantum mechanics. It is for this reason that a universal quantum computer has the potential to fundamentally change computational chemistry and materials science, areas in which strong electron correlations present severe hurdles for traditional electronic structure methods. Here, we present a state-of-the-art analysis of accurate energy measurements on a quantum computer for computational catalysis, using improved quantum algorithms with more than an order of magnitude improvement over the best previous algorithms. As a prototypical example of local catalytic chemical reactivity we consider the case of a ruthenium catalyst that can bind, activate, and transform carbon dioxide to the high-value chemical methanol. We aim at accurate resource estimates for the quantum computing steps required for assessing the electronic energy of key intermediates and transition states of its catalytic cycle. In particular, we present new quantum algorithms for double-factorized representations of the four-index integrals that can significantly reduce the computational cost over previous algorithms, and we discuss the challenges of increasing active space sizes to accurately deal with dynamical correlations. We address the requirements for future quantum hardware in order to make a universal quantum computer a successful and reliable tool for quantum computing enhanced computational materials science and chemistry, and identify open questions for further research.


I. INTRODUCTION
Quantum computing [1][2][3][4] has the potential to efficiently solve some computational problems that are exponentially hard to solve on classical computers. Among these problems, one of the most prominent cases is the calculation of quantum electronic energies in molecular systems [5,6]. Due to its many applications in chemistry and materials science, this problem is widely regarded as the "killer application" of future quantum computers [7], a view that was supported by our first rigorous resource estimate study for the accurate calculation of electronic energies of a challenging chemical problem [8].
At the heart of chemistry is predicting the outcome of chemical processes in order to produce chemicals, drugs, or functional molecular assemblies and materials. A prerequisite for this ability to predict chemical processes is an understanding of the underlying reaction mechanisms. Quantum mechanics allows one to assign energies to molecular structures so that a comparison of these energies in a sequence of molecular transformations can be taken as a measure to rate the viability of such a chemical reaction. Relative energies are a direct means * To whom correspondence should be addressed. Electronic mail: markus.reiher@phys.chem.ethz.ch † To whom correspondence should be addressed. Electronic mail: mtroyer@microsoft.com to predict reaction heats and activation barriers for chemical processes. However, the reliability of such predictions depends crucially on the accuracy of the underlying energies, of which the electronic energy is often the most important ingredient. This energetic contribution of the dynamics of the electrons in a molecule can be calculated by solving the electronic Schrödinger equation, typically done in a so-called one-particle basis, the set of molecular orbitals.
The computational complexity of an exact solution of the electronic Schrödinger equation on classical computers is prohibitive as the many-electron basis expansion of the quantum state of interest grows exponentially with the number of molecular orbitals (often called the "curse of dimensionality"). An exact diagonalization of the electronic Hamiltonian in the full many-electron representation is therefore hard and limited to small molecules that can be described by comparatively few (on the order of twenty) orbitals. Once accomplished, it is said that a full configuration interaction (full-CI) solution in this finite orbital basis has been found. Since typical molecular systems will require on the order of 1000 molecular orbitals for their reliable description, exact-diagonalization methods need to restrict the orbital space to about twenty orbitals chosen from the valence orbital space (called complete active space (CAS) CI). Accordingly, approximate methods have been developed in quantum chemistry for large orbital spaces. By contrast, quantum com-puting allows for an encoding of a quantum state in a number of qubits that scales only linearly with the number of molecular orbitals and has therefore the potential to deliver full-CI solutions for large orbital spaces that are inaccessible to traditional computing because of the exponential scaling.
In our previous case study on the feasibility of quantum computing for chemical reactivity [8], we chose to investigate electronic structures of a biocatalyst with still unknown mode of action, i.e., the active site of nitrogenase which is a polynuclear ironsulfur cluster. This choice was motivated by the fact that such polynuclear 3d transition metal clusters are known to exhibit strongly correlated electronic structures that are hard to describe reliably with approximate electronic structure methods on classical computers (electronically excited states of molecules in photophysical and photochemical applications represent another example of this class of problems). In Ref. [8], which relied on a structure model of the active site of nitrogenase in its resting state, we avoided biasing toward the electronic ground state of that particular structure by optimizing arbitrary electronic states for that structure for spin and charge states. These did not coincide with the electronic ground state of the resting state as emphasized in the supporting information of Ref. [8]. Although the resource estimates provided in Ref. [8] supported the view that quantum computing is a true competitor of state-ofthe-art CAS-CI-type electronic structure methods such as the density matrix renormalization group (DMRG) [9,10] or full configuration interaction quantum Monte Carlo (FCIQMC) [11] for studies of chemical reaction mechanisms, we did not consider an actual reaction mechanism.
In this work, we therefore revisit the problem of quantum computing enhanced reaction mechanism elucidation by considering the latest algorithmic advances, but now with a focus on a specific chemical reaction that is prototypical for homogeneous catalysis and equally well relevant for heterogeneous catalysis. The example that we chose is the catalytic functionalization of carbon dioxide, i.e., the capture of the small green-house gas carbon dioxide by a catalyst that activates and eventually transforms it to a useful chemical such as methanol. Hence, we continue to focus on small-molecule activation catalysis, but emphasize that, despite the obvious interest into this specific system in the context of carbon capture technologies, our analysis is of general value to a huge body of chemical reactivity studies. Moreover, we re-examine some of our initial assumptions that are a moving target in the fast developing field of quantum computing. These are the gate counts for the quantum algorithm that performs the energy measurement and assumptions on the error corrected gate times on a future quantum computer, which are heavily dependent on the underlying technology of its hardware. Furthermore, we extend our previous work with respect to the steps that need to be carried out by a quantum computer, specifically regarding the resources of the state preparation step.
Since our previous work [8], there has been significant progress in quantum algorithms, but also a better understanding of what it takes to build a scalable quantum computer has been reached. On the algorithmic side, Hamiltonians represented by a linear combination of unitaries in a so-called blackbox query model can now be simulated with optimal cost using techniques called "qubitization" [12] and "quantum signal processing" [13]. In addition, structure in broad families of Hamiltonians can be exploited to reduce the cost of simulation even more. This includes Hamiltonians with geometrically local interactions [14] and large separations in energy scales [15]. Even more recently, very promising tight theoretical bounds on the performance of traditional Lie-Trotter-Suzuki formulas on average-case Hamiltonians have been obtained [16].
Here, we introduce further refinements to the technique of qubitization applied to molecular systems, which has already been noted [17] to be particularly promising in terms of the dominant quantum Toffoli-gate complexity [18]. We assume that the two-electron tensor describing interactions between N molecular orbitals has a low-rank approximation in a so-called double-factorized representation. This leads to Toffoli gate cost estimates that are orders of magnitude better. For instance, we previously estimated that obtaining an energy level of a Nitrogen fixation problem to 1mHartree [8] cost the equivalent of 1.5 × 10 14 Toffoli gates. This was reduced to 2.3×10 11 Toffoli gates by Berry et al. [19] using a socalled single-factorized representation in the qubitization approach. Under similar assumptions on the rank as Berry et al., we achieve 1.2 × 10 10 Toffoli gates, a further improvement of more than an order of magnitude.
We have also taken into account realistic assumptions for mid-term quantum hardware. While our previous work [8] focused on optimistic future devices with logical gate times of 100ns and all-to-all connectivity between qubits, we here employ gate times of 10 µs for fault tolerant gates with nearest neighbor connectivity -realistic assumptions for mid-term quantum computer architectures. We therefore also discuss the overhead due to mapping the quantum algorithm to a two-dimensional planar layout, which further increases the overall runtime but makes the estimates more realistic.

II. A HOMOGENEOUS CARBON DIOXIDE FIXATION CATALYST
The catalytic process that we selected for our present work is the binding and transformation of carbon dioxide. The infrared absorption properties of carbon dioxide make it a green-house gas that is a major contributor to climate change. Naturally, limiting or even inverting rising carbon dioxide levels in the atmosphere is a truly important goal and all possible ways to accomplish it must be considered. One option, although currently not the most relevant one [20], is carbon dioxide utilization by chemical transformation. In basic research, options have been explored to fix and react inert carbon dioxide to yield chemicals of higher value (see Refs. [21,22] for two recent examples). Also, homogeneous transition metal catalysts have been designed in the past decade, many of them based on ruthenium as the central metal atom. Despite the fact that eventually heterogeneous catalysis may be preferred over homogeneous catalysis, understanding the basics of carbon dioxide fixation chemistry is facilitated by welldefined homogeneous systems. While formate is often the product of such a fixation process, methanol is a chemical of higher value. Not many catalysts have been reported so far that can transform carbon dioxide directly into methanol [23] and all of these are plagued by a comparatively low turnover number. To find synthetic catalysts that robustly produce high-value chemicals from carbon dioxide with high turnover number is therefore an important design challenge and computational catalysis can provide decisive insights as well as virtual screening to meet this challenge.
For our assessment of quantum computing resource estimates, we chose a catalyst reported by the Leitner group [24] as extensive density functional theory (DFT) calculations on its mechanism have already been reported by this group. Hence, key molecular structures have already been identified based on DFT. A detailed mechanistic picture has emerged, from which we took intermediates and transition state structures for our resource estimate analysis. Lewis strucures of the eight structures selected for our work are shown in Fig. 1.
In this work, we focus on the electronic structure of isolated complex structures and therefore neglect any surrounding such as a solvent as well as energy contributions from nuclear dynamics that would be required for the calculation of free energies (cf. [8] on how to include them). The heavy element ruthenium is known to form complexes that are often low-spin, i.e., singlet or doublet, and do not feature strong multi-configurational character (by contrast to its lighter homolog iron). It is therefore not surprising that we found no pronounced multiconfigurational character as highlighted by the entanglement diagrams provided in the supplementary material. Hence, the Ru complexes selected for our resource estimate study feature mostly dynamic electron correlation indicated by small weights of all but one electronic configurations (Slater determinants) that contribute to the exact wave function in a full configuration interaction expansion. For our resource analysis, this is of little importance but highlights the importance of a larger and faster universal quantum computer with a few thousand logical qubits for complete state representation, to accurately capture all relevant dynamical correlations.
The presentation of the CO 2 -fixating ruthenium catalyst of the Leitner group [24] was accompanied by extensive DFT calculations of the potentially relevant molecular structures, which is a routine procedure in chemistry. However, DFT electronic energies are plagued by approximations made to the exchange-correlation functional [25,26] and can therefore be of unknown reliability (cf., e.g., Refs. [27][28][29]). The authors of Ref. [24] selected the Minnesota functional M06-L [30] because they found good overall agreement with their experimental results. In order to demonstrate the uncertainty that generally can affect DFT results, which can be very cumbersome if experimental data are not available for comparison, we supplemented the DFT data of Ref. [24] with results obtained with the standard generalized-gradient-approximation functional PBE [31] and its hybrid variant PBE0 [32] for the eight structures of the reaction mechanism considered in this work (data shown in Fig. 2).
As can be seen in Fig. 2, whereas structure optimization (compare results for structures optimized with the same functional with the single-point data denoted by the double slash notation) has a small effect on the relative electronic energies, the difference between the functionals can amount to more than 50 kJ/mol (i.e., 19 mHartree per molecule) and can even reverse the qualitative ordering of the compounds (compare structures 'V' und 'VIII'). Obviously, in the absence of any additional information (such as experimental data or more accurate calculations), it is virtually impossible to settle on reliable energetics with potentially dramatic consequences for the elucidation of a reaction mechanism.

III. QUANTUM COMPUTING ENHANCED COMPUTATIONAL CATALYSIS
The elucidation of chemical reaction mechanisms is routinely accomplished with approximate quantum chemical methods [33,34]  stationary structures on Born-Oppenheimer potential energy surfaces are optimized, yielding stable reactants, products, and intermediates of a chemical process. More importantly, optimized first-order saddle points on such a surface represent transition state structures, which are key for detailed kinetic studies and hence for the prediction of concentration fluxes.
In the following, we discuss where in the mechanism elucidation process quantum computing can be efficient, useful, and decisive, and hence, how quantum computing can efficiently enhance and reinforce computational catalysis to make a difference. As there are many steps involved that require a deep understanding of various branches of theoretical chemistry, we provide an overview of the essential steps in Fig. 3. Understanding chemical catalysis, and chemical reactions in general, requires an exploration of relevant molecular structures (structure exploration in Fig. 3), which is usually done manually and with DFT approaches (as in Ref. [24] for our example here), but which can now also be done in a fully automated and even autonomous way [35]. These structures need to be assigned an energy which may be conveniently separated into an electronic contribution (steps 1-7 in Fig. 3) and a remaining part (additional free energy caculations in Fig. 3) containing nuclear and other effects (calculated in a standard rigid-rotor-harmonic-oscillator model accompanied by dielectric continuum embedding or by explicit molecular dynamics) to eventually yield a free energy calculated from all related microstates. Relative free-energy differences will eventually be used as barrier heights in expressions for absolute rate constants (kinetic modeling in Fig. 3) that can then be used in kinetic modeling for predicting concentration fluxes through the chemical web of relevant molecular structures. Ultimately, such knowledge can be exploited to improve on existing or to design new catalysts with enhanced catalytic properties.
Accuracy matters: A reaction rate depends exponentially on the energy difference between a transition state structure and its corresponding stable reactants, which are connected by an elementary reaction step. Because of this exponential dependence, highly accurate energy differences are decisive. While many contributions enter these free energy differences, the electronic energy difference is the most crucial one in bond breaking and bond making processes. Relative DFT electronic reaction energies for the catalytic cycle obtained with a def2-TZVP basis set and three different approximate density functionals: M06-L [24], PBE, and PBE0 on M06-L/def2-SVP-optimized structures taken from Ref. [24]. For comparison, we provide relative electronic reaction energies from PBE/def2-TZVP//PBE/def2-TZVP calculations (in Pople's double slash notation, where the density functional behind the double slash is the one for which the structure was obtained) labeled as "PBE opt.".   Electronic energies are key components (steps 1-7 in Fig. 3): Electronic energies are notoriously difficult to calculate and standard approximations are affected by unknown errors that can be large. Only for electronically simple structures, so-called closed-shell single-determinant electronic structures, well-established methods exist that run efficiently on a classical computer (such as explicitly correlated local coupled cluster schemes with, at least, perturbatively treated triple excitations [36]). For general electronic structures, however, no method of comparable accuracy exists that is at the same time feasible for moderately sized molecules. In particular for strong correlation cases, which require more than one Slater determinant for a qualitatively correct description of the electronic wave function, it can be hard to obtain an accurate total electronic energy, which then enters the calculation of relative energies.
In our previous work [8], we considered a quantum computer of moderate size within reach in the not too distant future. Moreover, we assumed that such a machine might have 100 to 200 logical qubits available for the representation of a quantum state. Such a state would be constructed from single-particle states, i.e., molecular orbitals for molecular structures. For decades, it has been the goal of quantum chemistry to devise methods that efficiently construct approximations to a many-electron state represented in terms of orbitals. If the full manyelectron Hamiltonian is expressed and diagonalized in a complete many-electron (determinantal) basis constructed from such a one-electron basis, then a full-CI calculation may be carried out. Such an exact diagonalization is, however, routinely only feasible for about 18 orbitals [37] (a record calculation was recently carried out for 24 orbitals [38]) due to the exponentially growing number of many-electron states with the number of orbitals.
Required accuracy: A reasonable target accuracy for relative energies (and therefore also for total electronic energies of individual molecular species) is about 1 mHartree, if not 0.1 mHartree. This corresponds to about 2.6 and 0.26 kJ/mol, respectively. Note that thermal energy RT (T being temperature and R the gas constant) at room temperature is on the order of 2.6 kJ/mol, which may be related to the kinetic energy of a reactant molecule at average velocity (compare this to the spread observed for different DFT functionals in our example in the last section).
It is important to understand that this target accuracy is important for relative energies, i.e., for the energy differences that eventually enter the rate constant expressions. For total electronic energies, however, this accuracy does not match the precision with which these energies are actually known. In fact, the true total energies are typically off by a huge energy offset because one does not make an effort to describe core electrons, which contribute significantly to the total electronic energy, but not to reaction chemistry as they are atomically conserved. Hence, such calculations rely on significant error cancellation effects that occur when atomic contributions to the total electronic energy drop out in the calculation of reaction energies (which are relative energies) as they are conserved during a reaction (consider, e.g., a molecular orbital that is mostly of 1s-atomic orbital character and remains unaffected by a chemical reaction, but contributes significantly to the total electronic energy).
Challenges of electronic structure (steps 1-7 in Fig. 3): It is therefore most important to get the electronic (valence) structure of each relevant molecular structure right. Here, quantum computing offers an opportunity [8]. It is important to realize that a typical chemical catalysis problem does not involve very many valence orbitals in every elementary reaction step. As a consequence, the size of the active orbital space, from which the electronic wave function is constructed, does usually not need to be very large and can be easily handled with methods such as Complete Active Space Self Consistent Field (CASSCF), DMRG, or FCIQMC. The latter two are capable of handling active orbital spaces of up to about 100 spatial orbitals owing to efficient approximations. However, sometimes there may be a price to pay for these approximations and that is a residual uncertainty regarding convergence of the electronic energy. For instance, a DMRG result will critically depend on -apart from a fixed finite value of the bond dimension -proper convergence of the sweeping algorithm, which, at times, might be difficult to determine. In the case of the more recent FCIQMC approach, which is under continuous development and offers extraordinarily efficient scaling on large traditional parallel computers, convergence with respect to the number of walkers or extrapolation to an infinite number of walkers may be hard to achieve for certain molecules. We also note that it is generally hard for any quantum chemical approach (hence, also for DMRG and FCIQMC) to deliver rigorous yet useful information about the error associated with a specific result. Moreover, any active-space approach is plagued by a severe drawback discussed below, namely the neglect of dynamic electronic correlation arising from the majority of virtual orbitals neglected.
Importance of dynamic electron correlation (steps 1 and 7 in Fig. 3): As noted already above, moderately sized molecules, such as the ones studied in this work, may easily require on the order of 1000 or more spatial molecular orbitals for a description that may be considered accurate within the chemically relevant accuracy of about 1 mHartree or even 0.1 mHartree for relative energies. However, the restriction to the valence orbital space from which the active space of the most strongly correlated orbitals is chosen [39,40] compromises this accuracy (note that these active orbitals may be identified based on natural-orbital occupation numbers [41] or on orbital entanglement measures [42,43], even in a completely automated fashion [42,44]). The vast majority of orbitals that are weakly entangled and neglected in this procedure give rise to dynamic electron correlations, which are neglected in a small-CAS calculation. However, the dynamic electron correlation contribution to the total electronic energy is decisive and so standard recipes exist to approximate it. The most prominent one is the a posteriori correction (step 7 in Fig. 3) provided by multi-reference perturbation theory [45], which, however, requires elements of the three-and four-body reduced density matrices (3-RDMs and 4-RDMs, respectively) that would be extremely hard to obtain by quantum computing. Not even approximate approaches that rely on at most some 3-RDM elements will be accessible by quantum computing for interesting molecules. To evade such a computational bottleneck, perturbthen-diagonalize approaches (step 1 in Fig. 3) such as range-separation DFT for CAS-type methods [46][47][48] or transcorrelation approaches [49] were proposed for quantum computing [8].
Quantum computing is supposed to be a valuable and, in the long run, ultimately superior competitor to the aforementioned traditional methods (i) because rigorous error estimates are available and (ii) because systems may be accessible that are traditionally not feasible because of the curse of dimensionality when a total state is to be represented in a large set of orbitals. Its true benefits will fully unfold if energy measurements in the full orbital basis become feasible (on this, see the discussion in the Conclusions section).
Four-index transformation (step 4 in Fig. 3): A cumbersome technical step, to be carried out by traditional computing on classical computers, is the four-index transformation of the two-electron interaction integrals in the Hamiltonian, which scales with the fifth power of the number of basis functions. In this transformation, the final parameters for the electronic Coulomb Hamiltonian in the molecular orbital basis are produced from the four-index integrals defined in the atomic orbital basis, i.e., in the basis which is provided for the representation of all molecular orbitals (typically a set of Gaussian-type functions such as the def2-TZVP basis set used in the DFT calculations presented above). Note, however, that this transformation, while being expensive but feasible for a small CAS, becomes a threat to the whole calculation when the active space grows to eventually incorporate the complete one-electron (atomic orbital) basis (of say, more than 1000 basis functions). Hence, the sheer number of two-electron parameters in the electronic Hamiltonian will require us to rethink how to deal with these terms growing to the fourth power in the number of orbitals, such as in recent work [50] that generate sparse low-rank approximations to the two-electron integrals.
Naturally, this issue has also been discussed in modern traditional approaches [51].

IV. QUANTUM ALGORITHMS FOR CHEMISTRY
For the catalysis problem we require quantum algorithms that provide reliable results with controlled errors on the electronic energy in a given orbital basis. Uncontrolled approximations in quantum algorithms would negate the advantages offered by quantum computers, which will require tremendous effort to build and operate even at a moderately sized error-corrected scale.
Though it is without doubt that the popular variational quantum eigensolver (VQE) [52] can obtain parameters of a unitary coupled cluster (UCC) parametrization of the electronic wavefunction that is likely to be accurate (especially when higher than double excitations are considered [53]), these scheme unavoidably generate a residual unknown uncertainty in the true electronic energy. Reducing these errors by improving the ansatz will require significantly more gate operations than are expected to be possible on non-error corrected noisy intermediate scale quantum devices [54,55]. Moreover, the number of repetitions required to estimate energies with sufficient accuracy of 1 mHartee or better is enormous [56].
We also note that knowledge of reliable and controllable errors in quantum algorithms is a decisive advantage over classical methods such as DMRG and FCIQMC, for which convergence control with respect to their parameters is not necessarily easy or even feasible. Therefore, we turn to one of the most promising applications of quantum computers, which is a bounded-error simulation of quantum systems using quantum phase estimation. The main idea is to synthesize a quantum circuit that implements the real time-evolution operator W = e −iH/α by a given Hamiltonian H for some normalizing factor α, which henceforth are always in atomic units of Hartree. When applied n times to an eigenstate H|ψ k = E k |ψ k , a phase nE k /α is accumulated. Quantum phase estimation then estimates the energy E k with a standard deviation ∆ E = O(α/n) [57]. If one prepares an arbitrary trial state |ψ trial rather than an eigenstate, phase estimation collapses the trial state to the k-th eigenstate with probability p k = | ψ k |ψ trial | 2 and it returns an estimate to the corresponding energy E k [58].
The phase estimation procedure is executed on a quantum computer by applying a sequence of quantum gates. If the unitary W is implemented using a number c W of quantum gates, the overall quantum gate cost of obtaining a single estimateÊ k is then where the factor πα 2∆ E arises from previous analyses on the performance of phase estimation [17,59] (combined with a so-called phase-doubling trick [17,60]). In general, the quantum circuit only approximates W to some bounded error ∆ W in spectral norm. This adds a systematic bias of α∆ W to the es-timateÊ k . Thus, we budget for this error by making the somewhat arbitrary choice of performing phase estimation to an error of 0.9∆ E , and compiling W so that ∆ W ≤ 0.1∆ E /α. To date, there are several prominent quantum algorithms for approximating real time-evolution, such as Lie-Trotter-Suzuki product formulas [61], sparse Hamiltonian simulation [62], linear-combination of unitaries [63], qubitization [12], and quantum signal processing [13]. In this work, we consider the electronic Hamiltonian in its non-relativistic form with Coulomb interactions (in Hartree atomic units), which is parametrized through the one-and two electron integrals h ij and h ijkl of the molecular orbitals {ψ i }, where x i denote electronic coordinates and Z m is the charge number of nucleus m at position r m (note that a relativistic generalization is straightforward [64]). We explicitly separate the fermion indices p ≡ (i, σ) into an index where i ∈ {1, ..., N } enumerates the N spatial molecular orbitals, and σ ∈ {0, 1} indexes spin-up and spindown. Hence, the fermion operators satisfy the usual anti-commutation relations and the coefficients h ij , h ijkl are real and satisfy the symmetries For the purposes of phase estimation through the so-called qubitization approach, it is not necessary to simulate time-evolution e −iHt . Whereas our previous work [8] approximated the time-evolution operation using Lie-Trotter-Suzuki product formulas, it can be advantageous in some cases to implement the unitary walk operator W = e sin −1 (H/α) [12,65,66] instead, which has some normalizing constant α ≥ H that ensures the arcsine is real. This walk operator can be implemented exactly, assuming access to arbitrary single-qubit rotations, in contrast to all known quantum algorithms where time-evolution e −iHt can only be approximated. After estimating the phaseθ = sin −1 (E k /α) with phase estimation, we may obtainÊ k by applying sin(θ) in a classical postprocessing step.
We focus on the qubitization technique applied to electronic structure Eq. (2), with the goal of minimizing the quantum gate costs in Eq. (1). In fault-tolerant architectures, quantum gate costs reduce to the number of so-called primitive Clifford gates (e.g. Hadamard= 1 √ 2 1 1 1 −1, , phase= 1 0 0 i, , and controlled-Not) and non-Clifford gates (e.g. and Toffoli, which applies a Not gate controlled on two input bits being in the 'one' state). As the physical resources needed to implement a single error-corrected primitive non-Clifford gate, such as the T gate is on the order of 100 to 10000 times more costly than a single two-qubit Clifford gate [67,68] -and one Toffoli gate may be implemented by four T gates -much recent work has focused on optimizing the non-Cliffordgate cost of W . Within an atom-centered basis set (such as the ones employed in this work), this focus has reduced the T gate cost of obtaining a single estimateÊ k fromÕ(N 5 /∆ 3/2 E ) using Trotter methods toÕ(α CD N 3/2 /∆ E ) Toffoli gates and O(N 3/2 log N/∆ E ) qubits using qubitization combined with a so-called single-factorized Hamiltonian representation, where α CD = O(N 3 ) [19] is a certain norm of the Hamiltonian. Our main technical contribution is a further reduction of the gate cost of W and the normalizing factor to α DF = O(N 3/2 ) as described in the next Section.

V. EFFICIENT ENCODING OF DOUBLE-FACTORIZED ELECTRONIC STRUCTURE
Our main algorithmic advance is based on a quantum algorithm to 'qubitize' the so-called doublefactorized representation H DF of the electronic Hamiltonian H. On the one hand, the doublefactorized representation is sparse, and therefore minimizes the cost c W of qubitization, which generally scales with the number of terms needed to represent the Hamiltonian. On the other hand, the double-factorized representation is a partial diagonalization of the original Hamiltonian, and hence has a small normalizing constant α DF .
The double-factorized form builds upon the singlefactorized representation H CD of the Hamiltonian H where Note that the rank-R factorization of the two- Eq. (2) into the N ×N symmetric matrices L (r) always exists due the symmetry constraints Eq. (6), and may be computed using either a singular-value decomposition or a Cholesky decomposition. This representation also facilitates a low-rank approximation by truncating the rank R. In the worst-case, R ≤ N 2 . However, it was noted by Peng et al. [50] that rank R ∼ N for typical molecular systems when N is proportional to the number of atoms. This reduces the number of terms needed to describe the second-quantized Hamiltonian from O(N 4 ) in Eq. (2) to O(RN 2 ) in Eq. (7). This representation was first exploited by Berry et al. [19] to qubitize H CD with a normalizing constant α CD . = 2 h EW + 2 r∈[R] L (r) 2 EW expressed using the entry-wise norm The technical innovation in our approach is a quantum circuit, detailed in the supplementary material, for qubitzing the double-factorized Hamiltonian This is obtained by a rank-M (r) eigendecomposition of the symmetric matrices  [50] also noted that for typical molecular systems where N 10 3 scales with the number of atoms, one may retain M ∼ N log N eigenvectors and truncate the rest. A key technical step in our approach is to work in the Majorana representation of fermion operators {γ p,x , γ q,y } = 2δ pq δ xy I.
As we show in the supplementary material, this representation maps which is expressed as a sum of squares of one-body Hamiltonians This leads to a normalizing constant Note that the difference between L (−1) andh is the mean-field Coulomb repulsion contribution term − 1 2 l h illj . Note that α DF is also significantly smaller than α CD from previous approaches [19].
In addition to the factor of eight reduction in the prefactor of the two-electron norms, the dependence of α DF on Schatten norms L (r) SC is beneficial as they can be up to a factor of N smaller than the entry-wise norms L (r) EW that α CD depends on, as follows from the following tight inequalities for any Hermitian N × N matrix h, which we prove in the supplementary material and may be of independent interest. The Toffoli gate complexity of synthesizing the walk operator to an error ∆ W = 0.1∆ E /α DF , as detailed in the supplementary material, is then for any choice of integer λ ≥ 0, and where the parameter β = 5.652 + log 2 . This also uses the following number of qubits Note that λ controls the number of ancillary qubits. By choosing λ = O( M N β ), the Toffoli cost is , which is advantageous when M N β, using O( √ M N β +N +β) qubits, Assuming the empirical scaling of M and R by Peng et al, our algorithm encodes electronic spectra for atomcentered basis sets into the walk operator using only c W =Õ(N ) Toffoli gates, and with a normalizing constant α DF , which improves upon theÕ(N 3/2 ) Toffoli gates and larger normalizing constant α CD required by the single-factorized approach [19].
In the following examples we consider, typical values of λ that minimize the Toffoli costs are between 1 and 5, and the Toffoli gate counts and qubit counts we quote exclude the sub-dominant big-O(·) component of Eqs. (17) and (18).

A. Truncation
We now describe our procedure for obtaining, by truncating eigenvalues, low-rank approximations H DF of the double-factorized Hamiltonian Eq. (12) from an initial numerically-exact representation H DF . We focus on truncating the two-electron terms as those tend to dominate the cost of block-encoding.
Given a target approximation error , we consider two truncations schemes, which we call coherent and incoherent. The coherent scheme upper bounds the actual error in spectral norm for any given choice of co . This upper bound is obtained by a triangle inequality, which assumes that all truncated terms have an error that adds linearly. That is, if the differenceH DF − H DF = j H j is a sum of terms, then we truncate so that We find that this bound is often quite loose, and the gap between the actual error and grows with system size. This motivates the incoherent scheme, which assumes that the error of truncated terms add incoherently by a sum-of-squares. Thus, we truncate so that Suppose we remove a single eigenvalue λ (r) m from Eq. (12) during truncation. Using the identity can be bounded in · as follows: In the coherent scheme, we truncate eigenvalues in the index set T such that (r,m)∈T L (r) SC |λ (r) m | ≤ co . In the incoherent scheme, we truncate such that In the following, we truncate according to the sum-of-square procedure. While this approach does not rigorously bound the total error, we find that it better matches the error we observed for our benchmark systems, see Fig. 4, but still tends to significantly overestimate the actual error.

VI. RESULTS
For the intermediates depicted in the catalytic cycle in Fig. 1, we carried out density functional calculations that delivered the data presented in Fig. 2 and wave function calculations with various active orbital spaces for the analyses discussed below. For the sake of brevity, we refer the reader to the supplementary material for all technical details on these standard quantum chemical calculations.
For each of these key intermediates and transition states we then evaluate the cost of performing quantum phase estimation to chemical accuracy, both for small active spaces of 52-65 orbitals and larger ones, and then discuss runtimes and qubit requirements.

A. Selection of active orbitals
As mentioned in Section IV, the electronic Hamiltonian in Eq. (2) is parametrized by one-and twoelectron integrals, h ij and h ijkl , respectively, over the molecular orbitals of a restricted orbital subspace, the active space. For the selection of the active space (cf. Fig. 3), all strongly correlated orbitals must be identified (see the detailed discussion for our carbon dioxide fixation process in the supporting information). For this, we relied on orbital entanglement measures which we applied in an automated procedure [42,44]. The molecules considered here turned out to be not dominated by static electron correlation. It was, however, still possible to select a small active space of the size of five to sixteen orbitals. As we are interested in studying resources and algorithms in the limit of very large active spaces, we added weakly correlated orbitals. Naturally, it then became increasingly difficult to select the active spaces based on orbital entanglement alone (because all such orbitals show similarly weak entanglement entropy measures). In this study, we therefore resorted to criteria which ensure a reproducible selection that is, at the same time, reasonable from a chemical point of view. We chose three different active space sizes (small, intermediate, and large) for each molecular structure of the catalytic cycle. The first active space comprises those orbitals selected based on entanglement criteria. Note that this small active space is identical to the one employed in the CASSCF calculations that produced the molecular orbitals from which the Hamiltonian parameters h ij and h ijkl were calculated. The intermediate active space includes all the orbitals of the small active space and then in addition the valence orbitals of ruthenium and the ligands (excluding the triphos ligand and, if present, solvent molecules), as well as the orbitals involved in ligand-metal bonding. For the largest active space, we further supplemented these orbitals with the π and π * orbitals on the triphos ligand, as they form a well-defined and easily identifiable set of orbitals. Apart from these three active spaces per catalyst, we created even larger active spaces for structure XVIII, for which we additionally selected three active spaces of size n =100, 150, and 250 spatial molecular orbitals. Since a manual selection is pointless for such large active spaces of mostly dynamically correlated orbitals, we selected the n/2 occupied and n/2 virtual orbitals around the Fermi level. For each of these active spaces, we obtained the one-and two-electron integrals h ij and h ijkl via a four-index transformation (see also Fig. 3). These integrals then served as input to the quantum algorithms.

B. Resource estimates
We now evaluate the cost of phase estimation to chemical accuracy for various structures in the carbon capture catalytic cycle of Section II. The active spaces of the molecules considered in this section are in the range of 52-65 orbitals, although we tabulate more results for 2-150 orbitals in the supplementary material. In addition, we also evaluate the scaling of cost with respect to the active space size N while keeping the number of atoms fixed, and find a different scaling law of M ∼ N 2.5 for the number of eigenvectors compared to that of M ∼ N log N [50] when increasing the number of atoms. We find, as shown in Table I, that the gate cost for systems in the 52-orbital to 65-orbital range is on the order of 10 10 Toffoli gates, using about 4000 qubits.
The cost depends on the truncation threshold in as plotted in Fig. 5. At the highly aggressive threshold of in = 100mHartree, the rank of these examples roughly matches the parameters fit by Peng et al. [50] to three-dimensional hydrocarbons for 54 orbitals. However, the numerically computed shift in energy at that threshold can exceed chemical accuracy, following Fig. 4, and should be interpreted as a most optimistic cost estimate. Therefore, we choose in = 1mHartree, which our numerical simulations indicate closely reflect chemical accuracy. Irrespective of the truncation scheme, we note that the logarithmic scaling with 1/ in means that the Toffoli costs vary by at most factor of five between these extremes. By varying the active space size between  We next demonstrate the extent of our algorithmic improvements by applying our techniques to the 54-orbital representation of the FeMoco active site of nitrogenase that we considered previously [8], and comparing costs with prior art based on Trotterization and qubitization of the single-factorized representation. As seen in Table II, our Toffoli cost following Eq. (17) is on the order of 1.31 × 10 10 using 3700 qubits. Assuming that the number of logical qubits is not a limiting factor, this is a dramatic improvement over the 6.0 × 10 14 T gates and 142 qubits in our original estimate [8], and a significant improve-ment over the 1.2 × 10 12 Toffoli gates of the singlefactorized approach [69], where α CD = 3.6 × 10 4 Hartree with 3.0 × 10 5 unique non-zero terms, and a rank of R ∼ 200 was claimed to be sufficient to achieve chemical accuracy with respect to a CCSD ansatz. This is seen in Table III where, for the sake of comparison, we choose a truncation threshold of in = 73mHartree to normalize the rank between our results and the single-factorized approach. Moreover, our double-factorized approach also improves upon the 2.3 × 10 11 Toffoli gates of highlyoptimized implementations [69] based on the unfactorized Hamiltonian Eq. (2), which has α = 9.9×10 3 Hartree with 4.4 × 10 5 unique non-zero terms. Our improvement largely stems from a normalizing factor α DF that is 30 to 107 times smaller as the number of terms in all approaches at this threshold are roughly equal. This trend also applies to the various carbon fixation catalyst structures that we consider  I  II-III  II  IX  V  VIII-IX  VIII   when we perform a similar comparison but instead use the same incoherent truncation scheme for all examples at a more conservative error threshold of in = 1mHartree.

C. Runtimes and qubit counts
We finally relate these gate count estimates to expected runtimes on future quantum computers. Exact runtime will, of course, depend on details of the system and error correction schemes. In our previous analysis [8], we optimistically assumed gate times of 100ns for fault tolerant logical gate operations, which may be a long-term achievable goal. As a more realistic assumption for mid-term fault tolerant quantum computers we now expect that the physical gate times in current quantum computer ar-chitectures range from tens of nanoseconds for solid state qubits to tens of microseconds for ion traps. Realizing fault tolerance by using quantum error correction with the surface code [70], will lead to logical gate times for a Toffoli gate of about 10µs to 10ms depending on the architecture. The lower estimate of 10µs means that 10 10 Toffoli gates correspond to a runtime of 28 hours or about a day, while the upper estimate of 10ms would correspond to several years. These considerations show that fast gate times will be essential for realistic quantum computation for chemical catalysis.
The above estimates do not explicitly consider the cost of layout, i.e. the mapping onto a nearest neighbor planar square lattice topology of error corrected logical qubits. We ignore this overhead here, since the subroutine with dominant cost, the table lookup discussed in the supplementary material is based on Table II. Comparison of our new double-factorization approach for HDF applied to the FeMoco active site of nitrogenase (N = 54) with prior approaches based on Trotterization [8] or qubitization [19] using the unfactorized H or single-factorized HCD Hamiltonian, and also for the VIII structure in the catalytic cycle (N = 65) where all examples apply the incoherent truncation scheme with the same threshold of in = 1mHartree. Fanout operators [71] and maps well to this topology. Moreover, we here assume that any overhead of layout as well as Clifford gates are included in the assumed gate time for a Toffoli gate, as these dominant Fanout operations may be implemented in a constant Clifford depth of 4 in parallel with the sequentially applied Toffolis. Fault tolerant gates also have an overhead in the number of qubits, with hundreds to thousands of physical qubits needed per logical qubit [8,70], depending on the quality of the qubits. 4000 logical qubits will thus correspond to millions of physical qubits. This implies the need for a scalable quantum computer architecture, scaling to millions of qubits.

D. State preparation
To determine the ground state energy using phase estimation, it is required to prepare a trial state |ψ trial which has a high overlap with the true ground state |ψ 0 of the Hamiltonian H as discussed in Section IV. While the exact ground state is unknown, we used state-of-the-art DMRG calculations to obtain an approximate ground state for each of our systems (see supplementary material for further details). We then calculated the overlap of this approximate ground state with the dominant single-determinant state and found that there is a large overlap for all of our systems, see Table IV. We expect this overlap to not shrink substantially with increasing precision of the approximate ground state obtained by DMRG and hence preparing the dominant single-determinant state is sufficient for the molecular structures considered in this work. Recall that the Ru complexes selected for our study do not exhibit strong multi-reference character. If this overlap had turned out to be small, we would defer to Ref. [72] for options on how to prepare a multi-determinant initial state in order to boost the success probability of phase estimation.

VII. CONCLUSIONS
In this work, we considered computational catalysis leveraged by quantum computing. We rely on accurate error bounds on the electronic energy accessible in quantum algorithms to obtain sufficiently accurate data for intermediate and transition state structures of a catalytic cycle. In particular, i) we considered decisive steps of a synthetic catalyst that is known to convert carbon dioxide to methanol (and for which traditional work based on density functional theory had already been reported alongside the experimental results [24]), ii) we presented a new quantum algorithm in the qubitization framework that exploits a double-factorized electronic structure representation, which significantly reduces the runtime, iii) we validated the truncation schemes by comparison with density matrix renormalization group calculations, iv) we confirmed that starting the quantum algorithm from a single determinant initial state has high success probability for the carbon dioxide functionalization process studied in this work, and v) we calculated realistic resource estimates for mid-term scalable quantum computers and related them to expected runtimes.
A pressing challenge in the cost of quantum simulation for large molecular systems is the rapid growth in the number of four-index two-electron integrals. By using integral decomposition techniques in the double-factorized representation [51,73], we simultaneously minimize two key parameters governing the cost of qubitization: the number of coefficients that must be loaded into the quantum computer and a certain bound α DF on the spectral norm of the Hamiltonian. Compared to other simulation techniques such as Trotterization, qubitization also enables a large trade-off in non-Clifford gate count by using additional qubits. In the case of estimating an energy level to chemical accuracy, we found that our approach has a Toffoli gate cost with respect to active space size N , while keeping the number of atoms fixed, that scales like ∼ N 3.25 . When the number of atoms also grows with N , our technique promises a Toffoli cost scaling like ∼ α DF N log (N ), assuming the empirically fitted sparsity of double-factorization by other authors [50], and where α DF ∼ N 1.5 based on Hydrogen chain benchmarks. When applied to structures in the catalytic cycle with 52-65 orbitals, our approach requires roughly 10 10 -10 11 Toffoli gates and ∼ 4000 logical qubits. Note that the benefit of our reduced Toffoli count can far outweigh the cost of using more logical qubits -the physi-cal qubit count of the overall algorithm is typically dominated by that required for the fault-tolerant distillation of non-Clifford gates. Under realistic assumptions on the performance of mid-term quantum hardware, this corresponds to a runtime of a few weeks for a single calculation. While this seems similar to the results of our previous paper [8], we here use much more conservative and realistic estimates for the clock speeds of mid-term quantum computers, while Ref. [8] assumed ambitious specifications for the long term.
From our results it is evident that a future universal quantum computer can only lead the range of electronic structure methods if further developments provide faster algorithms that can treat much larger active spaces in order to be competitive. We emphasize that, although no logical qubit has been realized experimentally so far, a universal quantum computer that can represent a state on a few thousand logical qubits would revolutionize electronic structure theory as it bears the potential to provide an accurate exact-diagonalization energy for a moderately sized molecular system of on the order of 100 atoms in the full single-particle basis set. As a consequence, any necessarily approximate a-priori or aposteriori correction for the nagging dynamic correlation problem (that arises solely from the choice of a reduced-dimensional active space) would no longer be needed. It is clear that such a calculation would be an ultimate goal, but also that it would present new technical challenges.
Going to an order of magnitude larger active space sizes as considered here would increase the quantum computational requirements by three orders of magnitude. Hence, despite the advances reported in this work, it is obvious that further algorithmic improvements of quantum algorithms are urgently needed in order to get timings down so that a quantum computer can become superior to traditional approaches, i.e., to eventually demonstrate a quantum advantage in real-world chemistry applications. Given the innovation rate in quantum algorithms for chemistry of many orders of magnitude over the past years, we are confident for this to happen. One direction is exploring novel sparse representations. Another promising avenue is exploiting the fact that the number of electrons may grow much smaller than the size of the single-particle basis set when expanding the latter to incorporate dynamical correlations. That can lead to significant savings compared to the algorithms presented here.

VIII. COMMENTS ON THE INTEGRAL FILES
The integral files used in this work are available upon request from the corresponding author and will be made publicly available upon publication.

ACKNOWLEDGMENTS
We are grateful to Dr. Markus Hölscher for valuable discussions on the catalyst synthesized in the Leitner lab and for providing us with the correct structure of complex I and to Prof. Ali Alavi for sharing his insights into the FCIQMC algorithm with us.

Supplementary Material for Quantum Computing Enhanced Computational Catalysis
Vera von Burg, 1  We chose seven key intermediates and transition states from the supporting information of Wesselbaum et al. [1], therein labeled as complexes II, II-III, V, VIII, VIII-IX, IX, and XVIII. These structures had been optimized with the M06-L/def2-SVP combination of exchangecorrelation density functional and basis set, and single-point M06-L/def2-TZVP energies were obtained [1] which we used for the diagram in the main text. In this work, we continue to use the roman numerals as labels. Hyphenated labels (i.e., II-III and VIII-IX) correspond to transition states whereas the others are stable intermediates. The Cartesian coordinates of intermediate I were erroneous in the supporting information of the original paper and we obtained the correct ones through a private communication [2] (see Section IX). All of the complexes are monocations and were considered in the lowest-energy singlet state.

II. DFT CALCULATIONS
To be able to obtain relative reaction energies (see Table II), we optimized the small molecules CO 2 , H 2 , H 2 O, THF, and methanol with Gaussian09, revision D.01 including density fitting and the UltraFine keyword for the integration grid in accordance with the supporting information of Ref. [1]. For all other density functional theory (DFT) calculations reported herein we employed Turbomole [3], version 7.0.2. We then carried out singlepoint, unrestricted PBE [4] and PBE0 [5] calculations in the def2-TZVP [6] basis for each complex and the small molecules on the M06-L/def2-SVP structures reported in the supporting information of Ref. [1] or the newly obtained ones in the case of the small molecules, respectively. To ensure that we obtained the correct global minima in the molecular orbital coefficient parameter space, we perturbed the α and β orbitals of the converged calculation with the orbital steering protocol of Ref. [7]. We repeated the respective calculation with the perturbed orbitals as starting orbitals to probe whether a solution of lower energy could be obtained.
We additionally optimized the structures with the PBE/def2-TZVP combination of density functional and basis set and carried out frequency calculations to ensure we had obtained the correct type of stationary points (no imaginary frequencies for intermediates, one for transition states).
A. Single-point M06-L energies for small molecules For comparison, we report here the M06-L/def2-SVP electronic energies for the small molecules which we re-optimized. The electronic energies of the unrestricted, single-point PBE/def2-TZVP and PBE0/def2-TZVP DFT calculations for the complexes and remaining compounds as well as the relative reaction energies of the complexes are collected in Table III and the ones of the re-optimized structures in Table IV. The relative reaction energies reported in these Tables were obtained in accordance with the supporting information of Ref. [1] and their formulaic calculation is given in Table II. We employ the double slash notation where the labels before the double slash denote the combination of exchange-correlation functional and basis set for the singlepoint calculation whereas the exchange-density functional and basis set employed for the structure optimization is given after the double slash. Table II. Formulae for the calculation of relative reaction energies ∆E el rel for the complexes, where E el (i) is the electronic energy of compound i computed with a certain combination of exchangecorrelation density functional and basis set.

Complex
Calculation of ∆E el

III. HF AND POST-HF CALCULATIONS
We obtained Hartree-Fock (HF) and Complete Active Space Self-Consistent Field (CASSCF) [8][9][10][11] molecular orbitals (MOs) in an ANO-RCC-VTZP [12,13] atomic orbital (AO) basis for the light elements and ANO-RCC-VQZP [14] for ruthenium with OpenMolcas [15]. We employed a Cholesky decomposition of the two-electron repulsion integrals with the default threshold of 10 −4 (note that this decomposition which occurs during the quantum chemical calculations is separate from a later truncation of the two-electron integrals in the context of the quantum computing algorithms.). The choices for the molecular and atomic orbital bases in this study are summarized in Table V.
We performed exploratory calculations in a so-called minimal basis (mb) (ANO-RCC-MB [12][13][14] for this study). For quantitative results, it is necessary to employ a much larger AO basis, in this study, the ANO-RCC-VTZP for light elements and ANO-RCC-VQZP for ruthenium, which we will term the full atomic orbital basis (fb). Table VI lists the number of one-electron basis functions resulting from the choice of these bases for each complex. Note that the number of MOs equals the number of AOs and the numbers in Table VI therefore apply to both types of one-electron basis sets.
The MOs are generally labeled by the type of method with which they have been obtained, i.e. HF and CASSCF. When necessary, the active space employed in the CASSCF calculation is given explicitly through the notation (N ,L) (e.g., CAS(6,6)SCF), where N is the number of electrons and L the number of orbitals in the active space. The CASSCF molecular orbitals were obtained in the following manner: HF orbitals in the ANO-RCC-MB atomic orbital basis were split-localized [16] with the Pipek-Mezey method [17]. From these localized orbitals, we selected the orbitals corresponding to the 4d orbitals of ruthenium, the 2p and 2s orbitals of oxygen and carbon atoms (i.e. the bonding and antibonding π orbitals of carbon dioxide and derivatives), the bonding and antibonding σ orbitals of H 2 as well as the s-orbitals of Hydrogen atoms to evaluate their entanglement via an approximate Density Matrix Renormalization Group-Configuration Interaction (DMRG-CI) calculation with the QCMaquis [18] program. We selected the orbitals corresponding to a threshold in autoCAS [19,20] as the active space for a subsequent CASSCF calculation. We repeated the approximate DMRG calculation on the CASSCF orbitals and decided on the final active space based on the thereby obtained entanglement diagrams. We then expanded these orbitals to the full atomic orbital basis with the EXPBAS module of OpenMolcas and re-optimized them in a final CASSCF calculation. The one-and two-electron integrals of a subset of these orbitals then served as parameters for the Coulomb Hamiltonian, for which the quantum-algorithm resource estimates were obtained. The general procedure for the selection of these active spaces is detailed in Section III C whereas the molecular orbitals resulting from this selection for each complex are reproduced in Section V.
For each structure, we additionally carried out DMRG-Configuration-Interaction (CI) calculations of the orbitals corresponding to the integral file with the largest active space in order to gain qualitative information about the electronic structure. We employed several combinations of different bond dimensions and orbital orderings and reproduce the entanglement diagram of the calculation with the lowest energy in Section V. From the matrix product states generated in this way, we obtained the largest CI-coefficient through a reconstruction of an approximate CI wave function expansion through the sampling-reconstruction algorithm of Ref. [21]. For the latter algorithm, we employed a CI-threshold of 0.001 and a CI completeness measure of 10 −6 . To obtain reasonable thresholds for the truncation parameters in and co , we carried out highly accurate DMRG calculations with QCMaquis on integral files where the twoelectron integrals had been truncated at varying thresholds as well as the full integrals and reported the resulting absolute error in the energy. We performed these calculations on linear Hydrogen chains of length 2,4,6, and 8 with an internuclear separation of 1.4 times the equilibrium H 2 bond distance (1.037Å) as well as on the catalysts II-III with an active space of (8,6) and IX with an active space of (16,16), respectively. The details on these DMRG calculations are summarized in Table VIII. The HF orbitals for the integrals of the linear Hydrogen chains were obtained with OpenMolcas in an ANO-RCC-VDZ basis. The resulting DMRG-CI electronic ground-state energies are given in Tables IX to XI. ( co ) for the coherent truncation scheme. The energy associated with the non-truncated integrals is given as the energy at the truncation value of 0.0 Hartree. in/co denotes the value of the truncation parameter of either the incoherent truncation ( in ) of the two-electron integrals or the coherent one ( co ). The resulting ground-state DMRG-CI energy of the integral file at the given truncation level is given by E DMRG−CI el ( in ) for the incoherent truncation scheme and E DMRG−CI el ( co ) for the coherent one. The energy associated with the non-truncated integrals is given in the first row at a truncation value of 0.0 Hartree. in/co denotes the value of the truncation parameter of either the incoherent truncation ( in ) of the two-electron integrals or the coherent one ( co ). The resulting ground-state DMRG-CI energy of the integral file at the given truncation level is given by E DMRG−CI el ( in ) for the incoherent truncation scheme and E DMRG−CI el ( co ) for the coherent one. The energy associated with the non-truncated integrals is given in the first row at a truncation value of 0.0 Hartree. During the selection of the small active spaces for the CASSCF calculations, it became evident that the intermediates and transition states in this study are mainly dynamically correlated, as indicated by low values of our multi-configurational measure Z s(1) [22]. While the orbitals are sufficiently entangled to allow the determination of a small active space for the CASSCF calculations, the selection becomes ambiguous for larger active space sizes as the distinction between statically and dynamically correlated orbitals gets blurred and it is not possible to make a clear cut between the two. For the selection of the active spaces for the integral files, we therefore chose a scheme that is as reproducible as possible while still maintaining some chemical relevance. To investigate the performance of the quantum algorithms in the limit of very large active spaces, we additionally generated integral files with 100, 150, and 250 active orbitals for the complex XVIII. As a selection guided by manual inspection becomes unfeasible for these active space sizes, we chose to include the n/2 orbitals below (and including) the HOMO and the n/2 orbitals above it (with n=100, 150, 250).
For complex II, we also generated integral files from HF orbitals to understand the effect of the orbital type on the resource estimates. We selected the active space analogously to the CASSCF active spaces but without the small active space (as it is not defined for the HF orbitals).
For complex I, we obtained an additional set of orbitals obtained with a Cholesky decomposition threshold of 10 −8 instead of the default value of 10 −4 during the quantum chemical calculations.

IV. OVERVIEW OF THE INTEGRAL FILES
The files containing the one-and two-electron integrals for a selection molecular orbitals as defined in Section III C are summarized in Table XII, which also lists the atomic and molecular orbital bases used as well as the active spaces for the integrals. For each complex, the first three files correspond to the small, intermediate, and large active spaces, respectively. The additional files probe the limits of the quantum algorithms. The notation for the files is catalyst-MObasis-AObasis-n e n o , where catalyst is one of the catalyst identifiers I, II, II-III, V, VIII, VIII-IX, IX, or XVIII, whereas MObasis denotes the molecular orbitals basis, AObasis the atomic orbital basis, and n e and n o are the number of electrons and orbitals, respectively. Table XII. List of the integral files resulting from the active space selection outlined in Section III C for the different complexes with the notation explained in the text. The MO active space is the one of the underlying CASSCF calculation through which the orbitals have been obtained, whereas the integral active space denotes the size of the active space chosen for the integral file under consideration. All calculations were performed with the default threshold of 10 −4 for the Cholesky decomposition of the two-electron integrals except the ones which are labelled "highCD", where a threshold of 10 −8 was employed.        2. Graphical representation of CAS(12,11)SCF molecular orbitals chosen for integral files Figure 12. Active spaces of integral files for complex V produced from CAS(12,11)SCF molecular orbitals in the full atomic orbital basis.

E.
Stable intermediate VIII Figure 13. Lewis structure of intermediate VIII.     Figure 24. Active spaces of integral files for complex XVIII produced from CAS(4,4)SCF molecular orbitals in the full atomic orbital basis.

VI. RESOURCE ANALYSIS
There exists a plethora of quantum algorithms that allow one to obtain energies of a given Hamiltonian. Our previous work [23] focused on a product-formula based implementation of the time evolution operator, where the Hamiltonian was given in a standard second-quantized representation. In this work, we evaluated a variety of different approaches and focused on the ones that yield the lowest resource costs.
An initial screening of methods left us with the following methods to consider: • Trotter based implementation of the low-rank factorized Hamiltonian [24] • Qubitization of the standard (unfactorized) second-quantized representation [25] • Qubitization of the single-factorized Hamiltonian [26] • Qubitization of the double-factorized Hamiltonian (this work) We found that a qubitization based implementation of a doubly-factorized Hamiltonian resulted in the lowest costs. This method is discussed in the main text with a comparison to the single-factorized Hamiltonian, and also thoroughly detailed in Section VII C of this supporting document. We also ruled out qubitization in the standard and single-factorized second-quantized representation based on highly-optimized algorithms targeting these representations [26]. In the remainder of this section, we summarize our evaluation of remaining Trotter based implementation, and compare resource estimates in Table XIII with the best qubitization implementations. Finally, we tabulate resource estimates of all carbon-dioxide fixation complexes mentioned in the main text.

A. Product formula and low-rank factorization of the Hamiltonian
This method was introduced in [24]. The idea is to approximate the Hamiltonian using a low-rank factorization, i.e., using R < N 2 components, and to then simulate time-evolution using a product formula. Each step of the product formula based implementation consists of a sequence of rotations that diagonalize the current component of the low-rank factorization, followed by a sequence of controlled phase gates that correspond to the eigenvalues of the component. This is repeated for all R components of the low-rank factorization. As the simulation basis size N is increased, it has been shown that R ∼ O(N ) [27]. Additionally, the eigenvalues of each component in the factorization may be truncated, allowing one to further reduce the resource requirements. For a more detailed description of this method, we refer the reader to [24].
We provide gate counts for estimating the electronic energies of some of our chosen systems using this method in Table XIII. The resource estimation uses the same truncation method as discussed in Section 5.1 of the main text. We note that this product formula based implementation naturally offers more parallelism than, e.g., a qubitization based implementation at comparable qubit numbers. We leave the detailed investigation of such space/time tradeoffs for future work.
The time step for the second order Trotter method was chosen using the following upper bound [28] which applies to any Hamiltonian H = M −1 j=0 H j where time-evolution by each term can be applied without any error. In the context of electronic structure, time-evolution by the one-electron term and each rank component of the two-electron terms can be applied exactly by diagonalizing them with Fermionic basis transformations that are implemented by Givens rotations [29].
Since such bounds have been shown to be loose by many orders of magnitude, we carried out an empirical study of the required Trotter step size for Hydrogen chains. We depict the bound and empirically determined step sizes in Fig. 25. The results for Hydrogen chains indicate that the upper bound yields Trotter numbers that may be up to 10 √ n larger than what empirical data suggests. Rescaling the T -count estimates for our catalyst systems by this factor results in an optimistic estimate that can also be found in Table XIII.

B. Results
In this section, we compare the T -gate and qubit counts of Trotter-and qubitization-based each method in Table XIII -note that any Toffoli gate can be synthesized from 4 T gates. Qubitization using a doubly-factorized representation of the Hamiltonian offers the lowest T -counts at reasonable qubit numbers. The main text thus focuses on this approach, and thus we tabulate, for completeness, the cost of phase estimation to 1mHartree and problem parameters of all the carbon-dioxide fixation complexes mentioned in the main text using this approach. We also briefly compare with qubitization using the unfactorized or singlefactorized representation using the algorithms of Berry et al. [26]. In all these examples, we reduce the number of non-zero terms by using the incoherent truncation scheme discussed in the main text at an error threshold of in = 1mHartree.  Table XIV. Double-factorization resource estimates for all steps of carbon-dioxide fixation for active space sizes from 52-65 orbitals at a truncation threshold between in = 0.1mHartree and 0.1Hartree.

VII. QUBITIZATION OF DOUBLE-FACTORIZED HAMILTONIAN
In this section, we present our approach for qubitizing the double-factorized Hamiltonian. We make heavy use of quantum circuit diagrams, and many our proofs follow from combining these diagrams. Thus in Section VII A, we define our notation for many standard angle controlled by an integer index, and sparse multiplexed data-lookup in Section VII B 2. These primitives are then combined in our qubitization of the double-factorized Hamiltonin in Section VII C. Step

A. Standard quantum circuit primitives
In this section, we define quantum circuit primitives and their costs. Throughout, we use the shorthand n x . = log 2 x which is the number of bits needed to store an integer of size x. All the quantum circuits we use are constructed from the following primitive elements. Unitary and its adjoint: Controlled unitary: Multiplexed unitaries: Controlled multiplexed unitary: Data-lookup that XORs x k with z: Unitary that prepares | a : |0 | a | a = j |a j | a 1 |j .
Unitary that prepares | a : |0 Unitary that prepares | a, f : Block-encoding unitary: The block-encoding of a unitary is trivial: Multiplying block-encodings: The costs of these quantum circuits are expressed in terms of primitive quantum gates and qubits. We define primitive quantum gates as those acting on at most two qubits, and we distinguish between Clifford gates {Had, S, CNOT}, and the non-Clifford T gate. Note that we denote the Hadamard gate as Had instead of the more traditional symbol H to avoid confusion with the Hamiltonian. Some of these circuits may be implemented by ancilla qubits in addition to the 'register' qubits. Typically, we do not show the ancilla qubits in the diagrams and implicitly assume that these invisible qubits are all borrowed, meaning that they are returned to the same initial state at the end of the circuit. We call these ancilla qubits 'clean' if their initial state is the computational basis state |0 . In contrast, We call these ancilla qubits 'dirty' if their initial is arbitrary and unknown.
In some cases, a quantum circuit U is approximated by U to some error U − U ≤ in spectral norm. The errors of multiple approximate quantum circuits add linearly, following the triangle inequality

Data-lookup oracle
Given a list of d bit-strings a ∈ {0, 1} d×b , each of length b, the data-lookup oracle in Eq. (6) returns the bit-string a x , that is D|x |z = |x |z ⊕ a x . In addition to the b + log 2 (d) qubits needed to store its inputs and outputs, this oracle has the following cost.
According to Low et al. [31], the Toffoli gate cost of data-lookup can be further reduced by adding λb additional qubits, where λ ≥ 0 is an integer. These qubits can be clean, meaning they start in and are returned to the computational basis state |0 . Circuit optimizations by Berry et al. [26] further reduce the cost by constant factors.
Note that the additional O(1) clean qubits and O(log (d)) Toffoli gates only needed when 1 + λ is not a power of two. In this case, they are used in an intermediate step to reversibly compute the remainder and quotient of j/ (1 + λ), where the numerator j ∈ [N ] is stored in a n d qubit register. In the following, we will omit this additive cost as it is subdominant in all our applications. We find it useful to define the Toffoli gate count function which returns the smallest possible Toffoli gate count for any number of λ available clean qubits. The bit-strings output by these lookup oracles may be uncomputed by applying their adjoint. This doubles their gate complexity at most. However, the gate complexity of uncomputation can be further improved [26] used measurement-based uncomputation. By using λ + O(log(d/λ)) qubits, measurement-based uncomputation of data-lookup reduces the additive λb Toffoli gate term to λ, which becomes significant when λb ∼ √ bd. More importantly, any garbage qubits produced by Lemma 2 can measured and the results stored in classical memory for later uncomputation. This frees up clean ancilla qubits and means that the λ parameter for uncomputation can be optimized separately from that of computation in reducing Toffoli costs. Thus we also find it useful to define the cost of uncomputation as In most of the cases we consider, uncomputation of data-lookup is an order of magnitude cheaper than computation. These assisting qubits can also be dirty, meaning that start in and are returned to the same initial state. This is useful whenever the quantum algorithm has any idling qubits. To simply notation in the following, we will assume that λ is a power of two.
Note that when λ ≤ 1, there is no advantage over the original construction in Lemma 1. We find it useful to define the Toffoli gate count function which returns the smallest possible Toffoli gate count for any number of λ available dirty qubits. Note a slight difference in notation compared to Eq. (18) -there, the last subscript λ is the number of times the output register is duplicated, whereas the last subscript λb parameterizes the total number of dirty qubits available. Similar to the case using clean qubits, the Toffoli cost of uncomputation is less than that of computation [26].

State preparation unitary
Quantum state preparation is a unitary circuit that prepares a desired quantum state. A number of different quantum circuit implementations of Eqs. (7) and (8) are known, each with different trade-offs in qubit count, and the various quantum gates. For instance, the approach by Shende et al. [32] uses log 2 d qubits and d arbitrary single-qubit rotations, and O(d) other two-qubit Clifford gates.
For our purposes, it suffices to prepare quantum states where each |j is, in general, entangled with some arbitrary quantum state |Garbage j .
As this includes Eq. (7) as a special case, quantum circuits for state preparation with garbage can use fewer T gates, though at the expense of more qubits. This garbage state may be safely ignored in the remainder of this manuscript, so we do not differentiate between the state preparation unitaries of Eq. (7) and Eq. (20). Moreover, the circuits for | a and | a are very similar and have the same T gate cost. We will repeatedly invoke the following implementation which approximates each coefficient to a targeted precision.
Lemma 4 (Approximate quantum state preparation with garbage [30]). Given a list of d positive coefficients a ∈ R d and the desired bits of precision µ, the quantum state where can be prepared by a unitary U that is implemented using one application of any data-lookup oracle from Section VII A 1 for d bit-strings of length log 2 (d) + µ. The total cost for implementing U is given by: • Toffoli gates: µ + D d, log 2 (d) +µ,λ + Θ(log (d)).
• Dirty qubits: λ. When many coefficients of |j b are zero, it is useful for the T count to scale with the number of non-zero elements nnz [ a] rather than with d The modification of Eq. (7) that outputs some additional bits specified by a Boolean function f : [d] → {0, 1} b is also useful, such as when many On one hand, the unitary of Eq. (9) can be implemented by combining state preparation in Eq. (7) with data-lookup Eq. (4) as follows. where can be prepared by a unitary U , and U is approximated to error using one application of any data-lookup oracle from Section VII A 1 for d bit-strings of length 2b + µ. The total cost for implementing U is given by: • Toffoli gates: µ + D d, log 2 (d) +µ,λ + Θ(log (d/ )).

Block-encoding framework
These two components of state preparation and select allow us to implement a blockencoding.
Note that the same Hamiltonian by be block-encoded by many quantum circuits. For instance, the quantum circuit may explicitly implement the coefficient sign as follows, where Z is the Pauli Z|x = (−1) x |x , and sign[ a](j) = 1−sign(a j ) 2 ∈ {0, 1}. This implementation has the advantage that its controlled version only needs to apply a control to the select unitary. In particular, the T gate cost of controlled-B [H/ a 1 ] is identical to B [H/ a 1 ] when the select unitary is implemented following the approach by Babbush et al. [30]. Moreover, note that the state preparation unitary is always followed up by by its adjoint. Thus the same Hamiltonian is block-encoded even if state preparation in Eq. (20) entangled with an garbage state [30].
Errors in state preparation or unitary synthesis introduce errors into the block-encoded Hamiltonian. We find it useful to define the approximate block-encoding.
For instance, we have the following approximation due to error in the coefficient of the quantum state.
Lemma 6 (Approximate block-encoding using approximate state preparation). Using the approximate state preparation circuit of Lemma 4 with precision parameter µ in the blockencoding circuit of Definition 1 produces an -approximate block-encoding B [H/ a 1 ], where = 2 −µ .
Proof. Let p be such that p − a/ a 1 1 ≤ 2 −µ . Let H = j p j U j , and H = j a j U j . Then

Qubitization
We also use the following result on qubitization, which is a generalization of quantum walks.
In particular, following the work of Low and Chuang [33], this circuit , (29) costs j total queries to B [H] and its inverse, and j − 1 reflections REF = 2|0 0| a − I a on the ancilla register.
Each reflection can be understood as a multi-controlled Z gate, and so has a Toffoli cost equal to the number of qubits it acts on.
B. New quantum circuit primitives
Then Eq. (30) is implemented by the following circuit.
With only κ qubits, clearly b/κ slices of the circuit within the dotted regions are required.
Another useful situation is where arbitrary unitaries are applied on the system register are interspersed between M multiplexed rotations.
We may use the same construction as in Eq. (32) to implement this. The number of datalookup oracles required is then M b/κ + 1. We may reduce this to just M b/κ + 1. When b is not an integer multiplier of κ, the data-lookup in the last slice might store fewer than κ entries. Thus we fill these empty entries with bit-strings from the nest data-lookup. This filling procedure is illustrated by the following example, where the bits of precision b = 2 ≤ κ = 3.
In the case where many data qubits are available κ b, we may similarly merge multiple bitstrings into the same lookup oracle. Thus, the Toffoli cost of Eq. (33) is equal to M b/κ + 1 data lookup oracles with K bit-strings of length κ in addition to that of all the U j . Moreover, the number of qubits required for the data and index k is equals to κ + log 2 (M ) . It is valuable to express these costs with respect to a tunable number of qubits κ. According to Section VII A 1, the Toffoli gate cost of data-lookup with K elements that outputs κ bits an be reduced by using λ ancillary qubits. When these qubits are clean, the Toffoli cost is K/ 1 + λ κ + λ. Thus the Toffoli count of all the data-lookup oracles is M b κ which is minimized by choosing λ ∼ √ Kκ. We now bound the bits of precision required to approximate Eq. (33) to an overall error of in the spectral norm. Suppose we are given angles θ exact that are real numbers. Then the error of approximating each angle θ exact,k with a β-bit number θ k is at most 2 −β−1 Thus the error of each rotation compared to its binary approximation is R 2| sin (2 −β π)| ≤ π2 −β+1/2 . The cumulative error ≤ M π2 −β+1/2 of all rotations then follows by the triangle inequality on unitary operators ( j U j ) − ( jŨ j ) ≤ j U j −Ũ j . Thus the bits of precision required is There can be an additional error introduced from approximating each single-qubit rotation gate with Clifford + T gates. However, using the phase gradient technique [34] eliminates this error with a worst-case cost of one Toffoli gate per R b rotation.

Multiplexed sparse data-lookup
In this section, we describe an implementation of a multiplexed data-lookup oracle. This can be non-trivial as standard data-lookup constructions [30,31] are controlled by a single index register. Whereas in this case, there can be two or more index registers such as below.
In the above, j ∈ [J] and k ∈ [K]. Thus there are at most KJ bit-strings x j,k . One solution is to map the indices (j, k) to a unique integer q = jK +k. Thus Eq. (37) can be implemented by a data-lookup oracle controlled by a single index q ∈ [JK], combined with an arithmetic circuit that computes q from j and k as follows.
We consider the situation where for each j, only K j ≤ K bit-strings are defined. Thus the multiplexed data-lookup oracle only encodes Q = j∈[J] K j elements. Using the construction of Eq. (38) is wasteful as it enumerates over KJ elements, which is more than necessary. Our solution uses a data-lookup oracle that enumerates over exactly Q elements. The basic idea is to map (j, k) to a unique integer q = k + α∈[j−1] K α . Note that the shift Q j = α∈[j−1] K α can be classically pre-computed. Thus this map is implemented by a data-lookup oracle that outputs Q j , followed by an arithmetic circuit that adds k to Q j as follows.
As cost is dominated by the single data-lookup oracle in the middle, this construction lends itself readily to Toffoli gate count reduction using additional ancilla qubits, following Section VII A 1.
Lemma 7 (Multiplexed sparse data-lookup oracle). Given a set of bit-strings { x j,k ∈ {0, 1} b : j ∈ [N ] and k ∈ [K j ]}, the data-lookup oracle in Eq. (37) can be implemented using one application of any data-lookup oracle for Q = j∈[J] K j bit-strings of length b, two applications (one computation and one uncomputation) of any data-lookup oracle for J bit-strings of length log 2 (Q) , and two log 2 (Q) -bit arithmetic adders.
There is much flexibility in choosing the implementation of data-lookup oracles in Lemma 7. In the following corollary, we implement data-lookup on the Q elements using only clean ancilla qubits in Lemma 2, and implement data-lookup on the shift Q j using dirty qubits Lemma 3.
Uncomputation can be implemented using the same number of qubits and • Toffoli gates: DU Q,J,b,λ = D J, log 2 Q ,n dirty + DU J, log 2 Q ,n dirty + DU Q,b,λ + 2 log 2 Q + O(1).
Proof. At the beginning of the circuit we allocate the stated number of qubits. We then tabulate the resources required for each operation to ensure that there are sufficient clean qubits available, and sum the Toffoli gate counts.

Operation
Clean qubits Dirty qubits Toffoli gates required available (n) Lookup on Q j Lemma 3 log 2 J ≥ log 2 Q + b(1 + λ) D J, log 2 Q ,n dirty +DU J, log 2 Q ,n dirty Two adders [35] 0 n/a 2 log 2 Q + O(1) In most applications, particularly later when we use this to block-encode the molecular Hamiltonian, the total number of bit-strings Q is significantly larger than J. Thus the cost of computation in Lemma 7 is dominated by D Q,b,λ + O(n Q + D J,n Q ,n ).

C. Double-factorized Hamiltonian
The electronic Hamiltonian in first-quantization is where ∇ 2 n is the Laplace operator on the n th electron, Z m is the nuclear charge, and r m is the nucleus coordinate. By choosing a basis of orbitals ψ i (x), this implies the second-quantized representation This is equal to the single-factorized H CD and double-factorized H DF Hamiltonians Note that the coefficient Λ  The block-encoding framework supports the addition and multiplication of encoded matrices. For instance, one may add block-encoded Hamiltonians H = j H j with a new normalizing constant such as α = j α 1 , using the following quantum circuit As mentioned in eq. (14), block-encoded Hamiltonians may also be multiplied to obtain B [H 2 H 1 /α 2 α 1 ]. In general, these addition and multiplication operations increase the ancilla register size in a straightforward manner. Using these ingredients, one may block-encode matrices specified by a variety of common input models, such as sparse matrix oracle, linearcombination-unitaries, or other quantum data structures.
In qubitizing H DF , we find it more natural to work in the Majorana representation of the fermion operators Thus a p = (γ p,0 + iγ p,1 )/2 and a q = (γ p,0 − iγ p,1 )/2, where p . = (i, σ) is a combined orbital and spin index. Some useful identifies are which implies the Majorana representation for one-electron Hamiltonian that separates into a trivial identity component, and a non-trivial one-electron component One L . As L is a symmetric matrix, it has the eigendecomposition L = k λ k R k · R k . Thus we may diagonalize the one-electron Hamiltonian to obtain One should verify that γ 2 u,σ,x = I, which follows from Eq. (45) and the unit-length normalization Eq. (48) of u.
Thus the spectral norm One L = L SC is seen to be the one-norm of the eigenvalues of L. Substituting the Majorana representation into the doubly-factorized Hamiltonian results in Our strategy for block-encoding Eq. (49) is to build it up from block-encodings of its component pieces. Let us further collect terms as follows.
where T 2 (x) = 2x 2 − 1 is a Chebyshev polynomial of the first kind. The identity term only contributes a constant shift in energy and may be ignored. Observe that we have reduced the double-factorized Hamiltonian to sums of products of basis-rotated Majorana operators γ u,σ,x . We thus block-encode the two-body term as follows.
3. Block-encode B One L L SC by taking a linear combination of λ k B γ R k ,σ,1 γ R k ,σ,1 over the eigenvalues and eigenvectors of L and the spins Section VII C 3.

Block-encode B T 2
One L L SC by applying B One L L SC twice using qubitization [33].

Block-encode B
Two H Generally, the cost of block-encoding the large number of two-electron terms dominate that of the smaller number of one-electron terms. A common theme throughout will be the use of symmetries. Many coefficients turn out to be identical. For instance, L (r) ij = L (r) ji , and is independent of spin. Moreover, the same coefficients R (r) k occur in both of γ u,σ,1 γ u,σ,1 . Wherever possible, we use this redundancy to optimize the number of bits of classical data we need to encode into our quantum circuits. These optimizations are combined with recent advances using ancillary qubits [31] to substantially reduce T of f oli gate count We now provide optimized quantum circuits that implement the above steps. We make heavy use of quantum circuit notation, outlined in Section VII A.

Block-encoded, basis-transformed Majorana operator
We synthesize the basis-transformed Majorana operator γ u,σ,x by conjugating γ 0,σ,x with a sequence of unitary rotations. The required sequence of unitary rotations follows from the following observation.
Proof. In the interests of clarity, we drop the σ, x subscript in the following. By taking a Taylor series expansion, observe that V (p) u = cos (θ p )I + sin (θ p )γ p γ p+1 . Thus Thus We obtain the angles θ p by recursively solving this linear chain of equations.

Block-encoded one-electron operator and its square
The one-electron operator is The block-encoding of One L is accomplished by adding block-encodings of B [γ u,σ,0 γ u,σ,1 ], each weighted by the eigenvalue λ k , following the construction in Eq. (13). This requires synthesizing the quantum state and the multiplexed unitary Note that we encode the sign of λ k in a single qubit using the Pauli Z operation, as described in Eq. (25), which uses the fact sign[λ k ]|Z|sign[λ k ] = sign[λ k ]. In the following circuit diagrams, this sign qubit is assumed to be implicitly present and will not be shown. Given these two components, the block-encoding is implemented by the quantum circuit We now discuss the implementation of Eq. (65) which is controlled by the spin index σ and the eigenvalue index k. Control by the spin index remains unchanged from Eq. (62). Control by the eigenvalue index is implemented by multiplexing the basis transformation From Eq. (59), U R k ,0 U R k ,1 is a sequence of Z-rotations by a k-dependent angle θ R k ,p , each conjugated by a Clifford gate that is independent of k. Thus Eq. (67) may be implemented by multiplexing the phase-rotations as follows We implement multiplexed phase-rotations using a data-lookup oracle D p |k |z = |k |z ⊕ θ R k ,p on K β-bit entries. This oracle computes a β-bit binary representation of the rotation angle θ R k ,p 2π ≈θ R k ,p = β−1 b=0θ R k ,p,b /2 1+b ∈ [0, 1 − 2 −β ], and may be implemented according to Section VII A 1. With this oracle, we perform the following sequence Controlled rotations → |k This costs two queries to data-lookup D p and β arbitrary single-qubit rotations. As the circuit for Eq. (65) contains 4N multiplexed rotations, these rotations costs at most 8N queries to data-lookup on K β-bit entries, and 4N β arbitrary single-qubit rotations. However, some optimizations are possible. First, the rotation angle bits computed once by D p may be used to implement both rotations Rθ R k ,p in Eq. (69). This reduces queries by a factor of two to 4N . Second, the uncomputation of |θ R k ,p may be merged with the computation ofθ R k ,p+1 . Instead of applying D p and D p+1 , we merge them into a single datalookup oracle D p,p+1 |k |z = |p |z ⊕θ R k ,p ⊕θ R k ,p+1 . This reduces queries by another factor of two to 2N . Third, the data-lookup oracles may output κ > β bits. Thus multiple angles Operation # Toffoli cost each Ancillary qubits Data-lookup Lemma 2 1 D K,N β,λ + DU K,N β,λ 2 log 2 K + N β(1 + λ) Arbitrary rotations [34] 4N β 1 0 Controlled-SWAPs 2N 1 0 State-preparation Lemma 4 2 µ + D K,n K +µ+1,N β(1+λ)+2N +1 dirty log 2 K + 2µ + 1 Table XIX. Resources used to block-encode a one-electron operator B One L L SC on N orbitals where L has K eigenvectors. The parameters β = 5.152 + log 2 N and µ = 2 + log 2 (1/ ) . may be written out at once, e.g. |z 0 ⊕θ R k ,p |z 1 ⊕θ R k ,p+1 · · · |z κ/β−1 ⊕θ R k ,p+κ/β−1 . These angles allow the application of more controlled-rotations for each application of data-lookup. This is detailed in Section VII B 1, and reduces the number of queries to 2 N β κ , but now to data-lookup on K κ-bit entries. Fourth, many of arbitrary single-qubit rotations are by identical angles (powers of 2). These may be implemented exactly using 4N β Toffoli gates by the phase gradient technique [34]. This also requires a one-time cost of preparing and storing β single-qubit resource states whose cost is negligible and thus ignored. Fifth, using the choice κ = N β, only one computation and uncomputation step is required. This can be advantageous as uncomputation as described in Section VII A 1 has a Toffoli cost that can be smaller than computation when many ancillary qubits are used.
We now bound the bits of precision β required to approximate Eq. (65) to an overall error of in spectral norm. As each rotation angle θ R k ,p is approximated to an error of at most π2 −β , the error of each unitary rotation compared to its binary approximation is The cumulative error ≤ (4N )π2 −β+1/2 of all rotations then follows by the triangle inequality on unitary operators ( j U j )−( jŨ j ) ≤ j U j −Ũ j . Thus the bits of precision required is Including the error µ of state-preparation results in an = +µ-approximate block-encoding B +µ One L L SC instead of the exact B One L L SC . A simple choice of error budgeting is = /2 and µ = /2, though this may be optimized with more weight on as its data-lookup oracles are more costly.
The overall cost of block-encoding the one-electron operator is summarized in Table XIX, and is dominated by data-lookup in multiplexing the basis transformation rotations. As state-preparation is cheap in comparison, we choose an implementation that is assisted by κ + λ + 2N dirty qubits. Given B One L L SC , qubitization in Theorem 1 describes how it may be applied twice, hence doubling the cost, to block-encode B 2 T 2 [ One L L SC ] . This also applies a reflection about the all-zero state |0 · · · 0 on O(log K +µ) qubits, whose cost is negligible and thus ignored.

Block-encoded two-electron operator
In this section, we describe a block-encoding of the double-factorized Hamiltonian in Eq. (50). We focus on the two-body term, which is the most costly component.
At a high level, we achieve this block-encoding using the following circuit identities, which are defined in Section VII A.
Add block-encodings Eq. (13): Qubitization Eq. (29): B T 2 Add block-encodings Eq. (13): Compared to the one-electron operator in Eq. (66), we see that the only new element is multiplexing by another control variable r. Thus instead of the singly-multiplexed basis transformation in Eq. (69), we implement the doubly-multiplexed basis transform Following the implementation of multiplexed-phase rotations in Eq. (70), Eq. (77) can be implemented given a multiplexed data-lookup oracle DD p |r |k |z = |r |k |z ⊕ θ R (r) k ,p . Two challenges make the implementation of DD p non-obvious. First, the number of entries k ∈ [M (r) ] can depend on the index r. Rather than storing the worst-case number  We also choose the number of clean qubits κ = N β, which is just a constant factor more than the 2N qubits representing spin-orbitals, as β = 5.652 + log 2 . As the number of dirty qubits Thus the total number of Toffoli gates is 2(D M,R,N β,λ + DU M,R,N β,λ ) + 4(STATE M,2n R +µ+1, + N (2β + 1)) (85) Note that converting this block-encoding to a quantum walk using the qubitization procedure Theorem 1 requires an additional reflection REF about the state |0 · · · 0 on the ancillary register of the block-encoding. This reflection may be implemented using a number of Toffoli gates that is at most equal to the number of qubits comprising the ancillary register. Importantly, this ancillary register excludes any additional clean or dirty qubits that are used in intermediate steps, such as in data-lookup. Thus the cost of REF is subdominant to that of the block-encoding.

VIII. MATRIX SCHATTEN AND ENTRYWISE 1-NORM
Let h ∈ C N ×N be any Hermitian matrix. This matrix has eigenvalues h · U k = σ k U k , where the orthonormal k th eigenvector U k can be viewed as the k th column of some unitary matrix U ∈ C N ×N . We now prove that the following inequality is true where Moreover, we show that this inequality is tight. We have not been able to find a proof of this statement in the literature, so our derivation may be of independent interest. To prove the lower bound, observe that The middle equality follows from the definition of the Frobenius norm, and the first and last inequalities follow from the usual vector norm inequality |x j | 2 for any vector x. This is an equality when every entry in h is one, and is hence tight.
To prove the upper bound, observe that |h ij |.
The final inequality follows from applying the Cauchy-Schwartz inequality on the term in round brackets. Using the fact that any row or column of any unitary has unit 2-norm, This upper bound is also an equality by choosing h to contain only a single non-zero entry that lies on the diagonal, and is hence tight.