Orbital transformations to reduce the 1-norm of the electronic structure Hamiltonian for quantum computing applications

Reducing the complexity of quantum algorithms to treat quantum chemistry problems is essential to demonstrate an eventual quantum advantage of Noisy-Intermediate Scale Quantum (NISQ) devices over their classical counterpart. Significant improvements have been made recently to simulate the time-evolution operator $U(t) = e^{i\mathcal{\hat{H}}t}$ where $\mathcal{\hat{H}}$ is the electronic structure Hamiltonian, or to simulate $\mathcal{\hat{H}}$ directly (when written as a linear combination of unitaries) by using block encoding or"qubitization"techniques. A fundamental measure quantifying the practical implementation complexity of these quantum algorithms is the so-called"1-norm"of the qubit-representation of the Hamiltonian, which can be reduced by writing the Hamiltonian in factorized or tensor-hypercontracted forms for instance. In this work, we investigate the effect of classical pre-optimization of the electronic structure Hamiltonian representation, via single-particle basis transformation, on the 1-norm. Specifically, we employ several localization schemes and benchmark the 1-norm of several systems of different sizes (number of atoms and active space sizes). We also derive a new formula for the 1-norm as a function of the electronic integrals, and use this quantity as a cost function for an orbital-optimization scheme that improves over localization schemes. This paper gives more insights about the importance of the 1-norm in quantum computing for quantum chemistry, and provides simple ways of decreasing its value to reduce the complexity of quantum algorithms.


I. INTRODUCTION
Quantum chemistry has been identified as the killer application of quantum computers, which promise to solve problems of high industrial impact that are not tractable for their classical counterparts [1][2][3][4]. Unfortunately, quantum devices are noisy and only shallow circuits can be implemented with a relatively accurate fidelity. Hence, even though we know how to solve the electronic structure problem with, for instance, the quantum phase estimation (QPE) [5][6][7][8][9] algorithm, this remains out of reach in practice and one has to come up with new algorithms that can actually be run within this noisy intermediate-scale quantum (NISQ) era [10]. Reducing circuit depth has been achieved by interfacing quantum and classical devices, thus leading to hybrid quantum-classical algorithms such as the variational quantum eigensolver (VQE) [11]. Since then, variational algorithms have been successfully applied to the estimation of ground-state (see Refs. 12-15 and references therein) and excited-state energies [16][17][18][19][20][21][22][23][24][25][26], as well as molecular properties [27][28][29][30].
However, it remains unclear if these algorithms can provide a clear quantum advantage in the long run, especially due to the difficulty in circuit optimization [31] * yalouzsaad@gmail.com † bruno.senjean@umontpellier.fr and to the large overhead in the number of measurements required to achieve sufficient accuracy [32,33], though significant progress has been made recently [33][34][35][36][37][38][39][40][41][42][43][44][45]. This led to the development of new strategies for reducing the resources required to implement quantum algorithms (such as gate complexity), for both the faulttolerant and NISQ era. For simulating the time-evolution operator U (t) = e iĤt whereĤ is the electronic Hamiltonian, Campbell has shown that using randomized compiler is better suited than the Trotter-Suzuki decomposition, the so called quantum stochastic drift protocol (qDRIFT) [46]. One can also directly simulate the Hamiltonian by using linear combination of unitaries, generalized by the block encoding or "qubitization" formalism [47][48][49][50]. Most of these algorithms have a gate complexity that scales with respect to the parameter λ Q = i |h i |, whereĤ = i h iPi is the qubit Hamiltonian andP i are Pauli strings [51]. Lowering the value of λ Q -usually referred to the 1-norm of the Hamiltonian -has a significant impact on quantum algorithms, and techniques such as double-factorization [4], tensor hypercontraction [52] and n-representability constraints [33] have proven successful in this respect.
In this work, we investigate how the value of the 1norm and its scaling with respect to the number of orbitals can be reduced by single-particle basis rotations. As a starting point we consider the use of localized orbitals that are commonly available in most quantum chemistry codes. Generating such orbitals presents a simple and effortless way of reducing gate complexity by arXiv:2103.14753v3 [quant-ph] 6 Oct 2021 classical pre-optimization of the Hamiltonian representation.
We show that the use of localized orbitals has a significant impact on the 1-norm of all systems studied, from simple one-dimensional hydrogen chains (also studied in Refs. 33 and 52) to much more complex organic and inorganic molecules. Additionally, we connect the expressions of the 1-norm before and after the fermion-to-qubit mapping. This expression is used to define a new cost function for single-particle basis rotations, and is shown to reduce the value of λ Q even more than standard localization schemes.
The paper is organized as follows. After introducing the electronic structure problem in Sec. II A, the impact of the 1-norm on quantum algorithms is detailed in Sec. II B. Then, we review the localization schemes applied in this work in Sec. II C followed by our socalled 1-norm orbital-optimization method in Sec. II D. Finally, preceded by the computational details in Sec. III, the scaling of the 1-norm when increasing the number of atoms is investigated in Sec. IV A for hydrogen and alkane chains, followed by a benchmark on several systems in Sec. IV B and a study of the effect of increasing the active space size in Sec. IV C. Conclusions and perspectives are given in Sec. V.

A. Electronic structure Hamiltonian in second quantization
In quantum chemistry, the second quantization formalism is usually employed to describe the electronic properties of molecules. Under the Born-Oppenheimer approximation, N e electrons can rearrange around N a nuclei by occupying a restricted set of N spatial molecular orbitals (MO). The orthonormal MO basis {φ p (r)} is built from linear combination of atomic orbitals (LCAO) and is usually obtained from an inexpensive mean-field calculation, e.g. with the Hartree-Fock (HF) method [53]. In this basis, the molecular Hamiltonian can be expressed in a spin-free form that reads (in atomic units) are the one-and two-body spin-free operators. In Eq. (1), the coefficients are the so-called one-electron integrals which encode, for each individual electron, the associated kinetic energy and Coulombic interaction with the N a nuclei of the molecule (with r iA = |r i − R A | and Z A the distance between electron i and nuclei A, and the atomic number of atom A). The coefficients (3) are the so-called two-electron integrals encoding the Coulombic repulsion between each pair of electrons (with r ij = |r i − r j | the distance between electrons i and j). Solving the electronic structure problem consists in solving the time-independent Schrödinger equation with |Ψ an electronic eigenstate with corresponding energies E . This is only possible in the case of small molecules and small basis sets, due to the exponential scaling of the computational cost with respect to the size of the system. A commonly employed approach is the so-called active space approximation which divides the MO space into three parts: the core (frozen occupied orbitals), the active, and the virtual spaces (deleted unoccupied orbitals), such that only the electrons inside the active space are treated explicitly. This approximation will be used throughout this work and is equivalent to finding the eigenstates of an effective Hamilto-nianĤ F C also called the "frozen-core Hamiltonian" (see Appendix. A). The electronic Hamiltonian can be mapped onto an appropriate representation for quantum computers by doing a fermion-to-qubit transformation (such as Jordan-Wigner [51]), resulting in a linear combination of Pauli stringsP j ∈ {I, X, Y, Z} ⊗N , Here, S denotes the sparsity of the Hamiltonian and generally scales as O(N 4 ), but sometimes O(N 2 ) for a sufficiently large system and a localized basis [54].

B. The 1-Norm in quantum computing
The term '1-norm' in quantum computing refers to a norm induced on a Hilbert-space operator by its decomposition as a sum of simpler terms. To calculate a 1norm, we write an operatorĤ (e.g. a Hamiltonian) in the formĤ where theB j are operators on C 2 2N , and the b j are complex numbers. Typically the operatorsB j are chosen to be either unitary (B † jB j = 1) or unital (B † jB j < 1). We define the 1-norm ofĤ (induced by this decomposition)
to be the 1-norm on the vector b (with components b j ): (It is common to suppress the dependence on the Hamil-tonianĤ when λ B is unambiguous.) For example, under the above decomposition into Pauli operators [Eq. (5)], we have In general, λ B is a norm on the space spanned by the set {B j } as long as this set is linearly independent, as the mappingĤ ↔ b is then bijective and linear. The Pauli operatorsP j used above make a natural choice forB j as they span the set of 2 2N × 2 2N matrices, and so induce a norm for any operators on the Hilbert space. Pauli operators are also unitary, and may be easily implemented in a quantum circuit [57], making them a common choice for Hamiltonian decompositions. However, unlike other operator size measures, 1-norms may be heavily optimized by a change of operator basis (whereas e.g. the 2-norm or infinity-norm are both invariant under unitary rotation). As operator 1-norms play a role in determining the cost of many quantum computing algorithms, this makes them a key target of study in quantum algorithm research. A versatile method for implementing an arbitrary (non-Hermitian) operator on a quantum device is the linear combination of unitaries (LCU) method [47], which involves a decomposition over a set {B j } of strictly unitary operators. AsĤ is typically not unitary, the original LCU technique requires postselection, but this forms the basis for unitary methods such as qubitization [49]. These in turn underpin many proposals for quantum computing on a quantum computer [4,52,56], and the dependence on λ B is pulled directly through. We give a brief description here of the key identity in LCU methods to show where the dependence on λ B appears. LCU methods all require a control register with states |j that encode the index to the operatorB j . (One need only encode those operatorsB j that are relevant to the targetĤ.) Then, these methods encode the coefficients ofĤ on the register by preparing the state Then, consider implementing the joint unitary j |j j| ⊗B j (that is, applying the unitary op-eratorB j to a system conditional on the control register being in the state j). Post-selecting the control register returning to the initial state performs the LCU operation to the register We see here that λ B emerges naturally as the normalization constant of the LCU unitary. This dependence is passed on to any LCU-based method, suggesting that minimizing λ B is key to optimizing any such techniques. The 1-norm can also play a role in methods where Hamiltonian simulation is implemented stochastically. For instance, in the qDRIFT [46] method, hermitian termsB j in the Hamiltonian are chosen at random, weighted by |b j |/λ B , and exponentiated on a system as e iτBj (λ B appears here immediately as the normalization of the probability distribution). To first order, this implements the channel on a stateρ [46] which can be observed to be the unitary evolution ofρ under e iĤdt for dt = τ /λ B . To approximate time evolution for time t to error then requires repeating this process O(2λ 2 B t 2 / ) times. This has been extended upon recently [58], yielding a more complicated λ B dependence.
A final situation where the 1-norm of an operator plays a role is in tomography. One may in principle measure the expectation value of an arbitrary Hermitian opera-torĤ on a state ρ by repeatedly preparingρ, rotating into the eigenbasis ofĤ and reading out the device. The variance in averaging M shots of preparation and measurement is directly given by the variance ofĤ onρ: However, in practice (especially in the NISQ era) arbitrary rotations into an unknown basis may be costly. As expectation values are linear, given a decomposition ofĤ over {B j }, one may instead perform M j different preparations ofρ and measurements in the basis of eachB j and sum the result, with variance (13) Typically the V j are not known in advance, but ifB j are chosen to be unital (by a suitable rescaling of b j ) then one can bound V j < 1. One can confirm (e.g. via the use of Lagrange multipliers [33]) that the least number of measurements required to bound Var[ Ĥ ] below some error 2 can be found by choosing M j = M |b j |/λ B , and that this yields a total number of measurements first mentioned by Ref. [32]. State tomography is essential in hybrid quantum-classical algorithms such as a VQE. In the VQE, the Hamiltonian is usually expressed as a linear combination of Pauli operators, meaning λ B = λ Q . As the number of measurements is one of the biggest overheads in the practical implementation of VQE, lowering the 1-norm would be highly beneficial. One way to do so is to add constraints to the Hamiltonian based on the fermionic n-representability conditions, which has led to a reduction of up to one order of magnitude for hydrogen chains and the H 4 ring molecule [33]. Another hybrid algorithm that would benefit for a decrease in the 1-norm is the constrained-VQE algorithm, as this would decrease the higher-bound used in the penalty term, thus improving convergence [59]. As the above considerations imply, many of the most competitive algorithms have a strong dependence on 1norms for various choices of the basis decomposition {B j }, as shown in Tab. I. The 1-norm induced by a Pauli decomposition, λ Q (Eq. 8), scales with the number of orbitals somewhere in between O(N ) and O(N 3 ). This depends on the system, and whether N increases because the number of atoms increases for a fixed number of basis functions per atom, or because the number of basis functions per atom increases while fixing the number of atoms. As far as we are aware of, this scaling has only been numerically benchmarked for the H 4 ring Hamiltonian [33,52] and one-dimensional hydrogen chains [52], which remain quite far from realistic chemistry problems encountered in enzymatic [1] and catalytic reactions [4]. In this work, we perform a similar benchmark on a larger set of organic and inorganic molecules, from hydrogen and carbon chains to the FeMoco and ruthenium complexes.
The 1-norm used in many of the aforementioned algorithms in Tab. I takes the following form [50,52] and with λ V λ T . Note that the difference between Eqs. (16) and (17) 1), and the fact that N refers to spatial orbitals in this work instead of spin-orbitals. This norm of the Hamiltonian expressed as a linear combination of unitaries is used in the database method of Babbush et al. [55], the qubitized simulation by the sparse method of Berry et al. [50] (further improved by Lee and coworkers [52]) and the qDRIFT protocol of Campbell [46].
Further attempts have been made to reduce the number of terms in the Hamiltonian. For instance, one may perform a low rank decomposition of the Coulomb opera-torV such that the 1-norm of the -now singly-factorised -Hamiltonian reads [50,52] where W ( ) pq are obtained from a Cholesky decomposition of the g pqrs tensor. Note that λ SF is a higher-bound of λ V , and so this decomposition tends to increase the 1norm. To improve over this low-rank factorization, one can write the Hamiltonian in a doubly-factorized form [4] by rotating the single-particle basis to the eigenbasis of the Cholesky vectors, such that the corresponding 1norm λ DF is much lower than after a single factorization. One can also use the tensor hypercontraction representation of the Hamiltonian [52]. However, applying the qubitization algorithm on this Hamiltonian directly is not efficient, as its associated 1-norm λ THC is even larger than λ V . To bypass this issue, Lee et al. provided a diagonal form of the Coulomb operator by projection into an expanded non-orthogonal single-particle basis, thus leading to a non-orthogonal THC representation of the Hamiltonian with an associated 1-norm λ ζ that scales better than any prior algorithm [52].
As readily seen in this section, the λ norm depends on how you represent your Hamiltonian. However, there is a unique way to write the Hamiltonian as a sum of unique Pauli strings, or equivalently as a sum of unique products of Majorana operators (see Appendix C). Doing so allows to express the qubit 1-norm λ Q in Eq. (8) as a function of the electronic integrals, where The first term λ C corresponds to the absolute value of the coefficient of the identity term that emerges when rearranging the Hamiltonian in terms of unique majorana operators, see Eq. (C8). It is invariant under orbital rotations and can be added to the energy as a classical constant together with the nuclear repulsion energy and, if one employs the frozen core approximation (see Appendix A), the mean-field energy of the frozen core. Thus, this term (apart from being small compared to λ T and λ V ) will not be important for quantum algorithms, and we will leave it out of our results, redefining the 1norm as: λ Q = λ T + λ V . λ T represents the absolute values of the coefficients of the quadratic term in Majorana operators, and λ V of the quartic term. Notice that λ V is slightly different from λ V of Ref. 52 also given in Eq. (17) (which added a slight correction to the one of Ref. 50).
In this work, we add another correction that comes from the swapping of two Majorana operators. We confirm that this is the correct form by directly comparing λ Q with the norm of the qubit Hamiltonian after doing a qubit-to-fermion transformation in Eq. (8). Note that λ V ≤ λ V (see appendix C for more information). By expressing the 1-norm of the qubit Hamiltonian in Eq. (5) with respect to the electronic integrals, we can compute its value before doing any fermion-to-qubit transformation, which can be costly for large systems.

C. Localized orbitals
As discussed previously, different approaches have already been considered to minimize the 1-norm of a given HamiltonianĤ. In this work, we choose to tackle the problem from a chemistry point of view, by focusing on the use of orbital transformations (i.e. single-particle states rotations). More precisely, we investigate the use of orbital localization techniques as a classical preoptimization method to express the electronic structure HamiltonianĤ in a new basis (see Appendix B for details about electronic integrals transformations) that presents natural advantages for quantum computing. Note that exploiting spatial locality to reduce the 1-norm has already been mentioned in Ref. 52, or to reduce the number of significant integrals in Ref. 54.
In computational chemistry, localization schemes represent state-of-the-art orbital-rotation techniques employed in various situations. For example, localized or-bitals (LOs) are regularly used to alleviate the computational cost of numerical simulations in post-HF methods such as second order Møller Plesset [60][61][62][63], coupled cluster [64][65][66][67], and multireference methods [68][69][70][71]. They can also be used to partition a system in spatially localized subsystems that are treated at different levels of theory [72][73][74][75][76][77][78][79][80][81][82][83][84]. In the context of quantum algorithms for the NISQ era, one may for instance consider performing a calculation with a classical mean-field method to produce localized orbitals and using the orbitals localized in the spatial region of interest as a basis for a calculation with a quantum algorithm. In the current work, we demonstrate that LOs can also be of a significant help beyond isolating the spatial region that is of chemical interest, by reducing the qubit 1-norm λ Q of the electronic HamiltonianĤ after a fermion-to-qubit transformation.
In the following, we introduce the orbital localization schemes considered in this work where the notations ψ p (r) and χ µ (r) are used to denote orthogonal LOs and non-orthogonal atomic orbitals (AO), respectively.

Lowdin orthogonal atomic orbital
The first approach we investigate to generate LOs is the orthogonalization of atomic orbitals method (OAO). In practice, several techniques exist to produce orthogonal AOs [85,86]. Here, we focus on Löwdin's method [87,88], known to generate orthogonal LOs with a shape that is the closest to the original AOs (in the least square sense). In practice, orthogonal Löwdin orbitalsψ p (r) are built via a linear combination of the N original AOs,ψ where the orbital coefficient matrixC takes a very simple form likeC and S µν = drχ µ (r)χ ν (r) is the overlap matrix encoding the overlap between different non-orthogonal AOs. From a practical point of view, this method represents one of the simplest and numerically cheapest localization methods, where the computational cost is dominated by the exponentiation of the overlap matrix and typically scales as O(N 3 ). However, this approach is defined with respect to the full AOs' space and cannot be straightforwardly applied to the practical case of active space calculations where only a restricted set of molecular orbitals -formed by linear combination of AOs -is used.

Molecular orbital localization schemes
Fortunately, other localization schemes can be used in any circumstances (i.e. with or without active space approximation). Among the possible approaches, we focus on three of them: the Pipek-Mezey [89] (PM), Foster-Boys [90] (FB) and Edmiston-Ruedenberg [91] (ER) methods. From a practical point of view, these methods fundamentally differ from OAO as they generate LOs out from molecular orbitals and not AOs. In practice, all these methods start from a set of orthogonal canonical MOs (CMOs) {φ s } N s=1 obtained with an initial mean-field calculation (e.g. Hartree-Fock or density functional theory). The CMOs form linear combinations of AOs as with C the CMO-coefficient matrix. A set of LOs {ψ p } N p=1 is then generated by applying a unitary transformation matrix U (with U † U = UU † = 1) to express each LO as a linear combination of the original CMOs, or, in a more compact matrix form, whereC is the LO-coefficient matrix. In practice, the shape of the LOs is numerically determined by modifying the unitary U to optimize a cost function L based on a relevant localization criterion (depending on the scheme considered). This localization can be prone to many local minima and the orbital-optimization process is usually realized with different numerical techniques (e.g. Jacobi rotations [92,93], gradient descent [94], Newton and trust-region methods [95][96][97], etc.). In chemistry, the question of how to realize an efficient orbitaloptimization process in practice still constitutes an active topic of research which goes way beyond the scope of our manuscript (we refer the interested reader to Ref. 98 and references within). Let us now focus on the different criteria used in the FB, PM and ER methods together with their respective numerical costs.
In the Foster-Boys scheme [90], the localization criterion is the square of the distance separating two electrons r 2 12 = |r 2 − r 1 | 2 , and the set of LOs is obtained by minimizing the following cost function which represents the average value of the distance criterion over the set of orbitals to be optimized. In practice, a single estimation of the cost function L FB requires a decomposition in the AO basis that generates five nested sums, thus leading to a scaling of O(N 5 ). However, specific manipulations (see Refs. 90 and 98) can reduce the total computational cost of the Foster-Boys method to O(N 3 ) multiplied by the number of times L FB is called (which intrinsically depends on the optimization algorithm considered in practice). Note that extensions of the Foster-Boys scheme exist and are based on the orbital variance [99] or the fourth moment method [100], resulting in more localized orbitals especially for basis sets augmented by diffuse functions.
In the case of the Edmiston-Ruedenberg method, the inverse distance 1/r 12 (proportional to the two-body electronic repulsion operator, see Eq. (3)) is used as a criterion, and the LOs are obtained by maximizing The where is the contribution of orbital p to the Mulliken charge of atom A. In this case, the numerical cost of one call of L PM essentially depends on the triple sums over the molecular and atomic orbitals in Eqs. (30) and (31). As a results, the computational cost of the global PM method scales as O(N 3 ) (multiplied by the number of times L PM is called). Note that the traditional PM method is illdefined since Mulliken charges are basis set sensitive and do not have a basis set limit. Various partial charges estimates can be used instead of Mulliken charges [101], such as intrinsic orbitals that are basis set insensitive and lead to a cheaper and better behaved localization procedure than PM localization [102,103]. In summary, the construction of OAOs is the cheapest approach but can't be used when considering active space. Somewhat more expensive are PM and FB which share an equivalent scaling, and finally ER which is the most computationally demanding method.

D. 1-Norm orbital-optimization
One of the main contribution of this paper is the use of an orbital-optimization (OO) process specifically dedicated to the minimization of the qubit 1-norm λ Q . To proceed, we introduce a unitary operator with an anti-hermitian generator This operator is then used to transform a reference MO basis into a new basis denoted by {φ q } with C = CU OO , for which the optimal orbitals are obtained by minimizing the following cost function, by varying the N (N − 1)/2 off-diagonal independent elements of the matrix K. This optimization can be carried out for a reference MO-basis that consists of canonical Hartree-Fock orbitals, but could equally well be carried out on a subspace of the full MO-basis, e.g. only the set of fractionally occupied orbitals in an active space type calculation, or only a subset of localized orbitals resulting from a preliminary localization using one of the methods discussed above.
In practice, to realize the OO of the 1-norm, we choose to use 'brute-force' optimization algorithms which autonomously estimate the local derivatives of the cost function (i.e. gradients and hessians). This choice is motivated by the expression of λ Q in Eq. (19) that contains many absolute values and for which analytical derivatives are clearly non-trivial to estimate. Numerically, the main bottleneck of the OO method is linked to the repetitive transformation of the two-electron integrals realized at each step of the process. As a result, the core algorithm scales as O(N 5 ) (similarly to the ER localization scheme). However, with the computational effort deployed to estimate the numerical gradient and hessian of the complicated cost function (given in Eq. (34)), this scaling should increase more especially when treating large systems.

III. COMPUTATIONAL DETAILS
The geometries of the small systems considered in this work were optimized using the ADF program of the Amsterdam Modeling Suite (AMS) [104] with a quick universal force-field (UFF) optimization, sufficient for our purposes. These geometries are provided in the Supplementary Material [105]. The first geometry in Ref. 1 with a charge of +3 is used for FeMoco and the geometries in Ref. 4 (all with a charge of +1) were used for the Ruthenium metal complexes. The electronic integrals of the Hamiltonian were computed using the restricted Hartree-Fock method from the PySCF package [106]. For large molecules, the frozen-core approximation was invoked according to Appendix A. All the localization schemes used to transform the Hamiltonian were already implemented in the PySCF package. For our 1-norm orbital-optimization scheme, the SLSQP optimizer was used from the SciPy package [107] and the python version of the L-BFGS-B optimizer of the optim-Parallel package [108]. The choice of these optimizers has been motivated by their capacity to automatically approximate gradients (and Hessians) for minimization. This represents an interesting tool when no evident analytical gradient/Hessian can be determined as for the 1norm. A fast algorithm to estimate this λ Q [Eq. (19)] was implemented in the OpenFermion package [109], allowing users to calculate 1-norms of large molecular Hamiltonians without employing expensive fermion-to-qubit mappings.

IV. RESULTS
In this section, we study the scaling of λ Q with respect to the number of orbitals. First, we fix the basis set and increase the number of atoms, in the spirit of Refs. 33 and 52. Concerning the orbital localization schemes, we allow rotations between the active occupied and virtual spaces as this can lead to a better localized orbital basis.
Then, we benchmark the value of λ Q for several different chemical systems and active space sizes, using the different localization schemes and our 1-norm orbitaloptimized scheme. When considering an active space, we always choose our active space based on the CMOs by considering the N act orbitals around the Fermi level (i.e. around the highest occupied and lowest unoccupied molecular orbitals). We localize only inside the active space (i.e. the localizing unitary in Eq. (26) only has indices corresponding to active orbitals) such that the subspace spanned by the active space remains invariant under the unitary rotations. Although this means that the expectation values of observables remain the same inside the active space when an exact solver is used, one can converge to different expectation values when approximate solvers are considered such as the truncated unitary coupled cluster ansatz (that becomes exact if not truncated [110]). However, rotating from the CMO to the LMO basis will not necessarily deteriorate the results of the (approximate) simulation, as LMOs have significant importance in local correlation treatments in post-HF methods like second order Møller Plesset [60][61][62][63], coupled cluster [64][65][66][67], embedding approaches [72][73][74][75][76][77][78][79][80][81][82][83][84], and multireference methods [68][69][70][71] on classical computers. Hence, using localized orbitals could lead to similar advantages in quantum computing simulations, and could also inspire new ansatz proposals based on embedding strategies. This is left for future work.
Finally, to study the scaling of λ Q for large molecules, we increase the size of the active space N act while fixing the basis set and the number of atoms.  Fig. 1.

A. Hydrogen and alkane chains: scaling of the 1-norm by increasing the number of atoms
In this section, we investigate the performance of the localization schemes together with our brute-force OO method, studying the scaling of λ Q with respect to the number of orbitals by increasing the number of atoms in small systems. Inspired by Refs. 54 and 33, we consider linear chains of hydrogens with a spacing of r = 1.4 A and linear alkane chains (for which the geometry has been optimized). Results are shown in Fig. 1 and Tab. II.
The orbital-optimizer method is not included for the linear hydrogen chains, because the gradient-estimating optimizer has trouble finding the approximate gradient in the first step of the optimization. The reason for this is that the localization schemes already come so close to the optimal solution that the optimizer estimates the gradient to be zero or positive in each direction. The STO-3G basis results in the localized orbitals resembling just 1s atomic orbitals on every nucleus which are very localized and therefore have a minimal 1-norm. As linear hydrogen chains are very artificial systems and give limited information on the scaling of λ Q for actual interesting chemical systems, we thought the results of the conventional localization schemes to be sufficient here.
As readily seen, all localization schemes perform comparably well in reducing the 1-norm compared to canonical molecular orbitals. Indeed, the scaling of the 1-norm decreases significantly from O(N 2.31 ) to O(N 1.34 ) for the hydrogen chains, and from O(N 2.21 ) to O(N 1.38 ) for the alkane chains. In terms of concrete values, for the largest N studied in Fig. 1, we can see a reduction of λ Q by more than a factor 13 for the hydrogen and 5 for the alkane chains, when passing from the CMO basis to the LO ones. In the alkane chains, we do not see any significant distinction between the different schemes. While α for the orbital-optimization scheme is slightly higher than the best localization scheme, it is not so informative here as the λ Q associated to the OO has the lowest absolute value for any orbital space. In the hydrogen chains, it seems that the Pipek-Mezey scheme is slightly less ef-  ficient. Surprisingly, the atomic orbital based Löwdin orthogonalization method do not provide any further improvement over other localized orbitals, which might be due to the simplicity of the systems for which localization schemes can generate extremely localized orbitals.

B. Benchmarking λQ for a variety of molecules and active spaces
In order to benchmark the impact of the localization schemes on the 1-norm, we calculated λ Q for a variety of molecules and active spaces. Whenever we consider the full space of orbitals, we include the value of λ Q obtained from the Hamiltonian expressed in the OAO basis. When an active space is considered, we use the frozen-core approximation and we localize the orbitals inside the active space. The resulting λ Q values are tabulated in Tab III, where the lowest ones obtained from standard localization schemes for each system are in bold. Apart from the different localization schemes, results of our brute force optimizer are also shown. Note that we start the optimization with already localized orbitals, such that it always gives a lower 1-norm than the best localization scheme. On most molecules, we had the best experience with the L-BFGS-B optimizer. Unfortunately, this optimizer was not able to estimate the gradient of λ Q w.r.t. U OO on the smallest molecules (H 2 and LiH), which may be due to infinitesimal changes in λ Q when changing U OO . This why we employed the SLSQP optimizer here.
In order to identify the origin of the significant reduction of the 1-norm and its scaling, we focus on the transformation of the two-electron integrals defined in Eqs.
for the cases (5) and (7), respectively. These contributions can give us some insight as to how the localized orbitals affect the norm of the diagonal and off-diagonal parts of the tensor, and are represented in Fig. 2 for the second stable intermediate Ruthenium complex in different orbital bases. As readily seen in this diagram, localization schemes do not lead to a uniform reduction of the norm of every term, but tends to maximize some of them (the pppp term), minimize others (the pqrs term) and leaves others relatively unchanged (the ppqq term). However, the pqrs contribution clearly dominates in the CMO basis, being more than one-order of magnitude larger than any of the other terms. Especially the orbitaloptimizer can give an even better reduction of this term. This explains why the localization schemes can be used to reduce the 1-norm, as reducing the pqrs norm by one order of magnitude (as seen in Fig. 2) will play a much more important role than increasing the pppp one by the same order of magnitude. As the magnitude of these integrals depends on the overlap densities ψ p (r)ψ q (r) and ψ r (r)ψ s (r), it is clear that localization will reduce the number of numerically significant contributors. For an extended system, integrals in which p and q as well as r and s are localized on the same atoms are expected to dominate. This explains the observation that the pqrs norm becomes comparable to the ppqq and pprs terms when employing localized orbitals.
C. Effect of increasing the size of the active space on λQ Let us now investigate how λ Q scales with respect to the number of active orbitals for large and complex systems. We selected test-molecules based on two important recent papers in the field of quantum computing for quantum chemistry: FeMoco -an iron-molybdenum complex that constitutes the active site of a MoFe protein which plays an important role in nitrogen fixation [1] -and all the stable intermediates and transition states of the catalytic cycle presented in Ref. 4. This catalytic cycle describes the binding and transformations of carbon dioxide CO 2 , a molecule with infrared absorption properties that makes it a potent and the most important greenhouse gas.
For the calculations to remain doable in a reasonable amount of time, the ANO-RCC minimal basis (with scalar relativistic corrections) is used. This should be sufficient for the current purpose of studying 1-norm reductions within an active space, but we note that larger basis sets will be required to reach chemical accuracy [111]. The scaling of λ Q with respect to the number of active orbitals N act for different orbital bases are reported in details in Fig. 4. Comparing the scaling reported in this section and the ones obtained in Sec. IV A, we see that the scaling is larger when increasing the number of active orbitals than when increasing the number of atoms, as expected. However, the impact of localized orbitals on the 1-norm is similar in both cases, as it reduces the scaling by almost one order of magnitude. We also do not  N ), where ne is the number of electrons in N the number of spatial orbitals. The lowest 1-norm obtained from standard localization schemes for each system are in bold. Superscripts (a) and (b) refer to the use of the SLSQP and the L-BFGS-B optimizer used for in our 1-norm orbital-optimization scheme (denoted as "Optimizer" in the table), respectively. The rightmost column shows the percentage of reduction obtained from the 1-norm-optimized orbitals compared to the CMOs.
see any significant distinction between the 1-norm associated to the different conventional localization schemes. The brute-force orbital-optimizer however, gives a consistent 10-25% improvement over the conventional localization schemes. In the largest active spaces considered (N act ≥ 85) the orbital-opimization scheme can converge very slowly due to the need of 4-index transformations at every step, combined with the non-continuous and complicated landscape of λ Q . Because of this, the user is usually forced to cut off the optimizer at some point, or choose a high threshold of convergence. In spite of this, we got similar good results in the largest active spaces we considered, indicating that the optimizer is not limited to a specific active space size. The scaling results for all large molecules are summarized in Table IV. The scaling lies in the order of O(N 2.6 ) to O(N 2.9 ) when using CMOs, while it lies around O(N 2 ) or slightly higher when using LOs. One gets a bit of a misleading picture looking at just the scaling of the orbital-optimizer, as it has the lowest absolute value of λ Q at every point. The cause is that the cost function of the orbital-optimizer has more local minima and is harder to converge compared to conventional localization schemes.
For the localization schemes it is usual that the bigger the active space is, the more λ Q can be reduced compared to CMOs. This can be rationalized as follows. When the active space increases, it may involve orbitals localized on atoms which have negligible overlap with the other orbitals. Another possibility is that the additional orbitals may have a positive effect on the localization procedure. As an illustration, see the example shown in Fig. 3 where we consider an occupied π-bonding orbital mixing with its associated virtual π * -antibonding orbital for the formaldimine molecule. By rotating those two orbitals via a Jacobi rotation, we move from a maximal value of λ Q for θ = 0 (corresponding to the original CMOs) to a minimal value for θ = π/4 (corresponding to the new LOs). This way, we see that the larger is the active space the more degrees of freedom we have to perform a better orbital localization. For the biggest active space chosen, an active space of 100 electrons in 100 orbitals, λ Q is reduced by an order of magnitude. This illustrates the potential of using widely available and computationally cheap classical pre-optimizations, together with our dedicated orbital-optimizer, to reduce the cost of quantum simulations.

V. CONCLUSIONS
Numerically we have seen that exploiting the locality of the basis gives rise to a lower variance in the Hamiltonian coefficients, reducing λ Q . This results in a significant reduction of the absolute values of the integrals. The off-diagonal elements of the two-electron tensor play the  Table IV. Scaling of λQ with respect to the active space size Nact for different orbital bases, assuming log λQ = α log Nact + β with the associated R 2 value. The minimal ANO-RCC basis-set is considered.   Table IV. biggest role here.
To find an even better basis-rotation that minimizes λ Q , one can use a "brute-force" orbital optimizer as we have done in this work. This method could be realized in practice only because we have a efficient way of computing λ Q directly in terms of the molecular integrals [Eq. (19)], avoiding doing a fermion-to-qubit mapping explicitly. Expressing it as λ Q = λ T + λ V , this improves a bit over the equation of λ V in Ref. 52, holding into account the representation of the Hamiltonian in terms of unique Pauli strings. The direct orbital optimizer can be used in quite large active spaces, reliably converging for active spaces with up to N act ≈ 80 orbitals, sometimes more. For the largest active spaces considered, it converges very slowly due to the need of 4-index transformations at every step which are costly because of the large dimensionality of the system, combined with the non-continuous and complicated landscape of λ Q . This gives rise to the need to either set a high threshold of convergence or cut off the optimizer at some point where the user is satisfied with the result. While this is problematic, there is no a priori reason not to try this optimizer on the (potentially large) active spaces of the molecule one is considering. This can be a point of further study.
We benchmarked a range of various molecules and active spaces for which we showed to achieve a significant reduction of λ Q . Apart from this benchmark, we investigated the scaling of λ Q with the size of the system N . There are multiple paths one can take here to increase N . A popular way to do this in the literature is considering either a hydrogen chain or hydrogen ring, and increase the amount of atoms. We benchmarked the scaling of λ Q for a hydrogen chain and a linear alkane chain by increasing the number of atoms as well, where our results show a scaling of O(N 2.3 ) and O(N 2.2 ), respectively. Here we saw localized orbitals can give approximately a factor of N lower scaling of O(N 1.3 ) and O(N 1.4 ), respectively. As this gives one limited information how big λ Q will be in real eventual applications of quantum algorithms, we decided to investigate the scaling on relevant highly correlated molecules such as FeMoco (important in nitrogen fixation) and Ruthenium metal complexes (important in carbon dioxide capture). Even though we used a minimal basis to make the scaling calculations feasible on these molecules, there is no reason to believe λ Q will scale differently with respect to the active space size on a bigger basis-set. Here we showed that the scaling is a factor of N larger, and with localized orbitals can be brought down to λ Q ≈ O(N 2 ), depending on the molecule in consideration. Our dedicated orbital-optimizer was able to consistently give an even further improvement of 10-25% on these molecules. As a concrete example, consider the largest active space considered of FeMoco: this factor of N difference would result in a reduction by two orders of magnitude in the amount of measurements needed for tomography [Eq. (14)], if one wants chemical accuracy. This indicates that the simple and efficient classical preprocessing by widely available localization techniques, together with our dedicated optimizer, will help to make simulations with large active spaces feasible.

ACKNOWLEDGMENTS
We sincerely thank Susi Lethola for his helpful comments on localization schemes. SY and BS acknowledge support from the Netherlands Organization for Scientific Research (NWO/OCW). EK acknowledges support from Shell Global Solutions BV.

Appendix A: Frozen core Hamiltonian
Applying the frozen core approximation to the electronic structure Hamiltonian consists in assuming the existence of a set of frozen orbitals (always occupied), another set of active orbitals (belonging to an active space), and a set of virtual orbitals (always unoccupied). Based on this partitioning, every Slater determinant |Φ used to describe properties of the system take the form where the left contribution Φ frozen is the part of the determinant encoding the frozen orbitals of the system (always occupied) whereas Φ active is a part encoding the occupancy of the remaining electrons in the active orbitals of the system. In this context, if one considers that every correlated electronic wavefunction is always expanded with Slater determinants following Eq. (A1), one can demonstrate by projections that the system Hamiltonian takes an effective form withĤ FC the so-called "frozen core Hamiltonian" defined as follows,Ĥ FC =Ĥ active + E MF frozen +V.
Here,Ĥ active is the Hamiltonian encoding the one-and two-body terms only acting in the active space, where t, u, v, w denote active space orbitals. The second term E MF frozen is a scalar representing the mean-field-like energy obtained from the frozen orbitals,

Appendix B: Electronic integrals transformation
Preparing the electronic Hamiltonian in a given orthogonal orbital basis {φ p (r)} N p=1 is a crucial step for realizing concrete quantum computing applications. In practice, such a process can be realized classically via electronic integral transformations. For this, we assume that the orthogonal orbitals can be expressed as a linear combination of AOs such that where we assume that the functions χ µ (r) as well as the coefficient matrix C are real-valued, as is usually done in non-relativistic quantum chemistry. Based on our knowledge of this matrix C, one can transform the one-and two-electron integrals from the AO basis to the orthogonal orbital basis. To do so, one starts from the oneand two-electron integrals in the AO basis, respectively h µν and g µνγδ , and implements the following two-and four-indexes transformations: and g pqrs = µνγδ g µνγδ C µp C νq C γr C δs .
We now want to determine the value of λ Q after a fermion-to-qubit mapping of this Hamiltonian in which all products of Majorana operators should be distinct. We use a fermion-to-qubit mapping such that single Majorana operators are mapped to single unique Pauli strings, such as the Jordan-Wigner transformation [51]: where we used the composite index i = pσ. Note this is not only true for the Jordan-Wigner transformation, but also for for example the Bravyi-Kitaev transformation [114]. The only thing left to do is to make sure that the sum over 4 indices used to evaluate λ Q is indeed truly quartic. Looking at Eq. (C8), this is not yet the case. If pσ = rτ or qσ = sτ the corresponding Majorana operators will square to identity and will reduce to quadratic and identity terms. Below we employ the permutational symmetry of the real-valued integrals g pqrs to obtain the most compact representation.
We first reorder the Majorana's and distinguish be-tween the same and opposite spin cases: where we applied the identity relationγ 2 i = I and for the second term used that integrals are symmetric in exchanging p and r while the product of Majorana operators is antisymmetric under this exchange. This nullifies this and the third term. The first term can be absorbed in the scalar term. For the truly quartic term we may make further use of permutational symmetries to get the most compact form: 1 8 p =r,q =s σ g pqrsγpσ,0γrσ,0γqσ,1γsσ,1 = 1 8 p>r,s>q σ ( g pqrsγpσ,0γrσ,0γqσ,1γsσ,1 +g rqpsγrσ,0γpσ,0γqσ,1γsσ,1 +g psrqγpσ,0γrσ,0γsσ,1γqσ,1 +g rspqγrσ,0γpσ,0γsσ,1γqσ,1 ) = 1 4 p>r,s>q σ (g pqrs − g psrq )γ pσ,0γrσ,0γqσ,1γsσ,1 (C12) Combining all terms we end up with the expression: We then take the sum of absolute values of the coefficients and perform the sums over spin in the quartic term explicitly (amounting to a factor of 2), to get the following form of λ Q : where λ C corresponds to the constant term in Eq. (C13), λ T to the quadratic and λ V to the quartic term in Majorana operators. They have the form: This λ V is slightly different to the λ V of Ref. 52 in calculating the λ for the "sparse" algorithm of Berry et al. [50], which should actually be the same as λ Q . This is due that we took into account the swapping of majorana operators in Eq. (C12). In Ref. 52, this corresponds to the termV = 1 is hard to see here that the productQ pqαQrsα is antisymmetric in swapping p, r and q, s, but it becomes clear when one realizes thatQ pqα = iγ pσ,0γqσ,1 , indicating the usefulness of majorana operators. As the absolute values give that: |g pqrs − g psrq | ≤ |g pqrs | + |g psrq | ∀ p, q, r, s (C18) such that λ V < λ V (except when g pqrs and g psrq always have opposite sign, in which case they would be equal).