Adaptive construction of shallower quantum circuits with quantum spin projection for fermionic systems

Quantum computing is a promising approach to harnessing strong correlation in molecular systems; however, current devices only allow for hybrid quantum-classical algorithms with a shallow circuit depth, such as the variational quantum eigensolver (VQE). In this study, we report the importance of the Hamiltonian symmetry in constructing VQE circuits adaptively. This treatment often violates symmetry, thereby deteriorating the convergence of fidelity to the exact solution, and ultimately resulting in deeper circuits. We demonstrate that symmetry-projection can provide a simple yet effective solution to this problem, by keeping the quantum state in the correct symmetry space, to reduce the overall gate operations. The scheme also reveals the significance of preserving symmetry in computing molecular properties, as demonstrated in our illustrative calculations.

Quantum computing is a promising approach to harnessing strong correlation in molecular systems; however, current devices only allow for hybrid quantum-classical algorithms with a shallow circuit depth, such as the variational quantum eigensolver (VQE). In this study, we report the importance of the Hamiltonian symmetry in constructing VQE circuits adaptively. This treatment often violates symmetry, thereby deteriorating the convergence of fidelity to the exact solution, and ultimately resulting in deeper circuits. We demonstrate that symmetry-projection can provide a simple yet effective solution to this problem, by keeping the quantum state in the correct symmetry space, to reduce the overall gate operations. The scheme also reveals the significance of preserving symmetry in computing molecular properties, as demonstrated in our illustrative calculations.

I. INTRODUCTION
Recent advances in quantum technology have attracted widespread attention in various disciplines. Electronic structure theory is one of the fields expected to potentially benefit from the development of quantum computers because strong correlation can be handled by mapping the wave function directly onto entangled qubits without an exponential increase in computational cost. However, current noisy intermediate-scale quantum (NISQ) devices cannot implement error correction owing to the limited quantum resources; hence, they suffer from errors triggered by noise and short coherent times, 1-3 which limit the application of general quantum algorithms such as quantum phase estimation. Therefore, several hybrid quantum-classical algorithms have been proposed to address the technical challenges emerging in NISQ hardware. The variational quantum eigensolver (VQE) is one such algorithm, 4,5 in which a quantum state is propagated by a parametrized short quantum circuit that can be optimized in classical post processing, to variationally lower the energy expectation value and obtain the Hamiltonian ground state.
In quantum chemistry, the most conventional ansatz for VQE circuit is the unitary coupled-cluster (UCC), [6][7][8][9][10] where fermionic excitation operators with respective to the Fermi vacuum (e.g. Hartree-Fock) are antihermitized and exponentiated to form unitary gates. Given the remarkable success of standard CC in a wide range of classical applications, UCC with single and double substitutions (UCCSD) is expected to be a feasible starting point for chemical Hamiltonians. Nevertheless, UCCSD is manifestly incapable of describing strong correlations, 11 which is the primary target of quantum computing. Extending the excitation manifold to include a) Electronic mail: tsuchimochi@gmail.com general orbitals, generalized UCCSD captures higher excitation effects in a compact manner, and is thus known to often provide accurate results for strongly correlated systems. 11,12 However, these UCC-based methods are inhibited by a limited representability because of the fixed structure of unitary gates; in addition, they also require several parameters and deep circuits, which makes them difficult for NISQ devices to handle.
Ansatz-free algorithms are more flexible and can control the accuracy and circuit depth in a highly blackbox manner. [13][14][15][16][17] A prominent algorithm in this context is ADAPT-VQE proposed by Grimsley et al., 13 which adaptively adds a set of short unitary gates to the quantum circuit successively. Each unitary is chosen from an operator pool comprising generalized-UCCSD excitation operators, based on the VQE energy gradient. Later, Tang et al. proposed qubit-ADAPT 15 , which employs Pauli rotations instead of a UCC-like operator to drastically reduce the gate depth. Recently, Yordanov et al. demonstrated that "qubit-excitations," each of which is a linear combination of Pauli strings in qubit-ADAPT, can be implemented with only a handful of CNOT gates. 18 Based on this result, they employed qubit-excitations to form an operator pool in what they called qubit-excitation based (QEB) ADAPT. 17 However, these ADAPT-based methods break important symmetries of chemical Hamiltonians, of which the most significant is spin-symmetry.
It is well known in the quantum chemistry community that breaking spin-symmetry triggers several undesired consequences, which include inaccurate energies and properties, a slow convergence to the exact state, and a difficulty in interpreting obtained results. [19][20][21] ADAPTbased methods result in spin-symmetry breakages because of the use of fermionic or qubit excitations, for which spin-adaptation is difficult when exponentiated. This is more the rule than the exception. In fact, in strongly correlated systems, spin-symmetry is completely broken because ADAPT typically requires large rotation parameters to represent the multi-configuration character. As will be demonstrated, this causes the slow convergence of ADAPT algorithms and inaccurate results with truncated operators.
To address this issue, this study employs a spinprojection circuit, 22 which restores the total spin quantum number to restrict ADAPT and explore the operator candidate in the correct symmetry space. We demonstrate that our algorithm can generally reduce the circuit depth, although it increases the measurement cost. In particular, we show that the spin-projected ADAPT outperforms UCCSD as a better tradeoff between accuracy and circuit depth, which is not always guaranteed in ADAPT. To investigate the importance of spin-symmetry in molecular property calculations, we further derive the first-order energy derivative in the presence of the spinprojection operator.
The remainder of this paper is organized as follows. In Sections II A and II B, we review the original algorithms for ADAPT-VQE, and describe how they break symmetries in Section II C. Section II D presents detailed discussions on the application of spin-projection to each ADAPT scheme, including an explanation on how the simulated qubits can be tapered off in the presence of spin-projection. Subsequently, we derive the energy derivative for ADAPT in Section II E, especially focusing on the importance of orbital response. In Section III B, we demonstrate that the convergence in ADAPT becomes increasingly slower for strongly correlated systems, and in Section III C, we present our benchmark results on the spin-projected ADAPT, comparing the accuracy and gate efficiency between different methods. Section III D demonstrates how the number of operators in a pool can be reduced in QEB ADAPT. The calculations for dipole moment and geometry are explained in Section III E. Finally, we conclude this work in Section IV.

II. THEORY
A. Fermionic ADAPT-VQE First, we review the ADAPT-VQE algorithm originally proposed by Grimsley et al. 13 This algorithm is based on the observation that the exact full configuration interaction (FCI) state can be reached from the Hartree-Fock (HF) state using the exponential form, whereτ (k) is the kth instance of arbitrarily ordered operators comprising the anti-symmetrized single and double excitations,τ Here, we have adopted p > q, r > s to denote general spin-orbitals. Note that the same anti-symmetrized excitation operators can appear repeatedly in Eq. 1, but with different variational parameters t k . Although Eq. (1) includes only singles and doubles, the effect of higher excitations such as triples and quadruples can be conveniently considered by their exponential products.
Regardless of its exactness, Eq. 1 is extremely cumbersome and long because it requires a large number of k; hence, its realization is prohibitively difficult on quantum computers. On the one hand, from a circuit complexity perspective, such as the number of CNOT gates, one can employ only a limited number of unitaries e t kτ (k) on NISQ computers. On the other hand, it is expected that such a truncated ansatz with a reasonable number n of unitaries can still outperform fixed ansätze such as UCCSD, especially for strongly correlated systems, with appropriately chosenτ (k) (k = 1, · · · , n) and variationally optimized t k . Therefore, the objective of ADAPT-VQE is to dynamically create an ansatz that approaches FCI, using a maximally compact sequence of n unitary operators, which are successively determined based on the energy gradient. To do so, an operator pool P is first defined, which comprises M operators that are used to construct the trial state: where we explicitly consider the spins α and β and each linear combination of operators is labeled by m = 1, · · · , M . Note that each combination of doubles does not comprise the spin-complement form, which we would address later in this paper. A trial state of ADAPT-VQE at the nth cycle is given by |ψ n = e θnÂn e θn−1Ân−1 · · · e θ1Â1 ψ HF (4) whereÂ k ∈ P, and the parameters {θ k } are variationally optimized to minimize the energy expectation value. To select the most contributing operator from the P for n + 1th cycle, we create trial states with eachÂ m ∈ P (m = 1, · · · , M ) as and calculate the energy gradient for each |ψ n+1,m around θ n+1 = 0, which can be evaluated with up to three-body reduced density matrix (3RDM) because of the commutator property. The operator with the largest gradient R (n) m is selected asÂ n+1 , with which the parameters {θ k : k = 1, · · · , n + 1} in |ψ n+1 are all optimized by the standard VQE.
The ADAPT algorithm presented above is converged when the norm of Eq. (6) is smaller than the threshold . It is worth noting that such a condition is closely related to the k-particle generalized Brillouin theorem. 23 Kutzelnigg demonstrated that an exact FCI wave function satisfies Ψ FCI | Ĥ ,τ µ |Ψ FCI = 0 for all excitation ranks. The convergence in ADAPT-VQE with = 0 only corresponds to the 1-and 2-particle generalized Brillouin conditions, and therefore, ADAPT-VQE is not guaranteed to converge to FCI. However, in practice, it approaches the FCI accuracy as ||R|| decreases. 24

B. Qubit-based ADAPT-VQE
The aforementioned Fermionic ADAPT is based on the Jordan-Wigner transformation for mapping fermionic excitation operators Eqs.
(2) to Pauli operators. This introduces the chain of Z operations because of the antisymmetric character of fermions, e.g., where we assume p > q > r > s, but similar relations can be obtained for other cases. The exponential ofτ pq rs is usually implemented by decomposing the eight Pauli strings presented in Eq. 7, each for which requires the ladders of CNOT gates to take into account the parities for t = q +1, · · · , p−1 and u = s+1, · · · , r −1; refer to Fig. 1 as an example. Therefore, such a naïve implementation of e θτ pq rs necessitates 8(p−q+r−s+2) ≥ 48 CNOTs. However, several studies [25][26][27] have demonstrated that even if all the Z operators from the Pauli string in Eq. (7) are discarded to define the qubit excitatioñ the accuracy obtained by VQE will be similar to that with the fullτ pq rs , although it eliminates a considerable number of CNOT gates. Tang et al. attempted to further decomposeτ pq rs and construct a reduced operator pool for what they called qubit-ADAPT-VQE,, noting that each Pauli string has a similar action. Using circuits depicted in Fig. 1(a), each operator e θmÂm requires only six CNOT gates (similarly, two CNOT gates for single excitation variants, Y p X q ). However, as will be discussed, this treatment significantly breaks several symmetries in the Hamiltonian and slows down the algorithm's convergence for chemical applications. Moreover, because it violates the number symmetry, it cannot be applied to ionized states or half-filled Hubbard models.
Recently, Yordanov et al. demonstrated that the role of e θτ pq rs is to rotate the bit strings between |0 p 0 q 1 r 1 s and |1 p 1 q 0 r 0 s by θ, while everything else is unaffected, and proposed the application of a controlled-Ry gate, as illustrated in Fig. 1(b), up to the phase. 18 This implementation requires only 13 CNOT gates, although the connectivity between pqrs is assumed. They have also established that a standard double excitation e θτ pq rs can be similarly implemented ( Fig. 1(c)), where the number of CNOT gates is reduced to 2(p − q + r − s) + 9. Using Eq. (8) and Fig. 1(b), they proposed qubit-excitationbased (QEB) ADAPT, which was shown to perform significantly better than qubit-ADAPT of Ref. [15] in terms of CNOT gate counts and number of parameters. 17

C. Symmetry-breaking in ADAPT-VQE
In the original proposal by Grimsley et al., it appears to be convenient to chooseτ pαqα rαsα +τ p β q β r β s β and τ pαq β rαs β +τ p β qα r β sα as the elements of the operator pool for fermionic ADAPT, because operators in each linear combination are symmetric in terms of spin. However, they are neither generally commutative (τ pαq β rαs β andτ p β qα r β sα do not commute if p = q or r = s) nor spin-free. This results in two limitations. First, because the operator selection is based on Eq. (6) while the VQE part (implicitly) introduces the Trotter approximation, the gradients between these two steps can be inconsistent, which may cause an occasional numerical instability . In our experience, the same operator can be chosen between consecutive ADAPT cycles, and the convergence to FCI can never be achieved in some cases. Second, the application of a spin-dependent operator basis sacrifices the conservation of spin s. Although a more recent study has employed the spin-free doubles generator, that is, the sum of these two combinations, 15 it necessitates an undesired Trotter decomposition, as demonstrated in our previous study. 22 Using such spin-incomplete operators provokes a serious issue widely known in electronic structure as spincontamination. This is especially problematic in strongly correlated systems where the initial HF state significantly deviates from the FCI state and therefore large amplitudes θ k are essential. Certainly, a lost spin-symmetry would be eventually recovered by the ADAPT algorithm, provided the target spin state is the ground state and the aforementioned instability is circumvented. However, large spin-symmetry breaking is inevitable in the early steps of the ADAPT procedure, which often leads to unphysical gradients and a poor convergence, as will be demonstrated below. The qubit-ADAPT algorithm 15 is also expected to suffer from the same problem. Furthermore, the candidate operators in qubit-ADAPT break other two important symmetries,Ŝ z andN , and the problem becomes even more severe. This fact can be easily confirmed by transforming each of the strings in Eq.8 back to their corresponding fermionic representations; for example, for the case where p = 6, q = 3, r = 1, s = 0, we have Hence, it is important to maintain a correct symmetry throughout the ADAPT algorithm, as much as possible, to prevent any unnecessary diversion toward FCI. Accordingly, this study investigates the importance of symmetry-conserving sequence of operators, with a special focus on spin-symmetry. One may employ the constraint approach where λ Ô − o 2 is added to the energy functional as a penalty to enforce the target state to be an (approximate) eigenfunction of the symmetry oper-atorÔ with the eigenvalue o. Here, we take a different approach and restore the broken symmetry in each step of ADAPT by adopting appropriate post treatments, that is, spin-projection.

D. Spin-projected ADAPT
In electronic structure theory, spin-projection is a vital tool, [28][29][30][31][32] and has recently exhibited a significant potential in quantum computing in the context of VQE. 22 It restores the spin of the broken-symmetry quantum state |ψ , whereŜ 2 |ψ = s(s + 1)|ψ , by applying a spinprojection operatorP , such that where m s = (N α − N β )/2. Following our previous work, we expressP as a linear combination of unitary opera- whereÛ g = e −iαgŜz e −iβgŜy e −iγgŜz and w g are weights defined by the Wigner D-matrix. In VQE, we are interested in the expectation value of energy, where we have usedP 2 =P =P † and Ĥ ,P = 0.
Note that, using transfer operators |s; m s; k|, which generate a spin s state with m s = m from m s = k, a more general projector m s c m s |s; m s s; m s | introduces more variational freedom, provided |ψ is not an eigenstate ofŜ z , by diagonalizing the Hamiltonian to determine the coefficients c k 30 ; however, for simplicity, we only consider |s; m s s; m s |, which results in Eqs. 12 and 13. When |ψ is an eigenstate ofŜ z , Eq. 13 can be further simplified using the commutativity betweenĤ andŜ z , and one only needs to evaluate the Hamiltonian coupling ψ|Ĥe −iβgŜy |ψ and the overlap ψ|e −iβgŜy |ψ . These expectation values are estimated, for instance, by the Hadamard test with one ancilla qubit, with a constant number of CNOTs (4n qubit ). With the Gauss-Legendre quadrature, we take N g = 2 in the applications below; hence, the overall measurements will be doubled, while the circuit length is almost unchanged with or without P .
The basic idea here is to introduceP to ADAPT-VQE and constrain the operator search in the correct spin space for a better convergence. Accordingly, we simply define the spin-projected (SP) ADAPT by applyingP to Eq. 4. |ψ SP n =P e θnÂn e θn−1Ân−1 · · · e θ1Â1 ψ HF .
The energy gradient evaluated to find the n+1th operator is then similar to Eq. (6) as follows: The commutator involves higher rank operators owing to the presence ofP , and R (n) m requires four-body transition RDM between |ψ n andÛ g |ψ n . However, in practice, they can be handled by the (Fermionic-)shift rule without explicitly computing RDMs 33-35 consistently, both in the operator selection and VQE steps.
The operator candidates used in the original fermionic ADAPT algorithm are pairwise in terms of spin, because if only either of the spin-dependent excitations is employed in ADAPT-VQE, the other would become almost indispensable to restore the spin symmetry. However, whenP is introduced, such a pairwise treatment is no longer necessary. For example, consider the single excitation operatorsτ pα qα andτ p β q β . Although each operator excites (and de-excites) an electron from the q-th orbital to the p-th orbital, the role ofτ p β q β is simply to complement the lost spin inτ pα qα and vice versa, which can be handled byP . Based on this consideration, it is advisable to employ completely spin-dependent excitation operators as a pool for Fermionic ADAPT ansatz, In the presence ofP , the role ofτ pαq β rαs β becomes less significant if its spin-flipped operatorτ p β qα r β sα is already treated in the ansatz. Therefore, some spin-flipped operators that primarily produce effects similar toP would not be chosen in the algorithm, which will potentially minimize the overall circuit depth. Furthermore, the exponential of each operator in the pool P spin can be treated without the Trotter decomposition; therefore, the energy derivatives estimated in the VQE part and Eq. (15) are equivalent for the last chosen operator. This eliminates the algorithmic problem discussed in Section II C.
Spin-projection is also applicable to qubit-based ADAPT. However, as we have discussed, qubit-ADAPT, which adopts the decomposed Pauli strings as candidate operators, violates theŜ z symmetry, and therefore prevents us from using the simplified e −iβgŜy rotation for spin-projection. The full rotation operator e −iαgŜz e −iβgŜy e −iγgŜz allows one to project the correct m s state but entails both α and γ rotation grids, which results in a drastic increase in the number of measurements, by two orders of magnitudes. In contrast, in QEB-ADAPT,τ pq rs preserves theŜ z andN symmetries. Because the difference betweenτ pq rs andτ pq rs is simply that the former neglects the parities between qubits, we can express it aŝ where t, u are the qubit indices appearing as a Z string in Eq.
(2), that is, t, u = [q + 1, p − 1], [s + 1, r − 1]. As a concrete example, it can be verified that Hence, spin-projection can be easily combined with QEB-ADAPT, and we call such an algorithm SP-QEB. Finally, let us briefly consider the tapering-off technique for spin-projection. As it is known, because chemical Hamiltonians possess number and point-group symmetries, one can identify unitaries that transform a Hamiltonian such that it has only Pauli operators that trivially act on certain qubits, which can thus be discarded. 36,37 Such unitaries are identified using the Z 2 symmetry; namely, the parities of α and β electron numbers for the number symmetry, 36 and the sign changes of the underlying wave function by (Abelian) point-group symmetry operations. 37 However, the spin-rotation operator e −iβŜy changes the number parity of α and β electrons as follows: Therefore, the Z 2 -symmetry can be exploited only for the total number operatorN =N α +N β , but not for resulting in the reduction of only one qubit instead of two. This is a necessary cost for spin-projection to exert its advantages; note that theŜ 2 symmetry is not categorized as a Z 2 symmetry and the tapering-off scheme is not applicable. From a different perspective, spin-projection can explore a "hidden" Hilbert space that is not accessible by standard UCC-like ansätze by deliberately breaking and restoring the number symmetry of each electron spin. Although one cannot use number parity to discard the other one qubit from the simulation, there is no such restriction for the point-group symmetry. In some of our calculations discussed below, we taper qubits to ease the computational cost. However, we will assume the number of CNOT gates N CNOT , which is central to measuring the circuit complexity in this study, remains the same. In addition, we will not discuss how gate operations would change by the transformation in the tapering-off algorithm.

E. First-order molecular properties
In molecular systems, one is often interested not only in the total energy but also chemical properties. Time-independent molecular properties are defined as the total energy change with respect to a perturbation x introduced to the Hamiltonian. To obtain properties from quantum computing, several studies have focused on evaluating analytical energy derivatives. 38,39 Although higher-order properties such as polarizability require higher-order derivatives, and therefore, are difficult to compute, first-order properties dE[x]/dx such as force and dipole moment can be easily obtained in VQE. This is because the quantum state is variational with respect to VQE parameters {θ k } used in the quantum circuit, However, in both ADAPT-VQE and SP-ADAPT-VQE, the canonical orbitals of HF commonly employed as a starting point are generally not optimal. The fully parametrized wave function is given byP Hence, the total energy is not usually fully stationary with respect to orbital change unless it is fully converged to the FCI state. In other words, the 1-particle generalized Brillouin condition is not satisfied; note that the left side of Eq. (23) is equivalent to the gradient R pq defined in Eqs. (6) and (15) for standard ADAPT and SP-ADAPT ansätze, respectively. To address this issue, we can compute the changes in molecular orbitals induced by x, by solving the coupledperturbed HF equation 40,41 or the Z-vector equation, 42 similar to previous studies. 38,39 In this study, we adopt an alternative (but mathematically equivalent) formulation commonly used in quantum chemistry, 43,44 and rederive the necessary equations to obtain first-order molecular properties in SP-ADAPT. Accordingly, we introduce the following Lagrangian, where z pq and F pq are Lagrange multipliers to be determined and off-diagonal elements of the canonical Fock matrix, respectively. We note that the canonical HF orbitals require F pq = 0 for p = q at each x, and therefore, L is stationary with respect to z pq . Evidently, for any x, L[θ, κ, z, x] reproduces the same value as E[θ, κ, x], and therefore, dL/dx ≡ dE/dx; however, L has an added advantage in that it can be stationary with respect to all variational parameters: θ, κ, and z (note that ∂L/∂θ k = 0 is automatically satisfied by VQE, see Eq. (20)). The stationary condition for κ can be achieved by solving the linear equation to determine z: is the Fock derivative with respect to orbital change and can be easily computed by a classical computer. We present the explicit form of A and the solution to Eq. (25) in Appendix.
Using the chain rule, where the last term is straightforward to evaluate once z is available. It is noteworthy that this approach is applicable to higher order derivatives in a mathematically clear way, although we will not go into the details. The Lagrangian L can be expressed by the so-called relaxed density matrices D relax , which incorporate the response correction ascribed to the orbital change by perturbation (Appendix). It can be easily shown that molecular properties as energy derivatives are computed using the relaxed density matrices, instead of the expectation value of the corresponding observable operator, that is, with the regular density matrices D pq = ψ|a † p a q |ψ . The simple expectation value using the latter is generally less accurate because the orbital response is neglected. To extend the orbital response correction to SP-ADAPT, we can resort to the derivation in Ref. [45]. In Appendix, we summarize the equations and also review the evaluation of nuclear gradients in the case where HF orbitals are a function of nuclear coordinates.
Although this approach can be applied to any VQE method, in several chemistry-inspired ansatzes such as UCCSD, R rs values are zero or negligible because single excitations may be explicitly treated in the variational optimization. However, we note that the frozen-core (or frozen-virtual) approximation is frequently exercised both in classical and quantum computing, where the lower core (higher virtual) orbitals are considered inert and fixed to those of the reference state that is often HF. Because these frozen orbitals are not explicitly optimized within VQE, a response correction is required. This can rs at each nth cycle by construction. Therefore, the energy derivative and first-order molecular properties are readily available by processing density matrices in a classical computer. Conversely, for the qubit-based ADAPT, one needs to explicitly compute R rs unless the quantum state is exact. Otherwise, the property would be computed as an expectation value and can suffer from an uncontrollable error.

A. Computational details
Before discussing our numerical results, we describe the computational details. All simulations were conducted using Quantum Unified Kernel for Emulation (Quket), developed by us, without the effect of noise. 48 Quket compiles several open-source libraries to generate a Hamiltonian mapped to a qubit basis and perform quantum simulations with parallel computing. Specifically, it employes PySCF 49 to generate molecular orbitals and integrals, and OpenFermion 50 to perform the Jordan-Wigner transformation. To simulate quantum circuits, we utilize Qulacs. 51 The energy minimization in VQE uses the Broyden-Fletcher-Goldfarb-Shannon (BFGS) algorithm implemented in Scipy. The geometry optimization was performed by interfacing Quket with pyberny. 52 For Fermionic-ADAPT, we adopt two different pools, Eqs. (3) and (16), and the controlled-Ry circuit to perform e θτ pq rs and e θτ p q and estimate the number of CNOT gates N CNOT in Fig. 1(c). If the same operator is chosen between two consecutive ADAPT steps because of a numerical error or Trotter error, we choose the opera-tor with the next largest gradient. We will not consider the effect of noise in this work, but simply evaluate the expected performance.
We summarize the ADAPT protocols used in this study and the corresponding operator pools in Table I. For spin-projection, we always use the pool of either the spin-dependent Fermionic excitations or qubit excitations.

B. Spontaneous symmetry breaking in ADAPT
First, we examine the spontaneous symmetry breaking and its consequence in ADAPT for the N 2 molecule with the STO-3G basis set. We have considered three different bond lengths, R N−N = 1.098, 1.8, and 2.5 Å to represent weakly and strongly correlated cases. The solid lines in Fig.2 depict the fidelity of the Fermionic ADAPT-VQE state to the exact FCI state | Ψ FCI |ψ ADAPT | 2 as a function of N CNOT , using either P or P spin , labeled as Fermionic and Fermionic (spin), respectively. In all these cases, ADAPT-VQE approaches the exact state as more operators are added, as expected. The figure indicates that there is no significant difference in convergence between P and P spin , although the former, the original algorithm proposed in Ref. [13], is more advantageous in that it requires less VQE parameters, as indicated by the circles. However, the convergence behavior is significantly different depending on the system. In other words, the stronger the correlation of a system, the slower the convergence of ADAPT-VQE to FCI.
This behavior is attributed to the broken spin characterized by Ŝ 2 , which is also illustrated in Fig. 2 with the dotted lines. In the strongly correlated case of R N−N = 2.5 Å (Fig. 2(c)), the HF state exhibits a very small overlap with FCI, with a fidelity of 0.2, which quickly increases to 0.8 after the initial few ADAPT steps with approximately 200 CNOTs. However, simultaneously, Ŝ 2 (simply the degree of the spin-symmetry breaking) becomes larger than 1, and at this point, the fidelity convergence gets trapped in a plateau.
Such large spin-symmetry breakings occur initially because, in the VQE step of ADAPT, large VQE parameters are essential for describing strong correlations (i.e., highly excited configurations from the HF reference). Consequently, the α and β electrons are treated quite differently, especially in the absence of the Trotter decomposition. 22 We should mention that the spinsymmetry is safely conserved if "paired" double excitations such asτ pαp β qαq β are chosen. In fact, the first few operators selected in ADAPT are of this type, and Ŝ 2 remains as zero (see Figs. 2 and S1). However, paired double excitations do not have a sufficient ability to describe all types of correlation effects, and spin-unpaired excitations are more suitable in several cases. With Ŝ 2 > 1, the ADAPT state is a mixture of several different spin states, each with a significant weight. Therefore, the derivative approach expressed in Eq. (6) to determine the  operator candidate is likely inappropriate, because such an operator search is performed in an incorrect symmetry space. Of course, by further processing ADAPT with more operators and CNOT gates, the lost spin-symmetry starts to be restored (∼700 CNOTs for Fig. 2(c)). At this point, the fidelity also quickly increases to one. Similar but less pronounced results can be confirmed for the tests with shorter bond lengths.
Although fermionic ADAPT follows theŜ z andN symmetries, qubit-ADAPT breaks all, and as a result, exhibits a significant slow-down of convergence to FCI. This is illustrated by Fig. 3, where we have plotted the fidelity and symmetry expectation values, Ŝ 2 , Ŝ z , and N using the problematic case of N 2 at R N−N = 2.5 Å. We note that, in this particular case, both algorithms coincidentally result in similar CNOT gate counts to achieve the FCI state. However, the number of VQE parameters for Fermionic-ADAPT to achieve a fidelity of 0.99 is 30, while that for qubit-ADAPT is 128, implying that more measurements are required for qubit-ADAPT. Interestingly, the symmetry-breaking of qubit-ADAPT in S z andN is rather moderate; however, the error in Ŝ 2 is substantially larger than that of Fermionic-ADAPT. Apparently, such a large error in Ŝ 2 severely affects the fidelity, and it is evident that the plateaus that emerge in the fidelity and Ŝ 2 plots correspond to each other.

C. Accuracy and gate efficiency in ADAPT and SP-ADAPT
As demonstrated above, symmetry-breaking is associated with strong correlation, and occurs spontaneously in the ADAPT algorithms, thereby slowing down the convergence. Therefore, one would naturally expect that maintaining the correct symmetry can mitigate the problem. In this section, we study the effect of spin-projection using several systems, which include the N 2 molecule (with R N −N = 1.098, 2.5 Å), linear chain of H 6 equally separated by 2 Å, and half-filled one-dimensional periodic Fermi-Hubbard model with six sites and U = 8. For the molecular systems, we have used the STO-3G basis set and six electrons in six orbitals. Therefore, all the models considered here are composed of 12 qubits.
We choose the initial state to be the HF determinant for the molecular systems; however, for the Hubbard model, we intentionally set the charge localized state where the first three sites are doubly occupied and the remaining three sites are empty to break the spatial symmetry. Fig. 4 presents the energy error from FCI and the number of CNOT gates in the quantum circuit for several methods, where the shaded area in orange corresponds to chemical accuracy (< 1 mHartree error). We first consider the performance of qubit-ADAPT. 15 For the molecular systems (Figs. 4(a-c)), no significant advantage is found over Fermionic ADAPT in terms of CNOT gate counts. This is because the current implementation of Fermionic ADAPT relies on the controlled-Ry circuit of Ref. [18], which can eliminate a large number of CNOT gates. In contrast, we find that qubit-ADAPT is disadvantageous in two aspects. First, the energy lowering in each VQE step is considerably small, as plotted in Fig. 5, and it requires several VQE parameters (measurements), which also makes it difficult to converge each VQE step because of a highly non-linear parameter space. Second, it breaks the electron number symmetry; the application of qubit-ADAPT to ionic cases is not straightforward, as illustrated by the example of the half-filled Hubbard model, where qubit-ADAPT falls into the true ground state of N = 4 and Ŝ 2 = 2. Now let us focus on the results obtained by spinprojection. Evidently, by eliminating the incorrect spinsymmetry components, SP-ADAPT methods (both for Fermionic and QEB) require much less CNOT gates to achieve the same accuracy as those without spinprojection. SP-Fermionic ADAPT is already effective for the weakly-correlated N 2 at equilibrium geometry, as illustrated in Fig. 4(a). The CNOT gate reduction in this system with spin-projection (by a factor of 0.6 ∼ 0.7) is almost solely attributed to the fact that spin-complement excitations are handled by the superposition of spinrotated states; in contrast, with Fermionic ADAPT, one needs to explicitly construct a quantum circuit to perform spin-complement excitations, except for paired doubles. This can be verified by plotting the energy error against the number of VQE parameters (Fig. 5). From Fig. 5(a), the number of VQE parameters required to achieve the same accuracy is quite similar between Fermionic and SP-Fermionic ADAPT algorithms, which implies the two algorithms yield similar quantum states for each ADAPT iteration.
The efficacy of spin-projection becomes more distinct for strongly correlated systems. For the stretched N 2 and H 6 (Figs. 4(b) and (c)), the number of CNOT gates required to reach the FCI ground state with SP-ADAPT is less than half of that with the corresponding brokensymmetry ADAPT. Having said that, spin-projection may not seem to be advantageous because it employs up to 200 CNOT gates. For instance, QEB-ADAPT exhibits a lower energy than SP-QEB-ADAPT in the stretched N 2 case, and the convergence of qubit-ADAPT is initially even faster. However, this is an artifact due to the unphysical spin contamination effect, and the quality of the quantum state is far from satisfactory. A large spin contamination also leads to incorrect properties and state assignment. Furthermore, as seen in Fig. 3, again, it often causes the plateau.
The number of required parameters in SP-ADAPT (Fig. 5) is less than half of standard ADAPT, as expected. As illustrated in Fig. 6, the scaling of CNOT gate counts with the number of used parameters is essentially identical between ADAPT and SP-ADAPT when the same operator pool is used. Therefore, a lower number of parameters in the latter simply implies a more gate efficient circuit at each energy accuracy. SP-Fermionic ADAPT converges slightly faster than SP-QEB ADAPT; this may simply be because the former contains more operators in the pool. However, it is interesting to observe that both the SP-Fermionic and SP-QEB algorithms require almost the same number of parameters to represent the exact FCI. This suggests we can further reduce the number of operators to form a pool, which can reduce the measurements for derivative evaluations, while expecting an unchanged fine convergence behavior.
Finally, we also performed UCCSD to evaluate the advantages of ADAPT and SP-ADAPT. In our UCCSD implementation, only the gates with non-zero parameters are constructed to minimize the gate depth. Because the molecular systems we consider have a high spatial symmetry, most UCCSD excitations are symmetryforbidden, thereby permitting gate-efficient circuits. With such a treatment, UCCSD is already as powerful as ADAPT for N 2 at equilibrium and H 6 (see the green plots in Figs. 4(a) and (c)). For the stretched N 2 , ADAPT appears to be significantly more efficient and accurate; however, this may well be attributed to the spin-symmetry breaking. For a fair comparison, we also present the broken-symmetry UCCSD 22 in red, which provides a fully variational solution at the cost of large spin-contamination. In addition, note that the "full" UCCSD, which performs symmetry-forbidden gate operations, that is, even if the amplitudes are zero, can be prepared with 2001 CNOT gates for all cases. For the Hubbard model, UCCSD naturally requires all excitations, and thus, 2001 CNOT gates, as we have initi-  ated the calculation with no spatial-symmetry; UCCSD's performance is similar to QEB-ADAPT. In conclusion, although ADAPT is imminently more flexible than the fixed ansatz of UCCSD, it "passes through" a UCCSDlike state in the process of converging to the FCI state; this is quite logical because UCCSD is usually considered an accurate method that is both chemically and theoretically well established.

D. Comparison between different operator pools
The previous section suggested that, not only is the number of CNOT gates required to obtain the FCI state constantly smaller for QEB-ADAPT than for Fermionic-ADAPT, but also the number of required variational parameters is very similar with spin-projection. We also performed calculations with spin-dependent Fermionic ADAPT using P spin defined in Eq. (16) and arrived at the same conclusion for broken-symmetry standard ADAPT algorithms.
The authors of Ref. [15] verified that only 2n − 2 Pauli operators can span the n-qubit real Fock space. Although this theorem appears very promising, much more operators are usually required in the ADAPT algorithm because it chooses only one operator at a time; thus, the pool has to be physical in some sense (i.e., it should contain fermionic excitations higher than singles to examine the k-particle generalized Brillouin theorem). Nevertheless, the theorem suggests the number of operators in the QEB pool can be further reduced, thereby providing an opportunity to save the computational efforts to choose the next operator. Here, we propose the following three additional qubit-excitation pools and compare their performances: • Scheme 1.τ pα qα ,τ p β qα r β sα (β singles and doubles with the same spins are omitted).
• Scheme 2. Same as Scheme 1 but with p ≥ q.
• Scheme 3. Same as Scheme 2 but with r ≥ s.
For the 12-qubit systems, the number of operators in each pool is 855, 555, 400, 190, and 155 for spindependent Fermionic, QEB, Scheme 1, Scheme 2, and Scheme 3, respectively. We have plotted the energy convergence of the earlier H 6 example against CNOT gates in Fig. 7 and parameter numbers in Fig. 8. From these figures, it can be observed that, at convergence (an energy error of 10 −6 ), all QEB schemes perform similarly in terms of both the numbers of CNOT gates and VQE parameters. However, each plot behaves slightly differently. For broken-symmetry ADAPT algorithms (indicated by the lines without points), full QEB provides a slightly better description than Scheme 3 before the convergence. This could be because the entire operator pool in QEB contains more choices than Scheme 3. In particular, the latter does not contain spin-complement operators, which are subsequently chosen in several instances because spin-complement excitations result in similar energy gradients. It is also clear from Fig. 8 that spindependent Fermionic ADAPT generally exhibits a better energy convergence for a given number of VQE parameters, although it is inefficient in terms of CNOT gate counts. For example, with 100 parameters, the energy obtained by Fermionic ADAPT is one order of magnitude more accurate than those by QEB schemes.
With spin-projection, the performances of different QEB schemes are almost equivalent with a certain CNOT gate count and parameter number. In particular, its accuracy is comparable to SP-Fermionic ADAPT when the number of parameters is fixed, because spin-projection can partially replicate the spin-complement effects. However, SP-Fermionic ADAPT is almost always more accurate than SP-QEB-ADAPT, although slightly. In the particular case of H 6 , their differences are marginal, and may be rather attributed to the fact that the current approach to determining the next operator is not optimal; operators that are not chosen could identify a lower energy than the chosen one when VQE is performed. However, this behavior is general, as is clearly observed in Fig. 5, and can seemingly become more significant for larger systems, as will be demonstrated later. Nevertheless, the improved similarity between the SP-Fermionic and SP-QEB algorithms over standard ADAPT is encouraging for practical applications where the actual operator number (or CNOT gate counts) that can be implemented is assumed to be limited.

E. Molecular properties
In this section, we study the molecular properties computed by several ADAPT-based algorithms using the method discussed in Sec. II E. In a practical application of the ADAPT algorithm, we cannot expect to obtain FCI accuracy, but have to make a tradeoff between accuracy and either the circuit depth or the number of measurements. To benchmark their accuracy, we compare the results obtained with the fixed number of VQE parameters, N param , or the number of CNOT gates, N CNOT .

Dipole moment of H2O
Dipole moment is an important first-order property of a molecule, providing a measure of polarity of the system. To assess the accuracy of ADAPT in computing this quantity, we have employed the H 2 O molecule with a fixed angle of 104.5 • and varied the OH bond length symmetrically. Fig. 9 summarizes the dipole moment error from the FCI value calculated at different ADAPT cycles (N param = 6, 9, and 12). The results of spin-dependent Fermionic ADAPT without and with the orbital response correction are presented in Figs. 9(a) and (b), respectively. As can be observed from the figures, the orbital response correction is important for describing the dipole moment correctly at short bond lengths (∼1.5 Å). Note that the correction effect is almost absent in UCCSD because it takes care of single excitations explicitly in VQE by playing the role of orbital relaxation. For the larger bond distance, the orbital response correction sometimes worsens the result; however, this means the ADAPT state is not stationary with respect to orbital rotation and the good dipole moments obtained in the unrelaxed results ( Fig. 9(a)) are fortuitous. Having said that, because we are using the HF canonical orbitals in ADAPT, the orbital Hessian matrix A in Eq. (25) can be nearly singular at a certain bond length (around 1.9 Å), where the first-order relaxed density matrix of ADAPT is overcorrected. This singular behavior is circumvented by increasing the number of operators such that the residual R rs is close enough to zero.
When spin-projection is applied, we essentially observe a similar trend (see Figs. 9(c,d)). Although spinsymmetry breaking in ADAPT does not affect the estimated dipole moment because both the singlet and contaminated triplet states give almost zero dipole moments at stretched bond lengths, SP-Fermionic ADAPT shows promising results, yielding better convergence with respect to N param , especially in a more strongly correlated region ( Fig. 9(d)).

Geometry and potential curve of O3
It is difficult to investigate the accuracy in estimating geometrical parameters of strongly correlated systems as we are restricted to small molecules or toy models because of a limited computational budget, and most small molecules are only weakly correlated at equilibrium. In fact, previous studies have considered such simple molecules, 38,39,53 which do not allow one to properly evaluate the potential of quantum computers. Another issue is that, for adaptive algorithms like ADAPT, the potential energy surface is not smooth, and determining the minimum can be challenging. However, it is interesting to benchmark how effective the geometry optimization in ADAPT methods can be. We think it is also pedagogical to compare the results of spin-dependent Fermionic and QEB ADAPT algorithms in predicting potential energy surfaces when the same quantum resource is available. Accordingly, we choose the ozone molecule with the STO-3G basis as our test case, using an active space comprising 9 orbitals and 12 electrons, resulting in an 18-qubit system. To make a direct comparison between the accuracy of each ADAPT method, we have assumed that the number of CNOT gates that one can handle for each calculation is limited (except for UCCSD, where 2534 CNOT gates were required). Namely, all ADAPT results were obtained with less than a certain N CNOT ranging from 300-1000. In all simulations, we tapered qubits to reduce the computational cost but assumed the total number of CNOT gates remain unchanged. Because O 3 is a two-determinant system, UCCSD is reasonably accurate, as indicated by the dotted gray line. Thus, symmetry-breaking in ADAPT is not so significant for both the Fermionic and QEB states. Our calculations indicate that Ŝ 2 in ADAPT is approximately 0.1 at most, at the equilibrium geometry estimated by FCI. From the figures, the bond length is more accurately predicted by Fermionic ADAPT than by QEB ADAPT, with or without spin-projection, while the results for the angle exhibit the opposite trend. Without spin-projection, the two methods yielded significantly different results for geometry optimization, especially at a larger N CNOT . Increasing N CNOT does not always lead to more accurate results, at least up to N CNOT = 1000. However, by applying spin-projection, it appears the geometries predicted by the SP-Fermionic and SP-QEB algorithms are improved with N CNOT . They are also similar to each other, implying the two methods also result in similar states.
To further investigate these results, we have plotted in Fig. 10(c) the potential energy curve of symmetric bond dissociation of O 3 with a fixed angle of 109.9 • , using N CNOT = 1000. In Fig. 10(d), we have also presented the energy error from FCI in Hartree. In all the methods, to a certain degree, the energy is more accurate at a shorter (weakly correlated) distance and becomes worse as the bond is stretched (strongly correlated). While this result is quite reasonable, we found that the change in QEB energy is rather significant compared to that in the Fermionic methods, especially without spin-projection. When comparing the Fermionic and QEB results, their energy accuracy is inverted at 1.40 Å. We took the nonparallelity error (NPE) of the potential energy curve, which is defined as the difference between the maximum and minimum errors from FCI throughout the potential curve, and tabulated in Table II. As this table shows, Fermionic ADAPT provides a more parallel curve to that of FCI than QEB ADAPT. This explains why the latter is less accurate than the former in predicting the bond length, as illustrated in Fig. 10(a).
At this point, we are uncertain whether this failure of QEB can be attributed to its fundamental deficiency or the possibly inappropriate choice of operators in the ADAPT algorithm. Because the deviation between Fermionic and QEB ADAPT results are significantly mitigated by spin-projection (Figs. 10(c,d)), we suspect symmetry breaking to be behind the different behaviors. Therefore, in Fig. 11, we have summarized the Ŝ 2 values obtained from the calculations in Fig. 10(c). In fact, we find that, although Fermionic ADAPT retains the same degree of symmetry breaking throughout all bond distances, QEB is more spin-contaminated when the molecule is stretched. As we have seen several times, symmetry breaking can slow down the convergence in ADAPT; therefore, the degree of spin-contamination is directly related to the energy accuracy with a fixed N CNOT . Hence, spin-projection plays an important role in equalizing the two schemes to some extent.

IV. CONCLUSIONS
One of the unsolved problems in quantum chemistry is a consensus way of treating strong correlations. Quantum computing is expected to provide a solution to this problem because it can, in principle, handle entangled quantum states directly mapped onto qubits. The recently proposed ADAPT algorithm has paved the way to construct an efficient quantum circuit adaptively; how-ever, its convergence is slow for strongly correlated systems because of a significant symmetry breaking.
To address this issue, we introduced spin-projection to ADAPT. We demonstrated that spin-projection can be quite effective when combined with ADAPT, offering shallower circuits with lesser CNOT gates and VQE parameters to achieve the same accuracy at the cost of increased measurements. This is especially the case if a gate-efficient circuit is adopted for fermionic and qubit excitations, whose performances are often comparable. Our calculations also indicated that the pool of QEB ADAPT can be reduced by discarding certain classes of spin-dependent qubit excitations. However, we concluded that it is not worthwhile to apply symmetryprojection to qubit-ADAPT because it would require the restoration of lostŜ z (and number) symmetry, which considerably increases the number of measurements in evaluating the energy.
We have also derived the first-order energy derivative in the presence of spin-projection, which enabled the calculation of dipole moment and geometry optimization with SP-ADAPT. It was demonstrated that the orbital response correction is important both for ADAPT and SP-ADAPT in the calculation of dipole moment, because they are far from stationary with respect to orbital changes, unlike UCCSD, which is less sensitive to orbital rotation because of the presence of explicit single excitations. Furthermore, the method was applied to the geometry optimization of O 3 to quantify the capabilities of Fermionic and QEB schemes. We found that QEB ADAPT is less stable in achieving a constant accuracy throughout the potential energy surface and thus is less predictable than Fermionic ADAPT in terms of optimized geometry. The deviation between the two schemes was caused by a larger spin-contamination in QEB ADAPT and could be largely mitigated by performing spin-projection. However, the reason behind this unfavorable behavior of QEB ADAPT remains unclear, and further studies will be required to elucidate its cause.
Similarly, solving Eq. (A4c) results in It should be mentioned that if orbitals are (nearly-)degenerate, the denominator can become numerically zero; however, in such a case, we expect the numerator to be approximately zero, meaning the rotation is redundant and can be discarded. We have not faced such a numerically challenging situation. D pq = ψ|a † p a q |ψ (A17) D pq,rs = ψ|a † p a † q a s a r |ψ (A18) Note that D relax pq,rs is explicitly anti-symmetrized such that D relax pq,rs = −D relax qp,rs , and so on, for convenience.

Frozen-core approximation
When the frozen-core approximation is exercised, the VQE energy derivative with respect to the mixing between the frozen-core I and active p orbitals (i.e., those mapped to qubit), R pI , cannot be directly evaluated by qubit measurements. Nevertheless, we can compute these values using the density matrices within the active space as follows: with general spin-orbitals P, Q.
The Fock derivatives and multipliers with frozen-core indices, such as A aI,ck and z cI , are computed with the same equations as above, simply by expanding i, j, k, l to include the frozen-core orbitals.
For the frozen-virtual approximation, we must consider the contribution from the frozen-virtual orbitals A:

Nuclear gradients
In several cases, HF orbitals are expressed as a linear combination of atomic orbitals (AO) that are functions of atomic coordinates. If AOs depend on the perturbation x (nuclear displacement), the electronic energy derivative includes a contribution from the AO overlap derivative S (x) = ∂ φ µ |φ ν /∂x. The general expression is where the superscript (x) represents the (explicit) partial derivative, and W µν denotes the so-called energy weighted density matrix,

Spin-projection
When the spin-projection operatorP is present, the same procedure is followed, but with Eq. (15) instead of Eq. (6) for R pq . As a caveat, for spin-dependent properties such as hyperfine coupling constants, the relaxed density matrices need to be slightly modified. Our SP-VQE energy is expressed as ψ|P |ψ pq h pq ψ|a † p a qP |ψ + 1 4 pqrs pq||rs ψ|a † p a † q a s a rP |ψ (A25) Here, notice that ψ|a † p a qP |ψ / ψ|P |ψ does not correspond to the genuine density matrix of SP-VQE, that is, ψ|P † a † p a qP |ψ = ψ|a † p a qP |ψ . In fact, the "halfprojected" density matrix and its response correction are not spin-adapted, and therefore, lead to an incorrect spin density. To avoid this issue, the Wigner-Eckart theorem can be adopted, which is expressed aŝ where s 1 m 1 s 2 m 2 |S M represents the Clebsch-Gordan coefficient andP s m,k = |s; m s; k| denotes the transfer operator, which is a general form of the spin-projection operatorP ≡P s m,m .