Penalty methods for variational quantum eigensolver

The variational quantum eigensolver (VQE) is a promising algorithm to compute eigenstates and eigenenergies of a given quantum system that can be performed on a near-term quantum computer. Obtaining eigenstates and eigenenergies in a specific symmetry sector of the system is often necessary for practical applications of the VQE in various fields ranging from high energy physics to quantum chemistry. It is common to add a penalty term in the cost function of the VQE to calculate such a symmetry-resolving energy spectrum, but systematic analysis on the effect of the penalty term has been lacking, and the use of the penalty term in the VQE has not been justified rigorously. In this work, we investigate two major types of penalty terms for the VQE that were proposed in the previous studies. We show a penalty term in one of the two types works properly in that eigenstates obtained by the VQE with the penalty term reside in the desired symmetry sector. We further give a convenient formula to determine the magnitude of the penalty term, which may lead to the faster convergence of the VQE. Meanwhile, we prove that the other type of penalty terms does not work for obtaining the target state with the desired symmetry in a rigorous sense and even gives completely wrong results in some cases. We finally provide numerical simulations to validate our analysis. Our results apply to general quantum systems and lay the theoretical foundation for the use of the VQE with the penalty terms to obtain the symmetry-resolving energy spectrum of the system, which fuels the application of a near-term quantum computer.

The variational quantum eigensolver (VQE) is a promising algorithm to compute eigenstates and eigenenergies of a given quantum system that can be performed on a near-term quantum computer. Obtaining eigenstates and eigenenergies in a specific symmetry sector of the system is often necessary for practical applications of the VQE in various fields ranging from high energy physics to quantum chemistry. It is common to add a penalty term in the cost function of the VQE to calculate such a symmetry-resolving energy spectrum, but systematic analysis on the effect of the penalty term has been lacking, and the use of the penalty term in the VQE has not been justified rigorously. In this work, we investigate two major types of penalty terms for the VQE that were proposed in the previous studies. We show a penalty term in one of the two types works properly in that eigenstates obtained by the VQE with the penalty term reside in the desired symmetry sector. We further give a convenient formula to determine the magnitude of the penalty term, which may lead to the faster convergence of the VQE. Meanwhile, we prove that the other type of penalty terms does not work for obtaining the target state with the desired symmetry in a rigorous sense and even gives completely wrong results in some cases. We finally provide numerical simulations to validate our analysis. Our results apply to general quantum systems and lay the theoretical foundation for the use of the VQE with the penalty terms to obtain the symmetry-resolving energy spectrum of the system, which fuels the application of a near-term quantum computer.

I. INTRODUCTION
Quantum computers have been attracting attention because they are expected to solve some classes of computation problems remarkably faster than classical computers [1][2][3][4]. Quantum computers that are expected to be realized in the near future are called noisy intermediatescale quantum (NISQ) devices, which contain a few hundred qubits and do not perform error correction on their qubits [5]. Although NISQ devices are not suitable to perform such algorithms [1][2][3][4] that practically require error correction with numerous qubits, simulating quantum many-body systems is one of the most promising ways to utilize NISQ devices. In particular, the variational quantum eigensolver (VQE) [6], which was already performed on a real NISQ device [6][7][8], computes the ground-state energy and the ground state of a given quantum system, for example, molecules [9,10]. In the VQE, we prepare a parameterized state, an ansatz state, on a quantum computer and minimize the expectation value of energy for this state by updating the parameters. Recently, variations of the VQE have been developed to compute not only the ground state and it energy but also excited states/energies [8,[11][12][13][14][15][16].
In many cases appearing in various fields ranging from high-energy physics to condensed matter physics, a Hamiltonian we encounter possesses some symmetry. It is then of great importance to obtain the energy spectrum of the Hamiltonian for each symmetry sector (eigenvalue * kkuroiwa@uwaterloo.ca † nakagawa@qunasys.com of the symmetry operation) to analyze the properties of the system. For example, such a symmetry-resolving spectrum is important to predict photochemical reactions in quantum chemistry [17]. It can also tell the nature of the spontaneous symmetry breaking in condensed matter physics [18]. When we want to use the VQE algorithm to compute eigenvalues/eigenstates of a given Hamiltonian that reside in a target sector of the symmetry, there are several options. The first one is to use a special ansatz state in the VQE so that the ansatz state always belongs to the desired sector of the symmetry [19], but the construction of such ansatz strongly depends on the heuristic knowledge of the symmetry and is not always readily available for generic symmetries. The second one is to "taper off" the qubits by finding a unitary transformation in which eigenvalues of the symmetry are encoded into specific qubits [20]. This technique is more general compared with the first one, but the sector (eigenvalue) of the symmetry can be designated only in modulo 2. The third option, which is the most general and we study in this work, is to include penalty terms in the VQE algorithm to aim to guarantee that the resulting state of the VQE has the desired symmetry [15,[21][22][23][24]. We refer to such a strategy, or the VQE with penalty terms, as the constrained VQE in this work following Ref. [24].
In the previous work, two types of penalty terms (detailed definitions are found in Eqs. (5) and (6)) have been suggested to perform the constrained VQE, i.e., to compute eigenstates possessing the desired symmetry. However, there are serious problems in such penalty terms that can hinder practical applications of the constrained VQE. First, systematic analysis of the coefficients of penalty terms, penalty coefficients, has been lacking. The coefficients stipulate the magnitude of the penalty for the ansatz state that deviates from the desired symmetry sector. Small coefficients are preferable for the fast convergence of the VQE, but when they are too small, the resulting state of the VQE does not necessarily have the desired symmetry. Currently, the coefficients are heuristically chosen, and there is no guarantee that one can obtain the state with the desired symmetry as a result of the VQE. Second, the performance of the two types of penalty terms is not explored. Evidence for which one is superior to the other greatly helps when we utilize the constrained VQE in practical problems.
In this work, we theoretically study the two types of penalty terms and report several crucial findings to exploit the constrained VQE in actual calculations. For one type of them (Eq. (5)), we provide a convenient formula to determine the penalty coefficients that can rigorously ensure the result of the constrained VQE has correct symmetry. On the other hand, remarkably, we prove that the other type of penalty terms (Eq. (6)) does not work for any finite value of the coefficients in a rigorous sense. We show that it even yields a completely wrong answer in some cases. To validate our theoretical analysis, we provide numerical experiments simulating the constrained VQE. Our findings resolve the problems of the constrained VQE and pave the way for the further applications of it. The analysis of energy spectrum with resolving their symmetry is ubiquitous in the study of modern quantum physics; our results have a broad impact on the use of near-term quantum computers (or NISQ devices) in various fields.
The rest of this paper is organized as follows. In Sec. II, we review several basic concepts underlying the constrained VQE. We provide our main results in Sec. III. We analyze the two types of cost functions in the constrained VQE and derive several implications for both of them. In Sec. IV, we perform numerical simulations to validate our theoretical results in Sec. III. In Sec. V, we summarize our results and give a conclusion.

II. PRELIMINARIES
In this section, we review several concepts necessary to understand our main results. In Sec. II A, we briefly explain the VQE algorithm. In Sec. II B, we review a variant of the VQE algorithm to obtain excited states, which is called the variational quantum deflation (VQD) algorithm. Finally, in Sec. II C, we introduce the constrained VQE/VQD, or the VQE/VQD with penalty terms.

A. Variational Quantum Eigensolver (VQE)
The VQE [6] is a quantum-classical hybrid algorithm to compute the ground state and its energy of a given system and considered to be executable on NISQ devices. Suppose that we have a quantum system with a Hamil-tonianĤ. We prepare an ansatz state of n qubits, using an ansatz circuit (unitary operation) where θ := (θ 1 , . . . , θ l ) are real-valued parameters, V i (θ i ) is a parameterized quantum gate, and |ψ 0 is some reference state. In the VQE, we optimize these parameters so that the expectation value of the energy, is minimized with respect to θ. Then, letting θ * be the optimum, we can regard L VQE (θ * ) as a nice approximation of the ground-state energy and |ψ(θ * ) as that of the ground state due to the variational principle. During the process of the algorithm, the preparation of the ansatz state |ψ(θ) and the measurements necessary to compute L VQE (θ) are performed by a quantum computer while the optimization is completed solely by a classical computer. This separation of roles between classical and quantum computers alleviates the hardware requirement for quantum computers and make the algorithm possible to run even on NISQ devices.

B. Variational Quantum Deflation (VQD)
The VQE has been extended to compute excited states [8,[11][12][13][14][15][16]. Among those extensions, the variational quantum deflation (VQD) [15] is shown to be possibly the most efficient [25] compared with the two major ones, subspace-search VQE (SSVQE) [12] and the multistate contracted VQE (MCVQE) [13]. In this work, we focus on the VQD and review it here. We note that our analysis in section III also applies to the SSVQE and MCVQE (see the second last paragraph of Sec. III A).
Suppose that our goal is to obtain the kth excitedstate energy ofĤ. The VQD algorithm assumes that we already obtain the eigenstates up to (k − 1)th as |ψ i (i = 0, 1, . . . , k − 1). The cost function to be minimized in the VQD algorithm is where θ k are variational parameters to be optimized, |ψ(θ k ) is an ansatz state like Eq. (1), β i is an appropriate positive number to guarantee the orthogonality between the optimized ansatz state and the eigenstates up to (k −1)th. Let θ * k be the optimal parameters. Then, as a result of the optimization, L (k) VQD (θ * k ) and |ψ(θ * k ) can be regarded as good approximations of the kth excitedstate energy and the kth excited state, respectively. By incrementing k from k = 1 to the desired energy level, one can obtain any eigenstate/energy ofĤ. We note that the procedure of the VQD can also be viewed as finding the ground state of the modified Hamiltonian H =Ĥ + k−1 i=0 β i |ψ i ψ i | by using the original VQE algorithm; that is, the cost function is the expectation value ofĤ , L (k) VQD (θ k ) = ψ(θ k )|Ĥ |ψ(θ k ) .

C. Constrained VQE/VQD
As explained in the introduction, the Hamiltonian to which we want to apply the VQE/VQD often has some symmetry, and it is essential to compute eigenvalues/energies in the desired symmetry sector. For such purpose, we can introduce a penalty term to the cost function of the VQE/VQD algorithms [15,[21][22][23][24], which we call the constrained VQE/VQD. This can be interpreted as the penalty method studied in the field of computational optimization [26].
LetĈ be an observable (a Hermitian operator) corresponding to the conserved quantity associated with the symmetry [27]; [Ĥ,Ĉ] = 0. For example, when the system has U (1) symmetry, there is the particle-number conservation law, andĈ is the particle-number operatorN . Suppose that we want to find an eigenstate ofĤ that is also an eigenstate ofĈ with the eigenvalue c. Here, c determines the symmetry sector. For the constrained VQE/VQE, two types of penalty terms and the cost function have been proposed: where L(θ) is either L VQE (θ) or L (k) VQD (θ) for the constrained VQE/VQD, |ψ(θ) is an ansatz state with parameters like Eq. (1), and µ C is some positive number to penalize the deviation of the expectation value ofĈ for |ψ(θ) from the desired value c.
Intuitively, when µ C is infinitely large, the second terms of the right hand sides of Eqs. (5) and (6) get also infinitely large if ψ(θ)|Ĉ |ψ(θ) = c. The minimum of the cost function may be obtained at θ * satisfying ψ(θ * )|Ĉ |ψ(θ * ) = c and the resulting state |ψ(θ * ) may become an eigenstate ofĈ with the eigenvalue c. However, in the previous studies, there is no concrete mathematical discussion how the penalty terms in Eqs. (5) and (6) work to guarantee that the minimum of the cost functions is reached at the state satisfyinĝ C |ψ(θ * ) = c |ψ(θ * ) . Moreover, it is known in the field of computational optimization that the large coefficient µ C slows the convergence of the optimization process [26]. The penalty coefficient µ C is preferably small as long as one can assure that the resulting state of the optimization is an eigenstate ofĈ with the eigenvalue c.

III. MAIN RESULT
We present our main results in this section. We theoretically analyze the two cost functions (5) and (6) for the constrained VQE/VQD, focusing on the role of the penalty terms and the effect of the values of µ C on the optimization result. For F (1) (θ), we derive a rigorous condition of the penalty coefficient µ C that guarantees the result of the optimization yields a quantum state having the desired eigenvalues ofĈ. Utilizing the condition, we also derive a convenient formula to set an appropriate value of µ C in practical calculations. For F (2) (θ), we prove that this cost function does not work at all in a rigorous sense; a quantum state after the optimization always deviates slightly from the eigenstate ofĈ for any finite value of µ C . We show that the cost function F (2) (θ) gives completely wrong results for some cases.
Before presenting our main results, let us clarify the notation. We consider a HamiltonianĤ on n-qubit Hilbert space and an conserved quantityĈ which commutes with the Hamiltonian, [Ĥ,Ĉ] = 0. We write the spectral decomposition ofĤ andĈ aŝ where |i is an eigenstate ofĤ with eigenvalue E i (E 0 E 1 · · · E 2 n −1 ) and C i is the eigenvalue of |i for C. We want to obtain the ground (excited) state energy of the sector of C i = c by the constrained VQE (VQD). Since the VQD can be viewed as a search for the ground state of the modified HamiltonianĤ as explained in Sec. II B, here we assume that the target state is the ground state of the sector of C i = c. We denote the target state energy by E i0 , which means that C i = c for i = 0, . . . , i 0 − 1.
In addition, because our purpose is to explore the performance of the cost functions (5) and (6) itself, we assume that the ansatz state |ψ(θ) (Eq. (1)) is perfect. That is, we expand the ansatz states as with a i ∈ C and consider a i takes any value with satisfying i |a i | 2 = 1.
A. Analysis of F (1) : Formula for µC Here, we investigate how the penalty term in F (1) (θ) works. By substituting Eq. (8) into the cost function (5), we have To obtain E i0 as a result of the minimization of Eq. (9) with respect to the parameters , and µ C > 0. Therefore, it suffices to take In other words, if one take µ C satisfying Eq. (11), it is guaranteed that the cost function (5) works properly to choose the desired quantum state that has the eigenvalue c forĈ as a result of the minimization (with the parameters |a i0 | 2 = 1, |a i =i0 | 2 = 0). Moreover, we can derive a little loose but much simpler formula for µ C than Eq. (11). Let C min be the smallest gap among distinct eigenvalues ofĈ. Then, the right hand side of (11) can be upper-bounded as and we obtain a convenient formula for µ C : This simple equation is one of our main results. In the following, we explain how to estimate the right hand side of the equation in practical calculations.
The smallest gap C min is uniquely determined for each conserved quantityĈ, and it is easy to compute C min for most of the cases we consider in quantum chemistry and condensed matter physics. For example, whenĈ is the particle number operatorN , we have C min = 1 because its eigenvalues are non-negative integers. For the total spin-squared operatorŜ 2 , since its eigenvalues are S(S + 1) for S = 0, 1/2, 1, . . ., it follows C min = 3/4. For the zcomponent of the total spin operatorŜ z , we have C min = 1/2 because its eigenvalues are 0, ±1/2, ±1, . . .. We use these values for numerical simulations in Sec. IV. We note that because we consider the n-qubit Hilbert space, the eigenvalues ofĈ are always discrete and C min > 0.
On the other hand, there are several ways to estimate the numerator of Eq. (13), or the energy gap between the desired state and the ground state, E i0 − E 0 . One way is to replace E i0 and E 0 with rough estimations calculated by classical computational methods that are less costly. For example, in the case of quantum chemistry, we can use the Hartree-Fock (mean-field) method or the Møller-Plesset method [28], which are much less costly on classical computers than the exact diagonalization method, to estimate E i0 and E 0 . We call the coefficient µ C determined by a classical method µ (ce) where E (ce) i0 , E (ce) 0 are classically-estimated values. Another way to estimate E i0 − E 0 is to use the rigorous upper bound of it. LetĤ = j c j P j be a decomposition of a given Hamiltonian into Pauli operators. When we perform the VQE/VQD, this decomposition is already obtained [6,21]. By denoting the spectral norm of operators as · , it follows E i0 − E 0 2 Ĥ 2 j |c j | because P j = 1. In this way we define another choice of µ C , which is easy and applicable to any system but may be too large for the fast convergence of the constrained VQE/VQD (see Sec. IV).
We note that the classical estimation µ C still works properly and we can find the desired eigenstate. When we underestimate E i0 − E 0 , it is possible that the minimum of the cost function differs from the desired eigenstate. We can nevertheless know whether such cases happen or not because we compute the value of ψ(θ)|(Ĉ − c) 2 |ψ(θ) during the optimization; when the value deviates from 0 drastically even after the optimization, we judge that the optimization fails to obtain the desired eigenstate. We can then adopt slightly larger µ C for the optimization.
Finally, we discuss several extensions of our result. First, our result applies to a case where we have multiple conserved quantities {Ĉ (l) } l . The cost function in this case is The same discussion deriving Eq. (13) leads to the condition for the optimized result to yield the eigenstate of C (l) with eigenvalue c (l) : where C (l) min is the smallest gap among distinct eigenvalues ofĈ (l) . Second, the formula (13) is also applicable to the other VQE-based algorithms to obtain the excited states, namely, SSVQE [12] and MCVQE [13]. The result is given by simply replacing E i0 of Eq. (13) with the largest energy E ex that one wants to obtain as a result of the optimization.
In summary, we derive the formulas (11) and (13) for µ C in F (1) (θ) that apply to general quantum systems, and they always guarantees that we can obtain the desired eigenstates/energies as a result of the optimization of F (1) (θ).
B. Analysis of F (2) : Failure to obtain the desired eigenstates In this section, we investigate the cost function F (2) (θ) (Eq. (6)) and prove that we cannot obtain the exact desired energy/eigenstate by minimizing this cost function. Concretely, the resulting energy after the minimization of F (2) (θ) deviates from the one that we want to obtain by O(1/µ C ) even in the best cases and can be completely wrong in the worst cases.
Substituting Eq. (8) into Eq. (6) leads to We minimize this cost function with respect to the pa- and see if E i0 is obtained as a result of the optimization (with the parameters |a i0 | 2 = 1, |a i =i0 | 2 = 0).
To graphically see the way the magnitude of µ C affects the global minimum of the cost function, we consider an orthogonal coordinate plane whose vertical axis represents the expectation values ofĈ and horizontal axis represents the expectation values ofĤ, as depicted in Fig. 1. For example, a point corresponding to some state |φ on this plane is ( φ|Ĉ|φ , φ|Ĥ|φ ). Hereafter, we call this coordinate plane (C, E) plane. For the ansatz state |ψ(θ) (Eq. (8)), the expectation values are so all possible points corresponding to |ψ(θ) on (C, E) plane constitute the convex envelope defined by the 2 n points, {C 0 , E 0 ), (C 1 , E 1 ), . . . , (C 2 n −1 , E 2 n −1 )} (the region colored cyan in Fig. 1). On this (C, E) plane, a contour line of the cost function (18) (i.e., a set of points where Eq. (18) takes the same value) is a parabola: where f is the value of Eq. (18). Therefore, the minimization of the cost function is identical to finding the smallest f such that the parabola (21) and the convex envelope defined by have non-empty intersection. The smallest f is achieved when the parabola (21) touches the polygon at just one point as shown in Fig. 1.
Depending on the location of the point (c, E i0 ) corresponding to the desired eigenstate, we can consider two cases: (A) The point (c, E i0 ) is a boundary point of the convex envelope (the blue point in Fig. 1), (B) The point (c, E i0 ) is an interior point of the convex envelope (the red point in Fig. 1).
In case (A), unless i 0 = 0 (the desired state is the ground state of the Hamiltonian), the global minimum of the cost function is reached at the point slightly different from (c, E i0 ) (the green and yellow points in Fig. 1). Let us write an edge of the convex envelope that is tangent to the parabola (21) as E − E i0 = α(C − c) with some real number α on (C, E) plane. It is straightforward to show where (C t , E t ) is a coordinate of the tangent point and f min is the value of f in Eq. (21) for (C, E) = (C t , E t ), i.e., the minimal value of the cost function. Those equation mean that the expectation values ( ψ(θ)|Ĉ |ψ(θ) , ψ(θ)|Ĥ |ψ(θ) ) at the global minimum of the cost function (6) always deviate from the target ones (c, E i0 ) by O(1/µ C ) for any finite µ C . Only for infinitely large µ C , the tangent point becomes (c, E i0 ), and we get the desired eigenstate. On the other hand, in case (B), the desired eigenstate can never be obtained as a result of the optimization even for infinitely large µ C ; the desired point (c, E i0 ) has no chance to be a tangent point to the parabola (21). Before ending this section, we give another intuitive explanation for the reason why F (1) (θ) works properly to choose the desired eigenstate while F (2) (θ) does not. In the expression of F (1) (θ) (the right hand side of Eq. (9)), both the "energy part" (the first term) and the "conserved-quantity" part (the second term) are proportional to |a i | 2 . On the other hand, in F (2) (θ) (the right hand side of Eq. (18)), the conserved-quantity part is a quadratic function of |a i | 2 while the energy part is proportional to |a i | 2 . Since |a i | 2 1 for all i, the deviation in the conserved-quantity part from the desired value is less penalized for F (2) (θ) than for F (1) (θ). This makes the difference between the performance of the cost functions F (1) (θ) and F (2) (θ). Indeed, the minimum of F (2) (θ) in case (A) is achieved at a point where the value of conserved quantity deviates from the desired one, and the energy gets smaller than the desired energy.
In short, we show that the cost function F (2) (θ) for the constrained VQE/VQD (Eq. (6)) does not work at all in a rigorous sense. We can only obtain a slightly-deviated desired eigenstate with the error of O(1/µ C ) even in the best cases while we obtain a totally different eigenstate in the worst case.

IV. NUMERICAL SIMULATIONS
In this section, we numerically simulate the constrained VQE/VQD for two molecules, H 2 and H 4 , to  7)), and the convex envelope defined by them is colored cyan. The curves colored yellow, green, and pink are the parabolas (21) with small, medium-sized, and large µC , respectively. The tangent points to the convex envelope (i.e., the global minimum of the cost function) are also indicated by the points in the same color. The dashed vertical line represents the desired expectation value of the conserved quantity.
validate our results presented in the previous section. We also discuss the comparison of the performance of the two cost function F (1) (θ) and F (1) (θ) in practical calculations.
Our setup for numerical simulations is as follows. We consider a hydrogen molecule H 2 at bond distance R = 0.7414Å and a hydrogen chain H 4 where four hydrogen atoms are aligned in line with the identical bond distance R = 2.0Å. We adopt the STO-3G minimal basis set to perform the restricted Hartree-Fock calculation for the two molecules. We prepare the fermionic secondquantized Hamiltonian for electrons using PySCF [29] and Openfermion [30] and use Jordan-Wigner transformation [31] to map the fermionic Hamiltonians into qubit Hamiltonians [9,10]. The number of qubits to express the Hamiltonian is four and eight for H 2 and H 4 , respectively. As conserved quantities, we consider the total number of electronsN e , the total spin-squaredŜ 2 , and the total z-component of the spinŜ z . Those conserved quantities are also transformed into operators on qubits by Jordan-Wigner transformation.
We employ the hardware-efficient type ansatz [7] shown in Fig. 2 for H 2 . The depth of the ansatz is D = 4 in the constrained VQE to compute T 1 state (defined later) and D = 12 in the constrained VQD to com-

H2
H4 triplet T1Ne = 2,Ŝ 2 = 2,Ŝz = −1Ŝ 2 = 2,Ŝz = −1 singlet S1Ne = 2,Ŝ 2 = 0Ŝ 2 = 0 TABLE I. The penalty terms included in the cost functions of the constrained VQE/VQD in numerical simulations. Since the real-valued symmetry-preserving type ansatz [19,25] preserves the number of electrons of the reference state (see Eq. (1)) and we take |ψ0 = |00001111 , we do not set the penalty onNe for H4. pute S 1 state (defined later). We adopt the real-valued symmetry-preserving type ansatz of the depth D = 12 introduced in Ref. [19,25] for H 4 to perform the constrained VQE/VQD. We do not include any noise for quantum circuit simulations, and exact expectation values are used in numerical simulations. All simulations are performed by a high-speed quantum circuit simulator Qulacs [32].
A. Numerical demonstration for analysis on F (1) We perform simulations to validate the analysis on F (1) (θ) in Sec. III A and investigate how the magnitude of penalty coefficient µ C affects the accuracy and the convergence of the optimization.
As for the target state of the constrained VQE/VQD, we consider the T 1 state (the ground state in the triplet sector) which corresponds toŜ 2 = 2 andŜ z = −1 and the S 1 state (the first excited state in the singlet sector) which corresponds toŜ 2 = 0 andŜ z = 0. Note that the ground state is the spin singlet (Ŝ 2 = 0 andŜ z = 0) state for both of H 2 and H 4 molecules. We set the penalty terms for some ofN ,Ŝ 2 and,Ŝ z depending on the molecule and the target state, which is summarized in TABLE I. We compute the energy of the S 1 (T 1 ) state as the ground state (first excited-state) energy of the constrained VQE (VQD) with these constraints.
First, we estimate µ  15)) for H 2 and H 4 molecules. We use the con-figuration interaction singles and doubles (CISD) method implemented in PySCF [29] to compute the classical estimations of the energies appearing in µ (ce) C . Note that we may use other less-costly methods for large molecules. The values of µ C are presented in TABLE II. As expected, µ (rough) C is much larger than µ (ce) C for all coefficients, which may cause the slow convergence of the optimization in the constrained VQE/VQE.
Next, by using the values of µ C in TABLE II, we numerically simulate the constrained VQE/VQD with the cost function F (1) (θ) (Eq. (5)). The optimization is performed by several optimizers (Powell, CG, and BFGS) implemented in SciPy, a numerical library of Python [33] with default parameters. We simulate the constrained VQE (VQD) to obtain the T 1 state (S 1 state) for both H 2 and H 4 molecule. When performing the constrained VQD for the S 1 state, we take a hyperparameter β 0 (see Eq. (4)) as β 0 = 2(E S1 − E 0 ), where E S1 and E 0 are energies of the S 1 state and the ground state, respectively. This choice of β 0 is based on the original VQD proposal [15] to assure the VQD cost function L VQD (θ) to bring out the excited state properly. In practice we also estimate the value of β 0 in the same way as we do for µ C , β The results for H 2 and H 4 are shown in TABLEs III and IV, respectively. Those results clearly indicate that the penalty coefficient µ C determined by our formulas in Sec. III A is sufficient for the cost function to choose the desired eigenstates. Moreover, we observe that the number of the function evaluations is much smaller for µ (ce) C than for µ (rough) C . The formula for µ (rough) C is convenient and always applicable as long as the Hamiltonian is known, but it may be inappropriate for the fast convergence in practical calculations. We note that the optimization result is slightly worse for Powell method possibly due to the imperfectness of the optimization, nevertheless the tendency of the small number of the function evaluations in µ (ce) C is well observed.
B. Numerical demonstration for analysis on F (2) and comparison of practical performance of two cost functions We perform the following two simulations to validiate our analysis on F (2) (θ) in Sec. III B and to compare the two cost functions F (1) (θ) and F (2) (θ).
(i) Computing the energy of T 1 state for H 4 by the  (14) and (15) for the target eigenstates of H2 and H4.
(ii) Computing the energy of S 1 state of H 2 using the constrained VQD under the constraintN e = 2 with µ C = 100. Since we set the constraint only forN e in this case, the S 1 state is obtained as the forth excited state of the constrained VQD (k = 4). The hyperparameters of the VQD {β i } 3 i=0 (see Eq. (4)) are all set to 3.0.
For both simulations, we use BFGS optimizer, which shows the best performance in Sec. IV A.
The results of the simulation (i) are shown in TA-BLE V. We perform the constrained VQE for ten different initial values and calculate the average of the expectation values of the Hamiltonian for the resulting optimized states. Unlike in the previous subsection, we calculate the average number of the "Pauli measurements" to assess the computational cost for the constrained VQE. The number of the Pauli measurements is defined as (the number of evaluations of the cost function during the optimization) × (the number of the Pauli operators to be measured to evaluate the cost function once). This number represents the actual computational cost to perform experiments on a real NISQ device [21].
From TABLE V, we can observe two facts. First, the accuracy ("residuals") of the optimization tends to improve with 1/µ C , which validates the discussion in Sec. III B. Second, more importantly from the viewpoint of practical applications, the number of the Pauli measurements for F (2) (θ) is smaller than that for F (1) (θ). This is mainly because the evaluation of F (1) (θ) involves the evaluation of the expectation value of the operator (Ĉ − c) 2 that consists of more Pauli operators than the originalĈ does. The evaluation of F (2) (θ) necessitates only the expectation value ofĈ itself. If we choose the best cases for F (1) (θ) (µ C = 0.01) and F (2) (θ) (µ C = 100), where both cost functions gives sufficiently accurate results, the number of the Pauli measurements still is smaller for F (2) (θ). This interesting observation indicates that even though F (2) (θ) cannot theoretically achieve an exact desired energy, it may achieve a sufficiently accurate energy with small computational costs for some cases.
Nevertheless, we also find a practical case where F (2) (θ) can never achieve a target energy even with H2 Triplet T1 H2 Singlet S1 µ The results of the numerical simulations for calculating the T1 state and S1 state of H2 by the constrained VQE/VQE with the cost function F (1) (θ) whose penalty coefficient is µ (ce) or µ (rough) . The optimizer is chosen from Powell, CG, and BFGS methods. "nfev" is the number of evaluations of the cost function during the optimization, and "residuals" represents the difference between the expectation value of the Hamiltonian for the resulting optimized state and the theoretical (or desired) energy in Hartree (Ha). The theoretical value for the T1 state is −0.532 479 Ha, and that of the S1 state is −0.169 901 Ha. We perform the constrained VQE/VQD for ten different initial values, and all values shown in this table are the average for the ten trials.
H4 Triplet T1 H4 Singlet S1 µ  infinitely-large µ C in the simulation (ii). The exact energy of the desired eigenstate (T 1 state) is −0.169 901 Ha in this case. The optimization result for F (1) (θ) gives −0.169 901 Ha but that for F (2) (θ) does −0.492 853 Ha, which is crucially far from the exact value. This result is obtained at µ C = 100, but we confirm that F (2) yields a completely wrong value for larger values of µ C . In this case, the target eigenstate is inside the convex envelope explained in Sec. III B and we simply cannot obtain that state by optimizing the cost function F (2) (θ).
To summarize, we numerically validate our theoretical analysis on F (2) (θ) in Sec. III B in practical quantum chemistry calculations. We find examples for both cases (A) and (B) in Sec. III B, where one can obtain the desired energy by the error of O(1/µ C ) (case (A)) or cannot obtain it even with infinitely large µ C (case (B)). Moreover, we find that sometimes the cost function F (2) (θ) can find an sufficiently-accurate energy with less computational cost than F (1) (θ), reflecting the difference in the amount of effort to evaluate the cost functions on NISQ devices. We stress that the use of F (2) (θ) may be preferable in some cases to reduce the computational cost, but whether we can obtain the target state as a result of the optimization cannot be guaranteed at all a priori.  Nmeas represents the number of the Pauli measurements necessary to perform the whole optimization process, which is computed as (the number of the evaluations of the cost function during the optimization) × (the number of Pauli operators to be measured to get the value of the cost function once). "residuals" represents the difference between the expectation value of the Hamiltonian for the resulting optimized state and that for the exact (desired) state in Hartree (Ha). The exact value for this case is −1.881 876 Ha. We perform the constrained VQE for ten different initial values, and all values shown in this table are the average for the ten trials.

V. CONCLUSION
In this work, we study two cost functions F (1) (θ) (Eq. (5)) and F (2) (θ) (Eq. (6)) of the constrained VQE/VQD, which can be exploited to compute eigenstates of the Hamiltonian of a given system that reside in the desired symmetry sector. Our theoretical analysis revealed that the minimization of the cost function F (1) (θ) can yield the desired state/energy when the penalty coefficient µ C is larger than a certain threshold (Eq. (11)), and we derived a simple and practical formula to estimate it (Eq. (13)). On the other hand, we proved that the exact desired state/energy cannot be obtained by minimizing the cost function F (2) (θ) and that we obtain completely wrong values in some cases. To validate these theoretical analyses, we performed several numerical simulations of the constrained VQE/VQD for H 2 and H 4 molecules. Our simulations validated the formula (13) for F (1) (θ) in practical quantum chemistry calculations and indicated that we should estimate the energy gap in the formula as accurately as possible to achieve the fast convergence of the optimization. Furthermore, we found an explicit example where the desired state/energy is never obtained by using the cost function F (2) (θ). Even though F (2) (θ) sometimes shows a better performance than F (1) (θ) in terms of the total number of the Pauli measurements required for the optimization, F (1) (θ) still serves as a better cost function of the constrained VQE/VQD because we can assure that the target state/energy is obtained as a result of the optimization.
Our results elucidate the fundamental difference between the performances of the two cost functions F (1) (θ) and F (2) (θ). The inconsistent and heuristic use of the two cost functions is resolved and will be terminated by showing the theoretical superiority of F (1) (θ) over F (2) (θ) and providing the formula (13) to determine the appropriate penalty coefficient. The proper choice of the penalty coefficient leads to the fast convergence of the optimization. Since we assume only the discreteness of the system, our findings apply to general quantum systems and lay the theoretical foundation for exploiting NISQ devices with the constrained VQE/VQD to solve large quantum systems that are classically intractable.
As future work, it is intriguing to study the effect of the noise, which is inevitable in NISQ devices, on the practical performance of the cost functions. Another interesting direction is to numerically investigate other symmetries that are not treated in this work, such as the translation symmetry or the point group symmetry.