Optimal Parameter Configurations for Sequential Optimization of Variational Quantum Eigensolver

Variational Quantum Eigensolver (VQE) is a hybrid algorithm for finding the minimum eigenvalue/vector of a given Hamiltonian by optimizing a parametrized quantum circuit (PQC) using a classical computer. Sequential optimization methods, which are often used in quantum circuit tensor networks, are popular for optimizing the parametrized gates of PQCs. This paper focuses on the case where the components to be optimized are single-qubit gates, in which the analytic optimization of a single-qubit gate is sequentially performed. The analytical solution is given by diagonalization of a matrix whose elements are computed from the expectation values of observables specified by a set of predetermined parameters which we call the parameter configurations. In this study, we first show that the optimization accuracy significantly depends on the choice of parameter configurations due to the statistical errors in the expectation values. We then identify a metric that quantifies the optimization accuracy of a parameter configuration for all possible statistical errors, named configuration overhead/cost or C-cost. We theoretically provide the lower bound of C-cost and show that, for the minimum size of parameter configurations, the lower bound is achieved if and only if the parameter configuration satisfies the so-called equiangular line condition. Finally, we provide numerical experiments demonstrating that the optimal parameter configuration exhibits the best result in several VQE problems. We hope that this general statistical methodology will enhance the efficacy of sequential optimization of PQCs for solving practical problems with near-term quantum devices.

The core question is how to model the PQC $U(\theta)$ and how to minimize $\langle H \rangle$ with a classical optimizer. This problem has been investigated extensively [14]. In particular, sequential optimization methods have been used in a variety of settings such as quantum circuit tensor networks [15][16][17][18][19], where $\theta$ corresponds to a set of local unitaries that are sequentially optimized one by one. In this paper, we focus on the special type of sequential optimization method developed in Refs. [20][21][22][23][24]. In this framework, $\theta$ are the parameters characterizing a set of single-qubit rotation gates, such as $R_y(\theta) = e^{i\theta Y}$ ($Y$ is the Pauli $y$ matrix) in the case of Rotosolve [20,21].

FIG. 1. A general view of how the estimated optimal solution varies depending on the parameter configuration when the cost function is determined with statistical errors.
The sequential optimization method then exactly optimizes the single rotation gates one by one. For example, consider the step where we optimize the $R_y(\theta)$ gate contained in the PQC shown in Fig. 1 by minimizing the cost $\langle H \rangle$ as a function of $\theta$. The point is that, in this case, $\langle H \rangle$ must be a sinusoidal function of $\theta$, and thus the optimal $\theta_{\rm opt}$ can be determined exactly once we identify the sinusoidal function shown by the black curve in the figure. In particular, because a sinusoidal function is fixed by three coefficients, the mean values of three observables corresponding to three points of $\theta$ suffice to identify $\langle H \rangle$ exactly; we call the arrangement of these three points of $\theta$ the parameter configuration. Note that, in the case of Free axis selection (Fraxis) [22], where the freedom of a single-qubit rotation gate lies in the rotation axis in the Bloch sphere with a fixed rotation angle, $\langle H \rangle$ takes the form of a quadratic function of a real normalized vector $n = (x, y, z)^T$, which can also be minimized exactly. This setup was further generalized to Free Quaternion Selection (FQS) [23,24] so that the rotation angle can also be tuned; then $\langle H \rangle$ takes the form of a quadratic function of a real normalized vector $q = (w, x, y, z)^T$. In this case, as shown later, the mean values of 10 observables corresponding to 10 points of $q$ identify $\langle H \rangle$; we also call this set $\{q_1, \ldots, q_{10}\}$ the parameter configuration.
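The three-point identification described above can be sketched as follows. This is only a minimal illustration, assuming the gate is parametrized so that the energy has period $2\pi$ in $\theta$, i.e. $\langle H \rangle(\theta) = a + b\cos\theta + c\sin\theta$. The function names and the toy energy curve are hypothetical and are not taken from Refs. [20,21].

```python
import numpy as np

def fit_sinusoid(thetas, values):
    """Recover (a, b, c) of a + b*cos(t) + c*sin(t) from three sampled points."""
    thetas = np.asarray(thetas, dtype=float)
    design = np.column_stack([np.ones_like(thetas), np.cos(thetas), np.sin(thetas)])
    return np.linalg.solve(design, np.asarray(values, dtype=float))

def minimize_sinusoid(a, b, c):
    """Return (theta_opt, minimum value) of a + b*cos(t) + c*sin(t)."""
    theta_opt = np.arctan2(-c, -b)  # angle pointing opposite to the (b, c) vector
    return theta_opt, a - np.hypot(b, c)

# Hypothetical noiseless energy curve standing in for <H>(theta).
def toy_energy(theta):
    return 0.3 + 0.8 * np.cos(theta - 1.1)

# Parameter configuration: three equally spaced angles.
config = np.array([0.0, 2 * np.pi / 3, -2 * np.pi / 3])
a, b, c = fit_sinusoid(config, toy_energy(config))
theta_opt, e_min = minimize_sinusoid(a, b, c)
```

With exact expectation values any three distinct angles recover the same curve; the choice of configuration only matters once the sampled values are noisy.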
However, this optimization strategy relies on the critical assumption that the mean values of observables, and accordingly $\langle H \rangle$, are identified exactly. In reality, those mean values can only be obtained approximately as the average of a finite number of measurement results; that is, in practice there is always a statistical error in $\langle H \rangle$. In the above one-dimensional case, as illustrated in Fig. 1, the energy curve, $\theta_{\rm opt}$, and consequently the minimum value of $\langle H \rangle$ may all fluctuate considerably depending on the parameter configuration. Hence the question is which parameter configuration achieves the smallest fluctuation of $\min \langle H \rangle$. In the above one-dimensional case, intuition suggests that the best configuration has three equally spaced (i.e., equidistant) parameters, as shown in the bottom left of Fig. 1, which is indeed true as proven later. The general case, however, is nontrivial: is the best configuration again equidistant in some sense, or is it a biased one?
In this paper, we develop the theory for determining the optimal parameter configuration. As a preliminary result, in Sec. II, we prove that, if the exact expectation values are available without any statistical error, then the best parameters achieving $\min \langle H \rangle$ are obtained analytically (almost) regardless of the parameter configuration for every method of [20][21][22][23][24]. Then, in Sec. III, we give the most essential result, which provides the basis of the theory; that is, we derive the explicit form of the fluctuation of $\min \langle H \rangle$ under statistical errors as a function of the parameter configuration. This enables us to introduce the C-cost (configuration cost/overhead), a metric for the accuracy of the estimated $\min \langle H \rangle$ that thereby provides the optimal parameter configuration. Section IV then gives numerical experiments demonstrating that the optimal parameter configurations obtained using the C-cost yield the best result in the sense of the statistical error of estimating $\langle H \rangle$.
Notably, beyond its use for numerically determining the configuration, the C-cost satisfies several interesting mathematical properties, suggesting the relevance of this metric. The first is that the lower bound of the C-cost is 1; moreover, we prove that, for the minimum size of the parameter set, this bound is achievable if and only if the parameter configuration satisfies a geometric condition called the equiangular line condition, an important and beautiful mathematical concept in algebraic graph theory. Here, each parameter $q$ corresponds to the line that passes through the origin and $q$. This condition rigorously supports the intuition described above that the parameters should be equally spaced in the Rotosolve case shown in Fig. 1 or Fig. 2A; this intuition also holds for Fraxis, where there is a unique parameter configuration (up to a global rotation) satisfying the equiangular line condition, as displayed in Fig. 2B. Interestingly, however, this intuition does not apply to the most general FQS case, because 10 equiangular lines do not exist in $\mathbb{R}^4$. That is, the so-called Gerzon bounds [25], Neumann theorem [26], and Haantjes bound [27] prove that there is no set of 10 lines satisfying the equiangular line condition in $\mathbb{R}^4$; the maximum number of such lines is 6. Nevertheless, the C-cost is still useful in this case, as it gives us a means to numerically obtain the optimal parameter configuration, which is displayed in Fig. 2C. Furthermore, if redundant measurements are allowed, there exist parameter configurations that achieve the theoretical lower bound of the C-cost, one of which is illustrated in Fig. 2D.
Finally, we note that equiangular lines in complex spaces are equivalent to symmetric, informationally complete (SIC) POVMs [28], whose properties have been much studied; e.g., it is conjectured that there is always a set of $d^2$ equiangular lines in $\mathbb{C}^d$ [29] (this has been proven up to some large $d$ theoretically and numerically). The SIC POVMs defined from such lines are informationally complete because the results of other measurements can be computed from those of the SIC POVMs. In this study, we obtain similar results connecting equiangular lines in real spaces with variational quantum circuits using parametrized single-qubit gates.

II. ENERGY MINIMIZATION WITH MATRIX FACTORIZATION
A. Brief review of Rotosolve, Fraxis, and FQS
The FQS method [24] describes the procedure to completely characterize the energy landscape with respect to a single-qubit gate in a PQC. The parametrized single-qubit gate, which we call the FQS gate, is none other than the general single-qubit gate $U^{(4)}$ expressed as [24,30]
$$U^{(4)}(q) = wI - ixX - iyY - izZ = q \cdot \varsigma, \tag{1}$$
where the superscript indicates the number of parameters: $q = (w, x, y, z)^T \in \mathbb{R}^4$ satisfying $\|q\|^2 = 1$. Here, $i$ is the imaginary unit, $I$ is the $2 \times 2$ identity matrix, and $X, Y, Z$ are the Pauli matrices. $\varsigma = (\varsigma_I, \varsigma_X, \varsigma_Y, \varsigma_Z)^T$ denotes an extension of the Pauli matrices defined as
$$\varsigma = (I, -iX, -iY, -iZ)^T. \tag{2}$$
The dimension of the parameter $q$ is four, but since $q$ is constrained to the unit hypersphere, the parameter has three degrees of freedom.
In Fraxis, the rotation angle is constrained to $\pi$, which corresponds to the case $w = 0$ of Eq. (1):
$$U^{(3)}(n) = -ixX - iyY - izZ, \tag{3}$$
where the parameter of the gate is $n = (x, y, z)^T$ such that $\|n\|^2 = 1$. We term this $U^{(3)}$ the Fraxis gate. Thus, the Fraxis gate has two degrees of freedom.
In Rotosolve, the rotation axis is fixed and the rotation angle serves as the parameter. In particular, the Rx gate fixes the rotation axis to the $x$-axis; in the form of Eq. (1), this corresponds to $y = z = 0$ and thus
$$U^{(2)}(r) = wI - ixX, \tag{4}$$
where the parameter of the gate is $r = (w, x)^T$ such that $\|r\|^2 = 1$. Thus, the Rx gate has one degree of freedom. Similarly, the Ry and Rz gates are obtained by replacing $X$ in Eq. (4) with $Y$ and $Z$, respectively.
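In what follows we use the most general FQS gate to describe the optimization algorithm. The sequential optimization method updates the FQS gates one at a time in a coordinate-wise manner, where all parameters are fixed except for those of the focused FQS gate $U^{(4)}(q)$. The entire quantum circuit containing FQS gates is supposed to be the PQC $V = \prod_i U_i^{(4)}(q_i) W_i$ on the $n$-qubit system, where $U_i^{(4)}$ is the $i$th FQS gate and $W_i$ is a fixed multi-qubit gate. Now, let $V_1$ and $V_2$ be the gates placed before and after the focused FQS gate $U^{(4)}(q)$. Then, the density matrix $\rho$ prepared by the PQC is expressed as
$$\rho = V_2 U^{(4)}(q) V_1 \rho_{\rm in} V_1^\dagger U^{(4)}(q)^\dagger V_2^\dagger, \tag{5}$$
where $\rho_{\rm in}$ is an input density matrix. The expectation value $\langle H \rangle$ of a given Hamiltonian $H$ with respect to $\rho$ is then
$$\langle H \rangle = {\rm tr}(H\rho) = {\rm tr}\left(H' U^{(4)}(q) \rho' U^{(4)}(q)^\dagger\right), \tag{6}$$
where $\rho' = V_1 \rho_{\rm in} V_1^\dagger$ and $H' = V_2^\dagger H V_2$. Substituting Eq. (1) into Eq. (6) yields
$$\langle H \rangle = q^T G^{(4)} q, \tag{7}$$
where $G^{(4)}$ is a $4 \times 4$ real symmetric matrix:
$$G^{(4)} = \begin{pmatrix} G_{II} & G_{IX} & G_{IY} & G_{IZ} \\ G_{IX} & G_{XX} & G_{XY} & G_{XZ} \\ G_{IY} & G_{XY} & G_{YY} & G_{YZ} \\ G_{IZ} & G_{XZ} & G_{YZ} & G_{ZZ} \end{pmatrix}, \tag{8}$$
and each element $G_{\mu\nu}$ ($\mu, \nu = I, X, Y, Z$) is defined by
$$G_{\mu\nu} = \frac{1}{2} {\rm tr}\left[\rho' \left(\varsigma_\mu^\dagger H' \varsigma_\nu + \varsigma_\nu^\dagger H' \varsigma_\mu\right)\right]. \tag{9}$$
Thus the energy landscape with respect to the FQS gate is completely characterized by the matrix $G^{(4)}$. Because Eq. (7) is a quadratic form in the parameter $q$ with the constraint $\|q\|^2 = 1$, the eigenvector $p_1$ associated with the lowest eigenvalue $\lambda_1$ of the matrix $G^{(4)}$ minimizes the energy (7); see Appendix 1 for the details.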
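In the following, we call the matrix $G^{(4)}$ the FQS matrix. Note that the above result can be directly extended to the cases of Fraxis and Rotosolve (Rx gate), in which Eq. (8) is replaced by
$$G^{(3)} = \begin{pmatrix} G_{XX} & G_{XY} & G_{XZ} \\ G_{XY} & G_{YY} & G_{YZ} \\ G_{XZ} & G_{YZ} & G_{ZZ} \end{pmatrix} \tag{10}$$
and
$$G^{(2)} = \begin{pmatrix} G_{II} & G_{IX} \\ G_{IX} & G_{XX} \end{pmatrix}, \tag{11}$$
respectively.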
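The statement that the eigenvector of the lowest eigenvalue minimizes the constrained quadratic form can be checked numerically. The sketch below uses a random symmetric matrix as a hypothetical stand-in for an FQS matrix (it is not computed from any circuit), and the function name is illustrative only.

```python
import numpy as np

def minimize_quadratic_form(G):
    """Return (lambda_1, p_1): smallest eigenvalue of symmetric G and its unit eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(G)  # eigh returns eigenvalues in ascending order
    return eigvals[0], eigvecs[:, 0]

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
G = (B + B.T) / 2  # hypothetical 4 x 4 real symmetric matrix

lam1, p1 = minimize_quadratic_form(G)

# Compare against the energies q^T G q of many random unit vectors q.
qs = rng.normal(size=(1000, 4))
qs /= np.linalg.norm(qs, axis=1, keepdims=True)
energies = np.einsum("ni,ij,nj->n", qs, G, qs)
```

No sampled unit vector attains an energy below `lam1`, and `p1` attains it exactly.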

B. FQS with arbitrary parameter configurations
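Since $G^{(4)}$ is a real symmetric matrix, we can expand Eq. (7) in the following form:
$$\langle H \rangle = w^2 G_{II} + x^2 G_{XX} + y^2 G_{YY} + z^2 G_{ZZ} + 2wx G_{IX} + 2wy G_{IY} + 2wz G_{IZ} + 2xy G_{XY} + 2xz G_{XZ} + 2yz G_{YZ}. \tag{12}$$
Eq. (12) indicates that, if we know all 10 coefficients $(G_{II}, \ldots, G_{YZ})$, we can exactly evaluate the expectation value $\langle H \rangle$ for any parameter $q$. In other words, only algebraic calculations on classical computers are required to find the parameters achieving the minimum expectation value for the target gate. Therefore, it is important to obtain the coefficients with as few measurements as possible. To consider this problem, we define the function $h^{(4)}(q)$ that outputs the normalized vector ($\|h^{(4)}(q)\| = 1$):
$$h^{(4)}(q) = \left(w^2, x^2, y^2, z^2, \sqrt{2}wx, \sqrt{2}wy, \sqrt{2}wz, \sqrt{2}xy, \sqrt{2}xz, \sqrt{2}yz\right)^T, \tag{13}$$
and the vector
$$g^{(4)} = \left(G_{II}, G_{XX}, G_{YY}, G_{ZZ}, \sqrt{2}G_{IX}, \sqrt{2}G_{IY}, \sqrt{2}G_{IZ}, \sqrt{2}G_{XY}, \sqrt{2}G_{XZ}, \sqrt{2}G_{YZ}\right)^T. \tag{14}$$
Then, the relation between the parameter $q$ and the expectation value $\langle H \rangle$ is expressed as
$$\langle H \rangle = h^{(4)}(q)^T g^{(4)}. \tag{15}$$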
Suppose that measurements with $N$ different parameters $\{q_1, \ldots, q_N\}$ yield the $N$ expectation values $b = (b_1, \ldots, b_N)^T$. The relation between the expectation values $b$ and the coefficient vector $g^{(4)}$ can then be written as
$$b = A^{(4)} g^{(4)}, \tag{16}$$
where the matrix
$$A^{(4)} = \left(h^{(4)}(q_1), \ldots, h^{(4)}(q_N)\right)^T \tag{17}$$
encodes the information of the parameter configuration $\{q_1, \ldots, q_N\}$.
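The linear relation between the measured expectation values and the coefficient vector can be sketched as follows. The sketch assumes that the cross terms of $h(q)$ and $g$ carry a factor $\sqrt{2}$ (so that $h$ is normalized) and that they are ordered as $IX, IY, IZ, XY, XZ, YZ$. The random matrix and the random configuration are hypothetical stand-ins, and `h_vec` and `g_vec` are illustrative names.

```python
import numpy as np
from itertools import combinations

def h_vec(q):
    """Squares of the entries of q followed by sqrt(2)-scaled pairwise products."""
    q = np.asarray(q, dtype=float)
    cross = [np.sqrt(2) * q[i] * q[j] for i, j in combinations(range(len(q)), 2)]
    return np.concatenate([q**2, cross])

def g_vec(G):
    """Diagonal of symmetric G followed by sqrt(2)-scaled upper off-diagonal entries."""
    cross = [np.sqrt(2) * G[i, j] for i, j in combinations(range(len(G)), 2)]
    return np.concatenate([np.diag(G), cross])

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
G_true = (B + B.T) / 2  # hypothetical FQS matrix

# Hypothetical (non-optimal) configuration of 10 random unit vectors in R^4.
qs = rng.normal(size=(10, 4))
qs /= np.linalg.norm(qs, axis=1, keepdims=True)

A = np.array([h_vec(q) for q in qs])        # 10 x 10 configuration matrix
b = np.array([q @ G_true @ q for q in qs])  # exact expectation values
g_rec = np.linalg.solve(A, b)               # recover the 10 coefficients
```

Here `A @ g_vec(G_true)` reproduces `b`, and solving the linear system returns the coefficients of `G_true`.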
It is obvious that $g^{(4)}$ is not uniquely determined if $N$ is smaller than 10. Hence, we suppose $N \geq 10$ throughout this paper. If ${\rm rank}(A) = 10$, $A^T A$ is invertible and the generalized inverse $A^+ := (A^T A)^{-1} A^T$ exists [31]. Accordingly, we can obtain the vector $g^{(4)}$ by exactly solving the linear equations as
$$g^{(4)} = A^+ b. \tag{18}$$
In other words, a single execution of FQS requires at least ten sets of parameters and the corresponding observables. Fewer may suffice, however, when the input states and/or the Hamiltonian have a symmetry that reduces the number of measurements required to construct $G^{(4)}$ in Eq. (8). We also note that ${\rm rank}(A)$ can fall below 10 if the columns of $A$ are linearly dependent. It is reasonable to exclude this situation, because the input parameters are controllable. Hereafter, we suppose that all columns of $A$ are independent of each other, or equivalently, ${\rm rank}(A) = 10$. The same argument is applicable to the Fraxis gate with
$$h^{(3)}(n) = \left(x^2, y^2, z^2, \sqrt{2}xy, \sqrt{2}xz, \sqrt{2}yz\right)^T, \quad g^{(3)} = \left(G_{XX}, G_{YY}, G_{ZZ}, \sqrt{2}G_{XY}, \sqrt{2}G_{XZ}, \sqrt{2}G_{YZ}\right)^T.$$
Likewise, for Rx gates,
$$h^{(2)}(r) = \left(w^2, x^2, \sqrt{2}wx\right)^T, \quad g^{(2)} = \left(G_{II}, G_{XX}, \sqrt{2}G_{IX}\right)^T.$$
The minimum sizes of the parameter configuration required to construct $G^{(d)}$ are therefore 10, 6, and 3 for FQS, Fraxis, and Rx gates, respectively.

If an infinite number of measurements were allowed, there would be no estimation errors in the expectation values $b$, and the resulting vector $g$ would be obtained exactly as long as the matrix $A$ has full column rank. This allows for the exact evaluation of the optimal solution of the FQS matrix. In this section, we quantitatively evaluate the error propagation from the shot noise in the expectation values $b$ to the estimation of the minimum solution. Although we focus on FQS for generality, the analysis is easily applied to the other sequential quantum optimizers, Rotosolve and Fraxis. Suppose an FQS matrix is estimated from $N$ expectation values of an observable, which are obtained by independent measurements with different parameters $\{q_1, \ldots, q_N\}$ assigned to the gate of interest. Due to the finite number of shots, the expectation values are no longer deterministic, but are randomly distributed around the true values $b^*$ obtained with infinite shots as
$$b = b^* + \epsilon, \tag{25}$$
where $\epsilon = (\epsilon_1, \ldots, \epsilon_N)^T$ is the vector of random variables reflecting the errors on the measurements. Note that the relation between $b$ and $g$ in Eq. (16) no longer holds exactly with a finite number of measurements. Instead, we employ the least-squares solution
$$\hat{g} = \arg\min_{g} \|A g - b\|^2 = A^+ b \tag{26}$$
as a plausible estimate of $g^*$. Eq. (26) has the same form as Eq. (18), but the resulting vector $\hat{g}$ is an estimate of the true vector $g^*$ in the sense of maximum likelihood [32] and deviates from the ideal vector $g^*$ due to the errors of finite measurements. Substituting Eq. (25) into Eq. (26), we get
$$\hat{g} = A^+ b = A^+ (b^* + \epsilon) = g^* + A^+ \epsilon, \tag{27}$$
where the third equality follows from $g^* = A^+ b^*$. Eq. (27) implies that the error of the estimated coefficient vector, $\hat{g} - g^* = A^+ \epsilon$, is the shot error $\epsilon$ amplified by the linear transformation $A^+$.
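The error-propagation identity of Eq. (27) holds for any matrix $A$ with full column rank, which the following minimal sketch checks. For brevity it uses a random Gaussian matrix as a stand-in for $A$ (not a genuine parameter configuration) and Gaussian noise with variance $1/s$ as a hypothetical shot-noise model.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_coeff = 12, 10

A = rng.normal(size=(N, n_coeff))  # stand-in full-column-rank matrix (assumption)
g_true = rng.normal(size=n_coeff)  # plays the role of g*
b_true = A @ g_true                # plays the role of b*

shots = 1000
eps = rng.normal(scale=1 / np.sqrt(shots), size=N)  # hypothetical shot noise
b_noisy = b_true + eps

# The least-squares estimate coincides with applying the pseudo-inverse.
g_hat, *_ = np.linalg.lstsq(A, b_noisy, rcond=None)
A_pinv = np.linalg.pinv(A)
```

The difference `g_hat - g_true` equals `A_pinv @ eps`, so the size of the estimation error is controlled entirely by the configuration through $A^+$.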
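Let $\hat{G}$ be an FQS matrix generated from the vector $\hat{g}$ estimated with a finite number of measurements. Below, we focus on the FQS procedure for estimating the minimum eigenvalue of $\hat{G}$. Here, for convenience, we define the half-vectorization function vech, which maps a $4 \times 4$ symmetric matrix to a 10-dimensional vector:
$${\rm vech}(G) = \left(G_{II}, G_{XX}, G_{YY}, G_{ZZ}, G_{IX}, G_{IY}, G_{IZ}, G_{XY}, G_{XZ}, G_{YZ}\right)^T,$$
where the order of the elements corresponds to $g$. In addition, the scaling matrix $D$ is defined as
$$D = {\rm diag}\left(1, 1, 1, 1, \sqrt{2}, \sqrt{2}, \sqrt{2}, \sqrt{2}, \sqrt{2}, \sqrt{2}\right).$$
Using these notations, we have the relations
$$g = D\, {\rm vech}(G), \quad G = {\rm vech}^{-1}(D^{-1} g),$$
where ${\rm vech}^{-1}$ is the linear mapping that places the elements of its argument back into the corresponding entries of a symmetric matrix. Accordingly, $\hat{G}$ is expressed as
$$\hat{G} = {\rm vech}^{-1}(D^{-1} \hat{g}) = G^* + {\rm vech}^{-1}(D^{-1} A^+ \epsilon),$$
which implies that the ideal FQS matrix $G^*$ is perturbed by a term linear in the shot error $\epsilon$. In the following, we quantitatively evaluate the effect of this matrix perturbation on the optimization result. Let $\lambda_i^*$ and $p_i^*$ be the $i$th lowest eigenvalue and the corresponding eigenvector of $G^*$. Likewise, $\lambda_i(\epsilon)$ and $p_i(\epsilon)$ are the $i$th lowest eigenvalue and the corresponding eigenvector of the estimated matrix $\hat{G}$. For a quantitative evaluation of the perturbation, we consider two metrics: (1) ${\rm Var}[\lambda_1(\epsilon)]$, the variance of the estimated minimum value, and (2) $E[\Delta E]$, the mean error of the expectation value that would be obtained with infinite shots using the estimated optimal parameters. Here, $\Delta E$ is the deviation of the expectation value with the estimated parameter set $p_1(\epsilon)$ from the true minimum expectation value, defined as
$$\Delta E = p_1(\epsilon)^T G^* p_1(\epsilon) - p_1^{*T} G^* p_1^* \geq 0,$$
where the nonnegativity of $\Delta E$ comes from the fact that the true parameter set $p_1^*$ gives the minimum value of the quadratic form. We regard ${\rm Var}[\lambda_1(\epsilon)]$ as a measure of the quality of the energy $\lambda_1$ estimated by a one-time execution of FQS, while $E[\Delta E]$ is a measure of the quality of the estimated parameter $p_1$. Throughout the following parts, for simplicity, we employ ${\rm Var}[\lambda_1]$ as the indicator of shot errors (see Appendix 2 c for $E[\Delta E]$).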
Since $\hat{G}$ is a $4 \times 4$ symmetric matrix, it is represented by eigendecomposition as
$$\hat{G} = P^T \Lambda P,$$
where $P = (p_1, \ldots, p_4)^T$ and $\Lambda = {\rm diag}(\lambda_1, \ldots, \lambda_4)$. From first-order perturbation theory [33], the minimum eigenvalue $\lambda_1$ of $\hat{G}$ is approximated as
$$\lambda_1 \approx \lambda_1^* + p_1^{*T} (\hat{G} - G^*) p_1^* = \lambda_1^* + h(p_1^*)^T A^+ \epsilon.$$
Then, ${\rm Var}[\lambda_1]$ is evaluated as
$${\rm Var}[\lambda_1] \approx E\left[\left(h(p_1^*)^T A^+ \epsilon\right)^2\right] = h(p_1^*)^T A^+ E[\epsilon \epsilon^T] (A^+)^T h(p_1^*). \tag{35}$$
To deal with Eq. (35), we apply a simple model in which the measurement errors satisfy
$$E[\epsilon] = 0, \quad E[\epsilon \epsilon^T] = \frac{\sigma^2}{s} I,$$
where $s$ denotes the number of measurement shots used to evaluate an expectation value of the observables and $\sigma^2$ is a factor specific to the observables.
In addition, we assume that the first eigenvector $p_1^*$ follows a uniform distribution on the unit sphere. Based on these models, Eq. (35) can be further calculated as
$${\rm Var}[\lambda_1] \approx \frac{\sigma^2}{s\, d(d+2)} \left(2\, {\rm Tr}\left[(A^T A)^{-1}\right] + 1_d^T (A^T A)^{-1} 1_d\right), \tag{38}$$
where $d = \dim(q)$ (4 for FQS, 3 for Fraxis, and 2 for Rx) and $1_d$ is the vector whose first $d$ elements are unity and whose other elements are zero (e.g., $1_d = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0)^T$ for FQS). The derivation of Eq. (38) is detailed in Appendix 2 b. Since we focus on the optimization performance, it is convenient to discuss the total number of shots required for a one-time optimization rather than the cost of evaluating a single expectation value. Suppose the total number of shots for a one-time optimization is constant. Let $s_{\min}$ be the number of measurement shots used to estimate an expectation value of the observable when $N = N_{\min}$, where $N_{\min} := d(d+1)/2$ is the minimum size of the parameter configuration. For a redundant parameter configuration with $N > N_{\min}$, the number of shots for evaluating an expectation value is $s_{\min} N_{\min}/N$. As a result,
$${\rm Var}[\lambda_1] \approx \frac{\sigma^2}{s_{\min}} C(A), \tag{39}$$
where we define the C-cost (configuration cost), $C(A)$, as
$$C(A) := \frac{N}{N_{\min}\, d(d+2)} \left(2\, {\rm Tr}\left[(A^T A)^{-1}\right] + 1_d^T (A^T A)^{-1} 1_d\right). \tag{40}$$
Equation (39) indicates that ${\rm Var}[\lambda_1]$ separates into a part that depends on the number of shots ($s_{\min}$) and a part that depends on the parameter configuration; i.e., a 50% reduction of $C(A)$ is equivalent to doubling the number of shots. The C-cost is thus a metric for estimating ${\rm Var}[\lambda_1]$ under the condition that the number of shots for optimizing a single-qubit gate is constant. The conditions that minimize $C(A)$ are therefore of interest for minimizing the estimation error. We rigorously give the lower bound of the C-cost in the following theorem (see Appendix 3 for the proof):

Theorem 1. For the C-cost $C(A)$ in Eq. (40), $C(A) \geq 1$ holds, with equality if and only if the parameter configuration $\{q_i\}_{i=1}^N$ satisfies
$$A^T A = \frac{N}{d(d+2)} \left(2I + 1_d 1_d^T\right). \tag{41}$$

In other words, a parameter configuration that satisfies Eq. (41) is optimal with respect to efficiency. Although it may not be straightforward to find optimal parameter sets that satisfy Eq. (41), in the case of the minimum parameter set ($N = N_{\min}$) a useful formula is available as the following corollary of Theorem 1 (see Appendix 3 for the proof).

Corollary 1. For the minimum number of parameters ($N = N_{\min}$), the C-cost $C(A)$ in Eq. (40) always satisfies $C(A) \geq 1$, with equality if and only if the parameter configuration $\{q_i\}_{i=1}^N$ satisfies
$$|q_i^T q_j| = \frac{1}{\sqrt{d+2}} \quad \text{for all } i \neq j.$$

The equality condition in Corollary 1 tells us that the parameters must be equiangular unit vectors. This equiangular property is known as equiangular lines in real spaces [25,[34][35][36], which is equivalent to the algebraic graph theory of two-graphs [37]. The number of equiangular lines in $\mathbb{R}^d$ is at most $d(d+1)/2 = N_{\min}$, which is known as the Gerzon bound, and so far this bound has been shown to be attained only for $d = 2, 3, 7, 23$. For our optimal parameter configurations, only in the cases of the Rx and Fraxis gates ($d = 2, 3$) does there exist a unique set of $N_{\min}$ equiangular unit vectors (up to rotation), and such a parameter configuration uniquely achieves the minimum value of the C-cost $C(A)$. The non-existence of such an optimal parameter configuration for the FQS gate ($d = 4$) is due to the non-existence of equiangular lines satisfying the condition of Corollary 1, a result attributed to Haantjes [27] and Neumann in [25] (see also [26]).
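The C-cost can be evaluated directly from a parameter configuration. The sketch below assumes that the C-cost takes the form $C(A) = \frac{N}{N_{\min} d(d+2)}\left(2\,{\rm Tr}[(A^TA)^{-1}] + 1_d^T (A^TA)^{-1} 1_d\right)$ and that an Rx parameter is $r = (\cos(t/2), \sin(t/2))$ for gate angle $t$. The helper names `h_vec`, `c_cost`, and `rx_config` are illustrative.

```python
import numpy as np
from itertools import combinations

def h_vec(q):
    """Squares of the entries of q followed by sqrt(2)-scaled pairwise products."""
    q = np.asarray(q, dtype=float)
    cross = [np.sqrt(2) * q[i] * q[j] for i, j in combinations(range(len(q)), 2)]
    return np.concatenate([q**2, cross])

def c_cost(qs):
    """C-cost of a configuration given as rows of unit vectors (assumed formula)."""
    qs = np.asarray(qs, dtype=float)
    n_points, d = qs.shape
    n_min = d * (d + 1) // 2
    A = np.array([h_vec(q) for q in qs])
    s_inv = np.linalg.inv(A.T @ A)
    ones_d = np.zeros(n_min)
    ones_d[:d] = 1.0  # first d entries are one, the rest zero
    value = 2 * np.trace(s_inv) + ones_d @ s_inv @ ones_d
    return n_points * value / (n_min * d * (d + 2))

def rx_config(delta):
    """Three Rx parameters at gate angles 0, +delta, -delta."""
    return [(np.cos(t / 2), np.sin(t / 2)) for t in (0.0, delta, -delta)]

c_equiangular = c_cost(rx_config(2 * np.pi / 3))  # equally spaced angles
c_right_angle = c_cost(rx_config(np.pi / 2))      # spacing of pi/2
```

Under these assumptions the equally spaced configuration attains the lower bound of 1, while a spacing of $\pi/2$ gives 3/2.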

B. The Rotation Invariance of C-cost.
The C-cost $C(A)$ in Eq. (40) is invariant under a rotation of the whole parameter configuration. In other words, a parameter configuration $(q_1, \ldots, q_N)$ and its rotated configuration $(Rq_1, \ldots, Rq_N)$ have the same value of the C-cost, where $R \in \mathbb{R}^{d \times d}$ is a rotation matrix ($R^T R = I$). See Appendix 4 for the proof of rotation invariance. This implies that, for any parameter $q$ of a single-qubit gate of interest, there exists an optimal parameter configuration $\{q_i\}$ such that $q \in \{q_i\}$. This property reduces by one the total number of measurements required in the matrix construction, i.e., to two for Rotosolve, five for Fraxis, and nine for FQS, by reusing the result of the previous gate update in the subsequent one. This reduction was already known for Rotosolve [21], but not for Fraxis and FQS. In each step of the sequential optimization, the resulting cost value after the parameter update can be estimated without additional measurement. Since all parameters are fixed except for that of the target gate, this estimated cost can be regarded as one of the observable expectation values, $b_1$, in the subsequent application, where the parameter $q_1$ is set to the current parameter of the next gate of interest.

Algorithm 1 Algorithm to reuse optimization results of the previous gate
Input: The parameter $q_{\rm pre}$ of the target gate, the estimated minimum eigenvalue $\lambda_{\rm pre}$ in the previous FQS, and the optimal parameter configuration $\{q_1^*, \ldots, q_N^*\}$.
Output: The optimized parameter $q_{\rm opt}$ of the target gate and the updated cost $\lambda$.
1: Find a rotation matrix $R$ such that $q_{\rm pre} = R q_1^*$
2: Set $b_1 = \lambda_{\rm pre}$ (instead of measuring $b_1 = \langle H \rangle(R q_1^*)$)
3: for $i = 2$ to $N$ do
4:   Measure $b_i = \langle H \rangle(R q_i^*)$
5: Construct $\hat{G}$ from $b$ and $\{R q_i^*\}$
6: Diagonalize $\hat{G}$ and obtain the minimum eigenvalue $\lambda$ and the corresponding eigenvector $q_{\rm opt}$
7: Return $q_{\rm opt}$ and $\lambda$
The detailed procedure is as follows:
(1) Prepare an optimal parameter configuration $\{q_i^*\}$, the gate parameter set $\{q^{(m)}\}$ for $m = 1, \ldots, M$, and the current cost value $\langle H \rangle(\{q^{(m)}\})$, where $m$ and $M$ denote the gate index and the total number of parametrized gates, respectively.
(2) Find a rotation matrix $R$ such that $q_1^* = R^T q^{(m)}$, where the $m$th gate is of interest, and set $b_1 = \langle H \rangle$.
(3) Measure the cost values with the parameters $R q_i^*$ for $i = 2, \ldots, N$.
(4) Construct the matrix $\hat{G}$ from $b$ and $\{R q_i^*\}$.
(5) Diagonalize the matrix to estimate the new parameter $q^{(m)}$ and the new cost $\langle H \rangle$, which can be reused in the next iteration, and go back to (2) until convergence.
The pseudo-code of this procedure is given in Algorithm 1.
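The procedure above can be sketched in code as follows. This is a hedged illustration rather than the authors' implementation: the rotation is built from two Householder reflections (one possible choice of $R$), `measure` is a hypothetical callback standing in for a quantum measurement, and the configuration used in the demo is random rather than optimal.

```python
import numpy as np
from itertools import combinations

def h_vec(q):
    """Squares of the entries of q followed by sqrt(2)-scaled pairwise products."""
    cross = [np.sqrt(2) * q[i] * q[j] for i, j in combinations(range(len(q)), 2)]
    return np.concatenate([np.asarray(q, dtype=float) ** 2, cross])

def unvech(g, d):
    """Rebuild the symmetric d x d matrix from its scaled coefficient vector g."""
    G = np.diag(g[:d]).astype(float)
    for k, (i, j) in enumerate(combinations(range(d), 2)):
        G[i, j] = G[j, i] = g[d + k] / np.sqrt(2)
    return G

def rotation_between(a, b):
    """Rotation R with R @ a = b from two Householder reflections (assumes a != -b)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    u = a + b
    eye = np.eye(len(a))
    return (eye - 2 * np.outer(u, u) / (u @ u)) @ (eye - 2 * np.outer(a, a))

def reuse_update(q_pre, lam_pre, config, measure):
    """One gate update that reuses lam_pre in place of the first measurement."""
    R = rotation_between(config[0], q_pre)                       # step 1
    rotated = config @ R.T                                       # rows are R q_i*
    b = np.array([lam_pre] + [measure(q) for q in rotated[1:]])  # steps 2 to 4
    g = np.linalg.pinv(np.array([h_vec(q) for q in rotated])) @ b
    lams, vecs = np.linalg.eigh(unvech(g, len(q_pre)))           # steps 5 and 6
    return vecs[:, 0], lams[0]

# Noiseless demo with a hypothetical energy landscape q^T G_true q.
rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4))
G_true = (B + B.T) / 2
config = rng.normal(size=(10, 4))
config /= np.linalg.norm(config, axis=1, keepdims=True)
q_pre = rng.normal(size=4)
q_pre /= np.linalg.norm(q_pre)

calls = []
def measure(q):
    calls.append(q)
    return q @ G_true @ q

q_opt, lam = reuse_update(q_pre, q_pre @ G_true @ q_pre, config, measure)
```

In this noiseless demo only nine calls to `measure` are made for the FQS case, and the returned `lam` matches the exact minimum eigenvalue of `G_true`.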

C. Optimal configurations
The minimum sizes of the parameter configuration ($N_{\min}$) for Rx, Fraxis, and FQS are 3, 6, and 10, respectively. According to Corollary 1, in the case of the Rx gate, the three equiangular vectors on a unit circle are trivially represented by $q_n = [\cos(\tfrac{2}{3}\pi n + \theta), \sin(\tfrac{2}{3}\pi n + \theta)]^T$ for $n = 0, \pm 1$ (for any common offset $\theta$); that is, the vector angle is $\Delta\theta = 2\pi/3$ (equivalently $\pi/3$), as shown in Figure 2A. In contrast, the original parameter configuration proposed in Rotosolve [21] was $\Delta\theta = \pi/2$, which results in $C(A) = 3/2$. (It is worth noting that Ref. [20] argues that arbitrary parameter configurations can be used owing to the sinusoidal form of the expectation value, but does not discuss how the estimation accuracy depends on the parameter configuration under a finite number of measurements.) To achieve the same estimation accuracy, our optimal parameter configuration ($\Delta\theta = 2\pi/3$) requires two-thirds as many shots as the original parameter configuration ($\Delta\theta = \pi/2$).
Corollary 1 is also instrumental for Fraxis with $d = 3$, where an equiangular arrangement of six unit vectors in 3D space exists. Figure 2B shows the optimal parameter configuration, which is unique up to rotational degrees of freedom; the six vectors together with their opposites form a regular icosahedron. The original parameter configuration of Fraxis [22] has $C(A) = 1.8$ (see Appendix 5). Thus, for the same number of shots, the optimal configuration reduces the variance of the estimate by a factor of 1.8.
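The Fraxis values quoted above can be reproduced numerically under two assumptions: that the C-cost has the form $\frac{N}{N_{\min} d(d+2)}\left(2\,{\rm Tr}[(A^TA)^{-1}] + 1_d^T (A^TA)^{-1} 1_d\right)$, and that the original Fraxis configuration consists of the three coordinate axes plus the three normalized pairwise sums of axes. The second assumption is ours; the actual values are listed in Appendix 5.

```python
import numpy as np
from itertools import combinations

def h_vec(q):
    """Squares of the entries of q followed by sqrt(2)-scaled pairwise products."""
    cross = [np.sqrt(2) * q[i] * q[j] for i, j in combinations(range(len(q)), 2)]
    return np.concatenate([np.asarray(q, dtype=float) ** 2, cross])

def c_cost(qs):
    """C-cost of a configuration given as rows of unit vectors (assumed formula)."""
    qs = np.asarray(qs, dtype=float)
    n_points, d = qs.shape
    n_min = d * (d + 1) // 2
    A = np.array([h_vec(q) for q in qs])
    s_inv = np.linalg.inv(A.T @ A)
    ones_d = np.zeros(n_min)
    ones_d[:d] = 1.0
    value = 2 * np.trace(s_inv) + ones_d @ s_inv @ ones_d
    return n_points * value / (n_min * d * (d + 2))

# Six lines through opposite vertices of a regular icosahedron.
phi = (1 + np.sqrt(5)) / 2
icosahedron = np.array([
    [0, 1, phi], [0, 1, -phi],
    [1, phi, 0], [1, -phi, 0],
    [phi, 0, 1], [-phi, 0, 1],
]) / np.sqrt(1 + phi**2)

# Assumed original configuration: axes and normalized pairwise sums.
assumed_original = np.vstack([
    np.eye(3),
    np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]]) / np.sqrt(2),
])

gram = np.abs(icosahedron @ icosahedron.T)
off_diagonal = gram[~np.eye(6, dtype=bool)]
c_icosahedron = c_cost(icosahedron)
c_original = c_cost(assumed_original)
```

All pairwise overlaps of the icosahedral lines equal $1/\sqrt{5} = 1/\sqrt{d+2}$, which is the equiangular condition of Corollary 1 for $d = 3$.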
In contrast, it has been proved that $N_{\min}$ ($= 10$) equiangular unit vectors cannot be placed in $d$ ($= 4$) dimensional Euclidean space. Namely, Corollary 1 tells us that there is no parameter configuration that satisfies $C(A) = 1$ for $N = 10$. In addition, Corollary 1 also implies that the minimum size of the parameter configuration ($N = 10$) may not be the most efficient if the total number of shots is limited for a single FQS execution, although it is not straightforward to find the analytical minimum value and the corresponding parameter configurations. Instead, we searched for a numerical solution by classical optimization, where $C(A)$ is minimized by the gradient descent method. Since the algorithm may lead to a local minimum, we repeated it independently $10^5$ times starting from random initial configurations.
For $N = 10$, we obtained the same optimized C-cost value ($C(A) \approx 1.033172$) from all the initial configurations in our trials, which suggests that all runs reached the global minimum. Although the obtained configurations were not numerically identical, we found that they all reduce to a unique configuration under reversal and rotation operations. Since the reversal of each parameter does not affect the expectation value (i.e., $h(q) = h(-q)$) and a uniform rotation of the parameter configuration gives the identical value of the C-cost (see Sec. III B), all the configurations were equivalent and appear to be optimal.
Figure 2C shows the unique optimal parameter configuration for the FQS case. In this figure, the parameter configuration is projected into 3-dimensional space by a stereographic projection; that is, the extra one-dimensional component that cannot be displayed is projected in the radial direction. See Appendix 5 for the parameter values of the optimal and other parameter configurations.
From the parameter values of the (numerically obtained) optimal parameter configuration (Eq. (126)), we can see that it has a highly symmetric structure: the first four parameters $\{q_1, \ldots, q_4\}$ and their opposites $\{-q_1, \ldots, -q_4\}$ constitute a regular cube in a hyperplane, and the last six parameters $\{q_5, \ldots, q_{10}\}$ constitute a regular octahedron in a hyperplane (their opposites constitute another regular octahedron), as shown in Fig. 2.
For FQS, the original parameter configuration has $C(A) = 3.0$, whereas the optimal parameter configuration estimated with numerical experiments has $C(A) \approx 1.033172$. Thus, to achieve a given accuracy, the optimal parameter configuration requires roughly three times fewer shots than the original one.
Likewise, we conducted the numerical optimization to find the optimal parameter configuration for redundant measurements with $N = 11, 12$. All the optimizations converged to the same value of $C(A)$ within computational precision, as in the case of $N = N_{\min}$. However, the optimal configurations are not necessarily unique, in contrast to $N = N_{\min}$. While the obtained $C(A)$ was $\approx 1.005390$ for $N = 11$, $C(A)$ converged exactly to unity for $N = 12$. It is also notable that the optimal configurations with $C(A) = 1$ for $N = 12$ include the regular 24-cell polytope in 4-dimensional space, as shown in Fig. 2D.
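The statement that a 24-cell attains $C(A) = 1$ with $N = 12$ can be checked with a short computation. The sketch assumes the C-cost formula $\frac{N}{N_{\min} d(d+2)}\left(2\,{\rm Tr}[(A^TA)^{-1}] + 1_d^T (A^TA)^{-1} 1_d\right)$ and uses one standard coordinate choice for the 24-cell, namely the vectors $(e_i \pm e_j)/\sqrt{2}$, keeping one representative of each antipodal pair. The orientation used in Fig. 2D may differ by a rotation.

```python
import numpy as np
from itertools import combinations

def h_vec(q):
    """Squares of the entries of q followed by sqrt(2)-scaled pairwise products."""
    cross = [np.sqrt(2) * q[i] * q[j] for i, j in combinations(range(len(q)), 2)]
    return np.concatenate([np.asarray(q, dtype=float) ** 2, cross])

def c_cost(qs):
    """C-cost of a configuration given as rows of unit vectors (assumed formula)."""
    qs = np.asarray(qs, dtype=float)
    n_points, d = qs.shape
    n_min = d * (d + 1) // 2
    A = np.array([h_vec(q) for q in qs])
    s_inv = np.linalg.inv(A.T @ A)
    ones_d = np.zeros(n_min)
    ones_d[:d] = 1.0
    value = 2 * np.trace(s_inv) + ones_d @ s_inv @ ones_d
    return n_points * value / (n_min * d * (d + 2))

def cell24_lines():
    """Twelve unit vectors (e_i + e_j)/sqrt(2) and (e_i - e_j)/sqrt(2) in R^4."""
    lines = []
    for i, j in combinations(range(4), 2):
        for sign in (1.0, -1.0):
            v = np.zeros(4)
            v[i], v[j] = 1.0, sign
            lines.append(v / np.sqrt(2))
    return np.array(lines)

config_24cell = cell24_lines()
c_24cell = c_cost(config_24cell)
```

Under these assumptions `c_24cell` evaluates to 1 up to rounding, consistent with the lower bound of Theorem 1 being attainable once redundant measurements are allowed.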
Therefore, if the total number of shots for constructing the matrix $A$ is constant, the optimal sizes $N$ are three for Rotosolve, six for Fraxis, and twelve for FQS.
Next, we focus on the optimal $N$ when the number of measurements is reduced by exploiting the rotation invariance, as mentioned in Sec. III B. Assuming a constant number of shots per gate, the measurement reduction modifies the relation between $C(A)$ and $s_{\min}$ as
$${\rm Var}[\lambda_1] \approx \frac{\sigma^2}{s_{\min}} \frac{N-1}{N} C(A),$$
where the C-cost is effectively scaled by $(N-1)/N$. Note that this factor does not change the optimal $N$ for Rotosolve and Fraxis. Thus, it is most efficient to reuse the value estimated in the previous optimization to construct $A$ and additionally execute two and five measurements for Rotosolve and Fraxis, respectively. It is worth noting that Table I shows that the optimal $N$ for FQS is shifted from twelve to eleven by the measurement reduction, although the difference is smaller than 1%. Altogether, under a limited total number of shots, it is most efficient to construct the matrix $A$ from three, six, and twelve types of measurements of the expectation values at the beginning of the Rotosolve, Fraxis, and FQS optimizations, respectively. In contrast, during the sequential optimization, the matrix $A$ should be built from one value estimated in the previous step and two, five, and ten values from subsequent measurements for Rotosolve, Fraxis, and FQS, respectively. It should also be noted that this optimal condition may differ depending on the assumed conditions of real devices. For instance, if parallel computation is allowed, so that a constant number of shots is available for evaluating each expectation value even when $N$ varies, $C(A)$ is not an appropriate metric because the assumption about the number of shots is no longer valid, and one should simply employ as large an $N$ as possible.
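The shift of the optimal $N$ for FQS from twelve to eleven follows from simple arithmetic on the C-cost values quoted above, as the following sketch shows. The only inputs are the three values of $C(A)$ reported in the text and the scaling factor $(N-1)/N$.

```python
# C-cost values for FQS quoted in the text for N = 10, 11, 12.
c_costs = {10: 1.033172, 11: 1.005390, 12: 1.0}

# Effective cost when one of the N expectation values is reused from the previous step.
effective = {n: (n - 1) / n * c for n, c in c_costs.items()}

best_without_reuse = min(c_costs, key=c_costs.get)
best_with_reuse = min(effective, key=effective.get)
relative_gap = (effective[12] - effective[11]) / effective[12]
```

The effective costs are about 0.930, 0.914, and 0.917 for $N = 10, 11, 12$, so $N = 11$ is preferred with reuse, by a margin well below 1%.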

IV. EXPERIMENTS
In the following, we provide several experiments to numerically verify our proposed method on the condition of N = N min .

A. Estimation Accuracy of One-time Optimization with Different Parameter Configurations
We focused on a one-time optimization rather than an entire VQE process. To this end, we examined the averaged error of FQS between the exact minimum energy and the minimum energy estimated with a limited number of shots for several parameter configurations. We used the 2-qubit hydrogen-molecule-like Hamiltonian of Ref. [38] in this experiment. We used the 2-qubit ansatz in Fig. 4, where we applied the single-qubit gate representation corresponding to Rotosolve (= RzRy), Fraxis, and FQS to $U_i$. Here, the target gate to be optimized is $U_2$ for FQS and Fraxis and the Ry gate in $U_2$ for Rotosolve. The experiments were performed according to the following procedure.
(1) Prepare 100 independent parameter configurations, where the parameters of all the gates are randomly initialized with a uniform probability distribution, followed by 50 iterations of steepest descent optimization using $C(A)$ as the cost function.
(2) Evaluate $A^+$ and $C(A)$ of the 100 parameter configurations.
(3) Randomly initialize the PQC in the state-random manner for the respective single-qubit gates [22].

FIG. 3. Each subplot shows the averaged deviation of the estimated minimum from the true minimum energy (vertical axis), where the former energy was evaluated from $\hat{G}$ constructed with randomly generated parameter configurations under a limited total number of shots, while the latter energy was obtained with a statevector simulator. The left, center, and right columns show the results with 10, 100, and 1000 shots per circuit, respectively. The top, center, and bottom rows show the results for Rotosolve, Fraxis, and FQS, respectively. The results of the original and optimal parameter configurations are highlighted in the figure. The number of shots given above each subplot is the number of shots used for a single mean value of the Hamiltonian based on a parameter configuration.
By definition, $C(A)$ and $\Delta E$ are metrics for the quality of the estimated energy and of the estimated parameter, respectively. Although the two metrics are linked (see Appendix 2 c), their concrete behavior is not trivial because it depends on $A$ and the observable. Here, we confirmed that the energy errors $\Delta E$ are roughly proportional to $C(A)$ in all cases, and that $\Delta E$ is approximately inversely proportional to the number of shots. We also found that the optimal parameter configuration (red) achieves the lowest error $\Delta E$, indicating that the optimal parameter configurations are indeed effective in minimizing the estimation error. Although the magnitude of $\Delta E$ in FQS is seemingly larger than that of Rotosolve, this does not necessarily indicate an advantage of Rotosolve with respect to error suppression, because the single-gate expressibility is not comparable among the respective methods. For instance, sequential Rotosolve applications to a series of three single-qubit gates are comparable to a one-time FQS application. In this case, however, a direct comparison is not straightforward because of error propagation, which is beyond the present framework. In the next section, we instead examine the effect of the parameter configuration on the overall performance of each optimization method.

B. The Influence of the C-cost on VQE Performances
We investigate the effect of different parameter configurations on the results of VQE when we sequentially optimize single-qubit gates in quantum circuits by the framework of FQS [24]. We employed the 5-qubit quantum Heisenberg model [39] with the model parameters set to J = h = 1. We used the Cascading-block ansatz shown in Fig. 5, where the gates within the dashed lines are repeated L times. We set L = 1, 3, 5 in this experiment. According to the optimization method, we applied the respective single-qubit representations to $U_i$ in the PQC. We began each VQE run by randomly initializing the PQC in the state-random manner for the respective single-qubit gates. In VQE, we sequentially applied Rotosolve/Fraxis/FQS to $U_i$ in the order of the subscripts in Fig. 5, i.e., from the top left to the bottom right. We term this procedure of updating all gates in the PQC once a sweep. In a single VQE run, we carried out 100 sweeps to obtain the estimated minimum eigenvalue of the Hamiltonian. We performed 100 independent VQE runs and plotted the error distribution $\Delta E := E - E^*$ of the 100 trials in Fig. 6, where $E^*$ is the exact minimum eigenvalue of the Hamiltonian. We evaluated the resulting distributions with the number of shots set to 100, 1000, 10000, and ∞ for two or three different parameter configurations (see Appendix 5 for the specific parameter values). Note that we used a statevector simulator for VQE with an infinite number of shots. Figure 6 suggests that parameter configurations strongly affect the overall VQE performance and shows that the optimized parameter configuration ($C(A) \approx 1$) achieves the smallest errors under all conditions with finite numbers of shots. The optimal parameter configuration is more effective as the number of shots decreases, which is in line with the analysis of the one-time application to a single-qubit gate in Fig. 3.

In addition, the impact of the parameter configuration on the VQE performance is not visible for the shallow circuit and becomes more distinct as the number of layers increases. In general, a more expressive ansatz can potentially approximate the state of interest with higher precision. Correspondingly, one has to increase the number of shots, because for accuracy $\epsilon$ the number of required shots scales as $O(1/\epsilon^2)$. Otherwise, the enhanced expressibility gained by extending the circuit may not become apparent. Since a gain in C(A) is equivalent to an increase in the number of measurements, the optimal parameter configuration will be more beneficial as the desired accuracy in VQE becomes higher. In fact, FQS is superior to Rotosolve and Fraxis, and the statevector simulation implies that FQS with the ansatz of L = 5 can potentially achieve an accuracy of $< 10^{-2}$. However, it is less likely to reach this energy level with 10000 shots, which is a practical standard for present quantum devices, i.e., IBM-Q devices. There, the optimal parameter configuration helps VQE to distinctly lower the reachable energy level, although this is not the case for Rotosolve and Fraxis because the number of shots available is sufficient relative to their expressibility.
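The shot-count scaling mentioned above can be illustrated with a small simulation. This is a minimal sketch, not the experiment of this paper: it assumes a hypothetical two-outcome (±1) measurement with a fixed probability `p_plus` of obtaining +1, and the helper names `estimate_expectation` and `empirical_std` are introduced here for illustration only. It shows that the spread of the estimated expectation value shrinks roughly as the inverse square root of the number of shots, which is why an accuracy $\epsilon$ requires on the order of $1/\epsilon^2$ shots.

```python
import random
import statistics


def estimate_expectation(p_plus, shots, rng):
    """Average of `shots` simulated +/-1 measurement outcomes."""
    total = 0
    for _ in range(shots):
        total += 1 if rng.random() < p_plus else -1
    return total / shots


def empirical_std(p_plus, shots, repeats=400, seed=0):
    """Spread of the shot-based estimator over many repeated experiments."""
    rng = random.Random(seed)
    estimates = [estimate_expectation(p_plus, shots, rng) for _ in range(repeats)]
    return statistics.pstdev(estimates)


# Hypothetical outcome probability; the exact expectation value is 2 * 0.7 - 1 = 0.4.
std_100 = empirical_std(0.7, 100)
std_1600 = empirical_std(0.7, 1600)

# Using 16 times more shots should reduce the spread by about a factor of 4.
ratio = std_100 / std_1600
print(f"std(100 shots) = {std_100:.4f}, std(1600 shots) = {std_1600:.4f}, ratio = {ratio:.2f}")
```

In this picture, lowering the variance of the estimator at a fixed shot count has the same effect as spending proportionally more shots.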

V. CONCLUSIONS
In this work, we showed that the parameter configuration affects the performance of the analytical optimization of a single-qubit gate. The estimation error was quantified by the C-cost C(A), the variance of the estimated value of the cost function. We theoretically proved that the lower bound of C(A) is unity. We also showed that when the size of the parameter configuration is minimal, the C-cost becomes unity if and only if the parameter configuration satisfies the equiangular condition. Exploiting this property, we found the optimal parameter configurations for Rotosolve and Fraxis. Although we found that no parameter configuration of minimum size for FQS achieves C(A) = 1, the parameter configuration of N = 12 corresponding to the regular 24-cell polytope in 4-dimensional space turned out to satisfy C(A) = 1. In addition, we demonstrated how to reduce the number of measurements for matrix construction by making use of the rotation invariance of C(A). The optimal parameter configurations exhibited the best results, improving efficiency by factors of 1.5 for Rotosolve, 1.8 for Fraxis, and 3.0 for FQS compared with the original parameters. Additional numerical experiments showed that the parameter configuration affects the performance not only of the one-time optimization but also of the entire VQE. We also found that the parameter configuration becomes more instrumental in eliciting the VQE performance as the ansatz becomes more expressive.
MEXT Quantum Leap Flagship Program Grant Number JPMXS0118067285 and JPMXS0120319794.

APPENDIX 1. Free Quaternion Selection
We show that the minimum value of Eq. (7) is the minimum eigenvalue $\lambda_1$ of $G$, achieved when $q = p_1$ for the corresponding eigenvector $p_1$ of $G$.

For the Lagrange multiplier method, we first define a function $l(q, \lambda)$ corresponding to the above optimization problem as
\[ l(q, \lambda) = q^T G q - \lambda \left( q^T q - 1 \right), \]
where $\lambda$ is a Lagrange multiplier. Taking the partial derivatives of $l(q, \lambda)$ and setting them to zero, we obtain
\[ G q = \lambda q, \qquad q^T q = 1. \]
Thus, the candidates for the local minimum/maximum values of $l(q, \lambda)$ and the corresponding solutions are the eigenvalues $\lambda_i$ and their normalized eigenvectors $p_i$, respectively. Substituting the normalized eigenvectors $p_i$ into Eq. (7), we get
\[ p_i^T G p_i = \lambda_i . \]
This means that the global minimum value of Eq. (7) and its solution are given by the minimum eigenvalue $\lambda_1$ and the corresponding normalized eigenvector $p_1$.
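The statement above can be checked numerically. The following is a minimal sketch under stated assumptions: the matrix `G` is a randomly generated symmetric 4×4 matrix standing in for the FQS matrix (it is not computed from any circuit), and the cost is taken to be the quadratic form $q^T G q$ on the unit sphere. The sketch compares the smallest eigenvalue with the cost at the corresponding eigenvector and with the cost at many random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the FQS matrix: any real symmetric 4x4 matrix.
M = rng.normal(size=(4, 4))
G = (M + M.T) / 2

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(G)
lam_min = eigvals[0]
p1 = eigvecs[:, 0]


def cost(q):
    """Quadratic form q^T G q, assumed to play the role of Eq. (7)."""
    return q @ G @ q


# Evaluate the cost on many random unit vectors for comparison.
samples = rng.normal(size=(20000, 4))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
sampled_costs = np.einsum("ni,ij,nj->n", samples, G, samples)

print("smallest eigenvalue  :", lam_min)
print("cost at eigenvector  :", cost(p1))
print("best random unit q   :", sampled_costs.min())
```

No sampled unit vector falls below the smallest eigenvalue, and the eigenvector attains it, in line with the Lagrange multiplier argument.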

a. Expectation value over an orthogonal basis
We show several equations that are useful for deriving the analytical forms of the measures.

Let $Z \in \mathbb{R}^{d \times d}$ be a random symmetric matrix which satisfies $E[Z_{ij}] = 0$ for all $i, j$. Independently, let $P = (p_1, \ldots, p_d)^T \in \mathbb{R}^{d \times d}$ be a random orthogonal matrix (i.e., the matrix is uniformly sampled from the orthogonal group $O(d)$). Then, the following equations hold: and so, For i = j, For the fourth equality, we employed the following relation.

where $x$ is a random vector in $\mathbb{R}^d$ that follows the $d$-dimensional multivariate standard normal distribution $N(0, I)$, and $x$ and $Z$ are independent of each other. To evaluate Eq. (51) for $i \neq j$, we introduce another random vector $y$ distributed as $x$ but independent of $x$ and $Z$.
Here, we introduce two vectors as Using these vectors, we obtained the following relation, For the third equality, we use the fact that the probability distribution $f$ satisfies $f(x, y_\parallel, y_\perp) = f(x, y_\parallel, -y_\perp)$, and thus equivalently, On the other hand, where we suppose that the probability distribution In addition the first term in Eq. (56) where the second equality arises from the independence of the random variables, and the third equality is based on $x/\|x\| = y_\parallel/\|y_\parallel\|$, $f(x/\|x\|) = f(p_i)$, a.e., $E[\|x\|^2] = d$ and $E[\|y_\parallel\|^2] = 1$. From Eqs. (52), (56), (59), and (60), we finally obtain

Using the noise model $\langle \epsilon_i \rangle = 0$, $\langle \epsilon_i \epsilon_j \rangle = \sigma^2 \delta_{i,j}/s$ and the uniformly distributed model of the first eigenvector $p_1^*$, we show the analytical form of $\mathrm{Var}[\lambda_1(\epsilon)]$ in Eq. (35). For simplicity, we write $Z = \mathrm{vech}^{-1}(D^{-1} A^{+} \epsilon)$. Note that $Z$ is a symmetric matrix and satisfies Then, we deal with the first term $\sum_{i,j} E[Z_{ii} Z_{jj}]$ and the second term $\sum_{i,j} E[Z_{ij}^2]$ separately. To this end, we introduce some useful representations. We note that Eq. (15) can be rewritten as Here $q$ is the parameter of the target single-qubit gate, $G$ is the FQS matrix, and $\mathrm{vec} : \mathbb{R}^{d \times d} \to \mathbb{R}^{d^2}$ is the vectorization operator for matrices.
For simplicity, we now write $Z = \mathrm{vech}^{-1}(D^{-1} A^{+} \epsilon)$. From Eq. (61) in Appendix 2 a, In addition, if $C(A) = 1$, i.e., the case of the theoretical lower bound, $A^T A = \frac{N}{d(d+2)} \left( 2I + 1_d 1_d^T \right)$ holds from Theorem 1. As a result, we obtain where we used the following relation,

3. Proof of Theorem 1 and Corollary 1

In this section, we first present useful lemmas for proving Theorem 1 and its Corollary 1, which allow for analytical calculation of the optimal bound of the C-cost.
The first lemma is trivial from the singular-value decomposition of a matrix $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is the diagonal matrix that contains the singular values of $A$.

Lemma 1. Let $A$ be a real matrix. The multiset of nonzero eigenvalues of $A A^T$ is the same as the multiset of nonzero eigenvalues of $A^T A$.

Lemma 2. Let $A$ be a real symmetric matrix such that one of its eigenvalues is $a$ and the rest are $b$'s. Then, it holds that $A = (a - b) u u^T + b I$, where $u$ is the (normalized) eigenvector corresponding to the eigenvalue $a$.

Proof. This follows from observing that $A u = a u$ and that $A v = b v$ holds for every $v$ orthogonal to $u$, i.e., $v^T u = 0$.
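Both lemmas are easy to confirm numerically. The sketch below uses randomly generated matrices and illustrative values ($a = 3$, $b = 0.5$) that are not taken from the paper; it only checks the two statements on one instance each.

```python
import numpy as np

rng = np.random.default_rng(1)

# Lemma 1: nonzero eigenvalues of A A^T and A^T A coincide (here A is 5 x 3).
A = rng.normal(size=(5, 3))
ev_small = np.linalg.eigvalsh(A.T @ A)  # 3 eigenvalues, all nonzero for generic A
ev_large = np.linalg.eigvalsh(A @ A.T)  # 5 eigenvalues, two of them numerically zero
nonzero_large = ev_large[ev_large > 1e-10]
lemma1_ok = np.allclose(np.sort(nonzero_large), np.sort(ev_small))

# Lemma 2: a symmetric matrix with spectrum {a, b, b, b} equals (a - b) u u^T + b I.
a, b = 3.0, 0.5  # illustrative eigenvalues
u = rng.normal(size=4)
u /= np.linalg.norm(u)
# Orthonormal basis whose first vector is +/- u (QR keeps the first column direction).
Q, _ = np.linalg.qr(np.column_stack([u, rng.normal(size=(4, 3))]))
S = Q @ np.diag([a, b, b, b]) @ Q.T
lemma2_ok = np.allclose(S, (a - b) * np.outer(u, u) + b * np.eye(4))

print("Lemma 1 holds on this instance:", lemma1_ok)
print("Lemma 2 holds on this instance:", lemma2_ok)
```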
Let $A$ be an $n \times n$ positive definite matrix with the largest eigenvalue $\lambda_{\max}$ and the smallest eigenvalue $\lambda_{\min}$ such that $\kappa = \lambda_{\max}/\lambda_{\min}$. It is known that $n^2/\kappa \le \mathrm{Tr}(A)\,\mathrm{Tr}(A^{-1}) \le n^2 \kappa$ holds with equality if and only if $\kappa = 1$, i.e., $A = \lambda I$ for some $\lambda > 0$. We formalize this in the following lemma.

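Lemma 3. Any positive-definite real symmetric matrix $A \in \mathbb{R}^{n \times n}$ satisfies $n^2 \le \mathrm{Tr}(A)\,\mathrm{Tr}(A^{-1})$, with equality if and only if $A = \lambda I$ for $\lambda > 0$.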
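As a quick sanity check of the trace bounds, the sketch below evaluates $\mathrm{Tr}(A)\,\mathrm{Tr}(A^{-1})$ for a randomly generated positive-definite matrix and for a multiple of the identity. The matrices and the helper name `trace_product` are illustrative assumptions, not objects from the paper.

```python
import numpy as np


def trace_product(A):
    """Tr(A) * Tr(A^{-1}) for an invertible square matrix A."""
    return np.trace(A) * np.trace(np.linalg.inv(A))


rng = np.random.default_rng(2)
n = 4

# Generic positive-definite matrix (illustrative).
M = rng.normal(size=(n, n))
A = M @ M.T + 0.1 * np.eye(n)

eig = np.linalg.eigvalsh(A)  # ascending order
kappa = eig[-1] / eig[0]     # condition number lambda_max / lambda_min

tp = trace_product(A)
tp_identity = trace_product(2.5 * np.eye(n))  # the equality case A = lambda * I

print(f"n^2 = {n**2}, Tr(A)Tr(A^-1) = {tp:.3f}, n^2 * kappa = {n**2 * kappa:.3f}")
print(f"for A = 2.5 I: Tr(A)Tr(A^-1) = {tp_identity:.3f}")
```

The generic matrix lies strictly above $n^2$, while the scaled identity sits exactly on the bound.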
We now prove Theorem 1 and its Corollary 1 concerning the lower bound of the C-cost and its equality conditions. Here we restate Theorem 1 for convenience.

Theorem 1. Suppose a single-qubit gate is expressed by a parameter $q$ in $\mathbb{R}^d$ where $|q| = 1$. Let $\{q_1, \cdots, q_N\}$ be a parameter configuration and let $A$ be the corresponding matrix. Then, $A$ satisfies $C(A) \ge 1$ with equality if and only if the parameter configuration $\{q_i\}$ and $A$ satisfy

Proof. Using the Woodbury matrix identity giving we obtain the lower bound of Eq. (88) as where the inequality in the fourth line is derived by Lemma 3. To obtain the last line, we use $\mathrm{Tr}\, A^T A = \mathrm{Tr}\, A A^T = N$ and Tr According to Lemma 3, the equality in the fourth line in Eq. (89) is given as where $\lambda$ is a constant. Tracing over both sides of Eq. (90), we have Therefore, $C(A) = 1$ if and only if

Corollary 1. For the minimum number of parameters ($N = N_{\min}$), it holds that $C(A) \ge 1$ with equality if and only if the parameter configurations $\{q_i\}_{i=1}^{N}$ satisfy

Proof. We show that Eq. (92) is equivalent to Eq. (93) if $N = N_{\min}$. We first show (for all $i \neq j$).
From Eqs. (71) and (75), the C-cost contains the Gram matrix $A^T A$. For the rotated parameter set, the corresponding Gram matrix is given as where we denote $R_L := L^T (R \otimes R) L$.

In fact, the first and second terms of Eq. (62) are independently invariant under parameter rotations, as follows.

For the first term $1_d (A^T A)^{-1} 1_d^T$ (Eq. (71)), the rotated version of the first term is expanded as where we use the fact where we employed $L L^T (R \otimes R) = (R \otimes R) L L^T$ and $L^T L = I$. This equation implies that the first term is rotation invariant.
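The two relations used above, $L L^T (R \otimes R) = (R \otimes R) L L^T$ and $L^T L = I$, can be checked numerically. The sketch below is an illustration under an explicit assumption: $L$ is taken to be the $d^2 \times d(d+1)/2$ matrix whose columns form an orthonormal basis of vectorized symmetric matrices, which may differ from the exact definition of $L$ in the paper by ordering or normalization. The helper name `symmetric_basis` is introduced here only for this purpose.

```python
import numpy as np


def symmetric_basis(d):
    """Orthonormal basis of vec(symmetric d x d matrices), stacked as columns.

    This is an assumed form of L, chosen so that L^T L = I and L L^T projects
    onto the symmetric subspace.
    """
    eye = np.eye(d)
    cols = []
    for i in range(d):
        for j in range(i, d):
            if i == j:
                v = np.kron(eye[i], eye[i])
            else:
                v = (np.kron(eye[i], eye[j]) + np.kron(eye[j], eye[i])) / np.sqrt(2)
            cols.append(v)
    return np.column_stack(cols)


d = 4
L = symmetric_basis(d)

rng = np.random.default_rng(3)
R, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
RR = np.kron(R, R)

P = L @ L.T          # projector onto vectorized symmetric matrices
R_L = L.T @ RR @ L   # rotation expressed in the reduced basis

print("L^T L = I            :", np.allclose(L.T @ L, np.eye(L.shape[1])))
print("P commutes with R(x)R:", np.allclose(P @ RR, RR @ P))
print("R_L is orthogonal    :", np.allclose(R_L.T @ R_L, np.eye(L.shape[1])))
```

Because $R_L$ is orthogonal under this assumption, conjugating the Gram matrix by it leaves traces of its inverse unchanged, which is the mechanism behind the invariance discussed here.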
For the second term $\mathrm{Tr}[(A^T A)^{-1}]$ (Eq. (75)), the rotated version of the second term is expanded as where for the second equality where we employed $L L^T (R \otimes R) = (R \otimes R) L L^T$ and $L^T L = I$. This equation implies that the second term is rotation invariant. Consequently, the C-cost is rotation invariant, because both terms in the C-cost are rotation invariant. ($E[\Delta E]$ is also rotation invariant, because it is the weighted sum of these two terms.)

5. Comparison of our Parameters with the original methods
The parameter values used for the sequential optimization in the main text are listed below. The parameters are in no particular order.

a. Rx gate
The original parameter configuration for the Rx gate, $r_1, r_2, r_3$, proposed in [21] is represented as The unique optimal parameter configuration for the Rx gate with the minimum-size parameter set $r_1, r_2, r_3$ is analytically derived as

b. Fraxis gate

The original parameter configuration for the Fraxis gate, $n_1, n_2, \ldots, n_6$, proposed in [22] is represented as The unique (up to arbitrary rotation and individual reversal) optimal parameter configuration for the Fraxis gate with the minimum-size parameter set $n_1, n_2, \ldots, n_6$ is analytically derived as the vertices of an icosahedron, where $\phi = \frac{1 + \sqrt{5}}{2}$ is the golden ratio.
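The equiangular property of the icosahedral configuration can be verified directly. The sketch below uses a standard coordinate choice for six icosahedron vertices, one from each antipodal pair, built from cyclic permutations of $(0, \pm 1, \pm\phi)$; this is an assumed representative and need not match the orientation or ordering of the configuration listed in the paper. It checks that every pair of the six axes has the same absolute inner product, $1/\sqrt{5}$.

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2  # golden ratio

# One vertex from each antipodal pair of a regular icosahedron
# (standard coordinates; an assumed representative of the configuration).
axes = np.array(
    [
        [0.0, 1.0, phi],
        [0.0, 1.0, -phi],
        [1.0, phi, 0.0],
        [1.0, -phi, 0.0],
        [phi, 0.0, 1.0],
        [-phi, 0.0, 1.0],
    ]
)
axes /= np.linalg.norm(axes, axis=1, keepdims=True)

# Absolute inner products between distinct axes.
gram = axes @ axes.T
off_diag = np.abs(gram[~np.eye(6, dtype=bool)])

print("distinct |n_i . n_j| values:", np.unique(np.round(off_diag, 12)))
print("1 / sqrt(5)                :", 1 / np.sqrt(5))
```

All 15 pairs share the common angle, so the six axes form a set of equiangular lines in three dimensions.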

c. FQS gate
The original parameter configuration for FQS, $q_1, q_2, \ldots, q_{10}$, proposed in [24] is represented as The symmetric parameter configuration for the FQS gate, $q_1, q_2, \ldots, q_{10}$, which is used only for the experimental results in the main text, is represented as The unique optimal parameter configuration for the FQS gate with the minimum-size parameter set $q_1, q_2, \ldots, q_{10}$ is numerically derived as

6. ∆E Distributions sampled with various parameter configurations

In the experiment in Sec. IV A, we optimized only one gate to investigate the estimation error of the target gate. In the main text, we show only the case in which $U_2$ (the Ry gate in $U_2$ for Rotosolve) is the target gate. In this section, we show another case, in which the target gate is $U_0$ for FQS and Fraxis, and the Ry gate of $U_0$ for Rotosolve. Note that the number of shots per circuit, $S$, is set to 10, 100, and 1000, the parameters of all the gates are initialized to random values, and only the target gate is optimized. Fig. 7 shows the results of all the additional experiments. The title of each subplot indicates the target gate and the other settings.

FIG. 2. Optimal parameter configurations for the Rotosolve, Fraxis, and FQS models. Blue spheres represent the optimal configuration $\{q_i\}_{i=1}^{N}$, and green spheres represent the opposite positions, $\{-q_i\}_{i=1}^{N}$. (A) Rotosolve: The diagonal lines between $q_i$ and $-q_i$ constitute three equiangular lines in 2D space (red lines). (B) Fraxis: The diagonal lines constitute six equiangular lines in 3D space. (C) FQS: The four positions and their opposites in the first term of the right-hand side constitute a regular cube in a hyperplane; the six positions in the second term (outer blue spheres) constitute a regular octahedron in a hyperplane. The opposites of the last six positions (inner green spheres) also constitute a regular octahedron. The yellow diagonals appear to overlap doubly because of the stereographic projection, but they do not actually overlap. (D) FQS (N = 12): The 24-cell polytope in 4-dimensional space, which achieves C(A) = 1 as a redundant parameter configuration.

FIG. 3. The average energy error for one-time optimization with different parameter configurations. Each subplot shows the averaged deviation of the estimated minimum from the true minimum energy (vertical axis), where the former energy was evaluated from G constructed with randomly generated parameter configurations under a limited total number of shots, while the latter energy was obtained with a statevector simulator. The left, center, and right columns show the results with 10, 100, and 1000 shots per circuit, respectively. The top, center, and bottom rows show the results for Rotosolve, Fraxis, and FQS, respectively. The results of the original and optimal parameter configurations are highlighted in the figure. The number of shots stated above each subplot is the number of shots used for a single mean value of the Hamiltonian based on a parameter configuration.

FIG. 6. Comparison of the VQE performance with different parameter configurations. The vertical axis represents the deviation of the resulting VQE energy from the exact ground-state energy. The respective energies were evaluated after 100 sweeps, where one sweep stands for sequential updates of all the single-qubit gates once. The box-and-whisker plots show the statistics of the energy deviations (∆E) obtained from 100 independent VQE runs. We carried out the VQEs with the numbers of circuit layers L = 1, 3, 5 and show the results in the respective subplots.

c.
Discussion of E[∆E] for the perturbation effect rotation and (individual) reversal.b.Fraxis gate

FIG. 7. Additional results of the average energy error for the one-time optimization on different target gates in the ansatz with various parameter configurations.

6 .
∆E Distributions sampled with various parameter configurations

TABLE I. C-cost values for the different sizes of parameter configurations of FQS. (A) Comparison of the C-cost C(A) with a constant number of shots for evaluating an expectation value. (B) Comparison of the scaled C-cost (N − 1)C(A)/N with a constant number of shots per single-