Sequential minimal optimization for quantum-classical hybrid algorithms

We propose a sequential minimal optimization method for quantum-classical hybrid algorithms, which converges faster, is robust against statistical error, and is hyperparameter-free. Specifically, the optimization problem of the parameterized quantum circuits is divided into solvable subproblems by considering only a subset of the parameters. In fact, if we choose a single parameter, the cost function becomes a simple sine curve with period $2\pi$, and hence we can exactly minimize with respect to the chosen parameter. Furthermore, even in general cases, the cost function is given by a simple sum of trigonometric functions with certain periods and hence can be minimized by using a classical computer. By repeatedly performing this procedure, we can optimize the parameterized quantum circuits so that the cost function becomes as small as possible. We perform numerical simulations and compare the proposed method with existing gradient-free and gradient-based optimization algorithms. We find that the proposed method substantially outperforms the existing optimization algorithms and converges to a solution almost independent of the initial choice of the parameters. This accelerates almost all quantum-classical hybrid algorithms readily and would be a key tool for harnessing near-term quantum devices.


I. INTRODUCTION
Quantum computing devices with almost a hundred qubits are now within reach in the near future [1][2][3].Since they have an unignorable amount of error because of a lack of error correction, they are called noisy intermediate-scale quantum (NISQ) devices.While the complexity of NISQ devices is limited because of the noise, still their classical simulation is considered to be intractable on classical computers if the fidelity of gates is sufficiently high [4][5][6].
In order to exploit NISQ devices in useful ways, quantum-classical hybrid algorithms have been proposed.Quantum-classical hybrid algorithms solve problems with a combination of sampling on low-depth quantum circuits and classical post-processing of the sampled outcomes.The variational quantum eigensolver (VQE) [7][8][9][10] is, for example, an attracting algorithm for NISQ devices for finding an approximate ground state of a given Hamiltonian H.The VQE is now extended in various purposes beyond finding the ground state [11,12].The quantum approximate optimization algorithm (QAOA) [13][14][15] makes it possible to get approximate solutions for combinatorial optimization problems.The quantum machine learning [16], especially for the near-term devices, has been proposed in various machine learning settings such as supervised learning [17][18][19], unsupervised learning [20], generative model [21], generative adversarial model [22], and so on.
All these major quantum-classical hybrid algorithms have a common structure: a parameterized quantum circuit and its optimization with respect to an observed cost function.More precisely, we repeatedly generate an ansatz state |ψ(θ) from a parameterized quantum circuit U (θ) on quantum devices and optimize the parameters θ to minimize the expectation value of the given Hermitian operator, H(θ) = ψ(θ)|H|ψ(θ) , on classical computers.Therefore, the convergence speed of the optimization process is a key factor that determines the overall performance of the algorithms.
Existing works employ either gradient-free [23][24][25] or gradient-based [26][27][28][29][30] optimization algorithms.For example, Peruzzo et al. [7] and Ryabinkin et al. [31] use the Nelder-Mead method, which is one of the gradientfree optimization algorithms [23].Gradient-based methods are generally more efficient if we can get the gradients directly.In the context of the neural networks, the backpropagation method, which is an efficient method to calculate the gradients, is one of the most important ingredients to minimize the cost function of the model.To use gradient-based optimization algorithms for parameterized quantum circuits, Li et al. [32] and Mitarai et al. [17] propose an analytical way to take a partial derivative of parameterized quantum circuits, which allows us to obtain the gradient directly from the expectation value of a Hermitian operator.However, it is not yet fully explored whether a better optimization algorithm, specially designed for the parameterized quantum circuits, exists, or not.For example, in the optimization procedure in the support vector machine [33], the sequential minimal optimization [34] is widely used, in which the cost func-tion is exactly minimized with respect to two parameters at each step by making use of the characteristic structure of the quadratic programming problem.
In this study, we propose an optimization method which is specialized for quantum-classical hybrid algorithms based on parameterized quantum circuits.The idea is similar to the sequential minimal optimization for support vector machine; the cost function is minimized exactly with respect to certain chosen parameters at each step by using the characteristic structure of the parameterized quantum circuits.The proposed optimization method has several good properties: hyperparameterfree, faster convergence, less dependence on the initial choice of the parameters, and robust against the statistical error.To this end, we use the fact that the cost function, as a function of a parameter θ, behaves very simply as a sine curve with period 2π, when the parameterized quantum circuit consists of a unitary gate exp(iθA) being subject to A 2 = I.By virtue of this special property, we can determine the angle θ that provides the exact minimum of the cost function by evaluating the cost function at three independent points with respect to the parameter.This allows us to update each parameter with fewer measurements and to achieve a smaller value of the cost function.The update is deterministic if the order of the parameters is provided, and hence the proposed optimization method is hyperparameter-free.
We perform detailed numerical simulations to compare the proposed method with existing optimization algorithms including both of gradient-free and gradientbased methods.To this end, we develop a benchmark task for which we can purely compare the performance of the optimization algorithms apart from the representation power of the parameterized quantum circuits.The results show that the proposed optimization method converges extremely faster than the existing ones, especially when the statistical error exists because of the finite number of the samples to estimate an expectation value.Furthermore, we find that the proposed method converges to the solution almost independently of the choice of the initial parameters, while other methods fail for certain initial parameters.These results mean that the proposed method readily accelerates almost all quantumclassical hybrid algorithms substantially in a practical situation.The proposed optimization method should be an inevitable ingredient in the parameter tuning of NISQ devices.
The rest of the paper is organized as follows.In Sec.II we first describe the proposed optimization method for a single parameter optimization at each step, where we can find an exact minimum at each step.Then we further extend it for the case where multiple parameters are updated at each step.We also explain the case where multiple gates are parameterized by the same parameters.In Sec.III, we present numerical simulations to compare the proposed method with the existing optimization algorithms.We perform two tasks.One is a benchmark of the optimization of the parameterized quantum circuits, where we can achieve the exact solution if an optimization algorithm successfully finds the best parameters regardless of the representation power of the ansatz state.The other is VQE of the lithium hydride (LiH) molecule.Section IV is devoted to the conclusion.

A. Preconditions
Our optimization method requires the following three conditions of the quantum-classical hybrid algorithms with parameterized quantum circuits.
1.The parameters of the parameterized quantum circuit are independent of each other.(This condition can be relaxed by extending the proposed method as we will see in Sec.II D.) 2. The parameterized quantum circuit with J parameters U (θ) θ := {θ j } J j=1 is composed only of the two types of gates: fixed unitary gates (e.g. the Hadamard gate and the control-Z gate) and rotation gates where (e.g. the Z-rotation gate and the X-rotation gate.) 3. The cost function which we are going to minimize is written by the (weighted) sum of K expectation values: where are the input states, and w k is the weight of the k-th term.Hereinafter, we refer to L as the "cost function".
Most quantum-classical hybrid algorithms with parameterized quantum circuits, such as hardware efficient ansatz [10,19], satisfy these requirements.

B. Our method
Here we describe how parameters are updated in the proposed optimization method.Let θ (n) be the parameters of the circuit after n steps of update.Then, let U (n) j (θ j ) be the parameterized quantum circuit U (θ) in which the parameters are fixed to be θ (n) except for the j-th parameter θ j : Similarly, let L (n) j (θ j ) be the cost function L(θ) with the fixed parameters θ (n) except for the j-th parameter θ j : This cost function L (n) j (θ j ) can be rewritten as a function of θ j : where a (n) j ( = 1, 2, 3) denote constants independent of θ j .(See appendix A for the derivation.)Equation ( 6) tells us that the relation of L (n) j (θ j ) to θ j is just a sine curve with period 2π.The three constants a

and a
(n) 3j can be determined from the values of the cost function L (n) j (θ j ) evaluated at three independent points of θ j .Then we can find the argument θ j that minimizes the cost function Using the above feature, we update the parameters as follows: 1. Choose an index j n ∈ {1, 2, • • • , J} of the parameters sequentially or randomly.

Estimate L
(n−1) jn (θ 2 ) from a quantum device.Although the arguments do not have to be θ 2 , the present choice simplifies the optimization at step 3 greatly.

Determine θ jn minimizing the cost function L
(n−1) jn (θ jn ) given by Eq. ( 6) using L (n−1) jn (θ ), which was obtained in the previous update, and 2 ). 4. Update as follows: Note that, the minimum is not estimated directly but calculated from the sine curve.Therefore, the statistical error would accumulate.To avoid this, the minimum ) should be estimated directly in a certain period.5. Repeat step 1, 2, 3, and 4 until L(θ) converges.

C. Generalization of our method
In the previous section, the minimization is performed with respect to one parameter at each update.Here we generalize it for a multi-parameter case.Let θ (n)  be the values of the parameters after n times of update as in Sec.II B. Then, let U (n) M ({θ j } j∈M ) be the parameterized quantum circuit U (θ) in which the parameters are chosen to be θ (n) except for a set of parameters where M is a subset of indices for which the minimization is performed.Similarly, let L M ({θ j } j∈M ) be the cost function L(θ) with the fixed parameters θ (n) except for the parameters {θ j } j∈M (M ⊂ {1, • • • , J}): This cost function L (n) M ({θ j } j∈M ) can be rewritten as where denotes the Kronecker product, and b The optimization for the multi-parameter update runs as follows: ⊗|Mn| using a quantum device, where α j denotes the j-th element of α.Although the above choices of the parameters are not necessary, they have the advantage that the coefficients in Eq. ( 12) can be easily determined by using the discrete Fourier transformation.4. Update as follows: Similarly to the previous case, should be estimated directly in a certain period to avoid the error accumulation.

D. Special case of our method
In this section, we consider the case where the several rotation gates in the parameterized quantum circuit share the same parameter.This is the case when the target state has a symmetry, like translation invariance, and hence the ansatz state is also subject to it.Assume that the parameterized quantum circuit has J parameters θ = {θ j } J j=1 which are independent of each other.Unlike Sec.II B and II C, each θ j is used at S j times in the circuit.θ (n) , U where a ) and c j denote some constants.Note that, these constants are independent of θ j .(See appendix C for the derivation.)Since Eq. ( 16) has 2S j + 1 parameters, a

and c
(n) j , we can determine these constants using the values of cost function L (n) j (θ j ) at 2S j + 1 different values of θ j , and then find θ j minimizing the cost function L (n) j (θ j ).Using this feature, we propose an updating method which is the same method of Sec.II B except for the following points: ) using a quantum device.Although the above choices of the parameters are not necessary, they have the advantage that the coefficients can be determined easily by using the discrete Fourier transformation.

III. NUMERICAL SIMULATION A. Numerical setups
In this section, we numerically demonstrate the performance of the proposed method via two types of optimization tasks.One (task 1) is a benchmark task of the parameter optimization problems, which we introduced here to compare the performance of different optimization algorithms.In this task, we minimize where r is the number of qubits, and θ * is randomly chosen from a uniform distribution [0, 2π) a priori.This task is equivalent to VQE whose Hamiltonian is given by ⊗r U (θ * ).However, in our numerical simulations, we sample the output from the quantum circuit U † (θ * )U (θ)|0 ⊗r in the {|0 , |1 } basis for each qubit, which can be done even on an actual quantum device.Then we calculate the probability to obtain the outcome zero for all qubits similarly to the estimation of the inner product in Higgott et al. [35].The cost function multiplied by −1 corresponds to the fidelity between two quantum states U (θ)|0 ⊗r and U (θ * )|0 ⊗r .Since the minimum of the L(θ) is exactly −1 and achieved at least when θ = θ * , we can purely compare the performance of the optimization algorithms apart from the representation power of the parameterized quantum circuits.
The other (task 2) is VQE for the lithium hydride (LiH) molecule at bond distance with four qubits.The molecular Hamiltonian H LiH is the same as the one given in the supplemental material of Kandala et al. [10].
We use the method with the single-parameter minimization proposed in Sec.II B at each update.We reestimated the cost function ) once in 32 iterations.We compare the proposed method with five existing optimization algorithms, Powell, Nelder-Mead, conjugate gradient (CG), BFGS, and SPSA methods, implemented in the SciPy library [36] except for the SPSA method. 1 The hyperparameters of the SPSA method 1 Actually, we used a slightly modified code of them not to finish the optimization process in the middle.This modified code used for our experiment is available at https://github.com/ken-nakanishi/scipy.

FIG. 1.
The parameterized quantum circuit used in the simulations of Sec.III.These parameters θ are optimized to to minimize L. D denotes the number of repetition of a circuit in the bracket.
we used are the same as the default values in Qiskit v0.8.0 [37].
The initial values of the circuit parameters were randomly sampled from a uniform distribution [0, 2π).In the present numerical simulations, we sampled the outcome 1024 times for the estimation of the cost function (except for the results shown in Fig. 3), which determine the amount of the statistical error.For each simulation, the optimization was run 100 times starting from different initial values of the parameters.In the following, step counts the number of the estimation of the expectation value of the cost function L(θ).

B. Numerical results
In task 1, we set the number of qubits and the depth of the circuit in Fig. 1 to be r = 5 and D = 9, respectively.The total number of the parameters is thus 100.In Fig. 2, the horizontal axes represent the fidelity, and the vertical axes represent the number of samples whose fidelity is under the value of x-axis for each of 1024, 2048, 4096, and 8192 steps from left to right.From Fig. 2, one can see that our method colored by red converges extremely faster than the other methods we compared.Specifically, our method achieved fidelity higher than 0.98 after 8192 steps being independent of the initial set of parameters, while the other methods only result in far low fidelity for certain choices of the initial parameters.Figure 2 also shows that the gradient-based methods converge much faster than the gradient-free methods in general.Regarding the gradient-based methods, BFGS seems to be the best, while SPSA exhibits a better fidelity for some initial parameters.Furthermore, the Powell method outperforms the Nelder-Mead method when the number of steps is sufficiently large.
To investigate the statistical-error tolerance of each method, the number of samples to estimate the cost function is changed.In Fig. 3, we show the results after 8192 steps, for each of 256, 1024, 16384, and ∞ outcomes from left to right.Here ∞ means that the cost function is directly calculated from the inner product.In the limit of the larger number of accumulations, Nelder-Mead, CG, BFGS, and the proposed method achieve high fidelity with almost no dependence on the initial choice of the parameters.Specifically, BFGS and the proposed method both result in fidelity close to unit.However, if the statistical error becomes larger with fewer accumulations, the advantage of the proposed method gets larger.Notably, the proposed method can achieve fidelity above 0.9x with only 256 accumulations for almost all choices of the initial parameters.Thus we conclude that the proposed method is robust against the statistical error.
In task 2, the depth D in Fig. 1 is set to 4, and hence the circuit has 40 parameters.We show the results of VQE of the lithium hydride molecule in Fig. 4. The horizontal axes represent the energy difference (for top four figures) and fidelity (for bottom four figures) between the solution of VQE and the true ground state.The vertical axes represent the choices of the initial parameters, which are sorted by the values of each of energy difference and fidelity.From Fig. 4, one can see that our method gets closer to the true ground state much faster than the other methods we compared.Especially, our method achieved fidelity higher than 0.95, which is considered the close-tolimit of the representation power of the prepared parameterized quantum circuits, after only 512 steps against almost all initial set of parameters, while the other methods only result in far low fidelity for certain choices of the initial parameters.
Let us finally argue why the proposed method outperforms the other existing optimization algorithms.The possible weak point of BFGS and CG methods is that these methods require the gradients of all the parameters.They are good methods when we can calculate all gradients at once, such as neural-network models.In the quantum-classical hybrid algorithm, however, we need to estimate the cost function twice as many as the number of the parameters to get all gradients of the parameter.This could be a disadvantage of these methods, since the step counts not the number of updates but the number of the estimation of the cost function in total.In SPSA method, we use the difference of the two cost functions instead of the gradient.This method cannot avoid the additional noise on the gradient, and this can interrupt the fast convergence.In task 2, the SPSA method is better results than in task 1.It is considered that the gradients of the parameters in task 2 is farther apart of zero than ones in task 1 and then can be estimated better.

IV. CONCLUSION
In this work, we proposed an efficient optimization method for quantum-classical hybrid algorithms using parameterized quantum circuits.In most quantumclassical hybrid algorithms with parameterized quantum circuits, the relation of the cost function to each parameter of the circuit is just a sine curve with period 2π, on which our proposed method is based.To make a good use of the above property, we divide the optimization prob-  lem of the parameterized quantum circuits into solvable subproblems by considering only a subset of the parameters.By numerical simulations, we demonstrate that our method converges to a better solution much faster than the existing ones, especially in the presence of a large statistical error.The proposed method is expected to have robustness not only the statistical error but also to the noise in the NISQ devices due to the lack of the error correction, because the relation of the cost function to each parameter of the circuit would be robust against noise.This property of the present method would enable us to conduct parameterized-quantum-circuit-based variational algorithms on real quantum devices.We believe that this work drastically accelerates the quantumclassical hybrid algorithms and makes them practical in a realistic situation.Furthermore, the proposed method itself can be applied not only for optimizing parameterized quantum circuits but also for classical variational method.For example, the proposed method might be applied for the optimization of multi-scale entanglement renormalization ansatz (MERA) [38] by using variational unitary matrices inspired by parameterized quantum circuits.Neural networks composed of rotation matrices might also be considered as the another candidate. .The proposed method is denoted by the red lines.The horizontal axes are the energy difference/fidelity between the calculated ground state and the true ground state.The vertical axes show the number of samples whose energy difference/fidelity is under the value of x-axis at particular steps.

3
|M | -dimensional coefficient vector with |M | being the number of the elements of M .(See appendix B for the derivation of Eq. (12).)Since Eq. (12) has 3 |M | parameters b (n) M , we can determine b (n) M using the values of cost function L (n) M {θ j } j∈M evaluated at 3 |M | independent points of {θ j } j∈M .Then we can find the values of {θ j } j∈M that minimize the cost function L

j
(θ j ), and L (n) j (θ j ) is defined in the same way as Sec.II B. The cost function L (n) j (θ j ) can be transformed as

FIG. 2 .FIG. 3 .
FIG.2.Cumulative distribution function of fidelity after 1024, 2048, 4096, and 8192 steps in task 1.The proposed method is denoted by the red lines.The horizontal axes are the fidelity 0| ⊗r U † (θ * )U (θ) |0 ⊗r 2 .The vertical axis shows the number of samples whose fidelity is under the value of x-axis at particular steps.

FIG. 4 .
FIG.4.Cumulative distribution function of energy difference and fidelity in VQE for the ground state of the Hamiltonian of lithium hydride molecule LiH (task 2).The proposed method is denoted by the red lines.The horizontal axes are the energy difference/fidelity between the calculated ground state and the true ground state.The vertical axes show the number of samples whose energy difference/fidelity is under the value of x-axis at particular steps.