Optimizing Variational Quantum Algorithms using Pontryagin's Minimum Principle

We use Pontryagin's minimum principle to optimize variational quantum algorithms. We show that for a fixed computation time, the optimal evolution has a bang-bang (square pulse) form, both for closed and open quantum systems with Markovian decoherence. Our findings support the choice of evolution ansatz in the recently proposed Quantum Approximate Optimization Algorithm. Focusing on the Sherrington-Kirkpatrick spin-glass as an example, we find a system-size independent distribution of the duration of pulses, with characteristic time scale set by the inverse of the coupling constants in the Hamiltonian. The optimality of the bang-bang protocols and the characteristic time scale of the pulses provide an efficient parameterization of the protocol and inform the search for effective hybrid (classical and quantum) schemes for tackling combinatorial optimization problems. For the particular systems we study, we find numerically that the optimal nonadiabatic bang-bang protocols outperform conventional quantum annealing in the presence of weak white additive external noise and weak coupling to a thermal bath modeled with the Redfield master equation.


I. INTRODUCTION
Quantum Annealing (QA) aims to solve computational problems by using a guided quantum drive. The dynamics is generated by a time-dependent Hamiltonian along a trajectory that ends at a final target Hamiltonian whose ground state contains the solution of the problem [1][2][3]. QA is based on the adiabatic theorem, which guarantees that if the Hamiltonian is changed sufficiently slowly, transitions to excited states are suppressed during the adiabatic evolution, thus preparing states that are close to the target ground state. Unfortunately, the adiabatic condition that ensures that the system remains in the instantaneous ground state leads to long time scales for the solution of hard computational problem. Within the framework of adiabatic computation, there has been several theoretical proposals on the optimizations of the Quantum Adiabatic Algorithm (QAA), such as heuristic guesses for the initial state [4], increasing the minimum gap [5,6], and the quantum adiabatic brachistochrone formulation [7].
The adiabatic trajectory is not the only path for reaching the ground state of a final Hamiltonian that encodes the solution of the computational problem. More generally, one could imagine many other paths, including those where the Hamiltonian is varied rapidly, that land at the desired state or, of practical interest, reach low energy states. In fact, it has been already found that for certain hard instances of problems, fast nonadiabatic paths can sometimes prevent the system from getting stuck at local minima, thus improve the search results [8][9][10]. The Variational Quantum Algorithm (VQA) is an example where one searches for such possible paths, using optimization of the outcome via the variation of a fixed number of parameters in the protocol. A hybrid machine, combining classical optimization and quantum evolution, optimizes the variational parameters. Such hybrid variational ap- proaches have proved useful in the context of quantum state preparation [11][12][13][14]. Recently, Ref. [15] introduced a variational quantum eigensolver (VQE) for applications in quantum chemistry. This idea was further explored in [16][17][18][19][20] and experimentally tested in [21][22][23]. In a related approach [24][25][26][27], Farhi et al. introduced a Quantum Approximate Optimization Algorithm (QAOA) for combinatorial optimization problems based on a parameterized square-pulse ansatz for dynamical evolution of the solver.
In this paper, we make a connection between VQA and op-arXiv:1607.06473v3 [quant-ph] 11 Mar 2017 timal control theory [28][29][30][31]. VQA is essentially an adaptive feedback control [32,33] of a quantum system with the objective function encoding the solution of a computational problem, see Fig. 1(a). It utilizes a hybrid system comprised of a classical computer that searches for the optimal variational protocol using measurements done on a quantum machine that generates the final states corresponding to different variational protocols, via a closed-loop learning method [34]. Using Pontryagin's minimum principle of optimal control, we show that the optimal protocol for VQA has a "bang-bang" form. Our results put the bang-bang ansatz of QAOA on a rigorous ground in contrast to VQA with a continuous-time evolution. A comparison of the performance of the optimal nonadiabatic bang-bang protocol with conventional (linear ramp) QAA demonstrates that the former significantly reduces error in the final state in the absence of noise or decoherence. The advantage over the linear ramp protocol in QAA survives weak dephasing white noise as well as weak coupling to a thermal bath. Furthermore, we perform a quantitative analysis of the characteristics of these optimal protocols. We numerically find a system-size independent distribution function for the duration of individual pulses, which may facilitate the development of effective algorithms for the classical optimizer through an efficient representation of the protocol with few variational parameters. Interestingly, each of the pulses in our bang-bang protocols contains commuting (either one-qubit or two-qubit) terms. Thus our protocol can be implemented by applying a sequence of one-qubit gates [ generated by the initial Hamiltonian, g = 0 in Eq. (9)] and two-qubit gates generated by the problem Hamiltonian (g = 1).

II. VARIATIONAL QUANTUM ALGORITHM
Consider a computational optimization problem such as finding a sequence of N bits that minimizes a certain function of all of the bits. To solve this problem with VQA, we consider a system of N qubits with a parameterized Hamiltonian Generically, we can cast the problem into generating a state |ψ that minimizes a certain cost function such as the expectation value of an operator O with respect to |ψ . A common example is finding the ground state of a disorderd classical Ising Hamiltonian [35], where the operator O is a Hamiltonian diagonal in the computational basis. In the context of quantum chemistry, VQE considers the operator O to be the Hamiltonian of a molecule [16].
The essence of VQA, as depicted in Fig. 1, is finding the time-dependent parameters g α (t) over a time period T such that minimizes a cost function ψ(T )|O|ψ(T ) . Generically, the controls g α (t) belong to a permissible set determined by the experimental setup. A common such set is given by simple bounds as seen in Eq. (3) below. The ideal solution could be a unique state |ψ target [as depicted in Fig.1(b)] that is the ground state of the target Hamiltonian or more generally a set of states in the Hilbert space with an optimal figure of merit. One can either fix the initial state |ψ(0) or add it to the list of the variational parameters (here we fix it motivated by experimental constraints). Generally, the longer the total time T , the closer we can get to an ideal solution.
One way to view this is to consider the reachable set, i.e., the set of all the final states one can reach by using one of the infinite number of permissible controls. The reachable set, naturally, grows with T (in fact, if g α = 0 is allowed, the reachable set for T = T 1 is strictly a subset of the reachable set for T = T 2 > T 1 ). As seen in Fig. 1(b), there could be a critical time beyond which the reachable set includes the target state and an exact solution is possible. There is no advantage in increasing T beyond this critical time. Generically, for smaller T , where the reachable set does not include the target state, the optimal protocols are highly constrained as they should prepare the closest point(s) of the reachable set to the target. For times longer than the critical time mentioned above, we expect an infinite number of protocols to produce the target as the evolution has extra time to meander in the Hilbert space. Our strategy is to fix T and find the best variational protocol g α (t). If the solution is not acceptable, we can increase T . Next we discuss how Pontryagin's minimum principle from optimal control theory determines the form of optimal g α (t) functions.

III. PONTRYAGIN'S MINIMUM PRINCIPLE APPLIED TO VQA
A. Bang-bang optimal protocols The parameters in Hamiltonian (1) are generically constrained by their range: during the evolution 0 < t < T . Eq. (3) implies that, by assumption, each g α can be tunned in the above range independently of the values of the other control parameters. For fixed initial state |ψ(0) , the coupling constants g α (t) uniquely determine the final wave function. Consequently, the cost function, which we take as an arbitrary function of the final state, is a functional of g α (t) The Pontryagin's minimum principle [29] is directly applicable here. Briefly, this theorem states that for a set of dynamical variables x evolving from given initial values x(0) with the equations of motionsẋ = f (x, g), where g are a set of control functions, the control functions g * that minimize an arbitrary function F [x(T )] of the final values of the dynamical variable satisfy at any point in time and for each of the control functions. The optimal-control Hamiltonian is defined as Here the " * " superscript indicates the optimal solution corresponding to g * . An important consequence of Eq. (5) is that if the equations of motion for x, and consequently the optimal-control Hamiltonian H, are linear in g, generically, the optimal protocol is bang-bang, i.e., at any time during the evolution we have g * This follows from the fact that at any point in time we need to choose g α to minimize H(x * , p * , g). If the sign of the coefficient of g α in the optimal-control Hamiltonian is positive (negative), we should then choose the smallest (largest) g α from the permissible range (3). In other words, the optimal protocol for each control function involves a sequence of sudden jumps between its minimum and maximum permissible values. The only caveat for the above argument is the possibility that the coefficient of a particular g α in H(x * , p * , g) vanishes over a finite interval (since the sign of this coefficient determines whether we should choose the minimum or maximum value). We expect this special scenario to be nongeneric particularly for the disordered systems considered in the present paper.
In the quantum mechanical context, if the physical Hamiltonian is linear in the controls, the equations of motion and consequently the optimal-control Hamiltonian will also be linear, giving rise to bang-bang protocols as verified in several recent studies on optimal topological quantum computing [36,37]. To find the protocol g that minimizes the cost function in our case, we expand the wave function in a complete orthonormal basis, e.g., the computational basis |z as |ψ(t) = z A z (t)|z and treat the real and imaginary parts of the amplitudes A z (t) as dynamical variables, which evolve according to the Schrödinger equatioṅ where H zz α ≡ z|H α |z and A R,I z ≡ Re, Im(A z ). Clearly, these equations of motion are linear in the control functions g α and the cost function (4) is a function of only the final values of the dynamical variables. Thus, the argument above holds and the optimal protocol is generically bang-bang regardless of the number of variational parameters. We remark that our optimal bang-bang protocol is nonadiabatic by construction, and we put no constraint on maximizing the degree of adiabaticity. The value of this result hinges upon the time scale over which a coupling constant is held fixed. The longer this time scale, the fewer parameters (switching times) are needed to represent the protocol. In fact, in the limit where this time scale goes to zero, any protocol can be approximated by a sequence of square pulses through Trotterization. In this paper, we find that the time scale above is indeed finite and is set by the energy scale of the Hamiltonian for the Sherrington-Kirkpatrick (SK) Ising spin-glass model [see Eq. (9)].

B. Presence of decoherence
From a practical point of view, it is important to assess the validity of the closed system findings in the presence of decoherence. Again, a straightforward application of the Pontryagin principle extends the above results for a closed system evolution to an open quantum system with Markovian dynamics described by a Lindblad equation of type bang-bang. This is due to the linearity of the dynamical equation (8). A decoherence operator F β can represent either noise in the Hamiltonian parameters (in which case, F β is Hermitian [38,39]) or an engineered bath [40]. In the former case, f β 's are constant rates of noise processes and in the latter, f β (t)'s are control knobs that the Pontryagin's minimum principle says should vary in bang-bang form for an optimal protocol. In the rest of the paper, we only focus on closed systems Schrödinger dynamics when finding the optimal protocol. We do, however, discuss the effects of noise and opensystem dynamics on our optimal protocols.

IV. VQA FOR THE SK SPIN-GLASS MODEL
We now focus on a canonical problem in combinatorial optimization, namely the SK Ising spin glass [41] with the energy function where J i j and h i are independent Gaussian random variables with zero mean and variance J 2 = h 2 = 1, and each σ z spin can take the values ±1. The goal is to minimize C over all the 2 n spin configurations. A multitude of practical combinatorial optimization problems map to this model. The computational cost of finding the minimum with classical algorithms is exponential in n.
In analogy with the simple instances of quantum annealing, we focus on the case with only one control function g(t) and use the following parameterized Hamiltonian: with the operator B ≡ − n i=1 σ x i representing a transverse field, which generates quantum fluctuations.
For the initial state, we choose the ground states of B. It is easy to prepare product state |ψ(0) = i commonly used in other schemes such as the QAA. Here σ z i | ↑ i = | ↑ i and σ z i | ↓ i = −| ↓ i . We would like to minimize the cost function ψ(T )|C|ψ(T ) . In the adiabatic scheme, a smooth ramp such as g(t) = t T is applied for 0 < t < T and we can generate large overlap with the ground state of C in the limit of large T . Here, we allow for arbitrary time dependence of the control function in the fixed range 0 g(t) 1. According to the general argument of Sec. III A, the optimal solution is bang-bang.
As discussed in the introduction, in VQA a classical optimization algorithm commands a quantum system to find the optimal protocol variationally from measurement of the cost function for many protocols. This requires many repetitions and it is to our advantage to use the shortest possible time T for which the final state has an acceptable overlap with the ground state of C (projective measurement is ultimately used in generating the ground state). In the adiabatic scheme, we only need one shot but there are important restrictions from the small energy gaps along the adiabatic trajectory, which can lead to exceedingly long time scales, over which quantum coherence cannot be even approximately sustained. Furthermore, the presence of noise or modulation in the control fields places important limitations on adiabatic schemes due to the emergence of the recently proposed noise-induced antiadiabaticity in the long-time limit [42]. Unlike QAA, which relies on the adiabatic theorem, VQA has no known connection to instantaneous ground states and the minimum gap to excitations as transitions to excited states during the time evolution are allowed as long as the system eventually lands at the ground state of the final Hamiltonian.
For a given p, the goal of the algorithm is to find a set of variational parameters that minimizes the expectation value of F p (γ, β) = γ, β| C |γ, β , which ensures that the state |γ, β approaches the ground state of C. Physically, the ansatz describes time evolution for a total time T = p i=1 (γ i + β i ), and a sequence of sudden switching between the Hamiltonians B and C. While this ansatz with a finite p is an intelligent guess, the result that we derived using Pontryagin's minimum principle implies that given bounded independent control of Hamiltonian terms, the ansatz (11) is the optimal choice for a VQA approach to optimization. We reiterate that B and C are each a sum of commuting one-and/or two-qubit terms. Therefore, our protocol can be interpreted as a sequence of simple gates. Estimating the required p requires an analysis of the characteristic time scales of the pulses, which we carry out in this paper.

V. NUMERICAL STUDIES
We start by verifying for small system sizes and short annealing times that the optimal annealing protocol is indeed bang-bang, by using a Metropolis Monte Carlo (MC) algorithm, which makes no assumptions about the nature of the protocol. We divide the total time T into S slices of duration δt = T/S and use a piece-wise constant protocol. The method approaches an unbiased optimization, i.e., it explores all permissible controls and chooses the optimal one, if the protocols obtained converge upon increasing S . We then proceed by carrying out a MC simulation starting from random initial protocols, without any assumption regarding the bangbang nature of the protocol. In each step, we slightly change the control parameter g in a randomly chosen discretized time interval. If the cost function gets smaller, we accept the attempt; if the cost function gets larger, we accept the attempt with probability e −∆E/T MC , where T MC is a fictitious temperature that is gradually reduced to zero.
In Fig. 2(a), we show the optimal protocol obtained from such MC simulation for a fixed instance of Hamiltonian (9) with n = 5 spins and total time T = 0.8. Indeed, the MC simulation converge to a bang-bang protocol for different initial protocols in agreement with the Pontryagin's minimum principle. Despite the convergence for short total time, the MC simulations often fail to converge for longer times and larger systems, signaling the difficulty of implementing VQA without any a priori knowledge about the form of the optimal protocol. However, based on the mathematical proof of the bang-bang nature of the optimal protocols, we can parameterize the protocol similar to QAOA [24] and use the durations of the pulses as variational parameters to be optimized with the interior-point minimization method (IPMM), increasing p to achieve convergence.
We have checked that IPMM results are indeed in agreement with MC results, e.g., Fig. 2(a) (it also runs much faster). In Fig.2(b), we show a typical optimized protocol obtained with IPMM for a certain instance of Hamiltonian (9) with n = 5 spins and T = 2. Guided by MC results, we choose around ∼ 20 × T variational parameters, which proved to be adequate (we converged to a smaller number of bangs than we allowed in the ansatz).
We now turn to the critical question of the time scales of the pulses. We observe numerically and then argue analytically that the typical time scale of each bang is independent of the system size, and is only determined by some characteristic energy scale of Hamiltonian (9). Therefore, from a complex- ity theory perspective, this result implies that the hardness of the optimization problem should translate into the number of pulses and/or the hardness of the protocol optimization. In Fig. 3 we plot the distribution of the time scales of each bang ∆t for system sizes n = 6, 7, 8, 9 and 10. For each system size, we fix the total annealing time to be T = 2, and average over 50 instances of the Hamiltonian (9). We find that the distributions for the bang time collapse for different system sizes, and peak at almost the same value. This observation suggests a universal average distribution of the bang times for the near optimal protocols, and a typical time scale (peak or average value) that is independent of the system size. Although we have only considered a few system sizes, the dependence on n is extremely weak and we expect our results to extrapolate to large n.
Finally we comment on the performance of our protocols. The cost function we minimize is the expectation value ψ(T )|C|ψ(T ) . Minimizing the energy expectation value results in larger overlap with the ground state of C. As expected, the time scales for our protocols are significantly shorter than those of the adiabatic algorithm with similar success rate. A comparison between the optimal bang-bang protocols and linear ramps g(t) = t/T is shown in Fig. 4. The errors in the final wavefunctions 1 − | ψ GS |ψ(T ) | 2 and final energies E − E GS ≡ ψ(T )|C|ψ(T ) − E GS are averaged over the 20 instances (out of 50 generated realizations) with the highest success rates, for the optimal bang-bang protocol and linear QAA ramp respectively. We find that, in the system sizes considered in this work, the nonadiabatic bang-bang protocol with the same total time, significantly outperforms the linear ramp (commonly used in QAA) in the ideal case, where the thermal environment and external noise are neglected. Of course, in practice one needs to include the overhead of searching for the optimal solution, and understand how it scales as function of system size. In particular, while implementing the bang-bang protocol consisting of square pulses on a quantum annealer is feasible [43], finding the optimal protocol with a classical optimizer could be difficult for certain hard instances of problems.

VI. EFFECTS OF DISSIPATION AND DEPHASING
Real-world implementations of the bang-bang and QAA protocols are inevitably subject to noise either in the external controls or due to coupling to the thermal environment. Therefore, it is important to examine the effects of these perturbations on our optimal protocols for practical applications.
Here we consider two noise models in order to evaluate the robustness of our bang-bang protocol, at the same time comparing it with the performance of QAA.

A. Random Dephasing Noise
Here we consider pure dephasing noise, where we introduce random fields in the x and z directions. This type of noise can capture noise induced by hardware electronics. Our error model can be viewed as the continuous time analog of the depolarizing channel commonly used for simulating noise in quantum circuits [44]. Since in a bang-bang protocol we either have g(t) = 1 or g(t) = 0 at any given time, we can write the stochastic Hamiltonian as where δh i (t) and δb i (t) are noise in the z and x directions respectively, with strengths independent of the value of the coupling constants (additive noise). Assuming independent white noise for different terms with zero mean and second moments the noise-averaged density matrix evolves with the following master equation [39] dρ(t) dt where we take W b = W h = W for simplicity. In the bang-bang case, the Hamiltonian H takes two different values H = C (H = B) for g = 1 (g = 0), while in the QAA case, H has the explicit time dependence of Eq. (10) with g(t) = t/T . In Fig. 4 we show the errors in the fidelity 1 − ψ GS |ρ(T )|ψ GS and final energy Tr ρ(T )C − E GS for different strengths of noise. We find that in the small W regime, the noise only slightly decreases the fidelity, acting like a perturbation without inducing any instability. The effects of the noise on the linear QAA ramp are similar both qualitatively and quantitatively, changing the W = 0 error by amount of the same order of magnitude. For the strongest strength of noise that we studied (W = 0.01), the fidelity of the optimal bang-bang protocol remains higher than that of the linear ramp protocol.
A comment is in order regarding the dimension of W and the range used. As δ(t − t ) has a dimension of time (inverse energy), W 2 has a dimension of energy. Strictly speaking, the δ function introduces infinitely large (albeit completely uncorrelated) random fields. This is unrealistic. In real experiments there is a characteristic high frequency, introducing a characteristic short time scale ∆τ, over which noise is correlated. This frequency scale is typically several orders of magnitude larger than the characteristic energy of the Hamiltonian (it diverges for the δ function). Therefore, Eq. (14) and (15) imply that δh, δb ∼ W/ √ ∆τ, which means that for moderate noise in the random fields δh and δb, the corresponding values of W are suppressed by √ ∆τ.

B. Weak Thermal Bath
Here we consider coupling the system to a weak thermal bath at temperature 1/β. In this regime, the dynamics of the open system can be approximately described by the Redfield master equation which is commonly used to model noisy QAA for an actual annealing hardware [45][46][47]. Here we use the formulation in Ref. [45] and apply it to both QAA and bang-bang protocols.
The system of many qubits is coupled to the thermal bath via the Hamiltonian n i σ z i Q z i , where Q z i are bath operators. We assume an Ohmic bosonic bath in thermal equilibrium, with the spectral density function given by where η is a dimensionless coefficient describing the strength of the coupling to the environment. We have taken the cutoff frequency of the bath to be infinite, so as to guarantee the Markovian assumption of dynamics. We employed Eqs. (4)(5)(6)(7)(8)(9) in reference [45] 5 shows the errors in the fidelity for different strengths of coupling to the bath, for both QAA and bang-bang protocols. Similar to the case of the closed system in the presence of white noise, we find that the errors corresponding to both protocols change in an analogous manner due to weak coupling to the environment in both the short-and long-time regimes. There is an intermediate time regime 8.5 T 11.5, where QAA exhibits remarkable robustness and a much smaller change in η = 0 error. However, the errors of the VQA and QAA get closer as we increase T . Once again, the fidelity of the optimal bang-bang protocol remains higher than the QAA even for open system dynamics.

VII. PULSE DURATION FROM THE PONTRYAGIN'S MINIMUM PRINCIPLE
Here we provide more details on how the Pontryagin's minimum principle can not only tell about the form of optimal solution for VQA but can also shed light on when the pulses should be switched on and off, in the context of the SK model.
Using the computational basis z = z 1 . . . z n , we represent the wave function as |ψ(t) = z A z (t) |z . The initial state with all the spins in the x direction corresponds to A z (0) = 1/ √ 2 n , and the Schrödinger equation reads i∂ t A z (t) = g(t)C z A z (t) + [1 − g(t)] n k=1 Az (k) (t), withz(k) = z 1 . . .z k . . . z n , wherez k represents a flipped spin with respect to z k and C z is the energy function we would like to minimize. In terms of the real and imaginary parts protocol that minimizes the energy of a SK spin glass. The optimal nonadiabatic bang-bang protocols significantly reduce the error when compared to QAA within the same running time, and, at least for our system sizes, the advantage remains in the presence of weak additive white noise in the control parameters as well as weak coupling to a thermal environment.
Importantly, we show that the characteristic time scale between bangs is fixed by the energy scales in the problem and is independent of system size, which we confirm numerically. This finding significantly reduces the number of variational parameters in VQA, potentially decreasing the computational cost of the VQA outer-loop classical optimization algorithm to a great extent.
Our results, that the bang-bang protocols are optimal and the duration of each square pulse is size-independent, inform the search for effective hybrid classical-quantum schemes for solving combinatorial optimization problems. Further progress relies on the development of efficient outer-loop al-gorithms as well as hardware development for quantum enhanced optimization. Ultimately, the power of our results lies in their application to larger systems, for which solving the time-dependent Schrodinger equation is impossible on a classical computer. Rapid developments in quantum technologies [48], together with the relative robustness of our protocols to specific models of external noise and thermal environment support the promise of such applications.