Mitigated barren plateaus in the time-nonlocal optimization of analog quantum-algorithm protocols

Quantum machine learning has emerged as a promising utilization of near-term quantum computation devices. However, algorithmic classes such as variational quantum algorithms have been shown to suffer from barren plateaus due to vanishing gradients in their parameter spaces. We present an approach to quantum algorithm optimization that is based on trainable Fourier coefficients of Hamiltonian system parameters. Our ansatz is exclusive to the extension of discrete quantum variational algorithms to analog quantum optimal control schemes and is non-local in time. We demonstrate the viability of our ansatz on the objectives of compiling the quantum Fourier transform and preparing ground states of random problem Hamiltonians. In comparison to the temporally local discretization ansätze in quantum optimal control and parameterized circuits, our ansatz exhibits faster and more consistent convergence. We uniformly sample objective gradients across the parameter space and find that in our ansatz the variance decays at a non-exponential rate with the number of qubits, while it decays at an exponential rate in the temporally local benchmark ansatz. This indicates the mitigation of barren plateaus in our ansatz. We propose our ansatz as a viable candidate for near-term quantum machine learning.

Similarly, quantum optimal control (QOC) aims to optimize the time-dependent system parameters of a quantum system to attain a given objective [23][24][25][26][27][28][29]. QOC has been connected to VQA approaches, and advantages of moving from the discrete circuit picture to the underlying physical system parameters have been demonstrated [30,31]. Such analog VQA approaches commonly utilize piecewise constant, or step-wise, parameterization ansätze [32][33][34][35][36], which behave like the Trotterized limit of very deep parameterized quantum circuits with very small actions per gate.
A major obstacle of VQA is the existence of barren plateaus in the loss landscapes, i.e., increasingly large regimes in the parameter space with exponentially vanishing gradients, which hinder training [37][38][39][40][41][42][43][44]. The general scaling behavior and emergence of barren plateaus are largely not understood, and the dependence of barren plateaus on the details of VQA has been an active field of research in recent years. The comparison of local to global objective functions, the dependence on circuit depth, and the effects of spatial and temporal locality of parameterizations have been studied in connection to barren plateaus [40][41][42][45][46][47]. In particular, the emergence of barren plateaus has been proven in time-locally parameterized quantum circuits for global objective functions and for local objective functions in the case of nonshallow circuits [40,42,45]. Limiting the controllability of such ansätze can reduce the onset of barren plateaus [47][48][49][50], which constitutes a trade-off in expressibility [51,52] in favor of trainability. This includes ansätze that are tailored to a given problem, such as the variational Hamiltonian ansatz [4,53], the unitary coupled cluster ansatz [54], and QAOA [11]. These results suggest that non-local ansätze that depart from the parameterized circuit paradigm may mitigate barren plateaus without the loss of generality. Overcoming the obstacle of barren plateaus is crucial for the success of near-term QML technologies.
In this paper, we propose a parameterization ansatz for quantum algorithm optimization using generalized analog protocols. In this ansatz we directly control the Fourier coefficients of the system parameters of a Hamiltonian. This constitutes a method that is non-local in time and is exclusive to analog quantum protocols, as it does not translate into the discrete circuit parameterizations that are conventionally found in VQA. We compare our ansatz to the common optimal control ansatz of step-wise parameterizations for the example objectives of compiling the quantum Fourier transform (QFT) as well as minimizing the energy of random problem Hamiltonians. This comparison shows that this Fourier based ansatz results in solutions with higher fidelity and in particular superior convergence behavior. Note that the optimization of Fourier coefficients has been proposed for the control of molecular dynamics [55]. It has also been used in a mixed approach that optimizes in the basis of piecewise constant functions [56], as well as in phase-modulated gradient-free optimization [57]. The Fourier basis has also been used with tuned frequencies in the CRAB algorithm [23,58]. However, studies on this particular ansatz in the context of analog quantum computing as a natural extension of VQA appear to be lacking. We demonstrate that our ansatz exhibits non-exponential scaling behavior with respect to the number of qubits in the objective gradient variance, which suggests the absence of barren plateaus. We conclude that our ansatz is a promising candidate for efficient training and avoiding barren plateaus in VQA.

FIG. 1. Levels of abstraction of quantum algorithms. A common formulation of quantum circuits consists of a set of discrete gates, see panel (a). The physical realization of these gates consists of temporally isolated control protocols of the system parameters. These are denoted as θ_j(t) for the different parameters, see panel (b). A more efficient realization utilizes the full space of temporal evolutions of the parameters θ_j(t). This includes fully parallel protocols which take less time to complete the task, see panel (c). Any such protocol can be expressed via its Fourier coefficients θ_{j,k}, which we specifically treat as trainable parameters in our ansatz, see panel (d).

II. METHODS
In quantum circuits, the time-dependent Hamiltonian parameters that implement the gates are sequential, rather than parallel, and therefore contain long idling times. This is a consequence of deconstructing unitary transformations into algorithmic sequences of logical gates. Fig. 1 illustrates different levels of abstraction of quantum algorithms. The departure from the conventional quantum circuit paradigm towards a larger and more intricate space of solutions of quantum protocols enables a computational speed-up due to parallelized Hamiltonian operations.
We write a general time-dependent Hamiltonian as

$$H(t) = \sum_j \theta_j(t)\, H_j, \tag{1}$$

where H_j are Hermitian matrices that define the system and θ_j(t) are the parameters that determine the time-dependence of the system. The resulting time-evolution operator is formally written as

$$U_\theta = \mathcal{T} \exp\left( -i \int_0^1 H(t)\, dt \right), \tag{2}$$

where $\mathcal{T}$ indicates time-ordering. We restrict the time-evolution to t ∈ [0, 1] and use units in which ℏ = 1, for simplicity. The unitary transformation U_θ is explicitly a function of the protocols θ_j(t). In order to perform numerical optimization, it is necessary to choose a particular parameterization for the θ_j(t).
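The time-ordered exponential can be approximated numerically by a product of short-time propagators. The following is a minimal sketch under stated assumptions: it takes the Hamiltonian to be of the form H(t) = Σ_j θ_j(t) H_j, uses a midpoint rule on `n_steps` equal slices of [0, 1], and the names `expm_hermitian` and `propagator` are illustrative rather than taken from this work.

```python
import numpy as np

def expm_hermitian(h, dt):
    """Return exp(-i * h * dt) for a Hermitian matrix h via eigendecomposition."""
    evals, evecs = np.linalg.eigh(h)
    return (evecs * np.exp(-1j * evals * dt)) @ evecs.conj().T

def propagator(protocols, generators, n_steps=200):
    """Approximate U = T exp(-i int_0^1 H(t) dt) with H(t) = sum_j theta_j(t) H_j.

    protocols  : list of callables theta_j(t) on [0, 1]
    generators : list of Hermitian matrices H_j (same length as protocols)
    n_steps    : number of equal time slices (assumed discretization)
    """
    dim = generators[0].shape[0]
    u = np.eye(dim, dtype=complex)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t_mid = (step + 0.5) * dt  # evaluate H at the midpoint of each slice
        h = sum(theta(t_mid) * h_j for theta, h_j in zip(protocols, generators))
        u = expm_hermitian(h, dt) @ u  # later times act from the left
    return u
```

For a time-independent protocol the product is exact up to floating-point error, which gives a simple sanity check before using time-dependent protocols.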
In the ansatz which we highlight in this work, we parameterize the θ_j(t) in terms of the first n_f real-valued Fourier coefficients θ_{j,k}, as given in Eq. 3. This ansatz is motivated by its inherent temporal non-locality, as varying a single parameter θ_{j,k} changes the protocol θ_j(t) at all points in time. It presents a natural choice for a time-non-local parameterization that results in protocols that are smooth and slowly varying by construction, which is advantageous experimentally. We initialize the parameters θ_{j,k} randomly between ±π/k, such that slow modes are emphasized.
In addition to our ansatz, we consider the step-wise ansatz that uses the common discretization in terms of piecewise constant system parameters,

$$\theta_j(t) = \theta_{j,k} \quad \text{for} \quad \frac{k}{n_f} \le t < \frac{k+1}{n_f}, \tag{4}$$

with k = 0, ..., n_f − 1. We initialize the θ_{j,k} randomly between ±π. This ansatz is time-local and generates discontinuous step functions with n_f steps with values θ_{j,k}. These steps are reminiscent of the sequences of parameterized gates in quantum circuits as they are conventionally found in VQA. Due to its connection to conventional parameterized variational circuits, this ansatz serves as a benchmark to which we compare our ansatz of Eq. 3.
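The two parameterizations can be contrasted in a few lines of code. This is a sketch under assumptions: the Fourier basis is taken to be a sine series sin(πkt) with k = 1, ..., n_f, which is one plausible reading of the Fourier ansatz and not confirmed by the text here, and the step-wise protocol is assumed to use n_f steps of equal width on [0, 1]. All function names are hypothetical.

```python
import numpy as np

def fourier_protocol(coeffs, t):
    """theta(t) = sum_k coeffs[k-1] * sin(pi * k * t), k = 1..n_f (assumed basis)."""
    coeffs = np.asarray(coeffs, dtype=float)
    k = np.arange(1, len(coeffs) + 1)
    return float(np.sum(coeffs * np.sin(np.pi * k * t)))

def stepwise_protocol(values, t):
    """Piecewise-constant theta(t) with n_f equal-width steps on [0, 1] (assumed)."""
    n_f = len(values)
    k = min(int(t * n_f), n_f - 1)  # t = 1 is assigned to the last step
    return float(values[k])

def init_fourier(n_f, rng):
    """Draw the k-th coefficient uniformly from [-pi/k, pi/k], favoring slow modes."""
    k = np.arange(1, n_f + 1)
    return rng.uniform(-1.0, 1.0, size=n_f) * np.pi / k

def init_stepwise(n_f, rng):
    """Draw each step value uniformly from [-pi, pi]."""
    return rng.uniform(-np.pi, np.pi, size=n_f)
```

Changing a single entry of `coeffs` alters `fourier_protocol` at every t, whereas changing a single entry of `values` alters `stepwise_protocol` only within one time window, which is the distinction between the time-non-local and time-local ansätze.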
In either ansatz, we optimize the parameters with respect to a given objective function L_θ, which encodes a target transformation. The exact expression of any objective L_θ depends on the details of the problem it describes. Successful optimization corresponds to a time-evolution operator U_θ which implements the target transformation. For a single optimization iteration, we vary the individual parameters θ_{j,k} by a small δ and evaluate the objective function to estimate the respective derivatives, such that we obtain the gradient

$$\nabla L_\theta = \sum_{j,k} \frac{\partial L_\theta}{\partial \theta_{j,k}}\, \hat{e}_{j,k}.$$

The ê_{j,k} are formally constructed unit vectors that collect the trainable parameters θ_{j,k} in the vector θ. We then update the parameters with a step along g^{-1} ∇L_θ that is scaled by η, where η is the learning rate, which we update dynamically using the ADAM [59] algorithm, and g is the Fubini-Study metric. The metric contains information on the quantum geometry of the system in order to improve training behavior and makes this approach a quantum natural gradient descent method [60]. For more details see App. A. Note that in a physical realization, the parameters θ_j(t) cannot become arbitrarily large, and are limited by physical constraints or features of the realization. In our numerical approach, these parameters are unbounded. However, we find that these parameters remain reasonably small throughout learning, as we show below.
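A single optimization iteration of this kind can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from this work: a forward difference is used for the derivative estimate, the objective is minimized, the learning rate is fixed instead of being adapted with ADAM, and the metric defaults to the identity. The function names are hypothetical.

```python
import numpy as np

def finite_difference_gradient(loss, theta, delta=1e-6):
    """Estimate dL/dtheta by shifting each parameter by delta (forward difference)."""
    base = loss(theta)
    grad = np.zeros_like(theta)
    for idx in range(theta.size):
        shifted = theta.copy()
        shifted.flat[idx] += delta
        grad.flat[idx] = (loss(shifted) - base) / delta
    return grad

def gradient_step(loss, theta, eta=0.1, metric=None, delta=1e-6):
    """One descent step theta - eta * g^{-1} grad; metric=None means g = identity."""
    grad = finite_difference_gradient(loss, theta, delta)
    if metric is not None:
        # Natural-gradient direction: solve g x = grad instead of inverting g.
        grad = np.linalg.solve(metric, grad.ravel()).reshape(theta.shape)
    return theta - eta * grad
```

Each gradient estimate costs one objective evaluation per parameter plus one baseline evaluation, so the cost per iteration grows linearly with the number of trainable parameters.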

III. RESULTS
We compare our Fourier ansatz to the step-wise ansatz for the objectives of quantum compiling and energy minimization. Further, we evaluate the scaling behavior of the variances of objective gradients with respect to the number of qubits. Throughout this work we use the Ising Hamiltonian [61] with a two-component transverse field for n_q qubits as the controllable system that generates the variational unitary U_θ. It is

$$H(t) = \sum_{j=1}^{n_q} \left[ B_x^j(t)\, \sigma_x^j + B_y^j(t)\, \sigma_y^j \right] + \sum_{j=1}^{n_q - 1} J^j(t)\, \sigma_z^j \sigma_z^{j+1},$$

with controllable parameters B_x^j(t), B_y^j(t) and J^j(t). We consider open boundary conditions, such that the index of J^j(t) goes up to j = n_q − 1. In total this gives (3n_q − 1) n_f trainable parameters in θ, as the B_x^j(t), B_y^j(t) and J^j(t) take the role of the θ_j(t) in Eq. 1.

FIG. 2. Illustration of hybrid quantum optimization. A quantum processing unit (QPU) is assumed to have controllable parameters θ. Problem-specific input r is mapped onto the initial state of the qubits, which the QPU evolves in time according to the parameters θ and its underlying Hamiltonian H. The final qubit state is measured to determine the value of an objective function L_θ and the Fubini-Study metric g. These quantities are used on a classical machine to approximate the quantum natural gradient step to update the parameters θ and improve L_θ.

Our ansatz in Eq. 3 presents a general parameterization of system parameters and therefore the particular choice of the Hamiltonian is not essential. In particular, neither the Fourier ansatz nor the choice of the Hamiltonian is informed a priori by any objective at hand. They are agnostic to the optimization tasks we utilize them for.
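The operator content of the controllable Ising system can be assembled with Kronecker products. The sketch below assumes single-qubit σ_x and σ_y field terms and a nearest-neighbor σ_z σ_z coupling with open boundary conditions; the coupling axis and the helper names `embed` and `ising_generators` are assumptions made for illustration.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
ID = np.eye(2, dtype=complex)

def embed(ops, n_q):
    """Tensor product with ops[site] on the listed sites and identity elsewhere."""
    out = np.array([[1.0 + 0j]])
    for site in range(n_q):
        out = np.kron(out, ops.get(site, ID))
    return out

def ising_generators(n_q):
    """Return the 3*n_q - 1 Hermitian operators multiplying B_x^j, B_y^j and J^j."""
    gens = []
    for j in range(n_q):
        gens.append(embed({j: SX}, n_q))  # transverse field, x component
        gens.append(embed({j: SY}, n_q))  # transverse field, y component
    for j in range(n_q - 1):
        gens.append(embed({j: SZ, j + 1: SZ}, n_q))  # open-boundary coupling
    return gens
```

Each returned operator plays the role of one H_j in the general Hamiltonian, so pairing the list with 3n_q − 1 protocols and n_f coefficients per protocol reproduces the parameter count (3n_q − 1) n_f.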

A. Quantum Compiling
We first demonstrate the performance of our ansatz for the example of learning implementations of the QFT represented by the unitary operation V, operating on n_q qubits. The matrix elements of V are

$$V_{kl} = 2^{-n_q/2} \exp\left( \frac{2\pi i\, (k-1)(l-1)}{2^{n_q}} \right),$$

where k, l = 1, ..., 2^{n_q}. For compiling unitary transformations, we utilize an objective function L_θ that is evaluated on a set {r} of randomized unentangled input states, which is similar to recent methods [62]. This objective function estimates the implementation error ϵ = 1 − |2^{−n_q} Tr(U_θ† V)|² between the unitaries U_θ and V. Note that there exist state estimation and tomography methods [63][64][65][66][67] that are experimentally favorable over the overlap in Eq. 11. Here we use this overlap due to its straightforward numerical accessibility.

In Fig. 3, we show the estimated implementation error ϵ during training, as a function of n_f for n_q ≤ 4. We observe that both implementations converge to the target transformation for sufficiently large n_f. For smaller n_f the accessible unitary transformations generated from the ansätze Eqs. 3 and 4 are insufficient and presumably do not contain the QFT on n_q qubits. We emphasize that our Fourier based ansatz consistently outperforms the step-wise ansatz in terms of convergence speed. We show in Figs. 3 (a,b,c) that our ansatz tends to converge after roughly 50, 100 and 200 training iterations for n_q = 2, 3 and 4, respectively. Figs. 3 (d,e,f) show that the step-wise protocol ansatz tends to converge after roughly 100, 300 and 1800 iterations for n_q = 2, 3 and 4, respectively. For n_q = 4 in Fig. 3 (f), the convergence behavior of the step-wise ansatz is increasingly inconsistent. The step-wise ansatz has the tendency to linger at suboptimal fidelities from which it only moves away very slowly. This behavior becomes more prominent with increasing n_q and is a consequence of the loss landscape that follows from the parameterization in Eq. 4. Our ansatz does not show this behavior, but rather exhibits faster and more direct convergence. This is an indication of the absence of vanishing gradients, as is apparent when comparing Figs. 3 (c) and (f).
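The target unitary and the implementation error ϵ = 1 − |2^{−n_q} Tr(U_θ† V)|² are straightforward to evaluate numerically. The following sketch uses the standard QFT convention with a positive phase and zero-based indices; it computes the exact error from the full matrices and does not reproduce the sampled estimate over input states {r}. The function names are illustrative.

```python
import numpy as np

def qft_matrix(n_q):
    """QFT on n_q qubits: V[k, l] = exp(2*pi*i*k*l / 2^n_q) / sqrt(2^n_q), k, l from 0."""
    dim = 2 ** n_q
    idx = np.arange(dim)
    return np.exp(2j * np.pi * np.outer(idx, idx) / dim) / np.sqrt(dim)

def implementation_error(u, v):
    """epsilon = 1 - |Tr(U^dagger V) / dim|^2, insensitive to a global phase of U."""
    dim = v.shape[0]
    overlap = np.trace(u.conj().T @ v) / dim
    return 1.0 - abs(overlap) ** 2
```

Because the error depends only on the modulus of the trace overlap, a protocol that realizes the QFT up to a global phase is counted as a perfect implementation.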
In order to further evaluate the quality of the converged solutions, we show the minimal errors after training ϵ_opt with respect to the hyperparameter n_f for both ansätze in Figs. 4 (a) and (b). We find the minimal n_f that is necessary for convergence during training to be approximately n_{f,min} ≈ 4, 6 and 8 for n_q = 2, 3 and 4, respectively. The minimal n_f necessary for convergence appears to be the same for both ansätze in this example. For larger n_f, the minimal error converges to very small values that show no strong dependence on n_f. For the cases of n_q = 3 and n_q = 4, the resulting minimal error tends to approach ϵ_opt ≈ 10^{−5}. We note that for a concrete experimental realization, additional considerations, e.g., which dissipative processes are present and how well a specific parameter can be tuned dynamically, determine the overall success of these approaches, which will be explored elsewhere.
As a second figure of merit we consider the effective implementation action, which we quantify with the integrated magnitude of the vector of system parameters θ(t), such that

$$\Phi = \int_0^1 |\theta(t)|\, dt.$$

Given that the parameters θ_j(t) have the units of energy, this quantity is an overall measure of the phase or action that is accumulated during the time-evolution. It therefore quantifies an estimate of both the energy that is required to implement a protocol in a given time, as well as the time that is required given a bound to the magnitude of the parameters θ_j(t). This figure of merit allows us to determine whether a solution with improved fidelity in our Fourier ansatz merely emerges due to decreased time-efficiency. In Figs. 4 (c) and (d) we show the effective actions Φ_opt of the same optimal solutions of Figs. 4 (a) and (b), with respect to the hyperparameter n_f. We find the two ansätze to be very similar in terms of necessary action and therefore time-efficiency. In both ansätze, there is no strong dependence on the hyperparameter n_f past n_{f,min}. While the implementation actions consistently remain reasonably small, there is a clear and expected tendency of implementations to require larger effective actions with increasing numbers of qubits.
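The effective action can be evaluated by numerical quadrature of the Euclidean norm of the parameter vector over t ∈ [0, 1]. This is a minimal sketch that assumes a midpoint rule with `n_samples` equal slices; the function name and the sample count are illustrative choices.

```python
import numpy as np

def effective_action(protocols, n_samples=1000):
    """Approximate Phi = int_0^1 |theta(t)| dt for a list of callables theta_j(t)."""
    dt = 1.0 / n_samples
    t_mid = (np.arange(n_samples) + 0.5) * dt  # midpoints of the time slices
    total = 0.0
    for t in t_mid:
        vec = np.array([theta(t) for theta in protocols])
        total += np.linalg.norm(vec) * dt  # |theta(t)| is the Euclidean norm
    return total
```

For example, two constant protocols with values 3 and 4 give an action of 5, since the norm of the parameter vector is constant over the unit time interval.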

B. Energy Minimization
As a second optimization task, we consider the energy expectation value of a problem Hamiltonian H_p and its minimization. Specifically, we consider the objective function

$$L^E_\theta = \langle 0 | U_\theta^\dagger H_p U_\theta | 0 \rangle,$$

where U_θ is the time-evolution operator of the Hamiltonian given in Eq. 2, which we use to construct the trial state U_θ|0⟩. We use the shortened notation |0⟩ = |0⟩^{⊗n_q} for the state where all qubits are in the logical zero state. We perform this ground state search for random problem Hamiltonians for both our ansatz and the step-wise ansatz with n_f = 16. In this example we do not apply the QNG, i.e., we set the metric g = 1, for simplicity. Fig. 5 shows the energy differences to the ground state energies ΔE = ⟨E⟩_θ − E_0 for the training trajectories of three randomized problem Hamiltonians for up to six qubits. We again see that our ansatz outperforms the step-wise ansatz in terms of convergence speed. There is an increasing tendency of gradients to flatten out in the step-wise ansatz. This behavior is not present in our ansatz and indicates the onset of barren plateaus in the optimization of ground state preparation for step-wise protocols.
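The energy objective and the gap ΔE to the ground state energy can be computed directly from a unitary and a problem Hamiltonian. In the sketch below, the random problem Hamiltonian is drawn as a Gaussian random Hermitian matrix, which is an assumption, since the text here does not state how the random Hamiltonians are generated. The function names are hypothetical.

```python
import numpy as np

def energy_objective(u, h_p):
    """<0|U^dagger H_p U|0>; U|0...0> is the first column of U."""
    psi = u[:, 0]
    return float(np.real(np.vdot(psi, h_p @ psi)))

def random_hermitian(dim, rng):
    """Gaussian random Hermitian matrix (assumed model for a random H_p)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2

def energy_gap_to_ground(u, h_p):
    """Delta E = <E>_theta - E_0, with E_0 the smallest eigenvalue of H_p."""
    e0 = np.linalg.eigvalsh(h_p)[0]
    return energy_objective(u, h_p) - e0
```

By the variational principle the gap is non-negative for any unitary, and it vanishes when the first column of the unitary is a ground state of the problem Hamiltonian.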

C. Objective Gradient Variances
In order to quantify the presence of barren plateaus, we consider the variance of the gradients of the objective function for both our ansatz and the step-wise ansatz. In random parameterized quantum circuits (RPQCs) this amounts to uniformly sampling possible initializations in the parameter space of θ [40]. In analog parameterizations of quantum algorithms, the parameter space is aperiodic and non-compact, such that sampling is more intricate. We consider uniformly sampled vectors θ inside a (3n_q − 1)-dimensional ball with radius |θ|_max for each time-step in the step-wise ansatz, and |θ|_max/k for each kth Fourier mode in our ansatz. The value of |θ|_max determines the set of reachable states of a given ansatz. We consider the variance of the gradient with respect to the first parameter for the specific problem Hamiltonian H_p = σ_z^1 σ_z^2. We calculate the variance as a function of |θ|_max for up to 8 qubits for n_f = 128. Analytical arguments on the existence of barren plateaus in RPQCs [40] rely on time-local expressions of the gradient of a loss function such as Eq. 14. This also applies to the step-wise ansatz. However, in our ansatz given by Eq. 3, the corresponding expression is not time-local. The variance of this expression includes all possible covariances of time-local changes to the protocols θ(t), which differs substantially from the variances in RPQCs. Further, in the parameter space of θ(t), the unitaries U^{t′}_0 and U^1_{t′} are neither necessarily independent in the sense of the Haar measure nor guaranteed to be 2-designs. Therefore, the analytical argument for RPQCs [40] does not apply to our ansatz. In particular, the argument generates no statement about the scaling behavior.
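The sampling procedure can be sketched as follows. Uniform sampling inside a d-dimensional ball is done here with the standard construction of a normalized Gaussian direction and a radius drawn as R u^{1/d}, which is a choice made for this illustration and not stated in the text. The array layout (one column per time-step or Fourier mode) and all names are assumptions.

```python
import numpy as np

def sample_ball(dim, radius, rng):
    """Draw one point uniformly from the dim-dimensional ball of the given radius."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)       # uniform direction on the sphere
    r = radius * rng.uniform() ** (1.0 / dim)    # radial law for uniform volume
    return r * direction

def sample_stepwise(n_q, n_f, theta_max, rng):
    """One ball sample of radius theta_max per time-step; shape (3*n_q - 1, n_f)."""
    dim = 3 * n_q - 1
    return np.stack([sample_ball(dim, theta_max, rng) for _ in range(n_f)], axis=1)

def sample_fourier(n_q, n_f, theta_max, rng):
    """One ball sample of radius theta_max / k per Fourier mode k = 1..n_f."""
    dim = 3 * n_q - 1
    cols = [sample_ball(dim, theta_max / k, rng) for k in range(1, n_f + 1)]
    return np.stack(cols, axis=1)
```

The gradient variance at a given |θ|_max would then be estimated by drawing many such samples, evaluating the derivative of the objective with respect to the first parameter for each, and taking the sample variance.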
In Fig. 6 (a), we show the results of the step-wise ansatz. We find that the variance is independent of the number of qubits n_q for small |θ|_max. For increasing |θ|_max, the variance decays exponentially with |θ|_max with slopes that are independent of n_q. More importantly, the variance decays exponentially as a function of n_q with a log-scale slope of roughly ln(1/2), as indicated by the equally spaced lines. The step-wise ansatz is reminiscent of a continuous Trotterized limit of parameterized circuits and therefore these results are consistent with barren plateau studies on RPQCs [40].
In Fig. 6 (b), we show the results for our ansatz. The variances show asymptotic behavior as functions of |θ|_max. They converge at increasingly large values of |θ|_max, which vastly exceed implementation actions that are necessary for highly entangling unitaries such as the QFT, as we show in Fig. 4 (c). Thus, in our ansatz |θ|_max provides a useful hyperparameter for initialization that can be tuned to comparatively small values where the scaling with n_q is very favorable. Further, we find that the variance decreases as a function of n_q at a decreasing and non-exponential rate. This non-exponential scaling behavior indicates the reduction of barren plateaus in our ansatz, in particular during initialization.

IV. CONCLUSION
We have proposed a system-agnostic ansatz of analog variational quantum algorithms rooted in quantum optimal control. The central feature of our ansatz is that it treats the Fourier coefficients of the time-controlled system parameters of a given Hamiltonian as trainable. Therefore, our ansatz is non-local in time and has no direct analog in discretized parameterized quantum circuits. By restricting the modes to low-end frequencies we keep the number of trainable parameters low, while also ensuring smooth quantum protocols and sufficient controllability by construction. We have applied a measurement-based stochastic quantum natural gradient optimization scheme to our ansatz to generate protocols for the quantum Fourier transform for up to four qubits. Additionally, we have optimized ground state preparation processes for random problem Hamiltonians for up to six qubits. We compared the results to optimizations of the more commonly utilized step-wise parameterization ansatz. The results we have presented here are limited to few-qubit systems, as the numerical simulations on the native Hamiltonian level are computationally more demanding than the circuit-based counterparts of conventional VQA. This does not translate into a lack of scalability in a true hybrid realization of the proposed method.
We have demonstrated that the convergence behavior of our ansatz outperforms the step-wise protocols in speed and consistency. We have found the effective implementation action to be comparable and to remain reasonably small in both ansätze. We have analyzed the gradient along the loss landscape for both ansätze, and have shown that our ansatz shows non-exponentially decreasing variances with respect to the number of qubits, indicating an absence of barren plateaus. The step-wise ansatz shows a characteristic exponential decay with the number of qubits that is consistent with barren plateau studies on random parameterized quantum circuits. The scaling behavior of objective gradient variances for larger systems, as well as tuning the sampling range for initialization and its relation to expressibility, will be elaborated on elsewhere.
In conclusion, our ansatz is a promising candidate for mitigating barren plateaus in quantum algorithm optimization and presents an alternative to parameterizations that are discrete or local in time. This approach is of direct relevance for current efforts of implementing quantum computing, as it provides realistic and efficient access to optimal quantum algorithm protocols.

FIG. 3. Implementation errors during training of the quantum Fourier transform. The errors ϵ during training as a function of the hyperparameter n_f for the QFT for n_q = 2 (a,d), 3 (b,e) and 4 (c,f) for our Fourier based ansatz (a,b,c) and the step-wise protocol ansatz (d,e,f). For sufficiently large n_f ≥ n_{f,min} both ansätze converge to very small errors. Our Fourier based ansatz outperforms the step-wise ansatz in terms of convergence speed and consistency.

FIG. 4. Minimal errors and effective actions for training the quantum Fourier transform. The minimal errors ϵ_opt (a,b) found during training and the corresponding effective protocol actions Φ_opt (c,d) of both ansätze. The training results are for the QFT for n_q = 2 (blue circles), 3 (red triangles) and 4 (green squares). The inconsistent ϵ_opt in the step-wise ansatz for n_q = 4 (b) is a consequence of the suboptimal convergence behavior, related to the emergence of barren plateaus.

FIG. 5. Training trajectories for energy minimization. Learning trajectories for the ground state preparation of three randomly generated problem Hamiltonians for n_q = 4, 5, 6 qubits for our ansatz (solid lines) and the step-wise ansatz (dashed lines). ΔE = ⟨E⟩_θ − E_0 is the expected energy of the prepared states relative to the ground state energy. In all cases n_f = 16.
where U^b_a is the time-evolution operator from the time a to the time b ≥ a. For a ≥ b it is U^b_a = (U^a_b)†.

FIG. 6. Variances of the energy objective gradient. The variance of the gradient ∂_{θ_{1,1}} L^E_θ of the loss function L^E_θ = ⟨0|U_θ† [σ_z^1 σ_z^2] U_θ|0⟩ for up to 8 qubits for the step-wise ansatz (a) and our Fourier based ansatz (b) on a logarithmic scale. The parameters are sampled uniformly within a radius of |θ|_max for n_f = 128. The lines are visual guides.