Benchmarking hybrid digitized-counterdiabatic quantum optimization

Hybrid digitized-counterdiabatic quantum computing (DCQC) is a promising approach for leveraging the capabilities of near-term quantum computers, utilizing parameterized quantum circuits designed with counterdiabatic protocols. However, the classical aspect of this approach has received limited attention. In this study, we systematically analyze the convergence behavior and solution quality of various classical optimizers when used in conjunction with the digitized-counterdiabatic approach. We demonstrate the effectiveness of this hybrid algorithm by comparing its performance to the traditional QAOA on systems containing up to 28 qubits. Furthermore, we employ principal component analysis to investigate the cost landscape and explore the crucial influence of parameterization on the performance of the counterdiabatic ansatz. Our findings indicate that fewer iterations are required when local cost landscape minima are present, and the SPSA-based BFGS optimizer emerges as a standout choice for the hybrid DCQC paradigm.


I. INTRODUCTION
In the field of quantum computing, one crucial aspect of research involves benchmarking quantum algorithms. The purpose of benchmarking is to thoroughly evaluate the performance of a quantum algorithm using a well-defined set of metrics. This evaluation process is particularly important for the Noisy Intermediate-Scale Quantum (NISQ) era. Benchmarking quantum algorithms is essential for gaining insights into the capabilities and limitations of NISQ computers while executing quantum algorithms.
In recent years, the field of quantum computing has witnessed remarkable progress, opening up new frontiers for solving complex computational problems that were once considered intractable for classical computers. Among several promising quantum algorithms, Variational Quantum Algorithms (VQAs) have emerged as a particularly intriguing and versatile approach [1]. VQAs utilize the unique properties of quantum systems to tackle optimization tasks by iteratively optimizing the parameters of a quantum circuit to minimize a given cost function. This flexibility and adaptability make VQAs well-suited for a wide range of applications, including quantum chemistry simulations [2][3][4], machine learning [5,6], and combinatorial optimization [7,8].
One powerful example of a VQA is the Quantum Approximate Optimization Algorithm (QAOA) [9]. QAOA employs p layers of unitary operations to iteratively explore potential solutions using classical optimization and enhance the probability of obtaining the optimal solution. By adjusting the number of layers, QAOA offers efficient use of quantum hardware for solving a given optimization problem. One of the main challenges of QAOA is its sensitivity to the choice of hyperparameters. Finding the optimal values for these parameters can be a challenging and time-consuming task, especially for large-scale optimization problems, as it may require significant classical computational resources and extensive experimentation.
Over the past few years, numerous algorithms have been introduced to enhance the capabilities of the original QAOA. One such example is multi-angle QAOA (maQAOA), which incorporates additional parameters into the quantum circuit to achieve improved approximation ratios compared to the standard QAOA [10]. Similarly, Adaptive Derivative-Assembled Problem-Tailored QAOA (ADAPT-QAOA) employs an iterative process to select the ansatz from a predefined pool of operators [11]. This selection aims to maximize the gradient of the commutator between the pool operator and the cost Hamiltonian, leading to enhanced optimization outcomes. Several other techniques, such as Recursive QAOA (RQAOA) [12,13], Quantum Alternating Operator Ansatzes (QAOAnsatz) [14], Spanning Tree QAOA (ST-QAOA) [15], and Adaptive Bias QAOA (ab-QAOA) [16], have been proposed with the aim of enhancing various aspects of the QAOA. These methods claim varying degrees of improvement in circuit depth, parameter space, operator pool, and computational cost [17].
In this paper, we examine the performance and effectiveness of hybrid digitized-counterdiabatic quantum computing (DCQC) algorithms [18]. This method incorporates elements from Shortcuts to Adiabaticity (STA) [19,20], particularly the use of counterdiabatic (CD) driving [21], to reduce the circuit depth and improve the optimization process of conventional QAOA. In prior studies, these algorithms have exhibited their efficacy in addressing diverse optimization problems.

TABLE I. Update rules of parameters and gradients for hybrid gradient-based optimizers using parameter-shift (PS) or SPSA rules, and for gradient-free optimizers.

Optimizer      Gradient calculation    Update rule
COBYLA         —                       Terminates when the target radius is satisfied.
Nelder-Mead    —                       Terminates when the cost function is minimized according to the criteria in Eq. B4.
Our investigation focuses on convergence while using different gradient-based and gradient-free optimizers. We illustrate the scalability of this approach with respect to system size, highlighting its effectiveness in finding the ground state, even when p = 1, within systems containing up to 28 qubits. To consolidate these findings, we explore the Fourier landscape of the cost function for various parameterizations using principal component analysis (PCA). This paper is structured as follows. Section II provides a brief discussion of hybrid DCQC, followed by a detailed description of gradient-based and gradient-free classical optimizers and their application in hybrid DCQC. This is followed by the respective findings regarding various parameterizations, the cost landscape, and the variance of gradients for different optimizers in Section III. We finally conclude in Section IV and provide directions for future work.

II. BACKGROUND
Hybrid DCQC algorithms fall under the umbrella of VQAs, where the quantum part consists of a circuit ansatz with parameterized gates. These parameters are optimized by classical routines to minimize a given cost function. The selection of these parameterized gates can be done in two major ways: taking information from the problem or taking information from the hardware. The former is known as a problem-inspired ansatz and the latter as a hardware-efficient ansatz. One of the popular problem-inspired ansatzes is QAOA, where two unitaries U_b(β) and U_c(α) with tunable parameters (α, β) are applied iteratively p times to an initial state |ψ_0⟩. Here, U_b(β) = exp(−iβH_m) is the mixing unitary and U_c(α) = exp(−iαH_c) is the unitary corresponding to the cost Hamiltonian, with [H_m, H_c] ≠ 0. The aim is to minimize the expectation value ⟨ψ(α, β)|H_c|ψ(α, β)⟩ in order to reach the ground state of the system, which encodes the solution of the chosen problem.
The main drawback of QAOA is the need for high circuit depth (large p), as it takes inspiration from the adiabatic process, which is inherently slow. To circumvent this, DCQC makes use of counterdiabatic protocols to decrease the circuit depth of the ansatz. In DCQC, the circuit ansatz is chosen by finding suitable operators using the nested commutator (NC) method [26]. The resultant parameterized unitary is of the form [25]

U(θ) = ∏_k exp(−i θ_k A_k),

where A_k ∈ A_λ^(l), and A_λ^(l) is an operator pool obtained from the NC method [25]. Here, θ is the set of parameters to be optimized.
The classical optimizers employed to optimize these parameters also play a crucial role in determining the performance of an ansatz. Classical optimizers can be broadly classified into two categories: gradient-based and gradient-free. In gradient-based optimization, the iteration step is decided based on the gradient direction. The gradient values can be evaluated by various methods, such as finite-difference or parameter-shift [27]. There are also newer methods that take fewer circuit evaluations to compute the gradient [28]. After evaluating the gradient, classical optimizers like Adam [29] and Adagrad [30] are implemented to find the next iteration steps.
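As a concrete illustration of such an update rule, the following is a minimal NumPy sketch of one Adam step; the function name and hyperparameter defaults are generic choices for illustration, not tied to the implementation used in this work.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moment estimates, bias correction, step."""
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction for early iterations
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Iterating this step with gradients supplied by any of the methods below (parameter shift, SPSA, or adjoint) yields the corresponding hybrid optimizer.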
On the other hand, gradient-free optimizers do not rely on gradient information to determine the next iteration. These include optimizers like COBYLA [31], Nelder-Mead [32], etc. There also exist other optimizers, like the simultaneous perturbation stochastic approximation (SPSA) [33], that do not directly utilize gradient information. In SPSA, the gradient is approximated along a randomly chosen direction by a single partial derivative computed by finite difference. This has shown a faster convergence rate than the finite-difference method [33].
Before exploring gradient-based optimizers, we begin by describing a few methods to calculate gradients, which will be used later.
Parameter shift (PS): Let us consider a cost function J(θ), parameterized by variables θ = {θ_1, θ_2, · · · , θ_m}. PS enables us to calculate the exact gradient of J(θ) using the following relation [27]:

∇_{θ_i} J(θ) = r [ J(θ + (π/4r) e_i) − J(θ − (π/4r) e_i) ],

where ∇_{θ_i} J(θ) represents the gradient with respect to element θ_i ∈ θ, e_i is the unit vector along θ_i, and r is the shift constant, determined by the chosen ansatz. For instance, it has been proved that r = 1/2 for all single-qubit gates [34]. This method evaluates the same ansatz twice with shifted parameters for every parameter, making the runtime proportional to twice the total number of parameters.
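A minimal sketch of the parameter-shift rule on a toy one-qubit cost: with U(θ) = RY(θ), the expectation of Z gives J(θ) = cos θ exactly, so the shift rule with r = 1/2 should reproduce −sin θ. The function names here are illustrative, not from the paper's codebase.

```python
import numpy as np

def cost(theta):
    # Toy one-qubit cost: J(theta) = <0| RY(theta)^dag Z RY(theta) |0> = cos(theta)
    return np.cos(theta)

def ps_gradient(J, theta, r=0.5):
    # Parameter-shift rule: r * [J(theta + pi/(4r)) - J(theta - pi/(4r))]
    s = np.pi / (4 * r)
    return r * (J(theta + s) - J(theta - s))
```

Note that both shifted evaluations run the full circuit, which is why PS scales linearly with the number of parameters.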
SPSA: SPSA approximates the gradient by introducing an m-dimensional perturbation vector δ = (δ_1, δ_2, · · · , δ_m)^T, whose elements δ_i are all randomly picked from {−1, 1}. The gradient estimator can be written as [35]

ĝ_i(θ) = [ J(θ + cδ) − J(θ − cδ) ] / (2c δ_i),

where c is a small positive scaling factor. Due to this perturbation, SPSA is robust in noisy environments.

Adjoint differentiation: Consider the circuit ansatz defined by the unitary evolution operator U(θ) and any observable Ô, for which we compute the trial state |ψ(θ)⟩ = U(θ)|0⟩, such that the gradient with respect to θ_i ∈ θ can be formulated as

∂J(θ)/∂θ_i = 2 Re[ ⟨ψ(θ)| Ô (∂U(θ)/∂θ_i) |0⟩ ].

Using the above-mentioned methods of computing gradients, we implement a total of six gradient-based optimizers, namely PS-SGD, PS-BFGS, PS-Adam, SPSA-SGD, SPSA-BFGS, and SPSA-Adam, to compare their performance with different ansatzes. Further details on the optimizers can be found in Appendix A. In addition, we also use two gradient-free optimizers, namely COBYLA and Nelder-Mead (see Appendix B). Table I provides a brief summary of the update rules for every optimizer. Due to the high computational cost of the parameter shift, we use the adjoint method in simulations. This is because we have found a strong agreement between these two methods of computing gradients, which is explained explicitly in Appendix C.
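The SPSA estimator above can be sketched in a few lines. Since δ_i ∈ {−1, 1}, dividing by δ_i is the same as multiplying by it, so a single pair of evaluations yields the full gradient estimate regardless of m. The quadratic test cost and function names are illustrative assumptions, not part of the paper's implementation.

```python
import numpy as np

def spsa_gradient(J, theta, c=0.1, rng=np.random.default_rng(0)):
    # One SPSA estimate: exactly two cost evaluations, independent of dimension m.
    # The shared default rng advances across calls (intentional for averaging).
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    diff = (J(theta + c * delta) - J(theta - c * delta)) / (2 * c)
    return diff * delta  # for +/-1 perturbations, 1/delta_i == delta_i
```

A single estimate is noisy, but averaging over many random perturbation vectors converges to the true gradient, which is why SPSA-based optimizers in this study require only a constant number of circuit evaluations per iteration.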
One of the most important factors that determines the performance of an ansatz is its expressibility, defined as how uniformly the ansatz is capable of searching the solution space of the problem Hamiltonian. When the solution is not known, a highly expressible ansatz is required to explore larger spaces, so that the probability of finding the solution is higher. On the other hand, even when the ansatz includes the solution, it should be trainable, such that the optimization can reach the solution effectively. So, the requirement for a 'good' ansatz is that it should include the solution and be sufficiently trainable. Therefore, the choice of classical optimization can have a drastic impact on the trainability of the circuit ansatz. High expressibility is not always desirable, as one may face the problem of barren plateaus, where the gradient vanishes exponentially with increasing system size and parameter space. Random circuits have already been shown to exhibit barren plateaus with increasing system size [36].
Another crucial element related to expressibility and trainability is the parameterization of the ansatz. Several works have investigated the effect of parameterization on the performance of the chosen ansatz [37]. Thus, making the right choice of parameterization and classical optimizer is crucial for an optimization algorithm to work. For example, if the energy landscape has many local minima, even a good classical optimizer can get stuck in solutions far from what we expect. Therefore, analysis of the energy landscape becomes important. If the number of parameters scales with the system size, the energy landscape becomes high dimensional, which makes the analysis non-trivial.
PCA is a commonly utilized technique in data analysis, aimed at reducing the dimensionality of a dataset while preserving as much information as possible. Generally, this is done by transforming the dataset to a new space determined by principal components, through which a lower-dimensional hyperplane can be obtained, making the visualization of multidimensional data easier. The principal components are computed from the eigenvectors of the covariance matrix generated by the dataset, which define a transformation matrix E such that

γ = Eθ,    θ = E^T γ,    (5)

where γ = (γ_1, γ_2, · · · , γ_m) represents the principal components and E^T defines their directions. Each γ_i contains the information about the variance of the i-th principal component. A reduced n-dimensional principal component space can thus be achieved by truncating at position n. To obtain a more profound comprehension of the optimization process using different optimizers for hybrid DCQC, we use the first two principal components to visualize the cost landscape. The corresponding original parameters θ can then be obtained using Eq. 5 and utilized to visualize the optimization trajectory on the cost landscape.
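The projection described above can be sketched with NumPy as follows: the rows of a trajectory matrix (one optimizer iterate per row) are centered, the covariance eigenvectors are computed, and the top-n eigenvectors form the reduced space. Function and variable names are our illustrative assumptions.

```python
import numpy as np

def pca_project(trajectory, n=2):
    """Project rows of `trajectory` (iterations x m parameters) onto the
    first n principal components of their covariance matrix."""
    X = trajectory - trajectory.mean(axis=0)   # center the data
    cov = np.cov(X, rowvar=False)              # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort eigenvectors descending
    E = eigvecs[:, order[:n]]                  # top-n principal directions
    gamma = X @ E                              # reduced coordinates
    # Centered parameters are approximately recovered as gamma @ E.T
    return gamma, E
```

Applying this to the sequence of parameter vectors visited by an optimizer gives the two-dimensional trajectories plotted on the cost landscape.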

III. RESULTS
In this section, we benchmark the hybrid DCQC ansatz with different parameterizations as well as different optimizers to find the ground state of the Sherrington-Kirkpatrick (SK) model. The SK model is a classical spin model that possesses all-to-all connectivity [38,39]. This model is particularly interesting because several optimization problems can be encoded into spin-glass systems [40]. The Hamiltonian of the SK model is given by

H_c = Σ_{i<j} J_ij σ_z^i σ_z^j,

where J_ij are coupling coefficients associated with spin i and spin j. We study the SK model with sizes ranging from N = 14 to N = 28 qubits, with J_ij ∈ {−1, 1} having mean 0 and variance 1. For this model, the favourable CD ansatz is A = Σ_i σ_y^i + Σ_{i,j} σ_y^i σ_z^j. In our investigation of the parameterization, we consider two types of ansatz: a 2-parameter ansatz, where one parameter is associated with all the single-qubit gates and the other with the two-qubit gates, and a fully parameterized ansatz, wherein each gate has its own independent free parameter, amounting to a total of N(N − 1)/2 + N free parameters. In Fig.
1, we study the impact of this parameterization in both conventional QAOA and hybrid DCQC for p = 1, utilizing the PS-Adam optimizer for N = 10 qubits. The 2-parameter ansatz fails to reach the ground state, and QAOA shows better minimization than the hybrid DCQC ansatz in this case. However, DCQC shows a lower standard deviation (the shaded region), which means it quickly converges to a particular minimum. On the other hand, QAOA shows a high standard deviation, indicating random behaviour in the convergence process. Upon fully parameterizing the ansatz, DCQC shows superior performance in efficiently reaching the ground state. The standard deviation of the final solution is also small for DCQC, proving its efficacy with gradient-based optimizers. Note that this fully parameterized version of QAOA is known as maQAOA [10], which has been shown to outperform QAOA. That being said, increasing the number of parameters per layer results in increased expressibility. From this, we conclude that the fully parameterized DCQC ansatz performs best. Therefore, we select this ansatz and perform further analysis with various optimizers.
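For concreteness, the classical side of the SK cost defined above can be sketched as follows: sampling ±1 couplings, evaluating the energy of a spin configuration, and brute-forcing the exact ground-state energy for small N (used as the reference E_g). Function names and the seed are our illustrative assumptions.

```python
import numpy as np

def sk_couplings(n, rng=np.random.default_rng(42)):
    # J_ij drawn uniformly from {-1, +1} (mean 0, variance 1), upper triangle i < j
    return np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)

def sk_energy(spins, J):
    # H = sum_{i<j} J_ij s_i s_j, with s_i in {-1, +1}
    return spins @ J @ spins

def exact_ground_energy(J):
    # Brute force over all 2^n spin configurations (feasible only for small n)
    n = J.shape[0]
    best = np.inf
    for b in range(2 ** n):
        s = 1.0 - 2.0 * np.array([(b >> i) & 1 for i in range(n)])
        best = min(best, sk_energy(s, J))
    return best
```

In the experiments reported here, the quantum circuit replaces the brute-force search: the expectation ⟨H_c⟩ of the trained ansatz is compared against this exact E_g.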
To compare the impact of different classical optimizers, we restrict ourselves to a system size of N = 10 qubits, initialized with ten random instances. Fig. 2 illustrates the performance of all the optimizers over 1000 function evaluations. The number of function evaluations is defined as the total number of measurements performed to evaluate the cost function during the optimization process. This metric is most suitable for comparing the performance of both gradient-based and gradient-free optimizers. In the case of gradient-based optimizers, it also includes the number of measurements required to compute the gradient. For instance, PS requires two function evaluations to calculate the gradient with respect to each parameter, resulting in (2m + 1) function evaluations for each iteration. Note that gradient calculation using PS is costly in our study, as the number of function evaluations increases linearly with the number of parameters. In Fig. 2a, we show the comparison of three optimizers where the gradients are computed using the PS rule. We observe that, among these PS-based optimizers, PS-Adam approaches the ground-state energy faster than PS-SGD and PS-BFGS.
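The evaluation-count bookkeeping above can be made explicit in a few lines; this is a sketch of the accounting used in our comparison, with labels chosen for illustration.

```python
def evals_per_iteration(m, method):
    """Function evaluations per optimizer iteration for m parameters,
    counting one evaluation of the cost itself on top of the gradient cost."""
    if method == "PS":
        return 2 * m + 1   # two shifted evaluations per parameter, plus the cost
    if method == "SPSA":
        return 3           # two perturbed evaluations, plus the cost
    raise ValueError(f"unknown method: {method}")
```

For the fully parameterized N = 28 ansatz (m = 406), PS costs 813 evaluations per iteration while SPSA stays at 3, which is the gap driving the results in Fig. 2.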
In general, SPSA-based optimization tends to perform better than PS, as it requires only three function evaluations per iteration regardless of the size of the parameter space. For gradient-based optimizers using SPSA, we observe that BFGS outperforms the other two by a significant margin, with fewer function evaluations. To make the analysis thorough, we have also compared two gradient-free optimizers in Fig. 2c. Like SPSA, COBYLA also needs three function evaluations per iteration, whereas Nelder-Mead requires only one. However, COBYLA converges more efficiently than Nelder-Mead. In Fig. 2d, we compare the best optimizers from the three above-discussed groups, where SPSA-BFGS comes out as the best optimizer among all those studied in this paper. It is also important to note that this benchmarking heavily depends on the system size. Nonetheless, for smaller N, SPSA-BFGS demonstrates the best performance for DCQC. In addition, to comprehensively understand the process of reaching the minima, we count the number of iterations and function evaluations needed for each optimizer to achieve 90% of the exact ground-state energy. Fig. 3 clearly depicts that SPSA-BFGS outperforms the other optimizers in terms of both the number of iterations and the number of function evaluations.
In order to numerically analyze the scalability of DCQC in the NISQ regime, we conducted evaluations based on the mean of 10 randomly initialized parameter sets for a specific instance of the model, employing p = 1 layers of the ansatz and the SPSA-BFGS optimizer from N = 10 to N = 28 qubits. In Fig. 4, we plot the approximation ratio with respect to the number of qubits, where the approximation ratio is defined as ⟨H_c⟩/E_g. We observe that the performance of the DCQC ansatz is better than that of QAOA, considering that the size of the parameter space is the same for both ansatzes. Fig. 4 also shows that the performance of both ansatzes depends strongly on the chosen initial parameters. This is evident from the high error bars, signifying that when the initialization is not good, the optimization lands in local minima. Nevertheless, with appropriate initial parameters, we can find the exact ground state for systems up to N = 28. Note that even for an ansatz with m = 406 parameters, DCQC can reach the exact ground state, indicating the absence of issues like barren plateaus in the cost landscape.
Next, to visualise the cost landscape and the optimization trajectory, we employ PCA, which is designed to reduce the dimensionality of the parameter space by identifying a hyperplane that effectively captures the properties of all the samples. As mentioned in the previous section, PCA enables us to obtain the variance for each dimension of the hyperplane, which serves as a measure of the sparsity of the samples. By leveraging PCA, we gain valuable insights into the high-dimensional data, allowing us to better understand and navigate the complex cost landscape.
The first two principal components (x and y axes in Fig. 5) account for at least 70% of the total variance in our data, which permits us to effectively visualize and understand the primary trends and patterns in the cost function.This enables us to make informed decisions to refine our optimization strategies accordingly [41].
In Fig. 5, we start the optimization with the same initial points for all the optimizers (shown by the blue dots) and show the optimization trajectory that eventually reaches the minimum (the red dots). The number of points represents the number of iterations required to obtain the global minimum. The observations of the cost landscape validate the results we obtained in Fig. 2. We observe that the optimization requires fewer iterations when local minima exist in the cost landscape of the principal components. As depicted in Fig. 5a and Fig. 5f, Adam works best for PS-based optimization, whereas, among SPSA-based optimizers, BFGS reaches the minimum in the fewest possible steps. Furthermore, the cost landscapes in Fig. 5d and Fig. 5h indicate the inability of SPSA-SGD and Nelder-Mead to reach a global optimum.
Considering the results presented earlier, we can confidently assert that the SPSA-based BFGS optimizer emerges as the most promising candidate for hybrid DCQC. Its effectiveness, combined with its ability to efficiently navigate the cost landscape, makes it the most suitable choice for the optimization process. This conclusion underscores the importance of selecting appropriate classical optimizers to enhance the performance and practicality of the DCQC algorithm for various quantum computing applications.

IV. CONCLUSION
In this article, we have conducted an extensive study of a hybrid DCQC algorithm, benchmarking its performance and efficacy with respect to various classical optimizers. Our results suggest that gradient-based optimizers like Adam and BFGS typically exhibit faster convergence than gradient-free optimizers such as COBYLA and Nelder-Mead, as they need fewer function evaluations. Nonetheless, the method used for gradient calculation plays a pivotal role in distinguishing the most efficient optimizer. When the number of circuit evaluations is considered as the performance metric, SPSA-BFGS emerges as the best, demonstrating a distinct advantage over the other optimizers, whether gradient-based or gradient-free. For a fully parameterized ansatz, DCQC with SPSA-BFGS provides a better approximation ratio than maQAOA even when scaled up to a system size of N = 28, avoiding trainability issues such as barren plateaus. Moreover, the study of the cost landscape yields valuable insights into the behavior of the optimization processes, particularly in relation to variations in the first two principal components obtained through PCA. However, one issue requires addressing: the assumption of equality between the parameter shift and the adjoint derivative is founded solely on analytical arguments, and further validation is necessary as the library continues to be developed.
In summary, this study serves as a comprehensive benchmark for hybrid DCQC algorithms, specifically examining their performance in optimizing the parameter space and its correlation with the cost landscape. Through an analysis of various gradient-based and gradient-free optimizers, valuable insights were gained into their convergence rates, efficiency, and sensitivity to system size and initial parameters. Additionally, employing PCA to study the cost landscape shed light on the optimization trajectory, deepening our understanding of the process itself. Notably, our emphasis on scalability sets this research apart. By evaluating DCQC on systems with up to 28 qubits, we underscore the substantial potential and relevance of this approach in addressing larger-scale quantum computing challenges. This research establishes a basis for selecting classical optimizers in hybrid DCQC techniques, enhancing their potential applications across diverse domains to address complex problems.

APPENDIX C

Consider a cost function defined by a parameterized circuit U(θ) = U_m(θ_m) · · · U_2(θ_2) U_1(θ_1) acting on |0⟩ and an observable Ô:

J(θ) = ⟨0| U†(θ) Ô U(θ) |0⟩.

We set the notation as follows: |ψ_i⟩ = U_i(θ_i) · · · U_1(θ_1) |0⟩ and B_i = U_m(θ_m) · · · U_i(θ_i), so that we can calculate the gradient of J(θ) with respect to parameter θ_i. Using the adjoint differentiation method to calculate the gradient:

∂J(θ)/∂θ_i = 2 Re[ ⟨ψ_{i−1}| U_i†(θ_i) B_{i+1}† Ô B_{i+1} (∂U_i(θ_i)/∂θ_i) |ψ_{i−1}⟩ ],

where ⟨φ_{i−1}| = ⟨ψ_{i−1}| U_i†(θ_i) B_{i+1}† Ô B_{i+1} is set for simplification, so that it is easy to calculate ∂J(θ)/∂θ_{i−1}.
Using the parameter shift method to calculate the gradient, we now suppose U_i(θ_i) = e^{−iθ_i P/2}, where P = ⊗_j σ_j with σ_j ∈ {I, X, Y, Z}; according to [34], the gradient is then

∂J(θ)/∂θ_i = (1/2) [ J(θ + (π/2) e_i) − J(θ − (π/2) e_i) ].

From this, we can infer that the gradients computed via the adjoint differentiation and the parameter shift method coincide when no noise exists in the quantum circuit. These methods employ distinct mathematical approaches to obtain the exact value of the gradient.

FIG. 1. Energy as a function of iterations for the p = 1 layer, N = 10 qubits system. Results show simulator data for 50 iterations using the PS-Adam optimizer, comparing the QAOA and DCQC ansatzes. Ten initial parameter sets are randomly chosen for each ansatz. The shaded area shows the standard deviation.

FIG. 2. Energy as a function of the number of function evaluations for (a) PS-based optimizers, (b) SPSA-based optimizers, (c) gradient-free optimizers, and (d) the best optimizers from the previous groups. Comparisons are made between different optimizers on a p = 1, N = 10 qubits system using the fully parameterized DCQC ansatz. Ten random initial parameter sets are chosen to be the same for each optimizer.

FIG. 3. Number of operations as a function of optimizer type for 10 instances of the SK model with p = 1 layer and N = 10 qubits. The figure shows the counts (iterations, function evaluations) needed to reach 90% fidelity using SPSA-BFGS with the fully parameterized DC ansatz. The gray lines at the tips of the bars represent the standard error.

FIG. 4. Approximation ratio as a function of qubit number. The results represent the p = 1 layered ansatz for N = 10 to N = 28 qubits. The SPSA-BFGS optimizer is used for both the maQAOA and DCQC ansatzes.

FIG. 5. Principal component (PC) analysis result on the cost landscape of the p = 1, N = 10 qubits system. The figure shows how the optimization trajectory evolves with different hybrid optimizers on the cost landscape. The x and y axes denote how much the first and second PCs contribute to the variance of the data. The red and blue dots represent the final and initial points, respectively.