Improved variational quantum eigensolver via quasi-dynamical evolution

The variational quantum eigensolver (VQE) is a hybrid quantum-classical algorithm designed for current and near-term quantum devices. Despite its initial success, there remains a limited understanding of several of its key aspects, and several known problems currently forbid a favourable scaling of VQE towards quantum advantage. To alleviate these problems, we propose and extensively test a quantum annealing inspired heuristic that supplements VQE. The improved VQE enables an efficient initial state preparation mechanism, in a recursive manner, for a quasi-dynamical unitary evolution. We conduct an in-depth scaling analysis of finding the ground state energies with increasing lattice sizes of the Heisenberg model, employing simulations of up to $40$ qubits that manipulate the complete state vector. For the current devices, we further propose a benchmarking toolkit using a mean-field model and test it on IBM Q devices. The improved VQE avoids barren plateaus, exits local minima, and works with low-depth circuits. Estimates based on realistic gate execution times indicate that completing the same computation would take longer on a fully functional error-free quantum computer than on a quantum computer emulator implemented on a classical computer. However, our proposal can be expected to help accurate estimations of the ground state energies beyond $50$ qubits, where the complete state vector can no longer be stored on a classical computer, thus enabling quantum advantage.


I. INTRODUCTION
The advent of quantum computing has seen a growing interest in developing useful applications for current and near-future quantum devices. Currently, and in the foreseeable future, quantum devices are expected to remain error-prone due to noise and decoherence. Hybrid quantum-classical algorithms, which are to some extent resilient to errors, have been proposed to integrate quantum and classical resources [1][2][3][4][5][6]. The variational quantum eigensolver (VQE) [2,7] and the quantum approximate optimization algorithm [8][9][10] are among the proposed candidates to address chemical and combinatorial optimization problems, respectively. Prototype problems have been experimentally demonstrated [11][12][13][14][15]. Simulations have also analysed such variational methods [16][17][18][19].
Despite the progress, all demonstrated applications fall within the small-scale proof-of-concept domain. While the VQE has a simple description, there is a lack of thorough understanding when it is applied beyond a small number of qubits. Recent works have suggested that such explorative endeavours, which are expected to require large numbers of parameters, will encounter barren plateaus [20] in the energy landscapes, diminishing all hope for quantum advantage. Furthermore, hitherto unknown problems may be lurking in the largely unexplored regime of large-scale simulations. Thus, it is of immediate importance to investigate VQE beyond the small scale to establish a potential quantum advantage.
In this work, with a focus on the Heisenberg model, we contribute in three major aspects. First, we propose and test a general 'evolution' heuristic that can, by construction, systematically lower the ground state energy. We benchmark the performance of the heuristic on one-, two-, and three-dimensional lattices, as well as several randomly generated Hamiltonians. Second, we perform large-scale simulations of VQE using an ideal quantum computer emulator and analyze the performance trend as a function of an increasing number of qubits. Last, we study the experimental realizations of our methods. We ask whether finite samples are sufficient to accurately approximate the ground state energy. We propose and test a benchmarking toolkit suitable for current and future devices, using a physically relevant problem.
The paper is structured as follows. In Sec. II, we review the working of VQE and discuss the current problems that variational simulations face. In Sec. III, the theory of a 'state evolution' heuristic is introduced which builds upon VQE. In Sec. IV, the heuristic is tested for the Heisenberg model and randomly generated Hamiltonians. In Sec. V, we move on to large-scale applications of VQE and systematically study the performance for increasing lattice sizes. In Sec. VI, we discuss the relevant aspects for experimental realization, propose and test the benchmarking toolkit, and discuss if quantum advantage is feasible through VQE for the Heisenberg model.

Figure 1 visualizes the working of the VQE. As a hybrid method, it requires a quantum and a classical processing unit. The calculation starts by giving some initial parameters as input to the classical unit. The quantum unit prepares a problem-specific initial state and takes as input the variational parameters, suitably placed in a quantum circuit that is executed on the device. After the execution, a measurement yields the output as a sequence of bits, whereby each bit corresponds to the measurement outcome of one qubit. The bitstring is transferred to the classical unit, which is tasked with accumulating the bitstrings and calculating the energy once a certain number of bitstrings is reached. The calculated energy is fed to a classical optimization algorithm, optimizer for short, which computes the next set of parameters with the aim that successive iterations minimize (or maximize) the energy. When some internal criteria of the optimizer are met, it stops the procedure. The last (or lowest) calculated energy is then the ground state energy.
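The hybrid loop just described can be sketched in a few lines. The following is a minimal NumPy/SciPy illustration using a hypothetical two-qubit Hamiltonian and a hypothetical product ansatz (neither is the paper's model or circuit); the exact expectation value stands in for the bitstring-sampling step that a real device would perform:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy problem: H = Z0 Z1 + 0.5 X0 X1 (a stand-in, not Eq. (1))
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.kron(Z, Z) + 0.5 * np.kron(X, X)

def ry(t):
    """Single-qubit Ry rotation."""
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def ansatz_state(theta):
    """'Quantum unit': prepare |00> and apply the parameterized circuit."""
    psi0 = np.zeros(4)
    psi0[0] = 1.0
    return np.kron(ry(theta[0]), ry(theta[1])) @ psi0

def energy(theta):
    """'Classical unit' cost function. On hardware, this expectation would be
    estimated from accumulated measurement bitstrings rather than computed exactly."""
    psi = ansatz_state(theta)
    return float(psi @ H @ psi)

# The optimizer iterates until its internal convergence criteria are met.
res = minimize(energy, x0=[0.1, 2.0], method="BFGS")
```

For this toy Hamiltonian the product ansatz converges to the best product-state energy of $-1$, while the true (entangled) ground state energy is $-1.5$, a small illustration of why the ansatz choice matters.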

II. VARIATIONAL QUANTUM EIGENSOLVER
Although the variational quantum eigensolver was originally proposed in the context of quantum chemistry, the idea is general and readily usable for physics (and other) problems. We focus on its application to the quantum spin model given by

$$H = J \sum_{\langle i,j \rangle} \boldsymbol{\sigma}_i \cdot \boldsymbol{\sigma}_j, \qquad (1)$$

where $\langle i,j \rangle$ are pairs of nearest neighbours on a hypercube lattice of $N$ sites, $\boldsymbol{\sigma} = (\sigma^x, \sigma^y, \sigma^z)$ are the Pauli matrices, and $J > 0$. One benefit of studying spin models, in contrast to fermionic ones, is that such models find direct mappings to qubits and do not require transformations [21][22][23] that may yield prohibitively long circuits. We analyze the model for simple one-, two-, and three-dimensional lattices visualized in Fig. 2. A horizontal or vertical layer illustrates a two-dimensional lattice, and the edges of such a layer a one-dimensional ring. Regardless of the dimension of the model, each spin interacts only with its nearest neighbour(s), shown through dotted lines. The alternating configuration of spins shown in Fig. 2 is called the Néel state. For an even number of spins, the ground state is found in the zero-magnetization sector. The mapping of the spins to qubits is straightforward: all spins pointing up are mapped to the initial state $|0\rangle$ and those anti-parallel to $|1\rangle$. For the example shown in Fig. 2, the preparation step of the quantum processing unit in Fig. 1 leaves the state $|0\rangle$ unchanged for spins pointing in one direction and flips it to $|1\rangle$ for those in the other direction.
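The model of Eq. (1) and the Néel-state mapping can be made concrete with a small dense-matrix sketch (the paper's own simulations use a massively parallel state-vector emulator instead; dense matrices are only practical for a handful of qubits):

```python
import numpy as np
from functools import reduce

# Pauli matrices
I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.diag([1.0 + 0j, -1.0])

def kron_at(op, site, n):
    """Embed a single-site operator at `site` in an n-qubit Hilbert space."""
    ops = [I2] * n
    ops[site] = op
    return reduce(np.kron, ops)

def heisenberg_ring(n, J=1.0):
    """H = J * sum_<i,j> sigma_i . sigma_j on a 1D ring of n >= 3 spins
    (n = 2 would double-count the single bond)."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n):
        j = (i + 1) % n  # nearest neighbour with periodic boundary
        for P in (SX, SY, SZ):
            H += J * kron_at(P, i, n) @ kron_at(P, j, n)
    return H

def neel_state(n):
    """|0101...>: the alternating Néel configuration in the computational basis."""
    idx = int("01" * (n // 2) + "0" * (n % 2), 2)
    psi = np.zeros(2**n, dtype=complex)
    psi[idx] = 1.0
    return psi

psi = neel_state(4)
E_neel = float(np.real(psi.conj() @ heisenberg_ring(4) @ psi))
```

For the four-site ring the Néel state gives energy $-4J$ (each of the four bonds contributes $-J$ from the $\sigma^z\sigma^z$ term), an upper bound on the exact ground state energy $-8J$.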

A. The variational principle
Let $E$ be the energy of a system described by a Hamiltonian $H$, and $E_0$ its ground state energy. Then, according to the variational principle, the energy given by a parameterized wavefunction $\psi(\boldsymbol{\theta})$ is a strict upper bound to the ground state energy $E_0$,

$$E(\boldsymbol{\theta}) = \frac{\langle \psi(\boldsymbol{\theta}) | H | \psi(\boldsymbol{\theta}) \rangle}{\langle \psi(\boldsymbol{\theta}) | \psi(\boldsymbol{\theta}) \rangle} \geq E_0. \qquad (2)$$

VQE relies on Eq. (2) to find the ground state energy.
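The bound in Eq. (2) is easy to check numerically: every normalized trial state yields a Rayleigh quotient at or above the exact ground state energy. A minimal sketch, with a random Hermitian matrix standing in for a problem Hamiltonian:

```python
import numpy as np

rng = np.random.default_rng(7)

# Random Hermitian matrix as a stand-in problem Hamiltonian
A = rng.normal(size=(8, 8))
H = (A + A.T) / 2
E0 = np.linalg.eigvalsh(H)[0]  # exact ground state energy

# The Rayleigh quotient of any normalized trial state upper-bounds E0 (Eq. (2))
for _ in range(100):
    psi = rng.normal(size=8)
    psi /= np.linalg.norm(psi)
    E = float(psi @ H @ psi)
    assert E >= E0 - 1e-12
```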

B. Essential modules
Several factors play a role in variational methods. We briefly discuss the essential modules.
Optimizer: Tasked with lowering the energy at successive iterations and stopping the calculation, an optimizer plays a central role. There are two broad categories of optimizers: gradient-based and gradient-free. Depending on whether a device or an (ideal) emulator is used, each has its own merits and drawbacks. This work focuses on gradient-based optimizers since they converge faster for noise-free energy evaluations [24]. We use the SLSQP and BFGS algorithms [25][26][27][28]. For use on a quantum device, efficient (stochastic) methods have been developed [29][30][31][32][33][34].
Ansatz: The choice of an ansatz is crucial to variational algorithms. As devices of the current noisy intermediate-scale quantum era [35] can handle only low-depth circuits, we focus on these. In this paper, we examine a few different ansatzes that produce low-depth circuits and require a small number of parameters.
Initial state: The optimization process can be significantly accelerated if an efficient initial state is known for the problem. An example is the Hartree-Fock state in quantum chemistry [36,37]. For our problem, it is the Néel state [38,39].
Initial parameters: In order to take advantage of a specific initial state, the parameters in an ansatz also need to be appropriately chosen. Clever choices of initial parameters help avoid barren plateaus [6,[40][41][42]. In contrast, random initialization consists of randomly choosing each parameter's value in the interval [0, 2π).

C. Current problems and progress
Recent works relevant to the spin model given by Eq. (1) focus on implementation on current devices [11,43] as well as on outlining efficient schemes for similar problems [44,45]. They consider only a few qubits in one dimension. The one-dimensional isotropic case of the model is analytically solvable using the Bethe ansatz [46]. Attempts have been made to implement VQE for this model on a quantum computer [47,48].
Even for small problems requiring a few qubits, the global minimum of the multi-dimensional rugged energy landscapes can be surrounded by multiple local minima. A simple numerical demonstration of this is carried out as follows. We implement the "simple" case of one-dimensional isotropic anti-ferromagnetic rings with different numbers of spins. The BFGS optimizer is restarted 100 times, and the parameters are assigned randomly from the interval $[0, 2\pi)$. It suffices to count unique values of energy for the purpose of demonstration. We accommodate the fact that a minimum may be a valley (in multiple dimensions) by rounding from floating-point precision to $10^{-3}$ for the counting. Figure 3(a) shows the number of local minima, $M_l$, per 100 trials. For larger ($N \geq 7$) problem sizes, the landscapes have multiple local minima. Figure 3(b) shows that the average energy over the 100 runs is very close to the ground state in all cases, demonstrating that where local minima exist, they surround the global minimum. Therefore, it is vital that variational methods are equipped with a procedure to escape from local minima. Recent works in this direction are limited and resource expensive [49].

Although the memory required to simulate up to about 30 qubits, which doubles with each additional qubit, is available on modern personal computers, medium-scale simulations of the model given in Eq. (1) beyond 15 qubits are rare in the literature. One reason for this is that the memory requirement is not the only barrier to such simulations. Large circuit depths and the thousands of iterations required to reach reasonable accuracy demand considerable resources beyond the small scale. For example, Ref. [50] reported that the simulations of the one-dimensional spin model for 20 qubits required a week even with 192 CPU cores. We accommodate the resource demand for our simulations on a supercomputer [51] using a massively parallel emulator [52,53].
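The restart-and-count procedure above can be sketched as follows, using a hypothetical smooth two-parameter cost in place of a VQE energy; the rounding to $10^{-3}$ mirrors the counting rule in the text:

```python
import numpy as np
from scipy.optimize import minimize

def energy(theta):
    """Hypothetical periodic two-parameter cost standing in for a VQE landscape."""
    return float(np.cos(theta[0]) * np.cos(theta[1])
                 + 0.5 * np.sin(theta[0]) * np.sin(theta[1]))

rng = np.random.default_rng(0)
minima = set()
for _ in range(100):
    x0 = rng.uniform(0, 2 * np.pi, size=2)       # random restart in [0, 2*pi)
    res = minimize(energy, x0, method="BFGS")
    # round to 1e-3 so a flat valley counts as one minimum
    minima.add(round(res.fun, 3))

M_l = len(minima)  # number of distinct local-minimum values found
```

For a real VQE landscape, `energy` would be the circuit expectation value and `M_l > 1` would flag the multiple local minima observed for $N \geq 7$.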
The same emulator performed quantum verification in the supremacy experiment [54].
The phenomenon of barren plateaus [20,[55][56][57][58] is seen as a significant hurdle for increasing the problem sizes. Therefore, it is necessary to keep the number of parameters as low as possible without sacrificing accuracy, not only to avoid barren plateaus but also to keep the total computational time to a minimum. Without actually performing large-scale simulations, it remains untested whether such problems can be effectively countered. Methods to tackle all the above-mentioned obstacles are essential for leveraging the power of future quantum devices using VQE. We address some of these problems in the present work.

III. STATE EVOLUTION USING VQE
In this section, we propose a method that builds on VQE and systematically tries to find the ground state of a problem Hamiltonian. We also introduce an ansatz that we will use for our simulations.

A. Evolution of the optimized state
During the variational optimization, the wavefunction at each iteration, as a function of $M$ parameters, is

$$|\Psi_1\rangle = U(\boldsymbol{\theta}) |\Psi_0\rangle. \qquad (3)$$

Let $U(\boldsymbol{\Theta})$ represent the unitary operators corresponding to the optimized numeric values of the parameters; to avoid mixing the optimized and unoptimized parameters, we denote the optimized parameters by $\boldsymbol{\Theta}$. In our notation, the VQE performs the task $U(\boldsymbol{\theta}) \to U(\boldsymbol{\Theta})$. The state, after the optimizer signals convergence, is given by

$$|\Psi_1\rangle = U(\boldsymbol{\Theta}) |\Psi_0\rangle. \qquad (4)$$

We propose that the final state from Eq. (4) serves as the initial state, $|\Psi_1\rangle \to |\Psi_0\rangle$, for another variational optimization. We substitute Eq. (4) in Eq. (3), such that

$$|\Psi_2\rangle = U(\boldsymbol{\theta})\, U(\boldsymbol{\Theta}) |\Psi_0\rangle. \qquad (5)$$

Let the (final) wavefunction after the $p$-th successive repetition of the above substitution and optimization be given by

$$|\Psi_{p+1}\rangle = U_{p+1}(\boldsymbol{\Theta})\, U_p(\boldsymbol{\Theta}) \cdots U_1(\boldsymbol{\Theta})\, |\Psi_0\rangle. \qquad (6)$$

We call the procedure culminating in Eq. (6) quasi-dynamical state evolution using the VQE for each cycle, or 'evolution' in short. To make use of $|\Psi\rangle$ from cycle $p$ at cycle $p+1$, the parameters in $U_{p+1}(\boldsymbol{\theta})$ need to be appropriately chosen, otherwise the progress is lost. This means $U_{p+1}(\boldsymbol{\theta})$ should initially be an identity circuit [59]. Therefore, a suitable choice for our ansatz is $\boldsymbol{\theta} = [0, \ldots, 0]$, whereby avoiding a random initialization in the energy landscape avoids possible barren plateaus [20].
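The cycle described here — optimize, absorb the optimized unitary into the initial state, restart from the identity circuit — can be sketched as follows. This is a minimal NumPy/SciPy illustration on a hypothetical two-qubit Hamiltonian with a hypothetical three-parameter ansatz (not the paper's XY-ansatz); the only structural requirements are that $U(\boldsymbol{0})$ is the identity and that each cycle's optimized state becomes the next cycle's $|\Psi_0\rangle$:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy problem: H = Z0 Z1 + 0.5 X0 X1
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
H = np.kron(Z, Z) + 0.5 * np.kron(X, X)
G = np.kron(Y, X)  # entangling generator, G^2 = I

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]])

def U(theta):
    """Hypothetical ansatz unitary; U([0, 0, 0]) is the identity circuit."""
    ent = np.cos(theta[2]) * np.eye(4) - 1j * np.sin(theta[2]) * G
    return ent @ np.kron(ry(theta[0]), ry(theta[1]))

def evolve(psi0, cycles=5, tol=1e-4):
    """Quasi-dynamical evolution: each cycle runs VQE from theta = 0 on the
    previous cycle's optimized state, as in Eq. (6)."""
    psi, E_prev = psi0.astype(complex), np.inf
    for _ in range(cycles):
        cost = lambda th, p=psi: float(np.real(
            (U(th) @ p).conj() @ H @ (U(th) @ p)))
        res = minimize(cost, x0=np.zeros(3), method="BFGS")
        psi = U(res.x) @ psi            # absorb U(Theta) into the initial state
        if E_prev - res.fun < tol:      # stop when a cycle no longer helps
            break
        E_prev = res.fun
    return psi, res.fun

psi0 = np.zeros(4)
psi0[1] = 1.0                           # |01>, the two-spin Néel state
psi, E = evolve(psi0)
```

Here the exact ground state energy is $-1.5$, which the evolution reaches; once a cycle improves the energy by less than `tol`, the loop stops, mirroring the stopping criterion used later in the paper.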
In proposing Eq. (6), it was assumed that $U(\boldsymbol{\theta})$ at each cycle is the product of the same unitary operators. This assumption can be relaxed, such that for each evolution cycle $U(\boldsymbol{\theta})$ contains either a different combination or a different set of unitary operators. In the former case, the number of parameters $M$ remains the same for all evolution cycles. In the latter case, there are two possibilities: (a) $\geq M$ operators or (b) $\leq M$ operators. For near-term devices, option (b) is of considerable interest due to limited resources. For option (b), the maximum number of 'actively' utilized parameters during each evolution cycle is $M$. Keeping the number of parameters in a circuit low has at least two benefits for our VQE simulations. First, a low number of parameters helps the classical gradient-based optimizer converge more quickly, since fewer gradient computations are required per iteration. Second, the construction of the ansatz is such that the circuit depth is directly proportional to the number of parameters. Thus, a lower number of parameters implies lower-depth circuits, and therefore usage of fewer computational resources.
When building a different $U(\boldsymbol{\theta})$ at each cycle, it is currently unclear how to choose the best one for a given Hamiltonian. Adaptive methods [60][61][62][63][64] that build each operator one by one solve this problem but are computationally expensive. Due to the absence of a cost-effective way to ascertain a new $U(\boldsymbol{\theta})$ for each cycle, we use the same one for each cycle. As our numerical investigations show in the following sections, this simple approach has significant benefits. However, even without the numerical evidence, Eq. (5) tells us that the number of active parameters during VQE can be carefully kept below a threshold while different unitary operators are made accessible for each new cycle of evolution. This allows for expanding the parameter space at the minimal expense of a polynomial increase in the circuit depths.
In Ref. [65], a similar approach was suggested, with the difference that parameters added in each cycle can be optimized together with the previous parameters; it was demonstrated only for an 8-qubit circuit. It has been shown to be useful for combinatorial optimization problems [66]. Another similar technique involves changing the Hamiltonian at each iteration [67].

C. Connection to quantum annealing
We conjecture that for suitable choices of $U(\boldsymbol{\Theta})$ at each cycle, the evolution method facilitates finding the ground state $|\Psi_g\rangle$ of a Hamiltonian $H$, such that

$$\lim_{p \to \infty} |\Psi_{p+1}\rangle = |\Psi_g\rangle. \qquad (7)$$

If finding the ground state energy $E_0$ of $H$ is the primary interest, then using Eq. (7) and the variational principle,

$$E_0 \leq E_{p+1} \leq E_p. \qquad (8)$$

Thus, repeated applications of evolution cycles will systematically lower the energy, under the condition that suitable $U(\boldsymbol{\Theta})$ are chosen for each cycle. The proposal remains a conjecture because it is unclear what the suitable $U(\boldsymbol{\Theta})$ are for each cycle and a given problem Hamiltonian. The adiabatic theorem [68] guarantees that a system, say initially prepared in the ground state, will remain in its instantaneous ground state if the change in the system Hamiltonian is slow enough and if there is a sufficiently large gap between the ground and excited states. Since a change in the system Hamiltonian is equivalent to a change in the unitary operators as given by Eq. (6), the theorem guarantees that Eq. (7) holds. By application of the variational principle to Eq. (7), it is further guaranteed that Eq. (8) holds. Since an optimizer in the VQE algorithm is designed to accept only those parameters that ultimately lower the energy, the construction of the evolution heuristic guarantees that the energy is either lowered or stays constant. For $p \to \infty$, one can imagine choosing the parameters $\boldsymbol{\Theta}$ such that the state evolution corresponds to an adiabatic evolution. A similar line of thought has helped to develop the quantum approximate optimization algorithm [8] and rapidly quenched quantum annealing [69].
From a numerical simulation perspective, the (ultimately finite) value of $p$ that can be reached depends on the computational resources required for each cycle. Our numerical evidence shows that increasing $p$ indeed yields a systematic decrease in energy, suggesting that the evolution method is a useful heuristic. We discuss these results in the following sections.

D. Our ansatz
The choice of an ansatz is crucial not only from a computational perspective but also for the evolution heuristic. If an ansatz can accurately represent the ground state energy of a given problem, optimization of the parameters may still pose problems, and the evolution heuristic can help overcome them. On the other hand, if an ansatz has a poor overlap with the ground state, the heuristic can still help achieve quasi-optimal results. The ansatz we consider approximates the ground state energy reasonably well, but not exactly, for the larger problems of interest [70]. The ansatz is given by Eq. (9); the number of unitary operators in it is $N(N-1)$. We note that, as pointed out previously in Refs. [71][72][73], the ordering of the factors in Eq. (9) is important: different orderings can produce results that differ from one another. We name the specific combination of the factors in Eq. (9) the XY-ansatz. The operators defined in Eq. (10) can be easily implemented in terms of single- and two-qubit gates on a quantum circuit. The details of the implementation are given in Appendix A.
The choice of the XY-ansatz is motivated by three empirical considerations. First, the ansatz produces low-depth circuits that can be implemented on near-term quantum devices. Second, the parametrized gate is always placed on only one qubit. Due to manufacturing imperfections, not all qubits in a device perform equally well, and the XY-ansatz allows the best performing qubit to implement the parametrized gates. For current devices having higher two-qubit gate error rates, the best performing qubit may be the best connected qubit. Third, the ansatz has an all-to-one connectivity between the qubits, which is easier to implement in devices, in contrast to an all-to-all connectivity [65].

IV. PERFORMANCE OF THE STATE EVOLUTION METHOD
We compare the performance of our methods to the strategy of random initialization of the parameters. The random initializations (RI) strategy involves assigning random values to the parameters in the range [0, 2π). We present results for the Heisenberg spin model and several randomly generated problem Hamiltonians.

A. Heisenberg model
We test our heuristic on one-dimensional anti-ferromagnetic rings of length 4 ≤ N ≤ 12. Three types of strategies are used, namely: RI; initialization from the Néel state (NS), where the parameters are initialized as zeros; and the evolution heuristic. We calculate the energy fidelity, given by the ratio of the final energy obtained using VQE to the ground state energy. As shown in Fig. 4(a), for N ≤ 6, NS performs just as well as the best out of 100 RI optimization runs: both can find the ground state energy. The average performance of RI, however, begins to deteriorate for N ≥ 6. Keeping the RI runs at 100, we observe a significant drop in the average performance as N increases. This drop can be understood from the perspective that, as the number of parameters increases, a larger number of restarts would be required to maintain performance. Meanwhile, if the best RI run is better than NS, it is only slightly so. On the other hand, the initial energy for NS is lower than for RI, requiring fewer energy evaluations until optimization convergence (data not shown). Thus, NS achieves similar accuracy while reducing the huge computational cost from 100 RI runs to just one run.
We perform our evolution heuristic on the final state at the end of the optimization of NS. In this way, we leverage the $|\Psi_1\rangle$ obtained after the optimizer gets stuck in a local minimum for each N ≥ 7. We know that these are local minima because RI runs find a lower energy. For each evolution cycle, we use the same $U(\boldsymbol{\theta})$, and the stopping criterion is a change in energy of less than $10^{-4}$. As initial parameters, we use zeros for each new cycle; the reason for this is discussed in Appendix B. The results are shown in Fig. 4(b). The odd-spin lattices have relatively lower energy fidelities due to the degeneracy of the ground state. Our heuristic further increases the energy fidelities in all cases, beyond the best of the RI runs and significantly above their average. The evolution heuristic successfully overcomes several local minima for all N. One way to interpret how it does so is as follows. Once an optimizer gets stuck, a new cycle is started, which can be seen as another VQE process but with a new initial state. Such a configuration change restructures the energy landscape so that a larger or different part of the Hilbert space becomes accessible. The energy obtained from the initial parameters loaded in the new cycle is no longer located at a local minimum as in the previous cycle. Using the evolution heuristic, we get the triple benefit of a smaller parameter space, a low-depth circuit, and a higher energy fidelity.

B. Random Hamiltonians
We study the performance of the evolution heuristic on many randomly generated Hamiltonians. To generate the problems, we define the Hamiltonian

$$H = \sum_{\langle i,j \rangle} \sum_{\alpha \in \{x,y,z\}} J^{\alpha\alpha}_{ij}\, \sigma^\alpha_i \sigma^\alpha_j, \qquad (11)$$

where the sum $\langle i,j \rangle$ runs over all pairs of the $N$ lattice sites. We randomly select one-third of all the terms in $H$, and to the selected terms we assign random values to the coefficients $J^{\alpha\alpha}_{ij} \in (0, 10)$ for $\alpha \in \{x, y, z\}$. While Eq. (11) is analytically solvable for $J^{xx}_{ij} = J^{yy}_{ij} = J^{zz}_{ij} = 1$, the energy for a finite subset with random coefficients can only be computed numerically.
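The random-problem generation just described — keep a random one-third of all $(i, j, \alpha)$ terms and draw each coupling uniformly from $(0, 10)$ — can be sketched as follows (dense matrices, so only practical for small $N$):

```python
import numpy as np
from itertools import combinations
from functools import reduce

PAULI = {"x": np.array([[0, 1], [1, 0]], dtype=complex),
         "y": np.array([[0, -1j], [1j, 0]]),
         "z": np.diag([1.0 + 0j, -1.0])}

def two_site(alpha, i, j, n):
    """sigma^alpha_i sigma^alpha_j embedded in an n-qubit Hilbert space."""
    ops = [np.eye(2, dtype=complex)] * n
    ops[i] = PAULI[alpha]
    ops[j] = PAULI[alpha]
    return reduce(np.kron, ops)

def random_hamiltonian(n, rng):
    """Eq. (11): keep a random one-third of all (i, j, alpha) terms and
    draw each coupling J uniformly from (0, 10)."""
    terms = [(a, i, j) for i, j in combinations(range(n), 2) for a in "xyz"]
    keep = rng.choice(len(terms), size=len(terms) // 3, replace=False)
    H = np.zeros((2**n, 2**n), dtype=complex)
    for k in keep:
        a, i, j = terms[k]
        H += rng.uniform(0, 10) * two_site(a, i, j, n)
    return H

rng = np.random.default_rng(42)
H = random_hamiltonian(4, rng)
E0 = np.linalg.eigvalsh(H)[0]  # numerical ground state energy of one instance
```

In the paper, each such instance is solved first with standard VQE and then with additional evolution cycles; `E0` here plays the role of the exact reference energy in the fidelity ratio.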
We generate two sets of random Hamiltonians, for N = 6 and N = 8. In the case of N = 6, we use only half the number of operators in the XY-ansatz, which in itself produces a different ansatz: since the XY-ansatz can be split into a product of two groups of unitary operators, $U_2(\boldsymbol{\theta}_2) U_1(\boldsymbol{\theta}_1) |\Psi_0\rangle$, we use only $U_1(\boldsymbol{\theta}_1)$ as the ansatz. We prepare the system in the Néel state. We initialize the parameters randomly for each experiment and perform the evolution on the state obtained. Each evolution cycle uses the same $U(\boldsymbol{\theta})$. In Fig. 5(a), we show the results for the N = 6 case. For each of the 1000 experiments, we performed 10 calculations, such that after the first VQE calculation there are nine evolution cycles. We plot the ratio of the final energy $E^{(1)}$ obtained using VQE (blue squares) to the final energy $E^{(2)}$ (orange dots) calculated after the evolution cycles performed on the final state obtained from VQE. In the case of N = 8, we use the full XY-ansatz, and all other settings are the same as in the N = 6 case. The results are shown in Fig. 5(b). We observe that the heuristic also works for N = 8 and that it yields a significant improvement in the energy.
The fact that $E^{(1)}/E^{(2)} < 1$ in a large sample of random experiments for both test cases shows that the heuristic is useful in further increasing the energy fidelity without requiring a larger parameter space. While the evolution heuristic does not guarantee finding the global optimum, i.e. a fidelity of 1.0 (data not shown), the results from this section show that it can produce quasi-optimal energy fidelities that would otherwise require a large number of randomly initialized optimization runs. Furthermore, the simple approach of using the same $U(\boldsymbol{\theta})$ for each evolution cycle appears useful for a relatively large set of problems.

V. LARGE-SCALE APPLICATIONS
We present results for large-scale applications of the evolution heuristic to find the ground state of the isotropic anti-ferromagnetic Heisenberg Hamiltonian on one-, two-, and three-dimensional lattices. In all cases, we use the XY-ansatz. We use the Néel state as the initial state and initialize the parameters to zeros. For each evolution cycle, we use the same $U(\boldsymbol{\theta})$.

A. One- and two-dimensional lattices

In Fig. 6(a), we show the evolution heuristic applied to one-dimensional rings of length N > 12. For the purpose of demonstration, we perform only a few evolution cycles per lattice, shown by the green crosses. Each cycle is observed to increase the energy fidelity, indicated by the crosses moving higher up. Similarly, Fig. 6(b) shows the results for ladder and square lattices of dimensions up to 13 × 2 and 6 × 6, respectively. Except for 6 × 6, all two-dimensional lattices were chosen to have open boundary conditions. The energies per spin for the largest ladder and square lattices, after one evolution cycle, were −2.216 and −2.647, compared to the ground state energies per spin of −2.261 and −2.715 obtained by the Lanczos method [74]. Note that J = 1 and the Hamiltonian has dimensionless units. In all large-scale simulations, the low-depth XY-ansatz can find the ground state energy with a fidelity greater than 96%. We observed that while a single evolution cycle increases the energy fidelity, for the two-dimensional lattices the second cycle did not produce any variation in the energy. Furthermore, the convergence of the evolution heuristic to the ground state appears to slow down as the ring size increases. A possible reason for both observations is the breakdown of the assumption that the same $U(\boldsymbol{\theta})$ is a good candidate for each cycle, suggesting that other ansatzes should be tested in the future.
The largest system that we simulated was a 40-qubit isotropic ladder. Since the time required to perform one energy calculation was large, we removed some of the parameters. To do so, we rewrite the two product terms in Eq. (9) as $U_2(\boldsymbol{\theta}_2) U_1(\boldsymbol{\theta}_1) |\Psi_0\rangle$ and consider only $U_1$, which then contains only half of the parameters. Further, we roughly followed the rule $l + 10 \geq k > l$ and reduced the parameters from the original 1560 to a more manageable 400. After four iterations, the energy per spin was −1.966 (without evolution), which was significantly lower than the Néel state energy of −1.5. The ground and first excited state energies are −2.312 and −2.262, respectively, found using the Lanczos algorithm. The VQE energy estimate is considerably higher than the ground state energy, which can be explained as follows. First, restricted by the computational time required, the estimate uses only the standard VQE. Second, the number of parameters in the XY-ansatz was heavily reduced, resulting in reduced performance.
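The parameter counts involved here can be checked explicitly. The sketch below counts the full XY-ansatz parameters, $N(N-1)$, and those surviving the banded restriction $l + 10 \geq k > l$ on the $U_1$ half; note that the strict rule keeps 345 pairs for $N = 40$, whereas the paper applied it only "roughly" to arrive at about 400:

```python
def xy_pairs(n):
    """All ordered pairs (k, l), k != l: one parameter each -> n(n-1) in total."""
    return [(k, l) for k in range(n) for l in range(n) if k != l]

def reduced_pairs(n, band=10):
    """Banded restriction l + band >= k > l applied to the U1 half
    (a sketch of the pruning rule; the paper followed it only roughly)."""
    return [(k, l) for l in range(n)
            for k in range(l + 1, min(l + band, n - 1) + 1)]

n_full = len(xy_pairs(40))       # 1560, matching the full XY-ansatz count
n_reduced = len(reduced_pairs(40))
```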
We also studied the evolution heuristic for the 40-qubit ladder. We reduced the parameters even further, with the rule k = l + 1 (using $U_1$ only), to 40. Let us denote the resulting product of operators applied to $|\Psi_0\rangle$ as $U_{40}(\boldsymbol{\theta}) |\Psi_0\rangle$. Starting from the initial standard-VQE energy of −1.867, we performed two evolutions with the same $U_{40}(\boldsymbol{\theta})$ per cycle, with only a few iterations each, to obtain an energy per spin of −1.878. The evolution heuristic can thus lower the energy even for a very small parameter space. Due to the construction of the heuristic, a different $U(\boldsymbol{\theta})$ may prove helpful in further improving the energy estimate. This is a task for future work.

B. Three-dimensional lattices
By simulating the one- and two-dimensional lattices, we observed that the XY-ansatz yields a reasonable approximation to the ground state. For the three-dimensional lattices, we likewise initialize the parameters as zeros and the system in the Néel state, and use the XY-ansatz. Since the maximum number of qubits that can be simulated on classical hardware is severely restricted by the memory of the supercomputer, only a limited number of three-dimensional lattices can be studied. We simulate 3 × 3 × b lattices, where b = 2, 3, 4, with open boundary conditions and J = 1.
In Appendix D we show that, as expected, the Néel state continues to be an efficient initial state also for the three-dimensional lattices. The Néel state gives a low initial energy per spin to start the optimization, e.g. −1.83, −2.00, and −2.08 for b = 2, 3, and 4, respectively. We performed two evolution cycles for b = 2, 3. For b = 4, the circuit contains 1260 parameters and due to a time limit of 24 hours per run on the supercomputer, only a limited number of iterations are possible. Due to the amount of computational resources needed, we did not perform an evolution for b = 4. During two evolutions, the energies continued to drop for b = 2 and 3. The final energies per spin were −2.482, −2.631, and −2.694 which can be compared to the ground state energies −2.617, −2.720, and −2.797 obtained by the Lanczos method.

VI. CONSIDERATIONS FOR EXPERIMENTAL REALIZATIONS
There are several practical aspects to implementing variational algorithms, depending on the technological maturity of quantum computing devices. In this section, we discuss relevant considerations for experimental implementations. Although the evolution framework is general and applicable to various problems, we restrict our focus to spin models of the type presented in the previous sections.

A. Hardware optimal benchmarking
Benchmarking protocols are employed to ascertain the proper working of a quantum device. Most protocols involve physically irrelevant tasks, although the hope for future quantum computers is to solve physically relevant problems, e.g. in physics or chemistry. The Heisenberg Hamiltonian could be a candidate. However, the XY-ansatz we presented requires all-to-one connectivity between the qubits. Although all-to-all connectivity is native to trapped-ion devices [75], their number of functional qubits remains low. Alternatively, current superconducting devices, among others, which mostly offer neither all-to-all nor all-to-one connectivity but provide other benefits, are also dominant in the research community. Considering the limited qubit connectivities of current devices, we propose the mean-field Hamiltonian, given by Eq. (11) with $J^{\alpha\alpha}_{ij} = 1$ for all $\alpha \in \{x, y, z\}$, as a suitable candidate for benchmarking. The ground state energy of this Hamiltonian is given by the simple and easy-to-verify [76] expression $3(a - N)/2$, where $a = 1$ for odd $N$ and $a = 0$ for even $N$.
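The closed-form energy $3(a - N)/2$ follows from rewriting the all-pairs Hamiltonian as $H = \big[(\sum_i \boldsymbol{\sigma}_i)^2 - 3N\big]/2$ and minimizing the total-spin term, and it can be verified by exact diagonalization for small $N$; a sketch:

```python
import numpy as np
from itertools import combinations
from functools import reduce

PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.diag([1.0 + 0j, -1.0])]

def mean_field_h(n):
    """Eq. (11) with J = 1 for all alpha and all pairs i < j."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for i, j in combinations(range(n), 2):
        for s in PAULIS:
            ops = [np.eye(2, dtype=complex)] * n
            ops[i] = s
            ops[j] = s
            H += reduce(np.kron, ops)
    return H

for n in range(2, 7):
    a = n % 2  # a = 1 for odd n, a = 0 for even n
    E0 = np.linalg.eigvalsh(mean_field_h(n))[0]
    assert abs(E0 - 3 * (a - n) / 2) < 1e-9  # closed-form check
```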
We propose a very low-depth 'mean-field ansatz' well-suited for the mean-field Hamiltonian. Independent of the number of qubits, the mean-field ansatz accurately recovers the ground state of the model using only up to N/2 parameters. The details of the mean-field ansatz are given in Appendix C. In Fig. 7, we show numerical evidence that the ansatz can recover the ground state energy for N ≤ 40. Since the ground state energy of the model is the same for an even number N of spins and for N + 1 spins, for an odd number of spins we only consider N ≤ 15. The solution is easily verified by setting all parameters θ_i = π/2 for i = 1, . . . , N/2. Therefore, the mean-field ansatz and Hamiltonian constitute a simple benchmark for physics problems on current and future quantum devices. Using the ansatz, we perform benchmarking of the IBM Q devices. In Fig. 8(a), we show the energy landscape for the N = 5 mean-field Hamiltonian obtained with a two-parameter mean-field ansatz on an ideal emulator.
We create a 16 × 16 grid and plot the energy on the z-axis. We observe that the landscape is periodic for each parameter in the interval θ ∈ [0, 2π]. The global minimum is located at θ_1 = θ_2 = π/2. We compare the ideal landscape with those obtained from two different IBM Q devices, Santiago [77] and Belem [78], having quantum volumes 32 and 16, respectively. The results are shown in Figs. 8(b-d). Due to the limited connectivity between the qubits in the IBM Q devices, a careful selection of the layout is essential. We demonstrate the effects of using different layouts by choosing a layout where two adjacent qubits in our circuit are physically separated by one qubit in the middle, thus requiring swap gates. Using 2^13 samples, we observe qualitative agreement with the ideal landscape in all cases. For all the tested devices, the minimum also appears to be located at θ_1 = θ_2 = π/2. The ground state energy, calculated by exact diagonalization, is −6. The minimum energies found on the devices were −5.30 (Santiago), −3.97 (Belem without using swap), and −3.38 (Belem using swap). The quantitative difference can be attributed to the errors in the devices. We observe that the device with a larger quantum volume shows better results for our benchmark. The IBM Q Belem device performs worse when swap gates are required for the implementation, even for swapping only two qubits. Knowledge of the actual device connectivity is essential to obtain better results. The two parameters and an energy landscape with a unique global minimum reachable from all initial points make this a simple benchmarking problem for the N = 5 case.

B. Computational states
In our simulations, we performed calculations on the state vector. However, on an actual quantum device, |Ψ⟩ is not directly accessible. Instead, one draws samples from the underlying distribution. The potential advantage of quantum computing rests on two closely related assumptions about the sampling. First, the number of contributing states in a certain basis (i.e. the computational basis) out of the entire Hilbert space does not increase exponentially with N. Second, a finite number of samples is sufficient to represent the entire distribution accurately. The hope of gaining quantum advantage is to find algorithms that fulfil these two assumptions. This section discusses whether the two assumptions are viable for the Heisenberg model using the XY-ansatz.
From the final state |Ψ⟩ calculated on the emulator, we sample 2^13 times from the underlying probability distribution. This sample size is motivated by currently available IBM Q devices, which offer a maximum of 2^13 samples per experiment. It should be noted that 2^13 samples are not necessarily sufficient to extract all the possible unique states for the larger lattice sizes; however, what matters most is that the energy, and hence the energy fidelity, is accurately reproduced. In Fig. 9(a), we count the number of unique states, or distinguishable bitstrings, sampled from |Ψ⟩ for the one-dimensional isotropic rings and compare it to the number of unique states in the Hilbert space. We perform the same task for the ladder lattices, and the results are shown in Fig. 9(b). From both plots, we observe that the first assumption appears to be satisfied; that is, the number of unique states (purple dots and green squares) does not seem to grow exponentially with N.
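The unique-state counting described above can be sketched with plain NumPy. The random dense test state below is purely illustrative (for the actual experiment one would load the emulator's final |Ψ⟩ instead), and all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_bitstrings(psi, shots, rng):
    """Draw measurement outcomes (basis-state indices) from |psi|^2."""
    probs = np.abs(psi) ** 2
    probs /= probs.sum()  # guard against floating-point drift
    return rng.choice(len(psi), size=shots, p=probs)

# Toy stand-in: a random dense 10-qubit state; a Heisenberg ground state
# would typically concentrate on far fewer computational basis states.
n = 10
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)

counts = sample_bitstrings(psi, 2**13, rng)
print(len(np.unique(counts)))  # number of distinguishable bitstrings seen
```

Repeating this count while increasing n is what produces the scaling curves of Fig. 9(a-b).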
In Fig. 9(c), we plot the energy fidelity obtained by calculating the energy from the 2^13 samples for both rings and ladders. The fidelities can be directly compared to those obtained by calculating the energy from |Ψ⟩, shown in Figs. 6(a-b). We observe that the fidelities in Fig. 9(c) and Figs. 6(a-b) differ by about one percent for the rings and by less than three percent for the ladder lattices. The difference is further reduced by using more samples. The energy calculated from a small number of samples is thus reasonably close to the actual energy; therefore, a finite number of samples is sufficient to accurately represent the distribution.

C. Quantum vs classical
Quantum advantage refers to the idea that certain tasks can be completed significantly faster on a quantum computer than on a classical computer. In this section, we compare the computational time expected for a fully functional future quantum computer against an emulation of an ideal quantum computer using a massively parallel simulator run on a supercomputer, for the task of finding the ground state of the Heisenberg model. We question whether quantum advantage is possible using VQE for the Heisenberg model. Although emulation of VQE might not be the best classical method for finding the ground state energy of the Heisenberg model on a classical computer, we restrict ourselves to the same method for the purpose of comparison.
The largest number of successive operations on a quantum computer is given by

N_o = N_g aT_h N_s, (12)

where N_g is the maximum number of gates that need to be sequentially executed [79], T_h is the number of terms in the Hamiltonian to be measured, a is a constant that encodes the number of groups of Hamiltonian terms which can be simultaneously evaluated, and N_s is the number of samples required to reach a certain statistical accuracy. The time taken to complete N_o operations can be estimated by knowing the single- and two-qubit gate times. On the IBM Q superconducting type of quantum computer, single-qubit gates take about 85 ns and two-qubit gates take about 400 ns [80]. Ion traps offer even slower two-qubit gates, of the order of 1.6 µs [81]. For our calculations, we take the IBM Q values. For the N = 24 spin ring, we count the total number of energy evaluations required over three evolution cycles to be 84744. At each cycle, a new layer of U(θ) is added to the circuit. The largest numbers of sequentially executed single- and two-qubit operations are approximately 6400 and 11500, respectively. Taking the optimistic values aT_h = 3 and N_s = 2^13, the time required to calculate the energy once is about 126 seconds. The total time required to complete the variational calculation using a quantum computer stands at 124 days. In comparison, running the emulator on a supercomputer required 4 days for the same computation.
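The timing estimate can be reproduced directly from the quoted numbers; the arithmetic below simply multiplies the gate counts by the gate durations and by aT_h and N_s, with all values taken from the text.

```python
# Estimate the wall-clock time for one energy evaluation and the full VQE run
# for the N = 24 spin ring, using the IBM Q gate times quoted in the text.
t1, t2 = 85e-9, 400e-9           # single- and two-qubit gate times (s)
n1, n2 = 6400, 11500             # sequential single-/two-qubit gates
aTh, Ns = 3, 2**13               # simultaneous groups x terms, samples per evaluation

t_circuit = n1 * t1 + n2 * t2    # one circuit execution
t_energy = t_circuit * aTh * Ns  # one energy evaluation
n_evals = 84744                  # energy evaluations over three evolution cycles
t_total_days = t_energy * n_evals / 86400

print(f"{t_energy:.0f} s per energy evaluation, {t_total_days:.0f} days total")
# → 126 s per energy evaluation, 124 days total
```

The same arithmetic with the gate counts and evaluation numbers of the 6 × 6 and 3 × 3 × 3 lattices yields the estimates quoted below.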
Similarly, for the 6 × 6 and 3 × 3 × 3 lattices, the time required to calculate the energy once is 90 and 52 seconds, respectively, with 12700 and 68426 total energy evaluations. The total time required to complete the variational calculation using a quantum computer stands at about 13 and 41 days, compared to 7 and 4 days on the supercomputer, respectively. Counterintuitively, the VQE runtime for the 6 × 6 = 36 spin lattice is less than that of the 24 spin ring, owing to the smaller number of energy evaluations and evolution cycles performed.
The above calculations reveal that the large number of energy evaluations required to obtain the ground state energy prevents quantum computers from running VQE in a reasonable amount of time. However, as shown in Appendix E, we observed quick drops in the energy early in the optimization, which slow down as a local minimum is approached. Hence, the first few iterations provide the largest decrease in energy. For the 3 × 3 × 3 lattice, an energy per spin of −2.605 is found within 5 × 10^3 energy evaluations. This is expected to take 24 hours on a quantum computer.
In summary, VQE on an emulator using a classical computer performs the task faster than a hypothetical fully functional quantum computer, up to the number of qubits and tasks we tested. This does not mean that emulators will always perform tasks faster than actual quantum computers. For several parameters and lattice topologies, e.g. random couplings or frustrated structures, the Heisenberg model is a hard problem. The memory requirements to store the complete state vector for systems of 50 or more qubits are beyond the current memory capacities of modern supercomputers. If the favourable scaling of the ansatz continues, a future quantum computer may be able to approximate the ground state energy of the Heisenberg model beyond 50 qubits. For such large problems, the polynomial time scaling given in Eq. (12) can be expected to hold. In this sense, the realization of potential quantum advantage using VQE for the Heisenberg model may be anticipated on a quantum computer.
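The memory argument is straightforward arithmetic, assuming double-precision complex amplitudes (16 bytes each; our assumption, standard for state-vector emulators):

```python
# Memory needed to store a full n-qubit state vector with complex128 amplitudes.
for n in (40, 50, 60):
    bytes_needed = (2 ** n) * 16          # 2^n amplitudes x 16 bytes
    print(n, f"{bytes_needed / 2**40:,.0f} TiB")
```

At 50 qubits the full state vector already requires about 16 PiB, beyond the aggregate memory of present-day supercomputers, which is the regime where a quantum device would no longer compete against an emulator.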

VII. CONCLUSION
We studied the performance and the experimental implementation of the variational quantum eigensolver applied to the Heisenberg model. We introduced and extensively tested a state evolution heuristic to overcome the obstacles faced by current VQE algorithms. Our tests showed that the heuristic can escape local minima without increasing the number of active parameters. In contrast, the current standard approach, restarting from sets of random parameters, would quickly become intractable even for quantum computers. We observed that the evolution heuristic improves the estimates of the ground state energies of the Heisenberg model compared to the standard VQE, across all lattice dimensions. The simulations were accelerated by initializing the quantum system in the Néel state, which provides a significantly lower initial energy for the variational optimization than random initializations. Taking into account that current and near-term devices will have only limited computational capacity, we proposed as a benchmark a mean-field ansatz requiring a circuit depth of five, independent of the number of qubits. The benchmark accurately recovered the ground state of the mean-field spin model using only N/2 parameters. Finally, calculations based on realistic gate execution times of current quantum hardware show that an emulator can find the ground state energy of the Heisenberg model faster than current (even ideal) quantum computers. However, there is hope for quantum advantage, because a computation that cannot be performed on classical hardware due to memory limitations may be performed on future quantum computers.
Even though our heuristic works very well, further innovations can benefit the evolution method's performance and cogency. One possible direction is to investigate analytical or cost-effective numerical ways to find the best U(θ) for each cycle and given problem. While variational methods are not affected by the sign problem [82], it remains to be seen whether the optimization algorithms remain efficient for N > 40. The parallelizability of VQE is another aspect that can be explored [83]. The evolution heuristic can be seen as complementary to the idea of optimizing all parameters actively, i.e. the entire parameter space can be made active when evolution stops making progress, and vice versa. Another critical issue is the connectivity of the XY-ansatz; while the heuristic is ansatz-independent, the development of a hardware-connectivity-efficient ansatz that also accurately approximates the ground state of the Heisenberg model is an open problem. The effect of quantum noise should also be studied in future work. While we have shown that the simulations find the ground state up to the limit of what can be classically simulated using a limited number of samples, it remains to be seen whether this favourable scaling continues beyond that limit. Our results indicate that it can be expected.

VIII. ACKNOWLEDGEMENT
The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gausscentre.eu) for funding this project by providing computing time [51]. We acknowledge the use of IBM Q for this work. The views expressed are those of the authors and do not reflect the official policy or position of IBM or the IBM Q team. M.S.J. acknowledges support from the project OpenSuperQ (820363) of the EU Quantum Flagship.

A. THE XY-ANSATZ
We illustrate how the XY-ansatz is implemented as a quantum circuit. The ansatz consists of products of unitary operators of the forms exp(−iθ A ⊗ B) and exp(−iθ A ⊗ B ⊗ C). For general A, B, and C, these would require factorization into products of unitary operators [84], significantly increasing the circuit depth. Instead, when A = B = C = σ^z, we use the implementation [85] given by the circuit shown in Fig. 10. To implement A, B ∈ {σ^x, σ^y}, as in the case of the XY-ansatz, we appropriately change the basis, as done in [15]. When the number of qubits is larger than two, after changing the basis we always place the parametrized gate on the qubit with the largest index. After several trials, we found that placing all the parametrized gates on one qubit for operators of the type exp(−iθ A ⊗ B ⊗ C) consistently improved performance. We believe this aspect of quantum circuit preparation needs further exploration, but it is outside the scope of the current work.
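For the two-qubit case A = B = σ^z, the standard CNOT–R_z–CNOT construction realizes exp(−iθ σ^z ⊗ σ^z) exactly, with the parametrized gate on the second qubit. The numerical check below is our own sketch (not the paper's code) verifying this identity:

```python
import numpy as np

theta = 0.7

# CNOT with control on qubit 0, target on qubit 1 (basis order |q0 q1>).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Rz(2*theta) = diag(e^{-i theta}, e^{i theta}), placed on the target qubit.
Rz = np.diag([np.exp(-1j * theta), np.exp(1j * theta)])

circuit = CNOT @ np.kron(np.eye(2), Rz) @ CNOT

# Direct exponential: Z (x) Z is diagonal with entries (1, -1, -1, 1).
zz_diag = np.array([1, -1, -1, 1])
target = np.diag(np.exp(-1j * theta * zz_diag))

print(np.allclose(circuit, target))  # True
```

Basis changes with Hadamard or S†H gates then turn the same building block into the σ^x and σ^y terms needed by the XY-ansatz.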

B. EVOLUTION PARAMETERS
We discuss numerical examples that shed light on what parameters need to be employed when starting a new evolution cycle. We study two possibilities: all parameters initialized either randomly or as zeros.
As shown in Fig. 11, the energy fidelity drops rapidly if random non-zero parameters are used instead of zeros. While starting a new evolution cycle with the parameters set to zero increases the energy fidelity (not visible at the scale of the plot), random parameters show the opposite effect. This is understood from Eq. (5): |ψ⟩ can be used effectively in the next evolution cycle only if the initial parameters θ leave U(θ) equal to the identity. An evolution cycle that systematically lowers the energy at each iteration is achieved by setting U(θ = [0, . . . , 0]) in Eq. (13), but not by setting the parameters randomly. Additionally, setting the parameters to zero is equivalent to an identity circuit, which has been shown to help avoid barren plateaus [59] (see Appendix D). With random parameters, U(θ) is not the identity and |Ψ⟩ is not preserved for the next cycle.
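The role of zero initialization can be illustrated numerically. The sketch below uses a diagonal layer of exp(−iθ_k σ_k^z σ_{k+1}^z) gates as a stand-in for U(θ) (our simplification, not the paper's full ansatz): a zero-parameter layer leaves |Ψ⟩ unchanged, while a non-zero one does not.

```python
import numpy as np

def zz_layer(thetas, n):
    """Diagonal of prod_k exp(-i theta_k Z_k Z_{k+1}) on n qubits."""
    dim = 2 ** n
    diag = np.ones(dim, dtype=complex)
    for k, th in enumerate(thetas):
        # Eigenvalue of Z_k Z_{k+1} on basis state b (qubit k is bit n-1-k).
        z = np.array([(-1) ** ((b >> (n - 1 - k)) & 1)
                      * (-1) ** ((b >> (n - 2 - k)) & 1) for b in range(dim)])
        diag *= np.exp(-1j * th * z)
    return diag

rng = np.random.default_rng(0)
n = 4
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)  # stand-in for the previously optimized |Psi>

print(np.allclose(zz_layer(np.zeros(n - 1), n) * psi, psi))           # zeros preserve |Psi>
print(np.allclose(zz_layer(np.array([0.3, 0.7, 1.1]), n) * psi, psi)) # non-zeros do not
```

The first check succeeds and the second fails, mirroring the fidelity behaviour in Fig. 11.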
FIG. 12. Circuit implementation using the mean-field ansatz for N = 6.

C. MEAN-FIELD ANSATZ
The ansatz is built from parametrized gates indexed by k = 1, 3, 5, . . . < N, and the initial state is the product state in which the odd- and even-indexed qubits (with indices starting at 0, 1, 2, . . .) are initialized in |0⟩ and |1⟩, respectively. We construct an optimized circuit for the mean-field ansatz as shown in Fig. 12.
The ansatz only requires a nearest neighbour connectivity for quantum devices, which is readily available for current devices. For the N = 5 case discussed in the main text, we construct the same ansatz as one would for a four-qubit circuit and leave the fifth qubit without any gates. Due to the simplicity of the problem, such an approach appears to work. The final state appears to be a product of two-qubit states. Once the ansatz is constructed, the mean-field Hamiltonian terms can be measured by rotating to the appropriate basis.

D. INITIAL STATE EFFICIENCY
We compare the initial energy obtained using the Néel state with that obtained using randomly generated parameters (RI). As a specific example, we consider the 3 × 3 × 3 lattice, where the circuit contains 702 variational parameters. We choose 10^4 different random sets of values for the parameters and plot the energy obtained in each case. We do not perform a VQE calculation but plot the energy corresponding to the initial parameters. Figure 13 compares the energy obtained using RI runs and the Néel state. Given that all 10^4 random points in the energy landscape correspond to E ≈ 0, the landscape is relatively flat, or barren, across a vast region. We expect that gradients computed around these points would be close to zero. This is numerical evidence for the presence of barren plateaus for the lattice under discussion. In all the RI cases, the obtained energy did not deviate much from E = 0, which is a poor initial energy for an optimizer. We also plot the energies obtained after evolution on the Néel state and the ground state energy for comparison. The results show that the initial energies obtained from random parameters are far from the ground state energy; even assuming there are no local minima between the initial point and the global minimum, approximating the ground state energy would take significantly more iterations than starting from the Néel state. Thus, the Néel state is an efficient initial state for this example.

E. OPTIMIZATION PROGRESS
We show the energy at each step of the optimization for selected lattices: a 24-qubit one-dimensional ring, a 6 × 6 (36-qubit) square lattice, and a 3 × 3 × 3 (27-qubit) three-dimensional lattice, as shown in Fig. 14. Initially, we observe a quick drop in the energy across all lattices, including those not shown here. As the VQE approaches a local minimum, the gradients become smaller and so does the relative decrease in the energy. This is reflected by the flat part of the curve towards the right of each plot. The step-like structure reflects the working of quasi-Newton optimization algorithms.
As observed for the 3 × 3 × 3 case, after a significant initial drop, the decrease in energy slows down after 10^4 energy evaluations. This is explained by the optimizer getting trapped in a local minimum. However, by using the evolution heuristic, escape from the local minimum becomes possible (around 4 × 10^4 evaluations), producing a relatively large drop in the energy again.
In Sec. VI C, we focus on comparing the time required to obtain the same optimization curves using an emulator and an ideal quantum computer. Fig. 14 shows that, on the one hand, obtaining the ground state requires a large number of energy evaluations, while on the other hand, the largest drop in energy occurs during the first few iterations. If the accuracy of the final energy is not critical, the energy obtained within the first few iterations could also be used for comparing the computational times.