Variational Quantum Time Evolution without the Quantum Geometric Tensor

The real- and imaginary-time evolution of quantum states are powerful tools in physics, chemistry, and beyond, to investigate quantum dynamics, prepare ground states or calculate thermodynamic observables. On near-term devices, variational quantum time evolution is a promising candidate for these tasks, as the required circuit model can be tailored to trade off available device capabilities and approximation accuracy. However, even if the circuits can be reliably executed, variational quantum time evolution algorithms quickly become infeasible for relevant system sizes due to the calculation of the Quantum Geometric Tensor (QGT). In this work, we propose a solution to this scaling problem by leveraging a dual formulation that circumvents the explicit evaluation of the QGT. We demonstrate our algorithm for the time evolution of the Heisenberg Hamiltonian and show that it accurately reproduces the system dynamics at a fraction of the cost of standard variational quantum time evolution algorithms. As an application of quantum imaginary-time evolution, we calculate a thermodynamic observable, the energy per site, of the Heisenberg model.


I. INTRODUCTION
Quantum time evolution is a central task in physics.Real-time evolution provides detailed insight into properties of quantum mechanical systems, such as phase transitions [1][2][3] or thermalization [4,5].Imaginary-time evolution is an important tool that enables the preparation of ground states or thermal states [6][7][8].These can, in turn, be used for the calculation of thermodynamic observables [8,9].In particular, combining real-and imaginary-time evolution would allow the direct calculation of dynamical correlation functions at thermal equilibrium.
The range of applications of imaginary-time evolution extends beyond the field of physics.Ground-state preparation with imaginary-time evolution for gapped, nondegenerate Hamiltonians is guaranteed to converge in the generic case of non-zero overlap between the ground state and the initial trial state.This makes it a promising candidate in settings where a good initial state can be constructed, e.g. in chemistry applications [10] or in classical optimization problems [11].In quantum machine learning, the preparation of Gibbs states with imaginary-time evolution is a subroutine for quantum Boltzmann machines, which can, for example, be used in distribution learning or classification [12].
Since performing quantum time evolution generally requires representing the exponentially large wave function of a quantum system, quantum computers are a promising platform for developing efficient algorithms [13].In fact, in 1996, the Trotter algorithm for real-time evolution was among the first proposed use cases for a quantum computer [14].However, the complexity of the quantum circuits required for the Trotter algorithm depends on the Hamiltonian, and the circuit depth scales with the simulation time and accuracy [15,16].This renders the algorithm currently unsuitable for general time evolution on near-term devices, which are characterized by limited qubit connectivity and coherence times.The imaginary- time counterpart of Trotter suffers from the same restriction [8].
Variational algorithms for quantum time evolution, on the other hand, allow to choose a parameterized circuit as an ansatz to approximate the wave function, that operates within the device's capabilities.Using a variational principle, variational quantum time evolution (VarQTE) maps the quantum state evolution to the evolution of parameters in the model [17], both for real -time evolution (VarQRTE) and imaginary-time evolution (VarQITE).The parameter update rules depend on the evaluation of the Quantum Geometric Tensor (QGT) and gradients of the current energy and state.For an ansatz with d variational parameters, the number of circuits required to evaluate the QGT and gradients scale as O(d 2 ) and arXiv:2303.12839v3[quant-ph] 7 Aug 2023 O(d), respectively.While this does not pose a problem for small systems, the evaluation of the QGT quickly becomes a bottleneck once the system size, and therefore the number of variational parameters, increases.
Figure 1 shows a runtime estimation for VarQITE, assuming current superconducting processor specifications, see Appendix A for details on the derivation.For a few parameters the runtime is of the order of hours, but already for only 200 parameters the computation time around 1 week, which renders this algorithm currently impractical.With recent advances in processor sizes exceeding 100 qubits, such as the IBM Quantum Eagle or Osprey devices [18,19], improving the resource requirements of quantum algorithms becomes crucial for finding practically relevant applications of quantum computers.
Recently, focus has shifted to optimization-based algorithms, which implement partial steps or approximations of the full Suzuki-Trotter step [20][21][22][23][24].In the case of realtime evolution, for example, the projected variational quantum dynamics (p-VQD) algorithm [20] provides a scalable alternative to VarQRTE on near term-devices, if a single Trotter step can be efficiently implemented.However, the required quantum circuit gates in p-VQD reflect the couplings of the Hamiltonian.This means that, for Hamiltonians with long-distance interactions or numerous Pauli terms (e.g. in molecular dynamics), even a single step could involve global connections or deep circuits hindering the execution on near-term devices.Furthermore, the p-VQD algorithm is not directly applicable to imaginary-time evolution.
Other approaches concerned with real-time evolution are Variational Fast Forwarding (VFF) methods [25][26][27][28] and classical pre-processing approaches [29,30].VFF methods rely on diagonalizing the Hamiltonian or the Trotterized time evolution operator with a variational ansatz.However, finding the diagonalizing unitary remains challenging in practice, which limits demonstrations to very few qubits.Classical pre-processing techniques, on the other hand, impose additional restrictions on the simulated system, such as translational invariance [30] or Hamiltonians with low entanglement [29].Within such systems, these techniques scale to large systems, but they do not allow for general quantum time evolution.
Another line of work directly focuses on the preparation of thermal states by minimizing the free energy of a variational ansatz [31].This approach, however, also does not implement general quantum time evolution.
In this paper, we propose a novel variational algorithm for quantum time evolution based on a dual optimization problem, which allows to replace the QGT by evaluating the overlap of the variational ansatz for different parameter values.This formulation applies equally real-and imaginary-time evolution and does not require additional qubits or connections than already present in the ansatz.We show that this new algorithm requires significantly fewer measurements and thereby drastically reduces the expected runtime compared to VarQTE.This is summarized in Fig. 1, where, under the same assumptions, our proposed method can reduce the expected runtimes from several weeks for VarQTE to only a few days.Following the naming conventions of VarQTE, we name the algorithm DualQTE with specifiers DualQITE for imaginarytime evolution and DualQRTE for real-time evolution.
The remainder of this paper is structured as follows.In Sec.II, we recap VarQTE based on variational principles, derive the proposed dual formulation, and discuss how to implement it on a quantum computer.Then, in Sec.III, we demonstrate our proposed algorithm for the imaginary-time evolution of the Heisenberg model and investigate the resource requirements.As a practical application, we use the quantum minimally entangled typical thermal states method (QMETTS) to calculate thermodynamic observables.Sec.IV demonstrates the dual formulation for real-time evolution, including the calculation of variational error bounds.Finally, Sec.V concludes the paper and gives an outlook on possible applications and further research directions.

II. DUAL FORMULATION OF VARIATIONAL TIME EVOLUTION
For a time-independent Hamiltonian H acting on n qubits, an initial quantum state |Ψ 0 ⟩ and an evolution time t, the real-time evolved quantum state is For an imaginary-time evolution, the time evolution operator is non-unitary, and the normalized state reads Variational quantum time evolution maps the evolution of the quantum state |Ψ(t)⟩ to the evolution of parameters θ(t) ∈ R d of a parameterized quantum state |ϕ(θ(t))⟩.The parameters' dynamics can be derived with variational principles such as the Dirac-Frenkel, McLachlan, or time-dependent variational principle [17].In McLachlan's formulation, the derivatives of the parameters are determined by the linear system of equations where the matrix g = Re(G) ∈ R d×d is the real part of the QGT, and we call b ∈ R d the evolution gradient.The QGT is defined as ) where we use the notation ∂ i := ∂/(∂θ i ) and do not explicitly state the time dependence of the parameters.The evolution gradient for VarQRTE is given by the expression whereas a VarQITE evolution yields the following with the energy E(θ) = ⟨ϕ(θ)|H|ϕ(θ)⟩.From hereon, we present general equations that apply to both real and imaginary-time evolution; thus, unless specified, we simply use b without a specific superscript.
Note that these equations are introduced for a timeindependent Hamiltonian, but they can also be applied to the time-dependent case H = H(t).

A. Dual formulation
Instead of solving the linear system defined in Eq. ( 3), we propose to solve the dual formulation of the problem [32,33] given by θ = argmin The term ∥ θ∥ 2 g(θ) = θT g(θ) θ is the squared norm of the parameter derivative in the metric of the QGT.This quantity describes the magnitude of the derivative from an information geometric point of view and is derived from the Fubini-Study metric.For infinitesimal displacements δθ, we have where ∥ • ∥ 2 is the ℓ 2 norm [33].By writing θ = δθ/δτ , for some time perturbation δτ > 0, we can now reformulate the optimization in terms of the fidelity where we directly optimize for the parameter update δθ and we introduced the loss function In practice, the optimization problem can be solved without the factor (δτ ) −2 , which decouples the shape of the locally quadratic infidelity term from the time perturbation and improves the numerical stability of the optimization.
Note that this dual formulation can alternatively be obtained from the derivation of quantum natural gradients [31,33], which is detailed in Appendix B. For an intuitive understanding of the relationship of the infidelity and QGT the effect of approximating ∥δθ∥ g(θ) ≈ 1−F (θ, θ+δθ) in an illustrative example is demonstrated in Appendix C.
Instead of computing the QGT at each timestep, which requires O(d 2 ) circuit evaluations, we now have to solve an optimization problem where the loss function requires only one fidelity evaluation.The required resources of DualQTE per timestep are therefore O(d) for the computation of the evolution gradient b, times the number of iterations in the optimization.Thus, we improve upon the direct QGT approach if the number of iterations scales better than O(d), which, as we show in the following sections, is the case for the examples we investigate in this work.

B. Evaluating the loss function
The evaluation of the loss function L, defined in Eq. (10), requires the calculation of the evolution gradient b and the fidelity of the ansatz |ϕ(θ)⟩ for two different parameter sets.For imaginary-time evolution, the evolution gradient can, for example, be evaluated with analytic gradient rules, such as the parameter-shift rule or a linear combination of unitaries (LCU), or with finite difference methods [34].In the case of real-time evolution, however, we are restricted to an LCU approach, as this is the only method that allows the calculation of the imaginary part of gradients [12].
The fidelity F can, for example, be estimated using the swap test [35] and its variants [36], where the states are prepared in separate qubit registers followed by entangling gates across these registers, or with the Hadamard test, which adds only a single auxiliary qubit, but requires controlling the state-preparing unitary [37].A more near-term-friendly option is the compute-uncompute method [38], which does not introduce additional global operations.If the states are given by |ϕ(θ)⟩ = U (θ) |0⟩ for a parameterized unitary U and two different parameter values θ and θ ′ , the fidelity can be calculated by preparing U † (θ)U (θ ′ ) |0⟩ and measuring the probability of obtaining |0⟩ on all qubits.
If the state |ϕ(θ)⟩ has n qubits and the preparing unitary U has depth m, the swap test variants require a circuit width of 2n with depth of m+O(1), whereas the evaluated circuits for the compute-uncompute method are of only width n, but of depth 2m.The Hadamard test for fidelities between the same circuit with different parameters can be evaluated by controlling the parameterized gates, resulting in depth of m and depth of n+1, plus the overhead of controlling the gates.For sparse device connectivities, this can be a challenge.To avoid increasing the circuit complexity, the overlap can also be estimated via randomized measurements of two independent state preparations [39].However, this technique requires an exponential number of measurements.
Evaluating the QGT for VarQTE, however, suffers from similar issues.The QGT can be evaluated as the Hessian of the infidelity [40] using a parameter-shift or finite difference technique, which comes with the restrictions for fidelity evaluations as described above.Alternatively, Eq. ( 4) can be directly computed with an LCU approach, which adds two auxiliary qubits and two entangling gates [12].This method is less demanding than e.g. a Hadamard test, but still comes with additional connectivity requirements.In practice, for both VarQTE and DualQTE a suitable combination of parameterized quantum state |ϕ(θ)⟩ and gradient method must be selected, such that the resulting circuits can be executed reliably.
Depending on the topology and coherence times of the available hardware and the structure and size of the unitary, either method for gradient and fidelity calculations can be advantageous.In this work, we focus on near-term friendly methods and use the parameter-shift rule for gradients (if possible) and the compute-uncompute method for the fidelity, as these do not require additional gate connections or an exponential number of measurements.Note that, for systems with a large number of qubits, this method might become unsuitable as it measures the global zero projector.Then, approaches using only local measurements, such as the Hadamard test, could be the better choice.

C. Solving for the update step
The infidelity-based loss function L is locally convex around δθ = 0, as its Hessian at this point is ∇∇ T L(0) = g/2, and g is positive semi-definite.To leverage this property, we use gradient descent as a local optimization routine, which also allows the use of analytic gradient formulas that have proven more stable in presence of shot noise.The gradient of L with respect to the parameter update δθ is The gradient of the fidelity can be evaluated with a parameter-shift rule , where e i is the i-th unit vector and s is the parameter shift, which can be chosen as, e.g., π/2 for single-qubit Pauli rotations [34].
At each timestep, the gradient descent update for the update step δθ is where η k > 0 is the learning rate at step k.This iteration is continued until a maximum number of iterations or a convergence criterion is met.An example of the latter is a minimum tolerance in the change of the cost function or the norm of the gradient.
An intuitive choice for the initial guess δθ (0) is the zero vector, which corresponds to no change in the parameters.However, a more efficient choice can be to introduce momentum by warm starting the optimization with the update step from the previous timestep.This heuristic is motivated by the idea that, especially for small timesteps, we do not expect the parameter derivatives θ to change significantly.
Methods that approximate the gradient, such as finite difference or SPSA [41], may face challenges in the optimization.For small timesteps, the fidelity is close to 1 and the noise in the readout, e.g. from finite sampling statistics or device noise, can easily mask changes in the cost function.Parameter-shift gradients suffer less from this problem, as they allow to evaluate the cost function over larger perturbations, and do not amplify the noise by dividing by a small constant.
The ideal choice of the time perturbation δτ is a tradeoff: the error in approximating the QGT scales with (δτ ) 3 , but a smaller perturbation amplifies any measurement noise in the loss function as the update step is obtained as δθ/δτ .Appendix C displays this trade-off for an illustrative example.

D. Trainability
Recently, there has been a lot of research showing that, in certain settings, the loss function gradients of variational algorithms decay to zero exponentially and cannot be evaluated efficiently, as they would require an exponential number of measurements.These so-called barren plateaus can be induced, for example, if the loss function requires measuring a global observable [42], if the quantum circuit preparing the parameterized state is too deep or generates too much entanglement [42][43][44], or if the measurements are too noisy [45].
Since variational quantum dynamics is driven by the evolution gradient defined Eqs. ( 5) and ( 6), it can be affected by a barren plateau and fail to track the true evolution of the quantum state.However, it is important to note that the gradients only vanish on average for a random initialization, whereas in time-evolution the initial quantum state is typically specifically chosen.Furthermore, Hamiltonians of physical systems are usually local, as they reflect the interactions of the quantum mechanical system, and exponentially vanishing gradients can be avoided by choosing a circuit depth scaling logarithmically in system size [42].Alternatively, an application-specific ansatz with few variational parameters can help mitigate barren plateaus, such as circuits based on Hamiltonian evolutions [46,47].
In addition to the evolution gradient, the DualQTE loss function gradient ∇ δθ L depends on the gradient of the fidelity, which relies on measuring a global observable.This can be seen by writing the fidelity of two n-qubit states prepared by unitaries U (θ) and ⊗n is the global projector on the all-zero state.Thus, evaluating the fidelity gradient for two randomly selected parameter sets θ and θ ′ would exhibit barren plateaus at any circuit depth [42].However, the optimization in DualQTE starts at zero perturbations, θ = θ ′ where the total state preparing unitary is the identity, |λ⟩ = U † (θ)U (θ) |0⟩ = I |0⟩, which is an initialization that is proven to not exhibit barren plateaus even for global cost functions [48].Together with the fact that the DualQTE loss function is locally convex, the non-vanishing gradients at the initial point of the optimization is a strong motivation for the efficient trainability of DualQTE.
In Appendix E 4, we provide numerical evidence that for a local Hamiltonian and a logarithmic-depth circuit, neither the evolution gradient or the fidelity gradients decay exponentially with system size.

E. Sample complexity
The implementation of VarQTE on quantum hardware has several sources of errors: the model |ϕ(θ)⟩ could lack expressitivity to capture the dynamics, the time integration scheme introduces errors, the QGT and evolution gradient are subject to sampling error from a finite number of measurement, and each operation is affected by hardware noise.If we denote the ideal VarQTE parameters without sampling or hardware noise by θ(t) and the noisy parameters by θ(t), the error contributions can be split as where we measured the error in Bures distance and distinguish in error due to lack of model expressitivity plus integration error ε M (t) = D B (ϕ(θ(t)), Ψ(t)) and error due to a noisy implementation of VarQTE ε S (t) = D B (ϕ( θ(t)), ϕ(θ(t)) [49].
Since the proposed DualQTE algorithm promises a reduction in measurement cost of VarQTE, but is not concerned with the ansatz selection or hardware noise, we here focus on investigating the scaling of the sampling error.The model error ε M can be bounded with a-posteriori errorbounds [50], which we also investigate for the real-time evolution case in Sec.IV.
Deriving a concrete bound in terms of system quantities such as the energy or the number of parameters requires an assumption on the circuit structure.Here we assume a circuit with only Pauli rotations R P (θ i ) where each parameter θ i is unique and does not have coefficients.Note that the bounds can be adjusted for different circuit structures and parameterizations.In addition, we assume a cutoff δ c > 0 on the smallest eigenvalue of g due to a regularization of the linear system.
Then, we can state the following upper bound on the number of samples required to achieve a sampling error of ε S , where E max is the maximal eigenvalue of the Hamiltonian.
In contrast to a similar approach described in Ref. [49], we state the upper bound in terms of the number of parameters in the model or the system's Hamiltonian, instead of the QGT and evolution gradients.Further, we are able to derive a tighter result by leveraging Latala's theorem from random matrix theory to upper bound the sampling error in g [51].
Since the DualQTE algorithm does not construct the QGT directly but only the evolution gradient, we expect a reduction of a factor d in the complexity, and an additional factor for the number of optimization steps K in each timestep.Indeed we can show that the upper bound for the number of samples is The detailed derivation of both bounds is described in Appendix D. While it is possible to construct circuits where each component of these bounds are tight Sec.III shows that in practice the actual number of required samples scales less than this upper bound, which is further discussed in the Appendix.

III. IMAGINARY-TIME EVOLUTION
In this section, we show the results of DualQITE and investigate the circuit costs compared to VarQITE.As an application, we use our algorithm as a subroutine to prepare typical thermal states of the Heisenberg model, which are then used to calculate the energy per site as a thermodynamic observable.All circuits are constructed and simulated using Qiskit [52].

A. Heisenberg model
We simulate the imaginary-time evolution of the Heisenberg model with nearest-neighbor interaction on a 12-qubit circle in a transverse field: with interaction strength J = 1/4 and transversal field strength g = −1.As a variational ansatz, we use a circuit with Pauli-Y and Pauli-Z single qubit rotation layers that alternate with pairwise CNOT entangling gates.16)) for DualQITE and VarQITE, with mean and standard deviation of 5 experiments.The resources are measured in total number of measurements and are shown for evaluation of gradients (and the QGT, in the case of VarQITE) using parameter-shift rules (dashed) or a LCU approach (solid lines).
The circuit structure is shown in Appendix E 1 and we use r = 3 repetitions.The initial state for the evolution is the equal superposition of all qubits, |+⟩ ⊗n , which we prepare by setting the rotation angles of the last Pauli-Y layer to π/2 and the remaining angles to 0.
The optimization problems in DualQITE are solved using gradient descent with a fixed learning rate of η = 0.1 and time perturbation δτ = 0.01.The initial iteration performs 100 update steps and the subsequent, warmstarted iterations, only 10.These values are motivated by the simulations shown in the Appendix E 2 and are partially heuristic, as a termination criterion is challenging to define with access only to noisy loss function and gradient evaluations.The parameters are integrated with an explicit Euler scheme with timestep ∆ t = 0.01, i.e.
Note that the integration timestep ∆ t , which determines the accuracy of the time integration, can be chosen differently from the time perturbation δτ , which affects the approximation error of the QGT metric with the infidelity.
We compare the performance to VarQITE with the same integration scheme and use an L-curve regularization [53] for a stable solution of the linear system.Among all regularization techniques we attempted, such as adding a diagonal shift, truncating small or negative singular values or solving on a stable subsystem, the Lcurve regularization provided the most accurate and stable results.
In Fig. 2(a), we present the results for a varying number of shots along with the exact time evolution based on exact diagonalization.Already with as little as 100 mea-surements per circuit evaluation (shots) on the 12-qubit model, the dual time evolution is able to qualitatively follow the imaginary-time evolution and, up to time t ≈ 1, even outperform VarQITE with 1024 shots.Increasing the number of measurements of DualQITE to 1024 shots allows the dual method to closely track the exact solution towards the ground state, with a higher accuracy than VarQITE with 8192 shots.

B. Resource requirements
In the above experiment, DualQITE requires fewer circuit evaluations to achieve the same accuracy as Var-QITE.To investigate the total resource requirements, we perform both DualQITE and VarQITE with different resources and compute the achieved error.Since we are interested in following the imaginary-time dynamics as closely as possible at each timestep, we define the error as the average integrated Bures distance to the exact solution over the time evolution, The state fidelity is computed exactly, i.e., we compute the state vector of the model |ϕ⟩ at variational parameters θ(t), and take the inner product with the exact time-evolved state |Ψ(t)⟩.
The results for an integration time of T = 2 are shown in Fig. 2(b).We show the integrated Bures distance with respect to the total number of measurements recorded during the time evolution.In DualQITE, the resources can be split between using more optimization steps in each timestep or more shots to evaluate the gradients.The algorithm settings are detailed in Appendix E 3. The figure shows the resource counts for gradient calculations via the parameter-shift rule (PSR) and linear combination of unitaries (LCU).The LCU technique requires additional auxiliary qubits and additional non-local operations, but less overall circuits than PSR.For P Pauli terms in the Hamiltonian, the total number of required circuits C per timestep For DualQTE the number of circuits is and , where K is the number of optimization steps per timestep.The total number of measurements N is obtained by multiplying the number of circuits with the number of shots.
We see that, on average, DualQITE requires about one order of magnitude fewer measurements to achieve the same accuracy as VarQITE.With an increasing number of parameters, we expect this difference to grow, since VarQITE scales as O(d 2 ) whereas our algorithm, with warm starting, only computes small corrections at each time step.

C. Sample complexity
In addition to the fixed-size model with 12 qubits, we investigate how the resource requirements scale with system size.We compare VarQITE and DualQITE for the Heisenberg model from Eq. ( 15) with varying number of spins n and the same circuit structure as before, but with an adjusted number of repetitions of r = ⌈log 2 (n)⌉ times, plus a final rotation layer.We then tune the settings of VarQITE and DualQITE to achieve a mean accuracy of I B ≤ 0.1 over 5 experiments and count the total number of required measurements N .This threshold corresponds to a per-timestep fidelity of 0.995.
The results are presented in Fig. 3, which show the improved scaling of DualQITE compared to VarQITE.For small system sizes and few parameters, the overhead of solving the optimization problem in DualQITE is larger than evaluating the QGT.But, as we increase the problem size, the quadratic scaling of VarQITE takes over and our algorithm becomes more efficient.
This experiment allows to validate the upper bound on the number of measurements of Sec.II E. As shown in Fig. 3, the model error ε M is negligible in comparison to the sampling error ε S and we approximately have ε S ≈ I B ≈ 0.1.The maximal energy of the Heisenberg model on a periodic chain scales with the number of spins, and can be bounded by Inserting these values in Eqs. ( 13) and ( 14) we expect the scaling to be upper bounded by O(d 5 ) for VarQTE and O(d 4 K) for DualQTE.In practice, we observe approximately a scaling of d 3.56 for VarQTE and d 2. 29 for DualQTE, which shows the expected improved scaling for our algorithm.The measured scaling also suggests that the bounds are not yet tight, which we discuss further in Appendix D.

D. Calculating thermodynamic observables
As an application of imaginary-time evolution, we calculate thermodynamic observables using the quantum minimally entangled thermal states algorithm (QMETTS) [8,54].While the classical METTS algorithm has been specifically developed for Matrix Product State simulations, the thermal state preparation is still costly, and classical simulations fail if the system produces macroscopic entanglement during the imaginary time evolution (e.g. for low-temperature, 2D systems).Due to these restrictions, QMETTS is a promising application for quantum imaginary-time evolution algorithms.
For an observable A and inverse temperature β, the QMETTS algorithm generates samples {A m } m using a Markov chain whose average approximate the ensemble average: The sampling process to obtain the sample A m is 1.Start from a product state |ϕ m (t = 0)⟩.
We investigate the Heisenberg model from Eq. ( 15) on a chain with n = 6 spins with parameters J = 1/4 and g = −1.As a thermodynamic observable we compute the energy per site, ⟨H⟩ /n.To reduce the auto-correlation length in the QMETTS Markov chain, and for faster convergence to the ensemble average, it is favorable to measure in different bases in each step.Since the Heisenberg Hamiltonian conserves the number of qubits in the |1⟩ state, avoiding the Z basis greatly reduces the standard deviation of the Markov chain.Thus, we here alternate between the X and Y basis for each sample.
As ansatz for DualQITE, we use problem-inspired circuits with pairwise CNOT couplings and r = 2 repetitions of rotation and entanglement layers, plus a final rotation layer, see Appendix E 1 for a circuit diagram.For evolutions of product states |±⟩ in the X basis, the rotation layers are single qubit R Y R Z gates, and for the states |±i⟩ in the Y basis, the layers implement R X R Z gates.The initial product states |ϕ m (0)⟩ are prepared by setting the parameters in the final layer rotation layer of the ansatz as follows, Each energy sample is evaluated with 1024 measurements per basis.The optimization problem in DualQITE is solved with a time perturbation δτ = 0.01 and gradient descent with a learning rate of η = 0.1 and 100 iterations in the first timestep, followed by 10 iterations in the following, warmstarted timesteps.We integrate with a fixed timestep of ∆ t = 0.01. Figure 4 shows the estimated energy per site, along with the standard deviation of the samples, for different inverse temperatures β.For the alternating X − Y basis, the Markov chain converges quickly and M = 25 samples suffice for an accurate estimate of the observable.For the imaginary-time evolution, we compare DualQITE with the same settings as in the previous sections to an exact evolution performed with matrix exponentials.It shows that using the dual method allows to reliably reproduce the mean and standard deviation of the Markov chain samples compared to the exact reference METTS.

IV. REAL-TIME EVOLUTION
The focus of this paper is on imaginary-time evolution as, to date, no other QGT-free time evolution algorithms exist in this setting.For real-time evolution, p-VQD has a similar structure as our algorithm and solves an optimization problem rather than evaluating the QGT.However, there are key differences to the dual algorithm applied to real-time evolution.
The p-VQD algorithm [20] projects a single Suzuki-Trotter step onto the circuit model by solving the following optimization problem: For Hamiltonians with many Pauli terms or long-range interactions, such as those arising in molecular dynamics, the single step might already lead to large circuits with non-local gates.While DualQRTE requires an LCU method to evaluate the imaginary part of the energy and state gradients, see Eq. ( 5), this only adds a single entangling gate, compared to a full Suzuki-Trotter step is required.Furthermore, our dual time evolution allows the evaluation of error bounds at almost no additional cost, which is not possible in p-VQD.Due to these differences, this section presents DualQRTE: the dual time evolution for real-time evolution.

A. Heisenberg model
We present the real-time evolution under the Heisenberg Hamiltonian of Eq. ( 15) on a linear chain with n = 4 spins with parameters J = 1/4, g = −1.As variational model, we use a circuit with alternating Pauli-X and Pauli-Y rotation layers, and Pauli-ZZ entangling gates that reflect the connectivity of the spins.The circuit structure is visualized in Appendix F and, in this experiment, all algorithms use r = 3 repetitions of the rotation as well as entangling gates.To prepare the initial state, |+⟩ ⊗4 , we set the parameters of the final Pauli-Y rotations to π/2 and the rest to 0.
During the evolution, we track the average magnetization in the X and Z direction, Since this Heisenberg Hamiltonian preserves the qubit excitations, and the initial state is the equal superposition, the ⟨Z⟩ expectation value should remain 0 throughout the evolution.The results of the different time evolution algorithms for an integration time of T = 2 and timestep ∆ t = T /100 are presented in Fig. 5.Both DualQRTE and p-VQD accurately track the observables using only 200 shots per circuit.With the same resources, VarQRTE, on the other hand, has lower accuracy and we need to use 1024 shots per circuit to match the result of the optimization-based algorithms.

B. Error bounds
In variational real-time evolution, the model error in terms of Bures distance D B due to restriction to the variational manifold can be expressed as [50] εM := where we set ℏ ≡ 1. Integrating this error rate provides an upper bound on the Bures distance, that is where |Ψ(t)⟩ is the exact time-evolved state and the timedependence of ε M is due to the time-dependence of the parameters θ = θ(t).Up to the variance Var(H|ϕ(θ)) = ⟨ϕ(θ)|H 2 |ϕ(θ)⟩ − (⟨ϕ(θ)|H|ϕ(θ)⟩) 2 , this error is proportional to the loss function used in DualQRTE.By using the same expansion θ = δθ/δτ and using the infidelity to approximate the inner product with respect to the geometric tensor, we can rewrite the error as Note that the error scales linearly in time perturbation δτ as the infidelity approximation has a cubic error term [33], which is divided by the square of the perturbation.If we, for example, use a forward Euler rule with timestep ∆ t to integrate the variational error, the integration error scales as O(∆ t + T δτ ).This highlights the importance of differentiating between the timestep ∆ t for the integration, and the time-perturbation δτ to approximate the derivative.
In Fig. 6, we show the error bounds along with the true error for the time evolution of the Heisenberg model.The bounds are computed for different timesteps ∆ t for VarQRTE, and for DualQRTE for a fixed time perturbations δτ = 10 −3 in exact simulations.Firstly, we can verify that the error bounds hold.Secondly, the larger the timestep relative to the time-perturbation, the more accurate the approximation of the dual time evolution, as the error O(∆ t + T δτ ) is dominated by the integration error.

V. CONCLUSION
In this paper, we present a novel algorithm for variational quantum time evolution that does not require the evaluation of the QGT, but instead solves a dual optimization problem in each timestep.The proposed dual time evolution algorithm, DualQTE, is particularly interesting for imaginary-time evolution, as there is currently no alternative variational algorithm able to circumvent the O(d 2 ) cost of VarQITE.For real-time evolution, p-VQD also offers an optimization-based approach by projecting a single Trotter step onto the variational form.In comparison, the dual time evolution has the advantage that no Suzuki-Trotter step has to be implemented, which could require deep circuits or non-local operations, depending on the Hamiltonian.Furthermore, our algorithm allows to evaluate variational error bounds [50], although how accurately they can be evaluated in the presence shot noise remains an open question.
We demonstrated DualQTE for the imaginary-time evolution of a Heisenberg Hamiltonian on 12 qubits, and found that, in this setting, it requires about one order of magnitude less measurements to achieve the same accuracy as VarQITE.As a practical application of imaginary-time evolution, we calculated thermodynamic observables with the QMETTS algorithm and showed that the DualQITE is suitable to reproduce the sampling distributions.Finally, we applied our algorithm to an illustrative example for real-time evolution, where it produced comparable results to p-VQD for the same amount of resources, while both algorithms outperformed VarQRTE.
In the presented experiments, we used standard gradient descent algorithms with a fixed learning rate.We expect that the performance could be further improved by using more advanced optimization schemes, or methods that also take into account information from previous iterations.Another possible improvement would be a suitable termination criterion for noisy evaluations of the loss function.As for other optimization-based time evolution algorithms, such as p-VQD, it remains challenging to accurately measure the fidelity in the presence of hardware noise.
In conclusion, the proposed DualQTE is an efficient variational algorithm for quantum time evolution that does not suffer from the quadratic complexity of evaluating the QGT.This cost reduction enables scaling imaginary-time evolution to larger, practically relevant system sizes and allows the simulation and demonstration of a wide variety of important tasks such as Gibbs state preparation, mixed time evolution, or the evaluation of thermodynamic observables.Improving the resource requirements for near-term algorithms is an important step for scaling demonstrations to the full size of today's quantum computers and work towards practical applications.
The formulation in Eq. (B1) is then obtained by solving the minimization problem.
To circumvent the explicit evaluation of the QGT the natural gradient update can instead be calculated without the quadratic local approximation, and instead solve the optimization problem directly.If we use the infidelity as distance metric and replace the loss function gradient by the evolution gradient ∇ℓ(θ) = ∇E(θ)/2 = −b(θ), we obtain the same update rule as the main text For an intuitive understanding of the approximations of the QGT norm, we investigate an illustrative example with the variational model |ϕ(θ)⟩ = R Z (θ)R Y (θ) |0⟩, the Hamiltonian H = Z and a timestep of δτ = 1/2.In Fig. 7(a) we compare the exact values of the loss function L for imaginary-time evolution around θ = π/4 obtained by using the metric ⟨δθ, g(θ)δθ⟩ or the infidelity 1 − F (θ, θ + δθ) as norm.At δθ = 0 the approximation is exact and in the vicinity the difference scales as (δθ) 3 , see also Fig. 7(b).Note, that the infidelity is periodic and bounded in [0, 1] but the linear term b T δθ is unbounded, which leads to the fact that the minimum of the infidelity-based loss function close to δθ = 0 is not the global minimum.This is well visible in Fig. 7(a) where the infidelity-based loss function achieves lower values for large δθ than the minimum of the QGT close to δθ = 0. Since we aim to find the same minimum as the QGT-based loss function using a local optimization routine, such as gradient descent, is crucial for the dual time evolution.

Impact of the time perturbation
The approximation error scales with the norm of δθ and, therefore, solving for the update step θ = δθ/δτ with a smaller time perturbation δτ should result in a smaller error in the update step.Remembering the definition of the loss function we see that a smaller δτ moves the minimum closer to the minimum of the infidelity at δθ = 0, leading to a smaller approximation error.Since the fidelity is bounded but the linear part b T (θ)δθ is not, there is a maximum feasible range for the value of δτ .A necessary condition for the existence of the minimum is that the gradient of the loss function vanishes, ∇L = 0, which requires ∀i ∈ {1, . . ., d} : For a circuit with unique parameters and only Pauli rotations gates, the gradient of the fidelity can be bounded via the parameter-shift rule to be in [−1/2, 1/2] (see also Appendix D).Thus, a necessary condition for the timestep perturbation is ∀i ∈ {1, . . ., d} which can be generalized to circuits with repeated parameters or other than Pauli gates.Note that this is only a necessary and not a sufficient condition for the existence of a minimum since, depending on the circuit structure, the fidelity gradient may not support the full range [−1/2, 1/2].
In Fig. 8(a) we visualize the impact of δτ on the loss landscape.For small time perturbations the QGT-based and dual loss landscapes almost coincide, but if δτ is chosen too large the dual loss function has no minimum.If the loss function can be evaluated exactly, choosing δτ as small as possible therefore minimizes the approximation error.In Fig. 8(b), we find the the error in the parameter derivative θ = δθ/δτ scales approximately as O(δτ ).In practice, however, the loss function is subject to measurement noise and errors in the solution δθ are amplified by 1/δτ .Hence, for a finite number of measurements there is a trade-off between QGT approximation error and controlling the noise amplification.Assuming a forward Euler integration, the Bures distance can be formulated in terms of the QGT as where we introduced ∆ θ = θ − θ, λ max ≥ 0 is a bound on the largest eigenvalue of g for any parameter value θ and, similarly, ∥∆ θmax ∥ 2 an upper bound on the norm ∆ θ.In the first line, we dropped O(∆ 3 t ) error terms, in the second line we used a first order Taylor-expansion and in the last line we use the definition of the operator norm to bound the inner product of δθ in the metric of g.

VarQTE
In each VarQTE step we solve a linear system for the update step, where the measurements of the QGT and evolution gradient are subject to sampling error.We define the noisy quantities as g(θ) = g(θ)+∆g(θ) and b(θ) = b(θ)+∆b(θ) and, then, solve the noisy linear system with the noisy update θ = θ + ∆ θ.To stabilize the linear system and ensure the QGT and it's estimate are invertible, we assume a regularization of g and g in form of a diagonal shift δ c .This shift is a trade-off of stability and bias, which is also discussed in Ref. [40], Appendix D. We can write the error in the update step using the difference of the noisy and exact linear system solutions, as where we dropped the explicit parameter dependence for legibility.In the second line we used the Neumann series to approximate (g + ∆g) 2 ) and dropped quadratic error terms on the fourth line.In the following we derive upper bounds on the maximal value of the individual contributions in the error bound, such that we finally obtain a bound ∥∆ θmax ∥ on the error in the update step.
a. Spectrum of g Each QGT entry can be computed as [40] g For a circuit with unique, non-interacting parameter and only plain Pauli rotation gates R P (θ), we can use the parameter-shift rule [34] to write the entry as where we abbreviated F (±±) ij = F (θ, θ ± e i π/2 ± e j π/2) for the ith unit vector e i (and jth unit vector e j ).Since the fidelity is in [0, 1] we can bound each entry by for any value of θ.Gershgorin's circle theorem tells us that the maximal eigenvalue of g is bounded from above by the maximal sum over the columns or rows, which in this case is achieved by setting all elements of a column/row to 1/4.This gives the bound This bound can be generalized to circuits with coefficients or repeated parameters by applying the chain and product rules.For example, for a coefficient-free circuit where parameters can be repeated up to m times, the bound becomes md/4.
b. Norm of the update step The update step can be bounded as ∥ θ∥ 2 ≤ ∥g −1 ∥ 2 ∥b∥ 2 , where ∥g −1 ∥ 2 ≤ δ −1 c .The evolution gradient can be bounded using the parameter-shift rule, under the circuit structure assumptions as the previous section.Each element in the gradient is bounded by where ) and E max is the absolute maximum system energy.The norm over all elements is then ∥b∥ 2 ≤ √ dE max , leading to an overall bound of for any parameter value θ. c.Sampling errors Since the measurement noise is unbiased, the random variable ∆g = g − g has zero mean with i.i.d.entries.This allows to apply Latala's theorem [51], which states that for some constant C ∈ R.
Ref. [60] is concerned with the similar case of sampling the matrix [F (x i , x j )] d i,j=1 for a set of parameters {x i } d i=1 .There, the matrix entries are Bernoulli distributed with probability F (x i , x j ).Using QGT representation as Hessian and applying the parameter-shift rule, we can see that the entries of g ij are Poisson binomial distributed [61] with probabilities [F ] over a shifted support [0, 1, 2, 3, 4] → [−2, −1, 0, 1, 2].Since the of this distribution are independent of the number of circuit parameters, it can be shown analogous to Ref. [60] that which leads to a total bound of The bound on ∥∆b∥ 2 does not need to be tighter than ∥∆g∥ 2 ∥ θ∥ 2 , which is straightforward to achieve via the sampling error.Using the product rule we have loss function is available these criteria become unreliable as the noise in the evaluation might prevent the termination criterion to be fulfilled even though the algorithm converged.
One possible resolution would be to consider a moving average over a past batch of iterations.However, depending on the level of noise, this could require a large batchsize and therefore many iterations until the termination can be checked.Since the dual time evolution only has to compute small corrections, if small timesteps are performed and the optimization are warmstarted, we only expect a few iterations and a moving average is not a resource-efficient solution.Therefore we use a heuristic where the first optimization uses a large number of steps and the subsequent ones perform a fixed number of few iterations.
To demonstrate the effectiveness of warmstarting and to calibrate the number of required steps for noisy evaluations we investigate the dual time evolution in an ideal setting with exact statevector simulations and no finite-sampling statistics.First, we perform the time evolution for a Heisenberg Hamiltonian with periodic boundary conditions with n = 12 sites, J = 1/4, g = −1 and the initial state |+⟩ ⊗n .As circuit model we use the hardware efficient circuit from Fig. 10 with r = 6 repetitions and optimize the update step with a gradient descent routine with a fixed learning rate of η = 0.1.In each timestep we iterate until the change in loss function ∆L is below the threshold of 10 −4 ∆ t = 10 −6 .The results are presented in Fig. 11(a), and we observe that warmstarting drastically reduces the number of required iterations until the convergence criterion is reached.
In a second experiment we analyze how the required number of optimization steps scales with the system size.In Fig. 11(b) we repeat the above experiment for n = 3 to 12 spins and track the number of steps in the first iteration and the mean and standard deviation of the warmstarted iterations.We see that the number of steps scales sublinearly in the number of parameters d and is almost constant for the warmstarted iterations.k ∥ 2 at selected times t, where k indicates the iteration in the optimization within the timestep.Lengths for the fidelity gradients differ as the optimization were performed using a different number of steps, see Table II.

FIG. 1 :
FIG. 1: Estimated runtimes of variational imaginary-time evolution (VarQITE) and our proposed dual method (DualQITE) as a function of the number of parameters d of the variational model, for an exemplary Heisenberg model and 200 timesteps.See Appendix A for more details.

FIG. 2 :
FIG. 2: (a) The mean and standard deviation of DualQITE and VarQITE, each averaged over 5 independent experiments for a varying number of shots.(b) The accuracy measured in integrated Bures distance I B (see Eq. (16)) for DualQITE and VarQITE, with mean and standard deviation of 5 experiments.The resources are measured in total number of measurements and are shown for evaluation of gradients (and the QGT, in the case of VarQITE) using parameter-shift rules (dashed) or a LCU approach (solid lines).

FIG. 3 :
FIG. 3: (a) Total number of measurements required to achieve a mean accuracy of I B ≤ 0.1 over an average of 5 experiments.See Table II for the exact algorithm settings.Dotted and dashed lines show fits for the number of measurements.The bumps in the fits are due to the discontinuity of the circuit depth, which depends on ⌈log 2 (n)⌉.VarQITE is not evaluated for n = 14 qubits as it requires too many measurements.(b) Mean accuracy and standard deviation of each point of the top panel.The grey line indicates the infidelity threshold of I B = 0.1.

FIG. 4 :
FIG. 4: Energy per site for the Heisenberg model on a 6-spin chain, comparing mean and standard deviation of QMETTS with DualQITE (blue circle and errorbars) with a reference METTS implementation (black line and grey shade).

FIG. 5 :
FIG. 5: Average magnetization in X and Z direction as tracked by different variational algorithms.

FIG. 6 :
FIG. 6: Error of the real-time evolution in Bures distance, plus error bounds obtained with VarQRTE and DualQRTE.

FIG. 7 :
FIG. 7: (a) Values of the loss function L for evaluation with the QGT metric, and with introduced infidelity approximation.(b) Difference of the QGT metric and infidelity as function of the perturbation δθ.

FIG. 8 :
FIG. 8: (a) Loss landscapes and optimal solutions of the original, QGT-based loss function and the dual loss function for different δτ .(b) Error in calculating the parameter derivative θ depending on δτ and the number of measurements N .

FIG. 10 :
FIG. 10: The hardware efficient ansatz for the imaginary-time evolution experiments.In the QMETTS experiments the Pauli-Y rotations is replaced by Pauli-X rotations, if the evolution starts in the Y basis states |±i⟩.

FIG. 11 : 2 FIG. 12 :
FIG. 11: (a) The number of iterations required per timestep until convergence is reached with different initialization techniques.(b) The number of iterations for different numbers of qubits and the first iteration and warmstarted iterations.The warmstarted points show mean and standard deviation of the number of iteration of all steps after the first.