Problem specific classical optimization of Hamiltonian simulation

Nonequilibrium time evolution of large quantum systems is a strong candidate for quantum advantage. Variational quantum algorithms have been put forward for this task, but their quantum optimization routines suffer from trainability and sampling problems. Here, we present a classical pre-processing routine for variational Hamiltonian simulation that circumvents the need of a quantum optimization by expanding rigorous error bounds in a perturbative regime for suitable time steps. The resulting cost function is efficiently computable on a classical computer. We show that there always exists potential for optimization with respect to a Trotter sequence of the same order and that the cost value has the same scaling as for Trotter in simulation time and system size. Unlike previous work on classical pre-processing, the method is applicable to any Hamiltonian system independent of locality and interaction lengths. Via numerical experiments for spin-lattice models, we find that our approach significantly improves digital quantum simulations capabilities with respect to Trotter sequences for the same resources. For short times, we find accuracy improvements of more than three orders of magnitude for our method as compared to Trotter sequences of the same gate number. Moreover, for a given gate number and accuracy target, we find that the pre-optimization we introduce enables simulation times that are consistently more than 10 times longer for a target accuracy of 0.1%.


I. INTRODUCTION
Quantum simulation is believed to be one of the first applications in quantum computing that may show practical advantages over their classical counterparts [1,2].To compile a digital quantum algorithm for simulating a time evolution U (t) that will be typically generated by a p-local Hamiltonian, a Trotter-Suzuki decomposition [3], that approximates U (t) as a product of local exponentials, can be employed.Although so called "beyond Trotter" methods, such as more general product formulas [4][5][6], quantum signal processing techniques [7] or Krylov subspace inspired algorithms [8], have been investigated, recent studies on Trotter error scaling [9,10] have proven Trotter sequences to still be competitive.Nonetheless, in specific applications, Trotter sequences fail to deliver the optimal solution for a product formula with fixed gate count [11].Amongst other optimization strategies, variational quantum algorithms have been put forward to find optimized (system-specific) gate sequences for Hamiltonian simulation [12][13][14][15][16][17].
Although variational quantum algorithms have recently piqued significant interest as possible applications on near term quantum hardware [18], they hide intrinsic difficulties that make their implementation on near term devices very challenging.Vanishing gradients (Barren plateaus) [19][20][21], sample overhead for efficient measurements [22,23] and effects of noise on cost function * Refik.Mansuroglu@fau.deevaluations [24] are only some of the hurdles to overcome.Thus, key to a successful near term quantum algorithm is developing strategies to use quantum hardware in the crucial solution step, but otherwise as little as possible.
A drastic, however promising, approach to solve this is not to do quantum optimization at all, but rather push the whole optimization loop into classical pre-processing, which requires a cost function that is efficiently computable by a classical machine.This idea has been demonstrated to be applicable in specific cases, such as translational invariant models [25] in which parameters found on small systems can be reused in large systems, as well as systems with efficient tensor network representations [26][27][28][29] to calculate error measures classically.
Here, we present a cost function that is classically efficiently computable and use it for classical optimization of quantum circuits that implement time evolution in pre-processing.We show that this cost function yields a faithful upper bound on the two-norm error of the ansatz unitary, if one restricts the single time step to be small, an assumption that is common to all product formulae [4][5][6]9].Our cost function is agnostic of the considered initial state and relies on a perturbative expansion of a rigorous upper bound that can be transformed into a polynomial in the parameters.The number of terms that need to be calculated scales with the number of non-commuting terms H j in the Hamiltonian.While this number is generally dependent on the locality and interaction distance in H, it is bounded by a power of the qubit number for models with only two-body interactions.There are thus no intrinsic limitations on the Hamiltonian for the use of our method.We further give a perturbative and classically computable error estimate that allows us to predict error scalings in system size and simulation time if the optimized gate sequence is repeated K times.T V are due to the finite step size in the search for optimal time steps t to meet the error threshold ϵ.See section IV for details of the numerical experiments to produce the data shown here.
We illustrate the improvements enabled by our approach via exact numerical simulations for an XY model with random nearest neighboring interactions.Our results (cf.Fig. 1) show that the maximally reachable simulation times T for a given target accuracy ϵ and a fixed gate budget G = O(K) are more than 10 times longer than for a Trotter sequence of the same order for cases with K ≳ 20 and ϵ = 10 −3 .Moreover, the variational sequence even reaches slightly larger simulation times than second order Trotter and is competitive with fourth order Trotter, whereas these higher order formulas provide less data points for fixed gate budget.
The paper is structured as follows.We begin by introducing the second order perturbative distance of a variational sequence from the correct time evolution operator in section II and show that it bounds the algorithm error up to third order corrections in the parameters.We further show that the perturbative distance always has a non-vanishing gradient at the point in the parameter space that corresponds to the Trotter sequence yielding a general guarantee for the existence of an optimal solution away from Trotter.In section III, we compute the scaling of the error in the extrapolation of found optimal gate sequence in system size and simulation time.Finally, we demonstrate our findings and the advantages they enable in numerical experiments on the XY model in section IV before we close with a summary and outlook.

II. PERTURBATIVE COST FUNCTION
We are interested in a variational ansatz for quantum simulation U (t) = exp (−itH).It has been demonstrated before [11,25] that Trotter sequences can be optimized in a model specific way using the ansatz e −iθr,j Hj . ( The inner product factorizes all M Hamiltonian terms indexed with j and the outer product adds the freedom to use in total R layers in which the parameters can be different.While the choice of ansatz is not necessarily restricted to Eq. ( 2), it has the advantage of containing the Trotter solution.If we use the parameters θ r,j = cj t R , Eq. ( 2) reduces to a Trotter decomposition with Trotter number R.
Finding the optimal solution for an operator approximation is hard, since error measures for the time evolution operator, such as where ∥.∥ is a yet unspecified, unitary invariant norm, require exponential resources, in general.In this work, we present a classically efficient optimization strategy that yields good results in the relevant regime of small time steps t R , where power expansions can be applied.To this end, we introduce an approximation to an upper bound of the error measure ϵ var for the time evolution operator in equation (3), which is provided by a quantity that we call perturbative distance.
Definition 1.Let H = M j c j H j be a time-independent Hamiltonian.Define the perturbative distance with the coefficients The perturbative distance C(θ) represents an approximation to an upper bound of the error measure ϵ var for the time evolution operator.We make this statement precise in the following proposition.
Proposition 2. Let H = M j c j H j be a timeindependent Hamiltonian and C(θ) the perturbative distance defined in Eq. (4).Then the error ϵ var defined in Eq. ( 3) is bounded up to third order corrections by Proof.We can write the difference operator in Eq. ( 3) in the form where Z is determined by merging U var into one exponential via the Baker-Campbell-Hausdorff (BCH) lemma.We show in Lemma 7 (see Appendix A 2 for a proof) that up to third order We assumed a specific ordering of the multi-index (r, j) for the sake of being precise.The statement can be very well applied to arbitrary orderings of the exponentials in Eq. ( 2).We finally make use of a well-known estimate (see Lemma 6 in Appendix A 1 for a proof) that leaves us with the proposition.
To make C(θ) an approximate upper bound to the error ϵ var , we need to assume t R ∥H∥ ≪ 1 and bound the parameters θ to be at the same order of magnitude.This assumption yields a time-dependent bound on R, which is not surprising.In general, Lieb-Robinson bounds on the entanglement that is generated by time evolution gives a relation between t and R [30].If one manages to minimize C(θ) within the perturbative regime, the leading term in the upper bound will be of order O(θ 3 ), yielding a new error estimate that we will analyze in section III.
Note that first order Trotter sequences can be represented by Eq. ( 2) setting the parameters θ r,j = tcj R .The Trotter solution has the advantage that it makes the linear contribution in θ, respectively t R , vanish.It is however not the only solution that achieves this.In fact, we can reduce the number of parameters by M , if we constrain the linear terms by ξ j = 0 ∀j.Let us thus fix the last layer (r = R) by setting so that (R − 1)M parameters remain.Note that Eq. ( 11) is a choice that turns out to be helpful as long as the operators that contribute in linear order are all orthogonal to the operators in the second order terms, which we assume in the following.Otherwise first and second order terms could be used to cancel each other.We are still free to choose a specific norm.While in finite dimensions, all norms are equivalent, choosing a norm to describe the error ϵ, implies a notion of how the algorithm performs for specific initial states.If we were to choose the operator norm ∥.∥ ∞ , for instance, ϵ can be connected to an infidelity for the worst case initial state, while a 2-norm ∥.∥ 2 yields an average case error.Later, C(θ) shall be used as a cost function in variational optimization, so it should be continuously differentiable in the parameters θ r,j .While this rules out the operator norm, all Schatten p-norms can be applied.If we consider the 2-norm, we see that the computation of C(θ) is classically efficient.In this case where we included a normalization factor 2 −n that prevents C(θ) from growing in the Hilbert space dimension.Typically, the commutators [H j , H j ′ ] follow a simple algebraic structure (for instance if all H j are Pauli strings).

The factors Tr ([H
) can thus efficiently be computed for the model of interest, typically even by hand, leaving C(θ) to be a fourth order polynomial in θ.
The Trotter solution, while becoming exact in the limit R → ∞, can always be beaten for finite R ≥ 3, as we will show in the following.Proof (Sketch).For the forward direction "⇒", we explicitly calculate the gradient at the Trotter solution ∇C θ T .Setting this to zero, gives a relation where A j,k,k ′ ,l is a tensor independent of r and R. For R ≥ 3, we show that this forces the left hand side of Eq. ( 13) to vanish making the Trotter sequence exact.For details, see Appendix A 3. The reverse direction "⇐" is trivial.

III. ERROR SCALING
Since the proposed classical optimization scheme can only make valid statements within the perturbative regime θ ≪ 1 ∥H∥ , longer simulation times either require large R or repetitions of a small time step.A similar extrapolation can be done in system size.This section studies the scaling of the perturbative distance along these extrapolations.In the case that the second order perturbative cost function is optimzed to values that become negligible to higher orders, the error scaling is determined by a third order term E(θ) for which holds.First, we consider the extrapolation in system size in the presence of translation symmetries.If the parameters are chosen to be translation invariant, i.e. θ r,j = θ r,a , then the cost value for a system of n qubits on a D-dimensional geometry is bounded by the cost value C unit (θ) of a system of (2p − 1) D qubits and the error is bounded by E unit (θ) of a system of (3p − 2) D qubits Further, if all the commutators [H j , H j ′ ] are orthogonal with respect to the Hilbert-Schmidt product, cost and error read See Appendix A 4 for a proof.Remarkably, the calculation of C(θ) on translation invariant systems is reduced to a problem on (2p − 1) D qubits only.If the optimal parameters are used for a larger system, the cost function will scale as O(n).In fact, a system that is small enough can be exactly diagonalized on a classical computer and an exact measure of error, such as ∥U (0, t) − U var ∥ can be calculated.A successful extrapolation of optimal parameters for translation invariant systems to larger system sizes has been found in [25] and also shows a linear scaling of the squared cost function in system size n, even beyond the perturbative regime.While we restrict the discussion here to translation symmetries, the behavior of other scaling transformation such as clusterings or pooling inspired transformations can be analogously checked.
Optimizing on longer simulation times still within the perturbative regime comes with an increase in R and therefore with the number of parameters making a successful optimization hard.Instead, long simulation times can be reached by repeating a pre-trained single time step fixing the number of parameters to be RM .The scaling of the cost function under this extrapolation in simulation time is studied in the following.
Proposition 5. Let U var (θ) be an ansatz for variational Hamiltonian simulation as defined in Eq. (2) with the cost value C(θ).The ansatz simulates time T = Kt using KR layers and K copies of the RM parameters.The corresponding cost value reads Proof idea.The extrapolated ansatz is equivalent to the ansatz before by mapping R → KR, t → Kt and identifying θ r,j = θ r%R,j , where % denotes the modulo operation.Plugging this into the definition of χ from Eq. ( 6) yields a linear factor K. See Appendix A 5 for details.
In total, the cost function scales as O(K) when extrapolating in time.Note that the leading order Trotter error scales the same under the above mapping The error E(θ) of the repeated sequence does not admit a common K-dependent factor that can be pulled out, but rather introduces extra terms, Λ, that make the error scaling worse than for Trotter sequences (see appendix A 5 for details).The extra terms will, in general, contribute to an O(K 2 ) scaling, Although this result means that usually the improvement with respect to Trotter will eventually melt away at long times, our numerical experiments show that there are significant improvements for the time scales that are relevant in order to reach quantum advantage in digital quantum simulation.

IV. NUMERICAL EXPERIMENT
To explore the accuracy improvement that can be enabled with our approach in specific examples, we study the performance of the derived cost function by optimizing a variational sequence with the second order perturbative distance defined in Eq. ( 4) and comparing it to the exact value for the difference of time evolutions ϵ var = ∥U (t) − U var ∥ and ϵ T rotter = ∥U (t) − U T rotter ∥ as well as to the leading order error term E(θ) (cf.Eq. ( 14)) that is efficiently computable classically.For this comparison, we study the ratios of perturbative (R E ) and exact (R ϵ ) error measures comparing Trotter and variational sequences.In Eq. ( 21), C(θ T ) denotes the perturbative distance for choosing the variational parameters equal to the Trotter choice θ r,j = tcj R .From the bound in Eq. ( 14), we can also infer an approximate lower bound for the exact error ratio where we assumed that C(θ) ≪ E(θ) after a successful optimization of the parameters θ.
As an example, we consider an XY model with a fully connected interaction graph that is described by the Hamiltonian (up to unitary equivalence) The sum over µ, ν counts every pair of qubits and the symmetric matrices J (y,z) µν encode the interaction strengths between qubits µ and ν.In our numerical simulations, we normalize the Hamiltonian to ∥H∥ = √ n to fix the time scale.
To calculate C(θ) for the XY model, we define a mapping between the single index j ∈ {1, ..., M } used in the above derivations and the Hamiltonians in Eq. ( 23) labelled by the qubit indices µ ∈ {1, ..., n} and the interaction type.The non-vanishing commutator terms that appear in This allows the reduction of C(θ) to O n 3 non-vanishing terms in the fully connected case and to O(n) non-vanishing terms in a nearest-neighbor XY model.See Appendix C for an explicit form of the perturbative distance C(θ).Note that we are still free to turn off specific interactions from the fully connected graph to study lattice geometries of any dimension.We will now turn to a two-dimensional geometry for which there is no analytical solution using free fermion mapping, such as in one dimension.
In Fig. 2, we plot the cost values C, E and ϵ, as well as the ratios R ϵ and R E for different times t.The optimizer manages to find small values of C(θ) for all times.For small enough times t < R ∥H∥ = 1, the perturbative distance also incorporates a faithful indicator for the exact norm in a sense that the optimal parameters also yield a significant decrease in the exact error ϵ V ar of the variational sequence.Also, the error estimate E and the ratio R E yield very good approximations for the true error ratio R ϵ as predicted in Eq. ( 22).The numerics hence indicate that the approximate upper bound derived in Eq. ( 14) is tight.
In Fig. 3, the optimal parameters found before for a fixed time step √ 2nt ∈ {0.05, 0.5, 1.5} are being repeated up to K = 10 times to simulate longer times.As predicted in Proposition 5, the square of the perturbative distance grows with a factor K 2 .For total simulation times √ 2nKt < 1.5, this scaling is also reflected in the exact cost ϵ.Above this critical time, higher order contributions become non-negligible, which explains why the scaling ϵ deviates from O K 2 .Remarkably, the error ratio between Trotter and variational sequences stays constant also far beyond this critical time until the improvement of the variational sequence eventually starts melting away at t > 1.
Let us take this extrapolation to a practical case in which we have a gate budget G = O(KRn), that will scale with the number of qubits n and the number of circuit repetitions K, and a maximal target accuracy ϵ.A natural question from a practitioner's point of view is to ask, what is the maximally reachable simulation time T = Kt with these fixed resources.We study this question in Fig. 1 where we plot T V for variational and Trotter sequences of order q ∈ {1, 2, 4} picking the maximal time step t that corresponds to an error ϵ just below the threshold.While the variational sequence reaches larger simulation times than first and second order Trotter sequences, its maximal simulation time is comparable to the fourth order Trotter formula in the ϵ = 10 −2 case and in the ϵ = 10 −3 case for circuits with K < 25.The simulation time of the variational sequence is then surpassed by that of a q = 4 Trotter formula in the case ϵ = 10 −3 at depths K > 25.However, since higher order Trotter formulas require more gates for a single time step, the resolution of the time evolution for a given gate number also becomes more sparse.For NISQ applications, which are typically K < 20, there might be only a single or no fourth order data point in the implementable range, depending on the system size.We also plot the ratio of simulation times T T /T V for first order sequences to show that our method can reach simulation times that are more than 10 times longer than for a Trotter sequence of the same order for ϵ = 10 −3 and K ≳ 20.
We have also computed the values of the the ratios R C and R ϵ for different times t for a Transverse Field Ising model in Appendix B. Also for this model, our approach leads to a dramatic improvement.This model however has a particularly simple commutator algebra that is routed in it's low connectivity.Therefore many of the two-qubit gates in a Trotter sequence commute with  each other and, as a consequence, a first order Trotter sequence of the Ising model is unitarily equivalent to a second order Trotter sequence.Improving upon this sequence would require a third order perturbative distance analogous to our discussion above.

V. CONCLUSION
In this work, we presented a classically efficient way to estimate error measures for variational time evolution in the regime t R , θ ≪ 1 ∥H∥ and proposed an optimization routine for variational Hamiltonian simulation that is executed completely in classical pre-processing.The method is applicable to arbitrary p-local Hamiltonians and only requires computing commutators of individual terms in the Hamiltonian.
We show that -analogous to a Trotter decomposition -the linear contribution to the error in the parameters θ can be forced to exactly vanish.The remaining parameters can then be found to minimize the second order error terms for which the existence of a better solution than Trotter is always guaranteed within the perturbative regime.A generalization to higher orders is straightforward and remains only a matter of diligence.The Trotter-inspired error scalings in system size and extrap-olations in simulation time are conserved also when deviating from the Trotter solution.Our findings are backed up with numerical experiments on the XY model with random interaction strengths.
For future work, it would be interesting to study other physically motivated models like non-local fermionic encodings.Also, an extension to time-dependent Hamiltonians would be interesting, although the extrapolation to larger times is not straight-forward for this case.Another possibility to simplify the computation of the perturbative distance is to change the exact error measure that is to be approximated.This can be done by fixing the initial state of the quantum simulation or to consider alternative error measures, such as the conservation of statistical moments of the Hamiltonian, as recently proposed in [17].
interaction types may admit support on clusters of at most p neighboring qubits.We do not make any assumptions on lattice geometry or dimension in the following and formally denote by N p (j) the set of Hamiltonian terms that correspond to a p-nearest neighbor of the j th term.The parameters are chosen to be translational invariant by just dropping the dependence on the qubit number, i.e. θ r,j = θ r,a .As a consequence, also χ j,j ′ = χ a,a ′ and Φ j,k,l = Φ a,b,c .
Since we introduced another index j → (a, j), we need to refine the chosen ordering.Without loss of generality, we choose (r, a, j) : (1, 1, 1) ≤ (1, 1, j) ≤ (1, a, j) ≤ (r, a, j) (A25) To simplify the cost function defined in Eq. ( 12), we decompose the sum We plug in the above structure into the definition of the cost function, and use the locality of the Hamiltonian to get for one fixed j where we used the triangle inequality in the last step.From translational invariance we know that the second last term yields the same value for every j yielding a factor n. If the commutators [H j , H j ′ ] are orthogonal with respect to the Hilbert-Schmidt product, the sum j can be pulled out of the squared norm yielding n equal addends and hence C(θ) = √ nC unit (θ).The same way, we can reduce the computation of E(θ) to a unit cell.However, as we consider two-fold nested commutators here, the unit cell is of size b , H  Proof.As Eq. ( 17) is equivalent to the ansatz before by just mapping R → KR, t → Kt and identifying θ r,j = θ r%R,j , the cost function will have the same form as before with modified χ χ where µ ≁ ν denotes that µ and ν are not neighbors.On a two-dimensional square lattice, it is handy to further refine the qubit indices into two µ = (µ x , µ y ), where µ x ∈ {1, ..., n x } and µ y ∈ {1, ..., n y } count the lattice sites in each dimension, respectively.θ r,j = 0 also implies χ j,j ′ = 0 ∀j ′ which further reduces the number of terms in Eq. (B5).

(B7)
Although the cost function becomes tedious to write down, it only contains a linear number of terms and is thus efficiently computable.The four contributions in Eq. (B5) account for the different combinations of the appearing ZY string and are depicted in Fig. 4.

Numerical restults
In Fig. 5, we study the performance of the derived cost function by optimizing a variational sequence with the second order cost function derived in Eq. (B7) and comparing it to the exact value of the difference of time evolutions ϵ for different times t.The optimizer manages to find small values of C(θ) for all times.For small enough times t < R ∥H∥ ≈ 1, the perturbative cost function also incorporates a faithful indicator for the exact norm in a sense that the optimal parameters also yield a significant decrease in the exact error of the variational sequence.

2
ZZ terms and n X terms.To use the notation introduced before, one needs to map the interactions between two qubits to a single index j for j ≤ n Y µ(j) Y ν(j) for n < j ≤ n(n+1) 2 Z µ(j) Z ν(j) for n(n+1) for j ≤ n J (y) µ(j)ν(j) for n < j ≤ n(n+1) 2 J (z) µ(j)ν(j) for n(n+1) The exact ordering of the indices is not important, in general.To give an explicit example, consider row major ordering j = (µ − 1) n − µ 2 + ν, for µ ≤ ν.We plug in the H j into Eq.( 12) and observe that the only non-zero

Figure 1 .
Figure 1.Maximal simulation time T of variational and Trotter sequences of order q ∈ {1, 2, 4} (left) and simulation time ratio TT /TV of first order sequences (right) for a fixed gate count that is determined by K and a threshold accuracy ϵ ∈ {10 −2 , 10 −3 } evaluated on a nearest neighbor XY model on a 3 × 3 quadratic lattice and random interaction strength J (y) µ,ν centered around 0.5, J (z) µν centered around 1 and hµ = 0.25.The small jumps in T TT V are due to the finite step size in the search for optimal time steps t to meet the error threshold ϵ.See section IV for details of the numerical experiments to produce the data shown here.

Proposition 3 .
Let H = j c j H j a Hamiltonian and U var as defined in Eq. (2) with R ≥ 3. The Trotter solution θ T r,j = tcj R describes a (local) minimum of C(θ) if and only if the Trotter formula is exact, i.e.C(θ T ) = 0 ∀t.

Proposition 4 .
Let H = local, translation invariant Hamiltonian, where the index a labels different interaction terms and the dependence on the qubit number, indicated by the superscript (j), only denotes on which qubit H (j) a has support.Let further U var be as defined in Eq. (2).

Figure 2 .
Figure 2. Perturbative (C, E) and exact (ϵ) error measures (left) and exact error ratio Rϵ and the ratio of error estimates RE (right) as defined in Eq. (21) evaluated on a nearest neighbor XY model on a 3 × 3 quadratic lattice and random interaction strength J (y) µ,ν centered around 0.5, J (z) µν centered around 1 and hµ = 0.25.

Figure 3 .
Figure 3. Pertubartive cost values C compared with exact norm cost function ϵ for a repeated sequence of variational and Trotter solution evaluated on a nearest neighbor XY model on a 3 × 3 quadratic lattice and random interaction strength J (y) µ,ν centered around 0.5, J (z) µν centered around 1 and hµ = 0.25.A single time step always corresponds to R = 3 layers and the single step time increases from left to right √ 2nt ∈ {0.05, 0.5, 1.5}.From the log-log plot, one can read off C(θ K ) 2 = O(K 2 ) as predicted by Proposition 5.
c are orthogonal with respect to the Hilbert-Schmidt inner product, we get E(θ) = √ nE unit (θ).

Figure 5 .
Figure 5. Perturbative (C) and exact (ϵ) error measures (left) and exact error ratio Rϵ and the ratio of error estimates RE (right) as defined in Eq. (21) evaluated on a nearest neighbor Ising model on a 3 × 3 quadratic lattice and random interaction strength Jµ,ν centered around 1 and hµ = 0.25.
Appendix C: Perturbative Distance of an XY model in a Transverse FieldIn total, the XY Hamiltonian from Eq. (23) admits n(n−1) 2 YY terms, n(n−1)

Figure 6 .
Figure 6.Perturbative (C) and exact (ϵ) error measures (left) and exact error ratio Rϵ (right) for R = 4 and R = 6 evaluated on a nearest neighbor XY model on a 3 × 3 quadratic lattice and random interaction strength J (y) µ,ν centered around 0.5, J (z) µν centered around 1 and hµ = 0.25.On the left panel, results for R = 6 are plotted.