Dynamical simulation via quantum machine learning with provable generalization

Much attention has been paid to dynamical simulation and quantum machine learning (QML) independently as applications for quantum advantage, while the possibility of using QML to enhance dynamical simulations has not been thoroughly investigated. Here we develop a framework for using QML methods to simulate quantum dynamics on near-term quantum hardware. We use generalization bounds, which bound the error a machine learning model makes on unseen data, to rigorously analyze the training data requirements of an algorithm within this framework. This provides a guarantee that our algorithm is resource-efficient, both in terms of qubit and data requirements. Our numerics exhibit efficient scaling with problem size, and on IBMQ-Bogota we simulate for over 20 times longer than is possible with Trotterization.

Introduction The exponential speedup of dynamical quantum simulation provided the original motivation for quantum computers [1,2]. In the long term, large-scale quantum simulations are expected to transform fields such as materials science, chemistry, and high-energy physics. Nearer term, since efficient classical dynamical simulation methods are lacking (in contrast to those for computing static quantum properties like electronic structure), dynamical simulation may plausibly be one of the first applications to see quantum advantage.
Achieving near-term quantum advantage for dynamics will require long-time simulations on Noisy Intermediate-Scale Quantum (NISQ) hardware [3]. Standard methods like Trotterization grow the circuit depth in proportion to the simulation time, ultimately running into the decoherence time of the NISQ device [4,5]. Fast-forwarding methods for long-time simulations on NISQ devices have recently been introduced [6-9], but are limited by various inefficiencies (e.g., qubit and data requirements). Here, we address these inefficiencies, potentially opening the door to near-term quantum advantage.
In parallel to these developments, quantum machine learning (QML) [10,11] has emerged as another potential application for quantum advantage. At its core, QML involves using classical or quantum data to train a parameterized quantum circuit. A number of promising training paradigms are being pursued, including variational quantum algorithms using training data [12], quantum generative adversarial networks [13,14], and quantum kernel methods [11] (to name just a few). Here we seek to combine the potential of QML and dynamical simulation, by leveraging recent advances in QML to reduce the resource requirements of dynamical simulation.
To assess the scalability of QML methods, as well as their applicability to real-world problems, it is critical to understand their training data requirements, quantified by so-called generalization bounds [15-28]. These provide bounds on the error a machine learning model makes on unseen data, as a function of the amount of data the model is trained on and of the training performance. In this letter, we assess the training data requirements of QML approaches to dynamical simulation.
Our analysis provides the groundwork for a new QML-inspired algorithm for dynamical simulation that we call the 'Resource-Efficient Fast Forwarding' (REFF) algorithm. This algorithm uses training data to learn a circuit that allows for fast-forwarding, whereby long-time simulations can be performed using a fixed-depth circuit. The REFF algorithm is efficient in the amount of training data required. It is also qubit-efficient in the sense that simulating an n-qubit system requires only n qubits, in contrast to earlier work [6], which required 2n qubits. We use generalization bounds to rigorously lower bound the final simulation fidelity as a function of the amount of training data used, the optimization quality, and the simulation time. This analysis is complemented by numerical implementations, as well as a demonstration of our algorithm on IBMQ-Bogota.
General Framework Given a set of initial states S_ρ and an n-qubit Hamiltonian H, the goal of dynamical simulation is to predict the evolution of a set of observables S_O up to time T. A promising approach in the NISQ era is to use QML to fit a time-dependent quantum model that can be extrapolated to long simulation times using a short-depth quantum circuit [6-8,29]. This is valuable in the NISQ era since high noise levels constrain the depth of circuits that may be executed. More concretely, the aim is to find some time-dependent Quantum Neural Network (QNN), V_t(α), and optimized parameters, α_opt, such that

Tr[O V_t(α_opt) ρ V_t(α_opt)†] ≈ Tr[O e^{-iHt} ρ e^{iHt}]   (1)

for any time t < T, for any ρ ∈ S_ρ and O ∈ S_O. Possible time-dependent QNNs include those formed from the Cartan decomposition of H [29,30], or from a diagonalization of H [7,31,32] or of the propagator for short-time evolution [6,8]. Fig. 1 depicts this framework.
In training V_t(α), the most appropriate choice of training data will depend on the sets of initial states S_ρ and observables S_O, but the data will typically be generated from the properties of the system at short times. For example, if one is interested in knowing the evolution of only a single observable, the training data could consist of the evolution of this observable for some subset of the target states up to some short time ∆t. Alternatively, if one is interested in simulating the evolution of any possible n-qubit observable, it would be natural to use training data consisting of N pairs of input-output states, where the output states are generated by evolving the input states for some short time, i.e., |Φ^(j)⟩ = U_∆t|Ψ^(j)⟩, where U_∆t ≈ e^{-iH∆t} is a gate sequence (such as a Trotterization) that approximates the true time evolution.
The training data is initially used to train the QNN to reproduce the properties of the system at short times, i.e., to ensure that Eq. (1) holds for times t ≤ ∆t. Crucially, while trained on only a subset of the target states and/or observables, the hope is that the learned QNN generalizes to the unseen target data, i.e., that it well reproduces Eq. (1) at short times (t ≤ ∆t) for any ρ ∈ S_ρ and O ∈ S_O. The properties of the system at some longer time T can then be extrapolated via V_T(α_opt).
Generalization bounds quantify the performance of a QNN on unseen data after optimization on a limited training set. In particular, the generalization error measures how much the performance on new data differs from the performance on the training data. Ref. [23] has shown that if the parameterized quantum circuit has K trainable local gates, the generalization error scales at worst as Õ(√(K/N)), where N is the size of the training set. Crucially, as established in Ref. [25], these training states may be product states. Below we use these quantum generalization bounds to quantify the performance of dynamical simulation via QML.

REFF algorithm
For the remainder of this letter, we focus on a time-dependent QNN of the form

V_t(α) = W(θ) D_t(γ) W†(θ),   (2)

where α = (θ, γ), W(θ) is a time-independent unitary, and D_t(γ) is a time-dependent unitary that is diagonal in the standard basis. Since D is diagonal, M applications of D_t are equivalent to a single application of D_{Mt}, i.e., (D_t(γ))^M = D_{Mt}(γ) for any positive integer M. We will train V_t(α) at time t = ∆t to obtain the trained QNN, V_∆t(α_opt). If this well-approximates the target unitary, i.e., V_∆t(α_opt) ≈ U_∆t, then we have learned an approximate diagonalization of U_∆t, and the long-time simulation e^{-iHT} can be approximated using the fixed-depth circuit V_T(α_opt) [6,8]. Below we formally bound the fidelity between the simulated evolution and the exact evolution.
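The fast-forwarding property of this diagonal form can be checked in a small numerical sketch (our own toy, not the paper's code; exact diagonalization of a random single-qubit Hamiltonian stands in for the trained W and D_t): M short-time steps collapse to a single fixed-depth circuit by rescaling the diagonal phases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random single-qubit Hamiltonian and its exact short-time propagator.
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H = (A + A.conj().T) / 2
dt = 0.1
evals, W = np.linalg.eigh(H)          # stands in for the trained diagonalization
U_dt = W @ np.diag(np.exp(-1j * evals * dt)) @ W.conj().T

M = 500                                # number of fast-forwarding steps
# Fixed-depth simulation: only the diagonal phases are rescaled by M.
V_T = W @ np.diag(np.exp(-1j * evals * dt * M)) @ W.conj().T
U_T = np.linalg.matrix_power(U_dt, M)  # depth-M repeated-step alternative
assert np.allclose(V_T, U_T)
```

The point of the sketch is that `V_T` has the same circuit depth regardless of `M`, whereas repeated application of `U_dt` would require depth growing linearly with `M`.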
In what follows we consider the (most difficult) task of learning the dynamics for all input states and all observables, i.e., where S_ρ is the whole Hilbert space and S_O is the set of all positive operator-valued measure (POVM) elements. (We remark, however, that our analysis can be extended to dynamics within subspaces, such as subspaces that preserve certain symmetries.) For this task, as discussed above, a natural choice of training data would be N pairs of input-output states D(N) = {(|Ψ^(j)⟩, U_∆t|Ψ^(j)⟩)}_{j=1}^N. A simple choice for the input training states |Ψ^(j)⟩ would be Haar-random n-qubit states. However, such random states will typically be highly entangled, requiring deep circuits to prepare, and thus are unsuitable for NISQ hardware. A more promising approach is to use tensor products of Haar-random single-qubit states, which are preparable using only single-qubit gates and so induce less noise. Thus we suppose that U_∆t is learned using a set of product input states, i.e., that the training data has the form

D_P(N) = {(|Ψ_P^(j)⟩, U_∆t|Ψ_P^(j)⟩)}_{j=1}^N with |Ψ_P^(j)⟩ = ⊗_{i=1}^n |ψ_i^(j)⟩,   (3)

where the states {|ψ_i^(j)⟩} are drawn independently from the single-qubit Haar distribution [33].
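As a minimal sketch (our own, with `haar_qubit` and `product_input` as hypothetical helper names), such product-state training data can be generated classically for small systems; a Haar-random single-qubit state is obtained by normalizing a complex Gaussian vector, and a random unitary stands in for the Trotter step U_∆t:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_qubit(rng):
    """Haar-random single-qubit pure state from a normalized complex Gaussian."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

def product_input(n, rng):
    """Tensor product of n independent Haar-random single-qubit states."""
    psi = np.array([1.0 + 0j])
    for _ in range(n):
        psi = np.kron(psi, haar_qubit(rng))
    return psi

n, N = 3, 5
# Random unitary as a stand-in for the short-time gate sequence U_dt.
G = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
U_dt = np.linalg.qr(G)[0]

# Training set D_P(N): product-state inputs and their short-time-evolved outputs.
data = [(psi, U_dt @ psi) for psi in (product_input(n, rng) for _ in range(N))]
assert len(data) == N
assert all(abs(np.vdot(phi, phi) - 1) < 1e-10 for _, phi in data)
```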
There is freedom in how exactly this training data is used to learn U_∆t, but a natural approach is to minimize the distance between the target output state |Φ_P^(j)⟩ = U_∆t|Ψ_P^(j)⟩ and the hypothesized output state V_∆t(α)|Ψ_P^(j)⟩, averaged over the N states in the training set. That is, we minimize the cost

C_{D_P(N)}(α) = (1/N) Σ_{j=1}^N ( (1/2) || |Φ_P^(j)⟩⟨Φ_P^(j)| − V|Ψ_P^(j)⟩⟨Ψ_P^(j)|V† ||_1 )²,   (4)

where for compactness we write V ≡ V_∆t(α) and ||·||_1 denotes the trace norm. We can rewrite this cost in terms of the fidelity as

C^G_{D_P(N)}(α) = 1 − (1/N) Σ_{j=1}^N |⟨Φ_P^(j)| V |Ψ_P^(j)⟩|²,   (5)

which can be measured on a quantum computer using the Loschmidt echo circuit [27] shown in Appendix C. While natural and intuitive, Eq. (5) is a global cost [34], since it is measured via the global observables 1 − |Ψ_P^(j)⟩⟨Ψ_P^(j)| acting on all n qubits of the states V†|Φ_P^(j)⟩⟨Φ_P^(j)|V for j = 1, ..., N. Hence, it encounters exponentially vanishing gradients, known as barren plateaus [34-45]. To mitigate such trainability issues, we advocate instead training with a local version of the cost,

C^L_{D_P(N)}(α) = 1 − (1/N) Σ_{j=1}^N (1/n) Σ_{i=1}^n Tr[ ( |ψ_i^(j)⟩⟨ψ_i^(j)| ⊗ 1_ī ) V†|Φ_P^(j)⟩⟨Φ_P^(j)|V ],   (6)

where ī denotes the set of all qubits except for i. This cost is faithful [46], i.e., vanishing if and only if U_∆t = V, but crucially is also trainable as long as the ansatz is not too deep [34].
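For small systems the two costs can be evaluated classically, which makes their faithfulness easy to sanity-check. The sketch below (our own toy, with a random unitary standing in for U_∆t and V set to the optimum) evaluates state-vector versions of the global cost, Eq. (5), and the local cost, Eq. (6); both should vanish when V equals the target:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, d = 2, 4, 4   # d = 2**n

def haar_qubit(rng):
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

# Product-state inputs; keep the single-qubit factors for the local cost.
inputs = []
for _ in range(N):
    qubits = [haar_qubit(rng) for _ in range(n)]
    psi = np.array([1.0 + 0j])
    for q in qubits:
        psi = np.kron(psi, q)
    inputs.append((psi, qubits))

U = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))[0]
V = U  # at the optimum V = U_dt, so both costs should vanish

def global_cost(V):
    return 1 - np.mean([abs(np.vdot(U @ psi, V @ psi))**2 for psi, _ in inputs])

def local_cost(V):
    total = 0.0
    for psi, qubits in inputs:
        out = V.conj().T @ (U @ psi)           # V^dag |Phi^(j)>
        rho = np.outer(out, out.conj())
        for i in range(n):
            # projector |psi_i><psi_i| on qubit i, identity elsewhere
            P = np.array([[1.0 + 0j]])
            for k in range(n):
                blk = np.outer(qubits[i], qubits[i].conj()) if k == i else np.eye(2)
                P = np.kron(P, blk)
            total += np.real(np.trace(P @ rho)) / n
    return 1 - total / N

assert global_cost(V) < 1e-10 and local_cost(V) < 1e-10
```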
We call this algorithm, which uses the local product-state cost, Eq. (6), to learn a diagonalizing QNN of the short-time evolution unitary of a system and thereby fast-forward its evolution, the Resource-Efficient Fast Forwarding (REFF) algorithm. The algorithm is efficient both in terms of qubit usage (requiring only n qubits to simulate an n-qubit system) and, as shown below, in terms of quantum data usage.
Simulation Error Bounds An operationally meaningful measure of the quality of the simulation via V_T(α_opt) is given by the average simulation fidelity [47]

F(α_opt, T) = ∫ dψ |⟨ψ| V_T†(α_opt) e^{-iHT} |ψ⟩|²,   (7)

where the integral is over states |ψ⟩ chosen according to the n-qubit Haar measure. In this section, we lower bound the final simulation fidelity F(α_opt, T) for the REFF algorithm, allowing for an arbitrary optimization procedure. Our bound depends on the time simulated, T; the amount of training data, N; the learning error over the training data (i.e., the minimum cost achieved, C^G_{D_P(N)}(α_opt)); and the error incurred from approximating the short-time evolution e^{-iH∆t} with the gate sequence U_∆t, that is, ε = ||U_∆t − e^{-iH∆t}||_2.

Theorem 1 (Simulation error for product-state training - Informal). Consider a QNN V_t(α) given by Eq. (2) and composed of K parameterized local gates. When trained with the global cost C^G_{D_P(N)} using training data D_P(N), the simulation fidelity after time T = M∆t, for a positive integer M, satisfies

F(α_opt, T) ≥ 1 − M² ( √(C^G_{D_P(N)}(α_opt)) + √(f(K, N)) + ε )²   (8)

with high probability over the choice of random product-state data. Here f(K, N) := Õ( n √(K/N) ) captures the generalization error, with Õ hiding logarithmic factors.
Alternatively, if the local cost C^L_{D_P(N)} is used for training, Eq. (8) holds with f(K, N) enlarged by a factor polynomial in n. Theorem 1 implies that the fast-forwarded simulation fidelity deviates from 1 at worst quadratically in the number of fast-forwarding steps, M. Moreover, inverting Eq. (8) provides a means of bounding the number of product-state training pairs and the minimal cost function value sufficient to guarantee a given desired simulation fidelity and total simulation time.
In particular, Theorem 1 implies that a high fidelity may be achieved whenever the number of training pairs N is effectively of the order of M⁴Kn², i.e., scales polynomially in the product of the number of fast-forwarding steps, the number of parameters used for the diagonalization, and the system size. Thus, the success of this QML-inspired approach to simulation depends critically on the number of parameters required to approximately diagonalize the short-time evolution of a system.
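Inverting the bound in this way can be sketched as a back-of-envelope calculation (our rearrangement of the scaling stated above; constants and logarithmic factors are suppressed, so the number is indicative only):

```python
# Indicative reading of the N ~ M^4 * K * n^2 scaling: to target infidelity
# eps after M fast-forwarding steps with K gates on n qubits, a training set
# of roughly M^4 * K * n^2 / eps^2 pairs suffices (constants/logs dropped).
def training_pairs_needed(M, K, n, eps):
    return round(M**4 * K * n**2 / eps**2)

N = training_pairs_needed(M=10, K=100, n=4, eps=0.1)
assert N == 1_600_000_000
```

The quartic dependence on M is the price of fast-forwarding: the allowed per-step error must shrink as 1/M, and it enters the bound quadratically.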
For example, Ref. [48] analytically constructs the circuits required to exactly diagonalize the Ising chain. These circuits, which can also be used to simulate the XY model and Kitaev's honeycomb lattice, require n gates for the diagonal matrix D and O(n²) gates for W. Thus, assuming the correct discrete structure is known (or can be approximately found variationally), such models are captured by ansätze with only a polynomial number of parameters. This provides hope that other systems may similarly be diagonalized with favorable parameter-count scaling.
Implementations In practice, even less training data than suggested by our theoretical bounds may suffice. Here we numerically investigate the minimal training data required for two oft-studied Hamiltonians. We train on random product input states, and to simplify notation we write C^G_REFF ≡ C^G_{D_P(N)} and C^L_REFF ≡ C^L_{D_P(N)}. To evaluate the quality of the learned diagonalization and the resulting simulation, we use the average fidelity F_M between the learned diagonalization and a second-order Trotterized unitary U_{M∆t}. (This is similar to Eq. (7), but with U_{M∆t} replacing e^{-iHT}.)

Let us first consider the 4-qubit Heisenberg Hamiltonian H = Σ_{i=1}^4 S_i · S_{i+1} with periodic boundary conditions. For sets of 3, 4, or 5 Haar-random product input states, we minimized the C^G_REFF cost and simultaneously tracked the average gate fidelity between the ansatz [49] and the Trotter unitary (with ∆t = 0.1), as shown in Fig. 2a). The optimization was repeated for 10 different sets of training data, from which we computed the geometric mean and standard deviation (arithmetic mean and standard deviation in logspace). We found that 5 input states were sufficient to perform a full Hilbert-space diagonalization, with all 10 runs converging.

Next, we applied REFF to increasing sizes of the XY model, as shown in Fig. 2b), training with the local cost, Eq. (6), to help mitigate the exponential suppression of gradients due to barren plateaus as the system size grows. For all system sizes tested, as shown by the overlap of the lines and markers, only a single training state was required to diagonalize over the entire Hilbert space. The inset of Fig. 2b) indicates that the final simulation error, as measured by 1 − F_M, scales sub-quadratically with time, as predicted. In addition, we observe efficient (i.e., polynomial) scaling with n of the gate count required for diagonalization. For further elaboration on this point, as well as additional implementation details, see Appendix C of the Supplementary Material. We note that our simulation fidelities are computed with respect to the Trotterized unitary, to quantify the quality of the learning of the target unitary. The Trotterization has an associated Trotter error that causes a deviation from the true dynamics of the Hamiltonian; however, increasing the order of the Trotter approximation can make this error arbitrarily small. This effect on the simulation fidelity is explored numerically in Appendix C.
To demonstrate the suitability of REFF for near-term hardware, we implemented REFF to diagonalize and fast-forward a 2-qubit spin chain described by the XY Hamiltonian. The clear alignment between C^G_REFF and 1 − F_1 in Fig. 3a) shows that, whilst using only a single unentangled training state, a full approximate diagonalization has been successfully learned. This then enables the high-fidelity fast-forwarding shown in Fig. 3b). Namely, for random input states we achieved, on average, a fidelity of 0.8 for 94 time steps [50]. This is a factor of 23.8 improvement over the standard Trotter method, which has a fidelity of less than 0.8 after only 4 time steps.
Discussion In this work, we introduced a framework for leveraging the power of QML for dynamical quantum simulations. The core idea is that quantum training data may be used to train a time-dependent QNN, which can then predict the evolution of the properties of the target system at long times using a short-depth circuit. By way of example, we introduced the REFF algorithm, which uses training data composed of product-state inputs and corresponding time-evolved outputs to learn an approximate diagonalization of the short-time evolution of the system. We showed that generalization bounds provide a tool to rigorously ground this QML-driven approach to quantum simulation. Specifically, for REFF we proved that a high-fidelity simulation may be achieved with a number of training pairs N that scales polynomially in the product of the number of fast-forwarding steps, the number of parameters of the QNN, and the system size.
While our error analysis and implementations focus on REFF, the framework is much more general. Important future steps include investigating alternative ansätze for time-dependent QNNs and alternative forms of training data. The most appropriate choice of training data is dictated partly by the states and observables one wants to simulate and partly by what is available. For example, to simulate only within a particular subspace, one may use only training data from that subspace. How to do so most efficiently remains an open question.
Note that we can also define the loss function, and thus the training and expected costs, using different loss observables O^loss_{|Ψ^(j)⟩,|Φ^(j)⟩} in Eqs. (A10), (A11), and (A13). As Theorems A.1 and A.2 allow for such general loss observables, a change of this kind does not alter the statements of Corollaries A.1 and A.2. We will make use of this observation in Section B 2 to define a suitable local cost.
Here, we denote by F(U, V_∆t(α)) the average fidelity over Haar-random inputs.
Lemma B.1 effectively allows us to freely choose in our analysis whether we work with C^G_Haar_n(α), F(U, V_∆t(α)), or C_HST(U, V_∆t(α)). With this freedom, we can now establish a guarantee on the overall simulation fidelity for an evolution under a Hamiltonian H for time T = M∆t.

Theorem B.1 (Informal). Suppose the QNN is trained on data D(N) = {(|Ψ^(j)⟩, U|Ψ^(j)⟩)}_{j=1}^N, with the |Ψ^(j)⟩ independent Haar-random n-qubit states and U = U_∆t the unknown approximate short-time evolution unitary. Then, with high probability over the choice of training data, the total simulation fidelity satisfies

F(α_opt, M∆t) ≥ 1 − M² ( √(C_{D(N)}(α_opt)) + √(f(K, N)) + ε )²,

where α_opt denotes the final parameter setting after training and f(K, N) ∈ O(√(K log(K)/N)) is the generalization error term.
Proof. By Theorem A.1, with probability ≥ 1 − δ over the choice of training data D(N) = {(|Ψ^(j)⟩, U|Ψ^(j)⟩)}_{j=1}^N, with the |Ψ^(j)⟩ independent Haar-random n-qubit states, the final parameter setting α_opt satisfies

C^G_Haar_n(α_opt) ≤ C_{D(N)}(α_opt) + f(K, N).   (B14)

Once we recall from Lemma B.1 that C^G_Haar_n(α_opt) = (d/(d+1)) C_HST(U, V_∆t(α_opt)), we see that the above gives a bound on the HST cost of the learned short-time evolution in terms of the training cost on N Haar-random n-qubit states, which holds with high probability. Next, we make use of the behavior of the HST cost under iterated applications of unitaries, which it inherits from the Schatten 2-norm. Namely, according to Eq. (S20) of Ref. [6], we have

||U^M − (V_∆t(α_opt))^M||_2 ≤ M ||U − V_∆t(α_opt)||_2.

Rearranging this inequality, and applying both the triangle inequality and the relation between the Schatten 2-norm distance and the HST cost, we obtain a bound on the total simulation fidelity in terms of C_HST(U, V_∆t(α_opt)) and the short-time approximation error ε = ||U_∆t − e^{-iH∆t}||_2, as defined above. Now, we can plug in the bound from Eq. (B14) and obtain the claimed bound.
Theorem B.1 tells us: when using a QNN to learn the approximate short-time evolution by training on a data set with Haar-random input states, a training-data size effectively scaling as N ∼ M⁴ · K log(K) will, with high probability, lead to good generalization also at the level of the long-time evolution.
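The key step behind this M-scaling, the telescoping inequality for the Schatten 2-norm of iterated unitaries, is easy to verify numerically (our own check, with a nearby unitary constructed via a small Hermitian perturbation):

```python
import numpy as np

# Check of ||U^M - V^M||_2 <= M * ||U - V||_2 for unitaries: this inequality
# is why the simulation infidelity degrades at worst quadratically in M.
rng = np.random.default_rng(3)
d = 8
U = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))[0]

# Build V = U * exp(-i * 1e-3 * Hp) for a random Hermitian Hp, so V is a
# unitary close to U.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Hp = (A + A.conj().T) / 2
lam, Wp = np.linalg.eigh(Hp)
V = U @ (Wp @ np.diag(np.exp(-1j * 1e-3 * lam)) @ Wp.conj().T)

base = np.linalg.norm(U - V)           # Frobenius (Schatten 2-) norm
for M in (1, 5, 20, 50):
    lhs = np.linalg.norm(np.linalg.matrix_power(U, M)
                         - np.linalg.matrix_power(V, M))
    assert lhs <= M * base + 1e-9
```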
Remark B.1. From a more practical perspective, we can interpret Theorem B.1 as justifying a purely training-cost-based termination condition for training, provided a large enough training data set is used. We can see this as follows: Suppose we want to achieve a simulation fidelity ≥ 1 − ε for up to M_0 fast-forwarding steps, with some fixed success probability, say ≥ 0.95. Then, it suffices to choose training data of size N ∼ M_0⁴ · K log(K)/(ε/2)² and to ensure that the training cost and the short-time approximation error are both sufficiently small compared to ε/M_0². In other words, assuming a suitable training-data size is chosen, we can terminate training as soon as the training cost falls below this threshold. (Naturally, this presupposes a small enough short-time approximation error.)

Remark B.2. Throughout our proofs in this subsection, the only property of the n-qubit Haar measure Haar_n that entered our reasoning was the connection of the corresponding expected cost to the average fidelity and the HST cost, see Lemma B.1. We can therefore replace the data-generating measure Haar_n by an n-qubit 2-design. This leads to the same expected cost, still satisfying Lemma B.1, and thus to the same simulation error guarantee.

Training with products of Haar-random single-qubit input states
The costs presented in Section B 1 are based on Haar-random n-qubit states. However, these are costly to prepare in practice, as they require deep quantum circuits. In particular, assuming access to a training data set consisting of multiple Haar-random n-qubit states and their output states under (approximate) short-time evolution might be too optimistic for many practical applications as n grows. Therefore, we instead use a different notion of cost, based on tensor products of easy-to-prepare Haar-random single-qubit states. That is, we work with training data of the form D_P(N) = {(|Ψ_P^(j)⟩, U|Ψ_P^(j)⟩)}_{j=1}^N with |Ψ_P^(j)⟩ = ⊗_{i=1}^n |ψ_i^(j)⟩, where the |ψ_i^(j)⟩ are drawn i.i.d. according to the single-qubit Haar measure Haar_1 and U = U_∆t is the unknown short-time evolution that we are trying to learn. Note that, if we have access to a black box implementing the evolution according to U, we can create such a training data set efficiently, because only local Haar-random states are required as inputs.

Our numerical results, however, have found that full generalization can be achieved with far fewer states than this bound would suggest. To take advantage of this and seek a minimally sized training data set, an algorithm variant is motivated: start with a small training data set that is grown over time until generalization is observed. As we are then working below the upper bound, the guarantees on generalization no longer apply, so an extra validation error is required to quantify the QNN's generalization. This could be implemented by creating a validation data set of Haar-random product states against which the C_REFF cost is periodically tested as the optimization progresses. If the training C_REFF is decreasing during the optimization but the validation C_REFF is observed to plateau, this indicates that the training data set is too small and its size should be increased.
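The dataset-growing variant described above can be sketched as a simple control loop (all names, thresholds, and the budget here are illustrative choices of ours, not from the paper):

```python
# Sketch of a dataset-growing REFF variant: optimize on a small training set,
# monitor a validation cost on held-out Haar-random product states, and add
# training pairs whenever the validation cost plateaus above tolerance.
def reff_with_growing_dataset(train_step, make_pair, validate, max_size=64):
    """train_step(data) optimizes C_REFF on `data` and returns True once the
    training cost has plateaued; validate(data) returns the validation C_REFF;
    make_pair() generates one new (input, output) training pair."""
    data = [make_pair()]
    while True:
        plateaued = train_step(data)       # optimize C_REFF on current data
        if validate(data) < 1e-6:          # validation cost small: generalized
            return data
        if plateaued:                      # training stuck, not generalizing
            if len(data) >= max_size:
                raise RuntimeError("no generalization within data budget")
            data.append(make_pair())       # grow the training set and retry
```

The loop terminates either when the validation cost certifies generalization or when the data budget is exhausted, mirroring the plateau-detection heuristic described in the text.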

Gradient Formula
Following the method of [6,8], the partial derivative of C_REFF(U, V(θ, γ)) with respect to θ_l can be evaluated via a parameter-shift rule. The unitary W_{l+} (W_{l−}) is generated from the original unitary W(θ) by the addition of an extra π/2 (−π/2) rotation about the given parameter's rotation axis. Since W(θ) enters V(θ, γ) = W(θ)D_t(γ)W†(θ) twice, the gradient contains one shifted-cost difference for each occurrence:

∂C_REFF/∂θ_l = (1/2)[ C_REFF(U, W_{l+}D_t(γ)W†(θ)) − C_REFF(U, W_{l−}D_t(γ)W†(θ)) ] + (1/2)[ C_REFF(U, W(θ)D_t(γ)W_{l+}†) − C_REFF(U, W(θ)D_t(γ)W_{l−}†) ].

The analogous formula holds for the partial derivative with respect to γ_l, with the shifts applied within D_t(γ).
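The π/2 parameter-shift rule quoted above can be checked on a single-qubit toy cost (our own example; for a rotation gate e^{-iθσ/2} and a cost quadratic in the state, the shift rule reproduces the exact derivative):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

def rx(theta):
    """Single-qubit rotation exp(-i * theta * X / 2)."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * X

psi0 = np.array([1.0, 0.0], dtype=complex)
target = rx(0.7) @ psi0                 # target state at the "true" angle

def cost(theta):
    """Infidelity-style cost, quadratic in the rotated state."""
    return 1 - abs(np.vdot(target, rx(theta) @ psi0))**2

theta = 0.3
# Parameter-shift gradient: half the difference of +-pi/2 shifted costs.
shift_grad = 0.5 * (cost(theta + np.pi / 2) - cost(theta - np.pi / 2))
# Central finite difference for comparison.
fd_grad = (cost(theta + 1e-6) - cost(theta - 1e-6)) / 2e-6
assert abs(shift_grad - fd_grad) < 1e-6
```

Here the shift rule is exact (the cost is a trigonometric polynomial of degree one in θ), so it agrees with the finite-difference estimate to numerical precision.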

Choice of Ansatz
XY Hamiltonian. For our diagonalizing unitary, we exploit the particle-number conservation property of the XY and Heisenberg Hamiltonians, and for W use only 2-qubit Givens rotation gates that also respect this symmetry. This ensures the state remains in the symmetry subspace, simplifying the optimization. For the numerics shown in Fig. 2b), a brickwork-style ansatz was used for W. Each layer was composed of a Givens rotation gate between odd pairs of qubits, then a Givens rotation gate between even pairs of qubits, resulting in n − 1 gates per layer. All the examples used 1.5n layers, resulting in a total gate count for W of 1.5n(n − 1) and a depth of 3n; this can be considered an improvement on the O(n²) gate count and O(n log(n)) circuit depth presented in [48]. The ansatz for D was composed of a single R_z gate on each qubit.
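The symmetry argument can be verified directly: a 2-qubit Givens rotation acts only on the span of |01⟩ and |10⟩, and therefore commutes with the particle-number operator (a small numerical check of our own):

```python
import numpy as np

def givens(theta):
    """2-qubit Givens rotation: mixes |01> and |10>, fixes |00> and |11>."""
    c, s = np.cos(theta), np.sin(theta)
    G = np.eye(4, dtype=complex)
    G[1:3, 1:3] = [[c, -s], [s, c]]   # acts on the span of |01>, |10>
    return G

G = givens(0.4)
number_op = np.diag([0.0, 1.0, 1.0, 2.0])   # particle number on 2 qubits
# Conjugating the number operator by G leaves it unchanged, i.e., [G, N] = 0,
# so any brickwork of such gates keeps states in a fixed particle-number sector.
assert np.allclose(G @ number_op @ G.conj().T, number_op)
```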
Heisenberg Hamiltonian. It is known [54] that a 2-local nearest-neighbor gate fabric of Givens rotations is not universal for the Hamming-weight-preserving subgroup of U(2^n), so the ansatz we used for the XY Hamiltonian does not diagonalize the more general Heisenberg Hamiltonian. In these numerics, for the layered W ansatz, we instead use the most general 2-qubit gate that conserves the particle number, which takes 4 parameters.
D is composed of an R_z rotation gate on each qubit and an R_zz gate between all pairs of qubits.

Theorem 1 upper-bounds the number of training states required to guarantee a high simulation fidelity for a QNN with K parameterized gates by the scaling O(K log(K)). Here we numerically investigate this bound by performing REFF on n-qubit Heisenberg Hamiltonians of increasing size and observing the minimum number of training states required for generalization. For each system size, we perform REFF with a range of training-set sizes, training until C^G_REFF = 10^{-8}, and simultaneously compute the simulation infidelity (1 − F_1) as our metric for generalization. Each instance of REFF performed on a particular training-set size was repeated 5 times to generate a mean and standard deviation. We observe, as shown in Fig. C2, that training sets of insufficient size do not produce generalization when trained on, but once the training set reaches a critical threshold, minimizing C^G_REFF simultaneously minimizes the simulation infidelity. The scaling of this threshold is plotted in Fig. C2b); as a function of the number of parameterized gates, K, the required number of training states, N, appears to scale sub-linearly, within our analytic upper bound.

Simulation fidelity against the true Hamiltonian evolution
The numerical results in Fig. 2 demonstrate that, with the correct ansatz design, the Hilbert-Schmidt Test between the Trotterized unitary and the diagonalized ansatz can be made arbitrarily small for the systems tested. As shown in the inset of Fig. 2b), this allows the action of the Trotterized unitary to be fast-forwarded for long-time simulations. The unitary we are learning is a Trotterization, with an associated Trotter error with respect to the true Hamiltonian evolution, and this error will also be learned during the REFF training. REFF can at best perfectly learn the evolution with this Trotter error included; however, increasing the order of the Trotter approximation decreases this error so as to reproduce the true Schrödinger evolution.

Quantum state tomography of hardware implementation

Fig. 3b) shows the evaluation of the fast-forwarding performance of the QNN trained on ibmq_bogota. We use quantum state tomography to reconstruct the output density matrix of both the REFF-evolved state and the Trotter-evolved state at timestep N, ρ(N), and compute the fidelity with respect to the noise-free Trotter-evolved state, |ψ(N)⟩. The method for performing this follows [8]: an n-qubit density matrix ρ can be decomposed in the Pauli product basis as ρ = η · σ^(n). Here σ^(n) is a 4^n-dimensional vector composed of the elements of the n-qubit Pauli group P_n = {σ_I, σ_X, σ_Y, σ_Z}^⊗n, and η is the corresponding vector of Pauli weights, i.e., η_k = (1/2^n) Tr(σ_k^(n) ρ).
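The Pauli-weight reconstruction described here is straightforward to verify classically (our own sketch; the orthogonality Tr(σ_j σ_k) = 2^n δ_jk makes the decomposition exact):

```python
import numpy as np
from itertools import product

paulis = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_basis(n):
    """Yield all 4^n n-qubit Pauli strings as matrices."""
    for labels in product("IXYZ", repeat=n):
        op = np.array([[1.0 + 0j]])
        for l in labels:
            op = np.kron(op, paulis[l])
        yield op

n = 2
rng = np.random.default_rng(4)
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# Pauli weights eta_k = Tr(sigma_k rho) / 2^n, then resum: rho = sum_k eta_k sigma_k.
eta = [np.trace(s @ rho) / 2**n for s in pauli_basis(n)]
rho_rec = sum(e * s for e, s in zip(eta, pauli_basis(n)))
assert np.allclose(rho, rho_rec)
```

In an experiment the weights η_k are estimated from measurement statistics rather than computed from ρ, but the resummation step is identical.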

FIG. 1. QML framework for dynamical simulation. Our general framework (left panel) consists of using quantum training data, e.g., composed of input-output state pairs and/or input-output observable pairs, to train a time-dependent QNN, V_t(α). Typically the training occurs at a short time ∆t, resulting in the trained QNN, V_∆t(α_opt). The evolution of the system at some longer time T is extrapolated via V_T(α_opt). The Resource-Efficient Fast Forwarding (REFF) algorithm (right panel) is an illustrative example of this framework. The training data consists of Haar-random product states as inputs and the corresponding time-evolved states as outputs, i.e., evolved under U_∆t ≈ exp(−iH∆t). The time-dependent QNN is a parameterized quantum circuit in diagonal form: W(θ)D_t(γ)W†(θ). Hence training the QNN amounts to approximately diagonalizing the short-time evolution U_∆t, resulting in the trained QNN W(θ_opt)D_∆t(γ_opt)W†(θ_opt). This model can simulate the evolution of arbitrary input states up to time T via W(θ_opt)D_T(γ_opt)W†(θ_opt).


FIG. 2. Numerical simulations. a) REFF is used to diagonalize the 4-qubit Heisenberg Hamiltonian with periodic boundary conditions. Only 5 Haar-random product training states are required to generalize over the whole Hilbert space, as measured by the decrease in simulation infidelity 1 − F_1. The inset shows that the average fidelity F_M ≡ F(α_opt, M∆t) in this case remains over 0.95 for 2000 time steps. b) REFF is applied to increasing sizes of the XY model. For all sizes tested, a single Haar-random product training state was sufficient to achieve a machine-precision simulation infidelity. For each system size, the ansatz is saved upon reaching C^L_REFF = 10^{-14} and used to fast-forward the evolution, as shown in the inset.

FIG. 3. Quantum hardware implementation. a) The 2-qubit XY Hamiltonian is diagonalized via REFF on ibmq_bogota using 216 measurement shots per circuit. The noisy C^G_REFF cost is measured on ibmq_bogota, whereas the noise-free C^G_REFF cost and 1 − F_1 are computed classically. b) After training, the fast-forwarded performance is compared to the iterated Trotter method for 5 Haar-random product states; we plot the means and standard deviations.

FIG. C1. REFF Circuit. A generalized 3-qubit representation of the circuit used to compute C_REFF is displayed. H_k is a Haar-random single-qubit unitary. For a training data set of size N, the output of this circuit is averaged over the N unique training states to compute the cost function.
FIG. C2. Scaling. a) REFF is applied to increasing sizes of the n-qubit Heisenberg Hamiltonian H = Σ_{i=1}^n S_i · S_{i+1} with periodic boundary conditions. A data set of Haar-random product states of increasing size is used for training, to determine the minimum number required for generalization. This is seen when minimizing the training loss (C^G_REFF) results in simultaneously minimizing the validation loss (1 − F_1) without premature plateauing. b) The minimum number of training states required for generalization in the above numerics is plotted against the number of parameterized gates in the VFF ansatz, showing an apparent sub-linear scaling.

FIG. C3. Trotter Error. The REFF algorithm is applied to the 10-qubit Hamiltonian H = Σ_{i=1}^9 X_iX_{i+1} + Y_iY_{i+1} with open boundary conditions, and the resulting diagonalized unitary performs a fast-forwarding of the Hamiltonian. The Trotterization has a timestep of ∆t = 0.1. The fidelity of the fast-forwarding is computed with respect to both the Trotterized unitary and the true Hamiltonian evolution.

Fig. C3 shows the fast-forwarding of a diagonalized ansatz after REFF training against the 10-qubit Hamiltonian H = Σ_{i=1}^9 X_iX_{i+1} + Y_iY_{i+1} with open boundary conditions. To compare the simulation fidelity against the Trotter unitary U(∆t) for non-integer multiples of ∆t, for t = M∆t + c (with c < ∆t) the unitary applied is U_Trotter(t) = U(∆t)^M U(c). Here the Trotterization is a 2nd-order Suzuki-Trotter approximation with Trotter number 10, and hence has a very low Trotter error. The VFF ansatz had in total 360 CNOT gates, compared to 380 in the Trotter unitary.