Quantum algorithm for time-dependent Hamiltonian simulation by permutation expansion

We present a quantum algorithm for the dynamical simulation of time-dependent Hamiltonians. Our method involves expanding the interaction-picture Hamiltonian as a sum of generalized permutations, which leads to an integral-free Dyson series of the time-evolution operator. Under this representation, we perform a quantum simulation for the time-evolution operator by means of the linear combination of unitaries technique. We optimize the time steps of the evolution based on the Hamiltonian's dynamical characteristics, leading to a gate count that scales with an $L^1$-norm-like scaling with respect only to the norm of the interaction Hamiltonian, rather than that of the total Hamiltonian. We demonstrate that the cost of the algorithm is independent of the Hamiltonian's frequencies, implying its advantage for systems with highly oscillating components, and for time-decaying systems the cost does not scale with the total evolution time asymptotically. In addition, our algorithm retains the near optimal $\log(1/\epsilon)/\log\log(1/\epsilon)$ scaling with simulation error $\epsilon$.


I. INTRODUCTION
The problem of simulating quantum systems, whether it is to study their dynamics, or to infer their salient equilibrium properties, was the original motivation for quantum computers [1] and remains one of their major potential applications [2,3]. Classical algorithms for this problem are known to be grossly inefficient. Nonetheless, a significant fraction of the world's computing power today is spent on solving instances of this problem -a reflection on their importance [4][5][6].
An important class of quantum simulations that is known to be particularly challenging, and is the focus of this work, is that of time-dependent quantum processes, which are at the heart of many important quantum phenomena. These include for example quantum control schemes [7], transition states of chemical reactions [8] analog quantum computers such as quantum annealers [9] and the quantum approximate optimization algorithm [10]. Devising stateof-the-art resource efficient quantum algorithms to simulate these types of processes on quantum circuits is therefore a very worthy cause: it will allow for the studying of said phenomena in a controllable and vastly more illuminating manner.
In the literature, a number of quantum algorithms designed to simulate the dynamics of time-dependent quantum many-body Hamiltonians already exist. However, most of them are variants of algorithms that suit time-independent Hamiltonians but lack optimizations for dynamical ones. For example, Hamiltonians based on the Lie-Trotter-Suzuki decomposition were developed in Refs. [11,12], where the complexity scales polynomially with error. More recent advances [13,14] improve it to a logarithmic error scaling, which directly lead to applications in time-dependent Hamiltonian simulations [15,16]. A recent study by Berry et al. [17] improves the Hamiltonian scaling to L 1 −norm, by considering the dynamical properties of the time-dependent Hamiltonian. However, these mostly comprise of slicing the dynamics into a sequence of 'quasi-static' steps, each of which implementing a static quantum simulation module. In addition, all the above-mentioned algorithms assume a time-dependent oracle -a straightforward but not necessarily practical assumption that can obscure the true complexity of the simulation when physical models are considered.
The sub-optimality that characterizes existing quantum algorithms can be attributed mainly to the fact that the time-evolution operator for time-dependent Hamiltonians is a more intricate entity than its time-independent counterpart (this matter is discussed in more detail below): While in the time-independent case the Schrödinger equation can be formally integrated, the time-evolution unitary operator for time-dependent systems is given in terms a Dyson series [18] -a perturbative expansion, wherein each summand is a multi-dimensional integral over a time-ordered product of the (usually interaction-picture) Hamiltonian at different points in time. These time-ordered integrals pose multiple algorithmic and implementation challenges.
In this paper, we provide a quantum algorithm for simulating a time-dependent Hamiltonian dynamics. This algorithm invokes a separation of the Hamiltonian H(t) into a sum of a static diagonal part H 0 and a dynamical part V (t), i.e., H(t) = H 0 +V (t), and switches to the interaction-picture with respect to H 0 . The target evolution operator becomes a product of an interaction-picture unitary U I (t) followed by a diagonal unitary e −iH0t that can be simulated efficiently. The interaction Hamiltonian V (t) is expanded as a sum of generalized permutations, and the resulting Dyson series of the evolution operator U I (t) becomes an integral-free representation [19] with the notion of divided differences, which is a well-studied quantity [20][21][22][23][24][25]. The divided differences have an intuition of discretized derivatives and is closely related to polynomial interpolations [20]. We refer the reader to Appendix A for a short summary of the notion of the divided differences. Under this representation, we use the LCU method [14] to simulate U I (t) with a truncated Dyson series. We find a partitioning scheme that determines the duration of the time steps along the simulation. Following this procedure, in general, each time interval has a different duration which is determined form the Hamiltonian's dynamical characteristics and can lead to substantially fewer number of steps as compared to using identical-length simulation segments, typically used in quantum simulation algorithms. We analyze the implementation gate and qubit costs and discuss the circumstances under which our simulation algorithm provides improvements over the state-of-the-art. Specifically, our algorithm is independent of the oscillation frequencies of the Hamiltonian. This is in stark contrast to existing algorithms which have dependence on ||dH(t)/dt||, which grows with oscillation rates. Another class of Hamiltonians for which our algorithm is preferred over others is those with exponential decays. We show that for these systems, our algorithms requires asymptotically a finite number of steps which does not scale with the evolution time, leading in turn to an exponential saving comparing to the linear scaling in existing approaches. Moreover, the cost with Hamiltonian norm only mainly depends on the interaction Hamiltonian V (t) and not the total Hamiltonian H(t) [17]. This also indicates an advantage of the algorithm when the time-dependent Hamiltonian is dominant by a static part.
The paper is organized as follows. In Sec. II, we review the permutation expansion method that leads to an integralfree representation for the Dyson series, as introduced in Ref. [19]. In Sec. III, we present in detail the simulation algorithm that combines the integral-free expression of the evolution operator with the LCU method, and analyze the circuit costs. We highlight the main advantages of our algorithm in Sec. III D 3. In Sec. III E, we address the cases when the exponential-sum expansion of the time-dependence is not exact and estimate the error that stems from a finite sum approximation. Finally, we give a brief summary for our methods and results in Sec. V.

II. PERMUTATION EXPANSION METHOD FOR TIME-DEPENDENT HAMILTONIANS
In this section, we briefly describe the integral-free Dyson series expression of the evolution operator, derived from a permutation expansion of the time-dependent Hamiltonian [19]. Without loss of generality [26], we expand a general time-dependent Hamiltonian in terms of products of time-dependent diagonal matrices, D i (t), and permutation operators, P i , i.e., where P 0 ≡ 1. This decomposition can be done efficiently as long as M scales polynomially with log d, where d is the dimension of the Hamiltonian. We decompose each diagonal matrix into a finite sum of exponential functions, i.e., where Λ in some basis {|z } (the basis in which D 0 is diagonal) and K i indicates the number of terms in the exponential decomposition for D i (t). This can be done for many cases when the time dependencies are simple combinations of exponential terms. For simplicity we assume here that the K i 's are finite, and address the most general time dependence in detail in Sec. III E and refer to various algorithms [27][28][29][30][31] for efficiently finding an exponential sum approximation of a function. For a lighter notation, we set K i = K for all i. We can evaluate the time-evolution operator U (t) corresponding to where i q = {i q , · · · , i 1 } and k q = {k q , · · · , k 1 } are multi-indices. The action of U (t) on a basis vector |z is where |z ij ≡ P ij · · · P i1 |z with j ranging from 1 to q, and λ . P iq is a shorthand of P iq · · · P i1 , and similarly d Figure 1 illustrates the accumulative actions of D (k) i P i on a basis vector |z . To proceed, we use the following identity to simplify the expression in terms of divided differences. It is a variant of Hermite-Genocchi formula [20] applying to the exponential function.
where x j = q l=j λ l and e [x1,··· ,xq,0] is the divided difference of the exponential function with inputs x 1 , · · · , x q , 0. (The case with q = 1 can be shown by explicit integration, and the identity follows by induction. For more details, see Ref. [19].) With this property, the multi-dimensional integration in the time-evolution operator can be simplified as where The second equality uses the change of variable dτ = tds, and the last equality follows from Identity 1 and the identity of t q e [tx0,··· ,txq] = e t[x0,··· ,xq] . By completing the basis, we get This is an integral-free expression for the unitary time-evolution operator of the time-dependent Hamiltonian H(t).
We will later approximate the unitary by truncating the series at some order q = Q that scales as O log(1/ ) loglog(1/ ) [14], where is the required accuracy.

III. TIME-DEPENDENT HAMILTONIAN SIMULATION ALGORITHM
A time-dependent Hamiltonian H(t) can be expressed as a sum of two Hamiltonians-a time-independent H 0 and a dynamical V (t), i.e., In many practical models, H 0 represents a static and simple Hamiltonian that is often diagonal in a known basis (which we will identify with the computational basis). Hence, hereafter, we assume that H 0 is a diagonal operator with real diagonal elements. The V (t) component represents the nontrivial interactions between subsystems. Assume 1 H 0 is diagonal in the computational basis {|z }. We switch to the interaction picture, i.e., where |ψ I (t) = e iH0t |ψ(t) and H The Schrödinger-picture unitary operator U (t), satisfying |ψ(t) = U (t)|ψ(0) , is equivalent to a time-ordered matrix exponential followed by a diagonal unitary, i.e., Hence, the simulation of U (t) = e −iH0t U I (t) consists of two parts-a complicated U I (t) and a simple diagonal unitary e −iH0t . The simulation of e −iH0t can be achieved with a gate cost that scales only linearly with the locality of H 0 . When we write H 0 = L γ=0 J γ Z γ , where each Z γ is some tensor product of (single-qubit) Pauli-Z operators acting on at most d qubits, it can be shown that the gate cost scales as O(Ld) [32,33]. Therefore, the main focus of our simulation is on U I . We next provide an overview of the simulation algorithm in Sec. III A. In Sec. III B, we incorporate the LCU framework with the permutation expansion method. Sec. III C 2 provides the state preparation operation and Sec. III D evaluates the simulation cost for the whole procedure.
A. An overview of the algorithm Our proposed simulation algorithm consists of a permutation expansion procedure for U I and the LCU method for the quantum simulation. In Sec. III B, we explain in detail the essential ingredients for merging these two approaches. Before delving into technical details, we provide an overview of the algorithm in this section.
Given a time-dependent Hamiltonian H(t), we first decompose H(t) into a sum of a static diagonal term H 0 (if exists) and a dynamical term V (t). We switch to an interaction picture so that the target unitary evolution U (T ) over a period T becomes Therefore, the simulation of U (T ) is equivalent to applying U I (T ) followed by e −iH0T . Since the diagonal unitary e −iH0T can be efficiently simulated, we focus on U I (T ) hereafter. Let us expand V (t) as a sum of permutations as where P i are permutations (P 0 ≡ 1) and D i (t) are some diagonal matrices that are expressed as exponential sums, i.e., , whose respective durations ∆t w , w = 0, . . . , r − 1 are determined by the partitioning scheme given in Sec. III B and the time markers t w , are defined as t w = w−1 l=0 ∆t l . The total number of steps is denoted as r. The evolution operator from t w to t w + ∆t w is expressed as where we denote where ||·|| max the max norm, are some diagonal unitaries as derived later in Eq. (38) and each P iq is a unique product of permutations. Note that the above evolution operators are given as a linear combination of unitaries (LCU). We provide a review for the LCU method in Appendix C. We set the truncation order Q to be 2 where is the overall simulation accuracy.
To implement the LCU routine for each U I (t w + ∆t w , t w ), we require preparing a state 2 An exact truncation order that guarantees the accuracy is where |i q represents Q quantum registers that each has dimension M and |k q represents Q quantum registers that each has dimension K, and s is the normalization factor. Following the same notation in Sec. C, let us denote the state preparation unitary as B, i.e., B|0 ⊗2Q+1 = |ψ 0 (B is explicitly given in Sec. III C 2). Let us denote V c the control unitary such that The Oblivious Amplitude Amplification (OAA) involves interleaving the operator where R ≡ I − 2(|0 0| ⊗ I). For each piece of the unitary, we implement A on the extended system |0 ⊗(2Q+1) |ψ . By construction, we have This means that applying A effectively performs the unitary U I (t w + ∆t w , t w ) on the main system |ψ , with error O( /r). Combining r pieces of the procedure, it effectively simulates U I (T ) with overall error O( ), i.e., where A w are the OAA operators for the corresponding piece of evolution. This implies that applying the sequence of As followed by the circuit for e −iH0T can approach the action of U (T ) to an arbitrary accuracy.

B. Permutation expansion for UI (t)
In this section, we give a thorough introduction of the permutation expansion in the Dyson series and the conditions arisen from implementing the LCU method. We focus on addressing the interaction-picture unitary U I (t), i.e., the time-ordered operator in Eq. (14). Using the expansions introduced in Eqs. (16) and (17), we get We denote the basis in which H 0 is diagonal by {|z } and its diagonal elements by E z = z|H 0 |z . The action of U I (t) on a basis vector |z becomes where E z i j is the z ij th diagonal element of H 0 , i.e., E z i j = z ij |H 0 |z ij , and |z ij = P ij |z with P ij = P ij · · · P i1 . By Identity 1, this can be further simplified as where To implement the LCU method for a quantum simulation of U I (T ), we first decompose the overall simulation duration T into r pieces in sequence, i.e., where the operators in the product of the last equation are understood to be ordered, t w+1 = t w + ∆t w and t 0 ≡ 0 and t r ≡ T . The number of steps, r, and the step size, ∆t w , are to be determined. When acting on a computational basis state, each piece in the decomposition can be written as which has the same form as Eq. (28) except that the integration intervals are shifted (with E z i 0 ≡ E z ). We can denote which leads to To formulate the above expression in terms of a linear combination of unitaries, we need to evaluate the norms of e ∆tw[x1,x2,··· ,xq,0] and d The norm of the e ∆tw[x1,x2,··· ,xq,0] can be bounded by using the following identity.
The proof can be found in Appendix A. From Identity 2, we show in Appendix B that where we denoted the quantity With these bounds, the factors in the expansion form in Eq. (32) can be written as The evolution operator from t w to t w + ∆t w becomes where Φ (kq,w) iq,± are diagonal unitaries with diagonal elements being e i ±φ To implement the LCU method for simulating U I (t w + ∆t w , t w ), we require a preparation of the state where |i q represents Q quantum registers that each has dimension M and |k q represents Q quantum registers that each has dimension K. The normalization constant is where we define the Γ(t w ) as and we note that Γ(t w ) is an upper bound on the max-norm of the interaction Hamiltonian at time t w , Γ(t w ) ≥ V (t w ) max . The quantity Γ(t w ) is related to the energy strength in a typical LCU setup [14]. In Appendix D, we provide an alternative way that uses a larger bound Γ = M K max ∀k,i ||D (k) i || max , which leads to an exponential saving for the state preparation. We proceed with Γ(t w ) hereafter.
The OAA step in the LCU method requires s ≈ 2. This leads to and Eq. (40) becomes a truncated Taylor expansion of 2 up to order Q, i.e., 2 ≈ Q q=0 (ln2) q q! . If we require |s−2| ≤ /r, where r is the total number of steps and is some positive number, then the simulation error for each U I (t w +∆t w , t w ) is also within /r. The required truncation order with this accuracy scales as

Time partitioning and number of time steps
The condition in Eq. (42) imposes a constraint on the next step size ∆t w given the current time t w , Remembering that Γ(t w ) is a function of t w = w−1 l=0 ∆t l , this condition determines the schedule, as every ∆t w is determined by the preceding time steps.
Special care should be given when setting the last time step, as ∆t w can become too large that exceeds the total desired evolution time T . Whenever t w+1 is found to be greater than T (or if the argument inside the ln(·) is found to be negative), one should replace the bound Γ(t w ) with a larger boundΓ(t w ) = λ ln 2/(e λ∆tw − 1) and set the final step ∆t w = T − t w .
Let us now examine the dependence of ∆t w on Γ(t w ) in order to determine a bound on the number of time steps (equivalently, number of repetitions) r required for the execution of the entire time evolution. We distinguish between three cases. (i) When λ = 0, we have ∆t w Γ(t w ) = ln 2, similar to the time-independent case though we note that a vanishing maximal λ could imply time-dependent oscillations as well. This can be seen by taking the λ → 0 limit of Eq. (44). (ii) In the case where λ < 0, i.e., a system with a decaying Γ(t w ), we have ∆t w Γ(t w ) ≥ ln 2, i.e., the time steps are longer than ln 2/Γ(t w ). Furthermore, the total number of steps r is finite even for an arbitrarily large evolution time T . Note that since Γ(t w ) approaches zero asymptotically, for a large enough time t w * , we have Γ(t w * ) < |λ| ln 2, i.e., the argument inside the logarithm above becomes negative. This indicates it reaches the final step, i.e., the bound should be modified asΓ(t w * ) = λ ln 2/(e λ∆tw * − 1) and ∆t w * = T − t w * becomes the final step. (iii) In the case where λ > 0 (an amplified Γ(t w )), we have Γ(t w ) λ at large simulation times, t w . From Eq. (44), we have ∆t w → ln 2/Γ(t w ) in this limit.
We see that (for large enough simulation times) the time step ∆t w is inversely proportional to Γ(t w ) which upperbounds the max-norm of the interaction Hamiltonian at time t w . Therefore, we have r−1 w=0 Γ(t w )∆t w r ln 2, which implies r r−1 w=0 Γ(t w )∆t w / ln 2. It would be instructive to compare the above scaling with that of Ref. [17] in which the simulation algorithm is said to have an L 1 -norm scaling, i.e., an algorithm cost scaling linearly with t o dτ H max (τ ) up to logarithmic factors. Under a similar intuition, our algorithm has a discretized L 1 -norm-like scaling with r−1 w=0 Γ(t w )∆t w . However in our case, Γ(t w ) is related to the norm of the interaction Hamiltonian.

State preparation
In this subsection, we provide a procedure to prepare the state |ψ 0 given in Eq. (39). First, we initialize a state |0 ⊗Q |0 ⊗Q |0 , where each of the first Q registers has dimension M (responsible for |i q part), each of the later Q registers has dimension K (responsible for |k q part), and the last register is a qubit (for the cosine decomposition). For simplicity, we can perform a Hadamard gate on the last qubit and then omit its dependence for the following discussion. The next step is to create a state in following the form, where s q ≡ Γ(t w ) ∆t w q /q!. For each |1 from the first Q registers (the |i q part) and the corresponding |1 in the later Q registers (the |k q part), we make Then Eq. (45) becomes which is the required |ψ 0 in Eq. (39), when combined with |x . Next, we provide a process that produces the state in Eq. (45). First, we perform a rotation that takes the first register in the |i q part to and perform a control gate from the first register to the second (both in the |i q part) such that Continuing this procedure for the rest of the registers in the |i q part, the state becomes At this step, we perform CNOT operations 3 from the first Q registers (|i q part) to the last Q registers (|k q part) correspondingly, e.g., perform a CNOT from the first register in the |i q part to the first register in the |k q part, and so on and so forth. Finally, we have which gives Eq. (45) as required. The estimated gate cost for the preparation of |ψ 0 is O(QM K). More detail regarding the cost is provided in Sec. III D 1.

Implementation of the controlled unitaries
The second ingredient of the LCU routine is the construction of the controlled operation V c |i q |k q |x |ψ = |i q |k q |x (−i) q P iq Φ (kq,w) iq,x |ψ .
Taking an approach similar to that taken in Ref. [33], we first note that Eq. (52) indicates that V c can be carried out in two steps: a controlled-phase operation (V cΦ ) followed by a controlled-permutation operation (V cP ). The controlled-phase operation V cΦ requires a somewhat intricate calculation of non-trivial phases. We therefore carry out the required algebra with the help of additional ancillary registers and then 'push' the results into phases. The latter step is done by employing the unitary whose implementation cost depends only on the precision with which we specify ϕ and is independent of Hamiltonian parameters [32] (see Ref. [33] for a complete derivation). With the help of the (controlled) unitary transformation V χφ |i q |k q |x |z |0 = |i q |k q |x |z |χ Note that V χφ sends computational basis states to computational basis states. We provide an explicit construction of V χφ in Ref. [33]. We find that its gate cost is O(QM (k od + log M ) + QM K(C D + C ∆H0 + C Λ )) and qubit cost is O(Q log(M K)). Addiitonal details are provided in Sec. III D 1. The construction of V cP is carried out by a repeated execution of the simpler unitary transformation U p |i |z = |i P i |z . Recall that P i are the off-diagonal permutation operators that appear in the Hamiltonian. The gate cost of U p is therefore O(M (k od + log M )). Additional details may be found in Ref. [33].

D. Algorithm cost
We next analyze the circuit costs for the permutation expansion algorithm. Recall that the simulation of U (T ) consists of two operations-e −iH0T and U I (T ). The diagonal unitary e −iH0T can be implemented efficiently with a gate cost that scales linearly with the system size. To observe this, note that H 0 is a diagonal matrix with real diagonal elements and can be written as H 0 = L γ=0 J γ Z γ , where each Z γ is a tensor product of Pauli-Z's (Z ⊗· · ·⊗Z) acting on at most d qubits (weight-d operators). Hence, we can write e −iH0T = L γ=0 e −iJγ Zγ T . Each e −iJγ Zγ T can be simulated using at most 2d CNOT gates with a single ancillary qubit. For example, let Z γ be a weight-m (m ≤ d) operator, then e −iJγ Zγ T can be implemented as where |z 1 · · · |z m are the qubits Z γ acts on and |0 is an ancillary qubit for extracting the phase. There are total L such implementations for e −iH0T . Therefore, the total gate cost is O(Ld) and the qubit cost is O(1). Since L usually grows linearly with the system size, the gate cost also scales linearly.

The cost for the state preparation and the controlled unitaries
The cost of implementing U I (T ) resembles those in Ref. [33]. The first ingredient is the preparation of state |ψ 0 . Recall from Sec. III C 2, the operation that takes |0 ⊗Q |0 ⊗Q |0 to 1 |i |k costs O(M K) [34]. The total gate cost for the preparation of |ψ 0 (i.e., B) is O(QM K) (Lemma 8 in [35]). In Appendix D, we provide an alternative procedure that leads to a O(Q log(M K)) scaling for implementing B. The qubit cost in the state preparation is O(Q log(M K)). The next component is the implementation of the control unitary V c . As shown in [33], the gate cost of performing the control permutation P iq is O(QM (k od + log M )), where k od is the "locality," i.e., each permutation P i is a tensor product of at most k od Pauli-X operators. The implementation of the control phase Φ i ) [therefore, C ∆H0 + C Λ is the cost for obtaining the inputs x j 's as defined in Eq. (29)]. The additional cost for the reversibility of the process scales as O(Q 2 ). A detailed discussion of the costs of C ∆H0 and C Λ may be found in Ref. [33]. Combining these, we estimate the total cost for V c is O(Q 2 + QM (k od + log M ) + QM K(C D + C ∆H0 + C Λ )). (56)

Overall cost of the algorithm
The full simulation for U I (T ) is a product of segments U I (t w + ∆t w , t w ), where each segment is simulated by interleaving B and V c . The total number of segments, r, is determined by T = r−1 w=0 ∆t w , where each ∆t w is determined by partitioning scheme described in Sec. III C 1.
As discussed above, the number of LCU applications r can be upper-bounded by r r−1 w=0 Γ(t w )∆t w / ln 2 (in the long simulation time limit), which can be viewed as a discretized L 1 -norm-like scaling with the norm of the non-static component of the Hamiltonian V (t).
Combining with the cost for simulating e −iH0T and the cost for each step (56), we conclude that at worst, the total gate cost scales as and the qubit cost scales as where Q scales as O(log(r/ )/ log log(r/ )). For convenience, we provide a glossary of symbols in Table I. A summary of the gate and qubit costs of the simulation circuit and the various sub-routines used to construct it is given in Table II.

Example advantages of the algorithm
To illustrate how our simulation algorithm can provide speedups over existing algorithms, we focus in this subsection on two types of Hamiltonian systems: highly oscillating systems and decaying systems.
The cost of our algorithm is independent of the oscillation rates of the dynamics, whereas the cost of any simulation algorithm (e.g., [12,[15][16][17]) that depends on ||dH(t)/dt|| would depend on oscillation rates of the system. To illustrate this advantage, consider a two-level system with a Hamiltonian where h, Γ, α ∈ R, H 0 = hZ and V (t) = Γ e −iαt |0 1| + e iαt |1 0| . In this case, we have k od = M = K = 1 and λ = 0. The gate cost of simulating U (T ) scales as which is independent of α. This means the simulation cost remains the same even if α becomes arbitrarily large. One can realize the absence of α owing to the fact that phases are explicitly integrated out into an integral-free expansion series, where the bound of each term does not depend on the oscillations (due to Identity 2). Therefore, our simulation can be significantly more effective when the time dependence of the Hamiltonian has very high frequencies. Note that while the example above was given for a simple qubit system with pure oscillation, the frequency-independence in cost holds for any system. Another class of systems for which our algorithm can provide seedup are Hamiltonians with exponential decays, i.e., λ < 0. For concreteness, consider the Hamiltonian where h, Γ ∈ R and α > 0 and H 0 = hZ and V (t) = Γe −αt X. In this case, λ = −α and ||V (t)|| max = Γe −αt . The L 1 -norm defined in Ref. [17] is T 0 ||H(t)|| max dt, which has a linear scaling O(hT ) with the simulation duration T , whereas our discretized L 1 -norm r−1 w=0 ||V (t w )|| max ∆t w tends to a constant in the long time limit. This can be seen from the fact that the partition terminates at a large enough time t w (≤ T ), where ∆t w = T − t w becomes the final simulation step, as described in Sec. III C 1. The above results also hold for any combination of exponential decays (even when these are multiplied by oscillatory terms) with which different time decay dependencies may be constructed.

E. Hamiltonians with arbitrary time dependence
The simulation algorithm invokes a switch to the interaction picture, by dividing the Hamiltonian into a static diagonal part H 0 and a time-dependent Hermitian operator V (t). The V (t) is expanded using permutations and exponential sums as presented in Eq. (17). There, we assume that the time dependence can be expressed as exponential sums with a finite number of terms, K. Although this assumption holds for many models (e.g., when the time dependencies are some combinations of trigonometric functions and exponential decays), the exponential series generally requires an infinite sum (e.g., a Fourier series). A straightforward procedure to obtain a finite sum approximation is via a truncated Fourier series. As an example, let us consider a polynomial function of time, i.e., f (t) = p l=0 c l t l . Using the proof of Theorem 8.14 in Ref. [36], it can be shown that a truncated Fourier series of f (t) is O( ) close to f (t) when the truncation order is O(1/ ). We also note that, other than Fourier series, there have been numerous studies [27][28][29][30][31] regarding finding an exponential-sum approximation of a function. Some of them, e.g., [27], provide efficient algorithms with logarithmically scaling terms (with respect to the inverse of a required accuracy). These results suggest that efficient methods for finding the exponential-sum decompositions of the time dependences of V (t) can exist in many cases.
Suppose that V (t) is approximated by a finite series of exponential sum. The resulting error of the unitary evolution, due to the Hamiltonian approximation, scales only at most linearly with the evolution duration. This can be shown using the following property. Given two time-dependent Hamiltonians H 1 (t) and H 2 (t) such that then This holds true for any norm || · ||. Before proving this, we first note a property of the so-called Subadditivity of error in implementing unitaries [32]. It says that for unitaries U 1 , U 2 , V 3 and V 4 , we have This can be easily shown by where the basic operator norm inequalities are used. Now we prove the bound in Eq. (63). We divide T into n segments such that each segment has width T /n. We can rewrite the time evolution operators as Repeatedly using the subadditivity of error, we have Since this inequality holds for any n, we can take n → ∞ and it yields Eq. (63) as claimed. Now we apply this property to the simulation of U I (T ). Suppose that we have anδ−accurate approximation of V (t), i.e., ||Ṽ (t) − V (t)|| ≤δ for all t ∈ [0, T ], whereṼ (t) is the finite exponential-sum approximation of V (t). The accumulative error from this approximation is bounded byδT and the overall error is O(δT + δ), where δ is the error from LCU implementation. Recall thatδ is closely related to K (the number of terms). Although it is intuitive that a larger K can allow for a smallerδ, the explicit relation between the two largely depends on the model and the expansion method. Nonetheless, we can expect K to scale at least linearly with 1/δ for many cases, e.g., aforementioned truncated Fourier series for a polynomial.
The simulation cost also depends on M , the number of terms in the permutation expansion. This quantity usually scales linearly with the system size and can be easily determined. For example, a typical spin model usually involves a sum of tensor products of Pauli-X's (or Y 's) and Pauli-Z's. Each tensor product represents an interaction between qubits on certain lattice sites. Due to the common locality constraint that prevents a qubit interacting with the ones arbitrarily far apart, the number of interacting terms, M , scales at most linearly with the number of qubits. In addition, a tensor product of Pauli operators can be easily separated into a product of diagonal matrix and a permutation, e.g., X ⊗ X ⊗ Y = (I ⊗ I ⊗ −iZ)(X ⊗ X ⊗ X). We conclude that M will have modest linear scaling for most practical models.

IV. ALTERNATIVE SCHEME AND REDUCTION TO THE TIME-INDEPENDENT CASE
In this section, we provide an alternative yet equivalent scheme for the dynamical simulation, one that will allow us to establish an immediate connection to the time-independent Hamiltonian simulation formalism (specifically to the scheme presented in Ref. [33]), in which H(t) is assumed constant in time.
In previous sections, we have chosen to partition the interaction-picture unitary U I (T ) into short time segments and then follow its execution by the application of a diagonal e iH0T bringing it back to the Schrödinger picture. Here, we show that the Schrödinger picture U (T ) can be partitioned similarly.
Recalling the expansion of U I (t w + ∆t w , t w ) in Eq. (32), we have Breaking the e twx1 phase, we get: We find that where iq,z P iq |z z| .
Inspecting the full unitary evolution, we observe The evolution operator U (T ) can be simplifies as eliminating the diagonal piece. Each U I (t w + ∆t w , t w ) can be rewritten as: The factor e −∆tw(iEz iq + q l=1 λ can be absorbed into the divided difference: iq,z P iq |z z| . withỹ which simplifies toỹ By inserting additional i∆t w E z phases into the divided differences, we can rewrite Now, we can write U (T ) as alternating off-diagonal and diagonal unitaries: When H(t) becomes time-independent, λ (k l ) i l ,z i l = 0 and ∆t w = ∆t = ln 2/Γ. To synchronize the notation with [33], we identify H 0 = D 0 and U od (t w + ∆t w , t w ) = U od . The evolution operator becomes U (T ) = U od e −iD0∆t · · · U od e −iD0∆t , which coincides with [33].

V. CONCLUSIONS
We presented a quantum algorithm for simulating the evolution operator generated from a time-dependent Hamiltonian. The algorithm involves a permutation expansion for the interaction Hamiltonian, a switch to the interactionpicture, and the incorporation of the LCU technique. Combining the permutation expansion with the Dyson series has led to an integral-free representation for the interaction-picture unitary with coefficients involving the notion of divided differences with complex inputs.
We found that our expansion allowed us to adjust the time steps based on the dynamical characteristics of the Hamiltonian, providing a resource saving as compared to the equal-size partition with the largest bound. This further resulted in a gate resource that scales with an L 1 -norm-like scaling with respect only to the 'non-static' norm of the Hamiltonian.
Specifically, we demonstrated that for systems with a decaying non-static component, the resources do not scale with the total evolution time asymptotically. Furthermore, the simulation cost is independent of the frequencies, implying a significant advantage for systems with highly oscillating components.
Although the above sorting procedure is not unique, it can be shown that any choice of the permutation gives the same result, and hence the definition is well-defined.
The divided difference involves a recursive relation that connects a q + 1 input case to two q cases. For q = 1, For q = 2, and suppose x 0 , x 1 and x 2 are all distinct, In fact, it can be shown that for distinct x 0 , x 1 , · · · , x q , Remark. Since any analytic function admits a Taylor expansion representation and the divided difference is a linear functional, the divided difference of an analytic function f has a series expansion form, i.e., for x 0 , · · · , x q and y in f 's analytic domain, where p n|y (x) ≡ (x − y) n . Because p n|y [x 0 , · · · , x q ] = 0 for all n < q, the non-vanishing term of the series starts from the qth order. For simplicity, we denote the divided difference for the exponential function as e [x0,··· ,xq] , i.e., Property 1. For any non-negative integer q and x 0 , x 1 , · · · , x q ∈ C, This property and the fact that divided differences are permutation symmetric among inputs imply that any input can be factored out of the divided difference by subtracting it from every entry. Property 2. For any non-negative integer q and x 0 , x 1 , · · · , x q ∈ C, An equivalent definition of divided difference for an analytic function is via its Taylor expansion. It amounts to apply divided difference on every order of the series. Since any polynomial of order less than q is annihilated, the series starts from the order q. P roperty 2 is derived from the Taylor expansion of e x with respect to the origin. Lemma 1. For any non-negative integer q and x 0 , x 1 , · · · , x q ∈ C, Proof. This can be observed from the series expansion of the divided difference for the exponential function, i.e., from P roperty 2, a q e [ax0,ax1,··· ,axq] = a q Performing term-by-term integration over a on both side, we have where the last equality follows from the series expansion representation of e [0,x0,x1,··· ,xq] . This completes the proof.
Corrolary 1. Let f (x) = e tx , where t ∈ R and x ∈ C. We denote e t[x0,··· ,xq] ≡ f [x 0 , · · · , x q ], where x 0 , · · · , x q ∈ C. For any τ ∈ R, With these properties, we are ready to prove the bound in Identity 2 in the main context.
where the second and the third equalities use Lemma 1 and the second inequality uses (A14). This proves that the inequality holds for any number of complex inputs.
where s = m−1 j−0 β j . Suppose we have access to a control unitary V c such that for each j, Consider the following combination of the above operations W ≡ B † ⊗ I V c (B ⊗ I) . (C5) We have where |Φ 's ancillary part is orthogonal to |0 0|. Let us denote P ≡ |0 0| ⊗ I the orthogonal projection onto that subspace and R ≡ I − 2P the reflection operator with respect to P . It is shown that the sequence of operations A ≡ −W RW † RW , acting on the total system is A|0 |ψ = |0 Ũ |ψ whenŨ is unitary and s = 2. This procedure is the so-called Oblivious Amplitude Amplification (OAA). However,Ũ is in general not unitary because it is a truncated series of U . This nonunitarity can be accounted for whenŨ ≈ U and s ≈ 2. More specifically, it is shown that if ||U −Ũ || = O(δ) and |s − 2| = O(δ), then This means whenŨ is δ-close to U and s is δ-close to 2, the effect of the operator A on the whole system is δ-close to only U acting on |ψ . Note that the condition ||U −Ũ || = O(δ) can be satisfied when the truncation order m is high enough. However, the condition |s − 2| = O(δ) is satisfied only when β j are specifically chosen. By construction, we require s = m−1 j=0 β j . If we choose β j = (ln2) j /j!, then becomes a truncated Taylor expansion of 2, i.e., 2 = e ln2 . In fact, it can be shown that the required truncation order m such that |s − 2| = O(δ) scales like log(1/δ)/ log(log(1/δ)). With this m, it also guarantees that ||U −Ũ || = O(δ), because In summary, performing A on an extended system |0 |ψ , with β j = (ln2) j /j! and m = O(log(1/δ)/ log log(1/δ)), effectively performs U on |ψ with O(δ) accuracy.

Appendix D: An alternative approach for the LCU setup
We provide an alternative procedure for the LCU routine that leads to an exponential saving for the state preparation. Let us define Γ ≡ max The evolution operator from t w to t w + ∆t w becomes U I (t w + ∆t w , t w ) = z U I (t w + ∆t w , t w )|z z|