Linear combination of Hamiltonian simulation for nonunitary dynamics with optimal state preparation cost

We propose a simple method for simulating a general class of non-unitary dynamics as a linear combination of Hamiltonian simulation (LCHS) problems. LCHS does not rely on converting the problem into a dilated linear system problem, or on the spectral mapping theorem. The latter is the mathematical foundation of many quantum algorithms for solving a wide variety of tasks involving non-unitary processes, such as the quantum singular value transformation (QSVT). The LCHS method can achieve optimal cost in terms of state preparation. We also demonstrate an application for open quantum dynamics simulation using the complex absorbing potential method with near-optimal dependence on all parameters.

Introduction-Fault-tolerant quantum computers are expected to excel in simulating unitary dynamics, such as the dynamics of a quantum state under a Hamiltonian.Most applications in scientific and engineering computations involve non-unitary dynamics and processes.Therefore, efficient quantum algorithms are the key for unlocking the full potential of quantum computers to achieve comparable speedup in these general tasks.Quantum phase estimation (QPE) [1,2] is the first algorithm to bridge this gap.QPE stores the eigenvalues of the Hamiltonian in an ancilla quantum register, and can be used to solve a wide range of problems, including amplitude estimation [3], linear systems [4], and differential equations [5,6].There have been two significant improvements over QPE based methods.The first is linear combinations of unitaries (LCU) [7], which provides an exponential improvement in precision for tasks such as the Hamiltonian simulations [8,9] and linear systems [10].Here the exponential improvement refers to the cost of preparing a quantum state in a register, which also directly translates into a polynomial improvement in precision when the final measurement cost is taken into account.The second is quantum signal processing (QSP) [11], and its generalization including quantum singular value transformation (QSVT) [12], and quantum eigenvalue transformation of unitary matrices (QETU) [13].Compared to LCU, QSP based approaches can achieve a similar level of accuracy but with a much more compact quantum circuit and a minimal number of ancilla qubits.
The unifying mathematical argument that underlies many of these approaches is the spectral mapping theorem for Hermitian matrices: Let A be a Hermitian matrix and f (A) be a real-valued matrix function defined on * linlin@math.berkeley.eduthe eigenvalues of A. Then, the eigenvalues of the matrix f (A) are equal to the values of the classical function f applied to the eigenvalues of A, i.e., if λ 1 , . . ., λ N are the eigenvalues of A, then the the eigenvalues of f (A) are f (λ 1 ), . . ., f (λ N ).Furthermore, f (A) is diagonalized by the same unitary matrix that diagonalizes A. For instance, the quantum linear system problem corresponds to f (A) = A −1 , Hamiltonian simulation corresponds to f (A) = cos(At), sin(At) (for the real and imaginary parts of e −iAt ), and Gibbs state preparation corresponds to f (A) = e −A .
The limitation of this matrix-function-based perspective can be readily observed when solving a general differential equation ∂ t u(t) = −A(t)u(t) + b(t), u(0) = u 0 . (1) When b(t) = 0 and A(t) = A ∈ C N ×N is a general timeindependent matrix, the system has a closed form solution u(t) = e −At u 0 .Even in this case, the eigenvalues of A may not be real, the eigenvectors of A may not form a unitary matrix, or A may not be diagonalizable at all.These difficulties prevent us from applying techniques such as QSVT to implement f (A) = e −At on a quantum computer, and the situation becomes much more complicated when A, b are not some fixed matrices and vectors, but are time-dependent.To solve Eq. (1), most existing quantum algorithms convert the problem into a quantum linear system problem (QLSP) with a fixed and dilated matrix (i.e., a matrix of enlarged size).The resulting QLSP can then be solved using many of the aforementioned techniques based on the spectral mapping argument.However, both the construction of the linear system problem and the solution of the QLSP with near-optimal complexity (in order to achieve desired dependence on parameters such as the precision and the simulation time) can be very involved.We shall compare our new proposals with the QLSP approach later in the paper.
In this work, we propose a significantly simplified solution to the non-unitary process in Eq. (1).Our procedure is not based on the spectral mapping theorem, but on a surprising identity expressing the solution as a linear combination of Hamiltonian simulation (LCHS) problems.LCHS can be viewed as a special case of LCU.Namely, each Hamiltonian simulation problem is described by a unitary operator.Unlike LCU for Hamiltonian simulation or solving linear systems, these unitary operators do not commute with each other.LCHS is a very flexible procedure.The linear combination can be implemented in a hybrid quantum-classical fashion to compute observables related to the solution, using a small amount of quantum resources and is thus suitable for the setting of early fault-tolerant quantum computers.The linear combination can also be coherently implemented to prepare the solution directly in a quantum register and to reduce the complexity.In this case, we show that the cost of LCHS is optimal in terms of state preparation, which is useful when the initial state u 0 is difficult to prepare.
When the anti-Hermitian part of the matrix A(t) is fast-forwardable, we may incorporate the interaction picture Hamiltonian simulation in the LCHS to obtain further improvements.A practical example with such a feature is the open quantum system dynamics.Many problems in quantum dynamics, such as molecular scattering [14], photodissociation [15], and nanotransport [16], are defined in an infinite space.Unlike solving the ground state of molecules in quantum chemistry, replacing the infinite space by a finite-sized box in these quantum dynamics problems may lead to significant errors at least along certain extended dimensions.Therefore boundary conditions need to be carefully designed and implemented to balance the accuracy of the simulation and the computational cost.One widely used method in quantum chemistry is the complex absorbing potential (also called the imaginary potential) method [15,17,18].In this case, we show that our LCHS algorithm can achieve near-optimal dependence on all parameters in both state preparation and matrix input models.
Linear combination of Hamiltonian simulation-Let us consider the homogeneous problem first with b(t) = 0.
Theorem 1 (Linear combination of Hamiltonian simulation).Let A(t) ∈ C N ×N , t ∈ I = [0, T ] be decomposed into a Hermitian and an anti-Hermitian part, . Assume L(t) ⪰ 0 for all t ∈ I. Then (T is the time ordering operator) T e −i t 0 (H(s)+kL(s)) ds dk. ( Theorem 1 can be viewed as a generalization of the Fourier representation of the exponential function f Note that f (k) ≥ 0 and f (k) dk = 1.Therefore f (k) is the density of a probability distribution, called the Cauchy-Lorentz distribution.If H(t) = 0 and L(t) = L is time-independent, Theorem 1 can be readily proved from Eq. ( 3) and the spectral mapping theorem.This special-case formula has been applied to simulating imaginary time evolution dynamics [19,20].However, our Theorem 1 works in a more general setting where the matrix can be time-dependent and non-Hermitian.Our general proof hinges on a special instance of the matrix version of the Cauchy integral theorem, which is a key for avoiding the spectral mapping argument (see the Supplemental Materials).
The condition that the Hermitian part L(t) is positive semidefinite can always be satisfied without loss of generality.Indeed, by redefining u(t) = e ct v(t), the equation for v(t) is (4) By choosing −c to be the minimum of the smallest eigenvalues of L(t) on t ∈ I, L(t)+cI is a positive semidefinite matrix.
When b(t) = 0, the solution to Eq. (1) becomes u(t) = T e − t 0 A(s) ds u 0 .By discretizing the integral with respect to k using a grid k j with quadrature weights ω j , Eq. ( 2) becomes where c j = ωj π(1+k 2 j ) , and U j (t) = T e −i t 0 (H(s)+kj L(s)) ds is the propagator for a time-dependent Hamiltonian simulation problem.Therefore Eq. ( 2) compactly expresses the solution as a problem of LCHS, which can be coherently implemented using LCU (see the Supplemental Materials).
If we are only interested in obtaining observables of the form u(t) * Ou(t), we may implement the linear combination in a hybrid quantum-classical fashion.Notice that, since U j 's are unitary, the observable can be expressed as We can then use the quantum computer to evaluate a series of correlation functions ⟨u 0 |U † k (t)OU k ′ (t)|u 0 ⟩ via the Hadamard test for non-unitary matrices [21] and amplitude estimation [3], and perform the summation on a classical computer (see the Supplemental Materials).
In the presence of the source term b(t), we can use the Duhamel's principle (a.k.a.variation of constants) to express the solution as T e −i t 0 (H(s)+kL(s)) ds u 0 dk We may again use LCU to coherently prepare the state u(t), or use an expression similar to that of Eq. ( 6) for hybrid computation of observables.
Implementation-The LCHS can be implemented in a gate-efficient way by combining LCU with any Hamiltonian simulation algorithms.Here we discuss the simplest implementation of LCHS based on the product formula, and we will discuss the one based on the truncated Dyson series method later in the paper.
We first truncate the integral in Eq. ( 2) on a finite interval [−K, K] and discretize it by a trapezoidal rule with (M + 1) grid points.We obtain T e − T 0 A(s) M are the weights and nodes of the trapezoidal rule.To implement each U j (T ) = T e −i T 0 (H(s)+kj L(s)) ds , we use a p-th order product formula with a fixed number of steps r for all j.Then, Here Ξ p is the number of the exponentials in the product formula, α l 's and β l 's are the corresponding coefficients, and γ l 's and δ l 's determine the discrete times at which the time-dependent Hamiltonians are evaluated (see [22] for an example of the product formula via Suzuki recursion).Suppose that we are given the state preparation oracle O prep : |0⟩ → |u 0 ⟩, the Hamiltonian simulation oracles O L (s, τ ) = e −iL(τ )s for |s| ≤ 1/∥L∥ and O H (s, τ ) = e −iH(τ )s for |s| ≤ 1/∥H∥, and the LCU coefficient oracle √ c j |j⟩.According to the binary representation of j's, we may first construct a coherent encoding of the time evolution as the select oracle SEL L (s, τ ) = M j=0 |j⟩ ⟨j| ⊗ e −iL(τ )kj s using O(log(M )) queries to O L (s, τ ) (see Supplemental Materials and, e.g., [10,23,24] for details).Then Eq. ( 8) can be implemented following the standard LCU approach by first applying O coef ⊗ O prep , then sequentially applying SEL L (α l T /r, (l ′ +γ l )T /r) and O H (β l T /r, (l ′ +δ l )T /r) for l ∈ [Ξ p ] and l ′ ∈ [r], and finally applying O † coef on the ancilla register.Such a procedure yields the quantum state approximating 1 ∥c∥1∥u0∥ |0⟩ a T e − T 0 A(s) ds u 0 + |⊥⟩, encoding the homogeneous part in its first subspace.
For the inhomogeneous term in Eq. ( 7), after discretizing the integral for both k and s using the multidimensional trapezoidal rule, we obtain Here c j,j ′ = v j ′ wj ∥b(s j ′ )∥ π(1+k 2 j ) , w j , k j are as defined before, and s j ′ = j ′ T Mt .This can also be implemented by the same Trotterization and LCU approach as the homogeneous case (see Supplemental Materials for details).
Notice that the evolution time of different Hamiltonians in the LCHS of the inhomogeneous term varies, so we will assume the input model of H, L to coherently encode the evolution of different Hamiltonians with different time periods, as and All of these oracles are an extension of the time-dependent encoding proposed in [23].
To linearly combine T e − T 0 A(s) ds u 0 and we append an extra ancilla qubit, prepare both states controlled by this ancilla qubit, and implement the LCU again at the outer loop using a single-qubit rotation.The final step is to measure all the ancilla registers, and if all the outcomes are 0, then the resulting state approximately encodes the solution u(t) of the ODE.
Computational cost-The complexity of our algorithm for solving Eq. ( 1) is given as follows.Here |u(T )⟩ is the solution to Eq. (1) at the final time T .Theorem 2. There exists a quantum algorithm that prepares an ϵ-approximation of the state |u(T )⟩ with Ω(1) success probability and a flag indicating success, using 1. queries to the aforementioned input models of H and L a total number of times where 2. queries to the state preparation oracle O prep and the source term input model additional one-qubit gates.
The proof of Theorem 2 can be found in the Supplemental Materials.Our algorithm uses a small number of queries to the state preparation oracle.This is because each run of the LCU procedure only requires O(1) uses of such oracles and the overall complexity only relates to the success probability, which contributes to a (∥u 0 ∥ + ∥b∥ L 1 )/∥u(T )∥ factor.The query complexity to the matrix input in each run depends linearly on the number of Trotter steps, which contributes to the scaling Γ 1+1/p p T 1+1/p /ϵ 1/p according to the p-th order Trotter error bound.The extra 1/ϵ 1/p scaling is due to the relative error scaling.Notice that in the matrix query complexity there are still extra terms including 1/ϵ and ((∥u 0 ∥ + ∥b∥ L 1 )/∥u(T )∥) 1+1/p .The former is because we need to simulate the Hamiltonian up to K = O(ϵ −1 ), while the latter arises from the necessity of bounding the relative Trotter error.Ancilla qubits are used to implement quadrature.
Our algorithm and Theorem 2 can be directly applied to the special case where A(t) ≡ A is time-independent.In this case, we may further simplify the implementation and reduce the computational cost.First, the coherent encoding of the time evolution is no longer needed, and the select oracles in the LCU procedure can be efficiently constructed using O L (s) = e −iLs and O H (s) = e −iHs with O(log(M ) log(M t )) cost.Second, the parameter Γ p can be improved to the commutator scalings between H and L thanks to the improved error bound for timeindependent product formula [25].We refer to the Supplemental Materials for more details.

Optimal state preparation cost and comparison with other methods-The query complexity of the LCHS approach to the state preparation oracle
. In fact, when b = 0, this corresponds to the dependence of a quantity denoted by q = ∥u0∥ ∥u(T )∥ in quantum differential equation solvers.Such a linear scaling of q cannot be improved.From the perspective of quantum complexity theory, the renormalization from ∥u 0 ∥ to ∥u(T )∥ could be utilized to implement postselection, and it would imply the unlikely consequence BQP = PP (similar discussions appear in Section 8 of [26] and Section 7 of [27]).Moreover, as shown in [28,Corollary 16], any quantum differential equation algorithm must have worst-case query complexity Ω(q) to the state preparation oracle, indicating the optimal state preparation cost that our LCHS approach achieves.Most generic quantum differential equation algorithms [5,6,26,29,30] convert the time-dependent differential equation problem Eq. (1) into a QLSP.The efficiency of these quantum algorithms relies on the efficiency of the quantum linear system algorithms (QLSA), which typically take a large number of queries to the state preparation oracle.In particular, the state-of-theart QLSA takes O(κ log(1/ϵ)) queries [31], where κ is the condition number of the QLSP matrix and depends on T and ∥A(t)∥.However, our algorithm directly implements the time evolution operator by LCHS without using QLSA, so the number of the state preparation oracle is significantly reduced.For example, when b = 0, our algorithm takes O ∥u0∥ ∥u(T )∥ queries to the state preparation, while the state-of-the-art QLSA-based quantum Dyson series method [30] queries the initial state O ∥u0∥ ∥u(T )∥ ∥A∥T log(1/ϵ) times.Our algorithm removes the explicit dependence on ∥A∥, T and ϵ, and matches the lower bound.Additionally, in the QLSAbased approaches, the value of κ can be difficult to estimate in practice, and it is common that the theoretical bound significantly overestimates the value of κ [29].We also note that the time marching method [32] does not involve QLSP and can query the initial state O ∥u0∥ ∥u(T )∥ times as well.In this case, our LCHS approach removes the need of implementing the uniform singular value amplification procedure and can thus be simpler to implement.It also outperforms the time marching method in terms of the number of matrix queries with respect to the evolution time, with an improvement from Applications to open quantum system dynamics with complex absorbing potentials-In its simplest form, the complex absorbing potential method [15,17,18] replaces the real potential by a complex one, and the timedependent Schrödinger equation becomes Here V R (r, t) is the real time-dependent external potential, and −iV I (r) is the absorbing potential and can often be chosen to be time-independent.The minus sign reflects that this is a damping potential, and V I can be chosen to be bounded and non-negative.For simplicity we only discuss Eq. ( 10) in the context of singleparticle dynamics, and this formulation can be generalized to accommodate multi-particle dynamics.Our method can also be generalized to other more boundary treatment methods such as the perfectly matched layer (PML) method [33,34].In these applications, we are interested in the case before the scattering wave leaves the region of interest, i.e., ∥u(T )∥ 2 = |u(r, T )| 2 dr is not too small.Such a condition can also be satisfied if u 0 is a near-resonance state.Another widely used method for modeling open quantum system dynamics is the Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) quantum master equation [35][36][37].The non-Hermitian quantum dynamics (10) is also related to a class of numerical methods for solving the GKSL equation, known as the quantum jump or the Monte Carlo wavefunction method [38,39].
To solve Eq. ( 10) using our LCHS algorithm, we first discretize the spatial variable using N equidistant grid points, and the Laplace operator is discretized by the central difference formula.In the context of Eq. (1), the is the standard Hamiltonian with ∥H(t)∥ = O(N 2 + max t ∥V R (t)∥), and the non-Hermitian part L = V I is a time-independent positive semi-definite matrix.Furthermore, the dynamics e −ikLt can be fast-forwarded in the sense that e −ikLt can be performed with cost independent of k, t, and ∥L∥ [40], so we may perform the Hamiltonian simulation in the interaction picture with the truncated Dyson series method [23] to avoid computational overhead brought by K.
In the interaction picture, the LCHS becomes where H I (s; k) = e iLks H(s)e −iLks .Eq. ( 11) can be directly implemented by LCU.Notice that the derivative of H I still scales linearly in K, but the cost of the truncated Dyson series method scales only logarithmically with respect to the derivative of H I (s) and thus K [23].
Our main result is as follows, which achieves nearoptimal scaling in all parameters.The proof is given in the Supplemental Materials.We remark that the following result also holds in a more general case where H(t) is an arbitrary time-dependent Hamiltonian and L is a time-independent fast-forwardable Hamiltonian.Then there exists a quantum algorithm that prepares an ϵapproximation of the state |u(T )⟩ with Ω(1) success probability and a flag indicating success, using Discussion-Linear combination of Hamiltonian simulation (LCHS) provides a simple way for solving the non-unitary dynamics in the form of Eq. (1).Compared to existing approaches based on QLSP, LCHS is simple, flexible, and achieves optimal state preparation cost.The linear combination procedure can be implemented in a hybrid quantum-classical fashion to facilitate the computation of observables on early fault-tolerant quantum computers.The most significant limitation of LCHS is that the spectral radius of the Hermitian part L needs to be multiplied by a factor up to the frequency cutoff K, where K = O(1/ϵ) due to the quadratic decay of the kernel (1 + k 2 ) −1 .This increases the maximal circuit depth by a factor ϵ −1 .The problem can be ameliorated when the dynamics of L can be fast-forwarded.For instance, in the case of simulating open quantum system dynamics using a complex absorbing potential, our LCHS-based solver achieves near optimal complexity in all parameters.However, Hamiltonian simulation in the interaction picture [23] may be difficult to implement.A more desirable solution to this problem would be to replace the kernel (1 + k 2 ) −1 by a fast decaying one, so that the truncation range can be dramatically reduced.It remains an open question how to obtain the best possible solver that is optimal in all parameters for simulating general non-unitary dynamics.
When a task can be solved using either LCU-or QSPbased techniques, the latter are often preferred due to their superior resource efficiency, as well as ease of implementation.However, all QSP-based techniques are based on the spectral mapping theorem.LCHS avoids this spectral mapping argument, and thus significantly expands the application range of LCU.It would be interesting to see if the mathematical reformulation of LCHS can inspire further generalization of QSP-based approaches, particularly for applications involving non-normal matrices.Additionally, exploring extensions of LCHS to other quantum linear algebra problems can be a promising direction for future research.

I. PROOF OF Theorem 1
We first establish a lemma which is a special instance of Cauchy's theorem applied to matrix functions without relying on the spectral mapping argument.Here P stands for the Cauchy principal value of the integral.
Proof.By a change of variable ω = ik, and denote the imaginary axis by γ, we have Let R be any positive number, and we choose a closed contour C in the complex plane as in Fig. S1, where γ C is the half circle with a radius R in the right half plane.We will prove the lemma by deforming the integral along the imaginary axis to the integral on along the half circle.Note that each entry of the integrand on the right hand side of Eq. ( S2) is analytic with respect to ω in the region enclosed by C. Therefore Cauchy's integral theorem applies, and We can parameterize , and the second is θ ∈ J = [−π/2, π/2]\I.We also denote by λ 0 > 0 the smallest eigenvalue of L.

R→∞ I∪J
Re iθ 1 + Re iθ e −iH−Re iθ L dθ. (S4) On the interval I, we have Therefore this term vanishes in the limit R → ∞.On the interval J, for R ≥ 2, we have which vanishes as R → ∞.This proves the lemma.
Denote the right hand side of Eq. ( 2) by V (t; ik) = T e −i t 0 (H(s)+kL(s)) ds .Then V is analytic with respect to ω = ik in the complex plane for any t.Furthermore, if the eigenvalues of L(t) are uniformly bounded from below by λ 0 > 0, then for θ still holds.This proves the following statement.Here P stands for the Cauchy principal value of the integral.
Now we are ready to establish the proof of Theorem 1.
Proof of Theorem 1.We first prove the statement when L(t) ⪰ λ 0 > 0 for all t ∈ I.The right hand of Eq. ( 2) converges absolutely.Hence T e −i t 0 (H(s)+kL(s)) ds dk.(S9) When t = 0, both sides of Eq. ( 2) are the identity matrix due to the normalization condition of the Cauchy distribution.For t > 0, the definition of the time-ordered operator gives We will show that W (t) satisfies the same differential equation for t > 0. To this end, we choose a fixed δ > 0 and consider t ∈ [δ, T ].Differentiating W (t) with respect to t gives V (t; ik) dk.

(S11)
Here in Eq. (S11), since the right hand side of the first line uniformly converges in t ∈ [δ, T ], we can exchange the order of limitation (i.e., the principal value) and differentiation, so the calculations in Eq. (S11) rigorously hold.Now use Lemma 5, the first term vanishes and we can flip its sign as Then, for t ∈ [δ, T ], we have V (t; ik) dk.

(S13)
Since δ can be chosen arbitrarily close to 0, Eq. (S13) also holds for all t ∈ (0, T ].This proves that W (t) satisfies the same differential equation as the left hand side of Eq. ( 2) By the uniqueness of the solution, we prove that W (t) = T e − t 0 A(s) ds under for positive definite L(t).Finally, each matrix entry of both the left and right hand sides of Eq. ( 2) are continuous functions with respect to L(t).This allows us to take the limit λ 0 → 0 and finishes the proof.

II. NUMERICAL INTEGRATION
We introduce the notation ∥H∥ := max τ ∈[0,T ] ∥H(τ )∥ 2 , ∥L∥ := max τ ∈[0,T ] ∥L(τ )∥ 2 , and ∥b∥ := max τ ∈[0,T ] ∥b(τ )∥ 2 .We further denote We assume H(t) and L(t) are C 1 -smooth, and b(t) is C 2 -smooth in the following analysis.Firstly, we consider numerical integration formula to estimate in which F is a matrix-valued function (but not a matrix function).We can choose the truncation parameter K = c/ϵ for some constant c, and the remainder is then bounded by We can simply apply a trapezoidal rule on Eq. (S17) Here and k j = −K + 2jK M are the weights and nodes of the trapezoidal rule.The global error of the trapezoidal rule is proportional to where In order to reach precision ϵ, the total number of points is Secondly, we consider numerical integration formula to estimate in the inhomogeneous case, spatially truncated on [−K, K] with K = ∥b∥ L 1 /ϵ, and the remainder is then bounded by We can simply apply a trapezoidal rule on Eq. (S23).
M are the weights and nodes of the two-dimensional trapezoidal rule.
As above, the global error of the trapezoidal rule is proportional to where as above, and Using ∥b∥ L 1 ≤ ∥b∥T , to bound the error of the trapezoidal rule by ϵ, we should choose and

III. INTRODUCTION TO LINEAR COMBINATION OF UNITARIES
Linear Combination of Unitaries (LCU) [7,42] is a quantum primitive that is widely employed for tasks such as Hamiltonian simulations [8,9] and quantum linear system algorithms (QLSAs) [10].LCU offers exponential enhancements in precision when compared to quantum phase estimation (QPE) methods.
Let T = K−1 i=0 α i U i be a linear combination of unitaries U i .Without loss of generality, we assume K = 2 a and α i > 0 since a phase factor can be subsumed into U i .The selection oracle implements U i conditioned on the value of the controlled register.The preparation oracle satisfies where ∥α∥ 1 = i |α i |.As shown in [42,Lemma 2.1], we can implement T in the following sense.Lemma 6 (LCU Lemma).
Then for all states |ψ⟩, where |⊥⟩ is an unnormalized state that satisfies The LCU Lemma is a powerful quantum primitive, since the number of ancilla qubits a only scales logarithmically in the number of linear combination terms K. Upon measuring the ancilla qubits of W |0 a ⟩ |ψ⟩ and obtaining the outcome |0 a ⟩, the resulting postselection state is proportional to T |ψ⟩.This successful outcome occurs with probability (∥T |ψ⟩ ∥/∥α∥ 1 ) 2 .We can perform amplitude amplification [3] to boost success probability to Ω(1), using O(∥α∥ 1 /∥T |ψ⟩ ∥) number of queries to U , V , and |ψ⟩.

IV. HYBRID IMPLEMENTATION OF THE LCHS METHOD
We discuss the evaluation of Eq. ( 6) in a hybrid quantum-classical fashion.The idea is to use the quantum computer to evaluate ⟨u 0 |U † k (t)OU k ′ (t)|u 0 ⟩ via the non-unitary Hadamard test and amplitude estimation, and then perform the summation via classical Monte Carlo sampling.In this section, we first discuss how to estimate ⟨u 0 |U † k (t)OU k ′ (t)|u 0 ⟩ for a fixed pair (k, k ′ ), and analyze the classical sampling complexity.

IV.1. Estimating observables via non-unitary Hadamard test
Hadamard test is a well-known quantum primitive to estimate the expectation value of a unitary matrix.It uses one ancilla qubit, applies a Hadamard gate on it, then applies a controlled version of the target unitary operation, followed by another Hadamard gate on the ancilla qubit.Then, the probability of obtaining 0 when measuring the ancilla qubit is associated with the real part of the desired expectation value.The imaginary part can be obtained similarly with a slight modification.
The Hadamard test can be modified to estimate the expectation value ⟨ϕ|G|ϕ⟩ of a non-unitary matrix G (see e.g., Ref. [21,Appendix D]).The only modification is to replace the controlled unitary operation in the Hadamard test by the controlled version of the block encoding of G.For simplicity, we assume there is a unitary U G using m extra ancilla qubits such that (⟨0| Here α G is the block-encoding factor with α G ≥ ∥G∥.Notice that when G is unitary, the quantum circuit implementing G itself is the (1, 0)-block-encoding of G.Then, the scaled expectation value 1 α G ⟨ϕ|G|ϕ⟩ can still be obtained via the the probability of obtaining 0 when measuring the ancilla qubit.When combined with amplitude estimation, the non-unitary Hadamard test has the following complexity.
Using Lemma 7, we can directly obtain the complexity of estimating ⟨u 0 |U † k (t)OU k ′ (t)|u 0 ⟩ in our case.Lemma 8. Suppose that O prep is the state preparation oracle of |u 0 ⟩, U O is an (α O , 0)-block-encoding of O with α O ≥ ∥O∥, and U k (t) is a quantum circuit that approximates U k (t) with error smaller than ϵ HS for any k.Then, ⟨u 0 |U † k (t)OU k ′ (t)|u 0 ⟩ can be estimated to precision ϵ with probability at least 1 − δ, by choosing ϵ HS = ϵ/(4∥O∥) and using O((α O /ϵ) log(α O /ϵ) log(1/δ)) queries to O prep , U O , U and their inverses.
Proof.By [12,Lemma 53], multiplying U k ′ , U O and U † k (with additional ancilla qubits) gives an , so we may directly use the non-unitary Hadamard test to estimate the desired expectation value.According to Lemma 7, with probability at least 1 − δ, we can estimate There is another part of the error due to the imperfect implementation of U k .Specifically, we can bound and thus we can choose ϵ HS = ϵ/(4∥O∥) to bound this part of the error by ϵ/2.

IV.2. Linear combination via classical Monte Carlo sampling
Let c = (c k ) denote the vector of the coefficients c k = ω j /(π(1 + k 2 j )).Notice that all c j 's are positive real number and k,k ′ c k c k ′ = ∥c∥ 2 1 .We describe a sampling-based hybrid algorithm for estimating Eq. ( 6) as follows, which is similar to the sampling approaches from Fourier series in [43,44]: 1 .Denote the sample by (k(j), k ′ (j)).

Estimate ⟨u
)|u 0 ⟩ using the non-unitary Hadamard test.Denote the successful estimator by X j .
3. Estimate u(t) * Ou(t) using ∥c∥ 2 1 X, where X = 1 J J j=1 X j .If there is no error in implementing U k (t) or performing the non-unitary Hadamard test, then the the expectation value EX is exactly u(t) * Ou(t).Taking into consideration these errors as well as the sampling errors, we can analyze the complexity of our approach as follows.
Theorem 9. Suppose that O prep is the state preparation oracle of |u 0 ⟩, U O is an (α O , 0)-block-encoding of O with α O ≥ ∥O∥, and U k (t) is a quantum circuit that approximates U k (t) with error smaller than ϵ HS for any k.Then, u(t) * Ou(t) can be estimated to precision ϵ with probability at least 1 − δ, by choosing ϵ HS = O(ϵ/ ∥O∥).Furthermore, 1.The number of the samples is We first consider the sampling step.Notice that, even in the presence of errors, each X j is bounded within [−(∥O∥ + 1), ∥O∥ + 1].Hoeffding's inequality implies that To bound the failure probability by δ/2, it suffices to choose The last equality is because c j 's come from the numerical approximation of the integral dk π(1+k 2 ) and ∥c∥ 1 = j ωj π(1+k 2 j ) ≈ dk π(1+k 2 ) = O(1).Let δ ′ , ϵ ′ denote the upper bound of the failure probability and the error in the non-unitary Hadamard test step for any (k, k ′ ), respectively.Recall that X j denotes the estimator upon success of the Hadamard test step.Then To bound this by ϵ/2, it suffices to choose ϵ ′ = ϵ/(2 ∥c∥ 2 1 ).The overall success probability is at least (1−δ ′ ) J ≥ 1−Jδ ′ .To bound it from below by 1 − δ/2, we can choose δ ′ = δ/(2J).By Lemma 8, we need to choose the tolerated level of the error in implementing U k (t) as ϵ HS = ϵ ′ /(4 ∥O∥), and the query complexity for each circuit is Plugging in the choices of ϵ ′ , δ ′ and J yields the desired complexity estimate.
In Theorem 9, the query complexity of each circuit is measured by the number of queries to the state preparation oracle, the block-encoding of O and the circuit U k (t) that approximates U k (t).The final query complexity in terms of the input models for A(t) still depends on how we actually implement U k (t).This may introduce extra overhead if we use low-order methods such as first-order Trotter formula.On the other hand, if we use high-order Trotter formula or truncated Dyson series method, the complexity of implementing U k (t) can be almost linear in t, k and certain norms of A, and the dependence on ϵ HS is 1/ϵ o(1) HS .In the worst case, the query complexity of the circuit still has an almost linear dependence on K.However, we remark that in the average case, such a K dependence can be improved.This is because the circuit with larger k and k ′ is sampled with smaller probability, which decays quadratically as ∼ 1/(k 2 k ′2 ).
Throughout this section, we care about the expectation value u(t) * Ou(t) associated with the possibly unnormalized solution u(t), which is the natural setup in various applications of classical non-unitary dynamics.If we want to estimate ⟨u(t)|O|u(t)⟩ where |u(t)⟩ is the normalized solution of the ODE, then we need to divide the previous estimator by ∥u(t)∥ 2 .This may affect the algorithm in two aspects.First, to ensure that the final estimator is still an ϵ-approximation, we need to change the tolerated level of errors in u(t) * Ou(t) to be ϵ ∥u(t)∥ 2 .Second, if ∥u(t)∥ is unknown a priori, then an extra algorithm to estimate it is required.In this case, one may consider a similar hybrid algorithm specified in e.g., Ref. [43].

V. IMPLEMENTATION OF THE INHOMOGENEOUS TERM
The inhomogeneous term in Eq. ( 7) can be implemented following the same approach as the homogeneous case discussed in the main text.After discretizing the integral for both k and s using another trapezoidal rule, we obtain Ξp−1 l=0 e −iH(s j ′ +(l ′ +δ l )(t−s j ′ )/r)β l (t−s j ′ )/r e −iL(s j ′ +(l ′ +γ l )(t−s j ′ )/r)α l kj (t−s j ′ )/r |b(s j ′ )⟩ .

(S41)
Suppose we are given the coefficient oracle We start with the homogeneous case.Let We can guarantee this by choosing sufficiently large K, M and r.First, based on Eq. (S22), we can choose Next, according to [22], we have and To bound this by ϵ/2, it suffices to choose Query complexity in O coef is straightforward, and that in the select oracles are O(r).Using our estimate of r (and noticing that the select oracle can be constructed with O(log(M )) cost), we complete the proof.
Theorem 12. Consider the homogeneous ODE Eq. (1) with b(t) ≡ 0.Then, there exists a quantum algorithm that prepares an ϵ-approximation of the state |T e − T 0 A(s) ds u 0 ⟩ with Ω(1) success probability and a flag indicating success, using queries to O L (s, τ ) and O H (s, τ ) a total number of times Proof.Let V denote the (∥c∥ 1 , log(M ), ϵ ′ )-block-encoding of T e − T 0 A(s) ds as constructed in Lemma 11.We further write where ∥E∥ ≤ ϵ ′ .We start with the state |0⟩ a |0⟩, where the ancilla register contains log(M ) qubits.After applying O prep on the system register and V , we obtain the state Using the inequality ∥x/∥x∥ − y/∥y∥∥ ≤ 2∥x − y∥/∥x∥ for two vectors x, y, we can bound the error in the quantum state after a successful measurement as In order to bound this error by ϵ, it suffices to choose Therefore the overall complexity of a single run of our algorithm can be obtained by Lemma 11 with ϵ ′ , which becomes  e −iH(s j ′ +(l ′ +δ l )(T −s j ′ )/r)β l (T −s j ′ )/r e −iL(s j ′ +(l ′ +γ l )(T −s j ′ )/r)α l kj (T −s j ′ )/r |b(s j ′ )⟩ .(S60) To bound the error by ϵ, we use Eq.(S23) and [22]   The overall query complexity to the matrix input models is O(r log(M )).

VI.4. Linear combination of homogeneous and inhomogeneous terms
We state a more general result as follows on how to linearly combine two ("block-encoded") quantum states.The algorithm is a special case of LCU, but the errors in the quantum states need to be carefully controlled.Lemma 14.Let x 0 and x 1 denote two (possibly unnormalized) vectors.Suppose that we are given two unitaries U 0 and U 1 such that U j |0⟩ a |0⟩ = 1 ηj |0⟩ a x j + |⊥⟩ where ∥x j − x j ∥ ≤ ϵ j .Then, for any real positive parameters (θ 0 , θ 1 ), there exists a quantum algorithm which outputs an ϵ-approximation of the quantum state |θ 0 x 0 + θ 1 x 1 ⟩ with Ω(1) success probability and a flag indicating success, using 1 extra ancilla qubit and O η0θ0+η1θ1 ∥θ0x0+θ1x1∥ queries to U 0 , U 1 and additional one-qubit gate, where the tolerated errors are chosen as The operator SEL U serves as the select oracle in the LCU step.After the LCU as in our general algorithm, we obtain a quantum state

Theorem 3 .
Consider the spatially discretized Eq. (10) using finite difference with N equidistant grid points.Suppose that we are given the oracles O V I : |r⟩ |0⟩ → |r⟩ |V I (r)⟩, O V R : |r⟩ |s⟩ |0⟩ → |r⟩ |s⟩ |V R (r, s)⟩, and the state preparation oracle O prep for the initial condition.

Lemma 4 .
Let H, L ∈ C N ×N be Hermitian matrices, and L ≻ 0. Then P

VI. 3 .Lemma 13 .
Inhomogeneous termHere we analyze the complexity of encoding the inhomogeneous term T 0 T e − T s A(s ′ ) ds ′ b(s) ds.There exists a quantum algorithm which maps |0⟩ a ′ |0⟩ a |0⟩ to the state 1 η |0⟩ a ′ |0⟩ a u + |⊥⟩ such that u is an ϵ-approximation of T 0 T e − T s A(s ′ ) ds ′ b(s) ds and η = ∥ c∥ 1 , using queries to the input models of H and L a total number of times O Γ 1+1/p p queries to O ′ coef and O b for O(1) times, and log(Γ 1 ∥b∥ C 2 T /ϵ) ancilla qubits.Proof.According to Section V, the LCU procedure before measurement gives the state 1 η |0⟩ a ′ |0⟩ a u + |⊥⟩ such that η = ∥ c∥ 1 and
Each circuit with sampling value (k, k ′ ) uses queries to O prep , U O and U k (t) Proof.Notice that there are two sources of errors and failure probabilities: the non-unitary Hadamard test and the sampling step.It suffices to bound both errors (resp.failure probabilities) by ϵ/2 (resp.δ/2).