Quantum algorithms for powering stable Hermitian matrices

Matrix powering is a fundamental computational primitive in linear algebra. It has widespread applications in scientific computing and engineering, and underlies the solution of time-homogeneous linear ordinary differential equations, simulation of discrete-time Markov chains, or discovering the spectral properties of matrices with iterative methods. In this paper, we investigate the possibility of speeding up matrix powering of sparse stable Hermitian matrices on a quantum computer. We present two quantum algorithms that can achieve speedup over the classical matrix powering algorithms -- (i) an adaption of quantum-walk based fast forwarding algorithm (ii) an algorithm based on Hamiltonian simulation. Furthermore, by mapping the N-bit parity determination problem to a matrix powering problem, we provide no-go theorems that limit the quantum speedups achievable in powering non-Hermitian matrices.

Matrix powering is a fundamental computational primitive in linear algebra. It has widespread applications in scientific computing and engineering, and underlies the solution of time-homogeneous linear ordinary differential equations, simulation of discrete-time Markov chains, or discovering the spectral properties of matrices with iterative methods. In this paper, we investigate the possibility of speeding up matrix powering of sparse stable Hermitian matrices on a quantum computer. We present two quantum algorithms that can achieve speedup over the classical matrix powering algorithms -(i) an adaption of quantum-walk based fast forwarding algorithm (ii) an algorithm based on Hamiltonian simulation. Furthermore, by mapping the N −bit parity determination problem to a matrix powering problem, we provide no-go theorems that limit the quantum speedups achievable in powering non-Hermitian matrices.

I. INTRODUCTION
Recent years have seen rapid progress in the development of quantum computing hardware, and there have already been experimental demonstrations of quantum computations that are believed to be hard to simulate on classical computers [1,2]. While this progress in hardware has brought us closer to the monumental goal of building a fault tolerant quantum computer, it has also provided us with access to noisy quantum hardware which might already solve problems that are hard for classical computers [3]. From a theoretical standpoint, it has become important to discover algorithms that can provide speedup over their classical counterparts on both fault tolerant quantum computers and near-term noisy quantum hardware.
Quantum computers are known to offer exponential speedup in simulating the physics of quantum systems -near-optimal algorithms have been developed for the simulation of Hamiltonian dynamics [4][5][6], Lindbladian dynamics [7][8][9][10] and steady state (finite temperature or ground state) properties of Hamiltonians [11][12][13][14]. Several techniques used for simulating quantum systems have been generalized to accelerate more fundamental linear algebra computational primitives -exponential quantum speedup in the solution of systems of linear equations have been obtained [15][16][17], and quantum speedups have also been shown in solving ordinary differential equations [18,19] and partial differential equations [20].
Another fundamental computation that can be accelerated on quantum computers is matrix powering i.e. computing a matrix-element v † A t u given access to the matrix A ∈ C N ×N , a positive integer power t and vectors v, u ∈ C N . This is a computational primitive which appears in various applications, including but not limited to solving linear differential equations, simulating discrete-time Markov chains as well as matrix inversion and eigenvalue computation using Krylov subspace methods. Classically, this problem can be solved by repeated * rahul.trivedi@mpq.mpg.de matrix multiplication in time O(poly(N )Dt), where D is the sparsity of the matrix A. Without any assumptions on the matrix A, one approach to solve the matrix-powering problem on a quantum computer is to map it to a matrix inversion problem and use quantum algorithms for solving linear equationsthis approach has been investigated in Refs. [18,19] in the context of solving linear time-homogeneous ODEs and has a run-time O(polylog(N )poly(ε −1 )κ V v 2 u 2 Dt), where κ V is the condition number of the eigenvector matrix of A, to obtain v † A t u to a precision ε for stable matrices thereby providing an exponential speedup in the matrix size over classical algorithms.
Furthermore, several authors have studied the problem of powering a stochastic matrix which arises in the context of simulating the dynamics of a discrete-time Markov chain [21][22][23]. Classically, the problem of powering a stochastic matrix can be solved efficiently to precision ε with the Monte Carlo algorithm in time O(Dt v 2 u 2 /ε 2 ) -using the Monte Carlo algorithm is thus exponentially faster than using repeated matrix multiplication. While the quantum algorithms based on linear-equation solve do not provide an exponential speedup for stochastic matrix powering when compared to the classical Monte Carlo algorithm, there have been two proposals for achieving polynomial quantum speedups for this specific problem. One of the proposed algorithms is to use a reversible implementation of the classical Monte Carlo together with quantum amplitude estimation to achieve a quadratic improvement in the dependence of the run-time on precision as compared to the classical Monte Carlo algorithm [24]. This idea has been applied to propose solutions to the heat equation [25], and stochastic differential equations [26]. A different quantum speedup can be obtained for symmetric stochastic matrices by employing quantum walks [27][28][29][30]. In particular, Ref. [31] prepares a quantum state within an ε−radius ). The same author generalized this algorithm to arbitrary Hermitian matrices A in Ref. [32]. While this algorithm obtains a quantum speedup over classical methods (an exponential speedup in N over repeated matrix multiplication, and quadratic speedup in t over arXiv:2103.08329v1 [quant-ph] 15 Mar 2021 Monte Carlo algorithm), the dependence of the run-time on A t u −1 can often make it polynomially slow in N .
In this paper we introduce two algorithms to compute v † A t u for stable Hermitian matrices A i.e. Hermitian matrices all of whose eigenvalues have magnitudes less than 1. The first algorithm, which combines the construction of Ref. [31] with a Hadamard test [33], has a run-time For matrices where it is known that A 1 ≤ 1, this provides a quantum speedup over repeated matrix multiplication, since its run-time does not scale polynomially with the size of the matrix. Furthermore, it provides a quadratic speedup in t over classical Monte Carlo algorithm for symmetric stochastic matrices (in which case A 1 = 1). For problems such as the simulation of diffusive discrete-time Markov chains, where A t u −1 = O( √ N ) at large t, this algorithm provides a exponential speedup in the size of matrix A over Ref. [31]. The second algorithm has a run-time ofÕ(Dt 2 poly v u ε −1 ) to compute v † A τ u for all τ ∈ {0, 1, 2 . . . t}. While this is slower than the quantumwalk based algorithm, it only uses Hamiltonian simulation as a primitive and thus is more suitable for near-term quantum hardware. It also achieves a run-time comparable to the quantum algorithms based on linear equation solve [18,19]. Furthermore, for matrices that are not stochastic and consequently cannot be classically powered with the Monte Carlo algorithm, this algorithm achieves a quantum speedup over repeated matrix multiplication since its run-time does not scale polynomially with the size of the matrix. Finally, following a construction similar to Ref. [34], we provide no-go theorems that limit the speedups achievable with a quantum computer for powering non-Hermitian matrices.

II. PROBLEM DEFINITION, PRELIMNARIES AND SUMMARY OF RESULTS
We consider the problem of powering a Hermitian matrix A ∈ C N ×N that is stable i.e. all of its eigenvalues have a magnitude less than 1, or equivalently A 2 ≤ 1. Furthermore, we will assume the matrix to be D−sparse i.e. every row or column of the stochastic matrix has at most D non-zero elements. Hermitian matrices arising in practice will typically have D = O(1) or O(polylog(N )). The matrix powering problem that we consider is precisely defined below.
Problem (Matrix powering) Given a D−sparse stable Hermitian matrix A ∈ C N ×N , a positive integer power t and 1 Notation for norms: Throughout this paper, for a vector v ∈ C N , v k , k ∈ {1, 2 . . . } will refer to the standard k norm of the vector. Furthermore, for convenience, we will use v to denote the 2 norm of v. For matrices A ∈ C N ×N , A k denotes the operator norm induced by k vector norm i.e. A k = sup v Av k / v k . In particular, A 2 will be the largest singular value of A, which coincides with the largest magnitude eigenvalue of A is Hermitian. Additionally, for Hermitian matrices A, A ∞ = A 1 will be the maximum absolute row (or column) sum of the matrix A.
vectors v, u ∈ C N , compute v † A t u to a specified precision ε > 0.
We point out that previous works that solve the matrix powering problem in various contexts adopt a different problem definition wherein they aim to prepare a quantum state encoding A t u. Since in many application of matrix powering we are finally interesting in computing its inner product, v † A t u, with another vector v which is typically known beforehand, the algorithms proposed in this paper directly compute this expectation value without ever explicitly prepare a quantum state encoding A t u. We make two further notes about this problem definition: 1. The precision of the output of this algorithm is assumed to be in a probabilistic sense i.e. the algorithm is said to produce an estimate X of a quantity x with precision ε if Prob[|X − x| ≤ ε] is large enough. The value of this probability, often referred to as the confidence level of the algorithm, is assumed to be a pre-specified constant close to 1 throughout this paper and we will suppress it in the complexity results.
where f (j, k) is the index of the k th non-zero element in the j th row or column. The oracle O A provides access to the non-zero elements of the matrix A via the implementation of a unitary that satisfies where A j,k are the complex elements of the matrix A that are represented by a bit-string upto some specified precision δ. On a quantum computer, these oracles can be implemented with quantum circuits of depth O(D polylog(1/δ)) [35]. On near-term hardware, there might be alternative more efficient ways of implementing these oracles for specific matrices A (for instance the matrices corresponding to local Hamiltonians of a lattice of classical spins). In this paper, for clarity, we will express our complexity results in terms of the number of calls to the oracles O F and O A and these can easily be translated to the circuit depths for various hardwarespecific implementations.
In the remainder of this paper, we provide several quantum algorithms to solve the matrix powering problem and achieve speedups over classical algorithms. The first algorithm builds on Ref. [31] and combines a quantum walk together with a Hadamard test and a classical sampling algorithm to obtain the following result. Theorem 1 Given a constant C > 0 such that A 1 < C, the matrix-powering problem can be solved with a quantum Furthermore, by employing the linear combination of unitaries (LCU) technique along with quantum amplitude amplification [24], we can obtain a quadratic improvement in the scaling of the run-time with the precision ε.
Theorem 2 Given a constant C > 0 such that A 1 < C, the matrix-powering problem can be solved with a quantum algo- HereÕ hides any polylog complexity factors. We point out that these algorithms suffer from an exponential scaling with the power t when A 1 > 1 -this is due to the fact that the quantum walk construction we employ can only be used if the sum of magnitude of the elements of each row (or column) of A is smaller than 1. For a number of matrix powering problems, such as simulation of discrete-time Markov chains, A 1 = 1, and the run-time of these scales sublinearly with t. The above results improve the fast-forwarding algorithm presented in Refs. [31], whose run-time scales inversely with A t u -our approach avoids this scaling at the expense of scaling with u 2 , v 2 . This could be of relevance in problems such as the simulation of diffusive discrete-time Markov Furthermore, compared to the quantum algorithms based on linear-equation solve, this result has a quadratic speedup in t. We also remark that when compared to classical algorithms, we obtain an exponential speedup in N over matrix multiplication algorithm and a quadratic speedup in t over the Monte Carlo algorithm when the matrix A is stochastic.
While the quantum walk based algorithms provided above are able to achieve 'fast-forwarding', i.e. a sublinear run-time with respect to the matrix power t, they are difficult to implement on near-term quantum hardware. Given experimental constraints, it is widely believed that Hamiltonian simulation [4][5][6] will be one of the first problems to be solved on practical hardware. Furthermore, simulation of several classes of Hamiltonians can also be implemented on analog quantum simulators [36][37][38] which are significantly easier to experimentally build as compared to fully-programmable quantum computers. Based on a truncated Fourier series expansion of the function f (x) = x t , we provide a quantum algorithm to solve the matrix-powering problem with only the ability to use Hamiltonian simulation.

Theorem 3
The matrix powering problem can be solved simultaneously for all powers from 0 to t using an efficient Hamiltonian simulator in timeÕ t 2 poly v u ε −1 D .
We note that this result has a worse run-time not only when compared to the quantum-walk algorithms, but also with classical Monte Carlo algorithm if the matrix A is stochastic. However, it achieves an exponential speedup in N over classical repeated matrix multiplication algorithm although at an expense of quadratically worse scaling with t, for matrices that are not stochastic. Furthermore, it achieves the same run-time as quantum algorithms based on the linear-equation solve if they are employed to compute matrix powers from 0 to t. The key advantage of this algorithm over other quantum algorithms is its feasibility to being implemented on near-term quantum hardware.
Finally, all the algorithms provided above assume the matrix A to be Hermitian, in which case it was possible to obtain a 'fast-forwarding' speedup using quantum walks i.e. com- A natural question to ask is if fastforwarding is possible for non-Hermitian matrices as well. By utilizing a construction similar to the no-go theorems for Hamiltonian simulation [34] and relying on the result that even a quantum computer cannot speedup the calculation of parity of N −bits [39,40], we provide the following no-go theorem.
Theorem 4 (No-go theorem) There cannot exist a quantum algorithm that solves the matrix-powering problem iñ O t α poly( u , v , ε −1 ) calls to the oracle O F , O A , with α < 1, for any arbitrary irreducible sparse matrix A.
We point out that while these no-go theorems rigorously show that it is not possible to fast-forward the matrix-powering problem for generic non-Hermitian matrices, it does not prohibit an improvement of the run-time's dependence on the size of the matrix. Indeed, matrix-powering methods based on quantum linear equation solvers [18,19] obtain an exponential improvement over classical algorithms even for non-Hermitian matrices if the matrix is not stochastic.
The remainder of this paper contains proofs of the theorems stated above. In section III we describe the matrix-powering algorithms presented in this paper and prove theorems 1, 2 and 3. In section IV, we prove the no-go theorem 4. We only provide proofs of the most important theorems in the main text, and details are relegated to the appendices.

III. MATRIX MULTIPLICATION ALGORITHM
Before detailing the matrix-powering algorithm, we provide the following lemma that map the computation of v † A t u to the overlap of A t , ψ| A t |ψ , with quantum states |ψ that depend on u, v. This transformation is useful since the Hadamard test naturally allows for the computation of such overlaps.

Lemma 1 Given a Hermitian matrix A and vectors v, u, it follows that
where |ψ R i and λ R i , i ∈ {1, 2}, are the eigenvectors and eigenvalues of the Hermitian matrix uv † + vu † and |ψ I i and λ I i , i ∈ {1, 2}, are the eigenvectors and eigenvalues of the Hermitian matrix i(vu † − uv † ).

Proof: It follows immediately from the Hermiticity of
, we obtain the result in the lemma.
Consequently, we will focus on developing methods to compute the overlap ψ| A t |ψ efficiently. It can also be noted that for problems where u and v are sparse, the eigenvectors |ψ 1,2 introduced above are also sparse and consequently efficiently preparable on quantum computers. Furthermore, we note that the eigenvalues λ 1 and λ 2 are bounded by the norms u and v, which is concretely stated in the following lemma.
Proof: Denoting by |ψ the normalized eigenvector corresponding to the eigenvalue λ, it follows that

A. Fast-forwarding with quantum walks
One of the key ingredients in the quantum walk based algorithms for the matrix powering problem is expressing A t as a linear combination of Chebyshev polynomials of A, where p m is a probability distribution given by Consequently, a quantum circuit to compute the overlap ψ| A t |ψ can be constructed from a quantum circuit that can compute the overlap ψ| T m (A) |ψ for a specified m ∈ {0, 1 . . . t}. As is shown below, this can be done with a quantum walk provided that the 1-norm of A is smaller than 1. Since this isn't necessary for stable Hermitian matrices, we assume that we have access to an upper bound C on this norm i.e. A 1 ≤ C and compute ψ| (A/C) t |ψ , albeit to a precision C t higher than that required in ψ| A t |ψ . Therefore, in the remainder of this section, unless otherwise mentioned, we will assume A 1 ≤ 1.
A quantum walk construction similar to that used in Refs. [27,28,31] together with a Hadamard test allows us to compute these overlaps. However, since the elements of the matrix A can be complex, it is important to design the quantum walk with care so as to account for the phase of the complex matrix elements [35]. For A ∈ C N ×N , we consider a Hilbert space C N +1 ⊗ C N +1 ⊗ C 2 and assume access to a unitary V that satisfies where if A i,j = |A i,j |e i∠Ai,j for ∠A i,j ∈ (−π, π] then ϕ i,j = ∠A i,j for i ≥ j and −∠A i,j for i < j. Furthermore, we introduce the operator S given by We remark that this operator is different from that used in Ref. [31] -in particular, we have modified this operator to account for possibly negative on-diagonal elements of the matrix A which Ref. [31] did not handle since they were dealing with a stochastic matrix. Finally, the quantum walk operator W can then be constructed using the operators V, S and a reflection about the last qubit We then obtain the following lemma, similar to that obtained in Refs. [31,32] stating that m applications of the quantum walk operator effectively applies T m (A) on an input state conditioned on the state of the last qubit.
As is shown in appendix A, the quantum walk operator A can be implemented with O(D) calls to the oracles O F , O A that access the matrix A. In order to estimate the overlap ψ| T m (A) |ψ using the walk operator W , we introduce a controlled version of W , W c via We then have the following lemma to compute the overlap ψ| T m (A) |ψ using a hadamard test with the controlled operator W c .
Lemma 4 (Chebyshev polynomial overlap) Consider the state (W c ) m |ψ |0, 0, + , and measure the last two qubits on the basis {|0, + , |0, − , |1, + , |1, − }. Define a random variable X m based on the measurement outcome µ via While this overlap estimation procedure can be used together with the Chebyshev polynomial expansion in Eq. 3 to compute ψ| A t |ψ , this would not provide any fast-forwarding since computing T t (A) would require t applications of the operator W c . However, an important insight into the nature of the coefficients p m in the expansion in Eq. 4 is that they concentrate around m ∼ √ t. One possible approach to exploit this property is to sample m from the probability distribution given by the coefficients p m in Eq. 4, and compute the overlap ψ| T m (A) |ψ of the corresponding Chebyshev polynomial using lemma 4 -this would allow us to reduce, on an average, the number of times the walk operator W c is applied. This is formalized in the lemma below:

Lemma 5 (Matrix power overlap with classical sampling)
The overlap ψ| A t |ψ can be computed by estimating the mean of a random variable X which is generated by first sampling m ∈ {0, 1 . . . t} from the probability distribution in Eq. 4, followed by drawing a sample of X m defined in lemma 4 using the state |ψ . Furthermore, this estimation can be done with a precision ε with O(ε −2 √ t) calls to the controlled walk operator W c or equivalently with O(Dε −2 √ t) calls to the oracles O F , O A .
Proof: We can immediately see that Furthermore, noting that X 2 ∈ {0, 1}, it follows that var(X) ≤ 1. Consequently, E(X) can be estimated to a precision of ε with N = O(1/ε 2 ) samples. Furthermore, the average number of calls to the controlled walk operator W c , m is given by For large t, using Stirling's approximation this is Therefore m = O √ t , and consequently the number of calls to W c operator to achieve a precision ε given by Proof of theorem 1: Combining lemma 5 with lemma 1, we obtain a procedure for solving the matrix powering problem. The complexity result in theorem 1 can be obtained as follows: Given an upper bound C on A 1 , we note that to compute v † A t u to precision ε, we need to compute ψ L,R 1,2 | (A/C) t |ψ L,R

1,2
to a precision at-most ε/4 u v C t which can be done using the algorithm in lemma 1 with While the algorithm described above allows a quadratic fastforwarding for the matrix powering problem, the dependence of the run-time on the precision ε can also be improved by using amplitude amplification, which is precisely stated in the following lemma from Ref. [24]: Lemma 6 (Overlap estimation, Theorem 2.5 of Ref. [24]) Given a state |ψ in terms of its preparation unitary U from a known state |0 : |ψ = U |0 , an observable V and an estimate σ of its variance satisfying ψ| V 2 |ψ − ( ψ| V |ψ ) 2 ≤ σ 2 , ψ| V |ψ can be estimated on a quantum computer with precision ε withÕ(ε −1 σ) calls to the unitary U .
In order to use amplitude amplification and achieve fast forwarding in the same algorithm, we approximate the problem of computing the overlap ψ| A t |ψ for a given |ψ to computing an overlap of the form φ t | V |φ t where the operator V is independent of t and the state |φ t can be prepared in Θ( √ t) calls to the oracles O F , O A . This is achieved by using a combination of quantum walks with the hadamard test and the linear combination of unitaries (LCU) technique similar to Ref. [31]. The quadratic fast-forwarding in this approach is also obtained due to the concentration of the coefficients p m in Eq. 4 around m = √ t. This is made concrete by the following lemma, according to which the sum in Eq. 3 can be truncated after ∼ O( √ t log(ε −1 )) terms while incurring a specified additive error ε.

Lemma 7 If
A is a stable Hermitian matrix and |ψ is a normalized state, then ∀ε > 0 and C ≥ 2 log(2/ε) Proof: Using lemma 3 from Ref. [31], we obtain that ∀ε > 0, C ≥ 2 log(2/ε) Since A is a stable matrix, its eigenvalues will have a magnitude less than equal to 1. Furthermore, since A is also Hermitian, its eigenvalues are real and lie in [−1, 1] and hence satisfy Eq. 13. Denoting by λ i , |φ i the eigenvalues and eigenvectors of M and using its spectral decomposition Consequently, we can effectively approximate A t as weighted linear combination of O( √ t) Chebyshev polynomials of Awhile the quantum walk operator introduced in Eq. 7 can be used to individually implement the Chebyshev polynomials, in order to implement their linear combination we use the LCU technique [6]. Below, we show the construction of an operator to effectively apply τ m=0 p m W m to given quantum state. We do this by introducing auxillary qubits and implementing the following unitary V P depending on the coefficients p m : Furthermore, we assume access to a controlled quantum walk operator W τ defined by The operator W τ requires τ calls to the quantum walk operator W . Therefore, following the result in appendix A, it can be constructed with O(Dτ ) calls to the oracles O F , O A . The operator V † P W τ V P then effectively applies the linear combination τ m=0 p τ W τ to an input state. Lemma 8 (LCU adapted from Refs. [6,31]) The unitary operator V † P W τ V P , with V P and W τ defined in Eqs. 15 and 16 respectively, satisfy for some |φ ⊥ .
In order to compute ψ| τ m=0 p m T m (A) |ψ , we consider a controlled version of the operator defined in lemma 8: It then follows from lemmas 3 and 8 that computing the expectation value of the operator |0 0| ⊗ σ z on the last two qubits in the circuit of W c τ on a state prepared by application of HW c τ H (where H is a hadamard gate on the last qubit) to |ψ |0, 0 . . . 0 allows us to evaluate ψ| τ m=0 p m T m (A) |ψ . By truncating the linear combination to an appropriate number of terms using lemma 7 and using amplitude estimation (lemma 6), we obtain the following lemma for estimating the overlap of the matrix with a given state. Proof: The overlap estimation can be done using the Hadamard test described above -as shown in lemma 7, it is sufficient to use τ = O( √ t log(ε −1 )) in the LCU construction described in lemma 8. Furthermore, the outcome of the Hadamard test has a variance bounded by 1 and consequently a direct application of amplitude amplification (lemma 6) allows us to obtain an estimate of ψ| A t |ψ withÕ(D √ t/ε) calls to the oracles O F , O A .
Proof of theorem 2: Combining lemma 9 with lemma 1, we obtain a procedure for solving the matrix powering problem. The complexity result in theorem 2 can be obtained as follows: Given an upper bound C on A 1 , we note that to compute v † A t u to precision ε, we need to compute ψ L,R 1,2 | (A/C) t |ψ L,R

1,2
to a precision at-most ε/4 u v C t which can be done using the algorithm in lemma 1 with

B. Matrix powering with hamiltonian simulation
In this section, we describe an approach to solve the matrix powering problem using Hamiltonian simulation as a primitive and prove the complexity result in theorem 3. Formally for our purposes, a Hamiltonian simulation can be considered to be the problem of computing ψ| e −iHt |ψ to a precision ε given access to the sparse Hamiltonian H and the states |ψ . A Hamiltonian simulator (implemented on a quantum computer or an analog quanutm simulator) is said to be efficient if it can solve this problem inÕ(poly(ε −1 )D H max t) time, where D is the sparsity of the Hamiltonian and H max is its maximum magnitude element. State of the art algorithms for Hamiltonian simulation on quantum computers achieve such run-times for general sparse Hamiltonians, while such run-times can be achieved on quantum simulators for local Hamiltonians.
We again restrict ourselves to stable Hermitian matrices A. In order to compute an overlap ψ| A t |ψ , we expand A t into a fourier series -as is shown in the following two lemmas, this can be done to a precision of ε while retaining only N h = O(t/ε) harmonics.
Lemma 10 ∀ε ∈ (0, 2/π), N h ≥ 4t/π 2 ε, ∃a ∈ C 2N h +1 such that where a n are the elements of the vector a and a can be com- A detailed proof of this lemma, as well as an explicit calculation of the coefficient vector a, is provided in appendix B.
Employing this result, we can now compute the overlap of the power ψ| A t |ψ using an efficient Hamiltonian simulator.
Proof: Since the matrix A is stable and Hermitian, all of its eigenvalues are real and lie in [−1, 1] thus satisfying Eq. 18. Denoting by λ k , |φ k the eigenvalues and eigenvectors of A and using lemma 10, we obtain that for an appropriately chosen N h = O(τ /ε). Furthermore, we note that since A is stable, the magnitude of all of its elements is at-most 1 i.e. A max ≤ 1. Using an efficient Hamiltonian simulator, we can estimate ψ| e iπnA/2 |ψ to a precision ε/2 in time O(poly(ε −1 )Dn). Since a 1 ≤ 1, we can then compute N h n=−N h a n ψ| e iπnA/2 |ψ to a precision ε on a classical computer, thus determining ψ| A τ |ψ to a precision ε -since we need to compute ψ| e iπnA/2 |ψ for . Furthermore, we note that since we propose to compute the overlaps ψ| e inπA/2 |ψ individually for all n, we can compute ψ| A τ |ψ for all τ ∈ {0, 1 . . . t} with the same set of Hamiltonian simulations in time O(poly(ε −1 )Dt 2 ).
Proof of theorem 3: Combining lemma 11 with 1, we obtain a procedure for solving the matrix powering problem. To obtain the complexity result in theorem 3, we note that computing v † A τ u to precision ε, we need to compute ψ L,R 1,2 | A τ |ψ L,R

IV. NO-GO THEOREMS FOR FAST FORWARDING
In this section, we provide no-go theorems stated in section II, showing that fast forwarding the matrix powering problem is not possible for a generic non-Hermitian matrix. These no-go theorems utilize a construction similar to the no-go theorems for Hamiltonian simulation [34], and rely on the fact that even a quantum computer cannot speedup the calculation of parity of N −bits [39,40], a result concretely stated as the following Lemma: Lemma 12 (N −bit parity problem, Refs. [39,40]) There cannot exist a quantum algorithm that can determine the parity b 1 ⊕ b 2 ⊕ . . . b N with success probability greater than 1/2 with fewer than N/2 calls to the oracle O B .
We note that this result rules out even an approximate solution of the N −bit parity problem on a quantum computer with run-time less than O(N ). In particular, since the parity is known to be an integer, if an algorithm can estimate this parity to a precision ε < 1/2 with a confidence level greater than 1/2, then it would have solved the N −bit parity problem -consequently, in the rest of this analysis we can consider the precision ε = O(1). As is shown in the no-go theorem below, we can construct a matrix such that computing its N th power, in the sense specified in section II, allows us to solve the N −bit parity problem, thereby ruling out the possibility of designing a quantum algorithm that can achieve a run-time scaling sublinearly with the matrix power. We first provide a proof of lemma 13, which rules out the possibility of fast-forwarding the powering of a general matrix, and then strengthen it to obtain theorem 4 which rules out the possibility of fast-forwarding even the powering of irreducible matrices.
Lemma 13 (No-go theorem for arbitrary matrix) There cannot exist a quantum algorithm that solves the matrix powering problem inÕ t α poly( u , v , ε −1 ) calls to the oracles O F , O A , with α < 1, for any sparse matrix A.
Proof: Given N bits b 1 , b 2 . . . b N , we can construct the following matrix powering problem that determines the parity of b 1 ⊕ b 2 · · · ⊕ b N on computing its N th power: 1. The matrix A ∈ C 2N ×2N with the rows (or columns) being indexed by (i, σ) where i ∈ [N ] and σ ∈ {0, 1} and matrix elements being given by 2. The vectors u and v are given by The matrix A is a stochastic matrix corresponding to a discretetime Markov chain (DTMC) shown in Fig. 1a -the states of this DTMC can be grouped as per their σ index, and for every bit b i is identified with a flip of the 'σ' index at i. Consequently, on computing v † A t u, we can count the number of bits that are 1 thereby computing While this argument proves that no quantum algorithm can exist to fast-forward the matrix powering problem for a general matrix, we note that the specific matrix A used in this argument is reducible. Consequently, this raises the question of whether their exists a quantum algorithm that can fast-forward the powering of arbitrary irreducible matrices. We show that this too isn't possible by constructing an irreducible stochastic matrix that is very close to the reducible stochastic matrix constructed above, and thus approximately solves the N −bit parity problem.
Proof of theorem 4: We consider an instance of the matrixpowering problem with u, v as defined in Eq. 21 and a matrix A δ = A + Bδ where A is defined in Eq. 20, δ ∈ (0, 1) and matrix B has elements given by: The matrix A δ is the stochastic matrix corresponding to a discrete-time Markov chain shown in Fig. 1b. It is easy to see that A δ is irreducible for δ = 0. Furthermore, we can easily bound the difference between the result of powering A δ and A: v For the choice of u, v under consideration, δ = log(1 + ε/2 √ 2)/8N 2 and t = N it follows that wherein we have used v = √ 2 and u = 1 This shows that being able to compute v † A t δ u to precision ε/2 allows us to determine v † A t u = b 1 ⊕ b 2 ⊕ · · · ⊕ b N to precision ε. Consequently, from lemma 13, the no-go theorem follows for irreducible matrices as well.

V. CONCLUSION
This paper studied the problem of computing the power of a stable Hermitian matrix. Following the construction of Ref. [31], we show that fast-forwarding is possible while powering stable Hermitian matrices and present algorithms based on quantum walks that improve their results. We also present a complementary algorithm to solve the matrix powering problem using only Hamiltonian simulators which could potentially be used on near-term quantum hardware. Finally, by establishing a map between the the N −bit parity determination problem to a matrix powering problem, we show that quantum computers cannot fast-forward powering of non-Hermitian matrices.

ACKNOWLEDGMENTS
We acknowledge support from the ERC Advanced Grant QUENOCOBA under the EU Horizon 2020 program (grant agreement 742102) and from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under the project In this appendix we show how the quantum walk operator W can be implemented with O(D) calls to the oracles O F , O A that access the matrix A, where D is the sparsity. The quantum walk operator is W = V † SV . We have assumed access to a unitary operator V such that We can write V as Therefore, V applies V i depending on the second and third registers depending on the state of the first qubit. The V i act as ψ ⊥ i,j , along with |ψ i , form an orthonormal basis. The expression for |ψ i is where the square root is defined as in A1a. Therefore, we need to show that we can apply the operator U that prepares |ψ i with O(D) queries to the oracles. This can be proved with a slight modification of the procedure in [41].
To do this, we start with the state |i |0 . We then prepare a list with all the neighbours of i: |i |f (i, 1) .. x t sin((2p + 1)πx/2)dx.
We point out that the Eq. B2 can be rewritten in terms of complex exponentials to obtain a fourier series of the form used in lemma 10. Furthermore, c p (t) and s p (t) can be explicitly evaluated to obtain We point out that s p (t) and c p (t) can be computed in O(t) time on a classical computer using a recursive implementation of the summations in Eq. B4. We now consider a truncated fourier series expansion i.e. we construct the functionf N t (x) from the coefficients c p (t), s p (t) wherê It now remains to provide bounds on e N (t) in terms of t and N . We note from Eq. B4 that ∀p > t/π, |c p (t)| ≤ 2 A similar bound holds for |s p (t)|: ∀p > t/π, |s p (t)| ≤ 2 (t+1)/2 k=1 1 ((p + 1/2) 2 π 2 ) k 2k−2 Consequently, it then follows that To ensure that e N (t) is smaller than a given precision ε, we can then choose N to be We point out that for this estimate to be correct, the chosen N should also be larger than t/π (Eq. B9), which is implied by Eq. B10 if the precision ε to be smaller than 2/π and we obtain the estimate provided in lemma 10. Finally, we compute the l 1 norm of the coefficients c p (t) and s p (t). We note that (−1) p c p (t) ≥ 0 and (−1) p s p (t) ≥ 0 for all p ≥ 0. This is easily seen as follows -from Eq. B3, using integration by parts it follows that c p (t) = 2(−1) p t p 2 π 2 − t(t − 1) p 2 π 2 c p (t − 2) s p (t) = 2(−1) p t (p + 1/2) 2 π 2 − t(t − 1) (p + 1/2) 2 π 2 s p (t − 2).