Accelerated quantum Monte Carlo with mitigated error on noisy quantum computer

Quantum Monte Carlo and quantum simulation are both important tools for understanding quantum many-body systems. As a classical algorithm, quantum Monte Carlo suffers from the sign problem, preventing its application to most fermion systems and real time dynamics. In this paper, we introduce a novel non-variational algorithm using quantum simulation as a subroutine to accelerate quantum Monte Carlo by easing the sign problem. The quantum subroutine can be implemented with shallow circuits and, by incorporating error mitigation, can reduce the Monte Carlo variance by several orders of magnitude even when the circuit noise is significant. As such, the proposed quantum algorithm is applicable to near-term noisy quantum hardware.


I. INTRODUCTION
The simulation of quantum many-body systems is one of the main motivations for quantum computing [1]. A lot of quantum many-body problems are intractable in classical computing. An apparent reason is that the Hilbert space dimension increases exponentially with the system size and it is impossible to store the wave function of a large system in classical memory. Quantum Monte Carlo (QMC) is a group of classical algorithms designed to bypass this memory issue. By sampling only the most important part of the configuration space, QMC can solve certain many-body problems at a polynomial complexity, at the cost of introducing small statistical errors. Unfortunately, when applied to fermion systems and real time dynamics, QMC encounters the notorious sign problem, i.e. the target amplitude is a highly-oscillating function with alternating sign. This sign problem results in a variance that increases exponentially in the Monte Carlo simulation [2], forming the dominant limitation of QMC. On the other hand, by mapping the target wave function of the simulated system into the wave function of qubits on a fault-tolerant quantum computer [3], we can reproduce the dynamics of quantum systems while the memory and run time scale polynomially [4]. With the development of the fault-tolerant technologies as a long-term goal, exploring the power of noisy intermediate-scale quantum hardware is of particular importance for near-term applications [5]. In this paper, we establish the framework of quantum-circuit Monte Carlo (QCMC) algorithm, in which quantum computing is a subroutine of QMC. We show that this algorithm has a quantum advantage in solving many-body problems, even on noisy quantum computers.
Since Ulam and Metropolis's pioneering work of using random sampling to simulate real physical systems [6], the Monte Carlo method has grown into a large family of algorithms. Here, we focus on a specific subset of Monte Carlo algorithms, namely, the QMC methods, which are based on real or imaginary time evolution. These methods include Green's function Monte Carlo [7], auxiliary * yli@gscaep.ac.cn field Monte Carlo [8,9], world-line Monte Carlo [10,11], and diagrammatic Monte Carlo [12][13][14][15], and their various variants. In what follows, by QMC, we refer to this subset of algorithms. The other QMC algorithms are based on variational methods [16] but while their connection to quantum computing is also an interesting topic, they are not be covered in this work.
In most QMC methods, we sample the configurations according to a quasi-probability amplitude derived from time evolution. For fermion systems such an amplitude is usually a complex number, which can be positive definite if the system respects certain symmetries. Examples of the latter case include the half-filled Hubbard model with particle-hole exchange symmetry [17,18] and the nuclear system with Wigner-SU(4) symmetry [19,20]. However, a realistic Hamiltonian usually contains terms that break these symmetries and induce oscillating phases in the probability amplitude. As a result, even though QMC methods are very successful in describing certain strongly correlated systems in chemistry [21], condensed matter physics [22], and nuclear physics [7], their application is still rather limited due to the sign problem. Although in some important cases the sign problem can be alleviated using complicated techniques [23], e.g. the complex Langevin method [24,25] or the Lefschetz thimble method [26,27], finding a generic solution is unlikely, as it is proven that the sign problem is NP-hard [2].
In quantum computing, the qubit and time costs for simulating the unitary time evolution of a quantum system scale polynomially with the problem parameters, i.e. the system size, evolution time, and accuracy. Such algorithms include the Lie-Trotter-Suzuki decomposition [4,28,29], the truncated Taylor series [30,31], linear combinations of Lie-Trotter-Suzuki products [32,33], and the random compiler [34]. Based on the simulation of unitary time evolution, one can also simulate opensystem dynamics [35,36], solve equilibrium-state problems [37,38] and find the ground state for certain Hamiltonians [39][40][41]. However, implementation of these algorithms at a meaningful scale usually requires a faulttolerant quantum computer [42,43], on which the logical error rate can be reduced to any level at a polynomial cost in quantum error correction [44]. In recent years, hybrid quantum-classical algorithms have been de-arXiv:2106.09880v3 [quant-ph] 5 Jan 2022 veloped for applications before the era of fault-tolerant technologies [45]. Many such algorithms are based on variational principles for solving the ground-state energy [46,47], real time simulation [48,49] and imaginary time simulation [50,51]. A variational quantum algorithm largely depends on the ansatz, i.e. a parameterised quantum circuit. Some ansatz circuits suffer from the "barren plateaus" problem, which is a vanishing gradient in the parameter landscape, making the algorithm inefficient [52]. So far, a general way to construct a proper ansatz is still lacking. Applied to Hamiltonians with tens to hundreds of qubits, the performance of variational quantum algorithms on a noisy quantum computer remains an open question [53,54].
In this paper, we propose a hybrid non-variational quantum simulation algorithm, i.e. the QCMC algorithm. Contrary to the QMC methods, there is no sign problem in simulating the time evolution using quantum computing. If we can delegate the calculation of the most oscillating part to quantum computing, the remaining calculations in QMC might have a very mild sign problem, or even be free from it when the entire calculation is delegated to quantum computing. To explore this possibility, we carry out the QCMC simulation by sampling random quantum circuits. Several aspects of this hybrid scheme are discussed, including implementation of the time evolution operators, the total computational complexity, the optimal sampling distribution in Monte Carlo, and the error-mitigation techniques. We show that our algorithm is polynomial on a fault-tolerant quantum computer and can reduce the variance of the Monte Carlo estimator even on a noisy quantum computer. As a subroutine of QMC, the circuit depth in quantum computing can be drastically reduced compared with the conventional Lie-Trotter-Suzuki decomposition. Therefore, our algorithm is a suitable candidate for the near-term application of quantum computing.
In the QCMC algorithm, we simulate many-body dynamics by expressing the time evolution operator in a summation form. Each term in the summation corresponds to a quantum circuit configuration. The summation formula is chosen to minimise the circuit depth and variance of the Monte Carlo estimator. We introduce two series of summation formulas based on Lie-Trotter-Suzuki product formulas [55,56]: Pauli-operatorexpansion (POE) formulas and leading-order-rotation (LOR) formulas. Compared with product formulas, in our formulas the algorithmic error converges faster with the time step size ∆t, at the cost of a moderately increased gate number per time step. For example, the second-order LOR formula converges as O(∆t 6 ), which is even faster than the fourth-order product formula. This algorithmic error in QCMC is only due to the variance of the Monte Carlo estimator and can be reduced by increasing the sample number.
We mitigate errors in QCMC in three ways. First, our summation formulas are exact formulas of the time evolution operator for any finite time step size. The product formulas have the decomposition error depending on ∆t, which must be sufficiently small to reduce the error. Exact summation formulas allow us to take a large ∆t (i.e. a small number of time steps) and use shallow circuits to implement QCMC. We remark that the gate number per time step is only moderately increased to implement the proposed summation formulas. Second, we use quantum error mitigation techniques to eliminate the impact of machine errors caused by noise in the quantum computer [48,57,58]. We present two types of circuits: forward-backward circuits have larger depths than compact circuits but provide inherent error mitigation. Alternatively, probabilistic error cancellation is a universal way to mitigate machine errors, which enlarges the estimator variance by a factor depending on the circuit depth [57,59]. Considering probabilistic error cancellation applied to compact circuits, we can estimate the overall variance of QCMC due to both QMC and error mitigation. Third, we minimise the variance, i.e. the statistical error, by taking the optimal time step size. We obtain the minimised variance of QCMC in the form of approximately e 4γhtott , where γ is the increasing rate of the variance, h tot characterises the magnitude of the Hamiltonian, and t is the evolution time.
QCMC has a variance that depends on the rate of machine errors and achieves a quantum advantage even when the error rate is finite. For the second-order LOR formula, rate of increase of variance has the upper bound γ 2.45 0.82 , where is the total gate error rate of one elementary Lie-Trotter-Suzuki product (i.e. the firstorder product for one time step). QCMC is polynomial on a fault-tolerant quantum computer because we can suppress to any small value at a polynomial cost in quantum error correction. Suppose that the variance in classical algorithms is in the same exponential form with a finite increasing rate γ c [2]: the quantum algorithm surpasses the classical algorithms given an error rate of (γ c /2.45) 1/0.82 . As an example, the rate of increase of variance in Green's function Monte Carlo taking the computational basis is γ c = 1 for a large class of qubit Hamiltonians. Compared with this classical algorithm, QCMC reduces the variance by several orders of magnitude even on a quantum computer with significant noise, e.g. by a factor of approximately 4 × 10 4 when h tot t = 4 and = 0.1. As a result, the sample number required in Monte Carlo is reduced by the same factor.
In this paper, we focus on the non-variational simulation of real time evolution. With the real time simulation, we can construct quantum phase estimation circuits [39] and eigenenergy filtering operators [40] to solve eigenstate and finite-temperature problems. The QCMC algorithm also provides a flexible tool for variational quantum algorithms. Here, we present two such examples. First, the ground state and other eigenstates are stationary and do not evolve with time, which leads to a way of ruling out fallacious solutions from the variational quantum eigensolver: if we find that the state evolves in the real time simulation, the initial state must not be an eigen-state. Second, the optimiser in the variational algorithm may get stuck in a local minimum; then, real time evolution can be used to bring the state out of the local minimum without changing the average energy. Note that by using shallow circuits in QCMC, the overall circuit combining the variational ansatz and the time evolution are still within the regime of near-term application.
This paper is organised as follows. In Sec. II, we briefly review Green's function Monte Carlo and auxiliary-field Monte Carlo. In Sec. III, we sketch the QCMC algorithm. Two series of summation formulas are introduced in Sec. IV. Details of the QCMC algorithm are presented in the form of pseudocode in Sec. V. In Sec. VI, we give two types of quantum circuits (i.e. compact circuits and forward-backward circuits) for evaluating transition amplitudes. In Sec. VII, we discuss the optimal distribution for generating samples in Monte Carlo. Two quantum error mitigation protocols using probabilistic error cancellation and forward-backward circuits, respectively, are discussed in Sec. VIII. The QCMC algorithm and the classical QMC algorithm are compared in Sec. IX. In Sec. X, we summarise the conclusions.

II. QUANTUM MONTE CARLO
Many applications of QMC can be formalised as computing the transition amplitude ψ f |e iHt * Oe −iHt |ψ i given the initial state |ψ i , the final state |ψ f , and the operator O. Here, H is the Hamiltonian, and t is a real or imaginary evolution time. For example, the ground state energy of an interacting Hamiltonian can be expressed as where |ψ 0 is a trial ground state, which has a large overlap with the true ground state. A canonical approach is Green's function Monte Carlo [7], in which the transition amplitude is expressed in the path-integral form: where {|r } is an orthonormal basis of the Hilbert space and N is the number of time steps. The path integral is performed numerically using Monte Carlo methods. Auxiliary-field Monte Carlo is another important approach of QMC [8,9], which is characterized by the decomposition of particle-particle interactions into interactions of particles with a group of auxiliary fields, i.e.
Here, A(s, ∆t) is an operator depending on the auxiliary field s. Then, the transition amplitude is expressed as The operator A(s, ∆t) is chosen such that ψ f | · · · |ψ i in the integral can be evaluated on a classical computer. In diagrammatic QMC, the time evolution amplitudes are expressed as perturbative expansions [12][13][14][15]. Suppose that the contribution of an m-th-order term is D(ξ m , x 1 , . . . , x m ): the transition amplitude is a summation of integrals in the form where ξ m is the index of the term and the x are the temporal and spatial coordinates to be integrated. These terms can be represented by Feynman diagrams. In these models, we can develop similar quantum algorithms, in which both the non-interacting time evolution and the interaction vertices can be implemented as a series of operators that can be evaluated on a quantum computer. It often occurs that the amplitude q = ψ f | · · · |ψ i as a function of r in Eq. (2) or as a function of s in Eq. (4) is not positive definite. In this case, we have to use the reweighting procedure by splitting q into its modulus and phase, i.e. q = |q|e iθq , and sample according to a probability distribution P ∝ |q|. The expectation value of the remaining phase e iθq indicates the degree of the sign problem and if it is much smaller than 1 then the sign problem is severe. In many QMC simulations, this phase goes to zero exponentially for a large system volume or particle number, which signifies a very bad sign problem.
In some special cases, the sign problem is only induced by part of the integral variables. In other words, the amplitude q is a highly oscillating function of some variables and a smooth function of the others. This usually occurs when the system is protected by an approximate symmetry. For example, for fermion systems with equal numbers of up and down spins, a spin-independent attractive interaction respecting the SU(2) spin symmetry does not induce the sign problem. In more general problems, the realistic interaction might be dominated by such a "good" component, while other "bad" components play a minor role but induce most of the sign problem. A typical example is the nuclear force, which is approximately independent of spin and isospin at low energy [20]. The spinisospin dependent components and the Coulomb force only contribute a small portion of the total nuclear binding energy but introduce strong a sign problem in the auxiliary field Monte Carlo calculations. Usually, these interactions can be simulated using the coupling constant extrapolation method [60], perturbation theory [61] or
The above problem has an alternative solution in the quantum computing era. As a quantum computer can calculate the amplitude q with the same complexity regardless of the form of the interaction, we can use the quantum computer to simulate interactions causing the sign problem, while leaving the smooth high-dimensional integrals to the classical Monte Carlo solver. For example, in the auxiliary-field Monte Carlo simulation of atomic nuclei [9], we can simulate the repulsive Coulomb force using quantum computing. In this paper, we introduce such a hybrid simulation scheme and establish a general framework for future work in this direction.

III. QUANTUM-CIRCUIT MONTE CARLO
To implement QMC using a quantum computer, we replace the integral over the auxiliary field with a summation over unitary operators. The time evolution operator is expressed in the summation form where the U (s) are unitary operators and the c(s) are complex coefficients. For real time evolution, approximate summation formulas have been proposed, including truncated Taylor expansion [30,31] and linear combinations of Lie-Trotter-Suzuki products [32,33]. In this paper, we propose exact summation formulas of the real time evolution operator (see Sec. IV). Note that we can also construct the imaginary time evolution operator as a summation of unitary operators and construct any operator in the limit that the U (s) form a complete basis of the operator space. By combining quantum circuits and the Monte Carlo method, our exact formulas can be implemented for any finite time step size ∆t. In quantum circuits, the gate number per time step is only moderately increased upon the Lie-Trotter-Suzuki product (see Sec. VI) and we can minimise the number of time steps N by maximising ∆t. Because of the minimised circuit depth, which is proportional to N , our formulas are practical on noisy quantum computers without fault tolerance.
With the summation expression of the time evolution operator, the transition amplitude in the path-integral form becomes where s = (s 1 , . . . , s N , s 1 , . . . , s N ) and One can realise a summation formula either by using a deterministic circuit [30][31][32] or sampling random circuits [33,34]. To minimise the circuit depth, we compute the transition amplitude using random circuits: we sample random unitary operators (i.e. the parameter s) on the classical computer, evaluate ψ f |O s |ψ i on the quantum computer and then compute the path-integral summation using the Monte Carlo method on the classical computer. See Fig. 1 for a schematic diagram of the QCMC algorithm and see Sec. V for details. Without fault tolerance, we use error mitigation techniques to eliminate errors in quantum circuits. In the quantum error mitigation based on quasi-probability decomposition (i.e. probabilistic error cancellation) [57,59], each unitary circuit for evaluating ψ f |O s |ψ i is decomposed into a linear combination of noisy circuits. Then, the overall algorithm includes Monte Carlo summations over unitary operators and also noisy circuits. Details of the error mitigation are given in Sec. VIII. Using our exact formulas of the time evolution operator and assuming that quasi-probability decompositions are also exact, the sampling noise in Monte Carlo is the only source of error in our algorithm.

Sampling noise and normalisation factor
The Monte Carlo summation has a finite variance depending on the sampling approach. To compute the transition amplitude in Eq. (7), we randomly generate samples of s with a probability distribution P (s). According to the importance sampling, the variance is minimised by taking the optimal distribution Implementation of the optimal distribution requires knowledge of | ψ f |O s |ψ i |.
In this paper, we focus on a practical suboptimal distribution where the normalisation factor C A = s |c(s)| determines the variance. Taking the suboptimal distribution, the transition amplitude Formally, we have Taking the suboptimal distribution, the estimator of Here, • Ns denotes the empirical mean taken over N s samples of s. The variance of the estimator is When O is a unitary operator, | ψ f |O s |ψ i | ≤ 1, and the variance has the upper bound In our QCMC algorithm, we use the circuits given in Sec. VI to evaluate ψ f |O s |ψ i . Each quantum circuit reports a probabilistic binary outcome, the expected value of which is either the real or imaginary part of e iθs ψ f |O s |ψ i . We find that the suboptimal distribution (which is suboptimal when we can deterministically evaluate ψ f |O s |ψ i ) is actually the optimal distribution for the probabilistic evaluation without prior knowledge of | ψ f |O s |ψ i | (see VII). Accordingly, the minimum variance is where 2M tot is the total number of quantum circuit shots, and each shot is an implementation of the circuit that returns one binary measurement outcome. We find that ideally C A = 1, i.e. the variance does not increase with the number of time steps. This limit can be approached on a fault-tolerant quantum computer: we take a sufficiently small ∆t, c(1) 1, U (1) e −iH∆t is a Lie-Trotter-Suzuki product, and terms with s > 1 are negligible. On a noisy quantum computer, C A is always greater than one. A large part of our effort is devoted to minimising C A , in order to reduce the variance.

IV. SUMMATION FORMULAS OF TIME EVOLUTION OPERATORS
We look for summation formulas satisfying the following criteria: • The unitary operators U (s) are easy to implement using elementary quantum gates, in order to reduce the gate number.
• The normalisation factor C A is minimised.
• Samples of s can be efficiently generated on a classical computer according to the distribution in Eq. (10).
We propose two types of summation formulas in this paper as examples of the general approach. By adding Pauli operators to Lie-Trotter-Suzuki products, we obtain POE formulas. For an lth-order product formula, the corresponding POE summation formula has the normalisation factor C A = 1 + O(∆t l+1 ). By replacing leading-order Pauli operators with rotation operators, we obtain LOR formulas and the normalisation factor is re- In the following, we first discuss Lie-Trotter-Suzuki product formulas and then introduce our summation formulas.

A. Product formulas
In this section, we review Lie-Trotter-Suzuki product formulas [55,56] and discuss some properties that are important for our discussion. Given the Hamiltonian H = M j=1 H j , where the H j are Hermitian operators, the first-order formula reads Higher-order formulas are defined recursively for any positive integer m by where when m > 1, and p r,m = 2r − (2r) Here, r can be any positive integer. S 2m is a product of 2r + 1 S 2m−2 operators.
For the first-order formula, we define the correction operator where R Pauli-operator-expansion formulas CA = 1 + CL + CT Leading-order-rotation formulas where the leading-order operator is Hermitian. Later, we show that the Hermitian leadingorder operator is important for minimising the normalisation factor C A . For higher-order formulas, the correction operators are where R 2m are Hermitian operators that are independent of ∆t. Because of the symmetric form, V 2m (∆t) = V 2m (−∆t) † for all real ∆t, and R (k) 2m = 0 for all even k [56]. Then, where the leading-order operator is Hermitian. For the second-order formula,

B. Summation formulas
To simplify the quantum circuits, we work with Pauli operators P n = {I, X, Y, Z} ⊗n as the basis of matrix space, where n is the number of qubits. Without loss of generality, we assume that each term of the Hamiltonian is a Pauli operator, i.e. H j = h j σ j , where σ j ∈ P n , and h j is a real coefficient. We define h tot ≡ j |h j |, which characterises the magnitude of the Hamiltonian.
Given the time evolution operator, there exist many different summation formulas e −iH∆t = s c(s)U (s). Each formula represents a sampling protocol in Monte Carlo. For example, Such a formula is impractical, because the computing of the coefficients Tr σe −iH∆t on a classical computer is usually difficult when n is large.
For the practical implementation, we express the time evolution operator in the form where K L and K R are unitary operators in the Lie-Trotter-Suzuki product form, and V is the correction operator, see Eqs. (20) and (23). We apply the Taylor expansion to the correction operator to obtain the summation formula. We divide the Taylor expansion into three parts, V = 1 1−iL+T , where L is the leading-order operator, and T is the high-order operator. The normalisation factor of a POE summation formula is C A = 1+C L +C T , where C L and C T are contributions of L and T , respectively. The normalisation factor of a LOR summation formula is C A = 1 + C 2 L + C T . The normalisation factors of all the formulas are summarised in Table I.

Zeroth-order Pauli-operator-expansion formula
The direct Taylor expansion of the time evolution operator gives the zeroth-order summation formula (29) where the Hermitian leading-order operator is and the high-order operator is The normalisation factor is given by C L = h tot ∆t and C T = e htot∆t − (1 + h tot ∆t).

First-order Pauli-operator-expansion formula
According to the first-order product formula, we express the time evolution operator as We obtain the summation formula by applying the Taylor expansion to each exponential in the correction operator, where and Note that the first term in L 1 is O(∆t 2 ) according to discussions on product formulas. For the first-order formula, the normalisation factor is given by

Second-order Pauli-operator-expansion formula
Similar to the first-order formula, according to the second-order product formula, we express the time evolution operator as The Taylor expansion of the correction operator reads where and According to discussions on product formulas, L 2 only contain ∆t 3 and ∆t 5 terms. For the second-order formula, the normalisation factor is given by C L = 1 6 (2h tot ∆t) 3 + 1 120 (2h tot ∆t) 5 and

Higher-order Pauli-operator-expansion formulas
For the 2mth-order formula, we express the time evolution operator as Then, we can obtain the POE summation formula by applying a Taylor expansion to each exponential in the correction operator V 2m , similar to the firstand second-order formulas. The normalisation factor of the 2mth-order formula is given by Here, the factor λ = 1 + m k=2 (4rp r,k − 1) is due to the backward evolution with the time (1 − 2rp r,m )∆t in the product formula.

Simplified leading-order operators
By combining terms with the same Pauli operator in the summation formula, we can reduce the normalisation factor. For example, if both ασ and −ασ exist in the summation formula, the contribution to the normalisation factor is 2|α|, which is reduced to zero after combining like terms. We apply this approach to F As a result, the leading-order contributions are reduced to C L < 1 2 (h tot ∆t) 2 + 1 6 (2h tot ∆t) 3 in the first-order formula and C L < 1 18 (h tot ∆t) 2 + 1 120 (2h tot ∆t) 5 in the second-order formula.

Leading-order-rotation formulas
The leading-order operator L l is Hermitian, which allows us to reduce its contribution to the normalisation factor C A from O(∆t l+1 ) to O(∆t 2l+2 ). We suppose that the Pauli-operator summation form of L l is where the τ u ∈ P n are Pauli operators. Here, all α u are real because L l is Hermitian, which is the key to LOR formulas. To minimise the normalisation factor, we express the leading-order terms as a summation of rotation operators, where φ = arctan(C L ), β u = |α u |/ sin φ and C L = u |α u |. The normalisation factor contributed by 1 . By using LOR formulas, we reduce the normalisation factor We have introduced all of our summation formulas. We remark that our summation formulas are used for sampling random U (s) rather than sampling quantum operations [34], which corresponds to a summation of completely positive maps instead of operators.

C. Comparison between formulas
Now, we compare different formulas of the time evolution operator in the fault-tolerance limit, i.e. gate errors are negligible. In this case, we can use deep quantum circuits to implement the formulas and take a sufficiently small time step size ∆t. We leave the discussions on noisy quantum computing to Secs. VIII and IX.
When gate errors are negligible, sampling noise is the only source of error for our exact summation formulas. The error due to sampling noise is of approximately A . Therefore, the error for the lth-order POE formula is of approximately 1 √ Ns + 2N √ Ns O(∆t l+1 ) and the error for the lth-order LOR formula is of approximately For Lie-Trotter-Suzuki product formulas, there are two sources of error: the error due to finite ∆t, i.e. the formulas are approximate, and the error due to sampling noise. The error due to finite ∆t is systematic and cannot be reduced by increasing the number of samples. For the l-th order product formula, the error is of approximately 1 √ Ns + N O(∆t l+1 ), where the first term is due to the sampling noise and the second term is due to the finite ∆t. We note that on a fault-tolerant quantum computer, we can use amplitude amplification to accelerate the evaluation of an amplitude of the wave function [65]. Amplitude amplification can be applied to product formulas; how to apply it to our summation formulas is an open question.
We find that for the same order of formulas, our summation formulas have a smaller error than product formulas, due to the factor 1 √ Ns in the ∆t term and the increased exponent of ∆t (for LOR formulas). The reduced error is at the cost of an increased gate number per time step: to implement our formulas, we need to add a correction operator to the Lie-Trotter-Suzuki product for each time step. A correction operator is either a Pauli operator σ or a rotation operator in the form e −iφσ . In Sec. VI C, we show that implementation of the correction operator for POE and LOR formulas requires at most n and 4n controlled-NOT gates, respectively, on an all-toall qubit network (4n − 3 and 8n − 4 gates, respectively, on a linear qubit network). Here, n is the qubit number. Unless the Hamiltonian has the simplest structure, such as the one-dimensional quantum Ising model, it is reasonable to assume that the gate number for the first-order Lie-Trotter-Suzuki product S 1 is more than 2n. Therefore, the gate number increment in each time step is moderate.
The linear combination of Lie-Trotter-Suzuki products can efficiently reduce the error due to finite ∆t [32,33]. The simplest example is e −iH∆t = 4 . We can find that the error for our second-order LOR formula converges faster as O(∆t 6 ), and the gate number is smaller compared with S 2 2 (assuming that the gate number for one S 2 is larger than a correction operator).

V. ALGORITHM
The algorithm consists of three phases. First, the classical computer generates samples of s according to the distribution given by Eq. (10) and composes corresponding quantum circuits. Second, the quantum computer implements circuits to evaluate ψ f |O s |ψ i . Finally, with results from the quantum computer, the classical computer calculates the expected value of e iθs ψ f |O s |ψ i and returns the final estimate of the transition amplitude ψ f |e iHt Oe −iHt |ψ i .
In this section, we present the first and final phases of the algorithm, which are implemented on the classical computer. We leave details of the second phase, i.e., the quantum computing, to Sec. VI. We focus on second-order summation formulas and the algorithms for the other summation formulas are similar.
Our algorithm has some implicit connections to the diagrammatic Monte Carlo, in which the Feynman diagrams represent the perturbative expansions for interacting amplitudes. Similarly, the summation formulas in our algorithm are perturbativelike expansions around Lie-Trotter-Suzuki products. In our case, each term represents a path in the Hilbert space defined by the unitary operator U (s) instead of ξ m and x and these paths constitute the time evolution, which resembles the pathintegral picture. This connection may be further explored to design new quantum algorithms.

A. Sampling algorithm
The Hamiltonian is specified by a vector of real numbers h = (h 1 , . . . , h M ) and a vector of Pauli operators σ = (σ 1 , . . . , σ M ). Given the evolution time t, we need to choose a number of time steps N ; then, the corresponding time step size is ∆t = t/N . These parameters, h, σ, N , and ∆t, are inputs to the sampling algorithm. To present the algorithm in a way that works for both POE and LOR formulas, we introduce an additional input parameter F = P, R to denote POE and LOR formulas,  . . . , WN , W 1 , . . . , W N Output (W , θ). Algorithm 2 Sample generation for one time step.
In the second-order summation formulas, each term is in the form S 1 (− ∆t 2 ) † W S 1 ( ∆t 2 ): In the POE formula, W is always a Pauli operator; in the LOR formula, W is either a rotation operator or a Pauli operator. Taking U (s Here, we use the notations S 1 = S 1 ( ∆t 2 ) and S 1 = S 1 (− ∆t 2 ) † for simplicity. Given the vector of correction the quantum computer can evaluate ψ f |O s |ψ i . In the final phase, the classical computer estimates the transition amplitude by computing the expected value of e iθs ψ f |O s |ψ i . Therefore, the sampling algorithm also needs to output θ s .
Overall, the outputs of the sampling algorithm are W and θ. The procedure for generating W and θ is given in Algorithm 1. Algorithm 2 is a subroutine for processing one time step.

B. Quantum-circuit Monte Carlo algorithm
Using the Monte Carlo summation to compute the path-integral formula in Eq. (7), we need to choose two parameters N s and M s , which are the number of s samples and the number of shots per quantum circuit for evaluating ψ f |O s |ψ i , respectively. Because the transition amplitude is a complex number in general, the quantum computing returns two real numbers a R,s and a I,s , which are estimates of the real and imaginary parts of e iθs ψ f |O s |ψ i , respectively. By computing expected values of a R,s and a I,s , we obtain the transition amplitude ψ f |e iHt Oe −iHt |ψ i up to the factor C 2N A . QCMC is summarised in Algorithm 3.

C. Quantum-circuit Monte Carlo on classical computer
In this section, we show that QCMC with the zerothorder POE formula is equivalent to QMC on a classical computer. In the zeroth-order POE formula, the time evolution operator is expanded into the form e −iH∆t = s c(s)σ s , where the σ s ∈ P n are Pauli operators. We can express a Pauli operator as where x a , z a = 0, 1, and i xaza X xa Z za = I, X, Y, Z is a single-qubit Pauli operator of qubit-a. We consider computational basis states in the form n a=1 |µ a , where µ a = 0, 1. A Pauli operator acting on a basis state always results in a basis state, i.e. σ n a=1 |µ a = n a=1 i xaza (−1) zaµa |µ a ⊕ x a , where ⊕ denotes the modulo 2 addition. Therefore, Pauli operators acting on basis states can be efficiently calculated on a classical computer. Similarly, Pauli operators acting on product states in the form n j=1 |ψ j and stabiliser states [66] can also be efficiently calculated on a classical computer. In the following, we focus on computational basis states.
The zeroth-order POE formula is auxiliary-field Monte Carlo, which takes the space of Pauli operators as the auxiliary-field. Suppose that the initial and final states are computational basis states and O is a Pauli operator. We can evaluate on a classical computer. By expressing the initial and final states as linear combinations of basis states and the operator O as a linear combination of Pauli operators, we can evaluate ψ f |O s |ψ i on a classical computer for the general states and the operator. Therefore, we can implement QCMC with the zeroth-order POE formula without using a quantum computer. Now, we consider a class of Hamiltonians without short-time interference between Pauli operators. Each Pauli operator corresponds to two binary strings (x 1 , . . . , x n ) and (z 1 , . . . , z n ). If the x strings of two Pauli operators σ and τ are different, we have ψ|τ σ|ψ = 0 for all computational basis states |ψ = n a=1 |µ a . The short-time evolution operator e −iH∆t 1 1 − iH∆t acting on a basis state results in We find that there is no interference between the terms if and only if ψ|τ σ|ψ = 0 for all σ, τ ∈ {1 1} ∪ {σ j }: i.e., the Pauli operators in the Hamiltonian have different x strings. For Hamiltonians without short-time interference, the zeroth-order POE formula is equivalent to Green's function Monte Carlo, which takes the computational basis. In Green's function Monte Carlo, we sample states |r ; in QCMC, we sample Pauli operators. Substituting the computational basis for {|r }, the transition amplitude of each time step reads ψ |e −iH∆t |ψ , where |ψ = n a=1 |µ a . For a Hamiltonian without short-time interference, basis states |ψ with nonzero ψ |e −iH∆t |ψ and Pauli operators in {1 1} ∪ {σ i } have one-to-one correspondence in the limit of small ∆t. Therefore, sampling Pauli operators is equivalent to sampling basis states |ψ .
The class of Hamiltonians without short-time interference includes those are hard for simulation in classical computing. In Appendix B, we show that the Fermi-Hubbard model on any bipartite lattice (e.g. the square lattice) can be encoded into a qubit Hamiltonian without short-time interference, using the Jordan-Wigner transformation.

VI. QUANTUM CIRCUITS
We propose quantum circuits for evaluating the transition amplitude of the operator O s , and the gate num-ber per time step is moderately increased upon the Lie-Trotter-Suzuki product. To measure the transition amplitude, we need to introduce an ancillary qubit, which controls the evolution of n qubits representing the system. In Eq. (45), the evolution is driven by Lie-Trotter-Suzuki products R 1 and correction operators W . Our circuits are simplified in two ways: first, we avoid controlled Lie-Trotter-Suzuki products and only use controlled corrections; and, second, the correction operators are either Pauli operators σ or rotation operators e −iφσ .
We propose two types of circuits. For compact circuits, the circuit depth is the same as the Lie-Trotter-Suzuki decomposition with additional controlled-correction gates. For forward-backward circuits, the circuit depth is doubled, but they provide inherent quantum error mitigation. In this section, we also show how to efficiently decompose a controlled-correction gate into elementary gates. We assume that O is a unitary operator, and we can evaluate a general operator by decomposing it into a linear combination of unitary operators.

A. Compact circuit
The compact circuit for second-order formulas is shown in Fig. 2(a). The circuits for the other summation formulas are similar. For the first-order formulas, we remove the S 1 products from the circuit; for the zeroth-order formulas, we remove both the S 1 and the S 1 products; and by adding more S 1 and S 1 products, the circuit can be used for higher-order formulas. If we ignore controlledcorrection gates, the compact circuit for the lth-order summation formula is the same as the circuit for the lthorder Lie-Trotter-Suzuki product formula. Now, we focus on second-order formulas, and the analysis for the other formulas is similar. The final state of the compact circuit (before the basis adjusting gate B) is Measuring the ancillary qubit, we obtain where X a and Y a are Pauli operators of the ancillary qubit. Here, we use Eq. (45). The procedure for evaluating ψ f |O s |ψ i using compact circuits is given in Algorithm 4.
FIG. 2. Quantum circuits for evaluating e iθs ψ f |Os|ψi . The qubit on the top is the ancillary qubit. The empty circle in the blue box denotes a controlled-U gate that U acts on the n qubits when the ancillary qubit is in |0 . Unitary operators Ui and U f prepare the initial and final states, respectively, i.e. |ψi = Ui|0 ⊗n and |ψ f = U f |0 ⊗n . The gate B is for adjusting the measurement basis. For simplicity, we use the notations S1 = S1( ∆t 2 ) and S 1 = S1(− ∆t 2 ) † .

Algorithm 4
Quantum circuit evaluation. Implement the circuit, measure cos θXa + sin θYa and collect the outcome µ R = ±1.

5:
a R ← a R + µ R

7:
a I ← a I + µ I

8:
a R ← a R /Ms 9: a I ← a I /Ms 10: Output (a R , a I ).

B. Forward-backward circuit
The forward-backward circuit for second-order formulas is shown in Fig. 2(b). Compared with the compact circuit, the number of S 1 and S 1 products is doubled. The final state of the circuit is Here, we use Eq. (45). Measuring the ancillary qubit, we obtain The procedure for evaluating ψ f |O s |ψ i is similar to Algorithm 4. Note that evaluating the transition amplitude in this way does not provide inherent error mitigation. We discuss the inherent error mitigation using postselection in Sec. VIII B.

C. Controlled-correction gates
We consider two types of qubit networks. On the all-toall network, controlled-NOT gates on all pairs of qubits are available. On the linear network, only controlled-NOT gates on nearest neighboring qubits are allowed. We use Λ a,b to denote the controlled-NOT gate that a  and b are the control and target qubits, respectively. Because the error rate of controlled-NOT gates is usually much higher than that of single-qubit gates, we only count controlled-NOT gates and minimise their number. A general Pauli operator σ is equivalent to an Xproduct Pauli operator (i.e. a tensor product of I and X) up to a unitary transformation. For the σ in Eq. (47), the transformation isR = H z1(1−x1) S z1x1 ⊗ · · · ⊗ H zn(1−xn) S znxn , where H is the Hadamard gate, and S is the π 4 phase gate. This transformation leads tõ Implementation of the controlled-σ gate on the all-toall network is straightforward. For each qubit with x a ∨ z a = 1, we apply the controlled-NOT gate Λ 0,a , where qubit 0 is the ancillary qubit. The controlled-σ gate is n a=1 Λ xa∨za 0,a , and the number of controlled-NOT gates In the compact circuit shown in Fig. 2(a), there are two controlled-correction gates in each time step, corresponding to W i and W i , respectively. When W i = τ and W i = τ are Pauli operators, we can combine the two controlled-correction gates into one controlled-σ gate in the following way. Note that τ τ = ζσ, where ζ is a phase factor. We apply τ first, then a controlled-σ gate, and finally a phase gate diag(1, ζ) on the ancillary qubit. The overall transformation is equivalent to the two controlledcorrection gates. The total number of controlled-NOT gates is N Λ . Now, we present another protocol for the controlled-σ gate. The circuit is shown in Fig. 3(a), which is formed of three parts: gates transforming a general Pauli operator σ into an X-product Pauli operatorσ, gates transforming σ into the single-qubit Pauli operator X 1 on qubit 1, and the controlled-NOT gate Λ 0,1 on the ancillary qubit and qubit 1. On the linear network, we assume that qubit 1 is next to the ancillary qubit. On the all-to-all network, we can label any qubit as qubit 1; without loss of generality, we assume that x 1 ∨z 1 = 1. In this protocol, there is only one instead of N Λ gates on the ancillary qubit. Because the outcome is obtained by measuring the ancillary qubit, applying fewer gates on the ancillary qubit potentially reduces the impact of errors. Replacing Λ 0,1 with the circuit in Fig. 3(b), we can realise the controlled-e −iφσ gate.
To transformσ into X 1 , we look for a transformatioñ Λ that satisfies σ =Λ † X 1Λ . On the all-to-all network, we takeΛ = n a=2 Λ xa∨za 1,a and the number of controlled-NOT gates for eachΛ is N Λ − 1. On the linear network, we takẽ where x a ↓ z a = (1 − x a )(1 − z a ) and n = max{a | x a = 1}. On the linear network, the number of controlled-NOT gates for eachΛ is (n − 1) + The maximum number of controlled-NOT gates for implementing the two controlled-correction gates in each time step is summarised as follows. On the all-to-all network, the maximum gate number is n for POE formulas, which becomes 2(n − 1) + 1 = 2n − 1 to reduce gates on the ancillary qubit and 2[2(n − 1) + 2] = 4n for LOR formulas. On the linear network, the maximum gate number is 2(2n − 2) + 1 = 4n − 3 for POE formulas and 2[2(2n − 2) + 2] = 8n − 4 for LOR formulas. In the MCQC algorithm, the controlled-correction gates are randomly selected and the gate number could be much smaller than its maximum value. For example, for the POE formula, the Pauli operator of the zeroth-order term in the expansion is the identity.

VII. OPTIMAL DISTRIBUTION
In this section, we derive the optimal distribution of s that minimises the variance in Monte Carlo. Using the protocols in Sec. VI to evaluate ψ f |O s |ψ i , we prove that taking the distribution in Eq. (10) and M s = 1 is optimal and that the minimum variance is given by Eq. (16).
A quantum circuit usually has random measurement outcomes; therefore, the outputs of quantum computing a R,s and a I,s are random variables. We suppose that a ν,s (ν = R, I) takes the value a i with the probability P ν,s,i in the quantum computing; then, its expected value is E [a ν,s ] qc = i P ν,s,i a i . Here, E [•] qc denotes the mean taken over quantum computing runs for the specific s (each run returns an output evaluated using M s shots) and E [•] without the subscript 'QC' denotes the mean taken over both s and quantum computing runs. Using the protocols in Sec. VI, a R,s and a I,s are unbiased estimators of ψ f |O s |ψ i , i.e.
e iθs ψ f |O s |ψ i = E [a R,s ] QC + iE [a I,s ] QC . (55) Let A R and A I be the real and imaginary parts of ψ f |e iHt Oe −iHt |ψ i , respectively.
In the QCMC algorithm, we evaluate the summation A ν = s c s E [a ν,s ] QC using the Monte Carlo method, where Given any probability distribution P (s), we have Therefore, we can estimate A ν by sampling s according to the distribution P (s) and compute the empirical mean of c s a ν,s /P (s). The variance of the estimatorÂ ν with N s samples is where α ν,s = E a 2 ν,s QC . The optimal distribution that minimises the variance is P (s) ∝ |c s |α ν,s , and the minimum variance is Now, we consider that a ν,s is obtained by taking the empirical mean of M s binary numbers. Each binary number takes ±1 corresponding to the measurement outcome of the ancillary qubit (see Algorithm 4). Then, a ν,s follows the binomial distribution and Let M tot be the total number of circuit shots; we have N s = M tot /M s . Substituting α ν,s and N s into Eq. (58), we obtain the variance as a function of M s . Taking M s as a continuous variable, we find that the derivative of the variance with respect to M s is always positive when M s ≥ 1. Therefore, the variance is minimised at M s = 1. When M s = 1, we have α ν,s = 1, and the optimal distribution is P (s) ∝ |c s |. Accordingly, the minimum variance is With Var Â = Var Â R + Var Â I , we obtain the minimum total variance in Eq. (16). Here, we assume that the total number of shots for each of the real and imaginary parts is M tot . We remark that the optimal distribution is obtained by assuming the empirical mean estimator for a ν,s . If we have prior knowledge of the a ν,s distribution, we can use other estimators such as the Bayes estimator to reduce the variance. In the extreme case, suppose that E [a ν,s ] QC is known, the optimal distribution is P (s) ∝ |c s E [a ν,s ] QC | instead of P (s) ∝ |c s |α ν,s (note that in this case, we do not even need the quantum computer).

VIII. QUANTUM ERROR MITIGATION
Many quantum error mitigation protocols can be classified into three categories. In the first category, with knowledge of the error model, we compensate the effect of errors by using approaches such as error extrapolation and probabilistic error cancellation (i.e. quasi-probability decomposition) [48,57,59]. In the second category, data from quantum circuits are processed according to constraints on the quantum state. The protocols in this category include, for example, symmetry-based postselection [67,68] and purification [69][70][71]. There are also protocols, e.g. subspace expansion [58], introduced for specific algorithms, which belong to the third category.
In this section, we first discuss the application of quasiprobability decomposition in QCMC, and then we show that the forward-backward circuit in Fig. 2(b) provides inherent error mitigation based on constraints on the state. The error mitigation increases the variance in Monte Carlo. On a noisy quantum computer, we need to choose an optimal time step size ∆t to minimise the variance. Eventually, the variance is determined by the error rate, which is discussed in Sec. IX.

A. Quasi-probability decomposition
In the quasi-probability decomposition, an error-free quantum operation is expressed as a linear combination of noisy operations. Let G ef = [U ] and G i be the errorfree operation and noisy operations, respectively. The quasi-probability decomposition is in the form where q i are real coefficients, i.e. quasi-probabilities.
Here, U is a unitary quantum gate, [U ](•) = U • U † is the trace-preserving completely positive map of the gate and G i are operations that can actually be implemented on the noisy quantum computer. Similar decompositions can be applied to the initial state and measurement. We take the Pauli error model as an example. Note that a general error model can be converted into the Pauli error model using Pauli twirling [72]. In the Pauli error  (62) p σ 1 is the rate of Pauli error σ, and p = σ =I⊗I p σ is the total error rate. The inverse map of N is also in the Pauli-operation summation form, i.e.
and we can solve coefficients q σ numerically. Without a general analytically expression of q σ , it is sufficient for us to consider the first-order expansion in order to discuss the impact on variance. To the first order, we have Given the inverse map, the quasi-probability decomposition of gate U is Assuming that errors in single-qubit gates are negligible, the composite operation [σ]G can be implemented on the noisy quantum computer by adding a Pauli gate σ after the noisy two-qubit gate G. We note that the assumption of negligible errors in single-qubit gates is not necessary for the quasi-probability decomposition. Now, we apply the quasi-probability decomposition to a quantum circuit. In QCMC, using protocols in Sec. VI, only the ancillary qubit is measured. We can adjust the measurement basis using the gate B in Fig. 2; therefore, without loss of generality, we focus on the observable Z a in the error mitigation. Given a quantum circuit formed of many elementary gates, the mean of Z a reads where ρ = |0 0| ⊗n is the initial state of the quantum circuit, and N G is the number of gates. Suppose that the quasi-probability decomposition of each gate is G ef Each term in the summation is the mean of Z a in a circuit modified from the original one. We remark that errors in the initial state and the final measurement can be corrected in a similar way. We evaluate the decomposition formula in Eq. (68) using the Monte Carlo summation method by sampling random noisy circuits; therefore, such an error mitigation protocol is called probabilistic error cancellation. Similar to QCMC, the sampling of random circuits increases the variance by a factor of C 2 According to the Pauli error model, we where p j is the error rate of the jth gate. We find that the factor C E increases with the number of noisy gates. Therefore, the circuit with fewer gates, i.e. the compact circuit in Fig. 2(a), is preferred.
In previous discussions, we have assumed that errors in different gates are not correlated. To deal with correlations, we need to introduce a general form of the quasiprobability decomposition, where Z a C k is the mean of Z a in the circuit C k , C 0 is the original circuit, and the C k =0 are modified circuits. Modified circuits generated by adding single-qubit operations to the original circuit are usually sufficient for the existence of the decomposition formula. Without correlations, we can work out quasi-probabilities using gate set tomography [59]; with correlations, we can determine quasi-probabilities using data of Clifford circuits, i.e. Clifford sampling [73,74]. Given the quasi-probability decomposition formulas, we can evaluate ψ f |O s |ψ i with error mitigation following the procedure in Algorithm 5. Choose k with the probability |q k |/CE.

B. Inherent error mitigation by postselection
As shown in verified quantum phase estimation [75] and dual-state purification [76], a quantum circuit with the forward-backward structure incorporating postselection is robust to errors. For the postselection, we measure the n qubits representing the system in addition to the ancillary qubit see Fig. 2(b). We only select the state when the measurement outcome is |0 ⊗n , which transforms the final state in Eq. (52) into Measuring the ancillary qubit in the state after postselection, we have In the quantum-circuit Monte Carlo (first-order leading-order-rotation formula), we take ∆t = 0.05 and Ns = 10000. In the classical Monte Carlo (zerothorder Pauli-operator-expansion formula), we take ∆t = 0.01 and Ns = 100000 [77]. The samples are generated according to the distribution in Eq. (10).
where • 0 denotes the mean conditioned on the outcome |0 ⊗n . Solving the equations, we obtain The postselection forces most of qubits into a pure state, which eliminates errors that transform |0 ⊗n into orthogonal states. In addition to postselection, we can purify the ancillary qubit as follows. According to Eq. (70), the state of the ancillary qubit is a pure state when the quantum circuit is error-free. In the tomography purification, we implement the state tomography to the ancillary qubit and compute the eigenstate with the largest eigenvalue of the reconstructed reduced density matrix [76]. Using the eigenstate to compute the three means • 0 , we can make sure that the final result is obtained from a pure state. In Sec. VIII B 1, we demonstrate that the inherent error mitigation can significantly reduce the error in QCMC. Now, we have two protocols using the circuit in Fig. 2(b) to evaluate ψ f |O s |ψ i . In the protocol without postselection (see Sec. VI B), the estimator of ψ f |O s |ψ i is unbiased, and it is optimal to take M s = 1. In the protocol with postselection, the estimator is biased due to the denominator in Eq. (72) i.e. the mean of estimates is not exactly ψ f |O s |ψ i when M s is finite. Therefore, for the postselection protocol, it is necessary to choose a large M s to evaluate each • 0 (such that the bias is small) in order to obtain an accurate final result of the transition amplitude.
The inherent error mitigation increases the variance of QCMC. When the circuit is error-free, the postselection succeeds with the probability If the circuit is implemented for M s shots, only P S M s shots generate effective data on average. When the circuit is noisy, errors transform |0 ⊗n into orthogonal states, which reduces the success rate. Therefore, the number of effective shots decreases with the error rate and the gate number, which causes an enlarged variance.

Numerical demonstration
To demonstrate the inherent error mitigation, we consider the one-dimensional Fermi-Hubbard model and numerically simulate the noisy quantum computing on a classical computer. The Hamiltonian reads where N L = 3 is the number of sites and c i,s is the annihilation operator for the fermion with spin-s on the ith site. This model can be encoded into 2N L qubits using the Jordan-Wigner transformation.
We use the first-order Lie-Trotter-Suzuki product formula and a corresponding summation formula to simulate the real time evolution. The initial state is |ψ i = c † 1,↑ c † 2,↓ c † 3,↑ |Vac , where |Vac is the vacuum state, and we take |ψ f = |ψ i . The observable is O = 2c † 3,↑ c 3,↑ − 1 1 and the simulation is to compute O = ψ i |e iHt Oe −iHt |ψ i . To minimise the variance of QCMC, we first expand the correction operator using Pauli operators, i.e.
where a σ and b σ are real, and We have a 1 1 > 0 and b 1 1 = 0. Then, we take the summation formula where φ = arctan(a −1 1 1 σ |b σ |) and β σ = |b σ |/ sin φ. Using the forward-backward circuit for error mitigation, we find that the impact of machine errors can be significantly suppressed, as shown in Fig. 4. We model the noise in quantum computing using the depolarising error model. For a controlled-NOT gate, the noise map is given by Eq. (62) with parameters p σ = p/15. We neglect errors in the initialisation, single-qubit gates, and measurement. In the numerical simulation, we take the error rate per gate p = 0.03%. The number of controlled-NOT gates for each S 1 is 14 and the simulation involves at most 85 time steps, i.e. the total number of controlled-NOT gates is above 2380. Therefore, the maximum total error rate is above 71.4%. After the error mitigation, we find that the overall accuracy of the summation formula taking N s = 10000 samples is higher than the product formula without machine errors.

IX. QUANTUM COMPUTING VERSUS CLASSICAL COMPUTING
Sampling noise is the main source of error in the QCMC algorithm. The Monte Carlo variance increases exponentially with the evolution time as approximately Ns e 4t∆t −1 ln C A . As summarised in Table. I, , we can reduce the factor to C 4N A = e δ+O(δ k/(k−1) ) for any small δ. We note that k > 1 in lth-order POE formulas with l > 0 and all LOR formulas. For the zeroth-order POE formula, because C A = e htot∆t (i.e. k = 1), we have C 4N A = e 4htott for all ∆t.
It is widely believed that a classical computer cannot simulate the time evolution of general quantum manybody systems at a polynomial cost, which is one of main motivations for quantum computing [1]. The QCMC algorithm with the zeroth-order POE formula is equivalent to a classical algorithm, i.e. Green's function Monte Carlo taking the computational basis, for a large class of Hamiltonians (see Sec. V C). In this classical algorithm, the variance increases exponentially with the evolution time and system size, i.e. approximately 1 Ns e 4htott , whatever ∆t we choose. Here, t is the evolution time and h tot increases with the system size. The variance is up to minimization, e.g. changing the Hilbert space basis [23] and optimising the method for generating samples. Nevertheless, the existence of a generic approach that reduces the exponential scaling to polynomial one is unlikely [2].
On a fault-tolerant quantum computer, we can simulate the time evolution of quantum many-body systems at a polynomial cost. By taking a small ∆t, we can reduce the factor C 4N A to a satisfactory level and ∆t scales polynomially with t and h tot . Therefore, the number of times steps N = t/∆t, i.e. the circuit depth, scales polynomially with t and h tot .
To demonstrate the impact on the sign problem in QMC incorporating quantum computing, we simulate the real time evolution of two models, the Fermi-Hubbard model and the Heisenberg model. We use two formulas in the simulation of each model, the zeroth-order POE and first-order LOR formulas. The zeroth-order POE formula corresponds to a classical QMC algorithm. Because the zeroth-order POE formula only includes products of Pauli operators, we can efficiently evaluate it on a classical computer even when the system size is large. The first-order LOR formula includes products of non-Pauli unitary operators, e.g. the product in Eq. (17). With a quantum computer, we can efficiently evaluate these non-Pauli products when the system is large. For the purpose of comparing the sign problem in two formulas, we evaluate both formulas on a classical computer given that the system size is up to six qubits. The phase average e iθq is used to indicate the sign problem, where e iθq denotes the phase of e iθs ψ f |O s |ψ i . We find that the sign problem is significant in the zeroth-order POE formula, i.e. e iθq converges to zero rapidly with the evolution time (see Figs. 5 and 6). As a result, the estimation of an observable has a large variance. In comparison, the sign problem is mild in the first-order LOR formula, i.e. e iθq is finite. With the sign problem mitigated, the simulation using the first-order LOR formula is accurate for a much longer evolution time compared with the zerothorder POE formula.
On a noisy quantum computer, the cost of simulating quantum many-body systems increases exponentially with the evolution time and system size and the rate of increase decreases with the error rate. Using the quasiprobability decomposition to mitigate errors, the error mitigation enlarges the variance. The variance taking into account quantum error mitigation is approximately 1

Ns C 4N
A C 2 E . We consider the compact circuit in Fig. 2(a). Let be the error rate of one elementary product S 1 (we assume that S 1 has the same error rate), let g be the number of S 1 and S 1 products per time step, and let η be the average error rate of the controlled-correction gates in one time step. The total error rate of one time step is approximately (g + η) . Here, g = 0, 1, 2 for zeroth-, first-, and second-order formulas, respectively. Suppose that the total error rate of other operations [which are out of the bracket in Fig. 2(a)] is and the factor due to error mitigation is C E (1 + 2 ) [1 + 2(g + η) ] N , according to the Pauli error model. Then, we can express the variance in the form where γ = 1 h tot ∆t ln C A 1 + 2(g + η) .
We find that the rate γ decreases with the error rate. Given the error rate of the noisy quantum computer, we choose the time step size ∆t to minimise the rate γ. Taking C A 1 + ξ∆t k , we find that the optimal step size is and the corresponding minimum rate is For the second-order LOR formula, k = 6, g = 2, and ξ < h 6 tot /(2 × 18 2 ). In Fig. 7, we plot the minimum rate computed numerically using formulas of C A in Table. I. For the first-and second-order formulas, we take the upper bound of the simplified leading-order contribution. We find that the second-order LOR formula outperforms other formulas. Taking higher-order formulas does not further reduce γ for the given error rates.
Although the cost of the QCMC algorithm on a noisy quantum computer scales exponentially with the evolution time and system size in the same way as the classical algorithm, the quantum computing can accelerate QMC by reducing the variance. To achieve the computation accuracy δ, i.e. to reduce the variance to δ 2 , we take N s ∼ e 4γhtott /δ 2 . In the classical algorithm, γ = 1. In the quantum algorithm, taking the minimum value for η = 1 [see Fig. 7(b)], we have γ 0.34 when the error rate per elementary product is = 0.1 and γ 0.058 when = 0.01. For h tot t = 4, the quantum algorithm reduces the sample size N s by a factor of approximately 4 × 10 4 when = 0.1 and approximately 4 × 10 6 when = 0.01. The advantage of the quantum algorithm grows when the error rate decreases: fitting to the second-order LOR curve in Fig. 7(b), we have γ 2.45 0.82 .
We can optimise the quantum algorithm to reduce the variance in various ways. First, we can significantly reduce C A for a Hamiltonian with only local interactions. C A is greater than one because of the correction operator, which is used to compensate the difference between the exact time evolution operator and the Lie-Trotter-Suzuki product. According to the Baker-Campbell-Hausdorff formula, this difference is a series of commutators. For local interactions, most of the commutators in low-order terms are zero. In this case, expanding the correction operator according to the Baker-Campbell-Hausdorff formula (instead of the direct Taylor expansion) can reduce C A . Second, similar to the classical algorithm, with some knowledge of ψ f |O s |ψ i , we can optimise the distribution of generating samples to reduce the variance. Third, the variance due to quantum error mitigation can be reduced. In the error mitigation protocol used to estimate γ, we correct all Pauli errors in the circuit, which is unnecessary. Because the ancillary qubit is measured to evaluate the transition amplitude, we only need to correct errors that affect the ancillary qubit. These errors can be identified and corrected by utilising the learning-based approach of error mitigation [73].

X. CONCLUSIONS
In this paper, we propose a QMC algorithm that uses quantum computing as a subroutine, which allows the non-variational quantum simulation to be implemented with noisy intermediate-scale quantum hardware. In our algorithm, we use exact summation formulas to express the time evolution operator. We optimise these summation formulas and quantum circuits to minimise the Monte Carlo variance and circuit depth. The optimal distribution of generating samples in Monte Carlo is derived in the circumstances of probabilistic evaluation using quantum computing. On a noisy quantum computer, we can use probabilistic error cancellation or inherent error mitigation to eliminate machine errors. By choosing the parameter ∆t, we can maximise the quantum speedup given a finite error rate. This scheme illustrates a way of designing quantum algorithms with reduced circuit depth by using Monte Carlo techniques [78].
Our algorithm shows that a quantum computer without fault tolerance can speed up solving practical problems. In terms of algorithmic complexity, quantum computing has an advantage over classical computing in many computational tasks, in the fault-tolerance regime achieved with quantum error correction [79]. Even without error correction, a quantum device can perform tasks that are intractable for classical computers, such as sampling the output of a quantum circuit [80]. Our algorithm is to solve a practical problem, i.e. the non-variational simulation of quantum many-body systems. We theoretically analyse the complexity of our algorithm, i.e. the circuit depth and sampling cost. The complexity is polynomial on a fault-tolerant quantum computer. On a noisy quantum computer, although the complexity is exponential due to the finite error rate, our algorithm can still outperform classical algorithms and speed up Monte Carlo calculations by substantially reducing the sign problem.