Efficient variational quantum simulator incorporating active error minimisation

One of the key applications for quantum computers will be the simulation of other quantum systems that arise in chemistry, materials science, etc, in order to accelerate the process of discovery. It is important to ask: Can this be achieved using near future quantum processors, of modest size and under imperfect control, or must it await the more distant era of large-scale fault-tolerant quantum computing? Here we propose a variational method involving closely integrated classical and quantum coprocessors. We presume that all operations in the quantum coprocessor are prone to error. The impact of such errors is minimised by boosting them artificially and then extrapolating to the zero-error case. In comparison to a more conventional optimised Trotterisation technique, we find that our protocol is efficient and appears to be fundamentally more robust against error accumulation.


I. INTRODUCTION
Many quantum algorithms have been developed under the presumption that the hardware upon which they will run is effectively error-free: the error rate is so low that the entire algorithm can be executed successfully without a single error. It is now known that such hardware can, in principle, be created using components that have far higher error rates. Fault-tolerant quantum computing can be achieved by encoding qubits in non-Abelian anyons in topological materials [1] or using the quantum error correction codes [2]. While the former is still in the early stages of its development, for the latter approach sub-threshold quantum operations have been demonstrated in ion-trap and superconducting systems [3][4][5][6]. However, quantum error correction involves a substantial multiplication of resources; the number of physical qubits required may be orders of magnitude greater than the number of error-free logical qubits seen by the algorithm. A recent study audited the cost of implementing Shor's algorithm to solve a classicallyinfeasible task, and found that even with state-of-the-art techniques for magic state distillation the machine would need over six million of today's highest quality qubits [7].
The need for millions of qubits contrasts starkly with the fact that only fifty qubits are needed to achieve socalled 'quantum supremacy', i.e. to create a quantum processor that is so complex that conventional supercomputers [8] cannot predict its behaviour. Machines involving this many qubits, under good but imperfect control, are expected to emerge in the next few years. The challenge for researchers is to identify useful functions for such devices, in order to motivate further investment and continue the evolution toward the longer-term goal of fully fault tolerant systems.
Recently some hybrid quantum/classical algorithms have been developed which are promising for near future quantum applications [9][10][11][12][13][14][15][16][17]. A common feature of these algorithms is that the quantum computer is only in charge of carrying out a subroutine, acting as a 'coprocessor' while the larger scale algorithm is governed by a classical computer. The task of the quantum computer is thus simplified and may be accomplished with relatively few quantum operations. A higher error rate per operation is therefore tolerable; in a fault-tolerant machine this would imply a more modest resource overhead for the code, but it may even be possible to implement such quantum algorithms without quantum error correction.
Hybrid approaches are very relevant to quantum simulation, i.e. Feynman's vision [18,19] of using a controlled quantum processor to model another quantum system. Such a technology would be highly advantageous for the investigation of various large quantum systems, e.g. simulating quantum chemistry systems [20][21][22][23], or novel materials and other condensed-matter systems [24][25][26]. A powerful tool that has been exploited in several hybrid protocols is the variational method [9][10][11][12]. Typically the state of the target system can be found by writing a trial quantum state with a large but tractable number of parameters, and then discovering the optimal value of these parameters. Implicitly this requires the scientists to use their understanding of the target system (the novel molecule, or material) to select a set of parameters that, while large, is far smaller than the total number of parameters needed to specify an arbitrary quantum state. The latter is of course exponential in the number of particles composing the target system.
Our focus here is on finding the dynamics of interesting quantum systems, and we briefly remark on the considerable significance of such a capability. Dynamics must be studied when properties cannot be determined from static features. This has motivated dynamical versions of many well-known techniques, e.g. nonequilibrium dynamical mean-field theory [27], the time-dependent variational quantum Monte Carlo method [28], timedependent tensor network methods [29,30], and of course time-dependent density functional theory [31]. However, there are still many problems that cannot be solved using these powerful classical methods, so it is hoped that quantum computers can extend their reach [18,19,32].
We therefore propose a hybrid quantum algorithm for simulating the dynamics of a quantum system. The conventional approach for simulating quantum dynamics employs Trotterisation [19,[33][34][35][36] many quantum operations; therefore it seems likely to necessitate the full machinery of fault-tolerant quantum computing [20][21][22]. Our approach is based on the variational method and our hope is that it could be implemented using small-size quantum circuits, i.e. quantum circuits with a small number of quantum operations that suffer significant noise compared with fault-tolerant quantum computers. A novel feature of our algorithm is that it compensates for errors through classical inference without encoding: If the noise in the quantum computer mainly results in stochastic errors and the rate of errors can be amplified in a controllable way, then we find that errors can be approximately corrected. The condition that noise is stochastic can be met by engineering for many systems: if, for example, single qubit gates have relatively high fidelity [3][4][5][6]37] and their noise is stochastic, then arbitrary two-qubit gate errors can be made stochastic through a twirling-like technique [38][39][40] which we presently discuss. Moreover, the severity of such errors can be deliberately increased artificially, allowing one to create curves that the classical algorithm can extrapolate to estimate the zero-error limit. We performed numerical emulations of the process on small systems, finding that this technique does indeed lead to robustness: The impact of physical errors on the simulator's performance is far lower than in an (optimised) Trotterisation protocol, and moreover this impact does not worsen with the duration of the simulation.
The remainder of this paper is organised as follows. In Sec. II, we review the Trotterisation algorithm and variational methods. In Sec. III, our hybrid algorithm is introduced. In Sec. IV, the variational theory is discussed. In Sec. V, the task for the quantum computer and the overall program are described in detail. Errors in our algorithm are analysed in Sec. VI. The method for reducing errors is given in Sec. VII, in which we also discuss how to convert non-stochastic errors into stochastic errors and how to tune the rate of errors. Numerical results are presented in Sec. VIII. A summary is given in Sec. IX.

II. TROTTERISATION AND VARIATIONAL METHOD
The Trotterisation approach to simulation, which we use as a basis for comparison with our protocol, exploits the fact that time evolution under a general Hamiltonian H = j H j can be approximated according to the Trotter-Suzuki decomposition [33] Here, each term e −iHj τn,j corresponds to the evolution driven by the term H j for a short time τ n,j , which can be realised by a quantum gate or a combination of quantum gates. Usually, the short time is taken uniformly as τ n,j = T /N t , where T is the time of the simulated evolution. When N t is larger, the approximation is better, and errors in the approximation scale with the simulated time and the number of quantum gates as T 2 /N t [35]. Our approach is based on a variational technique. Variational methods have numerous applications in the numerical study of many-body quantum systems, for examples, density functional theory [41], the matrix product state method [42], and simulating molecular dynamics using the variational principle [43]. In these methods, typically a trial state is used to approximate the true state of the system. The trial state must of course be specified by some tractable number of parameters. But since existing realisations are entirely classical, there is a stronger condition on the trial function: it must be possible to efficiently evaluate its fit to the true quantum state using only a classical algorithm. This requirement limits the application of variational methods. Sometimes it may be impossible to evaluate a trial state that provides a good approximation to the true state in a classical computer. In such a case, a quantum computer could be helpful, because we may be able to complete tasks that are difficult for a classical computer using a quantum computer. An example is the unitary coupled cluster method [10,44], in which the energy of the trial state can be evaluated using a quantum computer when it is hard for a classical computer to do so. The protocol we describe here is another example.

III. HYBRID QUANTUM SIMULATION OF DYNAMICS
The purpose of our hybrid algorithm is to solve the Schrödinger equation i ∂ ∂t |Φ(t) = H|Φ(t) ( = 1), assuming that the state |Φ(t) can be approximated by a trial state |Ψ(t) ≡ |Ψ(λ 1 , λ 2 , . . .) , where {λ k (t)} are variational parameters. As shown in Fig. 1(a), the hybrid algorithm is implemented on both a quantum computer and a classical computer. The task of the classical computer is to determine variational parameters according to the Schrödinger equation, and this procedure requires certain derivatives associated with the state |Ψ(t) which the quantum computer provides.
The hybrid algorithm works out variational parameters iteratively as shown in Fig. 1(b). Parameters at the time t ({λ k (t)}) are sent to the quantum computer, with which the quantum computer finds the values required by the classical computer. Based on results from the quantum computer, the classical computer can determine parameters at the time t + δt ({λ k (t + δt)}), where δt is a short time. Then, these new parameters are sent back to the quantum computer. In this way, given parameters of the initial state ({λ k (0)}), parameters at the time T ({λ k (T )}) are systematically inferred by iterating the process carried out by two computers. The simulation is successful if the state |Ψ(T ) is a good approximation of the state |Φ(T ) .
Using the variational method, the degrees of freedom provided by variational parameters allow us to use quantum circuits with a size much smaller than the circuit of the Trotterisation algorithm to simulate the time evolution of a quantum system. Note that this is an 'apples to oranges' comparison because our algorithm only simulates the time evolution of a given initial state while the Trotterisation algorithm simulates the time evolution of arbitrary initial states, i.e. the time evolution operator. Thus our algorithm aims at an easier problem than the Trotterisation algorithm.
Tracking the evolution from a specific initial state is the main goal in many simulations, and other more general tasks can also be reexpressed this way. The approach we describe can be relevant to the specific problem of designing and calibrating quantum gates, thus allowing early quantum computers to aid in the design of their successors. Moreover there are also interesting connections between dynamical simulation and the determination of static properties: one could find a ground state by simulating an adiabatic time evolution [45], thus our algorithm is relevant to that task. In other hybrid algorithms for determining the ground state of a quantum system [9][10][11][12], one may need to find the global minimum of the energy in the parameter space to maximise the fidelity. However, finding the global minimum in a highdimensional parameter space is usually a non-trivial computing task. In our algorithm, parameters are worked out iteratively, therefore the global minimisation is not required. We remark that Trotterisation is used in some hybrid algorithms [10,[12][13][14][15]. In principle our algorithm can be used to replace the Trotterisation method in these instances, to further simplify the task of the quantum computer.

IV. VARIATIONAL THEORY OF QUANTUM TIME EVOLUTION
The time-dependent variational principle corresponding to the Schrödinger equation reads δ t f ti dtL = 0, where the Lagrangian is [46,47] Assuming that the state |Ψ(t) is determined by a set of real parameters {λ k (t)}, i.e. |Ψ(t) ≡ |Ψ(λ 1 , λ 2 , . . .) , the Lagrangian can be rewritten as which is a function of parameters {λ k } and their time derivatives {λ k = dλ k dt }. According to L, the Euler-Lagrange equation describing the evolution of parameters (hence the state |Ψ ) is where Here, η = 1, both M and V are real, and M is antisymmetric. There are other variational principles for the quantum time evolution [48]. For example, McLachlan's variational principle reads δ (i ∂ ∂t − H)|Ψ(t) = 0 [49], which leads to the same equation as Eq. (4) but η = −i. Here, the norm is ψ = ψ|ψ . In the following, we focus on the time-dependent variational principle, but the hybrid algorithm can be adapted to McLachlan's variational principle.
Recall that we can always express a state as |Ψ = n (α n + iβ n )|n , where α n and β n are real, and |n are orthonormal basis states. Taking parameters {λ k } = {α n , β n }, Eq. (4) leads to the Schrödinger equation; but of course we require a parameterisation such that the number of parameters remains tractable for the sizes of target systems that we are interested in. Thus our variational approach, like others, is relevant when the scientist can make an educated guess as to the general form of the quantum state -there can be a large number of free parameters {λ k }, but typically far fewer than would be needed to specify an arbitrary state. ancilla register redundant gates Quantum circuit for the evaluation of certain coefficients required by the classical main program, as specified in the text. To evaluate Re , the ancillary qubit is initialised in the state (|0 + e iθ |1 )/ √ 2 and measured in the |± = (|0 ± |1 )/ √ 2 basis. Here, U k is one of σ k,i , and Uq is one of σq,j or σj (By taking q = Nv + 1, σj is put on the left side of RN v in the product). In the figure, we have assumed that k < q. Gates on the register after the second controlled unitary gate can be omitted. This circuit is actually a variant of the circuit proposed in 2002 by Ekert et al [52,53]. It involves Nv gates on the register, two flip gates (X) on the ancillary qubit, and two controlled unitary gates on the ancillary qubit and the register.

V. VARIATIONAL ALGORITHM ON A HYBRID COMPUTER
We consider trial states that can be directly prepared in the quantum computer, i.e. states can be expressed as |Ψ = R|0 , where |0 is an initial state of the quantum computer, and R is a sequence of quantum gates determined by parameters {λ k }, i.e.
Here, R k is a unitary operator describing a quantum gate, and the total number of gates (i.e. parameters) is N v . If N v is smaller than the dimension of the Hilbert space (2 × dim − 2 to be exact), trial states |Ψ only span a sub-manifold of the Hilbert space. In this restricted trialstate space, Eq. (4) approximates the exact evolution if the exact state is close to the trial-state space.
In the following analysis, we describe each R k gate as dependent on only one parameter λ k . However it is worth noting that the trial state can be generalised to the case where each gate R k depends on multiple parameters, including parameters that vary in a pre-defined way with time, and that both the Trotter-Suzuki decomposition [33] and the unitary coupled cluster ansatz [44] can be expressed in this form. As a generalisation to the case in which the number of gates N v is fixed, one can even vary N v depending on the simulated time, providing that we understand how to re-express the trial state using the new gates (adding gates to the set is of course trivially possible). Our point is that the set of gates with which we create our trial state, can itself evolve over the simulated time.
We are interested in the case where evaluating coefficients M and V in Eq. (4) is intractable in a classical computer, therefore these coefficients are obtained using the quantum computer. Each parameter is determined in turn, by appropriately configuring a quantum circuit involving ∼ N v gates and a single measurement outcome; this fixed circuit is run repeatedly until the expected measurement outcome is known to a given precision. Note that this implies the overall protocol is trivially parallelisable over a large number of quantum processors with no quantum link between them.
We express the Hamiltonian in the form where σ i are unitary operators. In many quantum systems, the number of terms in this expression scales with the size of the system polynomially. Similarly, we write where σ k,i are also unitary operators. For many frequently used single-qubit gates and two-qubit gates (R k ), there is only one term in this expression, and σ k,i is also a one-qubit or two-qubit gate. Because any operator can be expressed using Pauli operators, we can choose unitary operators σ i and σ k,i as (single-qubit and multi-qubit) Pauli operators. Using the expression (9), we rewrite the derivative of the state as where Then, differential equation coefficients can be expressed as and where we have used the expression (8). In Eqs. (12) and (13), each term is in the form where the amplitude a and phase θ are determined by either if * k,i f q,j or f * k,i h j [50], and U is a unitary operator equal to either R † k,i R q,j or R † k,i σ j R. Such a term can be evaluated using the quantum circuit shown in Fig. 2. This circuit needs an ancillary qubit initialised in the state (|0 + e iθ |1 )/ √ 2 and a register initialised in the state |0 . The ancillary qubit is measured in the {|+ , |− } basis after a sequence of quantum gates on the register and two controlled gates, in which the ancillary qubit is the control qubit. The value of each term is given by Re e iθ 0 |U |0 = X = Tr(Xρ), where ρ is the final state of the quantum computer, and X is the x-direction Pauli operator of the ancillary qubit. In the following, we consider the case in which the value of X is estimated by repeating this relatively shallow circuit and calculating the mean value of measurement outcomes. Note that the value of X could be estimated more efficiently using quantum amplitude estimation [51] if error rates could be made low enough to allow a circuit of sufficient depth to function.

Main program
The overall flow of the algorithm is as follows: Firstly, we select initial parameters {λ k (0)}. Secondly, we solve the differential equation (4) numerically using the classical computer, in which the matrix M and the vector V in the equation are evaluated using the quantum coprocessor. The solution permits us to project our parameters forward by a small time increment, and repeat the second step. Eventually we reach the parameters {λ k (T )} which allow us to prepare the final state in the quantum computer.
There are many different numerical methods for solving a differential equation, and the choice of specific numerical method determines the details of the information exchange loop between quantum and classical processors. In the following, we take the Euler method as an example, but the algorithm can be adapted to other numerical methods, e.g. Runge-Kutta methods [54].
Time is discretised as t n = nδt, where t 0 = 0 is the initial time, and t N = N δt = T is the simulated evolution time. Firstly, M and V corresponding to parameters {λ k (t 0 )} are evaluated using the quantum computer. Then, the following process is repeated. Given M and V corresponding to the time t n , Eq. (4) is solved numerically on the classical computer to obtain values of {λ k (t n )}. As parameters {λ k (t n )} have been obtained from previous calculations, one can approximately calculate parameters of the time t n+1 using λ k (t n+1 ) = λ k (t n ) +λ k δt. Repeating the process until t n+1 = T , we can work out the parameters {λ k (T )} of the final state.

VI. ERROR ANALYSIS
There are four types of errors that can result in infidelity in the variational quantum simulation: i) errors due to limited generality of the trial wavefunction, which may only be able to describe the simulated system approximately; ii) errors in the numerical integration obtained by solving Eq. (4), which is always approximate because of the finite discretisation of time; iii) shot noise in measuring equation coefficients M and V ; and iv) errors due to noise in the quantum machine, e.g. decoherence and quantum gate infidelity.
A good trial wavefunction allows us to not only reduce trial-wavefunction errors but also minimise the difficulty of implementing the algorithm, e.g. only use small-size quantum circuits. Whether a good trial wavefunction can be found depends on the simulated system and our understanding of the physics in that system. However, a trial wavefunction that contains a polynomial number of parameters and is a high-fidelity approximation to the true wavefunction always exists. For example, one can set the trial wavefunction in the Trotter-Suzuki form, i.e. take the sequence of gate operations R in the form of Eq. (1), and then take the evolution time of each term as a variational parameter ({λ k } = {τ n,j }) rather than a fixed value as in the Trotterisation algorithm [11]. Using the Trotter-Suzuki-form trial wavefunction, we know that the probability of trial-wavefunction errors decreases with the number of Trotterisation slices N t as 'Error ∝ 1/N t ' [35] in the worst case.
Integration errors depend on the numerical method for solving the differential equation (4). We take the Euler method as an example. In the Euler method, the probability of error is proportional to the size of each step δt [54]. Therefore, by choosing a small step size, integration errors can be suppressed.
Shot noise and machine noise occur in the implementation of the algorithm while trial-wavefunction errors and integration errors are due to the imperfection of the algorithm itself. The effect of implementation errors in the integration process is that coefficients M and V evaluated using the quantum computer are inaccurate, i.e. their values are different from their true values M (0) and V (0) given by Eqs. (12) and (13)

A. Trace distance
To analyse errors in the hybrid algorithm, we use the trace distance D(ρ, ρ ) = 1 2 Tr|ρ − ρ | [55] as the measure of error severity. The degree of error in the overall process is given by D (|Φ(t N ) , ρ N ). Here, |Φ n ≡ |Φ(t n ) denotes the true wavefunction, |Ψ n ≡ |Ψ(t n ) denotes the trial wavefunction, and ρ N is the state prepared in the quantum computer according to the state |Ψ N . The two states ρ N and |Ψ N are different because of the machine noise. The triangle inequality holds for the trace distance, i.e. D(ρ, ρ ) ≤ D(ρ, ρ ) + D(ρ , ρ ). Therefore, an upper bound of D(|Φ N , ρ N ) is given by (see Fig. 3) Here, U n is the exact evolution during the time from where Here, D A corresponds to algorithm errors, and D I corresponds to implementation errors. Algorithm errors in each time step can be expressed as where and Here, we have used Taylor expansions of U n |Ψ n−1 and |Ψ k , which can be expressed as where A is a positive semi-definite matrix and Here, we have used the Taylor expansion of |Ψ n , which is the same as |Ψ Using Q max to denote the maximum value of the quantity Q for all t n , we have The first term of D A and the second term of D I are due to imperfections in approximating the initial state using the trial wavefunction and preparing the final state in the quantum computer with machine noise, respectively. Note that these two terms are not T dependent; for simulations over a substantial time T we may expect them to make relatively small contributions. In the following, we analyse the other three terms one by one. The second term of D A is the accumulation of trialwavefunction imperfections in each time step. Using the circuit in Fig. 2, every term in ∆ (2) can be measured by a method analogous to that used for obtaining M and V . The accuracy is again limited by the shot noise and machine noise. Therefore, algorithm errors due to the trial wavefunction can be continually estimated during the execution of the hybrid quantum computation.
The third term of D A is caused by the finite integration step size, which can be reduced by decreasing δt. In order to limit this term to ε, we need to choose a step size δt ∼ ε 2 /(∆ (3) T 2 ), i.e. the number of time steps N ∼ ∆ (3) T 3 /ε 2 . When the trial wavefunction is a good approximation to the exact state, we can expect that The first term of D I is due to the difference between {λ k } and their true values {λ δV . Here, M and V are evaluated as required by the algorithm. Similar to ∆ (2) , each element of A can also be measured using the circuit in Fig. 2. Therefore, the susceptibility to shot noise and machine noise in the integration process can be estimated during the execution of the hybrid quantum computation. The algorithm is susceptible to implementation errors when M (0) is singular, which should be avoided when choosing the trial wavefunction. As a worst-case scenario, implementation errors accumulate linearly with the simulated time. However, it can be far less severe: In Sec. VIII, we will explore an example in which the accumulation of errors due to the machine noise is almost negligible compared with errors in a quantum simulation based on the conventional Trotter-Suzuki decomposition.
Shot noise can be suppressed by repeating quantum circuits for measuring M and V many times. To measure the quantity X , the deviation due to the shot noise decreases with the number of repetitions N r as δ X ∝ 1/ √ N r . Therefore, if there is only shot noise, we 1/2 . In order to limit the overall effect of shot noise to ε , we need to choose N r ∼ A ∆ 2 T 2 /ε 2 . The number of distinct circuits N c required for finding the M and V parameters depends on N H the number of terms in the Hamiltonian H [see Eq. (8)], N d the number of terms in each time derivative of R k [see Eq. (9)], and N v the number of parameters in the trial wavefunction.
where the first term corresponds to M , and the second term corresponds to V . If each R k is realised by N R gates, and each term σ i or σ k,i in the Hamiltonian or the time derivative of R k is a Pauli operator of less than K qubits, each circuit includes at most N g = N v N R + 2(K + 1) gates. Here, we have assumed that each controlled-U gate in the circuit (Fig. 2) is realised by K two-qubit controlled-σ gates, where σ is a single-qubit Pauli operator. To complete the circuit, other operations include preparing initial states of the ancillary qubit and the register and measuring the ancillary qubit.
The overall computation includes N times steps, in each time step, N c circuits are implemented, each circuit contains N g gates and is repeated N r times. Therefore, the overall number of gates is From this expression, we see that the cost is a polynomial function with respect to the integration error ε, the shotnoise error ε and the simulated time T . Factors A , ∆ and ∆ (3) depend on the form of the trial wavefunction. However, during the actual execution of the hybrid computation, it will be possible to estimate A and ∆. Moreover ∆ (3) ∼ H 3 when the trial wavefunction is a good approximation to the true state. It is important to remember that while the overall gate count will be a large (albeit polynomially-scaling) total, the complete calculation is formed of many small quantum calculations of depth N g . Each small computation is isolated from the others, i.e. there is no shared or persistent quantum resource, and indeed ∼ N c N r such circuits could be performed in parallel using that many separate small quantum computers.

VII. EFFECT OF MACHINE NOISE AND ERROR REDUCTION
Of the implementation errors, machine noise is the more problematic. Shot noise can be suppressed merely by repeating each quantum circuit many times, and the number of repetitions is a polynomial function with respect to the accuracy. Machine noise is less easily dismissed, and it is the focus of this section.
Machine noise need not necessarily result in computing errors. The task of the quantum computer is to evaluate and V = ηV (0) . As long as η is non-zero, the solution {λ k } of Eq. (4) is the same for any value of η. Therefore, only an inhomogeneous scaling of quantum outputs X results in computing errors.
An example of the homogeneous scaling is the case of balanced measurement errors. Errors in the measurement on the ancillary qubit (see Fig. 2) can be modelled as follows: if the state of the qubit is |0 (|1 ), the measurement outcome is correct, i.e. 0 (1), with the probability 1 − p 0 (1 − p 1 ), and the outcome is incorrect, i.e. 1 (0), with the probability p 0 (p 1 ). If there are no other implementation errors, quantum outputs are changed from basis is done by performing a Hadamard gate before measuring the qubit in the {|0 , |1 } basis). If measurement errors are balanced, i.e. p 0 = p 1 , the effect of measurement errors is a fixed scaling factor η = 1 − p 0 − p 1 , which does not result in computing errors. Therefore, our hybrid algorithm is inherently insensitive to measurement errors on the ancillary qubit if these errors are balanced. We remark that if single-qubit gates are reliable, one can flip the qubit before the measurement, so that measurement errors are effectively balanced.
Measurement errors can be corrected even if they are not balanced. If p 0 and p 1 can be evaluated by bench-marking measurement operations, one can easily work out the true value X (0) using the value obtained from the real machine: We remark that when error probabilities are higher, the denominator is smaller, which means that we need to evaluate X with a higher accuracy in order to achieve the same accuracy of X (0) . Next, we show that a similar procedure can be applied to any machine noise if errors due to the machine noise are stochastic with tunable probabilities. We also show how to convert errors in two-qubit entangling gates, which are expected to be the main sources of errors, into stochastic errors if they are not stochastic, and how to simulate stochastic errors to tune error probabilities.

A. Error reduction
Errors in an operation are stochastic if the operation is described by a superoperator N U and N has the form N = (1− )I + E. Here, U is the ideal operation without errors, N is the superoperator describing the effect of the noise, I is an identity operation, and errors E occur with the probability . Here, E is a valid quantum operation, i.e. trace-preserving completely positive map.
Given an initial state |0 , after a sequence of operations, the final state of the quantum computer is where N l U l denotes the lth operation. Taking into account the fact that errors are stochastic, the quantum outcome can be rewritten in the form Here, X = Tr(Xρ), X (0) = Tr(Xρ (0) ) is the value without errors, X (1) = Tr(Xρ (1) ), and The l th term of ρ (1) corresponds to the case in which only the l th operation causes errors and all other operations are ideal. Note that in these equations we have replaced error probabilities l with r l , where r is a convenient scale factor allowing us to write X (r) as a function of r, and X (0) = X (0) . If probabilities of errors are tunable, we can infer the value of X (0) by measuring values of X (r) of a set of different factors r [see Fig. 4(a)]. Clearly r can never be zero for our machine, as this would imply that we have the ability to fully switch off the machine noise, making it perfect. If { l } are the minimum error probabilities allowed by the machine, the minimum value of r achievable by the machine is 1. To infer the value of X (0), Here, |Φ(t) is the true state and ρ(t) is the state prepared in the quantum computer. The Trotterisation algorithm (black curves) and the hybrid algorithm are compared. In the Trotterisation algorithm ρ(t) is prepared according to the Trotter-Suzuki decomposition, and in the hybrid algorithm ρ(t) is prepared according to the trial state |Ψ(t) . We have taken the error rate (2) = 0.1%. The hybrid algorithm without the error reduction (blue curves) is already much more reliable than the conventional Trotterisation algorithm. In the hybrid algorithm, one can further reduce the distance using the error reduction protocol (red curve). The residual distance that cannot be eliminated using the error reduction is mainly due to errors in the state preparation (i. e. D(|ΨN , ρN )). Inset: the parameter δt in the Trotterisation algorithm is optimised to minimise the average distance, and the gray curve is obtained by using the lowest-order symmetric Trotter-Suzuki decomposition (see Appendix D). (b) The difference between the average value of a stabiliser estimated using the quantum computer and its true value. Here, S(ρ) = Tr(S2ρ) and S(|Φ ) = Φ|S2|Φ . Average values of other stabilisers are only slightly different. The true average value of the stabiliser is plotted in the inset. In both figures (a) and (b), light blue and red bands denote the fluctuation due to shot noise: with 68% chance (in total 100 trials), the corresponding quantity is within the band. firstly, we take N X values of r as r 1 , r 2 , . . . , r N X and measure X (r 1 ), X (r 2 ), . . . , X (r N X ) using the quantum computer, where we can take r 1 = 1. For example, in Fig. 4(a), we have taken r 1 = 1, r 2 = 1.5 and r 3 = 2. Secondly, we fit quantum outputs using the function X (r) = X (0) + χr, where χ = − X (0) l l + X (1) . As a result of the fitting, we obtain the value of X (0) , represented by the gray circle in Fig. 4(a). In this way, the first-order contribution of machine noise can be corrected. Similarly, by considering second-order terms in the expansion (28), one can fit data using a function with second-order terms (i.e. r 2 terms) to correct the secondorder contribution of machine noise. Using the extrapolation, we can reduce the effect of the machine noise. However, the final estimation of X (0) may still be different from its actual value, and the error in the extrapolation depends on the shot noise in estimating each X (r).
The error reduction protocol only works for small-size circuits, which are used in the hybrid algorithm while the Trotterisation algorithm usually needs large-size circuits. The true value X (0) can be inferred because the contribution of high-order terms is much smaller than the con-tribution of lower-order terms, i.e. |O(r 2 )| |r X (1) |. The total rate of errors in the quantum circuit with N g gates is ∼ 1 − (1 − ) Ng = N g + N 2 g 2 /2 + · · · . The first term in the expansion corresponds to the first order contribution |r X (1) |, and so on. Therefore, high-order terms cannot be neglected if N g 1. When there are too many gates in the circuit or the error rate is too high, the quantum state will be populated with errors and one cannot retrieve the true value X (0) even if we consider high-order terms in the interpolation. The best experiments to date [5,6] have reduced two-qubit gate infidelity to the range 10 −3 to 10 −4 . With hardware of that kind, our protocol could support hundreds of gates. Thus a simulation of a quantum system using a trial wavefunction with hundred(s) of parameters, may be feasible.
Implementing the error reduction protocol requires knowledge of the inherent machine noise. Therefore, quantum operations need to be benchmarked, e.g. using quantum process tomography [55,56], before the quantum coprocessor is used, and the nature of the machine noise should not vary significantly during the simulation. Alternatively, one can monitor the machine noise in the process by stopping the protocol and benchmarking op-erations, because in the hybrid algorithm the quantum computer only performs small-size circuits and can be stopped at any stage.

B. Error twirling and simulation
Non-stochastic errors can be converted into stochastic Pauli errors using some redundant Pauli gates [38][39][40]. It is a common feature of quantum computing systems that two-qubit gates are the main source of errors, i.e. probabilities of errors in two-qubit gates are much higher than probabilities of errors in single-qubit gates [3][4][5][6]37]. In this case, arbitrary errors in two-qubit gates can be converted into stochastic errors through the use of redundant Pauli gates, without introducing significant additional noise. We consider the controlled-phase gate Λ = (1 1 + σ z c )/2 + σ z t (1 1 − σ z c )/2 as an example, and it is similar for other two-qubit gates, e.g. the controlled-NOT gate. Here c and t denote the control qubit and the target qubit, respectively.
The controlled-phase gate with noise is N Λ U Λ , where U Λ ρ = ΛρΛ † . In general the noise may not be in stochastic-error form, but we can always express the noise operation in the Kraus form, i.e.
In order to convert errors, Pauli gates {1 1, σ x , σ y , σ z } are randomly chosen and applied on each qubit before and after the controlled-phase gate. If the gate before the controlled-phase gate is U , the gate after the controlledphase gate is restricted to be ΛU Λ † in order to let random Pauli gates cancel each other. Here, U and ΛU Λ † are both two-qubit Pauli gates. The circuit is shown in Fig. 4 to a phase factor. As a result, the overall operation isN Λ U Λ , whereN Λ is the superoperator describing the effective noise after applying random Pauli gates. The effective noise superoperator is in the stochastic form [57] (see Appendix C), i.e.
where the fidelity is F Λ = h |α h;0,0 | 2 , and error probabilities are a,b = h |α h;a,b | 2 . Here, [U ] is a superoperator, and [U ]ρ = U ρU † . We have assumed that Pauli gates are ideal. In the case where Pauli gates are not ideal but their error probabilities are much lower than the controlled-phase gate, noise of Pauli gates will be a perturbation to the effective noise of the controlled-phase gate. Error severity in the controlled-phase gate will effectively increase by four units of the single-qubit Pauli gate error rate, and the effective noise of the controlledphase gate may not be fully stochastic if Pauli-gate errors are non-stochastic.
To tune probabilities of errors, we can randomly perform Pauli gates after the controlled-phase gate according to desired error probabilities [see Fig. 4(b)]. Assuming that we want to tune error probabilities from e,f to r e,f , we can perform the Pauli gate σ e c σ f t [(e, f ) = (0, 0)] with the probability (r − 1) e,f . Because we are only interested in the case where r e,f 1, overall error probabilities are approximately e,f + (r − 1) e,f where the first term is due to the raw controlled-phase gate (with noise), and the second term is due to simulated errors using single-qubit Pauli gates.

VIII. NUMERICAL RESULTS -QUANTUM ISING MODEL
To demonstrate the hybrid algorithm, we numerically simulate a small quantum computer using classical computers. We take the quantum Ising model [58] as an example. The Hamiltonian of the model reads Here, n s is the number of spins, and σ z ns+1 = σ z 1 . In our numerical simulations, we take J = B = 1/2 and n s = 3, therefore we need at least four qubits in the quantum computer to implement the hybrid algorithm. The trial state is chosen to be where the initial state |Φ(0) is a one-dimensional cluster state. In the cluster state, qubits are in the eigenstate of stabilisers S j = σ z j−1 σ x j σ z j+1 (j = 1, 2, . . . , n s ) with the same eigenvalue +1, which is prepared by performing the controlled-phase gate on each pair of nearest neighbouring qubits initialised in the state |+ [59]. The evolution of the true state is |Φ(t) = e −iHt |Φ(0) . In this example, the trial state is capable of exactly matching the true state given the correct values of the parameters.
We consider a quantum computer that can initialise qubits in the state |0 , measure a qubit in the {|0 , |1 } basis, and perform single-qubit and two-qubit quantum gates. Quantum gates include the Hadamard gate, Pauli gates, phase gates e iσ z θ , flip gates e iσ x θ , two-qubit gates e iσ z 1 σ z 2 θ , the controlled-phase gate and the controlled-NOT gate. If we have one of the three types of two-qubit gates, the other two can be efficiently realised, e.g. the gate e iσ z 1 σ z 2 θ = H 1 Λe iσ x 1 θ ΛH 1 can be realised using two controlled-phase gates and three single-qubit gates. Here, H 1 is the Hadamard gate on qubit-1. We assume that all three types of two-qubit gates can be directly implemented for simplification. The state |+ is prepared by initialising the qubit in the state |0 and performing a Hadamard gate; the measurement in the {|+ , |− } basis is done by performing a Hadamard gate before measuring the qubit in the {|0 , |1 } basis.
We model the machine noise in the quantum computer as depolarising errors. A qubit may be initialised in the incorrect state (|1 ) with the probability I . The measurement outcome is incorrect with the probability p 0 = p 1 = M . For single-qubit gates, the noise superoperator is For two-qubit gates, the noise superoperator is Here, (1) and (2) are rates of errors per gate. We assume that error rates of all single-qubit gates are the same, error rates of all two-qubit gates are the same, and error rates of single-qubit operations are only one tenth of the error rates of two-qubit gates, i.e. I = M = (1) = (2) /10. Because the size of quantum circuits for implementing the hybrid algorithm is small, we neglect memory errors. In this model of the machine noise, errors in quantum operations are all stochastic, and error rates can be tuned by simulating errors using single-qubit Pauli gates. Numerical simulations are performed to find the trace distance D(|Φ , ρ) between the true state |Φ(t) and the state ρ(t) prepared in the quantum computer according to the trial state |Ψ(t) [see Fig. 5(a)]. Because of the machine noise, ρ(t) is different from |Ψ(t) . The average values of stabilisers can be used to describe the quality of a cluster state. The performance of quantum algorithms in estimating the average values of stabilisers is shown in Fig. 5(b). In our numerical simulations, we have taken (2) = 0.1%, which is the state-of-the-art error rate [5,6]. See Appendix D for some details about our numerical simulations. We observe that in this simulation the hybrid algorithm proves to be far more reliable than the Trotterisation algorithm. The distance in the hybrid algorithm is about ten times lower than the distance in the Trotterisation algorithm, and moreover the increase of the distance as a function of time can be largely suppressed in the hybrid algorithm by using the error reduction scheme [ Fig. 5(a)]. As a result, the hybrid algorithm can provide a much better estimation of the average values of stabilisers [ Fig. 5(b)]. We would like to stress that in order to make the comparison fair, the time interval selected for the Trotterisation algorithm has been optimised to minimise errors (in practice this would be possible only if the distance from the true state can be measured in the quantum computer, which would probably negate the need for simulation). We have also considered the lowestorder symmetric Trotter-Suzuki decomposition [60,61], which can reduce errors due to the Trotterisation compared with the conventional Trotterisation scheme. However, we find that the total errors are more significant using the symmetric decomposition given the gate error rate (2) = 0.1%. In the hybrid algorithm, we have used the fourth-order Runge-Kutta method and taken δt = 2π × 10 −6 to eliminate errors due to the numerical integration. We have neglected the effect of shot noise, therefore all errors are due to machine noise. Given a finite time cost of implementing the hybrid algorithm, we need to consider the effect of errors in the numerical integration and shot noise. Taking into account the effect of these two types of imperfections, we find that the distance in the hybrid algorithm may be increased but is still much lower than the distance in the Trotterisation algorithm [blue and red bands in Fig. 5(a)]: we take δt = 2π × 10 −4 and assume that each circuit is repeated for N r = 10 4 (N r = 10 6 ) times to measure X in the hybrid algorithm (with the error reduction). Increasing the distance only slightly changes the estimation of average values of stabilisers [ Fig. 5(b)].

IX. SUMMARY
We have proposed a quantum algorithm for simulating the time evolution of a quantum system. In this algorithm, both a classical processor and a quantum coprocessor are tightly integrated. Because of the assistance of the classical computer, the algorithm can be implemented with quantum circuits of much less depth (i.e. fewer quantum operations) compared with the canonical Trotterisation algorithm. We discussed the robustness of the algorithm to noise, and we found that the algorithm can automatically correct some errors induced by the noise in the quantum computer. Moreover, by deliberately amplifying stochastic noise in a controllable way, the zero-error limit can be estimated; consequently, the effect of errors can be significantly suppressed without the need for code-based quantum error correction and its concomitant resource overheads. This quantum algorithm can also be parallelised easily. The task of the quantum coprocessor is to repeatedly implement a set of small-size quantum circuits, therefore the computing speed can be accelerated by using a cluster of quantum coprocessors, in which each works independently and there are no quantum channels linking them.
In view of these various merits, we believe that our algorithm is a promising candidate for early-stage nonfault-tolerant quantum computers.

ACKNOWLEDGMENTS
This work was supported by the EPSRC National Quantum Technology Hub in Networked Quantum Information Technologies. The authors would like to acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility in carrying out this work. http://dx.doi.org/10.5281/zenodo.22558.

Note added:
Recently, a highly relevant paper was also posted by Temme, Bravyi, and Gambetta (arXiv:1612.02058). In that work the authors also studied an error mitigation strategy involving deliberate variation of the error severity and subsequent extrapolation to the most likely zeroerror value of their observable. While the authors analyse this as a general technique and investigate higher order corrections, rather than employing the technique in the specific context of quantum dynamical simulation as we do here, the results of our two papers are consistent and can be compared.
In the hybrid algorithm with error reduction, we take r 1 = 1 and r 2 = 2 and fit the function X = X (0) + χr to infer the value of X (0) . To simulate shot noise, we use the normal distribution to approximate the binomial distribution, i.e. the value of X taking into account the shot noise is given by x = 1 − 2p, where p is given by a normal distribution with the mean p 0 = (1 − X )/2 and the standard deviation p 0 (1 − p 0 )/N r .
In the Trotterisation algorithm, the time evolution is simulated using Eq. (1). For the quantum Ising model, the Hamiltonian is decomposed into two terms H 1 = H Z and H 2 = H X . In our numerical simulations, we take N t = ceiling(T /δt), τ n,j = δt if n < N t , and τ Nt,j = T − (N t − 1)δt. Such a method of determining Trotterisation parameters coincides with the optimal choice of the number N t to minimise errors [36]. The state ρ(t) is given by replacing T with t and determining N t and τ n,j using the method we just described.
To find the optimal δt, we consider the average trace distance D(|Φ , ρ) = T −1 T 0 dtD(|Φ(t) , ρ(t)), where |Φ is the true state, and ρ is the state prepared in the quantum computer according to the Trotterisation algorithm. The average trace distance is plotted in Fig. 5(a), taking T = 4π. To obtain other results in Fig. 5, we take δt = 2π × 10 −1.4 in the Trotterisation algorithm, which minimises the average distance.
It is interesting to ask whether second-order techniques, which are known to be helpful in reducing the error in the Trotterisation technique, might have a superior performance here. For the lowest-order symmetric Trotter-Suzuki decomposition [60,61], each time step in Eq. (1) is replaced by N H j=1 e −iHj δt/2 1 j=N H e −iHj δt/2 , where N H is the number of terms in the Hamiltonian. The performance when using this technique is plotted as the grey curve in the inset to Fig. 5(a). We see that the performance is actually inferior to the more basic Trotterisation approach; the explanation is that the potential gains are more than negated because the larger number of gates employed introduces a greater degree of error (due to the 0.1% physical gate errors).