Simulating prethermalization using near-term quantum computers

,


I. Introduction
Quantum computers promise to have a great impact on scientific research.A particular example is the study of thermalization of quantum many-body systems.The problem is computationally challenging with classical methods [1][2][3] as it requires simulating the long-time dynamics of large systems.A fault-tolerant quantum computer would render this problem tractable by enabling quantum simulation [4][5][6][7].
Despite impressive recent progress, present day quantum devices are still far from the regime of fault tolerance.Any current quantum simulation is therefore affected by noise and imperfections.In circuit-based quantum computers, continuoustime dynamics can be approximated using, for example, Trotterization [8].With state-of-the-art gate errors [9][10][11][12][13][14], it is however only possible to run simulations with a controlled Trotter error up to short times, which are insufficient to explore thermalization in classically intractable systems (50 or so qubits in two or more dimensions).
In this work, we demonstrate that thermalization can already be observed for much larger Trotter steps than needed to guarantee a bounded Trotter error, making it feasible to study this phenomenon on near-term quantum devices.In this regime, the system may be viewed as subject to a periodic Floquet drive [15][16][17], where one Trotter step corresponds to one period.The fate of Floquet systems at late times has been a subject of recent interest [18][19][20].Even though the system generally heats up to infinite temperatures [21,22], the heating time may be very long if the driving frequency is large compared to all local energy scales [23].The system then prethermalizes [24][25][26][27][28][29]: Before it heats up, its dynamics mirror the equilibration of a closed system.The prethermal regime is relatively easy to access in practice because the Floquet heating time increases exponentially with the driving frequency or, equivalently, the inverse Trotter step size (see Fig. 1).
With this in mind, we define the prethermalized expectation value problem (PEVP): Given a Floquet unitary and a product initial state, what value does a local observable reach in the prethermal plateau?We find that this problem can be solved even in presence of realistic noise.Following a small circuit adjustment, the PEVP turns out to be amenable to a simple but Different regimes of the dynamics of local observables depending on the Trotter step  and the evolution time .The black lines separate three regimes: bounded Trotter error (bottom), Floquet prethermalization (middle), and chaotic dynamics (top).The lower line scales as O max  − /(+1) ,  −1  −  , following error bounds for the  th -order Trotter decomposition with system size .The upper line is determined by the Floquet heating time and scales as  O (1/) .The blue shaded area indicates constant maximum circuit depth, relevant for noisy quantum computers.The grey area is excluded due to the constraint that  ≥ .The red shading highlights where the total time  exceeds a system-dependent (pre)thermalization time scale  th .The prethermalized expectation value problem is experimentally accessible in the purple intersection.arXiv:2303.08461v1[quant-ph] 15 Mar 2023 highly effective error-mitigation scheme based on rescaling survival probabilities.Using this strategy, the error-mitigated PEVP reproduces the equilibrium properties of a model that is closely related to the Hamiltonian underlying the Trotterization.More precisely, the prethermal expectation values describe the diagonal ensemble of this model, which is equivalent to the microcanonical ensemble assuming that the eigenstate thermalization hypothesis (ETH) [30][31][32][33] is valid.Besides its application to the study of thermalization, the PEVP may be viewed as a problem of independent computational interest in the context of demonstrating quantum advantage.
The paper is structured as follows.In Sec.II, we discuss thermalization in Floquet systems and present simulation results for the two-dimensional XY model as an example.We introduce our error mitigation strategy based on the rescaling of survival probabilities in Sec.III, where we also provide a thorough numerical analysis of its performance.Equipped with that, we demonstrate the suitability of the PEVP for nearterm devices by simulating it with realistic noise parameters of superconducting quantum computers.We conclude in Sec.IV.

A. Time evolution on digital quantum computers
The time evolution under a Hamiltonian  can be reproduced on a digital quantum computer using the Suzuki-Trotter decomposition.In its simplest, first-order form, the decomposition approximates the time-evolution unitary  () =  −  by where  = Γ =1   .Each   is a sum of mutually commuting local terms, such that  −   can be efficiently implemented using local gates.The smaller the Trotter step , the more accurate the Trotter decomposition.For the -th order Trotter decomposition [34], which generalizes the previous simple formula, the error of  Trotter () with respect to the desired unitary  () is bounded from above by O ( +1 ), where  is the system size [8].The dependence on  can be eliminated if all quantities of interest are local observables.According to the Lieb-Robinson bound, only a light cone with a radius proportional to the total evolution time  is relevant [35].Therefore, the system size  can be replaced with the size of the light cone ∼   before it reaches the edges of the system, where  is the spatial dimension.We hence require that the Trotter step  be less than O (max  −(+1)/  , () −1/  ) for the Trotterized time evolution of local observables to converge to the continuous evolution under .
We can now define the following computational problem.
within additive error   , where • is the operator norm.
Note that the Trotterization is not uniquely defined by the Hamiltonian and  Trotter must be specified explicitly.The cost of solving this problem on a classical computer generically scales exponentially with either the number of Trotter steps  or the system size  [36], whereas on a fault-tolerant quantum computer, the effort increases at most polynomially with both.The hardness of the problem is further supported by the fact that it becomes BQP-complete at times  = poly() if the Trotter error is negligible [37].In section III, we present evidence that the problem is solvable on noisy quantum computers up to a maximum number of Trotter steps, which is independent of system size.We then show in section III C that noisy quantum devices may reach a classically intractable regime with realistic noise parameters, even when taken into account the overhead of our error mitigation strategy.

B. Prethermalization
Problem 1 is not only interesting from the perspective of dynamics but it can also yield insight into equilibrium properties.In condensed matter or statistical physics, one would typically describe a system in equilibrium in terms of its temperature, or in case of the microcanonical ensemble, its internal energy.Under ETH, the microcanonical ensemble at the mean energy of the state | can be approximated by solving Problem 1.More precisely, in the limit of continuous time evolution, the long-time average of an observable is described by the diagonal ensemble.For a given initial state | and an observable , where  =    |  | is the spectral decomposition of a non-degenerate Hamiltonian [38].Assuming ETH, the expectation value  | | is a smooth function of the energy   up to a small, state-dependent correction [30].The diagonal ensemble is then equivalent to the microcanonical ensemble at energy || provided the energy variance of | is sufficiently small.For observables that are an average of an extensive number of local terms, e.g., the total magnetization per site, we expect the microcanonical ensemble to vary significantly only on an extensive energy scale.It is thus possible to estimate expectation values in the microcanonical ensemble from the diagonal ensemble of states whose width in energy is subextensive.Product states satisfy this condition as their widths in energy are (under weak assumptions) proportional to √  [39].The above discussion shows that it is possible to probe the microcanonical ensemble by solving problem 1 with product initial states at different mean energies.This is, however, challenging with current quantum devices for two reasons.
First, the maximum number of Trotter steps / is limited by the maximum circuit depth in the presence of noise, while the total time  required to reach equilibrium may be large.Therefore, noisy quantum devices are usually unable to reach long enough times with bounded Trotter error.Secondly, the finite calibration precision renders it challenging to get high relative precision in the angle of rotation for gates that are very close to the identity, bounding from below the size of .
We will now argue that it is nevertheless possible to study equilibrium phenomena.Using larger, experimentally feasible Trotter steps can be viewed as applying a periodic Floquet drive.The system can be described by the Floquet Hamiltonian   , which is implicitly defined by The Floquet Hamiltonian is not unique as its eigenvalues are only defined modulo  = 2/, the effective driving frequency.For large , (small ), i.e., outside the Trotter limit, the Floquet Hamiltonian is highly non-local and will cause a generic initial state to heat up to infinite temperature [21,22].Despite this, it is possible to observe (approximate) equilibration if the heating time scale is much greater than the equilibration time scale.This is known as Floquet prethermalization [24,40,41].Fortunately for our purposes, Floquet prethermalization is relatively easy to access because Floquet heating occurs on a time scale   ∝  O ( /  ) , where  is the interaction range and  is the local energy scale, assuming   .We highlight the favorable exponential dependence of   on /  and the fact that   is independent of the system size.
For times much less than   , the system evolves approximately according to an effective Hamiltonian which is close to, but not the same as, the original Hamiltonian .More precisely, the effective Hamiltonian is local and it is given by the  0 -th order Magnus expansion [42,43] of the Floquet Hamiltonian, where  0 = O (/ ) (see Appendix B for details).Observables start to equilibrate under the effective Hamiltonian before eventually heating up.If the equilibration time  0 is much shorter than   , then there exists a prethermal plateau  0 ≤    , during which the expectation value of the observable is approximately constant.We provide a formal definition of a plateau in Appendix A.
The above observations motivate the definition of the PEVP:

Problem 2 (Prethermalized expectation value problem).
Given a unitary  Trotter (), a state | , and a local observable , assume that a prethermal plateau exists between times This problem reduces to solving Problem 1 at time  =  1 .In the following sections, we show using the example of the twodimensional XY model that the prethermal plateau is indeed accessible and that the properties of the effective Hamiltonian closely resemble those of the initial Hamiltonian.We further demonstrate that the PEVP can be solved on a noisy quantum device with realistic parameters up to system sizes for which classical simulation of the dynamics is intractable.

C. PEVP with the XY model
We focus on the two-dimensional quantum XY model on a square lattice for the remainder of this work.We emphasize, however, that the approach can be readily applied to many other models.The Hamiltonian of the XY model is given by where  is the interaction strenth,    ( ∈ {, , }) are spin-1/2 operators on site , and the sum runs over all pairs of nearest neighbors.The model is convenient for digital quantum computers as its two-site interaction generates a partial iSWAP gate, A single Trotter step in a first-order decomposition consists of applying a partial iSWAP gate to each nearest-neighbor pair of qubits.As non-overlapping gates can be performed in parallel, these operations can be carried out in a circuit whose depth is equal to the number of nearest neighbors (4 in the case of the square lattice).
The XY model in two dimensions can be solved with quantum Monte Carlo algorithms [44,45] and thus serves as a good benchmark to our method.It is known to undergo the Kosterlitz-Thouless (KT) transition [45,46] at nonzero temperature.This phase transition can be characterized by the mean-squared in-plane magnetization per site, which is an approximation to the in-plane susceptibility [45].
The mean-squared magnetization can be written as the sum of two-site correlators, which decay exponentially with the distance between the two sites at high temperature.Hence,  2  +  2  decreases with the system size as 1/ in the thermodynamic limit.Below the critical temperature, the system exhibits quasi long-range order.The mean-squared magnetization decays only as 1/ 1/8 and its value remains non-negligible for moderately large systems [45].
In analogy to the long-time average that gives rise to the diagonal ensemble, we probe the prethermal plateaus using the Floquet time average as in Definition 1, where the Trotterization is shown in the appendix in Fig. 6a.We explore this quantity using exact diagonalization on a square lattice with  = 4 × 4 spins and open boundary conditions.Figure 2a shows the values of the mean-squared in-plane magnetization for the initial state . The different colors indicate the Trotter step size  or, equivalently, the driving frequency  = 2/.The initial state is close to the ground state of the XY Hamiltonian.We therefore expect the in-plane magnetization to remain high in the prethermal plateau, provided the effective Hamiltonian does not differ too much from the XY model.
We indeed observe prethermal plateaus for large driving frequencies ( ≥ 8), and these last for  > 10 3 / when  ≥ 9.The plateau values approach the diagonal ensemble value (black dashed line) with increasing driving frequencies.They deviate only slightly due to the correction in the Magnus expansion, which will be discussed later in this subsection.This confirms that the dynamics with fast Floquet drive are similar to the dynamics of the original Hamiltonian in this prethermal regime.By contrast, no plateaus are observed at low driving frequencies, where the time average of the meansquared magnetization quickly drops to expected value at infi-nite temperature, 2/.
We may perform the same analysis for different initial states.We choose product states in which the spins on the two sublattices of the square lattice are in the respective states |, 0 and | − ,  , where |,  = cos(/2) |0 + sin(/2)   |1 parametrizes an arbitrary state of a qubit (spin-1/2).This choice of states allows us to cover a wide range of the spectrum while ensuring that the total magnetization in the  direction vanishes.The latter constraint is convenient because the Hamiltonian conserves the total -magnetization,   =  =1    /.Thermalization therefore occurs in the eigenspaces of   .Low-energy product states however are not eigenstates of   .By choosing the expectation value of   to be zero, we maximize the overlap of the product state with the sectors of low -magnetization, for which we expect similar equilibration dynamics.
We find that all product states of the above form exhibit prethermal plateaus at similar driving frequencies and evolution times.We evaluate the prethermal values of the in-plane magnetization by performing the Floquet time average up to time  = 20/ with driving frequency  = 8.The result is shown for various initial states as a function of their mean energy in Fig. 2b.For comparison, we also show the diagonal and microcanonical ensemble values of the initial XY model, as well as the diagonal ensemble one of the first-order Magnus expansion of Floquet Hamiltonian, given by Here,  () is the piecewise constant Hamiltonian corresponding to the different terms of the Trotter expansion Eq. (1): where 1 ≤  ≤ Γ. Definitions of the different ensembles and higher orders of the Magnus expansion can be found in App.A and App.B, respectively.The values at the prethermal plateau are close to those of the diagonal ensemble  (1)  Magnus , indicating that the first-order truncation already serves as a good approximation for Floquet Hamiltonian in the prethermal regime.In Appendix B, we show that the higher orders lead to no significant improvement for  = 8.The thermal equilibrium values of the initial XY Hamiltonian, in both the diagonal and the microcanonical ensemble, deviate slightly from the Floquet values.Nevertheless, the comparison indicates that the prethermal properties of the Floquet system can reveal nontrivial thermal properties of the XY Hamitlonian.

A. Rescaling of survival probabilities
Without mitigation, noise will frustrate any naive attempts to observe prethermal plateaus on current quantum hardware.
As we show in Appendix C, noise provides an additional heating source to the Floquet driving already discussed; one that we expect to be far stronger with today's error rates, and one without favourable scaling in the system size.It is therefore desirable to develop an error mitigation technique to estimate the result of a noiseless quantum circuit from multiple measurements in a noisy circuit [47][48][49].However, we do not see a reliable method for extracting the desired noiseless results from measurements of the noisy state as this would imply the ability of inferring low-temperature results from high-temperature ones.
To circumvent this issue, we avoid direct tomography of the time-evolved observables on the noisy state.Instead, we convert observable estimation into a survival probability circuit, in a manner similar to that used in out-of-time-order correlators (OTOC) [50] or echo verification circuits [51,52].Following forward evolution, we apply the observable and then evolve backwards in time, followed by a projection onto the initial state (see Fig. 3a).This yields a survival probability of the form In the following, we drop the label  for notational simplicity.
For this procedure to work,  must be a (local) unitary.For spin systems, it is possible to write any observable as a sum of products of unitary Pauli operators and to measure each Pauli operator separately.Although   () only gives the expectation value of an observable up to a sign, one can infer the sign by tracking it from the known initial value, assuming | ()| is a smooth function [53].This simplifies previous Loschmidtecho style methods for learning | ()| , which required ancilla qubits, the preparation of large Greenberger-Horne-Zeilinger (GHZ) states [51] or intermediate re-preparation and measurement of qubits [52].As we will now demonstrate, a simple rescaling is remarkably effective at mitigating errors in the estimation of the survival probability.The strategy is based on the observation that the survival probability is approximately proportional to the probability of no error occurring.The reason is that the state becomes highly entangled during the evolution, at which point a single-qubit error results in an orthogonal state with high probability.To be more concrete, consider a single Pauli error    occurring at time  <  at site  and set the observable  to be identity.The survival probability is then given by [Tr (  ( )   )] 2 , where   ( ) is the reduced density matrix of |( ) at site .If this site is entangled with the other parts of the system, the reduced density matrix will be close to the identity (completely mixed) and the survival probability will be close to zero.
The above discussion suggests that the survival probability with noise is related to the noiseless value, times the probability that no error has occurred.For concreteness, we consider error models in which a single-qubit noise channel N  is applied to each qubit after every layer of unitary gates.Here,  is the probability that the channel causes an error on the qubit.The state of art gate error rate is around 0.5% for two-qubit gates [13,14], motivating our choice of  = 0.3% per qubit per gate as the reference value in our model [54].
Denoting the survival probability in the presence of noise by  N   (), we then expect that where  is the number of qubits and  is the circuit depth including both forward and backward evolutions.Crucially, no independent knowledge of the noise channel is required to estimate   ().By setting  = 1, we obtain  N  1 () ≈ (1 − )   since the noiseless survival probability satisfies  1 () = 1.Hence, where the right-hand side can be obtained from measurements on the noisy quantum device.We can make this argument more rigorous for channels that can be represented in terms of unitary Kraus operators.For such channels, the probability that a particular error occurs is independent of the state.This class of channels includes depolarizing and dephasing noise as well as all other Pauli channels [55].The survival probability after the noisy circuit can be expressed as where  N   () is the mixed state after the noisy forward evolution [56].We write the state where |  =  / Trotter () | is the state after noiseless forward evolution and  = (1 − )  /2 is the probability that no error occurred during the forward evolution.The density matrix ρ is the state conditioned on at least one error having occurred.The survival probability in noisy simulation then becomes Defining  = √︃ Tr ρ2 , we can use Cauchy-Schwarz inequality to obtain (see Appendix F) Since 0 < ,  ≤ 1,  N   ()/ 2 serves as a good approximation of   () when  .This condition can be satisfied over a broad range of parameters because  typically decays with the system size.In the most extreme case of global depolarizing noise, ρ is a completely mixed state, for which  2 = 2 − .The condition   then gives rise to for some constant .For  = 0.3%, this evaluates to  < 230 in the thermodynamic limit.For more general types of noise,  we similarly expect the scaling with  2 to hold up to some constant circuit depth in the thermodynamic limit.The noisy survival probability at this constant circuit depth will, however, decay exponentially when increasing the system size such that exponentially many measurements are required to resolve the signal.Nevertheless, we will show below that the number of measurements remains experimentally feasible in superconducting quantum devices for moderately sized systems with realistic error rates.
Two situations where Eq. ( 12) fails directly follow from our argument.One is the case when  approaches , as already discussed.The other is when the initial state does not thermalize.For example, the product state |+ = |0 ⊗  is invariant under the (Floquet) XY Hamiltonian and thus will not get entangled.However, even in this case Eq. ( 12) works well for many practical channels because two independent errors are unlikely to cancel each other.

B. Numerical results
We now numerically verify these considerations for the Floquet evolution of the XY model described in Sec.II C in the presence of local depolarizing noise.For each qubit, the noise channel is given by Other types of noise are discussed in the Appendix E. In Fig. 3b  and c, we respectively show  N  1 () and  N   () for the initial state | = |+ for different system sizes.The computations were performed using the Monte Carlo wavefunction method with the Cirq library [57].Each data point in the figure corresponds to an average over 2000 quantum trajectories.This number of trajectories is sufficient to observe convergence of the mean value in the region of our interest.The results agree well with Eq. (11).This also holds for different types of noise as we show in Appendix E. We note that the data points start to deviate from the estimated black dashed lines at   approximately linear in , in line with the expectation from Eq. (17).
To quantify the error of the mitigation strategy, we define Figure 4a shows the distribution of  of the mitigated data from Fig. 3.The error remains small for depths up to  ≈ 100.To compare different noise rates, we plot in Fig. 4b the square root of the moving average of  2 for different values of .Similar plots for types of noise other than depolarizing noise are presented in Appendix E. For reference, the typical value of   () in the simulation is around 0.3, which indicates that for circuit depth  = 80, the relative error is around 10% for  = 0.3%.Although these results confirm the effectiveness of our error mitigation strategy, we also observe a systematic shift of  towards positive values.This can be explained by the error terms in Eq. (15).Let us assume for simplicity that ρ = 1/2  , from which it follows that where we used the fact that  2 = 1 since  is hermitian and unitary.Hence, For certain error models, it may be possible to remove this systematic error by using a more complicated rescaling formula instead of (12).Nevertheless, the systematic error remains small as long as  2  Tr ( ρ2 ).We will now argue that our mitigation strategy enables the observation of prethermalization on current and near-term quantum devices.After Trotterization, the total required circuit depth  to simulate time evolution of the two-dimensional XY model up to time  max is which, from left to right, represents the number of layers per Trotter step, back and forward evolution, and the number of Trotter steps.To see prethermalization of the Floquet XY model, Fig. 2 indicates that  max should be at least 8/ for  = 8, which yields  ≈ 80.The estimation is within the limit of the maximum circuit depth from Eq. ( 17) and Fig. 4 for  = 0.3%, showing that our proposal is suitable for current and near-term quantum devices.
We have now gathered all the ingredients for the full simulation of the PEVP on a noisy quantum device.We consider the two-dimensional XY model on a 4 × 4 square lattice in the presence of depolarizing noise with noise rate  = 0.3%.For the observable, we focus on the correlator  = 4     +1 of a pair of neighboring sites at the center of the lattice.In Fig. 5, we plot the time averages of () 2 at driving frequency  = 8 as a function of the initial state energy  up to  = 10, corresponding to circuit depth  = 80.The initial states were chosen from the same set as in Fig. 2b.The black crosses represent the noise-free results, whereas for the red points the experiment was simulated including noise and error mitigation.The error bars show statistical errors due to fluctuations of different Monte Carlo trajectories, propagated from the standard deviations of  N   () and  N  1 ().Note that the sign of | ()| turns out to be constant during the Floquet time evolution in our range of simulations.In the long-time limit, the time average of the square is therefore equivalent to the square of the time average, given that they converge to a constant.
We find that the noise-free results lie within the error bars for all initial states and that the trend of the observable is well reproduced.This shows that our error mitigation procedure is viable to solve the PEVP.We note that the deviation between the noisy and noise-free results is biased since the red points are systematically above the black crosses, consistent with the expectation from Eq. ( 21).

C. Implementation
The results of the previous section show that our error mitigation strategy enables the solution of the PEVP for the XY model at a depolarizing noise rate of  = 0.3%.One more step remains to assess the experimental viability: an estimate of the number of required measurements.
In experiments, the survival probabilities are estimated from binary outcomes (success / failure).This gives rise to shot noise, which in turn sets a lower bound on the necessary number of samples.To achieve a statistical uncertainty of , roughly 1/ 2 samples are needed.For the error mitigation scheme to work, the shot noise must be smaller than the survival probability.As the noisy survival probability is suppressed by the factor (1 − )   , it follows that the number of needed measurements scales as (1 − ) −2  .We note that this number of samples is typically orders of magnitude larger than the number needed to suppress the fluctuations in Monte Carlo trajectories due to noisy dynamics.
Since the sample complexity scales exponentially with the number of qubits, this is an important limitation to the system size that can realistically be reached.Nevertheless, classically hard regimes are accessible with realistic parameters.For instance, setting  = 50 while keeping  = 0.3% and  = 80, we find that (1 − ) −2  ≈ 3 × 10 10 samples are needed.This is inconveniently large as current superconducting quantum devices can collect millions of samples on the time scale of minutes.However, a modest improvement in the error rate to  = 0.2% reduces the number of samples to a much more realistic value of 9 × 10 6 .
We have so far neglected the role of measurement errors, which occur with probability   ≈ 1% − 2% for each single qubit measurement in current devices [14,58].Fortunately, these errors are automatically remedied by our error mitigation strategy.The measurement errors simply suppress the survival probability by another factor (1 −   )  , which is independent of the circuit depth.For system sizes up to  = 50, this increases the required number of measurements by at most an order of magnitude.

IV. Summary and outlook
We have proposed the prethermal expectation value problem as a way to study thermal observables on noisy, intermediatescale quantum devices.Our approach relies on the observation that relatively large Trotter steps, which do not permit a rigorous bound on the Trotter error, can give rise to prethermalization.We showed that in the prethermal regime, the equilibration of observables is similar to the expected dynamics under the original Hamiltonian.It may be possible to approximate evolution under the original Hamiltonian even better by cancelling higher-order terms of the Magnus expansion at the cost of more complex circuits.The range of energies at which the observables can be probed is set by the range of energies of the used intial states.We restricted ourselves to product states for this work, but the protocol can straightforwardly be extended to different initial states, which may increase the range of accessible energies.
We further demonstrated that the prethermal regime is experimentally accessible with noise rates of near-term devices using an error-mitigation scheme based on measuring and rescaling survival probabilities.This scheme is not limited to the PEVP but can be applied much more broadly in the context of quantum simulation.Our work provides all necessary ingredients to also study the approach to equilibrium and to extract, for instance, diffusion constants.Alternatively, one could consider the quantum dynamics of models which do not thermalize, such as quantum scars [59,60] or many-body localized systems [61,62].
Our work creates a new avenue to demonstrating useful quantum advantage on noisy devices.Although the XY model studied here can be efficiently simulated on classical computers with quantum Monte Carlo methods [45], our approach can be readily adapted to more complex Hamiltonians.As a simple modification of the XY model, one might consider adding a site-dependent sign to the interaction strength .This renders classical simulation of this model much harder since it causes a sign problem in quantum Monte Carlo methods [63][64][65].The complexity of our proposed approach to quantum simulation however remains unaffected by this modification.Hence, quantum advantage may be within reach for studying the equilibrium properties of Hamiltonians with a sign problem.

Appendix B The Magnus expansion
The Magnus expansion serves as a series expansion for the effective Hamiltonian of a Floquet driving  () with period : In general, the Magnus expansion is not convergent [16,43] and thus higher order contributions are not negligible for finite driving frequencies.Nevertheless, its finite truncation is still expected to approximate the quasi-stationary prethermal plateau [24].To be more precise, let  () Magnus =  =0 Ω  denote the -th order truncated effective Hamiltonian, then there exists  0 = O (/ ) such that The general estimation Eq. (B2) for the unitary evolution operators has a linear dependence on system size, which does not imply prethermalization for  exp(O (/ )).When considering local observables acting on a subsystem  and short-range interacting Hamiltonians, however, the bound can be tightened for the reduced density matrix   : for the same  0 , where the system size dependence is erased [24].
For the proof of this relation to hold rigorously, the required driving frequency is  ≥ 16  ≈ 100 for nearest neighbour interacting Hamiltonians, while in our numerical simulation in Fig. 2, prethermalization has occurred for  ∼ 8.For all of our numerical simulations of the XY-model, we use the Trotterization shown in Fig. 6a.In Fig. 6b-c the differences between Floquet evolution and its Magnus expansions up to the third order are plotted.Note that the zeroth order Magnus expansion is just the original non-Floquet Hamiltonian.For  = 8, it turns out that the  = 1 case already gives a good approximation of the Floquet Hamiltonian.

Appendix C Difficulty of error mitigation in time evolution
The difficulty of error mitigation of observables by measuring them directly can be explained in the following two ways.First, if we take the formalism as in Eq. ( 14), the aim will be to obtain   () =   | |  from Although the second term vanishes for global depolarizing channel and traceless , one can not use the same trick as Eq. ( 12) to directly estimate , since setting  = 1 would not give any meaningful output.Of course, it is in principle still possible to measure the survival probability with backward evolution that approximates  2 and take its square root.In the latter circuit, however, any coherent noise will partially cancel in forward and backward evolutions, which gives a different value of  from the one we need in Eq. (C1).
Alternatively, we can think about the problem using a random walk picture, where an initial state will be quickly heated during time evolution on noisy digial simulators, because of the strong energy dependence of the density of states (DOS).Let us consider the quantum trajectory simulation process of a noisy circuit.Assume the absolute average energy change per error to be a constant  > 0 and denote the expectation value of the energy of the simulated state after  errors by   .The probability of increasing or decreasing energy after each gate of noise will be P( +1 =   + ) P( +1 =   − ) = DOS(  + ) DOS(  − ) .

(C2)
For short-range interacting and locally bounded Hamiltonians, the DOS converges weakly to a Gaussian in the thermodynamic limit [68]: where  is a constant depending on local energy scale.(C6) It gives rise to an exponential decay in energy with regard to the circuit depth.In other words, the initial state will be heated to infinite temperature, and this process is much faster than the heating caused by Floquet driving in the prethermal regime.Post-selection error mitigation strategies for direct time evolution would then imply that it is possible to extract low temperature properties from higher temperatures.There is no reason to assume that this would be the case, especially in the case when phase transitions exist.
FIG. 1.Different regimes of the dynamics of local observables depending on the Trotter step  and the evolution time .The black lines separate three regimes: bounded Trotter error (bottom), Floquet prethermalization (middle), and chaotic dynamics (top).The lower line scales as O max  − /(+1) ,  −1  −  , following er- FIG. 2. (a) Prethermal plateau of the 2D XY model for system size  = 4 × 4. The initial state is |+ .The colored lines show the time averages of the mean-squared in-plane magnetization for different Trotter step sizes , corresponding to different driving frequencies  = 2/.The large circles stand for the starting and end points of the plateaus according to Definition 4 with tolerance  = 0.05 and a maximum value of  2  of 10 3 .The black dashed line represents the value in the diagonal ensemble of the initial Hamiltonian.(b) Comparison of the value at the prethermal plateau with the values in the microcanonical and diagonal ensemble values of the initial XYHamiltonian and in the diagonal ensemble of the first-order Magnus expansion.The system size is 4 × 4. The driving frequency is as  = 8 and the plateau value is taken from the time average at  = 20/, which is on the prethermal plateau for all computed initial states with tolerance  = 0.05.For the microcanonical ensemble, we average over an energy window of width  = 0.5 in the   = 0 subspace (see Appendix A).

FIG. 3 .
FIG. 3. (a) Quantum circuit to map the expectation value of a (unitary) observable onto a survival probability.The initial state is prepared with ,  =  1 ,  2 ,  3 or  4 is a single step in the Trotter decomposition, and N denotes a local noise channel.(b), (c) Dependence of  N  1 () and  N   () on the circuit depth  and system size  in the presence of depolarizing noise with error probability  = 0.3%.The initial state is | = |+ .The observable  = 4     +1 is a correlator in the center of the lattice.The black dashed lines represent the scaling predicted by Eq. (11).

FIG. 4 .
FIG. 4. (a) The mitigation error  N   for the range of data in Fig. 3 where  N  1 () > 0.01.We choose this cutoff due to the limited number of trajectories in the simulation, which limits the significant digits.(b) The root-mean-square of , √︁ data  2 / data , evaluated over the window of circuit depth [ − 16,  + 16] in (a).

FIG. 5 .
FIG. 5.The time average of       +1 ,  () at  ≈ 7.85 with  = 8, corresponding to 10 Trotter steps.The black crosses show the noiseless result.The red points were obtained by applying our error mitigation strategy to noisy simulations with a single-qubit depolarization rate  = 0.3%.Error bars indicate the statistic errors due to fluctuations of different Monte Carlo trajectories, propagated from the standard deviations of  N   () and  N  1 ().The system size  = 4 × 4.

U 1 U 2 U 3 FIG. 6 .
FIG. 6. (a): Trotterization of 2D XY Hamiltonian.The circles represent qubits and the rectangles on the bounds represent twoqubit evolution gates on neighbouring qubits.(b-c): Simulation of evolving Floquet XY model with the Magnus expansion truncated to -th order.Their differences from Floquet evolution are plotted.The initial state is |+ and  =  2  + 2  .(b):  = 4, (c):  = 8.Note the difference in scale on the y-axis.The system size is  = 4 × 3.