Composite QDrift-Product Formulas for Quantum and Classical Simulations in Real and Imaginary Time

Recent work has shown that it can be advantageous to implement a composite channel that partitions the Hamiltonian $H$ for a given simulation problem into subsets $A$ and $B$ such that $H=A+B$, where the terms in $A$ are simulated with a Trotter-Suzuki channel and the $B$ terms are randomly sampled via the QDrift algorithm. Here we show that this approach holds in imaginary time, making it a candidate classical algorithm for quantum Monte-Carlo calculations. We upper-bound the induced Schatten-$1 \to 1$ norm on both imaginary-time QDrift and Composite channels. Another recent result demonstrated that simulations of Hamiltonians containing geometrically-local interactions for systems defined on finite lattices can be improved by decomposing $H$ into subsets that contain only terms supported on that subset of the lattice using a Lieb-Robinson argument. Here, we provide a quantum algorithm by unifying this result with the composite approach into ``local composite channels'' and we upper bound the diamond distance. We provide exact numerical simulations of algorithmic cost by counting the number of gates of the form $e^{-iH_j t}$ and $e^{-H_j \beta}$ to meet a certain error tolerance $\epsilon$. We show constant factor advantages for a variety of interesting Hamiltonians, the maximum of which is a $\approx 20$-fold speedup that occurs for a simulation of Jellium.


I. INTRODUCTION
Quantum simulation of quantum systems, suggested by Feynman in 1982 [1], provides what is perhaps the most natural application for quantum computers, with expected exponential advantages over the best classical algorithms [2]. The quantum simulation problem can be summarized as follows: given a Hamiltonian $H$, a time $t$, and an error tolerance $\epsilon$, we wish to implement, on a quantum computer, an approximation $f$ to the time evolution operator $e^{-iHt}$ such that $\|e^{-iHt} - f(iHt)\|_\kappa \leq \epsilon$ for some distance norm $\kappa$. The significance of this operator comes from quantum mechanics, where $e^{-iHt}$ is the solution to the Schrödinger equation with a time-independent Hamiltonian, $i\partial_t |\Psi\rangle = H|\Psi\rangle$, in units where $\hbar = 1$. Motivation for quantum simulation is drawn from the fact that it is a necessary subroutine for phase estimation, which allows one to learn the eigenvalues of $H$ [3,4]. This is of particular interest in quantum chemistry and materials [5-7], where Hamiltonians are large and analytic solutions to the eigenvalue problem are not known. In addition, quantum simulation also allows for the study of dynamical quantum observables [8]. Classically, implementing the time evolution operator is a difficult problem due to how rapidly the matrices grow with the size of the system of interest; quantum effects such as interference are also hard to simulate classically. This problem has received extensive attention over the last few decades, and many different solutions have been proposed. These solutions can, in some sense, be broken down into two families: those built on product formulas, including Trotter-Suzuki formulas [2,9,10] and QDrift [11-14], and those implementing more intricate Hamiltonian transformations such as Linear Combinations of Unitaries [15,16] and Qubitization [17-19]. Put simply, if we have a Hamiltonian that can be expressed as a sum of terms $H = \sum_j H_j$, a product formula is an approximation of the form $e^{-iHt} \approx \prod_j e^{-iH_j t}$, which is not exact in general due to non-zero commutators in the Taylor series. In the absence of fault-tolerant quantum computers, powerful methods have also been discovered to study the properties of $H$ on classical computers, particularly in the field of Quantum Monte-Carlo [20,21]. Here, the propagator of interest is often the time evolution operator Wick transformed to imaginary time, $e^{-\beta H}$: $it \to \beta$. This is used both as a projection method to study the ground state of $H$, and to study the partition function $Z = \mathrm{Tr}\, e^{-\beta H}$ from statistical physics, where $\beta$ is the inverse temperature.

II. PRELIMINARIES
A. Notation

Throughout, we assume that these operators can be written as a finite sum of terms, $H = \sum_{i=1}^{L} H_i$, with each $H_i \in \mathbb{C}^{2^n \times 2^n}$. Later, in Section IV, we will discuss algorithms for geometrically-local Hamiltonians, in which case each term in this sum will only be supported on a finite subset of $\mathcal{H}$; we save those details for later. Note that we can normalize the terms in the above sum by factoring out their respective spectral norms (Schatten infinity norm on matrices), $h_i = \|H_i\|$, such that $H = \sum_i^L h_i H_i$. We also impose $h_i \geq 0$ by pulling any phase factors into $H_i$; this can always be done without loss of generality. The Schatten $p$-norms will be a useful concept throughout, and they are defined as $\|H\|_p = \left(\sum_i s_i^p\right)^{1/p}$, where $s_i$ are the singular values of the matrix $H$.
From this definition, we observe that the Schatten infinity norm coincides with the spectral norm. Other norms will be discussed throughout the paper. Wherever a norm $\|\cdot\|$ is lacking a subscript, it will be assumed to be the Schatten infinity norm. All other norms will be subscripted appropriately. In what follows, Hamiltonian operators are taken to be time independent, and they generate dynamics through the time evolution operator $U = e^{-iHt}$. Throughout, we make use of the density matrix formalism to describe mixed quantum states, with quantum channels, rather than the evolution operator $e^{-iHt}$, being the objects that evolve states. By quantum channels we mean a mapping $\mathcal{A}(\rho) = \sum_i E_i \rho E_i^\dagger$ that is completely positive and preserves the trace, $\mathrm{Tr}(\rho)$. For this to be true, the operators $E_i$ must satisfy the condition $\sum_i E_i^\dagger E_i = \mathbb{1}$. These conditions are required so that information regarding the channel operation is preserved and not lost to an external environment. For example, consider performing a measurement on the output density matrix $\mathcal{A}(\rho)$. If we find that $\mathrm{Tr}(\mathcal{A}(\rho)) < 1$, we conclude that the mapping does not provide a complete description of the processes involved, since there must be some other process(es), and therefore measurement outcomes, that occur with non-zero probability [4]. For easy distinction, we reserve calligraphic fonts for channels $\mathcal{A}$, while operators $A$ are written in the standard way. Given a density matrix $\rho(t_0)$ that describes the state at some initial time $t_0$, the time evolution channel is then defined in the following way:
$$\mathcal{U}(\rho(t_0), t) := e^{-iH(t-t_0)}\, \rho(t_0)\, e^{iH(t-t_0)} \to \rho(t).$$
When not specifically defining evolution channels, we will often use $t_0 = 0$ and write $\rho(t=0) = \rho$ for the initial state. If time evolution is broken into $m$ time intervals $\{\Delta t_1, \Delta t_2, \ldots, \Delta t_m\}$, the state vector evolution looks like $e^{-iH\Delta t_m} \cdots e^{-iH\Delta t_2} e^{-iH\Delta t_1} |\psi\rangle$. However, in the channel description we do not take products of channels; rather, we apply a channel multiple times in a channel composition, which we denote with $\circ$. For example, $\mathcal{U}(t_{i+1}) \circ \mathcal{U}(t_i)(\rho) = e^{-iHt_{i+1}} e^{-iHt_i} \rho\, e^{iHt_i} e^{iHt_{i+1}}$.
Note that in this expression, $t_i$ corresponds to the time interval during which the channel acts. A significant portion of this paper is dedicated to the investigation of imaginary-time channels $\mathcal{U}(\rho, \beta) = \frac{1}{N} e^{-\beta H} \rho\, e^{-\beta H}$ and the algorithms that closely approximate them. Upon first glance, the notion of imaginary time seems to violate the trace preserving property of a quantum channel, so we introduce $N$ as a normalization factor. To avoid confusion, we use $\beta$ throughout these discussions instead of $t$, motivated by statistical mechanics, where $\beta$ is considered an inverse temperature. In these cases, we are not considering a dynamical evolution of the system in the physical sense, but are rather interested in calculating properties of the Hamiltonian $H$, such as its partition function $Z = \mathrm{Tr}\, e^{-\beta H}$ or its ground state. This is possible because time evolution of a quantum system is isomorphic to the cooling of a statistical ensemble via a Wick rotation. We wish to make clear that all discussions pertaining to algorithms in imaginary time are not being proposed for digital quantum simulation, but as classical algorithms for use in calculations like Quantum Monte-Carlo. For a study of imaginary time evolution on a quantum computer, see [25].
B. Algorithmic Cost Model
Given that we are dealing with channels containing only product formulas, a natural cost model is to count the number of exponential gates of the form $e^{-iH_j t}$ required to achieve some error $\epsilon$ in a simulation. Here, $H_j$ is a specific term chosen from the sum of Hamiltonian terms and $t$ is an appropriately-chosen time slice of the total simulation time; the following sections on the algorithms of interest explain how to choose these time slices to meet a desired simulation accuracy $\epsilon$. Throughout, the aforementioned cost will be denoted $C$ and often subscripted with the algorithm of interest. This cost model is convenient analytically, as the algorithms we explore contain an exact number of iterations or samples of terms that can be easily equated to a count of exponential gates. This situation is different from common forms of cost in the literature, such as query complexity, which requires assumed access to an oracle that necessarily has an implementation overhead. In our case, the overhead would come within the implementation of each $e^{-iH_j t}$, which depends on the form of the Hamiltonian terms. In the case of common Pauli and Fermionic Hamiltonians, circuits are well known and relatively simple to analyze [26]. For example, it is common in Heisenberg-like models to have interactions that are products of Pauli operators, such as $\sigma^\nu_i \sigma^\nu_{i+1} \sigma^\nu_{i+2}$. The time evolution of these operators, $e^{-it(\sigma^\nu_i \sigma^\nu_{i+1} \sigma^\nu_{i+2})}$, can be simply implemented using rotation gates $R_\nu(\theta) = \exp(-i\frac{\theta}{2}\sigma^\nu)$. In fact, using the following circuit we can implement the time evolution for any product of Pauli operators using just $R_z(\theta)$: this circuit effectively implements $\exp(-i\frac{\theta}{2}\sigma^z_i \sigma^z_{i+1} \sigma^z_{i+2})$, and we can build a product of any length by continuing this pattern. Switching this to a product of $\sigma^x$ or $\sigma^y$ is straightforward, given that $R_x = \hat{H} R_z \hat{H}$ and $R_y = (T^2\hat{H})\, R_z\, (T^2\hat{H})^\dagger$, where $\hat{H}$ is the Hadamard gate. We can then adjust the simulation time $t$ through our choice of angle $\theta$. Fermionic Hamiltonians follow in a similar way, given that we can map them to a Pauli Hamiltonian using the Jordan-Wigner transformation, which will be outlined in Section V A 1. The example circuit also reinforces the need for product formula approximations: while we can simply implement the above-mentioned products of operators, implementing $e^{-it(\sigma^x + \sigma^z)}$ is highly non-trivial, and we approximate it as $e^{-it\sigma^x} e^{-it\sigma^z}$, as we detail in the following sections. In our cost model, this approximation would be counted as 2 gates. Numerically, counting exponential gates is also a convenient cost model, since it allows for straightforward counting of gates as they are applied.
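The CNOT-ladder construction described above can be checked numerically. The following sketch is our own minimal NumPy implementation (with an arbitrary test angle; the paper's circuit figure is not reproduced here): it builds $e^{-i\frac{\theta}{2}\sigma^z_0\sigma^z_1\sigma^z_2}$ from two CNOTs, a single $R_z(\theta)$, and the uncomputing CNOTs, and verifies it against the direct matrix exponential. In the cost model above, this entire block would be counted as one exponential gate.

```python
import numpy as np

theta = 0.37
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def kron(*ops):
    out = np.eye(1)
    for op in ops:
        out = np.kron(out, op)
    return out

def cnot(n, control, target):
    # CNOT among n qubits, built from projectors on the control qubit.
    P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    ops0 = [I2] * n; ops0[control] = P0
    ops1 = [I2] * n; ops1[control] = P1; ops1[target] = X
    return kron(*ops0) + kron(*ops1)

def rz(n, qubit, angle):
    R = np.diag([np.exp(-1j * angle / 2), np.exp(1j * angle / 2)])
    ops = [I2] * n; ops[qubit] = R
    return kron(*ops)

# Compute the parity of qubits 0,1,2 onto qubit 2, rotate, uncompute.
# Matrix products are right-to-left: the rightmost gate acts first.
circuit = cnot(3, 0, 1) @ cnot(3, 1, 2) @ rz(3, 2, theta) @ cnot(3, 1, 2) @ cnot(3, 0, 1)

# Direct exponential of the (diagonal) Pauli product Z⊗Z⊗Z.
ZZZ = kron(Z, Z, Z)
exact = np.diag(np.exp(-1j * theta / 2 * np.diag(ZZZ)))
```

Because $\sigma^z\sigma^z\sigma^z$ is diagonal, the direct exponential is immediate; the agreement is exact, with no global-phase discrepancy.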

C. Trotter-Suzuki Formulas
Given an operator that can be written as a finite sum of terms $A = \sum_i^L A_i$, the simplest approximation for the corresponding operator exponential is $e^{\sum_i^L A_i} \approx \prod_i^L e^{A_i}$. While this holds as an equality for exponentials of scalar variables, it does not in general hold for operator exponentials, since the individual terms of $A$ do not necessarily commute. This can be easily seen by considering the Taylor series of the simple exponential approximation $e^{(A+B)x} \approx e^{Ax} e^{Bx}$ to second order in $x$:
$$e^{Ax} e^{Bx} - e^{(A+B)x} = \frac{x^2}{2}[A, B] + O(x^3).$$
An error clearly arises due to a non-vanishing commutator, meaning the approximation is exact only for commuting terms. However, the nature of this error opens up the possibility of improving the approximation.
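The leading error term can be verified directly. The sketch below uses random Hermitian test matrices of our own choosing (not a system from the paper) to confirm that $e^{Ax}e^{Bx} - e^{(A+B)x}$ agrees with $\frac{x^2}{2}[A,B]$ up to third-order corrections in $x$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)

def random_hermitian(d):
    # A generic (non-commuting) Hermitian test matrix.
    M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (M + M.conj().T) / 2

A, B = random_hermitian(4), random_hermitian(4)

x = 1e-3
lhs = expm(A * x) @ expm(B * x) - expm((A + B) * x)   # full product-formula error
rhs = (x ** 2 / 2) * (A @ B - B @ A)                  # predicted leading term

residual = np.linalg.norm(lhs - rhs, 2)               # should be O(x^3)
```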
If one can symmetrize this formula, the second order commutator will vanish. In reference to the previous example, a perhaps obvious symmetrization is $e^{Ax/2} e^{Bx} e^{Ax/2}$, leaving only error terms in $O(x^3)$. More generally, for an operator $A = \sum_i A_i$, the second order Trotter-Suzuki decomposition is given by
$$S_2(Ax) = \prod_{i=1}^{L} e^{A_i x/2} \prod_{i=L}^{1} e^{A_i x/2}, \qquad (5)$$
where the flipping of indices in the two products indicates reversing the ordering of the products. One can readily imagine that it is then possible to continue this strategy, building higher order formulas that cancel out arbitrarily high commutator errors in the product formula approximation. Indeed, we can build these highly non-trivial decompositions in the following way [27,28]:

Definition 1 (Trotter-Suzuki Product Formula). Given a linear operator $A$ acting on a finite dimensional Hilbert space that can be represented as a finite sum of linear operators $A = \sum_i^L A_i$, a Trotter-Suzuki product formula approximation $S_{2k}(Ax)$ for $A$ of order $2k$ ($k > 1$) can be constructed in the following recursive fashion:
$$S_{2k}(Ax) = \left[S_{2k-2}(u_k Ax)\right]^2\, S_{2k-2}\!\left((1 - 4u_k) Ax\right)\, \left[S_{2k-2}(u_k Ax)\right]^2,$$
where $u_k = \left(4 - 4^{1/(2k-1)}\right)^{-1}$.
Hence, in order to obtain a desired product formula of order $2k$, we start with the second order formula $S_2(Ax)$ given in Equation (5) and recursively build the higher order formulas. To reiterate, the advantage of doing so is that for a product formula of order $2k$, the error of the approximation is in $O(t^{2k+1})$, with $t$ here as the time interval [28].
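This recursion is easy to implement. The sketch below assumes the standard Suzuki coefficients $u_k = (4 - 4^{1/(2k-1)})^{-1}$ and uses a two-term toy Hamiltonian of our own choosing; it checks numerically that halving the step $x$ shrinks the error by roughly $2^{2k+1}$, i.e. about $8$ at second order and about $32$ at fourth order.

```python
import numpy as np
from scipy.linalg import expm

Z = np.diag([1.0 + 0j, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
terms = [-1j * Z, -1j * X]   # A_i = -i H_i, so S_2k approximates e^{-iHx}

def s2(terms, x):
    # Second-order symmetric formula: forward half-steps, then reversed half-steps.
    d = terms[0].shape[0]
    fwd, rev = np.eye(d, dtype=complex), np.eye(d, dtype=complex)
    for A in terms:
        fwd = fwd @ expm(A * x / 2)
    for A in reversed(terms):
        rev = rev @ expm(A * x / 2)
    return fwd @ rev

def s2k(terms, x, k):
    # Suzuki recursion: two side steps, a middle back-step, two side steps.
    if k == 1:
        return s2(terms, x)
    u = 1.0 / (4.0 - 4.0 ** (1.0 / (2 * k - 1)))
    side = s2k(terms, u * x, k - 1)
    mid = s2k(terms, (1.0 - 4.0 * u) * x, k - 1)
    return side @ side @ mid @ side @ side

def error(x, k):
    exact = expm(sum(terms) * x)
    return np.linalg.norm(s2k(terms, x, k) - exact, 2)

ratio_2nd = error(0.1, 1) / error(0.05, 1)   # expect ~ 2^3 = 8
ratio_4th = error(0.1, 2) / error(0.05, 2)   # expect ~ 2^5 = 32
```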
When the general operator $e^{Ax}$ is replaced by the Hamiltonian time evolution operator $e^{-iHt}$, where $H = \sum_i^L H_i$, it becomes obvious how product formulas can be used to evolve quantum states $|\psi\rangle$. However, we also wish to generalize this to evolve probabilistic mixtures of states, which are represented by density matrices $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$. A Trotter-Suzuki evolution formula for a channel can be written as follows:

Definition 2 (Trotter-Suzuki Channel). Given a Hamiltonian $H$, a density matrix $\rho$, times $t$ and $t_0$ ($t > t_0 \geq 0$), and order $2k$, a Trotter-Suzuki channel $T_{2k}(\rho(t_0), \Delta t)$ performs the operation $T_{2k}(\rho(t_0), \Delta t) \to \rho(t)$ and can be defined as:
$$T_{2k}(\rho(t_0), \Delta t) := S_{2k}(-iH\Delta t)\, \rho(t_0)\, S_{2k}(-iH\Delta t)^\dagger,$$
where $S_{2k}(-iHt)$ represents the product formula from Equation (5) and $\Delta t = t - t_0$.
Since the error of this approximation accumulates considerably for long iteration times, we introduce the iteration parameter $r$ and apply the channel $r$ times with time intervals $t/r$. As a result, longer simulations become more expensive in the number of iterations $r$ required to manage the error that accumulates over the simulation time $t$. The composition of $r$ Trotter-Suzuki channels is straightforward, and applying $r$ channels, each for time $t/r$, can be written as
$$T_{2k}(\rho, t/r)^{\circ r} = \underbrace{T_{2k}(t/r) \circ \cdots \circ T_{2k}(t/r)}_{r \text{ times}}(\rho).$$
The cost of this algorithm for the first order decomposition is simply $rL$, where $L$ is the number of Hamiltonian terms. For order $2k$, the cost is $\Upsilon L r$, where $\Upsilon = 2 \times 5^{k-1}$.
Looking at the cost formulas for different orders $2k$, we desire information about $r$; specifically, the minimum value of $r$ required to meet some error tolerance $\epsilon$, where $\|T_{2k}(\rho, t/r)^{\circ r} - \mathcal{U}(\rho, t)\|_\kappa \leq \epsilon$ for some norm $\kappa$. In Ref. 22, the diamond norm is selected, which is the completely bounded trace norm between channels tensored with the environment. The diamond norm is defined as follows:
$$\|\mathcal{A}\|_\diamond := \max_{\rho;\; \|\rho\|_1 = 1} \left\| (\mathcal{A} \otimes \mathbb{1})(\rho) \right\|_1.$$
This is a good analytical norm to choose, given that it maintains the interpretation of distinguishability between states from the trace distance (see Section B 1 b). However, it is a bound on channels (maximized over possible input states $\rho$) and thus yields the interpretation of distinguishability between quantum channels. This norm is also sub-additive and sub-multiplicative, which makes it convenient to work with. The procedure in Ref. 22 upper bounds this norm, sets the value equal to $\epsilon$, and solves the expression for $r$ to arrive at the following first and $2k$th order costs:
$$C_{T_1} \in O\!\left( \frac{L\, t^2\, \alpha_{\mathrm{comm}}(1)}{\epsilon} \right), \qquad C_{T_{2k}} \in O\!\left( \Upsilon L\, t \left( \frac{t\, \alpha_{\mathrm{comm}}(2k)}{\epsilon} \right)^{1/2k} \right), \qquad (9)$$
where $C$, to reiterate, is the number of operator exponentials that need be applied, rather than an exact gate count where gates are drawn from some universal gate set. Here,
$$\alpha_{\mathrm{comm}}(2k) = \sum_{i_1, i_2, \ldots, i_{2k+1}} \left\| \left[ H_{i_{2k+1}}, \left[ H_{i_{2k}}, \ldots, \left[ H_{i_2}, H_{i_1} \right] \ldots \right] \right] \right\|.$$
This is a nested commutator error that arises from the $2k$ symmetrizations in the product formula. The main takeaway from this formula is the structure of the Hamiltonian that the simulation cost depends on. Trotter-Suzuki channels clearly depend on the number of terms in the Hamiltonian, $L$, and on the commutator structure of the Hamiltonian. The simulation cost does not directly depend on the magnitude of the terms $H_i$, only indirectly through their commutators. This is important to keep in mind when examining the next algorithm and its cost dependence.

D. QDrift Random Compiler
The QDrift algorithm was discovered by Campbell [11]. It generates random product formulas, allowing the evolution of a system to stochastically drift (through Hilbert space) towards the true evolution with high probability. Using the quantum channel notation introduced above, the algorithm works as follows:

Definition 3 (QDrift Channel). Given a Hamiltonian $H = \sum_{i=1}^L h_i H_i$, a density matrix $\rho$, times $t$ and $t_0$ ($t > t_0 \geq 0$), and letting $p_i = \frac{h_i}{\lambda}$ with $\lambda = \sum_{i=1}^L h_i$ be the probability distribution from which Hamiltonian terms are drawn, a QDrift channel $Q(\rho(t_0), \Delta t) \to \rho(t)$ can be defined as:
$$Q(\rho(t_0), \Delta t) := \sum_{i=1}^{L} p_i\, e^{-i H_i \lambda \Delta t}\, \rho(t_0)\, e^{i H_i \lambda \Delta t},$$
where $\Delta t = t - t_0$.
This channel mixes unitaries with a single sample from the distribution $p_i$. Similarly to the Trotter-Suzuki approach, the accuracy of this approach improves with the number of samples $N$ that go into our "random product formula". To write the channel for multiple samples, it is useful to think of sampling a vector $\vec{j} = \{j_1, j_2, \ldots, j_N\}$ of length $N$, i.i.d. from the distribution $P_{\vec{j}} = \lambda^{-N} \prod_{k=1}^{N} h_{j_k}$. This corresponds to implementing the unitary $V_{\vec{j}} = \prod_{k=1}^{N} e^{-i H_{j_k} \tau}$ with $\tau = t\lambda/N$. With this in mind, we can neatly write down the expression for the channel with an arbitrary number of samples $N$ as
$$Q^{\circ N}(\rho, t) = \sum_{\vec{j}} P_{\vec{j}}\, V_{\vec{j}}\, \rho\, V_{\vec{j}}^\dagger,$$
where the sum is performed over all possible vectors $\vec{j}$ of length $N$. In terms of preparing these gates with a quantum circuit, this algorithm can be thought of as a linear combination of unitaries under classical control.
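For a small example, the QDrift mixture can be composed deterministically rather than sampled, by writing each map as a superoperator. The sketch below uses a single-qubit test Hamiltonian of our own choosing (the superoperator representation is an implementation convenience, not part of the algorithm): it applies the exact mixture $N$ times and confirms that the distance to the ideal evolution shrinks as $N$ grows, consistent with an error of order $\lambda^2 t^2 / N$.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
Z = np.diag([1.0 + 0j, -1.0])

h = [0.9, 0.35]                  # weights h_i
terms = [Z, X]                   # normalized terms H_i
lam = sum(h)                     # lambda = sum_i h_i
p = [hi / lam for hi in h]       # QDrift sampling probabilities
t = 1.0

def superop(K):
    # rho -> K rho K^dagger acting on the row-major vectorization of rho.
    return np.kron(K, K.conj())

def qdrift_superop(N):
    tau = lam * t / N            # per-sample time slice
    step = sum(pi * superop(expm(-1j * Hi * tau)) for pi, Hi in zip(p, terms))
    return np.linalg.matrix_power(step, N)

H = sum(hi * Hi for hi, Hi in zip(h, terms))
exact = superop(expm(-1j * H * t))
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)   # a valid test state

def channel_error(N):
    diff = (qdrift_superop(N) - exact) @ rho.reshape(-1)
    return np.linalg.norm(diff.reshape(2, 2), 'nuc')          # trace norm of the difference

err_N10, err_N160 = channel_error(10), channel_error(160)
```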
Similarly to the Trotter-Suzuki formula, we are also interested in the algorithmic cost of QDrift. A diamond distance upper bound is provided in Ref. 11, which is then used, via the same procedure, to provide a cost function in Ref. 22 by using $N = 4\lambda^2 t^2/\epsilon$ and restricting $\epsilon \in (0, \lambda t \ln(2)/2)$:
$$C_{QD} = \frac{4\lambda^2 t^2}{\epsilon}. \qquad (13)$$
The important take-home message here is the difference in the structure of the algorithmic cost between the Trotter-Suzuki formula [Eq. (9)] and QDrift [Eq. (13)]. In QDrift, the cost does not depend on the number of Hamiltonian terms $L$ or on the commutator structure. In contrast, it depends on the size of the terms through the sum of the spectral norms $\lambda$. It is important to keep this in mind when constructing composite channels.

E. Composite Simulation Formulas
Equipped with the Trotter-Suzuki and QDrift channel formulas, we wish to hybridize these approaches to form a composite time evolution channel. The eventual goal is to do so in such a way that we minimize the "side effects" of each algorithm while taking advantage of the strengths of each approach. Following Ref. [22], we define the composite channel as follows. We consider the Hamiltonian $H = A + B$, where $A = \sum_{j=1}^{L_A} A_j$ and $B = \sum_{j=1}^{L_B} B_j$. Quantities of interest, such as the number of summands $L$ in a set, are written with a subscript $A$ to indicate that they belong to the set of Hamiltonian terms that will be simulated by a Trotter-Suzuki channel, and likewise with a subscript $B$ for a QDrift channel. The composite channel takes the following form:
$$X(\rho, \Delta t) = \left( \mathcal{T}_{2k, A} \circ \mathcal{Q}_B \right)(\rho, \Delta t),$$
where the channel subscripts indicate the subset of the Hamiltonian being simulated. Note that there is substantial freedom in the construction of this channel. The first obvious freedom is the partitioning into the sets $A$ and $B$; Section V B is devoted to this task. The next choice lies in the Trotter-Suzuki order $2k$. Further, we can construct an outer loop as in Ref. 22, which uses a Trotter-Suzuki style symmetrization of the channels themselves. For example, a composite channel with outer order $2l = 2$ looks like:
$$X_{2k,2}(\rho, \Delta t) = \left( \mathcal{T}_{2k, A}(\Delta t/2) \circ \mathcal{Q}_B(\Delta t) \circ \mathcal{T}_{2k, A}(\Delta t/2) \right)(\rho).$$
Such an "outer loop" with a respective "outer order" was introduced as a strategy to enable more detailed analysis of simulation costs. The outer loop can be thought of as a product formula for composite channels, symmetrizing the channels and allocating the respective time slices, while the inner Trotter order does so on the operator exponentials as before. However, when introducing the outer order in Ref. 22, it was enforced to match the order of the inner factorization. In this paper, one of our goals is to numerically evaluate the exact costs of these composite algorithms, and this gives us the freedom to set the inner and outer orders independently. However, an actual application of the outer loop is situational, as it requires specific knowledge about the commutator structure to be of use. We further discuss this in later sections, where we will resort to the notation $X_{2k,2l}$ for inner order $2k$ and outer order $2l$, respectively. If only a single subscript is given, it is understood to be the inner order, with the outer order fixed at 1.
We now restate the first order cost result for composite channels from Ref. 22, where the $A$ terms are placed in a Trotter channel and the $B$ terms are placed in QDrift:
$$C_{\mathrm{comp}} \in O\!\left( \left( L_A + N_B \right) \frac{t^2}{\epsilon} \left[ \alpha_{\mathrm{comm}}(A) + \lambda_A \lambda_B + \frac{\lambda_B^2}{N_B} \right] \right). \qquad (16)$$
We only restate the first order Trotter-Suzuki result because this is the expression from which we will draw most of our intuition to build our heuristics for partitioning. The second order cost formula has a similar structure, with extra $\alpha_{\mathrm{comm}}$ and $\Upsilon$ terms, but it is based on channels with matching inner and outer orders, which our numerical work does not follow in general. Inspecting Equation (16), it clearly inherits the structure from Trotter in its dependence on the commutators and the number of terms $L_A$ in the set $A$ (first term), and from QDrift in the sum of the spectral norms $\lambda_B$ and number of samples $N_B$ of the terms in the set $B$ (third term). The second term adds to the cost due to the hybridization of the Trotter and QDrift channels. Clearly, in order to gain as much advantage as possible over the non-composite approaches, we desire that the Hamiltonian of interest has a structure that we can exploit via a good choice of partitioning. An interesting structure might be an $H$ that contains terms of largely varying magnitude, such that there is a sharp contrast between the small and large terms, rather than their spectral norms following, say, a Gaussian distribution. Having small-magnitude terms that largely outnumber the large-magnitude terms is also likely desirable. This understanding naturally leads into the next topic of how to choose the partitioning. This task can be motivated by the structure of $H$ and its commutators, if known a priori. For example, inspecting Eq. (16), if we have a small number of large commuting terms and numerous small non-commuting terms, it seems natural to place the large terms into the Trotter channel but sample the smaller-magnitude terms with QDrift. The question of choosing effective partitionings will be further addressed in Section V B.
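This intuition can be illustrated with a toy partitioning scan. The sketch below is our own illustrative heuristic, not the optimized procedure of Section V B: it uses a crude first-order cost proxy $r L_A + N_B$ with $N_B = \lceil 4(\lambda_B t)^2/\epsilon \rceil$ (the QDrift sample count from Eq. (13) applied to the $B$ set, with $r$ held fixed and commutator contributions ignored), scans a spectral-norm threshold, and shows that, for a Hamiltonian with a few large terms and many small ones, a composite split beats both the all-Trotter and all-QDrift extremes under this proxy.

```python
import numpy as np

rng = np.random.default_rng(3)
# A few large-norm terms and many small ones: the regime the text identifies
# as favourable for composite channels.
h = np.concatenate([rng.uniform(5.0, 10.0, size=4),
                    rng.uniform(0.0, 0.1, size=200)])

t, eps, r = 1.0, 0.1, 50   # simulation time, tolerance, fixed Trotter iterations

def proxy_cost(weights_A, weights_B):
    # Toy cost proxy: r gates per Trotter term plus the QDrift sample count.
    L_A = len(weights_A)
    lam_B = float(np.sum(weights_B))
    N_B = int(np.ceil(4.0 * (lam_B * t) ** 2 / eps))
    return r * L_A + N_B

best = None
for w in np.unique(h):
    A, B = h[h >= w], h[h < w]           # large terms -> Trotter, small -> QDrift
    c = proxy_cost(A, B)
    if best is None or c < best[0]:
        best = (c, w, len(A))

cost_all_trotter = proxy_cost(h, np.array([]))   # lowest threshold: all Trotter
cost_all_qdrift = proxy_cost(np.array([]), h)    # everything sampled with QDrift
cost_best, w_best, L_A_best = best
```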

F. Local Lattice Hamiltonians
Given the observation of commutator-dependent error in previous sections, an interesting class of systems to study are geometrically-local Hamiltonians. These systems are defined on a lattice and have only local interactions, such as nearest-neighbour terms. It turns out, via a Lieb-Robinson bound (Section II G), that commutators of interaction terms fall off exponentially with the separation distance in the lattice. This motivates the search for a local product decomposition to simulate the time evolution of local systems. In fact, computational advantages have been demonstrated for local Hamiltonians in Ref. 23. We follow this work in defining geometrically-local Hamiltonians as:

Definition 4. Given a $D$-dimensional lattice $\Gamma \subset \mathbb{Z}^D$, a local Hamiltonian can be written in the following way:
$$H = \sum_{X \subseteq \Gamma} H_X,$$
where $H_X$ is only supported on the subset $X$, which contains only immediately adjacent lattice points, i.e.,
$$H_X = 0 \quad \text{unless} \quad \mathrm{dist}(x, y) \leq 1 \;\; \forall\, x, y \in X,$$
and $\mathrm{dist}$ is the graph distance between the lattice indices. Each term in the Hamiltonian is also normalized such that $\|H_X\| \leq 1$.
In our analysis below, we will explicitly state when we are referring to a local Hamiltonian. These Hamiltonians can be used to describe a wide variety of physical systems, perhaps the most famous of which is the quantum Heisenberg model, which will be investigated numerically in Section V. If locality is not explicitly mentioned, then the Hamiltonian is of the aforementioned more general structure.

G. Lieb-Robinson Bound
A Lieb-Robinson bound is essentially a bound on the speed at which information propagates through a quantum system whose interactions are governed by a local Hamiltonian. This result is particularly interesting because it holds for non-relativistic quantum systems, meaning that nowhere in the system is the finite speed of light $c$ enforced. Instead, the locality of the system's interactions, as well as the geometry of the lattice, leads to the emergence of a Lieb-Robinson velocity $v_{LR}$, which bounds the speed of causality. The central idea behind this bound is that the commutator between the time evolution of an operator $A_X$ and another operator $B_Y$, where the operators are supported on disjoint sets $X \cap Y = \emptyset$, is exponentially small. More formally, a general Lieb-Robinson bound is often given in the following form [29,30]:
$$\left\| \left[ A_X(t), B_Y \right] \right\| \leq \|A_X\|\, \|B_Y\|\, F(|X|, |Y|)\, e^{-\mu \left( \mathrm{dist}(X, Y) - v_{LR} |t| \right)},$$
where $\mu$ is a constant that depends on the lattice and $F$ depends on the cardinality of the sets $X$ and $Y$. It is also important to note how the distance between sets is defined, as it appears in the exponent above. The distance between the sets $X$ and $Y$ is $\mathrm{dist}(X, Y) = \min_{x \in X, y \in Y} \mathrm{dist}(x, y)$, where $\mathrm{dist}(x, y)$ is the distance between elements in the sets; in a lattice this measure is just the absolute value of the difference of their indices. From a simulation viewpoint, and specifically regarding our Trotter-Suzuki product formulas, we know that the error in these approximations depends on commutators of terms. Therefore, carefully-chosen product decompositions may have exponentially-small error. This understanding motivated the algorithm in Ref. 23, and in Sec. IV we combine this algorithm with the composite framework.
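The falloff of the commutator with distance can be seen directly in a small numerical experiment. The model below is our own toy choice (a six-site transverse-field Ising chain, not a system from the paper): we evolve $Z_0$ in the Heisenberg picture and measure $\|[Z_0(t), Z_d]\|$ as the lattice distance $d$ grows.

```python
import numpy as np
from scipy.linalg import expm

n = 6
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def op_at(op, site):
    # Embed a single-site operator at `site` in the n-qubit Hilbert space.
    out = np.eye(1)
    for i in range(n):
        out = np.kron(out, op if i == site else I2)
    return out

# Transverse-field Ising chain: H = sum_i X_i X_{i+1} + sum_i Z_i.
H = sum(op_at(X, i) @ op_at(X, i + 1) for i in range(n - 1)) \
  + sum(op_at(Z, i) for i in range(n))

t = 0.5
U = expm(-1j * H * t)
A_t = U.conj().T @ op_at(Z, 0) @ U       # Heisenberg-evolved Z on site 0

# Spectral norm of [Z_0(t), Z_d] for increasing separation d.
norms = [np.linalg.norm(A_t @ op_at(Z, d) - op_at(Z, d) @ A_t, 2)
         for d in range(1, n)]
```

For this short time the commutator at distance 5 is orders of magnitude below the nearest-neighbour one, consistent with a light cone of speed $v_{LR}$.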

III. IMAGINARY TIME CHANNELS ALGORITHMIC ANALYSIS
Within this section, we show that the composite algorithms proposed by Hagan and Wiebe in Ref. [22] for the purposes of digital real-time quantum simulation can be carried over to the imaginary-time case with similar error bounds and asymptotics. As mentioned previously, the algorithms in this section are formulated with the intention of calculating properties of the Hamiltonian on classical computers. These algorithms can therefore be considered quantum-inspired classical algorithms, a topic distinct from the problem of quantum imaginary time evolution [25]. Within this section, we introduce the imaginary-time QDrift, Trotter-Suzuki, and composite channels, and we bound their distance from the ideal imaginary-time evolution.
We first introduce the notion of an imaginary time evolution channel, given in the definition below, and demonstrate how it can be viewed as a ground state preparation, or "cooling" procedure.
Definition 5 (Imaginary Time Evolution Channel). Given a Hamiltonian $H$ and input state $\rho$, we define the imaginary time evolved state as the action of the following map:
$$\mathcal{U}(\rho, \beta) := \frac{e^{-\beta H}\, \rho\, e^{-\beta H}}{\mathrm{Tr}\!\left( e^{-\beta H}\, \rho\, e^{-\beta H} \right)}.$$
This can be seen as an imaginary time evolution channel most straightforwardly by considering quantum state vectors rather than density matrices. The Schrödinger equation with a time-independent Hamiltonian tells us $|\psi(t)\rangle = e^{-iHt}|\psi(0)\rangle$. Performing a Wick rotation that sends $t \to -i\beta$ gives $|\psi(\beta)\rangle = e^{-\beta H}|\psi(0)\rangle$. Considering the action of the matrix $e^{-\beta H}$ on the density matrix of the state $|\psi(0)\rangle$, we get $e^{-\beta H}|\psi(0)\rangle\langle\psi(0)|e^{-\beta H}$, using the fact that $e^{-\beta H}$ is a Hermitian operator. Relaxing the input from a pure state to a density matrix and normalizing by the trace yields the channel provided in Definition 5.
Often when imaginary time is discussed in quantum mechanics, the state vector representation is used, in which case the imaginary time exponential operator has the following property:
$$\lim_{\beta \to \infty} \frac{e^{-\beta H}\, |\psi_i\rangle}{\left\| e^{-\beta H}\, |\psi_i\rangle \right\|} = |\psi_{GS}\rangle.$$
This clearly only holds whenever our initial state $|\psi_i\rangle$ has nonzero overlap with the ground state $|\psi_{GS}\rangle$, which is expressed as $\langle\psi_i|\psi_{GS}\rangle \neq 0$. In the density matrix picture, we have the following equivalent property:
$$\lim_{\beta \to \infty} \mathcal{U}(\rho_i, \beta) = |\psi_{GS}\rangle\langle\psi_{GS}|,$$
such that $\mathrm{supp}(\rho_i) \cap \mathrm{supp}(\rho_{GS}) \neq \emptyset$. This property can be seen by expanding $\rho$ as an arbitrary mixed state in the energy eigenbasis. Now that the maps of interest have been made clear, we can consider algorithms that closely approximate them and analyze their performance analytically by bounding their error. An important detail is the choice of distance norm to quantify this error. In comparison to real time composite algorithms, the aforementioned diamond norm is not a good choice for imaginary time, given that that norm concerns distinguishing quantum operations on a system and ancillary space, whereas here we are simply performing classical calculations. For this reason, we bound the following induced Schatten $p \to q$ norm on a map $\Phi$: $\|\Phi\|_{p \to q} = \max_{\|\rho\|_p = 1} \|\Phi(\rho)\|_q$. Here, $\Phi$ will be the difference between the ideal map and that generated by the composite algorithm, and we will investigate the case of the induced trace norm where $p = q = 1$. Despite the fact that the maps investigated in this section are non-linear, due to the trace operation in the denominator, the outputs of the maps are Schatten-class linear operators, and thus the induced Schatten norm is well defined. Further, by bounding the $1 \to 1$ norm, which guarantees $\|\rho - \sigma\|_1 \leq \epsilon$ for the two output density matrices, we can guarantee from properties of the trace distance that the measurement statistics for $\rho$ and $\sigma$ will deviate by at most $\epsilon$.
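This projection property is easy to verify numerically. The sketch below uses a small Hermitian test Hamiltonian of our own choosing: applying the normalized map of Definition 5 to the maximally mixed state (which always has nonzero ground-state overlap), the fidelity with the ground state increases monotonically in $\beta$ and approaches 1.

```python
import numpy as np
from scipy.linalg import expm

H = np.array([[1.0, 0.5], [0.5, -1.0]])        # toy Hamiltonian, spectral gap ~2.24

evals, evecs = np.linalg.eigh(H)
gs = evecs[:, 0]                               # ground state (lowest eigenvalue)
rho_gs = np.outer(gs, gs)

rho = np.eye(2) / 2                            # maximally mixed input state

def imaginary_time(rho, beta):
    K = expm(-beta * H)                        # e^{-beta H}, Hermitian
    out = K @ rho @ K
    return out / np.trace(out)                 # renormalize, as in Definition 5

# Ground-state fidelity Tr(rho_GS * U(rho, beta)) for increasing beta.
fidelities = [float(np.real(np.trace(rho_gs @ imaginary_time(rho, b))))
              for b in (0.5, 1.0, 3.0)]
```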
A. Imaginary Time QDrift
As introduced in Section II D, QDrift is a quantum simulation algorithm with the unique property that, in real-time simulations, its error depends neither on the number of Hamiltonian summands nor on the commutators between them, only on the total size $\lambda$ of these operators. To the best of our knowledge, this algorithm has not been applied to imaginary time calculations. We recover all the characteristic properties of the QDrift channel in imaginary time, with renormalizing trace operations, in the following theorem.
Theorem III.1 (Imaginary Time QDrift). Given a Hamiltonian $H = \sum_j h_j H_j$ with $\lambda = \sum_j h_j$, imaginary time $\beta$, number of samples $N$, and a density matrix initial state $\rho$, the induced Schatten $1 \to 1$ norm of the difference between the imaginary time QDrift channel and the exact imaginary time evolution channel is bounded (the full expression is given in Appendix A), given that $|e^{2\beta\lambda/N} - 1| < 1$, which is satisfied if $N > \frac{2\beta\lambda}{\ln 2}$. Further, if the constraint $\lambda/N \leq 0.01$ is satisfied, then the bound simplifies to the following:
$$\left\| Q^{\circ N}(\rho, \beta) - \mathcal{U}(\rho, \beta) \right\|_{1 \to 1} \leq \frac{C\, \beta^2 \lambda^2}{N},$$
where the constant $C \approx 29.71747$.
For a proof of this theorem see Appendix A. The proof consists of repeated applications of the triangle inequality and the sub-multiplicative property of the norm, as well as manipulations of the geometric series and a tail bound on a Taylor series. The geometric series arises when one Taylor-expands the terms in the denominator and takes the trace of the zeroth order term, which is just $\mathrm{Tr}\,\rho = 1$. A recurring theme throughout the analytics in this paper is the appearance of multiple exponential factors that arise from the techniques used to bound trace terms. Multiple techniques are used to accomplish this throughout, and the exponential factors are unavoidable in imaginary time, given that the exponential operator is no longer unitary and has an exponentially decaying norm. For clearer interpretation of the bounds, we use linearization techniques either to show that these factors disappear under some constraints when proving an inequality, or to show that they become $O(1)$ in some limit for an asymptotic bound. The important takeaway from this theorem is that the QDrift error scales essentially the same way in imaginary time as it does in real time, thus making it a candidate algorithm for Quantum Monte-Carlo calculations. As well, the promise on the ratio $\lambda/N$ is reasonable to make, as this algorithm excels in the regime of small terms.
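The $1/N$ scaling can be sanity-checked numerically. The sketch below is our own reading of an imaginary-time QDrift step (sample term $i$ with probability $h_i/\lambda$, apply $e^{-H_i \tau}$ with $\tau = \lambda\beta/N$, renormalize at the end), on a single-qubit test Hamiltonian of our own choosing; the deterministic mixture is composed $N$ times and compared with the exact normalized channel.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

h = [0.8, 0.5]; terms = [Z, X]
lam = sum(h); p = [hi / lam for hi in h]
beta = 1.0
H = sum(hi * Hi for hi, Hi in zip(h, terms))

def superop(K):
    # rho -> K rho K^dagger on the row-major vectorization of rho.
    return np.kron(K, K.conj())

def qdrift_state(rho, N):
    tau = lam * beta / N
    step = sum(pi * superop(expm(-Hi * tau)) for pi, Hi in zip(p, terms))
    out = (np.linalg.matrix_power(step, N) @ rho.reshape(-1)).reshape(2, 2)
    return out / np.trace(out)             # renormalize at the end

def exact_state(rho):
    K = expm(-beta * H)
    out = K @ rho @ K
    return out / np.trace(out)

rho = np.array([[0.6, 0.2], [0.2, 0.4]], dtype=complex)   # a valid test state
errs = [np.linalg.norm(qdrift_state(rho, N) - exact_state(rho), 'nuc')
        for N in (8, 128)]
```

Increasing $N$ by a factor of 16 should shrink the trace-distance error by roughly the same factor, consistent with the $\beta^2\lambda^2/N$ bound.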

B. Imaginary Time Trotter-Suzuki
Unlike for QDrift, work has already been done to provide bounds on non-unitary product formulas [10]. However, these bounds are given in terms of operator norms rather than channel distances, and they are given without renormalizing trace operations. Nevertheless, these existing results are integral in proving the theorem below.
Theorem III.2 (Imaginary-Time Trotter-Suzuki Channels). Given a Hamiltonian H = Σ_j h_j H_j with λ = Σ_j h_j, imaginary time β, and a density matrix initial state ρ, the induced Schatten 1 → 1 norm of the difference between the imaginary-time Trotter-Suzuki channel T_2k(ρ, β) and the exact imaginary-time evolution channel U(ρ, β) has the following bound: which yields the following asymptotic bound: Here Υ is the number of stages of the product formula, 2k is the order of the product formula, and λ is the sum of the spectral norms of the Hamiltonian summands. Here O(·) is understood in the limit of its arguments going to infinity.
For a proof of this theorem, see Appendix A. The proof follows strategies similar to those for imaginary-time QDrift; however, one cannot simply expand a geometric series in the denominator here, given that we have a product instead of a sum of terms. We instead bound the trace terms using a combination of von Neumann's trace inequality and Weyl's inequality on the singular values of matrices. As in the QDrift case, we obtain a bound resembling that of [22] for real time, with some additional exponential factors as expected. Considering asymptotics, the bounds behave the same in both cases, where we again see the dependence of the error on a nested commutator sum.
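A minimal second-order (Strang) imaginary-time Trotter channel with the trace renormalization discussed above can be sketched as follows; the function names are our own illustrative choices.

```python
import numpy as np

def expm_h(H, s):
    """exp(-s*H) for a Hermitian matrix H, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-s * w)) @ V.conj().T

def imaginary_trotter2(h_terms, rho, beta, r):
    """Second-order (Strang) imaginary-time Trotter channel, trace-renormalized.

    One time slice is prod_j e^{-(beta/2r)H_j} followed by the same
    product in reversed order.
    """
    half = [expm_h(H, beta / (2 * r)) for H in h_terms]
    slice_op = np.eye(rho.shape[0], dtype=complex)
    for K in half:
        slice_op = slice_op @ K
    for K in reversed(half):
        slice_op = slice_op @ K
    U = np.linalg.matrix_power(slice_op, r)
    out = U @ rho @ U.conj().T
    return out / np.trace(out)
```

Doubling the number of time slices r should reduce the trace-distance error by roughly a factor of four, consistent with the second-order scaling in the theorem.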

C. Imaginary Time Composite Channels
Now equipped with error bounds on both QDrift and Trotter-Suzuki channels in imaginary time, we can proceed with the analysis of channels composed of the two. Given the difficulty of analyzing the renormalizing operations in the trace terms, we will again make use of asymptotic notation, as was done in the Trotter-Suzuki analysis of the previous section. We will once again write a partitioning of the Hamiltonian as H = A + B = Σ_j a_j A_j + Σ_i b_i B_i. Theorem III.3 (Imaginary Time Composite Channels). Given a partitioned Hamiltonian H = Σ_j a_j A_j + Σ_i b_i B_i with λ_A = Σ_j a_j, λ_B = Σ_i b_i, imaginary time β, QDrift samples N_B, and a density matrix initial state ρ, the induced Schatten 1 → 1 norm of the difference between the imaginary-time Composite channel X_2k(ρ, β) of order 2k and the exact imaginary-time evolution channel U(ρ, β) has the following bound: given that r ≥ Υβλ. Here, all parameters in this bound correspond to the same quantities as in Theorems III.1 and III.2, where the subscripts A and B indicate the set to which they belong.
For a proof of this bound see Appendix A. The condition on r is neither overly restrictive nor arbitrary. In general, we do not expect to simulate the imaginary-time evolution with r sub-linear in either the imaginary time β or the norms of the Hamiltonian λ, and Υ is an artefact of using higher-order Trotter-Suzuki formulas. This result yields three error terms that we might expect: it is a sum of the QDrift and Trotter errors, plus the error that arises from the initial partitioning of the Hamiltonian into two subsets. Therefore, it becomes apparent that if we have information about the terms in the Hamiltonian, we can potentially exploit the form of this error bound by choosing a good partition. By a good partition, we mean one that is close to the optimal partition minimizing the error. In Section V B we show how one might choose a partition using physical intuition, and we also provide a machine learning regressor to find the optimal partition. The latter is more a proof of principle than something one would run in practice, due to its large overhead.

D. Examples and Implementation
Within this section we illustrate an implementation of the imaginary-time composite channel with an example. Approximating the imaginary-time propagator e^{−βH}, a function of a Hamiltonian H that is hard to exponentiate in general, can be achieved through the algorithm written in pseudo-code below. This algorithm returns a list of propagators (whose product approximates e^{−βH}) that are instead easily diagonalizable.
Input: Simulation parameters: Trotter-Suzuki inner order 2k, QDrift samples N_B, iterations r, and imaginary time β. A list of Hamiltonian terms H = Σ_j h_j H_j; two deterministic classical functions, PARTITION(H), which returns two lists of Hamiltonian terms A = Σ_q a_q A_q and B = Σ_p b_p B_p, and TROTTER(A, 2k, β, r), which returns a Trotter-Suzuki product formula of order 2k with time slice β/r. We also require a classical oracle function SAMPLE(), which returns a value j from the probability distribution p_j = b_j / Σ_l b_l. Output: A composite imaginary-time product formula in the form of a list of propagators, with inner order 2k and outer order 1, which closely approximates the imaginary-time propagator e^{−βH}. Function CompositeListCompilation(H, β, r, 2k, N_B): Algorithm 1: Pseudo-code for the implementation of an imaginary-time composite simulation using a high-order Trotter formula and a partitioning heuristic to divide the Hamiltonian terms between the two channels. The algorithm above is written for illustrative purposes and simply constructs a composite product formula in the form of an ordered list. This can then be used as a subroutine for other algorithms in computational physics, such as those from the field of Quantum Monte-Carlo (QMC). For example, in statistical physics, calculating the partition function Z is of central interest. Given this returned list, which, for clarity, is not a channel, we can approximate Z as follows: where ε is bounded by a set of functions implied by Theorem III.3. Next, we insert the resolution of the identity between each pair of propagators so that the trace disappears, given that n = |X|·r, i.e. the number of terms in the list multiplied by the number of iterations. Note the indices are written such that there are L_A·r Trotter terms and N_B·r QDrift terms, with the difference that the subscripts indexing the QDrift terms are dummy indices, as these propagators are importance-sampled at random, as shown in the pseudo-code. Now, with the above expression,
we can either diagonalize these terms and Monte-Carlo sample this sum, or impose a classical mapping that allows us to apply some of the well-known QMC algorithms. These classical mappings depend on the system at hand; for example, if we are interested in Heisenberg or XXZ chains, it is possible to map the model onto a higher-dimensional Ising model and apply the method of world lines [20]. The motivation for using this more complicated propagator list, as opposed to standard Trotterization, is that a propagator with lower error requires a smaller r to achieve the desired precision, and can therefore reduce the path length of the Monte-Carlo path integral required to approximate Z above. Our numerics show that this framework can achieve a substantially lower error under the right conditions by attaining a smaller cost C in multiple cost plots in Section V D 2, highlighting the potential of this algorithm to improve QMC simulations.
In this example, we chose to show arbitrary inner order and outer order 1, the goal being simplicity. This can be generalized to higher outer orders by requiring an additional subroutine that symmetrizes T and Q and adjusts their time slices accordingly. We also remark that the classical function PARTITION() is presented here somewhat as a black box. This is done for generality, given that the user is free to use a method of their choosing. In Section V B, we present an explicit algorithm for this function, which requires a single additional input ω_c called the chop threshold, and simply involves a looping if statement that sorts Hamiltonian summands into A or B depending on whether their spectral norm is greater or less than ω_c.
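As a hedged sketch of Algorithm 1, the following Python constructs the propagator list with a first-order inner Trotter formula (outer order 1), a chop-style PARTITION, and importance-sampled QDrift propagators. The function names mirror the pseudo-code, but the concrete implementation details are our own assumptions rather than the paper's library.

```python
import numpy as np

def expm_h(H, s):
    """exp(-s*H) for a Hermitian matrix H, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-s * w)) @ V.conj().T

def partition_chop(coeffs, terms, omega_c):
    """PARTITION: Trotter set A (norm > omega_c) and QDrift set B (the rest)."""
    A = [(h, H) for h, H in zip(coeffs, terms) if h > omega_c]
    B = [(h, H) for h, H in zip(coeffs, terms) if h <= omega_c]
    return A, B

def composite_list_compilation(coeffs, terms, beta, r, n_b, omega_c, rng=None):
    """Return an ordered list of easily-diagonalizable propagators whose
    product approximates exp(-beta*H), with outer order 1."""
    rng = rng if rng is not None else np.random.default_rng(0)
    A, B = partition_chop(coeffs, terms, omega_c)
    lam_b = sum(h for h, _ in B)
    props = []
    for _ in range(r):
        for h, H in A:  # first-order Trotter slice over the A terms
            props.append(expm_h(h * H, beta / r))
        if B:  # N_B importance-sampled QDrift propagators (SAMPLE)
            p = np.array([h for h, _ in B]) / lam_b
            for j in rng.choice(len(B), size=n_b, p=p):
                props.append(expm_h(B[j][1], lam_b * beta / (n_b * r)))
    return props
```

When all terms land in the Trotter set and commute, the ordered product of the returned list reproduces e^{−βH} exactly, which makes for a convenient sanity check.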

IV. LOCAL COMPOSITE CHANNELS FOR REAL-TIME EVOLUTION
We introduce here an algorithm for real-time evolution which utilizes the local structure of the Hamiltonian and is composite. It combines intuition from both the Trotter-QDrift composite channels, as discussed in Sec. II E, and the Lieb-Robinson decomposition, Sec. II G. Our main result is Eq. (37), which bounds the error of a local composite channel.
The basic idea behind the method goes as follows: We decompose the Hamiltonian evolution into local blocks and build composite channels out of the operators supported on these local blocks.Each block is thus assigned a local partition {A, B} and a sample number N B .
Based on prior intuition, we expect this approach to perform well on models that are disordered yet locally interacting, such that there is structure for the local composite channels to exploit. However, by adding QDrift sampling to the local framework, and with the speed limit v_LR in mind, we are "fuzzing" the light-cone in the quantum circuit evolution of the system. For example, in Figure 2 we show a circuit that illustrates the Trotter time evolution of a local Hamiltonian where the H_{i,j} terms only connect neighbouring qubits within and between each disjoint subset A and B of the 1-dimensional lattice.
FIG. 2: Brickwork circuit diagram to highlight the emergent light-cones from nearest neighbour interactions.The notion of "fuzzing" the light-cone occurs when we use QDrift to sample terms along these light-cone boundaries in the circuit.
The light-cone analogy is now made clear: a term H_{0,1} influences the nearest qubit in B only once 3 "layers" of gates have been applied (intermittent gates cannot improve this speed). This behavior is analogous to the Lieb-Robinson velocity that upper-bounds the propagation of information in the lattice. We note, though, that in a local composite channel some of the gates are randomly sampled in the QDrift channel, so the evolution of the light-cone is not exactly reproduced, since not all terms are necessarily implemented due to sampling. Keep in mind that a local composite channel does include more complicated subsets of gates, some of which evolve boundary-region terms like H_{3,4} backwards in time; the circuit above is presented purely for conceptual purposes.
Towards bounding the error of a local composite channel, we build on the result from Ref. 23, which concerns bounds on time evolution operators, and reformulate it as a bound on channels. The motivation for reformulating the bound as a channel norm is so that QDrift can be incorporated into this algorithm; this later allows us to bound the overall error. Since QDrift produces mixed states, we need to work in the density matrix formalism. Theorem 6. Given a time t and a local Hamiltonian H = Σ_X H_X that generates a time evolution unitary U and an evolution channel U, there exists µ > 0 such that for any disjoint regions A, B, C we have the following operator norm bound: where the evolution channels contain only Hamiltonian terms supported on their subscripted sets of the lattice, and the sum runs over terms X supported on the boundary. Proof. We first restate the main result from Ref. 23 on time evolution operators. Now, write V := U_{A∪B} U†_B U_{B∪C} and U := U_{A∪B∪C}, and consider the following induced channel infinity norm: In the last line we used the fact that the infinity norm of a density matrix is upper bounded by 1, and the unitary invariance of the Schatten norms. Inserting the original definitions of V and U into max_{ρ: ||ρ||_1 ≤ 1} ||V − U|| ≤ 2||V − U||, where V is just VρV† and, similarly, U = UρU†, we upper bound the desired channel norm, with the constant factor disappearing due to O notation.
The bound (31) we have proven above is still not of the form of the QDrift bound, which utilizes the diamond distance, a completely bounded trace norm (Schatten 1-norm) defined as: While the trace norm is an upper bound on the infinity norm in general, it is not immediately clear how tight this bound may be, as factors of the dimensionality of the system may come into play, which we wish to avoid. This can be made clear by a simple example: consider the N × N identity matrix 𝟙_N. This matrix has trace norm ||𝟙_N||_1 = N but infinity norm ||𝟙_N||_∞ = 1, where N is a factor of dimensionality. Therefore, if possible, we would like to convert the result (31) to a diamond norm. Building on Ref. 22, we accomplish this in the following lemma: Lemma 7. Given a time t and a local Hamiltonian H = Σ_X H_X that generates a time evolution unitary U and an evolution channel U, there exists µ > 0 such that for any disjoint regions A, B, C we have the following diamond distance bound: Proof. We use Equations (11)–(17) from Ref. 22, replacing their Trotter channel with the local block channel V = U_{A∪B} U†_B U_{B∪C} and setting U = U_{A∪B∪C}, with V and U representing the unitaries that induce each channel. In the last line, we inserted Theorem 6 to complete the proof. We observe the same bound on the diamond distance as appeared in the induced infinity norm, with the constant factor 2 once again absorbed into O.
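The dimensionality example above can be checked directly; this small NumPy snippet is purely illustrative.

```python
import numpy as np

# Schatten norms of the N x N identity: the trace norm picks up the
# dimension N, while the spectral (infinity) norm stays at 1.
N = 5
svals = np.linalg.svd(np.eye(N), compute_uv=False)
trace_norm = svals.sum()  # Schatten 1-norm: sum of singular values
spec_norm = svals.max()   # Schatten infinity-norm: largest singular value
```

For the identity this gives trace norm N and spectral norm 1, which is exactly the dimensional factor the text warns about when converting between the two norms.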
Onward, the idea of the local composite algorithm is to start with some Hamiltonian H = Σ_X H_X defined on a lattice Γ with X ⊆ Γ. Begin by breaking the lattice into two regions: A ∪ B = Γ. Now we take a smaller region of the lattice Y ⊆ A and use the result to approximate the time evolution operator in the following way: Note that this breakup is distinct from Trotterization in that we are decomposing the operator into a product formula whose operator exponentials still contain sums of operators. However, the operator sums now only contain terms in each local "block". We can recurse this process to make m blocks of near-equal size.
Next, we decide which simulation method to employ to approximate and implement the block-evolution operators. In the previous theorem (32), a bound was proved for only one round of decomposition, but note that when defining V, the proof holds regardless of how many unitary blockings the operator is composed of. In Ref. 23, it was stated that the error in Theorem 6 for D rounds of decomposition in a D-dimensional lattice is in O(e^{−µl} D L^D / l) for a lattice with chain length L and blocks with overlap l. The overlap for the block in the above example would be the diameter of the set Y, which can be written diam(Y) = max_{y,y′∈Y} dist(y, y′); the latter distance is again the distance between lattice indices.
Before investigating this channel numerically, we provide a bound on the diamond norm between this local composite channel and the exact evolution channel. Along with the fact that the diamond norm is sub-additive and sub-multiplicative, we will make use of the following two properties: where A, B, C, and D represent channels.
These properties were shown in Ref. 31. For our purposes, we require a more general property that holds for the composition of k channels A_i. We provide a simple proof in the following lemma: Lemma 8. Given a composition of k quantum channels A_i, and another composition of k quantum channels A′_i, the diamond norm of the difference between these two compositions has the following bound: Proof. The proof is achieved by induction and uses the fact that ||A_i||_⋄ ≤ 1 for any quantum channel, which is due to the fact that the spectral norm is unitarily invariant and density matrices have a maximum eigenvalue of 1. Starting from the left and suppressing the composition operation ∘ above, we have Repeating this procedure, we decompose the rightmost term into a sum of two channel differences by induction, allowing one to prove the original identity in Equation (35).
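The induction step in the proof of Lemma 8 can be written as a telescoping bound; using sub-multiplicativity and the fact that quantum channels have diamond norm at most 1, one step reads:

```latex
\begin{aligned}
\left\| \mathcal{A}_k \cdots \mathcal{A}_1 - \mathcal{A}'_k \cdots \mathcal{A}'_1 \right\|_\diamond
&\le \left\| \left( \mathcal{A}_k - \mathcal{A}'_k \right) \mathcal{A}_{k-1} \cdots \mathcal{A}_1 \right\|_\diamond
 + \left\| \mathcal{A}'_k \left( \mathcal{A}_{k-1} \cdots \mathcal{A}_1 - \mathcal{A}'_{k-1} \cdots \mathcal{A}'_1 \right) \right\|_\diamond \\
&\le \left\| \mathcal{A}_k - \mathcal{A}'_k \right\|_\diamond
 + \left\| \mathcal{A}_{k-1} \cdots \mathcal{A}_1 - \mathcal{A}'_{k-1} \cdots \mathcal{A}'_1 \right\|_\diamond
 \;\le\; \cdots \;\le\; \sum_{i=1}^{k} \left\| \mathcal{A}_i - \mathcal{A}'_i \right\|_\diamond .
\end{aligned}
```

Iterating this decomposition k − 1 times yields the claimed sum over single-channel differences.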
Next, we utilize Theorem 6, where we converted the spectral norm result on local block unitaries into a bound on a local decomposition channel. It is now clear why this was done: we require a channel description in order to describe the QDrift algorithm and relate the bounds. Using Lemma 8, we proceed with the initial goal of bounding the diamond norm between the local composite channel and the exact evolution channel. Here, for compactness, we adopt the notation of writing blocked composite local channels as X_{Y_i} and "exact" local channels (prior to hybridization into Trotter and QDrift) as U_{Y_i}. Here, Y_i indicates a set of lattice points Y_i ⊆ Γ such that each Y_i is constructed via the algorithm laid out earlier in this section. Therefore, only certain Hamiltonian interaction terms are supported on the lattice subsets Y_i. Additionally, let U_{Y_i} contain terms that have also undergone the appropriate conjugation (given that the intersections of some blocks are evolved backwards in time). Finally, let each set be constructed as above. It should also be stressed that r is a local channel iteration count, which means that we do not iterate the channel that is a composition of blocks, but rather each individual local composite channel built out of the block terms.
The reason for this is that the error in the local decomposition is independent of time for time-independent H with t ∈ O(1). We also require m ∈ O((L/l)^D) [23].
Proof.We begin with the norm we wish to bound, and then apply the above identities to format the norm such that known bounds from [23] and [22] can be applied.
Then, by inserting the bound given in [22], Equation 31, we arrive at the original expression. It then immediately follows that we can bound local composite channels of any order 2k by inserting the result of [22], Equation 58; however, we omit this as it follows trivially from the above.
The takeaway from this is that the block-local composite channel is bounded by an error very similar to that of the general composite channel, aside from an extra term that is asymptotically exponentially small. However, while we expect to see improvement from this localization scheme, extracting more from this bound does not seem feasible due to the amount of freedom in this algorithm, as almost every parameter depends on the "worst" subset among the sets Y_i.

V. NUMERICAL SIMULATIONS
In this section we highlight the details of configuring our numerical investigations by defining the Hamiltonians of interest. We then discuss partitioning strategies and the choice of an error measure before presenting numerical results. The choice of error measure usually does not require a dedicated subsection. However, when considering composite algorithms, there is a subtle feature in that not all algorithms may be treated on an equal footing by the same error measure, which can lead to inconsistencies. For our purposes, the trace distance is the most useful measure. Rather than computing upper bounds, the numerical results we provide here are exact calculations of the cost C, i.e. the precise counts of the number of quantum gates of the form e^{−i(t/r) ⊗_j σ_j^ν} required to meet some desired precision ε. We write gates in this form to highlight that the Hamiltonian summands in our simulations are always written as tensor products of Pauli operators, allowing for a nice parallel to the well-known rotation gates in the context of quantum computing. For imaginary time, although this form still holds in our numerics, the same analogy does not, as we are simply constructing e^{−(β/r)H_j} on a classical computer, which can be of somewhat arbitrary form.
In order to calculate C, we have constructed a library to compile any desired product-formula simulation, given a list of Hamiltonian terms, a partition, QDrift samples, a simulation time, and a desired precision. These simulator objects are handled by external functions that can partition the simulators, calculate errors and exact costs, approximate simulation cost via Monte-Carlo methods, and more. The library is built on NumPy, and contains conversion functions to load Hamiltonians generated by the quantum chemistry packages OpenFermion [24] and PySCF [32] to simulate systems of interest. It also contains methods for geometrically local simulations to compute blockings using the required set logic.

A. Hamiltonians of Interest
Within this section, we introduce the Hamiltonians in our numerical investigation of composite algorithms, as well as briefly outline methods used for their generation.

Electronic Structure Hamiltonians in Second Quantization
The electronic structure problem is perhaps one of the most famous classically intractable problems, with vast applications in quantum chemistry. In order to write down these Hamiltonians, it is first necessary to introduce fermionic creation and annihilation operators. Fermions are particles with half-integer spin that obey Fermi-Dirac statistics, meaning their operators obey the anti-commutation relations {a_p, a_q†} = δ_pq and {a_p, a_q} = 0. We work in Fock space, where the subscripts of operators indicate the excitation number or the atomic orbital of an electron. Molecular electronic structure Hamiltonians then take the following form: H = Σ_{pq} h_pq a_p† a_q + (1/2) Σ_{pqrs} h_pqrs a_p† a_q† a_r a_s. Here the first term represents single excitations and the second term keeps track of double excitations, or "hopping" amongst orbitals. The coefficients h are molecular integrals that depend on the basis chosen to describe the molecule [26]. Further, more can be done to aid in the implementation of this problem on a quantum computer. The Jordan-Wigner transformation provides a one-to-one mapping between the fermionic and spin operators. This allows us to write down the time evolution operator in terms of the universal rotation gates (and CNOT gates). To understand the transformation, first observe how a creation or annihilation operator acts on a single qubit; then, to build in the desired anti-commutation relations and generalize to a Hilbert space of N qubits, i.e. the tensor (Kronecker) product of N two-dimensional Hilbert spaces H = ⊗_{i=1}^N H_i, each operator is dressed with a string of Pauli-Z operators. In order to build these Hamiltonians numerically, we use the OpenFermion [24] and PySCF [32] packages for quantum chemistry. OpenFermion is a library that allows for easy manipulation of the fermionic operators arising in quantum chemistry, and it interfaces with a variety of electronic structure packages that perform molecular integrals in the basis of choice to generate Hamiltonians in the form of Equation 43. Further, OpenFermion also has the Jordan-Wigner transform built in, allowing one to construct this Hamiltonian in the Pauli basis. PySCF was our electronic structure package of choice to compute molecular integrals.
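For reference, one common convention for the Jordan-Wigner mapping (sign and ordering conventions vary between texts) reads:

```latex
a_j = \left( \bigotimes_{i=1}^{j-1} \sigma^z_i \right) \otimes \frac{\sigma^x_j + i\,\sigma^y_j}{2},
\qquad
a_j^{\dagger} = \left( \bigotimes_{i=1}^{j-1} \sigma^z_i \right) \otimes \frac{\sigma^x_j - i\,\sigma^y_j}{2},
```

where the string of σ^z operators enforces the fermionic anti-commutation relations between operators acting on different sites.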
Given the form of Equation 43, we observe that our Hilbert space needs to be truncated. An active-space calculation does exactly this: the Hamiltonian is written in a space in which only a limited number of orbitals are "active", i.e. an electron can only be excited into active orbitals. We generate all of our electronic structure Hamiltonians in the minimal basis, using a number of qubits equal to the total period of the molecule. For our numerical investigation, we provide a function to generate chains of hydrogen atoms given two simple inputs: bond length and number of atoms. The function uses PySCF to compute the molecular integrals, and then uses the data to build the Hamiltonian in the active space implied by the minimal basis, using a minimal spin configuration.

Jellium Uniform Electron Gas
Jellium is a model of a uniform electron gas that captures the interactions between delocalized electrons in a solid, with a uniformly distributed positive background serving as the nuclei. It is of interest not only in materials science but also as a benchmark system in quantum simulation. More compact representations of this Hamiltonian have been proposed as candidates for experimental simulation on near-term hardware [7]. The system Hamiltonian has a closed-form representation and does not require any additional molecular integrals to construct: where the jth nucleus has position R_j and atomic number ζ_j, and k_ν = 2πν/Ω^{1/3} with cell volume Ω, and σ ranging over both up and down spins. For the derivation of this Hamiltonian see Appendix B of [7]. Conveniently, OpenFermion also provides simple functions to quickly generate this Hamiltonian, and we do so in the momentum plane-wave basis with periodic boundary conditions. We elect not to use the more compact plane-wave dual basis representation presented in [7], because we use this Hamiltonian as a benchmark rather than studying the outputs of the simulation. For the composite simulation, Jellium provides many Hamiltonian terms and a very sharply peaked spectral distribution (see the corresponding figure) for a system of size equal to that of the spin models we study. Given that system size is more of a limiting factor than term number in our numerical study, this presents an opportunity to see how a composite channel performs on a system with greater L. To limit the system size we also use a spinless model, and then perform the Jordan-Wigner transformation on the second-quantized Hamiltonian to represent it as a sum of Pauli operators. This Hamiltonian is constructed with the necessary transformations using OpenFermion [24]. In the figure, in red, we indicate a potential choice of ω_c for the partitioning heuristic.

Graph Hamiltonian Model
The Hamiltonian we explore here involves a spin chain imposed on a lattice Γ ⊂ Z^D with the graph distance metric dist(u, v) = ||u − v||_1, where u and v are vectorized coordinates on the graph with dimension equal to D. For our investigation we only examine lattices with D = 1, given that, for a fixed number of sites, this gives a more sharply peaked spectral distribution than any other D: This system is similar to the quantum transverse-field Ising model, but with interactions that fall off exponentially with graph distance. The coefficients α_ij and β_k are site-dependent coupling constants that allow for the introduction of more disorder and/or structure in the Hamiltonian. To add some disorder to the model, we draw these coefficients pseudo-randomly from a Gaussian distribution with mean 0 and variance 1.

Heisenberg Model
The Heisenberg model describes a quantum spin system in a magnetic field with nearest-neighbour interactions. The Hamiltonian takes the following form: Here B_z is the strength of the magnetic field in the z direction and J_{x,y,z} are coupling constants. Given the intuition behind our composite channel, we expect this model to benefit from our algorithm when the coupling constants differ largely in magnitude, such that partitioning into Trotter and QDrift exploits more of the Hamiltonian's structure. Furthermore, introducing site-dependent coupling constants, or writing down a highly disordered spin system, could add further structure for the algorithm to exploit. A Hamiltonian of this nature would look something like the following: This Hamiltonian appears somewhat contrived for extracting the performance of the local composite channel. However, the system closely resembles the Edwards-Anderson model of a spin glass, a system of interest in condensed matter physics. In an attempt to create a sharp distribution, we simply sample J_ν^{(j)} from an exponential distribution with a scale parameter of 0.1.
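For concreteness, the uniform Heisenberg chain with a z-field can be built as a dense matrix with Kronecker products; this is a small-system sketch with our own helper names, not the paper's library.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0 + 0j, -1.0 + 0j])
I2 = np.eye(2, dtype=complex)

def pauli_on(op, site, n):
    """Embed a single-site Pauli operator at `site` in an n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for i in range(n):
        out = np.kron(out, op if i == site else I2)
    return out

def heisenberg(n, Jx, Jy, Jz, Bz):
    """Dense nearest-neighbour Heisenberg chain with a z-field (open chain)."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for j in range(n - 1):
        for J, P in ((Jx, X), (Jy, Y), (Jz, Z)):
            H += J * pauli_on(P, j, n) @ pauli_on(P, j + 1, n)
    for j in range(n):
        H += Bz * pauli_on(Z, j, n)
    return H
```

Each summand is a tensor product of Pauli operators, matching the gate form used in the cost counting of Section V; the site-dependent variant is obtained by letting the couplings depend on j.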

B. Partitioning Schemes
The main difficulty in deploying the composite simulation framework is finding a good partitioning. In introducing composite simulations, Hagan and Wiebe suggested partitioning schemes derived by optimizing analytic cost functions in both deterministic and probabilistic settings [22]. Here, we take a different approach involving the exact calculation of the simulation error and an optimization routine that, given convergence, finds the optimal partitioning and gate count with respect to a chosen error tolerance and simulation time. This approach is used to answer the question of the best savings one can hope to achieve when deploying composite methods to simulate a specific Hamiltonian. We are not proposing this as a pre-processing routine, as it has complexity greater than that of the simulation itself; this is evident since the optimization involves solving the simulation problem repeatedly. In addition, we also arrive at simple heuristics that can partition certain Hamiltonians with little overhead, which we do propose as a strategy for using a composite approach.
Chop is the partition scheme that we introduce in this work. The idea is based on the heuristic of placing the few terms with larger spectral norms into Trotter-Suzuki channels and the numerous small terms into QDrift, assuming the Hamiltonian presents this structure. We start by sorting the terms by their spectral norms and introducing a "chop threshold" ω_c ∈ [0, max_i h_i]. This scale determines the partition: if a term has spectral norm greater than ω_c it is placed in A, and in B otherwise. Now we can express the error tolerance ε as a function of channel iterations r, chop threshold ω_c, and a sample number N_B to be chosen in an optimization routine, for a fixed initial state and time: By fixing an error tolerance for ε, the exact cost of the simulation becomes a black-box function with no closed-form expression: This is the cost function we wish to minimize. However, we cannot do so by conventional methods such as direct gradient descent. Also, with no strong intuition for the choice of N_B, optimizing this parameter means dealing with integer optimization as well. The iterations r, however, while an integer, do not require optimization, but rather admit a search problem. If we allow an optimizer to pick initial random values for N_B and ω_c from a fixed interval, then we must find the value of r required to meet the error threshold ε_thresh, which is ultimately determined by the optimizer's choice of the other two parameters. To accomplish this, we perform an exponential search on r until we find some r where ε(r) ≤ ε_thresh and set this as an upper bound on r. We then perform a binary search to find the smallest value of r required to meet this condition and count the number of gates in the channel. This is a very expensive function, given that we are precisely building the composite channel, applying it to the density matrix initial state in the problem, and counting the gates applied in each iteration of the search. The expensive nature emerges due to the sheer number of matrix
multiplications required to perform this task, not from the search for r, which is nearly optimal. Note the importance of using the trace distance in this approach, as it guarantees monotonicity of ε(r), which makes the search possible. This is not so in the framework of sampling the quantum infidelity, where finding the cost would require other statistical methods (see Appendix B). Now, a glaring question left unanswered is the choice of optimizer. We implement the Gradient-Boosted Regression Trees (GBRT) algorithm included in Scikit-Optimize [33]. This algorithm is specifically designed to handle the optimization of very expensive functions. It is also convenient for our purposes, given that it can handle both integer and real optimization parameters simultaneously. At a high level, the algorithm works by using a series of decision trees with an associated loss function. The decision trees perform regression to fit the input function and are iteratively generated based on the minimization of the loss function via gradient descent. This optimizer and the cost function (53) can then be easily generalized to the local composite channels, where we now have an N_B and ω_c for each blocking. As the number of local blocks grows, the optimization routine needs to take a larger number of input parameters in this prescription. However, the size of the system becomes classically intractable long before we would consider using this many local blocks, so this is far from a concern.
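The exponential-plus-binary search on r described above can be sketched independently of the simulator, assuming (as the trace distance guarantees) that the error ε(r) is monotonically non-increasing in r; the function name is our own.

```python
def minimal_iterations(error_of_r, eps_thresh):
    """Smallest r with error_of_r(r) <= eps_thresh, assuming the error is
    monotonically non-increasing in r (guaranteed by the trace distance)."""
    r_hi = 1
    while error_of_r(r_hi) > eps_thresh:  # exponential search for an upper bound
        r_hi *= 2
    lo, hi = max(1, r_hi // 2), r_hi      # binary search for the smallest valid r
    while lo < hi:
        mid = (lo + hi) // 2
        if error_of_r(mid) <= eps_thresh:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

In the paper's setting, error_of_r would rebuild the composite channel at each probe, which is where the expense lies; the search itself evaluates ε(r) only O(log r) times.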
In some cases, models may exhibit a partitioning that is somewhat canonical and can lead to excellent performance of composite methods. This occurs when a Hamiltonian fits naturally into the intuition behind the algorithm, such that we have a set A containing large terms with small commutators and a set B with small terms that are known not to commute in general. We are, therefore, proposing to use the chop partition, but with the chop threshold ω_c chosen from physical intuition about the Hamiltonian rather than an expensive optimization routine. A perfect example of such a system is a Heisenberg model with weak coupling. In this case, looking at Equation 50, we would set the chop threshold ω_c = max{J_x, J_y, J_z}, which implies we simulate the interactions with QDrift, {J_ν σ_j^ν σ_{j+1}^ν}_{ν=x,y,z} → B, and simulate the site energy terms with Trotter-Suzuki, {B_z σ_j^z} → A. In this way, the terms in the set A all commute with each other, whereas the terms in the set B are guaranteed to have small spectral norm. We provide numerical evidence in the sections below that this yields computational advantages. In general, any system with perturbative interactions may benefit from this framework, given that the commutators within the system are small, as such systems admit this canonical partitioning. In cases where the partitioning is not as obvious, as with H_3 and Jellium, we can achieve similar advantages by choosing ω_c at the largest gap in the ordered list of Hamiltonian spectral norms, subject to |B| ≥ |A|; that is, we sweep the sorted list of spectral norms and track the largest difference between consecutive terms, chopping the list where this occurs, given j ≥ L/2. The final condition just ensures the majority of terms are simulated by QDrift. We also use this strategy throughout Section V D and show advantages.
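The gap-based choice of ω_c described above can be sketched as follows; this is a heuristic sketch and the function name is ours.

```python
import numpy as np

def chop_threshold(norms):
    """Choose omega_c at the largest gap in the sorted spectral norms,
    restricted to indices j >= L/2 so the QDrift set B keeps the majority."""
    h = np.sort(np.asarray(norms, dtype=float))
    L = len(h)
    gaps = np.diff(h)          # gaps[j] = h[j+1] - h[j]
    start = L // 2             # enforce j >= L/2
    j = start + int(np.argmax(gaps[start:]))
    return h[j]                # terms with norm > omega_c go to the Trotter set A
```

On a sharply peaked distribution of norms (many small terms, a few large ones) this places the cut just below the cluster of large terms, which is exactly the structure the composite channel is designed to exploit.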

C. Error Measures
In this investigation, there is some arbitrariness in the error measure one can choose to quantify the performance of a simulation channel. To evaluate the resources required by an algorithm, one must evaluate the number of gates required to meet a certain $\epsilon$, which is calculated by said error measure. In the literature, this is often quantified by the diamond distance utilized in previous sections. However, while analytically convenient, computing this quantity becomes computationally expensive for any reasonably-sized system. While it is possible to evaluate it efficiently, this requires solving a semi-definite program, which is much less efficient than using some other error measures. In addition, since our numerical methods are not constrained to analytically solvable expressions or closed-form equations, we can optimize this cost over partition schemes. This is the idea behind the optimal chop partition, and doing so requires frequent computations of $\epsilon$. With this in mind, the error of our algorithm should be a quantity that we can compute in a reasonable amount of time while also being a fair error measure. The criterion for "fairness" is that the error measure treats each algorithm comprising the composite channel on an equal footing. For example, if we optimize the partitioning with respect to the gate cost (which depends on $\epsilon$), and Trotter is more performant relative to QDrift in one error measure than in another, then our composite optimizer will favor Trotter, which will be reflected in the partition. As a result, the total cost of the composite channel will be skewed by the error measure used.
We consider the infidelity and trace distance as possible measures of $\epsilon$. For definitions and an additional discussion regarding the scaling and complexity of computing these quantities, see Appendix B. Analytics are required to answer the question of which measure provides a fairer comparison. More specifically, we can ask how the error measures scale with the total simulation time t, and then test these results numerically. In terms of the infidelity, we provide the following propositions:

Proposition V.1 (QDrift Infidelity Time Scaling). Given a QDrift channel $\mathcal{Q}(\rho, t)$ and the standard evolutionary channel $\mathcal{U}(\rho, t/N)$, for a density matrix $\rho$ and time $t/N$, the infidelity between the outputs of the channels satisfies $1 - F(\mathcal{Q}(\rho, t), \mathcal{U}(\rho, t/N)) \in O(t^2)$.

Proposition V.2 (Trotter-Suzuki Infidelity Time Scaling). Given a Trotter-Suzuki channel $\mathcal{T}_{2k}(\rho, t)$ and the standard evolutionary channel $\mathcal{U}(\rho, t)$, for a pure state $\rho$ and time t, the infidelity between the outputs of the channels satisfies $1 - F(\mathcal{T}_{2k}(\rho, t), \mathcal{U}(\rho, t)) \in O(t^{2(2k+1)})$.

The proofs of these propositions are included in Appendix B.
Here, we obtain the non-trivial result that the infidelity is squared for Trotter-Suzuki formulas. This would lead our optimized chop algorithm to heavily favor this channel over QDrift, and for this reason we consider it an "unfair" error measure. On the other hand, if we consider how the trace distance scales with simulation time in both algorithms, we obtain the following propositions:

Proposition V.3 (QDrift Trace Distance Time Scaling). Given a QDrift channel $\mathcal{Q}(\rho, t)$ and the standard evolutionary channel $\mathcal{U}(\rho, t/N)$, for an arbitrary density matrix $\rho$ and time $t/N$, the trace distance between the outputs of the channels satisfies $T(\mathcal{Q}(\rho, t), \mathcal{U}(\rho, t/N)) \in O(t^2)$.

Proposition V.4 (Trotter-Suzuki Trace Distance Time Scaling). Given a Trotter-Suzuki channel $\mathcal{T}_{2k}(\rho, t)$ and the standard evolutionary channel $\mathcal{U}(\rho, t)$, for an arbitrary density matrix $\rho$ and time t, the trace distance between the outputs of the channels satisfies $T(\mathcal{T}_{2k}(\rho, t), \mathcal{U}(\rho, t)) \in O(t^{2k+1})$.

Here, we see that no such squaring occurs, and the expected time scaling is obtained. For this reason, we compute the entire density matrix and use the trace distance in all of our numerical simulations. Proofs of the above propositions, as well as further discussion, can again be found in Appendix B.

D. Performance Results
In this section, we first numerically analyze the real-time quantum algorithm given by Hagan and Wiebe [22] and then show equivalent numerical calculations for the imaginary-time classical case. To accomplish this, we provide cost plots showing the minimum C, the number of rotation gates needed to achieve a desired simulation accuracy (calculated by the trace distance), for each point in time t or β. To reiterate, here we exactly compute entire evolution channels with a random initial state ρ sampled from the unit hypersphere, and directly apply and count gates. We conclude this section with a brief discussion of numerical studies for the local composite simulation algorithms. In these plots, we study variants of the composite channel and display results with the aforementioned notation, with the addition of a tilde over the channel if the partition and $N_B$ have been optimized with GBRT. For example, a composite channel with inner order 2 and outer order 1 with an optimized partition and number of QDrift samples $N_B$ is written as $\widetilde{X}_{2,1}$.
Throughout the section, we normalize $\|H\| = 1$ and run simulations for times $t \in (0, \frac{3\pi}{2}]$ so as to ensure the system undergoes non-trivial dynamics without wrapping the phase. This is done because Trotter formulas have a periodic error for $\|H\| t \geq 2\pi$, and running simulations in this range would lead to the optimizer finding "good points", where the error happens to be small, which would provide a very low-cost simulation and a sharp drop in the cost trend. We also report the cost advantages achieved at crossover points, which are values $t^\star$ such that $C_{QD}(t^\star) = C_{TS}(t^\star)$. We denote the composite channel advantage as $\xi := C_{QD}(t^\star)/C_X(t^\star) = C_{TS}(t^\star)/C_X(t^\star)$. As we are unable to compute these times exactly, we use interpolation methods to report values of ξ. This is motivated by the fact that analytics suggest this to be the point of greatest advantage for a composite channel [22]. This is intuitive, especially given that higher-order Trotter-Suzuki formulas are known to asymptotically outperform QDrift for large t, whilst QDrift is dominant in the small-t limit, suggesting a region where their strengths can be combined.

TABLE I: Summary of gate cost improvements observed (contingent on optimization convergence). We observe that savings tend to improve somewhat as the number of terms increases (within the same model), with the exception of Jellium 7, where the optimizer struggles with partitioning due to the number of terms. This is evident in the lack of monotonicity of $C(\widetilde{X}_1)$ in Figure 6. The most significant savings are seen for the Jellium models. Even in cases where the number of terms is comparable to other models, larger advantages persist in Jellium. This further establishes the spectral norm distribution as one of the most important indicators of performance in the composite framework.
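The interpolation used to report ξ can be done in several ways; one plausible sketch (illustrative code, not the paper's library, using log-linear interpolation of the sampled cost curves) is:

```python
import numpy as np

def crossover_advantage(ts, cost_qd, cost_ts, cost_x):
    """Estimate the crossover time t* where C_QD(t*) = C_TS(t*) by linearly
    interpolating the log-costs, then return (t*, xi = C_QD(t*)/C_X(t*))."""
    diff = np.log(cost_qd) - np.log(cost_ts)
    i = np.flatnonzero(np.diff(np.sign(diff)))[0]   # first sign change of the gap
    # linear interpolation of the root of diff(t) on [ts[i], ts[i+1]]
    frac = diff[i] / (diff[i] - diff[i + 1])
    t_star = ts[i] + frac * (ts[i + 1] - ts[i])
    c_star = np.exp(np.interp(t_star, ts, np.log(cost_qd)))
    c_x = np.exp(np.interp(t_star, ts, np.log(cost_x)))
    return t_star, c_star / c_x
```

On synthetic cost curves $C_{QD}(t) = 100t$ and $C_{TS}(t) = 10t^2$, this locates the crossover at $t^\star = 10$ and returns the corresponding ratio against the composite cost.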

Real-Time Composite Quantum Simulations
Beginning with Hydrogen chains, our results are highlighted in Figure 4. This plot reveals two interesting features: for long-time simulations, heuristics can be found that essentially match the performance of an optimized formula simply by inspecting the distribution of the norms of individual summands $H_j$, and with the optimized routine, we find a significant improvement at the crossover point with ξ = 2.3. These plots begin flat for most of the simulation channels, which for the most part indicates that one application of the channel achieves the desired $\epsilon$ for multiple sequential simulations at small times. This is expected, and is especially common with Trotter-Suzuki formulas, given that one iteration applies at least L gates depending on the order, while QDrift provides the option of sampling single gates.
FIG. 4: H$_3$ Cost Plot Simulation (Real-Time): Log-log gate cost plot for the H$_3$ Hamiltonian generated with OpenFermion using three-dimensional Gaussians in a minimal basis. The bond distance is chosen to be that which minimizes the energy surface of H$_2$, which is ≈ 0.8 angstrom. We achieve a crossover advantage of ξ = 2.3, as well as persistent constant-factor advantages at long times.
Figure 4 is interesting in that both the heuristic and optimized channels provide significant advantages at the crossover point, and the heuristic partition seems to match the asymptotic performance of the optimized channel. This demonstrates that optimization subroutines are not required to gain advantages in this framework. Furthermore, the second-inner-order composite channel also shows a consistent advantage over the second-order Trotter channel.
To gain insight into effective choices of partition and QDrift samples, we also plot the optimized $N_B$ values and the ratio of Trotter terms to total terms $|A|/|H|$ against time in Figure 5. Here, we find choices that somewhat agree with our prior intuition. For short times, the optimizer places almost all terms into QDrift and slowly increases $N_B$. As t increases, more terms are placed into Trotter, with the partition bouncing around in the regime where Trotter and QDrift have similar costs, which is also expected. The composite channel $\widetilde{X}_{2,1}$ essentially places all terms into the Trotter simulator, given the favorable asymptotic performance of higher-order Trotter formulas over QDrift, while $\widetilde{X}_1$ finds a balance between the two at long times, likely due to their equivalent t scaling. The most interesting behavior is that of $N_B$ at mid to long times. Here, $N_B$ peaks near the crossover point and then falls off as Trotter t-scaling becomes dominant in $\widetilde{X}_{2,1}$. However, for the $\widetilde{X}_1$ channel, $N_B$ experiences somewhat of a revival after the peak and stabilizes at 15, which is about 24% of the terms. We use this percentage to motivate future heuristic choices of $N_B$ in our investigation of Jellium in Figure 6, which turns out to work quite well. When it comes to the simulation of Jellium, we find some of the most significant performance improvements within this section, including an order-of-magnitude cost difference at the crossover point (see Figure 6). Specifically, in the case of 6-site Jellium, the Trotter and QDrift cost at the crossover point is approximately 100 gates, versus the composite channel, which achieves the same precision with only about 7 gates. Here, it is also shown that one can find an adequate partition that leads to advantages at longer times without the need for any optimization. This is also the only model for which the optimization routine struggles to find optimal partitions in the neighbourhood of the crossover point. This leads to the Jellium 7 model inheriting a smaller ξ
than what is likely achievable.
The final system we investigate in this section is the graph toy model with exponentially decaying interactions, which is a beyond-nearest-neighbour model. In Figure 7, we study this model for chains of length 7 and 8 and find essentially identical behaviour. When moving from 7 to 8 spins, we only add 15 more terms to the Hamiltonian, which is evidently not enough to see any significant advantages. In fact, the crossover advantage is slightly smaller for the bigger model, but this could also be due to the optimizer not fully converging.

Imaginary-Time Composite Channels
This section contains cost plots of simulations of the same Hamiltonians as above, but with imaginary-time propagators, focusing on the most interesting Hamiltonians from the previous section.
For simulations of the Heisenberg model, we find advantages similar to those in real time. In Figure 8, we see that our proposed heuristic leads to an advantage over Trotter-Suzuki and QDrift in the regions of interest. What is different about this plot is that the optimizer finds the same $N_B$ and partitioning for all short times. This is an artefact of both the Hamiltonian and how the optimizer is programmed. Since all the splitting (single-site) terms have equal spectral norm, the optimizer is placed in an all-or-nothing scenario, as choosing $\omega_c < 1$ immediately places all terms into QDrift. Given that after receiving the cost, the program then solves a search problem to find the minimal r that achieves precision $\epsilon$, it is rare that it finds the ideal conditions to build a pure QDrift channel with r = 1. However, significant savings are still achieved at the crossover point, and over $T_1$ and $T_2$ at large β. Once again, the validity of heuristic partitions is shown, specifically in the first-order composite channel, which exactly matches its optimized version at large β.
FIG. 8: 8 Spin Heisenberg Model Cost Plot: In this imaginary time simulation we establish ξ = 3.1, as well as maintain advantages at large β.We also find that our chosen heuristics are essentially optimal at large β, with the green and dark blue lines overlapping.
For Hydrogen chains, we obtain strikingly similar results to those in real time, as seen in Figure 9. We once again obtain a significant crossover advantage, as well as constant-factor advantages at large β, or low temperature. Heuristics also continue to be effective, in this case from the crossover point onward. The same parameters are used to build the 3-atom Hydrogen chain as were used in real time. We achieve very similar results, and again recover the real-time result of ξ = 2.3 that was achieved in Figure 4.
For Jellium, we choose to investigate the system size with the best-behaved optimizer, as well as the largest ξ, which occurs for the 6-site model. In imaginary time, we once again reproduce a significant advantage, shown in Figure 10. As in the case of H$_3$, this plot is quite similar to the real-time case in Figure 6. However, here at large β the composite channel seems to do better in imaginary time, given that even the first-order composite channel (with optimization) outperforms the second-order Trotter channel. This happens at the final point of the plot, where $T_2$ is no longer in the "flat regime". While this is very interesting, it is unclear analytically why this occurs, and we would likely not expect this trend to continue asymptotically.

FIG. 10: 6-Site Jellium Simulation Cost Plot (Imaginary-Time): Here we recover the crossover advantage of ξ = 18.8 from the real-time simulation in Figure 6, which is also the largest advantage achieved in our imaginary-time simulations. We additionally achieve advantages over second-order Trotter at large β.
Overall, this section nicely complements some of the analytics in Section III, both by reinforcing the fact that composite quantum channels allow for similar advantages in real and imaginary time, and through the calculation of exact constant-factor advantages. In other words, this section provides convincing evidence of the applicability of composite formulas to classical imaginary-time Monte-Carlo algorithms.

Local Composite Quantum Channels
Distinct from the previous two algorithms, with the local composite channels we do not immediately expect to see significant simulation advantages for the small systems we can compute. Recall that this algorithm makes use of the Lieb-Robinson velocity $v_{LR}$, which limits the propagation of information, and thus correlations, in a local lattice model. In our above numerics, the lattices contain ≤ 8 sites, meaning that even for small $v_{LR}$ the lattice can still become quickly entangled. In this section, it is important to understand where the observed advantages originate: from the local decomposition, or from something else. For example, the results in prior sections already suggest that composite channels can outperform Trotter and QDrift channels in certain regimes. If a block decomposition is introduced, we pay a small gate cost to break the simulation into subsets (as gates on the boundary are applied more than once), but the advantages of the composite simulation are almost guaranteed to outweigh this cost. Thus, we wish to find a regime in which we can perform calculations of exact costs with local composite channels that explicitly gain advantages via block decomposition. Otherwise, we will observe essentially the same behaviour as before, but with slightly smaller constant factors. There are two ways to achieve this. One is to add more sites to the model, which quickly becomes computationally intractable with standard methods. The second strategy is to decrease the coupling between sites in the lattice, which naturally decreases $v_{LR}$. This is the strategy we utilize. In reference to Equation 50, we perform our cost plots on Heisenberg models with 8 sites, with $B_z = 1$ and $J^{(j)}_\nu$ sampled from an exponential distribution with a scale parameter (serving as a coupling constant) of 0.00005. To allow for a fair simulation, we then choose $\epsilon = 0.000001$, such that, statistically, 98% of terms will be greater than $\epsilon$, which can be seen from a simple integration of the PDF. Results of this simulation are shown in Figure 11.

FIG. 11: 8 Spin Heisenberg Model Simulated by localized and standard channels: This cost plot compares Trotter to its localized versions, with the subscript l indicating the overlap of the boundary region in the block-local simulation. Channels with no l subscript are the standard simulations from prior sections. As expected, we observe that the composite channel is most efficient; however, given that the standard Trotter algorithm outperforms the localized Trotter channel, we can conclude that we are not in the regime where locality provides advantages. The same heuristics were used for the composite channel as in Figure 8, with $N_B = (4, 1, 4)$ on lattice subsets (A, Y, B).
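The 98% figure follows directly from the survival function of the exponential distribution, $P(J > \epsilon) = e^{-\epsilon/\text{scale}}$; a quick back-of-the-envelope check (our own, not part of the paper's library):

```python
import math

# For J ~ Exp(scale), integrating the PDF gives P(J > eps) = exp(-eps / scale).
scale = 0.00005   # coupling scale parameter quoted in the text
eps = 0.000001    # error tolerance epsilon quoted in the text
p_above = math.exp(-eps / scale)
print(round(p_above, 4))  # 0.9802, i.e. about 98% of sampled couplings exceed eps
```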
Given the gap between $T_1$ and $T_{l=2}$ in this very weak coupling regime, it is unclear whether our methods (exact gate counts) provide the means for investigating advantages gained by locality. In Ref. [23], via computations of bounds, numerics did not show advantages (in the form of T-gates) until approximately 100 sites were included in a more strongly-coupled model with $J^{(j)}_\nu \in [-1, 1]$ sampled i.i.d., so our results with far fewer spins are not unexpected. However, we still theoretically expect to see advantages in the limit where system sizes are large and we can take advantage of the Lieb-Robinson bound.

VI. DISCUSSION AND SUMMARY
The main contributions of this work are the extension of the composite channel to imaginary time and to local Hamiltonians, and the construction of a composite simulation library for numerically evaluating algorithmic performance. The imaginary-time bounds provided advocate for the potential use of QDrift and composite QDrift-product formulas in classical simulations, where they can potentially improve the efficiency and accuracy of Quantum Monte-Carlo simulations. This is possible by shortening the path length of discrete-time path integrals in these calculations, given that our algorithms can approximate $e^{-\beta H}$ with fewer propagators than Trotter-Suzuki formulas in most cases. Our results also highlight that, depending on the choice of β, one may perform an imaginary-time simulation using QDrift alone, particularly for high-temperature physics. It is an open question whether these methods can be used to improve the overall accuracy before one runs into the sign problem. Nevertheless, these methods are interesting in that one can use physical information about the Hamiltonian to choose a partitioning that possibly grants a computational advantage.
Furthermore, the extension of the composite channel to local Hamiltonians provides an interesting approach to lattice simulations in which there is a priori knowledge about the lattice structure. The diamond-distance bound on this algorithm is significant in that the cost of establishing local composite blocks is seemingly minimal in terms of asymptotics, making it a potentially useful algorithm. However, it seems more powerful numerical methods may be required to explore the regimes in which this approach shines, given the size of the systems likely required. An interesting future trade-off to be explored here is between the size of the blocking l and the size of the commutator sums in Theorem 9. Furthermore, one can investigate optimizations of partitions and QDrift samples on lattice subsets. Despite our library containing these methods, we did not employ them, as they can lead to misleading figures. Here, one must tread cautiously: adding more blocks with parameterized weights and QDrift samples can lead to an optimizer finding lower costs, similar to how adding hidden units to a neural network can lead to over-fitting of a data set in machine learning. In this scenario, it may be possible to find localized composite channels that outperform the standard channels simply because the increase in parameters allowed a more tailored partitioning.
One of the difficulties with the techniques introduced in Ref. [22] is finding good parameters that yield cost improvements. Our numerical results suggest that even heuristic choices of parameters, such as $N_B$, can yield at least factor-of-2 reductions for small Hamiltonians, as seen in Figure 4. The difficulty of multiple-parameter optimization becomes more of an issue when moving to the local simulation framework. With m blocks, there are at least 2m parameters to be determined under a chop partitioning scheme. However, simple heuristics still lead to performance advantages over localized Trotter-Suzuki formulas in Figure 11. The fact that said heuristics provide advantages with only an 8-spin model suggests that cost savings should scale favorably with increased lattice sizes.
In Section V C, a series of short but significant proofs was also provided to remind the reader to be cautious in examining error within hybridized algorithms. In our case, the infidelity treated QDrift differently than Trotter by a square, and using it would have skewed the partitions and provided inaccurate costs for the composite channel.
Finally, the library built for this project is easy to use. It is also flexible, and it reliably evaluates the exact number of operator exponentials $e^{-iH_j t}$ required to execute on a quantum computer (or $e^{-H_j \beta}$ for classical simulation) to time-evolve a system within a given error tolerance. To reiterate, this is important because the cost of these algorithms is highly dependent on the Hamiltonian, specifically on the number and size of the terms and their commutators. Additionally, the costs and errors discussed in this paper are derived from analytic upper bounds on the diamond distance and induced Schatten $1 \to 1$ norm, which are often obtained through repeated applications of the triangle inequality and max operations, and can therefore be loose. Exact numerics can capture the true costs and errors of these algorithms, and the observed advantages can motivate further studies of these approaches. This library is available upon reasonable request to the authors.
Now we shall proceed by dealing with each of these terms individually. The denominator of the a term can be expanded via its geometric series. We first focus on the leftmost factor. To bound this term, we apply the triangle inequality to distribute the norm over each term in the sum. We also use the fact that the absolute value of the trace is trivially upper-bounded by the induced 1-norm, $|\mathrm{Tr}\,\mathcal{L}(\rho)| \leq \|\mathcal{L}(\rho)\|_{1\to 1}$, where it is understood that this absolute value is maximized over all $\rho$ with $\|\rho\| = 1$. We then use the sub-multiplicative property, for which we require $|e^{2\beta\lambda/N} - 1| < 1$. Next, inspecting the numerator, this is of the form bounded in the Appendix of [11]; therefore, the corresponding bound holds trivially. Next, let us proceed to bound the b term by carefully expanding the double sums up to second order. Using the fact that $\sum_j h_j \mathcal{L}_j = \mathcal{L}$, the final two terms cancel. Now we once again use the property $|\mathrm{Tr}\,\mathcal{L}(\rho)| \leq \|\mathcal{L}(\rho)\|_{1\to 1}$, and using the same integral bound we obtain the stated expression. Finally, we are left to bound the second term in Equation A34, which we can rewrite as Equation A42. The function $f(y) = y \ln y$ is monotonic over the constrained domain and can be upper-bounded by inserting the largest value that $\ln y$ achieves. Also, notice that the term in parentheses is precisely the third term in Equation A34 that we bounded above. Using these two observations, and in the last step once again applying the integral identity following the same procedure as for the first term, we linearize each term. Combining all the results, we obtain a bound with constants $a \approx 28.41845$, $b \approx 128.95110$, and $c \approx 95.08717$. If we make the promise that $\frac{\lambda}{N} \leq 0.01$, the bound simplifies to one with $C \approx 29.71747$, completing the proof.
This condition is not very restrictive on the imaginary time β. Considering the condition given in Equation (A35), we have $\beta < \frac{N \ln 2}{2\lambda} \approx 34.657$, so any $\beta < 10\pi$ is allowed, meaning that, given this promise on the ratio $\frac{\lambda}{N}$, the bound holds for a simulation time that would allow the phase to oscillate up to 5 times if we were dealing with real time.
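The arithmetic behind these numbers follows from the convergence condition of the geometric series; assuming that condition is $|e^{2\beta\lambda/N} - 1| < 1$ as stated above, the chain of implications is:

```latex
\left|e^{2\beta\lambda/N}-1\right| < 1
\;\Longrightarrow\;
e^{2\beta\lambda/N} < 2
\;\Longrightarrow\;
\beta < \frac{N\ln 2}{2\lambda}
\;\overset{\lambda/N \,\le\, 0.01}{\Longrightarrow}\;
\beta < \frac{\ln 2}{0.02} \approx 34.657 > 10\pi \approx 31.416 .
```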

Imaginary Time Trotter-Suzuki Formulas
Following the procedure in [22], we must first convert the existing bound to a bound on channels and then bound the error with the renormalizing operations.
Here Υ is the number of stages of the product formula, 2k is its order, and λ is the sum of the spectral norms of the Hamiltonian summands. The notation $O(\cdot)$ is understood in the infinite limit of its arguments.
Given a Hamiltonian with L terms and N samples, building the exact channel incurs additional matrix multiplication overhead $O(NL)$ (not considering arithmetic overhead). While this does not seem prohibitive, it does become expensive when we have a Hamiltonian with many terms that we wish to sample many times for a high degree of accuracy. It should be noted, however, that this is not as bad as it initially seems, given that there are $L^N$ terms in the QDrift channel sum. The numerical routine that computes this sum was optimized by using the fact that channel samples are i.i.d. With intermediate summation producing a new input ρ after each sample, we do not need to compute each term of this sum individually: $\sum_j p_j e^{-iH_j t} \rho_i e^{iH_j t} \to \rho_{i+1}$ (B13), repeating this procedure N times until $\rho_i \to \rho_N$, which constructs the exact QDrift channel output with $O(NL)$ matrix multiplications. We mention this to highlight the advantages of Monte-Carlo sampling where it can be done reliably, as well as to highlight how QDrift is computed exactly in our work (which will be made necessary by results in the following section).
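A sketch of this intermediate-summation routine (an illustrative NumPy version of the idea above, not the library code; it assumes Hermitian term matrices and the standard QDrift sampling convention) is:

```python
import numpy as np

def exact_qdrift_output(h_terms, t, n_samples, rho):
    """Exact QDrift channel output in O(N*L) matrix multiplications.

    Instead of expanding all L^N sample sequences, feed the intermediate
    mixture back in as the next input state (the i.i.d. trick of Eq. B13).
    """
    norms = np.array([np.linalg.norm(h, 2) for h in h_terms])  # spectral norms h_j
    lam = norms.sum()
    probs = norms / lam                      # QDrift sampling probabilities p_j = h_j / lambda
    tau = lam * t / n_samples                # evolution angle per sample
    gates = []
    for h, hn in zip(h_terms, norms):
        vals, vecs = np.linalg.eigh(h / hn)  # unitary e^{-i tau H_j / h_j} via eigendecomposition
        gates.append((vecs * np.exp(-1j * tau * vals)) @ vecs.conj().T)
    for _ in range(n_samples):               # N rounds, each a convex mixture over L gates
        rho = sum(p * u @ rho @ u.conj().T for p, u in zip(probs, gates))
    return rho
```

Each round applies L conjugations, so the total work is $O(NL)$ matrix multiplications, and the output is trace-preserving and Hermitian by construction.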

b. Trace Distance over Density Matrices
The next error measure we wish to investigate is the trace distance, which is given by the Schatten 1-norm of the difference of states (up to a factor of one half). The trace distance is a metric over the space of density matrices defined as follows:

Definition 12 (Trace Distance). Let $\rho, \sigma \in \mathbb{C}^{2^n \times 2^n}$ be density matrices representing quantum states of equal Hilbert space dimension. The trace distance between the states is then defined as $T(\rho, \sigma) := \frac{1}{2}\|\rho - \sigma\|_1 = \frac{1}{2}\mathrm{Tr}\sqrt{(\rho - \sigma)^\dagger (\rho - \sigma)}$, where $T(\rho, \sigma) \in [0, 1]$.
Given that ρ and σ are necessarily Hermitian, the trace distance simplifies to $T(\rho, \sigma) = \frac{1}{2}\sum_i |\lambda_i|$, where $\lambda_i$ are the eigenvalues of the matrix $(\rho - \sigma)$. There are no simplifications in our approach to computing this numerically. Here, we must compute the exact density matrix output of the composite channel (a necessarily mixed state, owing to the QDrift channel component) and the output of the unitary channel. Exact diagonalization is performed to obtain the eigenvalues, and the sum above is then computed. Providing a rigorous argument for crossover points in the compute time between the fidelity and trace distance (as a function of N, L) is beyond the scope of this paper; however, given the arguments from the previous section, our intuition is that for small systems the trace distance will be computable in a reasonable amount of time, whereas Monte-Carlo sampling the infidelity will become more advantageous as the system size grows.
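Computed this way, the trace distance is only a few lines of NumPy (an illustrative sketch of the exact-diagonalization computation just described):

```python
import numpy as np

def trace_distance(rho, sigma):
    """T(rho, sigma) = (1/2) * sum_i |lambda_i|, where lambda_i are the real
    eigenvalues of the Hermitian difference rho - sigma."""
    evals = np.linalg.eigvalsh(rho - sigma)  # exact diagonalization
    return 0.5 * np.abs(evals).sum()
```

As a sanity check, identical states give distance 0 and orthogonal pure states give distance 1, matching the stated range $T \in [0, 1]$.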

Comparison of Error Measures
In the following section, we provide short proofs regarding how the error of each of the QDrift and Trotter-Suzuki algorithms scales with time, in both the infidelity and trace-distance frameworks. It is important to note that we are specifically using the Schatten 1-norm over density matrices and not an induced 1-norm over channels. Therefore, we compute this quantity using the outputs of the two channels, and the same goes for the infidelity, which is only defined on the density matrices output by the channels in question. We begin with the infidelity:

Proposition V.1 (QDrift Infidelity Time Scaling). Given a QDrift channel $\mathcal{Q}(\rho, t)$ and the standard evolutionary channel $\mathcal{U}(\rho, t/N)$, for a density matrix $\rho$ and time $t/N$, the infidelity between the outputs of the channels satisfies $1 - F(\mathcal{Q}(\rho, t), \mathcal{U}(\rho, t/N)) \in O(t^2)$.

Here we expand the QDrift channel via the Taylor series of the exponential operators and show that the terms of this series cancel with those of the Taylor series of $\mathcal{U}^N$ up to order $t^2$. To achieve this, we simply use the definitions $p_j = h_j/\lambda$ and $\sum_j H_j = H$. In the last line, R(t) is a remainder operator that is only a function of polynomials $t^n$, $n \geq 0$. The notation $O(t^2)$ is understood in the limit $t \to 0$.
Proposition V.4 (Trotter-Suzuki Trace Distance Time Scaling). Given a Trotter-Suzuki channel $\mathcal{T}_{2k}(\rho, t)$ and the standard evolutionary channel $\mathcal{U}(\rho, t)$, for an arbitrary density matrix $\rho$ and time t, the trace distance between the outputs of the channels satisfies $T(\mathcal{T}_{2k}(\rho, t), \mathcal{U}(\rho, t)) \in O(t^{2k+1})$.
be more expensive to compute for larger systems, it is favourable to work with, as it provides a more "fair" measure, treating the algorithms on a more equal footing. Thus, it is expected that the composite channel will appear more performant using this metric. In addition, using this framework requires no sampling, which means that monotonicity of the cost is guaranteed with respect to channel iterations. This proves useful in the cases in which we choose to optimize over possible partitions with respect to the cost. Now, out of interest, we also wish to numerically investigate the trace distance of the composite channel. Analytically, this is likely a messy problem given the partition, but intuitively we expect the scaling to be some combination $\alpha t^2 + \beta t^3 + \gamma$, with the slope of the log-log plot lying between 2 and 3 (see Figure 13).
FIG. 13: Trace-distance time scaling for a composite channel with inner order 2k = 2. This plot exhibits an interesting structure in that the time dependence of the trace distance goes through a "phase transition" between QDrift-dominated (red) and 2nd-order Trotter-Suzuki-dominated (blue) regions. Given that QDrift excels in short-time simulations whereas higher-order Trotter-Suzuki dominates at longer simulation times, this is not entirely unexpected.
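The expected slope-between-2-and-3 behavior can be sanity-checked on synthetic data (the coefficients below are illustrative, not fitted to our simulations):

```python
import numpy as np

# Synthetic composite error T(t) ~ alpha t^2 + beta t^3: a global log-log
# slope fit should land strictly between the QDrift-like exponent (2) and
# the 2nd-order Trotter-like exponent (3).
ts = np.logspace(-3, -1, 50)
trace_dist = 0.5 * ts**2 + 40.0 * ts**3
slope, intercept = np.polyfit(np.log(ts), np.log(trace_dist), 1)
print(f"fitted log-log slope: {slope:.2f}")
```

The local slope $2 + 80t/(1 + 80t)$ rises from near 2 at small t toward 3 as the cubic term takes over, mirroring the "phase transition" visible in Figure 13.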

FIG. 1 :
FIG. 1: A simple quantum circuit showing an implementation of an exponentiated product of Pauli spin operators.

$e^{\frac{Ax}{2}} e^{Bx} e^{\frac{Ax}{2}}$,
which annuls the error term $\frac{[B, A]x^2}{2}$

Theorem 9 .
and m is the number of blockings constructed by the Lieb-Robinson decomposition. This provides convenient notation for the following theorem: Given a time t, local channel iterations r, and a local Hamiltonian $H_\Gamma$ on a D-dimensional lattice Γ, let $X_Y(t)$ be a local-composite channel with m local blocks $Y_i$ and partition $\{A, B\}_Y$ on each block; then the diamond norm of the difference between the local composite channel and the evolutionary channel has the following bound:

Y
DL D /l (37) where A Y , B Y are once again the terms of H Y in Trotter and QDrift with their respective spectral norms a Y , b Y , N B is the number of QDrift samples and λ is the sum of the spectral norms of the terms in QDrift.
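As a toy illustration of the blocking this theorem relies on, the sketch below groups the nearest-neighbour terms of a hypothetical 1-D lattice ($D = 1$) into contiguous blocks of $l$ sites. The site count $L$, block size $l$, and the left-assignment rule for boundary terms are illustrative only, not the paper's construction:

```python
# Hypothetical sketch: group nearest-neighbour terms of a 1-D lattice
# Hamiltonian into contiguous blocks of l sites, giving m ~ L / l blocks.
L, l = 12, 4
terms = [(i, i + 1) for i in range(L - 1)]  # supports of nearest-neighbour terms

blocks = {}
for supp in terms:
    b = min(supp) // l            # index of the block containing the term
    blocks.setdefault(b, []).append(supp)

# Terms straddling a block boundary are assigned to the left block here;
# in the full scheme it is these boundary interactions that drive the
# Lieb-Robinson error term in the diamond-norm bound.
for b in sorted(blocks):
    print(b, blocks[b])
```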

FIG. 3: Jellium Spectral Norm Distribution: Semi-log plots of the sorted normalized spectral norms versus Hamiltonian term index for 5- and 7-site Jellium models in figures (a) and (b) respectively. The plots show the increase in the number of terms as well as how the distributions become increasingly sharply peaked. In red we mark a potential choice of $\omega_c$ for the partitioning heuristic.
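The partitioning heuristic suggested by the figure can be sketched as a simple threshold rule: terms whose normalized spectral norm is at or above a cutoff $\omega_c$ go to the Trotter set $A$, while the sharply decaying tail is left to be sampled by QDrift. The distribution and cutoff below are synthetic stand-ins, not the actual Jellium data:

```python
import numpy as np

# Synthetic sharply peaked spectral-norm distribution (stand-in for Jellium)
norms = np.sort(np.exp(-0.5 * np.arange(40)))[::-1]
norms /= norms.max()  # normalized, as in the figure

omega_c = 1e-3  # candidate cutoff, analogous to the red marker in the figure

A = norms[norms >= omega_c]   # large terms -> Trotter-Suzuki set
B = norms[norms < omega_c]    # small tail  -> sampled with QDrift
lam = B.sum()                 # lambda: sum of the QDrift spectral norms

print(len(A), len(B), round(lam, 4))
```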

FIG. 5: Optimized H$_3$ Simulation Parameters: Semi-log plots of parameters obtained by the GBRT optimization routine for the $X_1$ and $X_{2,1}$ real-time channels. These parameter choices correspond to the H$_3$ simulation in Figure 4. In (a) we plot the cardinality of the $A$ set over the total number of terms as a function of time; these values are dictated by the GBRT-optimized value of $\omega_c$. In (b) we present the equivalent plot for $N_B$.

FIG. 6: Jellium Simulation Cost Plots (Real-Time): Log-log cost plots of quantum simulations of Jellium with 5, 6, and 7 sites in (a), (b), and (c) respectively. In (a) and (b) we show in red a chop heuristic that incurs no optimization overhead: the distribution of Hamiltonian terms is chopped immediately before $\max \frac{d\|H_j\|}{dj}$, and approximately $\frac{1}{5}L$ terms are sampled. This heuristic works quite well, although it is outperformed by the optimized version, especially at short times. In (a) we achieve $\xi = 9.2$. In (b) we achieve an impressive advantage of $\xi = 18.8$, the largest of all our real-time results. In (c) we perform a similar analysis of the 7-site model with some higher-order composite channels, but find that the optimizer has increased difficulty with larger numbers of Hamiltonian terms. Here $\xi = 10.4$, but inspection of neighbouring points in the crossover region suggests the advantage has the potential to be much larger.
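The chop heuristic described in the caption can be sketched as follows: sort the spectral norms in decreasing order, locate the steepest drop in the sorted distribution (the maximum of $|d\|H_j\|/dj|$), and chop immediately before it, leaving everything after the drop for QDrift sampling. The norm distribution below is synthetic, chosen only to exhibit a sharp drop:

```python
import numpy as np

# Synthetic sorted spectral norms with one sharp drop (toy stand-in for Jellium)
norms = np.concatenate([np.linspace(1.0, 0.8, 8), np.linspace(0.05, 0.01, 32)])

# Chop immediately before the steepest drop in the sorted distribution,
# i.e. before the index maximizing |d||H_j||/dj| (finite difference here).
drops = np.abs(np.diff(norms))
cut = int(np.argmax(drops)) + 1   # first index of the QDrift tail

A, B = norms[:cut], norms[cut:]   # A -> Trotter-Suzuki, B -> QDrift sampling
print(cut, len(A), len(B))
```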

FIG. 7: Toy Spin Graph Model Cost Plot: In (a) we have a cost plot for the 7-spin model, where we obtain a fairly significant crossover advantage of $\xi = 4.1$. The plot has multiple regions where different composite channels are optimal. In (b) we have the 8-spin model, where we establish $\xi = 3.9$. This advantage, as well as the channel performance, is almost identical to (a).

FIG. 9: H$_3$ Simulation Cost Plot (Imaginary-Time): The same parameters are used to build the 3-atom Hydrogen chain as in real time. We achieve remarkably similar results, and again recover the real-time result of $\xi = 2.3$ achieved in Figure 4.