Concentration for random product formulas

Quantum simulation has wide applications in quantum chemistry and physics. Recently, scientists have begun exploring the use of randomized methods for accelerating quantum simulation. Among them, a simple and powerful technique, called qDRIFT, is known to generate random product formulas for which the average quantum channel approximates the ideal evolution. qDRIFT achieves a gate count that does not explicitly depend on the number of terms in the Hamiltonian, which contrasts with Suzuki formulas. This work aims to understand the origin of this speed-up by comprehensively analyzing a single realization of the random product formula produced by qDRIFT. The main results prove that a typical realization of the randomized product formula approximates the ideal unitary evolution up to a small diamond-norm error. The gate complexity is already independent of the number of terms in the Hamiltonian, but it depends on the system size and the sum of the interaction strengths in the Hamiltonian. Remarkably, the same random evolution starting from an arbitrary, but fixed, input state yields a much shorter circuit suitable for that input state. In contrast, in deterministic settings, such an improvement usually requires initial state knowledge. The proofs depend on concentration inequalities for vector and matrix martingales, and the framework is applicable to other randomized product formulas. Our bounds are saturated by certain commuting Hamiltonians.


I. INTRODUCTION
Simulating complex quantum systems is one of the most promising applications for quantum computers. This task has many applications, such as developing new pharmaceuticals, catalysts, and materials [5,21,35], as well as solving linear algebra problems [3,26,43]. The task of digital quantum (dynamics) simulation can be phrased as a compiling problem: approximate a given unitary, say a Hamiltonian evolution U = e^{−iHt}, by a product of 'simple' unitaries g_k:

V = g_N · · · g_2 g_1.   (1)

Such a decomposition into elementary gates should obey two conditions: (i) it should accurately approximate the target unitary; in this work, we require that the error in operator norm¹ satisfies ‖U − V‖ ≤ ε for some specified accuracy parameter ε. Moreover, (ii) the decomposition should cost as little as possible. Several cost functions make sense in this context, but most of them can be approximated by the gate complexity, i.e., the number N of simple gates² on the right-hand side of Eq. (1).

The earliest compilation procedures for quantum simulation were based on product formulas [30,45], also known as Trotterization or splitting methods. They rest on the idea of approximating the matrix exponential of a sum by a product of matrix exponentials.

* These authors contributed equally.
¹ For measuring time-evolved observables tr(ρ(t)O), the operator norm suffices; for quantum phase estimation of ground-state energies, things can get more complicated. There, the estimates may depend on the details and vary by orders of magnitude [29], and the assessment of real costs is an ongoing research direction that is beyond the scope of this work.
² In this work, we count e^{−ih_j t} as a single gate, as in [6,7,13,14,31]. Strictly speaking, in a fault-tolerant quantum computer, each e^{−ih_j t} would be further decomposed into universal two-qubit gates. According to the Solovay-Kitaev theorem [17], this incurs at most a constant (multiplicative) overhead, but the exact cost depends on the type of hardware.
We will review one such construction below in Eq. (2). Subsequently, alternative principles have led to the development of other quantum simulation algorithms. These include linear combination of unitaries [15], quantum signal processing [31] and qubitization [32]. By and large, these algorithms rely on more powerful quantum computing primitives to yield improved performance in accuracy and cost. We refer to [34] for a systematic review.
Despite these advanced simulation techniques, product formulas have recently undergone a renaissance [6,7,9,13,14,31]. They are not only simple and (comparatively) easy to implement on near-term devices, but they also remain very competitive [14], provided that they incorporate information about the structure of the problem, such as initial-state knowledge [42], locality [23], or the commutator structure of the Hamiltonian [14]. The purpose of this paper is to explore randomized approaches to constructing product formulas.
We begin by reviewing the first-order Lie-Trotter formula, which will later be contrasted with a randomized variant (qDRIFT). To that end, consider a quantum many-body Hamiltonian H = ∑_{j=1}^{L} h_j composed of L simple terms h_j. The first-order formula cycles through each term in the Hamiltonian:

U ≈ [exp(−i(tL/N)h_L) · · · exp(−i(tL/N)h_1)]^{N/L}.   (2)

To approximate the target unitary U up to accuracy ε in operator norm, a total gate count N = O(Lλ²t²/ε) suffices³ [14, Section 3]. In this expression, t is the simulation time, L is the number of terms, and λ = ∑_j ‖h_j‖ summarizes the interaction strengths within H. The main idea is to cancel the leading-order term in the Taylor expansion by including each of the L terms. Owing to this construction, the factor L remains in higher-order Suzuki formulas, where the gate count becomes N = O(L(λt)^{1+o(1)}/ε^{o(1)}), even though the time dependence becomes nearly optimal. We refer to Table I for a sharper gate count incorporating commutators.
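As a sanity check of the first-order formula (2), the following self-contained sketch (using a toy two-qubit Hamiltonian of our own choosing, not one from the paper) implements the cycling construction and verifies numerically that the operator-norm error decays as N grows:

```python
import numpy as np
from scipy.linalg import expm

# Toy 2-qubit Hamiltonian H = h_1 + h_2 with non-commuting terms
# (an illustrative choice; any small L-term Hamiltonian works).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

h = [np.kron(Z, Z), np.kron(X, I2) + np.kron(I2, X)]  # L = 2 terms
t = 1.0
U = expm(-1j * t * sum(h))  # ideal evolution

def trotter1(h, t, N):
    """First-order formula (2): cycle through all L terms, N/L times."""
    L = len(h)
    cycle = np.eye(h[0].shape[0], dtype=complex)
    for hj in h:  # one cycle: exp(-i(tL/N)h_L) ... exp(-i(tL/N)h_1)
        cycle = expm(-1j * (t * L / N) * hj) @ cycle
    return np.linalg.matrix_power(cycle, N // L)

# Operator-norm error ||U - V|| shrinks like O(1/N) for fixed t.
errs = [np.linalg.norm(U - trotter1(h, t, N), 2) for N in (8, 16, 32)]
assert errs[0] > errs[1] > errs[2]
```

Doubling the gate count N roughly halves the error, consistent with the O(Lλ²t²/ε) gate count quoted above.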
Recently, researchers started using randomization to improve the performance of product formulas [9,13,18,38]. Campbell [9] introduced the qDRIFT algorithm, which approximates the target evolution U = e^{−iHt} by a quantum channel that results from averaging products V_N · · · V_1 of random unitaries. Each V_k corresponds to a short-time evolution generated by a single term h_K from the Hamiltonian. The index K is selected randomly, according to an importance sampling distribution (p_1, . . . , p_L), constructed so that the average evolution matches the ideal evolution over a time step t/N to leading order. This approximation is achieved by averaging over a single (random) unitary. (In contrast, the first-order Suzuki formula (2) must cycle through all terms, which incurs an extra L-factor.) Independence among the V_k's then ensures that the average of the full product approximates the entire time-t evolution. Campbell proved that the averaged evolution approximates the ideal evolution when the gate count satisfies

N = O(λ²t²/ε).   (3)

Observe that this gate count is independent of the number of terms L in the Hamiltonian. We refer to Figure 1 for a summary of this procedure. Operationally, Campbell considers a black box that applies a new random product V_N · · · V_1 of unitaries every time it is invoked. The average of all possible product formulas is V^{(N)}(X) = E[V_N · · · V_1 X V_1^† · · · V_N^†] and forms a completely positive trace-preserving (CPTP) map. The ideal unitary also forms a CPTP map, given by U(X) = U X U^†. Campbell proved that (3) ensures that the two CPTP maps are ε-close in diamond distance.

³ To finally compile into universal gates, run a standard gate-synthesis algorithm (such as Solovay-Kitaev) for each simple term. This yields another multiplicative factor on the gate count.

Randomized product formula (qDRIFT)
Inputs: Hamiltonian H = ∑_{j=1}^{L} h_j with interaction strength λ = ∑_j ‖h_j‖, evolution time t, and number of steps N.
In each interval of length t/N: evolve along a single random term of the Hamiltonian, selected according to its importance.
Output: the unstructured (randomly generated) product formula V_N · · · V_1.

Figure 1. This procedure uses a single, randomly selected Hamiltonian term to approximate the whole quantum simulation up to time t/N. Subsequent time steps are approximated in an analogous fashion. The importance sampling distribution over Hamiltonian terms is designed to provide accurate approximations in expectation:

p_j = ‖h_j‖/λ, j = 1, . . . , L.   (4)

It is interesting to compare the gate count (3) achieved by qDRIFT with very recent and powerful results for (deterministic) product formulas [14,23]; see Table I. Broadly speaking, deterministic product formulas can achieve gate counts that are linear in the time t and the number L of terms. The qDRIFT gate count, on the other hand, is independent of L, but quadratic in t. This implies that qDRIFT should be favored for short-time rather than long-time simulations. For systems with many terms in the Hamiltonian (L ≫ 1), such as in quantum chemistry and the SYK model [33,40,41]⁴, there are indeed physically relevant time scales for which qDRIFT outperforms Suzuki formulas; see Table I. To summarize, there are interesting quantum simulation problems where the qDRIFT gate count (3) compares favorably with very recent and powerful results on high-order Suzuki formulas. This advantage is a consequence of randomization. But we may ask whether the benefit arises from the random choice of an individual product formula, or whether it is due to the averaging of many product formulas together. Which aspect produces the speed-up?
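The boxed procedure above can be sketched in a few lines. The sampling probabilities p_j ∝ ‖h_j‖ and the step unitary exp(−i(tλ/N) h_j/‖h_j‖) follow Campbell's qDRIFT construction [9]; the three-term Hamiltonian below is an arbitrary toy example of ours:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Toy 2-qubit Hamiltonian with unequal interaction strengths.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
h = [0.8 * np.kron(Z, Z), 0.5 * np.kron(X, I2), 0.3 * np.kron(I2, X)]

norms = np.array([np.linalg.norm(hj, 2) for hj in h])
lam = norms.sum()        # interaction strength lambda = sum_j ||h_j||
p = norms / lam          # importance sampling distribution p_j = ||h_j||/lambda
t, N = 1.0, 400
U = expm(-1j * t * sum(h))  # ideal evolution

V = np.eye(4, dtype=complex)
for _ in range(N):       # one random term per t/N interval
    j = rng.choice(len(h), p=p)
    V = expm(-1j * (t * lam / N) * h[j] / norms[j]) @ V

err = np.linalg.norm(U - V, 2)  # operator-norm error of this realization
assert err < 0.6
```

Note that a single realization V_N · · · V_1 already lands close to U; the sections below quantify how close, and how the answer depends on whether we demand accuracy for all input states or only for a fixed one.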
Figure 2. To sample a product formula that works for all n-qubit input states and observables with high probability (left), the number of gates must be larger than for sampling a product formula that works for a fixed, yet arbitrary, input state (center). Resampling fresh product formulas every time (right) produces an average channel that requires even fewer gates; this is the original qDRIFT guarantee [9].
In this work, we analyze why random product formulas are efficient. To do so, we study the performance of a random instance of the product formula V_N · · · V_1. By contrasting this instance with the average channel V^{(N)}, we deduce that random sampling alone allows us to avoid the dependence on the number L of terms in the Hamiltonian; it is not necessary to average many different product formulas. Our results establish strong concentration bounds for two cases, depicted in Figure 2.
First, let n denote the number of system constituents, e.g., qubits. If the gate count N obeys N ≥ Ω(nt²λ²/ε²), then, with high probability, a single realization of the random product formula approximates the ideal target unitary up to accuracy ε in operator norm. By the probabilistic method, this result also establishes, for the first time, the existence of product formulas whose gate count N is independent of the number L of terms in the Hamiltonian but depends on the system size n instead. On the other hand, we cannot easily verify the quality of any given instance.
In practice, we often wish to evolve a fixed input quantum state ρ, which may be arbitrary and unknown. This change in the problem statement has profound implications for randomized quantum simulation. With high probability, a random product formula with

N ≥ Ω(t²λ²/ε²)   (5)

terms suffices to achieve an ε-approximation V_N · · · V_1 ρ V_1^† · · · V_N^† of the ideal time-evolved state UρU^† with respect to trace distance. Roughly speaking, this result implies that each input state admits product formulas that are n times shorter than a "general-purpose" product formula that works for all input states simultaneously. Although the set of effective product formulas depends on the choice of state and observable, the formulas are all produced by the same randomized procedure. Remarkably, this procedure does not exploit any knowledge of the particular input state.
Note that the gate counts required for the original qDRIFT protocol (3) and Rel. (5) only differ in the scaling with accuracy: order 1/ε for the full qDRIFT channel vs. order 1/ε² for a single random product formula with fixed input. This discrepancy is reminiscent of the mixing lemma of Hastings [24] and Campbell [8]: mixing unitaries yields a "quadratic" improvement in accuracy. See Lemma 3.1 below for a statement.
In this work, we analyze several classes of randomized product formulas, including qDRIFT as a particular example. The underlying theory is far more general because it relies on powerful concentration results for matrix and vector martingales. The approach yields strong concentration results for any product of random unitary matrices, such as the ones that arise from randomized compiling [8,24]. We are confident that these ideas are applicable to other problems that involve functions of random matrices, such as [10,12].
Roadmap. The rest of this paper is organized as follows. Section II presents the main theoretical contributions of this work in detail. These are further supported and illustrated by numerical experiments presented in Section II D. Section III contains two instructive examples, as well as a non-technical illustration of the underlying proof idea. We introduce the relevant background on martingales in Appendix A. Details and rigorous arguments are provided in Appendix B. The final appendix (Section C) establishes asymptotic tightness for the time evolution of two simple (commuting) Hamiltonians.

II. MAIN RESULTS
This section gives rigorous results for the error incurred by a randomly sampled product formula V_N · · · V_1, as compared with the ideal unitary evolution operator U = exp(−iHt). The results depend on the distance measure and the particular setup, which we discuss separately in the following subsections.
A. Error bound in diamond distance: Worst-case error over all input states and observables

The diamond distance is a standard distance measure for quantum channels. To compare two unitaries U_1 and U_2, we use the diamond distance between the associated channels,

‖U_1 − U_2‖_⋄ = max_{|ψ⟩} ‖(U_1 ⊗ I)|ψ⟩⟨ψ|(U_1 ⊗ I)^† − (U_2 ⊗ I)|ψ⟩⟨ψ|(U_2 ⊗ I)^†‖_1,

where the maximum ranges over all pure states on the system extended by an equally large ancilla. Our first main result bounds the gate complexity that suffices to guarantee that the randomly sampled product formula V_N · · · V_1 is close to the ideal evolution exp(−itH) in this worst-case error metric.
Theorem 1 (qDRIFT: error bound in diamond distance). With probability at least 1 − δ, the diamond-distance error satisfies ‖U − V‖_⋄ ≲ √(nt²λ²/N), up to a factor logarithmic in 1/δ, where V(X) = V_N · · · V_1 X V_1^† · · · V_N^†.

To keep notation as simple as possible, we have formulated this result for pure input states |ψ⟩. Convexity readily allows for extending the bound to mixed input states ρ = ∑_i p_i |ψ_i⟩⟨ψ_i| as well. A complementary result bounds the expected approximation error in terms of the gate count N.
The symbol ≲ applies in the large-N regime and suppresses constants. The proof sketch is presented in Section III C, and the detailed proof is given in Section B 1.
For comparison, the error bounds for the average quantum channel, established in [9], imply that ‖U − V^{(N)}‖_⋄ ≲ t²λ²/N. As we can see, the error bound for the average over all product formulas is smaller than the error for an individual random product formula. To understand the discrepancy, it is valuable to think of a randomly sampled product formula as a random walk on the group of 2^n × 2^n unitary matrices. Figure 3 indicates why a single realization of the random walk V_N · · · V_1 has much greater error than the average of the random walks.
In the error bound O(√(nt²λ²/N)), the square root reflects the statistical nature of the fluctuations of the random walk around its expectation. The diamond norm requires us to control the behavior of the random product formula when applied to every 2^n-dimensional input state. Remarkably, we only pay for the logarithm of the dimension, which coincides with the number n of qubits. We will discuss an intuitive example, where log(2^n) naturally arises from a union bound, in Section III B. Formally, this pre-factor is a general feature of concentration inequalities for matrix martingales (Sec. A). Similar proof techniques could potentially be useful for controlling stochastic errors in other quantum computing applications.

Figure 3. Illustration of concentration effects for random walks (and their averages) on the unitary group. The expectation E[V_N · · · V_1] of a random product formula is not unitary, but it may be very close to the ideal evolution. A sampled random product formula V_N · · · V_1 is unitary, but its distance from the ideal evolution is about O(√(nt²λ²/N)). The average of the random product formulas results in an error of O(t²λ²/N).

B. Faster simulation for a fixed input state
In practice, it is common to perform a quantum simulation starting from a particular input state. The distinction from the previous setting is that the product formula only needs to work for one (arbitrary and possibly unknown) input state, not for all states simultaneously. The next theorem asserts that much shorter product formulas suffice in this easier setting.
Theorem 2 (qDRIFT: error bound for a fixed input state). With probability at least 1 − δ, the output state V_N · · · V_1 |ψ⟩ reproduces every observable expectation value of the ideal output state up to error ε‖O‖:

|O_{V_N···V_1|ψ⟩} − O_{exp(−itH)|ψ⟩}| ≤ ε‖O‖ for all observables O,

where O_{|ψ⟩} = ⟨ψ|O|ψ⟩. This is equivalent to the output state V_N · · · V_1 |ψ⟩ being ε-close to the ideal output state exp(−itH)|ψ⟩ in trace distance.
Again, convexity allows us to extend this bound to a (fixed) mixed state ρ = ∑_i p_i |ψ_i⟩⟨ψ_i| as well. And, a complementary result bounds the expected approximation error in terms of the gate count N.

Corollary 2.1 (qDRIFT: error bound in trace distance). Consider an n-qubit Hamiltonian H = ∑_j h_j with λ = ∑_j ‖h_j‖. A randomized product formula V_N · · · V_1, drawn from (4), has expected trace-distance error ≲ √(t²λ²/N) for any fixed input state.

Theorem 2 yields an n-fold improvement over the gate count from Theorem 1. So, for a 100-qubit system, focusing on a single input state leads to a 100× reduction in gate complexity compared with a simulation that works for all input states. The product formulas that work for a fixed input state may vary with the choice of state, but they are all generated by the same qDRIFT procedure. Taking a union bound, we can also construct short product formulas that work for a moderate number of input states by increasing the gate complexity slightly.
The proof of Theorem 2 is similar in spirit to the proof of Theorem 1. We construct a random walk from the (fixed) starting state |ψ⟩. We show that, with high probability, the output state is close to the ideal output state U|ψ⟩. However, what marks the difference from the unitary results (Theorem 1) is that vector concentration inequalities suffice. More precisely, we analyze the vector random walk using a geometric tool called uniform smoothness [25]. The details appear in Section B 2. The resulting n-fold improvement can also be understood as a consequence of switching the order of quantifiers; see Section III B for an explicit example.

C. Extension to general products of random unitaries

So far, we have presented our results exclusively for qDRIFT. But the underlying concentration analysis readily extends to more general random walks on the unitary group. Let V_N · · · V_1 ∈ U(2^n) be a random product designed to approximate a target unitary U = U_N · · · U_1. We need two properties. (i) Causality (Eq. (8)): for 1 ≤ k ≤ N, the random selection of V_k may depend only on the previous choices V_1, . . . , V_{k−1}. (ii) Accurate approximation: each realization of V_k (and its conditional expectation) must be close to the ideal unitary U_k. More precisely, let R, b_k > 0 be constants such that

‖V_k − U_k‖ ≤ R and ‖E[V_k | V_1, . . . , V_{k−1}] − U_k‖ ≤ b_k   (9)

almost surely for all 1 ≤ k ≤ N. All concentration results from this work, as well as the main result from [9], readily extend to random products that obey these two properties.
Theorem 3 (general concentration bounds for products of random unitaries). Let V = V_N · · · V_1 ∈ U(2^n) be a random product that approximates a target product U = U_N · · · U_1 in a causal (Eq. (8)) and small-step (Eq. (9)) fashion. Then, the associated unitary channels V(X) = V X V^† and U(X) = U X U^† obey the concentration bounds stated precisely in Appendix B; throughout, ≲ suppresses absolute constants.

Concentration of randomly permuted Suzuki formulas
So far, we have only introduced the first-order Lie-Trotter formula (2). The second-order formula is typically called the Suzuki formula. For some short time τ > 0, define

S_2(τ) = e^{−i(τ/2)h_1} · · · e^{−i(τ/2)h_L} e^{−i(τ/2)h_L} · · · e^{−i(τ/2)h_1}.

Higher-order formulas, also named after Suzuki, are constructed recursively from S_2(τ):

S_{2p}(τ) = S_{2p−2}(q_p τ)² S_{2p−2}((1 − 4q_p)τ) S_{2p−2}(q_p τ)², where q_p := 1/(4 − 4^{1/(2p−1)}) [45].

We can combine r rounds of 2p-th order Suzuki formulas with time τ = t/r each to approximate a desired quantum evolution: V = S_{2p}(t/r)^r. For fixed ε, we choose an appropriate number of rounds r and obtain a gate count N ≈ rL of simple gates e^{−i(t/r)h_j} that ensures worst-case approximation error ‖U − V‖ ≤ ε (Eq. (10)) [13]. Note that the 2p-th order Suzuki formula S_{2p}(τ) does not specify a preferred order in which the Hamiltonian terms h_j should appear. Childs, Ostrander and Su observed that randomly permuting the order of Hamiltonian terms within each block S_{2p}(t/r) can considerably suppress the approximation error of low-order Suzuki formulas [13]. More precisely, we reshuffle the ordering in each Suzuki block by applying uniformly random permutations π_1, . . . , π_r to the term indices (h_j → h_{π_1(j)} for the first Suzuki block, etc.). Denote the resulting randomized Suzuki formula by V^{π_r···π_1}_{2p} = S^{π_r}_{2p}(t/r) · · · S^{π_1}_{2p}(t/r). Childs, Ostrander and Su proved that a total gate count (Eq. (11)) ensures that the average channel (averaged over all possible permutations) obeys ‖U − E_{π_r,...,π_1} V^{π_r···π_1}_{2p}‖ ≤ ε. We note in passing that Eq. (11) is tighter than the original bound presented in [13]. This slight improvement follows from replacing the original mixing lemma [8,24] in the proof of Childs et al. by a stronger statement derived in this work (Lemma 3.1 below). Applying Theorem 3 to the problem at hand supplies strong concentration around this expected behavior.
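The randomly permuted construction can be sketched as follows; the three-term two-qubit Hamiltonian is an illustrative choice of ours, and each S_2 block is implemented as a palindromic sweep over a freshly shuffled term ordering:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

# Toy 2-qubit Hamiltonian (illustrative choice, not from the paper).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
h = [np.kron(Z, Z), np.kron(X, I2), np.kron(I2, X)]

def s2_block(h, tau, order):
    """S_2(tau): forward sweep then reversed sweep over the given ordering."""
    d = h[0].shape[0]
    fwd = np.eye(d, dtype=complex)
    for j in order:                       # e^{-i(tau/2)h_{o_L}} ... e^{-i(tau/2)h_{o_1}}
        fwd = expm(-1j * (tau / 2) * h[j]) @ fwd
    bwd = np.eye(d, dtype=complex)
    for j in reversed(order):             # mirror image of the forward sweep
        bwd = expm(-1j * (tau / 2) * h[j]) @ bwd
    return bwd @ fwd                      # palindromic product: second order

t, r = 1.0, 10
U = expm(-1j * t * sum(h))
V = np.eye(4, dtype=complex)
for _ in range(r):                        # fresh random permutation per block
    V = s2_block(h, t / r, rng.permutation(len(h))) @ V

err = np.linalg.norm(U - V, 2)
assert err < 0.05
```

The palindromic structure guarantees second-order accuracy for every ordering, so the random permutations only affect (and, per [13], can suppress) the higher-order error terms.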
Corollary 3.1 (concentration of randomly permuted Suzuki formulas). Consider an n-qubit Hamiltonian H = ∑_j h_j with Λ = max_j ‖h_j‖, an associated time-t evolution U = e^{−itH}, and a Suzuki order 2p. Then, a total gate count specified in Appendix B 3 ensures that a randomly permuted Suzuki formula with r rounds obeys E_{π_1,...,π_r} ‖U − S^{π_r}_{2p}(t/r) · · · S^{π_1}_{2p}(t/r)‖ ≤ ε (typical case). For a fixed (but arbitrary) input state, the gate count can be further reduced; see Appendix B 3. For simplicity, we omit the logarithmic dependence on the failure probability δ.
In contrast to qDRIFT, the parameters n, ε, and L are now all raised to the 1/(4p + 1)-st power, and the different randomized settings yield essentially the same gate count, tΛ²L, for large p. This is in accordance with the observations provided in [13].
For illustration, we have chosen to directly import the older result (10) to compare with the averaged-channel bound (11). However, (10) can be sharpened to scale with commutators [14]. It remains future work to carry out the commutator analysis for the averaged channel (11) and thereby upgrade Corollary 3.1. We refer to the appendix (Sec. B 3) for detailed statements and proofs.

Compiling
Beyond Hamiltonian simulation, random products also help suppress errors in compiling. The task of compiling turns a perfect circuit diagram into a sequence of executable gates. However, since the gate set is discrete (or imperfect), the compiler can only return an approximate circuit in which each V_k is further decomposed into universal gates (e.g., using the Solovay-Kitaev theorem, see [17]) up to some individual accuracy ‖U_k − V_k‖ ≤ η.
In the worst case, the local errors ‖U_k − V_k‖ ≤ η accumulate linearly, and the triangle inequality only lets us conclude ‖U_N · · · U_1 − V_N · · · V_1‖ ≤ Nη. This means that individual accuracy η = ε/N is required to ensure an overall approximation error of (at most) ε. This, in turn, requires more gates for each approximation, because η-approximating V_k requires a gate count⁵ proportional to log^c(1/η). Randomized compiling [8,24] addresses precisely this issue. It uses random gate synthesis to prevent individual approximation errors from adding up "coherently" over the entire compilation. The resulting "incoherent" error accumulation can be much more favorable, and the mixing lemma [8,24] makes this intuition precise. In the following, we present a sharpened version of this statement that seems to be novel.

Lemma 3.1 (mixing lemma, sharpened). Let U_k be a fixed unitary and V_k a random approximation thereof that obeys ‖U_k − E V_k‖ ≤ b for some b > 0. Then, the associated channels obey an error bound that improves quadratically over the naive triangle-inequality estimate; we refer to Appendix B 1 a for the precise statement and a self-contained proof.

Back to compiling: ‖E V_k − U_k‖ can be as small as O(η²) [8] by mixing appropriate synthesis protocols for V_k. This yields an improved bound for the average channel. In words, the individual accuracy needed is suppressed quadratically, to η = Ω(√(ε/N)). In [8], Campbell pointed out that this square-root improvement may turn into a multiplicative overhead reduction, depending on the gate-synthesis efficiency: log^c(1/√η) = log^c(1/η)/2^c. Using Theorem 3, we can immediately bound the performance of randomized compiling without mixing.
Corollary 3.2 (randomized compiling without mixing). Suppose that we wish to approximate U = U_N · · · U_1 by a gate collection V = V_N · · · V_1 such that each V_k is random and obeys ‖U_k − V_k‖ ≤ η almost surely. Then, the approximation error concentrates well below the worst-case estimate Nη; the precise bound appears in Appendix B. What is more, the √n-factor on its right-hand side disappears if we restrict attention to a fixed input state.
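The qualitative difference between coherent and incoherent error accumulation can be illustrated with a toy single-qubit over-rotation model (our own illustration; the angles θ and η below are arbitrary choices): fixed-sign synthesis errors add linearly in N, while randomly signed errors accumulate diffusively, like √N·η:

```python
import numpy as np

rng = np.random.default_rng(4)

# Each gate is a Z-rotation that over/under-rotates by eta.
Zrot = lambda theta: np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

N, theta, eta = 200, 0.05, 0.01
U = np.linalg.matrix_power(Zrot(theta), N)              # ideal circuit

# Coherent errors: every gate over-rotates by +eta (total drift N*eta).
V_coh = np.linalg.matrix_power(Zrot(theta + eta), N)

# Incoherent errors: each gate errs by +eta or -eta with equal probability,
# so the accumulated drift is a random walk of size ~ sqrt(N)*eta.
V_inc = np.eye(2, dtype=complex)
for _ in range(N):
    V_inc = Zrot(theta + rng.choice([-eta, eta])) @ V_inc

err_coh = np.linalg.norm(U - V_coh, 2)
err_inc = np.linalg.norm(U - V_inc, 2)
assert err_inc < err_coh   # diffusive accumulation beats linear accumulation
```

This toy model is only meant to convey the intuition behind the mixing lemma; the rigorous statements for general unitary products are Lemma 3.1 and Corollary 3.2.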

D. Numerical experiments
In this section, we perform numerical experiments for simulating a simple Heisenberg model on a one-dimensional chain with a randomly sampled product formula. For n qubits,

H = (1/(n − 1)) ∑_{j=1}^{n−1} (X_j X_{j+1} + Y_j Y_{j+1} + Z_j Z_{j+1}),

and we view this as a sum of 3(n − 1) simple terms. The normalization keeps the interaction strength λ = 3 constant, and we consider a fixed evolution time t = 2. The numerical results for the error under various setups, using different gate counts N and different system sizes n, are shown in Figure 4. We can see that the error over all input states scales as √n. In contrast, the error stays roughly the same when we only consider a single input state. This is in accordance with the theoretical predictions of Theorems 1 and 2.
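A miniature version of this experiment can be sketched as follows (n = 4 and a single qDRIFT realization; the 1/(n − 1) normalization keeps λ = 3, as described above). The two error metrics mirror the two panels of Figure 4:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)

def paulis():
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return X, Y, Z

def two_site(P, j, n):
    """P (x) P acting on sites j, j+1 of an n-qubit chain."""
    ops = [np.eye(2, dtype=complex)] * n
    ops[j] = P; ops[j + 1] = P
    out = ops[0]
    for o in ops[1:]:
        out = np.kron(out, o)
    return out

n, t, N = 4, 2.0, 600
terms = [two_site(P, j, n) / (n - 1) for j in range(n - 1) for P in paulis()]
lam = sum(np.linalg.norm(h, 2) for h in terms)   # = 3 by construction
U = expm(-1j * t * sum(terms))

p = np.array([np.linalg.norm(h, 2) / lam for h in terms])
p /= p.sum()
V = np.eye(2 ** n, dtype=complex)
for _ in range(N):
    j = rng.choice(len(terms), p=p)
    V = expm(-1j * (t * lam / N) * terms[j] * (n - 1)) @ V  # h_j/||h_j||

worst_case = np.linalg.norm(U - V, 2)            # "all input states" error
psi = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
psi /= np.linalg.norm(psi)
fixed_state = np.linalg.norm(U @ psi - V @ psi)  # "fixed input state" error
assert fixed_state <= worst_case + 1e-12
```

By construction, the fixed-state error can never exceed the worst-case (operator-norm) error; averaging either quantity over many realizations and system sizes reproduces the √n vs. constant scaling shown in Figure 4.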

A. Comparison between stochastic averages of product formulas and concrete instances
This section considers an extremely simple Hamiltonian to pinpoint important differences between averaging random product formulas (that is, Campbell's black box) and concrete instances of product formulas. The example Hamiltonian is a 1-local, non-interacting Hamiltonian with a Pauli-Z operator acting on each qubit:

H = (1/n) ∑_{k=1}^{n} Z_k.   (15)

Figure 4. In All input states (left), we consider the error ε = ‖U − V_N · · · V_1‖_∞ over all input states and observables. In Fixed input state (right), we consider the error ε = ‖U|ψ⟩ − V_N · · · V_1|ψ⟩‖_2 over all observables. The input state |ψ⟩ is chosen to be a tensor product of single-qubit Haar-random states. For both settings, we give an additional plot showing how the error increases with the system size n for a fixed number of gates N = 160. The y-axis is normalized by the average error for system size n = 4 over 50 independent runs. The bounds in Theorems 1 and 2 predict that the relative error ε_n/ε_{n=4} scales as √(n/4) for All input states and stays constant at 1 for Fixed input state; these predictions are shown as dotted lines. The shaded regions indicate the standard deviation over 50 independent runs.
The relevant parameters are L = n (number of terms), λ = ∑_{k=1}^{n} ‖(1/n)Z_k‖ = 1 (interaction strength), and we fix the evolution time to t = π.
Stochastic averages of random product formulas can accurately approximate the associated unitary evolution U = exp(−iπH) after only a few iterations. The following observation is an immediate consequence of Campbell's main result [9], see also Proposition 3.2 below.
This assertion seems remarkably strong. In particular, the sequence length N does not depend on the number of qubits n. Once n is sufficiently large it becomes impossible for concrete product formulas to achieve comparable results. The problem is that the sequence length N is too small to address all n qubits. This necessarily leads to substantial discrepancies between the simulated time evolution V N · · · V 1 and the actual target U , see Figure 5 for an illustration.
Lemma 3.2. Assume that n is an even number. It is impossible to accurately approximate the time evolution U defined in Eq. (15) with N < n/2 elementary gates of the form V_i = exp(−i(π/N)Z_{k(i)}) ⊗ I (identity on the remaining qubits). More precisely, for each product formula V = V_N · · · V_1, there exists an input state |ψ⟩⟨ψ| such that the approximation error is maximal: ½‖U|ψ⟩⟨ψ|U^† − V|ψ⟩⟨ψ|V^†‖_1 = 1.

Figure 5. Illustration of the worst-case input for a product formula simulating the evolution of a simple Hamiltonian. The Hamiltonian H = (1/n) ∑_{k=1}^{n} Z_k produces a time evolution that factorizes into single-qubit unitaries (left). A product formula with fewer than n/2 single-site terms (right) is too small to address all qubits; at least n/2 of them must remain untouched. These errors accumulate for a GHZ state comprised of the untouched qubits. If n is large, even the small single-qubit rotations exp(−i(π/n)Z) can add up and lead to a maximal approximation error (⟨GHZ(+)|GHZ(−)⟩ = 0).

Proof. All terms in the Hamiltonian (15) commute. Hence, the associated target evolution factorizes nicely into tensor products: U = [exp(−i(π/n)Z)]^{⊗n}. Up to a global phase, each single-qubit unitary affects the computational basis in the following fashion: exp(−i(π/n)Z)|0⟩ = |0⟩ and exp(−i(π/n)Z)|1⟩ = exp(i2π/n)|1⟩. These small phase shifts can add up for states that are in superposition. Consider the tensor product of a GHZ state on n/2 qubits with the all-zeroes state on the remaining half: |ψ⟩ = (1/√2)(|0⟩^{⊗n/2} + |1⟩^{⊗n/2}) ⊗ |0⟩^{⊗n/2}. Then, up to a global phase, U|ψ⟩ = (1/√2)(|0⟩^{⊗n/2} − |1⟩^{⊗n/2}) ⊗ |0⟩^{⊗n/2}, and we can easily check that input and output are orthogonal to each other: ½‖U|ψ⟩⟨ψ|U^† − |ψ⟩⟨ψ|‖_1 = 1. These features do not change if we permute the qubits in the input state |ψ⟩. Any combination of a GHZ state on one half of the qubits with computational |0⟩-states on the remaining ones obeys the same orthogonality relation. We can use this freedom to construct a worst-case input |ψ⟩ for a fixed product formula V = V_N · · · V_1 comprised of fewer than n/2 single-qubit gates: simply initialize the (at most) n/2 qubits on which the product formula acts nontrivially in the computational 0-state, and hide the GHZ component among the remaining qubits. By construction, the product formula V does not affect this input state at all. This is a worst case, because the target unitary U does rotate the hidden GHZ component into an orthogonal state. ∎

This negative statement highlights that the gate count of (worst-case) accurate product formulas must in general depend on the number of qubits, and justifies the appearance of n in Theorem 1. Note, however, that Lemma 3.2 is contingent on identifying a worst-case input state for a fixed (and known) product formula. If the input state is fixed first, the situation changes dramatically. For instance, we could use explicit knowledge of the input to construct a product formula that accurately approximates its time evolution. Identifying an optimal product formula seems like a daunting task, but randomness can help.
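The orthogonality at the heart of this proof is easy to verify numerically (n = 8 is an arbitrary choice):

```python
import numpy as np

# For H = (1/n) sum_k Z_k and t = pi, the evolution U factorizes into
# single-qubit phase gates; a hidden-GHZ input is mapped to an orthogonal state.
n = 8
phases = np.exp(-1j * np.pi / n * np.array([1.0, -1.0]))  # eigenvalues of e^{-i(pi/n)Z} on |0>, |1>

def apply_U(psi, n):
    """Apply U = (e^{-i(pi/n)Z})^{tensor n} in the computational basis."""
    out = psi.copy()
    for b in range(2 ** n):
        ones = bin(b).count("1")
        out[b] *= phases[0] ** (n - ones) * phases[1] ** ones
    return out

# |psi> = GHZ on the first n/2 qubits, |0...0> on the rest.
psi = np.zeros(2 ** n, dtype=complex)
zero_idx = 0                                        # |0^n>
one_idx = int("1" * (n // 2) + "0" * (n // 2), 2)   # |1^{n/2} 0^{n/2}>
psi[zero_idx] = psi[one_idx] = 1 / np.sqrt(2)

overlap = abs(np.vdot(psi, apply_U(psi, n)))        # |<psi|U|psi>|
assert overlap < 1e-10                              # input and output are orthogonal
```

The n/2 accumulated phase shifts of e^{i2π/n} on the |1⟩-branch produce a relative sign of −1, which is exactly what a too-short product formula (leaving these qubits untouched) cannot reproduce.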
Theorem 2 asserts that a collection of N ≳ π²/ε² randomly selected single-qubit gates approximates the time evolution (15) of any fixed input state |ψ⟩ up to accuracy ε in trace distance. While this gate count is considerably larger than the one put forth in Corollary 3.3, it is still independent of the number of qubits. What is more, this assertion applies with high probability to any fixed input state. This capitalizes on another advantage of generating unstructured product formulas by a randomized procedure: it is extremely difficult to fool a randomized compiling procedure with an already fixed input.

B. Instructive concentration argument for a simple Hamiltonian
This section provides intuition for the concentration effects that ultimately imply Theorem 1, by means of another example Hamiltonian that is composed of (commuting) Pauli-Z terms only:

H = 2^{−n} ∑_{p∈{0,1}^n} α_p Z^{p_1} ⊗ · · · ⊗ Z^{p_n},   (16)

with the convention that Z^0 = I and signs α_p ∈ {−1, 1}. That is, the Hamiltonian is a sum of 2^n signed Pauli strings comprised of Z and I; the 2^{−n} normalization keeps the interaction strength at λ = 1. A high-order Suzuki formula would require a gate complexity of O(L) = O(2^n), since it puts down every term. In contrast, by random sampling, Theorem 1 yields a gate complexity of O(n/ε²) (and Theorem 2 yields O(1/ε²)). This is an exponential improvement in system size.

Figure 6. Illustration of the probabilistic proof for the commuting Hamiltonian given in Eq. (16). We consider each of the 2^n computational basis states as a starting state. The probability that a single starting state incurs an error of at least ε is exponentially smaller than the probability that the maximum over all 2^n starting states exceeds ε. This failure probability is exponentially suppressed by increasing the gate count N; hence, it suffices to set N = n/ε².
The physical intuition is that all the terms in the Hamiltonian act on the same system of n qubits (a 2^n-dimensional Hilbert space), so their actions must "overlap" with one another and can be efficiently estimated by random sampling. To see this effect more clearly, let us write down the unitary evolution exp(−iH) in the computational basis |b⟩ with multi-index b = (b_1, …, b_n) ∈ {0, 1}^n. Note that all terms in the Hamiltonian (16) are diagonal in the computational basis. This implies that exp(−iH) acts diagonally: each basis state |b⟩ merely acquires a phase. When we instead sub-sample an effective Hamiltonian Ĥ comprised of N randomly selected terms, the constructed product formula applies the analogous phase for Ĥ. By Hoeffding's inequality, Ŝ(b) = (1/N) Σ_k c_{p_k}(b) should concentrate around S(b) = 2^{−n} Σ_{p∈{0,1}^n} c_p(b) with standard deviation 1/√N and an exponentially decaying tail (think Monte Carlo). An illustration and some helpful facts can be found in Figure 6.
Let us now fix an (arbitrary) n-qubit basis state. Then, the probability of an ε-deviation (or larger), i.e. |Ŝ(b) − S(b)| > ε, can be bounded by 1/e if we choose N = 1/ε². However, if we wish the effective Hamiltonian to work for all basis states simultaneously, we choose a larger N = n/ε² to ensure that the deviation probability is further suppressed exponentially to 1/e^n. By a union bound, |Ŝ(b) − S(b)| ≤ ε for all 2^n computational basis states |b⟩ with probability at least 1 − 2^n/e^n = 1 − (2/e)^n. Albeit in the simplest example (a commuting Hamiltonian), this demonstrates that a random product formula exp(−iĤ) can accurately simulate exp(−iH) up to error ε with only N = n/ε² gates, which further reduces to O(1/ε²) for an arbitrary, but fixed, basis state. The powerful tool of concentration for matrix (vector) martingales allows us to prove the same statement for any (noncommuting) many-body Hamiltonian.
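This Monte Carlo argument is easy to reproduce numerically. The sketch below (with assumed toy parameters n = 10 qubits and N = 4000 samples, chosen only for illustration) evaluates the exact average S(b) for one basis state and compares it with the sub-sampled estimate Ŝ(b):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10            # number of qubits; the Hamiltonian has 2**n commuting terms
P = 2 ** n
alpha = rng.choice([-1.0, 1.0], size=P)  # random signs alpha_p

def c(p, b):
    # Eigenvalue of the signed Pauli string alpha_p * Z^{p_1}...Z^{p_n} on |b>.
    return alpha[p] * (-1.0) ** bin(p & b).count("1")

b = 0b1011001101  # an arbitrary fixed computational basis state
S = np.mean([c(p, b) for p in range(P)])  # exact S(b) = 2^{-n} sum_p c_p(b)

N = 4000          # number of randomly sub-sampled terms
S_hat = np.mean([c(p, b) for p in rng.integers(0, P, size=N)])

# Hoeffding: the deviation is ~ 1/sqrt(N) with an exponentially decaying tail.
print(abs(S_hat - S))
```

Repeating the experiment with N = n/ε² samples and a union bound over all 2^n basis states reproduces the argument in the text.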
We will return to this example Hamiltonian in Section C to show that this more general analysis yields an essentially optimal parameter dependence: the scaling N ≥ Ω(nt²λ²/ε²) in Theorem 1 is unavoidable in general.

C. Proof idea for Theorem 1 and 2
This section sketches the main ideas and tools required to establish Theorem 1 and Theorem 2. The other results follow from more elementary arguments. Detailed arguments and rigorous statements are provided in Appendix B below.
Consider an n-qubit Hamiltonian H = Σ_{j=1}^L h_j and an evolution time t. The associated unitary evolution defines a (unitary) channel on n-qubit states. Fix a number of steps N and set λ = Σ_{j=1}^L ‖h_j‖. The task is to accurately approximate the target unitary U by a product formula, i.e., the composition of N simple unitary evolutions. We quantify the difference between V^(N) and U in diamond distance, that is, the worst-case approximation error over all possible input states ρ in the presence of an unaffected quantum memory. Let E, F be two quantum channels, and let I(|ψ⟩⟨ψ|) = |ψ⟩⟨ψ| denote the identity channel on an equally large ancilla system. The diamond distance between E and F is defined via a maximization that ranges over all pure⁶ input states |ψ⟩⟨ψ|, where ‖·‖_1 denotes the trace norm. First, we relate the diamond distance (18) between the channels V^(N) and U, which are both unitary, to an operator-norm distance of the associated matrices. We then use a telescoping argument over the N steps to decompose the operator-norm difference into two qualitatively different contributions, which can be analyzed separately:

i. Deterministic bias: Most product formulas arise from first decomposing the target unitary into a sequence of many small steps, each of which is close to the identity matrix. This allows for approximating U^{1/N} by another process that is easier to implement. The random importance sampling model (4) over individual Hamiltonian terms is designed to achieve this goal. The average approximation error scales inverse quadratically in the number of steps. While small, this expected error does constitute a bias that is present in each of the N approximation steps. It can, and in general will, accumulate across different time steps; see Lemma 3.6 below. The accumulated bound is obtained from a telescoping sum and diminishes as the number of steps N increases. For ε > 0, it drops below ε/2 once N ≥ 2t²λ²/ε.

ii. Random fluctuation: We also need to control the deviation of a product of i.i.d.
unitaries V_N · · · V_1 from its expectation E[V_N · · · V_1] = (EV)^N in operator norm. We achieve this by applying modern matrix martingale tools, which may be of independent interest. In short (see Appendix A for a more thorough introduction), a martingale is a random process {B_k : k = 0, …, N} such that the distribution of B_k only depends on past elements B_{k−1}, …, B_1 ('causality') and also E[B_k | B_{k−1}, …, B_1] = B_{k−1} ('on average, tomorrow is the same as today'). To control fluctuations, we introduce a martingale B_k = (EV)^{N−k} V_k · · · V_1 that interpolates between the extreme cases we need to compare. Note that adjacent elements of this process only differ in a single term: B_k arises from B_{k−1} by replacing EV at position k (counted from the right) by a random realization V_k of V, and thus E_{k−1} B_k = B_{k−1}. Moreover, the current iterate B_k only depends on realizations in the past, and the interpolation process {B_k : k = 0, …, N} forms a martingale comprised of d × d matrices, where d = 2^n is the Hilbert space dimension. The discrepancies are small: ‖V_k − EV‖ ≤ 2tλ/N, because each realization of V is also very close to the identity matrix. Powerful tail bounds for matrix-valued martingales (which are relatively modern compared to their scalar counterparts) are available in the literature [37,46]. Adapting these results to the task at hand yields the bound in Proposition 3.3 below. In words, the product V_N · · · V_1 will concentrate around its expectation once N is sufficiently large. Similar to more conventional random walks on integer lattices, the error is subgaussian with variance proportional to N · (λ²t²/N²). There is an extra dimensional factor d = 2^n that arises because the martingale is matrix-valued; it converts to the number of qubits n = log(d) in the gate count N. For error parameters ε, δ ∈ (0, 1), a gate count obeying N ≥ 44 (t²λ²/ε²) log(2d/δ) implies ‖V_N · · · V_1 − (EV)^N‖ ≤ ε/2 with probability > 1 − δ (25).
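This concentration of the random product around (EV)^N can be observed directly in a small toy model. The sketch below uses assumed parameters (three qubits, five random Hermitian terms normalized so that λ = 1; none of this comes from the paper), draws one qDRIFT realization, and measures its operator-norm distance to the mean product:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (assumed parameters): n = 3 qubits, L = 5 random Hermitian terms,
# normalized so that lambda = sum_j ||h_j|| = 1, evolution time t = 1.
n, L, t = 3, 5, 1.0
d = 2 ** n

def random_term():
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Hj = (A + A.conj().T) / 2
    return Hj / (np.linalg.norm(Hj, 2) * L)  # ||h_j|| = 1/L

terms = [random_term() for _ in range(L)]
lam = sum(np.linalg.norm(h, 2) for h in terms)
probs = np.array([np.linalg.norm(h, 2) for h in terms]) / lam  # qDRIFT weights p_j

def u(X):
    # exp(-iX) for Hermitian X via eigendecomposition.
    w, Q = np.linalg.eigh(X)
    return (Q * np.exp(-1j * w)) @ Q.conj().T

def deviation(N):
    # Operator-norm distance between one qDRIFT realization V_N...V_1 and (EV)^N.
    gates = [u((t * lam / N) * h / np.linalg.norm(h, 2)) for h in terms]
    EV = sum(p * g for p, g in zip(probs, gates))
    prod = np.eye(d, dtype=complex)
    for _ in range(N):
        prod = gates[rng.choice(L, p=probs)] @ prod
    return np.linalg.norm(prod - np.linalg.matrix_power(EV, N), 2)

# Fluctuations are subgaussian with variance ~ t^2 lam^2 / N, so the
# deviation shrinks like 1/sqrt(N).
devs = {N: deviation(N) for N in (10, 100, 1000)}
print(devs)
```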
Theorem 1 can be derived by combining the bound (22) for the deterministic bias with the bound (25) for random fluctuations. This provides a high-probability error bound in diamond distance when N ≥ Ω(nt²λ²/ε²). Corollary 1.1 is derived by integrating the tail bounds of Theorem 1, where the symbol ≲ suppresses a modest multiplicative constant. We refer to Section B 1 d for details. For a fixed input (Theorem 2), we use convexity to restrict our attention to a fixed, pure input state |ψ⟩. We then need to compare U|ψ⟩ with V_N · · · V_1|ψ⟩, which can be achieved by constructing an interpolating random process that describes a vector-valued martingale. We then invoke another concentration inequality (Lemma 3.7) which, in contrast, does not contain a dimensional factor. This is the reason why Theorem 2 and Corollary 2.1 do not depend on system size at all.

IV. DISCUSSION AND OUTLOOK
This work shows interesting characteristics of randomization that might help to further improve quantum simulation. (a) By studying typical unitary instances of qDRIFT, we have shown that the L-independence of qDRIFT is due to randomly sampling terms; mixing different realizations is not essential. (b) Gate complexities can be reduced substantially by restricting attention to a particular input state and/or target observable. Simple randomized compilation procedures, like qDRIFT, do not utilize such extra information. We believe that more specialized algorithms might be able to exploit additional structure. (c) We have shown that channel averages can be much closer to the ideal evolution than any individual product formula; in the case of qDRIFT, however, it is only a quadratic improvement 1/ε² → 1/ε. This can be seen as a manifestation of Jensen's inequality for convex functions (distance to ideal evolution), and we may see such behavior in other quantum simulation algorithms as well. To the best of our knowledge, we also provide the first matrix concentration analysis for randomly sampled product formulas and, more generally, Hamiltonian simulation. Similar proof techniques readily apply to any random product formula, e.g. randomly permuted Suzuki formulas [13], and in fact any (causal) random unitary product, e.g. randomized compiling [8,24]. Beyond random products, we expect the developed matrix concentration tools to be useful for controlling stochastic errors in other quantum computing applications, as well as for analyzing properties of random Hamiltonians [10]. Campbell and Nathan Wiebe provided insightful comments, as well as encouraging feedback. Finally, we would like to thank the anonymous reviewers for in-depth comments and suggestions. CC is thankful for the Physics TA Relief Fellowship at Caltech. HH is supported by the Kortschak Scholars Program. RK acknowledges funding from ONR Award N00014-17-1-2146 and ARO Award W911NF121054.
JAT gratefully acknowledges funding from the ONR Awards N00014-17-1-2146 and N00014-18-1-2363 and from NSF Award 1952777.
Appendix A: Matrix- and vector-valued martingales
Let X_1, …, X_N be independent and identically distributed (i.i.d.) random variables. Then, the strong law of large numbers (LLN) implies that the sample mean μ̂ = (1/N) Σ_{k=1}^N X_k converges to the actual mean μ = E X_k (μ̂ ≈ μ as N → ∞). In fact, even for finite N, the sample average is likely to be close to the expectation. This behavior is called concentration. It turns out that concentration is a surprisingly generic phenomenon, and it kicks in earlier than one might expect. Asymptotically large sample sizes (N → ∞), which are essential for the law of large numbers and the central limit theorem, are not required to establish concentration. An example is Bernstein's inequality for sums of bounded random numbers.

Fact 3.1 (Bernstein's inequality)
. Let X_1, …, X_N be independent random variables that obey E X_i = 0 and |X_i| ≤ R almost surely. Then, for ε > 0, the sample mean obeys Pr[|(1/N) Σ_{i=1}^N X_i| ≥ ε] ≤ 2 exp(−Nε²/(2(v² + Rε/3))), where v² ≥ E X_i² bounds the variance of each summand. Bernstein's inequality equips random sums with LLN-type concentration behavior already for non-asymptotic sample sizes N ≳ max{v²/ε², R/ε}. However, it requires strong assumptions on the random variables involved: they need to be independent, bounded scalars. In fact, random processes may concentrate under much weaker conditions.
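A quick numerical sanity check of this inequality is possible with simulated data. The sketch below uses an assumed toy distribution (uniform on [−R, R], so the variance bound v² = R²/3 is exact) and compares the empirical tail probability of the sample mean with the Bernstein bound:

```python
import numpy as np

rng = np.random.default_rng(2)

# Empirical check of Bernstein's inequality for the sample mean of bounded,
# centered variables (assumed: X_i ~ Uniform[-R, R], so E X_i^2 = R^2/3).
N, R, trials = 2000, 1.0, 2000
X = rng.uniform(-R, R, size=(trials, N))
means = np.abs(X.mean(axis=1))
v2 = R**2 / 3                    # per-sample variance bound v^2

eps = 2.5 * np.sqrt(v2 / N)      # a few standard deviations of the sample mean
bernstein = 2 * np.exp(-N * eps**2 / (2 * (v2 + R * eps / 3)))
empirical = np.mean(means >= eps)
print(empirical <= bernstein)    # -> True
```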
Martingales form a richer family of processes that capture more realistic random processes and that still enjoy powerful concentration inequalities. Let us offer a minimal technical introduction following Tropp [46]. Consider a filtration of the master sigma algebra, F_0 ⊂ F_1 ⊂ F_2 ⊂ · · · ⊂ F_k ⊂ · · · ⊂ F, where we denote the conditional expectation with respect to F_k by the symbol E_k. A martingale is a sequence {B_0, B_1, B_2, …} of random variables that satisfies two conditions. Heuristically, we can think of k as a 'time' index, and F_k contains all possible events that are determined by the history up to time k. The present, i.e. B_k, may depend on the past (i.e., B_0, …, B_{k−1}), but it cannot depend on the future ('causality'). The second condition formalizes the requirement that, on average, tomorrow (B_k) is the same as today (B_{k−1}). A martingale sequence may be composed of scalars, e.g., a one-dimensional random walk, but we can also consider vector- or matrix-valued martingales. Analyzing concentration for vector- and matrix-valued martingales is an ongoing field of research, see for example [39,46], but many powerful concentration inequalities already exist. In this work, we will use the matrix generalization of Freedman's inequality (due to one of the authors). Let M_{d×d} denote the space of complex-valued d × d matrices.
Then, for any τ > 0, a Freedman-type tail bound holds. This bound exponentially suppresses the probability of the following undesirable event: the conditional variance w is small while the actual deviation is large. The subtlety of this bound is that the conditional variance w is itself a random variable. However, for this work we will use a weaker but more transparent version. Corollary 3.4. Let {B_k : k = 0, …, N} ⊂ M_{d×d} be a matrix martingale. Assume that the associated difference sequence C_k = B_k − B_{k−1} obeys ‖C_k‖ ≤ R and that its conditional variance is bounded by v. Then a Bernstein-type tail bound holds for any τ > 0.
To arrive at this, ignore the events for ℓ < N and use that w ≤ v holds almost surely. This simplified matrix Freedman inequality closely resembles Bernstein's inequality. In fact, Fact 3.1 can be viewed as the special case of Corollary 3.4 where d = 1 and B_k = Σ_{k'=0}^k X_{k'}. For matrix-valued martingales, however, the tail bound must depend on the dimension d.
Recapitulating the proof of the general matrix Freedman inequality would go beyond the scope of this work. It makes full use of Lieb's concavity theorem, stopping times, and Burkholder functions. There is a slightly weaker result that follows from the Golden-Thompson inequality [36, Thm. 1.2]; see also [22, Theorem 11].
Similar concentration inequalities are valid for martingales on the complex vector space C d . Remarkably, these results are independent of the ambient dimension d.
whose difference sequence has conditional variance bounded by v and obeys ‖C_k‖_2 ≤ R almost surely. Then, for any τ > 0, a dimension-free Freedman-type tail bound holds. This is a simplified version of [39, Thm. 3.3]. To prove the above version, start with [39, Thm. 3.1] for general martingales on Banach spaces. In our case, vectors are equipped with the standard ℓ_2 norm; set the smoothness constant D = 1 (in the notation of [39, Thm. 3.1]) and optimize over λ as in Bernstein's inequality.
Our concentration analysis of random product formulas with fixed inputs relies on another tool: uniform smoothness of the underlying space [25]. Given a vector martingale, uniform smoothness supplies a recursive/local bound on moment growth. Decomposing B_k = C_k + B_{k−1}, all we need is E[C_k | B_{k−1}] = 0 to apply the following result. Proof. Start with Bonami's inequality [20, Cor. 13.1.1] for normed vector spaces. This formula can be converted to the desired statement (Proposition 3.1) via elementary manipulations. We follow [25, Corollary 4.2]: take expectations and use the triangle inequality for the L_{q/2} norm. Next, following [25, Proposition 4.3], apply Jensen's inequality in the first step and Lyapunov's inequality in the second. By rearranging terms, we obtain the result with the constant 2(q − 1), which is off by a factor of 2. The advertised constant (Proposition 3.1) can be obtained via a more involved argument [25, Lemma A.1]. Geometrically, this result expresses the uniform smoothness of the space with norm (E‖·‖_2^q)^{1/q}. We refer to [25] for further exposition on uniform smoothness for general Schatten p-norms. Recently, this method has been applied to dynamical properties of random Hamiltonians [10]. For the task at hand, we can use Markov's inequality to convert such bounds on moment growth into a strong tail bound, similar to Fact 3.3. This is the content of Lemma 3.7 below.
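At q = 2, the recursive moment bound reduces to an exact Pythagorean identity for conditionally centered increments: E‖B_{k−1} + C_k‖² = E‖B_{k−1}‖² + E‖C_k‖² whenever E[C_k | B_{k−1}] = 0. The following sketch verifies this numerically with assumed Gaussian toy data (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(6)

# Numerical check of the q = 2 case: for a martingale increment C_k with
# E[C_k | B_{k-1}] = 0, second moments add exactly (Pythagoras in L^2).
d, trials = 16, 200_000
B = rng.normal(size=(trials, d))                   # stand-in for B_{k-1}
v = rng.normal(size=d)                             # a fixed direction
C = rng.choice([-1.0, 1.0], size=(trials, 1)) * v  # +/- v: conditionally mean-zero

lhs = np.mean(np.sum((B + C) ** 2, axis=1))
rhs = np.mean(np.sum(B ** 2, axis=1)) + np.mean(np.sum(C ** 2, axis=1))
print(abs(lhs - rhs) / rhs < 0.01)                 # -> True (up to sampling error)
```

For q > 2 the identity degrades into an inequality with the (q − 1) smoothness constant, which is exactly what uniform smoothness quantifies.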

1. Proof of Theorem 1 and Corollary 1.1: Approximation error under the worst possible input

The proofs of Theorem 1 and Corollary 1.1 were sketched in Section III. This section contains the details. In Section B 1 a, we first relate the diamond distance to the operator norm. This allows us to work with the operator norm, which is mathematically simpler. Then we bound the two error contributions arising from the deterministic bias (in Section B 1 b), as well as random fluctuations (in Section B 1 c). Finally, we combine the two bounds to obtain a convergence guarantee for randomly sampled product formulas. This is the content of Section B 1 d.

a. Conversion from diamond distance into operator norm
The diamond distance is a rather intricate object. Although it can be phrased implicitly as a semidefinite program, analytical formulas are few and far between. It is, however, sometimes possible to relate diamond distances to other figures of merit that are easier to access. The mixing lemma by Campbell [8] and Hastings [24] is one very useful example.

Lemma 3.3 (Mixing lemma [8,24]). Let U be a fixed unitary and V be a random approximation thereof that obeys ‖V − U‖ ≤ a almost surely. Then, the associated channels U(X) = U X U† and V(X) = V X V† obey
More recent insights, in particular [52,Thm. 3.56], allow for sharpening this bound. In particular, we can completely remove the a 2 /2-term on the r.h.s.
The latter relation generalizes to averages of random unitary channels: The first claim follows from the fact that stabilization is not required to compute the diamond distance between two unitary channels [1,Sec. 5.3]. The second claim improves the mixing lemma.
Proof. Fix an input |ψ⟩ and denote the output state vectors by |u⟩ = U|ψ⟩ and |v⟩ = V|ψ⟩, respectively. Normalization ensures that these state vectors obey |⟨u, v⟩| ≤ 1. Apply the Fuchs-van de Graaf relations [52, Theorem 3.33] to convert the output trace distance into a (pure) output fidelity. The first diamond distance bound then is a direct consequence of this relation: use the fact that stabilization is not necessary for computing the diamond distance of two unitary channels [1, Sec. 5.3], together with the definition of the operator norm. In order to handle expectation values, we need an additional argument. Let (p_k, V_k) be an ensemble of unitaries with weights p_k ≥ 0 that obey Σ_k p_k = 1. Then, Cauchy-Schwarz asserts a bound for any unitary U and state |ψ⟩. Combined with Fuchs-van de Graaf, this observation delivers the analogous bound for any pure input state |ψ⟩, much as before. We emphasize that this relation is true for any fixed unitary U and any unitary ensemble (p_k, V_k). This flexibility is essential to deduce the diamond distance bound, because E[V] is not unitary and stabilization must be taken into account. This is what we needed to show.
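The chain of inequalities underlying this proof, trace distance ≤ Euclidean distance of output vectors ≤ operator-norm distance, can be spot-checked numerically. The sketch below uses assumed random unitaries (Haar sample plus a small Hermitian perturbation; parameters are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8

def haar_unitary(d):
    # Haar-random unitary via QR decomposition with phase correction.
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

U = haar_unitary(d)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
K = 0.1 * (A + A.conj().T)                    # small Hermitian perturbation
w, Q = np.linalg.eigh(K)
V = U @ ((Q * np.exp(-1j * w)) @ Q.conj().T)  # V = U exp(-iK), close to U

psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
u_out, v_out = U @ psi, V @ psi

fid = abs(np.vdot(u_out, v_out)) ** 2
trace_dist = np.sqrt(max(0.0, 1.0 - fid))     # Fuchs-van de Graaf for pure states
vec_dist = np.linalg.norm((U - V) @ psi)
op_dist = np.linalg.norm(U - V, 2)
print(trace_dist <= vec_dist + 1e-12 <= op_dist + 2e-12)  # -> True
```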

b. Controlling the deterministic bias
Next, we establish a bound on the deterministic bias between the averaged channel and the ideal unitary evolution.
Note that the improved mixing lemma, Lemma 3.4 above, allows for converting this statement into a diamond distance bound for the associated channels. This is a slight improvement over the main existing technical result regarding qDRIFT [9]. Indeed, Campbell labels the total average qDRIFT channel E = E[V_N ∘ · · · ∘ V_1], and he establishes that (1/2)‖U − E‖_⋄ ≤ (t²λ²/N) e^{2tλ/N} in [9, Eq. (B12)]. Both assertions become very similar in the large-N limit, but (B1) is always tighter, and the discrepancy can be quite pronounced for small and intermediate values of N.
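The bias bound ‖U − (EV)^N‖ ≤ t²λ²/N can be checked numerically on a minimal model. The sketch below uses an assumed single-qubit Hamiltonian H = 0.5 Z + 0.5 X (so L = 2, λ = 1, t = 1; these parameters are illustrative, not from the paper):

```python
import numpy as np

# Toy check of the bias bound ||U - (EV)^N|| <= t^2 lam^2 / N
# (assumed model: H = 0.5*Z + 0.5*X, so L = 2, lambda = 1, t = 1).
X = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
Z = np.diag([1.0 + 0j, -1.0 + 0j])

def u(Hm, s=1.0):
    # exp(-i*s*Hm) for Hermitian Hm.
    w, Q = np.linalg.eigh(Hm)
    return (Q * np.exp(-1j * s * w)) @ Q.conj().T

t, lam = 1.0, 1.0
U = u(0.5 * Z + 0.5 * X, t)

ok = []
for N in (4, 16, 64):
    # Average qDRIFT step: sample j with p_j = ||h_j||/lam = 1/2, rotate by t*lam/N.
    EV = 0.5 * u(Z, t * lam / N) + 0.5 * u(X, t * lam / N)
    bias = np.linalg.norm(U - np.linalg.matrix_power(EV, N), 2)
    ok.append(bias <= t**2 * lam**2 / N)
print(ok)  # -> [True, True, True]
```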
The proof of Proposition 3.2 is based on an extension of the numerical bounds |e ix −1| ≤ |x| and |e ix −ix−1| ≤ x 2 /2 for all x ∈ R to Hermitian matrices.
This is the advertised result.
We also need a statement regarding error accumulation over several applications of similar, but not identical, linear operators. It is a rather intuitive consequence of operator-norm sub-multiplicativity and the triangle inequality. See [44] for related results. Lemma 3.6. Let EV and U^{1/N} be matrices with bounded operator norm, i.e. ‖EV‖ ≤ 1 and ‖U^{1/N}‖ ≤ 1. Then ‖(EV)^N − U‖ ≤ N ‖EV − U^{1/N}‖. Proof. The triangle inequality and sub-multiplicativity imply ‖A_1 A_2 − B_1 B_2‖ ≤ ‖A_1‖ ‖A_2 − B_2‖ + ‖A_1 − B_1‖ ‖B_2‖ for any matrix quadruple A_1, A_2, B_1, B_2 with compatible dimensions. Use the assumed operator-norm bounds to iteratively apply this relation and deduce the statement. This is the stated result.

c. Controlling random fluctuations
In the previous subsection, we have essentially recapitulated the state of the art regarding qDRIFT: the algorithm provides an accurate approximation in expectation over all possible random choices (deterministic bias). In this section, things start to get interesting. We want to show that a single realization of qDRIFT is likely to provide a good approximation, provided that the number of steps N is sufficiently large. In order to achieve this goal, we need to show that fluctuations around the (accurate) expected behavior remain small. In words, we need to show that a product of i.i.d. random matrices concentrates sharply around its expectation value. This is an interesting and nontrivial problem, even in the (asymptotic) large-N limit. While sharp concentration bounds for sums of i.i.d. random matrices have been available for more than a decade now [2,47], our understanding of concentration for random matrix products is more limited; see [25] and references therein. There is a substantial mathematical literature on random walks on Lie groups, but the focus is usually on asymptotic convergence, and the machinery is different; see [50] and references therein. The small-step regime has seen less development, although there are some asymptotic bounds [51]. Fortunately, the qDRIFT construction has several appealing features: the random unitaries V_N, …, V_1 are i.i.d. unitary matrices that are close to the identity matrix (‖V − I‖ ≤ tλ/N almost surely) and close to their expectation (‖V − EV‖ ≤ 2tλ/N almost surely). These properties allow us to use the matrix martingale formalism to derive a strong, nonasymptotic result on the quality of the approximation.
This statement provides a strong tail bound for random fluctuations in the small-error regime ε ≤ 4tλ. As N increases, the probability of incurring (at least) error ε/2 diminishes exponentially. For ε > 4tλ, we have instead a subexponential tail bound: Pr[‖V_N · · · V_1 − E[V_N · · · V_1]‖ ≥ τ] ≤ 2d exp(−Nε/(6tλ)). We refer to (B4) for a unified statement that covers both regimes.
The proof technique deserves some exposition, as it is rather general and may be of independent interest. It heavily utilizes the martingale concentration tools introduced in Appendix A. For fixed N, we interpolate between both sides of Rel. (B2) by means of a random process B_k = (EV)^{N−k} V_k · · · V_1 for k = 0, …, N. The increments of this random process are certainly not independent. For instance, B_k depends on the (random) choice of V_k and all previous choices V_{k−1}, …, V_1. Fortunately, we can recognize it as a (matrix-valued) martingale satisfying the defining properties: 1. Causality: Each B_k is completely determined by the information we have collected up to step k, that is, the (random) choices of V_k, …, V_1.

2.
Status quo: Conditioned on previous choices, the expectation of B_{k+1} equals B_k. This feature underscores similarities to an unbiased random walk: on average, "tomorrow" (B_{k+1}) is the same as "today" (B_k).
With this matrix martingale reformulation at hand, we can prove Proposition 3.3 using a concentration inequality for matrix martingales, namely Corollary 3.4.
Proof of Proposition 3.3. We have already established that the random process B_k = (EV)^{N−k} V_k · · · V_1 forms a matrix martingale that interpolates between B_0 = E[V_N · · · V_1] = (EV)^N and B_N = V_N · · · V_1. The elements of the associated difference sequence are C_k = B_k − B_{k−1}. Recall that V_k = exp(−iX_j) for some Hermitian matrix X_j = (tλ/N) h_j/‖h_j‖ with index 1 ≤ j ≤ L. Boundedness (‖EV‖, ‖V_k‖ ≤ 1) and Fact 3.4 (‖exp(−iX) − I‖ ≤ ‖X‖ for X Hermitian) ensure ‖C_k‖ ≤ 2tλ/N almost surely. Set R = 2tλ/N and invoke Corollary 3.4 to conclude the tail bound. The statement follows from bounding the somewhat complicated exponential by either exp(−3τ²/(8NR²)) for τ ≤ 2λt or by exp(−3τ/(8R)) for τ ≥ 2λt. Last, we substitute τ = ε/2.
In fact, the same proof works for any adapted (causal) small-step random walk on the unitary group. Such a generalization results in Theorem 3; we refer to Appendix B 3 for details.

d. A bound for expected errors
In the previous subsection, we established that a sufficiently long qDRIFT random product formula concentrates sharply around its expectation. We can translate this statement into a bound on the expected fluctuation around the true evolution. Proposition 3.4 (qDRIFT: expected diamond-norm error). Consider an n-qubit Hamiltonian H = Σ_{j=1}^L h_j with total strength λ = Σ_{j=1}^L ‖h_j‖. Fix parameters N, t, and assume that N ≥ n. Set U(X) = U X U† with U = exp(−itH), and suppose that V_N, …, V_1 ∼ V are i.i.d. realizations of the qDRIFT protocol. That is, V(X) = V X V†, where V is defined by (4). Then E[(1/2)‖U − V_N ∘ · · · ∘ V_1‖_⋄] ≤ C √(nt²λ²/N), where C > 0 is a (modest) numerical constant. The symbol ≈ denotes an accurate approximation in the large-N regime.
It is instructive to compare this assertion to the original qDRIFT result [9] and the improvement in (B1). There, the expectation over all possible realizations of all N unitary channels appears inside the diamond distance. This implies that qDRIFT performs well on average over many random realizations, provided that the number N of steps exceeds t²λ²/ε. In contrast, (B5) has the expectation outside the diamond distance. Our result gives a much stronger conclusion: an individual realization of the randomized qDRIFT protocol does not deviate much from the target evolution, for any input states and observables, with very high probability. The price for such an improvement is a larger number of steps that depends on the system size. For n qubits, the gate complexity N ≥ Cnt²λ²/ε² is sufficient to ensure ε-closeness on average. The quadratic scaling in the accuracy parameter ε is necessary (for large N) because of the central limit theorem for martingales. The appearance of the number n of qubits is a consequence of measuring closeness in diamond distance: the random product formula must behave well for all possible n-qubit input states ρ simultaneously. If we restrict our attention to any fixed input state, we can obtain a gate complexity that does not depend on n. This is the topic of the next section.
Proof of Proposition 3.4. First, we relate the expected diamond distance to an expected operator-norm distance and split it up into deterministic bias and (expected) fluctuations. The first term is deterministic and controlled by Proposition 3.2: ‖U − (EV)^N‖ ≤ t²λ²/N. The second term can be bounded by integrating the tail bound in Proposition 3.3, or rather the tighter bound presented in (B4); see [47, Remark 6.5]. For n qubits, we have d = 2^n and conclude the stated bound. Here, we have evaluated the integral in two segments: the integrand is at most 1 until τ ∼ max(√(nt²λ²/N), ntλ/N); for larger τ, the expression decays exponentially, and integrating it only produces a contribution of order O(max(√(t²λ²/N), tλ/N)). Also, 2C absorbs all constants.

2. Proof of Theorem 2: Approximation error under a single input

Proposition 3.4 asserts that a single, random realization of the qDRIFT protocol (4) accurately approximates a unitary target evolution with respect to the diamond norm. Here, ≲ denotes an accurate approximation of the true bound in the large-N regime. This bound scales linearly in the (qubit) system size n. The dependence on n should not come as a surprise, since the diamond norm produces a very stringent worst-case distance measure. As emphasized by the above reformulation, the approximation must be accurate even when we optimize to find the worst possible input state ρ.
In Hamiltonian simulation, demanding such a stringent worst-case promise may be excessive. In most practical applications, the input state ρ is fixed and simple, e.g., a product state. In this more practical setting, we can obtain a gate complexity N that does not depend on the system size n. This is the main result of this section. In other words, fixing an arbitrary input state ρ helps a lot. A total number of N = 4(tλ/ε)² steps ensures that qDRIFT produces an ε-accurate output state with respect to trace distance. The proof is similar in spirit to the argument behind Proposition 3.4. We construct a vector martingale that describes the evolution of the state, and we control the behavior of this martingale using the uniform smoothness of the L_q(ℓ_2) norm. This argument is inspired by the work [25] on concentration of random matrix products.
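The gap between the worst-case (diamond-norm) error and the fixed-input error can be seen in a small toy experiment. The sketch below uses assumed parameters (four qubits, six equally weighted random terms, one qDRIFT realization; all illustrative, not from the paper) and compares the operator-norm deviation with the error on a single fixed state:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy comparison (assumed parameters): worst-case vs. fixed-input error of one
# qDRIFT realization; n = 4 qubits, L = 6 equal-weight terms (lambda = 1), N = 400.
n, L, t, N = 4, 6, 1.0, 400
d = 2 ** n

def random_term():
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Hj = (A + A.conj().T) / 2
    return Hj / (np.linalg.norm(Hj, 2) * L)  # ||h_j|| = 1/L

def u(X):
    # exp(-iX) for Hermitian X via eigendecomposition.
    w, Q = np.linalg.eigh(X)
    return (Q * np.exp(-1j * w)) @ Q.conj().T

terms = [random_term() for _ in range(L)]
U = u(t * sum(terms))

gates = [u((t / N) * L * h) for h in terms]  # exp(-i * t*lambda/N * h_j/||h_j||)
prod = np.eye(d, dtype=complex)
for _ in range(N):
    prod = gates[rng.integers(L)] @ prod     # uniform p_j = 1/L (equal weights)

psi = np.zeros(d, dtype=complex); psi[0] = 1.0  # fixed input state |0...0>
worst = np.linalg.norm(prod - U, 2)             # controls the diamond-norm error
fixed = np.linalg.norm((prod - U) @ psi)        # controls the fixed-input trace distance
print(fixed <= worst + 1e-12)                   # -> True
```

The vector error never exceeds the operator norm, and Theorem 2 shows that, unlike the worst-case error, it carries no dimensional factor.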

a. Approximation error for a fixed state
In this section, we state and prove our main technical result on the action of the qDRIFT protocol on a fixed input state.
Moreover, for ε > 0, a corresponding tail bound holds for the maximum over all input states ρ. Proof. First, we reduce the problem to a question about pure states. For any q ≥ 2, Markov's inequality implies a moment bound whose right-hand side is a convex function of the state ρ. Thus, the maximum over all states is attained at a pure state. As a consequence, we can establish both claims in the proposition by limiting our attention to an (unknown) pure state ρ = |ψ⟩⟨ψ| that does not depend on the random unitary channels V_k(X) = V_k X V_k†. Next, we convert the trace distance of the output states into a Euclidean distance on the state vectors themselves. The power q ≥ 2 will remain fixed until the last step of the argument. Lemma 3.4 supplies the conversion. The last bound follows from the triangle inequality and (a + b)^q ≤ 2^q max{a^q, b^q} for a, b ≥ 0.
We have split up the difference into two components, a deterministic bias and a random fluctuation. To control the deterministic bias, we simply apply Proposition 3.2: We will see that the bias is always negligible in comparison with the fluctuation. To control the second term, we need the following lemma.
Lemma 3.7. Let V_N, …, V_1 be i.i.d. unitaries that implement the qDRIFT protocol (4) with parameters t and λ. Then a moment bound holds for any q ≥ 2. We will establish this lemma below. The basic idea behind the proof is to express the random vector using a martingale sequence, similar to the matrix case. We could already invoke vector martingale tail bounds (Fact 3.3) to arrive at the desired results. However, we will demonstrate the same results via Markov's inequality and the moment bounds (uniform smoothness, Proposition 3.1) introduced in Appendix A. Introducing the inequalities from (B8) and Lemma 3.7 into the bound (B7), we obtain the stated estimate, where we have used the assumption that N ≥ (tλ)² to see that the second branch of the maximum always dominates the first.
We may now complete the proof. To obtain the expectation bound, we set q = 2 in (B9) and apply Lyapunov's inequality. To obtain the probability bound, we combine (B6) and (B9), and select q = ε²N/(16e t²λ²) to obtain the stated result. The resulting probability bound is vacuous unless q ≥ 2.
b. Proof of Lemma 3.7

In this section, we establish the bound on the size of the fluctuations.
Proof of Lemma 3.7. Fix a vector |ψ⟩ and introduce a sequence of random vectors. We can recast the relevant difference as a sum of two random vectors that are conditionally orthogonal in expectation. We can then apply uniform smoothness (Proposition 3.1) to split up the contributions and iterate this argument. Invoke Lemma 3.5, using the properties of the random unitaries constructed by qDRIFT, and combine the last two displays to reach the stated result.

The proof techniques for establishing Theorem 1 and Theorem 2 are remarkably general, and we condense them into Theorem 3. Let us reiterate the premise. Consider approximating a target unitary product U = U_N · · · U_1 by a random unitary V = V_N · · · V_1 such that the {V_k} satisfy: (i) Causality: for 1 ≤ k ≤ N, the random selection of V_k may only depend on the previous choices V_1, …, V_{k−1}. (ii) Accurate approximation: each realization of V_k (and its conditional expectation) must be close to the ideal unitary U_k. More precisely, let R, b_k > 0 be constants such that the corresponding bounds hold almost surely for all 1 ≤ k ≤ N. Note that this is more general than we need to prove Theorem 3; this is to take into account cases where the conditional variance may be much smaller than NR².
Proof of Theorem 3. Recall the decomposition of the approximation error into a deterministic bias and a random fluctuation. The deterministic bias can once again be controlled by a telescoping sum. Note that this also controls the performance of the channel average EV − U by Lemma 3.4. For the random fluctuations, tweaking the proofs of Theorem 1 and Theorem 2 implies the following general result.
Proposition 3.6 (Adapted random walk on the unitary group; with and without fixed input state). Let $\{V_1, V_2, \ldots, V_N\} \subset U(d)$ be adapted (causal) random unitary matrices that obey the above assumptions for some constants $v, R \geq 0$. Then, the product of $N$ random unitaries satisfies a concentration inequality: For a fixed, but arbitrary, input state $\rho$, the concentration is independent of the ambient dimension $d$: For a fixed input state, we call on the vector Freedman inequality (Fact 3.3) in place of uniform smoothness (Proposition 3.1), as in the proof of Theorem 2. There are several recent independent papers that also use matrix martingale tools to study products of random matrices that are close to the identity. The work [25] addresses the problem using uniform smoothness tools. The paper [27] uses the matrix Freedman inequality; its proof is quite similar to ours. In contrast, we are interested in unitary products, which allows for additional simplifications. For more background on matrix martingales, see [16,25,37,46].
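A small numerical experiment illustrates the setting of Proposition 3.6. The sketch below (dimension, step size, and sample counts are illustrative choices, not taken from the paper) builds a product of independent random unitaries, each a small mean-zero random rotation of size at most R, and confirms that the typical deviation from the identity grows like $\sqrt{N}R$ rather than the worst-case $NR$.

```python
import numpy as np

rng = np.random.default_rng(1)

def u_exp(G, s):
    """exp(1j * s * G) for Hermitian G via eigendecomposition."""
    w, Q = np.linalg.eigh(G)
    return (Q * np.exp(1j * s * w)) @ Q.conj().T

d, N, R = 4, 400, 0.02   # illustrative dimension, step count, step size

def random_hermitian(d):
    """Random Hermitian matrix normalized to unit operator norm."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    G = (A + A.conj().T) / 2
    return G / np.linalg.norm(G, 2)

def walk():
    """Product of N random unitaries with mean-zero rotation angles +-R."""
    V = np.eye(d, dtype=complex)
    for _ in range(N):
        V = u_exp(random_hermitian(d), rng.choice([-R, R])) @ V
    return np.linalg.norm(V - np.eye(d), 2)

devs = [walk() for _ in range(20)]
# Concentration: typical deviation is of order sqrt(N)*R ...
assert np.mean(devs) < 5 * np.sqrt(N) * R
# ... far below the crude worst-case accumulation N*R.
assert np.mean(devs) < 0.2 * N * R
```

Here the target product is the identity ($U_k = I$), so the deviation isolates the random-fluctuation term of the proposition.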
It is instructive to illustrate these improvements by example. In qDRIFT, all steps share a uniform bound $R$, but in the fully general statement the variance $v$ can be much smaller than the crude uniform bound $NR^2$. In such a regime, the sub-exponential tail of size $e^{-3\epsilon/(2R)}$ can start playing a role.
Lastly, for illustration, in Theorem 3 we give loose estimates on the variance to avoid complications with heavy-tail effects. Plugging in the parameters $v = ra^2$ and $R = 2a$ (since $\|V_k - \mathbb{E}V_k\| \leq \|V_k - U_k\| + \|U_k - \mathbb{E}V_k\| \leq 2\max_k\|U_k - V_k\| = 2a$) translates to a typical fluctuation $\epsilon^2 \sim nra^2$ (and $\epsilon^2 \sim ra^2$ for a fixed input). We conclude the proof by combining the bounds for the deterministic bias and the random fluctuation.
Proof of Corollary 3.1. Consider randomly permuting the 2k-th order Suzuki formulas, $V_k = S^{\sigma}_{2k}(t/r)$, where the permutation $\sigma$ of the $L$ terms is drawn with uniform probability $1/L!$. Then, by direct calculation [13,44]: and, by Theorem 3, which translates to the sufficient gate count $N = rL$.

Appendix C: Asymptotic tightness

It is natural to wonder whether the bound (B5) is tight for some Hamiltonian $H = \sum_j h_j$, i.e., whether $N = \Omega(n\lambda^2 t^2/\epsilon^2)$ is also necessary to achieve concentration. More precisely, we want to understand whether the dependences on the system size $n = \log_2(d)$, the evolution time $t$, and the interaction strength $\lambda = \sum_j \|h_j\|$ are also necessary to control the typical deviation of the unitary random walk we considered.
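The randomly permuted product formula can be sketched numerically. For brevity the snippet below uses a first-order formula per segment rather than the 2k-th order Suzuki formula, and an illustrative 2-qubit Hamiltonian; it only demonstrates the construction (a fresh uniformly random term ordering per segment) and its accuracy, not the corollary's scaling.

```python
import numpy as np

rng = np.random.default_rng(2)

def u_exp(H, t):
    """exp(-1j * t * H) for Hermitian H via eigendecomposition."""
    w, Q = np.linalg.eigh(H)
    return (Q * np.exp(-1j * t * w)) @ Q.conj().T

# Illustrative 2-qubit Hamiltonian with L = 3 terms (not from the paper).
I2 = np.eye(2); X = np.array([[0.0, 1.0], [1.0, 0.0]]); Z = np.diag([1.0, -1.0])
terms = [0.4 * np.kron(X, I2), 0.3 * np.kron(I2, X), 0.3 * np.kron(Z, Z)]
L, t, r = len(terms), 1.0, 50

H = sum(terms)
U = u_exp(H, t)                      # ideal evolution

def permuted_segment(dt):
    """One segment with a uniformly random ordering sigma of the L terms."""
    V = np.eye(4, dtype=complex)
    for j in rng.permutation(L):
        V = u_exp(terms[j], dt) @ V
    return V

# Product of r independently permuted segments, each of duration t/r.
V = np.eye(4, dtype=complex)
for _ in range(r):
    V = permuted_segment(t / r) @ V

assert np.linalg.norm(U - V, 2) < 0.05
```

Each realization uses $N = rL$ elementary exponentials, matching the gate count in the corollary.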
In the context of matrix concentration inequalities, this question has been thoroughly addressed [48, Section 4.1.2]. The answer is affirmative for sums of bounded matrices: the concentration inequalities are tight and saturated for collections of commuting matrices. Although in this work we consider products of random matrices, we still use a telescoping sum in the small-step regime and expect an analogous behavior.
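The commuting mechanism cited from [48] can be checked directly. In the sketch below (all sizes are illustrative), the operator norm of a sum of independent random diagonal sign matrices is the maximum of d scalar random walks, which genuinely picks up the dimensional $\sqrt{\log d}$ factor appearing in matrix concentration bounds.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sum of N independent random diagonal (hence commuting) sign matrices in
# dimension d: the operator norm is the largest of d independent random walks.
N, d, reps = 2000, 256, 20

norms = []
for _ in range(reps):
    signs = rng.choice([-1.0, 1.0], size=(N, d))   # row i = diagonal of X_i
    norms.append(np.abs(signs.sum(axis=0)).max())  # ||sum_i X_i|| for diagonals

avg = np.mean(norms)
# The max over d coordinates exceeds the single-coordinate scale sqrt(N) ...
assert avg > 1.5 * np.sqrt(N)
# ... and stays below the matrix-Bernstein-type ceiling sqrt(2*N*log(2*d)).
assert avg < np.sqrt(2 * N * np.log(2 * d))
```

A single coordinate only reaches scale $\sqrt{N}$; the $\sqrt{\log d}$ enhancement comes entirely from the maximum over nearly independent diagonal entries, which is the tightness mechanism invoked below.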
This observation motivates us to look at artificial Hamiltonians whose associated unitary evolution saturates the upper bounds put forth in this work. The cases we can handle lie at the two extremes: either the sum of single-site Pauli Zs or the sum of all $2^n$ many-body Pauli Zs. We will see the presence of the system-size factor $n = \log_2(d)$ at both extremes, so one may believe the same to hold for the intermediate q-local cases. However, this factor arises for very different reasons. In the single-site case, it arises because the operator norm completely factorizes into $n$ constituents (one for each term). For Hamiltonians that encompass all $2^n$ many-body Zs, it comes from the fact that the diagonal entries are nearly independent, so the union bound we used in Section III B is tight. Independence of the entries requires the presence of all many-body terms and does not extend to the few-body case.
The multivariate central limit theorem will be crucial for analyzing both cases, as it greatly simplifies the analysis in the large-$N$ limit. This example demonstrates the saturation of our martingale bounds for single-site Hamiltonians that factorize completely. To this end, we revisit a variant of the $n$-qubit example Hamiltonian discussed in Section III A: $H = \sum_{k=1}^{n} Z_k$, where $Z_k = I^{\otimes (k-1)} \otimes Z \otimes I^{\otimes (n-k)}$ for $1 \leq k \leq n$.
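Because this Hamiltonian is diagonal, a qDRIFT realization collapses to $\exp(-i\,(tn/N)\sum_k m_k Z_k)$ with multinomial counts $(m_1,\ldots,m_n)$, and the operator-norm error can be computed exactly as a maximum over the $2^n$ sign patterns. A minimal numerical sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(4)

n, t, N = 6, 1.0, 500
lam = float(n)                                    # sum of interaction strengths

# Row b of `signs` lists the eigenvalues of Z_1,...,Z_n on bitstring b.
signs = np.array([[1 - 2 * ((b >> k) & 1) for k in range(n)]
                  for b in range(2**n)])

def qdrift_error():
    """Exact operator-norm error ||U - V|| of one qDRIFT realization."""
    m = rng.multinomial(N, [1.0 / n] * n)         # count statistics m_k
    ideal = np.exp(-1j * t * signs.sum(axis=1))             # diagonal of U
    qdrift = np.exp(-1j * (t * n / N) * (signs @ m))        # diagonal of V
    return np.abs(ideal - qdrift).max()

errs = [qdrift_error() for _ in range(100)]
# The typical error follows the predicted scale t*lam*sqrt(n/N) up to constants.
scale = t * lam * np.sqrt(n / N)
assert 0.1 * scale < np.mean(errs) < 3 * scale
```

The maximum over sign patterns aligns every fluctuation $|m_k - N/n|$ constructively, which is precisely the factorization mechanism behind the system-size factor.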
(C1)

Proposition 3.7. Suppose that we wish to obtain an $N$-term approximation of the time evolution $U = \exp(-itH)$ associated with the $n$-qubit Hamiltonian (C1) for evolution time $t$. In the large-$N$ limit (CLT), the qDRIFT approximation (4) incurs an operator-norm error that matches the (upper) bound from Corollary 1.1 (and indirectly Theorem 1) up to a constant factor: We have chosen to state this result directly in terms of the operator-norm deviation. A conversion into diamond distance is also possible: $\frac{1}{2}\|\mathcal{U} - \mathcal{V}\|_{\diamond} \geq \frac{1}{2}\|U - V\|$ for any pair of unitary channels. This conversion rule readily follows from the geometric characterization of the diamond distance provided in [1]. Proof of Proposition 3.7. Each of the $n$ terms in the Hamiltonian (C1) has unit operator norm ($\|Z_k\| = 1$), and the total strength is $\lambda = \sum_{k=1}^{n} \|Z_k\| = n$. For fixed $N$ and $t$, each short-time approximation (4) has the form $V_i = \exp(-i\,\frac{tn}{N} Z_{k(i)})$, where each index $k(i)$ is chosen uniformly from the set $\{1, \ldots, n\}$ (multinomial distribution). Since all $Z_k$s commute, we can rewrite the entire product formula as Here, we have introduced the count statistics $m_k$ for each site label $k$, that is, the number of times location $k$ has been selected throughout the $N$ independent selection rounds, to rearrange the sum. These count statistics obey $\bar{m}_k = \mathbb{E} m_k = N/n = N/\lambda$ for each $1 \leq k \leq n$. We can use this observation to re-express the target unitary $U$ in a compatible fashion: Unitary invariance then implies that the operator-norm difference between both unitaries becomes This is a promising starting point. The multinomial CLT (Fact 3.5) ensures that the $n$ centered and normalized random variables $s_k = (m_k - \bar{m}_k)/\sqrt{N}$ approach the coefficients of a Gaussian vector $s \in \mathbb{R}^n$ with covariance matrix $\Sigma = \frac{1}{n} I - \frac{1}{n^2} J$. This, in particular, implies $\mathbb{E} s_k = 0$ and $\mathbb{E} s_k^2 = \frac{1}{n}(1 - \frac{1}{n}) = \sigma^2$ for all $1 \leq k \leq n$. We can capitalize on this observation by simplifying (C2) via a second-order Taylor expansion.
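The covariance claim in the multinomial CLT step is easy to verify empirically. The following sketch (sample sizes are illustrative) checks $\Sigma = \frac{1}{n}I - \frac{1}{n^2}J$ against sampled count statistics; for the multinomial distribution this covariance is in fact exact at every $N$, not only asymptotically.

```python
import numpy as np

rng = np.random.default_rng(5)

n, N, reps = 5, 10_000, 20_000

# Sample `reps` multinomial count vectors (m_1, ..., m_n) with p_k = 1/n each,
# and form the centered, normalized variables s_k = (m_k - N/n) / sqrt(N).
m = rng.multinomial(N, [1.0 / n] * n, size=reps)
s = (m - N / n) / np.sqrt(N)

Sigma_hat = (s.T @ s) / reps                           # empirical covariance
Sigma = np.eye(n) / n - np.ones((n, n)) / n**2         # predicted (1/n)I - (1/n^2)J
assert np.abs(Sigma_hat - Sigma).max() < 0.01

# Diagonal entries match sigma^2 = (1/n)(1 - 1/n), as used in the proof.
assert np.allclose(np.diag(Sigma), (1 / n) * (1 - 1 / n))
```

The off-diagonal entries are negative because the counts are constrained to sum to $N$, which is the source of the rank-one correction $-\frac{1}{n^2}J$.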
Set $X = -\frac{t\lambda}{\sqrt{N}} \sum_k s_k Z_k$ for brevity and apply Fact 3.4 to obtain This relation is preserved under expectations, and we obtain Let us focus on the leading-order term first. The particular structure of the Hamiltonian (C1), in which each $Z_k$ is the tensor product of a single Pauli-$Z$ matrix at location $k$ with $(n-1)$ identity matrices, ensures that the operator norm factorizes nicely. Use $\|X \otimes I + I \otimes Y\| = \|X\| + \|Y\|$ (valid here because each summand has a symmetric spectrum) iteratively to conclude
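The factorization of the operator norm can be confirmed numerically. The sketch below (with illustrative random coefficients) checks $\|\sum_k s_k Z_k\| = \sum_k |s_k|$ for single-site $Z$ terms, which is the iterated form of the identity above.

```python
import numpy as np

rng = np.random.default_rng(6)

n = 4
Z = np.diag([1.0, -1.0])

def site_Z(k, n):
    """Z_k = I ⊗ ... ⊗ Z ⊗ ... ⊗ I, with Z acting on qubit k of n."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, Z if j == k else np.eye(2))
    return out

# For X = sum_k s_k Z_k the operator norm factorizes into sum_k |s_k|,
# since ||A ⊗ I + I ⊗ B|| = ||A|| + ||B|| when the spectra are symmetric.
s = rng.normal(size=n)
X = sum(s[k] * site_Z(k, n) for k in range(n))
assert np.isclose(np.linalg.norm(X, 2), np.abs(s).sum())
```

In the diagonal basis this is immediate: the largest eigenvalue of $X$ is attained by the sign pattern that aligns every $\pm s_k$ constructively.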