Random Quantum Circuits Anticoncentrate in Log Depth

We consider quantum circuits consisting of randomly chosen two-local gates and study the number of gates needed for the distribution over measurement outcomes for typical circuit instances to be anticoncentrated, roughly meaning that the probability mass is not too concentrated on a small number of measurement outcomes. An understanding of the conditions for anticoncentration is important for determining which quantum circuits are difficult to simulate classically, as anticoncentration has been in some cases an ingredient of mathematical arguments that simulation is hard and in other cases a necessary condition for easy simulation. Our definition of anticoncentration is that the expected collision probability of the distribution (that is, the probability that two independently drawn outcomes will agree) is only a constant factor larger than the collision probability for the uniform distribution. We show that when the two-local gates are each drawn from the Haar measure (or any 2-design), at least Ω(n log(n)) gates (and thus Ω(log(n)) circuit depth) are needed for this condition to be met on an n-qudit circuit. In both the case where the gates are nearest neighbor on a one-dimensional ring and the case where gates are long range, we show that O(n log(n)) gates are also sufficient, and we precisely compute the optimal constant prefactor for the n log(n). The technique we employ relies upon a mapping from the expected collision probability to the partition function of an Ising-like classical statistical-mechanical model, which we manage to bound using stochastic and combinatorial techniques. DOI: 10.1103/PRXQuantum


I. INTRODUCTION
Random quantum circuits (RQCs) are a crucial model for understanding a diverse set of phenomena in both quantum information and quantum many-body physics. They have been used to study the onset of quantum chaos and dynamical spread of entanglement in strongly interacting quantum systems [1-3], including information processing in black holes [4]. They also form the basis for recent experiments aiming to demonstrate exponential quantum advantage [5-7].
The utility of RQCs in these situations derives from a myriad of quantitative properties they have been shown to possess. For example, RQCs quickly generate entanglement [1,8,9], lead to fast scrambling and decoupling of quantum information [10,11], and act as efficient encoding circuits for good quantum error-correcting codes [12]. When the circuits are geometrically local, they lead to ballistic spreading of local operators [2,3]. Furthermore, they form approximate unitary designs; that is, despite being composed of local gates, they efficiently approximate a global random unitary transformation up to any polynomial number of moments [13-17]. Meanwhile, computing transition amplitudes of RQCs has been shown to be just as difficult as for arbitrary quantum circuits [18-22], suggesting that classical simulation of RQCs should require exponential time.
In this work, we focus on another property of random quantum circuits called anti-concentration. Roughly speaking, when we measure the output state of the circuit in the computational basis, anti-concentration is the property that the distribution over measurement outcomes is fairly well spread across all possible outcomes and not too concentrated onto just one or only a small portion of those outcomes. Quantitatively, our definition of anti-concentration depends on the collision probability, the probability that measurement outcomes from two independent copies of the circuit agree. An RQC architecture is said to be anti-concentrated if the collision probability is at most a constant factor larger than its minimal value. Understanding when this is the case is particularly important for knowing when RQCs are hard to classically simulate. On the one hand, anti-concentration is a necessary ingredient in most formal hardness arguments for RQC simulation [18,23-29]. On the other hand, certain classical algorithms for simulating RQCs require anti-concentration in order to be efficient, for example, the algorithms discussed in Refs. [30,31] for noisy circuit simulation and the algorithm in Ref. [32] that spoofs the linear cross-entropy benchmarking metric introduced in Ref. [6].
In most previous work where RQC anti-concentration is needed, it has been asserted as an implication of the 2-design property (see, e.g., Refs. [27,33]). However, the 2-design property is much stronger than what is required for anti-concentration. It was shown that n-qubit RQCs on a fully connected architecture form approximate 2-designs after roughly O(n) depth [13], and this was later shown to also apply to geometrically local RQCs in 1D and improved to O(n^{1/D}) in D spatial dimensions [16]. However, recent work by Barak, Chou, and Gao [32], using a similar method to the one presented here, showed that for 1D RQCs the collision probability converges in depth O(log(n)), much faster than the 2-design depth of O(n). They also conjectured that 2D RQCs anti-concentrate in depth O(√log(n)).
In this work, we prove sharp bounds on the number of gates needed for anti-concentration in two RQC architectures. For 1D RQCs, we confirm the O(log(n)) upper bound on the anti-concentration depth in Ref. [32], and add a lower bound that matches the upper bound even up to the constant prefactor of the log(n). We also show that an Ω(log(n)) lower bound on the depth needed for anti-concentration holds regardless of which RQC architecture we use, which refutes the conjecture from Ref. [32] that 2D RQCs anti-concentrate in O(√log(n)) depth. We then consider a fully connected (i.e. not geometrically local) RQC architecture, where each gate acts on a pair of qudits chosen randomly among all n(n − 1)/2 possible such pairs. We show that, for qubits (local dimension q = 2), 5n log(n)/6 gates are necessary and sufficient (up to subleading corrections) for anti-concentration to be achieved, which settles a conjecture in Ref. [16].
Our method employs a technique for analyzing RQCs that converts the collision probability into a weighted sum over bit assignments to each location in the circuit diagram; this weighted sum can be viewed as a partition function for an Ising-like statistical mechanical model. The bit assignments can also be interpreted as a Markov chain, and the number of gates needed for anti-concentration ultimately translates into the time needed for certain expectation values to converge under the dynamics of the Markov chain. This method not only yields sharp quantitative bounds but also produces an appealing qualitative explanation of how and why the collision probability reaches its limiting value, which allows for effective heuristic reasoning even in architectures that we have not explicitly considered here.
The main takeaways from our work are twofold. First, we show that anti-concentration is generally achieved much faster than the 2-design property. The fact that anti-concentration occurs in Θ(n log(n)) circuit size both in 1D and for the fully connected architecture (these being two opposite extremes of geometric locality) suggests that anti-concentration may require only Θ(n log(n)) size for any reasonably well-connected architecture. This comes in sharp contrast to the situation for unitary designs, where the scaling of the size needed with n is highly dependent on the architecture. Second, the fact that we can prove tight upper and lower bounds suggests a broader utility for our method based on the correspondence between RQCs and statistical mechanical partition functions.

II. ANTI-CONCENTRATION AND THE COLLISION PROBABILITY
An RQC architecture is an instruction set for drawing a circuit diagram given the number of qudits n (each with local Hilbert space dimension q) and the size s of the circuit. The two architectures we consider specifically are the 1D architecture (with periodic boundary conditions) and the complete-graph architecture. The associated RQC ensemble for an RQC architecture is formed by following this instruction set and then choosing each gate in the diagram independently from the Haar measure (i.e. uniformly at random over the unitary group). If we fix an instance U from this ensemble, there is an associated output probability distribution p_U over the q^n possible computational basis measurement outcomes x ∈ [q]^n (where [q] = {1, 2, ..., q}). Anti-concentration tries to capture the notion that the probability mass is well spread out over all the outcomes. The uniform distribution, where each outcome is allocated a q^{-n} fraction of the total probability mass, is the ultimate anti-concentrated distribution because the mass is exactly equally spread, but we say a distribution is still anti-concentrated as long as the average fluctuations from uniform are no larger than O(q^{-n}). This definition is captured precisely by the collision probability, which is Σ_x p_U(x)^2. The collision probability gives the probability that measurement outcomes from two independent copies of the circuit are identical. It is also proportional to the second moment (and thus is related to the variance) of the output probability of a randomly chosen bit string. If p_U is the uniform distribution, then the collision probability is q^{-n}, its minimal possible value. For an RQC architecture at a specified qudit number n and circuit size s, we consider the collision probability averaged over the randomly chosen circuit instances U:

$$Z = \mathbb{E}_U \sum_{x \in [q]^n} p_U(x)^2 = q^n\, \mathbb{E}_U\left[p_U(1^n)^2\right], \qquad (1)$$
where the second equality holds because, by symmetry, each of the q^n terms in the sum yields the same value under expectation as long as at least one Haar-random gate acts on each qudit. We say an RQC architecture with n qudits and s gates is anti-concentrated if there is a constant α (independent of n) with 0 < α ≤ 1 for which Z ≤ α^{-1} q^{-n}, i.e. the collision probability is only a constant factor larger than its minimal value. In particular, our theorem statements roughly correspond to the choice α = 1/4, but other choices of α would yield the same results up to leading order. If desired, Markov's inequality can then be used to bound the fraction of the randomly chosen U whose collision probability is larger than some constant multiple of Z. Moving forward, for convenience, when we say collision probability we will mean the average collision probability Z.
Very shallow circuit architectures are not anti-concentrated: there are expected to be some outcomes x for which p_U(x) is exponentially larger than the mean of q^{-n}. As the circuit gets deeper, we expect the probability distribution to become closer to uniform, but even at infinite depth, when the circuit unitary U becomes a globally Haar-random q^n × q^n unitary, the output distribution still does not become completely uniform. In this case, the output distribution will typically follow a Porter-Thomas distribution¹ and Z can be computed exactly as

$$Z_H = \frac{2}{q^n + 1}, \qquad (2)$$

roughly twice as large as the minimal value of q^{-n} associated with the uniform distribution. This statement is proved using the techniques described later. For a graphical illustration of these cases, see Figure 1.

[Figure 1: three panels plotting the frequency of p_U(x) = p against p for the uniform, globally Haar-random, and low-depth cases; the low-depth panel is labeled "Not anti-concentrated, Z ≥ e^{n^c} q^{-n}".]

FIG. 1. A caricature of anti-concentration. In the three examples, the fraction (frequency) of bit strings x for which p_U(x) = p is plotted against p. (This could be either for a fixed choice of U or averaged over random choice of U.) Since there are q^n bit strings, the mean of this distribution is q^{-n} (dotted blue line). For the uniform distribution, which is completely anti-concentrated, all q^n outcomes are allocated probability mass q^{-n} and the collision probability is Z = q^{-n}. For globally Haar-random unitaries, the output probabilities are on average q^{-n} but have some non-zero variance, and the collision probability is Z ≈ 2q^{-n}. Whenever Z ≈ cq^{-n} for some c independent of n, we call the distribution anti-concentrated. For low-depth RQCs, the mean output probability is q^{-n}, but the variance is much larger, and the collision probability is much larger than q^{-n}. Most of the probability mass is concentrated onto a few measurement outcomes, while the vast majority of the outcomes are assigned a very small amount of mass, leading to the divergence in the frequency with which p_U(x) is close to 0 depicted in the plot.
While one could capture the notion of anti-concentration with a different definition, the definition we choose is useful and relevant because it has concrete ramifications in all of the previously mentioned applications of anti-concentration. For example, one implication of our definition (by application of the Paley-Zygmund inequality) is that if Z ≤ α^{-1} q^{-n}, then for any 0 < β < 1,

$$\Pr_U\left[p_U(x) > \beta q^{-n}\right] \ge \alpha (1-\beta)^2, \qquad (3)$$

meaning that for at least a constant fraction of the circuit instances the probability of a given measurement outcome x is at least a constant multiple β of the mean measurement probability q^{-n}. (This follows because E_U[p_U(x)] = q^{-n} and, by the symmetry argument above, E_U[p_U(x)^2] = Z/q^n ≤ α^{-1} q^{-2n}.) This sort of inequality is the relevant one for turning good additive approximations into good multiplicative approximations (with reasonable probability), employed in e.g. [18,23-29] to argue² that it is hard to classically sample output distributions for a large fraction of instances up to small total variation distance error (for more details, see Section IV B). Equations like Eq. (3) are sometimes taken to be the definition of anti-concentration [27], which is a weaker definition than ours since, in principle, Eq. (3) can hold even in cases where Z exceeds any constant multiple of q^{-n}.

1 In fact, the fraction of outcomes x for which p_U(x) = p is proportional to exp(−p/q^{-n}), illustrated roughly in the middle diagram of Figure 1.

2 Note that, while Eq. (3) is an ingredient in these arguments, it is not alone sufficient to imply hardness of simulation, as the arguments generally rely on additional unproven conjectures.
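As a quick numerical sanity check of two statements above, the following NumPy sketch estimates the collision probability of a globally Haar-random circuit and the frequency that appears in the Paley-Zygmund bound. It relies on the standard fact that the output state U|1^n⟩ of a Haar-random U is distributed like a normalized complex Gaussian vector. The function names and the parameter choices (n = 6, q = 2, β = 1/2) are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def haar_random_state(dim, rng):
    """Sample a Haar-random pure state as a normalized complex Gaussian vector.

    Its distribution equals that of U|1^n> for a globally Haar-random U.
    """
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)


def haar_collision_and_pz(n, q=2, beta=0.5, samples=2000, seed=0):
    """Monte Carlo estimate of Z for global Haar circuits plus a Paley-Zygmund check.

    Returns (Z estimate, exact 2/(q^n + 1), empirical Pr[p(x) > beta q^-n],
    Paley-Zygmund lower bound alpha (1 - beta)^2 with alpha = q^-n / Z).
    """
    rng = np.random.default_rng(seed)
    dim = q ** n
    collisions = np.empty(samples)
    hits = 0  # samples where the fixed outcome (index 0, i.e. 1^n) exceeds beta / q^n
    for k in range(samples):
        p = np.abs(haar_random_state(dim, rng)) ** 2  # output distribution p_U(x)
        collisions[k] = np.sum(p ** 2)                # collision probability sum_x p_U(x)^2
        hits += int(p[0] > beta / dim)
    z_est = collisions.mean()
    z_exact = 2 / (dim + 1)
    alpha = 1 / (dim * z_est)           # write Z = alpha^{-1} q^{-n}
    pz_bound = alpha * (1 - beta) ** 2  # right-hand side of the Paley-Zygmund bound
    return z_est, z_exact, hits / samples, pz_bound


if __name__ == "__main__":
    z_est, z_exact, frac, bound = haar_collision_and_pz(n=6)
    print(f"q^n Z ~ {64 * z_est:.3f} (exact {64 * z_exact:.3f}); "
          f"Pr[p > q^-n / 2] ~ {frac:.3f} vs bound {bound:.3f}")
```

For Porter-Thomas statistics the empirical frequency should sit near e^{-β} ≈ 0.61, well above the Paley-Zygmund bound of roughly 0.13 for these parameters.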

III. OUR RESULTS
We show that the collision probability is given by a discrete sum, which we interpret as the expectation value of a certain stochastic process.The correspondence between the collision probability and the discrete sum is described in Section V, and a complete derivation is provided in Appendix B.
Analyzing our expression, we derive rigorous upper and lower bounds on the collision probability generally and for two specific architectures. These bounds are stated here and the proofs are provided in the appendices. These bounds are then used to form upper and lower bounds, quoted in Table I, on the anti-concentration size s_AC, defined as the minimum circuit size required such that Z ≤ 2Z_H, where Z_H = 2/(q^n + 1) is the collision probability of a globally Haar-random unitary. The constant 2 in the definition of s_AC is arbitrary but, since we will show that Z approaches Z_H as Z_H(1 + e^{-Ω(s/n)}), a different choice of constant would only lead to linear-in-n changes to s_AC, which would be subleading and would not affect any of the statements in Table I. All logarithms in this paper are natural logarithms.

Collision probability upper bounds
Our upper bounds take the form of Eq. (4), in which the constant a is independent of n and depends on the circuit architecture, and s^* is a function of n that also depends on the architecture. Thus, if the anti-concentration size s_AC is defined to be the minimum size s such that Z ≤ 2Z_H, then we have s_AC ≤ s^*. Specifically, we have the following results, which are restated here as theorems and proved rigorously in the appendices. First, we consider the 1D architecture with periodic boundary conditions, where the qudits are arranged on a ring and alternating layers of n/2 nearest-neighbor Haar-random gates are applied.
Theorem 1. For the 1D architecture, Eq. (4) holds with

$$a = \log\left(\frac{q^2+1}{2q}\right) \qquad (5)$$

and s^* = n log(n)/(2a) + O(n).

Since the depth of the 1D architecture is given by d := 2s/n, we can define d^* := 2s^*/n = a^{-1} log(n) + O(1) for 1D and conclude that the "anti-concentration depth" d_AC satisfies d_AC ≤ d^*.

Similarly, we show an upper bound for the complete-graph architecture, where each gate acts on a random pair of qudits without regard for their spatial proximity.
Theorem 2. For the complete-graph architecture, Eq. (4) holds with whenever s ≥ s * , for a constant c that is independent of n.
A size-s circuit diagram chosen randomly from the complete-graph architecture will have depth at most O(s log(n)/n) with high probability [11], meaning that O(log(n)^2) depth is typically sufficient for anti-concentration in the complete-graph architecture.
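The depth claim can be explored empirically with a short plain-Python sketch: draw s random pairs, as in the complete-graph architecture, and compute the circuit depth by as-soon-as-possible layering (each gate goes one layer after the latest gate on either of its qudits). The helper names are hypothetical, and a simulation like this only illustrates the O(s log(n)/n) behavior; it does not prove it.

```python
import math
import random


def random_complete_graph_circuit(n, s, rng):
    """Hypothetical helper: s gates, each on a uniformly random pair of distinct qudits."""
    return [tuple(rng.sample(range(n), 2)) for _ in range(s)]


def circuit_depth(n, gates):
    """Depth via as-soon-as-possible layering: each gate is placed one layer
    after the most recent gate on either of its two qudits."""
    last_layer = [0] * n
    for a, b in gates:
        layer = max(last_layer[a], last_layer[b]) + 1
        last_layer[a] = last_layer[b] = layer
    return max(last_layer)


if __name__ == "__main__":
    rng = random.Random(7)
    for n in (64, 256, 1024):
        s = round(n * math.log(n))  # a circuit size of order n log(n)
        depth = circuit_depth(n, random_complete_graph_circuit(n, s, rng))
        # ratio of the observed depth to s log(n) / n
        print(n, s, depth, round(depth / (s * math.log(n) / n), 3))
```

Since each layer holds at most n/2 gates, the depth is always at least 2s/n; the printed ratio indicates how the random placement of pairs compares with the s log(n)/n scale.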
We also consider general architectures.We define a property called regularly connected (Definition 5 in Appendix C), which applies to a RQC architecture when for any partition of qubits into two sets, there will be a gate in the circuit that couples the two sets at least once every O(n) gates.Nearly all natural architectures have this property, including standard architectures in D spatial dimensions for any D.
Theorem 3. If an architecture is regularly connected, then Eq. (4) holds with a = Θ(1) and s^* = Θ(n^2). This corresponds to Θ(n) gates per qudit. This result is weaker than our specific results for the 1D and complete-graph architectures, and we conjecture that much better is possible.

Collision probability lower bounds
Our lower bounds on the collision probability take a common form, involving constants A and B that are independent of n. (The lower bound for the complete-graph architecture takes a different but very similar form.) This form implies that if s grows with n like s ≈ f n log(n)/B for some f < 1, then Z/Z_H ≥ (1/2) e^{A n^{1−f}}, which becomes arbitrarily large as n → ∞, meaning that the architecture is not anti-concentrated. This puts a lower bound on the anti-concentration size, s_AC ≥ (1 − o(1)) n log(n)/B. Specifically, we show a general lower bound, as well as specific lower bounds for the 1D and complete-graph architectures.
Theorem 4. For any RQC architecture with s two-qudit gates, the following holds.
This has the consequence that if s_AC and d_AC are defined as the minimum size and minimum depth for which Z ≤ 2Z_H, then s_AC = Ω(n log(n)) and d_AC = Ω(log(n)). We improve on the general lower bound for the two specific architectures that we consider.

Theorem 5. For the 1D architecture, there exists a constant A such that where a = log((q^2 + 1)/(2q)) is the same as for the upper bound in Eq. (5).
This implies that in 1D, d_AC ≥ (1 − o(1)) a^{-1} log(n), which is tight with the upper bound up to subleading corrections.
Theorem 6. For the complete-graph architecture,

Although this bound has a slightly different form than the other lower bounds, it still yields the conclusion

$$s_{AC} \ge \frac{q^2+1}{2(q^2-1)}\, n\log(n)\,\bigl(1 - o(1)\bigr),$$

which is tight with the upper bound up to subleading corrections. When q = 2 (qubits), the prefactor of the n log(n) is 5/6, settling a conjecture proposed in Ref. [16].
The upper and lower bounds together allow us to conclude that s AC = Θ(n log(n)) for both the 1D architecture and the complete-graph architecture, and, in fact, we have matching upper and lower bounds on the constant prefactor of the n log(n).
We note that, for q ≥ 5, our results have the counter-intuitive implication that the 1D architecture anti-concentrates faster than the complete-graph architecture, even though it is geometrically local. We argue that this is an artifact of the definition of the models, and it can be explained by the fact that the qudit pairs acted upon by the gates in the complete-graph architecture are chosen randomly, while the qudit pairs in the 1D architecture are not random; in fact, in the latter case they are optimally packed into layers of n/2 non-overlapping gates. As q increases, anti-concentration becomes arbitrarily fast for the 1D architecture (the coefficient of the n log(n) decreases like 1/log(q)). Meanwhile, for the complete-graph architecture, no matter how large q is, there will always be some minimum number of gates (roughly n log(n)/2, by a coupon-collector argument, since each gate touches two qudits) needed simply to guarantee that all the qudits have been involved in the circuit with high probability. We suspect that a parallelized version of the complete-graph architecture would anti-concentrate with a slightly better constant than the 1D architecture.
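The following plain-Python sketch compares the leading n log(n) coefficients of the two architectures and checks the coupon-collector estimate. The 1D coefficient 1/(2a) follows from d^* = a^{-1} log(n) and s = nd/2. The complete-graph coefficient (q^2 + 1)/(2(q^2 − 1)) is an assumption on our part: it is not written out in this paragraph, but it reproduces the 5/6 quoted for q = 2 and the n log(n)/2 floor as q → ∞. All function names are hypothetical.

```python
import math
import random


def prefactor_1d(q):
    """Coefficient c in s_AC ~ c n log(n) for 1D: s = n d / 2 with d* = log(n) / a."""
    a = math.log((q**2 + 1) / (2 * q))
    return 1 / (2 * a)


def prefactor_complete_graph(q):
    """ASSUMED closed form (q^2 + 1) / (2 (q^2 - 1)): gives 5/6 at q = 2 and tends to 1/2."""
    return (q**2 + 1) / (2 * (q**2 - 1))


def gates_until_all_touched(n, rng):
    """Coupon-collector count: random-pair gates drawn until every qudit has been hit."""
    touched, count = set(), 0
    while len(touched) < n:
        touched.update(rng.sample(range(n), 2))
        count += 1
    return count


if __name__ == "__main__":
    for q in range(2, 8):
        print(q, round(prefactor_1d(q), 3), round(prefactor_complete_graph(q), 3))
    rng = random.Random(1)
    n = 2000
    mean = sum(gates_until_all_touched(n, rng) for _ in range(5)) / 5
    # ratio to n log(n) / 2; it approaches 1 only slowly as n grows
    print(round(mean / (n * math.log(n) / 2), 3))
```

Under that assumption, the first q at which the 1D coefficient drops below the complete-graph one is q = 5, in line with the statement above.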

IV. RELATED WORK AND IMPLICATIONS
Here we highlight a few relevant previous works and emphasize how our results fit in.
• Harrow and Mehraban [16] studied how quickly RQCs form approximate unitary t-designs and anti-concentrate for various architectures. For geometrically local circuits, they showed that the approximate t-design property is achieved after only O(n^{1/D}) depth in D spatial dimensions, the first work to break the O(n) barrier for designs. Since anti-concentration follows from the approximate 2-design property, their work implies an O(n^{1/D}) upper bound on the anti-concentration depth. We show that for D = 1, the anti-concentration depth is actually Θ(log(n)), and we conjecture that this is also the case for D ≥ 2, but we do not prove this, so the O(n^{1/D}) bound remains the best known for D ≥ 2.
They also considered the question of anti-concentration in the complete-graph architecture and showed an upper bound on the anti-concentration size of O(n log(n)^2) and a lower bound of Ω(n log(n)).
They used heuristic reasoning to conjecture that (for q = 2) the anti-concentration size should be 5n log(n)/3, up to leading order.We are able to show that to leading order the anti-concentration size for the complete-graph architecture is 5n log(n)/6.This is off by a factor of 2 from the conjecture stated in their paper, which we suspect is due to a minor error in their heuristic reasoning.
• Barak, Chou, and Gao [32] developed a classical algorithm for shallow RQCs that achieves a non-negligible score on the Linear Cross-Entropy Benchmarking (XEB) metric despite not performing a full simulation of the RQCs. The Linear XEB metric was used by Google to verify its 2019 quantum computational supremacy experiment [6]. Barak, Chou, and Gao show that if a depth-d RQC architecture in D spatial dimensions has collision probability Z, their algorithm achieves a score of with high probability after a total runtime ( RQCs, which is equivalent to our Theorem 1. This shows that their algorithm achieves a ≥ 1/poly(n) score in polynomial time for logarithmic-depth 1D RQCs. For 2D RQCs, they conjecture that Z = O(2^{-n}) after depth d = O(√log(n)), which would imply that their algorithm achieves a ≥ 1/poly(n) score in polynomial time at that depth. Our Theorem 4 contradicts their conjecture by showing generally that 2^n Z ≥ exp(n^{1−o(1)}) when d is sublogarithmic.
• Our method takes expectations over individual gates in the RQC using formulas for Haar integration, a strategy that has also been used on similar problems in the past. Many works have used this strategy to form a random walk over Pauli strings with wide-ranging applications [8, 10-13, 16, 34-36]. Our analysis applies this strategy in a distinct way that more closely resembles a series of works that interpret the resulting expression as the partition function of classical statistical mechanical models [2,3,17,37-42]. Here, we analyze those partition functions using a Markov chain analysis, but our Markov chain has different transition rules compared to the Pauli string Markov chain.

A. Connection to 2-design
Anti-concentration for random quantum circuits (as well as some Hamiltonian models) is often established as a consequence of convergence to approximate unitary 2-designs, where approximately reproducing the first two moments of the Haar measure allows one to bound the RQC collision probability. For both 1D and complete-graph RQCs, size O(n^2) circuits (of linear depth) form approximate 2-designs and therefore anti-concentrate. There are a number of definitions of approximate unitary designs that use different norms; we briefly comment on these definitions and on what each one requires in order to imply anti-concentration.
As we review in Appendix F, ε-approximate 2-designs defined in terms of the diamond norm have a collision probability upper bounded by Z_H up to additive error. In order to achieve anti-concentration, ε must be taken to be exponentially small (i.e. we require ε = 1/q^{2n}). Ref. [15] introduced a stronger notion of approximate design in terms of the complete positivity of the difference in channels. Under this strong definition, 2-designs bound the collision probability up to relative error with respect to the Haar value and thus anti-concentrate. A much weaker definition of approximate design is given in terms of the operator norm of the moment operators, often called the tensor product expander (TPE) condition. Interestingly, TPEs also bound the collision probability up to additive error, but again the error needs to be exponentially small to achieve anti-concentration.
Random quantum circuits on the 1D architecture form ε-approximate 2-designs, in both the diamond norm and the stronger definition, when the circuit size is O(n(n + log(1/ε))). Moreover, 1D random circuits actually form ε-approximate TPEs in constant depth, when the circuit size is O(n log(1/ε)). But again, anti-concentration requires that ε be taken to be ε = 1/q^{2n}, thus mandating linear depth. So to establish that the collision probability is bounded up to a relative error, as in the definition of anti-concentration, using unitary 2-designs or a general bound on the moments necessitates linear depth. For non-local RQCs defined on a complete graph, the best known upper bounds on the approximate 2-design depth are the same as for the 1D architecture. But it has been conjectured that this may be improved for non-local RQCs, which would close the gap between the 2-design time and the depth required for anti-concentration.
To further emphasize the distinction between anti-concentration and unitary 2-designs, we note that anti-concentration can be achieved by specific short-depth circuits without generating entanglement across the system (indeed, a circuit consisting of a single layer of single-qubit Hadamard gates suffices, since it maps |0^n⟩ to the uniform superposition). Moreover, for an ensemble of random quantum circuits, anti-concentration can be equivalently phrased as the statement that certain matrix elements of the second moment operator E_U[U^{⊗2} ⊗ U^{*⊗2}] reach the Haar value of approximately 2/q^{2n} after some depth. By contrast, the approximate 2-design condition requires that E_U[⟨ψ|U^{⊗2} ⊗ U^{*⊗2}|ψ⟩] be close to its Haar value for all states |ψ⟩, even those that are entangled across the tensor copies. As we show in Appendix F, there are necessarily some states that require linear depth to equilibrate to the minimal Haar value, at least for RQCs on the 1D architecture.
The starting point for these hardness arguments is the long-known observation that the answer to a hard classical problem can be encoded into the output probability p_U(x) of a quantum circuit U.³ Thus, exactly computing p_U(x) for arbitrary U and x should not be possible in classical polynomial time. This remains true even if one only needs to compute p_U(x) up to some constant relative error. The ultimate goal in the context of quantum computational supremacy is to show that there is no polynomial-time classical algorithm that approximately simulates random circuits (or at least to give extremely convincing evidence in favor of this conclusion). More precisely, the approximate simulation task is to produce samples from a distribution \tilde{p}_U for which

$$\frac{1}{2}\sum_{x \in [q]^n} \left|\tilde{p}_U(x) - p_U(x)\right| \le \varepsilon \qquad (18)$$

for some small ε = O(1), and to do this for a large fraction of U drawn randomly from some random ensemble.
3 For example, given an n-bit efficiently computable Boolean function f, consider the following circuit U. First, perform a layer of Hadamard gates on every qubit, then perform the 2^n × 2^n diagonal unitary operation Σ_x (−1)^{f(x)} |x⟩⟨x|, then perform another layer of Hadamard gates. It is straightforward to show that p_U(0^n) is proportional to gap(f)^2 (specifically, p_U(0^n) = gap(f)^2/4^n), where gap(f) = |{x : f(x) = 0}| − |{x : f(x) = 1}|, which is an extremely difficult quantity to compute classically; it is expected that there exist functions f where the best classical algorithm is essentially a brute-force enumeration over all 2^n inputs x.
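The identity behind footnote 3 can be checked by brute force for a handful of qubits. The sketch below (NumPy; function names hypothetical) builds U = H^{⊗n} D H^{⊗n} with D = Σ_x (−1)^{f(x)} |x⟩⟨x| as dense matrices, so it is only meant for tiny n, and compares p_U(0^n) with gap(f)^2/4^n.

```python
from itertools import product

import numpy as np


def hadamard_layer(n):
    """Dense 2^n x 2^n matrix of a Hadamard gate on every qubit."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, h)
    return out


def p_all_zeros(f, n):
    """p_U(0^n) for U = H^n . diag((-1)^f(x)) . H^n applied to |0^n>."""
    # product() enumerates x in the same order as the Kronecker-product basis
    phases = np.array([(-1) ** f(x) for x in product((0, 1), repeat=n)], dtype=float)
    hn = hadamard_layer(n)
    u = hn @ np.diag(phases) @ hn
    return abs(u[0, 0]) ** 2


def gap(f, n):
    """|{x : f(x) = 0}| - |{x : f(x) = 1}| by brute-force enumeration."""
    values = [f(x) for x in product((0, 1), repeat=n)]
    return values.count(0) - values.count(1)


def and_of_first_two_bits(x):
    """Example Boolean function on n >= 2 bits."""
    return x[0] & x[1]


if __name__ == "__main__":
    n = 3
    # the two printed numbers agree up to floating-point rounding
    print(p_all_zeros(and_of_first_two_bits, n), gap(and_of_first_two_bits, n) ** 2 / 4 ** n)
```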
Turning the starting point into the ultimate goal requires a few steps (some of which rely on conjecture).Anticoncentration is one of these steps.
The primary role anti-concentration plays is to turn a small additive difference |\tilde{p}_U(x) − p_U(x)| for most x into a small relative difference r(x) for most x, where

$$r(x) = \frac{\left|\tilde{p}_U(x) - p_U(x)\right|}{p_U(x)}.$$

If Eq. (18) is obeyed, then |\tilde{p}_U(x) − p_U(x)| is on the order of ε/q^n for most x. Meanwhile, the mean value of p_U(x) for random x is exactly 1/q^n. If p_U is anti-concentrated, then for most x, p_U(x) will be within a constant factor of the mean, as shown in the middle diagram of Figure 1, and r(x) = O(ε) will hold for most x. However, if p_U is not anti-concentrated, then p_U(x) will be much smaller than the mean for most x, as depicted in the right diagram of Figure 1. This means that without anti-concentration, r(x) ≫ ε for most x, which is problematic because the hard classical problems encoded into p_U(x) are no longer hard when the relative error is extremely large, so anti-concentration appears to be necessary if there is any hope of completing the hardness argument using existing techniques.
Even if anti-concentration holds, more is needed to show hardness of approximately simulating RQCs.One must turn hardness of computing p U (x) into hardness of sampling from p U and also turn hardness for arbitrary U into hardness for a random U .There are techniques that work for each of these steps individually, but currently they do not work together simultaneously, and thus an additional conjecture must be made.
Our work puts sharp bounds on the number of gates needed for anti-concentration to hold in multiple RQC architectures, which constrains when these hardness arguments have the potential to work. Our finding that the number of gates per qudit needed for anti-concentration grows only like O(log(n)) in the 1D and complete-graph architectures implies that perhaps RQC-based quantum computational supremacy could be achieved at a shallower circuit depth than previously believed. For example, Google's 2019 quantum computational supremacy experiment was based on 2D RQCs of depth exceeding the √n diameter of the qubit array [5,6]. The fact that 1D circuits anti-concentrate in Θ(log(n)) depth is evidence that 2D circuits should have the same scaling (if anything, anti-concentration should happen faster in 2D). Thus a similar quantum computational supremacy experiment might be equally defensible at Θ(log(n)) depth instead of Ω(√n) depth. We note, however, that there are other reasons to want to go to larger depth (e.g. classical simulation via tensor network methods becomes harder at larger depths). Without anti-concentration, the hardness-of-simulation arguments appear to break down, but this does not generally imply that simulation is easy. On this topic, a subset of these authors and others described an algorithm for solving the approximate simulation problem for 2D RQCs [41]. The algorithm is proved to be efficient for a certain constant-depth (and thus not anti-concentrated) 2D RQC architecture, but it is conjectured to become inefficient once the depth exceeds a larger constant threshold. Thus, the complexity of the algorithm transitions to inefficient before the circuits become anti-concentrated, suggesting that in 2D there could be a regime where the RQCs are too shallow to be anti-concentrated but classical simulation is still hard.

V. COLLISION PROBABILITY AS A SUM OVER BIT STRING TRAJECTORIES
The main technical contribution of our work is to derive a correspondence between the collision probability and a discrete sum (which can be interpreted as the partition function of a classical statistical mechanical model or as the expectation of a Markov chain) and then to derive rigorous upper and lower bounds on the sum.Here we describe the correspondence along with a brief example for a simple random quantum circuit in Figure 2. We also explain why this correspondence leads us to expect anti-concentration to be achieved after Θ(n log(n)) gates in most architectures.In Appendix B, we give a more careful derivation of this correspondence, and in the other appendices, we use it to rigorously prove the upper and lower bounds quoted in Table I.
Recall that we wish to compute the collision probability

$$Z = \mathbb{E}_U\Big[\sum_{x\in[q]^n} p_U(x)^2\Big], \qquad p_U(x) = |\langle x|U|1^n\rangle|^2,$$

where U is the unitary enacted by the random quantum circuit. The Haar measure uniformly covers the unitary group, so, intuitively speaking, taking the expectation over the application of a Haar-random gate removes much of the bias in the quantum state; we use a technique that allows us to effectively keep track of only n bits of information about the n-qudit state after the application of (two copies of) each Haar-random gate. Instead of 0 or 1, our bits take values I or S, because they are associated with the identity and swap operations on two qudit copies.
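In particular, if V is a q × q Haar-random matrix and σ is an operator on two copies of a q-dimensional Hilbert space, then the quantity $\mathbb{E}_V[V^{\otimes 2}\sigma V^{\dagger\otimes 2}]$ is equal to a linear combination of the identity operation I on the two copies of the Hilbert space and the swap operation S on the two copies of the Hilbert space. Specifically, it is given by

$$\mathbb{E}_V\big[V^{\otimes 2}\sigma V^{\dagger\otimes 2}\big] = \frac{\operatorname{Tr}(\sigma) - q^{-1}\operatorname{Tr}(\sigma S)}{q^2-1}\, I + \frac{\operatorname{Tr}(\sigma S) - q^{-1}\operatorname{Tr}(\sigma)}{q^2-1}\, S. \qquad (22)$$

This well-known formula is derived in Appendix B.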
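Eq. (22) can be checked numerically. The Python sketch below (all helper names are our own) samples Haar-random unitaries via the QR decomposition of a complex Gaussian matrix with the standard phase correction, averages $V^{\otimes 2}\sigma V^{\dagger\otimes 2}$ over many samples, and compares the result with the closed form. We assume the ordering |i⟩|j⟩ ↦ i·q + j for the two copies and store the basis state |1⟩ at array index 0; this is an illustration, not part of the derivation in Appendix B.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Sample a dim x dim Haar-random unitary (QR of a complex Gaussian matrix)."""
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    qmat, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return qmat * phases  # rescale column k by the phase of r[k, k]

def swap_operator(q):
    """Swap S on two copies of C^q, with |i>|j> stored at index i*q + j."""
    s = np.zeros((q * q, q * q))
    for i in range(q):
        for j in range(q):
            s[j * q + i, i * q + j] = 1.0  # S|i>|j> = |j>|i>
    return s

def twirl_formula(sigma, q):
    """Right-hand side of Eq. (22): alpha * I + beta * S."""
    swap = swap_operator(q)
    tr = np.trace(sigma)
    tr_s = np.trace(sigma @ swap)
    alpha = (tr - tr_s / q) / (q**2 - 1)
    beta = (tr_s - tr / q) / (q**2 - 1)
    return alpha * np.eye(q * q) + beta * swap

def twirl_monte_carlo(sigma, q, samples, seed=0):
    """Empirical average of V^{(x)2} sigma V^{dagger (x)2} over Haar-random V."""
    rng = np.random.default_rng(seed)
    acc = np.zeros((q * q, q * q), dtype=complex)
    for _ in range(samples):
        v = haar_unitary(q, rng)
        vv = np.kron(v, v)  # the same V acts on both copies
        acc += vv @ sigma @ vv.conj().T
    return acc / samples
```

For σ = (|1⟩⟨1|)^{⊗2} both traces equal 1, so the formula reduces to (I + S)/(q(q + 1)), which is the single-qudit step used in Appendix B.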
By applying the formula to each of the two-qudit Haar-random gates sequentially, the state (which begins in $|1^n\rangle\langle 1^n|^{\otimes 2}$) evolves as a sum over n-fold tensor products of identity and swap operations. Each of these n-fold tensor products is labeled by an n-bit vector that we call a configuration $\nu \in \{I, S\}^n$. For a circuit with s two-qudit gates, each term in the resulting sum is then associated with a length-(s + 1) sequence of configurations $\gamma = (\gamma^{(0)}, \ldots, \gamma^{(s)})$, which we call a trajectory. Each trajectory γ has a certain non-negative coefficient in the sum, allowing us to write

$$Z = (q+1)^{-n}\sum_{\gamma}\mathrm{weight}(\gamma) \qquad (23)$$

for a fairly simple weighting function, described as follows and derived more carefully in Appendix B. First of all, the weight for most trajectories is simply 0. In order for a trajectory to have positive weight, it must obey the following rules. If the gate at time step t acts on qudits a and b, then the configuration values $\gamma^{(t)}_a$ and $\gamma^{(t)}_b$ must agree. If $\gamma^{(t-1)}_a \neq \gamma^{(t-1)}_b$, one of the bits must be flipped during the transition from $\gamma^{(t-1)}$ to $\gamma^{(t)}$. If the values at positions a and b already agreed at time step t − 1, they must remain unchanged from time step t − 1 to time step t. Moreover, the bit values at the other n − 2 positions must also remain unchanged from time step t − 1 to time step t.
For trajectories that obey these rules, each bit flip reduces the weight by a factor of 2/5 for qubits, or $q/(q^2+1)$ for general local dimension q, so that $\mathrm{weight}(\gamma) = (q/(q^2+1))^{\#\text{flips}}$. Thus, the most significant terms in the weighted sum are the terms with the fewest bit flips along the trajectory. The expression for Z as a weighted sum can alternatively be interpreted as a partition function for an Ising-like classical statistical mechanical model, since it is a weighted sum over "spin" configurations for spins with two possible values, or as the expectation of a certain quantity over a simple Markov chain that generates the sequence $(\gamma^{(0)}, \ldots, \gamma^{(s)})$. We take the latter approach in our application of the method to prove our upper and lower bounds. See Figure 2 for an example of two trajectories for a simple RQC, along with a calculation of their weight.
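The rules above can be stated as a function of a single trajectory. In the following sketch, the function name, the encoding of configurations as strings of 'I'/'S' characters, and the 0-indexed qudits are our own choices. A trajectory that violates a rule gets weight 0; otherwise each bit flip contributes a factor q/(q² + 1).

```python
def trajectory_weight(trajectory, gates, q):
    """Weight of a trajectory (gamma^(0), ..., gamma^(s)) under the rules above.

    trajectory: list of s+1 configurations, each a string over {'I', 'S'}
    gates: list of s pairs (a, b) of 0-indexed qudits
    """
    if len(trajectory) != len(gates) + 1:
        raise ValueError("need exactly one more configuration than gates")
    flip = q / (q**2 + 1)
    weight = 1.0
    for (a, b), before, after in zip(gates, trajectory, trajectory[1:]):
        # qudits outside the gate must keep their values
        if any(before[c] != after[c] for c in range(len(before)) if c not in (a, b)):
            return 0.0
        # the two output values on the gate must agree
        if after[a] != after[b]:
            return 0.0
        if before[a] == before[b]:
            # agreeing inputs: nothing may change
            if after[a] != before[a]:
                return 0.0
        else:
            # disagreeing inputs: agreement of the outputs forces exactly one flip
            weight *= flip
    return weight
```

For the five-gate diagram of Figure 2 (written 0-indexed as (0, 1), (1, 2), (0, 1), (2, 3), (1, 2)), the trajectory that starts at SIII and flips qudit 0 at the first gate reaches the fixed point IIII immediately and has weight 2/5.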
The correspondence given in Eq. (23) is powerful because we have a good sense of what to expect from the weighted sum over trajectories, and we can draw conclusions that were not obvious from the definition of the collision probability itself. For example, we can straightforwardly analyze the infinite circuit-size limit. In this limit, each positive-weight trajectory γ will be forced to keep flipping bits (each time a two-qudit gate acts on a disagreeing pair of bits) until it reaches a fixed point, either $I^n$ or $S^n$, after which bits can no longer be flipped since all the bits agree. Let Q(x) be the total weight of all trajectories that begin at a configuration with x S assignments and n − x I assignments. At some point in the circuit, a disagreeing pair of bits will be acted upon by a gate, and one of the bits must flip, sending the number of S assignments either to x − 1 or x + 1 and reducing the weight by a factor $q/(q^2+1)$. Since there are an infinite number of gates, the following recursion relation must be obeyed:

$$Q(x) = \frac{q}{q^2+1}\big[Q(x-1) + Q(x+1)\big], \qquad 0 < x < n, \qquad (24)$$

which, by imposing the boundary conditions $Q(0) = Q(n) = 1$, yields

$$Q(x) = \frac{q^{x} + q^{n-x}}{q^{n}+1}. \qquad (25)$$

Moreover, for each x, there are $\binom{n}{x}$ configurations each contributing weight Q(x), so

$$Z = (q+1)^{-n}\sum_{x=0}^{n}\binom{n}{x}Q(x) = \frac{2}{q^n+1} = Z_H,$$

reproducing the value $Z_H$ that would be obtained if the random quantum circuit were one large $q^n \times q^n$ Haar-random transformation instead of a series of $q^2 \times q^2$ two-qudit gates. (The fact that a $q^n \times q^n$ Haar-random transformation yields $Z_H$ is a direct consequence of Eq. (22) with the substitution $q \to q^n$.) This conclusion makes sense, since a random circuit with an infinite number of 2-local Haar-random gates should enact a global Haar-random transformation.
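The recursion (24) can also be solved numerically as an (n + 1) × (n + 1) linear system and compared with the closed form (25). The sketch below (helper names are hypothetical) does this and then evaluates the sum over x, which should reproduce $Z_H = 2/(q^n+1)$ up to floating-point error.

```python
import numpy as np
from math import comb

def solve_fixed_point_weights(n, q):
    """Solve Eq. (24) with Q(0) = Q(n) = 1 as a linear system."""
    c = q / (q**2 + 1)
    a = np.eye(n + 1)
    rhs = np.zeros(n + 1)
    rhs[0] = rhs[n] = 1.0  # boundary conditions at the fixed points I^n and S^n
    for x in range(1, n):
        # row x encodes Q(x) - c * Q(x - 1) - c * Q(x + 1) = 0
        a[x, x - 1] = -c
        a[x, x + 1] = -c
    return np.linalg.solve(a, rhs)

def infinite_depth_collision_probability(n, q):
    """(q+1)^{-n} * sum_x binom(n, x) Q(x): the s -> infinity limit of Eq. (23)."""
    weights = solve_fixed_point_weights(n, q)
    total = sum(comb(n, x) * weights[x] for x in range(n + 1))
    return total / float(q + 1) ** n
```

The closed form is easy to confirm by hand as well: both $q^{x}$ and $q^{n-x}$ satisfy Eq. (24) because $q(q^{-1} + q)/(q^2+1) = 1$.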
When the circuit size is a finite number s, we have $Z > Z_H$, corresponding to the fact that many trajectories have not yet reached a fixed point and are overweighted compared to their contribution to $Z_H$. As the circuit size increases, more of the trajectories get closer to the fixed point and Z approaches $Z_H$. The point at which anti-concentration is achieved is intimately connected with the point at which most of the weight can be accounted for by trajectories that have reached a fixed point. A depiction of this process at n = 60 is given in Figure 3.
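For small n, the finite-s sum in Eq. (23) can be evaluated exactly for a fixed circuit diagram by propagating a vector of total trajectory weights over all $2^n$ configurations, one gate at a time, instead of enumerating trajectories individually. The sketch below is an illustrative brute-force evaluation of this kind (exponential in n, so only for small examples). Encoding configurations as n-bit integers with bit value 1 meaning S, and using 0-indexed qudits, are our own conventions.

```python
import numpy as np

def collision_probability_from_trajectories(n, q, gates):
    """Evaluate Eq. (23) for a fixed circuit diagram by propagating trajectory weights.

    Configurations are n-bit integers: bit c = 0 means I on qudit c, bit c = 1 means S.
    gates: sequence of pairs (a, b) of 0-indexed qudits.
    """
    flip = q / (q**2 + 1)
    weights = np.ones(2**n)  # every initial configuration gamma^(0) starts with weight 1
    for a, b in gates:
        new = np.zeros_like(weights)
        for config, w in enumerate(weights):
            if w == 0.0:
                continue
            if ((config >> a) & 1) == ((config >> b) & 1):
                new[config] += w  # agreeing pair: configuration stays unchanged
            else:
                new[config ^ (1 << a)] += flip * w  # flip qudit a to match b
                new[config ^ (1 << b)] += flip * w  # flip qudit b to match a
        weights = new
    return weights.sum() / float(q + 1) ** n
```

For example, `collision_probability_from_trajectories(4, 2, [(0, 1), (1, 2), (0, 1), (2, 3), (1, 2)])` evaluates the five-gate diagram of Figure 2. Because a disagreeing pair passes on a total fraction $2q/(q^2+1) \le 1$ of its weight, appending a gate never increases the result, consistent with Z decreasing toward $Z_H$.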
Our quantitative challenge is to understand, for a certain RQC architecture, how quickly these trajectories approach the fixed points, and consequently how quickly Z approaches $Z_H$, as the circuit size increases. Recall that we define the anti-concentration size $s_{AC}$ to be the circuit size (as a function of the number of qudits n) needed for Z to be only a constant factor larger than $Z_H$. Perhaps surprisingly, we find in multiple architectures that $s_{AC} = \Theta(n\log(n))$, corresponding to only Θ(log(n)) gates per qudit. We can explain this observation heuristically by generating trajectories γ at random with probability proportional to weight(γ) (in the statistical mechanical interpretation, this corresponds to drawing samples from the thermal distribution). For typical trajectories generated in this fashion, each additional layer of Θ(n) gates will cause the trajectory to move a constant fraction of the way closer to terminating at a fixed point. Since trajectories typically begin on the order of n bit flips away from the fixed point (i.e., the initial configuration typically has Θ(n) I assignments and Θ(n) S assignments), Ω(log(n)) layers are necessary and sufficient for typical trajectories to get within a constant distance from the fixed point.

FIG. 2. Two example trajectories for a quantum circuit diagram with n = 4 qubits and s = 5 gates. Each gate displayed is chosen randomly from the Haar measure over single- or two-qubit unitaries. The collision probability Z is expressed as a weighted sum over trajectories $\gamma = (\gamma^{(0)}, \ldots, \gamma^{(s)})$, which are length-(s + 1) sequences of assignments ("configurations") of I or S to each of the n qudits. When the input bits to a gate are assigned opposite values, one must be switched at the next configuration in the sequence. These bit flips happen at gates 1, 2, 3, and 5 in the first example, and at gates 1 and 2 in the second example. Each bit flip results in a reduction of the weight by a factor 2/5 (when q = 2). In the second example, the trajectory reaches one of the fixed-point configurations where all n values agree; this is not the case in the first example. Trajectories that quickly reach a fixed point generally have larger weights and make up most of the contribution to the collision probability.
This heuristic statement is perhaps confirmed most clearly in the complete-graph architecture, where qudit pairs are chosen uniformly at random. Here let x ≪ n, and suppose the current configuration at time step t has value S at x of the n positions and value I at the other n − x positions. If we perform gates on n/2 random pairs of qudits, we expect roughly x of those pairs to couple an I value with an S value. Each time this happens, a bit must be flipped and there is an opportunity for the trajectory to move closer to the fixed point $I^n$. Thus, we expect the number of S values in the configuration at time step t + n/2 to have decreased by an amount proportional to x. After Θ(n log(n)) gates, we expect the trajectory to be at (or very close to) the fixed point $I^n$ with high probability. Fewer gates would leave most trajectories too far from the fixed point for anti-concentration to have been reached. In Figure 3, we illustrate the convergence of typical trajectories and the corresponding convergence of Z for the complete-graph architecture at n = 60.
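For the complete-graph architecture, the average over the random gate choices can be taken gate by gate, and by symmetry only the number x of S assignments matters: a uniformly random pair couples an I value with an S value with probability $x(n-x)/\binom{n}{2}$, in which case weight $q/(q^2+1)$ is passed to both x − 1 and x + 1. This gives an (n + 1)-state recursion for the expected collision probability. The sketch below (hypothetical function name) is one way to carry out such an efficient numerical calculation; we have not checked that it reproduces the exact numbers shown in Figure 3.

```python
import numpy as np
from math import comb

def expected_collision_complete_graph(n, q, s_max):
    """E[Z] after s = 0, 1, ..., s_max gates of the complete-graph architecture.

    v[x] holds the total weight of trajectories whose current configuration has
    x S assignments, averaged over the uniformly random choice of gate pairs.
    """
    c = q / (q**2 + 1)
    pairs = comb(n, 2)
    norm = float(q + 1) ** n
    v = np.array([float(comb(n, x)) for x in range(n + 1)])  # all 2^n configurations, weight 1
    history = [v.sum() / norm]
    for _ in range(s_max):
        new = np.zeros_like(v)
        for x in range(n + 1):
            p = x * (n - x) / pairs  # probability the gate hits an (I, S) pair
            new[x] += (1.0 - p) * v[x]
            if p > 0:
                new[x - 1] += c * p * v[x]  # the S bit flips to I
                new[x + 1] += c * p * v[x]  # the I bit flips to S
        v = new
        history.append(v.sum() / norm)
    return np.array(history)

if __name__ == "__main__":
    z = expected_collision_complete_graph(60, 2, 400)
    z_h = 2 / (2**60 + 1)
    print(int(np.argmax(z < 2 * z_h)))  # first s with E[Z] below 2 Z_H
```

The caption of Figure 3 reports that the 2Z_H threshold is crossed at s = 214 for n = 60; the printed value above is offered as a way to explore this, not as a verified reproduction.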
We prove that a similar situation occurs even if the gates are arranged in a 1D fashion, and we fully expect that this situation applies for nearly all natural⁴ architectures, including circuits on D-dimensional lattices for D > 1. We formalize this in Conjecture 1. We believe Conjecture 1 firstly because anti-concentration should intuitively only be faster when the circuit becomes more connected, and the 1D architecture is perhaps the least connected a natural architecture can be, as it takes Ω(n²) gates for information to travel across the diameter of the qudit array. Secondly, the above intuitive argument about the convergence of typical trajectories to a fixed point in O(n log(n)) gates should apply to any natural architecture. Specifically, if you choose a configuration with x S assignments at random and apply a layer of Θ(n) two-qudit gates, with high probability you will have formed Θ(x) disagreeing pairs and moved the trajectory a constant fraction of the way to the nearest fixed point. The difficulty in proving Conjecture 1 lies in characterizing what happens in the low-probability event that this is not the case.
Indeed, our rigorous proofs for the 1D and complete-graph architectures have to deal with the fact that it is not sufficient to examine only typical trajectory behavior. In particular, the collective contribution of trajectories at the tails of what is allowed is tricky to bound. Nonetheless, heuristic reasoning about typical trajectory behavior ultimately gives accurate predictions about the collision probability in these cases.
FIG. 3. Thirty trajectories generated randomly for the complete-graph architecture at n = 60. A trajectory γ is chosen with probability proportional to weight(γ) in the s → ∞ limit, and then the number of S assignments (out of 60) is plotted for the first 300 time steps. The trajectories rapidly approach either the fixed point $I^n$ with 0 S assignments or the fixed point $S^n$ with 60 S assignments, but not all have reached the fixed point within 300 time steps. The distance of a typical trajectory from the nearest fixed point decays exponentially with time, with characteristic time scale Θ(n). Thus, it takes Θ(n log(n)) gates for most typical trajectories to have reached the fixed point. Inset: As trajectories approach the fixed points, the collision probability Z (which can be efficiently calculated numerically for the complete-graph architecture) approaches $Z_H$. Anti-concentration is defined as the point where it falls beneath $2Z_H$ (dashed line), which occurs at s = 214 for n = 60.

… it implies anti-concentration in O(n²) gates and conjecture this can be improved to O(n log(n)).

The rigorous bounds are provided in the appendices. In the 1D case, the proof associates each trajectory with a configuration of domain walls on a 2D lattice (of size n × (d + 1), where d is the circuit depth) and bounds their total contribution combinatorially, similar to the method employed in Refs. [2,3,17,32]. The main intuition is that trajectories that have not reached a fixed point must have domain walls that penetrate through the entire depth of the circuit and thus receive weight that decreases exponentially with the depth as $(q/(q^2+1))^d$. Accounting for the total number of possible domain walls of this type, which scales like $n2^d$, one finds that d = O(log(n)) is sufficient for the overall contribution to be small. We use a similar domain wall counting method to produce a tight lower bound.
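As a back-of-the-envelope version of this counting (a heuristic only, not the rigorous bound proved in the appendices), one can ask when the total domain-wall contribution $n2^d(q/(q^2+1))^d$ falls below a tolerance ε, which is a hypothetical parameter introduced only for this estimate:

$$n\,2^{d}\Big(\frac{q}{q^2+1}\Big)^{d} = n\Big(\frac{2q}{q^2+1}\Big)^{d} \le \varepsilon \quad\Longleftrightarrow\quad d \ge \frac{\ln(n/\varepsilon)}{\ln\!\big((q^2+1)/(2q)\big)}.$$

Since $q^2 + 1 - 2q = (q-1)^2 > 0$ for every q ≥ 2, the denominator is positive. For qubits, $(q^2+1)/(2q) = 5/4$ and the requirement reads $d \ge \ln(n/\varepsilon)/\ln(5/4) \approx 4.48\,\ln(n/\varepsilon)$, which grows only logarithmically in n.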
In the complete-graph case, we present a much different and novel approach.For each trajectory γ, we define R[γ] as the "reduced" trajectory that results from removing consecutive duplicates of the same configuration.Note that the weight of γ depends only on the length of R[γ] (the number of bit flips).Long subsequences of consecutive duplicates occur when the randomly chosen gates repeatedly act on pairs of qudits that are already assigned the same value by the configuration, an outcome that becomes more likely as the trajectory approaches a fixed point.For each ψ, we can condition on R[γ] = ψ, and examine the probability distribution over the length of γ (i.e. the number of consecutive duplicates plus the number of bit flips).We use a Chernoff bound to upper bound the probability that the length of γ is greater than a certain quantity.We then use a generalization of the recursive method that yielded Eq. ( 25) to perform the weighted sum over all reduced trajectories ψ.In the appendix, we provide a more detailed proof summary prior to the full proof.
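The reduced trajectory R[γ] is simple to compute: consecutive duplicate configurations are collapsed, and the number of bit flips is the length of R[γ] minus one, since consecutive distinct configurations in a positive-weight trajectory differ by exactly one flip. A minimal sketch, using the same hypothetical string encoding of configurations as elsewhere in our illustrations; it assumes its input already obeys the trajectory rules and does not check them.

```python
from itertools import groupby

def reduce_trajectory(trajectory):
    """R[gamma]: drop consecutive duplicate configurations (keep one per run)."""
    return [config for config, _run in groupby(trajectory)]

def weight_from_reduced(trajectory, q):
    """Weight of a positive-weight trajectory computed from its reduced form only.

    Assumes validity: each pair of consecutive distinct configurations is one bit flip.
    """
    flips = len(reduce_trajectory(trajectory)) - 1
    return (q / (q**2 + 1)) ** flips
```

The number of consecutive duplicates, which the Chernoff-bound argument controls, is then `len(trajectory) - len(reduce_trajectory(trajectory))`.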

VI. OUTLOOK
In a quantum computer, quantum information is ultimately accessed by making measurements of the output state and obtaining samples from the associated output distribution over measurement outcomes.In many applications, it is desirable to choose our quantum computation completely at random, the only constraint being the arrangement of the different gates, and thus it is important to characterize the output distribution over measurement outcomes in random quantum circuits, and how it depends on the underlying circuit architecture.
One feature of the output distribution is that, for very shallow circuits, there are a relatively small number of very "heavy" measurement outcomes that are exponentially more likely than average to be obtained, a fact that inhibits the design of certain classical simulation algorithms, but also in other cases prevents potential proofs that no good simulation algorithms exist.As the circuit gets deeper, the probability mass gradually anti-concentrates and eventually becomes fairly well spread out over all possible measurement outcomes.We have developed a framework to quantitatively understand this situation; we map the anti-concentration process to the equilibration of a simple stochastic process (an alternative interpretation of the stochastic process is the partition function of a statistical mechanical model).The stochastic process allows for effective qualitative reasoning, but also produces sharp quantitative anti-concentration upper and lower bounds.
Both sides of our bounds have meaningful and surprising takeaways. On the one hand, the fact that only O(n log(n)) gates are needed to achieve anti-concentration in geometrically local and non-local architectures contradicts the intuition that anti-concentration should not occur until information has had time to spread across the entire system. In fact, up to a constant factor, the anti-concentration time does not appear to be sensitive to the exact connectivity structure of the circuit. While we only rigorously consider two architectures, our work gives strong evidence that any natural architecture anti-concentrates in O(n log(n)) gates (which typically corresponds to O(log(n)) depth). In cases where anti-concentration is a desirable property, our work gives explicit bounds on how many gates are needed, and the fact that this number is relatively small will come as welcome news in practical situations where the gates are noisy or otherwise costly to implement.
On the other hand, by showing that Ω(n log(n)) gates are necessary for anti-concentration (and computing the optimal constant prefactor in our two specific scenarios), we have cleared up some confusion about very shallow circuits. Increasing the depth causes the anti-concentration process to begin, but our lower bound implies that the phenomenon of very heavy measurement outcomes will remain for any architecture of constant depth. Even the 2D circuits of depth $O(\sqrt{\log(n)})$ (for which the lightcone volume is O(log(n))) considered in Ref. [32] cannot be anti-concentrated, as had been speculated in that work.
We conclude with some other specific open problems inspired by our work.
• We have proved that the anti-concentration size is Θ(n log(n)) for the 1D and complete-graph architectures.We believe this is true for most other natural architectures and formally conjecture in Conjecture 1 that this follows from our "regularly connected" definition.
• A sharp anti-concentration analysis for 2D and higher-dimensional geometrically local architectures would be particularly valuable since, unlike in 1D, Θ(log(n))-depth 2D circuits can perform universal quantum computation (indeed, O(1) depth is sufficient [43]), and 2D circuits form the basis for Google's 2019 quantum computational supremacy experiment [6].
• We suspect the constant prefactor of $(2\log(q^2+1))^{-1}$ in the general lower bound in Theorem 4 could be improved. What is its optimal value? That is, can we show an improved general lower bound and then find an RQC architecture $A_{\mathrm{fast}}$ that has a matching upper bound? This would show that $A_{\mathrm{fast}}$ is the fastest anti-concentrator. A candidate for $A_{\mathrm{fast}}$ is the architecture where each layer of n/2 gates is formed by choosing a random partition of the n qudits into n/2 pairs.
• Are there other problems involving second moment calculations over RQCs where our techniques would produce sharp upper and lower bounds?One such problem could be the 2-design time for RQCs in various architectures.
Here we establish some precise definitions for the terms in this paper. Throughout, we consider systems of n qudits of local Hilbert space dimension q, with basis states $\{|1\rangle, |2\rangle, \ldots, |q\rangle\}$. Loosely speaking, a quantum circuit is a sequence of unitary transformations called gates, which each typically involve only a few of the n qudits, acting on the initial state $|1\rangle^{\otimes n} \equiv |1^n\rangle$. Formally, we let a quantum circuit diagram of circuit size s be specified by a length-s sequence $(A^{(1)}, \ldots, A^{(s)})$ of non-empty subsets of [n] := {1, 2, …, n}, indicating for each gate which qudits participate in that gate. Since we consider circuits consisting only of two-qudit gates, we require $|A^{(t)}| = 2$ for all t. We also assume that the circuit begins with a single-qudit gate on each of the n qudits, without counting these n gates toward the circuit size. This sequence can be turned into a diagram as in Figure 2 (ignoring the overlaid I and S), where the gate sequence is ({1, 2}, {2, 3}, {1, 2}, {3, 4}, {2, 3}). Note that the single-qudit unitaries are each displayed with the symbol U but will not necessarily be the same unitary. The circuit depth d of a circuit diagram is the minimum number of layers of non-overlapping gates needed to implement all s gates in the circuit, or formally, the smallest integer d such that there exists a sequence $0 = s_0 < s_1 < s_2 < \cdots < s_d = s$ where $A^{(t)} \cap A^{(t')} = \emptyset$ whenever $s_j < t < t' \le s_{j+1}$.
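The depth in this definition can be computed greedily: scan the gates in order and open a new layer whenever the next gate shares a qudit with a gate already in the current layer. The greedy choice is optimal here because any contiguous sub-block of a valid layer is itself valid. A minimal sketch (hypothetical function name; gates given as pairs of qudit labels):

```python
def circuit_depth(gates):
    """Fewest contiguous blocks of pairwise-disjoint gates (greedy layering)."""
    depth = 0
    busy = set()  # qudits already used in the current layer
    for gate in gates:
        gate = set(gate)
        if depth == 0 or busy & gate:
            depth += 1  # start a new layer
            busy = set()
        busy |= gate
    return depth
```

For the Figure 2 diagram ({1, 2}, {2, 3}, {1, 2}, {3, 4}, {2, 3}) this gives depth 4, since only {1, 2} and {3, 4} can share a layer.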
Once a circuit diagram has been chosen, a quantum circuit instance is generated by additionally specifying a length-(s + n) sequence of unitary matrices $(U^{(-n)}, \ldots, U^{(-1)}, U^{(1)}, \ldots, U^{(s)})$, where $U^{(-j)}$ is a q × q (single-qudit) matrix for each j = 1, …, n and $U^{(t)}$ is a $q^2 \times q^2$ (two-qudit) matrix for each t = 1, …, s. We denote the global $q^n \times q^n$ unitary operator implemented by the circuit by U, where

$$U = U^{(s)}_{A^{(s)}} \cdots U^{(1)}_{A^{(1)}}\; U^{(-n)}_{\{n\}} \cdots U^{(-1)}_{\{1\}}, \qquad (A1)$$

with $V_X$ indicating the action of the $q^{|X|} \times q^{|X|}$ unitary V on the qudits in subregion X ⊂ [n], tensored with the identity operation on the qudits in the complement of X.
In this work, we always assume that projective computational basis measurements are performed on all n qudits at the end of the circuit. Thus, a quantum circuit instance U has a corresponding classical probability distribution $p_U$ over possible measurement outcomes $x \in [q]^n$, as follows:

$$p_U(x) = |\langle x|U|1^n\rangle|^2.$$

Random quantum circuits will refer to situations when, once a circuit diagram has been fixed, the actual unitary gates $U^{(t)}$ that determine the circuit instance are each chosen randomly and independently from some distribution over the unitary group. In this paper, we always take this distribution to be the Haar measure, but since our techniques rely on calculating expectations over quantities with only two copies of each $U^{(t)}$, our results also apply when the gates are drawn from any 2-design, such as the Clifford group. Note that Google's quantum computational supremacy experiment [6] drew gates from another distribution that is not a 2-design. Heuristically speaking, as long as the distribution lacks any bias or symmetries, we expect properties like anti-concentration to be the same as in the Haar-random case.
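For small n, a single circuit instance and its output distribution $p_U$ can be simulated directly. The statevector sketch below (function names are ours; Haar sampling via QR with phase correction; the basis state |1⟩ stored at array index 0 and 0-indexed qudits are our conventions) applies the initial single-qudit layer and then the two-qudit gates of a diagram. Averaging the instance collision probability over many instances gives a Monte Carlo estimate of the expected collision probability, which for a single two-qubit gate should be close to 2/(q² + 1) = 0.4.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Haar-random dim x dim unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    qmat, r = np.linalg.qr(z)
    return qmat * (np.diag(r) / np.abs(np.diag(r)))

def apply_gate(state, gate, qudits, q):
    """Apply a q^k x q^k matrix to the listed qudits of a state tensor of shape (q,)*n."""
    k = len(qudits)
    g = gate.reshape((q,) * (2 * k))  # axes: (out_1..out_k, in_1..in_k)
    state = np.tensordot(g, state, axes=(list(range(k, 2 * k)), list(qudits)))
    # tensordot puts the gate's output axes first; move them back to the target positions
    return np.moveaxis(state, list(range(k)), list(qudits))

def output_distribution(n, q, gates, rng):
    """p_U(x) = |<x|U|1^n>|^2 for one random instance; gates are 0-indexed pairs."""
    state = np.zeros((q,) * n, dtype=complex)
    state[(0,) * n] = 1.0  # basis state |1> is stored at array index 0
    for j in range(n):  # initial layer of single-qudit Haar-random gates
        state = apply_gate(state, haar_unitary(q, rng), [j], q)
    for a, b in gates:  # two-qudit Haar-random gates in diagram order
        state = apply_gate(state, haar_unitary(q * q, rng), [a, b], q)
    return (np.abs(state) ** 2).reshape(-1)

def instance_collision_probability(p):
    """sum_x p_U(x)^2 for one instance."""
    return float(np.sum(p ** 2))
```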

Random quantum circuit architectures
An architecture for random quantum circuits is simply a procedure for choosing a circuit diagram. Formally, we define it to be a (possibly randomized) classical algorithm that, given parameters n and s, computes a circuit diagram of size s on n qudits. Given an architecture and parameters n and s, we let the expectation of some quantity Q, denoted $\mathbb{E}_U[Q]$, refer to the expectation over the process of first choosing a circuit diagram according to the architecture, and then choosing a circuit instance by randomly generating each gate in the circuit diagram independently from the Haar measure. Next, we define the two architectures that we consider.
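Definition 1 (Complete-graph architecture). Circuit diagrams of size s on n qudits are generated by choosing s gates each uniformly at random from the set of all two-qudit gates, i.e.,

$$\Pr\big[A^{(t)} = \{a, b\}\big] = \binom{n}{2}^{-1} \quad \text{for all } 1 \le a < b \le n,$$

independently for each t = 1, …, s.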
Note that if it could be guaranteed that every qudit would eventually participate in at least one gate, the distribution over circuit instances would be equivalent if we omitted the first layer of n single-qudit gates (defined to be part of every architecture), a fact that follows from the invariance of the Haar measure; the single-qudit gates could be absorbed into the two-qudit Haar-random gates that act directly before or after without changing the distribution over the two-qudit gates.However, in the complete-graph architecture there is a chance that a qudit does not participate in any two-qudit gates, although for sufficiently large circuit size the probability of this vanishes.
Definition 2 (1D architecture). Assume n is even and d := 2s/n is an integer. The circuit diagram of size s on n qudits is generated by alternating between two types of layers of n/2 non-overlapping nearest-neighbor two-qudit gates on a ring. That is, for each t = 1, …, n/2 and each j = 0, …, d − 1, if j is even, then $A^{(t+jn/2)} = \{2t-1, 2t\}$, and if j is odd, then $A^{(t+jn/2)} = \{2t, 2t+1\}$, where index n + 1 is identified with index 1 to enforce periodic boundary conditions.
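Definitions 1 and 2 translate directly into diagram generators. The sketch below (function names are ours; qudits are 1-indexed as in the definitions) samples complete-graph pairs with Python's `random.Random.sample` and builds the 1D brickwork, wrapping the last gate of each odd layer from qudit n around to qudit 1.

```python
import random

def complete_graph_diagram(n, s, rng):
    """Definition 1: each gate acts on a uniformly random unordered pair of qudits."""
    return [tuple(sorted(rng.sample(range(1, n + 1), 2))) for _ in range(s)]

def one_d_diagram(n, s):
    """Definition 2: alternating brickwork layers of n/2 gates on a ring."""
    if n % 2 or (2 * s) % n:
        raise ValueError("need n even and 2s/n an integer")
    d = 2 * s // n
    gates = []
    for j in range(d):
        for t in range(1, n // 2 + 1):
            if j % 2 == 0:
                gates.append((2 * t - 1, 2 * t))
            else:
                gates.append((2 * t, 2 * t % n + 1))  # qudit n + 1 wraps to qudit 1
    return gates
```

For example, `one_d_diagram(4, 4)` returns `[(1, 2), (3, 4), (2, 3), (4, 1)]`, i.e., d = 2 layers.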

Collision probability and anti-concentration
Anti-concentration is a concept that describes a classical probability distribution for which the probability mass is not too concentrated onto a small number of outcomes of the random variable. The uniform distribution is the ultimate anti-concentrated distribution, as the probability mass is allocated evenly over every possible outcome, but we would still like the term anti-concentrated to apply to some non-uniform distributions if the probability mass is fairly well spread over many of the outcomes. There are multiple ways to make this quantitative. For the purposes of this paper, we choose one way, the collision probability, that mirrors previous work on anti-concentration of quantum circuit outputs and suffices for the applications we discuss in the introduction.
Let X be a discrete random variable and let M be the set of outcomes of X. We can form another random variable p, where p is equal to Pr[X = x] for an x chosen uniformly at random from M. Since $\sum_x \Pr[X = x] = 1$, we have E[p] = 1/|M| no matter how X is distributed. We define the collision probability for X to be

$$Z = \sum_{x\in M}\Pr[X = x]^2 = |M|\,\mathbb{E}[p^2],$$

which is the probability that two independent, identically distributed copies of X will be equal to each other (hence the name collision probability). If the distribution over X is the uniform distribution, then the distribution over p is the point distribution on the value $|M|^{-1}$, the collision probability takes its minimal value $Z = |M|^{-1}$, and var(p) = 0. If X is non-uniform but still somewhat anti-concentrated, then p will not always be $|M|^{-1}$, but it will usually be close, and this will be reflected by a collision probability that is greater, but not too much greater, than $|M|^{-1}$. Formally, we make the following definition.
Definition 3 (Anti-concentrated). We say that a random variable X over a set M of outcomes is α-anti-concentrated for 0 < α ≤ 1 if

$$Z = \sum_{x\in M}\Pr[X = x]^2 \le \frac{1}{\alpha\,|M|}.$$

Thus a distribution is 1-anti-concentrated if and only if it is the uniform distribution.
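Definition 3 can be checked mechanically for an explicit distribution. In the sketch below (hypothetical helper names), probabilities are passed as a list over the outcome set M; exact `fractions.Fraction` values avoid rounding at the boundary case.

```python
from fractions import Fraction

def collision_probability(probs):
    """Z = sum_x Pr[X = x]^2."""
    return sum(p * p for p in probs)

def is_alpha_anticoncentrated(probs, alpha):
    """Definition 3: Z <= 1 / (alpha * |M|), for 0 < alpha <= 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return collision_probability(probs) <= Fraction(1) / (alpha * len(probs))
```

For instance, a distribution spread evenly over half of four outcomes has Z = 1/2 and is therefore α-anti-concentrated exactly for α ≤ 1/2, while only the uniform distribution passes with α = 1.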
In our setting, the random variable X is the measurement outcome of a random quantum circuit instance, which is distributed according to the distribution $p_U$ over the outcome set $[q]^n$. Example distributions of $p_U$ for RQC outputs in the uniform, the non-uniform but still anti-concentrated, and the not anti-concentrated case are shown in the caricature in Figure 1. A random quantum circuit architecture for specified n and s is understood as an ensemble over many different U, only some of which will have output distributions $p_U$ that are α-anti-concentrated for a certain choice of α. We would like to say that the architecture as a whole is anti-concentrated if typical circuit instances drawn from the architecture are anti-concentrated, acknowledging that not every instance will be. We also require this to hold for the same constant α as n increases, with s increasing like some function s(n). Formally, we accomplish this by averaging the collision probability over the random circuit instance, as follows.
Definition 4 (Anti-concentrated RQC architecture). We say that a random quantum circuit architecture is α-anti-concentrated for 0 < α ≤ 1 at circuit size s(n) if there exists $n_0$ such that whenever $n \ge n_0$,

$$\mathbb{E}_U\Big[\sum_{x\in[q]^n} p_U(x)^2\Big] \le \alpha^{-1} q^{-n},$$

where $\mathbb{E}_U$ denotes drawing circuit instances according to the architecture over n qudits with circuit size s(n). Generally, we say that the architecture is anti-concentrated at size s(n) if there exists a constant α > 0 independent of n for which it is α-anti-concentrated at that size.
RQC architectures for which every qudit experiences at least one gate, which includes all the architectures introduced above, will have a symmetry over the $q^n$ measurement outcomes in the sense that the quantity $p_U(x)$ is distributed identically (over circuit instances) for every x. In this case each term in the sum in Eq. (A7) will have the same contribution and we can write simply

$$\mathbb{E}_U\Big[\sum_{x\in[q]^n} p_U(x)^2\Big] = q^n\,\mathbb{E}_U\big[p_U(1^n)^2\big].$$

The anti-concentration of an architecture implies that most of the instances drawn from that architecture have good anti-concentration properties: given an architecture at a certain size and a bound on its collision probability $Z \le \alpha^{-1}q^{-n}$, we can use Markov's inequality (applied to the non-negative quantity $\sum_x p_U(x)^2 - q^{-n}$) to assert that at least a 1 − β fraction of instances have collision probability at most $q^{-n}\big(1 + (\alpha^{-1}-1)\beta^{-1}\big)$. In practice, we expect the collision probability of individual instances to be even more clustered near the mean collision probability than this analysis indicates, but proving that this is the case would seem to require computing higher moments like $\mathbb{E}_U\big[(\sum_x p_U(x)^2)^2\big]$. As discussed in the main text, an important implication of an α-anti-concentrated architecture is that for any β with 0 ≤ β ≤ 1 and sufficiently large n,

$$\Pr_{U,\,x}\big[p_U(x) > \beta q^{-n}\big] \ge \alpha(1-\beta)^2,$$

where x is drawn uniformly from $[q]^n$, which follows directly from the Paley-Zygmund inequality. This inequality indicates that whenever an architecture is anti-concentrated, at least a constant fraction of the outcomes will be allocated an amount of mass that is within a constant factor β of the mean mass; it cannot be the case that all but a vanishing fraction of the outcomes are allocated a vanishing fraction of the mean mass.
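The Paley-Zygmund step can be spelled out as follows, treating $p = p_U(x)$ as a random variable over both the circuit instance U and a uniformly random outcome $x \in [q]^n$. This is a sketch of the standard argument, written in the notation above:

$$\begin{aligned}
\mathbb{E}_{U,x}[p] &= q^{-n}\,\mathbb{E}_U\Big[\sum_{x} p_U(x)\Big] = q^{-n},\\
\mathbb{E}_{U,x}[p^2] &= q^{-n}\,\mathbb{E}_U\Big[\sum_{x} p_U(x)^2\Big] \le \alpha^{-1} q^{-2n},\\
\Pr_{U,x}\big[p > \beta q^{-n}\big] &\ge (1-\beta)^2\,\frac{\mathbb{E}_{U,x}[p]^2}{\mathbb{E}_{U,x}[p^2]} \ge (1-\beta)^2\,\frac{q^{-2n}}{\alpha^{-1}q^{-2n}} = \alpha(1-\beta)^2 .
\end{aligned}$$

The second line uses Definition 4, which is why the bound holds only for n ≥ n₀.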
FIG. 4. A diagram depicting the equivalent ways to interpret the expected value of the collision probability for random quantum circuits. Left: a random quantum circuit of size five, whose expected collision probability is the quantity of interest. Middle: the reinterpretation as the partition function of a classical statistical mechanics model with local Ising-like spins. Right: another interpretation as a stochastic process of evolving configurations, i.e., a random walk through configuration space giving a sum over trajectories.
Appendix B: Framework for analysis: Random quantum circuits as a stochastic process This appendix gives more details on the correspondence discussed in Section V from the main text.The key idea in our analysis of the collision probability of RQCs is to perform the Haar expectation over each local unitary individually.This is possible due to explicit formulas for expectations under action by a Haar-random unitary.We use these formulas to re-express the collision probability, originally an integral over many continuously varying unitary matrices drawn from the Haar measure, as a weighted discrete sum, which is then analyzed using combinatorial and stochastic methods.This weighted sum can also be interpreted as the partition function of a classical statistical mechanical Ising-like model or as the expectation value of a simple stochastic process.Figure 4 depicts these equivalent representations of the problem.In this appendix, we explain this method and derive the important formulas that will apply generally for any RQC architecture, which are then used in later sections to prove our main results.

Averaging individual unitaries over the Haar measure
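The quantity of interest for anti-concentration is the expected collision probability, which is proportional to a second moment over the choice of unitary operator U, as illustrated in the following equation, where we recall that $|1^n\rangle\langle 1^n|^{\otimes 2}$ is two copies of the circuit input state:

$$Z = \mathbb{E}_U\Big[\sum_{x\in[q]^n} p_U(x)^2\Big] = \sum_{x\in[q]^n}\operatorname{Tr}\Big[(|x\rangle\langle x|)^{\otimes 2}\,\mathbb{E}_U\big[U^{\otimes 2}\,|1^n\rangle\langle 1^n|^{\otimes 2}\,U^{\dagger\otimes 2}\big]\Big].$$

Moreover, for a fixed quantum circuit diagram, the unitary U is given by Eq. (A1) as a product of single-qudit unitaries $U^{(-j)}$ acting on qudit j for j = 1, …, n and two-qudit unitaries $U^{(t)}$ acting on some pair of qudits $A^{(t)} \subset [n]$ for t = 1, …, s. Each unitary is independently chosen according to the Haar measure, and its expectation can be evaluated separately. Let

$$M^{(t)}[\rho] = \mathbb{E}_{U^{(t)}}\Big[\big(U^{(t)}_{A^{(t)}}\big)^{\otimes 2}\,\rho\,\big(U^{(t)\dagger}_{A^{(t)}}\big)^{\otimes 2}\Big], \qquad M^{(-j)}[\rho] = \mathbb{E}_{U^{(-j)}}\Big[\big(U^{(-j)}_{\{j\}}\big)^{\otimes 2}\,\rho\,\big(U^{(-j)\dagger}_{\{j\}}\big)^{\otimes 2}\Big].$$

Then we can write

$$Z = \sum_{x\in[q]^n}\operatorname{Tr}\Big[(|x\rangle\langle x|)^{\otimes 2}\; M^{(s)}\circ\cdots\circ M^{(1)}\circ M^{(-n)}\circ\cdots\circ M^{(-1)}\big[|1^n\rangle\langle 1^n|^{\otimes 2}\big]\Big].$$

When an architecture is itself a mixture over randomly chosen circuit diagrams, such as the complete-graph architecture, the overall quantity Z is a mixture over terms of the above form. The remainder of this subsection illustrates how the action of $M^{(t)}$ can be evaluated, ultimately allowing us to arrive at the expression for Z given in Eq. (B20). In the other subsections of this section, we explain how that equation can be interpreted as a partition function of a classical statistical mechanical model or as the expectation over a simple stochastic process.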
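When the local unitaries are drawn from the Haar measure (or any exact 2-design), the expression $M^{(t)}[\rho]$ can be evaluated in a simple way. Generally, for σ a $q^2 \times q^2$ Hermitian operator, and with V chosen from the Haar measure over the set of q × q unitaries, we define

$$M[\sigma] = \mathbb{E}_V\big[V^{\otimes 2}\,\sigma\,V^{\dagger\otimes 2}\big]$$

and observe that, for any unitary W and any σ,

$$W^{\otimes 2}\,M[\sigma]\,W^{\dagger\otimes 2} = \mathbb{E}_V\big[(WV)^{\otimes 2}\,\sigma\,(WV)^{\dagger\otimes 2}\big] = M[\sigma],$$

where the second equality follows from the invariance of the Haar measure under the substitution V → WV. A mathematical fact from Schur-Weyl duality (see Ref. [44]) is that any operator on k copies of a system that commutes with $W^{\otimes k}$ for every unitary W must be a linear combination of permutation operators over the k systems. Here, we have k = 2, and thus the only permutation operators are the identity operation I and the swap operation S, which can be defined as the operator satisfying $S(|\psi\rangle\otimes|\phi\rangle) = |\phi\rangle\otimes|\psi\rangle$ for any $|\psi\rangle, |\phi\rangle$. Letting $M[\sigma] = \alpha I + \beta S$, and using $\operatorname{Tr}(I) = q^2$, $\operatorname{Tr}(S) = q$, $S^2 = I$, and the fact that S commutes with $V^{\otimes 2}$, we make the following calculations:

$$\operatorname{Tr}(\sigma) = \operatorname{Tr}(M[\sigma]) = \alpha q^2 + \beta q, \qquad \operatorname{Tr}(\sigma S) = \operatorname{Tr}(M[\sigma]\,S) = \alpha q + \beta q^2,$$

which determine α and β and allow us to write

$$M[\sigma] = \frac{\operatorname{Tr}(\sigma) - q^{-1}\operatorname{Tr}(\sigma S)}{q^2-1}\, I + \frac{\operatorname{Tr}(\sigma S) - q^{-1}\operatorname{Tr}(\sigma)}{q^2-1}\, S. \qquad (B9)$$

The unitaries $U^{(-j)}$ are q × q (single-qudit) and act on qudit j. The two-copy input state on qudit j is $(|1\rangle\langle 1|)^{\otimes 2}_{\{j\}}$. Denote two copies of the input state on the other n − 1 qudits by $\rho_{[n]\setminus\{j\}}$. Using Eq. (B9), we then find that

$$M^{(-j)}\big[\rho_{[n]\setminus\{j\}}\otimes(|1\rangle\langle 1|)^{\otimes 2}_{\{j\}}\big] = \rho_{[n]\setminus\{j\}}\otimes\frac{I_{\{j\}} + S_{\{j\}}}{q(q+1)},$$

meaning that $M^{(-j)}$ simply replaces the state on qudit j with a uniform sum over the operators I and S. Hence

$$M^{(-n)}\circ\cdots\circ M^{(-1)}\big[|1^n\rangle\langle 1^n|^{\otimes 2}\big] = \frac{1}{(q(q+1))^{n}}\sum_{\gamma\in\{I,S\}^n}\gamma_1\otimes\cdots\otimes\gamma_n.$$

We call each $\gamma \in \{I, S\}^n$ a configuration. The above equation states that the expected value of two copies of the state after application of all the single-qudit unitaries is precisely a uniform sum over all identity/swap configurations of the n sites. Now, we need to examine the action of $M^{(t)}$ for t > 0.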
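In this case, the unitaries are q^2 × q^2 and act on the qudit pair A^(t). We can use Eq. (B9) by replacing q → q^2 and sending I → I ⊗ I, the identity operation on two copies of two qudits, and S → S ⊗ S, the swap operation on two copies of two qudits. Assuming that the input state is a product state ρ_{[n]∖A^(t)} ⊗ ρ_{A^(t)}, we see that
\[ M^{(t)}\big[\rho_{[n]\setminus A^{(t)}} \otimes \rho_{A^{(t)}}\big] = \rho_{[n]\setminus A^{(t)}} \otimes \left( \frac{\operatorname{tr}(\rho_{A^{(t)}}) - q^{-2}\operatorname{tr}((S\otimes S)\rho_{A^{(t)}})}{q^4 - 1}\, I\otimes I + \frac{\operatorname{tr}((S\otimes S)\rho_{A^{(t)}}) - q^{-2}\operatorname{tr}(\rho_{A^{(t)}})}{q^4 - 1}\, S\otimes S \right). \]
Since the two-qudit gates act after the single-qudit gates, the input state to M^(t) is always a sum of tensor products of I and S, so we only need to evaluate the above expression when ρ_{A^(t)} is I ⊗ I, I ⊗ S, S ⊗ I, or S ⊗ S. Doing so, we arrive at
\[ I\otimes I \mapsto I\otimes I, \qquad S\otimes S \mapsto S\otimes S, \qquad I\otimes S \mapsto \frac{q}{q^2+1}\,(I\otimes I + S\otimes S), \qquad S\otimes I \mapsto \frac{q}{q^2+1}\,(I\otimes I + S\otimes S). \tag{B14} \]
Thus, if ρ is a linear combination of configurations in {I, S}^n, M^(t)[ρ] is also a linear combination of configurations, with coefficients that transform linearly under application of M^(t). For configurations γ, ν ∈ {I, S}^n, we let M^(t)_{νγ} be the matrix element of this linear transformation, defined such that
\[ M^{(t)}[\gamma] = \sum_{\nu \in \{I,S\}^n} M^{(t)}_{\nu\gamma}\, \nu. \tag{B15} \]
Suppose that U^(t) acts on qudits
\[ A^{(t)} = \{a, b\}. \tag{B16} \]
Then from Eqs. (B14), (B15), and (B16), we have
\[ M^{(t)}_{\nu\gamma} = \begin{cases} 1 & \text{if } \gamma_a = \gamma_b \text{ and } \nu = \gamma, \\ \frac{q}{q^2+1} & \text{if } \gamma_a \neq \gamma_b,\ \nu_a = \nu_b, \text{ and } \nu_c = \gamma_c \text{ for all } c \notin \{a, b\}, \\ 0 & \text{otherwise.} \end{cases} \]
In particular, M^(t)_{νγ} is always non-negative. The way to think about the above equation is to notice three things. First, the input configuration γ and the output configuration ν must agree on all indices that are not involved in the gate, i.e., ν_c = γ_c for all c ∉ {a, b}; otherwise the matrix element is 0. Second, if the two input values involved in the gate agree, i.e., if γ_a = γ_b, then ν_a = ν_b = γ_a = γ_b must hold (in which case the matrix element is 1); otherwise it is 0. Third, if the two input values disagree, then one of them must be flipped so that the two output values agree (in which case the matrix element is reduced to q/(q^2 + 1)); otherwise it is 0.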
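As a sanity check on Eq. (B9) and on the three rules for M^(t)_{νγ}, the following Python sketch solves the two trace conditions for α and β in exact rational arithmetic and encodes the matrix-element rules. The function names are hypothetical, and representing configurations as tuples of "I"/"S" with 0-indexed sites is our own convention, not the paper's.

```python
from fractions import Fraction


def twirl_coefficients(tr_sigma, tr_swap_sigma, d):
    """Return (alpha, beta) with E_V[V^(x2) sigma V^dag(x2)] = alpha*I + beta*S
    for Haar-random V of dimension d, by solving
    tr(sigma) = alpha*d^2 + beta*d and tr(S sigma) = alpha*d + beta*d^2."""
    d = Fraction(d)
    alpha = (tr_sigma - tr_swap_sigma / d) / (d**2 - 1)
    beta = (tr_swap_sigma - tr_sigma / d) / (d**2 - 1)
    return alpha, beta


def matrix_element(nu, gamma, a, b, q):
    """Hypothetical helper: M^(t)_{nu, gamma} for a gate on sites a and b.

    nu and gamma are tuples of "I"/"S" with 0-indexed sites.
    """
    # Rule 1: sites outside the gate must be unchanged.
    if any(nu[c] != gamma[c] for c in range(len(gamma)) if c not in (a, b)):
        return Fraction(0)
    # In every allowed case, the two output values on the gate agree.
    if nu[a] != nu[b]:
        return Fraction(0)
    # Rule 2: agreeing inputs pass through unchanged with weight 1.
    if gamma[a] == gamma[b]:
        return Fraction(1) if nu[a] == gamma[a] else Fraction(0)
    # Rule 3: disagreeing inputs, one value flipped, weight q/(q^2 + 1).
    return Fraction(q, q**2 + 1)
```

For example, with q = 2 the single-qudit input |1⟩⟨1|^{⊗2} has both traces equal to 1 and gives α = β = 1/(q(q + 1)) = 1/6, while the two-qudit case (dimension q^2 = 4) applied to I ⊗ S, whose traces are tr(I ⊗ S) = tr((S ⊗ S)(I ⊗ S)) = q^3 = 8, gives α = β = q/(q^2 + 1) = 2/5, in agreement with Eq. (B14).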
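Note also that
\[ \operatorname{tr}\Big[\Big(\sum_{x \in [q]^n} |x\rangle\langle x|^{\otimes 2}\Big)\, \gamma_1 \otimes \cdots \otimes \gamma_n\Big] = q^n \]
for all γ ∈ {I, S}^n, since tr(|x_j⟩⟨x_j|^{⊗2} I) = tr(|x_j⟩⟨x_j|^{⊗2} S) = 1 for each of the q values of x_j. Thus, from Eq. (B2), we find
\[ Z = \frac{1}{(q+1)^n} \sum_{\gamma^{(0)}, \dots, \gamma^{(s)} \in \{I,S\}^n}\ \prod_{t=1}^{s} M^{(t)}_{\gamma^{(t)}\gamma^{(t-1)}}, \tag{B20} \]
which is the expression quoted in Eq. (23) of the main text. In the above equation, the sum is over length-(s + 1) sequences of configurations, which we call trajectories γ = (γ^(0), ..., γ^(s)), and the weight of each term is given by the product of the matrix elements for each step in the trajectory. This final equation is depicted graphically in the right-hand part of Figure 4.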
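For a handful of qudits, Eq. (B20) can be evaluated exactly by propagating trajectory weights gate by gate instead of enumerating all length-(s + 1) trajectories. The sketch below is illustrative only: the function name is hypothetical and gates are given as 0-indexed pairs. Two easy checks are that with no two-qudit gates it returns (2/(q + 1))^n, and that a single gate on n = 2 qudits returns the Haar value Z_H = 2/(q^2 + 1), since one Haar-random two-qudit gate is itself Haar random on both qudits.

```python
from fractions import Fraction
from itertools import product


def collision_probability(n, q, gates):
    """Exact Z = (q+1)^{-n} * sum over trajectories of prod_t M^(t)
    (Eq. (B20)), computed by propagating summed weights per configuration.
    `gates` is a list of 0-indexed site pairs (a, b)."""
    # Every initial configuration carries weight 1; (q+1)^{-n} is applied last.
    weights = {cfg: Fraction(1) for cfg in product("IS", repeat=n)}
    flip_weight = Fraction(q, q**2 + 1)
    for a, b in gates:
        new = {}
        for cfg, w in weights.items():
            if cfg[a] == cfg[b]:
                targets = [(cfg, w)]  # agreeing pair: unchanged, weight 1
            else:
                targets = []
                for v in "IS":  # flip one site so that both equal v
                    out = list(cfg)
                    out[a] = out[b] = v
                    targets.append((tuple(out), w * flip_weight))
            for out, wt in targets:
                new[out] = new.get(out, Fraction(0)) + wt
        weights = new
    return sum(weights.values()) / Fraction(q + 1) ** n
```

Because the weights are summed per configuration after each gate, the cost is O(2^n) per gate rather than exponential in s.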

Collision probability as statistical mechanical partition function
The expression for the collision probability in Eq. (B20) can be interpreted as a partition function for a classical statistical mechanical model by thinking of each γ_j^(t) as an Ising spin variable with the association {I, S} ↔ {+1, −1}. A trajectory γ is then a configuration of the Ising spins, and Z is a weighted sum over all the spin configurations. Moreover, the weight is always non-negative and is given by a product of factors M^(t)_{γ^(t) γ^(t−1)}, each of which can be determined by examining a small number of the spin values. This means that the energy functional over spin configurations of the classical Ising-like model is always real and can be broken up into local terms that depend on the local dimension q and on which qudits are acted upon at each step in the circuit.
The statistical mechanics interpretation has been useful for similar problems in the past, where certain RQC moment quantities can be exactly rewritten as the partition function of a lattice spin model, as depicted in the central diagram in Figure 4. We can arrive at the formulation in Eq. (B20) from the lattice model by summing over a subset of the spins and reinterpreting the resulting nodes as 4-body interaction vertices. This exact rewriting of RQC moment quantities has been used to compute, for instance, correlation functions [3], Rényi entropies [38], and the distance to forming an approximate design [17]. Moreover, thermal phase transitions in the classical model can be related to phase transitions of entanglement-entropy-like quantities for the output state of the RQC [39][40][41]. The interpretation is particularly intriguing when considering analogues of Z for higher moments. The collision probability is a second-moment quantity, and the resulting statistical mechanical model has Ising-like variables with two possible values. Quantities related to the kth moment map to classical statistical mechanical models whose variables take k! possible values, one for each element of the symmetric group S_k. However, one challenge in computing higher-moment quantities is that the weights in the partition function can be negative (corresponding to non-real values of the energy for certain spin configurations), complicating many strategies for bounding its behavior, including the strategies employed in the rest of this paper.

Unbiased random walk
We can build on the formula for Z in Eq. (B20) and re-express it in terms of a length-s unbiased random walk through configuration space {I, S}^n, which we denote P_u. At time step 0, a configuration γ^(0) is chosen uniformly at random, i.e., the initial distribution is the uniform distribution on configuration space, denoted Λ_u. Then the configuration γ^(t+1) at time step t + 1 is generated from the configuration γ^(t) at time step t as follows: letting A^(t+1) = {a, b}, if the ath and bth bits of γ^(t) agree, then the configuration is left unchanged at time step t + 1; if they disagree, either the value at a or the value at b is flipped, each with probability 1/2, to form γ^(t+1). The weight is reduced each time a bit is flipped. Thus we can write
\[ Z = \left(\frac{2}{q+1}\right)^n \mathbb{E}_{P_u,\Lambda_u}\left[\left(\frac{2q}{q^2+1}\right)^{\#\text{ of bit flips during walk}}\right], \tag{B22} \]
where E_{P_u,Λ_u} denotes the expectation over choosing a length-s walk as described above with initial distribution Λ_u. This is equivalent to Eq. (B20) because the probability of a given trajectory occurring is 2^{−n}(1/2)^{# of bit flips}, while its weight in Eq. (B20) is (q + 1)^{−n}(q/(q^2 + 1))^{# of bit flips}; the ratio of the two is exactly the quantity averaged above, so each trajectory contributes exactly the same amount toward Z once the probability of observing the trajectory is accounted for.
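The unbiased-walk form of Z suggests a direct Monte Carlo estimator: sample walks, count bit flips, and average (2q/(q^2 + 1))^{#flips}. The sketch below is a minimal illustration under our own choices (hypothetical function name, 0-indexed gate pairs, a seeded `random.Random` for reproducibility), not the paper's method of analysis.

```python
import random


def z_unbiased_walk(n, q, gates, samples=100_000, seed=0):
    """Monte Carlo estimate of Z = (2/(q+1))^n * E[(2q/(q^2+1))^{#flips}]
    under the unbiased walk P_u started from the uniform distribution."""
    rng = random.Random(seed)
    factor = 2 * q / (q**2 + 1)
    total = 0.0
    for _ in range(samples):
        cfg = [rng.choice("IS") for _ in range(n)]  # gamma^(0) ~ Lambda_u
        flips = 0
        for a, b in gates:
            if cfg[a] != cfg[b]:
                # Flip the value at a or at b, each with probability 1/2.
                if rng.random() < 0.5:
                    cfg[a] = cfg[b]
                else:
                    cfg[b] = cfg[a]
                flips += 1
        total += factor**flips
    return (2 / (q + 1)) ** n * total / samples
```

For n = 2 and a single gate, the estimate should be close to the exact Haar value 2/(q^2 + 1), e.g. 0.2 for q = 3; the statistical error shrinks like samples^{−1/2}.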

Biased random walk
A potential problem with the unbiased random walk picture is that the weight of a particular walk depends on the number of bit flips that occur during that walk; it depends not only on where the walk begins and ends but also on how it got there. To fix this issue, we can form an equivalent biased random walk, denoted P_b. In this case, the initial distribution Λ_b is not uniform over {I, S}^n; rather, the probability of choosing γ^(0) = ν is proportional to q^{−|ν|}, where |ν| is the Hamming weight of ν (the number of S entries). Specifically, we have
\[ \Lambda_b(\nu) = \frac{q^{-|\nu|}}{(1 + q^{-1})^n} = q^{-|\nu|}\left(\frac{q}{q+1}\right)^n. \]
The dynamics of P_b are the same as those of P_u, except that when the two bits involved in a gate disagree, the walk flips the S to I with probability q^2/(q^2 + 1) and the I to S with probability 1/(q^2 + 1). Thus, it is biased in the I direction. Then we can express
\[ Z = q^{-n}\, \mathbb{E}_{P_b,\Lambda_b}\left[q^{|\gamma^{(s)}|}\right]. \tag{B24} \]
Note that the quantity being averaged is exponentially large in the Hamming weight of the walk's final configuration, making Z sensitive to the probability that the biased walk stays far from the all-I configuration. The biased walk is seen to be equivalent to the unbiased walk by noting that, once the probability of observing a given trajectory is included, every trajectory contributes the same amount to Z for both walks. Concretely, the ratio of a trajectory's weight in Eq. (B20) to its probability under P_b is q^{|γ^(0)|−n} from the initial step, q^{−1} for each S → I flip, and q for each I → S flip, which telescopes to q^{|γ^(s)|−n}. In this way, the exponential weighting inside the expectation exactly cancels the bias in the probability of observing a given walk.
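The biased-walk expression can be checked exactly on small systems by propagating the walk's probability distribution rather than sampling it. This sketch assumes the normalizations Λ_b(ν) = q^{−|ν|}(q/(q + 1))^n and Z = q^{−n} E[q^{|γ^(s)|}], which we obtained by matching each trajectory's weight to its probability under P_b; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def z_biased_walk(n, q, gates):
    """Exact q^{-n} E_{P_b, Lambda_b}[q^{|gamma^(s)|}], propagating the
    biased walk's distribution over {I,S}^n; gates are 0-indexed pairs."""
    q = Fraction(q)
    # Lambda_b(nu) is proportional to q^{-|nu|}; the normalizer is (1 + 1/q)^n.
    dist = {cfg: q ** (-cfg.count("S")) / (1 + 1 / q) ** n
            for cfg in product("IS", repeat=n)}
    p_down = q**2 / (q**2 + 1)  # probability the disagreeing pair becomes I, I
    for a, b in gates:
        new = {}
        for cfg, p in dist.items():
            if cfg[a] == cfg[b]:
                outs = [(cfg, p)]
            else:
                outs = []
                for v, pv in (("I", p_down), ("S", 1 - p_down)):
                    out = list(cfg)
                    out[a] = out[b] = v
                    outs.append((tuple(out), p * pv))
            for out, pr in outs:
                new[out] = new.get(out, Fraction(0)) + pr
        dist = new
    return sum(p * q ** cfg.count("S") for cfg, p in dist.items()) / q**n
```

For n = 2, q = 2, and one gate this returns exactly 2/5 = 2/(q^2 + 1), the same value the unbiased description gives, illustrating the equivalence of the two walks.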

Computing sums over trajectories
Throughout our analysis, we will need to compute weighted sums over various trajectories, or, relatedly, compute probabilities that the biased and unbiased walks end in a certain place.We use the following lemma.The key takeaway is that (perhaps surprisingly), in the limit of infinite size, the contribution of all trajectories originating from a certain initial configuration depends only on the Hamming weight of that initial configuration, and not the configuration itself.Moreover, this contribution can be calculated.This lemma is a more precise and generalized version of the recursive calculation of Q(x) in Section V in the main text.
Proof. First, we claim that the sum depends only on x, y, and m, and not on γ^(0) (other than through its Hamming weight x). To see this, note that there is a one-to-one correspondence between trajectories in T and sequences of Hamming weights (x, x_1, x_2, ..., y) in which consecutive entries differ by exactly 1, i.e., either x_t = x_{t+1} + 1 or x_t = x_{t+1} − 1 for every t (no consecutive duplicates). This follows from (1) the fact that, given a trajectory in T, one can generate such a sequence by taking the Hamming weight of each configuration in the trajectory and removing consecutive duplicates, and (2) the fact that, given such a Hamming weight sequence, one can generate a unique trajectory by starting with γ^(0), evolving the trajectory according to the circuit diagram A, and always choosing whether to flip I to S or S to I so that the prescribed order of Hamming weights is followed. Thus, the sum over trajectories in T may be replaced by a sum over Hamming weight sequences, which does not depend on γ^(0) except through its Hamming weight x.
For each x in the interval [y, y + m], let the expression on the left-hand side of the lemma be denoted Q(x). Then for each x in [y + 1, y + m − 1], we have the recursion relation
\[ Q(x) = \frac{q}{q^2+1}\big(Q(x-1) + Q(x+1)\big), \]
since the first bit flip sends x either to x − 1 or to x + 1, and in either case a factor of q/(q^2 + 1) is incurred. The characteristic equation r^2 − ((q^2 + 1)/q) r + 1 = 0 has roots q and q^{−1}, so the recursion relation gives rise to a general solution of the form
\[ Q(x) = F q^{x-y} + G q^{-(x-y)} \]
for some constants F and G. This is a unique solution, since all values can be generated once two consecutive values are specified, and the specification of two consecutive values also uniquely specifies F and G. To find F and G in this case, we impose the boundary conditions Q(y) = 1 and Q(y + m) = 0, since if x = y, the only trajectory in T is the length-0 trajectory (γ^(0)), and if x = y + m, T is the empty set. Solving for F and G under these boundary conditions verifies the statement of the lemma.
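Imposing Q(y) = 1 and Q(y + m) = 0 on F q^{x−y} + G q^{−(x−y)} gives F = −q^{−2m}/(1 − q^{−2m}) and G = 1/(1 − q^{−2m}), i.e., Q(x) = (q^{−(x−y)} − q^{(x−y)−2m})/(1 − q^{−2m}). We derived this closed form from the recursion and boundary conditions stated in the proof; it is consistent with Corollary 1, whose formula is q^{x−y} Q(x). The Python check below (hypothetical helper names) verifies exactly, with rational arithmetic, that the closed form satisfies both the boundary conditions and the recursion.

```python
from fractions import Fraction


def Q(x, y, m, q):
    """Closed form (q^{-(x-y)} - q^{(x-y)-2m}) / (1 - q^{-2m}), obtained by
    fitting F q^{x-y} + G q^{-(x-y)} to Q(y) = 1 and Q(y + m) = 0."""
    q = Fraction(q)
    u = x - y
    return (q ** (-u) - q ** (u - 2 * m)) / (1 - q ** (-2 * m))


def satisfies_recursion(y, m, q):
    """Check the boundary values and Q(x) = q/(q^2+1) (Q(x-1) + Q(x+1))."""
    step = Fraction(q, q**2 + 1)
    if Q(y, y, m, q) != 1 or Q(y + m, y, m, q) != 0:
        return False
    return all(Q(x, y, m, q) == step * (Q(x - 1, y, m, q) + Q(x + 1, y, m, q))
               for x in range(y + 1, y + m))
```

For example, Q(1, 0, 2, 2) = (1/2 − 1/8)/(1 − 1/16) = 2/5, which indeed equals (2/5)(Q(0) + Q(2)) = (2/5)(1 + 0).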
Corollary 1. Fix non-negative integers x, y, m such that y ≤ x < y + m. For the biased walk, if the starting configuration has Hamming weight x, the probability that the walk reaches a configuration with Hamming weight y before it reaches a configuration with Hamming weight y + m is given by
\[ \frac{q^{x-y}}{1 - q^{-2m}}\left(q^{-(x-y)} - q^{-2m+x-y}\right). \tag{B28} \]
Proof. The transition rules of the biased walk prescribe that transitions upward in Hamming weight occur with probability 1/(q^2 + 1), and transitions downward in Hamming weight occur with probability q^2/(q^2 + 1). A series of transitions in which the initial Hamming weight is x, the final Hamming weight is y, and the number of bit flips is b contains (b + x − y)/2 downward and (b − x + y)/2 upward transitions, so its probability is precisely q^{x−y}(q/(q^2 + 1))^b. The sum over all paths weighted by their probability is then precisely the sum on the left-hand side of Lemma 1 scaled by q^{x−y}, yielding the corollary.
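Corollary 1 can also be cross-checked against first-step analysis of the Hamming-weight chain, which only needs the per-flip probabilities q^2/(q^2 + 1) (down) and 1/(q^2 + 1) (up); waiting times between flips do not affect which boundary is hit first. The sketch below iterates the first-step equations numerically; both function names are hypothetical.

```python
def hit_low_first(x, y, m, q, sweeps=2000):
    """Probability that the biased Hamming-weight walk started at x reaches y
    before y + m, from the first-step equations
    h(w) = p_down * h(w - 1) + p_up * h(w + 1), h(y) = 1, h(y + m) = 0,
    solved by Gauss-Seidel sweeps."""
    p_down = q**2 / (q**2 + 1)
    p_up = 1 / (q**2 + 1)
    h = [1.0] + [0.0] * m  # h[i] stores h(y + i)
    for _ in range(sweeps):
        for i in range(1, m):
            h[i] = p_down * h[i - 1] + p_up * h[i + 1]
    return h[x - y]


def corollary_1(x, y, m, q):
    """Closed form of Eq. (B28)."""
    u = x - y
    return q**u * (q ** (-u) - q ** (-2 * m + u)) / (1 - q ** (-2 * m))
```

For instance, with q = 3, y = 2, m = 3, and x = 4, both functions give 648/728 ≈ 0.890, the familiar gambler's-ruin probability for a walk with step ratio q^2 = 9.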
Corollary 2. If we begin at a configuration γ^(0) with |γ^(0)| = x and allow the biased walk to evolve until it ends at one of the fixed points I^n or S^n, then the probability that the trajectory ends at I^n is given by
\[ \frac{1 - q^{-2(n-x)}}{1 - q^{-2n}}. \]
the partition becomes large after Θ(n) gates. Most natural architectures we might consider have this property. One architecture that is not regularly connected is the hypercube architecture, where n = 2^D qudits lie at the vertices of a D-dimensional hypercube, and D layers of gates are performed by cycling through each set of parallel edges. In this architecture, it would take nD/2 = Θ(n log(n)) gates to guarantee that any partition has been crossed. Assuming the regularly connected property, we can show a weak upper bound on the collision probability.

\[ Z = \left(\frac{2}{q+1}\right)^n \mathbb{E}_{P_u,\Lambda_u}\left[\left(\frac{2q}{q^2+1}\right)^{\#\text{ of bit flips during walk}}\right]. \tag{C4} \]
Define Z^(t) to be the value of the collision probability, given above via the unbiased walk, after t time steps, so that Z = Z^(s) and Z^(0) = 2^n/(q + 1)^n. Consider a given trajectory produced by the unbiased walk up to time step t, γ = (γ^(0), ..., γ^(t)). If γ^(t) = I^n or γ^(t) = S^n, then the walk has reached a fixed point and will never change again. From the calculation in Section B 6, we know that the sum of the weights of all walks of any length that reach a fixed point is precisely Z_H. Since the weights are non-negative, the sum over walks that have reached a fixed point by time step t is at most Z_H, and hence the combined weight of trajectories that have not reached a fixed point by time step t is at least Z^(t) − Z_H. Meanwhile, if γ^(t) is not at a fixed point, then we can consider the proper subset R ⊂ [n] of sites with value S. By the r-regularly connected property, there is at least a 1/2 chance that one of the gates between time steps t + 1 and t + rn couples an index in R with one in the complement of R. When this happens, a bit must be flipped and the weight of that trajectory is reduced by a factor 2q/(q^2 + 1). Thus, the following must hold:
\[ Z^{(t+rn)} - Z_H \le \frac{1}{2}\left(1 + \frac{2q}{q^2+1}\right)\left(Z^{(t)} - Z_H\right) = \frac{(q+1)^2}{2(q^2+1)}\left(Z^{(t)} - Z_H\right). \]
Moreover, we know that Z^(0) = 2^n/(q + 1)^n, so
\[ Z^{(s)} - Z_H \le \left(\frac{(q+1)^2}{2(q^2+1)}\right)^{\lfloor s/(rn) \rfloor} \frac{2^n}{(q+1)^n} = e^{-2ra\lfloor s/(rn) \rfloor}\, \frac{2^n}{(q+1)^n}, \]
where
\[ a := (2r)^{-1} \log\frac{2(q^2+1)}{(q+1)^2} = \Theta(1). \tag{C11} \]
Note that we have made no attempt to optimize the constant prefactor of the Θ(n^2) or the value of a. Indeed, we conjecture that Theorem 7 could be improved so that s* = Θ(n log(n)), which would be a dramatic improvement implying that the fundamental scaling of the anti-concentration size is independent of the architecture's connectivity, so long as it satisfies the regularly connected property.
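One way to see where the Θ(n^2) comes from is to read the argument above as saying that every block of rn gates shrinks the excess Z − Z_H by at least the factor c = (1 + 2q/(q^2 + 1))/2 = (q + 1)^2/(2(q^2 + 1)), starting from Z^(0) = (2/(q + 1))^n. Under that reading (our restatement, with hypothetical function names), the sketch below computes how many blocks the bound needs before it certifies Z ≤ 2Z_H. The block count grows linearly in n, so the certified circuit size s = k · rn grows like n^2.

```python
import math


def blocks_needed(n, q):
    """Smallest integer k with c^k (2/(q+1))^n <= Z_H = 2/(q^n + 1), where
    c = (q+1)^2 / (2(q^2+1)) is the assumed per-block contraction of Z - Z_H.
    Then Z <= Z_H + c^k Z^(0) <= 2 Z_H."""
    c = (q + 1) ** 2 / (2 * (q**2 + 1))
    z0 = (2 / (q + 1)) ** n
    z_h = 2 / (q**n + 1)
    if z0 <= z_h:
        return 0
    # Both logarithms are negative, so the ratio is positive.
    return math.ceil(math.log(z_h / z0) / math.log(c))


def gates_needed(n, q, r):
    """Circuit size k * r * n certified by this (weak) bound."""
    return blocks_needed(n, q) * r * n
```

For q = 2, c = 9/10 and blocks_needed(n, 2)/n stays close to log(4/3)/log(10/9) ≈ 2.7 as n grows, which is the linear-in-n block count behind the quadratic circuit size.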

Lower bound on collision probability
In this section, we prove an Ω(n log(n)) lower bound on the circuit size needed for anti-concentration in general circuit architectures.This also implies an Ω(log(n)) lower bound on the anti-concentration depth.
Theorem 8 (Theorem 4 from main text). For any RQC architecture of size s on n qudits with local dimension q, the collision probability satisfies
Corollary 3. For a given RQC architecture, let s_AC be the minimum circuit size, as a function of n, such that Z ≤ 2Z_H. Then, it must hold that
Proof. This statement follows directly from the bound in Eq. (C13).
Corollary 4. For a given RQC architecture, let d_AC be the minimum circuit depth, as a function of n, such that Z ≤ 2Z_H. Then, it must hold that
Proof. Each layer can have at most n/2 gates, so it must hold that d_AC ≥ 2s_AC/n.
Proof of Theorem 8. We use the framework of the biased random walk, given by the expression for Z in Eq. (B24).
For each of the n sites, there is some initial probability that it starts with value S, and then each gate involving that site has some chance of flipping it to value I. However, there is always some minimum probability that, even after many gates, the value has not yet been flipped to I. This is the idea behind our lower bound. Given an index j ∈ [n], we lower bound the probability that γ_j^(t) = S for all t = 0, 1, ..., s (i.e., the jth bit begins with value S and is never flipped to I) as a function of the number of gates s_j that act on qudit j:
\[ \Pr\left[\gamma_j^{(t)} = S \ \ \forall t\right] \ge \frac{1}{q+1}\left(\frac{1}{q^2+1}\right)^{s_j}, \]
since there is a 1/(q + 1) chance that γ_j^(0) = S when we draw γ^(0) from Λ_b, and the probability that it does not flip during each gate is at least 1/(q^2 + 1). This holds for each j, and thus we have
\[ \mathbb{E}_{P_b,\Lambda_b}\left[|\gamma^{(s)}|\right] \ge \sum_{j=1}^{n} \frac{1}{q+1}\left(\frac{1}{q^2+1}\right)^{s_j}. \]
Since each of the s gates in the circuit diagram acts on two indices, it must hold that Σ_j s_j = 2s, and given this constraint, the minimum of the final expression above occurs when all the s_j are equal (each summand is a convex function of s_j), and thus
\[ \mathbb{E}_{P_b,\Lambda_b}\left[|\gamma^{(s)}|\right] \ge \frac{n}{q+1}\left(q^2+1\right)^{-2s/n}. \]
By convexity of the exponential function, we have E[q^{|γ^(s)|}] ≥ q^{E[|γ^(s)|]}, and hence, using Eq. (B24),
\[ Z \ge q^{-n} \exp\left(\frac{n \log(q)}{q+1}\left(q^2+1\right)^{-2s/n}\right). \]

Appendix D: Bounds for the 1D architecture
We now focus specifically on the 1D architecture defined formally in Definition 2. We assume periodic boundary conditions, although it would be possible to consider open boundary conditions as well. In 1D, the qudits are arranged in a geometrically local fashion, and it is fruitful to think of a configuration γ ∈ {I, S}^n as being composed of contiguous domains, i.e., runs of consecutive sites whose values are all I or all S. We then identify domain walls as locations where one domain ends and another begins. Gates that couple qudits in different domains cause one of the values to flip, which moves the domain wall separating those domains one unit to the left or one unit to the right. The notation for talking formally about this is introduced in the next subsection, and then the upper and lower bounds on Z are proved.

Domain walls and notation
In 1D, configurations γ ∈ {I, S}^n are associated with a set of domain wall locations. We let
\[ DW(\gamma) = \{ e \in \{0, 1, \dots, n-1\} : \gamma_e \neq \gamma_{e+1} \} \]
be the set of domain wall positions for a configuration γ, where γ_0 is identified with γ_n when there are periodic boundary conditions. For each set of domain wall locations, there are exactly two configurations that map to it, since choosing γ_0 = I or γ_0 = S determines the value of all other sites. A configuration trajectory γ = (γ^(0), ..., γ^(s)) is then associated with a sequence of sets of domain wall locations G = (g^(0), ..., g^(s)), where g^(t) = DW(γ^(t)). We call G a domain wall trajectory. Domain wall trajectories with non-zero contribution to the collision probability Z obey the following rules: when there is a domain wall at position e and a gate acts on qudits {e, e + 1}, the domain wall must move to position e − 1 or e + 1 (at the cost of a reduction in the weight), and it annihilates with another domain wall if one is already at the new position. However, pairs of domain walls cannot be created; the number of domain walls is non-increasing throughout the domain wall trajectory, and a particular domain wall can be uniquely tracked through each step of the trajectory (either until the final step or until its annihilation). Let 𝒢 be the set of all domain wall trajectories that obey these rules. Any domain wall trajectory G ∈ 𝒢 has the property that when t is odd, e is even for all e ∈ g^(t), and when t is even (but non-zero), e is odd for all e ∈ g^(t). This is because odd (even) numbered layers couple qudits {2j − 1, 2j} ({2j, 2j + 1}), meaning that domain walls must lie between qudit positions 2j and 2j + 1 (between qudit positions 2j − 1 and 2j) for some j.
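The map from configurations to domain wall sets, and the two-to-one correspondence back, can be made concrete as follows. This is an illustrative sketch with hypothetical function names; it assumes the indexing convention above (sites 1, ..., n stored at string positions 0, ..., n − 1, wall positions e ∈ {0, ..., n − 1}, and position 0 denoting the wall between sites n and 1).

```python
def domain_walls(gamma):
    """DW(gamma) for a periodic 1D configuration given as a string of "I"/"S".

    Position e in {0, ..., n-1} is a wall when gamma_e != gamma_{e+1},
    with gamma_0 identified with gamma_n.
    """
    n = len(gamma)

    def site(i):
        return gamma[(i - 1) % n]  # 1-indexed access; site(0) is gamma_n

    return {e for e in range(n) if site(e) != site(e + 1)}


def configurations_with_walls(walls, n):
    """The two configurations whose wall set is `walls` (assumed valid, i.e.
    of even size): fix gamma_1, then switch value each time a wall is crossed."""
    configs = []
    for first in "IS":
        cfg = [first]
        for e in range(1, n):  # wall e separates sites e and e + 1
            prev = cfg[-1]
            cfg.append(("S" if prev == "I" else "I") if e in walls else prev)
        configs.append("".join(cfg))
    return configs
```

For example, "IISSSI" has walls {2, 5}, and the two configurations with that wall set are "IISSSI" and its global flip "SSIIIS".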
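By converting the sum over trajectories in Eq. (B20) to a sum over domain wall trajectories, and using the fact that each domain wall trajectory corresponds to exactly two configuration trajectories of equal weight, we can express Z by the equation
\[ Z = \frac{2}{(q+1)^n} \sum_{G \in \mathcal{G}} \operatorname{weight}(G), \tag{D2} \]
where the weight is given as follows, recalling that A^(t) is the pair of qudit indices involved in the tth gate, which in 1D is always A^(t) = {j, j + 1} for some j:
\[ \operatorname{weight}(G) = \prod_{t=1}^{s} \begin{cases} \frac{q}{q^2+1} & \text{if } A^{(t)} = \{j, j+1\} \text{ and } j \in g^{(t-1)}, \\ 1 & \text{otherwise.} \end{cases} \tag{D3} \]
FIG. 5. Cartoon illustrating the unique decomposition of a domain wall trajectory G into a disjoint union of one part, G_0, in which all domain walls annihilate prior to the end of the circuit, and another part, G_U, in which no domain walls annihilate.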
In other words, if the gate acts on qudits {j, j + 1} and there is a domain wall at position j, then the weight is reduced by a factor q/(q^2 + 1) (and the domain wall must move to position j − 1 or position j + 1, annihilating if a domain wall already exists at that position). Given two domain wall trajectories G and G′, we consider the combined domain wall trajectory
\[ G \sqcup G' = \left(g^{(0)} \sqcup g'^{(0)}, \dots, g^{(s)} \sqcup g'^{(s)}\right), \]
where ⊔ is the disjoint union, defined only under the assumption g^(t) ∩ g′^(t) = ∅ for all t. The upshot of thinking about trajectories this way is that if G, G′, and G ⊔ G′ all belong to 𝒢, then weight(G ⊔ G′) = weight(G) · weight(G′), since each factor in Eq. (D3) is determined by a single domain wall. In particular, we will find it useful to decompose a domain wall trajectory G into G = G_U ⊔ G_0, where G_U is a domain wall trajectory with a conserved number of domain walls throughout the trajectory, and G_0 is a trajectory for which |g_0^(s)| = 0, i.e., all of its domain walls have annihilated by the end of the trajectory. This decomposition is unique, and an example is shown in Figure 5. Let 𝒢_U and 𝒢_0 be the subsets of 𝒢 that have no annihilations and that have no surviving domain walls at the end of the circuit, respectively. Let 𝒢_{U,k} be the subset of 𝒢_U with k domain walls. When the boundary conditions are periodic, k must be even for 𝒢_{U,k} to be non-empty.

Collision probability upper bound
Theorem 9 (Theorem 1 from main text). For the 1D architecture, let
\[ a := \log\frac{q^2+1}{2q}. \tag{D7} \]
Then, whenever s ≥ s*.
The circuit depth d is d = 2s/n, so we may define d* = 2s*/n and equivalently conclude
Note that when s < s*, an upper bound on Z can still be inferred from this method. The essence of the proof of Theorem 9 is the same as that of the statement proved in [32], although we have expressed it here within our notation and framework.
Proof. We use the formula in Eq. (D2), which expresses Z as a weighted sum over domain wall trajectories. Each domain wall trajectory G = (g^(0), ..., g^(s)) can be associated with an integer k = |g^(s)|, the number of domain walls that remain unannihilated at the end of the trajectory. Due to periodic boundary conditions, k must be even; let k_0 = k/2. Let 𝒢_k ⊂ 𝒢 be the associated set of length-s domain wall trajectories, and let 𝒢_{U,k} ⊂ 𝒢_k be the subset containing domain wall trajectories that have a conserved number of domain walls throughout. As discussed in the previous subsection, it is possible to uniquely decompose each G ∈ 𝒢_k as G = G_U ⊔ G_0 with G_U ∈ 𝒢_{U,k} and G_0 ∈ 𝒢_0.
Suppose we fix a domain wall configuration g^(0) with k domain walls for the initial time step at the beginning of the circuit. There are (n choose k) such configurations. The total weight of all the trajectories in 𝒢_{U,k} that begin at this configuration is at most (2q/(q^2 + 1))^{k(d−1)}, since each domain wall must move either left or right (introducing a factor of 2) during each of the d layers of gates, except possibly the first layer (a domain wall that begins at an even position does not move during the first layer), and each time a domain wall moves it incurs a weight reduction of q/(q^2 + 1). This does not account for the rule that the k domain walls cannot intersect, but it still yields an upper bound on the total weight.
Meanwhile, the sum of the weights of all domain wall trajectories in 𝒢_0 approaches Z_H(q + 1)^n/2 from below as the depth increases. This follows from the analysis in Section B 6, where it was shown that the sum of the weights of all trajectories that eventually reach a fixed point is exactly Z_H(q + 1)^n; at a finite depth, not every trajectory will have reached a fixed point, so only a subset of these terms is included in the sum. Because each domain wall trajectory corresponds to 2 equal-weight trajectories through {I, S}^n, the sum of the weights of all the domain wall trajectories in 𝒢_0 is at most Z_H(q + 1)^n/2.
Collecting these observations, and recalling k = 2k 0 , we have where Eq. (D17) holds so long as d ≥ d * , based on the following small lemma.

Collision probability lower bound
Theorem 10 (Theorem 5 from main text). Consider the 1D architecture. There are constants A and A′ such that, as long as s* − s ≥ A′n, the collision probability satisfies
where a and s* are the same as in Theorem 9.
In our proof, the constant A is explicit but very small, on the order of e^{−10}, and A′ ≈ −log(A). The value of A could certainly be improved with some attempt at optimization.
Corollary 5. For the 1D architecture, if we define s_AC and d_AC to be the smallest circuit size and circuit depth for which Z ≤ 2Z_H, then
Proof. Theorem 9 implies that
Meanwhile, Theorem 10 implies that if
Proof of Theorem 10. Eq. (D2) expresses Z as a weighted sum over domain wall trajectories. Heuristically, when d < d*, we expect that the output distribution will not be anti-concentrated and that domain wall trajectories drawn at random with probability proportional to their weight will usually have many domain walls that never annihilate. To lower bound Z, we sum over the set of configurations with k unannihilated domain walls for a particularly chosen value of k.
For a fixed value of the depth d, define
We chose n_H to be exactly half the value of n for which a depth-d circuit would be anti-concentrated. Heuristically, we expect on the order of n/(2n_H) unannihilated domain walls in typical configurations.
Let k be an even integer to be specified later. Let ℋ_k ⊂ 𝒢_{U,k} be the set that contains any domain wall trajectory H = (h^(0), ..., h^(s)) for which (1) H has k domain walls at each time step (none annihilate), and (2) for each of the k domain walls in the initial configuration h^(0), the nearest domain wall in both directions is at most n_H positions away. Now, temporarily fix some H ∈ ℋ_k. It has k domain walls, which move around throughout the trajectory. We let e_{H,j,t} be the location of the jth domain wall at time step t in the trajectory H. We then define the set 𝒥_{H,j} ⊂ 𝒢_0, for j = 1, ..., k, to be the set of domain wall trajectories for which (1) all of the domain walls annihilate before time step s, and (2) the position e_t of any domain wall at time step t satisfies
\[ e_{H,j,t} < e_t < e_{H,j+1,t}. \tag{D30} \]
In other words, all of the domain walls fall between the jth and (j + 1)th domain walls of H. This ensures that H is disjoint from any J_j ∈ 𝒥_{H,j}. Specifying a trajectory H ∈ ℋ_k as well as J_j ∈ 𝒥_{H,j} for each j = 1, ..., k determines a unique trajectory H′ = H ⊔ J_1 ⊔ ⋯ ⊔ J_k. This decomposition is illustrated in Figure 6. Thus, if we perform the weighted sum only over the set of H′ formed this way, we arrive at a lower bound on Z, as follows:
The quantities in parentheses can be bounded with the following two lemmas, whose proofs are delayed until after the proof of the theorem.
FIG. 6. Outline of the idea of the proof of Theorem 10. We choose a domain wall trajectory H that has k domain walls that never annihilate and such that the distance between consecutive domain walls is always at most n_H. We then choose domain wall trajectories J_{H,1}, ..., J_{H,k} such that the domain walls of J_{H,j} lie between the jth and (j + 1)th domain walls of H and all annihilate before the end of the circuit. The domain wall configuration H′ is the disjoint union of H and J_{H,j} for j = 1, ..., k. We can lower bound the collision probability by lower bounding the weighted sum over the contribution from all H′ formed this way.
Lemma 4. Fix a value of H and j. Suppose the jth and (j + 1)th domain walls of the initial configuration of H lie at positions e and e + X − 1 (mod n), respectively, for some positive integer X, where c = 3e^{10}.
The sum of the domain lengths X over all the domains is simply n. Thus the ((q + 1)/q)^X factors cancel the 1/(q + 1)^n prefactor of Z, and we have
for any k that satisfies 4d ≤ n/k and n_H/2 ≤ n/k. Now we choose a value of k to maximize the right-hand side of the above equation. In the limit of large n, the requirement that k be an even integer has a negligible effect. In our analysis, we handle this requirement by defining k′ to be a real number and k to be the smallest even integer larger than k′, and then we make a few rather crude bounds on the floor and ceiling of quantities like n/k, which are not asymptotically tight but are good enough for our purposes. We choose
k := smallest even integer greater than k′. (D37)
Note that n/k is at least 8ce, which is very large, meaning n/2k /2 ≤ n/2k ≤ 2 n/2k certainly holds. For finite n, we can say that as long as k′ ≥ 1, then k′ ≤ k ≤ 2k′ will hold. The requirement k′ ≥ 1 translates into
which, by recalling s = nd/2 and that s* ≥ (2a)^{−1} n log(n), can be re-expressed as
with A′ := (2a)^{−1} log(8ce(e − 1)) + 2^{−1}, which is assumed to hold in the theorem statement. This implies that
Inspection of the formula for k reveals that the relation 4d ≤ n/k holds for any d and n. Moreover, we have n_H/2 = ne^{−a}/(32e(e − 1)c k′) ≤ n/k, so the second relation holds as well.
Recall that Z_H = 2/(q^n + 1) ≤ 2q^{−n}. Plugging the above bound on n/k into Eq. (D35), we find
for A := 1/(8ce). Note that this value of A is quite small (on the order of e^{−10}) but could likely be made much larger with some optimization. Now we provide the delayed proofs of the two lemmas.
Proof of Lemma 3. Each term in the sum on the left-hand side is non-negative, so we obtain a lower bound by summing over a subset of the terms. To do so, we split the n indices into k nearly equal-size segments of length at most n/k, which is less than n_H/2 by assumption. Then, for each of these segments, we choose the location of a single domain wall that is at least distance d from each edge of the segment. This generates a unique initial domain wall configuration that satisfies criterion (2) of ℋ_k, since any pair of consecutive domain walls is closer than n_H apart. The total number of choices is at least
which, by the assumption 4d ≤ n/k, is at least ((n/k)/2)^k.
Once the initial k domain wall locations have been chosen, we examine how they can propagate through the circuit. Each layer of gates forces each of the k domain walls to move in one of two directions, and the weight is reduced by a factor (q/(q^2 + 1))^k, except in the first layer, where some of the domain walls may not move if they begin at an even index. Since, by construction, no two domain walls start within a distance 2d of each other, and each domain wall moves at most d positions over the d layers, there is no chance of domain walls crossing. Thus, for each initial set of k locations chosen in the manner outlined above, the combined weight of all possible trajectories is at least (2q/(q^2 + 1))^{kd}. This proves the lemma.
Proof of Lemma 4. Consider an alternative 1D qudit system with periodic boundary conditions consisting of X sites, obtained by identifying site e + X with site e and ignoring all other sites. Because H ∈ ℋ_k, we can be assured that X ≤ n_H. Let 𝒥′_{H,j} be the set of all domain wall trajectories on the size-X system, and let 𝒥′_{H,j,l} be the subset that have l = 2l_0 domain walls on the last time step. Because the collision probability Z_X for this X-qudit system must satisfy Z_X ≥ Z_{H,X}, where Z_{H,X} = 2/(q^X + 1), it must be the case that
We can upper bound the contribution of all the terms with l_0 > 0 in the above expression by the method that yielded the upper bound in Theorem 9. The sum of those terms is upper bounded by the second term in Eq. (D17), that is,
since X ≤ n_H. Combining Eqs. (D47) and (D49), we find a lower bound on the l_0 = 0 term, Σ_{J′∈𝒥′_{H,j,0}} weight(J′),
where the last inequality follows since q ≥ 2 and X ≥ 1 must be true. Now, every domain wall trajectory in 𝒥_{H,j} will also be in 𝒥′_{H,j,0}, but the converse will not be true. Some trajectories in the latter set will have one or more domain walls that intersect either the jth or the (j + 1)th domain wall of H at some time step, which is not allowed within the former set. Thus, the sum over the domain wall trajectories in 𝒥_{H,j} will be smaller than the sum over those in 𝒥′_{H,j,0}, but we argue that it is smaller by at most some constant factor, using the following argument, which is also described in Figure 7. Let ℬ_{H,j} be the set of all trajectories in which every domain wall either intersects the jth domain wall of H at some time step t, or annihilates with a domain wall that previously intersected the jth domain wall of H.
FIG. 7. Outline of the argument in the proof of Lemma 4 that the sum over domain wall trajectories in 𝒥_{H,j} is at least the sum over 𝒥′_{H,j,0} divided by some constant factor, expressed in Eq. (D58). Every trajectory J′ ∈ 𝒥′_{H,j,0} can be decomposed into a trajectory J ∈ 𝒥_{H,j}, a trajectory K ∈ ℬ_{H,j}, and a trajectory K′ ∈ ℬ_{H,j+1}, where each domain wall in K intersects the jth domain wall of H, and each domain wall of K′ intersects the (j + 1)th domain wall of H. Because the combined weight of all possible K and K′ is only a constant factor, independent of n, the combined weight of all possible J cannot be more than a constant factor smaller than the combined weight of all possible J′. Note that the system with X sites has periodic boundary conditions in this figure.
Then any trajectory J′ ∈ 𝒥′_{H,j,0} can be formed as the disjoint union of a trajectory J ∈ 𝒥_{H,j}, a trajectory K ∈ ℬ_{H,j}, and a trajectory K′ ∈ ℬ_{H,j+1}, to account for the parts that intersect the jth and (j + 1)th domain walls. Given J′, the choice of J for this decomposition is unique, but there may be multiple choices of (K, K′) for which it holds. Note also that a trajectory in ℬ_{H,j} can be decomposed into individual domain wall pairs that coincide with the jth domain wall of H at some time step t and annihilate at some time step t′. The combined weight of all such pairs, given a fixed coincidence point at e_{H,j,t}, is at most (2q/(q^2 + 1))^{2t′}. Summing over t′ ≥ t, we find that the combined weight for all possible domain wall pairs coinciding at time step t is at most
There can be many domain wall pairs that intersect the jth domain wall of H, but for each value of t there is either no intersection (in which case the factor is 1) or one intersection (in which case the factor is at most the above quantity). Thus we can take the product over including or not including a domain wall at each value of t and find
where the last inequality follows since q ≥ 2 and the function of q inside the exp is monotonically decreasing.
Combining this constant-factor bound with Eq. (D50), we arrive at a lower bound on $\sum_{J \in \mathcal{J}_{H,j}} \mathrm{weight}(J)$.

Appendix E: Bounds for the complete-graph architecture

Proof intuition and guide
In the following sections, we complete the proofs of the upper and lower bounds for the complete-graph architecture, defined formally in Definition 1. The first insight about the complete-graph architecture is that all configurations with the same Hamming weight are equivalent, since the architecture is symmetric under permutations of the qudits. Thus, trajectories through configuration space $\{I, S\}^n$ reduce to trajectories through Hamming weight space $\{0, 1, \ldots, n\}$.
Our upper bound uses the framework of the unbiased walk, and the lower bound uses the biased walk. Recall that we can use the unbiased walk to express the collision probability Z as a sum over all possible paths that the trajectory might take, working from Eq. (B22); in that expression, the binomial factor $\binom{n}{x}$ arises because it is the number of initial configurations with Hamming weight x. For the complete-graph case, $P_u$ takes a simple form: if the current configuration is x, the probability that the configuration changes on the next time step is precisely the probability of finding mismatching values when drawing a random pair of distinct indices in [n], namely $2x(n-x)/(n(n-1))$, and if it does change, it is equally likely to become x − 1 or x + 1. For the biased walk $P_b$, everything is the same except that, when the configuration changes, it moves to x − 1 with probability $q^2/(q^2+1)$ and to x + 1 with probability $1/(q^2+1)$. Also, in the biased case, the initial configuration is not chosen uniformly over all configurations but is instead distributed according to $\Lambda_b$, and the expectation in that expression is replaced with $\mathbb{E}[q^{|\gamma^{(s)}|}]$. Thus configurations with larger Hamming weight contribute exponentially more to Z.
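The single-step transition rules just described can be sketched as a small sampler. This is an illustrative sketch, not the authors' code: the function name `complete_graph_step` and the convention `q=None` for the unbiased walk are our own choices.

```python
import random

def complete_graph_step(x, n, q=None, rng=random):
    """One time step of the Hamming-weight walk for the complete-graph architecture.

    q=None gives the unbiased walk P_u; a numeric q gives the biased walk P_b.
    """
    # Probability that a uniformly random pair of distinct qudits has mismatched values.
    p_move = 2 * x * (n - x) / (n * (n - 1))
    if rng.random() >= p_move:
        return x  # configuration unchanged on this step
    # Direction of the move: 1/2 each way (unbiased), or toward 0 w.p. q^2/(q^2+1) (biased).
    p_down = 0.5 if q is None else q**2 / (q**2 + 1)
    return x - 1 if rng.random() < p_down else x + 1
```

At x = 0 or x = n the move probability vanishes, so these are absorbing fixed points. For the biased walk the expected one-step change is $-\frac{2x(n-x)}{n(n-1)} \cdot \frac{q^2-1}{q^2+1}$, for example $-1/3$ at x = 5, n = 10, q = 2.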
To build intuition, we first consider the biased walk, which is what we use for the lower bound. Here, the peak of the probability mass in the initial distribution $\Lambda_b$ starts around Hamming weight $x = n/(q+1)$. On average, the walk lingers for $n(n-1)/(2x(n-x))$ time steps before moving, which is approximately $n/(2x)$ when x is close to 0. Each time the walk moves, it has a $q^2/(q^2+1)$ chance of moving one unit closer to 0 and a $1/(q^2+1)$ chance of moving one unit farther away, so its net displacement per move is $(q^2-1)/(q^2+1)$ toward 0. Combining this bias with the waiting time, the effective speed of the biased walk at Hamming weight x is approximately $2(q^2-1)x/((q^2+1)n)$ in the direction of 0. Thus, in expectation, the time it takes for the peak of the probability mass to reach 0 is
$$s^* \approx \sum_{x=1}^{n/(q+1)} \frac{(q^2+1)\,n}{2(q^2-1)\,x} \approx \frac{q^2+1}{2(q^2-1)}\, n\log(n),$$
noting that $\sum_{x=1}^{n/(q+1)} 1/x \approx \log(n)$. This strongly suggests that $s^*$ time steps are necessary for anti-concentration: with fewer time steps, the peak of the distribution over Hamming weights at the end of the circuit would be located at some Hamming weight y > 0 and would therefore receive a significant weight $q^y$ in its contribution to Z. This is the intuition for our lower bound.
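The heuristic estimate of $s^*$ can be checked numerically. The sketch below sums the exact expected waiting time divided by the net drift over Hamming weights 1 to $n/(q+1)$ and compares the result with the leading term $\frac{q^2+1}{2(q^2-1)} n\log(n)$. The function names are hypothetical, and rounding $n/(q+1)$ to the nearest integer as the starting point of the peak is our assumption.

```python
import math

def effective_hitting_time(n, q):
    """Heuristic time for the peak of the biased walk to drift from about n/(q+1) to 0."""
    drift = (q**2 - 1) / (q**2 + 1)  # net displacement toward 0 per move
    start = round(n / (q + 1))
    total = 0.0
    for x in range(1, start + 1):
        mean_wait = n * (n - 1) / (2 * x * (n - x))  # expected steps per move at weight x
        total += mean_wait / drift                   # time to advance one unit toward 0
    return total

def s_star_leading(n, q):
    """Leading-order term (q^2+1)/(2(q^2-1)) * n * log(n)."""
    return (q**2 + 1) / (2 * (q**2 - 1)) * n * math.log(n)
```

For n = 10^6 and q = 2 the two agree to within a few percent; the remaining gap comes from O(n) corrections that the leading term ignores.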
The biased walk also gives intuition for why there is a matching upper bound. If the circuit size is a little larger than the lower bound, we expect the peak of the distribution to have reached the fixed point at 0. It is still possible that the tail of the distribution, which has not yet reached the fixed point, is too fat for anti-concentration to have been achieved: each unit farther away from 0 results in a factor of q larger contribution to Z, so the tail must decay exponentially if we are to ignore it. This is essentially what we show, although the proof does not make this interpretation explicit. Intuitively, one reason to expect an exponentially decaying tail is that the effective speed slows down as the walk approaches 0. This gives the part of the tail sitting farther from 0, where the effective speed is faster, time to "catch up."
To actually prove the upper bound, we turn back to the unbiased walk. To match the progression in the full proof, we introduce the concept of a reduced path (equivalently, "reduced walk"): a walk that never stands still at a configuration. If its Hamming weight at time step t is y, then its Hamming weight at time step t + 1 is y − 1 or y + 1. For each walk, we can form a corresponding reduced walk simply by removing consecutive duplicates from the sequence of configurations. Viewed the other way, given a fixed reduced walk, the actual walk lingers at each location for some number of time steps before continuing. In the limit of large circuit size s, there is enough time for the actual walk to linger as long as it likes at each step, so every reduced walk is eventually "completed" by the actual walk. In this limit, $Z = Z_H$, where $Z_H := 2/(q^n+1)$ is the Haar value. Away from this limit, there is some probability that some reduced paths are not completed.
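Forming the reduced walk amounts to collapsing runs of repeated configurations. The sketch below uses Python's `itertools.groupby`; the function names are hypothetical. The run lengths record how long the actual walk lingers at each position, and when the final configuration appears only once, the run lengths other than the last sum to the number of time steps.

```python
from itertools import groupby

def reduce_walk(gamma):
    """R[gamma]: collapse each run of repeated configurations to a single entry."""
    return [value for value, _ in groupby(gamma)]

def waiting_times(gamma):
    """Number of consecutive time steps gamma spends at each entry of R[gamma]."""
    return [sum(1 for _ in run) for _, run in groupby(gamma)]

gamma = [3, 3, 3, 2, 2, 1, 0]
print(reduce_walk(gamma))    # [3, 2, 1, 0]
print(waiting_times(gamma))  # [3, 2, 1, 1]
```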
We are able to express the difference between Z and $Z_H$ as a sum over all reduced paths, including reduced paths that do not terminate at Hamming weight 0 or n, where the summand is proportional to the probability that the reduced path is not completed within s time steps. The next key insight is to bound the probability that a reduced walk is not completed using a Chernoff bound. If L is the length of the walk, the Chernoff bound states that, for any constant a > 0,
$$\Pr[L \ge s] \le \mathbb{E}[e^{aL}]\, e^{-as}, \quad \text{(E7)}$$
which is particularly useful because, for fixed φ, L is itself a sum of independent random variables $L_\phi^{(i)}$, the number of time steps the walk waits on step i. Thus $\mathbb{E}[e^{aL}] = \prod_i \mathbb{E}[e^{aL_\phi^{(i)}}]$, and because each random variable $L_\phi^{(i)}$ is geometrically (that is, discrete-exponentially) distributed, we can calculate each factor exactly. For the purposes of this proof sketch, denote this factor by $T_x := \mathbb{E}[e^{aL_\phi^{(i)}}]$, which depends only on the Hamming weight x of the configuration at which the walk waits. (The walk waits longer when it is near 0 or n than when it is near n/2.) This dependence appears to be a problem, as it is unclear how to perform the resulting sum over all possible φ. (The constant a for each reduced walk φ, denoted $a_\phi$, will be specified later.)

To proceed, we break φ up into subpaths that inch closer and closer to 0 and n. We write φ as the concatenation of $\phi_x, \phi_{x-1}, \ldots, \phi_w$, where $\phi_v$ begins at either v or n − v and reaches v − 1 or n − v + 1 for the first time on its very last step. Here w is the smallest Hamming-weight distance between the reduced walk φ and the nearer fixed point (0 or n). Because the walk $\phi_v$ spends all its time between v and n − v, the factor $T_y$ for every y visited within $\phi_v$ is at most $T_v$ (the walk moves more slowly when it is closer to 0 or n), and we can bound $(q+1)^n (Z - Z_H)$ by a sum over x and w of products of bracketed sums, one for each subpath $\phi_v$. Here the factor $\binom{n}{x}$ appears as the number of configurations with Hamming weight x, and $a_\phi$ has become $a_w$ because we choose it so that it depends only on w.
This bound represents significant progress because we already know how to perform the sums in brackets. Essentially, the factor $T_x$ simply changes the effective value of q: we may define $q'$ to satisfy
$$\frac{q'}{q'^2+1} = \frac{q\,T_x}{q^2+1}. \quad \text{(E12)}$$
Then we can perform the sums using the formulas for sums over paths developed in Lemma 1. We find that, for the values of $a_w$ that we can choose, we must allocate roughly $(q^2+1)n/(2(q^2-1)x)$ time steps of s so that $e^{-a_w s}$ can cancel out the value of the sum in brackets for $\phi_x$. Note that this is precisely the inverse of the effective speed defined earlier. Then, for all the sums from v = 1 to v = n/2 to be canceled, we must allocate
$$\sum_{v=1}^{n/2} \frac{(q^2+1)\,n}{2(q^2-1)\,v} \approx \frac{q^2+1}{2(q^2-1)}\, n\log(n)$$
time steps. Fundamentally, the log(n) factor becomes necessary because the walk waits longer and longer as it gets closer to the fixed points. In the full analysis, we find that a term linear in n is also necessary to fully anti-concentrate, but our analysis of the linear term is not tight.
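Eq. (E12) can be solved for $q'$ in closed form with the quadratic formula, since it is equivalent to $c\,q'^2 - q' + c = 0$ with $c = qT_x/(q^2+1)$. The sketch below takes the larger root; choosing this branch is our assumption (it is the one that reduces to $q' = q$ when $T_x = 1$), and the function name is hypothetical.

```python
import math

def effective_q(q, T):
    """Larger root q' of q'/(q'^2 + 1) = q*T/(q^2 + 1)."""
    c = q * T / (q**2 + 1)
    if c > 0.5:
        # q'/(q'^2 + 1) never exceeds 1/2, so there is no real solution.
        raise ValueError("no real solution: q*T/(q^2+1) must be at most 1/2")
    return (1 + math.sqrt(1 - 4 * c * c)) / (2 * c)
```

Since $T_x \ge 1$, the effective $q'$ is at most q and decreases as $T_x$ grows; for q = 2 a solution exists only while $T_x \le 5/4$, which constrains how large $a_w$ can be chosen.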

Preliminaries

a. Trajectories
For the complete-graph architecture, we need only keep track of the Hamming weight of a configuration. Thus, our random walks are over the set $\{0, 1, \ldots, n\}$. A trajectory γ is now a sequence of integers $(\gamma^{(0)}, \ldots, \gamma^{(s)})$. For a sequence of length s and any t > s, we let $\gamma^{(t)}$ return $\gamma^{(s)}$. A sequence is valid if $|\gamma^{(t)} - \gamma^{(t-1)}| \le 1$ for every t and if 0 or n, whenever it appears, appears only once, at the very end of the sequence. Let Γ be the set of all valid trajectories.
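The validity conditions can be checked mechanically. A minimal sketch, representing a trajectory as a Python list of Hamming weights (the function name is hypothetical):

```python
def is_valid(gamma, n):
    """True if gamma is a valid trajectory over Hamming weights {0, ..., n}."""
    if any(not 0 <= y <= n for y in gamma):
        return False
    # Consecutive entries differ by at most one.
    if any(abs(b - a) > 1 for a, b in zip(gamma, gamma[1:])):
        return False
    # The fixed points 0 and n may appear only once, as the final entry.
    return all(y not in (0, n) for y in gamma[:-1])
```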
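For any valid trajectory γ, the unbiased random walk assigns a nonzero probability
$$\Pr_{P_u}[\gamma] = \prod_{t=1}^{L[\gamma]} P_u(\gamma^{(t-1)} \to \gamma^{(t)}),$$
where
$$P_u(x \to x) = 1 - \frac{2x(n-x)}{n(n-1)}, \qquad P_u(x \to x \pm 1) = \frac{x(n-x)}{n(n-1)}.$$
We make the same definition for the biased random walk by replacing $P_u$ with $P_b$, where
$$P_b(x \to x) = 1 - \frac{2x(n-x)}{n(n-1)}, \qquad P_b(x \to x-1) = \frac{2x(n-x)}{n(n-1)} \cdot \frac{q^2}{q^2+1}, \qquad P_b(x \to x+1) = \frac{2x(n-x)}{n(n-1)} \cdot \frac{1}{q^2+1}.$$
For $P \in \{P_u, P_b\}$ and any subset $\Upsilon \subseteq \Gamma$, we let $\Pr_P[\Upsilon] = \sum_{\gamma \in \Upsilon} \Pr_P[\gamma]$ be the total probability assigned to paths in Υ.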

b. Conditional probabilities and expectations
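For any γ ∈ Υ, we may also define the conditional probability
$$\Pr_P[\gamma \mid \Upsilon] := \frac{\Pr_P[\gamma]}{\Pr_P[\Upsilon]},$$
which corresponds to drawing from the subset Υ with probability proportional to that assigned by the (unbiased or biased) random walk. This also allows us to naturally define conditional expectation values for a quantity Q computed from γ:
$$\mathbb{E}_P[Q \mid \Upsilon] := \sum_{\gamma \in \Upsilon} \Pr_P[\gamma \mid \Upsilon]\, Q(\gamma).$$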

c. Trajectory concatenation and other operations
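For any trajectory $\gamma = (\gamma^{(0)}, \ldots, \gamma^{(s)}) \in \Gamma$, let $L[\gamma] = s$ be the length of the trajectory, and let $\gamma^{(L)}$ be shorthand for $\gamma^{(L[\gamma])}$. The statement $w \in \gamma$ is true if there exists some t for which $\gamma^{(t)} = w$. We then consider $\min\{t : \gamma^{(t)} = w\}$, the first time step along γ at which the trajectory reaches w. We also let
$$M(\gamma) := \max_t \gamma^{(t)}, \qquad m(\gamma) := \min_t \gamma^{(t)}$$
be the maximum and minimum Hamming weight the trajectory passes through. We can naturally concatenate two trajectories $\gamma_1$ and $\gamma_2$ if $\gamma_1^{(L)} = \gamma_2^{(0)}$:
$$\gamma_1 \cdot \gamma_2 := (\gamma_1^{(0)}, \ldots, \gamma_1^{(L)}, \gamma_2^{(1)}, \ldots, \gamma_2^{(L)}).$$
For any trajectory γ, we let $\bar\gamma := (n - \gamma^{(0)}, \ldots, n - \gamma^{(L)})$ be the flipped trajectory. Similarly, the reversed trajectory is $(\gamma^{(L)}, \gamma^{(L-1)}, \ldots, \gamma^{(0)})$.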
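These trajectory operations translate directly into list manipulations. A minimal sketch, with trajectories as Python lists and hypothetical function names; m(γ) and M(γ) are simply Python's built-in `min` and `max`:

```python
def length(gamma):
    """L[gamma]: number of time steps, one fewer than the number of entries."""
    return len(gamma) - 1

def first_hit(gamma, w):
    """First time step t with gamma[t] == w, or None if w is never reached."""
    return next((t for t, y in enumerate(gamma) if y == w), None)

def flip(gamma, n):
    """The flipped trajectory: each Hamming weight y becomes n - y."""
    return [n - y for y in gamma]

def reverse(gamma):
    """The reversed trajectory."""
    return gamma[::-1]

def concat(g1, g2):
    """g1 . g2, defined only when g1 ends where g2 begins; the shared point appears once."""
    if g1[-1] != g2[0]:
        raise ValueError("g1 must end where g2 begins")
    return g1 + g2[1:]
```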

d. Important subsets of Γ
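We now define various subsets of Γ. Let
$$\Gamma_x := \{\gamma \in \Gamma : \gamma^{(0)} = x\}$$
be the subset of trajectories that begin at x, and let
$$\Gamma^w := \{\gamma \in \Gamma : \gamma^{(L)} = w \text{ and } \gamma^{(t)} \ne w \text{ for all } t < L[\gamma]\}$$
be the set of trajectories that reach w for the first time and immediately terminate. We make the natural combination $\Gamma_x^w := \Gamma_x \cap \Gamma^w$. Of particular importance are the sets with w = 0 or w = n, which contain valid trajectories that terminate at one of the fixed points of the random walk. Define
$$\Gamma_x^* := \Gamma_x^0 \cup \Gamma_x^n,$$
and note that $\Pr_{P_u}[\Gamma_x^*] = \Pr_{P_b}[\Gamma_x^*] = 1$, which makes sense intuitively since walks eventually reach either 0 or n with probability 1. Adding the superscript w to any set Υ restricts it to walks that, like those in $\Gamma^w$, terminate upon first reaching w, i.e., $\Upsilon^w := \Upsilon \cap \Gamma^w$. When any walk in Υ can be concatenated with any walk in Υ', we let
$$\Upsilon \cdot \Upsilon' := \{\gamma \cdot \gamma' : \gamma \in \Upsilon,\ \gamma' \in \Upsilon'\}.$$
We also introduce the concept of a reduced trajectory, which we sometimes refer to synonymously as a reduced walk: a valid trajectory for which $|\gamma^{(t)} - \gamma^{(t-1)}| = 1$ for all t; that is, the reduced walk never stands still. We let Ψ denote the set of reduced walks, and let all subscripts and superscripts restrict Ψ in the same way they restrict Γ. For any γ ∈ Γ, we can associate a reduced walk ψ ∈ Ψ by removing consecutive duplicates from γ; we denote this by $R[\gamma] := \psi$. For any ψ ∈ Ψ, we let
$$\Gamma_\psi := \{\gamma \in \Gamma : R[\gamma] = \psi,\ \gamma^{(L-1)} \ne \gamma^{(L)}\},$$
where the second condition includes only trajectories γ whose final configuration appears only once (i.e., removing the final configuration changes the reduced sequence). Under dynamics by either the unbiased or the biased walk, the probability associated with $\Gamma_\psi$ is easy to calculate: summing over the waiting times at each position leaves only the probability of the direction of each step, so
$$\Pr_{P_u}[\Gamma_\psi] = 2^{-L[\psi]}, \qquad \Pr_{P_b}[\Gamma_\psi] = \prod_{t=1}^{L[\psi]} \frac{q^{2\delta_t}}{q^2+1}, \quad \text{(E34)}$$
where $\delta_t = 1$ if $\psi^{(t)} = \psi^{(t-1)} - 1$ and $\delta_t = 0$ otherwise. Finally, we define the following subsets of Ψ:
$$\Psi_x := \{\psi \in \Psi : \psi^{(0)} = x\}, \qquad \Lambda_x := \Psi^{x-1}_{x|n-x+1} \cup \Psi^{n-x+1}_{x|x-1}, \qquad \Xi_w := \{\psi \in \Psi_w : w-1 \notin \psi,\ n-w+1 \notin \psi\},$$
where the subset $\Psi^c_{a|b}$ is defined as
$$\Psi^c_{a|b} := \{\psi \in \Psi^c_a : b \notin \psi\}.$$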
In words, the set $\Psi^w_{x|z}$ contains reduced walks that begin at x and end at w without ever reaching z. Thus $\Lambda_x$ is the set of reduced walks that start at x and either end at x − 1 without ever reaching n − x + 1, or end at n − x + 1 without ever reaching x − 1. The set $\Xi_w$ is the set of reduced walks of any finite length that start at w but never reach either w − 1 or n − w + 1.
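The verbal definitions above can be turned into membership tests on reduced walks stored as Python lists. In this sketch the "end at w" condition is read, following the convention for $\Gamma^w$, as "first reaches w on the last step"; the function names are hypothetical.

```python
def in_psi(psi, start, end, avoid):
    """psi in Psi^{end}_{start|avoid}: starts at start, first reaches end on its
    last step, and never visits avoid."""
    return (psi[0] == start and psi[-1] == end
            and end not in psi[:-1] and avoid not in psi)

def in_lambda(psi, x, n):
    """psi in Lambda_x: exits the window [x, n - x] on the low or the high side."""
    return (in_psi(psi, x, x - 1, n - x + 1)
            or in_psi(psi, x, n - x + 1, x - 1))

def in_xi(psi, w, n):
    """psi in Xi_w: starts at w and never reaches w - 1 or n - w + 1."""
    return psi[0] == w and (w - 1) not in psi and (n - w + 1) not in psi
```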

Upper bound proof
Theorem 11 (Theorem 2 from main text). For the complete-graph architecture with circuit size s on n qudits with local Hilbert space dimension q as long as s ≥ s*, where

Proof. In this proof, we work with expressions for the collision probability Z. It takes several steps to manipulate the original expression into the form we need, so we move back and forth between updating the expression and developing the tools needed to justify these updates. We start by expressing Z as an expectation over trajectories $\gamma \in \Gamma_x^*$ truncated to length s. This is equivalent to Eq. (B22), as follows. There are $\binom{n}{x}$ initial configurations with Hamming weight x, and generating a length-s trajectory beginning at x with the unbiased Markov chain is equivalent to randomly choosing a trajectory γ from $\Gamma_x^*$, which begins at x and ends at a fixed point (0 or n), with probability proportional to that assigned by the unbiased walk, and then truncating the walk to length s, denoted by $\gamma^{[s]}$. Then $R[\gamma^{[s]}]$ is the reduced trajectory, in which consecutive duplicates are removed, and $L[R[\gamma^{[s]}]]$ is the length of that reduced trajectory, or in other words, the total number of bit flips that have occurred within the first s time steps.

Now we examine the final expression, Eq. (E54). The difference between Z and $Z_H$ is a sum over $\Psi_x$, which includes all reduced paths φ that start at x and may or may not terminate at 0 or n. The event L[γ] > s occurs when completing this reduced path takes more than s time steps, so each summand is weighted by the probability that the path does not finish within s time steps. As a sanity check, when s becomes infinite, we expect this probability to vanish for every path, as there is enough time for any path to finish, and in this case $Z = Z_H$, as expected. This expression represents progress because we can bound the probability that a given path is not completed using a Chernoff bound.
For any random variable X and any constant a > 0, the Chernoff bound states that
$$\Pr[X \ge k] \le \mathbb{E}[e^{aX}]\, e^{-ak}.$$
We use this bound with X = L[γ], k = s, and yet-to-be-specified constants $a_\phi > 0$, which yields Eq. (E56). The Chernoff bound has the additional benefit that $\mathbb{E}[e^{aX}]$ factorizes when X is a sum of independent random variables. In particular, once φ is fixed, L[γ] is the sum of geometrically (that is, discrete-exponentially) distributed random variables corresponding to how many time steps the path γ waits at each position along the reduced path φ. This is seen formally by noting that
$$\phi = (\phi^{(0)}, \phi^{(1)}) \cdot (\phi^{(1)}, \phi^{(2)}) \cdots (\phi^{(L-1)}, \phi^{(L)}),$$
and, for any collection of subsets $\Upsilon_m$ that can be concatenated, the length of a trajectory drawn from $\Upsilon_1 \cdot \Upsilon_2 \cdots$ is a sum of independent lengths drawn from the individual $\Upsilon_m$. For any r = 0, ..., L[φ] − 1, we can then evaluate the moment generating function of the waiting time at $\phi^{(r)}$ exactly in terms of $\lambda_v$, the expected amount of time the walk waits at Hamming weight v before moving to v + 1 or v − 1, and hence bound $\mathbb{E}[e^{a_\phi L[\gamma]}]$ by a product of such factors over the steps of φ. We have made some progress toward evaluating the bound on Z, but at this point it remains unclear how to perform the sum over paths $\phi \in \Psi_x$. To do so, we first decompose paths φ into a series of subpaths that inch closer and closer to the fixed points at 0 and n. In particular, we decompose a path φ as a concatenation of subpaths drawn from $\Lambda_v$ for various v and one final subpath drawn from $\Xi_w$, as described in the following lemma. Recall from Eqs. (E35) and (E36) that these subsets of Ψ are defined by where they start, where they end, and/or the maximum or minimum point they reach.
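To see the Chernoff step in action, the sketch below bounds the probability that a reduced walk from Hamming weight 5 down to 0 (with n = 20) is not completed within s steps, and compares the bound with the exact tail. It assumes that each waiting time W at weight v is geometric on {1, 2, ...} with mean $\lambda_v = n(n-1)/(2v(n-v))$, so that $\mathbb{E}[e^{aW}] = 1/(1 - \lambda_v(1 - e^{-a}))$ whenever $\lambda_v(1 - e^{-a}) < 1$. That convention and all function names are our assumptions, not the paper's definitions.

```python
import math

def mean_wait(v, n):
    """lambda_v = n(n-1) / (2 v (n - v)): expected steps spent at Hamming weight v."""
    return n * (n - 1) / (2 * v * (n - v))

def waiting_mgf(lam, a):
    """E[exp(a W)] for W geometric on {1, 2, ...} with mean lam (inf if divergent)."""
    x = lam * (1 - math.exp(-a))
    return math.inf if x >= 1 else 1 / (1 - x)

def chernoff_tail_bound(lams, s, a):
    """Pr[W_1 + ... + W_k >= s] <= exp(-a s) * prod_i E[exp(a W_i)]."""
    log_bound = -a * s
    for lam in lams:
        m = waiting_mgf(lam, a)
        if math.isinf(m):
            return math.inf
        log_bound += math.log(m)
    return math.exp(log_bound)

def exact_tail(lams, s):
    """Exact Pr[W_1 + ... + W_k >= s], by convolving geometric pmfs truncated below s."""
    dist = [1.0] + [0.0] * (s - 1)  # dist[k] = Pr[partial sum == k] for k < s
    for lam in lams:
        p = 1 / lam                 # per-step probability of leaving this weight
        new = [0.0] * s
        for k, mass in enumerate(dist):
            for w in range(1, s - k):
                new[k + w] += mass * (1 - p) ** (w - 1) * p
        dist = new
    return 1.0 - sum(dist)

lams = [mean_wait(v, 20) for v in range(5, 0, -1)]  # waits at weights 5, 4, 3, 2, 1
```

The constant a must stay below the threshold set by the slowest position (here v = 1, where $\lambda_1 = 10$); this is one way to see why $a_w$ is tied to how close φ comes to the fixed points.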
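Lemma 5. Suppose that $\phi \in \Psi_x$. Let $\tilde x := \min(x, n - x)$ and let $w := \min(m(\phi), n - M(\phi))$. Then there is a unique sequence of trajectories $(\phi_v)_{v=w}^{\tilde x}$ with $\phi_v \in \Lambda_v$ for $v = w+1, \ldots, \tilde x$ and $\phi_w \in \Xi_w$, such that
$$\phi = \alpha_{\tilde x} \cdot \alpha_{\tilde x - 1} \cdots \alpha_w,$$
where for each v either $\alpha_v = \phi_v$ or $\alpha_v = \bar\phi_v$, depending on whether $\alpha_{v+1}$ terminates at v or at n − v.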
Proof.Let r v be the minimum r such that φ (r) = v or φ (r) = n − v. Then for each v = w + 1, . . ., x, we can define and Then, each α v begins at either v or n − v and terminates upon reaching either v − 1 or n − v + 1 for the first time.
Hence it is a member of $\Lambda_v$ or of the flipped set $\bar\Lambda_v$, but not both. Finally, $\alpha_w$ or its flip $\bar\alpha_w$ is a member of $\Xi_w$, because $\alpha_w$ begins at w or n − w and never reaches either w − 1 or n − w + 1, since this would contradict the definition of w.
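The construction in this proof is directly algorithmic. The sketch below computes w, the indices $r_v$, and the pieces $\alpha_v$ for a reduced walk stored as a Python list; consecutive pieces share their endpoint, as in the concatenation $\gamma_1 \cdot \gamma_2$. The function name and return format are hypothetical.

```python
def decompose(phi, n):
    """Split a reduced walk phi into the pieces alpha_v of Lemma 5.

    Returns (w, pieces) with pieces[v] = alpha_v for v = w, ..., min(x, n - x).
    """
    x_tilde = min(phi[0], n - phi[0])
    w = min(min(phi), n - max(phi))  # w = min(m(phi), n - M(phi))
    # r[v]: first index at which phi visits v or n - v (always exists for w <= v <= x_tilde).
    r = {v: next(i for i, y in enumerate(phi) if y in (v, n - v))
         for v in range(w, x_tilde + 1)}
    # alpha_v runs from the first visit of {v, n - v} to the first visit of {v - 1, n - v + 1}.
    pieces = {v: phi[r[v]:r[v - 1] + 1] for v in range(w + 1, x_tilde + 1)}
    pieces[w] = phi[r[w]:]           # final piece, a member of Xi_w or its flip
    return w, pieces

w, pieces = decompose([4, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 3], 10)
# w == 2; pieces[4] == [4, 3], pieces[3] == [3, 4, 5, 6, 7, 6, 5, 4, 3, 2], pieces[2] == [2, 3]
```

Gluing the pieces back together at their shared endpoints recovers φ, which is the uniqueness statement of the lemma in computational form.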
We will use the notation $\tilde v := \min(v, n - v)$ for any integer v throughout the remainder of the proof. The above lemma allows us to replace the sum over $\phi \in \Psi_x$ with a sum over w from 0 to $\tilde x$ and sums over the subpaths $\phi_v$; each factor in the product over the steps of φ can be collected within just one of these sums. Moreover, these products are invariant under flipping the path ($v \mapsto n - v$), i.e., for any function f,
$$\prod_r f\big(\widetilde{\phi^{(r)}}\big) = \prod_r f\big(\widetilde{\bar\phi^{(r)}}\big),$$
so it is unimportant whether $\alpha_v$ equals $\phi_v$ or $\bar\phi_v$, as both yield the same result. We choose $a_\phi$ so that it depends only on $w = \min(m(\phi), n - M(\phi))$, and denote it henceforth by $a_w$. Collecting these observations, and noting that the L[φ] factors of $q/(q^2+1)$ can each be allocated to one of the steps taken in φ, we arrive at Eq. (E68), a bound on $(q+1)^n (Z - Z_H)$ up to a q-dependent prefactor. In the final line of that derivation, we used the fact that, by definition of $\phi_v$, $v \le \phi_v^{(r)} \le n - v$ for all $r < L[\phi_v]$, and also that $\lambda_v \ge \lambda_{v'}$ whenever $\tilde v < \tilde v'$.
Proof of Theorem 12.The structure of the proof is very similar to Theorem 8 for general architectures.We use the framework of the biased random walk.Let x := γ (t) .The transition rule is such that x with probability 1 − 2x(n−x) n(n−1) x − 1 with probability 2x(n−x) n(n−1) x + 1 with probability 2x(n−x) n(n−1) and so ≥ x 1 − 2(q 2 − 1) n(q 2 + 1) .(E122) Definition 7 (Approximate designs).A probability distribution µ on U(q n ) is an ε-approximate unitary k-design if the k-fold channels obey For a given k, if ε = 0 we say that the distribution forms an exact k-design.
A weaker notion of approximate design involves the operator norm of the moment operators, sometimes referred to as the tensor product expander (TPE) condition. The vectorization isomorphism uniquely maps channels to operators, which lets us define the kth moment operator from the k-fold channel for a probability distribution µ on the unitary group $U(q^n)$ as
$$\hat\Phi^{(k)}_\mu := \mathrm{vec}\big(\Phi^{(k)}_\mu\big) = \int d\mu(U)\, U^{\otimes k} \otimes U^{*\otimes k}. \quad \text{(F3)}$$
For convenience, we denote $U^{\otimes k,k} := U^{\otimes k} \otimes U^{*\otimes k}$.
Definition 8 (Weak approximate designs). A probability distribution µ on $U(q^n)$ is a weak ε-approximate unitary k-design if the kth moment operators obey
$$\big\| \hat\Phi^{(k)}_\mu - \hat\Phi^{(k)}_H \big\|_\infty \le \varepsilon.$$

The expected collision probability for completely Haar-random unitaries is $Z_H = \mathbb{E}_H[Z] = 2/(q^n+1) \le 2/q^n$, so the Haar ensemble anti-concentrates with α = 1/2, as defined in Definition 4. Moreover, the collision probability
$$Z = \sum_x p_U(x)^2, \qquad p_U(x)^2 = |\langle x|U|1^n\rangle|^4,$$
is a second-moment quantity, so for an exact unitary 2-design µ we find
$$\mathbb{E}_\mu[Z] = \mathbb{E}_H[Z] = Z_H,$$
and thus µ also 1/2-anti-concentrates, where $\mathbb{E}_H[\cdot]$ denotes the expectation with respect to the Haar measure on the unitary group.
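The Haar value $Z_H = 2/(q^n+1)$ can be checked by Monte Carlo. When U is Haar-random, $U|1^n\rangle$ is a Haar-random pure state, so it suffices to sample normalized vectors of i.i.d. complex Gaussians. This substitution, the function names, and the sample size are our own choices for illustration, using only Python's standard library.

```python
import random

def haar_state_collision(d, rng):
    """sum_x |<x|psi>|^4 for one Haar-random pure state psi in dimension d."""
    # Normalizing a vector of i.i.d. complex Gaussians gives a Haar-random state.
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    norm2 = sum(abs(a) ** 2 for a in amps)
    return sum((abs(a) ** 2 / norm2) ** 2 for a in amps)

def mean_collision(q, n, samples=20000, seed=0):
    """Monte Carlo estimate of E_H[Z] for n qudits of local dimension q."""
    rng = random.Random(seed)
    d = q**n
    return sum(haar_state_collision(d, rng) for _ in range(samples)) / samples
```

For q = 2 and n = 3 the estimate should land close to 2/9.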
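Proposition 1. An ε-approximate 2-design µ with $\varepsilon = 1/q^{2n}$ has collision probability $Z = \mathbb{E}_\mu\big[\sum_x p_U(x)^2\big] \le 3/q^n$ and is thus a 1/3-anti-concentrator. Moreover, the same holds for a weak ε-approximate 2-design (TPE) µ with $\varepsilon = 1/q^{2n}$.

Proof. For an ε-approximate 2-design in diamond norm, we find, for each outcome x,
$$\begin{aligned} \mathbb{E}_\mu[p_U(x)^2] &= \mathrm{Tr}\Big[|x\rangle\langle x|^{\otimes 2}\, \big(\Phi^{(2)}_\mu - \Phi^{(2)}_H\big)\big(|1^n\rangle\langle 1^n|^{\otimes 2}\big)\Big] + \frac{2}{q^n(q^n+1)} \\ &\le \big\| |x\rangle\langle x|^{\otimes 2} \big\|_\infty \Big\| \big(\Phi^{(2)}_\mu - \Phi^{(2)}_H\big)\big(|1^n\rangle\langle 1^n|^{\otimes 2}\big) \Big\|_1 + \frac{2}{q^n(q^n+1)} \quad \text{(F9)} \\ &\le \frac{2}{q^n(q^n+1)} + \varepsilon, \quad \text{(F10)} \end{aligned}$$
where we wrote the difference in terms of the 2-fold channels, in the second-to-last line used Hölder's inequality, and in the last line used the definition of the diamond norm and the definition of an ε-approximate 2-design.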
Because the approximate design is defined in terms of the diamond norm, we must take the error to be exponentially small. Summing Eq. (F10) over the $q^n$ outcomes x gives $Z \le 2/(q^n+1) + q^n \varepsilon$, so for an approximate 2-design µ with $\varepsilon = 1/q^{2n}$, the collision probability is $Z \le 2/q^n + 1/q^n = 3/q^n$. Thus $1/q^{2n}$-approximate unitary 2-designs in diamond norm anti-concentrate with α = 1/3.
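The final arithmetic step can be checked exactly with rational numbers. A minimal sketch, assuming ε = 1/q^{2n} as in Proposition 1; the function name is hypothetical.

```python
from fractions import Fraction

def design_collision_bound(q, n):
    """Upper bound on Z for an eps-approximate 2-design with eps = q^(-2n)."""
    d = q**n
    eps = Fraction(1, d * d)
    per_outcome = Fraction(2, d * (d + 1)) + eps  # Eq. (F10) for a single outcome x
    return d * per_outcome                         # sum over the d = q^n outcomes
```

The result equals $2/(q^n+1) + 1/q^n$, which never exceeds $3/q^n$.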

TABLE I. Summary of results: upper and lower bounds on the circuit size $s_{\mathrm{AC}}$ at which anti-concentration is achieved for different random circuit architectures.