Models of quantum complexity growth

The concept of quantum complexity has far-reaching implications spanning theoretical computer science, quantum many-body physics, and high energy physics. The quantum complexity of a unitary transformation or quantum state is defined as the size of the shortest quantum computation that executes the unitary or prepares the state. It is reasonable to expect that the complexity of a quantum state governed by a chaotic many-body Hamiltonian grows linearly with time for a time that is exponential in the system size; however, because it is hard to rule out a short-cut that improves the efficiency of a computation, it is notoriously difficult to derive lower bounds on quantum complexity for particular unitaries or states without making additional assumptions. To go further, one may study more generic models of complexity growth. We provide a rigorous connection between complexity growth and unitary $k$-designs, ensembles which capture the randomness of the unitary group. This connection allows us to leverage existing results about design growth to draw conclusions about the growth of complexity. We prove that local random quantum circuits generate unitary transformations whose complexity grows linearly for a long time, mirroring the behavior one expects in chaotic quantum systems and verifying conjectures by Brown and Susskind. Moreover, our results apply under a strong definition of quantum complexity based on optimal distinguishing measurements.


Motivation and overview
The complexity of a computation is a measure of the resources needed to perform the computation. In a classical model of computation, the complexity of a Boolean function may be defined as the minimal number of elementary steps needed to evaluate the function. The precise number of steps needed depends on how the model is chosen, but this notion of complexity provides a useful way to quantify the hardness of a computational problem, because the scaling of the number of steps with the input size depends only weakly on the choice of model. By broad consensus, a computational task is considered to be feasible if its complexity grows no faster than a power of the input size, and intractable otherwise; using this criterion, all classical models of computation agree about which problems are (classically) "easy" and which ones are "hard." Likewise, we may separate computational tasks into those that are easy or hard for a quantum computer. The circuit model of quantum computation provides a convenient way to quantify quantum complexity; namely, the quantum complexity of a Boolean function is the minimal size of a quantum circuit which computes the function and outputs the right answer with high success probability. Here by the size of the circuit we mean the number of quantum gates in the circuit. These gates are chosen from a universal set of gates, where each gate in the set is a unitary transformation acting on a constant number of qubits or qudits.
Though there are many ways to choose the universal gate set, any set of universal gates can simulate another accurately and efficiently, so that circuit size provides a useful model-independent measure of complexity. From a physicist's perspective, a quantum computation is a process governed by a local time-dependent Hamiltonian, and an intractable computation is a process that requires a time which grows superpolynomially with the system size. Such intractable processes are not expected to be observed in Nature.
Furthermore, in quantum physics, in contrast to classical digital computation, there is a meaningful notion of complexity not only for processes, but also for quantum states. Starting from a state in which all the bits are set to 0, any string of n classical bits can be prepared by flipping at most n bits. But the time needed to prepare a pure n-qubit quantum state, starting from a product state, even if we are permitted to use any time-dependent Hamiltonian which is a sum of terms with constant weight and bounded norm, can be exponential in n. In fact, because the volume of the n-qubit Hilbert space is doubly exponential in n, while the number of quantum circuits with T gates is merely exponential in T , most n-qubit pure quantum states have exponentially large complexity. That is, for a typical pure state in the n-qubit Hilbert space, the time needed to prepare the state with some small constant error δ, starting from a product state, grows exponentially with n. Thus, nearly all quantum states of any macroscopic system will forever be far beyond the grasp of the quantum engineers [1].
While the complexity of quantum circuits has long been a foundational concept in quantum information theory [2], appreciation that quantum state complexity is an important concept has blossomed relatively recently. For example, the complexity of ground-state wave functions may be used to classify topological phases of matter at zero temperature [3]. Furthermore, a chaotic quantum Hamiltonian H can be usefully characterized by saying that evolution governed by H over a long time period generates highly complex states. A particularly intriguing proposal is that, in the context of the AdS/CFT correspondence, the complexity of a quantum state of the boundary theory corresponds to the volume in the bulk geometry which is hidden behind the event horizon of a black hole [4][5][6][7].
When we say a quantum state is highly complex, we mean there is no easy way to prepare the state, but how can we be sure? Perhaps we were not clever enough to think of an ingenious short-cut that prepares the state efficiently. It's not possible in practice to enumerate all the quantum circuits that approximate a specified state to find one of minimal size. For that reason, it is quite difficult to obtain a useful lower bound on the complexity of the quantum state prepared by a specified many-body Hamiltonian in a specified time. It is reasonable to expect that, for a chaotic Hamiltonian H and an unentangled initial state, the complexity grows linearly in time for an exponentially long time, but we do not have the tools to prove it from first principles for any particular H.
One possible approach is to rely on highly plausible complexity theory assumptions to derive nontrivial conclusions about the complexity of states generated by particular circuits or Hamiltonians [8][9][10]. Another is to consider ensembles of circuits, and to derive lower bounds on complexity which hold with high probability when samples are selected from these ensembles. We follow the latter approach here, drawing inspiration from recent work by Susskind [8] and Brown and Susskind [7]. These authors state a conjecture about the complexity growth of geometrically local random quantum circuits (see Figure 1):
Conjecture 1 (Brown, Susskind [7]; Susskind [8]). Most local random circuits of size T have a complexity that scales linearly in T for an exponentially long time.
Figure 1. Expected complexity growth in random circuits (plot of complexity versus circuit size (time), saturating at exp(Ω(n))). Conjecture 1 states that, for random quantum circuits acting on n qubits, the circuit complexity grows linearly with circuit size (time) until it saturates at a value exponentially large in n. Our work provides rigorous evidence supporting this picture for quantum systems with sufficiently large local dimension; see Corollary 2.
Our goal is to strengthen the evidence supporting this conjecture.
Brown and Susskind provided evidence for this scaling law by means of a simple counting argument; see also [11]. For a fixed finite set of universal quantum gates, consider the ensemble of all circuits with size T. By definition, this ensemble accurately approximates (to within a specified error δ) all unitary transformations with complexity T or less. Furthermore, the number of circuits increases exponentially with T, and, because the unitary group has a very large volume, it seems reasonable to assume that "collisions" between circuits are rare unless T is very large; that is, that the number of distinct unitary transformations realized by this ensemble (where "distinct" means more than distance δ apart) is comparable to the number of circuits. This means that the number of circuits with size T' is too small to account for more than a small fraction of the unitary transformations realized by circuits of size T if T' is much smaller than T. In other words, most random circuits with size T have complexity at least T', where T' is comparable to T.
This argument hinges on a crucial assumption, which sounds plausible but is hard to prove: collisions between circuits of subexponential size are rare. Collisions certainly occur for any circuit size T , and necessarily become common for circuits of exponential size, where T is comparable to the Hilbert space dimension so that the exponential of T is comparable to the Hilbert space volume. Thus an analytic treatment of complexity growth seems like a daunting combinatorial task.
The work [12] provides some rigorous support for Conjecture 1. There, the authors show that local random circuits can "fool" short measurement procedures. That is, a typical quantum state prepared by the local random circuit, acting on an initial product state, cannot be distinguished from a maximally mixed state by any procedure that is much simpler than running the circuit backwards and verifying that the initial product state is recovered. Although not stated in this fashion in [12], their results imply that, with high probability, a local random circuit of size T has complexity $\Omega(T^{1/11})$. While this argument rigorously proves a weakened version of Conjecture 1, there are still issues we wish to address:
(i) Restricted notion of complexity: The authors implicitly define complexity as the capability of fooling short measurement protocols. While this operational notion of complexity is well motivated, the actual measurement procedures considered are quite restrictive. In particular, they do not take into account ancilla-assisted measurements, a mainstay of modern quantum information.
(ii) Collisions are not treated explicitly: The ensemble of local random circuits of size T defines a probability distribution on the n-qubit unitaries. If we are only interested in specifying unitary transformations up to some specified error δ, collisions occur, so that some unitaries are more likely than others. The arguments in [12] show that the unitaries sampled from this distribution typically have complexity $\Omega(T^{1/11})$, but do not rule out the possibility that the distribution is highly nonuniform. It is at least a logical possibility, compatible with the findings of [12], that the ensemble contains only a small number of unitaries which have high complexity, all of which occur with relatively high probability. To conclude that the ensemble contains many high-complexity unitaries, we need to know more about the properties of the probability distribution governing the ensemble.
(iii) Polynomial relation between circuit size and complexity: The relation between circuit size T and expected minimal complexity $T^{1/11}$ is polynomial, not (yet) linear as required by Conjecture 1.
In this work we make progress toward a rigorous proof of Conjecture 1 by developing a general framework which addresses some of the shortcomings of the previously known rigorous evidence in favor of the conjecture [12]. In particular, we define and use a strong notion of complexity, which captures the difficulty of distinguishing a given circuit from the most useless possible quantum channel: the completely depolarizing channel $\mathcal{D}(\rho) = \frac{\mathrm{Tr}(\rho)}{d} I$ that maps any state to the maximally mixed state.
Definition 1 (strong complexity: informal definition). The complexity of a quantum circuit U is the minimal circuit size required to implement an ancilla-assisted measurement that is capable of distinguishing $\rho \mapsto U \rho U^\dagger$ from the completely depolarizing channel $\rho \mapsto \frac{1}{d} I$.
We refer to Sec. 2.1 for a more detailed motivation and a precise statement of this definition. For now, we emphasize that this strong definition of complexity implies other (weaker) definitions, such as the minimal circuit size required to approximate U. Our first main contribution is a rigorous connection between complexity growth and the notion of approximate unitary k-designs [13,14]. We use the notation $\{p_i, U_i\}$ for an ensemble of unitary transformations in which the unitary $U_i$ occurs with probability $p_i$. A unitary k-design is an ensemble with strong pseudo-random properties; an approximate k-design accurately approximates the first k moments of the Haar measure on the unitary group. Hence a k-design with large k behaves essentially like a Haar-random ensemble of unitaries, while a small-k design can be highly structured. For instance, the n-qubit Pauli group forms a 1-design, while the n-qubit Clifford group is a 3-design [15][16][17]. The design order k allows us to interpolate between these two very different regimes. Intuitively, we would expect that the complexity of a k-design grows with k. Our first technical contribution makes this intuition precise: a linear growth in design implies a linear growth in (strong) complexity.
Theorem 1 (informal statement). Let $\{p_i, U_i\}$ be an approximate unitary k-design. Then, a randomly selected (according to the weights) element is very likely to have strong circuit complexity ≈ k.
We refer to Theorem 3 for a more detailed, quantitative statement. This result strengthens the assertions in [12] by allowing ancilla-assisted measurement procedures. To do so we prove novel bounds on Haar moments; see Sec. 2.4 for details. Our second technical contribution shows that the k-design property alone severely limits the likelihood of collisions.
Lemma 1. Let $\{p_i, U_i\}$ be an approximate k-design. Then, the associated weight distribution cannot be too spiky: […]
This result formalizes the intuitive idea that giving unusually high weight to some unitaries cannot be compatible with the k-design property, but we are not aware of any precise statements along these lines in the existing literature. Importantly, because Lemma 1 establishes that the distribution is nearly flat, knowing that sampling from a k-design yields a high-complexity unitary with high probability (as stated in Theorem 1) allows us to infer that there must be many distinct high-complexity unitaries in the ensemble. Here our reasoning is based on an approximate version of Laplace's definition of probability: if events are assigned nearly equal probabilities, then the probability of property X is approximately the number of events with property X divided by the total number of events. Together, Theorem 1 and Lemma 1 imply the following corollary:
Corollary 1. Any approximate k-design contains exponentially many (in k) unitaries that have circuit complexity Ω(k).
While Corollary 1 does not by itself strongly constrain how these high-complexity unitary transformations are distributed geometrically within the n-qudit unitary group, we are also able to prove a stronger result: An approximate k-design contains exponentially many (in k) high-complexity unitaries whose pairwise distance (i.e. the distance between any pair of unitaries) is almost maximal in the diamond norm. This stronger statement rules out the possibility that most of the high-complexity unitaries reside inside a few tightly packed clusters within U(d).
Approximate unitary k-designs are a central concept in quantum information, where their pseudo-random properties have found extensive application across subfields, e.g. state distinguishability [18], decoupling [19], state tomography [20,21], randomized benchmarking [22], equilibration [12] (and references therein), information scrambling [11,23], and many more. As a result, several probabilistic constructions are known. Applying Corollary 1 to any of these constructions establishes a rigorous model for quantum complexity growth. In particular,
• Local random quantum circuits with polynomial design growth: Ref. [12] proves that the set of all geometrically local circuits of size $T = O(n^2 k^{11})$ forms an approximate unitary k-design. Corollary 1 therefore implies that local circuits of size T contain at least $\exp(\Omega(T^{1/11}))$ elements with strong complexity $\Omega(T^{1/11})$.
• Stochastic quantum Hamiltonians with polynomial design growth: One can study the growth of complexity in continuous-time models of chaotic dynamics, rather than the discrete-time dynamics embodied by random circuits [24][25][26]. Stochastic Hamiltonian dynamics, in which a local Hamiltonian fluctuates as a function of time, has been shown to realize approximate k-designs [25] with a relationship between k and the evolution time similar to what was established in [12] for local random circuits. Further progress achieved in [26] shows that, for a particular class of stochastic Hamiltonians, evolution time linear in k suffices to generate approximate k-designs for $k = o(\sqrt{n})$. Corollary 1 therefore implies that with high probability the complexity grows linearly in time, at least for a while.
• Local random circuits with linear design growth: Very recently, one of the authors substantially improved on the results of [12] using an exact mapping from random circuits to the statistical mechanics of a lattice model [27], showing that local circuits of size $T = O(n^2 k)$ form approximate k-designs in the limit of large local dimension (Hilbert space dimension $d = q^n$ with q large). Combined with Corollary 1 this establishes a linear relation between circuit size and complexity. Thus we can prove the following statement analogous to Conjecture 1: The set of all local circuits of size T contains at least $\exp(\Omega(T))$ elements with strong complexity $\Omega(T)$, provided that the local dimension is sufficiently large: $q \geq q_0(T)$.
More precise statements of our main results, and a more detailed comparison to previous work, can be found in Sec. 2.
We consider systems comprised of n qudits with local dimension q: $d = q^n$. Existing works on complexity typically start by identifying a class of states that are useful starting states for quantum computations. In this work we take the reverse approach and start by identifying a useless state. The maximally mixed state $\rho_0 = \frac{1}{d} I$ is unique in the sense that it is invariant under arbitrary unitary evolutions, including any quantum circuit. Intuitively, useful starting states should be as far away from this useless state as possible. If we measure distance in trace norm, this intuition is true to some extent. Any pure state $|\psi\rangle\langle\psi|$ obeys $\frac{1}{2} \big\| |\psi\rangle\langle\psi| - \rho_0 \big\|_1 = 1 - \frac{1}{d}$. But this is clearly too coarse for distinguishing the usefulness of different states. In order to achieve such a task, we recall the operational interpretation of the trace distance. It corresponds to the optimal bias achievable in distinguishing the state $|\psi\rangle\langle\psi|$ from $\rho_0$ using a single measurement [28,29]. We refer to Sec. 6.1.2 for a more detailed exposition. The optimal measurement achieving this bias is $M = |\psi\rangle\langle\psi|$ and does depend on the state in question. Such a measurement may be challenging to implement for states that we would intuitively assign a high complexity to (such as random states) and very easy for states that we consider useful (such as computational basis states). We can interpolate between these extreme cases by limiting the resources available to implement distinguishing measurements. Let $H_d$ denote the space of d × d Hermitian matrices. For fixed r ∈ N, we consider the class of measurements $M_r(d) \subset H_d$ that can be implemented by combining (at most) r 2-local gates from a fixed, universal gate set $G \subset U(4)$. We refer to Sec. 6.2.2 for further details and justification.
The maximal bias achievable with such a restricted set of measurements is the solution to the following optimization problem: maximize $\mathrm{Tr}\big(M (|\psi\rangle\langle\psi| - \rho_0)\big)$ subject to $M \in M_r(d)$. We may decompose the true optimal measurement as $|\psi\rangle\langle\psi| = U |0\rangle\langle 0| U^\dagger$ for some $U \in U(d)$.
The unitary U may be approximated to arbitrary precision by 2-local circuits chosen from a universal gate set [30]. This ensures that the restricted maximum converges to the unrestricted value $1 - \frac{1}{d}$ as $r \to \infty$. For simple states, like computational basis states, this convergence happens rapidly, while generic states $|\psi\rangle$ require exponentially large circuit depths. This observation is the motivation for the following definition of complexity.
Figure 2. Pictographic illustration of strong state complexity (Definition 2). A black box either outputs a (known) pure state $\rho = |\psi\rangle\langle\psi|$, or the maximally mixed state $\rho_0 = \frac{1}{d} I$. The task is to correctly guess which one it produced by applying a pre-processing circuit V (blue) of limited size r and performing a simple measurement (right). We say that $|\psi\rangle$ has strong state complexity less than r if the probability of correctly distinguishing both possibilities is close to optimal.
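Definition 2 (Strong state complexity). Fix r ∈ N and δ ∈ (0, 1). We say that a pure state $|\psi\rangle$ has strong δ-state complexity at most r if $\max_{M \in M_r(d)} \mathrm{Tr}\big(M (|\psi\rangle\langle\psi| - \rho_0)\big) \geq 1 - \frac{1}{d} - \delta$, which we denote as $C_\delta(|\psi\rangle) \leq r$.
This definition has a ready operational interpretation that is illustrated in Figure 2. The following result relates it to more traditional definitions.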
The converse is false in general. To see this, select a generic state $|\tilde\psi\rangle$ on (n − 1) qudits and set $|\psi\rangle = |0\rangle \otimes |\tilde\psi\rangle$. Then, the quantity in Eq. (2.6) is dominated by the (traditional) complexity of $|\tilde\psi\rangle$, which may be very high. Nonetheless, the simple distinguishing measurement $M = |0\rangle\langle 0| \otimes I$ (ignore everything but the first qudit) achieves $\mathrm{Tr}\big(M (|\psi\rangle\langle\psi| - \rho_0)\big) = 1 - \frac{1}{q}$, which is high, especially for large local dimension q. This example highlights that Definition 2 is indeed more stringent than traditional definitions of state complexity.
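The counterexample above can be checked numerically. The following is a minimal sketch, assuming small illustrative values of q and n and a Gaussian-sampled stand-in for the "generic" state on n − 1 qudits; all variable names are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical small parameters chosen only for illustration.
q, n = 3, 3
d = q ** n
rng = np.random.default_rng(2)

# Stand-in for a generic state on the last n-1 qudits (normalized Gaussian vector).
tail = rng.normal(size=q ** (n - 1)) + 1j * rng.normal(size=q ** (n - 1))
tail /= np.linalg.norm(tail)

# |psi> = |0> (x) |psi_tilde>
zero = np.zeros(q)
zero[0] = 1.0
psi = np.kron(zero, tail)

# Measurement that looks only at the first qudit: M = |0><0| (x) I.
M = np.kron(np.outer(zero, zero), np.eye(q ** (n - 1)))
rho0 = np.eye(d) / d  # maximally mixed state

# Bias Tr(M (|psi><psi| - rho0)) achieved by this simple measurement.
bias = np.real(np.trace(M @ (np.outer(psi, psi.conj()) - rho0)))
```

The bias equals 1 − 1/q regardless of how complicated the tail state is, because the measurement never touches the last n − 1 qudits.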
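Proof of Lemma 2. By contraposition. Let $G_r \subset U(d)$ denote the class of unitary circuits that are comprised of at most r 2-local gates chosen from a universal gate set G. Suppose there exists a size-r circuit $V \in G_r$ such that $\frac{1}{2} \big\| |\psi\rangle\langle\psi| - V|0\rangle\langle 0|V^\dagger \big\|_1 \leq \sqrt{\delta}$. The state difference in question has rank two, which allows for explicitly computing the trace distance: $\frac{1}{2} \big\| |\psi\rangle\langle\psi| - V|0\rangle\langle 0|V^\dagger \big\|_1 = \sqrt{1 - |\langle 0|V^\dagger|\psi\rangle|^2}$. The assumption is therefore equivalent to $|\langle 0|V^\dagger|\psi\rangle|^2 \geq 1 - \delta$ and we conclude $\mathrm{Tr}\big(V|0\rangle\langle 0|V^\dagger (|\psi\rangle\langle\psi| - \rho_0)\big) = |\langle 0|V^\dagger|\psi\rangle|^2 - \frac{1}{d} \geq 1 - \frac{1}{d} - \delta$, because $V|0\rangle\langle 0|V^\dagger \in M_r$. This in turn implies $C_\delta(|\psi\rangle) \leq r$ and the claim follows.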
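The two trace-distance identities used in this subsection are standard facts and can be sanity-checked numerically: for pure states, half the trace norm of the difference equals the square root of one minus the squared overlap, and a pure state sits at trace distance 1 − 1/d from the maximally mixed state. The sketch below assumes a small dimension d = 8 and Gaussian-sampled test states; the helper names are illustrative only.

```python
import numpy as np

def trace_distance(rho, sigma):
    """Half the trace norm of rho - sigma, for Hermitian inputs."""
    eigenvalues = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * np.sum(np.abs(eigenvalues))

def random_pure_state(d, rng):
    """Normalized complex Gaussian vector, used here as a generic test state."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

d = 8  # illustrative dimension
rng = np.random.default_rng(1)
psi = random_pure_state(d, rng)
phi = random_pure_state(d, rng)
proj_psi = np.outer(psi, psi.conj())
proj_phi = np.outer(phi, phi.conj())

# Rank-two case: the distance between two pure states depends only on their overlap.
overlap_sq = abs(np.vdot(phi, psi)) ** 2
pure_vs_pure = trace_distance(proj_psi, proj_phi)

# Pure state versus the maximally mixed state.
pure_vs_mixed = trace_distance(proj_psi, np.eye(d) / d)
```

Comparing `pure_vs_pure` with `np.sqrt(1 - overlap_sq)` and `pure_vs_mixed` with `1 - 1/d` reproduces both identities up to floating-point error.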

Unitary complexity
We define the complexity of unitary channels $\mathcal{U}(\rho) = U \rho U^\dagger$ in a fashion similar to state complexity. We start by identifying the completely depolarizing channel as the most useless channel conceivable: $\mathcal{D}(\rho) = \frac{\mathrm{Tr}(\rho)}{d} I$. The diamond distance between $\mathcal{D}$ and any unitary channel is close to maximal: $\frac{1}{2} \| \mathcal{U} - \mathcal{D} \|_\diamond = 1 - \frac{1}{d^2}$. (2.10) As detailed in Sec. 6.1.3, the diamond distance also has an appealing operational definition [31]. It corresponds to the maximal bias achievable for the task of distinguishing the channel $\mathcal{U}$ from $\mathcal{D}$ with a single channel use. The optimal strategy may involve a quantum memory. Choose a state in the doubled Hilbert space $|\phi\rangle\langle\phi|$, with $|\phi\rangle \in \mathbb{C}^d \otimes \mathbb{C}^d$, and input one half into the unknown channel, while the other half remains unchanged in the quantum memory. Subsequently, perform a two-outcome measurement on the output state to distinguish both channels. An optimal strategy for distinguishing $\mathcal{U}$ from $\mathcal{D}$ corresponds to choosing a maximally entangled (Bell) state $|\Omega\rangle \in \mathbb{C}^d \otimes \mathbb{C}^d$ as input and measuring $M = (U \otimes I)|\Omega\rangle\langle\Omega|(U^\dagger \otimes I)$.
Equivalently, choose $(U^\dagger \otimes I)|\Omega\rangle$ as input and measure $M = |\Omega\rangle\langle\Omega|$ on the output. Similar to the state complexity argument, the optimal input state, or the optimal outcome measurement (or both), depend on the unitary $U \in U(d)$ describing the channel $\mathcal{U}$. This may be challenging to implement, especially if U corresponds to a complicated circuit. We restrict apparatus power by bounding the total circuit sizes that are allowed to implement such a measurement procedure. Let $G_r \subset U(d^2)$ be the set of all unitary circuits on 2n qudits (register+memory) that are comprised of at most r elementary gates. Likewise, let $M_r \subset H_d^{\otimes 2}$ denote the class of all two-outcome measurements on 2n qudits that require circuit size at most r to implement. The optimal bias achievable under such restrictions is $\beta_{\mathrm{qc}}(r, U) = $ maximize $\mathrm{Tr}\big(M \big((\mathcal{U} \otimes \mathcal{I})(|\phi\rangle\langle\phi|) - (\mathcal{D} \otimes \mathcal{I})(|\phi\rangle\langle\phi|)\big)\big)$, (2.11) where the maximization runs over measurements $M \in M_{r'}$ and input states $|\phi\rangle = V|0\rangle$ with $V \in G_{r''}$ such that $r' + r'' = r$, and where the identity channel $\mathcal{I} : H_d \to H_d$ indicates that the memory is left unchanged. As r increases, more complicated measurements and state preparations become possible. At some point this will include ever more precise approximations of the optimal measurement [30]: $\lim_{r \to \infty} \beta_{\mathrm{qc}}(r, U) = 1 - \frac{1}{d^2}$. Similar to the state case, the rate of convergence does depend on the complexity of the unknown unitary U. This is the basis for our operational definition of unitary complexity.
Definition 3 (Strong unitary complexity). Fix r ∈ N and δ ∈ (0, 1). We say that a unitary $U \in U(d)$ has strong δ-unitary complexity at most r if $\beta_{\mathrm{qc}}(r, U) \geq 1 - \frac{1}{d^2} - \delta$, which we denote as $C_\delta(U) \leq r$.
Figure 3. A black box (center) takes quantum states as inputs and applies either a unitary channel $\mathcal{U}(\rho) = U \rho U^\dagger$, or the depolarizing channel $\mathcal{D}(\rho) = \rho_0$. The task is to correctly guess which evolution occurred. The rules of the game allow short pre- and post-processing circuits (blue) that may involve a quantum memory. The final guess must be based on a simple measurement (right). We say that U has complexity less than $r = r' + r''$, the combined size of the pre- and post-processing circuits, if the probability of correctly distinguishing both options is close to optimal.
The operational motivation for this definition is sketched in Figure 3. Strong unitary complexity (Definition 3) is more stringent than traditional definitions that use approximation errors in some norm.
Lemma 3. Suppose that $U \in U(d)$ obeys $C_\delta(U) \geq r + 1$ for some δ ∈ (0, 1), r ∈ N and measurement procedures that include the Bell measurement $|\Omega\rangle\langle\Omega|$. Then, $\min_{\mathrm{size}(V) \leq r} \frac{1}{2} \| \mathcal{U} - \mathcal{V} \|_\diamond > \sqrt{\delta}$, (2.14) i.e. it is impossible to accurately approximate U by circuits comprised of fewer than r elementary gates.
Again, the converse relation is false in general. Including the Bell measurement |Ω Ω| simplifies the proof of this claim considerably. Also, relatively short circuits allow for transforming computational basis states into Bell states. For q = 2, a depth-two circuit comprised of n Hadamard gates and n CNOTs suffices.
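Both remarks can be illustrated for qubits in a few lines. The sketch below, assuming n = 2 register qubits plus n memory qubits and a Haar-sampled test unitary (hypothetical choices, as are all function names), first prepares the Bell state with n Hadamards followed by n CNOTs, and then evaluates the bias of the Bell-state strategy described before Definition 3.

```python
import numpy as np

HADAMARD = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def apply_hadamard(state, qubit):
    """Apply H to one qubit of a state stored as a tensor of shape (2,)*m."""
    moved = np.tensordot(HADAMARD, state, axes=([1], [qubit]))
    return np.moveaxis(moved, 0, qubit)

def apply_cnot(state, control, target):
    """Flip `target` on the slice of the tensor where `control` equals 1."""
    out = state.copy()
    index = [slice(None)] * state.ndim
    index[control] = 1
    # Fixing `control` removes one axis, shifting later axes down by one.
    axis = target - 1 if target > control else target
    out[tuple(index)] = np.flip(state[tuple(index)], axis=axis)
    return out

def prepare_bell_pairs(n):
    """Layer of n Hadamards on the register, then n CNOTs into the memory."""
    state = np.zeros((2,) * (2 * n))
    state[(0,) * (2 * n)] = 1.0
    for i in range(n):
        state = apply_hadamard(state, i)
    for i in range(n):
        state = apply_cnot(state, i, n + i)
    return state.reshape(-1)

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q_mat, r_mat = np.linalg.qr(z)
    phases = np.diag(r_mat) / np.abs(np.diag(r_mat))
    return q_mat * phases  # rescale columns so the distribution is Haar

n = 2
d = 2 ** n
omega = prepare_bell_pairs(n)
reference = np.eye(d).reshape(-1) / np.sqrt(d)  # (1/sqrt(d)) sum_x |x>|x>

U = haar_unitary(d, np.random.default_rng(3))
v = np.kron(U, np.eye(d)) @ omega
out_unitary = np.outer(v, v.conj())        # (U (x) I)|Omega><Omega|(U^dag (x) I)
out_depolarized = np.eye(d * d) / d ** 2   # (D (x) I)(|Omega><Omega|) = I/d (x) I/d

M = out_unitary  # optimal measurement for this input state
bias = np.real(np.trace(M @ (out_unitary - out_depolarized)))
```

The prepared vector matches the maximally entangled state, and the bias evaluates to 1 − 1/d², the threshold that appears in Definition 3 before subtracting δ.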

Proof of Lemma 3. By contraposition. Assume there exists
as the second expression involves a trace distance of two pure states which can be computed explicitly. Next, note that $M = (V \otimes I)|\Omega\rangle\langle\Omega|(V^\dagger \otimes I)$ is a legitimate distinguishing measurement, because size(V) ≤ r and $M = |\Omega\rangle\langle\Omega|$ is permitted by assumption. Together with (2.16)

Approximate unitary designs
The concept of unitary k-designs [13,14] provides an interpolation between two extreme cases:
(i) small collections of highly structured unitaries that form the basic building blocks of quantum computing devices (e.g. local Pauli gates, or elements of the Clifford group);
(ii) generic (Haar random) unitaries that lack any sort of structure and require circuits of exponential size to approximate.
Roughly speaking, an ensemble $\mathcal{E} = \{p_i, U_i\}$ of unitaries is a unitary k-design if it exactly reproduces the first k moments of the Haar measure over the unitary group. More precisely, given the twirling channels $T^{(k)}_{\mathcal{E}}(X) = \sum_i p_i\, U_i^{\otimes k} X (U_i^\dagger)^{\otimes k}$ and $T^{(k)}_{U}(X) = \int \mathrm{d}U\, U^{\otimes k} X (U^\dagger)^{\otimes k}$ (with the integral taken over the Haar measure), the ensemble is a k-design if $T^{(k)}_{\mathcal{E}}(X) = T^{(k)}_{U}(X)$ for all X in the k-fold tensor product. (2.17) Although seemingly abstract, this notion captures important physical concepts. 1-designs are in one-to-one correspondence with unitary operator frames, while 2-designs sufficiently capture the notion of scrambling [11,23]. Unitary k-designs are known to exist for any dimension d and any order k. Nevertheless, explicit constructions are notoriously difficult to find. This challenge can be overcome by relaxing the notion of a k-design. Indeed, for most applications it is sufficient to ensure that Eq. (2.17) is only approximately true; see Definition 4 for a precise statement. Several conventions for choosing an appropriate distance measure $\|\cdot\|$ have been put forth, but here we opt for the diamond distance $\|\cdot\|_\diamond$, which quantifies the distinguishability of two ensembles.
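For k = 1 the design condition can be verified by direct computation. The sketch below checks it for the n-qubit Pauli group, cited earlier as a 1-design, using the standard fact that the first Haar moment is the completely depolarizing map, E[U X U†] = Tr(X) I / d. The choice n = 2, the random test operator, and the function names are assumptions made for illustration.

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def pauli_twirl(A, n):
    """Uniform average of P A P^dagger over all 4^n Pauli strings.

    Global phases of the Pauli group cancel in P A P^dagger, so the
    phase-free strings suffice.
    """
    d = 2 ** n
    out = np.zeros((d, d), dtype=complex)
    for factors in itertools.product(PAULIS, repeat=n):
        P = factors[0]
        for f in factors[1:]:
            P = np.kron(P, f)
        out += P @ A @ P.conj().T
    return out / 4 ** n

def haar_first_moment(A):
    """First Haar moment of the twirl: Tr(A) I / d."""
    d = A.shape[0]
    return np.trace(A) * np.eye(d) / d

n = 2  # illustrative number of qubits
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
twirl_error = np.linalg.norm(pauli_twirl(A, n) - haar_first_moment(A))
```

The error vanishes up to floating-point precision for any operator A. The same test with two tensor copies (k = 2) would fail for the Pauli group, which is what separates it from higher-order designs such as the Clifford group.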
In contrast to exact k-designs, several explicit constructions for approximate k-designs have been established [12, 25-27, 32, 33], including local random circuits and various Brownian circuits/stochastic quantum Hamiltonians. These constructions allow us to relate abstract insights about complexity growth in designs to concrete random circuit models.
Theorem 2. Consider the set of (pure) states in $d = q^n$ dimensions that results from applying all unitaries associated with an ε-approximate 2k-design to a fixed starting state $|\psi_0\rangle$. Then, this set contains at least […] (2.18) distinct states that obey $C_\delta(|\psi\rangle) \geq r + 1$ each. Qualitatively, this number is of order $(d/k)^k$ as long as r obeys $r \lesssim \frac{k (n - 2\log(k))}{\log(n)}$. (2.19) Because of collisions, the emphasis on distinct is justified; two or more distinct unitaries can lead to the same final state.

Unitary complexity growth
Theorem 3. A discrete approximate 2k-design in $d = q^n$ dimensions contains at least […] distinct unitaries that obey $C_\delta(U) \geq r + 1$ each. Qualitatively, this number is of order $(d^2/k)^k$ as long as r obeys $r \lesssim \frac{k (n - 4\log(k))}{\log(n)}$. (2.20)

Moment bounds
Both Theorem 2 and Theorem 3 follow from an initial probabilistic statement combined with relatively straightforward counting arguments. These probabilistic statements highlight that it is very unlikely to distinguish random k-design elements from their average with a fixed measurement procedure. Markov's inequality, $\Pr[S \geq \tau] = \Pr[S^k \geq \tau^k] \leq \mathbb{E}[S^k] / \tau^k$ for nonnegative random variables S, reduces this probabilistic assertion to a question about moment growth. The larger the moments we can control, the stronger this assertion becomes. Designs appropriately capture this feature: a k-design accurately approximates Haar-random moments up to order k. This is why designs with growing k become increasingly complex. For state complexity, the associated Haar moment computation is relatively straightforward: […] for any fixed measurement M, see e.g. Corollary 6 below. However, such simple moments do not cover strong unitary complexity. Quantum channels allow for more sophisticated measurement procedures that render the associated Haar moment computations nontrivial. Our main technical contribution is a novel bound that addresses this setting.
This bound is considerably more general than existing ones in the literature and may be more generally applicable. Ref. [12], for instance, utilizes Eq. (2.21) only. We establish this result by combining Schur-Weyl duality [34,35] with Weingarten calculus [36,37] and auxiliary arguments from tensor network theory [38,39] and convex optimization [40,41]. We believe that the dimensional scaling in the final bound is tight, but there may be room for further improving the k-dependent constants. In particular, we do not know if the Catalan number is necessary, or merely an artifact of our proof technique.
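To illustrate how the Markov step at the start of this subsection sharpens with growing k, consider the simplest instance: a Haar-random state in dimension d and the rank-one measurement M = |0⟩⟨0|, so that S = |⟨0|ψ⟩|². The sketch below relies on two standard facts that are not quoted from the paper: the moments are E[S^k] = 1 / binom(d + k − 1, k), and the exact tail is Pr[S ≥ τ] = (1 − τ)^(d − 1). The values of d and τ and the function names are illustrative assumptions.

```python
from math import comb

def haar_overlap_moment(d, k):
    """E[|<0|psi>|^(2k)] for Haar-random |psi> in C^d (standard formula)."""
    return 1.0 / comb(d + k - 1, k)

def markov_tail_bound(d, k, tau):
    """Moment bound Pr[S >= tau] <= E[S^k] / tau^k for S = |<0|psi>|^2."""
    return haar_overlap_moment(d, k) / tau ** k

def exact_tail(d, tau):
    """Exact tail of S, which is Beta(1, d-1) distributed."""
    return (1.0 - tau) ** (d - 1)

d, tau = 64, 0.5  # illustrative values
orders = (1, 2, 4, 8, 16)
bounds = {k: markov_tail_bound(d, k, tau) for k in orders}
tail = exact_tail(d, tau)
```

For these parameters each bound remains above the exact tail, as Markov's inequality requires, while shrinking by many orders of magnitude between k = 1 and k = 16. An ensemble that matches only the first k Haar moments inherits only the order-k bound, which is the mechanism by which larger design order yields stronger statements.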

Relation to previous work
Quantum complexity has recently become a popular subject in high energy physics. A considerable amount of attention has been devoted to understanding the complexity accumulated after an exponentially long time. A joint work by Susskind and Aaronson [4,8,9] points to an intriguing connection to theoretical computer science: unless PSPACE ⊆ BQP/poly (a hypothetical relation between different computational complexity classes that is widely believed to be false), the circuit complexity of certain Hamiltonian evolutions U = exp(−iHt) achieves super-polynomial values for exponentially long time scales t. In a similar vein, Bohdanowicz and Brandão [10] constructed a family of Hamiltonians that provably achieves superpolynomial complexity in exponential time, unless PSPACE = BQP.
These arguments address late-time complexity and therefore do not yield insights regarding early-time complexity growth. In this regard, relations between complexity growth and approximate k-designs have recently been pointed out in [11,42]. Specifically, [11] defined a notion of the complexity of generating an ensemble of unitaries and gave a lower bound on the ensemble complexity in terms of the distance to forming a unitary design. They also argued that the lower bound of the complexity of a k-design is linear in k. Our arguments and results may be regarded as a substantial refinement of these ideas.
The notion of strong complexity put forward in our work has its conceptual roots in quantum information. Encompassing this mindset is the statement from Ref. [43]: "most states are too entangled to be useful as computational resources." At the core of this argument is the following observation. Haar-random pure states are so highly entangled that local measurements yield almost uniformly random outcomes. In turn, any quantum device that relies on local measurements and uses known, but Haar-random, states could be efficiently simulated by tossing classical coins! This prevents any genuine quantum advantage for computation.
Strong state complexity (Definition 2) may be thought of as a formal version of this observation. Measuring the maximally mixed state $\rho_0$ always results in a uniform outcome distribution. Moreover, Ref. [43] makes essential use of the fact that the measurements are constrained to be "simple" (in their case: local measurements augmented by classical post-processing). The core of their argument may be summarized as follows: low complexity measurements do not allow for distinguishing a Haar-random state from the maximally mixed state. We present a variant of this argument in Sec. 5.1 below.
While [43] only considers Haar-random pure states, similar arguments have been established for states that are less generic, see e.g. [12, Section 3]. Although not stated at this level of generality, [12, Corollary 10] effectively points out that states generated by approximate k-designs fool short quantum circuits: with high probability they cannot be distinguished from the maximally mixed state by means of any measurement with small circuit size. They also extend this result to circuits [12, Corollary 11]: with high probability, a randomly selected (according to the weights) k-design element cannot be approximated by any short circuit V in the sense that $\|U - V\|_\infty$ is small.
The second main result of our work, Theorem 3, improves upon this result in two ways. Firstly, the strong unitary complexity (Definition 3) is more stringent than their more traditional definition. While Theorem 3 does imply [12,Corollary 11], the converse is not necessarily true.
Secondly, and more importantly, both Corollary 10 and 11 in [12] are probabilistic. While this is enough to deduce average-case behavior, a strong quantitative statement about the number of k-design elements with high circuit complexity remains beyond the scope of these assertions. A worst-case caricature may help to illustrate this subtlety. Suppose that the weights accompanying a unitary k-design are extremely spiky: a single high-complexity unitary, say $U_1 \in U(d)$, is accompanied by an exceedingly large weight $p_1 \approx 1$, while all other design unitaries $U_i$ have low complexity and almost vanishing weights $p_i \approx 0$. Such a weight distribution would not contradict the assertion of [12, Corollary 11], because the single high-complexity unitary occurs with high probability (over the weights). Nonetheless, the hypothetical k-design contains only a single high-complexity element.
Here we overcome this issue by explicitly ruling out the possibility of such extreme cases ever occurring. The definition of an approximate k-design alone implies that the weights cannot be too spiky, see Lemma 1. This bound on the weights allows us to convert probabilistic (average case) statements into quantitative ones. Not only does the average circuit complexity grow linearly with the order k of an approximate design, the absolute number of distinct circuits that have high complexity must also grow exponentially with k.
Interest in state complexity has been stimulated by its potential role in quantum gravity and the AdS/CFT correspondence; see Sec. 4 for further discussion. Recently, the relevance to holographic duality of computational pseudorandomness has been emphasized. Specifically, the authors of [44] argue that one can construct two mixed quantum states on the boundary (A and B) such that both A and B can be efficiently prepared, yet A and B cannot be distinguished from maximally mixed states by polynomial-size quantum circuits. Furthermore, the corresponding bulk states (A and B ) can be distinguished efficiently from one another. This observation indicates that the holographic dictionary which relates bulk and boundary states must have high computational complexity.
We stress that this concept of pseudorandom quantum states, which can be efficiently prepared yet cannot be distinguished from random by computationally bounded observers, is applicable to mixed states, or ensembles of pure states, but not to individual pure quantum states. If a particular pure state can be prepared efficiently by a quantum circuit, that state can always be distinguished efficiently from a maximally mixed state by running the circuit backwards. An ensemble of pure states can be pseudorandom only if it contains superpolynomially many pure states, where the observer who draws a sample from the ensemble and attempts to distinguish this sampled state from a maximally mixed state has no information about which sample was chosen. In contrast, in our definition of complexity for pure states, the observer is permitted to use a different distinguishing circuit for each possible pure state. On the other hand, the existence of pseudorandom quantum states [45] indicates that, for mixed states, our definition of state complexity, namely the computational cost of distinguishing the state from a maximally mixed state, can differ substantially from another natural definition, the computational cost of preparing the state.

Complexity growth in random circuits
The rigorous statements put forward in Theorem 2 and Theorem 3 gain additional meaning when applied to concrete examples. The literature contains several proofs of design growth in random circuits. Combining these with our rigorous insights yields a number of concrete models for complexity growth.

Local random circuits
For concreteness, we focus here on systems comprised of n qubits, i.e. q = 2 and $d = 2^n$. Let G ⊂ U(4) be a (finite) universal gate set containing inverses, i.e. $g^{-1} = g^\dagger \in G$ whenever g ∈ G. We can generate G-local random circuits by sequentially applying a random gate g ∈ G to a randomly selected pair of neighboring qubits. Repeating this procedure independently for T steps results in random circuits of size T. We refer to the application of each gate as a time step, such that size T circuits are of depth T and have thus evolved to time T. Intuitively, the larger T, the more random the circuit becomes. A seminal result by Brandão, Harrow, and Horodecki confirms this intuition in a precise sense.
Theorem 5 (Corollary 7 in [12]). Fix $d = 2^n$, ε > 0, k ≤ √d, and let $G \subset U(d^2)$ be a universal gate set containing inverses. 2 Then, the set of all G-local random circuits of size T forms an ε-approximate k-design if We emphasize that the weights associated with each unitary in this ensemble are defined implicitly by this random procedure. Several different T-sized circuits may give rise to the same final unitary, say $U_1$, while another one, say $U_2$, may exclusively be obtained from a single circuit geometry. The weights associated with $U_1$ and $U_2$ take this behavior into account, i.e. $p_1 \geq 2p_2$ for our example. However, the fact that the entire ensemble still forms an approximate k-design limits potential fluctuations. This in turn imposes lower bounds on the minimal number of distinct unitaries and severely limits the potential for collisions. It cannot be too likely that two or more different random circuits coincide. These features were conjectured by Brown and Susskind [7, Sec. 6.5] who, in turn, base their counting argument that relates circuit size and complexity on an extreme version of this conjecture: collisions do not occur at all. One of the main results of this work is a rigorous proof of a weaker version of their conjectured relation between circuit size and complexity. It is an immediate consequence of Theorem 3 and Theorem 5.
Corollary 3 (Polynomial relation between circuit size and circuit complexity for local random circuits). Fix δ ∈ (0, 1), r ≤ 2 n/2 and set T ≥ Cn 2 log 2 (n)r n 11 . Then, the set of all G-local circuits of size T contains at leastC2 log(n)r unitaries that obey C δ (U ) > r. Here, C,C > 0 are constants that implicitly depend on δ and G.
This result establishes a polynomial relation between the length T of G-local circuits and the strong δ-unitary complexity that may be achieved in such a model. 3 The $r^{11}$ scaling of T is a consequence of Theorem 5, which features a similar relation between the degree 2k of an approximate 2k-design and the circuit size T required to implement it. This relation between complexity and circuit size can certainly be improved, as we discuss below, but there are fundamental limits: a lower bound on the design depth for random circuits is known. A converse result (Proposition 8 in [12]) states that for ε ≤ 1/4 and $k \leq d^{1/2}$, the size of random circuits on n qudits must be at least $T \geq \frac{2kn \log q}{q^4 \log k}$ to form an ε-approximate k-design.
See Sec. 7.10 for a rederivation of this claim.
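The G-local random circuit construction described in this subsection can be sketched numerically for a handful of qubits. The following is a minimal illustration, not the construction analyzed in [12]: the gate set `hypothetical_gate_set` (CNOT, H, T and T† embedded in U(4)) and the open-chain geometry are assumptions chosen for concreteness, and the function names are ours.

```python
import numpy as np

def hypothetical_gate_set():
    """A small two-qubit gate set closed under inverses (an assumed example)."""
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    T = np.diag([1, np.exp(1j * np.pi / 4)])
    I2 = np.eye(2, dtype=complex)
    CNOT = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=complex)
    gates = [CNOT]  # CNOT is its own inverse
    for g in (H, T, T.conj().T):  # H is self-inverse; T and T^dagger are paired
        gates.append(np.kron(g, I2))
        gates.append(np.kron(I2, g))
    return gates

def embed(gate, site, n):
    """Act with a 4x4 gate on qubits (site, site + 1) of an n-qubit chain."""
    left = np.eye(2 ** site)
    right = np.eye(2 ** (n - site - 2))
    return np.kron(np.kron(left, gate), right)

def local_random_circuit(n, T, rng):
    """Compose T random gates, each applied to a random neighboring pair."""
    gates = hypothetical_gate_set()
    U = np.eye(2 ** n, dtype=complex)
    for _ in range(T):
        g = gates[rng.integers(len(gates))]
        site = rng.integers(n - 1)  # open chain: pairs (0,1), ..., (n-2, n-1)
        U = embed(g, site, n) @ U
    return U

rng = np.random.default_rng(0)
U = local_random_circuit(n=3, T=20, rng=rng)
print(np.allclose(U.conj().T @ U, np.eye(8)))  # the circuit is unitary
```

Dense matrices limit this sketch to small n, since each embedded gate has dimension 2^n. Several different gate sequences can produce the same U, which is the origin of the implicitly defined weights discussed above.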

Relating two conjectures
Fix q = 2, $d = 2^n$ (n qubits) and suppose that the aforementioned lower bound were not only necessary, but also (approximately) sufficient: G-local circuits of size T 2nk log 2 (n) generate (sufficiently accurate) approximate 2k-designs. Under this assumption, G-local random circuits of size T contain at least $d^{2k}/(k!)^2$ elements with circuit complexity r T . If we assume that T ≤ 2n d, then this bound can be simplified further as This is essentially Conjecture 1: the set of all G-local circuits of size T contains an exponentially growing set of elements with complexity r T . This observation relates Conjecture 1 (linear growth in complexity) to an existing conjecture in quantum information [12]: to achieve a linear growth in complexity we need a linear growth in design.

Linear growth in design for local random circuits at large local dimension
We again consider a 1d system comprised of n qudits of local dimension q, with total dimension d = q n , and evolve the system by a random circuit consisting of local 2-site unitaries drawn Haar-randomly from U (q 2 ). The results of [12] also ensure that such random circuits form approximate k-designs when the size is O(n 2 k 11 ). Although Conjecture 2, a linear design growth in G-local random circuits with local qubits, is currently out of reach, progress was made recently in [27], improving the k-dependence for Haar-local random circuits in the limit of large local dimension and giving a linear growth in the circuit size to form a unitary k-design.
Theorem 6 ( [27]). Random quantum circuits on n qudits of local dimension q form approximate unitary k-designs when the circuit size is T = O(n 2 k) for some q > q 0 , where q 0 depends on the size of the circuit. 4 The approach of [27] was to consider the frame potential, capturing the 2-norm distance to forming an approximate design, and make use of an exact statistical mechanical mapping [48,49] in order to write the frame potential as the partition function of a triangular lattice model. The contributions to the partition function can be interpreted as domain walls in the lattice model. In the limit of large q, [27] showed that only a simple sector of domain walls contribute, allowing for the calculation of the k-design circuit depth. More precisely, by computing the single domain wall terms and showing that the multidomain wall terms contribute at subleading order in 1/q, it was proved that local random circuits exhibit a linear growth in design for some q > q 0 , where q 0 depends on the circuit size T and moment k.
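The frame potential mentioned above can be estimated by Monte Carlo sampling. The sketch below assumes the standard definition F^(k) = E |Tr(U† V)|^{2k}, with U and V drawn independently from the ensemble, together with the known fact that the Haar value is k! when k ≤ d. It samples Haar-random unitaries via a QR decomposition with a phase correction; the function names and sample sizes are illustrative choices, and this is not the statistical-mechanics calculation of [27].

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a Haar-random d x d unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Fix the phase ambiguity of QR so that the distribution is exactly Haar.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # multiplies column j of q by phases[j]

def frame_potential(sampler, k, num_pairs, rng):
    """Monte Carlo estimate of E |Tr(U^dagger V)|^(2k) over independent pairs."""
    total = 0.0
    for _ in range(num_pairs):
        U = sampler(rng)
        V = sampler(rng)
        total += np.abs(np.trace(U.conj().T @ V)) ** (2 * k)
    return total / num_pairs

rng = np.random.default_rng(1)
sampler = lambda r: haar_unitary(4, r)
for k in (1, 2):
    # Estimates should fluctuate around the Haar values 1! = 1 and 2! = 2.
    print(k, frame_potential(sampler, k, num_pairs=4000, rng=rng))
```

Replacing `sampler` with a random-circuit ensemble of a given size gives an estimate that can be compared against k!; an ensemble is a better approximate k-design (in 2-norm) the closer its frame potential is to the Haar value.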
Theorem 6 and Corollary 3 allow us to establish Conjecture 1 for local random circuits with Haar-random 2-site unitaries in the limit of large q.
Corollary 4 (Linear complexity growth). Given the set of local random circuits of size T at large q, most circuits have strong complexity Ω(T ), i.e. growing linearly in T for a long time.
Although Theorem 3 still applies to local Haar-random quantum circuits, giving a lower bound on the number of distinct unitaries with high complexity, its meaning becomes less clear when we have a continuous ensemble. We can consider an ensemble of finite cardinality by constructing an ε-covering of the set of random circuits. We review the notion of an ε-covering in Sec. 7.10 and give a bound on the cardinality of a covering for local random circuits. Constructing a coarse net then shows that exponentially many random quantum circuits, with Haar-random 2-site unitaries, have high complexity.
Lastly, we emphasize that we have not proved linear complexity growth up to time scales of order d. While taking a large enough q will ensure linear design growth for times exponential in n, such a limit still pushes the true exponential time scales of interest, t ∼ d = q n , out of reach. Proving an optimal design growth for local random circuits away from the large q limit would allow us to better probe late-time complexity.

Brownian circuits/Stochastic quantum Hamiltonians
There also exist continuous-time models of chaotic dynamics, analogous to random circuits, which scramble in O(log n) time [24]. In a similar spirit to models of random walks on the unitary group, one can define a one-parameter family of Hamiltonians which generate a time-dependent unitary evolution. The Hamiltonian on n qubits at a time step s is given by a sum of random all-to-all 2-body interactions, meaning we sum over all possible 1- and 2-local interactions with independently chosen Gaussian random couplings where $S^\alpha_i$ is a Pauli operator acting on site i with α ∈ {0, 1, 2, 3}. The couplings are each drawn independently from a Gaussian distribution with zero mean and variance $\sigma^2$. Not only are the couplings random in space, but they are also chosen independently at random at each time step s. The time evolution to time t is then given by where we consider the continuum limit δt → 0 with the variance of the couplings scaling as $\sigma^2 = J/\delta t$, so that the interaction strength increases in proportion to the inverse time step, and where J is a constant. It was shown in [25], using similar techniques to [12], that these Brownian circuits form k-designs in polynomial time.
Theorem 7 (Corollary 10 in [25]). For $d = 2^n$ and ε > 0, the ensemble of time-evolutions by stochastic Hamiltonians in Eq. (3.4) forms an ε-approximate k-design for times where C > 0 is a constant.
For the Brownian circuit models, the constant prefactor C depends on the local dimension, here chosen to be 2, but also on the interaction strength of the couplings J, C ∼ 1/J, meaning if the interactions are stronger then the depth required to form a design decreases accordingly.
We can again use the polynomial relation between complexity and design to discuss complexity growth. Theorem 3 and Theorem 7 together give that Brownian circuits have a complexity growing polynomially in time as Ω(t 1/11 ).

Nearly time-independent Hamiltonian dynamics
There is another random quantum circuit-like construction of a time-dependent Hamiltonian with varying couplings over discrete time steps. This "nearly time-independent" model of [26] forms k-designs in a depth O(n 2 k) up to moments k = o( √ n), achieving the nearly optimal lower bound with a linear growth in design for a short time.
Consider a 1d system of n qudits, with $d = q^n$, and define a time-dependent set of random couplings as well as two ensembles of Hamiltonians with time-dependent couplings where g = t/2 /2 and b = t/2 + 1/2. We then define the time-evolution of our system: we evolve by an X-type Hamiltonian $H_X \sim E_X$ at even time steps and a Z-type Hamiltonian $H_Z \sim E_Z$ at odd time steps. Then the unitary time-evolutions form an ε-approximate k-design for $k = o(n^{1/2})$, after T time steps, where This construction forms unitary k-designs almost linearly in time, with the caveat that the time scale is limited to ∼ √n. Thus we get a linear growth in design at early times, but not for times exponential in n. Consequently, this implies a linear growth in complexity at (very) early times.

Comment on time-independence
We have discussed a few explicit models of complexity growth in systems that are random in both space and time. As we emphasized, one of our results is that a polynomial growth in design implies a polynomial growth in complexity (Corollary 1). Thus, the random circuit and Brownian circuit models, which form approximate k-designs in poly(k) depth, are also explicit examples of systems with a long-time polynomial growth in complexity.
But for an ensemble of time-evolutions to form a k-design, randomness in time is likely essential. For instance, consider an ensemble of time-evolutions generated by an ensemble of where E H could be a disordered spin system or any ensemble of random Hermitian matrices. The rigid structure of eigenvalues then prohibits the late-time Haar-randomness.
Interestingly, the Gaussian unitary ensemble (GUE), an ensemble of d×d random Hermitian matrices with a unitarily-invariant measure, does come close to an approximate k-design in 2-norm for moments $k \ll d$ at a specific time-scale t ∼ √d [42]. But at later times, the 2-norm distance between the ensemble of unitaries generated by GUE Hamiltonians and the Haar ensemble becomes large. More generally, one expects that any ensemble of unitary evolutions by time-independent Hamiltonians will not form a k-design at late times. A general argument for this is as follows [11]: under the exponential map $\lambda \to e^{i\lambda t}$, the eigenvalues of a Hamiltonian are distributed as time-evolving phases on the unit circle. In the limit t → ∞, the phases become uncorrelated and uniformly distributed, unlike the correlated and logarithmically repelling eigenvalues of Haar-random unitaries. Thus, to understand the complexity growth of (ensembles of) time-independent Hamiltonian evolution, we would need to look beyond their design properties for an alternative approach, for instance, by studying the approximate invariance of the ensemble [42,50].

Complexity in holographic systems
Much of the recent interest in quantum complexity in the high-energy literature has centered on the conjectured relation between complexity growth and the long-time growth of black hole interiors [4,5,51]. More specifically, in the context of the AdS/CFT correspondence, the region behind the horizon of an eternal AdS-Schwarzschild black hole grows linearly in time for an exponential time ($t \sim e^n$). The holographic picture is a two-sided geometry connected by a wormhole, where the throat of the wormhole is growing in time. The claim is that the quantum complexity of the dual CFT state is the long-time linearly increasing quantity which captures the wormhole growth. There have been a number of proposals for what bulk quantity actually computes the complexity, including the volume and action of the AdS wormhole. The complexity/volume conjecture states that the computational complexity of the boundary state is equal to the volume of the wormhole. More precisely, the complexity of the time-evolved thermofield double state of the two boundary CFTs is equal to the spatial volume behind the horizon of the two-sided geometry on a maximal time slice [5]. The 'complexity equals action' conjecture posits that the action computed on a certain region of the bulk geometry which extends behind the horizon (the Wheeler-DeWitt patch) is the quantity which is dual to the complexity [6,52]. A lot of progress has been made checking these conjectures and studying complexity growth in holographic systems, see for instance [53][54][55][56][57][58][59].
In this work we have rigorously computed the complexity growth in a number of random circuit models, by relating the growth in design to the growth of complexity, and were able to prove a linear growth in complexity for local random circuits in the limit of large local dimension (albeit, not for an exponentially long time). As we mentioned, the connection between unitary designs and quantum complexity will likely not inform complexity growth in holography as evolution by time-independent Hamiltonians will not converge to approximate designs. Thus, to study complexity growth in holography we need to explore properties beyond the Haar-randomness of the evolution.

Strong complexity in the bulk
We briefly discuss why we believe our proposed strong definition of complexity (in terms of a distinguishing measurement) is congruent with expectations from the bulk and might be better suited for holography than the standard definition in terms of circuit complexity.
One feature we expect complexity growth will exhibit in holography, and fast scrambling systems more generally, is the switchback effect [5]. Consider a time-evolved local operator $O(t) = e^{-iHt} O e^{iHt}$ (sometimes called a precursor), where O might be a single-site Pauli. For such an operator, we anticipate a delay in the onset of the linear complexity growth. For the traditional definition of complexity, consider the minimal circuit approximating the evolution operator $e^{-iHt}$. The reason for this delay is the exact cancellation of gates outside the lightcone of the spreading operator. Once the operator grows to be the size of the system (more precisely, to have support on weight-n Pauli operators) after a timescale called the scrambling time, we expect the complexity of O(t) to begin its long-time linear growth. Such an effect is also present in the bulk for both the complexity/volume and complexity/action conjectures. This feature is also present in the complexity growth of O(t) under the strong definition of complexity in Definition 2. To be concrete, consider a system of n qubits and the evolved state $e^{-iHt} O e^{iHt} |\psi_0\rangle$, where H is a chaotic but local Hamiltonian and we take $|\psi_0\rangle$ to be an unentangled product state. Prior to the scrambling time, the optimal measurement to distinguish the evolving state from the maximally mixed state is a simple measurement of a qubit outside the lightcone of the evolving operator. It is not until the scrambling time, when the operator has grown to have support on all sites, that the complexity of the distinguishing measurement begins to grow.
A more interesting example, where the strong and weak definitions of complexity differ, is that of one clean qubit. This is essentially the argument given in Lemma 2 to prove that measurement complexity is a stronger definition than standard circuit complexity. Consider a simple initial state $|\psi_0\rangle$, which has been evolved for an exponential time such that $|\psi(t)\rangle$ is maximally complex. If we add a single unentangled qubit to the state, $|\psi(t)\rangle \otimes |0\rangle$, then the minimal circuit complexity will be unchanged, but the maximal possible complexity increases and the complexity of the state can continue to grow for a long time until it saturates at the new maximal value. For the complexity of a distinguishing measurement, adding a single clean qubit resets the complexity to an order-one value, as the optimal measurement is simply the projection onto the clean qubit. Ref. [7] proposed the notion of uncomplexity as the difference of the complexity of a state or unitary from its maximal complexity, where uncomplexity can be thought of as a resource to do useful computation. As we described, our strong definition of complexity directly encodes this potential for useful quantum computation.

Entanglement growth by design
The suggestion that complexity be the dual of the long-time geometric growth in the bulk was motivated by the observation that the wormhole grows long past the time-scales at which entropic quantities saturate. Given that we have discussed long-time growth in complexity from a long-time growth in design, it is worth commenting on the saturation of entropies after a short growth in design order.
The entanglement entropies for k-designs were studied in [60]. Specifically, they looked at the Rényi-α entropies of a density matrix ρ: $S^{(\alpha)}(\rho) = \frac{1}{1-\alpha} \log \mathrm{Tr}(\rho^\alpha)$. For any state, the Rényi entropies are bounded above and below in terms of the min-entropy $S_{\min}(\rho) := \lim_{\alpha \to \infty} S^{(\alpha)}(\rho) = -\log(\|\rho\|_\infty)$. 5 For an n-qubit system, consider the reduced density matrix $\rho_A = \mathrm{Tr}_{\bar{A}} |\psi\rangle\langle\psi|$ on a subsystem A consisting of half the qubits, so that $d_A = d_{\bar{A}}$. Ref. [60] showed that for states $|\psi\rangle$ drawn from a (k > log d)-design, the min-entropy of $\rho_A$ is nearly maximal. Therefore, all entropies are nearly maximal once the design order is k ≈ n. Considering then the time-evolved states of a fast-scrambling system which forms unitary designs linearly in time, all entropies will saturate at a time of order n. Our arguments ensure complexity growth of approximate k-designs well beyond this entropy saturation threshold.
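The ordering of the Rényi entropies and the min-entropy can be checked numerically on a small example. The sketch below uses a Haar-random pure state (rather than a state drawn from a finite k-design, which is an assumption made for simplicity) and computes entropies of the half-system reduced density matrix; the helper names are ours.

```python
import numpy as np

def haar_state(d, rng):
    """Haar-random pure state: a normalized complex Gaussian vector."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def reduced_density_matrix(psi, d_A):
    """Trace out the complement of A for a pure state on A (x) A-bar."""
    m = psi.reshape(d_A, -1)
    return m @ m.conj().T

def renyi_entropy(rho, alpha):
    """Renyi-alpha entropy (natural log); alpha = inf gives the min-entropy."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-14]  # discard numerically vanishing eigenvalues
    if np.isinf(alpha):
        return -np.log(p.max())
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log(p))  # von Neumann limit
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

rng = np.random.default_rng(2)
n = 10                      # total number of qubits
d_A = 2 ** (n // 2)         # half of the qubits
rho_A = reduced_density_matrix(haar_state(2 ** n, rng), d_A)
for alpha in (1.0, 2.0, np.inf):
    print(alpha, renyi_entropy(rho_A, alpha), "max:", np.log(d_A))
```

The printed values decrease with α and stay below log d_A, with the min-entropy as the smallest; how close they come to the maximum depends on n.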

Most states have high complexity
The Hilbert space of n qudits is enormous, $d = q^n$. Nonetheless, only a tiny fraction of all possible (pure) quantum states seems to be useful for quantum computation, see e.g. [43]. Strong state complexity (Definition 2) captures this curious aspect. In order to get a quantitative handle on the set of all pure states we endow it with the uniform measure dψ that is induced by the Haar measure on the unitary group U(d). Then, random pure states $|\psi\rangle\langle\psi|$ behave like the maximally mixed state $\rho_0$ in expectation. This behavior extends to the outcome statistics of arbitrary (fixed) measurements: We refer to Proposition 1 in the appendix for a proof of this well-known result. We can combine this assertion with a union bound (Boole's inequality) to conclude for any r ∈ N and δ ∈ (0, 1)
5 To see this, recall the relation between Schatten-α norms in d dimensions: This ensures that for any state ρ, we have $S_{\min}(\rho) \leq S^{(\alpha)}(\rho) \leq S_{\min}(\rho) + \frac{\log d}{\alpha}$. Note that as we take α to be greater than n, these Rényi entropies concentrate ever more sharply around the min-entropy.
Suppose that $M_r$ arises from combining at most r elements of a fixed universal gate set $G \subset U(q^2)$. A naive counting argument reveals $|M_r| \leq 2d\, n^r |G|^r$. We conclude that $\Pr[C_\delta(|\psi\rangle) \leq r]$ remains exponentially suppressed (in $d = q^n$) until $r \sim q^n / \log(n)$.
To summarize, a random state is exceedingly likely to have an exponentially large strong δ-state complexity.
The Haar measure has another desirable property. It is fair in the sense that it assigns the same (infinitesimal) weight to each pure state. Such perfectly flat probability distributions allow for turning the probabilistic statement (5.3) into a quantitative one. From the definition of probability, $\Pr[C_\delta(|\psi\rangle) \leq r]$ corresponds to the ratio of low complexity states over all states. Thus, Eq. (5.3) ensures that the fraction of low complexity states remains exponentially tiny until $r \sim q^n / \log(n)$. In other words: most pure states have exponentially large complexity.
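The counting argument of this subsection can be made concrete with a short calculation. The sketch below takes the measurement count |M_r| ≤ 2d n^r |G|^r from the text and combines it with an assumed tail bound of the form exp(-c d δ²) per fixed measurement. Both this functional form and the default constant c = 1/(9π³) are assumptions made for illustration (the constant is borrowed from the exponent that appears in the next subsection), not the exact statement of Eq. (5.3).

```python
import math

def log_num_measurements(n, q, r, gate_set_size):
    """log of the counting bound |M_r| <= 2 d n^r |G|^r with d = q^n."""
    d = q ** n
    return math.log(2 * d) + r * (math.log(n) + math.log(gate_set_size))

def log_union_bound(n, q, r, gate_set_size, delta, c=1.0 / (9 * math.pi ** 3)):
    """log of |M_r| * exp(-c d delta^2); the tail form is an assumption."""
    return log_num_measurements(n, q, r, gate_set_size) - c * q ** n * delta ** 2

def largest_suppressed_r(n, q, gate_set_size, delta, c=1.0 / (9 * math.pi ** 3)):
    """Largest r for which the assumed union bound is still below one."""
    budget = c * q ** n * delta ** 2 - math.log(2 * q ** n)
    per_gate = math.log(n) + math.log(gate_set_size)
    return max(math.ceil(budget / per_gate) - 1, 0)

for n in (16, 20, 24):
    r_max = largest_suppressed_r(n, q=2, gate_set_size=7, delta=0.5)
    print(n, r_max, 2 ** n / math.log(n))  # compare with the q^n / log(n) scale
```

Under these assumptions the threshold grows in proportion to q^n / (log n + log |G|), which matches the scaling r ∼ q^n / log(n) quoted above up to constants.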

Most high-complexity states are far apart
In the previous subsection, we saw that concentration of measure (5.2) allows us to conclude that most quantum states have exponentially high state complexity. This argument, however, does not (yet) tell us anything about the geometric separation between high-complexity states. In principle, a large fraction of high-complexity states could result from tiny perturbations of only a few well-separated high-complexity core states. In other words, high-complexity states could come in a few tightly packed clusters, in which case the effective number of high-complexity regions could still be comparatively small.
The probabilistic method [61] allows us to prove that extreme clustering cannot occur: there are exponentially many high complexity states whose pairwise distance is almost maximal.
We show this statement by induction based on two features of Haar-random states. Firstly, we use the main result from the previous subsection. Choose $r \sim q^n / \log(n)$ such that Eq. (5.3) ensures The parameter r is chosen such that Haar-random states exceed this complexity with probability 1/2. Concentration of measure also implies that a Haar-random state is very likely to be far away from any fixed state $|\phi\rangle\langle\phi|$. For any ∆ ∈ (0, 1), This bound readily follows from Eq. (5.2) (set $M = |\phi\rangle\langle\phi|$) and elementary modifications. The first step in our inductive argument is simple. Eq. (5.5) asserts that the probability of Haar-randomly sampling a low complexity (at most r) state is smaller than 1/2. This is equivalent to stating that the probability of Haar-randomly sampling a high complexity (larger than r) state is at least 1/2. Importantly, this assertion implies that such a state exists, because the probability of sampling one is strictly positive. Choose one such state $|\phi_1\rangle$ as the first state in our list.
To construct the second state in our list, we refine this probabilistic existence argument. The probability of Haar-randomly sampling a low complexity state or a state that is too close to $|\phi_1\rangle$ is bounded by By contraposition, the probability of sampling a state that has high complexity and is simultaneously far away from $|\phi_1\rangle$ is at least This implies the existence of such a state. Choose one such state $|\phi_2\rangle$ and append it to the list: $\{|\phi_1\rangle, |\phi_2\rangle\}$.
We can now inductively repeat this probabilistic existence argument and iteratively append distant high-complexity states to the list $\{|\phi_1\rangle, \ldots, |\phi_N\rangle\}$. This construction only breaks down once the list cardinality N counterbalances the exponential suppression. Beyond this threshold, we cannot use simple union bounds and concentration of measure to ensure existence of another list element. Such a threshold, however, scales exponentially in the Hilbert space dimension. We conclude that the list $\{|\phi_1\rangle, \ldots, |\phi_N\rangle\}$ contains $N = \frac{1}{6} \exp\left(\frac{\Delta^2 d}{9\pi^3}\right)$ high complexity states whose pairwise trace distance is at least 1 − ∆.
We conclude this subsection by providing a bit of context. Existence proofs based on strictly positive probabilities date back to Erdős, who developed them to solve several important problems in graph theory. Today, this technique is known as the probabilistic method and constitutes an important tool in applied math, combinatorics, and theoretical computer science [61].
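The iterative list construction can be mimicked numerically in a small Hilbert space. The sketch below samples Haar-random states and keeps a candidate only if its trace distance to every state already in the list is at least 1 − ∆. It checks only the separation condition: the complexity condition cannot be evaluated numerically here, so that part of the argument is omitted, and the function names and parameters are illustrative assumptions.

```python
import numpy as np

def haar_state(d, rng):
    """Haar-random pure state: a normalized complex Gaussian vector."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def trace_distance(phi, psi):
    """Trace distance between pure states: sqrt(1 - |<phi|psi>|^2)."""
    overlap = abs(np.vdot(phi, psi)) ** 2
    return np.sqrt(max(0.0, 1.0 - overlap))

def build_separated_list(d, num_states, gap, rng, max_tries=10000):
    """Greedily collect states with pairwise trace distance >= 1 - gap."""
    states = []
    tries = 0
    while len(states) < num_states and tries < max_tries:
        tries += 1
        candidate = haar_state(d, rng)
        if all(trace_distance(candidate, s) >= 1.0 - gap for s in states):
            states.append(candidate)
    return states

rng = np.random.default_rng(3)
states = build_separated_list(d=64, num_states=20, gap=0.2, rng=rng)
print(len(states))
```

In practice almost every candidate is accepted, because two independent Haar-random states have overlap of order 1/d; this is the numerical counterpart of the concentration statement used in the proof.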

Proof of Theorem 2
Haar-random states result from applying a Haar-random unitary U ∈ U (d) to an arbitrary starting state, say |ψ 0 . Now suppose that this unitary U is not chosen from the Haar measure, but from an approximate 2k-design. By definition, this ensures that the first 2k moments of |ψ ψ| = U |ψ 0 ψ 0 |U † accurately approximate the corresponding Haar moments. While this is too coarse to deduce exponential concentration (5.2) (this would require matching behavior for all moments), polynomial concentration arguments do apply. Fix a measurement M ∈ H d and letM = M − Tr(M ) d I denote its traceless part. Markov's inequality then implies that for any τ > 0 The final expectation value corresponds to a moment of order 2k. This is the largest moment that still approximately exhibits Haar-random behavior. Explicit bounds can be derived by exploiting this similarity and we refer to Corollary 6 below for a technical derivation: Qualititatively, this deviation bound is proportional to d −k and becomes ever more stringent as the design order 2k increases. We can now combine this tail bound with a union bound and a counting argument for the measurement set M r in a fashion analogous to the Haar random case. For any r ∈ N and any δ ∈ (0, 1) this yields where we have tacitly assumed (1 − δ) ≥ 2d −1 in the last step. Qualitatively, this probability remains tiny until provided that n ≥ |G| and k < d/2. So far, this is a purely probabilistic statement. In contrast to the Haar-uniform case it is a priori not clear whether it is possible to transform it into a quantitative one. The reason for this is twofold: (i) the weights p j associated with different elements from an approximate 2k-design are typically not uniform. This non-uniformity extends to the distribution over the different states |ψ i . (ii) collisions in the state generation: two (or more) distinct design unitaries can produce the same state. 
Fortunately, the defining properties of designs ensure that these deviations cannot be too radical: the weights $q_j$ associated with distinct states $|\psi_j\rangle$ must obey $q_j \leq (1+\epsilon)\binom{d+k-1}{k}^{-1}$; see Lemma 6 below. This extra condition does allow for drawing quantitative conclusions. Recall that the probability of an event $E$ is the expected value of its indicator function $1\{E\}$. Therefore, $1 - \Pr[C_\delta(|\psi\rangle) \leq r] = \sum_j q_j\, 1\{C_\delta(|\psi_j\rangle) > r\} \leq (1+\epsilon)\binom{d+k-1}{k}^{-1} \sum_j 1\{C_\delta(|\psi_j\rangle) > r\}$. The sum on the r.h.s. is simply the cardinality $N$ of the set of states $|\psi_j\rangle$ with $\delta$-state complexity greater than $r$, and the l.h.s. is $1 - \Pr[C_\delta(|\psi\rangle) \leq r]$. Substituting the bound (5.9) into this expression establishes the claim, Eq. (5.12).

Proof of Theorem 3
The proof is largely analogous to the proof of Theorem 2. Fix a measurement and apply Markov's inequality to the $2k$-th moment of the associated centered random variable. The final expectation value corresponds to the highest $2k$-design moment that still approximates Haar-random behavior. Our main technical contribution in Theorem 4 establishes tight bounds on such Haar-random moments. These generalize to approximate $2k$-design ensembles $E$ in a relatively straightforward fashion; see Corollary 5 below for a precise statement and proof. Next, we emphasize that the crude bound $|M_r| \leq (2d^2+1)\,n^{2r}|G|^r$ applies to circuit measurements. Combining the above concentration inequality with a union bound over all measurements $M \in M_r$ yields an upper bound on $\Pr[C_\delta(U) \leq r]$, Eq. (5.15), where we have tacitly assumed $(1-\delta) \geq 2d^{-1}$. Qualitatively, this probability remains tiny until $r \lesssim \frac{(k-2)n - 4k\log(k)}{\log(n) + \log|G|} \approx \frac{k(n - 4\log(k))}{\log(n)}$ (5.16), provided that $n \geq |G|$ and $k \leq d/2$.
The definition of an approximate $2k$-design also imposes constraints on the weight fluctuations. Lemma 1 asserts that weights associated with distinct ensemble unitaries must obey $p_j \leq (1+\epsilon)\frac{k!}{d^{2k}}$. This approximate flatness allows us to turn the probabilistic statement from above into a quantitative one: $1 - \Pr[C_\delta(U) \leq r] = \sum_j p_j\, 1\{C_\delta(U_j) > r\} \leq (1+\epsilon)\frac{k!}{d^{2k}} \sum_j 1\{C_\delta(U_j) > r\}$. The sum on the right counts the cardinality $N$ of distinct unitaries with $\delta$-unitary complexity at least $r+1$, while the l.h.s. may be lower-bounded by (5.15). This establishes the claim, Eq. (5.18).

Distant and distinct design elements
We have shown that unitary and state designs contain an exponential number ($\Omega(d^k)$) of distinct high-complexity elements. But to really capture the ergodic nature of chaotic evolution over the unitary group, these distinct high-complexity elements should also be pairwise far apart.
Here we address this subtlety and show that unitary and state designs contain an exponential number of distant high complexity unitaries or states.

Distant and distinct state design elements
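Consider a state $|\psi\rangle$ drawn at random from an $\epsilon$-approximate spherical $k$-design. Eq. (5.9) implies that the probability that this state has $\delta$-state complexity at most $r$, i.e. $C_\delta(|\psi\rangle) \leq r$, is of order $O(d^{-k})$ when $r \lesssim kn$. We can also show that the probability that an element drawn at random from an $\epsilon$-approximate spherical $k$-design is close to a fixed reference state $|\phi\rangle$ is suppressed as an inverse polynomial in $d$ of degree $k$. Choose $\Delta \in (0,1)$ and combine $\frac{1}{2}\left\| |\psi\rangle\langle\psi| - |\phi\rangle\langle\phi| \right\|_1 = \sqrt{1 - |\langle\psi,\phi\rangle|^2}$ with Markov's inequality to conclude $\Pr\left[\frac{1}{2}\left\| |\psi\rangle\langle\psi| - |\phi\rangle\langle\phi| \right\|_1 \leq \Delta\right] = \Pr\left[|\langle\psi,\phi\rangle|^{2k} \geq (1-\Delta^2)^k\right] \leq \frac{\mathbb{E}\left[|\langle\psi,\phi\rangle|^{2k}\right]}{(1-\Delta^2)^k} \leq (1+\epsilon)\binom{d+k-1}{k}^{-1}(1-\Delta^2)^{-k}$. The last inequality follows from a $k$-design moment bound similar to Eq. (2.21). We refer to the proof of Lemma 6 for a detailed derivation. Qualitatively, this bound is of order $O(d^{-k})$.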
We can now use a union bound to limit the probability that a random $k$-design state either has low complexity or is close to the reference state: this probability is at most the sum of the two individual probabilities. As long as $r \lesssim nk$, this bound is also of order $O(d^{-k})$ and, in turn, strictly smaller than one. If the probability of the state having low complexity or being close to our fixed state is strictly less than one, then there is a nonzero probability of drawing a design element that has high complexity and is far away from the fixed state. Simply stated: if $\Pr[A \cup B] < 1$, then $\Pr[\bar{A} \cap \bar{B}] > 0$.
We can iterate this procedure to construct a set of high-complexity states that are pairwise separated. As long as the probability that a design element has low complexity or is close to any element already in the set is less than one, there exists a design element which has high complexity and is far away from all other design elements in the set. To construct the set $\{|\psi_1\rangle, \ldots, |\psi_N\rangle\}$, we simply need this probability to remain strictly smaller than one at every step. A union bound then converts this requirement into a sufficient condition on the set cardinality $N$. For constant $\Delta \in (0,1)$, the resulting threshold is exponential as long as the complexity obeys $r \lesssim k$. We note the similarity of this bound to the bound derived for the number of distinct design elements.
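The iterative selection of pairwise distant states can be illustrated numerically. The following is a minimal sketch, not the construction used in the proof: it assumes normalized complex Gaussian vectors as a stand-in for design elements, ignores the complexity condition entirely, and the helper names `trace_distance_pure` and `greedy_distant_subset` are hypothetical.

```python
import numpy as np

def trace_distance_pure(psi, phi):
    """Trace distance of two pure states: sqrt(1 - |<psi|phi>|^2)."""
    overlap = abs(np.vdot(psi, phi)) ** 2
    return np.sqrt(max(0.0, 1.0 - overlap))

def greedy_distant_subset(states, delta):
    """Keep a state only if it is at least delta away from all states kept so far."""
    kept = []
    for psi in states:
        if all(trace_distance_pure(psi, phi) >= delta for phi in kept):
            kept.append(psi)
    return kept

# Assumption: random normalized vectors play the role of the design ensemble.
rng = np.random.default_rng(0)
d, num_states, delta = 8, 200, 0.5
raw = rng.normal(size=(num_states, d)) + 1j * rng.normal(size=(num_states, d))
states = raw / np.linalg.norm(raw, axis=1, keepdims=True)

subset = greedy_distant_subset(states, delta)
print(f"kept {len(subset)} of {num_states} states with pairwise distance >= {delta}")
```

The greedy loop mirrors the existence argument: a new element is added whenever it is far from everything selected so far.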

Distant and distinct unitary design elements
Now we consider a unitary $U$ drawn from an $\epsilon$-approximate unitary $k$-design $E$. Eq. (5.15) bounds the probability of the unitary having $\delta$-unitary complexity at most $r$, $C_\delta(U) \leq r$, to be $O(d^{-2k})$ when the complexity is roughly $r \lesssim nk$.
Randomly chosen $k$-design elements also tend to land far away from any fixed unitary. For some $V \in U(d)$ and $\Delta \in (0,1)$, Markov's inequality together with a $k$-design moment bound controls the probability that $U$ lies within distance $\Delta$ of $V$. We refer to the proof of Lemma 1 for a detailed derivation. Next, we apply a trick from the proof of Lemma 3. This allows us to conclude that, qualitatively, this probability is of order $O(d^{-2k})$. We now have all the ingredients in place to repeat the argument from the state case. By a union bound, the probability of sampling a unitary that either has low complexity or is close to a reference unitary $V$ is at most the sum of the two individual probabilities. This is of order $O(d^{-2k}) < 1$ as long as the complexity obeys $r \lesssim nk$. By contraposition, this ensures that there exists a design element $U_1$ that both has high complexity and is far away from $V$. We can use this insight to iteratively construct a set of $N$ high-complexity design unitaries with large pairwise distances. Explicitly, to construct a set of unitaries $\{U_1, \ldots, U_N\}$, we need the probability of low complexity or closeness to any previously selected unitary to remain strictly smaller than one. A union bound relates this condition to a sufficient upper bound on the set cardinality $N$, Eq. (5.28). This threshold is exponential as long as the complexity obeys $r \lesssim k$.

Conceptual background and contributions

Distinguishing states and channels
This conceptual section will review one fundamental question in probability theory, as well as two quantum generalizations. We refer to [31,62] for details. The underlying question is: what is the best strategy to distinguish two (biased) coins based on a single toss? More precisely, we consider the following game: there are two identically-looking coins with different biases towards coming up heads when being tossed. These biases are known to the player. A referee then picks one of these coins uniformly at random and hands it to the player. The player is allowed to perform a single toss. Based on the result she must guess which coin she obtained and wins if this guess was correct.

Distinguishing classical probability distributions
Consider two (discrete) $d$-variate random variables. We may represent the associated probability distributions by $d$-dimensional vectors $p, q \in \mathbb{R}^d$ that are entry-wise non-negative ($p_i, q_i \geq 0$) and whose entries sum up to one. Likewise, a collection of events $E_1, \ldots, E_m$ can also be represented by vectors $e_1, \ldots, e_m \in \mathbb{R}^d$ that are entry-wise non-negative and obey the normalization condition $\sum_{i=1}^m e_i = \mathbf{1}$, where $\mathbf{1}$ denotes the all-ones vector. The probability of event $E_i$ under the distribution $p$ is then $\langle e_i, p \rangle$.
Let us now return to the motivating question: what is the best strategy to distinguish two random variables, characterized by known probability vectors $p$ and $q$, in the single-shot limit? This is a binary question and without loss of generality we can restrict our attention to binary events. Let $e_1$ denote the event that leads us to guess that we observed the first random variable. The complementary event $e_2 = \mathbf{1} - e_1$ is then fully characterized as well. Under the additional assumption that either random variable is handed to us with equal prior probability, the probability of success becomes $\frac{1}{2}\langle e_1, p\rangle + \frac{1}{2}\langle e_2, q\rangle = \frac{1}{2} + \frac{1}{2}\langle e_1, p - q\rangle$. This expression may now be optimized over all possible events $e_1$ in order to determine the optimal guessing strategy. The only constraints on $e_1$ are non-negativity and normalization. Together, they demand $0 \leq e_1 \leq \mathbf{1}$, where the inequality signs are to be understood component-wise. The resulting optimization problem is a linear program [41,63]: maximize $\frac{1}{2} + \frac{1}{2}\langle e_1, p - q\rangle$ subject to $\mathbf{1} \geq e_1 \geq 0$ (6.3). It can be solved in a computationally tractable way. In fact, this problem is simple enough to solve analytically. The optimal $e_1$ is the indicator function for $p_i \geq q_i$, i.e. $[e_1]_i = 1\{p_i \geq q_i\}$. This is the maximum likelihood estimator from statistics: opt for the distribution that is most likely to produce the outcome that has been observed. This choice achieves an optimal success probability of $\frac{1}{2} + \frac{1}{4}\|p - q\|_{\ell_1}$. Note that a success probability of $1/2$ can be trivially achieved by mere guessing.
The remaining term (multiplied by two) is called the bias and corresponds to the total variation distance between $p$ and $q$.
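The maximum likelihood rule can be checked on a small example. The sketch below assumes a uniform prior, so that the success probability of a strategy $e_1$ is $\frac{1}{2} + \frac{1}{2}\langle e_1, p - q\rangle$. The function name `optimal_guess` and the example vectors are illustrative choices, not taken from the text.

```python
import itertools
import numpy as np

def optimal_guess(p, q):
    """Maximum likelihood strategy: guess the first distribution whenever p_i >= q_i."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    e1 = (p >= q).astype(float)
    success = 0.5 + 0.5 * float(e1 @ (p - q))
    return e1, success

# Illustrative probability vectors (assumed for this example).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

e1, success = optimal_guess(p, q)
tv = 0.5 * np.abs(p - q).sum()  # total variation distance

# Brute force over all deterministic strategies e1 in {0,1}^d. A linear program
# attains its optimum at a vertex of the feasible box, so this suffices.
best = max(
    0.5 + 0.5 * float(np.dot(e, p - q))
    for e in itertools.product([0.0, 1.0], repeat=len(p))
)
print(success, 0.5 + 0.5 * tv, best)
```

All three printed numbers agree, which reflects that the optimal bias equals the total variation distance.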

Distinguishing quantum states
It is useful to think of quantum states $\rho$ as matrix generalizations of probability vectors. Similarly, (POVM) measurements with $m$ outcomes are characterized by a collection of psd matrices $\{M_i\}_{i=1}^m \subset H_d$ that sum up to the identity matrix $I$. Born's rule states that the probability of observing outcome $i$ is $\mathrm{Tr}(M_i \rho)$. This may be viewed as a non-commutative analogue of the classical probability rule in Eq. (6.1). One may also adapt the distinguishability game to the quantum setting: what is the probability of correctly distinguishing two quantum states $\rho, \sigma$ by performing a single measurement? Once more, this is a binary question. We can without loss of generality restrict attention to 2-outcome measurements: $M_1$ and $M_2 = I - M_1$. We associate the first outcome with opting for $\rho$, while the second outcome flags $\sigma$. Similar to the classical case, the probability of success is $p_{qs} = \frac{1}{2} + \frac{1}{2}\mathrm{Tr}\left(M_1(\rho - \sigma)\right)$, which corresponds to a bias of $\beta_{qs} = (M_1, \rho - \sigma) = \mathrm{Tr}\left(M_1(\rho-\sigma)\right)$. We may now optimize over all possible measurements $M_1$ to obtain the best bias possible: maximize $(M_1, \rho - \sigma)$ subject to $I \succeq M_1 \succeq 0$. The constraint denotes the positive semidefinite order ($A \succeq B$ if and only if $A - B$ is positive semidefinite). This is a semidefinite program [41,63] that is simple enough to solve analytically. The optimal measurement $M_1$ corresponds to the orthogonal projection onto the positive range of $\rho - \sigma$. The associated optimal bias is $\frac{1}{2}\|\rho - \sigma\|_1$, which is the trace distance of the density matrices $\rho$ and $\sigma$. This result is known as the Holevo-Helstrom Theorem [28,29].
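The optimal measurement can be constructed explicitly from an eigendecomposition of $\rho - \sigma$. The following sketch, with hypothetical helper names `helstrom` and `random_state` and randomly generated test states, builds the projector onto the positive range and compares the resulting bias with the trace distance.

```python
import numpy as np

def helstrom(rho, sigma):
    """Projector onto the positive range of rho - sigma and the bias it achieves."""
    evals, evecs = np.linalg.eigh(rho - sigma)
    pos = evecs[:, evals > 0]          # eigenvectors with positive eigenvalue
    m1 = pos @ pos.conj().T
    bias = np.trace(m1 @ (rho - sigma)).real
    return m1, bias

def random_state(d, rng):
    """Random full-rank density matrix (illustrative test input)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
rho, sigma = random_state(4, rng), random_state(4, rng)

m1, bias = helstrom(rho, sigma)
trace_dist = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()
print(bias, trace_dist)
```

Because $\rho - \sigma$ is traceless, the sum of its positive eigenvalues equals half the sum of the absolute values of all eigenvalues, which is why both numbers coincide.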

Distinguishing quantum channels
Quantum channels describe evolutions of quantum mechanical systems. They are linear maps $\mathcal{A}: H_d \to H_{d'}$ that map density operators to density operators of potentially different dimension $d'$. Suppose that we wish to distinguish two channels, say $\mathcal{A}$ and $\mathcal{B}$, based on a single channel use, i.e. we input a concrete quantum state and perform a measurement on the output state. This offers more freedom: we may maximize the probability of correct distinction by optimizing over both input states and measurements of the channel output. The laws of quantum mechanics allow for further improving this strategy. It is possible to entangle the input state with a quantum memory: $\rho_{\mathrm{in}} \in H_d \otimes H_d$. We then apply the channel to the first quantum system, while the second one is left unchanged in the memory. A final two-outcome measurement $M_1 \in H_{d'} \otimes H_d$ on both output and memory state potentially reveals additional information. The output state depends on the channel in question. A priori there are two possibilities: either $\rho_{\mathrm{out}} = \mathcal{A} \otimes \mathcal{I}(\rho_{\mathrm{in}})$, or $\rho_{\mathrm{out}} = \mathcal{B} \otimes \mathcal{I}(\rho_{\mathrm{in}})$. Here, $\mathcal{I}(X) = X$ denotes the identity channel acting trivially on the memory. The probability of correctly distinguishing these states (and thus the underlying channels) with a single measurement $M_1 \in H_{d'} \otimes H_d$ becomes $p_{qc} = \frac{1}{2} + \frac{1}{2}\mathrm{Tr}\left(M_1\left(\mathcal{A} \otimes \mathcal{I}(\rho_{\mathrm{in}}) - \mathcal{B} \otimes \mathcal{I}(\rho_{\mathrm{in}})\right)\right)$ (6.11). We may now optimize over all degrees of freedom to maximize the value of $p_{qc}$. Optimizing the measurement $M_1$ results in a bias that is proportional to the trace distance of the output states. Because of convexity, optimization over potential input states can without loss of generality be restricted to pure states: $\beta_{qc} = \max_{|\psi\rangle} \frac{1}{2}\left\|\mathcal{A} \otimes \mathcal{I}(|\psi\rangle\langle\psi|) - \mathcal{B} \otimes \mathcal{I}(|\psi\rangle\langle\psi|)\right\|_1 = \frac{1}{2}\|\mathcal{A} - \mathcal{B}\|_\diamond$. This optimal bias is called the diamond distance between the channels $\mathcal{A}$ and $\mathcal{B}$ [64]. This metric is more complicated than its vector and matrix counterparts and highlights genuine quantum advantages. It can be difficult to compute analytically, but it admits a computationally tractable reformulation as a semidefinite program (SDP) [65][66][67].
For the case at hand, optimal strategies are based on maximally entangling the input with the memory: let $|\Omega\rangle = \frac{1}{\sqrt{d}}\sum_{i=1}^d |i\rangle \otimes |i\rangle$ denote the maximally entangled state. It is easy to check that this strategy achieves the diamond distance in Eq. (6.13). Proving optimality is less trivial. For instance, this claim follows from relating the diamond distance to another norm that is easier to compute. We refer to [68,Theorem 7] and [69] for details.

Cornering "easy" unitary transformations
Fix $d = q^n$. The evolution of a closed, $d$-dimensional quantum mechanical system is unitary: $\mathcal{U}(\rho) = U\rho U^\dagger$ with $U \in U(d)$. While evolutions may represent natural processes, they can also be engineered to perform certain tasks, such as quantum computing. Scalability of quantum computing hinges on the important observation that complicated evolutions (quantum gate architectures) can be decomposed into sequences of simple building blocks. A universal gate set $G \subset U(q^2)$ acting on two (neighboring) qudits forms such a basic set of building blocks. For technical reasons, we shall assume that $G$ contains the identity (doing nothing), as well as inverses: $g \in G$ implies $g^\dagger \in G$.
Universality then means that any unitary $U \in U(d)$ may be accurately approximated by a finite sequence of $r$ unitaries chosen from $G$. We refer to Figure 4 for an illustrative example. Such decompositions into sequences of elementary gates provide us with a notion of simplicity. Intuitively, a quantum circuit $V$ is simple if it may be generated by a $G$-local circuit of short size. In contrast to depth, size counts the total number of elementary gates in a circuit. For $r \in \mathbb{N}$ we define $G_r := \{V \in U(d) : V \text{ is generated by a } G\text{-local circuit of size} \leq r\}$ (6.14). We set $G_0 = \{I\}$, and the following inclusion relation follows from $I \in G$: $G_0 \subseteq G_1 \subseteq \cdots \subseteq G_r \subseteq G_{r+1} \subseteq \cdots \subset U(d)$. The cardinality of $G_r$ may be bounded by a simple counting argument: each of the $r$ gates is specified by a pair of qudits (at most $n^2$ choices) and an element of $G$, such that $|G_r| \leq n^{2r}|G|^r$. The fact that $G$ is a universal gate set ensures that $G_r$ becomes dense in $U(d)$ as $r \to \infty$. A priori $G_r$ depends on the particular choice of universal gate set $G$. However, the Solovay-Kitaev theorem asserts that other universal gate sets can be accurately compiled at the cost of only a modest overhead, polylogarithmic in the inverse accuracy [30].

Cornering "easy" measurements
The conceptual question underlying our definition of complexity is binary. Are we facing a pure state (unitary channel), or a maximally mixed state (depolarizing channel)? This allows us to restrict attention to two-outcome measurements, where we associate one outcome with each possibility. Two-outcome measurements are described by pairs $\{M, I - M\}$ with $M \in H_d$ and $I \succeq M \succeq 0$. A projective two-outcome measurement is one for which $M$ is an orthogonal projection: $M = V P_l V^\dagger$ with $P_l = \sum_{i=1}^l |i\rangle\langle i|$ (6.18). Here $l \in [d]$ characterizes the rank of the measurement $M$ and $V$ is a unitary basis change to the eigenbasis of $M$. Naimark's theorem, see e.g. [31,70], provides a powerful connection between arbitrary two-outcome measurements $M$ and projective measurements of the form (6.18). Every two-outcome measurement on $\rho \in H_d$ corresponds to a projective measurement on $\rho \otimes |a\rangle\langle a| \in H_d \otimes H_2$, where $|a\rangle\langle a| \in H_2$ is an ancilla system prepared in a pure state $|a\rangle \in \mathbb{C}^2$ (see Sec. 7.3 for an introduction to wiring diagrams, which allow for a pictorial representation). Based on this reformulation of general 2-outcome measurements, we model limited resources in the following way:
1. The ancilla state $|a\rangle \in \mathbb{C}^2$ corresponds to a (fixed) simple state, e.g. $|a\rangle = |0\rangle$.
2. The unitary $V \in U(2d)$ must be feasible to implement. More concretely, we assume that it is composed of at most $r$ 2-qubit gates chosen from a (fixed) universal gate set $G \subset U(q^2)$.
3. The projective measurement $P_l = \sum_{i=1}^l |i\rangle\langle i|$ is diagonal in the computational basis.
For fixed $r \in \mathbb{N}$ (circuit size for $V$), this framework defines a class of measurements $M_r$. Here, $\mathrm{Tr}_2 : H_d \otimes H_2 \to H_d$ denotes the partial trace over the ancilla. By construction, this set is finite. For the class of measurements applied to channel (and memory) outputs, we slightly modify this definition. We include a single Bell measurement $|\Omega\rangle\langle\Omega| \in H_d^{\otimes 2}$, with $|\Omega\rangle = \frac{1}{\sqrt{d}}\sum_{i=1}^d |i\rangle \otimes |i\rangle$, in the definition. This modification simplifies exposition and is comparatively benign. A simple counting argument reveals $|M_r| \leq (2d^2+1)|G_r| \leq (2d^2+1)\,n^{2r}|G|^r$ with $n = \log_q(d)$ (6.22).

Technical background and contributions

Notation and basic facts from matrix analysis
Endow the vector space $\mathbb{C}^d$ with the standard inner product $\langle x|y\rangle$. A pure quantum state is a vector $\psi \in \mathbb{C}^d$ normalized to (Euclidean) unit length, i.e. $\langle\psi,\psi\rangle = 1$. We succinctly denote this by identifying normalized vectors with kets: $|\psi\rangle$ denotes $\psi \in \mathbb{C}^d$ with $\langle\psi|\psi\rangle = 1$. The trace of a matrix $X$ is $\mathrm{Tr}(X) = \sum_{i=1}^d \langle i|X|i\rangle$. The trace is cyclic, i.e. $\mathrm{Tr}(XY) = \mathrm{Tr}(YX)$, and forms the basis for defining the Schatten-$p$ norms $\|X\|_p = \mathrm{Tr}(|X|^p)^{1/p}$. In particular, Schatten norms obey the following order relations: $\|X\|_\infty \leq \|X\|_2 \leq \|X\|_1$. A variant of Hölder's inequality applies to traces of inner products, see e.g. [71, Ex. IV.2.12] and Eq. (7.3).
The trace corresponds to a full index contraction. Partial contractions are possible for tensor products, and partial traces are concrete examples. For $X, Y \in H_d$ define $\mathrm{Tr}_1(X \otimes Y) = \mathrm{Tr}(X)Y$ and $\mathrm{Tr}_2(X \otimes Y) = \mathrm{Tr}(Y)X$ (7.4), and extend this definition linearly to the tensor product $H_d^{\otimes 2} \simeq H_{d^2}$. This definition naturally extends to tensor products of higher order. A tight bound, Eq. (7.5), connects partial traces and operator norms.
A matrix $X \in H_d$ is positive semidefinite (psd) if $\langle y|X|y\rangle \geq 0$ for all $y \in \mathbb{C}^d$. We denote this feature by $X \succeq 0$. Positive semidefiniteness is preserved under partial traces: $X \succeq 0$ implies $\mathrm{Tr}_1(X) \succeq 0$ and $\mathrm{Tr}_2(X) \succeq 0$. The trace norm of psd matrices is particularly simple: $\|X\|_1 = \mathrm{Tr}(X)$ whenever $X \succeq 0$.
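Partial traces are easy to realize numerically by reshaping a matrix into a four-index tensor. The following sketch uses hypothetical helper names `tr1` and `tr2` and assumes the Kronecker-product ordering of `numpy.kron`. It checks the defining relations on product operators and the preservation of positive semidefiniteness.

```python
import numpy as np

def tr1(x, d1, d2):
    """Partial trace over the first tensor factor of a (d1*d2) x (d1*d2) matrix."""
    t = x.reshape(d1, d2, d1, d2)
    return np.einsum('ijik->jk', t)

def tr2(x, d1, d2):
    """Partial trace over the second tensor factor."""
    t = x.reshape(d1, d2, d1, d2)
    return np.einsum('ijkj->ik', t)

rng = np.random.default_rng(2)
d = 3
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Y = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
XY = np.kron(X, Y)

# A psd matrix on the tensor product space, to test preservation of positivity.
B = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))
A = B @ B.conj().T
min_eig = min(np.linalg.eigvalsh(tr1(A, d, d)).min(),
              np.linalg.eigvalsh(tr2(A, d, d)).min())
print(min_eig >= 0)
```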

Convex geometry and optimization
The main technical contributions of this paper are based on bounds that follow from a fundamental argument in convex optimization. Comprehensive references for convex geometry and optimization include [40,41]. A function $f: H_d \to \mathbb{R}$ is convex if $f(\tau X + (1-\tau)Y) \leq \tau f(X) + (1-\tau) f(Y)$ for all $X, Y \in H_d$ and $\tau \in [0,1]$. Linear transformations in the argument preserve this feature. Similarly, a set $K \subseteq H_d$ is convex if $X, Y \in K$ imply $\tau X + (1-\tau)Y \in K$ for all $\tau \in [0,1]$ (7.8). Let $K \subseteq H_d$ be a convex set. A point $X \in K$ is an extreme point if $Y, Z \in K$ and $X = \tau Y + (1-\tau)Z$ for some $\tau \in (0,1)$ necessarily imply $Y = Z = X$. Extreme points lie on the boundary of a convex set. A convex function attains its maximum over a compact convex set at an extreme point (Fact 1, Eq. (7.11)).
The following technical result will prove highly valuable for establishing bounds on very general Haar moments. Lemma 4. Let $A \in H_d$ be psd ($A \succeq 0$). Then, the function $h(X) = \mathrm{Tr}(XAXA)$ is nonnegative and convex for all $X \in H_d$.

Wiring calculus
Wiring diagrams, sometimes also known as tensor network diagrams, provide a graphical way for computing contractions between tensors. Here we only provide a brief overview and refer to the recent survey [38] and lecture notes [62] for a detailed introduction. The wiring formalism associates a box with every tensor and a line emanating from the box with every index. Connected lines represent contracted indices. More precisely, we place contravariant indices of a tensor on the left of the box and covariant ones on the right. Table 1 contains all the essential rules necessary for the scope of this work. Importantly, lines can be bent at will without changing the value of an equation. For instance, let $\rho = |\psi\rangle\langle\psi| \in H_d$ be a pure quantum state and suppose that $M \in H_d$ is a measurement. We can then represent Born's rule pictorially as a closed loop that connects the boxes of $M$ and $\rho$. Partial traces also assume a simple form: for $X \in H_d \otimes H_d$, $\mathrm{Tr}_1(X)$ and $\mathrm{Tr}_2(X)$ are obtained by connecting the two lines of the first and second tensor factor, respectively.
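Wiring calculus is exceptionally well suited to keep track of flip operators. Define $F|i\rangle \otimes |j\rangle = |j\rangle \otimes |i\rangle$ via its action on computational basis elements and extend this definition linearly to all of $\mathbb{C}^d \otimes \mathbb{C}^d$. Vectorization is a linear map $\mathrm{vec}: M_d \to \mathbb{C}^d \otimes \mathbb{C}^d$ defined by its action on computational basis elements, $|\mathrm{vec}(|i\rangle\langle j|)\rangle := |i\rangle \otimes |j\rangle$ (7.17), and linearly extended to all of $M_d$. In wiring calculus, $|\phi\rangle = |\mathrm{vec}(\Phi)\rangle$ corresponds to bending the right (covariant) index of a matrix $\Phi$ to the left (into a contravariant one). It is easy to see that vectorization is an isometry: $\langle\mathrm{vec}(\Phi)|\mathrm{vec}(\Phi)\rangle = \mathrm{Tr}(\Phi^\dagger\Phi) = \|\Phi\|_2^2$.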
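Both objects have direct matrix counterparts. The sketch below assumes row-major flattening as the concrete realization of vectorization (so that $|i\rangle\langle j|$ is mapped to $|i\rangle \otimes |j\rangle$) and uses the hypothetical helper names `flip_operator` and `vec`. It checks the flip action, the well-known swap identity $\mathrm{Tr}(F\, A \otimes B) = \mathrm{Tr}(AB)$, and the isometry property.

```python
import numpy as np

def flip_operator(d):
    """Matrix of the flip F |i>|j> = |j>|i> in the computational basis."""
    f = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            f[j * d + i, i * d + j] = 1.0
    return f

def vec(a):
    """Vectorization with vec(|i><j|) = |i> (x) |j>, i.e. row-major flattening."""
    return np.asarray(a).reshape(-1)

rng = np.random.default_rng(4)
d = 3
x = rng.normal(size=d) + 1j * rng.normal(size=d)
y = rng.normal(size=d) + 1j * rng.normal(size=d)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
F = flip_operator(d)

swap_trace = np.trace(F @ np.kron(A, B))      # should equal Tr(AB)
vec_norm = np.linalg.norm(vec(A))             # Euclidean norm of vec(A)
frob_norm = np.linalg.norm(A, 'fro')          # Schatten-2 norm of A
print(swap_trace, np.trace(A @ B), vec_norm, frob_norm)
```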

Random unitaries and k-designs
Here we introduce a few essential concepts from quantum information theory, including a discussion of random unitaries and the notion of a design. First, recall that the Haar measure is the unique left/right invariant measure on the unitary group $U(d)$. We will often be interested in moments of the Haar ensemble. Consider an operator $X$ acting on the $k$-fold Hilbert space $(\mathbb{C}^d)^{\otimes k}$. The $k$-fold channel, or $k$-fold twirl, of the operator with respect to the Haar measure on the unitary group is $T^{(k)}_U(X) = \int dU\, U^{\otimes k} X (U^\dagger)^{\otimes k}$. Similarly, we can average an operator over an ensemble of unitaries $E = \{p_i, U_i\}$, a weighted subset of the full unitary group. The $k$-fold channel with respect to $E$ is $T^{(k)}_E(X) = \sum_i p_i\, U_i^{\otimes k} X (U_i^\dagger)^{\otimes k}$, here written for a discrete ensemble, although such an ensemble might be discrete or continuous.
Unitary $k$-designs. We will often be interested in how well an average over an ensemble captures an average over the full unitary group, i.e. how random the ensemble is with respect to the Haar measure on $U(d)$. A unitary $k$-design is an ensemble of unitaries $E = \{p_i, U_i\}$ for which the $k$-fold twirl equals its Haar-random counterpart: $T^{(k)}_E(X) = T^{(k)}_U(X)$ for all operators $X$. This means that the ensemble $E$ exactly captures the first $k$ moments of the Haar ensemble.
Unitary operator bases, such as the n-qubit Pauli group, form an exact 1-design. But very little is known about the construction of exact designs for higher k, with the notable exception of k = 3 and the n-qubit Clifford group [15][16][17]. We will return to this point when discussing approximate designs.
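The 1-design property of the Pauli group can be verified directly: twirling any operator over all $n$-qubit Pauli strings with uniform weights reproduces the Haar average $\mathrm{Tr}(X) I/d$. The sketch below uses hypothetical helper names `pauli_strings` and `pauli_twirl`; phases of the Pauli group are omitted because they cancel under conjugation.

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_strings(n):
    """Yield all 4^n tensor products of single-qubit Pauli matrices."""
    for combo in itertools.product(PAULIS, repeat=n):
        op = np.array([[1.0 + 0j]])
        for p in combo:
            op = np.kron(op, p)
        yield op

def pauli_twirl(x, n):
    """1-fold twirl of x over the n-qubit Pauli strings with uniform weights."""
    d = 2 ** n
    return sum(p @ x @ p.conj().T for p in pauli_strings(n)) / d ** 2

rng = np.random.default_rng(5)
n = 2
d = 2 ** n
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
twirled = pauli_twirl(X, n)
haar_value = np.trace(X) * np.eye(d) / d   # first Haar moment of X
print(np.allclose(twirled, haar_value))
```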
Schur-Weyl duality. Many of the important analytic expressions for Haar averages rely on Schur-Weyl duality [34,35], a deep connection between irreducible representations (irreps) of the unitary group $U(d)$ and the symmetric group $S_k$. First, when thinking about $k$-fold Hilbert spaces, there is a useful set of operators that acts on this space, namely permutations of the $k$ copies. A permutation operator $P_\sigma$ acts on the computational basis of $(\mathbb{C}^d)^{\otimes k}$ as $P_\sigma |i_1, \ldots, i_k\rangle = |i_{\sigma(1)}, \ldots, i_{\sigma(k)}\rangle$. This action can be extended linearly to all of $(\mathbb{C}^d)^{\otimes k}$. Schur-Weyl duality is the statement that an operator $X$ acting on $(\mathbb{C}^d)^{\otimes k}$ commutes with all $k$-fold unitaries $U^{\otimes k}$ if and only if it is a linear combination of permutation operators: $[X, U^{\otimes k}] = 0$ for all $U \in U(d)$ if and only if $X = \sum_{\sigma \in S_k} c_\sigma P_\sigma$. Many of the exact expressions for Haar moments and random unitary averages in the following subsection follow directly from this powerful result.
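The easy direction of this statement, namely that permutation operators commute with every $U^{\otimes k}$, can be checked numerically. The sketch below fixes one index convention for $P_\sigma$ as an assumption (the commutation property holds for either convention) and samples a Haar-random unitary via a QR decomposition with phase correction. The helper names are hypothetical.

```python
import functools
import itertools
import numpy as np

def permutation_operator(sigma, d):
    """Matrix of P_sigma |i_1..i_k> = |i_sigma(1)..i_sigma(k)> (assumed convention)."""
    k = len(sigma)
    dim = d ** k
    p = np.zeros((dim, dim))
    for idx in itertools.product(range(d), repeat=k):
        permuted = tuple(idx[sigma[m]] for m in range(k))
        src = np.ravel_multi_index(idx, (d,) * k)
        dst = np.ravel_multi_index(permuted, (d,) * k)
        p[dst, src] = 1.0
    return p

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale columns so that the distribution is Haar

rng = np.random.default_rng(6)
d, k = 2, 3
U = haar_unitary(d, rng)
U_k = functools.reduce(np.kron, [U] * k)

commutes = all(
    np.allclose(permutation_operator(s, d) @ U_k, U_k @ permutation_operator(s, d))
    for s in itertools.permutations(range(k))
)
print(commutes)
```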

Haar-integration over the unitary group
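We now introduce the general formalism for integrating arbitrary moments of random unitaries over the full unitary group with respect to the Haar measure, often referred to as Weingarten calculus. The exact expression [37,72] for integrating the $k$-th moment of $U(d)$ is $\int dU\, U_{i_1 j_1} \cdots U_{i_k j_k} U^\dagger_{\ell_1 m_1} \cdots U^\dagger_{\ell_k m_k} = \sum_{\sigma,\tau \in S_k} \delta_\sigma(\vec{\imath}\,|\,\vec{m})\, \delta_\tau(\vec{\jmath}\,|\,\vec{\ell})\, \mathrm{Wg}(\sigma^{-1}\tau, d)$ (7.25), where we sum over elements of the permutation group $S_k$ and define a contraction of indices with respect to a permutation $\sigma \in S_k$ as $\delta_\sigma(\vec{\imath}\,|\,\vec{\jmath}) = \delta_{i_1 j_{\sigma(1)}} \cdots \delta_{i_k j_{\sigma(k)}}$. Mixed moments of $U$ and $U^\dagger$, i.e. averages of $U^{\otimes k} \otimes U^{\dagger \otimes k'}$ with $k \neq k'$, vanish identically. It will often be convenient to interpret the index contraction $\delta_\sigma(\vec{\imath}\,|\,\vec{\jmath})$ as a matrix element of a permutation operator acting on the computational basis of the $k$-fold space; see Eq. (7.28) for two examples of contractions for $k = 4$.
The weight associated to a given contraction is called the Weingarten function, $\mathrm{Wg}(\sigma, d)$. It is a function on elements of $S_k$ and admits an expansion in terms of characters of the symmetric group, $\mathrm{Wg}(\sigma, d) = \frac{1}{k!}\sum_{\lambda \vdash k} \frac{f_\lambda\, \chi_\lambda(\sigma)}{c_\lambda(d)}$ (7.29), where we sum over the integer partitions of $k$ that label the irreps of $S_k$; $\chi_\lambda(\sigma)$ is an irreducible character of $\lambda$, and $f_\lambda$ is the dimension of the irrep $\lambda$. The polynomial in the denominator is defined as $c_\lambda(d) = \prod_{(i,j) \in \lambda} (d + j - i)$, where we take a product over the coordinates $(i,j)$ of the Young diagram of $\lambda$. Writing $\lambda$ as an integer partition of $k$, with elements $\lambda_i$, the product is taken over $i$ from 1 to $\ell(\lambda)$, the length of the partition, and $j$ from 1 to $\lambda_i$. The expression for the Weingarten function in Eq. (7.29) also holds for $k \geq d$, provided that the sum is restricted to partitions of length $\ell(\lambda) \leq d$ (such that the polynomial $c_\lambda(d)$ in the denominator is free of zeroes).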
The Weingarten functions only depend on the cycle type of the permutation, where the cycle type of $\sigma \in S_k$ is an integer partition of $k$. We end this brief exposition by listing the first few unitary Weingarten functions, labeled by cycle type. For $k = 1$, $\mathrm{Wg}((1), d) = \frac{1}{d}$, and for $k = 2$, we have $\mathrm{Wg}((1,1), d) = \frac{1}{d^2-1}$ and $\mathrm{Wg}((2), d) = -\frac{1}{d(d^2-1)}$.
The $k$-fold twirl of an operator over the unitary group can be written using Eq. (7.25) as $T^{(k)}_U(X) = \sum_{\sigma,\tau \in S_k} \mathrm{Wg}(\sigma^{-1}\tau, d)\, \mathrm{Tr}(P_\tau^\dagger X)\, P_\sigma$ (7.32). This expression equivalently follows from noting that, by the invariance of the Haar measure, the $k$-fold twirl $T^{(k)}_U(X)$ commutes with every $U^{\otimes k}$ and is therefore a linear combination of permutation operators by Schur-Weyl duality. We also note that the $k$-fold twirl of a permutation operator is $T^{(k)}_U(P_\rho) = P_\rho$. For $k \leq d$, Eq. (7.32) then gives $\sum_{\tau} \mathrm{Wg}(\sigma^{-1}\tau, d)\, \mathrm{Tr}(P_\tau^\dagger P_\rho) = \delta_{\sigma,\rho}$. Viewed as a matrix equation, the matrix of Weingarten functions $\mathrm{Wg}^{(k)}$ is the pseudoinverse of the $k! \times k!$ matrix $G^{(k)}$ of inner products of permutation operators $P_\sigma$ (the Gram matrix of the $P_\sigma$'s). The elements of $G^{(k)}$ are the inner products between permutation operators, $\mathrm{Tr}(P_\sigma^\dagger P_\tau) = d^{\ell(\sigma^{-1}\tau)}$ (7.33), where $\ell(\sigma^{-1}\tau)$ simply counts the number of closed cycles in the permutation product (equivalently, the length of the cycle type of the product). The matrix inverse exists for $k \leq d$. Although elegant, this derivation of the Weingarten functions quickly becomes intractable, as we need to invert a $k! \times k!$ matrix. The representation-theoretic definition in Eq. (7.29) is more straightforward to use when computing high moments.
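For small $k$, the Gram-matrix characterization can be evaluated directly. The following sketch assumes $k \leq d$, so that the Gram matrix is invertible, and uses hypothetical helper names. It builds the matrix with entries $d^{\ell(\sigma^{-1}\tau)}$, inverts it, and compares the result with the closed-form values for $k = 2$.

```python
import itertools
import numpy as np

def cycle_count(perm):
    """Number of cycles (including fixed points) of a permutation given as a tuple."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start in seen:
            continue
        cycles += 1
        j = start
        while j not in seen:
            seen.add(j)
            j = perm[j]
    return cycles

def compose(a, b):
    """Composition (a o b)(i) = a[b[i]]."""
    return tuple(a[b[i]] for i in range(len(b)))

def inverse(a):
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def weingarten_matrix(k, d):
    """Gram matrix of permutation operators and its inverse (assumes k <= d)."""
    perms = list(itertools.permutations(range(k)))
    gram = np.array([[float(d) ** cycle_count(compose(inverse(s), t))
                      for t in perms] for s in perms])
    return perms, gram, np.linalg.inv(gram)

d = 3
perms2, gram2, wg2 = weingarten_matrix(2, d)   # perms2 = [(0, 1), (1, 0)]
perms3, gram3, wg3 = weingarten_matrix(3, d)
print(wg2[0, 0], 1 / (d**2 - 1))               # identity permutation, cycle type (1,1)
print(wg2[0, 1], -1 / (d * (d**2 - 1)))        # transposition, cycle type (2)
```

The $k!\times k!$ inversion is only practical for very small $k$, in line with the remark above about intractability.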
Wiring diagrams for the first few Haar moments. To set up the calculations that will follow in the next section, we explicitly write out the first two moments, detailing the index contractions one must take. For $k = 1$, we simply have $\int dU\, U_{ij} U^\dagger_{\ell m} = \frac{1}{d}\delta_{im}\delta_{j\ell}$. For $k = 2$, we sum over elements of $S_2$, separately permuting the internal and external indices.
Moments of traces. We can use the formalism introduced above to compute a few simple expressions averaged over the unitary group, which will be of use in later sections. Consider the $2k$-th moment of the trace of a random unitary, $|\mathrm{Tr}(U)|^{2k}$, which we integrate over the unitary group as $\int dU\, |\mathrm{Tr}(U)|^{2k} = \sum_{\sigma,\tau \in S_k} \mathrm{Wg}(\sigma^{-1}\tau, d)\, \mathrm{Tr}(P_\sigma^\dagger P_\tau)$ with $\mathrm{Tr}(P_\sigma^\dagger P_\tau) = d^{\ell(\sigma^{-1}\tau)}$. View this as a matrix equation, and recall that for $k \leq d$ the Weingarten functions are the inverse of the inner products in Eq. (7.33). Then, we simply have the trace of the identity matrix, a sum over $S_k$: $\int dU\, |\mathrm{Tr}(U)|^{2k} = \sum_{\sigma \in S_k} 1 = k!$. This quantity is essentially the same as the frame potential [14], a quantity which quantifies the 2-norm distance between an ensemble of unitaries $E$ and the Haar ensemble. The frame potential for any ensemble is lower bounded by this Haar value.
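The value $k!$ for the trace moments with $k \leq d$ can be probed by Monte Carlo sampling. The following sketch assumes the standard QR-based sampler for Haar-random unitaries; the sample size, seed and tolerance are illustrative choices, and the estimates only agree with $k!$ up to statistical fluctuations.

```python
import math
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # column rescaling fixes the phase ambiguity of QR

rng = np.random.default_rng(7)
d, samples = 4, 20000

# |Tr(U)|^2 for each sampled unitary; higher moments are powers of these values.
abs_tr_sq = np.array([abs(np.trace(haar_unitary(d, rng))) ** 2 for _ in range(samples)])

estimates = {k: float(np.mean(abs_tr_sq ** k)) for k in (1, 2)}
for k, value in estimates.items():
    print(f"k={k}: estimate {value:.3f}, expected k! = {math.factorial(k)}")
```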
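Averages of pure states. Consider a Haar-random state $|\psi\rangle = U|0\rangle$, with $|0\rangle \in \mathbb{C}^d$ and $U \in U(d)$, and take the $k$-fold average with respect to the unitary group. Then, $\int dU\, \left(U|0\rangle\langle 0|U^\dagger\right)^{\otimes k} = \sum_{\sigma,\tau \in S_k} \mathrm{Wg}(\sigma^{-1}\tau, d)\, P_\sigma$ (7.38), as permuting and contracting the pure state moments gives the same value for any permutation. This also follows from Schur-Weyl duality by noting that the $k$-fold average is invariant under $k$-fold unitary conjugation and may thus be expressed as a sum of permutations. Fixing $\sigma$ above, the sum over $\tau$ just gives the sum over Weingarten functions, which is $\sum_{\tau \in S_k} \mathrm{Wg}(\tau, d) = \frac{(d-1)!}{(k+d-1)!}$ (7.39). Equivalently, we can fix this coefficient by taking the trace of Eq. (7.38). Thus we find that the $k$-fold average of a pure state is $\int dU\, \left(U|0\rangle\langle 0|U^\dagger\right)^{\otimes k} = \binom{k+d-1}{k}^{-1} \Pi_{\mathrm{sym}}$ (7.40), where $\Pi_{\mathrm{sym}} = \frac{1}{k!}\sum_{\sigma \in S_k} P_\sigma$ is the projector onto the symmetric subspace and $\binom{k+d-1}{k}$ is the corresponding dimension.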
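The projector onto the symmetric subspace can be assembled explicitly for small $k$ and $d$. The sketch below uses hypothetical helper names and one assumed index convention for $P_\sigma$ (the symmetrizer does not depend on it). It checks that the average of all permutation operators is a projector of rank $\binom{k+d-1}{k}$ that leaves product states $|\psi\rangle^{\otimes k}$ invariant.

```python
import itertools
import math
import numpy as np

def permutation_operator(sigma, d):
    """Matrix permuting the k tensor factors of (C^d)^{(x) k} (assumed convention)."""
    k = len(sigma)
    dim = d ** k
    p = np.zeros((dim, dim))
    for idx in itertools.product(range(d), repeat=k):
        permuted = tuple(idx[sigma[m]] for m in range(k))
        src = np.ravel_multi_index(idx, (d,) * k)
        dst = np.ravel_multi_index(permuted, (d,) * k)
        p[dst, src] = 1.0
    return p

def symmetric_projector(k, d):
    """Pi_sym = (1/k!) * sum over all permutation operators."""
    perms = itertools.permutations(range(k))
    return sum(permutation_operator(s, d) for s in perms) / math.factorial(k)

k, d = 3, 2
proj = symmetric_projector(k, d)
dim_sym = math.comb(k + d - 1, k)

# A k-fold product state lies in the symmetric subspace.
rng = np.random.default_rng(3)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
psi_k = psi
for _ in range(k - 1):
    psi_k = np.kron(psi_k, psi)

print(np.trace(proj), dim_sym)
```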
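A similar calculation is to consider the moments of the expectation value of a conjugated operator $\langle\psi|U^\dagger M U|\psi\rangle$, where $|\psi\rangle \in \mathbb{C}^d$ and $M \in H_d$ is a Hermitian operator. We find $\int dU\, \langle\psi|U^\dagger M U|\psi\rangle^k = \sum_{\sigma,\tau \in S_k} \mathrm{Wg}(\sigma^{-1}\tau, d)\, \mathrm{Tr}\left(P_\tau^\dagger M^{\otimes k}\right)$. Again, as permuting and contracting tensor products of a pure state just gives one, for any $\tau$ the sum over $\sigma$ is just a sum over Weingarten functions. Using Eq. (7.39) and recalling the definition of the projector onto the symmetric subspace, we conclude $\int dU\, \langle\psi|U^\dagger M U|\psi\rangle^k = \binom{k+d-1}{k}^{-1} \mathrm{Tr}\left(\Pi_{\mathrm{sym}} M^{\otimes k}\right)$ (7.42).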

Approximate k-designs and bounds on weight distributions
Weingarten calculus is a powerful tool. It characterizes twirls over the diagonal representation of the unitary group for arbitrary tensor powers k ∈ N. In turn, this formula allows for computing moments of random variables that involve Haar random unitaries. These then can be used to establish generic features, such as concentration of measure. However, full control of all moments comes at a price. It is excessively difficult to sample unitaries directly from the Haar measure. Simple dimension counting highlights that circuits of exponential size are required to implement a Haar random unitary circuit on n qudits. The notion of k-designs introduced in Sec. 7.4 addresses this issue by allowing one to interpolate between Haar-random (k = ∞) and highly structured (k = 1) ensembles. Unfortunately, very few explicit constructions of k-designs are known. This lack of efficient constructions can be overcome by relaxing the defining property of a k-design.
Here, $T^{(k)}_E$ and $T^{(k)}_U$ denote the $k$-fold twirls with respect to the ensemble $E$ and the Haar measure, respectively (Definition 4). This definition readily extends to ensembles of infinite cardinality. Several different definitions of approximate $k$-designs can be found in the literature. By and large, these differ in terms of the metric that is used to quantify closeness. We have defined an approximate design up to additive error, but have chosen to scale $\epsilon$ with $d$ in a manner that mimics relative error, similar to the strong definition of a design used in [12]. This will also simplify exposition considerably.
The approximate $k$-design property imposes severe restrictions on the associated distribution of weights and on the ensemble size (Lemma 5).
Lower bounds on approximate k-design cardinality are known, see e.g. [12,Lemma 26] for a similar result. We are not aware of any weight bounds in the literature.
We also consider orbits of approximate $k$-designs. Fix $|x\rangle \in \mathbb{C}^d$ arbitrary and define $|y_i\rangle = U_i|x\rangle$ for $i \in [N]$. Doing so results in a weighted set of unit vectors. These sets are called approximate complex-projective $k$-designs [18,73]. They approximately reproduce the first $k$ moments of the uniform distribution on the complex unit sphere. Lower bounds on the cardinality of exact spherical $k$-designs are known, see e.g. [20], but we are not aware of any statement that bounds the associated weights.
The emphasis on distinct states is justified. Two or more distinct unitaries can give rise to the same state.
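Proof of Lemma 5. Fix $j \in [N] = \{1, \ldots, N\}$ and use Eq. (7.37) to compare the ensemble average of $|\mathrm{Tr}(U_j^\dagger U)|^{2k}$ to its Haar counterpart. The approximate $k$-design property implies that the mismatch remains small. Let $|\Omega\rangle = \frac{1}{\sqrt{d}}\sum_{i=1}^d |i\rangle \otimes |i\rangle$ denote the maximally entangled state. Then, $\mathrm{Tr}(U) = d\,\langle\Omega|U \otimes I|\Omega\rangle$, and we apply Definition 4 to bound the ensemble average by $(1+\epsilon)k!$. Since the term with $U = U_j$ alone contributes $p_j d^{2k}$, this allows us to conclude $p_j \leq (1+\epsilon)\frac{k!}{d^{2k}}$ for $j \in [N]$ arbitrary. The lower bound on the cardinality $N$ is an immediate consequence of this weight restriction: $1 = \sum_{j=1}^N p_j \leq N(1+\epsilon)\frac{k!}{d^{2k}}$.
Proof of Lemma 6. The argument is very similar to the proof of Lemma 5. Fix $j \in [N]$, set $M = |y_j\rangle\langle y_j|$ and use Eq. (7.42). Next, observe that the Haar average obeys $\mathrm{Tr}\left(\Pi_{\mathrm{sym}} M^{\otimes k}\right) = \mathrm{Tr}\left(\Pi_{\mathrm{sym}} (|y_j\rangle\langle y_j|)^{\otimes k}\right) = 1$. The approximate $k$-design property in addition implies that the deviation from this ideal value remains small. The matrix Hölder inequality controls this deviation, because $\|M\|_\infty = \||y_j\rangle\langle y_j|\|_\infty = 1$. This allows us to conclude $q_j \leq (1+\epsilon)\binom{d+k-1}{k}^{-1}$ for any $j \in [N]$. Both weight and cardinality bound readily follow from this assertion.

A general moment bound for Haar random unitaries
The statement of Theorem 8 involves the depolarizing channel $\mathcal{D}(X) = \frac{\mathrm{Tr}(X)}{d} I$. Moreover, its bounds apply to all centered moments of order $k = 1, \ldots, d^{2/3}$.
Here, $C_k = \frac{1}{k+1}\binom{2k}{k}$ is the $k$-th Catalan number.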

Moment bounds for approximate designs
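Corollary 5. Make the same assumptions as in Theorem 8, but suppose that $U \in U(d)$ is chosen from an $\epsilon$-approximate unitary $k$-design $E$. Then, the centered moments obey bounds analogous to those of Theorem 8, up to a correction controlled by $\epsilon$.
Proof. Fix $k \in \mathbb{N}$ and compare the $k$-th centered moment to its Haar-averaged counterpart, Eq. (7.59). The first contribution is bounded by Theorem 8, and the approximate $k$-design property (Definition 4) ensures that the mismatch $\Delta$ remains controlled. Finally, use the fact that $\tilde{M}$ is the difference of two psd matrices to conclude the claim, where we have also used Eq. (7.5).
Corollary 6 (Moments of $k$-design orbits). For $|x\rangle \in \mathbb{C}^d$ and a measurement $M \in H_d$ ($I \succeq M \succeq 0$), define the centered random variable $\tilde{Q}$, where $U$ is sampled from an $\epsilon$-approximate $k$-design. Then, the $k$-th moment of $\tilde{Q}$ is close to its Haar counterpart. The required control of the mismatch follows from arguments that are analogous to the ones presented in the proof of Lemma 6. Next, apply Eq. (7.42) to the remaining Haar expectation, which is proportional to $\mathrm{Tr}\left(\Pi_{\mathrm{sym}} \tilde{M}^{\otimes k}\right)$.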

Proof of the general moment bound
This section is devoted to proving the general moment bound presented in Theorem 8.

Reformulation and basic norm bounds
Use the vectorization correspondence $|\phi\rangle = \mathrm{vec}(\Phi)$ with $\Phi \in M_{d \times d}$ to rewrite the random variable defined in Theorem 8: Here, we have implicitly defined $M_\Phi := (I \otimes \Phi^\dagger) M (I \otimes \Phi)$. Also, recall that vectorization is an isometry, i.e. $\|\Phi\|_2^2 = \langle \phi | \phi \rangle = 1$. The following auxiliary result bounds the 2-norm of $M_\Phi$ and its partial contractions.
Proof. Observe The function $X \mapsto h_1(X)$ is convex, according to Lemma 4 ($M \succeq 0$ implies $\mathrm{Tr}_1(M) \succeq 0$). Moreover, $\rho = \Phi^\dagger \Phi \in \mathbb{H}_d$ is guaranteed to be a quantum state: $\rho = \Phi^\dagger \Phi \succeq 0$ and $\mathrm{Tr}(\rho) = \|\Phi\|_2^2 = 1$. The extreme points of the convex set of all quantum states are pure states. The convex function $h_1$ achieves its maximum value at such an extreme point (Fact 1) and we infer

Bounds on centered moments
Lemma 9. With the same assumptions and notation as in Corollary 7, suppose that $U \in U(d)$ is chosen uniformly from the Haar measure. Then, for any $k \leq d^{2/3}$ where $C_k$ is the $k$-th Catalan number.
Proof. It is instructive to first analyze and understand the second moment: Each term is a full contraction, also called a tensor network [38,39]. There are three possible constituents for each tensor network: $\tilde{M}_\Phi$, $\mathrm{Tr}_2(\tilde{M}_\Phi)$, and $\mathrm{Tr}_1(\tilde{M}_\Phi)$. Importantly, no full self-contractions can contribute to the overall sum, because $\tilde{M}_\Phi$ is traceless. This ensures that networks with self-contractions, like the first term, evaluate to zero. Moreover, Lemma 7 bounds the 2-norm of each elementary constituent: The final bound is considerably larger than the rest. However, the corresponding contribution in the sum (7.82) is also suppressed by an additional dimension factor. This is not a coincidence: term 3 can only arise if the cycle classes of $(\sigma, \tau)$ differ from each other. This feature is reflected in the Weingarten function. For the second moment, we thus obtain the following simple bound (ignoring signs): It immediately follows from upper-bounding individual terms using Eq. (7.83).
This general strategy also applies to higher moments. Fix $k \geq 3$ arbitrary. Then, Weingarten calculus implies Here, each $N_{\sigma,\tau}(\cdot)$ indicates a tensor network diagram that combines (at most) three elementary building blocks according to rules that are dictated by the permutations $\tau$ and $\sigma$: We can without loss of generality restrict the summation to tensor networks without self-contractions, because $\mathrm{Tr}(\tilde{M}_\Phi) = 0$ ensures that such contributions vanish identically. Next, we apply a powerful general bound to individual tensor networks. [39, Proposition 18] states that the value of a tensor network (without self-contractions) is bounded by the product of the 2-norms of the individual constituents. For any $\sigma, \tau$ this implies where $\nu_1, \nu_2, \nu_3 \in [k]$ denote the number of times each basic building block occurs in the network. Clearly, $\nu_1 + \nu_2 + \nu_3 = k$ and we can combine this with Eq. (7.83) to conclude The final contribution $d^{\nu_3/2}$ is always counter-balanced by the Weingarten function, i.e. the dangerous terms are always suppressed by powers of $1/d$.
As we discussed, the Weingarten functions $\mathrm{Wg}(\sigma, d)$ only depend on the cycle type of the permutation $\sigma$. The asymptotic behavior is $\mathrm{Wg}(\sigma, d) \sim 1/d^{2k - \ell(\sigma)}$, where $\ell(\sigma)$ is the length of the cycle type, i.e. the number of cycles in the permutation. The leading-order terms are those for which the cycle type is $(1, 1, \ldots, 1)$, the partition of $k$ into 1's. For $\mathrm{Wg}(\sigma^{-1}\tau, d)$ this corresponds to terms with $\sigma = \tau$. Returning to the problem at hand, we contract the upper indices of $\tilde{M}_\Phi$ with respect to $\sigma$ and the lower indices with respect to $\tau$. The leading-order terms are those for which the same permutation acts on the upper and lower indices. To generate a term in the tensor network contraction that contains a dangerous contribution $\mathrm{Tr}_1(\tilde{M}_\Phi)$, the lengths of the cycle types of the two permutations must differ by at least one, so that a contraction (a length-one cycle) appears upstairs: Although the $\mathrm{Tr}_1(\tilde{M}_\Phi)$ terms only contribute at subleading order, they carry a larger power of $d$. Thus, to rigorously upper bound the expression, we need bounds on the Weingarten functions as well as on the number $\nu_3$ of dangerous terms that appear in a given tensor network $N_{\sigma,\tau}$.
Precise upper bounds on the Weingarten functions are known [37,74]. For our purposes, it will be convenient to use the (slightly weaker) bound in [75], which states that for $k \leq d^{2/3}$ where $C_k$ is the $k$-th Catalan number. Now we establish that $\nu_3(\sigma, \tau)$, the number of dangerous $\mathrm{Tr}_1(\tilde{M}_\Phi)$ terms in a given $N_{\sigma,\tau}$, is bounded by the distance between the permutations $\sigma$ and $\tau$ as $\nu_3(\sigma, \tau) \leq 2 d(\sigma, \tau)$. First we note a few facts about the symmetric group. The distance $d(\sigma, \tau)$ is defined as the minimal number of transpositions needed to take $\sigma$ to $\tau$. Specifically, $d(\sigma, \tau)$ is the graph distance on the Cayley graph of the symmetric group generated by the transpositions. The length of the cycle type of a permutation $\sigma \in S_k$ is related to the number of transpositions needed to build $\sigma$ from the identity permutation as $\ell(\sigma) = k - d(\sigma, I)$. Furthermore, a transposition changes the number of cycles in a permutation by exactly one.
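The symmetric-group facts above can be checked on small examples. The sketch below is illustrative only: it computes the transposition distance as $k$ minus the number of cycles of $\sigma^{-1}\tau$ (a standard identity), and it assumes one particular reading of $\nu_3$, namely the number of fixed points of $\sigma$ that are not fixed points of $\tau$. All function names are hypothetical, and permutations are represented as tuples mapping $i \mapsto \mathrm{perm}[i]$.

```python
import itertools


def cycle_count(perm):
    """Number of cycles of a permutation given as a sequence i -> perm[i]."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles


def inverse(perm):
    """Inverse permutation."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv


def cayley_distance(sigma, tau):
    """Minimal number of transpositions taking sigma to tau."""
    sigma_inv = inverse(sigma)
    product = [sigma_inv[t] for t in tau]  # (sigma^{-1} tau)(i) = sigma^{-1}(tau(i))
    return len(sigma) - cycle_count(product)


def dangerous_count(sigma, tau):
    """ASSUMPTION: nu_3 read as fixed points of sigma that tau does not fix."""
    return sum(1 for i in range(len(sigma)) if sigma[i] == i and tau[i] != i)


def check_nu3_bound(k):
    """Exhaustively test dangerous_count <= 2 * cayley_distance over S_k."""
    perms = list(itertools.permutations(range(k)))
    return all(
        dangerous_count(s, t) <= 2 * cayley_distance(s, t)
        for s in perms
        for t in perms
    )


if __name__ == "__main__":
    identity = (0, 1, 2)
    three_cycle = (1, 2, 0)
    # ell(sigma) = k - d(sigma, I) for a 3-cycle in S_3.
    print(cycle_count(three_cycle), 3 - cayley_distance(three_cycle, identity))
    print(check_nu3_bound(4))
```

Under this reading, every point counted by `dangerous_count` is moved by $\sigma^{-1}\tau$, and a permutation that moves $m$ points needs at least $m/2$ transpositions, which is why the exhaustive check is expected to pass.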
The terms $\mathrm{Tr}_1(\tilde{M}_\Phi)$ only appear when the permutation $\sigma$ has a fixed point where $\tau$ does not, i.e. there is a contraction on the upstairs indices of $\tilde{M}_\Phi$, meaning that $\sigma$ has a length-one cycle at a point where $\tau$ does not. Since $d(\sigma, \tau)$ is the minimal number of transpositions required to take $\sigma$ to $\tau$, and a transposition changes the number of cycles by exactly one, for every two dangerous terms the distance between the permutations $\sigma$ and $\tau$ must increase by at least one. This shows that $\nu_3(\sigma, \tau)$ is bounded as Returning to the general moment bound, we can apply the bound on Weingarten functions in Eq. (7.90) and the bound on $\nu_3$ to show that which establishes the claim.

ε-coverings of local random circuits
We would like to extend our results in Sec. 3.1 on complexity growth to local random circuits, where the gates are chosen Haar-randomly from $U(q^2)$. The ensemble of size-$T$ circuits is continuous, so statements about the number of states of a certain complexity become less meaningful. Nevertheless, we can consider an $\varepsilon$-covering of the ensemble of local RQCs in order to make concrete statements about complexity growth. We say that a set of unitaries $\mathcal{V}$ is an $\varepsilon$-covering of a set of unitaries $\mathcal{U}$ if for all $U \in \mathcal{U}$ there is some $V \in \mathcal{V}$ that is $\varepsilon$-close to $U$.
Consider the set of local random circuits of size $T$, where again we act on $n$ local qudits with local dimension $q$ and with local gates chosen Haar-randomly from $U(q^2)$. Following Lemma 27 from [12], we can bound the size of an $\varepsilon$-covering of the set $\mathcal{E}_{\mathrm{RQC}}$ of size-$T$ local RQCs. Approximating each local gate to accuracy $\varepsilon/T$, we construct a covering in diamond norm of each gate with size $\leq 10T/\varepsilon$ . For the $n^T$ choices of gates in the circuit, we conclude that there exists an $\varepsilon$-covering $\tilde{\mathcal{E}}_{\mathrm{RQC}}$ of size-$T$ RQCs with cardinality Furthermore, if an ensemble $\mathcal{E}$ forms an $\epsilon$-approximate unitary $k$-design, then the $\varepsilon$-covering of $\mathcal{E}$ will form an $\epsilon'$-approximate unitary design with $\epsilon' = \epsilon + 2 d^{2k} \varepsilon$ (from Prop. 8 in [12]). Combining the lower bound on the cardinality of an approximate design in Lemma 1 with the upper bound on the cardinality of an $\varepsilon$-covering of size-$T$ local random circuits in Eq. (7.93), we see that for $\tilde{\mathcal{E}}_{\mathrm{RQC}}$ to form an approximate design, we must have This gives a lower bound on the size required for local random circuits to form $k$-designs: $T \geq \frac{2 k n \log q}{q^4 \log k}$. (7.95) Therefore, an optimal random circuit implementation of a unitary design will have at least an essentially linear scaling in both $n$ and $k$.
The proof is standard and we include it in this appendix for completeness. It is based on Lévy's Lemma, i.e. concentration of measure on the real unit sphere $S^{2d-1} \subset \mathbb{R}^{2d}$. A function $f : S^{2d-1} \to \mathbb{R}$ is $L$-Lipschitz (with respect to the Euclidean norm $\|\cdot\|_2$ on $\mathbb{R}^{2d}$) if $|f(x) - f(y)| \leq L \|x - y\|_2$ for all $x, y \in S^{2d-1}$. (A.2)
Theorem 9 (Lévy's Lemma). Let $f : S^{2d-1} \to \mathbb{R}$ be an $L$-Lipschitz function on the unit sphere. Then, the following relation is true if $x$ is chosen uniformly from $S^{2d-1}$:
Proof of Proposition 1. The complex unit sphere in $d$ dimensions admits an isometric embedding (with respect to the Euclidean norm) onto the real-valued unit sphere $S^{2d-1} \subseteq \mathbb{R}^{2d}$: $|\psi\rangle \mapsto |x\rangle = \mathrm{Re}(|\psi\rangle) \oplus \mathrm{Im}(|\psi\rangle) \in S^{2d-1}$. because $M$ is Hermitian. Its expectation is also preserved and Lemma 10 below states that this function is Lipschitz with constant $2\|M\|_\infty \leq 2$. The claim then readily follows from Lévy's lemma (Theorem 9).
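The isometric embedding $|\psi\rangle \mapsto \mathrm{Re}(|\psi\rangle) \oplus \mathrm{Im}(|\psi\rangle)$ used in this proof can be illustrated numerically. The sketch below is not part of the argument; it only checks on random examples that norms and Euclidean distances are preserved. The helper names (`embed`, `norm`, `random_state`) and the Gaussian sampling used to produce test states are our own choices.

```python
import math
import random


def embed(psi):
    """Map a complex vector in C^d to the real vector Re(psi) (+) Im(psi) in R^{2d}."""
    return [z.real for z in psi] + [z.imag for z in psi]


def norm(vec):
    """Euclidean norm of a real or complex vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in vec))


def random_state(d, rng):
    """Normalized complex vector with Gaussian entries (used only as test data)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    n = norm(v)
    return [z / n for z in v]


if __name__ == "__main__":
    rng = random.Random(0)
    psi = random_state(4, rng)
    phi = random_state(4, rng)
    x, y = embed(psi), embed(phi)
    # The image lies on the real unit sphere S^{2d-1}.
    print(abs(norm(x) - 1.0) < 1e-12)
    # Distances between states are preserved by the embedding.
    dist_complex = norm([a - b for a, b in zip(psi, phi)])
    dist_real = norm([a - b for a, b in zip(x, y)])
    print(abs(dist_complex - dist_real) < 1e-12)
```

Because the embedding preserves Euclidean distances, a Lipschitz constant computed on the complex sphere carries over unchanged to the real sphere.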

B Designs and the traditional definition of complexity
In the bulk of the paper we focused on a stronger notion of complexity than the standard definition, an operational definition involving the complexity of the distinguishing measurement to differentiate the state from the maximally mixed state. A more traditional definition is often considered in the literature, which involves building a quantum circuit which approximates the state when evolved from an initial state. This intuitive notion of complexity is related to the minimal size of such a circuit.
In this appendix, we will work through the counting arguments in Sec. 5 for the complexity of elements of a k-design using the more traditional (albeit weaker) definition of complexity. We will refer to this as the weak complexity of a state or unitary to distinguish it from the operational definitions presented in Sec. 2.1.
Consider a system of $n$ qudits with local dimension $q$, such that the total dimension is $d = q^n$. Let $G \subset U(q^2)$ denote a universal gate set of elementary 2-local gates, and let $G_r$ be the set of circuits of size $r$ built from our gate set $G$.
Definition 5 (weak $\delta$-state complexity). For $\delta \in [0, 1]$, we say that a state $|\psi\rangle$ has $\delta$-state complexity of at most $r$ if there exists a unitary circuit $V \in G_r$ such that $\frac{1}{2} \big\| |\psi\rangle\langle\psi| - V|0\rangle\langle 0|V^\dagger \big\|_1 \leq \delta$, which we denote as $C_\delta(|\psi\rangle) \leq r$.
We want to be able to make precise statements about the complexity of sets of states. More specifically, if we consider a complex projective design, the requirement that it forms a $k$-design is sufficiently restrictive to deduce a quantitative statement about the complexity of the constituent states.
Theorem 10 (weak complexity of state designs). Consider an $\epsilon$-approximate complex projective $k$-design $\mathcal{E} = \{p_i, |\psi_i\rangle\}_{i=1}^N$. Then there are at least d k k!
The number of high-complexity states is exponentially large in $k$ for complexity $r \lesssim \frac{k(n - \log k)}{\log n}$. (B.2)
Turning now to the complexity of unitaries, the traditional definition of complexity is the minimal size of a circuit, built from our gate set, which approximates that unitary.
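Definition 6 (weak $\delta$-unitary complexity). For $\delta \in [0, 1]$, we say that a unitary $U$ has $\delta$-unitary complexity of at most $r$ if there exists a circuit $V \in G_r$ such that the channels $\mathcal{U}$ and $\mathcal{V}$ are within $\delta$ of each other in diamond distance, where $\mathcal{U}(\rho) = U \rho U^\dagger$ and $\mathcal{V}(\rho) = V \rho V^\dagger$.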
Again, we ask if the structure of a unitary k-design allows us to conclude anything about the complexity of unitaries. Once more, we find that we can turn the statement that k-design elements have a certain expected complexity into a quantitative one.
Theorem 11 (weak complexity of unitary designs). Consider an $\epsilon$-approximate unitary $k$-design $\mathcal{E}$. Then there are at least unitaries in $\mathcal{E}$ with weak $\delta$-unitary complexity $C_\delta(U_i) > r$.
The number of high-complexity unitaries is again exponentially large in $k$ for complexity $r \lesssim \frac{k(2n - \log k)}{\log n}$. (B.4)
We now provide details and proofs of the above statements about the complexity of spherical and unitary designs.

B.1 Weak state complexity for spherical designs
Proof of Theorem 10. First, as stated in Lemma 2, we note that the definition of weak $\delta$-state complexity in Definition 5 is equivalently written as $|\langle \psi | V | 0 \rangle|^2 \geq 1 - \delta^2$. (B.5) We can show this by first noting that $X := |\psi\rangle\langle\psi| - V|0\rangle\langle 0|V^\dagger$ has rank at most two. Directly computing the eigenvalues of $X$ from $\mathrm{Tr}(X) = \lambda_1 + \lambda_2 = 0$ and $\mathrm{Tr}(X^2) = \lambda_1^2 + \lambda_2^2 = 2 - 2 |\langle \psi | V | 0 \rangle|^2$, (B.6) we find $\lambda_{1,2} = \pm \sqrt{1 - |\langle \psi | V | 0 \rangle|^2}$. Then, as $\|X\|_1 = |\lambda_1| + |\lambda_2|$, we have that $\frac{1}{2} \|X\|_1 = \sqrt{1 - |\langle \psi | V | 0 \rangle|^2}$, from which the claim follows.
We want to ask, given some state $|\psi\rangle$ chosen uniformly from an $\epsilon$-approximate spherical $k$-design, what is the probability that the state has $\delta$-complexity at most $r$, i.e. $C_\delta(|\psi\rangle) \leq r$? We know that the state will have $\delta$-complexity at most $r$ if there exists a $V \in G_r$ such that Eq. (B.5) holds. A union bound then gives that We can bound the probability that a state drawn from a spherical $k$-design satisfies Eq. (B.5) as a straightforward consequence of Markov's inequality: In the last step here, we use Eq. (7.42) and proceed similarly as in the proof of Lemma 6, noting that for a fixed state $|\phi\rangle$ and $|\psi\rangle$ averaged over an $\epsilon$-approximate spherical $k$-design, we have This claim readily follows from an argument similar to the proof of Lemma 6. Returning to Eq. (B.8), we find that the probability that a state in a spherical design has complexity of at most $r$ is $\Pr\big[ C_\delta(|\psi\rangle) \leq r \big] \leq (1 + \epsilon) \binom{d + k - 1}{k}^{-1} \frac{n^r |G|^r}{(1 - \delta^2)^k}$, (B.11) using the bound on the expectation and a bound on the cardinality of $G_r$.
We now turn to proving the primary claim. Negating the above assertion implies that Furthermore, we may also write this probability as the expectation of the associated event, which yields where $\mathbb{1}$ is the indicator function, and in the last step we use the bound on the weights of an $\epsilon$-approximate spherical $k$-design in Lemma 6. $M$ denotes the number of states $|\psi_i\rangle$ in the spherical design with weak $\delta$-complexity greater than $r$. Combining the previous two equations, we find that (B.14) which completes the proof.
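To get a feeling for when the bound in Eq. (B.11) is nontrivial, one can evaluate its right-hand side in log space. The following sketch assumes the form $(1+\epsilon)\binom{d+k-1}{k}^{-1} n^r |G|^r (1-\delta^2)^{-k}$ with $d = q^n$; the function names and all parameter values in the example are hypothetical and not taken from the paper.

```python
import math


def log_binom_sym(d, k):
    """Natural log of binom(d + k - 1, k), the dimension of the symmetric subspace."""
    # binom(d + k - 1, k) = prod_{i=0}^{k-1} (d + i) / (i + 1)
    return sum(math.log(d + i) - math.log(i + 1) for i in range(k))


def log_prob_bound(n, q, k, r, gate_set_size, delta, eps):
    """Natural log of the right-hand side of Eq. (B.11), assuming 0 <= delta < 1."""
    d = q ** n
    return (
        math.log1p(eps)                               # factor (1 + eps)
        - log_binom_sym(d, k)                         # inverse binomial coefficient
        + r * (math.log(n) + math.log(gate_set_size)) # factor n^r |G|^r
        - k * math.log1p(-delta ** 2)                 # factor (1 - delta^2)^{-k}
    )


if __name__ == "__main__":
    # Hypothetical parameters: 20 qubits, a gate set of 10 elements, k = 8.
    for r in (10, 20, 40):
        value = log_prob_bound(n=20, q=2, k=8, r=r, gate_set_size=10, delta=0.1, eps=0.1)
        print(r, value)
```

A negative value means the bound on $\Pr[C_\delta(|\psi\rangle) \leq r]$ is below one, so for such $r$ a nonzero fraction of the design weight sits on states of weak complexity greater than $r$.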

B.2 Weak unitary complexity for unitary designs
Proof of Theorem 11. We start by noting an equivalent definition of weak $\delta$-unitary complexity as shown in the proof of Lemma 3. A necessary, but in general not sufficient, condition for weak unitary complexity in Definition 6 is
Now we again ask, given some unitary $U$ chosen uniformly from an $\epsilon$-approximate unitary $k$-design, what is the probability that it has $\delta$-unitary complexity at most $r$, i.e. $C_\delta(U) \leq r$? As this holds if there exists a $V \in G_r$ such that the channels are close in diamond distance, a union bound then gives that using the reformulation above. We can bound the probability that a unitary drawn from a $k$-design satisfies this condition again by using Markov's inequality: where in the last step, we use the moments of traces for unitary designs and, as in Lemma 5, find that for a fixed unitary $V$ and a unitary $U$ averaged over an $\epsilon$-approximate unitary $k$-design, we have Returning to the expression above in Eq. (B.16), we find that the probability that $C_\delta(U) \leq r$ is using the bound on the expectation and a bound on the cardinality of $G_r$.
Negating the expression gives a lower bound on the probability that a unitary in a $k$-design has complexity greater than $r$. Furthermore, we may also write this probability as the expectation where we use the bound on the unitary design weights in Lemma 5. $M$ denotes the number of unitaries in a $k$-design with weak $\delta$-complexity greater than $r$. Combining the previous two equations, we find that which completes the proof.