Operator Spreading in Random Unitary Circuits

Random quantum circuits yield minimally structured models for chaotic quantum dynamics, able to capture for example universal properties of entanglement growth. We provide exact results and coarse-grained models for the spreading of operators by quantum circuits made of Haar-random unitaries. We study both 1+1D and higher dimensions, and argue that the coarse-grained pictures carry over to operator spreading in generic many-body systems. In 1+1D, we demonstrate that the out-of-time-order correlator (OTOC) satisfies a biased diffusion equation, which gives exact results for the spatial profile of the OTOC, and the butterfly speed $v_{B}$. We find that in 1+1D the `front' of the OTOC broadens diffusively, with a width scaling in time as $t^{1/2}$. We address fluctuations in the OTOC between different realizations of the random circuit, arguing that they are negligible in comparison to the broadening of the front. Turning to higher D, we show that the averaged OTOC can be understood exactly via a remarkable correspondence with a classical droplet growth problem. This implies that the width of the front of the averaged OTOC scales as $t^{1/3}$ in 2+1D and $t^{0.24}$ in 3+1D (KPZ exponents). We support our analytic argument with simulations in 2+1D. We point out that, in a lattice model, the late time shape of the spreading operator is in general not spherical. However when full spatial rotational symmetry is present in 2+1D, our mapping implies an exact asymptotic form for the OTOC in terms of the Tracy-Widom distribution. For an alternative perspective on the OTOC in 1+1D, we map it to the partition function of an Ising-like model. As a result of special structure arising from unitarity, this partition function reduces to a random walk calculation which can be performed exactly. We also use this mapping to give exact results for entanglement growth in 1+1D circuits.


I. INTRODUCTION
A key challenge for many-body physics is to identify universal properties of quantum dynamics and the approach to thermalization. Particularly important are universal results that hold for generic quantum systems. Examples of such universal properties include the existence of effective light cones for the propagation of quantum information [1] and the existence of universal scaling forms for the growth and saturation of the von Neumann entanglement entropy in 1+1D [2][3][4][5][6][7][8][9][10][11][12][13][14][15].
By definition, generic systems lack the structures (for example large numbers of symmetries or conservation laws) that allow for exact results in typical solvable many-body systems. Surprisingly, insight into generic systems can come from studying dynamics with even less structure than a generic Hamiltonian system, such as the dynamics generated by a random quantum circuit. Random circuit dynamics provide a minimally structured model with which real Hamiltonian dynamics can be compared [16][17][18][19][20][21][22][23][24]. Despite its simplicity, this model is able to capture universal scaling forms for entanglement growth both in 1+1D and in higher dimensions [10]. Random circuits are also toy models for information scrambling in black holes and other strongly coupled systems [16][17][18][19][20][21][22][23][24][25].
In this paper we provide both exact results and coarsegrained descriptions for the spreading of quantum operators under random circuit dynamics, as measured by the 'out-of-time-order correlator' (OTOC). The OTOC originally appeared in the study of quasi-classical approximations to superconductivity [26], and is closely related to the commutator norm that appears in Lieb-Robinson bounds [1], but it has been studied recently as a means of quantifying the scrambling of quantum information [27][28][29][30]. It has been argued that early-time exponential growth of the OTOC is a characteristic feature of chaotic quantum systems, and such growth has been obtained within the AdS/CFT correspondence and in a range of physical systems [31][32][33][34][35][36][37][38]. The OTOC has also been applied to characterize slow dynamics in the presence of disorder and in the many-body localized phase [39][40][41][42][43][44][45][46]. It has been calculated in 2D conformal field theories [47] and integrable chains [48], and studied numerically in nonintegrable 1D systems [49][50][51]. Following theoretical proposals [52][53][54], experiments addressing the OTOC have been conducted [55][56][57].
Random quantum circuits provide an ideal theoretical setting for the exact calculation of quantities such as the OTOC. While the behaviour of the OTOC in a random circuit is interesting in its own right, we conjecture that the long-distance properties of the OTOC that we derive will also be applicable to deterministic dynamics. Therefore we believe that these results will provide a useful starting point for understanding the generic spatial structure of spreading operators.
An operator O 0 which is initially localized near the spatial origin (say, on a single site of a spin chain) will evolve under Heisenberg time evolution into a vastly more complicated operator O 0 (t) = U † (t)O 0 U (t) that acts nontrivially on many sites. The 'size' of O 0 (t) is the size of the region in which O(t) fails to commute with a typical local operator Y x at position x. This may be measured by where the expectation value has been taken in an appropriate Gibbs state. (For our purposes this will be taken to be the infinite temperature Gibbs state ρ ∞ , which is the state to which random circuit dynamics equilibrate.) To make the connection with the out-of-time-order correlator (OTOC), we may expand out the commutators in (1). For simplicity let us assume for the moment that the operators O 0 and Y x are both Pauli-like operators squaring to the identity. We then have The second term, in which the operators are not time ordered, is the OTOC. At a given time t, the range of x where the commutator C(x, t) is significantly larger than zero gives a measure of the size of the operator. This region typically grows ballistically [58], even when local conserved quantities exhibit diffusive transport [31,38,49,50]. 1 The immediate natural questions about C(x, t) include: what is the 'butterfly' velocity v B associated with this ballistic growth? What is the spatial structure of C(x, t)? Is there a 'hydrodynamic' equation for C(x, t) at large time and distance scales? Are there important differences between 1+1D and higher dimensions? We will answer all these questions for the case where the time evolution operator U (t) is a circuit composed of Haar random unitaries, as in Fig. 1. 1 Strongly disordered Hamiltonians in 1+1D provide counterexamples to this ballistic spreading. We find that the average OTOC C(x, t) (where the average is over the local unitaries in the quantum circuit) has a front which broadens as t α , with the indicated exponents in various spatial dimensions d.
We demonstrate that, both in 1D and in higher dimensions, operator spreading and the growth of the OTOC can be mapped to classical stochastic growth models. We show via an exact calculation that operator spreading in 1+1D can be understood in terms of an equation involving diffusion and drift. The 'front' of the operator propagates at a finite velocity v B . However the front also broadens diffusively, so that its width is proportional to √ t (Fig. 2). We conjecture that this physics also occurs in generic (nonintegrable) 1D systems undergoing deterministic Hamiltonian dynamics. For random circuit dynamics, we must also consider how the averaged correlator C differs from the correlator C within a given realization of the random circuit. We argue that fluctuations between realizations are small: typical variations in the front position between different realizations are O(t 1/4 ), so negligible in comparison with the √ t broadening of the front.
Turning to higher dimensions, we show by an exact mapping that there is a remarkable relationship with a classical droplet growth problem in the Kardar-Parisi-Zhang universality class [59]. (To avoid confusion, we note that this is not related to the connection between entanglement growth and KPZ introduced in [10].) We use this relationship to quantify the broadening of the 'front' of a growing operator in a higher dimensional random circuit. In 2+1D the front broadens like t 1/3 [59], and in 3+1D like t 0.240 [60]. 2 In the two-dimensional case, and in the absence of lattice anisotropies, recent breakthroughs in the theory of interface growth [61][62][63][64][65][66][67][68][69][70][71][72][73] also translate to an exact form for the OTOC, in terms of the celebrated Tracy-Widom distribution (Fig. 3). The broadening of the front of the OTOC is summarized in Fig. 2.
Again, we conjecture that these universal scaling forms extend to nonintegrable models with timeindependent Hamiltonians, although we note that a previous calculation in a different setting has instead found a front that does not broaden with time, and is governed by a traveling wave equation [31] (see also [37,38]). (A traveling wave equation arises from our mappings if we make a certain mean field treatment, Appendix. I. But this mean field is not valid in physical dimensionalities.) In higher dimensions the shape of the spreading operator 3 is also of interest. At first sight one might expect the shape of the operator to be asymptotically spherical at late times. Instead, we argue that in systems with an underlying lattice, which have only discrete spatial symmetries, the spreading operator will not become spherical. Its asymptotic shape is determined by a model-specific velocity function v(n), the speed of the front depending on the local normal vectorn. We verify this for random circuits by simulation in 2+1D.
The results above are for random circuits composed of generic (Haar-random) unitary matrices. It is interesting to compare with random circuits composed of unitaries from the Clifford group, a discrete subgroup which leads to efficiently simulable dynamics [74,75]. In this case the dynamics of the operator is much simpler [10], and randomness-induced fluctuations are much more severe. But remarkably the results for the averaged OTOC C coincide with the results for generic unitaries. This is a consequence of the fact that the Clifford group is a unitary 2-design [76].
In one dimension we give a complementary exact calculation of the averaged OTOC, using a mapping to the partition function of a classical Ising model. These Ising degrees of freedom have a similar origin to those found in calculations in random tensor networks [77]. We show that special structure arising from the unitarity of the quantum circuit means that this partition function is exactly calculable for any value of the local Hilbert space dimension.
Another important question is how the speed v B associated with operator spreading relates to the speed v E which can be associated with entanglement growth following a quench in 1D [2-4, 6-8, 10, 11, 78, 79]. Refs. [10,11] pointed out that in general v E is smaller than v B , unlike the situation in a 1D conformal field theory [3]. We extend this here, showing that arbitrarily small values of v E /v B can be achieved without any fine-tuning. (In a system with quenched, i.e. timeindependent, disorder it is even possible to have v B > 0 but v E = 0 [40]. ) We also use the Ising mapping described above to give an exact calculation of the average entanglement purity (the exponential of the second Renyi entropy) for a random circuit, complementing the scaling picture, in terms of a coarse-grained minimal cut, of Ref. [ We begin by defining the random circuit dynamics which we consider in 1+1D, and describing the 'hydrodynamic' continuum picture we propose for the OTOC in 1+1D. In Sec. IV we give an alternative exact calculation of the OTOC, confirming and extending the results below.

A. Hydrodynamic equation for averaged OTOC
We consider time evolution by a quantum circuit on an infinite 1D spin chain where each spin (qudit) has local Hilbert space dimension q. The structure of the quantum circuit is shown in Fig. 2a. Two-site unitaries are applied to 'even' bonds on even time steps and 'odd' bonds on odd time steps (a 'running bond' layout in the language of bricklaying). Each two-site unitary is drawn independently from the uniform distribution on the two-site unitary group U(q 2 ). Formally, our time evolution operator is U (t) = U (t, t − 1)U (t − 1, t − 2) · · · U (1, 0), where a single layer of the circuit is given by Each two-site unitary U x,x+1 (t , t − 1) is Haar random and independent of all of the others. Given an operator O, we write O(t) = U (t) † OU (t). We will evaluate the following out-of-time order correlator with respect to this time evolution: Here, ρ ∞ is the infinite temperature Gibbs state, i.e., the mixture of all possible spin configurations with equal weights. X 0 is a Hermitian operator located at the origin of the spin chain, and Y x is a Hermitian operator located at site x. We take both X and Y to be traceless, and normalized such that Tr X 2 = Tr Y 2 = q. For example if q = 2 (the spin-1/2 chain) we can take X and Y to be Pauli matrices at sites 0 and x, respectively. Since the unitaries in the circuit are random, we must distinguish between averaged quantities (denoted by E U , or whenever unambiguous by overline [· · · ]) and quantities within a given realization of randomness. However, we will argue that fluctuations induced by the random circuit are small in a certain sense, meaning that the spatial profile of C(x, t) in a given realization of the circuit is, at large times, parametrically close to the average value C(x, t). Fig. 2 is a schematic of the spatial profile we will show for C(x, t) at fixed large time. The 'size' of the operator is determined by a butterfly speed which is Within a region of size ∼ 2v B (q)t the commutator C(x) has saturated to a value very close to unity. Note that for finite q the butterfly velocity is smaller than the 'naive' speed limit of unity, which is set by the geometry of the quantum circuit, while in the limit q → ∞ they coincide. The 'front' of the operator, i.e. the region in which C varies between 0 and 1, broadens diffusively. The width of the front is proportional to More precisely, letting Φ denote the cumulative density function of the Gaussian distribution, Φ(y) = 1 √ 2π y −∞ e −x 2 /2 dx (which tends to zero for y 0 and to 1 for y 0), we have In Sec. IV we will see that Eq. (7) is the partition function of an Ising-like statistical mechanics problem, and will derive an exact formula on the lattice (without any continuum approximation): and g(n, a, p) = a k=0 n k (1 − p) n−k p k .
Here we show how Eq. (7) can be related to a continuum hydrodynamic equation which is asymptotically accurate at large times. For x near the operator's right hand front, C is related to a diffusing conserved density ρ: We will explain the quantity ρ below.
To begin with, focus on the spin-1/2 chain (q = 2). At time t we may write the operator in the basis of products of Pauli matrices [11,22,23,58,79], where X 0 (t = 0) is a single site operator. Here the 'string' S can be any product of Pauli matrices on distinct sites. The number of strings in the sum generically grows exponentially with time (at the naive lightcone velocity, set by the geometry of the circuit). The S are normalized as and X 0 is also normalized so Tr ρ ∞ X 2 It is useful also to introduce ρ(x, t), the 'fraction' of strings ending at x: We observe that the OTOC is determined by a S (t) 2 as follows. Let Y x be the Pauli matrix σ y at site x. (This choice does not sacrifice generality due to Haar randomness of the circuit.) Since distinct Pauli matrices anticommute, we see Due to the orthonormality in Eq. (12), we have This tells us that if we determine the evolution of a S (t) 2 , then the averaged OTOC is also determined. The dynamics of a S (t) 2 turns out to be remarkably simple, as shown in Refs. [22,23]. It is best understood if we first consider a system of just two sites (rather than an infinite chain) over which a Haar random unitary is applied at time t. It is straightforward to calculate (see Appendix A) that for arbitrary q where Note two features. First, the result is linear in a S (t) 2 . Second, S must be the identity if and only if S is, but otherwise a S (t + 1) is a constant for all S = I. In other words, the random unitary introduces a (fictitious) Markov process on the probabilistic ensemble {(S, a S (t) 2 )} of strings [22,23]. This Markov process describes a single string S which is stochastically updated over time. If S is nontrivial, each update maps it to any nontrivial string, with uniform probability. The generalization to multiple spins is immediate: for each pair of spins that interact in a given timestep, the stochastic update is applied to the corresponding two-site substring of S. This Markov process will also be used in higher dimensional setting below, as it is not specific to the 1+1D setting. Note that the fictitious stochastic dynamics, which involves a single evolving string, is entirely different from the stochastic dynamics of the operator X 0 (t) itself (which is a superposition of exponentially many strings).
Returning to the average of the OTOC, we realize that it only matters whether or not the string component of X 0 (t) at the site x is the identity. In the ensemble {(S, a S (t) 2 )}, the fraction of strings that occupy the site x, may fail to commute with Y x . There are q 2 − 1 possible nontrivial operators at the the site, which are all equally probable in the ensemble of string components of X 0 (t). In the present case of q = 2, this yields 4 4 For general q, one has to start with an operator basis that obeys our normalization condition in Eq. (12). It is easy to construct such a basis. Define X = k∈Zq |k + 1 k| and Z = k∈Zq e 2πik/q |k k|. Then, the discrete group generated by these two matrices contains exactly q 2 elements up to unimportant phase factors. These are not hermitian, but no problem arises if one considers |a S | 2 . Over Haar random unitaries, one easily obtains C(x, t) = q 2 q 2 −1 µ(x, t).
In turn, the average occupation number µ(x, t) can be related to the endpoint density ρ, assuming that x is far to the right of the left-hand front of the operator: The constant of proportionality µ 0 has been determined by assuming local equilibration of the structure of the strings. 5 Therefore It is natural to conjecture that local equilibration of the strings, together with the exponentially large number of strings contributing to ρ, will make this identity valid asymptotically even without the average. It remains to analyze the dynamics of ρ(x, t). The above Markov process implies a simple autonomous dynamics for ρ: is calculated by counting the non-identity two-site operators S that have the identity at x + 1, and the overline denotes averaging over unitaries applied up to a given time.
Recalling that unitaries are applied on even and odd bonds alternately, Eq. 23 gives a complete description of the dynamics of ρ(x, t). This is a lattice diffusion equation for the conserved density ρ. Formally, ρ behaves like the probability density for a random walker who starts at the origin, and who prefers to travel to the right since p < 1 2 . In the continuum (i.e. at long timescales) ρ satisfies a simple diffusion equation, whose drift and diffusion constants are determined in Appendix. B: 5 To find µ 0 , make the ansatz that each µ(x) is independent from µ(x ) for x = x . Under this ansatz, the probability that a pair of sites is partially or fully occupied is 1 − (1 − µ 0 ) 2 , and such an occupied pair evolves to fill one of the pair with probability 1 − p. Therefore, setting µ 0 = (1 − p)(2µ 0 − µ 2 0 ) = q 2 −1 q 2 yields the stationary state.
The peak in ρ corresponds to the front of the spreading operator X 0 (t). It travels at speed v B (q) and broadens as σ(q, t) (Eq. 6). We emphasize that this fictitious random walker should not be thought of as 'the endpoint' of the operator X 0 (t), which is a superposition of many strings with different endpoints.
From (22), or in the continuum we see that C obeys the same equation as ρ(x, t) but with different boundary conditions, Taking into account the similar behaviour at the left hand front gives (7). Above we had to make two (very natural) assumptions. One was that we can ignore the interaction between the left end and the right end, and the other was that the occupation density µ(x, t) reaches its equilibrium value. In Section IV we give an exact calculation of the averaged OTOC (including exact results for finite t and x, not necessarily large) without making any approximation.

B. Hydrodynamic description including fluctuations
Having determined the averaged OTOC, the key question is about the fluctuations between different realizations of the random circuits. From the point of view of exact results this is a much harder question (it is possible to obtain bounds in regions far from the front: we return to this in Sec. IV C). However, we conjecture that the universal physics of fluctuations in ρ(x, t) can be obtained by upgrading Eq. 25 to a stochastic diffusion equation for the random quantity ρ(x, t). This description indicates that fluctuations are strongly suppressed at late times. Since the diffusive broadening is present in a single realization (i.e. is not an artefact of disorder averaging) it is natural to conjecture that it will also be present in generic non-random 1D many-body systems.
Microscopically we expect noise in both the diffusion constant and the drift, but we restrict to noise in the latter since it is more relevant in the RG sense: Here η(x, t) is white noise, uncorrelated in space and time.
The statistical properties of this equation are easy to analyze. In the absence of the noisy drift term, ρ(x, t) forms a 'wavepacket' whose width grows like √ t and whose center of mass is at x cm = v B t. When the noisy drift is turned on, it induces statistical fluctuations in x cm whose magnitude scales with time as A quick way to see this is to ask what the drift velocity has been in a given realization, averaged over the spacetime region visited by the wavepacket. The wavepacket visits a spacetime volume of order t dt √ t ∼ t 3/2 . Averaging the drift velocity η(x, t) over this spacetime volume yields v av ∼ t −3/4 . The typical random displacement of the wavepacket is thus of order ∆x cm ∼ v av t ∼ t 1/4 . A standard perturbative calculation in Appendix C reproduces this exponent 1/4, which also characterizes the spreading of directed waves in random media [80].
The quantity ∆x cm is parametrically smaller than √ t, the width of the front of the averaged commutator. Therefore this heuristic argument indicates that the front profile of the averaged OTOC also applies to the OTOC within a given instance of the random circuit. This is somewhat surprising. To see why, let us contrast the above Haar random dynamics with Clifford dynamics for q = 2.

C. Comparison with Clifford circuit dynamics
The Clifford group is a discrete subgroup of the unitary group, defined by the property that any Pauli matrix is mapped to a product of Pauli matrices. When the quantum circuit consists of Clifford operators, an initial Pauli matrix remains a single string (rather than evolving into a superposition of exponentially many strings as for dynamics with generic unitaries) and the endpoint density ρ(x, t) is localized on a single site for all times.
However, uniformly random Clifford circuits have a crucial relationship with Haar random circuits. Under a uniformly random Clifford update on a pair of sites, a nontrivial operator is mapped with equal probability to any of the nontrivial operators, and thus the dynamics satisfies the master equation of the Markov process in Eq. (17) [10]. As a result, the averaged quantities such as the average end point density ρ(x, t), the average occupation number µ(x, t), and, most importantly, the average OTOC C(x, t), obey exactly the same dynamics as the Haar random case. Formally, this is a consequence of the fact that random Clifford operators form a unitary 2-design [76]; see Appendix D for the definition of design and a proof for random Clifford. One may say that Clifford dynamics realizes the a priori-fictitious Markov process in a physical system. 6 Despite the equivalence of averaged quantities, the quantities within a realization are entirely different. In the Clifford case the endpoint density ρ and the OTOC C are strongly fluctuating, while we have argued that for generic unitaries they are self-averaging (fluctuations are parametrically small).

III. HIGHER DIMENSIONS
We now address the structure of the out of time order correlator C(x, t) in spatial dimensions greater than one, by exploiting the relationship between the averaged OTOC and a fictitious classical Markov process (Sec. II A). We show that this process is a classical droplet growth problem whose universal physics can be understood in terms of the Kardar-Parisi-Zhang equation [59]. By taking appropriate averages, we then obtain exact universal exponents and scaling forms for the OTOC in a circuit composed of Haar random unitary matrices. We conjecture that these scaling forms also apply to more realistic Hamiltonian dynamics in non-integrable lattice models and field theories.
Somewhat surprisingly, we show that the 'shape' of the spreading operator at late times does not become spherical, unless the microscopic dynamics has symmetry under continuous spatial rotations. In a lattice model, the spreading operator remembers forever that the lattice has only discrete point group symmetries. Our argument for this is not specific to random circuit dynamics. The point is simply that 'the' butterfly velocity v B , which sets the speed at which the operator's front moves, generically depends on the front's orientation, resulting in an anisotropic profile for the spreading operator at long times. Another surprising outcome, given previous work in the context of many-body perturbation theory including Ref. [31], is that for the dynamics considered here the averaged OTOC C does not satisfy a local differential equation.

A. Higher dimensions: setup and mapping to classical growth
We now describe the unitary dynamics for which we wish to study operator spreading and the OTOC. We choose a circuit where in each timestep Haar-random two-site unitaries are applied to bonds of a d-dimensional cubic lattice in a manner that generalizes the 1+1D protocol. We describe the 2+1D case for concreteness; the generalization to higher dimensions is immediate. The periodicity of the circuit is 4 layers. Four successive layers cycle through the four columnar 'dimer coverings' of the square lattice as shown schematically in Fig. 4 and Fig. 5, so that the site at the origin interacts sequentially with its neighbours at x = (0, 1), (−1, 0), (0, −1), (1, 0).
The reduction to a classical stochastic process in terms of the fictitious occupation numbers n(x) = 0 or 1 (31) proceeds just as in 1D (Sec. II and Appendix. A). Consider two adjacent sites x and y which undergo a joint update in a given timestep. If both sites are initially empty (n(x) = n(y) = 0) they remain so after the update. If at least one of the sites is initially occupied (n(x) = 1 or n(y) = 1 or both) then the configuration after the update can be n(x) = 1, n(y) = 0 with probability p, or n(x) = 0, n(y) = 1 with the same probability, or n(x) = n(y) = 1 with probability 1 − 2p, where as before If we consider the OTOC for a spreading operator which is initially localized at a single site, then the corresponding classical model is initialized with n = 1 at the origin and n = 0 everywhere else. A possible evolution of n(x) in a single timestep is shown in Fig. 4.
Growth of a Classical Droplet and the OTOC: We relate the behavior of the OTOC (averaged over the unitaries in the circuit) to a classical stochastic process for the growth of a droplet in two spatial dimensions. A given configuration of the classical droplet is specified by a binary occupation number n(x, t) as shown the left. Remarkably, the average droplet profile n(x, t) precisely reproduces the averaged OTOC.
Recall that the Haar-avaraged OTOC is related to the mean occupation number for this Markov process at time t by the relation as illustrated schematically in Fig. 6. The averages on the two sides of the above equation have different meanings. On the left, the bar denotes an average over realizations of a unitary circuit, and C is a correlator for this quantum dynamics. On the right, the angle brackets denote an average in a classical stochastic process. The real number C and the integer n are only related after averaging. As we noted above, the fictitious Markov process can be physically realized by random Clifford dynamics, whenever q is a prime power.

B. Classical model in 2+1D: analytical and numerical results
The 'seed' at the origin grows to produce a cluster of linear size ∼ t. In the interior of this cluster the state equilibrates rapidly to a state in which nearby sites are essentially uncorrelated, with average occupation n(x, t) classical = (q 2 − 1)/q 2 . In a given realization there is an interface between the occupied and unoccupied regions which is sharp on length-scales of the order of the lattice spacing. The evolution of the droplet is very similar to well-studied growth models such as the Eden model [83], and reduces to the stochastic growth of this one-dimensional interface. The size of the occupied region grows linearly in time, with statistical fluctuations in the shape of the interface. (The average shape in our 2D model is not circular, but has only four-fold rotational symmetry; we discuss this in Sec. III E.) Typically such growth processes are in the universality class of the Kardar-Parisi-Zhang (KPZ) equation [59]. We determine the behavior of the averaged OTOC by simulating the stochastic growth of a two-dimensional cluster over M = 2 × 10 3 realizations, with local updates applied at each timestep, as described in the text. The average occupation number for the cluster n(x, t) is shown for the indicated times in the evolution as it approaches its asymptotic shape.
Consider a section of the interface and let ξ be a coordinate parallel to the interface and h its height in the perpendicular direction. The KPZ equation is where ζ is uncorrelated spatiotemporal noise. The constant c contributes to the average normal growth rate for the interface, while the ν term describes diffusive smoothing of sharp features. Finally, the non-linear λ term encodes the dependence of average growth rate on the slope. This equation renormalizes to a nontrivial fixed point. One of its most basic properties is the fact that the fluctuations in the height at a given position ξ grow with time as t β , with β = 1/3. Let us write the shape of the droplet as a parameterized curve in polar coordinates, with R(θ) the radius at angle θ from the origin. (As mentioned above, the interface is sharp on an O(1) lengthscale, and therefore R(θ) is well defined up to an O(1) uncertainty; this is sufficient since the properties we discuss below are on parametrically larger lengthscales when t is large.) From KPZ scaling we would expect with the exactly-known exponent β = 1/3. We will discuss the nonuniversal function r(θ) below and in Sec. III E, and we will discuss more detailed universal properties in the next section.
We have examined the growth of the droplet for spin-1/2 degrees of freedom (q = 2) on the square lattice, by tracking the average, evolving support of a cluster over M = 2 × 10 3 realizations of the classical dynamics up to time t = 1000. We store only the density n(x, t) , averaged over all M realizations, as a function of position and time, as this is the quantity with a direct interpretation in the quantum setting. We have also investigated smaller values of q: these do not have an interpretation in the quantum circuit, but in the classical model decreasing q simply corresponds to increasing the probability p in the update. At each time slice, the form of n(x, t) is fitted, along cuts through lattice symmetry axes, to extract the cluster size and the the width of the front (where n(x, t) is appreciably different both from zero and from its t → ∞ value). We observe linear growth of the size as expected. Note that the fluctuations in the second equation of (35) imply that the width of the front region is expected to scale like t 1/3 . Fig. 8 (top) shows the growing width of the front for cuts along the diagonal, θ = ±π/4. There, at the largest times we can access, the fitted exponent is β = 0.3305 ± 0.0269, extracted from a fit to the blue data points in Fig. 8. As expected, this value is consistent with the KPZ value β = 1/3.
A slight surprise is that the behaviour along the axis, e.g. at θ = 0 is rather different: see Fig. 8 (bottom), which does not show KPZ growth. Generically the only stable fixed point for the growth of a 1D interface is believed to be the KPZ fixed point. However anomalous growth is possible in this model, for q greater than a critical value q c 2, when the direction of the front's local normal vector is fine-tuned to coincide with one of the axes, as occurs at θ = 0. In this regime, a front with normal parallel to a lattice axis moves at a speed exactly equal to the naive light-cone speed, v B = 2, and does not roughen. This is a known phenomenon in various lattice growth models in discrete time which have synchronous parallel updates, and can be understood by a relationship with directed percolation [84][85][86][87][88]: see Appendix. E for an explanation. While interesting, this phenomenon is an artefact of the specific discrete spacetime geometry we have chosen, which could be eliminated by modifying this geometry, 7 and we certainly do not expect it to be relevant to continuous time dynamics. (It would be interesting to look for this effect in appropriate deterministic Floquet dynamics, however.) It has an effect on the shape of the droplet, which we discuss in Sec. III E. We now discuss the OTOC scaling that results from the KPZ mapping, neglecting effects of lattice anisotropy (which we will return to in Sec. III E).
To simplify things let us consider a case where lattice anisotropy is absent or very weak, so that the spreading operator is circular and the OTOC depends only on a radial coordinate and time. Weak anisotropy could certainly be engineered in an appropriate random circuit. More importantly, we conjecture that the scaling form below captures universal scaling in realistic rotationally invariant many-body systems and field theories.
For the growth of a droplet, the probability distribution of the interface radius is given by the GUE Tracy Widom distribution [65,68] (which has been been observed experimentally in striking experiments on the growth of a turbulent domain in liquid crystals [93,94]). Following convention we write where the non-universal constants v B and c are of order one, and χ(θ, t) is a random variable whose mean and variance are of order one at large times. Focussing on a fixed value of θ, the cumulative probability distribution of χ at a fixed time is t-independent at large times and given by the Tracy Widom distribution F 2 : Remarkably, this allows us to fix the full functional form of C(x, t) in two dimensions, in the case where lattice anisotropy is absent. In polar coordinates (r, θ), and in the continuum, Eq. 33 is where Θ is the Heavyside step function. The right hand side is precisely the probability that χ is greater than (r − v B t)/ct 1/3 . We suppress the θ dependence since we are assuming rotational symmetry: The form of C(r, t) is shown in Fig. 9. The asymptotic behaviour near the trailing edge, close to saturation, i.e.
The former asymptotic expansion of F 2 was achieved only recently [95,96].
One can also consider operator spreading with other initial conditions. For example we can initialize an operator in a half-plane so that C(x, t) has a straight, rather than a circular, front. The scaling form for C(x, t) will then be given by the Tracy Widom distribution of the Gaussian orthogonal ensemble, denoted F 1 . The objects F 1 and F 2 are of fundamental importance in a broad range of mathematical and physical problems and it would be very interesting to see whether any of these connections shed light on operator growth.

D. Scaling of the OTOC in 3 + 1D and above
The basic features of the 3+1D case are very similar to those in 2+1D. The KPZ equation extends to an interface of arbitrary dimensionality [59]. For the the 3+1D quantum problem, the dimensionality of the interface is two and the critical exponent β relevant to the width is β 0.240 (Ref. [60] and references therein). The analogue of F 2 which yields the universal form of C is not known analytically, but has been determined numerically [92]. Numerical simulations for the 3+1D random circuit, along the lines of those above, would be feasible.
Dimensions equal to or higher than 4+1 are of course inaccessible experimentally, but they are nonetheless interesting because in these high dimensionalities the KPZ equation yields a phase transition as a function of the strength of nonlinearity [59]. Both a rough phase, in which fluctuations grow as t β with β > 0, and a smooth phase, where fluctuations remain of order one as t → ∞, exist. It would be interesting to know whether both phases are accessible in appropriate many-body systems. It is interesting to consider the shape of the spreading operator at late times -i.e. the shape of the growing spatial region in which the OTOC C(x, t) has already saturated to its late time value (to within an exponentially small correction). Rescaling distances by a factor of t −1 gives a 'droplet' of O(1) size, which we expect to reach a fixed asymptotic shape. In this scaling limit the width of the front is negligible, so the front can be treated simply as a curve. What is its asymptotic shape?
At first glance, one might expect that the asymptotic shape is a circle in two spatial dimensions and a sphere in higher dimensions. For example, this would be expected if the OTOC satisfied a local nonlinear differential equation in which derivatives higher than 2 could be neglected, as discrete lattice symmetries would ensure that such an equation had symmetry under continuous spatial rotations. Instead, we argue that for many body systems on the lattice the shape of the operator is model-dependent and retains information about the discrete symmetries of the lattice, even at arbitrarily late times. For the random circuit model this follows immediately from the mapping to domain growth processes, for which anisotropy is a well-known feature [97][98][99][100][101]. Figure. 10 shows the shape of the droplet in the present model for various values of q.
For concreteness consider the 2D case (similar statements hold in higher dimensions). The asymptotic droplet shape is described by a radius R(θ) depending on the polar coordinate θ. Since the size of the operator is large at large times, the curvature of the front is parametrically small, except possibly at isolated θ values where R(θ) is not smooth. Away from such isolated points, the local velocity of the front, in the direction of its normal vector, can depend only on the orientation of this local normal vector. This dependence is captured by a velocity function v B (φ), where φ = φ(θ) is the angle of the local normal vector to the x-axis. A priori v B (φ) is constrained only by lattice symmetry; for example on the square lattice It is evident that the asymptotic shape cannot be a circle except when v(φ) is a constant function. Since the front of the operator advances by v(φ)dt in the normal directionn, the distance between the front and the origin grows by v(n)dt/n ·r, which must be equal to h(r)dt.
Expressing the normal vector in terms of h, one obtains where φ(θ) is the angle of the normal at polar position θ on the interface. This equation is solved by a geometrical construction described in Ref. [100,101]: cos(φ−θ) . When the effect of lattice anisotropy is t = 1000 weak (as is likely to be the case in many realistic situations when the relevant degrees of freedom are longwavelength modes), we expect v B (φ) to be a smooth, weakly varying function, and we may also solve for the shape perturbatively in w(φ) = v B (φ)/v B (φ), as described in Appendix. F. Restoring the time dependence, However when v B (φ) varies sufficiently strongly, the asymptotic shape R(θ) can include sharp corners or straight segments on the boundary: in this regime the perturbative solution above is no longer appropriate. For many-body systems in continuous time we expect v(φ) to be analytic. In the present lattice model, v(φ) is analytic for q < q c (q c 2) while for q > q c this function is nonanalytic near φ = 0 as a result of the anomalous behaviour of a lattice-aligned front: v(φ) 2 + const. |φ| [88]. This leads to flat facets near θ = 0 in the asymptotic shape [88]. This change in the surface morphology as q is varied is shown in Fig. 10. FIG. 11. Anisotropy in the Cluster Profile: Numerically determined anisotropy in the average shape of the 2D cluster R(θ, t)/t, at the indicated times. The anisotropy in the cluster shape grows in time, and appears to asymptote to a nontrivial steady-state shape.
For the random circuit model, it is straightforward to determine v B (θ) in the extreme limit q = ∞ where growth becomes deterministic. The propagation of the front in this limit is similar to that of the 'next nearest neighbour' deterministic Eden model introduced in Ref. [102] and has the same nonanalytic angular dependence of the velocity 8 [102]: v q=∞ (φ) = 2 (| cos φ| + | sin φ|) .
In this limit, the growing operator is simply a square. Fig. 11 shows the angular dependence of the radius for the 2+1D random circuit dynamics at q = 2, for several values of the time, showing a clear anisotropy. Note also that R(θ = 0) → 2t at late times. In the light of our 1D results where, for finite q, v B is always less than the speed associated with the naive lightcone, it is remarkable that in a higher dimensional circuit it is possible for the OTOC front to propagate at the maximal speed in some directions. However we emphasize that this effect relies on the specific discrete spacetime geometry.

F. Formal viewpoint
Before returning to 1D, we restate the higherdimensional results of Sec. III A in a more formal language which parallels our discussion in 1D. We introduce a density on clusters, C, where C is a collection of sites: Here supp(S) is the support of S. After coarse-graining, we can represent C by a closed surface of spherical topology, namely the boundary of the coarse-grained cluster. Therefore ρ(C) is the natural analogue of the 'endpoint density' ρ(x) in 1D. The surface growth picture implies that the effective dynamics of ρ(C) are the dynamics of the probability distribution of a growing interface. Therefore, when this is KPZ, ρ(C) satisfies the Fokker-Planck equation corresponding to the KPZ equation. We will discuss this further elsewhere.

IV. EXACT CALCULATION OF OTOC IN 'SPACETIME' PICTURE
We now given an analytical treatment of the OTOC from a 'spacetime' point of view. This leads to connections with domain walls in an effective Ising model. Similar Ising degrees of freedom have appeared in work on random tensor networks [77]. Here the effective Ising model looks complicated at first sight, but turns out to be much simpler than those encountered in random (nonunitary) tensor networks, due to special structure arising from unitarity.
This spacetime picture may be much more generalizable than the dynamical point of view above. In Sec. V we will use it to calculate an entanglement-related quantity. In the future, we hope that the tools introduced in this section will be generalizable to higher moments of the OTOC which capture fluctuations (C 2 etc.), or higher powers of the commutator, or to a direct calculation of the von Neumann entropy.
Our exact result for the OTOC for arbitrary x and t (not necessarily large) is There are no approximations in Eq. (48). Approximating g by the cumulative density function Φ(y) = 1 √ 2π y −∞ e −x 2 /2 dx of the Gaussian distribution, we reproduce Eqs. 5, 6, 7 above. This approximation is valid when t is large.
Although the spin chain is spatially infinite in both directions and so is our quantum circuit, the time evolved operator U † (t)X 0 U (t) is supported only on the interval [−t, t − 1] of length 2t. Therefore, it suffices to consider an observable Y inserted in this interval, and our correlator becomes the trace of a q 2t × q 2t matrix. The infinite temperature Gibbs state reduces to the identity matrix divided by the dimension q 2t . Expanding the commutator, we see The Haar average of the first term is easy to evaluate. The observable X 2 0 is conjugated by a unitary U −1,0 (1, 0) and after taking the Haar average becomes proportional to the identity. The constant of proportionality is fixed by the trace-preserving condition. By the normalization convention, Tr X 2 0 = q = Tr Y 2 x , and therefore the Haar average of the first term is equal to q −2t Tr I = 1. The second term F contains all the complexity.
Observe that the local unitaries form a square lattice that is rotated by 45 • . It is thus natural to introduce null coordinates as Due to the cylic property of the trace, the only unitaries in the circuit that could affect the correlator are those in the intersection (a rectangle) of the future light cone of X 0 and the past light cone of Y x . From now on, let us use u and v to denote the linear sizes of this intersection along u-and v-direction, respectively. There are u v local unitaries contained in the intersection of the lightcones.

A. Reduction to Ising spins
For each local unitary U the expression F contains two U s and two U † s. We will see that averaging over the local unitaries allows us to express F as a partition function for a set of classical Ising spins. To see why such Ising spins arise, consider the standard expression for the Haar average of a single unitary matrix in U(n): (See Appendix G for a self-contained derivation of this formula.) It is convenient to regard the above expression as a matrix whose rows are labelled by the multi-index (a , b , c , d ) and whose columns are labelled by (a, b, c, d).
Note that two types of contraction appear for the unprimed indices, namely δ ab δ cd and δ ad δ bc , and similarly for the primed ones. Correspondingly, in bra-ket notation the above matrix can be written in terms of two vectors which we denote |I ↑ and |I ↓ (the reason for the notation will become clear below): In the natural basis, these vectors are With this definition, we may write (53) as with We see that the unitary may be associated with a pair of classical Ising 'spins' s and s . For the application of interest to us the unitaries are two-site unitaries acting on the q 2 -dimensional space associated with spins i and i + 1. In this case it is easy to see that with αβγδ |↑ = 1 q δ αβ δ γδ , αβγδ |↓ = 1 q δ αδ δ γβ , (58) and now α, . . . , δ run over the q basis vectors associated with a given spin. The vectors |↑ and |↓ have norm 1 and satisfy When we consider F , the spins arising from each unitary in the circuit will form an interacting network. The interactions between spins from the same unitary will be given by w(s, s ), while the interactions between spins from different unitaries will arise from the inner products of kets |↑, ↓ i associated with a given spin.
As a final piece of notation, we generalize the definition of |I ↑ and |I ↓ . Given an operator O ab on the ndimensional space, we define n 4 -dimensional vectors |O ↑ and |O ↓ via Choosing O to be the identity gives the vectors |I ↑,↓ .
Before we evaluate F for arbitrary u and v , let us consider the simplest case where u = v = 1.
In the second line, X = I ⊗ X and Y = I ⊗ Y . The third line is a trivial rearrangement of the second, and the fourth employs the formal correspondence between matrices and normalized vectors introduced above. The Haar average of the tensor product of four unitaries is given by (53) with n = q 2 .
To complete the evaluation of F we observe that When u , v > 1, we map the layout of local unitaries to a partition function for the spins s, s in Eq. 55. To facilitate the mapping, we decompose the input bra I s | and output ket |I s into separate 'legs' corresponding to the two physical spins, as in Eq. (57), Similarly, the vectors encountered above for the case u = v = 1 can be decomposed as |(IX) ↑ = |↑ |X ↑ and |(IY ) ↓ = |↓ |Y ↓ , which satisfy The expression Eq. (61) is now depicted as in Fig. 12.
It is now clear that for general u , v we may regard the array of unitaries as a tensor network composed of tensors of the form (62). The boundaries of this tensor network -i.e. the external legs of the array of u × v unitaries -involve inner products with fixed vectors. Two of the boundary legs are dressed with q |X ↑ and q Y ↓ |; see Fig. 12. Apart from these, the external legs on the top boundary are dressed by states q ↓|, while those on the bottom boundary are dressed with q |↑ . In addition F includes an overall dimension factor q −2t coming from the infinite temperature Gibbs state. For convenience, we absorb the overall dimension factor q −2t  into the vectors on the lower boundaries; these vectors are taken to be normalized, whereas the boundary bras in the top boundaries have norm q.
We may now interpret F as a partition function for the Ising spins s u,v and s u,v which according to Eq. We have thus mapped the Haar average of the out-oftime correlator to a partition function for Ising degrees of freedom (with the q-dependence residing in the interactions on the bonds). At first sight, this may appear to be a formidable problem. Note in particular that some configurations have negative weight. However, a simplifi- cation is possible, as a result of the unitarity of the underlying dynamics. A hint that a simplification is possible comes from the fact that the expression for F , Eq. (49), becomes trivial if one of the operators X 0 and Y x is the identity operator. In the Ising language this corresponds only to a slight change of boundary conditions. The simplification is effected by integrating out the 'bra' variable s u,v from each unitary. This generates a three-spin interaction among the 'ket' variables s u,v , s u−1,v , and s u,v−1 . The calculation is straightforward and yields the table of weights in Fig. 14 The fact that the weight is zero for two of the configurations means that only a very restricted subset of Ising configurations are allowed. We will show that these can be summed exactly by viewing the configurations in terms of domain walls.
Let us specify the new boundary conditions. The above rules apply along the bottom boundaries due to our normalization convention for the boundary kets, except for the site where the observable ket |X ↑ is dangling.
The top boundary bras, which have norm q, follow the rule that

B. Partition function for two directed paths
Now the problem is reduced to a partition function of Ising variables with the three-body interaction and the boundary interaction. We first simplify the partition function by relating it to one with modified boundary conditions as follows. We denote the weight of a given configuration by W X,Y (s), where the subscripts indicate the dependence on the boundary conditions induced by the operators X and Y . Because of the last rule in Eq. (66), the spin at the site where Y x is attached -null coordinate ( u , v ) -has to be s u, v = ↑. As a result we can replace Y ↓ with ↓ fixed , which according to Eq. (66) gives the same weight when s u, v = ↑. Let us denote the weight of a configuration s under this modified top boundary condition by W X (s), dropping the subscript Y . We may then write the desired quantity We claim that the first term is equal to 1, and thus The claim can be shown in two ways. First, the out-oftime correlator 1 − F must vanish if the operator Y is replaced by the identity. The boundary vector |Y ↓ then becomes |↓ , and the partition function for F becomes precisely s W X (s). Therefore 1 − s W X (s) = 0. The other way to show the claim is by directly integrating out the Ising variables inductively, starting from the top line with respect to the all-↓ boundary condition along top boundary. This is a nontrivial consistency check on our reduction. Now we focus on W X (s) with the variable at nullcoordinate ( u , v ) fixed to be ↓. If the bottom variable where X is attached is ↑, then the second rule in Fig. 14 together with the boundary condition along the bottom boundary dictates that all the bulk variables be ↑. This cannot be fulfilled for the top-right variable, implying that the weight is zero.
Hence, we have fixed two Ising variables in the bulk to be ↓ where the observables X 0 and Y x are attached. Let us think of domain walls instead of spins. The key point is the first rule in Fig. 14 (70) from the weight table in Fig. 14.
A domain wall can be deformed without changing the weight to the partition function. There is essentially one local deformation of the domain wall. One can easily see from Conversely, any pair of paths A → D and C → B, which must meet at at a point, can be viewed as a pair of paths A → B and C → D with a common point. Therefore, the number of pairs of paths from A → B and C → D without intersection is the number of all unrestricted pairs from A → B and C → D, minus the number of all unrestricted pairs from A → D and C → B. The number of our domain wall configurations is therefore where the second factor vanishes when u = 0 or v = 0. Finally we combine the results above: Eq. (72) .
This correctly reproduces the answer q 4 /(q 4 − 1) when This can be conveniently rewritten as Further simplification is possible since g(t − 1, a) g(t − 1, a − 1) for large t. where

C. Bounds on Fluctuations
Here we estimate the fluctuation of C(t, x) due to the randomness of the unitaries. One might wish to calculate this fluctuation directly, using a similar technique that we employ for the average of C(t, x), but we found the exact computation unwieldy as it involves high powers of unitaries. Nevertheless, we can argue that the fluctuations are negligible in two regimes.
Since the random variable C(t, x) takes values between 0 and 2, the variance is upper bounded by 2C. Therefore, the standard deviation is upper bounded by This bound is valid for any t, x, but only meaningful when |x| v B t. This basically says that there is almost no "leakage" of operators beyond the lightcone defined by v B . (In passing, we note that one can also use Markov inequality Pr[X ≥ a] ≤ a −1 EX which holds for any positive random variable X and a positive number a to have a probability tail bound.) In the opposite regime where |x| v B t, we have shown that the average C(t, x) is almost 1; the discrepancy is upper bounded by O(1) exp(−(v B t − |x|) 2 /2σ 2 ). Thus, in this regime the fluctuation is basically given by where F is defined in Eq. (49) and N is the dimension of the Hilbert space of spins where U XU † Y is supported on. Here U includes all the local unitaries in the evolution quantum circuit.
To estimate the fluctuation, we consider a slightly different system where 2ct spins form a ring, where c is some absolute constant that depends on q only. If c > 1, this does not modify the dynamics at all, since the evolved operator U X 0 U † is supported on 2t spins. For c < 1, while we do not insist that this allows us to compute the fluctuation rigorously, we anticipate that qualitative conclusions from this modified setting carry over to the original open chain system.
In Appendix H, we show that if c = M −1 q =Õ(q −2 ), then for all |x| < ct That is, deep in the lightcone, there is a region in spacetime bounded by a nonzero speed where the fluctuation of C(t, x) is suppressed exponentially in t. It is likely that this is only a bound, rather than a tight estimate of fluctuation. Eq. (82) is proved using previous results on approximate unitary designs [24], and estimates for E U F 2 (U ) when U is truly Haar random [103].

V. ENTANGLEMENT GROWTH
Entanglement can by quantified in various ways, but perhaps the simplest measure is the entanglement purity P = Tr (Tr A c |ψ ψ|) 2 ≤ 1, where A is some region. A pure state |ψ on A∪A c is entangled if and only if the purity is not equal to 1. The logarithm of the entanglement purity is the Renyi-2 entropy In this section, we calculate exactly the average purity of 'half' of the infinite chain, for arbitrary t, under the evolution protocol in Section II, with an initial product state. (Previous work has obtained a bound on the saturation time for q = 2 [23].) The calculation technique will be very similar to the OTOC calculation; the difference is only in the boundary conditions. Let A be the left half of our chain, and B be the right half. The initial pure density matrix is ρ(t = 0) = · · · ⊗ P −1 ⊗P 0 ⊗P 1 ⊗· · · , where P i is a projector |i i|, the state at site i. If U is the full time-evolution unitary consisting of local unitaries, then the entanglement purity P across the cut between A and B is The notation |↑ , |↓ is the same as in Sec. IV A. For a one-dimensional projector P on q-dimensional space, the q 4 -dimensinal vector |P ↑ satisfies The expression for the purity can be thought of as a partition function for classical Ising spins as in Sec. IV A. There are two Ising spins associated with each local unitary; see Eq. (55). Due to Eq. (85), for any configuration of the Ising spins, the weight factor from the bottom boundary is q −|A|−|B| , which cancels the factor q |A|+|B| in front of Eq. (84). Hence, the average purity is simply the sum of weights from the domain wall in the bulk (e.g. see Fig. ).
In Sec. IV A, we first integrated out the 'bra' Ising variables s , but here we find it simpler to integrate out the 'ket' Ising variables s. The transition rules of Fig. 14 are now upside down, but otherwise the same. Then, we have a single domain wall starting from the top boundary to reach the bottom. Any domain wall has length exactly t, giving rise to weight q q 2 +1 t . The domain wall can choose between left-down or right-down moves as it proceeds from the top, and therefore there are 2 t domain walls. We conclude that We may define the 'purity speed' This quantity gives a bound on the growth rate of the second Renyi entropy: The inequality is because the function f (x) = − log x is convex. Note that this expression bounds the growth rate of S 2 but does not fix it. The distribution of S 2 fluctuates in a window of small size compared to its mean [10], 9 but since S 2 appears in the exponential in q −S2 , this does not rule out the possibility that this quantity is affected by rare anomalously small values of S 2 , making it very different from q −S2 . The von Neumann entropy S vN is always greater than or equal to S 2 , so the growth rate v E of S vN is also bounded by v P : where the expansion is for large q. In Ref. [10] we argued that the universal fluctuations of the entanglement in random circuit dynamics may be understood in terms of a coarse-grained minimal cut, of random shape, through the random circuit. This picture may be contrasted with the domain wall calculation of the averaged purity, which reduces to a statistical mechanics problem without quenched randomness. This is reminiscent of the difference between a quenched and an annealed average in the statistical mechanics of disordered systems [104]. A direct exact calculation of S 2 (not to mention S vN , or of the fluctuations in the entropy) for finite 10 q would be much more difficult than the calculation above, as a replica-like limit [104] would be required to handle the logarithm. However structure arising from unitarity might make this calculation tractable. This is an interesting task for the future.
The scaling limit of the representation obtained in this section, where we take length and time scales to be large and of the same order, yields a 'deterministic' domain wall configuration. This is simply a vertical line for the · · · bonds · · · left staircase right staircase U FIG. 16. Random Circuit built from "Staircase" Unitaries: We use "left" and "right" staircases -built from random two-site unitary operators as shown, and extending over bonds -as the building blocks for a random quantum circuit in which the ratio of the entanglement and butterfly velocities vE/vB may be made arbitrarily small.
infinite geometry considered here. 11 We expect that extending the calculation to higher dimensions will give, in the scaling limit, a formula for −log P as the 'energy' of a minimal surface (representing the Ising domain wall) which has a deterministic coarse-grained geometry, obtained from an effective elastic energy minimization problem. This is precisely the scaling picture proposed in Ref. [10] for the growth of entanglement in higherdimensional systems.

A. Nonuniversality of the ratio vE/vB
In Ref. [10], see also Ref. [11], we showed that the speed v E associated with entanglement growth is in general smaller than the operator growth speed v B , and gave explicit models displaying a ratio v E /v B < 1. In these models 12 this ratio happened to be 1/2. Values close to 1/2 were also found numerically in Ref. [11] and Ref. [51]. These results might lead one to wonder whether this value is in some sense generic. Here we show that it is not, by constructing random circuit dynamics, involving interactions of large but finite range, which give arbitrarily small values of v E /v B without any fine tuning. The construction uses random unitaries made up of 'staircases' of length O(q 2 ) which are made up of smaller random unitaries (Fig. 16). When q is large, we obtain a ratio v E /v B which is at most of order 1/q 2 . (In a deterministic spin chain with quenched spatial disorder it is 11 This is because the √ t fluctuations in the transverse position of the domain wall are negligible compared to t; compare [10] where the minimal cut configuration is also deterministic in the scaling limit. 12 This was determined analytically for a certain large q model, distinct from that here, and numerically for various circuits composed of Clifford gates. even possible to have v E /v B = 0 [40], but here we insist on statistical translational invariance: i.e. we insist that the probability distribution for the circuit is invariant under translations.) Consider quantum circuit dynamics in which 'staircase unitaries' are applied at random locations and at random times in a Poissonian fashion. A staircase is a collection of 2-site unitaries arranged as in Fig. 16. Left and right-oriented staircases are applied with equal probability. The staircase acts on bonds and we take large but finite, satisfying /q 2 1. Let r be the rate at which staircases are dropped at a given location.
A single staircase can increase the entanglement across a given bond by at most 2 units, implying v E ≤ 2r . On the other hand a single staircase can move the endpoint of an operator a long way when q 2 1. The random walk picture of Sec. II A shows that in the limit of large /q 2 , a staircase advances the front of the OTOC by an average distance ∼ q 2 /2. This involves an average over the two staircase orientations, only one of which is effective in advancing the front a long distance. The large value is because, when q is large, the small value of p = 1/(q 2 + 1) (Eq. 24) means the random walker can 'run' a long way up a rightward-oriented staircase before falling off. The previous implies v B q 2 r /2 at leading order in . This yields a ratio v E /v B 4/q 2 in this regime, which can be made arbitrarily small by taking q (and hence ) to be large.

VI. OUTLOOK
We have argued that universal scaling forms for the out-of-time-order correlator can be obtained using mappings to paradigmatic problems in classical statistical mechanics. In one dimension we gave an extremely simple hydrodynamic picture in terms of diffusion. In higher dimensions we gave a mapping to classical surface growth and the KPZ equation. 13 These mappings were derived exactly for random unitary circuits, which are natural 'least structured' models for chaotic quantum dynamics in situations where conserved quantities are not playing an important role. We have conjectured that the universal scaling forms found here also apply to OTOCs at asymptotically late times in generic, nonintegrable many body systems and quantum field theories. It will be interesting to test this conjecture in other situations where calculations are possible.
This picture differs from that obtained in a number of previous calculations using many-body perturbation theory [31,33,37,38], and it will be interesting to understand the reasons for these differences. Ref. [31] found an operator front that did not broaden in time, whereas here we find a broadening front in all dimensions below 4 + 1. Additionally, in Ref. [31] the OTOC was found to obey a local, nonlinear traveling wave equation, which is unlike what we found for random circuits. In 1D we obtained a linear hydrodynamic equation, while in higher dimensions C(x, t) in a random circuit is not governed by a local differential equation at all, contrary to standard lore about OTOCs.
Interestingly, a mean field approximation to the classical growth process would yield a local differential (or rather difference) equation for the OTOC, of traveling wave form. This is discussed in Appendix. I. However, the mean field approximation is not valid in physical dimensionalities.
Assuming that the results here do indeed have applications to realistic systems with Hamiltonians that are fixed in time, it will be interesting to consider extensions of the present coarse-grained pictures which take conserved quantities into account.
We have also given exact results for entanglement growth in 1+1D which support the scaling ideas put forward in [10], as discussed in Sec. V. In this picture (in any D) entanglement growth is determined by a minimal surface in spacetime, whose geometry becomes welldefined 14 in an appropriate scaling limit and is determined essentially by an elastic minimization problem. Furthermore, it was argued in that paper and in Ref. [11] that generically v E < v B , where v E is the speed characterizing the growth of entanglement. Here we have shown that it is possible to have arbitrarily small v E /v B in a random quantum circuit.
The effective Ising partition functions for calculating the OTOC and the purity turned out to have interesting structure, making them drastically simpler than they appeared at first sight, and much simpler than the analogous partition function for a non-unitary tensor network [77]. It would be very interesting to explore whether similar simplifications occur when the averaging involves higher powers of the unitary circuit. If so this would permit calculations of, say, modified versions of the OTOC involving higher powers of the commutator, or a direct calculation of the fluctuations. Even more interesting would be a direct calculation of the von Neumman entropy, which would have to use a replica limit to handle the logarithm.
OTOCs involving higher powers of the commutator are important for comparison with Lieb-Robinson bound. The OTOC considered here can be thought of as the squared Frobenius norm of the commutator divided by the Hilbert space dimension, whereas the Lieb-Robinson bound is on the operator norm of the commutator. The 14 But model dependent above 1+1D two norms are related as the operator norm is always upper bounded by the Frobenius norm, but our results do not put any nontrivial bound on the operator norm, due to the large dimension factor. The exact relation of the two quantities is yet to be determined.
In addition to exploring implications for realistic many-body systems, interesting questions remain that are specific to the random circuit context. (Note that, at the most basic level, our results show that operator growth saturates the naive causal lightcone of the quantum circuit as q → ∞, but not for finite q.) The randomness in the circuit necessarily implies statistical fluctuations in all observables including C(x, t). We have argued that these statistical fluctuations are (perhaps counterintuitively) a subleading effect at late times. We have shown this in regimes far from the front of the OTOC by giving inequalities, and we have given a heuristic argument for it in the region near the front. This argument was based on a phenomenological extension of the hydrodynamic equation for C(x, t) in the 1D case to allow for statistical fluctuations in C(x, t) (Eq. 29). It would be desirable to give a microscopic derivation of Eq. 29. (For the entanglement entropy, statistical fluctuations were investigated in Ref. [10].) It also remains to characterize the classical growth problem in Section III more fully, for example by obtaining the nonuniversal constants via an approximate analytic treatment.
The KPZ equation is connected to a remarkable array of topics in classical statistical mechanics [105], including the directed polymer in a random medium [106] and one-dimensional hydrodynamics [107], and has beautiful experimental applications [93,94]. Through the Tracy-Widom distribution [81], it is also connected to random matrix theory and an array of combinatorial problems (for example the longest increasing subsequence problem and the statistics of random permutations [108,109]). It will be interesting to explore which members of this array can shed light on operator growth.
Related work: While this manuscript was being finalized, we became aware of related work [110], to appear in the same arXiv posting. We also alert the reader to forthcoming numerical work on operator spreading [111].  In this Appendix we give a more detailed explanation of the relationship between the dynamics of the coefficients a S and a Markov process [22,23] and of the derivation of the diffusion picture. We consider Haarrandom, local unitary dynamics. In an N -site system with a q-dimensional Hilbert space at each site, a Hermitian operator that has evolved under the unitary circuit Our normalization convention is Tr(SS ) = q N δ SS , so that a S (t) = q −N Tr(O(t)S). The squared coefficient a S (t) 2 evolves as where U is a layer of m-site unitaries that were applied at time t − 1. In the second line, we have written U = r U r where r is the coordinate of disjoint, m-site clusters on which the unitary U r ∈ U(q m ) acts, and we have also decomposed S = r S r as a product of basis elements acting on these m-site clusters. These operators are normalized according to tr[S r S r ] = q m δ Sr ,S r and tr[S r ] = q m δ Sr ,1 . The Haar average of the above expression is given by And so, the Haar-averaged a S (t) 2 evolves linearly with the real, symmetric matrix Averaging again over the unitaries applied in the previous timesteps gives an equation for P S (t) ≡ a 2 S (t) which is formally a master equation for a fictitious Markov process [22,23]; at a given time there is a single string S which is updated stochastically in each time step, via local updates involving a cluster of m sites. From the form of W SS we see that the local update on m sites is performed by replacing a non-trivial generator on the cluster randomly by any one of the q 2m − 1 non-trivial generators. We emphasize that this fictitious Markov process is not the true unitary dynamics of the operator O(t). This fictitious classical stochastic process dramatically simplifies through the following observations. We focus here on one spatial dimension with updates on bonds. First, observe that the matrix elements W SS only depend on the support of the generators S and S , so that (A6) gives rise to a simpler Markov process for the binary occupation number n(x), which is 1 if the corresponding generator has support at site x and 0 otherwise (if S acts as the identity at x). Formally, the probability distribution of the occupation numbers is given by where the prime indicates that the sum is only over strings S that are compatible with the configuration n(x). Further, the endpoint of the string observes an autonomous Markovian dynamics. Since m = 2, updates involving the endpoint either include the site to the right of it which is empty or that to the left which may be empty or full. The dynamical rule above implies that the probabilities for the position of the endpoint after the update are independent of whether the leftward site was initially occupied or empty. Formally the probability distribution for the position of the endpoint in this fictitious dynamics is which is precisely ρ(x, t), as defined in Sec 2. Therefore, for an endpoint at x or x + 1, a single update applied to the sites x and x + 1 leaves the endpoint at x with probability p = (q 2 − 1)/(q 4 − 1) = 1/q 2 + 1 and at x + 1 with probability 1 − p. This establishes the claim in Sec. 2 for the evolution of ρ(x, t) in a single timestep.

Appendix B: Velocity and diffusion constant for lattice diffusion equation
Our layout of the evolution operator is such that the local unitaries alternate between even and odd bonds. In other words, a bond at time step t is either at the left or right of the bond at t − 1. Thus, it suffices to count the left and right moves to specify the position of the right end-bond of X 0 (t). As described in the main text the probability of a left move is p. Let u ≥ 0 be the number of right moves, and v ≥ 0 be the number of left moves. We have u + v = t, and u − v (or u − v ± 1) is the spatial coordinate of the right endpoint. Therefore, the probability distribution of the position of the right end bond is This is correctly normalized since u+v=t f (u, v) = (1 − p + p) t = 1. Then, the probability that a site x is left to the right end of X 0 (t) is where Φ is the cumulative density function of the normal distribution, and Appendix C: Noisy diffusion equation Starting with Eq. 29, WLOG rescale space so D = 1 and set v = 0 by going to the moving frame. Let ρ 0 be the solution without noise, ρ 0 = (4πt) −1/2 e −x 2 /4t . In terms of the Green's function The centre of mass position of the wavepacket within a given realisation is x cm = x xρ(x, t), so, if x cm is the centre of mass position averaged over realisations, Averaging over the noise with η(x, t)η( (C2) For the leading order scaling, we replace ρ with ρ 0 on the right hand side. Then dimensional analysis applied to the integral gives This is a statistical variation in x cm of order t 1/4 , in agreement with the heuristic argument and with Ref. [80]. This variation is small compared to the width of ρ, indicating that ρ − ρ 0 ρ at late times. The typical size of ∂ x ρ near the peak is O(1/t), so x cm − x cm ∼ t 1/4 corresponds to ρ − ρ 0 ∼ t −3/4 , as compared with ρ ∼ t −1/2 . The approximation above is therefore self-consistent.

Appendix D: Random Clifford operators
Here we review that the left-and right-invariant probability distribution over the Clifford group on n qdimensional qudits is a unitary 2-design when q is a prime number. In other words, for a qudit of prime power dimension q n , the unitary group U (q n ) has a finite subgroup that is a unitary 2-design, and there is a linear operator basis that remains closed under conjugations by this subgroup. This is a well-known result [? ], but we include it here for readers' convenience.
To define the Clifford group, we first need the Pauli group. Define X = q−1 j=0 |j + 1 mod q j| and Z = q−1 j=0 e 2πij/q |j j|. Then, the Pauli group is the subgroup of U (q n ) generated by matrices X 1 , Z 1 , . . . , X n , Z n where (D1) The Clifford group is defined to be the normalizer of the Pauli group in U (q n ). The Pauli group quotiented out by its center ω = e 2πi/q is abelian since XZX † = ω −1 Z, and is isomorphic to the additive group Z 2n q . We define P v for v ∈ Z 2n q to be an element of the Pauli group (Pauli operator) as The center of the Pauli group is also contained in the center of U (q n ), and therefore the conjugation action by the Clifford group on the Pauli group induces an action on Z 2n q . It turns out that this group S of action consists precisely of those that preserves the symplectic form λ n = 0 −I n I n 0 (D3) over Z q . A probability distribution ν of unitary matrices to form a 2-design means that where µ is the Haar probability distribution over U (q n ), and U * is the complex conjugate of U . Tautologically, the Haar distribution is a 2-design. This is equilvalent to having that for any q n × q n matrices O and O . Since Pauli operators (the elements of the Pauli group defined above) generates over the complex numbers the full operator algebra, it is enough to have Eq. (D5) with O and O being Pauli operators.
Let ν be the left-invariant (hence right-invariant) probability distribution over the Clifford group. This is the uniform distribution over the finite Clifford group. Consider a C-linear map Π ν on the set of operators defined by Since ν is a left-invariant distribution over a group of unitaries, Π ν is a projector (which is hermitian under the Hilbert-Schmidt inner product). Since the Clifford group includes the Pauli group, we have for arbitrary a, b ∈ Z 2n q Π ν (P a ⊗ P † b ) = x,y∈Z 2n q η a,b x,y P x ⊗ P † y (η x,y ∈ C, Pauli basis expansion) q by the left-invariance of ν) = x,y∈Z 2n q η a,b x,y ω c T λn(x−y) P x ⊗ P † y (commutation relation among Pauli operators) x (det λ n = 1, and c was arbitrary).
The use of inverse P † b here instead of P b is for notational convenience later. Now, observe that for any nonzero x, y ∈ Z 2n q there exists a symplectic transformation S ∈ S such that y = Sx. For this step, it is essential that q is prime. By the right-invariance of ν by S, we see Π ν (P x ⊗ P † x ) = Π ν (P y ⊗ P † y ). This implies that We claim that this is a linear combination of the identity operator and the swap operator F = q−1 u,v=0 |u v| ⊗ |v u|. This is easily verified once we expand F in the Pauli operator basis; using Tr(FO ⊗ O ) = Tr(OO ) for any q n × q n matrices O and O , we see that F ∝ x∈Z 2n q P x ⊗ P † x . The identity operator and the swap operator commute with U ⊗ U where U ∈ U (q n ). This implies that Π ν (P a ⊗ P † b ) commutes with U ⊗ U , and hence is equal to Π µHaar •Π ν (P a ⊗P † b ). By the right-invariance of the Haar distribution µ Haar , we conclude that Eq. (D5) is proved.
When q is not prime, any probability distribution over the Clifford group fails to be a unitary 2-design. Let n = 1. Since the image of Π µ is a linear combination of the identity and the swap, we must have (see App. G below) (D8) When q = 6, there are non-identity Pauli operators P and Q such that P 2 = I and Q 3 = I. By Eq. (D8), we have Π µ (P ⊗ P † ) = Π µ (Q ⊗ Q † ) = 0. However, Π ν (P ⊗ P † ) is a linear combination of Pauli operators, each of which squares to identity, whereas Π ν (Q ⊗ Q † ) is a linear combination of those that cube to identity, so they cannot be equal.
Appendix E: Anomalous behaviour of the front for φ = 0 Above we noted that for sufficiently large p, p > p c , the lattice growth process which we consider has anomalous behaviour when the front is oriented parallel to a lattice plane. This is a known phenomenon in various lattice growth models in discrete time which have synchronous parallel updates and is well understood in terms of directed percolation [84][85][86][87][88].
In the regime p > p c the lattice-aligned (φ = 0) front has a speed v(φ = 0) = 2 which is precisely the maximum possible speed allowed by causality. In this regime the front is pinned to the 'light front' and is not rough (i.e. the width is of order one). (Exactly at p c , the aligned front is logarithmically rough [87].) For our lattice model it appears that p c 2.
This phenomenon is easily understood via a correspondence with directed percolation [86]. First consider a straight, lattice-aligned front in the trivial deterministic limit p = 0 (q = ∞). Apart from possibly on the first time step, this flat front advances by two lattice spacings every period: the front keeps pace with the 'light cone' which is the line x = 2t. Letñ(y, t) = 0, 1 denote the occupation numbers of the column of sites at the lightcone: n(y, t) is the occupation number of the site at position (2t, y) at time t. When p = 0 we haveñ(y, t) = 1. We are interested in the density ñ (averaged over y) at late times when p is nonzero. If this density remains finite, that means the front has an O(1) width, and is attached to the light cone. If it instead tends to zero, the front detaches from the light cone, and we expect to recover standard KPZ roughening. Note that, in order to determineñ at time t+1, it is sufficient to know onlyñ at time t. The dynamics of the occupation numbersñ(y, t) are as follows. Under a horizontal dimer update (which advances the lightfront) each occupied y has a chance (1−p) of becoming unoccupied. Under a vertical update pairs of adjacent y undergo the pairwise update described in the main text. This allows occupied sites to 'reproduce'. This is therefore a birth-death process of the directed percolation type [104]. When p is large the death rate is small and the reproduction rate is large, and the process is in an 'active' phase with ñ > 0, while when p is small the population of occupied sites dies out.
If the solution is everywhere smooth then the above equation must be satisfied everywhere. (Such solutions exist for sufficiently weakly varying v.) It is straightforward to solve this equation in powers of w: We find that the RHS involves only total derivatives of periodic functions. (Just from looking at Eq. F3 this is at first sight surprising since it emerges from various cancellations.) Therefore, integrating the right hand side according to (F1) gives a periodic r(θ). For a formal explanation for why the expansion of tan(φ−θ) contains only total derivatives of periodic functions, consider a flow in the space of functions v(φ) which interpolates between the function of interest and the trivial function v(φ)=const. Let v 1 (φ) and v 2 (φ) be two functions that are infinitesimally close on this flow and let φ 1 (θ) and φ 2 (θ) be the corresponding solutions. Assuming that φ 1 (θ) is periodic and corresponds to a periodic r(θ) we show that this property is inherited by φ 2 (θ) to order φ 2 − φ 1 . Using (F1), (F3) we obtain tan[φ 1 (θ) − θ] − tan[φ 2 (θ) − θ] (F5) = ∂ θ [ln v 2 (φ 1 (θ)) − ln v 1 (φ 1 (θ))] (F6) As required, the RHS is indeed the total derivative of a periodic function (note that φ 2 does not appear on the RHS). Integrating along the flow then establishes the property for general v(φ) at the formal level -i.e. assuming that the solution evolves smoothly during the flow.

Appendix G: Haar average formula
Here we review a standard formula for the average of matrix elements of unitary matrix with respect to the Haar probability measure µ on U (N ). Let us abbreviate U (N ) dµ(U ) as E U . We are going to prove that where F is the swap operator on (C N ) ⊗2 . Evaluating a particular matrix element, we have δ a b δ c d δ ab δ cd + δ a d δ b c δ ad δ bc − 1 N (δ ab δ cd δ a d δ b c + δ a b δ c d δ ad δ bc ) . (G3) Proof of Eq. (G2). The average is a matrix on H = (C N ) ⊗2 that commutes with every U ⊗2 . Hence, the average is block-diagonal in the basis where the representation of U (d) is block-diagonal. The irreps appearing in H are the symmetric subspace and the anti-symmetric subspace. In each irrep, the average must be proportional to the identity I ± by the Schur's lemma, and we need to evaluate the trace in order to determine the constant of proportionality. The projection onto the (anti-)symmetric subspace is (I ± F )/2 where F is the swap operator: F |ac = |ca . So the trace is This must be equal to C ± Tr(I ± ) = C ± N (N ± 1)/2 Therefore, the average is equal to s=± C s (I + sF )/2.
The first term on the right is a 'death rate'. The second term is spreading. The third term is a correction to overcounting in the second term. An analogous equation could be written down for the regular circuit considered in the main text, but we would have to use discrete time.
Eq. I5 is a lattice traveling wave equation. This is most apparent if we make a formal expansion in the lattice spacing a to second order (valid, given the approxi-mations already made, if the solution is slowly varying). Recalling p = 1/(q 2 + 1) and Eq. I2, This differs from the Fisher-KPP equation only in the C-dependence of the diffusion constant, and we expect similar properties. Above, the mean field limit was an unjustified formal approximation. We could of course construct random circuit models in which the (lattice) mean field approximation was quantitatively accurate up to a large time, for example by using long range interactions or a large coordination number to reduce the effect of correlations. However at long times, in physical dimensionalities, we expect the front to roughen so that the mean field traveling wave picture breaks down. In (unphysically) high dimensions, mean field may be valid even at late times (recall that the phase diagram of the KPZ equation allows for a non-roughening phase in high dimensions, as discussed in the text).