Efficient Classical Simulation of Random Shallow 2D Quantum Circuits

A central question of quantum computing is determining the source of the advantage of quantum computation over classical computation. Even though simulating quantum dynamics on a classical computer is thought to require exponential overhead in the worst case, efficient simulations are known to exist in several special cases. It was widely assumed that these easy-to-simulate cases as well as any yet-undiscovered ones could be avoided by choosing a quantum circuit at random. We prove that this intuition is false by showing that certain families of constant-depth, 2D random circuits can be approximately simulated on a classical computer in time only linear in the number of qubits and gates, even though the same families are capable of universal quantum computation and are hard to exactly simulate in the worst case (under standard hardness assumptions). While our proof applies to specific random circuit families, we demonstrate numerically that typical instances of more general families of sufficiently shallow constant-depth 2D random circuits are also efficiently simulable. We propose two classical simulation algorithms. One is based on first simulating spatially local regions which are then “stitched” together via recovery maps. The other reduces the 2D simulation problem to a problem of simulating a form of 1D dynamics consisting of alternating rounds of random local unitaries and weak measurements. Similar processes have recently been the subject of an intensive research focus, which has observed that the dynamics generally undergo a phase transition from a low-entanglement (and efficient-to-simulate) regime to a high-entanglement (and inefficient-to-simulate) regime as measurement strength is varied.
Via a mapping from random quantum circuits to classical statistical mechanical models, we give analytical evidence that a similar computational phase transition occurs for both of our algorithms as parameters of the circuit architecture like the local Hilbert space dimension and circuit depth are varied and, additionally, that the effective 1D dynamics corresponding to sufficiently shallow random quantum circuits falls within the efficient-to-simulate regime. Implementing the latter algorithm for the depth-3 “brickwork” architecture, for which exact simulation is hard, we find that a laptop could simulate typical instances on a 409 × 409 grid with a total variation distance error less than 0.01 in approximately one minute per sample, a task intractable for previously known circuit simulation algorithms. Numerical results support our analytic evidence that the algorithm is asymptotically efficient.

1 Introduction

1.1 How hard is it for classical computers to simulate quantum circuits?
As quantum computers add more qubits and gates, where is the line between classically simulable and classically hard to simulate? And once the size and runtime of the quantum computer are chosen, which gate sequence is hardest to simulate?
So far, our answers to these questions have been informal or incomplete. On the simulation side, [MS08] showed that a quantum circuit could be classically simulated by contracting a tensor network with cost exponential in the treewidth of the graph induced by the circuit. When applied to n qubits in a line running a circuit with depth d, the simulation cost of this algorithm is exp(Θ(min(n, d))). More generally we could consider n = L1·L2 qubits arranged in an L1 × L2 grid running for depth d, in which case the simulation cost would be

exp(Θ(min(L1L2, L1d, L2d))).   (1)

In other words, we can think of the computation as taking up a space-time volume of L1 × L2 × d, and the simulation cost is dominated by the size of the smallest cut bisecting this volume. An exception is for d = 1 or d = 2, which have simple exact simulations [TD02]. Some restricted classes such as stabilizer circuits [Got98] or one-dimensional systems that are sufficiently unentangled [Vid03; Vid04; Osb06] may also be simulated efficiently. However, the conventional wisdom has been that in general, for 2D circuits with d ≥ 3, the simulation cost scales as Eq. (1). These considerations led IBM to propose the benchmark of "quantum volume" [Cro+18], which in our setting is exp(d · min(L1, L2)); this does not exactly coincide with Eq. (1) but qualitatively captures a similar phenomenon. The idea of quantum volume is to compare quantum computers with possibly different architectures by evaluating their performance on a simple benchmark. This benchmark task is to perform n layers of random two-qubit gates on n qubits, and being able to perform this with at most one expected gate error corresponds to a quantum volume of exp(n). Google's quantum computing group has also proposed random unitary circuits as a benchmark task for quantum computers [Boi+18]. While their main goal has been quantum computational supremacy [Nei+18; Aru+19], random circuits could also be used to diagnose errors including those that go beyond single-qubit error models by more fully exploring the configuration space of the system [Cro+18].
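To make the minimal-cut heuristic of Eq. (1) concrete, here is a short sketch (our own illustration; the function name is ours, not from the literature) computing the exponent that governs the contraction cost:

```python
def min_cut_exponent(L1: int, L2: int, d: int) -> int:
    """Exponent, up to constant factors, of the contraction cost
    exp(Theta(min(L1*L2, L1*d, L2*d))) from Eq. (1): the size of the
    smallest cut bisecting the L1 x L2 x d space-time volume."""
    return min(L1 * L2, L1 * d, L2 * d)

# For shallow circuits on large grids the vertical cut d * min(L1, L2) wins,
# so the naive cost grows exponentially in the side length, not the area.
print(min_cut_exponent(20, 20, 3))
print(min_cut_exponent(400, 400, 3))
```

For a depth-3 circuit, doubling the grid side length doubles the cut size and therefore squares the naive contraction cost; this is the scaling that the algorithms described below circumvent.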
These proposals from industry reflect a rough consensus that simulating a 2D random quantum circuit should be nearly as hard as exactly simulating an arbitrary circuit with the same architecture, or in other words that random circuit simulation is nearly as hard as the worst case, given our current state of knowledge. To the contrary, we prove (assuming standard complexity-theoretic conjectures) that for a certain family of constant-depth architectures, classical simulation of typical instances with small allowed error is easy, despite worst-case simulation being hard (by which we mean, it is classically intractable to simulate an arbitrary random circuit realization with arbitrarily small error). For these architectures, we show that a certain algorithm exploiting the randomness of the gates and the allowed small simulation error can run much more quickly than the scaling in Eq. (1), running in time O(L1L2). While our proof is architecture specific, we give numerical and analytical evidence that for sufficiently low constant values of d, the algorithm remains efficient more generally. The intuitive reason for this is that the simulation of 2D shallow random circuits can be reduced to the simulation of a form of effective 1D dynamics which includes random local unitaries and weak measurements. The measurements then cause the 1D process to generate much less entanglement than it could in the worst case, making efficient simulation possible. Before discussing this in greater detail, we review the main arguments for the prevailing belief that random circuit simulation should be nearly as hard as the worst case.
1. Hardness of sampling from post-selected universality. A long line of work has shown that it is worst-case hard to either sample from the output distributions of quantum circuits or compute their output probabilities [TD02; Aar05; BJS10; AA13; BMS16; BMS17; HM17]. While the requirement of worst-case simulation is rather strong, these results do apply to any quantum circuit family that becomes universal once post-selection is allowed, thereby including noninteracting bosons and depth-3 circuits. The hardness results are also based on the widely believed conjecture that the polynomial hierarchy is infinite, or more precisely that approximate counting is weaker than exact counting. Since these results naturally yield worst-case hardness, they do not obviously imply that random circuits should be hard. In some cases, additional conjectures can be made to extend the hardness results to some form of average-case hardness (as well as ruling out approximate simulations) [AA13; BMS16; AC17], but these conjectures have not received widespread scrutiny. Besides stronger conjectures, these hardness results usually require that the quantum circuits have an "anti-concentration" property, meaning roughly that their outputs are not too far from the uniform distribution [HM18]. While random circuits are certainly not the only route to anti-concentration (simply performing Hadamard gates on the all-|0⟩ state also works), they are a natural way to combine anti-concentration with an absence of any obvious structure (e.g. Clifford gates) that might admit a simple simulation.
2. Average-case hardness of computing output probabilities [Bou+19; Mov18; Mov19]. It is known that random circuit simulation admits worst-to-average-case reductions for the computation of output probabilities. In particular, the ability to near-exactly compute the probability of some output string for a 1 − 1/poly(n) fraction of Haar-random circuit instances on some architecture is essentially as hard as computing output probabilities for an arbitrary circuit instance with this architecture, which is known to be #P-hard even for certain depth-3 architectures. The existence of such a worst-to-average-case reduction could be taken as evidence for the hardness of random circuits. Our algorithms circumvent these hardness results by computing output probabilities with small error, rather than near-exactly.
3. Near-maximal entanglement in random circuits. Haar-random states on n qudits are nearly maximally entangled across all cuts simultaneously [Pag93; HLW06]. Random quantum circuits on L × L × · · · arrays of qudits achieve similar near-maximal entanglement across all possible cuts once the depth is Ω(L) [DOP07; HM18] and before this time, the entanglement often spreads "ballistically" [LS14; BKP19]. Random tensor networks with large bond dimension nearly obey a max-flow/min-cut-type theorem [Hay+16; Has17], again meaning that they achieve nearly maximal values of an entanglement-like quantity. These results suggest that when running algorithms based on tensor contraction, random gates should be nearly the hardest possible gates to simulate.
4. Absence of algorithms taking advantage of random inputs. There are not many algorithmic techniques known that simulate random circuits more easily than worst-case circuits. There are a handful of exceptions. In the presence of any constant rate of noise, random circuits [YG17; GD18], IQP circuits [BMS17] and (for photon loss) boson sampling [KK14; OB18] can be efficiently simulated. These results can also be viewed as due to the fact that fault-tolerant quantum computing is not a generic phenomenon and requires structured circuits to achieve (see [BMS17] for discussion in the context of IQP). Permanents of random matrices whose entries have small nonzero mean can be approximated efficiently [EM18], while the case of boson sampling corresponds to entries with zero mean and the approach of [EM18] is known to fail there. In practice, evidence for a hardness conjecture is often no more than the absence of algorithms. Indeed, while some approximation algorithms are known for estimating output probabilities of constant-depth circuits [BGM19], IQP circuits [SB09] and boson sampling [AA13] up to additive error δ in time poly(n, 1/δ), these are not very helpful for random circuits where typical output probabilities are ∼ 2^(−n).
For the case of constant depth, there have been some quantum computational supremacy proposals that do not use uniformly random circuits, mostly based on the MBQC (measurement-based quantum computing) model [RB01]. This means first preparing a cluster state and then measuring it in mostly equatorial bases, or equivalently performing e^(iθZ) rotations for various angles θ and then measuring in the X basis. This is far from performing uniformly random nearest-neighbor gates up to the same depth and then measuring in a fixed basis. In many cases, the angles θ are also chosen to implement a specific family of circuits [GWD17; MSM17; Ber+18]. Previously it had not been clear whether this difference is important for the classical complexity or not.
Despite the above intuitive arguments for why the simulation of uniformly random circuits should be nearly as hard as the worst case, we (1) prove that there exist architectures for which this is not the case, and (2) give evidence that this result is not architecture specific, but is rather a general property of sufficiently shallow random circuits. As described in more detail below, we propose two simulation algorithms. One (which we also implement numerically) is based on a 2D-to-1D mapping in conjunction with tensor network methods, and the other is based on exactly simulating small subregions which are then "stitched" together. The performance of both algorithms is related to certain entropic quantities.
We also give evidence of computational phase transitions even for noiseless simulation of quantum circuits. Previously it was known that phase transitions between classical and quantum computation existed as a function of the noise parameter in conventional quantum computation [Sho96; AB96; HN03; Raz04; VHP05; Buh+06; Kem+08] as well as in MBQC [RBH05; Bar+09]. We do not know of any previous work showing phase transitions even for noiseless computation, except for the gap between depth-2 and depth-3 circuits given by Terhal-DiVincenzo [TD02] and the phase transition as a function of rate of qubit loss during the preparation of a 2D cluster state for MBQC [Bro+08].
We believe this latter transition, arising from the percolation phase transition on a square lattice, is in fact closely related to our results. Intuitively, a measurement on the system may be viewed as "good" (entanglement destroying) or "bad" (entanglement preserving). If all measurements are random, some fraction of these measurements will be "good". If this fraction becomes sufficiently large, the entanglement-destroying effects win out, and the system can be efficiently simulated classically. In the special case of a Z-basis measurement being performed on each qubit of a 2D cluster state with some probability p, the result of [Bro+08] shows that this may be made rigorous via results from percolation theory; the resulting quantum state may be efficiently simulated classically if 1 − p falls below the percolation threshold p c for a square lattice, and the resulting state supports universal MBQC if 1 − p falls above p c . In contrast, in our setting a measurement is performed on each site in a Haar-random basis with probability one. While the percolation argument is no longer directly applicable in this setting, we nonetheless find evidence in favor of a computational phase transition driven by local qudit dimension q and circuit depth d.
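The percolation intuition behind [Bro+08] is easy to probe numerically. The following Monte Carlo sketch (our own illustration, not code from that work) estimates the probability that unmeasured ("open") sites form a top-to-bottom spanning cluster on an n × n square lattice; the spanning probability changes sharply as the open-site fraction 1 − p crosses the site-percolation threshold p_c ≈ 0.593:

```python
import random
from collections import deque

def spans(grid):
    """True if open (True) sites connect the top row to the bottom row."""
    n = len(grid)
    frontier = deque((0, c) for c in range(n) if grid[0][c])
    seen = set(frontier)
    while frontier:
        r, c = frontier.popleft()
        if r == n - 1:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n and grid[rr][cc] and (rr, cc) not in seen:
                seen.add((rr, cc))
                frontier.append((rr, cc))
    return False

def spanning_prob(n, p_open, trials=200, seed=0):
    """Monte Carlo estimate of the probability that open sites span the lattice."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p_open for _ in range(n)] for _ in range(n)]
        hits += spans(grid)
    return hits / trials

# Well below p_c spanning clusters are rare; well above it they are generic.
print(spanning_prob(20, 0.45), spanning_prob(20, 0.75))
```

In the MBQC picture, an open site plays the role of a qubit that escaped the entanglement-destroying Z-basis measurement; long-range entanglement (and hence universality) requires a spanning cluster of such sites.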

1.2 Our results
We give two classes of results, which we summarize in more detail below. The first consists of rigorous separations in complexity between worst-case simulation and approximate average-case simulation (for sampling) and between near-exact average-case simulation and approximate average-case simulation (for computing output probabilities) for random circuit families defined with respect to certain circuit architectures. While these results are rigorous, they are proved with respect to a contrived architecture and therefore do not address the question of whether random shallow circuits are classically simulable more generally. To address this issue, we also give conjectures on the performance of our algorithms for more general and more natural architectures. Our second class of results consists of analytical and numerical evidence in favor of these conjectures.

Provable complexity separations
We now summarize our provable results for particular circuit architectures. We first define more precisely what we mean by an "architecture".
Definition 1 (Architecture). An architecture A is an efficiently computable mapping from positive integers L to circuit layouts A(L) defined on rectangular grids with side lengths L × f(L) for some function f(L) ≤ poly(L). A "circuit layout" is a specification of locations of gates in space and time and the number of qudits acted on by each gate. (The gate itself is not specified.) For any architecture A, we obtain the associated Haar-random circuit family acting on qudits of constant dimension q, C_{A,q}, by specifying every gate in A to be distributed according to the Haar measure and to act on qudits of dimension q which are initialized in the product state |1⟩^⊗(L×f(L)).
In this paper, we only consider architectures that are constant depth and spatially 2-local (that is, a gate either acts on a single site or two adjacent sites); therefore, "architecture" for our purposes always refers to a constant-depth spatially 2-local architecture. The above definition permits architectures for which the layout of the circuit itself may be different for different sizes. However, it is natural for a circuit architecture to be spatially periodic, and furthermore for the "unit cells" of the architecture to be independent of L. We formalize this as a notion of uniformity, which we define more precisely below.
Definition 2 (Uniformity). We call a constant-depth architecture A uniform if there exists some spatially periodic circuit layout B on an infinite square lattice such that, for all positive integers L, A(L) is a restriction of B to a rectangular sub-grid with side lengths L × f(L) for some f(L) ≤ poly(L). A random circuit family C_{A,q} associated with a uniform architecture A is said to be a uniform random circuit family.
While uniformity is a natural property for a circuit architecture to possess, our provable separations are with respect to certain non-uniform circuit families. In particular, we prove that there exists some non-uniform circuit architecture A acting on n qubits such that, if C_A is the Haar-random circuit family associated with A acting on qubits:

1. There does not exist a poly(n)-time classical algorithm that exactly samples from the output distribution of arbitrary realizations of C_A unless the polynomial hierarchy collapses to the third level.

2. There exists a poly(n)-time classical algorithm that, with high probability over the choice of circuit instance, samples from the output distribution of C_A up to small total variation error and computes any fixed output probability up to additive error 2^(−1.99n).
Implications for worst-to-average-case reductions for random circuit simulation. Very recently, it was shown [Mov19] that for any random circuit family with Haar-random gates for which it is #P-hard to compute output probabilities in the worst case, there does not exist a poly(n)-time algorithm for computing the output probability of some arbitrary output string x up to additive error 2^(−Θ(n²)) with high probability over the circuit realization, unless there exists a poly(n)-time randomized algorithm for computing a #P-hard function. Essentially, for Haar-random circuits, near-exact average-case computation of output probabilities is as hard as worst-case computation of output probabilities. Our results described above imply that the error tolerance for this hardness result cannot be improved to 2^(−1.99n).
This hardness result follows other prior work [Bou+19; Mov18] on the average-case hardness of random circuit simulation. In particular, the original paper [Bou+19] uses a different interpolation scheme than that used in [Mov18; Mov19]. Interestingly, as discussed in Appendix A, we find that the interpolation scheme of [Bou+19] cannot be used to prove hardness results about our algorithms' performance on C_A, despite C_A possessing worst-case hardness. While this observation may be of technical interest for future work on worst-to-average-case reductions for random circuit simulation, the alternative interpolation scheme of [Mov19] does not suffer from this limitation.
While [Bou+19; Mov18; Mov19] prove hardness results for the near-exact computation of output probabilities of random circuits, it is ultimately desirable to prove hardness for the Random Circuit Sampling (RCS) problem of sampling from the output distribution of a random circuit with small error in variational distance, as this is the computational task corresponding to the problem that the quantum computer solves. A priori, one might hope that such a result could be proved via a worst-to-average-case reduction. In particular, it was pointed out in these works that improving the error tolerance of the hardness result to 2^(−n)/poly(n) would be sufficient to prove hardness of RCS. Our work rules out such a proof strategy by showing that even improving the error tolerance to 2^(−1.99n) is unachievable. In particular, any proof of the hardness of RCS should be sensitive to the depth and should not be applicable to the worst-case-hard shallow random circuit ensembles that admit approximate average-case classical simulations.

Conjectures for uniform architectures
While the above results are provable, they are unfortunately proved with respect to a contrived non-uniform architecture, and furthermore do not provide good insight into how the simulation runtime scales with simulation error and simulable circuit fraction. An obvious question is then whether efficient classical simulation remains possible for "natural" random circuit families that are sufficiently shallow, and if so, how the runtime scales with system size and error parameters. We argue that it does, but that a computational phase transition occurs for our algorithms when the depth (d) or local Hilbert space dimension (q) becomes too large. Here we are studying the simulation cost as n → ∞ for fixed d and q. Intuitively, there are many constant-depth random circuit families for which efficient classical simulation is possible, including many "natural" circuit architectures (it seems plausible that any depth-3 random circuit family on qubits is efficiently simulable). However, we expect a computational phase transition to occur for sufficiently large constant depths or qudit dimensions, at which point our algorithms become inefficient. The location of the transition point will in general depend on the details of the architecture. The conjectures stated below are formalizations of this intuition.
We now state our conjectures more precisely. Conjecture 1 essentially states that there are uniform random circuit families for which worst-case simulation (in the sense of sampling or computing output probabilities) is hard, but approximate average-case simulation can be performed efficiently. (Worst-case hardness for computing probabilities also implies a form of average-case hardness for computing probabilities, as discussed above.) This is stated in more-or-less the weakest form that seems to be true and would yield a polynomial-time simulation. However, we suspect that the scaling is somewhat more favorable. Our numerical simulations and toy models are in fact consistent with a stronger conjecture, Conjecture 1', which if true would yield stronger runtime bounds. Conversely, Conjecture 2 states that if the depth or local qudit dimension of such an architecture is made to be a sufficiently large constant, our two proposed algorithms experience computational phase transitions and become inefficient even for approximate average-case simulation.
Conjecture 1. There exist uniform architectures A and choices of q such that, for the associated random circuit family C_{A,q}, (1) worst-case simulation of C_{A,q} (in terms of sampling or computing output probabilities) is classically intractable unless the polynomial hierarchy collapses, and (2) our algorithms approximately simulate C_{A,q} with high probability. More precisely, given parameters ε and δ, our algorithms run in time bounded by poly(n, 1/ε, 1/δ) and can, with probability 1 − δ over the random circuit instance, sample from the classical output distribution produced by C_{A,q} up to variational distance error ε and compute a fixed output probability up to additive error ε/q^n.
Conjecture 1'. For any uniform random circuit family C_{A,q} satisfying the conditions of Conjecture 1, efficient simulation is possible with runtime replaced by n^(1+o(1)) · exp(O(√(log(1/εδ)))).
Conjecture 2. For any uniform random circuit family C_{A,q} satisfying the conditions of Conjecture 1, there exists some constant q* such that our algorithms become inefficient for simulating C_{A,q′} for any constant q′ ≥ q*, where C_{A,q′} has the same architecture as C_{A,q} but acts on qudits of dimension q′. There also exists some constant k* such that, for any constant k ≥ k*, our algorithms become inefficient for simulating the composition of k layers of the random circuit, where each layer is i.i.d. and distributed identically to C_{A,q}. In the inefficient regime, for fixed ε and δ the runtime of our algorithms is 2^(O(L)).
Our evidence for these conjectures, which we elaborate upon below, consists primarily of (1) a rigorous reduction from the 2D simulation problem to a 1D simulation problem that can be efficiently solved with high probability if certain conditions on expected entanglement in the 1D state are met, (2) convincing numerical evidence that these conditions are indeed met for specific worst-case-hard uniform random circuit families and that in this case the algorithm is extremely successful in practice, and (3) heuristic analytical evidence for both conjectures using a mapping from random unitary circuits to classical statistical mechanical models, and for Conjecture 1' using a toy model which can be more rigorously studied. The uniform random circuit family for which we collect the most evidence for classical simulability is associated with the depth-3 "brickwork architecture" [BFK09] (see also Figure 4 for a specification). We now present an overview of these three points, which are then developed more fully in the body of the paper.
1.3 Overview of proof ideas and evidence for conjectures

1.3.1 Reduction to "unitary-and-measurement" dynamics.
We reduce the problem of simulating a constant-depth quantum circuit acting on an L1 × L2 grid of qudits to the problem of simulating an associated "effective dynamics" in 1D on L1 qudits which is iterated for L2 timesteps, or alternatively on L2 qudits which is iterated for L1 timesteps. This mapping is rigorous and is related to previous maps from 2D quantum systems to 1D systems evolving in time [RB01; Kim17a; Kim17b]. The effective 1D dynamics is then simulated using the time-evolving block decimation algorithm of Vidal [Vid04]. In analogy, we call this algorithm space-evolving block decimation (SEBD). We rigorously bound the simulation error made by the algorithm in terms of quantities related to the entanglement spectra of the effective 1D dynamics and give conditions in which it is provably asymptotically efficient for sampling and estimating output probabilities with small error. SEBD is self-certifying in the sense that it can construct confidence intervals for its own simulation error and for the fraction of random circuit instances it can simulate; we use this fact later to bound the error made in our numerical experiments.
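Error bounds of this kind rest on the standard Eckart-Young fact that truncating a Schmidt decomposition to rank χ incurs a 2-norm error equal to the square root of the discarded Schmidt weight. A minimal self-contained sketch (ours, not the paper's implementation):

```python
import numpy as np

def truncate_bipartite(psi, chi):
    """Best Schmidt-rank-chi approximation of a bipartite state psi[a, b].
    The 2-norm error equals the square root of the discarded Schmidt weight
    (Eckart-Young), which is the quantity truncation-based methods control."""
    u, s, vh = np.linalg.svd(psi, full_matrices=False)
    trunc = (u[:, :chi] * s[:chi]) @ vh[:chi]
    err = float(np.sqrt(np.sum(s[chi:] ** 2)))
    return trunc, err

# Two maximally entangled qubit pairs: four Schmidt values equal to 1/2.
psi = np.eye(4) / 2.0
_, err = truncate_bipartite(psi, 2)
print(err)  # sqrt(0.25 + 0.25) = sqrt(1/2)
```

When the Schmidt spectrum decays quickly, a small χ already makes the discarded weight, and hence the per-step simulation error, negligible.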
A 1D unitary quantum circuit on L qubits iterated for L^c timesteps with c > 0 is generally hard to simulate classically in poly(L) time, as the entanglement across any cut can increase linearly in time. However, the form of 1D dynamics that a shallow circuit maps to includes measurements as well as unitary gates. While the unitary gates tend to build entanglement, the measurements tend to destroy entanglement and make classical simulation more tractable. It is a priori unclear which effect has more influence. To illustrate the mapping, we introduce the simple worst-case-hard random circuit family consisting of a Haar-random single-qubit gate applied to each site of a cluster state, and obtain an exact, closed-form expression for the effective 1D unitary-and-measurement dynamics simulated by SEBD. We call this model CHR for "cluster state with Haar-random measurements".
Fortunately, unitary-and-measurement processes have been studied in a flurry of recent papers from the physics community [LCF18; Cha+18; SRN19; LCF19; SRS19; Cho+19; GH19a; BCA19; Jia+19; GH19b; Zab+19]. The consensus from this work is that processes consisting of entanglement-creating unitary evolution interspersed with entanglement-destroying measurements can be in one of two phases, where the entanglement entropy equilibrates to either an area law (constant), or to a volume law (extensive). When we vary parameters like the fraction of qudits measured between each round of unitary evolution, a phase transition is observed. The existence of a phase transition appears to be robust to variations in the exact model, such as replacing projective measurements on a fraction of the qudits with weak measurements on all of the qudits [LCF19; SRS19], or replacing Haar-random unitary evolution with Clifford [LCF18; LCF19; GH19a; Cho+19] or Floquet [SRN19; LCF19] evolution. This suggests that the efficiency of the SEBD algorithm depends on whether the particular circuit depth and architecture being simulated yields effective 1D dynamics that falls within the area-law or the volume-law regime. It also suggests a computational phase transition in the complexity of the SEBD algorithm. Essentially, decreasing the measurement strength or increasing the qudit dimension in these models is associated with moving toward a transition into the volume-law phase. Since increasing the 2D circuit depth is associated with decreasing the measurement strength and increasing the local dimension of the associated effective 1D dynamics, this already gives substantial evidence in favor of a computational phase transition in SEBD.
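As a concrete (and deliberately tiny) illustration of such unitary-and-measurement dynamics, the following dense-statevector sketch alternates brickwork layers of Haar-random two-qubit gates with projective measurements applied to each qubit independently with probability p_meas; all parameter choices here are ours. At p_meas = 1 every round collapses the chain to a product state, while at p_meas = 0 the half-chain entropy grows toward its maximum.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Haar-random unitary via QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def apply_2q(psi, u, i, n):
    """Apply a 4x4 unitary u to qubits (i, i+1) of an n-qubit statevector."""
    psi = psi.reshape((2**i, 4, 2**(n - i - 2)))
    return np.einsum('ab,xbj->xaj', u, psi).reshape(-1)

def measure(psi, i, n, rng):
    """Projectively measure qubit i in the computational basis."""
    psi3 = psi.reshape((2**i, 2, 2**(n - i - 1)))
    p0 = float(np.sum(np.abs(psi3[:, 0, :])**2))
    outcome = 0 if rng.random() < p0 else 1
    post = np.zeros_like(psi3)
    post[:, outcome, :] = psi3[:, outcome, :]
    post = post.reshape(-1)
    return post / np.linalg.norm(post)

def half_chain_entropy(psi, n):
    """Von Neumann entanglement entropy (in bits) across the middle cut."""
    s = np.linalg.svd(psi.reshape((2**(n // 2), -1)), compute_uv=False)
    lam = s**2
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

# Alternate brickwork unitary layers with probabilistic single-site
# measurements; p_meas plays the role of the measurement strength.
rng = np.random.default_rng(0)
n, p_meas = 8, 0.5
psi = np.zeros(2**n, complex)
psi[0] = 1.0
for t in range(20):
    for i in range(t % 2, n - 1, 2):
        psi = apply_2q(psi, haar_unitary(4, rng), i, n)
    for i in range(n):
        if rng.random() < p_meas:
            psi = measure(psi, i, n, rng)
S = half_chain_entropy(psi, n)
print(f"half-chain entropy after 20 rounds at p_meas={p_meas}: {S:.3f}")
```

Sweeping p_meas in such toys is how the area-law/volume-law transition described above is located numerically, though the system sizes reachable with dense statevectors are of course far smaller than those reachable with MPS methods.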
SEBD is provably inefficient if the effective 1D dynamics are on the volume-law side of the transition, and we expect it to be efficient on the area-law side because, in practice, dynamics obeying an area law for the von Neumann entanglement entropy are generally efficiently simulable. However, definitively proving that SEBD is efficient on the area-law side faces the obstacle that there are known contrived examples of states which obey an area law but cannot be efficiently simulated with matrix product states [Sch+08]. We address this concern by directly studying the entanglement spectrum of unitary-and-measurement processes in the area-law phase. To do this, we introduce a toy model for such dynamics which may be of independent interest. For this model, we rigorously derive an asymptotic scaling of Schmidt values across some cut as λ_i ∝ exp(−Θ(log² i)), which is consistent with the scaling observed in our numerical simulations. Moreover, for this toy model we show that with probability at least 1 − δ, the equilibrium state after iterating the process can be ε-approximated by a state with Schmidt rank r ≤ exp(O(√(log(n/εδ)))). Taking this toy model analysis as evidence that the bond dimension of SEBD when simulating a circuit whose effective 1D dynamics is in an area-law phase obeys this asymptotic scaling leads to Conjecture 1'.
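The rank bound can be sanity-checked in isolation: assuming squared Schmidt values decaying as exp(−log² i) (the constants here are ours, chosen purely for illustration), the rank needed for truncation error ε grows like exp(Θ(√log(1/ε))), far slower than any power of 1/ε:

```python
import numpy as np

# Squared Schmidt values with the conjectured decay lambda_i^2 ~ exp(-log^2 i)
# (the constant inside Theta(.) is set to 1 purely for illustration).
N = 1_000_000
idx = np.arange(1, N + 1, dtype=float)
p = np.exp(-np.log(idx) ** 2)
p /= p.sum()
tail = np.cumsum(p[::-1])[::-1]        # tail[k] = sum of p[j] for j >= k

def rank_for_error(eps):
    """Smallest Schmidt rank r with truncation error sqrt(discarded mass) <= eps."""
    disc = np.append(tail[1:], 0.0)    # disc[r-1] = mass discarded keeping r values
    return int(np.argmax(disc <= eps ** 2)) + 1

for eps in (1e-2, 1e-4, 1e-8):
    print(eps, rank_for_error(eps))    # rank grows far slower than any power of 1/eps
```

Squaring the precision (say, from 1e-4 to 1e-8) only multiplies log r by about √2 under this scaling, which is what makes a near-linear-time algorithm with mild error dependence plausible.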

Proof idea for rigorous complexity separation with non-uniform architectures.
We now explain the proof idea for the complexity separation discussed above. The main idea is that we define a non-uniform architecture such that, when all gates are Haar-random, the effective 1D dynamics consists of alternating rounds of unitary and measurement layers where in each measurement layer, a weak measurement is applied to each qubit. While the problem of rigorously proving an area-law/volume-law transition for general unitary-and-measurement processes is still open for the case of measurements with constant measurement strength, for our particular architecture the measurement strength itself increases rapidly as the system size n is increased (this is achieved using the non-uniformity of the architecture). Intuitively, then, by considering large enough n, the weak measurements can be made strong enough to destroy nearly all of the pre-existing entanglement in the 1D state. This is the key idea that allows us to prove that, by always compressing the MPS describing the state to one of just constant bond dimension, the error incurred is very low and efficient simulation is possible.
The fact that the effective measurement strength increases rapidly with system size follows from a technical result about the decay of expected post-measurement entanglement entropy when a contiguous block of qubits in a state produced by a 1D random local circuit is measured. In particular, we show that if ψ_ABC is a 1D state produced by a depth-2 circuit with Haar-random 2-local gates, and subregion B is measured in the computational basis, then the expected entanglement entropy of the post-measurement state satisfies

E_b S(ψ^b_AC) ≤ exp(−Ω(|B|)),

where ψ^b_AC is the post-measurement state after obtaining measurement outcome b, and |B| denotes the number of qubits in region B. (While we only need and only prove this result for depth-2 circuits, we expect it to remain true for any constant depth.) A similar result was obtained by Hastings in the context of nonnegative wavefunctions rather than random shallow circuits [Has16].
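The decay statement can be probed directly at small sizes with dense statevectors. In the sketch below (our own check; region sizes, trial counts, and seeds are arbitrary choices), we prepare a depth-2 Haar-random brickwork state on A ∪ B ∪ C, measure the middle block B in the computational basis, and average the A:C entanglement entropy over outcomes and circuit instances; the average shrinks as |B| grows.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Haar-random unitary via QR of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def apply_2q(psi, u, i, n):
    """Apply a 4x4 unitary u to qubits (i, i+1) of an n-qubit statevector."""
    psi = psi.reshape((2**i, 4, 2**(n - i - 2)))
    return np.einsum('ab,xbj->xaj', u, psi).reshape(-1)

def depth2_state(n, rng):
    """|0...0> evolved by two brickwork layers of Haar-random 2-local gates."""
    psi = np.zeros(2**n, complex)
    psi[0] = 1.0
    for layer in (0, 1):
        for i in range(layer, n - 1, 2):
            psi = apply_2q(psi, haar_unitary(4, rng), i, n)
    return psi

def expected_post_meas_entropy(a, b, c, trials=200, seed=0):
    """Average of E_b S(psi^b_AC) in bits: measure the middle block B
    (b qubits) in the computational basis, averaged over outcomes and
    over Haar-random depth-2 circuit instances."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        psi = depth2_state(a + b + c, rng).reshape((2**a, 2**b, 2**c))
        for k in range(2**b):                 # sum over measurement outcomes
            phi = psi[:, k, :]
            prob = float(np.sum(np.abs(phi)**2))
            if prob < 1e-14:
                continue
            s = np.linalg.svd(phi / np.sqrt(prob), compute_uv=False)
            lam = s**2
            lam = lam[lam > 1e-12]
            total += prob * float(-np.sum(lam * np.log2(lam)))
    return total / trials

for b in (1, 2, 3, 4):
    print(b, round(expected_post_meas_entropy(2, b, 2), 4))
```

Such brute-force checks are limited to a handful of qubits, but they exhibit the qualitative decay with |B| that the technical result makes quantitative.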

Numerical evidence for conjectures.
Excellent empirical performance of SEBD. We numerically implemented the SEBD algorithm to simulate two different universal (for MBQC) uniform shallow circuit architectures with randomly chosen gates. The first model is depth-3 circuits with "brickwork architecture" (see Figure 14 for a specification) and Haar-random two-qubit gates. We found that a laptop using non-optimized code could efficiently simulate typical instances on a 409 × 409 grid with variational distance error less than 0.01. The mean runtime (averaged over random circuit instances) was on the order of one minute per sample. In principle, the same algorithm could also be used to compute output probabilities with small additive error. We also simulated the CHR model as defined above on lattices of sizes up to 50 × 50. The slower-decaying entanglement spectrum of the effective 1D dynamics of the CHR model makes the maximal lattice size that we can simulate smaller, but allows the functional form of the spectrum to be better studied numerically, helping establish the asymptotic efficiency of the simulation as discussed below. It is useful to compare our observed runtime with what is possible by previously known methods. The best previously known method that we are aware of for computing output probabilities for these architectures is to write the circuit as a tensor network and perform the contraction of the network [Vil+19]. The cost of this process scales exponentially in the treewidth of a graph related to the quantum circuit, which for a 2D circuit is thought to scale roughly as the surface area of the minimal cut slicing through the circuit diagram, as in Eq. (1). By this reasoning, we estimate that simulating a circuit with brickwork architecture on a 400 × 400 lattice using tensor network contraction would be roughly equivalent to simulating a depth-40 circuit on a 20 × 20 lattice with the architecture considered in [Vil+19], where the entangling gates are CZ gates.
We see that these tasks should be equivalent because the product of the dimensions of the bonds crossing the minimal cut is equal to $2^{200}$ in both cases: for the brickwork circuit, 100 gates cross the cut if we orient the cut vertically through the diagram in Figure 14(a) and each gate contributes a factor of 4; meanwhile, for the depth-40 circuit, only one fourth of the unitary layers will contain gates that cross the minimal cut, and each of these layers will have 20 such gates that each contribute a factor of 2 (CZ gates have half the rank of generic gates). The task of simulating a depth-40 circuit on a 7 × 7 lattice was reported to require more than two hours using tensor network contraction on the 281-petaflop supercomputer Summit [Vil+19], and the exponentiality of the runtime suggests scaling this to 20 × 20 would take many orders of magnitude longer, a task that is decidedly intractable.
However, we emphasize that the important takeaway from the numerics is not merely the large circuit sizes we simulated but rather the fact that the numerics serve as evidence that the algorithm is efficient in the asymptotic limit n → ∞, as we discuss presently.
Evidence for asymptotic efficiency. Our simulations indicate that the average entanglement generated during the effective 1D dynamics of the SEBD algorithm for these architectures quickly saturates to a value independent of the number of qubits: an area law for entanglement entropy. In fact, our numerics indicate that the effective 1D dynamics not only obeys an area law for the von Neumann entropy, but also has the stronger property of obeying an area law for some Rényi entropies $S_\alpha$ with α < 1. It has been proven that such a condition implies efficient representation by matrix product states [Sch+08], providing evidence for efficiency in the asymptotic limit.
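For concreteness, the Rényi entropies in question can be computed directly from a Schmidt spectrum. The following helper is a generic sketch (not the authors' code), with the α = 1 case handled as the von Neumann limit:

```python
import numpy as np

def renyi_entropy(schmidt_sq, alpha):
    """S_alpha in bits of a Schmidt spectrum, given as squared Schmidt
    coefficients p_i (renormalized to sum to 1)."""
    p = np.asarray(schmidt_sq, dtype=float)
    p = p[p > 0]
    p = p / p.sum()
    if alpha == 1.0:
        return float(-(p * np.log2(p)).sum())            # von Neumann limit
    return float(np.log2((p ** alpha).sum()) / (1.0 - alpha))
```

For a flat spectrum all Rényi entropies coincide, while for a decaying spectrum $S_\alpha$ is non-increasing in α, which is why an area law for some α < 1 is a strictly stronger statement than one for the von Neumann entropy.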
To obtain a more precise estimate of the error of SEBD and attain further evidence of asymptotic efficiency, we also study the form of the entanglement spectra throughout the effective 1D dynamics. We observe that the spectrum of Schmidt values obeys a superpolynomial decay consistent with that predicted by our toy model for unitary-and-measurement dynamics in the area-law phase, providing further validation of our toy model, which suggests that SEBD is not only polynomial-time but obeys the even better scaling with error parameters given in Conjecture 1'. Overall, the numerics strongly support Conjecture 1 and support the toy model which is the basis for the more aggressive Conjecture 1'.

Analytical evidence for conjectures from statistical mechanics.
In addition to providing strong numerical evidence that SEBD is efficient in the cases we considered, we also give analytical arguments for both algorithms' efficiency when acting on the depth-3 brickwork architecture using methods from statistical mechanics. We focus on the depth-3 brickwork architecture because it is a worst-case hard uniform architecture which is simple enough to be studied analytically. We also give evidence of computational phase transitions as qudit dimension and circuit depth are increased.
We map 2D shallow circuits with Haar-random gates to classical statistical mechanical models, utilizing techniques developed in [NVH18; Von+18; ZN19; Hun19; BCA19; Jia+19], such that the free energy cost incurred by twisting boundary conditions of the stat mech model corresponds to quantities $\tilde S_k$, which we refer to as "quasi-entropies" of the output state of the quantum circuit. The quasi-entropy of index k is related but not exactly equal to the Rényi-k entanglement entropy averaged over random circuit instances and measurement outcomes, denoted by $S_k$. We briefly define the quasi-entropy associated with a collection of states here. All logarithms are base-2 unless indicated otherwise.
Definition 3 (Quasi-k entropy). For a collection $\mathcal{E} = \{\rho_i\}$ of non-normalized bipartite states on system AB, we define the quasi-k entropy of register A as
$$\tilde S_k(A) := \frac{1}{1-k} \log \frac{\sum_i \operatorname{tr}\!\left(\rho_{i,A}^k\right)}{\sum_i \operatorname{tr}(\rho_i)^k},$$
where $\rho_{i,A}$ is the reduced state of $\rho_i$ on subregion A.
Virtually identical quantities were also considered in two other very recent works [BCA19; Jia+19]. Notably, in the k → 1 limit, these quantities approach the expected von Neumann entropy S(A) achieved when state $\rho_i$ is drawn from $\mathcal{E}$ with probability proportional to $\operatorname{tr}(\rho_i)$:
$$\lim_{k\to 1} \tilde S_k(A) = \sum_i \frac{\operatorname{tr}(\rho_i)}{\sum_j \operatorname{tr}(\rho_j)}\, S\!\left(\frac{\rho_{i,A}}{\operatorname{tr}(\rho_i)}\right).$$
Although the quasi-entropies $\tilde S_k$ are not the entropic quantities that directly relate to the runtime of our algorithms, we study them because the stat mech mapping permits an analytical handle on $\tilde S_k$ for integer k ≥ 2, and the calculations become especially tractable for k = 2. Essentially, changing the qudit dimension q of the random circuit model corresponds to changing the interaction strengths in the associated stat mech model. Phase transitions in the classical stat mech model are accompanied by phase transitions in quasi-entropies. While the efficiency of our algorithms is related to different entropic quantities, which are hard to directly analyze, the phase transition in quasi-entropies provides analytical evidence in favor of our conjectures, as we outline below.
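Definition 3 can be made concrete in code. The sketch below assumes the reading $\tilde S_k(A) = \frac{1}{1-k}\log_2\big(\sum_i \operatorname{tr}\rho_{i,A}^k \,/\, \sum_i \operatorname{tr}(\rho_i)^k\big)$, which reduces to the ordinary Rényi-k entropy when the collection contains a single normalized state; it is an illustrative helper, not code from the paper:

```python
import numpy as np

def quasi_entropy(states, dA, dB, k):
    """Quasi-k entropy of register A for a collection of (possibly
    non-normalized) bipartite density matrices on A x B, with integer
    k >= 2, under the reading of Definition 3 stated in the lead-in."""
    num, den = 0.0, 0.0
    for rho in states:
        # partial trace over B: indices (a, b, a', b') -> trace b = b'
        rho_A = np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)
        num += np.trace(np.linalg.matrix_power(rho_A, k)).real
        den += np.trace(rho).real ** k
    return float(np.log2(num / den) / (1 - k))
```

For a single maximally entangled two-qubit state this evaluates to one bit, matching the Rényi-2 entropy of a maximally mixed single-qubit marginal.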
Area-law to volume-law transition for $\tilde S_2$ in brickwork architecture. Since we can only analytically access quasi-entropies, our argument falls short of a rigorous proof that SEBD is efficient and experiences a computational phase transition, but contributes the following conclusions about the entanglement in the effective 1D dynamics.
1. For the brickwork architecture, $\tilde S_2$ for the collection of pure states encountered by the SEBD algorithm satisfies an area law when qudits have local dimension q = 2 (qubits) or q = 3 (qutrits).
2. For the brickwork architecture, $\tilde S_2$ transitions to a volume-law phase once the qudit local dimension q becomes sufficiently large. The critical point separating the two phases is estimated to be roughly $q_c \approx 6$ and could be precisely computed with standard Monte Carlo techniques.
The fact that $\tilde S_2$ obeys an area law for the brickwork architecture with q = 2 corroborates our other evidence that SEBD is efficient in this regime.
Phase transitions for $\tilde S_2$ in arbitrary architectures. The stat mech mapping can also be used to understand the behavior of $\tilde S_2$ during SEBD for more general architectures. In particular, for any architecture, the mapping implies an area-law-to-volume-law phase transition in $\tilde S_2$ as q is increased, contributing further evidence in favor of a computational phase transition driven by q.
We also present a heuristic argument for why the existence of a computational phase transition as a function of q should always be accompanied by the existence of a phase transition as a function of the number of layers of the random circuit. Together with prior work on phase transitions in unitary-and-measurement models as measurement strength and qudit dimension are changed, this provides further evidence of computational phase transitions as stated in Conjecture 2.
Patching algorithm and transitions in quasi-conditional mutual information. We can use the stat mech mapping to study average entropic properties of the classical output distribution of the quantum circuit. In particular, we define a "quasi-k conditional mutual information" for the distribution over classical output distributions which is parameterized by the real number k and approaches the average conditional mutual information (CMI) in the k → 1 limit. We argue that if a random circuit has an associated stat mech model that is disordered, then the quasi-2 CMI of the distribution over classical output distributions is exponentially decaying in the sense that $I_2(A : C\,|\,B) \le \mathrm{poly}(n)\, e^{-\Omega(\operatorname{dist}(A,C))}$ for any lattice subregions A, B, and C, where $\operatorname{dist}(A, C)$ is the distance between subregions A and C.
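The classical conditional mutual information whose decay drives Patching can be computed directly for small explicit distributions. The following is a generic helper (not the stat mech calculation itself); it vanishes exactly when A – B – C form a Markov chain:

```python
import numpy as np

def cmi(p):
    """I(A : C | B) in bits for a joint distribution p[a, b, c]."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    pb = p.sum(axis=(0, 2))          # P(b)
    pab = p.sum(axis=2)              # P(a, b)
    pbc = p.sum(axis=0)              # P(b, c)
    val = 0.0
    for a in range(p.shape[0]):
        for b in range(p.shape[1]):
            for c in range(p.shape[2]):
                if p[a, b, c] > 0:
                    val += p[a, b, c] * np.log2(
                        p[a, b, c] * pb[b] / (pab[a, b] * pbc[b, c]))
    return float(val)
```

For instance, a distribution in which A and C are perfectly correlated while B is independent of both has I(A : C | B) = 1 bit, whereas any Markov chain A → B → C gives zero.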
Taking this as evidence that the average CMI obeys the same exponential decay condition, we show that a very different circuit simulation algorithm which we call Patching can also be used to efficiently simulate the random circuit with high probability. This simulation algorithm, based on [BK19] but improving on the runtime of that algorithm (from quasi-polynomial to polynomial) in our setting of shallow circuits, works by exactly simulating disconnected subregions before applying recovery maps to "stitch" the regions together. We obtain rigorous bounds on the performance of Patching in terms of the rate of decay of CMI of the output distribution and give conditions in which it is asymptotically efficient.
On the other hand, when the corresponding stat mech model transitions to an ordered phase as q is increased, the quasi-CMI does not decay to zero as dist(A, C) is increased, providing evidence against CMI decay and against the efficiency of Patching. In fact, in the limit of infinitely large qudit dimension q → ∞, by formally evaluating all quasi-k CMIs and taking the k → 1 limit we show that the expected CMI of the classical output distribution between three regions forming a tripartition of the lattice becomes equal to a constant: $I(A : C\,|\,B) = (1 - \gamma)/\ln 2 \approx 0.61$, where γ is the Euler constant. Unfortunately, performing the analytic continuation and hence exactly evaluating von Neumann entropies is difficult outside of the q → ∞ limit.

Future work and open questions
Our work yields several natural follow-up questions and places for potential future work. We list some here.
1. Can ideas from our work also be used to simulate noisy 2D quantum circuits? Roughly, we expect that increasing noise in the circuit corresponds to decreasing the interaction strength in the corresponding stat mech model, pushing the model closer toward the disordered phase, which is (heuristically) associated with efficiency of our algorithms. We therefore suspect that if noise is incorporated, there will be a 3-dimensional phase diagram depending on circuit depth, qudit dimension, and noise strength. As the noise is increased, our algorithms may therefore be able to simulate larger depths and qudit dimensions than in the noiseless case.
2. Can one approximately simulate random 2D circuits of arbitrary depth? This is the relevant case for Google's quantum computational supremacy experiment [Aru+19]. Assuming Conjecture 2, our algorithms are not efficient once the depth exceeds some constant, but it is not clear if this difference in apparent complexity for shallow vs. deep circuits is simply an artifact of our simulation method, or if it is inherent to the problem itself.
3. Our algorithms are well-defined for all 2D circuits, not only random 2D circuits. Are they also efficient for other kinds of unitary evolution at shallow depths, for example evolution by a fixed local 2D Hamiltonian for a short amount of time?
4. Can we rigorously prove Conjecture 1? One way to make progress on this goal would be to find a worst-case-hard uniform circuit family for which it would be possible to perform the analytic continuation of quasi-entropiesS k in the k → 1 limit using the mapping to stat mech models.
5. Can we give numerical evidence for Conjecture 2, which claims that our algorithms undergo computational phase transitions? This would require numerically simulating our algorithms for circuit families with increasing local Hilbert space dimension and increasing depth and finding evidence that the algorithms eventually become inefficient.
6. How precisely does the stat mech mapping inform the efficiency of our algorithms? Is the correlation length of the stat mech model associated with the runtime of our simulation algorithms? How well does the phase transition point in the stat mech model (and accompanying phase transition in quasi-entropies) predict the computational phase transition point in the simulation algorithms? If such questions are answered, it may be possible to predict the efficiency and runtime of the simulation algorithms for an arbitrary (and possibly noisy) random circuit distribution via Monte Carlo studies of the associated stat mech model. In this way, the performance of the algorithms could be studied even when direct numerical simulation is not feasible.
7. In the regime where SEBD is inefficient, i.e., when the effective 1D dynamics it simulates are on the volume-law side of the entanglement phase transition, is SEBD still better than previously known exponential-time methods? Intuitively, we expect this to be the case close to the transition point.

Outline for remainder of paper
We now outline the material of the remaining sections. The logical dependencies between subsections are described in the following figure. The paper may be read linearly without issue, but readers interested in only one specific aspect of our results may skip subsections outside the relevant chain of dependencies illustrated by the diagram.
• In Section 2, we specify and analyze our two algorithms for simulating 2D circuits, obtaining rigorous bounds on the runtime in terms of various parameters. In Section 2.1, we specify and analyze the SEBD algorithm. In Section 2.2, we give an explicit example of how the efficiency of the algorithm is related to the simulability of associated unitary-and-measurement processes. In Section 2.3, we study a toy model for unitary-and-measurement dynamics in an area-law phase, which (along with supporting numerics) is the basis for Conjecture 1'. In Section 2.4, we specify and analyze the Patching algorithm.
• In Section 3, we prove the complexity separation for a particular architecture. To understand this section, one need only read Section 2.1.
• In Section 4, we present numerical evidence supporting Conjecture 1 and Conjecture 1'.
• In Section 5.1, we introduce the mapping from random circuits to stat mech models. We first review the mapping technique developed in prior works before, in Section 5.2, applying the mapping to analyze a particular family of 1D circuits with weak measurements that correspond to the effective 1D dynamics of the cluster state model we study in Section 2.2.
• In Section 6.1, we apply the stat mech mapping to shallow random 2D circuits. In Section 6.2 and Section 6.3, we show how the stat mech mapping supports Conjecture 1 and Conjecture 2 with respect to SEBD and Patching, respectively. In Section 6.4, we study the stat mech mapping in more detail for the "brickwork" architecture, strengthening the evidence for Conjecture 1 and Conjecture 2. To understand the contents of Section 6, one must read Section 5.1 but Section 5.2 may be skipped.

Algorithms
We now propose and analyze two algorithms for sampling from the output distributions of shallow 2D random quantum circuits. The first algorithm is based on a reduction from the 2D simulation problem to a 1D simulation problem, which is then simulated with the time-evolving block decimation (TEBD) algorithm [Vid04]. We therefore refer to this algorithm as space-evolving block decimation (SEBD). Essentially, the resulting algorithm is efficient if the 1D state in the corresponding effective 1D dynamics can be approximately represented as an MPS of polynomially bounded bond dimension at all times. We also discuss how a variation of SEBD can be used to compute output probabilities with small additive error. We analyze the behavior of SEBD applied to a simple class of 2D shallow random circuits with a uniform circuit architecture that is universal for measurement-based quantum computation and is therefore believed to be hard to simulate in the worst case, namely the 2D cluster state with Haar-random single-qubit measurements. We believe that this simple class of random 2D circuits qualitatively captures the behavior of many other random shallow circuit architectures. The benefit of studying this model is that we can obtain an exact, closed-form description of the behavior of our simulation algorithm for this problem. In particular, we show that the algorithm can be understood as simulating a 1D process involving alternating layers of random unitary gates and measurements. Such processes have been the subject of a number of recent works [LCF18; Cha+18; SRN19; LCF19; SRS19; Cho+19; GH19a; Zab+19], which find evidence for the existence of an entanglement phase transition driven by the frequency and strength of the measurements from an "area-law" phase characterized by low entanglement to a "volume-law" phase characterized by large entanglement.
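To illustrate the kind of 1D unitary-and-measurement process referenced above, the following self-contained statevector toy (a minimal sketch, not the authors' implementation; projective measurements applied at a tunable rate `meas_rate` stand in for measurement strength) alternates layers of Haar-random two-qubit gates with measurement layers and tracks the half-chain entanglement entropy:

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a Haar-random d x d unitary via QR with phase fixing."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def apply_2q(psi, u, i, n):
    """Apply a two-qubit gate u to neighboring qubits (i, i+1)."""
    t = np.moveaxis(psi.reshape([2] * n), [i, i + 1], [0, 1]).reshape(4, -1)
    t = (u @ t).reshape([2, 2] + [2] * (n - 2))
    return np.moveaxis(t, [0, 1], [i, i + 1]).reshape(-1)

def measure(psi, i, n, rng):
    """Projectively measure qubit i in the computational basis."""
    t = np.moveaxis(psi.reshape([2] * n), i, 0).reshape(2, -1)
    p0 = float(np.linalg.norm(t[0]) ** 2)
    b = 0 if rng.random() < p0 else 1
    out = np.zeros_like(t)
    out[b] = t[b] / np.sqrt(p0 if b == 0 else 1.0 - p0)
    return np.moveaxis(out.reshape([2] * n), 0, i).reshape(-1)

def half_chain_entropy(psi, n):
    """Von Neumann entropy (bits) across the middle cut."""
    s = np.linalg.svd(psi.reshape(2 ** (n // 2), -1), compute_uv=False)
    p = s[s > 1e-12] ** 2
    return float(-(p * np.log2(p)).sum())

def run(n=8, layers=6, meas_rate=1.0, seed=0):
    rng = np.random.default_rng(seed)
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0
    entropies = []
    for layer in range(layers):
        for i in range(layer % 2, n - 1, 2):   # brickwork pairing
            psi = apply_2q(psi, haar_unitary(4, rng), i, n)
        for i in range(n):                      # measurement layer
            if rng.random() < meas_rate:
                psi = measure(psi, i, n, rng)
        entropies.append(half_chain_entropy(psi, n))
    return entropies
```

With `meas_rate = 1.0` every round collapses the chain to a product state (zero entropy), while with no measurements the entropy grows toward a volume law, the two extremes of the phase transition described above.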
We introduce a toy model that intuitively captures how the entanglement spectrum of such a unitary-and-measurement process might scale in the area-law phase. The toy model predicts a superpolynomial decay of Schmidt values across any cut, a sufficient condition for efficient MPS representation. Later in Section 4, we numerically observe the effective 1D dynamics of the uniform architectures we simulate to be in the area-law phase, with a decay of Schmidt values consistent with that predicted by the toy model. This physical picture provides strong evidence that our algorithm is efficient for the uniform architectures we considered. Later in Section 6, we provide additional analytical evidence for this indeed being the case.
The second algorithm, which we call Patching, involves first sampling from the output distribution of small disconnected patches of the lattice, and then stitching them together to obtain a global sample. This algorithm is efficient if the conditional mutual information (CMI) of the output distribution of the circuit is exponentially decaying in a sense that we make precise below. The latter algorithm is essentially an adaptation of an algorithm for preparing Gibbs states with finite correlation length [BK19]. However, by exploiting the fact that the distribution we want to sample from is classical and arises from a constant-depth local circuit, we are able to improve on a naïve application of that scheme, obtaining a polynomial-time algorithm instead of the quasipolynomial-time algorithm obtained in [BK19] if the conditional mutual information is exponentially decaying.
While the criteria required by these two algorithms for efficiency superficially appear unrelated, we find evidence that they are indeed related. Namely, in Section 6 we relate the efficiency criteria of both algorithms to phases of a statistical mechanical model associated with the random circuit family. A stat mech model in the disordered phase suggests that the efficiency criteria of both algorithms are met, whereas a model in the ordered phase suggests that neither criterion is met. It is therefore plausible that SEBD can efficiently simulate some random circuit family if and only if Patching can.
We assume the reader is familiar with standard tensor network methods, particularly algorithms for manipulating matrix product states (see e.g. [Orú14; BC17] for reviews).

Figure 1: In all figures, the 2D circuit is depicted as a spacetime volume, with time flowing upwards. The green (respectively dotted blue) region denotes unmeasured (measured) qudits. In (a), we apply all gates in the lightcone of column 1, namely, those gates intersecting the spacetime volume shaded red. In (b), we simulate the computational basis measurement of column 1. In (c), we apply all gates in the lightcone of column 2 that were previously unperformed.

Space-evolving block decimation (SEBD)
In this section, we introduce the SEBD algorithm for simulating a shallow 2D random circuit. In fact, the algorithm is well-defined for any 2D circuit, but we present evidence that it is efficient for sufficiently shallow random circuit families with uniform architectures, and prove it is efficient for certain depth-3 random circuit families with non-uniform architectures in Section 3. We first give our algorithm for sampling and error bounds, before describing a modified version of the algorithm for computing output probabilities and error bounds associated with this version of the algorithm.

Specification of algorithm
For concreteness, we consider a rectangular grid of qudits with local Hilbert space dimension q, although the algorithm could be similarly defined for different lattices. Assume WLOG that the grid consists of $n = L_1 \times L_2$ qudits, where $L_1$ is the number of rows, $L_2$ is the number of columns, and $L_1 \le L_2$. For each qudit, let $|i\rangle$, $i \in [q]$, label a set of basis states which together form the computational basis. Assume all gates act on one site or two neighboring sites, and the starting state is $|1\rangle^{\otimes n}$. Let d denote the circuit depth, which should be regarded as a constant. For a fixed circuit instance C, the goal is to sample from a distribution close to $D_C$, defined to be the distribution of the output of C upon measuring all qudits in the computational basis. For an output string $x \in [q]^n$, we let $D_C(x)$ denote the probability of the circuit outputting x after measurement. The high-level behavior of the algorithm is illustrated in Figure 1. Recall that C can always be exactly simulated in time $L_2 q^{\Theta(dL_1)}$ using standard tensor network algorithms [MS08].

Figure 2: In (a), we begin with an MPS describing the current state $\rho_j$. In (b), the MPS is compressed via truncation of small Schmidt values. This will generally decrease the bond dimension of the MPS, depicted by the thin black line rather than the thick purple line. In (c), qudits acted on by $V_j$ that are not already incorporated into the current state are added to the MPS (increasing the physical bond dimension of the MPS) and initialized in $|0\rangle$ states. In (d), the unitary gates associated with $V_j$ are applied. Panel (e) depicts the MPS after the application of $V_j$; the thick purple lines schematically illustrate the fact that the bond dimension may increase in this step. In (f), the measurement of column j is performed, and the outcome 01101 is obtained. Subsequently column j is projected onto 01101, removing the physical legs associated with these sites from the MPS. The resulting state is $\rho_{j+1}$.
Since all of the single-qudit measurements commute, we can measure the qudits in any order. In particular, we can first measure all of the sites in column 1, then those in column 2, and iterate until we have measured all L 2 columns. This is the measurement order we will take. Now, consider the first step in which we measure column 1. Instead of applying all of the gates of the circuit and then measuring, we may instead apply only the gates in the lightcone of column 1, that is, the gates that are causally connected to the measurements in column 1. We may ignore qudits that are outside the lightcone, by which we mean qudits that are outside the support of all gates in the lightcone.
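The column-by-column measurement order is just sequential sampling from conditionals: because the single-qudit measurements commute, drawing each column from its marginal conditioned on previous outcomes reproduces the joint output distribution. A toy illustration on an explicit joint distribution (hypothetical array `p`, one axis per column; not tied to the MPS machinery):

```python
import numpy as np

def sample_sequentially(p, rng):
    """Sample a joint outcome coordinate by coordinate from conditional
    distributions, mirroring how SEBD samples column after column.
    `p` is a joint probability array, one axis per column."""
    idx = []
    q = np.asarray(p, dtype=float)
    for _ in range(np.ndim(p)):
        marg = q.reshape(q.shape[0], -1).sum(axis=1)  # marginal of next axis
        marg = marg / marg.sum()                      # condition on prefix
        b = rng.choice(len(marg), p=marg)
        idx.append(int(b))
        q = q[b]                                      # restrict to outcome b
    return tuple(idx)
```

Empirically, the frequencies of samples produced this way converge to the joint distribution `p`, which is exactly the guarantee SEBD inherits (before any truncation error is introduced).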
Let $\rho_1 = (|1\rangle\langle 1|)^{\otimes L_1}$ denote the trivial starting state that is a tensor product of $|1\rangle$ states in column 1, which the algorithm represents as an MPS. Let $V_1$ denote the isometry corresponding to applying all gates in the lightcone of this column. The algorithm simulates the application of $V_1$ by adding qudits in the lightcone of column 1 as necessary and applying the associated unitary gates, maintaining the description of the state as an MPS of length $L_1$ as illustrated in Figure 2. Since there are up to d + 1 columns in the lightcone of column 1, each tensor of the MPS after the application of $V_1$ has up to d + 1 dangling legs corresponding to physical indices, for a total physical dimension of at most $q^{d+1}$. Since in the application of $V_1$ there are up to $O(d^2)$ gates that act between any two neighboring rows, the (virtual) bond dimension of the updated MPS is at most $q^{O(d^2)}$.
We now simulate the computational basis measurement of column 1. More precisely, we measure the qudits of column 1 one by one. We first compute the respective probabilities $p_1, p_2, \ldots, p_q$ of the q possible measurement outcomes for the first qudit. This involves contracting the MPS encoding $V_1 \rho_1 V_1^\dagger$. We now use these probabilities to classically sample an outcome $i \in [q]$, and update the MPS to condition on this outcome. That is, if (say) we obtain outcome 1 for site i, we apply the projector $|1\rangle\langle 1|$ to site i of the state and subsequently renormalize. After doing this for every qudit in the column, we have exactly sampled an output string $x_1 \in [q]^{L_1}$ from the marginal distribution on column 1, and are left with an MPS description of the pure, normalized, post-measurement state $\rho_2$ proportional to $\operatorname{tr}_{\text{column 1}}\!\big[\Pi_{x_1} V_1 \rho_1 V_1^\dagger \Pi_{x_1}\big]$, where $\Pi_{x_1}$ denotes the projection of column 1 onto the sampled output string $x_1$. Using standard tensor network algorithms, the time complexity of these steps is $L_1 q^{O(d^2)}$.
We next consider column 2. At this point, we add the qudits and apply the gates that are in the lightcone of column 2 but were not applied previously. Denote this isometry by $V_2$. It is straightforward to see that this step respects causality. That is, if some gate U is in the lightcone of column 1, then any gate W that is in the lightcone of column 2 but not column 1 cannot be required to be applied before U, because if it were, then it would be in the lightcone of column 1. Hence, when we apply gates in this step, we never apply a gate that was required to be applied before some gate that was applied in the first step. After this step, we have applied all gates in the lightcone of columns 1 and 2, and we have also projected column 1 onto the measurement outcomes we observed.
By simulating the measurements of column 2 in a similar way to those of column 1, we sample a string $x_2$ from the marginal distribution on column 2, conditioned on the previously observed outcomes from column 1. Each time an isometry $V_j$ is applied, the bond dimension of the MPS representation of the current state will in general increase by a multiplicative factor. In particular, if we iterate this procedure to simulate the entire lattice, we will eventually encounter a maximal bond dimension of up to $q^{O(dL_1)}$ and will obtain a sample $x = (x_1, x_2, \ldots, x_{L_2}) \in [q]^n$ from the true output distribution.
To improve the efficiency at the expense of accuracy, we may compress the MPS in each iteration to one with smaller bond dimension using standard MPS compression algorithms. In particular, in each iteration j, before we apply the corresponding isometry $V_j$, we first discard as many of the smallest singular values (i.e. Schmidt values) associated with each cut of the MPS as possible, up to a total truncation error per bond of $\epsilon$, defined as the sum of the squares of the discarded singular values. The bond dimension across any cut is reduced by the number of discarded values. This truncation introduces some error that we quantify below.
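The compression step can be sketched as a generic SVD-truncation routine (an illustrative helper under the per-bond budget described above, not the authors' implementation): across a bond, discard as many of the smallest singular values as the budget `eps` allows.

```python
import numpy as np

def truncated_split(theta, eps):
    """Split a matricized two-site tensor `theta` by SVD, discarding the
    largest possible set of smallest singular values whose total squared
    weight is at most eps. Returns the two factors and the truncation
    error (sum of squares of discarded values) actually incurred."""
    u, s, vh = np.linalg.svd(theta, full_matrices=False)
    tail = np.cumsum((s ** 2)[::-1])[::-1]   # tail[j] = sum_{i >= j} s_i**2
    ok = np.nonzero(tail <= eps)[0]
    keep = int(ok[0]) if ok.size else len(s)
    keep = max(keep, 1)                      # always keep at least one value
    err = float(tail[keep]) if keep < len(s) else 0.0
    return u[:, :keep] * s[:keep], vh[:keep], err
```

The returned `err` equals the squared Frobenius norm of the discarded part, which is exactly the per-bond truncation error quantity entering the error bounds below.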
If the maximal bond dimension of this truncated version of the simulation algorithm is D, the total runtime of the full algorithm to obtain a sample is bounded by $O(nD^3)$ (taking q and d to be constants), using standard MPS compression algorithms.
We assume that for a specified maximal bond dimension D and truncation error per bond $\epsilon$, if a bond dimension ever exceeds D then the algorithm terminates and outputs a failure flag fail. Hence, the runtime of the algorithm when simulating some circuit C with parameters $\epsilon$ and D is bounded by $O(nD^3)$, and the algorithm has some probability of failure $p_{f,C}$. We summarize the SEBD algorithm in Algorithm 1.

The untruncated version of the algorithm presented above samples from the true distribution $D_C$ of the measurement outcomes of the original 2D circuit C. However, due to the MPS compression which we perform in each iteration and the possibility of failure, the algorithm incurs some error which causes it to instead sample from some distribution $D_C'$. Here, we bound the total variation distance between these distributions, defined by
$$\|D_C - D_C'\|_{\mathrm{TV}} := \frac{1}{2} \sum_x |D_C(x) - D_C'(x)|,$$
where the sum runs over the $q^n$ possible output strings (not including fail), in terms of the truncation error made by the algorithm.
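The total variation distance appearing in these error bounds is a standard quantity; a minimal helper for computing it between two explicitly given distributions (for sanity-checking small examples, not part of SEBD itself):

```python
import numpy as np

def total_variation_distance(p, q):
    """(1/2) * sum_x |p(x) - q(x)| for distributions given as arrays."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * float(np.abs(p - q).sum())
```

It ranges from 0 (identical distributions) to 1 (disjoint supports), and by the coupling formulation it bounds the probability that samples from the two distributions can be distinguished.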
We first obtain a very general bound on the error made by SEBD with no bond dimension cutoff in terms of the truncation error. Note that the truncation error may depend on the (random) measurement outcomes, and is itself therefore a random variable. See Appendix B for a proof.
Lemma 1. Let $\epsilon_i$ denote the sum of the squares of all singular values discarded in the compression during iteration i of the simulation of a circuit C with output distribution $D_C$ by SEBD with no bond dimension cutoff, and let $\Lambda := \sum_i \epsilon_i$ denote the total squared truncation error incurred over the course of the algorithm. Then the distribution $D_C'$ sampled from by SEBD satisfies
$$\|D_C - D_C'\|_{\mathrm{TV}} \le 2\, \mathbb{E}\sqrt{\Lambda} \le 2\sqrt{\mathbb{E}\,\Lambda},$$
where the expectations are over the random measurement outcomes.
From Lemma 1 we immediately obtain two corollaries. The first is useful for empirically bounding the sampling error in total variation distance made by SEBD when the algorithm also has a bond dimension cutoff. The second is a useful asymptotic statement. The corollaries follow straightforwardly from the coupling formulation of variational distance, Markov's inequality, and the triangle inequality.
Corollary 1. Let A denote a SEBD algorithm with truncation error parameter $\epsilon$ and bond dimension cutoff D. Consider a fixed circuit C, and suppose that A applied to this circuit fails with probability $p_{f,C}$. Then A samples from the output distribution of C with total variation distance error bounded by
$$p_{f,C} + 2\sqrt{\epsilon L_1 L_2}.$$
If the failure probability of A averaged over random choice of circuit instance and measurement outcome is $\bar p_f$, then for any δ, on at least a $1 - \delta$ fraction of circuit instances, A samples from the true output distribution with total variation distance error bounded by
$$\frac{\bar p_f}{\delta} + 2\sqrt{\epsilon L_1 L_2}.$$
In practice, the variational distance error of SEBD with truncation error $\epsilon$ applied to the simulation of some circuit C can be bounded by constructing a confidence interval for $p_{f,C}$ and applying the above bound.
Corollary 2. Let A denote a SEBD algorithm with truncation error parameter $\epsilon$ and no bond dimension cutoff. Suppose that, for some random circuit family with q = O(1) and d = O(1), the expected bond dimension across any cut is bounded by $\mathrm{poly}(n, 1/\epsilon)$. Then, SEBD with some choice of $\epsilon = 1/\mathrm{poly}(n)$ and $D = \mathrm{poly}(n)$ runs in time $\mathrm{poly}(n, 1/\varepsilon, 1/\delta)$ and, with probability at least 1 − δ over the choice of circuit instance C, samples from the output distribution of C with variational distance error less than ε.
Thus, to prove the part of Conjecture 1 about sampling up to total variation distance error ε for uniform random circuit families, it would suffice to show that there is a 2D constant-depth uniform random quantum circuit family with the worst-case-hard property for which the expected bond dimension across any cut while running SEBD with truncation parameter ε is bounded by poly(n, 1/ε). Later, we will introduce two candidate circuit families for which we can give numerical and analytical evidence that this criterion is indeed met.
In the next subsection, we show how the other part of Conjecture 1, regarding computing output probabilities, would also follow from a poly(n, 1/ε) bound on the bond dimension of states encountered by SEBD on uniform worst-case-hard circuit families.

Computing output probabilities
In the previous section, we described how a SEBD algorithm with truncation error parameter ε and bond dimension cutoff D applied to a circuit C samples from a distribution D̃_C that satisfies the bound of Lemma 1 up to an additional term accounting for the failure probability p_f, where p_f is the probability that some bond dimension exceeds D and the algorithm terminates and indicates failure. Expanding the expression for the 1-norm and rearranging, we obtain a bound on the expected additive error of D̃_C(x) for a uniformly random output string x. SEBD with bond dimension cutoff D can be used to compute D̃_C(x) for any output string x in time O(nD³) (taking q and d to be constants). To do this, for a fixed output string x, SEBD proceeds similarly to the case in which it is being used for sampling, but rather than sampling from the output distribution of some column, it simply projects that column onto the outcome specified by the string x and computes the conditional probability of that outcome via contraction of the MPS.
That is, at iteration t, the algorithm computes the conditional probability of measuring the string x_t in column t, conditioned on the previously measured outcomes x_1, …, x_{t−1}, by projecting column t onto the relevant string via the projector Π^{x_t}_t and then contracting the relevant MPS. If the bond dimension ever exceeds D, then it must hold that D̃_C(x) = 0, and so the algorithm outputs zero and terminates. Otherwise, the algorithm outputs the product of the computed conditional probabilities, Π_t p(x_t | x_1, …, x_{t−1}). We summarize this procedure in Algorithm 2, whose core loop at iteration t is: if some bond dimension is > D, terminate and output zero; otherwise apply Π^{x_t}_t to condition on string x_t and record the conditional probability. We have therefore shown the following.
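The chain-rule bookkeeping behind Algorithm 2 can be illustrated without any MPS machinery. The following sketch (plain Python with NumPy; the tiny three-bit joint distribution is a hypothetical stand-in for a circuit's output distribution) recovers a joint probability as a product of conditionals, exactly as the algorithm multiplies column-wise conditional probabilities:

```python
import itertools
import numpy as np

def prob_via_chain_rule(joint, x):
    """Recover joint[x] as a product of conditionals p(x_t | x_1..x_{t-1}),
    mirroring how Algorithm 2 multiplies column-wise conditional probabilities."""
    p = 1.0
    for t in range(len(x)):
        # probability of the length-(t+1) prefix and the length-t prefix
        pref_t = sum(v for k, v in joint.items() if k[:t + 1] == x[:t + 1])
        pref_prev = sum(v for k, v in joint.items() if k[:t] == x[:t])
        if pref_prev == 0.0:
            return 0.0  # analogous to terminating and outputting zero
        p *= pref_t / pref_prev
    return p

rng = np.random.default_rng(0)
outcomes = list(itertools.product([0, 1], repeat=3))
w = rng.random(len(outcomes))
joint = dict(zip(outcomes, w / w.sum()))
for x in outcomes:
    assert abs(prob_via_chain_rule(joint, x) - joint[x]) < 1e-12
```

In SEBD the conditional probabilities come from MPS contractions rather than explicit sums, but the accumulation of the product is the same.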
Lemma 2. Let p_{f,C} be the failure probability of SEBD when used to simulate a circuit instance C with truncation error parameter ε and bond dimension cutoff D. Suppose x ∈ [q]^n is an output string drawn uniformly at random. Then Algorithm 2 outputs a number D̃_C(x) whose expected additive error relative to D_C(x) is bounded by the sampling error of Lemma 1 plus the failure probability p_{f,C}, scaled by 1/q^n.

The above lemma bounds the expected error incurred while estimating a uniformly random output probability for a fixed circuit instance C. We may use this lemma to straightforwardly bound the expected error incurred while estimating the probability of a fixed output string over a distribution of random circuit instances. The corollary is applicable if the distribution of circuit instances is invariant under application of a final layer of arbitrary single-qudit gates. This includes circuits in which all gates are Haar-random (as long as every qudit is acted on by some gate), but is more general. In particular, any circuit distribution in which the final gate to act on any given qudit is Haar-random satisfies this property. This fact will be relevant in subsequent sections.
Corollary 3. Let p_f be the failure probability of SEBD when used to simulate a random circuit instance C with truncation error parameter ε and bond dimension cutoff D, where C is drawn from a distribution that is invariant under application of a final layer of arbitrary single-qudit gates. Then for any fixed string x ∈ [q]^n, the output of Algorithm 2 satisfies the analogous bound of Lemma 2, with the expectation now taken over circuit instances.

Proof. Average the bound of Eq. (8) over random circuit instances. Let L_y denote a layer of single-qudit gates with the property that L_y|x⟩ = |y⟩. By the invariance assumption, the error incurred for the fixed string x, averaged over instances, equals the error incurred for a uniformly random string y, from which the result follows.
The following asymptotic statement follows straightforwardly.
Corollary 4. Let A denote a SEBD algorithm with truncation error parameter ε and no bond dimension cutoff. Suppose that, for some random circuit family with q = O(1) and d = O(1), the expected bond dimension across any cut is bounded by poly(n, 1/ε). Then SEBD with some choice of ε = 1/poly(n) and D = poly(n) runs in time poly(n, 1/ε, 1/δ) and, with probability at least 1 − δ over the choice of circuit instance C, estimates D_C(x) for any fixed x ∈ [q]^n up to additive error bounded by ε/q^n.
Corollary 4 shows how the part of Conjecture 1 about computing arbitrary output probabilities to error ε/q n would follow from a bound on the bond dimension across any cut when SEBD runs on a uniform worst-case-hard circuit family.

SEBD applied to cluster state with Haar-random measurements (CHR)
We now study the SEBD algorithm in more detail for a simple uniform family of 2D random circuits that possesses the worst-case-hard property required by Conjecture 1. The model we consider is the following: start with a 2D cluster state of n qubits arranged in a √n × √n grid, apply a single-qubit Haar-random gate to each qubit, and then measure all qubits in the computational basis.
Recall that a cluster state may be created by starting with the product state |+⟩^⊗n and applying CZ gates between all adjacent sites. An equivalent formulation, which we will find convenient in the subsequent section, is to measure each qubit of the cluster state in a Haar-random basis. We refer to this model as CHR, for "cluster state with Haar-random measurements".
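As a concrete check of this construction, the sketch below (NumPy) builds the cluster state on a hypothetical 2 × 2 grid and verifies the defining graph-state stabilizer condition K_i = X_i Π_{j∼i} Z_j; the grid size and qubit indexing are illustrative choices.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])

def kron_all(ops):
    return reduce(np.kron, ops)

def cz(n, a, b):
    """Diagonal CZ gate between qubits a and b of an n-qubit register."""
    d = np.ones(2 ** n)
    for idx in range(2 ** n):
        if (idx >> (n - 1 - a)) & 1 and (idx >> (n - 1 - b)) & 1:
            d[idx] = -1.0
    return np.diag(d)

n = 4  # qubits laid out as a 2x2 grid: 0 1 / 2 3
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]

psi = kron_all([np.ones(2) / np.sqrt(2)] * n)  # |+>^{(x) n}
for a, b in edges:
    psi = cz(n, a, b) @ psi

# graph-state stabilizers: K_i = X_i * prod_{j adjacent to i} Z_j fixes psi
for i in range(n):
    ops = [X if k == i else
           Z if ((i, k) in edges or (k, i) in edges) else I2
           for k in range(n)]
    assert np.allclose(kron_all(ops) @ psi, psi)
```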
It is straightforward to show, following [BJS10], that sampling from the output distribution of CHR is classically worst-case hard assuming the polynomial hierarchy (PH) does not collapse to the third level.
Lemma 3. Suppose that there exists a polynomial-time classical algorithm for CHR that, for any circuit realization, samples from the classical output distribution. Then PH collapses to the third level.
Proof. Recall that the cluster state is a resource state for universal measurement-based quantum computation (MBQC) [RB01]. Hence, it is PostBQP-hard to sample from the conditional output distribution of an arbitrary instance of CHR with some outcomes postselected on zero. Therefore, if there is some efficient classical sampling algorithm, then PostBPP = PostBQP which implies that PH collapses to the third level [BJS10].
It can also be shown, following [Mov19], that near-exactly computing output probabilities of CHR is #P-hard in the average case.
Lemma 4 (follows from [Mov19]). Suppose there exists an algorithm A that, given a random instance C of CHR and a fixed string x, with probability 1 − 1/p(n) outputs D_C(x) = |⟨x|C|0^⊗n⟩|² up to additive error 2^−Θ(n²), where p(n) is a sufficiently large polynomial. Then A can be used to compute a #P-complete function with high probability in polynomial time.
Under standard complexity-theoretic assumptions, Lemma 3 rules out the existence of a classical sampling algorithm for CHR that succeeds for all instances, and Lemma 4 rules out the existence of an algorithm for efficiently computing most output probabilities of CHR. A natural question is then whether efficient approximate average-case versions of these algorithms may exist. We formalize these questions as the problems CHR_samp± and CHR_prob±.

Problem 1 (CHR_samp/prob±). Given as input a random instance C of CHR (specified by a side length √n and a set of n single-qubit Haar-random gates applied to the √n × √n cluster state) and error parameters ε and δ, perform the following computational task in time poly(n, 1/ε, 1/δ).
• CHR_samp±. Sample from a distribution D̃_C that is ε-close in total variation distance to the true output distribution D_C of circuit C, with probability of success at least 1 − δ over the choice of measurement bases.
• CHR_prob±. Estimate D_C(0), the probability of obtaining the all-zeros string upon measuring the output state of C in the computational basis, up to additive error at most ε/2^n, with probability of success at least 1 − δ over the choice of measurement bases.
In the next section, we show that SEBD solves CHR samp/prob ± if a certain form of 1D dynamics involving local unitary gates and measurements is classically simulable.

SEBD applied to CHR
We first consider the sampling variant of SEBD. Specializing to the CHR model, the algorithm takes a particularly simple form because the cluster state is built by applying CZ gates between all neighboring pairs of qubits, which are initialized in |+⟩ states. Due to this structure, the radius of the lightcone for this model is simply one. In particular, the only gates in the lightcone of columns 1 through j are the Haar-random single-qubit gates acting on qubits in these columns, together with the CZ gates that act on at least one qubit within these columns. This permits a simple prescription for SEBD applied to this problem.
Initialize the simulation algorithm in the state ρ₁ = (|+⟩⟨+|)^⊗√n corresponding to column 1. To implement the isometry V₁, initialize the qubits of column 2 in the state (|+⟩⟨+|)^⊗√n and apply CZ gates between adjacent qubits that are both in column 1 and between adjacent qubits in separate columns. Now, measure the qubits of column 1 in the specified Haar-random bases (equivalently, apply the specified Haar-random gates and measure in the computational basis), inducing a pure state ρ₂ with support in column 2. Iterating this process, we progress through a random sequence of 1D states on √n qubits, ρ₁ → ρ₂ → ⋯ → ρ_√n, which we will see can be equivalently understood as arising from a 1D dynamical process consisting of alternating layers of random unitary gates and weak measurements.
It will be helpful to introduce notation. Define |θ, φ⟩ := cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩; in other words, let |θ, φ⟩ denote the single-qubit pure state with polar angle θ and azimuthal angle φ on the Bloch sphere. Let (θ_i^(t), φ_i^(t)) specify the measurement basis of the qubit in row i and column t; that is, the projective measurement on the qubit in row i and column t is {Π⁰ = |θ_i^(t), φ_i^(t)⟩⟨θ_i^(t), φ_i^(t)|, Π¹ = I − Π⁰}. Define the operators M₀(θ, φ) := cos(θ/2)|0⟩⟨0| + e^{−iφ} sin(θ/2)|1⟩⟨1| and M₁(θ, φ) := sin(θ/2)|0⟩⟨0| − e^{−iφ} cos(θ/2)|1⟩⟨1|, and note that {M₀(θ, φ), M₁(θ, φ)} defines a weak single-qubit measurement. We now describe, in Algorithm 3, a 1D process which we claim produces a sequence of states identical to that encountered by SEBD for the same choice of measurement bases and measurement outcomes, and also has the same measurement statistics.
Algorithm 3 Effective 1D dynamics of a fixed instance of CHR
1: initialize √n qubits in the state (|+⟩⟨+|)^⊗√n
2: for t = 1, …, √n do
3: apply the weak measurement {M₀(θ_i^(t), φ_i^(t)), M₁(θ_i^(t), φ_i^(t))} to each qubit i, recording the outcomes
4: apply a CZ gate between every adjacent pair of qubits
5: apply a Hadamard transform to each qubit
6: end for

Lemma 5. For a fixed choice of measurement bases and measurement outcomes, the sequence of states generated by Algorithm 3 is identical to the sequence ρ₁ → ρ₂ → ⋯ encountered by SEBD, where ρ_j is the state at the beginning of iteration j of the SEBD algorithm.
Proof. The lemma follows from the above description of the behavior of SEBD applied to CHR, together with the following identities, which hold for any single-qubit state |ξ⟩ and may be verified by straightforward calculation: (⟨θ, φ| ⊗ I) CZ (|ξ⟩ ⊗ |+⟩) = H M₀(θ, φ)|ξ⟩ and (⟨θ, φ|^⊥ ⊗ I) CZ (|ξ⟩ ⊗ |+⟩) = H M₁(θ, φ)|ξ⟩, where |θ, φ⟩^⊥ denotes the state orthogonal to |θ, φ⟩.

We have seen that, for a fixed choice of single-qubit measurement bases {(θ_i^(t), φ_i^(t))} associated with an instance C, we can define an associated 1D process consisting of alternating layers of single-qubit weak measurements and local unitary gates, such that simulating this 1D process is sufficient for sampling from D_C. Now, recall that in the context of simulating CHR, each single-qubit measurement basis is chosen randomly according to the Haar measure. That is, the azimuthal angles φ_i^(t) are uniformly distributed on [0, 2π), and cos θ_i^(t) is uniformly distributed on [−1, 1]. Using these observations, as well as the observation that the outcome probabilities of the measurement of qubit i in iteration t are independent of the azimuthal angle φ_i^(t), we may derive effective dynamics of a random instance.
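The first identity can be checked numerically. In the sketch below (NumPy), the explicit diagonal form used for M₀(θ, φ) is our reconstruction: it is consistent with the identity and with the Hadamard appearing in Algorithm 3, but the paper's own definition is not reproduced in the excerpt above.

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
CZ = np.diag([1., 1., 1., -1.]).astype(complex)
plus = np.array([1., 1.]) / np.sqrt(2)

def basis_state(theta, phi):
    # |theta, phi> with polar angle theta and azimuthal angle phi
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def M0(theta, phi):
    # reconstructed Kraus operator (assumption, consistent with the identity below)
    return np.diag([np.cos(theta / 2), np.exp(-1j * phi) * np.sin(theta / 2)])

rng = np.random.default_rng(1)
for _ in range(100):
    theta, phi = rng.uniform(0, np.pi), rng.uniform(0, 2 * np.pi)
    xi = rng.normal(size=2) + 1j * rng.normal(size=2)
    xi /= np.linalg.norm(xi)
    # (<theta,phi| (x) I) CZ (|xi> (x) |+>): project the measured qubit
    joint = CZ @ np.kron(xi, plus)
    lhs = np.kron(basis_state(theta, phi).conj(), np.eye(2)) @ joint
    rhs = H @ M0(theta, phi) @ xi
    assert np.allclose(lhs, rhs)
```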
Define the operators N(x) := √((1+x)/2) |0⟩⟨0| + √((1−x)/2) |1⟩⟨1|, and note that {N(x), N(−x)} defines a weak measurement. Also, define the phase gate P(φ) := |0⟩⟨0| + e^{iφ}|1⟩⟨1|. By randomizing each single-qubit measurement basis according to the Haar distribution, one finds that the dynamics of Algorithm 3 (which applies for a fixed choice of measurement bases) may be written as Algorithm 4 below, where the notation x ∈ U[−1, 1] means that x is a random variable uniformly distributed on [−1, 1]. That is, the distribution of random sequences ϕ₁ → ϕ₂ → ⋯ → ϕ_√n and the distribution of output statistics produced by Algorithm 4 are identical to those produced by SEBD applied to CHR.
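A quick numerical sanity check that {N(x), N(−x)} is a valid measurement: the diagonal form of N(x) used below is a reconstruction consistent with x playing the role of cos θ for a Haar-random basis, drawn uniformly from [−1, 1].

```python
import numpy as np

def N(x):
    # reconstructed weak-measurement Kraus operator (assumption): the outcome
    # biases the qubit toward |0> or |1> with strength controlled by x
    return np.diag([np.sqrt((1 + x) / 2), np.sqrt((1 - x) / 2)])

rng = np.random.default_rng(2)
for _ in range(100):
    x = rng.uniform(-1, 1)
    # Kraus operators of a valid measurement must resolve the identity
    assert np.allclose(N(x).T @ N(x) + N(-x).T @ N(-x), np.eye(2))
```

At x = ±1 the pair degenerates to a projective computational-basis measurement, and at x = 0 it is proportional to the identity (no measurement), so x interpolates between the two extremes.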
Hence, if TEBD can efficiently simulate the process of Algorithm 4 with high probability, then SEBD can solve CHR samp ± and CHR prob ± . We formalize this in the following lemma.
Algorithm 4 Effective 1D dynamics of CHR
1: initialize √n qubits in the state (|+⟩⟨+|)^⊗√n
2: for t = 1, …, √n do
3: for each qubit i, draw x_i ∈ U[−1, 1] and φ_i ∈ U[0, 2π), apply the weak measurement {N(x_i), N(−x_i)} (recording the outcome), and apply the phase gate P(φ_i)
4: apply a CZ gate between every adjacent pair of qubits
5: apply a Hadamard transform to each qubit
6: end for

Lemma 6. If TEBD can efficiently simulate the dynamics of Algorithm 4 with high probability over the randomness of the process, then SEBD solves CHR_samp± and CHR_prob±.

Proof. Follows from Corollary 2, Corollary 4, and the equivalence to Algorithm 4 discussed above.
We have shown how SEBD applied to CHR can be reinterpreted as TEBD applied to a 1D dynamical process involving alternating layers of random unitaries and weak measurements. Up until this point, there has been little reason to expect that SEBD is efficient for the simulation of CHR. In particular, with no truncation, the bond dimension of the MPS stored by the algorithm grows exponentially as the algorithm sweeps across the lattice.
We now invoke the findings of a number of related recent works [LCF18; Cha+18; SRN19; LCF19; SRS19; Cho+19; GH19a; BCA19; Jia+19; GH19b; Zab+19] to motivate the possibility that TEBD can efficiently simulate the effective 1D dynamics. These works study various 1D dynamical processes involving alternating layers of measurements and random local unitaries. In some cases, the measurements are considered to be projective and only occur with some probability p. In other cases, similarly to Algorithm 4, weak measurements are applied to each site with probability one. The common finding of these papers is that such models appear to exhibit an entanglement phase transition driven by measurement probability p (in the former case), or measurement strength (in the latter case). On one side of the transition, the entanglement entropy obeys an area law, scaling as O(1) with the length L. On the other side, it obeys a volume law, scaling as O(L).
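The kind of unitary-and-measurement process studied in these works can be sketched directly with a small statevector simulation (NumPy). The chain length, number of rounds, and measurement strength below are illustrative choices, not parameters taken from the cited works; the weak measurement uses the {N(x), N(−x)} Kraus pair with a fixed x.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8  # chain length (statevector simulation, so kept small)

def haar_unitary(dim):
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def apply_op(psi, K, i, width):
    """Apply operator K acting on `width` adjacent sites starting at site i."""
    t = psi.reshape(2 ** i, 2 ** width, -1)
    return np.einsum('ab,xbz->xaz', K, t).reshape(-1)

def weak_measure(psi, i, x):
    # weak measurement with Kraus pair {N(x), N(-x)}; outcome via the Born rule
    K0 = np.diag([np.sqrt((1 + x) / 2), np.sqrt((1 - x) / 2)])
    K1 = np.diag([np.sqrt((1 - x) / 2), np.sqrt((1 + x) / 2)])
    out0 = apply_op(psi, K0, i, 1)
    p0 = np.linalg.norm(out0) ** 2
    if rng.random() < p0:
        return out0 / np.sqrt(p0)
    out1 = apply_op(psi, K1, i, 1)
    return out1 / np.linalg.norm(out1)

def half_chain_entropy(psi):
    s = np.linalg.svd(psi.reshape(2 ** (n // 2), -1), compute_uv=False)
    p = s ** 2
    p = p[p > 1e-14]
    return float(-(p * np.log2(p)).sum())

psi = np.zeros(2 ** n, dtype=complex)
psi[0] = 1.0
strength = 0.8  # illustrative fixed measurement strength
for step in range(20):
    for i in range(step % 2, n - 1, 2):   # brickwork layer of random 2-qubit gates
        psi = apply_op(psi, haar_unitary(4), i, 2)
    for i in range(n):                    # weak measurement on every site
        psi = weak_measure(psi, i, strength)

S = half_chain_entropy(psi)
assert abs(np.linalg.norm(psi) - 1.0) < 1e-8
assert 0.0 <= S <= n // 2
```

Sweeping `strength` toward 0 or 1 in such a simulation is the standard numerical probe of the entanglement transition discussed above, with the saturated half-chain entropy as the diagnostic.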
Based on these works, one expects the entanglement dynamics to saturate to an area-law or volume-law phase. And in fact, our numerical studies (presented in Section 4) suggest that these dynamics saturate to an area-law phase. The common intuition that 1D quantum systems obeying an area law for the von Neumann entropy are easy to simulate with matrix product states therefore suggests that SEBD applied to this problem is efficient. While counterexamples to this common intuition are known [Sch+08], they are contrived and do not present an obvious obstruction for our algorithm. To better understand the relationship between maximal bond dimension and truncation error when the effective dynamics is in the area-law phase as well as rule out such counterexamples, in the following section we describe a toy model for a unitary-and-measurement process in the area-law phase, which predicts a superpolynomial decay of Schmidt values across any cut and therefore predicts that a polynomial runtime is sufficient to perform the simulation to 1/ poly(n) error. Our numerical results (presented in Section 4) suggest that the effective dynamics of the random circuit architectures we consider are indeed in the area-law phase, with entanglement spectra consistent with those predicted by the toy model dynamics. Further analytical evidence for efficiency is given in Section 5.
Note that, although we explicitly derived the effective 1D dynamics for the CHR model and observed it to be a simple unitary-and-measurement process, the interpretation of the effective 1D dynamics as a unitary-and-measurement process is not specific to CHR and is in fact general. In the general case, SEBD tracks O(r) columns simultaneously, where r is the radius of the lightcone corresponding to the circuit. In each iteration, new qudits that have come into the lightcone are added, unitary gates that have come into the lightcone are performed, and finally projective measurements are performed on a single column of qudits. Similarly to the case of CHR, this entire procedure can be viewed as an application of unitary gates followed by weak measurements on a 1D chain of qudits of dimension q^O(r). Intuitively, increasing the circuit depth corresponds both to increasing the local dimension in the effective 1D dynamics and to decreasing the measurement strength. The former is because the lightcone radius r generally increases with depth, and the local dimension of the effective dynamics is q^O(r). The latter is because, as r increases, the number of tracked columns increases while the number of qudits measured in a single round stays constant; the fraction of measured qudits therefore decreases, which we intuitively expect to correspond to a decrease in effective measurement strength. This intuition, together with the findings of prior works on unitary-and-measurement dynamics, suggests that the effective dynamics experiences an entanglement phase transition from an area-law to a volume-law phase as q or d is increased, and therefore that SEBD experiences a computational phase transition, supporting Conjecture 2. While this analogy is not perfect, we provide further analytical evidence in Section 6 that the effective 1D dynamics indeed undergoes such a phase transition.

Conjectured entanglement spectrum of unitary-and-measurement dynamics in an area-law phase
Numerical (Section 4) and analytical (Section 6) evidence suggests that the effective 1D dynamics corresponding to the uniform 2D shallow random circuit families we consider are in the area-law phase, making efficient simulation via SEBD very plausible. However, it is desirable to have clear predictions for the scaling of the entanglement spectra for states of the effective 1D dynamics, as this allows us to make concrete predictions for error scaling of SEBD and rule out (contrived) examples of states [Sch+08] which cannot be efficiently represented via MPS despite obeying an area law for the von Neumann entanglement entropy.
To this end, we study a simple toy model of how entanglement might scale in the area-law phase of a unitary-and-measurement circuit. Consider a chain of n qubits where we are interested in the entanglement across the cut between sites 1, …, n/2 and n/2 + 1, …, n (assume n is even). We model the dynamics as follows. In each time step we perform the following three steps:
1. Set the state of sites n/2 and n/2 + 1 to be an EPR pair |Φ⟩ = (|00⟩ + |11⟩)/√2.
2. Cyclically shift the left half of the chain one site to the left and the right half one site to the right, so that previously created EPR pairs move away from the cut.
3. Apply a weak measurement of fixed strength, parameterized by an angle θ, to every site.
Without the measurements, this process would create one EPR pair across the cut in each time step, until the system had n/2 EPR pairs across the cut after time n/2. The measurements, however, have the effect of reducing the entanglement. For this process, we derive the functional form of the asymptotic scaling of the half-chain Schmidt coefficients λ₁ ≥ λ₂ ≥ ⋯. Moreover, bounds on the scaling of the entanglement spectrum allow us to derive a relation between the truncation error ε (the sum of squares of discarded Schmidt values) incurred upon discarding small Schmidt values and the rank r of the post-truncation state. The bounds are given in the following lemma, which is proved in Appendix B.
Lemma 7. Let λ₁ ≥ λ₂ ≥ ⋯ denote the half-chain Schmidt values after at least n/2 iterations of the toy model process. Then with probability at least 1 − δ, the half-chain Schmidt values indexed by i ≥ i* = exp(Θ(√log(n/δ))) obey the asymptotic scaling λ_i = exp(−Θ(log² i)). Furthermore, upon truncating the smallest Schmidt coefficients up to a truncation error of ε, with probability at least 1 − δ, the half-chain Schmidt rank r of the post-truncation state obeys the scaling r ≤ exp(Θ(√log(n/(εδ)))).

This is the basis for our Conjecture 1'. More precisely, we take this analysis as evidence that the bond dimension D, truncation error ε, and system size n obey the scaling D ≤ exp(Θ(√log(n/(εδ)))) with probability 1 − δ over the random circuit instance and random measurement outcomes when SEBD simulates a random constant-depth 2D circuit whose effective 1D dynamics lies in the area-law phase. Recalling that the runtime of SEBD scales like O(nD³) for a maximal bond dimension of D, and using the relationship between truncation error, failure probability, variational distance error, and simulable circuit fraction given in Corollary 1, we conclude that SEBD with a maximal bond dimension cutoff scaling as exp(Θ(√log(n/(εδ)))) runs in time n^(1+o(1)) · exp(Θ(√log(1/(εδ)))) and simulates a 1 − δ fraction of random circuit instances up to variational distance error ε.
It is important to note what this heuristic argument leaves out. While a 1D unitary-and-measurement circuit will indeed create O(1) ebits across any given cut in each round, these will not remain in the form of distinct pairs of qubits. The unitary dynamics within each side of the cut will transform the Schmidt bases into entangled ones. This makes the measurements less effective at reducing the entanglement, for reasons that can be understood in terms of quantum state merging [HOW07; Cho+19]. Another simplification of the toy model is that the measurement angle θ is taken to be a fixed constant rather than random. Finally, in the toy model we assume for simplicity that the EPR pairs move cyclically. We expect that, if this effect is significant, it is more likely to make the toy model overly pessimistic compared with the real situation. Despite these simplifications, we believe this model is qualitatively accurate in the area-law phase. Indeed, the scaling of Schmidt values predicted by our toy model analysis is consistent with the scaling we find numerically in Figure 8.
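The qualitative content of the toy-model prediction is that superpolynomially decaying Schmidt values force the required bond dimension to grow far more slowly than any inverse power of the truncation error. This can be illustrated numerically; the spectrum λ_i² ∝ exp(−c log² i) with c = 1 below is an assumed form for illustration, chosen to match the superpolynomial decay discussed above.

```python
import numpy as np

c = 1.0  # illustrative decay constant
i = np.arange(1, 200001, dtype=float)
lam2 = np.exp(-c * np.log(i) ** 2)  # squared Schmidt values ~ exp(-c log^2 i)
lam2 /= lam2.sum()                  # normalize to a valid spectrum
cum = np.cumsum(lam2)

def rank_for_error(eps):
    """Smallest rank r such that the discarded weight sum_{i>r} lam2_i <= eps."""
    return int(np.searchsorted(cum, 1.0 - eps) + 1)

for eps in [1e-2, 1e-4, 1e-6]:
    r = rank_for_error(eps)
    # rank should grow like exp(Theta(sqrt(log(1/eps)))): far slower than 1/eps
    assert r <= 50 * np.exp(np.sqrt(np.log(1 / eps) / c))
assert rank_for_error(1e-6) > rank_for_error(1e-2)
```

By contrast, a power-law spectrum λ_i² ∝ i^(−a) would require a rank polynomial in 1/ε, which is the distinction the toy model is meant to capture.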

Patching
We now describe a second algorithm for sampling from the output distributions and computing output probabilities of 2D quantum circuits acting on qudits of local dimension q. While the SEBD algorithm described in the previous section is efficient if the corresponding effective 1D dynamics can be efficiently simulated with TEBD, the algorithm of this section is efficient if the circuit depth d and local dimension q are constant and the conditional mutual information (CMI) of the classical output distribution is exponentially decaying in a sense that we make precise below. In Section 6 we will give evidence that the output distribution of sufficiently shallow random 2D circuits acting on qudits of sufficiently small dimension satisfies such a property with high probability, and the property is not satisfied if the circuit depth or local dimension exceeds some critical constant value.
The algorithm we describe is an adaptation and simplification of the Gibbs state preparation algorithm of [BK19]. In that paper, the authors essentially showed that a quantum Gibbs state defined on a lattice can be prepared by a quasipolynomial-time quantum algorithm if the Gibbs state satisfies two properties: (1) exponential decay of correlations and (2) exponentially decaying quantum conditional mutual information for shielded regions. Our situation is simpler than the one considered in that paper, because sufficiently separated regions of the lattice are causally disconnected: the circuit inducing the distribution is constant-depth and therefore has a constant-radius lightcone. The structure of our algorithm is very similar to theirs, except that we can make some simplifications and substantial improvements as a result of the constant-radius lightcone and the fact that we are sampling from a classical distribution rather than a quantum Gibbs state.
Before we describe the algorithm, we set some notation. Let Λ denote the set of all qudits of an L₁ × L₂ rectangular grid (assume L₁ ≤ L₂ ≤ poly(L₁)). If A and B are two subsets of qudits of Λ, we define dist(A, B) := min_{i∈A, j∈B} dist(i, j), where dist(i, j) is the distance between sites i and j as measured by the ∞-norm. There are two primary facts that our algorithm relies on. First, if the circuit has depth d, any two sets of qudits separated by a distance greater than 2d have non-overlapping lightcones. Hence, if A and B are two lattice regions separated by distance at least 2d, and ρ is the quantum state output by the circuit (before measurement), it holds that ρ_AB = ρ_A ⊗ ρ_B, and therefore D_AB = D_A ⊗ D_B, where D = Σ_x D(x)|x⟩⟨x| is the classical output distribution of the circuit and (for example) D_A denotes the marginal of D on subregion A. (Note that our notation is slightly different in this section: we now use subscripts on D to denote marginals, and the dependence of D on the circuit instance is left implicit.) Second, if the classical CMI I(X : Z|Y)_p of three random variables with joint distribution p_XYZ is small, then p_XYZ is close to the distribution p_{X|Y} p_Y p_{Z|Y} corresponding to a Markov chain X − Y − Z. We state this more formally as the following lemma, which follows from the Pinsker inequality.
Lemma 8 (see e.g. [CT91]). Let X, Y, Z be discrete random variables, and let p_XYZ denote their joint distribution. Then ‖p_XYZ − p_{X|Y} p_Y p_{Z|Y}‖₁ ≤ √(2 ln 2 · I(X : Z|Y)), where the CMI is measured in bits.

Following [BK19], we also formally define a notion of CMI decay.
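The lemma is easy to verify numerically. In the sketch below (NumPy), the conditional mutual information is computed in nats, for which Pinsker's inequality reads ‖p − p_{X|Y} p_Y p_{Z|Y}‖₁ ≤ √(2 I(X:Z|Y)); the three-valued alphabets are arbitrary.

```python
import numpy as np

def cmi_and_markov_gap(p):
    """Given joint p[x, y, z], return I(X:Z|Y) in nats and the 1-norm distance
    from p to the Markov approximation p(x|y) p(y) p(z|y)."""
    py = p.sum(axis=(0, 2))
    pxy = p.sum(axis=2)
    pyz = p.sum(axis=0)
    q = np.zeros_like(p)
    for y in range(p.shape[1]):
        if py[y] > 0:
            q[:, y, :] = np.outer(pxy[:, y], pyz[y, :]) / py[y]
    mask = p > 0
    # I(X:Z|Y) equals the KL divergence from p to its Markov approximation q
    cmi = float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return cmi, float(np.abs(p - q).sum())

rng = np.random.default_rng(3)
for _ in range(50):
    p = rng.random((3, 3, 3))
    p /= p.sum()
    cmi, gap = cmi_and_markov_gap(p)
    # Pinsker (nats): ||p - p(x|y)p(y)p(z|y)||_1 <= sqrt(2 I(X:Z|Y))
    assert gap <= np.sqrt(2 * cmi) + 1e-12
```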
Definition 4 (Markov property). Let p denote a probability distribution supported on Λ. Then p is said to satisfy the δ(l)-Markov condition if, for any tripartition of a subregion X of the lattice into subregions X = A ∪ B ∪ C such that dist(A, C) ≥ l, the CMI satisfies I(A : C|B)_p ≤ δ(l).

Intuitively, our algorithm works by first sampling from the marginal distributions of spatially separated patches on the lattice, and then stitching the patches together to approximately obtain a sample from the global distribution. For an O(1)-depth circuit whose output distribution has exponentially decaying CMI, the efficiency of this procedure is guaranteed by the two facts above. We now show this more formally.
Theorem 1. Suppose C is a 2-local quantum circuit of depth d defined on a 2D rectangular grid Λ of n = L₁ × L₂ qudits, and let D(x) := |⟨x|C|1^⊗n⟩|² denote its output distribution. Suppose D satisfies the δ(l)-Markov condition. Then for any integer l > 2d, Patching with length-scale parameter l runs in time n · q^O(dl) and samples from some distribution D̃ that satisfies ‖D̃ − D‖₁ ≤ O(n/l²) · √(δ(l)).

In particular, if d = O(1), q = O(1), and D is poly(n)e^−Ω(l)-Markov, then for any polynomial r(n), for some choice of length-scale parameter l = Θ(log n), Patching runs in time poly(n) and samples from a distribution that is 1/r(n)-close to D in total variation distance.
Proof. The algorithm proceeds in three steps, illustrated in Figure 3. First, for each square subregion R_i shaded in Figure 3(a), with i ∈ [O(n/l²)], sample from D_{R_i}, the marginal distribution of D on subregion R_i. To do this, first restrict to the qudits and gates in the lightcone of R_i. Sampling from the output distribution on R_i produced by this restricted version of the circuit is equivalent to sampling from the marginal on R_i of the true distribution produced by the full circuit.
Since l > 2d, this restriction of the circuit is contained in a sublattice of dimensions O(l) × O(l). Using standard tensor network methods [MS08], sampling from the output distribution of this restricted circuit on R_i can be performed in time q^O(dl). Since there are O(n/l²) patches, this step can be performed in time n · q^O(dl). After performing this step, we have sampled from the product distribution D_{R₁} ⊗ ⋯ ⊗ D_{R_k} = D_{R₁,…,R_k}, where the equality holds because the patches are separated by distance l > 2d and are therefore mutually independent.
In the second step, we apply "recovery maps" to approximately prepare a sample from the larger, connected lattice subregion S shaded in Figure 3(c). The prescription for these recovery maps is given in Figure 3(b). Referring to this figure, a recovery map R_{B→AB} is applied to generate a sample from subregion A, conditioned on the state of region B. Explicitly, the mapping is given by linearly extending the map R_{B→AB}(|b⟩⟨b|_B) = Σ_a D_{A|B}(a|b) |a⟩⟨a|_A ⊗ |b⟩⟨b|_B. Note that, for a tripartite distribution D_ABC, R_{B→AB}(D_BC) = D_{A|B} D_B D_{C|B}. To implement this recovery map, one can again restrict to gates in the lightcone of region AB and use standard tensor network simulation algorithms to generate a sample from the marginal distribution on A, conditioned on the (previously sampled) state of B. The time complexity for this step is again q^O(dl). After applying this and O(n/l²) similar recovery maps, we obtain a sample from a distribution D̃_S. By Lemma 8, the triangle inequality, and Definition 4, the error of this step is bounded as ‖D̃_S − D_S‖₁ ≤ O(n/l²) · √(δ(l)). Note that the fact that the errors caused by recovery maps acting on disjoint regions accumulate at most linearly has been referred to previously [BK19] as the "union property" for recovery maps.

The final step is very similar to the previous step. We now apply recovery maps, described by Figure 3(d), to fill in the "holes" of the subregion S and approximately obtain a sample from the full distribution D = D_Λ. By a similar analysis, we find that the error incurred in this step is again O(n/l²) · √(δ(l)), and therefore the procedure samples from a distribution D̃_Λ for which ‖D̃_Λ − D_Λ‖₁ ≤ O(n/l²) · √(δ(l)). The second paragraph of the theorem follows immediately by choosing a suitable l = Θ(log n).
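The classical recovery map has a simple exact special case: when D_ABC is exactly Markov (I(A:C|B) = 0), R_{B→AB} applied to the marginal on BC reproduces the full distribution. The sketch below (NumPy, hypothetical three-outcome alphabets) checks this.

```python
import numpy as np

rng = np.random.default_rng(4)

# an exactly Markov distribution A - B - C: patches interacting only through B
pb = rng.random(3); pb /= pb.sum()
pa_b = rng.random((3, 3)); pa_b /= pa_b.sum(axis=0)  # column b holds p(a|b)
pc_b = rng.random((3, 3)); pc_b /= pc_b.sum(axis=0)  # column b holds p(c|b)
p = np.einsum('ab,b,cb->abc', pa_b, pb, pc_b)        # p(a,b,c) = p(a|b)p(b)p(c|b)

# classical recovery map R_{B->AB}: extend the marginal on BC using p(a|b)
pbc = p.sum(axis=0)                                   # pbc[b, c]
recovered = np.einsum('ab,bc->abc', pa_b, pbc)
assert np.allclose(recovered, p)  # exact recovery since I(A:C|B) = 0
```

When the CMI is small but nonzero, the same map incurs the ℓ₁ error controlled by Lemma 8, which is the source of the per-patch error term in the proof above.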
A straightforward application of Markov's inequality implies that a polynomial-time algorithm for sampling with error 1/ poly(n) succeeds with high probability over a random circuit instance if the output distribution CMI is exponentially decaying in expectation. We formalize this as the following corollary.
Corollary 5. Let C be a random circuit distribution. Define C to be δ(l)-Markov if, for any tripartition of a subregion X of the lattice into subregions X = A ∪ B ∪ C such that dist(A, C) ≥ l, we have ⟨I(A : C|B)_D⟩ ≤ δ(l), where the angle brackets denote an average over circuit realizations and D is the associated classical output distribution. If d = O(1), q = O(1), and C is poly(n)e^−Ω(l)-Markov, then for any polynomials r(n) and s(n), Patching can run in time poly(n) and, with probability 1 − 1/s(n) over the random circuit realization, sample from a distribution that is 1/r(n)-close to the true output distribution in variational distance.
Thus, proving that some uniform worst-case-hard circuit family C is poly(n)e −Ω(l) -Markov provides another route to proving the part of Conjecture 1 about sampling with small total variation distance error. In Section 6, we will give analytical evidence that if C is a random circuit distribution of sufficiently low depth and small qudit dimension, then C is indeed poly(n)e −Ω(l) -Markov, and if the depth or qudit dimension becomes sufficiently large, then C is not poly(n)f (l)-Markov for any f (l) = o(1), supporting Conjecture 2, which states that our algorithms exhibit computational phase transitions.
Finally, we note that Patching can also be used to estimate specific output probabilities of a random circuit instance C with high probability if C is drawn from a distribution C that is poly(n)e −Ω(l) -Markov. This shows that the Markov condition could also be used to prove the second part of Conjecture 1 regarding computing output probabilities with small error. This is similar to how SEBD can also be used to compute output probabilities, as discussed in Section 2.1.2.
Corollary 6. Let C be drawn from a random circuit distribution C that is poly(n)e^−Ω(l)-Markov and invariant under application of a final layer of arbitrary single-qudit gates. Then, for any fixed string x ∈ [q]^n, Patching can run in time poly(n) and output an estimate of D(x) with additive error at most 1/(r(n) q^n), with probability 1 − 1/s(n) for any polynomials r(n) and s(n), where D is the output distribution associated with C.
Proof. With probability 1 − 1/poly(n) over the circuit instance C, Patching with some choice of length-scale l = Θ(log n) efficiently samples from a distribution D̃_C that is 1/poly(n)-close in variational distance to D_C, for any choice of polynomial. Hence, for an output string y chosen uniformly at random and a circuit C drawn from C, the expected additive error incurred in estimating D_C(y) is at most 1/(poly(n) q^n), provided l = c log n and c is a sufficiently large constant. By a nearly identical argument to that used in the proof of Corollary 3, due to the invariance of C under application of a final layer of single-qudit gates, the same bound holds for a fixed string x ∈ [q]^n, for any choice of polynomial. Finally, it is straightforward to see that an instance of Patching that samples from D̃ can also be used to exactly compute D̃(x) for any x. (To do this, the algorithm computes conditional probabilities via tensor network contractions as before, except instead of using these conditional probabilities to sample, it simply multiplies them together, similarly to how SEBD can be used to compute output probabilities.) Applying Markov's inequality completes the proof.

It follows from prior works that exactly sampling from the output distribution of the extended brickwork random circuit family for arbitrary circuit instances, or near-exactly computing a specific output probability with high probability, is classically hard under standard complexity-theoretic assumptions. We summarize these observations in the following lemma.

Lemma 9. An efficient classical algorithm that samples from the output distribution of an arbitrary instance of Brickwork(L, r(L), v(L)) would imply a collapse of PH to the third level, and an efficient classical algorithm that computes a fixed output probability of a random instance up to additive error 2^−Θ(n²) with high probability would imply the ability to efficiently compute a #P-hard function with high probability.

Proof. We first note that Brickwork(L, r(L), v(L)) supports universal MBQC, in the sense that a specific choice of gates can create a resource state that is universal for MBQC. This is an immediate consequence of the proof of universality of the "standard" brickwork architecture (corresponding to r = 1) proved in [BFK09].
Indeed, when using the extended brickwork architecture for MBQC, measurements on the long 1D stretches of length 2r may be chosen such that the effective state is simply teleported to the end of the stretch when computing from left to right (i.e., measurements may be chosen such that the long 1D segments simply amount to applications of identity gates on the effective state). Since a worst-case choice of gates creates a resource state for universal MBQC, an algorithm that can simulate an arbitrary circuit realization can be used to simulate arbitrary single-qubit measurements on a resource state universal for MBQC. Under post-selection, such an algorithm could therefore decide any language in PostBQP [RB01], and hence no efficient classical simulation exists unless the polynomial hierarchy collapses to the third level [BJS10].
Similarly, for some subsets of instances, it is #P-hard to compute the output probability of an arbitrary string: by choosing gates to create a resource state for universal MBQC, such a computation would allow one to compute output probabilities of universal polynomial-size quantum circuit families, which is known to be #P-hard. The result of [Mov19] then applies, which implies that if the gates are chosen Haar-randomly, efficiently computing the output probability of some fixed string with probability 1 − 1/poly(n) over the choice of instance, up to additive error bounded by 2^{−Θ(n²)}, implies the ability to efficiently compute a #P-hard function with high probability.
Our goal is to prove that SEBD can efficiently approximately simulate the extended brickwork architecture in the average case for choices of extension parameters for which the above hardness results apply. To this end, we first show a technical lemma which describes how measurements destroy entanglement in 1D shallow random circuits. In particular, given a 1D state generated by a depth-2 Haar-random circuit acting on qubits, after measuring some contiguous region of spins B, the expected entanglement entropy of the resulting post-measurement pure state across a cut going through B is exponentially small in the length of B. We defer the proof to Appendix B.
(We expect the result to remain true for general constant-depth random circuits in 1D acting on qudits, but we will only need the depth-2 case with qubits.) Lemma 11. Suppose a 1D random circuit C is applied to qubits {1, . . . , n}, consisting of a layer of 2-qubit Haar-random gates acting on pairs (i, i + 1) for odd i, followed by a layer of 2-qubit Haar-random gates acting on pairs (i, i + 1) for even i.
Suppose the qubits of a contiguous region B are then measured in the computational basis. Then the expected entanglement entropy of the post-measurement pure state across any cut going through B is O(c^{|B|}) for some universal constant c < 1, where the expectation is over measurement outcomes and choice of random circuit C.
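As a sanity check on the qualitative content of Lemma 11, the following self-contained numpy sketch (our own illustration, not the machinery of the proof; the system size n = 8 and the 20 instances per point are hypothetical parameters) builds small depth-2 Haar-random qubit circuits, measures a middle block B of varying length, and averages the post-measurement entanglement entropy across a cut through B over circuits and outcomes. The average should shrink rapidly as |B| grows.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def haar_gate():
    # Haar-random 4x4 unitary via QR decomposition of a Ginibre matrix
    z = (rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def apply_gate(psi, U, i, n):
    # apply a 2-qubit gate U on qubits (i, i+1) of an n-qubit statevector
    psi = psi.reshape(2**i, 4, 2**(n - i - 2))
    return np.einsum('ab,xby->xay', U, psi).reshape(-1)

def entanglement_entropy(mat):
    # von Neumann entropy (in bits) of the Schmidt spectrum of a bipartite state
    s = np.linalg.svd(mat, compute_uv=False)
    p = s**2
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))

def expected_postmeas_entropy(n, b, trials=20):
    """E[S] across a cut through a measured middle block B of b qubits,
    for depth-2 1D Haar-random circuits on n qubits."""
    start = (n - b) // 2
    total = 0.0
    for _ in range(trials):
        psi = np.zeros(2**n, dtype=complex)
        psi[0] = 1.0
        for i in range(0, n - 1, 2):   # layer 1: pairs (1,2), (3,4), ...
            psi = apply_gate(psi, haar_gate(), i, n)
        for i in range(1, n - 1, 2):   # layer 2: pairs (2,3), (4,5), ...
            psi = apply_gate(psi, haar_gate(), i, n)
        tens = psi.reshape((2,) * n)
        for bits in product((0, 1), repeat=b):   # enumerate outcomes on B
            idx = (slice(None),) * start + bits + (slice(None),) * (n - start - b)
            sub = tens[idx].reshape(2**start, 2**(n - start - b))
            prob = float(np.sum(np.abs(sub)**2))
            if prob > 1e-12:   # weight each outcome by its Born probability
                total += prob * entanglement_entropy(sub / np.sqrt(prob))
    return total / trials

avgs = [expected_postmeas_entropy(8, b) for b in (1, 2, 3, 4)]
print([round(a, 4) for a in avgs])   # should decrease with block length b
```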
We now outline the argument for why SEBD should be efficient for the extended brickwork architecture for sufficiently large extension parameters, before showing this more formally. During the evolution of SEBD as it sweeps from left to right across the lattice, it periodically encounters long stretches of length 2r in which no vertical gates are applied. We call these "1-local regions", since the gates applied in the corresponding effective 1D dynamics are 1-local when the algorithm is in such a region. Hence, in the effective 1D dynamics, no 2-qubit gates are applied and therefore the bond dimension of the associated MPS cannot increase during these stretches. It turns out that in 1-local regions, not only does the bond dimension needed to represent the state not increase, but it in fact rapidly decays in expectation. If r is sufficiently large, then the effective 1D state at the end of the 1-local region is very close to a product state with high probability, regardless of how entangled the state was before the region. Hence, when SEBD compresses the MPS describing the effective state at the end of the region, it may compress the bond dimension of the MPS to some fixed constant with very small incurred error. The two-qubit gates that are performed in between 1-local regions only increase the bond dimension by a constant factor. Hence, with high probability, SEBD can use an O(1) maximal bond dimension cutoff and simulate a random circuit with the extended brickwork architecture. More precisely, it turns out that the scaling r = Ω(log n) is sufficient to guarantee efficient simulation with this argument. We now give the argument in more detail before stating some implications as a corollary. Proof. Suppose the state stored by SEBD immediately before entering a 1-local region is ψ_A, defined on register A. After another O(r) iterations of SEBD, just before the end of the 1-local region, denote the new one-dimensional state stored by SEBD as ψ.
Note that ψ is a random state, depending on both the random choices of gates in the 1-local region and the random measurement outcomes. We now bound the expected entanglement entropy of ψ across an arbitrary cut.
To this end, we observe that the random final state ψ may be equivalently generated as follows. Instead of iterating SEBD as usual for O(r) iterations, we first introduce a contiguous block of qubits that lie in the 1-local region. In particular, for all rows, we introduce all qubits that lie in columns {i, i + 1, . . . , j}. Here, i is chosen to be the leftmost column such that the lightcone of column i does not contain qubits in register A. Similarly, j is chosen to be the rightmost column such that the lightcone of qubits in column j does not contain vertical gates. Note that |i − j| = Θ(r).
We next apply all gates in the lightcone of the qubits of columns {i, i + 1, . . . , j}, before measuring these qubits in the computational basis. Note that in this step, we are effectively performing a set of L one-dimensional depth-2 Haar-random circuits, and then measuring Θ(r) intermediate qubits for each of the L instances. For each instance, we are left with a (generically entangled) pure state between a "left" and "right" subsystem, as illustrated in Figure 5. Let L_i (R_i) denote the left (right) subsystem associated with row i, and let φ_i denote the associated post-measurement pure state on L_i R_i. By Lemma 11, it follows that the expected entanglement entropy for any 1D instance obeys E S(L_i)_{φ_i} ≤ 2^{−Θ(r)}, where the expectation is over the random circuit instance and measurement outcomes. The next step is to apply all gates in the lightcone of the qubits of registers A and L := ∪_i L_i before measuring these registers, inducing a (random) 1D post-measurement pure state on subsystem R := ∪_i R_i. It is straightforward to verify that the distribution of the random 1D pure state ψ_R obtained via this procedure is identical to that obtained from repeatedly iterating SEBD through column j. Indeed, the procedures are identical up to performing commuting gates and commuting measurements in different orders, which does not affect the measurement statistics or post-measurement states.
Our goal is now to bound the entanglement entropy S(R_1 R_2 . . . R_k)_ψ in expectation across an arbitrary cut of the post-measurement 1D state. Such a bound follows from the concavity of the von Neumann entropy. Let ρ_{R_1,...,R_k} denote the reduced density matrix on these subsystems before the measurements on A and L are performed. Let ρ^x_{R_1,...,R_k} denote the reduced density matrix on these subsystems after the measurements on A and L are performed and the outcome x is obtained; note that the final state ψ implicitly depends on x. Now, letting Pr[x] denote the probability of obtaining outcome x, we have the relation

Σ_x Pr[x] ρ^x_{R_1,...,R_k} = ρ_{R_1,...,R_k},

which holds because the measurement operators act trivially on R. Hence

E_x S(R_1 · · · R_k)_ψ = Σ_x Pr[x] S(R_1 · · · R_k)_{ρ^x} ≤ S(R_1 · · · R_k)_ρ = Σ_{i≤k} S(R_i)_{φ_i},

where the first equality follows by definition, the inequality follows from concavity of the von Neumann entropy together with the relation above, and the final equality uses the fact that ρ_{R_1,...,R_k} is a product state across the subsystems R_1, . . . , R_k. Hence, we see that for any fixed set of gates and for any outcomes of the measurements of qubits in columns i, i + 1, . . . , j, the expected entanglement entropy of the final 1D state ψ on R across any cut is bounded by the entropy across that cut before the measurements on subregions A and L. Taking the expectations of both sides of this result with respect to the random gates and measurement outcomes of the qubits in columns i, i + 1, . . . , j, we finally obtain

E S(R_1 · · · R_k)_ψ ≤ Σ_{i≤k} E S(R_i)_{φ_i} ≤ L · 2^{−Θ(r)},

where we used the fact that k < L and E S(R_i)_{φ_i} ≤ 2^{−Θ(r)}. We now use the fact that the largest eigenvalue λ_max(R_1 · · · R_k) of the reduced density matrix is lower bounded as λ_max(R_1 · · · R_k)_ψ ≥ 2^{−S(R_1···R_k)_ψ}; this follows from the fact that Shannon entropy upper bounds min-entropy. Using this inequality as well as Jensen's inequality, we have the bound

E [1 − λ_max(R_1 · · · R_k)_ψ] ≤ 1 − 2^{−E S(R_1···R_k)_ψ} ≤ (ln 2) · E S(R_1 · · · R_k)_ψ ≤ L · 2^{−Θ(r)}.

Therefore, if we truncate all but the largest Schmidt coefficient across the R_k : R_{k+1} cut, we incur an expected truncation error upper bounded by L · 2^{−Θ(r)}.
Hence, by Markov's inequality, we incur a truncation error upper bounded by L · 2^{−Θ(r)} with probability at least 1 − 2^{−Θ(r)}. Therefore, if we run SEBD using a per-bond truncation error of ε = L · 2^{−Θ(r)} and a maximum bond dimension cutoff of D = O(1), the failure probability will be upper bounded by Lv · 2^{−Θ(r)}; here we used the union bound to upper bound the probability that any of the O(Lv) bonds over the course of the algorithm becomes larger than the cutoff D. Hence, by Corollary 1, for at least a 1 − 2^{−Θ(r)} fraction of random circuit instances, SEBD can sample from the output distribution with variational distance error Lv · 2^{−Θ(r)} < n · 2^{−Θ(r)}. Similarly, by Corollary 3, for at least a 1 − 2^{−Θ(r)} fraction of circuit instances, SEBD can compute the probability of the all-zeros output string up to additive error n · 2^{−Θ(r)}/2^n.
Since the runtime of SEBD is O(nD 3 ) when acting on qubits as discussed previously, and D is chosen to be constant for the version of the algorithm used here, the runtime is O(n).
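The truncation step that underlies this runtime can be sketched as a single SVD split with a hard bond cap. The function below is a minimal illustration of this idea (the names `truncate_bond`, `D`, and `eps` are our own, not from the paper's implementation); it returns the discarded Schmidt weight that the error analysis above accumulates bond by bond.

```python
import numpy as np

def truncate_bond(theta, D, eps):
    """SVD-split a two-site block `theta`, keeping at most D Schmidt values
    and, when possible, discarding Schmidt weight at most eps.
    Returns (left factor, right factor, discarded Schmidt weight)."""
    u, s, vh = np.linalg.svd(theta, full_matrices=False)
    w = s**2 / np.sum(s**2)                 # normalized Schmidt spectrum
    c = np.cumsum(w)
    k = int(np.argmax(c >= 1 - eps)) + 1    # smallest rank meeting the target
    k = min(k, D)                           # hard bond-dimension cap
    err = float(1 - c[k - 1])
    return u[:, :k] * s[:k], vh[:k, :], err

rng = np.random.default_rng(1)
theta = rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16))
theta /= np.linalg.norm(theta)              # normalized two-site block
A, B, err = truncate_bond(theta, D=4, eps=1e-3)
print(A.shape[1], round(err, 4))            # bond capped at 4; err is the lost weight
```

For a generic (Ginibre) block the Schmidt spectrum is flat, so the bond cap D binds and the discarded weight exceeds eps; in the area-law regime argued for above, the spectrum decays fast enough that a constant D suffices.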
With an appropriate choice of r = Θ(log(L)), the above result implies that SEBD can perform the simulation with error 1/poly(n) for at least a 1 − 1/poly(n) fraction of instances. Similarly, choosing r to be a sufficiently large polynomial in L, SEBD can perform the simulation with error 2^{−n^{1−δ}} for a 1 − 2^{−n^{1−δ}} fraction of instances, for any constant δ > 0. We summarize these observations as the following corollary.

Numerical results
We implemented SEBD on two families of random circuits: one consisting of depth-3 random circuits defined on a "brickwork architecture" consisting of three layers of two-qubit Haar-random gates (Figure 4 with parameter r = 1), and the other being the random circuit family obtained by applying single-qubit Haar-random gates to all sites of a cluster state; we referred to this problem as CHR previously. Note that the former architecture has depth three (not including the measurement layer) and the latter has depth four, and both architectures support universal measurement-based quantum computation [BFK09], meaning they have the worst-case-hard property relevant for Conjecture 1. We did not implement Patching, due to its larger overhead.
Implementing SEBD on a standard laptop, we could simulate typical instances of the 409 × 409 brickwork model with truncation error 10^{−14} per bond with a runtime on the order of one minute per sample, and typical instances of the 34 × 34 CHR model with truncation error 10^{−10} per bond with a runtime on the order of five minutes per sample (these truncation error settings correspond to sampling errors of less than 0.01 in variational distance, as derived previously in Section 2.1). We in fact simulated instances of CHR with grid sizes as large as 50 × 50, although due to the significantly longer runtime for such instances we did not perform large numbers of trials for these cases.

Figure 6: Rényi half-chain entanglement entropies S_k versus sidelength L in the effective 1D dynamics for the CHR and brickwork models, after 80 (resp. 550) iterations. Each point represents the entanglement entropy averaged over 50 random circuit instances, and over the final 10 (resp. 50) iterations for the CHR (resp. brickwork) model. Dashed lines depict the half-chain entanglement entropy scaling of a maximally entangled state, which can be created with a "worst-case" choice of gates for both architectures. The maximal truncation error per bond was 10^{−10} for CHR and 10^{−14} for the brickwork model.

Figure 8: The toy model predicts that the blue curve asymptotes to a straight line with slope two in the right panel, illustrated by the dashed orange line, corresponding to scaling like λ_i ∼ 2^{−Θ(log²(i))}. The plot is qualitatively consistent with this prediction. The spectrum for the brickwork model decays too quickly to obtain useful statistics without going to much higher numerical precision.
In the case of the 409 × 409 brickwork model, we performed over 3000 trials (each consisting of generating a random circuit instance and generating a sample from its output distribution using a truncation error of 10^{−14}) and found no instances we could not simulate; with 95% confidence, we may therefore conclude that the probability of successfully simulating a random instance with this truncation error is over 0.999. Using the bound derived in Section 2.1, we can further conclude that, with 95% confidence, we can sample from the output distribution of at least a 0.9 fraction of 409 × 409 circuit instances with variational distance error at most 0.01. Intuitively, we expect the true simulable fraction to be much larger than this statistical guarantee, as it appears that the entanglement in the effective 1D dynamics only grows extensively for highly structured instances. Note that for both models, the runtime for a fixed truncation error was highly concentrated around the mean. We expect that substantially larger instances of both random circuit families could be quickly simulated with more computing power, although we previously argued in Section 1.3 that the 409 × 409 simulation of the brickwork architecture is already far beyond what could have been achieved by previous simulation methods that we are aware of.
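The 95% confidence statement can be reproduced with the standard one-sided binomial ("rule of three") bound; the short computation below is our own sketch of that arithmetic.

```python
def failure_upper_bound(trials, confidence=0.95):
    """One-sided exact binomial bound: if 0 failures are observed in `trials`
    independent runs, any per-trial failure probability p satisfying
    (1 - p)^trials >= 1 - confidence is still consistent with the data,
    so p <= 1 - (1 - confidence)**(1/trials) at the stated confidence level."""
    return 1 - (1 - confidence) ** (1.0 / trials)

p = failure_upper_bound(3000)
print(round(p, 6))   # about 0.000998, i.e. success probability > 0.999
```

Zero failures in 3000 trials thus bounds the per-instance failure probability below 0.001 at 95% confidence, matching the 0.999 success-probability claim in the text.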
The discrepancy between maximal lattice sizes achieved for the two architectures is a result of the fact that the two have very different effective 1D dynamics. In particular, the entanglement of the effective dynamics for the brickwork architecture saturates to a significantly smaller value than that of the cluster state architecture. And even more directly relevant for prospects of fast simulation, the typical spectrum of Schmidt values across some cut of the effective 1D dynamics for the brickwork architecture decays far more rapidly than that of the 1D dynamics for CHR. For this reason, the slower-decaying eigenvalue spectrum of CHR was significantly more costly for the runtime of the algorithm. (In fact, the eigenvalue spectrum of the brickwork model decayed sufficiently quickly that we were primarily limited not by the runtime of our algorithm, but by our numerical precision, which could in principle be increased.) But while the slower decay of the spectrum for the CHR model necessitated a longer runtime for a given sidelength, it allowed us to study the functional form of the spectrum and in particular compare against the predictions of the toy model of Section 2.3 as we discuss below.
While we were computationally limited to probing low-depth and small-size models, our numerical results point toward SEBD having an asymptotic running time for both models bounded by poly(n, 1/ε, 1/δ) in order to sample with variational distance ε or compute output probabilities with additive error ε/q n with probability 1−δ, suggesting that Conjecture 1 is true. Our numerical evidence for this is as follows.
1. We find that the effective 1D dynamics associated with these random circuit families appear to be in area-law phases, as displayed in Figure 6. That is, the entanglement does not grow extensively with the sidelength L, but rather saturates to some constant (which appears to be concentrated, as illustrated in Figure 7). We furthermore observe qualitatively identical behavior for some Rényi entropies S α with α < 1. It is known [Sch+08] that this latter condition is sufficient to imply that a 1D state may be efficiently described by an MPS, indicating that SEBD is efficient for these circuit families and that Conjecture 1 is true.
2. For further evidence of efficiency, we study the functional form of the entanglement spectra of the effective 1D dynamics. For the effective 1D dynamics corresponding to CHR, we observe superpolynomial decay of eigenvalues (i.e. squared Schmidt values) associated with some cut, displayed in Figure 8, indicating that choosing a maximal bond dimension of D = poly(1/ε) is more than sufficient to incur less than ε truncation error per bond. The observed spectrum tends toward a scaling which is consistent with the asymptotic scaling of λ_i ∼ 2^{−Θ(log²(i))} predicted by the toy model of Section 2.3 and consistent with our Conjecture 1'. Note that this actually suggests that the required bond dimension of SEBD may be even smaller than poly(1/ε), scaling like D = 2^{Θ(√log(1/ε))}.
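To see why a spectrum of the form λ_i ∼ 2^{−Θ(log²(i))} needs a bond dimension far below poly(1/ε), one can tabulate a toy spectrum with exactly that decay (the normalization below is our own hypothetical choice, not fit to data) and ask for the smallest rank whose discarded tail weight falls below ε:

```python
import numpy as np

# toy spectrum lambda_i ~ 2^{-log2(i)^2}, normalized to sum to 1
i = np.arange(1, 1_000_001, dtype=float)
lam = 2.0 ** (-np.log2(i) ** 2)
lam /= lam.sum()

def rank_for_tail(eps):
    # smallest D such that the discarded weight sum_{i > D} lambda_i < eps
    tail = 1.0 - np.cumsum(lam)
    return int(np.argmax(tail < eps)) + 1

for eps in (1e-2, 1e-4, 1e-8):
    # compare against the 2^{sqrt(log2(1/eps))} scale suggested in the text
    print(eps, rank_for_tail(eps), round(2 ** np.sqrt(np.log2(1 / eps)), 1))
```

Even at ε = 10^{−8} the required rank stays modest, in contrast to the 10^8 scale that a flat poly(1/ε) count would suggest.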
While these numerical results may be surprising given the prevalence of average-case hardness conjectures for quantum simulation, they are not surprising from the perspective of the recent works (discussed in previous sections) that find strong evidence for an entanglement phase transition from an area-law to volume-law phase for 1D unitary-and-measurement processes driven by measurement strengths. Since the effective dynamics of the 2D random shallow circuits we study are exactly such processes, our numerics simply point out that these systems are likely on the area-law side of the transition. (However, no formal universality theorems are known, so the various models of unitary-and-measurement circuits that have been studied are generally not known to be equivalent to each other.) In the case of the brickwork architecture, we are also able to provide independent analytical evidence (Section 6.4) that this is the case by showing the "quasi-entropy" S̃_2 for the 1D process is in the area-law phase. We leave the problem of numerically studying the precise relationship between circuit depth, qudit dimension, properties of the associated stat mech models (including "quasi-entropies") as discussed in subsequent sections, and the performance of SEBD for future work. In particular, simulations of larger depth and larger qudit local dimension could be used to provide numerical support for Conjecture 2, which claims that as these parameters are increased the circuit architectures eventually transition to a regime where our algorithms are no longer efficient.

Mapping random circuits to statistical mechanical models
The close relationship between general tensor networks and statistical mechanics has been explored in previous work (see, e.g. [RS18]). The relationship becomes especially useful when the tensors are drawn from random ensembles. For example, a mapping from random tensor networks to stat mech models was used to study holographic duality in [Hay+16;Vas+18]. The same idea was applied to tensor networks where the tensors are Haar-random unitary matrices, i.e. random quantum circuits, to study quantum chaos and quantify entanglement growth under random unitary dynamics [NVH18; Von+18; ZN19; Hun19]. Very recently, [BCA19] and [Jia+19] used this mapping to argue that the phase transition from area-law to volume-law scaling of entanglement entropy numerically observed in circuits consisting of Haar-random local gates mixed with projective measurements is related to the disorder-to-order phase transition in Ising-like statistical mechanical models.
We present the details of this mapping for unitary quantum circuits in Section 5.1 and then in Section 5.2, we show how it is applied to 1D circuits interspersed with weak measurements, as was done in [BCA19; Jia+19] (however, we analyze a different weak measurement). This provides background and context for our application of the mapping directly to 2D circuits, the relevant case for our algorithms, in Section 6.

Mapping procedure
Setup. Let our system consist of n qudits of local dimension q. The circuits we consider are specified by a sequence of pairs of qudits (indicating where unitary gates are applied) and single-qudit weak measurements; this sequence can be assembled into a quantum circuit diagram. The single-qudit measurements are each described by a set M of measurement operators along with a probability distribution µ over the set M. These sets are normalized such that tr(M†M) is constant for all M ∈ M and E_{M←µ}[M†M] = I_q, where I_q is the q × q identity matrix. Thus we have tr(M†M) = q for all M. The introduction of a probability measure over M in our notation, which was also used in [Jia+19], is not conventional, but it is equivalent to the standard formulation and will be important for later definitions.
When a measurement is performed, if the state of the system at the time of measurement is σ, the probability of measuring the outcome associated with operator M is µ(M) tr(MσM†) (Born rule for quantum measurements). For a fixed outcome M, the quantity tr(MσM†) is a function of σ that we refer to as the relative likelihood of obtaining the outcome M on the state σ, since it gives the ratio of the probability of obtaining outcome M in the state σ to the probability of obtaining outcome M in the maximally mixed state (1/q)I_q. After obtaining outcome M, the state is updated by the rule σ → MσM†/tr(MσM†). Thus a pure initial state remains pure throughout the evolution.
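This measurement rule is easy to exercise directly. The sketch below is our own minimal implementation (the function name `weak_measure` is hypothetical): it samples an outcome with probability µ(M) tr(MσM†) and applies the update σ → MσM†/tr(MσM†), using the projective set M = {√q |m⟩⟨m|} with uniform µ from this section as a test case.

```python
import numpy as np

rng = np.random.default_rng(2)

def weak_measure(sigma, ops, mu):
    """Sample an outcome of a weak measurement {(M, mu(M))} on density
    matrix sigma via the Born rule and return (index, updated state).
    Assumes E_{M<-mu}[M^dag M] = I, so the outcome probabilities sum to 1."""
    probs = np.array([mu[i] * np.trace(M @ sigma @ M.conj().T).real
                      for i, M in enumerate(ops)])
    k = rng.choice(len(ops), p=probs)
    M = ops[k]
    post = M @ sigma @ M.conj().T
    return k, post / np.trace(post).real

# projective qubit measurement in this normalization: M = sqrt(q)|m><m|, q = 2
q = 2
ops = [np.sqrt(q) * np.outer(e, e) for e in np.eye(q)]
mu = [1.0 / q] * q
plus = np.full((q, q), 0.5)                 # the state |+><+|
k, post = weak_measure(plus, ops, mu)
print(k, np.allclose(post, np.outer(np.eye(q)[k], np.eye(q)[k])))
```

On |+⟩⟨+| both outcomes occur with probability 1/2, and the post-measurement state is the corresponding basis projector, as the update rule requires.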
For notational convenience and without loss of generality, we will assume that for each u, the uth unitary is immediately followed by single-qudit measurements (M_u, µ_u) and (M′_u, µ′_u) on the qudits a_u, a′_u ∈ [n] that are acted upon by the unitary, respectively; in the case that no measurement is performed, we may simply take M_u to consist solely of the identity operator, and in the case that more than one measurement is performed, we may multiply together the sets of measurement operators and their corresponding probability distributions to form a single set describing the overall weak measurement.
Thus, the (non-normalized) output state ρ = |ψ⟩⟨ψ| of the circuit with l unitaries acting on the initial state |1 . . . 1⟩ can be expressed as

|ψ⟩ = M′_l M_l U_l · · · M′_1 M_1 U_1 |1 . . . 1⟩, (30)

where each unitary U_u is chosen from the Haar measure over unitaries acting on qudits a_u and a′_u, while M_u and M′_u are the measurement operators associated with the measurement outcomes obtained upon performing measurements on qudits a_u and a′_u, respectively, following application of unitary U_u.
Goal. The objective of the stat mech mapping is to learn something about the entanglement entropy for the output state ρ on some subset A of the qudits. The k-Rényi entanglement entropy for the state ρ on region A is defined as

S_k(A)_ρ = (1/(1 − k)) log( Z_{k,A} / Z_{k,∅} ), (31)

where

Z_{k,A} = tr(ρ_A^k), Z_{k,∅} = tr(ρ)^k, (32)

and ρ_A is the reduced density matrix of ρ on region A. The von Neumann entropy S(A)_ρ = −tr[ (ρ_A/tr(ρ)) log(ρ_A/tr(ρ)) ] represents the k → 1 limit.
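These definitions are easy to check numerically. The snippet below (our own illustration; function names are hypothetical) computes S_k(A)_ρ from Z_{k,A} and Z_{k,∅} for a non-normalized two-qudit state and confirms that k → 1 recovers the von Neumann entropy of ρ_A/tr(ρ).

```python
import numpy as np

rng = np.random.default_rng(3)

def partial_trace_B(rho, dA, dB):
    # reduced density matrix on A: trace out the dB-dimensional subsystem B
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

def S_k(rho, dA, dB, k):
    """k-Renyi entanglement entropy of a (possibly non-normalized) rho on A:
    S_k = log(Z_{k,A} / Z_{k,empty}) / (1 - k), with
    Z_{k,A} = tr(rho_A^k) and Z_{k,empty} = tr(rho)^k."""
    lam = np.linalg.eigvalsh(partial_trace_B(rho, dA, dB))
    lam = lam[lam > 1e-15]
    Z_kA = np.sum(lam ** k)
    Z_k0 = np.trace(rho).real ** k
    return float(np.log(Z_kA / Z_k0) / (1 - k))

# a random non-normalized two-qubit pure state (q = 2)
v = rng.normal(size=4) + 1j * rng.normal(size=4)
rho = np.outer(v, v.conj())

# von Neumann entropy of rho_A / tr(rho), for comparison with k -> 1
p = np.linalg.eigvalsh(partial_trace_B(rho, 2, 2))
p = p[p > 1e-15] / np.trace(rho).real
S_vn = float(-np.sum(p * np.log(p)))
print(round(S_k(rho, 2, 2, 1.0001), 6), round(S_vn, 6))  # nearly equal
```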
For the purposes of assessing the efficiency of our algorithms, we would like to be able to calculate the average value of the k-Rényi entropy over the random choice of the unitaries in the circuit and for measurement outcomes drawn randomly according to the Born rule. Mathematically, we let the notation E_U(Q) represent the expectation of a quantity Q when the unitaries of the circuit are drawn uniformly at random from the Haar measure and the measurement outcomes are drawn at random from the distributions over their respective sets of measurement operators:

E_U(Q) := ∫_{U(q²)} dU_1 · · · ∫_{U(q²)} dU_l E_{M_1←µ_1} E_{M′_1←µ′_1} · · · E_{M_l←µ_l} E_{M′_l←µ′_l} [Q]. (34)

Here U(q²) denotes integration over the Haar measure of the unitary group with dimension q². To take into account the Born rule, a certain choice of unitaries and measurement outcomes leading to the output state ρ, as in Eq. (30), should be weighted by tr(ρ), i.e. the product of the relative likelihoods of all the measurement outcomes. Thus, the relevant average k-Rényi entropy values are given by

E_U( tr(ρ) S_k(A)_ρ ). (35)
As we will see, the quantity naturally computed by the stat mech model is not S_k(A)_ρ, but rather the "quasi-entropy" given by

S̃_k(A) := (1/(1 − k)) log( E_U(Z_{k,A}) / E_U(Z_{k,∅}) ) = (F_{k,A} − F_{k,∅}) / (k − 1), (36)

where F_{k,∅/A} := −log E_U(Z_{k,∅/A}) will be associated with the "free energy" of the classical stat mech model that the circuit maps to. When clear, we abbreviate S̃_k(A) by S̃_k and S_k(A)_ρ by S_k. It is apparent that S_k is not equal to the quasi-entropy S̃_k: the definition of S_k weights circuit instances by tr(ρ) and takes the log before the expectation, while the definition of S̃_k weights by tr(ρ)^k and takes the log afterward. Indeed, it is possible for S̃_k to be smaller than some constant independent of L (area law), while S_k scales extensively with L (volume law) due to fluctuations of the random variable Z_{k,A} away from its average value toward 0. Importantly, though, S̃_k → S_k as k → 1; in this limit both quantities approach the expected observed von Neumann entropy S of the circuit output. This conclusion is justified by L'Hospital's rule and noting that Z_{k,A}/Z_{k,∅} → 1 as k → 1. We will see that the stat mech model can only be applied for integers k ≥ 2, so unfortunately it does not allow for direct access to S̃_k in this limit. Nonetheless, the fact that the quasi-entropy S̃_k is intimately related to the actual expectation of the von Neumann entropy lends some justification to the use of S̃_k as a proxy for S_k even when k ≠ 1. This is further justified by previous work studying random 1D circuits without measurements; in [Von+18], the growth rate of S̃_2 in random 1D circuits was calculated using the stat mech mapping (note that when there are no measurements, we have tr(ρ) = 1 and thus Z_{k,∅} = 1 for any k), and no significant difference was found with numerical calculations of S_2. Moreover, [ZN19] used the so-called replica trick to directly compute S_2 as a series in 1/q, where q is the qudit local dimension, and the leading term of this expansion agrees with S̃_2.
While the replica trick is not completely rigorous, this gives a clear indication that S̃_2 is a valid substitute for S_2 in the q → ∞ limit and a good approximation for finite q (when there are no measurements).
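The gap between S̃_k (average first, then take the log) and S_k (take the log first, then average) is visible already in a two-point toy ensemble of purities, which we use below purely as an illustration of the annealed-versus-quenched distinction, not as a model of any circuit family: rare low-purity instances keep the annealed quantity at order one while the typical entropy grows linearly.

```python
import numpy as np

def annealed_vs_quenched(n):
    """Toy ensemble of instance purities Z = Z_{2,A}/Z_{2,empty} = 2^{-S_2}:
    half the instances have S_2 = 0, half have S_2 = n."""
    Z = np.array([1.0, 2.0 ** (-n)])
    quasi = -np.log2(np.mean(Z))       # analogue of S~_2: log of the average
    typical = np.mean(-np.log2(Z))     # analogue of E[S_2]: average of the log
    return quasi, typical

for n in (10, 20, 40):
    print(n, *(round(x, 3) for x in annealed_vs_quenched(n)))
```

As n grows, the annealed value pins near 1 (an "area law") while the typical value grows as n/2 (a "volume law"), exactly the fluctuation mechanism described in the text.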
Mapping. We now describe the procedure for mapping a quantum circuit diagram to a classical statistical mechanical model, such that quantities E U (Z k,∅ ) and E U (Z k,A ) for integers k ≥ 2 are given by partition functions of the stat mech model. This follows work in [NVH18; Von+18; ZN19; Hun19; BCA19; Jia+19], although our presentation is for the most part self-contained. Here we present merely how to perform the mapping, leaving the details of its justification to Appendix C.
To define the stat mech model we must specify two ingredients: first, the nodes and edges that form the interaction graph on which the model lives, and second, the details of the interactions between nodes that share an edge. The graph, which is the same for all k, is formed from the circuit diagram as follows. First, we replace each Haar-random unitary (labeled by integer u) in the circuit diagram with a pair of nodes, which we refer to as the incoming node t_u and outgoing node s_u for that unitary, and we connect nodes t_u and s_u by an edge. Then, we add edges between the outgoing node s_{u_1} of unitary u_1 and the incoming node t_{u_2} of another unitary u_2 when u_2 acts immediately after u_1 on the same qudit (ignoring for now the presence of any measurements made in between). Finally, we introduce a single auxiliary node x_a for each qudit a ∈ [n] and add a single edge connecting x_a to the outgoing node s_u for unitary u if u is the final unitary of the circuit to act on qudit a. Thus, all of the outgoing nodes have degree equal to three. The incoming node for each unitary has degree three minus the number of qudits (0, 1, or 2) for which the unitary is the first in the circuit to act on that qudit. We provide an example of this mapping in Figure 9.

Figure 9: Example of stat mech mapping applied to a circuit diagram with 4 qudits and 5 Haar-random gates. Thick orange edges carry Weingarten weight. Black edges carry measurement-dependent weight.

Each node in the graph may now be viewed as a spin that takes on one of k! values, corresponding to an element of the symmetric group S_k. A spin configuration is given by an assignment (σ_u, τ_u) ∈ S_k × S_k for each pair of nodes (s_u, t_u), as well as an assignment χ_a ∈ S_k to each auxiliary node x_a. The main utility of the stat mech mapping is then given by the following equation, expressing the quantities E_U(Z_{k,∅/A}) as a sum over spin configurations on this graph:

E_U(Z_{k,∅/A}) = Σ_{{σ_u},{τ_u}} ∏_{edges ⟨st⟩} weight(⟨st⟩). (37)

This is a partition function: a weighted sum over spin configurations where the weight of each term is given by a product of factors that depend only on the spin values of a pair of nodes (s, t) connected by an edge, denoted ⟨st⟩. In this case, the sum runs only over the values σ_u and τ_u of the outgoing and incoming nodes; the values χ_a of the auxiliary nodes are fixed across all the terms and encode the boundary conditions that differ between E_U(Z_{k,∅}) and E_U(Z_{k,A}). We define the free energy to be the negative logarithm of this partition function (see Eq. (36)), mirroring the standard relationship F = −k_B T log(Z) between the free energy and the partition function from statistical mechanics, with k_B T set to 1. We now specify the details of the interaction by defining the weight function for different edges. There are only two different kinds of interactions. Edges ⟨s_u t_u⟩ between incoming and outgoing nodes of the same unitary have weight

weight(⟨s_u t_u⟩) = wg(τ_u σ_u^{−1}, q²), (38)

where wg(π, q²) is the Weingarten function. The Weingarten function arises from performing the integrals over the Haar measure in Eq. (34), and one formula for it is given in the appendix in Eq. (114). Note that there exist permutations π for which wg(π, q²) < 0, so the overall weight of a configuration can be negative and our stat mech model would only correspond to a physical model with complex-valued energy. Meanwhile, edges ⟨s_{u_1} t_{u_2}⟩ connecting nodes of successive unitaries u_1 and u_2 (resp.
edges ⟨s_u x_a⟩ connecting outgoing nodes to auxiliary nodes) have weight that depends both on the setting of the variables σ_{u_1} and τ_{u_2} (resp. σ_u and χ_a) as well as on the distribution µ over some set M of single-qudit measurement operators that act on the qudit between the application of unitaries u_1 and u_2 (resp. after unitary u). This weight takes the form of an expectation over M ← µ of a contraction of k copies of M†M against a permutation operator (Eqs. (39) and (40)), where W_π is the operator acting on a k-fold tensor product space that performs the permutation π of the registers. Later, in Section 6.3, we will be interested in expressing entropies of the classical output distribution of the circuit in terms of partition functions, and to handle this case we will update Eq. (40). Note that the quantity tr(X^{⊗k} W_π) is equal for all π with the same cycle structure, which corresponds to some partition λ = (λ_1, . . . , λ_r) of k, where Σ_i λ_i = k and λ_1 ≥ . . . ≥ λ_r > 0. Then we have

tr(X^{⊗k} W_π) = ∏_{i=1}^{r} tr(X^{λ_i}). (41)

This formula allows us to simplify the weight formulas (39) and (40) in a few special cases. If no measurement is made, then M = {I} and weight(⟨s_{u_1} t_{u_2}⟩) = q^{C(σ_{u_1} τ_{u_2}^{−1})}, where C(π) is the number of cycles r in the permutation π. On the other hand, if a projective measurement onto one of the q basis states is made, then M = {√q Π_m}_{m=1}^{q} and µ is the uniform distribution, where Π_m = |m⟩⟨m|. Since in this case tr((M†M)^w) = q^w for any power w and any M ∈ M, we have weight(⟨s_{u_1} t_{u_2}⟩) = q^{k−1} for any pair σ_{u_1}, τ_{u_2}.
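The cycle-structure identity and the no-measurement special case q^{C(π)} can be verified directly for k = 2, where W_e is the identity and W_{(12)} is the SWAP operator on (C^q)^{⊗2}. The check below is our own sketch; it verifies only these two facts and does not fix the overall normalization conventions of Eqs. (39) and (40).

```python
import numpy as np

rng = np.random.default_rng(4)
q = 3  # local dimension (hypothetical choice for the check)

# permutation operators W_pi for k = 2 on (C^q) tensor (C^q)
W_e = np.eye(q * q)
W_swap = np.zeros((q * q, q * q))
for a in range(q):
    for b in range(q):
        W_swap[a * q + b, b * q + a] = 1.0

# cycle-structure identity: tr(X^{(x)k} W_pi) = prod_i tr(X^{lambda_i})
m = rng.normal(size=(q, q)) + 1j * rng.normal(size=(q, q))
X = m @ m.conj().T                          # a generic PSD stand-in for M^dag M
assert np.isclose(np.trace(np.kron(X, X) @ W_e), np.trace(X) ** 2)    # lambda = (1,1)
assert np.isclose(np.trace(np.kron(X, X) @ W_swap), np.trace(X @ X))  # lambda = (2)

# no-measurement case M = {I}: weight = q^{C(pi)} (C = 2 for e, 1 for the swap)
I2 = np.kron(np.eye(q), np.eye(q))
assert np.isclose(np.trace(I2 @ W_e), q ** 2)
assert np.isclose(np.trace(I2 @ W_swap), q)
print("cycle-structure identity verified for q =", q)
```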
The final piece of this prescription is setting the value χ_a for each of the auxiliary nodes x_a at the end of the circuit, which can be seen as fixing the boundary conditions for the stat mech model. These nodes are fixed to the same value for each term in the sum, and the values depend on whether we are calculating E_U(Z_{k,∅}) or E_U(Z_{k,A}), and on whether the qudit a is in the region A. For E_U(Z_{k,∅}), the value χ_a is fixed to the identity permutation e for every a. Meanwhile, for E_U(Z_{k,A}), we "twist" the boundary conditions and change χ_a to the k-cycle (1 2 · · · k) for each qudit a ∈ A, keeping χ_a = e for a ∉ A. Special case of k = 2. When k = 2, the symmetric group S_k has only 2 elements, the identity e and the swap (12), so the quantities E_U(Z_{2,∅}) and E_U(Z_{2,A}) map to partition functions of Ising-like classical stat mech models where each node takes on one of two values. Furthermore, in the k = 2 case with no measurements, it was shown in [NVH18; Von+18] (see also [ZN19; Hun19]) that one can get rid of all negative terms in the partition function by decimating half of the nodes, i.e. explicitly performing the sum over the values of the incoming nodes {τ_u}_u in Eq. (37). This continues to be true even when there are measurements in between unitaries in the circuit. However, the decimation causes the two-body interactions to become three-body interactions between any three nodes s_{u_1}, s_{u_2}, s_{u_3} when unitary u_3 succeeds unitaries u_1 and u_2 and shares a qudit with each.
The absence of negative weights for k = 2 is convenient because it allows one to view the system as a classical spin model at a real temperature, which can therefore be analyzed with well-studied numerical techniques like Monte Carlo sampling.

Mapping applied to 1D circuits with weak measurements
In Section 2.2, we discussed the connection between the effective 1D dynamics of our SEBD algorithm and previous work [LCF18; Cha+18; SRN19; LCF19; SRS19; Cho+19; GH19a; GH19b; Zab+19] on 1D Haar-random circuits with some form of measurements in between each layer of unitaries.
In this subsection, we apply the stat mech mapping to the 1D model with weak measurements and explain the connection between the area-law-to-volume-law transition that has been observed in numerical simulations and the disorder-to-order thermal transition in the classical stat mech model, which occurs at a non-zero critical temperature T_c. This analysis was first performed in [BCA19] and independently in [Jia+19]. The results presented in this section reproduce their analysis, but for a different weak measurement, chosen to be relevant for the dynamics of the SEBD algorithm acting on the CHR problem. We include this analysis for two purposes: first, to shed light on the behavior of SEBD acting on CHR, and second, to serve as a more complete example of the stat mech mapping in action, prior to our application of the mapping directly to 2D circuits in Section 6, where the analysis is somewhat more complicated.
Mapping to the honeycomb lattice. Let us assume our circuit has n qudits of local dimension q arranged on a line with open boundary conditions. A circuit of depth d acts on the qudits, where each layer consists of nearest-neighbor two-qudit Haar-random unitaries. In between each layer of unitaries, a weak measurement is performed on every qudit, described by the set M of measurement operators and a probability distribution µ over M. The first step of the stat mech mapping is to replace each Haar-random unitary with a pair of nodes and to connect these nodes according to the order of the unitaries acting on the qudits. The second step is to introduce a new auxiliary node for each qudit and connect each outgoing node within the final layer of unitaries to the corresponding pair of auxiliary nodes. The resulting graph is the honeycomb lattice, as shown in Figure 10(b). We now review the interactions on this graph. The horizontal links in Figure 10(b) host interactions that contribute a weight equal to the Weingarten function. When k = 2, the interaction depends only on whether the pair of nodes agree (σ_u τ_u^{-1} = e) or disagree (σ_u τ_u^{-1} = (12)). In this case the interactions are given explicitly by

Wg(e) = 1/(D^2 − 1),   Wg((12)) = −1/(D(D^2 − 1)),   D = q^2,

where D is the dimension of the space acted on by a two-qudit gate. Meanwhile, the diagonally oriented links in Figure 10(b) host interactions that depend on the details of the weak measurement being applied in between each layer of unitaries, which we now define.
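The k = 2 Weingarten function for U(D) takes the values Wg(e) = 1/(D^2 − 1) and Wg((12)) = −1/(D(D^2 − 1)). As a sanity check, these values can be verified against the Gram orthogonality relation that defines the Weingarten function, Σ_τ Wg(στ^{-1}) D^{C(τπ^{-1})} = δ_{σπ}. A minimal sketch (the function names are ours), using exact rational arithmetic:

```python
from fractions import Fraction

def wg(is_identity, D):
    """k = 2 Weingarten function for U(D):
    Wg(e) = 1/(D^2 - 1), Wg((12)) = -1/(D (D^2 - 1))."""
    D = Fraction(D)
    return 1 / (D ** 2 - 1) if is_identity else -1 / (D * (D ** 2 - 1))

def n_cycles(is_identity):
    """Number of cycles of a permutation in S_2: C(e) = 2, C((12)) = 1."""
    return 2 if is_identity else 1

def gram(sigma_id, pi_id, D):
    """sum over tau in S_2 of Wg(sigma tau^{-1}) * D^{C(tau pi^{-1})},
    which should equal delta_{sigma, pi} by the definition of Wg."""
    total = Fraction(0)
    for tau_id in (True, False):
        # in S_2, sigma tau^{-1} is the identity iff sigma == tau
        total += wg(sigma_id == tau_id, D) * Fraction(D) ** n_cycles(tau_id == pi_id)
    return total

# Exact check for several dimensions D = q^2 of a two-qudit gate
for D in (4, 9, 16):
    for sigma_id in (True, False):
        for pi_id in (True, False):
            assert gram(sigma_id, pi_id, D) == (1 if sigma_id == pi_id else 0)
```

Using `Fraction` makes the check exact rather than floating-point.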
Weak measurement and diagonal weights. The weak measurement we choose is given as follows. For a fixed q × q unitary matrix U, define M_U^{(m)} to be the q × q matrix whose diagonal entries are given by the mth row of U, scaled by a factor of √q, and whose off-diagonal entries are 0. Define the probability distribution µ_U to be the uniform distribution over M_U := {M_U^{(m)}}_{m=1}^{q}. Note that when U is the identity, each M_U^{(m)} is a projector onto the mth basis state (scaled by a factor of √q), and the weak measurement is simply a projective measurement onto the computational basis. The weak measurement that we consider here will be a mixture of the weak measurements (M_U, µ_U) for different U. Formally, we take M = ∪_{U∈U(q)} M_U. We let the distribution µ over M be the distribution resulting from drawing U according to the Haar measure, and then drawing M from M_U uniformly at random. This weak measurement exactly reproduces the weak measurement of SEBD acting on CHR in Algorithm 4 when q = 2, where the measurement operators were the diagonal matrices with angles (θ, φ) drawn according to the Haar measure on the sphere. Indeed, even for q > 2, this weak measurement arises from a natural generalization of the CHR problem, in which one makes Haar-random measurements on a cluster state of higher local dimension, created by applying a generalized Hadamard gate to each qudit followed by a generalized CZ gate on each pair of neighboring qudits on the 2D lattice.
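The operators M_U^{(m)} form a properly normalized weak measurement: drawing m uniformly, the average of (M_U^{(m)})†M_U^{(m)} is the identity, because the columns of a unitary matrix are unit vectors. A short sketch checking this numerically (our own illustration; the QR-based unitary sampling is a standard trick, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
q = 3

# A Haar-distributed random unitary via QR of a complex Gaussian matrix,
# with the diagonal phases of R fixed to make the distribution Haar.
Z = rng.normal(size=(q, q)) + 1j * rng.normal(size=(q, q))
Q, R = np.linalg.qr(Z)
U = Q @ np.diag(np.diag(R) / np.abs(np.diag(R)))

# M_U^{(m)}: sqrt(q) times the diagonal matrix built from the m-th row of U
Ms = [np.sqrt(q) * np.diag(U[m, :]) for m in range(q)]

# Uniform m gives E_m[M^dag M] = I: columns of U have unit norm
avg = sum(M.conj().T @ M for M in Ms) / q
assert np.allclose(avg, np.eye(q))

# When U = I, each M_U^{(m)} / sqrt(q) is the projector onto |m>
P = np.diag(np.eye(q)[1])
assert np.allclose(P @ P, P)
```

The same check goes through for any q, since it only uses unitarity of U.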
To compute the weights on the edges of the stat mech model for k = 2, we apply the formulas in Eqs. (39) and (40).
where in the second line we have invoked the Haar integration formula that appears in Eq. (113) of Appendix C, and then substituted the explicit values of the Weingarten function for k = 2. The formula for weight(⟨s_u x_a⟩) is obtained similarly. We can see that for all q > 1, the weight is larger when the values of the nodes agree than when they disagree, indicating a ferromagnetic Ising interaction. Indeed, the interaction for k = 2 will be ferromagnetic regardless of which weak measurement M is made, since (tr(M†M))^2 ≥ tr((M†M)^2) holds for all M. Furthermore, for our choice of weak measurement, the ferromagnetic Ising interaction becomes stronger as q increases.
Eliminating negative weights via decimation when k = 2. The possibility of a negative weight on the horizontal edges of the honeycomb lattice in Figure 10(b) appears to impede further progress in the analysis, since the classical model cannot then be viewed as a physical system with real interaction energies at a real temperature. For k = 2, this problem may be circumvented by decimating half of the spins; that is, we explicitly perform the sum over {τ_u}_u in the partition function in Eq. (37), yielding a new stat mech model involving only the outgoing nodes s_u. Since the decimated incoming nodes (except for those in the first layer) each have three neighbors, all of which are undecimated outgoing nodes, the new model has a three-body interaction among each such trio of nodes. We may furthermore observe that, for our choice of weak measurement when k = 2, the three-body weight may be re-expressed as the product of three two-body weights acting on the three edges of the triangle. Below we give formulas for the two-body weights; this decomposition of the three-body interaction is unique up to shifting overall constant factors from one link to another. Thus, via decimation we have moved from the honeycomb lattice with two-body interactions to the triangular lattice with two-body interactions, as illustrated in Figure 10(c). There are two kinds of two-body interactions on this triangular lattice: vertically oriented links between nodes s_{u1} and s_{u2} host anti-ferromagnetic interactions, and diagonally oriented links host ferromagnetic interactions. For all values of the measurement strength p, the ferromagnetic interactions are stronger than the anti-ferromagnetic interaction.
Phase diagram. The model described above for k = 2 is exactly the anisotropic Ising model on the triangular lattice. In general this model may be described by its energy functional

E({g}) = −J_1 Σ_{⟨ij⟩_1} g_i g_j − J_2 Σ_{⟨ij⟩_2} g_i g_j − J_3 Σ_{⟨ij⟩_3} g_i g_j,

where g_i ∈ {+1, −1} are Ising spin variables and the three sums run over links along each of the three triangular axes. This model has been studied and its phase diagram is well understood [Hou50; Ste70]. In the setting where the interaction strength along two of the axes is equal in magnitude and ferromagnetic, while along the third axis it is weaker in magnitude and antiferromagnetic, the model is known to experience a phase transition as the temperature is varied. At high temperatures, it is in the disordered phase; in other words, samples drawn from the thermal distribution exhibit exponentially decaying correlations between spin values, with a constant correlation length ξ. At low temperatures, it is in an ordered phase where samples exhibit long-range correlations. At the critical point, the interaction strengths satisfy the equation [Hou50; Ste70]

sinh(2J_1) sinh(2J_2) + sinh(2J_2) sinh(2J_3) + sinh(2J_1) sinh(2J_3) = 1.
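The criticality condition is straightforward to work with numerically. The sketch below is our own illustration (helper names are ours; the example couplings are placeholders, not the ones derived from Eqs. (52) and (53)). It checks the condition against the known square-lattice limit J_3 = 0, where it reduces to Onsager's sinh(2J_c)^2 = 1, and bisects for the critical overall scale of a given coupling triple:

```python
import math

def houtappel(J1, J2, J3):
    """Left-hand side minus 1 of the criticality condition for the
    anisotropic triangular-lattice Ising model [Hou50; Ste70]."""
    s = lambda J: math.sinh(2 * J)
    return s(J1) * s(J2) + s(J2) * s(J3) + s(J1) * s(J3) - 1.0

# Sanity check: with J3 = 0 the condition reduces to the square-lattice
# Ising model, whose critical coupling is Onsager's J_c = ln(1 + sqrt(2)) / 2.
J_onsager = 0.5 * math.log(1 + math.sqrt(2))
assert abs(houtappel(J_onsager, J_onsager, 0.0)) < 1e-12

def critical_scale(J1, J2, J3, lo=1e-6, hi=50.0, iters=200):
    """Bisect for a factor b such that (b*J1, b*J2, b*J3) is critical.
    Assumes the condition is negative at lo and positive at hi, which
    holds when the ferromagnetic couplings dominate."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if houtappel(mid * J1, mid * J2, mid * J3) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: two equal ferromagnetic axes, one weaker antiferromagnetic axis
b = critical_scale(1.0, 1.0, -0.25)
```

With the actual couplings J_i(q) from Eqs. (52) and (53) substituted in, the same root-finding approach locates the critical q_c quoted below.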
For us, the parameter q plays the role of the temperature, with the interaction strengths J_1, J_2, J_3 derived from Eqs. (52) and (53). Using these expressions, we can solve for the critical point, and we find it to be q_c = 3.249. Only integer values of q correspond to valid quantum circuits, so we conclude that the model is disordered when q = 2 or q = 3 and ordered when q ≥ 4. We plot this one-dimensional phase diagram in Figure 11.

Connection between (dis)order and scaling of entanglement entropy. We expect the scaling of the quantity S̃_2 = F_{2,A} − F_{2,∅} = −log( E_U(Z_{2,A}) / E_U(Z_{2,∅}) ) to be related to the order or disorder of the model by the following argument. For E_U(Z_{2,∅}), the auxiliary spins are all set to χ_a = e, biasing the nearby bulk spins to prefer e over (12). For E_U(Z_{2,A}), the spins within the region A are twisted so that χ_a = (12), introducing a domain wall at the boundary. In the ordered phase, the bias introduced at the boundary extends throughout the whole bulk since there is no decay of correlations with distance. The domain wall at the boundary in the calculation of E_U(Z_{2,A}) forces the bulk to separate into two regions of distinct phase, separated by a domain wall that cuts through the bulk. The domain wall has length of order min(|A|, d), where |A| is the number of sites in region A and d is the depth. In the calculation of E_U(Z_{2,∅}), there is no domain wall. Each additional unit of domain wall within a configuration decreases the weight of the configuration by a constant factor, so in the ordered phase we expect −log( E_U(Z_{2,A}) / E_U(Z_{2,∅}) ) = O(min(|A|, d)). Meanwhile, in the disordered phase, there is a natural length scale ξ to which boundary effects penetrate into the bulk. The domain wall at the boundary due to the twisted boundary conditions will be washed out by the bulk disorder after a distance on the order of ξ = O(1). Thus we expect −log( E_U(Z_{2,A}) / E_U(Z_{2,∅}) ) = O(1).
A cartoon illustrating this logic appears later in Figure 13, where we consider the stat mech mapping applied to 2D circuits. For further discussion of the connection between order-disorder properties of the stat mech model and entropic properties of the underlying quantum objects, see [Vas+18; BCA19; Jia+19]. This logic suggests that, if we take the scaling of S̃_2 to be a good proxy for the scaling of S_2, the disorder-to-order phase transition in the classical model would be accompanied by an area-law-to-volume-law phase transition in the Rényi-2 entropy of the output of random circuits.
Relationship to numerical simulation of SEBD on CHR. In Section 2.2, with fixed q = 2, it was established that the effective dynamics of SEBD running on CHR consist of alternating layers of entangling two-qubit CZ gates and weak measurements on every qubit of a 1D line, where the form of the weak measurement is given explicitly. The dynamics we have studied in this section use the same weak measurement, but take the two-qubit entangling gates to be Haar-random. We have established that the quasi-2-entropy S̃_2 satisfies an area law for this process when q = 2, and that the statement remains true for q = 3 when the weak measurement corresponds to a natural generalization of the CHR problem to larger local dimension. For q ≥ 4, it is no longer true: S̃_2 obeys a volume law.
Due to the similarity between the dynamics studied in this section and those of SEBD running on CHR, our conclusion provides a partial explanation for the numerical observation presented in Section 4 that the average entanglement entropy S_k satisfies an area law when SEBD runs on CHR for q = 2 and various values of k.
Additional observations appearing in previous work. The above analysis is essentially a restatement of what appears in recent works by Bao, Choi, and Altman [BCA19] and separately Jian, You, Vasseur, and Ludwig [Jia+19], except that here we analyzed a different weak measurement. In particular, [BCA19] considered the case where a projective measurement occurs with some probability p on each qudit after each layer of unitaries, and otherwise there is no measurement. They made the observation, described above, that the k = 2 mapping can be written as a two-body anisotropic Ising model on the triangular lattice with an exact solution. Both of these papers went beyond what we have presented here to analyze the q → ∞ limit directly, where they observed that the stat mech model becomes a standard ferromagnetic Potts model on the square lattice for all integers k. For k = 2 this is exactly the square-lattice Ising model, and indeed, we can see from Eq. (57) that when q → ∞, J_3 → 0; the anti-ferromagnetic links along one axis vanish, leaving a square lattice with exclusively ferromagnetic interactions. The fact that the model becomes tractable for all integers k ≥ 2 allows these papers to invoke analytic continuation and make sense of the k → 1 limit, where the quasi-entropy S̃_k exactly becomes the expected von Neumann entropy S.

Evidence for efficiency of algorithms from statistical mechanics
The efficiency of both algorithms hinges on the behavior of certain entropic quantities of the quantum state produced by the quantum circuit. In the SEBD algorithm, the bond dimension of the 1D MPS is truncated at each step of the MPS evolution, introducing small error but keeping the runtime of the algorithm efficient (i.e. polynomial in n) so long as the entanglement entropy across any cut of the MPS obeys an area law. In the Patching algorithm, the classical conditional mutual information I(A : C|B) of the joint distribution of measurement outcomes must decay exponentially in the size of region B, where A, B, and C are regions of the lattice with B separating A from C. In this section, we examine the behavior of these entropic quantities for typical circuit instances by employing the mapping introduced in Section 5.
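For concreteness, the classical conditional mutual information used here can be computed from any explicit joint distribution via I(A : C|B) = H(AB) + H(BC) − H(B) − H(ABC). A minimal sketch (the helper names are ours, not from the paper):

```python
import math

def shannon(p):
    """Shannon entropy (in bits) of a distribution given as {outcome: prob}."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)

def marginal(joint, keep):
    """Marginalize a joint distribution over tuple outcomes onto positions keep."""
    out = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out

def cmi(joint):
    """I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC) for outcomes (a, b, c)."""
    return (shannon(marginal(joint, (0, 1)))
            + shannon(marginal(joint, (1, 2)))
            - shannon(marginal(joint, (1,)))
            - shannon(joint))

# Markov chain A -> B -> C (B copies A, C copies B): I(A:C|B) = 0
markov = {(a, a, a): 0.5 for a in (0, 1)}

# A and C perfectly correlated, B independent and uniform: I(A:C|B) = 1 bit
corr = {(a, b, a): 0.25 for a in (0, 1) for b in (0, 1)}

assert abs(cmi(markov)) < 1e-12
assert abs(cmi(corr) - 1.0) < 1e-12
```

The first example has the (approximate) Markov structure that Patching exploits; the second shows how correlation between A and C that is not mediated by B makes the CMI large.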
While the stat mech mapping falls short of providing a fully rigorous justification of the efficiency of the algorithms, it is the most promising tool we are aware of for analytically understanding the behavior of the SEBD and Patching algorithms when they are run on circuits made from Haar-random gates. At the very least, the mapping provides important intuition for why these algorithms should be efficient in certain cases and not in others. Furthermore, it lays the groundwork for what could constitute a more rigorous justification. The main roadblock is the fact that the stat mech mapping gives an expression for the "quasi-entropy," which is only exactly equal to the average entanglement entropy in a limit that cannot be directly accessed by the mapping. From a practical perspective, the stat mech model could provide a way to numerically estimate some of these quasi-entropies, which could aid in predicting the runtime and error of the algorithms in regimes where this may not be easy to determine by running the algorithm itself.
Modulo the concern of quasi-entropy vs. entropy, the stat mech mapping predicts the following about the efficiency of the algorithms:
1. For 2D circuits with nearest-neighbor Haar-random gates of sufficiently small depth d and sufficiently small local dimension q, we expect both algorithms to be efficient in the system size. That is, they produce samples from a distribution within ε total variation distance of the ideal distribution on a fraction 1 − δ of circuit instances, and they have runtime poly(L_1, L_2, 1/ε, 1/δ) q^{poly(d)}. In particular, this is true for depth d = 3 and local dimension q = 2 for the uniform "brickwork" architecture, where exact simulation is known to be hard in the worst case under plausible complexity-theoretic assumptions.
2. For a fixed depth d, we expect the algorithms to become inefficient once the local dimension exceeds some critical value q c .
3. For a fixed local dimension q, we expect the algorithms to become inefficient once the depth exceeds some critical value d c .
The efficient-to-inefficient transition in computational complexity is related to a disorder-to-order phase transition in the classical model. Item 1 above should be regarded as evidence for Conjecture 1, while items 2 and 3 are evidence for Conjecture 2.

Mapping applied to general 2D circuits
We now apply the mapping described in Section 5.1 directly to a depth-d circuit acting on a √n × √n lattice of qudits, consisting of nearest-neighbor two-qudit Haar-random gates but without measurements in between the gates. This is the relevant case for the algorithms presented in this paper. In this section, we will assume for concreteness that the first unitary layer includes gates that act on qudits at gridpoints (i, j) and (i, j + 1) for all i and all odd j, the second layer on (i, j) and (i, j + 1) for all i and all even j, the third layer on (i, j) and (i + 1, j) for all odd i and all j, and the fourth layer on (i, j) and (i + 1, j) for all even i and all j. Subsequent layers then cycle through these four orientations.
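The four-orientation gate layout can be written down explicitly. In the sketch below (our own code, not the paper's; we read each parity condition as applying to the coordinate that changes within the pair, so that each layer forms a matching), qudits are gridpoints (i, j) with 1 ≤ i, j ≤ L:

```python
def brickwork_layers(L, depth):
    """Gate layout cycling through four orientations on an L x L grid.
    Returns a list of layers; each layer is a list of qudit pairs."""
    layers = []
    for t in range(depth):
        if t % 4 == 0:    # (i, j)-(i, j+1) for odd j
            gates = [((i, j), (i, j + 1)) for i in range(1, L + 1)
                     for j in range(1, L) if j % 2 == 1]
        elif t % 4 == 1:  # (i, j)-(i, j+1) for even j
            gates = [((i, j), (i, j + 1)) for i in range(1, L + 1)
                     for j in range(1, L) if j % 2 == 0]
        elif t % 4 == 2:  # (i, j)-(i+1, j) for odd i
            gates = [((i, j), (i + 1, j)) for i in range(1, L)
                     for j in range(1, L + 1) if i % 2 == 1]
        else:             # (i, j)-(i+1, j) for even i
            gates = [((i, j), (i + 1, j)) for i in range(1, L)
                     for j in range(1, L + 1) if i % 2 == 0]
        layers.append(gates)
    return layers

# Each layer must be a matching: no qudit is touched by two gates.
for layer in brickwork_layers(5, 8):
    touched = [qudit for gate in layer for qudit in gate]
    assert len(touched) == len(set(touched))
```

This is only a layout generator; in the stat mech mapping each pair in a layer would be replaced by the corresponding pair of incoming and outgoing nodes.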
The model resulting from the stat mech mapping. Replacing the unitaries in the circuit diagram with pairs of nodes and connecting them as described in Section 5.1 yields a graph embedded in three dimensions. The nodes in this graph still have degree three, so locally the graph looks similar to the honeycomb lattice, but globally the nodes form a 3D lattice that can be viewed roughly as a √n × √n × d slab, although the details of how these nodes connect are not straightforward to visualize. We include pictures of the graph in Figure 12.
As before, edges between nodes originating from the same unitary are assigned a weight equal to the Weingarten function. Since there is no measurement between unitaries, we may take the measurement operator applied on the edges that connect successive unitaries to be the identity, and we find that the weight contributed by these edges is weight(⟨s_{u1} t_{u2}⟩) = q^{C(σ_{u1} τ_{u2}^{-1})}. For k = 2 this amounts to a ferromagnetic Ising interaction, with weight q^2 when the two nodes agree and weight q when they disagree. To analyze the output state, we divide the n qudits into three groups A, B, and C. We suppose that after the d unitary layers have been performed, a projective measurement is performed on the qudits in region B. Qudits in regions A and C are left unmeasured, and we wish to calculate quantities like E_U(Z_{k,∅/A}). The mapping calls for us to introduce an auxiliary node for each qudit in the circuit. However, the projective measurement of qudits in region B effectively isolates the auxiliary nodes associated with those qudits; the edge connecting such a node to one of the bulk nodes carries a weight that is constant across all bulk configurations. Thus we may equivalently omit the introduction of auxiliary nodes in region B, as well as any edges that would be connected to them.
Eliminating negative weights via decimation when k = 2. The quantities E_U(Z_{k,∅/A}) are now given by classical partition functions on this graph with appropriate boundary conditions for the auxiliary nodes in regions A and C. We wish to understand whether this stat mech model is ordered or disordered. Again, we are faced with the issue that the Weingarten function can take negative values, and thus some configurations on this graph could have negative weight. As in the case of 1D circuits with weak measurements, we can partially rectify this by decimating all the incoming nodes. The resulting graph has half as many nodes and interactions between groups of three adjacent nodes. In the case of 1D circuits with weak measurements, when k = 2 we could decompose this three-body interaction into three two-body interactions. If we try to do the same here, we find that the two-body interactions would have infinite strength. Instead, we work directly with the three-body interaction between nodes s_{u1}, s_{u2}, and s_{u3}, where unitary u_3 acts after u_1 and u_2, for which the weights take a simple form. All the weights are non-negative. Moreover, the largest weight occurs when all three nodes agree, indicating a generally ferromagnetic interaction among the trio. If one of the bottom two nodes disagrees with the other two, the weight is reduced by a factor of q + 1/q. When the top node disagrees, the weight is 0; these configurations are forbidden and contribute nothing to the partition function.
Allowed domain wall configurations and disorder-order phase transitions. Using this observation, we can understand the kinds of domain wall structures that appear in configurations with non-zero weight. In this setting, domain wall structures are membrane-like, since the graph is embedded in 3D. Membranes with upward curvature, shaped like a bowl, are not allowed, because somewhere there would need to be an interaction where the upper node disagrees with the two below it. On the other hand, cylindrically shaped domain wall membranes do not have this issue, nor do dome-shaped membranes. The weight of a configuration is reduced by a factor of q + 1/q for each unit of domain wall, an effect that acts to minimize the domain wall size when drawing samples from the thermal distribution (energy minimization). On the other hand, larger domain walls have more configurational entropy (there are many ways to cut through the graph with a cylindrically shaped membrane), an effect that acts to bring out more domain walls in samples from the thermal distribution (entropy maximization). The question is which of these effects dominates: for a given setting of the depth d and local dimension q, is there long-range order, or is there exponential decay of correlations indicating disorder? Generally speaking, increasing the depth magnifies both effects: cylindrical domain wall membranes must be longer (meaning larger energy) when the depth is larger, but longer cylinders also have more ways of propagating through the graph. Meanwhile, increasing q magnifies only the energetic effect, since it increases the interaction strength and thus the energy cost of a domain wall unit, but leaves the configurational entropy unchanged.
Thus, in the limit of large q we expect the energetic effect to win out and the system to be ordered for any circuit depth d and any circuit architecture. What about small q? Physically speaking, q must be an integer at least 2, since it represents the local Hilbert space dimension of the qudit. However, the statistical mechanical model itself requires no such restriction, and we can allow q to vary continuously in the region [1, ∞). Then for q → 1, the energy cost of one unit of domain wall becomes minimal (but does not vanish). Depending on the exact circuit architecture and the depth of the circuit, the system may experience a phase transition into the disordered phase once q falls below some critical threshold q_c. The depth-3 circuit with brickwork architecture that we present later in Section 6.4 provides an example of such a transition: it is disordered when q = 2 and transitions to the ordered phase as q increases, at a point we estimate to be roughly q_c ≈ 6.
When q is fixed and d is varied, it is less clear what to expect. Suppose that for small d the system is disordered. Increasing d will amplify both the energetic and entropic effects, but likely not in equal proportions. If the amplification of the energetic effect is stronger with increasing depth, then we expect to transition from the disordered phase to the ordered phase at some critical value of the depth d_c. Without a better handle on the behavior of the stat mech model, we cannot definitively determine if and when this depth-driven phase transition will happen. However, we have other reasons to believe that there should be a depth-driven phase transition. In particular, we now provide an intuitive argument for why a disorder-order transition in the parameter q should imply a disorder-order transition in the parameter d. Consider fixed d, and another fixed integer r ≥ 1 with d/r ≥ 1. We may group together r × r patches of qudits to form a "supersite" with local dimension q^{r^2}. Similarly, we may consider a "superlayer" of O(r) consecutive unitary layers. Since O(r) layers suffice to implement an approximate unitary k-design on an r × r patch of qudits (taking k = O(1)) [HM18], we intuitively take each superlayer to implement a Haar-random unitary between pairs of neighboring supersites. Thus, a depth-d circuit acting on qudits of local dimension q is roughly equivalent to a depth-O(d/r) circuit acting on qudits of local dimension q^{r^2} in the supersite picture. If for fixed d we observe a disorder-order phase transition with increasing q, then for fixed q and fixed d/r we should also observe a disorder-order phase transition with increasing r; equivalently, we should see a transition for fixed q and increasing d. This logic is not perfect, because superlayers do not exactly map to layers of Haar-random two-qudit gates between neighboring supersites, but we nonetheless take it as reason to expect a depth-driven phase transition.

Efficiency of SEBD algorithm from stat mech
The efficiency of the SEBD algorithm relies on the error incurred during the MPS compression being small. If the inverse error has a polynomial (or better) relationship with the bond dimension of the truncation, then the algorithm's time complexity is polynomial (or better) in the inverse error and the number of qudits. This will be the case if the MPS prior to truncation satisfies an area law for the k-Rényi entropy for some 0 < k < 1. The stat mech mapping is unable to probe these values of k. However, we hypothesize that the behavior for larger values of k is indicative of the behavior for k < 1, since the known examples where the k-Rényi entropy with k ≥ 1 satisfies an area law but efficient MPS truncation is not possible require contrived spectra of Schmidt coefficients. Although some physical processes give rise to situations where the von Neumann and k-Rényi entropies with k > 1 exhibit different behavior (see e.g. [Hua19], which showed that for random 1D circuits without measurements, but with the unitaries chosen to commute with some conserved quantity, the entropy after time t is O(t) for k = 1 but O(√(t log t)) for k > 1), the numerical evidence we gave in Section 4, where the scaling of all the k-Rényi entropies appears to be the same, suggests that our case is not one of these situations.
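The connection between Rényi entropies of the Schmidt spectrum and MPS truncation error can be illustrated on a toy spectrum. The sketch below is our own illustration (not the paper's numerics): for a geometric Schmidt spectrum, the kind expected in an area-law phase, all Rényi entropies are O(1) and the discarded weight decays exponentially in the retained bond dimension χ.

```python
import math

def renyi(p, k):
    """k-Renyi entropy (nats) of a probability vector p; k = 1 is Shannon."""
    if k == 1:
        return -sum(x * math.log(x) for x in p if x > 0)
    return math.log(sum(x ** k for x in p)) / (1 - k)

def truncation_error(p, chi):
    """Probability weight discarded when keeping the chi largest
    Schmidt coefficients (squared), the quantity SEBD must keep small."""
    return sum(sorted(p, reverse=True)[chi:])

# Geometric Schmidt spectrum p_i proportional to r^i
r, n = 0.5, 40
p = [r ** i for i in range(n)]
Z = sum(p)
p = [x / Z for x in p]

errors = [truncation_error(p, chi) for chi in (2, 4, 8)]
assert errors[0] > errors[1] > errors[2]   # error falls quickly with chi
assert truncation_error(p, 8) < 1e-2
# A k < 1 Renyi entropy stays O(1) for this spectrum
assert renyi(p, 0.5) < 3.0
```

A volume-law spectrum would instead spread weight over exponentially many comparable Schmidt values, forcing χ (and hence the runtime) to grow exponentially for the same error.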
In Section 5.2 we discussed how, for 1D circuits with alternating unitary and weak-measurement dynamics, there has been substantial numerical evidence for a phase transition from an area-law phase to a volume-law phase as the parameters of the circuit are changed. We also reviewed previous work applying the stat mech model to this setting, and explicitly showed how there is a q-driven phase transition from an area-law to a volume-law phase for the quasi-entropy for a specific choice of weak measurement. The SEBD algorithm simulating a 2D circuit of constant depth made from Haar-random gates may be viewed as a system with very similar dynamics: an alternation between entanglement-creating unitary gates and entanglement-destroying weak measurements. However, none of the unitary-and-measurement models that have been previously studied capture the exact dynamics of SEBD, one reason being that SEBD tracks the evolution of several columns of qudits at once (recall that it must include all qudits within the lightcone of the first column). The Haar-random unitaries create entanglement within these columns of qudits, but not in the exact way that entanglement is created by Haar-random nearest-neighbor gates acting on a single column. Nonetheless, we expect the story to be the same for the dynamics of SEBD, since the main findings of studies of these unitary-and-measurement models have been quite robust to variations in which unitary ensembles and which measurements are implemented; we expect that varying parameters of the circuit architecture like q and d can lead to entanglement phase transitions, and thus to transitions in computational complexity.
Indeed, the discussion from the previous section suggests precisely this. When we apply the stat mech mapping directly to 2D circuits instead of to 1D unitary-and-measurement models, we expect disorder-order phase transitions as both q and d are varied. To make the connection to entanglement entropy explicit, note that after t steps of the SEBD algorithm, all qudits in the first t columns of the √n × √n lattice have been measured, and we have an MPS representation of the state on columns t + 1 through t + r, where r = O(d) is the radius of the lightcone (which depends on the circuit architecture, but cannot be larger than d). To calculate the entropy of the MPS, we take the region A to be the top half of these r columns, and the region C to be the bottom half. Region B consists of the first t columns, which experience projective measurements. The prescription for computing E_U(Z_{2,A})/E_U(Z_{2,∅}) calls for determining the free energy cost of twisting the boundary conditions in region A, which creates a domain wall along the A : C border. If the bulk is in the ordered phase, then this domain wall membrane will penetrate through the graph a distance of t, leading to a domain wall area of O(td). If the bulk is in the disordered phase, it will only penetrate a constant distance, on the order of the correlation length ξ of the disordered stat mech model, before being washed out by the disorder, leading to a domain wall area of only O(ξd). The typical domain wall configurations before and after twisting the boundary conditions in the ordered and disordered phases are reflected in the cartoon in Figure 13. As we discussed in Section 5.2, we expect a correspondence between the scaling of the domain wall size and the free energy cost of twisting the boundary conditions of the stat mech model. This implies that the quasi-entropy S̃_2 is in the area-law (resp. volume-law) phase when the classical stat mech model is in the disordered (resp. ordered) phase.
Heuristically, we might expect the runtime of the SEBD algorithm to scale like poly(n) · exp(O(S̃_2)), suggesting that the disorder-to-order transition is accompanied by an efficient-to-inefficient transition in the complexity of the SEBD algorithm. Furthermore, near the transition point within the volume-law phase, the quasi-entropy scales linearly with system size but with a small constant prefactor, suggesting that the SEBD runtime, though exponential, could be considerably better than that of previously known exponential-time techniques.

Efficiency of Patching algorithm from stat mech
We now study the predictions of the stat mech model for the fate of the Patching algorithm introduced in Section 2.4. To do so, we in turn study the predictions of the stat mech model for entropic properties of the classical output distribution, since Patching is efficient if the CMI of the classical output distribution decays exponentially in the width of the shielding region.
We have previously applied the stat mech model to study expected entropies of quantum states. However, we now wish to study expected entropies of the classical output distribution. To this end, we consider the non-unitary quantum circuit consisting of the original, unitary circuit followed by a layer of dephasing channels applied to every qudit. The resulting mixed state is classical (i.e., diagonal in the computational basis) and is exactly equal to the output distribution we want to study. That is, the state after application of the dephasing channels is $\sum_x D(x)\,|x\rangle\langle x|$, where $D$ is the output distribution of the circuit. The application of the dephasing channel is not described in the formalism we have discussed previously, but is easily incorporated. In particular, we need to compute the weights between the auxiliary node $x_a$ and the corresponding outgoing node $s_u$ associated with the unitary $u$ that is the last in the circuit to act on qudit $a$. Updating Eq. (40) (whose original form was derived in Appendix C), we see that $\mathrm{weight}(s_u \to x_a)$ in this setting is exactly equal to the number of $k$-tuples of indices $(i_1, \ldots, i_k)$ with $i_j \in [q]$ that are invariant under both permutation operators $\sigma_u, \chi_a \in S_k$, acting as $\sigma \cdot (i_1, \ldots, i_k) = (i_{\sigma(1)}, \ldots, i_{\sigma(k)})$. In fact, for our purposes, the auxiliary spin $\chi_a$ will either be set to the identity $e$ or to the $k$-cycle permutation $(1 \ldots k)$. In the former case, the weight reduces to $\mathrm{tr}\, W_{\sigma_u} = q^{C(\sigma_u)}$. In the latter case, since the only tuples invariant under the cycle permutation $(1 \ldots k)$ are the $q$ tuples of the form $(x, x, \ldots, x)$ for $x \in [q]$, the weight is simply $q$ for all $\sigma_u$. Summarizing,
$$\mathrm{weight}(s_u \to x_a) = \begin{cases} q^{C(\sigma_u)} & \text{if } \chi_a = e, \\ q & \text{if } \chi_a = (1 \ldots k). \end{cases}$$
From these expressions, we may immediately note the following facts. First, flipping some auxiliary spin from $e$ to $(1 \ldots k)$ cannot increase the weight of a configuration, and hence such a flip corresponds to an increase in free energy. Second, if an auxiliary spin is in the $(1 \ldots k)$ configuration, then it may effectively be removed from the system, since in this case its contribution to the weight is constant across all configurations of the remaining spins.
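These counting formulas are easy to check directly. The sketch below (our own illustration, not from the paper) enumerates $k$-tuples by brute force and confirms that the weight equals $q^{C(\sigma_u)}$ when $\chi_a = e$ and equals $q$ when $\chi_a$ is the full $k$-cycle:

```python
import itertools

def cycle_count(perm):
    # number of cycles C(perm) of a permutation given as a tuple of images of 0..k-1
    seen, count = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            count += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count

def weight(sigma, chi, q):
    # number of k-tuples (i_1, ..., i_k) in [q]^k invariant under both permutations,
    # each acting as sigma . (i_1, ..., i_k) = (i_sigma(1), ..., i_sigma(k))
    k = len(sigma)
    return sum(
        1
        for t in itertools.product(range(q), repeat=k)
        if all(t[sigma[j]] == t[j] for j in range(k))
        and all(t[chi[j]] == t[j] for j in range(k))
    )

k, q = 3, 2
e = tuple(range(k))                                # identity permutation
full_cycle = tuple((i + 1) % k for i in range(k))  # the k-cycle (1 2 ... k)
for sigma in itertools.permutations(range(k)):
    assert weight(sigma, e, q) == q ** cycle_count(sigma)  # reduces to tr W_sigma = q^C(sigma)
    assert weight(sigma, full_cycle, q) == q               # only the q constant tuples survive
print("auxiliary-spin weights verified for k = 3, q = 2")
```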
With these modified weights, we may now compute "quasi-entropies" $\tilde S_k(X)$ as before, where now in the $k \to 1$ limit $\tilde S_k(X)$ approaches the expected Shannon entropy of the marginal of the output distribution on subregion $X$, $\langle S(X) \rangle_D$, where the average is over random circuit instances.
In stat mech language, the quasi-CMI is essentially the difference between the free energy costs of twisting the boundary condition of subregion $A$ in the case where (1) no other spins have boundary conditions imposed, and the case where (2) subregion $C$ also has an imposed boundary condition. Now, consider some random circuit family $\mathcal{C}$ whose associated stat mech model is in the disordered phase for $k = 2$. For any subregion $X$ of qudits and any partition $X = A \cup B \cup C$, we expect this difference between free energy costs to decay exponentially with the separation between $A$ and $C$,
$$\tilde I_2(A : C | B) \le \mathrm{poly}(n, q)\, e^{-\mathrm{dist}(A, C)/\xi},$$
where $\xi$ is a correlation length. This is because in the disordered phase of the stat mech model, information about the boundary of region $C$ is exponentially attenuated as the distance from region $C$ grows. If we take $\tilde I_2(A : C|B)$ as a proxy for the average CMI of the output distribution, $\langle I(A : C|B) \rangle_D$, we conclude that the random circuit family $\mathcal{C}$ is $\mathrm{poly}(n, q)\, e^{-\Theta(l)}$-Markov as defined in Section 2.4. The results of that section then show that Patching can be used to efficiently sample from the output distribution and estimate output probabilities with high precision and high probability. We take this exponential decay of the quasi-2-CMI as evidence that the average CMI also decays exponentially, and therefore that Patching is successful.
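For concreteness, the quantity being bounded can be computed for small classical distributions directly from marginal Shannon entropies, via $I(A : C|B) = S(AB) + S(BC) - S(B) - S(ABC)$. A minimal sketch of ours (the example distributions are hypothetical, not from the paper):

```python
from collections import defaultdict
from math import log2

def shannon(marginal):
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def marginalize(dist, keep):
    # dist maps tuples (a, b, c) to probabilities; keep lists the retained positions
    out = defaultdict(float)
    for x, p in dist.items():
        out[tuple(x[i] for i in keep)] += p
    return out

def cmi(dist):
    # I(A : C | B) = S(AB) + S(BC) - S(B) - S(ABC)
    return (shannon(marginalize(dist, (0, 1))) + shannon(marginalize(dist, (1, 2)))
            - shannon(marginalize(dist, (1,))) - shannon(dist))

# A Markov chain A -> B -> C (each stage a bit flipped with probability 0.1)
# has I(A : C | B) = 0, up to floating-point error.
markov = {(a, b, c): 0.5 * (0.9 if b == a else 0.1) * (0.9 if c == b else 0.1)
          for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print("Markov chain:", cmi(markov))

# By contrast, B = A XOR C with A, C uniform and independent gives I(A : C | B) = 1 bit.
xor = {(a, a ^ c, c): 0.25 for a in (0, 1) for c in (0, 1)}
print("XOR distribution:", cmi(xor))
```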
Ordered stat mech model suggests Patching is unsuccessful. We first obtain exact, closed-form results in the zero-temperature limit of the stat mech model, which corresponds to the $q \to \infty$ limit; we expect qualitatively similar results to hold outside of this limit. As before, consider the stat mech model obtained by applying dephasing channels to all qudits after the application of all gates. Consider some connected, strict subset $A$ of the qudits on the original grid, and suppose we are interested in the quasi-entropy $\tilde S_k(A) = (F_{k,A} - F_{k,\emptyset})/(k - 1)$ of the output distribution on this region. This quantity is given by the free energy cost of twisting the boundary conditions (auxiliary spins) associated with region $A$ from $e$ to $(1 \ldots k)$; the auxiliary spins associated with qudits in the complement of $A$ are fixed in the identity permutation configuration $e$. For both sets of boundary conditions, all non-auxiliary spins order in the configuration $e$. This is because the configuration $e$ maximizes the weights in Eq. (62) for spins connected to auxiliary spins in the configuration $e$, while the weight of a spin connected to an auxiliary spin in the configuration $(1 \ldots k)$ is independent of that spin's configuration. Hence, regardless of the configuration of the auxiliary spins, all bulk spins are in the identity permutation configuration in the $q \to \infty$ limit of infinitely strong couplings.
Therefore, twisting a single auxiliary spin from $e$ to $(1 \ldots k)$ reduces the total weight by a factor of $q/q^{C(e)} = q/q^k = q^{1-k}$, corresponding to a free energy increase of $(k-1)\log(q)$. We therefore compute
$$\tilde S_k(A) = |A| \log(q).$$
Note that this result is exact in the $q \to \infty$ limit. Notably, all integer quasi-entropies are equal in this limit, so we may trivially perform the analytic continuation to the von Neumann (i.e. Shannon) entropy:
$$\langle S(A) \rangle_D = |A| \log(q).$$
Hence, in the $q \to \infty$ limit, the entropy of a strict subregion of the output distribution is maximal. Now, let $X$ denote the set of all qudits. We want to compute $\langle S(X) \rangle_D$, and we again proceed by computing the quasi-entropies $\tilde S_k(X)$. As before, for each auxiliary spin associated with region $X$ that we "twist", the weight of the configuration is decreased by a factor of $q^{1-k}$ relative to the configuration in which all auxiliary spins are set to $e$. However, in this case, as opposed to our previous calculation, all of the auxiliary spins are twisted. Recall from Eq. (62) that the weight between a twisted auxiliary spin and a bulk spin is independent of the value of the bulk spin. Hence, if all auxiliary spins are twisted, the lowest-energy state in the bulk is no longer just the configuration in which all spins take the value $e$: in the absence of a symmetry-breaking boundary condition, there is now a global spin-flip symmetry, and the ground space is $k!$-fold degenerate, consisting of all configurations in which all bulk spins are aligned. This symmetry contributes a factor of $k!$ to the partition function and $-\log(k!)$ to the free energy. We hence calculate
$$\tilde S_k(X) = n \log(q) - \frac{\log(k!)}{k - 1}.$$
We now perform the analytic continuation to the Shannon entropy:
$$\langle S(X) \rangle_D = n \log(q) - (1 - \gamma),$$
where $\gamma \approx 0.577$ denotes the Euler constant. The expected Shannon entropy of the output distribution is therefore $\frac{1-\gamma}{\ln(2)}$ bits less than maximal in the low-temperature limit, corresponding to $q \to \infty$.
From the above facts, we can immediately compute the expected CMI of the output distribution in this limit. Let $(A, B, C)$ be any partition of the qudits. We have
$$\langle I(A : C|B) \rangle_D = \langle S(AB) \rangle_D + \langle S(BC) \rangle_D - \langle S(B) \rangle_D - \langle S(ABC) \rangle_D = \frac{1-\gamma}{\ln(2)}\ \text{bits}.$$
We therefore find that in this limit, the expected CMI of the classical output distribution approaches a constant equal to $\frac{1-\gamma}{\ln(2)}$. While this result was derived for the completely ordered stat mech model, corresponding to $q \to \infty$, we expect similar behavior for ordered stat mech models in general. In particular, if $X$ denotes the set of all qudits, then for an ordered $k$th-order stat mech model, $\tilde S_k(X)$ will similarly receive an extra contribution from the global spin-flip symmetry, which is in turn contributed to the corresponding quasi-CMI $\tilde I_k(A : C|B)$. Hence, we do not expect the quasi-CMIs to decay when the corresponding stat mech model is in an ordered phase. We take this as evidence that the average CMI does not decay, and therefore that Patching is not successful in efficiently sampling from the output distribution with small error.
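The analytic continuation used here can be checked numerically: writing $\log(k!) = \log\Gamma(k+1)$, the entropy deficit $\log(k!)/(k-1)$ tends to $\psi(2) = 1 - \gamma \approx 0.4228$ nats (equivalently $(1-\gamma)/\ln 2 \approx 0.61$ bits) as $k \to 1$. A quick sketch of ours:

```python
from math import lgamma, log

EULER_GAMMA = 0.5772156649015329

def deficit(k):
    # log(k!)/(k - 1), continued to non-integer k via the gamma function
    return lgamma(k + 1) / (k - 1)

# the deficit decreases monotonically toward 1 - gamma as k -> 1
for k in (2.0, 1.5, 1.1, 1.001):
    print(k, deficit(k))
print("limit 1 - gamma (nats):", 1 - EULER_GAMMA)
print("in bits:", (1 - EULER_GAMMA) / log(2))
```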

Depth-3 2D circuits with brickwork architecture
In the previous sections we discussed the implications of the stat mech mapping for random 2D circuits of variable depth d. In this section we fix d = 3 and examine the order or disorder properties of the model. In particular, we choose uniform depth-3 circuits with the so-called "brickwork" structure, where three layers of two-qudit gates are performed on a 2D lattice of qudits as shown in Fig. 14(a). Note that this architecture was also introduced in Section 3; the architecture we consider here is exactly the "extended brickwork architecture" of that section with the extension parameter r fixed to be one.
This structure is known to be universal in the sense that one may simulate any quantum circuit using a brickwork circuit (with polynomial overhead in the number of qudits) by judiciously choosing which two-qudit gates to perform and performing adaptive measurements [BFK09]. Thus, it is hard to exactly sample or compute the output probabilities of brickwork circuits in the worst case assuming the polynomial hierarchy does not collapse, and we expect neither the SEBD algorithm nor the Patching algorithm to be efficient. However, we give evidence that these algorithms are efficient in the "average-case," where each two-qudit gate is Haar random, by considering the order/disorder properties of the stat mech model that the brickwork architecture maps to.
Stat mech mapping for general k. The stat mech mapping proceeds as previously discussed for 2D circuits, but we will see that the brickwork architecture allows us to make some important simplifications. Each gate in the circuit is replaced by a pair of nodes, which are connected with an edge. Then, the outgoing nodes of the first (purple) layer are connected to the incoming nodes of the second (orange) layer, and the outgoing nodes of the second (orange) layer are connected to the incoming nodes of the third (green) layer. The resulting graph is shown in Fig. 14(b). Edges connecting incoming and outgoing nodes of the same layer are shown in color (purple, orange, green) and carry weight equal to the Weingarten function. Edges connecting subsequent layers are black. No weak measurement is performed between layers, so we may take $M = \{I_q\}$ along each of the black edges. Thus, these edges carry weight given by $\mathrm{weight}(s_{u_1} \to t_{u_2}) = q^{C(\sigma_{u_1} \tau_{u_2}^{-1})}$. To perform the full mapping, we would also add a layer of auxiliary nodes to the graph and connect them to the third layer. However, we will suppose that most of the qudits undergo projective measurement after the third layer, and thus the auxiliary nodes may be omitted for those qudits. The auxiliary nodes will be important for any unmeasured qudits, but we assume these exist only at the edges of the graph. We do not need to consider auxiliary nodes to understand the bulk order/disorder properties of the model.
Looking at Fig. 14(b), we see that some of the nodes have degree 1 and connect to the rest of the graph via a (purple or green) Weingarten link. We can immediately decimate these nodes from the graph: for any $\tau$, the sum $\sum_{\sigma \in S_k} \mathrm{wg}(\sigma \tau^{-1}, q^2)$ is a constant independent of $\tau$ [Gu13], so decimating these spins merely contributes this constant to the total weight. This constant appears in both the numerator and denominator of quantities like $\mathbb{E}_U(Z_{k,A})/\mathbb{E}_U(Z_{k,\emptyset})$, and we ignore it henceforth. The remaining graph can be straightened out, yielding Fig. 15(a). The fact that Fig. 15(a) is a graph embedded in a plane that includes only two-body interactions is one upshot of studying the brickwork architecture, as it makes the analysis more straightforward and the stat mech model easier to visualize. This property, together with the fact that the brickwork architecture is universal for MBQC, is the primary reason we studied this architecture in the first place. Architectures with larger depth would lead to stat mech models that cannot be straightforwardly collapsed onto a single plane while maintaining the two-body nature of the interactions.
Figure 15: (a) The graph that results from decimating degree-1 nodes in Fig. 14(b). Each thin black link carries weight $q^{C(\sigma \tau^{-1})}$, while each thick orange link carries weight $\mathrm{wg}(\sigma \tau^{-1}, q^2)$. (b) The graph that results from decimating all degree-2 nodes of the graph in (a). For $k = 2$, both the horizontal orange and the vertical dark blue links are ferromagnetic, but with different strengths.
Simplifications when k = 2. As in previous examples, we examine the $k = 2$ case. Here we may as well decimate all the degree-2 nodes in the graph in Fig. 15(a), yielding a graph with entirely degree-3 nodes, as shown in Fig. 15(b). The graph has two kinds of links, both carrying standard Ising interactions. The vertical dark blue links have weights proportional to
$$\mathrm{weight}(s_u \to s_{u'}) \propto \begin{cases} q^4 + q^2 & \text{if } \sigma_u \sigma_{u'} = e, \\ 2q^3 & \text{if } \sigma_u \sigma_{u'} = (12), \end{cases}$$
while the horizontal orange links have weights given by
$$\mathrm{weight}(s_u \to s_{u'}) = \begin{cases} \frac{1}{q^2(q^4+1)^2}\left(q^6 + q^4 - 4q^3 + q^2 + 1\right) & \text{if } \sigma_u \sigma_{u'} = e, \\ \frac{1}{q^2(q^4+1)^2}\left(2q^5 - 2q^4 - 2q^2 + 2q\right) & \text{if } \sigma_u \sigma_{u'} = (12). \end{cases} \quad (79)$$
Both of these interactions are ferromagnetic and become stronger as $q$ increases. We may think of the model as the square-lattice Ising model in which $1/2$ of the links carry a ferromagnetic interaction of one strength, $1/4$ of the links carry a ferromagnetic interaction of another strength, and the final $1/4$ of the links have no interaction at all. The energy functional can be written
$$E = -J_{\mathrm{vert}} \sum_{\langle i,j \rangle\ \mathrm{vert}} s_i s_j - J_{\mathrm{horiz}} \sum_{\langle i,j \rangle\ \mathrm{horiz}} s_i s_j,$$
where the $s_i$ take values in $\{+1, -1\}$. For $q = 2$ we have $J_{\mathrm{vert}} = \log(5/4)/2 = 0.112$ and $J_{\mathrm{horiz}} = \log(53/28)/2 = 0.319$. Both of these values are weaker than the critical interaction strength of the square-lattice Ising model, $J_{\mathrm{square}} = \log(1 + \sqrt{2})/2 = 0.441$. This indicates that the graph generated by the stat mech mapping on 2D circuits of depth 3 with brickwork architecture is in the disordered phase when $q = 2$, and this remains true for $q = 3$. For $q = 4$, $J_{\mathrm{horiz}} = 0.500 > J_{\mathrm{square}}$, but $J_{\mathrm{vert}} = 0.377 < J_{\mathrm{square}}$. Recall that $1/4$ of the links can be thought of as having $J = 0$ since they are missing. Taking this into account, the value of $J$ averaged over all the links remains below $J_{\mathrm{square}}$ for $q = 5$, and slightly exceeds it for $q = 6$.
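The quoted couplings follow from the weight ratios, since an Ising link with weights $w(e)$ and $w((12))$ has $J = \frac{1}{2}\log\left(w(e)/w((12))\right)$. The sketch below reproduces the numbers in the text; the vertical-link weight ratio $(q^4+q^2)/(2q^3)$ is our inference from the quoted values of $J_{\mathrm{vert}}$, so treat it as an assumption:

```python
from math import log, sqrt

def J_vert(q):
    # assumed vertical-link weight ratio (q^4 + q^2) / (2 q^3)
    return 0.5 * log((q**4 + q**2) / (2 * q**3))

def J_horiz(q):
    # horizontal-link weight ratio from Eq. (79); the common prefactor cancels
    num = q**6 + q**4 - 4 * q**3 + q**2 + 1
    den = 2 * q**5 - 2 * q**4 - 2 * q**2 + 2 * q
    return 0.5 * log(num / den)

J_SQUARE = 0.5 * log(1 + sqrt(2))  # critical coupling of the square-lattice Ising model

for q in (2, 3, 4, 5, 6):
    # half the links are vertical, a quarter horizontal, and a quarter absent (J = 0)
    J_avg = (2 * J_vert(q) + J_horiz(q)) / 4
    print(q, round(J_vert(q), 3), round(J_horiz(q), 3), round(J_avg, 3), J_avg < J_SQUARE)
```

Running this reproduces the disordered-phase verdicts above, including the crossing of the link-averaged coupling between $q = 5$ and $q = 6$.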
This indicates that when we run SEBD on these uniform depth-3 circuits with Haar-random gates, $\tilde S_2 = O(1)$ (independent of the number of qudits $n$) when $q = 2$ or $q = 3$ (and probably also for $q = 4$ and $q = 5$). Moreover, it indicates that for a partition of the output state into regions $A$, $B$, and $C$, the quasi-2-CMI of the classical output distribution, $\tilde I_2(A : C|B)$, decays exponentially in $\mathrm{dist}(A, C)$. We take this as evidence that the SEBD and Patching algorithms would be efficient for these circuits.
We now explain why the polynomial interpolation argument of [Bou+19] is insufficient to show that the task performed by our algorithms (exactly computing output probabilities of, and sampling from, the output distribution of a constant-depth Haar-random circuit instance, with high probability) is classically hard, even though these circuits possess worst-case hardness. We first briefly review their technique before discussing a limitation in the robustness of the polynomial interpolation scheme. We then discuss how this robustness limitation makes the interpolation scheme inapplicable to our algorithms.
The main point is that our algorithms exploit unitarity (via the fact that gates outside of the lightcone of the qudits currently under consideration are ignored), but the hardness result of [Bou+19] holds with respect to circuit families that are non-unitary, albeit very close to unitary in some sense. Our algorithms are unable to simulate these slightly non-unitary circuits to the precision required for the worst-to-average case reduction, regardless of how well they can simulate Haar-random circuit families. While it is true that in this scheme there is an adjustable parameter K which, when increased, brings the non-unitary circuit family closer to approximating the true Haar-random family, increasing K also increases the degree of the interpolating polynomial. This makes the interpolation more sensitive to errors in such a way that, for any choice of K, the robustness that the interpolation can tolerate is not large enough to overcome the inherent errors that our algorithms make when trying to simulate these non-unitary families. The existence of simulation algorithms like SEBD and Patching, which exploit the unitarity of the circuit, may present an obstruction to applying worst-to-average-case reduction techniques that obtain a polynomial structure at the expense of unitarity. Note that, as discussed in the main text, a very recent alternative worst-to-average case reduction [Mov19] based on "Cayley paths" rather than truncated Taylor series does not suffer from this same limitation.

Background: truncated Haar-random circuit ensembles and polynomial interpolation
In this section, we give an overview (omitting some details) of the interpolation technique of [Bou+19] used to show their worst-to-average-case reduction, partially departing from their notation. Suppose $U$ is a unitary operator. We define its $\theta$-contracted and $K$-truncated version as
$$U(\theta, K) := U \sum_{j=0}^{K} \frac{\left(-i\theta(-i \ln U)\right)^j}{j!}.$$
Note that $U(\theta, \infty) = U e^{-i\theta(-i \ln U)}$ is simply $U$ pulled back by angle $\theta$ towards the identity operator $I$; in particular, $U(0, \infty) = U$ and $U(1, \infty) = I$. For $K < \infty$, the operator performing this pullback is approximated by a Taylor series truncated at order $K$, and $U(\theta, K)$ is (slightly) non-unitary.
Suppose $C$ is some circuit family for which computing output probabilities up to error $2^{-\mathrm{poly}(n)}$ is classically hard. Now, for each gate $G$ in $C$, multiply that gate by $H(\theta, K)$ with $H$ Haar-distributed and supported on the same qubits as $G$. This yields some distribution over non-unitary circuits that we call $\mathcal{D}(C, \theta, K)$. Note that if $\theta = 0$, $\mathcal{D}$ is exactly the Haar-random circuit distribution with the same architecture as $C$. When $\theta = 1$, the hard circuit $C$ is recovered up to some small correction due to the truncation. If $K$ is sufficiently large, we can assume that computing output probabilities for this slightly perturbed version of $C$ is also classically hard.
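The truncated gates can be constructed explicitly. The sketch below is our own illustration; it takes as an assumption the definition $U(\theta, K) = U \sum_{j \le K} (-i\theta H)^j/j!$ with $H = -i \ln U$, and checks that $\theta = 1$ with large $K$ pulls a random unitary back to the identity, while small $K$ leaves a slightly non-unitary operator:

```python
import numpy as np

def truncated_pullback(U, theta, K):
    # H = -i ln U, computed from the eigendecomposition of the unitary U
    w, V = np.linalg.eig(U)
    H = V @ np.diag(-1j * np.log(w)) @ np.linalg.inv(V)
    # U(theta, K) = U * sum_{j=0}^{K} (-i theta H)^j / j!
    term = np.eye(len(U), dtype=complex)
    series = term.copy()
    for j in range(1, K + 1):
        term = term @ (-1j * theta * H) / j
        series = series + term
    return U @ series

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
U, _ = np.linalg.qr(A)  # a random 2x2 unitary

# theta = 1 with large K pulls U essentially all the way back to the identity ...
assert np.allclose(truncated_pullback(U, 1.0, 40), np.eye(2), atol=1e-8)
# ... while a small K leaves the operator slightly non-unitary
W = truncated_pullback(U, 0.5, 3)
print("deviation from unitarity:", np.linalg.norm(W.conj().T @ W - np.eye(2)))
```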
Fix some circuit $A$ drawn from $\mathcal{H}(C)$, the distribution over circuits with the same architecture as $C$ with gates chosen according to the Haar measure. Let $A(C, \theta, K)$ denote the circuit obtained when the $\theta$-pulled-back and $K$-truncated gates of $A$ are multiplied with their corresponding gates in $C$; note that $A(C, \theta, K)$ is distributed as $\mathcal{D}(C, \theta, K)$. Define the quantity $p_0(A, \theta, K)$ to be the output probability of the all-zeros string for the circuit $A(C, \theta, K)$. Assuming the circuit $C$ has $m$ gates, it is easy to verify that $p_0(A, \theta, K)$ may be represented as a polynomial in $\theta$ of degree $2mK$. Note also that $p_0(A, 1, \infty) = p_0(C)$, which is assumed to be classically hard to compute. Now, assume that there exists some classical algorithm $\mathcal{A}$ and some $\epsilon = 1/\mathrm{poly}(n)$ such that, for some fixed $K \le \mathrm{poly}(n)$ and for all $0 \le \theta \le \epsilon$, $\mathcal{A}$ can compute $p_0(A, \theta, K)$ up to additive error $\delta \le 2^{-n^c}$ for some constant $c$, with probability $1 - 1/\mathrm{poly}(n)$ over $A(C, \theta, K) \sim \mathcal{D}(C, \theta, K)$. Then, $\mathcal{A}$ may evaluate $p_0(A, \theta, K)$ for $2mK + 1$ evenly spaced values of $\theta$ in the range $[0, \epsilon]$ (up to very small error), and construct an interpolating polynomial $q_0(A, \theta, K)$. By a result of Rakhmanov [Rak07], there is some interval $[a, b] \subset [0, \epsilon]$ such that $b - a \ge 1/\mathrm{poly}(n)$ and $|p_0(A, \theta, K) - q_0(A, \theta, K)| \le 2^{-n^{c'}}$ for $\theta \in [a, b]$, where $c'$ depends on $c$. One then invokes the following result of Paturi (Lemma 13): if a polynomial $p$ of degree $d$ satisfies $|p(\theta)| \le \delta$ on an interval of width $w$ contained in $[0, 1]$, then $|p(1)| \le \delta\, e^{O(d(1 + 1/w))}$. Applying this result, we find $|p_0(A, 1, K) - q_0(A, 1, K)| \le 2^{-n^{c'}} e^{\mathrm{poly}(n, m, K)}$. If $c$ is sufficiently large, then $|p_0(A, 1, K) - q_0(A, 1, K)| \le 2^{-\mathrm{poly}(n)}$, and the quantity $q_0(A, 1, K)$ is hard to compute classically. But this would be a contradiction, because $q_0(A, 1, K)$ can be efficiently evaluated classically by performing the interpolation.
Hence, this argument shows that for some choice of $K$ and a sufficiently large $c$ depending on $K$, computing output probabilities of circuits in the truncated families $\mathcal{D}(C, \theta, K)$ with $\theta \le 1/\mathrm{poly}(n)$ up to error $2^{-n^c}$ is hard (assuming standard hardness conjectures).

Limitation of the interpolation argument
The above argument shows that average-case simulation, to precision $2^{-\mathrm{poly}(n)}$, of a family $\mathcal{D}(C, \theta, K)$ of non-unitary circuits which is in some sense close to the corresponding Haar-random circuit family is classically hard, provided that simulating $C$ is classically hard and the polynomial in the exponent is sufficiently large.
We now explain why, based on this argument, we are unable to conclude that exactly computing output probabilities of Haar-random circuits is classically hard. In other words, suppose that with probability $1 - 1/\mathrm{poly}(n)$, some algorithm $\mathcal{A}$ can exactly compute output probabilities of circuits drawn from the distribution $\mathcal{H}(C)$. We argue that a straightforward application of the above result based on Taylor-series truncations and polynomial interpolation is insufficient to compute $p_0(C)$ with small error.
Consider some circuit realization $A$ drawn from $\mathcal{H}(C)$, and assume that we can exactly compute its output probability $p_0(A)$. To use the argument of [Bou+19], we actually need to compute $p_0(A, \theta, K)$ for some fixed value of $K$ and $\theta$ in some range $[0, \epsilon]$. We first find an upper bound on $\epsilon$ which must be satisfied for the interpolation to be guaranteed to succeed with high probability. To this end, we note [Bou+19] that the total variation distance between the distributions $\mathcal{D}(C, \theta, \infty)$ and $\mathcal{D}(C, 0, \infty)$ is bounded by $O(m\theta)$. Hence, if we try to use the algorithm $\mathcal{A}$ to estimate $p_0(A, \theta, \infty)$, the failure probability over random circuit instances could be as high as $O(m\theta)$. Therefore, since the $\theta$ values to be evaluated are uniformly spaced on the interval $[0, \epsilon]$, the union bound tells us that the probability that one of the $2mK + 1$ values $p_0(A, \theta, K)$ is erroneously evaluated is bounded by $O(m^2 K \epsilon)$. Hence, in order to ensure that all $2mK + 1$ points are correctly evaluated, we should take $\epsilon \le O(1/(m^2 K))$. Now, assume that we have chosen $\epsilon \le O(1/(m^2 K))$ and that all $2mK + 1$ points $p_0(A, \cdot, \infty)$ are correctly evaluated. Let $\theta$ be one of the evaluation points. We must now consider the error made by approximating the "probability" associated with the truncated version of the circuit by the probability associated with the untruncated version, namely $|p_0(A, \theta, \infty) - p_0(A, \theta, K)|$. This error associated with the truncated Taylor series is upper bounded by $\delta \le \frac{2^{O(nm)}}{K!}$ [Bou+19]. Plugging these values into Lemma 13, we find that if we use these values to try to interpolate to the classically hard-to-compute quantity $p_0(C, 1, K)$, the error bound guaranteed by Paturi's lemma is no better than $\frac{2^{O(nm)}}{K!} \exp\left(O(2mK(1 + m^2 K))\right)$, which diverges in the limit $n \to \infty$ for any scaling of $m$ and $K$. Hence, the technique of [Bou+19] is insufficient to show that exactly computing output probabilities of circuits drawn from the Haar-random circuit distribution $\mathcal{H}(C)$ with high probability is hard.
Intuitively, the limitation arises because there is a tradeoff between the amount of truncation error incurred and the degree of the interpolating polynomial. As the parameter K is increased, the truncation error is suppressed, but the degree of the interpolating polynomial is increased, making the interpolation more sensitive to errors.
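This tradeoff can be quantified with a toy calculation of ours (not from [Bou+19]): for interpolation nodes confined to $[0, \epsilon]$, the factor by which per-sample noise can be amplified at the extrapolation point $\theta = 1$ is governed by the Lagrange basis polynomials, and it grows explosively with the degree:

```python
import numpy as np

def amplification(eps, degree):
    # sum of |Lagrange basis polynomials| evaluated at theta = 1 for
    # equally spaced nodes on [0, eps]: the worst-case factor by which
    # per-sample noise is magnified at the extrapolation point
    nodes = np.linspace(0.0, eps, degree + 1)
    total = 0.0
    for i in range(degree + 1):
        others = np.delete(nodes, i)
        total += abs(np.prod((1.0 - others) / (nodes[i] - others)))
    return total

for degree in (4, 8, 16, 32):
    print(degree, amplification(0.1, degree))
```

With the window fixed at $[0, 0.1]$, the amplification factor grows by many orders of magnitude each time the degree is doubled, which is why an additive sample error of $2^{-n^c}$ cannot survive extrapolation once the degree $2mK$ is too large.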

Inapplicability to SEBD and Patching
To summarize the findings above, the argument of [Bou+19] for the hardness of computing output probabilities of random circuits applies not directly to Haar-random circuit distributions, but rather to distributions over slightly non-unitary circuits that are exponentially close to the corresponding Haar distributions in some sense. We argued that the interpolation scheme cannot be straightforwardly applied to circuits that are truly Haar-random, and therefore it cannot be used to conclude that simulating truly Haar-random circuits, even exactly, is classically hard.
A priori, it is not obvious whether this limitation is a technical artifact or a more fundamental limitation of the interpolation scheme. In particular, one might imagine that if some algorithm A is capable of exactly simulating Haar-random circuit families, some modified version of the algorithm A might be capable of simulating the associated truncated Haar-random circuit families, at least up to the precision needed for the interpolation argument to work. If this were the case, then the hardness argument would be applicable.
However, SEBD and Patching appear to be algorithms that cannot be straightforwardly used to efficiently simulate truncated Haar-random circuit families to the precision needed for the interpolation to work, even under the assumption that they can efficiently, exactly simulate Haar-random circuit families. This is because the efficiency of these algorithms crucially relies on the existence of a constant-radius lightcone for constant-depth circuits: the algorithms are able to ignore all qubits and gates outside of the lightcone of the sites currently being processed. However, the lightcone argument breaks down for non-unitary circuits. If the gates are non-unitary and we want to perform an exact simulation, we are left with using Markov-Shi or some other general-purpose tensor network contraction algorithm, with a running time of $2^{O(d\sqrt{n})}$ for depth-$d$ circuits on $n$ qubits. Consider what happens if one tries to use one of these algorithms to compute output "probabilities" for a slightly non-unitary circuit coming from a truncated Haar-random distribution $\mathcal{D}(C, \theta, K)$, and then uses these computed values to interpolate to the hard-to-compute value $p_0(C, 1, K)$ via the interpolating polynomial of degree $2mK$ proposed in [Bou+19]. Even without any other sources of error, when one of these algorithms ignores gates outside of the current lightcone, it is essentially approximating each gate outside the lightcone as unitary. This incurs an error bounded by $2^{O(nm)}/K!$ in the computed output probability. Then, by an argument essentially identical to the one appearing in the previous section, one finds that this error incurred just from neglecting gates outside the lightcone is already large enough to exceed the error permitted for the polynomial interpolation to be valid.
We conclude that this worst-to-average-case reduction based on truncated Taylor series expansions cannot be used to conclude that it is hard for SEBD or Patching to exactly simulate worst-case hard shallow Haar-random circuits with high probability.

B Deferred proofs
Lemma 1. Let $\epsilon_i$ denote the sum of the squares of all singular values discarded in the compression during iteration $i$ of the simulation, by SEBD with no bond dimension cutoff, of a circuit $C$ with output distribution $D_C$, and let $\Lambda$ denote the sum of all singular values discarded over the course of the algorithm. Then the distribution $D'_C$ sampled from by SEBD satisfies
$$\mathbb{E}\,\|D'_C - D_C\|_1 \le \sum_i \mathbb{E}\sqrt{8\epsilon_i} \le \sqrt{8}\;\mathbb{E}\,\Lambda,$$
where the expectations are over the random measurement outcomes.
Proof. We rely upon a well-known fact about the error caused by truncating the bond dimension of an MPS, which we state as Lemma 14.
Lemma 14 (follows from [VC06]). Suppose a pure state is compressed across a cut by discarding Schmidt coefficients whose squares sum to $\epsilon$ and renormalizing. Then the trace distance between the original and truncated states is at most $\sqrt{8\epsilon}$.

The second inequality of Lemma 1 follows from the fact that $\sqrt{\sum_i x_i^2} \le \sum_i x_i$ for $x_i \ge 0$. To prove the first inequality, we start by considering the version of the algorithm with no truncation, which we have argued samples exactly from $D$. Let $\mathcal{N}_t$ denote the TPCP map corresponding to the application of gates that have come into the lightcone of column $t$ and the measurement of column $t$, whose (classical) outcome strings are indexed by $x_t$. Note that $\mathcal{N}_t(\rho)$ is a classical-quantum state for which the sites corresponding to the first $t$ columns are classical, and the quantum register consists of sites which are in the lightcone of column $t$ but not in the first $t$ columns. Define $\rho_t := \mathcal{N}_{t-1}(\rho_{t-1})$ and $\rho_1 := (|1\rangle\langle 1|)^{\otimes L_1}$ on column 1, so that $\rho_{L_2+1}$ is a classical state exactly corresponding to output strings on the $L_1 \times L_2$ grid distributed according to $D$. Now consider the "truncated" version of the algorithm, defined similarly except that we use $\sigma_t$ to denote the state of the algorithm immediately after the truncation at the beginning of iteration $t$. That is, we define $\sigma_t := (\mathcal{T}_t \circ \mathcal{N}_{t-1})(\sigma_{t-1})$, where $\mathcal{T}_t$ denotes the mapping corresponding to the MPS truncation and subsequent renormalization at the beginning of iteration $t$, and we define $\sigma_1 := \mathcal{T}_1(\rho_1) = \rho_1$ (there is no truncation at the beginning of the first iteration since the initial state is a product state). We now have
$$\|\rho_{t+1} - \sigma_{t+1}\|_1 \le \|\mathcal{N}_t(\rho_t) - \mathcal{N}_t(\sigma_t)\|_1 + \|\mathcal{N}_t(\sigma_t) - (\mathcal{T}_{t+1} \circ \mathcal{N}_t)(\sigma_t)\|_1 \le \|\rho_t - \sigma_t\|_1 + \|\mathcal{N}_t(\sigma_t) - (\mathcal{T}_{t+1} \circ \mathcal{N}_t)(\sigma_t)\|_1,$$
where the first inequality follows from the triangle inequality, and the second from contractivity of TPCP maps. Applying this inequality recursively yields
$$\|\rho_{L_2+1} - \sigma_{L_2+1}\|_1 \le \sum_{i=1}^{L_2} \|\mathcal{N}_i(\sigma_i) - (\mathcal{T}_{i+1} \circ \mathcal{N}_i)(\sigma_i)\|_1,$$
where we also used the fact that no truncation occurs after $\mathcal{N}_{L_2}$ is applied (i.e. $\mathcal{T}_{L_2+1}$ acts as the identity). Now, note that $\|\mathcal{N}_i(\sigma_i) - (\mathcal{T}_{i+1} \circ \mathcal{N}_i)(\sigma_i)\|_1$ is exactly the expected 1-norm error caused by the truncation in iteration $i+1$; this follows from the fact that for classical-quantum states, the 1-norm distance decomposes as the average over the classical register of the 1-norm distances of the conditional quantum states.
By Lemma 14, this quantity is bounded by $\mathbb{E}\sqrt{8\epsilon_{i+1}}$. Substituting this bound into the summation yields the desired inequality.
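The $\sqrt{8\epsilon}$ truncation bound invoked above is simple to check numerically for a random bipartite pure state (a sketch of ours; for pure states the trace distance equals $2\sqrt{1 - |\langle\psi|\phi\rangle|^2}$):

```python
import numpy as np

rng = np.random.default_rng(2)

# a random bipartite pure state across an 8 x 8 cut, as a coefficient matrix
M = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
M /= np.linalg.norm(M)

U, s, Vh = np.linalg.svd(M)
chi = 3                                # number of Schmidt coefficients kept
eps = np.sum(s[chi:] ** 2)             # truncation error: total discarded squared weight

s_trunc = np.where(np.arange(len(s)) < chi, s, 0.0)
s_trunc = s_trunc / np.linalg.norm(s_trunc)   # renormalize the truncated state
M_trunc = U @ np.diag(s_trunc) @ Vh

overlap = abs(np.vdot(M, M_trunc))
trace_dist = 2 * np.sqrt(max(0.0, 1 - overlap**2))
print(trace_dist, "<=", np.sqrt(8 * eps))
assert trace_dist <= np.sqrt(8 * eps)
```

In fact the trace distance here is exactly $2\sqrt{\epsilon}$, comfortably inside the $\sqrt{8\epsilon}$ bound.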
Furthermore, upon truncating the smallest Schmidt coefficients up to a truncation error of $\epsilon$, with probability at least $1 - \delta$ the half-chain Schmidt rank $r$ of the post-truncation state obeys the scaling $r \le \exp\left(\Theta\left(\sqrt{\log(n/\epsilon\delta)}\right)\right)$.
Proof. Suppose that an EPR pair is measured $2t$ times, corresponding to each of the two qubits being measured $t$ times. A calculation shows that the probability of obtaining $s$ $M_1$ outcomes is given by a mixture of two binomial distributions. Letting $S$ be the random variable denoting the number of $M_1$ outcomes, we find
$$\Pr[S = s] = \tfrac{1}{2} \Pr\left[B_{2t, \sin^2(\theta/2)} = s\right] + \tfrac{1}{2} \Pr\left[B_{2t, \cos^2(\theta/2)} = s\right],$$
where $B_{n,p}$ denotes a binomial random variable associated with $n$ trials and success probability $p$.
If after the $2t$ measurements we obtain outcome $M_1$ $s$ times, the post-measurement state is given by (up to normalization) $|00\rangle + \tan^{2(t-s)}(\theta/2)\,|11\rangle$.
By Hoeffding's inequality, we have $\Pr\left[\left|B_{2\ell, \sin^2(\theta/2)} - 2\ell\sin^2(\theta/2)\right| \ge a\right] \le \exp\left(-\Theta(a^2/\ell)\right)$; this yields the corresponding concentration bound on the coefficients within each sector. Furthermore, note that since there are $\Theta(n^2)$ sectors, by the union bound, with probability at least $1-\delta$, for each sector $j$ all coefficients lie in the range $[\gamma^{\,j+K_{j,p}}, \gamma^{\,j-K_{j,p}}]$ if we take $p = \delta/\Theta(n^2)$. We make this choice of $p$ and assume for the remainder of the argument that all coefficients of sector $j$ lie in this range, which is true with probability at least $1-\delta$. We also note the following fact, which will be used below: if $\ell$ and $p$ are related as $\ell \ge \Theta(\log(1/p))$, then $K_{\ell,p} = O(\ell)$. Still working with the unnormalized state of Eq. (95), we now study the scaling between the Schmidt index $i$ and the corresponding coefficient $\tilde\lambda_i$ for $i$ in the regime $i \ge \exp\left(\Theta\left(\sqrt{\log(1/p)}\right)\right)$. Note that $\tilde\lambda_i = \gamma^{\ell}$ for some integer $\ell$, which we first lower bound. The lower bound is achieved if, for each sector $j$, all coefficients in that sector are equal to $\gamma^{\,j-K_{j,p}}$; in this case, the exponent is equal to $\ell' - K_{\ell',p}$, where $\ell'$ is the smallest integer such that the number of coefficients in sectors up to $\ell'$ is at least $i$. Rearranging, we see that $\ell' = \Theta(\log^2(i)) \ge \Theta(\log(1/p))$, and hence $\ell = \Theta(\log^2(i))$ since $\ell' - K_{\ell',p} = \Theta(\ell')$. Similarly, an upper bound on $\ell$ is achieved if, for each sector $j$, all coefficients in that sector are equal to $\gamma^{\,j+K_{j,p}}$; in this case, $\ell$ is equal to $\ell' + K_{\ell',p}$, with $\ell'$ defined as above. This yields a matching upper bound for $\ell$ of $\Theta(\log^2(i))$. We therefore have the scaling $\ell = \Theta(\log^2(i))$, which, using the fact that $\tilde\lambda_i = \gamma^{\ell}$, yields
$$\tilde\lambda_i = \exp\left(-\Theta(\log^2(i))\right), \qquad i \ge \exp\left(\Theta\left(\sqrt{\log(1/p)}\right)\right).$$
Noting that $\lambda_i$ is proportional to $\tilde\lambda_i$ via $\lambda_i = \frac{1}{N}\tilde\lambda_i$ with $N = \left(\sum_i \tilde\lambda_i^2\right)^{1/2}$, this shows the first statement of the lemma. Now, suppose that for some $i \ge i^* = \exp\left(\Theta\left(\sqrt{\log(1/p)}\right)\right)$, we truncate all Schmidt coefficients with index $\ge i$. The incurred truncation error is
$$\epsilon = \sum_{j \ge i} \lambda_j^2 < \sum_{j \ge i} \tilde\lambda_j^2 = \exp\left(-\Theta(\log^2(i))\right),$$
where the inequality holds because the unnormalized state has norm strictly greater than one (i.e. $N > 1$). Rearranging, this becomes $i \le \exp\left(\Theta\left(\sqrt{\log(1/\epsilon)}\right)\right)$.
Hence, if we truncate the state at the end of the process up to a truncation error of $\epsilon$, the rank $r$ of the post-truncation state is bounded by
$$r \le \max\left\{ \exp\left(\Theta\left(\sqrt{\log(1/\epsilon)}\right)\right),\ \exp\left(\Theta\left(\sqrt{\log(1/p)}\right)\right) \right\} = \exp\left(\Theta\left(\sqrt{\log(n/\epsilon\delta)}\right)\right),$$
as desired, where we used the relation $p = \delta/\Theta(n^2)$.
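The resulting super-polynomial but sub-exponential tradeoff between truncation error and rank can be visualized with a toy spectrum (a sketch of ours, with all $\Theta$-constants arbitrarily set to 1):

```python
import numpy as np

# toy squared Schmidt coefficients lambda_i^2 ~ exp(-2 log^2 i), i = 1, 2, ...
i = np.arange(1, 1_000_000, dtype=float)
lam2 = np.exp(-2.0 * np.log(i) ** 2)
lam2 /= lam2.sum()

tail = np.cumsum(lam2[::-1])[::-1]     # tail[r] = error left after keeping rank r
ranks = []
for eps in (1e-2, 1e-4, 1e-8):
    r = int(np.argmax(tail < eps))     # smallest rank with truncation error < eps
    ranks.append(r)
    print(eps, r)
# the required rank grows only like exp(sqrt(log(1/eps))), far slower than 1/eps
```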
Lemma 11. Suppose a 1D random circuit $C$ is applied to qubits $\{1, \ldots, n\}$, consisting of a layer of 2-qubit Haar-random gates acting on qubits $(i, i+1)$ for odd $i$, followed by a layer of 2-qubit Haar-random gates acting on qubits $(i, i+1)$ for even $i$. Suppose a contiguous region $B$ of the output qubits is then measured in the computational basis, with outcome $b$, and let $A$ denote the qubits to the left of $B$. Then the expected entanglement entropy of the post-measurement state across the $(A, A^c)$ cut satisfies $\mathbb{E}\, S(\rho_A(b)) \le c^{|B|}$ for some universal constant $c < 1$, where the expectation is over measurement outcomes and choice of random circuit $C$.
Proof. We will use a smaller technical lemma, which we state and prove below.

Lemma 15. Let $|\psi\rangle_{AB}$ be a state with Schmidt rank at most 2, let $|H\rangle_{CD}$ be a Haar-random state on auxiliary registers $C$ and $D$, and suppose a Haar-random unitary $U$ is applied to $BC$, after which qubit $B$ is measured in the computational basis. Then the expected entanglement entropy of the post-measurement state satisfies $\mathbb{E}\, S(A) \le c\, S(A)_\psi$ for some constant $c < 1$, where the expectation is over the random measurement outcome, the random state $|H\rangle_{CD}$, and the Haar-random unitary $U$.
Proof. Consider the Schmidt decomposition $|\psi\rangle_{AB} = \sqrt{p}\, |e_1\rangle_A |f_1\rangle_B + \sqrt{1-p}\, |e_2\rangle_A |f_2\rangle_B$, where we assume WLOG that $p \ge 1/2$. We also assume that $p < 1$, because the statement is trivially true for any value of $c$ when $p = 1$. Note that the entanglement entropy of this state is simply $S(A)_\psi = H_2(p)$, where $H_2(p) := -p \log p - (1-p)\log(1-p)$ is the binary entropy function. Let $M_0 := (\Pi_0 \otimes I)U$ and $M_1 := (\Pi_1 \otimes I)U$ denote the measurement operators acting on subsystems $B$ and $C$, where $\Pi_i$ denotes the projector onto the computational basis state $|i\rangle$ and $U$ is the Haar-random unitary applied to subsystems $B$ and $C$. Let $X$ denote a random variable equal to 1 with probability $p$ and equal to 2 with probability $1-p$, and let $Y$ denote the measurement outcome of $\{M_0, M_1\}$ when applied to the state $|e_X\rangle_A |f_X\rangle_B |H\rangle_{C,D}$. The probability of obtaining measurement outcome $b$ on the original state is simply $\Pr(Y = b)$, and the post-measurement state after obtaining outcome $b$ is
$$\frac{1}{\sqrt{\Pr(Y = b)}}\left( \sqrt{p \cdot \Pr(Y = b | X = 1)}\, |e_1\rangle_A |b\rangle_B |\phi_{b,1}\rangle_{C,D} + \sqrt{(1-p) \cdot \Pr(Y = b | X = 2)}\, |e_2\rangle_A |b\rangle_B |\phi_{b,2}\rangle_{C,D} \right)$$
$$= \sqrt{\Pr(X = 1 | Y = b)}\, |e_1\rangle_A |b\rangle_B |\phi_{b,1}\rangle_{C,D} + \sqrt{\Pr(X = 2 | Y = b)}\, |e_2\rangle_A |b\rangle_B |\phi_{b,2}\rangle_{C,D},$$
where $|\phi_{b,j}\rangle_{C,D}$ are normalized states on subsystems $C$ and $D$. Define $\epsilon := |\langle \phi_{b,1} | \phi_{b,2} \rangle|^2$. Letting $\rho_{A,b}$ denote the reduced density matrix on subsystem $A$ of the post-measurement state after obtaining measurement outcome $b$, the maximal eigenvalue of this matrix is lower bounded as $\lambda_{\max}(\rho_{A,b}) \ge \Pr(X = 1 | Y = b) + \epsilon \Pr(X = 2 | Y = b)$. (To see this, observe that the reduced density matrix on $CD$ is $\sigma = \Pr(X = 1|Y = b)\, |\phi_{b,1}\rangle\langle\phi_{b,1}| + \Pr(X = 2|Y = b)\, |\phi_{b,2}\rangle\langle\phi_{b,2}|$, and the maximal eigenvalue is lower bounded as $\lambda_{\max}(\rho_{A,b}) = \lambda_{\max}(\sigma) \ge \langle\phi_{b,1}|\sigma|\phi_{b,1}\rangle \ge \Pr(X = 1|Y = b) + \epsilon \Pr(X = 2|Y = b)$.) Furthermore, note that $S(\rho_{A,b}) \le H_2(\lambda_{\max}(\rho_{A,b}))$. Now, using concavity of the binary entropy function and the fact that $\mathbb{E}_b \Pr(X = 1|Y = b) = p$, we have
$$\mathbb{E}_b\, S(\rho_{A,b}) \le H_2\big(p + \epsilon(1 - p)\big).$$
Consider the ratio $r(p, \epsilon) := \frac{H_2(p + \epsilon(1-p))}{H_2(p)}$. We want to argue that for any $\epsilon > 0$, $r(p, \epsilon)$ is bounded away from one on the interval $p \in [1/2, 1)$.
This statement is clearly true for any p bounded away from one, since H_2 is monotonically decreasing on the interval [1/2, 1). Furthermore, it is straightforward to show that lim_{p→1} r(p, ε) = 1 − ε. Hence, we have

E_b[S(A)_{ρ_{A,b}}] ≤ c(ε) · H_2(p) = c(ε) · S(A)_ψ,

where c(ε) := sup_{p ∈ [1/2,1)} r(p, ε) < 1 unless ε = 0. We now average both sides over the choice of Haar-random state on CD as well as the Haar-random unitary U acting on BC. Since the event ε > 0 occurs with nonzero probability (in fact, with probability one), we have the strict inequality E_{H,U}[c(ε)] := c < 1, from which the desired inequality follows.
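As a quick numerical sanity check on the two claims about r(p, ε) (not part of the proof; the function names below are our own), one can evaluate the ratio directly and confirm that it stays strictly below one on [1/2, 1) and approaches 1 − ε as p → 1:

```python
import numpy as np

def H2(p):
    """Binary entropy (in bits)."""
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def r(p, eps):
    """The ratio H2(p + eps*(1 - p)) / H2(p) from the proof."""
    return H2(p + eps * (1 - p)) / H2(p)

eps = 0.3
ps = np.linspace(0.5, 1 - 1e-8, 2000)
vals = r(ps, eps)

assert np.all(vals < 1)                            # bounded below 1 on [1/2, 1)
assert abs(r(1 - 1e-8, eps) - (1 - eps)) < 0.02    # limit 1 - eps as p -> 1
```

The convergence to 1 − ε is slow (logarithmic in 1/(1 − p)), which is why the check is made very close to p = 1.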
We may assume that i ≠ 0 and j ≠ n, as in these cases we trivially have S(ρ_A(b)) = 0. The post-measurement state may be constructed as follows. Apply all gates in the lightcone of qubit i, then measure qubit i. Now apply all gates in the lightcone of qubit i + 1 not previously applied, then measure qubit i + 1. Assume that qubits are introduced only when they come into the lightcone under consideration. Iterate until all qubits in region B have been measured. Finally, apply any gates that have not yet been applied. It is straightforward to verify that this is equivalent to applying all gates of the circuit before performing the measurement of region B, in the sense that the measurement statistics are the same, and the post-measurement state given some outcome b is the same: each deferred gate lies outside the lightcone of the qubits measured before it, and therefore commutes with those measurements.
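The commutation fact underlying this reordering is elementary and easy to illustrate numerically: a measurement commutes with any gate acting outside its lightcone. A minimal three-qubit example (our own construction, not code from the paper) checks that measuring qubit 0 before or after a gate on qubits 1, 2 gives identical statistics and post-measurement states:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(d):
    """Haar-random unitary via QR decomposition with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

I2 = np.eye(2)
U1 = haar_unitary(4)   # gate on qubits 0,1 (inside the lightcone of qubit 0)
U2 = haar_unitary(4)   # gate on qubits 1,2 (outside the lightcone of qubit 0)

psi = np.zeros(8, dtype=complex); psi[0] = 1.0   # |000>
psi = np.kron(U1, I2) @ psi                      # apply U1 first

# Projector onto outcome 0 for qubit 0 (most significant bit)
P0 = np.zeros((8, 8)); P0[:4, :4] = np.eye(4)
p_before = np.linalg.norm(P0 @ psi) ** 2         # measure, then defer U2
psi_late = np.kron(I2, U2) @ psi
p_after = np.linalg.norm(P0 @ psi_late) ** 2     # apply U2, then measure
assert np.isclose(p_before, p_after)

# Post-measurement states agree: measure-then-gate == gate-then-measure
assert np.allclose(np.kron(I2, U2) @ (P0 @ psi), P0 @ psi_late)
```

Here np.kron(I2, U2) is block-diagonal with respect to the value of qubit 0, which is exactly why it commutes with the projector P0.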
By Lemma 15, after the first iteration we are left with a post-measurement state |ψ⟩_{LR} such that E S(L)_ψ ≤ c for some constant c < 1. In all iterations, we let L denote the current subsystem to the left of the measured qubits, and R denote the subsystem to the right of the measured qubits. Now consider the second iteration. Depending on whether i was even or odd, R may consist of one or two qubits immediately after the measurement of i. In the former case, we may apply Lemma 15 again, obtaining E S(L)_ψ ≤ c² after the measurement of qubit i + 1, and obtaining a two-qubit subsystem to the right of the measured qubits. In the latter case, as a consequence of concavity of von Neumann entropy, we have E S(L)_ψ ≤ c after measurement, and are left with a one-qubit subsystem to the right of the measured qubits. Iterating this process, after all qubits of subregion B have been measured, we are left with some state |ψ⟩_{LR} such that E S(L)_ψ ≤ c^{|B|/2} = c'^{|B|}, where c' := √c < 1. Finally, local unitary gates are applied to |ψ⟩_{LR} to obtain the final post-measurement state on the entire chain. Since each unitary is applied to only the left of region B or only the right of region B, the entanglement entropy across the (A, A^c) cut is unaffected by these gates, and remains bounded by c'^{|B|} in expectation.
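The exponent bookkeeping in this iteration can be sketched as a toy tally (our own function name, illustrating only the arithmetic of the bound): the bound on E S(L) shrinks by a factor of c on each step where Lemma 15 applies, and in the worst case the lemma steps and concavity steps alternate.

```python
from math import sqrt

def entropy_bound(B_size, c):
    """Worst-case bound on E S(L) after measuring all |B| qubits:
    multiply by c on lemma steps, leave unchanged on concavity steps,
    with the two kinds of step alternating (lemma applies first)."""
    bound, lemma_step = 1.0, True
    for _ in range(B_size):
        if lemma_step:
            bound *= c
        lemma_step = not lemma_step
    return bound

c = 0.5
for B in range(1, 13):
    # c^{ceil(|B|/2)} <= c^{|B|/2} = (sqrt c)^{|B|}
    assert entropy_bound(B, c) <= sqrt(c) ** B + 1e-12
```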

C Justification of stat mech mapping
In this appendix, we provide the justification for the procedure in Section 5.1. Namely, we show that Eq. (37), which expresses E_U(Z_{k,∅/A}) as a partition function, is correct, and we derive the equations for the weights of the stat mech model.
To begin, for any integer k ≥ 2, we rewrite Z_{k,∅/A} from Eqs. (32) and (33) as

Z_{k,∅} = tr(ρ^{⊗k}),   Z_{k,A} = tr(W^{(A)}_{(1...k)} ρ^{⊗k}),

where each trace includes k copies of ρ and W^{(A)}_{(1...k)} is the linear operator that performs a k-cycle permutation, denoted (1 . . . k) in cycle notation, of the copies for qudits within region A while leaving the copies of the qudits outside of A unpermuted. When k = 2 there are two copies of ρ and W^{(A)}_{(12)} is the swap operator for qudits in A. After substituting Eq. (30) for each copy of ρ that appears in the equations above, we obtain an expression with k copies of each unitary U_u and k copies of its adjoint U†_u, as well as k copies of each measurement operator and its adjoint. The integral over each Haar-random unitary can then be evaluated using the formula

∫ dU U_{i_1 j_1} · · · U_{i_k j_k} Ū_{i'_1 j'_1} · · · Ū_{i'_k j'_k} = Σ_{σ,τ ∈ S_k} δ_σ(i, i') δ_τ(j, j') wg(τσ^{−1}, q²),   (113)

where on the left-hand side U_{ij} is the (i, j) matrix element of the unitary U, and on the right-hand side S_k is the symmetric group, δ_σ(i, j) is shorthand for ∏_{a=1}^k δ_{i_a j_{σ(a)}}, and wg(τσ^{−1}, q²) is the Weingarten function, which can be defined in several ways, for example by the following expansion [Col03; CŚ06] over irreducible characters of the symmetric group S_k:

wg(π, q²) = (1/(k!)²) Σ_λ (χ_λ(e)² / s_{λ,q²}(1)) χ_λ(π),

where the sum is over all partitions λ of the integer k, χ_λ is the irreducible character of S_k associated with the partition λ, e is the identity permutation, and s_{λ,q²}(1) is the Schur polynomial evaluated at 1, which is equal to the dimension of the representation of U(q²) associated with λ. Note that there exist permutations π for which wg(π, q²) is negative.
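For k = 2 the Weingarten values are wg(e, d) = 1/(d² − 1) and wg((12), d) = −1/(d(d² − 1)), so formula (113) can be checked by Monte Carlo over Haar-random unitaries. The sketch below (our own example moments, with d standing in for the gate dimension q²) compares sampled fourth moments against the predictions of the formula:

```python
import numpy as np

rng = np.random.default_rng(7)

def haar_unitary(d):
    """Haar-random unitary on U(d) via QR with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

d, N = 2, 20000
m_distinct = m_row = 0.0
for _ in range(N):
    U = haar_unitary(d)
    m_distinct += abs(U[0, 0])**2 * abs(U[1, 1])**2   # distinct rows and columns
    m_row += abs(U[0, 0])**2 * abs(U[0, 1])**2        # equal rows, distinct columns
m_distinct /= N
m_row /= N

# Formula (113): distinct rows/columns admit only sigma = tau = e, giving
# wg(e, d) = 1/(d^2 - 1); equal rows admit sigma in {e, (12)}, tau = e,
# giving wg(e, d) + wg((12), d) = 1/(d*(d + 1)).
assert abs(m_distinct - 1 / (d**2 - 1)) < 0.02
assert abs(m_row - 1 / (d * (d + 1))) < 0.01
```

The two moments differ only in which index coincidences occur, which is exactly what determines which permutation pairs (σ, τ) survive the delta functions.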
In words, formula (113) states that Haar integration can be performed by summing over all ways of pairing up the incoming index for each of the k copies of U with an outgoing index of a copy of U†, and the incoming index of each copy of U† with an outgoing index of a copy of U. The permutations σ, τ ∈ S_k encode which copies are paired with each other, and each permutation pair (σ, τ) is weighted by wg(τσ^{−1}, q²) in the sum. It is helpful to think of this formula graphically, as in Figure 16, where we have depicted how the indices pair up after integration over a two-qudit Haar-random unitary. The Haar integration formula thus allows us to replace the Haar-random unitaries in the circuit diagram with sums over configurations on a graph whose nodes take values in S_k, with each edge contributing a factor to the weight of a configuration.
By applying this formula to all of the Haar-random gates, all of the integrals are eliminated and the tensor network representation of Z_{k,∅/A} can be expressed as a weighted sum over many networks, where each network in this sum corresponds to some choice of (σ_u, τ_u) for every unitary u in the original circuit, together with a choice of measurement operator for each weak measurement from the corresponding set of measurement operators. Furthermore, each network in this weighted sum is itself composed of many disjoint parts that can be individually evaluated. We can see this by observing that when unitary u_2 succeeds unitary u_1 and shares a qudit, Haar integration forces the k tensor indices representing that qudit at that place in the circuit diagram to pair up with the k dual indices for the qudit at the same place, according to some permutation. This happens both at the output of unitary u_1 (corresponding to permutation σ_{u_1}) and at the input of unitary u_2 (corresponding to permutation τ_{u_2}), yielding a set of closed loops in the tensor network diagram. If the weak measurement acting on that qudit between unitaries u_1 and u_2 is M, then k copies of M and k copies of M† appear among these loops. An example of such a subdiagram is shown in Figure 17.
This observation justifies Eq. (37), as we have expressed E_U(Z_{k,∅/A}) as a weighted sum, with each term labelled by pairs of permutations at the locations of each unitary, where the weight is given by a product of factors that each depend on only two of these permutations. These factors are the weights of the stat mech model; they can be computed by evaluating the corresponding closed subdiagrams, rewriting each pairing of indices as a permutation operator W_π acting on the k copies of the qudit and noting that W_π commutes with X^{⊗k} for any single-qudit operator X. The final part to justify is the introduction of the auxiliary nodes and corresponding weights in Eq. (40). The trace in the definition of Z_{k,∅} implies that the indices of the qudits at the circuit output are paired up with their dual indices without permutation. This creates a disjoint closed diagram for each qudit at the circuit output. To evaluate it, we may use the same formula as Eq. (39), taking τ_{u_2} = e, the identity permutation. This is equivalent to introducing auxiliary nodes, as we have done, that are fixed to e for all qudits and across all terms in the partition function. The same follows for Z_{k,A}, with the exception that the operator W^{(A)}_{(1...k)} is applied to the circuit output, which permutes the output indices of any qudit a ∈ A prior to connecting them with their dual indices. This is equivalent to introducing an auxiliary node and fixing it to the value (1 . . . k).
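Two facts used in evaluating these closed subdiagrams — that a loop decorated with a permutation operator W_π closes into a trace of a product of the operators on the loop, and that W_π commutes with X^{⊗k} — are easy to verify directly for k = 2 (generic notation, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
q = 3                                   # local qudit dimension

# Swap operator W_(12) on two copies of a qudit: |a,b> -> |b,a>
W = np.zeros((q * q, q * q))
for a in range(q):
    for b in range(q):
        W[b * q + a, a * q + b] = 1.0

X = rng.standard_normal((q, q)) + 1j * rng.standard_normal((q, q))
Y = rng.standard_normal((q, q)) + 1j * rng.standard_normal((q, q))

# Identity-permutation loop closes into a product of traces
assert np.isclose(np.trace(np.kron(X, Y)), np.trace(X) * np.trace(Y))
# Swap loop closes into the trace of the product: tr(W (X (x) Y)) = tr(XY)
assert np.isclose(np.trace(W @ np.kron(X, Y)), np.trace(X @ Y))
# W_pi commutes with X^{(x)2}
assert np.allclose(W @ np.kron(X, X), np.kron(X, X) @ W)
```

For general k the same pattern holds: tr(W_π (X_1 ⊗ · · · ⊗ X_k)) factors into one trace per cycle of π, which is what reduces each closed subdiagram to a weight depending only on the two adjacent permutations.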
There is no need to introduce auxiliary nodes at the beginning of the circuit because we are assuming the circuit acts on a pure product state, |1 . . . 1⟩⟨1 . . . 1|. Thus, the k copies of each index that feeds into the first layer of unitaries are all fixed to the same computational basis value, so the delta functions evaluate to 1 and, regardless of the permutation value of the incoming node for that unitary, this part of the diagram contributes a factor of 1. If we had considered circuits that act initially on the maximally mixed state, we could have handled this by introducing a layer of auxiliary nodes at the beginning of the circuit and fixing their value to e.