Efficient sampling of noisy shallow circuits via monitored unraveling

We introduce a classical algorithm for sampling the output of shallow, noisy random circuits on two-dimensional qubit arrays. The algorithm builds on the recently proposed "space-evolving block decimation" (SEBD) algorithm and extends it to the case of noisy circuits. SEBD is based on a mapping of 2D unitary circuits to 1D {\it monitored} ones, which feature measurements alongside unitary gates; it exploits the presence of a measurement-induced entanglement phase transition to achieve efficient (approximate) sampling below a finite critical depth $T_c$. Our noisy-SEBD algorithm unravels the action of noise into measurements, further lowering entanglement and enabling efficient classical sampling up to larger circuit depths. We analyze a class of physically-relevant noise models (unital qubit channels) within a two-replica statistical mechanics treatment, finding weak measurements to be the optimal (i.e. most disentangling) unraveling. We then locate the noisy-SEBD complexity transition as a function of circuit depth and noise strength in realistic circuit models. As an illustrative example, we show that circuits on heavy-hexagon qubit arrays with noise rates of $\approx 2\%$ per CNOT, based on IBM Quantum processors, can be efficiently sampled up to a depth of 5 iSWAP (or 10 CNOT) gate layers. Our results help sharpen the requirements for practical hardness of simulation of noisy hardware.

Before quantum computers can do anything useful, they must be able to reliably beat classical computers at some task, ideally one that is quantifiable, well-understood theoretically, and has reasonable experimental requirements. Random circuit sampling (RCS) has emerged as one of the leading candidates for this role: it is well-suited to the architectures of present-day gate-based quantum processors, and, at least in the ideal case of noiseless computation, its hardness is firmly established in complexity theory [1][2][3][4][5]. As a result, it has become the focus of pioneering experimental efforts in the last few years [6][7][8][9]. All such experiments, however, are by necessity carried out on present-day noisy, intermediate-scale quantum (NISQ) processors, where the question of classical simulation hardness is much more nuanced. This has spurred much interest in the exploration of how noise affects the boundary between "easy" and "hard" simulation problems.
It is now established that RCS in the presence of a finite noise rate can be simulated in polynomial time [10] in the regime of anticoncentration (i.e., informally, at sufficiently large depth [11]). However, the algorithm of Ref. [10] is not practical, leaving open the question of simulability of finite-sized, noisy RCS experiments with reasonable classical resources. This issue is subtle and depends on many variables, such as circuit architecture, size and depth, details of the noise models, choice of target metrics, etc. A powerful classical approach is based on tensor networks, which leverage limited entanglement and work best in 1D; for this reason, experiments have focused on 2D qubit arrays and picked highly-entangling gate sets. Large circuit depth also generates more entanglement and thus makes simulation harder. However, in practice, depth is limited by the presence of noise, as more gates also cause the accumulation of more errors. Additionally, the presence of noise in the quantum experiment lowers the bar for classical simulation: an apples-to-apples comparison requires that we tolerate similar levels of error from the classical algorithm as well. This opens the door to various strategies based on tensor networks that can outperform sufficiently-noisy quantum computers [12][13][14].
Characterizing the practical hardness of noisy RCS problems is thus a pressing question in quantum information science. It is also closely related to recent developments in nonequilibrium many-body physics regarding dynamical phases of quantum information in open systems. In particular, the fate of unitary circuits that are sampled during the dynamics (a special case of open-system evolution) has attracted much interest due to the discovery of entanglement phases that occur as a function of the measurement rate. The emergence of a phase with limited entanglement (area-law) at high measurement rate in these circuits has been used to develop an efficient algorithm for sampling the output of shallow unitary circuits in 2D [45]. The algorithm, dubbed "space-evolving block decimation" (SEBD) and sketched in Fig. 1(a), works at depths T below a finite critical depth T_c (model-dependent, but ≈ 4 − 5 in typical models). Circuits with T = 3 in 2D are capable of universal quantum computation and thus hard to sample in the worst case [46], making this result surprising. Another very recent development in dynamical phases of quantum information is the discovery of a sharp noise-induced phase transition in RCS [9,47], which was identified at a fixed number of errors per circuit layer (i.e. noise strength ε ∼ 1/N, N being the number of qubits). In the strong-noise phase, it was argued [9] that the processor's output can be classically "spoofed", while in the weak-noise phase simulation is conjectured to be practically hard.
In this work, we consider the problem of noisy RCS on shallow 2D circuits, Fig. 1(a-b), where the results of Ref. [10] are not applicable. Unlike other works, our goal is not to sample from the ideal output within some tolerance determined by the noise; instead, we aim to accurately sample the noisy output itself, a closely related but different problem. To this end we develop a classical algorithm, dubbed noisy-SEBD, and show that it undergoes a complexity phase transition (from quadratic to exponential in the linear size of the system) as a function of circuit depth T and noise strength ε. The physical principle behind the algorithm is that noise can be viewed as a sequence of (fictitious) measurements done by the environment on the system, Fig. 1(c). Measurements can lower the amount of entanglement in the system and drive a phase transition to an area-law entangled phase, where tensor network simulation is efficient; as a consequence, noise can drive a phase transition in the complexity of noisy-SEBD. Since the unraveling of noise into measurements is not unique, we are free to optimize it in order to lower this threshold as much as possible. In this work we use a two-replica statistical mechanics model to optimize the unraveling, finding weak measurements to be more disentangling than stochastic projective measurements; the effect of this optimized choice lowers the threshold noise rate by as much as a factor of ≈ 2, thus significantly expanding the "easy" phase.
Combined with the depth-induced complexity transition in the original (noiseless) SEBD algorithm [45], this defines a phase boundary in the space of circuit depth T and noise strength ε, sketched in Fig. 1(d). This adds to our growing understanding of the boundaries of "practical" simulability of noisy quantum systems. Moreover, it places sharp constraints on the possibility of achieving beyond-classical computation by scaling RCS experiments in space only, i.e., by growing quantum processor size at fixed circuit depth. This highlights the importance of further improvements in error rates of NISQ hardware.
The paper is structured as follows.
Sec. II reviews background material on random circuit sampling, matrix-product state simulation methods, the SEBD algorithm, and monitored dynamics. In Sec. III we discuss the unraveling of noise into monitored trajectories and the choice of entanglement-optimal unravelings, including explicit solutions for unital qubit channels. Sec. IV presents the noisy-SEBD algorithm, numerical simulations of its complexity phase transition, and an illustrative application to circuits based on IBM Quantum's heavy-hexagon qubit arrays. We conclude in Sec. V by summarizing our results, their implications and connections with other works, and directions for future research.

FIG. 1 (caption, in part). The noise channels are unraveled as measurements (generically weak), represented here as interactions with a fresh ancilla which is then measured. These fictitious measurements can further disentangle the wavefunction. (d) Qualitative sketch of the complexity phase diagram of the noisy-SEBD algorithm. An entanglement phase transition separates an "easy" phase (low depth and/or strong noise), where the computational cost of sampling the noisy circuit output via noisy-SEBD is ∼ Lx Ly exp(T), from a "hard" phase where the same cost becomes ∼ Ly exp(Lx T). The location of the phase boundary is model-dependent. The line ε = 0 yields the finite-depth complexity transition of Ref. [45] (noiseless SEBD), while the 1/T = 0 line (which stands for T = O(L)) yields the standard measurement-induced phase transition in two spatial dimensions [24]. We conjecture the scaling εc(T) ∼ εc,2D + O(1/T) at large T. We emphasize that this is a transition in the complexity of the noisy-SEBD algorithm, not of the sampling task itself [10].
guments for classical hardness [1][2][3][4][5]. The idea is to draw a random instance U from an ensemble of local unitary circuits of depth T, run it on a quantum computer prepared in the initial state |0⟩ ≡ |0⟩^⊗N, and measure the state of each qubit in the computational basis, obtaining a bitstring z ∈ {0, 1}^N. Ideally, this process samples bitstrings from the probability distribution p_U(z) = |⟨z|U|0⟩|², which is computationally hard to do for classical computers. Intuitively, this is due to the production of extensive entanglement in the system over the course of typical instances of the unitary evolution U [48], which causes tensor-network classical algorithms to fail. At the same time, sufficiently generic ensembles of unitary gates in U ensure that various other strategies for efficient classical simulation (such as stabilizers [49] or matchgates [50][51][52]) are not viable.
These theoretical insights have motivated pioneering experimental efforts in the past few years to demonstrate RCS-based quantum computational supremacy on present-day NISQ hardware [6][7][8][9]. Verification of successful RCS is nontrivial. The experiments employ a linear cross-entropy diagnostic, XEB = 2^N Σ_z p_exp(z) p_U(z) − 1, where p_U(z) is the previously-defined ideal distribution (to be computed classically), and p_exp(z) is the distribution of bitstrings obtained from the experiment, which generally differs from the ideal one due to imperfect implementation and uncontrolled noise. This quantity is convenient as it can be estimated by sampling from the experiment: XEB = 2^N E_{z∼p_exp}[p_U(z)] − 1. Furthermore, it is designed in such a way that XEB = 1 if the experiment is perfect (p_exp = p_U, assuming the Porter-Thomas distribution), while XEB = 0 if it is completely noisy (p_exp(z) = 2^−N for all z). In real-world conditions, with finite noise per gate, it has been argued that XEB ∼ f^{NT} in the regime of interest to the experiment [6], where f is the average fidelity per gate (more detailed results on the conversion of local noise into global depolarizing noise have been obtained subsequently [53,54]). This lowers the bar for classical algorithms: they do not need to achieve a perfect score XEB = 1, but only to exceed the fidelity of the NISQ experiment. This has led to a flurry of activity in the past few years to develop approximate classical algorithms that can efficiently simulate RCS with XEB scores and runtimes comparable to those of the NISQ experiments [12-14, 55, 56]. Furthermore, it was recently shown that there exists a polynomial-time algorithm for RCS with constant noise per gate [10], which uses a Feynman path-integral representation for the amplitudes ⟨z|U|0⟩ wherein noise damps the contribution of most paths. While this settles the complexity of noisy RCS formally, at least in the regime of anticoncentration [11], the algorithm comes with a very large exponent and is expected to be impractical at the relevant noise strengths. Thus the practical issue of efficient classical simulation of finite-sized noisy circuits remains open.
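The limiting behaviors of the linear cross-entropy benchmark can be illustrated with a short numerical sketch. The following toy example (hypothetical 3-qubit data, not taken from any experiment; numpy assumed) estimates XEB from samples:

```python
import numpy as np

def linear_xeb(samples, p_ideal):
    """Estimate XEB = 2^N <p_U(z)>_{z ~ p_exp} - 1 from experimental samples.
    `samples`: sampled bitstrings encoded as integers in [0, 2^N);
    `p_ideal`: length-2^N array with the ideal probabilities p_U(z)."""
    N = int(np.log2(len(p_ideal)))
    return 2**N * np.mean(p_ideal[np.asarray(samples)]) - 1

rng = np.random.default_rng(0)
# Toy "ideal" distribution with Porter-Thomas-like (exponential) weights, N = 3.
p_U = rng.exponential(size=8)
p_U /= p_U.sum()
# Perfect experiment (samples drawn from p_U) vs. fully depolarized one (uniform).
xeb_perfect = linear_xeb(rng.choice(8, size=200_000, p=p_U), p_U)
xeb_noisy = linear_xeb(rng.choice(8, size=200_000), p_U)
```

For uniform (fully depolarized) output the estimator converges to 0; for a perfect experiment it converges to 2^N Σ_z p_U(z)² − 1, which averages to 1 only for the exact Porter-Thomas ensemble, so this toy value need not be exactly 1.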

B. MPS simulation and the entanglement barrier
The interplay of entanglement and noise and its effects on the complexity of classical simulation become especially transparent in the case of matrix-product state (MPS) simulations of random circuits, limited to 1D (or quasi-1D) geometries [57][58][59]. The idea is to represent a wavefunction |ψ⟩ = Σ_z c_z |z⟩ as a product of three-index tensors via c_z = A_1^{z_1} A_2^{z_2} ⋯ A_N^{z_N}, where each A_i^{z_i} is a χ × χ matrix (i labels position in the chain, z_i is the physical state, and two virtual indices are implicit). The bond dimension χ is a cutoff on the Schmidt rank of the state |ψ⟩, which makes the state classically representable. The MPS ansatz allows approximate simulations with high accuracy whenever the entanglement entropy¹ of |ψ⟩ about each cut obeys S ≪ ln(χ). Given the linear growth of entanglement in random circuits [48], the MPS method enables accurate simulation for short circuit depths T ≲ ln(χ), after which the truncation of bond dimension incurs a large error. One can nonetheless carry out finite-χ simulations for larger depths; the effect of MPS truncation error on the fidelity with the true state is found to be qualitatively similar to the effect of noise in the quantum experiments [12,14]. Thus if noise strength is large enough, classical MPS simulations may beat NISQ experiments at the task of approximating a given ideal random circuit.
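The truncation step underlying MPS methods can be sketched in a few lines: split the state across a cut via a singular value decomposition and keep only the χ largest Schmidt values; the discarded Schmidt weight controls the error. A minimal numpy illustration (function name is ours, not from any particular library):

```python
import numpy as np

def truncate_bond(theta, chi):
    """Split a two-site wavefunction `theta` (shape d1 x d2) across the bond,
    keeping at most `chi` Schmidt values; return the factors, the kept
    (renormalized) Schmidt values, and the discarded Schmidt weight."""
    U, s, Vh = np.linalg.svd(theta, full_matrices=False)
    keep = s[:chi]
    discarded = float(np.sum(s[chi:] ** 2))
    return U[:, :chi], keep / np.linalg.norm(keep), Vh[:chi], discarded

# A weakly entangled two-qubit state is reproduced almost exactly at chi = 1:
theta = np.array([[np.sqrt(0.99), 0.0], [0.0, np.sqrt(0.01)]])
A, s, B, disc = truncate_bond(theta, chi=1)
```

Here the discarded weight is 0.01, directly reflecting the small second Schmidt coefficient; states with S ≪ ln(χ) have rapidly decaying Schmidt spectra and hence negligible discarded weight.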
A different task is to classically simulate (or sample from) noisy circuits themselves. Here, classical simulation needs to incorporate noise and thus mixed states. This can be accomplished by using matrix-product operators (MPO): ρ = Σ_{z,z′} c_{z,z′} |z⟩⟨z′| with c_{z,z′} = A_1^{z_1,z_1′} ⋯ A_N^{z_N,z_N′}, where each A_i^{z,z′} is a χ × χ matrix. (A representation that guarantees positivity is given by matrix-product density operators (MPDO) [60].) In the presence of noise and unstructured random gates, the unique steady state is expected to be the maximally mixed state ρ = I/2^N = ⊗_i (I/2)_i, which is manifestly disentangled, and can be written as an MPO of bond dimension χ = 1 (i.e. the tensors A_i^{z,z′} ≡ δ_{z,z′}/2 carry no virtual indices). Thus entanglement grows initially due to random unitary interactions, S ∼ T; this persists until the effects of noise are felt, at depth T ∼ 1/p, p being the noise strength; after that, S decreases to zero. Thus to accurately simulate the noisy dynamics one has to overcome an "entanglement barrier" of height S ∼ 1/p, with computational cost ∼ poly(N, exp(1/p)) [61,62]. While polynomial in the system size N, for realistic noise strength p ∼ 10⁻² the cost can still be prohibitively large. Moreover, the efficient MPO simulation only applies to 1D; in higher dimension, small entanglement generally does not guarantee efficient simulation [59].

¹ In fact, compression into MPS form depends on the behavior of small Schmidt eigenvalues, which is captured by Rényi entropies S_n with n < 1. In typical random circuits the von Neumann and Rényi entropies have similar scaling for all values of n away from 0.

C. 2D shallow circuits and the SEBD algorithm
Due to MPS simulations, 1D circuit architectures require large depths (diverging faster than log(N)) for hardness. On the contrary, 2D circuits could be hard already at finite depth T, as tensor network methods in two or more dimensions are generally not efficient. As hardness of simulation generally scales exponentially in the treewidth of the tensor network, experimental RCS works on two-dimensional circuits [6][7][8][9] set T ∼ √N to maximize classical hardness. Still, it is natural to ask at what depth T the classical simulation would become hard (asymptotically in large N).
It is straightforward to note that T ≤ 2 is easy, as the output state U|0⟩ is given by a tensor product of decoupled dimers (T = 1) or one-dimensional subsystems each hosting an MPS of finite bond dimension (T = 2). However, starting at depth T = 3, it is possible to prepare states whose exact simulation is provably hard [46]. Surprisingly, Ref. [45] has shown that approximate sampling from 2D circuits up to a finite depth T_c ≥ 3 is in fact possible with polynomial-time classical algorithms. One of the methods they propose, dubbed space-evolving block decimation (SEBD), is based on reducing the sampled (2+1)D circuit to a (1+1)D circuit featuring measurements alongside unitary gates. Below a critical depth T_c, the state that needs to be simulated classically obeys an area-law for entanglement [63] and can thus be accurately simulated via MPS methods. The emergence of this low-entanglement phase is an instance of a general phenomenon taking place in monitored dynamics, i.e. time evolution that combines unitary interactions and measurements, which we review next.

D. Monitored dynamics
Measurements can disentangle a quantum state. Given a many-body wavefunction |ψ⟩, measuring a qubit i in the computational basis leaves behind a product state between i and the rest of the system: |ψ⟩ → |z⟩_i ⊗ |ψ_z⟩_¬i, with |ψ_z⟩_¬i ∝ ⟨z|ψ⟩ (z ∈ {0, 1} is the measurement outcome, obtained randomly from the Born rule, and ¬i denotes the rest of the system). This not only disentangles i, but may also destroy entanglement globally: as an extreme example, measuring any one qubit of a GHZ state |ψ⟩ = (|0⟩^⊗N + |1⟩^⊗N)/√2 in the computational basis leaves behind a global product state. At the same time, in systems with local interactions, entanglement is generated locally. This asymmetry between entanglement creation and destruction suggests that monitored dynamics, featuring a finite rate p of measurements alongside local unitary interactions, should generically lead to states with low entanglement, for any rate p > 0 [18]. Surprisingly, it was found that monitored dynamics can instead successfully stabilize a highly entangled phase as long as the rate of measurement p is below a critical threshold p_c > 0 [15][16][17]. The phases are characterized by the structure of entanglement in pure output states of the dynamics, |ψ_m⟩, which are indexed by the measurement record m (the sequence of classical outcomes collected during the dynamics). In particular, the scaling of entanglement entropy S_A for a subsystem A in these states undergoes a transition from an area-law S_A ∼ |∂A| (∂A is the boundary of A) to a volume-law S_A ∼ |A|. These scalings are generally washed out in the mixed state ρ = Σ_m p_m |ψ_m⟩⟨ψ_m| obtained by discarding the measurement record.
The stability of entanglement in the volume-law phase (p < p_c) is explained qualitatively by the emergence of a quantum error-correcting code that manages to hide information from future measurements for a long time [21,25,64]. This coding perspective is illustrated concretely by the behavior of a reference qubit R initially entangled with a monitored many-body system [26]. Let S_R(t) be the entanglement entropy of the reference qubit as a function of time t in the monitored dynamics; at late times one generally has S_R(t) ∼ e^{−t/τ}, with a time constant τ dubbed the purification time. The behavior of τ changes sharply at the transition. In the volume-law phase it obeys τ ∼ exp(N), signifying a successful encoding of at least some information about the state of R, which is protected from measurements for a long time. In the area-law phase τ becomes O(1), showing that the encoding fails. Finally, at the critical point (p = p_c) τ diverges algebraically, τ ∼ N^{z/d}, with z a dynamical critical exponent (typically z = 1 [40]) and d the spatial dimension. In the following we will make use of the purification time as a practical diagnostic for the underlying entanglement phase, as it is numerically more efficient to compute than the entanglement entropy of large subsystems.
In one dimension, the entanglement phase transition also corresponds to a transition in the classical simulation complexity of the dynamics: area-law states in 1D have constant entanglement, and can be efficiently simulated with MPS methods. This is the principle behind the SEBD algorithm [45], in which a 2D sampling task is reduced to a (1+1)D monitored dynamics and simulated by MPS methods in the area-law phase. Ref. [65] has extended this approach to continuous-time Markovian open-system dynamics: the system-environment coupling, in the form of a Lindbladian master equation, can be "unraveled" into trajectories [66] (stochastic pure-state evolutions) that contain quantum measurements, which in turn can lower entanglement and help the accuracy of MPS simulation. As the unraveling is a simulation artifact and is not physical, it can be optimized to minimize the amount of entanglement. Ref. [65] proposes an adaptive scheme that chooses the unraveling with the lowest expected value of the entropy at each position and time during the evolution; above a threshold noise strength, the trajectories enter an area-law phase and efficient MPS simulation becomes possible.
In this work we build on the approaches of Refs. [45, 65] to address the problem of sampling from noisy, shallow circuits in 2D.

III. UNRAVELING NOISE INTO MONITORED TRAJECTORIES
In this Section we review the basics of noise models and their unraveling, and discuss the entanglement-optimal unraveling to use in the noisy-SEBD algorithm.

A. Noise models and unraveling
In quantum computers, it is often a good approximation to treat the inevitable interactions with the environment as Markovian noise, modeled by quantum channels (completely-positive trace-preserving maps on the space of density matrices [67]). Mathematically, a quantum channel Φ can be represented as a sum over Kraus operators {M_i}, Φ(ρ) = Σ_i M_i ρ M_i†, which must obey the trace-preservation condition Σ_i M_i† M_i = I. As examples, the dephasing channel can be represented by Φ(ρ) = (1 − ε)ρ + ε ZρZ, i.e. with Kraus operators {√(1−ε) I, √ε Z}, and the depolarizing channel by Φ(ρ) = (1 − ε)ρ + (ε/3)(XρX + YρY + ZρZ), with Kraus operators {√(1−ε) I, √(ε/3) X, √(ε/3) Y, √(ε/3) Z}. In both cases ε is the noise strength.
One important property of the Kraus representation is that it is not unique. For a given quantum channel, different sets of Kraus operators are equivalent under unitary transformations U: M_i′ = Σ_j U_ij M_j. This equivalence also holds for non-square, semi-unitary transformations U that satisfy only U†U = I. As an important consequence, even when a channel Φ can be unraveled into unitary processes (such as the dephasing and depolarizing channels above), this equivalence allows the freedom to choose an unraveling into non-unitary operators, which correspond to measurements. For example, the dephasing channel can be unraveled into a stochastic projective measurement, with Kraus operators {√(1−2ε) I, √(2ε) P₊, √(2ε) P₋} and P_± = (I ± Z)/2 (i.e., a projective measurement of Z taking place with probability 2ε). It can also be unraveled into a weak measurement of Z, e.g. with Kraus operators M_1 = cos θ √(1−ε) I + sin θ √ε Z and M_2 = −sin θ √(1−ε) I + cos θ √ε Z, where θ is a free parameter tuning the strength of the measurement (θ = 0 returns a unitary unraveling).
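This gauge freedom is easy to verify numerically: mixing the Kraus operators of the unitary unraveling of dephasing by a rotation angle θ yields a different, weak-measurement-like unraveling of the same channel. A minimal sketch with numpy (the specific values of ε and θ are arbitrary):

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def channel(rho, kraus):
    """Apply a channel given by a list of Kraus operators."""
    return sum(M @ rho @ M.conj().T for M in kraus)

eps, theta = 0.05, 0.3
# Unitary unraveling of dephasing: apply Z with probability eps.
unitary = [np.sqrt(1 - eps) * I2, np.sqrt(eps) * Z]
# Rotating the Kraus operators into each other (gauge freedom) gives a
# non-unitary, measurement-like unraveling of the same channel.
c, s = np.cos(theta), np.sin(theta)
weak = [c * unitary[0] + s * unitary[1], -s * unitary[0] + c * unitary[1]]

rho = np.array([[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]])
```

Both Kraus sets act identically on any density matrix, and both satisfy the trace-preservation condition, even though the individual operators in `weak` are non-unitary.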

B. Sampling noisy circuits
Returning to the noisy circuit problem, we wish to sample from the distribution P_N(z) = ⟨z| N(|0⟩⟨0|) |z⟩, with N the channel describing the noisy circuit evolution: N = Φ^⊗N ∘ U_T ∘ ⋯ ∘ Φ^⊗N ∘ U_1. Here Φ is a single-qubit noise channel of strength ε as before, while U_t(ρ) = U_t ρ U_t† denotes the action of the t-th layer of unitary gates. By expanding each Φ into its Kraus operators, Φ(ρ) = Σ_i M_i ρ M_i†, we can rewrite the probability distribution P_N(z) as P_N(z) = Σ_m |⟨z| M_m |0⟩|², with M_m = (⊗_x M_{m_{T,x}}) U_T ⋯ (⊗_x M_{m_{1,x}}) U_1, where each index m_{t,x} labels the Kraus operator for the noise channel Φ acting on qubit x at step t, and m is a shorthand for the whole collection of indices {m_{t,x}}. Thus the operators {M_m} are a set of Kraus operators for the channel N. We can view each m as a quantum trajectory of the evolution: the initial state |0⟩ evolves into the pure state |ψ_m⟩ ∝ M_m |0⟩, and P(z, m) = |⟨z| M_m |0⟩|² is the joint probability of drawing trajectory m and sampling bitstring z at the end. Then, Eq. (11) can be written as P_N(z) = Σ_m P(z, m), i.e. the marginal distribution obtained by summing over trajectories. This insight is widely used in simulations of open-system dynamics [66,[68][69][70]. The trajectory method allows one to simulate pure states rather than density matrices, which is often much more memory-efficient. This comes at the expense of having to average over many trajectories, e.g. in order to Monte Carlo-sample the expectation value of a target operator. In our case, however, we only aim to sample from the distribution P_N(z), so there is no need for trajectory averaging: any joint sample (z, m) drawn from a simulation of the trajectory dynamics yields a valid sample z from the desired distribution P_N(z). We emphasize that the m samples and their distribution are purely mathematical artifacts of the method: the decomposition of Φ into Kraus operators includes a gauge degree of freedom that can be fixed arbitrarily. However, the marginal distribution P_N(z) is gauge-invariant and physical, corresponding to the true experimental distribution of bitstrings.
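That joint trajectory samples (z, m) yield valid samples of P_N(z) can be checked on a minimal example: a single qubit rotated by a gate and then subjected to depolarizing noise, sampled either exactly from the density matrix or via random trajectories. A sketch with numpy (toy parameters, a hypothetical one-qubit "circuit"):

```python
import numpy as np

rng = np.random.default_rng(1)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])

a, eps = 0.5, 0.2
U = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]], dtype=complex)
kraus = [np.sqrt(1 - eps) * I2] + [np.sqrt(eps / 3) * P for P in (X, Y, Z)]

# Exact noisy output distribution P_N(z) from the density matrix.
rho = U @ np.diag([1.0 + 0j, 0.0]) @ U.conj().T
rho = sum(M @ rho @ M.conj().T for M in kraus)
p_exact = np.real(np.diag(rho))

def sample_trajectory():
    """Draw a Kraus index m with Born weight ||M_m psi||^2, then a bitstring z."""
    psi = U @ np.array([1.0 + 0j, 0.0])
    probs = np.array([np.linalg.norm(M @ psi) ** 2 for M in kraus])
    m = rng.choice(len(kraus), p=probs / probs.sum())
    psi = kraus[m] @ psi
    psi /= np.linalg.norm(psi)
    return rng.choice(2, p=np.abs(psi) ** 2)

freq = np.bincount([sample_trajectory() for _ in range(50_000)], minlength=2) / 50_000
```

The empirical frequencies of the trajectory sampler converge to p_exact without any averaging over trajectories, and the same holds for any gauge-equivalent choice of Kraus operators.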
Beyond the practical advantage of simulating pure rather than mixed state evolution, the gauge degree of freedom in the choice of unraveling can be exploited to minimize entanglement within the trajectory [65], thus extending the applicability of tensor network methods. Below we address the question of which unraveling yields the lowest entanglement for a given channel.

C. Entanglement-optimal unravelings
For a given model of dynamics (e.g. a Hamiltonian or an individual instance of a RUC), one can locally optimize the unraveling of each noise channel Φ separately [65]. Here we take a simpler approach, and look for a general prescription that works well on average over RUCs. Specifically, we aim to exploit the area-law phase in monitored dynamics (Sec. II D), which is driven by the density of measurements in the dynamics. Thus we look for an unraveling of Φ into measurements, so as to increase the effective density of measurements and facilitate a transition to the area-law phase [71]. As we have seen in Sec. III A, there are multiple inequivalent ways of unraveling noise into measurements (e.g. weak vs. stochastic projective measurements); therefore we look for the unraveling M = {M_i} of the noise channel Φ that has the strongest disentangling effect on the dynamics.
Working at a "mean-field" level in Haar-random circuits, we consider the scaling of the average purity in trajectories of the dynamics within a two-replica setting. This maps onto the partition function of a Z₂ Ising magnet, whose ordered and disordered phases correspond to volume-law and area-law entanglement, respectively. Our goal is to facilitate simulation by minimizing entanglement; therefore we aim to minimize the couplings in the magnet. The problem is analyzed in Appendix A, where we show that the minimization of the coupling amounts to maximizing the objective function x(M) = Σ_i p_i Tr[ρ_i²], where p_i = Tr[M_i (I/2) M_i†] and ρ_i = M_i (I/2) M_i†/p_i. This has a natural physical interpretation: if we view the Kraus operators {M_i} as instruments of a generalized measurement (positive operator-valued measure {M_i† M_i}), and apply such measurement to the fully mixed state I/2, we obtain the post-measurement states ρ_i with probabilities p_i; the objective function is given by the average purity of the post-measurement states. Thus within this approach, the unraveling that minimizes many-body entanglement in the RUC dynamics is also the one that best purifies a single mixed qubit. This is plausibly a particular feature of Haar-random circuits and likely not true of more structured models, since under Haar-random unitaries I/2 is the natural averaged reduced density matrix for a single qubit.
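The objective x(M) is straightforward to evaluate for any candidate unraveling. As a sketch (numpy; ε = 0.1 chosen arbitrarily), we compare three unravelings of the dephasing channel: the unitary one, the stochastic projective one, and a symmetric weak-measurement one obtained by an equal-weight mixing of the unitary Kraus operators:

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])

def objective(kraus):
    """x(M) = sum_i p_i Tr[rho_i^2], with rho_i = M_i (I/2) M_i^dag / p_i
    the post-measurement states of a fully mixed qubit."""
    x = 0.0
    for M in kraus:
        sigma = M @ M.conj().T / 2          # unnormalized post-measurement state
        p = np.real(np.trace(sigma))
        x += p * np.real(np.trace((sigma / p) @ (sigma / p)))
    return x

eps = 0.1
unitary = [np.sqrt(1 - eps) * I2, np.sqrt(eps) * Z]
projective = [np.sqrt(1 - 2 * eps) * I2, np.sqrt(2 * eps) * P0, np.sqrt(2 * eps) * P1]
weak = [np.sqrt((1 - eps) / 2) * I2 + np.sqrt(eps / 2) * Z,
        np.sqrt((1 - eps) / 2) * I2 - np.sqrt(eps / 2) * Z]
x_u, x_p, x_w = objective(unitary), objective(projective), objective(weak)
```

The unitary unraveling leaves the qubit fully mixed and scores 1/2, the projective one scores higher, and the weak-measurement one scores highest, consistent with the conclusion that weak measurements are the most disentangling.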
We also note that this cost function (the post-measurement purity of a mixed qubit) is in fact identical to the entanglement of a Bell-pair state after measuring one qubit, which was proposed in Ref. [71] as a heuristic measure of the disentangling power of different unravelings.
The optimal unraveling M^opt is given by M^opt = arg max_M x(M), where M = {M_i} ranges over Kraus decompositions of Φ, and is thus subject to the (semi-)unitary gauge freedom of Eq. (7). In general, the semi-unitary gauge freedom makes the optimization nontrivial, as it allows for infinitely many parameters. However, below we show that for a broad and physically relevant class of quantum channels there exists an optimal unraveling of minimal rank which can be obtained analytically.
Let us consider unital qubit channels, which are defined as those that leave the fully mixed state I/2 invariant. Up to unitary transformations (which we can ignore in the RUC setting), unital qubit channels have the canonical form [73] Φ(ρ) = Σ_α p_α σ_α ρ σ_α, with p_α ≥ 0, Σ_α p_α = 1, and σ_0 = I. We start with a set of n Kraus operators M = {M_i}, where each element can be represented as M_i = a_i I + b_i ũ_i · σ, where a_i and b_i are real non-negative numbers³, ũ_i = (e^{iφ_{i,x}} u_{i,x}, e^{iφ_{i,y}} u_{i,y}, e^{iφ_{i,z}} u_{i,z}) are complex unit vectors (i.e. ũ_i* · ũ_i = 1), and σ = (X, Y, Z). The Kraus operators of Eq. (17) must yield the Choi matrix of the unital channel Φ in Eq. (16), i.e. they must satisfy Σ_i M_i ρ M_i† = Σ_α p_α σ_α ρ σ_α for all ρ. This gives the following four relations: Σ_i a_i² = p_0, Σ_i b_i² = 1 − p_0, Σ_i a_i b_i ũ_i = 0, and Σ_i b_i² ũ_{i,α} ũ_{i,β}* = p_α δ_{αβ}. The optimization of the target function x(M) subject to these constraints is carried out in Appendix B. The objective function is maximized when a_i = √(p_0/n) and b_i = √((1 − p_0)/n) for all i, and the unit vectors ũ_i are real: ũ_i = u_i = (u_{i,x}, u_{i,y}, u_{i,z}). The remaining constraints on the vectors {u_i}, Eq. (20), read Σ_i u_i = 0 and (1/n) Σ_i u_{i,α} u_{i,β} = [p_α/(1 − p_0)] δ_{αβ}. As an example, a solution with n = 4 is given by the vertices of a regular tetrahedron inscribed in the Bloch sphere, up to a rescaling √(3p_α/(1 − p_0)) of each axis.

FIG. 2 (caption, in part). The region enclosed by the solid loop corresponds to the effective 1D subsystem, and the region enclosed by the dashed loop corresponds to sites which are measured after applying gates in the past lightcone. (c) Equivalent 1D circuit for T = 4 and Lx = 5 with gate sequence ABCD. The effective 1D system has size 2Lx = 10, with gates up to the third nearest neighbor, and each measurement is followed by a reset to the |0⟩ state. The dashed lines enclose a unit cell whose architecture repeats periodically in time.
In the case of p_x = p_y = p_z (depolarizing noise, Eq. (6)), the conditions in Eq. (21) become E_i[u_i] = 0 and E_i[u_{i,α} u_{i,β}] = (1/3) δ_{αβ}, where E_i denotes averaging over i with respect to the uniform probability distribution Pr(i) = 1/n. The conditions in Eq. (23) define a spherical 2-design, i.e. a probability distribution on the sphere whose first two moments coincide with those of the uniform distribution. Such distributions exist if n = 4 or n ≥ 6. In particular, the minimal (n = 4) optimal unraveling is M_i = √((1−ε)/4) I + √(ε/4) u_i · σ, i = 1, …, 4, i.e., a weak measurement along 4 directions corresponding to the vertices of a regular tetrahedron. Similarly, a solution to Eq. (23) with n = 6 is given by the vertices of a regular octahedron, corresponding to weak measurements of the Pauli X, Y and Z operators.
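The tetrahedral construction can be checked directly: the four unit vectors form a spherical 2-design, and the corresponding weak-measurement Kraus operators reproduce the depolarizing channel. A numpy sketch (ε = 0.1 is arbitrary; the amplitudes √((1−ε)/4) and √(ε/4) follow the n = 4 solution above):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
I2 = np.eye(2, dtype=complex)

# Tetrahedron unit vectors: a spherical 2-design with n = 4.
u = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

eps = 0.1
a, b = np.sqrt((1 - eps) / 4), np.sqrt(eps / 4)
kraus = [a * I2 + b * (v[0] * X + v[1] * Y + v[2] * Z) for v in u]

# Compare the unraveled channel with the depolarizing channel on a test state.
rho = np.array([[0.6, 0.1 - 0.2j], [0.1 + 0.2j, 0.4]])
depol = (1 - eps) * rho + (eps / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)
out = sum(M @ rho @ M.conj().T for M in kraus)
```

The first-moment condition Σ_i u_i = 0 kills the cross terms, and the second-moment (2-design) condition reproduces the isotropic Pauli-error part, so the four weak-measurement operators are indeed a valid unraveling of depolarizing noise.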
For the dephasing channel Eq. (5) (n = 2), an optimal unraveling is M_± = √((1−ε)/2) I ± √(ε/2) Z. In all these cases, the most disentangling unraveling takes the form of weak measurements rather than stochastic projective measurements (e.g. Eq. (8) for the case of dephasing).
In Appendix C we verify that, for the optimal unraveling Eq. (25), the measurement-induced phase transition occurs at a lower value of ε (ε_c ≃ 0.044) compared to the unraveling into stochastic projective measurements Eq. (8) (ε_c ≃ 0.084; note the probability of doing a measurement is p = 2ε and the well-known MIPT critical point is at p_c ≃ 0.168 [27]). We see that our simple mean-field approach is already sufficient to lower the critical noise strength by almost a factor of 2. It would be interesting to test whether locally- and adaptively-optimized unravelings as in Ref. [65] could further improve this threshold. For the rest of this work, we will use the optimal unravelings in Eq. (24) and Eq. (25) for depolarizing and dephasing noise channels, unless otherwise specified.
Before moving on, we note that, although the above solution works only for unital channels, other physically relevant models may also be analytically tractable. In particular, it is straightforward to find an optimized unraveling for the amplitude damping channel, a paradigmatic non-unital channel defined by Kraus operators K_0 = |0⟩⟨0| + √(1−ε) |1⟩⟨1| and K_1 = √ε |0⟩⟨1|. In Appendix D we consider the optimization over 2 × 2 unitary rotations of these Kraus operators and derive the optimized unraveling. Furthermore, we numerically verify the disentangling effect of this unraveling by studying the critical noise strength of the measurement-induced phase transition: ε_c ≈ 0.17 for the optimized unraveling, compared with ε_c ≈ 0.29 for the original unraveling, again almost a factor of 2 improvement. Finally, beyond analytically tractable cases, one can always perform the optimization numerically for arbitrary single-qubit noise models, by restricting to a finite number of Kraus operators.
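As a quick consistency check on the amplitude damping channel (a sketch in numpy; ε = 0.3 is arbitrary), one can verify trace preservation and non-unitality, i.e. that the fully mixed state is pushed toward |0⟩:

```python
import numpy as np

eps = 0.3
# Standard Kraus operators of amplitude damping: K0 = |0><0| + sqrt(1-eps)|1><1|,
# K1 = sqrt(eps)|0><1|.
K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - eps)]])
K1 = np.array([[0.0, np.sqrt(eps)], [0.0, 0.0]])

# Trace preservation: K0^dag K0 + K1^dag K1 = I.
completeness = K0.conj().T @ K0 + K1.conj().T @ K1

# Non-unitality: the fully mixed state I/2 is not a fixed point.
rho_out = K0 @ (np.eye(2) / 2) @ K0.conj().T + K1 @ (np.eye(2) / 2) @ K1.conj().T
```

The populations of I/2 map to ((1+ε)/2, (1−ε)/2), confirming that the channel is non-unital, so the unital analysis above does not apply and the separate optimization of Appendix D is needed.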

IV. NOISY-SEBD ALGORITHM

A. Description of the algorithm
We consider noisy random circuits in 2D with finite depth T acting on a grid of N = L_x × L_y qubits. The goal of our algorithm is to sample bitstrings z from the distribution P_N(z), Eq. (11), determined by the circuit instance and the noise channel Φ (whose parameters we assume are known), with a small error.
Let us first review the noiseless case, studied in Ref. [45] and illustrated in Fig. 1(a). Due to locality, the outcome z_i on any given qubit only depends on the evolution within its past lightcone. Thus, to sample all the outcomes z_i on the first row of qubits, y = 1, we only need to apply gates and channels on qubits within the past lightcone of the line {(x, y = 1, t = T)}, which includes qubits with y ≤ T. This corresponds to a circuit of depth T on at most L_x T qubits, which can be simulated efficiently via MPS methods. At this point one can successfully sample the outcomes z_i for the first row of qubits, move on to the second row (y = 2) iterating the same approach, performing only the gates and channels within the past lightcone of {(x, y = 2, t = T)}, and so on. This effectively maps the 2D shallow circuit to an equivalent 1D circuit, where the readout measurements are converted to mid-circuit measurements and resets (see Fig. 2(c) for an example). As a result of this mapping, the spatial direction along L_y becomes the time direction for the 1D circuit (an idea also known as space-time duality in quantum circuits [37][38][39]). The SEBD algorithm is based on MPS simulation of this effective 1D dynamics for the purpose of sampling the z outcomes.
The issue with iterating this approach indefinitely is that the entanglement of the quasi-1D state of L_x × T qubits in principle grows with each step, up to the point where MPS simulation fails. However, Ref. [45] observed that, due to the mid-circuit measurements, the effective dynamics may enter an area-law phase, wherein the entanglement remains finite and approximate sampling can be carried out efficiently with high accuracy. In general, the effective 1D circuit consists of N_1D = cT L_x qubits, where c is a constant depending on the lattice geometry (the slope of the lightcone; in the models considered here c ≈ 1/2), and the spatial range of the two-qubit gates is proportional to T. Equivalently, one may view the effective system as a quasi-1D strip of size L_x × cT with nearest-neighbor gates in both directions. Either way, each circuit layer on cT L_x qubits is followed by the measurement of L_x qubits (a full row), giving a ratio of measurements to unitary operations of ∼ 1/T. When this ratio is sufficiently high (i.e., T sufficiently low), the system enters an area-law phase and SEBD is efficient.
Let us now add noise to the picture. As discussed in Sec. III B, we can unravel the noise channels Φ into an arbitrary set of Kraus operators, simulate the pure-state trajectories, sample from the joint distribution P_N(z, m), and keep only the z samples. We adopt the entanglement-optimal unraveling of unital noise channels discussed in Sec. III C to suppress entanglement in the effective 1D state at the level of quantum trajectories. The simulated dynamics now features mid-circuit measurements with two distinct origins: a density ∝ 1/T coming from the final sampling step in the 2D circuit, and a density ∝ ε coming from the unraveling of noise. Below a critical noise rate ε_c, dependent on the model and the circuit depth T, the entanglement of the effective 1D state obeys an area law, allowing efficient classical simulation. These ideas are schematically summarized in Fig. 1.
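Concretely, trajectory sampling of an unraveled channel proceeds by picking a Kraus outcome with its Born probability and renormalizing. The toy numpy sketch below uses the projective unraveling of dephasing, whose measurement probability p = 2ε matches the rate quoted earlier; the input state and sample count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
I2, Z = np.eye(2), np.diag([1.0, -1.0])

# Projective unraveling of dephasing with strength eps:
# "do nothing" with prob 1 - 2*eps, else project onto |0> or |1>.
eps = 0.1
kraus = [np.sqrt(1 - 2 * eps) * I2,
         np.sqrt(2 * eps) * np.diag([1.0, 0.0]),
         np.sqrt(2 * eps) * np.diag([0.0, 1.0])]

def step(rho):
    """One trajectory step: sample Kraus index i with p_i = tr(M_i rho M_i^dag),
    then return the normalized post-measurement state."""
    probs = np.array([np.real(np.trace(M @ rho @ M.conj().T)) for M in kraus])
    i = rng.choice(len(kraus), p=probs / probs.sum())
    out = kraus[i] @ rho @ kraus[i].conj().T
    return out / np.trace(out)

# averaging many trajectories recovers the dephasing channel itself
rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+|
avg = sum(step(rho) for _ in range(20000)) / 20000
target = (1 - eps) * rho + eps * (Z @ rho @ Z)
assert np.max(np.abs(avg - target)) < 0.02
```

Individual trajectories are pure(r) and less entangled than the averaged state, which is what noisy-SEBD exploits; only the ensemble average reproduces the channel.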

B. Numerical results: entanglement phase transition
In this section, we numerically study the entanglement phase transition in the effective 1D subsystem that is used in the noisy-SEBD algorithm (Fig. 2(c)). As an example, we choose shallow circuits that act on a 2D square lattice (Fig. 2(a-b)) with unitary gates similar to those employed in Google's RCS experiment [6]. The two-qubit gates are iSWAP-like fermionic simulation gates sandwiched between single-qubit rotations randomly chosen from the set {X^{±1/2}, Y^{±1/2}, W^{±1/2}, V^{±1/2}}, where W = (X + Y)/√2 and V = (X − Y)/√2 are Hadamard-like gates (note that W^{±1/2} and V^{±1/2} are non-Clifford). The two-qubit gates are applied to bonds of the square lattice according to the sequence ABCD for T = 4 (Fig. 2(a)) and ABCDB for T = 5 (Fig. 2(b)). These specific sequences are chosen to be the most entangling for the effective 1D dynamics, defined as giving the longest single-qubit purification time, as discussed in the following. The noise channel Φ is taken to be the depolarizing channel, Eq. (6), with strength ε.
As a diagnostic for the entanglement phase transition which underpins the efficiency of noisy-SEBD, we use the single-qubit purification time τ discussed in Sec. II D. This is advantageous from the numerical point of view, relative to a direct calculation of the bipartite entanglement entropy, since it does not suffer from the large finite-size drifts arising from the logarithmic divergence of the entropy at the critical point [27]. The phase transition between area-law and volume-law entanglement scaling is probed by the mutual information between a single reference qubit and the rest of the system. In practice, we introduce a reference qubit which initially forms a Bell pair with a system qubit. At later times, the average entropy of the reference qubit decays exponentially, S_R(t) ∼ e^{−t/τ}. In the volume-law phase (low measurement/noise rate), S_R can remain nonzero for a long time, τ ∼ exp(L_x). Physically this implies that there is finite measurement-induced entanglement between opposite ends of the system as one takes L_x, L_y to infinity jointly with L_y = poly(L_x); this can be interpreted as emergent quantum teleportation [74] and has recently been explored experimentally [44]. On the other hand, in the area-law phase (high measurement/noise rate), S_R decays rapidly to zero, with τ = O(1). At the critical point, we expect τ ∼ L_x^z, where z is the dynamical critical exponent (typically z = 1 for measurement-induced transitions in short-range interacting 1D systems [26, 27]).
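The purification diagnostic can be illustrated with a toy dense-statevector sketch (not the paper's MPS setup; the system size, gate pattern, and all-projective measurements are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(d):
    """Haar-random d x d unitary via QR decomposition with phase fix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def apply_op(psi, op, qubits, n):
    """Apply a k-qubit operator to the listed qubits of an n-qubit state."""
    k = len(qubits)
    psi = np.tensordot(op.reshape([2] * (2 * k)), psi.reshape([2] * n),
                       axes=(list(range(k, 2 * k)), qubits))
    return np.moveaxis(psi, range(k), qubits).reshape(-1)

def measure_z(psi, q, n):
    """Projective Z measurement of qubit q; returns the normalized post-state."""
    p0 = np.sum(np.abs(np.take(psi.reshape([2] * n), 0, axis=q)) ** 2)
    out = 0 if rng.random() < p0 else 1
    proj = np.zeros((2, 2)); proj[out, out] = 1.0
    psi = apply_op(psi, proj, [q], n)
    return psi / np.linalg.norm(psi)

def ref_entropy(psi):
    """Von Neumann entropy of the reference (leftmost) qubit."""
    m = psi.reshape(2, -1)
    ev = np.linalg.eigvalsh(m @ m.conj().T)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

n = 5  # qubit 0 is the reference; qubits 1..4 form the "system"
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.eye(4)[:, [0, 1, 3, 2]]

psi = np.zeros(2 ** n, dtype=complex); psi[0] = 1.0
psi = apply_op(psi, H, [0], n)
psi = apply_op(psi, CNOT, [0, 1], n)       # Bell pair: reference <-> system
assert abs(ref_entropy(psi) - np.log(2)) < 1e-9

for _ in range(3):                          # scramble the system qubits
    for q in range(1, n - 1):
        psi = apply_op(psi, haar_unitary(4), [q, q + 1], n)
assert ref_entropy(psi) > 0.5               # still entangled with the system

for q in range(1, n):                       # measure every system qubit
    psi = measure_z(psi, q, n)
assert ref_entropy(psi) < 1e-9              # reference fully purified
```

In the monitored circuit the measurement rate is finite, so the reference purifies only gradually, with the characteristic time τ used as the order-parameter proxy in the text.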
In Fig. 3, we show a finite-size scaling analysis of τ/L_x for both T = 4 (gate sequence ABCD) and T = 5 (gate sequence ABCDB), obtained from MPS simulation of the space-wise dynamics up to L_y = 2L_x, where only L_x ≤ L_y ≤ 2L_x is used for fitting to avoid early-time transient effects. The existence of a finite-size crossing point of the ratio τ/L_x for all system sizes (Fig. 3(a,b)) indicates z = 1, which is consistent with the emergence of 1+1D conformal symmetry at the transition. Therefore, we use the scaling ansatz to determine the location of the critical point ε_c and the correlation-length critical exponent ν. From the data collapse, Fig. 3(c,d), for both values of T we find a correlation-length exponent ν ≈ 1.3, consistent with the known value for the measurement-induced phase transition in 1D, as expected. Moreover, we see that increasing the circuit depth T leads to a larger critical noise rate ε_c, which is the qualitative behavior sketched in Fig. 1(d).
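A finite-size collapse of this kind can be quantified with a simple objective on the sorted, rescaled data. The sketch below uses synthetic data; the scaling function, parameter values, and residual definition are illustrative stand-ins, not the paper's (the paper's objective, following Ref. [27], is defined in Appendix E).

```python
import numpy as np

# Hypothetical smooth crossing function and "true" critical parameters.
eps_c_true, nu_true = 0.10, 1.3
F = lambda u: 1.0 / (1.0 + np.exp(4 * u))

def collapse_residual(data, eps_c, nu):
    """Quality-of-collapse objective: sort points by the scaled variable
    x = (eps - eps_c) * L**(1/nu) and penalize jumps between neighboring
    y = tau/Lx values; perfect collapse gives a small residual."""
    pts = sorted(((eps - eps_c) * L ** (1.0 / nu), y) for L, eps, y in data)
    ys = np.array([y for _, y in pts])
    return float(np.sum(np.diff(ys) ** 2))

# Synthetic data obeying the ansatz tau/Lx = F((eps - eps_c) * Lx^(1/nu)) exactly.
data = [(L, eps, F((eps - eps_c_true) * L ** (1.0 / nu_true)))
        for L in (8, 16, 32) for eps in np.linspace(0.05, 0.15, 11)]

best = collapse_residual(data, eps_c_true, nu_true)
worse = collapse_residual(data, 0.13, 2.0)
assert best < worse  # the true parameters give the better collapse
```

Minimizing such a residual over (ε_c, ν) is one standard way to extract the critical point and exponent from crossing data.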
These numerical results support our qualitative expectations for the complexity of noisy-SEBD. Since the hardness of the method is exponential in T, obtaining accurate predictions for the phase boundary at larger T becomes increasingly challenging. However, it is reasonable to conjecture that, for large T: (i) as the quasi-1D system of cT × L_x qubits approaches a 2D limit, one should recover the 2D measurement-induced phase transition, which occurs at a finite noise rate ε_{c,2D} [24, 75]; (ii) the transition should occur at a finite total noise rate, comprising the unraveled measurements (rate ε) and the sampling of the final state (rate ∼ 1/T). We find evidence in support of these conjectures in Clifford circuits, which can be simulated in polynomial time by the stabilizer method [49] (though note that this limits us to the projective unraveling of the noise channel); see Appendix F.
Finally, in Appendix G we verify the accuracy of the noisy-SEBD algorithm by benchmarking its output against direct MPO simulations of the noisy dynamics and against stabilizer simulations of Clifford circuits.

C. Application: IBM quantum processors
Quantitatively, the results in the previous section show that circuits on square-lattice architectures, on NISQ platforms such as Google's Sycamore processor with native iSWAP-like gates and ≳ 98% two-qubit gate fidelities (translating to ε ≲ 0.01 in our parametrization, see Appendix H), are already in the "hard phase" at depth T = 4. Since depth T = 3 is in the easy phase already for noiseless SEBD (i.e., for ε = 0), the inclusion of noise does not allow efficient simulation of an additional gate layer in this setting. This class of circuits is, however, a worst-case scenario, representing highly scrambling dynamics optimized for hardness of simulation [6, 76].
Here we consider the application of noisy-SEBD to circuits on quantum processors with heavy-hexagon geometry and native CNOT gates, such as IBM Quantum's family of processors, Fig. 4(a,b). We consider again highly scrambling random circuits with iSWAP two-qubit gates and single-qubit gates randomly chosen from {X^{±1/2}, Y^{±1/2}, W^{±1/2}, V^{±1/2}}. An iSWAP gate can be compiled into two native CNOT gates, plus single-qubit gates. Thus the effective noise rate for iSWAP gates is twice the CNOT error, giving, e.g., ≈ 96% median gate fidelity on the Osprey processor (corresponding to ε ≈ 0.025 in our parametrization, see Appendix H).
We consider subsystems of a qubit array based on the Condor processor, Fig. 4(a). The full system has N = 1,121 qubits and linear size L_x = 43. The gate sequence and unit cell for the SEBD algorithm are shown in Fig. 4(b) for the N = 65, L_x = 11 Hummingbird processor. We characterize the efficiency of the (noisy-)SEBD algorithm by computing the half-system bipartite entanglement entropy S, Fig. 4(c), and the single-qubit purification time τ, Fig. 4(d), for both noiseless and noisy random circuits with depth T = 5 (measured in units of iSWAP gates, thus corresponding to 10 CNOT gates on each qubit). As the linear system size is varied between L_x = 7 and L_x = 43, we observe a clear volume-law phase in the noiseless case, with volume-law entropy S ∝ L_x and exponential purification time τ ∼ exp(L_x). In contrast, for noise rate ε = 0.02, we see a weak growth of S with L_x, consistent with critical scaling S ∼ ln(L_x) or eventual saturation to an area law. The ratio τ/L_x also decreases, either to a finite constant (critical behavior) or to zero (area-law behavior). Finally, for ε = 0.025, both diagnostics are indicative of an area-law phase.
We conclude that in this case the inclusion of a realistic noise rate in the simulation algorithm is sufficient to drive a transition in the complexity of the SEBD algorithm. This makes the difference between an asymptotically-efficient and an asymptotically-hard MPS simulation of the sampling problem. We note that, while circuits with low depth such as T = 5 can likely be simulated by brute-force tensor network methods up to hundreds or thousands of qubits, the cost of such methods remains generically exponential in the linear size of the system L_x. On next-generation processors with N ∼ 10^5 qubits those methods would become intractable, while noisy-SEBD would remain practical in the easy phase.

V. DISCUSSION
We have introduced a classical algorithm, noisy-SEBD, to sample from the output distribution of noisy, shallow circuits in two dimensions. The algorithm uses the insight of mapping a 2D RCS problem to a 1D monitored dynamics problem (space-evolving block decimation, SEBD [45]), while also unraveling the action of noise into additional measurements on the system. At sufficiently low depth and sufficiently strong noise, this enables efficient MPS simulation of the monitored quantum trajectories, and thus efficient sampling from the appropriate noisy output distribution.
Given that the unraveling of noise into measurements is arbitrary, it can be optimized so as to reduce the amount of entanglement [65]. Here we have focused on a "mean-field" approach where single-qubit noise channels at all positions and times are unraveled in the same way, chosen based on the two-replica statistical mechanics description of the circuit upon averaging over random gates. We have found that, for unital channels (such as dephasing and depolarizing), the optimal unraveling is based on uniform weak measurements, rather than stochastic projective measurements. The difference between unravelings is substantial: in the standard model of brickwork circuits in 1D, the noise threshold corresponding to the measurement-induced entanglement transition is reduced by a factor of about 2 for the optimal weak-measurement unraveling (ε_c ≈ 0.04) compared to the usual projective-measurement unraveling (ε_c ≈ 0.08, i.e., measurement rate p_c = 2ε_c ≈ 0.16). This is consistent with prior observations in Ref. [71], and the optimization technique could be of independent interest for the study of measurement-induced entanglement transitions.
While noisy RCS in the anticoncentration regime was shown to be classically simulable in polynomial time based on a sampling of "Feynman paths" in Pauli operator space [10], the polynomial scaling of the algorithm features a large exponent (proportional to 1/ε, i.e., of order 100 in present-day experiments) that makes the algorithm impractical. This leaves open the question of "practical hardness" for finite-sized RCS experiments. Furthermore, the requirement of anticoncentration in Ref. [10] is not met at constant depth T (a depth diverging at least as log(N) is needed [11]), so that even the in-principle hardness of sampling noisy shallow circuits is an open problem. Our results help sharpen the requirements for such hardness by identifying a phase in the parameter space of depth T and noise strength ε [Fig. 1(d)] where noisy RCS can be classically simulated via a straightforward MPS algorithm in time ∼ N exp(T). We have located the entanglement phase transitions in circuit architectures based on real-world quantum processors. In square lattices with native iSWAP-like gates, we found that noisy-SEBD allows the efficient sampling of circuits of depth T = 3, like SEBD in the noiseless case; but for realistic noise rates of ε ≲ 1% it does not increase the depth threshold (i.e., T = 4 remains in the hard phase). On heavy-hexagon lattices with native CNOT gates, we have instead found that the inclusion of realistic noise rates can increase the depth threshold, as shown in Fig. 4(c,d).
Our results add to the growing body of work on noise-induced phase transitions in computational complexity. Recent works have studied the simulation of noisy RCS via MPS simulations truncated to constant bond dimension χ [12, 14], finding that the accumulated truncation error behaves similarly to noise in the quantum experiment, i.e., it causes an exponential decay of the linear cross-entropy, Eq. (1); to beat this classical simulation method, the quantum processor must be below a finite noise threshold. We remark that the task considered in those works is different from the one considered here. Namely, the goal in Refs. [12, 14] is to simulate the ideal (noiseless) bitstring distribution better than the noisy quantum experiment, as quantified, e.g., by fidelity or linear cross-entropy. Notably this allows for an exponentially small (in N and T) fidelity between simulation and experiment, as long as the former is closer to the ideal result. In contrast, our goal is to simulate with high accuracy the noisy bitstring distribution itself. This is a significant distinction physically, as the effect of uncontrolled MPS truncation is a priori very different from that of local noise, even at the same level of fidelity (e.g., MPS truncation does not obey locality).
Even more recently, a noise-induced phase transition in RCS has been reported [9, 47] in deep quantum circuits with noise rates scaled as ε ∼ 1/N, i.e., a constant number of errors per layer. The scaling of the linear cross-entropy [Eq. (1)] in these models was predicted to sharply change as a function of εN, from an "easy phase" where the system appears to break into finite-sized clusters from the point of view of the linear cross-entropy, to a phase where it behaves as a single large cluster. The former phase is easy to spoof classically, while the latter is conjectured to be practically hard. This transition is also different from the one studied here. For one, it applies to a vanishing noise rate ε = O(1/N) rather than a finite ε = O(1). Additionally, and more importantly, it is a phase transition in an observable (albeit a complex one like the linear cross-entropy) that reflects an intrinsic property of the system, whereas the transition studied here is a property of a simulation algorithm (noisy-SEBD) that is not intrinsic to the system.
A distinctive aspect of the problem we study here is that the simulation task requires precise knowledge of the noise model on the quantum processor, which is not fully controlled or programmable, and whose characterization is a separate challenge [77]. For this reason, the sampling task simulated by noisy-SEBD is quite different from the standard (noiseless) one. In other words, given a noise model, there is a fully well-defined simulation task; however, comparison with real-world noisy devices is more subtle, as the "true" noise model for such devices is never completely known. At the same time, this feature of the problem (which is generic to simulation algorithms for noisy systems) opens up an interesting direction for future research, namely using noisy-SEBD as a noise benchmarking tool. One could run noisy-SEBD (or other algorithms for the same task) with a parametrized noise model, compare its output with that of real quantum hardware, and use the comparison to optimize the noise parameters. To this end, it would be useful to generalize our discussion to more complex noise models beyond uncorrelated single-qubit errors. We leave these ideas as directions for future research.
Finally, we emphasize again that noisy-SEBD is only one possible strategy for classical sampling. It remains an interesting open question to identify other simulation algorithms with polynomial cost O(N^c), with c an ε-independent constant (unlike in the Feynman path sampling approach of Ref. [10] or the direct MPO approach for 1D circuits of Ref. [61]), below a finite noise threshold ε_c. The possibility of an intrinsically hard phase at finite depth (before anticoncentration and the results of Ref. [10] apply) is also an interesting open question, with noisy-SEBD providing a constraint on this hypothetical phase boundary. Another interesting direction for future work is the possibility of extending these results to higher dimensions or all-to-all connected systems, e.g., trapped-ion quantum computers. While MIPTs are known to arise in all these cases, the ensuing area law for entanglement can only be exploited for efficient MPS simulation in 1D (including in the effective 1D subsystems of shallow 2D circuits used in SEBD). Whether other formulations of the MIPT, e.g., the dynamical purification approach [25, 26], can be exploited for efficient simulation in more general systems is an interesting open question.
Note added. Upon completion of this manuscript, we became aware of an independent work on related topics appearing in the same arXiv posting [78]. Our works are complementary and our results agree where they overlap.

ACKNOWLEDGMENTS

M. I. thanks Aram Harrow for helpful discussions. We thank Soonwon Choi for bringing Ref. [71] to our attention. Numerical simulations were carried out on computational resources provided by the Texas Advanced Computing Center (TACC) and on the Stanford Research Computing Center's Sherlock cluster. M. I. was partly supported by the Gordon and Betty Moore Foundation's EPiQS Initiative through Grant GBMF8686.

Appendix A: Derivation of the unraveling cost function from statistical mechanical model

Here we study the most disentangling unraveling of a single-qubit noise channel Φ in the context of 1D brickwork random circuits, by mapping the entropy calculation of random circuits to a classical statistical mechanical model [22, 45].

Quasientropy
Consider a subsystem A of a one-dimensional qudit chain. The k-Rényi entropy of the reduced density matrix ρ_A is defined as S_k(A) = (1/(1−k)) log tr(ρ_A^k); for an unnormalized trajectory state ρ, it is convenient to write tr(ρ_A^k) = Z_{k,A}/Z_{k,∅}, with Z_{k,A} = tr[(tr_Ā ρ)^k] and Z_{k,∅} = (tr ρ)^k. The von Neumann entropy is given by the k → 1 limit, S_1(A) = −tr(ρ_A log ρ_A). For a hybrid random circuit, we are interested in the trajectory-averaged behavior of the k-Rényi entropy, E_C[S_k(A)], where E_C represents the combined average over Haar-random circuits, E_U, and over Kraus operators, E_M (i.e., over quantum trajectories).
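These definitions translate directly into code. A small sketch computing S_k from the entanglement spectrum and checking its basic properties (the example state is an arbitrary random two-qubit pure state):

```python
import numpy as np

def renyi_entropy(rho, k):
    """S_k = log(tr rho^k) / (1 - k); the k -> 1 limit is the von Neumann entropy."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-14]          # drop numerical zeros in the spectrum
    if abs(k - 1) < 1e-12:
        return float(-np.sum(ev * np.log(ev)))
    return float(np.log(np.sum(ev ** k)) / (1 - k))

# reduced state of one qubit from a random two-qubit pure state
rng = np.random.default_rng(1)
psi = rng.standard_normal(4) + 1j * rng.standard_normal(4)
psi /= np.linalg.norm(psi)
rho_A = psi.reshape(2, 2) @ psi.reshape(2, 2).conj().T

S1 = renyi_entropy(rho_A, 1)
# S_k approaches the von Neumann entropy as k -> 1 ...
assert abs(renyi_entropy(rho_A, 1 + 1e-6) - S1) < 1e-4
# ... and Renyi entropies are non-increasing in k
assert renyi_entropy(rho_A, 2) <= S1 + 1e-12
```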
Replica tricks can be applied to cure the average of the logarithm. However, to obtain a direct mapping to the stat-mech model, one can alternatively consider the k-quasientropy S̃_k(A) [45], i.e., the k-th moment of the entanglement spectrum, weighted by the k-th power of the measurement outcome probability. Importantly, in the limit k → 1, S̃_k(A) → ⟨S_1(A)⟩, as for the k-Rényi entropy. The k-quasientropy maps naturally onto a classical stat-mech model: the quantities E_C[Z_{k,A/∅}] can be viewed as partition functions of a (2+0)-dimensional spin model with different boundary conditions.

Generalized measurement
A general measurement procedure can be represented by a set of Kraus operators {M_i} satisfying Σ_i M_i† M_i = I [67]. For each quantum trajectory, a specific Kraus operator is chosen with probability p_i = tr(M_i ρ M_i†) and gives the updated state ρ′ = M_i ρ M_i†/p_i. However, in the calculation of the k-quasientropy, it is more convenient to re-parametrize the generalized measurement in terms of operators M̃_i and classical probabilities µ_i, with M_i = √µ_i M̃_i and tr(M̃_i† M̃_i) = q, where q is the local Hilbert space dimension. It is easy to show that µ_i = tr(M_i† M_i)/q, and that µ_i ≥ 0 and Σ_i µ_i = 1 (i.e., µ_i is a probability distribution). The advantage of this reparametrization is that it renders the k-quasientropy invariant under trivial decompositions of Kraus operators, which is important for optimizing the unraveling. As an example, consider two Kraus sets which differ by a trivial decomposition of σ_z and are hence physically equivalent (both describe the unitary transformation ρ → σ_z ρ σ_z): averages defined directly over the bare Kraus operators {M_i} differ between the two sets, while averages defined via the reparametrized operators coincide. It is also worth mentioning that, in the limit of interest (k → 1), the two formalisms are equivalent.
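The reparametrization is easy to check numerically. A sketch using the dephasing Kraus set {√(1−ε) I, √ε Z} as an example (this set is assumed to match the paper's Eq. (5) convention):

```python
import numpy as np

def reparametrize(kraus, q=2):
    """Split each Kraus operator M_i into a classical probability
    mu_i = tr(M_i^dag M_i)/q and a normalized operator Mtilde_i = M_i/sqrt(mu_i)
    with tr(Mtilde_i^dag Mtilde_i) = q."""
    mus = [np.real(np.trace(M.conj().T @ M)) / q for M in kraus]
    mtils = [M / np.sqrt(mu) for M, mu in zip(kraus, mus)]
    return mus, mtils

# example: dephasing channel with strength eps
eps = 0.2
kraus = [np.sqrt(1 - eps) * np.eye(2), np.sqrt(eps) * np.diag([1.0, -1.0])]
mus, mtils = reparametrize(kraus)

assert abs(sum(mus) - 1) < 1e-12            # mu_i is a probability distribution
for Mt in mtils:                             # each Mtilde_i carries trace weight q
    assert abs(np.trace(Mt.conj().T @ Mt) - 2) < 1e-12
```

Splitting a Kraus operator into two copies with halved weight changes the bare set {M_i} but leaves the pairs (µ_i, M̃_i) describing distinct operators unchanged, which is the invariance exploited in the text.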

Mapping to a classical statistical mechanical model
The goal of this mapping is to calculate the averaged partition functions E_C[Z_{k,X}], with Z_{k,X} = tr[ρ^{⊗k} S_X], where X = A or ∅ and S_X is the operator that implements a cyclic permutation of the k replicas in the X subsystem only. Using Haar measure calculus [79], the average over replicated gates (U ⊗ U*)^⊗k can be expanded onto permutations of the replicas, giving a partition function of "spins" valued in the permutation group on k elements, S_k.
The average over single-site measurement operators can then be evaluated by appropriately contracting with the connecting permutations. In this picture, each unitary gate can be viewed as two permutation nodes, {σ, τ} ∈ S_k, which form a honeycomb lattice as shown in Fig. 5(b). The total partition function is evaluated by contracting the nodes with appropriate weights w^{(k)}(σ_r, τ_{r′}) on the links, with distinct couplings on dashed and solid links, as shown in Fig. 5(b). The average over Haar-random unitary gates gives the coupling for the dashed links in terms of the Weingarten function Wg_{q²} [79]; these couplings are independent of the measurements. The weights on the solid links in Fig. 5(b), on the other hand, depend on the measurements: they are expressed in terms of the number of cycles c of the permutation τσ^{−1} ∈ S_k and the cycle lengths λ_c, using the convention above for averaging over Kraus operators.

Two replicas: classical Ising model
Although the combination of Eq. (A10) and Eq. (A12) gives the exact expression for the partition function, the possibly negative weights w_u(σ, τ) impede a direct mapping to a physical system with real interactions at a real temperature. For the case k = 2, this sign problem can be circumvented by integrating out all τ variables, which gives rise to a classical Ising model defined on a triangular lattice, as shown in Fig. 5(c), where ⟨σ_1, σ_2, σ_3⟩ denotes a lower-facing triangle with three neighboring vertices σ_1, σ_2, σ_3. For k = 2, the weights are parametrized by two quantities u and v, determined by the specific Kraus operators via Eq. (A12). Using the definition of µ_i, one can express the three-body interaction w^{(2)}(σ_1, σ_2, σ_3) explicitly. Due to the permutation symmetry σ → σ̄ (where σ̄ denotes the other permutation, with S_2 ≡ Z_2) and the spatial reflection symmetry σ_1 ↔ σ_2, one only needs to specify a few independent components. Furthermore, one may re-express the three-body interaction as the product of three two-body interactions, with couplings J_d and J_h for diagonally and horizontally neighboring sites respectively, whose expressions depend on the dimensionless parameter x ≡ v/u. Since u ≥ v for all Kraus operators, one has J_d ≤ 0 and J_h ≥ 0 for q ≥ 2.
The statistical mechanics of this classical Ising model can be solved exactly [80, 81], and the critical point x_c, separating the paramagnetic and ferromagnetic phases, is determined by an exact relation between the couplings. This phase transition between an ordered and a disordered phase in the statistical mechanical model then maps onto the area-law/volume-law phase transition of the 2-quasientropy in the hybrid random circuit. As we see, within this two-replica analysis the transition is controlled exclusively by the parameter x, Eq. (A23), with an ordered (volume-law) phase for x < x_c and a disordered (area-law) phase for x > x_c. Therefore, to obtain the optimal unraveling for a given quantum channel, we aim to maximize the function x(M). This justifies the use of x(M) as a cost function for the unraveling of Φ in the main text.

Dropping the subscript i for ease of notation, we have M = a σ_0 + b ũ·σ, with ũ = u_R + iu_I. (We used the identity σ_α σ_β = σ_0 δ_{αβ} + iε_{αβγ} σ_γ.) Letting v ≡ a u_R + b u_I ∧ u_R, the operator M†M has eigenvalues λ_± = (a² + b²) ± 2b∥v∥, from which the ratio in the definition of x(M), Eq. (14), follows. Focusing on the second term, we have ∥v∥² = a² cos²(θ) + b² cos²(θ) sin²(θ) sin²(χ), where ∥u_R∥ ≡ cos(θ) (note ũ is a unit vector, so ∥u_R∥² + ∥u_I∥² = 1) and χ is the angle between u_R and u_I. Since we aim to maximize x(M), we can take χ = π/2 and then maximize over θ the function f(θ) = a² cos²(θ) + b² cos²(θ) sin²(θ). The maximum depends on the ratio of a and b. The coefficients a_i and b_i are subject to the constraints Σ_i a_i² = p_0 and Σ_i b_i² = 1 − p_0, Eq. (19). For strong noise, p_0 ≤ 1/2, it is possible to choose a_i ≤ b_i for all i. This gets rid of the negative term in the sum and gives x(M) = 1, i.e., complete purification, which is optimal. However, for weak noise (p_0 > 1/2) we have Σ_i a_i² > Σ_i b_i², so it is not possible to avoid the sum over i in the above expression.
We first handle the case in which a_i > b_i for all i. In this case the bound follows from Sedrakyan's inequality (i.e., Cauchy–Schwarz applied to the vectors {√c_i} and {d_i/√c_i}).
The constraints in Eq. (19) impose Σ_i c_i = 1, and the resulting bound is saturated by setting a_i = √(p_0/n) and b_i = √((1 − p_0)/n) for all i. Next, we consider the case in which a_i ≤ b_i for some i. We use primed sums, Σ′_i ≡ Σ_{i: a_i ≤ b_i}, to denote sums restricted to those indices. Let us define γ ≡ Σ′_i c_i (we have 0 ≤ γ ≤ 1, where γ = 0 recovers the previous case); then, by the same reasoning as above, the restricted sum is non-negative by definition, and the resulting bound is maximized for γ = 0. Thus the symmetric solution a_i = √(p_0/n), b_i = √((1 − p_0)/n) for all i is optimal in general. Finally, note that when a_i > b_i, the maximum in Eq. (B3) is achieved at θ = 0, i.e., ∥u_I∥ = 0; thus in the solution above, the unit vectors are real: ũ = u_R.

To quantify the collapse we consider an objective function similar to that of Ref. [27]. We first sort the data by x_i, where i ∈ {1, …, n} labels the sorted data points, and define y_i = τ(x_i)/L_{x,i}. The objective function R(ε_c, ν) then penalizes the spread between neighboring points of the sorted, rescaled data.

FIG. 11. Probabilities are estimated by averaging over K trajectories, with K = 10^3 - 10^5 indicated on top. In the top row, the averaged probabilities p_ave (from noisy-SEBD and stabilizer simulations) are normalized by the exact value obtained from MPO simulation. In the bottom row, the averaged probabilities p_ave from noisy-SEBD are normalized by the average of 10^6 trajectories of stabilizer simulations. The ratios converge towards 1 (horizontal lines) with increasing K.
We benchmark noisy-SEBD against stabilizer simulations of Clifford circuits, varying the number of sampled trajectories K from 10^3 to 10^5 (note that we unravel depolarizing noise into probabilistic erasure for the stabilizer simulation, and into weak measurements for noisy-SEBD; we use the same K for both methods). The probabilities P_N(z) are scaled by the exact reference value obtained from MPO simulation, whose truncation error is kept below 10^-10. As the trajectory number increases, both the SEBD and Clifford simulations show good convergence to the exact result.
We then consider the same circuit architecture but with L_x = 9, L_y = 9, where exact MPO computation would require a large computational effort. Therefore, in this case we directly compare the SEBD results against the Clifford results (with 10^6 trajectories for the latter). The results are shown in the bottom row of Fig. 11, again displaying good agreement.

Appendix H: Converting gate fidelity to noise rate
The convention for the noise strength ε used in this work is in terms of single-qubit channels, as in Eq. (6). The usual figure of merit for two-qubit gates is the average fidelity f. To convert between the two, we note that the relevant average is a Haar integral over the two-qubit Hilbert space, with P any traceless Pauli operator, and the formula applies equally to dephasing and depolarizing noise (in fact to any unital noise channel upon setting 1 − ε → p_0, cf. Eq. (16)). The Haar integral yields Tr[P²]/20 = 1/5, so at small ε one finds f ≃ 1 − (8/5)ε, and thus ε ≃ (5/8)(1 − f). In particular, this means that a gate fidelity of f = 96% translates to ε ≃ 0.025, which is the value used in Sec. IV C.
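The small-ε conversion is a one-liner; the sketch below uses the f = 96% value quoted in the text:

```python
def fidelity_to_eps(f):
    """Convert average two-qubit gate fidelity f to the single-qubit
    noise-rate parametrization of the text: eps ~= (5/8) * (1 - f)."""
    return (5.0 / 8.0) * (1.0 - f)

# Osprey-like median CNOT fidelity from the text
assert abs(fidelity_to_eps(0.96) - 0.025) < 1e-12
```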

FIG. 1. Main ideas of this work. (a) Sampling of 2D shallow unitary circuits. The qubit array has linear dimensions L_x, L_y and the circuit has depth T. The SEBD algorithm is based on an MPS simulation carried out along a spatial direction (e.g., L_y), approximating the wavefunction of a 1D subsystem of qubits on a light cone surface (green). It exploits the measurements on the top boundary of the circuit to disentangle the wavefunction. (b) We consider noisy circuits with uncorrelated local noise: unitary gates (blue rectangles) are interspersed with single-qubit noise channels (orange circles). (c) The noise channels are unraveled as measurements (generically weak), represented here as interactions with a fresh ancilla which is then measured. These fictitious measurements can further disentangle the wavefunction. (d) Qualitative sketch of the complexity phase diagram of the noisy-SEBD algorithm. An entanglement phase transition separates an "easy" phase (low depth and/or strong noise), where the computational cost of sampling the noisy circuit output via noisy-SEBD is ∼ L_x L_y exp(T), from a "hard" phase where the same cost becomes ∼ L_y exp(L_x T). The location of the phase boundary is model-dependent. The line ε = 0 yields the finite-depth complexity transition of Ref. [45] (noiseless SEBD), while the 1/T = 0 line (which stands for T = O(L)) yields the standard measurement-induced phase transition in two spatial dimensions [24]. We conjecture the scaling ε_c(T) ∼ ε_{c,2D} + O(1/T) at large T. We emphasize that this is a transition in the complexity of the noisy-SEBD algorithm, not of the sampling task itself [10].

FIG. 2. Schematic depiction of the 2D qubit array and effective 1D subsystem used in the SEBD algorithm. (a,b) Example of a 2D qubit array with $L_x = 5$, showing the gate sequences used: ABCD for depth $T = 4$ (a), ABCDB for depth $T = 5$ (b). The region enclosed by the solid loop corresponds to the effective 1D subsystem, and the region enclosed by the dashed loop corresponds to sites which are measured after applying gates in the past lightcone. (c) Equivalent 1D circuit for $T = 4$ and $L_x = 5$ with gate sequence ABCD. The effective 1D system has size $2L_x = 10$, with gates up to the third nearest neighbor, and each measurement is followed by a reset to the $|0\rangle$ state. The dashed lines enclose a unit cell whose architecture repeats periodically in time.

FIG. 3. Purification time $\tau$ of a single probe qubit as a function of the noise rate $\varepsilon$ for (a) depth $T = 4$ with gate sequence ABCD and (b) depth $T = 5$ with gate sequence ABCDB. Scaling collapse of the data for (c) $T = 4$ and (d) $T = 5$. The data are averaged over $5\times10^3$–$3\times10^4$ realizations.

FIG. 4. (a) Layout of the 1121-qubit IBM quantum processor Condor; the 65-qubit Hummingbird is shown as the blue region. (b) Gate sequence for circuits with depth $T = 5$ (ABCDA) on the 65-qubit quantum processor Hummingbird, of linear size $L_x = 11$. Other IBM processors have a similar layout: Eagle with $L_x = 15$, Osprey with $L_x = 27$, and Condor with $L_x = 43$. The region surrounded by a solid line depicts the 1D effective subsystem for noisy-SEBD simulation of random circuits with $T = 5$ (gate sequence ABCDA). (c) Bipartite entanglement entropy $S$ and (d) purification time $\tau$ of a reference qubit, for noiseless ($\varepsilon = 0$) and noisy ($\varepsilon = 0.02$ and $0.025$) circuits of depth $T = 5$ as a function of linear system size $L_x$, which refers to subsets of the heavy-hexagon lattice in panel (a). The data are averaged over $10^3$–$10^4$ realizations of the random circuits.
Appendix A: Derivation of the unraveling cost function from the statistical mechanical model

FIG. 5. (a) The quantum circuit consists of brick-wall unitaries (blue rectangles) and generalized measurements (red dots). (b) The mapped classical statistical mechanical model on the honeycomb lattice, where dashed links are weighted by the Weingarten function and solid links are weighted by the generalized measurements. (c) Classical Ising model for $k = 2$, after integrating out the $\tau$ nodes.

FIG. 9. Entanglement phase transition in Clifford circuits with dephasing noise of strength $\varepsilon$. The first 5 panels refer to quasi-1D subsystems that mimic the noisy-SEBD simulation of 2D circuits of depth $T = 8, 12, 16, 20, 24$ ($T = 4$, not shown, is found to be in the area-law phase for all $\varepsilon$). The last panel shows data for the conventional MIPT in 2D square lattices of size $L \times L$, $L = 8, 16, 32$, with circuits of depth $O(L)$. Vertical dashed lines indicate estimates of the critical point. Data obtained by averaging between $4\times10^2$ and $2\times10^4$ realizations of the random circuits, depending on system size.