Quantum coding with low-depth random circuits

Random quantum circuits have played a central role in establishing the computational advantages of near-term quantum computers over their conventional counterparts. Here, we use ensembles of low-depth random circuits with local connectivity in $D\ge 1$ spatial dimensions to generate quantum error-correcting codes. For random stabilizer codes and the erasure channel, we find strong evidence that a depth $O(\log N)$ random circuit is necessary and sufficient to converge (with high probability) to zero failure probability for any finite amount below the optimal erasure threshold, set by the channel capacity, for any $D$. Previous results on random circuits have only shown that $O(N^{1/D})$ depth suffices or that $O(\log^3 N)$ depth suffices for all-to-all connectivity ($D \to \infty$). We then study the critical behavior of the erasure threshold in the so-called moderate deviation limit, where both the failure probability and the distance to the optimal threshold converge to zero with $N$. We find that the requisite depth scales like $O(\log N)$ only for dimensions $D \ge 2$, and that random circuits require $O(\sqrt{N})$ depth for $D=1$. Finally, we introduce an "expurgation" algorithm that uses quantum measurements to remove logical operators that cause the code to fail by turning them into additional stabilizers or gauge operators. With such targeted measurements, we can achieve sub-logarithmic depth in $D\ge 2$ below capacity without increasing the maximum weight of the check operators. We find that for any rate beneath the capacity, high-performing codes with thousands of logical qubits are achievable with depth 4-8 expurgated random circuits in $D=2$ dimensions. These results indicate that finite-rate quantum codes are practically relevant for near-term devices and may significantly reduce the resource requirements to achieve fault tolerance for near-term applications.

A common technique in classical error correction is to study random codes, which often nearly saturate the bounds for the optimal codes [31-33]. Moreover, practical, near-optimal codes with efficient encoders and decoders are possible through random constructions of low-density parity check (LDPC) codes [32,33]. In the quantum case, the decoding problem tends to be more difficult to solve (including for LDPC codes), but analogous random coding results have been obtained for stabilizer codes, where the decoding problem is similar to the classical case. Two-local random Clifford circuits with all-to-all connectivity have been shown to achieve an extensive code distance on $N$ qubits at a depth upper bounded by $O(\log^3 N)$ [34,35]. This scaling is comparable to provably optimal constructions for two-designs from Clifford gates at depth $O(\log N)$ with access to $O(N)$ additional ancillae [36]. Spatial locality is often an important constraint in quantum computing architectures. In $D$ spatial dimensions, the expected depth for local circuits to achieve an approximate two-design is upper bounded by $O(N^{1/D})$ [37,38]. Such constructions are only required if the code needs to correct all errors up until threshold. Achieving optimal performance for local noise models requires fewer resources because the code only needs to correct typical errors in the thermodynamic limit.
In this paper, we develop the general theory of optimal decoding with low-depth random encodings that include both unitaries and targeted measurements for one of the simplest error models, the erasure channel. Many of the results apply for more general error channels, but optimal recovery probabilities are easy to compute for erasure errors, making it a useful error model for benchmarking quantum codes [39,40]. We show that, in any spatial dimension, random Clifford encodings of finite-rate codes converge to zero failure probability below the optimal erasure threshold, set by the channel capacity, for depths $O(\log N)$, thus improving on the random circuit bounds described above. We then introduce an "expurgation" algorithm to surpass this logarithmic barrier and achieve convergence at a sub-logarithmic depth in $D > 1$ dimensions. This method works by using quantum measurements to remove (expurgate) logical operators from the code that have a high probability of failure until either a steady-state code is reached or target coding parameters are obtained. These low-quality logicals are turned into either additional stabilizers or gauge operators to form a subsystem code. This expurgation process monotonically increases the code distance and recovery probability of any stabilizer subsystem code. At a practical level, one can use random coding and expurgation to generate high-performance, finite-rate codes for thousands of logical qubits with depth 4-8 circuits in two dimensions.
Our results also establish several connections between quantum error correction thresholds, random matrix theory (RMT), and statistical physics. Using an RMT ansatz, we develop a complete critical theory for optimal decoding of erasure errors for random stabilizer codes. We numerically benchmark this ansatz to a high degree of precision in the critical region of the erasure threshold. These scaling results guide our numerical analysis of optimal decoding for finite-depth encoders in finite-size systems. Focusing on the critical scaling theory of random codes at low depths, we find that random Clifford circuits can achieve the capacity of the erasure channel only at parametrically larger depth $O(\sqrt{N})$ in 1D. In $D > 1$ dimensions, however, random Clifford circuits retain the depth $\le O(\log N)$ scaling at capacity. The marginal dimension being 2D is consistent with Imry-Ma type arguments regarding the relevance of randomness in the error patterns at the optimal threshold [41].
We also analyze the case of Haar random circuit encoders at high depth $d \ge O(N)$, where optimal decoding is likely exponentially hard. We find results similar to those for the high-depth Clifford encoders, but with small quantitative differences that indicate Haar random codes are slightly closer to optimal than random stabilizer codes. Through an approximate mapping to an Ising model, we argue that the erasure threshold with local random circuits can be generally understood as a type of first-order domain-wall pinning phase transition.

A. Relation to previous work
In this section, we discuss the relation of our results to some of the prior work on quantum error correcting codes and random quantum circuits.

Quantum error correcting codes
Starting in the early days of quantum error correction, a common strategy for proving fault tolerance was to study concatenated codes [42-44]. These codes reduce decoherence by successively encoding quantum information in nested chains of small codes. Unfortunately, this approach typically suffers from large space-time resource costs and low error thresholds [11,13,26]. A paradigmatic example of a code that, on balance, requires minimal resources is the 2D surface code [45,46]. This topological code saturates the capacity for the erasure channel at zero code rate on a square lattice [46], is provably fault-tolerant under more general noise models [46,47], has highly efficient decoding algorithms [27-30,39,40,46-51], and admits a large variety of fault-tolerant strategies for implementing gates [12,14,52-58]. However, despite its remarkable properties, the surface code requires a prohibitively large overhead in the number of physical qubits for applications on near-term devices [14]. With issues of this nature in mind, it remains a central goal to develop more space-efficient, ideally finite-rate, codes that achieve similar levels of performance to the surface code [26].
Extending topological codes, or, more generally, low-density parity check (LDPC) codes, to finite rate faces various theoretical obstructions in spatially local models [59]. Two routes to overcome this are to use subsystem codes [60] or to remove the constraint of geometric locality while keeping the LDPC condition. In all-to-all coupled systems, a large variety of finite-rate LDPC codes have been developed by extending the surface code to nonlocal geometries or adapting classical codes based on expander graphs [24,25,61,62]. Furthermore, several threshold theorems have been proved for a large family of these codes [26,63,64]. Maintaining all-to-all connectivity in the thermodynamic limit eventually runs into prohibitive resource constraints, but these codes are applicable to near-term ion trap quantum computers [65,66] and quantum networks [67]. Another interesting class of finite-rate codes that retain some locality structure, but are not of the LDPC type, is provided by holographic codes that originally arose in the study of quantum gravity and the AdS/CFT correspondence [68,69]. The quasilocal codes considered here differ from these various classes of codes because their properties emerge from generic, local scrambling dynamics instead of concatenation, topology, expander graphs, or hyperbolic geometry.
More specifically, our results have direct relevance to a recently discovered phase transition that arises in monitored random circuits, where unitary gates are interspersed with random projective measurements [89,90]. These models have attracted interest in condensed matter theory due to the potential connections to chaos, thermalization, conformal field theory, and the many-body localization phase transition. In the context of quantum information, their study has led to novel insights into emergent quantum error correction [93,94,117-119], as well as the sampling complexity of constant-depth circuits in 2D [120]. Due to the repeated rounds of measurements acting on a code space density matrix, the dynamics during our expurgation algorithm display a similar phenomenology to the "purification" dynamics of a mixed state in the unitary-measurement models [93,103,107,119]; however, there are several important differences in the present case due to the nonrandom, targeted choice of measurements. Furthermore, since we show that logarithmic depth random circuits are sufficient to reliably encode quantum information, our results may provide guidance for rigorous existence proofs of the volume-law phase in some models. They may also help guide efforts in developing fault-tolerant strategies for monitored random circuits that incorporate feedback.

B. Structure of paper
The paper is organized as follows: In Sec. II, we outline our theoretical approach for studying random quantum codes. We then summarize our main results and theoretical methods. In Sec. III, we provide some background on the basic concepts and terminology used to describe quantum error correction thresholds. In Sec. IV, we present the RMT solution to the erasure threshold for random stabilizer codes. In Sec. V, we present our results on the behavior of low-depth random circuit encoders for the erasure channel. In Sec. VI, we present our expurgation algorithm to surpass the depth O(log N ) barrier in D > 1 dimensions. In Sec. VII, we present an analysis of the erasure threshold for general Haar random codes. In Sec. VIII, we describe an approximate mapping of the erasure threshold to a first-order domain wall pinning transition that occurs in the ordered phase of the Ising model. We provide further discussions and present our conclusions in Sec. IX.
We remark that the arguments in the paper use a combination of rigorous proofs, large-scale numerics, conjectures, and some occasional heuristics. To test our ideas as strongly as possible with this approach, we analyze the problem from multiple perspectives and systematically compare our results across different spatial dimensions. What emerges from this analysis is a consistent framework to describe quantum coding with local random circuits.

FIG. 1 (caption fragment). (a) ... error rate $e$ through an error correction threshold for an optimal code. In this work, we probe the optimality of a given code ensemble by comparing the location and universality class of the critical point to random stabilizer codes. (b) Illustration of the models we study: the encoding circuit is a low-depth random unitary circuit and the error is an erasure of a fixed fraction $eN$ of $N$ qubits. The decoding proceeds via generalized measurements with outcomes $s$ and recovery operators $R_s$. We mostly focus on stabilizer codes, where optimal decoding of Pauli error channels like erasure errors is possible with stabilizer syndrome measurements followed by the conditional application of single-site Clifford gates.

II. SUMMARY
In this section, we outline the theoretical approach taken in this work and summarize our main results.

A. Theoretical approach
This paper is focused on developing a theory of optimal decoding for finite-rate codes generated by random circuits. To approach this problem, we directly investigate the probability of successful recovery P(R) of the encoding and decoding scheme for the specific error model of erasures. This type of observable is complementary to other performance metrics that are agnostic to the error model, for example, the code distance. One advantage of studying recovery/failure probabilities is that it allows us to obtain a more detailed understanding of the code performance near the optimal threshold. For coding below the optimal threshold, we have found that focusing on this observable often suggests methods to tailor the codes to the detailed properties of the noise, as we explore with our expurgation algorithm in Sec. VI.
The qualitative behavior of the optimal (minimal) failure probability $P(F) = 1 - P(R)$ is shown in Fig. 1(a). Here, $e$ is a parameter that characterizes the strength of noise in a given error model (or class of error models) and we assume that the implemented code is optimal for this error model. Below threshold, the failure probability converges to zero in the limit of large $N$. Past a critical error rate $e_c$ (set by the channel capacity limit for the optimal code), the failure probability instead converges to one in the large-$N$ limit. This discontinuous behavior in the large-$N$ limit is characteristic of a phase transition. Motivated by results from classical error correction [32,33], we assume the failure probability for an optimal code for large $N$ is well approximated by the average behavior of a high-depth random stabilizer code under optimal decoding [121]. One primary question that we address is what minimal depth of a random encoding Clifford circuit is needed for large but finite $N$ to achieve near-optimal failure probability for the specific case of erasure errors [see Fig. 1].

More specifically, in finite-size systems, the failure probability for the optimal code will generically be a function of both the error rate $e$ and the number of (qu)bits $N$ per code block. However, in the thermodynamic limit of large $N$, the failure probability will approach a scaling form in the vicinity of the critical error rate $e_c$ [see shaded region in Fig. 1(a)],

$$-\log P(F) = N^{a} f_{\rm opt}\big[(e - e_c) N^{b}\big] + \cdots,$$

where $a$ and $b$ are critical exponents and the corrections are assumed to be subleading in powers of $1/N$. We take the logarithm of the failure probability as it generally scales like a free energy, e.g., in the surface code [47]. In coding theory, properties of the scaling functions for the optimal codes $f_{\rm opt}$ have been extensively studied under optimal decoding of Markovian error channels (e.g., see Ref. [122]). The finite-size scaling behavior is important because it determines the rate of convergence to the ideal behavior below threshold.
The underlying idea of this work is to use the scaling properties of the optimal codes for a given error channel as an ideal performance benchmark. We effectively define a code as optimal if it achieves capacity at threshold and its threshold behavior lies in the same universality class as the truly optimal codes for this error channel. Of course, finding explicit and efficiently implementable representations for encoding and decoding maps of optimal codes is generally a difficult problem [123]. To approximate this paradigm in a setting that allows for more theoretical progress and potential practical implications for quantum computing, we relax the benchmark critical behavior from that of optimal codes to the average behavior of high-depth random stabilizer codes. As mentioned above, random codes typically achieve similar levels of performance as optimal codes. In quantum error correction, even random stabilizer codes are often sufficient. We present numerical evidence on small systems that the Haar random code transition is in the same universality class as the random stabilizer code transition. However, we also see small quantitative differences between the scaling functions for the two cases, with slightly more optimal performance for the Haar codes. Random stabilizer codes are, thus, better classified as "near-optimal" codes for the erasure channel, which is similar to a well-known result for the depolarizing channel [124,125].

B. Main results
As discussed in the introduction, our main results center around the resource requirements (in terms of encoding circuit depth) to achieve zero failure probability or approach capacity for finite-rate codes generated by random circuits. In particular, we study stabilizer codes generated by two-local random Clifford circuits on hypercubic lattices in $D$ spatial dimensions or on all-to-all coupled networks. The basic setup is illustrated in Fig. 1(b). In this example, every other qubit is mapped to an encoded or "logical" qubit at a code rate of $R = 1/2$ and the random circuit is implemented in 1D. We provide a summary of the scalings found in this work in Table I.
The error model is taken to be an erasure model where $eN$ sites of an $N$-qubit system are randomly selected and traced out of the system, with those sites heralded to the decoder but unknown to the encoder. The failure probability for the more physically relevant case of independent, identically distributed (iid) erasures at each site with probability $e$ can be determined from the failure probability for the fixed-fraction erasure model, which is why we mostly focus on the latter.¹ For the random stabilizer codes, we show that the transition is in a certain sense first order, since for $e < e_c$ the logarithm of the failure probability is proportional to $-(e_c - e)N$ in the limit of large $N$. If we interpret this as a free energy, it is extensive and its density has a discontinuity in the first derivative with respect to $e$ at $e_c$, as is the case for first-order phase transitions. This first-order transition is rounded out for finite $N$. This finite-size rounding is minimal if we take an error model with erasures on a fixed fraction $eN$ of sites. The finite-size rounding of the transition in the iid model is much stronger (by a factor of $\sqrt{N}$).

We find that the best known analytical bounds for the convergence rate to a two-design strongly overestimate the circuit depth $d$ required for convergence of the failure probability towards the high-depth [$d = O(N)$] limit. Most notably, in any $D \ge 2$, at the critical point the depth required scales as $d \le O(\log N)$, which is comparable to the optimal depth $O(\log N)$ for generating a two-design in systems without spatial locality constraints [36]. Even in $D = 1$, we find that removing the randomness in erasure locations by taking regularly spaced erasures leads to a required depth of $O(\log N)$ to approach zero failure probability below the optimal threshold. Spatial randomness in the erasure locations seems to be a relevant perturbation to the finite-size scaling behavior only in $D = 1$, and not for $D \ge 2$.

¹ The failure probability $P_{\rm iid}(F)$ for iid random erasures can be found from the fixed-fraction failure probability $P(F)$ and the probability $p_{\rm iid}(n_e)$ of erasure number $n_e$ in the iid model. Since $n_e$ is known to the decoder, $P(F) = P_{\rm iid}(F|n_e)$ is effectively a conditional distribution, i.e., $P_{\rm iid}(F) = \sum_{n_e} P(F)\, p_{\rm iid}(n_e)$.

TABLE I (caption fragment). ... coding arbitrarily close to the critical region of the optimal erasure threshold in the thermodynamic limit. We find $b = 1$ for the fixed-fraction erasure model and $b = 1/2$ for the iid model. $D = 2$ is the marginal dimension for the relevance of spatial randomness in the errors to the threshold behavior, which makes the scaling at the optimal threshold difficult to reliably determine from numerics or Imry-Ma arguments. The last column shows the results upon expurgation of bad logical operators using quantum measurements (see Sec. VI).
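The conversion from the fixed-fraction model to the iid model described in footnote 1 can be sketched numerically. The snippet below is a minimal illustration, assuming that the number of erasures in the iid model is binomially distributed and that the fixed-fraction failure probability is available as a function of the erasure count. The names `iid_failure_probability` and `step_failure` are hypothetical, and the step-function failure model is a toy stand-in, not a result from this work.

```python
from math import comb


def iid_failure_probability(fixed_fraction_failure, n_qubits, e):
    """Average a fixed-erasure-number failure probability over iid erasure counts.

    fixed_fraction_failure: callable n_e -> P(F) for exactly n_e erasures
                            (hypothetical input, e.g. from numerics).
    n_qubits: number of physical qubits N.
    e: per-site erasure probability in the iid model.
    """
    total = 0.0
    for n_e in range(n_qubits + 1):
        # Binomial weight p_iid(n_e) for n_e erasures out of N sites.
        weight = comb(n_qubits, n_e) * e**n_e * (1 - e) ** (n_qubits - n_e)
        total += weight * fixed_fraction_failure(n_e)
    return total


def step_failure(n_e, n_threshold=25):
    """Toy model: the code fails only if more than n_threshold sites are erased."""
    return 1.0 if n_e > n_threshold else 0.0


if __name__ == "__main__":
    # Rate-1/2 toy example with N = 100, so the threshold count is e_c * N = 25.
    for e in (0.15, 0.25, 0.35):
        print(e, iid_failure_probability(step_failure, 100, e))
```

Even with a perfectly sharp step in the fixed-fraction model, the binomial spread in the erasure count smooths the iid curve over a window of relative width of order $1/\sqrt{N}$, which is the stronger finite-size rounding mentioned in the text.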
To simplify the analysis, we will fix the initial code rate at precisely $R = 1/2$ in most of our discussion and drop this argument from the scaling functions. Also fixing the initial spatial arrangement of the logical qubits to be every other site in the lattice, the failure probability has a four-parameter dependence

$$-\log P(F) = F(e, D, d, N).$$

We first consider the high-depth limit $d = O(N)$ of the failure probability, which does not depend on $D$. Through an RMT ansatz, we obtain an asymptotic form for the failure probability in the fixed-fraction model that depends only on the total number of erasures $eN$ (an integer) relative to the threshold number $e_c N$ (which does not have to be an integer):

$$-\log P(F) = f_\infty(x_n) + \cdots, \qquad x_n = (e - e_c) N.$$

Here, $e_c = (1 - R)/2$ coincides with the channel capacity limit for the erasure threshold [126]. For this fixed-fraction erasure error model, the scaling function $f_\infty(x_n)$ is only well-defined on a countably infinite set of values in the thermodynamic limit. The RMT solution predicts a value for the critical failure probability $P(F) = 0.38968\ldots$ that is independent of $R$ for $0 < R < 1$; for $R = 1/2$ we verify this value numerically to a precision of $10^{-4}$.
To understand the scaling with depth, we first consider a simple mean-field model for the below-threshold behavior in which we break the system up into individual blocks of size $O(\log N)$. A simple analysis of this model based on the results of Ref. [37] shows that the depth required for convergence to the high-depth behavior of the failure probability in $D$ dimensions is typically $O(\log N)$ for random Clifford encodings, but can be made as low as $O[(\log N)^{1/D}]$ through the optimized encodings of Ref. [36]. In the latter case, there is a reduction in the effective rate of the code due to the use of $O(\log N)$ ancilla qubits per block in the encoding scheme.
Using an Imry-Ma type argument [41], we then argue that the positional randomness of the erasures is irrelevant for the finite-size scaling in $D \ge 2$. As a result, we conjecture that the critical points for all $D \ge 2$ have the same leading-order scaling with depth as the below-threshold behavior predicted from the mean-field model. Using this ansatz, we find a consistent scaling collapse in our numerics.
We also study the scaling behavior with depth $d$ in the 1D case ($D = 1$), which has to be treated separately. By studying the convergence of the critical failure probability $P(F)|_{e=e_c}$ to the RMT prediction, we find numerical evidence for a leading-order scaling behavior with a characteristic depth of $O(\sqrt{N})$. In contrast, below the critical error rate, we find that the failure probability for $d > O(\log N)$ exhibits exponential decay with the depth, $P(F)|_{e<e_c, D=1} \sim e^{-d/A(e)}$, for some function $A(e)$ that diverges as $(e_c - e)^{-1}$ upon approaching $e_c$. This behavior leads to an overall $O(\log N)$ depth for convergence to zero failure probability below threshold, but with a rate that goes to zero at the optimal erasure threshold. We argue that the $\sqrt{N}$ scaling at $e_c$ has an intuitive explanation as arising from the Poisson fluctuations in the number of excess erasures in a given extensive region.
After establishing these scaling results, we introduce our expurgation algorithm based on measuring logical operators in the system that are likely to lead to failures. We prove that the code distance and recovery probability for Pauli error channels will monotonically increase with this expurgation strategy. We then numerically study the performance of the algorithm in 2D and all-to-all coupled systems. In both cases, we see strong evidence that a given target failure probability can be achieved with a sub-log-N depth circuit.
Finally, to test the generality of these results obtained for stabilizer codes, we study the erasure threshold for Haar random circuits. We first study the high-depth limit using small-scale numerics. We find critical behavior consistent with the random stabilizer code threshold, but small quantitative differences in the scaling functions (as noted above). Using well-studied mappings of two-local Haar random circuits to statistical mechanics models [71,74,127], we describe an approximate mapping of the erasure threshold to a first-order pinning transition for domain walls that occurs in the ordered phase of $(D+1)$-dimensional Ising models. Such transitions display similar phenomenology to our numerically observed results for random Clifford circuits.

III. PRELIMINARIES
In this section, we introduce the basic terminology and concepts underlying quantum error correcting codes, optimal decoding, and quantum error correction thresholds. We derive a formula used throughout the paper for the recovery probability of stabilizer subsystem codes under erasure errors.

A. Optimal decoding
The general setup we consider follows the illustration in Fig. 1(b). Information is first mapped into a nonlocal code space; it is then subjected to local errors and decoded. In the theory of fault tolerance, one needs to consider errors in both the encoding and decoding steps; however, we will not address such issues here and assume both the encoding and decoding operations are implemented perfectly.
In the quantum case, these three operations are typically described using the language of quantum channels, which are linear maps that are completely positive and trace preserving [128]. Denoting the encoding, error, and decoding channels by $\mathcal{E}$, $\mathcal{N}$, and $\mathcal{D}$, respectively, the central object of interest is the composite channel

$$\mathcal{D} \circ \mathcal{N} \circ \mathcal{E}.$$

Error correction can be done perfectly when this composite channel acts as the identity on the allowed input states, $\mathcal{D} \circ \mathcal{N} \circ \mathcal{E}(\rho) = \rho$, or is unitarily equivalent to the identity, $\mathcal{D} \circ \mathcal{N} \circ \mathcal{E}(\rho) = U \rho U^\dagger$ for a known unitary $U$. When this map is not exactly unitarily equivalent to the identity, it is convenient to use a fidelity metric to quantify its proximity to the identity. One natural fidelity metric that we study in this work is the max-average state fidelity [128]

$$F_{\rm avg} = \max_{\mathcal{D}} \int d\psi\, \langle \psi |\, \mathcal{D} \circ \mathcal{N} \circ \mathcal{E}(|\psi\rangle\langle\psi|)\, | \psi \rangle,$$

where $d\psi$ is taken as a uniform measure over pure input states for $\mathcal{E}$ and the maximum is taken over all possible decoding maps $\mathcal{D}$. This fidelity metric quantifies the degree to which a randomly drawn code word can be recovered back to its initial state under optimal decoding.
A closely related fidelity metric to this average state fidelity is the entanglement fidelity, which measures the degree to which the map preserves entanglement with a reference system [129]. Given an initial density matrix $\rho_S$ on the system $S$, we purify it to the state $\rho_{SR} = |\Psi_{SR}\rangle\langle\Psi_{SR}|$ by introducing entanglement with a reference system $R$ such that $\rho_S = {\rm Tr}_R[\rho_{SR}]$. Then, the entanglement fidelity under optimal decoding is

$$F_e(\rho_S) = \max_{\mathcal{D}} \langle \Psi_{SR} |\, (\mathcal{D} \circ \mathcal{N} \circ \mathcal{E} \otimes \mathcal{I}_R)(\rho_{SR})\, | \Psi_{SR} \rangle, \tag{6}$$

where the maps act as the identity on the reference system and $F_e(\rho_S)$ is independent of the choice of purification. Conveniently, the max-average fidelity is equivalent to the max-entanglement fidelity of the completely mixed state through the formula $F_{\rm avg} = [q F_e(I/q) + 1]/(q + 1)$, where $q$ is the dimension of the input space [130,131].
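As a quick numerical companion to the relation $F_{\rm avg} = [q F_e(I/q) + 1]/(q + 1)$ quoted above, the following sketch converts between the two fidelities. The function names are hypothetical and chosen only for illustration.

```python
def average_fidelity(entanglement_fidelity, q):
    """Average state fidelity from the entanglement fidelity of I/q.

    q is the dimension of the input (code) space, e.g. q = 2**k for k qubits.
    """
    return (q * entanglement_fidelity + 1) / (q + 1)


def entanglement_fidelity(avg_fidelity, q):
    """Inverse of average_fidelity: recover F_e(I/q) from F_avg."""
    return ((q + 1) * avg_fidelity - 1) / q


if __name__ == "__main__":
    q = 2  # a single logical qubit
    # A map that sends every input to the maximally mixed state has
    # entanglement fidelity 1/q**2, so the average fidelity comes out as 1/q.
    print(average_fidelity(1 / q**2, q))
    # Perfect recovery: both fidelities equal one.
    print(average_fidelity(1.0, q))
```

Because the relation is affine, thresholds located with either fidelity coincide; only the finite-size values differ.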
In cases where the optimization over decoders is difficult to compute, we can still gain insight into the quantum error correction threshold by studying the coherent quantum information [132,133]

$$I_c(\rho_S, \mathcal{N}) = S[\mathcal{N}(\rho_S)] - S[(\mathcal{N} \otimes \mathcal{I}_R)(\rho_{SR})],$$

where $S(\rho) = -{\rm Tr}[\rho \log_2 \rho]$ is the von Neumann entropy. In our analysis of random stabilizer codes, we directly compute $F_{\rm avg}$, while for Haar random codes we use the coherent quantum information to bound $F_e(I/q)$.
The coherent quantum information is fundamentally related to the quantum channel capacity through the limiting formula [134,135]

$$Q(\mathcal{N}) = \lim_{n \to \infty} \frac{1}{n} \max_{\rho} I_c(\rho, \mathcal{N}^{\otimes n}).$$

In this work, we study erasure errors, which for a single qubit are defined by the channel

$$\mathcal{N}(\rho) = (1 - e)\, \rho \otimes |0\rangle\langle 0| + e\, {\rm Tr}[\rho]\, \frac{I}{2} \otimes |e\rangle\langle e|. \tag{9}$$

The states $|0\rangle$/$|e\rangle$ are orthogonal states that herald the absence/occurrence of the erasure on this site. Note that, in many physically relevant scenarios, the state of the system itself is mapped to an orthogonal state under an erasure error, which is an equivalent description of this channel for our purposes. We choose the representation in Eq. (9) to simplify the notation in later discussions. The heralded nature of the erasure locations dramatically simplifies the decoding problem, as we discuss below for stabilizer codes. Furthermore, due to this classical register, the capacity of the erasure channel is additive [i.e., $Q(\mathcal{N}) = \max_\rho I_c(\rho, \mathcal{N})$] and is derivable from the no-cloning theorem [126]. It is also easy to compute the channel capacity from the maximization of the coherent quantum information, $Q(\mathcal{N}) = 1 - 2e$, where $e$ is the local erasure probability on each site. Equivalently, for a code rate $R = k/N = Q$, the optimal erasure threshold in the thermodynamic limit is $e_c = (1 - R)/2$.
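The capacity $1 - 2e$ and the threshold $e_c = (1 - R)/2$ quoted above can be checked with a few lines of arithmetic. The sketch below assumes a single-qubit erasure channel that replaces the erased qubit by the maximally mixed state and sets an orthogonal flag, with a maximally mixed input purified by a Bell pair; under that assumption the output spectra are known in closed form, so no matrix diagonalization is needed. All function names are hypothetical.

```python
from math import log2


def entropy_bits(eigenvalues):
    """Von Neumann entropy (in bits) of a state with the given spectrum."""
    return -sum(p * log2(p) for p in eigenvalues if p > 0)


def coherent_information_erasure(e):
    """Coherent information of the flagged erasure channel, maximally mixed input.

    Assumed model: with probability 1 - e the qubit is untouched (flag |0>),
    with probability e it is replaced by I/2 (flag |e>).
    """
    # Spectrum on system + flag: two blocks, each proportional to I/2.
    system_spectrum = [(1 - e) / 2] * 2 + [e / 2] * 2
    # Spectrum on reference + system + flag: a pure Bell state in the
    # no-erasure block, and I/4 on reference + system in the erasure block.
    joint_spectrum = [1 - e] + [e / 4] * 4
    return entropy_bits(system_spectrum) - entropy_bits(joint_spectrum)


def erasure_threshold(rate):
    """Optimal erasure threshold e_c = (1 - R) / 2 for code rate R."""
    return (1 - rate) / 2


if __name__ == "__main__":
    for e in (0.1, 0.25, 0.4):
        print(e, coherent_information_erasure(e), 1 - 2 * e)
    print(erasure_threshold(0.5))
```

Evaluating the coherent information at $e = e_c$ returns the code rate $R$, which is the statement that a rate-$R$ code can at best tolerate erasures up to the point where the capacity drops to $R$.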

B. Optimal decoding: stabilizer codes
These concepts of optimal decoding are illustrated more concretely by considering the example of qubit stabilizer codes. An $[N, k]$ qubit stabilizer code encodes $k$ logical qubits in $N$ physical qubits. The codewords are spanned by the set of $2^k$ stabilizer states that are the simultaneous eigenstates of a stabilizer group $S \subset P_N$, which is an abelian subgroup of the Pauli group on $N$ qubits $P_N$ such that $-I \notin S$. Given a generating set $\{Z_1, \ldots, Z_{N-k}\}$ for $S$, optimal decoding is possible through projective measurements of these generators (called syndrome measurements) for Pauli error channels. These are channels that have a Kraus representation of the form

$$\mathcal{N}(\rho) = \sum_{E, k} p(E, k)\, E \rho E^\dagger \otimes |k\rangle\langle k|,$$

where $E$ is an element of $P_N$, $|k\rangle$ are orthogonal quantum states that are used to store classical data (e.g., the erasure locations), and $p(E, k) \ge 0$ is a joint probability distribution over the allowed error operators $E$ and register indices $k$. Such quantum-classical channels are sometimes called a "quantum instrument" [136]. Erasure errors can be represented in this form because of the following identity for the partial trace operation on site $n$:

$${\rm Tr}_n[\rho] \otimes \frac{I_n}{2} = \frac{1}{4}\left(I_n \rho I_n + X_n \rho X_n + Y_n \rho Y_n + Z_n \rho Z_n\right),$$

where $I_n$, $X_n$, $Y_n$, and $Z_n$ are the four Pauli operators. The Pauli group operation can be represented by standard matrix multiplication of the $N$-qubit Pauli operators. For two Pauli group elements $P_{1,2}$, we use the notation $[[P_1, P_2]] = {\rm Tr}[P_1 P_2 P_1^{-1} P_2^{-1}]/2^N$ to denote their scalar commutator: if $P_1$ and $P_2$ commute, then $[[P_1, P_2]] = 1$; otherwise, $[[P_1, P_2]] = -1$. We can extend the generating set for $S$ to a complete generating set for $P_N$ by appending destabilizer operators $\{X_1, \ldots, X_{N-k}\}$ that satisfy $[[X_i, X_j]] = 1$ and $[[Z_i, X_j]] = (-1)^{\delta_{ij}}$, a generating set for the logical operators $L_i$ (these are Pauli group elements that commute with $S$, but are not contained in $S$ [128]), and the Pauli group element $iI$.
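The scalar commutator defined above is easy to evaluate without building $2^N \times 2^N$ matrices. The sketch below uses the standard binary (symplectic) representation of Pauli strings, in which two Paulis commute exactly when the symplectic inner product of their bit vectors is even. The string encoding and function names are illustrative assumptions, and the five-qubit code generators serve only as a familiar test case.

```python
def pauli_to_bits(label):
    """Map a Pauli string such as 'XZZXI' to its (x, z) bit vectors.

    Phases are ignored, since they do not affect commutation relations.
    """
    x_bits = [1 if c in "XY" else 0 for c in label]
    z_bits = [1 if c in "ZY" else 0 for c in label]
    return x_bits, z_bits


def scalar_commutator(p1, p2):
    """Return +1 if the Pauli strings commute and -1 if they anticommute."""
    x1, z1 = pauli_to_bits(p1)
    x2, z2 = pauli_to_bits(p2)
    # Symplectic inner product: counts the sites where the two strings
    # anticommute locally.
    parity = sum(a * b for a, b in zip(x1, z2)) + sum(a * b for a, b in zip(z1, x2))
    return 1 if parity % 2 == 0 else -1


# Stabilizer generators of the five-qubit code, used here only as an example.
FIVE_QUBIT_GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]

if __name__ == "__main__":
    for a in FIVE_QUBIT_GENERATORS:
        print([scalar_commutator(a, b) for b in FIVE_QUBIT_GENERATORS])
    # A single-site X on the first qubit flips the syndrome bits of the
    # generators that act with Z or Y on that site.
    print([scalar_commutator("XIIII", g) for g in FIVE_QUBIT_GENERATORS])
```

The last line of the usage example is the syndrome of a single-qubit error: each entry equal to $-1$ marks a generator whose measurement outcome is flipped.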
Since each $E$ is an element of the Pauli group, we can decompose them based on the outcomes they produce in the syndrome measurements, $E_{sk} = g_{sk} L^E_{sk}$, where $s = (s_1, \ldots, s_{N-k})$ is a vector of syndrome bits ($s_i = 0/1$), $g_{sk}$ is a Pauli group element satisfying $[[g_{sk}, \bar Z_i]] = (-1)^{s_i}$, and $L^E_{sk}$ is a logical operator. The operator $g_{sk}$ is non-unique and is allowed to be linearly dependent on elements of $S$, the destabilizers, logical operators, and $iI$.
After applying the error channel $\mathcal{N}$ and performing a projective measurement of the syndrome bits $s$ and register state $k$, the state is mapped to

$$\rho \to \rho_{sk} \propto \sum_{E_{sk}} p(E_{sk},k)\, E_{sk}\, \rho\, E_{sk}^\dagger,$$

where we traced over the classical register for notational convenience. Applying the correction operator $g_{sk}^\dagger$, which is a product of single-site Clifford gates, the state becomes a mixture of states in the code space

$$g_{sk}^\dagger\, \rho_{sk}\, g_{sk} \propto \sum_{E_{sk}} p(E_{sk},k)\, L^E_{sk}\, \rho\, L^{E\dagger}_{sk}.$$

Below threshold in the thermodynamic limit, all the $L^E_{sk}$ must converge in probability to the same logical operator $L_{sk}$ up to multiplication by elements of the stabilizer group $S$, i.e., $L^E_{sk} = g^E_{sk} L_{sk}$ for some $g^E_{sk}$ in $S$. Since the operators $g^E_{sk}$ act trivially in the code space, the initial state can then be perfectly recovered by applying the additional unitary correction operator $L_{sk}^\dagger$. In finite-size systems, where perfect decoding is not generally possible, an optimal decoding strategy is any maximum-likelihood decoder based on the observed $s$ and $k$ [47]. In this approach, we further break up the set of all $E_{sk}$ into logical operator classes $E^i_{sk} = g_{sk}\, g^{E^i}_{sk} L^i_{sk}$, such that the $g^{E^i}_{sk}$ are in $S$ and the $L^i_{sk}$ cannot be related (modulo a phase) through multiplication by elements of $S$. Conditioned on $s$ and $k$, the decoder applies a unitary correction operator $R_{sk} = (g_{sk} L^{i_m}_{sk})^\dagger$, with $L^{i_m}_{sk}$ corresponding to the most likely logical error equivalence class. This operator can be computed by finding the value of $i$ that maximizes the probability

$$Z_i(s,k) = \sum_{E^i_{sk}} p(E^i_{sk},k), \tag{14}$$

i.e., $Z_{\max}(s,k) = \max_i Z_i(s,k)$. The probability of a perfect recovery for all input states under optimal decoding is then given by

$$P(R) = \sum_{s,k} Z_{\max}(s,k). \tag{15}$$

For general codes and Pauli error channels, finding $L^{i_m}_{sk}$ is likely to be exponentially hard, but, in the case of erasure errors, an efficient optimal decoding strategy has been derived by Delfosse and Zémor [40]. Briefly reviewing their argument: one can use the fact that erasure error locations are heralded, which implies that the decoder only needs to find the most likely error operator that acts on the erased sites.
Once the sites are known, according to Eq. (11) all error operators are equally likely, which implies that the probabilities in Eq. (14) are all equal for a fixed $k$; therefore, a maximum-likelihood strategy is to simply choose $R_{sk}$ to be any Pauli operator that lives on the erased sites and produces the observed syndrome measurement. Using the standard representation for stabilizer states from the Gottesman-Knill theorem [137,138], such a Pauli operator can be found given the syndrome check operators, erasure locations, and syndrome measurement outcomes using Gaussian elimination in a time at most $O(N^3)$.
Here, we derive an explicit formula for the recovery probability under erasure errors that is convenient for our purposes. We make use of a generating matrix for the error operators that act in the erased region $e$

$$M(S,L,e) = \left(\, s_E \mid \ell_E \,\right), \tag{16}$$

with one row for each element $E_{\mu i}$ of a local basis of error operators on the erased sites $i \in e$. The first $n_s = N-k$ columns $s_E$ are vectors of syndrome bits for this local basis, defined by the scalar commutators $(-1)^{(s_E)_j} = [[E, \bar Z_j]]$ with the stabilizer generators. The last $2k$ columns $\ell_E$ similarly encode the scalar commutator of the local errors with a generating set for the logical operators. Crucially, stabilizer codes are additive codes, which implies that if $E = E_1 E_2$, then $(s_E \mid \ell_E) = (s_{E_1} \mid \ell_{E_1}) + (s_{E_2} \mid \ell_{E_2})$ modulo 2. As a result, the row vectors $(s_{\mu i} \mid \ell_{\mu i})$ act as a generating set for all possible syndromes and their associated logical errors in the erased region.

We now show how to compute the recovery probability from the matrix $M$. Performing row reduction on $M$ identifies all errors that map to the all-zero syndrome, but have a nontrivial logical operator content. Errors of this type can be used to enumerate all uncorrectable errors for that set of erasure locations. For each matrix $M(S,L,e)$, we define

$$r_M = \mathrm{rank}(M) - \mathrm{rank}(M_S),$$

where $M_S$ is the submatrix of $M$ consisting of the first $n_s$ columns of $M$. $r_M$ counts the number of basis vectors for errors that have a zero syndrome, but act nontrivially on the logical subspace. For each syndrome, the decoder can only apply a single recovery operator; however, this recovery strategy will always fail with some probability if the error is linearly dependent on one of the $r_M$ basis vectors with trivial syndrome and nontrivial logical operator content. Since all the errors occur with equal probability for erasures, the optimal recovery probability can then be directly computed as

$$P(R|S,L,e) = 2^{-r_M}.$$

Incidentally, $k - r_M$ is also equal to the coherent quantum information of this encoding scheme under erasure errors. As we take advantage of in Sec. VI, these formulas directly generalize to stabilizer subsystem codes by removing columns of $M$ associated with generators for the gauge group.
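The rank computation described above can be sketched in a few lines. The example below is a minimal illustration, not the implementation used for the numerics in this paper: it assumes Paulis stored as `(x, z)` integer bitmasks with phases ignored, uses the single-site $X_i$ and $Z_i$ on erased sites as the local error basis, and takes the small [[4,2,2]] code as a stand-in test case. All function and variable names are hypothetical.

```python
def popcount(v):
    return bin(v).count("1")


def anticommutes(p, q):
    """1 if the Paulis (x, z) anticommute, 0 otherwise (phases ignored)."""
    (x1, z1), (x2, z2) = p, q
    return (popcount(x1 & z2) + popcount(z1 & x2)) % 2


def gf2_rank(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit serves as the pivot column
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def erasure_recovery_probability(stabilizers, logicals, erased):
    """Return (r_M, 2**-r_M) for the erased sites.

    Each row of M holds the syndrome bits (first n_s columns) and the
    logical anticommutation bits (last 2k columns) of X_i or Z_i, i in erased.
    """
    n_s = len(stabilizers)
    rows = []
    for i in erased:
        for err in ((1 << i, 0), (0, 1 << i)):  # local basis: X_i, then Z_i
            row = 0
            for j, g in enumerate(stabilizers + logicals):
                row |= anticommutes(err, g) << j
            rows.append(row)
    syndrome_mask = (1 << n_s) - 1
    r_m = gf2_rank(rows) - gf2_rank([r & syndrome_mask for r in rows])
    return r_m, 2.0 ** (-r_m)


# [[4,2,2]] code, qubit i <-> bit i: stabilizers XXXX and ZZZZ.
STABILIZERS = [(0b1111, 0), (0, 0b1111)]
# Logical generators: XXII, ZIZI, XIXI, ZZII (two anticommuting pairs).
LOGICALS = [(0b0011, 0), (0, 0b0101), (0b0101, 0), (0, 0b0011)]

if __name__ == "__main__":
    for erased in ([0], [0, 1]):
        print(erased, erasure_recovery_probability(STABILIZERS, LOGICALS, erased))
```

For this distance-2 code a single erasure yields $r_M = 0$, while erasing sites 0 and 1 exposes the logical operators $X_0X_1$ and $Z_0Z_1$, giving $r_M = 2$ and a recovery probability of $1/4$.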

IV. RANDOM STABILIZER CODE THRESHOLD
In this section, we present a solution to the critical theory of the erasure threshold for random stabilizer codes based on an RMT ansatz. Establishing the basic phenomenology of the random stabilizer erasure threshold transition is standard material in quantum information theory [139]. Our main contribution is to derive analytic predictions for the code-averaged probability of perfect recovery $\bar P(R|n_e)$ according to Eq. (15), where $n_e$ is the number of erased sites in the fixed-fraction model and $k$ is the number of logical encoded qubits. In Appendix A, we further show that $\bar P(R|n_e)$ is equal to the code-averaged max-average fidelity $\bar F_{\rm avg}$ in the thermodynamic limit. We use this result to argue that the Haar random erasure threshold, where we only approximate $\bar F_{\rm avg}$, is in the same universality class as the random stabilizer erasure threshold.
The encoding circuit $U$ for a random stabilizer code is a random Clifford unitary on $N$ qubits. Since spatial locality is irrelevant in this discussion, we take the initial unencoded logical qubits to be given by the last block of $k$ qubits, which implies that the stabilizer group has generators $\bar Z_i = U Z_i U^\dagger$ for $i = 1, \ldots, n_s$, where $n_s = N - k$ is the number of stabilizer generators. We use the optimal decoding strategy for erasure errors described in Sec. III B: given a set of erased sites $e$ and syndromes $s$, the decoder applies any Pauli operator $R_{se}$ that lives on $e$ and flips each stabilizer generator $\bar Z_i$ to have the same sign as $s_i$ [40]. The circuit-averaged probability for perfect recovery under optimal decoding satisfies

$$\bar P(R|n_e) = \mathbb{E}_U \sum_e P(R|U,e)\, p(e) = \mathbb{E}_U P(R|U,e), \tag{20}$$

because $\mathbb{E}_U P(R|U,e)$ depends on $e$ only through $n_e = |e|$ for a fully random Clifford circuit.
Since random stabilizer codes are nondegenerate in the large-$N$ limit, every correctable error needs to map to a unique syndrome. We can find the number of unique syndromes for a given $U$ and $e$ by determining the $\mathbb{F}_2$-rank $n_{se}$ of the syndrome matrix $M_S(S,L,e)$ formed from the first $n_s$ columns of $M$ from Eq. (16). Intuitively, $2^{n_{se}}$ is simply the total number of unique syndromes available to the decoder for this erasure pattern. Thus, the average recovery probability is

$$\bar P(R|n_e) = 2^{\bar n_{se} - 2n_e},$$

where $\bar n_{se} \equiv \log_2 \mathbb{E}_U[2^{n_{se}(U,e)}]$ is just a function of $n_e$, $n_s = N - k$, and $N$. We can approximate the behavior of $\bar n_{se}$ for a random stabilizer code in the two limits $n_s \ll 2n_e$ or $n_s \gg 2n_e$ [139]. In the former case, the number of possible errors is exponentially larger than the number of available syndromes. For a random $U$, each syndrome occurs with nearly equal probability; thus, each syndrome will be occupied with high probability, resulting in the scaling $\bar n_{se} = n_s$. In the opposite limit, the number of available syndromes is exponentially larger than the number of errors. As a result, there is a high probability that each error gets mapped to a distinct syndrome, resulting in the scaling $\bar n_{se} = 2n_e$. These estimates show that $\bar P(F) = 1 - \bar P(R)$ has a discontinuity at the channel capacity bound $n_s/N = 1 - R = 2n_e/N = 2e$ in the large-$N$ limit.
In the RMT approach described below, we can explicitly calculate $\bar P(F)$ for all values of $n_e$ and $n_s$, including arbitrarily close to threshold, where $\delta = 2n_e - n_s = 2(e - e_c)N$ is the distance from the critical point and $r_c$ is the recovery rate at the critical point. As we show in the section below, $r_c = 0.610322\ldots$ in the RMT model. From this RMT solution, we also find that the higher order corrections to this formula are exponentially suppressed in the distance from the critical point, $O(2^{-2|\delta|})$.

A. RMT solution
The exact formula for the code-averaged recovery probability is given by

$$\bar P(R|n_e) = \mathbb{E}_U\left[2^{-r_M(U,e)}\right].$$

There is no need to average over $e$ for a fixed erasure number because the circuit average removes the dependence on the spatial locations of errors. In the RMT approach, we assume that the syndrome matrices $M(S,L,e)$ and $M_S(S,L,e)$ are given by random $2n_e \times (n_s + 2k)$ and $2n_e \times n_s$ matrices, respectively. We do not expect this result to be true exactly because it ignores the constraint that the time evolution preserves commutation relations; however, we conjecture that it is accurate up to exponentially small corrections in $N$. The reason this is a plausible hypothesis is that $M$ and $M_S$ can be constructed by taking submatrices of a much larger $2N \times 2N$ tableau representation for $U$ [137,138]. The RMT ansatz is based on the assumption that these submatrices are insensitive to the "global" constraint on the otherwise random $U$ that it preserves commutation relations of Pauli group elements.
In the RMT ansatz and for $e < 1/2$, the matrix $M$ will have full rank $2n_e$ with a probability that converges to one exponentially in $N$ because the number of columns is much greater than the number of rows. The average recovery probability then reduces to a combinatorial formula regarding the rank distribution of the $M_S$ matrix

$$\bar P(R|n_e) \overset{\rm RMT}{=} \sum_m 2^{m - 2n_e}\, \frac{\#\{2n_e \times n_s \text{ matrices of rank } m\}}{\#\{2n_e \times n_s \text{ matrices}\}}.$$

The denominator is the number of matrices over $\mathbb{F}_2$ of size $2n_e \times n_s$, which is equal to $2^{2n_e n_s}$ since each entry can take one of two independent values. Finding the number of $2n_e \times n_s$ matrices of rank $m$ is a less trivial, but familiar, result in combinatorics that also has applications to classical error correction [140]. For completeness, we provide a derivation in Appendix B. Using this formula, the probability of successful recovery has the analytic expression

$$\bar P(R|n_e) = \sum_{m=0}^{n^m_{se}} 2^{m - 2n_e - 2n_e n_s} \prod_{i=0}^{m-1} \frac{(2^{2n_e} - 2^i)(2^{n_s} - 2^i)}{2^m - 2^i},$$

where $n^m_{se} = \min(n_s, 2n_e)$. $\bar P(F) = 1 - \bar P(R)$ has the asymptotic behavior given by Eq. (22) with the critical parameter $r_c \approx 0.610322\ldots$.

By numerically sampling $M_S$ matrices generated by depth-$N$ 1D local circuits with periodic boundary conditions, we have verified that these RMT predictions accurately approximate the true failure probability on these sizes. The results are shown in Fig. 2 up to size $N = 128$ for $R = k/N = 1/2$. We find excellent agreement between the exact numerical results and the RMT prediction throughout the critical region, even for sizes down to $N = 16$. To obtain a more precise comparison, we estimate the success probability with higher precision at the critical point. boundary conditions provides our current best estimate $r_c \approx 0.61029 \pm 3.7 \cdot 10^{-5} = 0.61029(4)$, which agrees with the RMT value at a precision of $10^{-4}$.
In Appendix D, we further show that the recovery probability is self-averaging at $e_c$ in the sense that a typical random code has a recovery probability that converges to $r_c$ in the large-$N$ limit.
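The RMT prediction of this subsection can be evaluated exactly with integer arithmetic. The sketch below is an illustration under stated assumptions: it uses the standard count of $a \times b$ matrices of rank $m$ over $\mathbb{F}_2$, namely $\prod_{i=0}^{m-1} (2^a - 2^i)(2^b - 2^i)/(2^m - 2^i)$, and weights each rank by $2^{m - 2n_e}$ as in the combinatorial formula above. The function names are hypothetical.

```python
from fractions import Fraction


def count_rank_m(rows: int, cols: int, m: int) -> int:
    """Number of rows x cols matrices over F_2 with rank exactly m."""
    num, den = 1, 1
    for i in range(m):
        num *= (2**rows - 2**i) * (2**cols - 2**i)
        den *= 2**m - 2**i
    return num // den  # exact: the count is always an integer


def rmt_recovery_probability(n_e: int, n_s: int) -> Fraction:
    """Average recovery probability for a random 2*n_e x n_s syndrome matrix.

    A rank-m syndrome matrix distinguishes 2**m of the 2**(2*n_e) equally
    likely erasure errors, so it contributes a factor 2**(m - 2*n_e).
    """
    rows = 2 * n_e
    total = 2 ** (rows * n_s)
    p = Fraction(0)
    for m in range(min(rows, n_s) + 1):
        p += Fraction(count_rank_m(rows, n_s, m), total) * Fraction(2**m, 2**rows)
    return p


if __name__ == "__main__":
    # At the critical point 2*n_e = n_s the value approaches r_c ~ 0.6103.
    for n_e in (2, 4, 8, 16):
        print(n_e, float(rmt_recovery_probability(n_e, 2 * n_e)))
```

Evaluating at $2n_e = n_s$ for increasing sizes shows rapid convergence toward the critical value $r_c$ quoted in the text, while moving $n_s$ away from $2n_e$ shows the exponential approach to zero or one failure probability.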

V. QUASILOCAL RANDOM STABILIZER CODE THRESHOLD
In this section, we investigate the erasure threshold for random stabilizer codes generated by finite-depth quantum circuits in finite-size systems.

A. Block model: mean-field limit
We can gain a surprising amount of insight into this local random coding problem by first considering a toy model with the simplified block encoding scheme illustrated in Fig. 3. Furthermore, the basic arguments in this section are not specific to the erasure channel. In this model, we remove gates that couple different blocks of qubits such that each block undergoes completely independent random unitary dynamics. Intuitively, this model can be interpreted as a type of mean-field model for the random code transition. At large depth, the average failure probability for this model becomes an upper bound on the average failure probability of the random code transition.
Specifically, we break up a system of $N$ qubits into cubic blocks of size $N_b = L^D$, where $D$ is the space dimension of the encoding Clifford circuit in each block and $L$ is the linear size of the block. Each block has approximately $(1-R)N_b$ stabilizers and $RN_b$ logical qubits. Running a high-depth [$d = O(N_b)$] random Clifford circuit on each block results in a rate $R$ random stabilizer code on this block of qubits. If we apply an erasure error below the random code threshold, then the average recovery probability is just the product of the average recovery probability for each block (since the codes between blocks are uncorrelated), with each factor controlled by $\delta_i = (1-R)N_b - 2n_{ei}$, the distance from the critical point in block $i$ with $n_{ei}$ erased sites. In order for our approximations to be valid we require that $N_b$ grows as $O(\log N)$ or faster. We make use of the fact that the fluctuations in the number of erasures in each region are determined by the central limit theorem As a result, the average failure probability is given by Thus, for $e < e_c$, the failure probability converges to zero when $\lim_{N\to\infty}$

To achieve a random code on each block, we naively need to apply a depth $d = O(L)$ circuit [37,38]; however, this neglects the fact that there are rare Clifford circuits where Pauli operators remain localized to a given site. In particular, to preserve commutation relations every two-qubit Clifford gate has to map at least one single-site Pauli operator for each site to another single-site Pauli operator. The probability of such localized logicals appearing in a system of size $N$ scales as $N/A^d$ for a constant $A$ that depends on the ensemble of gates used in the random circuit. Note that even if all two-qubit gates are entangling, $A$ will still be finite. This constraint implies that one needs to apply a depth $O(\log N)$ random Clifford circuit regardless of dimensionality to avoid these rare localized operators.
As a result, the block model only converges to zero failure probability for depth d = O(log N ) for all spatial dimensions. Interestingly, after our work appeared, a similar type of argument was used in Corollary 4 of Ref. [141] to prove a lower bound of O(log N ) on the depth required to achieve a form of anticoncentration in random circuits. The ensemble was formed from two-local circuits with the gates drawn randomly from a two-design. Developing a more complete understanding of the relation between encoding properties of low-depth random circuits and other observables, e.g., anticoncentration or sampling complexity, is an interesting subject for future work.
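The logarithmic depth requirement follows directly from demanding that the localized-logical probability $N/A^d$ stay below some tolerance $\epsilon$, which gives $d \ge \log(N/\epsilon)/\log A$. The short sketch below makes this concrete. The value $A = 2$ and the tolerance are purely hypothetical placeholders, since the text only states that $A$ is a finite, ensemble-dependent constant, and the function name is chosen for illustration.

```python
def min_depth_without_localized_logicals(n_qubits: int, a: float, eps: float = 0.01) -> int:
    """Smallest integer depth d such that n_qubits / a**d <= eps.

    `a` plays the role of the ensemble-dependent constant A in the text;
    its numerical value here is an assumption of this sketch.
    """
    if a <= 1:
        raise ValueError("the estimate N / A**d only decays for A > 1")
    d = 0
    while n_qubits / a**d > eps:
        d += 1
    return d


if __name__ == "__main__":
    # Squaring the system size roughly doubles the required depth,
    # which is the signature of d = O(log N).
    for n in (2**8, 2**16, 2**32):
        print(n, min_depth_without_localized_logicals(n, a=2.0))
```

The loop-based search avoids floating-point rounding issues that a direct logarithm could introduce at exact powers of $A$.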
In the case of the channel coding problem considered here, there are two routes to overcome the $O(\log N)$ lower bound on the depth required to achieve zero failure probability below capacity. One simple approach within the block model picture is to apply an optimized implementation of a two-design following Ref. [36], but including SWAP gates to map the all-to-all circuit to a local geometry. This approach also requires the use of $O(\log N)$ ancilla qubits per block, which, by our conventions, would effectively reduce the overall rate of the code. With such optimized circuits, one can deterministically encode each block into a high-performance code in depth $O(N_b^{1/D})$, thereby allowing convergence of the full system to zero failure probability at depth $O[(\log N)^{1/D}]$. This argument shows that, in principle, one can surpass the $O(\log N)$ scaling by introducing long-range correlations into the encoding and allowing for additional ancilla qubits. In practice, however, the block model will always have a relatively weak convergence with depth because it is not taking advantage of correlations that can build up between blocks. To achieve the sub-logarithmic scaling in practice, we therefore use the expurgation strategy described in Sec. VI below. In this approach, these rare localized logical operators are directly removed from the code by the expurgation process.

B. Critical scaling
In the vicinity of the critical point for the random stabilizer code, it is clear that the block encoding scheme fails because each individual block fails with a large probability. As mentioned in the previous section, we expect the original model to achieve better performance because the "blocks" formed by the finite-depth circuit are effectively correlated with each other. This implies that the error correction in regions with excessive numbers of erasures can be assisted by nearby regions. As shown in Fig. 4(a), we numerically observe that the convergence to the critical properties of the random code behavior occurs at depth $O(\sqrt{N})$ in 1D. On the other hand, for $D \ge 2$, the convergence, even at the critical point, occurs at depth $O(\log N)$. As we show below, this distinction between $D = 1$ and $D \ge 2$ can be traced to the familiar fact that the boundary of a contiguous region in 1D is effectively zero dimensional. In the discussion below, we assume we are working at depth greater than $O(\log N)$ so that large inhomogeneities in the quality of the random code are smoothed out, while what is left over is the randomness in the error pattern.
We first give an argument for the $\sqrt{N}$ scaling in 1D based on a mapping to a random walk for the iid model. If we sum up the number of erasures relative to the critical number along the length of the system, this is a biased random walk that travels a certain distance on summing around the full system. The random walker's time is the system's space, while the random walker's space is an excess number of erasures in that segment of the system's space. The failures occur where this random walk does a "backtrack" of distance $d$. So the characteristic $d = d^*$, where the failure probability converges towards its high-depth [$d = O(N)$] value, is the $d$ where these backtracks become rare in the system of length $N$. From the statistics of random walks, this has a probability of occurring that falls off as $\exp(-AN/d^2)$ for some constant $A$, but the region that is dense can be at of order $d^2/N$ distinct locations [142]. By considering only regions where these local fluctuations are above threshold, we arrive at the scaling form $d^* = N^{1/2} g[(e_c - e)N^{1/2}]$ with $g(x) \sim (1/x)\log(x)$ at large $x$ and $g(0)$ of order one. Thus, well below threshold in 1D ($x \gg 1$), the scaling for the critical depth is $d^* \sim (e_c - e)^{-1} \log N$. The depth required to converge to zero failure probability is always $O(\log N)$ in 1D, but the prefactor diverges as one approaches the optimal threshold.
A related argument that connects more directly to the syndrome matrix $M(S,L,e)$ proceeds as follows: Imagine we apply a fixed number of erasures at the critical point $n_e = n_s/2$, but distributed randomly throughout the system. If we cut the system into two halves, then one half of the system will effectively be above threshold with $\sim\sqrt{N}$ extra erasures, while the other half will be below threshold. In order to correct the above-threshold region, we need to "borrow" a sufficiently large number of error syndrome basis elements in $M(S,L,e)$ from the region that is below threshold. This requires that the minimum support of our error syndrome basis elements is $\sim\sqrt{N}$ to satisfy this condition; thus, we need to run the circuit to a depth $d \sim \sqrt{N}$.

To test our argument that it is only the local fluctuations in the erasure number that determine the required depth, we compare the convergence to the RMT prediction for random erasures in Fig. 4(a) against regularly arranged erasures in Fig. 4(b). In the spatially nonrandom case, the error is chosen randomly from one of the four regularly spaced erasure patterns with $n_e = e_c N = N/4$. In contrast to the random error model, we see convergence to the large-depth limit with an $O(\log N)$ scaling.
We remark that the recovery probability for a depth zero circuit with this nonrandom error and our layout of logical qubits is equal to 1/2. Thus, the recovery probability is nonmonotonic with depth: it is 1/2 at $d = 0$, then drops close to zero for $0 < d \lesssim A \log N$, and then improves to $\approx 0.6$ for $d > A \log N$; the coefficient is found to be $A \approx 6.5$.
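The $\sim\sqrt{N}$ imbalance between the two halves in the random-erasure argument above can be made quantitative with elementary statistics. In the minimal sketch below, a fixed number $n_e = N/4$ of erasures is assumed to be placed uniformly at random without replacement, so the count in one half follows a hypergeometric distribution. The function name and this modeling choice are assumptions made for illustration.

```python
import math


def half_system_excess_std(n_sites: int, erased_fraction: float = 0.25) -> float:
    """Std. dev. of the number of erasures that land in one half of the system
    when exactly erased_fraction * n_sites erasures are placed uniformly at random.

    Uses the hypergeometric variance n * p * (1 - p) * (N - n) / (N - 1)
    with n = N/2 sampled sites and p = n_e / N.
    """
    n_e = round(erased_fraction * n_sites)
    half = n_sites // 2
    p = n_e / n_sites
    var = half * p * (1 - p) * (n_sites - half) / (n_sites - 1)
    return math.sqrt(var)


if __name__ == "__main__":
    for n in (256, 1024, 4096):
        std = half_system_excess_std(n)
        # The ratio std / sqrt(N) is nearly constant, i.e. the excess is ~ sqrt(N).
        print(n, round(std, 3), round(std / math.sqrt(n), 4))
```

Quadrupling $N$ doubles the typical excess, consistent with the $\sqrt{N}$ support that the syndrome basis elements must reach in 1D. For the regularly spaced erasure patterns of Fig. 4(b) this fluctuation is absent by construction.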
The situation changes dramatically in higher integer dimensions where the prefactor of the log N scaling of d * does not need to diverge as one approaches the optimal erasure threshold. In this case, the random fluctuations in erasure number within a given region can be overcome by the overlapping syndromes near the boundary whenever This tension between random fluctuations and ordering tendencies is familiar from Imry-Ma arguments. This scaling indicates that D = 2 is the marginal dimension for the relevance of random erasure locations. For D > 2, at the depth d ≥ A log N needed to produce a near-optimal code, the effect of this erasure-location randomness is subdominant. This appears to remain true in the marginal dimension D = 2, where the subdominance is only by factors of log N . In Fig. 5 we show the numerical results for the recovery probability through the erasure threshold at different values of the depth in two dimensions. We clearly see the exponential convergence to the RMT prediction throughout the critical region.
In the case of intermediate dimensions 1 < D < 2, such as can be realized in fractal lattices and critical percolation clusters, the perimeter of a region with excess erasures may have a nontrivial scaling with N that is also not spatially uniform. As a result, it would be an interesting subject for future work to precisely determine the fate of the critical scaling on particular real space lattices with these intermediate dimensions.

C. Spatial correlations of uncorrectable errors
When used as a toy model for the low-depth regime $\log N \ll d \ll N^{1/D}$, the block model suggests that errors will generally be bunched in space. In particular, this model leads to the intuition that regions with excess erasures will fail first with an uncorrectable error of weight $\sim d^D$. To test this argument we consider a setup inspired by the entanglement fidelity: two of the logical qubit sites are initially entangled with external reference qubits and the other logical qubits are in a random pure product state.
Using these reference qubits as local probes, we define an error as occurring in the vicinity of location $i$ if reference qubit $i$ loses its entanglement with the system following the full encoding, error, and decoding procedure. Specifically, we study the change in mutual information between each probe and the system

$$\Delta I(R_i : S) = I(R_i : S) - I(R_i : S'), \tag{32}$$

where $I(R_i : S') = S(\rho_{R_i}) + S(\rho_{S'}) - S(\rho_{R_i S'})$, $\rho_{R_i S'}$ is the density matrix of the system and reference probe $i$ following the decoded error channel, $\rho_{R_i} = \mathrm{Tr}_{S'}[\rho_{R_i S'}]$, and $\rho_{S'} = \mathrm{Tr}_{R_i}[\rho_{R_i S'}]$. Initially, the mutual information $I(S : R_i) = 2$. In these stabilizer code models with Pauli error channels, the mutual information changes in discrete integer steps. For two reference qubits entangled with the system at sites $x_1$ and $x_2$, we then define code-averaged local error profiles

$$P_1(x_{12}, d) = \bar P[\Delta I(R_1 : S) > 0], \qquad P_{12}(x_{12}, d) = \bar P[\Delta I(R_1 : S) > 0,\ \Delta I(R_2 : S) > 0],$$

where $\bar P(\cdot) = \mathbb{E}_U P(\cdot)$ and $x_{12} = |x_1 - x_2|$ is the distance between the probes. Numerical results for these error profile functions are shown in Fig. 6(a) for $D = 1$ with length $L = N = 128$.

FIG. 6. (a) Local uncorrectable error probability of one, $P_1(x_{12}, d)$, or both, $P_{12}(x_{12}, d)$, reference probe qubits entangled with the system vs. $d$. Here, we took $D = 1$, $R = 1/2$, $n_e/N = e_c = 1/4$, and $N = 128$. Each two-qubit gate in the circuit is a random Clifford gate. The local error probability is defined as the probability that the mutual information of a reference qubit is less than maximal after the optimal decoding. Note that the red curve almost perfectly coincides with the yellow curve, indicating an absence of connected correlations for these far separated uncorrectable errors. (b) Conditional error probability of probe 2 when an error affects probe 1 vs. scaled distance for different $d$. When a probe fails in a given region, it implies that the second probe a distance $\sim d$ away also fails with high probability.
We took $R = 1/2$ and $n_e = e_c N = N/4$ so that uncorrectable errors occur relatively frequently. We see convincing evidence that spatial locality plays an important role for these low-depth codes, despite the potential for nonlocal effects induced by the syndrome measurements. In particular, when $x_{12} = L/2$, the joint failure distribution $P_{12}(L/2, d)$ factorizes into a product distribution $[P_1(L/2, d)]^2$ at low depths $d$. On the other hand, when $x_{12} < d$, there is a clear bunching effect whereby $P_{12}(x_{12}, d) > [P_i(L/2, d)]^2$. We study this more quantitatively in Fig. 6(b) in terms of the conditional failure probability of reference probe 2 given that reference probe 1 failed: $P_{2|1}(x, d) = P_{12}(x, d)/P_1(x, d)$. Rather intuitively, we see a collapse of the curves for different depths when this conditional profile is plotted as a function of $x/d$.
These spatial correlations in the uncorrectable errors are an indication that these low-depth codes retain features associated with spatial locality despite achieving the critical behavior of fully random or high-depth codes. Thus, in many respects, they are a truly distinct class of codes from fully random stabilizer codes.

VI. EXPURGATION ALGORITHM
As discussed in Sec. V A, there are strategies in higher dimensions to overcome the $\log N$ depth scaling found for random Clifford circuits. In this section, we introduce a natural method to improve the performance of these low-depth codes based on the fact that the dominant failure mode at depths $[\log N]^{1/D} \le d \le \log N$ is rare regions with bad logical qubits.
The basic ingredient in our algorithm is the efficient implementation of quantum measurements of stabilizer code-space density matrices [138]. We assume we are given a single logical operator $g$. We can update a generating set for the stabilizer code and its logical operators by making a projective measurement of the code-space density matrix following the tableau rules outlined by Aaronson and Gottesman [138], where the sign of the measurement outcome is randomly chosen. This projective measurement operation will not affect the original generating set for $S$ except to add $g$ to the list of generators; however, it will modify the logical operators to ensure that all of the remaining logical operators commute with $g$. This implies that the "destabilizer" operator $\bar g$ associated with $g$ is no longer a logical operator. As a result, we can form an $[N, k-1]$ stabilizer code or stabilizer subsystem code by converting $g$ or $\bar g$ into a stabilizer or gauge operator, respectively. This procedure can be iterated to successively convert logical operators into additional stabilizers or gauge operators, while leaving the original syndrome stabilizers unaffected.
Specifically, in our expurgation algorithm we begin with an [N, k] stabilizer code with stabilizer group S and logical operators L. We then randomly generate an erasure pattern e and compute the matrix M (S, L, e). Performing row reduction allows us to form a basis {g i } of linearly independent errors that map to the zero syndrome, but have nontrivial logical operator content. We then perform a sequence of projective measurements of these operators as described above to form a new stabilizer or subsystem code. This procedure is iterated many times until either the rate of the code approaches a specified target value, the failure probability reaches a certain threshold, or the number of logical operators goes to zero (i.e., expurgation fails).
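A single expurgation step can be sketched at the level of the binary symplectic representation. This is a simplified illustration and not the tableau-based measurement routine of Ref. [138]: phases and measurement signs are ignored, Paulis are stored as `(x, z)` integer bitmasks, and in the gauge variant the pair $(g, \bar g)$ is assumed to become a gauge qubit while the stabilizers are left untouched. All names, including `expurgate`, are hypothetical, and the [[4,2,2]] code serves only as a small test case.

```python
def popcount(v):
    return bin(v).count("1")


def anticommutes(p, q):
    (x1, z1), (x2, z2) = p, q
    return (popcount(x1 & z2) + popcount(z1 & x2)) % 2 == 1


def multiply(p, q):
    """Product of two Paulis up to a phase (phases are ignored in this sketch)."""
    return (p[0] ^ q[0], p[1] ^ q[1])


def gf2_rank(rows):
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def pack(p, n):
    """Flatten (x, z) into a single 2n-bit vector for rank computations."""
    return p[0] | (p[1] << n)


def expurgate(n, stabilizers, logicals, g, into="stabilizer"):
    """Remove the logical operator g; return (stabilizers, gauge, logicals)."""
    if any(anticommutes(g, s) for s in stabilizers):
        raise ValueError("g must commute with every stabilizer")
    partner = next((i for i, l in enumerate(logicals) if anticommutes(g, l)), None)
    if partner is None:
        raise ValueError("g acts trivially on the logical qubits")
    g_bar = logicals[partner]
    base = [pack(s, n) for s in stabilizers] + [pack(g, n)]
    kept = []
    for i, l in enumerate(logicals):
        if i == partner:
            continue
        if anticommutes(l, g):       # fix commutation with g using g_bar
            l = multiply(l, g_bar)
        if anticommutes(l, g_bar):   # fix commutation with g_bar using g
            l = multiply(l, g)
        vecs = base + [pack(q, n) for q in kept]
        if gf2_rank(vecs + [pack(l, n)]) > gf2_rank(vecs):
            kept.append(l)           # keep only independent logicals
    if into == "stabilizer":
        return list(stabilizers) + [g], [], kept
    return list(stabilizers), [g, g_bar], kept


# [[4,2,2]] code: stabilizers XXXX, ZZZZ; logicals XXII, ZIZI, XIXI, ZZII.
STABS = [(0b1111, 0), (0, 0b1111)]
LOGS = [(0b0011, 0), (0, 0b0101), (0b0101, 0), (0, 0b0011)]

if __name__ == "__main__":
    print(expurgate(4, STABS, LOGS, (0b0011, 0)))           # g = XXII -> stabilizer
    print(expurgate(4, STABS, LOGS, (0b0011, 0), "gauge"))  # g, g_bar -> gauge
```

In both variants one logical qubit is removed and the surviving logical generators commute with $g$ and $\bar g$. Iterating this step over the zero-syndrome basis $\{g_i\}$ obtained from row reduction of $M(S,L,e)$ would mirror the loop described in the paragraph above.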
To put this algorithm on firmer mathematical footing, we prove the following two simple propositions: Proposition 1. Let S be the stabilizer group for a stabilizer subsystem code with logical operators L and gauge group G. For every g ∈ L ⊗ G that acts nontrivially on L, the distance of S after expurgating g into S or G monotonically increases.
We assume $g$ and an anticommuting logical $\bar g$ are two of the generators and they commute with all other generators for $L$. The distance of the subsystem code can be found by finding the first Pauli group element $E_D$ in this list that has $s_{E_D} = 0$ and a nontrivial anticommutator vector $\ell_{E_D}$. If we expurgate $g$, then this removes $g$ and $\bar g$ from the list of generators, which amounts to removing two of the columns from $\ell_{E_D}$ and adding one column to $s_{E_D}$ or not (depending on the expurgation strategy). If $\ell_{E_D}$ becomes trivial, then the distance might increase depending on what happens to the next Pauli group element in the list ordered by the Hamming weight. If $s_{E_D}$ becomes nontrivial, then the distance might also increase.
If $s_{E_D}$ remains trivial and $\ell_{E_D}$ remains nontrivial, then the distance stays the same. Therefore, the distance is monotonic. Essentially, we use the following two properties of expurgation: (i) the stabilizer group never shrinks in size (it can even grow depending on the strategy) and (ii) the number of logical operators only decreases. Hence, the relevant set of operators that commute with stabilizers and anticommute with some logical operator never grows. Therefore, the code distance, defined as the minimum Hamming weight of elements in the relevant set of operators, never decreases.
A related proposition that follows a similar line of reasoning is: Proposition 2. Let S be the stabilizer group for a stabilizer subsystem code with logical operators L and gauge group G. For every g ∈ L ⊗ G that acts nontrivially on L, the optimal decoding recovery probability of S after expurgating g into S or G monotonically increases for all Pauli error channels.
For each Pauli group element $E$ in the list from Proposition 1, we let their probability of appearing in the error channel be $p(E)$. We then group this list of anticommutator vectors into subsets with the same syndrome vector $s_i$, which each occur with total probability $P(s_i)$. We further break up these groups into error classes $\bar E_{ij}$ of errors with identical values of $\ell_{E_{ij}}$. The conditional recovery probability is the probability of the most likely error class divided by $P(s_i)$, such that the total recovery probability is $P(R) = \sum_i p_i$, where $p_i$ is the probability of the most likely error class with syndrome $s_i$. Expurgation of $g$ will never decrease the total value of this sum. In the case where $g$ is turned into a gauge operator, the syndrome classes and their total probabilities are unchanged, while the logical equivalence classes for that syndrome can only combine with each other or stay the same. As a result, $p_i$ is monotonically increasing for each $i$, which makes $P(R)$ monotonically increase under expurgation. A similar argument holds when $g$ is turned into a check operator.

The dynamics during this expurgation process bears close resemblance to the purification dynamics of $\rho_S$ for random circuit models with measurements studied by two of the authors [93] and developed further in Refs. [103,107,119]. In that case, though, the measurements are not selectively chosen to project out certain logical operators, but rather they are chosen as random, few-site projective measurements. In both dynamics, however, we observe a similar trend that the entropy of the code-space density matrix progressively decreases with measurements until it reaches a plateau value. The plateau can either be at a subextensive value (a "pure" phase) or at a finite entropy density (a "mixed" phase). What is common between both types of dynamics is that, whenever there is residual entropy in the code-space density matrix, the expurgated code is able to better protect the remaining logical qubits against future errors in the system that are statistically independent from the errors that helped form the code.

FIG. 7. (a) Interpolated depth $d^*$ to reach 50% failure probability for a 2D random Clifford circuit with periodic boundary conditions vs. log-system size. All logical operators were turned into gauge degrees of freedom during expurgation. We removed the small amount of $N/32$ to aid in extracting the scaling vs. $\log_2 N$ to large sizes and small depths. We took an erasure fraction $n_e/N = 1/8$ for both the expurgation algorithm and the calculation of the failure probability. (b) Same as (a), but for an all-to-all circuit in which $N/2$ pairs of sites are randomly selected to apply a two-qubit gate for each unit of depth. Each two-qubit gate in both geometries is a random Clifford gate.
In Fig. 7, we provide an illustrative example of the performance improvements that are possible with this expurgation strategy for 2D and all-to-all random circuit encodings. In both cases, all expurgated logicals were turned into gauge qubits, which has the advantage that the syndrome check operators are unchanged. In this case, the support of each check operator is determined by the initial encoding circuit. Maintaining low-weight check operators has advantages for fault-tolerance by limiting the effects of measurement errors. For both geometries, we see nearly linear scaling of $d^*$ with log N before expurgation. After expurgation, $d^*$ has a strongly sublinear scaling with log N. We have also studied the performance of these expurgated codes in 1D, but we do not find improvement of the log N depth scaling upon expurgation. It is an interesting subject for future work to better characterize the full range of possibilities that result from this type of targeted expurgation process for quantum codes that begin with many logical qubits.

VII. HAAR RANDOM CODE THRESHOLD
In this section, we study the Haar random erasure threshold. We find a threshold erasure rate and critical scaling behavior similar to those of the random stabilizer erasure thresholds; however, we observe small quantitative differences in the scaling functions near the critical point for the two codes. These results indicate that Haar random codes perform slightly better than random stabilizer codes for erasure errors.
In contrast to our analysis of the stabilizer codes, we do not perform an optimal decoding analysis and only test for the existence of an erasure threshold. We consider the coherent quantum information $I_c$ of an initial state in the code space after application of the erasure channel in Eq. (9) on a random set e of the sites, Eq. (38), where $\rho_Q$ is an initial encoded density matrix, and $\rho_e$ is the reduced density matrix on e. We study the purified channel where a reference system R is used to purify $\rho_Q$ and an environment E purifies the error operation [see inset to Fig. 8(a)]. In the case of the erasure error, the interaction of the system with the environment is through a SWAP operation of each erased qubit in e with a qubit in E. The mutual information between the reference and the fictitious environment is equal to $I(R:E) = |S(\rho_Q) - I_c|$, which depends on the rate R of the code through $S(\rho_Q)$. As we discussed in Sec. III A, when $I(R:E) = |S(\rho_Q) - I_c| < \epsilon$, then the max-entanglement fidelity for that input state satisfies $F_e(\rho_Q) \ge 1 - 2\sqrt{\epsilon}$, i.e., for $\epsilon$ sufficiently small, the error channel can be approximately decoded.
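The relation between the mutual information and the coherent information used above follows from the purity of the joint state of reference, surviving qubits, and environment. The short derivation below is a sketch under the standard definition $I_c = S(B) - S(RB)$; the label B for the qubits that survive the erasure is our own notation, not the paper's.

```latex
\begin{align}
I(R:E) &= S(R) + S(E) - S(RE), \\
S(R) &= S(\rho_Q), \qquad S(RE) = S(B), \qquad S(RB) = S(E)
  \quad \text{(the state on } RBE \text{ is pure)}, \\
I_c &= S(B) - S(RB) = S(B) - S(E), \\
I(R:E) &= S(\rho_Q) - \bigl[ S(B) - S(E) \bigr] = S(\rho_Q) - I_c .
\end{align}
```

Since the mutual information is non-negative, $S(\rho_Q) - I_c \ge 0$, so a small $I(R:E)$ means the coherent information is close to its maximum value $S(\rho_Q)$.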
In Fig. 8, we show the results of numerical simulations for $\bar{I}(R:E) = \mathbb{E}_U\, I(R:E)$ for the Haar random code with a $\rho_Q$ that acts trivially on the code space. We show the results for both fixed-fraction and iid erasure errors. We see scaling results consistent with the random stabilizer code: the fixed-fraction error model leads to a finite-size rounding of the transition over a region scaling as $|e - e_c| \sim 1/N$. The random fluctuations in the total number of erasures in the iid model then round out the threshold even more, producing a "critical" region of width $|e - e_c| \sim 1/\sqrt{N}$ and amplitude $\bar{I} \sim \sqrt{N}$ at $e_c$. To obtain a more direct comparison between the Haar random and random stabilizer codes, we show the average coherent quantum information of each code ensemble in Fig. 9. We see remarkably close quantitative agreement between the performance of the two codes; however, there are significant differences that appear at the critical point. In particular, in Fig. 9(b), we see that the two codes appear to be converging to substantially different values of $I(R:E)$ in the large-N limit of 0.720(5) (Haar) and 0.848(5) (Clifford). Thus, a Haar random code performs slightly better than a random stabilizer code in this region where the code fails. These quantitative differences in the scaling function indicate that the random stabilizer code does not necessarily saturate the performance of an optimal code, even at leading order in the large-N limit. In the case of the depolarizing channel, the optimal decoding threshold for a random stabilizer code is expected to be smaller than the channel capacity limit [124,125].

VIII. STATISTICAL MECHANICS MAPPING: DOMAIN WALL PINNING
In this section, we present an approximate mapping of the erasure threshold to a first-order domain-wall pinning transition in a related statistical mechanics description. This discussion applies to both Clifford and Haar models.
We consider the quenched average of the purity of a subregion A. A natural approximation to the coherent quantum information is the difference in log-average purity of each subregion, which we denote $I_p$. Although this quantity does not have a clear significance for error correction in general systems, we expect that, for deep Haar random circuits, the fluctuations in $I_c$ over circuits are small enough that it is well approximated by $I_p$ [73,74,80]. For any U constructed of local two-qubit gates distributed according to a two-design, we can compute $I_p$ after circuit averaging using a well-studied mapping between the average purity of subregions of a D-dimensional random circuit and a (D + 1)-dimensional partition function of an Ising model with certain boundary conditions at late times [74]. The condition that the initial state is mixed on the logical qubit degrees of freedom corresponds to a spin-polarized bottom boundary condition on the logical qubit sites [97]. In this mapping, $I_p$ becomes the free energy cost of flipping the polarization of the top erased boundary condition in the presence of the polarized boundary condition due to the logical qubits (see Fig. 10). The temperature of the effective Ising model is well below the transition temperature, which implies that the free energy is minimized primarily through energy minimization. Using a minimal energy surface approximation, we obtain a direct estimate for the analog of the mutual information between the reference and the environment for the log-average purity

$$I_p(R:E) \approx \begin{cases} 0, & 2n_e \le n_s, \\ 2n_e - n_s, & n_s \le 2n_e \le (1+R)N, \end{cases}$$

where $n_s = (1 - R)N$. This quantity undergoes a phase transition at the same point as the optimal erasure threshold; thus, we suspect it captures some essential features of the threshold for the optimal code. In particular, the point $2n_e = n_s$ corresponds to a transition in the left half of Fig. 10 where the top boundary condition is no longer sufficiently strong to polarize the bulk of the system.
In this case, the middle domain flips to align with the logical qubits.
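As a quick numerical reading of the minimal-surface estimate, the sketch below evaluates it as a function of the number of erasures. It assumes the piecewise form in which the estimate vanishes for $2n_e \le n_s$ and equals $2n_e - n_s$ for $n_s \le 2n_e \le (1+R)N$; the function names are hypothetical, and the behavior outside that range is not specified by the text, so the code refuses to extrapolate.

```python
def minimal_surface_mutual_information(n_e, N, R):
    """Minimal-energy-surface estimate of the log-average-purity analog
    of I(R:E), under the assumed piecewise form described above.

    n_e: number of erased qubits, N: total qubits, R: code rate.
    """
    n_s = (1 - R) * N  # number of syndrome (non-logical) qubits
    if 2 * n_e <= n_s:
        # Below threshold: flipping the top boundary costs no free energy.
        return 0.0
    if 2 * n_e <= (1 + R) * N:
        # Above threshold: the cost grows linearly with the excess erasures.
        return 2 * n_e - n_s
    raise ValueError("n_e lies outside the range covered by the estimate")


def threshold_erasure_fraction(R):
    """Erasure fraction n_e / N at which 2 n_e = n_s = (1 - R) N."""
    return (1 - R) / 2


# Example with R = 1/2 and N = 64: the transition sits at n_e = 16.
below = minimal_surface_mutual_information(8, 64, 0.5)
above = minimal_surface_mutual_information(20, 64, 0.5)
e_c = threshold_erasure_fraction(0.5)
```

The kink at $2n_e = n_s$ reproduces the threshold location $e_c = (1-R)/2$ set by the channel capacity of the erasure channel.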
Although it is clear that our codes will not be robust against erasure errors that occur during the encoding circuit, we can gain some additional insight into the breakdown of the threshold using this statistical mechanics model. In the Ising model mapping, erasures in the bulk correspond to fixing a finite density of spins in the bulk to point along the + direction, which will overcome the surface pinning effect and prevent the formation of the ordered − phase in the left of Fig. 10. As a result, in order to have a fault-tolerant encoding, some form of error correction should be applied during the evolution itself.

IX. CONCLUSIONS
In this paper, we revisited the study of quantum error-correcting codes generated by low-depth random circuits. In any spatial dimension, we found that a depth $O(\log N)$ random circuit is necessary and sufficient to achieve high-performance coding against erasure errors below the optimal erasure threshold, set by the channel capacity. However, in 1D, coding arbitrarily close to the optimal threshold requires a depth $O(\sqrt{N})$ circuit due to the relevance of spatial randomness in errors near code capacity. The marginal dimension for high-performance, low-depth coding at capacity is 2D, where spatial randomness becomes an irrelevant perturbation.
Although spatial randomness in the errors becomes irrelevant above 1D, there are still large inhomogeneities in the quality of the random code due to random circuit fluctuations. Using a simple block model, we showed that the effects of code randomness in D > 1 can be mitigated through correlated coding and the use of additional ancilla qubits that effectively reduce the rate of the code. An alternative strategy, that works better in practice, is to expurgate low-weight logical operators from the code using quantum measurements. With these methods, we found that good coding becomes possible at sub-log-N depths. Codes with rates near 1/2 generated by our random coding algorithms can achieve high performance at depth 4-8 in 2D for large erasure rates and block sizes of thousands of qubits.
The results in this work open up many directions for future research. To develop these codes for use on near-term devices, a more general theory of optimal decoding for Pauli error channels should be developed. Efficient optimal decoding can likely be implemented for these low-depth codes by taking advantage of their strongly local nature. For example, a brute force method is sufficient in the block encoding model with logarithmic block sizes. It will also be interesting to consider the performance of these codes in conventional threshold theorems, including strategies for achieving full fault-tolerance, e.g., as can always be achieved with concatenation.
Another promising avenue of research is to further develop the expurgation algorithm, which we used to significantly reduce the required depth to achieve successful decoding of erasure errors. It has now been well established that fault-tolerant thresholds can be significantly improved by tailoring codes to the detailed properties of the noise [27-30]. The expurgation algorithm provides a wide variety of additional techniques to tailor codes to specific noise models. In addition, it may be possible to further improve the expurgation by using quantum measurements that explicitly implement entanglement swapping, similar to techniques used for the measurement-based preparation of the surface code states [143,144].
As mentioned in the introduction, developing more concrete connections between the results here and measurement-induced phase transitions is also promising to explore. Unitary-measurement models that include both errors and active error correction may realize a different universality class of these transitions that might be more resilient in near-term quantum computing devices.

Appendix A

In this appendix, we prove that the max-average fidelity converges to the perfect recovery probability for the random stabilizer erasure threshold in the thermodynamic limit.
For an initial random pure state $|0\rangle|\psi\rangle$ on the unencoded logical qubits at the k sites $i = n_s + 1, \ldots, N$ for $n_s = N - k$, the probability of successful error correction following the encoding by the Clifford unitary U, erasure at sites e, syndrome measurements with outcome s, and maximum-likelihood recovery is given by

$$P(R|U,\psi,s,e)\,P(s|U,\psi,e) = \langle 0|\langle\psi|\,U^\dagger R_{se} \cdots$$

where $P_i^{s_i} = (I - (-1)^{s_i}\, U Z_i U^\dagger)/2$ is a syndrome projector for sites $i = 1, \ldots, n_s$ and $R_{se}$ is the conditional recovery operator. We include a sum over Kronecker delta functions $\delta_{s s_e}$, where $\{s_e\}$ is the set of possible syndrome outcomes for e. This term is nonzero only when the observed syndrome is allowed for a given e; thus, it serves as a projector onto the space of allowed syndromes. The maximum possible size of $\{s_e\}$ is $2^{2n_e}$, as this is the number of Pauli group elements with support only on e (modulo a phase).
The precise form of $R_{se}$ depends on the encoding circuit U in addition to s and e; therefore, it cannot be calculated in general without completely specifying U. On the other hand, we can use the fact that it can be moved past the syndrome projectors by turning them into projectors onto the perfect syndrome outcome. Since it has its support entirely on e, we can then cancel the product of the two recovery operators to arrive at the much simpler formula

$$P(R|U,\psi,s,e)\,P(s|U,\psi,e) = \sum_{s_e} \delta_{s s_e} \times \langle 0|\langle\psi|\,U^\dagger \left( \mathrm{Tr}_e\!\left[ U|0\rangle|\psi\rangle\langle 0|\langle\psi|U^\dagger \right] \otimes \frac{I_e}{2^{n_e}} \right) U|0\rangle|\psi\rangle .$$
The partial trace appearing here can be expanded as $\mathrm{Tr}_e[\rho] \otimes I_e/2^{n_e} = 4^{-n_e} \sum_{P_e} P_e\, \rho\, P_e$, where $P_e$ runs over a basis of Pauli group elements that act on sites in the subset e. Since the Clifford group forms a two-design, we have the identity from random matrix theory relating the Clifford average to $\mathbb{E}_\mu$, an average over the Haar measure on the unitary group on N qubits. This formula can be used to bound the average of the second term in Eq. (A3), with the bound expressed in terms of $n_{se}^m = \min(n_s, 2n_e)$; in Eq. (A9) we used the fact that $U|\psi\rangle = U U_\psi |0\rangle$ for $U_\psi$ distributed according to the Haar measure. As a result, up to corrections that decay exponentially with N for any erasure rate e < 1/2, we find the formula for the code-averaged max-average gate fidelity.

Appendix B

In this appendix, we reproduce the standard formula for the number of rank-m $2n_e \times n_s$ matrices over $\mathbb{F}_2$. To find the formula, we first use the fact that the number of $m \times n_s$ matrices of rank m is given by

$$\prod_{k=0}^{m-1} (2^{n_s} - 2^k) = (2^{n_s} - 1)(2^{n_s} - 2) \cdots (2^{n_s} - 2^{m-1}) \qquad \text{(B1)}$$

because we have $2^{n_s} - 1$ choices for the first row and $2^{n_s} - 2^{i-1}$ choices for row i to ensure that it is linearly independent from the first $i - 1$ rows. When $2n_e > m$, we have to account for linear dependence between rows of the matrix, which leads to a degeneracy that is equal to the number of m-dimensional subspaces of a $2n_e$-dimensional vector space over $\mathbb{F}_2$,

$$\prod_{k=0}^{m-1} \frac{2^{2n_e} - 2^k}{2^m - 2^k} . \qquad \text{(B2)}$$

Multiplying the two factors gives the number of rank-m $2n_e \times n_s$ matrices over $\mathbb{F}_2$,

$$\prod_{k=0}^{m-1} \frac{(2^{2n_e} - 2^k)(2^{n_s} - 2^k)}{2^m - 2^k} . \qquad \text{(B3)}$$
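The counting formula for rank-m matrices over $\mathbb{F}_2$ can be checked by brute force for small sizes. The sketch below is an illustration only; the function names are hypothetical, and `a` and `b` stand for the number of rows ($2n_e$ in the text) and columns ($n_s$ in the text). It compares the product $\prod_{k=0}^{m-1} (2^a - 2^k)(2^b - 2^k)/(2^m - 2^k)$ against a direct enumeration using Gaussian elimination over $\mathbb{F}_2$.

```python
from itertools import product


def rank_gf2(rows, ncols):
    """Rank over F_2 of a matrix whose rows are ints used as bit vectors."""
    rows = list(rows)
    rank = 0
    for col in range(ncols):
        # Find a row at or below position `rank` with a 1 in this column.
        pivot = next(
            (i for i in range(rank, len(rows)) if (rows[i] >> col) & 1), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Clear this column from every other row (XOR is addition in F_2).
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank


def count_rank_m_formula(a, b, m):
    """Number of a x b matrices over F_2 with rank exactly m."""
    numerator, denominator = 1, 1
    for k in range(m):
        numerator *= (2**a - 2**k) * (2**b - 2**k)
        denominator *= 2**m - 2**k
    return numerator // denominator


def count_rank_m_brute_force(a, b, m):
    """Enumerate all 2**(a*b) matrices and count those of rank m."""
    return sum(
        1 for rows in product(range(2**b), repeat=a) if rank_gf2(rows, b) == m
    )


# Small check: a = 2 n_e = 4 rows and b = n_s = 3 columns (4096 matrices).
formula_counts = [count_rank_m_formula(4, 3, m) for m in range(4)]
brute_counts = [count_rank_m_brute_force(4, 3, m) for m in range(4)]
```

The counts over all ranks add up to $2^{ab}$, the total number of binary matrices, which gives a second consistency check on the formula.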
Appendix C: Independent-identically distributed erasure errors

In this appendix, we derive the leading order RMT solution for the recovery probability for independent-identically distributed (iid) erasure errors with error rate e. As noted in Sec. II B, the failure probability for iid errors can be obtained from the failure probability for the fixed-fraction model with $n_e = eN$. After averaging over all possible $n_e$, there is additional rounding of the transition due to Poisson fluctuations in the total number of erasures. To evaluate the associated finite-size scaling, we make use of the fact that the total number of erasures is an extensive variable whose fluctuations are governed by the central limit theorem

$$n_e = eN + \Delta, \qquad \Delta \sim \mathcal{N}(0, \sigma^2), \qquad \sigma = \sqrt{e(1-e)N} . \qquad \text{(C1)}$$

In averaging over the erasure errors, we can ignore the critical region for fixed $n_e$ because it has a width $\sim 1$ that is much less than the typical fluctuations in $n_e \sim \sqrt{N}$. The resulting average is expressed in terms of erfc(·), the complementary error function, and $f(x, e)$, the scaling function for this random code transition. At the critical point, $f(0, e_c) = \sqrt{2e_c(1 - e_c)/\pi}$. This analysis implies that the critical region after averaging over $n_e$ has a width scaling as $|e - e_c| \sim 1/\sqrt{N}$ that arises from the width of the probability distribution of $n_e$. Similarly, the average log-failure probability at the critical erasure rate $e_c$ scales as $\sqrt{N}$.
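The rounding of the threshold by fluctuations in the number of erasures can be illustrated numerically. The sketch below makes a simplifying assumption that is ours, not the paper's: the fixed-fraction transition is treated as a sharp step, so that decoding fails exactly when $n_e$ exceeds a cutoff `n_c`. Under that assumption it compares the exact binomial tail with the central-limit (erfc) approximation; the function names and the half-integer continuity correction are illustrative choices.

```python
import math


def exact_failure_probability(N, e, n_c):
    """P(n_e > n_c) for n_e ~ Binomial(N, e), assuming a sharp step at n_c."""
    return sum(
        math.comb(N, n) * e**n * (1 - e) ** (N - n)
        for n in range(n_c + 1, N + 1)
    )


def gaussian_failure_probability(N, e, n_c):
    """Central-limit estimate of the same tail using erfc."""
    sigma = math.sqrt(e * (1 - e) * N)  # standard deviation of n_e
    # The shift by 1/2 is a continuity correction for the integer cutoff.
    return 0.5 * math.erfc((n_c + 0.5 - e * N) / (sigma * math.sqrt(2)))


# Rate-1/2 example: cutoff at n_c = N/4, evaluated at and below e_c = 1/4.
N, n_c = 400, 100
exact_at_critical = exact_failure_probability(N, 0.25, n_c)
gauss_at_critical = gaussian_failure_probability(N, 0.25, n_c)
gauss_below = gaussian_failure_probability(N, 0.20, n_c)
```

Because $\sigma \propto \sqrt{N}$, the argument of erfc depends on $e$ only through $(e_c - e)\sqrt{N}$ at large N, which is the origin of the $1/\sqrt{N}$ width of the critical region.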
Appendix D: Self-averaging of random stabilizer code transition

One of the central assumptions in this work is that the finite-size scaling behavior of random stabilizer codes near threshold well approximates the behavior of the optimal codes. A necessary condition for this to be true is that the random codes are self-averaging in the sense that a single realization of a random code has the same properties as the average over codes in the large N limit. To test this self-averaging condition, we investigate the convergence with N towards the RMT prediction for the critical recovery probability $r_c$ for single realizations. Numerical Monte Carlo results for the standard deviation are shown in Fig. 11. We fix a random Clifford unitary U generated by a high-depth circuit (depth 2N in 1D). For that circuit, we then estimate P(R) at the critical point of the optimal codes for the fixed-fraction erasure model. By generating many codes, we can then estimate the variance $\mathbb{E}_U\big[(P(R) - r_c)^2\big]$ through sampling. Over the range of sizes shown in the figure, we see clear exponential decay of the standard deviation with N, indicating that P(R) self-averages to the RMT prediction $r_c$ in the large N limit.

FIG. 11. Fluctuations in the recovery probability over random codes vs. N for the fixed-fraction erasure model with R = 1/2 at $e = e_c$. The recovery probability for a random code appears to be self-averaging towards the RMT prediction $r_c$ at large N. We took a depth 2N encoding circuit in 1D with a brickwork arrangement of gates. Each two-site gate in the circuit was a random Clifford gate.