The cost of universality: A comparative study of the overhead of state distillation and code switching with color codes

Estimating and reducing the overhead of fault tolerance (FT) schemes is a crucial step toward realizing scalable quantum computers. Of particular interest are schemes based on two-dimensional (2D) topological codes such as the surface and color codes which have high thresholds but lack a natural implementation of a non-Clifford gate. In this work, we directly compare two leading FT implementations of the T gate in 2D color codes under circuit noise across a wide range of parameters in regimes of practical interest. We report that implementing the T gate via code switching to a 3D color code does not offer substantial savings over state distillation in terms of either space or space-time overhead. We find a circuit noise threshold of 0.07(1)% for the T gate via code switching, almost an order of magnitude below that achievable by state distillation in the same setting. To arrive at these results, we provide and simulate an optimized code switching procedure, and bound the effect of various conceivable improvements. Many intermediate results in our analysis may be of independent interest. For example, we optimize the 2D color code for circuit noise yielding its largest threshold to date 0.37(1)%, and adapt and optimize the restriction decoder finding a threshold of 0.80(5)% for the 3D color code with perfect measurements under Z noise. Our work provides a much-needed direct comparison of the overhead of state distillation and code switching, and sheds light on the choice of future FT schemes and hardware designs.

In this section we first provide this work's motivation and a summary of our main results in Section I A, and then review some relatively standard but important background material that we will refer to throughout the paper. In Section I B we describe the noise models and simulation approaches that we use to analyze and simulate state distillation and code switching. In Section I C we provide some basic information about color codes. In Section I D we review how to implement logical operations using 2D color codes. Finally, in Section I E we review state distillation.

A. Motivation and summary of results
Recent progress in demonstrating operational quantum devices [1][2][3][4][5] has brought the noisy intermediate-scale quantum era [6], where low-depth algorithms are run on small numbers of qubits. However, to handle the cumulative effects of noise and faults as these quantum systems are scaled, fault-tolerant (FT) schemes [7][8][9][10][11][12][13][14][15] will be needed to reliably implement universal quantum computation. FT schemes encode logical information into many physical qubits and implement logical operations on the encoded information, all while continually diagnosing and repairing faults. This requires additional resources, and much of the current research in quantum error correction (QEC) is dedicated toward developing FT schemes with low overhead.
The choice of FT scheme to realize universal quantum computing has important ramifications. Good schemes can significantly enhance the functionality and lifetime of a given quantum computer. Moreover, FT schemes vary in their sensitivity to the hardware architecture and design, such as qubit quality [16], connectivity, and operation speed. The understanding and choice of FT scheme will therefore influence the system design, from hardware to software, and developing an early understanding of the trade-offs is critical in our path to a scalable quantum computer.
At the base of most FT schemes is a QEC code which (given the capabilities and limitations of a particular hardware platform) should: (i) tolerate realistic noise, (ii) have an efficient classical decoding algorithm to correct faults, and (iii) admit a FT universal gate set. In the search for good FT schemes we focus our attention on QEC codes which are known to achieve as many of these points as possible with low overhead. Topological codes are particularly compelling as they typically exhibit high accuracy thresholds with QEC protocols involving geometrically local quantum operations and efficient decoders; see e.g. Refs. [15,[17][18][19][20][21][22][23][24][25][26][27][28][29][30][31]. Two-dimensional (2D) topological codes such as the toric code [32,33] and the color code [34] are particularly appealing for superconducting [35][36][37] and Majorana [38,39] hardware, where qubits are laid out on a plane and quantum operations are limited to those involving neighboring qubits.
The FT implementation of logical gates with 2D topological codes poses some challenges. The simplest FT logical gates are applied transversally, i.e., by independently addressing individual physical qubits. These gates are automatically FT since they do not grow the support of errors. Unfortunately a QEC code which admits a universal set of transversal logical gates is ruled out by the Eastin-Knill theorem [40][41][42]. Furthermore, in 2D topological codes such gates can only perform Clifford operations [43][44][45][46]. There are, however, many innovative approaches to achieve universality, which typically focus on implementing non-Clifford logical gates [47][48][49], which achieve universality when combined with the Clifford gates.
The standard approach to achieve universality with 2D topological codes is known as state distillation [50][51][52]. It relies on first producing many noisy encoded copies of a T state |T⟩ = (|0⟩ + e^(iπ/4)|1⟩)/√2, also known as a magic state, and then processing them using Clifford operations to output a high-fidelity encoded version of the state. The high-fidelity T state can then be used to implement the non-Clifford T = diag(1, e^(iπ/4)) gate. Despite significant recent improvements, the overhead of state distillation is expected to be large in practice [35,53]. A compelling alternative is code switching via gauge fixing [54][55][56][57] to a 3D topological code which has a transversal T gate. The experimental difficulty of moving to 3D architectures could potentially be justified if it significantly reduces the overhead compared to state distillation. To compare these two approaches and find which is most practical for consideration in a hardware design, a detailed study is required.
FIG. 1. Two methods of preparing high-fidelity T states encoded in the 2D color code: state distillation and code switching. Both approaches are implemented using quantum-local operations in 3D, i.e., noisy quantum operations are geometrically local, whereas ideal classical operations can be performed globally. In state distillation, many noisy encoded T states are produced and fed into a Clifford distillation circuit. In code switching, one switches to the 3D color code, where the transversal T gate is implemented.
In our work, we estimate the resources needed to prepare high-fidelity T states encoded in the 2D color code, via either state distillation or code switching. We assume that both approaches are implemented using quantum-local operations [58] in 3D, i.e., quantum operations are noisy and geometrically local, whereas classical operations can be performed globally and perfectly (although they must be computationally efficient). In particular, we simulate these two approaches by implementing them with noisy circuits built from single-qubit state preparations, unitaries and measurements, and two-qubit unitaries between nearby qubits. For state distillation, this 3D setting allows a stack of 2D color code patches, whereas for code switching it allows us to implement the 3D color code; see Fig. 1. We then seek to answer the following question: to prepare T states of a given fidelity, are fewer resources required for state distillation or code switching?
Our main finding is that code switching does not offer substantial savings over state distillation in terms of either space overhead, i.e., the number of physical qubits required, or space-time overhead, i.e., the space overhead multiplied by the number of physical time units required; see Fig. 2. State distillation significantly outperforms code switching over most of the circuit noise error rates 10^(-4) ≤ p ≤ 10^(-3) and target T state infidelities 10^(-20) ≤ p_fin ≤ 10^(-4), except for the smallest values of p, where code switching slightly outperformed state distillation. In our analysis we carefully optimize each step of code switching, and also investigate the effects of replacing each step by an optimal version to account for potential improvements. On the other hand, we consider only a standard state distillation scheme; using more optimized schemes such as those of Refs. [59][60][61] would give further advantage to the state distillation approach. We also find asymptotic expressions which support our finding that state distillation requires lower overhead than code switching for p ≪ 1 and log p_fin / log p ≫ 1. In particular, the space and space-time overhead scale as (log p_fin / log p)^Γ and (log p_fin / log p)^(Γ+1) respectively, where Γ_CS = 3 for code switching and Γ_SD = log_3 15 = 2.46... for the distillation scheme we implement.
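The asymptotic exponents quoted above can be compared directly; a small sketch, writing r for the ratio log p_fin / log p (the numerical values of r, p and p_fin below are illustrative):

```python
import math

# Asymptotic overhead scaling: space ~ r^Gamma, space-time ~ r^(Gamma + 1),
# where r = log(p_fin) / log(p) >> 1.
GAMMA_CS = 3                     # code switching
GAMMA_SD = math.log(15, 3)       # 15-to-1 distillation: log_3(15) = 2.46...

assert abs(GAMMA_SD - 2.46) < 0.01

# Example point in the regime of interest: p = 1e-4, p_fin = 1e-20, so r = 5.
r = math.log(1e-20) / math.log(1e-4)
assert r ** GAMMA_SD < r ** GAMMA_CS   # distillation has the smaller exponent
```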
To arrive at our main simulation results, we accomplish the intermediate goals below.

FIG. 2. The comparison of (a) the qubit overhead and (b) the space-time overhead as a function of the infidelity p_fin of the output T state for state distillation and code switching. Possible future improvements of any steps of our code switching protocol would be included within the shaded region. Note there is no code switching curve for p = 0.001 without assuming optimistic improvements to the protocol, as this error rate is above the observed threshold for code switching.
2D color code optimization and analysis.-In Section II, we first adapt the projection decoder [30] to the setting where the 2D color code has a boundary and syndrome extraction is imperfect, as well as optimize the stabilizer extraction circuits. We find a circuit noise threshold greater than 0.37(1)%, which is the highest to date for the 2D color code, narrowing the gap to that of the surface code. We also analyze the noise equilibration process during logical operations in the 2D color code and provide an effective logical noise model.
Noisy state distillation analysis.-Using the effective logical noise model, we carefully analyze the overhead of state distillation in Section III. We strengthen the bounds on failure and rejection rate by explicitly calculating the effect of faults at each location in the Clifford state distillation circuits rather than simply counting the total number of locations [35,53,[62][63][64]. We remark that we stack 2D color codes in the third dimension to implement logical operations such as the CNOT in constant time, whereas strictly 2D approaches such as lattice surgery would require a time proportional to code distance. The circuit-noise threshold for this state distillation scheme with the 2D color code is equal to the error correction threshold of 0.37(1)%.
Further insights into 3D color codes.-In Section IV, we provide a surprisingly direct way to switch between the 2D color code and the 3D subsystem color code. Our method exploits a particular gauge-fixing of the 3D subsystem color code for which the code state admits a local tensor product structure in the bulk and can therefore be prepared in constant time. We also adapt the restriction decoder [31] to the setting where the 3D color code has a boundary and optimize it, which results in a threshold of 0.80(5)% and a better performance for small system sizes.
End-to-end code switching simulation.-Section V is the culmination of our work, where building upon results from the previous sections we provide a simplified recipe for code switching, detailing each step and specifying important optimizations. In our simulation, we exploit the special structure of the 3D subsystem color code to develop a method of propagating noise through the T gates in the system, despite the believed computational hardness of simulating general circuits with many qubits and T gates. We numerically find the failure probability of implementing the T gate with code switching as a function of the code distance and the circuit noise strength, which, in turn, allows us to estimate the T gate threshold to be 0.07(1)%. We not only find numerical estimates of the overhead of the fully specified protocol, but also bound the minimal overhead of a code switching protocol with various conceivable improvements, such as using optimal measurement circuits, and optimal classical algorithms for decoding and gauge fixing of the 3D color code.
This work provides a much-needed comparative study of the overhead of state distillation and code switching, and enables a deeper understanding of these two approaches to FT universal quantum computation. More generally, careful end-to-end analyses with this level of detail will become increasingly important to identify the most resource-efficient FT schemes and, in turn, to influence the evolution of quantum hardware. Although our study focuses on color codes, we expect our main finding, i.e., that code switching does not significantly outperform state distillation, to hold for other topological codes such as the toric code as considered in Ref. [49]. Furthermore, we believe that state distillation will not be outperformed by code switching exploiting either 2D subsystem codes [65][66][67] or emulation of a 3D system with a dynamical 2D system [68][69][70][71] since these schemes are even more constrained than when 3D quantum-local operations are allowed. We remark that there are other known FT techniques for implementing a universal gate set [12,[72][73][74][75][76], however they are not immediately applicable to large-scale topological codes. Nevertheless, we are hopeful that there are still new and ingenious FT schemes to be discovered that could dramatically reduce the overhead and hardware requirements for scalable quantum computing.

B. Noise and simulation
The noise model we use throughout the paper is the depolarizing channel, which on single- and two-qubit density matrices ρ^(1) and ρ^(2) has the action

ρ^(1) → (1 − p) ρ^(1) + (p/3) (X ρ^(1) X + Y ρ^(1) Y + Z ρ^(1) Z),
ρ^(2) → (1 − p) ρ^(2) + (p/15) Σ_{(P_1, P_2) ≠ (I, I)} (P_1 ⊗ P_2) ρ^(2) (P_1 ⊗ P_2),

i.e., the channel leaves a single-qubit state unaffected with probability 1 − p and applies each nontrivial Pauli error X, Y or Z with probability p/3. Similarly, the depolarizing channel leaves a two-qubit state unaffected with probability 1 − p and applies each of the 15 nontrivial Pauli errors P_1 ⊗ P_2 with probability p/15. We consider three standard scenarios.

1. Depolarizing noise.-Error correction is implemented with perfect measurements following a single time unit, during which single-qubit depolarizing noise of strength p acts. This is often referred to in the literature as the code capacity setting.
2. Phenomenological noise.-Error correction is implemented with perfect measurements following each time unit, during which single-qubit depolarizing noise of strength p acts. However the measurement outcome bits are flipped with probability p.
3. Circuit noise.-Measurements are implemented with the aid of ancilla qubits and a sequence of one- and two-qubit operations; see Fig. 3(a). One- and two-qubit unitary operations experience depolarizing noise of strength p. One-qubit preparations and measurements fail with probability p by producing an orthogonal state or flipping the outcome; see Fig. 3(b).

FIG. 3. (a) An example of an ideal circuit to measure the weight-two operator ZZ. We assume that every gate, as well as preparation and single-qubit measurement, takes one time unit. (b) The noisy circuit is modeled with ideal gates followed by the depolarizing channel of strength p; ideal preparation can fail and produce a state orthogonal to the desired one with probability p, and the outcome of the ideal measurement can be flipped with probability p.
In circuit noise, we approximate every noisy gate, i.e., the Pauli X, Y and Z gates, the Hadamard gate H, the phase gate S, the T gate, the controlled-NOT gate CNOT, and the idle gate I, by an ideal gate followed by the depolarizing channel on the qubits acted on by the gate; see Fig. 3. Preparations of the state orthogonal to that intended occur with probability p, and measurement outcome bits are flipped with probability p. We assume that all the elementary operations take the same time, which we refer to as one time unit.
For each of the three noise models, we will use error rate and noise strength interchangeably to describe the single parameter p.
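The exact Pauli distribution of the depolarizing channel defined above can be written out directly; a small sketch (the helper name is illustrative):

```python
from itertools import product

def depolarizing_probs(p, n):
    """Exact Pauli-error distribution of the n-qubit depolarizing channel:
    identity with probability 1 - p, each nontrivial Pauli with p / (4^n - 1)."""
    paulis = ["".join(t) for t in product("IXYZ", repeat=n)]
    k = len(paulis) - 1            # 3 nontrivial Paulis for n = 1, 15 for n = 2
    return {P: (1 - p if set(P) == {"I"} else p / k) for P in paulis}

d1 = depolarizing_probs(0.01, 1)
d2 = depolarizing_probs(0.01, 2)
assert abs(sum(d1.values()) - 1) < 1e-12 and abs(sum(d2.values()) - 1) < 1e-12
assert d1["Y"] == 0.01 / 3 and d2["XZ"] == 0.01 / 15
```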
We assume a special form of noise on T states, which is justified as follows. Consider an arbitrary single-qubit state written in the orthonormal basis {|T⟩, |T^⊥⟩ = Z|T⟩}. Now consider a 'twirling operation' consisting of randomly applying the Clifford XS† ∝ |T⟩⟨T| − |T^⊥⟩⟨T^⊥| with probability 1/2. This single-qubit Clifford gate can be implemented instantaneously and perfectly by a frame update. Under twirling, the off-diagonal terms of the state in this basis average to zero, and the state is transformed into

ρ = (1 − q)|T⟩⟨T| + q|T^⊥⟩⟨T^⊥|.

We therefore assume that the noisy T state is of this form, or equivalently that each T state is afflicted by a Z error with probability q. Due to this simplified form of the noise, we use infidelity, defined as 1 − ⟨T|ρ|T⟩, interchangeably with noise rate and noise strength to refer to the single parameter q when describing errors on T states.

For the purpose of defining pseudo-thresholds later, we find it useful to define the physical error probability p_phy(t) for a time t as the probability that a single physical qubit will have a nontrivial Pauli operator applied to it over t time units under this noise model. It can be calculated as

p_phy(t) = 1 − Σ_{P_1, ..., P_t : P_1 · · · P_t ∝ I} Pr(P_1) · · · Pr(P_t),

where Pr(I) = 1 − p and Pr(X) = Pr(Y) = Pr(Z) = p/3. For example, p_phy(1) = p, p_phy(2) = 2p(1 − p) + 2p²/3, etc. Note that lim_{t→∞} p_phy(t) = 3/4.

To simulate noise, for Clifford circuits we track the net Pauli operator which has been applied to the system by noisy operations using the standard binary symplectic representation. When non-Clifford operations are involved, we use modified techniques which are explained throughout the text.
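Since the net Pauli over t time units stays trivial only when the per-step Paulis cancel, p_phy(t) obeys a simple recursion: if the net Pauli so far is I it remains I with probability 1 − p, and otherwise it is cancelled with probability p/3. A small sketch:

```python
def p_phy(t, p):
    """Probability that a qubit accumulates a net nontrivial Pauli over t
    time units of single-qubit depolarizing noise of strength p."""
    q = 1.0  # probability that the net Pauli so far is the identity
    for _ in range(t):
        # net I persists if I is applied, or a previous net P is cancelled by P
        q = q * (1 - p) + (1 - q) * (p / 3)
    return 1 - q

p = 0.01
assert abs(p_phy(1, p) - p) < 1e-12
assert abs(p_phy(2, p) - (2 * p * (1 - p) + 2 * p * p / 3)) < 1e-12
assert abs(p_phy(10_000, p) - 0.75) < 1e-6   # lim p_phy(t) = 3/4
```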
To estimate the statistical uncertainty of any quantity of interest ξ we use the bootstrap technique, i.e., we repeatedly resample from the existing data set I = {I_1, ..., I_a} used to evaluate ξ = ξ(I). In particular, for i = 1, ..., b we (i) randomly choose a data points I_{i(j)} from the data set I, where i(j) ∈ {1, ..., a}, and (ii) evaluate the quantity ξ_i = ξ(I_i) using the data set I_i = {I_{i(1)}, ..., I_{i(a)}}. We remark that the same data point can be chosen multiple times in step (i). We then estimate the quantity of interest to be the mean of {ξ_1, ..., ξ_b}, with statistical uncertainty given by the standard deviation of {ξ_1, ..., ξ_b}. Note that when reporting estimated values, we will use a digit in parentheses to indicate the standard deviation of the preceding digit. For example, we write 0.021(3) in place of 0.021 ± 0.003.
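The bootstrap procedure above can be sketched as follows (helper names are illustrative; the toy data set stands in for the simulation samples I_1, ..., I_a):

```python
import random
import statistics

def bootstrap(data, stat, b=2000, seed=0):
    """Estimate stat(data) and its statistical uncertainty by resampling
    the data set with replacement b times."""
    rng = random.Random(seed)
    resampled = [stat([rng.choice(data) for _ in data]) for _ in range(b)]
    return statistics.mean(resampled), statistics.stdev(resampled)

data = list(range(1, 11))          # toy data set I_1, ..., I_a
est, err = bootstrap(data, statistics.mean)
assert abs(est - 5.5) < 0.5        # close to the plug-in estimate
assert 0 < err < 2                 # nonzero statistical uncertainty
```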
We will often consider the failure probability p_task of various tasks using a distance-d code and noise strength p. We use the ansatz

p_task(d, p) = α(p) β(p)^d,

which captures the generic feature that p_task decreases exponentially with d for any p below the threshold value p*; here α(p) and β(p) are functions of p alone, and β(p) is smaller than one for p < p*. This can be considered a generalization of the heuristic behavior of the error correction failure rate (p/p*)^((d+1)/2) in topological codes for error rate p in the vicinity of their threshold p* [35,77,78].
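For the special case α(p) = β(p) = sqrt(p/p*), the ansatz reproduces the heuristic failure rate (p/p*)^((d+1)/2); a small sketch (the numerical values are illustrative):

```python
import math

def p_task(d, p, p_star):
    """Failure-rate ansatz alpha(p) * beta(p)^d, in the heuristic special
    case alpha = beta = sqrt(p / p_star), so p_task = (p/p*)^((d+1)/2)."""
    beta = math.sqrt(p / p_star)
    alpha = beta
    return alpha * beta ** d

p, p_star = 1e-3, 3.7e-3           # a circuit noise rate below a 0.37% threshold
rates = [p_task(d, p, p_star) for d in (3, 5, 7, 9)]
assert all(a > b for a, b in zip(rates, rates[1:]))   # exponential suppression in d
assert math.isclose(p_task(3, p, p_star), (p / p_star) ** 2)
```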

C. Basics of 2D and 3D color codes
Here we briefly review some important features of color codes, focusing on 2D and 3D. We also specify the lattices and some notation we use throughout the paper. For a more complete review of the topics covered in this subsection, see Refs. [56,79,80].
Color codes are topological QEC codes which can be defined on any D-dimensional lattice composed of D-simplices with (D + 1)-colorable vertices, where D ≥ 2. Recall that 0-, 1-, 2- and 3-simplices are vertices, edges, triangles and tetrahedra, respectively. Qubits are placed on D-simplices of the lattice; X- and Z-type gauge generators are on (D − 2 − z)- and (D − 2 − x)-simplices, and X- and Z-type stabilizer generators are on x- and z-simplices, where x, z ≥ 0 and x + z ≤ D − 2. We say an operator is on a k-simplex when its support comprises all the D-simplices containing that k-simplex.

FIG. 4. (a) The lattice L_2D (gray edges with R, G and B vertices) and the corresponding primal lattice L*_2D (black edges, black and white vertices). Qubits are placed at triangles in L_2D (vertices in L*_2D), whereas X- and Z-stabilizer generators correspond to interior vertices in L_2D (faces in L*_2D). (b) For the lattice L_3D, qubits are identified with tetrahedra, whereas X- and Z-stabilizer generators correspond to interior vertices and interior edges, respectively. Note that L_2D can be obtained from L_3D by retaining only those vertices connected to the boundary vertex v_Y, along with the edges and faces containing only those vertices.
In this paper, we focus on color codes on two particular (families of) lattices: a 2D triangular lattice with a triangular boundary L_2D, and a 3D bcc lattice with a tetrahedral boundary L_3D; see Fig. 4. We will occasionally refer to L_2D as the triangular lattice and L_3D as the tetrahedral lattice. Most of the time we rely on context and drop the subscript, simply writing L. Members of these lattice families are parameterized by the code distance d = 3, 5, 7, ... of color codes defined on them. We typically work with the dual lattice L, but occasionally make use of the primal lattice L*. We remark that the graph constructed from the vertices and edges of the primal lattice L* is bipartite [79]; see Fig. 4(a).
Given a lattice L, it is useful to define ∆_k(L) as the set of all k-simplices in L. We will abuse notation and write β ∈ ∆_b(α) (or equivalently β ⊆ α) to denote that a b-simplex β belongs to an a-simplex α, where 0 ≤ b ≤ a ≤ D. It is also useful to construct the color-restricted lattice L_K, where K is a set of colors, by removing from L all simplices whose vertices' colors are not all in K. For example, the restricted lattice L_RG is obtained by keeping all the vertices of color R or G, as well as all the edges connecting them. We refer to the edges in L_RG as RG edges, and similarly for other colors. For any subset of edges γ ⊆ ∆_1(L) and any vertex v ∈ ∆_0(L) we define the restriction γ|_v as the subset of edges of γ incident to v. We separate vertices in L into two types: boundary vertices (the three and four outermost vertices in Fig. 4(a) and (b), respectively), and all others, which we call interior vertices. We call edges connecting two boundary vertices boundary edges (there are three and six boundary edges in Fig. 4(a) and (b), respectively), and call all other edges interior edges. More generally, we denote the set of interior objects by ∆°_k(L), the set of all k-simplices in L containing at least one interior vertex.

On the 2D lattice, we define the stabilizer 2D color code as follows. Qubits are on triangles, and both X- and Z-stabilizer generators are on interior vertices ∆°_0(L). This code has one logical qubit and string-like logical operators. One can implement the full Clifford group transversally on the 2D color code.
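The color-restriction construction can be sketched on a toy edge list (the vertex labels and colors below are illustrative, not the lattice L of Fig. 4):

```python
def restrict(edges, color, K):
    """Color-restricted lattice L_K: keep only simplices (here, edges) all of
    whose vertices have colors in the set K."""
    verts = {v for v, c in color.items() if c in K}
    return {e for e in edges if set(e) <= verts}

# A toy patch: four colored vertices and four edges.
color = {0: "R", 1: "G", 2: "B", 3: "R"}
edges = {frozenset({0, 1}), frozenset({1, 2}),
         frozenset({0, 3}), frozenset({2, 3})}

rg = restrict(edges, color, {"R", "G"})      # the RG edges of L_RG
assert rg == {frozenset({0, 1}), frozenset({0, 3})}
```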
On the 3D lattice, we can have either a stabilizer color code (with x = 0 and z = 1) or a subsystem color code (with x = z = 0). In both cases, there is one logical qubit and the physical qubits are on tetrahedra. For the 3D subsystem color code, X- and Z-stabilizer generators are on interior vertices ∆°_0(L), while X- and Z-gauge generators are on interior edges ∆°_1(L). Recall that for subsystem codes, logical Pauli operators come in two flavors: bare logical operators, which commute with the gauge generators, and dressed logical operators, which commute with the stabilizer generators. In the 3D subsystem color code, bare and dressed logical operators are sheet- and string-like, respectively. One can implement the full Clifford group transversally on the 3D subsystem color code.
For the 3D stabilizer color code, X- and Z-stabilizer generators are on interior vertices ∆°_0(L) and interior edges ∆°_1(L), respectively. The logical Pauli Z and X operators are string- and sheet-like, respectively. Crucially, the T gate is a transversal logical operator. To implement it, we split the n qubits in L into two groups, (n + 1)/2 white tetrahedra and (n − 1)/2 black tetrahedra, such that no two tetrahedra of the same color share a face. Applying T to white qubits and T^(-1) to black qubits implements the logical T gate. For notational convenience we write T̃ = T^(±1), with the sign determined by the color of the qubit.
Lastly, it is useful to introduce the notions of vector spaces associated with the constituents of the lattice L and linear maps between them. We define C_i to be a vector space over F_2 with the set of i-simplices ∆_i(L) as its basis. Note that the elements of C_i are binary vectors and we can identify them with subsets of ∆_i(L). For any a, b ∈ [D] = {0, 1, ..., D} we define a generalized boundary operator ∂_{a,b} : C_a → C_b, which is the F_2-linear map specified on a basis element α ∈ ∆_a(L) by

∂_{a,b} α = Σ_{β ∈ ∆_b(L) : β ⊆ α or α ⊆ β} β,

i.e., α is sent to the sum of all b-simplices that either contain or are contained in α. We remark that the standard boundary operator ∂_i : C_i → C_{i−1} is a special case of the generalized boundary operator ∂_{a,b} above if we choose a = b + 1 = i. These boundary maps are helpful in discussing error correction with the color code. In particular, since the color code is a CSS code, we can cast the decoding problem in terms of chain complexes, and treat X and Z errors and their correction independently [30,31]. For the 2D color code, the boundary map ∂_{2,0} allows us to find for any error with support ε ⊆ ∆_2(L_2D) its point-like syndrome ∂_{2,0} ε, where ε is the support of either an X or a Z error. For the 3D stabilizer color code, the syndromes of X and Z errors correspond to loop-like and point-like objects and can be found as ∂_{3,1} ε and ∂_{3,0} ε, respectively, where ε ⊆ ∆_3(L_3D) denotes the support of the X or Z error. In Section IV B we discuss in detail the structure of the loop-like gauge measurement outcomes for the 3D subsystem color code.
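The action of ∂_{a,b} for b ≤ a can be sketched on a toy pair of triangles (a hypothetical two-triangle patch, ignoring the distinction between interior and boundary vertices): shared faces cancel over F_2, just as the shared vertices of two error triangles drop out of the syndrome.

```python
from itertools import combinations

def boundary(simplices, b):
    """Apply the generalized boundary map (for b <= a) to an F2 chain given
    as a set of a-simplices, each a frozenset of its vertices.
    F2 addition is the symmetric difference of sets."""
    out = set()
    for s in simplices:
        for face in combinations(sorted(s), b + 1):  # a b-simplex has b+1 vertices
            out ^= {frozenset(face)}
    return out

# Two triangles sharing the edge {1, 2}; an error on both has point-like
# syndrome only at the unshared vertices 0 and 3.
A, B = frozenset({0, 1, 2}), frozenset({1, 2, 3})
assert boundary({A, B}, 0) == {frozenset({0}), frozenset({3})}
```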

D. Fault-tolerant computation with 2D color codes
Here we briefly review an approach to implement quantum computation with a 3D stack of the 2D color codes as in Fig. 5(a). Note that alternatively we could lay out the 2D color codes on a 2D plane and use lattice surgery [81], although many of the elementary logical operations, e.g. CNOT gates, would be slower than for a stack.
Error correction and decoding.-On each patch of the 2D color code, QEC is continuously performed. We use the term QEC cycle to refer to a full cycle of stabilizer extraction circuits producing a syndrome σ, i.e., the set of measured stabilizer generators with outcome −1. In a scenario in which measurements are performed perfectly, a perfect measurement decoder is used to infer a correction C for any error E given the input σ(E). The correction C will return the system to the code space if applied. The perfect measurement decoder fails if and only if CE is a nontrivial logical operator.
In a scenario in which measurements are not perfect, we consider a sequence of QEC cycles t = 1, ..., n_cyc, with error E_t introduced during each cycle and observed syndrome σ_t for each. For simplicity we assume that the first and last cycles have no additional error and that their syndromes are measured perfectly. Then a faulty measurement decoder is a decoder which takes the full history of syndromes σ_1, σ_2, ..., σ_{n_cyc} and outputs a correction C such that C E_1 · · · E_{n_cyc} has trivial syndrome. The faulty measurement decoder fails if and only if C E_1 · · · E_{n_cyc} is a nontrivial logical operator. We discuss decoders for the 2D color code in detail in Section II.
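As a toy illustration of this setup (a 3-bit repetition code against X errors, not the color code decoders of Section II): with the last syndrome measured perfectly, a lookup decoder always returns the system to the code space, and fails exactly when the residual operator C E_1 · · · E_{n_cyc} is the nontrivial logical X_0 X_1 X_2.

```python
import random

# Toy: 3-bit repetition code against X errors; stabilizers Z0Z1 and Z1Z2.
def syndrome(err):                 # err: set of qubits carrying an X error
    return (len(err & {0, 1}) % 2, len(err & {1, 2}) % 2)

LOOKUP = {(0, 0): set(), (1, 0): {0}, (1, 1): {1}, (0, 1): {2}}

def decode(total_error):
    """Minimum-weight correction inferred from a perfect final syndrome."""
    return LOOKUP[syndrome(total_error)]

rng = random.Random(7)
failures = 0
for _ in range(500):
    total = set()
    for _cycle in range(5):                  # errors E_t accumulate over cycles
        for q in range(3):
            if rng.random() < 0.05:
                total ^= {q}
    residual = total ^ decode(total)         # residual = C E_1 ... E_n
    assert syndrome(residual) == (0, 0)      # correction restores the code space
    if residual == {0, 1, 2}:                # nontrivial logical => decoder failure
        failures += 1
```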
Logical operations.-The elementary logical operations for the 2D color codes are implemented as follows.
• State preparation.-To prepare the |0⟩ state, each data qubit is prepared in |0⟩, then d QEC cycles are performed, and the X-type syndrome outcomes σ at the first cycle are inferred (d cycles are needed to do so fault-tolerantly). A Z-type fixing operator with the syndrome σ is applied. The |+⟩ state is prepared analogously.
• Idle operation.-A single QEC cycle is performed.
• Single-qubit Clifford gates.-As the gates are transversal, these are done in software by Pauli frame updates so are instantaneous and perfect.
• CNOT gate.-This is implemented by the application of a transversal CNOT between adjacent patches in the stack, followed by n_cyc^after QEC cycles; see Section II D for details.
• SWAP gate.-This is implemented by swapping the data qubits on adjacent patches, and followed by a single QEC cycle.
• Measurement.-The readout of a logical qubit in the X or Z basis is implemented by measuring each data qubit in that basis. The output bit string is then processed in two stages: (1) a perfect measurement decoder is run to correct the bit string such that it satisfies all stabilizers, and then (2) the outcome is read off from the parity of the corrected bit string restricted to the support of any representative of the logical operator.
The above list consists of Clifford operations, which are not by themselves universal for quantum computation. In addition, we consider the following non-Clifford gate.
FIG. 6. The three known types of state distillation schemes for a resource state |R⟩. Each wire represents a logical qubit of a QEC scheme that fault-tolerantly performs all Clifford gates. (a) We need a code with a transversal non-Clifford gate U_R, such that U_R|S⟩ = |R⟩ for some stabilizer state |S⟩. (b) We need a code with a transversal gate C_R, such that C_R|R⟩ = |R⟩. The measurement of the stabilizers of the code will be probabilistic even when the |R⟩ input is noiseless, with postselected state |φ(+1, ..., +1)⟩ = |R̄⟩. (c) We need a code with a transversal gate C_R, such that C_R|R⟩ = |R⟩ and C_R = U_R P U_R† for some Pauli operator P.
• T gate.-It is implemented using gate teleportation of an encoded T state |T⟩ = (|0⟩ + e^(iπ/4)|1⟩)/√2 as shown in Fig. 5(b). We will consider the production of the encoded T state by both state distillation and code switching.
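The teleportation gadget can be checked directly on a two-qubit statevector (unencoded qubits, illustrative names; in this standard gadget the outcome-1 branch needs a Clifford S correction):

```python
import numpy as np

# T-gate teleportation: consume one |T> state to apply T to an arbitrary |psi>
# using only a CNOT, a single-qubit measurement, and an S correction.
W = np.exp(1j * np.pi / 4)
T_GATE = np.diag([1, W])
S_GATE = np.diag([1, 1j])
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]])    # control = data qubit

def teleport_T(psi, outcome):
    """Post-measurement data state for a given ancilla outcome (0 or 1)."""
    magic = np.array([1, W]) / np.sqrt(2)        # the resource |T> state
    state = CNOT @ np.kron(psi, magic)           # qubit order: (data, ancilla)
    branch = state.reshape(2, 2)[:, outcome]     # project ancilla onto |outcome>
    branch = branch / np.linalg.norm(branch)
    return S_GATE @ branch if outcome == 1 else branch

psi = np.array([0.6, 0.8])
for outcome in (0, 1):
    overlap = abs(np.vdot(T_GATE @ psi, teleport_T(psi, outcome)))
    assert np.isclose(overlap, 1.0)              # output is T|psi> up to global phase
```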

E. State distillation
Here we briefly review state distillation [50,51], in which Clifford operations are used with postselection to convert noisy resource states into fewer but crucially less noisy resource states. In our analysis of the overhead of state distillation in Section III, we consider only the standard 15-to-1 state distillation protocol. However, given the wide range of state distillation protocols that have been proposed in recent years, we take the opportunity to consolidate the concepts behind them here. In particular, we outline three classes of state distillation protocols, which to our knowledge include all known proposals up to small variations. We then go on to describe a family of protocols recently introduced by Haah and Hastings [61], of which the 15-to-1 protocol that we consider is a special instance. In our discussion here we will assume Clifford operations are perfect, although we will relax that assumption in Section III.
Let C_R be an operator stabilizing a resource state |R⟩, i.e., C_R|R⟩ = |R⟩, and let U_R† be a non-Clifford unitary transforming |R⟩ into some stabilizer state |S⟩, i.e., U_R†|R⟩ = |S⟩. We will refer to C_R and U_R as the resource stabilizer and resource rotator, respectively; throughout, when we write that U_R should be applied, it can be implemented using |R⟩ by gate teleportation as in Fig. 5(b). The three classes of state distillation protocols are then summarized as follows (and depicted in Fig. 6).
(a) Code projector then resource rotator.-We use a code with a transversal logical gate Ū_R = U_R ⊗ U_R ⊗ · · · ⊗ U_R. We prepare the (noisy) logical state |R̄⟩ by first encoding the logical stabilizer state |S̄⟩ and then implementing the transversal gate Ū_R (using n copies of the noisy resource state |R⟩). Unencoding |R̄⟩ and postselecting on the +1 measurement outcomes of the stabilizers of the code gives a distilled resource state with infidelity reduced from q to O(q^d) for code distance d. See schemes in [50,59,61,82,83].
(b) Resource stabilizer then code projector.-We use a code with a transversal logical gate $\overline{C}_R = C_R \otimes C_R \otimes \cdots \otimes C_R$. We start with $n$ noisy resource states $|R\rangle^{\otimes n}$, which by definition satisfy $\overline{C}_R |R\rangle^{\otimes n} = |R\rangle^{\otimes n}$, and then measure the stabilizers of the code. If all measurement outcomes are $+1$, then the state $|\overline{R}\rangle$ has been prepared, which can then be decoded to yield a distilled resource state $|R\rangle$ with infidelity suppressed from $q$ to $O(q^d)$. Note that this approach seems less promising than the other two, since the probability of successful postselection is less than one even for perfect resource states. See schemes in [50,84].
(c) Code projector then resource stabilizer.-We use a code with a transversal logical gate $\overline{C}_R = C_R \otimes C_R \otimes \cdots \otimes C_R$ and assume that $U_R C_R U_R^\dagger = P$ is a Pauli operator.³ First, we encode a resource state $|R\rangle$ with infidelity $q_1$ in the code, giving $|\overline{R}\rangle$, and then measure the logical operator $\overline{C}_R$ and postselect on the $+1$ outcome. To measure $\overline{C}_R$ we use a measurement gadget consisting of the following three steps. First, apply $U_R \otimes U_R \otimes \cdots \otimes U_R$ using $n$ noisy $|R\rangle$ states, each with infidelity $q_2$. Second, apply the Clifford gate control-$(P \otimes P \otimes \cdots \otimes P)$, controlled on an ancilla in the state $|+\rangle$. Third, apply $U_R^\dagger \otimes U_R^\dagger \otimes \cdots \otimes U_R^\dagger$ using another $n$ noisy $|R\rangle$ states, each with infidelity $q_2$. After this gadget, the stabilizers are checked, and if all are satisfied, the encoded state is decoded and kept as a distilled resource state. The output has infidelity suppressed to $O(q_1^2) + O(q_2^d)$, but the suppression with respect to $q_1$ can be boosted by, for example, repeating the measurement gadget. See schemes in [51,52,85-89].
Historically, the first state distillation protocol was proposed by Knill [51] (type c), which takes 15 input $T$ states of infidelity $q$ to produce an output of infidelity $35q^3$, with acceptance probability $1 - 15q$. Shortly after, Bravyi and Kitaev [50] proposed two schemes, the first of which is type a in our classification, but which has the same parameters as Knill's (later the two schemes were shown to be mathematically equivalent [90]). The second scheme in Ref. [50] is of type b and has less favorable parameters than the 15-to-1 scheme, but a higher state distillation threshold. Another type b scheme was found by Reichardt [84], which can successfully distill states arbitrarily close to the stabilizer polytope. In Ref. [59], type a schemes outputting multiple resource states were proposed. Multiple outputs were also achieved for type b schemes in Refs. [86,89]. These 'high rate $k/n$' schemes have promising parameters and may perform well in certain regimes. However, they also tend to have large Clifford circuits, likely inhibiting their practicality once realistically noisy Clifford operations are taken into account. Recently, Haah and Hastings introduced yet another family of state distillation protocols of type a [61]. These are based on puncturing quantum Reed-Muller codes; they have interesting asymptotic properties [83] and appear to be practically favorable [61]. We review this family here, as it includes the 15-to-1 scheme that we analyze in detail in Section III.
Haah-Hastings state distillation protocols.-This family of state distillation protocols [61] involves first producing what we will call a Reed-Muller state $|QRM_r\rangle$ on $n = 2^{3r+1}$ qubits, where $r = 1, 2, \ldots$, with a Clifford circuit of depth $3r+1$ using $(3r+1)2^{3r}$ CNOTs; see Fig. 7. This state has the property that for any subset of $k$ qubits, it can be decomposed as a set of $k$ Bell pairs between those 'punctured' qubits and those which remain, namely $|QRM_r\rangle \propto \sum_{x \in \{0,1\}^k} |x\rangle |\overline{x}\rangle$, where the states of the $k$ punctured qubits have no bar, and where $\{|\overline{0}\rangle_i, |\overline{1}\rangle_i\}_{i=1,\ldots,k}$ are logical basis states of a $[[n-k, k, d_Z]]$ CSS code which is triply even, such that transversal $T$ implements a logical $T$ on each logical qubit of the code. Let the $Z$-distance $d_Z$ be the weight of the smallest nontrivial $Z$-type logical operator of this code. To specify the CSS code, which we call the punctured Reed-Muller code, one can start with the $X$- and $Z$-type stabilizer generators of the $|QRM_r\rangle$ state (which, for example, can be obtained by propagating the initial single-qubit stabilizers through the CNOT circuit in Fig. 7). Choosing a stabilizer generator set for $|QRM_r\rangle$ such that only one $X$- and one $Z$-type generator have nontrivial support on each punctured qubit, these generators form the logical operators, and the remaining generators with no support on any punctured qubit form the stabilizer generators of the punctured Reed-Muller code. The next step of the protocol is to apply a (noisy) $T$ gate to each non-punctured qubit, which effects a $T$ gate on each punctured qubit; to see this, note that $(I \otimes \overline{T})(|0\rangle|\overline{0}\rangle + |1\rangle|\overline{1}\rangle) = (T \otimes I)(|0\rangle|\overline{0}\rangle + |1\rangle|\overline{1}\rangle)$. Finally, one measures the logical $\overline{X}_i$ operators of the CSS code. If the measurement outcome of $\overline{X}_i$ is $+1$, then the $i$th punctured qubit is left in the state $T|+\rangle$; otherwise, the $i$th punctured qubit is in the state $T|-\rangle = ZT|+\rangle$, requiring a simple Pauli $Z$ fix. In practice this is achieved by measuring all $n-k$ non-punctured qubits in the $X$ basis and post-processing.
Note that post-processing the measurement outcomes also yields the values of the $X$-stabilizer generators of the CSS code, which should all be $+1$ in the absence of error. This, in turn, provides a postselection condition for the protocol. The scheme takes $n-k$ encoded $T$ states of infidelity $q$ and outputs $k$ encoded $T$ states of infidelity $A q^{d_Z}$, where $A$ is the number of nontrivial $Z$-type logical operators of minimal weight $d_Z$.
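For the 15-to-1 instance these parameters can be verified by brute force. Modelling each noisy $T$ state as suffering a $Z$ error with probability $q$, errors are detected by the four weight-8 $X$-stabilizer generators of the 15-qubit punctured Reed-Muller code (in the standard presentation, qubits are labeled by the nonzero 4-bit strings and check $i$ is supported on the qubits whose $i$th bit is 1), and an undetected error flips the output when it overlaps the logical $\overline{X} = X^{\otimes 15}$ an odd number of times. A short enumeration (our own sketch) recovers $d_Z = 3$ and $A = 35$, hence the quoted output infidelity $35q^3$:

```python
from itertools import combinations

qubits = list(range(1, 16))  # nonzero 4-bit labels
# Four weight-8 X stabilizers: check i touches qubits whose i-th bit is set.
checks = [[q for q in qubits if (q >> i) & 1] for i in range(4)]

def undetected(err):
    # A Z error passes postselection iff every X check overlaps it evenly.
    return all(len(set(err) & set(c)) % 2 == 0 for c in checks)

def flips_logical(err):
    # Odd overlap with the logical X = X^{⊗15} flips the output T state.
    return len(err) % 2 == 1

counts = {}
for w in (1, 2, 3):
    counts[w] = sum(1 for err in combinations(qubits, w)
                    if undetected(err) and flips_logical(err))

print(counts)  # {1: 0, 2: 0, 3: 35}: d_Z = 3 and A = 35
```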
In Fig. 7 we show an instance of the Haah-Hastings protocol for $r = 1$, which we analyze more explicitly in Section II. To our knowledge, this instance was first shown in Ref. [35], and is essentially a modification of the 15-to-1 Bravyi-Kitaev protocol which avoids the unencoding part of the circuit in Fig. 6(a). Some larger instances of this family have very good properties for state distillation, although we do not analyze them in detail here.

FIG. 7. A 15-to-1 state distillation circuit as the $r = 1$ instance of the Haah-Hastings construction. Each qubit is labeled by a string of $3r+1$ bits; if the bit string has weight at most $r$, then that qubit is prepared in the $|+\rangle$ state, otherwise in the $|0\rangle$ state. In the $j$th round of CNOT gates, we apply CNOTs between the pairs of qubits whose bit strings differ only at the $j$th position. Note that the blue CNOTs are redundant, as their control or target qubits are $|0\rangle$ or $|+\rangle$, respectively. The resulting state is $|QRM_1\rangle \propto |+\rangle|\overline{+}\rangle + |-\rangle|\overline{-}\rangle$, where $|\overline{+}\rangle$ and $|\overline{-}\rangle$ are code states of the 15-qubit Reed-Muller code. We then apply the transversal $T$ gate and measure every non-punctured qubit in the $X$ basis to infer the measurement outcomes $m_i$ for the $X$-stabilizers and $m_X$ for the logical $\overline{X}$. The protocol rejects if $m_i = -1$ for some $i$. If there are no faults, then the remaining punctured qubit is in the state $T^{3-2m_X}|+\rangle$.
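The circuit of Fig. 7 is mechanical to generate for any $r$. The sketch below (our own, with qubit labels written as bit strings) reproduces the counts quoted above: for $r = 1$, five qubits are prepared in $|+\rangle$, and the $3r+1 = 4$ rounds contain $(3r+1)2^{3r} = 32$ CNOTs in total (including the redundant ones).

```python
from itertools import product

def qrm_circuit(r):
    """Preparation basis and CNOT rounds for the |QRM_r> circuit of Fig. 7."""
    nbits = 3 * r + 1
    labels = [''.join(bits) for bits in product('01', repeat=nbits)]
    # Weight <= r labels start in |+>, the rest in |0>.
    prep = {lab: '+' if lab.count('1') <= r else '0' for lab in labels}
    rounds = []
    for j in range(nbits):
        # Round j: pair up qubits whose labels differ only at position j.
        pairs = [(lab, lab[:j] + '1' + lab[j + 1:])
                 for lab in labels if lab[j] == '0']
        rounds.append(pairs)
    return prep, rounds

prep, rounds = qrm_circuit(1)
n_plus = sum(1 for s in prep.values() if s == '+')
n_cnots = sum(len(rnd) for rnd in rounds)
print(n_plus, len(rounds), n_cnots)  # 5 qubits in |+>, 4 rounds, 32 CNOTs
```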

II. 2D COLOR CODE ANALYSIS
In this section, we describe an efficient and optimized implementation of the 2D color code under circuit noise. First, in Section II A we adapt Delfosse's color code projection decoder [30] to allow for boundaries in the lattice. Then, we further adapt the decoder to accommodate faulty measurements in Section II B. We optimize the stabilizer extraction circuits in Section II C, finding a circuit noise threshold of 0.37(1)%. This represents a significant improvement over the previous highest threshold value for the color code of 0.2% in [37], and brings it closer to the toric code threshold of near one percent.⁴ However, we stress that we are primarily interested in the finite-size rather than asymptotic performance, and therefore take care to accurately account for effects that are often ignored when focusing on threshold alone, such as residual error. We believe the performance improvements we see over previous studies of the color code under circuit noise stem from our use of the hexagonal rather than the square-octagon primal lattice (compared with [78,92]), and from optimizing the extraction circuits and removing the restriction on qubit connectivity (compared with [37]).

A. Projection decoder with boundaries
In this subsection, we briefly describe our adaptation of Delfosse's color code projection decoder [30] to the lattice $L_{2D}$, which has boundaries. Our adaptation is essentially the same as that presented in [92], but for the hexagonal rather than the square-octagon primal lattice. This decoder assumes that the stabilizer measurements are perfect, and we analyze its performance under depolarizing noise.
We consider the 2D color code on the lattice $L = L_{2D}$ from Section I C. Since the 2D color code is a self-dual CSS code, we can decode $X$- and $Z$-errors separately; we describe here the correction of $X$-errors, as the correction of $Z$-errors is identical. We denote the support of the $X$ errors by $\varepsilon \subseteq \Delta_2(L)$, the $Z$-type syndrome by $\sigma \subseteq \Delta_0(L)$, and the resulting correction by $\hat{\varepsilon} \subseteq \Delta_2(L)$. For each pair of colors $K \in \{RG, RB, GB\}$ we define the set of highlighted vertices $V_K$ to contain the subset $\sigma_K \subseteq \sigma$ of all the vertices with color in $K$; whenever $|\sigma_K|$ is odd, we also include in $V_K$ a boundary vertex $v_{K'}$ for only one color $K' \in K$. The projection decoder (see Fig. 8) can then be described as follows.
1. For each pair of colors $K \in \{RG, RB, GB\}$, use the minimum-weight perfect matching (MWPM) algorithm to find a subset of edges $E_K \subseteq \Delta_1(L_K)$ which connects pairs of highlighted vertices in $V_K$ within the restricted lattice $L_K$.
2. The combined edge set $E = E_{RG} + E_{RB} + E_{GB}$ separates the lattice $L$ into two complementary regions $\Delta \subseteq \Delta_2(L)$ and $\Delta^c = \Delta_2(L) \setminus \Delta$; choose the correction $\hat{\varepsilon}$ to be the smaller of the regions $\Delta$ and $\Delta^c$.
Note that step 1 can be viewed as the problem of decoding the toric code defined on the lattice $L_K$, and thus one could use any toric code decoder to find a pairing $E_K \subseteq \Delta_1(L_K)$. The MWPM algorithm we chose for this step is computationally efficient. Boundary edges are permitted in $E_K$, but their edge weight is set to zero for matching.
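As an illustration of step 1, the sketch below pairs highlighted vertices of one restricted lattice by exhaustive search (practical decoders use the polynomial-time blossom algorithm instead, e.g. via standard MWPM libraries). The coordinates, the weight function, and the single boundary placeholder 'B' are our own toy choices; as in the text, boundary pairings carry weight zero.

```python
def mwpm(vertices, weight):
    """Exhaustive minimum-weight perfect matching (exponential; for
    illustration only). `vertices` has even length and `weight(u, v)` is a
    symmetric cost. Returns (total weight, list of pairs)."""
    if not vertices:
        return 0.0, []
    u, rest = vertices[0], vertices[1:]
    best = (float('inf'), [])
    for i, v in enumerate(rest):
        w, pairs = mwpm(rest[:i] + rest[i + 1:], weight)
        if weight(u, v) + w < best[0]:
            best = (weight(u, v) + w, [(u, v)] + pairs)
    return best

# Highlighted vertices of one restricted lattice as 2D coordinates;
# 'B' marks the boundary vertex, reachable at zero cost.
def w(u, v):
    if u == 'B' or v == 'B':
        return 0.0  # boundary edges carry weight zero
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

cost, pairing = mwpm([(0, 0), (0, 2), (3, 1), 'B'], w)
print(cost, pairing)  # 2.0 [((0, 0), (0, 2)), ((3, 1), 'B')]
```

Here the two nearby syndrome vertices are paired with each other, while the isolated one is matched to the boundary for free.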
In Fig. 9(a) we present the failure probability of this decoder under depolarizing noise and perfect measurements. In Fig. 9(b) we consider two distance-dependent quantities that should both converge to the threshold as $1/d$ approaches zero. The first is the pseudo-threshold $p^*_{\mathrm{dep}}(d)$, defined as a solution of $p_{\mathrm{fail}}(p, d) = p$. The pseudo-threshold is a good proxy for identifying the regime in which the distance-$d$ code is useful. The second is the crossing $p^\times_{\mathrm{dep}}(d)$ between pairs of failure curves of distances $d$ and $(d+1)/2$ for $d \equiv 1 \bmod 4$. From a linear extrapolation of the data we find intercepts of 13.16(4)% for $p^\times_{\mathrm{dep}}(d)$ and 12.08(4)% for $p^*_{\mathrm{dep}}(d)$. Note that the discrepancy between the intercept values suggests that the systems we consider are too small for a naive linear extrapolation to work reliably. Assuming that both $p^*_{\mathrm{dep}}(d)$ and $p^\times_{\mathrm{dep}}(d)$ continue to change monotonically with $d$, we take the maximum observed value of $p^\times_{\mathrm{dep}}(d)$, at $d = 29$, as a conservative estimate of the threshold under depolarizing noise, i.e., $p^*_{\mathrm{dep}} = 12.45(4)\%$, in accordance with previous threshold estimates in Refs. [28,37].
This analysis of the pseudo-thresholds and failure-curve crossings highlights two general points. Firstly, the threshold may not be a good proxy for finite-size performance; for example, we see in Fig. 9(b) that our estimate of the threshold is considerably above the observed pseudo-thresholds. Secondly, in order to find the threshold from data for small systems one must exercise caution, as extrapolating different quantities (which should nevertheless recover the same threshold value in the limit of infinite $d$) can result in inconsistent estimates.
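Both quantities are straightforward to extract numerically from failure curves. The sketch below (our own illustration on an assumed toy failure-curve model, not the paper's data) solves $p_{\mathrm{fail}}(p, d) = p$ by bisection and then linearly extrapolates the pseudo-thresholds in $1/d$:

```python
import numpy as np

# Illustrative failure-curve model (NOT the paper's fit): below a threshold
# p_th, failures are suppressed exponentially in the code distance.
def p_fail(p, d, p_th=0.13, A=0.4):
    return min(1.0, A * (p / p_th) ** ((d + 1) // 2))

def pseudo_threshold(d, lo=1e-4, hi=0.5, iters=60):
    """Solve p_fail(p, d) = p by bisection: below the root the encoded
    failure probability is smaller than the physical one."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if p_fail(mid, d) > mid:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

ds = np.array([9, 13, 17, 21, 25, 29])
ps = np.array([pseudo_threshold(d) for d in ds])
slope, intercept = np.polyfit(1.0 / ds, ps, 1)
print(intercept)  # ≈ 0.13 for this model
```

For this toy model the pseudo-thresholds increase monotonically with $d$, and the $1/d \to 0$ intercept recovers the model threshold to within a fraction of a percent.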

B. Noisy-syndrome projection decoder with boundaries
In this subsection we further generalize the projection decoder to handle phenomenological noise. To reliably extract the syndrome and perform error correction, one can repeat the stabilizer measurements multiple times. The input of the decoder then consists of (possibly incorrect) stabilizer measurement outcomes at QEC cycles labeled by integers, and can be visualized as a (2+1)-dimensional lattice $\Lambda = L \times [n_{\mathrm{cyc}}]$, where $[n_{\mathrm{cyc}}] = \{0, 1, \ldots, n_{\mathrm{cyc}}\}$ and the extra dimension represents time; see Fig. 10. We use the shorthand notation $\Lambda_{[t_1, t_2]}$ to denote the part of $\Lambda$ within QEC cycles $t_1$ and $t_2$. By a QEC cycle, we simply mean a full cycle of stabilizer measurements. Temporal edges, which vertically connect corresponding vertices in two copies of $L$ at QEC cycles $t$ and $t+1$, correspond to stabilizer measurements at QEC cycle $t$. We use the same concepts and nomenclature for the lattice $\Lambda$ as for $L$ in Section I C. For instance, we say a vertex $(v, t)$ of $\Lambda$ is a boundary vertex if and only if $v$ is a boundary vertex of $L$; otherwise, it is an interior vertex. An edge of $\Lambda$ is a boundary edge iff it connects two boundary vertices. The sets of interior vertices and edges of $\Lambda$ are denoted $\Delta_0(\Lambda)$ and $\Delta_1(\Lambda)$, respectively. Furthermore, we denote by $\Lambda_K$ the restricted lattice of $\Lambda$ consisting of the vertices of color in $K$, and the edges connecting them.
The input of the noisy-syndrome projection decoder is an observed history of the syndrome $\sigma$, consisting of the subset of temporal edges corresponding to $-1$ stabilizer measurement outcomes. We define the set of syndrome flips $V \subseteq \Delta_0(\Lambda)$ to be the set of all vertices incident to an odd number of edges in $\sigma$. The decoder is then implemented using the following steps.
1. For every $K \in \{RG, RB, GB\}$, form the set $V_K$ of syndrome flips with color in $K$.

2. For every $K \in \{RG, RB, GB\}$, use the MWPM algorithm to find a pairing $E_K$ of the syndrome flips $V_K$ within the restricted lattice $\Lambda_K$.

3. Combine the obtained pairings, $E = E_{RG} + E_{RB} + E_{GB}$, and decompose $E$ as a disjoint sum of maximal connected components, $E = \sum_i E_i$.

4. 'Flatten' each connected component $E_i$ to a single QEC cycle $t_i$: remove temporal edges and add horizontal ones modulo two, and add the edges of the resulting set $F_i$ modulo two to the edge set $F(t_i)$.

5. For each QEC cycle $t$, find a correction $\varepsilon(t) \subseteq \Delta_2(L)$ as the minimal region enclosed by $F(t)$.
We make some additional technical remarks about the noisy-syndrome projection decoder. In step 2, the boundary edges of the restricted lattice $\Lambda_K$ are assigned zero weight when used for pairing, and a boundary vertex of a color in $K$ (it does not matter which) is added to $V_K$ whenever $|V_K|$ is odd. In step 3, we remove weight-zero edges when establishing the connected components of $E$.
To analyze error correction thresholds in a faulty-measurement setting, it is common to study the somewhat contrived scenario of an initially perfect code state undergoing $d$ QEC cycles, followed by a single cycle of perfect measurements. The justification is that in the fault-tolerant setting, the logical clock cycle (the time required to implement logical gates) takes approximately $d$ QEC cycles with lattice surgery or braiding. Moreover, one would expect the effects of the artificially perfect preparation and final measurement cycle to be negligible over $d$ cycles when $d$ is sufficiently large, making this scenario appropriate for estimating the threshold value (but not the actual performance of finite-size systems). In Fig. 11(a) we show the failure probability after $d$ time units of phenomenological noise. In Fig. 11(d) we show crossings $p^\times_{\mathrm{phe}}(d)$ between pairs of these failure curves corresponding to distances $d$ and $(d+1)/2$, which should converge to the threshold $p^*_{\mathrm{phe}}$ as $1/d$ approaches zero. The notion of a pseudo-threshold must be revisited in the setting of faulty measurements: we cannot extract a meaningful pseudo-threshold directly from these curves as we did in the perfect-measurement case in Fig. 9. Consider instead the scenario in which we assume a perfect initial code state and a perfect final measurement cycle, but vary the number $n_{\mathrm{cyc}}$ of noisy QEC cycles; see Fig. 11(b). For each $d$ and $n_{\mathrm{cyc}}$, we define the time-dependent pseudo-threshold $p^*_{\mathrm{phe}}(d, n_{\mathrm{cyc}})$ as the error rate at which the encoded failure probability after $n_{\mathrm{cyc}}$ QEC cycles matches $p_{\mathrm{phy}}(n_{\mathrm{cyc}})$, the unencoded failure probability for $n_{\mathrm{cyc}}$ time units, as defined in Eq. (5). As $n_{\mathrm{cyc}}$ increases, $p^*_{\mathrm{phe}}(d, n_{\mathrm{cyc}})$ is expected to decrease due to the buildup of residual error.
However, for sufficiently large $n_{\mathrm{cyc}}$, the time-dependent pseudo-threshold $p^*_{\mathrm{phe}}(d, n_{\mathrm{cyc}})$ should eventually stabilize to the long-time pseudo-threshold $p^*_{\mathrm{phe}}(d)$, as can be seen in Fig. 11(c). The ansatz in Eq. (11) fits the data well and allows us to extract an estimate of the long-time pseudo-threshold. We remark that our approach of finding long-time pseudo-thresholds is similar in spirit to, but not exactly the same as, the one used to find the "sustainable threshold" [93,94]; also, the ansatz in Eq. (11) was used in [95] to analyze thresholds of cellular-automata decoders for topological codes. The long-time pseudo-thresholds $p^*_{\mathrm{phe}}(d)$, as well as the failure-curve crossings $p^\times_{\mathrm{phe}}(d)$, should converge in the limit of infinite $d$ to the threshold under phenomenological noise. From a linear extrapolation of the data we obtain intercepts of 4.38(3)% for $p^*_{\mathrm{phe}}(d)$ and 3.67(3)% for $p^\times_{\mathrm{phe}}(d)$; see Fig. 11(d). Note that $p^*_{\mathrm{phe}}(d)$ appears to converge more quickly than $p^\times_{\mathrm{phe}}(d)$. Assuming that both quantities continue to change monotonically with $d$, we take the maximum observed value of $p^*_{\mathrm{phe}}(d)$, at $d = 17$, as a conservative estimate of the threshold under phenomenological noise, i.e., $p^*_{\mathrm{phe}} = 4.19(4)\%$.
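The extraction of a long-time pseudo-threshold can be sketched as follows. We do not reproduce Eq. (11) here; the decaying-exponential saturation used below is our own illustrative stand-in for it (fitted by scanning the decay time and solving a linear least-squares problem at each candidate value), and the numbers are synthetic:

```python
import numpy as np

# Synthetic time-dependent pseudo-thresholds saturating to a long-time value.
n_cyc = np.arange(10, 200, 10)
true_long_time, B, tau = 0.042, 0.012, 35.0
data = true_long_time + B * np.exp(-n_cyc / tau)

def fit_long_time(n, y, taus=np.linspace(5, 100, 96)):
    """For each candidate decay time tau, solve the linear least-squares
    problem y ≈ p_inf + B * exp(-n / tau); return p_inf for the best tau."""
    best = (np.inf, None)
    for t in taus:
        X = np.column_stack([np.ones_like(n, dtype=float), np.exp(-n / t)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = np.sum((X @ coef - y) ** 2)
        if r < best[0]:
            best = (r, coef[0])
    return best[1]

print(fit_long_time(n_cyc, data))  # recovers ≈ 0.042
```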
We remark that the modification of the projection decoder that we have presented here to handle noisy syndrome measurements differs from that discussed by Stephens [92] in an important detail. Namely, our "flattening" of pairings is local, since it occurs separately on each connected component after the matching in the (2+1)-dimensional graphs. In contrast, Stephens' flattening is global: all pairings are flattened together and a global correction is produced. For any finite noise strength and for a sufficiently large number of cycles (which does not need to grow with code distance), the success probability of this global flattening will be vanishingly small. Therefore the adaptation proposed by Stephens does not have a finite error-correction threshold, and is not fault tolerant. This did not contribute noticeably to the numerical results presented in [92], since the total cycle number was fixed to $d$, for data in ranges satisfying $d < 1/p$. However, by looking at the performance over longer times of Stephens' adaptation of the projection decoder, we verify that the time-dependent pseudo-threshold for this decoder fails to stabilize for large $n_{\mathrm{cyc}}$; see Fig. 11(c).

C. Optimizing stabilizer extraction and circuit noise analysis
There is significant freedom in precisely which circuits are used to extract the syndrome for error correction. We will assume that there is a separate ancilla qubit per stabilizer generator, such that there are two ancillas per face of the lattice $L_{2D}$, and will not worry about the precise connectivity details, requiring only that coupled qubits are nearby. The total number of qubits required for our implementation of the distance-$d$ 2D color code is therefore $(3d^2 - 1)/2$. Each circuit starts by preparing an ancilla qubit in either the $|+\rangle$ or $|0\rangle$ state, followed by CNOT gates between the ancilla qubit and all the qubits of the stabilizer generator, and finishes by measuring the ancilla qubit in the corresponding $X$ or $Z$ basis. During each time unit new errors can appear in the system, and thus it is beneficial to parallelize the circuits used for stabilizer measurement as much as possible. When circuits for measuring different stabilizers are interleaved, not all schedules of CNOT gates will work. The following conditions [78] must be satisfied.
• At each time unit at most one operation can be applied to any given qubit.
• The measurement circuit preserves the group generated by the elements of the stabilizer group and the Pauli $X$ or $Z$ operators stabilizing the ancilla qubits.⁵

In our optimization we assume that it suffices to specify the CNOT ordering for a single $X$ and $Z$ stabilizer generator in the bulk, as the code is translation invariant. Moreover, the CNOT schedules for stabilizer generators along the boundary of the lattice are specified by restricting the schedule for those in the bulk; see Fig. 12.
The CNOT schedule includes twelve CNOT gates to extract both the $Z$ and $X$ stabilizers. Each CNOT gate is applied at some time unit, and thus the CNOT schedule is specified by a list of twelve non-negative integers $A = \{a, b, c, d, e, f; g, h, i, j, k, l\}$, possibly with repetitions. We are interested in CNOT schedules in which every time unit up to the last contains at least one CNOT gate; this implies that $\max A \le 12$, and thus there are at most $12^{12}$ CNOT schedules. However, most of them are invalid, as they do not satisfy one of the following necessary conditions [78].
2. One operation per qubit at a time.-The integers $a, b, c, d, e, f$ must all be distinct, as must $g, h, i, j, k, l$, as well as $d, j, f, l, b, h$ and $e, k, a, g, c, i$.

3. Correct syndrome extraction.-To ensure that the ancilla measurement after each CNOT sequence extracts the stabilizer measurement outcome, a set of inequalities on the relative order of the $X$- and $Z$-extraction CNOTs must hold, both for the stabilizers in the bulk and for the stabilizers along the boundary. To illustrate how these inequalities are obtained, let us analyze how the Pauli $X$ operator stabilizing the $X$ syndrome extraction ancilla on a given face is spread by the CNOT schedule. It propagates to all the data qubits on that face. From each data qubit it may further propagate to the $Z$ syndrome extraction ancilla on the same face, which is determined by the relative order of the CNOT gates used for $X$ and $Z$ syndrome extraction. We need to ensure that at the end of the CNOT schedule the Pauli $X$ operator on the $X$ syndrome extraction ancilla has not propagated to the $Z$ syndrome extraction ancilla, which is equivalent to requiring that the number of data qubits on the face for which the $X$-extraction CNOT precedes the $Z$-extraction CNOT is even.

We remove an ordering from the list of valid orderings if it is equivalent to another ordering in the list up to a symmetry of the lattice $L_{2D}$. No schedules with six time units satisfy all these conditions. However, we find that there are 234, 4854 and 39160 valid orderings for 7, 8 and 9 time units, respectively.
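Condition 2 is mechanical to check by computer. The sketch below (our own) validates the distinctness constraints for the best-performing length-7 schedule $\{4, 1, 2, 3, 6, 5; 3, 2, 5, 6, 7, 4\}$ discussed in this section, and rejects a deliberately clashing variant:

```python
def satisfies_one_op_per_qubit(A):
    """Condition 2: within each of the four listed index groups, all time
    units must be distinct. A = [a,b,c,d,e,f, g,h,i,j,k,l]."""
    a, b, c, d, e, f, g, h, i, j, k, l = A
    groups = [(a, b, c, d, e, f),
              (g, h, i, j, k, l),
              (d, j, f, l, b, h),
              (e, k, a, g, c, i)]
    return all(len(set(grp)) == len(grp) for grp in groups)

best  = [4, 1, 2, 3, 6, 5,  3, 2, 5, 6, 7, 4]
clash = [1, 1, 2, 3, 6, 5,  3, 2, 5, 6, 7, 4]  # a = b: two ops on one qubit
print(satisfies_one_op_per_qubit(best),
      satisfies_one_op_per_qubit(clash))  # True False
```

A full search would additionally filter on condition 3 and on lattice-symmetry equivalence, which we do not reproduce here.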
To select a good CNOT schedule among the large number of valid ones, we focus on the 234 shortest schedules, because fewer time units per error correction cycle tends to translate into fewer possible faults and better performance. We tested each using a $d = 7$ code for $d$ QEC cycles with circuit noise of strength $p = 0.0035$ and estimated the failure probability by sampling. This value of $p$ was chosen because preliminary studies indicated it was close to the threshold, and distance $d = 7$ was chosen as a compromise between reducing the simulation run time and limiting the impact of boundary effects. The limited sampling resources were focused on identifying the best CNOT schedules; see Fig. 13(a). We found that, up to sampling error, $\{4, 1, 2, 3, 6, 5; 3, 2, 5, 6, 7, 4\}$ was the best-performing schedule, with a logical failure probability of 12.8(1)%. For comparison, the worst-performing length-7 schedule was $\{4, 1, 2, 7, 6, 3; 1, 6, 7, 4, 5, 2\}$, which resulted in a substantially worse logical failure probability of 21.1(1)%. Note that the QEC cycle for the best schedule requires only 8 time units to implement, by preparing and measuring the $X$ ancilla at time units 0 and 7, and preparing and measuring the $Z$ ancilla at time units 1 and 8.

Now we focus on the best-performing CNOT schedule under circuit noise and find the long-time pseudo-threshold for a range of distances in Fig. 13(b), using the same approach as in Section II B. Unlike in the cases of depolarizing and phenomenological noise, the data for circuit noise appears to be in the regime where both $p^*_{\mathrm{cir}}(d)$ and $p^\times_{\mathrm{cir}}(d)$ can be fitted linearly, and their intercepts agree to within error. We take their shared intercept as an estimate of the threshold under circuit noise, $p^*_{\mathrm{cir}} = 0.37(1)\%$.
To estimate the impact of this kind of circuit optimization, we compared the best and worst performing length-7 CNOT schedules and found that the long-time pseudo-thresholds differ by a factor of almost two at distance $d = 13$; see Appendix B for more details.

D. Modelling noise in logical operations
Now we describe an effective noise model for logical operations in the optimized 2D color code under circuit noise, which we will later use to estimate the performance of state distillation circuits. For circuit noise strength $p$ and code distance $d$, the effective noise model is specified in terms of the overall failure probability $p_{\mathrm{oper}}(p, d)$ of each logical operation $\mathrm{oper} \in \{\mathrm{prep}, \mathrm{idle}, \mathrm{CNOT}\}$, which we estimate numerically and record in Table I in the form of the ansatz in Eq. (7).
In our simulations each operation is followed by a full decoding. This results in the application of a logical Pauli operator, which, if nontrivial, is interpreted as a failure. Our effective noise model is designed to overestimate the probability of each nontrivial logical Pauli, and assumes each operation fails independently of the others. Specifically:

• for state preparation, noise is modelled by preparing a logical state orthogonal to the intended one with probability $p_{\mathrm{prep}}(p, d)$;

• for an idle operation, noise is modelled by applying $X$ or $Z$ each with probability $p_{\mathrm{idle}}(p, d)/2$, or $Y$ with probability $p_{\mathrm{idle}}(p, d)/20$;

• single-qubit Clifford gates are done in software by Pauli-frame updates and so are noise-free;

• for SWAP, noise is modelled by applying to each of the two qubits $X$ or $Z$ each with probability $p_{\mathrm{idle}}(p, d)/2$, or $Y$ with probability $p_{\mathrm{idle}}(p, d)/20$, i.e., as a pair of idle qubits;

• for CNOT, noise is modelled by applying $IX$ or $ZI$ each with probability $p_{\mathrm{CNOT}}(p, d)/2$, or $XI$ or $IZ$ each with probability $p_{\mathrm{CNOT}}(p, d)/4$, or any other nontrivial Pauli each with probability $p_{\mathrm{CNOT}}(p, d)/20$;

• measurements in the logical Pauli bases are assumed to be perfect, since the failure probability for logical Pauli measurements is negligible; we only report the measurement failure probability for $p = 10^{-3}$.
Let us make a number of remarks about this noise model. Firstly, the conservative estimates of different types of failure are quite loose for large distances, and could be made tighter by fine-tuning the noise model. Secondly, one may wonder why we treat the SWAP operation as a pair of idle qubits, whereas we treat the CNOT operation separately and find it has a considerable failure probability. The reason is that the propagation of previously existing errors (which are exchanged between patches by SWAP, but spread across patches by CNOT) can be very significant.
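This distinction can be made concrete by conjugating Pauli errors through the two gates in the binary symplectic representation: CNOT copies $X$ from control to target and $Z$ from target to control, spreading a pre-existing error onto both patches, whereas SWAP merely relabels the qubits. A minimal sketch (our own):

```python
def cnot(p):
    """Propagate a Pauli (x1, z1, x2, z2) through CNOT with qubit 1 as
    control: X copies control -> target, Z copies target -> control."""
    x1, z1, x2, z2 = p
    return (x1, z1 ^ z2, x2 ^ x1, z2)

def swap(p):
    """SWAP simply exchanges the single-qubit Paulis."""
    x1, z1, x2, z2 = p
    return (x2, z2, x1, z1)

X1 = (1, 0, 0, 0)    # X error on the first (control) patch
print(cnot(X1))      # (1, 0, 1, 0): the error now afflicts both patches
print(swap(X1))      # (0, 0, 1, 0): the error merely moved
```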
In the remainder of this section, we describe how we estimate the logical failure probabilities using distance-$d$ patches of 2D color code under circuit noise of strength $p$ followed by a perfect-measurement decoding. To avoid the influence of boundary effects in these simulations, we ensure that the patches are close to QEC equilibrium both before and after the logical operation (or just before or after, for measurement or preparation, respectively); see Fig. 14. We therefore implement $d$ QEC cycles on the patches prior to the operation and, when necessary, augment the logical operation with additional idle QEC cycles at the end, checking that the residual noise has stabilized before the perfect measurement is implemented. By QEC equilibrium, we mean a regime in which the failure probability increases linearly with the number of QEC cycles. Throughout our analysis we use the best-performing CNOT schedule from Section II C, which uses one ancilla per stabilizer generator. This results in $(3d^2 - 1)/2$ qubits being needed for a distance-$d$ patch.
Logical state preparation.-Recall that $|\overline{0}\rangle$ is prepared by initializing the data qubits in $|0\rangle$, measuring the stabilizers for $d$ QEC cycles under circuit noise of strength $p$, and applying a $Z$-type operator intended to fix the $X$ stabilizers. We simulate this followed by the perfect-measurement decoder, and estimate the failure probability from the proportion of trials in which a logical $X$ is applied. The analogous procedure is used to estimate the failure probability for preparing $|\overline{+}\rangle$. The data is presented in Fig. 15 and fitted with the ansatz in Eq. (7); both the $|\overline{+}\rangle$ and $|\overline{0}\rangle$ data can be fitted well using the same parameters, which justifies treating the two preparations identically in the noise model. This fit provides the entry for $p_{\mathrm{prep}}(p, d)$ in Table I.

FIG. 14. A sketch of the cumulative failure probability before, during and after a logical operation. Logical preparations and measurements comprise only some parts of this sketch. Before we apply the operation, we first make sure that the system reaches QEC equilibrium. The logical operation, which would take $n^{\mathrm{app}}_{\mathrm{cyc}}$ cycles by itself, is augmented with $n^{\mathrm{after}}_{\mathrm{cyc}}$ idle cycles to ensure the system has returned to QEC equilibrium afterwards. The logical operation failure rate $p_{\mathrm{oper}}$ is the increment of $p_{\mathrm{fail}}$ over the $n^{\mathrm{oper}}_{\mathrm{cyc}} = n^{\mathrm{app}}_{\mathrm{cyc}} + n^{\mathrm{after}}_{\mathrm{cyc}}$ QEC cycles.
Idle logical qubit.-We extract $p_{\mathrm{idle}}(p, d)$, the failure probability of an idle logical qubit for a single QEC cycle, by considering the failure probability of a single distance-$d$ 2D color code patch over $n_{\mathrm{cyc}}$ QEC cycles, given a perfect initial state and a perfect final QEC cycle. After a few initial QEC cycles, we expect the residual error in the system to reach an equilibrium, and thereafter the failure probability should increase linearly with $n_{\mathrm{cyc}}$ in the regime of small $p_{\mathrm{fail}}(d, n_{\mathrm{cyc}})$. We can use this linear growth of $p_{\mathrm{fail}}(d, n_{\mathrm{cyc}})$ with time as a hallmark of the system being in QEC equilibrium. To estimate $p_{\mathrm{idle}}(p, d)$ for fixed $p$ and $d$, we fit the failure probability data, such as that presented in Fig. 16(a) for $p = 10^{-3}$, with the linear function of $n_{\mathrm{cyc}}$ [Eq. (13)], $p_{\mathrm{fail}}(d, n_{\mathrm{cyc}}) = p_{\mathrm{idle}}(p, d)\, n_{\mathrm{cyc}} + c(p, d)$, where $c(p, d)$ is a constant. We plot this fit in Fig. 16(a) for a particular value of $p$, and use the gradient to estimate $p_{\mathrm{idle}}(p, d)$ for various $p$ and $d$ in Fig. 16(b). We fit Eq. (7) to the data, giving the entry in Table I. Note that in Fig. 16(a) the $y$-intercept is negative, since the simulation begins with a perfect code state and so the probability of failure during the early QEC cycles is artificially reduced. For later logical operations we will assume that $d$ QEC cycles prior to the logical operation are sufficient to ensure the system has reached QEC equilibrium. In our noise model, we claim that $p_{\mathrm{idle}}(p, d)/2$, $p_{\mathrm{idle}}(p, d)/2$ and $p_{\mathrm{idle}}(p, d)/20$ are conservative estimates of the probabilities of $X$, $Z$ and $Y$, respectively. In Fig. 16(c) and (d), we justify this claim by finding the fraction of failures which occur for each Pauli operator. Despite the symmetry of the depolarizing noise, we observe that $Y$ failures are much less frequent than $X$ or $Z$ failures, due to the independent detection and decoding of $X$ and $Z$ errors.

FIG. 16. (a) The failure probability using the best-performing CNOT schedule as a function of $n_{\mathrm{cyc}}/d$ for $p = 5 \cdot 10^{-4}$. We use the ansatz in Eq. (13) to find the logical failure rate $p_{\mathrm{idle}}(p, d)$. We observe that QEC equilibration occurs by the $d$th cycle (see Appendix B for additional data); not discarding the initial equilibration period would result in an underestimate of the failure probability. (b) For each value of $p$, we use the ansatz in Eq. (7) to extract the parameters for $p_{\mathrm{idle}}(p, d)$ in Table I. (bottom panels) The fraction of overall failures corresponding to different logical Pauli errors. The points whose data range falls below the horizontal axis correspond to no observed failures.
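Extracting $p_{\mathrm{idle}}$ then amounts to a straight-line fit of failure probability against cycle number after discarding the equilibration period. A sketch on synthetic numbers (ours, not the paper's data), reproducing the negative intercept discussed above:

```python
import numpy as np

# Synthetic equilibrium data: per-cycle failure rate p_idle with a negative
# offset c from the artificially clean initial state (illustrative values).
p_idle_true, c = 2.0e-4, -6.0e-4
n_cyc = np.arange(20, 200, 20)   # early, pre-equilibrium cycles discarded
p_fail = p_idle_true * n_cyc + c

# Linear fit p_fail = p_idle * n_cyc + c; the gradient estimates p_idle.
gradient, intercept = np.polyfit(n_cyc, p_fail, 1)
print(gradient, intercept < 0)   # ≈ 2e-4, True
```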
Transversal logical CNOT.-The logical CNOT is implemented transversally between a pair of 2D color code patches. We assume the system is in QEC equilibrium before the gate, which is ensured in the simulation by running $d$ QEC cycles on an initially error-free state. Immediately after the CNOT gates are applied, the system's residual noise is elevated, since the CNOT propagates $X$ errors from the control to the target and $Z$ errors from the target to the control. To allow the system to return to QEC equilibrium, we include $n^{\mathrm{after}}_{\mathrm{cyc}}$ QEC cycles after the gate is applied. We conclude from Fig. 17(a) that $n^{\mathrm{after}}_{\mathrm{cyc}} = 2$ is sufficient, which we then incorporate into the logical CNOT operation. We analyze the overall failure probability of the logical CNOT in Fig. 17(b), and the fraction of failures resulting in each logical Pauli in the panels at the bottom of Fig. 17. Note that, in contrast with the phenomenon observed in Fig. 16(a), in which the system equilibrated from a state of lower noise, here the system equilibrates from a state of higher noise.
To decode each patch, we use the combined syndrome history: since the CNOT propagates X errors from the first patch to the second, we add the Z-type stabilizer history from the first d QEC cycles of the first patch to that of the second before decoding X errors in the second patch, and similarly for Z errors in the first patch. To isolate the contribution to the logical operator from the CNOT alone, we find and remove the contribution from the initial d QEC cycles by applying perfect-measurement decoding to the system immediately after those cycles, and propagate the result through the logical CNOT gate.

FIG. 17. (bottom panels) The fraction of overall failures corresponding to different logical Pauli errors. The dominant errors are ZI and IX, since the logical CNOT propagates X errors onto the second patch and Z errors onto the first patch. Other noticeable errors include IZ and XI, which correspond to failures over two QEC cycles, but become negligible for large d. The points whose data range falls below the horizontal axis correspond to no observed failures.
Logical readout.-To fault-tolerantly read out logical Z, one measures each data qubit in the Z basis. The resulting bit string can be fixed using a perfect-measurement decoder so that it satisfies all Z-type stabilizers, and the outcome of the logical Z operator can then be read off from any representative. One can fault-tolerantly measure logical X similarly by measuring each data qubit in the X basis.
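For the distance-3 2D color code, which is the Steane [[7,1,3]] code, this readout procedure reduces to classical decoding of the measured bit string. The sketch below uses a toy single-error lookup table rather than the projection decoder used in our simulations; the stabilizer supports and logical representative follow the standard Steane-code conventions:

```python
import numpy as np

# Z-stabilizer check matrix of the distance-3 triangular 2D color code,
# i.e., the Steane [[7,1,3]] code; qubits are columns 0..6.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])
LOGICAL_Z = np.array([1, 1, 1, 0, 0, 0, 0])  # one weight-3 representative

# Lookup table: syndrome -> minimum-weight bit-flip correction (single
# flips suffice at distance 3; the trivial syndrome needs no correction).
table = {tuple(np.zeros(3, dtype=int)): np.zeros(7, dtype=int)}
for q in range(7):
    e = np.zeros(7, dtype=int); e[q] = 1
    table[tuple(H @ e % 2)] = e

def read_logical_z(bits):
    """Fix the measured bit string so it satisfies all Z stabilizers,
    then read the logical Z outcome off the chosen representative."""
    bits = np.asarray(bits)
    fixed = (bits + table[tuple(H @ bits % 2)]) % 2
    return int(fixed @ LOGICAL_Z % 2)

# Example: a |0>_L readout hit by a single bit flip still returns 0.
noisy = np.zeros(7, dtype=int); noisy[4] = 1
print(read_logical_z(noisy))  # -> 0
```

The same structure applies at larger distances, with the lookup table replaced by the projection decoder.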
To simulate the logical X readout, we first run the system for d QEC cycles to ensure equilibration has occurred, then measure each qubit in the X basis under circuit noise of strength p, followed by perfect-measurement decoding. To isolate the contribution to the logical operator from the logical measurement alone, we find and remove the contribution from the initial d QEC cycles by applying perfect-measurement decoding to the system immediately after those cycles. We estimate the failure probability and the fraction of each failure mode for logical measurement of X and Z in Fig. 18. We fit Eq. (7) to the data for both X and Z measurements, which agree with one another, and this fit provides the entry for p meas (p, d) in Table I.
Note that the noise contributed by readout is orders of magnitude below that of the other logical operations, and we only take data for p = 10 −3 since a prohibitively large number of samples would be required for p = 10 −4 and p = 5 · 10 −4 . This justifies neglecting contributions from readout in our effective noise model.

FIG. 18. The readout failure probability for X and Z. Note the value is orders of magnitude lower than for the other logical operations, and we only show data for p = 0.001, since failures at the smaller values of p are too rare to observe with Monte Carlo sampling. We use the ansatz in Eq. (7) to extract the parameters for p meas (p, d) in Table I, although in our noise model we neglect noise in the measurement.

III. STATE DISTILLATION ANALYSIS
In this section we carefully analyze the performance and estimate the overhead of state distillation of T states using the standard 15-to-1 scheme.
A. Creating the T state via state distillation in 3 steps

A protocol to produce an encoded T state using state distillation, which we illustrate in Fig. 19, consists of the following steps.
1. T state initialization.-We initialize encoded T states in small-distance 2D color code patches, which later serve as the input to the first round of state distillation.
2. Expansion and movement of patches.-The code distance must increase in consecutive rounds of state distillation to protect the produced T states, whose fidelity improves from round to round. We must therefore increase the size of a patch which is output from one round, and move it to the desired location where it becomes an input for the next round.
3. State distillation circuit.-This is a circuit with every qubit encoded in a distance-d 2D color code, consisting of nearest-neighbor logical Clifford operations on patches arranged in a 3D stack. The input consists of 15 encoded T states, and the output is one higher fidelity encoded T state.
The second and third steps are repeated (on multiple copies of the procedure in parallel) with increasing code distances chosen to minimize the overhead until a T state of the desired quality is produced. In the following subsections, we go through each of the three steps, elaborating on the implementation and simulation details. In our analysis, we assume circuit noise of strength p, and use the effective noise model presented in Section II D to analyze the performance of logical-level circuits implemented with the 2D color code.

T state initialization
The first step of any state distillation protocol is to produce the initial encoded T states. The initialization protocol can be crucial, since the results of state distillation depend strongly on the quality of the initial T states. For example, taking the 15-to-1 scheme with perfect Clifford operations, if the starting infidelity is decreased by a factor of two (which, as we will see, can be achieved by varying the CNOT order in the initialization protocol), the output infidelity is reduced by about one, three and eight orders of magnitude over one, two and three state distillation rounds, respectively. Although abstract state distillation protocols have received a lot of attention, there is surprisingly little research on the initialization of T states as inputs for state distillation, despite the enormous potential impact. For the surface code, a non-fault-tolerant scheme producing a final encoded T state with logical error comparable to that of a raw state was proposed by Li [97]. More recent works [75,98] present fault-tolerant approaches to initialize encoded T states and can, in some regimes, achieve higher-fidelity encoded T states with low overhead, but are somewhat more challenging to implement.
Our strategy for initializing T states in the 2D color code can be viewed as a generalization of the approach in Ref. [97], and consists of two main steps: (i) produce an encoded T state in a distance-d 1 code (with d 1 < d), then (ii) enlarge the code from distance d 1 to d. For judiciously chosen d 1 , the noise added during step (ii) can be neglected because it is much less significant than the noise from step (i). We produce an encoded T state in the following steps.

FIG. 19. The protocol to create an encoded T state via state distillation, with a qualitative timeline of each step. In step 1, noisy encoded T states are prepared non-fault-tolerantly in a small 2D color code. In step 2, each code patch is expanded to increase its distance, and we assume that the logical infidelity of the encoded T state does not change. In step 3, 16 logical states, either |0⟩ or |+⟩, are prepared and then the state distillation circuit is run on them and the 15 |T⟩'s. Given successful postselection, the output is a single encoded T state of higher fidelity (indicated by a larger font size). Steps 2 and 3 are iterated until a state of the desired infidelity is produced.
1. Choose representatives of the logical X and Z operators which intersect on a single qubit, and prepare that qubit in |T⟩. Prepare the remaining data qubits along the support of the logical X and Z in |+⟩ and |0⟩, respectively. All other data qubits are prepared in either |+⟩ or |0⟩; see Fig. 20(a).
2. Measure each stabilizer twice; see Fig. 20(b). If the two observed syndromes differ, or if the syndrome could not have arisen without a fault, the initialization procedure is restarted.
3. Apply a Pauli operator fixing the observed syndrome.
In our simulation, we say the procedure has succeeded in creating an encoded T state if, upon a single additional fault-free QEC cycle, one obtains the encoded T state. We remark that the state from step 1 is not an eigenstate of all the stabilizers measured in step 2, and thus, even in the absence of faults, we may need to apply a nontrivial Pauli operator in step 3 to ensure that all the stabilizers are satisfied. Lastly, we optimize the initialization protocol by varying the order in which the CNOT gates are applied during the two QEC rounds in step 2. We considered a range of system sizes, again assuming a separate ancilla qubit per stabilizer generator, and considered the 234 valid schedules consisting of 7 CNOT time units as in Section II C. We found that d = 3 resulted in the best parameters, and that the worst schedule (which is not the same as that used for standard error correction) has a lowest-order failure probability more than twice that of the best schedule; see Appendix C for more details.
The protocol takes

Expansion and movement of patches
We neglect any error introduced by the expansion of the encoded T states, i.e., while enlarging the distance of the base code from d (i) to d (i+1) between rounds i and i + 1, and while these patches are moved into the starting location for round i + 1. This is justified since errors are suppressed throughout the expansion just as they are during error correction of the distance-d (i) code. We also neglect any additional time overhead introduced by the expansion and movement of patches. This is justified since the expansion can be done within the d (i) QEC rounds, and the swaps needed to move the outputs each take just one QEC cycle, which we expect can be done during the d (i+1) QEC rounds needed to prepare the |0⟩ and |+⟩ states at the start of the state distillation circuit; see Fig. 21.

15-to-1 state distillation circuit
Here, we analyze the 15-to-1 scheme run on logical qubits encoded in patches of the 2D color code with distance d, arranged in a 3D stack. The logical circuits are analyzed using the effective noise model in Section II D, which takes two parameters: the distance d of the 2D color codes used, and the strength p of the underlying circuit noise. Since we allow Clifford operations only between adjacent patches in the stack, we have to appropriately modify the state distillation circuit; see Fig. 21. In our analysis, we only keep track of errors up to order q 3 and p, which are of similar magnitude in the regime of interest, where q is the infidelity of the input encoded T states and p is the noise strength in the effective noise model. This splits the analysis into considering errors either in the encoded T states alone, or in the Clifford operations alone.
Noisy T states.-First, we briefly review the effect of noise on T states [50]. As described in Section I B, we simplify the form of the noise on a T state by twirling, which in this case corresponds to randomly applying the Clifford XS † ∝ |T⟩⟨T| − |T ⊥ ⟩⟨T ⊥ | with probability 1/2. Recall that single-qubit Clifford gates can be applied instantaneously and perfectly by frame tracking in the 2D color code, as described in Section I D. This twirling forces the noisy T state to be of the form ρ = (1 − q)|T⟩⟨T| + q|T ⊥ ⟩⟨T ⊥ |, or equivalently, each T state is afflicted by a Z error with probability q. The noise on the set of 15 input noisy T states is therefore represented as a Z-type Pauli error E occurring with probability q |E| (1 − q) 15−|E| . The protocol rejects if E is a detectable error for the punctured Reed-Muller code, which has distance 3 for Z-type operators, and results in a failure if and only if E is a nontrivial logical operator. Explicit enumeration shows that there are 35 weight-3 Z-type logical operators, which fixes the lowest-order contributions p rej and p fail due to T errors (Eq. (16)); in particular, p fail = 35q 3 to lowest order.

Noisy Clifford operations.-Now we consider the effect of noise in the Clifford operations [62,63] in the state distillation circuit. First, we analyze the faults that occur during the Reed-Muller state preparation (lilac in Fig. 21), which takes 16 QEC cycles to complete. We propagate each fault as a logical Pauli operator through the circuit, and assume that every other operation acts perfectly, including the T gates. By explicitly representing states and operations as vectors and matrices of dimension 2 16 and 2 16 × 2 16 , respectively, we find the resulting contributions to p rej and p fail (Eq. (17)).

FIG. 22. A part of the state distillation circuit with the inclusion of a Pauli error X x1 1 Z z1 1 X x2 2 Z z2 2 on a pair of qubits. Such an error can appear as a result of previous faults being propagated through the circuit.
Without loss of generality we assume that all the other measurements have been completed before this error occurs, which ensures that the pair of qubits is decoupled from the rest of the system. If the error-free outcome is b, then the error-free state must be |ψ⟩ = (Z b |T ⊥ ⟩) ⊗ |T⟩. This simple 2-qubit circuit can then be analyzed for all 16 cases of the error X x1 1 Z z1 1 X x2 2 Z z2 2 .

Next we consider faults in the 16 idle QEC cycles of the output qubit. Note that there is a choice of which of the 16 qubits of the Reed-Muller state to puncture in the 15-to-1 protocol. We simulated all 16 choices, and selected the third qubit as the output since it had the lowest contribution from p CNOT ; see Appendix C. A fault in any of these locations results in an undetected failure, such that p out rej = 0 and p out fail = 16 p idle .
By explicit calculation, we find the exact contribution to p rej and p fail of every Clifford fault location in Fig. 21 according to our effective noise model.
Lastly we analyze the remaining fault locations. These consist of the 15(d + 16) idle locations involving the qubits holding the T states, which remain idle during the production of the RM state (pink); the 420 idle and SWAP locations in the shuffle circuit (green); and the 15 CNOTs used to implement gate teleportation (blue) in Fig. 21. A single fault in any of these locations propagates to a Pauli operator X x 1 1 Z z 1 1 X x 2 2 Z z 2 2 acting on only one pair of qubits as shown in Fig. 22, where x 1 , z 1 , x 2 and z 2 each take values 0 or 1, the first qubit holds the |T⟩, and the second belongs to the Reed-Muller state. To analyze the effect of such a Pauli operator, we imagine delaying the measurements on the affected pair of qubits until after the completion of the rest of the circuit; see Fig. 22. At this point, all other measurements are completed, and if the Pauli operator were trivial, i.e., if x 1 = z 1 = x 2 = z 2 = 0, the outcome b ∈ {0, 1} of the X measurement would be determined by the previous outcomes since the X stabilizers must be satisfied. Therefore, the pair of qubits must be completely unentangled from the rest of the system. This tells us that none of these fault locations can result in a failure; they result in rejection if and only if the outcome of the X measurement is flipped by the Pauli operator X x 1 1 Z z 1 1 X x 2 2 Z z 2 2 . We straightforwardly analyze this 2-qubit circuit with its pure initial state and find, for each Pauli operator, the probability p flip (x 1 , z 1 , x 2 , z 2 ) that the outcome is flipped, namely: 0 if x 1 = x 2 and z 2 = 0; 1/2 if x 1 ≠ x 2 and z 2 = 0; 1 if x 1 = x 2 and z 2 = 1; and 1/2 if x 1 ≠ x 2 and z 2 = 1.
Then all that remains is to count the contribution of each of these Paulis from the aforementioned locations according to the effective noise model. Combining this with the contributions from Eq. (16) and Eq. (17), we obtain the total failure probability p dist fail = 35q 3 + 16.9 p idle + 1.93 p CNOT .
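The count of 35 weight-3 Z-type logical operators can be verified by direct enumeration. Labeling the 15 qubits of the punctured Reed-Muller code by the nonzero 4-bit strings, a Z-type operator commutes with all four weight-8 X stabilizers exactly when the labels in its support XOR to zero:

```python
from itertools import combinations

# Qubits of the 15-qubit punctured Reed-Muller code, labeled by the
# nonzero vectors of F_2^4, written as the integers 1..15.
labels = list(range(1, 16))

# A Z-type operator is undetected by the X stabilizers iff, for each of
# the four bit positions, its support covers an even number of labels
# with that bit set -- equivalently, iff the XOR of its labels vanishes.
# At weight 3, every such undetected operator is a nontrivial logical.
weight3_logicals = [s for s in combinations(labels, 3)
                    if s[0] ^ s[1] ^ s[2] == 0]
print(len(weight3_logicals))  # -> 35
```

Each unordered pair of distinct labels {a, b} determines the third label c = a XOR b, and each valid triple contains three such pairs, giving (15 · 14/2)/3 = 35.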
The number of time units required to implement the state distillation circuit can be straightforwardly identified from Fig. 21 and is equal to

B. State distillation overhead
Here we estimate the overhead required for a k-round state distillation protocol using distances {d (1) , d (2) , . . . , d (k) } under circuit noise of strength p, as well as the infidelities output by each round {q (1) , q (2) , . . . , q (k) }. Note that q (k) is the infidelity of the encoded T state produced by the overall protocol.
Recall that the first step of the state distillation protocol is to initialize encoded T states in small-distance 2D color code patches; see Fig. 19 and Section III A 1. Writing the infidelity of the state after initialization as q (0) = p init fail (p), the remaining infidelities are calculated iteratively according to q (i) = p dist fail (p, q (i−1) , d (i) ) = 35 (q (i−1) ) 3 + 16.9 p idle (p, d (i) ) + 1.93 p CNOT (p, d (i) ). To estimate the overhead, it is useful to streamline our notation. Recall from Eq. (12) that N 2D (d) = (3d 2 − 1)/2 qubits are used to implement the distance-d 2D color code. For each state distillation round i ∈ {1, 2, . . . , k}, let r (i) = 1 − p dist rej (p, q (i−1) , d (i) ) be the acceptance probability, R (i) = 15 be the number of |T⟩s required assuming acceptance, and α (i) = 31/15 be the number of logical qubits needed per input |T⟩ in the state distillation circuit. To account for the initialization step, we also set d (0) = 3, r (0) = 1 − p init rej (p), R (0) = 1 and α (0) = 1. We can now calculate the expected number of physical qubits required in each round. We imagine preparing a large number of output T states, and can thus talk about the average overhead. 7 Since the state distillation protocol in the last round succeeds with probability r (k) , on average it needs R (k) /r (k) input |T⟩s. We therefore require on average α (k) N 2D (d (k) )R (k) /r (k) qubits for the last round. To supply the kth round, the (k − 1)th round must output R (k) /r (k) |T⟩'s on average, which requires α (k−1) N 2D (d (k−1) )R (k−1) R (k) /(r (k−1) r (k) ) physical qubits, and so on. The qubit overhead N SD of the state distillation protocol is then the number of qubits needed in the most qubit-expensive state distillation round, namely N SD = max 0≤i≤k α (i) N 2D (d (i) ) ∏ k j=i R (j) /r (j) . The time τ SD required for state distillation is the sum of the durations of the initialization step and the k distillation rounds. The space-time overhead is then simply N SD τ SD . For various values of p, we run a simple search over the number of rounds k ∈ {1, 2, 3} and distances {d (1) , d (2) , . . .
, d (k) } to distill to a target infidelity p fin with low space or space-time overhead. Let us briefly remark on the threshold of this 15-to-1 state distillation scheme using the 2D color code. This threshold is the noise strength above which an arbitrarily low target infidelity cannot be achieved, irrespective of the number of state distillation rounds. There are two main features that could limit the state distillation threshold. Firstly, error correction in the 2D color code could be the limiting factor: if p is above the circuit-noise threshold of 0.37(1)%, one will be unable to achieve logical states with arbitrarily low infidelity. Secondly, the first round of state distillation can be a bottleneck if it cannot improve the infidelity of the initial encoded T state. For simplicity, we assume that the encoded Clifford operations execute perfectly, and solve for the critical infidelity q c satisfying 35q c 3 = q c . Using Eq. (15), we obtain a corresponding critical error rate of 1/(3.93 · √35) ≈ 4%, which is far above the circuit-noise threshold. We conclude that error correction is the bottleneck, and estimate the threshold for state distillation with this scheme to be 0.37(1)%.
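The search over round counts and distances can be sketched as follows. This is a toy version only: the functions p_idle and p_cnot below are placeholder power-law fits (the real constants come from Table I), the acceptance probability is taken as a flat r per round, and the initialization round is omitted:

```python
import itertools

def p_idle(p, d):   # placeholder fit; Table I supplies the real constants
    return 0.1 * (30 * p) ** ((d + 1) // 2)

def p_cnot(p, d):   # placeholder fit; Table I supplies the real constants
    return 0.3 * (40 * p) ** ((d + 1) // 2)

def n_2d(d):        # qubits per distance-d 2D color code patch, Eq. (12)
    return (3 * d**2 - 1) // 2

def distill_once(p, q, d):
    """Output infidelity of one 15-to-1 round at code distance d,
    following p_fail^dist = 35 q^3 + 16.9 p_idle + 1.93 p_CNOT."""
    return 35 * q**3 + 16.9 * p_idle(p, d) + 1.93 * p_cnot(p, d)

def space_overhead(p, q0, dists, r=0.95, R=15, alpha=31 / 15):
    """Qubit overhead N_SD: the most qubit-expensive round, with a
    flat acceptance probability r assumed per round for simplicity."""
    q, costs = q0, []
    for i, d in enumerate(dists):
        factor = (R / r) ** (len(dists) - i)  # inputs needed downstream
        costs.append(alpha * n_2d(d) * factor)
        q = distill_once(p, q, d)
    return max(costs), q

# Simple search over round counts and distances for a target infidelity.
p, q0, target = 1e-3, 1e-2, 1e-10
best = None
for k in (1, 2, 3):
    for dists in itertools.product((7, 11, 15, 19), repeat=k):
        n_sd, q_out = space_overhead(p, q0, list(dists))
        if q_out < target and (best is None or n_sd < best[0]):
            best = (n_sd, dists, q_out)
print(best)
```

With these placeholder fits the search settles on a two-round protocol with a small first-round distance, qualitatively matching the behavior described above.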

IV. FURTHER INSIGHTS INTO 3D COLOR CODES
In this section we present some insights into 3D color codes that are useful for code switching. In Section IV A, we describe a simple approach to switch between the 2D and 3D color codes. At the core of this approach is the fact that, in a particular state of the gauge qubits, the 3D subsystem color code can be viewed as a collection of 2D color codes. In Section IV B, we describe some features of the gauge operators of the 3D color code which are relevant for gauge fixing and for the simulation of noise upon application of the transversal T gate. Our discussion closely follows material in [99]. In Section IV C, we generalize the restriction decoder to correct Z errors in the 3D color code with boundaries, which will later be used in our code switching protocol.

A. A simple way to switch between 2D and 3D color codes
First we recall the general procedure of gauge fixing, which takes a subsystem code with gauge group G to a stabilizer code with stabilizer group S′ ⊆ G, where both codes share a set of bare logical operators. We require that the stabilizer group S of the subsystem code, which is the centralizer of G in the Pauli group intersected with G modulo phases, i.e., S = (Z(G) ∩ G)/⟨iI⟩, is contained in S′, namely S ⊆ S′. Consider any state |ψ⟩ in the code space of the subsystem code, which by definition is a (+1)-eigenstate of all elements of S. To switch from G to S′, we first measure a generating set of S′ \ S. A subset of those measured generators may return −1 outcomes, but there must exist an element g ∈ G which anti-commutes with precisely that subset of generators. Hence after applying g, all the stabilizers of S′ will be satisfied, and since g commutes with the bare logical operators, the logical state is unaffected, which completes the switch from G to S′. This procedure is named gauge fixing since it involves measuring some (initially unsatisfied) gauge operators and fixing them to be +1.
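The classical step of finding the fixing operator g can be phrased as GF(2) linear algebra: if a binary matrix A records which candidate gauge generators anti-commute with which measured generators, one solves A x = s for the observed pattern s of −1 outcomes. A minimal sketch with a made-up anticommutation matrix (on the lattice, the structure of the code makes this step far more direct):

```python
import numpy as np

def solve_gf2(A, s):
    """Solve A x = s over GF(2) by Gaussian elimination (assumes a
    solution exists, as guaranteed for gauge fixing)."""
    A = A.copy() % 2
    s = s.copy() % 2
    m, n = A.shape
    x = np.zeros(n, dtype=int)
    pivots, row = [], 0
    for col in range(n):
        pivot_rows = [r for r in range(row, m) if A[r, col]]
        if not pivot_rows:
            continue
        r = pivot_rows[0]
        A[[row, r]] = A[[r, row]]
        s[[row, r]] = s[[r, row]]
        for r2 in range(m):
            if r2 != row and A[r2, col]:
                A[r2] ^= A[row]
                s[r2] ^= s[row]
        pivots.append(col)
        row += 1
    for r, col in enumerate(pivots):
        x[col] = s[r]
    return x

# Toy example: A[j, i] = 1 iff gauge generator i anti-commutes with
# measured generator j; s marks the generators with -1 outcomes.  The
# solution x selects which gauge generators to multiply into g.
A = np.array([[1, 0, 1],
              [0, 1, 1]])
s = np.array([1, 1])
x = solve_gf2(A, s)
print(x)
```

Applying the product of the selected generators flips exactly the unsatisfied measured generators, as required.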
Central to switching between color codes is the fact that both the 3D stabilizer color code and the 2D color code can be viewed as gauge fixings of the 3D subsystem color code; see Fig. 24. The relevant groups are the gauge and stabilizer groups G sub and S sub of the 3D subsystem color code, and the stabilizer group S 3D of the 3D stabilizer color code. We can define the stabilizer group S 2D of the 2D color code within the 3D lattice L 3D , since L 2D is 'contained' in L 3D ; see Fig. 24(b). It is supported on the qubits near the Y boundary vertex v Y . We further define the stabilizer group S int , supported on the qubits which are not near v Y , as follows: S int = ⟨X(e), Z(e) | e ∈ ∆ 1 (L 3D ) : e is incident to an interior Y vertex⟩.
We can think of S int as the group generated by the stabilizers of the 2D spherical color codes centered around every interior Y vertex of L 3D ; see Fig. 24(c). Note that every 2D spherical color code encodes zero logical qubits. It is straightforward to see that G sub contains both S 3D and ⟨S 2D , S int ⟩. Moreover, S sub ⊆ S 3D and S sub ⊆ ⟨S 2D , S int ⟩, since the vertex operators X(v) and Z(v) can be formed by multiplying the operators X(e) and Z(e) on all the edges of the same color incident to v.

FIG. 24. (a) In one gauge fixing of G sub (the lattice L 3D is shown in Fig. 4b), all Z-edges (depicted by light struts) are satisfied. This corresponds to the 3D stabilizer color code S 3D . (b) In another gauge fixing of G sub , all X- and Z-edges incident to Y vertices in ∆ 0 (L 3D ) (depicted by light struts) are satisfied. This corresponds to the 2D color code S 2D on the qubits near the boundary vertex v Y and the 2D spherical color codes S int on the other qubits. We depict the primal lattice of the 2D color code on the lattice L 2D (shown in Fig. 4a) in black. (c) Around each Y vertex in ∆ 0 (L 3D ), there is a 2D spherical color code, whose primal lattice we depict in black.

Also note that there is a shared
representation of the bare logical operators for all three codes (for example, X and Z applied to every qubit in L 2D ). Therefore, to move from the subsystem code G sub to either of the stabilizer codes, i.e., S 3D or ⟨S 2D , S int ⟩, gauge fixing can be used. Switching from either of the stabilizer codes to the subsystem code requires no action, since any state which is a (+1)-eigenstate of every element of S 3D or ⟨S 2D , S int ⟩ must also be a (+1)-eigenstate of every element of S sub . This example of gauge fixing is sometimes referred to as a dimensional jump, because the logical information is moved between codes defined on 2D and 3D lattices [57].

B. Physics of the gauge flux in 3D color codes
Here we consider general features of the gauge operators of the 3D subsystem color code. Our discussion closely follows material in [99]. Suppose there is some X error supported on ∆ 3 (L) in the system. We define the Z-type gauge flux γ ⊆ ∆ 1 (L) to be the subset of interior edges which would return −1 outcomes if all Z edges in the system were measured perfectly. Since each edge has one color from K = {RG, RB, RY, GB, GY, BY}, we distinguish six types of flux and write γ = ∪ κ∈K γ κ . Although the flux γ can be random, it has to form a collection of strings, which may branch and can terminate only at the boundary of the lattice; see Fig. 25. A local constraint capturing this behavior, which we call the Gauss law, can be stated as follows. Let K 1 , K 2 , K 3 , K 4 ∈ {R, G, B, Y} be the four different colors. Then, for any vertex v ∈ ∆ 0 (L) of color K 1 , the number of edges of γ of color K 1 K 2 or K 1 K 3 incident to v has to be even, i.e., γ K 1 K 2 | v + γ K 1 K 3 | v ≡ 0 mod 2. This local constraint arises from redundancies among the gauge generators. Namely, in order to form the stabilizer generator Z(v) identified with the vertex v, we can multiply all the gauge generators on edges of color K 1 K 2 incident to v. Alternatively, we can obtain Z(v) as the product of all gauge generators on edges of color K 1 K 3 incident to v. Since the parity of the number of −1 measurement outcomes among those gauge generators is the same in both cases, we recover Eq. (32). Note that when the stabilizer Z(v) is violated (indicating the presence of some X error), the number of −1 outcomes for gauge generators on edges incident to v of color K 1 K 2 has to be odd, i.e., γ K 1 K 2 | v ≡ 1 mod 2. In such a case, we call the vertex v a branching point of γ, as three different flux types γ K 1 K 2 , γ K 1 K 3 and γ K 1 K 4 meet at v. We remark that to perform error correction with the 3D subsystem color code, one can use the information about the branching points of the flux γ.
If the flux γ has no branching points at any interior vertex, then all Z stabilizers are satisfied and there always exists an X-type gauge operator which anti-commutes with precisely those Z gauge generators in γ.
If an edge set satisfies the Gauss law, we say that it is valid. An edge set which does not satisfy the Gauss law is said to be invalid, and we refer to all vertices which violate the Gauss law as violation points. When the gauge flux is measured with noisy circuits, there can be errors in the reported noisy gauge flux causing it to be invalid. We decompose γ into its connected components γ 1 , γ 2 , . . .; by definition, different connected components are disjoint, i.e., γ j ∩ γ k = ∅ for j ≠ k. Then, we define a linked component of γ as a subset of all connected components of γ which are linked (in the sense of knot theory); see Fig. 25(b). Finally, we can decompose γ as a disjoint union of its linked components λ 1 , λ 2 , . . ., where each λ i is the sum of some γ j 's. For convenience, we introduce a function col : ∆ 0 (L) → Z 3 2 , which for each vertex v returns its color, where we set R = (1, 0, 0), G = (0, 1, 0), B = (0, 0, 1) and Y = (1, 1, 1).
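As a toy illustration, the Gauss law of Eq. (32) can be checked on an abstract colored graph: at each interior vertex, the parities of incident flux edges of each two-color type must all agree (a minimal sketch, not the lattice data structures used in our simulations):

```python
from itertools import combinations

COLORS = {"R", "G", "B", "Y"}

def violation_points(vertex_color, flux_edges):
    """Return vertices violating the Gauss law, treating every vertex as
    interior.  `vertex_color` maps vertex -> color; each flux edge (u, v)
    carries the two-color label {color(u), color(v)}."""
    bad = set()
    for v, k1 in vertex_color.items():
        # parity of incident flux edges of color K1K2, for each K2 != K1
        parity = {k2: 0 for k2 in COLORS - {k1}}
        for a, b in flux_edges:
            if v in (a, b):
                parity[vertex_color[b if a == v else a]] ^= 1
        # Gauss law: for every pair {K2, K3}, the number of incident
        # edges of color K1K2 or K1K3 is even, i.e. all parities agree.
        for k2, k3 in combinations(parity, 2):
            if (parity[k2] + parity[k3]) % 2 == 1:
                bad.add(v)
                break
    return bad

# An RG string r1 -- g -- r2 passes validly through the G vertex g, but
# its endpoints are violation points unless they lie on the boundary.
vcol = {"r1": "R", "g": "G", "r2": "R"}
flux = {("r1", "g"), ("g", "r2")}
print(violation_points(vcol, flux))  # -> {'r1', 'r2'}
```

A vertex where all three parities are odd satisfies the Gauss law; it is a branching point, matching the discussion above.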
For any linked component λ i of γ, we refer to a subset σ ⊆ ∆ 0 (λ i ) as an excitation configuration for λ i , and call Σ v∈σ col(v) the total charge of σ. We denote the collection of excitation configurations for λ i with neutral total charge by Σ(λ i ) = {σ ⊆ ∆ 0 (λ i ) | Σ v∈σ col(v) = (0, 0, 0)}. Writing the linked component λ i in terms of its connected components λ i = γ i 1 + γ i 2 + · · · + γ i k , we also introduce the collection of excitation configurations without the linking charge, Σ′(λ i ) = {σ 1 + · · · + σ k | σ j ∈ Σ(γ i j )}, i.e., configurations which are neutral on every connected component separately. We remark that the linking charge is a charge which can be transferred between two connected components γ i and γ j which are linked; see [99]. Note that Σ′(λ i ) is contained in Σ(λ i ), i.e., Σ′(λ i ) ⊆ Σ(λ i ), and they coincide whenever the linked component λ i consists of a single connected component.

C. Restriction decoder for 3D color codes with boundaries
Here we provide details on correcting Z type errors in the 3D stabilizer color code with perfect measurements, which is needed for the final step of the code switching protocol. We seek an efficient decoder for the 3D color code with good performance. Our approach is to adapt the restriction decoder from Ref. [31], which was originally defined for lattices with no boundary, to the tetrahedral lattice L 3D . We also apply additional minor modifications to improve the performance. In what follows we briefly review the restriction decoder and describe our modifications.
Let σ ⊆ ∆ 0 (L 3D ) be the syndrome of the 3D stabilizer color code on the tetrahedral lattice L 3D . We pick one color, say Y, and for each color K ∈ {R, G, B} we separately analyze the restricted syndrome σ KY ⊆ ∆ 0 (L KY 3D ) within the restricted lattice L KY 3D . If |σ KY | ≡ 1 mod 2, then we add the boundary vertex v Y to σ KY . Next, we find a subset of edges E KY ⊆ ∆ 1 (L KY 3D ) which provides a pairing of the vertices of σ KY within the restricted lattice L KY 3D . We can find a pairing of minimal weight by using the MWPM algorithm, where the weight of the edge connecting the boundary vertices v Y and v K is set to zero.
After finding the pairing E = E RY + E GY + E BY , we apply a local lifting procedure to every Y vertex in the interior of L 3D . Namely, for every interior Y vertex v ∈ ∆ 0 (L 3D ) we find some subset of tetrahedra τ(v) ⊆ ∂ 0,3 v in the neighborhood of v whose 1-boundary locally matches E at v. We emphasize that all such choices of τ(v) result in operators Z(τ(v)) which may differ only by a stabilizer operator.
To lift the boundary vertex v Y , we need to adapt the original restriction decoder from Ref. [31]. Since L 2D is a sublattice of L 3D (see Fig. 24), the lifting of v Y is equivalent to the problem of decoding the 2D color code defined on the facet of the tetrahedral lattice L 3D near v Y . We remark that different choices of τ(v Y ) may lead to operators Z(τ(v Y )) differing by a logical operator. We use the projection decoder for the 2D color code as described in Section II A. Finally, the correction operator is found as the product of Z(τ(v)) over all Y vertices v, including the boundary vertex v Y . Note that this adaptation to accommodate a lattice boundary in 3D is analogous to an adaptation presented in [37] for the 2D case.
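The pairing step of the restriction decoder can be illustrated with a brute-force minimum-weight perfect pairing on a toy one-dimensional "restricted lattice" (our implementation uses MWPM on the actual lattice geometry; the distance function below is a stand-in):

```python
def min_weight_pairing(syndrome, dist, boundary=None):
    """Brute-force minimum-weight perfect pairing of syndrome vertices.
    If |syndrome| is odd, the boundary vertex is added, mirroring the
    addition of v_Y to an odd restricted syndrome."""
    nodes = list(syndrome)
    if len(nodes) % 2 == 1:
        nodes.append(boundary)

    def pairings(items):
        # enumerate all perfect pairings of an even-sized list
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for i, other in enumerate(rest):
            for tail in pairings(rest[:i] + rest[i + 1:]):
                yield [(first, other)] + tail

    return min(pairings(nodes),
               key=lambda m: sum(dist(a, b) for a, b in m))

# Toy restricted lattice: vertices on a line, with the boundary vertex
# 'bnd' placed at position 0 (a hypothetical geometry for illustration).
def dist(a, b):
    pa = 0 if a == "bnd" else a
    pb = 0 if b == "bnd" else b
    return abs(pa - pb)

print(min_weight_pairing({1, 2, 7}, dist, boundary="bnd"))
```

The odd syndrome {1, 2, 7} gets the boundary vertex appended, and the cheapest pairing matches the nearest defect to the boundary, as in the decoder described above.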
In Fig. 27(a) we show the performance of the restriction decoder adapted to the 3D stabilizer color code on the tetrahedral lattice L 3D , finding a threshold of 0.55(5)% for independent and identically distributed (iid) phase-flip Z noise. This value can be contrasted with the restriction decoder threshold of 0.77% for the color code on the 3-torus reported in [31]. Also, a similar adaption of the restriction decoder to the 3D color code with a boundary was recently presented in [100], however the reported values of 0.1-0.2% are surprisingly low.
This adapted restriction decoder is far from optimal at moderate system sizes. For example, some weight-2 errors cause failure for every d ≤ 9. This phenomenon is not solely due to the presence of the boundaries: we found that there are some weight-2 errors for the color code of distance d = 6 on the 3-torus which cause the restriction decoder to introduce a logical error.
We therefore consider a very simple additional modification of the restriction decoder to improve its performance at moderate system sizes. In addition to selecting a small-weight set τ(v Y ) for the boundary vertex v Y , we choose as τ(v) a set of minimal weight for every Y vertex v ∈ ∆ 0 (L 3D ). As mentioned above, this can only change the decoder output by a stabilizer, but the explicit representation will typically have lower weight. We then simply rerun the same decoder three more times, picking the other colors R, G and B instead of Y, and finally select the correction with the lowest total weight among the four colors Y, R, G and B. This simple modification yields significant improvements: the lowest distance which corrects all weight-2 errors is now d = 7, compared with d = 11 without the modification, and the threshold increases from 0.55(5)% to 0.80(5)%; see Fig. 27.

V. CODE SWITCHING ANALYSIS
In this section we describe how code switching between 2D and 3D color codes can be used to fault-tolerantly produce an encoded T state in the 2D color code, and analyze the overhead of this process.

A. Creating the T state via code switching in 6 steps
Here we outline the protocol to produce an encoded T state using code switching in the presence of circuit noise. The main idea is to first produce a Bell state across a pair of 2D color codes, and switch one of the two into a 3D color code where the logical T is applied transversally. Then, by measuring X for the 3D code, the encoded T state is effectively teleported to the remaining 2D code (up to a known logical Pauli correction). This approach avoids switching from the 3D code back to the 2D code, which we believe considerably reduces the amount of extra noise and simplifies our simulation. More explicitly, the protocol consists of the following steps (which are illustrated in Fig. 28).
FIG. 28. The protocol to create an encoded T state via code switching, with the timeline of each step in units of QEC cycles for the 2D color code. In steps 1 and 2 we simultaneously prepare the Bell state |Φ⁺⟩ in a pair of 2D codes, and the interior of the 3D subsystem color code. In step 3, we measure gauge operators, before fixing them in step 4 to end up in the 3D stabilizer color code. In step 5, T is applied and all the qubits are measured. In step 6, a decoder is used to infer the outcome of the logical X of the 3D stabilizer color code, and the state of the remaining 2D patch is |T⟩ up to a logical Z.
1. Prepare Bell state in 2D codes.-The encoded Bell state (|0⟩_2D|0⟩_2D + |1⟩_2D|1⟩_2D)/√2 is fault-tolerantly prepared in a pair of 2D color codes, defined on two copies of the lattice L_2D. The second of these 2D color codes should be seen as the code defined along the 2D boundary near the yellow vertex of the 3D lattice as in Fig. 24(b).

2. Prepare the 3D interior.-The remaining qubits in L_3D, i.e., those which are not near the boundary vertex v_Y, are prepared as a tensor product of unique spherical 2D color code states. The logical state of the system is a Bell pair between a 2D color code and the 3D subsystem color code, (|0⟩_2D|0⟩_sub + |1⟩_2D|1⟩_sub)/√2.
3. Measure gauge operators.-All Z-edge operators on edges not incident to any Y vertex are measured, yielding the subset of edges γ̃ corresponding to −1 outcomes.
4. Gauge fix.-We find and apply an X-gauge operator which seeks to fix all Z-edge operators to have +1 outcomes. The logical state is now a Bell state encoded into the 2D color code and the 3D stabilizer color code, (|0⟩_2D|0⟩_3D + |1⟩_2D|1⟩_3D)/√2.

5. Apply T, and measure.-We apply T to every data qubit. This, in turn, implements a logical T gate, yielding the state (|0⟩_2D|0⟩_3D + e^{iπ/4}|1⟩_2D|1⟩_3D)/√2. Then we measure each individual data qubit in the 3D code in the X basis.
6. Decode Z errors in 3D code.-We use the single-qubit X-basis measurements to first decode Z errors, and then infer the outcome m = ±1 of the logical X. The state encoded in the 2D color code is then (|0⟩_2D + m e^{iπ/4}|1⟩_2D)/√2, which for m = −1 needs to be fixed to the encoded T state by application of a logical Z.
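At the logical level, steps 1-6 implement one-bit teleportation of the T gate. This can be checked on two unencoded qubits (a sanity check of the logical circuit only; it says nothing about the fault-tolerant encoding):

```python
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])
Z = np.diag([1.0, -1.0]).astype(complex)

def teleported_t_state(m):
    """State left on the 2D patch after measuring the 3D patch's logical X with
    outcome m, including the logical Z correction for m = -1 (step 6)."""
    ket0, ket1 = np.eye(2, dtype=complex)
    # Steps 1-2: logical Bell pair (|00> + |11>)/sqrt(2)
    bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
    # Step 5: transversal T acts as a logical T on the second (3D) patch
    state = (np.kron(np.eye(2), T) @ bell).reshape(2, 2)
    # Project the second qubit onto the X eigenstate |+> (m=+1) or |-> (m=-1)
    proj = (ket0 + m * ket1) / np.sqrt(2)
    psi = state @ proj.conj()
    psi /= np.linalg.norm(psi)
    return Z @ psi if m == -1 else psi

t_state = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)
for m in (+1, -1):
    overlap = abs(np.vdot(t_state, teleported_t_state(m)))
    assert abs(overlap - 1) < 1e-12  # |T> recovered up to global phase
```

For both measurement outcomes the remaining qubit is the T state, confirming that only a logical Z correction (for m = −1) is needed.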
In the following subsections, we go through each of the six steps, elaborating on the implementation and simulation details. In our analysis, we adhere to the following guiding principles.
• For each step, we use Monte Carlo simulations under circuit noise to estimate the performance. We select the best error correction techniques, fault-tolerant gadgets and efficient decoding algorithms that we are aware of, and optimize measurement circuits where possible.
• It is possible that some steps will benefit from future improvements in error correction techniques and decoders. We estimate the impact these could have on the performance of this code switching protocol by replacing those steps by a justified estimate of the best improvement one could hope for.
• We choose the fault-tolerant error correction for the 2D color code to be the same optimized configuration we assumed for state distillation (see Section II D) to allow for a fair comparison between code switching and state distillation.
• We assume a single ancilla qubit per gauge operator in the 3D color code interior.
In Fig. 29(a) we present our findings by showing the overall probability of failure of code switching using our simulations. In Fig. 29(b)-(f) we indicate the impact of optimistic improvements of various steps in the protocol on the overall probability of failure of code switching.

Preparing the Bell state in 2D codes
To fault-tolerantly prepare the encoded Bell state (|0⟩_2D|0⟩_2D + |1⟩_2D|1⟩_2D)/√2 in a pair of 2D color codes of distance d, we first fault-tolerantly prepare them in |+⟩_2D and |0⟩_2D respectively, and then transversally apply a CNOT from the first to the second. Preparation of |+⟩_2D is carried out by initializing all data qubits in the 2D color code in |+⟩, and then performing d QEC cycles and fixing the inferred initial syndrome of the Z stabilizers. Preparation of |0⟩_2D similarly involves initialization of all data qubits in |0⟩, followed by d QEC cycles and fixing the inferred initial syndrome of the X stabilizers. Since each QEC cycle for the 2D color code requires 8 time units, the preparation of the encoded Bell state takes 8d time units. After this point the second of the 2D codes undergoes a single additional QEC cycle while code switching occurs for the first 2D code patch. Although it is arbitrary which of the two patches is used for code switching, in practice there is an asymmetry in the X and Z noise for the patches. We find a marginal benefit from using the first patch (initialized in |+⟩ prior to the CNOT) for code switching. For details, see Appendix E.

FIG. 29. (a) The failure probability of the code switching protocol under circuit noise, for which we observe a T gate threshold of 0.07(1)%. We also estimate the code switching failure probability when one step, i.e., (b) interior preparation, (c) gauge measurement, (d) gauge fixing and (e) the decoder, is replaced with an optimistic version. In (f), all four of these steps are replaced by their optimistic versions. We use a special form of the ansatz in Eq. (7), i.e., p_fail = A (p/p*_CS)^(Cd+D), to fit the data up to the crossing. The gray region between the fit lines from (a) is superimposed onto (b)-(f) to guide the eye. We observe that improving the interior preparation and developing a better decoder have the most potential to improve the performance of code switching. If all the potential improvements are achievable, the threshold could be as large as 0.22(5)%.

Preparing the 3D interior
A spherical 2D color code state is to be prepared around each yellow interior vertex, like the one shown in Fig. 24(b). Implementation details and optimizations of this step are given in Appendix D and Appendix E, but we provide the key features here.
Data qubits are prepared in |+⟩, then Z stabilizers are measured with the shortest circuit that uses a single ancilla. If the syndrome extracted for a 2D color code around a yellow vertex is valid, meaning there is an X operator that sets all stabilizers to +1 outcomes, then such a fix is applied; otherwise the preparation is repeated. If the second iteration also yields an invalid syndrome, one randomly selected outcome is flipped, producing a valid syndrome which is then fixed. The interior preparation step is started sufficiently early so that any repeated preparations have finished by the time the preparation of the two 2D color code patches completes. Note that since the 2D spherical color code encodes no logical qubits, it is not necessary to repeat measurements, in contrast to the preparation of a logical state in the 2D color code patch, which requires a number of QEC cycles proportional to d.
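The repeat-and-flip syndrome-fixing logic can be sketched as follows. The parity-check matrix `H` below is a small stand-in, not the actual spherical-code stabilizers, and the sketch assumes (as for the spherical codes, whose only constraint is an overall parity) that flipping outcomes eventually restores validity:

```python
import numpy as np

def solve_gf2(H, s):
    """Gaussian elimination over GF(2): return x with H x = s (mod 2), or None
    if the syndrome s is invalid (no X-type fix with those outcomes exists)."""
    H = (H % 2).astype(np.uint8)
    s = (s % 2).astype(np.uint8)
    n_rows, n_cols = H.shape
    pivots, r = [], 0
    for c in range(n_cols):
        rows = [i for i in range(r, n_rows) if H[i, c]]
        if not rows:
            continue
        H[[r, rows[0]]] = H[[rows[0], r]]
        s[[r, rows[0]]] = s[[rows[0], r]]
        for i in range(n_rows):
            if i != r and H[i, c]:
                H[i] ^= H[r]
                s[i] ^= s[r]
        pivots.append(c)
        r += 1
    if s[r:].any():              # inconsistent row: invalid syndrome
        return None
    x = np.zeros(n_cols, dtype=np.uint8)
    for i, c in enumerate(pivots):
        x[c] = s[i]
    return x

def prepare_sphere(H, measure, rng):
    """Repeat-and-flip strategy: re-measure once on an invalid syndrome; if the
    second attempt is also invalid, flip random outcomes until validity."""
    fix = solve_gf2(H, measure())
    if fix is None:
        s = measure()            # repeat strategy: one extra preparation
        fix = solve_gf2(H, s)
        while fix is None:       # flip strategy
            s[rng.integers(len(s))] ^= 1
            fix = solve_gf2(H, s)
    return fix
```

For the toy `H` below, syndromes of odd total parity are invalid, and a single flip always restores validity, mirroring the behavior described in the text.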
Optimistic improvements. One may hope to improve the preparation of the 2D spherical color codes. Although we found a schedule of the shortest length to measure the stabilizers, there could be many others of the same length which lead to fewer errors. Moreover, it may be possible to exploit the unused ancillas associated with the RG, RB and GB edges, or to consider collective rather than independent preparations of the 2D color codes. No matter how good such a preparation protocol is, it would require at least three time units, as each qubit appears in three Z-type stabilizer generators. Therefore, we use three idle time units with circuit noise of strength p to bound potential improvements of this step in Fig. 29(b).

Measuring gauge operators
To change gauge to the 3D stabilizer code, the Z-edges of color in K = {RG, RB, GB} are measured. Implementation details and optimizations of this step are given in Appendix F. We find a minimal length circuit to measure the gauge operators in parallel with a single ancilla qubit per edge. Including ancilla preparation and measurement, this circuit requires 8 time units, although the preparation can be done during the last time unit of the interior preparation step.
Since we choose to measure only the Z-edge operators of color in K, rather than all Z-edge operators, we only learn the restricted noisy gauge flux γ̃_K. We emphasize that, due to faults in the measurement process, γ̃_K is likely to violate the Gauss law and to differ from the restricted gauge flux γ_K in the system.
Optimistic improvements. It seems difficult to reduce the errors introduced by this step without significantly increasing the space or time overhead, for example by using verified cat states to measure each edge operator. We neglect any such space or time overhead in our estimate, so that we bound the effect of potential improvements. As each data qubit is involved in three edge measurements, a minimum of three time units would be required to implement them. Therefore, we use three idle time units with circuit noise of strength p in our optimistic simulation in Fig. 29(c).

Gauge fixing
This step corresponds to a classical algorithm which takes as its input the restricted noisy gauge flux γ̃_K ⊆ ∆_1(L^K_3D) corresponding to some subset of edges of color in K = {RG, RB, GB}, and outputs some X-type gauge operator aiming to fix γ̃_K. The algorithm proceeds as follows.
(i) Validation of the noisy flux: for each color K ∈ K we validate the noisy restricted flux γ̃_K by matching its violation points within the restricted lattice L^K_3D. We denote by λ_K ⊆ ∆_1(L^K_3D) the set of edges used in the pairing of the violation points of γ̃_K. Then γ̃ = ∑_{K∈K} (γ̃_K + λ_K) is the validated flux. Note that γ̃ ⊆ ∆_1(L^K_3D).
(ii) Gauge-fixing from the validated flux: we find an X-gauge operator ∏_{e∈f(γ̃)} X(e) consistent with γ̃ and apply it. Such an X-gauge operator can be found efficiently by, e.g., Gaussian elimination.
In this step, we treat the restricted noisy gauge flux γ̃_K as arising from a gauge flux with no branching points, together with some measurement errors; see Fig. 30. In other words, we want to find a gauge flux γ̃ corresponding to some X-gauge operator just by looking at the restricted noisy gauge flux γ̃_K. However, the gauge flux can in fact have branching points due to X errors in the system. In such a case, the restricted noisy gauge flux does not satisfy the Gauss law even in the absence of measurement errors, as we omit the Z-type operators associated with RY, BY and GY edges. For every K ∈ K we choose to independently pair up all the violation points of γ̃_K within the restricted lattice L^K_3D, which in turn allows us to find the desired γ̃.

Optimistic improvements. One can ask how close to optimal the performance of the algorithm is. For instance, it may be possible not only to estimate a gauge fix from the noisy Z-flux measurement, but also to correct X errors in the system; see Fig. 30. To bound the impact of any potential improvements of this step in our simulation, we assume that all X errors in the system prior to the gauge measurements are corrected, and that the gauge itself is identified exactly; see Fig. 29(d). The only X error retained in this bound is therefore that introduced by the noisy gauge measurement circuits themselves.
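The pairing of violation points in step (i) can be illustrated with an exhaustive minimum-weight perfect matching. This is a toy sketch: it is only viable for small point sets (a production decoder would use an efficient matching algorithm), and the Manhattan metric below merely stands in for path lengths in the restricted lattice:

```python
def min_weight_pairing(points, dist):
    """Exhaustive minimum-weight perfect matching of an even-sized point set."""
    assert len(points) % 2 == 0, "violation points come in pairs (charge neutrality)"
    best_pairs, best_cost = None, float("inf")

    def rec(remaining, cost, pairs):
        nonlocal best_pairs, best_cost
        if not remaining:
            if cost < best_cost:
                best_pairs, best_cost = list(pairs), cost
            return
        a, rest = remaining[0], remaining[1:]
        for k, b in enumerate(rest):
            # Pair point a with point b, then recurse on the remainder
            rec(rest[:k] + rest[k + 1:],
                cost + dist(points[a], points[b]),
                pairs + [(a, b)])

    rec(tuple(range(len(points))), 0.0, [])
    return best_pairs, best_cost

manhattan = lambda u, v: sum(abs(x - y) for x, y in zip(u, v))
```

For two nearby pairs of violation points, the matching correctly pairs each point with its close partner rather than across the lattice.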

Applying T and measuring the 3D code's data qubits
In this step, we first apply the T gate to each data qubit, and then measure it in the X basis. These can be combined into a single operation requiring one time unit. Despite being one of the simplest code switching steps to implement, it is quite challenging to simulate efficiently due to the large number of T gates. In what follows we describe how we exploit some additional structure and assumptions that render the simulation tractable.
Let the X-type residual error before the application of the logical T gate be supported on ρ_X ⊆ ∆_3(L_3D); similarly, let ρ_Z denote the support of the Z-type residual error. Note that if we could perfectly fix the gauge in the previous step and there were no additional X errors, then there would be no residual X error. Generically, there is some nontrivial residual X error, which, unlike the Z error, does not commute with the logical T gate. The effect of the residual X error on the measurement outcomes amounts to, roughly speaking, flipping some outcomes along the Z-gauge flux γ = ∂_{3,1} ρ_X of the residual X error ρ_X. This is made precise in the following lemma.
Lemma V.1. Let ρ_X, ρ_Z ⊆ ∆_3(L_3D) be residual X- and Z-errors. Let γ = ∂_{3,1} ρ_X be the Z-gauge flux and γ = ∑_{j=1}^{b} λ_j be a decomposition of γ into its linked components λ_j. If T is applied to every data qubit and the X stabilizers are perfectly measured, then one obtains a syndrome where each excitation configuration σ_j ∈ Σ(λ_j) is chosen uniformly at random from Σ(λ_j).
In the above, we use the notation introduced in Section IV B. To simulate this step, one can generate Pauli operators resulting in the same distribution of syndromes. Note that there is a possibility of introducing a logical Z operator at this step, which we ignore in our simulation, leading to an underestimate of the failure probability of code switching. For more details and a justification of this lemma, see Ref. [99].
In our simulation we make the additional simplifying assumption of sampling σ_j uniformly from the collection of excitation configurations without the linking charge, Σ'(λ_j), rather than from the collection of excitation configurations Σ(λ_j). Recall that Σ'(λ_j) ⊆ Σ(λ_j). This assumption amounts to ignoring the linking charge, and we do not expect it to substantially affect the performance. We generate the random Z-error τ_Z ⊆ ∆_3(L_3D) as τ_Z = ∑_{v∈∆_0(γ)} τ(γ|_v), where τ(γ|_v) is a local sampling procedure, which we now describe in detail. Let v ∈ ∆_0(γ) be a vertex of color A incident to the flux γ = ∂_{3,1} ρ_X, and let γ|_v be the restriction of γ to the edges incident to v. Let K' = {R, G, B, Y} \ {A} denote the set of the three remaining colors. We find a subset τ(γ|_v) ⊆ ∂_{0,3} v of tetrahedra containing v as follows.
(ii) For each K ∈ K', if γ|^{AK}_v contains an even number of edges, then choose uniformly at random a subset E_AK ⊆ γ|^{AK}_v of even cardinality; otherwise choose uniformly at random a subset E_AK ⊆ γ|^{AK}_v of odd cardinality.
(iii) Find a subset of tetrahedra τ(γ|_v) whose 1-boundary locally matches ∑_{K∈K'} E_AK; see Eq. (39).

We now explain why this algorithm produces τ_Z ⊆ ∆_3(L_3D) with the correct syndrome distribution. First, any excitation configuration σ_j ∈ Σ'(λ_j) can be viewed as a sum of local excitation configurations with neutral total charge. If the vertex v is in ∆_0(λ_j), then step (ii) is equivalent to randomly selecting a local excitation configuration for λ_j|_v, which is created by operators supported within the neighborhood ∂_{0,3} v of v and with neutral total charge. Since each local excitation configuration is selected with equal probability, the resulting excitation configuration σ_j is chosen uniformly at random from Σ'(λ_j). Also note that the search in (iii) can be implemented by exhaustively checking which of the possible subsets of the tetrahedra in ∂_{0,3} v satisfies Eq. (39). This naive implementation is nevertheless efficient since |∂_{0,3} v| is bounded. Lastly, we remark that the residual Z-error present in the system right before the measurement in the X basis is ρ_Z + τ_Z. We use this Z-error in the following code switching step.

6. Decoding Z errors in the 3D color code

In the final step of the protocol, a classical decoding algorithm is run to correct Z errors for the 3D stabilizer color code. The input to the decoder is the set of X measurement outcomes for each data qubit from the previous step. The output of the decoder is a Z-type Pauli correction which, if applied, would flip all the syndromes computed from the single-qubit X outcomes. At that point one can reliably read off the logical X measurement outcome m = ±1. Then, the state encoded in the remaining 2D color code from step 1 is (|0⟩_2D + m e^{iπ/4}|1⟩_2D)/√2, which is the encoded T state up to multiplication by Z for m = −1.
We implement a decoder based on the restriction decoder from Ref. [31], but modified in two ways. The first modification is to adapt the decoder to the 3D stabilizer color code on the lattice L 3D which has boundaries. The second modification is to improve performance. We do this by favoring low-weight corrections (which differ by a stabilizer from the output of the original decoder). Running four independent versions of this decoder in parallel, we then simply select the lowest-weight correction from the four candidates. We describe and analyze the performance of this modified restriction decoder in Section IV C.
Optimistic improvements. A different decoder may improve the performance of code switching. It is difficult to give as rigorous a bound on the performance of this step as we were able to give for the previous steps. However, by assuming that the best decoder performs as if the phase-flip Z noise were iid, we estimate the effect of any potential performance improvements using the following steps:

(i) Run the modified restriction decoder; if it succeeds, then we assume the improved version would also succeed, and if it fails, then continue to the next step.
(ii) Let w be the minimum of the weight of the error and the weight of the correction produced by the modified restriction decoder, and let n be the number of data qubits in the distance-d 3D stabilizer color code. Choose uniformly at random a real number from 0 to 1; if it falls below a cutoff determined by w, n and the threshold p* (specified in Appendix G), then the correction fails; otherwise, the correction succeeds.
Note that p* ≈ 1.9% is the known threshold of the optimal decoder for the 3D stabilizer color code under iid Z noise [95]. We provide more detailed justification for this estimate in Appendix G. The impact of potential improvements of this step on code switching is shown in Fig. 29(e).

B. Code switching overhead
To calculate the space and time overhead required to produce an encoded T state with failure probability p_fin, we first find the minimum distance d(p_fin) which achieves this by extrapolating the data in Fig. 29(a). The time to implement the code switching protocol at distance d is simply 8(d + 1), which is the time for (d + 1) QEC cycles of the 2D color code to complete. The number of qubits to implement the code switching protocol at distance d is

N_CS(d) = 2 N_2D(d) + d(d² + 1)/2 − (1 + 3d²)/4 + (d − 1)(7d² + 10d + 15)/12 − 3(d + 1)(d − 1)/8.

This comprises two 2D color code patches (each of which has N_2D(d) = (3d² − 1)/2 qubits from Eq. (12)) and the 3D interior (which has d(d² + 1)/2 − (1 + 3d²)/4 data qubits and (d − 1)(7d² + 10d + 15)/12 − 3(d + 1)(d − 1)/8 measurement qubits). Details of the lattices which make clear where these numbers originate can be found in Appendix A.
In Fig. 29 we fit a special form of the ansatz in Eq. (7), i.e., p_fail = A (p/p*_CS)^(Cd+D), to the data up to the crossing. Setting the failure probability to be at most the target infidelity, i.e., p_fail ≤ p_fin, and solving for d, we obtain

d ≥ (1/C) [log(p_fin/A) / log(p/p*_CS) − D],

where we round the right-hand side up to the closest odd integer. Finally, in Fig. 31 we substitute this into N_CS(d) and plot the qubit overhead and the space-time overhead as a function of p_fin for various values of p in two cases: assuming no improvements, and assuming all optimistic improvements can be achieved.
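The overhead bookkeeping above can be sketched as follows. The qubit counts implement the formulas in the text; the fit constants A, C, D and p*_CS in the usage example are placeholders, not the fitted values from Fig. 29:

```python
import math

def n_2d(d):
    """Qubit count of a distance-d 2D color code patch, N_2D(d) = (3d^2 - 1)/2."""
    return (3 * d * d - 1) // 2

def n_cs(d):
    """Total qubit count of code switching at distance d (d odd): two 2D patches
    plus the 3D interior's data and measurement qubits."""
    data_3d = d * (d * d + 1) // 2 - (1 + 3 * d * d) // 4
    meas_3d = (d - 1) * (7 * d * d + 10 * d + 15) // 12 - 3 * (d + 1) * (d - 1) // 8
    return 2 * n_2d(d) + data_3d + meas_3d

def distance_needed(p, p_fin, A, C, D, p_star):
    """Smallest odd d with A (p/p_star)^(C d + D) <= p_fin (from the fit ansatz).
    Note log(p/p_star) < 0 below threshold, which flips the inequality."""
    d = math.ceil((math.log(p_fin / A) / math.log(p / p_star) - D) / C)
    return d if d % 2 == 1 else d + 1

# Illustrative placeholder fit parameters (NOT the paper's fitted values):
d = distance_needed(p=1e-4, p_fin=1e-15, A=0.1, C=0.5, D=0.0, p_star=7e-4)
space = n_cs(d)                    # qubit overhead
space_time = space * 8 * (d + 1)   # qubits x time units for (d + 1) QEC cycles
```

With real fit parameters in place of the placeholders, sweeping p_fin reproduces the kind of overhead curves shown in Fig. 31.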

VI. DISCUSSION AND ACKNOWLEDGMENTS
In this work, we have simulated concrete realizations of state distillation and code switching under circuit noise and compared the overhead that each requires to produce high-fidelity T states encoded in 2D color codes. We have focused on regimes of practical interest, with physical error rates from 10⁻⁴ to 10⁻³ and target logical error rates from 10⁻²⁰ to 10⁻⁴; see Fig. 2. For error rates of 10⁻³ and above, our implementation of code switching is not viable since its observed threshold of 0.07(1)% is considerably lower than the 0.37(1)% threshold of state distillation. For error rates near 5 · 10⁻⁴, our implementation of code switching becomes viable, but requires considerably more overhead than state distillation. However, for error rates around 10⁻⁴, our code switching implementation begins to slightly outperform that of state distillation. Lastly, we see in Fig. 2 that even a highly optimistic implementation of code switching (which may be very difficult if not impossible to achieve) would not provide substantial savings over our simple implementation of state distillation in the studied regimes.
We now discuss the scaling of the space overhead O_S* and space-time overhead O_ST* of code switching and state distillation for small physical error rate, i.e., p ≪ 1. We further assume that log p_fin / log p ≫ 1. As we show in Appendix H, we have

O_S* = c_S* (log 1/p_fin)^Γ*, O_ST* = c_ST* (log 1/p_fin)^Γ*, (42)

where c_S* and c_ST* are some constants, Γ_CS = 3 for code switching and Γ_SD = max(2, log_F R) for state distillation. Here, R is the ratio of the number of input to output magic states and F is the order of error suppression in a single distillation round. The change in the value of Γ_SD can be understood as follows: if log_F R > 2, then the overhead is dominated by the initial distillation round; otherwise, it is dominated by the final round. For the 15-to-1 distillation scheme we have R = 15 and F = 3, leading to log_F R ≈ 2.46 and Γ_SD < Γ_CS. From Eq. (42) we thus conclude that for log p_fin / log p ≫ 1 state distillation will always outperform code switching. On the other hand, if we are willing to extrapolate Eq. (42) to more modest values of log p_fin / log p, the overhead of code switching is predicted to drop below that of state distillation. The precise value of the crossover is highly sensitive to the implementation details, but this could explain why we observe code switching outperforming state distillation for low p and modest p_fin in Fig. 2.
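The exponent comparison for the 15-to-1 scheme is a one-liner:

```python
import math

R, F = 15, 3                       # 15-to-1 distillation: 15 inputs, error p -> O(p^3)
gamma_sd = max(2, math.log(R, F))  # ~ 2.46: first round dominates since log_3 15 > 2
gamma_cs = 3                       # code switching: the 3D interior scales as d^3
assert gamma_sd < gamma_cs         # asymptotically, state distillation wins
```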
Our findings point toward a number of future research directions. First, it could be fruitful to explore the potential of code switching in the regime of very low error rates, which may be achievable with trapped ion qubits [101] and topological qubits [38]. Second, improving the 2D color code state preparation procedure, syndrome extraction, and decoding algorithms would further reduce the overhead of both code switching and state distillation. We expect that code switching may also greatly benefit from better 3D color code decoders. Third, our code switching protocol produces the T state, but it is possible to directly apply the T gate rather than injecting it, which may allow one to implement code switching in constant time rather than in a time which scales with the code distance d. Fourth, we have seen that, in contradiction with the commonly held belief that a distillation scheme's overhead is dominated by that of the first round, ∼ (log 1/p_fin)^{log_F R}, the last-round overhead is ∼ (log 1/p_fin)² when implemented with 2D codes, and dominates when log_F R < 2. This suggests that searches for and optimizations of new distillation schemes should not be restricted to those with very low log_F R.
We would like to thank Achiamar Lee Rivera for her unwavering support throughout this research. A.K. is deeply indebted to Héctor Bombín for many colorful discussions. A.K. acknowledges funding provided by the Simons Foundation through the "It from Qubit" Collaboration. Research at Perimeter Institute is supported in part by the Government of Canada through the Department of Innovation, Science and Economic Development Canada and by the Province of Ontario through the Ministry of Colleges and Universities. This work was completed prior to A.K. joining AWS Center for Quantum Computing.

A. Lattice parameters and specifications
Here we provide the parameters of the lattices which are used throughout the paper. Table II and Table III show the parameters of the direct lattice and the dual lattice used to define the 2D and 3D color codes of distance d. In general, these lattices can be obtained by tessellating a D-sphere with D-simplices and then removing one vertex from that tessellation, as well as all the simplices containing that vertex [56,79].

TABLE III. The lattice L_3D, which we use to define the 3D subsystem and stabilizer color codes of distance d, consists of (d − 1)(d + 1)(d + 3)/12 + 4 vertices, (d − 1)(7d² + 10d + 15)/12 + 6 edges, d³ + d + 2 faces, and (d³ + d)/2 tetrahedra. We group vertices and edges of L_3D into, respectively, five and three different categories depending on the number of tetrahedra incident to them. Note that qubits are placed on tetrahedra, whereas stabilizer and gauge generators are identified with vertices and edges of L_3D.

B. Supplementary details for 2D color code simulations
Here we provide additional data for the analysis of the performance of the 2D color code using the faulty-measurement projection decoder under phenomenological noise and circuit noise in Section II. In Fig. 32 we show the performance under phenomenological noise, which we use to produce the long-time pseudo-threshold plot in Fig. 11(d) in the main text. Similarly, in Fig. 33 we show the performance under circuit noise, which we use to produce the long-time pseudo-threshold in Fig. 13(b) in the main text. We point out that the decay parameter in the fit of the time-dependent pseudo-threshold in Fig. 32(b) and Fig. 33(b) shows that convergence to the long-time pseudo-threshold seems not to depend significantly on the system size. In Fig. 34 we provide the data used to produce the crossings in Fig. 13(b). In Fig. 35 we show the impact of optimizing the stabilizer extraction circuit on the performance of error correction under circuit noise. In Fig. 36 we show additional data from which we extract the logical failure rates for p = 0.001, 0.0001 in Fig. 16(b) in the main text.

FIG. 32. The failure probability of n_cyc QEC cycles for phenomenological noise of strength p given a perfect initial state and final QEC cycle, for various distances d. We estimate the time-dependent pseudo-threshold p*_phe(d, n_cyc) as an intersection of quadratic fits (solid lines) with the corresponding physical qubit error probability p_phy(n_cyc) as defined in Eq. (5) (dashed curves). (a) The time-dependent pseudo-threshold p*_phe(d, n_cyc) as a function of n_cyc. We estimate the long-time pseudo-thresholds p*_phe(d) by fitting p*_phe(d, n_cyc) with the ansatz in Eq. (11) and in (b) we plot the decay parameter γ. We observe that γ stabilizes rapidly with increasing d, confirming that the residual noise reaches equilibrium over a time which is independent of system size.

FIG. 33. (a) The time-dependent pseudo-threshold p*_cir(d, n_cyc) as a function of n_cyc. We estimate the long-time pseudo-thresholds p*_cir(d) by fitting p*_cir(d, n_cyc) with the ansatz in Eq. (11) and in (b) we plot the decay parameter γ. We observe that γ stabilizes rapidly with increasing d, confirming that the residual noise reaches equilibrium over a time which is independent of system size.

FIG. 34. The failure probability p_fail of the noisy-syndrome projection decoder for n_cyc = d rounds of circuit noise and various distances d. We use this data to find the failure curve crossings p×_cir(d), which we plot in Fig. 13(b).

FIG. 35. Performance comparison of (a) the best ranked, and (b) the worst ranked length-7 CNOT schedules from Fig. 13(a), namely {4, 1, 2, 3, 6, 5; 3, 2, 5, 6, 7, 4} and {4, 1, 2, 7, 6, 3; 1, 6, 7, 4, 5, 2}. We numerically estimate the failure probability of n_cyc QEC cycles for circuit noise of strength p for d = 13 given a perfect initial state and final QEC cycle. We fit this data with quadratic functions (solid lines), and also show the physical qubit error probability p_phy(n_cyc) (dashed curves). We estimate the pseudo-threshold p*_cir(d, n_cyc), which we plot in sub-figure (c), by identifying the intersection of the solid and dashed curves for a given d, p and n_cyc. We see that the long-time pseudo-threshold for the good circuit is 0.193(2)% (blue, dashed), considerably larger than that of 0.126(3)% for the bad circuit (yellow, dashed).

C. Initializing the T state in the 2D color code

Now we elaborate on the procedure to initialize a T state in a distance-d color code described in Section III A, and explain some structure that we exploit to simplify the noise analysis. First note that even in the absence of noise, the stabilizers which are measured in the initialization protocol have uncertain outcomes because the state is not a code state of the 2D color code at the start of the protocol. Let A and B be the two sets of qubits prepared in |0⟩ and |+⟩, respectively; see Fig. 20(a).
The protocol involves taking the observed syndrome σ and producing a Pauli operator P(σ) = P_X P_Z, where P_X and P_Z are Pauli X and Z operators supported on the qubits in B and A, respectively. The protocol rejects if no such fix exists or if the syndromes from the consecutive rounds of QEC differ. Note that the fix is unique up to a stabilizer, since neither of the disjoint sets A and B supports a logical operator. This implies that for any two syndromes σ and σ′ the Pauli operators P(σ)P(σ′) and P(σ + σ′) are equivalent up to a stabilizer.
To faithfully simulate this protocol in the presence of noise, one can consider the scenario in which a complete set of perfect stabilizer measurements is performed (and their outcomes discarded) before the QEC circuits are applied. This allows us to begin the simulation with a state, which is an eigenstate of the stabilizer group. The simulation can therefore be implemented as follows.
1. Start with a perfect code state.
2. With some probability Pr(σ), apply a Pauli operator P(σ) = P_X(σ)P_Z(σ), where P_X(σ) and P_Z(σ) are Pauli X and Z operators supported within B and A, respectively.

3. Run the noisy circuits explicitly as described in Section I B. A set of faults will propagate to some Pauli error E on the data qubits at the end of the protocol. The faults could result in the syndromes from the two QEC cycles disagreeing, in which case the protocol is rejected, or in both QEC rounds having the same observed syndrome σ̃, possibly different from the syndrome σ of P. Note that σ + σ̃ can only depend on the set of faults, and is independent of σ.

4. Apply a Pauli operator P(σ̃) = P_X(σ̃)P_Z(σ̃) with the syndrome σ̃, where P_X(σ̃) and P_Z(σ̃) are Pauli X and Z operators supported on the qubits in B and A, or reject if such an operator cannot be found.

5. If the net operator E P(σ)P(σ̃) is correctable, then we say that the protocol succeeds; otherwise we say it fails.

D. Preparing the 3D interior for code switching
Here we consider the 3D interior of the initial state for code switching discussed in Section V A 2, which consists of a (trivial) 2D color code state on the qubits surrounding each interior Y vertex. Each such set of qubits can be viewed as a 2D sphere; see Fig. 37. The state of each sphere is formed by preparing all data qubits in the |+⟩ state, then measuring the Z stabilizers, each with an ancilla measurement qubit in a single QEC cycle, followed by the application of an appropriate X Pauli to fix incorrect stabilizer outcomes.
CNOT circuits.-For each of the four types of spheres, we find an ordering of CNOTs which allows for a minimal-length circuit to measure the stabilizers; see Table V. These sequences were found using a greedy algorithm with a random initial sequence, minimizing a cost function given by the number of unsatisfied constraints (we require that no qubit is involved in more than one gate per time unit) minus the average time unit at which each gate occurs. The greedy algorithm terminates when all constraints are satisfied and the cost function cannot be further reduced. We fixed the total number of time units to be minimal (4 CNOT time units for the 8-qubit sphere, which involves only weight-4 stabilizers, and 6 CNOT time units for the 12-, 18- and 24-qubit spheres, which involve weight-6 stabilizers). We maximize the average time unit at which each gate occurs since this removes idle time units by not preparing ancillas for lower-weight stabilizer measurements until they are needed. The sequences we found are optimal with regard to this cost function: all constraints were satisfied using the minimal number of time units, and all weight-r stabilizer generators have their CNOTs in the last r time units of each schedule. There could be many schedules which are optimal in this sense, but we did not attempt to search among those for schedules which perform better.
To make the representation of the schedules slightly more compact, we specify 17 vertex locations in a list v_list in lexicographical order as follows.

FIG. 38. Overall code switching performance of four syndrome-fixing approaches for the preparation of 2D spherical color codes at distance d = 9. Each approach either uses or does not use the repeat (R) and flip (F) strategies. In the approach which uses both strategies, we apply the repeat strategy first, and the flip strategy if the repeat also yields an invalid syndrome. In each case, the stabilizers are measured with the CNOT circuits described in Table V. Both the repeat and flip strategies improve performance.
The central yellow vertex is at (0, 0, 0) and is the eighth entry in v_list. To apply the CNOT schedules we specify in Table V, one must map the neighborhood of every vertex onto one of these four standard configurations. All interior vertices lie precisely on these standard coordinates relative to their central yellow vertex. Boundary vertices connected to their central yellow vertex will not lie on the standard coordinates in v_list, but can be identified and shifted to the standard coordinate uniquely, since there is at most one boundary vertex of each color in the neighborhood of any yellow vertex.
Syndrome fixing.-In the absence of errors, the syndrome readout of the Z-stabilizer measurements for each spherical 2D color code (which has random outcomes) determines the X-type gauge operator that 'fixes' the outcomes to be +1. Since the spherical 2D color code encodes no logical qubits, any X-type operator with the correct syndrome will suffice, and one can be found with Gaussian elimination. However, faults in the measurement circuits can render the syndrome invalid, in the sense that no gauge operator has the observed syndrome. We consider two strategies to deal with this. Firstly, the repeat strategy: if the observed syndrome is invalid, we repeat the preparation once. Secondly, the flip strategy: if the observed syndrome is invalid, we flip a bit of the syndrome to produce a valid syndrome, and then apply a fixing operator. We test the impact of these approaches on the failure probability of code switching using a distance-9 code in Fig. 38. Note that when using the repeat strategy, if the first round is successful, the qubits are left idle in the prepared state to allow a second round to be applied on spheres that failed the first round. In practice we find that these idle rounds have little impact on performance in the regime of interest. We see that both the repeat and flip strategies improve performance, and so we incorporate both into the optimized code switching protocol, with all other steps as described in Section V.
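Finding an X-type fixing operator with a prescribed Z-syndrome is a linear-algebra problem over GF(2). A minimal sketch of the Gaussian-elimination step (our own illustrative implementation, not the authors' code; each row of A gives the support of one Z stabilizer, and an inconsistent system corresponds to an invalid syndrome):

```python
def solve_gf2(A, s):
    """Solve A x = s over GF(2), where A is a list of 0/1 rows and s a 0/1 vector.
    Returns a solution x (the support of an X-type fixing operator),
    or None if the syndrome s is invalid (no such operator exists)."""
    rows = [row[:] + [b] for row, b in zip(A, s)]  # augmented matrix [A | s]
    n = len(A[0])
    pivots, r = [], 0
    for c in range(n):
        pr = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        for i in range(len(rows)):       # clear column c everywhere else
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[-1] for row in rows[r:]):  # nonzero RHS in a zero row: invalid syndrome
        return None
    x = [0] * n
    for i, c in enumerate(pivots):
        x[c] = rows[i][-1]
    return x
```

In this picture, the flip strategy amounts to toggling one entry of s until `solve_gf2` returns a solution, while the repeat strategy re-runs the preparation to obtain a fresh s.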

E. Choices of basis for code switching preparation
Here we analyze the effect of the choice of basis in the first two steps of the code switching protocol described in Section V A 1 and Section V A 2. The performance may depend on this choice, as it can lead to an asymmetry between X- and Z-type noise, while later steps of the protocol do not treat X and Z errors in the same way.
The first choice concerns the preparation of the Bell state in the 2D color code patches in Section V A 1. We fault-tolerantly prepare the first patch in |+⟩, and the second in |0⟩, before applying a transversal CNOT from the first patch to the second. Since the CNOT copies X noise from the first patch to the second, and Z noise from the second to the first, we can expect that after a faulty preparation the first patch has more Z noise and the second patch more X noise. We can choose whether the first or the second patch is the one involved in code switching, options we refer to as 2D patch |+⟩ and 2D patch |0⟩ respectively.
The second choice concerns the preparation of the 2D spherical color code in its unique code state around each interior yellow vertex, as described in Section V A 2. We can do this either by preparing each data qubit in |0⟩ and then measuring the X stabilizers, or by preparing each data qubit in |+⟩ and then measuring the Z stabilizers. We refer to these cases as 3D interior |0⟩ and 3D interior |+⟩, respectively, and compare their impact, along with the choice for the patch preparation, in Fig. 39. We find very little difference in the performance of these choices, and we select 2D patch |+⟩ and 3D interior |+⟩ for use in the optimized version of the protocol.

F. Gauge measurement circuits for code switching
Here we discuss the circuits used to measure the Z-type gauge generators corresponding to RG, RB and GB edges, used in Section V A 3 for code switching. We find a minimum-length circuit with a single ancilla, which needs 8 time units (including preparation and measurement). A more naive circuit takes 20 time units: prepare the ancillas in the first time unit, sequentially apply the CNOTs associated with RG, RB and GB edges in time units 2-7, 8-13 and 14-19 respectively, and finally measure the ancillas in time unit 20.
To specify the CNOT sequences, we separate the edges (measurement qubits) into 20 types by color and orientation, and identify the tetrahedra containing them (data qubits) in a systematic way in terms of their spatial position relative to the edge. The CNOT sequence is then specified once for each edge type. To specify the neighborhood of each edge, we label every qubit in its support with a number from 1 to 6, in such a way that the local labellings of all edges of the same type are consistent.
For completeness, we now specify our conventions for labelling the circuits. To label the tetrahedra around an edge with two interior vertices, we specify a local coordinate system. The z basis vector for an RG, GB or RB edge is the normalized vector v_edge from the first to the second vertex. The x basis vector is the normalized vector parallel to v_irr = (1, π, √2) but perpendicular to v_edge; the direction v_irr is guaranteed not to be parallel to any edge. The y basis vector is then specified uniquely to form a right-handed coordinate system (x, y, z). The tetrahedra around the edge are then labelled in sequence according to the azimuthal angle of their mid-point in this coordinate system. For edges which contain a boundary vertex, the same applies, but the vector v_edge is specified according to Table VI. In the bulk, the tetrahedra are positioned at regular azimuthal angles for each edge type; at the boundary we identify each tetrahedron with whichever standard angle it is closest to. When tetrahedra are 'missing' around an edge due to cuts along the boundary, they are simply skipped in the CNOT sequence, with no CNOT applied during the assigned time unit for that edge.
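The azimuthal labelling convention above can be made concrete as follows. This is a sketch under our own data layout (midpoints are given relative to the edge centre; function and variable names are ours); it builds the right-handed frame with z along the edge and x along the component of v_irr perpendicular to it, then sorts tetrahedron midpoints by azimuthal angle.

```python
import math

V_IRR = (1.0, math.pi, math.sqrt(2.0))  # irrational direction, parallel to no edge

def _norm(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def azimuthal_order(v_edge, midpoints):
    """Sort tetrahedron midpoints (relative to the edge centre) by azimuthal
    angle in the right-handed frame (x, y, z): z along the edge, x along the
    component of V_IRR perpendicular to the edge, y = z x x."""
    z = _norm(v_edge)
    x = _norm(tuple(a - _dot(V_IRR, z) * b for a, b in zip(V_IRR, z)))
    y = _cross(z, x)
    return sorted(midpoints, key=lambda p: math.atan2(_dot(p, y), _dot(p, x)))
```

Because V_IRR is parallel to no edge, the perpendicular component defining x never vanishes, so the frame, and hence the labelling, is well defined for every edge type.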
The sequence in Table VI was found using a greedy algorithm starting with a random initial sequence and minimizing the cost function given by the number of unsatisfied constraints (we require that no qubit is involved in more than one gate per time unit) minus the average time unit of each gate. The result is the shortest possible measurement circuit using a single measurement qubit per edge, and it involves no idle time units for edge types of weight four.

TABLE VI. The measurement circuit specification for RG, RB and GB gauge operators. Edges connecting interior vertices (first 14 rows) are labelled by the vertex colors (first column) and the vector connecting them (second column). Edges connecting a boundary vertex are labelled by that boundary vertex v_R, v_G or v_B and the color of the interior vertex. Qubits around an edge are labelled in order of increasing azimuthal angle, and the column in which each data qubit appears specifies the time unit in which a CNOT occurs from that data qubit to the corresponding measurement qubit for the edge.
To simulate the gauge measurements, we first apply an X-type gauge operator g_X supported on randomly selected RB, GB and RG edges. Then the specified circuits are applied and produce the outcomes γ, which in the absence of error would correspond to precisely the Z-edges which anti-commute with g_X.
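In a Pauli-frame picture, the ideal outcomes can be computed from support-overlap parities: a Z-edge generator anti-commutes with the X operator exactly when their supports overlap on an odd number of qubits. A minimal sketch with our own data layout (supports as sets of qubit indices; names are ours):

```python
def anticommuting_edges(z_supports, gx_support):
    """Return the indices of Z-edge generators that anti-commute with the
    X-type operator g_X, i.e. those whose support overlaps gx_support on an
    odd number of qubits. These are the edges flagged in the ideal outcomes."""
    gx = set(gx_support)
    return [i for i, sup in enumerate(z_supports)
            if len(gx & set(sup)) % 2 == 1]
```

Comparing the simulated circuit outcomes against this ideal set isolates the effect of circuit faults in the gauge measurement step.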

G. Optimistic improvements for the 3D color code decoder
Here we justify our estimate of the impact on code switching performance of potential improvements to the 3D color code decoder, as used in Section V A 6. For any error E we define its stabilizer-reduced weight to be the minimum weight of sE over all stabilizers s. The estimate rests on the following assumptions.
1. We assume that the stabilizer-reduced weight of errors generated by code switching can be approximated by the minimum of the weight of the error itself and the weight of its correction produced by the modified restriction decoder.
2. The decoder does not take into account any correlations in the noise produced during the various steps of code switching, and its failure probability for errors with stabilizer-reduced weight w produced by code switching is no lower than that of the optimal decoder for iid Z errors of weight w.
3. The optimal decoder for iid Z noise, when applied to uniformly drawn weight-w errors, has failure probability satisfying p_fail(w) ≲ [w / (n p^(1)_3DCC)]^((d+1)/2), where n is the number of data qubits and p^(1)_3DCC ≈ 1.9% is the known optimal threshold of the 3D stabilizer color code for iid Z noise [95].
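The bound in the third assumption can be evaluated directly. A sketch (the functional form follows our reading of the reconstructed expression, with exponent (d+1)/2 as in the heuristic ansatz discussed below; the clip at 1 is our addition since the expression is only meaningful as a probability):

```python
def p_fail_bound(w, n, d, p_th=0.019):
    """Assumed failure-probability bound for the optimal iid-Z decoder on
    uniformly drawn weight-w errors: (w / (n * p_th)) ** ((d + 1) / 2),
    clipped at 1. Here p_th ~ 1.9% is the optimal 3D color code threshold."""
    return min(1.0, (w / (n * p_th)) ** ((d + 1) / 2))
```

The bound vanishes for w = 0, grows monotonically with w, and saturates at 1 once the effective error rate w/n reaches the threshold p_th.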
We use the stabilizer-reduced weight rather than the actual weight of an error as a proxy for how hard it is to correct. This is important since errors acting on code states are equivalent up to the application of a stabilizer. Moreover, a single fault in a stabilizer measurement circuit can propagate to a high-weight error which is equivalent, up to the stabilizer being measured by the circuit, to a low-weight error. The first assumption is then made to avoid needing to multiply the error by an exponential number of stabilizers to find its minimum-weight representative. This assumption is somewhat justified by the fact that the modified restriction decoder in Section IV C seeks to output a low-weight correction.
The second assumption is somewhat crude, since decoders which take correlations into account can outperform those which do not; see e.g. Ref. [28]. However, we suspect that the assumption holds for typical error patterns, since the decoder must correct uncorrelated weight-w errors in addition to correlated ones. Therefore, we believe it is unlikely that one can design a decoder which significantly outperforms the optimal decoder for iid Z noise in this setting. We gain further confidence in the first two assumptions by numerically testing the decoder on correlated and uncorrelated noise; see Fig. 41. There, we verify that the performance of the modified restriction decoder for noise produced by the code switching protocol with stabilizer-reduced weight w (estimated using the first assumption) is comparable to its performance for iid Z noise of weight w.
The third assumption is based on the heuristic behavior of the error correction failure rate p_fail in topological codes for error rate p in the vicinity of the threshold p* [35,77,78], i.e., that p_fail(p, d) ≈ (p/p*)^((d+1)/2), which is a special case of the ansatz in Eq. (7). To form an inequality in terms of the error weight w, we first take w as a proxy for the typical error weight ⟨w⟩, which for iid Z noise on n qubits satisfies ⟨w⟩ = pn. We then note that taking p* as a prefactor would correspond to the pseudo-threshold being independent of system size, thereby overestimating the error rate for most topological codes.

FIG. 41. Performance of the modified restriction decoder for the 3D color code at distance d = 9 for various error weights. Uncorrelated Z noise (blue) of weight w is generated by randomly selecting w of the n qubits and applying a Z error to each. Correlated Z noise (yellow) is generated by running the decoder at error rate p = 0.0003 and, for each error generated, estimating its stabilizer-reduced weight w by taking the smaller of the weight of the error itself and that of its correction. The correlated and uncorrelated cases are not identical but are very similar, justifying our assumptions.

H. Asymptotic overhead analysis
Here we derive the scaling of the space and space-time overheads of code switching between 2D and 3D codes and of state distillation using 2D codes, given in Eq. (42) in Section VI. We consider the regime with p ≪ 1 and log p_fin / log p ≫ 1, which allows us to assume that the code distances and the number of distillation rounds are large.
First, we consider code switching. We assume that the failure probability obeys the ansatz p_CS(p, d) ≈ (p/p*_CS)^((d+1)/2), where p*_CS denotes the threshold error rate of the protocol. In order to implement code switching we need (to leading order in d) N_3D d^3 qubits and τ_3D d time units. In the implementation presented in Section V, N_3D = 13/12 and τ_3D = 8. To produce a state of infidelity p_fin, we find the smallest d for which p_CS(p, d) < p_fin, namely d ≈ 2 log p_fin / log(p/p*_CS), and substitute it into the qubit and time costs to arrive at the expressions for the overheads.

Now we consider distillation. The starting point is the simplification that the failure probability of logical operations in the 2D color code is approximately captured by the phenomenological form p_2D(p, d) ≈ (p/p*_2D)^((d+1)/2). We consider the iterative application of an R-to-1 distillation scheme which (with perfect Clifford operations) reduces the error in a T state from q to Eq^F, using a Clifford circuit containing L locations and R_anc encoded ancilla qubits. In the case of the 15-to-1 scheme presented in Section III, R = 15, E = 35 and F = 3, and L is of the order of 10^3, depending on the precise details of the implementation and the effective noise model. We assume that the input to the first round is an encoded magic state in a code of fixed size d_0 with infidelity q_0 = Gp^H, where the initialization scheme presented in Section III A 1 has G = 6.07 and H = 1. We consider k rounds of distillation, with distances {d_1, d_2, ..., d_k}, and with the output of each round having infidelity {q_1, q_2, ..., q_k} satisfying q_{i+1} = Eq_i^F in the absence of Clifford noise. In this analysis, we assume that for each round the distance d_i is chosen such that Lp_2D(p, d_i) ∼ q_i, and neglect the effect of Clifford error. We can use the fact that the final round should output the target infidelity, i.e., q_k = p_fin, to relate q_i to p_fin as q_i = p_fin^α(k−i) / E^β(k−i), where α(j) = F^(−j) and β(j) = (1 − F^(−j))/(F − 1). We solve for the number of rounds k in q_0 = Gp^H = p_fin^α(k) / E^β(k), which yields

k = log_F [ (log E + (F − 1) log p_fin) / (log E + (F − 1) log(Gp^H)) ].
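The closed-form expression for the number of rounds k can be evaluated numerically. A sketch using the 15-to-1 parameters from the text (the ceiling is our addition, since in practice k must be an integer):

```python
import math

def num_rounds(p, p_fin, E=35, F=3, G=6.07, H=1):
    """Number of distillation rounds
    k = log_F[(log E + (F-1) log p_fin) / (log E + (F-1) log q0)],
    with input infidelity q0 = G * p**H, rounded up to an integer."""
    q0 = G * p ** H
    k = math.log((math.log(E) + (F - 1) * math.log(p_fin)) /
                 (math.log(E) + (F - 1) * math.log(q0)), F)
    return math.ceil(k)
```

For example, with p = 10^-3 this gives three rounds to reach p_fin = 10^-15 but only two to reach p_fin = 10^-7, consistent with directly iterating q_{i+1} = E q_i^F from q_0 = G p^H.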
The ratio of the number of qubits needed in round i + 1 and round i tends to F^2/R as p → 0 (which implies q_i → 0), since d_{i+1}/d_i → F. Note that this limiting ratio is independent of the round number i, implying that the overhead changes monotonically with i. Therefore, if log_F R > 2, the first round dominates, whereas if log_F R < 2 then the last round dominates. We can therefore conclude that the space overhead is dominated by either the first or the last round of distillation. The distances for the first and last rounds are d_1 ≈ 2 log(q_1/L)/log(p/p*_2D) and d_k ≈ 2 log(p_fin/L)/log(p/p*_2D). There are (R + R_anc)R^(k−1) logical qubits needed in the first round, and R + R_anc in the last. To leading order in d, there are N_2D d^2 qubits needed for a distance-d 2D color code, and τ_2D d time units, where N_2D = 3/2 and τ_2D = 8 in our implementation in Eq. (12). The time cost scales as the sum of the distances, which is between τ_2D d_k and kτ_2D d_k since d_k is the largest distance; see Eq. (23). However, we take the time cost to be simply τ_2D d_k (neglecting the prefactor k), since k depends only doubly logarithmically on p and p_fin. The space overhead is then given by the larger of the first-round cost (R + R_anc)R^(k−1) N_2D d_1^2 and the last-round cost (R + R_anc) N_2D d_k^2, and the space-time overhead by this space cost multiplied by the time cost τ_2D d_k. Assuming p ≪ 1 and log p_fin / log p ≫ 1, these expressions simplify to the scalings quoted in Eq. (42). We expect the asymptotic expressions derived here to apply quite generally for code switching from a 2D to a 3D code, and for state distillation using 2D codes.
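As an illustration of the per-round distance choice Lp_2D(p, d_i) ∼ q_i, the smallest sufficient distance can be computed from the phenomenological form. This is a sketch; the default threshold p*_2D = 0.37% and location count L = 10^3 are assumed placeholder values taken from elsewhere in the text, not fitted numbers, and we round up to an odd distance since color code distances are odd.

```python
import math

def round_distance(q, p, p_star=0.0037, L=1000):
    """Smallest odd distance d with L * (p / p_star)**((d + 1) / 2) < q,
    following the phenomenological form p_2D(p, d) ~ (p/p_star)**((d+1)/2)."""
    d = math.ceil(2 * math.log(q / L) / math.log(p / p_star) - 1)
    return d + 1 if d % 2 == 0 else d
```

Later rounds target smaller infidelities q_i and therefore require larger distances, which is the origin of the F^2/R ratio between consecutive rounds discussed above.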