Decoding Merged Color-Surface Codes and Finding Fault-Tolerant Clifford Circuits Using Solvers for Satisfiability Modulo Theories

Universal fault-tolerant quantum computers will require the use of efficient protocols to implement encoded operations necessary in the execution of algorithms. In this work, we show how solvers for satisfiability modulo theories (SMT solvers) can be used to automate the construction of Clifford circuits with certain fault-tolerance properties and we apply our techniques to a fault-tolerant magic-state-preparation protocol. Part of the protocol requires converting magic states encoded in the color code to magic states encoded in the surface code. Since the teleportation step involves decoding a color code merged with a surface code, we develop a decoding algorithm that is applicable to such codes.


I. INTRODUCTION
Many problems in quantum computing require the construction of Clifford circuits with some desired properties. For instance, in topological quantum error correction, multi-qubit gates used to measure the stabilizers of the code must be implemented in a particular order to prevent small errors from propagating to large errors which reduce the effective distance of the code [1][2][3][4]. In many cases, fault-tolerant circuits for syndrome extraction require the use of extra ancilla qubits, known as flag qubits, whose role is to detect or prevent the propagation of errors arising from a small number of faults to large data qubit errors [3][4][5][6][7][8][9][10][11][12][13][14][15][16]. For instance, in Refs. [17,18], it was shown that flag qubits could be used to fault-tolerantly prepare high-fidelity magic states without the use of top-down magic state distillation protocols, by which we refer to protocols which use encoded gates to implement all the Clifford operations of the distillation circuits. As such, the Clifford circuits are typically not fault-tolerant. We refer to protocols which prepare magic states using fault-tolerant Clifford circuits as bottom-up protocols. In many cases, fault-tolerant circuits used in bottom-up protocols are constructed from either first principles or brute-force numerical methods.
In this paper, we show in Section II how to formulate the desired properties (or constraints) of Clifford circuits in a format compatible with satisfiability modulo theories (SMT). These problems can then be solved using SMT solvers such as Z3 [19]. In Section III, we apply these techniques to construct fault-tolerant circuits for preparing |H -type magic states encoded in the color code [20][21][22][23], where the physical qubits are constrained to live on a two-dimensional (2D) lattice interacting via nearest neighbors with low degree connectivity. Such constructions have the potential to be suitable for many quantum computing hardware architectures currently under development.
Currently, the leading approach to protect logical information afflicted by noise during a quantum computation is to use a two-dimensional topological quantum error-correcting code [24], such as the surface code [25,26] or the color code. Such codes are then combined with magic state distillation and lattice surgery to perform universal fault-tolerant quantum computation. In particular, the surface code has several ad-vantages over the color code [27]. For instance, the surface code has a much higher noise threshold than the color code and can achieve desired logical failure rates using fewer qubits for physical error rates expected in near term architectures. Variations of the surface code, such as the XZZX code [28], may provide some advantages over the surface code in settings where the underlying noise model exhibits some bias. However, the surface code still provides lower overhead costs to achieve a given logical error rate for most studied realistic noise models [29]. Since the methods of Section III are used to prepare magic states encoded in the color code, in Section IV we show how magic states encoded in the color code can be converted to magic states encoded in the surface code. The schemes involve a teleportation protocol that is implemented using lattice surgery methods. In particular, using gauge fixing to perform an X ⊗ X logical Pauli measurement, the color codes and surface code are merged into one code. However, known decoders for surface codes and color codes are not suitable for correcting errors of the merged code. As such, we conclude this paper by presenting a decoding algorithm that can be used to decode the merged code and is hopefully of value for successfully converting states encoded in the color code to states encoded in the surface code. compilation of Clifford circuits is sometimes performed by a skilled researcher, possibly aided by software that verifies the circuit has the desired computational and fault-tolerance properties. This approach is time-consuming, unpredictable, and may not be as flexible as desired.
In this paper, we document an alternative approach to handdesigning Clifford circuits in which the constraints on a quantum circuit are formulated as an SMT decision problem which we define in Section II A 1. This problem can then be solved by an off-the-shelf SMT solver such as Z3 [19]. Despite SMT decision problems having exponential or worse time complexity for hard instances, "automated reasoning" software libraries such as Z3 have been heavily optimized and refined through decades of research and are are now widely applied in formal software verification and electronic design automation, among many other domains. They can scale to solve problems containing thousands of variables in diverse domains through careful tuning of problem encoding and solver techniques [31][32][33]. In Section II A we show how arbitrary computations from the Clifford group can be represented by bit-matrices, and how to solve for circuits implementing these operations using a limited gate set. We then explain how faults in the circuit can be symbolically propagated through to the end of the circuit, allowing constraints to be added to the SMT problem. In Sections II B 4 and II B 5, we show how such constraints can be used to design circuits with guaranteed fault-tolerance properties. In Section II C, we describe techniques for constructing the SMT problems iteratively, which enable more scalable solutions that in turn can be used to solve more difficult circuit design problems of practical interest.

SMT Decision Problems
Boolean formulas consists of expressions such as the following: These expressions involve some Boolean (i.e., binary) variables (the X i terms in Eq. (1)) along with logical operators such as ∧, ∨, ⊕, and ¬. A Boolean formula such as F is satisfiable if there is a way to assign 0 or 1 to each of the X i such that F evaluates to 1. We call such an assignment of bits to the X i a satisfying assignment. Eq. (1) has the satisfying assignment X 1 = X 3 = 0, X 2 = 1.
Satisfiability modulo theories [31] extend the notion of a Boolean formula to an SMT formula such as the following: F SMT = (X 1 + (X 1 ⊕ X 2 ) ∧ (X 1 + X 2 + (X 3 = 0) + X 4 ≤ 1)) ≤ 3. ( These SMT formulas support variables and clauses over larger non-Boolean domains, such as the integers. They also support operators such as integer arithmetic (+, −, ×, ÷) and compari-son (=, =, ≤, ≥) along with the Boolean operators above. The type of an SMT expression is determined by the topmost operator in the parse tree; for example the formula in Eq.
(2) will evaluate to a Boolean due to the comparison operator. An SMT decision problem is an SMT formula such as F SMT which is of Boolean type (i.e., evaluates to a Boolean ∈ {0, 1}). An SMT solver, such as Z3 [19], is a software program that uses heuristic strategies to find either a satisfying assignment of values to all of the variables X i , such that F SMT ({X i }) evaluates to 1, or a formal proof that no such assignment exists.
SMT solvers exhibit good performance for a wide range of problems from program verification to network engineering [31][32][33]. This performance improves each year as measured by competitions [34,35]. SMT solvers have only recently been applied to quantum circuit synthesis, gate scheduling, and qubit routing [36][37][38][39]. This work uses the bit-matrix representation of Clifford operations to efficiently encode wholecircuit design problems subject to fault-tolerance constraints into SMT decision problems. As we will explain in the subsequent sections, this key technique enables synthesis of large fault-tolerant circuits from scratch to implement nontrivial Clifford operations while maintaining compatibility with 2D hardware.

Clifford Group
The Clifford group on n qubits G (modulo a global phase U (1)) is isomorphic to the binary symplectic group Sp(2n, F 2 ) whose elements may be considered matrices in F 2n×2n 2 preserving the symplectic inner product. In the context of quantum computation, these matrices can be thought of as acting on a bit-vector representation x ∈ F 2n 2 of a Pauli group stabilizer n i=1 X xi i Z xn+i i of a quantum state, modulo the unimportant global phase. For example, the CNOT gate acts on the basis {X 1 , X 2 , Z 1 , Z 2 } of the vector space over F 2 of the Pauli group (modulo phase) on two qubits, as follows: Therefore, the binary matrix representing the CNOT gate is We call this matrix the bit-matrix representation of the CNOT gate and denote it by CNOT. Given a Pauli operator with bit-vector e acting on the input state to a Clifford circuit C, we can propagate the operator through the circuit as e = Ce. That is, we left-multiply the bit-vector by the bitmatrix representation of the Clifford operation implemented by the circuit.
It will be helpful to define the reduced bit-matrices CNOT| X and CNOT| Z which are These matrices characterize the action of the Clifford operation, when one Pauli type (X or Z) is disregarded.

Product-Sum Relation
Given a Clifford circuit C consisting of a series of g gates with bit-matrices G 1 , . . . , G g , we now describe how we can find the bit-matrix representation C of the entire circuit C. Naively, we could multiply all of the gates in time order, finding C = g i=1 G i . This requires g matrix multiplications, where there are g gates in the circuit. In our approach to designing quantum circuits with SMT solvers, the bit-matrices of the gates (i.e. the G i ) will be symbolic matrices, meaning that their entries may be SMT formulas of some abstract variables rather than Boolean constants ∈ {0, 1}. It turns out that when the bit matrices G 1 , . . . , G g are symbolic, the naive approach of multiplying all gates in time order results in unwieldy formulas with an extremely large formula size, as we will explain below. So, we do not use this naive approach, and will later explain the product-sum relation which allows us to keep the formula size small when we design large quantum circuits.
Informally, we can define the formula size of a formula F as: The exact definition of formula size is not as important as the empirical fact that formulas with larger size require more memory and processing time to manipulate when constructing and solving the SMT decision problem with an SMT solver. It is therefore of fundamental importance to keep the formula size as small as possible to be able to design large quantum circuits. Note that we can multiply and add symbolic bit matrices modulo 2 using the SMT operators ∧, ⊕. For example, we can multiply and add symbolic 2 × 2 matrices as follows: The 2 × 2 symbolic matrices on the left side above have entries with formula size 1. Their product on the right has entries with size 7, while their sum has entries with size 3. As illustrated by this example, addition of symbolic matrices results in less of a formula size increase than multiplication.
We will now introduce a technique which allows us to construct the bit-matrix for C with only N matrix multiplications, where N is the number of time steps of C. The technique works by replacing many of the symbolic matrix multiplications with additions. Specifically, we will add instead of multiplying certain matrices corresponding to gates that act simultaneously on disjoint sets of qubits. As we have just explained above, symbolic matrix addition incurs a smaller size increase than multiplication. This technique therefore decreases the resulting formula sizes and improves solver performance, since N g.
Given a bit-matrix representation G ∈ F 2n×2n 2 of a Clifford gate G acting on n qubits, which acts trivially on the th qubit, it can be shown that where [n] := {1, . . . , n}. In other words, the matrix G must leave invariant all possible Pauli operators on the th qubit. We now define the following notation for a bit matrix G: From the previous observation we can see that the matrix ∆G is supported only in the combinatorial rectangle with rows and columns indexed in the set S of qubits supporting the gate corresponding to G. Therefore the product of two gates G 1 G 2 simplifies when acting on disjoint supports: In general, for m simultaneous Clifford gates G 1 , . . . , G m acting on pairwise disjoint sets of qubits, we can compute the composite bit-matrix of all these gates as We refer to Eq. (13) as the product-sum relation. We now give a small example of the product-sum relation for the circuit on 3 qubits shown in Eq. (14).
We refer to the CNOT gate on qubits 1 and 2 as CNOT 1,2 and the H (Hadamard) gate on qubit 3 as H 3 . The bit-matrices for these gates are as follows: These follow from the elementary propagation rules in Eqs. (3) to (6) as well as the relations HX = ZH and XH = HZ. From these we may easily check that as required in Eq. (12). One can also verify that adding I ⊕ ∆CNOT 1,2 ⊕ ∆H 3 = CNOT 1,2 H 3 as per the product-sum relation given in Eq. (13).
The product-sum relation is used in Section II B 2 to reduce the number of costly symbolic bit-matrix multiplications in the construction of the SMT decision problem from scaling with g (the number of gates) to scaling with N (the number of timesteps). For shallow quantum circuits on many qubits, N g, resulting in substantial savings on formula size. The product of the N symbolic matrices of dimensions 2n × 2n can then be computed as a symbolic matrix whose entries have formula size O(N n 3 ).

B. Solving for Clifford Circuits
The reader may notice many familiar notions, which in this section are re-formalized as SMT formulas to enable the precise characterization of the Clifford circuit design problem as an SMT decision problem.

Gate-Time Encoding
To encode a Clifford circuit design problem as an SMT decision problem (as defined in Section II A 1), we require a format for encoding an arbitrary circuit supported by the hardware in terms of some Boolean variables {X i }. This encoding should be efficient and easy to implement in the SMT solver software. We use the gate-time encoding of a circuit, which we define as follows. Suppose our quantum computer has n qubits and that it supports w distinct fundamental gate operations {G 1 , . . . , G w }. Finally, suppose that we wish to encode a circuit with at most N time steps. Then we encode the circuit by wN symbolic Boolean variables indexed as X ij where i ∈ [w] and j ∈ [N ]. By convention, the gate G i is applied at time step j if and only if X ij = 1. The Boolean values X ij then specify an arbitrary-depth N circuit C X consisting of G i gates.
For example, consider the layout of three qubits labeled 1, 2, 3 on the planar graph below. We can imagine that this corresponds to a physical device on a 2D surface, where the qubits have nearest-neighbor interactions shown by the graph edges. More specifically, for any pair of qubits connected by an edge, we can implement any CNOT gate on that pair. Suppose also that we can implement any single qubit Pauli X, Y, Z, Hadamard H, or phase S gate. Then we would have w = 6 + 3 + 3 + 3 + 3 + 3 = 21 distinct gates G 1 , . . . , G w , and our gate set would be as follows: The product-sum relation then gives a symbolic expression for the bit-matrix of the Clifford operation C performed by the circuit C in terms of Boolean variables X ij and the bitmatrices for the individual gates G i . This expression is given by At high level, our technique to construct an SMT decision problem is to construct a symbolic bit-matrix P which is simply the bit-matrix C of the circuit C, but where the entries C ij are now formulas involving the variables X kl that determine the circuit C, as well as some auxiliary variables Y i and the constants 0, 1. The auxiliary variables are used to avoid exponentially compounding increases in formula size when Arguments And(arguments) FastAnd(arguments) multiplying the N timestep matrices, and will be explained in Section II B 2.
Supposing we have some Clifford operation O whose bitmatrix is O, we can enforce that the circuit implements the desired Clifford operation simply by requiring that In practice, it is easy to compute the bit-matrix O as long as a the operation O is well-defined -we can for example use a long, unoptimized circuit which implements O non-faulttolerantly to compute the bit-matrix O. We give more details on how C is constructed in Section II B 2.
Additional requirements on the circuit can be added by conjuncting this core equality with any number of additional constraints. We begin in Section II B 3 by describing how some simple requirements related to the validity of the gate scheduling can be added to the SMT decision problem. We explain how similar techniques are used to enforce geometric locality and enable joint co-design of hardware layout with low-level gate scheduling. In Section II B 4 we explain how faults can be symbolically propagated through the symbolic bit-matrix and show an example application of this technique for faulttolerant surface code syndrome extraction. We generalize this technique in Section II B 5 to design v-flag fault tolerant circuits to implement desired clifford operations.

Symbolic Bit-Matrix Construction
For each i ∈ [w] we find the bit-matrix representation G i ∈ F 2n×2n of the ith gate. We then use the product-sum relation to express the symbolic bit-matrix for the circuit specified by X ij , using Eq. (16).
It will be helpful for Section II B 4 to define more generally a sequence of symbolic bit-matrices for the partial circuits consisting of the last N − k steps of the circuit C: Clearly, we have that C = C (0) and, furthermore, all of the matrix multiplications over F 2 can be implemented with any SMT solver using the fundamental operations of multiplication (i.e., logical AND) and addition mod 2 (i.e., exclusive-or XOR) on the formulas constituting the symbolic matrix, as explained in Section II A 3.
We will next explain two optimizations that we find dramatically reduce the formula size of the symbolic bit-matrices C (k) .
First, we explain the use of auxiliary variables in symbolic matrix multiplication. Suppose we are given three symbolic matrices X, Y, Z whose dimensions are all n × n and whose entries all have formula size 1, and we must compute their symbolic product XY Z. Using the multiplication of two symbolic matrices as a primitive, we could compute their symbolic product matrix as follows (naive approach): 1. Compute the symbolic matrix Y Z, whose entries have formula size Θ(n).
2. Compute the symbolic matrix X(Y Z), whose entries have formula size Θ(n 2 ).
More generally, the formula size of the entries of the product of N matrices using the naive approach is Θ(n N −1 ). Rather than compute XY Z directly, we can introduce the auxiliary symbolic matrix W = (w ij ) and proceed as follows (optimized approach): 1. Compute the symbolic matrix Y Z, whose entries have formula size Θ(n).
2. Add the formula equivalent to W = Y Z to the SMT problem formula, increasing the formula size by Θ(n 3 ).
3. Compute the symbolic matrix XW , whose entries have formula size Θ(n).
More generally, the formula size of the entries of the product of N matrices using the optimized approach remains O(n), but we incur a cost of O((N − 2)n 3 ) in the problem size. However this additional cost is negligible compared to the scaling of the naive approach above. Second, we explain how incremental simplification is used to optimize formula size when manipulating formulas with lots of constants. SMT solvers such as Z3 [19] generally operate in two stages. In the first stage, the formula is constructed to be solved according to the user's wishes. In the second stage, the solver applies a heuristic set of approaches to simplify the formula, derive lemmas, and eventually solve the formula or prove that it is unsatisfiable. Importantly, the formula given by the user in the first stage is left "as-is" during the first stageeven simple formulas such as the following parity expression: are not reduced or simplified (e.g. in the above example, to the literal value 1) in any way. This leads to a huge slowdown in constructing the SMT problem as manipulating these large symbolic formulas has a much larger memory and time overhead than directly manipulating single bits. The SMT solver's simplification routines may be manually triggered on the partial expressions that build up a formula, but this brings its own associated slowdowns since in many cases the solver spends time performing nontrivial simplifications. Moreover, we have observed that a pre-simplified formula can take longer to solve, as it is generally better to leave decisions about nontrivial simplifications, substitution, etc. to the finely tuned heuristics of the solver at solving time rather than to manually force potentially nontrivial simplifications during problem construction.
To avoid the worst of the slowdowns when manipulating large symbolic bit-matrices, we implemented variadic subroutines FastOr, FastAnd, FastXor which pool all constant arguments and only call the symbolic SMT Boolean functions when needed. An example showing how this reduces the formula size is shown in Table I. We found that this optimization leads to a ×100 speedup in the construction of the matrices P (k) and is thus extremely important for scaling up to large circuits involving dozens of qubits and time steps. Here is a small example. When evaluating one term in the inner sum in Eq. (16) we must symbolically compute the product X ij ∆G i , where we recall that X ij is a Boolean variable and ∆G i is a bit-matrix of (literal) Boolean values. As a toy example, imagine that the matrix is equal to then performing the multiplication using FastAnd will give the result whereas using Z3's logical operators directly will give For large matrices with many 0 entries this simplification has a dramatic effect. Another simplification we can make pertains to the use of auxiliary variables in bit matrix construction.

Constraints on Valid Circuits
Valid circuits must generally satisfy some gate exclusion relations. We describe these in terms of a set of SMT assertions {f i } that are conjuncted into the constraint satisfaction problem as i f i . In a typical setup there can be at most one gate acting on each qubit at any one time step. We can capture this as, for a time step t ∈ [N ] and a qubit q ∈ [n], in which the sum is over the integers (i.e., not modulo 2). We then obtain the combined valid circuit constraints as f (t,q) .
In some cases we may modify the gate exclusion relations (22). Here are several examples: • We may choose to represent particular gates by products of those in our gate set G 1 , . . . , G w . For example, we may represent the Pauli Y gate on a single qubit as acting with both the X and Z gates simultaneously. This would decrease the number of distinct gates w, making the circuit representation more efficient.
• We may limit the number of gates of a particular type that are applied, because e.g. the noise rate or overhead cost may be especially high for that gate type.
• We may enforce that few or no idling locations occur, as idling qubits add additional fault locations to the circuit.
• We may limit the number of distinct long-range gates between distant qubits, in case we can only manage to implement a few such gates and wish to use them sparingly.
• We may limit the number of distinct qubits with which any one qubit q ∈ [n] interacts, so as to minimize the degree of the connectivity graph. That is, we may enforce a condition such as in which d is the desired maximum degree.
• We may bind the gates used in one circuit design problem with the gates used in a different problem -that is, we may co-design multiple protocols simultaneously, with a global degree or other gate constraint enforced jointly across all the protocols. For example, suppose that we are designing two protocols labeled 1 and 2 which share a common set of qubits [n] and possible gates indexed by 1, . . . , w. Suppose that we wish for the protocols to be implementable on the same hardware, which is subject to a maximum degree connectivity of d.
Then we can enforce a condition such as in which the {X (k) ij } i,j are the gate-time encoding variables for protocol k.
In all of these cases we can construct the appropriate gate exclusion relations f i , using standard techniques to encode them as SMT decision formulas.
Illustration of how fault propagation constraints can be used to find a fault-tolerant gate scheduling for the surface code such that at most (d − 1)/2 faults does not result in a logical error. (a) 3 × 3 surface code with a vertical logical XL operator. (b) For a particular syndrome extraction circuit with the CNOT gates and time steps shown for one of the X-type stabilizers, a single fault resulting in an X error e on qubit 12 after the second time step propagates to a weight-two X error on the data qubits 3 and 6 that is parallel to the logical XL operator.
(c) Symbolically propagating the error arising from a fault at that spacetime location as e = C (2) e, we can impose the fault propagation constraints (right) to ensure that the fault does not propagate parallel to a logical operator. By imposing this constraint for each possible fault location for X-type errors, we ensure that no single faults in the syndrome extraction circuit propagate parallel to a XL logical operator. Similar constraints can be written to prevent Z-type errors from propagating parallel to a ZL logical operator.

Topological fault-tolerance using symbolic fault-propagation
We now introduce symbolic fault propagation and show how it can be applied to design fault-tolerant syndrome extraction circuits for topological codes such as the surface code. Since arbitrary noise operators can be expressed as linear combinations of Pauli operators, we can model the noise as a distribution on Pauli operators at each spacetime location (i.e., idle, gate, statepreparation and measurement) in a given circuit. Specifically, we describe a Pauli noise operator (ignoring global phases) as a column vector e ∈ F 2n 2 in which the first n rows correspond to X errors on the n qubits, and the second n rows correspond to Z errors.
Suppose that a noise process occurs at time step k resulting in some initial Pauli noise operator vector e. We symbolically propagate the error arising from the fault through the remainder of the circuit by computing the symbolic vector e = C (k) e (where the partial circuit bit-matrices are defined in Eq. (18)). We then impose constraints as needed on the final propagated error to ensure that the scheme is fault-tolerant up to the full code distance. The combined fault propagation constraint is then: in which NotFaultTolerant is a Boolean formula NotFaultTolerant(e 1 , . . . , e 2n ) on 2n variables which decides whether the propagated error at the single fault location invalidates the desired fault-tolerance properties of the circuits. For example, the gate scheduling chosen for the syndrome extraction circuit of a topological code must satisfy the property that errors propagate perpendicularly to the appropriate logical operator. An example is provided in Fig. 1.
In a noise model where two-qubit gates are afflicted by twoqubit Pauli errors, we may wish to enforce these constraints only for faults occurring on those two-qubit gates whose asso-ciated Boolean variables X ij are set to 1. In this case, we can amend the combined constraints as: single fault e after timestep k (¬NotFaultTolerant(e )) ∨ IsInvalidFault(X, e, k), (25) in which IsInvalidFault(X, e, k) is a Boolean formula which decides whether the error e could have occurred at that spacetime location.

Flag Fault-Tolerance
Another interesting application of symbolic fault propagation is in the construction of v-flag circuits used to faulttolerantly measure a given operator (for instance, a stabilizer of an error correcting code). Following the definition introduced in [10], a v-flag circuit for measuring a stabilizer g i has the property where for any set of t ≤ v faults resulting in the error E such that min(wt(E), wt(Eg i )) > t, at least one the flag qubits are measured nontrivially. Here wt(P ) corresponds to the weight of an operator P . More details can be found in Ref. [10]. In what follows, we say that a circuit flagged if at least one of the flag qubits is measured nontrivially. For each qubit i which is to be measured, we let P i ∈ {X, Y, Z} be the Pauli measurement basis. As the P i -basis measurement of qubit i must deterministically give a trivial +1 measurement outcome in the absence of any faults, the initial stabilizers of the input state must be mapped by the desired Clifford operation C to a set of stabilizer generators, which generate a stabilizer group that includes the stabilizer P i on the final state.
To symbolically verify whether a propagated error resulting from ≤ t faults causes the circuit to flag, it suffices to obtain the symbolic propagated error and to verify whether any flag qubit i gives a nontrivial P i -basis measurement outcome. For instance, if qubit i is measured in the P i = X basis, we verify that there is a Z or Y error on that qubit in the symbolic propagated error e .
We may allow the SMT solver to decide which qubits are to play the role of flag qubits. In this case, to each possible flag qubit i we add a Boolean variable IsFlag i , where IsFlag i will be set to 1 if and only if qubit i is a flag qubit in the final protocol. Similarly we may allow the SMT solver to choose the measurement basis P i for each measured qubit i by adding Boolean variables MeasuredInX i and MeasuredInZ i , with the convention that the Y basis is chosen if both are set to 1, and with an added constraint that MeasuredInX i ∨MeasuredInZ i = 1. For convenience, we may still refer to this encoded Pauli variable as P i . As we have just mentioned, the P i basis Pauli measurement outcome must deterministically give +1 when there are no faults, so the desired Clifford operation C must be compatible with the choice of P i and IsFlag i . The solution that we use is to make C itself depend on the setting of these variables, so that the circuit does not flag when there are no faults.
For each possible fault in the circuit we associate a tuple (k, e, e , S) where k ∈ [N ] is the time step such that the fault occurs, e ∈ F 2n 2 is the vector representation of the Pauli noise operator resulting from the fault, e = C (k) e and S is the stabilizer in whose measurement circuit the fault occurred.
We define several functions returning SMT formulas, as follows. The MinWt function returns a formula which evaluates to the minimum integer weight of the propagated error resulting from a given set of t faults when multiplied by the distinct stabilizers } in whose measurement circuits the t faults occurred [40]: Note that the integer min function can be implemented as an SMT formula using the comparison operators and the If-Then-Else operator, which are both supported [19]. The NontrivialOutcome(i, P i , e ) function returns a formula that evaluates to 1 if the P i -basis measurement outcome of qubit i gives a nontrivial -1 outcome in the presence of the error e .
As the operation C is guaranteed to give a +1 measurement outcome when there are no faults, this is easily computed as The IsFlagged function returns a formula which evaluates to a Boolean 1 value if and only if there is a flag qubit which gives a nontrivial measurement outcome where for brevity we have abbreviated the variables MeasuredInX i , MeasuredInZ i as simply P i . The IsValidFaultSet function returns a formula which evaluates to 1 if and only if the faults occur at a valid origin point: where the support of a gate supp(G) is the set of qubits on which it acts, and the support of a noise operator with bitvector e ∈ F 2n 2 is simply supp(e) = {i ∈ [n] : e i ∨ e n+i }. Note that e is not a free variable as it is known at the time of SMT decision problem creation. Therefore the IsValidFaultSet function returns a small SMT formula since the logical OR in Eq. (29) is efficiently implemented by the program which constructs the SMT formula, rather than symbolically encoded in the formula itself.
The IsNotTFlagFaultTolerant function returns a formula which evaluates to 1 if and only if the passed error violates the t−flag property of the circuit. That is, where we have suppressed all arguments except IsNotTFlagFaultTolerant for brevity.
To design a v-flag circuit, we construct an SMT problem with fault tolerance constraints IsVFlag which returns a formula evaluating to 1 if and only if the circuit is v-flag.
where we have again omitted the arguments for clarity, and it is understood that the conjunction is over all possible sets of t ≤ v errors which occur at t distinct fault locations. For Calderbank-Shor-Steane (CSS) code syndrome measurement circuits built from just CNOT gates, we may only concern ourselves with the propagating error type (e.g., the X type errors when measuring the X stabilizers). For this purpose we can consider just the restricted bit matrices P (k) | X , saving a factor of 4 on the size of the symbolic bit-matrices C (k) . Fig. 2 shows a simple example to illustrate the construction of the SMT formula to design a circuit to measure a two-qubit stabilizer with no flag qubits.

C. Iterative Solving
A circuit with depth N on n qubits with w possible gates at each time step, has at most N (w + q) distinct fault locations. As explained in Section II B 5 we can construct a SMT Qubit Interaction Graph  To circumvent this problem, we make use of an iterative approach as shown in Figure 3. We first construct the SMT problem instance F SMT as simply that is, without any fault-tolerance constraints. We then check the v-flag property for all sets of t ≤ v errors occurring at t dis-tinct fault locations. These sets of errors correspond precisely to the clauses in Eq. (31).
and we then update the formula as We then ask the SMT solver to re-solve F SMT . We repeat this process until either the problem is shown undecidable, or no  3. In iterative v-flag circuit solving, an initial SMT formula is constructed without flag constraints. After a solution is found, the conditions in Eq. (31) are checked. If any violated constraints are found, then some are added to the SMT formula and the problem is re-solved. The process terminates if either a v-flag circuit is found or a problem is proven to be unsatisfiable.
fault combinations are present in the circuit which violate the v-flag property, as shown in Fig. 3.
We expect two reasons that iterative solving works better than specifying all the constraints at the beginning of the protocol. The first is that many of the constraints are redundant, in a formal sense or a statistical sense. For example, two faults which occur late in the circuit at far away positions (as measured by the qubit connectivity graph distance) cannot possibly both flag the same flag qubit -therefore as long as each of these faults flag independently, then the combination of both faults will also flag. Therefore the constraint ¬IsNotTFlagFaultTolerant for the fault set containing these two faults is redundant in a formal sense with the constraints for each of the faults on their own. Other sets of t faults may have intersecting lightcones such that the constraint for the fault set is not formally redundant, and yet the majority of solutions to the problem do not violate that fault set's t-flag constraint (¬IsNotTFlagFaultTolerant).
Interestingly, our numerical simulations have shown that iterative solving cuts runtimes to assemble the SMT problem instances dramatically, and enables scaling the approach beyond 1-flag to v-flag (with v ≥ 2) which simply is not possible with the naive approach of constructing the entire SMT problem up front.

III. FAULT-TOLERANT |H -TYPE MAGIC STATE PREPARATION USING SMT SOLVERS
The leading approach to implementing quantum algorithms on a universal fault-tolerant quantum computer is to use magic state distillation (MSD) [41] in combination with lattice surgery techniques. Alternative approaches to MSD for achieving universality, such as code-switching [21,[42][43][44], have been proposed. Such alternative approaches are not always compatible with the 2D hardware constraints and have been shown to require larger resource overhead costs in their implementation [29,45,46].
T -type magic states (with |T = (|0 + e iπ/4 |1 )/ √ 2) can be used as a resource state to fault-tolerantly implement logical T gates. In Refs. [27,47], |T states were prepared by first encoding several |T states in distance d = 1 surface codes using non-fault-tolerant methods and growing the codes to a final distance d 1. Afterwards, such states encoded in a distance d surface code were injected in a MSD protocol to distill them to a desired target logical failure rate determined by the size of the quantum algorithm being implemented. Since the growing scheme is not fault-tolerant, a single fault (in the input |T state) can result a logical error in the injected magic states prior to the implementation of the MSD protocol. Consequently, many rounds of MSD are required, where each operation is encoded in a large distance surface code, to generate high fidelity magic states such that they can be used in an algorithm.
An alternative proposal was put forth in [17,18]. In this approach, an encoded |H -type magic state (with |T = e iπ/8 HS † |H where H and S are Hadamard and phase gates) with code distance d > 1 is directly prepared using a faulttolerant protocol, meaning that any errors arising from at most (d − 1)/2 faults cannot lead to a logical error. The protocols in [17,18] make use of the color code, which has the convenient property that the logicalH gate is transversal (along with all other Clifford operations). In this protocol, a physical (d = 1) |H state is grown using non-fault-tolerant methods to a d > 1 encoded |H state. Subsequently, the grown state is injected in a bottom-up fault-tolerant magic state preparation protocol, where (d − 1)/2 rounds of transversal logical Hadamard measurements and color code syndrome measurements are performed. If any of the syndrome or flag-qubit measurements during the state-preparation protocol are nontrivial, indicating the presence of at least one fault, the protocol is aborted and begins anew. An illustration for the sequence of such operations is shown in Fig. 4a. In [17,18], the intermediate code distances d ∈ {3, 5, 7} were analyzed, and despite additional space-time resource overhead costs due to rejection events, the scheme was shown to reduce overhead costs compared to previous MSD schemes in order to achieve a desired logical failure rate. The main reason for the overhead reduction is due to the fault tolerant nature of the preparation circuits, which can be implemented using physical Clifford operations. If the prepared magic states still require higher fidelities before being used in a quantum algorithm, such states can be injected in a MSD protocol requiring only a small number of distillation rounds Although bottom-up protocols have shown promising results in providing low cost methods for preparing high fidelity magic states, one of the main challenges stems from finding flag-based circuits with the desired fault-tolerance properties. For the case of preparing |H -type magic states encoded in a distance d color code, one requires v-flag circuits for measuring the logical Hadamard H L and stabilizers of the color code (the circuits H (d) m and ED (d) ) in Fig. 4a. Further, for many hardware architectures, the qubits in such flag-based circuits are constrained to be laid out on a two-dimensional plane where the qubits can only interact with nearest neighbors. Further, the degree of the interactions must remain low [13].
In Section II we showed how SMT solvers can be used to find Clifford circuits with a set of desired properties. One application of the techniques shown in Section II B 5 was in |H L n.f. finding v-flag circuits for fault-tolerantly measuring stabilizers of an error correcting code. In this section, we show how the protocols introduced in Sections II B 5 and II C can be used to construct v-flag H (d) m and ED (d) circuits with nearestneighbor and low degree connectivity constraints imposed by many quantum hardware architectures [13,29].

A. Qubit Interaction Graph
Recall that we label the n qubits by the integers 1 through n, and denote this set [n] = {1, . . . , n}. We now designate a subset A ⊂ [n] of qubits that are to be prepared in some Pauli eigenstates and measured at the end of the circuit. In our case, for designing ED (d) and H m and X-type stabilizer circuits in ED (d) , it is initialized in a +1 X eigenstate, whereas the other ancillas are initialized in +1 Z eigenstates. We refer to these flag, ancilla, and root ancilla qubits as the A-qubits. The data qubits are then [n] \ A. Since our target hardware is a 2D device where only nearestneighbor qubits can interact, we create a planar graph G qubits = ([n], E) in which the vertices correspond to qubits and the edges are between qubits that support two-qubit gates. This graph is called the qubit interaction graph. An example of such a qubit interaction graph is shown in Figure 4 (b). In our graph we ensure that no two data qubits share an edge, as twoqubit gates are prohibited between data qubits. This graph is designed by hand to use a low degree of connectivity between all qubits and still be connected, such that in the absence of faults it should be possible to implement the desired circuit. We then assemble our set of gates. Because we know the Clifford operations ED (d) and H (d) m [48] can be implemented using only preparation and measurement in the Pauli X and Z bases and CNOT gates, our gate set {G 1 , . . . , G w } consists only of CNOT gates. Specifically, for each edge (u, v) ∈ E where u, v ∈ A, we add CNOT u,v and CNOT v,u to our gate set , we only add a CNOT from the non-data qubit ∈ A to the data qubit.
Note that controlled-Hadamard gates are equivalent to CNOT gates up to conjugation by a single-qubit non-Clifford gate. These single-qubit corrections do not propagate errors and can be placed at the beginning and end of the H m circuits, it suffices to treat all controlled-Hadamard gates as CNOT gates. Lastly, note that if an error correction scheme was being developed rather than an error detection scheme, the type of data qubit errors would matter and, as such, the type of two-qubit gate used for measuring the logical Hadamard operator would need to be considered. We leave such considerations to future work on error-correction schemes. Once the gate set {G i } is specified, we feed it as input to the functions described in Section II B. These functions construct SMT formulas which are then used to construct the entire SMT decision problem given below in Eq. (34). The A-qubits which are shown in purple in Fig. 4 must each be assigned one of three roles: ancilla, root ancilla, or flag qubit. The parity of the measurement outcomes of the root ancilla as well as all other ancillas which are not flag qubits is used to obtain the measurement outcome for the operator being measured. Specifically, the measured operator, which is either a code stabilizer for ED (d) or the logical Hadamard for H (d) m , has 0 or more ancilla qubits and exactly one root ancilla qubit associated with it. The measurement outcome for this measured operator is then encoded as the product of the measurement outcomes across all these ancillas (including the root) that are measured and interact with the root ancilla. We allow the solver to choose which qubits have the role of flag, ancilla, or root ancilla qubit by creating For an X-type stabilizer measurement and logical H L measurement, we know that the root ancilla is prepared in a +1 eigenstate of the X operator. The non-root ancillas and flag qubits are prepared in a +1 eigenstate of the Z operator. The (root and non-root) ancillas are measured in the X basis. The flag qubits are measured in the Z basis. For a Z-type stabilizer measurement, the X and Z bases are all swapped, that is, we replace X with Z and Z with X in the preceding description. As such, we do not need to produce a solution for Z-type stabilizer measurements. Here, as in Section II B 5 we refer to the measurement bases of each qubit by {P i }, with the understanding that this denotes a pair of symbolic Boolean values (IsMeasuredInXBasis i , IsMeasuredInZBasis i ) as explained in Section II B 5. In our case, for each qubit, the preparation and measurement bases of the qubit are the same and we use P i to refer to this one basis.
We then set a number of time steps N (H) and N (ED) for each circuit to complete. Then we declare the gate-time encoding variables {X m ), but we refer to these just as X ij when we are speaking about a generic protocol of the two. We construct the gate exclusion relations constraint for each circuit as explained in Section II B 3. We label these SMT formulas by GateExclusion (H) , GateExclusion (ED) . Since the construction of these formulas proceeds analogously for both protocols, we refer to these formulas generically as GateExclusion.
Although the qubit interaction graph shown in Fig. 4 already has maximum degree 3 as desired, we can also start with a higher degree graph and enforce a global degree connectivity constraint with variables X We then construct the bit-matrix O that describes the Clifford operation we would like to implement, as used in Eq. (17). Since we allow the SMT solver to choose which qubits are flag, ancilla, and root ancilla qubits, we must use a symbolic matrix to represent O as explained previously in Section II B 5. Recall that we have variables P i for the preparation and measurement basis. For each qubit i ∈ A we construct a bit-vector S i which describes the initial stabilizer of the input state acting on this qubit. As this depends on P i we must make this a symbolic bit-vector so that it represents the appropriate initial Pauli stabilizer depending on the role of the qubit. To understand how to construct the symbolic desired bit-matrix O, let us consider its four quadrants: the upper left quadrant corresponding to X to X propagation, the lower right quadrant corresponding to Z to Z propagation, and the off-diagonal quadrants. The offdiagonal quadrants are set to 0. Recall that we call the upper left quadrant O| X and the lower right quadrant O| Z reduced bit-matrices. Due to the symmetry of CNOT gate propagation for X and Z type Paulis (as shown in Eqs. (3) to (6)), since our circuits are only composed of CNOT gates we have that O| X = O| T Z . By this symmetry, we must therefore only specify one of the two quadrants, so we specify how the O| X quadrant is constructed. Further, we do not have to fully specify O| X , as the value of (O| X ) j,i does not matter for any i which is a non-root ancilla or a flag qubit with j = i. The reason that (O| X ) j,i does not matter for such j and i is the particular input state we have chosen. These off-diagonal values (O| X ) j,i for non-root columns i can be set arbitrarily by right-multiplying O by CNOT i,j . These CNOT i,j gates would have no effect on the input state since the non-root qubits are initialized in the |0 state. In more formal terms, we only specify the symbolic bit-matrix O up to right-multiplication by an arbitrary Clifford operation which we know stabilizes the incoming state anyway. Therefore we just set these rows of the symbolic bit-vector for this column equal to a wildcard value * .
The remaining columns of O| X are associated with the root ancillas of all stabilizers and the data qubits. These columns must be constrained exactly. In particular, the initial X stabilizer on the root ancilla must propagate to an X type Pauli operator supported on all ancillas (including the root) in its stabilizer, along with the data qubits included in that stabilizer. Therefore for each i ∈ A, if IsRoot i = 1, we must have the i th column of O XX set equal to the vector supported on all the root and non-root ancillas and data qubits of this stabilizer. This vector can be constructed symbolically using the IsRoot i , IsFlag i variables to determine whether it is supported on an A-qubit. Furthermore, for all the data qubits i ∈ [n] \ A, there should be no propagation of X errors from the data qubits onto the A-qubits, as the data qubits are always the target of CNOT gates. Therefore for each i ∈ [n] \ A, the ith column of O XX should be 0 except for a 1 in the ith row.
We have now explained how all of the entries of O would be determined, up to right-multiplication by arbitrary Clifford stabilizers of the incoming state. We do not have a particular setting of the qubit roles as these are determined by {IsFlag i } and {IsRoot i }. We handle this by replacing the constraint Eq. (17), with a SMT formula we label HasDesiredEffect. This is easy to construct by going column-by-column through the restricted bit-matrix columns of C| X . For the ith column, we construct a symbolic bit-vector which is determined symbolically by IsRoot i and IsFlag i as discussed above using standard If-Then-Else support from the SMT solver. We then set the non-wildcard rows of this symbolic bit-vector equal to the corresponding rows of the symbolic bit-matrix of the entire circuit as computed by the product-sum formula (Eq. (16)).
Finally we use the techniques of Section II B 5 to add v-flag constraints to the SMT problem. Recall that we can produce the SMT formula IsVFlag in its entirety using the function shown in Eq. (31), and providing the appropriate {X ij }, {P i }, and {IsRoot i } variables for the relevant protocol as inputs. The final SMT decision problem for designing the circuit is then given by (34) This formula F SMT is then fed into an off-the-shelf SMT solver such as Z3 [19]. For large problem instances we apply the   Fig. 7 of Ref. [18] with corresponding protocols in this work shown in Fig. 4 iterative solving techniques of Section II C to speed up the symbolic construction of F SMT and the time taken by the solver to find a solution.
We remark that using the methods described above, the circuit H Fig. 4 has low degree and is obtained systematically. In Fig. 7 (a) of Ref. [18], a similar alternative circuit to H (3) m was obtained "by hand" and is more highly structured. However, this alternative circuit has much higher degree (see Table II). SMT solvers offer a systematic alternative to handdesign when finding optimal low degree circuits conforming to realistic hardware constraints. Such optimal solutions might have less apparent structure than those that are hand-designed. Fortunately, SMT solvers can automatically generate proofs of optimality with respect to protocol depth, degree, etc. For example, we used Z3 to prove that the protocols shown in Fig. 4 have the minimum possible number of timesteps for a degree 3 qubit interaction graph given the geometric and other problem constraints.

C. Complexity of finding v-flag circuits using SMT solvers
In the preceding sections we explained several optimizations for the encoding of a fault tolerant quantum circuit design problem into an SMT decision formula F SMT . These optimizations reduce the formula size significantly. For example, as explained in Section II A 3, the product-sum relation allows us to build formulas with size scaling in the number of time steps N , rather than the (much larger) total number of gates g. For the design of v-flag syndrome extraction circuits of a distance d code, these optimizations can ensure that These formulas therefore have polynomial size whenever v is a constant. The naive brute-force algorithm to solve F SMT has runtime O(2 size(F SMT) ). SMT solvers cannot provide a better runtime guarantee, so our worst-case complexity is still which is doubly-exponential if v = Ω(d). However, despite the doubly-exponential runtime in the worst case, we found that performance is "good enough" for typical problem instances such as small code distances relevant for near-term quantum devices. Our suite of optimizations allowed us to solve for stabilizer and H (d) m measurement circuits for color code distances up to d = 7 with reasonable runtime. As a benchmark, we ran our solver for a 90 qubit system of a distance 5 code to co-design the following three circuits: m circuit, with a CNOT depth of 14.
3. An additional 1-flag syndrome extraction circuit for the merged surface-color code described in Section IV, with a CNOT depth of 9. Such 1-flag circuits are required due to the weight-6 stabilizers along the boundary of the surface code and color code.
This design problem has 44 data qubits and 46 ancilla qubits and uses distance d = 5 codes. In what follows, the numerics were obtained by running the solver on specialized z1d.metal Amazon Web Services (AWS) EC2 instances, which provide an all-core sustained clock frequency of up to 4.0 GHz. Our solver constructs and solves F SMT to find all three circuits in 1 h 14 min with a degree constraint limiting each qubit to interacting with at most 3 nearest-neighbor qubits. This time reduces to just 58 minutes when the degree constraint is loosened to 4.
The heuristics used by the SMT solver (Z3 [19]) are extremely effective compared to the brute-force strategy of guessing every possible circuit. In the H (d) m subproblem described above, there are 141 possible CNOT gates at each timestep, for a total of 1974 possible CNOT gates in the circuit. The number of possible circuits is therefore at least 2 1974 , which is significantly larger than the number of atoms in the known universe. As such, the heuristics of the solver are hugely beneficial. One possible reason for the effectiveness of SMT solvers on our SMT formulas is that all of our variables are over finite (Boolean) domains and we do not use quantifiers such as "∃" and "∀" . The problem of deciding whether an SMT formula containing such features is satisfiable is undecidable in general. To illustrate the challenge presented by such formulas, consider the SMT formula (a 100 +b 100 = c 100 )∧(a > 1∧b > 1∧c > 1) where the variables a, b, c range over the integers. This formula is unsatisfiable, as implied by Fermat's last theorem [49]. Nonetheless, it would be extremely surprising if contemporary SMT solvers could generate a proof of the unsatisfiability of this formula in any reasonable amount of time.
We add that it is not always necessary to scale the code distance d to obtain a family of schemes which is ultimately fault tolerant. For example, the error detection scheme for H (d) m would have an unacceptably high rejection rate for code distances d ≥ 9 due to the very large number of fault locations. Therefore in practical settings, |H -type magic states would be prepared using our method for distances d ≤ 7. Such states would then be teleported to states encoded in the surface code (see Section IV below). Afterwards, a magic state encoded in the surface code would be grown to a larger code distance using gauge fixing and lattice surgery methods. Lastly, a topdown magic state distillation protocol would be used to further increase the fidelity of the state. Therefore, to limit the cost of the top-down protocols, it is crucial that the injected magic states have the highest possible fidelities, warranting the use of sophisticated methodologies such as SMT solvers.

A. Motivation
In Section III we provided tools to find circuits that can be used to fault-tolerantly prepare |H -type magic states encoded in the color code. However, a limitation of such protocols is that the final magic states are encoded in the color code rather than the surface code. As previously discussed, the surface code offers much better performance for quantum memory and computation implemented via lattice surgery. In order to ensure that the prepared magic states can be used in a competitive scheme for universal fault-tolerant quantum computation, in this section we consider a protocol for teleporting states encoded in the color code into states encoded in the surface code. The protocol requires the use of lattice surgery techniques [2,47,[50][51][52][53], where gauge fixing results in a merged surface/color code, after-which the codes are split up again to terminate the two-qubit teleportation step. Our approach is an adaptation of the technique proposed in [54], although there are several key differences and extensions. Firstly, we do not require additional data qubits for the code obtained by merging the surface code and color code (which we refer to as the merged code). Secondly, we explicitly construct a decoder for the the merged code (an aspect which was not addressed in [54]).
In Ref. [55], the authors describe a decoding algorithm for "hybrid color-toric" (HCT) codes constructed by local modifications to 2D color codes that create localized ball-like toric code regions. These HCT codes and their associated decoding algorithm differ from the merged color-surface code that we consider in the work (see Algorithm 2). In the HCT codes in Ref. [55] there is a correspondence between the vertices of the syndrome lattice of the HCT code and the vertices of the "original" unmodified color code lattice. This allows their local lift procedure (see [55], Appendix G) to be described in terms of the standard local lift of the color code introduced by [56], as it would be applied to the vertices of the original unmodified color code. In our hybrid code, there is no such correspondence, so the HCT local lift is not well-defined and we must define the lifting procedure from scratch. Furthermore, the HCT decoder would need to be modified in order to be used for codes with boundaries such as our code; the authors of [55] suggest but do not explicitly describe how this would be done. Figure 5 depicts the teleportation circuit for transferring a state encoded in the color code to a state encoded in the surface code. To implement this circuit fault-tolerantly using the hardware constraints mentioned above, lattice surgery must be used to measure the two-qubit X ⊗ X operator. After measuring X ⊗ X, repeated rounds of error correction must be performed on the merged code in order to prevent both timelike and spacelike errors from giving the wrong parity of the measurement outcome. As such, a decoder is required to process the stabilizer syndrome measurements of the merged code during the lattice surgery protocol. When performing gauge fixing to merge the color code and surface codes, the X-operator measurements at the boundaries of the two-codes anticommute with some Z stabilizers of the original codes, causing them to merge into elongated stabilizers. The stabilizers of the unmerged codes, and the stabilizers of the merged code (including merging operators) are shown in Figure 6. We also note that the theoretical framework of Ref. [23] can be used to obtain the semitransparent domain walls (i.e. stabilizer generators between the surface code and color code) allowing one to perform lattice surgery between the color code and surface code.
In Section IV B, we introduce some notation and provide some definitions. In Section IV C, we provide the details of our decoding algorithm for the merged code.

B. Notation and Definitions
In what follows, we set L ≥ 3 to be odd. A triangular color code (CC) of distance d = L is a CSS code with identical X and Z stabilizers where the data qubits can be placed at the vertices of a two-dimensional hexagonal lattice. We consider an orientation as shown in Figure 6 (a) and (c), right side (for distance 5). We have a 3-coloring of red (R), green (G), and blue (B) tiles corresponding to stabilizer generators of the color code.
The surface code (SC) of distance d = L on a square lattice is a CSS code with differing X and Z stabilizer subgroups. The stabilizers for a d = 5 surface code are shown on the left-hand side of Figure 6 (a) and (c). We assign colors red (R) and blue (B) to the X and Z stabilizers of the surface code, respectively.
For P ∈ {X, Z}, we define the CC (SC) P -syndrome graph L (CC,P ) * (L (SC,P ) * ), whose vertices are associated with stabilizer measurement outcomes of P -type stabilizers of the color (surface) code. In addition, we add virtual boundary vertices corresponding to virtual stabilizers, whose color is indicated by its subscript. For the color code, we add three P -type virtual stabilizers to L (CC,P ) * : ; these are illustrated as isolated dots in Figure 6. The support of v (CC,P ) C is the set of qubits of the color code which are not contained in any (real) C-colored P -type stabilizers.
For the surface code, we add one virtual stabilizer to each of L (SC,X) * and L (SC,Z) * : the X-type red virtual stabilizer v (SC,X) R is added to L (SC,X) * and the Z-type blue virtual stabilizer v (SC,Z) B is added to L (SC,Z) * . The support of v (SC,X) R is the set of all surface code qubits that are contained in exactly one (real) X-type stabilizer, while the support of v is the set of all qubits which are contained in exactly one (real) Ztype stabilizer. Finally, we add edges to the syndrome graphs L (P,CC) * , L (P,SC) * between all pairs of stabilizers (including virtual stabilizers) sharing one or more common qubits.
We will now explain how gauge fixing is implemented to combine the SC and the CC into a single merged code. Consider the orientation shown in Figure 6 for a distance 5 surface code and color code. The rightmost weight-2 blue Z stabilizers are labeled α 1 , . . . , α (d−1)/2 from top to bottom. The leftmost weight-4 blue Z stabilizers are labeled β 1 , . . . , β (d−1)/2 from top to bottom. We also denote by ζ 1 , . . . , ζ (d+1)/2 the red X-type operators which are labeled in Figure 6 (b).
To produce a merged code from a surface code and color code, we gauge fix by measuring the ζ i operators. After performing the measurement, the ζ i operators are now red X stabilizers of the resulting merged code. Clearly ζ 1 anticommutes with both α 1 and β 1 but commutes with α 1 β 1 . Therefore upon measuring ζ 1 , we no longer have α 1 and β 1 as stabilizers, but the operator α 1 β 1 remains a stabilizer. We denote this new stabilizer by η 1 . By similar reasoning we see that all the stabilizers α i , β i are removed upon measuring all the merging operators ζ i , but their products η i := α i β i are left as new stabilizers of the merged code. These new η i stabilizers are shown in blue in Figure 6 (c). In what follows, we will refer to this new code as the merged code (MC).
For P ∈ {X, Z}, we define the MC syndrome graph L (M C,P ) * as follows: the vertices of L (M C,P ) * are associated with the P -type stabilizers of the MC. For the MC, we add the three P -type virtual stabilizers {v  is the set of surface code qubits which are contained in the support of exactly one red surface code stabilizer and no red, green, or blue stabilizers in the MC. This corresponds to the leftmost column of qubits in Figure 6.
Finally, we once again add edges to the syndrome graph L (M C,P ) * between all pairs of stabilizers (including virtual stabilizers) sharing one or more common qubits. The resulting syndrome graph edges are illustrated for L = 5 in Fig. 6. We also define a classification of the qubits of the MC as either face-qubits or edge-qubits. For a qubit q, its classification is determined by the number t of stabilizers (including virtual stabilizers) which contain q. By inspection, we either have t = 2 or t = 3. If t = 2, we say that q is an edge qubit. Otherwise t = 3 and we say that q is a face qubit. We define the 1-boundary denoted ∂ 2 q of a face qubit q, as the set of the 3 edges in L (M C,P ) * which connect the 3 stabilizers of q. Note that we must indeed have all three edges in L (M C,P ) * as any stabilizers sharing a qubit share an edge in L (M C,P ) * . We now extend this notion to a set of face-qubits. For a set of face-qubits S = {q 1 , . . . , q }, we define the 1-boundary denoted ∂ 2 S as in which ⊕ denotes the symmetric difference of sets, and ∂ 2 q is just the 1-boundary of the individual face qubit q. The edges e ∈ L (M C,P ) * are given a real-valued weight which we denote wt(e), which is set to either w 1 or w 2 , where w 1 , w 2 ∈ R. Intuitively, w 1 sets the weight of edges in the surface code, while w 2 sets the weight of edges in the color code. Specifically, if the stabilizers associated with vertices u and v share any face qubits, we set the weight of (u, v) to w 2 , and otherwise we set the weight to w 1 . Optimal values for w 1 and w 2 are found numerically by computing the logical error rates of the MC code for a given noise model. The X distance of MC is d X = L, and the Z distance is d Z = 2L. Note that this is an advantageous configuration for biased noise models where Z errors are more likely than X errors. Furthermore, the orientation can be swapped such that the X distance is larger in the case of X-biased noise.
For each P ∈ {X, Z}, we also define three restricted graphs L contains all vertices of color C ∈ {C 1 , C 2 } and edges between these vertices.
The restricted graphs will be used for the decoding algorithm as explained in Section IV C.

Algorithm 1 Produces a minimum weight
if u 1 ∈ A and u 2 ∈ A then 8: We will now define a few graph-theoretic notions which are used in our MC decoding algorithm. For a graph G = (V, E) with nonnegative edge weights E w − → R ≥0 , e → w(e), we recall that M ⊂ E is a perfect matching (PM) if each vertex v ∈ V has exactly one edge in the set M . More generally, for a subset A ⊂ V , we will say that M ⊂ E is an A−perfect matching (A-PM) if each vertex v ∈ A has exactly one edge in the set M . Note that this permits vertices v ∈ V \ A to have any number of edges in the A−perfect matching. Note also that a V −perfect matching is simply a perfect matching. Finally, we say that M ⊂ E is a minimum-weight A-perfect matching if for all A−perfect matchings M , e∈M w(e) ≤ e∈M w(e). There exists a polynomial-time algorithm [57] for finding a minimum weight perfect matching of a weighted graph, when one exists. Given a weighted graph G and vertex subset A as above, it is possible to construct a new graph G such that a minimum weight perfect matching of G can be used to recover a minimum weight A-perfect matching of G. This construction is illustrated for a small example in Fig. 7 and exploited by Algorithm 1, which uses the minimum weight perfect matching algorithm as a subroutine on Line 24.

C. Merged Surface-Color Code Decoding Algorithm
A decoding algorithm for triangular color code families was provided in Ref. [3]. We now provide a modified version of this decoder to handle effects of the surface code boundary thus making it compatible with the MC code. In this section we explain how our decoder works in detail. Our full decoding algorithm is then shown in Algorithm 2. For convenience, in this section we will refer to specific numbered lines of Algorithm 2.
Let E be a physical Pauli error operator on the data qubits with X and Z error syndromes S X (E Z ), S Z (E X ) respectively. That is, the set S P (E P ) is a subset of the real vertices of L (M C,P ) * . The X and Z syndromes will be decoded independently to produce Z and X correction operators E Z , E X , respectively. Hence, in what follows, let P ∈ {X, Z}. As in Line 2, fix P ∈ {X, Z} and P = P . Initialize the correction operator E P = ∅ as in Line 3. We will refer to the stabilizers contained in S P (E P E P ) as the marked stabilizers. Note that as the correction operator is empty at the start of the algorithm, the initial set of marked stabilizers is the same as the P -syndrome input to the decoder.
We decode S P (E P ) in two stages which we call the color code stage and the surface code stage. The color code stage corresponds to Line 4 through Line 40, while the surface code stage corresponds to Line 41 through Line 43.
The color code stage produces a partial correction E P which only contains color code qubits. We use this partial correction to update the P -syndrome marked stabilizers S P (E P E P ). At the end of the color code stage, no color code stabilizers are marked except possibly some of the η i . In contrast, surface code stabilizers may still be marked. This is because no surface code qubits are contained in the partial correction E P at this stage, and hence any initially-marked surface code stabilizers will remain marked at the end of the color code stage. To be clear, the partial correction E P at the end of the color code stage is stored in the software implementing Algorithm 2, and does not need to be actively applied to the physical data qubits.
After the color code stage, we run the surface code stage beginning on Line 41. The surface code stage is simpler than the color code stage. It makes some final modifications to the partial correction E P so that there are no marked MC stabilizers in the marked set S P (E P E P ). At this point the P correction E P is completed.
Algorithm 2 Produces Z and X corrections E Z , E X given X and Z syndromes S X (E Z ), S Z (E X ) 1: for P ∈ {X, Z} do 2: Let P ∈ {X, Z}, P = P 3: Initialize empty partial correction 4: Let A = S P (E P E P )

41:
A ← S P (E P E P )

42:
E P ← SurfaceCodeCorrection(A) 43: end for We will now explain the steps of the color code stage in detail. The first step is to produce three , along with a path Γ u,v,C for each pair (u, v) ∈ M C . This path Γ u,v,C joins u and v through the restricted graph L (M C,P ) * C1,C2 . The path Γ u,v,C must be legal, as specified by the following conditions: 2. Among the edges of Γ u,v,C , there is at most one edge which is incident to any virtual vertex.
3. There is at most one virtual vertex visited by Γ u,v,C .
4. If u and v are vertices in the color code (including all the η i vertices), then Γ u,v,C does not visit any vertices in the surface code.
We choose Γ u,v,C to be the minimum weight legal path, which is found by the subroutine MinWeightLegalPath in Line 11. Specifically, MinWeightLegalPath returns the minimum weight legal path if one exists, and returns the placeholder symbol ⊥ if no path exists satisfying the above conditions. The subroutine MinWeightLegalPath can be easily implemented by modifying Dijkstra's pathfinding algorithm to take into account the legality conditions above. Specifically, during the Dijkstra search, the virtual vertices should be treated as having zero out-edges, and if u and v are color code vertices as defined above, then all edges to surface code vertices are ignored. The colored pairings (M C , {Γ u,v,C : (u, v) ∈ M C }) enable us to recover a set of qubits in the color code which we add to the partial correction E P . Specifically, we define a subroutine (Lift), which is applied at some of the vertices visited by the pairing paths. The Lift subroutine has arguments Lift(u, Γ).
Here, u is a real (non-virtual) vertex of the graph L . The Lift subroutine returns a set of qubits, as follows. If u is a real vertex of the surface code, then the Lift subroutine returns the empty set. Otherwise u is vertex of the color code and Lift(u, Γ) returns a set of face-qubits to add to E P . Specifically, let Γ| u denote the set of edges contained in Γ which are incident to the vertex u. Then we have that (We remind the reader that ∂ 2 denotes the 1-boundary as we have defined in Section IV B.) The implementation of the Lift subroutine is easily and efficiently implemented using bruteforce search or linear algebra techniques, as described in [3]. Note that above we have not defined the behavior of the Lift subroutine for virtual vertices v (M C,P ) C . As we will explain, we take care never to apply the Lift subroutine to any such virtual vertices in our decoding algorithm, and so this is not an issue. As a cautionary note, please note that above we have used Γ as a placeholder for a general subset of edges through the graph L (M C,P ) * , and Γ is not to be confused with the specific symbol Γ u,v,C which specifically denotes the minimum-weight legal path connecting u and v as we have just described above. In fact, it is impossible to correctly Lift a vertex u with the path Γ = Γ u,v,C , as for this path there would be no subset of facequbits such that Eq. (38) is satisfied. However in our algorithm The weights on edges incident to vertices in V \ A are omitted as these edges can be removed from any A-perfect matching without increasing its total weight. When applying this algorithm to the matching graph of the surface code, all vertices in V \ A correspond to virtual boundary vertices (and thus all edges within V \ A have zero weight). (b) A new graph is assembled by replacing all vertices in V \ A with a single new vertex q. The edges between A and V \ A are replaced with some edges from q to vertices in A, as follows. For each vertex v ∈ A with an edge to V \ A in G, we add an edge (q, v), setting the weight of this new edge to the minimum of all edge weights between v and V \ A in G. This is analogous to the computation of q(v) in Line 3. (c) For each edge (v1, v2) ∈ E with v1, v2 ∈ A, such that w(v1, u) + w(v2, u) < w(v1, v2), we update the edge weight to w(v1, u) + w(v2, u) and set the label function L(v1, v2) to 1 (Line 11) (indicated by the purple weight). (d) A minimum-weight perfect matching M of the graph is computed (in this example, resulting in highlighted edges of weight 6 and 1). (e) A minimum-weight A-perfect matching M is recovered by replacing all edges e ∈ M such that L(e) = 1 with the two minimum-weight edges from the endpoints of e to V \ A (corresponding to n(v) as set in Line 4).  the Lift subroutine is always used in such a way that it has a valid output satisfying Eq. (38).
In our algorithm we use Lift on Line 31 and Line 39 as follows: Above, the parity symbol ⊕ denotes the symmetric difference of the two sets E P and Lift(u, Γ). That is, we update the partial correction to be the set of qubits which are contained in exactly one of E P or Lift(u, Γ).
We must apply the Lift subroutine carefully. As shown in [3], we must avoid lifting at virtual vertices because it decreases the effective distance of the decoder. To avoid lifting at virtual vertices we pre-process the paths which visit the green virtual stabilizer v . This pre-processing step takes place in Line 23 through Line 37. To explain this pre-processing step we will define the notion of a boundary-connected component (BCC). These components are sequences of colored pairing paths which begin on the green virtual stabilizer v for which Additionally, the pairs (v i , χ i ) are unique across the BCCs.
is any other BCC of length , and i ∈ 2, . . . , − 1 and j ∈ 2, . . . , − 1, then we have of the BCC. Following this implementation, the BCCs will satisfy the above properties.
We now iterate over each BCC θ = ({v i }, {χ i }) in Line 23 through Line 37. In Line 24 we declare the path (that is, the set of edges) along the minimum weight legal paths between subsequent v i . That is, we set We then assign θ a color denoted by color(θ) ∈ {R, B} (Line 25). This is done so that Γ θ never visits the virtual vertex v (M C,P ) color(θ) . We apply the Lift subroutine at each vertex visited by Γ θ with color equal to color(θ). Recall that by definition, the Lift subroutine returns the empty set when passed a surface code vertex. However, if the color of the BCC is set to red and its path Γ θ crosses between the color and surface code, then we must do some variant of the lift subroutine to find an appropriate correction in the vicinity of these red surface code vertices. This is achieved by the subroutine SCRedLift(Γ θ ). An example of applying SCRedLift is shown for illustration in Fig. 8. Formally, the SCRedLift subroutine finds the ordered sequence of edges e 1 , e 2 , . . . , e 2m−1 , e 2m ∈ Γ θ which are incident to both surface code and color code stabilizer, and are ordered by increasing vertical position in the 2D layout. There must be an even number of such edges, since by construction the path Γ θ must connect between v , so Γ θ must move from the color code to the surface code an even number of times. Now for two such crossing edges (e i , e i+1 ), let SandwichedQubits(e i , e i+1 ) denote the set of qubits which line between e i and e i+1 , and are contained in the intersection of the color code X-stabilizers and the ζ i X-stabilizers of the MC code. Then SCRedLift(Γ θ ) returns the union of these sandwiched qubits across all the pairs: SandwichedQubits(e 2i−1 , e 2i ) (41) After applying the appropriate Lift subroutine calls (Line 31) and SCRedLift subroutine calls (Line 34) as appropriate, we can remove the BCC paths Γ θ from the combined pairing paths (Line 36). After finishing this for all the BCCs θ we are finished with the pre-processing step.
By carefully processing the BCCs as described and updating the partial correction E P with lifts, we avoided lifting at v in the pre-processing step, we will not apply Lift at any virtual stabilizers in this final step.
After the color code stage is concluded on Line 40, we run the surface code stage on Line 41 through Line 43. First we update the set of marked vertices A = S P (E P E P ) based on the partial P -type correction produced by the previous stage (Line 41). We then obtain a set of qubits SurfaceCodeCorrection(A) using the standard surface code decoder on Line 42. That is, we set V SC = A ∪ {v }, and for each pair (u, v) ∈ V SC × V SC , such that u = v, we set Γ u,v to be the minimum weight path which joins u and v through L (M C,P ) * | SC∪{ηi} (or else ⊥ if none exists), and set w(u, v) = e∈Γu,v wt(e). We let G SC = (V SC , E SC , w) be a weighted graph. We then find a minimum weight A-perfect matching M SC of the graph G SC , and for each (u, v) ∈ M SC , we set E P ← E P ⊕ Γ u,v . This concludes the surface code stage. After this point there are no more marked vertices and E P is completed.
By avoiding all lifts at virtual vertices our decoder obtains good performance as shown by our numerics in Section IV D. Specifically, we maintain the full effective X distance d X of the surface code decoder, and the combined effective Z distance 1 + 2 3 d Z of the surface code decoder along with the color code decoder ( [3]).

D. MC code capacity simulation results
In this section, we describe Monte Carlo simulation results of the MC code under a code-capacity depolarizing noise model where data qubits are afflicted by X, Y and Z errors, each occurring with the same probability p. Data for the logical X and Z failure rates of the MC code for various values of L are shown in Fig. 9.
By carefully analyzing the data, we find the threshold for logical X and Z errors to be approximately p th ≈ 0.13. However, for many noise parameter regimes, the logical Z error rate is substantially lower than the logical X error rate. In particular, for both logical X and Z error rates, we performed a best-fit analysis to obtain logical error rate curves p (X) L and p (Z) L using the following ansatz: where L is shown in Fig. 9, a, b and c are parameters obtained by the fit. As such, the parameter c describes the effective d x and d z distances, i.e. the minimum-weight X and Z errors which can cause a logical fault. The best fit parameters are given in Table III. As can be seen from the parameter c in Table III, the effective d x distance is roughly half the effective d z distance. The reason is that horizontal Z error chains which can result in a logical Z error must span a length of size 2L. In contrast, vertical X error chains which can cause a logical X error must span a length of size L.

V. CONCLUSION
In this paper, we showed in Section II how Clifford circuits can be designed where the desired constraints (such as certain fault-tolerance proprieties, degree of connectivity between the qubits etc) can be formulated as an SMT decision problem. We provided several examples of how an SMT formula's, which evaluates to a Boolean value, can be formulated for various Clifford circuits.
In Section III, we applied our SMT formalism to derive fault-tolerant flag-based circuits to prepare |H -type magic states encoded in the color code. In particular, we discussed how v-flag circuits can be derived with the added constraint that qubits must interact via nearest neighbors with low degree connectivity constraints. Examples of such circuits for d = 3 color codes were provided in Fig. 4. A clear direction of future work would be to obtain such circuits for larger code distances, and optimize iterative solving techniques described in Section II C to reduce the computation time required to find a solution.
Lastly, in Section IV, we considered converting states encoded in the color code to states encoded in the surface code. Performing such conversions was motivated by the fact that surface codes are much better suitable candidates for implementing algorithms via lattice surgery, while at the same time, color codes are particularly well suited for fault-tolerantly preparing magic states.
To convert color codes to surface codes, we provided a decoding algorithm for a code obtained when merging the color code with the surface code via lattice surgery, an integral part of the teleportation step. We then analyzed the performance of the merged code for code capacity noise. A direction of future work would be to extend our decoder to be compatible with lattice surgery protocols, as was done for instance in Ref. [51] and analyze the final logical error rates of the prepared magic states under a full circuit level noise model.
We close with some further suggested directions for designing quantum circuits with SMT solvers. We mention that there is room to optimize our encoding and solver techniques further. For example, using a variant of the Strassen algorithm [58], one could reduce the number of costly symbolic bit matrix multiplications when constructing F SMT . SMT solvers support a wide variety of strategies which should be fine-tuned to get the best solver performance. It is also possible that a simpler standalone algorithm could supplant the use of SMT solvers for certain design problems. Lastly, in this work, we only considered the design of deterministic Clifford-like circuits, i.e., the class of unitary circuits that are equivalent to Clifford circuits up to conjugation by local unitaries. It would be interesting if our techniques can be extended beyond this restricted circuit class to nondeterministic and / or non-Clifford circuits. For circuits that are dominated by Clifford gates but that contain a relatively small number of non-Clifford operations, the algorithm of Ref. [59] could be used as a starting point for encoding in an SMT decision problem.

VI. ACKNOWLEDGMENTS
We thank Markus Kesselring for his comments on our manuscript and for pointing out the connection between our lattice surgery methods and the use of domain walls between the color code and surface code.