Improved single-shot decoding of higher dimensional hypergraph product codes

In this work we study the single-shot performance of higher dimensional hypergraph product codes decoded using belief-propagation and ordered-statistics decoding [Panteleev and Kalachev, 2021]. We find that decoding data qubit and syndrome measurement errors together in a single stage leads to single-shot thresholds that greatly exceed all previously observed single-shot thresholds for these codes. For the 3D toric code and a phenomenological noise model, our results are consistent with a sustainable threshold of 7.1% for $Z$ errors, compared to the threshold of 2.90% previously found using a two-stage decoder~[Quintavalle et al., 2021]. For the 4D toric code, for which both $X$ and $Z$ error correction is single-shot, our results are consistent with a sustainable single-shot threshold of 4.3%, which is even higher than the threshold of 2.93% for the 2D toric code for the same noise model but using $L$ rounds of stabiliser measurement. We also explore the performance of balanced product and 4D hypergraph product codes, which we show lead to a reduction in qubit overhead compared to the surface code for phenomenological error rates as high as 1%.


I. INTRODUCTION
Quantum error correction will be necessary to protect quantum computers from errors due to unwanted interactions with the environment. One of the most promising error correcting codes is the surface code, which can be implemented using stabiliser measurements that are geometrically local in two-dimensional planar architectures, and has a high threshold for realistic noise models. However, for the surface code, stabiliser measurements must be repeated many times to handle the noise that is inevitably present in the stabiliser measurements themselves. Furthermore, one surface code patch only encodes a single logical qubit (k = 1) in n physical qubits, leading to a vanishing encoding rate k/n as the block length n is increased [2]. These limitations lead to large time and space (qubit) overheads, with daunting resource requirements for realising fault-tolerant quantum computation [3].
The time overhead for fault-tolerant quantum computation can be significantly reduced by instead using single-shot quantum error correction, introduced by Bombín [4]. With single-shot quantum error correction, only a single round of stabiliser measurements is sufficient when decoding, even in the presence of measurement errors. Codes that have been shown to support single-shot error correction include the 3D gauge color code [4,5], 3D toric and 3D hypergraph product codes (for Z noise) [6-8], 4D toric and 4D hypergraph product codes [9-11], 4D hyperbolic codes [12], 3D subsystem toric codes [13] and quantum expander codes [14]. More generally, Quintavalle et al. defined a property called confinement, and showed that codes satisfying this property are single-shot under an adversarial noise model, with linear confinement sufficient for single-shot error correction under a stochastic noise model [8]. Of particular interest are single-shot codes that also have improved code parameters relative to the surface code [8,10-12,14], such as hypergraph product codes [10,11,15], which have a finite encoding rate as well as a minimum distance growing as $\Theta(\sqrt{n})$.
In this work, we focus on the problem of single-shot decoding of 3D and 4D toric codes and 4D hypergraph product codes. We find that decoding data qubit and syndrome measurement errors together in a single stage using belief propagation and ordered statistics decoding (BP+OSD) leads to very high thresholds. Using our decoder, we find a threshold of 7.1% for the 3D toric code with a phenomenological noise model, far exceeding the highest previously reported threshold of 2.90% found using a two-stage decoder, where syndrome measurement errors are repaired first using minimum-weight perfect matching (MWPM), before correcting data qubit errors using BP+OSD [8]. Another approach for single-shot decoding is using a cellular automaton, the sweep rule, and this approach was found to have a threshold of 1.7% for the 3D toric code [6,7].
Previous work on single-shot decoding of the 4D toric code has mainly considered local decoders, including cellular automaton decoders, as well as a decoder due to Hastings [16], which was found in Ref. [9] to have a threshold of 1.59% for the 4D toric code with a phenomenological noise model. We instead find a threshold of 4.3% using single stage BP+OSD decoding which, remarkably, even exceeds the threshold of 2.93% for the 2D toric code with the same noise model but using L rounds of stabiliser measurements [17].

II. STABILISER CODES
A stabiliser code is defined as the common +1-eigenspace of the elements of its stabiliser group S, which is an abelian subgroup of the Pauli group $P_n$ on n qubits that does not contain the element −I. The centraliser C(S) of the stabiliser group is the set of Pauli operators that commute with every element of S. The weight |P| of a Pauli operator $P \in P_n$ is the number of qubits on which it acts nontrivially. Elements of C(S) \ S are nontrivial logical operators, and the minimum distance of the stabiliser code is the minimum weight of any Pauli operator in C(S) \ S.
We perform error correction by measuring a set of generators {s 0 , s 1 , . . . , s r−1 } of the stabiliser group S, and we call this set of generators the check operators. Given an error E ∈ P n and a choice of check operators, the syndrome map σ(E) gives the measurement outcome of each check operator. For a given stabiliser code, the reduced weight |P | red of a Pauli operator P ∈ P n is the minimum weight |Q| of any Pauli Q ∈ P n with the same syndrome σ(P ) = σ(Q). Since each Pauli operator can be factorised as the product of a Z-type Pauli operator in {I, Z} n (which we call its Z component) and an X-type Pauli operator in {I, X} n (which we call its X component), we can represent any Pauli operator P (up to phases) as a binary vector p = (p Z , p X ) ∈ F 2n 2 , where the ith element of p Z is nonzero only if the Z component of P acts nontrivially on the ith qubit, and likewise the ith component of p X is nonzero only if the X component of P acts nontrivially on the ith qubit. Multiplication of Pauli operators is equivalent to addition (modulo 2) of their binary representations, and we can represent the stabiliser group of a stabiliser code as a parity check matrix H = (H Z , H X ), where the ith row of H is the binary representation of the ith generator of S. Given an error E ∈ P n with binary representation e, the binary representation of its syndrome s is given by the matrix vector product s = He.
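As a small illustration of this bookkeeping, the sketch below (a toy example, not the decoder used in this work) writes out what the product $s = He$ encodes: a check with binary representation $(h_Z, h_X)$ anticommutes with an error $(e_Z, e_X)$ exactly when the symplectic pairing $h_Z \cdot e_X + h_X \cdot e_Z$ is 1 mod 2, i.e. the error's X component is paired against the Z components of the checks and vice versa.

```python
import numpy as np

def syndrome(H_Z, H_X, e_Z, e_X):
    # A check (h_Z, h_X) anticommutes with a Pauli error (e_Z, e_X) iff
    # h_Z . e_X + h_X . e_Z = 1 (mod 2): the binary symplectic product.
    return (H_Z @ e_X + H_X @ e_Z) % 2

# Toy example: two Z-type checks Z1Z2 and Z2Z3 on three qubits detect X errors.
H_Z = np.array([[1, 1, 0], [0, 1, 1]])
H_X = np.zeros((2, 3), dtype=int)
e_X = np.array([1, 0, 0])          # X error on the first qubit
e_Z = np.zeros(3, dtype=int)
print(syndrome(H_Z, H_X, e_Z, e_X))  # [1 0]: only the first check fires
```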
A Calderbank-Shor-Steane (CSS) stabiliser code has a stabiliser group that can be generated by check operators that are either X-type or Z-type Pauli operators. As a result, its parity check matrix can be written in the form

$H = \begin{pmatrix} H_Z & 0 \\ 0 & H_X \end{pmatrix}$ (1)

where $H_Z$ and $H_X$ each have n columns and are called the Z and X parity check matrices, respectively. The requirement that the stabiliser group be abelian can now be expressed as the condition

$H_X H_Z^T = 0.$ (2)
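For instance, the Steane code is a CSS code in which both $H_X$ and $H_Z$ are the check matrix of the [7,4] Hamming code; condition (2) is then a one-line check (a sketch, using one standard form of the Hamming matrix):

```python
import numpy as np

# Check matrix of the [7,4] Hamming code; the Steane code uses it
# for both the X and Z parity check matrices.
H_hamming = np.array([[0, 0, 0, 1, 1, 1, 1],
                      [0, 1, 1, 0, 0, 1, 1],
                      [1, 0, 1, 0, 1, 0, 1]])
H_X = H_Z = H_hamming
# Condition (2): every X-type check commutes with every Z-type check.
print((H_X @ H_Z.T) % 2)  # the all-zeros 3x3 matrix
```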

III. CHAIN COMPLEXES AND F2-HOMOLOGY
We will now introduce chain complexes and $F_2$-homology, which we will see later are useful tools for the study of CSS codes. For a more comprehensive introduction to chain complexes, and their use in constructions of quantum LDPC codes, we refer the reader to Refs. [18,19]. In $F_2$-homology, a chain complex C with length l is a collection of vector spaces $C_0, C_1, \ldots, C_l$ and boundary maps $\partial_i : C_i \to C_{i-1}$, where the boundary maps satisfy the constraint

$\partial_i \partial_{i+1} = 0$ (3)

for all $i \in \{0, \ldots, l\}$. We refer to elements of $C_i$ as i-cells, and call elements of $\operatorname{im} \partial_{i+1}$ and $\ker \partial_i$ boundaries and cycles, respectively. From Equation (3) we know that $\operatorname{im} \partial_{i+1} \subseteq \ker \partial_i$ (every boundary is a cycle). However, not every cycle is a boundary, and the ith homology group of C is the quotient

$H_i = \ker \partial_i / \operatorname{im} \partial_{i+1}.$

Associated with C is another chain complex called a cochain complex, with coboundary operators $\delta^i : C_i \to C_{i+1}$ given by $\delta^i = \partial_{i+1}^T$. Elements of $\ker \delta^i$ and $\operatorname{im} \delta^{i-1}$ are cocycles and coboundaries respectively, and the ith cohomology group is $H^i = \ker \delta^i / \operatorname{im} \delta^{i-1}$.

IV. HYPERGRAPH PRODUCT CODES
The commutativity condition for CSS codes, Equation (2), is equivalent to the defining property of chain complexes in $F_2$-homology, given in Equation (3). A consequence of this is that we can use a chain complex with length at least two to define a CSS code. We associate each qubit with an i-cell, where 0 < i < l. We then use the ith boundary operator as the X check matrix $H_X = \partial_i$ and use the ith coboundary operator as the Z check matrix $H_Z = \delta^i = \partial_{i+1}^T$. The commutativity condition $H_X H_Z^T = 0$ is then guaranteed to be satisfied by Equation (3). The Z logicals are associated with elements of the ith homology group $H_i$, and the X logicals are associated with elements of the ith cohomology group $H^i$. The number of logical qubits is therefore given by $k = \dim H_i = \dim H^i$.
One method of constructing chain complexes of length at least 2 is by taking the tensor product of length-1 chain complexes. The tensor product $B \otimes C$ of two complexes B and C, with lengths $l_B$ and $l_C$ respectively, is a chain complex A with length $l_A = l_B + l_C$. This product complex A is a collection of vector spaces $A_0, A_1, \ldots, A_{l_A}$ for which

$A_i = \bigoplus_{j+k=i} B_j \otimes C_k,$

and where the action of a boundary operator $\partial_i^A$ on an element $b \otimes c \in B_j \otimes C_k$ (with $j + k = i$) is given by

$\partial_i^A (b \otimes c) = (\partial_j^B b) \otimes c + b \otimes (\partial_k^C c).$

Here $\partial_j^B$ and $\partial_k^C$ are boundary operators acting in B and C respectively.
The homology groups of the product complex are given by the Künneth theorem,

$H_i(B \otimes C) \cong \bigoplus_{j+k=i} H_j(B) \otimes H_k(C),$

and therefore the rank $r_i^A$ of the ith homology group of A is given by

$r_i^A = \sum_{j+k=i} r_j^B r_k^C.$

The parity check matrix H of a linear code can be considered the boundary map $\partial_1$ of a length-one chain complex C(H). Given parity check matrices $H_a$, $H_b$ of two linear codes, we can construct the hypergraph product code $HGP(H_a, H_b)$ by taking the tensor product $C(H_a) \otimes C(H_b^T)$ to obtain a length-two chain complex, with which we associate qubits with 1-cells, Z-stabilisers with 2-cells and X-stabilisers with 0-cells. Using linear rate and linear distance LDPC codes for $H_a$ and $H_b$, Tillich and Zémor showed that the quantum LDPC code $HGP(H_a, H_b)$ on n qubits has linear rate and distance $\Omega(\sqrt{n})$ [15]. Specifically, if the check matrices $H_a$ and $H_b$ have dimensions $n_a^T \times n_a$ and $n_b^T \times n_b$ respectively, then $HGP(H_a, H_b)$ will have $n_a n_b + n_a^T n_b^T$ physical qubits. We will denote the nullity of $H_a$, $H_b$, $H_a^T$ and $H_b^T$ by $k_a$, $k_b$, $k_a^T$ and $k_b^T$, respectively. From the Künneth theorem, the number of logical qubits in $HGP(H_a, H_b)$ is $k_a k_b + k_a^T k_b^T$. We will denote the distance of the codes $\ker H_a$ and $\ker H_b$ by $d_a$ and $d_b$, respectively, and the distances of $\ker H_a^T$ and $\ker H_b^T$ by $d_a^T$ and $d_b^T$. The distance of $HGP(H_a, H_b)$ is then given by $\min(d_a, d_b, d_a^T, d_b^T)$. The 2D toric code is the hypergraph product of two repetition codes, and as a result hypergraph product codes can be seen as a generalisation of the toric code.
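The construction can be written down directly in terms of Kronecker products. The sketch below uses one common convention (consistent with, though not necessarily identical to, the cell labelling above): $H_X = (H_a \otimes I,\; I \otimes H_b^T)$ and $H_Z = (I \otimes H_b,\; H_a^T \otimes I)$, for which $H_X H_Z^T = H_a \otimes H_b^T + H_a \otimes H_b^T = 0$ over $F_2$.

```python
import numpy as np

def gf2_rank(M):
    """Rank of a binary matrix over F_2 (naive Gaussian elimination)."""
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        piv = next((i for i in range(rank, M.shape[0]) if M[i, col]), None)
        if piv is None:
            continue
        M[[rank, piv]] = M[[piv, rank]]
        for i in range(M.shape[0]):
            if i != rank and M[i, col]:
                M[i] = (M[i] + M[rank]) % 2
        rank += 1
    return rank

def hgp(Ha, Hb):
    """Check matrices of HGP(Ha, Hb) via Kronecker products (one common convention)."""
    (ma, na), (mb, nb) = Ha.shape, Hb.shape
    HX = np.hstack([np.kron(Ha, np.eye(nb, dtype=int)),
                    np.kron(np.eye(ma, dtype=int), Hb.T)]) % 2
    HZ = np.hstack([np.kron(np.eye(na, dtype=int), Hb),
                    np.kron(Ha.T, np.eye(mb, dtype=int))]) % 2
    assert not ((HX @ HZ.T) % 2).any()  # commutativity, Equation (2)
    return HX, HZ

# The product of two cyclic repetition codes gives the 2D toric code.
L = 3
H_rep = (np.eye(L, dtype=int) + np.roll(np.eye(L, dtype=int), 1, axis=1)) % 2
HX, HZ = hgp(H_rep, H_rep)
n = HX.shape[1]
k = n - gf2_rank(HX) - gf2_rank(HZ)
print(n, k)  # 18 2 -- the [[2L^2, 2, L]] toric code for L = 3
```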
We can construct higher-dimensional hypergraph product codes by taking the product of more than two chain complexes. Taking the hypergraph product of D repetition code chain complexes we obtain a complex

$C_{\mathrm{toric}}^D = C(H_{\mathrm{rep}})^{\otimes D}$

from which a D-dimensional toric code can be derived [11]. A D-dimensional toric code $T_i^D$ is defined by associating a qubit with each i-cell of $C_{\mathrm{toric}}^D$, a Z-stabiliser with each (i+1)-cell, and an X-stabiliser with each (i−1)-cell. Hence, we can define D−1 unique toric codes for each $C_{\mathrm{toric}}^D$, one for each 0 < i < D. We define the distance $d_i(C)$ of the ith homology group of a chain complex C to be the minimum Hamming weight of any nontrivial element of the group, and define its distance to be infinite if the homology group has zero rank. In Ref. [11], it was shown that the distance $d_i(A \otimes B)$ of the ith homology group of the product of a complex A (of any length) with a length-one complex B is given by

$d_i(A \otimes B) = \min(d_i(A)\, d_0(B),\; d_{i-1}(A)\, d_1(B)).$ (10)

Consider the chain complex obtained by taking the product

$C_{a,b} = C(H)^{\otimes a} \otimes C(H^T)^{\otimes b},$

where H is a full rank $r \times c$ parity check matrix of a linear code $\ker H$. The ath homology group of $C_{a,b}$ is the only nontrivial homology group, with rank $\kappa^{a+b}$, where $\kappa = c - r$ is the nullity of H [11]. If the distance of the code $\ker H$ is d, then a direct application of Equation (10) can be used to show that the distance of a CSS code defined with qubits on a-cells of $C_{a,b}$ is $\min(d^a, d^b)$ [11]. Furthermore, the dimension $n_a(C_{a,b})$ of the ath vector space of the complex $C_{a,b}$ is given by

$n_a(C_{a,b}) = \sum_{i+j=a} \binom{a}{i}\binom{b}{j}\, c^i r^{a-i}\, r^j c^{b-j}.$

Therefore the quantum code derived from $C_{a,b}$ by associating qubits with a-cells, X stabilisers with (a−1)-cells and Z stabilisers with (a+1)-cells has parameters $[[n_a(C_{a,b}), \kappa^{a+b}, \min(d^a, d^b)]]$. Setting a = b = 1 we recover the standard hypergraph product code construction; however, in this work we will also consider the case a = b = 2, obtaining 4D hypergraph product codes $HGP_{4D}(H)$.
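To make the higher-dimensional construction concrete, the boundary maps of $C_{\mathrm{toric}}^D$ for D = 3 can be assembled from Kronecker products of the repetition code matrix. The sketch below (our own block ordering, qubits placed on 1-cells) checks the chain complex condition of Equation (3) and reads off $H_X$, $H_Z$ and the metacheck matrix for the Z syndrome:

```python
import numpy as np

def kron3(A, B, C):
    return np.kron(np.kron(A, B), C)

L = 3
I = np.eye(L, dtype=int)
H = (I + np.roll(I, 1, axis=1)) % 2   # cyclic repetition code boundary map
Z = np.zeros((L**3, L**3), dtype=int)

# Boundary maps of the triple product C(H) x C(H) x C(H).
# A_1 blocks have degrees (1,0,0),(0,1,0),(0,0,1);
# A_2 blocks have degrees (1,1,0),(1,0,1),(0,1,1).
d1 = np.hstack([kron3(H, I, I), kron3(I, H, I), kron3(I, I, H)]) % 2
d2 = np.block([[kron3(I, H, I), kron3(I, I, H), Z],
               [kron3(H, I, I), Z, kron3(I, I, H)],
               [Z, kron3(H, I, I), kron3(I, H, I)]]) % 2
d3 = np.vstack([kron3(I, I, H), kron3(I, H, I), kron3(H, I, I)]) % 2

# Equation (3): consecutive boundary maps compose to zero over F_2.
assert not ((d1 @ d2) % 2).any() and not ((d2 @ d3) % 2).any()

# 3D toric code T^3_1: qubits on 1-cells (edges), X stabilisers on 0-cells,
# Z stabilisers on 2-cells, metachecks on 3-cells for the Z syndrome.
HX, HZ, MZ = d1, d2.T, d3.T
print(HX.shape[1], HZ.shape[0], MZ.shape[0])  # 3L^3 qubits, 3L^3 Z checks, L^3 metachecks
```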

V. BELIEF PROPAGATION
The belief propagation (BP) algorithm (also known as sum-product algorithm) has been shown to be effective at decoding classical LDPC codes [20,21]. It has a running time that is linear in the block length of the code. In this section, we will review the use of BP for decoding a classical linear code with a binary r × n parity check matrix H. We assume we have measured a syndrome s ∈ F r 2 , and the role of the decoder is to infer a likely error e, which must satisfy He = s. Although we consider classical linear codes in this section, BP can be directly applied to decode Z (or X) errors with CSS quantum codes by using the X (or Z) parity check matrix in place of H, with the error e now corresponding to the Z (or X) component of the error.
BP is a message passing algorithm, in which messages are passed along the edges of a Tanner graph. The Tanner graph is a graphical model describing a factorisation of the joint probability distribution of the error e and syndrome vector s. The factorised joint probability distribution P(e, s) is

$P(e, s) = \prod_{i=1}^{n} P(e_i) \prod_{j=1}^{r} \mathbb{1}\big[s_j = \textstyle\sum_{i \in \partial j} e_i \bmod 2\big],$ (13)

where $P(e_i)$ is the prior probability distribution over each bit of the noise vector (i.e. $P(e_i = 1) = p$ for a binary symmetric channel with a probability p of error) and $\partial j \subseteq \{1, 2, \ldots, n\}$ denotes the indices of the variables involved in the jth parity check in H. The Tanner graph T(H) of Equation (13) has a factor node $f_j$ for each parity check and a variable node $v_i$ corresponding to the random variable associated with each element $e_i$ of e. There is an edge between $f_j$ and $v_i$ in the factor graph if variable i is involved in parity check j.
The problem BP approximately solves is the bitwise decoding problem of finding the marginal posterior probabilities of each bit, P (e i = 1|s). BP solves the bitwise decoding problem exactly on tree graphs, but is not exact on graphs that contain loops. While the factor graphs of LDPC codes do contain loops, BP can still perform well in practice provided the girth of the graph is sufficiently large.
For our implementation of BP, we represent each binary random variable U using a log-likelihood ratio (LLR) defined as L(U) = log(P(U = 0)/P(U = 1)). BP involves repeating multiple iterations, where each iteration consists of a "horizontal step" and a "vertical step". In the horizontal step, which iterates over the rows of H, each parity check factor $f_j$ sends a message $Q_{f_j \to v_i}$ to its adjacent variable nodes $\{v_i : i \in \partial j\}$:

$Q_{f_j \to v_i} = (-1)^{s_j} \cdot 2 \tanh^{-1}\Big( \prod_{i' \in \partial j \setminus \{i\}} \tanh(Q_{v_{i'} \to f_j}/2) \Big),$ (14)

where each $Q_{v_i \to f_j}$ is initialised before the first iteration to the LLR of the prior of variable $v_i$, which is $L(v_i) = \log((1-p)/p)$ for the binary symmetric channel with bit-flip probability p.
In the vertical step, which iterates over the columns of H, each variable node $v_i$ sends a message $Q_{v_i \to f_j}$ to its adjacent factor nodes $\{f_j : j \in \partial i\}$:

$Q_{v_i \to f_j} = L(v_i) + \sum_{j' \in \partial i \setminus \{j\}} Q_{f_{j'} \to v_i}.$ (15)

We can also obtain an estimate $Q_{v_i}$ of the LLR of the posterior of each variable $v_i$ at each step:

$Q_{v_i} = L(v_i) + \sum_{j \in \partial i} Q_{f_j \to v_i}.$ (16)

These estimates of the posteriors of each variable are also called the soft decisions. From the soft decisions we then obtain hard decisions $c \in F_2^n$, where $c_i := 0$ if $Q_{v_i} > 0$ and $c_i := 1$ otherwise. If c is a valid correction satisfying Hc = s then BP stops and returns c as a correction. If Hc ≠ s then BP instead continues and runs the next iteration of the horizontal and vertical steps. If BP reaches a maximum number of iterations without finding a valid correction then it instead returns a heralded failure. There are therefore two possible failure mechanisms for BP: it either returns a valid correction that removes the syndrome but leaves a residual logical error e + c, or it fails to find a valid solution, resulting in a heralded failure. In our work we use a maximum of 30 iterations (this compares to 32 iterations used in Ref. [1]). We do not observe significant improvements in decoding performance from further increasing the number of iterations.
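The horizontal and vertical steps above can be sketched as follows: a minimal, unoptimised syndrome-BP implementation of Equations (14)-(16) with the tanh rule (the flooding schedule and dense loops are for clarity only, not how a production decoder would be written):

```python
import numpy as np

def bp_decode(H, s, p=0.1, max_iter=30):
    """Sketch of syndrome-based BP with the tanh rule, Eqs. (14)-(16)."""
    m, n = H.shape
    prior = np.log((1 - p) / p)
    # variable-to-check messages, initialised to the prior LLR
    Qvf = {(i, j): prior for j in range(m) for i in np.flatnonzero(H[j])}
    Qfv = {}
    for _ in range(max_iter):
        # horizontal step: check-to-variable messages, sign set by the syndrome
        for j in range(m):
            nbrs = np.flatnonzero(H[j])
            for i in nbrs:
                prod = np.prod([np.tanh(Qvf[(i2, j)] / 2)
                                for i2 in nbrs if i2 != i])
                Qfv[(i, j)] = (-1) ** s[j] * 2 * np.arctanh(
                    np.clip(prod, -0.999999, 0.999999))
        # vertical step: soft decisions and variable-to-check messages
        Q = np.full(n, prior)
        for (i, j), msg in Qfv.items():
            Q[i] += msg
        for (i, j) in Qvf:
            Qvf[(i, j)] = Q[i] - Qfv[(i, j)]
        c = (Q < 0).astype(int)            # hard decisions
        if np.array_equal(H @ c % 2, s):
            return c, Q                    # converged to a valid correction
    return None, Q                         # heralded failure

# A single bit flip on a 5-bit repetition code is corrected immediately
# (the Tanner graph is a tree, so BP is exact here).
H = np.array([[1,1,0,0,0],[0,1,1,0,0],[0,0,1,1,0],[0,0,0,1,1]])
e = np.array([0, 0, 1, 0, 0])
c, _ = bp_decode(H, H @ e % 2)
print(c)  # [0 0 1 0 0]
```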
The update rules in Equation (14) and Equation (15), which are called the "tanh rule", can result in numerical underflow issues when the log-likelihood ratios become large. One way of avoiding this problem is to use the so-called Jacobian approach instead, which is mathematically equivalent to the tanh rule but does not suffer from numerical underflow issues. Another commonly used choice of updates is the min-sum update, which is an approximation to the tanh rule that has reduced complexity compared to either the tanh rule or the Jacobian method, at the cost of slightly reduced decoding performance. We review these different variants of BP in Appendix A, and refer the reader to Ref. [22] for more details.

VI. ORDERED STATISTICS DECODING
Often, a large fraction of the errors that occur in BP are heralded failures, where the decoder runs out of iterations and fails to converge. When a heralded failure occurs in BP, the ordered statistics decoding (OSD) technique can be used to post-process the soft decisions output by BP and find a good correction that is consistent with the syndrome [23]. This combination of belief propagation and ordered statistics decoding (BP+OSD) has been shown to be a good general purpose decoder of quantum LDPC codes [1,24].
In the context of OSD, we will refer to the BP soft decisions as reliabilities $r \in \mathbb{R}^n$, which indicate the reliability of each bit of the hard decisions c output by BP. In other words, $r_j > r_i$ implies that bit $c_j$ is more reliable than $c_i$. Our definition of reliabilities $r_i := Q_{v_i}$ is consistent with Ref. [1], except that we use the LLR rather than the probability that the bit did not flip; this is equivalent for OSD due to the monotonicity of the LLR function. The first stage of OSD (called order-0 reprocessing or OSD-0) involves finding a set of k bit positions (indices) that have high reliability and also correspond to k linearly independent columns of the generator matrix G of the code. We call this set of k indices, which we will shortly define more precisely, the most reliable information set T. Generally, an information set is any set of k indices corresponding to k linearly independent columns of the generator matrix G of the code.
Given an observed syndrome s and an assignment of a bitstring in F k 2 to the bits of the correction c located in an information set, there is a unique choice of the remaining n − k bits such that Hc = s. Therefore, in OSD-0, we only use the bits of the BP hard decisions that have support in an information set that we consider most reliable (as determined by the soft decisions), and after fixing these k bits, there is a unique choice of the remaining bits that is consistent with the syndrome.
In order to obtain T , we first sort the columns of G in order of non-decreasing reliability from left to right, and call the resultant matrix G . The most reliable basis B k for the column space of G is the first k = rk(G) linearly independent columns of G , from right to left. The most reliable information set T is the set of positions of the columns in B k . We can find T using the parity check matrix H of the code, rather than G. To do so, first we sort the columns of H in order of non-decreasing reliability, obtaining the matrix H 1 . We find the first n − k linearly independent columns of H 1 (e.g. by performing Gaussian elimination from left to right), and these columns constitute the least reliable basis Ω n−k for the column space of H. We denote the set of locations of the columns in Ω n−k by S ⊆ {1, 2, . . . , n}. In Ref. [25] it was shown that T is the complement of S, i.e. T = {1, 2, . . . , n} \ S.
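The procedure above (select the least reliable basis Ω from the columns of H, take T as its complement, keep the BP hard decisions on T, and solve for the remaining bits) can be sketched as follows; the Gaussian-elimination helpers are naive stand-ins for the row reduction used in practice:

```python
import numpy as np

def gf2_rank(M):
    """Rank of a binary matrix over F_2."""
    M = M.copy() % 2
    r = 0
    for col in range(M.shape[1]):
        piv = next((i for i in range(r, M.shape[0]) if M[i, col]), None)
        if piv is None:
            continue
        M[[r, piv]] = M[[piv, r]]
        for i in range(M.shape[0]):
            if i != r and M[i, col]:
                M[i] = (M[i] + M[r]) % 2
        r += 1
    return r

def gf2_solve(A, b):
    """Solve Ax = b over F_2 (assumes a solution exists)."""
    M = np.hstack([A % 2, b.reshape(-1, 1) % 2])
    pivots, row = [], 0
    for col in range(A.shape[1]):
        piv = next((i for i in range(row, M.shape[0]) if M[i, col]), None)
        if piv is None:
            continue
        M[[row, piv]] = M[[piv, row]]
        for i in range(M.shape[0]):
            if i != row and M[i, col]:
                M[i] = (M[i] + M[row]) % 2
        pivots.append(col)
        row += 1
    x = np.zeros(A.shape[1], dtype=int)
    for r_i, col in enumerate(pivots):
        x[col] = M[r_i, -1]
    return x

def osd0(H, s, c, reliabilities):
    """OSD-0 sketch: c are BP hard decisions, reliabilities the soft decisions."""
    m, n = H.shape
    S, R = [], np.zeros((m, 0), dtype=int)
    for i in np.argsort(reliabilities):        # non-decreasing reliability
        cand = np.hstack([R, H[:, [i]]])
        if gf2_rank(cand) > R.shape[1]:        # greedy least reliable basis Omega
            R, S = cand, S + [i]
    T = [i for i in range(n) if i not in set(S)]   # most reliable information set
    u = np.zeros(n, dtype=int)
    u[T] = c[T]                                # keep the reliable BP bits...
    u[S] = gf2_solve(H[:, S], (s + H[:, T] @ u[T]) % 2)   # ...solve for the rest
    assert np.array_equal(H @ u % 2, s)
    return u

# Toy example: the BP hard decisions are all-zero (inconsistent with the
# syndrome), but bit 0 is marked unreliable, so OSD-0 recovers the error.
H = np.array([[1,1,0,0,0],[0,1,1,0,0],[0,0,1,1,0],[0,0,0,1,1]])
s = H @ np.array([1, 0, 0, 0, 0]) % 2
u = osd0(H, s, c=np.zeros(5, dtype=int),
         reliabilities=np.array([0.1, 5., 5., 5., 5.]))
print(u)  # [1 0 0 0 0]
```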
The encoding operator $E(x) : F_2^k \to F_2^n$ takes as input a choice of correction x for the k bits in T, and returns the only correction operator u consistent with the syndrome s and the choice of x. More formally, the correction u = E(x) is the unique choice of u that satisfies both Hu = s and $u_{[T]} = x$, where $u_{[T]}$ denotes the restriction of u to the positions in T.
In OSD-0, we output the correction $E(c_{[T]})$ if BP fails to converge. In order-w reprocessing (OSD-w), we instead carry out a brute force search over a set D of vectors $x \in D$ with a small Hamming distance from $c_{[T]}$, and choose the correction u = E(x) with smallest weight |u| among all $x \in D$. There are many possible choices of D for this restricted brute-force search, but we choose to search over all x that differ on any subset of the w weakest bits of $c_{[T]}$ (the w weakest bits are those with the smallest reliabilities). This version of OSD-w was also used in Ref. [1].

VII. SINGLE-SHOT QUANTUM ERROR CORRECTION
The quantum circuit used to measure the stabilisers of a quantum error correcting code will itself be faulty, and it is therefore necessary that any fault-tolerant error correction protocol be able to handle both data qubit and syndrome measurement errors. For a surface code of distance d, this can be achieved using O(d) repetitions of the stabiliser measurements, using the entire syndrome history when decoding [2]. However, Bombín showed that there exist classes of error correcting codes for which single-shot error correction is possible [4]. With single-shot error correction, the decoder only uses information from a single round of noisy syndrome measurements. While single-shot error correction in general leaves a small residual error after each round, it can be shown that this residual error is local and can remain bounded over multiple rounds of error correction, which is sufficient for fault-tolerant quantum computation. Several families of quantum error correcting codes have been shown to exhibit single-shot quantum error correction, including the 3D gauge color code [4,5], 3D and 4D toric and hypergraph product codes [6-11], 4D hyperbolic codes [12], and quantum expander codes [14]. A property of a code that has been shown to be sufficient for single-shot decoding under an adversarial noise model is confinement [8]. A code has (t, f)-confinement if, for all errors E with $|E|_{\mathrm{red}} \leq t$, the condition

$f(|\sigma(E)|) \geq |E|_{\mathrm{red}}$

is satisfied, where $f : \mathbb{Z} \to \mathbb{R}$ is an increasing function with f(0) = 0. In other words, a code has confinement if, for errors with sufficiently small reduced weight, the minimum weight error consistent with the syndrome is bounded above by some function of the size of the syndrome. Quintavalle et al. proved that the X-stabilisers of 3D hypergraph product codes satisfy the confinement property [8].
However, an example of a code that is not confined is the 2D toric code, since the syndrome of a single string of Z errors (or X errors) has weight two, regardless of the weight of the error. Some single-shot codes have linear dependencies amongst the check operators, leading to syndromes becoming codewords of a classical linear code (called a metacode) that can be used for syndrome repair. These linear dependencies are not a requirement for a code to be single-shot (indeed quantum expander codes [14] are single-shot and confined, but can have full rank check matrices); however, a metacode can nevertheless be useful when decoding [5,8]. We can construct a code that has syndromes encoded in a metacode using a chain complex with length at least three (to obtain a metacode for either X or Z syndromes), or length four if we would like a metacode for both X and Z syndromes. Consider a chain complex with length l at least four, and with qubits on the i-cells, where 1 < i < l − 1. In addition to the X check matrix $H_X := \partial_i$ and Z check matrix $H_Z := \partial_{i+1}^T$, we additionally define the X metacheck matrix $M_X := \partial_{i-1}$ and the Z metacheck matrix $M_Z := \partial_{i+2}^T$. From Equation (3), it must hold that $M_X H_X = 0$, and hence any valid noiseless X syndrome z (in the image of $H_X$) must satisfy $M_X z = 0$. Similarly, the Z metacheck matrix satisfies $M_Z H_Z = 0$. We can therefore think of valid X (Z) syndromes as codewords of a metacode with check matrix $M_X$ ($M_Z$). Invalid syndromes in $\ker M_X$ (those that satisfy the metachecks but are not in the image of $H_X$) correspond to nontrivial elements of the (i−1)th homology group $H_{i-1}$.

A. Two-stage decoding

Given a noisy syndrome z′, one approach to finding a correction operator is to use a two-stage decoder. We will consider two-stage decoding of CSS codes, using the X stabilisers and X metacheck matrix to decode in the presence of Z errors and X stabiliser measurement errors. Exactly the same method can be used to decode X errors with Z stabiliser measurements.

Figure 1. The Tanner graph of a repetition code with noisy syndrome measurements. Also shown is the metacheck m1, which is just the linear combination m1 = f1 + f2 + f3 of the other three parity checks.
In the first stage of a two-stage decoder, the metacheck matrix $M_X$ is used to compute the metasyndrome $s = M_X z'$, and a metacode decoder $f_d^1$ is then used to find a probable noiseless syndrome z given s. In the second stage another decoder $f_d^2 : F_2^{n_{i-1}} \to F_2^{n_i}$ is used to find a probable noise vector n given the corrected syndrome z, such that $H_X n = z$.
There are two ways that such a correction can fail. Either the corrected syndrome $z = f_d^1(s)$ is invalid, corresponding to a nontrivial element of $H_{i-1}$, or the correction operator $n = f_d^2(z)$ differs from the true error by a nontrivial element of $H_i$. In Ref. [8], the authors remove the first failure mechanism, which they call a metacode failure, by using a modified metacheck matrix

$M' = \begin{pmatrix} M_X \\ L_M \end{pmatrix},$

where the rows of $L_M$ generate the (i−1)th cohomology group $H^{i-1}$. Note that if $L_M$ is not sparse (as is typically the case for topological codes, for example), then decoding using M′ may be harder for some decoders (such as belief propagation). As a result, in Ref. [8], the authors only use M′ if the first attempt at decoding the metasyndrome $s = M_X z'$ yields an invalid syndrome. However, as noted in Ref. [8], the threshold of two-stage decoders can be limited by that of the metacode, leading to performance that is far from optimal.

B. Single-stage decoding
Suppose a Z-type data qubit error e and syndrome measurement error $s_e$ have occurred, giving the observed syndrome $s = H_X e + s_e$. Rather than decoding in two stages, a natural decoding strategy we might instead consider is to find the most probable data qubit and syndrome measurement error consistent with the observed syndrome, given the error model for data qubit errors and syndrome measurement errors. This decoding problem can be expressed as finding the minimum weight error

$e' := \begin{pmatrix} e \\ s_e \end{pmatrix}$ (19)

satisfying $H'e' = s$, where H′ is the modified parity check matrix defined as

$H' = (H_X \mid I_r).$ (20)

The problem of minimum weight decoding of general linear codes is known to be NP-complete [26]; however, we can still apply alternative decoding algorithms to the modified parity check matrix H′. Before discussing the choice of decoder in more detail, let us first consider the structure of the Tanner graph T(H′) of H′. The graph T(H′) is obtained from $T(H_X)$ by adding a variable node $v_i^m$ for each check node $f_i$, as well as adding the edge $(v_i^m, f_i)$. These new nodes and edges are highlighted in blue in the repetition code example in Figure 1. Each of these additional variable nodes corresponds to a bit flip error mechanism in the syndrome measurement outcome of the check node it is connected to.
This inclusion of a variable node for each parity check in the Tanner graph to handle measurement errors in the context of decoding with belief propagation was introduced in Ref. [27] to decode Bravyi-Bacon-Shor and subsystem hypergraph product codes, and has also been used to decode hypergraph product codes [28,29] and 4D hyperbolic codes [12]. In our work, we make two additional modifications. Firstly, as we show in Section VIII, belief propagation on its own is not sufficient for single-shot decoding of 3D and 4D toric codes, owing to the presence of degenerate error configurations leading to the split-beliefs problem. Therefore, we use OSD post-processing to fix this degeneracy problem [1]. Secondly, since we have metachecks for our codes, we include these explicitly in the Tanner graph. These metachecks act solely on the newly added variable nodes corresponding to measurement errors. An example of a metacheck is shown in red in Figure 1. With the metachecks added, our new Tanner graph $T(H_M)$ corresponds to the check matrix

$H_M = \begin{pmatrix} H_X & I_r \\ 0 & M \end{pmatrix},$ (21)

where M is the metacheck matrix. Note that the rows corresponding to the metachecks (the bottom half of $H_M$) are linear combinations of the rows in the top half, since $M(H_X, I_r) = (0, M)$. As such, the metachecks are already implicitly present in T(H′) as linear combinations of other checks; however, we find that adding them explicitly to the Tanner graph and using the syndrome $(s, Ms)^T$ improves the decoding performance of BP+OSD. We note that both Equation (21) and Equation (20) can be interpreted as the X check matrix of a data-syndrome code [29,30]. Note that, since the BP+OSD decoder finds a solution e′ satisfying $H'e' = s$, we are guaranteed to have $s + \hat{s}_e \in \operatorname{im} H_X$, where $\hat{s}_e$ is the syndrome-error component of e′, and so metacode failures are not possible with this single-stage decoding. This is in contrast to two-stage decoding, where the $L_M$ matrix must be included in the modified metacheck matrix M′ to avoid metacode failures.
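Concretely, for the repetition code of Figure 1 the matrices of Equations (20) and (21) look as follows (a sketch; here M = (1 1 1), since the three cyclic parity checks sum to zero):

```python
import numpy as np

HX = np.array([[1,1,0],[0,1,1],[1,0,1]])   # cyclic 3-bit repetition code checks
M = np.array([[1,1,1]])                     # metacheck m1 = f1 + f2 + f3
assert not ((M @ HX) % 2).any()             # valid syndromes satisfy M z = 0

r = HX.shape[0]
Hp = np.hstack([HX, np.eye(r, dtype=int)])  # Eq. (20): H' = (H_X | I_r)
HM = np.vstack([Hp,                         # Eq. (21): metacheck rows act only
    np.hstack([np.zeros((1, 3), dtype=int), M])])  # on measurement-error bits

# A data error e and a measurement error s_e produce the observed syndrome s.
e, s_e = np.array([1, 0, 0]), np.array([0, 1, 0])
s = (HX @ e + s_e) % 2
ep = np.concatenate([e, s_e])               # the combined error e' of Eq. (19)
assert np.array_equal(Hp @ ep % 2, s)       # H' e' = s
# The extended syndrome (s, Ms) checks out, since M s = M s_e mod 2.
assert np.array_equal(HM @ ep % 2, np.concatenate([s, (M @ s) % 2]))
print(s, (M @ s) % 2)
```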
However, since L M is typically not sparse, this method can degrade the performance of decoders such as belief propagation.

VIII. NUMERICAL SIMULATIONS
We simulated the performance of single-stage and two-stage decoders under a phenomenological noise model, in which we perform N rounds of noisy syndrome measurement, followed by a final round of noiseless syndrome measurement. In each round of measurements, each data qubit suffers a Z (or X) error with probability p. In the rounds of noisy syndrome measurements, each stabiliser is also measured incorrectly with probability p. After each round of stabiliser measurements, we use our single-shot decoder to apply a correction operator. The final round of perfect stabiliser measurements simulates logical measurement by measuring all data qubits destructively in either the X or Z basis: when each data qubit is measured destructively, measurement errors have the same effect as data qubit errors, and stabilisers can be determined without error from classical post-processing. For the majority of our numerical simulations we use our own implementation of BP+OSD, which is consistent with Ref. [1], using the exhaustive search strategy for post-processing with w = 10. The only results which use a different implementation are those that use the combination sweep strategy in Section VIII D, for which we used the LDPC Python package [31].

Figure 3. Example of a Z error in the L = 7, 3D toric code, with qubits identified with faces. A single round of Z noise on data qubits was simulated with perfect syndrome measurements. Highlighted in red is an isolated "half-cube" error which leads to split beliefs when decoded with BP. Highlighted in green is another half-cube error which is decoded successfully by BP, since the adjacent (blue) error breaks the degeneracy.

A. Decoder comparison
In Figure 2 we show the performance of the single stage BP decoder (without OSD or metachecks) for the 3D toric code, with 8 rounds of noisy syndrome measurements. Clearly BP does not have a threshold and, unusually, there is a regime where the logical error rate actually increases as the physical error rate decreases (with a peak in logical error rate at physical error rates of around 2% to 3%).
In order to understand this behaviour better, we visualised typical errors that lead to BP failing when decoding the 3D toric code for one round of perfect syndrome measurements (since we observed similar behaviour with this simpler error model). An example of such an error is shown in Figure 3. We find that BP can fail to converge when a "half-cube" failure mechanism is present: that is, when a Z error occurs on three sides of a cube. If we denote such an error as E, and then denote the Z stabiliser associated with the same cube as S, we note that E and SE both have the same syndrome and occur with the same error probability. This can lead to the problem of "split beliefs" resulting from degenerate errors, where the marginal posteriors output by BP are split between equally probable errors, leading to hard decisions that are inconsistent with the syndrome [32]. However, we observe that other errors in close proximity to a half-cube can break the symmetry in the decoding problem, allowing BP to converge. This is illustrated in Figure 3, where the half-cube highlighted in green is decoded correctly by BP (and is adjacent to another error), whereas the half-cube highlighted in red is isolated and cannot be decoded by BP alone owing to the problem of split beliefs.
We model the probability of this failure mechanism by the probability that an isolated half-cube error occurs. The probability that a half-cube error occurs on a specific cube of the lattice is $\binom{6}{3} p^3 (1-p)^3$, and the probability that $q$ errors occur in a neighbouring region of $r$ qubits is $\binom{r}{q} p^q (1-p)^{r-q}$. We assume that only some fraction $\alpha$ of these error configurations lead to BP failing to converge. Further, we assume that the error with support in the rest of the lattice (outside the neighbouring region of a half-cube) has no impact on BP decoding success. For an L × L × L 3D toric code, there are $L^3$ locations where these isolated half-cube errors can be centred. Assuming that up to $q_{\max}$ errors can occur in the neighbourhood surrounding a half-cube, this leads to an estimated BP logical failure probability of

$$P_{\mathrm{BP}} \approx \alpha L^3 \binom{6}{3} p^3 (1-p)^3 \sum_{q=0}^{q_{\max}} \binom{r}{q} p^q (1-p)^{r-q}, \qquad (22)$$

where $q_{\max}$, $r$ and $\alpha$ can be determined through a fit to the simulated data. We find a good fit with $\alpha = 0.42$, $r = 245$ and $q_{\max} = 4$ (see Figure 4), which supports the conclusion that isolated half-cube errors are a dominant failure mechanism for BP in the 3D toric code. We note that the value $r = 245$ is close to the number of faces (240) in a 4 × 4 × 4 cube in the lattice. Our Equation (22) overcounts some error configurations, such as those where more than one half-cube error occurs in the lattice. However, we still expect it to be a reasonable leading-order estimate of the probability of this class of errors occurring.

We compare different decoding strategies for an L = 9 3D toric code in Figure 5. While single-stage BP decoding does outperform the two-stage decoder (using BP-OSD) at higher physical error rates (e.g. for error rates of around 3% to 8%), it performs worse for error rates below 3% owing to the dominant half-cube errors. Fortunately, half-cube errors are heralded errors, where we know that BP has failed to converge (BP outputs a correction inconsistent with the syndrome). As a result, these half-cube errors can be fixed using OSD post-processing, which uses the BP posterior probabilities to guide the choice of a correction consistent with the syndrome. As shown in Figure 5, we observe that single-stage decoding with BP-OSD does indeed significantly outperform single-stage decoding with BP alone, and does not suffer from the increase in logical error rate at low physical error rates due to half-cube errors. We also achieve a slight improvement in performance by adding metachecks explicitly into the Tanner graph used by BP to decode.

Figure 5. Comparison of decoders for the L = 9 3D toric code with 8 rounds of noisy syndrome measurements. We compare a two-stage decoder (blue) with three single-stage decoding strategies using either BP alone or BP-OSD. We compare the performance of single-stage decoding with metachecks either included explicitly in the Tanner graph ("with MC") or omitted ("no MC").
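The half-cube failure model of Equation (22) is straightforward to evaluate numerically; the sketch below uses the fitted values quoted in the text (α = 0.42, r = 245, q_max = 4), with the function name being illustrative:

```python
from math import comb

def p_halfcube_fail(p, L, alpha=0.42, r=245, q_max=4):
    """Estimated probability that BP fails due to an isolated half-cube
    error in an L x L x L 3D toric code, using the fitted parameters
    alpha = 0.42, r = 245, q_max = 4 quoted in the text."""
    p_cube = comb(6, 3) * p**3 * (1 - p)**3           # half-cube on one cube
    p_neighbourhood = sum(comb(r, q) * p**q * (1 - p)**(r - q)
                          for q in range(q_max + 1))  # few nearby errors
    return alpha * L**3 * p_cube * p_neighbourhood
```

Evaluating this model reproduces the non-monotonic behaviour seen in the simulations: the estimate peaks at physical error rates of a few percent and falls off on either side.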

B. Sustainable thresholds
After each round of noisy stabiliser measurements, we will not in general be able to find a correction operator that returns the state to the codespace. However, we can still protect quantum information for an arbitrarily long time provided that the residual error after decoding the noisy syndrome in each round is sufficiently small to be corrected in subsequent rounds. This motivates the definition of the sustainable threshold. The sustainable threshold of a single-shot code family is the physical error rate $p_{\mathrm{sus}}$ below which quantum information can be stored indefinitely by increasing the distance of the code, and is defined to be [5]

$$p_{\mathrm{sus}} = \lim_{N\to\infty} p_{\mathrm{th}}(N),$$

where $p_{\mathrm{th}}(N)$ is the threshold for N rounds of stabiliser measurements. In Refs. [5,8], the authors determine $p_{\mathrm{sus}}$ by fitting to the ansatz

$$p_{\mathrm{th}}(N) = p_{\mathrm{sus}}\left(1 - (1 - p_0/p_{\mathrm{sus}})e^{-\gamma N}\right),$$

however we do not find this ansatz to be a good fit for the code families and decoders we consider in this work. We found that an ansatz of the form $p_{\mathrm{th}}(N) = p_{\mathrm{sus}}(1 + (p_0/p_{\mathrm{sus}} - 1)/(N + 1))$ approximated our data much better, but still not closely enough. Therefore, we instead determine $p_{\mathrm{sus}}$ by estimating $p_{\mathrm{th}}(N)$ numerically for increasing values of N (doubling N each time we increase it), until increasing N no longer decreases $p_{\mathrm{th}}(N)$ to within the statistical significance of our threshold estimates.

Figure 6. Single-shot sustainable Z-threshold estimate for the 3D toric code using the single-stage BP+OSD decoder, compared to results for the two-stage BP+OSD+MWPM decoder from Ref. [8]. The threshold is plotted as a function of the number of noisy measurement rounds before the final round of perfect measurements. Inset is the threshold plot for 512 noisy rounds of syndrome measurement, with the lattice size L given in the legend.
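The doubling procedure described above can be sketched as follows; here `threshold_estimate` stands in for a full Monte Carlo threshold fit at a given N, and the function name, tolerance and bounds are illustrative rather than taken from the paper:

```python
def sustainable_threshold(threshold_estimate, tol, n0=1, n_max=1024):
    """Estimate p_sus by doubling the number of noisy rounds N until the
    estimated threshold p_th(N) stops decreasing by more than `tol`.

    `threshold_estimate(N)` stands in for a full numerical threshold fit
    at N noisy measurement rounds (illustrative, not from the paper)."""
    n, p_prev = n0, threshold_estimate(n0)
    while n < n_max:
        n *= 2
        p_next = threshold_estimate(n)
        if p_prev - p_next <= tol:   # no significant decrease: converged
            return p_next
        p_prev = p_next
    return p_prev
```

In practice `tol` would be set by the statistical uncertainty of the threshold fits, as described in the text.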
For the 3D toric code, with qubits on 2-cells, our results are consistent with a single-shot sustainable threshold for Z errors of at least 7.1% using the single-stage BP+OSD decoder, as shown in Figure 6. Since the logical failure rate is saturated at the threshold, it may be that the crossing occurs at a higher physical error rate (our estimate of 7.1% is a lower bound). This significantly outperforms the threshold of 3.08% found for two-stage decoding with BP+OSD+MWPM in Ref. [8]. In Figure 7 we plot p th (N ) for the 4D toric code (with qubits on 2-cells) using the single-stage BP+OSD decoder, and using lattice sizes L ∈ {4, 5, 6, 7}. These results are consistent with a sustainable threshold of at least p sus = 4.3%, and again we find that the logical failure rate is saturated at threshold. Since the 4D toric code is self-dual, the 4.3% sustainable threshold we find is both the Z and X threshold. To the best of our knowledge, this is the highest single-shot threshold found to date for any code family, and even exceeds the 2.93(2)% threshold of the 2D toric code for the same noise model but using L rounds of stabiliser measurements [17].

C. Overhead reduction
We also investigated the performance of 4D hypergraph product codes with improved code parameters relative to the surface code. In Figure 8 we show the performance of a [[20625,1441,9]] 4D hypergraph product code, constructed from the fourfold tensor product of a [10,6,3] linear code. The [10,6,3] code was constructed using the method given in [33] using check matrices of dimensions (5, 10) and a column weight of 2 (we constructed 100 check matrices this way and chose the code that performed best when decoded with BP). We computed the distance using the result in Ref. [11]. In this case, we decoded using BP (no OSD), and included the metachecks explicitly in the Tanner graph. The decoding performance without OSD is better than for other block codes we have tried, since there are fewer degenerate errors, owing to the higher weight stabilisers compared to the 4D toric code. The [[20625,1441,9]] code has stabiliser weight 8 and 10, compared to 6 for the 4D toric code. For a phenomenological noise model, using 32 rounds of noisy syndrome measurement, we find that the [[20625,1441,9]] code outperforms rotated surface codes of a similar size, even though the full syndrome history is used by MWPM for the surface codes when decoding. For a physical error rate of around 1%, the [[20625,1441,9]] 4D hypergraph product code requires between 10× and 20× fewer physical qubits than the surface code for the same logical error rate and number of logical qubits. We note that the logical error rate of the [[20625,1441,9]] 4D hypergraph product code has a similar slope in Figure 8 to the 1441 copies of distance 9 surface codes (requiring 116,721 qubits), which is consistent with the [[20625,1441,9]] code being decoded up to the full code distance. This also suggests that the reduction in qubit overhead might shrink to around 116721/20625 ≈ 5.7× in the limit of low physical error rates and low (e.g. 10^−12) target logical error rates.
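The asymptotic overhead figure quoted above follows from simple counting, since a rotated surface code of distance d uses d² data qubits:

```python
# Overhead figures quoted above: 1441 logical qubits at distance 9 using
# rotated surface codes (d*d data qubits per patch) versus the
# [[20625, 1441, 9]] 4D hypergraph product code (data qubits only).
k, d = 1441, 9
surface_code_qubits = k * d * d   # 1441 distance-9 rotated surface codes
hgp_qubits = 20625
overhead_ratio = surface_code_qubits / hgp_qubits   # low-error-rate limit
```

This gives the 116,721 qubits and ≈5.7× ratio stated in the text.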
We also investigate the performance of balanced product codes, which have been shown to have asymptotically better parameter scaling than hypergraph product codes [34]. In Figure 9 we compare the performance of a balanced product code with rotated surface codes. The balanced product code has parameters [[1628,100]], and is constructed from the balanced product of an expander code and its dual. The Tanner code is derived from the Cayley graph of the cyclic group C22 with degree 14, with a [14,9,4] block code (from [35]) as the local code. To choose the generating set S of the Cayley graph of C22, we randomly sampled 200 generating sets and chose the set that produced a Cayley graph with the minimum λ2. Here λ2 is the second-largest eigenvalue of the adjacency matrix of the graph and, due to the Cheeger inequalities, a smaller λ2 implies that the graph has better expansion (see Section III of [34] for a review of how λ2 relates to the properties of expander codes). The generating set we use has λ2 = 1.92. The X and Z stabilisers both have check weights at least 8 and at most 13. We find a 10× to 20× reduction in qubit overhead at a physical error rate of around 1%. This is a similar reduction in overhead as achieved by the 4D hypergraph product code in Figure 8; however, we note that the balanced product code achieves this overhead reduction with a much smaller block length. Our results in Figure 9 provide evidence that balanced product codes can be single-shot, and we provide additional numerical results that also support this conclusion in Appendix C.

Figure 10. Performance of BP+OSD for decoding the 2D toric code. We used the Ref. [31] implementation of BP+OSD with the combination sweep strategy for OSD (setting λ = 60) [24], and product-sum updates for BP with a maximum number of iterations equal to the code block length. The lattice size L is given in the legend.

Figure 11. Logical Z error rate of the triangular color code decoded using BP+OSD for a code capacity noise model, with the same implementation and settings of BP+OSD as used for Figure 10. The lattice size is given in the legend.
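The λ2 selection described above is cheap to reproduce for any candidate generating set, because the adjacency matrix of a Cayley graph of a cyclic group is circulant and so its spectrum is known in closed form. The generating set used in the paper is not listed, so the sets in the example below are illustrative:

```python
import numpy as np

def cayley_lambda2(n, gens):
    """Second-largest adjacency eigenvalue (lambda_2) of the Cayley graph
    of the cyclic group C_n with symmetric generating set `gens`.

    The adjacency matrix is circulant, so its eigenvalues are
    sum_{s in gens} exp(2*pi*i*k*s/n) for k = 0, ..., n-1 (real when
    `gens` is closed under negation mod n)."""
    gens = np.asarray(gens)
    eigs = np.array([np.exp(2j * np.pi * k * gens / n).sum().real
                     for k in range(n)])
    eigs.sort()
    return eigs[-2]  # eigs[-1] is the degree |gens| (at k = 0)
```

A random search over generating sets, as described in the text, would simply call this function for each sampled set and keep the minimiser.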

D. Limitations of BP+OSD
BP+OSD has already been shown to have remarkably good decoding performance for a wide range of quantum LDPC codes in the absence of syndrome measurement errors [1,24], and in this work we have shown how it can be used to obtain good single-shot decoding performance for higher dimensional hypergraph product codes. However, BP+OSD has limitations that present a challenge for its use as a general-purpose decoder for quantum LDPC codes. Firstly, BP+OSD has a running time that is at least cubic in the size of the parity check matrix. This is significantly more computationally expensive than the union-find decoder for the surface code [36], and as a result real-time hardware implementations of BP+OSD may be difficult to achieve [37]. Secondly, we have observed that the performance of BP+OSD can degrade for 2D toric and color codes for large system sizes, as shown in Figure 10 and Figure 11. Although BP+OSD is unlikely to be used for the 2D toric and color codes, as these codes already have efficient decoders, it is nevertheless useful to consider them for benchmarking purposes. If only small system sizes are considered (L ≤ 15), the results of Figure 10 are consistent with a BP+OSD toric code threshold approaching that of minimum-weight perfect matching, as shown in Ref. [24]. However, for larger system sizes we find that the crossing point recedes significantly, which suggests either that the threshold is much lower than results for small system sizes might suggest, or that there may not be a threshold. We observe similar behaviour for the 2D triangular color code, as shown in Figure 11. For both Figure 10 and Figure 11 we used the variant of OSD implemented in Ref. [31] with the "combination sweep" post-processing strategy, setting λ = 60, which involves a larger brute-force search than OSD-10 and yields a relative reduction in the logical error rate.
In order to ensure that the decoding performance was not limited by the number of BP iterations, we used a maximum number of iterations equal to the block length of the code. However, we did not observe a significant improvement from letting the number of iterations grow with block length, and in Appendix B we argue that there may be limited benefit to increasing the number of iterations beyond some constant for Tanner graphs with a small constant girth, which justifies our choice of 30 iterations used in the remainder of this work.
One method of recovering a threshold with BP+OSD using the exhaustive search strategy is to allow w to grow with the block length, although this can lead to an exponential running time (since the brute-force search in OSD-w has complexity O(2^w)). To understand why this approach recovers a threshold, consider an X check matrix H which we will decode with BP+OSD. If we use OSD-w post-processing, setting w to be the nullity of H, then BP+OSD decoding with OSD-w is equivalent to finding the minimum weight Z-type error, which is the problem that the minimum-weight perfect matching (MWPM) decoder already solves efficiently for the 2D toric code, with a threshold of 10.3% [17]. We would therefore ideally like to find a variant of OSD post-processing that has polynomial running time as well as a threshold both for the 2D toric code and more general families of quantum LDPC codes. Unfortunately, the greedy search methods for OSD in the literature do not appear to meet these criteria; however, one possible solution might be to use the BP posteriors to guide cluster growth in the generalised Union-Find decoder for quantum LDPC codes introduced in Ref. [38]. A new post-processing method called stabiliser-inactivation has recently been shown to improve on OSD for some quantum LDPC codes [39], although its performance for the toric code has not yet been studied.
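For concreteness, the core OSD-0 step discussed above can be sketched as follows; OSD-w additionally brute-forces up to 2^w flips of the least reliable bits. This is an illustrative, unoptimised sketch, not the implementation of Refs. [1,31]:

```python
import numpy as np

def osd0(H, syndrome, probs):
    """OSD-0 sketch: rank the columns of H by the BP marginal error
    probabilities `probs` (most suspect first), Gaussian-eliminate over
    GF(2) to find a set of pivot columns, and solve for a correction
    supported on those columns that reproduces the syndrome."""
    H = H.astype(np.uint8) % 2
    m, n = H.shape
    order = np.argsort(-np.asarray(probs))           # most suspect qubits first
    A = np.concatenate([H[:, order],
                        np.asarray(syndrome, dtype=np.uint8).reshape(-1, 1)],
                       axis=1)
    pivots, row = [], 0
    for col in range(n):                             # row-reduce [H_perm | s]
        support = np.nonzero(A[row:, col])[0]
        if len(support) == 0:
            continue
        pr = support[0] + row
        A[[row, pr]] = A[[pr, row]]                  # bring pivot row up
        for r in range(m):
            if r != row and A[r, col]:
                A[r] ^= A[row]                       # clear the column
        pivots.append(col)
        row += 1
        if row == m:
            break
    e_perm = np.zeros(n, dtype=np.uint8)
    for r, col in enumerate(pivots):                 # non-pivot bits set to 0
        e_perm[col] = A[r, -1]
    e = np.zeros(n, dtype=np.uint8)
    e[order] = e_perm                                # undo the column ordering
    return e
```

Because the pivot columns span the column space of H, the returned correction is always consistent with the syndrome, which is what makes OSD effective against the heralded split-belief failures discussed earlier.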
The 2D toric and color codes are amongst the most challenging codes to decode for BP+OSD, since they have many low-weight stabilisers of even weight that lead to low-weight degenerate error configurations. For example, any weight-2 X-type error P in the support of a single X stabiliser S in the toric code will have the same error probability and syndrome as the error P S, and will likely lead to BP failing to converge owing to the split-belief problem. For the 3D and 4D toric code, typical problematic error configurations that lead to split beliefs occur with lower probability. This is a result of stabilisers having a higher weight than in the 2D toric code, as well as a larger boundary in the Tanner graph than both the 2D toric and color codes, which increases the probability that nearby errors will break the degeneracy, as discussed in Section VIII A. This makes the decoding problem solved by the OSD post-processing more challenging for the 2D toric and color codes than for the 3D or 4D toric codes. Indeed, we do not observe the crossing recede for the 3D and 4D toric codes, even though the distances and system sizes we consider are already quite large. For example, the L = 7 4D toric code in Figure 7 has 14,406 data qubits and distance 49. This compares to only 6,050 data qubits in the L = 65 2D surface code in Figure 10, for which decoding performance has degraded. Will the 3D and 4D toric codes suffer the same problem of degraded performance with BP+OSD, but just for larger lattice sizes than for the 2D toric code? Testing this numerically is challenging, given the O(n^3) running time of OSD. Regardless, our numerical results demonstrate that single-stage single-shot decoding with BP+OSD for the 3D and 4D toric code already has good performance for very large codes (larger than would likely be used in practice), and so its performance in the asymptotic limit may not be so important in practice.

IX. CONCLUSION
In this work, we have studied the problem of decoding higher dimensional hypergraph product codes in the single-shot regime. A common decoding strategy for these higher dimensional codes has been to use a two-stage decoder, which first attempts to repair the syndrome, before using the corrected syndrome to estimate a data qubit correction. However, two-stage decoders introduce additional failure mechanisms (metacheck failures), and can be bottlenecked by the syndrome repair stage. Here, we demonstrate the improved performance that can be attained by decoding in a single stage. Using a single-stage BP+OSD decoder, our results are consistent with a sustainable threshold of around 7.1% for Z errors with the 3D toric code with a phenomenological noise model, a significant improvement on the threshold of 2.90% observed using the two-stage decoder of Ref. [8].
Since the optimal threshold of the syndrome repair step is 3.3% for the 3D toric code [40], our work confirms that this first stage was indeed limiting decoding performance [8]. For the 4D toric code, which is single-shot for both X and Z errors, our results are consistent with a threshold of around 4.3% for a phenomenological noise model, which far exceeds the highest previously observed single-shot threshold of 1.59% [9] for this code, using the Hastings decoder [16]. Furthermore, this even exceeds the 2.93% threshold of the 2D toric code for an equivalent noise model, but instead using L repeated rounds of syndrome extraction.
Although we show that BP+OSD is effective when used for single-shot decoding of higher dimensional codes for the system sizes we considered, we also demonstrate that the performance of BP+OSD significantly degrades for large lattice sizes for the 2D toric code. The 2D toric code is particularly challenging to decode with BP since it has many low weight stabilisers, leading to a high probability that there will be degenerate solutions for any given syndrome, which in turn causes the problem of "split beliefs" for BP [32]. We do not observe this degradation of performance for the higher dimensional codes we studied, despite simulating larger system sizes for these codes than for the 2D toric code. However, we cannot rule out that a similar problem will arise for even larger system sizes with these higher dimensional codes, and it is challenging to test this regime numerically given the high computational complexity of OSD, which is cubic in the number of qubits. Our results therefore motivate further work to prove the existence of a threshold (or lack of a threshold) for BP+OSD for a family of quantum LDPC codes, as well as the development of improved post-processing techniques that use BP marginals. Despite these challenges, BP+OSD still has good performance for the large finite system sizes of 3D and 4D toric codes considered in this work, and so the asymptotic regime may not be so important in practice.
In future work, it would be interesting to investigate the performance of these higher dimensional hypergraph product codes using more realistic, circuit-level noise models. We might expect the threshold of the 4D toric code to be competitive with that of the 2D surface code, given that we have observed a higher threshold here under a phenomenological noise model, and the 4D toric code only has a slightly higher check weight of six, relative to the weight-four checks of the surface code. The circuit-level noise can be handled by using BP+OSD to decode the full circuit-level Tanner graph [41], in which there is a variable node for each possible fault mechanism in the stabiliser measurement circuit. However, this requires the use of much larger check matrices, so more efficient post-processing methods than OSD will likely be required. Finally, additional improvements in performance can be expected by using soft information to update the priors on measurement errors [42,43].

Appendix A: Variants of BP

In this section we review the variants of BP that we used in this work; we refer the reader to Ref. [22] for a more complete overview of BP. In Section V we reviewed the update rule known as the "tanh rule" for implementing BP when representing probabilities using log-likelihood ratios (LLRs). The log-likelihood ratio L(U) of a binary random variable U is defined as

$$L(U) = \ln\frac{P(U=0)}{P(U=1)},$$

where the natural logarithm is used. It follows that $P(U=0) = \frac{e^{L(U)}}{1+e^{L(U)}}$ and $P(U=1) = \frac{1}{1+e^{L(U)}}$. In this representation, the LLR of the sum (modulo 2) of two independent binary random variables U and V is given by

$$L(U \oplus V) = 2\tanh^{-1}\!\left(\tanh\frac{L(U)}{2}\tanh\frac{L(V)}{2}\right). \qquad \text{(A2)}$$

The identity in Equation (A2) is the origin of the "tanh rule" horizontal update rule for implementing BP using LLRs (see Equation (14)). However, using the tanh rule directly in an implementation of BP can lead to at least two problems.
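The LLR identity in Equation (A2) can be checked numerically against the direct probability calculation (a minimal sketch):

```python
import math

def llr(p1):
    """Log-likelihood ratio ln(P(U=0)/P(U=1)) of a bit with P(U=1) = p1."""
    return math.log((1 - p1) / p1)

def llr_xor(l_u, l_v):
    """LLR of U xor V via the tanh identity (Equation (A2))."""
    return 2 * math.atanh(math.tanh(l_u / 2) * math.tanh(l_v / 2))
```

For independent bits with P(U=1) = p_u and P(V=1) = p_v, the exclusive-or is 1 with probability p_u(1−p_v) + p_v(1−p_u), and the two computations agree.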
Firstly, once the LLR messages become sufficiently large, a direct implementation of the horizontal tanh rule updates as given in Equation (14) using floating point arithmetic is numerically unstable. Secondly, the hyperbolic tangent computations can be too expensive for some hardware implementations. To see the first problem, consider the round trip $f(x) = \tanh^{-1}(\tanh(x))$ in double precision. For x ≫ 0, the absolute rounding error in tanh(x) is approximately constant and equal to $\delta \tanh(x) \approx 2^{-54}$. This results in an error in f(x) of $\delta f(x) \approx 2^{-54}/(1 - \tanh^2(x)) \approx e^{2x}/2^{56}$. Since tanh and tanh−1 are both antisymmetric, the same holds for x ≪ 0, i.e. the numerical error scales exponentially in |x|, as $\delta f(x) \approx e^{2|x|}/2^{56}$, as also shown empirically in Fig. 12. One approach to handle these underflows is to cap the magnitude of the log-likelihood ratios when evaluating the check-to-variable messages. Alternatively, Equation (14) can be implemented exactly using the Jacobian approach, outlined in Appendix A 1, which does not suffer from these numerical stability issues.
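The saturation and the capping fix can be seen directly in double precision; the cap value of 15 below is an illustrative choice, not a recommendation from the paper:

```python
import math

CAP = 15.0  # illustrative LLR magnitude cap

def safe_atanh_tanh(x):
    """Round-trip tanh^{-1}(tanh(x)) with the input magnitude capped,
    avoiding the saturation tanh(x) -> 1.0 in double precision (for
    which atanh diverges)."""
    x_capped = max(-CAP, min(CAP, x))
    return math.atanh(math.tanh(x_capped))
```

Without the cap, tanh(30.0) evaluates to exactly 1.0 in IEEE double precision and the inverse is undefined; with the cap, the round trip stays finite at the cost of a bounded bias in very confident messages.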
However, the Jacobian approach requires evaluating logarithms and exponentials and, as with the tanh rule, is not so amenable to fast implementations in hardware. The min-sum update rule is an approximation of the tanh rule which avoids its numerical instability issues as well as requiring only real additions, simplifying its implementation. We review the min-sum update rule in Appendix A 2.

The Jacobian approach
The horizontal tanh rule of Equation (14) can be rewritten in a form that does not suffer from numerical underflow issues. To derive this alternative update rule [22,44], first notice that

$$L(U \oplus V) = \mathrm{sgn}(L(U))\,\mathrm{sgn}(L(V))\min(|L(U)|,|L(V)|) + \ln\!\left(1+e^{-|L(U)+L(V)|}\right) - \ln\!\left(1+e^{-|L(U)-L(V)|}\right). \qquad \text{(A7)}$$

In order to implement the log-likelihood horizontal update rule using Equation (A7) [22], we first assume that the factor f_j is connected to k variable nodes on the factor graph labelled v_1, v_2, ..., v_k (i.e. we assume ∂j = {1, 2, ..., k}). We now define the partial sums f_1 = v_1, f_2 = f_1 ⊕ v_2, ..., f_{k−1} = f_{k−2} ⊕ v_{k−1} and g_k = v_k, g_{k−1} = g_k ⊕ v_{k−1}, ..., g_1 = g_2 ⊕ v_1, and then calculate all L(f_i) and L(g_i) using Equation (A7) recursively (e.g. L(f_2) = L(f_1 ⊕ v_2)).
Therefore, the horizontal messages are computed using this 'Jacobian rule' for 1 < i < k as

$$Q_{f_j \to v_i} = (-1)^{z_j} L(f_{i-1} \oplus g_{i+1}),$$

and the remaining messages are calculated as $Q_{f_j \to v_1} = (-1)^{z_j} L(g_2)$ and $Q_{f_j \to v_k} = (-1)^{z_j} L(f_{k-1})$.
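The forward/backward recursion above can be sketched in a few lines; `boxplus` implements the exact Jacobian form of the pairwise LLR sum, and the function names are illustrative:

```python
import math

def boxplus(a, b):
    """Exact LLR of the XOR of two variables (Jacobian form of Eq. (A7)):
    numerically stable, using only log1p/exp of non-positive arguments."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    mag = min(abs(a), abs(b))
    corr = (math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))
    return sign * mag + corr

def check_messages(llrs, syndrome_bit):
    """Check-to-variable messages via forward/backward partial sums, so
    each variable is excluded from its own message using O(k) boxplus
    operations in total (k = check degree, assumed >= 2)."""
    k = len(llrs)
    fwd = [llrs[0]]                 # fwd[i] = L(v_1 xor ... xor v_{i+1})
    for x in llrs[1:]:
        fwd.append(boxplus(fwd[-1], x))
    bwd = [llrs[-1]]
    for x in reversed(llrs[:-1]):
        bwd.append(boxplus(bwd[-1], x))
    bwd.reverse()                   # bwd[i] = L(v_{i+1} xor ... xor v_k)
    sgn = -1.0 if syndrome_bit else 1.0   # the (-1)^{z_j} factor
    out = []
    for i in range(k):
        if i == 0:
            m = bwd[1]
        elif i == k - 1:
            m = fwd[k - 2]
        else:
            m = boxplus(fwd[i - 1], bwd[i + 1])
        out.append(sgn * m)
    return out
```

Each message combines the partial sums on either side of the excluded variable, matching the f/g construction in the text.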

The min-sum rule
While the tanh rule and Jacobian rule can both be implemented efficiently in a time linear in the blocklength, for some practical applications it becomes necessary to use an even simpler decoder, which avoids the costly evaluation of the hyperbolic tangent. A widely used update rule is the min-sum rule, which approximates the tanh rule while being significantly simpler to implement in hardware [22,45]. From Equation (A7) we see that |L(U ⊕ V )| ≤ min(|L(U )|, |L(V )|), which leads us to the min-sum horizontal update rule

$$Q_{f_j \to v_i} = (-1)^{z_j}\, \alpha \left(\prod_{i' \in \partial j \setminus i} \mathrm{sgn}\!\left(L(v_{i'})\right)\right) \min_{i' \in \partial j \setminus i} |L(v_{i'})|,$$

where here 0 ≤ α ≤ 1 is a constant chosen to better approximate the tanh rule. By setting α = 2^−a + 2^−b for some a, b ∈ {1, 2, 3}, multiplications can be replaced with bit shifts and additions, which simplifies FPGA implementations [37].
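A reference implementation of the normalised min-sum check update is short; note that only the two smallest input magnitudes are needed for all extrinsic minima. The value α = 0.625 = 2⁻¹ + 2⁻³ is an illustrative choice of the scaling constant:

```python
import math

def min_sum_messages(llrs, syndrome_bit, alpha=0.625):
    """Normalised min-sum check-to-variable messages (check degree >= 2).

    alpha = 0.625 = 2**-1 + 2**-3 is an illustrative scaling constant,
    implementable with bit shifts and additions in hardware."""
    k = len(llrs)
    signs = [math.copysign(1.0, x) for x in llrs]
    mags = [abs(x) for x in llrs]
    total_sign = math.prod(signs) * (-1.0 if syndrome_bit else 1.0)
    # the two smallest magnitudes determine every extrinsic minimum
    order = sorted(range(k), key=lambda i: mags[i])
    m1, m2 = mags[order[0]], mags[order[1]]
    out = []
    for i in range(k):
        extrinsic_min = m2 if i == order[0] else m1
        # extrinsic sign = total sign with this variable's sign divided out
        out.append(alpha * total_sign * signs[i] * extrinsic_min)
    return out
```

Dividing out a variable's own sign via multiplication is valid because each sign is ±1.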

Appendix B: Locality of BP
We observed in Section VIII D that increasing the maximum number of iterations of BP beyond some fixed number did not noticeably impact performance, even for large distance codes. This suggests that it was not the number of iterations that was limiting performance for the 2D toric code. To understand this better, in Figure 13 we consider the simple problem of decoding a single vertical string-like error in the 2D toric code using BP. In other words, the syndrome is a pair of -1 X stabiliser measurements, separated by some number l of vertical edges in a vertical column of the lattice. We will refer to the locations of -1 stabiliser measurements as defects. We have deliberately set up the problem in such a way that there is only a single minimum-weight solution, to reduce the impact of "split beliefs" due to degenerate solutions in quantum codes.

Figure 13. The marginals output by BP for an L = 15 toric code for five different syndromes (and with a prior of 0.05 for each variable node in the Tanner graph). We consider the problem of decoding Z errors using X stabiliser measurements, and a qubit is associated with each edge and an X stabiliser with each vertex of the lattice. For each syndrome we only show a 4 × 7 region of the lattice. A -1 X stabiliser measurement is displayed as a green cross, and the colour of each edge shows the marginal probability output by BP (according to the colour bar on the right). The title for each example syndrome shows the number of iterations used by BP, as well as a tick if BP converged or a cross if it did not converge. We use a maximum number of iterations of 100.
We find that, for l ≤ 3, BP converges quickly, as shown by the marginals in the left three examples in Figure 13. However, when the defects are separated by 4 or more edges (e.g. l = 4 and l = 5 for the right two examples in Figure 13), BP instead fails to converge. This suggests that information is only propagated effectively by BP within some local region of the lattice. Since BP is exact on tree graphs, and split beliefs are not a problem for this example, we expect that this issue arises due to loops in the Tanner graph. For the 2D toric code, the shortest loop in the Tanner graph for Z errors (e.g. considering only the X check matrix) has length 8; for each length-4 loop around a face of the lattice, there is a corresponding length-8 loop in the Tanner graph. Similarly, the defects separated by l = 4 edges in the lattice are in fact also separated by 8 edges in the Tanner graph. This suggests that the ability of BP to effectively propagate information significantly degrades beyond a radius in the Tanner graph that corresponds to its girth (some information propagates, but the strength of the signal is reduced). We refer to this as the problem of bounded information spread in BP. Note that we are not claiming that no information propagates beyond the radius equal to the girth. Indeed, we verified that there is still enough information in the BP marginals for l = 4 and l = 5 for OSD to find the minimum-weight correction in Figure 13, since the marginal probabilities are slightly larger along the minimum-weight path (albeit much less than 0.5). However, in a more realistic setting in which more defects are present, we might expect the messages propagated from nearby defects (within the radius equal to the girth) to overwhelm or 'hide' the much weaker messages passed from further away in the Tanner graph. 
Therefore, there may be limited benefit in increasing the maximum number of iterations of BP with system size for codes with Tanner graphs that have constant girth, and so it is reasonable to leave this parameter as a constant (30 in our case). Note that the full Tanner graph of a quantum code in general has girth 4, in contrast to the Tanner graph of either the X check matrix or Z check matrix alone when decoding X and Z errors independently, as we have done here, which potentially worsens the problem of bounded information spread in BP. This is due to the commutativity condition of quantum codes. For example, for the full Tanner graph of a CSS quantum code, each X stabiliser must overlap on an even number of qubits with each Z stabiliser. Consider an X stabiliser S X and Z stabiliser S Z which overlap on two qubits i and j. In the full Tanner graph there will be variable nodes corresponding to Y errors Y i and Y j . As a result there will be a loop (containing four edges) in the full Tanner graph that visits nodes in the order (S X , Y i , S Z , Y j , S X ).
Our analysis does, however, further emphasise the limitations of BP alone for the 2D toric code, and so post-processing is crucial to ensure that these string-like errors with l ≥ 4 can be corrected. While the inference performed by BP is useful, it appears important (for the 2D toric code at least) that the post-processing step be performed by a decoder that considers global syndrome information. For quantum codes where these issues of split beliefs and locality arise, a natural strategy might be to take some decoder known to have a threshold for the code of interest, and then use BP marginals to boost its performance. That way we do not rely on the BP marginals alone to provide a correction, since these marginals may not be sufficiently reliable.
Appendix C: Additional numerical results for balanced product codes

In Figure 14 we show the single-shot decoding performance of the [[1628,100]] balanced product code for a varying number of rounds r of noisy syndrome measurement (before one final round of perfect syndrome measurement). In other words, r = 0 corresponds to perfect syndrome measurement (a code capacity model). The logical error rate increases with the number of rounds as expected, but the increase remains stable, growing approximately linearly with the number of rounds for r > 1.