A hardware-efficient leakage-reduction scheme for quantum error correction with superconducting transmon qubits

Leakage outside of the qubit computational subspace poses a threatening challenge to quantum error correction (QEC). We propose a scheme using two leakage-reduction units (LRUs) that mitigate these issues for a transmon-based surface code, without requiring an overhead in terms of hardware or QEC-cycle time as in previous proposals. For data qubits we consider a microwave drive to transfer leakage to the readout resonator, where it quickly decays, ensuring that this negligibly affects the coherence within the computational subspace for realistic system parameters. For ancilla qubits we apply a $|1\rangle\leftrightarrow|2\rangle$ $\pi$ pulse conditioned on the measurement outcome. Using density-matrix simulations of the distance-3 surface code we show that the average leakage lifetime is reduced to almost 1 QEC cycle, even when the LRUs are implemented with limited fidelity. Furthermore, we show that this leads to a significant reduction of the logical error rate. This LRU scheme opens the prospect for near-term scalable QEC demonstrations.

Quantum computing has recently reached the milestone of quantum supremacy [1] thanks to a series of improvements in qubit count [2,3], gate fidelities [4][5][6][7][8][9][10][11][12][13][14][15] and measurement fidelities [16][17][18]. The next major milestones include showing a quantum advantage [19][20][21][22] and demonstrating quantum error correction (QEC) [3,[23][24][25][26][27][28][29][30][31]. Errors accumulate over time in a quantum computer, leading to an entropy increase which severely hinders the accuracy of its output. Thus QEC is necessary to correct errors and remove entropy from the computing system. If the overall physical error rate is below a certain noise threshold for a given QEC-code family, the logical error rate decreases exponentially with the code distance d at the price of a poly(d) overhead, thus allowing to extend the computational time. Recently, small-size instances of error-detecting [29,30] and error-correcting [3] codes have been experimentally realized. To further demonstrate fault tolerance it is crucial to scale up these systems and show that larger distance codes consistently lead to lower logical error rates than smaller distance codes [31].
Leakage outside of the computational subspace [8-10, 12, 32-37], present in leading quantum-computing platforms such as superconducting qubits and trapped ions, poses a particularly threatening challenge to fault tolerance [23,[38][39][40][41][42][43][44][45][46][47][48]. Leakage can increase entropy by making measurement outcomes no longer point to the underlying errors and can effectively reduce the code distance [46]. Furthermore, leakage can last for many QEC cycles [40], making operations on a leaked qubit fail and possibly spread correlated errors through the code [31,39,46]. In particular, leakage falls outside the stabilizer formalism of QEC as it cannot be decomposed in terms of Pauli errors. Stabilizer codes [49,50] and their decoders are thus typically ill-suited to deal * battistel.fra@protonmail.com with leakage, leading to a significant increase of the logical error rate [42,45,46]. If the average leakage lifetime l L avg , that is, the average number of QEC cycles that a qubit stays leaked (after leaking in the first place), fulfills l L avg = O(1) QEC cycles and l L avg d, then for low-enough error rates a threshold is likely to exist [39] as leakage would have a relatively local effect in space and time. Due to a finite energy-relaxation time, leakage does indeed last for l L avg = O(1) QEC cycles. However, in practice it is important how large l L avg is, since if it is low the noise threshold is expected to be higher. Shortening the relaxation time to reduce l L avg is not effective as this increases the physical error rate as well.
A leakage-reduction unit (LRU) [38,39,41,42,47,48,51,52] is an operation introducing a seepage mechanism besides that of the relaxation channel. A LRU converts leakage into regular (Pauli) errors and shortens the average leakage lifetime, ideally to 1 QEC cycle. As discussed above, this is expected to lead to a higher noise threshold, but not as high as for the case without leakage, since the leakage rate effectively adds to the regular error rate thanks to the LRU. As an alternative to the use of LRUs, post-selection based on leakage detection has been adopted [46] as a near-term method to reduce the logical error rate. While leakage detection could also be used to apply LRUs in a targeted way, post-selection is not scalable. By shortening the lifetime to l L avg = O(1) d, the use of LRUs is instead a scalable approach.
In its imperfect experimental implementation a LRU can either introduce extra Pauli errors or mistakenly induce leakage on a non-leaked qubit. Furthermore, in the context of the surface code the LRUs investigated so far [41,42,47] introduce an overhead in terms of hardware and QEC-cycle time. Specifically, these LRUs are variants of the swap-LRU, in which the qubits are swapped at the end of each QEC cycle, taking alternatively the role of data and ancilla qubits. In this way every qubit is measured every 2 QEC cycles. The core arXiv:2102.08336v2 [quant-ph] 6 Jul 2021 of the swap-LRU is the fact that the measured qubits are reset to the computational subspace after the measurement. This can be accomplished by a scheme which unconditionally maps |1 and |2 (and possibly |3 [48]) to |0 [53][54][55], or conditionally using real-time feedback [28,56]. Under the standard assumption that the SWAP gates swap the states of two qubits only if none of them is leaked (which does not necessarily hold in experiment [48]), l L avg is ideally shortened to 2 QEC cycles. On the downside, for the pipelined surface-code scheme in [57], the pipeline is disrupted as qubits cannot be swapped until the measurement and reset operations are completed, leading overall to an increase up to 50% of the QEC-cycle time depending on the reset time. The extra CZ gates, needed to implement the SWAPs, can cause additional errors or leakage as the CZ is the major source of leakage in transmons [8-10, 12, 32-35]. Moreover, in the surface code an extra row of qubits is needed to perform all the SWAPs [41], which is a non-negligible overhead in the near term. All these issues increase the physical error rate by a considerable amount, thus requiring to increase the system size to compensate for that (assuming that the error rates are still below threshold).
In this work we propose two separate LRUs for data and ancilla qubits which use resources already available on chip, namely the readout resonator for data qubits (res-LRU) and a |1 ↔ |2 π pulse conditioned on the measurement outcome for ancilla qubits (π-LRU). In particular, the use of the res-LRU avoids the necessity to swap data and ancilla qubits to be able to reset the data qubits. The res-LRU is a modification of the two-drive scheme in [53][54][55] to a single drive to deplete only the population in |2 but not |1 , making it a LRU rather than a reset scheme. We additionally show that this negligibly affects the coherence within the computational subspace in an experimentally accessible regime, with a low probability of mistakenly inducing leakage as long as the thermal population in the readout resonator is relatively small. This allows us to unconditionally use res-LRU in the surface code in every QEC cycle. In the pipelined scheme [57] the res-LRU easily fits within the time in which the data qubits are idling while the ancilla qubits are finishing to be measured. As the π-LRU can be executed as a short pulse at the end of the measurement time with real-time feedback, our LRU scheme overall does not require any QEC-cycle time overhead. Using density-matrix simulations [46,50,58] of the distance-3 surface code (Surface-17), we show that the average leakage lifetime is reduced to almost 1 QEC cycle when res-LRU and π-LRU with realistic performance are employed. Furthermore, compared to the case without LRUs, the logical error rate is reduced by up to 30%. The proposed res-LRU and π-LRU can be straightforwardly adapted to QEC-code schemes other than [57] and the res-LRU is potentially applicable to superconducting qubits with higher anharmonicity than transmons. The demonstrated reduction serves as evidence of scalability for our LRU scheme, even though we can-not estimate a noise threshold as we have simulated only one size of the surface code. To explore larger codes it is necessary to use less computationally expensive simulations [23,39,42] that use a simplified version of our error model at the cost of losing some information contained in the density matrix. Furthermore, to optimize the noise threshold the LRUs can be supplied with a leakage-aware decoder [23,39,42,[59][60][61] that uses measurement information about leakage to better correct leakage-induced correlated errors.

I. READOUT-RESONATOR LRU
The readout resonator has been used [53][54][55] to reset a transmon qubit to the |0 state, depleting the populations in |1 and |2 . Targeting the |20 ↔ |01 transition, with the notation |transmon, resonator , those populations are swapped onto the readout resonator, where they quickly decay due to the strong coupling to the transmission-line environment. Ref. [53] uses two drives simultaneously while Refs. [54,55] use these drives in a three-step process. Here we adapt these techniques to use a single drive in a single step to deplete the population in |2 only.
A LRU is defined [38] as an operation such that 1) the incoming leakage population is reduced after the application of the LRU, 2) the induced leakage when applied to a non-leaked state is ideally 0. We thus ensure below that not only leakage is reduced but also that the effect that the drive has on a non-leaked transmon is as small as possible.

A. Transmon-resonator Hamiltonian
We consider a transmon capacitively coupled to a resonator and to a dedicated microwave drive line. The resonator possibly employs a Purcell filter which we do not include explicitly. In a frame rotating at the transmondrive frequency ω d for both the resonator and the transmon, the Hamiltonian is time-independent and is given by where a and b are the creation operators for the resonator and the transmon, respectively; δ r = ω r − ω d and δ q = ω q −ω d with ω r and ω q the resonator and transmon frequencies, respectively; α < 0 is the transmon anharmonicity; g corresponds to the capacitive coupling; Ω and φ are the transmon-drive amplitude and phase, respectively. The phase is not relevant for the results in this work and we fix it to φ = 0. Figure 1. Concept of the readout-resonator LRU. (a) The state |20 (with the notation |transmon, resonator ) is connected to |01 by two main paths via either |11 or |10 , due to the capacitive coupling g or the transmon-drive amplitude Ω, respectively. This generates an effective couplingg which can be used to swap |20 ↔ |01 . The latter quickly decays to |00 due to the typically high coupling κ of the readout resonator to the transmission-line environment, overall removing leakage from a leaked transmon. (b) In the rotating frame of the drive, |20 and |01 show an avoided crossing as a function of the drive frequency ω d , centered at ω * d . The effective couplingg(ω * d ) is equal to half the energy separation at that point. (c),(e) ∆ω * d := ω * d − (2ωq + α − ωr) andg(ω * d ) are respectively evaluated either exactly by full numerical diagonalization of H in Eq. (1), or by approximate analytical formulas (see Section I A and Appendix A) for the parameters in Table I. The absolute errors with respect to the exact curves are shown in (d),(f) respectively.
We can qualitatively understand (see Fig. 1(a)) that H contains an effective couplingg between |20 and |01 . If ω d matches the transition frequency between the "bare" |20 and |01 , these two states are degenerate in the rotating frame and they are connected by two paths (at lowest order) via either |11 or |10 . If ∆ := ω q − ω r g and δ q Ω, then |11 and |10 are occupied only "virtually" and one gets purely an effective |20 ↔ |01 coupling. Modulo a constant term, in the 2D subspace S = span{|20 , |01 } we can write H in Eq. (1) as H| S ≡ −η(ω d )Z/2 +g(ω d )X for an appropriate function η (an approximation can be extracted from Eq. (A35)). As a function of ω d this Hamiltonian gives rise to a |20 ↔ |01 avoided crossing centered at a frequency ω * d (see Fig. 1(b)) where η(ω * d ) = 0. The energy separation at the center of the avoided crossing is then 2g(ω * d ). In order to quantitatively study the action of H, we unitarily transform it using a Schrieffer-Wolff transformation e S [62][63][64][65]. Let {|ij D } be the basis of eigenvectors of H 0 + H c (the transmon-resonator "dressed" basis). In the dispersive regime (g ∆), with respect to a 1st-order Schrieffer-Wolff transformation S 1 in the perturbation parameter g/∆, such that e −S1 |ml ≈ |ml D , we get (see Appendix A) H D := e S He −S ≈ e S1 He −S1 with where ∆ m := ∆ + αm and {|m } are transmon states. H D 0 is diagonal and contains the dispersive shifts, H D d1 is the transmon drive now in the unitarily transformed frame, H D d2 contains an indirect resonator drive and couplings of the kind a † |m m + 2|. In particular, for m = 0 in Eq. (9) we get a lowest order approximation ofg: Notice that at this order there is no dependence on ω d . Furthermore,g would vanish for α = 0, since the two paths in Fig. 1(a) fully destructively interfere in that case. Since α is low for transmons, one can expect that Ω needs to be relatively large forg to be substantial. For the drive to be most effective it is important that ω d matches ω * d . If g = 0 = Ω, there is no avoided crossing but |20 and |01 simply cross at ω * d,0 ≡ 2ω q + α − ω r as can be straightforwardly computed from H 0 in Eq. (2). This value is shifted due to the capacitive coupling (as can be seen from Eq. (7)), as well as due to the possibly strong drive. For g = 0 and Ω = 0 one can either compute ω * d by full numerical diagonalization of H and find the avoided crossing as a function  Table I. Parameters used both in the analysis and Lindblad simulations of the readout-resonator LRU, similar to the experimental ones in [27]. The transmon parameters correspond to the target parameters of a high-frequency data qubit in Section II.
of ω d , or one can find an (approximate) analytical expression. For the latter we use another Schrieffer-Wolff transformation (rather than the resolvent method in [54], which does not give the full Hamiltonian) to account for the effect of the transmon drive H D d1 and to compute ω * d up to order Ω 4 /(δ q ) 3 , see Appendix A. We also use this transformation to computeg up to order Ω 3 /(δ q ) 2 .
, respectively, given the parameters in Table I. We consider 6 energy levels for the transmon and 3 for the resonator as we see that the exact curves converge for such choice. In Fig. 1(c)(d) we see that the two approximations are both pretty good, while in Fig. 1(e)(f) we see that Eq. (10) deviates by up to 1 MHz from the exact value at high Ω and that the absolute error with respect to the exactg(ω * d ) scales in a seemingly quadratic way. Instead, the higher order approximation stays closer to the exact curve and the error scales linearly. We expect that the remaining gap would be mostly filled by considering also higher orders in g/∆ in the first Schrieffer-Wolff transformation, since increasing only the order of approximation in Ω/δ q does not provide a significant improvement in Fig. 1(d).

B. Performance of the readout-resonator LRU
Given the theoretical understanding of the transmonresonator system, we devise a pulse to minimize the population in |2 on a leaked transmon. We consider the pulse shape similarly to [54], where t p is the total pulse duration, at a fixed frequency ω d (t) = ω d . Hence, there are four parameters to optimize over, i.e. Ω, ω d , t p and t rise . We fix t rise = 30 ns since we observe that this strongly   suppresses non-adiabatic transitions out of the manifold of interest: for example, |20 is coupled to |10 by the drive but they are quite off-resonant, so only a fast pulse can cause "non-virtual" transitions between them. Indeed, for t rise 10 ns there appear ripples (for an example see [54]) in e.g. the |20 and |10 populations when the drive is turned on and off, leading to a reduction in performance. We expect that an improved pulse shape can shorten t rise . However, we do not explore this given the long maximum t p allowed in our surface-code scheme (t p ≤ T slot = 440 ns, see Section II A).
We use Lindblad simulations of the transmonresonator system to optimize over Ω, ω d and t p . The Lindblad equation is given bẏ with {K j } the quantum jump operators. We express (and solve) this equation in the exact unitarily transformed frame. That is, while in Section I A we have used a first-order Schrieffer-Wolff transformation e S1 (see Eq. (5)), in the numerics we compute the full transformation e S (see also Eq. (5)). In this way we find the basis that exactly diagonalizes H 0 + H c and express H d in this basis as well, without any further Schrieffer-Wolff transformation like in Section I A. In other words, the simulations reproduce the dynamics under the Hamiltonian in Eqs. (1) to (4) without any approximation. The Hamiltonian parameters are the same as in Section I A and are reported in Table I, including the noise parameters. In particular, while we neglect the transmon thermal population, we include it for the resonator since it determines the leakage that the pulse induces when the transmon was not leaked, as we discuss below. The resonator thermal state is given by [67] for low average photon numbern. We consider dressed relaxation and dephasing, as given below, assuming that this is a good model in the dispersive regime. In the unitarily rotated frame, the employed jump operators {K j } are explicitly given by where T φ = (1/T 2 − 1/2T 1 ) −1 and where we consider 6 energy levels for the transmon and 3 for the resonator.
Note that e.g. for a, going back to the original frame it holds that e −S ae S = 1 l=0 √ l + 1 |l D l + 1| D = a D by definition of e S , corresponding indeed to relaxation in the dressed basis. By considering dressed relaxation and dephasing, the effective relaxation time T q 1 of the transmon is not shortened by the fact that it is coupled to a lossy resonator (Purcell effect). We assume that this is a good approximation also during driving as the drive couples eigenstates which mostly have the same number of excitations in the resonator (except for |20 and |01 when the drive is near-resonant with this transition and causes a strong mixing of these states). We thus mimic the use of a Purcell filter but without including it in the simulations since that would increase the Hilbert-space dimension in a computationally expensive way.
For each choice of (Ω, ω d ) we optimize t p such that, given the initial state |2 2| ⊗ σ th , the leakage population p |2 = 2| Tr r (ρ(T slot ))|2 at the end of the available time slot is minimized (see Fig. 2(a)). The states |20 and |01 approximately form a two-level system with additional damping from |01 to |00 , thus the drive effectively induces damped Rabi oscillations [68] between them. Oscillations occur only forg > κ/4 [68] (underdamped regime), while forg = κ/4 (critical regime) org < κ/4 (overdamped regime) the populations in |20 and |01 simply decay in an exponential-like way without forming any minimum. For the parameters in Table I the critical drive amplitude that givesg = κ/4 is Ω cr /2π ≈ 143 MHz. Thus for Ω ≤ Ω cr the best strategy is to drive until p |2 reaches a (low) practically-stable value (which is in general not 0 when the full system is taken into account). Here with the given κ we find that this occurs in a time comparable to T slot only from about Ω = Ω cr , so for Ω ≤ Ω cr we drive for the entire T slot . For Ω > Ω cr the optimization has many local minima as a function of t p , corresponding to the minima of the |20 ↔ |01 oscillations induced by the drive. Here we choose to target the first minimum as in [54,55] since it is the fastest approach. For a sudden pulse this minimum would occur around π/2g for sufficiently small κ, whereas we find heuristically that a good initial guess for the optimization is π/2g damp withg damp := g 2 − (κ/4) 2 e −κ/7g for larger κ. Then for the optimization over t p we use the bounds t p − 2t rise ∈ [0, 1.1 × π/2g damp ] (using the bounded Brent method in scipy; we provide the code at https://doi.org/10.4121/14762052). While using a longer t p in the underdamped regime (possibly even greater than the allotted T slot ) would eventually lead to an even lower leakage population [53], it is not necessarily desirable as a longer t p may mean that the disturbance to a non-leaked transmon is greater as well (see Appendix B 2).
While the procedure above optimizes t p given a certain pair (Ω, ω d ), we use the package adaptive [66] to choose the next pair to sample and we iterate this process. This package searches a given parameter space (here Ω/2π ∈ [0, 500 MHz], ω d /2π ∈ [5.19, 5.26 GHz]) in a finer way where the given cost function changes faster. Here we use (log p |2 ) 2 as the cost function since it changes faster where p |2 is small, allowing us to get both a high-resolution heatmap (see Fig. 2) and a good first estimation of the p |2 minima in a single run. Then we run a local optimization with tight bounds around some of these candidate points for fine tuning.
In Fig. 2(a) one can observe a band with low p |2 as desired. This band occurs at drive frequencies slightly above ω * d (Ω), which one would expect to be optimal based on Section I A. We attribute this to the fact that a significant share of the time is taken by the rise and fall of the pulse, where Ω(t) is smaller than the maximum. We find that one can choose a broad range of Ωs to achieve a p |2 0.5%, from 130 MHz (slightly below the critical point) to deep in the underdamped regime. However, other considerations apply, namely, on the high end using a very high Ω poses strong experimental requirements on the drive, while on the low end the pulse takes much longer and it is not a priori given that driving at the critical point would be best. Actually, notice that driving at the critical point with good performance is possible only due to the relatively high T slot for the given κ. In the following we choose the point marked by a star in Fig. 2 as the operating point (Ω/2π ≈ 204 MHz, ω d /2π ≈ 5.2464 GHz, t p = 178.6 ns). This point reaches p |2 op. ≈ 0.5% while affecting the least the coherence within the computational subspace (see Appendix B 1). We attribute the fact that this minimum does not reach 0 to re-heating from |00 to |01 , as well as transmon decoherence (resonator pure dephasing would contribute as well but here T r φ = ∞) and interactions with higher energy levels. We note that in Fig. 2(a) we find good p |2 5% up to Ω/2π 100 MHz, which could be used to further ease the requirements on the drive (see Section II).
The time evolution for a few selected states is shown in Fig. 2(b) for the operating point, given the initial state |2 2| ⊗ σ th . The first few ns make |20 rotate into |01 , while the latter decays relatively fast to |00 due to the large relaxation rate κ of the readout resonator. Already after ≈ 220 ns the remaining |01 population has practically returned to the thermal state. The repetition of the pulse, such as in the surface code (see Section II) at every QEC cycle, thus does not lead to heating of the resonator with these system parameters (see Section III for a discussion about other parameter regimes).
We now evaluate the effect of the pulse on a non-leaked transmon (see Fig. 2(c),(d)). There should ideally be no effect, except for an acquired single-qubit phase which can easily be determined and corrected by either a real or virtual Z rotation. First, if the transmon is in |0 and there is some thermal population in the resonator, part of the state is supported on |01 , which rotates into |20 in the same way as the opposite process by unitarity. Figure 2(c) shows that indeed the induced leakage is greater where p |2 is lower in Fig. 2(a). However, due to the lown = 0.005, the induced leakage is also overall low (p |2 ≈ 0.48% in Fig. 2(c) at the operating point, which is comparable to state-of-the-art CZ leakage rates, see Section II B) and can be made even lower by engineering colder resonators. If the initial state is |1 1| ⊗ σ th there is little induced leakage (p |2 ≈ 0.02% at the operating point and p |2 0.04% across the whole landscape) as the drive is off-resonant with transitions from this state. Second, the pulse might affect the coherence times of the transmon by driving transitions within or outside the computational subspace (and back), as the small but non-negligible transitory population in |10 in Fig. 2(b),(d) seems to suggest. However, we find that both the effective T q 1 and T q 2 are only marginally affected as a function of Ω (see Appendix B 1). This is because stronger pulses cause a somewhat stronger disturbance to the qubit, but they are shorter so that in total the effect is small.
(a) Schematic overview of the Surface-17 layout [46,57]. Pink (resp. red) circles with D labels represent low-(high-) frequency data qubits, while blue (resp. green) circles with X (Z) labels represent ancilla qubits, which have an intermediate frequency. Ancilla qubits and high-frequency data qubits are prone to leakage during the CZ gates. (b) The quantum circuit for a single QEC cycle employed in simulation, for the unit-cell scheduling defined in [57], in which we insert the LRUs. The res-LRUs (orange) are applied unconditionally on the high-frequency data qubits after the CZs, while the π-LRUs (teal) are applied on the ancilla qubits depending on the measurement outcome. Gray elements correspond to operations belonging to the previous or the following QEC cycle. The duration of each operation is given in Appendix C 1. The arrow at the bottom indicates the repetition of QEC cycles.

A. Layout and operation scheduling
We study the distance-3 rotated surface code (see Fig. 3(a)), nicknamed Surface-17, in the presence of leakage and LRUs. We follow the frequency and pipelined scheme in [57], in which the 9 data qubits are subdivided into 3 high-and 6 low-frequency ones. The 4 X and the 4 Z ancilla qubits have an intermediate fre-quency. We consider the flux-pulse implementation of the CZs [9,10,[32][33][34] for tunable-frequency transmons, in which the transmon with the greater frequency is lowered towards the other one with a flux pulse. With this technique fluxed transmons are prone to leakage. This means that the high-frequency data qubits and all the ancilla qubits can leak. As shown in [46], leakage can last for many QEC cycles and be quite detrimental to the logical performance of the code. Here we address these issues with the res-LRU for high-frequency data qubits and with the π-LRU for ancilla qubits, as described below. If due to a different implementation of the CZs (or due to leakage mobility [46,48]) also the low-frequency data qubits can leak, one can apply the res-LRU to them as well but we do not explore this here.
The circuit executed for each QEC cycle is shown in Fig. 3(b). The X-type and Z-type parity-check units are implemented in an interleaved way, with the CZs for one unit being applied while the other ancilla-qubit type is measured. The duration of each operation is summarized in Appendix C 1, with a total QEC-cycle duration of 800 ns. The data qubits are idling for a considerable amount of time, namely T slot = 440 ns, while the ancilla qubits are measured. We choose this time slot as the ideal place to apply the res-LRUs, introduced in Section I, to the high-frequency data qubits. Notice that the optimal pulse selected in Section I B, which was simulated for the target parameters of the high-frequency data qubits, takes about t p = 180 ns and thus easily fits within this time slot (see Section III for a discussion about other parameter regimes).
For the ancilla qubits there is no available time slot to apply the res-LRU. A possibility would be to make the QEC-cycle time longer by inserting these LRUs when the measurement is completed. However, this approach would lower the logical error rate of the code by a nonnegligible amount. On the other hand, ancilla qubits are measured and the (analog) measurement outcome contains information about leakage [46]. We choose to use a different type of LRU altogether which uses this information. Specifically, we consider a |1 ↔ |2 π pulse, conditioned on the measurement outcome reporting a |2 . Below we discuss further details of the implementation of this π-LRU.

B. Implementation of the LRUs in the density-matrix simulations
We use density-matrix simulations [50] using the opensource package quantumsim [58] to study Surface-17 with res-LRUs and π-LRUs. We include relaxation and dephasing (T 1 and T 2 ), as well as flux-dependent T 2 and leakage rate L 1 during the CZs, following the same error model as in [46]. L 1 is defined as the average leakage from the computational to the leakage subspace [69]. The state of the art is L 1 ≈ 0.1% [9,10], although the actual L 1 is expected to be higher when operating a multi-transmon processor [30,70], thus here we consider up to L 1 = 0.5%. We assume that single-qubit gates do not induce any leakage as their leakage rates are typically negligible compared to the CZs [5,36,37]. The noise parameters used are reported in Appendix C 1. Furthermore, during a CZ with a leaked transmon, the non-leaked transmon acquires a phase called the leakage conditional phase [46]. We select these phases uniformly at random (see Appendix C 3) and, in contrast to [46], we then keep them fixed for every Surface-17 simulation in this work. This makes it easier to recognize trends as a function of the LRU parameters. In Appendix C 3 we discuss the variability of the logical error rate depending on the leakage conditional phases. We do not consider further leakage from |2 to |3 in subsequent CZ gates [46] as we expect it to be negligible when LRUs make |2 short-lived.

res-LRU for data qubits
In the simulations, leakage-prone transmons are modeled as 3-level systems and non-leakage-prone ones as 2level systems, leading to an already computationally expensive size for the density matrix. As a consequence, we do not include the readout resonator explicitly in these simulations. The resonator is initially in the ground state and is returned to it at the end of the time slot, approximately. We can thus trace the resonator out and model the res-LRU on the transmon qubit as an incoherent |2 → |0 relaxation (see Appendix C 1 a for details). Furthermore, in Section I B we have observed that the res-LRU can also cause a non-leaked transmon to partially leak, so we include that as an incoherent |0 → |2 excitation.
Calling p |j i , p |j f the populations before and after the res-LRU, we define the leakage-reduction rate 0 ≤ R ≤ 1 as R = 1 − p |2 f conditioned on an initially fully leaked transmon, i.e. for p |2 i = 1. Furthermore, we define the average res-LRU leakage rate L LRU 1 as the average of the induced leakage starting from either |0 or |1 (consistently with the definition for CZ [69]), with probability 1/2 each. Since almost all induced leakage comes from |0 (see Section I B), this means that p glecting relaxation effects as the used T 1 = 30 µs is relatively long). Combining these two definitions one gets the expression for an arbitrary incoming state. Notice that, given these definitions, Fig. 2(a),(c) respectively show a heatmap of 1 − R and 2L LRU 1 for the considered transmonresonator parameters. In particular, the operating point achieves R ≈ 99.5% and L LRU 1 ≈ 0.25%. The achieved leakage reduction can be compared with the one given purely by relaxation during T slot , namely R T1 = 1 − e −T slot /(T1/2) = 2.9%, which shows that the LRU provides a much stronger additional seepage channel.

π-LRU for ancilla qubits
The dispersive readout of a transmon qubit is in general performed by sending a pulse to the readout resonator, integrating the reflected signal to obtain a point in the IQ plane and depleting the photons in the resonator (either passively by relaxation or actively with another pulse) [16][17][18]. The measured point is compared to one or more thresholds to declare the measurement outcome. These thresholds are determined as to optimally separate the distributions for the different outcomes, which have a Gaussian(-like) form. Here we assume that the distribution for |2 is sufficiently separated from |0 and |1 [16]. This is generally expected to be possible thanks to the different dispersive shift. Then one uses three thresholds in the IQ plane to distinguish between |0 , |1 and |2 (or two if |2 is well-separated from e.g. |0 ). We also assume that an outcome can be declared during photon depletion, thus enabling realtime conditional feedback. This is challenging to perform in 200-300 ns in experiment due to the classicalpostprocessing requirements, but it has been previously achieved [28,56]. We can then apply the π-LRU right at the end of the depletion time. The |1 ↔ |2 π pulse is expected to be implementable as a simple pulse in the same way and time as single-qubit gates (20 ns) and with comparable, coherence-limited fidelity.
If conditional feedback is not possible in the allotted time, one can either increase the QEC-cycle duration (at the cost of extra decoherence for all qubits, scaling as 1 − e −textra/T2 per qubit per QEC cycle) or postpone the conditional gate to the next QEC cycle. In the latter case, one source of error corresponds to the ancilla qubit already seeping before the application of the π-LRU, which then causes it to leak instead. The probability of this error is already low and is expected to become even lower with longer T 1 s and lower-leakage CZs. The other errors are the Z rotations (depending on the leakage conditional phases) that the leaked ancilla qubit spreads for at least 1 extra QEC cycle, as well as the fact that the parity-check stays disabled. We do not simulate these variants and we expect a relatively low logicalperformance loss, corresponding to an average leakage lifetime of about 2 QEC cycles (see Figs. 4 and 9).
Readout-declaration errors are expected to affect the performance of the π-LRU. On one hand, an incorrect declaration of |1 as a |2 makes the π pulse induce leakage. On the other hand, declaring a |2 as a |1 would lead to leakage not being corrected and lasting for at least one extra QEC cycle. We define the readout matrix M with entries M ij =: p M (i|j) being the probability that the actual state |j resulting from the projective measurement is declared as an |i . In the simulations we use In particular, this means that we do not consider declaration errors within the computational subspace. While that would change the value of the logical error rate since the error syndrome gets corrupted, it is not relevant for evaluating the performance of the π-LRU since a |0 mistaken for a |1 or vice-versa does not trigger the π-LRU anyway. Furthermore, we assume that a |0 cannot be mistaken as a |2 since their readout signals are often much more separated than the signals of |1 and |2 . Note that if a |0 (rather than a |1 , as we assume in this work) could be mistakenly declared as a |2 , then a |1 ↔ |2 π pulse does not induce leakage, so here we consider the worst-case scenario for the π-LRU.

C. Average leakage lifetime and steady state
Once a qubit leaks, it tends to remain leaked for a significant amount of time, up to 10-15 QEC cycles on average [46]. Starting from an initial state with no leakage, the probability that a qubit is in the leaked state tends towards a steady state within a few QEC cycles. It was shown in [46] that this evolution is well captured by a classical Markov process with leakage (resp. seepage) rate Γ C→L (Γ L→C ) per QEC cycle, where C (resp. L) is the computational (leakage) subspace. Note that here L is 1-dimensional, corresponding to |2 . In our error model, without accounting for LRUs, these rates are approximately given by where N flux is in how many CZ gates the transmon is fluxed during a QEC cycle, t c is the duration of a QEC cycle and L 1 (resp. L 2 ) is the average leakage (seepage) probability of a CZ [69]. Thus the two native mechanisms that generate seepage are the CZs themselves and relaxation.
The major effect of a LRU is to effectively increase Γ L→C in Eq. (19) by introducing an extra seepage mechanism. Hence we expect that Γ LRU L→C ∼ Γ L→C + R for data qubits and Γ LRU L→C ∼ Γ L→C + p M (2|2) for ancilla qubits, preventing leakage from accumulating and lasting long for large R or p M (2|2).
The average leakage lifetime l L avg is the average duration of leakage and for a Markov process it is calculated as n P(stay in L for n QEC cycles) thus assuming that the qubit starts in L. The evolution of the leakage probabilityp L (n), averaged over surfacecode runs, as a function of the QEC-cycle number n is well-approximated by [46] p L (n) = Γ C→L Γ C→L + Γ L→C (1 − e −(Γ C→L +Γ L→C )n ). (22) The steady state is the long-time limit and is given bȳ For ancilla qubitsp L (n) can be computed directly from the "true" measurement outcomes (i.e. without declaration errors on top). For data qubits it can be computed from the density matrix. Specifically, for data qubits we evaluatep L (n) right after the CZs. Figure 4 shows l L avg andp L ss extracted from the Surface-17 simulations by fittingp L (n) to Eq. (22) for each qubit. We can indeed observe that these quantities drop substantially for both data and ancilla qubits. The decays follow an inverse proportionality as e.g. for data qubits for sufficiently large R and small Γ LRU C→L . For ancilla qubits we expect, similarly, a 1/p M (2|2) dependence. The lifetime drops from values 10 to ≈ 1, which is the minimum value it can achieve (some points drop below 1 within error bars as it is difficult for the fit to estimate such a short lifetime). As of course the LRUs do not prevent leakage from occurring during the CZs in the first place, one cannot expect the steady state to reach 0 even for a perfect LRU (R = 1), but ratherp L ss ∼ Γ LRU C→L ≈ N flux L 1 (+L LRU 1 if the LRU can mistakenly induce leakage). Figures 4(b),(d) show that this is indeed the case. Figure 4 also demonstrates that both l L avg andp L ss get close to their minimum values already for R, p M (2|2) 80%. This suggests that res-LRU and π-LRU may not necessarily need to be perfect to provide a good logical performance in Surface-17. This means that one could use e.g. a weaker pulse to implement the res-LRU or that the readout of |2 may not need to be particularly optimized in practice.

D. Logical performance
In the simulations the logical qubit is initialized in |0 L and the logical fidelity F L (n) is computed at the end of each QEC cycle as the probability that the decoder correctly determines whether a logical error has occurred or not. We do not perform a similar analysis with initial state |+ L or other states as the density-matrix simulations are computationally expensive and we expect a similar performance. The logical error rate ε L per QEC cycle  can be extracted by fitting F L (n) = [1+(1 − 2ε L ) n−n0 ]/2, where n 0 is a fitting parameter (usually close to 0) [50]. We evaluate ε L for the upper bound decoder (UB) which uses the complete density-matrix information to infer a logical error, and for the minimum-weight perfectmatching decoder (MWPM). Detailed information about these decoders can be found in [50,71] and an overview is given in Appendix C 1 b.
By mapping a leaked qubit back to the computational subspace, a LRU does not fully remove a leakage error but can at most convert it into a regular (Pauli) error. Hence, it is not to be expected that ε L in the presence of leakage can be restored to the value at L 1 = 0. We consider realistic parameters for the LRUs. Specifically, we use R = 95%, L LRU 1 = 0.25%, p M (2|2) = 90% and p M (1|1) = 99.5%. We have shown in Section I B that the first two parameters can be attained with realistic parameters for the transmon-readout system, while the last two are close to be achievable in experiment [14,53]. In particular, while the operating point has R = 99.5%, we conservatively choose R = 95% here. Notice that p M (1|1) = 99.5% is quite high. We argue that the state of the art can be squeezed as the threshold to distinguish between |1 and |2 in the IQ plane could be moved towards |2 , rather than placing it in the middle as is common practice. In this way one would mance as a function of the LRU parameters can be found in Appendix C 2. Figure 5 shows the reduction in ε L as a function of the CZ leakage rate L 1 when LRUs with the given parameters are employed. Using only the res-LRU or the π-LRU lowers ε MWPM L by basically the same amount, while ε UB L is lower for the π-LRU than for the res-LRU. We attribute this to the fact that UB directly uses the information in the density matrix, while MWPM relies on the measured syndrome, thus being more susceptible to ancilla-qubit leakage. When both LRUs are used, we see that ε L is reduced by an amount which is close to the sum of the reductions when only one kind of LRU is used. As expected, ε L is not restored to the value at L 1 = 0, but the reduction is overall significant and can reach up to 30% for both MWPM and UB compared to the case without LRUs.

III. DISCUSSION
In this work we have introduced a leakage-reduction scheme using res-LRUs and π-LRUs which does not require any additional hardware or a longer QEC cycle. Furthermore, while the scheme in [48] is applicable only to ancilla qubits, our combination of res-LRU for data qubits and π-LRU for ancilla qubits enables to significantly reduce leakage in the whole transmon processor. We have shown with detailed simulations using realistic parameters that the reset scheme in [53][54][55] can be adapted to be a LRU without significantly affecting the states in the computational subspace, allowing to unconditionally apply the res-LRU in the surface code. The use of the res-LRU for data qubits, as well as the use of the π-LRU for ancilla qubits, leads to a substantial reduction of the average leakage lifetime and leakage steady state, preventing leakage from lasting more than ≈ 1 QEC cycles on average, even when the LRUs are imperfect and can introduce leakage themselves. Using full density-matrix simulations of Surface-17 we have demonstrated that this leads to a significant reduction of the logical error rate for both the UB and MWPM decoders.
Regarding the practical implementation of the res-LRU, the required drive amplitude is relatively strong, similarly to the one used in the experiments in [53][54][55]. It is thus important that the microwave crosstalk is minimized by careful engineering of the drive lines. Furthermore, in a multi-transmon processor it is relevant that the drive frequency does not accidentally match any two-qubit or neighboring single-qubit transitions. E.g., in the original scheme in [57] that we followed, the target frequencies are 6.7, 6.0 and 4.9 GHz for high-, mid-and low-frequency qubits, respectively, and 7.8 GHz for the readout resonator [27]. In particular, the mid-frequency qubits (the ancilla qubits) are parked around 5.4-5.5 GHz during measurement, with their |1 ↔ |2 transition around 5.1-5.2 GHz. This is close to the optimal drive frequency found in Section I B (≈ 5.25 GHz), which can lead to an indirect ancilla-qubit drive mediated by the bus resonator, albeit weaker. The difficulty of precise frequency targeting in fabrication can further lead to undesired frequency collisions. These issues can be alleviated by choosing slightly different transmon/resonator frequencies and anharmonicities to make the drive more off-resonant with that transition (combined with better frequency targeting [72]), or they can be mitigated altogether by using tunable couplers [1,11,14]. The res-LRU is compatible with tunable-coupler schemes and their possibly different operation scheduling than in [57], as well as potentially applicable to superconducting qubits which use a resonator for dispersive readout other than the transmon. Tunable couplers would also be advantageous to fully protect the res-LRU performance from residual ZZ crosstalk, even though we find that a cumulative ZZ interaction up to ∼ 2 MHz can be tolerated with fixed couplers (see Appendix B 3). Beside this, if the low-frequency data qubits can leak depending on the implementation of the CZ, the res-LRU can be applied to them in the same time slot as the high-frequency ones. If the thermal population in the readout resonator is relatively high in a given experiment, the effect of a correspondingly high L LRU 1 can potentially be mitigated by applying res-LRU conditionally on the detection of leakage by a set of hidden Markov models [46].
Regarding the viability of inserting the res-LRU in the surface-code time scheduling, the necessary condition is that t p ≤ T slot . We can express T slot as T slot = t m − 4t CZ , where t m is the measurement time for the ancilla qubits. Slower CZs might make T slot too short, although CZs even faster than 40 ns (as assumed here) have been realized in 15 ns [12]. The measurement time can be further broken down into readout-pulse time and photondepletion time, t m = t read + t depl . Both of these would be reduced by a larger κ, however, assuming that the κ's of ancilla-and data-qubit resonators are comparable, t p would be reduced as well. Even if we keep t p and t CZ fixed to the values in this work, we get t m ≥ 340 ns, which is significantly lower than t m = 580 ns as considered here. A desirable, additional condition to the necessary one is that T slot − t p ≥ 4/κ, i.e. that there is enough leftover time in T slot to allow for the data-qubit resonator to return the thermal state, where we estimate that 4 decay constants would suffice (together with the fact that the resonator was already relaxing during t p ). Assuming similar depletion time for data-and ancillaqubit resonators, this roughly means that the res-LRU is easily applicable if t p is smaller or similar to t read . Note that in this work we have T slot −t p ∼ 16/κ and t p < t read . If the additional condition above is not satisfied, one could demand that at least the resonator has returned to the thermal state before the res-LRU in the following QEC cycle, i.e. T slot − t p + 8t CZ + 2t H ≥ 4/κ. In this case the disadvantage would be that the presence of a fraction of a photon in the resonator would cause additional dataqubit dephasing especially during the first few CZs. As the extra photon is present only when the qubit was previously leaked, we expect this disadvantage to be small as long as the overall leakage rate is small. If even the relaxed additional condition is violated, on top of the additional dephasing the resonator would also heat up, effectively leading to a higher L LRU 1 in the QEC cycle(s) following the one in which the qubit leaked. As also this effect scales with L 1 , we expect that it would not be an issue as long as κ is not very low (allowing for at most 1 extra QEC cycle to thermalize we get κ/2π ≥ 1 MHz). Otherwise, leakage would not really be removed from the system but would be largely moved back and forth from the transmon to the resonator.
The demonstrated reduction in the average leakage lifetime and in the logical error rate is expected to lead to a higher noise threshold for the surface code in the presence of leakage, compared to the case without LRUs. Furthermore, for error rates below threshold (both regular and leakage) we believe that the logical error rate would be exponentially suppressed with increasing code distance when employing LRUs. Without LRUs this might hold only when the code distance is sufficiently larger than the average leakage lifetime (d l L avg ). For smaller distances the relatively long correlated error chains induced by leakage might lead to a sub-exponential scaling. To study the noise threshold and sub-threshold behavior it is necessary to implement simulations of large code sizes which use a simplified error model, such as a stochastic error model for leakage and Pauli errors [23,39,42]. We expect that the demonstrated MWPM logical error rate can be further lowered by the use of decoders [23,39,42,[59][60][61] that use information about leakage extracted directly or indirectly (e.g. with hidden Markov models [46]) from the measurement outcomes.
The data underlying this work, as well as the code to analyze it, are available at https://doi.org/10.4121/ c.5320331. The code used to generate the data is available upon request to the corresponding author.

Schrieffer-Wolff Transformation
In this section we explain the concept of the Schrieffer-Wolff transformation (SWT) [62][63][64] and derive the equations that we use in the following sections.
Consider a Hamiltonian expressed in a certain basis {|ψ n }, where H 0 is block diagonal with respect to this basis and the perturbation V can be taken as block off-diagonal without loss of generality (block-diagonal terms can be included in the definition of H 0 ). Furthermore, we assume ||V || = O (1) and ∆ ij , where we set ∆ ij as the minimum energy separation between blocks i and j.
The SWT corresponds to finding an anti-hermitian matrix S such that is block diagonal. In other words, calling {|ψ n } the basis of eigenstates of H, e S = n |ψ n ψ n |. The matrix S can be expanded in a series where each S k is block off-diagonal. If ∆ ij one can expect the first order (S 1 ) to provide a good approximation, otherwise one needs to consider higher orders depending on (although the series does not always converge for extensive systems [63] Setting the block off-diagonal parts at 1st, 2nd and 3rd order to 0 we get where we have used the first equation to simplify the following ones. These equations can be solved iteratively for S k (given knowledge of the eigenstates of H 0 ). The Hamiltonian H is then block diagonal up to 4th order and is explicitly given by This expression has been simplified using Eqs. (A13) to (A15), together with the fact that e.g. [S k , [. . . , . . .] D ] D = 0 since S k is block off-diagonal.

SWT of the capacitive coupling
We consider the Hamiltonian H = H 0 + H c + H d of a driven transmon capacitively coupled to a resonator, as given in Eqs. (1) to (4).
The SWT of H c up to 1st order in the perturbation parameter = g/∆, where ∆ = ω q − ω r , is implemented using the matrix [65] where {|m } are transmon states and where we have ab-sorbed in the definition of S 1 . The Hamiltonian in the unitarily transformed frame as defined in Section I A is then given by H D ≈ e S1 He −S1 = e S1 (H 0 + H c )e −S1 + e S1 H d e −S1 (A18) with e S1 (H 0 + H c )e −S1 = H 0 where we define ∆ m = ∆ + αm = ∆ − |α| m as α < 0 for transmons. The second term above contains a Stark shift of the transmon frequency and the last term is the state-dependent dispersive shift. The approximation in Eq. (A20) is due to the fact that we have ignored a double-excitation exchange term coming from [S 1 , H c ], since it is proportional to gα/(∆ m ∆ m−1 ). This is negligible for low anharmonicity and, secondly, for ω r > ω q as then ∆ < 0 and |∆ m | increases with m. If in-stead ω r < ω q , ∆ > 0 and |∆ m | decreases with m, so even if the approximation is good for the two lowest levels, there can be some higher level which does not sit well within the dispersive regime. However, in this work we consider a system with ω r > ω q , hence we do not need to take this into account.
The drive Hamiltonian in the unitarily transformed frame takes the form e S1 H d e −S1 = Ωe iφ 2 b + h.c.
The last term contains a 1st-order approximation in g/∆ of the |20 ↔ |01 effective couplingg, which is linear in Ω. However, the "pure" drive term H D d1 can be quite strong, so we need to evaluate how it affectsg and the rest of the Hamiltonian.

SWT of the pure drive Hamiltonian
Summarizing, in the unitarily transformed frame the original Hamiltonian H takes (approximately) the form where H D 0 is given in Eq. (A20) and H D d1 , H D d2 are given in Eq. (A22).
We now want to find an additional SWT transformation S = S 1 + S 2 + S 3 , with H D d1 taking the role of V in Appendix A 1, defining a "double-dressed" Hamiltonian such that H DD 0 is fully diagonal up to 3rd order in the perturbation parameter = Ω/δ q . Then H DD d gives the couplings within the manifold of interest (|20 , |01 ) and outside of it. We absorb k in the definition of S k so it does not explicitly appear below.
Following Appendix A 1, to find S 1 we need to solve Eq. (A13), i.e.
in this specific case.
Bracketing it with the eigenstates {|ml } of H D 0 , with the notation |transmon, resonator , we get the matrix elements of S 1 as where {E D ml } are the eigenenergies of H D 0 , which can be easily inferred from Eq. (A20). We neglect the dispersive shift since it is proportional to α/∆. Then where δ i,j is the Kronecker delta. From this equation one can infer that where we have defined δ q m = δ q + αm + g 2 ∆−1 ∆m−1∆m . Having derived S 1 , we can compute S 2 from Eq. (A14), i.e.
Clearly the first term is the diagonal part while the second term is the off-diagonal one. With a similar procedure as the one used for S 1 , it follows that We can then compute S 3 from Eq. (A15), i.e.
The result is |m m + 1| 1 12 We can eventually use Eqs. (A29), (A32) and (A34) together with Eq. (A16) to obtain H DD 0 (defined in Eq. (A25)): We note that this expression implicitly contains all cross terms between the perturbative parameters g/∆ and Ω/δ q up to the chosen orders. The approximate coupling Hamiltonian H DD d (defined in Eq. (A25)) up to 2nd order in Ω/δ q is instead given by where and All terms in H DD resid. are relatively small and off-resonant with the |20 ↔ |01 transition so we expect them to have a small effect and we do not proceed with higher orders of SWTs.

Analysis of the |20 ↔ |01 avoided crossing
In this section we give the methods used to calculate the curves in Fig. 1(c),(e).
We define ω * d as the drive frequency corresponding to the center of the |20 ↔ |01 avoided crossing of the full Hamiltonian H as given in Eq. (1). Then the exact value of the effective |20 ↔ |01 couplingg is given by half the energy separation at that point. The avoided crossing can be found numerically by exact diagonalization as a function of ω d .
In the subspace S = span{|20 , |01 } we can write H as H| S ≡ −η(ω d )Z/2 +g(ω d )[cos(φ)X + sin(φ)Y ] = −η(ω d )Z/2 +g(ω d )X for φ = 0 as in Section I A. As we want to implement a |20 ↔ |01 π rotation, we notice that the choice of φ, i.e. the choice of rotation axis in the equator of the Bloch sphere, is irrelevant. We have also ignored a term proportional to the identity I, which gives a phase difference with respect to states outside of S, in particular between the computational and leakage subspaces of the transmon. However, this phase is irrelevant if |20 is swapped entirely onto |01 since the latter decays and dephases fast, thus suppressing any phase coherence. As demonstrated in Section I B the res-LRU can reach a very high R, for which the effect of this phase is then minimal. Assuming that H DD resid. in Eq. (A41) is negligible, an analytical approximation of η is given by where we have made the dependence of H DD 0 in Eq. (A35) on ω d explicit. This holds since then H DD 0 accounts for all the Stark shifts of |20 and |01 due to the capacitive coupling and the drive (up to the given orders). The center of the avoided crossing is found by imposing the condition η(ω d ) = 0. As the explicit expression that can be extracted from Eq. (A35) is not analytically solvable, we use the secant method available in scipy to find ω * d that fulfills this condition in Eq. (A42). It is then straightforward to compute the (approximate) analytical estimate for the effective coupling asg(ω * d ) = | 01|H DD eff.coupl. (ω * d )|20 | from Eq. (A38), which is plotted in Fig. 1(e). . The purple line corresponds to the higher order estimate of the optimal drive frequency ω * d as a function of Ω (see Fig. 1(c)). The heatmaps are sampled using the adaptive package [66].
during the whole time slot of T slot = 440 ns, leading to a uniform metric for the whole (ω d , Ω) landscape. Specifically, to estimate T 1 we prepare the state |1 1| ⊗ σ th , we simulate the Lindblad equation in Eq. (12) and we evaluate the remaining population p |1 in |1 at the end of the time slot after tracing out the resonator. Assuming that p |1 = e −T slot /T1 we then compute T 1 by inverting this formula. To estimate T 2 we prepare |+ +| ⊗ σ th and we evaluate the decay of the off-diagonal transmon matrix element |0 1| as this is directly available in simulation (rather than simulating a full Ramsey experiment). We then invert | 0| Tr r (ρ(T slot ))|1 | = e −T slot /T2 /2 to get T 2 . Figure 6 shows the resulting effective T 1 and T 2 . In Fig. 6(a) one can see that T 1 decreases by at most 15% as a function of Ω, showing that a short t p mostly counterbalances the effect of a strong Ω. In particular, T 1 ≈ 27.1 µs at the operating point. On the other hand, one can notice that T 1 dips around Ω cr /2π = 143 MHz, where the pulses are very long, suggesting that driving slightly into the underdamped regime is favourable. In Fig. 6(b) one can see that the value of T 2 is about 7.7 µs at Ω = 0, i.e. when no pulse is applied. This has to be contrasted with the input T 2 parameter of 30 µs inserted in the Lindblad equation (see Table I). We assume that that implicitly accounts for dephasing caused by flux noise only. Photon-shot noise from the resonator is a further dephasing source which is explicitly included in these simulations. The combination of flux and photon-shot noise leads to the actual effective T 2 reported in Fig. 6(b). We note that ifn = 0 then the effective T 2 at Ω = 0 would exactly match the input of 30 µs. While the effective T 2 can be restored from 7.7 µs to 30 µs with colder resonators or by engineering different system parameters altogether, the important information from Fig. 6(b) is that T 2 barely changes as a function of Ω. Combined with the similar result for T 1 , this means that the drive causes only a marginal effect within the computational subspace. Notice that in the region where the readoutresonator LRU is most effective (just above the purple line in Fig. 6(b)), T 2 is even slightly higher than at Ω = 0 (7.9 versus 7.7 µs). We attribute this to the fact that the pulse temporarily reduces the excited-state population in the resonator (see Fig. 2(d)). In this way photon-shot noise is reduced until the resonator re-thermalizes, however at the cost of some leakage of the transmon. In Fig. 2(d) one can notice that a non-negligible amount of population ends up in |10 from the initial state |0 0| ⊗ σ th . This corresponds to an excitation rate T ↑ 1 ≈ 256 µs at the operating point. We backtrack this source of error to a combination of the drive and the jump operator a † , corresponding to the drive inducing a transmon excitation rate based on the resonator excitation rate. However, as here T ↑ 1 max{T 1 , T 2 }, it is not a limiting factor and we have not included it in the Surface-17 simulations.

Long-drive limit in the underdamped regime and its drawbacks as a LRU
In this section we compare the reset schemes in [54,55] versus [53] in terms of their performance as a LRU in the underdamped regime. The approach of [54,55], which we have adopted in Section I B, aims at swapping |20 and |01 by targeting the first minimum of the oscillations induced by the drive (switching the drive off afterwards). As shown in Section I B, this approach allows for a residual leakage population p |2 op. ≈ 0.5% at the operating point (see Fig. 2(a)), given our parameters (see Table I). While this already reaches thermal-state levels (heren = 0.5%) with the considered system parameters, the approach in [53] could be used in general to achieve an even lower or similar p |2 (in particular for lower κ's).
The approach in [53] keeps the drive on for a much longer period of time (at least one more oscillation) allowing both the populations in |20 and |01 to decay to almost 0, modulo thermal excitations. Figure 7 shows that it is indeed possible to suppress these populations to thermal-state levels, where we use the same (Ω, ω d ) as at the operating point (see Section I B). However, we see that for the operating point there is almost no gain by using this approach. Furthermore, this approach costs much more time and could exceed T slot = 440 ns if κ is not as high as assumed here. In particular, in that case the first few minima after the first one could be slightly higher, due to transmon decoherence, and one would need to wait even longer to overcome this effect.
Another disadvantage of the approach in [53] is that the disturbance to the qubit is stronger as the drive is kept on for a longer period of time. E.g., in Fig. 7 one can see that |00 and |10 reach an equilibrium thanks to the drive (even in the presence of relaxation), where the population in |10 is higher than in Fig. 2(b). By evaluating T 1 we find T 1 ≈ 23 µs instead of 27 µs (see Appendix B 1). Furthermore, if one would have to use a t p > T slot when κ is lower than here, then the QEC cycle would get longer, affecting the coherence of all qubits, not only of the high-frequency data qubits to which the res-LRU is applied.

Sensitivity to residual ZZ crosstalk
In a multi-transmon chip, each transmon is coupled to one or more neighbors. In general, if the coupling is not tunable there can be some residual ZZ crosstalk, i.e. a shift of the transmon frequency by an amount ζ based on whether each neighboring transmon is in |1 instead of |0 . In this section we study the effect of this ZZ coupling on the readout-resonator LRU, which we assume being tuned up when all neighbors are in |0 . We do not include neighboring transmons in our simulations, so we mimic it by shifting the transmon frequency (while keeping the drive parameters fixed).
In Fig. 8 we perform the analysis for the operating point (see Section I B), which resides in the underdamped regime, and for the critical point. In both cases the leakage-reduction rate R scales seemingly quadratically. In the underdamped regime the pulse targets the first minimum of the damped Rabi oscillations, so it is more sensitive to a variation in frequency than in the critical regime. However, we can observe that for |ζ| /2π 2 MHz (note that this is the cumulative ZZ coupling over all neighbors) R stays above 95%, which is the conservative value we have used in Section II D and for which the logical error rate was already close to optimal in Surface-17 (see Appendix C 2). Regarding other performance parameters of the LRU, we find that L LRU 1 scales in the same relative way as R by unitarity, whereas T 1 , T 2 and T ↑ 1 vary by 1%. The parameters used in this work are reported in Table II. a. res-LRU in quantumsim A comprehensive review of the density-matrix simulations and the use of the quantumsim package [58] is available at [46,50]. In this section we explain the specific implementation of the newly introduced res-LRU, expressed in the Pauli Transfer Matrix formalism.
We construct a "phenomenological" Lindblad model with input parameters R, L LRU 1 and t res-LRU . We use the Pauli Transfer Matrix S res-LRU = S ↑ S ↓ , where S ↓ is the Pauli Transfer Matrix of the superoperator S ↓ = e tres-LRUL ↓ and the Lindbladian L ↓ has the quantum jump operator with R sim to be determined. Besides this, L ↓ has the standard qutrit jump operators for relaxation and dephasing [46]. On the other hand, S ↑ is the Pauli Transfer Matrix of the superoperator S ↑ = e L ↑ and the Lindbladian L ↑ has a single jump operator we match the definition of R in Section II B 1 as well. The approximation is very good for large R and low L LRU 1 , which is precisely the interesting regime for res-LRU that we have explored.

b. Decoding
In this section we provide additional information on the UB and MWPM decoders [50,71].
UB considers the 32 computational states that differ by a purely X error on top of |0 L and that are independent (i.e. they cannot be obtained from each other by multiplication with an X-type stabilizer). At the end of each QEC cycle n, each possible final Z syndrome is compatible with a pair of these states, where one can be associated with |0 L and the other with |1 L as they differ by the application of any representation of X L . The largest overlap of these two states with the diagonal of the density matrix at QEC cycle n corresponds to the maximum probability of correctly guessing whether a X L error has occurred or not upon performing a logical measurement of Z L . The latter is assumed to be performed by measuring all data qubits in the {|0 , |1 , |2 } basis and computing the overall parity. To compute the parity we assume that a |2 is declared as a |1 since decoders usually do not use information about leakage (and since measurements often declare |2 as a |1 rather than as a |0 ). Then UB computes F L (n) by weighing this probability with the chance of measuring the given final Z syndrome (conditioned on the density matrix) and by summing over all possible syndromes. In other words, UB always finds the correction that maximizes the likelihood of the logical measurement returning the initial state, here |0 L . As UB uses information generally hidden in the density matrix, it gives an upper bound to the performance of any realistic decoder, which can at most use the syndrome information extracted via the ancilla qubits.
MWPM tries to approximate the most likely correction by finding the lowest weight correction, which is a good approximation when physical error rates are relatively low. As the ancilla qubits can be faulty, the decoding graph is three dimensional. In particular, we allow for space-like edges corresponding to data-qubit errors, time-like edges corresponding to ancilla-qubit errors and spacetime-like edges corresponding to data-qubit errors occurring in the middle of the parity-check circuit. The weights are extracted with the adaptive algorithm in [73] from a simulation (10 5 runs of 20 QEC cycles each) without leakage and an otherwise identical error model. Similarly to UB, for decoding we assume that a |2 is declared as a |1 since the standard MWPM does not account for leakage.

Logical error rate as a function of the LRU parameters
We study the variation in the logical error rate ε L per QEC cycle as a function of the performance parameters of the LRUs. Here we fix L 1 = 0.5% as it is easier to visualize variations in ε L with a relatively large L 1 . The leakage-reduction rate R and the readout probability p M (2|2) play similar roles for the res-LRU and π-LRU, respectively. In Fig. 9(a),(c) one can see that this is the case and that the values of ε L at the parameters used in Section II D (R = 95% and p M (2|2) = 90%) are very close to their best values (at least for this system size). This shows that the advantages of a larger R or p M (2|2) are marginal. We attribute this to the fact that leakage is exponentially suppressed with an already quite large exponent. Furthermore, the parameters L LRU 1 and 1 − p M (1|1) = p M (2|1), regulating the induced leakage, play similar roles as well, as Fig. 9(b),(d) show. We see that ε L is more sensitive to L LRU 1 and 1 − p M (1|1) compared to R and p M (2|2). In particular we see that ε L is slightly larger at the parameters used in Section II D (L LRU 1 = 0.25% and 1 − p M (1|1) = 0.5%) rather than at 0, although the difference is small.

Effect of the leakage conditional phases on the logical error rate
As defined in the main text the leakage conditional phases are the phases that a non-leaked transmon acquires when interacting with a leaked one during a CZ. Here we denote them as φ L flux and φ L stat depending on whether the lower or the higher frequency transmon of the pair is leaked, respectively, and we use φ L to indicate either of them. Furthermore, in this section we use the notation |low-f. transmon, high-f. transmon . Note that for a CZ between two qutrits in principle there are 9 phases (φ 00 , φ 01 , φ 10 , φ 11 , φ 02 , φ 20 , φ 21 , φ 12 , φ 22 ), where the first 4 are fixed to 0, 0, 0, π, respectively. Of the 5 phases containing a |2 we consider only two of them here, i.e. φ L stat = φ 02 − φ 12 and φ L flux = φ 20 − φ 21 as defined above. This is because in our leakage model [46] we set to 0 the coherence between the computational and leakage subspace of each qutrit, motivated by the fact that leakage is projected relatively fast and that the stabilizer measurements ideally prevent any interference effect. This means that the individual phases are global phases, whereas their difference cannot be gauged away when the non-leaked qubit is in a superposition of |0 and |1 .
For a flux-based CZ with conditional phase π for |11 , ideally one should have φ L flux = 0 and φ L stat = π [46] as |02 acquires a conditional phase equal and opposite to |11 . If only |12 and |21 are coupled in the 3-excitation manifold, it holds φ L stat = π − φ L flux . The strength of the repulsion times the CZ duration gives e.g. φ L flux ∼ π/4 for the parameters in [46]. However, |03 interacts with |12 and |21 and breaks the relationship above, for which we can consider φ L flux and φ L stat as effectively unconstrained. The randomized values used across the main text are reported in Fig. 10(b). We use 14 values, of which 3 for φ L stat and 3 for φ L flux when each high-frequency data qubit is leaked or interacts with a leaked ancilla qubit, respectively, and 8 only for φ L stat when each ancilla qubit is leaked and interacts with a low-frequency data qubit (as low-frequency data qubits cannot leak themselves).
In this section we study the dependence of the logical error rate ε L on the leakage conditional phases, without discussing how one would engineer the system to tune them to certain values. The best-case scenario to minimize ε L is to set all φ L = 0, since no Z rotations are spread then. Instead, the worst-case scenario corresponds to all φ L = π/2, since under the twirling effect of the parity-check measurements this corresponds to spreading a Z error with 50% chance. Notice that, if all φ L = π, overall the spread errors amount to a stabilizer (except in the QEC cycle in which leakage occurs),   so it is close to the best-case scenario. Figure 10(a) compares the logical performance for both UB and MWPM in the cases where φ L = 0, φ L = π/2 and when they are random as in Fig. 5 and in the rest of this work. First, one can notice that the performance of random φ L is very close to the worstcase scenario (φ L = π/2). This is due to the fact that it is not necessary to spread an error on every qubit with 50% chance each to cause a logical error with high probability. Second, one can see that just tuning all φ L = 0 without implementing LRUs is almost as good (or even better) as using the LRUs when φ L are random. We at-tribute this to the fact that one of the major effects of the LRUs is to prevent correlated errors being spread by a leaked qubit for many QEC cycles. Tuning φ L = 0 achieves this as well, but it still does not address the fact that the code distance is effectively reduced if a data qubit stays leaked and that the full stabilizer information is not accessible as long as an ancilla qubit is leaked. Indeed, using LRUs even when φ L = 0 always allows for a lower logical error rate (see Fig. 10(a)). Furthermore, the reduction in distance and the corruption of the stabilizer information suggest that a threshold would still likely be low without using LRUs.