High-fidelity measurement of qubits encoded in multilevel superconducting circuits

Qubit measurements are central to quantum information processing. In the field of superconducting qubits, standard readout techniques are not only limited by the signal-to-noise ratio, but also by state relaxation during the measurement. In this work, we demonstrate that the limitation due to relaxation can be suppressed by using the many-level Hilbert space of superconducting circuits: in a multilevel encoding, the measurement is only corrupted when multiple errors occur. Employing this technique, we show that we can directly resolve transmon gate errors at the level of one part in $10^3$. Extending this idea, we apply the same principles to the measurement of a logical qubit encoded in a bosonic mode and detected with a transmon ancilla, implementing a proposal by Hann et al. [Phys. Rev. A \textbf{98}, 022305 (2018)]. Qubit state assignments are made based on a sequence of repeated readouts, further reducing the overall infidelity. This approach is quite general and several encodings are studied; the codewords are more distinguishable when the distance between them is increased with respect to photon loss. The tradeoff between multiple readouts and state relaxation is explored and shown to be consistent with the photon-loss model. We report a logical assignment infidelity of $5.8\times 10^{-5}$ for a Fock-based encoding and $4.2\times 10^{-3}$ for a QEC code (the $S=2,N=1$ binomial code). Our results will not only improve the fidelity of quantum information applications, but also enable more precise characterization of process or gate errors.


I. INTRODUCTION
Quantum information processing (QIP) involves many tasks. One requirement, crucial to any QIP experiment, is the ability to measure a qubit or qubit register. Given a superposition $|\psi\rangle = \alpha|0_L\rangle + \beta|1_L\rangle$, our task is to collapse the state and detect the corresponding classical bit of information. That is, we should extract a "0" with probability $|\alpha|^2$, and a "1" with probability $|\beta|^2$, projecting onto the state $|0_L\rangle$ or $|1_L\rangle$, respectively. A canonical example of qubit measurement is at the end of a quantum computation, after which the experimenter measures the qubit array and infers a useful result. Experimental implementations of quantum computing requirements, including feedback-based state preparation [1], gate calibration, and error-syndrome extraction for quantum error correction [2–6] (QEC), also rely on qubit measurements. Outside of quantum computing applications, measurement is used in quantum key distribution [7], enhanced sensing [8], teleportation of states or gates [9–11], and fundamental tests of locality and entanglement [12,13]. Although the particular metrics will depend on the application at hand, better measurements will translate into better results in all of the examples mentioned. Furthermore, improvements in qubit measurement will improve the calibration and characterization [14] of gates and other quantum operations. Finally, we point out that in order to implement large quantum circuits, measurements must improve alongside advances in gate quality and number of qubits. For example, for fixed single-qubit measurement error, the probability that an entire register of qubits will be measured correctly decreases exponentially in the size of the register.
Much progress has been made in the experimental implementation of single-shot qubit measurements in a variety of systems. Single-shot measurements of qubits based on trapped ions [15–17], electron spins in quantum dots [18–20], and superconducting circuits [21,22] have all been demonstrated with fidelities higher than 99%. Improvements have been made in a variety of ways; interesting approaches have included mapping a spin state onto a metastable charge state [18], and the repeated readout of an ion state using an ancilla [15].
In all of the systems mentioned above, state relaxation is a major limitation to the measurement fidelity. In a continuous measurement, some signal (such as photomultiplier current or the phase of a microwave tone) is recorded and compared to the expected response corresponding to each qubit state. Because noise causes uncertainty in the assignment of individual outcomes, we would generally like to acquire signal for a long time in order to improve the measurement contrast. On the other hand, $T_1$ events during the measurement, in which the qubit relaxes from its excited to ground state, lead to incorrect assignment of the initial state [23].
In this work, we overcome the first-order $T_1$ limit by encoding qubits in multilevel systems. If the physical states representing the 0 and 1 bit are separated by multiple energy levels, then a single relaxation event will not corrupt the 0 or 1 which was encoded. For this reason, qubit encodings with a larger distance between codewords with respect to the dominant error channel can be measured with much improved fidelity.
As a simple and direct example of our approach, we study the measurement of a transmon qubit ($\hat{t}$) dispersively coupled [24] to a readout resonator ($\hat{r}$). In particular, using higher levels of the transmon is shown to improve the fidelity by two orders of magnitude. In Sec. IV, in order to more fully demonstrate the advantage of multilevel encodings, we encode a qubit in a harmonic oscillator implemented as a $\lambda/4$ coaxial superconducting cavity. Using a bosonic mode allows us to systematically explore different qubit encodings, including Fock-based encodings as well as error-correctable binomial codes [25]. The information in this "storage" mode ($\hat{s}$) is read out using the dispersively coupled transmon as an ancilla according to a recent proposal [26]. In this proposal, the storage-ancilla interaction is used to map information from the storage onto the ancilla, and the ancilla-readout interaction is used to read out the ancilla state.

II. BACKGROUND
Before going further, it is worth clarifying what is meant by encoding and measuring a bit in a multilevel system. Traditionally, a qubit is identified with a two-level system such as a spin-1/2 particle in a magnetic field. In this case, there is no trouble identifying the two eigenstates with the values of a bit: $|\uparrow\rangle$ represents a 0, say, and $|\downarrow\rangle$ represents a 1. To measure the bit encoded by such a system means performing a projective measurement of an eigenstate of the system. Finally, to incorrectly measure the bit encoded by such a system means that state transitions or detector noise cause the experimenter to record a state initially $|\downarrow\rangle$ as "0" or vice versa. These simple ideas are sketched in Fig. 1(a), where dashed arrows indicate incorrect measurements.
Actual implementations of qubits, however, may contain more than two energy levels. In this case, the definition of a qubit is a matter of convention. For example, the transmon [27,28] is an anharmonic oscillator with several energy levels ($|g\rangle, |e\rangle, |f\rangle, |h\rangle, \ldots$) in which the ground and first excited state are often operated as a qubit. This is possible because the $|g\rangle$-$|e\rangle$ transition is detuned by many linewidths from all other transitions, so that it can be individually addressed by a single rf drive. In this paper we consider representing and measuring information encoded in the higher states.
Even more care is required when dealing with a harmonic oscillator, in which many transitions fall within the mode linewidth and there is no obvious notion of a qubit. In fact, several continuous-variable codes [25,29–35] have been studied, in which two states in Hilbert space, $|0_L\rangle$ and $|1_L\rangle$, are designated as the logically encoded 0 and 1 states. What does it mean to measure qubits encoded in this way? To answer this question, we imagine partitioning the Hilbert space $\mathcal{H}$ into two subsets, $S$ and $\bar{S}$, containing $|0_L\rangle$ and $|1_L\rangle$ respectively. An ideal measurement of the encoded bit is a projective measurement of the membership in $S$. The POVM operators [36] corresponding to such a measurement are
$$\hat{M}_S = \sum_{|n\rangle \in S} |n\rangle\langle n|, \qquad \hat{M}_{\bar{S}} = \hat{1} - \hat{M}_S. \tag{1}$$
Finally, the fidelity of such a measurement is defined [23] as
$$F = 1 - P(\bar{S}\,|\,|0_L\rangle) - P(S\,|\,|1_L\rangle), \tag{2}$$
where $P(a|s)$ is the probability that an assignment $a$ will be made, given that the true initial state was $s$. Note that this definition goes to zero, not 0.5, for random guessing. The fidelity offers a simple, unified metric with which we can compare the measurement of different codes.
FIG. 1. (a) In a two-level system, qubit readout means determining the energy eigenstate of the system at time $t = 0$. Dashed arrows indicate incorrect assignments, $P(\uparrow|\,|\downarrow\rangle)$ and $P(\downarrow|\,|\uparrow\rangle)$, which may be due to detector noise or state transitions. (b) In a many-dimensional Hilbert space, reading out a single bit requires partitioning Hilbert space into two sets containing the encoded 0 and 1 states. Instead of an energy eigenstate, membership in $S$ or its complement $\bar{S} = \mathcal{H} \setminus S$ is measured. The dashed arrows now indicate the assignment errors $P(S\,|\,|1_L\rangle)$ and $P(\bar{S}\,|\,|0_L\rangle)$.
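As a minimal sketch of how this fidelity could be estimated from experimental records, the snippet below takes the fidelity to be one minus the sum of the two misassignment probabilities, which is consistent with the remark that random guessing gives zero. The function name and the record format (pairs of prepared and assigned labels) are illustrative assumptions, not part of the experiment's analysis code.

```python
def assignment_fidelity(records):
    """Estimate F = 1 - P(assigned 1 | prepared 0) - P(assigned 0 | prepared 1).

    `records` is an iterable of (prepared, assigned) pairs, each label "0" or "1".
    This record format is a hypothetical choice made for illustration.
    """
    prepared_counts = {"0": 0, "1": 0}
    wrong_counts = {"0": 0, "1": 0}
    for prepared, assigned in records:
        prepared_counts[prepared] += 1
        if assigned != prepared:
            wrong_counts[prepared] += 1
    error_0 = wrong_counts["0"] / prepared_counts["0"]
    error_1 = wrong_counts["1"] / prepared_counts["1"]
    return 1.0 - error_0 - error_1


# A classifier that guesses at random is wrong half the time for each input.
random_guessing = [("0", "0"), ("0", "1"), ("1", "1"), ("1", "0")]
print(assignment_fidelity(random_guessing))
```

With this convention a perfect measurement gives a fidelity of one, and a classifier that ignores the input entirely gives zero rather than one half.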

III. MULTILEVEL TRANSMON MEASUREMENT
In the standard approach to superconducting qubit measurement, a readout resonator is dispersively coupled to a transmon, meaning that the frequency of the resonator mode is shifted if the transmon is in an excited state. By driving the resonator and recording the amplitude and phase of the response, one can therefore infer the transmon state. Of course, there is inevitably noise on top of the average resonator response. Integrating the signal for a longer time provides more separation between the two averages. However, transmons undergo random transitions between eigenstates with a characteristic time $T_1$. After a certain fraction of $T_1$, the measurement signal will no longer provide useful information about the initial state; therefore, the signal-to-noise ratio competes with the lifetime of the information being measured [23].
Suppose that instead of asking whether the qubit was initially in its ground state $|g\rangle$ or first excited state $|e\rangle$, we ask a slightly different question: Was the qubit initially in its ground state $|g\rangle$, or in an excited state $|e\rangle, |f\rangle, |h\rangle, \ldots$? Extending the logic above, the $n$th excited state will remain distinguishable from $|g\rangle$ until $n$ relaxation events have occurred. Therefore we are able to acquire signal for a longer time, leading to a better signal-to-noise ratio. Additionally, the figure of merit $t_m/T_1 \ll 1$ for relaxation to the ground state becomes $(t_m/T_1)^n$ when discriminating between $|g\rangle$ and the $n$th excited state, where $t_m$ is the measurement acquisition time [37]. This enables us to perform high-fidelity measurements of transmon states. The measurement error for a related system was previously studied as a function of the signal-to-noise ratio and the number of excited states [38].
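To get a feel for the scaling just described, the following sketch evaluates the figure of merit, the ratio of acquisition time to relaxation time raised to the power n, for the first three excited states. The relaxation time is the 51 µs value quoted later in the text; the 2 µs acquisition time is a hypothetical number chosen only for illustration, and the result is an order-of-magnitude scale, not a calibrated error probability.

```python
def relaxation_scale(t_m, t_1, n):
    """Order-of-magnitude scale (t_m / T_1)**n for losing all n excitations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (t_m / t_1) ** n


T_1 = 51e-6  # relaxation time quoted in the text, in seconds
T_M = 2e-6   # hypothetical acquisition time, in seconds

for level, n in (("e", 1), ("f", 2), ("h", 3)):
    print(f"|{level}>: (t_m/T_1)^{n} = {relaxation_scale(T_M, T_1, n):.2e}")
```

Each additional level multiplies the scale by another small factor, which is why shelving to higher states suppresses relaxation-induced misassignment so strongly.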
In order to test or characterize a high-fidelity transmon measurement, we first need to prepare the states $|g\rangle$, $|e\rangle$, $|f\rangle$, and $|h\rangle$ accurately. This is done by applying the appropriate pulse sequence, then performing a preliminary "check" measurement and postselecting on measurements which give us a high confidence that the correct state has been prepared. Because we record all outcomes following a successful state preparation, this method is a fair way to characterize the measurement fidelity [39].
We test the fidelity by measuring the transmon and comparing the resulting assignment to the state which was prepared. The resulting misassignment probabilities are plotted in Fig. 2 as functions of the measurement time. To classify an individual measurement record $z(t)$ up to a particular time $t_m$, the record is compared to the average signals, $\bar{z}_g(t), \ldots, \bar{z}_h(t)$, corresponding to the $|g\rangle, \ldots, |h\rangle$ states. The classifier outputs the state $s$ for which $\sum_{t=0}^{t_m} |z(t) - \bar{z}_s(t)|^2$ is minimal. These labels are used to determine whether a particular realization was assigned correctly as either the ground state or an excited state.
We can understand the shapes of these plots qualitatively as follows: in the first part of the measurement, we acquire more signal and the misassignment probabilities decrease. Eventually, however, we are limited by state relaxation and the measurement no longer improves. If we continue to collect signal and classify the states naively, then we will actually start to make more misassignments due to the state relaxation. The probability that state $|e\rangle$, $|f\rangle$, or $|h\rangle$ decays to $|g\rangle$ has $t^1$, $t^2$, or $t^3$ dependence, leading to the improvement in measurement fidelity shown in Fig. 2(c). We observe that if a transmon is prepared in the $|g\rangle$-$|h\rangle$ manifold, then it can be read out with a measurement infidelity of $P(S\,|\,|h\rangle) + P(\bar{S}\,|\,|g\rangle) = (4.0 \pm 0.5) \times 10^{-4}$.
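The classification rule described above, which picks the state whose average trace is closest in summed squared distance to the recorded trace, can be sketched as follows. The trace values, the dictionary layout, and the helper names are hypothetical; in particular the idealized constant mean traces below stand in for measured averages that would include resonator ring-up.

```python
def classify_record(record, mean_traces, n_samples):
    """Return the label whose mean trace minimizes sum |z(t) - zbar_s(t)|^2.

    `record` and each mean trace are sequences of complex samples; only the
    first `n_samples` samples (the chosen acquisition time) are compared.
    """
    best_label, best_cost = None, float("inf")
    for label, mean in mean_traces.items():
        cost = sum(
            abs(z - m) ** 2
            for z, m in zip(record[:n_samples], mean[:n_samples])
        )
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


def is_ground(label):
    """Binary assignment used in the text: ground state versus any excited state."""
    return label == "g"


# Hypothetical, idealized mean responses for the four transmon states.
MEAN_TRACES = {
    "g": [1 + 0j] * 4,
    "e": [-1 + 0j] * 4,
    "f": [0 + 1j] * 4,
    "h": [0 - 1j] * 4,
}

noisy_record = [0.1 + 0.9j, -0.2 + 1.1j, 0.0 + 0.8j, 0.1 + 1.0j]
label = classify_record(noisy_record, MEAN_TRACES, n_samples=4)
print(label, "ground" if is_ground(label) else "excited")
```

Varying `n_samples` corresponds to varying the acquisition time, which is how misassignment curves as a function of measurement time could be generated from a fixed set of records.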
We can apply these ideas to the more usual task of measuring "g" or "e" using a shelving technique. The idea is simple: we apply a rapid e-f π-pulse, $X_\pi^{ef} = |f\rangle\langle e| + |e\rangle\langle f|$, to the transmon immediately before measuring it. In this way, we transform the problem of distinguishing $|g\rangle$ from $|e\rangle$ into the problem of distinguishing $|g\rangle$ from $|f\rangle$, which can be done with higher fidelity. This has been previously explored as a method of increasing the contrast of qubit measurements with a latching readout [40]. We observe that in such a scheme, the misassignment of $|e\rangle$ as "g" is a second-order error. On the other hand, the misassignment probability is directly sensitive to preparation error. Therefore, if $|e\rangle$ is prepared with a pulse, then the error in that pulse will limit the performance of the shelved measurement.
However, this means that a shelved measurement has a resolution comparable to the transmon gate errors. A measurement with such resolution is highly desirable in circuit QED, in which single-qubit gate errors are typically obscured by much larger measurement errors. We now show, using the above method for improved $|g\rangle$-$|e\rangle$ readout, how one can calibrate gates in a way that reduces state preparation and measurement (SPAM) errors. For example, a g-e π-pulse, $X_\pi^{ge} = |g\rangle\langle e| + |e\rangle\langle g|$, would typically be calibrated by the following procedure.
(1) Prepare the transmon in its ground state, $|g\rangle$. (2) Apply the pulse with a variable amplitude. (3) Measure the state of the qubit. The result of such a procedure is shown in Fig. 3 in blue. The minimum value of this curve, $(1.2 \pm 0.1) \times 10^{-2}$, is the inferred gate error, uncorrected for SPAM errors. Alternatively, the same procedure can be performed using shelving as described above. As shown in red in Fig. 3, the visibility is greatly improved because of the reduced measurement error. We therefore obtain, by direct measurement, an improved bound of $(1.4 \pm 0.3) \times 10^{-3}$ on the gate error.
We have shown how to characterize a gate to within a small factor. A QuTiP [41] simulation [42] predicts a residual $g$ population of $P_g = 3.7 \times 10^{-4}$ after the g-e π-pulse. This is the quantity we would like to be able to measure directly. Similar calculations predict $P_g^{\mathrm{shelved}} = 7.2 \times 10^{-4}$ at the end of the e-f pulse, which is somewhat larger due to additional relaxation during the e-f pulse. Finally, in order to roughly estimate the probability of reading out "g," we calculate $P_g^{\mathrm{meas}} = 9.7 \times 10^{-4}$ halfway through the measurement. These calculations show that although there is some error in the shelved measurement itself, it is comparable to the pulse error we wish to characterize. We stress that if the gate error ($\sim t_{\mathrm{gate}}/T_2$) were to decrease, then the shelved measurement would improve with it.
FIG. 2. (b) The probability that the ground state $|g\rangle$ is incorrectly assigned as belonging to $\bar{S}$, plotted as a function of the length of measurement signal used to make the classification. The initial shape of the curve is related to the ring-up time of the readout resonator. The misassignment probability decreases as a function of time, because collecting more signal improves the separation of readout signals. However, the improvement stops once the misassignment probability is comparable to the probability that the transmon has gained a photon. Collecting more signal would cause additional misassignments. (c) The probability that an excited state $|e\rangle$, $|f\rangle$, or $|h\rangle$ is incorrectly assigned as belonging to $S$. Again, after some initial transient behavior, the misassignment probability improves as signal is collected for a longer time. Eventually, relaxation events cause misassignments, and the curves increase for large acquisition times. The probability that all excitations are lost, causing erroneous assignment to $S$, is much smaller when a higher excited state is prepared.
The exact misassignment probabilities achieved depend on many factors, especially the relaxation time $T_1 \approx 51$ µs and thermal population $\bar{n}_t \approx 0.4\%$.

IV. MEASUREMENT OF LOGICALLY ENCODED QUBITS
As mentioned earlier, continuous-variable (CV) encodings of quantum information in bosonic modes offer a promising route to fault-tolerant quantum computation [29,31,43]. Instead of encoding a logical qubit in many physical qubits, CV schemes use an infinite-dimensional Hilbert space to build the redundancy needed for error correction into a single oscillator or storage mode. In this section, we show how redundancy and distance can be used to improve logical qubit measurements. We implement a proposal [26] to measure multiphoton encodings in a harmonic storage mode with high fidelity. The method consists of mapping the encoded information onto an ancilla, reading out the ancilla, resetting the ancilla, and repeating these steps several times.
To understand the advantage of a repeated readout scheme, first consider the error associated with mapping information onto the ancilla and reading it out. If this process is noisy, it is liable to give the wrong answer, but if it is QND with respect to the state of the storage mode (that is, if the readout does not induce extra transitions between eigenstates), then the storage mode can be read out repeatedly. To make an assignment of the bit encoded by the storage mode, one could take a majority vote of the outcomes obtained via the ancilla. In this way, many individual readouts of the encoded information can be combined to a single measurement with a greatly reduced probability of error. Next consider the relaxation of the storage mode itself. As photons are lost incoherently from the storage, the information encoded by the storage is corrupted, limiting the ability of any measurement to extract the initially encoded bit. For example, if information were encoded in Fock states $|0\rangle$ and $|1\rangle$, then the relaxation rate, $\kappa_\downarrow$, times the total mapping, readout, and reset time, $\tau$, would (with the appropriate prefactor) set a lower bound on the probability of measurement error. On the other hand, if the logical codewords were separated by $L$ Fock states, then the measurement would be robust up to $L$ relaxation events. Therefore the bound would scale like $(\kappa_\downarrow \tau)^L$, an exponential improvement.
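The two competing effects described above can be illustrated numerically. The sketch below assumes independent, identically distributed readout errors and an odd number of votes, and it ignores storage relaxation in the voting term; these are simplifying assumptions rather than the full model of Ref. [26]. The function names and all numerical values are hypothetical.

```python
from math import comb


def majority_vote_error(delta, n_readouts):
    """Probability that more than half of n independent readouts are wrong.

    Assumes each readout errs independently with probability `delta` and
    that `n_readouts` is odd, so that ties cannot occur.
    """
    k_min = n_readouts // 2 + 1
    return sum(
        comb(n_readouts, k) * delta**k * (1.0 - delta) ** (n_readouts - k)
        for k in range(k_min, n_readouts + 1)
    )


def relaxation_bound_scale(kappa_down, tau, distance):
    """Scale (kappa_down * tau)**L of the relaxation-limited error for distance L."""
    return (kappa_down * tau) ** distance


# Hypothetical numbers: 10% single-readout error, kappa * tau = 1e-3 per cycle.
for n in (1, 3, 5, 7):
    print(n, majority_vote_error(0.1, n))
for distance in (1, 2, 3):
    print(distance, relaxation_bound_scale(1.0e3, 1.0e-6, distance))
```

In this simplified picture the voting error falls quickly with the number of readouts while the relaxation floor falls with the code distance, so a larger distance leaves more room for additional votes to help.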
The measurement infidelity has been estimated for various multiphoton encodings which are repeatedly read out using an ancilla as described above. It was shown [26] that the measurement infidelity can be broken into two contributions, corresponding to the observations above: first, the probability that a majority of individual readouts will give the wrong answer (ignoring state relaxation), and second, the probability that the state will relax from $S$ into $\bar{S}$ (or vice versa) during the first half of the repeated readout sequence. These terms are given by [...] for the family of Fock codes $|0_L\rangle = |0\rangle$, $|1_L\rangle = |L\rangle$. In the equation above, $N$ denotes the number of measurements made, $\kappa_{\downarrow(\uparrow)}$ is the rate of energy loss (gain) in the storage mode, and $\delta_{0(1)}$ is the probability of a readout error during a single round of measurement when the $|0_L\rangle$ ($|1_L\rangle$) state was prepared. We choose $S = \mathrm{span}\{|0\rangle, |1\rangle\}$ to measure this encoding, so that the measurement of the $|0\rangle$ state is robust to the gain of a single photon.
FIG. 3. After the ground state is prepared, a g-e π-pulse is applied with variable amplitude and the transmon is measured. An e-f "shelving" pulse before the measurement greatly improves the visibility of g-e Rabi oscillation. The unshelved measurement is shown in blue, and the shelved measurement is shown in red.
The measurement protocol described allows us to perform many independent readouts of an encoded qubit using only a single ancilla. This is accomplished using an adaptive feedforward scheme, which is illustrated in Fig. 4. A high-level description is given in Fig. 4(a), with further implementation details shown in Figs. 4(b) and 4(c). We next discuss the experimental implementation of the protocol.
Once the system is initialized, the first step of the measurement is to map the information encoded in the storage mode onto the state of the ancilla. This step is shown as an $S$-controlled $X_\pi^{ge}$ gate in Fig. 4(a). The realization of this entangling gate is based on the dispersive interaction between $\hat{s}$ and $\hat{t}$, which imparts an ancilla frequency shift which depends on the number of excitations in the storage mode: $\hat{H} = (\omega_t + \chi_{st}\hat{s}^\dagger\hat{s})\,\hat{t}^\dagger\hat{t} + \cdots$. An ancilla pulse with spectral content near the frequencies corresponding to the $n$ in $S$ has the effect of flipping the ancilla state if and only if the storage state is in $S$. For example, when measuring the Fock codes, the ancilla is excited if the storage is in the state $|0\rangle$ or $|1\rangle$. After the information encoded by the storage mode has been mapped onto the ancilla, the state of the ancilla is read out. The outcome $m$ provides information about the encoded qubit state, and provides the repeated measurement protocol with one vote.
However, the outcome of the readout also determines the operation which must be performed in order to reset the ancilla. The readout was calibrated to distinguish between the states $|g\rangle$, $|e\rangle$, $|f\rangle$, and $|h\rangle$. Ideally the ancilla would stay in the $|g\rangle$-$|e\rangle$ manifold, but resolving higher states enables a more robust reset operation. The reset protocol is shown in Fig. 4(c) as a recursive block diagram. It relies on real-time logic implemented on FPGA cards, and does not terminate until it successfully records the ancilla in its ground state $|g\rangle$. Once the ancilla is reset to its ground state, it is available for additional measurements. In this way, the map-measure-reset cycle is repeated many times in order to extract a single high-fidelity measurement of the logical qubit.
In order to demonstrate a high-fidelity measurement of the storage mode, it is crucial that the initialization step prepare states accurately. As shown in Fig. 4(b), number states were prepared by creating two excitations in the ancilla, then using a sideband interaction to convert them into a storage excitation [44], and repeating this process the desired number of times. This process is associated with a significant initialization error (on the order of 10% overall when preparing states of several photons), which would dominate over the measurement infidelity and prevent us from making any conclusions about the performance of our protocol. Therefore, after the state creation routine is finished, number-selective pulses and ancilla readouts are used to repeatedly check that the correct state was prepared. Only if several checks pass is the state preparation accepted.
FIG. 4. (a) We first prepare one of several Fock states $|n\rangle$ in the storage mode. We then perform several readout cycles, each time obtaining an outcome. Each cycle consists of a decode pulse which excites the ancilla conditioned on the encoded bit, followed by a readout and reset of the ancilla. (b) The initialization procedure uses a series of number-selective pulses to verify that the correct state has been prepared. This verification is crucial in order to demonstrate a sensitive detection. (c) Each reset consists of a real-time feedforward protocol which ensures that the ancilla is in its ground state $|g\rangle$ at the end of each cycle.
The results of a repeated measurement experiment are plotted in Fig. 5 and illustrate all of the expected behavior. Fig. 5(a) defines the operator which is measured; in this case the goal is to measure whether the storage was initially in the state $|0\rangle$ or $|L\rangle$ for some $L \geq 2$. We plot the results of a majority vote as a function of the number of readouts taken. Furthermore, the assignment errors are split into two types: $|0_L\rangle$ incorrectly assigned as 1, and vice versa. As more readouts are incorporated into the majority vote, the measurement fidelity improves because mapping and readout errors are suppressed. Eventually, however, this suppression competes with state transitions in the storage mode itself, causing the measurement to degrade once too many readouts are taken. We also observe that as the distance between codewords ($L$) increases, the measurement fidelity improves dramatically. These trends are captured by their theoretical description [26] and follow the predictions of Eq. (4), shown in dashed and dash-dotted lines.
The open symbols in Fig. 5 represent postselection of successful reset operations. By removing 0.2% of records corresponding to nonideal ancilla resets, we obtain much better agreement with the theoretical predictions. These events appear to be due to rare excitations to high levels in the ancilla, the origin of which is unclear [45]. It is worth pointing out that we do this postselection only in order to compare to the theory, not to make fair qubit measurements. We emphasize that the results in the remainder of this paper, in particular Fig. 6 and Table I, are not postselected in this way. They represent "fair" measurements.
In addition to the Fock-based codes described above, certain QEC codes can also be measured using the same procedure. The only difference in the implementation of the protocol is the choice of $S$, which is summarized in Table I. Specifically, we also study two binomial codes [25]. Such codes are based on superpositions of photon number states, and are designed to correct for different combinations of photon loss, photon gain, and dephasing errors. A universal gate set and error correction have been implemented for the lowest-order binomial code [46], and these codes have found application in quantum metrology [47]. In this work, we do not consider codes in which any number state appears in both the $|0_L\rangle$ and $|1_L\rangle$ states. This means that the POVM of Eq. (1) does not depend on phase for any of the $S$ studied. It is therefore sufficient to prepare number states as inputs to the measurement protocol and average the results appropriately.
FIG. 5. Enhancement of measurement fidelity with code distance. (a) In these experiments, we excite the ancilla if the storage mode has fewer than two photons in it. In doing so, we read out whether the storage state is in $S$ or $\bar{S}$ as shown. (b) Probability that when a state in $S$ is prepared in the storage mode, it is assigned incorrectly as $\bar{S}$. The state $|0\rangle$ or $|1\rangle$ is prepared in the storage mode, as indicated by the labels on the plot. The data show that the fidelity improves exponentially with the number of votes, until excitation out of the $S$ subspace limits the measurement. Dash-dotted lines show the terms in the theoretical prediction corresponding to majority voting, dashed lines show terms corresponding to state transitions, and closed symbols show the experimental data. Open symbols indicate postselection on successful ancilla resets. (c) Probability that when a state in $\bar{S}$ is prepared in the storage mode, it is assigned incorrectly as $S$. Again, we see that majority voting suppresses the misassignment probability, but that for large $N$ the misassignment probability increases due to photon loss. States of higher photon number are measured with smaller error because the measurement is robust to more photon loss events.
TABLE I. Columns: Code; Flip ancilla if $n \in$; $|0_L\rangle$; $|1_L\rangle$; Distance; Measurement infidelity.
Although majority voting is a convenient way to make assignments given a measurement record, it is not optimal. In particular, the measurement fidelity worsens when too many readouts are incorporated into the majority vote, because of state relaxation. To make a uniform comparison between different codes, we use a Bayesian classifier (maximum likelihood estimator or MLE). Such a classification scheme is optimal and sufficiently general to classify any code [26]. The results of such a classification scheme are shown in Fig. 6 as a function of the number of readouts taken. As more readouts are included in the classification, the infidelity decreases monotonically, as expected for an MLE. Furthermore, the minimum infidelity improves as the distance of the encoding is increased. The final results are compiled in Table I, along with the definitions of $|0_L\rangle$, $|1_L\rangle$, and $S$. We note in particular that we have achieved a measurement fidelity of $1 - 5.8 \times 10^{-5} \approx 0.9999$ when discriminating between states $|0\rangle$ and $|5\rangle$, and $1 - 4.2 \times 10^{-3} \approx 0.996$ when discriminating between states $|0_L\rangle$ and $|1_L\rangle$ in the $S = 2$, $N = 1$ binomial code, both of which surpass the highest measurement fidelities reported in a circuit QED system [22].
FIG. 6. Results of measuring the logical $Z$ observable in different qubit encodings. Given single-shot ancilla assignments, a Bayesian classifier outputs either "$0_L$" or "$1_L$," according to its best estimate of the initial state. As more readouts are incorporated into the estimate, the measurement infidelity decreases monotonically. Results for the 0-2, 0-3, 0-4, and 0-5 Fock codes are shown in blue circles, green squares, red triangles, and black triangles, respectively. Results for the first- and second-order binomial codes are shown in yellow stars and cyan pentagons. As the code distance increases, the measurement fidelity improves.

V. CONCLUSIONS
We have shown that multilevel encodings can be leveraged to improve measurement fidelity in quantum information systems. This idea was explored in two contexts. In the first set of experiments, the multiply excited states of a transmon qubit were used to protect measurements against errors due to state relaxation. Furthermore, it was shown how this technique can be used to mitigate SPAM errors. In the second set of experiments, a repeated readout scheme was used to measure qubits encoded in a harmonic storage mode. In this scheme, repeated readouts suppress the effect of mapping and readout errors, and distance between codewords suppresses the effect of state relaxation. Using an adaptive reset scheme relying on real-time feedforward logic, this scheme was implemented with a single ancilla.
Measurement is a crucial resource in quantum information processing. In particular, reliable measurements are necessary for high-fidelity teleported operations [11], which are an important component of a modular architecture for quantum computation [48]. Therefore, we expect our results to find applications in a variety of future experiments.
Supplemental material for "High-fidelity measurement of qubits encoded in multilevel superconducting circuits"

READOUT QND-NESS
Based on the low-energy Hamiltonian of a transmon coupled to harmonic oscillators [1], we do not expect the measurement protocol to induce additional relaxation in the storage mode. However, several experimental and theoretical works have explored readout-induced state transitions in a transmon [2–5]. In the context of harmonic storage oscillators, one study has shown that parity measurements, while highly quantum non-demolition (QND), can induce a small amount of additional relaxation [6]. In that experiment, the total relaxation rate was modeled as a combination of the bare storage lifetime $\tau_0$ and a demolition probability $P_D$ associated with each parity measurement. Varying the rate, $1/\tau_i$, at which repeated parity measurements were performed, allowed $P_D$ to be measured. Here, we perform a similar calibration to extract the additional rate of photon loss from the storage induced by a transmon readout. As shown in Fig. S1, we find that transmon readout is 99.98% QND with respect to the storage mode.
FIG. S1. QND-ness of ancilla readout. Cavity $T_1$ measurements were performed with ancilla readouts during the delay repeated with a variable interval time $\tau_i$. The data are fit to a model (solid line) $1/\tau_{\mathrm{tot}} = 1/\tau_0 + P_D/\tau_i$. From this we infer a demolition probability per readout of $P_D = 0.02\%$, corresponding to a QND-ness of 99.98%. The fit parameter $\tau_0 = 1.01$ ms is indicated by a dashed red line.
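Because the model quoted in the caption is linear in the readout repetition rate, the bare lifetime and the demolition probability can be extracted with a straight-line fit of the inverse total lifetime against the inverse interval. The sketch below uses an unweighted least-squares fit on synthetic data generated from the quoted fit values; the readout intervals, the function name, and the choice of an unweighted fit are assumptions made for illustration and may differ from the analysis actually used.

```python
def fit_qnd_model(tau_i, tau_tot):
    """Fit 1/tau_tot = 1/tau_0 + P_D/tau_i by unweighted linear least squares.

    Returns (tau_0, p_d). Inputs are sequences of times in seconds.
    """
    x = [1.0 / t for t in tau_i]    # readout repetition rate
    y = [1.0 / t for t in tau_tot]  # total relaxation rate
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    p_d = s_xy / s_xx                        # slope: demolition probability
    tau_0 = 1.0 / (mean_y - p_d * mean_x)    # intercept: bare relaxation rate
    return tau_0, p_d


# Synthetic data generated from the values quoted in the caption.
TRUE_TAU_0 = 1.01e-3   # seconds
TRUE_P_D = 2.0e-4      # 0.02% per readout
INTERVALS = [5e-6, 1e-5, 2e-5, 5e-5, 1e-4]  # hypothetical readout intervals
LIFETIMES = [1.0 / (1.0 / TRUE_TAU_0 + TRUE_P_D / t) for t in INTERVALS]

tau_0_fit, p_d_fit = fit_qnd_model(INTERVALS, LIFETIMES)
print(f"tau_0 = {tau_0_fit:.3e} s, P_D = {p_d_fit:.3e}, QND-ness = {1.0 - p_d_fit:.4%}")
```

On noiseless synthetic data the fit simply recovers the input parameters; with real data one would also want uncertainties on each lifetime and a weighted fit.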

MAXIMUM LIKELIHOOD ESTIMATOR
The repeated readout protocol is modeled as a hidden Markov model, in which the underlying states represent the storage mode photon number, transitions between hidden states represent photon loss and gain, and emission matrices represent error-prone readouts with the ancilla. If the transition and emission matrices T and E are known, then a given series of emissions (readouts) can be converted into a best estimate of the initial state of the storage mode [7].
To determine the measurement fidelities shown in Fig. 6, the forward-backward algorithm [8] is applied on a single-shot basis to every sequence of readout outcomes. The transition matrix used at each step takes into account the total time taken by that cycle's reset operation. For each readout sequence, a posterior degree of belief $p'(n_0)$ for the initial number of photons in the storage is calculated. The prior distribution $p(n_0)$ is based on the code being measured. These probabilities are then converted into a logical measurement outcome by summing the posterior over the photon numbers belonging to each codeword. For example, when distinguishing $(|0\rangle + |4\rangle)/\sqrt{2}$ and $|2\rangle$, the prior distribution is $p(0) = p(4) = 0.25$, $p(2) = 0.5$; the assignment is "$0_L$" if $p'(0) + p'(4) > p'(2)$ and "$1_L$" otherwise.
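The estimator can be sketched as follows. Because only the posterior over the initial photon number is needed, the backward recursion of the forward-backward algorithm suffices. This is a minimal sketch under simplifying assumptions, not the analysis code used for Fig. 6: a single fixed transition matrix is used (the text above uses a cycle-dependent one), only photon loss is modeled, and the loss probability, readout error, and mapping set are hypothetical values chosen for illustration.

```python
import numpy as np

def initial_state_posterior(readouts, prior, T, E):
    """Posterior over the initial hidden state given all readout outcomes.

    readouts: sequence of outcomes m_t (ints indexing the columns of E)
    prior:    p(n0), shape (N,)
    T:        T[n, k] = P(n -> k) between consecutive readouts, shape (N, N)
    E:        E[n, m] = P(outcome m | photon number n), shape (N, M)
    """
    beta = np.ones(len(prior))
    for m in reversed(readouts):
        # Likelihood of outcome m now, times the likelihood of all later outcomes.
        beta = E[:, m] * (T @ beta)
        beta /= beta.sum()  # rescale to avoid numerical underflow
    post = prior * beta
    return post / post.sum()

N = 5            # photon numbers 0..4
loss = 0.005     # hypothetical loss probability per photon per cycle
delta = 0.02     # hypothetical symmetric readout error

T = np.zeros((N, N))
for n in range(N):
    T[n, n] = 1.0 - n * loss
    if n > 0:
        T[n, n - 1] = n * loss

# Hypothetical mapping: the ancilla reports "1" when the storage holds 2 photons.
E = np.full((N, 2), 0.0)
E[:, 1] = delta
E[2, 1] = 1.0 - delta
E[:, 0] = 1.0 - E[:, 1]

prior = np.array([0.25, 0.0, 0.5, 0.0, 0.25])  # (|0> + |4>)/sqrt(2) versus |2>
post = initial_state_posterior([1, 1, 0, 1, 1], prior, T, E)
assignment = "0_L" if post[0] + post[4] > post[2] else "1_L"
```

In this example a single "0" outcome among four "1" outcomes is outvoted, and the sequence is assigned to the $|2\rangle$ codeword.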

STATE PREPARATION
As shown in Fig. 4, we use a heralded method of state preparation in order to characterize the repeated measurement protocol. Here we investigate whether this method will affect the experimentally determined measurement fidelities. Specifically, we show that (i) the heralded states have high fidelity, and (ii) the main source of error is photon loss during the final check measurement. This means that although the state preparation is imperfect, we will only rarely prepare states in $S$ which are meant to be in $\bar{S}$, or vice versa. Furthermore, the error is of the form predicted by the theoretical description of the protocol [7]. Therefore, we do not need to explicitly account for the preparation error in analyzing our experiments.
Our approach is to consider the degree of belief for the current state of the storage mode as additional checks are passed. We suppose that at each step of the verification, there is first a check measurement, and then the possibility of a state transition. Denote the probability of passing a check measurement when in state $|n\rangle$ by $E_n$, and the probability of transitioning from state $|n\rangle$ to state $|n'\rangle$ as $T_{n \to n'}$. Then by keeping track of $P_t(n) \equiv P(\text{currently } n \cap t \text{ checks passed})$ as measurements are added, we can update it at every step:
$$P_t(n) = \sum_{n'} T_{n' \to n}\, E_{n'}\, P_{t-1}(n'). \tag{S1}$$
This function is proportional to the conditioned degree of belief, $P(\text{currently } n \mid t \text{ checks passed})$. Our claim is that once several checks have passed, this state is nearly the ideal photon number state being prepared, but subject to state relaxation during the last check measurement. To see why, consider the tree shown in Fig. S2, whose branches at every step represent the terms in the sum above. $\varepsilon$ denotes a generic error probability (either a false positive of the check measurement, or the time per check divided by the storage lifetime). We see that to first order, only two paths contribute to $P_t(n)$. First, there is the ideal path, in which the desired number state $|n^*\rangle$ was prepared correctly and confirmed. Second, there is the possibility that the correct state was prepared and confirmed, but that it decayed to $|n^* - 1\rangle$ at the very end of the protocol.
FIG. S2. Illustration of preparation errors. The probabilities $P_t(n)$ are represented by the rows and columns of the network above. We imagine each node to contain the respective probability. $P_t(n)$ can be calculated by applying a linear map to the previous probabilities, $P_{t-1}(n)$, at each time step. These linear operations are shown as edges in the graph. By tracing out paths through the network, we can visualize the important terms in the expansion (S1).
The argument above is robust to the details of the number state creation and check measurements. Also, we stress that the remaining preparation error, photon loss, is the very error which our measurement protocol is robust to.
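The argument can be checked numerically by iterating the update rule for the joint probability of the current photon number and the number of checks passed: weight each state by its check-pass probability, then apply the transition probabilities. The sketch below is illustrative only; the initial distribution, false-positive probability, and loss probability are hypothetical values, not measured ones.

```python
import numpy as np

def update_check(P_prev, E_pass, T):
    """One verification step: pass a check, then allow a state transition.

    P_prev[n]: P(currently n and t-1 checks passed)
    E_pass[n]: probability of passing the check when in state n
    T[n, k]:   probability of transitioning from n to k during the step
    """
    return (P_prev * E_pass) @ T

N, target = 4, 2     # photon numbers 0..3, preparing |2>
eps_check = 0.01     # hypothetical false-positive probability of a check
eps_loss = 0.002     # hypothetical (time per check) / (storage lifetime)

E_pass = np.full(N, eps_check)
E_pass[target] = 1.0 - eps_check

T = np.zeros((N, N))
for n in range(N):
    T[n, n] = 1.0 - n * eps_loss
    if n > 0:
        T[n, n - 1] = n * eps_loss

P = np.array([0.05, 0.05, 0.85, 0.05])  # hypothetical imperfect preparation
for _ in range(3):                       # three passed checks
    P = update_check(P, E_pass, T)

belief = P / P.sum()  # P(currently n | 3 checks passed)
```

After three checks the belief is concentrated on the target state, and the only first-order contamination is the state one photon below it, with weight close to `target * eps_loss`, as the two-path argument predicts.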

RESET ERRORS
The predictions of Eq. (4) assume that the ancilla is perfectly reset in every repeated measurement cycle, and that the reset operation takes a fixed amount of time. While these are good approximations, they break down when we are interested in very small sources of error. Fig. S3(a) shows a histogram of the number of attempts which were required in order for the ancilla reset operation to complete. As expected, the reset almost always succeeds in placing the ancilla in its ground state within a few conditional operations. However, there is an unexpected tail in the distribution: this tail represents rare instances in which tens of consecutive readouts and conditional pulses failed to put the ancilla in its ground state. We refer to the ancilla as being "stuck" if the reset operation takes five or more iterations to succeed. These "stuck" instances can be understood as the result of population in the fifth or higher transmon level, which would not be handled properly by the conditional logic used. The detailed mechanism of these excitations is not apparent from our measurements. Fig. S3(b) shows a histogram of all measurements $m$ after which the reset was "stuck" (as defined in Fig. 4(a)), revealing the presence of population in higher states. As a reference, a histogram of integrated measurement signals corresponding to the first four levels of the transmon is shown in Fig. S3(c), with red circles as a guide to the eye.

SYSTEM PARAMETERS
Measured system parameters are summarized in Table S1. The parameters used in the theory curves in Fig. 5 of the main text were determined as follows: $\delta_0 = 5.2 \times 10^{-2}$ and $\delta_1 = 1.5 \times 10^{-3}$ were taken as the first point in the 0-photon and 5-photon curves, respectively. $\kappa_\uparrow \tau = 2.7 \times 10^{-4}$ was obtained by fitting to the 1-photon curve. $\kappa_\downarrow \tau$ should be regarded as a single effective parameter calculated as follows. The duration of the mapping pulse is $t_{\mathrm{map}} = 2.4$ µs, and one ancilla readout and reset attempt have a duration of $t_r \approx 2.16$ µs. $T_1^{(s)}$ was determined by independent calibration and combined with the measured QND-ness to obtain $\kappa_\downarrow \tau = (t_{\mathrm{map}} + t_r)/T_1^{(s)} + P_D = 4.8 \times 10^{-3}$.
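The arithmetic behind the effective loss parameter can be reproduced approximately as below. This is a rough consistency check rather than the calculation used for Fig. 5: the storage lifetime is an assumption, with the fitted $\tau_0 = 1.01$ ms from Fig. S1 standing in for the independently calibrated value of Table S1. Under that assumption the result is about $4.7 \times 10^{-3}$, close to the quoted $4.8 \times 10^{-3}$; the small residual presumably reflects the calibrated lifetime and rounding.

```python
t_map = 2.4e-6         # mapping pulse duration (s), from the text
t_r = 2.16e-6          # one ancilla readout plus reset attempt (s), from the text
P_D = 2e-4             # demolition probability per readout (Fig. S1)
T1_storage = 1.01e-3   # ASSUMPTION: tau_0 from the Fig. S1 fit, not the Table S1 value

# Loss probability per cycle: relaxation during mapping and readout, plus demolition.
kappa_down_tau = (t_map + t_r) / T1_storage + P_D
```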

QUTIP SIMULATION DETAILS
In Sec. III of the main text, we describe simulations of g-e and e-f π-pulses. These were performed using the anharmonicity, relaxation times, and coherence times in Table S1. The jump operators simulated were and γ . The parameters of the pulses are listed in Table S2. Several amplitudes were simulated for each pulse, and the one maximizing the $g \to e$ or $e \to f$ probability was chosen in each case. In this way we can calculate $P_g$ and $P_g^{\mathrm{shelved}}$ quoted in the main text. From the simulated populations in states $|g\rangle$, $|e\rangle$, and $|f\rangle$ immediately after the shelving pulse, $P_g^{\mathrm{meas}}$ was calculated by solving

SIMULTANEOUS NUMBER-SELECTIVE PULSES
The pulse which maps information from the storage mode onto the ancilla can be thought of as an $S$-controlled π-pulse. That is, we want to enact the following unitary:
$$U_{\mathrm{map}} = \sum_{n \in S} |n\rangle\langle n| \otimes \left(|g\rangle\langle e| + |e\rangle\langle g| + |f\rangle\langle f| + \cdots\right) + \sum_{n \notin S} |n\rangle\langle n| \otimes \left(|g\rangle\langle g| + |e\rangle\langle e| + |f\rangle\langle f| + \cdots\right). \tag{S4}$$
This gate was implemented using simultaneous Gaussian pulses centered around the appropriate qubit frequencies and truncated to $\pm 2\sigma_t$. The envelopes were chosen with a standard deviation of $\sigma_t = 600$ ns, corresponding to a frequency width of $\sigma_f = 1/(2\pi\sigma_t) = 265$ kHz. This choice balances the pulse length (which should not be too long, to avoid storage mode relaxation and ancilla decoherence) against the pulse bandwidth (which must be small compared with the dispersive shift in order to be number-selective). The pulses are applied with empirically chosen detunings in order to achieve a reasonably high mapping pulse fidelity. The mapping is calibrated separately for each choice of $S$ (defined in Table I of the main text).
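The pulse numbers quoted above can be checked, and the envelope sketched, as follows. This is a minimal illustration, not the waveform generation used in the experiment: the dispersive shift `chi`, the set `S`, the sign convention of the phase, and the function name are hypothetical, and the detunings used in the experiment were chosen empirically rather than set to exact multiples of the dispersive shift.

```python
import numpy as np

sigma_t = 600e-9                          # Gaussian standard deviation (s), from the text
sigma_f = 1.0 / (2.0 * np.pi * sigma_t)   # frequency width (Hz), about 265 kHz
duration = 4.0 * sigma_t                  # truncation to +/- 2 sigma_t gives 2.4 us

def mapping_envelope(t, detunings, sigma_t=600e-9, amplitude=1.0):
    """Complex envelope of simultaneous Gaussian tones, truncated to |t| <= 2 sigma_t.

    t is measured from the pulse center; detunings are in Hz relative to the
    carrier. Each tone addresses the ancilla transition for one photon number.
    """
    t = np.asarray(t, dtype=float)
    gauss = np.exp(-t**2 / (2.0 * sigma_t**2))
    gauss = np.where(np.abs(t) <= 2.0 * sigma_t, gauss, 0.0)
    tones = sum(np.exp(-2j * np.pi * d * t) for d in detunings)
    return amplitude * gauss * tones

chi = -1.0e6                  # HYPOTHETICAL dispersive shift per photon (Hz)
S = [0, 4]                    # HYPOTHETICAL set of photon numbers to flip on
detunings = [n * chi for n in S]

t = np.linspace(-1.5e-6, 1.5e-6, 3001)
envelope = mapping_envelope(t, detunings)
```

The total length $4\sigma_t = 2.4$ µs agrees with the mapping pulse duration $t_{\mathrm{map}}$ given in the System Parameters section.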

EXPERIMENTAL SETUP
The cQED package used in the experiment is mounted to the base stage of a dilution refrigerator. Drives are single-sideband modulated using an FPGA system and delivered to the device after appropriate filtering and attenuation. The outgoing readout signal is amplified by a JPC at base temperature, a HEMT at 4 K, and a MITEQ amplifier at room temperature. It is then mixed down to 50 MHz and recorded by an ADC. The experimental details are essentially the same as those described in Ref. [9].