Ancilla-free implementation of generalized measurements for qubits embedded in a qudit space

Informationally complete (IC) positive operator-valued measures (POVMs) are generalized quantum measurements that offer advantages over the standard computational basis readout of qubits. For instance, IC-POVMs enable efficient extraction of operator expectation values, a crucial step in many quantum algorithms. POVM measurements are typically implemented by coupling one additional ancilla qubit to each logical qubit, thus imposing high demands on the device size and connectivity. Here, we show how to implement a general class of IC-POVMs without ancilla qubits. We exploit the higher-dimensional Hilbert space of a qudit in which qubits are often encoded. POVMs can then be realized by coupling each qubit to two of the available qudit states, followed by a projective measurement. We develop the required control pulse sequences and numerically establish their feasibility for superconducting transmon qubits through pulse-level simulations. Finally, we present an experimental demonstration of a qudit-space POVM measurement on IBM Quantum hardware. This paves the way to making POVM measurements broadly available to quantum computing applications.


I. INTRODUCTION
Steady progress in the field of quantum technology, attested by continuing improvements in both quantum algorithms [1-3] and hardware performance [4,5], suggests that quantum computers may soon provide significant advantages over their classical counterparts in fields such as optimization, machine learning, finance, quantum physics and chemistry. In particular, ab initio computational studies of molecular systems and materials represent natural areas of application for quantum computers [6-11]. These prospects have also attracted interest from the material and drug design industries [12,13].
Proof-of-principle experiments for small molecular systems have been successfully demonstrated on various quantum computing platforms [14-16]. Crucially, these applications should be extended to problem sizes of practical interest to reach the scale at which quantum advantage can be indisputably claimed. On current noisy hardware without error correction, the realizable circuit depths are limited by finite gate fidelities and qubit coherence times. Variational algorithms address these issues by leveraging classical resources in combination with, e.g., adaptive quantum protocols and effective sampling from parametrized quantum states [17,18]. For example, the variational quantum eigensolver (VQE) can be used, among other applications, to obtain the ground state energy of molecules [19]. This is achieved by measuring the expectation value of the Hamiltonian for a trial state prepared with a parametrized ansatz circuit. By updating the parameters with a classical optimizer, the energy is minimized to approach the true ground state, in the spirit of the variational principle. A sufficiently good accuracy is only reached if the ansatz circuit is expressive enough to closely approximate the actual ground state. Moreover, the convergence of the classical optimizer can be obstructed by vanishing gradients and local minima, particularly under the influence of hardware noise [20].
* aur@zurich.ibm.com † deg@zurich.ibm.com ‡ ita@zurich.ibm.com
Overcoming these issues [10,21-23] still leaves the large number of measurement shots needed to estimate the target observables as a major bottleneck of VQE [24]. This is commonly referred to as the measurement problem. For example, the small-scale molecular calculations of H$_2$, LiH, and BeH$_2$ reported in Ref. [14] required measuring $O(10^9)$ quantum circuits. On larger problem instances, these requirements can grow unsustainably large, e.g., an estimate for the Fe$_2$S$_2$ complex predicts up to $O(10^{13})$ required measurements per energy evaluation [25]. Even with the high sampling rate of superconducting quantum processors of up to 100 kHz, this task would take several years to complete. Circuit execution speed [26] and measurement number reduction are therefore crucial to variational algorithms.
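To make the scale of the measurement problem concrete, the following minimal sketch converts a shot count and a sampling rate into wall-clock time. The function name and the assumption of uninterrupted sampling (no classical overhead, no recalibration) are ours and only serve as an illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sampling_time_years(num_shots: float, rate_hz: float) -> float:
    """Wall-clock time in years to collect num_shots at a fixed rate (idealized)."""
    return num_shots / rate_hz / SECONDS_PER_YEAR


# 10^13 measurements per energy evaluation at a 100 kHz sampling rate
years = sampling_time_years(1e13, 100e3)
print(f"{years:.1f} years per energy evaluation")
```

Since a VQE run needs many energy evaluations, the total runtime would be larger still by the number of optimizer iterations.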
Known strategies to alleviate the measurement problem include Pauli groupings [14,27-31], classical shadows [32-34], and machine learning [35]. Recent work suggests that informationally complete positive operator-valued measures (IC-POVMs) can also efficiently estimate quantum states and observables. For example, they achieve a near-optimal scaling in the number of measurements for the reconstruction of fermionic reduced density matrices [36,37]. In the context of observable expectation value sampling, adapting the POVM to the target observable reduces the measurement overhead by one order of magnitude compared to a standard Pauli grouping in hydrogen chains with 14 qubits [38]. However, the experimental realization of IC-POVMs requires coupling each qubit representing the trial state to two additional quantum states [39]. Traditionally, this is done by coupling each qubit to an ancillary one before readout [38,40]. This approach doubles the number of necessary qubits during the measurement stage, and therefore halves the usable portion of a quantum chip. Moreover, the limited connectivity of most quantum architectures leads to a significant SWAP-gate overhead [41].
arXiv:2203.07369v1 [quant-ph] 14 Mar 2022
In this work, we conceptualize and implement a measurement scheme for IC-POVMs that does not require ancilla qubits. Many quantum computing architectures encode qubits in two levels of a larger Hilbert space, e.g., the energetically lowest states of a transmon or two long-lived states of an atom or ion [42-44]. We use two additional states in this surrounding qudit space to realize programmable single-qubit POVM measurements. This requires the ability to distinguish four qudit states through projective measurements and a short pulse sequence coupling to the qudit states at the very end of a quantum circuit. Because the additional states are only populated during this short final sequence, their coherence and gate fidelity requirements are much less stringent than for the qubit states.
Our paper is organized as follows. In Sec. II, we propose a practical implementation of POVM measurements for qubits embedded in a qudit space. In Sec. III, we demonstrate an experimental implementation of our scheme on a superconducting qubit in IBM Quantum hardware. Finally, in Sec. IV, we show how qudit-based POVMs implemented in superconducting transmon hardware can sample operators with low variance through pulse-level numerical simulations.

II. THEORY
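The POVM formalism describes general measurements of a state $\rho_S$ on a system Hilbert space $\mathcal{H}_S$. Formally, an $M$-outcome POVM is a set of $M$ positive semi-definite Hermitian operators $\Pi_0, \dots, \Pi_{M-1}$ acting on $\mathcal{H}_S$ which satisfy the completeness relation $\sum_{m=0}^{M-1} \Pi_m = \mathbb{1}$, where $\mathbb{1}$ is the identity. Each operator $\Pi_m$ represents one possible outcome of the measurement that occurs with a probability
$$p_m = \mathrm{Tr}(\rho_S \Pi_m). \tag{1}$$
If the POVM is informationally complete, its operators span the space of Hermitian operators on $\mathcal{H}_S$, so that any observable $O$ can be decomposed as
$$O = \sum_m c_m \Pi_m \tag{2}$$
with real coefficients $c_m$. Its expectation value then follows from the outcome probabilities as
$$\langle O \rangle = \mathrm{Tr}(\rho_S O) = \sum_m c_m p_m. \tag{3}$$
This expectation value can thus be estimated from $N$ samples drawn from the POVM outcome distribution as $\bar{O} = \sum_m c_m N_m / N$, where $N_m$ denotes the number of times outcome $m$ was observed. The error on this estimator is the standard error of the mean
$$\sqrt{\frac{\mathrm{Var}(O)}{N}} = \sqrt{\frac{\sum_m c_m^2 p_m - \langle O \rangle^2}{N}}. \tag{4}$$
Tailoring the POVM operators to the specific observable $O$ and the state $\rho_S$ considerably reduces the corresponding variance $\mathrm{Var}(O)$ [38].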
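The estimator described above can be illustrated with a minimal single-qubit sketch. We assume a tetrahedral SIC-POVM with effects $\frac{1}{4}(\mathbb{1} + \vec{s}_m \cdot \vec{\sigma})$, the observable $Z$, and the state $|0\rangle$; these choices, the helper names, and the shot number are ours and are not taken from the experiments in this paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def sic_povm():
    """Four rank-one effects (1/4)(1 + s_m . sigma), s_m on a regular tetrahedron."""
    s = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
    return [0.25 * (I2 + sx * X + sy * Y + sz * Z) for sx, sy, sz in s]


def decompose(obs, povm):
    """Solve obs = sum_m c_m Pi_m for the coefficients c_m."""
    A = np.stack([P.reshape(-1) for P in povm], axis=1)  # columns: vectorized effects
    c, *_ = np.linalg.lstsq(A, obs.reshape(-1), rcond=None)
    return c.real  # real for a Hermitian observable


rho = np.array([[1, 0], [0, 0]], dtype=complex)  # state |0><0|
povm = sic_povm()
c = decompose(Z, povm)
p = np.array([np.trace(rho @ P).real for P in povm])  # outcome probabilities
p = p / p.sum()  # guard against rounding before sampling

exact = float(c @ p)                   # sum_m c_m p_m
variance = float(c**2 @ p - exact**2)  # second moment minus squared mean

rng = np.random.default_rng(seed=1)
shots = 10_000
counts = rng.multinomial(shots, p)     # N_m for each outcome m
estimate = float(c @ counts / shots)   # sum_m c_m N_m / N
std_error = float(np.sqrt(variance / shots))
print(exact, variance, estimate, std_error)
```

For this generic POVM the single-shot variance is 2, twice that of a projective $Z$ measurement on $|+\rangle$; adapting the POVM to the observable and state is what reduces this overhead.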
General POVMs on $\mathcal{H}_S$ can be implemented by coupling to an extended space $\mathcal{H}_\mathrm{ext}$ either through a tensor product extension (TPE) $\mathcal{H}_\mathrm{ext} = \mathcal{H}_S \otimes \mathcal{H}_A$ or a direct sum extension (DSE) $\mathcal{H}_\mathrm{ext} = \mathcal{H}_S \oplus \mathcal{H}_A$ [39]. To realize POVM measurements, a specific unitary $U$ is applied to $\mathcal{H}_\mathrm{ext}$ such that the probability distribution of a subsequent $M$-outcome projective measurement on $\mathcal{H}_\mathrm{ext}$ coincides with the POVM outcome distribution $\{p_m\}$ for the original state $\rho_S$. Before applying $U$, the initial state on $\mathcal{H}_\mathrm{ext}$ is of the form $\rho = \rho_S \otimes \rho_A$ in a TPE, while in a DSE it has no support on $\mathcal{H}_A$. In both cases, the existence of $U$ is guaranteed by Naimark's dilation theorem [45].
We consider IC-POVM measurements on $N$-qubit systems, specifically product POVMs where each global operator $\Pi_m$ is given by a tensor product of local single-qubit operators of rank one. Each such local POVM includes $M = 4$ linearly independent operators [39]. The global POVM then consists of $4^N$ product operators, the minimal number required for informational completeness. Such POVMs are typically implemented in a TPE by coupling each of the $N$ qubits to an ancilla qubit. The single-qubit POVM operators then define a two-qubit unitary $U$ acting on the system and ancilla qubit. This can be accomplished with three CNOT gates and single-qubit gates through the KAK decomposition [46,47], which can also be improved by scaling pulses [48]. The relation between $U$ and the POVM operators $\Pi_m$ is detailed in App. A 1.
The overhead of ancilla-based POVM implementations in a TPE, which doubles the qubit count, can be avoided if the qubit states $|0\rangle$ and $|1\rangle$ are encoded in the higher-dimensional Hilbert space of a qudit. Instead of an ancilla, we use two additional states of the qudit space, denoted $|2\rangle$ and $|3\rangle$, which are not populated during the quantum circuit, to realize a single-qubit POVM through a DSE, see Fig. 1. The states $|2\rangle$ and $|3\rangle$ may be higher-excited states of a superconducting transmon qubit [42] or additional states of the level structure in trapped ions [43] and neutral atoms [44]. We implement the POVM-encoding unitary $U$ on the qudit space through a sequence of pulses that couple adjacent levels. This approach is suitable for architectures where an external drive with a dipole coupling is available, e.g., through microwave or laser pulses.
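As an illustration of the direct sum extension, the sketch below builds a four-dimensional unitary whose projective measurement statistics reproduce a rank-one single-qubit POVM. It uses one standard Naimark construction (an isometry with rows $\langle\phi_m|$, completed to a unitary by a QR decomposition), which is an assumption on our part and not necessarily the construction of App. A 1. The tetrahedral SIC-POVM and the test state are likewise illustrative choices.

```python
import numpy as np


def sic_vectors():
    """Unnormalized |phi_m> with Pi_m = |phi_m><phi_m| for a tetrahedral SIC-POVM."""
    s = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
    vecs = []
    for sx, sy, sz in s:
        theta, phi = np.arccos(sz), np.arctan2(sy, sx)
        ket = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
        vecs.append(ket / np.sqrt(2))  # weight 1/2 so that the effects sum to identity
    return np.array(vecs)              # shape (4, 2)


def dilation_unitary(phis):
    """Complete the 4x2 isometry with rows <phi_m| to a 4x4 unitary."""
    V = phis.conj()                    # V[m, :] = <phi_m|
    assert np.allclose(V.conj().T @ V, np.eye(2))
    # The last two columns only act on |2> and |3>, which carry no population
    # before the measurement, so any orthonormal completion is valid.
    Q, _ = np.linalg.qr(V, mode="complete")
    return np.hstack([V, Q[:, 2:]])


phis = sic_vectors()
U = dilation_unitary(phis)

psi = np.array([0.6, 0.8j, 0, 0])      # qubit state embedded in the qudit space
probs = np.abs(U @ psi) ** 2           # projective measurement of |0>, ..., |3>
expected = np.array([abs(np.vdot(phi, psi[:2])) ** 2 for phi in phis])
print(probs, expected)
```

Only the first two columns of the unitary are fixed by the POVM, which is why fewer rotations are needed than for a general four-level unitary.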
We now review the action of individual pulses and then decompose U into rotations generated by such pulses.
Let $H^{\mathrm{lf}} = \sum_{n \geq 0} E_n |n\rangle\langle n|$ denote the qudit Hamiltonian in its eigenbasis in the laboratory frame (lf). An external drive with envelope $\Omega(t)$, drive frequency $\omega_D$, and phase $\phi$ leads to an interaction Hamiltonian in which $g_n$ denotes the coupling strength to the $n \leftrightarrow n+1$ transition; we set $\hbar = 1$. By transforming into the rotating frame (rf) of the drive and applying the rotating wave approximation (dropping terms rotating at $2\omega_D$), the interaction Hamiltonian reduces to coupling terms of the form $g_n e^{i\phi} |n+1\rangle\langle n| + \mathrm{h.c.}$ between adjacent levels.
FIG. 1. (a) The $M = 4$ rank-one, single-qubit POVM operators, represented on a Bloch sphere, define a four-dimensional unitary $U$ which encodes the POVM operators. (b) We realize this unitary on the qudit space in which the qubit state $|\psi\rangle$ is encoded. (c) This can be achieved by a sequence of ten $\pi/2$-pulses that couple adjacent levels. (d) Finally, a projective measurement of the four states yields the outcome probabilities of the four POVM operators.
We further define generalized $Z_{n \leftrightarrow n+1}(\varphi)$-rotations that act as $\mathrm{diag}(e^{-i\varphi/2}, e^{i\varphi/2})$ on the states $|n\rangle$ and $|n+1\rangle$ and as the identity elsewhere. Such generalized Z-gates can be engineered from two Givens rotations [43]. For qubits, it is common to implement Z-rotations virtually by adjusting the phases $\phi$ of subsequent drive pulses [49,50]. We generalize this concept to virtually implement qudit-space Z-gates, as detailed in App. A 3.
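The idea behind virtual Z-gates in a qudit can be checked numerically with a small sketch. We assume rotating-frame drive terms of the form $e^{i\phi}|n+1\rangle\langle n| + \mathrm{h.c.}$ with unit coupling and show that conjugating a drive with a generalized Z-gate only shifts the drive phase. The helper names and sign conventions are ours; the full bookkeeping used in this work is the one of App. A 3.

```python
import numpy as np

DIM = 4


def drive(n, phi, dim=DIM):
    """Drive term e^{i phi}|n+1><n| + h.c. on the n <-> n+1 transition."""
    H = np.zeros((dim, dim), dtype=complex)
    H[n + 1, n] = np.exp(1j * phi)
    H[n, n + 1] = np.exp(-1j * phi)
    return H


def z_gate(n, angle, dim=DIM):
    """Generalized Z: diag(e^{-i angle/2}, e^{i angle/2}) on |n>, |n+1>, identity elsewhere."""
    d = np.ones(dim, dtype=complex)
    d[n] = np.exp(-0.5j * angle)
    d[n + 1] = np.exp(0.5j * angle)
    return np.diag(d)


angle, phi = 0.7, 0.3
Zg = z_gate(1, angle)  # Z-gate on the 1 <-> 2 transition

# Phase shift picked up by a later drive on each transition:
# the full angle on the same transition, half the angle (opposite sign) on neighbors.
shifts = {0: +angle / 2, 1: -angle, 2: +angle / 2}
for n, shift in shifts.items():
    conjugated = Zg.conj().T @ drive(n, phi) @ Zg
    print(n, np.allclose(conjugated, drive(n, phi + shift)))
```

Because the same relation holds for the pulse propagators, a Z-gate can be moved past all later pulses by updating their phases, and it has no effect on the final populations.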
We construct the POVM-encoding unitary $U$ from R-rotations as in Eq. (9) by adapting an algorithm presented in Ref. [51] that decomposes $U$ (up to remaining phases on the diagonal) into a sequence of Givens rotations $G_{n \leftrightarrow n+1}(\theta, \phi)$, following a strategy similar to a QR decomposition [52]. We extend this algorithm in two ways. First, we add Z-gates to the sequence to fully decompose $U$ (including all relative phases) without increasing the number of pulses. Second, we replace the inaccessible G-rotations in the decomposition of $U$ with the realistic R-rotations in Eq. (9), which include additional phases acquired by idle levels. We absorb these phases into the angles $\phi$ of the subsequent R-pulses. The details of the decomposition algorithm of $U$ into R-gates are given in App. A. Here, we only quote our main result: the target unitary $U$ can always be realized as a sequence of five R-rotations, Eq. (11). The specific choice of the targeted POVM operators $\Pi_m$ enters through the angles $\theta_i$ and $\phi_i$, while the order in which the transitions are driven is fixed and independent of the POVM.
Finally, let $\sqrt{X}_{n \leftrightarrow n+1}$ denote a $\pi/2$-pulse around the x-axis between the states $|n\rangle$ and $|n+1\rangle$. Any R-rotation can be realized by two $\sqrt{X}$-pulses and three virtual Z-gates, see App. A 2. This has the great practical benefit that only the three pulses $\sqrt{X}_{0 \leftrightarrow 1}$, $\sqrt{X}_{1 \leftrightarrow 2}$, and $\sqrt{X}_{2 \leftrightarrow 3}$, rather than a parametrized family of pulses, require calibration. It is thus helpful to decompose the pulse sequence in Eq. (11) into $\sqrt{X}$-gates, shifting all angular dependencies into near-perfect virtual Z-gates. Common calibration techniques applicable to the qudit-space pulses are readily available [53]. The resulting pulse sequence for the implementation of $U$ requires a total of ten $\sqrt{X}$-pulses, see Fig. 1(c) for an example where each pulse is depicted with a Gaussian envelope.
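The decomposition into two $\sqrt{X}$-pulses and three Z-gates can be verified numerically on a single two-level subspace. The sketch below assumes the convention $R(\theta, \phi) = \exp[-i\frac{\theta}{2}(\cos\phi\, X + \sin\phi\, Y)]$ and ignores the extra phases that idle qudit levels acquire; both are simplifications of ours relative to Eq. (9) and App. A 2.

```python
import numpy as np

SX = np.array([[1, -1j], [-1j, 1]]) / np.sqrt(2)  # pi/2 rotation about the x-axis


def rz(angle):
    """Z-rotation diag(e^{-i angle/2}, e^{i angle/2})."""
    return np.diag([np.exp(-0.5j * angle), np.exp(0.5j * angle)])


def rotation(theta, phi):
    """exp(-i theta/2 (cos(phi) X + sin(phi) Y)) written out explicitly."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s * np.exp(-1j * phi)],
                     [-1j * s * np.exp(1j * phi), c]])


def rotation_from_sx(theta, phi):
    """Same rotation from two fixed sqrt(X) pulses and three Z-rotations."""
    return rz(phi - np.pi / 2) @ SX @ rz(np.pi - theta) @ SX @ rz(-phi - np.pi / 2)


rng = np.random.default_rng(seed=0)
theta, phi = rng.uniform(0, 2 * np.pi, size=2)
print(np.allclose(rotation(theta, phi), rotation_from_sx(theta, phi)))
```

All dependence on $\theta$ and $\phi$ sits in the Z-rotation angles, so the two calibrated pulses stay fixed for every POVM.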

III. IMPLEMENTATION IN SUPERCONDUCTING QUBITS
We now present and discuss experimental results of a qudit-space POVM measurement in a superconducting transmon qubit. Transmons are a popular qubit architecture as they enjoy long coherence times relative to the duration of their gates [4] and can gather measurements at elevated trigger rates, typically around 1-100 kHz [26]. They are built from a non-linear resonance circuit created by a Josephson junction shunted by a capacitor and are characterized by the ratio of the Josephson energy $E_J$ to the charging energy $E_C$, with $E_J/E_C \gg 1$ [54]. The spectrum of a transmon is described by an anharmonic oscillator, with the qubit encoded in the ground state $|0\rangle$ and the first excited state $|1\rangle$. For details on this architecture see App. B.

A. Qudit control of transmons
We propose to use the energetically next-highest states $|2\rangle$ and $|3\rangle$ in addition to the qubit states $|0\rangle$ and $|1\rangle$ to implement qudit-based POVM measurements. With the decomposition in Eq. (11), we only need to drive transitions between adjacent states. In existing experimental setups, these states are accessed by switching the carrier frequency of the microwave drive pulses. Current IBM Quantum systems employ qubits with $0 \leftrightarrow 1$ transition frequencies of ∼5 GHz and anharmonicities of ∼−300 MHz. Drive pulses are generated by an arbitrary waveform generator with a sampling rate of $4.5 \times 10^9\,\mathrm{s}^{-1}$ [55]. We can thus apply modulations to the carrier frequency of up to approximately ±1 GHz (still oversampling by a factor of 4.5). The carrier frequencies of ∼4.7 GHz and ∼4.3 GHz required to address the $1 \leftrightarrow 2$ and $2 \leftrightarrow 3$ transitions, respectively, are thus well within the capabilities of our control hardware. Coherent control of the $|2\rangle$ state following this procedure has already found applications in excited state promotion readout [5,56], entanglement studies [57], gate decompositions [58], fast resets [59], and entangling operations [60].
Qudit-based POVM measurements require sufficient lifetimes of the higher excited states. On typical transmon qubits, we observe that the decay from $|3\rangle$ occurs predominantly sequentially as $|3\rangle \to |2\rangle \to |1\rangle \to |0\rangle$, while transitions such as $|3\rangle \to |1\rangle$ are strongly suppressed, see App. B 2. This is in agreement with theory [61] and previous experiments [42]. For our purposes, coherence in $|2\rangle$ and $|3\rangle$ is only required during the POVM pulse sequence, which lasts a total of $O(100\,\mathrm{ns})$ using at most ten $\sqrt{X}$-pulses. With measured lifetimes of > 25 µs for the $|3\rangle$ and $|2\rangle$ states, we do not expect the decay of higher excited states to be a limiting factor.
Transmons are dispersively measured by coupling them to a readout resonator [62]. The transmitted signal is typically down-converted and integrated, resulting in a point in the IQ-plane, which is then discriminated into $|0\rangle$ and $|1\rangle$. Dispersive readout can be extended to distinguish between the four qudit states. Recently, separation of the lowest three states with fidelities >95% has been demonstrated experimentally [63].
A challenge for qudit control of transmons is the charge dispersion of higher-excited states. The exact eigenenergies of all transmon states fluctuate under charge noise of the environment, see App. B 1. This effect increases exponentially for the energetically higher states, posing a threat to high-fidelity pulses on the $1 \leftrightarrow 2$ and especially on the $2 \leftrightarrow 3$ transition. As a result, transition frequencies fluctuate considerably from one experimental run to another. For IBM Quantum hardware with $E_J/E_C \sim 40$, we observe that the $2 \leftrightarrow 3$ transition frequency varies by 15 to 20 MHz, see App. B 3. To ensure a resonant driving of the transition, the corresponding drive pulses thus need to cover a broad spectral range. This can be achieved by shortening the pulses, which typically increases phase errors and leakage to neighboring levels. Pulse shaping techniques such as DRAG and advanced optimal control help alleviate this issue [64-66]. Furthermore, applying the POVM pulse sequence requires tracking the phases of idle levels. The acquired phases depend on the eigenenergies of each level, which are subject to charge dispersion. Conveniently, the unitary that encodes the POVM requires a single drive of the $2 \leftrightarrow 3$ transition, see Eq. (11). Hence, the $|3\rangle$ state is only populated once during the sequence, so that any phase uncertainty after the $2 \leftrightarrow 3$ pulse becomes irrelevant upon measurement in the qudit basis. Thus, whereas full coherent control of the $|3\rangle$ state is difficult to achieve, the relatively simple pulse sequence required for the POVM measurement is particularly robust to phase uncertainties of this state.

B. Experimental demonstration
As a proof-of-principle demonstration on IBM Quantum hardware, we implement a single-qubit IC-POVM which consists of the target POVM operators given in Eq. (12). Three of the operators ($\Pi_1$, $\Pi_2$, and $\Pi_3$) point along the Cartesian axes of the Bloch sphere, while $\Pi_0$ points into the octant which lies opposite of all other vectors, see Fig. 2a. The unitary that encodes this POVM is realized with a sequence consisting of two $\sqrt{X}_{0 \leftrightarrow 1}$, two $\sqrt{X}_{1 \leftrightarrow 2}$ and one $\sqrt{X}_{2 \leftrightarrow 3}$ gates, see App. B 4. We use the standard single-qubit SX-gate that comes with a highly calibrated DRAG pulse [67,68]. For the $1 \leftrightarrow 2$ and $2 \leftrightarrow 3$ transitions, we first calibrate the transition frequency with spectroscopy after preparing the initial states $|1\rangle$ and $|2\rangle$, respectively. For simplicity, we implement the $\sqrt{X}$-gates on these transitions with Gaussian pulses. We choose a duration of 32 ns for the $\sqrt{X}_{1 \leftrightarrow 2}$-pulse and 14 ns for the $\sqrt{X}_{2 \leftrightarrow 3}$-pulse. These durations are shorter than the 36 ns standard single-qubit pulse to mitigate charge dispersion in higher-excited states by an increased spectral width. Simulations suggest that even shorter pulses are beneficial, see App. E. However, we find it more difficult to calibrate them. After fixing the pulse duration, we calibrate the angle of the rotations through sinusoidal fits to Rabi oscillations with varying pulse amplitudes. To calibrate the readout, we prepare and measure the states $|0\rangle$, $|1\rangle$, $|2\rangle$, and $|3\rangle$ separately through a sequence of appropriate $\sqrt{X}$-gates and use this data to train a classifier with a quadratic decision boundary, as shown in Fig. 2b. For each state, we obtain a characteristic signal that clusters in different regions of the IQ-plane.
We investigate how well our pulse sequence along with the calibrated measurement implements the desired POVM with quantum detector tomography (QDT) [69,70], which characterizes the realized POVM operators. To this end, a set of reference states is prepared and measured by our POVM implementation. We choose the set of single-qubit states $|0\rangle$, $|1\rangle$, $|+\rangle$, $|-\rangle$, $|i\rangle$, and $|-i\rangle$ for this purpose. From the obtained outcome distributions, shown in Fig. 2c, the underlying experimental POVM operators can be estimated with a maximum-likelihood (ML) procedure, which guarantees that they form a valid POVM [71], see App. D. Note that, on the Bloch sphere, the tomography states $|-\rangle$, $|1\rangle$, and $|i\rangle$ lie opposite the POVM operators $\Pi_1$, $\Pi_2$, and $\Pi_3$, respectively. They should thus have zero measurement probability of the corresponding outcomes, which is attested by a noticeable lack of counts in the respective regions of the IQ-plane in the raw data of Fig. 2c. Accordingly, the operators obtained from the maximum-likelihood detector tomography are in good qualitative agreement with the theoretical target operators, see Fig. 2d. We quantify the fidelity through the operational distance $D_\mathrm{OD}$ [72,73], a measure on the POVM space, between the experimentally realized and the target POVM with $0 \leq D_\mathrm{OD} \leq 1$ and $D_\mathrm{OD} = 0$ for coinciding POVMs, see App. C. The raw measurement data presented in Fig. 2c yields $D_\mathrm{OD} = 0.22$. We identify the overlap of the detection regions in the IQ-plane between $|1\rangle$ and $|2\rangle$ and especially $|2\rangle$ and $|3\rangle$ as the main experimental limitation for qudit-based POVM measurements. Specifically, in our experiments, around one quarter of the prepared states in $|3\rangle$ are identified as $|2\rangle$ and vice versa, see Tab. I. To mitigate misassignment errors, we apply readout error mitigation based on the inversion of the misassignment matrix, constrained to non-negative probability vectors [72]. Thereby, we can partially correct the measured raw data and achieve an improved $D_\mathrm{OD}$ of 0.15 between the theoretical and the ML-estimated experimental POVM.
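Readout error mitigation of this kind can be sketched as follows. As a simple stand-in for the constrained inversion of Ref. [72], we invert the misassignment matrix and then project the result onto the probability simplex; this specific two-step procedure, the function names, and the numbers in the example matrix are illustrative assumptions and not the measured values of Tab. I.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = k[u - css / k > 0][-1]
    tau = css[rho - 1] / rho
    return np.maximum(v - tau, 0.0)


def mitigate(assignment, measured):
    """Invert the misassignment matrix, then enforce a valid probability vector."""
    raw = np.linalg.solve(assignment, measured)
    return project_to_simplex(raw)


# Hypothetical misassignment matrix: entry [i, j] is the probability of
# reading out state i when state j was prepared (each column sums to one).
A = np.array([[0.97, 0.04, 0.01, 0.00],
              [0.03, 0.90, 0.09, 0.02],
              [0.00, 0.05, 0.68, 0.24],
              [0.00, 0.01, 0.22, 0.74]])

true_p = np.array([0.4, 0.3, 0.3, 0.0])
measured = A @ true_p                      # noiseless readout statistics
recovered = mitigate(A, measured)

noisy = measured + np.array([0.0, 0.0, 0.03, -0.03])  # mimic shot noise
recovered_noisy = mitigate(A, noisy)       # plain inversion would go negative here
print(recovered, recovered_noisy)
```

With finite statistics the plain inverse can produce small negative entries, which is why a constraint to valid probability vectors is needed.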
The difficulty of reliably distinguishing the states $|2\rangle$ and $|3\rangle$ complicates the calibration of the average $2 \leftrightarrow 3$ transition frequency. At the moment, this renders the implementation of POVMs that require virtual $Z_{2 \leftrightarrow 3}$-gates infeasible. This motivates the choice of the POVM operators in Eq. (12) for our experiments, which are achievable with a slightly simplified pulse sequence compared to the most general case of Eq. (11), see App. B 4. The measurement pulses used in our experiment are the default pulses provided by the backend, which are optimized for maximal separation of the $|0\rangle$ and $|1\rangle$ states. A large-scale implementation of qudit-space POVM measurements would require a more careful calibration of the readout pulses, which optimizes the separation of all four involved basis states. This would make the virtual $Z_{2 \leftrightarrow 3}$-gates feasible and improve the $\sqrt{X}_{2 \leftrightarrow 3}$-gate.

C. Optimal transmon parameter regime
In the previous section, we demonstrated a qudit-based POVM measurement on a quantum device with an $E_J/E_C$-ratio of ∼45. This value was chosen for optimal qubit operation. However, the substantial charge dispersion in states $|2\rangle$ and $|3\rangle$ of the transmon suggests that larger $E_J/E_C$-ratios may be advantageous for qudit POVMs. This would sacrifice some anharmonicity to decrease the charge noise. We now quantitatively assess this trade-off through numerical pulse-level simulations, which account for both leakage errors due to finite anharmonicity and phase errors due to charge noise, but neglect readout misassignment errors.
We start by probing how the achievable $D_\mathrm{OD}$ depends on $E_J/E_C$, using a single-qubit symmetric, informationally complete (SIC) POVM $\Pi^\mathrm{SIC}$ as an example of a generic POVM. It consists of four operators $\Pi^\mathrm{SIC}_m$, $m \in \{0, 1, 2, 3\}$, that point towards the corners of a regular tetrahedron, see Fig. 1a. In contrast to the experimentally demonstrated POVM in Eq. (12), $\Pi^\mathrm{SIC}$ requires implementing the pulse sequence from Eq. (11) in its full generality. We simulate this sequence with Gaussian pulse envelopes on a single transmon by numerically integrating the time-dependent Schrödinger equation. For details on how we model charge dispersion and calibrate pulses see App. E. As the $E_J/E_C$-ratio increases and charge noise becomes less prevalent, $D_\mathrm{OD}(\Pi^\mathrm{SIC}, \Pi^\mathrm{sim})$ decreases, see Fig. 3a. While the $D_\mathrm{OD}$ is limited to 0.1 for $E_J/E_C \sim 40$, it improves to 0.01 for $E_J/E_C \sim 80$. The change in anharmonicity with $E_J/E_C$ affects the duration of the pulse sequence that achieves the optimal $D_\mathrm{OD}$, as plotted in Fig. 3b. In the low $E_J/E_C$-regime, short pulses are favored as a broad spectral width is required to cover the large spread of the charge noise, and leakage is minimal due to the large anharmonicity. Conversely, with increasing $E_J/E_C$, the anharmonicity of the transmon is reduced, which amplifies leakage. The optimal pulse durations thus increase with the ratio $E_J/E_C$.
The longer the pulse sequence, the more it is subject to non-unitary processes like decoherence, which are not considered in our simulation. Consequently, there is a trade-off between the optimal durations of the pulses under unitary dynamics and noise induced by finite coherence times. We therefore limit the total duration of the POVM-encoding pulse sequence to different maximally allowed durations $t_\mathrm{max}$, see Fig. 3a. We find that, for fixed $t_\mathrm{max}$, the $D_\mathrm{OD}$ improves (decreases) with increasing $E_J/E_C$ until an optimal ratio is reached, after which the $D_\mathrm{OD}$ gradually increases again. In the parameter regime of current IBM Quantum hardware ($E_J/E_C \sim 35$-$45$), the optimal POVM pulse sequence time is ∼100 ns. On this timescale, we do not expect decoherence to be significant, see App. B 2. For reference, single-qubit gates typically last 36 ns [55]. Finally, changing the transmon parameters also affects the conventional gates run in the quantum circuit prior to the POVM measurement. This is exemplified by the average gate fidelity $F$ of a single-qubit 36 ns SX-gate, which is shown in Fig. 3c. As $E_J/E_C$ increases from 20 to 120, the gate error $1 - F$ increases by roughly one order of magnitude due to the reduced anharmonicity.
The trade-off between anharmonicity and charge noise in a transmon qubit is a complex interplay of many factors, including coherence times, gate fidelities and gate speed [54]. Our simulations suggest that, when taking qudit POVM fidelities into account, the optimal hardware regime shifts towards higher $E_J/E_C$-ratios. While this improves the quality of qudit-space POVM measurements, it comes at the expense of either slightly worse gate fidelities or slightly slower gate speeds, whose severity ultimately depends on the available coherence times. Optimal control methods may alleviate such issues [66].

IV. APPLICATION TO OPERATOR SAMPLING
Our experimental realization is currently limited by misassignment errors in the readout due to insufficient separation in the IQ-plane. However, even with perfect readout fidelities, the considerable charge noise of current-generation transmon qubits still raises the question whether qudit POVMs with operational distances of $D_\mathrm{OD} \sim 0.1$ are sufficient for practical applications. Here, we address this question through numerical simulations of optimized IC-POVMs for estimating the expectation value of an observable $O$ as developed in Ref. [38].

A. Device noise mitigation through detector tomography
We denote the optimized (theoretical) target POVM by $\Pi^\mathrm{theo}$, which defines a target unitary in the qudit space of each transmon with corresponding outcome probabilities $p_m^\mathrm{theo}$ according to Eq. (1). However, due to device noise, the effective (experimental) channel that is applied to the qudits encodes a different POVM, denoted by $\Pi^\mathrm{exp}$, which slightly deviates from the theoretical one. In practice, $\Pi^\mathrm{exp}$ defines the experimental measurement probabilities of the outcomes $p_m^\mathrm{exp}$, while $\Pi^\mathrm{theo}$ is used to obtain the decomposition of $O$ with coefficients $c_m^\mathrm{theo}$, as defined in Eq. (2). The combined estimator converges to $\bar{O} = \sum_m c_m^\mathrm{theo} p_m^\mathrm{exp}$, which differs from the theoretical expectation value due to the imperfections in the device, leading to a bias $\sum_m c_m^\mathrm{theo} (p_m^\mathrm{exp} - p_m^\mathrm{theo})$. To estimate the impact of this bias on practical applications, we study its effects on energy measurements of trained VQE ansatz states for small molecular Hamiltonians mapped onto four to eight qubits. As the target operators $\Pi^\mathrm{theo}$, we use POVMs that minimize the variance for the respective Hamiltonians over the trial states as reported in Ref. [38]. These POVMs are simulated under charge noise for a device with $E_J/E_C = 45$, see App. E. The biases that arise from the device noise are shown in Fig. 4a (red bars). In most cases, we observe that charge noise creates biases that prevent energy estimations down to chemical accuracy.
To attenuate the large biases induced by the hardware noise, we propose an efficient error mitigation strategy in which the mismatch between $\Pi^\mathrm{theo}$ and $\Pi^\mathrm{exp}$ is reduced by means of quantum detector tomography [69,70]. This process allows an accurate estimation of the POVM operators that are actually implemented in the device, denoted by $\Pi^\mathrm{tomo}$. With this procedure, we first compute the decomposition of $O$ into the operators of $\Pi^\mathrm{tomo}$, which yields coefficients $c_m^\mathrm{tomo}$. The bias of the resulting estimator $\sum_m c_m^\mathrm{tomo} p_m^\mathrm{exp}$ converges to zero for infinitely many tomography shots. The desired accuracy in a given application thus defines how many measurements should be dedicated to the detector tomography. Crucially, since the POVMs we consider are always products of single-qubit POVMs, the tomographic reconstruction can be carried out on all qubits in parallel. Thus, the overhead in the shot budget is constant, and we do not expect this process to hamper the scalability of qudit-based POVMs. Our simulations indicate that, even for current transmon hardware with $E_J/E_C \sim 45$, qudit-space POVM measurements characterized through detector tomography are sufficiently accurate for quantum chemistry applications.
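The origin of the bias and its removal can be illustrated with a single-qubit toy model. We assume that the device implements a slightly rotated version of a tetrahedral SIC-POVM and that detector tomography recovers the implemented operators exactly (the limit of infinitely many tomography shots). The rotation angle, observable, state, and helper names are illustrative assumptions and do not model the charge-noise simulations of App. E.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]


def sic_povm():
    """Tetrahedral SIC-POVM, used here as the target POVM."""
    s = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
    return [0.25 * (I2 + sum(si * P for si, P in zip(row, PAULIS))) for row in s]


def coefficients(obs, povm):
    """Coefficients c_m of obs = sum_m c_m Pi_m for four independent effects."""
    A = np.stack([P.reshape(-1) for P in povm], axis=1)
    return np.linalg.solve(A, obs.reshape(-1)).real


def probabilities(rho, povm):
    return np.array([np.trace(rho @ P).real for P in povm])


obs = PAULIS[2]                                   # observable Z
rho = np.array([[1, 0], [0, 0]], dtype=complex)   # state |0><0|

eps = 0.05                                        # hypothetical coherent error
W = np.array([[np.cos(eps), -np.sin(eps)], [np.sin(eps), np.cos(eps)]], dtype=complex)
povm_theo = sic_povm()
povm_exp = [W @ P @ W.conj().T for P in povm_theo]  # POVM realized by the device

p_theo = probabilities(rho, povm_theo)
p_exp = probabilities(rho, povm_exp)

c_theo = coefficients(obs, povm_theo)
bias_unmitigated = float(c_theo @ (p_exp - p_theo))

c_tomo = coefficients(obs, povm_exp)              # ideal detector tomography
bias_mitigated = float(c_tomo @ p_exp - np.trace(rho @ obs).real)
print(bias_unmitigated, bias_mitigated)
```

In practice the tomographic estimate carries its own statistical error, so the residual bias is set by the number of tomography shots.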

B. Qudit-based POVMs for variance reduction
Finally, we discuss whether the qudit POVM measurements in noisy conditions can be utilized to reduce the variance of an estimator of $\langle O \rangle$. As an example, we consider the 6-qubit Hamiltonian $O_\mathrm{LiH}$ of a LiH molecule in the STO-3G basis obtained from the Bravyi-Kitaev mapping and investigate the number of shots needed to estimate the energy of a trained VQE state $|\psi_\mathrm{VQE}\rangle$ within chemical accuracy (in the chosen basis set). We compare two situations: in the first, each qubit is measured using a SIC-POVM; in the second, the qubits are measured by means of a product POVM optimized to minimize the variance of $O_\mathrm{LiH}$ in the state $|\psi_\mathrm{VQE}\rangle$ [38]. For a given POVM, the variance of a specific observable is determined by its decomposition coefficients $c_m$ and the measurement probability distribution $p_m$ of the state, see Eq. (4). Namely, the second moment $\sum_m c_m^2 p_m$ determines the accuracy of the POVM-based estimator. In particular, the outcomes $m$ with both a high absolute value of $c_m$ and a high measurement probability $p_m$ contribute most to this second moment. For the outcome distribution of the SIC-POVM, due to the symmetry of the POVM operators, the data is highly structured, see Fig. 5a. The outcomes with highest probability attain high values of $|c_m|$, which results in a large second moment of 80.86 Ha$^2$. By measuring in an optimized POVM, even under charge noise, the second moment is considerably reduced to 1.59 Ha$^2$. This approaches the optimum set by the squared first moment $\langle O \rangle^2 = 1.12$ Ha$^2$. This effect can be explained by inspecting the shape of the distribution in Fig. 5b, which shows a "squeezing" such that the most probable outcomes are associated with low absolute values of $c_m$. This in turn leads to very large absolute coefficients for other outcomes, which, in contrast, have negligible measurement probability and thus hardly contribute to the variance.
We observe that with the generic SIC-POVM scheme about $3.5 \times 10^7$ shots are required to estimate $\langle O_\mathrm{LiH} \rangle$ to within chemical accuracy. In contrast, only $3.1 \times 10^5$ shots are required when using the optimized POVM in a qudit-based scheme using a transmon affected by state-of-the-art charge noise. This number already includes $10^5$ shots devoted solely to the detector tomography used for the bias mitigation discussed in Sec. IV A. With a circuit execution rate of 10 kHz, the optimized POVM reduces the measurement time from 1 hour down to 30 seconds. It is important to note that in this application the mitigated bias lies well within chemical accuracy, as shown in Fig. 4. Based on this example, we conclude that qudit-space POVM measurements constitute a valid, shot-efficient approach to estimate observables with high precision.
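The quoted numbers can be cross-checked with a few lines of arithmetic. The sketch below uses only values stated in the text and relies on the standard assumption that, at fixed target accuracy, the required number of shots is proportional to the single-shot variance; the variable names are ours.

```python
# Second moments and squared mean quoted in the text, in Ha^2
second_moment = {"SIC": 80.86, "optimized": 1.59}
mean_squared = 1.12

# Single-shot variance: second moment minus squared first moment
variance = {k: v - mean_squared for k, v in second_moment.items()}

# Shot counts quoted in the text; the optimized count includes tomography
shots = {"SIC": 3.5e7, "optimized": 3.1e5}
tomography_shots = 1e5
rate_hz = 10e3  # circuit execution rate

time_s = {k: n / rate_hz for k, n in shots.items()}
variance_ratio = variance["SIC"] / variance["optimized"]
shot_ratio = shots["SIC"] / (shots["optimized"] - tomography_shots)

print(f"SIC: {time_s['SIC'] / 60:.0f} min, optimized: {time_s['optimized']:.0f} s")
print(f"variance ratio {variance_ratio:.0f}, estimation-shot ratio {shot_ratio:.0f}")
```

The two ratios agree to within rounding of the quoted values, which shows that the shot savings stem directly from the reduced variance.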

V. DISCUSSIONS & CONCLUSIONS
We introduced a method to perform general POVM measurements for qubits via a Naimark dilation construction, which extends the qubit space into a qudit space through the addition of two extra levels, rather than coupling to an additional ancilla qubit. Our strategy makes optimal use of the available quantum resources in a system without requiring full qudit control, which is a challenging task in general. We couple the qubit states to the two additional levels of the surrounding qudit for only a short duration at the measurement stage of the quantum circuit. Therefore, only modest coherence and pulse fidelities are required. Compared to ancilla-based POVM implementations, we circumvent the doubling of the quantum register size and thus save half of the qubits on the chip, while also avoiding a considerable SWAP-gate overhead in case of limited device connectivity. The result is a protocol that is applicable to various qubit architectures including superconducting and semiconducting qubits, trapped ions, and cold atoms.
For a superconducting transmon qubit, we detailed an implementation of qudit-space POVM measurements, including a description of the decomposition into suitable elementary pulses between adjacent levels, and of the required calibrations. Specifically, we proposed ways to operate the necessary frame changes by tracking advances in relative phases, as well as generalizing the concept of virtual Z-gates to the qudit space. Compared to the standard qubit setting, our proposal admittedly requires further calibrations involving the additional states. However, these calibrations can be performed on all qudits in parallel and are typically faster than two-qubit gate calibrations.
Exploiting the functionalities of Qiskit Pulse [67], we successfully performed a proof-of-principle experiment using the four lowest levels of a transmon in IBM Quantum hardware. We found that measurement misassignments are currently the main limitation of the proposed qudit-based POVMs, which prevents the scaling up to multi-qubit implementations. This calls for a more thorough design and optimization of the shape and frequency of measurement pulses with the aim of obtaining a sufficient dispersive shift for all four qudit levels. Moreover, the importance of choosing the readout resonator frequency appropriately, such that no transitions between higher excited states are accidentally resonant to the resonator frequency, has also been pointed out [42].
From preliminary pulse-level simulations, we conclude that tuning the qubits deeper into the transmon regime would be beneficial to achieve optimal POVM fidelities, as this limits the impact of charge noise in the higher-excited states. Nonetheless, our results indicate that the implementation of qudit-based POVMs in state-of-the-art IBM Quantum hardware can significantly reduce the number of measurements required to estimate expectation values. To achieve this goal, we designed a shot-efficient strategy based on detector tomography to mitigate systematic errors arising from experimental imperfections.
In addition to operator averaging, informationally complete POVMs can be employed for other paradigmatic quantum information tasks, including state tomography [74] and the extraction of classical shadows [75]. In all these cases, our strategy offers a resource-effective route towards their implementation in state-of-the-art quantum processors. On a broader perspective, our results open up new opportunities to exploit the multi-level structure available on many different qubit architectures, thus contributing to the development of a richer operational toolbox, and extending the native capabilities of current quantum computing architectures.

VI. ACKNOWLEDGEMENTS
This research is part of two projects that have received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No. 847471 and No. 955479. This work was supported as a part of NCCR SPIN, a National Centre of Competence in Research, funded by the Swiss National Science Foundation (grant number 51NF40-180604). We acknowledge the use of IBM Quantum services for this work. IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. The current list of IBM trademarks is available at https://www.ibm.com/legal/copytrade.

Here, we detail the connection between a unitary $U$ applied to a four-dimensional extended Hilbert space $H_{\mathrm{ext}}$ and the POVM operators realized on the single-qubit space $H_S$ through a Naimark dilation construction. In a tensor product extension (TPE), the four basis states of $H_{\mathrm{ext}}$ are formed with an ancilla qubit as $|0\rangle_{\mathrm{ext}} = |0\rangle_S \otimes |0\rangle_A$, $|1\rangle_{\mathrm{ext}} = |1\rangle_S \otimes |0\rangle_A$, $|2\rangle_{\mathrm{ext}} = |0\rangle_S \otimes |1\rangle_A$, and $|3\rangle_{\mathrm{ext}} = |1\rangle_S \otimes |1\rangle_A$. In contrast, in a direct sum extension (DSE), the four states $|i\rangle_{\mathrm{ext}}$ form a qudit space, where the qubit is encoded in the states $|0\rangle_{\mathrm{ext}} \equiv |0\rangle_S$ and $|1\rangle_{\mathrm{ext}} \equiv |1\rangle_S$.
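For simplicity, we assume a pure state of the system qubit $|\psi\rangle_S = \alpha |0\rangle_S + \beta |1\rangle_S$. A Naimark construction for both a TPE and a DSE applies a unitary $U$ to the initial state $|\psi_{\mathrm{init}}\rangle = \alpha |0\rangle_{\mathrm{ext}} + \beta |1\rangle_{\mathrm{ext}}$, to create the final state

$$U |\psi_{\mathrm{init}}\rangle = \sum_{m=0}^{3} \left( U_{m,0}\,\alpha + U_{m,1}\,\beta \right) |m\rangle_{\mathrm{ext}}. \tag{A1}$$

Measuring $U |\psi_{\mathrm{init}}\rangle$ in $H_{\mathrm{ext}}$ produces an outcome $m \in \{0, 1, 2, 3\}$ with a probability $p_m = |U_{m,0}\,\alpha + U_{m,1}\,\beta|^2$. This is equal to the probabilities $p_m = \mathrm{Tr}(\Pi_m |\psi\rangle_S \langle\psi|_S)$ associated with a POVM of four rank-1 operators acting on $H_S$,

$$\Pi_m = \Gamma_m\, |\pi_m\rangle \langle \pi_m|, \tag{A2}$$

which are proportional to projectors along the states

$$|\pi_m\rangle = \frac{1}{\sqrt{\Gamma_m}} \left( U^{*}_{m,0} |0\rangle_S + U^{*}_{m,1} |1\rangle_S \right), \tag{A3}$$

with normalization factors $\Gamma_m = |U_{m,0}|^2 + |U_{m,1}|^2$. Through Eqs. (A2) and (A3), the unitary $U$ applied to $H_{\mathrm{ext}}$ can emulate the measurement of any POVM with four rank-one operators on $H_S$. Since the desired POVM only defines the first two columns of $U$, i.e., $U_{m,0}$ and $U_{m,1}$, we find the remaining columns with a Gram-Schmidt procedure to ensure that $U$ is unitary. Without loss of generality, for the decomposition algorithm presented in Sec. A 2, it is convenient to choose the top right element of $U$ to vanish, i.e., $U_{0,3} = 0$.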
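The construction above can be checked numerically. The following sketch, using NumPy, builds the first two columns of $U$ from a tetrahedral SIC-POVM, completes them to a unitary with a QR factorization (which plays the role of the Gram-Schmidt step), and compares the outcome probabilities with $\mathrm{Tr}(\Pi_m |\psi\rangle\langle\psi|)$. The function names, the particular SIC orientation, and the random completion are assumptions made for illustration; in particular, the sketch does not enforce the additional choice $U_{0,3} = 0$.

```python
import numpy as np

def sic_povm_vectors():
    """Rows are unnormalized kets v_m with Pi_m = |v_m><v_m| (tetrahedral SIC-POVM)."""
    w = np.exp(2j * np.pi / 3)
    kets = [[1, 0]] + [[1 / np.sqrt(3), np.sqrt(2 / 3) * w**k] for k in range(3)]
    return np.array(kets, dtype=complex) / np.sqrt(2)

def naimark_unitary(vectors, seed=0):
    """Complete the POVM-defined columns U[m, k] = conj(v_m[k]) to a 4x4 unitary."""
    V = np.conj(vectors)
    if not np.allclose(V.conj().T @ V, np.eye(2)):
        raise ValueError("POVM operators do not sum to the identity")
    rng = np.random.default_rng(seed)
    filler = rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))
    # QR orthonormalizes the columns in order, like Gram-Schmidt.
    Q, R = np.linalg.qr(np.hstack([V, filler]))
    # Undo the column phases chosen by QR so that Q[:, :2] equals V exactly.
    phases = np.diag(R) / np.abs(np.diag(R))
    return Q * phases

def outcome_probabilities(U, psi):
    """p_m = |U[m,0] alpha + U[m,1] beta|^2 for psi = (alpha, beta)."""
    return np.abs(U[:, :2] @ psi) ** 2

vectors = sic_povm_vectors()
U = naimark_unitary(vectors)
psi = np.array([0.6, 0.8j])
p_dilation = outcome_probabilities(U, psi)
p_povm = np.array([abs(np.vdot(v, psi)) ** 2 for v in vectors])
```

The two probability vectors agree for any normalized input state, since only the first two columns of $U$ enter the measurement statistics.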

Pulse decomposition for qudit-space POVMs
Here, we review the decomposition algorithm that we use to realize the unitary $U$ with Givens rotations $G$ and $Z$-gates, as defined in the main text. Since this algorithm decomposes special unitary operators, we first define the SU(4) operator $U^{(0)} = U \det(U)^{-1/4}$, which encodes the same POVM as $U$. The decomposition routine iteratively reduces $U^{(0)}$ to the identity matrix through a sequence of gates. Counting the gates from $i = 0$, we denote the unitary obtained after the $i$-th gate is applied by $U^{(i+1)}$.
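The reduction to the identity matrix is accomplished by creating zeros in the off-diagonal entries starting from the top entry in the fourth column [51]. Since, by choice, the first entry $U^{(0)}_{0,3}$ is already zero, we create a second zero in the fourth column of $U^{(1)}$ with a Givens rotation $G^{(0)}_{1\leftrightarrow 2}(\theta_1, \phi_1)$ such that

$$G^{(0)}_{1\leftrightarrow 2} \left( 0,\, U^{(0)}_{1,3},\, U^{(0)}_{2,3},\, U^{(0)}_{3,3} \right)^T = \left( 0,\, 0,\, U^{(1)}_{2,3},\, U^{(1)}_{3,3} \right)^T.$$

If $U^{(0)}_{1,3} = r_1 e^{i\delta_1}$ and $U^{(0)}_{2,3} = r_2 e^{i\delta_2}$, then the rotation angle must be [51] $\theta_1 = 2 \arctan(r_1 / r_2)$, while the phase $\phi_1$ is fixed by $\delta_1$ and $\delta_2$. In the next iteration, we similarly apply another Givens rotation such that

$$G^{(1)}_{2\leftrightarrow 3} \left( 0,\, 0,\, U^{(1)}_{2,3},\, U^{(1)}_{3,3} \right)^T = \left( 0,\, 0,\, 0,\, U^{(2)}_{3,3} \right)^T.$$

Due to unitarity, the remaining non-zero entry is a phase factor $e^{i\beta}$. A rotation $Z^{(2)}_{2\leftrightarrow 3}$ with an angle $\varphi_z = -2\beta$ sets the phase $\beta$ to zero. This finally results in the matrix

$$U^{(3)} = \begin{pmatrix} U^{(3)}_{0,0} & U^{(3)}_{0,1} & U^{(3)}_{0,2} & 0 \\ U^{(3)}_{1,0} & U^{(3)}_{1,1} & U^{(3)}_{1,2} & 0 \\ U^{(3)}_{2,0} & U^{(3)}_{2,1} & U^{(3)}_{2,2} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

which has been reduced to a $3 \times 3$ block. The above procedure is now repeated for the third column. This requires two Givens rotations and one $Z$-rotation such that

$$Z^{(5)} G^{(4)}_{1\leftrightarrow 2} G^{(3)}_{0\leftrightarrow 1} \left( U^{(3)}_{0,2},\, U^{(3)}_{1,2},\, U^{(3)}_{2,2},\, 0 \right)^T = (0, 0, 1, 0)^T. \tag{A6}$$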
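The elimination step can be sketched numerically as follows. The matrix form of the Givens rotation used here, a rotation by $\theta$ about the axis $(\cos\phi, \sin\phi, 0)$ in the subspace spanned by $|n\rangle$ and $|n+1\rangle$, is an assumption chosen to be consistent with $\sqrt{X} = G(\pi/2, 0)$; the resulting expression for the phase is specific to this assumed convention and may differ from the one used in the main text. The function names are hypothetical.

```python
import numpy as np

def givens(theta, phi, n, dim=4):
    """Assumed convention: rotation by theta about (cos phi, sin phi, 0)
    acting on the two levels |n>, |n+1> of a dim-level system."""
    G = np.eye(dim, dtype=complex)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    G[n, n] = G[n + 1, n + 1] = c
    G[n, n + 1] = -1j * np.exp(-1j * phi) * s
    G[n + 1, n] = -1j * np.exp(1j * phi) * s
    return G

def zeroing_angles(upper, lower):
    """Angles for which givens(...) maps the pair (upper, lower) to (0, *)."""
    theta = 2 * np.arctan2(abs(upper), abs(lower))      # theta = 2 arctan(r1 / r2)
    phi = np.angle(lower) - np.angle(upper) + np.pi / 2  # holds for the assumed convention
    return theta, phi

# Example fourth column with a vanishing top entry, as chosen in the text.
col = np.array([0, 0.5 * np.exp(0.3j), 0.5 * np.exp(-1.1j), np.sqrt(0.5) * np.exp(2j)])
col1 = givens(*zeroing_angles(col[1], col[2]), n=1) @ col    # zero the entry in row 1
col2 = givens(*zeroing_angles(col1[2], col1[3]), n=2) @ col1  # zero the entry in row 2
```

After the two rotations only the last entry of the column survives, and it has unit modulus, in line with the unitarity argument above.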
Finally, applying the same strategy once more to the second column results in the identity matrix. Our initial choice of an SU(4) matrix ensures that the final phase of the top left entry vanishes. As a result, applying the inverse of all gates in reverse order gives a decomposition (up to an irrelevant global phase) of the target unitary $U$ into elementary operations of Givens rotations and (potentially virtual) $Z$-gates:

$$U \simeq G^{(0)\dagger} G^{(1)\dagger} Z^{(2)\dagger} G^{(3)\dagger} G^{(4)\dagger} Z^{(5)\dagger} G^{(6)\dagger} Z^{(7)\dagger}. \tag{A7}$$

To simplify pulse calibrations, we restrict the gate set to virtual $Z$-gates and the gates $\sqrt{X} = G(\theta = \pi/2, \phi = 0)$, which describe a $\pi/2$-rotation around the x-axis between the states $|n\rangle$ and $|n+1\rangle$. A general Givens rotation $G(\theta, \phi)$ can be exactly realized by a sequence of two $\sqrt{X}$- and three $Z$-gates, Eq. (A8), where we have omitted the subscripts $n \leftrightarrow n+1$ [49]. Replacing every $G$-gate in Eq. (A7) with the decomposition in Eq. (A8) results in a decomposition of $U$ that only contains ten $\sqrt{X}$- and eleven $Z$-gates.
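That two $\sqrt{X}$- and three $Z$-gates suffice can be verified numerically in the two-level subspace. The sketch below assumes the conventions $Z(\varphi) = \mathrm{diag}(e^{-i\varphi/2}, e^{i\varphi/2})$ and $G(\theta, \phi) = \exp[-i\theta(\cos\phi\, \sigma_x + \sin\phi\, \sigma_y)/2]$; the $Z$-angles below were derived for these assumed conventions and need not coincide with the form of the decomposition used in the paper.

```python
import numpy as np

def rz(angle):
    """Assumed Z-gate convention: diag(exp(-i a/2), exp(+i a/2))."""
    return np.diag([np.exp(-1j * angle / 2), np.exp(1j * angle / 2)])

SX = np.array([[1, -1j], [-1j, 1]]) / np.sqrt(2)  # sqrt(X) = G(pi/2, 0)

def givens2(theta, phi):
    """Assumed two-level Givens rotation about the axis (cos phi, sin phi, 0)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * np.exp(-1j * phi) * s],
                     [-1j * np.exp(1j * phi) * s, c]])

def givens_from_sx(theta, phi):
    """Z - sqrt(X) - Z - sqrt(X) - Z sequence; the rightmost factor acts first."""
    return rz(phi + np.pi / 2) @ SX @ rz(theta + np.pi) @ SX @ rz(np.pi / 2 - phi)

def equal_up_to_global_phase(A, B):
    """For unitaries, |Tr(A^dagger B)| equals the dimension iff B = exp(i x) A."""
    return np.isclose(abs(np.trace(A.conj().T @ B)), A.shape[0])

ok = equal_up_to_global_phase(givens2(1.1, -0.4), givens_from_sx(1.1, -0.4))
```

Because all three $Z$-rotations can be implemented virtually, only the two $\sqrt{X}$ pulses of each Givens rotation consume time on the device.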
For a qudit-space POVM realization in realistic hardware we need to apply the sequence of $G$- or $\sqrt{X}$- and $Z$-gates through the hardware-native rotations $R$ derived in Eq. (9) of the main text. It is sufficient to implement a unitary $\tilde{U}$ equivalent to the target unitary $U$ as long as the same measurement probabilities for any initial qubit state are recovered. In Secs. A 3 and A 4, we detail how to realize such an equivalent unitary $\tilde{U}$ with realistic $R$-rotations.

Generalized virtual Z-gates
Each pulse as defined in Eq. (5) in the main text is played in a frame that consists of a carrier frequency $\omega$ and a phase $\phi$. For our implementation of the qudit-space unitary, three frames are relevant, which correspond to the three driven transitions, i.e., $0 \leftrightarrow 1$, $1 \leftrightarrow 2$, and $2 \leftrightarrow 3$. While the frequencies of the drives in these frames always remain fixed to the transition energies of the system such that $\omega_n = E_{n+1} - E_n$, the phases of the frames need to be adjusted to account for phase advances during $R$-rotations and to virtually implement $Z$-gates.
As an example of a $Z$-rotation in qudit space, consider the gate $Z_{1\leftrightarrow 2}(\varphi)$, which applies a relative phase of $-\varphi$ between the states $|1\rangle$ and $|2\rangle$. Therefore, an angle $\varphi$ needs to be subtracted from the phase of all subsequent pulses played in the $1 \leftrightarrow 2$ frame. However, while the above gate leaves the levels $|0\rangle$ and $|3\rangle$ unchanged, it applies a relative phase of $\varphi/2$ between the levels $|0\rangle$ and $|1\rangle$, as well as between $|2\rangle$ and $|3\rangle$. Hence, in addition to affecting all following phases in the $1 \leftrightarrow 2$ frame, an angle $\varphi/2$ must be added to all drive phases in the $0 \leftrightarrow 1$ and $2 \leftrightarrow 3$ frames. In general, $Z_{n\leftrightarrow n+1}(\varphi)$ gates can be virtually implemented by adding $\varphi/2$ to all subsequent pulses in the $n+1 \leftrightarrow n+2$ and $n-1 \leftrightarrow n$ frames while deducting a phase $\varphi$ from the following pulses in the $n \leftrightarrow n+1$ frame.
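The phase-update rule can be verified with a small matrix check. The sketch assumes $Z_{n\leftrightarrow n+1}(\varphi) = e^{-i\varphi/2}|n\rangle\langle n| + e^{i\varphi/2}|n+1\rangle\langle n+1|$ (identity on the other levels) and the Givens convention with $\sqrt{X} = G(\pi/2, 0)$; both matrix forms and all function names are assumptions for illustration. It confirms that a $Z$-gate followed by a pulse equals the pulse with a shifted phase followed by the $Z$-gate, so that the $Z$-gate can be deferred to the end of the sequence.

```python
import numpy as np

def z_gate(varphi, n, dim=4):
    """Assumed convention for Z_{n<->n+1}(varphi)."""
    Z = np.eye(dim, dtype=complex)
    Z[n, n] = np.exp(-1j * varphi / 2)
    Z[n + 1, n + 1] = np.exp(1j * varphi / 2)
    return Z

def givens(theta, phi, n, dim=4):
    """Assumed convention for G_{n<->n+1}(theta, phi)."""
    G = np.eye(dim, dtype=complex)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    G[n, n] = G[n + 1, n + 1] = c
    G[n, n + 1] = -1j * np.exp(-1j * phi) * s
    G[n + 1, n] = -1j * np.exp(1j * phi) * s
    return G

def virtual_z_shifts(n, varphi, n_frames=3):
    """Phase shift per frame m (transition m <-> m+1) caused by Z_{n<->n+1}(varphi)."""
    shifts = np.zeros(n_frames)
    shifts[n] = -varphi
    for m in (n - 1, n + 1):
        if 0 <= m < n_frames:
            shifts[m] = varphi / 2
    return shifts

def rule_holds(n, varphi, theta=0.9, phi=0.3):
    """Check G_m(phi) Z_n = Z_n G_m(phi + shift_m) for every frame m."""
    Z = z_gate(varphi, n)
    shifts = virtual_z_shifts(n, varphi)
    return all(
        np.allclose(givens(theta, phi, m) @ Z, Z @ givens(theta, phi + shifts[m], m))
        for m in range(3)
    )
```

For example, `rule_holds(1, 0.7)` evaluates the case of $Z_{1\leftrightarrow 2}$ discussed above, where the $1 \leftrightarrow 2$ frame is shifted by $-\varphi$ and both neighboring frames by $+\varphi/2$.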

Correcting phase advances during R-pulses
While playing a pulse of a total duration $T$ in the $n \leftrightarrow n+1$ frame, the uncoupled levels acquire non-trivial phases, see Eq. (9) of the main text. It is instructive to look at the example of a drive in the $1 \leftrightarrow 2$ frame, during which the idle levels $|0\rangle$ and $|3\rangle$ acquire such phases. Let $\alpha_n = \omega_n - \omega_{n-1}$ for $n > 0$, such that for an anharmonic oscillator, $\alpha_1$ would simply denote the anharmonicity. This drive results in a relative phase of $\Delta\phi_{0\leftrightarrow 1} = -\alpha_1 T$ between the states $|0\rangle$ and $|1\rangle$ and a relative phase of $\Delta\phi_{2\leftrightarrow 3} = \alpha_2 T$ between the states $|2\rangle$ and $|3\rangle$. To correct these phases, $\Delta\phi_{0\leftrightarrow 1}$ and $\Delta\phi_{2\leftrightarrow 3}$ have to be subtracted from the phases $\phi$ of all subsequent pulses in the $0 \leftrightarrow 1$ and $2 \leftrightarrow 3$ frames, respectively. Generalizing from this example, under a drive $R_{n\leftrightarrow n+1}$, the $m$-th level acquires a phase (ignoring global phases) of $\phi_m = ((m - n)\,\omega_n + E_n - E_m)\, T$, which results in a phase difference of

$$\Delta\phi_{m\leftrightarrow m+1} = \phi_m - \phi_{m+1} = (\omega_m - \omega_n)\, T.$$

This defines the necessary phase shift of all following pulses in the $m \leftrightarrow m+1$ frame. In summary, a sequence of gate instructions consisting of Givens rotations $G(\theta_G, \phi_G)$ and phase gates $Z(\varphi_Z)$ can be implemented in the qudit space through pulses $R(\theta_R, \phi_R)$ where the rotation angles remain unchanged ($\theta_R = \theta_G$) and the phases of the pulses $\phi_R$ depend on the phases $\phi_G$ and $\varphi_Z$ of all previously implemented gates of the sequence. This procedure is summarized as a pseudocode algorithm in Alg. 1.
Algorithm 1 Implementation of a sequence of Givens rotations G and phase gates Z via hardware-native pulses R achieved by keeping track of all necessary phase shifts.
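A minimal sketch of this bookkeeping is given below. It follows the verbal rules stated above (a virtual $Z$-gate shifts the frame phases, and every pulse of duration $T$ in frame $n$ shifts each other frame $m$ by $-(\omega_m - \omega_n) T$), but the data layout, the function and class names, and the use of a single fixed pulse duration are assumptions made for illustration rather than a transcription of Alg. 1.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Pulse:
    frame: int    # index n of the driven transition n <-> n+1
    theta: float  # rotation angle, unchanged from the Givens rotation
    phase: float  # drive phase including all accumulated frame shifts

def compile_to_pulses(gates, omegas, duration):
    """Translate ('G', n, theta, phi) and ('Z', n, varphi) instructions into pulses.

    omegas[n] is the angular frequency of transition n <-> n+1 and
    duration is the (assumed common) length T of every pulse.
    """
    offsets = np.zeros(len(omegas))  # accumulated phase shift of each frame
    pulses = []
    for gate in gates:
        if gate[0] == "Z":
            _, n, varphi = gate
            offsets[n] -= varphi                 # deduct varphi in the n <-> n+1 frame
            for m in (n - 1, n + 1):             # add varphi/2 in the adjacent frames
                if 0 <= m < len(omegas):
                    offsets[m] += varphi / 2
        else:
            _, n, theta, phi = gate
            pulses.append(Pulse(n, theta, phi + offsets[n]))
            for m in range(len(omegas)):         # correct phase advances of idle frames
                if m != n:
                    offsets[m] -= (omegas[m] - omegas[n]) * duration
    return pulses

# Hypothetical transmon-like frequencies in rad/ns and a 10 ns pulse duration.
omegas = 2 * np.pi * np.array([5.0, 4.7, 4.4])
gates = [("Z", 1, 0.4), ("G", 1, np.pi / 2, 0.1), ("G", 0, np.pi / 2, 0.0)]
pulses = compile_to_pulses(gates, omegas, duration=10.0)
```

In this example the $Z_{1\leftrightarrow 2}$ gate never appears as a physical pulse; it only changes the phases of the two subsequent drives.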

The transmon qubit
A transmon qubit consists of a Josephson junction with Josephson energy $E_J$ shunted by a large capacitance whose single-electron charging energy is denoted as $E_C$, with $E_C \ll E_J$. The Hamiltonian of the circuit is

$$\hat{H} = 4 E_C (\hat{n} - n_g)^2 - E_J \cos\hat{\varphi}, \tag{B1}$$

where $\hat{n}$ and $\hat{\varphi}$ are dimensionless conjugate variables describing the number of Cooper pairs on the capacitor and the superconducting phase across the Josephson junction, respectively [76]. The offset charge $n_g$ is a constant that results from capacitive coupling of undesired voltage sources due to imperfect isolation from the environment.
We denote the Hamiltonian in its eigenbasis by $\hat{H}_{\mathrm{TM}} = \sum_n E_n |n\rangle\langle n|$, where the qubit is encoded in the lowest-lying eigenstates $|0\rangle$ and $|1\rangle$. Through the expansion $\cos\hat{\varphi} = 1 - \hat{\varphi}^2/2 + \hat{\varphi}^4/24 - \dots$, we see that, for small $\hat{\varphi}$, the transmon resembles a harmonic oscillator. However, the higher powers of $\hat{\varphi}$ create an anharmonic spectrum where the spacing of the eigenenergies is not equidistant, but decreases with higher levels. With the excitation energies $\omega_n = E_{n+1} - E_n$ between adjacent levels, we define the anharmonicity $\alpha_n = \omega_n - \omega_{n-1}$, $n \geq 1$, as the difference in adjacent transition frequencies.
In a realistic experimental setting, $n_g$ is subject to fluctuations called charge noise. This causes changes of the eigenenergies $E_n$ which are periodic in $n_g$ [54], see Fig. 6a. The maximal difference in eigenenergies, $\epsilon_n = |E_n(n_g = 0) - E_n(n_g = 1/2)|$, is commonly called the charge dispersion. Thus, under charge noise, the exact transition frequencies $\omega_n$ fluctuate, which creates phase errors [77]. Transmons mitigate this by increasing the ratio $E_J/E_C$, which decreases the charge dispersion, as shown in Fig. 6b. However, the charge dispersions of the $|2\rangle$ and $|3\rangle$ states remain at least one and two orders of magnitude larger than that of the $|1\rangle$ state, respectively. As $E_J/E_C$ increases, the absolute value of the anharmonicity decreases (see Fig. 6c), which complicates driving the individual transitions due to leakage into adjacent levels and phase errors. The transmon relies on the fact that the charge dispersion decreases exponentially with $E_J/E_C$ while the anharmonicity is only reduced with a weak power law, making control at high $E_J/E_C$ favorable [54]. Therefore, IBM Quantum devices currently employ transmon qubits with $E_J/E_C \sim 35$-$45$, $\omega_0/(2\pi) \sim 5$ GHz, and $|\alpha_1|/(2\pi) \sim 300$ MHz [55].
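The qualitative statements above can be reproduced by diagonalizing the standard transmon Hamiltonian $4E_C(\hat{n} - n_g)^2 - E_J\cos\hat{\varphi}$ in a truncated charge basis, where $\cos\hat{\varphi}$ couples neighboring charge states with amplitude $1/2$. The sketch below uses NumPy; the function names, the truncation `ncut`, and the example ratio $E_J/E_C = 40$ are assumptions for illustration and do not correspond to a specific device.

```python
import numpy as np

def transmon_levels(EJ, EC, ng, ncut=20, nlevels=4):
    """Lowest eigenenergies of 4 EC (n - ng)^2 - EJ cos(phi) in the charge basis."""
    n = np.arange(-ncut, ncut + 1)
    hopping = np.eye(len(n), k=1) + np.eye(len(n), k=-1)  # cos(phi) = hopping / 2
    H = np.diag(4 * EC * (n - ng) ** 2) - 0.5 * EJ * hopping
    return np.linalg.eigvalsh(H)[:nlevels]

def charge_dispersion(EJ, EC, **kwargs):
    """epsilon_n = |E_n(ng = 0) - E_n(ng = 1/2)| for the lowest levels."""
    return np.abs(transmon_levels(EJ, EC, 0.0, **kwargs)
                  - transmon_levels(EJ, EC, 0.5, **kwargs))

EC, EJ = 1.0, 40.0                     # energies in units of EC
E = transmon_levels(EJ, EC, ng=0.0)
omega = np.diff(E)                     # omega_n = E_{n+1} - E_n
alpha_1 = omega[1] - omega[0]          # anharmonicity, negative for a transmon
eps = charge_dispersion(EJ, EC)        # grows rapidly with the level index
```

For this ratio the qubit frequency is close to $\sqrt{8 E_J E_C} - E_C$, the anharmonicity is close to $-E_C$, and the charge dispersion increases by roughly an order of magnitude or more per level.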

Decay of higher excited states
Sufficient coherence of all involved states is required to perform qudit operations acting on higher-excited states. Here, we experimentally probe the $T_1$ times of the four lowest levels of a transmon in IBM Quantum hardware. We prepare the state $|3\rangle$ by a ladder sequence of $\pi$-pulses $X_{0\leftrightarrow 1}$, $X_{1\leftrightarrow 2}$, and $X_{2\leftrightarrow 3}$. The system is left to decay for a time $t$ prior to a projective measurement which extracts the populations $p(t)$ from 1000 measurements at each time step, see Fig. 7.
The fit in Fig. 7 accurately captures the population of the $|0\rangle$ state but deviates slightly for the other states. We attribute this to the misassignment errors present in the readout stage, which remain substantial even after readout error mitigation, see Sec. III B.

Here, we present a direct measurement of the charge dispersion of the $|3\rangle$ state by performing a Ramsey interference experiment on the $2 \leftrightarrow 3$ transition. The experimental sequence consists of a preparation of the $|2\rangle$ state, followed by a $\pi/2$-pulse around the x-axis of the $2 \leftrightarrow 3$ transition, a delay time $t_{\mathrm{Ramsey}}$, and finally a $-\pi/2$-pulse around the x-axis of the same transition. We measure the signal in the IQ-plane as $t_{\mathrm{Ramsey}}$ is increased. This results in oscillations at the difference between the drive frequency and the true $2 \leftrightarrow 3$ transition frequency, see Fig. 8a. Transforming the signal into Fourier space reveals that the oscillation is a beating between two contributing frequencies $f_1$ and $f_2$, see Fig. 8b. This can be attributed to quasiparticle tunneling across the qubit junction [42,78]. While repeating the Ramsey sequence 50 times, the two frequency components $f_1$ and $f_2$ fluctuate symmetrically around a center frequency $f$ of $\sim 13$ MHz, see Fig. 8c. This represents the average detuning of the applied drive pulses. In total, this data suggests that the true frequency of the $2 \leftrightarrow 3$ transition fluctuates by as much as 15-20 MHz. For this particular qubit with a frequency of $\omega_0/(2\pi) = 5.2$ GHz and an anharmonicity of $\alpha_1/(2\pi) = -340$ MHz, a direct diagonalization of the Hamiltonian from Eq. (B1) predicts a charge dispersion of $\epsilon_3 = 13.9$ MHz. Our measurements are thus in reasonable agreement with theory.
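The extraction of the two beating frequencies from a Ramsey trace can be illustrated with synthetic data. The sketch below generates a noiseless signal with two hypothetical components and locates them with a discrete Fourier transform; the frequencies, sampling step, and function name are assumptions and do not represent the measured data of Fig. 8.

```python
import numpy as np

def ramsey_peaks(signal, dt, n_peaks=2):
    """Frequencies of the n_peaks largest components of a real-valued Ramsey trace."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    strongest = np.argsort(spectrum)[-n_peaks:]
    return np.sort(freqs[strongest])

dt = 1e-9                              # 1 ns sampling step (hypothetical)
t = np.arange(1000) * dt               # delays up to 1 microsecond
f1, f2 = 10e6, 16e6                    # hypothetical beating components in Hz
signal = 0.5 * (np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t))

peaks = ramsey_peaks(signal, dt)
center = peaks.mean()                  # average detuning of the drive
splitting = peaks[1] - peaks[0]        # separation of the two charge-parity branches
```

In this synthetic example the center frequency plays the role of the average detuning, while the splitting reflects how far the two frequency branches lie apart at a given offset charge.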

Experimental POVM pulse sequence
In Sec. III B of the main text, we present an experimental implementation of a single-qubit POVM measurement that consists of the operators given in Eq. (12). Here, we motivate the choice of this POVM and provide further details on the corresponding pulse sequence.
The average $2 \leftrightarrow 3$ transition frequency is difficult to calibrate due to the significant measurement misassignments between the involved states. This renders high-fidelity implementations of virtual $Z_{2\leftrightarrow 3}$-gates problematic, as the necessary phase updates to the drive frames depend on the transition frequency, see App. A 3. We have thus chosen a POVM which does not require $Z_{2\leftrightarrow 3}$-gates. Instead, the qudit-space unitary $U$ that encodes our chosen POVM is built up from the gate sequence given in Eq. (B3). The resulting POVM operators have a simple geometrical interpretation: three of the four operators point along the x-, y-, and z-axis of the Bloch sphere, see Fig. 2a. The pulse sequence that implements the unitary from Eq. (B3) is shown in Fig. 9. The non-trivial phases of the pulses, manifested in non-zero imaginary parts, arise from both the $Z_{1\leftrightarrow 2}$-gate in the sequence as well as from phases acquired during frame changes between different transitions. With the lack of $Z_{2\leftrightarrow 3}$-gates in the sequence, this POVM does not represent the most general case from Eq. (A7). Besides this simplification, it exhibits all features of our proposed scheme, thus constituting a reasonable compromise between practical feasibility on hardware that is not tailored for qudit operation and generality of the proof of principle.

Appendix C: Operational distance
To quantify how closely two POVMs agree, such as the experimentally implemented POVM and the theoretical target POVM, a suitable distance measure is needed. In this work we use the operational distance (OD) [72,73]. For two $M$-outcome POVMs $\Pi = \{\Pi_m\}$ and $\Sigma = \{\Sigma_m\}$ the OD is defined as

$$D_{\mathrm{OD}}(\Pi, \Sigma) = \max_{\rho} \frac{1}{2} \sum_{m=0}^{M-1} \left| \mathrm{Tr}(\Pi_m \rho) - \mathrm{Tr}(\Sigma_m \rho) \right|.$$

The OD is thus the worst-case total variation distance between the probability distributions of measurement outcomes obtained with the two POVMs. Importantly, $0 \leq D_{\mathrm{OD}}(\Pi, \Sigma) \leq 1$, where $D_{\mathrm{OD}}(\Pi, \Sigma) = 0$ if and only if the two POVMs coincide. The OD can be calculated directly from the POVM operators through

$$D_{\mathrm{OD}}(\Pi, \Sigma) = \max_{x \subseteq I} \Big\| \sum_{m \in x} (\Pi_m - \Sigma_m) \Big\|,$$

where $\|\cdot\|$ is the operator norm and $I$ is the set of all outcomes $I = \{0, \dots, M-1\}$.

In an experiment that performs a quantum measurement, the implemented POVM operators can be characterized through quantum detector tomography (QDT) [69,71]. In combination with the better known quantum state tomography (QST) and quantum process tomography (QPT), QDT is required for a full specification of a quantum experiment [70]. In QST an unknown state $\rho$ is estimated from measurements in a known set of reference measurements. Conversely, in QDT the unknown POVM operators are estimated by measuring a known set of reference states [70]. One possible set of such states for single-qubit POVMs are projectors on the six single-qubit stabilizer states $\{|0\rangle, |1\rangle, |+\rangle, |-\rangle, |{+i}\rangle, |{-i}\rangle\}$, which are the eigenstates of $\sigma_z$, $\sigma_x$, and $\sigma_y$, respectively. This is a convenient choice since the initialization in $|0\rangle$ and subsequent single-qubit rotations to either of these states can be implemented with high fidelity on existing quantum processors. The POVM measurement is carried out on each such reference state $|\psi_j\rangle$, sampling from the probability distributions $p^{(j)}_m = \mathrm{Tr}(\Pi_m |\psi_j\rangle\langle\psi_j|)$. Let $n^{(j)}_m$ be the number of times outcome $m \in \{0, 1, 2, 3\}$ is recorded for initial state $|\psi_j\rangle$. One way to obtain an estimator for the underlying single-qubit POVM operators is to invert the system of linear equations that relates the observed frequencies to $\mathrm{Tr}(\Pi_m |\psi_j\rangle\langle\psi_j|)$ to obtain the entries of $\Pi_m$. This approach suffers from the fact that the obtained POVM operators might be non-physical, as they are not necessarily positive.
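The subset formula for the operational distance introduced at the beginning of this appendix lends itself to a direct numerical evaluation, since a four-outcome POVM has only 14 non-trivial outcome subsets. The sketch below is a brute-force implementation with NumPy; the function name and the example POVMs are chosen for illustration.

```python
import itertools
import numpy as np

def operational_distance(povm_a, povm_b):
    """Maximum over outcome subsets of the operator norm of the summed differences."""
    diffs = [np.asarray(a, dtype=complex) - np.asarray(b, dtype=complex)
             for a, b in zip(povm_a, povm_b)]
    best = 0.0
    for size in range(1, len(diffs)):               # empty and full subsets give zero
        for subset in itertools.combinations(range(len(diffs)), size):
            total = sum(diffs[m] for m in subset)
            best = max(best, np.linalg.norm(total, ord=2))  # largest singular value
    return best

# Projective examples: computational basis, relabeled basis, and x-basis.
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
Pp = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])
Pm = 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])

d_same = operational_distance([P0, P1], [P0, P1])
d_swapped = operational_distance([P0, P1], [P1, P0])
d_xbasis = operational_distance([P0, P1], [Pp, Pm])
```

The relabeled measurement is maximally distinguishable from the original one, while the x-basis measurement yields the intermediate value $1/\sqrt{2}$.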
An analogous issue exists for QST through linear inversion of Eq. (D1) [79]. Positivity can be enforced with a maximum-likelihood (ML) estimation by maximizing the likelihood functional under the constraint that the operators $\Pi_m$ form a valid POVM [71]. As laid out in Ref. [80], the optimization can be performed with an iterative algorithm that converges to the ML estimator. This procedure has recently been demonstrated experimentally on IBM hardware as a means to mitigate readout noise [72,81]. In this work, we make use of ML quantum detector tomography to reconstruct the implemented POVM operators both for the verification of the experimental proof-of-principle in Sec. III B and for the simulations of our error mitigation scheme in Sec. IV A.

The average transition frequency is $\int_0^1 p(n_g)\, \omega_n(n_g)\, dn_g$, where $p(n_g)$ is the probability distribution of the offset charge. The pulse amplitudes are chosen such that the resulting rotation angles for the average transition frequency are $\pi/2$. We model the quantum dynamics of a state $\rho$ under a pulse sequence by an effective channel

$$\mathcal{E}: \rho \longrightarrow \int_0^1 p(n_g)\, U(n_g)\, \rho\, U(n_g)^\dagger\, dn_g, \tag{E1}$$

where $U(n_g)$ are the unitary dynamics for a fixed offset charge $n_g$. We obtain $U(n_g)$ under a sequence of pulses with an integrator of the time-dependent Schrödinger equation provided by QuTiP [82]. The channel $\mathcal{E}$ is numerically approximated by computing $U(n_g)$ for 20 values of $n_g$ equally spaced between 0 and 1. To calibrate $\sqrt{X}$-pulses in the simulation, we keep the amplitude fixed while varying the duration of the pulses. The target unitary of such a pulse is given by the implemented rotation $U_{\mathrm{tar}} = R(\varphi = \pi/2, \gamma = 0)$, defined in Eq. (9), which includes phases that are accumulated in the idle levels. As a figure of merit, we compute the average gate fidelity $F(\mathcal{E}, U_{\mathrm{tar}})$ between the target unitary and the channel of the simulated unitary under charge noise [83]. We hereby restrict the computation of $F(\mathcal{E}, U_{\mathrm{tar}})$ to the subspace that is relevant for the POVM pulse sequence.
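The averaging over the offset charge and the resulting average gate fidelity can be illustrated with a toy model. In the sketch below, the channel is approximated by a uniform mixture of unitaries on a grid of $n_g$ values, and the fidelity is evaluated with the standard Kraus-operator expression $F = (\sum_k |\mathrm{Tr}(U_{\mathrm{tar}}^\dagger K_k)|^2 + d) / (d(d+1))$. The two-level idle model with a frequency shift $(\epsilon/2)\cos(2\pi n_g)$, the uniform $p(n_g)$, the grid without the endpoint, and all names are assumptions; the simulations described above instead integrate the full driven transmon dynamics.

```python
import numpy as np

def charge_noise_kraus(unitary_of_ng, n_samples=20):
    """Kraus operators of a uniform mixture of U(ng) over a grid of offset charges."""
    grid = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    return [unitary_of_ng(ng) / np.sqrt(n_samples) for ng in grid]

def average_gate_fidelity(kraus_ops, U_target):
    """Average gate fidelity between a channel (Kraus form) and a target unitary."""
    d = U_target.shape[0]
    overlap = sum(abs(np.trace(U_target.conj().T @ K)) ** 2 for K in kraus_ops)
    return (overlap + d) / (d * (d + 1))

def idle_unitary(duration, epsilon):
    """Toy model: one idle level pair dephased by a charge-dependent frequency shift."""
    def unitary_of_ng(ng):
        delta = 0.5 * epsilon * np.cos(2 * np.pi * ng)
        return np.diag([1.0, np.exp(-1j * delta * duration)])
    return unitary_of_ng

epsilon = 2 * np.pi * 1e-3             # hypothetical 1 MHz dispersion in rad/ns
target = np.eye(2, dtype=complex)
fidelities = [
    average_gate_fidelity(charge_noise_kraus(idle_unitary(T, epsilon)), target)
    for T in (0.0, 10.0, 50.0)         # durations in ns
]
```

Even in this simplified picture the fidelity decreases monotonically with the duration, which reflects the phase uncertainty that idle levels accumulate under charge noise.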
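Recall that the POVM-encoding unitary is always realized with pulses that couple adjacent levels in the order $0 \leftrightarrow 1$, $1 \leftrightarrow 2$, $0 \leftrightarrow 1$, $2 \leftrightarrow 3$, and finally $1 \leftrightarrow 2$. Since the $|3\rangle$ state is only populated once prior to measurement, the phases acquired by $|3\rangle$ during a $0 \leftrightarrow 1$ and $1 \leftrightarrow 2$ gate do not affect the encoded POVM operators. The fidelities of $\sqrt{X}_{0\leftrightarrow 1}$ and $\sqrt{X}_{1\leftrightarrow 2}$ are thus only computed over the subspace spanned by $|0\rangle$, $|1\rangle$, and $|2\rangle$. Similarly, only the subspace spanned by $|1\rangle$, $|2\rangle$, and $|3\rangle$ is considered for the fidelities of $\sqrt{X}_{2\leftrightarrow 3}$ since no $0 \leftrightarrow 1$ pulses are applied after the $2 \leftrightarrow 3$ pulse.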
The average gate fidelities for different hardware parameters as a function of the pulse duration are shown in Fig. 10. For short durations, the broad spectral range of the pulse leads to leakage errors. In contrast, for long pulse durations, the phases accumulated over time by the idle levels become difficult to track due to charge noise. The infidelities $1 - F(\sqrt{X})$ thus typically show a distinct minimum where these two effects are traded off optimally. As $E_J/E_C$ increases, this optimum shifts towards longer gate durations, see Fig. 10d. For reference, the default single-qubit SX-gate in current IBM Quantum hardware is carried out with DRAG pulses of a duration of 36 ns. We find that it is important to employ much shorter pulses when including the phase uncertainty of a neighboring state, despite our use of simple Gaussian pulse envelopes, which are not specifically designed to correct for leakage errors (especially for the $\sqrt{X}_{2\leftrightarrow 3}$-gate). This suggests that phase uncertainties in higher excited states have an overall bigger impact on the qudit gate fidelities than leakage. The remaining leakage errors could be further reduced by a careful calibration of DRAG pulses. For current hardware ($E_J/E_C \sim 35$-$45$), our simulations suggest achievable gate fidelities in the relevant qudit spaces that reach up to 99.9% for the $\sqrt{X}_{0\leftrightarrow 1}$- and $\sqrt{X}_{1\leftrightarrow 2}$-gates, and 98% for $\sqrt{X}_{2\leftrightarrow 3}$. This can be improved by over an order of magnitude by tuning deeper into the transmon regime (e.g., $E_J/E_C \sim 60$), at the expense of increased gate durations.
For our simulation of the full POVM pulse sequences in Secs. III C and IV, we employ the durations of the $\sqrt{X}_{n\leftrightarrow n+1}$-pulses that maximize their respective fidelities. When limiting the total duration of the sequence as in Fig. 3a, we incrementally shorten those pulses whose fidelity is affected the least. This is repeated until a pulse sequence is obtained which is at most as long as the desired total length. From the implemented channel $\mathcal{E}$ of the pulse sequence, we finally obtain an effective POVM $\Pi_{\mathrm{sim}}$ as the average over the POVM operators encoded by the unitaries $U(n_g)$.