Self-consistent quantum measurement tomography based on semidefinite programming

We propose an estimation method for quantum measurement tomography (QMT) based on semidefinite programming (SDP) and discuss how it may be employed to detect experimental imperfections, such as shot noise and/or faulty preparation of the input states on near-term quantum computers. Moreover, if the positive operator-valued measure (POVM) we aim to characterize is informationally complete, we put forward a method for self-consistent tomography, i.e., for recovering a set of input states and POVM effects that is consistent with the experimental outcomes and does not assume any a priori knowledge about the input states of the tomography. Contrary to many methods that have been discussed in the literature, our approach does not rely on additional assumptions such as low noise or the existence of a reliable subset of input states.


I. INTRODUCTION
Quantum measurement tomography (QMT) may be defined as the complete characterization of a measurement performed on a quantum system by reconstructing the corresponding positive operator-valued measure (POVM) [1]. This technique is of crucial importance, for instance, for monitoring the properties of near-term quantum computers and for recovering information on the quantum state at a given step of a quantum algorithm.
For example, in algorithms such as the variational quantum eigensolver (VQE), one needs to evaluate expectation values of operators on a trial state. The expectation value is obtained through repeated measurements of the trial state. Since the physical measurement process is generally faulty and the realised measurement operator may differ significantly from the idealised one, one typically obtains biased estimates, jeopardizing the convergence of the algorithm.
While, ideally, measurement tomography returns the exact POVM associated with the measurement apparatus, in real experiments we usually have to deal with shot noise and imperfect preparation of the tomographic input states. This is why fitting methods, such as maximum likelihood estimation [15], are typically employed to obtain a set of physical POVM effects from the finite data obtained in the tomographic experiment.
In QMT one needs to assume knowledge of the set of states used in the experiment. Any mismatch between the truly prepared states and their description used in QMT might lead to significant errors in the tomography. A solution to this problem is called self-consistent tomography, and consists of a tomographic procedure where the prepared states and measurement operators are not assumed a priori. Several studies on self-consistent tomography have been presented during the past ten years [16-29].
In this work, we put forward a fitting method for QMT based on a semidefinite program (SDP), a class of convex optimisation problems which can be solved with very efficient numerical methods [30-32]. We remark that SDPs have already been proposed for quantum tomography, for instance for state tomography with incomplete data [33, 34] and for regularization and optimization in detector tomography [35]. Here, we show how two specific SDPs can be employed for noise detection in QMT and self-consistent tomography.
The paper is structured as follows. In Sec. II we briefly introduce the concept of quantum measurement tomography. Our SDP-based approach is described in Sec. III, and in Sec. IV we propose a self-consistent tomography method based on a sequence of SDPs. Sec. V presents numerical simulations of QMT experiments that demonstrate how the SDP method can be used to diagnose errors in QMT and how the self-consistent method can improve the estimation accuracy. Finally, we draw some concluding remarks in Sec. VI.

II. QUANTUM MEASUREMENT TOMOGRAPHY
Let us now formalize the concept of quantum measurement tomography, which aims to characterize the measurement we may perform on a quantum system as a positive operator-valued measure (POVM) [1]. We point out that the term "quantum detector tomography" is also commonly employed in the literature [2, 3, 36-40]. In this work, we prefer "measurement tomography" because we are discussing the characterization of a generic POVM that may arise in a plethora of different physical situations, which may not involve proper detectors.
Let us suppose that the measurement we are interested in has $m$ different outcomes. According to quantum mechanics, each outcome can be associated with an operator $\Pi_k$ (also called an effect) satisfying the following properties:

$$\Pi_k \geq 0 \;\; \forall k, \qquad \sum_{k=1}^{m} \Pi_k = \mathbb{I}, \tag{1}$$

where $\mathbb{I}$ denotes the identity operator. The set of effects $\{\Pi_k\}_{k=1}^m$ fully characterises the measurement, since the outcome probabilities for any quantum state $\rho$ can be computed according to the Born rule $p_k = \mathrm{Tr}(\rho \Pi_k)$. Thus, given an uncharacterised measurement apparatus, the goal of QMT is to provide a description of its measurement effects $\{\Pi_k\}_{k=1}^m$. The standard way to perform QMT is to prepare a tomographically complete set of states $\{\rho_j\}_{j=1}^N$ [15, 41, 42] and measure them with the uncharacterised measurement. To be tomographically complete, the set must contain at least $d^2$ linearly independent states, where $d$ is the dimension of the system Hilbert space. If this is the case, by knowing the outcome probabilities $p_{jk} = \mathrm{Tr}[\rho_j \Pi_k]$ we can solve the tomographic inverse problem [43, 44] and obtain each effect $\Pi_k$ through linear inversion.
However, in real experiments we never know the probabilities $p_{jk}$ exactly. This is because we can only perform a finite total number $n_S$ of measurement shots, which allows us to estimate the frequencies

$$f_{jk} = \frac{C_{jk}}{n_S / N}, \tag{2}$$

where $C_{jk}$ is the number of times we have obtained the $k$th outcome when measuring the $j$th state and $n_S/N$ is the number of times we prepare each state. These frequencies are just an approximation of the true probabilities, as $\lim_{n_S \to \infty} f_{jk} = p_{jk}$. As a consequence, if we apply standard linear inversion starting from $\{f_{jk}\}$, we may obtain non-physical effects $\{\Pi_k\}_{k=1}^m$ (i.e., they may not all be positive) due to finite statistics [15]. Therefore, fitting methods are employed to reconstruct the best physical estimate of the set $\{\Pi_k\}_{k=1}^m$ starting from the initial data, the most common being maximum likelihood estimation [15]. In the next section, we put forward an alternative fitting method based on semidefinite programming.
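
As a minimal sketch of the linear-inversion step, and of how finite statistics can break positivity, the following NumPy snippet reconstructs a two-outcome qubit POVM from a small table of counts. The input states, the counts and the helper name `linear_inversion` are illustrative assumptions and are not taken from the experiments discussed in this paper.

```python
import numpy as np

def linear_inversion(states, probs):
    """Solve Tr(rho_j Pi_k) = probs[j, k] for the effects Pi_k (least squares)."""
    d = states[0].shape[0]
    # Row j is vec(rho_j^T), so that A @ vec(Pi_k) = Tr(rho_j Pi_k).
    A = np.array([rho.T.reshape(-1) for rho in states])
    X = np.linalg.lstsq(A, np.asarray(probs, dtype=complex), rcond=None)[0]
    return [X[:, k].reshape(d, d) for k in range(X.shape[1])]

# Four linearly independent qubit states: |0>, |1>, |+>, |+i> (assumed inputs).
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
kets = [ket0, ket1, (ket0 + ket1) / np.sqrt(2), (ket0 + 1j * ket1) / np.sqrt(2)]
states = [np.outer(k, k.conj()) for k in kets]

# Hypothetical counts C_jk for a computational-basis measurement, 10 shots per state.
counts = np.array([[10, 0], [0, 10], [6, 4], [5, 5]])
n_S, N = counts.sum(), len(states)
freqs = counts / (n_S / N)  # f_jk = C_jk / (n_S / N)

effects = linear_inversion(states, freqs)
# Smallest eigenvalue over all reconstructed effects (Hermitian part only).
min_eig = min(np.linalg.eigvalsh((E + E.conj().T) / 2).min() for E in effects)
print("smallest eigenvalue:", min_eig)
```

In this toy dataset the frequencies of the third state deviate from 1/2 by 0.1, and the reconstructed effects acquire a small negative eigenvalue, which is the non-physical behaviour that fitting methods are meant to remove.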

III. SEMIDEFINITE PROGRAMS FOR MEASUREMENT TOMOGRAPHY
Since standard linear inversion may not yield a physical set of effects, we need to have ways of providing sensible (i.e., physical) estimates of $\{\Pi_k\}_{k=1}^m$ given $\{f_{jk}\}$ and $\{\rho_j\}_{j=1}^N$. A widely used method is given by maximum-likelihood estimation (MLE) [15], which we briefly review in Appendix A. Our goal here is to propose an alternative method that can be computed via semidefinite programming, for which efficient algorithms exist.
In this section we solve QMT through the following optimisation problem:

$$\min_{\{\Pi_k\}} \; \| f - q \| \quad \text{s.t.} \quad \Pi_k \geq 0 \;\; \forall k, \quad \sum_{k=1}^{m} \Pi_k = \mathbb{I}, \tag{3}$$

where $f$ is the vector of frequencies $f_{jk}$ and $q$ is a vector with components $\mathrm{Tr}(\rho_j \Pi_k)$, while $\|x\|$ is some norm of the vector $x$. It can be easily shown that the problem above can be written as a simple SDP [30, 31] for suitable choices of the norm. In this paper, we will focus on the infinity norm $\|x\|_\infty$ and the 1-norm $\|x\|_1$. Each norm introduces a different distance between the experimental probabilities $f$ and the quantum probabilities $q$ reconstructed according to the Born rule in Eq. (3). The infinity norm simply captures the maximal distance between corresponding elements of the two vectors of probabilities. The 1-norm corresponds (up to the conventional factor 1/2) to the total variation distance between $f$ and $q$. Following the standard interpretation of the total variation distance in classical probability theory, this measure can be employed to compute the success probability of distinguishing between the two different statistics in a single-shot discrimination task [1, 45]. In our case, since $f$ and $q$ contain several probability distributions (one for each input state), the total variation distance between these vectors divided by the number of input states can be understood as the success probability of distinguishing between experimental and reconstructed statistics averaged over the different input states.

Let us first choose the infinity norm, for which (3) becomes

$$\min_{\{\Pi_k\}} \; \max_{j,k} \left| f_{jk} - \mathrm{Tr}(\rho_j \Pi_k) \right| \quad \text{s.t.} \quad \Pi_k \geq 0 \;\; \forall k, \quad \sum_{k=1}^{m} \Pi_k = \mathbb{I}. \tag{4}$$

At first sight this seems to be a min-max problem over a non-linear objective function (because of the norm), but one can easily transform it into an SDP by noticing that the biggest absolute value of an entry of a vector $x$ is the minimum value of $\delta \geq 0$ such that $-\delta \mathbf{1} \leq x \leq \delta \mathbf{1}$, where $\mathbf{1} = (1, \dots, 1)^T$. Thus, the optimisation problem (4) can be rewritten as

$$\min_{\delta, \{\Pi_k\}} \; \delta \quad \text{s.t.} \quad f_{jk} - \delta \leq \mathrm{Tr}(\rho_j \Pi_k) \leq f_{jk} + \delta \;\; \forall j, k, \quad \Pi_k \geq 0 \;\; \forall k, \quad \sum_{k=1}^{m} \Pi_k = \mathbb{I}. \tag{5}$$

Now the problem involves just the minimisation of a single parameter $\delta$ (this is the reason we call it the single-delta SDP). Once an instance of this SDP is solved, we have both the solution $\delta^*$ and the effects $\{\Pi^*_k\}$ that satisfy all the constraints (i.e., define a valid POVM). Moreover, the SDP has a very neat interpretation: $\delta$ can be seen as a perturbation to the frequencies $f_{jk}$, so that the solution $\delta^*$ of the SDP quantifies the minimum amount of perturbation we need to add to the frequencies so that they have a quantum realisation. For instance, if $\delta^* = 0$, no perturbation is needed and we can find effects $\{\Pi_k\}$ such that $\mathrm{Tr}(\rho_j \Pi_k) = f_{jk}$ $\forall j, k$.

The other norm that we analyse is the 1-norm, for which (3) becomes

$$\min_{\{\Pi_k\}} \; \sum_{j,k} \left| f_{jk} - \mathrm{Tr}(\rho_j \Pi_k) \right| \quad \text{s.t.} \quad \Pi_k \geq 0 \;\; \forall k, \quad \sum_{k=1}^{m} \Pi_k = \mathbb{I}. \tag{6}$$

We can use the same reasoning as before and rewrite this optimisation problem as

$$\min_{\{\delta_{jk}\}, \{\Pi_k\}} \; \sum_{j,k} \delta_{jk} \quad \text{s.t.} \quad f_{jk} - \delta_{jk} \leq \mathrm{Tr}(\rho_j \Pi_k) \leq f_{jk} + \delta_{jk} \;\; \forall j, k, \quad \Pi_k \geq 0 \;\; \forall k, \quad \sum_{k=1}^{m} \Pi_k = \mathbb{I}. \tag{7}$$

Notice that now we have added one perturbation $\delta_{jk}$ to each $f_{jk}$, which implies having more variables than the SDP (5) (we therefore call it the many-deltas SDP). At the same time, we now obtain more fine-tuned information about which frequencies need to be perturbed more to have a physical description. As we will see later, this SDP will be particularly useful to detect errors in the preparation of specific input states. As stated above, the 1-norm introduces the total variation distance between the experimental and reconstructed probability distributions. Thus, the solution of Eq. (7) can be simply seen as a quantifier of how well we can distinguish the observed statistics from a truly quantum one [1, 45].
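
To illustrate the role of the perturbations in the single-delta and many-deltas formulations, the sketch below evaluates, for a fixed candidate POVM, the smallest $\delta$ and $\delta_{jk}$ that satisfy the inequality constraints. It does not solve the SDPs themselves (which additionally optimise over the effects, so the returned values are only upper bounds on the optimal ones), and the function names, states and frequencies are assumptions made for the example.

```python
import numpy as np

def born_probabilities(states, effects):
    """Matrix q_jk = Tr(rho_j Pi_k)."""
    return np.array([[np.trace(rho @ E).real for E in effects] for rho in states])

def is_valid_povm(effects, tol=1e-9):
    """Check positivity of each effect and completeness of the set."""
    d = effects[0].shape[0]
    positive = all(np.linalg.eigvalsh(E).min() >= -tol for E in effects)
    complete = np.allclose(sum(effects), np.eye(d), atol=tol)
    return positive and complete

def feasible_deltas(freqs, states, effects):
    """Smallest perturbations for which a FIXED POVM satisfies the constraints.

    Returns (delta, delta_jk): delta = max_jk |f_jk - q_jk| bounds the
    single-delta optimum from above, and sum(delta_jk) bounds the
    many-deltas optimum from above.
    """
    residual = np.abs(freqs - born_probabilities(states, effects))
    return residual.max(), residual

# Candidate POVM: ideal computational-basis measurement on one qubit.
effects = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
kets = [ket0, ket1, (ket0 + ket1) / np.sqrt(2), (ket0 + 1j * ket1) / np.sqrt(2)]
states = [np.outer(k, k.conj()) for k in kets]
freqs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.4], [0.5, 0.5]])  # hypothetical data

delta, delta_jk = feasible_deltas(freqs, states, effects)
print(delta, delta_jk.sum())
```

The matrix `delta_jk` already hints at the diagnostic use of the many-deltas SDP: only the row of the third input state is non-zero in this toy dataset.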
In Sec. V we will employ these SDPs for numerical simulations of different QMT experiments. Our numerics were run using the MOSEK solver [46] with CVXPY [47, 48]. The code we have developed can be found in [49] and can be easily employed for reproducing the results of this paper and/or for analyzing different quantum tomographic experiments.
We point out that our aim is not to employ SDPs to improve the performance of QMT (e.g., better runtime with respect to previous methods), but to gain a better understanding of noise in QMT and to enable self-consistent estimation based on the see-saw method, which we describe in the next section. This being said, characterizing the technical performance of the SDPs introduced in this section is also important. We have observed that, using the MOSEK solver and CVXPY, the performance of the SDPs is comparable with that of the widely used log-MLE fitting method [15]. We refer the reader to Appendix B for a comparison between these two approaches.

IV. SEE-SAW METHOD FOR SELF-CONSISTENT TOMOGRAPHY
A known issue of QMT (and of quantum tomographic experiments in general) is the fact that, as discussed in Sec. II, we assume perfect knowledge of the set of input states $\{\rho_j\}_{j=1}^N$ we employ to characterize the POVM effects. In real experimental conditions this is hardly the case, as different types of noise affect the states we prepare, and this can significantly jeopardize the final estimation of a tomographic experiment [18, 28]. In general terms, undesired noise in the input state preparation and/or in the measurement strategy (e.g., for state tomography) is referred to as SPAM (state preparation and measurement) errors. Different strategies have been proposed to avoid these errors, and here we put forward a new one based on the SDPs introduced in Sec. III.
Naively, if we do not know the set of input states $\{\rho_j\}_{j=1}^N$ precisely, we may try to solve (3) treating both the set of states $\{\rho_j\}_{j=1}^N$ and the set of effects $\{\Pi_k\}_{k=1}^m$ as variables. However, this problem becomes non-convex, which makes it difficult to find an efficient solution to it. To cope with this issue, many different solutions have been proposed in the literature. These include assuming small gate errors to linearize the problem for gate set tomography [18], assuming that states and measurements are globally completable to rewrite the problem as an SDP [20], assuming that there is a subset of known input states [23], using self-testing techniques [50] to perform self-consistent tomography in a photonic setup [24], assuming that there is a set of noiseless unitary gates that we can apply on the input states [25], relying on randomized compiling [51] and assumptions on the gates we can apply during the tomographic procedure [26], or considering this minimisation task applied to measurement of superconducting qubits as a bilevel problem [27].
In this section, we propose to perform self-consistent measurement tomography through a see-saw approach, in which we switch from a quantum measurement tomography to a quantum state tomography iteratively until the solution converges. Notice that since we seek to estimate both the measurement and the set of states implemented, we need to use not only an informationally complete set of states, but also an informationally complete (IC) POVM [52, 53]. Similarly to IC states, IC-POVMs are defined as POVMs whose effects form a (Hermitian) basis in the space of bounded operators on the system Hilbert space $B(\mathcal{H})$. Therefore, if the Hilbert space $\mathcal{H}$ has dimension $d$, a POVM must have at least $d^2$ linearly independent effects to be informationally complete. These POVMs can be employed to acquire the most general information about the state of the system, since they can be used to reconstruct the density matrix of the quantum system via quantum state tomography (QST) [1].
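
The requirement of $d^2$ linearly independent effects can be checked numerically through the rank of the matrix of vectorised effects. The following is a minimal sketch of such a check; the helper name `is_informationally_complete` and the two example POVMs are assumptions chosen for illustration.

```python
import numpy as np

def is_informationally_complete(effects, tol=1e-9):
    """True if the effects span the full d^2-dimensional operator space."""
    d = effects[0].shape[0]
    # Each row is one vectorised effect; the rank counts independent effects.
    M = np.array([E.reshape(-1) for E in effects])
    return np.linalg.matrix_rank(M, tol=tol) == d * d

I2 = np.eye(2, dtype=complex)
PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# Computational-basis measurement: only 2 independent effects, not IC.
basis_povm = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]
# Six-outcome POVM built from the Pauli eigenprojectors, each weighted by 1/3.
pauli6_povm = [(I2 + s * P) / 6 for P in PAULIS for s in (+1, -1)]

print(is_informationally_complete(basis_povm))   # not IC
print(is_informationally_complete(pauli6_povm))  # IC (overcomplete)
```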

A. Defining the see-saw procedure
Let us now describe how the see-saw method works using the single-delta SDP (one could similarly use the many-deltas SDP). Initially we perform standard QMT of the POVM we aim to characterize with a chosen set of input states $\{\rho_j^{(0)}\}_{j=1}^N$ (this might be, for instance, our best guess for the set of states used in the experiment). The experimental frequency matrix $f_{jk}$ and the set $\{\rho_j^{(0)}\}_{j=1}^N$ will be the input parameters of the SDP problem in (4), as usual. The output of the SDP will be a set of effects, say $\{\Pi_k^{(0)}\}_{k=1}^m$, and the value of $\delta^{(0)}$ according to (5). Due to noise in the input state preparation (and, additionally, to shot noise), $\delta^{(0)}$ will be different from zero, as will be discussed in Sec. V.
Then, we will run another SDP whose input parameters are the frequency matrix $f_{jk}$ and the set of output effects $\{\Pi_k^{(0)}\}_{k=1}^m$, while the output variables will be a new set of states $\{\rho_j^{(1)}\}_{j=1}^N$ and a new $\delta^{(1)}$. That is, we will perform quantum state tomography for the whole set of input states, using as the "known" measurement device the POVM returned by the first SDP.
The new SDP for QST of the set of input states can be written as:

$$\min_{\delta, \{\rho_j\}} \; \delta \quad \text{s.t.} \quad f_{jk} - \delta \leq \mathrm{Tr}(\rho_j \Pi_k^{(0)}) \leq f_{jk} + \delta \;\; \forall j, k, \quad \rho_j \geq 0, \;\; \mathrm{Tr}\,\rho_j = 1 \;\; \forall j. \tag{8}$$

Clearly, $\delta^{(1)} \leq \delta^{(0)}$, since the initial states $\{\rho_j^{(0)}\}$ are a feasible point of (8) with $\delta = \delta^{(0)}$. We can therefore repeat this procedure many times, alternating the SDP for QMT and the SDP for QST. Since the overall optimisation problem is not convex, there is no guarantee that the see-saw method will converge towards the optimum. However, we have tested see-saw numerically and observed that, in many scenarios, we can obtain very low values of $\delta$. If after the $l$th iteration we find $\delta^{(l)} \approx 0$, then we know that $\{\rho_j^{(l-1)}\}_{j=1}^N$ and $\{\Pi_k^{(l)}\}_{k=1}^m$ (assuming that the last iteration was a QMT step) are a pair of sets of states and effects that is compatible with the measurement statistics. More specifically, the criterion we adopt to stop the see-saw procedure is: interrupt see-saw after the $s$th step if

$$\left| \delta^{(s)} - \delta^{(s-1)} \right| \leq \nu_\delta, \tag{9}$$

where $\nu_\delta$ is a small number we suitably choose. Then, after the $s$th step, we will have a set of input states and POVM effects that match the experimental frequency matrix $f_{jk}$ up to the precision given by $\delta^{(s)}$. We must stress, however, that this pair is not unique. There is in general a gauge transformation [28] that can be applied to the states and the effects and preserves their mathematical (and physical) properties and conserves the probabilities $\mathrm{Tr}[\rho_j \Pi_k]$. This gauge freedom is a well-known issue of self-consistent tomography and gate set tomography, and different optimisation methods have been devised to choose a suitable gauge [28] (e.g., the one that minimises the distance between the final set of input states and the initial guess).
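
The alternation between the two SDPs can be summarised by the following schematic loop. This is only a sketch under stated assumptions: `solve_qmt` and `solve_qst` are hypothetical user-supplied callables wrapping the QMT and QST SDPs (each returning the optimal delta and the optimised variables), and the stopping rule is assumed here to be a threshold `nu_delta` on the change of delta between consecutive steps.

```python
def see_saw(freqs, states, solve_qmt, solve_qst, nu_delta=1e-6, max_steps=100):
    """Alternate QMT and QST solves until delta stops improving.

    solve_qmt(freqs, states)  -> (delta, effects)   # hypothetical wrapper of the QMT SDP
    solve_qst(freqs, effects) -> (delta, states)    # hypothetical wrapper of the QST SDP
    Returns the last states, the last effects and the history of delta values.
    """
    history = []
    effects = None
    for step in range(max_steps):
        if step % 2 == 0:
            # Even steps: states are fixed, effects are optimised (QMT).
            delta, effects = solve_qmt(freqs, states)
        else:
            # Odd steps: effects are fixed, states are optimised (QST).
            delta, states = solve_qst(freqs, effects)
        history.append(delta)
        # Assumed stopping rule: negligible change of delta between steps.
        if len(history) > 1 and abs(history[-2] - history[-1]) <= nu_delta:
            break
    return states, effects, history
```

Passing the solvers as arguments keeps the loop independent of the specific SDP (single-delta or many-deltas) and of the numerical backend.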

B. Finite-shot effects in see-saw
The see-saw method we have just described searches for a set of input states and a set of POVM effects that better match the experimental frequencies. While the main goal is to overcome mismatches between the real and assumed set of input states, the see-saw method ends up also taking into consideration finite-statistics effects. Indeed, as we will observe in Sec. V A, shot noise alone in a simple QMT experiment will lead to a value of $\delta^*$ in the single-delta SDP given by Eq. (5) that is different from zero. The see-saw will then try to decrease $\delta^*$ by searching for another set of input states even if our initial guess for the input states is perfectly correct.
In what follows we will describe a simple method that can be used to mitigate this effect. More specifically, we randomly divide the dataset of the outcomes of the QMT experiment into two subsets of the same size, say $A$ and $B$. If we use $n_S$ shots in the QMT experiment, then each subset will be obtained with $n_S/2$ shots only. Then, for each subset we will have new experimental frequencies $f^{(A)}$ and $f^{(B)}$, both obtained with half the number of shots of the full dataset. We can now estimate the infinity-norm distance between the experimental frequencies of the two subsets as

$$d = \left\| f^{(A)} - f^{(B)} \right\|_\infty, \tag{10}$$

where we have chosen the infinity norm because we are considering the see-saw method based on the single-delta SDP (the 1-norm may be chosen accordingly if we are employing the many-deltas SDP). We then repeat this calculation many times for randomly chosen partitions and estimate the average distance $d_{\mathrm{CV}}$ among all partitions. The value $d_{\mathrm{CV}}$ is a heuristic measure of the fluctuations (in the infinity norm) of the experimental frequencies $f$ due to shot noise. If $n_S \to \infty$, then $d_{\mathrm{CV}} \to 0$. Therefore, we can stop see-saw when the infinity-norm distance between the reconstructed quantum probabilities $q$ and the experimental probabilities $f$ (that is, the quantity $\delta^{(s)}$ introduced in the previous section) is of the order of $d_{\mathrm{CV}}$, as the mismatch between $q$ and $f$ may be caused by shot noise alone. More specifically, we interrupt see-saw either if Eq. (9) is satisfied or if the final $\delta^{(s)}$ value of see-saw is such that

$$\delta^{(s)} \leq \frac{1}{2} d_{\mathrm{CV}}. \tag{11}$$

The 1/2 factor has been inserted because $d_{\mathrm{CV}}$ is estimated through half of the shots that see-saw uses. We heuristically choose $1/2$ instead of $1/\sqrt{2}$ (which is the shot-noise scaling factor of $\delta^*$ returned by the single-delta SDP as a function of $n_S$, as we will observe in Sec. V A) to be more conservative on when to stop see-saw.
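
The cross-validation estimate described above can be sketched as follows. The data layout (one row of recorded outcome labels per input state), the per-state splitting into halves and all function names are assumptions made for this example; the actual implementation accompanying the paper may organise the data differently.

```python
import numpy as np

def cross_validation_distance(outcomes, n_outcomes, n_partitions=200, rng=None):
    """Average infinity-norm distance between frequencies of two random halves.

    outcomes: integer array of shape (N, shots_per_state), where outcomes[j, i]
    is the outcome label (0 .. n_outcomes-1) of the i-th shot on state j.
    """
    rng = np.random.default_rng(rng)
    half = outcomes.shape[1] // 2
    distances = []
    for _ in range(n_partitions):
        # Shuffle the shots of each state independently, then cut in two halves.
        shuffled = rng.permuted(outcomes, axis=1)
        f_a = np.array([np.bincount(r[:half], minlength=n_outcomes) for r in shuffled]) / half
        f_b = np.array([np.bincount(r[half:2 * half], minlength=n_outcomes) for r in shuffled]) / half
        distances.append(np.abs(f_a - f_b).max())
    return float(np.mean(distances))

def stop_for_shot_noise(delta_s, d_cv):
    """Shot-noise stopping rule: delta^(s) <= d_CV / 2."""
    return delta_s <= d_cv / 2

# Toy data: 4 input states, 4 outcomes, uniformly random outcome labels.
rng = np.random.default_rng(0)
outcomes = rng.integers(0, 4, size=(4, 1000))
d_cv = cross_validation_distance(outcomes, n_outcomes=4, rng=1)
print(d_cv, stop_for_shot_noise(1e-3, d_cv))
```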

V. EMPLOYING SDP IN NUMERICAL SIMULATIONS OF QMT EXPERIMENTS
In this section we demonstrate the potential of SDP for QMT by simulating different tomographic experiments with and without noise in the input state preparation.We will show how noise affects the tomographic results and how the many-deltas SDP can be used to detect a faulty preparation of the input states.We will then move on to the situation where we do not assume perfect state preparation and use the self-consistent approach proposed in the previous section.
A. Quantifying the impact of noise in QMT through SDP
Different types of noise are always present in any quantum experiment and influence the QMT process. One example is shot noise, i.e., the fact that the observed frequencies differ from the ideal probabilities due to finite-statistics effects. Another possible source of noise is due to the fact that QMT assumes that we perfectly know the set of input states $\{\rho_j\}_{j=1}^N$, while this is generally not true in quantum experiments. In this section, we will see how these types of noise impact the performance of QMT and how the SDPs provided in Sec. III can be used to diagnose them.

1. Shot noise
To detect the effects of shot noise on measurement statistics we will employ exclusively the single-delta SDP, as shot noise is uniform over all the input states and effects of QMT. The solution $\delta^*$ of (5) is a measure of the mismatch between the ideal probabilities associated with the set of output effects and the experimental frequencies. Therefore, intuitively it should decrease by increasing the number of shots. We quantitatively investigate this behaviour by performing 100 different numerical simulations of QMT on a single-qubit SIC-POVM (symmetric informationally complete POVM) [53] with 4 random linearly independent input states and for different numbers of shots $n_S$. More specifically, the effects of the SIC-POVM can be written in the Bloch representation as [53]:

$$\Pi_k^{(\mathrm{SIC})} = \frac{1}{4} \left( \mathbb{I} + \mathbf{n}_k \cdot \boldsymbol{\sigma} \right), \quad k = 1, \dots, 4, \tag{12}$$

where $\boldsymbol{\sigma} = (\sigma_x, \sigma_y, \sigma_z)^T$ and $\mathbf{n}_k$ are unit vectors pointing to the vertices of a regular tetrahedron, which can be chosen as:

$$\mathbf{n}_1 = (0, 0, 1), \quad \mathbf{n}_2 = \left( \tfrac{2\sqrt{2}}{3}, 0, -\tfrac{1}{3} \right), \quad \mathbf{n}_3 = \left( -\tfrac{\sqrt{2}}{3}, \sqrt{\tfrac{2}{3}}, -\tfrac{1}{3} \right), \quad \mathbf{n}_4 = \left( -\tfrac{\sqrt{2}}{3}, -\sqrt{\tfrac{2}{3}}, -\tfrac{1}{3} \right). \tag{13}$$

Our numerical tests of 100 experiments consisted of the following steps:
i) We first generated a set of 4 random linearly independent input states (density matrices) $\{\rho_j\}_{j=1}^4$ through a suitable function available in QuTiP [54].
ii) We then simulated $n_S$ total measurement runs of the SIC-POVM on this set of input states by sampling the probability distribution given by $p_{jk} = \mathrm{Tr}[\rho_j \Pi_k^{(\mathrm{SIC})}]$ (for each input state we therefore used $n_S/4$ shots), which produced the frequencies $f_{jk}$.
iii) We used these frequencies and the set $\{\rho_j\}_{j=1}^4$ as inputs to the single-delta SDP (5) and obtained the solution $\delta^*$ and the corresponding set of effects $\{\Pi^*_k\}$.
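
Steps i) and ii) above can be sketched in plain NumPy as follows. The tetrahedron orientation, the Ginibre-type construction of random density matrices (a stand-in for the QuTiP function used in the paper) and all function names are assumptions made for this illustration; step iii) requires an SDP solver and is not reproduced here.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = np.array([[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]], dtype=complex)

# Vertices of a regular tetrahedron on the Bloch sphere (one common orientation).
N_VECS = np.array([
    [0, 0, 1],
    [2 * np.sqrt(2) / 3, 0, -1 / 3],
    [-np.sqrt(2) / 3, np.sqrt(2 / 3), -1 / 3],
    [-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3],
])
# SIC-POVM effects (I + n_k . sigma) / 4.
SIC = [(I2 + np.einsum("a,aij->ij", n, PAULIS)) / 4 for n in N_VECS]

def random_density_matrix(rng, d=2):
    """Random mixed state G G^dagger / Tr(G G^dagger) with Gaussian G (assumed ensemble)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def born_probabilities(states, effects):
    return np.array([[np.trace(rho @ E).real for E in effects] for rho in states])

def simulate_frequencies(states, effects, n_shots, rng):
    """Sample n_shots / N outcomes per input state and return the frequencies f_jk."""
    shots_per_state = n_shots // len(states)
    probs = np.clip(born_probabilities(states, effects), 0, None)
    probs /= probs.sum(axis=1, keepdims=True)  # guard against rounding errors
    counts = np.array([rng.multinomial(shots_per_state, p) for p in probs])
    return counts / shots_per_state

rng = np.random.default_rng(0)
states = [random_density_matrix(rng) for _ in range(4)]
freqs = simulate_frequencies(states, SIC, n_shots=4000, rng=rng)
print(np.abs(freqs - born_probabilities(states, SIC)).max())
```

The printed quantity is the infinity-norm deviation between the sampled frequencies and the ideal probabilities, which upper-bounds the $\delta^*$ that the single-delta SDP would return for the same data.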
The results of our simulation are shown in Fig. 1, where we plot a histogram distribution over the 100 numerical experiments of the final values $\delta^*$ for different numbers of shots $n_S$. As expected, $\delta^*$ is statistically smaller if $n_S$ is higher. This is consistent with the fact that QMT is more accurate when more shots are performed, and it returns the exact effects of the POVM we aim to characterize in the limit $n_S \to \infty$.
Moreover, we have repeated the same tomographic experiments with different numbers of input states $N$. If $N = 4$, then we have a complete set of states for QMT of the SIC-POVM. If $N > 4$, we say that we have an overcomplete set of input states. We have generated 100 different sets of $N$ random input states and, for each of them, we have run 100 numerical experiments of QMT. We have repeated this with varying $N$. We plot in Fig. 2 the mean value of $\delta^*$ over the total $10^4$ experiments as a function of the total number of shots $n_S$ and for different numbers of input states $N$. For finite numbers of shots there is a trade-off between the number of input states, which add information on the POVM effects, and the number of shots available for each input. In Fig. 2 we observe that $\delta^*$ is larger if $N$ is larger, so, in this case, the trade-off favours more shots on fewer inputs. Finally, we remark that the average $\delta^*$ respects the scaling from shot noise, which is proportional to $1/\sqrt{n_S}$ (solid black line).
In addition, to compare the value of $\delta^*$ with the actual accuracy of QMT, we have computed the average trace distance [1] between the ideal effects of the SIC-POVM in (12) and the output effects returned by the SDP. The results are displayed in the inset of Fig. 2, where the quantity we are plotting is the average of the trace distance over both the $10^4$ different experiments and the different effects. Although more statistics would be necessary to compare the performance with respect to $N$, we observe that the trace distance also decays as $1/\sqrt{n_S}$ for any number of input states (see inset of Fig. 2).

2. Faulty preparation of input states
Another crucial source of errors in QMT consists in mismatches between the assumed input states and the actually prepared ones. If the observed frequencies came from measurements on states different from the ones assumed, the optimisation method (be it MLE, SDP, or any other) will be driven to an erroneous POVM, even in the limit of infinite statistics. In this subsection, we show how the solutions $\delta^*$ of (5) and $\delta^*_{j,k}$ of (7) can be employed to detect noise in the preparation of the input states for QMT. In particular, we will analyze two different types of noise, namely incoherent noise and coherent noise.
For both coherent and incoherent noise, we will study the effects of noisy maps that vary among the input states. This is because, if we apply the same map $\phi$ to all input states, we run into a gauge-freedom problem in the measurement tomography. That is, if the real set of effects is $\{\Pi_k\}_{k=1}^m$, QMT will return (up to shot noise) the set $\{\phi^\dagger[\Pi_k]\}_{k=1}^m$, where $\phi^\dagger$ is the adjoint map defined by $\mathrm{Tr}[\phi[\rho]\,\Pi_k] = \mathrm{Tr}[\rho\,\phi^\dagger[\Pi_k]]$. As a consequence, we will not obtain higher values of $\delta^*$. This gauge loophole is avoided if we apply a different map to each input state. Moreover, varying the noise on the input states is also a more physical description of real errors on near-term quantum computers, as different input states are prepared through different gates, and therefore are subject to different noise sources and magnitudes.
a. Incoherent noise We say that a noise channel is "incoherent" if it reduces the purity of the quantum state of the system. One of the most common examples of incoherent noise is the depolarizing channel [1], defined for one qubit as

$$\phi_p^{(\mathrm{dep})}[\rho] = (1 - p)\,\rho + p\,\frac{\mathbb{I}}{2}, \tag{14}$$

where $p \in [0, 1]$ can be considered as the noise strength.
Another incoherent noise channel is the amplitude damping channel [1]:

$$\phi_p^{(\mathrm{amp})}[\rho] = A_0 \rho A_0^\dagger + A_1 \rho A_1^\dagger, \tag{15}$$

with

$$A_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-p} \end{pmatrix}, \qquad A_1 = \begin{pmatrix} 0 & \sqrt{p} \\ 0 & 0 \end{pmatrix}. \tag{16}$$

Finally, a third example of incoherent noise is given by the phase damping channel [1]:

$$\phi_p^{(\mathrm{ph})}[\rho] = P_0 \rho P_0^\dagger + P_1 \rho P_1^\dagger, \tag{17}$$

with

$$P_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-p} \end{pmatrix}, \qquad P_1 = \begin{pmatrix} 0 & 0 \\ 0 & \sqrt{p} \end{pmatrix}. \tag{18}$$

Let us now construct a quantum map that depends on the value of a 3-outcome random variable $X = 0, 1, 2$, where the three outcomes have equal probability. The quantum map can be written as:

$$\phi_p^{(X)} = \begin{cases} \phi_p^{(\mathrm{dep})} & \text{if } X = 0, \\ \phi_p^{(\mathrm{amp})} & \text{if } X = 1, \\ \phi_p^{(\mathrm{ph})} & \text{if } X = 2. \end{cases} \tag{19}$$

We now simulate a set of $N$ input states by drawing a different value of $X$ for each input state and then applying $\phi_p^{(X)}$ thereto. Eventually, we prepare the set $\{\phi_p^{(X_j)}[\rho_j]\}_{j=1}^N$, where $X_j$ can vary among the input states. Using the above prescription, we have generated 100 different sets of $N$ random input states and, for each of them, we have performed 100 numerical QMT experiments to characterize the SIC-POVM introduced in Sec. V A 1, with different values of incoherent noise $p$ on the input states. We have done the same for 100 experiments with a specific set of pure (before noise) input states given by the eigenvectors of the Pauli matrices. Moreover, we have also generated 100 different random IC-POVMs by drawing 4 random Kraus operators in QuTiP [54] and then suitably composing them to form 4 random POVM effects. For each random POVM, we have performed 100 QMT experiments using the Pauli eigenstates as input states. The numerical experiments have been performed following the same lines as for shot noise. The results for $\delta^*$ according to the single-delta SDP are depicted in Fig. 3 (left) for the same sets of 4 and 6 random input states we used in the case of incoherent noise, and for the Pauli eigenstates (both for the SIC-POVM and the randomly generated POVMs). We note that the 6 random input states and the 6 Pauli eigenstates are more sensitive to incoherent noise than 4 random states.
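
A minimal NumPy sketch of the three single-qubit channels and of the random assignment of one channel per input state is given below. The Kraus operators follow the standard textbook forms of these channels [1]; the function names and the ordering of the channels with respect to the random variable $X$ are assumptions made for this example.

```python
import numpy as np

def depolarizing(rho, p):
    """(1 - p) rho + p I/2."""
    return (1 - p) * rho + p * np.eye(2) / 2

def kraus_channel(rho, kraus_ops):
    """Apply rho -> sum_i K_i rho K_i^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def amplitude_damping(rho, p):
    K0 = np.array([[1, 0], [0, np.sqrt(1 - p)]])
    K1 = np.array([[0, np.sqrt(p)], [0, 0]])
    return kraus_channel(rho, [K0, K1])

def phase_damping(rho, p):
    K0 = np.array([[1, 0], [0, np.sqrt(1 - p)]])
    K1 = np.array([[0, 0], [0, np.sqrt(p)]])
    return kraus_channel(rho, [K0, K1])

# Channel selected by the random variable X = 0, 1, 2 (assumed ordering).
CHANNELS = (depolarizing, amplitude_damping, phase_damping)

def noisy_inputs(states, p, rng=None):
    """Apply an independently drawn channel of strength p to each input state."""
    rng = np.random.default_rng(rng)
    return [CHANNELS[rng.integers(3)](rho, p) for rho in states]

plus = np.full((2, 2), 0.5, dtype=complex)  # |+><+|
for channel in CHANNELS:
    out = channel(plus, 0.1)
    print(channel.__name__, np.trace(out).real, np.trace(out @ out).real)
```

For the pure input state $|+\rangle$ each channel preserves the trace and lowers the purity $\mathrm{Tr}[\rho^2]$ below one, in line with the definition of incoherent noise used in the text.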
b. Coherent noise Coherent noise can be defined as the application of (undesired) unitary rotations to the input states of QMT. A generic 1-qubit rotation can be characterized by three angles, namely $\phi$, $\varphi$, and $\psi$, as follows:

$$U(\phi, \varphi, \psi) = \begin{pmatrix} e^{i\psi} \cos\phi & e^{i\varphi} \sin\phi \\ -e^{-i\varphi} \sin\phi & e^{-i\psi} \cos\phi \end{pmatrix}. \tag{20}$$

We now focus on numerical experiments on QMT of the SIC-POVM and of random POVMs with the Pauli eigenstates, as discussed in the previous sections, but with only coherent noise (no incoherent noise, i.e., $p = 0$ in (19)). We generate a uniformly random rotation by sampling uniformly $\psi$ and $\varphi$ from $[0, 2\pi]$ and a quantity $\zeta$ uniformly from $[0, 1]$; then, we compute $\phi = \arcsin\sqrt{\zeta}$ [55]. We perform different numerical experiments by varying the set of input states, the POVM, and the coherent noise magnitude $\epsilon < 1$, which is used to scale $\psi$ and $\phi$ in (20) (that is, we sample, e.g., $\psi$ as discussed above and then we multiply it by $\epsilon$, and the same for $\phi$). In this way, we construct a random unitary rotation that is close to the identity (the parameter $\varphi$ does not need to be of the order of $\epsilon$ to obtain such a small perturbation). We sample a different unitary rotation for each input state, and then we prepare a set of noisy input states as $\{U(\phi, \varphi, \psi)[\rho_j]\}_{j=1}^N$, where $U[\rho] = U \rho U^\dagger$ and the parameters of $U$ are sampled and scaled by $\epsilon$ for each $j$.
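
The sampling procedure for small random rotations can be sketched as follows. The parametrisation of the rotation matrix assumed here is the standard three-angle form of a 2x2 special unitary matrix, and the function and argument names (`phi` for the mixing angle, `varphi` and `psi` for the phases, `eps` for the noise magnitude) are illustrative choices.

```python
import numpy as np

def rotation(phi, varphi, psi):
    """Single-qubit rotation with mixing angle phi and phases varphi, psi (assumed form)."""
    return np.array([
        [np.exp(1j * psi) * np.cos(phi), np.exp(1j * varphi) * np.sin(phi)],
        [-np.exp(-1j * varphi) * np.sin(phi), np.exp(-1j * psi) * np.cos(phi)],
    ])

def random_small_rotation(eps, rng=None):
    """Sample the angles as described in the text, then scale psi and phi by eps."""
    rng = np.random.default_rng(rng)
    psi, varphi = rng.uniform(0, 2 * np.pi, size=2)
    phi = np.arcsin(np.sqrt(rng.uniform(0, 1)))
    # varphi is left unscaled: it multiplies sin(eps * phi), which is already small.
    return rotation(eps * phi, varphi, eps * psi)

def apply_coherent_noise(states, eps, rng=None):
    """Rotate each input state by an independently sampled small unitary."""
    rng = np.random.default_rng(rng)
    noisy = []
    for rho in states:
        U = random_small_rotation(eps, rng)
        noisy.append(U @ rho @ U.conj().T)
    return noisy

U = random_small_rotation(0.01, rng=0)
print(np.linalg.norm(U - np.eye(2)))
```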
We plot in Fig. 3 (right) the average value of $\delta^*$ returned by the single-delta SDP as a function of the coherent noise magnitude $\epsilon$. We see immediately that this quantity captures the presence of coherent noise in the input states for any number $N$ thereof. Moreover, even a small amount of coherent error in the state preparation induces a sizeable $\delta^*$ (typically, for the same magnitude of $p$ and $\epsilon$, one order of magnitude higher than for incoherent noise). We can therefore conclude that $\delta^*$ will be able to detect noise in the input states in most of the experimental realizations of QMT on near-term quantum computers.
c. Predominant noise on a subset of input states So far, we have explored the effects of noise acting randomly on each input state with the same magnitude, treating all of the states on the same footing. This is not always the case in real experimental conditions. For instance, in many of the current devices the initial state of the qubit is prepared in the ground state, and this initialization may be assumed to be more reliable than, for instance, the preparation of an entangled multi-qubit state that requires several CNOT gates. In these situations, the many-deltas SDP can be employed to detect unbalanced noise among the input states.
We have performed two sets of 100 numerical experiments of QMT on the SIC-POVM through the many-deltas SDP with noise on only some of the input states. Specifically, we have chosen as initial input states the set of 6 eigenstates of the Pauli matrices, where we denote by $|{\pm}z\rangle$ the eigenstate of $\sigma_z$ with eigenvalue $\pm 1$, and analogously for the other matrices. For the first set, we have added incoherent noise with $p = 0.1$ (see (19)) on the states $|{\pm}x\rangle$, and no additional noise on the remaining states. For the second set, we have added coherent noise with $\epsilon = 0.01$ (see the discussion in the previous subsection) to the state $|{-}z\rangle$ only. The quantity we have computed for each experiment is

$$\delta^*_j = \sum_{k} \delta^*_{j,k}, \tag{21}$$

where $k$ labels the effects of the SIC-POVM in (12) and $\delta^*_{j,k}$ is the output of (7). The results are shown in Fig. 4. We observe that the quantity $\delta^*_j$ in (21), which can be obtained through the many-deltas SDP, signals which input state preparation is noisier, both for coherent and incoherent errors. Therefore, this SDP can be employed as a diagnostic tool to recognize which state preparation is making QMT less reliable.

B. Improving the estimation through the see-saw method
In Sec.V A we have discussed how the SDPs are affected by mismatches between the assumed states and the real ones.As we have seen, we can still have a decent estimation if the level of noise is low.However, if the real states are far from the assumed ones, our estimation becomes misleading.The see-saw method for self-consistent tomography proposed in Sec.IV can be employed to solve this issue.
To test the see-saw approach, we have run it in 100 different experiments for 100 different sets of random input states (for a total of 10 4 experiments) on the same noisy scenarios we have analyzed in Sec.V A 2 and in Fig. 3, and analogously for 100 experiments on the Pauli eigenstates and SIC-POVM.Following the same discussion as in Sec.V A 2, we have done the same for 10 4 experiments with the Pauli eigenstates and random POVMs.In this first numerical experiment, we have not implemented the stopping condition to avoid finite-shot overfitting expressed by Eq. (11).We have computed the mean value and the median of δ (s) at the final step over the different experiments with ν δ = 10 −6 (clearly, the lower ν δ the most accurate is see-saw, and the longer it takes to run), and the results are depicted in Fig. 5.We have plotted both the mean value and the median because we have noticed that these quantities may be remarkably different in the see-saw approach, and in particular the mean value is often much larger than the median.This is due to a few outlier realizations converging to a relatively large δ (s) , which offset the mean value.
Comparing Fig. 5 with Fig. 3, we note that, for randomly generated states, see-saw typically converges to very low values of $\delta^{(s)}$ (median around $10^{-8}$) for incoherent noise, independently of the noise strength; for coherent noise the median is larger (between $10^{-5}$ and $10^{-4}$) in the case of 6 random input states and non-zero input noise. The larger mean values in Fig. 3 (right) detect the presence of a few "bad" realizations, as explained before. In contrast, for the very specific set of six eigenstates of the Pauli matrices, see-saw does not always converge to such low values, especially for coherent noise and the SIC-POVM. More insights into see-saw for the Pauli eigenstates are given by Fig. 6, which displays the distribution of $\delta^{(s)}$ for both coherent and incoherent noise and the SIC-POVM. We note that, although the average $\delta^{(s)}$ is not very low, see-saw often decreases the value of $\delta^*$ from (5) by several orders of magnitude, as also captured by the median in Fig. 5.
Finally, we have repeated the same numerical experiments with see-saw including the stopping condition to avoid finite-shot overfitting that is expressed by Eq. (11). The average and median values of $\delta^{(s)}$ when this condition is included are shown in Fig. 7. We observe that the average $\delta^{(s)}$ is much higher compared to the case without the stopping condition (results in Fig. 5), with the partial exception of the Pauli eigenstates and SIC-POVM. The higher values of $\delta^{(s)}$ show that, in most cases, see-saw is stopped before convergence to prevent shot-noise overfitting, as explained in Sec. IV B. For random input states, we observe that the median value of $\delta^{(s)}$ is still extremely low. More insights into this result can be found in Fig. 8, where we plot the average and median numbers of see-saw steps in each experiment. Very low median values of $\delta^{(s)}$ correspond to two see-saw steps, i.e., we perform one QMT step, followed by a QST one, before stopping. In these cases, the SDP for QST is often able to immediately find very good sets of input states that match the experimental condition, giving rise to very low values of $\delta^{(s)}$. Moreover, note that see-saw for Pauli eigenstates and the SIC-POVM with coherent noise runs for several steps, as it typically does not reach the stopping condition in Eq. (11).
In conclusion, we have observed that see-saw is usually very effective in finding sets of input states and POVM effects that match the experimental probabilities. In addition, adding the stopping condition in Eq. (11) to the see-saw method helps avoid finite-shot overfitting and also speeds up the whole procedure because it leads to a reduced number of see-saw steps, as shown in Fig. 8.

[...] (11), as a function of the noise strength on the input states, for the same experimental conditions as in Figs. 5 and 7. Left: incoherent noise. Right: coherent noise.

VI. CONCLUSIONS
In this work, we have put forward two semidefinite programs (SDPs) for fitting the experimental data of quantum measurement tomography (QMT), and shown how they can be employed to detect different noise sources in QMT experiments and devise a strategy for self-consistent tomography. The SDPs have been introduced in Sec. III, where we have also pointed out that they correspond to minimising the distance between experimental probabilities and ideal quantum probabilities with respect to different norms. The runtime performance of these methods is comparable with that of the standard log-maximum likelihood estimation, as shown in Appendix B.
In Sec. IV we have discussed how the "single-delta" SDP for measurement tomography of an informationally complete POVM may be employed for self-consistent tomography through a see-saw optimisation method. The method consists of alternating between an SDP for measurement tomography and an SDP for state tomography on the whole set of input states. The measurement tomography starts from the experimental frequencies and a set of input states that is typically our best guess about the "real" experimental states. At each step of the see-saw, the input parameters are updated according to the output estimates returned by the previous SDP.
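The alternating structure described above can be sketched as a short control loop. This is only an illustration of the flow under stated assumptions: the callbacks `solve_qmt` and `solve_qst` are hypothetical placeholders for the two SDPs, the stopping rule (mismatch below a tolerance `tol`) is an assumption and not the condition of Eq. (9) or Eq. (11), and the toy solvers replace the SDPs with scalar least-squares fits of a model $f_{jk} = s_j e_k$.

```python
def see_saw(freqs, states, solve_qmt, solve_qst, tol=1e-6, max_steps=50):
    """Alternate measurement and state tomography until the mismatch drops below tol.

    solve_qmt(freqs, states) -> (effects, delta) and
    solve_qst(freqs, effects) -> (states, delta) are hypothetical solver callbacks.
    """
    effects, history = None, []
    for step in range(max_steps):
        if step % 2 == 0:   # QMT step: fit the effects to the current states
            effects, delta = solve_qmt(freqs, states)
        else:               # QST step: fit the states to the current effects
            states, delta = solve_qst(freqs, effects)
        history.append(delta)
        if delta < tol:     # assumed stopping rule, not Eq. (9) or Eq. (11)
            break
    return states, effects, history


# Toy stand-ins: scalar "states" s_j and "effects" e_k with model f_jk = s_j * e_k,
# fitted by least squares instead of an SDP.
def mismatch(freqs, states, effects):
    return max(abs(freqs[j][k] - states[j] * effects[k])
               for j in range(len(states)) for k in range(len(effects)))


def toy_qmt(freqs, states):
    norm = sum(s * s for s in states)
    effects = [sum(states[j] * freqs[j][k] for j in range(len(states))) / norm
               for k in range(len(freqs[0]))]
    return effects, mismatch(freqs, states, effects)


def toy_qst(freqs, effects):
    norm = sum(e * e for e in effects)
    states = [sum(effects[k] * row[k] for k in range(len(effects))) / norm
              for row in freqs]
    return states, mismatch(freqs, states, effects)


true_states, true_effects = [1.0, 2.0, 3.0], [0.5, 0.25]
freqs = [[s * e for e in true_effects] for s in true_states]

# Start from a deliberately wrong guess for the "states".
states, effects, history = see_saw(freqs, [1.0, 1.0, 1.0], toy_qmt, toy_qst)
print(history)
```

In this toy run the loop stops after one QMT step and one QST step. The returned pair reproduces the data exactly but is a rescaled version of the pair used to generate it, which illustrates that the procedure targets consistency with the measured statistics rather than a unique solution.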
In addition, we have devoted Sec. V to the numerical analysis of SDPs for simulated QMT experiments. In particular, in Sec. V A we have discussed how the SDPs can be applied to detect both shot noise, that is, statistical noise due to a finite number of measurement realizations, and noise in the preparation of the set of input states in real experiments on QMT. We have shown that the SDPs capture well the magnitude of shot noise, as well as of coherent noise and incoherent noise on the input states (with the partial exception of 4 random input states and purely incoherent noise). Moreover, a particular type of SDP, namely the many-deltas SDP corresponding to 1-norm minimization, can be employed to detect unbalanced noise among the input states.
Finally, in Sec. V B we have shown that, for the same experimental conditions as in Sec. V A, the see-saw method can reach very low values of the parameter $\delta^{(s)}$ that characterizes the mismatch between experimental frequencies and ideal quantum probabilities, thus yielding a set of input states and effects that are compatible with the measurement statistics. Furthermore, we have also devised a strategy to avoid finite-shot overfitting with see-saw and shown its effectiveness in the numerical simulations.
In conclusion, in this work we have shown that SDPs can be a useful, valid and feasible alternative to log-maximum likelihood estimation in quantum measurement tomography. The insights they give on the errors in QMT make them particularly suitable for the analysis of noise on near-term quantum computers. Moreover, the see-saw method is a practical and fast way to perform self-consistent tomography on these types of quantum devices.

ACKNOWLEDGMENTS
We would like to thank Laurin Fischer, Adam Glos, Francesco Tacchino and Ivano Tavernelli for interesting discussions on noise detection on quantum hardware.We would also like to thank Carmen Vaccaro for preliminary studies on the runtime of the single-delta SDP, discussed in Appendix B. The SDPs presented in this work are integrated in Aurora, a proprietary quantum chemistry platform developed by Algorithmiq Ltd.

FIG. 1. Distribution of the values of $\delta^*$ returned by the single-delta SDP according to (5) for QMT on the SIC-POVM with 4 random input states (100 numerical experiments), for different total numbers of shots $n_S$. In the plot, we are omitting a few outliers around $10^{-8}$ due to favourable frequency samplings that are close to the ideal case ($f_{jk} \approx p_{jk}$).

FIG. 3. Average $\delta^*$ as a function of incoherent noise strength $p$ (left) or coherent noise magnitude (right) on the input states, returned by the single-delta SDP over $10^4$ (for the random input states and random POVMs) or 100 (for the Pauli eigenstates with SIC-POVM) numerical experiments on QMT, for different sets of input states. The error bars are given by the standard deviations of the samples over the different experiments. We are using a total number of shots $n_S = 6 \times 10^5$.

FIG. 5. Mean value (dotted line) and median (dash-dotted line) of $\delta^{(s)}$ at the final $s$th step of see-saw, without countering overfitting, according to (9) with $\nu_\delta = 10^{-6}$, as a function of the noise strength on the input states, over $10^4$ (for the random input states and random POVMs) or 100 (for the Pauli eigenstates with SIC-POVM) numerical experiments on QMT, for different sets of input states. The error bars around the mean values are given by the standard deviations of the samples over the different experiments. We are using a total number of shots $n_S = 6 \times 10^5$. Left: incoherent noise. Right: coherent noise.

FIG. 6. Distribution of $\delta^{(s)}$ at the final $s$th step of see-saw according to (9) with $\nu_\delta = 10^{-6}$, for the same experiment as in Fig. 5. The set of input states consists of the six eigenstates of the Pauli matrices. We are using a total number of shots $n_S = 6 \times 10^5$. Left: incoherent noise. Right: coherent noise.

FIG. 9. Runtime $\tau$ of single-delta SDP (blue), many-deltas SDP (orange), and log-MLE (green) as a function of the dimension $d$ of the Hilbert space of the quantum system. The error bars are the standard deviations of the values over 100 different numerical experiments. We employ either $10^4$ or $5 \times 10^6$ shots per input state, and a complete ($N = d^2$) or overcomplete ($N = d(d + 1)$) set of random input states.

FIG. 10. Average trace distance between the "ideal effects" and the output effects returned by the single-delta SDP (blue), many-deltas SDP (orange), and log-MLE (green) as a function of the dimension $d$ of the Hilbert space of the quantum system. The error bars are the standard deviations of the values over 100 different numerical experiments. We employ either $10^4$ or $5 \times 10^6$ shots per input state, and a complete ($N = d^2$) or overcomplete ($N = d(d + 1)$) set of random input states.
FIG. 2. Average $\delta^*$ as a function of the total number of shots $n_S$ returned by the single-delta SDP according to (5) over $10^4$ numerical experiments on QMT for the SIC-POVM and for different numbers of random input states. Inset: for the same experimental conditions, average trace distance between the estimated effects of the single-delta SDP and the corresponding effects of the SIC-POVM. The trace distance is averaged over both different experiments and different effects. The error bars in the plots are given by the standard deviations of the samples over the different experiments. The shot-noise scaling proportional to $1/\sqrt{n_S}$ is also shown in both plots (solid black line).
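The $1/\sqrt{n_S}$ shot-noise scaling can be checked with a small Monte Carlo sketch. It does not run any SDP: it only samples outcome frequencies from a fixed, hypothetical probability vector and measures their largest deviation from the true probabilities. The probabilities, shot numbers, number of trials and function name are all illustrative assumptions.

```python
import math
import random


def mean_max_deviation(probs, n_shots, n_trials, rng):
    """Average over trials of max_k |f_k - p_k| for frequencies from n_shots samples."""
    outcomes = range(len(probs))
    total = 0.0
    for _ in range(n_trials):
        counts = [0] * len(probs)
        for k in rng.choices(outcomes, weights=probs, k=n_shots):
            counts[k] += 1
        total += max(abs(c / n_shots - p) for c, p in zip(counts, probs))
    return total / n_trials


rng = random.Random(0)          # fixed seed so the run is reproducible
probs = [0.4, 0.3, 0.2, 0.1]    # hypothetical outcome probabilities

results = {n: mean_max_deviation(probs, n, 50, rng) for n in (100, 10_000)}
for n, dev in results.items():
    scaled = dev * math.sqrt(n)
    print(f"n_shots={n:>6}: deviation={dev:.4f}, deviation*sqrt(n)={scaled:.3f}")
```

If the scaling holds, increasing the number of shots by a factor of 100 should reduce the deviation by roughly a factor of 10, so the rescaled column should stay of the same order for both shot numbers.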