Learning to Measure: Adaptive Informationally Complete Generalized Measurements for Quantum Algorithms

Many prominent quantum computing algorithms with applications in fields such as chemistry and materials science require a large number of measurements, which represents an important roadblock for future real-world use cases. We introduce a novel approach to tackle this problem through an adaptive measurement scheme. We present an algorithm that optimizes informationally complete positive operator-valued measurements (POVMs) on the fly in order to minimize the statistical fluctuations in the estimation of relevant cost functions. We show its advantage by improving the efficiency of the variational quantum eigensolver in calculating ground-state energies of molecular Hamiltonians with extensive numerical simulations. Our results indicate that the proposed method is competitive with state-of-the-art measurement-reduction approaches in terms of efficiency. In addition, the informational completeness of the approach offers a crucial advantage, as the measurement data can be reused to infer other quantities of interest. We demonstrate the feasibility of this prospect by reusing ground-state energy-estimation data to perform high-fidelity reduced state tomography.

Many prominent quantum computing algorithms with applications in fields such as chemistry and materials science require a large number of measurements, which represents an important roadblock for future real-world use cases. We introduce a novel approach to tackle this problem through an adaptive measurement scheme. We present an algorithm that optimizes informationally complete positive operator-valued measurements (POVMs) on the fly in order to minimize the statistical fluctuations in the estimation of relevant cost functions. We show its advantage by improving the efficiency of the variational quantum eigensolver in calculating ground-state energies of molecular Hamiltonians with extensive numerical simulations. Our results indicate that the proposed method is competitive with state-of-the-art measurement-reduction approaches in terms of efficiency. In addition, the informational completeness of the approach offers a crucial advantage, as the measurement data can be reused to infer other quantities of interest. We demonstrate the feasibility of this prospect by reusing ground-state energy-estimation data to perform high-fidelity reduced state tomography.

I. INTRODUCTION
Quantum computing is a rapidly growing multidisciplinary field with a very clear objective: to understand if, and to what extent, it is possible to build computing machines able to perform tasks that are impossible for conventional (classical) computers. Theoretically, milestone discoveries such as Shor's and Grover's quantum algorithms hint toward a positive answer to this question. These algorithms, which exploit quantum properties of the processor, can in principle outperform all currently existing classical methods. In practice, however, the implementation of such protocols in the regimes of interest will most probably require the use of ideal fault-tolerant universal quantum computers. At the same time, because of the extreme fragility of quantum information storage and processing in the presence of environmental noise, error-correction techniques required to achieve fault tolerance are still experimentally in their infancy.
Universal fault-tolerant quantum computers, however, are not the only type of quantum machines able to tackle computationally hard problems. In fact, we can reformulate the main quantum computing research question and ask ourselves: what are the useful problems that quantum computers can solve more efficiently than their classical counterparts and, specifically, which subclasses of such problems are less demanding in terms of experimental requirements, given the current state-of-the-art quantum hardware? Note that this question has a dif-ferent starting point, namely it focuses on our current -or near-future-technologies and devices, and aims at identifying, based on the current understanding, useful applications that may benefit from them.
There are at least two classes of problems that satisfy the above requirements. The first class has a longstanding history, dating back to Feynman (1982) [1] and Manin (1980) [2], who pointed out that the simulation of quantum systems is hard on classical computers, while, under certain conditions, they can be efficiently investigated by means of other quantum systems [3]. In fact, this can be done using either digital quantum simulators, namely specific-purpose quantum computers [4][5][6], or by employing analog quantum simulators [7][8][9][10], namely other equivalent but easier-to-control quantum systems. The second class of problems emerges when we lift the requirement of finding "exact" solutions to a given problem. Approximate near-term quantum devices might be able, e.g., to find better solutions to certain worst-case instances of non-deterministic polynomial-time hard (NPhard) problems or find such approximate solutions faster.
A final ingredient to move toward the existing approximate noisy quantum devices [11,12] is the combination of quantum and classical techniques to maximize performance. In this paper, we focus on variational quantum algorithms, which have emerged recently as the most suited paradigm to tackle the classes of problems identified above [13,14] with approximate quantum computing. Specifically, these protocols are implemented by arXiv:2104.00569v2 [quant-ph] 3 Dec 2021 preparing a parametrized N -qubit trial state on a quantum device, extracting some observable quantities with suitable measurements and processing such measurement outcomes using a classical optimizer. The latter then returns the small changes that need to be implemented to prepare, in the next step, an updated trial wave function. This cycle is repeated many times until it converges to a quantum state from which the desired approximate solution can be extracted.
This procedure can be used to solve problems in chemistry [15][16][17][18], for the design of new materials [19], and generally in every field of physics where one needs to extract the properties of many-body quantum correlated systems, e.g., interacting fermionic systems, which are typically hard to simulate on classical devices [20,21]. In this case, these algorithms go by the name of Variational Quantum Eigensolvers (VQE) [15,22,23]. In essence, the quantum processor is used to explore the exponentially large Hilbert space of the fermionic particles in order to find iteratively the ground state of the Hamiltonian, without solving the full diagonalisation problem. As an example, the knowledge of the ground state of a chemical compound as a function, e.g., of the nuclear coordinates allows one to extract crucial information such as the equilibrium bond length, bond angle, and dissociation energy. Note that, at least in principle, a quantum computer with a few hundreds of qubits could already have the potential to solve useful quantum chemistry problems that are intractable on classical computers.
The application of VQE has already been demonstrated in many proof-of-principle experiments [15,22,[24][25][26]. However, a few major challenges still need to be overcome along the path to useful quantum advantage. On one hand, the classical optimization step that is associated with variational quantum algorithms can in general incur high computational costs because of the existence of many local minima or due to the problem of vanishing gradients [27]. Some possible solutions have been proposed, combining techniques borrowed from classical optimization theory with a careful design of the variational ansatz, such as the recently proposed ADAPT-VQE [28] and oo-VQE [29], and of the associated cost function [30]. On the other hand, the so-called measurement problem arises from the very high cost in terms of the number of observations that are typically needed to reconstruct the properties of interest, and specifically, the expectation value of the Hamiltonian, on the quantum states constructed by variational means. In fact, as the size of the problem approaches the regime in which the VQE could compete with classical methods, the current approaches would lead to prohibitive requirements to reach the desired degree of accuracy [21,[31][32][33].
In this work, we tackle the second problem, by presenting a novel adaptive method that sensibly alleviates the demands on the number of measurements, thus paving the way for an increase of the affordable problem sizes in experimental realisations.
On a fundamental level, our approach introduces a new perspective on how to improve the overall observables reconstruction strategy in VQE, and possibly in variational algorithms in general, by leveraging informationally complete quantum measurements.
Before introducing our protocol, however, in the next section we describe the measurement problem in more detail and briefly mention the main approaches that have been proposed in the literature to tackle it in the next section.

II. THE MEASUREMENT PROBLEM
One of the most prominent differences between classical and quantum methods concerns the way in which information is extracted at the end of the execution of the algorithm. In a typical situation, the quantum circuit prepares a N -qubit quantum state |ψ that is used to compute the expectation value of an operator O = ψ|O|ψ . Generally, it is not possible to measure O directly in its eigenbasis. For instance if we are interested in finding the ground state of the Hamiltonian H, measuring in its eigenbasis requires solving the problem itself in advance.
The standard measurement protocol, henceforth named the Pauli method, consists in writing the oper- x , . . . are Pauli operators. The expectation value of the operator is therefore obtained in terms of the weighted sum of K expectation values, O = k c k P k .
Unfortunately, this method leads to a suboptimal measurement scheme, as the variance of O is the sum of the weighted variances of the individual operators P k . More precisely, the error in the estimation is given by where Var(P k ) = P 2 k − P k 2 is the variance of P k and S k is the number of measurements, i.e., wave-function collapses, used to estimate term k [32]. Interestingly, under such measurement scheme, even exactly prepared ground states do not enjoy the zero-variance property, such that statistical energy fluctuations always remain finite and large.
This constitutes a major source of problems for variational-based state preparation, where circuit parameters are optimized to minimize the expectation value of the energy. Given its significance, several efforts have been put forward to mitigate this problem. One simple strategy, henceforth named grouped Pauli method, aims at identifying all the Pauli strings that can be measured simultaneously from the same data set [15]. While this is not solving the issue, it reduces the computational overhead of the procedure. Promising approaches also involve the usage of a classical machine-learning engine to perform an approximate reconstruction of the quantum state [34] using only the basis state defined by P k [35], or classical shadows of a quantum state [36][37][38]. Other approaches based on grouping of commuting terms, effective measurement scheduling and optimized qubit tomography have been described in Refs. [39][40][41][42][43][44][45][46][47][48]. In the context of quantum state tomography with generalized quantum measurements, neural network-assisted adaptive methods have also been proposed [49]. In the following, we show how a related idea can be applied to fully general observable reconstruction tasks and gradient-based measurement learning with effective sampling costs. It is also worth reminding that, in a more general scenario in which faulttolerant architectures are available, optimal strategies for obtaining expectation values with Heisenberg-limited precision are known, based on quantum phase estimation [50]. Intermediate solutions between the standard and the quantum-phase-estimation-like sampling regimes are also possible, leveraging some trade-offs between sample complexity and quantum coherence [51,52].
In this work, we present an algorithm for efficient observable estimation that exploits generalized quantum measurements integrating three important components: a hybrid quantum-classical Monte Carlo, a method to navigate generalized measurement space toward efficient measurements, and a recipe to combine different estimations of the observable of interest. The result is a procedure in which the optimal measurement of an operator average is learnt in an adaptive fashion with no measurement overhead.

III. ADAPTIVE MEASUREMENT SCHEME
In this section, we explain our adaptive measurement scheme. In a nutshell, the idea is to use parametric informationally complete positive operator-valued measures (IC POVMs), which can in principle be used to estimate any expectation value of our choice. We first introduce a hybrid Monte Carlo approach, which bypasses the need to use tomographic reconstructions of quantum states from the IC data. We then describe how, by using parametric families of POVMs, the measurement settings can be optimized to yield low statistical errors in the estimation of the target expectation values.
With respect to the second point, special attention must be devoted to achieving the desired POVM optimization without incurring additional overheads in terms of, e.g., the number of repetitions (also named shots in the following) of the state-preparation-and-measurement routine. As we will explain in the following, an adaptive method -that is, an on-the-fly optimization -will serve this scope. In brief, the key is to use the IC data obtained with one given POVM twice. First, we use them to produce an estimation of the mean of the observable. Second, the same set of results can also be employed to find a better POVM for the next experiment. The collection The ansatz prepares a state |ψ( θ) (green box) for which the mean of some observable O must be evaluated. Our algorithm is an efficient measurement subroutine in this process. It relies on parametric informationally complete POVMs (purple box) implemented with ancillary qubits (red box). These are explained in detail in App. A. Initially, we start by performing S1 measurements using the POVM corresponding to parameters x1, and obtain S1 outcomes m1, . . . , mS 1 . The measurement data are post-processed efficiently on a classical device (blue box) twice, with two different goals. First, we estimate the mean of the observable,Ō1, and the corresponding error of the estimation,V1, as explained in Sec. III A. Second, we calculate the gradient of the estimation variance, ∇ x Var(ωm), in POVM parameter space, and thus find a better POVM for iteration 2 (see Sec. III B and App. B). At every step t, the variablesŌ andV integrate all the estimations for t ≤ t while minimising the overall statistical error (see Sec. III C and App. D). The process is repeated iteratively untilV is below some desired threshold.
of intermediate estimators of the target observable, each constructed along the process with a different POVM, is finally integrated together as to minimize the overall statistical uncertainty. As a result of this strategy, the measurement learning procedure improves over the initial POVM (which turns out to be already quite efficient, as shown in Sec. IV) with no additional measurement costs. The scheme is illustrated and summarized in Fig. 1.
It is important to stress that the method does not require any approximations whatsoever. In fact, it is completely agnostic to the nature of the operator O to be measured, as long as it is given in terms of a linear combination of products of single-qubit observables (e.g., Pauli strings). While the algorithm is rather general, its performance is strongly dependent on the weight of such products (the number of non-identity single-qubit operators in every term), as we explain later, which makes quantum chemistry with low-weight fermion-to-qubit mappings, such as Bravyi-Kitaev [53] and the one recently introduced in Ref. [54], ideal use cases. Moreover, it should be mentioned that the methodology relies on the use of one ancillary qubit for every system qubit. However, the ancillary qubits remain in the ground state until the measurement stage, and the procedure only requires an increase in the circuit depth that is independent of the system size (i.e., a single layer of two-qubit gates that can all be executed in parallel). Yet, the efficiency of the method when applied to quantum chemistry problems is comparable to that of state-of-the-art methods that require an additional circuit depth linear in the number of qubits [41] and, at the same time, it provides informationally complete (IC) data useful for purposes beyond energy estimation.
To ease the explanation of the algorithm, we present its three main components separately. We first introduce the hybrid quantum-classical Monte Carlo sampling for the estimation of expectation values of operators in Sec. III A. We then show in Sec. III B how to estimate the gradient in the space of POVMs without additional measurements, using only efficient classical post-processing and, lastly, in Sec. III C, we illustrate how to integrate all the data obtained from different POVMs to estimate mean values while minimising statistical fluctuations.

A. Hybrid quantum-classical Monte Carlo sampling
Our proposed algorithm relies on single-qubit (minimal) IC POVMs, which can be realised by applying a two-qubit gate between a system qubit and an ancillary one, the latter in a known state, and then measuring both qubits in the computational basis. In practice, this means that the ancillary qubits are initialised along with all the other qubits in the device (e.g., they are prepared in the ground state |0 ) and no operations are applied to them until the measurement stage. The implementation of these POVMs on current quantum computers has recently been demonstrated experimentally on IBM Quantum devices [55,56].
By definition, one such POVM is represented by four linearly independent positive operators {Π i > 0, i = 0, . . . , 3} adding up to identity, i Π i = I, and spanning the space of linear operators in the Hilbert space H of the system qubit. Each of these operators, usually called effects, is associated with one of the four possible outcomes of the two-qubit measurement, with Tr[ρΠ i ] being the probability of outcome i on the quantum state ρ of the target qubit. It is important to note that different qubit-ancilla unitaries generally lead to different POVMs. Hence, by parametrising these unitaries, we can parametrize the corresponding family of POVMs (see App. A).
Let us consider the N -qubit case, with local and not necessarily identical POVMs associated with each qubit. The four effects associated with qubit i are denoted by Π (i) m , with m running from 0 to 3. The outcome of an experiment in which all qubits are measured via these local POVMs is a string m = (m 1 , . . . , m N ), where m i ∈ {0, . . . , 3}. The probability of such outcome given an N - As explained in previous sections, in VQE realisations one typically needs to measure an operator O that can be decomposed in terms of K Pauli strings, O = k c k P k (we assume c k ∈ R, as is customary, although our results can be easily generalized to complex-valued coefficients). Given that each of the local POVMs is IC, we can express the Pauli operators acting on each qubit i in terms of the effects Π The above expression seems useless at first sight: we transform a representation of O in terms of K terms c k P k into one with possibly 4 N terms ω m Π m . However, the expectation value of the operator now reads where p m is the probability of obtaining outcome m.
In other words, the mean value of the operator is the average of ω m over the probability distribution {p m }, O = ω m {pm} . This observation enables a very different strategy for estimating O as compared to the standard Pauli method introduced in Sect. II. Instead of evaluating each of the p m = Π m via repeated sampling, and once all these mean values are known with high enough precision calculating O = m ω m Π m , which would be infeasible given the aforementioned exponential amount of terms, we can resort to a Monte Carlo approach. In Monte Carlo integration, one can evaluate integrals over high-dimensional domains efficiently by randomly sampling points within the domain and averaging their image through a suitable function. Similarly, in our case, we can exploit the fact that p m is the probability of the measurement yielding outcome m to calculate O in a similar manner, that is, using the quantum computer to sample values of m and a classical one to calculate the corresponding ω m , hence bypassing the need to evaluate the mean values Π m .
More precisely, the strategy is to repeat the measurement S times using the local POVMs to sample from the probability distribution {p m }, resulting in a sequence of outcomes m 1 , . . . , m S , and computē kimi can be calculated in a polynomial time on a classical computer. This estimator converges to O = m ω m p m as Var(ω m )/S, where Var(ω m ) is the variance of ω m over the probability distribution {p m }, hence possibly providing accurate estimations even when the sum in Eq. (4) only involves a number of terms S 4 N . Crucially, this method estimates the weighted average of all the Pauli strings P k simultaneously, regardless of whether they commute or not, by exploiting IC data, yet circumventing any costly tomographic reconstruction of quantum states. In addition, in this Monte Carlo approach, the variance naturally takes into account the covariance between all these parallel measurements. In other words, the quantity ( ω 2 m {pm} − ω m 2 {pm} )/S, which can be estimated efficiently from the data, accounts for the total statistical error. As we explain next, our strategy is to iteratively search for POVMs that minimize this error.
Importantly, the previous result holds for any operator O, that is, the same sequence of outcomes m 1 , . . . , m S can be used to estimate, using only classical postprocessing, any expectation value. However, not all expectation values can be estimated with the same precision. In particular, note that the products can in principle result in variances scaling exponentially in N . This is so because, generally, each coefficient b can have an absolute value different from, and also larger than, one. Hence, in worst-case scenarios, the absolute value of such products, and therefore of the ω m defined in terms of linear combinations of them, can scale unfavorably with the system size (see App. C for a concrete example). This limitation can be overcome for fermionic problems by using fermion-to-qubit mappings such as the Bravyi-Kitaev (BK) [53] and especially the one recently proposed by Jiang et al. in Ref. [54] (to which we refer as JKMN mapping), which lead to Pauli strings with logarithmic weight (that is, such that fermionic creation/annihilation operators are mapped onto Pauli strings with at most a logarithmic number of non-identity Pauli operators). Since the terms b (i) 0mi , corresponding to the decomposition of identity, are always equal to one (recall that m Π (i) m = I (i) ), these mappings guarantee that the products In any case, it should be clarified that using other mappings does not necessarily imply an unfavorable scaling of the algorithm, as there may nevertheless exist POVM parameters for which the method is efficient. In fact, as we show in Sect. IV, the adaptive strategy that we present in what follows finds POVMs for which the algorithm outperforms the Pauli and grouped Pauli methods in evaluating the ground-state energy of molecular Hamiltonians using the parity [57] and Jordan-Wigner (JW) mappings as well.
Regarding the method proposed in Ref. [54], it should be mentioned that our Monte Carlo approach, given in Eq. (4), offers some advantages over the latter. On the one hand, it bypasses the classical overhead needed for tomographic reconstructions. On the other hand, and more importantly, our approach does not disregard the covariance-induced statistical errors in the estimation of the average resulting from parallel measurements. These points are discussed in more detail in App. C.

B. Classical gradient estimation for POVM optimization
Modification of the POVM results in a different probability distribution {p m }, as well as different weights ω m , and hence potentially different Var(ω m ). This can be exploited to devise an adaptive algorithm in which the measurement of O is optimized over the space of POVMs, that is, by finding one that minimizes the variance Var(ω m ). We now propose a classical postprocessing routine to navigate the space of POVMs toward low-variance ones. Essentially, besides using the outcomes obtained with the current POVM to construct an estimation of the target observable, the same set of data is also employed in a classical routine to assess the variance of other POVMs that have not previously been implemented on the quantum processor. Such procedure is explained in detail in the following.
Suppose that we want to evaluate the Monte Carlo variance Var(ω r ) for a new POVM defined in terms of local POVMs with effects {Γ where the ω r are given by the b kr matrices corresponding to these local POVMs, and Γ r = ri . The second term in Eq. (5) is the squared mean O 2 , which does not depend on the POVM. The first term, i.e. the second moment of ω r over the probability distribution , is the one that we must minimize.
Suppose further that we have already run some experiments on the quantum computer with another IC POVM given by the effects {Π rm are real numbers. Inserting these decompositions into the expres- (d-f) final POVM effects in the gradient optimization process, when starting from the SIC POVM 2, for a sample of 20 realisations from the data set of (a). Every POVM effect is mapped onto the three-dimensional unit-radius ball in a similar way as how single-qubit states are mapped onto the Bloch ball. In particular, the point r = (rx, ry, rz), | r| ≤ 1, is associated with the effect Π( r) = (| r|I + r · σ)/2 (note the difference with the Bloch ball representation of quantum states; see App. A). In the figure, the color indicates the qubit to which an effect corresponds, while the symbol identifies the effect itself among the possible four. The black symbols locate the initial effects, common to all realisations and qubits. Each panel presents the projection of the ball onto a different plane. The clustering of the points with equal color and symbol reveal that all realisations reach approximately the same optimal measurement. However, the result of the optimization is different for every qubit. Moreover, starting with SIC POVM 1 instead leads to a very different measurement (see [58]).
sion for the second moment, we obtain This last expression is also calculated in a hybrid Monte Carlo manner. More precisely, we can reuse the strings m 1 , . . . , m S obtained from the measurements on the quantum computer (sampled from the probability distribution {p m }) to estimate the variances of other POVMs by calculating, for each m s , the corresponding rimi ω r 2 classically. Note, however, that this last sum cannot always be computed efficiently, since it generally contains 4 N terms (both positive and negative), and involves products rimi that can scale exponentially in N . To ensure the feasibility of the procedure, we use a gradient descent approach for the optimization of the POVMs; in such case, only one of the terms in the product is different from one.
For concreteness, suppose that we use the effects {Π (i) m } corresponding to the point x in the POVM parameter space (see App. A) on the quantum computer and obtain S samples with which we can estimate the second moment ω 2 m . We can approximate the partial derivative of the second moment with respect to one of the parameters (for instance, the k-th), as ∂ x k ω 2 m ≈ ( ω r 2 − ω 2 m )/h, where ω r 2 is the estimated second moment corresponding to the POVM the coordinates of which in parameter space x fulfill x k = x k + h and x j = x j for j = k (let us denote the corresponding effects by {Γ (7) Using this method, all the partial derivatives can be calculated using classical post-processing, in polynomial time, of the same samples m 1 , . . . , m S obtained from the quantum computer. Once the gradient has been estimated, we can identify a new POVM with smaller expected variance than the previous one. We detail the gradient-based optimization used in this work in App. B.

C. On-the-fly optimization
An important aspect of the algorithm is that we do not need to first optimize the POVM (until it reaches a smallenough variance) before starting to estimate the expected value of the observable. The intermediate POVMs used in the process are also IC, so they can be used for the estimation of O as well. The strategy is to use the intermediate mean values obtained with every fixed choice of the POVM to calculate a weighted average. As we will show below, the latter is designed in a way that minimizes the resulting variance in the overall estimation. The whole procedure can be carried out iteratively as the algorithm progresses, thus effectively making use of all measurement results obtained during the intermediate POVM optimization steps for the reconstruction of O .
The above procedure can be recast in terms of an iterative algorithm as follows: 1. Initialize two variables,Ō andV , such thatŌ 1 → O andV 1 →V .
2. At the end of each iteration t ∈ (2, . . . , T ) of the POVM optimization, update them as (Ō tV At any point along the process, we have an estimated meanŌ with estimated standard errorV 1/2 that minimizes the overall error of the input data and can be easily updated with new ones. It is important to stress that this iterative mixing of the outcomes is unbiased, as we prove in App. D.

IV. NUMERICAL SIMULATIONS
In this section, we present the results of the numerical experiments that are run to test the feasibility and performance of our algorithm. Section IV A is aimed at illustrating the effect of the adaptive measurement. Section IV B presents a more in-depth analysis of the performance. Finally, in Sec. IV C we demonstrate an important feature of our approach: the IC data used for the estimation of the energy can be reused for other purposes. All the data used in this manuscript are available on Zenodo [59]. The source code used to generate the results is available online [60].

A. Energy measurement learning
We start by measuring the ground-state energy of the H 2 , LiH and H 2 O molecules. For the characterization of each system, we use different numbers of molecular orbitals. The basis set used for H 2 is 6-31G [61][62][63][64]  , and H2O (c)) with different measurement methods, with a total of S = 10 6 shots (for the Pauli and grouped Pauli methods, we use the same number of shots 10 6 /K on every Pauli string, so the total number of shots is in fact S = K 10 6 /K ; this represents a deficit of at most 0.1% in the total number of shots in the examples considered). The ground state is approximated by optimising a VQE ansatz. The estimation error is the absolute difference between the simulation results and the exact value for the optimized ansatz. The points represent the average error over 100 realisations and the error bars show a 95% confidence interval obtained using bootstrapping. For H2, our algorithm offers little improvement, but the difference in performance becomes clearer with the other two molecules. Note that the two initial POVMs yield slightly different results, with SIC POVM 1 generally outperforming SIC POVM 2. We also note that, in the cases involving more qubits, such as the 14-qubit H2O molecule with the BK mapping, the measurement optimization has not fully converged for S = 10 6 shots, so the difference with respect to the other methods is expected to increase for larger S, potentially reaching chemical accuracy earlier.  [54], the Jordan-Wigner (JW), and the parity [57] mapping transformations. The latter has an intrinsic property, deriving from spin up and spin down electron conservation, that reduces the number of qubits required by two [57]. We also leverage different symmetries present in each system to reduce further the qubit count [57]. For the case of LiH and H 2 O we also freeze the core orbitals allowing us to exclude another two spin orbitals from our calculation (refer to the table in Fig. 6 for more details on the Hamiltonians and qubit reductions considered). Each of these molecular Hamiltonians is mapped into qubits using one or more of the aforementioned techniques, hence producing several qubit Hamiltonians with varied number of qubits, which are then used to simulate the energy measurement process in a VQE experiment near convergence.
We proceed as follows. First, for each qubit Hamiltonian H, we numerically approximate the ground state with a hardware-efficient ansatz |ψ( θ) introduced in Ref. [15]. This generates a trial wave function by combining repetitive layers of single qubit R y gates and entangling blocks composed of two-qubit operations [controlled-NOT (CNOT) gates]. The single qubit rotations are parametrized with a set of angles (also known as variational parameters) that are iteratively updated, with the help of a classical optimization routine, in order to minimize the energy expectation value. Once we have the optimal parameters for which the variational form |ψ( θ opt ) approximates the ground-state wave function, we calculate the corresponding exact expected energy E = ψ( θ opt )|H|ψ( θ opt ) .
We then simulate different energy evaluation methods as a function of the number of state preparations (shots)Ē(S), and compute the corresponding errors |Ē(S) − E |. We also calculate the estimated statistical error for each approach, that is, the estimated error when the exact value E is not available (for the gradient-descent algorithm, this error is given byV 1/2 as defined in Sec. III C). These quantities are depicted in Fig. 2 (a-c) for three selected examples.
The effect of the measurement learning results in the error decreasing faster than S −1/2 , especially for small S. This is a consequence of the fact that, after each batch of runs, the next POVM used in the sequence is in principle more efficient (i.e. leads to a smaller variance) than the previous one. Importantly, even if the starting efficiency is lower than that of other methods, our algorithm eventually takes over and reaches better accuracy at lower costs. Moreover, as we discuss in detail in the next subsection, even the use of Eq. (4) with the initial POVM without optimization tends to give better performance than with the Pauli and the grouped Pauli methods, as the size of the problem increases. The results also reveal Number of shots Star required to achieve a target estimated error of tar = 0.5 mHa for H chains as a function of the number of qubits N . The qubit Hamiltonian is obtained using the JKMN mapping. For each method and molecule, we use up to S lim ≈ 10 6 runs, as in Fig. 3. If the average estimated error with S lim shots, lim = V 1/2 , where · represents the average over realisations, is still larger than tar, we estimate the required number of shots needed to reach it by assuming a scaling ∼ S −1/2 , that is, we use Star = S lim 2 lim / 2 tar . While this procedure saves us considerable computing time, it also overestimates the number of measurements needed by our algorithm: indeed, the convergence of our method to tar is faster than ∼ S −1/2 unless it has already converged to the optimal POVM (see Fig. 2). Thus, these results are to be regarded as an upper bound to the total measurement cost of the learning POVM method. The curves depict least squares fits to the data with functions of the form Star = aN b . The corresponding values of the exponent b for each method are reported in Table I. Note that the values found for the Pauli and grouped Pauli methods are consistent with the ones reported in Ref. [41]. Moreover, the performance of our algorithm is similar to that of the state-of-the-art method proposed in Ref. [41], especially for the lower values of N , for which the overestimation of Star is less significant. thatV 1/2 , as introduced in Sec. III C, gives the correct estimation of the statistical error in the evaluation of the energy [67].
The learning process is also illustrated in Fig. 2 (df), where we depict graphically the result of the optimization in terms of a geometric representation of the effects akin to the Bloch sphere for single-qubit states (see App. A for details). We only include the results for one example in the paper, but the results for all the Hamiltonians analysed in this work, as well as their animated version, are available online [58]. Interestingly, while the optimization eventually converges and different realisations with the same initial condition lead to the same minimum (modulo small fluctuations), the two initial conditions considered here (see App. C) result in different optimal POVMs with slightly different performance. This suggests the potential existence of better  Fig. 4, as well as their counterparts using other fermion-to-qubit mappings, are fitted to a function of the form Star = aN b . The table contains the corresponding exponents. The exponent b ≈ 6 of the Pauli method, as well as the mild reduction b ≈ 5.6 offered by grouped Pauli, are consistent with the values reported in Ref. [68] for other molecules. The POVM-based method without optimization already outperforms these results, with b ≈ 4.8 using the JKMN mapping [54]. The adaptive strategy results in a considerably smaller exponent b ≈ 3.3. Interestingly, a similar scaling is achieved also for the JW mapping; while this mapping leads to Pauli strings with weight of O(N ), the adaptive strategy is able to find POVMs for which the measurement process is efficient.
initial conditions than those explored here. This subject will be considered in future work.

B. Performance and scaling
While the previous results illustrate the working principles of the algorithm with three molecular Hamiltonians, we now turn our attention toward the analysis of its performance. In Fig. 3 and in App. E, we collect the errors of similar estimations for several other Hamiltonians corresponding to the same molecules (under different qubitreduction schemes) for a total number of measurements S ≈ 10 6 , from which it can be seen that our algorithm is advantageous in almost all cases, and particularly for LiH and H 2 O. Note that, since our algorithm is adaptive and the error decreases faster than S −1/2 , in contrast to the other methods, the advantage of the former would potentially increase for larger S. This is especially the case for the larger problems, for which the POVM-learning algorithm is further from convergence -and the error in the energy from chemical accuracy-at S = 10 6 shots.
In order to study the performance of the algorithm for larger Hamiltonians, we analyse the number of measurements required to reach an accuracy of 0.5 mHa as a function of the number of qubits for hydrogen chains with increasing number of atoms; arguably, this figure of merit is more informative of the usefulness of the approach in real applications, in which one is interested in determining the ground-state energy within some fixed accuracy, rather than obtaining the best performance for a fixed number of shots.
Due to limitations in computational power, we run our simulations for a limited number of measurements and extrapolate the total number required for such precision (see Fig. 4 for results using the JKMN mapping and caption for a detailed explanation). Even though this method overestimates the actual number of shots needed by our algorithm, we see a considerable improvement with respect to the Pauli and grouped Pauli methods. Interestingly, the bare hybrid quantum-classical Monte Carlo method without optimization, despite yielding higher errors for the small sizes considered here, also shows a more favorable scaling than the former methods.
To provide a more quantitative evaluation, we further fit each set of results into a function of the form aN b . We report the corresponding values of the exponent b, also including those for other fermion-to-qubit mappings, in Table I. Note that, while other mappings are added for completeness, the optimal performance of our algorithm is expected with the mapping from Ref. [54] (see Sec. III A), as confirmed by the results. Importantly, we can see that our method thus benefits from two improvements: the Monte Carlo approach results in a considerable reduction in the exponent, followed by a second scaling improvement stemming from the learning strategy. The result is an overall efficiency comparable to state-of-the-art methods [41,54].

C. Exploiting informationally complete data
Further numerical experiments demonstrate that the IC data collected for the estimation of the energy can indeed be reused for other purposes. As explained in Sec. III A, the same IC outputs can be post-processed to calculate any expectation value of our choice, the only limitation being that, as it is reasonable to expect, the optimization procedure targeting a particular observable may worsen the estimation of other specific quantities. In what follows, rather than focusing on particular additional observables, we consider an arguably more costly task: state tomography. More precisely, we address the reconstruction of all the k-qubit density operators in the system for all k ≤ K. Reduced tomography has recently attracted some interest in the quantum information literature for diverse purposes [39-47, 56, 69].
We thus proceed in a similar manner as in the previous subsections. We approximate the ground states by training VQE ansätze and then estimate the energy using the adaptive algorithm. The resulting data is then used to reconstruct all the k-qubit reduced density matrices using likelihood maximisation. In particular, for every subset of k qubits in the system, we marginalise the outcomes over the subset and then use the algorithm introduced in Ref. [70] to reconstruct the density operator. Since we must integrate IC data from T different POVMs in the likelihood maximisation procedure, we define a collective POVM with T ×4 N effects {Ξ (t,m) = Π (t) m S t /S, t ∈ [1, T ]}, where the index t indicates the POVM optimiza-tion step, S t represents the number of measurements carried out in iteration t, and S = t S t [71]. Once a k-qubit density matrix ρ tomo is reconstructed, we compute its infidelityF(ρ tomo , ρ exact ) = 1 − F(ρ tomo , ρ exact ), (where F(ρ tomo , ρ exact ) = Tr[ √ ρ tomo ρ exact √ ρ tomo ] 2 is the quantum fidelity) with respect to the exact one ρ exact (obtained by tracing out all other qubits in the trained VQE ansatz). In Fig. 5, we show the resulting average k-wise infidelity for the ground states of two molecules, H 2 and LiH, as a function of k, with and without gradient-based POVM optimization. We note that the density matrices can be reconstructed with high fidelity from the same data that was used for the estimation of the energy. Moreover, the comparison between these two methods reveals that the optimization of the POVM with respect to the precision in the estimation of energy also improves the fidelity of the reconstructed density matrices by up to an order of magnitude. In all cases, however, the infidelity increases with k, as expected.

V. DISCUSSION AND CONCLUSIONS
We introduce an algorithm for efficient observable estimation that exploits informationally complete generalized quantum measurements integrating three important components: a hybrid quantum-classical Monte Carlo, an efficient method to navigate POVM space toward lowvariance measurements, and a recipe to combine different estimations of the observable of interest. The result is a procedure in which an optimized measurement of an operator average is learnt in an adaptive fashion with no measurement overhead. Consequently, the overall measurement cost is drastically reduced with respect to the initial POVM considered. This is particularly interesting for real applications, considering that the initial SIC POVMs used already offer a significant improvement over other widely used methods, such as grouped Pauli. Importantly, the method does not require any exponentially scaling classical or quantum computations, although it does involve a modest polynomial classical overhead.
We have illustrate the potential of the approach with several proof-of-principle numerical experiments by reconstructing the ground-state energies of several molecular Hamiltonians. Importantly, our simulations suggest that this adaptive method exhibits scaling performance comparable to those of the most efficient measurementreduction techniques in the current literature. While confirmation of this calls for a more thorough analysis and simulations, possibly including more general operators than molecular Hamiltonians, it is also important to point out that there is still substantial room for improvement in our algorithm, especially in the parametrisation of the POVMs and in the gradient-descent-based update schedule. It might also be interesting to investigate, in a future work, the potential use of classical machine-learning methods to enhance the measurementadaptation step [49].
Our algorithm also offers some other intrinsic advantages. Being completely agnostic to the nature of the qubit Hamiltonian, and not inspired by quantum chemistry but by quantum information alone, the proposed procedure may find interesting applications beyond VQE calculations. The method is also formally exact, as no approximations are made at any point, except for using the estimated variances as proxies of the actual ones. Moreover, the informationally complete data produced during the measurement process for a particular observable can in principle be reused to calculate many other properties of the underlying quantum state, including its tomographic reconstruction. We provide evidence of the feasibility of this prospect by performing high-fidelity reduced state tomography with no additional measurements.
In this paper, we have only consider the task of estimating a given observable for a fixed quantum state. This typically represents a single step of, e.g., a VQE calculation. In perspective, one could, however, easily integrate our proposed method as a subroutine of the whole ansatzoptimization method. In such case, it might actually be helpful to use the optimal POVM from the previous VQE step, or a slight modification of it, as the starting point of the measurement optimization on the updated ansatz, given that the trial wave function should undergo relatively small changes between consecutive iterations. While this stands as a hypothesis for now, it might reduce the average number of steps required to adjust the POVM settings, hence leading to an even larger reduction in the measurement costs associated with the overall ansatz-optimization process.
Admittedly, our contribution presents a drawback: it requires twice as many qubits. However, it is important to discuss what this entails in practice. The ancillary qubits used for the implementation of the POVM are initialised in the ground state, along with the rest of the qubits in the device, but no operations are applied on them until the measurement stage. Hence, the algorithm introduced here does not require the entanglement of 2N qubits throughout the whole computation, which would amplify the detrimental effect of decoherence. Instead, the additional N qubits should be regarded as nothing other than part of the measurement apparatus. In addition, we note that the method offers a significant advantage over a simpler use of the additional qubits, such as, e.g., executing two grouped Pauli iterations in parallel, which, albeit cutting the total run time by up to a factor of 2, would not improve the scaling of such method.
Finally, it is worth discussing some relevant aspects regarding its implementation on real hardware, especially on near-term quantum computers. On the one hand, in devices with limited connectivity, additional SWAP gates may be required in order to enable the interaction between system and ancillary qubits. Importantly, the topology of most currently existing platforms enables the additional SWAP gates, when needed, to be parallelized in such a way that the measurement circuit preserves its size independence. This highlights a favorable aspect of the algorithm: since only a constant-depth measurement circuit is required (namely, the application of a two-qubit gate instead of a single-qubit one, and perhaps some SWAP gates if the connectivity requires so, for every system qubit), the measurement process itself is not expected to introduce significant decoherence effects with respect to applying standard Pauli measurements.
Moreover, the commonly used readout noise-reduction techniques, such as the algorithms integrated in Qiskit or any other error-mitigation strategies that would be used for basic Pauli measurements, can be used here to correct the outcome statistics. While a proper assessment of the performance of the method under real noise conditions, as well as possible specific noise-mitigation strategies, is beyond the scope of this work, these considerations suggest that the ideas introduced in this work can play an important role in enabling the first useful applications of quantum computing for quantum chemistry, so far estimated to require prohibitive computing times. As stated in the main text, the algorithm relies on parametrized, informationally complete POVMs implemented through the application of two-qubit unitaries with ancillary qubits, followed by projective measurements on the computational basis. To explain the parametrisation used in this work, it is easier to start by identifying the POVM characterizing one such measurement when applying an arbitrary unitary gate U between some system qubit q in state ρ and an ancilla a in state |0 0|. Since the two qubits are eventually measured projectively in the computational basis, there are four possible outcomes (b q , b a ) with b q ∈ {0, 1} (and similarly for b a ). Each outcome occurs with probability p (bq,ba) = b q b a | U ρ ⊗ |0 0| U † |b q b a . Writing U = ijkl u ij kl |ij kl|, this expression becomes p (bq,ba) = kk u bqba k0 (u bqba k 0 ) * k| ρ |k = Tr π (bq,ba) π (bq,ba) ρ , where we have defined π (bq,ba) = k (u bqba k0 ) * |k . Hence, the corresponding POVM is given by the set of effects {Π i = |π i π i | , i ∈ [0, 3]}, where we have relabelled the outcomes using i = 2b q + b a .
The previous calculation suggests our strategy for the POVM parametrisation: parametrize the unitary U , and compute the resulting POVM. The following observations are important. Firstly, not all the components u bqba kl are relevant for the measurement, as the initial state of the ancilla deems those with l = 1 irrelevant (provided that U is unitary). Secondly, global phases on |π i have no effect on the resulting operator Π i , so we are free to set u are the components of two orthonormal vectors, which we may call u 0 and u 1 in what follows, in C 4 . Before we proceed any further, let us count the total number of available degrees of freedom. On the one hand, we have four real numbers whose squares add up to one for u 0 , which amounts to 3 degrees of freedom. For u 1 , we have four complex numbers with three constraints (one for normalisation and two for the orthogonality with u 0 ), which results in 5 degrees of freedom. In total, we need 8 parameters per system qubit.
Our parametrisation for single-qubit POVMs thus consists of 8 real numbers x = (x 0 , . . . , x 7 ), with x i ∈ (0, 1), ∀i (in practice, we constrain the values further, see App. B). We start by using the first three of these to produce the set of angles (πx 0 , πx 1 , 2πx 2 ), which identify (uniquely) a point on a 3-sphere S 3 with unit radius embedded in R 4 . The corresponding Euclidean coordinates in the embedding space are four real numbers whose squares add up to one, hence generating u bqba 00 . Defining u bqba 10 from the other five parameters is slightly more involved. To guarantee that the vector u 1 is orthogonal to u 0 , we construct it as a linear combination of orthonormal vectors orthogonal to u 0 , that is, u 1 = i z i u ⊥ i ; the orthonormal basis {u ⊥ i } can be found by means of the Gram-Schmidt orthonormalisation. The components z i , which must also be normalised, are determined by the remaining parameters: once again, we define a list of angles (πx 3 , . . . , πx 6 , 2πx 7 ) and calculate the Euclidean coordinates of the corresponding point in S 5 . These six real numbers {r i , i ∈ [0, 5]} are then used to define three components {z k = r 2k + ir 2k+1 }. The result of this procedure is a vector u 1 ∈ C 4 whose components can be identified with u bqba 10 . Finally, we must find two more vectors u 2 , u 3 ∈ C 4 to complete the missing terms u bqba k1 in the definition of the unitary. This can be done by using the Gram-Schmidt orthonormalisation once more. Once the unitary U is defined, we can not only calculate the corresponding set of effects {Π i }, but also implement it in a given circuit. Indeed, the algorithms to find the circuit decomposition of unitary U are known and readily implemented in Qiskit [72] (also, note that any two-qubit gate can be decomposed in up to three CNOT gates).
Admittedly, this methodology is more complicated than simply parametrising arbitrary two-qubit gates U and then calculating the corresponding POVM. However, as discussed above, our procedure avoids the use of unnecessary or redundant parameters, which could make the POVM optimization harder. Nevertheless, it is likely that other parametrisations, more suitable for the adaptive optimization algorithm, exist. These refinements, as well as improving the gradient descent protocol (see App. B), will be the subject of future work.

Appendix B: Gradient descent protocol
Along the measurement process, we iteratively update the POVM parameters as well as the number of shots per experiment. In particular, we gradually increase the number of shots in order to have more precise estimations of the second moment as the POVM parameters approach a minimum and, consequently, the gradient decreases in magnitude. In this section, we briefly outline the protocol used in our numerical experiments.
As explained in the main text, the POVM-based measurements allow us to estimate the gradient ∇ x ω 2 m classically from the outcomes of an experiment run with the POVM corresponding to parameters x t , where t labels the iteration (for the finite-difference partial derivatives ∂ x k ω 2 m ≈ ( ω 2 r − ω 2 m )/h, we use h = 10 −3 ). With these elements, we determine the POVM to be used in the (t + 1)-th iteration through where |∇ x ω 2 m | is to be understood as the set of absolute values of the components of ∇ x ω 2 m . Hence, ν is the magnitude of the largest change, in absolute value, of the POVM parameters. It should also be mentioned that, to avoid numerical instabilities, we further constrain every parameter to be between [δ, 1 − δ], with δ = 0.05. We start our simulations with S 1 = 1000 shots, and we use ν = 0.05. Every three iterations, we update S t + 1000 → S t+1 and ν/1.2 → ν. Hence, as the algorithm approaches the minimum, we obtain more precise estimations of the gradient (larger S t ) and we make smaller changes to the parameters (smaller ν).
This parameter updating schedule is rather heuristic and still leaves room for improvement. Designing a more theory-driven approach, or using more sophisticated optimization techniques, will be the subject of future work.

Appendix C: Symmetric IC POVMs as initial measurements and correlated estimators
In the absence of prior knowledge about the state of the qubit register, it is desirable to use a so-called symmetric informationally complete POVM (SIC POVM) on every system qubit. Symmetric here means that its single-qubit effects, when rescaled asΠ i = 2Π i yield a set of projectors {Π i :Π 2 i =Π i } fulfilling Tr[Π iΠj ] = (2δ ij + 1)/3), ∀i, j. Hence, the projectors {Π i } form a regular tetrahedron in the Bloch sphere.
In this work, we have considered two different SIC POVMs as initial conditions for the adaptive algorithm. The first one is the classic example of singlequbit SIC POVM, defined in terms of the projec- The second SIC POVM used in this paper was considered by Jiang et al. [54] and is another standard setting [73][74][75]. In order to use them in our algorithm, we must first find the parameters x of each of them in the POVM space (see App. A). This can be done numerically; the resulting parameters are reported in the computer code accompanying this paper [60].
It is worth discussing some properties of this second SIC POVM when used in our hybrid quantum-classical Monte Carlo algorithm, Eq. (4). In this case, all the b 3, √ 3}, ∀k > 0 (for k = 0, these are equal to one, since the effects add up to identity). This, in turn, has interesting implications. Let us consider the statistical error in the estimation of the expectation value of an observable given by a single Pauli string P k with weight l, that is, only l Pauli operators in P k are different from identity. In this case, the variance of the Monte Carlo is given by Var(ω m ) = 3 l − P k 2 ≤ 3 l . Hence, if S measurements are performed, the variance of the estimatorP k is Var(P k ) ≤ 3 l /S. This is indeed consistent with Ref. [54].
While we can reuse the IC data from the quantum computer to calculate the expectation value of other Pauli strings P k with similar statistical error (assuming they have the same weight l), we must take into account that the resulting estimatorsP k andP k can be correlated. In practice, this means that, if we are to use them to calculate the expectation value of an operator defined in terms of a linear combination of Pauli strings, O = k c k P k , the variance of the estimator O = k c kPk depends on the potentially non-zero covariance between distinct terms, so we cannot assume that Var(Ō) = k |c k | 2 Var(P m ).
The estimation based on the Monte Carlo method, Eq. (4), naturally takes into account these correlations when accounting for the statistical error of the approach,  FIG. 6. (Left) Error in the estimation of the energy of an optimal VQE circuit for all the Hamiltonians reported in the table on the right. Every column compares the results for one molecule (H2, LiH, and H2O) with different measurement methods, with a total of S = 10 6 shots. Each row corresponds to a different mapping. Points represent the average error over 100 realisations and the error bars show a 95% confidence interval obtained using bootstrapping. We note that, in some cases, especially those involving more qubits, like the 14-qubit H2O molecule with the BK mapping, the measurement optimization has not fully converged for S = 10 6 , so one would expect more notable differences with respect to the other methods for larger S. (Right) A table of the various combinations of molecule, mapping, basis and qubit-reduction techniques considered, with the corresponding number of qubits N . TQR is the two-qubit reduction for the parity mapping, Z2 refers to qubit reductions due to discrete symmetries [57] and CF denotes core freeze.
hence yielding the correct estimation. This is important for two reasons. On the one hand, it provides a meaningful assessment of how far the algorithm is from reaching the required accuracy at any given point of its execution.
On the other hand, since the Monte Carlo variance is the quantity that our adaptive strategy seeks to minimize, the algorithm presented here can potentially find POVMs for which the negative impact of these correlations on the estimated mean is reduced.
Appendix D: Sequential and one-step mixing equivalence In this section we prove that the sequential estimation mixing presented in the main text is unbiased. To show this, let us first compute the unbiased one-step mixing estimation. Suppose that, after the different experiments have been run, we are left with a set of T estimated means {Ō t } and variances {V t }. We would like to find a set of weights {α t > 0}, with t α t = 1, that minimizes the varianceV T = t α 2 tVt ofŌ T = t α tŌt . To do so, we can introduce a Lagrange multiplier λ and define so that ∂ λ L = 0 imposes the constraint t α t = 1. From ∂ αt L = 0 we obtain α t = λρ t /2, where we have defined ρ t ≡ 1/V t to ease the presentation, as inverse variances will appear throughout. Using now t α t = 1 yields λ = 2/ iρ i and α t =ρ t / iρ i . Hence, we arrive at To assess the result of the sequential algorithm, note that the recurrenceV t =V t−1Vt /(V t−1 +V t ) in the second step is equivalent toρ t =ρ t−1 +ρ t , withρ t ≡ 1/V t . Iterating, we obtainρ T = T t=1ρ t , which is the rightmost term in Eq. (D2). Similarly, the recurrence for the mean,Ō t = (Ō tVt−1 +Ō t−1Vt )/(V t−1 +V t ), reads O t = Ō tρt +Ō t−1ρt−1 /ρ t . Iterating once again, we obtain the expression forŌ T in Eq. (D2). Hence, both estimations are equivalent.