Active Learning for Quantum Mechanical Measurements

The experimental evaluation of many quantum mechanical quantities requires the estimation of expectation values of several directly measurable observables, such as local observables. Since experiments must be repeated on individual quantum systems in order to estimate expectation values of observables, the question arises of how many repetitions to allocate to each directly measurable observable. We show that an active learning scheme can help to improve such allocations, and that the resulting decrease in the number of experimental repetitions required to evaluate a quantity with the desired accuracy grows with the size of the underlying quantum mechanical system.


I. INTRODUCTION
There is quite a discrepancy between the quantum mechanical observables that can be measured in principle and those that can be measured in practice. Restrictions to local observables, or even to preferred measurement bases, are common even in highly controllable experiments with synthetic quantum systems. This makes it necessary to infer quantities that cannot be measured directly from observables that can be measured.
While it is well understood how to express a given quantity in terms of expectation values of practically measurable observables [1][2][3][4], any such decomposition implicitly assumes that the expectation values of all involved observables are known. The probabilistic nature of quantum mechanical measurements, however, implies that expectation values can only be estimated with finite accuracy, and any improvement in accuracy requires more repetitions of the same measurement.
The necessity to estimate expectation values of several observables, and to increase the number of experimental repetitions in order to improve any such estimate, raises the question of how to allocate experimental resources to the different observables. Lacking a good basis for a different choice, each observable is typically allocated the same number of experimental repetitions [5][6][7].
Active learning (AL) [8,9] is a type of machine learning technique aimed at optimal experimental design. AL has been applied successfully to improve performance in many scenarios, such as speech recognition [10], image retrieval [11], classification tasks [12], quantum information retrieval [13] and quantum state tomography [14]. In contrast to passive machine learning models [15], which learn from randomly selected data, AL aims to minimise the required training resources by interactively selecting the most informative samples, so as to maximise the information gain at each step. AL is thus ideally suited for the design of quantum experiments, where a crucial goal is to develop an optimal strategy that minimises the measurement resources.
A key ingredient of the AL scheme is an estimate of how informative a measurement of each directly measurable observable is, which determines the observables to query in consecutive measurements. Concentration inequalities [16][17][18] provide a sound foundation for bounding the uncertainty of estimates based on a limited amount of data, especially when these data are generated by random variables with unknown distributions. Concentration inequalities are therefore well suited for constructing query strategies based on the outcomes of quantum mechanical measurements.
In this paper, we develop an adaptive AL scheme that actively decides which observable to measure in each repetition of an experiment in order to obtain the largest expected improvement of the estimate of a quantity that cannot be measured directly. We show that a dynamical allocation of measurements can decrease the total number of repetitions required to estimate a given quantity with the desired accuracy.

II. DECISIONS BETWEEN OBSERVABLES
Most quantum mechanical quantities of interest cannot be assessed in terms of a single observable. This might be due to practical limitations, as is the case for the fidelity with respect to an entangled state or for an entanglement witness: strictly speaking, each of those is a regular observable, but the practical restriction to measurements of single-qubit observables and correlations thereof implies that several observables need to be measured before the expectation value of the observable of interest can be estimated [2,19,20]. This might also be due to more fundamental reasons, as is the case for the gate fidelity [5,21,22]: since its experimental evaluation requires the implementation of a gate starting from a complete set of different initial states, estimating a gate fidelity requires several independent measurements even if there were no practical restriction to local observables.
In the following, we will thus consider a general quantum mechanical quantity $Q$ whose experimental evaluation requires estimating the expectation values $\langle S_i\rangle$ of $M$ independent observables $S_i$. Since $Q$ is a function of the expectation values $\langle S_i\rangle$, the accuracy of the estimate of $Q$ depends on the accuracy with which the expectation values $\langle S_i\rangle$ are estimated. Crucially, the accuracy of each of these estimates depends not only on the number of measurement repetitions, but also on the actual underlying quantum state [23]. At any given number of repetitions of a $\sigma_z$ measurement, for example, the estimate of $\langle\sigma_z\rangle$ is more accurate for a state that is close to a $\sigma_z$ eigenstate than for a state that yields more balanced probabilities of the two possible measurement outcomes, since the variance $1 - \langle\sigma_z\rangle^2$ of the outcomes is smaller in the former case.
Given an unknown quantum state, one thus cannot find an optimal allocation of measurement repetitions to the $M$ different observables in advance. Only as data is being taken can one estimate the accuracy of the different expectation values, and this information can then be used to decide how to allocate measurement repetitions in subsequent experiments.
In the following, we will thus consider the situation in which several observables $S_i$ have been measured with $n_i$ repetitions each. Based on the accumulated data, one can estimate the expectation values $\langle S_i\rangle$, and thus the value of the quantity $Q$ of interest, with a finite accuracy that is limited by the amount of data, i.e. by the numbers $n_i$ of measurement repetitions. We will derive a decision rule based on AL that identifies the observable whose measurement results in the largest available decrease of the inaccuracy in the estimate of $Q$. With several numerical examples, we will demonstrate that taking data according to this AL scheme can substantially decrease the number of experiments required to estimate the value of $Q$ with the desired accuracy, and that the gain grows with increasing system size.

A. Concentration Inequalities
The law of large numbers [24] states that the expectation value $\langle S\rangle$ of a physical observable $S$ is typically approximated well by the empirical expectation value
$$\langle S\rangle_e = \frac{1}{n}\sum_{i=1}^{n} s_i, \tag{1}$$
where $s_i$ is the result obtained in the $i$-th repetition of the measurement of the observable $S$. The uncertainty in the estimate of $\langle S\rangle$ decreases with the number $n$ of measurement repetitions. This uncertainty can be expressed in terms of concentration inequalities, which state that the upper bound
$$\big|\langle S\rangle - \langle S\rangle_e\big| \le \epsilon(n, \delta) \tag{2}$$
on the deviation between $\langle S\rangle$ and $\langle S\rangle_e$ holds with probability at least $1 - \delta$.
The explicit form of the upper bound $\epsilon(n, \delta)$ depends on the underlying problem. The Empirical Bernstein Bound [18,25,26], which is formulated in terms of the empirical variance
$$v_e = \frac{1}{n-1}\sum_{i=1}^{n}\big(s_i - \langle S\rangle_e\big)^2, \tag{4}$$
applies to a wide range of problems. In turn, however, it is not necessarily the best available bound for specific problems. In the case of independent repetitions of a measurement with only two distinct outcomes (a dichotomic observable), a bound $\epsilon_D(n, \delta)$ formulated in terms of the actual variance $v$ applies [16]. While this bound generally provides a better estimate of the accuracy of $\langle S\rangle_e$ than the Empirical Bernstein Bound, it has the disadvantage that it is not formulated in terms of the empirical variance $v_e$, but rather in terms of the actual variance
$$v = p(1-p)(s_1 - s_2)^2, \tag{6}$$
which depends on both the actual probability $p$ to obtain one of the two outcomes and the values $s_1$ and $s_2$ that the observable $S$ can adopt (i.e. $s_i \in \{s_1, s_2\}$).
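As a rough illustration of the two kinds of bound, the sketch below evaluates the standard two-sided empirical Bernstein bound in the form of Maurer and Pontil and the textbook Bernstein bound with the actual variance $v = p(1-p)(s_1-s_2)^2$. Both are assumptions standing in for Eqs. (3) and (5): the constants and confidence conventions used in this work may differ. Outcomes are assumed to lie in an interval of width `value_range` (2 for a $\pm 1$ observable), and all function names are hypothetical.

```python
import math

def empirical_variance(samples):
    """Unbiased empirical variance (n - 1 in the denominator); needs n >= 2."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((s - mean) ** 2 for s in samples) / (n - 1)

def eps_empirical_bernstein(samples, delta, value_range):
    """Two-sided empirical Bernstein bound (Maurer-Pontil form), rescaled to
    outcomes in an interval of width value_range; delta/2 per side gives ln(4/delta)."""
    n = len(samples)
    log_term = math.log(4 / delta)
    v_e = empirical_variance(samples)
    return (math.sqrt(2 * v_e * log_term / n)
            + 7 * value_range * log_term / (3 * (n - 1)))

def eps_bernstein_dichotomic(n, delta, p, s1, s2):
    """Two-sided Bernstein bound using the actual variance v = p(1-p)(s1-s2)^2.
    Needs p, which is exactly what the experiment is trying to estimate."""
    v = p * (1 - p) * (s1 - s2) ** 2
    log_term = math.log(2 / delta)
    return math.sqrt(2 * v * log_term / n) + 2 * abs(s1 - s2) * log_term / (3 * n)
```

For a $\sigma_z$ measurement with outcomes $\pm 1$ one would pass `value_range=2.0`. The larger $1/(n-1)$ term of the empirical version is the price paid for not knowing $v$.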
Since estimating the expectation value of $S$ or, equivalently, the value of the probability $p$ is the goal of the experiment, the actual variance $v$ is unknown, so that $\epsilon_D(n, \delta)$ in Eq. (5) cannot be evaluated in practice.
A natural remedy seems to be to replace the actual variance $v$ by its empirical counterpart $v_e$ as defined in Eq. (4). In cases with close-to-certain outcomes (i.e. $p(1-p) \simeq 0$), however, the empirical variance $v_e$ tends to be smaller than the actual variance $v$, so this replacement would underestimate the uncertainty of the empirical expectation values of observables with low variance. Any algorithm that decides which observables to measure on the basis of such uncertainty estimates would thus perform too many measurements of observables with high variance and too few measurements of observables with low variance. In order to find a decision rule that results in close-to-optimal choices of the observables to measure, we aim at a rule that combines the benefit of being defined in terms of the empirical variance (as in Eq. (3)) with the suitability for dichotomic observables (as in Eq. (5)).
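The underestimate can be made concrete: for a dichotomic observable, $v_e$ is exactly zero whenever all $n$ outcomes coincide, which happens with probability $(1-p)^n + p^n$, even though $v = p(1-p)(s_1-s_2)^2 > 0$. The short sketch below evaluates both quantities for $\pm 1$ outcomes; the function names are illustrative.

```python
def prob_zero_empirical_variance(p, n):
    """Probability that all n outcomes of a dichotomic measurement coincide,
    in which case the empirical variance v_e is exactly zero."""
    return (1 - p) ** n + p ** n

def actual_variance(p, s1=1.0, s2=-1.0):
    """v = p (1 - p) (s1 - s2)^2 for a two-outcome observable."""
    return p * (1 - p) * (s1 - s2) ** 2

for p in (0.5, 0.1, 0.01):
    print(f"p={p}: v={actual_variance(p):.4f}, "
          f"P(v_e=0, n=20)={prob_zero_empirical_variance(p, 20):.3f}")
```

For $p = 0.01$ and $n = 20$, $(0.99)^{20} \approx 0.82$, i.e. roughly four out of five such runs report $v_e = 0$ and hence an overly optimistic uncertainty if $v$ is simply replaced by $v_e$.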
The heuristic ansatz $\epsilon_M(n, \delta)$ of Eq. (7) includes two additional terms as compared to Eq. (5). With their $1/n$ dependence, they become negligible in the limit $n \to \infty$. The last term ensures that $\epsilon_M(n, \delta)$ does not vanish in the case of only a few measurements ($n \sim 1$). The second term in Eq. (7) vanishes exactly if $v_e$ adopts its maximal value, and it is largest for vanishing empirical variance. As such, it provides the desired modification that compensates for the underestimate of low variances.

B. Uncertainty Reduction
With the ability to estimate the uncertainty of the empirical expectation values of directly measurable observables, one can also estimate the uncertainty of the empirical estimate $Q_e$ of the composite quantity of interest. Even though this is not strictly necessary, we restrict the following discussion to quantities that depend linearly on the expectation values $\langle S_i\rangle$, since this is the case for quantities like the fidelity with respect to a pure state or a unitary gate. Non-linear quantities, such as the von Neumann entropy, would require a generalisation that is feasible but would make the following discussion unnecessarily technical.
For any given linear function $Q = \sum_i a_i \langle S_i\rangle$ with scalar factors $a_i$, the empirical estimate of $Q$ reads $Q_e = \sum_i a_i \langle S_i\rangle_e$, and the uncertainty of $Q_e$ can be estimated with the inequality
$$|Q - Q_e| \le \sum_i |a_i|\,\epsilon_i, \tag{8}$$
where $\epsilon_i$ is the bound on the uncertainty of $\langle S_i\rangle_e$ following Eq. (7).
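The step behind Eq. (8) is the triangle inequality. If each individual bound $|\langle S_i\rangle - \langle S_i\rangle_e| \le \epsilon_i$ holds with probability at least $1-\delta$, the union bound guarantees that all $M$ of them, and hence Eq. (8), hold simultaneously with probability at least $1 - M\delta$; how the overall confidence level is split among the observables is a choice this sketch does not take from the text.

```latex
\begin{aligned}
|Q - Q_e| &= \Big|\sum_{i=1}^{M} a_i\big(\langle S_i\rangle - \langle S_i\rangle_e\big)\Big| \\
          &\le \sum_{i=1}^{M} |a_i|\,\big|\langle S_i\rangle - \langle S_i\rangle_e\big| \\
          &\le \sum_{i=1}^{M} |a_i|\,\epsilon_i .
\end{aligned}
```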
The goal at hand is to decrease the inaccuracy of $Q_e$ by identifying the observable whose measurement results in the largest possible decrease of the right-hand side of Eq. (8). To this end, it is desirable to estimate how each of the bounds $\epsilon_i$ would change if an additional repetition of the measurement of $S_i$ were performed.
Given the dependence of the bounds in Eq. (7) on the empirical variance, this prediction can be made only approximately. Leaving aside situations with extremely sparse data (i.e. $n_i \sim 1$), the change in the empirical variance following an additional measurement is expected to be negligible. In this approximation, the expected uncertainty reduction of the estimate $\langle S_i\rangle_e$ can be quantified as
$$\Delta_i = \epsilon_i(n_i) - \epsilon_i(n_i + 1), \tag{9}$$
where both $\epsilon_i(n_i)$ and $\epsilon_i(n_i + 1)$ follow Eq. (7) with the empirical variance $v_e$ based on $n_i$ measurements. If all the observables $S_i$ are pairwise non-commuting, the largest available reduction in the uncertainty of $Q_e$ is achieved by measuring the observable $S_i$ that yields the largest value of $|a_i|\Delta_i$. If some observables within the set $\{S_i\}$ commute, it is essential to take into account that commuting observables can be measured in the same run of an experiment. Instead of focusing on individual observables $S_i$, an algorithm should then focus on groups $G_i$ of observables such that all observables in any group commute pairwise. The expected uncertainty reduction of $Q_e$ upon measurement of the observables in $G_i$ is given by
$$w_i = \sum_{S_j \in G_i} |a_j|\,\Delta_j, \tag{10}$$
and the group of observables with the largest uncertainty reduction should be measured.
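A minimal sketch of the selection rule based on $\Delta_i$ and $w_i$, assuming the empirical Bernstein form as a hypothetical stand-in for $\epsilon_i$ (the heuristic bound of Eq. (7) is not reproduced here). The empirical variance is frozen at its current value, as in the approximation above; `groups` lists indices of pairwise commuting observables, and all names are illustrative.

```python
import math

def eps_stand_in(n, v_e, delta=0.05, value_range=2.0):
    """Hypothetical stand-in for Eq. (7): empirical Bernstein form (needs n >= 2)."""
    log_term = math.log(4 / delta)
    return (math.sqrt(2 * v_e * log_term / n)
            + 7 * value_range * log_term / (3 * (n - 1)))

def expected_reduction(n, v_e, eps=eps_stand_in):
    """Delta_i = eps(n_i) - eps(n_i + 1), with the empirical variance held fixed."""
    return eps(n, v_e) - eps(n + 1, v_e)

def select_group(groups, coeffs, counts, variances, eps=eps_stand_in):
    """Index of the group G with the largest w = sum_{j in G} |a_j| Delta_j (Eq. (10))."""
    def w(group):
        return sum(abs(coeffs[j]) * expected_reduction(counts[j], variances[j], eps)
                   for j in group)
    return max(range(len(groups)), key=lambda g: w(groups[g]))
```

For instance, `select_group([[0], [1]], [0.1, 0.5], [10, 10], [0.5, 0.5])` picks the second observable, because its larger weight $|a_i|$ multiplies an otherwise equal $\Delta_i$. For non-commuting observables, each group simply contains a single index.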

C. Active Learning Algorithm
With the ability to identify the observables whose measurement results in the largest uncertainty reduction of the empirical estimate of the quantity $Q$ of interest, we can finally formulate the desired active learning algorithm, which comprises the following steps:
(i) Since no meaningful decision can be taken without any data, it is necessary to initialise the estimation with some measurements. As arbitrary choices should be kept to a minimum, this initialisation is restricted to the minimal requirement for evaluating Eq. (7). While a single shot is the minimum required to construct an empirical expectation value, at least two shots are required to construct an empirical variance (Eq. (4)). The initialisation thus includes two shots of each of the observables $S_i$ or each of the groups $G_i$.
(ii) Once there is enough data to estimate the expected uncertainty reductions $w_i$ (Eq. (10)), the observable $S_i$ or group $G_i$ with the largest expected reduction is selected to be measured next. The outcome of this measurement is added to the accumulated data, and this step is repeated as long as necessary or desired.
(iii) The repetition of step (ii) is ended once the empirical estimate $Q_e$ of $Q$ has reached the desired accuracy.
In the explicit implementations of this algorithm discussed below in Sec. IV, this process of estimating $Q_e$ is compared with a more conventional approach, in which step (ii) is replaced by a selection of observables $S_i$ or groups $G_i$ from a fixed list, such that each observable or group is measured approximately equally often.
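Steps (i) to (iii) and the fixed-list baseline can be put together in a small simulation. This is a sketch under several assumptions not taken from the text: each observable is a non-commuting $\pm 1$ observable with a hypothetical probability `p_plus` of the outcome $+1$, the empirical Bernstein form stands in for Eq. (7), and step (iii) stops once the right-hand side of Eq. (8) falls below a target value (the figure of merit used in the examples below is instead the spread $\sigma(n_T)$ over many realisations).

```python
import math
import random

def eps_bound(n, v_e, delta=0.05, value_range=2.0):
    """Hypothetical stand-in for Eq. (7): empirical Bernstein form (needs n >= 2)."""
    log_term = math.log(4 / delta)
    return (math.sqrt(2 * v_e * log_term / n)
            + 7 * value_range * log_term / (3 * (n - 1)))

class Observable:
    """Running statistics of a simulated +/-1 observable."""

    def __init__(self, p_plus, rng):
        self.p_plus, self.rng = p_plus, rng
        self.n, self.total = 0, 0.0

    def measure(self):
        self.total += 1.0 if self.rng.random() < self.p_plus else -1.0
        self.n += 1

    def variance(self):
        # For +/-1 outcomes the sum of s^2 equals n, so v_e = n (1 - mean^2) / (n - 1).
        mean = self.total / self.n
        return max(0.0, self.n * (1.0 - mean * mean) / (self.n - 1))

    def eps(self, extra=0):
        # Bound after n + extra shots, with v_e frozen at its current value.
        return eps_bound(self.n + extra, self.variance())

def total_bound(coeffs, obs):
    """Right-hand side of Eq. (8)."""
    return sum(abs(a) * o.eps() for a, o in zip(coeffs, obs))

def run(coeffs, p_plus, target, active, seed=1):
    rng = random.Random(seed)
    obs = [Observable(p, rng) for p in p_plus]
    for o in obs:                                 # step (i): two shots each
        o.measure()
        o.measure()
    shots, turn = 2 * len(obs), 0
    while total_bound(coeffs, obs) > target:      # step (iii): stop at target
        if active:                                # step (ii): largest |a_i| Delta_i
            i = max(range(len(obs)),
                    key=lambda k: abs(coeffs[k]) * (obs[k].eps() - obs[k].eps(extra=1)))
        else:                                     # baseline: fixed cyclic list
            i = turn % len(obs)
            turn += 1
        obs[i].measure()
        shots += 1
    return shots, total_bound(coeffs, obs)

if __name__ == "__main__":
    weights, probs = [0.7, 0.1, 0.1, 0.1], [0.5] * 4
    print("AL:         ", run(weights, probs, 0.2, active=True))
    print("round-robin:", run(weights, probs, 0.2, active=False))
```

With weights `[0.7, 0.1, 0.1, 0.1]`, the greedy rule concentrates shots on the heavily weighted observable, whereas the cyclic list spends equally many shots on the lightly weighted ones, so the AL run typically reaches the target with fewer shots.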

IV. ESTIMATION OF PHYSICAL PROPERTIES WITH ACTIVE LEARNING
This section exemplifies the process of estimating state fidelities and gate fidelities in detail, and discusses how the benefits of AL depend on the number of qubits in the underlying system. All of the subsequent examples are based on numerically simulated measurement outcomes, generated randomly according to the quantum mechanical probabilities.

A. State Fidelity
A typical example of a quantity of frequent interest is the fidelity of a given state $\varrho$ with respect to a pure state $|\Psi\rangle$ [2]. In composite quantum systems, it can hardly ever be measured directly, but it can be cast into a weighted sum of expectation values of directly measurable observables. For any complete set of observables $S_i$ that are mutually orthogonal with respect to the Hilbert-Schmidt inner product, the state fidelity takes the desired form
$$F(\varrho, |\Psi\rangle) = \langle\Psi|\varrho|\Psi\rangle = \sum_i a_i \langle S_i\rangle \quad\text{with}\quad a_i = \frac{\langle\Psi|S_i|\Psi\rangle}{\mathrm{Tr}(S_i^2)}, \tag{11}$$
where $\langle S_i\rangle = \mathrm{Tr}(\varrho S_i)$. Since, in most systems in the context of quantum information processing, the practically accessible observables are restricted to tensor products of the Pauli matrices $\sigma_x$, $\sigma_y$, $\sigma_z$ and the identity $\mathbb{1}$, the subsequent discussion assumes this choice of observables, for which $\mathrm{Tr}(S_i^2) = 2^N$ for $N$ qubits. Since the identity commutes with all three Pauli matrices, and measuring an $N$-qubit observable (i.e. a tensor product of $N$ Pauli matrices, but no identity) also yields, without any additional effort, all observables obtained by replacing Pauli matrices with identities, this situation fits naturally into the setting of commuting observables discussed above.
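The decomposition in Eq. (11) can be checked numerically with plain Python. The sketch below (helper names are illustrative) draws a Haar-random state by normalising a vector of complex Gaussian amplitudes, computes $a_i = \langle\Psi|S_i|\Psi\rangle/2^N$ for all $4^N$ Pauli strings, and confirms that $\sum_i a_i\langle S_i\rangle = 1$ for $\varrho = |\Psi\rangle\langle\Psi|$. It also lists, for each Pauli string, the identity-free settings from which its expectation value can be read off.

```python
import itertools
import math
import random

PAULI = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def pauli_string(label):
    """Matrix of a tensor product such as 'XZI' (first letter acts on qubit 1)."""
    mat = [[1]]
    for ch in label:
        mat = kron(mat, PAULI[ch])
    return mat

def expectation(psi, mat):
    """<psi| mat |psi> for a pure state vector psi (real for Hermitian mat)."""
    dim = len(psi)
    return sum(psi[r].conjugate() * mat[r][c] * psi[c]
               for r in range(dim) for c in range(dim)).real

def haar_random_state(n_qubits, rng):
    """Normalised complex Gaussian vector, which is Haar-distributed."""
    vec = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2 ** n_qubits)]
    norm = math.sqrt(sum(abs(v) ** 2 for v in vec))
    return [v / norm for v in vec]

def fidelity_coefficients(psi, n_qubits):
    """a_P = <Psi|P|Psi> / 2^N for every N-qubit Pauli string P (Eq. (11))."""
    labels = ("".join(t) for t in itertools.product("IXYZ", repeat=n_qubits))
    return {lbl: expectation(psi, pauli_string(lbl)) / 2 ** n_qubits for lbl in labels}

def settings_for(label):
    """Identity-free settings (e.g. 'XZ') whose outcomes also determine <label>."""
    choices = ["XYZ" if ch == "I" else ch for ch in label]
    return ["".join(s) for s in itertools.product(*choices)]

psi = haar_random_state(2, random.Random(7))
a = fidelity_coefficients(psi, 2)
fidelity = sum(a[p] * expectation(psi, pauli_string(p)) for p in a)
```

For two qubits this yields 16 coefficients but only $3^2 = 9$ distinct measurement settings, which is the grouping of commuting observables described above.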
In order to achieve a sound statistical comparison between fidelity estimates aided by AL and those obtained with conventional methods, the following discussion is based on state fidelities with respect to states $|\Psi\rangle$ that are randomly drawn according to the Haar measure [27]. The quantum state $\varrho$ in the state fidelity, Eq. (11), is chosen such that the fidelity adopts its maximal value of 1, i.e. $\varrho = |\Psi\rangle\langle\Psi|$, but none of the observations made in the following are specific to the case of maximal fidelity.
Especially in the regime of few measurement repetitions, the data are strongly affected by the statistical fluctuations of measurement results, so the empirical fidelity estimates typically vary between different realisations. In order to avoid substantial fluctuations in the numerical data, the subsequent discussion is therefore based on an average over $m$ independent realisations of the same fidelity estimate for any given total number of measurement repetitions (shots) $n_T$.
With the empirical estimate $F_i(n_T)$ of the fidelity in the $i$-th realisation with a given number of shots $n_T$, and the exact, theoretically constructed fidelity $F$, the standard deviation $\sigma(n_T)$ of the fidelity estimate with a given $n_T$ is defined as
$$\sigma(n_T) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\big(F_i(n_T) - F\big)^2}. \tag{12}$$
For sufficiently many realisations $m$, this standard deviation becomes insensitive to the statistical fluctuations that are inherent to each individual realisation; the subsequent examples are based on $m = 10000$ realisations. Fig. 1 depicts the convergence of such a fidelity estimate for a randomly chosen 4-qubit state, with a logarithmic scale for both the number of shots $n_T$ (on the x-axis) and the estimated standard deviation $\sigma$ (on the y-axis). Data following the estimate aided by AL are depicted with triangles, and data following the conventional estimation strategy are depicted with squares. The black lines depict the $1/\sqrt{n_T}$ dependence that is typical for the reduction of statistical noise. Since $2 \times 3^4 = 162$ shots (i.e. two measurement repetitions of each $N$-qubit observable) are necessary to complete the initialisation stage of the AL algorithm as described in Sec. III C, convergence is shown only for $n_T > 162$. Initially, the standard deviation of the estimate aided by AL decreases faster than the typical $1/\sqrt{n_T}$ dependence, and it follows the $1/\sqrt{n_T}$ dependence only after $n_T \simeq 210$ shots. In contrast, the estimate following the conventional allocation (i.e. with the total number of shots evenly distributed over the observables) follows the $1/\sqrt{n_T}$ dependence during the entire process of convergence.
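The definition of $\sigma(n_T)$ and the initialisation count are simple enough to state in code. The sketch below (illustrative names) computes the root-mean-square deviation of $m$ fidelity estimates from the exact value, and the $2 \times 3^N$ shots of step (i) when every identity-free $N$-qubit Pauli setting is measured twice.

```python
import math

def fidelity_spread(estimates, exact):
    """sigma(n_T): root-mean-square deviation of the m estimates F_i from the exact F."""
    return math.sqrt(sum((f - exact) ** 2 for f in estimates) / len(estimates))

def initialisation_shots(n_qubits, shots_per_setting=2):
    """Shots in step (i) when each of the 3^N identity-free Pauli settings is measured twice."""
    return shots_per_setting * 3 ** n_qubits

for n in range(1, 7):
    print(n, initialisation_shots(n))
```

For $N = 4$ this reproduces the 162 shots quoted above; for $N = 6$ the initialisation alone already takes 1458 shots.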
The initial, faster convergence shows that the AL algorithm is indeed capable of identifying the observables whose measurement best helps to decrease the inaccuracy of the fidelity estimate. Once enough data has been accumulated, however, an optimal allocation of repetitions to the different observables can be determined without accumulating further data. In this regime, the adaptive AL method can no longer outperform a strategy with a fixed but optimised allocation, and the convergence necessarily follows the $1/\sqrt{n_T}$ dependence. Due to the initial, fast convergence, however, the approach aided by AL is expected to outperform conventional approaches also if convergence towards low variances is required, in which case a larger part of the convergence is dominated by the $1/\sqrt{n_T}$ dependence. The observation that both approaches follow the $1/\sqrt{n_T}$ dependence after sufficiently many shots ($n_T \gtrsim 210$ in this case) helps to define a figure of merit for the improvement of the approach with AL over the conventional approach. The ratio $n_T^{(c)}/n_T^{(AL)}$ between the number of shots $n_T^{(c)}$ required to achieve a given accuracy of the fidelity with the conventional approach and the number of shots $n_T^{(AL)}$ required to achieve the same accuracy with the approach aided by AL (i.e. $\sigma(n_T^{(c)}) = \sigma(n_T^{(AL)})$) is independent of the desired standard deviation of the fidelity estimate, as long as this standard deviation is sufficiently small that the comparison is made after the initial interval of fast convergence. In the following, we will thus refer to the ratio $n_T^{(c)}/n_T^{(AL)}$ as the improvement.

Since state fidelity can be defined for systems with various numbers of qubits, it is well suited to highlight the benefits of AL with increasing system size. The following discussion thus focuses on the state fidelity of $N$-qubit systems with $N$ ranging from one to six.
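Why the improvement ratio does not depend on the target accuracy follows from one line of algebra: if $\sigma = A_c/\sqrt{n_T}$ for the conventional approach and $\sigma = A_{AL}/\sqrt{n_T}$ for AL in the asymptotic regime, equal accuracy requires $A_c/\sqrt{n_T^{(c)}} = A_{AL}/\sqrt{n_T^{(AL)}}$, i.e. $n_T^{(c)}/n_T^{(AL)} = (A_c/A_{AL})^2$. The sketch below estimates the prefactors from data in that regime by averaging $\log\sigma + \tfrac{1}{2}\log n_T$, which is a least-squares fit with the slope fixed to $-1/2$; the function names and this fitting choice are assumptions, not necessarily the procedure behind the figures.

```python
import math

def prefactor(shots, sigmas):
    """A in sigma = A / sqrt(n_T): least-squares fit in log space with slope fixed to -1/2."""
    logs = [math.log(s) + 0.5 * math.log(n) for n, s in zip(shots, sigmas)]
    return math.exp(sum(logs) / len(logs))

def improvement(shots_c, sigmas_c, shots_al, sigmas_al):
    """n_T^(c) / n_T^(AL) at equal sigma, valid only in the 1/sqrt(n_T) regime."""
    return (prefactor(shots_c, sigmas_c) / prefactor(shots_al, sigmas_al)) ** 2
```

Only data points after the initial interval of fast convergence (here $n_T \gtrsim 210$) should enter the fit, since the AL curve does not follow $1/\sqrt{n_T}$ before that.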

FIG. 2: Cumulative distribution of improvements of state fidelity estimates with active learning, obtained from statistics over 400 random states for each system size from one to six qubits. While the improvement does depend on the underlying state, there is a clear trend of increasing improvements with a growing qubit number, due to the growing number of observables to choose from.
Fig. 2 depicts the cumulative distribution of the improvement $n_T^{(c)}/n_T^{(AL)}$ found for different system sizes, based on fidelity estimates for 400 different random states. In the case of a single qubit (dashed line with circles), the improvement is smaller than one in about 25% of the cases; in those cases, the conventional method yields better estimates than the method aided by AL. The distribution of the observed improvements, however, is skewed towards higher values, and the average improvement does indeed favour the AL method.
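For reading such a plot, here is a sketch of how the cumulative distribution and the quoted summary statistics could be computed from a list of per-state improvements $n_T^{(c)}/n_T^{(AL)}$; the function names are illustrative, and any input values used with it are placeholders rather than the data behind Fig. 2.

```python
import statistics

def empirical_cdf(values):
    """Sorted improvements and the fraction of states at or below each of them."""
    xs = sorted(values)
    return xs, [(k + 1) / len(xs) for k in range(len(xs))]

def fraction_below(values, threshold=1.0):
    """Fraction of states for which AL needed more shots than the conventional method."""
    return sum(v < threshold for v in values) / len(values)

def median_improvement(values):
    """Improvement exceeded in half of the cases."""
    return statistics.median(values)
```

The "smaller than one in about 25% of the cases" statement corresponds to `fraction_below(values)` for the single-qubit data, and "exceeds 1.8 in half of the cases" to the median.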
The merely moderate benefit of AL in the estimate of single-qubit fidelities can be attributed to the fact that there are only three different directly measurable observables to choose from. Since, however, the number of different measurement settings grows exponentially with the number of qubits (as $3^N$ for identity-free $N$-qubit Pauli settings), one would expect the benefits of AL to become increasingly pronounced with increasing system size. This expectation is clearly corroborated by Fig. 2. For three or more qubits, the improvement always exceeds the threshold value of one, and the observed improvements grow steadily with the number of qubits. For $N = 6$ qubits (solid line with downward triangles), the improvement exceeds 1.8 in half of the cases and reaches values of up to 3; that is, the number of measurements can be reduced by a factor of 3 without a decrease in the accuracy of the fidelity estimate.

B. Gate Fidelity
The case of state fidelity highlights that the benefits of AL are particularly pronounced if there is a large number of measurement settings to choose from. Since, in the case of gate fidelity, there is a choice of both the initial state and the measurement to be taken on the final state, AL can be expected to be particularly beneficial for the estimate of gate fidelities, which are the focus of this section. Rather than analysing statistics over randomly chosen gates, this section focuses on the two-qubit controlled-NOT (CNOT) gate and the three-qubit Toffoli gate.
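The fidelity of a quantum channel $\Lambda$ with respect to a gate $U$ for $N$ qubits [21] is given by
$$F(\Lambda, U) = \frac{1}{4^N}\sum_{i,j} \mathrm{Tr}\big[U|j\rangle\langle i|U^\dagger\,\Lambda(|i\rangle\langle j|)\big], \tag{13}$$
where the summations over $i$ and $j$ are each performed over a complete set of orthonormal state vectors.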
In order to recast the definition of the gate fidelity into an experimentally realisable measurement prescription, it is necessary to expand each of the operators $|i\rangle\langle j|$ in the argument of $\Lambda$ into a set of actual quantum states. While a set of four quantum states is sufficient for a single qubit, the following analysis is based on an over-complete set of five single-qubit states $|\phi_k\rangle$. With this choice of states, each operator $|i\rangle\langle j|$ can be written as $\sum_k c_{ijk}|\phi_k\rangle\langle\phi_k|$ with complex scalar coefficients $c_{ijk}$, so that, by linearity of $\Lambda$, the gate fidelity for a single qubit can be expressed in terms of the output states $\Lambda(|\phi_k\rangle\langle\phi_k|)$. Due to the choice of an over-complete set of states, the values of these coefficients are not uniquely determined, but the choices $c_{000} = 1$, $c_{111} = 1$, $c_{01k} = \dots$ This generalises straightforwardly to the gate fidelity for $N$ qubits, with $5^N$ initial states $|\Phi_k\rangle$ given by tensor products of the single-qubit states $|\phi_k\rangle$, and coefficients $C_{ijk}$ given by products of the coefficients $c_{ijk}$.
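The expansion of the non-Hermitian operators $|i\rangle\langle j|$ into projectors onto physical states can be checked on a small example. The five states used in this work are not reproduced here, so the sketch below uses a hypothetical four-state set $\{|0\rangle, |1\rangle, |+\rangle, |{+i}\rangle\}$ with $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ and $|{+i}\rangle = (|0\rangle + i|1\rangle)/\sqrt{2}$, for which $|0\rangle\langle 1| = |+\rangle\langle +| + i|{+i}\rangle\langle{+i}| - \tfrac{1+i}{2}\big(|0\rangle\langle 0| + |1\rangle\langle 1|\big)$. By linearity, $\Lambda(|0\rangle\langle 1|)$ then follows from the channel outputs for four physical input states.

```python
import math

def projector(vec):
    """|phi><phi| for a two-component state vector."""
    return [[a * b.conjugate() for b in vec] for a in vec]

def combine(terms):
    """Sum of c * |phi><phi| over (c, phi) pairs, as a 2x2 nested list."""
    out = [[0j, 0j], [0j, 0j]]
    for c, phi in terms:
        p = projector(phi)
        for r in range(2):
            for col in range(2):
                out[r][col] += c * p[r][col]
    return out

s = 1 / math.sqrt(2)
ZERO, ONE = [1 + 0j, 0j], [0j, 1 + 0j]
PLUS, PLUS_I = [s + 0j, s + 0j], [s + 0j, 1j * s]

# |0><1| and its adjoint |1><0|, expanded into projectors onto physical states
ket0_bra1 = combine([(1, PLUS), (1j, PLUS_I), (-(1 + 1j) / 2, ZERO), (-(1 + 1j) / 2, ONE)])
ket1_bra0 = combine([(1, PLUS), (-1j, PLUS_I), (-(1 - 1j) / 2, ZERO), (-(1 - 1j) / 2, ONE)])
```

Adding a fifth state makes the coefficients non-unique, as noted above, which leaves freedom in how the expansion is chosen.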
An explicit prescription in terms of state preparation, dynamics described by the channel $\Lambda$, and final measurement is obtained by expanding the operators $\sum_{ij} C_{ijk}\,U|j\rangle\langle i|U^\dagger$ into the set of observables that can be directly measured. With the set of local Pauli observables $S_i$ also used in Sec. IV A for the state fidelity, one obtains the gate fidelity as a weighted sum of the expectation values $\mathrm{Tr}\big[S_i\,\Lambda(|\Phi_k\rangle\langle\Phi_k|)\big]$, with coefficients determined by this expansion. The situation regarding the choice of measurements is thus analogous to the case of state fidelity, but in addition to the choice of measurement, there is also the choice of initial state. In every step of the process, the AL algorithm thus selects the most informative combination of initial state and measurement.
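The size of the choice set grows quickly: with $5^N$ product input states and $3^N$ identity-free Pauli settings, there are $15^N$ combinations of initial state and measurement. The sketch below counts them, assuming (by analogy with step (i); this is not stated explicitly for gates) that the initialisation again uses two shots per combination. The function name is illustrative.

```python
def gate_choices(n_qubits, n_input_states=5):
    """Number of (initial product state, Pauli setting) pairs: 5^N * 3^N."""
    return n_input_states ** n_qubits * 3 ** n_qubits

for name, n in (("CNOT", 2), ("Toffoli", 3)):
    print(name, gate_choices(n), "choices,",
          2 * gate_choices(n), "initialisation shots (assumed two per choice)")
```

For the CNOT gate this gives 225 choices, compared with 9 settings for a two-qubit state fidelity, and for the Toffoli gate 3375 choices, compared with 27.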
Similarly to the estimate of state fidelities discussed in Sec. IV A, the accuracy of an empirical estimate of the gate fidelity also depends on the actual realisation of random measurement outcomes, and a reliable assessment of the two methods being compared is obtained only from statistics over many independent realisations of the same fidelity estimate.
Fig. 3 depicts the decrease of the standard deviation $\sigma(n_T)$ with the number of shots for (a) the estimate of the fidelity between a two-qubit CNOT gate and a CNOT channel, and (b) the estimate of the fidelity between a three-qubit Toffoli gate and a Toffoli channel, analogously to Fig. 1. Triangles denote estimates aided by AL, and squares denote the case in which all measurements are taken with the conventional approach. Qualitatively, the convergence confirms the behaviour identified in Fig. 1, but the quantitative details differ: the period of faster convergence of the approach aided by AL lasts until $n_T \approx 2000$ for the CNOT gate and until $n_T \approx 1.1 \times 10^4$ for the Toffoli gate. The improvement derived from the part of the convergence that follows the $1/\sqrt{n_T}$ behaviour is $n_T^{(c)}/n_T^{(AL)} \approx 2$ for the CNOT gate and $n_T^{(c)}/n_T^{(AL)} \approx 2.2$ for the Toffoli gate. With the larger improvements for the CNOT gate and the Toffoli gate as compared to those found for two-qubit and three-qubit state fidelities, Fig. 3 thus confirms the expectation that the benefits of AL grow with the number of measurement settings to choose from.

V. OUTLOOK
Particularly in the era of noisy intermediate-scale quantum (NISQ) devices [28], the estimate of state fidelities and gate fidelities is a commonly encountered problem [29][30][31]. The rapidly growing number of observables to be measured makes this an extremely challenging task even for moderate qubit numbers [32,33]. Due to the large noise level in such devices [34,35], there is large uncertainty about a created state or an implemented gate, so a prior allocation of measurement repetitions tailored to the specific state or gate is problematic. The interactive active learning process for choosing the observables to be measured can thus facilitate the estimate of fidelities in practice. Since such estimates are at the core of data-driven optimisation processes [36][37][38], which, due to their iterative nature, require many fidelity estimates, the proposed algorithm can contribute to our ability to derive practical use from faulty hardware.
The use of the proposed techniques is not limited to fidelities; they can also be applied in variational quantum algorithms (VQA) [39][40][41], in which expectation values of a Hamiltonian or some other operator need to be estimated. The goal of a VQA is the experimental realisation of the quantum state that minimises this expectation value, and iterative optimisation algorithms estimate this expectation value with the same accuracy for all considered states [42]. Since the proposed algorithm provides not only empirical expectation values but also bounds on their accuracy, it can also identify the lowest conceivable expectation value at any point in time during the data acquisition. As soon as this value exceeds the expectation value observed with another state, one can safely stop taking data for this state and start estimating the expectation value for a different state.
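A sketch of this stopping rule, assuming that the empirical expectation value of a candidate state and the bound on its accuracy (the right-hand side of Eq. (8)) are available; the function names are hypothetical. Comparing against the other state's upper bound instead of its observed value would be a more conservative variant.

```python
def lowest_conceivable(mean, bound):
    """Smallest expectation value compatible with the empirical estimate and its bound."""
    return mean - bound

def should_abandon(mean, bound, best_observed):
    """True once even the lowest conceivable value lies above the best observed value."""
    return lowest_conceivable(mean, bound) > best_observed
```

For example, a candidate with empirical energy $-0.5$ and bound $0.1$ can be abandoned once another state has shown $-0.8$, since its expectation value cannot be lower than $-0.6$.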
With possible extensions to the estimate of quantities like entropy or correlation functions involving products of expectation values, the use of the proposed active learning algorithm has clear potential to become a commonly used tool in the analysis of quantum systems.

FIG. 1: Convergence of fidelity estimates with active learning (triangles) and with the conventional uniform allocation of repetitions to different observables (squares). Both the standard deviation and the number of shots are depicted on a logarithmic scale. The lines indicate $1/\sqrt{n_T}$ convergence.

FIG. 3: Convergence comparisons of the fidelity estimates with active learning and with conventional uniform allocations of experimental repetitions for (a) the CNOT gate and (b) the Toffoli gate. Both the standard deviation and the total number of shots are plotted on a logarithmic scale. The reference lines indicate convergence that scales as $1/\sqrt{n_T}$.