Designing optimal protocols in Bayesian quantum parameter estimation with higher-order operations

Using quantum systems as sensors or probes has been shown to greatly improve the precision of parameter estimation by exploiting unique quantum features such as entanglement. A major task in quantum sensing is to design the optimal protocol, i.e., the most precise one. It has been solved for some specific instances of the problem, but in general even numerical methods are not known. Here, we focus on the single-shot Bayesian setting, where the goal is to find the optimal initial state of the probe (which can be entangled with an auxiliary system), the optimal measurement, and the optimal estimator function. We leverage the formalism of higher-order operations to develop a method based on semidefinite programming that finds a protocol that is close to the optimal one with arbitrary precision. Crucially, our method is not restricted to any specific quantum evolution, cost function or prior distribution, and thus can be applied to any estimation problem. Moreover, it can be applied to both single- and multiparameter estimation tasks. We demonstrate our method with three examples, consisting of unitary phase estimation, thermometry in a bosonic bath, and multiparameter estimation of an SU(2) transformation. Exploiting our methods, we extend several results from the literature. For example, in the thermometry case, we find the optimal protocol at any finite time and quantify the usefulness of entanglement.


I. INTRODUCTION
Quantum parameter estimation, also known as quantum metrology or quantum sensing, is at the heart of quantum technologies [1]. The quantitative assessment of properties of a system, such as magnetic field amplitude, length, temperature or chemical potential, to name a few, is a key task for science and industry. A sensor is a device that manipulates probes interacting with the system of interest in order to read out its properties. Loosely speaking, the sensing becomes quantum whenever the manipulation of the probes and their interaction with the measured system are governed by quantum physics. Quantum metrology has been very successful in advancing technological frontiers, as showcased in several experiments, namely the detection of gravitational waves [2,3], thermometry [4,5], magnetometry [6,7], and phase estimation in optical platforms [8].
The theory of quantum metrology aims at developing protocols that optimally use the probes and other metrological resources (such as quantum correlations, coherence and measurement time) in order to estimate the parameter with minimal error [9-13], and at uncovering ultimate limits on the achievable estimation precision [14-19]. These limits are usually expressed as bounds on the Fisher information (matrix) that must hold in a certain context, and are related to the mean squared error (MSE) via a Cramér-Rao type bound [20-25]. In the single-parameter case, such bounds are often saturable in the regime where the protocol is repeated many times [26,27]. However, in the limit of small data, such bounds are not generally saturable, and furthermore the MSE, addressed by the Cramér-Rao bound, may not be the best quantifier of the estimation precision. Such problems can be attacked from the perspective of the full Bayesian framework. In the Bayesian approach, one starts with a prior distribution (belief) of the parameter and updates it through the protocol based on the observed measurement results. Crucially, the choices of prior distribution and cost (or reward) function have a substantial impact on the optimal protocol. Finding such optimal Bayesian protocols is one of the key problems in metrology. This is a non-trivial task even in single-shot scenarios, where the protocol is described by the combination of the initial state, the final measurement, and the estimator function. Optimal protocols are only known for a few highly symmetric cases (see Refs. [28,29] for a review) and for specific cost functions in the single-parameter regime [30,31], while general effective numerical methods for finding them are lacking.
* jessica.bavaresco@unige.ch † patryk.lipka.bartosik@gmail.com ‡ pavel.sekatski@gmail.com § mohammad.mehboudi@tuwien.ac.at
We therefore dedicate this work to addressing these shortcomings of quantum metrology within the single-shot Bayesian framework. Namely, we exploit the formalism of higher-order operations [32-35] to combine two pivotal aspects of the estimation protocol, the quantum state and the measurement (together referred to as the quantum strategy), into a single and equivalent higher-order transformation, called a quantum tester [32,36,37]. While the standard approach to metrology typically involves optimizing over state and measurement individually [38-40], often in an inefficient, heuristic manner, quantum testers allow us to optimize over the quantum strategy as a whole, finding the optimal state and measurement efficiently with a single instance of a semidefinite program (SDP). Originally a tool applied to tasks such as channel discrimination [36,41,42], the higher-order operations formalism was recently extended to the quantum parameter estimation problem, both in the frequentist setting, in order to maximize the Fisher information of a protocol [43,44], and in the Bayesian setting, in order to maximize the probability of a fixed-width credible interval [40]. In this work we focus on the single-shot Bayesian setting and show how to leverage the properties of higher-order operations in order to efficiently optimize the estimation protocol with respect to any reward function.
We propose three different methods to integrate the optimization of the quantum strategy, i.e., state and measurement, with the optimization of the estimators, thereby finding the optimal overall protocol within arbitrary precision. Our methods take into account both numerical and practical limitations, finding application in a wide range of realistic scenarios. They are furthermore applicable to any estimation problem, regardless of prior distribution, reward or cost function, or type of quantum evolution. Moreover, we show how these methods can be straightforwardly adapted to multiparameter estimation problems.
To demonstrate the merit of our approach, we present three case studies where we apply our methods to relevant parameter estimation problems: phase estimation, thermometry, and SU(2) estimation. These examples cover single and multiparameter problems, both unitary and non-unitary evolution, reward or cost functions of varying nature (e.g. fidelity and MSE), and different prior distributions (e.g. uniform and Gaussian). Moreover, we use one of our case studies, thermometry, to show how our approach can be adapted to approximate quantum strategies that do not allow entanglement between the probe and an auxiliary system in their implementation. This allows us to demonstrate that entanglement provides an advantage over no-entanglement strategies in a finite-time temperature estimation task. Our techniques can be similarly used to answer whether entanglement can be useful in other estimation tasks, and to put a lower bound on the usefulness of entanglement. In the thermometry problem, we also find the optimal protocol in finite time, which was previously only known in the frequentist regime [45], and show that the estimation precision only decreases as t → ∞.
All the code developed for this work is made available in our open online repository [46].

A. Bayesian parameter estimation
In a standard metrology problem, one is interested in estimating an unknown parameter θ by encoding it into the quantum state of a probe. The encoding process can be described by a quantum channel, i.e. a completely positive and trace-preserving map, which we denote by E_θ : L(H_I) → L(H_O), where H_I and H_O are the Hilbert spaces of the input and output systems of the channel, respectively. When probing the channel, it is in general more advantageous to also use an auxiliary system which is initially entangled with the probe but does not go through the channel, as sketched in Fig. 1. In other words, one considers the extended channel E_θ ⊗ id, where "id" is the identity channel acting on the auxiliary system. The chosen global input state, given by the density operator ρ ∈ L(H_I ⊗ H_aux), is then mapped to a global output state ρ_θ := (E_θ ⊗ id)[ρ] by the extended channel. In order to extract the information about the parameter θ encoded in this state, one performs a joint measurement M = {M_i}_i, with M_i ∈ L(H_O ⊗ H_aux), on the auxiliary system and the output system of the channel. Finally, in the considered setting, one designs an estimator θ̂ that assigns an estimate θ̂_i of the true value of the parameter θ to each measurement outcome i. The quality of the estimation can then be quantified by setting some score (cost) function, evaluating the closeness (deviation) of the estimator to the true parameter value. The score depends on the protocol, i.e., the triplet of initial state, measurement, and estimator {ρ, {M_i}_i, {θ̂_i}_i}. A central problem in quantum metrology is finding the optimal protocol.
In the Bayesian approach, one starts with a prior belief about the parameter value, given by a probability distribution p(θ). After the measurement, described by the Born rule
$$p(i|\theta) = \mathrm{tr}(\rho_\theta M_i), \tag{1}$$
one uses Bayes' rule to update the distribution of the parameter based on the observed outcome i,
$$p(\theta|i) = \frac{p(i|\theta)\, p(\theta)}{p(i)},$$
where the normalization factor is defined as p(i) := ∫dθ p(i|θ) p(θ).
The performance of the estimation strategy can be quantified according to a score. Generally, this can be cast as
$$S := \int d\theta \sum_i p(\theta)\, p(i|\theta)\, r(\theta, \hat\theta_i), \tag{4}$$
where r(θ, θ̂_i) is a reward or cost function that quantifies the difference between the parameter θ and each estimate θ̂_i. A particular choice of cost function is the MSE, r_MSE(θ, θ̂_i) = (θ − θ̂_i)². In light of this definition, it becomes clear that the optimal protocol is the one that either maximizes or minimizes the score S, depending on whether r(θ, θ̂_i) is a reward or a cost function, respectively.
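As a purely classical illustration of the Bayesian update and of the score, the following sketch works on a grid of parameter values. The two-outcome likelihood p(0|θ) = cos²(θ/2), the flat prior on (0, π) and all function names are assumptions made for this example; they are not taken from the case studies of this work. The sketch uses the standard fact that, for the MSE cost and a fixed measurement, the posterior mean is the best estimate for each outcome.

```python
import math

def posterior(prior, likelihood_i):
    """Bayes' rule on a grid: returns p(theta_k | i) and the evidence p(i)."""
    joint = [l * p for l, p in zip(likelihood_i, prior)]
    evidence = sum(joint)  # normalization factor p(i)
    return [j / evidence for j in joint], evidence

def score(prior, likelihoods, estimates, reward, thetas):
    """Discrete score: sum_k sum_i p(theta_k) p(i|theta_k) r(theta_k, estimate_i)."""
    return sum(prior[k] * likelihoods[i][k] * reward(thetas[k], estimates[i])
               for i in range(len(estimates)) for k in range(len(thetas)))

# Toy model (assumption): two outcomes, p(0|theta) = cos^2(theta/2),
# flat prior on a midpoint grid over (0, pi).
n_grid = 200
thetas = [math.pi * (k + 0.5) / n_grid for k in range(n_grid)]
prior = [1.0 / n_grid] * n_grid
likelihoods = [[math.cos(t / 2) ** 2 for t in thetas],
               [math.sin(t / 2) ** 2 for t in thetas]]

posteriors, evidences = zip(*(posterior(prior, row) for row in likelihoods))

# Posterior means: the MSE-optimal estimates for this fixed measurement.
estimates = [sum(t * w for t, w in zip(thetas, post)) for post in posteriors]

def squared_error(t, e):
    return (t - e) ** 2

mse = score(prior, likelihoods, estimates, squared_error, thetas)
# Reference value: ignoring the outcome and always guessing the prior mean.
prior_variance = score(prior, likelihoods, [math.pi / 2] * 2, squared_error, thetas)
print(f"estimates = {estimates}, MSE = {mse:.4f}, prior variance = {prior_variance:.4f}")
```

Comparing `mse` with `prior_variance` shows how much the single measurement reduces the expected cost for this toy likelihood.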
As previously mentioned, this problem does not have a known analytical solution in general, and efficient numerical methods have only been proposed for a few special problems [39,47]. In this work we provide an efficient algorithm that approximates the solution with arbitrary precision, and works for all cost functions and any number of parameters.
B. Quantum testers: the quantum strategy as a higher-order operation
A typically cumbersome part of metrology and estimation problems is the optimization of the quantum strategy, i.e., of the state and measurement that are used to probe the channel that encodes the parameter to be estimated. Here, we apply techniques from the formalism of higher-order operations [32-35] to fully characterize the set of quantum strategies applicable to a given estimation task. We then use this reformulation to efficiently optimize over quantum strategies using semidefinite programming [48-50]. In particular, we exploit the connection between states and measurements and an object of the higher-order formalism called a quantum tester.
While quantum maps describe transformations of quantum states, higher-order operations (also called supermaps) describe transformations of quantum maps themselves. The equivalent of a POVM in this formalism is a quantum tester: the most general higher-order transformation that maps quantum channels to a probability distribution, effectively "measuring" a quantum channel and yielding a classical outcome with some probability. As illustrated in Fig. 1, a tester T is equivalent to the concatenation of a state ρ and a measurement M. Nevertheless, as we now explain, the Born rule in Eq. (1) becomes linear in the tester variable, which is characterized by simple SDP constraints. We then exploit these two properties to efficiently optimize over the quantum strategies.
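In order to express an estimation problem in terms of testers, we start by restating the problem using the Choi-Jamiołkowski isomorphism [51,52]. In this representation, a map E_θ : L(H_I) → L(H_O) can be equivalently expressed as an operator C_θ ∈ L(H_I ⊗ H_O),
$$C_\theta := \sum_{m,n} |m\rangle\langle n| \otimes \mathcal{E}_\theta(|m\rangle\langle n|). \tag{5}$$
Using the Choi operator, the output state of the probe can be expressed as
$$\rho_\theta = \mathrm{tr}_I\big[(C_\theta \otimes \mathbb{1}_{\rm aux})(\rho^{T_I} \otimes \mathbb{1}_O)\big], \tag{6}$$
where (•)^{T_I} denotes the partial transposition over the input space H_I, and the tensor factors are implicitly ordered as H_I ⊗ H_O ⊗ H_aux. Then, the probability of obtaining outcome i, as in Eq. (1), can be equivalently written as
$$p(i|\theta) = \mathrm{tr}\big[(C_\theta \otimes \mathbb{1}_{\rm aux})(\rho^{T_I} \otimes \mathbb{1}_O)(\mathbb{1}_I \otimes M_i)\big]. \tag{7}$$
We can now group the objects that constitute the quantum strategy, that is, the state and the measurement, into a single object called the quantum tester [32,36,37].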
A tester T = {T_i}_{i=1}^{N_O}, with T_i ∈ L(H_I ⊗ H_O), is a set of N_O (standing for the "number of outcomes") operators defined as
$$T_i := \mathrm{tr}_{\rm aux}\big[(\rho^{T_I} \otimes \mathbb{1}_O)(\mathbb{1}_I \otimes M_i)\big], \tag{8}$$
which allows one to rewrite the probability of obtaining outcome i, in Eq. (7), as simply
$$p(i|\theta) = \mathrm{tr}(T_i\, C_\theta). \tag{9}$$
The usefulness of this representation comes from the fact that, as shown in Refs. [32,36,37], testers have a simple mathematical characterization. More specifically, they obey the following set of necessary and sufficient conditions:
$$T_i \geq 0 \quad \forall\, i, \qquad \sum_i T_i = \sigma \otimes \mathbb{1}_O, \tag{10, 11}$$
where σ ∈ L(H_I), σ ≥ 0 and tr(σ) = 1. It is straightforward to see that every set of operators T that satisfies Eq. (8) also satisfies Eqs. (10) and (11). The converse is also true: given any set of operators T that satisfies Eqs. (10) and (11), one can define a state ρ and a measurement M, Eqs. (12) and (13), that reproduce T through Eq. (8). The state ρ and measurement M are called a quantum realization of the tester T. This realization is not unique, as different sets of states and measurements can lead to the same tester. However, crucially, different states and measurements that lead to the same tester will also yield the same probability distribution {p(i|θ)}_i in Eq. (1), and have the same performance in an estimation task. Hence, the optimization of any linear function of p(i|θ) in Eq. (9) over a tester T = {T_i} that satisfies Eqs. (10) and (11) is a semidefinite program, and its optimal tester is guaranteed to have a quantum realization in terms of a quantum state and measurement. Importantly, once the optimal quantum strategy (i.e. tester) is found, the corresponding optimal state and measurement can be easily determined using Eqs. (12) and (13).
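The equivalence between the Born rule and the tester form of the probability can be checked numerically in the simplest setting. The sketch below assumes a trivial auxiliary system, in which case the tester elements reduce to T_i = ρ^T ⊗ M_i, and a qubit unitary channel with U = exp(−iθσ_z/2). The probe state, the measurement, the value of θ and the helper functions are all hypothetical choices made for this illustration, written with plain Python lists so that no external library is needed.

```python
import cmath

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def transpose(a):
    return [[a[j][i] for j in range(len(a))] for i in range(len(a[0]))]

def kron(a, b):
    n, m = len(b), len(b[0])
    return [[a[i // n][j // m] * b[i % n][j % m]
             for j in range(len(a[0]) * m)] for i in range(len(a) * n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def unit(i, j, d=2):
    """Matrix unit |i><j| in dimension d."""
    return [[complex(r == i and c == j) for c in range(d)] for r in range(d)]

def choi(channel, d=2):
    """Choi operator: sum_ij |i><j| (x) channel(|i><j|) on H_I (x) H_O."""
    total = [[0j] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            term = kron(unit(i, j, d), channel(unit(i, j, d)))
            total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total

theta = 0.7  # arbitrary parameter value (assumption)
u = [[cmath.exp(-0.5j * theta), 0j], [0j, cmath.exp(0.5j * theta)]]

def channel(x):
    return matmul(matmul(u, x), dagger(u))

rho = [[0.5, -0.5j], [0.5j, 0.5]]        # |psi><psi|, psi = (|0> + i|1>)/sqrt(2)
povm = [[[0.5, 0.5], [0.5, 0.5]],        # |+><+|
        [[0.5, -0.5], [-0.5, 0.5]]]      # |-><-|

# Tester elements without auxiliary system: T_i = rho^T (x) M_i.
tester = [kron(transpose(rho), m) for m in povm]
c_theta = choi(channel)

p_tester = [trace(matmul(t, c_theta)).real for t in tester]   # tr(T_i C_theta)
p_born = [trace(matmul(m, channel(rho))).real for m in povm]  # tr(M_i E(rho))
print(p_tester, p_born)
```

Both lists agree, and the tester elements sum to ρ^T ⊗ 1, so that σ = ρ^T plays the role of the normalized operator in the tester conditions. With a nontrivial auxiliary system the same check would additionally require a partial trace over H_aux.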
Notice that, while a tester is a set of operators that act only on the input and output space of the channel C_θ, its quantum realization may require an auxiliary system. This implies that the optimal quantum strategy may require entanglement between the target and auxiliary systems, and a global measurement that acts on both of these systems. The dimension of the auxiliary space is bounded to be at most the dimension of H_I, as established by the explicit construction of ρ in Eq. (12). The auxiliary system can also be interpreted as a (quantum) memory. Hence, by optimizing over testers, one is effectively optimizing over all possible quantum strategies, including those that may require memory/entanglement for their implementation.
However, certain experimental limitations might induce a situation in which it is necessary to design a quantum strategy that does not require entanglement for its implementation, or a means to certify whether entanglement is indeed advantageous in a given estimation task. In App. A we provide details on how quantum strategies that do not require entanglement can be approximated with SDPs. Moreover, in Sec. V B we provide an example of a temperature estimation problem in which our methods demonstrate a clear gap between the performance of strategies operating with and without entanglement.

III. OPTIMAL TESTER FOR METROLOGY VIA SEMIDEFINITE PROGRAMMING
Using quantum testers, we can now rewrite the score of an estimation problem in Eq. (4) as
$$S = \int d\theta \sum_{i=1}^{N_O} p(\theta)\, \mathrm{tr}(T_i\, C_\theta)\, r(\theta, \hat\theta_i). \tag{14}$$
Now, to find the optimal score of a given estimation task, the optimization of S over the triplet {ρ, {M_i}_i, {θ̂_i}_i} can be substituted by an optimization over the pair {{θ̂_i}_i, {T_i}_i}. We may express all dependencies of the score S on the estimates {θ̂_i}_{i=1}^{N_O} with a set of operators {X(θ̂_i)}_{i=1}^{N_O}, X(θ̂_i) ∈ L(H_I ⊗ H_O), which are given by an integral over the parameter θ, defined as
$$X(\hat\theta_i) := \int d\theta\, p(\theta)\, r(\theta, \hat\theta_i)\, C_\theta. \tag{16}$$
These operators encompass all the given information about the task (prior distribution, cost function, and channels in which the parameter is encoded) that does not depend on the quantum strategy. Expressed in terms of these operators, the score is simply
$$S = \sum_{i=1}^{N_O} \mathrm{tr}\big(T_i\, X(\hat\theta_i)\big). \tag{17}$$
For any given set of fixed estimates {θ̂_i}, the optimization of the score is given by either a maximization or a minimization (depending on the character of the cost function) of S over all testers T. Taking maximization for instance, we have that
$$\max_{\{T_i\}} \sum_{i=1}^{N_O} \mathrm{tr}\big(T_i\, X(\hat\theta_i)\big) \tag{18}$$
is the optimal score. The optimization over testers includes the constraints of Eqs. (10) and (11). Since testers T = {T_i} are sets of positive semidefinite operators characterized by linear constraints, the above optimal score can be efficiently computed using SDP. Once again, the optimal tester is guaranteed to have a quantum realization; hence, for any optimal solution T returned by the SDP, there exist a probe state ρ and a measurement M that realize it, and they constitute the optimal quantum strategy for the given estimators.
Notice that this can be straightforwardly generalized to the multiparameter regime as well. In App. C we provide more details on this case, while in Sec. V C we present an example of the application of our methods to the multiparameter problem of SU(2) estimation.
It is now clear that, given the knowledge of the estimator values θ̂_i and the operators X(θ̂_i), one can find the optimal tester T efficiently. The remaining difficulties thus are: (1) finding the optimal estimators {θ̂*_i} leading to the optimal score; (2) computing the integral in Eq. (16).
In the following, we construct three different approaches to tackle both of these problems.

IV. PARAMETER DISCRETIZATION AND ESTIMATOR OPTIMIZATION
In situations where the optimal estimators are unknown, or the integral in Eq. (16) cannot be calculated exactly, an approximation of the optimal score in Eq. (18) can still be computed with SDP. This can be achieved by first discretizing the parameter θ to a finite number of hypotheses, thereby mapping the original parameter estimation task onto one closely resembling channel discrimination.
Concretely, let us choose a discretization of θ such that θ → {θ_k}_{k=1}^{N_H}, where N_H (standing for the "number of hypotheses") is the total number of different values assigned to θ. We can then define a prior distribution over the new hypotheses as
$$p(\theta_k) := \frac{p(\theta = \theta_k)}{\sum_{j=1}^{N_H} p(\theta = \theta_j)},$$
which is computationally straightforward and has the advantage of giving a valid probability distribution. Now, let us define the discrete equivalent of the operators in Eq. (16) as {X̃(θ̂_i)}_{i=1}^{N_O}, where
$$\tilde X(\hat\theta_i) := \sum_{k=1}^{N_H} p(\theta_k)\, r(\theta_k, \hat\theta_i)\, C_{\theta_k}.$$
Hence, the approximate score S̃ can be expressed as
$$\tilde S = \sum_{i=1}^{N_O} \mathrm{tr}\big(T_i\, \tilde X(\hat\theta_i)\big). \tag{22}$$
The value of S̃ will depend on the chosen discretization {θ_k} of the continuous parameter θ: the finer the discretization, the better the approximation. Hence, for a given discretization {θ_k}, the optimum score is given by either maximizing or minimizing S̃, again over the pair {{θ̂_i}_i, {T_i}_i} of estimates and testers.
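A minimal sketch of the prior discretization is given below. It assumes that the discrete prior is obtained by evaluating the continuous density on the grid and renormalizing, and it uses an evenly spaced grid on [0, 2π) with an unnormalized Gaussian density (mean π, unit standard deviation); the grid and the function names are illustrative assumptions.

```python
import math

def discretize_prior(density, thetas):
    """Evaluate a (possibly unnormalized) density on the grid and renormalize,
    so that the resulting weights form a valid probability distribution."""
    weights = [density(t) for t in thetas]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian(theta, mu=math.pi, sigma=1.0):
    # Unnormalized Gaussian; the normalization is fixed by discretize_prior.
    return math.exp(-((theta - mu) ** 2) / (2 * sigma ** 2))

n_hyp = 1000                                               # N_H
thetas = [2 * math.pi * k / n_hyp for k in range(n_hyp)]   # grid choice is an assumption
prior = discretize_prior(gaussian, thetas)

# Any integral against the prior becomes a finite sum, e.g. the prior mean.
prior_mean = sum(p * t for p, t in zip(prior, thetas))
print(f"sum of weights = {sum(prior):.12f}, prior mean = {prior_mean:.6f}")
```

The operators X̃(θ̂_i) are built in the same way, with the scalar summand replaced by the weighted Choi operator p(θ_k) r(θ_k, θ̂_i) C_{θ_k}.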
In what follows we propose three different methods, all based on semidefinite programming, with which this approximation can be computed.

A. Method 1: Approximating metrology with channel discrimination
The first approach we propose is heavily based on the problem of channel discrimination [53]. Its starting point is the realization that, without loss of generality, we may restrict ourselves to testers with as many measurement outcomes as there are hypotheses to be distinguished. In the context of our discretized parameter estimation problem, this amounts to setting N_O = N_H; essentially, there is no advantage in increasing the number of measurement outcomes beyond the number of different values in the discretization of θ. The second simplification is to choose the values of the estimates {θ̂_i} to be the same as the values in the discretization {θ_k}, in such a way that each measurement outcome i is directly associated with a value θ̂_i = θ_i. Choosing the values of the potential estimates of the parameter to correspond to the values in the discretization of the parameter reduces the estimation problem to a discrimination problem. In this case, the task can be interpreted as determining the "classical" label k that is encoded via the values of θ_k in the channel C_{θ_k}. The set of operators {X̃(θ̂_i)} then becomes
$$\tilde X(\theta_i) = \sum_{k=1}^{N} p(\theta_k)\, r(\theta_k, \theta_i)\, C_{\theta_k},$$
where N = N_O = N_H, and the approximate score S̃ becomes
$$\tilde S = \sum_{i=1}^{N} \mathrm{tr}\big(T_i\, \tilde X(\theta_i)\big).$$
For these fixed values of the discretization {θ_i}_{i=1}^{N}, the optimization of S̃ over all testers T is an SDP.
This approach circumvents problem (1), of finding the optimal estimators, by setting them to be the same values used in the discretization of the continuous parameter θ; and problem (2), of computing the integral in Eq. (16), by discretizing it. In principle, the higher the number of values in the discretization of θ, the closer the estimates are to the optimal estimator. The advantage then is that the optimal score can be found with a single SDP that needs to optimize only over the quantum strategies. The drawback, on the other hand, is that to achieve a good approximation of the optimal estimator, a high number N of values in the discretization is necessary, and since this number is directly associated with the number of measurement outcomes in the quantum strategy, the problem can eventually become intractable numerically and experimentally. In practice, however, as demonstrated in our examples in Sec. V, this method yields very good results with a value of N that can still be straightforwardly handled numerically.
Nevertheless, our next approach is designed to overcome this problem as well.

B. Method 2: Parameter discretization with optimal estimator
One possible way to overcome computational challenges is to fix the number of measurement outcomes N_O, and hence the number of tester elements, to a value that is computationally (and experimentally) tractable and increase the value of N_H far beyond that. Since, in this case, the complexity of the problem does not depend on N_H, the discretization of θ can be arbitrarily fine. However, because the number of values in the discretization of θ can far surpass the number of measurement outcomes, the association of one estimate θ̂_i to each discretization value θ_k is no longer possible. Therefore, the problem of choosing a "good" set of estimates {θ̂_i} is crucial.
Let us start by assuming that the optimal estimator is known to be {θ̂*_i}. Then, the operators {X̃(θ̂*_i)} amount to
$$\tilde X(\hat\theta^*_i) = \sum_{k=1}^{N_H} p(\theta_k)\, r(\theta_k, \hat\theta^*_i)\, C_{\theta_k}$$
for a fixed discretization {θ_k}_{k=1}^{N_H}, which can now in principle contain an arbitrarily high number of values N_H ≫ N_O. The approximate score S̃ then becomes
$$\tilde S = \sum_{i=1}^{N_O} \mathrm{tr}\big(T_i\, \tilde X(\hat\theta^*_i)\big).$$
The optimization of S̃ over the quantum strategy is then given by an SDP. This approach essentially takes care of problem (2), of numerically computing the integral in Eq. (16), by discretizing the parameter θ in an arbitrarily fine manner, while keeping the number of measurement outcomes low enough to limit the computational demand of the SDP. Hence, it is best suited for a situation in which the optimal estimator is known. It can nevertheless also be applied to a problem in which only a good guess for the optimal estimator is known, in which case the solution will be an approximation of the optimal score. Otherwise, to overcome problem (1) of finding the optimal estimator in the first place, we combine this approach with an estimator optimization in a seesaw algorithm, detailed in the following.

C. Method 3: Parameter discretization with estimator optimization
This final approach consists of a seesaw between two optimization problems (which are not necessarily SDPs) that will approximate an optimization over both the quantum strategy and the values of the estimates.
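A seesaw is an iterative method that alternates between two optimization problems, using the solution of one as the input of the other. In our case, the first optimization problem is the SDP of the previous approach (Method 2). Namely, given the estimates {θ̂_i}, compute
$$\max_{\{T_i\}} \sum_{i=1}^{N_O} \mathrm{tr}\big(T_i\, \tilde X(\hat\theta_i)\big),$$
where
$$\tilde X(\hat\theta_i) = \sum_{k=1}^{N_H} p(\theta_k)\, r(\theta_k, \hat\theta_i)\, C_{\theta_k}.$$
The second optimization problem will then be one that, for a fixed tester {T_i}, taken to be the optimal tester of the previous SDP, optimizes over the values {θ̂_i} of the estimates. Namely, given the tester {T_i}, compute
$$\max_{\{\hat\theta_i\}} \sum_{i=1}^{N_O} \sum_{k=1}^{N_H} p(\theta_k)\, r(\theta_k, \hat\theta_i)\, \mathrm{tr}(T_i\, C_{\theta_k}), \tag{28}$$
where {θ̂_i} are N_O possible values of θ.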
Whether the problem in Eq. (28) is an SDP will depend on whether the reward function r(θ_k, θ̂_i) is linear in {θ̂_i}. In practice, this will often not be the case. Nevertheless, in some cases this problem can be solved analytically: depending on the form of the reward function, the optimal estimator may be known or it may be found by standard Lagrangian optimization methods. In other cases, heuristic optimization methods may be applied.
This iterative method, although it does not necessarily converge to the optimal value of S̃ even for a fixed discretization {θ_k}, in practice leads to very good approximations. A relevant point here is that, assuming a situation where the seesaw does converge to the optimal estimator, one may restrict oneself without loss of generality to a maximum number of outcomes N_O that is related to the extremality properties of the tester. In principle, since (i) the set of testers T = {T_i} is convex and (ii) the function S̃ is linear in each tester element T_i, the maximum (or minimum) of S̃ will be achieved by an extremal tester. Analogously to extremal POVMs [54], extremal testers have at most d² (non-zero) elements, where d is the dimension of the space upon which the tester (or POVM) elements act. Hence, the number of outcomes N_O in the seesaw can be fixed to be at most (d_I × d_O)², since, for optimal estimators, there is no advantage in optimizing over non-extremal testers. This fact also holds for Method 2 if one is guaranteed to know the optimal estimator. Furthermore, if the cost function is the mean squared error r_MSE, then the optimal measurement will be projective (see Appendix A in Ref. [47]), and hence the optimal tester will have at most (d_I × d_O) outcomes. We present a case study in Sec. V B, which concerns the problem of thermometry, that falls precisely in this case.
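The estimator-update step of the seesaw can be sketched in a solver-independent way. For a fixed tester, the only quantities that enter are the weights w_{ik} = p(θ_k) tr(T_i C_{θ_k}), and each estimate can be optimized separately. The sketch below uses an exhaustive search over a list of candidate values as a simple heuristic; the function name, the candidate grid and the toy weights (built from a made-up likelihood rather than from an actual tester) are assumptions for illustration. For the MSE cost, the result can be compared with the posterior mean, which is the known analytical solution in that case.

```python
def update_estimates(weights, thetas, reward, candidates, maximize=True):
    """For each outcome i, pick the candidate estimate that optimizes
    sum_k weights[i][k] * reward(thetas[k], estimate)."""
    pick = max if maximize else min

    def objective(row):
        return lambda est: sum(w * reward(t, est) for w, t in zip(row, thetas))

    return [pick(candidates, key=objective(row)) for row in weights]

# Toy weights (assumption): flat prior and likelihoods p(0|theta) = theta,
# p(1|theta) = 1 - theta on a midpoint grid over (0, 1).
n_hyp = 50
thetas = [(k + 0.5) / n_hyp for k in range(n_hyp)]
prior = [1.0 / n_hyp] * n_hyp
likelihoods = [thetas, [1.0 - t for t in thetas]]
weights = [[p * l for p, l in zip(prior, row)] for row in likelihoods]

candidates = [j / 1000 for j in range(1001)]
estimates = update_estimates(weights, thetas, lambda t, e: (t - e) ** 2,
                             candidates, maximize=False)

# Analytical solution for the MSE cost: the posterior mean of each outcome.
posterior_means = [sum(w * t for w, t in zip(row, thetas)) / sum(row)
                   for row in weights]
print(estimates, posterior_means)
```

In a full seesaw, this step would alternate with the SDP over testers, with the weights recomputed from the tester returned by each SDP round.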

D. Convergence of the Methods
In all three methods above, we encounter some error due to the discretization of the integral into a finite number of hypotheses, as well as sub-optimality due to our choice of estimators. The discretization error is expected to vanish as N_H increases, since all three methods are based on approximating an integral with a Riemann sum, with an error that vanishes as 1/N_H. As for the sub-optimality, let us define the best approximate score S̃* similarly to Eq. (19), but for the approximate score defined in Eq. (22). For large enough N_H, and when the reward function is to be maximised, this means that S̃* ≤ S*, while for cost functions that are to be minimised, it means that S̃* ≥ S*. The sub-optimality stems from the fact that none of the methods simultaneously optimises over both {θ̂_i} and {T_i}. Each of the three methods, however, deals with sub-optimality differently. In all three methods one has
$$|\tilde S^* - S^*| = O(1/N_O),$$
and thus one can guarantee convergence by choosing N_O ≫ 1. In Appendix B we rigorously derive the convergence for arbitrary cost functions and furthermore show that for certain cost functions that we will later use in the case studies (Examples 1 and 2) the convergence is even faster, i.e., |S̃* − S*| = O(1/N_O²). When we cannot arbitrarily increase N_O, Methods 2 and 3 come to the rescue. In particular, if we know a priori what the optimal estimators are, then Method 2 allows us to find the optimal testers in one shot. However, it is rarely the case that we know the optimal estimators a priori. Nonetheless, as we see in the examples below, Method 2 typically finds sub-optimal solutions that are very close to the optimum. Method 3, on the other hand, adds a powerful layer of optimization based on a seesaw between the estimators and testers, and therefore has a higher chance of finding the optimal protocol even with

V. CASE STUDIES
Our methodology for solving the Bayesian parameter estimation problem using higher-order operations offers numerous advantages over conventional techniques in quantum metrology. By proposing to optimize over the input state and measurement with a single SDP, and combining this with effective heuristics for the joint optimization of the quantum strategy and estimator, we overcome longstanding challenges of the Bayesian approach. Our approach provides a comprehensive and versatile set of techniques that can be applied to any Bayesian estimation problem, setting it apart from most existing methods in the literature.
The key strength of our approach lies in its ability to handle a wide range of estimation problems, without being limited to specific error quantifiers. This universality allows our method to be seamlessly applied to any estimation scenario. Moreover, the techniques we described here are equally effective for single-parameter and multiparameter estimation tasks. Finally, unlike most techniques in the Bayesian approach, our methods are not bound by the type of dynamics used to encode the parameter. Whether the parameter is encoded via a unitary evolution (e.g. phase estimation), or a more complex open-system dynamics resulting from the probe's interaction with a thermal environment (e.g. quantum thermometry), our approach can be systematically applied and, as we show in the following, delivers consistent results which are very close to the optimal values.
We now delve into the practical application of our methods and explore how they can be applied to determine the optimal estimation strategy for various scenarios encountered in quantum metrology. The examples are deliberately chosen to cover a wide range of different problems to demonstrate the versatility of our methods.

A. Example 1: Paradigmatic example - Local phase estimation
We start with a paradigmatic task in quantum metrology, namely single-parameter unitary phase estimation. In this problem a single parameter θ ∈ [0, 2π) is encoded in an n-qubit quantum system via a local unitary channel
$$\mathcal{E}_\theta[\rho] = U_\theta\, \rho\, U_\theta^\dagger, \qquad U_\theta = e^{-i\theta S_z},$$
where S_z is a collective spin operator built from the single-qubit operators σ_z^{(i)}, with σ_z^{(i)} the Pauli-Z matrix of the i-th qubit. Due to the intrinsic symmetry of the problem, every state of the n-qubit system can be effectively described using an (n+1)-dimensional Hilbert space, i.e. the symmetric subspace [55]. Consequently, the n-qubit phase estimation problem can be equivalently mapped into a phase estimation problem of a d-dimensional system with d = n + 1. In this representation the generator of the dynamics, S_z, expressed in the computational basis {|0⟩, |1⟩, . . ., |n⟩}, is given by S_z = Σ_{i=0}^{n} i |i⟩⟨i| [56]. For this example, we take a typical reward function in phase estimation, which takes into account the cyclicity of its parameter space, given by
$$r(\theta, \hat\theta_i) = \cos^2\!\left(\frac{\theta - \hat\theta_i}{2}\right).$$
We also choose two different priors. The first is a uniform distribution,
$$p(\theta) = \frac{1}{\theta_{\max} - \theta_{\min}}, \tag{32}$$
where θ_min = 0 and θ_max = 2π are respectively the minimal and maximal values of the parameter. The second is a Gaussian distribution,
$$p(\theta) = \frac{1}{\mathcal{N}} \exp\!\left[-\frac{(\theta - \mu)^2}{2\sigma^2}\right], \tag{33}$$
where 𝒩 is the normalization factor. We set the mean µ = π and the standard deviation σ = 1. For the discretization of the parameter θ and the initial value of the estimators, we fix the values given in Eqs. (34) and (35), respectively. We now discuss how to apply each of our methods to infer the optimal protocol in this case and present the results obtained for the problem of (n = 2)-qubit phase estimation, plotted in Fig. 2.
Method 2. Here we fix the number of outcomes N_O ∈ {2, . . ., 10} and set the number of hypotheses to be N_H = 1000 ≫ N_O. We then discretize the parameter and set the estimators according to Eqs. (34) and (35). For the case of a uniform prior (Eq. (32), Fig. 2(a)), these are expected to be the optimal estimators. For the case of a Gaussian prior (Eq. (33), Fig. 2(b)), these estimators are not expected to be optimal, but are nonetheless used in Method 2, serving as a starting point for the estimator optimization in Method 3.
Method 3. For this method, we take the solution for the testers found using Method 2 for each N_O as a starting point, and then optimize over the estimator. We prove in App. B that in this case the optimal estimators are given as a function of the optimal tester, according to Eq. (36). Note that with the definition of Eq. (36) the range of the estimator is [−π, π], instead of [0, 2π] as defined initially; this has no effect on the expected reward and can be resolved by adding 2π if θ̂*_i < 0. Hence, the second step in the seesaw is solved analytically, and in each round of the seesaw we update the value of the estimators according to the expression above, as a function of the testers found by the SDP in the first step. Here and in the following examples, we iterate these two steps until the gap between the values of the score in subsequent rounds is smaller than 10⁻⁶.
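For the reward cos²((θ − θ̂)/2), an analytical estimator update can be sketched from an elementary identity: for non-negative weights w_k, the sum Σ_k w_k cos²((θ_k − θ̂)/2) equals ½Σ_k w_k + ½ Re[e^{−iθ̂} Σ_k w_k e^{iθ_k}], which is maximized by θ̂ = arg(Σ_k w_k e^{iθ_k}). The code below implements this circular mean and maps the result to [0, 2π). It is offered as an illustration consistent with the description above, not as a transcription of the expression proven in App. B; the weights (a Gaussian bump centred at θ = 5) stand in for p(θ_k) tr(T_i C_{θ_k}) and are a made-up example.

```python
import cmath
import math

def circular_estimate(weights, thetas):
    """Maximizer of sum_k w_k cos^2((theta_k - est)/2), mapped to [0, 2*pi)."""
    z = sum(w * cmath.exp(1j * t) for w, t in zip(weights, thetas))
    est = cmath.phase(z)                  # value in [-pi, pi]
    return est + 2 * math.pi if est < 0 else est

def expected_reward(weights, thetas, est):
    return sum(w * math.cos((t - est) / 2) ** 2 for w, t in zip(weights, thetas))

n_hyp = 360
thetas = [2 * math.pi * k / n_hyp for k in range(n_hyp)]
# Hypothetical weights for one outcome: a bump centred at theta = 5.0.
weights = [math.exp(-((t - 5.0) ** 2) / (2 * 0.3 ** 2)) for t in thetas]

estimate = circular_estimate(weights, thetas)
print(f"estimate = {estimate:.4f}")
```

A raw phase near −1.28 is shifted by 2π to land close to 5.0, which illustrates why the change of range has no effect on the expected reward.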
Results. In Fig. 2 we plot the maximal approximate scores S obtained via the three methods outlined above. In the case of the uniform prior (Fig. 2(a)) we observe that the approximate score very quickly reaches the optimal one, i.e., $S \approx S^* = \tfrac{1}{2}(1 + \cos(\pi/4))$, which was previously obtained with alternative methods [28,57]. All three methods converge very quickly to this solution: already for $N_O = 3$ outcomes using Methods 2 and 3, and for $N_O = 4$ outcomes using Method 1. Notice also that, since we already start at the optimal estimator, there is no advantage for any $N_O$ in applying the seesaw of Method 3, as Method 2 already returns the optimal solution. For the case of the Gaussian prior (Fig. 2(b)), we see that Method 3 again converges to a stable value of S with $N_O = 4$ outcomes. Here we can see an initial difference between Methods 1 and 2, which take a fixed estimator, and Method 3, which optimizes over the estimator. Nevertheless, all methods quickly converge to approximately the same value at $N_O = 10$.

B. Example 2: Non-unitary evolution - Thermometry
Let us now discuss a different instance of single-parameter estimation, namely thermometry [58,59]. In this case, the unknown parameter is the temperature θ of a sample (or a thermal bath) that is resting at thermal equilibrium, and it is encoded in the probe using a non-unitary quantum channel. We consider the probe to be a two-level system (qubit) which is potentially entangled with an auxiliary system that does not undergo the non-unitary dynamics. At the initial time t = 0, the probe and the sample, which are initially uncorrelated, start to interact. After some fixed time t, the probe and the auxiliary system are jointly measured to infer the temperature of the bath. The probe's reduced state $\rho^p_\theta = \mathrm{tr}_A[\rho_\theta]$ evolves according to a standard Markovian quantum master equation [60-63], given in Eq. (38),
where $H = \epsilon\,|1\rangle\langle 1|$ is the Hamiltonian of the probe, $\sigma_- = |0\rangle\langle 1|$ and $\sigma_+ = |1\rangle\langle 0|$ are the jump operators, and the dissipator superoperator captures the effect of the environment on the probe. The dissipation rates $\Gamma_{\rm in}$ and $\Gamma_{\rm out}$ are the only temperature-dependent parts of the dynamics and are responsible for encoding the parameter. For a bosonic/fermionic environment, we have $\Gamma_{\rm in} = J(\epsilon) N_{B/F}$ and $\Gamma_{\rm out} = J(\epsilon)(1 \pm N_{B/F})$, where the plus sign applies to bosons and the minus sign to fermions, with $J(\epsilon)$ being the bath spectral density and $N_{B/F}$ the occupation number of the bosonic or fermionic bath, defined as $N_B = (e^{\epsilon/\theta} - 1)^{-1}$ and $N_F = (e^{\epsilon/\theta} + 1)^{-1}$, respectively. In what follows, we focus on the bosonic bath; however, our methods can be applied to the fermionic case as well.
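The temperature dependence of the two rates can be made concrete with a short Python sketch. The function names and the `statistics` argument are hypothetical; the numerical values ϵ = 0.1 and J(ϵ) = 2 are those quoted in the caption of Fig. 3, and the temperature 0.5 is an arbitrary point of the prior range.

```python
import math


def occupation(eps, theta, statistics="boson"):
    """Occupation number N_B or N_F at energy eps and temperature theta."""
    x = math.exp(eps / theta)
    if statistics == "boson":
        return 1.0 / (x - 1.0)
    if statistics == "fermion":
        return 1.0 / (x + 1.0)
    raise ValueError("statistics must be 'boson' or 'fermion'")


def rates(eps, theta, spectral_density, statistics="boson"):
    """Return (Gamma_in, Gamma_out) = (J N, J (1 +/- N)), plus sign for bosons."""
    n = occupation(eps, theta, statistics)
    sign = 1.0 if statistics == "boson" else -1.0
    return spectral_density * n, spectral_density * (1.0 + sign * n)


gamma_in, gamma_out = rates(eps=0.1, theta=0.5, spectral_density=2.0)
```

For both kinds of bath the ratio Γ_in/Γ_out equals exp(−ϵ/θ), so the rates carry the temperature through a Boltzmann factor.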
The evolution specified by Eq. (38) generates an effective quantum channel $\mathcal{E}_\theta(t)$ that imprints the temperature onto the probe's state (see App. D for the explicit expression). Note that in our notation we keep the time dependence because we are also interested in the optimal protocol at different times.
As for the cost function, we use the MSE, while the prior distribution p(θ) is uniform and given by Eq. (32), where we set $\theta_{\min} = 0.1$ and $\theta_{\max} = 2$ as the minimum and maximum values of the temperature. We discretize the temperature parameter θ and fix the estimators according to Eqs. (34) and (35), respectively. We evaluate the thermometry problem for 100 different time steps, evenly distributed between t = 0 and t = 1.
Let us now discuss how to approach this problem using each of the three methods presented in this work.
Method 2. Here we fix the number of outcomes $N_O \in \{2, \dots, 20\}$ and set the number of hypotheses to $N_H = 1000 \gg N_O$. The estimator values of Eq. (35) are not expected to be optimal but are nevertheless used in Method 2, serving as a starting point for the estimator optimization in Method 3.
Method 3. To apply the seesaw in Method 3, we again begin with the solution provided by Method 2 for each $N_O$ and t as a starting point. For the thermometry problem, as was the case for the phase estimation problem, we can analytically express the optimal estimator as a function of the quantum strategy (tester). Since the score is quantified using the MSE, the optimal estimator is simply the mean over the posterior, $\langle\theta_k\rangle$, as given by Eq. (37). Once again, the first step of the seesaw consists of an optimization over the testers, while the second step consists of reassigning values to the estimators as a function of the testers found in the previous step, according to the above expression.
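The second seesaw step for the MSE can be sketched in a few lines of Python. The sketch assumes that the outcome probabilities p(i|θ_k) are supplied as a plain table `likelihood[i][k]`; in the actual protocol these numbers would be computed from the tester found in the first step. The function names and the toy numbers are hypothetical.

```python
def posterior_mean_estimators(thetas, prior, likelihood):
    """One estimator per outcome i: the mean of theta over the posterior p(theta_k | i)."""
    prior_mean = sum(p * t for t, p in zip(thetas, prior))
    estimators = []
    for row in likelihood:
        joint = [p * l for p, l in zip(prior, row)]   # p(theta_k) p(i | theta_k)
        p_i = sum(joint)                              # probability of outcome i
        if p_i > 0.0:
            estimators.append(sum(j * t for t, j in zip(thetas, joint)) / p_i)
        else:
            # Outcome never occurs; any value works, the prior mean is a safe default.
            estimators.append(prior_mean)
    return estimators


def expected_mse(thetas, prior, likelihood, estimators):
    """Average squared error over parameters and outcomes."""
    return sum(p * l * (t - est) ** 2
               for row, est in zip(likelihood, estimators)
               for t, p, l in zip(thetas, prior, row))


thetas = [0.5, 1.0, 1.5]
prior = [1.0 / 3.0] * 3
likelihood = [[0.8, 0.5, 0.2], [0.2, 0.5, 0.8]]   # two outcomes, three grid points
estimators = posterior_mean_estimators(thetas, prior, likelihood)
```

In this toy table the first outcome favors low temperatures and the second favors high ones, so the two updated estimators move below and above the prior mean, respectively.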
Results. In Fig. 3 we compare the performance of the three methods outlined above as a function of the number of outcomes $N_O$ for a fixed time t = 0.05. We observe that Methods 1 and 2 start from a relatively large S (which is now being minimized) that gradually decreases with increasing $N_O$, while Method 3 already starts at a value of S close to the one to which it converges. While Method 2 quickly converges to the same values of S as Method 3 with increasing $N_O$, at $N_O = 20$ the approximate score predicted by Method 1 is still somewhat above the corresponding one obtained using Method 3. This is a result of the error in approximating the operators $X(\hat\theta_i)$ by a Riemann sum with $N_H$ elements. Indeed, it is only in the limit $N_H \to \infty$ that the approximate score in Method 1 is guaranteed to converge to the true optimal score. Finally, we observe that Method 3 saturates around its optimal value already for $N_O = 4$. This is expected since, for an MSE cost function, projective measurements are optimal [47]. In Fig. 4, we focus on this point by comparing Method 3 for some fixed values of $N_O$ as a function of time t, over the whole interval of time evaluated in this problem. Here we can see that while there is an improvement in increasing $N_O$ up to 4, the curve for $N_O = 16$ lies on top of that of $N_O = 4$, demonstrating that there is indeed no advantage in increasing the number of outcomes beyond $N_O = 4$.
Another interesting point is that the score clearly depends on the time of evolution. In particular, in Fig. 4 we observe that there exist times t < ∞ where the score is much better (recall that for an MSE cost function, the optimal score is being minimized) than at t → ∞. This means that measuring the probe in the transient regime can be advantageous over estimation performed after reaching the steady state. Indeed, this effect has been observed in thermometry previously [64]. For the steady state, the optimal measurement strategy is known. In particular, in this case the auxiliary system is useless and the optimal measurement is a PVM in the basis of the probe Hamiltonian [65]. However, as we discuss in the next paragraph, this is not the case in the transient regime, where entanglement with the probe leads to a more precise estimation. Our techniques therefore allow one to determine the optimal probe and memory state, as well as the measurements and estimators, in this difficult regime.
Finally, in order to investigate the role of entanglement in the transient regime of temperature estimation, we compare two different measurement strategies: a general strategy where the memory can be initially entangled with the probe, and the scenario where the initial state of the memory and probe is separable. Figure 5 highlights the importance of the entangled auxiliary qubit. To this end, we again focus on Method 3 and depict the approximate score as a function of time for strategies with and without entanglement. As one expects, in the limits of very short or very long times the two kinds of strategies perform equally well. In the former case this is because there has not been enough time to collect new information and add it to the prior; in the latter it is because, after a long time, the system reaches a steady state regardless of the input state, i.e., regardless of whether it is entangled or not. However, in the transient regime, we observe that an auxiliary system entangled with the probe can significantly improve the score.
Let us emphasize that very often the parameter estimation problem described above cannot be solved analytically and is very difficult to solve numerically. In general, the effective evolution of the probe may result from a complicated master equation, which has to be evaluated many times. In our approach there is no need to evaluate the evolution for each potential state of the probe, as the only thing we need are the Choi operators associated with the effective channels. In this sense, our methods only require solving the dynamics on a finite grid of parameter values, and thus make finding the solution more tractable numerically.
Finally, for Method 3 we investigate the optimal initial probe-ancilla state ρ found by our algorithm for different times t ∈ [0, 1]. We consider the case of four outcomes, $N_O = 4$, which is found to give the optimal precision. Here the tester has four elements $\{T_1, \dots, T_4\}$, and the initial state can be computed with the help of Eqs. (11) and (12). Note that the state $\rho = |\Psi\rangle\langle\Psi|_{IA}$ is pure by construction, and its Schmidt-diagonal form (for $p_0 \ge p_1$) is given in Eq. (41), where n is the Bloch vector corresponding to the state $|n\rangle\langle n| = \tfrac{1}{2}(\mathbb{1} + n^T \sigma)$ and the basis choice for the ancillary qubit plays no physical role. We find that for all times the optimal state is Schmidt diagonal in the computational basis of the probe, and the Schmidt state corresponding to the larger value $p_0$ is always the ground state, i.e., $|n\rangle_I = |0\rangle_I$. In contrast, the amount of entanglement in the optimal state depends on the interaction time t, as shown in Fig. 6. In particular, for the first two times, t = 0 and t = 0.01, the state is close to maximally entangled. At t = 0 no state encodes any information on the parameter. At t = 0.01 this is due to a numerical error, as the score S is still found to be maximal by the algorithm (see the orange line in Fig. 4). We then find that the entanglement in |Ψ⟩, as captured by the value $p_0$, changes smoothly with t. Asymptotically, for t → ∞, we know that all initial states do equally well, as the dynamics prepares the probe in the steady state, which is in a product with the state of the auxiliary system, the latter being independent of the parameter. Moreover, from Fig. 5 it is clear that the presence of entanglement in the initial state does not give any substantial improvement for t close to one. We have also considered the optimal measurements $\{M_1, \dots, M_4\}$ in Eq. (13) found by the algorithm. We see that the POVM elements are given by rank-1 projectors. The states corresponding to the projectors are also Schmidt diagonal in the computational basis (for the probe), and the amount of entanglement first increases and then decreases for t ≥ 0.06 (similar to Fig. 6). All the data on the optimal strategies found with our methods are available in our repository [46].
Lastly, as we have already pointed out, our methods can readily be adapted to other reward functions. As an example, in App. D we address the thermometry problem with the mean square logarithmic error as the reward function, which has gained attention in recent years because it respects scale invariance [66-68].

C. Example 3: Multi-parameter estimation - SU(2) gates
For our final example, we consider a more complex metrology problem involving multiple parameters: the estimation of an arbitrary qubit unitary, i.e., an element of the group SU(2).
As a first observation, note that any qubit unitary operator can be parameterized in terms of three independent parameters $\theta := (\theta_x, \theta_y, \theta_z)$, with $0 \le \theta_i < 2\pi$ for all $i \in \{x, y, z\}$, as in Eq. (42). Here, $\sigma_i$ for $i \in \{x, y, z\}$ are the three Pauli operators. Since these generators do not commute, the estimation of the unitary $U_\theta$ (or equivalently of the parameter vector θ) is a multiparameter estimation problem. The unitary channel that acts on the probe system and encodes the parameter θ is then given simply by $\mathcal{E}_\theta[\cdot] = U_\theta (\cdot) U_\theta^\dagger$, and has a Choi operator $C_\theta$ associated with it.
We take a natural reward function that captures how close the estimated unitary is to the actual one, namely the fidelity of Eq. (43), where in this example d = 2. Here $C_{\hat\theta_i}$ and, for later reference, $C_{\theta_k}$ are defined analogously to $C_\theta$, for a vector of estimator values $\hat\theta_i = (\hat\theta^x_a, \hat\theta^y_b, \hat\theta^z_c)$ and for a vector of discretization values $\theta_k = (\theta^x_a, \theta^y_b, \theta^z_c)$. Here, we again analyze two different prior distributions: a uniform prior, as in Eq. (32), and a Gaussian prior, as in Eq. (33).
The parameter vector $\theta = (\theta_x, \theta_y, \theta_z)$ is discretized into values $\theta_k = (\theta^x_a, \theta^y_b, \theta^z_c)$, where each of the three elements follows the discretization in Eq. (34), with $\theta_{\min} = -\pi$ and $\theta_{\max} = \pi$, each with the same number of values $n_H$. Notice that this amounts to a total of $N_H = n_H^3$ different discretization values. The initial set of estimators $\hat\theta_i = (\hat\theta^x_a, \hat\theta^y_b, \hat\theta^z_c)$ is also set according to Eq. (35) for each parameter estimator, using the same values $\theta_{\min} = -\pi$ and $\theta_{\max} = \pi$, each with the same number of values $n_O$. This again amounts to a total number of outcomes $N_O = n_O^3$, analogously to the indexation of the parameter discretization.
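The cubic growth of the number of hypotheses and outcomes is easy to see in a short Python sketch that builds the three-parameter grid as a Cartesian product of one-dimensional grids. The midpoint grid used per axis is an assumption made for illustration (it is not Eq. (34) or Eq. (35)), and the function names are hypothetical.

```python
import itertools
import math


def axis_grid(theta_min, theta_max, n):
    """Assumed per-axis discretization: n midpoints of equal-width bins."""
    step = (theta_max - theta_min) / n
    return [theta_min + (k + 0.5) * step for k in range(n)]


def product_grid(n, theta_min=-math.pi, theta_max=math.pi):
    """All triples (theta_x, theta_y, theta_z) with each component on the same axis grid."""
    axis = axis_grid(theta_min, theta_max, n)
    return list(itertools.product(axis, repeat=3))


hypotheses = product_grid(10)   # n_H = 10 per axis, N_H = 10**3 in total
estimators = product_grid(3)    # n_O = 3 per axis, N_O = 3**3 in total
```

With only 10 values per axis the total already reaches 1000 hypotheses, which is why the per-parameter resolution in this example is much coarser than in the single-parameter ones.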
We now discuss the application of our methods to this specific problem, plotting our results in Fig. 7.
Method 1. To apply Method 1, we again set $n_H = n_O = n$ for each of the three parameters, amounting to a total number of discretization values and of outcomes equal to $N = n^3$. For the results presented here, we look at values of $n = N^{1/3} \in \{2, \dots, 10\}$.
Method 2. Here we fix the number of outcomes $n_O = N_O^{1/3} \in \{2, \dots, 10\}$ and set the number of hypotheses to $N_H = n_H^3 = 10^3$. Notice that while the total number of hypotheses is 1000, each of the three parameters is discretized into only 10 different values. The estimator values of Eq. (35) are not expected to be optimal but are nevertheless used in Method 2, serving as a starting point for the estimator optimization in Method 3.
Method 3. To apply the seesaw in Method 3, we again begin with the solution provided by Method 2 for each N O as a starting point.In this case, we optimize over the estimators using standard gradient descent techniques.Therefore, the first step in the seesaw is the SDP in Method 2, while the second step is a heuristic search over the estimators for a fixed tester.
Results. In Fig. 7, we plot the maximal approximate scores S obtained via the three methods outlined above. Panel (a) in Fig. 7 concerns the case of a uniform prior distribution and panel (b) that of a Gaussian prior distribution. In both cases we observe Method 3 converging to its final value of S with $N_O^{1/3} = 3$, i.e., a total number of outcomes $N_O = 27$.

VI. CONCLUSION AND OUTLOOK
We introduced a new set of tools for addressing Bayesian parameter estimation problems, applying techniques from the formalism of higher-order operations and drawing inspiration from the problem of channel discrimination.
The key insight that we exploit consists of describing the quantum strategy, i.e., the state preparation and the measurement (see Fig. 1), as a single operation called a quantum tester. The latter is characterized by SDP constraints and can thus be optimized efficiently. We developed three methods for determining the state of the probe, the measurement, and the estimators in any parameter estimation problem, regardless of prior distribution, reward function, or description of the quantum evolution.
The first method exploits the connection between the Bayesian approach to parameter estimation and quantum channel discrimination. By discretizing the parameter to a finite set of values, and by furthermore associating each value of the estimator with a value of the discretized parameter, one can directly map a parameter estimation task onto a channel discrimination one, albeit with a reward function which inherits the geometry of the original parameter set. We leveraged this connection to create a general method for approximating the optimal solution of the estimation problem to arbitrary precision. We also proved that our approximation converges to the optimal score. Although this method is conceptually simple and comes with a convergence guarantee, it may nonetheless be computationally demanding, since the size of the optimization variables increases with the fineness of the parameter discretization. Our second method computes an approximation of the optimal quantum strategy for a fixed set of estimators. This method is less computationally demanding in general, but it relies on a good guess for the optimal estimators, which is not always available. To address this drawback, our third method iteratively combines an optimization over the quantum strategy with one over the estimators, and hence does not require any prior knowledge of the estimators.
A key advantage of our methods is their universal applicability.They can be used for any parameter estimation problem, regardless of the nature of parameter-encoding or the number of parameters to be estimated.To showcase this wide-ranging applicability, we examined three distinct case studies of high practical importance: local phase estimation, thermometry, and SU(2) estimation.
We also developed tools to bound the performance of estimation strategies that do not require entanglement and used them to show that, in the thermometry problem, probe states that are entangled with an auxiliary system lead to a more precise estimation of the temperature parameter, particularly at finite times.
Our work provides a starting platform for the application of higher-order operations to the problem of Bayesian parameter estimation.We conclude by summarizing further research directions that could draw further benefits from this approach.
Generalization to many-shots.The quantum strategies explored here concern a situation in which, at each independent realization of the experiment, one is given access to a single call, or copy, of the channel that encodes the parameter θ.It is also the case that, in more general scenarios, where one has access to multiple calls at once, the optimization of the quantum strategy can be done with SDP as well.Multiple-copy testers can take different forms, describing different classes of quantum strategies, such as parallel (non-adaptive) and sequential (adaptive), or even those involving an indefinite causal order.Such testers have been defined in Refs.[41,42] and explored in a frequentist approach to metrology [44].These techniques can also be applied to multiple-copy Bayesian estimation protocols, and be exploited to investigate, for example, whether different classes of estimation strategies can lead to higher precision in the parameter estimation.Similarly, strategies with an indefinite causal-order could lead to establishing new types of metrological resources.
While in the multi-shot scenario one can find the globally optimal protocol as explained above, it can be hard to implement in practice due to the exponential growth of the Hilbert space dimension. Alternatively, one can seek greedy optimal algorithms [69]. In such a strategy, one would (i) perform the optimization protocol as prescribed in our work, (ii) update the prior distribution to the posterior distribution based on the outcome, and repeat steps (i) and (ii) until all shots are consumed. Despite not being necessarily globally optimal, this strategy can be very strong and has been shown to be asymptotically optimal in some cases [70]. An example of interest to us is Bayesian equilibrium thermometry [68]. Regardless of global optimality, such multi-shot strategies are easier to implement in practice, since the required operations do not involve exponentially growing Hilbert spaces.
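The greedy loop of steps (i) and (ii) can be sketched classically in Python. In this sketch the single-shot optimization of step (i) is replaced by a hypothetical callback `choose_protocol`, which is assumed to return a table of outcome probabilities `likelihood[i][k]` for the current belief; `fixed_protocol` below is a toy stand-in that ignores the belief, not an optimized strategy.

```python
def bayes_update(prior, likelihood_row):
    """Step (ii): posterior over grid points after observing one outcome."""
    joint = [p * l for p, l in zip(prior, likelihood_row)]
    norm = sum(joint)
    if norm == 0.0:
        raise ValueError("outcome has zero probability under the current prior")
    return [j / norm for j in joint]


def greedy_estimation(prior, choose_protocol, outcomes):
    """Alternate protocol selection (i) and Bayesian update (ii) over all shots."""
    belief = list(prior)
    for outcome in outcomes:
        likelihood = choose_protocol(belief)               # step (i), assumed given
        belief = bayes_update(belief, likelihood[outcome])  # step (ii)
    return belief


def fixed_protocol(belief):
    """Toy stand-in for step (i): the same two-outcome table at every shot."""
    return [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]


posterior = greedy_estimation([1.0 / 3.0] * 3, fixed_protocol, [0, 0, 1, 0])
```

Each shot only requires a single-copy protocol and a reweighting of the grid, which is the practical appeal of the greedy approach described above.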
Applications to complex noise models. One of the key advantages of our approach is its versatility in handling various types of parameter-encoding dynamics. Sensing methods are often limited to specific parameter-encoding channels; our approach, however, can effectively model and accommodate any type of dynamics and address different types of noise. Specifically, if one has a good description of the noise appearing in the measurement process, one can simply incorporate this noise into the encoding channel and compute optimal testers and optimal estimators according to any one of the three methods. Therefore, applying these techniques to real (noisy) experimental settings to infer their actual performance would be an interesting next step.
Quantum metrology techniques for asymptotic quantum channel discrimination. In this study, we utilized higher-order operations, a technique previously used to study channel discrimination, to provide a nearly optimal solution for the quantum parameter estimation problem. It would be interesting to explore whether the reverse approach could also yield novel insights into the field of channel discrimination. Specifically, one could investigate whether leveraging asymptotic theoretical results from quantum metrology, such as the Heisenberg scaling, can contribute to the investigation of asymptotic quantum channel discrimination. This direction holds promise for gaining a deeper understanding of the relationship between channel discrimination and quantum metrology.
Connections with the multi-hypothesis testing problem.The discretization of the parameter space that we perform suggests that the Bayesian estimation problem can be connected with a multi-hypothesis testing problem [71].However, it should be noted that this connection is only partial.Indeed, our work exploits the fact that Bayesian estimation can be seen as a multi-hypothesis testing problem with (i) a continuous set of hypotheses and with (ii) a specific geometry on the "hypothesis space" as captured by the cost function.Still, we believe that the methods developed here (Method 1 in particular) could be potentially useful for determining bounds on the errors for the multi-hypothesis testing problem.Another interesting question is whether some of the bounds on error probabilities arising in the multi-hypothesis testing scenario could be also applicable in the Bayesian setting.
All code developed for this work is freely available in our online repository [46].

Appendix B: Convergence of the approximations
The cornerstone of our results is the discretization of the estimators and the hypotheses.The intuition suggests that as the discretization is made finer, the approximation becomes more precise and converges to the exact value.Here, we make this statement more rigorous.We focus on a score function that needs to be maximised; for those that require minimization a similar argument holds.
First, let us denote the optimal protocol by $\{\{T^*_i\}, \{\theta^*_i\}\}$; it maximizes the score in Eq. (17), attaining its optimal value $S^*$. We know that the optimal protocol has at most $D := (d_I \times d_O)^2$ elements, therefore $S^* = \sum_{i=1}^{D} \mathrm{tr}[X(\theta^*_i)\, T^*_i]$. Now consider another protocol in which the estimators are fixed to $\{\hat\theta_i\}_{i=1}^{N_O}$, and the tester $\{\hat T_i\}_{i=1}^{N_O}$ is the solution of the SDP maximizing the score for the given estimators. This protocol achieves a certain score $\hat S = \sum_{i=1}^{N_O} \mathrm{tr}[X(\hat\theta_i)\, \hat T_i]$. For each optimal estimator $\theta^*_k$, let $\tilde\theta^*_k$ denote the element of $\{\hat\theta_i\}$ closest to it, i.e., the one minimizing $|\hat\theta_i - \theta^*_k|$, and let $\epsilon_k$, defined in Eq. (B6), quantify how different these two values are. For simplicity we also introduce $\epsilon := \max_i |\epsilon_i|$. Note that for concreteness we here used the absolute value of the difference $|\hat\theta_i - \theta^*_k|$ as a distance between the estimated values; however, any other distance $d(\hat\theta_i, \theta^*_k)$ could be used here and below to define $\tilde\theta^*_k$ and ϵ instead (e.g., in the multiparameter case). The new values $\{\tilde\theta^*_i\}$ allow us to define the protocol $\{\{T^*_i\}, \{\tilde\theta^*_i\}\}$, where the testers are taken from the optimal protocol but the estimators have been modified. It achieves a certain score $\tilde S^* = \sum_{i=1}^{D} \mathrm{tr}[X(\tilde\theta^*_i)\, T^*_i] \le \hat S$. Here, we used the fact that by construction $\{\tilde\theta^*_k\}_{k=1}^{D}$ form a subset of $\{\hat\theta_i\}_{i=1}^{N_O}$; therefore the maximization over the tester $\{T_i\}_{i=1}^{N_O}$ includes the maximization over the tester $\{T_i\}_{i=1}^{D}$ (some of the tester elements can be identically zero).
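Our next goal is to bound the deviation between $\tilde S^* = \sum_{i=1}^{D} \mathrm{tr}[X(\tilde\theta^*_i)\, T^*_i]$ and the optimal score $S^*$, which are obtained with the same tester. To do so we recall their Bayesian interpretation in terms of the posterior parameter distribution in Eq. (4), $S^* = \sum_i p(i)\, \mathbb{E}^{(i)}[r(\theta, \theta^*_i)]$ and $\tilde S^* = \sum_i p(i)\, \mathbb{E}^{(i)}[r(\theta, \tilde\theta^*_i)]$, where each expected value $\mathbb{E}^{(i)}[\cdot]$ is taken with respect to the probability distribution p(θ|i). This allows us to write, as in Eq. (B9), $S^* - \tilde S^* = \sum_{i=1}^{D} p(i)\,\big(\mathbb{E}^{(i)}[r(\theta, \theta^*_i)] - \mathbb{E}^{(i)}[r(\theta, \tilde\theta^*_i)]\big)$. Here, it is intuitively clear that for nearby values $\theta^*_i$ and $\tilde\theta^*_i$ the expected values $\mathbb{E}^{(i)}[r(\theta, \theta^*_i)]$ and $\mathbb{E}^{(i)}[r(\theta, \tilde\theta^*_i)]$ will also be close, provided that the reward function r is regular enough. For simplicity let us now assume that it is Lipschitz continuous, i.e., $|r(\theta, \hat\theta') - r(\theta, \hat\theta)| \le K_r\, |\hat\theta' - \hat\theta|$ whenever $|\hat\theta' - \hat\theta| \le \delta$ (here $K_r$ might depend on δ). For any small enough ϵ ≤ δ this directly implies $\tilde S^* \ge S^* - K_r\, \epsilon$. Finally, for a scalar parameter the $N_O$ estimators $\{\hat\theta_i\}$ can be chosen such that $\epsilon \le L/N_O$, where L is some constant depending on the prior. This guarantees the convergence to the optimal score, with $S^* \ge \hat S \ge S^* - K_r L/N_O$.
1. The case of reward functions that are not Lipschitz continuous
Notably, Lipschitz continuity of the reward function is not necessary to guarantee the convergence of the score $\tilde S^* \to S^*$. However, in such cases it seems difficult to make a general statement, which might furthermore require assuming some regularity of the prior. Nevertheless, for illustration let us consider a piecewise constant reward function that can be used to define a confidence interval for the parameter. This reward function coincides with the recent proposal in Ref. [40]. This function is manifestly discontinuous, with $r(\theta, \theta^*_i) - r(\theta, \tilde\theta^*_i)$ taking the value +1 on an interval $\theta \in I^i_+$ of width $|\theta^*_i - \tilde\theta^*_i| \le \epsilon$, the value −1 on another interval of the same width, and zero otherwise. From Eq. (B9) we then find $S^* - \tilde S^* \le \sum_{i=1}^{D} p(i)\, \mathrm{Pr}^{(i)}[\theta \in I^i_+]$, where each probability $\mathrm{Pr}^{(i)}[f(\theta)] = \int d\theta\, f(\theta)\, p(\theta|i)$ is taken over the conditional distribution p(θ|i). Defining the union of all the intervals $I_+ = \cup_{i=1}^{D} I^i_+$ we can further upper bound the score difference with $S^* - \tilde S^* \le \sum_{i=1}^{D} p(i)\, \mathrm{Pr}^{(i)}[\theta \in I_+] = \mathrm{Pr}[\theta \in I_+]$, where in the last term the probability is taken over the prior distribution $p(\theta) = \sum_i p(i)\, p(\theta|i)$, and we used $I^i_+ \subseteq I_+$.
For the MSE, the score function is a cost which has to be minimized, so to match the notation of the previous section we consider the maximization of $r(\theta, \hat\theta) = -(\theta - \hat\theta)^2$. Expanding the square, $r(\theta, \tilde\theta^*_i)$ differs from $r(\theta, \theta^*_i)$ by a term quadratic in $\epsilon_i$ and a term linear in $(\theta - \theta^*_i)$, where $\epsilon_i$ is defined in Eq. (B6). Plugging this into Eq. (B9) one gets a sum of two terms, the first equal to $\sum_i p(i)\, \epsilon_i^2$ and the second proportional to $\mathbb{E}^{(i)}[\theta - \theta^*_i]$. But for the MSE we know that the optimal estimator is the mean, i.e., $\theta^*_i = \mathbb{E}^{(i)}[\theta] = \int d\theta\, \theta\, p(\theta|i)$. Therefore, the second term is zero and we find $S^* - \tilde S^* = \sum_i p(i)\, \epsilon_i^2 \le \epsilon^2$, with $\epsilon = \max_i |\epsilon_i|$.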
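The MSE argument can be checked numerically with a few lines of Python: when the estimator is the posterior mean, shifting it by ϵ increases the expected squared error by exactly ϵ², because the term linear in ϵ averages to zero. The toy posterior and the function names below are hypothetical and serve only as an illustration.

```python
def posterior_mean(thetas, posterior):
    """Mean of theta over a discrete posterior."""
    return sum(p * t for t, p in zip(thetas, posterior))


def expected_squared_error(thetas, posterior, estimate):
    """Posterior average of (theta - estimate)^2."""
    return sum(p * (t - estimate) ** 2 for t, p in zip(thetas, posterior))


thetas = [0.2, 0.7, 1.1, 1.9]
posterior = [0.1, 0.4, 0.3, 0.2]
best = posterior_mean(thetas, posterior)

eps = 0.05
# Excess cost of using the shifted estimator instead of the posterior mean.
excess = (expected_squared_error(thetas, posterior, best + eps)
          - expected_squared_error(thetas, posterior, best))
```

The excess is the same for shifts of either sign, which is why only the magnitude of ϵ enters the bound.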
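3. Convergence for the cos² reward function
In the phase estimation problem we considered a reward function that reads $r(\theta, \theta^*_i) = \cos^2\big(\tfrac{\theta - \theta^*_i}{2}\big)$. First of all, note that in this case the optimal estimator can be found in closed form. To do so, we first rewrite the score for the posterior distribution p(θ|i) as $\mathbb{E}^{(i)}\big[\cos^2\big(\tfrac{\theta - \theta^*_i}{2}\big)\big] = \tfrac{1}{2}\big(1 + \langle\cos(\theta)\rangle_{(i)} \cos\theta^*_i + \langle\sin(\theta)\rangle_{(i)} \sin\theta^*_i\big)$, cf. Eq. (B22). We then impose that the derivative of the score with respect to the estimator $\theta^*_i$ is zero, $\langle\sin(\theta)\rangle_{(i)} \cos\theta^*_i - \langle\cos(\theta)\rangle_{(i)} \sin\theta^*_i = 0$, cf. Eq. (B23), which is equivalent to $\tan\theta^*_i = \langle\sin(\theta)\rangle_{(i)} / \langle\cos(\theta)\rangle_{(i)}$ and admits two solutions,
$\theta^*_i = \arctan\big(\langle\sin(\theta)\rangle_{(i)} / \langle\cos(\theta)\rangle_{(i)}\big)$ or $\theta^*_i = \arctan\big(\langle\sin(\theta)\rangle_{(i)} / \langle\cos(\theta)\rangle_{(i)}\big) + \pi$, (B25)
with the notation from the main text. We then need to pick the value which gives the highest contribution to the reward in Eq. (B22). In fact, up to a constant the reward is the scalar product between the vectors $(\langle\cos(\theta)\rangle_{(i)}, \langle\sin(\theta)\rangle_{(i)})$ and $(\cos\theta^*_i, \sin\theta^*_i)$, so its maximum is attained when the two vectors are in the same half of the disc. Since the range of arctan, $[-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$, corresponds to positive cosine, the choice of the optimal estimator solution depends on the sign of $\langle\cos(\theta)\rangle_{(i)}$:
$\theta^*_i = \arctan\big(\langle\sin(\theta)\rangle_{(i)} / \langle\cos(\theta)\rangle_{(i)}\big)$ if $\langle\cos(\theta)\rangle_{(i)} \ge 0$, and $\theta^*_i = \arctan\big(\langle\sin(\theta)\rangle_{(i)} / \langle\cos(\theta)\rangle_{(i)}\big) + \pi$ otherwise.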
where in the penultimate line we used the optimality criterion of Eq. (B23) and the definition of $\epsilon_i$ in Eq. (B6), with $\epsilon = \max_i |\epsilon_i|$.
Here, we provide some technical details for Example 2, where we want to estimate the temperature of a bosonic bath. A qubit that is initially prepared in the state $\rho^p(0)$ =
Remark. The only temperature dependence comes from $N_{B/F}$. In particular, the Hamiltonian term is independent of θ and thus can be ignored. The optimal solution for this problem should then be rotated with the same Hamiltonian in order to compensate for it. As such, we can ignore the phases in the off-diagonal terms above.
Using the expected mean logarithmic error as a cost function
In the main text, we took the MSE as our figure of merit. However, in recent years an alternative cost function, motivated by scale invariance, has been put forward for thermometry [67]. This is the so-called expected mean square logarithmic error (EMSLE), at the kernel of which lies the reward function $r(\theta, \hat\theta_i) = \log^2(\hat\theta_i/\theta)$, for which the optimal estimator can be found analytically as [67]
$\theta^*_i = \exp\big[\int d\theta\, p(\theta|i) \log(\theta)\big]$. (D4)
Interestingly, for this cost function, one can also prove that the optimal POVM is in fact a PVM [66]. Our results apply straightforwardly to such a figure of merit. We showcase this by reproducing our Figs. 3, 4, and 5. These are depicted in the three panels of Fig. 8, respectively from top to bottom. The fact that PVMs are optimal is reflected in the middle panel, where our Method 3 is optimal with only $N_O = 4$ outcomes.
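The closed-form estimator of Eq. (D4) has a direct discrete analogue, sketched below in Python for a posterior given as weights on a grid of temperatures. The function names and the toy numbers are hypothetical; the integral is replaced by a weighted sum over grid points, which is an assumption of the sketch.

```python
import math


def log_mean_estimator(thetas, posterior):
    """Discrete analogue of Eq. (D4): exponential of the posterior mean of log(theta)."""
    return math.exp(sum(p * math.log(t) for t, p in zip(thetas, posterior)))


def expected_msle(thetas, posterior, estimate):
    """Posterior average of log^2(estimate / theta)."""
    return sum(p * math.log(estimate / t) ** 2 for t, p in zip(thetas, posterior))


thetas = [0.5, 2.0]
posterior = [0.5, 0.5]
estimate = log_mean_estimator(thetas, posterior)   # geometric mean of 0.5 and 2.0
```

Rescaling all temperatures by a common factor rescales the estimator by the same factor and leaves the expected cost unchanged, which is the scale invariance that motivates this figure of merit.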

Figure 1. Strategy for Bayesian parameter estimation. The left panel represents the prior probability distribution of the parameter θ encoded in the channel $\mathcal{E}_\theta$. The center panel shows a single-shot strategy of parameter estimation in which part of a quantum state ρ is sent through the channel $\mathcal{E}_\theta$ and then measured by a POVM $\{M_i\}$, yielding a classical outcome i. The right panel then represents the posterior probability distribution of the parameter θ, conditioned on the obtained measurement outcome i.

Figure 2. Local phase estimation (Example 1). The maximum approximate score S in a local n = 2 qubit phase estimation problem. Each panel shows the scores corresponding to Methods M1, M2, and M3 as a function of the number of outcomes $N_O \in \{2, \dots, 10\}$ for different prior distributions of the local phase: panel (a) corresponds to the case of a uniform prior, while (b) corresponds to a Gaussian prior. The phase parameter ranges from $\theta_{\min} = 0$ to $\theta_{\max} = 2\pi$. The considered cost function is the cosine squared in Eq. (31).

Figure 3. Thermometry (Example 2). The minimum approximate score S in the finite-time temperature estimation problem, renormalized by the maximum value of S in the plot. The temperature θ is encoded via a qubit non-unitary evolution specified by Eq. (38) acting for an amount of time t < ∞, here shown for a fixed time t = 0.05. The plot shows the different scores corresponding to Methods M1, M2, and M3 as a function of the number of outcomes $N_O \in \{2, \dots, 10\}$ for a uniformly distributed prior in a temperature parameter range of $\theta_{\min} = 0.1$, $\theta_{\max} = 2$. The considered cost function is the MSE in Eq. (39). The remaining parameters chosen are ϵ = 0.1 and J(ϵ) = 2.

Figure 5. Thermometry (Example 2): Advantage of entanglement in the transient regime. The main panel shows the approximate score S (computed via Method 3) as a function of the evolution time t for a fixed number of outcomes $N_O = 4$. The parameters chosen are as in Fig. 3. All values are renormalized by the maximum value of S in the plot. The inset shows the same curves plotted in a log-log scale. We see that there are times t for which the precision of estimation is better than in the steady state t → ∞. This can be understood as an advantage arising from having entanglement with the probe: the entanglement allows the transfer of the information about the parameter into the memory system, which is itself not subject to the dephasing dynamics of the master equation. As a consequence, measuring the entangled probe and memory systems before the joint system thermalizes provides a significant advantage.

Figure 6. Thermometry (Example 2): Optimal state. The entanglement of the optimal initial state ρ = |Ψ⟩⟨Ψ|_IA in Eq. (41) as a function of time, found by Method 3 for N_O = 4. The corresponding score S is given in Fig. 4; the physical parameters are given in Fig. 3.

Figure 7. SU(2) estimation (Example 3). The maximum approximate score S for an SU(2) multiparameter estimation problem. Both panels show the scores obtained with Methods M1, M2, and M3 as a function of the cube root of the total number of outcomes, N_O^{1/3} ∈ {2, ..., 10}, for different prior distributions of the phase parameters (θ_x, θ_y, θ_z). Panel (a) shows the case of a uniform prior, while panel (b) corresponds to a Gaussian prior. Each of the three parameters ranges from θ_min = −π to θ_max = π. The cost function considered is the fidelity in Eq. (43).

N_O^{1/3} = 3, i.e., a total number of outcomes N_O = 27. This is consistent with the fact that, in this case, extremal testers have at most (d_I × d_O)^2 = 16 outcomes, and hence, for N_O^{1/3} = 2 (total number of outcomes N_O = 8) we are not yet optimizing over all possible extremal testers. This is an interesting case where extremal non-projective POVMs with (d_I × d_O)^2 outcomes show an improvement over (d_I × d_O)-outcome PVMs. Method 2 quickly approaches this same value, while Method 1 requires higher values of N_O^{1/3}. Nevertheless, as expected, for a larger number of outcomes, namely N_O^{1/3} = 10, all methods yield the same result.
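The outcome counting used in this argument can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming qubit input and output spaces (d_I = d_O = 2, which is consistent with the stated bound of 16 but is our assumption); the function names are hypothetical and introduced only for illustration.

```python
# Assumption: d_I = d_O = 2 (qubit input and output spaces), chosen so that
# (d_I * d_O)**2 = 16 matches the bound quoted in the text.
d_I, d_O = 2, 2

# Maximum number of outcomes of an extremal tester, as quoted in the text.
extremal_bound = (d_I * d_O) ** 2

# Maximum number of outcomes of a projective measurement (PVM) on the joint space.
pvm_outcomes = d_I * d_O


def total_outcomes(per_parameter: int) -> int:
    """Total number of outcomes N_O when each of the three parameters
    (theta_x, theta_y, theta_z) is assigned `per_parameter` values."""
    return per_parameter ** 3


def covers_extremal_testers(per_parameter: int) -> bool:
    """True if N_O is large enough to accommodate every extremal tester."""
    return total_outcomes(per_parameter) >= extremal_bound


for k in (2, 3, 10):
    print(k, total_outcomes(k), covers_extremal_testers(k))
```

With these assumptions, N_O^{1/3} = 2 gives N_O = 8, which is below the bound of 16 but above the 4 outcomes of a PVM, while N_O^{1/3} = 3 gives N_O = 27, which exceeds the bound.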

Appendix D: Details of thermometry (Example 2)

Figure 8. The thermometry problem with the EMSLE as the cost function. The top, middle, and bottom panels correspond to Figs. 3, 4, and 5 of the main text, respectively; note the logarithmic scaling in the middle and bottom panels. All other parameters are the same as in the corresponding plots in the main text.
4 outcomes. The approximate score S is computed using Method 3 as a function of time t for different values of N_O. The parameters are chosen as in Fig. 3. The inset shows the same curves on a log-log scale. All values are renormalized by the maximum value of S in the plot. Since the cost function is the MSE, in this case projective measurements (which have at most d_I × d_O outcomes) are optimal. Indeed, we observe that increasing N_O beyond 4 does not change the value of the score.