Tight bounds on the simultaneous estimation of incompatible parameters

The estimation of multiple parameters in quantum metrology is important for a vast array of applications in quantum information processing. However, the unattainability of fundamental precision bounds for incompatible observables has greatly diminished the applicability of estimation theory in many practical implementations. The Holevo Cramer-Rao bound (HCRB) provides the most fundamental, simultaneously attainable bound for multi-parameter estimation problems. A general closed form for the HCRB is not known given that it requires a complex optimisation over multiple variables. In this work, we show that the HCRB can be solved analytically for two parameters. For more parameters, we generate a lower bound to the HCRB. Our work greatly reduces the complexity of determining the HCRB to solving a set of linear equations. We apply our formalism to magnetic field sensing. Our results provide fundamental insight and make significant progress towards the estimation of multiple incompatible observables.


I. INTRODUCTION
Physical quantities such as time, phase, and entanglement cannot be measured directly, but instead must be inferred through indirect measurements. An important category of such indirect measurements is parameter estimation. Quantum metrology describes the quantum mechanical framework that handles this estimation procedure. By recasting the problem as a statistical inference problem, parameter estimation can be associated with fundamental precision bounds. The key question in quantum metrology is what is the fundamental precision bound and how we can achieve it. Early applications of estimation theory focused on single parameter estimation such as phase measurements [1][2][3]. The ultimate precision bound for a single parameter is the quantum Cramér-Rao lower bound ( ), which was proved by Helstrom and Holevo [4][5][6]. Multiparameter quantum metrology extends the single parameter case [7][8][9], and is of fundamental importance in understanding a variety of practical applications, such as Hamiltonian tomography [10], field sensing [11][12][13] and imaging [14][15][16][17], and distributed sensing [18][19][20]. A central problem is to determine the optimal measurement strategies that saturate the [21]. To achieve this, one must assume locally unbiased estimators [22], which is reasonable given large amounts of prior information [23,24], and with many independent repetitions of the experiment [25]. Several reviews on the topic highlight recent progress in the field [26][27][28].
Each individual parameter we wish to estimate has an optimal measurement observable. However, when we wish to estimate two or more parameters simultaneously, the corresponding optimal observables may be incompatible. In this case, we can not achieve the optimal precision for each parameter individually. In this case the matrix bound is generally not simultaneously saturable for all parameters [29][30][31]. This motivates the search for tighter bounds that can be realised for practical applications of multi-parameter estimation theory. The Holevo Cramér Rao bound ( ) encapsulates the difficulties associated with incompatible observables [6]. Operationally, it is the maximum of all existing lower bounds for the error of unbiased measurements [6]. It represents the best precision attainable with collective measurements on an asymptotically large number of identical copies of a quantum state [32][33][34][35]. Despite its importance, the has seen limited use in quantum metrology so far. There are several reasons for this. First, the is difficult to evaluate given that it is defined through a complex optimisation over a set of observables. Second, implementing collective measurements is generally a difficult task. Nevertheless, applications of the in metrological tasks do exist. Suzuki found closed form results for parameter estimation with qubits [36], and explored connections between different types of metrological bounds in the special case of two parameter estimation theory. For pure states [7] and displacement estimation with Gaussian states [6], it has been shown that the is attained by single-copy measurements. The was also used as a tool to define the precision of state estimation for finite dimensional quantum systems [35]. Bradshaw et al. calculated the for a joint parameter estimation of a displacement operation on a pure two-mode squeezed probe [37].
Arguably, the is most relevant in multi-parameter estimation. An increasing number of true multi-parameter estimation protocols has been explored [38][39][40][41], and therefore the need for general, attainable bounds on multi-parameter quantum estimation is urgent. Recently, Albarelli et al. have investigated the numerical tractability of calculating the for the simultaneous estimation of multiple parameters [42]. For finite-dimensional systems, they recast the evaluation of the as a semi-definite program, which is an optimisation problem that can be efficiently implemented. To date, no general analytic expression for the is known. In this paper, we find that it is possible to recast the as a quadratic program with linear constraints, thereby providing tight bounds for multi-parameter estimation problems. We find that whenever the is strictly larger than the , the two-parameter has a surprisingly simple analytic form that does not require any numerical optimisation. Our analytical solution for the optimal observables that saturate the allows one to establish analytically the minimum penalty due to arXiv:1912.09218v1 [quant-ph] 19 Dec 2019 the incompatibility of the observables. Specifically, we generalise attainability constraints for simultaneous multi-parameter estimation problems where the commonly used Cramér-Rao bounds can not be saturated due to incompatibility. The analytic two-parameter can be considered a generalised quantum uncertainty relation [43]. For more than two parameters, our method does not provide tight bounds but still outperforms the . We provide an overview of multi-parameter quantum estimation in section II. We derive the analytic results for the for two parameters in section III, and extend the treatment to arbitrary number of parameters. In section IV, we discuss the application of the bound to magnetometry. We conclude in section V.

II. MULTI-PARAMETER QUANTUM ESTIMATION
Quantum estimation theory provides fundamental bounds to the estimation precision of physical parameters and the optimal measurements that saturate these limits [21]. We are interested in estimating multiple parameters simultaneously. The prototypical scheme requires that the vector of parameters θ = (θ 1 , . . . , θ d ) ∈ R d be imprinted on a quantum state ρ(θ). Denoting H D as the set of all Hermitian matrices in the Hilbert space of dimension D, we can see that ρ(θ) is a positive semidefinite matrix in H D with unit trace. We define measurement operators via a positive operator valued measure ( ) where 1 D denotes the identity operator, and Ω is the set of measurement outcomes. The outcomes of such a measurement can be used in a function called the estimatorθ, which gives an estimate of the parameters. A general estimation scheme requires access to multiple identical copies of the quantum probe state. A separable measurement can be individually applied to each copy of the state to obtain estimates of each parameter separately, whereas a collective measurement can be applied jointly on all copies of the state to acquire a simultaneous estimate of all parameters. The ultimate precision bound is the one that is asymptotically achieved by a sequence of the best collective measurements as the number of copies tends to infinity [9,[32][33][34][44][45][46][47][48]. The performance of the estimatorθ under any measurement can be quantified in terms of its mean square estimate matrix ( ) where the probability of measurement outcomes is provided by Born's rule p(ω|θ) = Tr[ρ(θ)Π ω ], and N is the number of independently repeated measurements. The set of estimators are said to be locally unbiased if for all ω ∈ Ω Under these conditions, the is equivalent to the covariance matrix of parameter estimates, and is lower bounded through generalisations of the Cramér-Rao bound from classical statistics where F is the classical Fisher information matrix [5,6]. The Fisher information characterises the for the best classical data manipulation given a measurement strategy in the asymptotic limit [25]. Well known quantum generalisations include the symmetric logarithmic derivative ( ), L j ∈ H N which generates the real symmetric quantum Fisher information ( [49,50]. This is referred to as the . Similarly, the right logarithmic derivative ( ), R j ∈ H N induces the complex Hermitian [51,52]. The matrices I S and I R generate different lower bounds with a corresponding scalar and , defined via Here W is a positive definite weight matrix, and · 1 denotes the trace norm of a matrix [53]. Throughout, Re(·) and Im(·) of a matrix denotes taking the real and imaginary part of each matrix element. Nagaoka investigated in detail the relationship between these bounds [54]. The central problem in quantum estimation theory is the minimisation of these scalar bounds over the family of probability of distributions defined by quantum measurements. Helstrom demonstrated that the is tight for single parameter estimation under the locally unbiased condition of Eq. (3), which is weaker than the unbiased condition. In this case, the is asymptotically attained through adaptive measurements [55,56]. This attainability does not generally extend to multiple parameter estimations. Specifically, measuring one parameter may affect the precision in the measurement of another parameter, and a trade-off between the associated with each parameter exists for any collective measurement [8]. Intuitively, any incompatibility among the parameters θ prohibits the simultaneous optimal estimation of all parameters. The was introduced as the minimum of the weighted sum of the s for each parameter under the unbiased condition. Its achievability was shown for Gaussian states [6], qubits [33,57], and qudit states [48]. However, the is not always attainable since the optimal estimators derived from the may not correspond to physical s [58]. The problem with saturability of the multiparameter bound was noted by Holevo, who provided the most general quantum extension to the classical Cramér-Rao bound. He introduced a tighter bound, the C H (θ), defined as the maximum among all existing lower bounds for the of unbiased measurements for the estimation of a set of parameters [59] Tr WΣ θ (Π,θ) ≥ C H (θ) ≥ max {C S (θ), C R (θ)} . (7) Helstrom [5] and Holevo [59] demonstrated that C H (θ) is attainable if the locally unbiased equality constraints in Eq. (3) are satisfied. Consider the Hermitian matrix with X j = ∫ dθ (θ j −θ j )Π(θ) Hermitian operators. The matrix Z(X) majorises both the inverse of the and Fisher information matrix [33]. Helstrom where ∂ j ≡ ∂/∂θ j and the constraints are the local unbiased conditions of Eq. (3). By minimising over only the first term in the objective function of Eq. (9), we obtain the [54] C S (θ)(ρ, W) = Tr W[I S ] −1 = min This shows that the is indeed a tighter bound than the , since the second term in Eq. (9) is non-negative [60]. Further, note that the is not defined explicitly in terms of a closed form for a given statistical model. This is in contrast to the classical case, where the Fisher information can be readily determined from a given statistical model. Recently, Tsang demonstrated that the is upper bounded by a quantity that is three times the -, such that [61] max In this paper, we provide an analytic form for the . The is the best asymptotically achievable bound under the conditions stated in Refs [32][33][34]48]. Both inequalities in Eq. (7) can be tight [60]. For instance, consider the skewsymmetric matrix Im(Tr[L j L k ρ θ ]). When we have C H (ρ, θ, W) = C S (ρ, θ, W) [8]. This condition is referred to as the weak commutativity criterion [62], and when it is fulfilled the is a good proxy for the . In the next section, we show how we can use methods from optimisation theory to address the minimisation over several Hermitian operators in the case where the weak commutativity criterion is not fulfilled.

III. HOLEVO CRAMÉR-RAO BOUND
In this section, we present algorithms for calculating the Holevo Cramér-Rao bound. To facilitate the derivation, we construct a bound on C * H such that and from Eq. (7) we see that C * H ≥ C R whenever C R is a tighter bound than C S (in other words, whenever C R is relevant). We first derive an analytic expression for C * H for two parameters in section III A. We demonstrate that the optimisation problem can be solved using the method of Lagrange multipliers. This reduces the problem of evaluating C * H to that of solving two sets of linear equations. In section III B, we derive a lower bound on C * H for more than two parameters.

A. Two-parameter setting
We first consider the for two parameters θ = (θ 1 , θ 2 ) . To obtain an analytic expression, we must define the weight matrix W for the scalar bound. For simplicity, we use the identity weight matrix. We determine C * H for two parameter estimation schemes using optimisation theory [63]. We want to solve the minimisation in Eq. (9), which is convex but not quadratic. We will manipulate Eq. (9) into a quadratic form in the variables X 1 and X 2 . Once this is done, such an optimisation problem can be solved analytically using the method of Lagrange multipliers.
Choosing Y = X 1 + iX 2 , Eq. (9) can be written as an optimisation program (see appendix B) minimize Y, t t, subject to Tr Y ρY † ≤ t, Tr Y † ρY ≤ t, Note that by considering both the real and imaginary parts of the above equality constraints, the actual number of real-valued equality constraints is six, which is consistent with the number of equality constraints corresponding to the minimisation in Eq. (9). Here Y is optimised over all complex matrices of dimension D, and is in general not a Hermitian matrix. By mapping Y and t into a real vector x, we cast this optimisation program into the standard form of where f (x) is a real linear objective function, while h i (x) and c i (x) are the corresponding equality and inequality constraint functions that must also be real. Eq. (14) is a convex program, since its equality constraints are linear and its inequality constraints are quadratic and convex. To check whether we can use optimality conditions from optimisation theory, we check whether Slater's constraint qualification holds. This amounts to checking that all the inequality constraints in Eq. (14) can strictly hold. Since t can be arbitrarily large, this indeed is the case. The optimality conditions for a continuous optimisation program are best stated in terms of the Lagrangian of Eq. (14), given by where the coefficients λ i ≥ 0 and z i ∈ R are Lagrange multipliers for the inequality and equality constraints respectively. Since Eq. (15) is a convex program and Slater's constraint qualification holds, the first order Karush-Kuhn-Tucker ( ) conditions of stationarity, primal and dual feasibility, and complementary slackness are necessary and sufficient [63] to determine the optimality of Eq. (14).
For our problem we have dual variables λ = (λ 1 , λ 2 ) = (u, v) and z = (z 1 , . . . , z 6 ) , which are vectors of Lagrange multipliers. The primal variables are Y and t, and the Lagrangian is given by (17) Here b = (0, 1, 0, 0, 0, 1) is a column vector that encodes the equality constraints in Eq. (14), constructed in Appendix B.2. The operator A is a linear superposition of ρ and its derivatives, where Due to the duality principle in optimisation theory [63], we may equivalently view the optimisation by considering the Lagrange dual function g(λ, z) = inf x L(x, λ, z) of Eq. (16).
Since the Lagrangian L is quadratic in x, the Lagrange dual can be found analytically by an unconstrained minimisation of the Lagrangian with respect to x for fixed values of the dual variables λ and z [63]. Due to the structure of the Lagrangian in Eq. (17), the Lagrange dual is never unbounded from below whenever u + v = 1. Hence, maximising the Lagrange dual function corresponds to an unconstrained maximisation problem. Since the Lagrange dual is also a quadratic function in terms of its dual variables z, it can be easily maximised exactly with respect to z. Note that our Lagrange dual is not a quadratic function with respect to λ = (u, v). This means that we cannot solve it analytically as it stands. Instead, we find a lower bound to C H that is associated with (u, v) = (0, 1) or (u, v) = (1, 0). In those two cases, the Lagrangian in Eq. (17) simplifies. In these cases, we may no longer have C H but we do have a lower bound C * H ≤ C H . However, C * H = C H if C * H > C S , as stated in Eq. (13). This arises as a necessary consequence of removing the absolute value in the objective function of the optimisation program in Eq. (9), which makes the optimisation problem 9: for all j, k = 1 to 6 do 10: and its associated optimal measurement observables X 1 and X 2 . Note that this algorithm depends only on the state ρ and its two derivatives ∂ 1 ρ and ∂ 2 ρ.
continuous. Hence, we obtain the maximum of two analytical formulas, which is itself analytical. When C * H is equal to C S , the Lagrange dual becomes a lower bound to C H and bears no physical significance. As shown in appendix B, the resulting analytic expression for C * H can be determined through finding a z ∈ R 6 that solves where Q j has the matrix elements The matrices Re(Q j ) are full rank when the derivatives ∂ 1 ρ and ∂ 2 ρ are linearly independent. We collect the result of this optimisation in the following theorem: Theorem 1. Let ∂ 1 ρ and ∂ 2 ρ be linearly independent. Then C * H for two parameters is given by This theorem is the main result of the paper: it gives a procedure for finding the analytic form of the for twoparameter estimation. Fig. 1 shows the pseudocode for this procedure.
Next, we need to establish whether C * H > C S . As shown in appendix B 2, when C * H C S the conditions require that either (u, v) = (0, 1) or (u, v) = (1, 0). Hence, there are two choices for Y that minimise the Lagrangian in Eq. (17), corresponding to the two choices for (u, v): corresponding to the choice Q 1 and Q 2 , respectively. Then, using Eq. (18) and the optimised values for z, we construct the analytic form for the observables X 1 and X 2 . Since the optimisation is only over 6 parameters, it is straightforward to find an analytic form of C * H . The procedure is shown algorithmically in Fig. 1.
From the expressions for X 1 and X 2 , we can check whether the assumption that C * H > C S holds. From duality theory, if we substitute X 1 and X 2 into the objective function of Eq. (9), we must obtain a value that coincides with either the (u, v) = (1, 0) value or the (u, v) = (0, 1) value. If not, then the assumption that C * H > C S must be violated, and we have in fact C H = C S , corresponding to for the optimal X 1 and X 2 . This procedure gives either C * H or C S as optimal. In the latter case, we have to find C S separately by calculating the .

B. Lower bound in the multi-parameter setting
For more than two parameters, we can also use the method of Lagrange multipliers to find C * H . However, this method is considerably more involved than the two-parameter case. In the two-parameter case, we could obtain a simple quadratic expression for ReTr[Z] + ImZ 1 that appears in the objective function of Eq. (9). However for the corresponding generalisation to multiple parameters, ReTr[Z] + ImZ 1 is no longer a quadratic form in the variables X j . For example, for three parameters Z takes the form The trace norm of ImZ is related to the eigenvalues of ImZ, and the eigenvalues of a 3 × 3 matrix involve a cubic equation. This renders evaluating the trace norm incompatible with our methodology. To address this, we obtain a lower bound to ImZ 1 that allows ReTr[Z] + ImZ 1 to be written as a quadratic form. This yields an optimisation problem whose optimal value is a lower bound for C * H , and which is given by The minimisation is performed over Hermitian matrices X 1 , . . . , X d and t. The indices j, k run over {1, . . . , d}. The inequality constraint V α is a function of a binary string α such that with X d+1 = X 1 . The inequality constraints V α arise from the structure of our lower bound on the trace norm of ImZ (see appendix C). By substituting Y j = X j + iX j+1 , we can write the matrices X j in terms of the matrices Y j , as before. We next interpret the Y j as arbitrary complex matrices of size n, and impose Hermiticity conditions for the corresponding X j matrices.
The Lagrangian of such an optimisation problem is a function of the complex matrices {Y 1 , . . . , Y d }, and also a function of its Lagrange multipliers. Its Lagrange multipliers are given by the non-negative multipliers v ∈ R 2 d for the inequality constraints, z ∈ R d(d+1) for the equality constraints, and Hermitian multipliers ξ 1 , . . . , ξ d for the Hermitian constraints. Most importantly, the inequality constraints can be satisfied strictly, so Slater's constraint qualification holds, and we can use the to determine the optimality conditions for Eq. (27). We minimise the Lagrangian constructed from the optimisation problem in Eq. (26). Since the Lagrangian is a convex quadratic form in the variables Y 1 , . . . , Y d , it can be minimised exactly. When this is done, we obtain the Lagrange dual function, which only depends on the Lagrange multipliers v, z and ξ 1 , . . . , ξ d . The Lagrange dual function always gives a lower bound for the primal optimisation problem.
While the Lagrange dual is quadratic in z and ξ 1 , . . . , ξ d , it is not quadratic in v. By minimising the Lagrangian over t and using the conditions, we conclude as before that the sum of the components in v is 1. We obtain a lower bound for the Lagrange dual by maximising over a discrete set of feasible Lagrange multipliers v, which corresponds to the tightness of the constraints V α ≤ t. Thus, we created a quadratic optimisation problem for three or more parameters that leads to a lower bound on C H . However, there is no guarantee that this lower bound is tight.
Next, we study the Lagrange dual function. By carefully choosing v, the Lagrangian is quadratic in {Y 1 , . . . , Y d } and can be minimised individually for each Y j . The coefficients for Y j in the Lagrangian are given by Γ j , where where ρ 0 = ρ, ρ j = ∂ j ρ for j = 1, 2, and T k, j are matrix elements that relate the Y j to the X k . Then the optimal value for the Lagrange multipliers can be obtained by maximising the Lagrange dual functions Algorithm 2 v = g α (Z, ξ 1 , . . . , ξ d )  with respect to the scalar variables z j,k and the Hermitian variables ξ j . Our lower bound to C H in the multi-parameter setting is then given by the following theorem. Theorem 2. Let d ≥ 3, z ∈ R d(d+1) and ξ 1 , . . . , ξ d are Hermitian matrices. Then C H is lower-bounded by where g α is given by Eq. (29).
This optimisation problem can be solved exactly using a single step of Newton's method. It requires the input state ρ and its derivatives ∂ j ρ. The algorithm to implement this lower bound is illustrated in Fig. 3.

IV. APPLICATION: MAGNETOMETRY
We use the technique introduced in this paper to consider magnetic field sensing, which has important technological applications in navigation, position tracking, and imaging [64]. We apply our method of finding the to the estimation of a magnetic field B = (B x , B y , B z ) in three dimensions. Quantum magnetometry is an important application of quantum metrology, and is essential for detecting defects and realising compact magnetic resonance imaging scanners [65]. Estimating each component individually allows us to attain the quantum limit [21], and this has been demonstrated in several studies [66,67]. However, in many practical applications, knowledge of multiple parameters is required simultaneously, and we must consider joint estimation strategies.
The three parameters of interest θ = (θ 1 , θ 2 , θ 3 ) appear in the single spin HamiltonianĤ j (θ) = θ · S j , where S j is the spin operator for the j th spin, and we included all coupling if v α > lowerhcrb then 21: lowerhcrb ← v α

22:
for all j = 1 to d do 23: provides a general description for a noisy environment, where g denotes the depolarisation magnitude and takes values between 0 and 1. The parameters θ are imprinted on the probe state via the unitary evolutionÛ = exp[−iĤ j (θ)]. For our example, we assume that the magnetic field in the z-direction is known, and we therefore wish to estimate the two parameters B x and B y . We choose an identity weight matrix to equally prioritise each parameter into a weighted scalar mean square error. We consider three families of n-spin probe states, namely the traditional states for single-parameter estimation, the modified 3D-states introduced by Baumgratz and Datta [13], and the states introduced by Ouyang in the context of quantum error correction [68].
First, the 3D-state can be written as where n is the total number of spins, N is the normalisation constant of the state and |φ ± j are the eigenvectors corresponding to the ±1 eigenvectors of the j th spin matrix. The evolved state then becomes We illustrate how the and the change with the number of probe qubits n for different depolarising channel strengths g in Fig. 4. We observe that the is indeed tighter than the . Both variance bounds increase with an increasing depolarising probability of the depolarising channel, as expected. The 3D-state attains the Heisenberg precision scaling for the noiseless case.
Second, we consider the class of states that are robust to a constant amount of erasure and dephasing [69]: Here, for every w = 0, . . . , n, the Dicke state |D n w is a uniform superposition over all computation basis states |x 1 ⊗ · · · ⊗ |x n with Hamming weight w. Since n = 2G, where G is related to the number of bit-flip errors that can be corrected, we present results for the states for which n is even. These are shown in Fig. 5, and compared with traditional states and 3Dstates. The traditional states give a worse estimation for larger qubit number at constant depolarisation rate, as is well-known. The states perform similarly to the 3Dstates.
Finally, we use our formalism to determine the optimal nqubit observables X 1 and X 2 that attain the using the 3D-states. Unlike the single qubit estimation case, analytic solutions to these observables are challenging and the dimension of X j scales as 2 n . Instead, we numerically determine their structure, shown in Fig. 6. We plot the real and imaginary parts of the matrices X 1 and X 2 , and the Hermiticity of the observables is clearly observed. FIG. 6: Heatmap for the optimal measurement observables X 1 and X 2 for the depolarised 3D-state, with five qubits under a depolarising strength g = 0.3. We plot the real and imaginary parts separately. The Hermiticity of these observables are clearly illustrated.

V. CONCLUSIONS AND DISCUSSIONS
Quantum metrology promises practical near term quantum technologies. Experimental developments in sensing are demonstrating early theoretical results and advancements in estimation theory. On the theoretical front, one prominent limitation that remains is the estimation of multiple non-compatible observables. Specifically, the optimal strategy to define the fundamental limits to precision estimates and their attainabil-ity is not known. Efforts to estimate multiple non-compatible observables have largely been focused on approaching the fundamental quantum Cramér-Rao bound ( ). This has led to efforts to devise non trivial measurement schemes that approach the . An alternative approach is to focus on the tighter Holevo Cramér-Rao bound ( ), which is physically attainable. However, the is difficult to evaluate since it involves a difficult optimisation over two observables. This has limited its application in quantum estimation theory.
In this paper, we have made significant progress in analytically solving the for two-parameter estimation problems, and providing bounds for larger number of parameters. In the two-parameter case, we reduce the complexity of the optimisation procedure to that of solving a set of linear equations, which can be easily solved using most numerical software packages. We also provide analytic expressions for the optimal positive operator valued measurements ( ). Our results readily apply to a large range of physical applications. This will provide deeper insight into the role of quantum measurements in quantum sensing, and help continue the drive of realising quantum technologies.
We illustrate an application of our results by considering the estimation of a magnetic field using noisy multi-qubit probe states. A recent numerical study by Albarelli et al. demonstrated the necessity of using the over the , based on a violation of the weak commutation condition [42]. Here, we provide further insight into the role that the plays in quantum estimation theory. We provide conditions for when this bound is tighter than the (or the Helstrom bound) and provide the corresponding optimal measurement observables.
There are several clear extensions of our work that can be readily addressed. The first would be to use the analytic expressions that we derive to provide further insight into more protocols in estimation theory. We hope that this will help to drive the wave for experimental validation. A second line of work would consider an extension of the Holevo bound to parameters with arbitrary choice of weight matrices. In this work we have considered unit weight matrices, which was motivated through placing equal importance to each parameter. A more general weight matrix would provide a more general bound. A final line of work would consider the optimal implementation of the general measurements that were derived in this work. This would provide an immediate access to the tighter through experimental implementation.

VI. ACKNOWLEDGMENTS
J.S.S and P.K. acknowledge the support of EPSRC via the Quantum Communications Hub (EP/M013472/1). Y.O. and E.C. acknowledge the support of EPSRC through grant number EP/M024261/1.

Appendix A: Matrix calculus
We prove some elementary facts about matrix calculus that we use repeatedly in our analysis of the turning points of the Lagrangian functions that occur throughout the manuscript.
We begin by defining some notations. Since a complex matrix of size n is a map from C n to C n , we use L(C n ) to denote the set of all complex matrices of size n. Here, the notation L(C n ) reflects the fact that a matrix is a linear mapping that is an automorphism on C n . At times we are interested in matrices that are also Hermitian, which means that they are equal to their complex conjugates. In this scenario, we use H n to denote the set of all complex matrices that are also Hermitian. Clearly for instance, H n is a strict subset of L(C n ). Now let f : L(C n ) → C denote a function that maps a complex matrix to a complex scalar.
to denote the Fréchet derivative of f (Y ) in the direction H. In the above formula, h is a real infinitesimal parameter. Properties of these Fréchet derivatives continues to be an active area of research [70], and they have also been recently used in quantum information theory [71].
In this paper, we are interested in matrix functions that are either linear or quadratic in the matrix variable Y . This leads us to analyse the Fréchet derivatives given by the following lemma.
Proof. The proof of the above results from direct application of the definition of the Fréchet derivative for the first two equations. For the last two equations, we also use the cyclic property of the trace.
We are often faced with the unconstrained minimisation of a quadratic form, and we show in the following lemma what the optimal solution to these optimisation problems are.

Lemma 4.
Let A ∈ L(C n ) and let ρ be a full rank matrix in H n . Then with optimal solutions given by Y = −A † ρ −1 and Y = −ρ −1 A † respectively.
Proof. We first prove (A6) and (A7). The corresponding objective functions that are to be minimised are convex and differentiable, so it suffices to find when their Fréchet derivatives are equal to zero for any direction H. For this, we use Lemma 3, from which we find that we must have ρY † + A = 0 and Y † ρ + A = 0 respectively. Making use of the fact that ρ is invertible whenever it has full rank, we multiply both sides of the equations, and find that the optimal Y s are given by Y = −A † ρ −1 and Y = −ρ −1 A † respectively. Substituting this back into the objective functions gives the result.

Appendix B: for two parameters
We explicitly derive the for the two-parameter case. In the two-parameter setting, the with a weight matrix W is given by the optimisation problem minimize subject to Tr ρX j = 0, where X j are constrained to be Hermitian matrices in H N , and Z is a matrix given by .

(B2)
Note that W is always taken to be a positive definite matrix. For simplicity, we only consider the scenario where W is the identity matrix.

Reformulation of the optimisation problem
The optimisation problem (9) can be solved analytically primarily from our ability to rewrite the objective function as a quadratic function in the optimisation variables X 1 and X 2 . The method of Lagrange multipliers when applied to problems with quadratic objective functions and linear equality constraints is well-known to be exactly solvable, for example in theory of portfolio optimisation in finance [72]. A similar argument will allow us to solve (9) using this method. We begin by showing why the objective function is quadratic. To see this, we first note that that the diagonal terms of Z are positive numbers, because X 1 and X 2 are Hermitian and X(·)X † is a completely positive map. Second, the positivity of the diagonal entries of Z implies that Third, the positivity of the diagonal entries of Z implies that the trace norm of ImZ can be explicitly evaluated. This is because the diagonal entries of ImZ must be zero. Since X 1 , X 2 and ρ are Hermitian matrices, it follows that where w = Tr[ρX 1 X 2 ] − Tr[X 2 X 1 ρ] is an imaginary number. The eigenvalues of ImZ are therefore ±w/2, which implies that the trace norm of ImZ is max{iw, −iw}. From this, we get Now let us make the substitution Y = X 1 +iX 2 . In this scenario, we can rewrite the equality constraints in (B1) as Hence the optimisation problem (9) can be written as Tr Y † ρY ≤ t, Note that the optimisation problem (B6) is a linear optimisation problem with convex quadratic and linear constraints. When the equality constraints are satisfied, the quadratic terms in the inequality constraints are non-negative, and by setting t to be arbitrarily large, we can see that the inequality constraints in (B6) can always be strictly satisfied. Since (B6) is also a convex optimisation problem because of its linear objective function and convex constraint functions, the Slater constraint qualification holds with respect to (B6). This implies that the first order Karush-Kuhn-Tucker ( ) conditions suffices to determine the optimality of (B6).

Analysing the Lagrangian
The conditions are stated in terms of the Lagrangian of (B6). The column vector of Lagrange multipliers corresponding to the equality constraints is The Lagrangian of (B6) is where u, v are non-negative Lagrange multipliers corresponding to the inequality constraints.
There are four types of conditions. First is the stationarity of the derivative of the Lagrangian with respect to the primal variables. Second is complementary slackness, which states that the product of the constraint functions [73] and their corresponding Lagrange multipliers is always zero. Third is the feasibility of the primal variables, and fourth is feasibility of the dual variables. Our strategy is to show that these optimality conditions hold.
Using this, it follows that where b is the column vector (0, 1, 0, 0, 0, 1) and where Before we proceed to derive the Lagrange dual function, we note the following.
1. We prove that the optimal t must be strictly positive from the positive definiteness of ρ. From the positive definiteness of ρ, t is equal to zero if and only if Y is 0, but this would violate the feasibility constraints. Hence t cannot be equal to zero.
2. We prove that for optimal dual variables, we must have u + v = 1. If the dual variables u and v are such that u+v < 1, then the optimal t is 0 which is a contradiction. If u + v > 1, the optimal t is arbitrarily large which makes the Lagrangian unbounded from below, which is also a trivial scenario. This implies that the term in the Lagrangian t(1 − u − v) for optimal values of the primal and dual variables must evaluate to zero. Hence for the optimal solution, we must have u + v = 1 and t > 0.
3. The conditions require that the complementary slackness conditions hold for the inequality constraints in (B6). This means that If ImZ 1 > 0, exactly one of the constraints corresponding to u and v must be tight, and complementary slackness implies that the optimal (u, v) must be either (u, v) = (1, 0) or (u, v) = (0, 1). This corresponds to the scenario where the is not equal to the . If ImZ 1 = 0, (u, v) = (1, 0) and (u, v) = (0, 1) do not necessarily optimize the value of the Lagrange dual, and in general provide a lower bound to the Lagrange dual.

Deriving the Lagrange dual functions
When (u, v) = (1, 0), the Lagrangian evaluates to where b = (0, 1, 0, 0, 0, 1) . Since ρ is full rank, ρ is invertible. Using Lemma 4, the above is minimised with respect to Y when Y = −A † ρ −1 with optimal value −Tr[A † ρ −1 A]. In this scenario, the Lagrange dual function of (B6) evaluated with Similarly when (u, v) = (0, 1), the Lagrangian evaluates to and is minimised when Y = −ρ −1 A † with an optimal value of −Tr[Aρ −1 A † ]. In this scenario, the Lagrange dual function of (B6) evaluated with (u, v) = (1, 0) is The Lagrange dual functions g(1, 0, z) and g(0, 1, z) can be rewritten in terms of the matrices Q 1 and Q 2 where in the Dirac bra-ket notation, we have Here | j denotes a column vector and k | denotes a row vector. The Lagrange dual function that we consider are thus

Optimal solutions
When is strictly greater than , the is thus equal to the maximum of the optimal value of both the Lagrange dual functions (B22), and is thus Since Q j are derived from the Lagrange dual, we have the promise that Q j are positive-semidefinite matrices.
Using the spectral decomposition of Q j , we can write where λ i ≥ 0 and u i are eigenvectors of Q j . Then, by denoting ·, · as the inner product on a complex Euclidean space, we can write and we can see that z T Q j z is always a real number for every real vector z.
Since z T Q j z is always real, and we know that z is real, then we must have This implies that it suffices to consider the minimization min z∈R 6 is the correct optimality condition to consider. Thus, when Re(Q j ) is full rank, we can write the as Interestingly, when ρ is full rank, the matrices Re(Q j ) are also full rank. We demonstrate this in the next subsection.

Full-rankness of Q
The analytic solution to the requires Re(Q j ) to have full rank such that the solution can be determined. In this subsection, we demonstrate that the full-rankness of the probe state ρ entails the full-rankness of these matrices. Since the regularity conditions of estimation theory require the state to be full-rank, our solution to the always exists. Notice that the matrices Q j defined in Eq. (B21) can be written where H is Gram matrix defined as follows. We consider the Hilbert-Schmidt inner-product X, Y = Tr[X † Y ]. We define the operators Then, we have that H is a Gram matrix with respect to this set of operators As a Gram matrix, it is positive semi-definite. Furthermore, we know that H will be full-rank if and only if the set {B 1 , B 2 , B 3 } is linearly independent. We note that A 1 cannot be written as a sum of A 2 and A 3 (since A 1 has nonzero trace whereas A 2 and A 3 are traceless). Also by a trace-argument, if { A 1 , A 2 , A 3 } are linearly dependent, we must have that A 2 is proportional to A 3 . But if A 2 and A 3 are proportional, then it is really a oneparameter problem and not a two-parameter problem. Hence, is also a linearly independent set. So full-rankness of ρ entails full-rankness of H.
We are now interested in the real part of the Q j matrices, looking for solutions of 2Re By performing elementary row operations by taking a linear combination of rows, followed by elementary column operators by taking a linear combination of columns, we get where we used Re(H) + iIm(H) = H. Since both rows are linearly independent, Re(Q j ) is also always full-rank. Therefore, we have that if the state is full-rank, then so too is the matrix Re(Q j ).

Appendix C: Lower bound in the multi-parameter setting
By restricting ourselves to the identity weight matrix, recall that the is the optimal value of the following optimisation problem over the Hermitian matrices X j in H N given by minimize X 1 , . . . , X d Tr[ReZ] + ImZ 1 , subject to Tr ρX j = 0, For j, k = {1, . . . , d}, let

Deriving a lower bound for the objective function
In general for a d-parameter estimation problem, we have Note that ImZ is always a skew-Hermitian matrix. For example, when d = 3, we have (C5) We use several ansatz's to obtain a lower bound for the trace norm of ImZ utilising the fact that Here, the inequality for the matrix indicates Loewner ordering of matrices. Instead of optimising over all possible U, we can optimise U over the finite set where ⊕ denotes addition modulo d over the ring Z d , s denotes the shift, and {| j : j = 0, . . . , d − 1} is an orthonormal basis. Clearly, the singular values of the matrices in U are all equal to 1, and hence U ∞ ≤ 1, where · ∞ denotes the Schatten infinity norm. Hence we can use U to get the lower bound which implies that for every s = {1, . . . , (d − 1)/2 } we have and each |w a,b | is the maximum of

Recasting the optimisation problem
Now let us observe that Tr[ReZ] + (a,b)∈E d, s |w a,b |/2 can be written as a quadratic form in the optimisation variables X 1 , . . . , X d . To see this, note for instance that The last equality in Eq. (C11) arises because every j can be a component of (a, b) in exactly two ways, as the tuples (a, b) correspond to edges in a cycle graph. To see how this works explicitly in the d = 3 and s = 1 scenario, note that Tr[(X 3 + iX 1 )ρ(X 3 + iX 1 ) † ]).
We can rewrite with an introduction of an auxillary variable α ∈ R so that (C14) is equivalent to minimize X 1 , . . . , X d , t t, subject to Tr ρX j = 0, Tr ∂ j ρX k = δ jk , This minimisation problem can be numerically checked for consistency with the optimisation in Eq. (C1).

Diagonalising the quadratic forms
At this point, we want to define new variables Y 1 , . . . , Y d so that we can for example write In general, we define the variables Y 1 , . . . , Y d so that they depend linearly on the variables X 1 , . . . , X d according to the system of linear equations However such a choice of S need not have be full rank. It is easy to see that S is full rank whenever its dimension is not a multiple of 4.

Proposition 5.
Let d be a positive integer, that is at least 2 and is not a multiple of 4. Then S has full rank.
Here, the Lagrange multipliers z j,k are real numbers while the Lagrange multipliers v α are non-negative numbers. The Lagrange multipliers ξ k are Hermitian matrices in H N . Note that the multiparameter Lagrangian is a quadratic form in Y, and as such, can be minimised using Lemma 4. Before for we do so, we consider the minimisation of the Lagrangian with respect to the primal variable t.
If the Lagrangian multiplier v α do not all sum to one, by picking t to either approach positive or negative infinity, the Lagrangian L d becomes unbounded. Hence the optimal multipliers v α must sum to one. By picking a discrete set of values of v α where v α is equal to zero to all but one value of α, and maximising the Lagrange dual function for each of these cases, we can obtain our lower bound to the multi-parameters . Hence without loss of generality, there is some value of the binary vector α for which the effective Lagrangian that we need to consider is Then we can rewrite the terms on the first line on the right side of (C29) as Then, given that ρ is a Hermitian full rank matrix, we can use Lemma 4 to get the corresponding Lagrange dual to be (C32) Our lower bound to the is thus max α ∈ {0,1} d max{g α : z j,k ∈ R, ξ k ∈ H N }, where j = 0, . . . , d, k = 1, . . . , d. Any feasible value of g α yields a lower bound to the .