Does nonlinear metrology offer improved resolution? Answers from quantum information theory

A number of authors have suggested that nonlinear interactions can enhance resolution of phase shifts beyond the usual Heisenberg scaling of 1/n, where n is a measure of resources such as the number of subsystems of the probe state or the mean photon number of the probe state. These suggestions are based on calculations of `local precision' for particular nonlinear schemes. However, we show that there is no simple connection between the local precision and the average estimation error for these schemes, leading to a scaling puzzle. This puzzle is partially resolved by a careful analysis of iterative implementations of the suggested nonlinear schemes. However, it is shown that the suggested nonlinear schemes are still limited to an exponential scaling in \sqrt{n}. (This scaling may be compared to the exponential scaling in n which is achievable if multiple passes are allowed, even for linear schemes.) The question of whether nonlinear schemes may have a scaling advantage in the presence of loss is left open. Our results are based on a new bound for average estimation error that depends on (i) an entropic measure of the degree to which the probe state can encode a reference phase value, called the G-asymmetry, and (ii) any prior information about the phase shift. This bound is asymptotically stronger than bounds based on the variance of the phase shift generator. The G-asymmetry is also shown to directly bound the average information gained per estimate. Our results hold for any prior distribution of the shift parameter, and generalise to estimates of any shift generated by an operator with discrete eigenvalues.


I. INTRODUCTION
In many measurement scenarios, an environmental variable acts to translate or shift a property such as the optical phase or position of a probe state. Accurate estimation of the shifted parameter allows a correspondingly accurate measurement of the environmental variable. For example, interferometric measurements of quantities ranging from temperature to gravitational wave amplitudes rely on the estimation of an optical phase shift. An important aim of quantum metrology is to determine the fundamental bounds on the resolution of such estimates, and how these bounds scale with available resources such as energy [1][2][3][4].
Let us denote the initial probe state by the density operator $\rho_0$, and the generator of shifts by some Hermitian operator $G$. Then if the shift parameter $\Phi$ has the value $\phi$, the final probe state is $\rho_\phi = \exp(-iG\phi)\,\rho_0\,\exp(iG\phi)$. In the following, particular attention will be paid to the estimation of a phase shift parameter, as this is sufficient for discussing various nonlinear estimation schemes previously proposed in the literature [5][6][7][8][9][10][11]. The generator $G$ in this case has integer eigenvalues, so that $\rho_{\phi+2\pi} = \rho_\phi$. More generally, however, our results apply to any shift generator $G$ having a discrete eigenvalue spectrum. This includes the atomic scheme proposed in [12], which has recently led to the first experimental demonstration of nonlinear quantum metrology [13].
Returning to an optical example, a linear phase shift of a single-mode optical probe state corresponds to $G = N$, where $N$ is the photon number operator. Similarly, for a probe state comprising $m$ such modes, each undergoing a nonlinear quadratic phase shift, one has $G = (N_1)^2 + \dots + (N_m)^2$ [5,6]. In cases like this, we quantify the resources $n$ by the total mean photon number $\sum_j \langle N_j \rangle$. Alternatively, for a probe comprising $n$ atomic qubits, each with a Pauli Z operator $\sigma_z^{(j)}$, one may consider the generator $G = \sigma_z^{(1)} + \dots + \sigma_z^{(n)}$, and powers thereof, corresponding to linear and nonlinear Ramsey interferometers respectively [7,8,10,11]. Again, $n$ quantifies the resources.
We note that this quantification of resources $n$ is different from the $N$ (which we will denote $\mathcal{N}$) used in Refs. [14][15][16][17][18]. The $n$ used here typically corresponds to the conspicuous physical resources required to generate the probe state, and is what has previously been used to claim an advantage when using nonlinear interactions [5][6][7][8][9][10][11][12][13].
If $\hat{\Phi}$ denotes an estimate of a shift parameter $\Phi$, for some measurement scheme, then a standard measure [19,20] of the performance of the estimate is given by the average estimation error (called rms error in Ref. [19]),
$$\epsilon(\hat{\Phi}) := \langle (\hat{\Phi} - \Phi)^2 \rangle^{1/2}, \qquad (1)$$
where the expectation value here is defined as
$$\langle f(\hat{\Phi}, \Phi) \rangle := \int d\hat{\phi}\, d\phi\; p(\hat{\phi}|\phi)\, \wp(\phi)\, f(\hat{\phi}, \phi). \qquad (2)$$
Here $p(\hat{\phi}|\phi)$ is the probability density of the estimate conditioned on a fixed shift value $\Phi = \phi$, and $\wp(\phi)$ denotes the prior probability density of the shift parameter. Measurement schemes which minimise the average estimation error, for given resources such as the average photon number or number of qubits available, are of fundamental interest in quantum metrology.
However, attention has often focused instead on minimising a different quantity, the 'local precision', defined for a fixed value of the shift parameter, $\Phi = \phi$, by [21,22]
$$P_\phi(\hat{\Phi}) := \left\langle \left( \frac{\hat{\Phi}}{|d\langle \hat{\Phi} \rangle_\phi / d\phi|} - \phi \right)^2 \right\rangle_\phi^{1/2}, \qquad (3)$$
where $\langle \cdot \rangle_\phi$ denotes an average with respect to the conditional probability density $p(\hat{\phi}|\phi)$. Some proposed nonlinear measurement schemes can achieve local precisions which scale in terms of the number of resources $n$ as, for example, $n^{-3/2}$ [5,7,8,12,13], $n^{-2}$ [10], or $2^{-n}$ [9], for some value of $\phi$. Even so, it will be shown below that the corresponding average estimation errors can scale no better than the usual Heisenberg scaling, $n^{-1}$.
For estimates which are, approximately, locally unbiased for all values of $\Phi$ over some interval [23], one has
$$\epsilon(\hat{\Phi})^2 \approx \int d\phi\, \wp(\phi)\, P_\phi(\hat{\Phi})^2 \qquad (4)$$
for shift parameters confined to this interval, providing a simple connection to the average estimation error. However, many phase estimates are unbiased only over very limited ranges [24], where these ranges are of widths comparable to the local precision itself. Thus, for example, while a high local precision of $2^{-K}$ in some region may allow the $K$th binary digit of a phase shift to be estimated, it often will not allow the preceding digits to be estimated with any accuracy. These must either already be known (e.g., in phase tracking [25] or phase sensing [26] applications), which requires the prior probability distribution $\wp(\phi)$ to be almost as narrow as the posterior distribution (after the measurement), or they must be determined using further resources. Hence, unless the phase is already very well known, the scaling of $P_\phi(\hat{\Phi})$ may be a very poor guide to the scaling of $\epsilon(\hat{\Phi})$. Indeed, whereas the local precision has a scaling lower bound set by the root-mean-square deviation, $\Delta G$, of the generator for the probe state [21,22], the average estimation error has an asymptotically stronger (i.e., higher) lower bound, set by the entropy, $H(G)$, of the generator [27,28]. Thus, maximising the variance, rather than the entropy, of $G$ will not typically minimise $\epsilon(\hat{\Phi})$. Here we further generalise and strengthen this entropic bound, in Secs. II and III, to replace $H(G)$ by the so-called G-asymmetry of the probe state [29]. The fundamental role of this quantity is emphasised by showing that it also bounds the mutual information between the shift parameter $\Phi$ and any estimate $\hat{\Phi}$. An important consequence demonstrated in Sec. IV is that, in a surprising contrast to the case of local precision, simply replacing $G$ by some nonlinear function thereof, such as $F = G^2$, cannot improve the average estimation error nor the information gain.
A careful analysis in Sec. V shows that nonlinearity can improve the scaling of $\epsilon(\hat{\Phi})$ beyond $n^{-1}$ for iterative implementations. These are implementations where the shift is applied on a sequence of probes of different sizes, so that $G$ is replaced by a suitable sum of nonlinear generators. However, for a probe state comprising $n$ qubits, it is shown that even adaptive variable-pass implementations of previously proposed nonlinear schemes can at best achieve scalings exponential in $\sqrt{n}$ for the average estimation error. In contrast, in Sec. VI, we show that the best possible scaling for $\epsilon(\hat{\Phi})$ is exponential in $n$, both for qubit and optical probes, whether the generator is linear or nonlinear. Moreover, an exponential scaling is in fact achievable via linear estimation schemes, if multipass implementations are allowed. Whether nonlinear schemes are more robust than linear schemes to the presence of loss is left as a question for future investigation.

II. AN INFORMATION BOUND
The mutual information between the shift parameter and its estimate, $H(\hat{\Phi}:\Phi)$, is a measure of performance in its own right, quantifying the average number of bits obtained per estimate. A general upper bound for mutual information is obtained here, applicable to any generator $G$ having a discrete spectrum, which will be used in Sec. III below to obtain a lower bound for the average estimation error. Several useful properties of this bound are also established.
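Consider a parameter $\Phi$ with some prior distribution $\wp(\phi)$, and define an average prior state $\bar{\rho} = \int d\phi\, \wp(\phi)\, \rho_\phi$. Then, using the Holevo bound [30], one immediately has $H(\hat{\Phi}:\Phi) \le S(\bar{\rho}) - \int d\phi\, \wp(\phi)\, S(\rho_\phi) = S(\bar{\rho}) - S(\rho_0)$. Here $S(\rho) = -\mathrm{tr}[\rho \ln \rho]$ denotes the von Neumann entropy of the state $\rho$. Now define
$$U_G(\rho) := \sum_g \Pi_g\, \rho\, \Pi_g = \lim_{w\to\infty} \frac{1}{w} \int_0^w d\phi\; e^{-iG\phi}\, \rho\, e^{iG\phi}, \qquad (5)$$
where $\Pi_g$ is the projection onto the eigenspace corresponding to eigenvalue $g$ of $G$, and the second equality may be checked by considering a basis diagonal in $G$. This map is unital, i.e., it maps the unit operator to itself. Using the nondecreasing property of von Neumann entropy under unital maps [30], together with $U_G(\bar{\rho}) = U_G(\rho_0)$, then yields the desired upper bound,
$$H(\hat{\Phi}:\Phi) \le A_G(\rho_0) := S(U_G(\rho_0)) - S(\rho_0), \qquad (6)$$
for the mutual information.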
The upper bound, $A_G(\rho_0)$ in Eq. (6), may be recognised as the increase in quantum entropy due to an ideal measurement of $G$ on the probe state, with postmeasurement state $U_G(\rho_0)$. This entropy increase is relevant to bounding efficiencies in quantum thermodynamics [31]. More generally, $A_G(\rho)$ represents the asymmetry of the state $\rho$ with respect to a unitary group G (in this paper, the one-parameter Abelian group with Hermitian generator $G$) [29]. The G-asymmetry quantifies the degree to which $\rho_0$ can break the symmetry of G (in this paper, the extent to which it carries information about the variable $\Phi$ which is conjugate to $G$) [29,[32][33][34]. For the case where $G$ has integer eigenvalues, $A_G(\rho_0)$ quantifies to what extent $\rho_0$ can act as a phase reference, an attribute clearly essential for detecting phase shifts. Note that for $G$ with incommensurate eigenvalue gaps, the corresponding group is non-compact, but the above expression (5) for $U_G$ allows one to generalise the G-asymmetry (6) to this case also.
A form of Eq. (6) has been previously given for the special case of compact groups where the average prior state $\bar{\rho}$ is symmetric with respect to the group [32][33][34]. For the case of a phase shift (i.e., a $G$ with integer eigenvalues) this means a prior distribution $\wp(\phi)$ which is uniform over the unit circle. Equation (6) represents a generalisation, for the case of one-parameter groups, to an arbitrary discrete generator $G$ and arbitrary prior distributions of the shift parameter. Note that $\Phi$ ranges over $(-\infty, \infty)$ if $e^{-iG\phi}$ is nonperiodic, corresponding to a noncompact group.
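Several useful properties of $A_G(\rho)$ will be required further below. First, if $\rho$ is pure and/or $G$ is nondegenerate, then the states $\rho_g = \Pi_g \rho \Pi_g / p_g$, with $p_g = \mathrm{tr}[\rho \Pi_g]$, are pure and mutually orthogonal, and Eqs. (5) and (6) yield
$$A_G(\rho) = H(G|\rho) - S(\rho), \qquad (7)$$
where $H(G|\rho) = -\sum_g p_g \ln p_g$ is the entropy of the generator for state $\rho$. Second, one has the general bounds
$$A_{f(G)}(\rho) \le A_G(\rho) \le H(G|\rho), \qquad (8)$$
$$A_G(\lambda \rho + (1-\lambda) \rho') \le \lambda A_G(\rho) + (1-\lambda) A_G(\rho'), \qquad (9)$$
where $f(G)$ is any function of $G$ and $0 \le \lambda \le 1$. The lower bound in Eq. (8) is saturated when $f$ is 1:1, as may be seen by replacing $G$ by $f(G)$ and $f$ by $f^{-1}$, while the upper bound is saturated for any pure state from Eq. (7).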
The convexity of A G (ρ), as per Eq. ( 9), implies that the G-asymmetry is maximised for pure states.
The lower bound in ( 8) is obtained by noting that the eigenspaces of G are subspaces of the eigenspaces of f (G), so that U G • U f (G) = U G , and using the nondecreasing property of von Neumann entropy under unital maps [30] for the particular case To obtain the upper bound, let |ψ be some purification of ρ on a tensor product of the probe Hilbert space with an ancilla a, so that ρ = tr a [|ψ ψ|].Rewriting A G (ρ) as g p g S(ρ ρ g ), where S(ρ σ) = tr[σ(ln σ − ln ρ)] denotes the relative entropy of ρ and σ, one then has as desired, where the second equality follows from Eq. ( 7), and the inequality from the nonincreasing property of relative entropy under the operation of tracing over the ancilla [30].Finally, Eq. ( 9) may be obtained via the representation A G (ρ) = lim w→∞ w −1 w 0 dφ S(ρ ρ φ ), following from Eq. ( 5), and using the joint convexity property of the relative entropy [30].
Equations (6) and (8) imply in particular that the mutual information is bounded by the entropy of the generator for the probe state, i.e.,
$$H(\hat{\Phi}:\Phi) \le H(G|\rho_0). \qquad (10)$$
Thus, for example, for a generator having $d$ distinct eigenvalues, no more than $\ln d$ nats, i.e., $\log_2 d$ bits, of information can be extracted per probe state about the value of the shift parameter.
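The definitions above can be checked numerically for small systems. The following is a minimal sketch, assuming a generator that is diagonal in the computational basis, so that the dephasing map $U_G$ reduces to deleting the matrix elements that connect different eigenvalues. The function names (`von_neumann_entropy`, `dephase`, `g_asymmetry`) and the three-qubit example are illustrative choices, not part of the original analysis.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr[rho ln rho], in nats."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop numerical zeros
    return float(-np.sum(evals * np.log(evals)))

def dephase(rho, g_diag, decimals=9):
    """U_G(rho) = sum_g Pi_g rho Pi_g for a generator diagonal in this basis."""
    g = np.round(np.asarray(g_diag, dtype=float), decimals)
    same_eigenspace = g[:, None] == g[None, :]
    return rho * same_eigenspace          # zero the blocks between eigenspaces

def g_asymmetry(rho, g_diag):
    """A_G(rho) = S(U_G(rho)) - S(rho)."""
    return von_neumann_entropy(dephase(rho, g_diag)) - von_neumann_entropy(rho)

# Illustrative example: n = 3 qubits, G = sigma_z^(1) + ... + sigma_z^(n).
n = 3
bits = np.array([[(i >> k) & 1 for k in range(n)] for i in range(2 ** n)])
g = n - 2 * bits.sum(axis=1)              # eigenvalue of G on each basis state

ghz = np.zeros(2 ** n)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)         # (|000> + |111>)/sqrt(2)
rho_ghz = np.outer(ghz, ghz)

plus = np.full(2 ** n, 2 ** (-n / 2))     # equal superposition of all basis states
rho_plus = np.outer(plus, plus)

print(g_asymmetry(rho_ghz, g))            # asymmetry of the two-eigenvalue state
print(g_asymmetry(rho_plus, g), g_asymmetry(rho_plus, g ** 2))
```

For the two-eigenvalue superposition the asymmetry comes out as $\ln 2$, and replacing the generator by its square does not increase the asymmetry, in line with the lower bound in Eq. (8).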

III. BOUNDS FOR RESOLUTION OF SHIFT PARAMETERS

A. Average estimation error
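A strong bound for the average estimation error in Eq. (1) may be derived analogously to weaker bounds obtained in [28,35], i.e., by combining a quantum upper bound for the mutual information, such as Eq. (6), with the classical lower bound [36,37]
$$\epsilon(\hat{\Phi}) \ge (2\pi e)^{-1/2}\, e^{H(\Phi)}\, e^{-H(\hat{\Phi}:\Phi)}, \qquad (11)$$
where $H(\Phi) = -\int d\phi\, \wp(\phi) \ln[\wp(\phi)]$ denotes the entropy of the prior probability density $\wp(\phi)$ for $\Phi$. This lower bound is well known from rate-distortion theory, and follows from the inequality chain [37]
$$H(\hat{\Phi}:\Phi) = H(\Phi) - H(\Phi|\hat{\Phi}) = H(\Phi) - H(\Phi - \hat{\Phi}|\hat{\Phi}) \ge H(\Phi) - H(\Phi - \hat{\Phi}) \ge H(\Phi) - \tfrac{1}{2} \ln[2\pi e\, \epsilon(\hat{\Phi})^2],$$
where $H(A|B)$ denotes the conditional entropy $H(AB) - H(B)$.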
In particular, the combination of Eqs. (6) and (11) immediately yields the fundamental bound
$$\epsilon(\hat{\Phi}) \ge (2\pi e)^{-1/2}\, e^{H(\Phi)}\, e^{-A_G(\rho_0)} \qquad (12)$$
for the average estimation error, for any discrete generator $G$. This bound both strengthens and generalises previous entropic bounds in the literature [17,27,28,35].
For example, Nair [28] and Yuen [35] use weaker upper bounds for the mutual information, corresponding to replacing $A_G(\rho_0)$ in Eq. (12) by the quantum channel capacity under a fixed photon number constraint. Hall and co-workers have previously obtained bounds in a different manner, based on entropic uncertainty relations, which correspond to replacing $A_G(\rho)$ in Eq. (12) by the upper bound in Eq. (8) [17,27] (and, alternatively, by the upper bound in Eq. (7) for nondegenerate generators [27]), and replacing $e^{H(\Phi)}$ by $1/q_{\max}$, where $q_{\max}$ denotes the maximum value of $\wp(\phi)$ [27].
Note that our bound (12) is applicable to iterative schemes, including adaptive ones, where the measurement performed on some probe state components is dependent (in practice, through additional known phase rotations) on the outcomes of earlier measurements on other components [4]. This is because such a measurement scheme is formally equivalent to first applying shift generators $G_1, G_2, \dots$ to respective probe components (e.g., qubits or optical modes), corresponding to applying the total generator $G = G_1 + G_2 + \dots$, and then performing the measurements sequentially (and adaptively).
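To make the bound concrete, the sketch below evaluates the right-hand side of Eq. (12), read here as $\epsilon(\hat{\Phi}) \ge (2\pi e)^{-1/2} e^{H(\Phi) - A_G(\rho_0)}$, which is the form that follows from the rate-distortion argument of this section. The function name `error_lower_bound` and the example values are assumptions made for illustration only.

```python
import math

def error_lower_bound(prior_entropy, asymmetry):
    """Right-hand side of the entropic bound, with both entropies in nats.

    Assumed form: (2*pi*e)^(-1/2) * exp(H(Phi) - A_G(rho_0)).
    """
    return math.exp(prior_entropy - asymmetry) / math.sqrt(2 * math.pi * math.e)

# Completely unknown phase: uniform prior on [0, 2*pi), so H(Phi) = ln(2*pi).
h_uniform = math.log(2 * math.pi)

# A probe whose generator has d distinct eigenvalues has asymmetry at most ln d.
for d in (2, 16, 1024):
    print(d, error_lower_bound(h_uniform, math.log(d)))
```

For a uniform prior the bound reduces to $(2\pi/e)^{1/2}/d$ when the asymmetry takes its largest possible value $\ln d$, so the achievable error is controlled by the number of distinguishable eigenvalues rather than by their spread.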

B. Local precision
A bound for the local precision in Eq. (3) follows via the quantum Cramér-Rao inequality [2,3], and has the form [21,22]
$$P_\phi(\hat{\Phi}) \ge \frac{1}{2 \Delta G}, \qquad (13)$$
where $\Delta G$ denotes the root mean square deviation of the (total) generator $G$ for the probe state. Note that, taking the averages of $\nu$ independent estimates, one obtains the usual statistical enhancement factor of $1/\sqrt{\nu}$ for both of the bounds in (12) and (13). This will not be discussed further here, other than to remark that although the latter bound for $P_\phi(\hat{\Phi})$ may be asymptotically achievable as $\nu \to \infty$ [21,22,24,25], this does not imply anything about the achievability of the corresponding bound for $\epsilon(\hat{\Phi})$.

C. Comparisons
For generators with integer eigenvalues, i.e., phase shift generators, the scaling of $\epsilon(\hat{\Phi})$ with the exponentiated G-asymmetry $e^{-A_G(\rho_0)}$ in Eq. (12) implies a scaling with the root mean square deviation $\Delta G$ which is at least as strong as that for $P_\phi(\hat{\Phi})$ in Eq. (13) (ignoring multiplicative constants of order unity). This is a consequence of the inequality chain
$$A_G(\rho) \le H(G|\rho) \le \tfrac{1}{2} \ln\left[ 2\pi e \left( (\Delta G)^2 + 1/12 \right) \right] \qquad (14)$$
for such generators, where the first inequality follows from Eq. (8) and the second is well known [37,38]. For the case of a completely unknown phase shift, with $\wp(\phi) = 1/(2\pi)$, Eqs. (12) and (14) yield the asymptotic lower bound $\epsilon(\hat{\Phi}) \gtrsim (e \Delta G)^{-1}$ for the average estimation error, which is comparable to the lower bound (13) for local precision. However, importantly, the bound in Eq. (12) is significantly more powerful, as we now show.
Consider, for example, a probe comprising n qubits, and generator G = σ  13) and (14).That is why this state, equivalent to a NOON state or GHZ state [39], is often considered in quantum metrology.However, the corresponding G-asymmetry follows via Eq.( 7) as only A G (ρ 0 ) = H(G|ρ 0 ) = ln 2, implying via Eq.( 12) that the average estimation error does not decrease at all as a function of n.The only way an average estimation error scaling as n −1 would be possible from this state would be if there was sufficient prior information.That is, from Eq. ( 12), if −H(Φ) + A G (ρ 0 ) was of order ln n.But since A G (ρ 0 ) = ln 2, this means −H(Φ) ∼ ln n itself.Hence the amount of prior information about the parameter to be estimated would already be sufficient to locate it with the precision achievable by the measurement.
It is thus apparent that the lower bounds ( 12) and ( 13) can exhibit markedly different behaviour, with the former bound having an asymptotically stronger scaling in general.It follows that probe states generating optimal scaling for P φ ( Φ), obtained by maximising ∆G under various constraints for some value of φ, do not necessarily correspond to optimal bounds for ǫ( Φ).Since it is the latter quantity which has direct operational significance for the performance of the estimate, this has crucial implications for some nonlinear estimation schemes proposed in the literature, as will be seen below.
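The contrast between the two bounds can be illustrated with a few lines of arithmetic. The sketch below assumes a probe that is an equal superposition of the two extreme eigenstates of $G = \sigma_z^{(1)} + \dots + \sigma_z^{(n)}$, with eigenvalues $\pm n$ (a normalisation chosen to match the eigenvalues quoted for $(J_z)^2$ in Sec. V), a uniform prior, and the forms $1/(2\Delta G)$ and $(2\pi/e)^{1/2} e^{-H(G|\rho_0)}$ for the local and average bounds. All function names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def std_dev(values, probs):
    mean = sum(v * p for v, p in zip(values, probs))
    return math.sqrt(sum(p * (v - mean) ** 2 for v, p in zip(values, probs)))

def compare_bounds(n):
    """Return (Delta G, H(G), local-precision bound, average-error bound)."""
    values, probs = [-n, n], [0.5, 0.5]   # two extreme eigenvalues, equal weights
    delta_g = std_dev(values, probs)
    h = entropy(probs)
    local_bound = 1 / (2 * delta_g)                              # assumed form of Eq. (13)
    average_bound = math.sqrt(2 * math.pi / math.e) * math.exp(-h)  # Eq. (12), uniform prior
    return delta_g, h, local_bound, average_bound

for n in (1, 10, 100, 1000):
    print(n, compare_bounds(n))
```

The local bound falls as $1/n$ while the average bound stays fixed, because the entropy of the generator is $\ln 2$ for every $n$. The same numbers also satisfy the second inequality of Eq. (14).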
However, noting from Eq. (8) that $A_{f(G)}(\rho) \le A_G(\rho)$, for any function $f$, precisely the same conclusion follows for the nonlinear generators $G = (J_z)^q$ and $G = nJ_z$. That is, the average estimation error cannot achieve better than Heisenberg scaling for these generators. Moreover, the nonlinear generators $G = H$ and $G = A$ of Eq. (15) do not even allow the possibility of Heisenberg scaling, as they each only have 3 distinct eigenvalues, 0 and $\pm 2^{n-1}$ [40].
The above results are in stark contrast to the best possible scalings of local precision for these schemes, which improve on Heisenberg scaling, with $n^{-q}$ for $G = (J_z)^q$ [5,7,8,11]; $n^{-2}$ for $G = nJ_z$ [10]; and $2^{-n}$ for $G = H$ or $A$ [9]. This difference in scalings immediately raises a conundrum: how can nonlinearity improve the local precision, yet not the average estimation error?
This puzzle may be further deepened by noting that the probe states yielding optimal local precisions are generally an equally-weighted superposition of two orthogonal eigenstates of G, corresponding to the maximum and minimum eigenvalues of the generator [7,8,10,11].Thus, A G (ρ 0 ) = ln 2 for such a probe state, implying that the average estimation error cannot decrease with n at all, as discussed in Sec.III C above.Indeed, Eq. (10) implies that no more than 1 bit of information about the phase shift can be gained via such an 'optimal' probe state.

B. Probes comprising optical modes
An analogous puzzle holds for optical probes. As a simple example, if $G = N$ is the number operator for a single mode field, then $H(G|\rho) \le \ln(e \langle N + 1 \rangle)$, implying from Eqs. (8) and (12) that the average estimation error can scale no better than $\langle N + 1 \rangle^{-1}$ (see also [17,27,28,35]). But for any nonlinear generator $G = f(N)$, $A_G(\rho) \le A_N(\rho) \le H(N|\rho)$ from Eq. (8). Hence, the same scaling bound also applies to nonlinear generators for single mode fields.
In contrast, nonlinearity can significantly enhance the local precision. For example, choosing the coherent probe state $\rho_0 = |\alpha\rangle\langle\alpha|$ and nonlinear generator $G = N^2$, $P_\phi(\hat{\Phi})$ can scale as $\langle N \rangle^{-3/2}$ for large $\langle N \rangle$ [5]. For this case the photon number distribution is Poissonian, which is well approximated by a Gaussian distribution for large $\langle N \rangle$. Thus, using Eq. (7), $A_G(\rho_0) = H(N^2|\rho_0) = H(N|\rho_0) \approx (1/2) \ln(2\pi e \langle N \rangle)$, implying via Eq. (12) that the corresponding average estimation error can decrease with $\langle N \rangle$. However, it cannot scale even as well as the Heisenberg limit, $\langle N \rangle^{-1}$, but rather is lower bounded by the standard quantum limit scaling, $\langle N \rangle^{-1/2}$.

These examples again lead to the puzzle that while nonlinearity can improve the scaling of the local precision, it cannot, by itself, influence the scaling of the average estimation error. This raises the question: can nonlinear schemes offer any advantage over linear schemes?
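The entropy statements for the coherent-state example are easy to verify numerically. The following sketch, with an illustrative mean photon number of 100 and a truncation at 400 photons (both arbitrary assumptions), compares the entropy of the Poissonian photon number distribution with the Gaussian estimate, and confirms that squaring the generator merely relabels the outcomes.

```python
import math
from collections import defaultdict

def poisson_pmf(mean, kmax):
    """Poisson probabilities for k = 0..kmax, computed in log space for stability."""
    return [math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
            for k in range(kmax + 1)]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

mean_photons = 100                      # illustrative value of <N>
pmf = poisson_pmf(mean_photons, 400)    # truncated far into the tail

h_n = entropy(pmf)

# Distribution of N^2: the map k -> k^2 is one-to-one on k >= 0.
dist_sq = defaultdict(float)
for k, p in enumerate(pmf):
    dist_sq[k * k] += p
h_n2 = entropy(dist_sq.values())

gaussian_estimate = 0.5 * math.log(2 * math.pi * math.e * mean_photons)

# Implied lower bound on the average error for a uniform prior on [0, 2*pi).
sql_bound = math.sqrt(2 * math.pi / math.e) * math.exp(-h_n)

print(h_n, h_n2, gaussian_estimate, sql_bound)
```

Since $H(N^2|\rho_0) = H(N|\rho_0)$, the bound obtained for the nonlinear generator is identical to that for the linear one, and scales as the inverse square root of the mean photon number.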

V. PUZZLE RESOLUTION: ITERATIVE SCHEMES
It has been seen that a simple replacement of a generator by a nonlinear function thereof cannot lead to an improved scaling of the average estimation error, in marked contrast to the situation regarding the local precision. However, a careful analysis shows that with a suitable sum of nonlinear generators, the bounds in Eqs. (12) and (6) allow for an enhanced scaling of $\epsilon(\hat{\Phi})$, and that this enhanced scaling could, plausibly, be achievable by adaptive measurements. This resolves the above puzzle to some degree. Significantly, however, the scaling of the average estimation error does not necessarily achieve the same scaling as the local precision.
As a first example, let $G(l)$ denote the nonlinear generator $(J_z)^2$, for $l$ qubits, and let $\rho(l)$ denote an equally weighted superposition of two eigenstates of $G(l)$, corresponding to its minimum and maximum eigenvalues (i.e., to 0 and $l^2$ if $l$ is even, and 1 and $l^2$ if $l$ is odd). Now consider the total generator and corresponding composite probe state defined by
$$G_{\rm it} := G(n_1) + G(n_2) + \dots + G(n_K), \qquad \rho_0 := \rho(n_1) \otimes \rho(n_2) \otimes \dots \otimes \rho(n_K), \qquad (16)$$
with $n_k := \lceil 2^{(k-1)/2} \rceil$ (where $\lceil x \rceil$ denotes the smallest integer not less than $x$). Since the phase shift generated by $G(l)$ has period $\approx 2\pi/l^2$, this ensures that the phase shift generated by $G(n_k)$, on the probe state component $\rho(n_k)$, has period $\approx 2\pi/2^{k-1}$. The basic idea is that the $k$th bit in a binary expansion of $\Phi/(2\pi)$ is estimated from the $k$th component of the probe state. Note that it is impossible to obtain more than 1 bit from each component of the probe state, i.e., more than $\ln 2$ nats, as a consequence of Eq. (6) and the property $A_{G(l)}(\rho(l)) = H(G(l)|\rho(l)) = \ln 2$. This is the idea behind the famous quantum phase estimation algorithm [30] for linear phase shifts, subsequently generalized in [14][15][16]. In practice, to achieve the best scaling for the average estimation error it may be necessary to use $M > 1$ copies of each component of the probe state to estimate each bit accurately, when combined with an adaptive measurement sequence [14,15]. Counter-intuitively, it is the least significant ($K$th) bit of $\Phi/(2\pi)$ that should be determined first, to allow the optimal measurement of the $(K-1)$th bit, and so on up to the most significant bit.
The total number of qubits required in the above setup where these have a uniform distribution for ρ 0 .Taking into account the M copies of G it and ρ 0 , the corresponding generator (note that for M > 1 the distribution of G over ⊗ r ρ 0 is not uniform, and this upper bound is not tight).Thus Eq. ( 12) yields the following lower bound for the average estimation error, in the worst-case scenario of a completely random phase shift.Note that this bound is compatible with the scaling expected for a scheme that determines the first K bits of Φ, giving ǫ( Φ) ≤ (2π)/2 K+1 ∼ (M/n) 2 .In other words, for M large enough for this bitwise estimation scheme to work, we would expect the scaling with n in Eq. ( 17) to be attainable.Thus, this adaptive scheme demonstrates the possibility of an asymptotic n −2 scaling for the average estimation error.This is the same scaling (up to a constant factor) as for the optimal local precision for the generator G(n) [7,11].Furthermore, an n −q asymptotic scaling can be obtained for an analogous adaptive scheme based on the nonlinear generator (J z ) q , with ǫ( Φ) for a phase shift random over [0, 2π), where c q increases exponentially with q.Analogous results may be obtained for iterative implementations of the schemes in [5,6].However, a correspondence of scalings between P φ ( Φ) and ǫ( Φ) does not hold more generally.For example, let G(l) instead denote the nonlinear generator H + 2 l−1 for l qubits, with H as in Eq. ( 15) but with l in place of n, and with the additive constant being chosen to simplify eigenvalue counting.Further, let ρ(l) denote an equally weighted superposition of the two eigenstates corresponding to the minimum and maximum eigenvalues 0 and 2 l of G(l) (note such superpositions include the separable states |z, z, . . ., z and |−z, −z, . . ., −z [9,40]).Successive estimation of l binary digits then corresponds to n k = k − 1 in Eq. 
(16), requiring a total qubit number $n = MK(K-1)/2 \approx MK^2/2$. One has $A_G(\otimes^M \rho_0) \le \ln(M 2^K)$ as before, yielding the lower bound
$$\epsilon(\hat{\Phi}) \ge \frac{(2\pi/e)^{1/2}}{M\, 2^{K}} \approx (2\pi/e)^{1/2}\, M^{-1}\, 2^{-\sqrt{2n/M}} \qquad (19)$$
for the case of a completely random phase shift. Thus the scaling with $n$ is considerably worse than the $2^{-n}$ scaling of the local precision for the corresponding single generator scheme [9]. This last result demonstrates that the local precision does not necessarily characterise the performance of the average estimation error even for adaptive implementations. It follows that comparisons between various schemes, whether linear or nonlinear, should be made on the basis of the operationally significant quantity, $\epsilon(\hat{\Phi})$ in Eq. (1), rather than $P_\phi(\hat{\Phi})$ in Eq. (3).
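The resource counting for the two iterative schemes of this section can be tabulated directly. The sketch below is a hedged illustration: it assumes component sizes $n_k = \lceil 2^{(k-1)/2} \rceil$ for the quadratic generator and $n_k = k - 1$ for the exponential one, the asymmetry bound $\ln(M 2^K)$ quoted in the text, and a uniform prior. The function names are hypothetical.

```python
import math

def ceil_sqrt_pow2(exponent):
    """Smallest integer not less than 2**(exponent/2), in exact integer arithmetic."""
    x = 2 ** exponent
    s = math.isqrt(x)
    return s if s * s == x else s + 1

def qubits_quadratic(K, M=1):
    """Total qubits when bit k uses a component of n_k = ceil(2^((k-1)/2)) qubits."""
    return M * sum(ceil_sqrt_pow2(k - 1) for k in range(1, K + 1))

def qubits_exponential(K, M=1):
    """Total qubits when bit k uses a component of n_k = k - 1 qubits."""
    return M * sum(k - 1 for k in range(1, K + 1))

def bound_uniform_prior(K, M=1):
    """Entropic lower bound for a uniform prior, assuming A_G <= ln(M 2^K)."""
    return math.sqrt(2 * math.pi / math.e) / (M * 2 ** K)

# Compare how many qubits each scheme spends to resolve K bits.
for K in (4, 8, 16, 32):
    print(K, qubits_quadratic(K), qubits_exponential(K), bound_uniform_prior(K))
```

Both schemes share the same bound as a function of the number of bits $K$; they differ in how $K$ grows with the qubit number $n$. For the second scheme $n$ grows quadratically in $K$, which is the origin of the scaling exponential in $\sqrt{n}$.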

A. Probes comprising n qubits
Since any generator $G$ for $n$ qubits has at most $2^n$ distinct eigenvalues, it follows from Eqs. (7) and (8) that $A_G(\rho_0) \le H(G|\rho_0) \le n \ln 2$, with equality for a probe state that is an equally weighted superposition of the corresponding eigenstates. Hence, from Eq. (12), the best possible scaling for the average estimation error satisfies
$$\epsilon(\hat{\Phi}) \ge (2\pi e)^{-1/2}\, e^{H(\Phi)}\, 2^{-n}. \qquad (20)$$
Estimation schemes having an exponential scaling linear in $n$ are therefore of fundamental interest. Note, as per Eq. (19), that such a scaling is not attained via an adaptive implementation of the nonlinear scheme in [9], despite the local precision scaling as $2^{-n}$ for this scheme. It is possible that nonlinear schemes exist with an exponential scaling in $n$ for $\epsilon(\hat{\Phi})$. Here we show that, surprisingly, such a scaling can be achieved with linear generators. The extra ingredient that makes this possible is to allow for multiple (and varied) applications of the phase shift prior to measurement.
In particular, following the ideas in Higgins et al. [14], consider a probe system comprising $m$ unentangled qubits, each in the state $|+\rangle := (|0\rangle + |1\rangle)/\sqrt{2}$, where the $k$th qubit is subjected to $2^{k-1}$ applications of a linear phase shift generated by $(1 + \sigma_z^{(k)})/2$. In Ref. [14] this was achieved experimentally via multiple passes through a medium. Another possibility would be to suitably increase interaction times of the qubits with the phase shift medium. The total generator and the probe state therefore have the forms
$$G = \sum_{k=1}^{m} 2^{k-1}\, \frac{1 + \sigma_z^{(k)}}{2}, \qquad \rho_0 = \bigotimes_{k=1}^{m} |+\rangle\langle +|. \qquad (21)$$
The total phase shift of the $k$th qubit thus has period $2\pi/2^{k-1}$, and so can be used to estimate the $k$th bit of $\Phi/(2\pi)$, in the adaptive manner explained above [14].
If, as for the nonlinear schemes above, $M$ copies are used to estimate each bit accurately, then the total number of qubits required is $n = MK$. Following the style of arguments used in the preceding section gives a lower bound on the average estimation error of
$$\epsilon(\hat{\Phi}) \ge (2\pi/e)^{1/2}\, M^{-1}\, 2^{-n/M}. \qquad (22)$$
In this case it would appear that minimizing $M$ could make a big improvement to the precision, and for $M = 1$ the ultimate scaling (20) might be achievable. However, it must be remembered that Eq. (22) is merely a lower bound. Moreover, from the arguments in the preceding section, it is only for $M$ sufficiently large that we expect these bitwise estimation schemes to attain the scaling with $n$ of the lower bounds. Luckily, in this case, we can compare this bound to the actual performance of the best known adaptive schemes, for an initially completely random phase, as this has been studied extensively. These studies were done using the Holevo variance $V_H(\hat{\Phi})$ [41] rather than the average estimation error, but when these are small (as here, for $n$ large), $V_H(\hat{\Phi}) \le \epsilon(\hat{\Phi})^2 \le (\pi/2)^2\, V_H(\hat{\Phi})$ [42]. In terms of scaling with $n$, the best performance is indeed for $M = 1$, which corresponds to the quantum phase estimation algorithm [30] and yields [15]
$$\epsilon(\hat{\Phi}) \simeq c \times 2^{-n/2}. \qquad (23)$$
(The constant $c \approx 1.18$ can be evaluated by performing the integral of the distribution of phase estimates, Eq. (4.5) of [15].) Although this is not identical scaling to (20), it is still exponential in $n$, unlike (19). For $M = 2$, 3, and 4, the performance scales as $2^{-n/4}$, while for $M \ge 4$ it scales as $2^{-n/M}$, achieving the lower bound scaling in Eq. (22) as expected for $M$ sufficiently large.
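The $2^{-n/2}$ scaling for $M = 1$ can be illustrated by a direct numerical average. The sketch below is a simplified model, not the adaptive protocol of Refs. [14,15]: it assumes an ideal $K$-qubit phase-estimation readout with $2^K$ equally spaced outcomes, a uniform prior, and errors wrapped onto $[-\pi, \pi)$. It checks only the scaling with $K$; no attempt is made to reproduce the constant quoted in the text. The function name `qpe_rms_error` is hypothetical.

```python
import numpy as np

def qpe_rms_error(K, samples=64):
    """RMS wrapped error of an ideal K-qubit phase-estimation readout (uniform prior)."""
    dim = 2 ** K
    outcomes = 2 * np.pi * np.arange(dim) / dim       # possible phase estimates
    k = np.arange(dim)
    # The error statistics repeat with the outcome spacing, so it is enough
    # to average the true phase over a single spacing (midpoint rule).
    phis = (np.arange(samples) + 0.5) * (2 * np.pi / dim) / samples
    total = 0.0
    for phi in phis:
        delta = phi - outcomes
        # Amplitude (1/dim) * sum_k exp(i k delta) for each outcome.
        amps = np.exp(1j * np.outer(delta, k)).sum(axis=1) / dim
        probs = np.abs(amps) ** 2
        err = (outcomes - phi + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
        total += np.sum(probs * err ** 2)
    return float(np.sqrt(total / samples))

for K in range(3, 9):
    eps = qpe_rms_error(K)
    print(K, eps, eps * 2 ** (K / 2))     # last column should level off
```

In this model each additional qubit reduces the rms error by a factor close to $2^{-1/2}$ rather than $2^{-1}$, because the error distribution has slowly decaying tails. This is the behaviour that separates the achieved scaling (23) from the bound (20).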
Note that in terms of $\mathcal{N} = M(2^{K+1} - 1)$, the number of qubit-passes through the phase shift (which is the resource considered in [14][15][16][17]), the change in scaling as $M$ increases appears quite different. For $M = 1$ and $M = 2$, the scaling is $\mathcal{N}^{-1/2}$; for $M = 3$ it is $\mathcal{N}^{-3/4}$; and for $M \ge 4$ it is $\mathcal{N}^{-1}$. We emphasize that counting resources as above, in terms of the number of qubits $n$, is necessary to enable consistent comparison with the nonlinear schemes considered in [7][8][9][10][11]. Recently, some other papers have also considered the number of qubits (or, more strictly, the number of qubit measurements) as a resource [43,44]. However, as these were motivated by qubit gate characterization in solid-state quantum computing, they imposed the constraint that the qubit measurement basis is fixed. In this case the only thing that can be chosen adaptively is the number of times the phase shift is applied to a given qubit prior to measurement. In [43] numerical evidence was presented that, using a locally optimal ("greedy") adaptive algorithm, a scaling of approximately $2^{-0.1n}$ is achievable. In [44] an analytical argument was given suggesting that a scaling of approximately $2^{-0.16n}$ should be achievable. Neither achieves the scaling (23) of schemes that allow adaptive controlled qubit rotations prior to measurement.

B. Probes comprising optical modes
For an optical probe containing $m$ orthogonal modes, let $N_j$ denote the photon number of the $j$th mode, and $N$ denote the total photon number $N_1 + \cdots + N_m$. The entropy of any generator $G = f(N_1, \ldots, N_m)$ is bounded above by the joint entropy of $N_1, \ldots, N_m$ (since the distribution of $G$ is a coarse graining of the joint distribution), yielding via Eq. (8)

$$
\begin{aligned}
A_G(\rho) &\leq H(N_1, \ldots, N_m|\rho) \\
&\leq \ln\left[\left(1 + \frac{\langle N\rangle}{m}\right)^{m}\left(1 + \frac{m}{\langle N\rangle}\right)^{\langle N\rangle}\right] \\
&\leq \ln e^{m + \langle N\rangle}. \qquad (25)
\end{aligned}
$$

The second line follows from standard statistical mechanics techniques, and the third line from the monotonic convergence of $(1 + x/y)^y$ to $e^x$ as $y$ increases. Using Eq. (12), the average estimation error therefore obeys the fundamental lower bound of Eq. (26) for any generator which is a function of $N_1, \ldots, N_m$. This includes the total photon number, $N = N_1 + \cdots + N_m$, in particular [35], but also includes, for example, the nonlinear generators $N^2$ and $(N_1)^2 + \cdots + (N_m)^2$. Similarly, using Eq. (6), one has the fundamental upper bound $m + \langle N\rangle$ for the mutual information $H(\hat\Phi : \Phi)$.

It follows that estimation schemes with an error scaling exponentially in the average photon number are of fundamental interest. Further, a linear scheme, analogous to the one above for $n$ qubits, is sufficient to obtain such a scaling (though with a different coefficient). In particular, the $M = 1$ linear multipass scheme of Higgins et al. [14], equivalent to the quantum phase estimation algorithm [30], is precisely such a scheme, involving $m = 2K$ modes, with each pair of modes in a single-photon superposition state. Hence, $K = \langle N\rangle = m/2$, and from Eq. (23), $\epsilon(\hat\Phi) \simeq 1.18 \times e^{-K \ln(2)/2}$ asymptotically, which is consistent with Eq. (26) as here $m + \langle N\rangle = 3K$.

Again, it should be noted, as for the qubit case above, that the measure of resources considered here is the total mean photon number $\langle N\rangle$ required for the scheme, rather than the number of photon-passes $N$ through the phase shift medium as in Ref. [14]. Again, we use $\langle N\rangle$ to enable comparison with various linear and nonlinear estimation schemes [5,6]. Moreover, photon number is a natural measure of interest to consider, as it characterises the energy resources required for a given optical scheme.
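The entropy chain leading to Eq. (25) can be checked numerically with the following sketch. It assumes that the "standard statistical mechanics" step is the maximum-entropy value for $m$ modes sharing a total mean photon number $\langle N\rangle$ equally, each mode being thermal; the function names are hypothetical. Since the prefactor of Eq. (26) is not modelled here, only decay exponents are compared for the $M = 1$ multipass example.

```python
import math


def thermal_mode_entropy(nu: float) -> float:
    """Entropy in nats of one thermal mode with mean photon number nu > 0."""
    return (1.0 + nu) * math.log1p(nu) - nu * math.log(nu)


def max_joint_entropy(m: int, mean_n: float) -> float:
    """ln[(1 + mean_n/m)^m (1 + m/mean_n)^mean_n] (assumed middle line of the chain)."""
    if m < 1 or mean_n <= 0:
        raise ValueError("need m >= 1 and mean_n > 0")
    return m * math.log1p(mean_n / m) + mean_n * math.log1p(m / mean_n)


def multipass_error(K: int) -> float:
    """Asymptotic error 1.18 * exp(-K ln(2) / 2) quoted in the text for M = 1."""
    return 1.18 * math.exp(-K * math.log(2.0) / 2.0)


if __name__ == "__main__":
    for K in (1, 5, 20):
        m, mean_n = 2 * K, float(K)      # m = 2K modes, <N> = K photons
        second = max_joint_entropy(m, mean_n)
        third = m + mean_n               # ln e^(m + <N>) = 3K
        # Equal sharing among thermal modes reproduces the closed form.
        assert math.isclose(second, m * thermal_mode_entropy(mean_n / m))
        assert second <= third
        # Achieved decay exponent (K ln 2 / 2) versus the largest exponent
        # permitted by a bound decaying as exp(-(m + <N>)).
        achieved = K * math.log(2.0) / 2.0
        print(K, round(second, 3), third, round(achieved, 3), multipass_error(K))
```

For $m = 2K$ and $\langle N\rangle = K$ the middle expression evaluates to $K(2\ln 1.5 + \ln 3) \approx 1.91K$, below the simpler value $3K$, while the achieved decay exponent is $K\ln(2)/2 \approx 0.35K$, so the quoted scheme sits well inside the bound.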

VII. DISCUSSION
The average estimation error and mutual information have been shown to satisfy the general entropic bounds in Eqs. (6) and (12), for any shift generator having a discrete spectrum, and for any prior distribution of the shift parameter. While, for phase shift generators, the G-asymmetry can be bounded above in terms of the variance of the generator, via Eq. (14), the G-asymmetry is typically much less than this upper bound. Hence, the average estimation error can scale very differently from the local precision in Eq. (13), in terms of available resources such as number of qubits or total input photon number.
Indeed, somewhat surprisingly, a simple replacement of a linear generator by some nonlinear function thereof may have no effect on the average estimation error, yet lead to a marked improvement of the local precision. Further, while such scaling differences can disappear for iterative estimation schemes, this is not always the case.
It follows that the optimal scaling of the local precision, for a given value of the shift parameter, should be treated with some caution. As noted in relation to Eq. (4), the local precision is a direct measure of the average root mean square error only over an interval for which the corresponding estimate is (approximately) unbiased. However, many estimators in the literature are unbiased only over a very small interval, similar in magnitude to the local precision itself. In such a case (which is relevant, for example, in phase tracking and phase sensing applications [25,26]), for the optimal scaling to be achievable, the amount of prior information required about the shift parameter is typically so great that the estimate itself can only extract up to 1 bit of further information, irrespective of the number of resources $n$. Examples of this phenomenon have been given in Secs. III C and IV.
It is concluded from the above that meaningful comparisons between various estimation schemes are most easily made on the basis of the operationally significant quantity, $\epsilon(\hat\Phi)$ in Eq. (1), rather than $P_\phi(\hat\Phi)$ in Eq. (3). Alternatively, if $P_{\phi'}(\hat\Phi)$ for some $\phi'$ is used, then it should be supplemented by the interval over which this estimate is (approximately) locally unbiased, i.e., the interval for which the local precision of the estimate corresponds to the actual root-mean-square error of the estimate, $\langle(\hat\Phi - \phi)^2\rangle_\phi^{1/2}$. Note that the width of this interval also bounds the width of any 'sweet spot' for which $P_\phi(\hat\Phi) = P_{\phi'}(\hat\Phi)$, and hence bounds the width of the prior probability density required to ensure that a precision of $P_{\phi'}(\hat\Phi)$ is actually achieved via measurement. As per Eq. (4), the estimation error averaged over this prior distribution will then (approximately) be equal to $P_{\phi'}(\hat\Phi)$.
Universal lower bounds for the average estimation error have also been given, in terms of the number of qubits or photons available, in Eqs. (20) and (26). Like Eq. (12), these bounds are independent of the form of the generator, and hence apply equally well to both linear and nonlinear schemes, including multipass schemes. They imply that it is impossible to achieve a scaling better than exponential, in terms of qubit number, and in terms of input photon number plus number of modes, respectively. Further, exponential scalings can be attained by multipass linear estimation schemes, as has indeed been shown experimentally for optical phase shifts [14]. It follows that, in terms of the best possible scaling that can be achieved relative to qubit or input photon number, nonlinear schemes offer no fundamental advantage over linear schemes if multiple passes are possible.
However, a practical advantage of nonlinear schemes may be a greater robustness to loss. For example, multipass linear schemes of the type discussed above will be highly sensitive to loss due to multiple (or longer) interactions with the phase shift medium. Thus it would be interesting to find alternative physical implementations of the generator in Eq. (21) and its optical analogue. Further, while only shot-noise scaling is achievable for simple linear schemes in the presence of loss [1,45,46] (including for the average estimation error [28]), there is evidence this may not be the case for nonlinear schemes [6]. Hence further investigation is required, including the determination of fundamental scaling bounds for lossy schemes analogous to Eqs. (20) and (26).
It would also be of interest to investigate the degree to which results generalise to the case of a generator with a continuous spectrum, such as spatial translations generated by a momentum operator. For example, the fundamental bounds (20) and (26) are universal, since $n \ln 2$ and $(m + \langle N\rangle) \ln e$ are respective upper bounds for mutual information, following via the Holevo bound. The possibility of other generalisations is supported by results such as a universal Heisenberg-type scaling for the average estimation error, in terms of $\langle|G|\rangle$, which holds for both discrete and continuous generators [27]. Further, weaker measurement-dependent entropic bounds on the average estimation error, given in [27], may prove helpful. Note that some differences are to be expected regarding linearity vs nonlinearity for the continuous case, since, for example, the property $H(f(G)|\rho) \leq H(G|\rho)$ no longer holds in general.
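The contrast between the discrete and continuous cases in the final remark can be illustrated with a minimal sketch. The discrete distribution, the choice $f(g) = g^2$, and the Gaussian density below are illustrative assumptions, not taken from the paper, and the helper names are hypothetical. For a discrete variable, applying a function can only merge outcomes, so the Shannon entropy cannot increase; for a continuous variable, a simple rescaling $f(G) = aG$ with $|a| > 1$ raises the differential entropy by $\ln|a|$.

```python
import math
from collections import defaultdict


def shannon_entropy(probs) -> float:
    """Shannon entropy in nats of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pushforward(dist: dict, f) -> dict:
    """Distribution of f(G), given dist mapping each value g to its probability."""
    out = defaultdict(float)
    for g, p in dist.items():
        out[f(g)] += p  # outcomes with equal f(g) are merged (coarse graining)
    return dict(out)


def gaussian_differential_entropy(sigma: float) -> float:
    """Differential entropy in nats of a Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


if __name__ == "__main__":
    # Discrete case: squaring merges +g and -g, so entropy cannot go up.
    dist = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
    squared = pushforward(dist, lambda g: g * g)
    print(shannon_entropy(dist.values()), shannon_entropy(squared.values()))

    # Continuous case: f(G) = 3G triples sigma and adds ln 3 to the entropy.
    print(gaussian_differential_entropy(1.0), gaussian_differential_entropy(3.0))
```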