Quantum Semiparametric Estimation

In the study of quantum limits to parameter estimation, the high dimensionality of the density operator and that of the unknown parameters have long been two of the most difficult challenges. Here I propose a theory of quantum semiparametric estimation that can circumvent both challenges for a class of problems and produce simple analytic quantum bounds, even when the dimensions are arbitrarily high and little prior information about the density operator is assumed. The theory is especially relevant to the estimation of a parameter that can be expressed as a function of the density operator, such as the mean of an observable, the fidelity to a pure state, the purity, or the von Neumann entropy. Potential applications include quantum state characterization and optical imaging.

The study of quantum limits has grown into an active research field called quantum metrology in recent years, building on the pioneering work of Helstrom [1] and Holevo [6]. A major current challenge is the computation of quantum bounds for high-dimensional density operators and high-dimensional parameters, as the brute-force method quickly becomes intractable for increasing dimensions; see Refs. [9] for a sample of recent efforts to combat the so-called curse of dimensionality. Most of the existing methods, however, ultimately have to resort to numerics for high dimensions. While numerical methods are no doubt valuable, analytic solutions should be prized higher for their simplicity and the insights they offer, as with any study in physics. Unfortunately, except for a few cases where one can exploit the special structures of the density operators [1,6,10-13], analytic results for high-dimensional problems remain rare in quantum metrology.
Here I propose a theory of quantum semiparametric estimation that can turn the problem on its head and deal with density operators with arbitrarily high dimensions and little assumed structure. The theory is especially relevant to the estimation of a parameter that can be expressed as a function of the density operator, such as the mean of an observable, the fidelity to a given pure state, the purity, or the von Neumann entropy. The density operator is assumed to come from an enormous class, its dimension can be arbitrarily high and possibly infinite, and the unknown "nuisance" parameters have a similar dimension to that of the density operator. Despite the seemingly bleak situation, the theory can yield surprisingly simple analytic results, precisely because of the absence of structure. The results here are ideally suited to scientific applications, such as quantum state characterization [14,15] and optical imaging [1,4,5,12,13], where the dimensions can be high and it is prudent to assume little prior information.
* mankei@nus.edu.sg; https://www.ece.nus.edu.sg/stfpage/tmk/
The theory set forth generalizes the deep and exquisite theory of semiparametric estimation in classical statistics [16][17][18], which has seen wide applications in fields such as biostatistics [18], econometrics [19], astrostatistics [20], and, most recently, optical superresolution [21]. By necessity, the classical theory involves infinite-dimensional spaces for continuous random variables and makes extensive use of Hilbert-space and geometric concepts. As will be seen later, the operator Hilbert space introduced by Holevo [6,22] turns out to be the right arena for the quantum case, and the geometric picture of quantum states [7,8,23] can provide helpful insights.

II. PREVIEW OF TYPICAL RESULTS
Before going into the formalism, I present some typical results of the theory to offer motivation.
Suppose that an experimenter has received N quantum objects, such as atoms, electrons, or photons, each with the same quantum state ρ. The experimenter would like to estimate a parameter β as a function of ρ. Perhaps the mean position of each electron β = tr ρX, where tr is the operator trace and X is the position operator, is of interest in electron microscopy, or the fidelity β = ⟨ψ|ρ|ψ⟩ to a target pure state |ψ⟩ is of interest in a quantum-information experiment [14]. Without any knowledge of ρ, what is the best measurement to perform for the estimation of β, and what is the fundamental limit to the precision for any measurement?
The quantum semiparametric theory provides simple answers to the above questions. Let β = tr ρY, where Y is a given observable, and assume that the estimator is required to be unbiased. The theory shows that the best measurement is simply a von Neumann measurement of the observable Y of each copy of the objects, followed by an average of the outcomes. For any measurement, the mean-square error (MSE) of the estimation has a lower bound given by

$$\mathrm{MSE} \ge \frac{1}{N}\operatorname{tr}\rho\,(Y-\beta)^2. \tag{2.1}$$

Absent any information about ρ, the separate measurements and the sample mean seem to be the most obvious procedure to perform, but it is not at all obvious that this procedure is optimal, given the infinite possibilities allowed by quantum mechanics.
More nontrivial examples include the purity β = tr ρ², for which the bound is

$$\mathrm{MSE} \ge \frac{4}{N}\operatorname{tr}\rho\,(\rho-\beta)^2, \tag{2.2}$$

and the relative entropy β = tr ρ(ln ρ − ln σ) with respect to a target state σ, for which

$$\mathrm{MSE} \ge \frac{1}{N}\operatorname{tr}\rho\,(\ln\rho-\ln\sigma-\beta)^2. \tag{2.3}$$

For these two examples, the theory in its present form does not answer the question of how one can achieve the bounds, but they still serve as quantum limits, with expressions that are simple, exact, and nontrivial; the factor of 4 in Eq. (2.2) is not a typo and rather curious. This work thus addresses a foundational question by Horodecki [15]: "What kind of information (whatever it means) can be extracted from an unknown quantum state at a small measurement cost?" While the theory is not yet able to determine the optimal measurement for a general nonlinear parameter, it at least gives an exact limit to the information extraction via a statistical notion of efficiency. It shows that quantum metrology, and quantum semiparametric estimation in particular, offers a viable attack on the question.
An extension of the above scenario is the estimation of β given a constraint on ρ. For example, suppose that the quantum state is known to achieve a fidelity of ⟨φ|ρ|φ⟩ = F with respect to another pure state |φ⟩. How may this new information affect the estimation? Write the constraint as tr ρZ = ζ, where Z is an observable and ζ is a given constant. The quantum bound for the β = tr ρY example turns out to be

$$\mathrm{MSE} \ge \frac{1}{N}\left\{\operatorname{tr}\rho\,(Y-\beta)^2 - \frac{\left[\operatorname{tr}\rho\,(Y-\beta)\circ(Z-\zeta)\right]^2}{\operatorname{tr}\rho\,(Z-\zeta)^2}\right\}, \tag{2.4}$$

where A ∘ B = (AB + BA)/2 denotes the Jordan product. The bound is reduced by the correlation between Y and Z. The theory can in fact give similar analytic results for a class of such semiparametric problems.
It must be stressed that, apart from the underlying Hilbert space and the constraint discussed above, the experimenter is assumed to know nothing about ρ, and the proposed bounds are valid regardless of the dimension of ρ. The existing method of deriving such quantum limits is to model ρ with parameters in a commensurate dimension [7,8,24], compute a quantum version of the Fisher information matrix, and then invert it. This brute-force method is not feasible for high dimensions. A new philosophy is needed.
In the next sections, I present approaches to quantum semiparametric estimation in increasing sophistication. Section III introduces the basic concepts and an approach that produces results in the most direct way. This direct approach has problems dealing with infinite-dimensional systems however, and Sec. IV shows how they can be solved via an elegant concept called parametric submodels. In the classical theory, the concept was first adumbrated by Charles Stein [25] and developed by Levit and many others [16][17][18]. Section V further develops the formalism to account for constraints, in order to produce results such as Eq. (2.4). Section VI discusses the practical problem of incoherent optical imaging [4] and summarizes existing results concerning the problem in the language of quantum semiparametrics, in order to provide a more physical context for the formalism.

A. Helstrom bound
Let

$$\mathcal{F} = \left\{\rho(\theta) : \theta \in \Theta \subseteq \mathbb{R}^{|J|}\right\} \tag{3.1}$$

be a family of density operators parametrized by θ = {θ_j : j ∈ J}. The operators are assumed to operate on a common Hilbert space H, with an orthonormal basis

$$\{|q\rangle : q \in Q\} \tag{3.2}$$

that does not depend on θ. Both J and Q are assumed to be countable and, for convenience, totally ordered. Their dimensions are expressed as |J| and |Q|, respectively. The family is assumed to be smooth enough so that ∂_j = ∂/∂θ_j can be interchanged with the operator trace tr in any operation on ρ(θ). Define a set of operators {S_j : j ∈ J} as solutions to

$$\partial\rho = \rho\circ S, \tag{3.3}$$

where A ∘ B = (AB + BA)/2 is the Jordan product. Equation (3.3) is shorthand for the system of equations

$$\partial_j\rho(\theta) = \rho(\theta)\circ S_j(\theta), \quad j \in J. \tag{3.4}$$

All functions of θ in this section are assumed to be evaluated implicitly at the same θ = φ, which is taken to be the true parameter value. {S} are called symmetric logarithmic derivatives in the quantum metrology literature, but here I call them scores in accordance with the statistics terminology [16][17][18]. I use S to denote a vector of the scores and, if the distinction is needed, {S} to denote the set of scores, also known as the tangent set. All vectors are assumed to be column vectors in this paper.
Let the parameter of interest be a scalar β(θ) ∈ R; generalization for a vector β is possible via the concept of replicating spaces [18] but tedious and not attempted here. If |J| < ∞, a quantum version of the Cramér-Rao bound due to Helstrom [1] is

$$\mathrm{MSE} \ge \mathrm{HB} = (\partial\beta)^\top K^{-1}\,\partial\beta, \tag{3.5}$$

where ⊤ denotes the matrix transpose, and ∂β and the Helstrom information matrix K have entries given by

$$(\partial\beta)_j = \partial_j\beta(\theta), \quad K_{jk} = \operatorname{tr}\rho\,(S_j\circ S_k). \tag{3.6}$$

The Helstrom bound sets a lower bound on the estimation error for any quantum measurement and any unbiased estimator [1,6-8]. The estimation of β when θ contains infinitely many unknowns is called semiparametric estimation in statistics [16][17][18], although the same methodology applies to arbitrary dimensions. If θ is partitioned into (β, η_1, η_2, ...)^⊤, then η are called nuisance parameters, but this explicit designation is not needed in this work.
B. Hilbert spaces for operators
I now follow Holevo [6,22] and introduce operator Hilbert spaces in order to generalize the Helstrom bound for semiparametric estimation. The formalism may seem daunting at first sight, but the payoff is substantial, as it simplifies proofs, treats the infinite-dimensional case rigorously, and also enables one to avoid the explicit computation of S and K⁻¹ for a large class of problems. In the following, I assume familiarity with the basic theory of Hilbert spaces and the mathematical theory of quantum mechanics; see, for example, Refs. [6,26,27].
All operators considered in this paper are self-adjoint. Consider ρ in the diagonal form ρ = Σ_j λ_j |e_j⟩⟨e_j| with λ_j > 0. The support of ρ is defined as supp(ρ) = span{|e_j⟩} ⊆ H, where span denotes the closed linear span. ρ is called full rank if supp(ρ) = H. Define the weighted inner product between two operators h_1 and h_2 as

$$\langle h_1, h_2\rangle = \operatorname{tr}\rho\,(h_1\circ h_2), \tag{3.7}$$

where A ∘ B = (AB + BA)/2 is the Jordan product, and a norm as

$$\|h\| = \sqrt{\langle h, h\rangle}, \tag{3.8}$$

not to be confused with the operator norm ‖h‖_op = sup_{|ψ⟩∈H} √⟨ψ|h²|ψ⟩ ≥ ‖h‖, where the supremum is taken over unit vectors. An operator is called bounded if ‖h‖_op < ∞ and square-summable with respect to ρ if ‖h‖ < ∞, although all operators are bounded by definition if |Q| < ∞. If A is a vector of operators and h is an operator, let ⟨A, h⟩ denote a vector with entries ⟨A, h⟩_j = ⟨A_j, h⟩. For two vectors of operators A and B, it is also convenient to use ⟨A, B^⊤⟩ to denote the matrix with entries ⟨A_j, B_k⟩, such as K = ⟨S, S^⊤⟩.
Define the Hilbert space for square-summable operators with respect to the true ρ as [6,22]

$$Y = \{h : \|h\| < \infty\}. \tag{3.9}$$

To be precise, each Hilbert-space element is an equivalence class of operators with zero distance between them, viz., {ĥ_j : ‖ĥ_j − ĥ_k‖ = 0 ∀ j, k}. The distinction between an element and its operators is important only if ρ is not full rank; I put a hat on an operator if the distinction is called for. Two important Hilbert-space elements are the identity element I and the zero element 0; sometimes I will even replace I with 1 for brevity. Define a subspace for zero-mean operators as

$$Z = \{h \in Y : \langle h, 1\rangle = 0\}, \tag{3.10}$$

and the orthocomplement of Z in Y as

$$Z^\perp = \operatorname{span}\{I\}. \tag{3.11}$$

In particular, the projection of any h ∈ Y into Z^⊥ is simply Π(h|Z^⊥) = ⟨h, 1⟩, where Π denotes the projection superoperator, and

$$\Pi(h|Z) = h - \langle h, 1\rangle. \tag{3.12}$$

The most important Hilbert space in estimation theory is the tangent space spanned by the tangent set [16][17][18], generalized here as

$$T = \operatorname{span}\{S\} \subseteq Z. \tag{3.13}$$

The condition T ⊆ Z requires the assumption K_jj = ⟨S_j, S_j⟩ < ∞ for all j; the zero-mean requirement is satisfied because ⟨S, 1⟩ = tr ∂ρ = ∂ tr ρ = 0. A useful relation for any bounded operator h is

$$\operatorname{tr}(\partial\rho)\,h = \langle S, h\rangle \tag{3.14}$$

via Ref. [6, Eq. (2.8.88)]. Denote also the orthocomplement of T in Z as

$$T^\perp = \{h \in Z : \langle S, h\rangle = 0\}.$$

Another important concept in the classical theory is the influence functions [16][17][18], which I generalize by defining the set of influence operators as

$$D = \{\delta \in Z : \langle S, \delta\rangle = \partial\beta\}. \tag{3.17}$$
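The definitions of this subsection can be checked numerically in finite dimensions. The following is a minimal sketch, assuming the inner product is ⟨h_1, h_2⟩ = tr ρ (h_1 ∘ h_2) with the Jordan product and that the projection into Z subtracts the mean ⟨h, 1⟩; the helper names (`inner`, `project_zero_mean`, and so on) and the random test matrices are illustrative choices, not part of the theory.

```python
import numpy as np

def jordan(a, b):
    """Jordan product a∘b = (ab + ba)/2."""
    return (a @ b + b @ a) / 2

def inner(rho, h1, h2):
    """Weighted inner product <h1, h2> = tr rho (h1∘h2)."""
    return np.trace(rho @ jordan(h1, h2)).real

def project_zero_mean(rho, h):
    """Projection into Z: subtract the mean <h, 1> times the identity."""
    eye = np.eye(len(h))
    return h - inner(rho, h, eye) * eye

def random_hermitian(dim, rng):
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2

def random_state(dim, rng):
    a = random_hermitian(dim, rng)
    w = a @ a  # positive semidefinite
    return w / np.trace(w).real

rng = np.random.default_rng(0)
rho = random_state(4, rng)
h = random_hermitian(4, rng)

h0 = project_zero_mean(rho, h)           # element of Z
mean_h0 = inner(rho, h0, np.eye(4))      # should vanish
norm = np.sqrt(inner(rho, h, h))         # weighted norm ||h||
op_norm = np.linalg.norm(h, 2)           # operator (spectral) norm
print(mean_h0, norm, op_norm)
```

The weighted norm never exceeds the operator norm, in line with the inequality quoted above, and the projected operator has zero mean with respect to ρ.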

C. Generalized Helstrom bound
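To model a measurement, consider a positive operator-valued measure (POVM) E on a measurable space (X, Σ_X), where Σ_X is the sigma algebra on the set X, and an estimator β̌ : X → R. A pair (E, β̌) that satisfies

$$\int \check\beta(\lambda)\,\operatorname{tr}\,dE(\lambda)\,\rho(\theta) = \beta(\theta) \quad \forall\theta \tag{3.18}$$

is called an unbiased measurement. Neither E nor β̌ should depend on θ [28]. Define the deviation operator with respect to (E, β̌) as

$$\delta = \int \check\beta(\lambda)\,dE(\lambda) - \beta. \tag{3.19}$$

It can be shown [6, Sec. 6.2] that δ ∈ D (as long as ‖δ‖ < ∞), and also that ‖δ‖² bounds the estimation error as

$$\mathrm{MSE} = \int \left[\check\beta(\lambda)-\beta\right]^2 \operatorname{tr}\,dE(\lambda)\,\rho \ge \|\delta\|^2. \tag{3.20}$$

A generalized Helstrom bound (GHB) for any unbiased measurement can then be expressed as

$$\mathrm{MSE} \ge \mathrm{GHB} = \min_{\delta\in D}\|\delta\|^2. \tag{3.21}$$

Proofs that Eq. (3.21) is equal to Eq. (3.5) if |J| < ∞ and K⁻¹ exists can be found in Refs. [8,29]. The following theorem gives a more general expression that is the cornerstone of quantum semiparametric estimation.
Theorem 1.

$$\mathrm{GHB} = \|\delta_{\rm eff}\|^2, \tag{3.22}$$

where δ_eff, henceforth called the efficient influence, is the unique element in the influence-operator set D given by

$$\delta_{\rm eff} = \Pi(\delta|T), \tag{3.23}$$

and Π(δ|T) denotes the projection of any influence operator δ ∈ D into the tangent space T.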
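Proof. The proof is similar to the classical one [17,18]. First note that, since D ⊂ Z = T ⊕ T^⊥, any δ ∈ D can always be decomposed into

$$\delta = \delta_{\rm eff} + h, \quad \delta_{\rm eff} = \Pi(\delta|T) \in T, \quad h = \Pi(\delta|T^\perp) \in T^\perp. \tag{3.24}$$

This implies ⟨S, δ_eff⟩ = ⟨S, δ − h⟩ = ⟨S, δ⟩ = ∂β, and therefore δ_eff ∈ D. Now the Pythagorean theorem gives

$$\|\delta\|^2 = \|\delta_{\rm eff}\|^2 + \|h\|^2 \ge \|\delta_{\rm eff}\|^2, \tag{3.25}$$

which means that min_{δ∈D} ‖δ‖² = ‖δ_eff‖², and also that ‖δ‖² = ‖δ_eff‖² if and only if h = δ − δ_eff = 0, implying the uniqueness of δ_eff as a minimizing element in D.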
I call an unbiased measurement efficient if it has an error that achieves the GHB, following the common statistics terminology [16][17][18]. Figure 1 illustrates all the Hilbert-space concepts involved in Theorem 1.
Before I apply the theorem to examples, I list a couple of important corollaries. The first corollary reproduces the original Helstrom bound given by Eq. (3.5) and is expected from earlier proofs in Refs. [8,29]; here I simply state explicitly that it is a special case of Theorem 1.
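Corollary 1. If |J| < ∞ and K⁻¹ exists, the GHB given by Theorem 1 is equal to the Helstrom bound HB given by Eq. (3.5).
Proof. Delegated to Appendix A.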
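The geometry of Theorem 1 and its finite-dimensional special case can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the construction of Appendix A: it takes a hypothetical two-parameter qubit family ρ(θ) = (I + θ_1 σ_x + θ_2 σ_z)/2, computes the scores in the eigenbasis of ρ, and uses the standard finite-dimensional projection δ_eff = (∂β)^⊤ K⁻¹ S, which is assumed here. The observable Y is an arbitrary choice with a σ_y component, so that δ = Y − β has a nonzero part in T^⊥.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def inner(rho, a, b):
    """<a, b> = tr rho (a∘b) with the Jordan product."""
    return np.trace(rho @ (a @ b + b @ a)).real / 2

def sld(rho, drho):
    """Score S solving drho = rho∘S, assuming a full-rank rho."""
    lam, v = np.linalg.eigh(rho)
    d = v.conj().T @ drho @ v
    return v @ (2 * d / (lam[:, None] + lam[None, :])) @ v.conj().T

# Hypothetical two-parameter family (an assumption for illustration).
theta = np.array([0.3, 0.4])
rho = (I2 + theta[0] * sx + theta[1] * sz) / 2
drho = [sx / 2, sz / 2]

Y = sx + 0.5 * sy + sz
beta = np.trace(rho @ Y).real
dbeta = np.array([np.trace(d @ Y).real for d in drho])

S = [sld(rho, d) for d in drho]
K = np.array([[inner(rho, a, b) for b in S] for a in S])

coef = np.linalg.solve(K, dbeta)
HB = dbeta @ coef                              # Helstrom bound, Eq. (3.5)
delta_eff = coef[0] * S[0] + coef[1] * S[1]    # projection of delta into T
delta = Y - beta * I2                          # an influence operator
gap = inner(rho, delta, delta) - HB            # squared norm of the T-perp part
print(HB, inner(rho, delta_eff, delta_eff), gap)
```

Here the σ_y part of δ is orthogonal to both scores, so ‖δ‖² exceeds the bound by exactly the squared norm of that part, while δ_eff still satisfies ⟨S, δ_eff⟩ = ∂β.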
Note that, unlike Eq. (3.5), Theorem 1 works with no regard for any linear dependence in {S}. This generalization is in fact indispensable to semiparametric theory, especially when the concept of parametric submodels is introduced in Sec. IV.
The second corollary, which gives a scaling of the bound with the number of object copies and is easy to prove via K −1 , requires more effort to prove if K −1 is to be avoided.
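Corollary 2. For a family of density operators that model N independent and identical quantum objects in the form of

$$\mathcal{F}^{(N)} = \left\{\rho(\theta)^{\otimes N} : \theta\in\Theta\right\}, \tag{3.26}$$

the efficient influence and the GHB are given by

$$\delta_{\rm eff}^{(N)} = \frac{U\delta_{\rm eff}^{(1)}}{N}, \quad \mathrm{GHB}^{(N)} = \frac{\mathrm{GHB}^{(1)}}{N}, \tag{3.27}$$

where δ_eff^(1) and GHB^(1) are those for the N = 1 family according to Theorem 1 and U is a map defined as

$$Uh = \sum_{n=1}^{N} I^{\otimes(n-1)}\otimes h\otimes I^{\otimes(N-n)}. \tag{3.28}$$

Proof. Delegated to Appendix B.
FIG. 1. The whole space in the picture represents Z, the space for zero-mean operators. T is the tangent space spanned by the tangent set {S}. T^⊥ is the orthocomplement, which contains elements orthogonal to all the scores. D is the set of influence operators, which all have a fixed component in T determined by ∂β. δ is an influence operator in D. The projection of δ into T gives the efficient influence δ_eff, which has the smallest norm among all the influence operators.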
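The 1/N scaling can be verified directly for small N. The following sketch assumes that U places an operator in each tensor slot in turn, U h = Σ_n I ⊗ ... ⊗ h ⊗ ... ⊗ I, and checks that U δ/N has squared norm ‖δ‖²/N with respect to ρ^⊗N for a zero-mean δ. The random qubit state and observable are illustrative only.

```python
import numpy as np
from functools import reduce

def inner(rho, a, b):
    """<a, b> = tr rho (a∘b) with the Jordan product."""
    return np.trace(rho @ (a @ b + b @ a)).real / 2

def kron_all(ops):
    return reduce(np.kron, ops)

def U(h, n_copies):
    """Sum over slots of I⊗...⊗h⊗...⊗I (assumed form of the map U)."""
    eye = np.eye(len(h), dtype=complex)
    terms = []
    for n in range(n_copies):
        ops = [eye] * n_copies
        ops[n] = h
        terms.append(kron_all(ops))
    return sum(terms)

rng = np.random.default_rng(1)
a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = a @ a.conj().T
rho = rho / np.trace(rho).real
b = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Y = (b + b.conj().T) / 2
delta = Y - np.trace(rho @ Y).real * np.eye(2)   # zero-mean operator

N = 3
rho_N = kron_all([rho] * N)
delta_N = U(delta, N) / N
single = inner(rho, delta, delta)
multi = inner(rho_N, delta_N, delta_N)
print(single / N, multi)
```

The cross terms between different slots vanish because δ has zero mean, which is why only N diagonal terms survive and the squared norm scales as 1/N.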

D. Influence operator via a functional gradient
Theorem 1 is useful if an influence operator δ ∈ D can be found and Π(δ|T) is tractable. To be specific, assume that the parameter of interest is a functional β[ρ], and a derivative of β[ρ] in the "direction" of an operator h can be defined as

$$D_h\beta[\rho] = \lim_{\epsilon\to 0}\frac{\beta[\rho+\epsilon h]-\beta[\rho]}{\epsilon}. \tag{3.29}$$

Assume that the derivative can be expressed as

$$D_h\beta[\rho] = \operatorname{tr} h\tilde\beta \tag{3.30}$$

in terms of a β̃ ∈ Y, hereafter called a gradient of β[ρ]. Any ordinary partial derivative of β becomes

$$\partial_j\beta = D_{\partial_j\rho}\beta[\rho] = \operatorname{tr}(\partial_j\rho)\tilde\beta = \langle S_j, \tilde\beta\rangle.$$

Projecting the gradient into Z then gives an influence operator, viz.,

$$\delta = \Pi(\tilde\beta|Z) = \tilde\beta - \langle\tilde\beta, 1\rangle, \tag{3.33}$$

as it is straightforward to check that ⟨δ, 1⟩ = 0 and ⟨S, δ⟩ = ∂β. The top flowchart in Fig. 2 illustrates the steps to obtain δ from β[ρ]. β̃, δ, and δ_eff are all gradients that satisfy Eq. (3.30); the difference lies in the set of directions to which each is restricted. δ, for instance, is restricted to Z and orthogonal to Z^⊥, while δ_eff is restricted to T and orthogonal to T^⊥ [30].
Now consider some examples. The first is β = tr ρY for a given (i.e., θ-independent) observable Y, which leads to

$$\tilde\beta = Y, \quad \delta = Y - \beta. \tag{3.34}$$

Y can be a physical observable, such as the position of a quantum particle, or a pure-state projection |ψ⟩⟨ψ| that makes β = ⟨ψ|ρ|ψ⟩ the fidelity. The second example is the purity β = tr ρ², which leads to

$$\tilde\beta = 2\rho, \quad \delta = 2(\rho-\beta). \tag{3.35}$$

The final example is the relative entropy β = tr ρ(ln ρ − ln σ) [7,31], where ln ρ = Σ_j ln λ_j |e_j⟩⟨e_j| and σ is a given density operator with supp(σ) ⊇ supp(ρ). The differentiability of β is not a trivial question when |Q| = ∞, but for |Q| < ∞ it can be done to give

$$\tilde\beta = \ln\rho - \ln\sigma, \quad \delta = \ln\rho - \ln\sigma - \beta \tag{3.36}$$

[7, Theorem 6.3]. The von Neumann entropy is a simple variation of this example.
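The gradients for the purity and the relative entropy can be sanity-checked against finite differences. The sketch below is only a numerical illustration in dimension 3: it assumes the gradients 2ρ and ln ρ − ln σ, restricts the direction h to traceless Hermitian operators (as is the case for any ∂_j ρ), and uses full-rank random states so that the matrix logarithm is well defined. All function names are illustrative.

```python
import numpy as np

def fun_h(a, f):
    """Apply a scalar function f to a Hermitian matrix via its eigenvalues."""
    lam, v = np.linalg.eigh(a)
    return (v * f(lam)) @ v.conj().T

def rand_herm(dim, rng):
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2

def rand_state(dim, rng):
    a = rand_herm(dim, rng)
    w = a @ a
    # Mix with the maximally mixed state to keep eigenvalues away from zero.
    return 0.5 * w / np.trace(w).real + 0.5 * np.eye(dim) / dim

def purity(rho):
    return np.trace(rho @ rho).real

def rel_entropy(rho, sigma):
    return np.trace(rho @ (fun_h(rho, np.log) - fun_h(sigma, np.log))).real

rng = np.random.default_rng(2)
dim = 3
rho, sigma = rand_state(dim, rng), rand_state(dim, rng)
h = rand_herm(dim, rng)
h = h - np.trace(h).real / dim * np.eye(dim)   # traceless direction

eps = 1e-5
fd_purity = (purity(rho + eps * h) - purity(rho - eps * h)) / (2 * eps)
fd_relent = (rel_entropy(rho + eps * h, sigma)
             - rel_entropy(rho - eps * h, sigma)) / (2 * eps)

grad_purity = np.trace(h @ (2 * rho)).real
grad_relent = np.trace(h @ (fun_h(rho, np.log) - fun_h(sigma, np.log))).real
print(fd_purity, grad_purity, fd_relent, grad_relent)
```

For directions that are not traceless, the relative-entropy derivative picks up an extra tr h term, which is removed anyway by the projection into Z.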

E. Projection into the tangent space
The next step is Π(δ|T). If the family of density operators is large enough, T can fill the entire Z and the projection becomes trivial. To be specific, consider the orthonormal basis given by Eq. (3.2). The most general parametrization of ρ is [24] and a special entry θ_0r is removed from the parameters and set as θ_0r = 1 − Σ_{q≠r} θ_0q, such that tr ρ = Σ_q θ_0q = 1 and |J| = |Q|² − 1. ∂ρ is then given by The next theorem is a key step in deriving simple analytic results.
Proof. Consider the solution to ⟨S, h⟩ = 0 for an h ∈ Z. All operators are bounded if |Q| < ∞. I can then use Eqs. (3.14) and (3.41) to obtain

$$\langle S, h\rangle = \operatorname{tr}(\partial\rho)\,\hat h = 0, \tag{3.44}$$

where ĥ is any operator in the equivalence class of h. Thus all the diagonal entries of ĥ are equal to ⟨r|ĥ|r⟩, and all the off-diagonal entries are zero. In other words, ĥ = ⟨r|ĥ|r⟩ Î, where Î is the identity operator. But h ∈ Z also means that tr ρĥ = ⟨r|ĥ|r⟩ = 0, resulting in ĥ = 0 as the only solution. Hence T^⊥ = {0} contains only the zero element, and T = Z.
F_0 implies that the experimenter knows nothing about the density operator, apart from the Hilbert space H on which it operates. If ρ has a high dimension, S would be intractable, let alone K⁻¹, but Theorems 1 and 2 turn the problem into a trivial exercise once an influence operator has been found, since a δ ∈ D ⊂ Z is already in Z = T and hence efficient. Corollary 2 can then be used to extend the result for N copies. For β = tr ρY, Eq. (3.34) leads to

$$\mathrm{GHB}^{(N)} = \frac{\|\delta\|^2}{N} = \frac{1}{N}\operatorname{tr}\rho\,(Y-\beta)^2. \tag{3.45}$$

This implies that a von Neumann measurement of Y of each copy and taking the sample mean of the outcomes are already efficient; no other measurement can do better in terms of unbiased estimation. For β = tr ρ², Eq. (3.35) leads to

$$\mathrm{GHB}^{(N)} = \frac{4}{N}\operatorname{tr}\rho\,(\rho-\beta)^2, \tag{3.46}$$

and for β = tr ρ(ln ρ − ln σ), Eq. (3.36) leads to

$$\mathrm{GHB}^{(N)} = \frac{1}{N}\operatorname{tr}\rho\,(\ln\rho-\ln\sigma-\beta)^2. \tag{3.47}$$

Intriguingly, this expression coincides with the information variance that has found uses in other contexts of quantum information theory, such as quantum hypothesis testing [32]. While Eqs. (3.46) and (3.47) for the last two examples serve as fundamental limits, their attainability is an open question, since the δ in each case does not seem to correspond to any valid measurement. Whether adaptive measurements can attain efficiency in an asymptotic N → ∞ limit [7,8,24,33] is not easy to ascertain for semiparametric estimation.
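For a single qubit, the claim that the brute-force Helstrom bound of the full family collapses to ‖δ‖² can be checked directly. The sketch below assumes a Bloch-vector parametrization ρ = (I + θ·σ)/2 with three parameters, which is a reparametrization of the general family for |Q| = 2 and not the parametrization used in the text; the observable Y and the parameter values are arbitrary illustrative choices. It compares (∂β)^⊤ K⁻¹ ∂β with tr ρ (Y − β)² for the mean and with 4 tr ρ (ρ − β)² for the purity, all for N = 1.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
paulis = [sx, sy, sz]

def inner(rho, a, b):
    """<a, b> = tr rho (a∘b) with the Jordan product."""
    return np.trace(rho @ (a @ b + b @ a)).real / 2

def sld(rho, drho):
    """Score S solving drho = rho∘S, assuming a full-rank rho."""
    lam, v = np.linalg.eigh(rho)
    d = v.conj().T @ drho @ v
    return v @ (2 * d / (lam[:, None] + lam[None, :])) @ v.conj().T

theta = np.array([0.2, -0.3, 0.5])            # Bloch vector inside the unit ball
rho = (I2 + sum(t * s for t, s in zip(theta, paulis))) / 2
drho = [s / 2 for s in paulis]
S = [sld(rho, d) for d in drho]
K = np.array([[inner(rho, a, b) for b in S] for a in S])

def helstrom(gradient):
    """Brute-force bound (d beta)^T K^-1 (d beta), with d_j beta = tr (d_j rho) gradient."""
    dbeta = np.array([np.trace(d @ gradient).real for d in drho])
    return dbeta @ np.linalg.solve(K, dbeta)

Y = 0.7 * sx - 0.2 * sy + 1.1 * sz + 0.4 * I2
beta_mean = np.trace(rho @ Y).real
dY = Y - beta_mean * I2
hb_mean = helstrom(Y)
closed_mean = np.trace(rho @ dY @ dY).real

beta_pur = np.trace(rho @ rho).real
dP = rho - beta_pur * I2
hb_purity = helstrom(2 * rho)
closed_purity = 4 * np.trace(rho @ dP @ dP).real
print(hb_mean, closed_mean, hb_purity, closed_purity)
```

The matrix inversion is avoidable here precisely because the tangent space fills Z; in higher dimensions the brute-force route requires |Q|² − 1 scores, while the closed forms need only ρ and the gradient.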

IV. PARAMETRIC SUBMODELS
The direct approach in Sec. III has a few shortcomings. It requires a few assumptions that may be difficult to check if the dimensions |J| and |Q| are high, such as the smoothness of the density-operator family and the finiteness of the Helstrom information. In particular, the F_0 family given by Eq. (3.37) becomes unwieldy if |Q| = ∞ and thus |J| = ∞. The proof of the important Theorem 2 also breaks down for |Q| = ∞ as it assumes bounded operators. These problems can be alleviated by the beautiful concept of parametric submodels [16][17][18][25].
Let

$$G = \{\rho_g : g \in \mathcal{G}\} \tag{4.1}$$

be a "mother" density-operator family. The density operators are still assumed to operate on a common separable Hilbert space H. Denote the true density operator in the family as ρ ∈ G. A parametric submodel F_σ is defined as any subset of G that contains the true ρ and has the parametric form of Eq. (3.1). To wit,

$$F_\sigma = \left\{\sigma(\theta) : \theta\in\Theta\subseteq\mathbb{R}^p\right\} \subseteq G, \quad \sigma(\phi) = \rho, \tag{4.2}$$

where p denotes the dimension of the parameter and φ denotes the parameter value at which σ(φ) = ρ is the truth; both may be specific to the submodel. The mother family is assumed to be completable with parametric submodels, such that it can be expressed as

$$G = \bigcup_\sigma F_\sigma. \tag{4.3}$$

In the language of geometry [7,8,23], each F_σ may be regarded as a submanifold of G. For example, if p = 1, then F_σ is simply a curve in G, and all the curves are required to intersect at ρ_g = ρ. Figure 3 illustrates the concept.
FIG. 3. The space represents G, a mother family of density operators. The true density operator is denoted as ρ. Parametric submodels are submanifolds, represented by curves, contained in G that intersect at ρ. Each score S^σ is a tangent vector that quantifies the "velocity" of a density-operator trajectory in a certain direction.
Each submodel F_σ is assumed to be smooth enough for scores to be defined in the same way as before by

$$\partial\sigma = \sigma\circ S^\sigma, \tag{4.4}$$

which, to be specific, denotes a system of p equations given by

$$\partial_j\sigma(\theta)\big|_{\theta=\phi} = \rho\circ S^\sigma_j, \quad j = 1,\dots,p. \tag{4.5}$$

Everything is evaluated at the truth ρ, so the scores across all submodels in fact live in the same Hilbert space Z with respect to ρ. Define the tangent set as the set of all the scores from all submodels, viz.,

$$\{S\} = \bigcup_\sigma \{S^\sigma\}, \tag{4.6}$$

and the tangent space as the span of the set, viz.,

$$T = \operatorname{span}\{S\}. \tag{4.7}$$

An influence operator is now defined as any operator that satisfies the unbiasedness conditions for all submodels with respect to {S}. The conditions can be expressed as

$$\langle S^\sigma_j, \delta\rangle = (\partial_j\beta)_{\theta=\phi} \quad \forall j, \sigma, \tag{4.8}$$

where (∂β)_{θ=φ} is specific to each submodel, or more compactly as

$$\langle S, \delta\rangle = \partial\beta. \tag{4.9}$$

The influence-operator set D then has the same expression as Eq. (3.17). An unbiased measurement still satisfies Eq. (4.8) or (4.9) by the generic arguments in Ref. [6, Sec. 6.2], which apply to any submodel, so the deviation operator given by Eq. (3.19) is still in D, and ‖δ‖² is a lower bound on its estimation error according to Eq. (3.20). Theorem 1 can now be extended for the mother family.
Theorem 3. The GHB for the mother family G is given by

$$\mathrm{GHB} = \|\delta_{\rm eff}\|^2, \tag{4.10}$$

where the efficient influence δ_eff is the unique element in the influence-operator set D given by

$$\delta_{\rm eff} = \Pi(\delta|T), \tag{4.11}$$

δ is any influence operator in D, and T is the tangent space spanned by the scores of all parametric submodels of G.
Proof. The proof is identical to that of Theorem 1 if one takes {S} to be the tangent set containing the scores of all parametric submodels.
Corollary 2 can also be generalized in an almost identical way, although the proof requires more careful thought.

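Corollary 3. For a family of density operators that model N independent and identical quantum objects in the form of

$$G^{(N)} = \left\{\rho_g^{\otimes N} : g\in\mathcal{G}\right\}, \tag{4.12}$$

the efficient influence and the GHB are given by

$$\delta_{\rm eff}^{(N)} = \frac{U\delta_{\rm eff}^{(1)}}{N}, \quad \mathrm{GHB}^{(N)} = \frac{\mathrm{GHB}^{(1)}}{N}, \tag{4.13}$$

where δ_eff^(1) and GHB^(1) are those for the N = 1 family according to Theorem 3 and U is the map given by Eq. (3.28).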
Proof. Delegated to Appendix C.
Before I can generalize Theorem 2 for |Q| = ∞, I need to be mindful of the unbounded operators in Z. The good news is that they are well defined as limits of bounded-operator sequences in Y, thanks to Holevo [6,22]; just a minor modification is needed to make his result work for Z.
To be precise, I call a Hilbert-space element bounded and denote it by ‖h‖_op < ∞ if its equivalence class contains a bounded operator ĥ. Denote the set of all bounded elements in Z as

$$B = \{h \in Z : \|h\|_{\rm op} < \infty\}, \tag{4.14}$$

and the closure of B as B̄. If |Q| < ∞, B = B̄ = Z since all operators are bounded in the finite-dimensional case, but if |Q| = ∞, B ⊂ Z is a strict subset. A useful lemma is as follows.
Lemma 1. B̄ = Z.
Proof. Delegated to Appendix D.
With the concept of parametric submodels and Lemma 1, I can finally generalize Theorem 2 for infinite-dimensional quantum systems. This is also a more precise generalization of a classic result in semiparametric theory [17, Example 1 in Sec. 3.2].
Theorem 4. T = Z for G_0, defined as the family of arbitrary density operators.
Proof. Take any h ∈ B ⊂ Z and a bounded operator ĥ from its equivalence class. Construct a one-parameter exponential family as [7,8]

$$\sigma(\theta) = \frac{\kappa(\theta)}{\operatorname{tr}\kappa(\theta)}, \quad \kappa(\theta) = \exp(\theta\hat h/2)\,\rho\,\exp(\theta\hat h/2), \tag{4.15}$$

where θ ∈ R and the truth is at σ(0) = ρ. As ĥ is bounded, exp(θĥ/2) is bounded and strictly positive. As ρ is nonnegative and unit-trace, κ(θ) is nonnegative and trace-class [6, Theorem 2.7.2]. Moreover, tr κ(θ) satisfies the properties

$$\infty > \operatorname{tr}\kappa(\theta) = \operatorname{tr}\rho\exp(\theta\hat h) > 0, \tag{4.16}$$

because κ(θ) is trace-class and exp(θĥ) is strictly positive. Hence σ(θ) is a valid density operator at any θ. Since G_0 contains arbitrary density operators, the exponential family is a parametric submodel of G_0. Differentiating Eq. (4.15) at θ = 0 and using tr ρĥ = ⟨h, 1⟩ = 0 give

$$\partial\sigma(\theta)\big|_{\theta=0} = \rho\circ\hat h, \tag{4.17}$$

so the score for this model can be taken as S^σ = h. Define a submodel in the same way for every h ∈ B, such that all of the B elements are in the tangent set {S}, leading to B ⊆ {S} ⊆ T. As T is closed, the limit points of B must also be in T, and B̄ ⊆ T. Lemma 1 then gives

$$Z = \bar B \subseteq T. \tag{4.18}$$

In view of the fact T ⊆ Z, the theorem is proved.
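The construction in the proof can be illustrated in finite dimensions, where every operator is bounded. The sketch below builds the exponential family of Eq. (4.15) for a randomly chosen zero-mean Hermitian ĥ, confirms that σ(θ) stays a valid density matrix away from θ = 0, and compares a finite-difference derivative at θ = 0 with the Jordan product ρ ∘ ĥ, so that ĥ plays the role of the score. The dimension, the random seed, and the helper names are illustrative assumptions.

```python
import numpy as np

def fun_h(a, f):
    """Apply a scalar function f to a Hermitian matrix via its eigenvalues."""
    lam, v = np.linalg.eigh(a)
    return (v * f(lam)) @ v.conj().T

def rand_herm(dim, rng):
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2

def sigma(theta, rho, h):
    """Exponential family: kappa = exp(theta h/2) rho exp(theta h/2), normalized."""
    g = fun_h(theta * h / 2, np.exp)
    kappa = g @ rho @ g
    return kappa / np.trace(kappa).real

rng = np.random.default_rng(3)
dim = 3
w = rand_herm(dim, rng)
rho = w @ w
rho = rho / np.trace(rho).real                      # true state
h = rand_herm(dim, rng)
h = h - np.trace(rho @ h).real * np.eye(dim)        # zero mean: tr rho h = 0

eps = 1e-5
dsigma = (sigma(eps, rho, h) - sigma(-eps, rho, h)) / (2 * eps)
target = (rho @ h + h @ rho) / 2                    # rho∘h
err = np.max(np.abs(dsigma - target))

s = sigma(0.7, rho, h)                              # a point away from the truth
min_eig = np.linalg.eigvalsh(s).min()
print(err, np.trace(s).real, min_eig)
```

Repeating the same check for any other zero-mean ĥ gives another curve through ρ with score ĥ, which is the numerical counterpart of the statement that every bounded element of Z is in the tangent set.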
A comparison of the proofs of Theorems 2 and 4 shows how the parametric-submodel concept helps. Instead of dealing with one large family such as Eq. (3.37), here one exploits the freedom offered by G_0 to specify many ad hoc and elementary submodels. Each submodel in the proof cannot be simpler: the exponential family is simply a type of geodesic through ρ in density-operator space [7]. An enormous number of such submodels are introduced, one for each B element in the proof, leading to an extremely overcomplete tangent set. But that presents no trouble for the Hilbert-space formalism; only the tangent space matters at the end. Figure 4 illustrates the idea.
By virtue of Theorem 4, an influence operator δ ∈ D ⊂ Z = T found for a parameter of interest is the efficient one for G_0. The examples in Secs. III D and III E work for G_0 in the same way they work for F_0. If β is given by β[ρ_g], an influence operator that satisfies Eq. (4.9) can be found via a gradient of β[ρ_g], as shown in Sec. III D and Fig. 2. In particular, the influence operators given by Eqs. (3.34)-(3.36) and the bounds given by Eqs. (3.45)-(3.47) for the various examples should still hold for G_0, although the entropy example will require a more rigorous treatment when |Q| = ∞ [31].
FIG. 4. For any h ∈ B, one can associate with it an exponential family (a straight line in the density-operator space) that passes through ρ. Since G_0 contains arbitrary density operators, every line must be contained in G_0. It follows that each line is a parametric submodel for G_0, and each h should be put in the tangent set. The dots represent the fact that the proof involves lines in all directions and, on each line, scores with all possible norms.

A. Antiscore operators
To deal with families less general than G 0 , consider families of the form  (5.2) then each operator given by Thus {R} are orthogonal to the tangent set {S} and span{R} must be a subset of T ⊥ . I call R the antiscore operators, as the following theorem shows that they span T ⊥ in the same way the scores span T .
The use of f(u) follows the classical version in Ref. [17], as plotted in Fig. 5 [34]. Only its local properties around u = 0 are essential to the tangent space. An adjustable operator g(θ) is included in the submodel to make σ(θ) satisfy the constraint away from ρ. Figure 6 further illustrates the idea of the proof.
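Given an influence operator δ, such as those derived in Sec. III D, the efficient influence and the GHB can be computed in terms of T^⊥ instead of T via

$$\delta_{\rm eff} = \delta - \Pi(\delta|T^\perp), \quad \mathrm{GHB} = \|\delta\|^2 - \|\Pi(\delta|T^\perp)\|^2.$$

The same projection formula that gives δ_eff in Appendix A can be adapted to give

$$\Pi(\delta|T^\perp) = \langle R, \delta\rangle^\top\langle R, R^\top\rangle^{-1}R, \quad \|\Pi(\delta|T^\perp)\|^2 = \langle R, \delta\rangle^\top\langle R, R^\top\rangle^{-1}\langle R, \delta\rangle,$$

if the inverse ⟨R, R^⊤⟩⁻¹ exists. The steps are also illustrated in Fig. 2, and R can be computed analytically for linear constraints, the purity constraint, and the entropy constraint by following the same type of calculations shown in Eqs. (3.34)-(3.36). For example, the vectoral linear constraint γ[ρ] = tr ρZ − ζ = 0 with respect to a vector of operators Z and a vectoral constant ζ leads to

$$R = Z - \zeta,$$

⟨R, δ⟩ becomes a vector of correlations of δ to Z, and ⟨R, R^⊤⟩ becomes the covariance matrix for Z. Equation (2.4) is a special example of the constrained GHB if β = tr ρY and Z is just one operator.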
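For a single linear constraint the constrained bound reduces to a variance minus a squared correlation, which is easy to evaluate numerically. The sketch below assumes one antiscore R = Z − ζ with ζ = tr ρZ, an influence operator δ = Y − β, and the single-copy bound ‖δ‖² − ⟨R, δ⟩²/⟨R, R⟩, i.e., the N = 1 form of Eq. (2.4); the random ρ, Y, and Z are illustrative only.

```python
import numpy as np

def inner(rho, a, b):
    """<a, b> = tr rho (a∘b) with the Jordan product."""
    return np.trace(rho @ (a @ b + b @ a)).real / 2

def rand_herm(dim, rng):
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2

rng = np.random.default_rng(4)
dim = 4
w = rand_herm(dim, rng)
rho = w @ w
rho = rho / np.trace(rho).real
Y, Z = rand_herm(dim, rng), rand_herm(dim, rng)
eye = np.eye(dim)

beta = np.trace(rho @ Y).real        # parameter of interest
zeta = np.trace(rho @ Z).real        # constraint value met by the true state
delta = Y - beta * eye               # influence operator for the unconstrained family
R = Z - zeta * eye                   # antiscore for the linear constraint

unconstrained = inner(rho, delta, delta)
overlap = inner(rho, R, delta)       # correlation between Y and Z
constrained = unconstrained - overlap ** 2 / inner(rho, R, R)

# Efficient influence: remove the component of delta along R.
delta_eff = delta - overlap / inner(rho, R, R) * R
print(unconstrained, constrained)
```

The reduction vanishes when Y and Z are uncorrelated in the state ρ, and the constrained bound drops to zero when Y is an affine function of Z, since the constraint then fixes β.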

B. Philosophy
The semiparametric philosophy is the polar opposite of the usual approach to quantum estimation. In the usual bottom-up approach, one assumes a small family of density operators with a few parameters and computes ‖Π(δ|T)‖², which is determined by the overlap between δ and the scores S. Here, one starts with a family so large that the tangent space cannot be bigger, computes ‖δ‖² for an amenable δ, and then reduces it by ‖Π(δ|T^⊥)‖², which is determined by the overlap between δ and the antiscores R, as illustrated by Fig. 7. The complexity of the problem thus depends on the size of the family, and the essential insight of this work is that the problem can become simple again when the size is large enough. Of course, if the dimension of T^⊥ is large, the semiparametric approach may also suffer from the curse of dimensionality. The medium-size families that have a large T as well as a large T^⊥ are the most difficult to deal with, as they may be impregnable from either end.
FIG. 7. An illustration of the conventional bottom-up approach to quantum estimation and the top-down semiparametric approach, as discussed in Sec. V B.

C. Looser bounds
It may often be the case that, despite one's best efforts, the exact δ_eff for a problem remains intractable. Then a standard strategy in statistics and quantum metrology is to sandwich ‖δ_eff‖² between upper and lower bounds. ‖δ‖² is an obvious upper bound and can be obtained from the gradient method in Sec. III D if β can be expressed as a functional β[ρ]. Another way is to use Eq. (3.20) if an unbiased measurement and its error are known. The evaluation of lower bounds, on the other hand, can be facilitated by the following proposition.
Proposition 1. Let V ⊆ T be a closed subspace of T and V^⊥ be the orthocomplement of V in Z. Then, for any influence operator δ ∈ D,

$$\|\delta_{\rm eff}\|^2 \ge \|\Pi(\delta|V)\|^2 = \|\delta\|^2 - \|\Pi(\delta|V^\perp)\|^2. \tag{5.23}$$

In particular, if V is taken as the tangent space for a particular parametric submodel F_σ, then ‖Π(δ|V)‖² is the GHB for that submodel.
Proof. Delegated to Appendix E.
A tight lower bound on ‖δ_eff‖² can be sought by devising a submodel that is as unfavorable to the estimation of β as possible. Another approach is to devise an overconstrained model with V^⊥ ⊇ T^⊥ and evaluate a lower bound on ‖δ_eff‖² from the top by overshooting, as illustrated by Fig. 8.
FIG. 8. One can obtain a lower bound on ‖δ_eff‖² either by undershooting from the bottom via a more amenable subspace V ⊆ T, or overshooting from the top via an overconstrained model with V^⊥ ⊇ T^⊥.

VI. INCOHERENT OPTICAL IMAGING
A. The mother model
I now apply the semiparametric formalism to the problem of incoherent optical imaging and summarize existing results concerning the problem [4] under a unified treatment. While this section presents essentially no new results, the goal is to give the semiparametric theory a more concrete connection to reality, and also to inspire new insights for future studies of related problems.
The basic setup of an imaging system is depicted in Fig. 9. The object is assumed to emit spatially incoherent light at an optical frequency. For simplicity, the imaging system is assumed to be one-dimensional, paraxial, and diffractionlimited. A model of the photons on the image plane is [4,13,35] G (N ) = ρ ⊗N P : P ∈ P , (6.1) 2) where N is the number of detected photons [36], P is the unknown source distribution that is modeled as a probability measure, X is the object-plane coordinate, ψ(x) is the pointspread function of the imaging system, x is the image-plane coordinate normalized with respect to the magnification factor [37], and |x is the Dirac position ket that satisfies x|x ′ = δ(x − x ′ ). X and x are further assumed to be normalized with respect to the width of ψ(x) so that they are dimensionless. ψ(x) is assumed here to be Various generalizations can be found in Refs. [4,13,21,35] and references therein. Besides imaging, the model can also be used to describe a quantum particle under random displacements [11,38]. The problem is semiparametric if no prior assumption is made about the source distribution, viz., P = {all probability measures on (R, Σ R )} , (6.5) and the parameter of interest is a functional of P , such as the object moment β µ [P ] = X µ dP (X), (6.6) where µ ∈ N 1 denotes the order of the moment of interest. The errors and their bounds are all functionals of the true distribution P , and I will focus on their values for subdiffraction distributions, which are defined as those with a width ∆ around X = 0 much smaller than the point-spread-function width, or in other words ∆ ≪ 1 [4].

B. Semiparametric measurements and estimators
Two different unbiased measurements for semiparametric moment estimation are known [21]; both are separable measurements and sample means in the form of Eqs. (6.7) and (6.8) [39], with outcomes $\lambda_n \in \mathcal{X}$.

The first measurement is direct imaging, which measures the intensity on the image plane and is equivalent to the projection of each photon in the position basis, as in Eq. (6.9). An unbiased semiparametric estimator is given by the sample mean of the function in Eqs. (6.10) and (6.11), with the indicator function

$$1_{\text{proposition}} = \begin{cases} 1, & \text{if proposition is true}, \\ 0, & \text{otherwise}, \end{cases} \tag{6.12}$$

and the error is on the order given by Eq. (6.13), where $O(1)$ denotes a prefactor that does not scale with $\Delta$ in the first order.

The second measurement is the so-called spatial-mode demultiplexing or SPADE [3,4,13,21,35], which demultiplexes the image-plane light in the Hermite-Gaussian basis given by Eqs. (6.14) and (6.15), where $\mathrm{He}_q(x)$ is a Hermite polynomial [40]. For the estimation of an even moment with $\mu = 2j$, the POVM for each photon is given by Eq. (6.16), an unbiased semiparametric estimator is given by the sample mean of the quantity in Eq. (6.17), and the error is on the order given by Eq. (6.18), which is much lower than that of direct imaging in the subdiffraction regime for the second and higher moments. For the estimation of odd moments with SPADE, only approximate results have been obtained so far [12,35,41,42] and are not elaborated here.

Both estimators are efficient for their respective measurements in the classical sense [21]. In the quantum case, the question is whether SPADE is efficient or there exist even better measurements. Computing the GHB, or at least bounding it, would answer the question and establish the fundamental quantum efficiency.
The deviation operator for either measurement defined by Eq. (3.19) is an influence operator. The existence of two different influence operators implies that the tangent space T ⊂ Z is a strict subset of Z for this problem, since any influence operator would be efficient if T = Z and Theorem 3 states that the efficient influence must be unique. This fact means that the tangent space is not trivial and the problem is more challenging, but at the same time more interesting.
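As a rough numerical illustration of a direct-imaging sample mean, the following sketch simulates photon positions and estimates the first moment $\beta_1$. It rests on assumptions that are not spelled out in the text above: the point-spread function is taken to be Gaussian with $|\psi(x)|^2$ equal to the standard normal density, $P$ is a discrete distribution, and each detected position is $x = X + z$ with $X$ drawn from $P$ and $z$ standard normal. Under this assumed model the variance of the sample mean is $(1 + \mathrm{Var}\,X)/N$, whose prefactor does not vanish as $\Delta \to 0$. All function and variable names are hypothetical.

```python
import random

def sample_positions(points, weights, n_photons, rng):
    """Draw image-plane positions x = X + z with X ~ P (discrete), z ~ N(0, 1)."""
    sources = rng.choices(points, weights=weights, k=n_photons)
    return [X + rng.gauss(0.0, 1.0) for X in sources]

def first_moment_estimate(positions):
    """Sample mean of detected positions: an unbiased estimate of beta_1
    under the assumed zero-mean Gaussian point-spread function."""
    return sum(positions) / len(positions)

rng = random.Random(0)
points = [-0.1, 0.05, 0.2]    # hypothetical subdiffraction source positions
weights = [0.5, 0.3, 0.2]     # hypothetical probabilities (sum to 1)
n_photons, n_trials = 1000, 300

# True first moment and the variance predicted by the assumed model.
beta1 = sum(w * X for X, w in zip(points, weights))
var_X = sum(w * X**2 for X, w in zip(points, weights)) - beta1**2
predicted_mse = (1.0 + var_X) / n_photons

# Repeat the experiment to estimate bias and mean-square error empirically.
estimates = [
    first_moment_estimate(sample_positions(points, weights, n_photons, rng))
    for _ in range(n_trials)
]
bias = sum(estimates) / n_trials - beta1
empirical_mse = sum((e - beta1) ** 2 for e in estimates) / n_trials

print(f"true beta_1 = {beta1:.4f}, empirical bias = {bias:+.4f}")
print(f"empirical MSE = {empirical_mse:.3e}, predicted = {predicted_mse:.3e}")
```

The sketch covers only $\mu = 1$; it is not meant to reproduce the general estimators of Ref. [21].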

C. Lower bounds via parametric submodels
Both Eqs. (6.13) and (6.18) are upper bounds on the GHB. By virtue of Proposition 1, all earlier quantum lower bounds derived for incoherent imaging via parametric models are in fact lower bounds on the GHB for the mother family given by Eq. (6.1), with the true $\rho$ being evaluated at certain special cases of $P$. For example, by assuming that the object consists of two equally bright point sources, as in Eq. (6.20), where $\theta$ is the separation between the two sources, Ref. [3] computes the per-photon Helstrom information for this $\theta$. The two-point model is indeed a parametric submodel of the mother family for a special case of $P$, so Proposition 1 leads to a lower bound on the GHB for $\beta_{2j} = (\theta/2)^{2j}$ that matches the SPADE performance for the second moment ($\mu = 2j = 2$) in order of magnitude but is much lower for higher moments. Reference [43] has computed the Helstrom bound for two point sources with unequal brightnesses, which may also be used to produce tighter submodel bounds.

A more elaborate submodel is the $M$-point model, in which $\theta_{0r} = 1 - \sum_{m \neq r} \theta_{0m}$. The exact Helstrom bound is difficult to compute analytically for large $M$ [44], but Ref. [4] uses an extended-convexity technique [11,45] to bound it as in Eq. (6.27). This shows that direct imaging is near-efficient for the first moment. For the second moment, a more careful calculation shows that the error of SPADE in fact matches Eq. (6.27) exactly [21]. For the higher moments, however, Eq. (6.27) remains much lower than that achievable by SPADE.

The latest attempt at deriving a tight bound assumes the formal model of Eq. (6.28) [13], where $k$ is the canonical momentum operator and $\theta_\mu = \int X^\mu \, dP(X) = \beta_\mu$. By considering this as a one-parameter model with the parameter being $\theta_\mu$ for a given $\mu$ and all the other parameters being fixed, Ref. [13] finds a lower bound on the Helstrom bound via a purification technique [46], on the order given by Eq. (6.29), which leads to a corresponding lower bound on the GHB. This lower bound does match the performance of SPADE in order of magnitude, but it does not have a simple closed-form expression, and the question of whether SPADE is exactly efficient for moments higher than the second remains open.

Besides incoherent imaging, it is also possible to generalize the model for coherent or partially coherent imaging with classical or nonclassical light, or to account for other sources of noise, such as atmospheric turbulence. The semiparametric formalism provides the appropriate tools to attack such problems when little prior assumption about the object is desired.
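For a concrete check of the relation $\beta_{2j} = (\theta/2)^{2j}$ used for the two-point submodel, the sketch below evaluates the object moments of Eq. (6.6) for a discrete distribution. It assumes that the two equally bright sources sit at $X = \pm\theta/2$, which is inferred from the stated value of $\beta_{2j}$ rather than taken from a displayed equation. The function names are hypothetical.

```python
def object_moment(points, weights, mu):
    """Moment beta_mu[P] = sum_k w_k * X_k**mu of a discrete distribution P."""
    return sum(w * X**mu for X, w in zip(points, weights))

def two_point_source(theta):
    """Two equally bright point sources separated by theta, centred at X = 0
    (assumed placement, inferred from beta_2j = (theta/2)**(2j))."""
    return [-theta / 2, theta / 2], [0.5, 0.5]

theta = 0.2  # hypothetical subdiffraction separation (theta << 1)
points, weights = two_point_source(theta)

for j in (1, 2, 3):
    even = object_moment(points, weights, 2 * j)
    odd = object_moment(points, weights, 2 * j - 1)
    print(f"beta_{2*j} = {even:.3e}  vs (theta/2)^{2*j} = {(theta/2)**(2*j):.3e};"
          f"  beta_{2*j-1} = {odd:.1e}")
```

The even moments shrink as $(\theta/2)^{2j}$ in the subdiffraction regime, while the odd moments vanish by symmetry for this particular placement.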

VII. CONCLUSION
I have founded a theory of quantum semiparametric estimation and showcased its power by producing simple quantum bounds for a large class of problems with high or even infinite dimensions. The theory establishes the notion of quantum semiparametric efficiency, which should inform and inspire the design of semiparametric measurements for applications in quantum information, optical imaging, and beyond.
Many open problems remain, and more extensions and applications of the theory are yet to be worked out. The attainability of efficiency is a thorny issue [7,8,24,33] that is only touched upon here. The assumption of unbiased estimation is a drawback, and it remains to be seen whether the theory can be generalized to the Bayesian or minimax paradigm [47]. Estimation of a vector-valued β is another topic left unexplored in this work. These problems may benefit from studies of alternative quantum bounds beyond the Helstrom or Cramér-Rao type [6,9,29,48]. In view of Eq. (3.47) and Figs. 3 and 4, the connections of quantum semiparametrics to other aspects of quantum information [32] and quantum state geometry [7,8,23] are also interesting future directions.
In light of the richness and wide applications of the classical semiparametric theory [16][17][18][19][20][21], this work has only scratched the surface of the full potential of quantum semiparametrics. It should open doors to further useful results.

ACKNOWLEDGMENTS
Helpful comments from Ranjith Nair and Francesco Albarelli are gratefully acknowledged. This work is supported by the Singapore National Research Foundation under Project No. QEP-P7.
The projection formula is better known in the form

$$\Pi(\delta|T) = \sum_{j=1}^{J} a_j \langle a_j, \delta \rangle = a^\top \langle a, \delta \rangle.$$
Now consider $T = \operatorname{span}\{S_j : 1 \le j \le J\}$. To derive an orthonormal basis from $\{S\}$, let

$$a = G S,$$

where $G$ is assumed to be a lower-triangular matrix ($G_{jk} = 0$ for $k > j$) with positive diagonal entries ($G_{jj} > 0$) and thus invertible [49]. The orthonormal condition gives

$$\langle a_j, a_k \rangle = \sum_{l,m} G_{jl} \langle S_l, S_m \rangle G_{km} = \delta_{jk},$$

which can be written more compactly as

$$G K G^\top = I,$$

where $I$ is the $J \times J$ identity matrix. Hence

$$K^{-1} = G^\top G.$$

If $K > 0$, $G^{-1}$ can be obtained by the Cholesky factorization of $K$ [49], which is equivalent to the Gram-Schmidt procedure. Equation (A4) can now be expressed as

$$\Pi(\delta|T) = (GS)^\top \langle GS, \delta \rangle = S^\top G^\top G \langle S, \delta \rangle,$$

which is Eq. (A1).
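The Cholesky route to an orthonormal basis can be checked numerically. The sketch below is a finite-dimensional stand-in, not the operator setting of the paper: the scores $S_j$ are replaced by real vectors, the inner product by the Euclidean dot product, and $K$ is assumed to be the Gram matrix with entries $\langle S_j, S_k \rangle$. It verifies that $a = GS$ is orthonormal when $G^{-1}$ is the Cholesky factor of $K$, and that the projection written with $a$ agrees with the one written with $G^\top G$. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, J = 6, 3
S = rng.normal(size=(J, dim))   # rows S_j: stand-ins for the scores
delta = rng.normal(size=dim)    # stand-in for the operator being projected

K = S @ S.T                     # assumed Gram matrix, K_jk = <S_j, S_k>
L = np.linalg.cholesky(K)       # K = L L^T with L lower triangular
G = np.linalg.inv(L)            # G = L^{-1}, hence G K G^T = I
a = G @ S                       # rows a_j: orthonormal basis of span{S_j}

# Projection in the orthonormal basis: sum_j a_j <a_j, delta>.
proj_basis = a.T @ (a @ delta)
# Same projection written with G: S^T G^T G <S, delta>.
proj_G = S.T @ (G.T @ G @ (S @ delta))
# Same projection written with the inverse Gram matrix: S^T K^{-1} <S, delta>.
proj_gram = S.T @ np.linalg.solve(K, S @ delta)

print("a orthonormal:      ", np.allclose(a @ a.T, np.eye(J)))
print("G^T G equals K^{-1}:", np.allclose(G.T @ G, np.linalg.inv(K)))
print("projections agree:  ", np.allclose(proj_basis, proj_G)
      and np.allclose(proj_basis, proj_gram))
```

Because `np.linalg.cholesky` returns the lower-triangular factor with positive diagonal, its inverse has the same two properties, matching the conditions placed on $G$.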
Appendix B: Proof of Corollary 2

Denote any concept discussed so far with the superscript $(N)$ if it is associated with $F^{(N)}$, but omit the superscript $(1)$ for brevity if $N = 1$. From $Z$, generate a subspace $UZ \subset Z^{(N)}$ through a map $U$ that is surjective onto $UZ$ by definition of the space. It can be shown that $UZ$ is isomorphic to $Z$ and that $U$ is a unitary map from $Z$ to $UZ$ [27]. It can also be shown that $T^{(N)} = \operatorname{span}\{S^{(N)}\} \subseteq UZ$ and that $T^{(N)}$ is isomorphic to $T$. For any $Uh \in UZ$, it is not difficult to prove that $\Pi(Uh|T^{(N)}) = U\Pi(h|T)$, given the isomorphisms. Now let

$$\delta^{(N)} = \frac{U\delta}{\sqrt{N}},$$

where $\delta$ is an influence operator. $\delta^{(N)}$ is also an influence operator. The efficient influence for $F^{(N)}$ becomes

$$\delta^{(N)}_{\rm eff} = \Pi(\delta^{(N)}|T^{(N)}) = \frac{U\delta_{\rm eff}}{\sqrt{N}},$$

the norm becomes

$$\|\delta^{(N)}_{\rm eff}\| = \frac{\|\delta_{\rm eff}\|}{\sqrt{N}},$$

and the corollary ensues.

Consider the $N = 1$ family and omit the superscript $(1)$ in that case for brevity. Let $\{F^\sigma\}$ be a complete set of parametric submodels for $G$, where each submodel has the form of Eq. (4.2). Define the corresponding set $\{F^{\sigma(N)}\}$, where each $F^{\sigma(N)}$ is the submodel generated from $F^\sigma$ with the same $\sigma(\theta)$ and parameter space. In view of the form of $G^{(N)}$ given by Eq. (4.12), this set must also be a complete set of submodels for $G^{(N)}$. For each $F^{\sigma(N)}$, the scores can be expressed in terms of the scores $S^\sigma$ for $F^\sigma$. It follows that the tangent space $T^{(N)}$ for $G^{(N)}$ is isomorphic to the $T$ for $G$, by the same arguments as in Appendix B. Projecting an influence operator of the form $\delta^{(N)} = U\delta/\sqrt{N}$ into $T^{(N)}$ gives the efficient influence $\delta^{(N)}_{\rm eff} = U\delta_{\rm eff}/\sqrt{N}$, again by the same arguments as in Appendix B.
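The $1/\sqrt{N}$ scaling in Appendix B relies on $U$ preserving inner products. The sketch below checks this numerically under assumptions that are not displayed in the text above: $U h = N^{-1/2} \sum_n I \otimes \cdots \otimes h \otimes \cdots \otimes I$ with $h$ in the $n$th slot, the inner product is the symmetrized form $\langle h, g \rangle = \operatorname{tr}[\rho (hg + gh)]/2$, and $Z$ consists of Hermitian operators with $\operatorname{tr}(\rho h) = 0$. The function names are hypothetical, and a random zero-mean operator stands in for an influence operator.

```python
import numpy as np
from functools import reduce

def random_density(d, rng):
    """Random full-rank density matrix of dimension d."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

def random_zero_mean(rho, rng):
    """Random Hermitian operator h shifted so that tr(rho h) = 0."""
    d = rho.shape[0]
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    h = (A + A.conj().T) / 2
    return h - np.trace(rho @ h).real * np.eye(d)

def inner(rho, h, g):
    """Assumed inner product: tr[rho (h g + g h)] / 2."""
    return np.trace(rho @ (h @ g + g @ h)).real / 2

def lift(h, N):
    """Assumed form of U: N^{-1/2} times the sum of h acting on each copy."""
    eye = np.eye(h.shape[0])
    terms = [reduce(np.kron, [h if m == n else eye for m in range(N)])
             for n in range(N)]
    return sum(terms) / np.sqrt(N)

rng = np.random.default_rng(1)
d, N = 2, 3
rho = random_density(d, rng)
h = random_zero_mean(rho, rng)
g = random_zero_mean(rho, rng)
rho_N = reduce(np.kron, [rho] * N)   # N-copy state rho^{(x) N}

lhs = inner(rho_N, lift(h, N), lift(g, N))   # <U h, U g> on N copies
rhs = inner(rho, h, g)                       # <h, g> on one copy
delta_N = lift(h, N) / np.sqrt(N)            # U h / sqrt(N)
norm_sq_N = inner(rho_N, delta_N, delta_N)

print("inner product preserved:", np.isclose(lhs, rhs))
print("squared norm scales as 1/N:", np.isclose(norm_sq_N, inner(rho, h, h) / N))
```

The cross terms between different copies vanish only because the operators have zero mean, which is why the sketch subtracts $\operatorname{tr}(\rho h)$ before lifting.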