Quantum Theory of Superresolution for Two Incoherent Optical Point Sources

Rayleigh's criterion for resolving two incoherent point sources has been the most influential measure of optical imaging resolution for over a century. In the context of statistical image processing, violation of the criterion is especially detrimental to the estimation of the separation between the sources, and modern farfield superresolution techniques rely on suppressing the emission of close sources to enhance the localization precision. Using quantum optics, quantum metrology, and statistical analysis, here we show that, even if two close incoherent sources emit simultaneously, measurements with linear optics and photon counting can estimate their separation from the far field almost as precisely as conventional methods do for isolated sources, rendering Rayleigh's criterion irrelevant to the problem. Our results demonstrate that superresolution can be achieved not only for fluorophores but also for stars.


I. INTRODUCTION
Rayleigh's criterion for resolving two incoherent point sources, requiring them to be separated at least by a diffraction-limited spot size on the image plane [1,2], has been the most influential measure of optical imaging resolution for over a century. More recently, insights from quantum optics [3] and statistics [4] have led to revolutions in farfield superresolution techniques [5][6][7] beyond his criterion. The techniques proposed in Refs. [3,4] rely on locating a point source when no other nearby sources are radiating in the same optical mode. While such techniques have achieved spectacular success in microscopy, they require sophisticated control of the emission of special fluorophores and are irrelevant to astronomy and remote sensing.
For two sources with overlapping radiations on the image plane, studies have found that signal processing of the imaging data can still determine their locations, although the precision in the presence of photon shot noise quickly deteriorates when Rayleigh's criterion is violated [8][9][10]. The precision degradation is mandated by the Cramér-Rao lower error bound [11], suggesting that the degradation is fundamental to direct imaging. Given such prior work, conventional wisdom thus suggests that the positions of two incoherent sources should become harder to estimate when their radiations overlap, a statistical phenomenon we call Rayleigh's curse.
Since photon shot noise is now the dominant noise source in fluorescence microscopy [10,12] as well as stellar imaging [13][14][15][16], it is timely to inquire whether a quantum treatment can lead to new insights. Here we attack the problem from the perpsective of quantum metrology, a branch of quantum information theory relevant to sensing and imaging [17,18]. To be specific, we derive the fundamental quantum limit to the precision of locating two weak thermal optical point sources in the form of the quantum Cramér-Rao bound (QCRB) proposed by Helstrom [17]. Surprisingly, we find that the QCRB maintains a fairly constant value for any separation and shows * mankei@nus.edu.sg no sign of Rayleigh's curse. This behavior is in stark contrast to the QCRB for in-phase coherent sources, in which case Rayleigh's curse is fundamental [19].
It is known mathematically that there exists a measurement scheme to attain the QCRB for one parameter asymptotically [20,21]. For a more concrete experimental implementation, here we propose the method of SPAtial-mode DEmultiplexing (SPADE). We show that SPADE can ideally estimate the separation between the two sources with a quantum-optimal Fisher information, and we also propose linear optical system designs that can implement the measurement. Direct imaging is poor at localization of two close sources precisely because it estimates their separation poorly, and SPADE is able to overcome this problem and Rayleigh's curse via further linear optical processing before photon counting.
The subject of quantum imaging has been extensively studied; see Appendix A for a literature review. Most prior proposals rely on nonclassical sources or multiphoton coincidence measurements, however, making them difficult and inefficient to use in practice. Incoherent sources, such as fluorophores and stars, are of course much more common, and linear optical methods to enhance the localization precision for close incoherent sources will be of monumental interest to both localization microscopy [4,6,7] and astrometry [15,16]. The most relevant prior work remains the pioneering studies by Helstrom on thermal sources [17], yet he studied two sources only in the context of binary hypothesis testing and assumed a given separation in the two-source hypothesis [22]. As the separation is usually unknown and needs to be estimated in the first place [4, 6-10, 15, 16], our parameter-estimation framework should be more useful.

II. QUANTUM OPTICS FOR WEAK THERMAL SOURCES
To illustrate the essential physics, we follow Lord Rayleigh's lead [1] and assume quasi-monochromatic scalar paraxial waves and one spatial dimension on the object and image planes. Within each short coherence time interval for a thermal source at an optical frequency, it is standard [13,14,[23][24][25][26][27] to assume that the average photon number arriving on the image plane is much smaller than 1, and useful information is obtained only after many photons have been measured over many such intervals. This means that the quantum density operator for the optical fields on the image plane in each coherence time interval can be well approximated as ρ = (1 − )ρ 0 + ρ 1 + O( 2 ), (2.1) where ρ 0 = |vac vac| is the zero-photon state, ρ 1 is a onephoton state, and O( 2 ) denotes terms on the order of 2 ; see Appendix B for a detailed derivation. For the rest of the paper, we neglect the O( 2 ) terms and use the ≈ sign to denote the first-order approximations. Similar approximations were also used earlier to study stellar interferometry [26,27]. A connection with classical statistical optics can be made by observing that ρ 1 is related to the mutual coherence of the optical fields with respect to the Sudarshan-Glauber distribution. As shown in Appendix B, the one-photon state for two incoherent point sources and a diffraction-limited imaging system can be taken as where x is the image-plane coordinate normalized with respect to the magnification factor of the imaging system [28], |x = a † (x) |vac is the photon image-plane position eigenket defined with respect to annihilation and creation operators that obey [a(x), a † (x )] = δ(x − x ) [29,30], and ψ s (x) is the image-plane wavefunction from each source. We can reproduce the standard Poisson model of direct image-plane photon counting [10,[12][13][14][31][32][33] by considering the 1 − probability of no photon count and the 1 probability of measuring a photon. If a photon is detected, the probability density of the photon position x is With 1, the photon count at each pixel with width dx can be approximated as Poisson with a mean given by Λ(x)dx. The total photon count over M coherence time intervals then remains approximately Poisson with a mean M Λ(x)dx = N Λ(x)dx, where N ≡ M is the average photon number collected over the M intervals and Λ(x) becomes the mean intensity profile. To illustrate, Fig. 1 depicts the wavefunctions and the mean intensity for a typical imaging system. Note the crucial point that ψ 1 |ψ 2 = ∞ −∞ dxψ * 1 (x)ψ 2 (x) = 0, and the spatial modes excited by the two sources are in general not orthogonal, especially when Rayleigh's criterion is violated. This overlap underlies all the physical and mathematical difficulties with the resolution problem, as it implies on a fundamental level that the two modes cannot be separated for independent measurements.

III. CLASSICAL AND QUANTUM CRAMÉR-RAO BOUNDS
To investigate the impact of measurement noise on parameter estimation, suppose that ρ depends on a set of unknown parameters denoted by {θ µ ; µ = 1, 2, . . . } [34], and a quantum measurement is made on the image plane over the M intervals to estimate θ. Any quantum measurement can be mathematically described by a positive operator-valued measure (POVM) E(Y) [17], such that the probability distribution of measurement outcome Y is P (Y) = tr E(Y)ρ ⊗M , with tr denoting the operator trace and ρ ⊗M denoting a tensor product of M density operators. Letθ µ (Y) be an estimator and be the error covariance matrix. For any unbiased estimator, the Cramér-Rao bound is given by is the Fisher information matrix with respect to P (Y) [11]. For the Poisson model of direct imaging [10,[31][32][33], Alternatively, the same result can be derived without the Poisson approximation by considering the one-photon distribution given by Eq. (2.4) and no multiphoton coincidence. As the Cramér-Rao bound is asymptotically achievable [11], the Fisher information has become the standard precision measure in modern fluorescence microscopy [10,[31][32][33] as well as astronomy [13,16,35,36]. Direct imaging, though standard, is but one of the infinite measurement methods permitted by quantum mechanics. The ultimate performance of any quantum measurement and any unbiased estimator can be quantified using the quantum Cramér-Rao bound where K is the quantum Fisher information matrix in terms of ρ ⊗M [17]. To compute K analytically, we assume a spatially invariant imaging system with ψ s (x) = ψ(x − X s ), where ψ(x) is the point-spread function of the imaging system and X s is the unknown position of each source [28]. Both J (direct) and K turn out to be diagonal if we redefine the parameters of interest as the centroid and the separation as depicted in Fig. 1. We also assume, with little loss of generality, that the point-spread function has a constant xindependent phase, which can be easily implemented by a two-lens system [28]. The phase is then irrelevant to ρ 1 in Eq. (2.2) and ψ(x) can be taken as real.
The computation of K is described in Appendix C; the result is is the spatial-frequency variance of the real point-spread function set by the diffraction limit and is a parameter that depends on θ 2 . The prefactor N indicates a shot-noise scaling with respect to the average photon number, as expected from classical sources [18,19]. For θ 2 → ∞, γ 2 → 0, and we recover the standard shot-noise limit to the localization of isolated sources.
To compare the quantum Fisher information with the classical information for direct imaging, Fig. 2 plots the diagonal elements of J (direct) and K, assuming a Gaussian pointspread function [12] ψ(x) = ( where σ = 1/(2∆k) = λ/(2πNA), λ is the free-space wavelength, and NA is the effective numerical aperture. The constant K 22 in particular becomes The Gaussian case is representative and the same qualitative behaviors can be observed for other common point-spread functions. For the centroid, both the classical and quantum information is within a factor of 2 of the standard limit N/σ 2 . J (direct) 11 ≤ K 11 as it should, but the small gap between the two means that there is little room for improvement. are the corresponding classical values for direct imaging. The horizontal axis is normalized with respect to the point-spread function width σ, while the vertical axis is normalized with respect to N/(4σ 2 ), the value of K22.
The difference between the separation information quantities K 22 and J (direct) 22 in Fig. 2 is much more dramatic. Both quantities approach the same limit N/(4σ 2 ) as θ 2 → ∞, implying that direct imaging is quantum-optimal for wellseparated sources. For θ 2 /σ → 0, however, the classical information J (direct) 22 decreases to zero. This means that direct imaging is progressively worse at estimating the separation for closer sources, to the point that the information vanishes and the Cramér-Rao bound diverges at θ 2 = 0. We call this divergent behavior due to overlapping wavefunctions Rayleigh's curse, as it implies a severe penalty on the localization precision when the intensity profiles overlap significantly and Rayleigh's criterion is violated for a given N .
Direct imaging suffers from Rayleigh's curse for any pointspread function, as ∂Λ(x)/∂θ 2 vanishes at θ 2 = 0 while Λ(x) remains nonzero in regions of x where the derivative vanishes, causing J (direct) 22 to vanish via Eq. (3.4). This is the reason why the Cramér-Rao bounds derived in Refs. [8][9][10] for separation estimation all diverge when Rayleigh's criterion is violated. Remarkably, the quantum information K 22 in Eq. (3.8) stays constant regardless of the separation. If the centroid θ 1 is known, there exists a POVM with error Σ 22 asymptotically attaining the single-parameter QCRB [20,21], viz., This means that Rayleigh's curse can be avoided for separation estimation, and considerable improvements can be obtained, if the optimal quantum measurement can be implemented. To expound the issue, Fig. 3 plots the quantum and classical Cramér-Rao bounds 1/K 22 and 1/J , demonstrating more dramatically the divergent error in the classical case and the substantial room for improvement offered by quantum mechanics. ) on the error of separation estimation. The bounds are normalized with respect to the quantum value 4σ 2 /N . Rayleigh's curse refers to the divergence of the classical bound when θ2 < ∼ σ, as discovered by Refs. [8][9][10].

IV. SPATIAL-MODE DEMULTIPLEXING (SPADE)
Instead of measuring the position of each photon in the direct imaging method, we propose a discrimination in terms of the Hermite-Gaussian spatial modes [37] to estimate the separation. Consider the basis {|φ q ; q = 0, 1, . . . } with eigenkets given by where H q is the Hermite polynomial [37]. The POVM for each coherence time interval can be expressed as projections Conditioned on a detection event, the probability of detecting the photon in the qth mode becomes Similar to direct imaging, 1 implies that, over M intervals, the total photon count m q in each Hermite-Gaussian mode can be approximated as Poisson with a mean given by N P 1 (q).
To proceed further, we assume that the centroid θ 1 is known, and only θ 2 is to be estimated. Since centroid estimation using direct imaging is relatively insensitive to the separation, the assumption of an accurately known centroid is not difficult to satisfy; see Appendix D for a detailed discussion. Under this assumption, we can assume θ 1 = 0 without loss of generality, and the wavefunctions become For simple analytic results, we further assume that the point-spread function is Gaussian. The overlap factors in Eq. (4.4) can then be evaluated by recognizing that |φ q is mathematically equivalent to an energy eigenstate of a harmonic oscillator (in the configuration space of the photon), and |ψ 1 and |ψ 2 are equivalent to configuration-space coherent states with displacements ±θ 2 /(4σ). The result is This formula is valid even if the two sources have unequal intensities and ρ 1 is any mixture of |ψ 1 ψ 1 | and |ψ 2 ψ 2 |. The classical Fisher information for the Hermite-Gaussian-basis measurement over M intervals becomes which is equal to the quantum information given by Eq. (3.11) and also free of Rayleigh's curse.
To measure in the Hermite-Gaussian basis, one needs to demultiplex the image-plane field in terms of the desired spatial modes before determining the outcome based on the mode in which the photon is detected. To do so with a high information-extraction efficiency, one should perform a oneto-one conversion of the Hermite-Gaussian modes into modes in a more accessible degree of freedom with minimal loss and measurements that capture as many photons as possible. For example, we can take advantage of the fact that the Hermite-Gaussian modes are waveguide modes of a quadratic-index waveguide [37]. Suppose that we couple the image-plane optical field into such a highly multimode waveguide centered at the centroid position, as shown in Fig. 4. Each mode with index q acquires a different propagation constant β q along the longitudinal direction z. If a grating coupler [38] with spatial frequency κ is then used to couple all the modes into free space, each mode will be coupled to a plane wave with a different spatial frequency β q − κ along the z direction in free space, and a Fourier-transform lens can be used to focus the different plane waves onto different spots of a photoncounting array in the far field.  An alternative is to use evanescent coupling with different single-mode waveguides [39], as depicted in Fig. 5. If each single-mode waveguide is fabricated to have a propagation constant equal to a different value of β q , the phase-matching condition will cause each mode in the multimode waveguide to be coupled to a specific fiber.  Given these physical setups, we can now explain the operation of SPADE in a more intuitive semiclassical optics language: it is based on the exquisite sensitivity of the modecoupling efficiencies to the offset of the wavefunctions from the centroid. The incoherent sources are literally blinking on the fundamental coherence time scale, causing each imageplane photon to have a wavefunction given randomly by ψ 1 (x) or ψ 2 (x). Either wavefunction can excite the waveguide modes coherently with the same excitation probabilities, causing the final photon counts to be as sensitive to the offset for two sources as it is for one. Put another way, the incoherence between the two sources implies a random relative phase between the two fields and enables coupling into the first-order odd mode, which is the main spatial mode responsible for the high sensitivity to small offsets.
The use of photon counting is essential here to discriminate against the abundant but uninformative zero-photon events. If homodyne or heterodyne methods were used instead, they would suffer from excess vacuum fluctuations when no photon arrives. The poor performance of heterodyne methods for weak thermal sources is also known in the context of stellar interferometry [27,40]. The situation is different from measurements of coherent light, the density operator of which contains off-diagonal terms with respect to the photon-number basis and the probabilistic photon picture is less adequate. Our preliminary calculations [41] confirm this expectation and suggest that heterodyne detection of the spatial modes still suffers from Rayleigh's curse.
Suppose that a total of L photons are detected over the M trials. A record of the modes for the L photons (q 1 , . . . , q L ) can be obtained, but in fact a time-resolved record is not necessary, as l q l is a sufficient statistic for estimating Q and θ 2 [42], meaning that the set of photon numbers {m q = l δ qq l ; q = 0, 1, . . . } detected in different modes are also sufficient. The maximum-likelihood estimator becomeš which is straightforward to implement computationally. For ≈ K 22 , the QCRB is asymptotically attainable as well. Appendix E reports a Monte Carlo analysis of the maximum-likelihood estimator for SPADE, confirming that the Cramér-Rao bound remains close to the estimation error for finite photon numbers.

V. BINARY SPADE
Since direct imaging has trouble estimating the separation only when θ 2 /σ is small, and only low-order Hermite-Gaussian modes in SPADE are excited significantly in that case, we can focus on the discrimination of low-order modes to simplify the SPADE design. One such design is depicted in Fig. 6, where only the q = 0 component is coupled into the single-mode waveguide, while any photon in the higherorder modes remains in the multimode waveguide for subsequent detection. An alternative design is depicted in Fig. 7: the q = 0 mode is coupled to a single-mode waveguide, while higher-order modes are necessarily coupled to the leaky modes of the waveguide, which are also measured.
Conditioned on a detection event, the probability of detecting the photon in the q = 0 mode remains but now the higher-order modes cannot be discriminated, and the probability of detecting a photon in any higher-order mode becomes The Fisher information for this scheme is hence for direct imaging. It can be seen that binary SPADE gives significant information for small θ 2 /σ, which happens to be the regime where direct imaging performs poorly. Binary SPADE actually works less well when the sources are far apart, and the two methods can complement each other to enhance the localization precision, as shown in Appendix D.
For a total of L detected photons, L and m 0 , the number of photons detected in the q = 0 mode, are sufficient statistics for estimating Q and θ 2 , and m 0 follows the binomial distribution for L trials and success probability exp(−Q) [42]. The maximum-likelihood estimator becomeš For L = 0 or m 0 = 0, one can select finite values forθ 2 to regularize the estimator. Appendix E reports a Monte Carlo analysis of the resulting estimation error, confirming that it remains close to the Cramér-Rao bound for finite photon numbers. Compared with the large amount of data generated by direct imaging and the complex algorithms needed to process them, only two photon numbers are needed by binary SPADE to estimate the separation precisely. The highly compressed measurement output and computationally simple estimators, enabled by the coherent optical processing, come as bonuses with our schemes.

VI. OTHER POINT-SPREAD FUNCTIONS
Our analysis of SPADE so far relies on the assumption of a Gaussian point-spread function. For other point-spread functions, it is nontrivial to find a suitable basis of spatial modes, although we can still rely on the mathematical existence of a quantum-optimal measurement [20,21] to be sure that the QCRB can be saturated. For a more concrete method, the analysis of the binary SPADE schemes is fortunately still tractable, if we assume a single-mode waveguide with a mode profile that matches the point-spread function ψ(x) centered at the centroid position. Define |ψ = ∞ −∞ dxψ(x) |x as the state of one photon in the waveguide mode. The efficiency of coupling a photon in state |ψ 1 or |ψ 2 into the waveguide mode becomes where Υ(θ 2 ) is the mode overlap factor and Ψ(k) ≡ (2π) −1/2 ∞ −∞ dxψ(x) exp(−ikx) is the optical transfer function of the imaging system before the image plane [28]. For the density operator in Eqs. (2.1), (2.2), and (2.3), or in fact any mixture of |ψ 1 ψ 1 | and |ψ 2 ψ 2 |, the probability of finding a photon in the waveguide mode becomes P (ψ) ≈ Υ, and the probability of finding a photon in any other mode is P (ψ) ≈ (1 − Υ). The Fisher information over M intervals is then (6.2) To study its behavior for small θ 2 , expand Υ(θ 2 ) in Eq. (6.1) as Υ( can hence reach the quantum information K 22 = N ∆k 2 at θ 2 = 0, precisely where J 22 is expected to decrease, as the scheme is unable to discriminate the higher-order modes that become more likely to be occupied. Figure 9 plots K 22 , the numerically computed J  is the numerically computed value for direct imaging, and J (b) 22 is that for binary SPADE tailored for the sinc function. The vertical axis is normalized with respect to K22 = π 2 N/(3W 2 ).

VII. TWO-DIMENSIONAL IMAGING
The essential physics remains unchanged when we consider two-dimensional imaging, and we discuss the generalization only briefly here; the details are given elsewhere [43]. The single-photon ket in Eq. (2.3) should now be expressed as and ψ s (x, y) is a two-dimensional wavefunction [29,30]. In terms of a point-spread function ψ(x, y) and unknown positions (X 1 , Y 1 ) and (X 2 , Y 2 ), ψ s (x, y) = ψ(x − X s , y − Y s ), and we can define the four centroid and separation parameters as θ 1 = (X 1 + X 2 )/2, θ 2 = X 2 − X 1 , for the estimation of θ 2 and θ 4 decreases to zero when the sources are close, and Rayleigh's curse still exists for direct imaging [9,10]. On the other hand, the quantum Fisher information matrix, to be reported in Ref. [43], again shows no sign of Rayleigh's curse for two-dimensional separation estimation.
For SPADE, we can use the two-dimensional Hermite-Gaussian basis [37]. Assuming a Gaussian point-spread function and a known centroid, it is straightforward to show that a measurement of each photon in the Hermite-Gaussian basis with mode indices q and p obeys a two-variable Poisson distribution, and the classical Fisher information with respect to θ 2 and θ 4 remains a constant and free of Rayleigh's curse, similar to the one-dimensional case. For other point-spread functions, such as the Airy disk [2,28], binary SPADE with a matching mode profile can estimate the separation without Rayleigh's curse for small separations in the same way as the one-dimensional case, but information about the direction of the separation is lost. To obtain directional information, one needs to discriminate at least some of the higher-order modes in different directions.
A quadratic-index optical fiber can support twodimensional Hermite-Gaussian modes, while a weakly guiding step-index fiber also has modes closely resembling the Hermite-Gaussian modes [38]. A complication arises for cylindrically symmetric fibers, as modes with the same total order q + p will have a degenerate propagation constant, causing multiple modes to satisfy the same phase-matching conditions in grating or evanescent coupling and preventing discrimination of modes with the same order. The net result is that directional information is compromised. One solution is to turn the point-spread function into an elliptic one with asymmetric widths and use an elliptic fiber to break the degeneracy.

VIII. CONCLUSION
We have presented two important results in this paper: the fundamental quantum limit to locating two incoherent optical point sources and the SPADE measurement schemes for quantum-optimal separation estimation. Our quantum bound sets the ultimate limit to localization precision in accordance with the fundamental laws of quantum mechanics, while SPADE can extract the full information offered by quantum mechanics concerning the separation parameter via linear photonics. The proposed SPADE schemes work well for close sources with significant overlap in their wavefunctions, avoiding Rayleigh's curse and the divergent error that plagues direct imaging. The computational simplicity of the estimators is an additional advantage. Foreseeable applications include binary-star astrometry [15,16,44] and single-molecule imaging [7], either as a replacement of techniques based on fluorescence resonant energy transfer [12,45] or as an enhancement of localization microscopy [5-7, 10, 12] to provide complementary information about close pairs of fluorophores.
Note added.-Subsequent to the completion of this work (the first version of this manuscript was submitted to the arXiv preprint server on Nov 2, 2015 [46]), we have developed a semiclassical but less general theory to explain our results here for pedagogy [47], discovered an alternative scheme called Super-Localization via Image-inVERsion interferometry (SLIVER) that can overcome Rayleigh's curse without the need to tailor the device to the point-spread function [43,48], derived the QCRB for thermal sources without the 1 approximation using an alternative approach [49], and shown that variations of SPADE and SLIVER can attain the bound for arbitrary [49], validating the results here. A generalization of our theory presented here to two dimensions, with similar conclusions, is described in detail elsewhere [43]. Following Ref. [46], Lupo and Pirandola have derived the ultimate quantum Fisher information for separation estimation with arbitrary quantum sources [50], including our independent result on thermal sources [49] as a special case. Experiments inspired by our theory have been reported in Refs. [51][52][53][54]; Refs. [52][53][54] also propose variations of SPADE that are easier to implement experimentally. Helstrom pioneered the application of his quantum estimation and detection bounds to optical imaging problems [17,22,55], focusing on coherent and thermal sources. In particular, the now well-known expression for the shot-noiselimited localization error for one classical source can be found in Ref. [55]; similar expressions in the context of direct imaging were later reported in Refs. [35,36,56]. For more recent studies of quantum metrology for coherent-state or nonclassical-state imaging, see, for example, Refs. [57][58][59]. For studies on the use of squeezed light for single-object localization, see, for example, Refs. [19,[60][61][62][63]. None of these studies considered the problem of locating two close incoherent sources.
The standard quantum model of paraxial imaging and the use of nonclassical light for that purpose were proposed by Yuen and Shapiro [29]. This topic has been further investigated most notably by Kolobov and co-workers [64,65], who focused on coherent or squeezed light, homodyne detection, and field fluctuations. Such models are irrelevant to incoherent sources such as stars and fluorophores, for which the mean field is zero and photon counting is the more relevant method to minimize vacuum noise; to quote Helstrom [66], With such incoherently illuminated or radiating objects, it is not the field of the light that is of interest, for that field is best described as a random process having zero mean value and a most erratic spatiotemporal variation. Rather it is the mean-square value of the field, averaged over many cycles of the dominant temporal frequency, that characterizes the object in the most informative way. [67][68][69][70] considered the squeezing and measurement of the eigenmodes of an imaging system for image-reconstruction superresolution. Again, these studies focused on coherent or squeezed light only. The "Rayleigh resolution limit" mentioned by many of these papers is a misnomer, as the resolution limit for coherent imaging should be attributed to Abbe, while Rayleigh's criterion is defined for two incoherent sources [1,2] and ill-suited to coherent imaging [71]. Moreover, the imaging-system eigenmodes they studied have no relation to the spatial modes we propose for the two-source localization problem and they did not use the more rigorous framework of statistical parameter estimation.

Subsequent work by Kolobov and co-workers
We can consider the schemes proposed in Refs. [72][73][74][75][76][77][78][79] as another class of superresolution imaging protocols, which require coherent or nonclassical sources and multiphoton coincidence measurements and do not consider statistical inference. It is well known in statistical optics that a multiphoton coincidence measurement, such as the obsolete Hanbury Brown-Twiss interferometry, fundamentally has a much poorer signal-to-noise ratio than amplitude interferometry because multiphoton coincidence events are rare for thermal optical sources [23,27]. The actual statistical resolution of this class of protocols is thus questionable, especially for weak optical sources, without further proofs in the context of inference accuracy. In recent years, there has also been significant interest in quantum lithography [80][81][82][83] and ghost imaging [81,[84][85][86], although their applications are clearly different from our purpose and will not be elaborated here.
The relative neglect of incoherent sources in the quantumimaging literature, despite their obvious importance, may be due to a lack of appreciation that quantum mechanics can be relevant to such highly classical light. Our work thus showcases quantum metrology as a powerful tool to discover the ultimate performance of sensing and imaging even for classical sources, providing not only rigorous quantum limits but also pleasant surprises for one of the most important applications in optics. Define α = (α 1 , . . . , α J ) as a column vector of complex field amplitudes for J optical spatial modes on the image plane and |α as a multimode coherent state with amplitude α. Any quantum state can be expressed as where Φ(α) is the Sudarshan-Glauber representation and Dα is an appropriate measure [24]. For thermal sources, it is standard [24] to assume Φ to be a zero-mean complex Gaussian given by where α † = (α * 1 , . . . , α * J ) denotes the complex transpose of α, is the image-plane mutual coherence matrix, and E [f (α)] ≡ DαΦ(α)f (α) denotes the expectation of any function f with respect to the Φ distribution. Writing the coherent state in terms of a superposition of Fock states and applying the Gaussian moment theorem [24] to Eq. (B1), we can express ρ as the incoherent mixture where π n is the probability of having n total photons in the state and ρ n is an n-photon multimode Fock state. At optical frequencies or beyond, it is standard [13,14,[23][24][25][26][27] to assume that, within the short coherence time of a source, the average photon number arriving at the imaging device is much smaller than 1. We will make the same assumption for two sources, viz., where tr denotes the operator trace, a j is the annihilation operator for the jth mode, and a † j is the creation operator. For example, a star with sun-like temperature 6000 K emits ∼ 10 −2 photon on average per mode at wavelength 500 nm, while the limited fraction of the coherence area captured by the telescope aperture further reduces the received photon number [23]. In microscopy, a typical fluorophore emits < 10 7 photons per second [31] with coherence time < 50 fs [12], leading to < 10 −6 for two sources. The zero-photon probability given by is then the highest, the one-photon probability given by is to the first order, and the multiphoton probability is in the second order, leading to Eq. (2.1). As the vacuum state provides no information and multiphoton events are rare, we will focus on the one-photon state ρ 1 . This focus also makes our formalism applicable to inefficient single-photon emitters, which may have non-Poissonian multiphoton statistics but rare multiphoton events, and electron microscopy [9]. The negligence of the O( 2 ) multiphoton probability leads to a Poisson photon-counting distribution [23], which ignores bunching or antibunching effects but remains an excellent empirical model for both astronomical optical sources [13,14,[23][24][25] and fluorophores [10,12,[31][32][33] by virtue of the 1 condition. To quote Mandel [25], The light from these sources is always so weak thatnξ/T 1 [ in our terminology] and the degeneracy is unlikely to be detected in measurements on a single beam. The situation is, of course, improved when correlation measurements are undertaken on two or more coherent beams (Hanbury Brown and Twiss 1956), since these measurements single out the degenerate photons (Mandel 1958). Even so it is unlikely that any faint stars could be studied in this way.
Similarly, Goodman states that [23] If the count degeneracy parameter [ in our terminology] is much less than 1, it is highly probable that there will be either zero or one counts in each separate coherence interval of the incident classical wave. In such a case the classical intensity fluctuations have a negligible "bunching" effect on the photo-events, for (with high probability) the light is simply too weak to generate multiple events in a single coherence cell. If negligible bunching of the events takes place, the count statistics will be indistinguishable from those produced by stabilized singlemode laser radiation, for which no bunching occurs.
A more recent work by Zmuidzinas [13] also states that It is well established that the photon counts registered by the detectors in an optical instrument follow statistically independent Poisson distributions, so that the fluctuations of the counts in different detectors are uncorrelated. To be more precise, this situation holds for the case of thermal emission (from the source, the atmosphere, the telescope, etc.) in which the mean photon occupation numbers of the modes incident on the detectors are low, n 1 [ in our terminology]. In the high occupancy limit, n 1, photon bunching becomes important in that it changes the counting statistics and can introduce correlations among the detectors. We will discuss only the first case, n 1, which applies to most astronomical observations at optical and infrared wavelengths.
Define |j = a † j |vac as the ket with one photon only in the jth mode. Consider the one-photon matrix elements We can then assume, to the first order of , Similar approximations were also used in Refs. [26,27]. To derive Γ, let β ≡ (β 1 , . . . , β K ) be the field amplitudes for optical modes on the object plane, and consider the field propagation rule α = Sβ for a linear optical system, where S is the field scattering matrix. The image-plane mutual coherence Γ is then related to the object-plane mutual coherence matrix This propagation rule is a basic principle in both classical and quantum statistical optics [24].
In the paraxial regime, we can use localized wavepacket modes as a basis [29,30]. Let u be the position index for a wavepacket mode on the one-dimensional object plane and consider two incoherent sources with equal intensities at positions u = u 1 and u = u 2 . The fields are uncorrelated at different points on the object plane, with nonzero intensities only at the sources. Then where 0 is the average photon number from each source. On the image plane, the mutual coherence becomes and the average photon number can be expressed as = 2 0 η, where η ≡ j |S jus | 2 is the quantum efficiency of the imaging system and we have made the reasonable assumption that the efficiency is the same for both sources. Equation (B10) can then be expressed as Eq. (2.2) if we define single-photon kets |ψ s ≡ j ψ(j, u s ) |j with normalized wavefunctions ψ(j, u s ) = S jus / √ η. Assuming image-plane wavepacket positions x j = x 0 + jdx, position eigenkets |x j = |j / √ dx, and wavefunctions ψ s (x j ) = ψ(j, u s )/ √ dx, we arrive at Eq. (2.3) by taking the continuous-space limit with infinitesimal dx [29,30].
Appendix C: Quantum metrology: derivation of Eqs. (3.8) The quantum Fisher information matrix with respect to ρ ⊗M proposed by Helstrom [17] is defined as where L µ (ρ) is a symmetric logarithmic derivative (SLD) of ρ. Writing ρ in its eigenbasis as L µ (ρ) can be expressed as [17,87,88] Given this definition and Eq. (B4), it can be shown that as each ρ n is in an orthogonal subspace. Since the vacuum state ρ 0 = |vac vac| contains no information and multiphoton events are rare, the total information will be dominated by that from the one-photon state ρ 1 . We will therefore focus on the one-photon component π 1 K(ρ 1 ) as a tight lower bound on the quantum information and assume in the following With π 1 ≈ and the probability of multiphoton events being O( 2 ) according to Eq. (B8), this approximation is accurate to the first order of .
To compute the quantum Fisher information matrix K(ρ 1 ) according to Eqs. (C1)-(C3), we first need to diagonalize the ρ 1 in Eqs. (2.2) and (2.3), noting that the eigenvectors should span the supports of ρ 1 and ∂ρ 1 /∂θ µ . The partial derivative of ρ 1 with respect to X µ can be expressed as where H.c. denotes the Hermitian conjugate. In addition to the support of ρ 1 spanned by |e 1 and |e 2 , we also need to find more eignevectors that span the support of ∂ |e 1 /∂X µ and ∂ |e 2 /∂X µ . Assuming that ψ µ (x) = ψ(x − X µ ) and the point-spread function ψ(x) has an x-independent phase, we can take ψ(x) to be real without loss of generality and choose the following orthonormal set of eigenvectors: where ∆k 2 and γ are given by Eqs. (3.9) and (3.10), respectively, and the eigenvalues of ρ 1 are After more algebra, the SLD in Eq. (C3) with respect to the derivative in Eq. (C6) can be expressed as with a real and symmetric matrix L In terms of the centroid and separation parameters given by θ 1 = (X 1 + X 2 )/2 and θ 2 = X 2 − X 1 , the SLDs become We can now substitute Eqs. (C9)-(C12) into Eq. (C1) to compute the quantum Fisher information matrix K(ρ 1 ). The final result, assuming M π 1 ≈ M = N , is given by Eqs. (3.8) with zero off-diagonal terms.

Appendix D: Unknown centroid and misalignment
Our analysis of SPADE in Sec. IV and V assumes that the centroid of the two sources is known exactly and the device is optimally aligned with the centroid. For astronomy, it is reasonable to assume that the centroid is known accurately, as extensive telescopic data on stellar objects should be readily available and conventional imaging is accurate in estimating the centroid. Even if the centroid is unknown, stellar objects usually shine long enough for one to collect ample prior information before aligning the SPADE device. For microscopy, however, biological samples may drift more quickly and fluorophores can bleach, giving little time and few photons for one to estimate both parameters. One option, to be explored in future work, is to scan the SPADE device across the image plane in a manner similar to the operation of a confocal microscope [12].
Another option, illustrated in Fig. 10, is to split the optical field by a beam splitter, measure one output port by direct imaging, and use the centroid estimate to align SPADE at the other port in a hybrid scheme. As the overall optical system is linear with photon counting, the output statistics remain Poisson for 1, meaning that the statistics of the measurements are independent and can be analyzed separately. The penalty of beam-splitting with the classical sources is simply a reduction of photon number at each port. With direct imaging offering little information about θ 2 when Rayleigh's criterion is violated, the additional information offered by SPADE for a reduced photon number can still be helpful. The outstanding issues are then the robustness of SPADE to the misalignment due to imperfect centroid estimation, and the overhead resources of photons needed to achieve satisfactory alignment.
Let the center of a SPADE device beθ 1 and consider θ 1 = θ 1 due to misalignment. For a Gaussian point-spread function and the Hermite-Gaussian-basis measurement, Eq. (4.4) should be generalized to Define the level of misalignment as We treat ξ as a systematic error and θ 2 as the parameter of interest for SPADE. Figure 11 plots the resulting Fisher information for several levels of misalignment. It can be seen that the information degrades with misalignment, but appreciable enhancements over direct imaging are still present even if θ 2 σ and the wavefunction overlap is significant, as long as ξ 1. Appendix E confirms this conclusion numerically for finite photon numbers. To attain a tolerable level of misalignment, θ 1 first needs to be estimated andθ 1 should be aligned with the estimate. With conventional imaging, the centroid estimation error is near-optimal and on the order of σ/ √ N in terms of the rootmean-square value, meaning that the number of extra photons N 1 needed to attain ξ is roughly An even more realistic analysis would takeθ 1 to be a stochastic waveform determined by the centroid measurements and the adaptive alignment control [89]. For binary SPADE, Eqs. (5.1) and (5.2) should be generalized to Figure 12 plots the Fisher information for misaligned binary SPADE, showing a similar degradation behavior to that in Fig. 11 for nonzero ξ. Significant improvements over direct imaging are still possible for small separations and ξ 1. For two-parameter estimation, consider the hybrid scheme in Fig. 10, assuming 50-50 beam-splitting and binary SPADE for example. For simplicity, assume that the binary-SPADE output is used only for separation estimation, such that the total information matrix with respect to θ 1 and θ 2 remains diagonal. Compared with direct imaging, the centroid information for the hybrid scheme is halved, viz., but the separation information gained by binary SPADE can be appreciable, with The net performance of the hybrid scheme can be quantified in terms of the Cramér-Rao bounds for locating X 1 and X 2 . For a diagonal information matrix J with respect to θ 1 and θ 2 , the bound on the mean-square error Σ (X) of estimating either X 1 = θ 1 − θ 2 /2 or X 2 = θ 1 + θ 2 /2 is simply which demonstrates the detrimental effect of small J 22 for localization. Figure 13 compares the localization bounds for the hybrid scheme and direct imaging in log-log scale. For small separations, it can be seen that the increased separation information in the hybrid scheme more than compensates for the reduced centroid information and allows localization errors substantially lower than those for direct imaging. With a higher N 1 = N/2, more accurate centroid information from the imaging port can be used to reduce the misalignment at the SPADE port, and performance converging to the ideal ξ = 0 case in Fig. 13 can be expected for high N . Cramér-Rao bounds on the mean-square error of estimating X1 or X2 for a 50-50 hybrid scheme (solid) and direct imaging (dash-dotted). Note that the log-log scale is used here for clarity, unlike all the other plots in this paper. The vertical axis is normalized with respect to the error of locating an isolated source with direct imaging.

Appendix E: Monte Carlo analysis
To confirm that the classical Cramér-Rao bounds satisfactorily represent the actual performance of SPADE for finite photon numbers, here we simulate the device output data numerically, apply maximum-likelihood estimation, and investigate the resulting error. To refine our error analysis, we condition our results on the total number of detected photons L, which is obtained after an experiment, rather than the average photon number N [8]. It is not difficult to show that, conditioned on L, the classical and quantum Fisher information retain their expressions except that N is replaced by L. The error bounds become 1 J (HG) 22 It can also be shown that the sufficient statistic q qm q in the maximum-likelihood estimatior for SPADE in Eq. (4.7) is Poisson with mean LQ, so it is simple to generate samples of the maximum-likelihood estimatesQ ML andθ 2ML according to Eq. (4.7). Figure 14 plots the simulated mean-square errors, normalized with respect to Eq. (E1), for several values of L. It is intriguing to see that, as θ 2 /σ → 0, the errors go below the bounds. This is a well known statistical phenomenon called superefficiency [90,91], as the maximum-likelihood estimator here is actually biased for finite samples, and the simple Cramér-Rao bounds considered here need not apply. In asymptotic frequentist statistics, superefficiency is not regarded as an important idea [90], because a superefficient estimator can beat the Cramér-Rao bound only on a set of points with zero measure in the asymptotic limit [90,91], suggesting that any region of superefficiency should shrink for larger samples, as also shown in Fig. 14, and its usefulness is increasingly limited. A Bayesian version of the Cramér-Rao bound [11] can also be used to bound the global or minimax error of any biased or unbiased estimator; the Fisher information still plays a decisive role in the Bayesian bound and its significance as a precision measure remains strong in Bayesian and minimax statistics [92]. For our present purpose, the main point of Fig. 14 is that the errors remain less than twice the Cramér-Rao bound at worst and even offer the pleasant surprise of superefficiency for small separations. The overall closeness of the errors to the Cramér-Rao bounds supports our use of the Fisher information to represent the performance of SPADE.
For binary SPADE, the Fisher information conditioned on L has the same form as Eq. (5.3), and the Cramér-Rao bound can be expressed as The sufficient statistic m 0 inθ (b) 2ML given by Eq. (5.4) is binomial and also simple to generate. In case m 0 = 0, we setθ 2 = 2σ, the maximum of our considered range of θ 2 . Figure 15 plots the simulated mean-square errors for binary SPADE with otherwise the same parameters as those for Fig. 14. For small θ 2 /σ, the errors follow very similar trends as their counterparts in Fig. 14, and for larger θ 2 /σ the errors begin to follow the rising Cramér-Rao bound according to Eq. (E2). This supports our use of the Fisher information to represent the performance of binary SPADE. Note that the vertical axis is normalized with respect to 4σ 2 /L, so the plotted values are the actual errors magnified by L/(4σ 2 ). Each error is computed by averaging 10 5 simulations, and the lines connecting the data points are guides for eyes.
To investigate the effect of misalignment described in Appendix D, Fig. 16 plots the simulated errors for binary SPADE with a misalignment level defined in Eq. (D2) given by ξ = 0.1. The overhead photon number required to achieve ξ = 0.1 is N 1 ∼ 100 according to Eq. (D3) and negligible if L N 1 . Since ξ is unknown in reality, the maximum-likelihood estimator used in the simulations assumes zero misalignment for simplicity. Despite the model mismatch, the errors remain close to the Cramér-Rao bound, especially for larger L, and substantially below the bound for direct imaging. The maximum-likelihood estimator that assumes no misalignment is used. Note that the vertical axis is normalized with respect to 4σ 2 /L. Each error is computed by averaging 10 5 simulations, and the lines connecting the data points are guides for eyes. The errors are substantially below the Cramér-Rao bound for direct imaging (dashdotted curve).
for large N , and we can expect the performance for a given L = N to be an increasingly accurate approximation of the average performance in the given-N , random-L scenario.