Technical advantages for weak value amplification: When less is more

The technical merits of weak value amplification techniques are analyzed. We consider models of several different types of technical noise in an optical context and show that weak value amplification techniques (which use only a small fraction of the photons) compare favorably with standard techniques (which use all of them). Using the Fisher information metric, we demonstrate that weak value techniques can put all of the Fisher information about the detected parameter into a small portion of the events and show how this fact alone gives technical advantages. We go on to consider a time correlated noise model, and find that a Fisher information analysis indicates that the standard method can have much larger information about the detected parameter than the postselected technique. However, the estimator needed to gather the information is technically difficult to implement, showing that the inefficient (but practical) signal-to-noise estimation of the parameter is usually superior. We also describe other technical advantages unique to imaginary weak value amplification techniques, focusing on beam deflection measurements. In this case, we discuss combined noise types (such as detector transverse jitter, angular beam jitter before the interferometer, and turbulence) for which the interferometric weak value technique gives higher Fisher information than conventional methods. We go on to calculate the Fisher information of the recently proposed photon recycling scheme for beam deflection measurements, and show it further boosts the Fisher information by the inverse postselection probability relative to the standard measurement case.

I. INTRODUCTION
There has arisen considerable interest in the use of "weak value" techniques to improve the accuracy of precision measurement. While it has been recognized for some time that these techniques, in and of themselves, do not overcome fundamental limits for coherent light sources ("standard quantum limit") (see e.g. [1]), there are technical advantages in that these methods make the experimental approach to these limits relatively easy with common experimental equipment. Indeed, these techniques have already been successfully applied in the lab to measure with high precision the optical spin Hall effect and other polarization dependent beam deflections [2][3][4], interferometric deflections of optical beams [5][6][7], phase shifts [8][9][10], frequency shifts [11], temperature shifts [12], and velocity measurements [13]. In most of these experiments, the weak value amplification (WVA) technique met and even surpassed the sensitivity of standard techniques in the field. For a recent review of this and related weak value research, see Ref. [14].
Although these techniques have been employed by a number of different research groups and applied to metrological questions involving a number of different physical parameters, there are still some open questions and even controversy [1,15,16]. An important step on this question was made when Feizpour, Xingxing, and Steinberg [10] considered a more general kind of technical noise, and showed that so long as it has a long correlation time, WVA also helps suppress it in the signal-to-noise ratio (SNR). In other closely related work, Kedem [17], Brunner and Simon [18], and Nishizawa [19] also showed an increased performance of the SNR in the presence of technical noise.
In contrast to these results, recent papers have claimed that WVA gives no technical advantage [15,16]. The argument is justified by using a Fisher information analysis of technical noise applied to the signal carrier (e.g., beam displacement jitter). (We note the Fisher information analysis has been recently applied to WVA by other authors as well [13,20].) However, this particular form of technical noise does not represent the complete picture. There are many forms of technical noise that are not incorporated in this model. For example, in optical beam deflection, noise sources include electronics noise, transverse displacement and angular jitter, analog-to-digital discretization noise, turbulence, vibration noise of the other optical elements, spectral jitter, etc. In light of this criticism, it is our aim in this work to analyze some of these models and examples using Fisher information and maximum likelihood methods in order to understand in precisely what sense they give or fail to give a technical advantage. We also describe other technical advantages in beam deflection (and derivative) experiments where the imaginary WVA technique does lead to the optimal Fisher information even in the presence of some of the noise sources mentioned above.
The paper is organized as follows. In Sec. II, we introduce the concepts of Fisher information and maximum likelihood techniques, and illustrate how to apply them to Gaussian random measurements. We introduce weak value amplification and postselection in Sec. III. Uncorrelated, displacement technical noise is discussed in Sec. IV. Time correlated technical noise is analyzed in Sec. V. Air turbulence is discussed briefly in Sec. VI. The combination of displacement jitter and turbulence is discussed in Sec. VII, showing the weak value technique can suppress both. The weak value technique is shown to better suppress angular jitter in deflection measurements in Sec. VIII. We examine a recent photon recycling proposal in Sec. IX and show the Fisher information is boosted by the inverse postselection probability. Our conclusions are summarized in Sec. X.

II. FISHER INFORMATION OF AN UNKNOWN PARAMETER
Fisher information describes the available information about an unknown parameter in a given probability distribution. Consider an unknown parameter $d$, upon which some random variable $x$ depends. Let the probability distribution of $x$, given $d$, be $p(x|d)$. The score of the distribution is defined as $S = \partial_d \log p(x|d)$. Assuming $p$ is a smooth function, the average of $S$ over $p$ is 0, and its variance (second moment) is defined as the Fisher information,

$$I(d) = \int dx\, p(x|d)\, \left[\partial_d \log p(x|d)\right]^2 .$$

Fisher information is additive over independent trials, so for $N$ statistically independent measurements (such as the collection of $N$ photons from a coherent source), each carrying single-measurement Fisher information $I_1(d)$,

$$I(d) = N\, I_1(d).$$

Consider an unbiased estimator of $d$, called $\hat{d}$. This is any statistical estimator whose expectation value is $d$. The variance of $\hat{d}$ is bounded from below by the Cramér-Rao bound (CRB), $\mathrm{Var}[\hat{d}] \ge I(d)^{-1}$. Thus, the Fisher information sets the minimal possible uncertainty on $d$, for any unbiased estimator.
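To illustrate how this works, let us consider a Gaussian distribution for $p(\{x_i\}|d)$, which will describe $N$ independent measurements $\{x_i\}$ of an unknown mean with known variance $\sigma^2$ for each measurement,

$$P_G(\{x_i\}|d) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x_i - d)^2}{2\sigma^2}\right].$$

In this example, the score is given by $S = \sum_{i=1}^{N} (x_i - d)/\sigma^2$, so indeed $\langle S \rangle = 0$, and the Fisher information is given by

$$I(d) = \langle S^2 \rangle = \frac{N}{\sigma^2}.$$

Thus, the CRB on the variance is simply $\sigma^2/N$. Consequently, the minimum resolvable signal $d_{\min}$ will be of order $d_{\min} \sim \sigma/\sqrt{N}$. In order to achieve this minimum bound on the variance, the optimal unbiased estimator $\hat{d}_{\rm opt}$ (often called the efficient estimator) must be used. We can find it with maximum likelihood methods, by setting the score to zero and replacing $d$ by $\hat{d}_{\rm opt}$. In the case of $P_G$, we have

$$\sum_{i=1}^{N} \frac{x_i - \hat{d}_{\rm opt}}{\sigma^2} = 0, \qquad (5)$$

so we find the efficient estimator,

$$\hat{d}_{\rm opt} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad (6)$$

which is simply the average of the data in this case. We can check that its variance saturates the CRB:

$$\mathrm{Var}[\hat{d}_{\rm opt}] = \frac{1}{N^2} \sum_{i,j=1}^{N} \langle (x_i - d)(x_j - d) \rangle = \frac{\sigma^2}{N}.$$

In an optical context, this is the "standard quantum limit" scaling with $N$, which we can interpret as the photon number.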
Here the parameter d can be interpreted as the displacement of a beam with transverse width σ. One can immediately see that the maximum Fisher information occurs for the smallest allowable beam waist. This is intuitively obvious. If a beam of very small waist experiences a small shift in its mean value, a position sensitive detector (e.g., a split detector) would see a large change in intensity as a function of its position compared to a beam with a large waist.
In the rest of this paper, we will consider coherent Gaussian distributions for simplicity. While this is somewhat restrictive, it is also quite reasonable since most of the experiments have been performed using coherent Gaussian probability densities. This is also quite nice theoretically, owing to the fact that the log likelihood function is twice differentiable and we can use variations of the simple form of the Fisher information derived above.
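The Gaussian example of this section can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes the Fisher information $N/\sigma^2$ and the sample-mean estimator described above, and the function names, seed, and numerical values are illustrative choices only.

```python
import random
import statistics


def fisher_information_gaussian(n_samples: int, sigma: float) -> float:
    """Fisher information about the mean of n_samples i.i.d. Gaussian draws
    with known standard deviation sigma: I = N / sigma^2."""
    return n_samples / sigma**2


def cramer_rao_bound(n_samples: int, sigma: float) -> float:
    """Lower bound on the variance of any unbiased estimator of the mean."""
    return 1.0 / fisher_information_gaussian(n_samples, sigma)


def efficient_estimator(samples: list[float]) -> float:
    """Maximum likelihood estimate of the mean: the plain sample average."""
    return sum(samples) / len(samples)


def empirical_estimator_variance(
    d: float, sigma: float, n_samples: int, n_trials: int, seed: int = 0
) -> float:
    """Repeat the N-sample experiment n_trials times and return the
    sample variance of the resulting estimates."""
    rng = random.Random(seed)
    estimates = [
        efficient_estimator([rng.gauss(d, sigma) for _ in range(n_samples)])
        for _ in range(n_trials)
    ]
    return statistics.variance(estimates)


if __name__ == "__main__":
    bound = cramer_rao_bound(n_samples=100, sigma=2.0)
    observed = empirical_estimator_variance(
        d=0.3, sigma=2.0, n_samples=100, n_trials=2000, seed=1
    )
    print(f"CRB: {bound:.4f}, empirical variance of the average: {observed:.4f}")
```

With enough trials the empirical variance of the average should scatter around the bound $\sigma^2/N$, up to Monte Carlo fluctuations.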

III. REAL WEAK VALUES AND POSTSELECTION
To apply the above results to recent optical experiments, we briefly recall a few facts about weak values [21]. If a quantum system is prepared in an initial state $|i\rangle$, has a system operator $A$ that is measured by weakly interacting with a meter prepared in a state of spatial variance $\sigma^2$, and is then postselected in a final state $|f\rangle$ with probability $\gamma = |\langle f|i\rangle|^2$, the meter degree of freedom will be shifted by a multiplicative factor,

$$d \to A_w d, \qquad A_w = \frac{\langle f|A|i\rangle}{\langle f|i\rangle}, \qquad (7)$$

where $A_w$ is the "weak value" of the operator $A$, while leaving the width $\sigma$ unchanged (for the moment we consider the real weak value case for simplicity). Such behavior is in contrast to the non-postselected case, where if the initial state is an eigenstate of $A$, so that $A|i\rangle = a_i|i\rangle$, the average meter shift is $a_i d$, which can be much smaller in size than the weak value shift, $A_w d$. This process gives rise to a (normalized) Gaussian meter probability distribution consisting of $N' = \gamma N$ measurement events $\{x_i\}$,

$$P_{wv}(\{x_i\}|d) = \prod_{i=1}^{N'} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x_i - A_w d)^2}{2\sigma^2}\right].$$

Computing the CRB on the variance as before gives the score to be $S = \sum_{i=1}^{N'} A_w (x_i - A_w d)/\sigma^2$, and the Fisher information

$$I'(d) = \frac{N' A_w^2}{\sigma^2} = \frac{N}{\sigma^2}\, |\langle f|A|i\rangle|^2 .$$

Note that the post-selection probability $\gamma$ canceled out. Therefore, the Fisher information is the same as before, except for a factor $|\langle f|A|i\rangle|^2$, a number that can be arranged to approach 1 for a two-level system with a judicious choice of operator, pre- and post-selection (see Appendix). This is consistent with the SNR analysis of Ref. [1]. Similar points were made by Hofmann et al. [22]. We can then extract all the Fisher information from the weak value, showing that ideally, the WV technique can put all of the Fisher information into the post-selected events, which matches the Fisher information in the standard methods using all of the events. We note that it is not surprising that considering any subensemble gives less information than the whole ensemble.
What is surprising is that this particular small sub-ensemble carries all of the Fisher information. This fact alone gives us technical advantages, as we shall see. Following the maximum likelihood method presented in Eqs. (5,6), the weak value efficient estimator is given by

$$\hat{d}_{wv} = \frac{1}{A_w N'} \sum_{i=1}^{N'} x_i ,$$

where $N' = \gamma N$ is the number of post-selected events.
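The cancellation of the post-selection probability can be illustrated with a short numerical sketch. It assumes a specific two-level example that is not taken from the paper's Appendix: $A = \sigma_z$, a pre-selected state $\cos\alpha|0\rangle + \sin\alpha|1\rangle$ with $\alpha = \pi/4$, and a nearly orthogonal post-selected state with $\beta = -\pi/4 + \epsilon$. All function names and the value of $\epsilon$ are hypothetical.

```python
import math


def two_level_amplitudes(epsilon: float) -> tuple[float, float]:
    """Return (<f|i>, <f|A|i>) for A = sigma_z and real two-level states
    |i> = cos(a)|0> + sin(a)|1>, |f> = cos(b)|0> + sin(b)|1>."""
    alpha = math.pi / 4
    beta = -math.pi / 4 + epsilon  # nearly orthogonal to |i> for small epsilon
    overlap = math.cos(alpha) * math.cos(beta) + math.sin(alpha) * math.sin(beta)
    matrix_element = math.cos(alpha) * math.cos(beta) - math.sin(alpha) * math.sin(beta)
    return overlap, matrix_element


def fisher_standard(n_photons: float, sigma: float) -> float:
    """Fisher information using all N events: N / sigma^2."""
    return n_photons / sigma**2


def fisher_postselected(n_photons: float, sigma: float, epsilon: float) -> float:
    """Fisher information in the gamma*N post-selected events,
    gamma * N * A_w^2 / sigma^2, for a real weak value A_w."""
    overlap, matrix_element = two_level_amplitudes(epsilon)
    gamma = overlap**2                # post-selection probability
    a_w = matrix_element / overlap    # real weak value <f|A|i>/<f|i>
    return gamma * n_photons * a_w**2 / sigma**2


if __name__ == "__main__":
    n, sigma, eps = 1.0e6, 1.0, 0.05
    gamma = two_level_amplitudes(eps)[0] ** 2
    ratio = fisher_postselected(n, sigma, eps) / fisher_standard(n, sigma)
    print(f"fraction of photons kept: {gamma:.4f}")
    print(f"fraction of Fisher information kept: {ratio:.4f}")
```

In this example only a fraction $\sin^2\epsilon$ of the events is kept, while the retained fraction of the Fisher information is $\cos^2\epsilon$, which approaches 1 as $\epsilon \to 0$.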

IV. TYPE 1 TECHNICAL NOISE: DISPLACEMENT NOISE
As a next step, we include the effect of one kind of technical noise. We first consider the technical noise model of Knee and Gauger, where the $N$ random variables $\{x_i\}$ are independently distributed with mean $d$ and known variance $\sigma^2$ [15]. Gaussian white technical noise $\{\xi_i\}$, with mean 0 and covariance $\langle \xi_i \xi_j \rangle = J^2 \delta_{ij}$, is added to $\{x_i\}$. We note that similar models have also been previously discussed in Refs. [1,10,17]. Knee and Gauger comment that this kind of noise might represent transverse beam displacement jitter (in a collimated beam), for example. We call this type 1 technical noise. The measured signal is the sum $s_i = x_i + \xi_i$, so the distribution function for $s_i$ is

$$p(s_i|d) = \int dx_i\, d\xi_i\; P_G(x_i|d)\, P_J(\xi_i)\, \delta(s_i - x_i - \xi_i),$$

where $P_J$ denotes the Gaussian distribution of the technical noise. Carrying out the integration gives the distribution of the measured results as a convolution of the two distributions,

$$p(s_i|d) = \mathcal{N} \exp\left[-\frac{(s_i - d)^2}{2(\sigma^2 + J^2)}\right],$$

where $\mathcal{N}$ is a normalization constant. Calculating the Fisher information as before, for $N$ independent measurements (photons), gives

$$I(d) = \frac{N}{\sigma^2 + J^2}.$$

The technical noise simply broadens the width, decreasing the Fisher information. In the weak value case, we follow the same procedure, except that $d \to A_w d$ and $N \to N' = \gamma N$, where $\gamma$ is the post-selection probability. We add the same technical noise to the resulting distribution of post-selected events $\{s_i\}$, and get

$$p'(s_i|d) = \mathcal{N} \exp\left[-\frac{(s_i - A_w d)^2}{2(\sigma^2 + J^2)}\right].$$

Calculating the Fisher information for the postselected case (which must be reduced by $\gamma$, the post-selection probability) gives

$$I'(d) = \frac{\gamma N A_w^2}{\sigma^2 + J^2} = \frac{N\, |\langle f|A|i\rangle|^2}{\sigma^2 + J^2},$$

where we used the weak value formula (7). Consequently, as before, we find the Fisher information is modified by a factor of order 1 (which will decrease or keep the Fisher information the same, as discussed before). This is the same conclusion reached by Knee and Gauger [15] and Ferrie and Combes [16], namely that weak value amplification offers no increase of Fisher information for this kind of technical noise. This result was previously found by Feizpour, Xingxing, and Steinberg [10], who noted that type 1 technical noise could not be suppressed by the real weak values technique.
However, as we will show in Sec. VII, this is not the case for techniques implementing an imaginary weak value. We go on to find the efficient estimator in the postselected case, following the maximum likelihood method presented in Eqs. (5,6). The efficient estimator is given by

$$\hat{d}_{wv} = \frac{1}{A_w N'} \sum_{i=1}^{N'} s_i ,$$

with $N' = \gamma N$. In other words, one can efficiently estimate the parameter from the SNR. We note that in either the standard or WVA scheme, the Fisher information can be further improved by reducing the width of the meter state, but only until $\sigma \sim J$, after which the technical noise dominates the variance. We will return to this point in Sec. VII.

V. TYPE 2 TECHNICAL NOISE: TIME CORRELATED NOISE
In order to achieve the CRB, the Fisher information must not only be calculated, but the associated estimator must also be practical to implement. The point of practicality of the efficient estimator can be strongly made by considering the analysis of Feizpour, Xingxing, and Steinberg [10]. Consider an experiment with single photons, such that the measurement is only triggered by the photon detection at the detector, and consists of $N$ measured variables $\phi_j$ described by an average $\bar{\phi}$ plus a noise term $\eta_j$ that is correlated in general, $\langle \eta_i \eta_j \rangle = C_{ij}$ (we define this as type 2 technical noise). The noise term contains both quantum and technical noise. It is then straightforward to check that the average is given by $\langle (1/N) \sum_{i=1}^{N} \phi_i \rangle = \bar{\phi}$, and the variance is given by

$$V = \frac{1}{N^2} \sum_{i,j=1}^{N} C_{ij}.$$

The two limits considered in Ref. [10] are (i) the white noise limit, $C_{ij} = C\delta_{ij}$, and (ii) the fully correlated limit, $C_{ij} = C$. In case (i) we have $V = C/N$, while in case (ii) we have $V = C$. The SNR is given by $R = \bar{\phi}/\sqrt{V}$; thus averaging helps the SNR for white noise, but not for fully correlated noise. Now consider the postselected case, where $\bar{\phi} \to A_w \bar{\phi}$ and $N \to N' = \gamma N$.
In case (i), the variance is now $C/N'$ with $N' = \gamma N$, while the signal is $A_w \bar{\phi}$, so the SNR scales like $A_w \sqrt{\gamma}$. Since $A_w \sim \gamma^{-1/2}$, the small post-selection probability drops out, indicating there is no advantage to using post-selection in this case (exactly as we calculated in the previous sections for type 1 technical noise). However, in case (ii), the signal is boosted by the same amount, while the variance remains $C$ in the fully correlated case. Consequently, the SNR shows an advantage over the non-postselected case. Feizpour, Xingxing, and Steinberg go on to consider a particular noise model consisting of a combination of white and time-correlated noise, showing this advantage remains so long as the correlation time remains long compared to the time between photons (as it is in many optical implementations).
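The two limiting cases can be made concrete with a small sketch. It assumes the variance of the plain average is $(1/N^2)\sum_{ij} C_{ij}$, that the weak value scales as $A_w = \gamma^{-1/2}$, and that the post-selected events keep the same form of covariance matrix; the helper names and the numbers are illustrative only.

```python
import math


def white_cov(n: int, c: float) -> list[list[float]]:
    """Case (i): C_ij = C * delta_ij."""
    return [[c if i == j else 0.0 for j in range(n)] for i in range(n)]


def fully_correlated_cov(n: int, c: float) -> list[list[float]]:
    """Case (ii): C_ij = C for all i, j."""
    return [[c for _ in range(n)] for _ in range(n)]


def variance_of_average(cov: list[list[float]]) -> float:
    """Variance of the unweighted average: (1/N^2) * sum_ij C_ij."""
    n = len(cov)
    return sum(sum(row) for row in cov) / n**2


def snr(mean: float, cov: list[list[float]]) -> float:
    """Signal-to-noise ratio R = mean / sqrt(V) of the averaged data."""
    return mean / math.sqrt(variance_of_average(cov))


def snr_gain_from_postselection(n: int, gamma: float, c: float, make_cov) -> float:
    """Ratio of post-selected SNR to standard SNR, assuming A_w = gamma**-0.5
    and gamma*N surviving events with the same kind of covariance."""
    n_post = round(gamma * n)
    a_w = gamma**-0.5
    return snr(a_w, make_cov(n_post, c)) / snr(1.0, make_cov(n, c))


if __name__ == "__main__":
    for name, make_cov in (("white", white_cov), ("fully correlated", fully_correlated_cov)):
        gain = snr_gain_from_postselection(n=400, gamma=0.01, c=2.0, make_cov=make_cov)
        print(f"{name} noise: SNR gain from post-selection = {gain:.3f}")
```

Under these assumptions the gain is 1 for white noise and $\gamma^{-1/2}$ for fully correlated noise, in line with the argument above.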
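We can now revisit this model in the context of the Fisher information metric to see how it compares to the SNR metric. The joint probability distribution of all of the variables $\phi_i$ can be written, assuming Gaussian statistics, as

$$P(\mathbf{x}|\bar{\phi}) = \frac{1}{\sqrt{(2\pi)^N \det \mathbf{C}}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right].$$

Here, $\mathbf{x}$ is a vector of elements $\phi_i$, and $\boldsymbol{\mu}$ is a vector of the means (in this case, $\boldsymbol{\mu} = \bar{\phi}\,\mathbf{1}$ is a vector with the same mean in every element). The matrix $\mathbf{C}$ is the covariance matrix and has elements $C_{ij}$. $\mathbf{C}^{-1}$ is the inverse of the covariance matrix, and $\det \mathbf{C}$ is its determinant.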
In this case, we may calculate the Fisher information (2) about $\bar{\phi}$ contained in this distribution, and obtain

$$I(\bar{\phi}) = \mathbf{1}^T \mathbf{C}^{-1} \mathbf{1} = \sum_{i,j=1}^{N} [\mathbf{C}^{-1}]_{ij}. \qquad (15)$$

Notice this Fisher information contains a double sum of the elements of the inverse covariance matrix. The "diagonal" terms describe the self-correlation terms, and scale proportionally to $N$, recovering the independent trials result when the covariance matrix is diagonal. However, for strong correlations, such as the type Feizpour, Xingxing, and Steinberg consider, each element of the inverse covariance matrix can be of comparable value, so the Fisher information can scale at most like $N^2$, giving a much larger Fisher information than for uncorrelated noise. This extra information resides in the correlations between the different measurements, which the SNR metric misses.
When we go to the postselected case, the dimension of the matrix shrinks from $N$ to $\gamma N$, while the mean is boosted by $A_w \sim \gamma^{-1/2}$. The Fisher information is boosted by $1/\gamma$ as before, but the double sum in (15) now only goes to $\gamma N$ in the upper limit,

$$I'(\bar{\phi}) = A_w^2 \sum_{i,j=1}^{\gamma N} [\mathbf{C}'^{-1}]_{ij}.$$

The covariance matrix $\mathbf{C}'$ is different in general, since it now describes the correlations between the postselected photons only. Consequently, for white noise the Fisher information is the same, up to a factor of order 1, while for highly correlated noise, the Fisher information scales at most as $N^2 \gamma$, so it is actually decreased by a factor of $\gamma$ by the weak value technique, compared to the non-postselected case. It is easy to see why: the correlations of any single postselected photon with any rejected photon are lost in the detection scheme (unless further processing of the correlated missing photons is done), and consequently cannot be harnessed to further suppress the noise. In contrast to this, if the photon is correlated only with itself, then taking a random postselection will not hurt, and the Fisher information can stay the same.
However, this is not the whole story. We must ask what estimator saturates the CRB, and whether it is practical to implement. We can find this estimator, $\hat{\phi}$, using the maximum likelihood methods described earlier (5,6). We assume that the covariance matrix is positive definite, symmetric, and invertible. It then follows that its inverse is also symmetric. We find the estimator in the non-postselected case to be

$$\hat{\phi} = \sum_{j=1}^{N} f_j \phi_j, \qquad f_j = \frac{\sum_{i} [\mathbf{C}^{-1}]_{ij}}{\sum_{i,l} [\mathbf{C}^{-1}]_{il}}. \qquad (16)$$

We can check that the variance of this estimator matches the CRB,

$$\mathrm{Var}[\hat{\phi}] = \sum_{j,l} f_j f_l \langle \eta_j \eta_l \rangle = \sum_{j,l} f_j\, C_{jl}\, f_l = \frac{1}{\sum_{i,j} [\mathbf{C}^{-1}]_{ij}},$$

because the correlation of the two random variables gives the elements of the covariance matrix, $C_{jl}$, which cancels one of the inverse matrices in the sums, giving one factor of the denominator and resulting in the CRB we found above, the inverse of (15). However, there is a difficulty in this result: the experiment implementing the estimator (16) must multiply every data point $\phi_j$ by a different weighting factor, which depends on the rest of the data points. The experimenter must know exactly what the correlations are, how many data points are being collected, and must be able to weight each data point by a different factor in order to extract the maximal information in the data average. This is generally a very challenging experimental task. Therefore, the SNR, which treats every type of noise on equal footing (so the weighting assignment is $f_j = 1/N$ for all $j$), is usually the most practical option. Consequently, the SNR, although suboptimal as a means of estimation in this case, is still superior since the optimal estimator is impractical to implement for typical experiments. This is why the SNR is used by experimentalists: a complete categorization of the noise correlations is a formidable task requiring detailed knowledge of noise correlations and extensive post-processing. If we go on to consider non-Gaussian correlated noise, such as 1/f noise, the problem of implementing the estimator becomes even worse.
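To see what implementing the optimal estimator would involve, the sketch below computes the weights $f_j \propto \sum_i [\mathbf{C}^{-1}]_{ij}$ for a small covariance matrix and compares the resulting variance with that of the plain average ($f_j = 1/N$). The covariance model (white noise plus an exponentially decaying correlation) and all names are assumptions made for illustration, not the noise model of Ref. [10].

```python
def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def example_cov(n: int, white: float, corr: float, rho: float) -> list[list[float]]:
    """Hypothetical covariance: white noise plus corr * rho**|i-j|."""
    return [
        [corr * rho ** abs(i - j) + (white if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def optimal_weights(cov: list[list[float]]) -> tuple[list[float], float]:
    """Return the weights f_j and the Fisher information 1^T C^-1 1."""
    u = solve_linear(cov, [1.0] * len(cov))  # u = C^-1 1
    fisher = sum(u)
    return [ui / fisher for ui in u], fisher


def estimator_variance(weights: list[float], cov: list[list[float]]) -> float:
    """Variance of the weighted estimator: sum_jl f_j C_jl f_l."""
    n = len(cov)
    return sum(weights[j] * cov[j][l] * weights[l] for j in range(n) for l in range(n))


if __name__ == "__main__":
    cov = example_cov(n=6, white=1.0, corr=4.0, rho=0.9)
    weights, fisher = optimal_weights(cov)
    uniform = [1.0 / len(cov)] * len(cov)
    print("optimal weights:", [round(w, 3) for w in weights])
    print("optimal variance:", estimator_variance(weights, cov), "CRB:", 1.0 / fisher)
    print("plain-average variance:", estimator_variance(uniform, cov))
```

The weights depend on every entry of the covariance matrix and on the number of data points, which is the practical obstacle described above; the plain average needs none of that knowledge.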

VI. TYPE 3 TECHNICAL NOISE -AIR TURBULENCE
Another type of technical noise that is very important in open air experiments is turbulence, which we refer to here as type 3 technical noise. While we will not give a quantitative analysis of this effect here, it gives additional beam width broadening beyond the diffraction limit because of beam breathing (on a short time scale) and beam wander (on a longer time scale), both caused by propagation through the random medium [23]. This becomes an important problem when there is a large optical path length from the position where the deflection occurs to the detector where it is measured. Beam jitter from turbulence should not be underestimated when dealing with extremely small deflections. The beam wander effects become important on time scales longer than the ratio of the beam width to the typical air velocity. Typical experiments can run between seconds and hours, so this effect must be accounted for. The weak value schemes have a distinct advantage over the standard beam deflection measurement in having short optical path lengths. We will see this in detail in the next section.

VII. IMAGINARY WEAK VALUES AND TECHNICAL ADVANTAGES
We proceed to consider situations where imaginary WVA has technical advantages for combined technical noise types. Kedem points out that in the case of imaginary weak values, noise in the average position does not appear in making measurements in the momentum basis, and that noise on the average momentum helps the SNR [17]. We focus on a different effect unique to imaginary weak values. For definiteness, consider beam deflection measurements for a coherent Gaussian beam using standard deflection techniques versus a Sagnac interferometer weak value experiment. Most experiments to date that show technical advantages use imaginary weak values.
To start, assume that the system has no technical noise. The unknown parameter of interest is the deflection $k$ of the beam, which can be interpreted as the transverse momentum kick given by a mirror. Treps et al. showed that one can interfere a local oscillator of a first order TEM mode to achieve the optimal Fisher information [24]. However, for simplicity and to understand the role of beam diameters, we consider a more standard approach. A tilt or deflection on a beam requires propagation to observe a displacement. The standard method of measuring an unknown mirror tilt $k$ with a beam of width $\sigma$ is to propagate the light and focus it with a lens, thereby taking the Fourier transform of the beam, and to measure the beam with a split detector in the back focal plane of the lens (see Fig. 1(a), assuming $q = 0$). The lens transforms the tilt $k$ into a displacement $fk/k_0$ on the detector, where $f$ is the focal length and $k_0$ is the wavenumber. The lens takes the beam width $\sigma$ at the mirror to a beam width at the focus given by

$$\sigma_f = \frac{f}{2 k_0 \sigma}.$$

The single photon probability distribution function is then

$$P(x|k) = \mathcal{N} \exp\left[-\frac{(x - fk/k_0)^2}{2\sigma_f^2}\right],$$

where $\mathcal{N}$ is a normalization. If we send $N$ independent photons to measure $k$, this yields a Fisher information of

$$I(k) = \frac{N (f/k_0)^2}{\sigma_f^2} = 4 N \sigma^2,$$

which is an intuitive result. This means that in order to get the smallest beam waist in the back focal plane, one wants the largest beam waist possible before the lens. We now compare this with the Sagnac interferometer weak values result. The weak value technique for measuring the beam deflection (Fig. 1(b) with $q = 0$) gives the post-selected distribution that, when properly normalized, is a Gaussian distribution with mean $4k\sigma^2/\phi$ and variance $\sigma^2$. Here, $\phi$ is the phase difference of clockwise and counter-clockwise photons in the interferometer applied by the Soleil-Babinet compensator (SBC). This applies only to the post-selected fraction $N' = \gamma N$ of the photons that satisfy the postselection criterion, where $\gamma = \phi^2/4$.
The postselected single photon probability distribution function is then

$$P_{wv}(x|k) = \mathcal{N} \exp\left[-\frac{(x - 4k\sigma^2/\phi)^2}{2\sigma^2}\right].$$

The Fisher information for the postselected measurements, reduced by the postselection probability $\gamma$ ($N \to \gamma N$), is then found to be

$$I_{wv}(k) = \gamma N\, \frac{(4\sigma^2/\phi)^2}{\sigma^2} = 4 N \sigma^2. \qquad (21)$$

Thus, the weak values and standard method result in the same amount of classical Fisher information, as also discussed in the appendix [25]. It is important to realize that this calculation is only valid in the regime of small angles, $\sin^2(\phi/2) \approx \phi^2/4$. In other words, in the small angle or weak value regime, all of the Fisher information is in the photons leaving the dark port of the interferometer.

So how does noise affect the Fisher information for the two methods? As a first step, we will assume that there is no transverse beam deflection jitter, but only transverse detector jitter. In other words, there is a small transverse deflection $k$ with no technical noise on the beam, but the detector used to measure the beam has a transverse jitter $\xi$ that we only sample at the photon arrival times, giving type 1 technical noise. We first consider the standard method. As before, the new likelihood function is the convolution of the technical noise-free likelihood function with a Gaussian of width $J$. This simply broadens the variance of the beam at the detector to $\sigma_f^2 + J^2$, resulting in a Fisher information

$$I(k) = \frac{N (f/k_0)^2}{\sigma_f^2 + J^2} = \frac{4 N \sigma^2}{1 + (2 k_0 \sigma J / f)^2}.$$

This is a rather interesting result. It means that this approach can achieve the maximum Fisher information (as if there were no noise at all), but only in the limit of large focal length ($f \gg 2 k_0 \sigma J$). A large focal length is equivalent to opting for a larger displacement ($fk/k_0$) while allowing for a large focal spot ($\sigma_f \gg J$). This is once again an intuitive result: if a very small focal spot lands on a detector that has Gaussian random shifts, large differential intensity fluctuations will occur due to detector jitter, compared with a large focal spot.
For the imaginary weak value approach, the results are quite different. The beam waist for this case is given by $\sigma$, not the focused beam waist $\sigma_f$. This gives a Fisher information of

$$I_{wv}(k) = \gamma N\, \frac{(4\sigma^2/\phi)^2}{\sigma^2 + J^2} = \frac{4 N \sigma^2}{1 + J^2/\sigma^2}.$$

Therefore, since $\sigma \gg \sigma_f$ and there is freedom to choose $\sigma$ as large as one wishes, we can make the beam waist much larger than $J$, which both suppresses $J$ and increases the Fisher information (21). This shows that the Fisher information for the imaginary weak values approach remains essentially unchanged in the presence of transverse detector jitter, while the focused beam approach requires long focal lengths.
Hence, the measurement geometry plays a very important role in several ways. Ideally one would like a detector with a continuous detection distribution, but practical considerations mean there will always be dead space between finite width detectors. A simple method for obtaining nearly optimal beam detection is split detection. However, even in this case there is always a gap between the detectors, which sets a minimum beam diameter and thus a finite propagation through horizontal turbulence. This is not a problem for weak values since the detector can be placed immediately after the last beam splitter which mitigates the type 3 technical noise while also suppressing type 1 technical noise. The standard method operating in the regime where type 1 noise is completely ameliorated will suffer from the type 3 technical noise.
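The comparison for detector jitter can be sketched numerically. The two functions below assume the Gaussian models of this section: a focused spot of width $\sigma_f = f/(2k_0\sigma)$ displaced by $fk/k_0$ for the standard method, and a post-selected beam of width $\sigma$ displaced by $4k\sigma^2/\phi$ with $\gamma = \phi^2/4$ for the weak value method, each convolved with jitter of width $J$. The wavelength, beam size, focal length, and jitter amplitude are hypothetical values chosen only for illustration.

```python
import math


def fisher_standard_detector_jitter(
    n_photons: float, sigma: float, k0: float, focal_length: float, jitter: float
) -> float:
    """Standard (lens + split detector) method with detector jitter J:
    I = N (f/k0)^2 / (sigma_f^2 + J^2), with sigma_f = f / (2 k0 sigma)."""
    sigma_f = focal_length / (2.0 * k0 * sigma)
    return n_photons * (focal_length / k0) ** 2 / (sigma_f**2 + jitter**2)


def fisher_weak_value_detector_jitter(
    n_photons: float, sigma: float, phi: float, jitter: float
) -> float:
    """Imaginary weak value method with detector jitter J:
    I = gamma N (4 sigma^2 / phi)^2 / (sigma^2 + J^2), with gamma = phi^2 / 4."""
    gamma = phi**2 / 4.0
    return gamma * n_photons * (4.0 * sigma**2 / phi) ** 2 / (sigma**2 + jitter**2)


if __name__ == "__main__":
    n = 1.0e6
    sigma = 1.0e-3                 # beam width at the mirror, metres (assumed)
    k0 = 2.0 * math.pi / 780e-9    # wavenumber for an assumed 780 nm source
    f = 0.5                        # focal length, metres (assumed)
    jitter = 1.0e-5                # detector jitter, metres (assumed)
    ideal = 4.0 * n * sigma**2
    std = fisher_standard_detector_jitter(n, sigma, k0, f, jitter)
    wv = fisher_weak_value_detector_jitter(n, sigma, phi=0.1, jitter=jitter)
    print(f"standard method keeps {std / ideal:.4f} of the noise-free information")
    print(f"weak value method keeps {wv / ideal:.6f} of the noise-free information")
```

With these illustrative numbers the focused spot is only a few times wider than the jitter, so the standard method loses a noticeable fraction of the information, while the unfocused weak value beam is much wider than the jitter and loses almost none.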

VIII. TYPE 4 TECHNICAL NOISE -ANGULAR BEAM JITTER
We now consider angular beam jitter by modeling it as a random momentum kick q given to the beam before it enters the interferometer, or before it approaches the signal mirror in the standard method. This kick could be from air turbulence or mirror jitter.

A. Standard method results for angular jitter
We first consider the standard method of measuring the beam displacement illustrated in Fig. 1(a). The Fourier optics of this geometry is described by a series of unitary operators that act on the transverse degree of freedom in the paraxial approximation. Starting from an initial transverse state $|\psi\rangle$, which we take to be a Gaussian in transverse position with zero mean and variance $\sigma^2$, the unitary $U_q = \exp(iq\hat{x})$ gives the first random momentum kick $q$ to the beam. This is followed by a propagation of distance $l_1$, given by the unitary $U_{l_1} = \exp(-i\hat{p}^2 l_1/2k_0)$, where $k_0$ is the wavenumber of the light. This is followed by a momentum kick $k$ described by $U_{q \to k}$, and then the lens, which gives a quadratic phase front, $U_f = \exp(-ik_0\hat{x}^2/2f)$. The final propagation $U_{l_2}$ puts the beam at the measurement device, which is a position-sensitive detector (PSD).
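Taken together, we can describe the final state as

$$|\psi_f\rangle = U_{l_2}\, U_f\, U_{q \to k}\, U_{l_1}\, U_q\, |\psi\rangle .$$

To obtain an explicit form for the state, and the expectation of the position and its variance in this state, a series of complete sets of states are inserted between the unitaries. As an intermediate step, let us define the state $\psi_i$ as the state after the $k$ momentum kick, but before the lens. The action of the lens is diagonal in the position basis, but the subsequent propagation is diagonal in the momentum basis. Defining the coordinate at the detector as $x$, the state at the detector $\psi_d(x)$ is given by

$$\psi_d(x) = \int dp \int dx'\, \langle x|p\rangle\, e^{-ip^2 l_2/2k_0}\, \langle p|x'\rangle\, e^{-ik_0 x'^2/2f}\, \psi_i(x').$$

We can reverse the order of integration and perform the $p$ integration first. This gives

$$\psi_d(x) \propto \int dx' \exp\left[\frac{ik_0 (x - x')^2}{2 l_2} - \frac{ik_0 x'^2}{2f}\right] \psi_i(x').$$

Let us choose $l_2 = f$, so the terms quadratic in $x'$ in the exponential cancel out, which will lead to focusing the beam. This leaves the state as

$$\psi_d(x) \propto e^{ik_0 x^2/2f} \int dx'\, e^{-ik_0 x x'/f}\, \psi_i(x'),$$

which simply gives the scaled Fourier transform of $\psi_i$, but with a particular value of the momentum, $\tilde{\psi}_i(p \to k_0 x/f)$. The remainder of the solution involves finding the intermediate state. This is straightforward in the momentum basis, because the first three operators act simply in this basis: the kicks shift the momentum, and the propagation multiplies by a phase. Therefore, the momentum-space expression for $\psi_i$ is given by

$$\tilde{\psi}_i(p) \propto \exp\left[-\sigma^2 (p - k - q)^2 - \frac{i l_1 (p - k)^2}{2k_0}\right]. \qquad (28)$$

Putting these results together, we find that the final state at the detector gives a Gaussian distribution in position $x$ with average and variance of

$$\langle x \rangle = \frac{f}{k_0}(k + q), \qquad \langle x^2 \rangle - \langle x \rangle^2 = \left(\frac{f}{2\sigma k_0}\right)^2. \qquad (29)$$

The net result is that the random momentum kick $q$ simply adds to the signal $k$. Averaging this distribution over Gaussian random jitter of momentum $q$ with zero mean and variance $Q^2$ gives another Gaussian distribution for $x$, with mean of magnitude $fk/k_0$ (the displacement quoted in Sec. VII; the overall sign depends on the Fourier convention and does not affect the Fisher information) and wider variance $f^2[1/(2\sigma k_0)^2 + Q^2/k_0^2]$. Thus, the Fisher information, for $N$ independent measurements, about $k$ in this distribution is given by

$$I(k) = \frac{N (f/k_0)^2}{f^2 \left[1/(2\sigma k_0)^2 + Q^2/k_0^2\right]} = \frac{4 N \sigma^2}{1 + (2\sigma Q)^2}. \qquad (30)$$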

B. Weak value treatment of angular jitter
A similar analysis can be carried out for the weak value interferometer case. In addition to the momentum kick and the propagation steps, there are two unitaries that depend on the which-way operator $\hat{W} = |\circlearrowright\rangle\langle\circlearrowright| - |\circlearrowleft\rangle\langle\circlearrowleft|$, where the states $|\circlearrowright\rangle$, $|\circlearrowleft\rangle$ are clockwise and counterclockwise moving photon states inside the Sagnac interferometer. Superpositions of these states are created by the 50-50 beam splitter operating on the incoming beam. The unitaries are the relative phase shift operator, $U_\phi = \exp(i\phi\hat{W}/2)$, induced by the SBC, and the which-path momentum kick operator, $U_{Wk} = \exp(ik\hat{W}\hat{x})$, delivered by the signal mirror.
As shown in Fig. 1(b), the photons start in the state $|\Psi_i\rangle = |\psi\rangle (|\circlearrowright\rangle + i|\circlearrowleft\rangle)/\sqrt{2}$, and are post-selected on the which-path state (fixed, up to an overall phase, by orthogonality to the initial which-path state)

$$|\Psi_f\rangle = \frac{|\circlearrowright\rangle - i|\circlearrowleft\rangle}{\sqrt{2}}, \qquad (31)$$

which describes photons entering in one interferometer port and exiting the other interferometer port. Fortunately, the which-path states pass through all but two of the operators, so this amplitude may be simplified to eliminate the which-path states and operators, giving the state of the transverse beam at the detector, $\psi_{wv}(x)$, to be (up to an overall phase)

$$\psi_{wv}(x) = \langle x|\, e^{-i\hat{p}^2 l_2/2k_0}\, \sin(k\hat{x} + \phi/2)\, e^{-i\hat{p}^2 l_1/2k_0}\, e^{iq\hat{x}}\, |\psi\rangle . \qquad (32)$$

The detailed calculation of this state involves inserting complete sets of states, resulting in a complicated expression. The result is simplified by expanding to linear order in $\phi$ and $k$, since we assume the weak value ordering of parameters, $k\sigma < \phi < 1$. We must renormalize the post-selected distribution by the probability $\gamma$ of a photon exiting the dark port, given by

$$\gamma = \langle \sin^2(k\hat{x} + \phi/2) \rangle \approx (\phi/2)^2 + O(k\phi,\, k^2),$$

where the expectation value is taken in the beam state at the signal mirror. We drop the other terms compared with $(\phi/2)^2$, so the phase shift $\phi$ controls the post-selection. This gives an average displacement of

$$\langle x \rangle \approx \frac{4k\sigma^2}{\phi} + \frac{k\, l_1 (l_1 + l_2)}{\phi\, k_0^2 \sigma^2} + \frac{q (l_1 + l_2)}{k_0},$$

where we suppress higher order terms in $k$, $q$, $\phi$. The first term of order $k$ is the usual weak value term, amplified by $1/\phi$ [5]. The second term comes from the diffraction effects, and gives a small correction to the first term because we take $\sigma^2 \gg \sqrt{l_1 (l_1 + l_2)}/2k_0$. The remaining term, $q(l_1 + l_2)/k_0$, is just the free propagation from the momentum kick, and has a geometric optics interpretation of the imparted deflection angle $q/k_0$ times the total length of propagation. We note that the way $q$ enters into the average displacement has a very different form than in Eq. (29). We will return to this shortly. The displacement variance at the detector may be similarly calculated to find

$$\langle x^2 \rangle - \langle x \rangle^2 \approx \sigma^2 + \left(\frac{l_1 + l_2}{2 k_0 \sigma}\right)^2,$$

plus further corrections of order $k^2$. Thus, we see that the diffraction effects broaden the width of the beam, in proportion to the total path length. If we approximate the distribution as a Gaussian with the mean and variance discussed above, the Fisher information about $k$ can be found by averaging over $q$ as before to find, to leading order,

$$I_{wv}(k) \approx \frac{4 N \sigma^2}{1 + \left[\dfrac{l_1 + l_2}{2 k_0 \sigma^2}\right]^2 \left[1 + (2\sigma Q)^2\right]}.$$

Consequently, even if $\sigma Q$ is large compared to 1, so as to degrade the Fisher information in the standard method (30), in the weak values technique there is an additional suppression factor given by the amount of diffraction, $(l_1 + l_2)/(2 k_0 \sigma^2) \ll 1$, indicating that the weak value technique outperforms the standard method when dealing with angular jitter.
This result can be understood intuitively because the angular jitter directly adds to the detected deflection in the standard measurement case, whereas it is only a small correction to the deflection that is controlled by diffraction in the weak value technique.
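The comparison between the two methods can be illustrated numerically. The sketch below is a reconstruction under stated assumptions and is not the expression derived in the paper: it treats the post-selected distribution as a Gaussian with slope $d\langle x\rangle/dk = 4\sigma^2/\phi$, a jitter-induced shift $q(l_1 + l_2)/k_0$, a beam variance $\sigma^2$ (diffraction broadening neglected), and post-selection probability $\gamma = (\phi/2)^2$. The function names and the laboratory values in the demo are hypothetical.

```python
import math


def standard_fisher(n_photons, sigma, jitter_rms):
    """Standard method: the jitter Q adds directly to the signal k."""
    return 4.0 * sigma**2 * n_photons / (1.0 + 4.0 * sigma**2 * jitter_rms**2)


def weak_value_fisher(n_photons, sigma, jitter_rms, l1, l2, k0, phi):
    """Weak value method under the Gaussian assumptions stated above."""
    gamma = (phi / 2.0) ** 2                   # post-selection probability
    slope = 4.0 * sigma**2 / phi               # d<x>/dk, amplified by 1/phi
    jitter_var = (jitter_rms * (l1 + l2) / k0) ** 2  # geometric jitter shift
    variance = sigma**2 + jitter_var           # assumption: no diffraction broadening
    return gamma * n_photons * slope**2 / variance


if __name__ == "__main__":
    # Hypothetical values: 1 mm beam, 800 nm light, 1 m total path length.
    sigma, k0 = 1e-3, 2.0 * math.pi / 800e-9
    jitter_rms = 5.0 / sigma                   # sigma * Q = 5
    n = 1e6
    ratio = (weak_value_fisher(n, sigma, jitter_rms, 0.5, 0.5, k0, 0.1)
             / standard_fisher(n, sigma, jitter_rms))
    print("weak value / standard Fisher information:", ratio)
```

Under these assumptions $\phi$ cancels from the weak value result, and the jitter term is multiplied by the square of the diffraction factor $(l_1 + l_2)/(2k_0\sigma^2)$ relative to the standard case, which is why the ratio exceeds one when that factor is small.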
The rather remarkable properties associated with imaginary weak value experiments may not be entirely attributed to weak values, but also to geometric terms associated with the experiments. These terms, such as beam waist diameter, arise even when WVA is small. However, it is important to note that only in the limit of large WVA can all of the information in the measurement be placed in the measured photons and an optimal SNR estimation of the parameter be made. Therefore, these geometric terms and WVA work hand-in-hand to achieve the technical noise suppression.

IX. FISHER INFORMATION FOR RECYCLED PHOTONS
We can also consider the possible benefit to the Fisher information of the recent photon recycling proposal of Dressel et al. [26]. Considering photons as a resource, one can ask what maximum amount of information can be extracted from a given set of photons. In much the same way as a high finesse cavity can increase phase sensitivity in an interferometer, recycling photons from the bright port of a weak value experiment can increase the information about a parameter. The central idea is to postselect all of the photons while keeping the large weak value amplification. The scheme is to recycle the rejected photons by closing off the interferometer, so that after $\gamma N$ photons are postselected the remaining $N_1 = N - \gamma N$ photons are re-injected and once again sample the unknown parameter $k$. The authors show that it is critical that the rejected light be reshaped to once again be in its original profile, otherwise all amplification of the split detector SNR is erased over many cycles. Below, we consider the simplest version, where no propagation effects are included. In that case, the second round of postselected light has exactly the same distribution on the detector, with the same post-selection probability $\gamma$, so the Fisher information for this cycle is $I_1 = (\gamma N_1)\,4\sigma^4/(\gamma\sigma^2) = 4\sigma^2 N_1$. This process is now repeated many times, where $N_i$ is the number of rejected photons on the $i$th round, and we define $N_0 = N$. The uniform postselection probability indicates

$$N_i = (1-\gamma)^i N.$$

Since each measurement is independent from the last, the Fisher information simply adds, giving a total of

$$I_{\rm tot} = \sum_{i=0}^{\infty} 4\sigma^2 N_i = 4\sigma^2 N \sum_{i=0}^{\infty} (1-\gamma)^i = \frac{4\sigma^2 N}{\gamma}.$$

Therefore, the Fisher information has been boosted by a factor of $1/\gamma$ compared to the standard method or the single pass weak value method. Here, we are considering the photon number $N$ as the resource.

Of course, as is pointed out in Ref. [26], one could have been sending more light onto the detector using the standard method in the time taken for the light recycling, or employed other standard schemes, but there could be many technical reasons why this may be impossible or inadvisable given a laboratory set-up. There may be a minimum quiet time between laser pulses, for example, or the detector may have a low light power threshold. We note that the profile reshaping process actually removes information about the parameter $k$, but does so in a way that the estimator can be simply and easily implemented as the split detection SNR. Together with the technical advantages already discussed, this is an important improvement over previous techniques. When further combined with quantum light techniques, this method gives a powerful advantage for estimating a parameter. The fact that all photons are now collected permits one to further mine interphoton correlations for noise suppression.
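The accumulation of Fisher information over recycling rounds can be sketched as a simple sum. This is a minimal illustration, assuming lossless recycling, perfect profile reshaping, a fixed post-selection probability $\gamma$ on every round, and photon numbers treated as continuous; the function name and the numerical values are hypothetical.

```python
def recycled_fisher_information(n_photons, sigma, gamma, n_rounds):
    """Total Fisher information about k after n_rounds of photon recycling."""
    total = 0.0
    remaining = float(n_photons)               # N_0 = N
    for _ in range(n_rounds):
        total += 4.0 * sigma**2 * remaining    # each round gives 4 sigma^2 N_i
        remaining *= 1.0 - gamma               # rejected photons are re-injected
    return total


if __name__ == "__main__":
    n, sigma, gamma = 1e6, 1.0, 0.01
    single_pass = recycled_fisher_information(n, sigma, gamma, 1)
    many_rounds = recycled_fisher_information(n, sigma, gamma, 5000)
    print("boost over a single pass:", many_rounds / single_pass)
    print("inverse post-selection probability:", 1.0 / gamma)
```

With a finite number of rounds $M$, the boost is $[1 - (1-\gamma)^M]/\gamma$, which approaches $1/\gamma$ once $M$ is much larger than $1/\gamma$.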

X. CONCLUSIONS
We have shown how weak value based measurement techniques can give certain technical advantages to precision metrology. First and foremost, the Fisher information in a weak value measurement (which uses a small fraction of the available light) can be as large as the Fisher information of a standard measurement (which uses all of the available light). This is remarkable because the remaining light can be sent to another experiment [11], or recycled [26] to give even higher amounts of Fisher information.
We have also explored technical advantages that weak value experiments can have over standard measurement techniques. Obvious advantages, such as when the detectors saturate at a certain light intensity, have been pointed out in previous works [1,5]. The possible advantages for different types of technical noise should be investigated on a case by case basis. There are cases where the weak value technique gives advantages, other cases where it is at a disadvantage, and yet other cases where there is no difference. It is clear, for example, that dephasing noise will reduce the size of the weak value, which will be detrimental to this technique [27]. We have shown how detector noise and air turbulence can both be eliminated by a weak value deflection measurement, whereas the conventional standard method must suffer from turbulence jitter if detector noise is suppressed in open air experiments. We also demonstrated that for angular jitter, the weak value technique for beam deflection measurements can have much higher Fisher information than the standard technique. Considering the wide range of experiments that have now successfully employed weak value techniques to make high precision measurements [1-7,9-13], this should not be too surprising.
Another conclusion we reach is that it is not sufficient to show that an estimator does not reach the CRB to decide it should be rejected. Rather, the optimal estimator should be found, and must be practically implementable. If it is not, the inefficient (but practical) estimator is superior. We have argued that time-correlated technical noise is one example where the difficulty of implementing the optimal estimator is outweighed by the option of implementing the postselection with amplification.

$A_w = -i\cot\phi$. In order to maximize $\gamma$ we choose $\Theta = \pi/2$, so the results are $|i\rangle = (|+\rangle + e^{i\Phi}|-\rangle)/\sqrt{2}$ and $|f\rangle = (e^{-i\phi}|+\rangle - e^{i\Phi + i\phi}|-\rangle)/\sqrt{2}$. Finally, in the small angle approximation, $\gamma \approx \phi^2$ and $A_w \approx -i/\phi$, making $|\langle f|i\rangle\,\mathrm{Im}(A_w)| \approx 1$. On the other hand, a pure real weak value can be obtained by setting $\phi = 0$, so that $\gamma = \sin^2\theta$ and $A_w = \sin(\Theta + \theta)/\sin\theta$. In order to make $A_w$ large for this case, we again choose $\Theta = \pi/2$. The preselection is identical to the former case, $|i\rangle = (|+\rangle + e^{i\Phi}|-\rangle)/\sqrt{2}$, and the postselection takes the form $|f\rangle = \sin\theta\,(|+\rangle + e^{i\Phi}|-\rangle)/\sqrt{2} + \cos\theta\,(|+\rangle - e^{i\Phi}|-\rangle)/\sqrt{2}$. Taking the small angle approximation for this case, we obtain a result similar to the pure imaginary weak value case: $\gamma \approx \theta^2$, $A_w \approx 1/\theta$, and $|\langle f|i\rangle\,\mathrm{Re}(A_w)| \approx 1$.
Setting $\Theta = \pi/2$ turns out to be the best choice for practical purposes. We therefore calculate the products $|\langle f|i\rangle\,\mathrm{Re}(A_w)|$ and $|\langle f|i\rangle\,\mathrm{Im}(A_w)|$ for such a choice, and plot them in Fig. 3. Fig. 3(a) shows that the largest value, $|\langle f|i\rangle\,\mathrm{Re}(A_w)| = 1$, is most closely approached if $\phi = 0$ and $|\theta| \ll 1$, which corresponds to a pure real weak value and an almost orthogonal postselection. Similarly, Fig. 3(b) shows that $|\langle f|i\rangle\,\mathrm{Im}(A_w)| \approx 1$ only if $\theta = 0$ and $|\phi| \ll 1$, defining a pure imaginary weak value with almost orthogonal postselection.
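The pre- and postselected states for $\Theta = \pi/2$ can be checked numerically. The following is a minimal sketch that assumes the measured observable is $\hat A = |+\rangle\langle +| - |-\rangle\langle -|$; the definition of the observable does not appear in this section, so that choice is an assumption, as are the function names. States are stored as amplitude pairs in the $|\pm\rangle$ basis.

```python
import cmath
import math


def pre_state(Phi):
    """|i> = (|+> + e^{i Phi}|->)/sqrt(2)."""
    return [1.0 / math.sqrt(2.0), cmath.exp(1j * Phi) / math.sqrt(2.0)]


def post_state_imaginary(phi, Phi):
    """|f> = (e^{-i phi}|+> - e^{i(Phi + phi)}|->)/sqrt(2), i.e. theta = 0."""
    return [cmath.exp(-1j * phi) / math.sqrt(2.0),
            -cmath.exp(1j * (Phi + phi)) / math.sqrt(2.0)]


def post_state_real(theta, Phi):
    """|f> for phi = 0: sin(theta) and cos(theta) mix of the two superpositions."""
    s, c = math.sin(theta), math.cos(theta)
    return [(s + c) / math.sqrt(2.0),
            (s - c) * cmath.exp(1j * Phi) / math.sqrt(2.0)]


def overlap_and_weak_value(f, i):
    """Return <f|i> and A_w = <f|A|i>/<f|i> for the assumed A = |+><+| - |-><-|."""
    overlap = f[0].conjugate() * i[0] + f[1].conjugate() * i[1]
    numerator = f[0].conjugate() * i[0] - f[1].conjugate() * i[1]
    return overlap, numerator / overlap


if __name__ == "__main__":
    i = pre_state(0.3)
    ov, aw = overlap_and_weak_value(post_state_imaginary(0.05, 0.3), i)
    print("imaginary case:", abs(ov) ** 2, aw, abs(ov * aw.imag))
    ov, aw = overlap_and_weak_value(post_state_real(0.05, 0.3), i)
    print("real case:", abs(ov) ** 2, aw, abs(ov * aw.real))
```

Under this assumption the products $|\langle f|i\rangle\,\mathrm{Im}(A_w)|$ and $|\langle f|i\rangle\,\mathrm{Re}(A_w)|$ evaluate to $\cos\phi$ and $\cos\theta$ respectively, so both approach one in the small angle limit, independent of $\Phi$.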