The quantum Bell-Ziv-Zakai bounds and Heisenberg limits for waveform estimation

We propose quantum versions of the Bell-Ziv-Zakai lower bounds on the error in multiparameter estimation. As an application we consider measurement of a time-varying optical phase signal with stationary Gaussian prior statistics and a power law spectrum $\sim 1/|\omega|^p$, with $p>1$. With no other assumptions, we show that the mean-square error has a lower bound scaling as $1/{\cal N}^{2(p-1)/(p+1)}$, where ${\cal N}$ is the time-averaged mean photon flux. Moreover, we show that this accuracy is achievable by sampling and interpolation, for any $p>1$. This bound is thus a rigorous generalization of the Heisenberg limit, for measurement of a single unknown optical phase, to a stochastically varying optical phase.


I. INTRODUCTION
The probabilistic nature of quantum mechanics imposes fundamental limits to hypothesis testing and parameter estimation [1][2][3][4][5]. Such limits are relevant to many metrological applications, such as optical interferometry, optomechanical sensing, gravitational-wave detection [6][7][8][9], optical imaging [10][11][12], magnetometry, gyroscopy, and atomic clocks [13]. The ultimate quantum limits to parameter estimation have been studied extensively in recent years, as they imply that a minimum amount of resource, such as the average photon number for optical phase estimation, is needed to achieve a desired precision, regardless of the measurement method.
Many real-world tasks, such as optical imaging [37,38], quantum tomography and system identification [39,40], and time-varying signal (waveform) estimation [7,9,[41][42][43][44], require the estimation of multiple parameters. Multiparameter quantum Cramér-Rao bounds have been known since the 1970s [7,[45][46][47], but efforts to derive multiparameter Heisenberg limits from these bounds have not been successful. This is not surprising since, even in the case of single-parameter phase estimation, it is not possible to derive the Heisenberg limit from the quantum Cramér-Rao bound without additional assumptions on the state [48].
In Ref. [41], some of us recently proposed a Heisenbergstyle limit for the estimation of an optical phase waveform with stationary Gaussian prior statistics and a power-law spectrum. However, that limit, being derived from a quantum Cramér-Rao bound, requires additional assumptions: It applies only to the specific class of optical beams described by Gaussian fields, with statistics that are both stationary and time symmetric. A very different approach is given in Ref. [49], which derives a multiparameter Heisenberg limit for independent parameters by applying the single-parameter Heisenberg limit to each parameter. In practice, multiple parameters often have nontrivial prior correlations, particularly in the case of continuous waveform estimation, where the correlations are crucial to pose the problem [50]. Thus, the existence of general Heisenberg limits for such cases has remained an open question.
In this paper, we derive new quantum bounds on multiparameter estimation by developing quantum versions of the Bell-Ziv-Zakai bounds [51]. The Bell-Ziv-Zakai bounds were proposed in 1997 by Bell et al. [52], building upon the Ziv-Zakai bound [53], and further generalized by Basu and Bresler [54]. We then apply our bounds to the notable task of quantum optical phase waveform estimation. Here, the waveform to be estimated is a time-varying phase shift signal XðtÞ applied to an optical beam. For a waveform XðtÞ with stationary Gaussian prior statistics and a power-law spectrum (∝ 1=jωj p ; p > 1), we derive a lower bound on the mean-square error with a 1=N 2ðp−1Þ=ðpþ1Þ scaling, where N is the mean photon flux. This proof confirms that the scaling previously proposed in Ref. [41] is valid for arbitrary quantum states. Moreover, we show that this scaling is achievable for all p > 1. Previously, achievability has been shown only numerically, and only for p ¼ 2 [55]. By contrast, the results in the current paper are rigorous Heisenberg bounds, which are both applicable to arbitrary field states and achievable up to a constant prefactor for all p > 1.
This paper is separated into two main parts. The first part, Secs. II-V, assumes unbounded parameters and focuses on the mean-square error. The second part, Secs. VI-VII, focuses on periodic figures of merit, which are more appropriate for periodic parameters such as phase or orientation angles for gyroscopy. They are also insensitive to phase-wrap errors and enable us to demonstrate that our bounds are achievable in more general situations.

A. Classical estimation
First, we summarize known results for the classical estimation problem, following Refs. [52,56], and then present new quantum versions of the bounds in Sec. II B. Let X be a column vector of unknown real parameters given by ð2:1Þ and let Y be a column vector of observations given by ð2:2Þ Both X and Y are random variables; we denote the prior probability density by P X ðxÞ and the conditional observation probability density by P YjX ðyjxÞ. In other words, P X ðxÞ (for example) denotes the probability density that X ¼ x, as distinct from PrðeÞ, which we use to denote the probability of an event e. Furthermore, let X ̬ ðYÞ be an estimator of the unknown X from the observations Y. We use the vee accent (X ̬ ) for the estimator, rather than the hat (X) common in statistics, to avoid confusion with quantum operators.
Define the error vector for the estimator as ϵðX; YÞ ≔ X ̬ ðYÞ − X: ð2:3Þ To characterize the performance of the estimator, we consider an error measure, or "distortion function," of the form Dðu ⊤ ϵÞ, where u is an arbitrary real vector with J elements that defines and weights the error components of interest. Here, ⊤ denotes the transpose, so u ⊤ ϵ ¼ P j u j ϵ j . For example, the choices DðτÞ ¼ τ 2 and u j ¼ δ jk yield the distortion function The expectation value E½D k in this case is the mean-square error in the estimate for the parameter X k . The notation E denotes expectation over the random variables X and Y. More generally, the choice DðτÞ ¼ τ 2 , with arbitrary u, yields the mean-square error where Σ is the error covariance matrix Σ ≔ Eðϵϵ ⊤ Þ: ð2:6Þ We focus on distortion functions with the properties that they are: nondecreasing on ½0; ∞Þ; differentiable, with a derivative denoted by _ DðzÞ; symmetric, that is, DðzÞ ¼ Dð−zÞ; and satisfying Dð0Þ ¼ 0. We can then express the expected distortion with respect to the probability density P ju ⊤ ϵj ðzÞ, for the random variable ju ⊤ ϵj, as dzDðzÞP ju ⊤ ϵj ðzÞ: ð2:7Þ Since the probability density is equal to the negative derivative of the cumulative probability function Prðju ⊤ ϵj ≥ zÞ, we obtain, via integration by parts, Pr ju ⊤ ϵj ≥ τ 2 : ð2:8Þ In the last line, we have made the substitution z ¼ τ=2, to conform with the notation of Ref. [52].
Since _ D is nonnegative by assumption, a lower bound on the expected distortion can be obtained by lower bounding the probability Prðju ⊤ ϵj ≥ τ=2Þ. The central idea of the Ziv-Zakai family of bounds is to bound Prðju ⊤ ϵj ≥ τ=2Þ using the error probability in a related binary hypothesis-testing problem. To this end, let v be a real column vector with J elements that satisfies the restriction with respect to the error-weighting vector u. This is an important restriction on v which we will use throughout the paper. We rewrite Prðju ⊤ ϵj ≥ τ=2Þ as follows: where π 0 ≔ P X ðxÞ P X ðxÞ þ P X ðx þ vτÞ ; ð2:12Þ In Eq. (2.11), we have used the restriction that u ⊤ v ¼ 1.
The term in square brackets in the last two lines of Eq. (2.11) is the probability of error for distinguishing x from x þ vτ with prior probabilities π 0 and π 1 , respectively, from the observations Y and a particular decision rule. This decision rule is that, if u ⊤ X ̬ ðYÞ ≤ u ⊤ x þ τ=2, the decision is taken to choose x, and if u ⊤ X ̬ ðYÞ > u ⊤ x þ τ=2, the decision is taken to choose x þ vτ. Because this is the probability of error for a given decision rule, it cannot be smaller than the minimum probability of error, which we denote P e ðx; x þ vτÞ. To be explicit, this minimum probability of error is given by [57,58] P e ðx; x þ vτÞ An alternative lower bound can be obtained by replacing P X ðxÞ and P X ðx þ vτÞ with minfP X ðxÞ; P X ðx þ vτÞg in Eq. (2.10), which gives This time, the expression in square brackets is the error probability for a decision problem with equal prior probabilities. Let P el e ðx; x þ vτÞ denote the minimum error probability for the decision problem with equal prior probabilities. It can be obtained from Eq. (2.14) by substituting π 0 ¼ π 1 ¼ 1=2. This yields dx minfP X ðxÞ; P X ðx þ vτÞgP el e ðx; x þ vτÞ:

ð2:19Þ
Using this result in Eq. (2.8) then gives the alternative Bell-Ziv-Zakai bound [52,56] QUANTUM BELL-ZIV-ZAKAI BOUNDS AND HEISENBERG … PHYS. REV. X 5, 031018 (2015) 031018-3 For simplicity, here we have omitted a "valley-filling" function used in Ref. [56] that can, in principle, tighten the bounds further, but makes analytic results more difficult to obtain; see Appendix A for the tighter bounds. Note that, in both Eqs. (2.15) and (2.19), Prðju ⊤ ϵj ≥ τ=2Þ is lower bounded by an overlap integral between two terms. The first term depends only on prior information: the prior probability density P X ðxÞ and its translated form P X ðx þ vτÞ. The second term further depends on posterior information: the degree to which the observations Y can distinguish Finding the translation vector v that maximizes the overlap, subject to the constraint u ⊤ v ¼ 1 for a given error-weighting vector u, will give the tightest lower bounds.
This concludes the summary of prior results for the classical case, and we next present our bounds for quantum estimation.

B. Quantum estimation
For the quantum parameter estimation problem, letρ X be the density operator that describes the state of a quantum probe as a function of the unknown parameters X, and let E Y ðyÞ be the positive operator-valued measure (POVM) that specifies the observation statistics [1]. The likelihood function is then ð2:21Þ with tr denoting the operator trace. It is known [1,28] that, for any POVM, We call Eqs. (2.26)-(2.29) the quantum Bell-Ziv-Zakai bounds.

C. Special conditions: Homogeneous displacements and Gaussian prior
To derive further analytic results, we focus on the case in which the quantum probe stateρ x is related to an input probe stateρ byρ where fα x g is a set of completely positive trace-preserving (CPTP) maps satisfying the homogeneity condition Here, the addition can be normal vector addition, or it can be modulo addition. This includes, for example, unitary transformations such as optical phase shifts and phasespace translations (see Secs. III and IV below), as well as homogeneous decoherence processes such as the addition of Gaussian noise to optical modes (with x being the noise variances) [61] or lossy transmission of optical modes (with x being a logarithmic measure of transmission). Now, since the trace norm cannot increase under CPTP maps [60], one has ∥π 0ρx − π 1ρxþvτ ∥ 1 ¼ ∥α x ðπ 0ρ − π 1ρvτ Þ∥ 1 ≤ ∥π 0ρ − π 1ρvτ ∥ 1 , implying thatρ x andρ xþvτ may be replaced byρ andρ vτ , respectively, in each of the quantum Ziv-Zakai bounds in Eqs. (2.26) and (2.28). Similarly, since the fidelity is nondecreasing under such maps [60], one may replace Fðx; x þ vτÞ by Fð0; vτÞ in the bounds (2.27) and (2.29). As a consequence, the integration over x in these bounds greatly simplifies, being dependent only on the prior probability densities P X ðxÞ and P X ðx þ vτÞ.
Next, we impose the condition that P X ðxÞ is a multivariate Gaussian distribution. In terms of its covariance matrix (see also Appendix B). Note from Eq. (2.33) that τ 0 ðvÞ sets the approximate scale of τ above for which the prior density P X ðxÞ can be reliably distinguished from the translated density P X ðx þ vτÞ. A convenient lower bound on the complementary error function, useful for obtaining analytic results in the next section, is where Λ is the triangle function ΛðzÞ ≔ maxð1 − jzj; 0Þ, as shown in Fig. 1. Thus, for homogeneous displacements having a Gaussian prior probability density, the quantum Ziv-Zakai bounds in Eqs. (2.28) and (2.29) may be replaced by the simplified bounds ð2:38Þ

III. ESTIMATION FOR COMMUTING GENERATORS WITH GAUSSIAN PRIOR
We now consider the problem of probe states displaced by commuting generators, as illustrated in Fig. 2, as well as a Gaussian prior distribution for X. For this problem, the output probe state has the general form FIG. 2. The quantum estimation problem for commuting generators. The initial state ρ may be entangled, and each subsystem or mode undergoes a displacement X j generated by operatorĝ j , with ½ĝ j ;ĝ k ¼ 0. The output may be measured via some general joint measurementÊ Y ðyÞ. ρ X ¼ α X ðρÞ ≔ expð−iX ⊤ĝ Þρ expðiX ⊤ĝ Þ; ð3:1Þ whereρ is the initial quantum state andĝ is a column vector of shift generators with ½ĝ j ;ĝ k ¼ 0. The operatorsĝ j must be Hermitian but are otherwise unrestricted. Note this includes multimode phase estimation as a special case, withĝ j corresponding to the photon number operatorn j of mode j, and X j to the phase shift generated byn j . Equation (3.1) satisfies the homogeneous displacement property in Eq. (2.31), and more generally, it would satisfy this property for ½ĝ j ;ĝ k ¼ c jk ¼ constant. With the assumption of a Gaussian prior, the simplified quantum Ziv-Zakai bounds in Eqs. (2.37) and (2.38) are applicable. We now simplify the latter of these bounds further, by relating the fidelity term Fð0; vτÞ to the expectation value ofĝ.
First, note that if jψi is a purification ofρ on some larger Hilbert space, then expðiτv ⊤ĝ Þjψi is a purification ofρ vτ . Hence, defining hÔi ≔ trðÔρÞ and CðgÞ ≔ hgjρjgi for eigenstate jgi ofĝ, Uhlmann's theorem [59,60] yields the lower bound CðgÞCðg 0 Þ cos½τv ⊤ ðg − g 0 Þ: ð3:2Þ We are ultimately interested in obtaining a Heisenberg limit, i.e., a limit scaling linearly with the modulus of the shift generator. In order to achieve this, we use the linear bound [23] cos θ ≥ 1 − λjθj ð 3:3Þ on the cosine function, where λ ≈ 0.7246 is a solution of λðπ − arcsin λÞ ¼ 1 þ ffiffiffiffiffiffiffiffiffiffiffiffi 1 − λ 2 p , as depicted in Fig. 3. This leads to where jvj is the vector formed from v by taking the absolute values of each of its elements, and similarly, jĝj is the vector formed by replacing each eigenvalue of each elementĝ j of g by its absolute value. Since fidelity is non-negative, a tighter bound is Recall that both τ 0 in Eq. (2.35) and τ F in Eq. (3.6) vary inversely with the magnitude of v, ensuring that both sides of Eq. (3.7) vary as the square of the magnitude of u. While τ 0 relates to the prior information (the narrower the prior distribution, the smaller τ 0 is), τ F relates to the average magnitude of the generators with respect to the input state (the greater this average is, the smaller τ F is). These properties are quite intuitive-the size of the lower bound on the mean-square error decreases with more prior information or for a quantum state with greater potential sensitivity to any given displacement value x.

A. Taking the continuous limit
We now consider an optical beam where XðtÞ is a phase signal that is a continuous function of time t, as illustrated in Fig. 4. We use Σ 0 ðt; t 0 Þ to denote the covariance function for the prior correlations between XðtÞ and Xðt 0 Þ, and we again take the probability distribution to be Gaussian. The requirement that almost all (in a measure sense) realizations xðtÞ of XðtÞ be continuous can be ensured by requiring Σ 0 ðt; t 0 Þ to be continuous at t ¼ t 0 .
We seek a continuous analogue to the quantum Bell-Ziv-Zakai bound obtained above. In other words, we would like to replace all sums in the preceding section by integrals. Thus, the measurement uncertainty, the quantity on which we wish to place a lower bound, is quantified by where Σðt; t 0 Þ is the error covariance function for the estimates at times t and t 0 , and uðtÞ is a given real function. This expression is the continuous analog of u ⊤ Σu in the discrete Bell-Ziv-Zakai bounds above. Similarly, for a continuous phase signal, the sum P j X jnj for a multimode phase shift in Eq. (3.1) (where we have specialized to the case whereĝ j is the photon number operatorn j of mode j) is replaced with an integral using the photon flux operator. In other words, the probe stateρ is modulated by the phase signal XðtÞ to givê where integration is over the real line. The photon flux operatorÎðtÞ [3,62,63] can be regarded as the Δt → 0 limit of Δn=Δt, where Δn is the number operator corresponding to the photons in a time interval ðt; t þ ΔtÞ. Conversely, the integral ofÎðtÞ over some time interval gives the operator for the number of photons in that time interval. In the continuum limit, withρ X given by Eq. (4.2), the bound we wish to derive on the measurement uncertainty as quantified by Eq. (4.1), and the constraints that define that bound, should also be defined in terms of time integrals. But to proceed rigorously, we first need to turn the continuous expressions into discrete expressions by using a small time unit, δt > 0. This allows us to apply the discrete quantum Bell-Ziv-Zakai bound in Eq. (3.7), prior to taking the limit δt → 0. The general principle is that the state can be approximated by a succession of modes of duration δt such that XðtÞ does not vary significantly during this time. More specifically, we first approximate XðtÞ as a constant over time intervals of length δt, so we can use the result (3.7) above and then take the limit as δt approaches zero. Because the realizations of XðtÞ are continuous, this yields a closer and closer approximation of the true signal as δt → 0.
In particular, consider the discrete times where j is an integer and the range of j is over all integers. We can then define corresponding discrete quantities as follows. Approximating XðtÞ to be constant in each interval ½t j ; t jþ1 Þ, we take X j to be X j ≔ Xðt j Þ: ð4:4Þ The prior covariance function Σ 0 ðt; t 0 Þ will then be constant for t and t 0 within these intervals, and we define Σ 0jk ≔ Σ 0 ðt j ; t k Þ: ð4:5Þ We take u j to be the integral of uðtÞ over each interval, When XðtÞ is constant in each interval, it is straightforward to show that the best estimate of XðtÞ is also constant in each interval. Hence, the error covariance function Σðt; t 0 Þ can be taken to be constant for t and t 0 within given time intervals, allowing us to define the discrete covariance matrix Σ jk ≔ Σðt j ; t k Þ: ð4:7Þ Finally, we define the operatorĝ j to bê corresponding to the number of photons between times t j and t jþ1 . The integral in Eq. (4.2) then becomes where nowĝ is the photon number operatorn. Thus, the integral that generates the shifts of the probe state in Eq. (4.2) is replaced by a sum of the form of the discrete case in Eq. (3.1).
With the above definitions of the discrete quantities, we can use the quantum Bell-Ziv-Zakai bound in Eq. (3.7) to obtain a lower bound on the measurement uncertainty in the continuous case, as quantified by the expression in Eq. (4.1), by taking the limit as δt → 0. However, the bound in Eq. (3.7) depends on τ 0 ðvÞ in Eq. (2.35) and τ F ðvÞ in Eq. (3.6), which in turn depend on the quantities v ⊤ Σ −1 0 v and jvj ⊤ hjĝji, where the vector v satisfies u ⊤ v ¼ 1. Hence, suitable analogues of these quantities are required in the continuous limit.
Therefore, for each value of δt > 0, we define the corresponding real function vðtÞ by Furthermore, noting that the inverse of the prior covariance function, the matrix inverse of Σ 0 in Eq. (4.7) is given by We then obtain where in the latter we have used the fact thatÎðtÞ is a positive operator for all t. Taking the continuous-time limit δt → 0, the quantum Bell-Ziv-Zakai bound (3.7) thus holds essentially unchanged, with summations replaced by the corresponding integrals in Eqs. (4.1), (4.12), (4.15) and (4.16).

B. Stationarity and power-law scaling
We now make two assumptions, in addition to Gaussianity, about the prior statistics of XðtÞ. The first is stationarity, implying that the prior covariance function Σ 0 ðt; t 0 Þ is a function of t − t 0 only. This means that we can define a prior power spectral densityΣ 0 ðωÞ bỹ Then, the inverse of Σ 0 ðt; t 0 Þ is given by The second assumption is that the spectral densityΣ 0 ðωÞ has an asymptotic power-law scaling, i.e.,Σ 0 ðωÞ ∼ 1=jωj p for ω large. This scaling is problematic for small ω, however, because it diverges at ω ¼ 0. To avoid this divergence, it is convenient to take [50] for some constant γ. Note that the Ornstein-Uhlenbeck process used in Refs. [42,43] corresponds to p ¼ 2.
Alternative choices for removing the singularity at ω ¼ 0 yield similar results, as shown in Appendix C. The constant κ essentially gives a multiplicative constant for the phase variation XðtÞ. The power of p − 1 ensures that κ has dimensions of reciprocal time. Thus, κ −1 is a characteristic time for the spreading; for example, for a Wiener process, κ −1 is the time needed to reach a variance of 1. The constant γ, on the other hand, gives a characteristic time for the relaxation of the phase towards zero and ensures that the spectrum is limited at ω ¼ 0. For example, for an Ornstein-Uhlenbeck process, γ is just the decay constant.

C. Towards the Heisenberg scaling bound
We are particularly interested in the mean-square error of the estimate at a particular time t 0 , i.e., Σðt 0 ; t 0 Þ. For this case, we must choose implying from Eq. (4.12) that vðt 0 Þ ¼ 1. We can choose any function such that vðt 0 Þ ¼ 1 in the quantum Bell-Ziv-Zakai bound. We find it convenient to use the family of functions defined by A near-optimal choice for the characteristic time T that parametrizes this family will be made below.
With the above choice of vðtÞ, one has vðωÞ ¼ 2πTe iωt 0 ΛðωTÞ. This result allows the integral in Eq. (4.19) to be computed analytically for the power spectral density in Eq. (4.21), resulting in this (implicit) expression for τ 0 : We can similarly obtain, from Eq. (4.16), a simple expression for τ F : where we have defined a weighted average of the flux around t 0 by With these results, we are now in a position to derive a Heisenberg limit.

D. Heisenberg scaling bound: Constant flux
Deriving the desired limit is particularly simple for the case of constant photon flux, hÎðtÞi ¼ constant ¼ N , which also equals N ðt 0 Þ. We are interested in the limit of large N . The lower bound in Eq. (3.7) scales as the lesser of τ 2 0 and τ 2 F , so the strongest (i.e., largest) lower bound will be obtained for τ 0 ðvÞ ≈ τ F ðvÞ. Our choice of vðtÞ in Eq. (4.23) depends on T and t 0 , and the expressions for τ 0 ðvÞ and τ F ðvÞ in Eqs. (4.25) and (4.26) only depend on T. Therefore, we wish to choose T to ensure τ 0 ðvÞ ≈ τ F ðvÞ.
The condition τ 0 ðvÞ ¼ τ F ðvÞ yields This equation means that N is inversely related to T, and the limit of large N means small T. In that limit, the second term on the right-hand side can be ignored, in which case the solution for the optimal characteristic time T is ð4:29Þ We can use this expression for T for all N , although it is only optimal in the limit of large N . Then, τ 0 ðvÞ < τ F ðvÞ, so using the second subfunction in (3.8) gives ZðvÞ ≥ 11 420 τ 0 ðvÞ 2 : ð4:30Þ Using this inequality with Σðt 0 ; t 0 Þ ≥ ZðvÞ gives the lower bound where c Z and c γ are the dimensionless constants ðγ=κÞ p ð4π 2 λ 2 p 3 Þ p=ðpþ1Þ : ð4:33Þ In the limit of large N , the expression in square brackets in Eq. (4.31) approaches 1, and we have ð4:34Þ Note that this bound on the mean-square error is independent of the constant γ used to remove the singularity at ω ¼ 0 in Eq. (4.21); i.e., the result depends on the scaling of the spectrum for large ω, not on the behavior for small ω.
Alternative choices for removing the singularity yield similar results, as shown in Appendix C. Note also that both the strict bound in Eq. (4.31) and the asymptotic bound in Eq. (4.34) continue to hold if Eq. (4.21) just gives a lower bound on the spectrum, rather than the exact spectrum. This is because increasing the spectrum will only decrease the value of the integral in Eq. (4.19), leading to a larger value of τ 0 ðvÞ. Because ZðvÞ in Eq. (3.8) is monotonically increasing with τ 0 ðvÞ, this increase in τ 0 ðvÞ would only increase the lower bound.
The ðκ=N Þ 2ðp−1Þ=ðpþ1Þ scaling of Eq. (4.34) was previously proposed in Ref. [41] as the Heisenberg limit for a stochastically varying phase with a power-law spectrum. However, the proof in that work applies only to a specific class of Gaussian quantum states. Here, we have proved the scaling for arbitrary quantum states with constant flux by using our powerful new technique of the quantum Bell-Ziv-Zakai bound. Next, we show how to obtain essentially the same result even for a time-varying flux.

E. Heisenberg scaling bound: Varying flux
For the general case of a time-varying photon flux, the overall performance of any estimate is characterized by the error averaged over time, i.e., bȳ For constant flux, it is clear that this has the same lower bound as Σðt 0 ; t 0 Þ in Eq. (4.34) above. More generally, we define the time-averaged flux by It is easy to check that the time average of N ðt 0 Þ in Eq. (4.27) is also equal to N , implying that this definition is consistent with the above notation for the case of constant flux. We can rewrite the lower bound in Eq. (3.7) as  Fig. 5. Therefore, using Jensen's inequality, we can place a lower bound on this expression by replacing 1=τ F ðvÞ with its average value. As 1=τ F ðvÞ is proportional to N ðt 0 Þ, its average value can be found by just replacing N ðt 0 Þ with N . As a result, we obtain a lower bound onΣ in terms of the time average of N ðt 0 Þ, i.e., in terms of N : In other words, we have a lower bound on the timeaveraged mean-square error in terms of the time-averaged flux.
As before, for large N the expression in square brackets is approximately 1, and we havē Again, this result is not dependent on the particular means used in Eq. (4.21) to avoid a singularity at ω ¼ 0, as shown Appendix C.

V. ACHIEVING THE OPTIMAL SCALING
A lower bound is not a Heisenberg limit, and is of limited value, if it is not close to a realizable error. Here, we demonstrate that the scaling in Eq. (4.39) is indeed achievable in principle, up to a constant prefactor. The measurement technique we present is only intended to be possible in principle (allowed by the laws of quantum mechanics), analogous to the canonical measurements with optimal states for measurement of a constant phase [25]. It is presented because it enables us to prove rigorous results. How to perform measurements that would be experimentally realizable is a question for future work.
Consider an estimation strategy where the probe field is concentrated into pulses separated by time T, as shown in Fig. 6. This T is, as yet, unrelated to the T used to obtain the lower bound in the preceding section. Each pulse is assumed to be so short that the phase XðtÞ does not vary significantly during the pulse duration. An average flux N is ensured by each pulse having average photon number N T.
We first assume that the phase modulation is weak; viz., Using canonical phase measurements and minimumuncertainty states within each pulse, the observation Y n ∈ ð−π; π at each sampling can be linearized as Y n ≈ XðnTÞ þ ξ n ; ð5:2Þ where ξ n is a random variable describing the noise with moments Eðξ n jXÞ ≈ 0; Eðξ n ξ m jXÞ ≈ δ nm ð4=27Þjz A j 3 ðN T Þ 2 ; ð5:3Þ with z A ≈ −2.338 being the first negative root of the Airy function [64]. The above moments are exact in the asymptotic limit of large N T. The condition given by Eq. (5.1) can be relaxed for large phase fluctuations by making the canonical phase measurements adaptive [65], as shown in Appendix D. A rigorous accounting of the error due to phase ambiguity will be presented in Sec. VII in the case of a periodic distortion function. For the remainder of this section we assume Eqs. (5.1)-(5.3) for simplicity.
After all measurements are made, a final estimate of XðtÞ can be constructed via the Whittaker-Shannon interpolation formula: As in the preceding section, we are interested in the limit of asymptotically large N , which yields lower error for the estimation. Because the aliasing error will be smaller as the time between samples T is smaller, we consider the limit of small T. Averaging over time, and taking γT ≪ π, the aliasing error is for γT ≪ π. Note that the first term increases with T, whereas the second term decreases with T. This result is as we expect, because increasing T means that the phase is sampled less frequently and can vary more in between samples, but increasing T also means that more power is available to estimate each sample, which reduces the error. The optimal value of T, corresponding to minimizing the right-hand side of Eq. (5.11), is which decreases with N as we expect. It is slightly different from the T in the preceding section, but the scaling is the same. This calculation thus gives some insight into the T FIG. 6. A pulsed phase measurement scheme to achieve the optimal scaling. used to define the function vðtÞ in the preceding section: It relates to the correlation time for the intensity fluctuations in a light beam that will optimally probe the timevarying phase.
With this value of T, we obtain an average mean-square errorΣ where c A > c Z is the dimensionless constant Thus, the achievable variance has the same scaling with respect to κ=N as that in the lower bound in Eq. (4.39) but with a larger prefactor. This demonstrates that the scaling of the lower bound in Eq. (4.39) with N is tight, up to a constant prefactor, and thus represents a rigorous Heisenberg limit.

A. Periodic distortion functions
We have considered phase estimation as an example of the application of the quantum Bell-Ziv-Zakai bounds. Phase measurements are intrinsically modulo 2π because they are unable to distinguish between phases that differ by multiples of 2π. For this reason, the phase will typically be taken to be in some standard region, such as ð−π; π. Then, a phase of −π þ δ 1 , for some small δ 1 > 0, can easily be estimated as π − δ 2 , for some δ 2 > 0. It seems unrealistic to quantify the error as ≈2π, because the phase difference is small modulo 2π. For this reason, it is better to use periodic distortion functions for measurements of this type.
A periodic distortion function satisfies for π=2 ≤ z ≤ π. This is a technical condition needed for the classical Ziv-Zakai bound of Ref. [54]. An example of a periodic distortion function satisfying these conditions is the periodic modification of the mean-square error, ð6:3Þ where the notation denotes the value of z modulo the interval ð−π; π, as shown in Fig. 7.
Rather than considering the distortion function Dðu ⊤ ϵÞ for a general vector u, as in Sec. II, for simplicity, we restrict our attention to the case u l ¼ δ l;j . This corresponds to the distortion function Dðϵ j Þ, where ϵ j denotes the jth component of the estimation error ϵ. This case is sufficient for bounding the average mean-square error of the estimate.

B. Periodic Ziv-Zakai bounds
In this subsection, we summarize the technique for obtaining periodic Ziv-Zakai bounds from Ref. [54]. One can obtain an expression similar to Eq. (2.8) as This result is equivalent to Eq. (15) of Ref. [54]. Using the property (6.2) of D, Eq. (6.5) gives (see Appendix F) where v is a vector as in Sec. II A, and Similarly to Eqs. (2.12) and (2.13), the probabilities π 0 and π 1 are given by Now we require the vector v to have its j component equal to 1, i.e., v j ¼ 1. This requirement corresponds to the condition u ⊤ v ¼ 1 used in previous sections, for the special case u l ¼ δ l;j . With this restriction, consider a hypothesistesting problem for distinguishing x from x þ vτ. It is easily seen that the condition ð½ϵ j 2π > τ=2Þ∨ð½ϵ j 2π ≤ τ=2 − πÞ in the second line of Eq. (6.7) corresponds to the estimate X ̬ j ðYÞ being closer to x j þ τ than to x j (see Fig. 3 of Ref. [54]). Similarly, the condition ð½ϵ j 2π ≤ −τ=2Þ ∨ ð½ϵ j 2π > π − τ=2Þ corresponds to X ̬ j ðYÞ being closer to x j than to x j þ τ (see Fig. 4 of Ref. [54]). Hence, Gðπ 0 ; x; v; τÞ is equal to the probability of error for distinguishing x and x þ vτ for the decision scheme based on whether x j or x j þ τ is closer to the estimate X ̬ j ðYÞ. It is therefore lower bounded by the error for an optimal decision scheme, which, together with Eq. (6.6), implies þ P X ð½x þ vτ 2π ÞP e ðx; ½x þ vτ 2π Þ: ð6:9Þ Using minfP X ðxÞ; P X ð½x þ vτ 2π Þg, similarly to the approach in Sec. II A, gives (see Appendix F) dx min½P X ðxÞ; P X ð½x þ vτ 2π ÞGð1=2; x; v; τÞ: ð6:10Þ Here, Gð1=2; x; v; τÞ is the error probability for a decision problem with equal prior likelihood. Lower bounding the error probability then gives dx min½P X ðxÞ; P X ð½x þ vτ 2π ÞP el e ðx; ½x þ vτ 2π Þ: ð6:11Þ This concludes the summary of the results in Ref. [54]. Next, using Eq. (2.23), for the case of quantum measurements, we finally obtain the periodic quantum Ziv-Zakai bound ð6:12Þ

C. Gaussian prior
To provide a result for Gaussian variation of x, we encounter a difficulty. Gaussian distributions always extend from −∞ to þ∞ (even though they exponentially decay), whereas the variation of x j is limited to the interval ð−π; π. The method we use is to take a Gaussian probability distribution P G and wrap it around 2π. This would physically correspond to a case where a phase shift is caused by variation in an unbounded quantity (such as the position of a mirror on which the beam is incident [44]) that has Gaussian statistics. The corresponding periodic probability density is then given by where the sum is over all vectors of integers, n. Denoting the integral over x in Eq. [23] can be lower bounded as Z π −π dx min½P X ðxÞ; P X ð½x þ vτ 2π ÞHðxÞ We therefore obtain a result closely analogous to Eq. (2.29): where the second integral is over all x. On the right-hand side, the main difference between this expression and that in Eq. (2.29) is that the integral is up to τ ¼ π, rather than τ → ∞. Next, we choose the mean-square error distortion function in Eq. (6.3) for periodic variables. Assuming generators that commute and satisfy Eq. (2.31) with addition modulo 2π, it follows via Eq. (6.16) above that the quantum Ziv-Zakai bound in Eq. (3.7) is modified to τ F ðvÞ > π < τ 0 ðvÞ:

ð6:18Þ
A minor difference between this result and what we obtained before is that the left-hand side in Eq. (6.17) is the mean-square error (modulo 2π) for a single component for the error, whereas the left-hand side in Eq. (3.7) contains the full covariance matrix. In other words, Eq. (6.17) corresponds to taking u l ¼ δ l;j , to give the mean-square error for x j . However, this is exactly what is used in Sec. IV. Therefore, we can follow an analysis similar to that in Sec. IV.
First, consider the case of constant flux; then, using Eq. (4.29) for T, we have τ 0 ðvÞ < τ F ðvÞ. Next, we need either the second or third subfunction in Eq. (6.18). The third subfunction will be at least π 2 11=420 when the condition is satisfied. If τ 0 > π, but the second subfunction is less than π 2 11=420, then the second subfunction still provides a lower bound on Z π ðvÞ. The net result is that we can lower bound Z π ðvÞ by the minimum of π 2 11=420 and the second subfunction, and the lower bound on the variance is only slightly modified to Σðt 0 ; t 0 Þ ≥ min π 2 11 420 ;

ð6:19Þ
In the case of time-dependent flux, we can again show that the lower bound in Eq. (6.18) is a convex function of 1=τ F ðvÞ. This means that, for flux that is dependent on time, we again obtain a lower bound on the average value by replacing the instantaneous flux with the time-averaged flux N . As a result, we obtain exactly the same lower bound on the time-averaged varianceΣ. In the limit of large N , the lower bound on the variance is less than π 2 11=420. Thus, the min is not needed, and we obtain exactly the same asymptotic expression for the bound as we did in Sec. IV.

VII. ACHIEVING THE OPTIMAL SCALING: EFFECT OF PHASE AMBIGUITY
The analysis of the technique for achieving the optimal scaling given in Sec. V does not fully address the fact that the phase can only be measured modulo 2π. When tracking a phase, it is possible to resolve this ambiguity from the fact that the variation of the phase is continuous. Provided the phase does not change too much between successive estimates, and each estimate is reasonably accurate, we can keep track of changes by 2π. In other words, one can add suitable multiples of 2π to Y n to give X ̬ n , such that jX ̬ n − X ̬ n−1 j ≤ π. If the initial range of the phase is known, then the error should not exceed π.
This approach is problematic when the phase can vary arbitrarily far from zero, such as for a Wiener process. There is a nonzero possibility, however small, of choosing the wrong multiple of 2π at any step, and from that point on, there will continue to be an error of size 2π because of this initial error. This is called a phase-wrap error. When averaging measurements over an arbitrarily long period of time, the phase error can grow to be arbitrarily large. For the phase variation we consider, the Fourier spectrum Σ 0 ðωÞ is bounded for ω ¼ 0, so the prior distribution has a bounded variance for any given time. This means that the error due to phase wraps is not unbounded, but it is still problematic.
Here, we consider the periodic distortion function in Eq. (6.3), given by the mean-square error modulo 2π. In this case, phase-wrap errors for individual points on their own do not matter because they do not increase a periodic distortion function. The problem appears when we consider estimation of the phase between the sample points, where the phase is interpolated. The estimated interpolation error for the Whittaker-Shannon interpolation formula is only accurate if there are no phase-wrap errors.
To simplify the problem, we take XðtÞ to vary over the entire real line. Since the error is quantified modulo 2π, the estimation problem is identical to that where XðtÞ is limited to the region ð−π; π. There is the additional advantage that the probability distribution is now exactly Gaussian, rather than given by Eq. (6.13).
In this section, let t be the particular time for which we wish to bound the mean-square deviation (modulo 2π) between the interpolated estimate and the actual phase. Let n t be the sample time nearest t, in the sense that n t ≔ roundðt=TÞ;

ð7:1Þ
where T is the pulse separation. We use the "round half down" convention, so if t is equidistant between two sample times, we take the smaller sample time. Then, X ̬ n t is the phase estimate for the sample time nearest t. Let L n be the multiple of 2π that is added to the measurement result Y n ∈ ð−π; π to give X ̬ n , so that ð7:2Þ We choose L n t ¼ 0. Thus, X ̬ n t is just Y n t , and it is in the interval ð−π; π. In this section, the noise random variable ξ n is redefined as ξ n ≔ ½Y n − XðnTÞ 2π : ð7:3Þ With this definition, the moments given in Eq. (5.3) are correct. We then define an integer K n as We also define K ≔ K n t . The number of phase-wrap errors between samples is quantified by The quantity K is the number of phase wrappings between X ̬ n t and Xðn t TÞ. This difference, in itself, is unimportant if deviations are only measured modulo 2π. What is important is that this difference is maintained for the other estimates. To achieve this result, we add or subtract multiples of 2π as needed (i.e., we choose L n ) to make the differences between neighboring estimates no more than π; in particular, X ̬ n − X ̬ n−1 ∈ ð−π; π. Provided certain conditions are met (discussed below), the difference between X ̬ n and XðnTÞ will be close to 2πK for all n, in other words, to the same multiple of 2π. Otherwise, there are phase-wrap errors, so K n is dependent on n. This dependence on n will introduce error to the interpolation; however, this error can be bounded.
The interpolated values X ̬ ðtÞ can be expressed as This expression shows why we want the difference between the estimates and values to remain the same multiple of 2π.
If the difference is the same, then KðtÞ simply becomes the constant K. On the other hand, if K n varies with n, then KðtÞ will not be an integer, and the extra term 2πKðtÞ in Eq. (7.7) will give an increased error modulo 2π. , as per the last line above, because the expression on the last line is a concave function of these quantities. The time-averaged value of E½Δ 2 ðtÞ is exactly what was obtained in Sec. V, with the result given in Eq. (5.11). The remaining task is therefore to find the timeaveraged value of E½½2πKðtÞ 2 2π . To achieve this, we need to bound the probabilities of phase-wrap errors. If t is a sample time so that t ¼ n t T, then KðtÞ ¼ K, and ½2πKðtÞ 2π is equal to zero. Therefore, in the remainder of this discussion, we assume that t is not a sample time. We wish to consider the error in interpolating at the given time t. Without loss of generality, this time can be taken to be in the interval ð0; T=2. This is because there is translation symmetry and time-reversal symmetry. One can simply translate the time by a multiple of T and change the sample numbering such that n ¼ 0 or n ¼ 1 corresponds to the closest sample time. That would yield t ∈ ð0; TÞ. Then, if t ∈ ðT=2; TÞ, one can simply reverse all times about T=2, so t ∈ ð0; T=2.
Since n t ≔ roundðt=TÞ, for t ∈ ð0; T=2, we have n t ¼ 0. Hence, we are starting with X ̬ 0 in the interval ð−π; π, so X ̬ 0 ¼ Y 0 and L 0 ¼ 0. Then, all other values of L n are selected such that jX ̬ n − X ̬ n−1 j ≤ π. The goal of this is to ensure that the values of K n are equal (or at least close) to K 0 .
Recall that the quantity Z n , defined in Eq. (7.6) as the difference between successive values of K n , quantifies the number of phase-wrap errors between samples. The probability of a phase-wrap error between samples is therefore given by p err ≔ PrðjZ m j > 0Þ:

ð7:11Þ
We first place an upper bound on p err , and then bound E½½2πKðtÞ 2 2π in terms of p err . Using Eqs. (7.2) and (7.4), we obtain ð7:12Þ In the last line, we have used the fact that the values of L n have been chosen such that jX ̬ n − X ̬ n−1 j ≤ π. Now, if jξ n − ξ n−1 j þ jXðnTÞ − Xððn − 1ÞTÞj is less than π, then 2πjK n − K n−1 j < 2π. Because K n takes integer values, this inequality implies that K n ¼ K n−1 , i.e., that there is no phase-wrap error between the two successive samples. In particular, there will be no phase-wrap error if jξ n − ξ n−1 j < π=2 and jXðnTÞ − Xððn − 1ÞTÞj < π=2. This means that To bound the first probability on the right-hand side in Eq. (7.13), we use the fact that the statistics for X are Gaussian. It turns out that E½½XðtÞ − Xðt 0 Þ 2 can be bounded as a polynomial in jt − t 0 j (see Appendix G). Here, jt − t 0 j ¼ T, and we choose T to vary polynomially in κ=N . As a result, E½½XðnTÞ − Xððn − 1ÞTÞ 2 varies polynomially in κ=N , becoming small for large N =κ. Because the probability distribution for X is Gaussian, the probability of the difference between XðnTÞ and Xððn − 1ÞTÞ being larger than π=2 is exponentially small in N =κ.
Next, we consider the probability of jξ n − ξ n−1 j exceeding π=2. By modifying the phase measurements at each sample, it is possible to ensure that the probability of error greater than π=4 is exponentially small in N =κ while not modifying the mean-square error to leading order (see Appendix H). Therefore, the probability of jξ n − ξ n−1 j exceeding π=2 is exponentially small in N =κ. Combining this with Eq. (7.13), and the fact that PrðjXðnTÞ − Xððn − 1ÞTÞj ≥ π=2Þ is exponentially small in N =κ, we find that p err is exponentially small in N =κ.
To upper bound E½½2πKðtÞ 2 2π , we express K n in terms of Z n . We obtain, for positive n, Similarly, for negative n, Therefore, we can write KðtÞ as where (see Appendix I) and Ψ is the digamma function ΨðxÞ ¼ Γ 0 ðxÞ=ΓðxÞ. Using this expression, we can bound E½½2πKðtÞ 2 2π as follows: where S ≔ X n;m∶jn−mj>1 E½Z n Z m a m a n : ð7:19Þ We use the notation ≲ to indicate that higher-order terms in N have been omitted. In the last line of Eq. (7.18), we have used the fact that Z m is limited to f0; −1; þ1g with high probability, which implies E½Z 2 m ≈ p err (see Appendix J).
In addition, jE½Z m Z mþ1 j cannot exceed E½Z 2 m , which gives the inequality.
The quantity in square brackets in the last line of Eq. (7.18) is less than 1 (this can be seen numerically). Moreover, it can be shown that the longer-range correlations quantified by S are exponentially small (see Appendix J). Hence, we find that E½½2πKðtÞ 2 2π is exponentially small in N =κ. This result is for the worst-case value of t (i.e., midway between two sample points), so averaging over t can only give smaller values.
Rather than deriving an optimal value of T, we simply use the value in Eq. (5.12). The scaling for the average variance given in Eq. (5.13) corresponds to the time average of E½Δ 2 ðtÞ, with T given as in Eq. (5.12). Denoting this quantity byΣ 0 and the time-averaged value of E½½2πKðtÞ 2 2π byΣ wrap , using Eq. (7.10) then gives where c A is defined in Eq. (5.14). BecauseΣ wrap is of higher order in N , it does not affect the final scaling constant for the leading-order term. Thus, we find that, when we fully take account of phase-wrap errors, we still obtain

VIII. CONCLUSIONS
While fundamental quantum limits to the accuracy of estimation of single quantities are well known, deriving fundamental limits for estimating multiple parameters becomes very challenging when there is prior information describing correlations between them. A particularly important example of this is in phase estimation, where the phase at any time is correlated with the phase at earlier and later times. Such phase estimation tasks are needed, for example, in gravitational-wave astronomy.
Here, we have proven quantum forms of the Bell-Ziv-Zakai bounds for multiparameter estimation. One of these enables us to bound the accuracy possible when measuring a phase with stationary Gaussian prior statistics and a power-law spectrum ∼ω −p . We have thereby been able to prove that the scaling bound found in Ref. [41], for quantum states having time-symmetric stationary Gaussian statistics for the field quadratures, in fact holds for all possible quantum states.
Moreover, we have shown here analytically that the lower bound we have derived is always achievable, up to a constant prefactor. Specifically, it is possible, in principle, to achieve it by sampling with a regularly timed sequence of pulses, each of which is measured by a canonical phase measurement, and with interpolation of the phase between those times. This bound can therefore be regarded as analogous to the Heisenberg limit for measurement of a single constant phase.
We have also provided bounds for periodic distortion functions. An example of this is measurement of phase modulo 2π, so the mean-square error is evaluated modulo 2π. We find that the bounds we derive for the nonperiodic case hold almost unchanged.
In addition to providing the asymptotic form of the bounds for large flux, we have provided strict bounds that hold for all values of the flux. Our results provide a definite theoretical prediction, that the average mean-square error of any measurement technique cannot be below the bounds given. Another interesting prediction is that, for phase variation close to 1=f noise (i.e., p ≈ 1), coherent states provide close to the best possible probe states.
The question of how to achieve these bounds is more difficult. The measurement technique we have described is possible in principle, in that it is allowed by the laws of quantum mechanics. A more practical scheme for measuring a varying phase would be to use a squeezed state and adaptive phase measurements, as has been described, for example, in Refs. [55,66], and experimentally demonstrated in Ref. [43]. Numerical results indicate that such techniques can give mean-square errors comparable to the lower bound for p ¼ 2, but it is an open question how effective these techniques would be for other values of p.
Of course, all real measurements are limited by loss. We expect that real measurements would only give the scaling predicted up to some maximum flux dependent on the amount of loss and thereafter scale in the same way as for coherent states (given in Ref. [41]). The bound for measurements with loss is a question for future work. Some preliminary results on that question are given in Ref. [9].
Another important open question is how to reduce the gap between the upper and lower bounds on the minimum achievable variance. The ratio c A =c Z (of the prefactors for the upper and lower bounds) is about 14 to 20 for moderate values of p, but it is unbounded in the limit p → 1 or p → ∞. For the case of measurements of a constant phase, the exact minimum variance is known [25], and it would be desirable to achieve the same result here. Some possible approaches to this are to use a better function vðtÞ (as we have not attempted to optimize this function), as well as to use different forms of the Ziv-Zakai bound.
We also note that while Gaussian correlations were assumed for the applied phase shift X [e.g., in Eq. (2.33)], we believe our method can be generalized to yield estimation bounds for a wide class of non-Gaussian correlations. More generally, there are many other multiparameter estimation tasks, in which there are prior constraints on the correlations, for which our quantum Bell-Ziv-Zakai bounds could reveal the ultimate achievable limits.

APPENDIX A: VERSION OF BOUNDS WITH VALLEY-FILLING FUNCTION
Here, we provide slightly tighter forms of the quantum Bell-Ziv-Zakai bounds using the valley-filling function. The valley-filling function V is defined by Using Eqs. (31) and (35) of Ref. [52] to bound Prðju ⊤ ϵj ≥ τ=2Þ and noting Property 1 of Ref. [52] yields the Bell-Ziv-Zakai bounds [52,56]: These inequalities are tighter versions of Eqs. (2.17) and (2.20). The quantum bounds are obtained by lower bounding P e and P el e using Eq. (2.22) or (2.23) with the right prior probabilities.

APPENDIX B: EVALUATION OF THE INTEGRAL
Here, we show how to evaluate the integral Writing this integral out explicitly gives where k is the dimension of the vector and μ is the mean of the distribution. Let V be an orthogonal matrix that diagonalizes Σ 0 , and D the matrix of eigenvalues of Σ 0 , so Σ −1 0 ¼ V ⊤ D −1 V. Next, let U be an orthogonal matrix such that w ≔ UD −1=2 Vv is a vector with only one nonzero entry. The magnitude of this vector is jwj ¼ ðv ⊤ Σ −1 0 vÞ 1=2 . Taking a change of variables z ¼ UD −1=2 Vðx − μÞ, the integral becomes Integrating each of the components of z that are orthogonal to w gives Splitting the integral up into the parts where each term in the minimum is smaller, we can then evaluate it to give which yields the result in Eq. (2.33).

APPENDIX C: ASYMPTOTIC SCALING FOR SPECTRA WITH POWER-LAW TAIL
More generally, the prior power spectral densityΣ 0 ðωÞ may scale as κ p−1 =jωj p for jωj → ∞. Making this concept rigorous, we assume that there exist constants w 0 and G such thatΣ Then, we obtain, with vðtÞ given in Eq. (4.23), The first term is negligible, provided that κT is small. With the expression we take for T, κT ∝ ½κ=N ðt 0 Þ 2=ðpþ1Þ . Because we consider scaling with large N ðt 0 Þ=κ, the first term is negligible, and we again obtain the result in the main text.

APPENDIX D: CANONICAL PHASE-LOCKED LOOP
The accuracy of our linear model in Sec. V relies on the assumption of weak phase modulation. For large phase fluctuations, we can borrow from the phase-locked loop concept [42][43][44][65][66][67][68][69][70] and modulate each pulse by an adaptive phase −XðnTÞ before the canonical phase measurement, whereXðnTÞ is a causal estimate of XðnTÞ extrapolated from previous observations fY n−1 ; Y n−2 ; …g. Provided thatXðnTÞ tracks XðnTÞ closely, viz., the net phase modulation XðnTÞ −XðnTÞ will be small, and Y n ∈ ð−π; π can be linearized as The requirement of small causal error according to Eq. (D1) is now less stringent than Eq. (5.1). To evaluate the causal error analytically, we approximate the discrete observations Y n as a continuous-time signal given by YðtÞ ≈ XðtÞ −XðtÞ þ ξðtÞ; ðD3Þ The continuous approximation is accurate in the high-N limit because the measurement period T in Eq. (5.12) can be made arbitrarily small in the limit. The minimum causal error at a steady state is then given by the Yovits-Jackson formula [50]: Since the error decreases with decreasing R and increasing N , the phase tracking can be made arbitrarily accurate in the high-N limit. These considerations are similar to the principles of a homodyne phase-locked loop [66][67][68][69], except that here we assume canonical phase measurements to avoid photon-number fluctuations. In the long-time limit, phase-wrap errors, no matter how rare, can still occur, making the estimate diverge from the true waveform by multiples of 2π. Just like the classical phase modulation system, it can be expected that this divergence will be eliminated by adding a dc notch filter to the output [69].

APPENDIX E: TIME AVERAGES
Here, we show how to take the time averages (5.9) and (5.10) given in the main text. Each average is taken because the error will depend on how far t is from the nearest sampling point. Because the distribution is otherwise time invariant, we need only average over the interval ½0; T.
Next, evaluating the average in Eq. (5.10) gives

APPENDIX F: DERIVING THE BOUND IN THE PERIODIC CASE
Here, we show how to obtain Eq. (2.38) from Eq. (6.5), in order to provide a bound in the periodic case. Using the property (6.2) of D, Eq. (6.5) gives Introducing a vector v as in Sec. II A, one can place a lower bound on the sum of probabilities in the expression above, via The probabilities π 0 and π 1 are given by Eq. (6.8). Using this result in Eq. (F1) then implies Eq. (6.6).

APPENDIX G: BOUNDING CHANGE IN X
Here, we show how to bound E½½XðtÞ − Xðt 0 Þ 2 . In general, using only the property thatΣ 0 ðωÞ is an even function, To bound the variance, rather than using the spectrum in the form (4.21), we use upper bounds. In the case for 1 < p < 3, it is convenient to use the upper bound Σ 0 ðωÞ < κ p−1 =jωj p , which gives, for p ≠ 2, In the case p ¼ 2, the result is κjt − t 0 j, which is equivalent to taking the limit p → 2 in Eq. (G2). For p ≥ 3, we upper bound the spectrum as In the case p ¼ 3, we then get For p > 3, For jt − t 0 j ¼ T, in each case, we find that the variance varies as a polynomial in κ=N and is therefore small for large N =κ.

APPENDIX H: MEASUREMENTS WITH BOUNDED ERROR
Here, we explain the modified measurement technique required in Sec. VII, to ensure that the probability of large phase errors is exponentially small in N =κ. It is known that there are probe states, with maximum photon number 2N and mean photon number N, such that for a given integer M a canonical phase measurement accurately estimates any phase shift to within π=M, with an error probability exponentially small in N [71]. For each sample pulse, we can therefore apply such a measurement to a first subpulse having a number of photons scaling as Θð ffiffiffiffiffiffiffiffi N T p Þ, with M ¼ 12, providing an exponentially small probability of an error greater than π=12. Call this measurement A. For each sample pulse, a measurement as in Ref. [64] can also be performed on a second subpulse having a mean number of photons N T − Θð ffiffiffiffiffiffiffiffi N T p Þ. Call this measurement B. The mean-square error of this measurement is, to leading order, no different than the measurement with mean number N T because the difference in number is higher order.
The two measurements for each sample can be combined by using the result from measurement B if it is within π=6 of the result from measurement A. Provided the result from measurement A is no more than π=12 from the true phase, the result from measurement B will be no more than π=4 from the true phase. Hence, the probability of a phase error greater than π=4 is exponentially small. Conversely, if the two measurement results differ by more than π=6, then the result from measurement A is used. Provided the result from measurement A is no more than π=12 from the true phase, the error in the result from measurement A is then smaller than the error in measurement B. Hence, using the result from measurement A can only decrease the variance, while providing a probability of error greater than π=4 that is exponentially small. The net result of this combined measurement is that the probability of an error greater than π=4 can be made exponentially small in N =κ, while not affecting the mean-square error to leading order.

APPENDIX I: PROOF OF FORMULA FOR a m
To prove the last line of Eq. (7.16) for m > 0, we can simplify the sum over sinc functions to Using Eq. (5.7.7) of Ref.
[72] gives Combining the two, with x ¼ πt=T and z ¼ m − x=π, gives the result in the last line of Eq. (7.16) for m > 0.
Similarly, for m ≤ 0, we have Here, we justify the approximations used in the last line of Eq. (7.18) and show that the long-range correlations quantified by S are exponentially small.

The approximations
To justify the approximations, we first break up Z n into contributions from the change in the phase and from the error in the samples. Using the definition of K n in Eq. (7.4), we find that Now Z n is an integer, and This means that ½−XðnTÞ þ Xððn − 1ÞTÞ − ξ n þ ξ n−1 =ð2πÞ takes values within 1=2 of Z n , which in turn implies that Here, the rounding is taken to use the "round half up" convention. Now define The important thing to note is that, for jn − mj > 1, D n is independent of D m because these only depend on the error in independent measurements. Moreover, D n and D m are independent of C n and C m . However, C n and C m can be correlated because of correlations in the phase variation. Let us use the notation V ≔ jV n−m j and ς ≔ signðV n−m Þ. Note that these quantities depend on n − m; we do not show this dependence, so the equations may be presented more simply. Then, the probability density can be given as In terms of this probability distribution, the difference of probabilities is In the last line, we used the fact that, if v m is negative, the components of the integral for negative values of y cancel out the components of the integral for the corresponding positive values of y. Similarly, if v n is negative, the components for negative x cancel with those for corresponding positive values of x. Since the argument of the exponential is negative, its derivative is no greater than 1, and we have This result then gives An alternative bound can be derived from Eq. (J15). If V n−m > 0, then the first term in braces in Eq. (J15) is larger, and if V n−m < 0, then the second term in braces in Eq. (J15) is larger. In either case, we find j PrðC n ≥ v n ; C m ≥ v m Þ − PrðC n ≥ v n ; C m ≤ −v m Þj where we have made the change of variables u ¼ x þ y and w ¼ x − y. Because the integral over w is upper bounded by the integral over the infinite interval, we have j PrðC n ≥ v n ; C m ≥ v m Þ − PrðC n ≥ v n ; C m ≤ −v m Þj ≤ Because V ≤ σ 2 and ðjv n j þ jv m jÞ 2 ≥ v 2 n þ v 2 m , we can use this to obtain the inequality j PrðC n ≥ v n ; C m ≥ v m Þ − PrðC n ≥ v n ; C m ≤ −v m Þj ≤ We can rewrite Eq. (J17) as Again using V ≤ σ 2 , this implies Combining this result with Eq. (J20) gives j PrðC n ≥ v n ; C m ≥ v m Þ − PrðC n ≥ v n ; C m ≤ −v m Þj ≤ min 1 2 ; The possible range of V=σ 2 is [0,1]. It is easily checked that, within this range, the minimum in the above expression is no larger than 1.04 × V=σ 2 . Hence, we have the inequality j PrðC n ≥ v n ; C m ≥ v m Þ − PrðC n ≥ v n ; C m ≤ −v m Þj ≤ 1.04 It is important to note that σ 2 decreases polynomially in N . Hence, for z n > 1 or z m > 1, this expression is exponentially small in N . Moreover, this expression decays exponentially in z n and z m , so the sum over z n > 0 and z m > 0, excluding z n ¼ z m ¼ 1, is exponentially small in N . Considering the remaining term for z n ¼ z m ¼ 1, we obtain the bound It is possible to perform measurements such that the probability of error larger than π=4 is exponentially small in N . As a result, using Eq. (J7), we have E½Z n Z m ¼ jV n−m jgðN Þ; where gðN Þ is an exponentially decreasing function.
Evaluating V n−m , we obtain Taking m ¼ 0 and n > 0 for simplicity, and integrating by parts, we obtain where we have used the fact thatΣ 0 ðωÞ is bounded at ω ¼ 0 and approaches zero for ω → ∞. Using the fact that Σ 0 0 ðωÞ ≤ 0, AsΣ 0 ð0Þ is bounded (equal to κ p−1 =γ p ), V n−m scales as 1=jn − mj. Hence, E½Z n Z m scales as a factor exponentially small in N times 1=jn − mj. Next, we consider the scaling for a m . Using the first two terms of the asymptotic series for the digamma function, we have As a result, a m has the scaling In fact, it turns out that ja m j ≤ π=jmj. Now we have sufficient results to bound the component of the sum S ¼ X n;m∶jn−mj>1 E½Z n Z m a m a n : ðJ33Þ Using the above results, the sum can be bounded as S ≲ fðN Þ X n;m∶jn−mj>1;n≠0;m≠0 where f is an exponentially decreasing function. Splitting the sum into m < n and m > n, the bound can be rewritten as Using the inequality x 2 þ y 2 ≥ 2xy for x ¼ ffiffiffi n p and y ¼ ffiffi ffi r p gives n þ r ≥ 2 ffiffiffiffiffi nr p . Hence, substituting n þ r with 2 ffiffiffiffiffi nr p gives where ζðzÞ is the Riemann zeta function. This result means that the sum S is exponentially small in N , as claimed.