Exact quantum sensing limits for bosonic dephasing channels

Dephasing is a prominent noise mechanism that afflicts quantum information carriers, and it is one of the main challenges towards realizing useful quantum computation, communication, and sensing. Here we consider discrimination and estimation of bosonic dephasing channels, when using the most general adaptive strategies allowed by quantum mechanics. We reduce these difficult quantum problems to simple classical ones based on the probability densities defining the bosonic dephasing channels. By doing so, we rigorously establish the optimal performance of various distinguishability and estimation tasks and construct explicit strategies to achieve this performance. To the best of our knowledge, this is the first example of a non-Gaussian bosonic channel for which there are exact solutions for these tasks.

An important channel to consider in the context of quantum technologies is the bosonic dephasing channel (BDC). A single-mode BDC $\mathcal{D}_p$ is characterized by a probability density function $p(\phi)$, where $\phi \in [-\pi, \pi]$ represents the random angle of phase-space rotation induced by the channel [1]. Accordingly, the action of $\mathcal{D}_p$ on an input density operator $\rho$ is given by
$$\mathcal{D}_p(\rho) = \int_{-\pi}^{\pi} d\phi \; p(\phi) \, e^{-i\hat{n}\phi} \rho \, e^{i\hat{n}\phi}, \tag{1}$$
where $\hat{n}$ is the photon number operator [2]. Dephasing is a major noise mechanism that afflicts quantum information carriers [3], and it is one of the main challenges towards realizing useful quantum computation, communication, and sensing. In a dephasing noise process, the relative phase information between different photon-number components of a superposed state is lost; for quantum communication, such a process can be understood as arising from, e.g., temperature fluctuations of the environment that stretch or contract the length of a fiber [4]. As such, the problem of understanding the ultimate quantum limits for quantum information tasks using such channels has received considerable attention recently [1, 5-12].
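As a small numerical illustration of the channel action, note that conjugating by $e^{-i\hat{n}\phi}$ multiplies the matrix element $\langle m|\rho|n\rangle$ by $e^{-i(m-n)\phi}$, so a BDC multiplies it by $\int d\phi\, p(\phi) e^{-i(m-n)\phi}$. The sketch below assumes a wrapped normal density of variance `gamma` (an illustrative choice), truncates its defining sum, and works in a truncated photon-number basis; all function names are hypothetical.

```python
import cmath
import math

def wrapped_normal(phi, gamma, terms=10):
    """Wrapped normal density on [-pi, pi] with variance gamma (sum truncated to |k| <= terms)."""
    total = sum(math.exp(-(phi + 2 * math.pi * k) ** 2 / (2 * gamma))
                for k in range(-terms, terms + 1))
    return total / math.sqrt(2 * math.pi * gamma)

def dephasing_factor(density, k, grid=2000):
    """Midpoint-rule estimate of the integral of density(phi) * exp(-i k phi) over [-pi, pi]."""
    h = 2 * math.pi / grid
    total = 0j
    for j in range(grid):
        phi = -math.pi + (j + 0.5) * h
        total += density(phi) * cmath.exp(-1j * k * phi)
    return total * h

def apply_bdc(rho, density):
    """Apply the dephasing channel to a density matrix given in a truncated photon-number basis."""
    dim = len(rho)
    factors = {k: dephasing_factor(density, k) for k in range(-(dim - 1), dim)}
    return [[rho[m][n] * factors[m - n] for n in range(dim)] for m in range(dim)]

gamma = 1.0
plus = [[0.5, 0.5], [0.5, 0.5]]  # density matrix of (|0> + |1>)/sqrt(2)
out = apply_bdc(plus, lambda phi: wrapped_normal(phi, gamma))
```

For the wrapped normal density, the off-diagonal element is damped by $e^{-\gamma/2}$ while the diagonal is untouched, which is the loss of relative phase information described above.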
Two important tasks for characterizing the capabilities of BDCs are channel discrimination (quantum hypothesis testing) and parameter estimation (quantum metrology). For hypothesis testing, the task is to distinguish between models describing different physical processes. The most basic setting involves a binary decision, for which the goal is to distinguish between two hypotheses, commonly called the null hypothesis and the alternative hypothesis. Quantum state discrimination is crucial in several applications (e.g., quantum communication [13], astronomical sensing [14, 15], and spectroscopy [16]), and it has been extensively studied [17, 18]. Quantum channel discrimination, a generalization of state discrimination, has been studied less; however, there is an increasing body of literature on this topic [19-36]. In channel discrimination, the unknown channel is called $n$ times, and the goal is to perform a measurement on the final state to determine which channel was called. This setting is more complex than state discrimination because one can optimize over various strategies in order to make the error probabilities as low as possible. One can employ either an adaptive strategy or a non-adaptive parallel strategy. It is known that adaptive strategies possess advantages over non-adaptive ones in the non-asymptotic regime [23, 32]; however, these advantages come with limits [34, 36]. In the setting of symmetric error, adaptive strategies can also outperform parallel ones in the asymptotic regime [35], but they do not in the asymptotic setting of asymmetric error [29-31, 36].
Quantum metrology deals with the optimal estimation of parameters encoded in quantum states and quantum channels, and the typical goal is to minimize the variance of the parameter of interest. Quantum strategies for estimation involve nonclassical effects like entanglement to achieve precision limits beyond those that are allowed by classical physics. The ultimate quantum limit for quantum metrology using adaptive strategies is in general exceedingly difficult to characterize, as the nested optimizations over each step of an adaptive protocol often lead to a mathematically intractable problem. When estimating parameters encoded into quantum channels, one can again consider parallel and adaptive strategies [37], with there being differences in their performance [38]. Previous works have considered bounds for parameter estimation of unitary channels [39-41], for teleportation-covariant and Gaussian channels [26, 27], and for general channels [42].

FIG. 1. A general, adaptive protocol for channel discrimination and parameter estimation, when either $\mathcal{N}_0$ or $\mathcal{N}_1$ is called three times. The initial input state is $\tau$, the adaptive operations are $\mathcal{A}_1$ and $\mathcal{A}_2$, and the final measurement is $Q$. The final states are denoted by $\rho$.
In this paper, we consider several tasks associated with BDCs:
1. channel discrimination of two BDCs in the symmetric error setting of hypothesis testing,
2. channel discrimination of two BDCs in the asymmetric error setting of hypothesis testing,
3. noise parameter estimation of BDCs.
We consider several other discrimination tasks and discuss the multimode generalization of our results in some of the appendices. By making a connection to the physics ("environmental state") that gives rise to the channel processes, for tasks 1, 2, and 3, we quantify the largest distinguishability or estimability that can be realized between BDCs using the most general strategy achievable by adaptive protocols. We do so by proving optimality bounds and showing their attainability. In the latter case, we provide a fully rigorous proof of the convergence of the guessed probability density to the original one. To the best of our knowledge, our results here constitute the first example of a class of non-Gaussian bosonic channels for which we have exact solutions for these tasks.
The structure of our paper is as follows. In Sections II A-II B, we introduce the theoretical frameworks of quantum channel discrimination and estimation. We present our main results in Section III; the optimality parts of our proofs are given in Section V, and the attainability parts in Section VI. We show that in the energy-unconstrained limit the corresponding figures of merit match, thus leading to exact solutions for channel discrimination and estimation tasks involving BDCs. In Section VII, we generalize our findings to the case in which photon loss is present in addition to dephasing, showing that the same fundamental limits apply. We finally conclude in Section VIII with a summary and some directions for future research. In Appendix E, we discuss a variety of other scenarios to which our results apply, including the strong converse exponent, the error exponent, multiple channel discrimination, and antidistinguishability. In Appendix F, we briefly touch upon how our results generalize to multimode BDCs, and in Appendix G, how our findings generalize to multiparameter estimation.

II. QUANTUM CHANNEL DISCRIMINATION AND ESTIMATION
A. Quantum channel discrimination
The goal of quantum channel discrimination is to distinguish one quantum channel $\mathcal{N}_0$ from another channel $\mathcal{N}_1$ by calling the unknown channel $n$ times. A strategy for doing so is abbreviated by $\mathcal{A}$, which denotes the initial state $\tau$ prepared, the $n-1$ adaptive operations between consecutive calls to the unknown channel, and the final measurement, denoted by $Q \equiv (Q_0, Q_1)$, so that $Q$ is a positive operator-valued measure (POVM). The measurement outcome $Q_0$ corresponds to deciding $\mathcal{N}_0$, and the outcome $Q_1$ corresponds to deciding $\mathcal{N}_1$. An example of such a quantum channel discrimination protocol with $n = 3$ is depicted in Figure 1. The type I error probability is the probability of deciding $\mathcal{N}_1$ when the actual channel is $\mathcal{N}_0$, and the type II error probability is the probability of deciding $\mathcal{N}_0$ when the actual channel is $\mathcal{N}_1$. We denote these error probabilities, respectively, as follows:
$$\alpha_n(\mathcal{A}) := \operatorname{Tr}\!\left[Q_1 \rho^{(n)}_0\right], \qquad \beta_n(\mathcal{A}) := \operatorname{Tr}\!\left[Q_0 \rho^{(n)}_1\right], \tag{2}$$
where $\rho^{(n)}_0$ denotes the final state of the protocol in case the unknown channel is $\mathcal{N}_0$ and $\rho^{(n)}_1$ denotes the final state of the protocol in case the unknown channel is $\mathcal{N}_1$. Additionally, in the notations $\alpha_n(\mathcal{A})$ and $\beta_n(\mathcal{A})$, we have left the dependence on the channels $\mathcal{N}_0$ and $\mathcal{N}_1$ implicit, but we explicitly denote the dependence on the number $n$ of calls to the unknown channel and the strategy $\mathcal{A}$. In radar applications, $\alpha_n$ is known as the false alarm probability, and $\beta_n$ is called the missed detection probability [43]. For more details of quantum channel discrimination, please refer to [29, Section III-A].
The difficulty of quantum channel discrimination arises because most operational quantities of interest involve optimizations of these error probabilities over every possible adaptive strategy $\mathcal{A}$. For the finite-dimensional case, these optimizations can be phrased as semi-definite programs [32], but their computational complexity grows exponentially in $n$. For the infinite-dimensional case, the resulting optimization problem is an infinite-dimensional semi-definite program and, in general, is even more difficult to solve.
Classical hypothesis testing, on the other hand, is much simpler to handle computationally. In this scenario, a sample $\phi$ is selected from a probability density function $p$ or $q$, and the goal is to identify whether $\phi$ came from $p$ or $q$. This setting can be generalized to the multiple sample setting, in which $n$ samples, abbreviated as $\phi^n \equiv (\phi_1, \ldots, \phi_n)$, are selected from $p^{\otimes n}$ or $q^{\otimes n}$. In order to decide whether $p^{\otimes n}$ or $q^{\otimes n}$ is the underlying density, the most general procedure one can perform on $\phi^n$ is a randomized test $t$ [44, Section 2.1], which is the classical version of a POVM (i.e., $t(\phi^n) \in [0, 1]$ is the probability of deciding $p^{\otimes n}$, for all $\phi^n$). The error probabilities of classical hypothesis testing are then given by
$$\alpha_n(t) := \int d\phi^n \; p^{\otimes n}(\phi^n) \left[1 - t(\phi^n)\right], \tag{3}$$
$$\beta_n(t) := \int d\phi^n \; q^{\otimes n}(\phi^n) \, t(\phi^n), \tag{4}$$
where, in the notation for $\alpha_n(t)$ and $\beta_n(t)$, we have left the dependence on the probability densities $p$ and $q$ implicit.
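To make the classical error probabilities concrete, the following sketch evaluates them for a single sample ($n = 1$) by numerical integration. The two wrapped normal densities, the threshold-one likelihood-ratio rule, and all function names are illustrative assumptions rather than choices made in the text.

```python
import math

def wrapped_normal(phi, gamma, terms=10):
    """Wrapped normal density on [-pi, pi] with variance gamma (truncated sum)."""
    total = sum(math.exp(-(phi + 2 * math.pi * k) ** 2 / (2 * gamma))
                for k in range(-terms, terms + 1))
    return total / math.sqrt(2 * math.pi * gamma)

def error_probabilities(p, q, t, grid=2000):
    """Type I and type II errors of a single-sample test t, where t(phi) = Pr[decide p]."""
    h = 2 * math.pi / grid
    alpha = 0.0
    beta = 0.0
    for j in range(grid):
        phi = -math.pi + (j + 0.5) * h
        decide_p = t(phi)
        alpha += p(phi) * (1.0 - decide_p) * h  # sample from p, decided q
        beta += q(phi) * decide_p * h           # sample from q, decided p
    return alpha, beta

def p(phi):
    return wrapped_normal(phi, 0.5)

def q(phi):
    return wrapped_normal(phi, 2.0)

def likelihood_ratio_rule(phi):
    """Deterministic test: decide p exactly when p(phi) >= q(phi)."""
    return 1.0 if p(phi) >= q(phi) else 0.0

alpha, beta = error_probabilities(p, q, likelihood_ratio_rule)
always_p = error_probabilities(p, q, lambda phi: 1.0)
```

The trivial rule that always decides $p$ has no type I error but a type II error of one, whereas the likelihood-ratio rule trades the two errors against each other.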

B. Quantum channel estimation
The goal of quantum channel estimation is to estimate a parameter $\theta$ encoded in a family $(\mathcal{N}_\theta)_{\theta\in\Theta}$ of quantum channels. The main difference with quantum channel discrimination is that $\theta$ is selected from a continuous set $\Theta$, and thus we can only obtain an estimate of it, rather than identify it exactly. However, one can think of channel discrimination as being a special case of channel estimation in which the set $\Theta$ is a finite set consisting of just two elements (i.e., $\Theta = \{0, 1\}$ for channel discrimination).
The most general protocol for channel estimation is adaptive, similar to that discussed for channel discrimination. As such, we use the same notation $\mathcal{A}$ to refer to an adaptive strategy for channel estimation. However, the main difference is that the final POVM of a channel estimation protocol outputs an estimate $\hat{\theta}$ of the unknown parameter $\theta$, and thus, the final POVM is of the form $(Q_{\hat{\theta}})_{\hat{\theta}\in\Theta}$, where $Q_{\hat{\theta}} \geq 0$ for all $\hat{\theta}$ and $\int d\hat{\theta} \, Q_{\hat{\theta}} = I$.
Denoting the final state of an $n$-shot protocol by $\rho^{(n)}_\theta$ whenever the underlying channel is $\mathcal{N}_\theta$, the conditional probability density of observing $\hat{\theta}$ is $\operatorname{Tr}[Q_{\hat{\theta}} \rho^{(n)}_\theta]$. To quantify the performance of a channel estimation protocol, we employ a cost function $c(\hat{\theta}, \theta)$ that measures the deviation of the estimate $\hat{\theta}$ from the true value $\theta$. The basic properties for such a cost function are as follows [45, Section 2.1]:
4. $c(\hat{\theta}, \theta)$ is not identically equal to zero,
5. for some constants $k, a > 0$, the following bound holds: $c(\hat{\theta}, \theta) \leq k(1 + |\hat{\theta} - \theta|^a)$ for all $\hat{\theta}, \theta$.
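Beyond the basic properties expected for such a function, we require it to be continuous. Common choices when $\Theta = \mathbb{R}$ include the absolute deviation $c(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$ and the quadratic cost $c(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$. We then define the risk of an $n$-round adaptive strategy $\mathcal{A}$ to be the expected cost:
$$r_n(\theta, \mathcal{A}) := \int d\hat{\theta} \; \operatorname{Tr}\!\left[Q_{\hat{\theta}} \rho^{(n)}_\theta\right] c(\hat{\theta}, \theta).$$
As with channel discrimination, in the notation $r_n(\theta, \mathcal{A})$, we leave the dependence on $\mathcal{N}_\theta$ implicit. If we choose $c(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$, the expected cost is the mean squared error (which equals the variance for an unbiased estimator), the standard error metric for parameter estimation. The difficulty in channel estimation also lies in the complexity of adaptive strategies that can be considered.
Classical parameter estimation is again simpler. In this scenario, there is a parameterized family $(p_\theta)_{\theta\in\Theta}$ of probability densities. Given access to $n$ independent samples selected from $p_\theta^{\otimes n}$, labeled by $\phi^n$, an estimate $\hat{\theta}$ is output according to the conditional probability density $t(\hat{\theta}|\phi^n)$. Then the conditional probability density for outputting $\hat{\theta}$ is
$$\int d\phi^n \; t(\hat{\theta}|\phi^n) \, p_\theta^{\otimes n}(\phi^n),$$
and thus the risk of a classical strategy $t$ for estimating $\theta$ is
$$r_n(\theta, t) := \int d\hat{\theta} \int d\phi^n \; t(\hat{\theta}|\phi^n) \, p_\theta^{\otimes n}(\phi^n) \, c(\hat{\theta}, \theta).$$
Here, in the notation $r_n(\theta, t)$, we again leave the dependence on the underlying probability density $p_\theta$ implicit.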
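As a hedged illustration of the classical risk, the sketch below estimates it by Monte Carlo for the quadratic cost. It assumes a wrapped normal family parameterized by its variance `gamma` and uses a simple moment-based estimator (inverting $E[\cos\phi] = e^{-\gamma/2}$); this estimator is a hypothetical choice for demonstration and is not claimed to be optimal.

```python
import math
import random

def sample_wrapped_normal(gamma, rng):
    """Draw a normal sample of variance gamma and wrap it into [-pi, pi)."""
    x = rng.gauss(0.0, math.sqrt(gamma))
    return (x + math.pi) % (2 * math.pi) - math.pi

def estimate_gamma(samples):
    """Hypothetical estimator: invert E[cos(phi)] = exp(-gamma / 2) using the sample mean."""
    mean_cos = sum(math.cos(s) for s in samples) / len(samples)
    mean_cos = min(max(mean_cos, 1e-12), 1.0)  # keep the logarithm well defined
    return -2.0 * math.log(mean_cos)

def monte_carlo_risk(gamma, n, trials, rng):
    """Average quadratic cost of the estimator over repeated n-sample experiments."""
    total = 0.0
    for _ in range(trials):
        samples = [sample_wrapped_normal(gamma, rng) for _ in range(n)]
        total += (estimate_gamma(samples) - gamma) ** 2
    return total / trials

rng = random.Random(0)
risk_small_n = monte_carlo_risk(0.5, 25, 500, rng)
risk_large_n = monte_carlo_risk(0.5, 400, 500, rng)
```

Increasing the number of samples from 25 to 400 lowers the estimated risk, as expected for a consistent estimator.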

III. MAIN RESULTS
One of the main insights of our paper is that, when restricting the underlying channels to BDCs $\mathcal{D}_p$ and $\mathcal{D}_q$ of the form in (1), with respective probability densities $p$ and $q$, various optimized functions of the error probabilities $\alpha_n(\mathcal{A})$ and $\beta_n(\mathcal{A})$, as defined in (2), are equal to the same optimized functions of the classical error probabilities $\alpha_n(t)$ and $\beta_n(t)$, as defined in (3)-(4). Our first main result is that the following equality holds for all $\lambda \in (0, 1)$:
$$\inf_{\mathcal{A}} \left\{ \lambda \alpha_n(\mathcal{A}) + (1-\lambda) \beta_n(\mathcal{A}) \right\} = \inf_{t} \left\{ \lambda \alpha_n(t) + (1-\lambda) \beta_n(t) \right\}, \tag{8}$$
where the optimization on the left is over all adaptive strategies and that on the right is over all randomized tests (we adopt this same abbreviated notation in all related statements that follow). Our second main result is that the following equality holds for all $\varepsilon \in (0, 1)$:
$$\inf_{\mathcal{A}} \left\{ \beta_n(\mathcal{A}) : \alpha_n(\mathcal{A}) \leq \varepsilon \right\} = \inf_{t} \left\{ \beta_n(t) : \alpha_n(t) \leq \varepsilon \right\}. \tag{9}$$
In Appendix A, we prove that these claims are actually a consequence of a more general statement about the sets of achievable pairs of error probabilities.
A related insight of our paper concerns parameterized families $(\mathcal{D}_{p_\theta})_{\theta\in\Theta}$ of BDCs, for which the probability density $p_\theta$ underlying $\mathcal{D}_{p_\theta}$ is parameterized. In this case, our third main result is that the following equality holds for an arbitrary risk function for which the underlying cost function is continuous:
$$\inf_{\mathcal{A}} r_n(\theta, \mathcal{A}) = \inf_{t} r_n(\theta, t), \tag{10}$$
where, similar to (8), the optimization on the left is over all adaptive strategies for channel estimation of $\mathcal{D}_{p_\theta}$ and the optimization on the right is over all classical estimation strategies for $p_\theta$.
The equalities in (8), (9), and (10) are some of the main results of our paper, and as we will see, they give a complete understanding of the fundamental limits of channel discrimination and parameter estimation for BDCs. The equalities in (8)-(9) thus represent a significant reduction in the difficulty of channel discrimination for BDCs, i.e., reducing it to a classical hypothesis testing problem, for which there is a wealth of knowledge that we can apply. A similar statement applies for (10) and channel estimation of BDCs. In the subsections that follow, we discuss various scenarios of interest in more detail.

A. Symmetric hypothesis testing
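In the symmetric setting of channel discrimination, the goal is to find an optimal strategy that attains the minimum average error probability, defined as
$$\inf_{\mathcal{A}} \left\{ \lambda \alpha_n(\mathcal{A}) + (1-\lambda) \beta_n(\mathcal{A}) \right\},$$
where $\lambda \in (0, 1)$ is the prior probability that channel $\mathcal{N}_0$ is selected and $\alpha_n(\mathcal{A})$ and $\beta_n(\mathcal{A})$ are defined in (2). Our result here establishes, for BDCs $\mathcal{D}_p$ and $\mathcal{D}_q$, that the equality in (8) holds. As a consequence of a well-known result from statistics (see, e.g., [46, Lemma 1.4]), we have the following explicit form for the error probability:
$$\inf_{\mathcal{A}} \left\{ \lambda \alpha_n(\mathcal{A}) + (1-\lambda) \beta_n(\mathcal{A}) \right\} = \frac{1}{2}\left(1 - \left\| \lambda p^{\otimes n} - (1-\lambda) q^{\otimes n} \right\|_1\right),$$
where $\|f\|_1 := \int d\phi^n \, |f(\phi^n)|$. The non-asymptotic error exponent for channel discrimination in the symmetric setting is defined as
$$-\frac{1}{n} \ln \inf_{\mathcal{A}} \left\{ \lambda \alpha_n(\mathcal{A}) + (1-\lambda) \beta_n(\mathcal{A}) \right\}. \tag{13}$$
By appealing to the Chernoff theorem from probability theory [47] and defining the Chernoff divergence of two probability densities $p$ and $q$ as
$$C(p\|q) := -\ln \inf_{s\in[0,1]} \int_{-\pi}^{\pi} d\phi \; p(\phi)^s \, q(\phi)^{1-s},$$
our result implies the following simple expression for the asymptotic error exponent of BDCs $\mathcal{D}_p$ and $\mathcal{D}_q$:
$$\lim_{n\to\infty} -\frac{1}{n} \ln \inf_{\mathcal{A}} \left\{ \lambda \alpha_n(\mathcal{A}) + (1-\lambda) \beta_n(\mathcal{A}) \right\} = C(p\|q).$$
This is because
$$\lim_{n\to\infty} -\frac{1}{n} \ln \inf_{\mathcal{A}} \left\{ \lambda \alpha_n(\mathcal{A}) + (1-\lambda) \beta_n(\mathcal{A}) \right\} = \lim_{n\to\infty} -\frac{1}{n} \ln \inf_{t} \left\{ \lambda \alpha_n(t) + (1-\lambda) \beta_n(t) \right\} = C(p\|q),$$
where the first equality follows from (8) and (13) and the second from Chernoff's theorem.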
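The Chernoff divergence of two phase densities can be evaluated numerically. The sketch below is one simple way to do so, assuming wrapped normal densities, a midpoint rule on $[-\pi, \pi]$, the natural logarithm, and a uniform grid over the exponent $s \in [0, 1]$ in place of an exact minimization; the function names are hypothetical.

```python
import math

def wrapped_normal(phi, gamma, terms=10):
    """Wrapped normal density on [-pi, pi] with variance gamma (truncated sum)."""
    total = sum(math.exp(-(phi + 2 * math.pi * k) ** 2 / (2 * gamma))
                for k in range(-terms, terms + 1))
    return total / math.sqrt(2 * math.pi * gamma)

def chernoff_divergence(p, q, grid=2000, s_steps=100):
    """Approximate -ln min_s integral p^s q^(1-s), with s restricted to a uniform grid."""
    h = 2 * math.pi / grid
    points = [-math.pi + (j + 0.5) * h for j in range(grid)]
    p_vals = [p(x) for x in points]
    q_vals = [q(x) for x in points]
    best = min(
        sum(a ** s * b ** (1 - s) for a, b in zip(p_vals, q_vals)) * h
        for s in (i / s_steps for i in range(s_steps + 1))
    )
    return -math.log(best)

def p(phi):
    return wrapped_normal(phi, 1.0)

def q(phi):
    return wrapped_normal(phi, 2.0)

c_pq = chernoff_divergence(p, q)
c_qp = chernoff_divergence(q, p)
c_pp = chernoff_divergence(p, p)
```

The divergence vanishes for identical densities and is symmetric in its arguments, which serves as a basic sanity check on the numerics.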

B. Asymmetric hypothesis testing
In the asymmetric setting of channel discrimination, the goal is to minimize the type II error probability subject to a constraint on the type I error probability. One example is in radar applications or quantum illumination [48-50], where one is willing to tolerate a certain rate of false alarms but then desires to minimize the chances of missed detections [51]. Indeed, the asymmetric error setting is the right setting to focus on for such applications and leads to a curve known as the receiver operating characteristic [43].
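More formally, for $\varepsilon \in (0, 1)$, the goal is to optimize over every adaptive strategy $\mathcal{A}$ in order to minimize the type II error probability
$$\inf_{\mathcal{A}} \left\{ \beta_n(\mathcal{A}) : \alpha_n(\mathcal{A}) \leq \varepsilon \right\},$$
where $\alpha_n(\mathcal{A})$ and $\beta_n(\mathcal{A})$ are defined in (2). Our result here is that, for BDCs $\mathcal{D}_p$ and $\mathcal{D}_q$, the equality in (9) holds, where $\alpha_n(t)$ and $\beta_n(t)$ are defined in (3)-(4).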
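The non-asymptotic error exponent for channel discrimination in the asymmetric setting is defined as
$$Z_n(\varepsilon, \mathcal{D}_p, \mathcal{D}_q) := -\frac{1}{n} \ln \inf_{\mathcal{A}} \left\{ \beta_n(\mathcal{A}) : \alpha_n(\mathcal{A}) \leq \varepsilon \right\}. \tag{19}$$
By appealing to Strassen's theorem [52, Theorem 3.1], itself a refinement of the classical Stein's lemma [53, 54], our result implies the following expansion of $Z_n(\varepsilon, \mathcal{D}_p, \mathcal{D}_q)$:
$$Z_n(\varepsilon, \mathcal{D}_p, \mathcal{D}_q) = D(p\|q) + \sqrt{\frac{V(p\|q)}{n}} \, \Phi^{-1}(\varepsilon) + O\!\left(\frac{\ln n}{n}\right). \tag{20}$$
(See also [44, Proposition 2.3].) In the above, the relative entropy, the relative entropy variance, and the inverse cumulative distribution function of a standard normal random variable are respectively defined as
$$D(p\|q) := \int_{-\pi}^{\pi} d\phi \; p(\phi) \ln\!\left(\frac{p(\phi)}{q(\phi)}\right),$$
$$V(p\|q) := \int_{-\pi}^{\pi} d\phi \; p(\phi) \left[\ln\!\left(\frac{p(\phi)}{q(\phi)}\right) - D(p\|q)\right]^2,$$
$$\Phi^{-1}(\varepsilon) := \sup\left\{ a \in \mathbb{R} : \Phi(a) \leq \varepsilon \right\},$$
where
$$\Phi(a) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} dx \; e^{-x^2/2}.$$
The equality in (20) follows from the equality in (9) and from the following expansion, a direct application of Strassen's theorem:
$$-\frac{1}{n} \ln \inf_{t} \left\{ \beta_n(t) : \alpha_n(t) \leq \varepsilon \right\} = D(p\|q) + \sqrt{\frac{V(p\|q)}{n}} \, \Phi^{-1}(\varepsilon) + O\!\left(\frac{\ln n}{n}\right).$$
Note that $\Phi^{-1}(\varepsilon) < 0$ for $\varepsilon < 1/2$ and $\Phi^{-1}(\varepsilon) > 0$ for $\varepsilon > 1/2$. As such, by inspecting (20), we see that, as a function of the number $n$ of channel uses, the error exponent $Z_n(\varepsilon, \mathcal{D}_p, \mathcal{D}_q)$ approaches the optimal asymptotic value $D(p\|q)$ from below for $\varepsilon < 1/2$ and from above for $\varepsilon > 1/2$, at a speed determined by the relative entropy variance $V(p\|q)$. The regime of practical interest occurs when $\varepsilon < 1/2$, for which the error exponent $Z_n(\varepsilon, \mathcal{D}_p, \mathcal{D}_q)$ thus approaches $D(p\|q)$ from below.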
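The leading terms of this second-order expansion are easy to evaluate numerically. The sketch below assumes wrapped normal densities, natural logarithms, and a midpoint rule, and it drops the higher-order remainder term, so it is only an approximation of the finite-$n$ exponent; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def wrapped_normal(phi, gamma, terms=10):
    """Wrapped normal density on [-pi, pi] with variance gamma (truncated sum)."""
    total = sum(math.exp(-(phi + 2 * math.pi * k) ** 2 / (2 * gamma))
                for k in range(-terms, terms + 1))
    return total / math.sqrt(2 * math.pi * gamma)

def relative_entropy_and_variance(p, q, grid=2000):
    """Midpoint-rule estimates of D(p||q) and V(p||q) in nats."""
    h = 2 * math.pi / grid
    points = [-math.pi + (j + 0.5) * h for j in range(grid)]
    weighted = [(p(x) * h, math.log(p(x) / q(x))) for x in points]
    d = sum(w * llr for w, llr in weighted)
    v = sum(w * (llr - d) ** 2 for w, llr in weighted)
    return d, v

def second_order_exponent(d, v, n, eps):
    """Leading terms D + sqrt(V / n) * Phi^{-1}(eps); the remainder term is dropped."""
    return d + math.sqrt(v / n) * NormalDist().inv_cdf(eps)

def p(phi):
    return wrapped_normal(phi, 1.0)

def q(phi):
    return wrapped_normal(phi, 2.0)

d, v = relative_entropy_and_variance(p, q)
below = second_order_exponent(d, v, 100, 0.05)
above = second_order_exponent(d, v, 100, 0.95)
```

For a type I error constraint below one half the approximation sits below the relative entropy, and above it otherwise, matching the discussion of the sign of $\Phi^{-1}(\varepsilon)$.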

C. Quantum metrology
In the setting of quantum metrology (an umbrella term containing quantum channel estimation), the goal is to minimize the risk over all possible adaptive strategies. Our main result for channel estimation, which applies to a continuous family $(\mathcal{D}_{p_\theta})_\theta$ of BDCs, states that the equality in (10) holds. In the case that the underlying cost function is the quadratic cost function and the optimization is restricted to adaptive strategies that lead to an unbiased estimator, a consequence of our finding and the classical Cramér-Rao bound is the following inequality:
$$\inf_{\mathcal{A}} r_n(\theta, \mathcal{A}) \geq \frac{1}{n F(\theta)}, \tag{26}$$
where the Fisher information of the parameterized family $(p_\theta)_\theta$ is defined as follows:
$$F(\theta) := \int_{-\pi}^{\pi} d\phi \; p_\theta(\phi) \left(\frac{\partial}{\partial\theta} \ln p_\theta(\phi)\right)^2.$$
The inequality in (26) follows directly from (10) and the well-known Cramér-Rao bound $\inf_t r_n(\theta, t) \geq [n F(\theta)]^{-1}$ [45, Corollary 1.9].
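As a rough numerical companion to the Cramér-Rao bound, the following sketch estimates the Fisher information about the variance parameter of a wrapped normal family. It assumes a central finite difference for the parameter derivative and a midpoint rule for the integral, so the step size and grid are tuning choices rather than prescriptions; the function names are hypothetical.

```python
import math

def wrapped_normal(phi, gamma, terms=10):
    """Wrapped normal density on [-pi, pi] with variance gamma (truncated sum)."""
    total = sum(math.exp(-(phi + 2 * math.pi * k) ** 2 / (2 * gamma))
                for k in range(-terms, terms + 1))
    return total / math.sqrt(2 * math.pi * gamma)

def fisher_information(gamma, step=1e-4, grid=4000):
    """Integral of (d p / d gamma)^2 / p over [-pi, pi], via a central finite difference."""
    h = 2 * math.pi / grid
    total = 0.0
    for j in range(grid):
        phi = -math.pi + (j + 0.5) * h
        derivative = (wrapped_normal(phi, gamma + step)
                      - wrapped_normal(phi, gamma - step)) / (2 * step)
        total += derivative ** 2 / wrapped_normal(phi, gamma) * h
    return total

def cramer_rao_bound(gamma, n):
    """Lower bound 1 / (n F) on the quadratic risk of unbiased estimators from n samples."""
    return 1.0 / (n * fisher_information(gamma))

fisher_narrow = fisher_information(0.1)
fisher_broad = fisher_information(2.0)
bound = cramer_rao_bound(0.1, 100)
```

For a narrow density the wrapping is negligible and the result is close to the value $1/(2\gamma^2)$ of an ordinary normal distribution; for a broad density the wrapping discards information, so the value falls below $1/(2\gamma^2)$.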

IV. EXAMPLES OF BOSONIC DEPHASING CHANNELS
To gain some intuition about our findings, let us consider the specific examples of BDCs studied in [1]. We plot the Chernoff divergence, the relative entropy, and the Fisher information of certain instances of these channels, which are the main quantities of interest in the asymptotic settings of symmetric channel discrimination, asymmetric channel discrimination, and channel estimation, respectively.
As stressed in [1] and previous works [5, 6], perhaps the most important class of BDCs are those resulting from setting the probability density $p(\phi)$ in (1) to be the wrapped normal distribution:
$$p_\gamma(\phi) := \frac{1}{\sqrt{2\pi\gamma}} \sum_{k=-\infty}^{\infty} \exp\!\left(-\frac{(\phi + 2\pi k)^2}{2\gamma}\right),$$
where $\gamma > 0$ is the variance. This probability density results from picking $\phi$ according to a mean-zero normal distribution of variance $\gamma$, but then outputting a value in $[-\pi, \pi]$ modulo $2\pi$. Physically, as considered in [5, 6], it corresponds to interacting the channel input mode with an environmental mode prepared in the vacuum state, according to the Hamiltonian $\hat{n} \otimes (\hat{e} + \hat{e}^\dagger)$, where $\hat{e}$ is the annihilation operator for the environmental mode. It can alternatively be realized in terms of Lindbladian evolution for a time $\gamma$ according to the single Lindblad operator $\hat{n}$.
Another probability density of interest for the BDC is based on the von Mises distribution:
$$p_\lambda(\phi) := \frac{e^{\cos(\phi)/\lambda}}{2\pi I_0(1/\lambda)},$$
where $I_n$ denotes a modified Bessel function of the first kind. The parameter $\lambda > 0$ determines the spread of the distribution, analogous to $\gamma$ for the wrapped normal. For $\lambda \to \infty$, it converges to the uniform density, while it becomes highly peaked at zero in the limit $\lambda \to 0$.
The final circular distribution that we consider is the wrapped Cauchy distribution, for which a parameter $\kappa > 0$ again determines the spread of the distribution.
Figure 2 plots the Chernoff divergence of a pair of BDCs for each kind of distribution, with one spread parameter fixed at a value of $\gamma = \lambda = \kappa = 1$ and the other spread parameter varied. Figure 3 does the same for the relative entropy. Figure 4 plots the Fisher information of the underlying channel parameter $\gamma$, $\lambda$, or $\kappa$. We find similar qualitative behavior for all three kinds of probability densities.
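For readers who wish to reproduce such plots, the sketch below implements two of these densities and checks their normalization. It assumes that the von Mises spread parameter enters as a concentration $1/\lambda$, a parametrization inferred from the limits $\lambda \to \infty$ and $\lambda \to 0$ described above, and it evaluates the Bessel function $I_0$ by its power series; the function names are hypothetical.

```python
import math

def wrapped_normal(phi, gamma, terms=10):
    """Wrapped normal density on [-pi, pi] with variance gamma (truncated sum)."""
    total = sum(math.exp(-(phi + 2 * math.pi * k) ** 2 / (2 * gamma))
                for k in range(-terms, terms + 1))
    return total / math.sqrt(2 * math.pi * gamma)

def bessel_i0(x, terms=40):
    """Modified Bessel function of the first kind of order zero, via its power series."""
    return sum((x / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

def von_mises(phi, lam):
    """von Mises density with concentration 1 / lam (assumed parametrization)."""
    return math.exp(math.cos(phi) / lam) / (2 * math.pi * bessel_i0(1 / lam))

def total_mass(density, grid=2000):
    """Midpoint-rule integral of a density over [-pi, pi]."""
    h = 2 * math.pi / grid
    return sum(density(-math.pi + (j + 0.5) * h) for j in range(grid)) * h

mass_wrapped_normal = total_mass(lambda phi: wrapped_normal(phi, 1.0))
mass_von_mises = total_mass(lambda phi: von_mises(phi, 0.5))
flatness = von_mises(math.pi, 100.0) / von_mises(0.0, 100.0)
peakedness = von_mises(0.0, 0.1) / von_mises(math.pi, 0.1)
```

Under this parametrization, a large $\lambda$ gives a nearly flat density and a small $\lambda$ gives a sharply peaked one, consistent with the stated limits.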

V. OPTIMALITY
In this section, we prove one side of the equalities in (8), (9), and (10) (called the "optimality" part), based on a simple observation about all BDCs of the form in (1). Namely, they can be simulated by the method discussed in [55, Section 3.3], that is, by means of adjoining a parameterized environment state followed by the action of an unparameterized channel. After [55] appeared, similar observations were made for other channels in several subsequent works, including [41] and [26, 27, 29], and here our contribution is to make a similar observation for BDCs. Namely, all BDCs can be simulated by composing the following two processes:
1. A classical background phase $\phi$ is chosen randomly according to the probability density $p(\phi)$ in (1).
2. The input system has the phase operator $e^{-i\hat{n}\phi}$ applied to it, based on the value of $\phi$ chosen, and the value $\phi$ is subsequently discarded.
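The two-step simulation above can be mimicked by Monte Carlo sampling. The sketch below assumes a wrapped normal background phase of variance `gamma` and tracks only the factor multiplying a single coherence $\langle m|\rho|n\rangle$ with $m - n = k$, which for this density should approach $e^{-\gamma k^2/2}$; the function names are hypothetical.

```python
import cmath
import math
import random

def sample_background_phase(gamma, rng):
    """Step 1: draw the classical phase from a wrapped normal density of variance gamma."""
    x = rng.gauss(0.0, math.sqrt(gamma))
    return (x + math.pi) % (2 * math.pi) - math.pi

def simulated_coherence_factor(k, gamma, shots, rng):
    """Average of exp(-i k phi) over sampled phases, i.e., the damping of <m|rho|n> with m - n = k."""
    total = 0j
    for _ in range(shots):
        phi = sample_background_phase(gamma, rng)
        total += cmath.exp(-1j * k * phi)  # step 2: apply the phase rotation, then discard phi
    return total / shots

rng = random.Random(1)
estimate = simulated_coherence_factor(1, 1.0, 100000, rng)
exact = math.exp(-0.5)
```

Averaging over the discarded phase reproduces the dephasing of the input state, up to statistical fluctuations that shrink as the number of shots grows.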
More formally, first we define an environment state $\sigma_p$ that encodes the probability density $p(\phi)$ in (1) as follows:
$$\sigma_p := \int_{-\pi}^{\pi} d\phi \; p(\phi) \, |\phi\rangle\langle\phi|, \tag{31}$$
where, in the physics literature, $\{|\phi\rangle\}_\phi$ is usually interpreted as a set of 'eigenkets' obeying the 'orthogonality relation' $\langle\phi'|\phi\rangle = \delta(\phi - \phi')$ (i.e., $\{|\phi\rangle\}_\phi$ can be seen as an orthogonal basis for the phase $\phi$). We may also interpret (31) as a representation of a random variable on $[-\pi, \pi]$ with probability density $p$. Then $\mathcal{D}_p$ decomposes as
$$\mathcal{D}_p = \mathcal{G} \circ \mathcal{F}_p, \tag{32}$$
where the first channel $\mathcal{F}_p$ appends the environment state $\sigma_p$ to the input state $\rho$, i.e., $\mathcal{F}_p(\rho) := \rho \otimes \sigma_p$, while $\mathcal{G}$ measures $\sigma_p$ and, based on the measured phase $\phi$, applies the unitary phase operator $e^{-i\hat{n}\phi}$ to $\rho$. The action of $\mathcal{D}_p$ on an arbitrary input state $\rho$ is thus as follows:
$$\mathcal{D}_p(\rho) = (\mathcal{G} \circ \mathcal{F}_p)(\rho) = \int_{-\pi}^{\pi} d\phi \; p(\phi) \, e^{-i\hat{n}\phi} \rho \, e^{i\hat{n}\phi}.$$
The implications of the composition in (32) are far reaching, indeed leading to our optimality bounds. The idea is that when we decompose the channel this way, we can "pull back" the environmental state from our analysis of an adaptive strategy $\mathcal{A}$, as depicted in Figure 5. Then, a quantum channel discrimination or estimation task can be recast as a classical state discrimination or estimation task, respectively. Given that the operations in the adaptive strategy $\mathcal{A}$ composed with $n$ instances of the channel $\mathcal{G}$ are independent of $p$ and $q$, the optimality of the distinguishability task is then limited by the distinguishability of the environmental states. More formally, every $n$-round adaptive strategy $\mathcal{A}$ for channel discrimination, when applied to the BDC $\mathcal{D}_p$, can be composed with $n$ calls of the $p$-independent channel $\mathcal{G}$ to view the resulting strategy as a particular classical test $t$ performed on the probability density $p^{\otimes n}$. This reasoning then implies the following inequality, which holds for every adaptive strategy $\mathcal{A}$ applied to BDCs $\mathcal{D}_p$ and $\mathcal{D}_q$:
$$\lambda \alpha_n(\mathcal{A}) + (1-\lambda) \beta_n(\mathcal{A}) \geq \inf_{t} \left\{ \lambda \alpha_n(t) + (1-\lambda) \beta_n(t) \right\}.$$
Since this inequality holds for every adaptive strategy $\mathcal{A}$, we conclude the inequality "≥" in (8) for BDCs $\mathcal{D}_p$ and $\mathcal{D}_q$. Furthermore, for every adaptive strategy $\mathcal{A}$ satisfying $\alpha_n(\mathcal{A}) \leq \varepsilon$, we conclude, by the same reasoning, for BDCs $\mathcal{D}_p$ and $\mathcal{D}_q$ that
$$\beta_n(\mathcal{A}) \geq \inf_{t} \left\{ \beta_n(t) : \alpha_n(t) \leq \varepsilon \right\}.$$
Since this inequality holds for every adaptive strategy $\mathcal{A}$ satisfying $\alpha_n(\mathcal{A}) \leq \varepsilon$, we conclude the inequality "≥" in (9) for BDCs $\mathcal{D}_p$ and $\mathcal{D}_q$.

FIG. 5. Channel discrimination and parameter estimation for environment-parameterized channels $\mathcal{D}_p$ and $\mathcal{D}_q$, where the underlying environment states are $\sigma_p$ and $\sigma_q$, respectively. The yellow-shaded boxes denote the underlying environmental states to which we do not have access.
The same reasoning applies for channel estimation, with respect to a continuous family $(\mathcal{D}_{p_\theta})_\theta$ of parameterized BDCs. Indeed, every $n$-round adaptive strategy $\mathcal{A}$ for channel estimation can be composed with $n$ calls of the $\theta$-independent channel $\mathcal{G}$ to view the resulting strategy as a particular randomized estimator $t$ performed on the probability density $p_\theta^{\otimes n}$. We then conclude the following inequality for every continuous family $(\mathcal{D}_{p_\theta})_\theta$ of parameterized BDCs:
$$r_n(\theta, \mathcal{A}) \geq \inf_{t} r_n(\theta, t).$$
Since this inequality holds for every adaptive strategy $\mathcal{A}$, we conclude the inequality "≥" in (10) for every continuous family $(\mathcal{D}_{p_\theta})_\theta$ of parameterized BDCs.

VI. ATTAINABILITY
Now we prove the other side ("attainability") of the equalities in (8), (9), and (10). Again here, the basic principle behind our reasoning is simple. As we will show, for a BDC $\mathcal{D}_p$, it is possible to input a sequence $(\rho_\nu)_{\nu\in\mathbb{N}}$ of states to it and perform a POVM $(M_\phi)_\phi$ such that, for all $\phi \in [-\pi, \pi]$,
$$\lim_{\nu\to\infty} p_\nu(\phi) = p(\phi), \tag{39}$$
where
$$p_\nu(\phi) := \operatorname{Tr}\!\left[M_\phi \, \mathcal{D}_p(\rho_\nu)\right]. \tag{40}$$
In the formulation above, we have used $\nu$ as an abstract index for a sequence of states. In Sections VI A-VI B, we provide concrete examples in which $\nu$ is replaced by photon number or used as an index for a sequence of coherent states with increasing energy. A channel satisfying (39)-(40) is said to be environment seizable [29, Definition 36] because it is possible to perform pre- and post-processing of the channel in order to "seize" the background environment state. In this case, we can recover the probability density $p(\phi)$, characterizing a BDC $\mathcal{D}_p$, exactly in the $\nu \to \infty$ limit and process it directly. It is similarly possible to do this for all $n \in \mathbb{N}$ because, for all $\phi^n \in [-\pi, \pi]^n$,
$$\lim_{\nu\to\infty} p_\nu^{\otimes n}(\phi^n) = p^{\otimes n}(\phi^n),$$
as a direct consequence of (39). Thus, a particular sequence of strategies for channel discrimination is to input the state $\rho_\nu$ to every channel use, followed by the measurement $(M_\phi)_\phi$, leading to the density $p_\nu^{\otimes n}(\phi^n)$. We then process the resulting densities with a classical test $t$. As we will see shortly, such a sequence of strategies is optimal in the limit $\nu \to \infty$.
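In the case that (39) holds, it directly follows that the type I and type II error probabilities under an arbitrary test $t$ obey the following equalities:
$$\lim_{\nu\to\infty} \alpha_{n,\nu}(t) = \alpha_n(t), \qquad \lim_{\nu\to\infty} \beta_{n,\nu}(t) = \beta_n(t), \tag{42}$$
where
$$\alpha_{n,\nu}(t) := \int d\phi^n \; p_\nu^{\otimes n}(\phi^n) \left[1 - t(\phi^n)\right]$$
and
$$\beta_{n,\nu}(t) := \int d\phi^n \; q_\nu^{\otimes n}(\phi^n) \, t(\phi^n),$$
with $q_\nu(\phi)$ defined as in (40), but with $\mathcal{D}_p$ replaced by $\mathcal{D}_q$.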
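As a consequence of (42), if we can show that there exists a sequence $(\rho_\nu)_{\nu\in\mathbb{N}}$ of states and a measurement $(M_\phi)_\phi$ such that (39) holds, then the desired attainability claims hold because the aforementioned strategy is a particular kind of adaptive strategy $\mathcal{A}$; that is, for every test $t$ and for every test $t'$ such that $\alpha_n(t') \leq \varepsilon$,
$$\inf_{\mathcal{A}} \left\{ \lambda \alpha_n(\mathcal{A}) + (1-\lambda) \beta_n(\mathcal{A}) \right\} \leq \lambda \alpha_n(t) + (1-\lambda) \beta_n(t), \qquad \inf_{\mathcal{A}} \left\{ \beta_n(\mathcal{A}) : \alpha_n(\mathcal{A}) \leq \varepsilon \right\} \leq \beta_n(t').$$
Since the inequalities hold for every test $t$ and for every test $t'$ such that $\alpha_n(t') \leq \varepsilon$, we conclude that the same inequalities hold with infima taken on the right-hand side. Combining this claim with the optimality results from the previous section concludes the proof of the desired equalities in (8) and (9).
We can make similar conclusions for channel estimation for a continuous family $(\mathcal{D}_{p_\theta})_\theta$ of parameterized BDCs. Indeed, in the case that (39) holds and the estimator $t$ satisfies a suitable regularity condition, a simple application of Lebesgue's dominated convergence theorem shows that the risk of the strategy built from $\rho_\nu$, $(M_\phi)_\phi$, and $t$ converges to the classical risk $r_n(\theta, t)$ in the limit $\nu \to \infty$, as stated in (48). As a consequence of (48), and similar reasoning used above in (45), we conclude the following attainability inequality:
$$\inf_{\mathcal{A}} r_n(\theta, \mathcal{A}) \leq \inf_{t} r_n(\theta, t),$$
which thus finishes the proof of our main channel estimation result in (10).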
In the two subsections that follow, we exhibit two specific schemes for which the needed equality in (39) holds. Moreover, the methods are simple to describe in physical terms, involving either 1) preparation of a uniform superposition of photon-number states at the input and a quantum Fourier transform followed by photon-number measurement and classical post-processing at the output or 2) preparation of a coherent state at the input and heterodyne detection followed by classical post-processing at the output. See Figure 6 for a visual depiction of the two methods. The latter method is robust to loss in the channel in addition to dephasing, due to the fact that the purity of coherent states is retained under a pure-loss channel (see Section VII for more discussions of this point). The first scheme is similar to that introduced in [56], and the measurement used in the first scheme can be considered an approximation of the canonical phase measurement, also discussed in [57]. The second scheme has been considered in [57].

A. Photon-number-superposition method
As stated above, this method involves preparing a uniform superposition of photon-number states at the input and performing a quantum Fourier transform, followed by photon detection and classical post-processing at the output. Photon-number superposition states have been well investigated in the context of optical phase estimation (see [58-62]). The scheme we consider below is quite similar to that proposed in [56].
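Let us begin by defining a $d$-level, uniform superposition of photon-number states:
$$|{+_d}\rangle := \frac{1}{\sqrt{d}} \sum_{n=0}^{d-1} |n\rangle, \tag{51}$$
where $|n\rangle$ is a photon-number state [2]. A property of $|{+_d}\rangle$ is that phases become encoded into it as follows:
$$e^{-i\hat{n}\phi} |{+_d}\rangle = \frac{1}{\sqrt{d}} \sum_{n=0}^{d-1} e^{-in\phi} |n\rangle.$$
Such encoded phases can be recovered approximately by performing a measurement in the Fourier basis, which is defined for all $k \in \{0, \ldots, d-1\}$ as
$$|u_k\rangle := \frac{1}{\sqrt{d}} \sum_{n=0}^{d-1} e^{-2\pi i k n/d} |n\rangle.$$
Indeed, as shown in Appendix B, we find that the probability of measuring $k \in \{0, \ldots, d-1\}$ is as follows:
$$\left|\langle u_k| e^{-i\hat{n}\phi} |{+_d}\rangle\right|^2 = \frac{1}{d^2} \left| \sum_{n=0}^{d-1} e^{-in(\phi - 2\pi k/d)} \right|^2 = \frac{1}{d^2} \, \frac{\sin^2\!\left(d(\phi - 2\pi k/d)/2\right)}{\sin^2\!\left((\phi - 2\pi k/d)/2\right)}. \tag{54}$$
The final expression above is proportional to the Fejér kernel, a well-known object in Fourier analysis [63]; as a function of $\phi$, it is peaked at $\phi = 2\pi k/d$.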
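The conditional probability density $p_d(\hat{\phi}|\phi)$ of the final estimate $\hat{\phi}$, defined in (55), satisfies the convergence statement in (57), which is essentially equivalent to the less formal statement that $p_d(\hat{\phi}|\phi)$ converges to the Dirac delta function $\delta(\hat{\phi} - \phi)$ in the $d \to \infty$ limit. The convergence statement in (57) is proved in a rigorous way in Lemma 1 in Appendix B. Now applying this reasoning to the BDC $\mathcal{D}_p$, we find from a direct application of (54) that
$$\operatorname{Tr}\!\left[ |u_k\rangle\langle u_k| \, \mathcal{D}_p(|{+_d}\rangle\langle{+_d}|) \right] = \int_{-\pi}^{\pi} d\phi \; p(\phi) \, \frac{1}{d^2} \, \frac{\sin^2\!\left(d(\phi - 2\pi k/d)/2\right)}{\sin^2\!\left((\phi - 2\pi k/d)/2\right)}. \tag{58}$$
Let us then denote by $(M_\phi)_\phi$ the measurement that 1) performs a Fourier basis measurement $\{|u_k\rangle\langle u_k|\}_k$ with outcome $k$, 2) calculates the value $\phi = 2\pi k/d$ and shifts by $-2\pi$ if $\phi \in (\pi, 2\pi)$, and 3) finally adds to this value uniform noise selected from an interval of size $2\pi/d$ to produce an outcome $\phi$. It then follows as a consequence of (57) and (58) that
$$\lim_{d\to\infty} \operatorname{Tr}\!\left[ M_\phi \, \mathcal{D}_p(|{+_d}\rangle\langle{+_d}|) \right] = p(\phi),$$
concluding our proof of (39) for this scheme.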
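The Fourier-basis outcome probabilities of this scheme, for a fixed phase $\phi$ applied to the uniform superposition, can be checked numerically. The sketch below uses the closed form $\sin^2(dx/2)/(d^2 \sin^2(x/2))$ with $x = \phi - 2\pi k/d$, which follows from summing the geometric series of overlaps; the function name and the chosen values of $d$ and $\phi$ are illustrative assumptions.

```python
import math

def fourier_outcome_probability(k, phi, d):
    """Probability of Fourier-basis outcome k for the d-level uniform superposition rotated by phi."""
    x = phi - 2 * math.pi * k / d
    s = math.sin(x / 2)
    if abs(s) < 1e-12:
        return 1.0  # limiting value when x is a multiple of 2*pi
    return (math.sin(d * x / 2) / (d * s)) ** 2

d = 20
phi = 0.3
probabilities = [fourier_outcome_probability(k, phi, d) for k in range(d)]
most_likely = max(range(d), key=lambda k: probabilities[k])
```

The probabilities sum to one for every $\phi$, and the most likely outcome is the $k$ for which $2\pi k/d$ lies closest to $\phi$, which is why the outcome serves as a phase estimate whose resolution improves with $d$.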

B. Coherent-state method
This method involves preparing a coherent state at the input and performing heterodyne detection and classical post-processing at the output, which is a routine method for optical phase estimation (see, e.g., [57, 64-67]). The coherent-state method is easier to implement in practice than the photon-number-superposition method. In this approach, the initial state is given by the following coherent state:
$$|\alpha\rangle := e^{-|\alpha|^2/2} \sum_{n=0}^{\infty} \frac{\alpha^n}{\sqrt{n!}} |n\rangle,$$
where $\alpha \in \mathbb{C}$. For the scheme we use here, we fix $\alpha \in \mathbb{R}_+$.
After the phase rotation $e^{-i\hat{n}\phi}$ acts, the state becomes another coherent state of the same amplitude but with a rotated phase, given in (61) and reviewed in Appendix C. Performing a heterodyne measurement (with POVM elements $\{\frac{1}{\pi}|\beta\rangle\langle\beta|\}_{\beta\in\mathbb{C}}$) on the state in (61) then leads to the measurement outcome $\beta$. The final step (classical post-processing) is to compute the argument of $\beta$, i.e., $\hat{\phi} := \arg(\beta)$, as an estimate of the phase $\phi$; the probability density for $\hat{\phi}$ is known as the Rician phase distribution [68, Eqs. (10) & (20)]. See Appendix D for a derivation of the Rician phase probability density, provided for convenience. Notably, this probability density, denoted here by $p_\alpha(\hat{\phi}|\phi)$, is highly peaked at $\hat{\phi} = \phi$ and converges to a Dirac delta function in the following sense:
$$\lim_{\alpha\to\infty} \int_{-\pi}^{\pi} d\phi \; p(\phi) \, p_\alpha(\hat{\phi}|\phi) = p(\hat{\phi}), \tag{63}$$
where $p(\phi)$ is an arbitrary probability density defined on the interval $[-\pi, \pi]$. We provide a rigorous statement of the above convergence in Lemma 2 in Appendix C. Finally, denoting by $(M_\phi)_\phi$ the measurement that 1) performs heterodyne detection with outcome $\beta$ and 2) outputs the value $\phi = \arg(\beta)$, it follows as a consequence of (63) that
$$\lim_{\alpha\to\infty} \operatorname{Tr}\!\left[ M_\phi \, \mathcal{D}_p(|\alpha\rangle\langle\alpha|) \right] = p(\phi), \tag{64}$$
concluding our proof of (39) for this scheme. Let us remark that an explicit form for the POVM $(M_\phi)_\phi$ was obtained in [57, Eq. (3.10)].
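The concentration of the heterodyne phase estimate can be explored numerically. The sketch below uses one closed form for the phase density, obtained here by integrating the heterodyne outcome density $\frac{1}{\pi} e^{-|\beta - \beta_0|^2}$ of a coherent state $|\beta_0\rangle$ over the radius $|\beta|$; it is expressed in terms of the offset `delta` between the estimate and the location of the peak, and it may differ in form from the expression in [68]. The function names are hypothetical.

```python
import math

def rician_phase_density(delta, alpha):
    """Density of the heterodyne phase estimate at offset delta from its peak, for amplitude alpha >= 0."""
    c = alpha * math.cos(delta)
    return (math.exp(-alpha ** 2) / (2 * math.pi)
            + c / (2 * math.sqrt(math.pi))
            * math.exp(-(alpha * math.sin(delta)) ** 2)
            * (1 + math.erf(c)))

def mass_within(width, alpha, grid=20000):
    """Midpoint-rule probability that the offset lies in [-width, width]."""
    h = 2 * width / grid
    return sum(rician_phase_density(-width + (j + 0.5) * h, alpha)
               for j in range(grid)) * h

total = mass_within(math.pi, 3.0)
mass_low_energy = mass_within(0.5, 1.0)
mass_high_energy = mass_within(0.5, 5.0)
```

At zero amplitude the density is uniform, and as the amplitude grows nearly all of the probability mass gathers near zero offset, in line with the convergence to a Dirac delta function.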

C. Comparison of methods for finite energy
Let us compare the performance of the photon-number-superposition and coherent-state methods to the fundamental limit when there is an energy constraint in place. In particular, let us consider channel discrimination (asymmetric error) of two bosonic dephasing channels D_{p_γ1} and D_{p_γ2}, for which the underlying probability densities p_γ1 and p_γ2 are wrapped normal distributions with respective variances γ_1 and γ_2. Figure 7 illustrates how quickly the relative entropy of these schemes converges to the optimal relative entropy. For the photon-number-superposition scheme, the probability density as a function of d is given by

$$ p_{d,\gamma}(\hat{\phi}) := \int_{-\pi}^{\pi} d\phi\; p_\gamma(\phi)\, p_d(\hat{\phi}|\phi), $$

where p_d(φ̂|ϕ) is defined in (55), from which we can calculate the relative entropy D(p_{d,γ1} ∥ p_{d,γ2}) of this scheme as a function of d. For the coherent-state scheme, the probability density as a function of α is given by

$$ p_{\alpha,\gamma}(\hat{\phi}) := \int_{-\pi}^{\pi} d\phi\; p_\gamma(\phi)\, p_\alpha(\hat{\phi}|\phi), $$

where p_α(φ̂|ϕ) is the Rician phase density from Section VI B, from which we can calculate the relative entropy D(p_{α,γ1} ∥ p_{α,γ2}) for this scheme as a function of α. Interestingly, Figure 7 indicates that these schemes come close to achieving the fundamental limit in practice. We also see that the coherent-state scheme has an advantage over the photon-number-superposition scheme for the same fixed energy, given that the mean photon number is 9.5 for the state |+_d⟩ in (51) when d = 20.
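As a rough numerical companion to this comparison, the following sketch evaluates the benchmark quantity, i.e., the relative entropy between the two wrapped normal densities themselves, and checks the energy bookkeeping for d = 20. It assumes zero-mean wrapped normal densities and a uniform superposition of the Fock states |0⟩, ..., |d−1⟩; the function names and the grid size are illustrative choices, not taken from the paper.

```python
import numpy as np

def wrapped_normal(phi, gamma, wraps=10):
    """Zero-mean wrapped normal density on [-pi, pi) with variance parameter gamma."""
    k = np.arange(-wraps, wraps + 1)[:, None]  # winding numbers
    terms = np.exp(-(phi[None, :] + 2 * np.pi * k) ** 2 / (2 * gamma))
    return terms.sum(axis=0) / np.sqrt(2 * np.pi * gamma)

def relative_entropy(p, q, dphi):
    """D(p||q) in nats, as a Riemann sum on a uniform periodic grid."""
    return float(np.sum(p * np.log(p / q)) * dphi)

def optimal_relative_entropy(gamma1, gamma2, points=4096):
    """Relative entropy between the two underlying densities (the benchmark)."""
    phi = np.linspace(-np.pi, np.pi, points, endpoint=False)
    dphi = 2 * np.pi / points
    return relative_entropy(wrapped_normal(phi, gamma1),
                            wrapped_normal(phi, gamma2), dphi)

def mean_photon_number_uniform(d):
    """Mean photon number of the uniform superposition of |0>, ..., |d-1> (assumed form)."""
    return (d - 1) / 2

if __name__ == "__main__":
    print("mean photon number for d = 20:", mean_photon_number_uniform(20))
    for gamma2 in (1.5, 2.0, 3.0):
        print(gamma2, optimal_relative_entropy(1.0, gamma2))
```

Very small variances (roughly γ below 0.02) would make the density underflow to zero near ±π on this grid, so the logarithm would need extra care there.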

VII. BOSONIC DEPHASING AND LOSS
Our results apply more generally to a scenario that involves photon loss in addition to dephasing. This indicates a certain robustness of our results, since we expect to encounter photon loss in any realistic scenario. Namely, suppose that the two channels to distinguish are L_η ∘ D_p and L_η ∘ D_q, where L_η is a pure-loss bosonic channel of transmissivity η ∈ (0, 1]. This composite channel has been studied in the context of quantum communication, under the name bosonic loss-dephasing channel [69,70]. Our main observation here is that the distinguishability of these channels in all scenarios considered is no different from the distinguishability of D_p and D_q. Thus, all results stated above for D_p and D_q hold also for L_η ∘ D_p and L_η ∘ D_q.

The optimality part of this claim follows by reasoning similar to that used in Section V. That is, since the pure-loss channel L_η is common to both L_η ∘ D_p and L_η ∘ D_q, it can be considered as part of an adaptive strategy used to discriminate these channels, and so their distinguishability is still limited by the classical environmental states σ_p and σ_q. The attainability part follows by using the scheme from Section VI and the fact that coherent states retain their purity after the action of a pure-loss channel. That is, L_η(|α⟩⟨α|) = |√η α⟩⟨√η α|. Then the following limit holds by applying the same reasoning used to justify (64):

$$ \lim_{\alpha\to\infty} \operatorname{Tr}\!\big[M_{\hat{\phi}}\, (\mathcal{L}_\eta \circ \mathcal{D}_p)(|\alpha\rangle\langle\alpha|)\big] = p(\hat{\phi}). $$

Here we also used the fact that dephasing channels and pure-loss channels commute; i.e.,

$$ \mathcal{L}_\eta \circ \mathcal{D}_p = \mathcal{D}_p \circ \mathcal{L}_\eta. $$

Similarly, all estimation results stated above for the parameterized family (D_{p_θ})_{θ∈Θ} hold also for the parameterized family (L_η ∘ D_{p_θ})_{θ∈Θ}. This follows from the same reasoning given for the discrimination setting. Namely, the optimality part follows because all estimation strategies for the family (L_η ∘ D_{p_θ})_{θ∈Θ} are limited by those of the family (p_θ)_{θ∈Θ} of probability densities. Then for the attainability part, the equality in (68) applies, allowing us to apply the reasoning in Section VI again.
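The attainability argument can be written out as a short chain of identities. The following is a sketch that uses only the commutation of the two channels, the action of the pure-loss channel on coherent states, and the definition of the dephasing channel as a random phase rotation; it assumes a fixed η > 0 and a real amplitude α ≥ 0, as in Section VI B.

```latex
\begin{align}
(\mathcal{L}_\eta \circ \mathcal{D}_p)(|\alpha\rangle\langle\alpha|)
  &= (\mathcal{D}_p \circ \mathcal{L}_\eta)(|\alpha\rangle\langle\alpha|)
     && \text{(the channels commute)} \\
  &= \mathcal{D}_p\big(|\sqrt{\eta}\,\alpha\rangle\langle\sqrt{\eta}\,\alpha|\big)
     && \text{(pure loss maps coherent states to coherent states)} \\
  &= \int_{-\pi}^{\pi} d\phi\; p(\phi)\,
     |\sqrt{\eta}\,\alpha e^{-i\phi}\rangle\langle\sqrt{\eta}\,\alpha e^{-i\phi}|
     && \text{(random phase rotation)}.
\end{align}
```

The output is therefore the same as that of the lossless scheme with input amplitude √η α. Since √η α → ∞ as α → ∞ for every fixed η > 0, the limiting statistics of the heterodyne phase estimate are unchanged by the loss.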
Finally, numerical estimates using the probability distribution derived in Appendix H indicate that the photon-number-superposition method from Section VI A might be optimal also in the presence of loss, provided that one considers the limit of infinite energy. That is, although the uniform superposition state in (51) is affected detrimentally by loss, it seems to retain sufficiently high coherence to effectively detect a phase-space rotation. The coherent-state scheme from Section VI B, however, might still have an advantage over the photon-number-superposition method also in the presence of loss if one considers the finite-energy setting; Figure 7 illustrates that this is indeed the case for channel discrimination in the setting of asymmetric error.

VIII. CONCLUSION
In conclusion, we have determined the fundamental limits of discrimination and estimation for BDCs, complementing the recent results of [1] on communication. Not only have we accomplished this for asymptotic quantities like the symmetric and asymmetric error exponents for channel discrimination, but we have also done so for the underlying fundamental, operational quantities like the symmetric and asymmetric error probabilities of an arbitrary n-round adaptive strategy (see (8) and (9), respectively). We have done the same for the main operational quantity in channel estimation, the risk of an n-round adaptive strategy (see (10)). These results rest on two main ideas: for the optimality part, the method of simulation from [55]; for the attainability part, a sequence of strategies that pre- and post-process a BDC to recover its underlying probability density. This is similar in spirit to previous results of [26,27].
Going forward from here, the main pressing open question is to determine the limits for these tasks whenever there is a realistic energy constraint in place. More specifically, we think it is interesting to determine which scheme, either the photon-number-superposition scheme from Section VI A or the coherent-state scheme from Section VI B, performs better in the finite-energy regime, as well as in the case that there is photon loss in addition to dephasing. There are certainly other schemes besides these two to consider as well. Furthermore, given that our findings in Section VII only apply when the transmissivity parameter η is fixed, it is open to determine the limits of discrimination and estimation when the transmissivity parameter varies in addition to the dephasing channel.
Another natural generalization of our results is to the case of an arbitrary random unitary channel of the form

$$ \rho \mapsto \int_{\mathbb{R}} d\phi\; p(\phi)\, e^{-iH\phi}\, \rho\, e^{iH\phi}, $$

where p is a probability density on the real line and H is a general Hamiltonian. The same simulation arguments from Section V yield the optimality bounds, namely, that all adaptive strategies for discriminating or estimating channels from this class are limited by the underlying classical probability densities. Based on the insights from [71, Proposition 2], we expect that recovering the underlying probability density p might be possible for a large class of Hamiltonians. If that is the case, then our results could be extended far beyond the setting we considered here.
where Π_d is defined by (56), and x mod 2π := min {x + 2πk : where D_p is the bosonic dephasing channel given by (1), satisfies uniformly on [−π, π], and furthermore

Proof. We start by observing that, due to the calculation in the first part of this appendix, Here, in (i) we introduced the Fejér kernel F_d(x) := implying that indeed continuing with the justification of the first chain of identities, in (iii) we used the periodicity of F_d to substitute φ̂′ with φ̂, and finally in (iv) we introduced the notation Now, calling p̃ the periodic extension of p to the whole real line, for all ξ ∈ ℝ one sees that Note that since p is continuous on the compact set [−π, π], it is also uniformly continuous. Due to the fact that p(−π) = p(π), its extension p̃ can also be shown to be uniformly continuous. Let ω be the modulus of continuity of p̃. This means that ω : [0, ∞) → [0, ∞) is a non-decreasing continuous function, with ω(0) = 0, such that for all ξ, ξ′ ∈ ℝ it holds that Now, for ξ, ξ′ ∈ ℝ we can write that where in the last line we leveraged the fact that We are finally ready to put everything together and prove the first half of the claim. We write that where in the last line we noted that φ̂ ≤ 2π To deduce (B11) from (B10) it suffices to note that and the right-hand side tends to 0 as d → ∞ due to (B10).
Appendix C: Calculations for coherent-state method

Let us first justify the equality in (61). After the phase rotation e^{−in̂ϕ} acts, the state becomes

$$ e^{-i\hat{n}\phi}\,|\alpha\rangle = e^{-|\alpha|^2/2} \sum_{n=0}^{\infty} \frac{\alpha^n e^{-in\phi}}{\sqrt{n!}}\, |n\rangle = |\alpha e^{-i\phi}\rangle. $$

After performing heterodyne detection, and as discussed in the main text, we compute the argument of β as the estimate of ϕ, i.e., φ̂ := arg(β). The induced probability density function for φ̂ is known as the Rician phase distribution (see [68, Eqs. (10) & (20)]). In particular, we can model the random process by which φ̂ is generated as being like that in [68, Eq. (3)], given by where n is a complex Gaussian random variable CN(0, 1) (such that the variance for each of the real and imaginary parts is 1/2, i.e., σ² = 1/2, using the notation of [68, Eq. (3)]). We can restrict α to be a positive real number, and in this case, we have that A = α and B = −1, using the notation of [68, Eq. (3)]. Following [68, Eqs. (10) & (20)], we find that the probability density p_α(φ̂|ϕ) for φ̂ ∈ [−π, π] is given by We now show that this probability density converges to a Dirac delta at ϕ in the limit as α → ∞, in the sense stated in (63).

Proof. For this proof it is ideal to work with the integral representation of the Rician probability density given in (D5); namely, Substituting (C10) into (C7), we now have that The justification of the above chain of identities is as follows: in (i) we used the above integral representation of the Rician probability density and introduced the periodic extension p̃ of p to the whole real line; in (ii) we used Fubini's theorem, changing variables to γ := b e^{−i(φ̂−ϕ)}; in (iii) we changed variables again, setting z := γ − α. Now, since α ≥ 0 is real, we have that where in (iv) we employed Lebesgue's dominated convergence theorem, which is applicable because, due to its periodicity and continuity, p̃ is a bounded function, which means that p̃(φ̂ + arg(z + α)) e^{−|z|²} ≤ M e^{−|z|²} for some constant M > 0, and the right-hand side is an absolutely integrable function of z. This completes the proof of (C8).
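The concentration of the phase estimate for growing α can also be checked by direct sampling. The sketch below is a Monte Carlo illustration, not the proof: it models the heterodyne outcome as the rotated amplitude plus CN(0, 1) noise and takes the argument of the outcome as the estimate. The sign convention (outcome centred at α e^{+iϕ}, so that the estimate is peaked at ϕ), the function names, the seed, and the sample sizes are assumptions made for this example.

```python
import numpy as np

def heterodyne_phase_estimates(alpha, phi, shots, rng):
    """Sample arg(beta) for heterodyne detection of a phase-rotated coherent state."""
    # CN(0, 1) noise: variance 1/2 in each of the real and imaginary parts.
    noise = (rng.normal(size=shots) + 1j * rng.normal(size=shots)) / np.sqrt(2)
    # Assumed sign convention: outcome centred at alpha * exp(+i * phi).
    # With the opposite convention, negate the returned angles.
    beta = alpha * np.exp(1j * phi) + noise
    return np.angle(beta)

def mean_abs_phase_error(alpha, phi, shots=200_000, seed=0):
    """Average absolute deviation of the estimate from phi, wrapped into (-pi, pi]."""
    rng = np.random.default_rng(seed)
    estimates = heterodyne_phase_estimates(alpha, phi, shots, rng)
    wrapped_error = np.angle(np.exp(1j * (estimates - phi)))
    return float(np.mean(np.abs(wrapped_error)))

if __name__ == "__main__":
    for alpha in (1.0, 3.0, 10.0, 30.0):
        print(alpha, mean_abs_phase_error(alpha, phi=0.7))
```

The printed error shrinks roughly like 1/α for large α, consistent with the density collapsing onto a point mass at ϕ.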
To deduce (C9), we first note that if p(ξ) ≤ M for all ξ ∈ ℝ, then also

where α_n(A) and β_n(A) are defined in (2). By applying the same reasoning as given in Sections V and VI, we conclude for BDCs D_p and D_q that

$$ B_n(r, \mathcal{D}_p, \mathcal{D}_q) = \sup_{t} \left\{ -\frac{1}{n} \ln \alpha_n(t) : \beta_n(t) \le e^{-rn} \right\}, \tag{E6} $$

where α_n(t) and β_n(t) are defined in (3)-(4) and taken with respect to the probability densities p and q defining D_p and D_q, respectively. By taking the n → ∞ limit and applying the classical result of [78], we conclude that

$$ \lim_{n\to\infty} B_n(r, \mathcal{D}_p, \mathcal{D}_q) = \sup_{\alpha\in(0,1)} \frac{\alpha - 1}{\alpha} \big( r - D_\alpha(p\|q) \big), \tag{E7} $$

where the Rényi relative entropy D_α(p∥q) is defined in (E4).
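Since the limit in (E7) depends only on the classical densities, it can be evaluated numerically. The following sketch does so on a grid of orders α, assuming the standard definition D_α(p∥q) = (α − 1)^{-1} ln ∫ p^α q^{1−α} for (E4) and zero-mean wrapped normal densities as an example; all function names and grid sizes are illustrative.

```python
import numpy as np

def wrapped_normal(phi, gamma, wraps=10):
    """Zero-mean wrapped normal density on [-pi, pi) with variance parameter gamma."""
    k = np.arange(-wraps, wraps + 1)[:, None]
    terms = np.exp(-(phi[None, :] + 2 * np.pi * k) ** 2 / (2 * gamma))
    return terms.sum(axis=0) / np.sqrt(2 * np.pi * gamma)

def renyi_relative_entropy(p, q, order, dphi):
    """D_order(p||q) for order in (0, 1), assuming the standard definition."""
    overlap = np.sum(p ** order * q ** (1 - order)) * dphi
    return float(np.log(overlap) / (order - 1))

def asymptotic_exponent(p, q, r, dphi, orders=None):
    """Grid approximation of sup over orders of (order - 1)/order * (r - D_order(p||q))."""
    if orders is None:
        orders = np.linspace(0.01, 0.99, 99)
    return max((a - 1) / a * (r - renyi_relative_entropy(p, q, a, dphi))
               for a in orders)

if __name__ == "__main__":
    points = 4096
    phi = np.linspace(-np.pi, np.pi, points, endpoint=False)
    dphi = 2 * np.pi / points
    p, q = wrapped_normal(phi, 0.05), wrapped_normal(phi, 0.1)
    for r in (0.0, 0.02, 0.05):
        print(r, asymptotic_exponent(p, q, r, dphi))
```

The finite grid of orders only gives a lower bound on the supremum; a finer grid or a one-dimensional optimizer would tighten it.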

Multiple channel discrimination
The goal of multiple channel discrimination is to decide which channel has been chosen from a tuple of channels. More formally, let (N_i)_{i=1}^ℓ be a tuple of channels. Then an adaptive protocol for channel discrimination consists of an adaptive strategy of the form discussed previously in Section II A, with the only difference being that the final measurement has ℓ outcomes, one for each channel. Letting ρ_i be the final state of such a protocol when the ith channel has been selected, and letting (Q_i)_{i=1}^ℓ denote the final measurement, the success probability of multiple channel discrimination is

$$ p_s^n\big((\mathcal{N}_i)_{i=1}^{\ell}\big) := \sup \sum_{i=1}^{\ell} \lambda_i \operatorname{Tr}[Q_i \rho_i], $$

where the supremum is over all n-round adaptive protocols and λ_i is the prior probability that channel N_i is selected. (Thus, the following constraints apply: λ_i ≥ 0 for all i ∈ {1, ..., ℓ} and Σ_{i=1}^ℓ λ_i = 1.)

Now let us consider classical multiple hypothesis testing. Let (p_i)_{i=1}^ℓ be a tuple of probability densities. Here the goal is to observe a sample ϕ^n ≡ (ϕ_1, ..., ϕ_n) from one of the product densities (i.e., of the form p_i^{⊗n}) and decide the value of i (i.e., which density generated the sample sequence). The success probability is given by

$$ p_s^n\big((p_i)_{i=1}^{\ell}\big) := \sup_{t} \sum_{i=1}^{\ell} \lambda_i \int d\phi^n\; t(i|\phi^n)\, p_i^{\otimes n}(\phi^n), \tag{E9} $$

where λ_i is a prior probability and (t(i|ϕ^n))_{i=1}^ℓ is a conditional probability distribution (i.e., satisfying t(i|ϕ^n) ≥ 0 for all i ∈ {1, ..., ℓ} and Σ_{i=1}^ℓ t(i|ϕ^n) = 1). By the same reasoning from Sections V and VI, our main result here is that

$$ p_s^n\big((\mathcal{D}_{p_i})_{i=1}^{\ell}\big) = p_s^n\big((p_i)_{i=1}^{\ell}\big), $$

where (D_{p_i})_{i=1}^ℓ is a tuple of bosonic dephasing channels defined by the corresponding tuple (p_i)_{i=1}^ℓ of probability densities. By employing the known result [79] (see also [80, Theorem 4.2] and [81][82][83]) that the asymptotic error exponent for multiple hypothesis testing is equal to the minimum pairwise Chernoff divergence, we conclude the following:

$$ \lim_{n\to\infty} -\frac{1}{n} \ln\!\Big(1 - p_s^n\big((\mathcal{D}_{p_i})_{i=1}^{\ell}\big)\Big) = \min_{i \neq j} C(p_i\|p_j), $$

where the Chernoff divergence C(p_i∥p_j) is defined from (14).
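The minimum pairwise Chernoff divergence is straightforward to evaluate for given densities. The sketch below assumes the standard definition C(p∥q) = −ln min_{s∈[0,1]} ∫ p^s q^{1−s} for the quantity defined from (14), and uses three zero-mean wrapped normal densities as a made-up example; the function names and grid sizes are illustrative.

```python
import itertools
import numpy as np

def wrapped_normal(phi, gamma, wraps=10):
    """Zero-mean wrapped normal density on [-pi, pi) with variance parameter gamma."""
    k = np.arange(-wraps, wraps + 1)[:, None]
    terms = np.exp(-(phi[None, :] + 2 * np.pi * k) ** 2 / (2 * gamma))
    return terms.sum(axis=0) / np.sqrt(2 * np.pi * gamma)

def chernoff_divergence(p, q, dphi, steps=201):
    """-ln of the minimum over s in [0, 1] of the overlap integral of p^s q^(1-s)."""
    s = np.linspace(0.0, 1.0, steps)[:, None]
    overlaps = np.sum(p[None, :] ** s * q[None, :] ** (1 - s), axis=1) * dphi
    return float(-np.log(overlaps.min()))

def min_pairwise_chernoff(densities, dphi):
    """Minimum Chernoff divergence over all unordered pairs of densities."""
    pairs = itertools.combinations(range(len(densities)), 2)
    return min(chernoff_divergence(densities[i], densities[j], dphi)
               for i, j in pairs)

if __name__ == "__main__":
    points = 2048
    phi = np.linspace(-np.pi, np.pi, points, endpoint=False)
    dphi = 2 * np.pi / points
    densities = [wrapped_normal(phi, g) for g in (0.05, 0.1, 0.4)]
    print(min_pairwise_chernoff(densities, dphi))
```

In this example the minimum is attained by the two densities with the closest variances, which is the pair that is hardest to tell apart.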

Antidistinguishability
The problem of antidistinguishability has the same structure as multiple channel discrimination, but the goal is the opposite. That is, the goal is to decide which channel was not selected: if the ith channel is selected, the goal is to report back "not j" for some j ≠ i, and an error occurs when the report is "not i". We can thus adopt all of the notation from the previous section, but the error probability for the antidistinguishability problem is given by the same objective function as before, now minimized rather than maximized over all protocols, with the ith measurement outcome interpreted as the report "not i". Similarly, for the classical antidistinguishability problem, the error probability is given by

$$ p_e^n\big((p_i)_{i=1}^{\ell}\big) := \inf_{t} \sum_{i=1}^{\ell} \lambda_i \int d\phi^n\; t(i|\phi^n)\, p_i^{\otimes n}(\phi^n). \tag{E13} $$

Thus, the main mathematical difference from multiple hypothesis testing is that the objective functions are minimized rather than maximized. By the same reasoning from Sections V and VI, we conclude that

$$ p_e^n\big((\mathcal{D}_{p_i})_{i=1}^{\ell}\big) = p_e^n\big((p_i)_{i=1}^{\ell}\big), $$

where (D_{p_i})_{i=1}^ℓ is a tuple of bosonic dephasing channels defined by the corresponding tuple (p_i)_{i=1}^ℓ of probability densities.
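For a single sample (n = 1), the optimizations in (E9) and (E13) can be carried out pointwise: the supremum puts all of the weight t(i|ϕ) on an index that maximizes λ_i p_i(ϕ), and the infimum on an index that minimizes it. The sketch below evaluates both values on a grid; the zero-mean wrapped normal densities and the function names are assumptions made for illustration.

```python
import numpy as np

def wrapped_normal(phi, gamma, wraps=10):
    """Zero-mean wrapped normal density on [-pi, pi) with variance parameter gamma."""
    k = np.arange(-wraps, wraps + 1)[:, None]
    terms = np.exp(-(phi[None, :] + 2 * np.pi * k) ** 2 / (2 * gamma))
    return terms.sum(axis=0) / np.sqrt(2 * np.pi * gamma)

def single_sample_values(densities, priors, dphi):
    """Return (optimal success probability, optimal antidistinguishability error) for n = 1."""
    weighted = np.asarray(priors)[:, None] * np.asarray(densities)
    # sup over t in (E9): integrate the pointwise maximum of lambda_i * p_i.
    success = float(np.sum(weighted.max(axis=0)) * dphi)
    # inf over t in (E13): integrate the pointwise minimum of lambda_i * p_i.
    anti_error = float(np.sum(weighted.min(axis=0)) * dphi)
    return success, anti_error

if __name__ == "__main__":
    points = 4096
    phi = np.linspace(-np.pi, np.pi, points, endpoint=False)
    dphi = 2 * np.pi / points
    densities = [wrapped_normal(phi, g) for g in (0.1, 0.5, 2.0)]
    print(single_sample_values(densities, [1 / 3, 1 / 3, 1 / 3], dphi))
```

For general n the same pointwise rule applies to the product densities, but the integral is then n-dimensional and would typically be estimated by sampling rather than on a grid.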
As shown recently in [84], there is a solution for the asymptotic error exponent of antidistinguishability.Namely, the following limit holds:

FIG. 6. Two measurement methods that achieve the optimality bounds. The method on the top (see Section VI A) involves preparing a uniform superposition of photon-number states, transmitting through the BDC D_p, performing a Fourier transform, followed by photodetection and classical post-processing. The method on the bottom (see Section VI B) involves preparing a coherent state |α⟩, where α ∈ ℝ₊, transmitting through the BDC D_p, performing heterodyne detection, followed by classical post-processing.
FIG. 7. Comparison of photon-number-superposition and coherent-state schemes to the fundamental limit, when considering channel discrimination in the setting of asymmetric error. The figure plots the various relative entropies when γ_1 = 1; here d = 20 for the photon-number-superposition scheme and |α|² ∈ {9.5, 25} for the coherent-state scheme. Note that the average energy of the input state is the same for |α|² = 9.5 and d = 20.

F_d(x) := sin²(dx/2) / (d sin²(x/2)), in (ii) we observed that the only nonzero term in the sum is for k = ⌊dφ̂′/(2π)⌋, where φ̂′ := φ̂ mod 2π; indeed, since changing k → k + 1 displaces the point φ̂ − 2πk/d by exactly −2π/d, and the function Π_d is nonzero in an interval of length precisely equal to 2π/d, there can be only one nonzero term in the sum; using that φ̂′ = φ̂ + 2πp, where p ∈ {0, 1}, we can also verify that

Using x − 1 < ⌊x⌋ ≤ x, we now note that 0

(1/2π) ∫_{−π}^{π} dθ F_d(θ) = 1 for all d. In other words, also p̃ ⋆ F_d is uniformly continuous, and it has the same modulus of continuity as p̃.

≤ φ̂ + 2π/d, because of the elementary properties of the floor function. Now, since lim_{d→∞} ω(2π/d) = 0, to establish (B10) we only need to check that p̃ ⋆ F_d converges uniformly to p̃ as d → ∞; and this is well known to follow from the continuity of p̃, due to Fejér's theorem [75, Theorem 3.4].