Typicality of Heisenberg scaling precision in multi-mode quantum metrology

We propose a measurement setup reaching Heisenberg scaling precision for the estimation of any distributed parameter $\varphi$ (not necessarily a phase) encoded into a generic $M$-port linear network composed only of passive elements. The scheme proposed can be easily implemented from an experimental point of view since it employs only Gaussian states and Gaussian measurements. Due to the complete generality of the estimation problem considered, it was predicted that one would need to carry out an adaptive procedure which involves both the input states employed and the measurement performed at the output; we show that this is not necessary: Heisenberg scaling precision is still achievable by only adapting a single stage. The non-adapted stage only affects the value of a pre-factor multiplying the Heisenberg scaling precision: we show that, for large values of $M$ and a random (unbiased) choice of the non-adapted stage, this pre-factor takes a typical value which can be controlled through the encoding of the parameter $\varphi$ into the linear network.


I. INTRODUCTION
The precision achievable in a measurement when all experimental noise sources are minimized is ultimately determined by the discreteness of all physical phenomena: electronic devices suffer from the discreteness of the electric charge, whereas the quantum nature of light affects optical devices. Due to this quantum noise, the error in the estimation of a physical parameter ϕ through a measurement employing N probes (e.g. photons, electrons) is strongly limited by the so-called "shot noise" factor of 1/√N. However, it has been proven that quantum features such as entanglement and squeezing can be exploited to go beyond the shot-noise limit and reach a precision of order 1/N, the so-called Heisenberg limit (HL) [1-8].
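As an illustrative aside (not part of the quantum protocol discussed below), the 1/√N shot-noise scaling is easy to reproduce with a classical Monte Carlo simulation of N independent binary probes; the probe bias p and the sample sizes are arbitrary illustrative choices:

```python
import numpy as np

def shot_noise_std(N, p=0.3, trials=2000, seed=0):
    """Empirical standard deviation of a frequency estimate built from N
    independent binary probes (the classical shot-noise setting)."""
    rng = np.random.default_rng(seed)
    counts = rng.binomial(N, p, size=trials)
    return (counts / N).std()

# Quadrupling the number of probes halves the error: std ~ 1/sqrt(N).
s100, s400 = shot_noise_std(100), shot_noise_std(400)
print(s100, s400, s100 / s400)  # the ratio is close to 2
```

Quadrupling N halves the statistical error, the hallmark of shot-noise-limited estimation; Heisenberg scaling would instead quarter it.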
Several quantum metrological problems have been studied at length, and a few approaches have been proposed to reach HL sensitivity. However, these protocols are usually difficult to implement experimentally, due to convoluted and challenging measurement procedures [9-11] and the fragile quantum coherence needed in the input states [2,12-14]. Gaussian states, on the other hand, provide a promising avenue for quantum optical technologies [15,16], since they are easier to create and manipulate experimentally than non-Gaussian ones, such as Fock states. Moreover, they allow a complete analytical treatment from a theoretical point of view [15-17]. In particular, the squeezing of a Gaussian state, which allows for highly reduced-noise signals, appears to be a valuable tool to reach quantum super-sensitive precision [8]. From a metrological perspective, squeezed states are often used along with Gaussian measurements [18-20], defined as measurement schemes producing a Gaussian probability distribution of the outcomes for any Gaussian input state [15]. Homodyne and heterodyne detection are paradigmatic examples of Gaussian measurements. It has been shown, both theoretically [21] and experimentally [22], that adaptive homodyne phase estimation performs better than heterodyne detection, and approaches the intrinsic quantum uncertainty more closely than any previous technique when no prior knowledge of the phase is given. The importance of feedback and adaptivity in quantum estimation protocols has also been underlined in subsequent works [18,23]. Adaptivity can be avoided in an optimal (or near-optimal) protocol only if some constraint on the range of variation of the parameter is given [24-26].
Within the domain of quantum optics, photons are sent as probes through an interferometer in which a parameter ϕ to be estimated is encoded. The information about the parameter is then imprinted in the output state of the photons, and can be extracted by a suitable measurement. The situation most often considered is the case where ϕ is an optical phase [1,3,6,18,20] or a phase-like parameter [2,4]. These results clearly apply also to situations in which other quantities of interest (e.g. a distance) can be converted into an optical phase [3], but they fail to cover more general scenarios, e.g. when the unknown parameter is distributed among several components of the interferometer. Recently, some progress has been made along this direction concerning the estimation of particular functions of multiple parameters distributed in a specific manner within a particular network [27-33]. It has also been shown in a recent work [19] that the presence of a single unknown parameter distributed over multiple nodes of an arbitrary network introduces non-trivial complications if no constraints are given on the range of values the parameter is allowed to assume: in fact, it appears that a simultaneous adaptive procedure involving both the input probe and the measurement is needed in order to reach the HL, making the whole scheme quite unfeasible from a practical point of view. Furthermore, the proposed scheme requires an unquantified precision and number of resources in the adaptive procedure.
In this work we demonstrate the typicality of Heisenberg-limited sensitivity with a simple metrological technique which overcomes all these serious drawbacks at the same time. In particular, we consider a general scenario in which ϕ can be any parameter embedded into an arbitrary linear passive M-mode interferometer: it can be a parameter characterizing any specific component of the interferometer, or one arbitrarily distributed among different components of the circuit. We will show that an experimentally feasible scheme achieving Heisenberg scaling is typically possible in such a general scenario. In our scheme (see FIG. 1), a single-mode squeezed vacuum state is sent through a linear, passive preliminary stage which scatters the input photons among all the M channels of the interferometer, in order to extract the information on ϕ which is distributed among all the modes. A second auxiliary stage at the output of the interferometer refocuses the photons into the only observed output port. By employing a single-mode homodyne detection, we present two broad conditions which together suffice to reach the HL: the first is the requirement that most of the injected photons are successfully refocused into the observed output mode; the second is simply a minimal-resolution requirement on the homodyne measurement. Remarkably, these conditions allow for imperfections both in the refocusing and in the measurement. Heisenberg scaling is thus achievable by choosing two additional passive and linear stages, whose roles are to conveniently scatter the input probe to all the M modes, and then to refocus the photons. Despite the fact that the choice of these unitary stages can in general be ϕ-dependent, we show that it is always sufficient to adapt only one of the two stages, which we will thus call optimized, while the other stage can be chosen arbitrarily and independently of the parameter.
Moreover, we show that the optimized stage can be prepared with a precision which is achievable using only classical resources, or by means of a preliminary classical estimation. This is also consistent with the result obtained in [34]; namely, that a preliminary classical estimation of ϕ yields enough information to correctly prepare the optimized stage and thus to achieve Heisenberg scaling in the estimation protocol. Finally, we show that the non-optimized stage affects the precision simply through a constant pre-factor. Using typicality and results on measure concentration in high-dimensional vector spaces, we show that distributing the unknown parameter among a high number of modes M allows this pre-factor to typically take non-vanishing values.
The rest of the paper is organized as follows. In Section II we describe the proposed optical interferometer and the corresponding Fisher information. In Section III we use the Fisher information to prove that Heisenberg scaling can be achieved under suitable physical conditions; we then show how, even in the most general case, all the adaptivity can be confined within one of the auxiliary stages. In Section IV we discuss the typicality of our results for interferometers with a large number of channels. Finally, in Section V we draw some conclusions and discuss the outlook.

FIG. 1. Block diagram of the investigated setup. A single-mode squeezed vacuum state with real squeezing parameter r is injected into the first preparation stage V̂_in, which feeds the linear network Û_ϕ encoding the parameter ϕ to be estimated. After the network, a second stage V̂_out precedes the measurement. Finally, homodyne detection on the first output port of V̂_out is performed, and the quadrature field x̂_θ is measured. In order to reach Heisenberg scaling sensitivity in the estimation of ϕ, it suffices to optimize only one of the two auxiliary stages, in such a way that one of the conditions (22) or (23) holds.

II. THE PROPOSED SETUP
Let us consider a metrological scheme where the parameter to be estimated is encoded into an M-port passive linear network described by a unitary Û_ϕ acting on M bosonic modes â_j (j = 1, …, M) obeying the canonical commutation relations [â_j, â†_k] = δ_jk and [â_j, â_k] = [â†_j, â†_k] = 0. For a passive linear network, the action of Û_ϕ on the annihilation operators is associated with an M × M unitary matrix U_ϕ via

 Û†_ϕ â_j Û_ϕ = Σ_{k=1}^M (U_ϕ)_{jk} â_k.   (1)

The unitarity of the matrix U_ϕ is strictly related to the conservation of the number of photons injected. By definition, U_ϕ is the matrix of the single-photon transition amplitudes, i.e. |(U_ϕ)_{jk}|² is the probability that a single photon injected into the k-th input channel ends up in the j-th output channel due to the action of the network.
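Since U_ϕ is unitary, the single-photon transition probabilities |(U_ϕ)_{jk}|² form a doubly stochastic matrix: each injected photon must end up in some output port. A quick numerical sketch, with a Haar-random unitary standing in for a generic network (an illustrative stand-in, not the specific network of the paper):

```python
import numpy as np

def haar_unitary(M, seed=0):
    """Haar-random M x M unitary via QR of a complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # phase fix for Haar measure

U = haar_unitary(6)
P = np.abs(U) ** 2   # single-photon transition probabilities |(U)_jk|^2
# Unitarity makes P doubly stochastic: photon number is conserved.
print(P.sum(axis=0))  # each column sums to 1
print(P.sum(axis=1))  # each row sums to 1
```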
We now propose an estimation scheme reaching Heisenberg scaling if suitable conditions are satisfied. As shown in FIG. 1, the preparation of the input probe consists of two steps: first, we inject a single-mode squeezed vacuum into the first port of V̂_in; the unitary stage V̂_in then scatters the injected photons among all the modes. The input state of the network Û_ϕ in our protocol is therefore |ψ_0⟩ = V̂_in Ŝ_1(r) |vac⟩, where Ŝ_1(r) = exp[(r/2)(â_1² − â†_1²)] is a single-mode squeezing operator with squeezing parameter r > 0, and |vac⟩ = |0⟩^{⊗M} is the M-mode vacuum state. The average number of photons injected into the apparatus is thus N = sinh²r. The state Û_ϕ|ψ_0⟩ at the output of the network undergoes the unitary V̂_out, which refocuses all the photons into a single mode, namely the first one, where a homodyne measurement of the field quadrature x̂_θ is performed. If the refocusing procedure is not perfect, some photons are scattered into other channels with probability 1 − P_ϕ, where

 P_ϕ = |(u_ϕ)_{11}|²   (2)

is defined by the probability amplitude (u_ϕ)_{11} = (V_out U_ϕ V_in)_{11} for the transition from the first input to the first output port of the overall interferometer u_ϕ = V_out U_ϕ V_in, with V_in and V_out being the single-photon unitary matrix representatives of V̂_in and V̂_out respectively, obtained analogously to (1).
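The relation N = sinh²r between the squeezing parameter and the mean photon number can be checked directly from the quadrature variances of the squeezed vacuum; a minimal sketch (the function name is ours; ħ = 1 convention with vacuum variance 1/2):

```python
import numpy as np

def mean_photons_squeezed_vacuum(r):
    """Mean photon number of S(r)|vac> from its quadrature variances.
    With x = (a + a†)/sqrt(2), one has <x²> + <p²> = 2<a†a> + 1."""
    var_x = np.exp(-2 * r) / 2  # squeezed quadrature
    var_p = np.exp(+2 * r) / 2  # anti-squeezed quadrature
    return (var_x + var_p - 1) / 2

r = 1.3
print(mean_photons_squeezed_vacuum(r), np.sinh(r) ** 2)  # the two agree
```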
The homodyne measurement is described by a positive operator-valued measure (POVM) M = {Π̂_x}, whose elements

 Π̂_x = |x⟩_θ⟨x|   (3)

are the projectors onto the eigenstates |x⟩_θ of the measured quadrature of the first mode. The probability of obtaining the value x from a measurement of the quadrature x̂_θ = e^{iθâ†_1â_1} x̂_1 e^{−iθâ†_1â_1} is then given by Born's rule,

 p(x|ϕ) = ⟨vac| Ŝ†_1(r) û†_ϕ Π̂_x û_ϕ Ŝ_1(r) |vac⟩,   (4)

evaluated on the output state û_ϕ Ŝ_1(r)|vac⟩ after the overall interferometric evolution û_ϕ = V̂_out Û_ϕ V̂_in, which yields (see Appendix A) the Gaussian distribution

 p(x|ϕ) = (2πΔ_ϕ)^{−1/2} e^{−x²/(2Δ_ϕ)},   (5)

with variance

 Δ_ϕ = 1/2 + P_ϕ [sinh²r + sinh r cosh r cos(2(θ − γ_ϕ))].   (6)

It is known from classical estimation theory that the maximum precision attainable when inferring the value of the unknown parameter ϕ is given by the so-called Cramér-Rao bound [35,36]

 δϕ ≥ 1/√(ν F(ϕ)),   (7)

where ν is the number of measurements performed, while F(ϕ) is the Fisher information (FI)

 F(ϕ) = ∫ dx p(x|ϕ) (∂_ϕ ln p(x|ϕ))².   (8)

The bound (7) can be asymptotically saturated through post-processing Bayesian data analysis [37-39]. The FI associated with a zero-mean Gaussian probability distribution of variance Δ_ϕ can be evaluated by inserting (5) into (8), and reads

 F(ϕ) = (∂_ϕ Δ_ϕ)² / (2Δ_ϕ²).   (9)

Plugging (6) into (9), it is possible to evaluate the FI explicitly (see Appendix B), obtaining the expression (10), written in terms of the quantities defined in (11) and (12), with

 γ_ϕ = arg (u_ϕ)_{11}   (13)

being the phase accumulated through the interferometric evolution.
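The Gaussian Fisher-information formula used here, F(ϕ) = (∂_ϕΔ_ϕ)²/(2Δ_ϕ²) for a zero-mean Gaussian of variance Δ_ϕ, can be verified numerically against the defining integral for an arbitrary toy variance model (the model Δ(ϕ) = 1 + 0.5 sin ϕ below is an illustrative assumption, not the interferometer's actual variance):

```python
import numpy as np

def fisher_numeric(Delta, phi, dphi=1e-5):
    """Fisher information of a zero-mean Gaussian p(x|phi) of variance
    Delta(phi), via numerical integration of (d/dphi log p)^2 * p."""
    x = np.linspace(-30, 30, 200_001)
    def logp(p_):
        D = Delta(p_)
        return -x**2 / (2 * D) - 0.5 * np.log(2 * np.pi * D)
    score = (logp(phi + dphi) - logp(phi - dphi)) / (2 * dphi)
    p = np.exp(logp(phi))
    return np.sum(score**2 * p) * (x[1] - x[0])  # trapezoid-like sum

Delta = lambda p_: 1.0 + 0.5 * np.sin(p_)   # toy variance model
phi = 0.7
dDelta = 0.5 * np.cos(phi)                  # exact derivative of the model
print(fisher_numeric(Delta, phi))           # matches dDelta**2 / (2 Delta^2)
```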

III. HEISENBERG SCALING
We now demonstrate that Heisenberg scaling sensitivity can be achieved in the proposed metrological setup shown in FIG. 1 if two conditions are met. The first condition is the constraint that the average number of photons scattered into channels which are not measured is a finite quantity ℓ_ϕ, independent of N, which translates into the condition

 P_ϕ = 1 − ℓ_ϕ/N   (14)

on the probability P_ϕ in (2). Here, ℓ_ϕ depends in general on the linear network U_ϕ in which the parameter is embedded, and on the auxiliary stages V_in and V_out in FIG. 1. In subsection III A we will show how, for any given arbitrary U_ϕ, it is possible to optimize only a single stage so that the probability in (2) can be expressed as in (14). The second condition relates the accumulated phase γ_ϕ = arg (u_ϕ)_{11} through the whole setup and the phase θ = θ_ϕ of the measured quadrature field x̂_θ, according to

 θ_ϕ = γ_ϕ ± π/2 + k_ϕ/N,   (15)

where k_ϕ can depend on ϕ, but is assumed to be independent of N. In practice, one can even fix k_ϕ to a constant value without using additional resources.

FIG. 2. Phase-space representation of the squeezed vacuum state (with squeezing parameter r e^{2iγ_ϕ}) at the first output channel of the whole setup shown in FIG. 1 (blue), and of the Fisher information (10) (four red lobes). We have considered for simplicity the case where all the photons are refocused into the first output channel (when condition (14) reduces to P_ϕ = 1). Given any axis at an angle θ with respect to the horizontal axis, corresponding to the x_0 = x quadrature, the distance between its intersections with the ellipse and the origin represents the standard deviation of the quadrature x̂_θ; in other words, the blue curve is the polar plot of Δ_ϕ in (6) as a function of θ. A polar plot of the Fisher information is overlaid in red. The Fisher information takes vanishing values if the minimum-variance quadratures are measured, namely for θ_min = γ_ϕ ± π/2. This happens because the variance Δ_ϕ of the quadrature along θ_min is locally insensitive to variations of ϕ. Thus, one needs to move far enough away from θ_min to achieve a suitably high value of the Fisher information. In particular, for a large number N of photons it is enough to move away from θ_min by an additional angle of order 1/N, as in (15), to reach the Heisenberg scaling in the measurement of the parameter ϕ.
A heuristic explanation of this condition can be found in FIG. 2: in order to maximise the ratio in (9) while keeping N = sinh²r constant, the choice of the quadrature x̂_θ to be measured is a trade-off between two opposite behaviours. One consists in minimizing the variance Δ_ϕ in the denominator of (9), while the other consists in maximizing the sensitivity of the variance with respect to variations of ϕ, namely choosing θ such that ∂_ϕΔ_ϕ in the numerator is maximal. The former is met for θ as close as possible to γ_ϕ ± π/2, since x̂_{γ_ϕ±π/2} are the squeezed quadratures after the rotation in phase space by the phase γ_ϕ accumulated through the interferometer; the latter instead requires a choice of θ − γ_ϕ far enough from the stationary points of the variance at γ_ϕ ± π/2, where Δ_ϕ, and therefore the overall probability distribution p(x|ϕ), is insensitive to variations of the parameter ϕ. Noticeably, the larger N, and thus the squeezing parameter, the closer to the squeezed direction the quadrature field should be measured, as can be seen in (15).

In order to prove the claim of HL scaling, we evaluate the asymptotics of the Fisher information (10) as N → ∞. Substituting the value θ = θ_ϕ of (15) into (11) and (12) yields (16) and (17). Hence, substituting (16) and (17) into (10), and neglecting higher-order terms, the asymptotic behaviour of the Fisher information reads

 F(ϕ) ≈ 𝒞(k_ϕ, ℓ_ϕ) N² (∂_ϕγ_ϕ)²,   (18)

with

 𝒞(k, ℓ) = 8k² / (2k² + ℓ/2 + 1/8)².   (19)

The quadratic scaling in the mean photon number N in (18) finally proves that conditions (14) and (15) suffice to reach the Heisenberg scaling. The asymptotic Fisher information carries two pre-factors, 𝒞(k_ϕ, ℓ_ϕ) and (∂_ϕγ_ϕ)².
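The trade-off above can be made concrete numerically. Under the simplifying assumptions P_ϕ = 1 (perfect refocusing) and ∂_ϕγ_ϕ = 1, the measured quadrature variance is that of a squeezed vacuum whose squeezed direction sits at γ_ϕ ± π/2; offsetting the measured quadrature from that direction by k/N with k = 1/4 then makes the Fisher information grow as 8N² (a sketch under these assumptions, not the paper's full computation):

```python
import numpy as np

def fisher_theta(N, theta_offset):
    """F = (dDelta/dphi)^2 / (2 Delta^2) for a squeezed vacuum whose squeezed
    quadrature lies at gamma + pi/2; theta_offset = theta - gamma."""
    r = np.arcsinh(np.sqrt(N))          # N = sinh^2 r photons on average
    u = theta_offset
    Delta = (np.exp(2*r) * np.cos(u)**2 + np.exp(-2*r) * np.sin(u)**2) / 2
    dDelta = np.sinh(2*r) * np.sin(2*u)     # derivative w.r.t. gamma
    return dDelta**2 / (2 * Delta**2)

k = 0.25                                    # the optimal offset constant
for N in (10**2, 10**4, 10**6):
    print(fisher_theta(N, np.pi/2 - k/N) / N**2)  # approaches 8

# Exactly at the squeezed direction the Fisher information is negligible:
print(fisher_theta(10**4, np.pi/2))
```

The ratio F/N² tends to 8, matching the asymptotic sensitivity quoted for k = ±1/4 and ℓ = 0, while measuring exactly along the minimum-variance direction gives essentially zero information.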
We easily notice that the pre-factor 𝒞(k, ℓ) vanishes only at k = 0, and attains its maximum 𝒞 = 8 at ℓ = 0, k = ±1/4, so that with this choice of the constants k and ℓ the Fisher information asymptotically reads

 F(ϕ) ≈ 8 N² (∂_ϕγ_ϕ)².   (20)

Moreover, 𝒞(k, ℓ) is a decreasing function of ℓ for every k, so that ℓ = 0 is always the best case: the fewer photons scattered into the other channels, the higher the sensitivity of the estimation. For a fixed arbitrary positive value of ℓ, the maximum of 𝒞(k, ℓ) is instead reached for k = ±√(4ℓ + 1)/4.
A. One-sided adaptivity

Condition (14) may appear to require a simultaneous optimization of the input V_in and the output V_out in a parameter-dependent way. This two-sided adaptation can be quite difficult to realize in practice.
However, we are going to show that conditions (14) and (15) can in fact always be satisfied with just a one-sided parameter-dependent adaptation, which can be performed equivalently either at the input or at the output of the network. Remarkably, this adaptation can be accomplished by performing a preliminary classical, shot-noise-limited estimation of ϕ.
In particular, one can choose to adaptively optimize only V_out and fix V_in to an arbitrary parameter-independent unitary stage: in this case, one can set the parameter-dependent condition

 V†_out |e_1⟩ = U_ϕ V_in |e_1⟩,   (22)

with |e_1⟩ = (1, 0, …, 0)^T. Alternatively, it is possible to adaptively optimize only V_in through the condition

 V_in |e_1⟩ = U†_ϕ V†_out |e_1⟩,   (23)

while V_out can be chosen arbitrarily.
Remarkably, both equations (22) and (23) tolerate an error of order O(1/√N) in the preparation of the optimized stage while still fulfilling condition (14). To show this, we notice that P_ϕ can be expressed as the transition probability P_ϕ = |⟨v_out|v_in⟩|² between the two normalized vectors |v_in⟩ = U_ϕ V_in |e_1⟩ and |v_out⟩ = V†_out |e_1⟩, with |e_1⟩ = (1, 0, …, 0)^T. An imperfect adaptation satisfying (22) up to an error of order 1/√N translates, by unitarity, into

 |v_out⟩ = e^{iH/√N} |v_in⟩   (24)

for some H = H†. Therefore, we can see that

 P_ϕ = |⟨v_in| e^{iH/√N} |v_in⟩|²   (25)
 = 1 − (⟨H²⟩ − ⟨H⟩²)/N + O(1/N²),   (26)

which is of the form (14), with ℓ_ϕ given by the variance of H on |v_in⟩. We can notice that in both equations (22) and (23) no assumption on the non-optimized stage is made, so that its choice is completely arbitrary. This freedom affects the precision of the estimation of ϕ through the N-independent pre-factor (∂_ϕγ_ϕ)² which appears in the Fisher information (18). At this point, one may argue that this pre-factor may vanish if a poor choice for the non-adapted unitary is made. Remarkably, in the next Section we will show that the pre-factor is typically non-vanishing for random choices of the non-adapted stage and suitably well-behaved given linear networks Û_ϕ.
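The robustness claim — an O(1/√N) error in the adapted stage costs only O(1/N) in the refocusing probability — can be sketched numerically. Here the Hermitian H and the vector |v_in⟩ are random toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 8
A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
H = (A + A.conj().T) / 2                 # random Hermitian misalignment
v = rng.normal(size=M) + 1j * rng.normal(size=M)
v /= np.linalg.norm(v)                   # random normalized |v_in>

def P(N):
    """Overlap |<v| e^{iH/sqrt(N)} |v>|^2 after an O(1/sqrt(N)) misalignment."""
    w, W = np.linalg.eigh(H)
    U = W @ np.diag(np.exp(1j * w / np.sqrt(N))) @ W.conj().T
    return np.abs(v.conj() @ U @ v) ** 2

varH = (v.conj() @ H @ H @ v - (v.conj() @ H @ v) ** 2).real
for N in (10**2, 10**4, 10**6):
    print(N * (1 - P(N)))  # approaches varH, so 1 - P = O(1/N)
```

The product N(1 − P) tends to the variance of H on |v_in⟩, which plays the role of ℓ_ϕ in condition (14).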

IV. TYPICAL SENSITIVITY
In this section we will address in more detail the pre-factor (∂_ϕγ_ϕ)² in the Fisher information (18), clarifying under what circumstances it can safely be considered non-vanishing and characterizing its magnitude for random choices of the non-optimized stage. First of all, we can link (∂_ϕγ_ϕ)² to the derivative of the matrix element (u_ϕ)_{11} = (V_out U_ϕ V_in)_{11} = √P_ϕ e^{iγ_ϕ}:

 |∂_ϕ(u_ϕ)_{11}|² = (∂_ϕ√P_ϕ)² + P_ϕ (∂_ϕγ_ϕ)².   (27)

If condition (14) is satisfied, equation (27) simplifies to

 |∂_ϕ(u_ϕ)_{11}|² = (∂_ϕγ_ϕ)² + O(1/N),   (28)

so that the two quantities are equal up to order 1/N. Now, if the adaptation is performed at the output, i.e. we choose an arbitrary V_in and adapt V_out according to equation (22), we see that

 ∂_ϕ(u_ϕ)_{11} = i (V†_in G_ϕ V_in)_{11},   (29)

where the Hermitian operator

 G_ϕ = −i U†_ϕ ∂_ϕU_ϕ   (30)

is the (ϕ-dependent) generator of U_ϕ. If, on the other hand, condition (14) is realized through an adaptation of the input while taking an arbitrary V_out, then equation (23) implies that

 ∂_ϕ(u_ϕ)_{11} = i (W† G_ϕ W)_{11},  W = U†_ϕ V†_out.   (31)

Using equations (28)-(31), we can finally rewrite the asymptotic expression of the Fisher information (18) as

 F(ϕ) ≈ 𝒞(k_ϕ, ℓ_ϕ) N² f(U, G_ϕ),   (32)

with

 f(U, G_ϕ) = ((U† G_ϕ U)_{11})²,   (33)

where U = V_in if the optimization is performed at the output, while U = U†_ϕ V†_out if the optimization is carried out at the input. We emphasize that the pre-factor f(U, G_ϕ) is completely independent of the choice of the optimized stage.
The maximization of the pre-factor (33) is realized, for example, if U = V_ϕ is a unitary diagonalizing G_ϕ, i.e. satisfying V†_ϕ G_ϕ V_ϕ = D_ϕ, with D_ϕ = diag(g_1, g_2, …, g_M) the diagonal matrix of the eigenvalues of G_ϕ, ordered in such a way that |g_1| = ‖G_ϕ‖ is the maximum eigenvalue in absolute value [19]. Actually, it is not necessary to take a diagonalizing unitary to maximize (33), since only the first column of U enters the definition of f(U, G_ϕ); hence, to maximize f(U, G_ϕ) it is sufficient to require this column to be the eigenvector of G_ϕ corresponding to the eigenvalue of maximum modulus ‖G_ϕ‖. However, even this requirement would necessitate complete knowledge of G_ϕ, which in general depends on the unknown parameter ϕ. It is therefore more relevant to consider arbitrary, ϕ-independent choices of the non-adapted network (the unitary U) in order to determine the practical advantages of the obtained Heisenberg scaling precision for finite values of N and only one (classically) adapted stage.
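The claim that only the first column of U matters, and that placing there the eigenvector of G_ϕ with eigenvalue of largest modulus saturates the maximum ‖G_ϕ‖², is easy to verify numerically with a toy Hermitian generator:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 6
A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
G = (A + A.conj().T) / 2          # toy generator G_phi

def f(U, G):
    """Pre-factor f(U, G) = ((U^dag G U)_{11})^2 as in (33)."""
    return (U.conj().T @ G @ U)[0, 0].real ** 2

w, V = np.linalg.eigh(G)
i = np.argmax(np.abs(w))          # eigenvalue of largest modulus, |g_1| = ||G||
# Any unitary whose FIRST column is that eigenvector is optimal:
U_opt = np.roll(V, -i, axis=1)    # cyclic column permutation is still unitary
print(f(U_opt, G), np.abs(w[i]) ** 2)   # equal: the maximum ||G||^2 is reached
```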
For this reason, we will perform now a statistical analysis on the typical values which can be assumed by the prefactor f (U, G ϕ ) for random choices of the unitary U . Assuming no prior knowledge of the unitary U , we sample it from the unitary group U(M ) according to the unbiased uniform distribution probability, i.e. the unitarily invariant Haar measure P.
For a random unitary U sampled according to this distribution, the average value of the pre-factor f(U, G_ϕ) can be computed using techniques from random matrix theory (see Appendix D):

 E[f(U, G_ϕ)] = (Tr(G_ϕ)² + Tr(G²_ϕ)) / (M(M + 1)),   (34)

where E[·] denotes the expectation value over U(M) with respect to the Haar measure.
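The Haar average (34), which in terms of the spectrum reads ((Σ_i g_i)² + Σ_i g_i²)/(M(M+1)), can be cross-checked by Monte Carlo. Since f(U, G_ϕ) involves only the first column of U — a Haar-random unit vector in C^M — it suffices to sample normalized complex Gaussian vectors; the spectrum g below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)
M, S = 5, 200_000
g = np.array([1.0, 0.8, 0.5, 0.3, 0.1])   # toy spectrum of G_phi
# Sample S Haar-random unit vectors (first columns of Haar unitaries).
X = rng.normal(size=(S, M)) + 1j * rng.normal(size=(S, M))
u2 = np.abs(X) ** 2 / (np.abs(X) ** 2).sum(axis=1, keepdims=True)
f_samples = (u2 @ g) ** 2                 # ((U^dag G U)_{11})^2 = (sum g_i |u_i|^2)^2

predicted = (g.sum() ** 2 + (g ** 2).sum()) / (M * (M + 1))
print(f_samples.mean(), predicted)        # the Monte Carlo mean matches (34)
```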
In the trivial case of a generator proportional to the identity, G_ϕ = g1 with |g| = ‖G_ϕ‖, which corresponds to a network acting as a ϕ-dependent global phase shifter, we have Tr(G_ϕ)² = M²‖G_ϕ‖² and Tr(G²_ϕ) = M‖G_ϕ‖², so that the average value of the pre-factor equals its maximum, f_max = ‖G_ϕ‖², in accordance with the fact that in this particular case every unitary in U(M) diagonalizes G_ϕ.
In general, we are interested in determining the conditions which make the average value in (34) as large as possible. First of all, we note directly from expression (34) that eigenvalues of opposite signs can have a detrimental effect on this average, since they lower the value of |Tr(G_ϕ)|. In general, we can find a lower bound on the average value (34) using Jensen's inequality:

 E[f(U, G_ϕ)] ≥ (E[(U† G_ϕ U)_{11}])² = (Tr(G_ϕ)/M)²,   (35)

where again the average E[(U† G_ϕ U)_{11}] = Tr(G_ϕ)/M has been computed using standard techniques (see Appendix D). Notice that the right-hand side of this inequality is nothing but the square of the average of G_ϕ's eigenvalues. Hence, if we have some degree of control over the eigenvalues, we can achieve a fraction α of the maximum value f_max whenever the average of G_ϕ's eigenvalues is at least a fraction √α of the maximum eigenvalue, namely

 (1/M)|Tr(G_ϕ)| ≥ √α ‖G_ϕ‖  ⟹  E[f(U, G_ϕ)] ≥ α f_max.   (36)

However, this may not be sufficient for our purposes, since the average of a random variable alone does not determine its typical behaviour: a paradigmatic elementary example is that of a real random variable taking only the values 0 or 1 with equal probability, which has average 1/2 even though it never takes values close to 1/2.
We will show now that this is not the case for the pre-factor f(U, G_ϕ), thanks to the fact that it is a sufficiently well-behaved function of the random unitary U. In fact, by using results on concentration of measure in high-dimensional probability spaces, we prove in Appendix E that for a network with a large number M of ports the pre-factor f(U, G_ϕ) becomes typical, meaning that it becomes almost constant with respect to random choices of U ∈ U(M) (according to the unitarily invariant measure), hence concentrating around its average value (34), bounded from below by (36).
In formulas, we have that

 P(|f(U, G_ϕ) − E[f(U, G_ϕ)]| ≥ ε) ≤ 2 e^{−A M ε² / ‖G_ϕ‖⁴},   (37)

where A = (72π³)^{−1}. This result tells us that for large interferometers it is extremely unlikely to obtain a pre-factor appreciably different from its average, since for large values of M the probability of f(U, G_ϕ) deviating from its average is exponentially suppressed. The same behaviour can also be seen in the exact distribution of the pre-factor computed for some particular cases, shown in FIG. 3: as M is increased, the distribution of the pre-factor concentrates around its average; in particular, for the chosen configuration, a value of M = 20 is already sufficient to observe this concentration. Thus, for any well-behaved linear network Û_ϕ such that the expectation value in (34) is far enough from zero, a random choice of the non-adapted stage in the proposed interferometric setup typically yields Heisenberg-scaling precision for the estimation of ϕ, provided the number M of interferometric channels is large enough.
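Concentration can also be observed directly by sampling: the spread of f(U, G_ϕ) over Haar-random unitaries shrinks as M grows. A small Monte Carlo sketch with an illustrative, uniformly spread spectrum:

```python
import numpy as np

def prefactor_spread(M, S=20_000, seed=4):
    """Standard deviation of f(U, G) over Haar-random U, for a generator
    with spectrum spread uniformly in [0, 1] (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    g = np.linspace(0, 1, M)
    X = rng.normal(size=(S, M)) + 1j * rng.normal(size=(S, M))
    u2 = np.abs(X) ** 2 / (np.abs(X) ** 2).sum(axis=1, keepdims=True)
    return ((u2 @ g) ** 2).std()

print(prefactor_spread(4), prefactor_spread(40), prefactor_spread(200))
# The spread shrinks as M grows: f concentrates around its average.
```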

V. CONCLUSIONS
We demonstrated, by means of a simple metrological technique, the typicality of Heisenberg scaling precision for the estimation of a generic parameter ϕ encoded into an arbitrary M-mode network. Our scheme can be applied regardless of the nature of the parameter, which can even be distributed among several components of the network. In particular, the proposed scheme makes use of a single-mode squeezed state as a probe, scattered throughout all the modes by means of an auxiliary passive linear stage. Once the information on the parameter is gathered by the probe, the probe is refocused onto a single output channel by a second auxiliary stage, and then detected with a homodyne measurement. The analysis of the Fisher information associated with this scheme reveals that, if only a constant average number of photons (not scaling with the total number of photons injected) is scattered into channels different from the one measured, due to an imperfect refocusing procedure, the Heisenberg limit can be asymptotically reached, provided that the homodyne detection is performed with sufficient resolution. For a distributed parameter, the refocusing is generally parameter-dependent, implying some sort of adaptive procedure in order to correctly refocus the probe. However, we have shown that all the dependence on the parameter can be entirely confined to only one of the two auxiliary stages, while the other affects the estimation only through a multiplicative pre-factor. Moreover, we have also discussed how all the information on the parameter needed to sufficiently refocus the probe can be obtained with classical shot-noise precision, meaning that the number of resources required to adaptively optimize the auxiliary stages is not detrimental to the Heisenberg scaling precision. Finally, we have shown that, for a large number of modes, Heisenberg scaling is typically obtained with an arbitrary non-adapted stage, with overwhelming probability, i.e. with an exponentially suppressed probability of failure.
Appendix A: Derivation of the probability distribution in (5)

In this appendix we briefly introduce only the concepts and tools needed to obtain the expression (5) for the probability distribution p(x|ϕ) of measuring the quadrature value x at the output of the proposed interferometer in FIG. 1. We first recall the definition (4), denoted (A1) in the following, where û_ϕ describes the overall interferometric evolution of the single-mode squeezed state Ŝ_1(r)|vac⟩. To evaluate (A1), it is useful to first recover its Fourier transform (A2). It is possible to write this characteristic function in a more canonical way. Indeed, first notice that we can write (A3) (the derivation is given in Appendix A 1 for completeness), where D̂ is the displacement operator. Then, using equation (A3) we can write the characteristic function (A2) in the form (A6). Due to the Gaussian nature of the squeezed vacuum state and the linearity of the interferometric setup, the characteristic function (A6) is a Gaussian bivariate function centred at zero, of the form (A7) [15], where σ_ϕ is the 2 × 2 covariance matrix of the whole interferometer output state û_ϕ Ŝ_1(r)|vac⟩, reduced to the first mode. In order to evaluate this matrix, we first recover the covariance matrix Γ_0 of the input state Ŝ_1(r)|vac⟩, given in (A8), where R is the M × M diagonal matrix with a single non-zero entry R_{11} ≡ r. After the action of the interferometer, the covariance matrix transforms as in (A9), where R_ϕ is the orthogonal and symplectic matrix associated with the interferometer unitary matrix u_ϕ, and 1 is the M × M identity matrix. R_ϕ can be easily evaluated, so that Γ_ϕ in (A9) takes the form (A13), where we have defined the M × M matrices appearing in its blocks; in the second line of each of these expressions we have exploited the fact that R is real. We are interested in evaluating σ_ϕ, the covariance matrix reduced to the first mode, which we can now readily write and insert into (A7). Our final step is to invert the Fourier transform to obtain the expression of the probability distribution p(x|ϕ) given in (5).
To do so, we introduce the 2 × 2 orthogonal matrix O_θ such that ξ_θ = O_θ ξ_0, with ξ_0 = (ξ, 0)^T. The characteristic function (A7) can then be written in a more convenient way. Exploiting the identities implied by the definition of R, together with some elementary trigonometry, the term (O^T_θ σ_ϕ O_θ)_{11} can be further manipulated to match the expression of Δ_ϕ given in (6). After applying the inverse Fourier transform to the Gaussian characteristic function (A7), the probability distribution reads as in (5).
Appendix B: Derivation of the Fisher Information in (10)

In this appendix we evaluate the FI in (10) from the expression in (9). Let us recall that the variance Δ_ϕ of the Gaussian probability density function (5) reads

 Δ_ϕ = 1/2 + |(u_ϕ)_{11}|² sinh²r + sinh r cosh r Re[e^{−2iθ} (u_ϕ)²_{11}].   (B1)

The derivative of Δ_ϕ is written as a sum of two contributions,

 ∂_ϕΔ_ϕ = sinh²r ∂_ϕ|(u_ϕ)_{11}|² + sinh r cosh r Re[e^{−2iθ} ∂_ϕ(u_ϕ)²_{11}].   (B2)

The derivative in the first contribution is thus evaluated as

 ∂_ϕ|(u_ϕ)_{11}|² = ∂_ϕP_ϕ,   (B3)

while the derivative in the second contribution reads

 ∂_ϕ(u_ϕ)²_{11} = (∂_ϕP_ϕ + 2i P_ϕ ∂_ϕγ_ϕ) e^{2iγ_ϕ}.   (B4)

Then, defining γ_ϕ as the phase of (u_ϕ)_{11}, and recalling that sinh²r = N, (B2) reads

 ∂_ϕΔ_ϕ = N ∂_ϕP_ϕ + √(N(N+1)) [∂_ϕP_ϕ cos(2(θ − γ_ϕ)) + 2 P_ϕ ∂_ϕγ_ϕ sin(2(θ − γ_ϕ))],   (B5)

while (B1) can be written as

 Δ_ϕ = 1/2 + N P_ϕ + √(N(N+1)) P_ϕ cos(2(θ − γ_ϕ)).   (B6)

Inserting the expressions (B5) and (B6) into (9), we get the FI. Since P_ϕ, γ_ϕ and their derivatives are real, once the quantities in (11) and (12) are defined, we easily obtain the expression displayed in (10).
Appendix C: Analytic distribution of the pre-factor (33) in the Fisher information (32) for generators with only two distinct eigenvalues

We derive here the explicit form of the probability density function of the pre-factor f(U, G_ϕ) = ((U† G_ϕ U)_{11})² in (33) for a fixed generator G_ϕ, as U is sampled from U(M) with the Haar measure. First of all, note that this distribution depends only on the eigenvalues of G_ϕ, which we denote by g = (g_1, …, g_M), dropping the ϕ subscript for notational simplicity. This can be seen using the spectral decomposition G_ϕ = V†_ϕ D_ϕ V_ϕ, with D_ϕ = diag(g), which yields f(U, G_ϕ) =^d f(U, D_ϕ), where in the last step we used the invariance property of the Haar measure, and =^d denotes equality in distribution. In light of this remark, we have

 f(U, D_ϕ) = (Σ_{i=1}^M g_i |u_i|²)²,

having defined the random vector u = U e_1, obtained by applying the random matrix U ∈ U(M) to the fixed basis vector e_1 = (1, 0, …, 0)^T ∈ C^M, where C^M denotes the set of M-tuples of complex numbers. We see that f(U, D_ϕ) can be interpreted as the square of a weighted average of the eigenvalues of G_ϕ with random weights, given by the squared moduli of the components of a random vector drawn from the unit sphere in C^M with the Haar measure. The distribution of this random variable can be quite complicated for a generic choice of the eigenvalues g = (g_1, …, g_M).
We consider here the situation in which there are at most two distinct eigenvalues g_1 > g_2 ≥ 0, the largest with multiplicity k, i.e.

 g = (g_1, …, g_1, g_2, …, g_2),  f(U, D_ϕ) = (g_2 + Δg τ(U))²,   (C4)

where Δg := g_1 − g_2, we used the normalization constraint Σ_j |u_j|² = 1, and the random quantity τ(U) is defined by the sum inside the brackets, namely

 τ(U) = Σ_{j=1}^k |u_j|².   (C6)

We start from the distribution q(t) of τ(U), defined in such a way that q(t)dt is the probability of having t ≤ τ(U) ≤ t + dt or, defining x_{2j−1} := Re u_j and x_{2j} := Im u_j, the probability of having t ≤ Σ_{j=1}^{2k} x_j² ≤ t + dt. This probability can be interpreted as the surface measure of a 2k-dimensional hyperspherical cap of a (2M − 1)-dimensional hypersphere sitting in R^{2M}. Using this interpretation, one then finds the density q(t) given in (C9) [42,43]. Starting from the distribution (C9) of τ(U), the probability density function p(x) of f(U, D_ϕ) can be found with the change of variables x = (g_2 + Δg t)², which leads to the explicit expression (C13), where C is a normalization constant. This distribution is valid whenever G_ϕ has only two distinct positive eigenvalues g_1 > g_2 ≥ 0. Numerical results are compared with the probability density function (C13) in FIG. 3.

Equation (D11) can be proved using (D9). For A = G_ϕ and i = j = 1, the expressions (D10) and (D11) reduce to the equalities in (35) and (34), respectively.
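The hyperspherical-cap density of τ(U) is equivalent to the standard fact that the squared moduli of a Haar-random unit vector are jointly Dirichlet(1, …, 1)-distributed, so that τ(U) follows a Beta(k, M − k) law; a Monte Carlo sketch of its first two moments:

```python
import numpy as np

rng = np.random.default_rng(5)
M, k, S = 10, 3, 200_000
# tau(U) = sum of the first k squared moduli of a Haar-random unit vector.
X = rng.normal(size=(S, M)) + 1j * rng.normal(size=(S, M))
u2 = np.abs(X) ** 2 / (np.abs(X) ** 2).sum(axis=1, keepdims=True)
tau = u2[:, :k].sum(axis=1)

# For tau ~ Beta(k, M-k): mean k/M and variance k(M-k)/(M^2 (M+1)).
print(tau.mean(), k / M)
print(tau.var(), k * (M - k) / (M**2 * (M + 1)))
```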
Appendix E: Derivation of the typicality result in (37) In this appendix we will show how to derive equation (37) starting from a standard result on concentration of measure in high-dimensional spaces known as Levy's Lemma, which we report in the following theorem for the sake of completeness.
Theorem 1. Let f : S^{n−1} → R be a function defined over the unit Euclidean sphere

 S^{n−1} = {x ∈ R^n : Σ_{k=1}^n x_k² = 1},   (E1)

endowed with the invariant Haar probability measure P. Denote by L the Lipschitz constant of the function, i.e. the minimum L such that

 |f(x) − f(y)| ≤ L ‖x − y‖₂   (E2)

for all x, y ∈ S^{n−1}, where ‖x‖₂ = (Σ_{k=1}^n x_k²)^{1/2} is the Euclidean norm. Then

 P(|f − E[f]| ≥ ε) ≤ 2 e^{−n ε²/(C L²)},   (E3)

where C is a positive constant which can be taken to be C = 9π³ [46,47].
In order to apply Theorem 1 to our case, we need to compute the Lipschitz constant associated with the pre-factor (33). First, note that f(U, G_ϕ) can be interpreted as a function defined on a real unit sphere. In fact, it can be written as

 f(U, G_ϕ) = (u† G_ϕ u)²,   (E4)

where u is a complex vector on the unit sphere, given by u = U e with e = (1, 0, …, 0)^T ∈ C^M. Since only the squared moduli |u_j|² appear in this expression, we can recast the problem in terms of a real vector x ∈ R^{2M} whose components are defined by

 x_{2j−1} = Re u_j,  x_{2j} = Im u_j,  j = 1, …, M.   (E5)
The normalization constraint Σ_{j=1}^M |u_j|² = 1 becomes Σ_{j=1}^{2M} x_j² = 1, so that x ∈ S^{2M−1}, the unit sphere sitting inside R^{2M}. We see then that the random factor in equation (E4) can be envisioned as a function defined over the unit sphere,

 f(x) = (x^T G̃ x)²,   (E7)

where we have defined the diagonal matrix G̃ = diag(g̃) with g̃ = (g_1, g_1, …, g_M, g_M) ∈ R^{2M}. In order to apply Theorem 1 we need to estimate the Lipschitz constant L of the function f; to this aim, we evaluate the gradient of f, which is given by

 ∇f(x) = 4 (x^T G̃ x) G̃ x.   (E8)
The Lipschitz constant for f can then be obtained as

 L = 4 ‖G_ϕ‖².   (E9)

To see this, note simply that

 ‖∇f(x)‖₂ = 4 |x^T G̃ x| ‖G̃ x‖₂ ≤ 4 ‖G̃‖² = 4 ‖G_ϕ‖²,   (E10)

where in the inequality we used the facts that |x^T G̃ x| ≤ ‖G̃‖ and ‖G̃ x‖₂² = x^T G̃² x ≤ ‖G̃‖², while in the last equality we used ‖G̃‖ = ‖G_ϕ‖. The value ‖∇f(x)‖₂ = 4‖G_ϕ‖² is attained for x = (1, 0, …, 0)^T, which together with (E10) proves (E9). Applying Theorem 1 to our case with n = 2M and L = 4‖G_ϕ‖² finally yields equation (37).
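The Lipschitz bound L = 4‖G_ϕ‖² can be probed numerically: over random pairs of points on the sphere, the ratio |f(x) − f(y)|/‖x − y‖₂ never exceeds it (the positive spectrum below is an illustrative choice, with its largest entry playing the role of ‖G_ϕ‖):

```python
import numpy as np

rng = np.random.default_rng(6)
M = 8
g = rng.uniform(0, 1, size=M)             # positive toy spectrum of G_phi
gt = np.repeat(g, 2)                      # tilde-G acting on R^{2M}
f = lambda x: (x @ (gt * x)) ** 2         # f(x) = (x^T G~ x)^2
L = 4 * g.max() ** 2                      # claimed Lipschitz constant 4 ||G||^2

worst = 0.0
for _ in range(20_000):
    x = rng.normal(size=2 * M); x /= np.linalg.norm(x)
    y = rng.normal(size=2 * M); y /= np.linalg.norm(y)
    worst = max(worst, abs(f(x) - f(y)) / np.linalg.norm(x - y))
print(worst, L)  # the empirical ratio stays below L
```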