Random bosonic states for robust quantum metrology

We study how useful random states are for quantum metrology, i.e., whether they surpass the classical limits imposed on precision in the canonical phase estimation scenario. First, we prove that random pure states drawn from the Hilbert space of distinguishable particles typically do not lead to super-classical scaling of precision even when allowing for local unitary optimization. Conversely, we show that random states from the symmetric subspace typically achieve the optimal Heisenberg scaling without the need for local unitary optimization. Surprisingly, the Heisenberg scaling is observed for states of arbitrarily low purity and preserved under finite particle losses. Moreover, we prove that for such states a standard photon-counting interferometric measurement suffices to typically achieve the Heisenberg scaling of precision for all possible values of the phase at the same time. Finally, we demonstrate that metrologically useful states can be prepared with short random optical circuits generated from three types of beam-splitters and a non-linear (Kerr-like) transformation.


I. INTRODUCTION
Quantum metrology opens the possibility to exploit quantum features to measure unknown physical quantities with accuracy surpassing the constraints dictated by classical physics [1][2][3][4]. Classically, by employing N probes to independently sense a parameter, the mean squared error of estimation scales at best as $1/N$. This resolution is also known as the Standard Quantum Limit (SQL) [5]. Quantum mechanics, however, allows one to engineer entangled states of N particles which, when used as probes, can lead to resolutions beyond the SQL. Crucially, in the canonical phase sensing scenario a precision scaling like $1/N^2$, known as the Heisenberg Limit (HL) [6], may be attained. In practice, the destructive impact of noise must also be taken into account [7][8][9], but quantum-enhanced resolutions have been successfully observed in optical interferometry [10,11] (including gravitational-wave detection [12]), ultracold ion spectroscopy [13,14], atomic magnetometry [15,16], and in entanglement-assisted atomic clocks [17,18].
A fundamental question is to understand which quantum states offer an advantage for quantum metrology. Quantum-enhanced parameter sensitivity may only be observed with systems exhibiting inter-particle entanglement [19]; thus, such enhanced sensitivity can be used to detect multipartite entanglement [20][21][22][23] and to lower-bound the number of particles being entangled [24,25]. However, the precise connection between entanglement and a quantum metrological advantage is so far not fully understood.
It is known that states achieving the optimal sensitivity must have entanglement between all their particles [25], like for example the Greenberger-Horne-Zeilinger (GHZ) state (equivalent to the N00N state in optical interferometry), yet there also exist classes of such states which are useless from the metrological perspective [26]. The optimal states, however, belong to the symmetric (bosonic) subspace, from which many states have been recognized to offer a significant advantage in quantum metrology [26][27][28]. On the other hand, a very weak form of entanglement, so-called undistillable entanglement, may lead to Heisenberg scaling [29], while any super-classical scaling arbitrarily close to the HL ($1/N^{2-\epsilon}$ with $\epsilon > 0$) can be achieved with states whose geometric measure of entanglement vanishes in the limit $N \to \infty$ [30].
* michal.oszmaniec@icfo.es
Here, we go beyond merely presenting examples of states leading to quantum-enhanced precision. Instead, we conduct a systematic study by analyzing typical properties of the quantum and classical Fisher information on various ensembles of quantum states. First, we show that states of distinguishable particles typically are not useful for metrology, despite having a large amount of entanglement as measured by the entanglement entropy [31][32][33][34] and various other measures [35][36][37][38].
On the contrary, we show that most pure random states from the symmetric (bosonic) subspace of any local dimension achieve resolutions at the HL. Moreover, we prove that the usefulness of random symmetric states is robust against loss of a finite number of particles, and holds also for mixed states with fixed spectra (as long as the distance from the maximally mixed state in the symmetric subspace is sufficiently large). This is in stark contrast to the case of GHZ states, which completely lose their (otherwise ideal) phase sensitivity upon loss of just a single particle. Third, we show that, even for a fixed measurement, random pure bosonic states typically allow one to sense the phase at the HL. Concretely, this holds in the natural quantum optics setting of photon-number detection in output modes of a balanced beam-splitter [39] and independently of the value of the parameter. Finally, we demonstrate that states generated using random circuits with gates from a universal gate-set on the symmetric subspace, consisting only of beam-splitters and a single non-linear Kerr-like transformation, also typically achieve Heisenberg scaling, again even for a fixed measurement. All our findings equally apply to standard atomic interferometry [40][41][42][43]. Our work shows that metrological usefulness is a more generic feature than previously thought and opens up new possibilities for quantum-enhanced metrology based on random states.
Lastly, let us note that, as metrological usefulness of quantum states is tantamount to the notion of state macroscopicity [44], our results directly apply in this context [45,46]. Moreover, since states attaining the HL can be used to approximately clone N quantum gates into as many as $N^2$ gates as $N \to \infty$, one can immediately use our findings to also infer that typical symmetric states provide a resource allowing for optimal asymptotic replication of unitary gates [47,48].
Our results are based on leveraging recent insights concerning the continuity of quantum Fisher information [30], measure concentration techniques [35, 49-51], lately proven results about the spectral gap in the special unitary group [52], as well as the theory of approximate t-designs [53][54][55].
Our work sheds new light on the role of symmetric states in quantum metrology [26-28, 46, 56]. In particular, it clarifies the usefulness of symmetric states from the typicality perspective [26], but also analytically confirms the findings about their typical properties previously suggested by numerical computations [46,56].
The remainder of the paper is organized as follows. In Section II we introduce the setting of quantum parameter estimation including the classical and quantum Fisher information and their operational interpretation. In Section III we familiarize the reader with the technique of measure concentration and introduce some additional notation. In Section IV we present our results on the lack of usefulness of random states of distinguishable particles for quantum metrology. Subsequently, in Section V we show that states from the symmetric (bosonic) subspace typically attain the HL. Following that, in Section VI, we analyze the robustness of metrological usefulness of such states under noise and loss of particles. In Section VII we turn our attention to the classical Fisher information in a concrete measurement setup. We prove that with random pure symmetric states Heisenberg scaling can be typically achieved with a physically accessible measurement setup-essentially a Mach-Zehnder interferometer with particle number detectors. Finally, in Section VIII, we demonstrate that symmetric states whose metrological properties effectively mimic those of random symmetric states can be prepared by short random circuits generated from a universal gate-set in the symmetric subspace. We conclude our work in Section IX.

II. QUANTUM METROLOGY
We consider the canonical phase sensing scenario of quantum metrology [6], in which one is given N devices (black boxes) that encode a phase-like parameter ϕ. The task is to determine the optimal strategy of preparing quantum probes, so that after interaction with the devices the probes can be measured in a way that reveals the highest sensitivity to fluctuations of the parameter ϕ. Crucially, in quantum mechanics one has the freedom to apply the devices "in parallel" to a (possibly entangled) state ρ of N particles (see Fig. 1). The whole process is assumed to be repeated many times, $\nu \gg 1$, so that sufficient statistical data is always guaranteed. Note that parameter sensing is the most "optimistic" metrology setting [57], in the sense that any type of general phase estimation problem that also accounts for non-perfect prior knowledge about the value of ϕ or the finite size of the measurement data is bound to be more difficult [58].
Let $p_{n|\varphi}$ be the probability that (in any given round) the measurement resulted in the outcome n given that the initial state was ρ and the parameter was ϕ. Then, the mean squared error $\Delta^2\tilde{\varphi}$ of any unbiased and consistent estimator $\tilde{\varphi}$ of ϕ is lower-bounded by the Cramér-Rao Bound (CRB) [57]:
$$\Delta^2\tilde{\varphi} \;\ge\; \frac{1}{\nu\, F_{\mathrm{cl}}\!\left(\{p_{n|\varphi}\}\right)}, \tag{1}$$
where
$$F_{\mathrm{cl}}\!\left(\{p_{n|\varphi}\}\right) := \sum_n \frac{1}{p_{n|\varphi}} \left(\frac{dp_{n|\varphi}}{d\varphi}\right)^{2} \tag{2}$$
is the classical Fisher information (FI). Importantly, in the phase sensing scenario, the CRB (1) is guaranteed to be tight in the limit of large ν [57]. The classical Fisher information hence quantifies, in an operationally meaningful way, the ultimate resolution to fluctuations of ϕ attainable with a given measurement and state.
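To make the classical FI and the CRB concrete, the following minimal sketch evaluates $F_{\mathrm{cl}} = \sum_n (dp_n/d\varphi)^2/p_n$ for a hypothetical single-probe, two-outcome interferometer with $p_0 = \cos^2(\varphi/2)$ and $p_1 = \sin^2(\varphi/2)$. The model and all function names are illustrative assumptions and are not taken from the text.

```python
import math

def classical_fisher_information(probs, dprobs):
    """F_cl = sum_n (dp_n/dphi)^2 / p_n, skipping outcomes with p_n = 0."""
    return sum(dp * dp / p for p, dp in zip(probs, dprobs) if p > 0.0)

def two_outcome_model(phi):
    """Hypothetical single-probe model: p_0 = cos^2(phi/2), p_1 = sin^2(phi/2)."""
    probs = [math.cos(phi / 2) ** 2, math.sin(phi / 2) ** 2]
    # d/dphi cos^2(phi/2) = -sin(phi)/2 and d/dphi sin^2(phi/2) = +sin(phi)/2
    dprobs = [-0.5 * math.sin(phi), 0.5 * math.sin(phi)]
    return probs, dprobs

def cramer_rao_bound(fisher, nu):
    """Lower bound on the mean squared error after nu repetitions."""
    return 1.0 / (nu * fisher)

phi = 0.7
probs, dprobs = two_outcome_model(phi)
f_single = classical_fisher_information(probs, dprobs)

# The FI is additive for independent probes, so N probes give N * f_single.
N, nu = 10, 100
sql_bound = cramer_rao_bound(N * f_single, nu)
print(f_single, sql_bound)
```

For this model the FI equals one for every value of ϕ, so N independent probes yield $F_{\mathrm{cl}} = N$ and the bound reproduces the $1/(\nu N)$ behaviour of the SQL.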
The quantum device is taken to act on a single particle (photon, atom, etc.) with Hilbert space $\mathcal{H}_{\mathrm{loc}}$ of finite dimension $d := |\mathcal{H}_{\mathrm{loc}}|$ via a Hamiltonian h, which without loss of generality we assume to be traceless (any nonzero trace of h can always be incorporated into an irrelevant global phase-factor). The device unitarily encodes the unknown parameter ϕ by performing the transformation $e^{-ih\varphi}$. The multi-particle state ρ on $\mathcal{H}_N := (\mathcal{H}_{\mathrm{loc}})^{\otimes N}$ moves along the trajectory
$$\varphi \mapsto \rho(\varphi) := e^{-iH\varphi}\, \rho\, e^{iH\varphi} \tag{3}$$
corresponding to the unitary evolution with the global Hamiltonian
$$H := H_N := \sum_{j=1}^{N} h^{(j)}, \tag{4}$$
where $h^{(j)}$ denotes h acting on the j-th particle. The measurement is defined by a positive-operator-valued measure (POVM), $\{\Pi^N_n\}_n$, acting on the whole system (or, while accounting for particle losses, only on the remaining $N - k$ particles, see Fig. 1) and satisfying $\sum_n \Pi^N_n = \mathbb{1}$. It yields outcome n with probability
$$p_{n|\varphi} = \mathrm{tr}\!\left(\Pi^N_n\, \rho(\varphi)\right). \tag{5}$$
In the seminal work by Braunstein and Caves [59], it was shown how to quantify the maximal usefulness of a state ρ in the above scenario by maximizing the classical Fisher information (2) over all possible POVMs. The resulting quantity is called the quantum Fisher information (QFI). It depends solely on the quantum state ρ being considered and the Hamiltonian H responsible for the parameter encoding, and we denote it by $\mathcal{F}(\rho, H)$. A QFI scaling faster than linearly with N (for the fixed local Hamiltonian h) ultimately leads to super-classical resolutions by virtue of the CRB (1). Since the Hamiltonian (4) is local and parameter-independent, the ultimate HL is unambiguously given by $F_{\mathrm{cl}} \propto N^2$, with super-classical scaling being possible solely due to the entanglement properties of ρ and not due to a non-local or non-linear parameter dependence [60][61][62]. Importantly, thanks to the unitary character of the parameter encoding (3), resolutions that scale beyond SQL can indeed be attained in metrology [8,9].
Although in the phase sensing scenario the optimal measurement can be designed for a particular value of ϕ, it is often enough to know, prior to the estimation, that the parameter lies within a sufficiently narrow window of values. In that case there exists a sequence of measurements which eventually (in the limit of many protocol repetitions, ν → ∞ in Fig. 1, with measurements adaptively adjusted) yields a classical Fisher information that still achieves the QFI [63]. Crucially, the optimal scaling of the QFI achievable in the phase sensing scenario is proportional to $N^2$, which proves the Heisenberg scaling to indeed be the ultimate one.
For the sake of having a concise terminology, we call a family of states for increasing N useful for quantum sensing if there exists a Hamiltonian h in Eq. (3) such that the corresponding QFI scales faster than N (i.e., $\mathcal{F}(\rho, H) \notin O(N)$) in the limit of large N. In contrast, we say that the family of states is not useful for quantum sensing (and hence also for all less "optimistic" metrological scenarios) if its QFI scales asymptotically at most like N (i.e., $\mathcal{F}(\rho, H) \in O(N)$). We adopt the above nomenclature for the sake of brevity and concreteness. However, let us stress that states reaching beyond the SQL, despite not preserving super-classical precision scaling, may also yield dramatic precision enhancement (e.g., squeezed states in gravitational-wave detectors [12]), which, in fact, then guarantees their rich entanglement structure [20][21][22][23][24][25]. Nevertheless, it is the super-classical precision scaling that manifests the necessary entanglement properties to be maintained with the system size. In particular, its protection at the level of QFI has recently allowed the design of novel noise-robust metrology protocols [64,65].
In the remainder of this section we give a mathematical definition of the QFI and discuss some of its properties. The QFI has an elegant geometric interpretation [59] as the "square of the speed" along the trajectory (3) measured with respect to the Bures distance $d_B(\rho, \sigma) := \sqrt{2\,[1 - F(\rho, \sigma)]}$, where $F(\rho, \sigma) := \mathrm{tr}\sqrt{\rho^{1/2}\, \sigma\, \rho^{1/2}}$ is the fidelity. This allows one to define the QFI geometrically [66]:
$$\mathcal{F}(\rho, H) := 4 \lim_{\varphi \to 0} \frac{d_B\!\left(\rho, \rho(\varphi)\right)^2}{\varphi^2}. \tag{6}$$
This geometric interpretation of the QFI is key for the derivation of the following results. Using the spectral decomposition of a quantum state $\rho = \sum_{j=1}^{d^N} p_j\, |e_j\rangle\langle e_j|$, with $p_j \ge 0$ denoting its eigenvalues, we can write the QFI more explicitly as [2,3]:
$$\mathcal{F}(\rho, H) = 2 \sum_{j,k:\; p_j + p_k \neq 0} \frac{(p_j - p_k)^2}{p_j + p_k}\, \left|\langle e_j | H | e_k \rangle\right|^2, \tag{7}$$
which for pure states, ρ = ψ, simplifies further to (four times) the variance $\Delta^2 H|_\psi$ of H, that is,
$$\mathcal{F}(\psi, H) = 4\, \Delta^2 H|_\psi = 4 \left[\mathrm{tr}(\psi H^2) - \mathrm{tr}(\psi H)^2\right]. \tag{8}$$
Let us also recall that the QFI is a convex function on the space of quantum states. This reflects that mixing states can never increase their parameter sensitivity above their average sensitivity. This, together with the fact that the QFI is also additive on product states [2,3], directly proves that only entangled states can lead to resolutions beyond the SQL.
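As a numerical illustration of the spectral expression for the QFI, $\mathcal{F}(\rho,H) = 2\sum_{j,k}(p_j-p_k)^2/(p_j+p_k)\,|\langle e_j|H|e_k\rangle|^2$, the sketch below evaluates it for an N-qubit GHZ state and for a product state, assuming $h = \sigma_z/2$. The helper names (`qfi`, `collective_hamiltonian`, `ghz_state`, `product_plus_state`) are hypothetical and chosen here for illustration only.

```python
import numpy as np

def qfi(rho, H, tol=1e-12):
    """QFI from the spectral decomposition of rho (terms with p_j + p_k ~ 0 are skipped)."""
    p, vecs = np.linalg.eigh(rho)
    Hm = vecs.conj().T @ H @ vecs            # matrix elements <e_j|H|e_k>
    num = (p[:, None] - p[None, :]) ** 2
    den = p[:, None] + p[None, :]
    mask = den > tol
    weights = np.zeros_like(den)
    weights[mask] = num[mask] / den[mask]
    return 2.0 * float(np.sum(weights * np.abs(Hm) ** 2))

def collective_hamiltonian(N):
    """H = sum_j sigma_z^(j) / 2 on N qubits; diagonal in the computational basis."""
    ones = np.array([bin(i).count("1") for i in range(2 ** N)])
    return np.diag(0.5 * (N - 2 * ones))

def ghz_state(N):
    psi = np.zeros(2 ** N, dtype=complex)
    psi[0] = psi[-1] = 1 / np.sqrt(2)
    return np.outer(psi, psi.conj())

def product_plus_state(N):
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    psi = plus
    for _ in range(N - 1):
        psi = np.kron(psi, plus)
    return np.outer(psi, psi.conj())

N = 4
H = collective_hamiltonian(N)
print("GHZ    :", qfi(ghz_state(N), H))           # Heisenberg-like value N^2
print("product:", qfi(product_plus_state(N), H))  # shot-noise value N
```

For pure states the routine reduces to four times the variance of H, which gives $N^2$ for the GHZ state and N for the product state, in line with the additivity of the QFI on product states.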

III. CONCENTRATION OF MEASURE PHENOMENON
Before presenting our main results, we briefly discuss a key ingredient for their proofs: the concentration of measure phenomenon [49][50][51]. For a more detailed discussion we point the reader to Appendix A. For any finite dimensional Hilbert space $\mathcal{H}$ we denote by $\mu(\mathcal{H})$ the Haar measure on the special unitary group $SU(\mathcal{H})$. The Haar measure can be thought of as the uniform distribution over unitary transformations. We denote by $\Pr_{U\sim\mu(\mathcal{H})}(A(U))$ the probability that a statement A holds for unitaries U drawn from the measure $\mu(\mathcal{H})$ and by
$$\mathbb{E}_{U\sim\mu(\mathcal{H})}\, f(U) := \int_{SU(\mathcal{H})} f(U)\, d\mu(U) \tag{9}$$
the expectation value of a function $f : SU(\mathcal{H}) \to \mathbb{R}$. Our findings concern the typical value of such functions. For example, f(U) could be the QFI of some family of so-called isospectral states, i.e., states of the form $U\rho\,U^\dagger$, with ρ some fixed state on $\mathcal{H}$ and $U \sim \mu(\mathcal{H})$ a unitary drawn from the Haar measure on $SU(\mathcal{H})$ [37]. Note that as $\mathcal{F}(U\rho\,U^\dagger, H) = \mathcal{F}(\rho, U^\dagger H\, U)$ (this follows directly from Eq. (7)), all our results can also be interpreted as being about random Hamiltonians instead of random states. To show that for almost all U the value of such a function is close to the typical value and that this typical value is close to the average, we repeatedly employ the following concentration of measure inequality [50]:
$$\Pr_{U\sim\mu(\mathcal{H})}\!\left(\left|f(U) - \mathbb{E}_{U\sim\mu(\mathcal{H})} f\right| \ge \epsilon\right) \le 2 \exp\!\left(-\frac{|\mathcal{H}|\,\epsilon^2}{4L^2}\right). \tag{10}$$
It holds for every $\epsilon \ge 0$ and every function $f : SU(\mathcal{H}) \to \mathbb{R}$ that is Lipschitz continuous (with respect to the geodesic distance on $SU(\mathcal{H})$) and thus possesses a corresponding Lipschitz constant L. Recall that the Lipschitz constant bounds how fast the value of a function can change under a change of its argument. For a formal definition of L see Eq. (A5) in Appendix A, where we explicitly prove bounds on Lipschitz constants of all the functions relevant for our considerations. Let us here only note that as both the FI (2) and the QFI (6) are non-trivial (in particular non-linear) functions of quantum states, we need to resort to advanced techniques of differential geometry.
Before we move on to our results, we introduce a minimal amount of additional notation: Given two functions f, g we write $f(N) \in \Theta(g(N))$ if both functions have the same behaviour in the limit of large N (up to a positive multiplicative constant) and write $f(N) \in O(g(N))$ if there exists a constant C such that $f(N) \le C\, g(N)$ in the limit of large N. Slightly abusing notation, we sometimes also use the symbols $\Theta(f(N))$ and $O(f(N))$ to denote an arbitrary function with the same asymptotic behavior as f. For any operator X we denote its operator norm by
$$\|X\| := \sup_{|\psi\rangle \neq 0} \frac{\| X |\psi\rangle \|}{\| |\psi\rangle \|}, \tag{11}$$
where $\| |\psi\rangle \| = \sqrt{\langle\psi|\psi\rangle}$ stands for the standard vector norm. Then, the trace and the Hilbert-Schmidt norms of X are defined as $\|X\|_1 := \mathrm{tr}\sqrt{X^\dagger X}$ and $\|X\|_{\mathrm{HS}} := \sqrt{\mathrm{tr}(X^\dagger X)}$, respectively. These generally obey the relation $\|X\| \le \|X\|_{\mathrm{HS}} \le \|X\|_1$.
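The concentration phenomenon can be observed numerically with a few lines of code. The sketch below draws Haar-random unitaries using the standard QR-based recipe (a Ginibre matrix followed by a phase correction of the columns) and compares the spread of a simple function, $f(U) = \langle 0|U^\dagger A U|0\rangle$ with a fixed traceless observable A, for a small and a larger dimension. Sampling from $U(D)$ instead of $SU(D)$ is an assumption made for simplicity; the two differ only by a global phase, which drops out of f. All names are illustrative.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))   # rescale column k by the phase of r[k, k]

def sample_expectations(dim, n_samples, rng):
    """Values of f(U) = <0| U^dag A U |0> for A = diag(+1, -1, +1, -1, ...)."""
    a_diag = np.where(np.arange(dim) % 2 == 0, 1.0, -1.0)
    values = []
    for _ in range(n_samples):
        psi = haar_unitary(dim, rng)[:, 0]       # first column is a Haar-random pure state
        values.append(float(np.sum(a_diag * np.abs(psi) ** 2)))
    return np.array(values)

rng = np.random.default_rng(0)
spread_small = sample_expectations(4, 200, rng).std()
spread_large = sample_expectations(64, 200, rng).std()
print(spread_small, spread_large)   # the spread shrinks as the dimension grows
```

The empirical standard deviation decreases with the dimension, which is the qualitative content of a bound whose exponent grows linearly in $|\mathcal{H}|$.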
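The ordering of the three norms is easy to check numerically, since all of them are functions of the singular values of X: the operator norm is the largest singular value, the Hilbert-Schmidt norm is the root of the sum of their squares, and the trace norm is their sum. The helper name `schatten_norms` is an illustrative choice.

```python
import numpy as np

def schatten_norms(X):
    """Return (operator norm, Hilbert-Schmidt norm, trace norm) of a matrix X."""
    s = np.linalg.svd(X, compute_uv=False)   # singular values of X
    return float(s.max()), float(np.sqrt(np.sum(s ** 2))), float(s.sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
op_norm, hs_norm, tr_norm = schatten_norms(X)
print(op_norm, hs_norm, tr_norm)
```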

IV. FUTILITY OF GENERAL RANDOM STATES
First, we show that Haar-random isospectral states of distinguishable particles are typically not useful for quantum metrology even if they are pure and hence typically highly entangled [31][32][33][35][36][37][38]. This remains true even if one allows for local unitary (LU) optimization before the parameter is encoded (see Fig. 1). Note that in the special case d = 2 the LU-optimization of the input state is equivalent to an optimization over all unitary parameter encodings. The maximal achievable QFI with LU-optimization is given by
$$\mathcal{F}^{\mathrm{LU}}(\rho, H) := \sup_{V \in \mathrm{LU}} \mathcal{F}(V \rho V^\dagger, H), \tag{12}$$
where LU denotes the local unitary group, i.e., a group consisting of unitaries of the form $V = V_1 \otimes V_2 \otimes \cdots \otimes V_N$ with $V_j$ acting on the j-th particle of the system (see Fig. 1). Despite the fact that for other states this sometimes boosts their QFI [26], we have the following no-go theorem for random states from the full Hilbert space $\mathcal{H}_N$ of distinguishable particles:
Theorem 1 (Most random states of distinguishable particles are not useful for metrology even after LU-optimization). Fix a local dimension d, single-particle Hamiltonian h, and a pure state
Proof sketch. From Eq. (8) we have that $\mathcal{F}(\psi, H) \le 4\, \mathrm{tr}(\psi H^2)$, which implies
The terms with j = k give a contribution of at most $4N\|h\|^2$. In the remaining terms, however, the operator
But the average of $\left\| \mathrm{tr}_{\neg j,k}(U \psi_N U^\dagger) - \mathbb{1}/d^2 \right\|_1$ can be upper bounded exponentially [67] as
so that
Conversely, a direct computation of the average non-LU-optimized QFI yields a lower bound of order $N\|h\|^2$. Applying a concentration inequality of the type given in Eq. (10) yields the result (see Appendix D for details).
Due to the convexity of the QFI, the typical behavior of the QFI on any unitary-invariant ensemble [66] of mixed density matrices in $\mathcal{H}_N$ can only be worse than that of pure states predicted by the above theorem. Furthermore, a bound similar to Eq. (17) can also be derived for Hamiltonians H with few-body terms, for example those with finite- or short-range interactions on regular lattices or those considered in Ref. [60]. Lastly, let us remark that, as we consider the most optimistic phase sensing protocol, Theorem 1 also rules out the possibility of super-classical scalings of precision when considering random states in any general phase estimation protocol [58], e.g., the Bayesian single-shot scenarios [68][69][70].
The above proof relies on the fact that most random states on H N have very mixed two-particle marginals. Thus, high entanglement entropy is enough to make random pure states on H N useless for quantum metrology. Complementing this, in [30] it has been proven that non-vanishing geometric measure of entanglement E g (ψ) ∈ Θ (1) is at the same time necessary for Heisenberg scaling (recall that the geometric measure of entanglement for a pure state ψ is defined as E g (ψ) := 1 − max σ∈SEP tr(ψσ), where SEP denotes the set of separable states in D (H N )). Interestingly, pure random states of N particles do typically satisfy E g (ψ) ≈ 1 [36]. This, together with Theorem 1 shows that, contrary to a recent conjecture of Ref. [71], high geometric measure of entanglement is not sufficient for states to exhibit super-classical precision scaling in quantum metrology. Let us however note that this is consistent with numerical findings of Ref. [46]. That the presence or absence of super-classical scaling of the QFI arises solely from the two-particle marginals has recently also been noted in Ref. [72].
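The linear scaling for distinguishable particles can be probed numerically for small systems. The following sketch samples Haar-random pure states of N qubits (as normalized complex Gaussian vectors), assumes $h = \sigma_z/2$, and evaluates the pure-state QFI as four times the variance of H. It does not perform any LU-optimization, so it only illustrates the behaviour of the non-optimized QFI; all names are illustrative assumptions.

```python
import numpy as np

def random_pure_state(dim, rng):
    """Haar-random pure state as a normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def collective_spin_diagonal(N):
    """Diagonal of H = sum_j sigma_z^(j) / 2 in the computational basis."""
    ones = np.array([bin(i).count("1") for i in range(2 ** N)])
    return 0.5 * (N - 2 * ones)

def pure_state_qfi(psi, h_diag):
    """4 * variance of a diagonal Hamiltonian in the pure state psi."""
    probs = np.abs(psi) ** 2
    mean = np.sum(probs * h_diag)
    return float(4.0 * (np.sum(probs * h_diag ** 2) - mean ** 2))

rng = np.random.default_rng(0)
N = 8
h_diag = collective_spin_diagonal(N)
samples = [pure_state_qfi(random_pure_state(2 ** N, rng), h_diag) for _ in range(50)]
average_qfi = float(np.mean(samples))
print(N, average_qfi)   # close to N, i.e., shot-noise-like
```

Using the standard Haar second moments, the exact average in this setting is $N\,D/(D+1)$ with $D = 2^N$, which approaches N from below and is consistent with the sampled value.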

V. USEFULNESS OF RANDOM SYMMETRIC STATES
We now turn to the study of random states from the symmetric (bosonic) subspace of N qudits, $S_N := \mathrm{span}\{|\psi\rangle^{\otimes N} : |\psi\rangle \in \mathcal{H}_{\mathrm{loc}}\}$. This subspace of states contains metrologically useful states such as the GHZ state or the Dicke states [27] and naturally appears in experimental setups employing photons and bosonic atoms [3]. For the special case d = 2 it was proven in [26] that with LU optimization almost all pure symmetric states exhibit
In what follows we consider random isospectral symmetric states, i.e., states of the form $U \sigma_N U^\dagger$ with $\sigma_N$ being a fixed state on $S_N$ and $U \sim \mu(S_N)$. By $\sigma_{\mathrm{mix}} = \mathbb{1}_{S_N}/|S_N|$ we denote the maximally mixed state in $S_N$. In particular, we prove that typically such symmetric states achieve a Heisenberg-like scaling, provided that the spectrum of $\sigma_N$ differs sufficiently from the spectrum of $\sigma_{\mathrm{mix}}$. Interestingly, this holds even without LU-optimization:
Theorem 2 (Most random isospectral symmetric states are useful for quantum sensing). Fix a single-particle Hamiltonian h, local dimension d and a state $\sigma_N$ from the sym-
Proof sketch. We use the standard integration techniques (see Appendix C for details) on the unitary group to show that
where
and $\Lambda(\{p_j\}_j)$ is a function of the eigenvalues $\{p_j\}_j$ which for pure states attains Λ = 1. Therefore, for the case of pure states we have
where $\psi_N$ is an arbitrary pure state on $S_N$. From this it clearly follows that the average QFI of random pure symmetric states exhibits Heisenberg scaling in the limit $N \to \infty$. Moreover, it turns out that $\Lambda(\{p_j\}_j)$ satisfies the inequality
The inequality (18) follows now from concentration of measure inequalities on $SU(S_N)$ (see Appendix D for a detailed proof).
As the Bures distance to the maximally mixed state $d_B(\sigma_N, \sigma_{\mathrm{mix}})$ enters the theorem in a non-trivial way, we illustrate the power of the result by showing that even states that asymptotically move arbitrarily close to $\sigma_{\mathrm{mix}}$ still typically achieve a super-classical scaling:
Corollary 1 (Sufficient condition for usefulness of random isospectral symmetric states). Let $U \sigma_N U^\dagger$ be an ensemble of isospectral states from the symmetric subspace $S_N$ with eigenvalues $\{p_j\}_j$. Theorem 2 implies that random states drawn from such an ensemble are typically useful for sensing as for any $\alpha < \min\{1/2, (d-1)/3\}$ they yield a precision scaling
Let us remark that Theorem 2 constitutes a fairly complete description of the typical properties of the QFI on various ensembles of isospectral density matrices. Typical properties of entanglement and its generalizations on sets of isospectral density matrices were analyzed in Ref. [37].
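As a hedged cross-check of the Heisenberg scaling of the average for pure states (a worked special case, not the derivation of Appendix C), consider d = 2 with $h = \sigma_z/2$, so that H restricted to $S_N$ is the spin-$N/2$ operator $J_z$ and $D := |S_N| = N+1$. The only inputs are the standard first and second Haar moments of a random pure state ψ on a D-dimensional space; the symbols D and $J_z$ are notation introduced here for illustration.

```latex
\begin{align}
\mathbb{E}\,\mathrm{tr}(\psi A) &= \frac{\mathrm{tr}A}{D},
\qquad
\mathbb{E}\,\bigl[\mathrm{tr}(\psi A)\bigr]^2
  = \frac{\mathrm{tr}(A^2) + (\mathrm{tr}A)^2}{D(D+1)}
  \quad (A = A^\dagger), \\
\mathbb{E}\,\mathcal{F}(\psi, J_z)
  &= 4\,\mathbb{E}\bigl[\mathrm{tr}(\psi J_z^2) - \mathrm{tr}(\psi J_z)^2\bigr]
   = 4\left[\frac{\mathrm{tr}(J_z^2)}{D} - \frac{\mathrm{tr}(J_z^2)}{D(D+1)}\right]
   = \frac{4\,\mathrm{tr}(J_z^2)}{D+1}
   \quad (\mathrm{tr}J_z = 0), \\
\mathrm{tr}(J_z^2) &= \sum_{m=-N/2}^{N/2} m^2 = \frac{N(N+1)(N+2)}{12}
\;\Longrightarrow\;
\mathbb{E}\,\mathcal{F}(\psi, J_z) = \frac{N(N+1)}{3}.
\end{align}
```

In this special case the average grows quadratically with N, in contrast to the linear growth found for Haar-random states on the full Hilbert space of distinguishable qubits.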
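The quadratic growth of the QFI for random symmetric states can be checked by direct sampling in the two-mode case. The sketch below assumes d = 2 and $h = \sigma_z/2$, represents $S_N$ in the Dicke basis (dimension N + 1, on which H acts as the spin-$N/2$ operator $J_z$), and averages the pure-state QFI over Haar-random vectors. Function names are illustrative assumptions; the reference value $N(N+1)/3$ follows from the standard Haar second moments for this special case.

```python
import numpy as np

def random_pure_state(dim, rng):
    """Haar-random pure state as a normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def jz_diagonal(N):
    """Eigenvalues of J_z on the symmetric subspace of N qubits (Dicke basis)."""
    return N / 2 - np.arange(N + 1)

def pure_state_qfi(psi, h_diag):
    """4 * variance of a diagonal Hamiltonian in the pure state psi."""
    probs = np.abs(psi) ** 2
    mean = np.sum(probs * h_diag)
    return float(4.0 * (np.sum(probs * h_diag ** 2) - mean ** 2))

def average_symmetric_qfi(N, n_samples, rng):
    h_diag = jz_diagonal(N)
    values = [pure_state_qfi(random_pure_state(N + 1, rng), h_diag)
              for _ in range(n_samples)]
    return float(np.mean(values))

rng = np.random.default_rng(0)
for N in (10, 20, 40):
    print(N, average_symmetric_qfi(N, 200, rng), N * (N + 1) / 3)
```

No LU-optimization is performed here; the sampled averages track the quadratic reference curve directly.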

VI. ROBUSTNESS TO IMPERFECTIONS
Next we underline the practical importance of the above results by showing that the usefulness of random symmetric states is robust against depolarizing noise and particle loss.

A. Depolarising noise
We first investigate how mixed $\sigma_N$ may become while still providing a quantum advantage for metrology. To this aim, we consider a concrete ensemble of depolarized states:
Example 1 (Depolarized random symmetric states). Fix a local dimension d, single particle Hamiltonian h, and $p \in [0, 1]$. Let $\psi_N$ be a pure state on $S_N$ and set
The above example shows that for all values of p < 1, the Heisenberg scaling of the QFI is typically retained. The QFI then still concentrates around its average, which is of order $N^2$. Moreover, we observe that for large N, the average value of the QFI of random symmetric depolarized states decreases essentially linearly with p as $|\Lambda_p - (1 - p)| \le 2/|S_N|$. Finally, Eq. (24) is a large deviation inequality for the QFI on the ensemble of depolarized pure symmetric states, with the mean $\mathbb{E} F_p$. The average $\mathbb{E} F_p$ for the special case p = 0 is given by (21).
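The nearly linear decrease with p can be reproduced numerically. The sketch below assumes that the depolarized state has the form $(1-p)\,\psi_N + p\,\sigma_{\mathrm{mix}}$, works in the two-mode case with $H = J_z$ on $S_N$, and evaluates the QFI with the standard spectral formula. For a state of this form the spectral formula gives the QFI of the pure state multiplied by $(1-p)^2/(1-p+2p/D)$ with $D = |S_N|$; this closed form is derived here for illustration and is only assumed to play the role of $\Lambda_p$. All function names are hypothetical.

```python
import numpy as np

def qfi(rho, H, tol=1e-12):
    """QFI from the spectral decomposition of rho."""
    p, vecs = np.linalg.eigh(rho)
    Hm = vecs.conj().T @ H @ vecs
    num = (p[:, None] - p[None, :]) ** 2
    den = p[:, None] + p[None, :]
    mask = den > tol
    weights = np.zeros_like(den)
    weights[mask] = num[mask] / den[mask]
    return 2.0 * float(np.sum(weights * np.abs(Hm) ** 2))

def depolarized_state(psi, p):
    """Assumed form: (1 - p) |psi><psi| + p * identity / D on the symmetric subspace."""
    D = psi.size
    return (1 - p) * np.outer(psi, psi.conj()) + p * np.eye(D) / D

def lambda_p(p, D):
    """Reduction factor obtained from the spectral formula for the assumed state."""
    return (1 - p) ** 2 / (1 - p + 2 * p / D)

N = 20
D = N + 1
Jz = np.diag(N / 2 - np.arange(D))
rng = np.random.default_rng(0)
psi = rng.normal(size=D) + 1j * rng.normal(size=D)
psi /= np.linalg.norm(psi)

F_pure = qfi(np.outer(psi, psi.conj()), Jz)
p = 0.6
F_dep = qfi(depolarized_state(psi, p), Jz)
print(F_pure, F_dep, lambda_p(p, D) * F_pure)
```

The factor differs from $1 - p$ by at most $2p/D$, so in this sketch the Heisenberg-like value of the pure state is simply rescaled by an N-independent constant for any fixed p < 1.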

B. Finite loss of particles
Next we investigate whether the Heisenberg scaling of random symmetric states is robust under the loss of particles. We model the particle loss by the partial trace over $k \le N$ particles, i.e., $\sigma_N \to \mathrm{tr}_k(\sigma_N)$ for a given state $\sigma_N$. Note that due to the permutation symmetry of the state $\sigma_N$ considered, it does not matter which particles are lost. In particular, such a mechanism is equivalent to the situation in which one is capable of measuring only $N - k$ of the particles (as if distinguishable). We are therefore interested in typical properties of
For comparison, let us recall that the GHZ-state, which is known to be optimal in quantum sensing [5], becomes completely useless upon the loss of just a single particle as the remaining reduced state is separable. In contrast, sufficiently pure random bosonic states typically remain useful for sensing even when a constant number of particles has been lost. Even the Heisenberg scaling $\sim 1/N^2$ of the precision is preserved:
Theorem 3 (Random isospectral symmetric states are typically useful under finite particle loss). Fix a local dimension d, single particle Hamiltonian h, and a state $\sigma_N$ on $S_N$ with eigenvalues $\{p_j\}_j$. Let $F_k(U) := \mathcal{F}(\mathrm{tr}_k(U \sigma_N U^\dagger), H_{N-k})$, then
The main difficulty in the proof of the above theorem is that the random matrix ensemble induced by the partial trace of random isospectral states from $S_N$ is not well-studied. Hence we cannot use standard techniques to compute the average of $F_k(U)$. We circumvent this problem by lower-bounding the QFI first by the asymmetry measure [73] and then by the HS norm, and computing the average of the right hand side instead. We provide all the technical details of the proof in Appendix D; let us here only consider the two-mode case as an example:
Example 2. For d = 2 and after fixing $\mathrm{tr}(h^2) = 1/2$, the following inequality holds (see Lemma 9 in Appendix C)
which for pure states further simplifies to (29).
It is interesting to note that without particle losses, i.e., k = 0, this formula gives a result that differs from the exact expression (21) only by a factor of 1/2.
The above result is most relevant for atomic interferometry experiments [40][41][42][43], in which unit detection efficiencies can be achieved and it is hence reasonably possible to limit the loss of particles to a small number. In contrast, current optical implementations are limited by inefficiencies of photon detectors that are adequately modeled with a fictitious beam-splitter model [3]. It is known in noisy quantum metrology that generic uncorrelated noises (in particular the noise described by the beam-splitter model) constrain the ultimate precision to a constant factor beyond the SQL [8,9]. The beam-splitter effectively fixes the loss-rate per particle, allowing all the particles to be lost with some probability. In fact, modeling losses with a beam-splitter is equivalent to tracing out k particles with probability $\binom{N}{k} (1 - \eta)^k\, \eta^{N-k}$, where η is the fictitious beam-splitter transmissivity (see Appendix E for the proof). Thus, the number k of particles lost fluctuates according to a binomial distribution. As a result, the lower bound on the average QFI utilized in Section VI B must also be averaged over the fluctuations of k and the super-classical scaling is lost. Our results on the robustness of random bosonic states against finite particle losses are hence fully consistent with the no-go theorems of [8,9]. Moreover, the fact that random states are much more robust against particle loss than, for example, N00N states, which lose their metrological usefulness upon the loss of a single particle, raises the hope that for finite N they might still perform comparably well even under uncorrelated noise.
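The contrast between GHZ and random symmetric states under particle loss can be illustrated numerically in the two-mode case. The sketch below assumes d = 2 and $h = \sigma_z/2$ and implements the partial trace over one qubit directly in the Dicke basis, using the decomposition $|D^N_m\rangle = \sqrt{(N-m)/N}\,|0\rangle|D^{N-1}_m\rangle + \sqrt{m/N}\,|1\rangle|D^{N-1}_{m-1}\rangle$, where m counts particles in state $|1\rangle$. The function names are illustrative and the QFI is evaluated with the standard spectral formula.

```python
import numpy as np

def qfi(rho, H, tol=1e-12):
    """QFI from the spectral decomposition of rho."""
    p, vecs = np.linalg.eigh(rho)
    Hm = vecs.conj().T @ H @ vecs
    num = (p[:, None] - p[None, :]) ** 2
    den = p[:, None] + p[None, :]
    mask = den > tol
    weights = np.zeros_like(den)
    weights[mask] = num[mask] / den[mask]
    return 2.0 * float(np.sum(weights * np.abs(Hm) ** 2))

def jz(N):
    """J_z on the symmetric subspace; basis |D_m>, m = number of particles in |1>."""
    return np.diag(N / 2 - np.arange(N + 1))

def lose_one_particle(rho, N):
    """Partial trace over one qubit of a state on the (N+1)-dim symmetric subspace."""
    K0 = np.zeros((N, N + 1))
    K1 = np.zeros((N, N + 1))
    for m in range(N + 1):
        if m <= N - 1:
            K0[m, m] = np.sqrt((N - m) / N)   # lost particle found in |0>
        if m >= 1:
            K1[m - 1, m] = np.sqrt(m / N)     # lost particle found in |1>
    return K0 @ rho @ K0.T + K1 @ rho @ K1.T

def lose_particles(rho, N, k):
    for i in range(k):
        rho = lose_one_particle(rho, N - i)
    return rho

N, k = 30, 2
ghz = np.zeros(N + 1, dtype=complex)
ghz[0] = ghz[N] = 1 / np.sqrt(2)
rho_ghz = np.outer(ghz, ghz.conj())

rng = np.random.default_rng(7)
psi = rng.normal(size=N + 1) + 1j * rng.normal(size=N + 1)
psi /= np.linalg.norm(psi)
rho_rand = np.outer(psi, psi.conj())

F_ghz_lost = qfi(lose_particles(rho_ghz, N, 1), jz(N - 1))
F_rand_lost = qfi(lose_particles(rho_rand, N, k), jz(N - k))
print("GHZ, one particle lost    :", F_ghz_lost)
print("random, two particles lost:", F_rand_lost, "vs shot noise", N - k)
```

In this sketch the GHZ state loses all phase sensitivity after a single loss, while the sampled random state retains a QFI well above the shot-noise value $N - k$.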
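The binomial statistics of the lost particles in the beam-splitter model can be tabulated directly. The short sketch below evaluates $\binom{N}{k}(1-\eta)^k\eta^{N-k}$ for all k and reports the mean number of lost particles; the function name and the chosen values of N and η are illustrative assumptions.

```python
import math

def loss_distribution(N, eta):
    """Probability of losing k = 0..N particles for transmissivity eta."""
    return [math.comb(N, k) * (1 - eta) ** k * eta ** (N - k) for k in range(N + 1)]

N, eta = 100, 0.9
probs = loss_distribution(N, eta)
mean_loss = sum(k * pk for k, pk in enumerate(probs))
print(mean_loss)   # N * (1 - eta): the typical loss grows linearly with N
```

Since the mean loss $N(1-\eta)$ grows with N for any fixed η < 1, the regime of a constant number k of lost particles is left as N increases, which is why the finite-loss result does not contradict the no-go theorems for uncorrelated noise.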

VII. ATTAINING THE HEISENBERG LIMIT WITH A SIMPLE MEASUREMENT
We have demonstrated that random bosonic states lead in a robust manner to super-classical scaling of the QFI. This proves that, in principle, they must allow one to locally sense the phase around any value with resolutions beyond the SQL. However, as previously explained in Section II, the phase sensing scenario allows the measurement to be optimised for the particular parameter value considered. Moreover, such a measurement may also strongly depend on the state utilised in the protocol, so one may question whether it could be implemented in a realistic experiment, as theoretically it must then be adjusted depending on the state drawn at random. Thus, it is a priori not clear if the metrological usefulness of random symmetric states can actually be exploited in practice.
Here we show that this is indeed the case. For random symmetric states of two-mode bosons, a standard measurement in optical and atomic interferometry suffices to attain the Heisenberg scaling of precision when sensing the phase around any value.
In particular, we consider the detection of the distribution of the N bosons between two modes (interferometer arms) after a balanced beam-splitter transformation [3]. As depicted in Fig. 2, this corresponds in optics to the photon-number detection at two output ports of a Mach-Zehnder (MZ) interferometer [39]. Yet, such a setup also applies to experiments with atoms in double-well potentials, in which the beam-splitter transformation can be implemented via trap-engineering and atomic interactions [40,41], while number-resolving detection has recently been achieved via cavity-coupling [42] and fluorescence [43].
One may directly relate the general protocol of distinguishable qubits (see Fig. 1) to the optical setup of photons in two modes (see Fig. 2) after acknowledging that the Dicke basis of general pure symmetric qubit states is nothing but their two-mode picture, in which a qubit in a state $|0\rangle$ ($|1\rangle$) describes a photon travelling in arm a (b) of the interferometer. In particular, a general pure bosonic state of N qubits may then be written as a superposition of Dicke states $\{|n, N - n\rangle\}_{n=0}^{N}$, where each $|n, N - n\rangle$ represents the situation in which n and $N - n$ photons are travelling in arm a and b, respectively. In Fig. 2, the estimated phase ϕ is acquired in between the interferometer arms, i.e., by the transformation $\exp(-i\hat{J}_z \varphi)$ [74]. In the qubit picture this corresponds to a single-particle unitary $\exp(-ih\varphi)$ with $h = \sigma_z/2$. Moreover, the unitary balanced beam-splitter transformation of Fig. 2, commonly defined in the modal picture as $\hat{B} := \exp(-i\pi \hat{J}_x/2)$, is then equivalent to a local rotation $\exp(-i\pi\sigma_x/4)$ of each particle in the qubit picture applied after the ϕ-encoding.
Hence, the measurement of Fig. 2 with outcomes labeled by n (the number of photons detected in mode a) corresponds to a POVM with elements
$$\Pi^N_n = \hat{B}^\dagger D^N_n \hat{B}, \tag{30}$$
where $D^N_n := |D^N_n\rangle\langle D^N_n|$ are the projections onto Dicke states, $|D^N_n\rangle := |n, N - n\rangle$. Given a general pure state $\psi_N$ inside the interferometer (see Fig. 2), the state after acquiring the estimated phase reads $\psi_N(\varphi) := e^{-i\hat{J}_z\varphi}\, \psi_N\, e^{i\hat{J}_z\varphi}$. Then, the probability of outcome n given that the unknown parameter was ϕ is just
$$p_{n|\varphi}(\psi_N) = \mathrm{tr}\!\left(\Pi^N_n\, \psi_N(\varphi)\right). \tag{31}$$
Before we proceed, let us note that due to the identity $\hat{B}\, e^{-i\hat{J}_z\varphi}\, \hat{B}^\dagger = e^{i\hat{J}_y\varphi}$ it is possible to effectively map the above measurement scheme to the situation in which the initial state is already propagated through a beam-splitter, $\tilde{\psi}_N = \hat{B}\, \psi_N \hat{B}^\dagger$, yet the parameter is encoded via a Hamiltonian in the y direction (via $\tilde{h} = -\sigma_y/2$). As a result, the measurement (POVM) elements then simplify to just projections onto the Dicke states $D^N_n$ (see Appendix C 4 for details). Having the explicit form of the measurement-outcome probability, we can compute the corresponding (classical) FI (32) by inserting these probabilities into Eq. (2).
The unitary rotation $e^{-i\hat{J}_z\varphi}$ entering the definition of $\psi_N(\varphi)$ is responsible for the strong dependence (see also the numeric results in Section VIII) of the FI on the value of ϕ. However, let us note that when averaging (with respect to the Haar measure) the FI (32) over all bosonic states $\psi_N \in S_N$, any unitary transformation of the state becomes irrelevant. In particular, owing to the parameter being unitarily encoded, $\psi_N(\varphi)$ may then be simply replaced by $\psi_N$, so that the average of Eq. (32) manifestly ceases to depend on ϕ. This observation alone is not sufficient to deduce that the concentration behavior of the FI (32) is independent of the value of the parameter ϕ. However, with the following theorem we show not only this but actually a significantly stronger statement. We prove that the FI given in Eq. (32) evaluated on random symmetric states not only typically attains the Heisenberg scaling for a certain value of ϕ, but typically does so for all values of ϕ at the same time.
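The beam-splitter identity $\hat{B}\,e^{-i\hat{J}_z\varphi}\,\hat{B}^\dagger = e^{i\hat{J}_y\varphi}$ can be verified numerically for any N. The sketch below builds the spin-$N/2$ matrices with the standard ladder-operator convention (an assumption about conventions, with basis states ordered by decreasing $J_z$ eigenvalue) and exponentiates Hermitian generators via their eigendecomposition. The names `spin_operators` and `exp_i` are illustrative.

```python
import numpy as np

def spin_operators(N):
    """Spin-N/2 matrices J_x, J_y, J_z; basis ordered as m = N/2, N/2 - 1, ..., -N/2."""
    s = N / 2
    m = s - np.arange(N + 1)
    jp = np.zeros((N + 1, N + 1), dtype=complex)
    for k in range(1, N + 1):
        # J_+ |s, m> = sqrt(s(s+1) - m(m+1)) |s, m+1>
        jp[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))
    jm = jp.conj().T
    jx = (jp + jm) / 2
    jy = -0.5j * (jp - jm)        # (J_+ - J_-) / (2i)
    jz = np.diag(m).astype(complex)
    return jx, jy, jz

def exp_i(A, theta):
    """exp(-1j * theta * A) for a Hermitian matrix A, via its eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return (v * np.exp(-1j * theta * w)) @ v.conj().T

N, phi = 6, 0.37
jx, jy, jz = spin_operators(N)
B = exp_i(jx, np.pi / 2)                       # balanced beam-splitter
lhs = B @ exp_i(jz, phi) @ B.conj().T          # B exp(-i J_z phi) B^dagger
rhs = exp_i(jy, -phi)                          # exp(+i J_y phi)
print(np.allclose(lhs, rhs))
```

The check relies only on the commutation relation $[\hat{J}_x, \hat{J}_z] = -i\hat{J}_y$, which implies $\hat{B}\hat{J}_z\hat{B}^\dagger = -\hat{J}_y$ for a rotation by π/2 about the x axis.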
Theorem 4 (Pure random symmetric qubit states typically attain the HL for all values of ϕ in the setup of Fig. 2). Let ψ N be a fixed pure state on S N for d = 2 modes and p n|ϕ (U ψ N U † ) the probability to obtain outcome n given that the value of the unknown phase parameter is ϕ and the inter- In other words, the probability that for a random state there exists a value of the parameter ϕ for which F cl (U, ϕ) does not achieve Heisenberg scaling is exponentially small in N . Hence, when dealing with typical two-mode bosonic states one does not have to resort to LU-optimization (similarly, as in Theorem 2) in order to reveal their metrological usefulness even for a fixed measurement. Note that such an optimization would make the problem ϕ-independent. As the interferometric scheme of Fig. 2 is restricted to the symmetric subspace S N , one is allowed to perform only LU operations of the form V ⊗N . However, setting V = exp(−i θ σ z /2) one may then always shift ϕ → ϕ + θ to any desired value.
We provide a detailed proof of Theorem 4 in Appendix D, where we also present its more precise and technical version. One of its constituents, the analysis of the average value of the FI (32), may be found in Appendix C 4, where by rigorously showing that $c_- N^2 \le \mathbb{E}_{U\sim\mu(S_N)} F_{\rm cl}(U,\varphi) \le c_+ N^2$ with constants $0 < c_- < c_+ < \infty$, we prove that the average FI indeed asymptotically follows the HL-like scaling. Although our derivation allows us only to bound the actual asymptotic constant factor, we conjecture that $\lim_{N\to\infty} \mathbb{E}_{U\sim\mu(S_N)} F_{\rm cl}(U,\varphi)/N^2 = 1/6$. In particular, we realise that such behaviour is recovered after replacing the denominators of all terms in the sum of Eq. (32) by their average values and also verify our conjecture numerically in Section VIII below. The fact that random symmetric states typically lead to Heisenberg scaling for all values of ϕ in the simple setup of Fig. 2 has important consequences. If this were not the case, it could be possible that for a typical symmetric state $\psi_N$ there are values of ϕ for which the sensitivity is low. Our stronger result, however, is directly useful for realizable setups: In real interferometry experiments one typically starts the phase estimation protocol by calibrating the device [10,11]. This is done by taking control of ϕ and reconstructing the $p_{n|\varphi}$ empirically from measurements with different known values of ϕ. A tomography of the state $\psi_N$ is then not necessary and an efficient estimator (e.g., max-likelihood [57]) can always be constructed that saturates the CRB (1) after sufficiently many protocol repetitions. Crucially, this implies that one might randomly generate (for instance, following the protocol we present in Section VIII) many copies of some fixed random symmetric state $\psi_N$ and, even without being aware of its exact form, one can still typically construct an estimator that attains the Heisenberg scaling, while sensing small fluctuations of the parameter around any given value of ϕ.

VIII. EFFICIENT GENERATION OF RANDOM SYMMETRIC STATES
We have shown that random symmetric states have very promising properties for quantum sensing scenarios, but have so far not addressed the question of how to efficiently generate such states. In this section, we demonstrate that random symmetric states can be simulated with the help of short random circuits whose outputs indeed yield, on average, the Heisenberg scaling not only of the QFI but also of the FI for the measurement scheme depicted in Fig. 2.
Concretely, we consider random circuits over a set of gates that is universal on the special unitary group of the symmetric subspace and consists of four different gates: three beam splitters and a cross-Kerr non-linearity. A set of gates is said to be universal on a certain unitary group if by taking products of its elements one can obtain arbitrarily good approximations (in trace norm) to any unitary operation in this group. We emphasize that universality in the symmetric subspace $S_N$ is not connected to the notion of universal quantum computation. This follows from the fact that the dimension of $S_N$ scales polynomially (in the case of d = 2 modes linearly) in the number of particles N and consequently this space is not sufficient for universal quantum computation. We first construct a universal set of unitary gates on $S_N$ for d = 2, which is inspired by operations commonly available when dealing with bosonic (optical and atomic) systems. We present our results in the language of two-mode interferometry (see Fig. 2).
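To make the polynomial-versus-exponential gap concrete, the short sketch below compares the dimension $d^N$ of the full space of N distinguishable d-level particles with the dimension $\binom{N+d-1}{d-1}$ of the symmetric subspace, which is the standard count of ways to distribute N bosons over d modes. The function names are ours and purely illustrative.

```python
import math


def dim_full(N, d=2):
    """Dimension of the Hilbert space of N distinguishable d-level particles."""
    return d ** N


def dim_symmetric(N, d=2):
    """Dimension of the symmetric subspace S_N (N bosons in d modes)."""
    return math.comb(N + d - 1, d - 1)


if __name__ == "__main__":
    for N in (2, 10, 50):
        print(N, dim_full(N), dim_symmetric(N))
```

For two modes the symmetric dimension is simply N + 1, in line with the linear scaling mentioned in the text.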
We start with the following set of gates: known to be a "fast" universal gate-set for linear optics [75][76][77][78][79]. The above matrices reflect how the gates act on a single particle (which can be either in mode a or in mode b). The action on $S_N$ is then given by $\hat{V}_j = V_j^{\otimes N}$ for $j \in \{1, 2, 3\}$. We now supplement the above collection by a two-mode gate corresponding to a cross-Kerr nonlinearity (with effective action time t = π/3) [80]. Concretely, we take where $\hat{n}_{a/b}$ are the particle number operators of modes a/b marked in Fig. 2. For the general method for checking if a given gate promotes linear optics to universality in $S_N$ see [81].
In atom optics, large cross-Kerr nonlinearities and phase shifts (XKPS) can be achieved in ultracold two-component Bose gases in the so-called two-mode approximation [82][83][84] (see also [34,85]). In optics, reaching large XKPS is more challenging [86][87][88], but there has recently been spectacular progress in this area, both on weak [89,90] and strong nonlinearities [91][92][93]. From the theoretical perspective, using the methods of geometric control theory [94,95] and ideas from the representation theory of Lie algebras [81] it is possible to prove that the gates $\{\hat{V}_1, \hat{V}_2, \hat{V}_3, \hat{V}_{XK}\}$ are universal for $SU(S_N)$. The gate $\hat{V}_{XK}$ is not the only gate yielding universality when supplemented with gates universal for linear optics. A comprehensive characterization of (non-linear) gates having this property will be presented in [81].
A random circuit of depth K over this gate set is now obtained by picking at random (according to a uniform distribution) K gates from the set $\{\hat{V}_1, \hat{V}_2, \hat{V}_3, \hat{V}_{XK}\}$ and applying them in sequence. We call states generated by applying such a circuit to some fixed symmetric state random circuit states.
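The sampling procedure can be sketched as follows. This is an illustration under loudly stated assumptions rather than the circuit used in the paper: the three single-particle beam-splitter matrices below are placeholders chosen by us (the paper's explicit matrices are not reproduced here), and the cross-Kerr gate is assumed to act as $\exp(-i t\,\hat{n}_a\hat{n}_b)$ with t = π/3, where the sign convention is also our assumption. All function names are hypothetical.

```python
import cmath
import math
import random


def symmetric_rep(v, N):
    """Matrix of v^{(x)N} in the Dicke basis |n, N-n>, n = photons in mode a."""
    M = [[0j] * (N + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        for m in range(N + 1):
            amp = 0j
            for j in range(max(0, m - (N - n)), min(n, m) + 1):
                amp += (math.comb(n, j) * v[0][0] ** j * v[1][0] ** (n - j)
                        * math.comb(N - n, m - j) * v[0][1] ** (m - j)
                        * v[1][1] ** (N - n - m + j))
            norm = math.sqrt(math.factorial(m) * math.factorial(N - m)
                             / (math.factorial(n) * math.factorial(N - n)))
            M[m][n] = norm * amp
    return M


def gate_set(N, t=math.pi / 3):
    """Three placeholder beam splitters plus an assumed cross-Kerr gate."""
    s = 1 / math.sqrt(2)
    singles = [
        [[s, s], [s, -s]],                                  # placeholder
        [[s, -1j * s], [-1j * s, s]],                       # exp(-i*pi*sigma_x/4)
        [[cmath.exp(-1j * math.pi / 8), 0],
         [0, cmath.exp(1j * math.pi / 8)]],                 # placeholder
    ]
    gates = [symmetric_rep(v, N) for v in singles]
    # Assumed cross-Kerr action: diagonal phase exp(-i*t*n_a*n_b).
    kerr = [[cmath.exp(-1j * t * n * (N - n)) if m == n else 0j
             for n in range(N + 1)] for m in range(N + 1)]
    gates.append(kerr)
    return gates


def random_circuit_state(amps, K, rng):
    """Apply K gates drawn uniformly at random to the Dicke amplitudes."""
    N = len(amps) - 1
    gates = gate_set(N)
    state = [complex(c) for c in amps]
    for _ in range(K):
        G = rng.choice(gates)
        state = [sum(G[m][n] * state[n] for n in range(N + 1))
                 for m in range(N + 1)]
    return state


def qfi_jz(state):
    """Pure-state QFI, 4*Var(J_z), for amplitudes in the Dicke basis."""
    N = len(state) - 1
    probs = [abs(c) ** 2 for c in state]
    mean = sum((n - N / 2) * p for n, p in enumerate(probs))
    second = sum((n - N / 2) ** 2 * p for n, p in enumerate(probs))
    return 4 * (second - mean ** 2)
```

A typical use would start from a simple state, for instance all photons in mode a, apply a circuit of depth K ≈ 20, and average `qfi_jz` over many circuit samples to see how the result depends on K.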
Our intuition that the above scheme should generate unitaries distributed approximately according to $\mu(S_N)$ comes from the theory of the so-called ε-approximate unitary t-designs [54,55,96]. There are several essentially equivalent ways to define unitary designs [54]. One of them, introduced in Ref. [53], implies that an ε-approximate unitary t-design $\mu_{\epsilon,t}(H)$ is a distribution over the unitary group SU(H) acting on a Hilbert space H (of dimension |H|) that efficiently approximates the Haar measure µ(H) in a way such that for all balanced monomials f : SU(H) → R of degree t, it holds that

FIG. 3. Random circuit states obtained from the initial states $\sum_n \sqrt{x_n}\,|n, N-n\rangle$ with: $x_0 = 1$ polarised (red), $x_n = \binom{N}{n}/2^N$ balanced (blue), or $x_0 = x_N = 1/2$ N00N (yellow). Fast convergence to the values N(N + 1)/3 and N(N + 1)/6 (black horizontal lines) is evident in all cases. The shaded regions mark "worse than SQL" and "better than HL" precisions. The $F_{\rm cl}$ curves are plotted for ϕ = π/2 (solid) and ϕ = π/3 (dotted). The inset depicts the sufficient circuit depth $K_{\rm suf}$ as a function of N, such that the corresponding sample-averaged QFI (black curves) or FI (red curves) is at most 1% (top curves) or 10% (bottom curves) from its typical value. As $K_{\rm suf}$ grows at most mildly with N, for realistically achievable photon numbers [78], K ≈ 20 may be considered sufficient.
Moreover, if such a function satisfies a concentration inequality of the form given in Eq. (10) with respect to the Haar measure, then an (albeit weaker) concentration also holds with respect to the design [53]. All these statements carry over in a similar form to balanced polynomials. Their corresponding difference may always be bounded by the weighted sum of the differences of their constituting monomials. The iso-spectral QFI, $F(U) = F(U\sigma_N U^\dagger, H)$, introduced in Theorem 2 as a function of unitary rotations is precisely such a balanced polynomial of order two, which can be seen directly from Eq. (7).
For distinguishable qudits there are efficient methods to generate approximate designs by using random circuits over local universal gate-sets on $H_N$ [54,55]. These constructions unfortunately do not immediately carry over to the symmetric subspace $S_N$ of N qubits. However, one can use the fact proven in Ref. [55] (based on results of Ref. [52]) that in any Hilbert space H sufficiently long random circuits over a set of universal gates form an ε-approximate unitary t-design. More precisely, this holds whenever the gates employed in the circuit are non-trivial and have algebraic entries. The set of gates universal in $S_N$ given above satisfies this condition. For this to hold, it would actually be sufficient to replace all three gates $\hat{V}_1, \hat{V}_2, \hat{V}_3$ by a single non-trivial beam splitter [77] and the gate $\hat{V}_{XK}$ by essentially any non-trivial non-linear gate. The latter would not even have to be a reproducibly implementable non-linear operation; its strength could even be allowed to vary from invocation to invocation. However, it is mathematically very difficult to analytically bound the depth K of such circuits that is sufficient to achieve a given ε for a given t.
For this reason we resort to a numerical analysis to verify how rapidly with increasing K the average QFI and the FI of random circuit states converge to the respective averages for random symmetric states. We consider the scenario of Fig. 2. In this two-mode case d = 2 it holds that $|S_N| = N + 1$ and from Eq. (32) we obtain concentration of the QFI around the value $\mathbb{E}F := \mathbb{E}_{U\sim\mu(H)} F(U) = N(N + 1)/3$ [97]. For the FI, we expect to find $\mathbb{E}F_{\rm cl} := \mathbb{E}_{U\sim\mu(H)} F_{\rm cl}(U, \varphi) = N(N + 1)/6$ (for details see the discussion after Theorem 4 and Appendix C 4). In Fig. 3, we show explicitly that indeed random circuit states generated according to our recipe allow one to reach these values already for moderate K.

FIG. 4. (color online) Mean squared error attained by random bosonic states generated by sufficiently deep random circuits: We depict the ultimate limit of the resolution $\nu\,\Delta^2\tilde{\varphi}$ attainable with random circuit states generated by applying a deep random circuit (K = 60) onto the two-mode balanced ($x_n = \binom{N}{n}/2^N$ in Fig. 3) state for both the interferometric measurement of Fig. 2 (red) and the theoretically optimal one yielding the QFI (black). The corresponding sample-averaged QFI and FI quickly concentrate around the typical values $N^2/3$ and $N^2/6$ (dashed lines), respectively. The shaded regions mark the "worse than SQL" and "better than HL" precisions.
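The Haar-average value N(N + 1)/3 of the QFI quoted in the text can be checked with a simple Monte Carlo estimate. The sketch below is ours and makes the following assumptions: Haar-random pure states on the (N + 1)-dimensional two-mode symmetric space are sampled as normalised complex Gaussian vectors, the generator is $\hat{J}_z$ with eigenvalues n − N/2 in the Dicke basis, and the pure-state QFI is 4 Var($\hat{J}_z$). The function names are hypothetical.

```python
import math
import random


def haar_random_state(dim, rng):
    """Haar-random pure state: normalised vector of i.i.d. complex Gaussians."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]


def qfi_jz(state):
    """Pure-state QFI, 4*Var(J_z), for amplitudes in the Dicke basis."""
    N = len(state) - 1
    probs = [abs(c) ** 2 for c in state]
    mean = sum((n - N / 2) * p for n, p in enumerate(probs))
    second = sum((n - N / 2) ** 2 * p for n, p in enumerate(probs))
    return 4 * (second - mean ** 2)


def mean_qfi(N, samples, seed=0):
    """Sample average of the QFI over Haar-random two-mode symmetric states."""
    rng = random.Random(seed)
    total = sum(qfi_jz(haar_random_state(N + 1, rng)) for _ in range(samples))
    return total / samples


if __name__ == "__main__":
    N = 10
    print(mean_qfi(N, 4000), N * (N + 1) / 3)
```

With a few thousand samples the estimate should agree with N(N + 1)/3 to within statistical error, and the same harness can be pointed at random circuit states to study the convergence with K.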
In Fig. 4, we verify further the behavior of the ultimate limits on the attainable precision via the relevant CRB (see Eq. (1)) dictated by the attained values for 1/ F and 1/ F cl . We observe that the ultimate bounds predicted by the average E F and E F cl for random symmetric states are indeed saturated quickly (here, K = 60) and, crucially, both reach the predicted Heisenberg scaling. This demonstrates that it is possible to generate states that share the favorable metrological properties of Haar-random symmetric states via the physical processes of applying randomly selected optical gates.

IX. CONCLUSIONS
In this work we present a systematic study of the usefulness of random states for quantum metrology. We show that random states, sampled according to the Haar measure from the full space of states of distinguishable particles, are typically not useful for quantum enhanced metrology. In stark contrast, we prove that states from the symmetric subspace have many very promising properties for quantum metrology: They typically achieve Heisenberg scaling of the quantum Fisher information and this scaling is robust against particle loss and equally holds for very mixed isospectral states. Moreover, we show that the high quantum Fisher information of such random states can actually be exploited with a single fixed measurement that is implementable with a beam splitter and particle-number detectors. Finally, we also demonstrate that states generated with short random circuits can be used as a resource to achieve a classical Fisher information with the same scaling as the Heisenberg limit. Our results on random symmetric states open up new possibilities for quantum enhanced metrology.
Our work, being a study initiating a new research direction, naturally raises a number of interesting questions. From the physical perspective it would be important to investigate the impact of more realistic noise types, such as local (and correlated) dephasing, depolarisation [56] and particle loss, on the classical Fisher information in the interferometric scenario considered in Section VII, as well as on the quantum Fisher information in general for finite N. Further, it would be interesting to see whether bosonic random states are also useful for multi-parameter sensing problems with non-commuting generators [72]. An important part of quantum metrology research is devoted to infinite-dimensional optical systems with the mean number of particles, corresponding to the power of a light beam, being fixed [3], e.g., in squeezing-enhanced interferometry with strong laser beams of constant power [98]. Here one could ask whether states prepared via random Gaussian transformations [99,100] or random circuits of gates universal for linear optics are typically useful for metrology. Another relevant problem beyond our analysis is the speed of convergence to the approximate designs when considering the performance of the states prepared with random bosonic circuits discussed in Section VIII. It would also be interesting to study further the properties of the ensembles of random states generated from Haar-random pure states and possibly particle loss, of both bosons and fermions. Lastly, a natural question to be asked is whether the typical metrological usefulness of random bosonic states remains valid if one considers general phase estimation scenarios, e.g., single-shot protocols with no prior knowledge assumed about the parameter value [68], for which Bayesian inference methods must be employed to quantify the attainable precision [69,70].

Appendices
Here we give the details that are needed to obtain the results given in the main text. In Section A we discuss concentration of measure on the special unitary group, and give bounds on the Lipschitz constants of the relevant functions on this group. In Section B we prove lower bounds on the QFI that are useful when studying particle losses. In Section C we give bounds for averages of the FI and QFI on the relevant ensembles of density matrices that we consider: isospectral density matrices of distinguishable particles, random symmetric (bosonic) states of identical particles, and random bosonic states that underwent particle loss (Section C 3). In Section D we use the previously derived technical results to prove Theorems 1, 2, 3, and 4 of the main text. In Section E we prove the equivalence of the beam-splitter model of particle losses and the operation of taking partial traces over particles contained in two-mode bosonic systems.

In this section we first present basic concentration of measure inequalities on the special unitary group SU(H). Then, we give bounds on the Lipschitz constants of various functions based on the QFI and FI that appear naturally while studying different statistical ensembles of states on H: isospectral density matrices, partially traced isospectral density matrices, etc.

Concentration of measure on unitary group
We will make extensive use of the concentration of measure phenomenon on the special unitary group SU(H). It will be convenient to use a metric tensor g where we have used $X^\dagger = X$, the identity $UU^\dagger = 1$, and the cyclic property of the trace. The gradient of a smooth function f : SU(H) → R at point U ∈ SU(H) is defined by the condition that has to be satisfied for all X ∈ su(H). be the Lipschitz constant of f. Then, for every ε ≥ 0, the following concentration inequalities hold where D = |H| is the dimension of H.
where $d(U, V)$ is the geodesic distance between unitaries U and V given by $d(U, V) = \inf_\gamma D_\gamma$ with $D_\gamma := \int_{[0,1]} \sqrt{g_{HS}\left(\frac{d\gamma}{dt}, \frac{d\gamma}{dt}\right)}\,dt$, and the infimum is over the (piecewise smooth) curves γ that start at U and end at V.

Remark 1.
Due to the definition of the gradient ∇f (A3) and the structure of the tangent space T U SU (H) for U ∈ SU (H) (see Eq. (A1)) we have where X = X † and tr X = 0. Assume that for C > 0 we can find the bound which is valid for all U ∈ SU (H). Then, from Eq. (A10) we can conclude that C is an upper bound on the Lipschitz constant of f .

Lipschitz constants for the quantum Fisher information for the general Hamiltonian encoding
Recall that for unitary encodings the quantum Fisher information for a state ρ with spectral decomposition $\rho = \sum_i p_i |e_i\rangle\langle e_i|$ reads $F(\rho, H) = 2\sum_{i,j:\,p_i+p_j\neq 0} \frac{(p_i-p_j)^2}{p_i+p_j}\,|\langle e_i|H|e_j\rangle|^2$, where H is the Hamiltonian generating the unitary evolution of mixed states, $\rho(\varphi) = \exp(-i\varphi H)\,\rho\,\exp(i\varphi H)$. The QFI depends on both the state ρ ∈ D(H), and the Hamiltonian H ∈ Herm(H) encoding the phase ϕ. In what follows, without any loss of generality, we assume that tr(H) = 0. We are interested in the behavior of F(ρ, H) when H is fixed and ρ varies over some ensemble of (generally mixed) states. As we want to use concentration inequalities (of the type of Eq. (A7)), our aim here is to give bounds on the Lipschitz constant of the QFI on relevant sets of density matrices. We first study the QFI on the set of isospectral density matrices $\Omega_{(p_1,\ldots,p_D)} := \{\rho \in D(H) : \mathrm{sp}_\uparrow(\rho) = (p_1, \ldots, p_D)\}$, where $\mathrm{sp}_\uparrow(\rho)$ denotes the vector of non-increasingly ordered eigenvalues of ρ. In what follows, for the sake of simplicity we will use the shorthand notation $\Omega := \Omega_{(p_1,\ldots,p_D)}$. Let $F_{\Omega,H}(U) := F(U\rho_0 U^\dagger, H)$, where $\rho_0$ is an arbitrarily chosen state belonging to Ω. Then, we can prove the following lemma.
where 1/D is the maximally mixed state on H.
Remark 2. The quantity d B (ρ, 1/D) depends only on the spectrum of ρ and thus is constant on the set of isospectral density matrices Ω.
Proof. In Ref. [30] the following inequality was proven: where $d_B(\rho, \sigma) = \sqrt{2[1 - F(\rho, \sigma)]}$ is the Bures distance between density matrices with $F(\rho, \sigma) = \mathrm{tr}\sqrt{\sigma^{1/2}\rho\,\sigma^{1/2}}$ denoting the fidelity. Inserting into Eq. (A17), one obtains Dividing the above by |φ| and taking the limit φ → 0, one arrives at Then, Eq. (6) implies that and therefore In addition, we have two upper bounds on the square root of the quantum Fisher information: and where Eq. (A23) follows from the fact that the maximal value of the QFI for the phase encoded via the Hamiltonian H is bounded from above by $4\|H\|^2$ [1], and Eq. (A24) follows from Eq. (A17) for σ = 1/D, for which the QFI trivially vanishes. Combining inequalities Eqs. (A23) and (A24) with Eq. (A22) and using Remark 1, one finally obtains Eq. (A16).

Remark 3.
The upper bound on the Lipschitz constant of F Ω,H given in Eq. (A16) depends explicitly on the spectrum of the considered set of isospectral density matrices. Specifically, the right-hand side of Eq. (A16) decreases as ρ becomes more mixed. For special cases of Haar-random pure states and random depolarized states, see below, we can get better bounds on the Lipschitz constant of the QFI.
Lemma 2. Consider the ensemble of Haar-random depolarized pure states, where ψ stands for the projector onto a Haar-random pure state $|\psi\rangle$ and p ∈ [0, 1]. For fixed p, the states in Eq. (A25) form an ensemble of isospectral density matrices since for any such ρ we have For this particular spectrum the Lipschitz constant $L_p$ (with respect to $g_{HS}$) of the function $F_p := F_{\Omega,H}$, defined by Eq. (A15), is upper bounded by Proof. Let us first note that for ρ given by Eq. (A25), the QFI takes the form [2]: from which it directly follows that for all U ∈ SU(H), and consequently, One can estimate $L_0$ by exploiting the fact that for pure states the QFI is simply $F(\psi_0, H) = 4\{\mathrm{tr}(\psi_0 H^2) - [\mathrm{tr}(\psi_0 H)]^2\}$, which allows one to express $F_0(U)$ as where $V = 4(H^2 \otimes 1 - H \otimes H)$. By virtue of Lemma 6.1 of [38] (see also [37]), the Lipschitz constant of $F_0$ is bounded by $2\|V\| \le 16\|H\|^2$. Combining this with Eq. (A30) yields Eq. (A27).
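The prefactor relating the QFI of a depolarized state to that of the underlying pure state can be checked directly from the spectral formula for the QFI. The following is a sketch under the assumption that the depolarized state reads $\rho_p = (1-p)\,\psi + p\,\mathbb{1}/D$ (our reading of Eq. (A25)); the symbols $\lambda_1$ and $\lambda_\perp$ are introduced here only for illustration.

```latex
\begin{align}
\lambda_1 &= 1 - p + \frac{p}{D}, \qquad
\lambda_\perp = \frac{p}{D} \quad \text{($(D-1)$-fold degenerate)}, \\
F(\rho_p, H)
 &= 2 \sum_{i,j:\, p_i + p_j \neq 0} \frac{(p_i - p_j)^2}{p_i + p_j}\,
    |\langle e_i | H | e_j \rangle|^2
  = 4\, \frac{(\lambda_1 - \lambda_\perp)^2}{\lambda_1 + \lambda_\perp}
    \sum_{j \neq 1} |\langle \psi | H | e_j \rangle|^2 \\
 &= \frac{(1-p)^2}{1 - p + 2p/D}\;
    4\left[ \langle \psi | H^2 | \psi \rangle - \langle \psi | H | \psi \rangle^2 \right]
  = \frac{(1-p)^2}{1 - p + 2p/D}\; F(\psi, H).
\end{align}
```

Only the pairs involving the eigenvector $|\psi\rangle$ contribute, because all other eigenvalues coincide. Under the stated assumption, the same p-dependent factor then multiplies the pure-state Lipschitz constant $L_0$.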
It is also possible to prove the Lipschitz continuity of the optimized version of QFI on Ω, where V ⊂ SU (H) is a compact class of unitary gates on H.
Proof. Let $U, U' \in SU(H)$. Without loss of generality we can assume $F^{V}_{\Omega,H}(U) \ge F^{V}_{\Omega,H}(U')$. Let $V_0 \in V$ be the element such that Consequently, we have the following inequalities where in the last inequality we used Lipschitz continuity of $F_{\Omega,H}$ which is guaranteed by Lemma 1.

Lipschitz constants for the quantum Fisher information with particle losses
We now give bounds on the Lipschitz constant of the QFI in the case of particle losses for bosonic states. Recall that in this setting the Hamiltonian acting on N particles is given by H N = N i=1 h (i) and that the Hilbert space of the system is the totally symmetric space of N particles denoted by S N . Let us define a function Ω ≤ min 1, 2 where P N sym /|S N | is the maximally mixed state on S N and P N sym stands for the projector onto S N .
Proof. We prove Eq. (A37) in an analogous way to Eq. (A16). Let $\rho' = \mathrm{tr}_k(\rho)$ and $\sigma' = \mathrm{tr}_k(\sigma)$ be two states on $S_{N-k}$ obtained by tracing out k particles from ρ and σ, respectively. Applying the inequality Eq. (A17) to ρ′ and σ′ and the Hamiltonian $H = H_{N-k}$, one obtains where the second inequality follows from the fact that the Bures distance does not increase under trace-preserving completely positive maps [66] (for us the relevant TPCP map is the partial trace, $\rho \mapsto \mathrm{tr}_k(\rho)$). We now set where $U \in SU(S_N)$ and $X \in su(S_N)$ and the rest of the proof is exactly the same as that of Lemma 1.

Lipschitz constant of the classical Fisher information
We conclude this section by giving bounds on the Lipschitz constant of the classical Fisher information for the case of isospectral mixed states, a fixed Hamiltonian encoding and a fixed measurement setting. Recall that for the unitary encoding (A13) the classical Fisher information is a function of the state ρ ∈ D(H), the Hamiltonian H, the phase ϕ, and the POVM $\{\Pi_n\}$ used in the phase estimation procedure. Together, these objects define a family of probability distributions $p_{n|\varphi}(\rho(\varphi)) = \mathrm{tr}(\Pi_n\,\rho(\varphi))$, where $\rho(\varphi) = \exp(-i\varphi H)\,\rho\,\exp(i\varphi H)$. The classical Fisher information is then given by where the summation is over the range of indices labeling the outputs of a POVM $\{\Pi_n\}$ (for simplicity we consider POVMs with a finite number of outcomes). Let us fix the Hamiltonian H, the phase ϕ, and the POVM $\{\Pi_n\}$. Let us define a function for some fixed state $\rho_0 \in \Omega$.

Proof. The strategy of the proof is analogous to the one presented in the other lemmas in this section. The idea is to find a bound for in terms of the Hilbert-Schmidt norm of X. Let us first assume that at U ∈ SU(H) for all n Under the above condition we have where $\rho_U(\varphi) = \exp(-i\varphi H)\,U\rho_0 U^\dagger \exp(i\varphi H)$. Let us introduce the auxiliary notation Clearly, we have the inequality In order to bound A and B (from above) we observe that for any state ρ ∈ D(H) we have In order to prove (A50) we first upper bound $|\mathrm{tr}(H\Pi_n\rho)|$, where in (A54) we have used the nonnegativity of operators $\Pi_n$ and ρ. In (A55) we have repeatedly used the Cauchy-Schwarz inequality, first for P = √ρ H √Π_n, Q = √M √ρ and then for P = √ρ H², Q = M √ρ. The final inequality (A56) follows immediately from operator inequalities Using analogous reasoning it is possible to prove $|\mathrm{tr}(H\Pi_n\rho)| \le \mathrm{tr}(\rho\,\Pi_n)\,\|H\|$. This finishes the proof of (A50). Using essentially the same methodology it is possible to prove the inequalities (A52) and (A51). By plugging inequalities (A50), (A52) and (A51) into Eq. (A49) for $\rho = \rho_U(\varphi)$ and using the normalization condition $\sum_n \mathrm{tr}(\Pi_n\,\rho_U(\varphi)) = 1$, we obtain

Appendix B: Lower bounds on the QFI

Lemma 6. Let $\rho_\varphi$ be a one-parameter family of states on a Hilbert space H. Then, the following lower bound for the QFI holds (see also [101]) $F(\rho_\varphi) \ge \|\partial_\varphi \rho_\varphi\|_1^2$. In particular for $\rho_\varphi = \exp(-iH\varphi)\,\rho\,\exp(iH\varphi)$ we have $F(\rho, H) \ge \|[H, \rho]\|_1^2$ (B2). Recall that the right-hand side of (B2), $\|[H, \rho]\|_1^2$, equals the measure of asymmetry introduced in [73].

Proof. We give the proof only in the Hamiltonian case (B2). The proof of the general case is analogous. Recall that the quantum Fisher information is related to the Bures distance At the same time, from the Fuchs-van de Graaf inequalities [102] we know that with the second inequality stemming from the fact that F(ρ, σ) ≤ 1. By combining Eqs. (B3) and (B4) we obtain Dividing it by |δϕ| and then taking the limit δϕ → 0, we get which, by virtue of the fact that where $|\phi\rangle = |\tilde{\phi}\rangle / \sqrt{\langle\psi|H^2|\psi\rangle}$. The matrix under the trace norm is manifestly anti-Hermitian and of rank two, and so it is straightforward to compute its norm. To do this let us write $|\phi\rangle = \alpha|\psi\rangle + \beta|\psi_\perp\rangle$, where $|\alpha|^2 + |\beta|^2 = 1$ and $|\psi_\perp\rangle$ is some normalized vector orthogonal to $|\psi\rangle$. It also follows that α ∈ R because $\alpha = \langle\psi|\phi\rangle = \langle\psi|H|\psi\rangle / \sqrt{\langle\psi|H^2|\psi\rangle}$. All this implies and consequently, the eigenvalues of the above matrix are ±i|β|. Thus, its trace norm amounts to 2|β|, giving

In this section we compute and/or bound averages of the FI or QFI on the ensembles of mixed quantum states appearing in the main text.

Averages of QFI on ensembles of isospectral density matrices
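We will extensively use the following result concerning the integration on the special unitary group. Fact 3. For any operator V ∈ Herm(H ⊗ H), the following equality holds [67], $\int_{SU(H)} d\mu(U)\; U\otimes U\; V\; U^\dagger\otimes U^\dagger = \alpha\,P_{\rm sym} + \beta\,P_{\rm asym}$, where $P_{\rm sym}$ and $P_{\rm asym}$ are projectors onto the symmetric and antisymmetric subspaces of H ⊗ H and can be expressed as $P_{\rm sym} = \frac{1}{2}(\mathbb{1}\otimes\mathbb{1} + S)$ and $P_{\rm asym} = \frac{1}{2}(\mathbb{1}\otimes\mathbb{1} - S)$, with S being the so-called swap operator satisfying $S|x\rangle|y\rangle = |y\rangle|x\rangle$ for any pair $|x\rangle, |y\rangle \in H$. Finally, the real coefficients α and β are given by $\alpha = \mathrm{tr}(V P_{\rm sym})/\mathrm{tr}(P_{\rm sym})$ and $\beta = \mathrm{tr}(V P_{\rm asym})/\mathrm{tr}(P_{\rm asym})$. We first consider the case in which both the Hilbert space H and the Hamiltonian H are fully general.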
Lemma 7. Let F Ω,H be defined as in Eq. (A15). then, the following equality holds Proof. We have the following sequence of equalities where the third equality follows from Fact 3 for V = H ⊗ H, and the real numbers α H and β H are given by To obtain (C8) we also used the fact that tr (H) = 0. Inserting then Eq. (C8) to Eq. (C7) and using the identities tr (|e i e j | ⊗ |e j e i |) = 0, tr (|e i e j | ⊗ |e j e i |S) = 1, one arrives at F Ω,H (U ) = tr H 2 2 which, by virtue of the definitions of D ± , leads us to Eq. (C4).
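The averaging step in the proof can be traced explicitly. The sketch below is our own working under two assumptions: the twirling identity for $U\otimes U$ (the Haar integral of $U\otimes U\,V\,U^\dagger\otimes U^\dagger$ is a combination of the symmetric and antisymmetric projectors, with weights given by the normalised overlaps of V with them), and the spectral form of the QFI with a traceless Hamiltonian. The final expression is our reading of the result and may be arranged differently from Eq. (C4).

```latex
\begin{align}
\alpha_H &= \frac{\operatorname{tr}[(H\otimes H)P_{\rm sym}]}{\operatorname{tr}P_{\rm sym}}
          = \frac{\operatorname{tr}(H^2)}{D(D+1)}, \qquad
\beta_H  = \frac{\operatorname{tr}[(H\otimes H)P_{\rm asym}]}{\operatorname{tr}P_{\rm asym}}
          = -\frac{\operatorname{tr}(H^2)}{D(D-1)}, \\
\mathbb{E}_{U}\,\bigl|\langle e_i|U^\dagger H U|e_j\rangle\bigr|^2
         &= \frac{\alpha_H - \beta_H}{2}
          = \frac{\operatorname{tr}(H^2)}{D^2-1} \qquad (i \neq j), \\
\mathbb{E}_{U}\,F_{\Omega,H}(U)
         &= \frac{2\operatorname{tr}(H^2)}{D^2-1}
            \sum_{i,j:\,p_i+p_j\neq 0}\frac{(p_i-p_j)^2}{p_i+p_j}.
\end{align}
```

The first line uses $\operatorname{tr}[(H\otimes H)S] = \operatorname{tr}(H^2)$ and $\operatorname{tr}(H) = 0$. The second uses the two trace identities quoted in the proof, namely that the operator $|e_i\rangle\langle e_j|\otimes|e_j\rangle\langle e_i|$ has vanishing trace and unit trace against S. For a pure state the spectral sum equals 2(D − 1), which gives $4\operatorname{tr}(H^2)/(D+1)$.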
The formula (C4) simplifies significantly for the case of pure states.

Remark 7.
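Let $\Omega_0$ consist of pure states on H. In this case it is fairly easy to see that $\sum_{i,j:\,p_i+p_j\neq 0} \frac{(p_i-p_j)^2}{p_i+p_j} = 2(D-1)$ and consequently, $\mathbb{E}_{U\sim\mu(H)} F_{\Omega_0,H}(U) = \frac{4\,\mathrm{tr}(H^2)}{D+1}$ (C12). By comparing Eqs. (C12) and (C4) one finds that the average QFI over any ensemble Ω of isospectral states can be easily related to the average QFI over pure states. Specifically, one has $\mathbb{E}_{U\sim\mu(H)} F_{\Omega,H}(U) = \Lambda(\{p_j\}_j)\;\mathbb{E}_{U\sim\mu(H)} F_{\Omega_0,H}(U)$, where $\Lambda(\{p_j\}_j) := \frac{1}{2(D-1)}\sum_{i,j:\,p_i+p_j\neq 0} \frac{(p_i-p_j)^2}{p_i+p_j}$ (C14). Note that $\Lambda(\{p_j\}_j) = 1$ for pure states and $\Lambda(\{p_j\}_j) = 0$ for the maximally mixed state. Since in general the dependence on the spectrum in the above formula is quite complicated it is desirable to have simple bounds on $\sum_{i,j:\,p_i+p_j\neq 0} \frac{(p_i-p_j)^2}{p_i+p_j}$. The following fact provides one such bound: $\sum_{i,j:\,p_i+p_j\neq 0} \frac{(p_i-p_j)^2}{p_i+p_j} \ge 2\Big[D - \Big(\sum_i \sqrt{p_i}\Big)^2\Big] = 2D\,[1 - F(\rho, 1/D)^2]$ (C15). Proof. First, by using the identity $(p_i - p_j)^2 = (p_i + p_j)^2 - 4p_ip_j$ the left-hand side of the inequality (C15) can be rewritten as $\sum_{i\neq j}(p_i + p_j) - 4\sum_{i\neq j:\,p_i+p_j\neq 0}\frac{p_ip_j}{p_i+p_j}$, which, noting that the first sum in the above amounts to 2(D − 1), can be rewritten as $2(D-1) - 4\sum_{i\neq j:\,p_i+p_j\neq 0}\frac{p_ip_j}{p_i+p_j}$ (C17). To obtain the inequality in Eq. (C15) we apply the following well-known relation between the harmonic and geometric means, $\frac{2p_ip_j}{p_i+p_j} \le \sqrt{p_ip_j}$, to Eq. (C17), together with $\sum_{i\neq j}\sqrt{p_ip_j} = \big(\sum_i\sqrt{p_i}\big)^2 - 1$. Then, to obtain the equality in (C15) and complete the proof it suffices to notice that $\sum_i \sqrt{p_i} = \sqrt{D}\,F(\rho, 1/D)$. Remark 8. Note that the bound (C15) is tight. To be more precise, it is saturated for the maximally mixed state ρ = 1/D, for which both sides of the inequality (C15) simply vanish, and for pure states for which they amount to 2(D − 1).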
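The spectral quantities discussed here are easy to evaluate for a concrete spectrum. The sketch below is illustrative only: it computes the sum $\sum_{i,j:\,p_i+p_j\neq 0}(p_i-p_j)^2/(p_i+p_j)$, the purity factor Λ normalised by the pure-state value 2(D − 1), and the lower bound $2[D - (\sum_i\sqrt{p_i})^2]$ that follows from the harmonic-geometric mean step of the proof. The explicit form of the bound is our reading of (C15), and the function names are hypothetical.

```python
import math


def spectral_sum(p, tol=1e-15):
    """Sum over i, j with p_i + p_j != 0 of (p_i - p_j)^2 / (p_i + p_j)."""
    return sum((pi - pj) ** 2 / (pi + pj)
               for pi in p for pj in p if pi + pj > tol)


def purity_factor(p):
    """Lambda: the spectral sum normalised by its pure-state value 2(D - 1)."""
    return spectral_sum(p) / (2 * (len(p) - 1))


def lower_bound(p):
    """Assumed form of the bound: 2 * [D - (sum_i sqrt(p_i))^2]."""
    return 2 * (len(p) - sum(math.sqrt(x) for x in p) ** 2)


if __name__ == "__main__":
    for spectrum in ([1.0, 0.0, 0.0, 0.0], [0.25] * 4, [0.5, 0.3, 0.2]):
        print(spectrum, spectral_sum(spectrum), lower_bound(spectrum),
              purity_factor(spectrum))
```

The two extreme spectra illustrate the tightness noted in Remark 8: both quantities equal 2(D − 1) for a pure state and vanish for the maximally mixed state.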

Averages of QFI for N particles
We now discuss the average behaviour of the QFI for ensembles consisting of states of distinguishable or bosonic particles (in the case when all particles evolve in the same manner under a local Hamiltonian). For the case of N distinguishable particles we have $H_N = (\mathbb{C}^d)^{\otimes N}$, where $\mathbb{C}^d$ is the Hilbert space of a single particle and N is the number of particles. Clearly, we have $D = |H_N| = d^N$. The Hilbert space of N bosons in d modes is the completely symmetric subspace of $H_N$, denoted by $S_N$, of dimension $|S_N| = \binom{N+d-1}{N}$. It will be convenient for us to use the orthonormal basis of $S_N$ consisting of generalized Dicke states [103] (we will use them extensively also in the part of the Appendix where we estimate the impact of particle losses on typical properties of the QFI). Within the second quantization picture $S_N$ can be treated as a subspace of the d-mode bosonic Fock space and the generalized Dicke states are of the form $|\vec{k}, N\rangle = \prod_{i=1}^{d} \frac{(a_i^\dagger)^{k_i}}{\sqrt{k_i!}}\,|\Omega\rangle$, where $|\Omega\rangle$ is the Fock vacuum, $a_i^\dagger$ are the standard creation operators and the vector $\vec{k} = (k_1, k_2, \ldots, k_d)$ consists of non-negative integers counting how many particles occupy each mode. Due to the fact that the number of particles is N, the vector $\vec{k}$ satisfies the normalization condition $|\vec{k}| := \sum_{i=1}^{d} k_i = N$. Let us also notice that in the particle picture the Dicke states are given by $|\vec{k}, N\rangle = \sqrt{\frac{N!}{k_1!\cdots k_d!}}\;P^N_{\rm sym}\,|1\rangle^{\otimes k_1}\otimes\cdots\otimes|d\rangle^{\otimes k_d}$, and by $P^N_{\rm sym}$ we denote the orthogonal projector onto $S_N \subset H_N$. The Hamiltonian used in the phase estimation is local and symmetric under exchange of particles, $H_N = \sum_{i=1}^{N} h^{(i)}$ (C26), where h stands for the single-particle local Hamiltonian. In what follows we assume for simplicity that tr(h) = 0. Note that the Hamiltonian $H_N$ preserves the subspace $S_N$. Now, it follows from Eqs. (C4) and (C12) that the average behaviour of the QFI on states supported on the subspace W ⊂ H is dictated by the value of $\mathrm{tr}_W(H_N^2)$. In the following lemma we compute the latter in the cases $W = H_N$ and $W = S_N$.
Lemma 8. Let Hamiltonian H be given by (C26). Then, the following relations and E U ∼µ(S N ) are true. For pure qubits (C28) simplifies to Remark 9. The qualitative meaning of the above lemma is twofold. First, it shows that for uniformly distributed isospectral states from H N the scaling of the QFI on average is at most linear in the number of particles N , and, secondly, it proves that for random pure symmetric states the average QFI scales quadratically with N , both for fixed local Hamiltonian h and local dimension d. Thus, for symmetric states the average QFI attains the Heisenberg limit. This behaviour still holds for random isospectral density matrices, provided their spectrum is sufficiently pure with the "degree of purity" quantified by Λ({p j } j ) defined by (C14).
Proof. We start from the proof of (C27). Using the fact that the local Hamiltonian h is traceless, one obtains $\mathrm{tr}(H_N^2) = N d^{N-1}\,\mathrm{tr}(h^2)$. Inserting the above to (C4) (note that here $H = H_N$), we arrive at (C27). The proof of (C28) is more involved as it requires the computation of $\mathrm{tr}_{S_N}(H_N^2)$. The final result reads $\mathrm{tr}_{S_N}(H_N^2) = |S_N|\,\frac{N(N+d)}{d(d+1)}\,\mathrm{tr}(h^2)$ (C31), which when plugged into (C4) yields (C28).
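To determine explicitly (C31) let us choose the basis $\{|i\rangle\}_{i=1}^{d}$ of the single-particle space as the eigenbasis of the local Hamiltonian h. Thus we have $h|i\rangle = \lambda_i|i\rangle$ for i = 1, . . . , d. The corresponding generalized Dicke states (see (C23)) satisfy $H_N|\vec{k}, N\rangle = \langle\vec{\lambda}, \vec{k}\rangle\,|\vec{k}, N\rangle$ (C32), where $\vec{\lambda} = (\lambda_1, \ldots, \lambda_d)$ is the vector of eigenvalues of h and $\langle\cdot,\cdot\rangle$ is the standard inner product in $\mathbb{R}^d$. Now, Eq. (C32) together with the fact that the generalized Dicke states form a basis of $S_N$ allow us to write $\mathrm{tr}_{S_N}(H_N^2) = \sum_{\vec{k}:|\vec{k}|=N} \langle\vec{\lambda}, \vec{k}\rangle^2 = \sum_{\vec{k}:|\vec{k}|=N}\Big(\sum_i \lambda_i^2 k_i^2 + \sum_{i\neq j}\lambda_i\lambda_j k_i k_j\Big)$ (C33), where to obtain the second equality we explicitly squared all scalar products appearing under the sum. From the symmetry we have $\sum_{\vec{k}:|\vec{k}|=N} k_i^2 = \sum_{\vec{k}:|\vec{k}|=N} k_{i'}^2$ and $\sum_{\vec{k}:|\vec{k}|=N} k_i k_j = \sum_{\vec{k}:|\vec{k}|=N} k_{i'} k_{j'}$ (C35) for all i, i′ and for all pairs of different indices (i, j) and (i′, j′). As a result Eq. (C33) simplifies to $\mathrm{tr}_{S_N}(H_N^2) = \Big(\sum_i \lambda_i^2\Big)\sum_{\vec{k}:|\vec{k}|=N} k_1^2 + \Big(\sum_{i\neq j}\lambda_i\lambda_j\Big)\sum_{\vec{k}:|\vec{k}|=N} k_1 k_2$ (C36). The fact that h is traceless yields $\sum_{i\neq j}\lambda_i\lambda_j = -\sum_i \lambda_i^2 = -\mathrm{tr}(h^2)$ (C37). Moreover, due to the condition $k_1 + \ldots + k_d = N$ we have $\sum_{\vec{k}:|\vec{k}|=N}(k_1 + \ldots + k_d)^2 = N^2|S_N|$. By exploiting the identities (C35) the left-hand side of the above equation can be rewritten as $d\sum_{\vec{k}:|\vec{k}|=N} k_1^2 + d(d-1)\sum_{\vec{k}:|\vec{k}|=N} k_1 k_2$. As a result, one obtains $\sum_{\vec{k}:|\vec{k}|=N} k_1 k_2 = \frac{N^2|S_N| - d\sum_{\vec{k}:|\vec{k}|=N} k_1^2}{d(d-1)}$ (C40). Using Eqs. (C36), (C37) and (C40) we finally arrive at $\mathrm{tr}_{S_N}(H_N^2) = \frac{\mathrm{tr}(h^2)}{d(d-1)}\Big(d^2\sum_{\vec{k}:|\vec{k}|=N} k_1^2 - N^2|S_N|\Big)$ (C41). We compute the sum $\sum_{\vec{k}:|\vec{k}|=N} k_1^2$ by noting that $\sum_{\vec{k}:|\vec{k}|=N} k_1^2 = \sum_{i=0}^{N} i^2\;\#\{\vec{k}\,|\,|\vec{k}| = N,\ k_1 = i\} = \sum_{i=0}^{N} i^2\binom{N-i+d-2}{d-2}$, where #(·) denotes the number of elements of a discrete set. The above equation follows from the fact that the number of elements of the set $\{\vec{k}\,|\,|\vec{k}| = N,\ k_1 = i\}$ is the same as the dimension of the Hilbert space of N − i bosons in d − 1 modes. Consequently, we get $\sum_{\vec{k}:|\vec{k}|=N} k_1^2 = |S_N|\,\frac{N(2N+d-1)}{d(d+1)}$ (C43). Inserting the above expression to (C41) yields (C31). The equality (C43) can be proven using standard combinatorial identities. Below we sketch its proof for completeness. First, by virtue of the diagonal sum property of binomial coefficients [104] we have that where a is an arbitrary integer. Inserting (C44) (with a = d = 2) to the left-hand side of (C43) we get The sum $\sum_{i=0}^{N-k} i^2$ is a polynomial of degree 3 in k and can be easily computed. Therefore, in order to finish the computation it suffices to know the moments for the powers j = 1, 2, 3. These can be found for instance on page 5 of [105].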
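The combinatorial result can be cross-checked by brute force for small systems. The sketch below enumerates all occupation vectors with $k_1 + \ldots + k_d = N$, sums the squared eigenvalues $(\sum_i \lambda_i k_i)^2$ of $H_N$ on the generalized Dicke states, and compares the result with the closed form $|S_N|\,N(N+d)\,\mathrm{tr}(h^2)/[d(d+1)]$. That closed form is our reading of (C31) for traceless h and should be treated as an assumption being tested; the function names are hypothetical.

```python
import math


def compositions(N, d):
    """Yield all tuples of d non-negative integers summing to N."""
    if d == 1:
        yield (N,)
        return
    for k in range(N + 1):
        for rest in compositions(N - k, d - 1):
            yield (k,) + rest


def trace_h2_symmetric(lams, N):
    """Brute-force trace of H_N^2 over S_N for single-particle eigenvalues lams."""
    return sum(sum(l * k for l, k in zip(lams, kvec)) ** 2
               for kvec in compositions(N, len(lams)))


def closed_form(lams, N):
    """Assumed closed form, valid when sum(lams) == 0 (traceless h)."""
    d = len(lams)
    dim = math.comb(N + d - 1, d - 1)
    return sum(l * l for l in lams) * dim * N * (N + d) / (d * (d + 1))


if __name__ == "__main__":
    N = 10
    tr = trace_h2_symmetric((0.5, -0.5), N)
    # Haar-average pure-state QFI, 4*tr/(|S_N| + 1), versus N(N + 1)/3.
    print(tr, 4 * tr / (N + 2), N * (N + 1) / 3)
```

For qubits with h = σ_z/2 the trace equals N(N + 1)(N + 2)/12, so that the pure-state average $4\,\mathrm{tr}_{S_N}(H_N^2)/(|S_N| + 1)$ reduces to the value N(N + 1)/3 quoted in the main text.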
Remark 10. The most demanding part of the proof of Lemma 8 was the computation of $\mathrm{tr}_{S_N} H^2$, which can be simplified greatly by the use of group-theoretic methods. This should allow one to perform an analogous analysis for other irreducible representations of the group $SU(d)$, for instance, for the fermionic subspace of $\mathcal{H}_N$.

Average QFI for bosons with particle losses
Let $\rho' = \mathrm{tr}_k(\rho)$ be a mixed symmetric state of $N-k$ particles arising from tracing out $k$ particles of some $N$-partite state $\rho \in \mathcal{D}(S_N)$. Our aim in this section is to bound the average of the QFI over mixed states created in this way, where $\rho$ is a random isospectral state acting on $S_N$. Recall that we are interested in the standard context of quantum metrology, i.e., the Hamiltonian $H$ encoding the phase $\varphi$ is given by Eq. (C26).
Lemma 9. Let $\rho \in \mathcal{D}(S_N)$ be a state of $N$ bosons with $d$-dimensional single-particle Hilbert space $\mathcal{H}_l$ and spectrum $p_1,\dots,p_{|S_N|}$. Let us fix the local Hamiltonian $h$ and a non-negative integer $k$. Then, the following inequality holds .
Proof. Denoting $\sigma_U = \mathrm{tr}_k(U\rho U^\dagger)$, we notice that the inequality (B9) allows one to lower-bound the QFI as where, due to the fact that $\sigma_U$ is symmetric, the trace is taken over the symmetric subspace $S_{N-k}$. For the same reason we can restrict the Hamiltonian to the symmetric subspace, on which it acts as
$$H_{N-k} = \sum_{\vec{n}}\lambda^{(N-k)}_{\vec{n}}\,|\vec{n},N-k\rangle\langle\vec{n},N-k|, \tag{C50}$$
where, as before, $|\vec{n},N-k\rangle$ are $(N-k)$-partite generalized Dicke states, $\vec{n} = (n_0,\dots,n_{d-1})$ is a vector of non-negative integers such that $n_0+\dots+n_{d-1} = N-k$, and $\lambda^{(N-k)}_{\vec{n}}$ are the eigenvalues of $H_{N-k}$. By abuse of notation, in what follows we denote both the Hamiltonian and its symmetric part (C50) by $H_{N-k}$.
Using the swap operator $\mathbb{S}$ introduced in Fact 3 for $\mathcal{H} = S_{N-k}$ and the fact that $\mathrm{tr}(\mathbb{S}\,A\otimes B) = \mathrm{tr}(AB)$ holds for any pair of operators $A$, $B$ acting on $S_{N-k}$, we can rewrite Eq. (C48) as where, to obtain the second line, we used the fact that $\sigma_U$ acts on $S_{N-k}$ and that $\mathbb{S}^2_{N-k} = P^{N-k}_{\mathrm{sym}}\otimes P^{N-k}_{\mathrm{sym}}$, and, for simplicity, we dropped the subscript $S_{N-k}\otimes S_{N-k}$ in the trace.
Exploiting the fact that the symmetric projector $P^N_{\mathrm{sym}}$ is diagonal in the Dicke basis, that is, the representation of the Hamiltonian in Eq. (C48) and the definition of the swap operator, one arrives at the following formula which, when plugged into Eq. (C51), gives
We are now ready to lower bound the average $\mathbb{E}_{U\sim\mu(S_N)} F(\sigma_U, H_{N-k})$. Using the fact that $\sigma_U = \mathrm{tr}_k(U\rho U^\dagger)$ and that $U\rho U^\dagger$ is symmetric, we obtain from inequality (C55) that where now the trace is performed over $S_N\otimes S_N$. Let us focus for a moment on the state It follows from Fact 3 (for $\mathcal{H} = S_N$) that after performing the integration the above state assumes the following form For completeness, let us recall that $P_{\mathrm{sym}\wedge\mathrm{sym}}$ and $P_{\mathrm{as}\wedge\mathrm{as}}$ are the projectors onto the symmetric and antisymmetric subspaces of $S_N\otimes S_N$, respectively, and are given by Moreover, the real coefficients $\alpha$ and $\beta$ are explicitly given by where $D_\pm(S_N) = |S_N|(|S_N|\pm 1)/2$. Plugging Eq. (C58) into Eq. (C56) and using Eq. (C59) one arrives at The right-hand side of this inequality can be significantly simplified if one observes that the first trace under the curly brackets is nonzero only if $\vec{m} = \vec{n}$, giving
Our aim now is to compute the remaining trace, which for further purposes we denote $T_{\vec{m},\vec{n}}$. We use the fact that the projector $P^k_{\mathrm{sym}}$ can be written as in Eq. (C53), which together with the following identity allows us to express $T_{\vec{m},\vec{n}}$ as This gives where we used the explicit expressions for $\alpha$ and $\beta$ and denoted (C67)
We now compute each sum separately. To this end, let us first notice that it follows from Eq. (C63) that where to get the second equality we used Eq. (C53), while to obtain the third one we used the fact that the partial trace of $P^N_{\mathrm{sym}}$ over $N-k$ subsystems is given by With the aid of formula Eq. (C68) we can write $L_{N,k}$ as (C72) Then, exploiting formulas Eq. (C53) and Eq. (C63) and the form of the Hamiltonian $H_{N-k}$, this further rewrites as where the second equality stems from Eq. (C71).
To compute $L_{N,k}$ we follow more or less the same strategy. First, using Eqs. (C63) and (C50) we can rewrite it as Then, we use the fact that in the full Hilbert space $(\mathbb{C}^d)^{\otimes(N-k)}$, $H_{N-k}$ assumes the form given in Eq. (C26), which gives where the second line follows from Eq. (C71). To compute the remaining trace we expand $h$ in its eigenbasis as $h = \sum_{n=0}^{d-1}\xi_n|n\rangle\langle n|$ (where $\xi_n$ are the eigenvalues of $h$), which can also be written using the "mode representation" as where $\vec{n} = (n_0,\dots,n_{d-1})$ is now a $d$-dimensional vector whose components are such that $n_i \in \{0,1\}$ and $n_0+\dots+n_{d-1} = 1$. In this representation a number $n = i \in \{0,\dots,d-1\}$ is represented by a vector $\vec{n}$ whose $i$th component is $n_i = 1$ and whose remaining components are zero. Using Eq. (C63) one obtains where the summation is taken over the vectors $\vec{n}$ specified above (there are $d$ such vectors). The second equality straightforwardly stems from the fact that $\binom{1}{\vec{n}} = 1$. We then exploit the fact that $\binom{k+1}{\vec{n}+\vec{o}} = \frac{k+1}{o_n+1}\binom{k}{\vec{o}}$ and the assumption that $\mathrm{tr}\,h = 0$ to get where, to recall, $\lambda^{(N-k)}_{\vec{n}}$ are the eigenvalues of $H_{N-k}$. Plugging Eqs. (C73) and (C81) into Eq. (C56), one eventually finds that the average QFI is lower-bounded as
Remark 11. It is worth mentioning that, using similar techniques, one can also provide an upper bound on the average QFI for bosons in the case of particle losses. To be more precise, in what follows we derive such a bound for multi-qubit states. As the QFI is upper bounded by the variance, one has Using then the fact that the right-hand side can be rewritten as one obtains With the aid of Eqs. (C71) and (C31) we eventually get Notice that for $k = 0$ this bound gives $N(N+2)/3$, which differs from the exact value for qubits by a term linear in $N$. In general, however, this bound is not very informative because even for significant particle losses, e.g. $k = \eta N$ with $0 < \eta < 1$, the right-hand side of Eq. (C86) scales quadratically with $N$.

Average FI of random two-mode bosonic states in the interferometric setup
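In this part we study the interferometric setup introduced in Section VII and depicted in Fig. 2. Recall that the classical Fisher information (FI) associated with such a measurement scheme is given by
$$F_{cl}\big(\{p_{n|\varphi}\}\big) = \sum_{n=0}^{N}\frac{1}{p_{n|\varphi}}\left(\frac{d p_{n|\varphi}}{d\varphi}\right)^{2}. \tag{C87}$$
As in Theorem 4 of Section VII, after fixing $\psi_N$ to be a particular pure state on $S_N$, we may then define
$$F_{cl}(U,\varphi) := F_{cl}\big(\{p_{n|\varphi}(U\psi_N U^\dagger)\}\big),$$
where $U \in SU(S_N)$ and $\varphi \in [0, 2\pi]$.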
Lemma 10. Let $F_{cl}(U,\varphi)$ be defined as above. Then, the following inequalities hold
$$c_- N^2 \le \mathbb{E}_{U\sim\mu(S_N)} F_{cl}(U,\varphi) \le c_+ N^2,$$
where $c_- \approx 0.0244$ and $c_+ \approx 0.270$.
Proof. The main difficulty in the proof comes from the fact that $F_{cl}(U,\varphi)$ is a complicated, non-linear function of $U$. Let us first note that by using the relation $\hat{B}e^{-i\hat{J}_z\varphi}\hat{B}^\dagger = e^{i\hat{J}_y\varphi}$ it is possible to rewrite the FI in Eq. (C87) as where $\tilde{\psi}(\varphi) = \exp(i\varphi\hat{J}_y)\,\psi\,\exp(-i\varphi\hat{J}_y)$. Let us introduce the auxiliary notation Using the above formulas we obtain the compact expression for $F_{cl}(U,\varphi)$, In what follows we will make use of the inequality which follows directly from (A50) applied to the considered setting. In order to obtain bounds on the average $\mathbb{E}_{U\sim\mu(S_N)} F_{cl}(U,\varphi)$ we will use the following subsets of $SU(S_N)$, where $n = 0,\dots,N$ and $\alpha \in [0,1]$. Because of the unitary invariance of the Haar measure and the fact that the projectors $D^N_n$ have rank one, the distribution of the random variable $g_n(U,\varphi)$ is identical to the distribution of the random variable $X(V) = \mathrm{tr}(\psi V\psi V^\dagger)$, where $V$ is a Haar-distributed unitary on $\mathbb{C}^{N+1}$ and $\psi$ is a pure state on this Hilbert space. The distribution of $X(V)$ is known (see for instance equation (9) in [106]) and is given by
$$p(X) = N(1-X)^{N-1}, \qquad X \in [0,1]. \tag{C98}$$
Lower bound. Let us first derive the lower bound for the average of the FI. Consider first the average of a single term in the sum (C94). For $\alpha > 0$ we have the following chain of (in)equalities In the above sequence of (in)equalities, (C99) follows from the definitions of the sets $G^n_{\pm,\alpha}$, and (C101) follows from the nonnegativity of $g_n(U,\varphi) - \alpha$ on $G^n_{+,\alpha}$ and from (C95). Equation (C102) follows from the definition of the random variable $X$ presented in the discussion above (C98). Finally, equation (C103) follows directly from (C98).
Summing up over $n$ we obtain the inequality Using integration techniques analogous to the ones used in the preceding sections, it is possible to show that Making use of the fact that $\mathrm{tr}(\hat{J}_y D^N_n) = 0$ we obtain In the last equality of (C106) we have used (C31) and the fact that $\hat{J}_y$ originates from a single-particle Hamiltonian satisfying $\mathrm{tr}\,h^2 = \frac{1}{2}$. Plugging (C106) into (C104), we obtain that for all $\alpha > 0$ By setting $\alpha = \Delta/N$, where $\Delta$ is a fixed positive parameter, and by using the inequality Finding the maximal value of the right-hand side of (C108) (treated as a function of $\Delta$) is difficult. Numerical investigation shows that the maximal value is obtained very close to $\Delta = 6$, which finally gives
$$\mathbb{E}_{U\sim\mu(S_N)} F_{cl}(U,\varphi) \ge c_- N^2,$$
where $c_- = \frac{1}{36} - \frac{4}{3e^{6}} \approx 0.0244$.
Upper bound. The proof of the upper bound on the average Fisher information is analogous. For $\alpha > 0$ we have the following chain of (in)equalities In the above sequence of (in)equalities, (C110) follows from the definitions of the sets $G^n_{\pm,\alpha}$, and (C112) follows from the nonnegativity of $\alpha - g_n(U,\varphi)$ on $G^n_{-,\alpha}$ and from (C95). Equation (C113) follows from the definition of the random variable $X$ presented in the discussion above (C98). Finally, equation (C114) follows directly from (C98). Summing up over $n$ we obtain the inequality Let $\Delta$ be a fixed positive number. By setting $\alpha = \Delta/N$ and by using (C106) we obtain the upper bound Finding the minimal value of the right-hand side of (C116) (treated as a function of $\Delta$) is difficult. Numerical investigation shows that the minimal value is obtained very close to $\Delta = 1$. Inserting this into (C116) gives
$$\mathbb{E}_{U\sim\mu(S_N)} F_{cl}(U,\varphi) \le c_+ N^2,$$
where $c_+ = -\frac{5}{6} + \frac{3}{e} \approx 0.270$.
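The proof rests on the law of the overlap $X(V) = \mathrm{tr}(\psi V\psi V^\dagger)$ for a Haar-distributed $V$ on $\mathbb{C}^{N+1}$. As an illustration only, the sketch below samples this overlap and compares its tail with the one implied by the standard density $N(1-x)^{N-1}$ on $[0,1]$, namely $\Pr(X \ge \alpha) = (1-\alpha)^N$. It assumes that a Haar-random pure state can be generated by normalising a vector of independent complex Gaussians; the function names, sample size and seed are hypothetical.

```python
import math
import random

def haar_overlap_sample(dim, rng):
    """Squared overlap |<0|phi>|^2 for a Haar-random pure state |phi> in C^dim.

    Each weight is the squared modulus of an independent complex Gaussian
    amplitude; dividing by the total normalises the state.
    """
    weights = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(dim)]
    return weights[0] / sum(weights)

def empirical_tail(N, alpha, samples, seed=0):
    """Monte Carlo estimate of Pr(X >= alpha) on C^(N+1)."""
    rng = random.Random(seed)
    hits = sum(haar_overlap_sample(N + 1, rng) >= alpha for _ in range(samples))
    return hits / samples

def exact_tail(N, alpha):
    """Pr(X >= alpha) for the density N (1 - x)^(N - 1) on [0, 1]."""
    return (1 - alpha) ** N

N, Delta = 20, 2.0
alpha = Delta / N  # the scaling alpha = Delta / N used in the proof
estimate = empirical_tail(N, alpha, samples=20000)
print(estimate, exact_tail(N, alpha), math.exp(-Delta))
```

With $\alpha = \Delta/N$ the exact tail $(1-\Delta/N)^N$ never exceeds $e^{-\Delta}$, which is why exponentials in $\Delta$ appear in the constants once $\alpha$ is scaled this way.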

Appendix D: Proofs of main theorems
In this section we use the technical results developed in the preceding parts of the Appendix to prove the main theorems from the main manuscript. In the main text we have used, for the sake of simplicity, the $\Theta$ notation, which allowed us to hide the presence of complicated constants in the concentration inequalities. In what follows we present technical versions of these theorems, giving explicitly all the relevant constants. The proofs of Theorems 1, 2, 3 and Example 1 are analogous in the sense that they all rely on the concentration inequalities (A7) and on The proof of Theorem 4 is slightly more complicated and relies on the regularity of $F_{cl}(U,\varphi)$ viewed as a function of the parameter $\varphi$.
Let us start with an immediate corollary of Fact 1 describing the concentration of measure on $SU(\mathcal{H})$.
Assume that the expectation value of $f$ is lower bounded as $\mathbb{E}_{U\sim\mu(\mathcal{H})} f \ge F_-$. Then, for every $\epsilon \ge 0$ the following large deviation bound holds, We use Corollary 2 to prove technical versions of Theorems 1, 2, 3 and Example 1 from the main text.
(D3) and (D4) respectively yields Theorem 1.
Proof. The proof of Theorem 5 follows directly from Corollary 2 and the results proved previously. From Lemmas 1 and 3 one can infer that the Lipschitz constant of $F_{LU}$ is upper bounded by $L = 32\|H\|^2 = 32N^2\|h\|^2$. From (17) we have the upper bound on $\mathbb{E}_{U\sim\mu(\mathcal{H})} F_{LU}$. Using this bound in (D1) gives (D3). The lower bound on $\mathbb{E}_{U\sim\mu(\mathcal{H})} F_{LU}$ can be obtained by noting that the unoptimized QFI is a lower bound on its optimized version. Therefore where in the last equality we used (C27). Plugging (D5) into (D2) yields (D4).
where $|S_N| = \binom{N+d-1}{N}$. Using the inequality (C15) and the Fuchs-van de Graaf inequality [102], 1 − F 2 (σ N , σ mix ) ≤ 1 2 d B (σ N , σ mix ), we obtain Inserting this inequality into (D7) gives which, together with the bound on the Lipschitz constant of $F(U)$ and Corollary 2, allows us to conclude (D6).
Sketch of the proof. The proof of Example 3 parallels the proofs of Theorems 5 and 6 and relies on Fact 1. The bound on the Lipschitz constant of $F_p(U)$ is provided by Lemma 3. The expression for the average of $F_p(U)$ is given in Lemma 8. The inequality (D11) follows directly from the concentration inequalities of Fact 1 by setting $\epsilon = \tilde{\epsilon}\,\mathbb{E}_{U\sim\mu(S_N)} F_p$.
Theorem 7 (Technical version of Theorem 3 from the main manuscript). Fix a single-particle Hamiltonian $h$, local dimension $d$, non-negative integer $k$, and a state $\sigma_N$ on $S_N$ with eigenvalues $\{p_j\}_j$. Let $\sigma_{\mathrm{mix}}$ be the maximally mixed state on $S_N$. Let $F_k(U) := F(\mathrm{tr}_k(U\sigma_N U^\dagger), H_{N-k})$. Then, for every $\epsilon \ge 0$ where $|S_N| = \binom{N+d-1}{N}$.
Theorem 8 (Technical version of Theorem 4 from the main manuscript). Let $\psi_N$ be a fixed pure state on $S_N$ with $d = 2$ bosonic modes. Let $p_{n|\varphi}(U\psi_N U^\dagger)$ be the probability of obtaining outcome $n$ in the interferometric scheme defined in Section VII, given that the value of the unknown phase parameter is $\varphi$ and the input state is $U\psi_N U^\dagger$ (see also (31)). Let $F_{cl}(U,\varphi) := F_{cl}(\{p_{n|\varphi}(U\psi_N U^\dagger)\})$ be the corresponding FI according to (32) (or (C87)). Then, for every $\epsilon \ge 0$ and every $\varphi \in [0, 2\pi]$ we have In the equations above $\mathbb{E}_{U\sim\mu(S_N)} F_{cl}(U,\varphi)$ satisfies inequalities Moreover, let us notice that from (A59) it follows that $F_{cl}(U,\varphi)$ is Lipschitz continuous for fixed $U$ and varying $\varphi$: where in (A59) we set $X = H = \hat{J}_z$. From (D21) it follows that for fixed $U \in SU(S_N)$ and for $\varphi, \tilde{\varphi} \in [0, 2\pi]$ we have When the points in the discretization (D20) are separated by $\Delta = \frac{2\pi}{M}$, the distance from any $\varphi \in [0, 2\pi]$ to the closest $\varphi_i$ does not exceed $\tilde{\Delta} = \frac{\Delta}{2} = \frac{\pi}{M}$. Using the union bound, equation (D14) and the lower bound in equation (D16) we obtain Using (D22) and the discussion following it we obtain Now, by setting in the above equation $M = \lceil 12\pi N / c_- \rceil$ (this is the smallest integer $M$ such that $3N^3\tilde{\Delta} \le \frac{c_-}{4}N^2$) and $\epsilon = \frac{c_-}{2}N^2$, we obtain (D18).
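The choice of the grid size $M$ at the end of the argument is a one-line computation, sketched below as a consistency check. It assumes the reading of the garbled condition as $3N^3\tilde{\Delta} \le \frac{c_-}{4}N^2$ with $\tilde{\Delta} = \pi/M$, and uses the approximate value $c_- \approx 0.0244$ quoted in the text; the function names are hypothetical.

```python
import math

def grid_size(N, c_minus):
    """Smallest integer M with 3 N^3 (pi / M) <= (c_minus / 4) N^2."""
    return math.ceil(12 * math.pi * N / c_minus)

def covering_radius(M):
    """Largest distance from a phase in [0, 2*pi] to the nearest of M
    equally spaced grid points (spacing 2*pi / M)."""
    return math.pi / M

c_minus = 0.0244  # approximate constant quoted in the text (assumption)
for N in (10, 100, 1000):
    M = grid_size(N, c_minus)
    # M satisfies the condition, while M - 1 does not, so M is minimal.
    assert 3 * N**3 * covering_radius(M) <= c_minus / 4 * N**2
    assert 3 * N**3 * covering_radius(M - 1) > c_minus / 4 * N**2
    print(N, M)
```

The grid size grows only linearly with $N$, so the union bound over the $M$ grid points costs a factor that is polynomial in $N$.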
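Appendix E: Partial-trace and beam-splitter models of particle losses
In this section we prove the equivalence of the beam-splitter model of particle losses and the operation of taking the partial trace over the constituent particles in a system of $N$ bosons in $d = 2$ modes. A general pure state $\psi_N$ of $N$ bosons in two modes $a$ and $b$ can be written as
$$|\psi_N\rangle = \sum_{n=0}^{N}\alpha_n\,|n, N-n\rangle, \qquad |n, N-n\rangle = \frac{1}{\sqrt{\binom{N}{n}}}\sum_{\mathbf{x}:\,|\mathbf{x}|=n}|\mathbf{x}\rangle,$$
where $x := |\mathbf{x}| := \sum_i x_i$ denotes the Hamming weight of any binary string $\mathbf{x} = [x_1,\dots,x_N]$, whose consecutive entries specify the state of each qubit. As a result, we may write a general bosonic pure state (E1) in the particle basis as
$$|\psi_N\rangle = \sum_{\mathbf{x}} c_{\mathbf{x}}\,|\mathbf{x}\rangle,$$
with the coefficients $c_{\mathbf{x}}$ then given by $c_{\mathbf{x}} = \sum_{n=0}^{N}\frac{\alpha_n}{\sqrt{\binom{N}{n}}}\,\delta_{x,n}$.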
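The passage from the mode basis to the particle basis can be illustrated with a few lines of code. This is a minimal sketch under two stated assumptions: the Dicke states are normalised, so each of the $\binom{N}{n}$ binary strings of Hamming weight $n$ carries amplitude $\alpha_n/\sqrt{\binom{N}{n}}$, and an entry $x_i = 1$ is taken to mean that particle $i$ occupies mode $a$. The function name and the example amplitudes are hypothetical.

```python
from itertools import product
from math import comb, isclose, sqrt

def particle_basis_coefficients(alpha):
    """Map mode amplitudes alpha[n] (n particles in mode a) to the
    coefficients c_x indexed by binary strings x of length N."""
    N = len(alpha) - 1
    return {
        x: alpha[sum(x)] / sqrt(comb(N, sum(x)))
        for x in product((0, 1), repeat=N)
    }

alpha = [0.5, 0.5j, -0.5, 0.5]  # hypothetical normalised amplitudes, N = 3
coeffs = particle_basis_coefficients(alpha)

# The change of basis preserves the norm of the state.
norm_modes = sum(abs(a) ** 2 for a in alpha)
norm_particles = sum(abs(c) ** 2 for c in coeffs.values())
assert isclose(norm_modes, norm_particles)

# Coefficients depend on the Hamming weight only, so the state is
# invariant under permutations of the particles.
assert coeffs[(1, 0, 0)] == coeffs[(0, 1, 0)] == coeffs[(0, 0, 1)]
```

Storing the coefficients in a dictionary keyed by the binary string keeps the correspondence with the notation $c_{\mathbf{x}}$ explicit, at the cost of $2^N$ entries, which is fine for small illustrative $N$.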
1. Tracing-out k particles
Let us define a notation in which we may split any binary string $\mathbf{x}$ (describing $N$ qubits) into two strings, $\mathbf{x}'$ and $\mathbf{u}$ (describing the first $N-k$ and the last $k$ qubits, respectively), so that $\mathbf{x} = [\mathbf{x}', \mathbf{u}] = [x_1,\dots,x_{N-k}, u_1,\dots,u_k]$. Then, we may generally write the bosonic state (E3) in the particle basis after tracing out the last $k$ qubits as (E8) In the mode basis we may equivalently write (E13)
2. Beam-splitter model of mode-asymmetric particle losses
In quantum optics, photonic losses are modelled by adding fictitious beam-splitters (BSs) of fixed transmittance into the light transmission modes [39]. In this way, by impinging a vacuum state on the other input port of any such BS and tracing out its unobserved output port, one obtains a model describing photon loss. In the case of the two-mode $N$-photon bosonic state (E1), after fixing the transmissivity of the fictitious BS introduced in mode $a$ ($b$) to $\eta_a$ ($\eta_b$), the density matrix describing the observed modes generally reads [3]: p la,l b |ξ la,l b m ξ la,l b | , where $\Lambda^{\mathrm{BS}}_{\eta_a,\eta_b}$ is the effective quantum channel representing the action of the fictitious BSs in the two modes, while the indices $l_a$ and $l_b$ denote the number of photons lost in modes $a$ and $b$, respectively. The states |ξ la,l b m := 1 √ p la,l b