Quantum Variational Optimization of Ramsey Interferometry and Atomic Clocks

We discuss quantum variational optimization of Ramsey interferometry with ensembles of $N$ entangled atoms, and its application to atomic clocks based on a Bayesian approach to phase estimation. We identify best input states and generalized measurements within a variational approximation for the corresponding entangling and decoding quantum circuits. These circuits are built from basic quantum operations available for the particular sensor platform, such as one-axis twisting, or finite range interactions. Optimization is defined relative to a cost function, which in the present study is the Bayesian mean square error of the estimated phase for a given prior distribution, i.e. we optimize for a finite dynamic range of the interferometer. In analogous variational optimizations of optical atomic clocks, we use the Allan deviation for a given Ramsey interrogation time as the relevant cost function for the long-term instability. Remarkably, even low-depth quantum circuits yield excellent results that closely approach the fundamental quantum limits for optimal Ramsey interferometry and atomic clocks. The quantum metrological schemes identified here are readily applicable to atomic clocks based on optical lattices, tweezer arrays, or trapped ions.


I. INTRODUCTION
Recent progress in quantum technology of sensors has provided us with the most precise measurement devices available in physical sciences. Examples include the development of optical clocks [1], atom [2] and light [3] interferometers, and magnetic field sensing [4]. These achievements have opened the door to novel applications from the practical to the scientific. Atomic clocks and atomic interferometers allow height measurements in relativistic geodesy [5][6][7][8] or fundamental tests of our understanding of the laws of nature [9][10][11], such as time variation of the fine structure constant. In the continuing effort to push the boundaries of quantum sensing, entanglement as a key element of quantum physics gives the opportunity to reduce quantum fluctuations inherent in quantum measurements below the standard quantum limit (SQL), i.e. what is possible with uncorrelated constituents [12]. Squeezed light improves gravitational wave detection [13], allows lifescience microscopy below the photodamage limit [14], further, squeezing has been demonstrated in atom interferometers [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30]. However, beyond the SQL, quantum physics imposes ultimate limits on quantum sensing, and one of the key challenges is to identify, and in particular devise experimentally realistic strategies defining optimal quantum sensors [31].
Here, entangled input states and the entangled measurement protocols [57][58][59][60][61][62][63] defining the generalized Ramsey interferometer are represented as variational quantum circuits built from 'natural' quantum resources available on the specific sensor platform (see Fig. 1), which are optimized in light of a given cost function defining the optimal interferometer. As we will show, already lowdepth variational quantum circuits can provide excellent approximations to the optimal interferometer. Intermediate scale atomic quantum devices [64,65], acting as programmable quantum sensors [66], present the opportunity to implement these low-depth quantum circuits, defining an experimental route towards optimal Ramsey interferometry.
As noted above, optimality of a quantum sensing protocol is defined via a cost function C which is identified in context of a specific metrological task. In our study of variational N -atom Ramsey interferometry, we wish to optimize for phase estimation accuracy defined as the mean squared error (φ) relative to the actual phase φ, averaged with respect to a prior distribution P(φ) with width δφ, which represents the finite finite dynamic range of the interferometer. Thus the cost function is C ≡ (∆φ) 2 = dφ (φ)P(φ). This corresponds to a Bayesian approach to optimal interferometry where the prior width of the phase distribution δφ is updated through measurement to ∆φ characterizing the posterior distribution. As outlined in Fig. 1 1. (a) Quantum circuit representation of Ramsey interferometer with uncorrelated atoms. The phase φ is imprinted on the atomic spin-superposition prepared by global π/2-rotation around y-axis, Ry(π/2). Consequent rotation, Rx(π/2), and measurement of difference m of atoms in eigenstates |↑ and |↓ in z-basis allows estimating the phase φ using an estimator function φest(m). (b) Quantum circuit of a generalized Ramsey interferometer with generic entangling and decoding operations UEn and UDe, respectively. Our variational approach (c) consists of an ansatz, where optimal UEn and UDe are approximated by low-depth circuits. These are built from 'layers' of elementary operations, which are provided by the given platform. We specify the variationally optimized quantum sensor by circuits UEn(θ) and UDe(ϑ) [see Eqs. (6) and (7)], of depth nEn and nDe, respectively. Here θ ≡ {θi} and ϑ ≡ {ϑi} are vectors of variational parameters to be optimized for a given strategy represented by a cost function C defined here as Bayesian mean squared error (BMSE) [see Eqs. (2) and (10)]. We illustrate the approach with a variational circuit built from global spin rotations Rx and one-axis-twisting gates Tx,z available in neutral atom and ion quantum simulation platforms, as discussed in Sec. II B. The circuits optimization, shown as a feedback loop (in red), can be performed on a classical computer, or, if the complexity of underlying quantum many-body problem exceeds capabilities of classical computers, on the sensor itself, thus, leading to a (relevant) quantum advantage, see Sec. II G.
variational approach to optimal Ramsey interferometry seeks to minimize C over variational quantum circuits, and thus identifying optimal input states and measurements for a given δφ. Note that in the present work we optimize a metrological cost function for the complete quantum sensing protocol with variational quantum circuits. We distinguish this from variational state preparation schemes, e.g. variational squeezed state preparation of Ref. [66], where a squeezing parameter was optimized as cost function.
We contrast our Bayesian approach of identifying a metrological cost function to a Fisher information approach, which optimizes accuracy locally at a specific value of the phase, corresponding to the limit δφ → 0 [3]. Discussions of fundamental limits in quantum sensing are often phrased in terms of quantum Fisher information and the quantum Cramèr-Rao bound leading to definition of the Heisenberg limit (HL) [67][68][69]. This identifies GHZ states [70], saturating the HL, as the optimal states for Ramsey interferometry. Furthermore, this leads to the conclusion that adding a decoding step (see Fig. 1) is not beneficial for quantum metrology since a separable measurement is optimal in this context [68]. This conclusion, however, is not applicable to phase estimation with finite prior width since the GHZ state interferometry in single-shot scenarios is optimal only for estimation of phase values in an interval δφ GHZ ∼ 1/N , which shrinks as number of atoms N increases [37,71], see Sec. II F below. In fact, for large priors δφ tailored quantum input states will differ greatly from squeezed spin states (SSS) [72,73] or GHZ states [3,31], and a nontrivial mea-surement is required for an optimal metrological protocol. Our variational approach to optimal Ramsey interferometry (see Fig. 1) finds these optimal entangling and decoding circuits [74].
Our discussion of optimal single-shot Ramsey interferometry [75] has immediate relevance for atomic clocks [12,[76][77][78][79]. An optical atomic clock operates by locking the frequency of an oscillator, represented by a classical laser field with fluctuating frequency ω L (t), to the transition frequency ω A of an ensemble of N isolated atoms [1]. The locking of the laser to the atomic transition is achieved by repeatedly measuring the accumulated phase φ = T 0 dt[ω L (t) − ω A ] in Ramsey interferometry with interrogation time T . Importantly, the width δφ of the distribution of this phase increases with the Ramsey time T . It is therefore critical to achieve a good phase estimate in conjunction with a wide dynamic range for making an accurate inference about the frequency deviation, and ultimately for stabilizing the clock laser to the atomic transition. Our variational approach for Bayesian phase estimation is made to satisfy these requirements, and provides optimal quantum states and measurements minimizing the instability in atomic clocks as measured by its Allan deviation. We predict significant improvements over previously known one-shot non-adaptive strategies. Our predictions are backed up by comprehensive numerical simulations of the clock laser and its stabilization to the atomic reference in a closed feedback loop [78,79].
In the following, we first develop the general theory of variationally optimized Ramsey interferometry based on Bayesian phase estimation in Sec. II, and then apply this theory to the specific problem of an optical atomic clock in Sec. III.

II. QUANTUM VARIATIONAL OPTIMIZATION OF RAMSEY INTERFEROMETRY
For concreteness, we consider estimation of the phase φ in an atomic interferometer consisting of an ensemble of N identical two-level atoms described as spin-1/2 particles [12]. The general idea developed in the following applies to any SU (2) interferometer. The interferometer encodes the phase in the atomic state by evolving according to |ψ φ = exp(−iφJ z ) |ψ in . Here |ψ in is an initial probe state [80], and J x,y,z = 1/2 N k=1 σ x,y,z k is the collective spin with σ x,y,z the Pauli operators. The task is to determine the unknown phase φ by performing a measurement on the atoms.

A. Bayesian approach to phase interferometry
The most general measurement is described by a positive operator valued measure (POVM), that is a set {Π x } of positive Hermitian operators such that dx Π x = 1 1. The parameter φ is estimated on the basis of a measurement result x using an estimator function φ est (x). The phase estimation accuracy is characterized by a mean squared error (MSE) with respect to the actual phase φ where p(x|φ) = Tr{Π x |ψ φ ψ φ |} is the conditional probability of the measurement outcome x [3]. In our discussion we consider the phase φ to be defined on the interval −∞ < φ < ∞ [81]. In order to find an interferometer performing the most accurate measurement of the phase φ we cannot minimize the MSE (1) for all values of φ simultaneously. First, the atomic interferometer is only sensitive to phase values modulo 2π as exp(−iφJ z ), and hence also p(x|φ) is periodic. Thus, it can not distinguish arbitrary phases. Second, an initial state and measurement working well for one phase value might be insensitive to another value. Thus we consider an estimation error minimized for a weighted range of phase values relevant for a given sensor and measurement task. In the following we will adopt a Bayesian approach where the estimation error is averaged over a prior phase distribution P(φ). The cost function of interest is thus defined as the MSE averaged over the prior distribution, defining the Bayesian mean squared error (BMSE) The prior distribution P(φ) reflects the statistical properties of the unknown phase φ hence it is, in general, sensor and task dependent.
Optimal interferometry is based on minimizing the cost function (2) over |ψ in , {Π x }, and φ est (x) for the given prior distribution. For simplicity, we will focus on prior distribution as a normal distribution centered around zero This problem was addressed in [31], where the optimal quantum interferometer has been identified. Below we optimize the cost function (2) within a variational quantum algorithmic approach.

B. Variational Ramsey interferometry
Our goal is to find an implementation of the optimal interferometer given a restricted set of quantum gates available on an experimental platform such as neutral atoms or trapped ions. We will show that low-depth variational quantum circuits of given depth [see Fig. 1(c)] are excellent approximations to optimal interferometry, and can yield significant improvements over SQL defined for uncorrelated atoms.
In the most general form the variational interferometer, illustrated in Fig. 1(b), can be defined by a generic entangling unitary operation U En preparing an entangled input state from the initial product state |ψ 0 = |↓ and a decoding operation U De transforming the projective measurement of a typical observable J z , with eigenbasis |m , into a generic projection Here we consider the subspace spanned by the states |m , m ∈ {−N/2, . . . , N/2}, which are completely symmetric under permutations of N atoms, and |ψ 0 = |−N/2 . The measurement amounts to counting the difference m of atoms in state |↑ and |↓ . As shown in [34], this assumption can be made without loss of generality. The basis states |m are given by the eigenstates of total spin of maximum length, j = N/2, thus satisfying J 2 |m = j(j + 1) |m and J z |m = m |m . As shown in [31], the optimal POVM may be restricted to the class of standard projection von Neumann measurements Π x = |x x|, x|x = δ xx . Thus the measurement of the collective spin component J z transformed by a decoder U De represents the measurement problem in full generality. We assume that the programmable quantum sensor provides us with a set of native resource Hamiltonians {H (i) R }. The unitaries generated by these Hamiltonians determine a corresponding native set of quantum gates as variational ansatz for U En and U De . A generic example is provided by global rotations R µ (θ) = exp(−iθJ µ ) and the infinite range one-axis-twisting (OAT) interaction [73] T µ (θ) = exp(−iθJ 2 µ ) with µ = x, y, z. Such interactions have been realized on quantum simulation platforms [15-29, 82, 83], and very recently also on an optical clock transition [30]. Within this set of gates we constrain the quantum circuits to be invariant under the spin x-parity transformation ensuring an anti-symmetric estimator at and around φ = 0 (see App. B). The most general circuits satisfying the x-parity constraint for a fixed number n En and n De of layers of entangling and decoding gates are and Here the subscripts on the parameters indicate the layer containing the same three gates and the superscript identifies the gate within the layer. The complexity of the circuit is thus classified by (n En , n De ), and we have 3(n En +n De ) (global) variational parameters in a (n En , n De )-circuit, independent of N . Note that here U En and U De commute with particle exchange. The Hilbert space dimension for dynamics in the symmetric subspace is linear in N , which allows us to study theoretically the scaling for large particle numbers N below -in contrast to the case of finite range interactions in Sec. II G. We note that conventional Ramsey interferometry with uncorrelated atoms corresponds to the (0, 0)-circuit with U En = R y (π/2) and U De = R x (π/2). Here atoms are prepared initially in a product state, or coherent spin state (CSS), and remain in a product state during the evolution in interferometer followed by measurement of J y . On the other hand, the interferometer with SSS as input, and GHZ interferometry emerge as (1, 0)-and (2, 1)-circuits, respectively.
In the presented entangler-decoder framework the performance of the interferometer is described, similar to Eq. (1), by the MSE where the conditional probability is Therefore, the optimal interferometer found within the restricted set of available operations is described by the minimum of the BMSE  2. Performance of the variationally enhanced interferometer with N = 64 particles. Performance is shown in terms of the posterior phase distribution width relative to the prior width, ∆φ/δφ, for a given prior, that is, for a given dynamic range of the interferometer. Colored lines show the performance of variationally optimized circuits for the depth (nEn, nDe) of entangling and decoding layers as indicated. The number of variational parameters is given by 3(nEn+nDe). The performance of the optimal quantum interferometer (OQI) [31] is indicated by the dotted line. The shaded areas indicate the classically accessible (purple) and the quantum mechanically forbidden (gray) regions (for N = 64). Related results applied to atomic clocks are shown in Fig. 10 To be specific, we assume for the prior a normal distribution P δφ (φ) with standard deviation δφ [see Eq. (3)]. In addition, (10) assumes a linear estimator φ est (m) = am which is close to optimal, as shown below. We note that it is possible to use the optimal Bayesian estimator, which however is computationally demanding. We describe the corresponding iterative procedure in App. D for the case of a phase operator as observable.

C. Results of optimization
Results of interferometer optimizations [84] are shown in Fig. 2 for N = 64 atoms. The figure plots the ratio ∆φ/δφ of the root BMSE ∆φ relative to the normal prior width δφ. The more information we gain about the parameter φ in a single measurement, the smaller the value of this ratio.
The black dotted line shows the result of the unrestricted minimization of the cost function (2) with normal prior [31], which we refer to as optimal quantum interferometer (OQI). It defines the region (shaded area) inaccessible to any N -particle quantum interferometer. The purple line represents performance of the conventional Ramsey interferometer with CSS as input and a linear estimator, given by the (0, 0)-circuit. Thus, the shaded area above the purple line roughly defines the classically achievable performance.
The performance of the entanglement enhanced inter- 3. Visualization of quantum states |ψ φ = exp(−iφJz) |ψin , and quantum measurement operators as Wigner distributions on the generalized Bloch sphere for N = 64 and δφ ≈ 0.7. The first (a,d), second (b,e), and third column (c,f) correspond to (nEn, nDe) = (1, 0) (squeezed input state, and Jy measurement operator), the optimal quantum interferometer, and to a (1, 3) quantum circuit, respectively. Measurement operators are visualized as colored contours on the Bloch sphere corresponding to different measurement outcomes. The corresponding optimized (optimal) states |ψ φ are shown at various angles φ as gray shaded areas. (a,b,c) three dimensional view of the generalized Bloch sphere with a state rotated to φ = π/3. (d, e, f) Top view of the Bloch sphere with the state rotated to angles φ = 0, π/3, 2π/3. (g) Measurement probability p(m|φ) [see Eq. (9)] corresponding to the overlap between the contours of the measurement distribution and the respective state distribution, displayed in the same column. The three rows correspond to the above three angles φ. Note that for the Jy measurement the distributions at angles π/3 and 2π/3 are indistinguishable in measurement statistics. In contrast, for the OQI and the (1, 3) quantum circuit these angles are well resolved.
ferometer is shown with colored lines. The orange curve represents a (1, 0)-circuit corresponding to a squeezed spin state (SSS) interferometer [72,73], employing the OAT interaction to generate an entangled initial state with suppressed fluctuations along the axis of the effective J y measurement. The minimum of the orange line is located at smaller δφ values when compared to the minimum of the purple line corresponding to the SQL. This manifests the fact that SSS input state increases the sensitivity of the phase measurement at expense of dynamic range [76,85,86]. By adding a single layer of a decoding circuit we obtain the blue curve corresponding to the (1, 1)-interferometer with a slightly enhanced sensitivity and dynamic range. The red and green lines correspond to (1, 3)-and (2, 5)-circuits, respectively, and show striking improvement in sensitivity, providing an excellent approximation for the optimal interferometer (black dotted line). Remarkably, the minima of the red, green, and black curves are located at a wider dynamic range δφ than that of the CSS interferometer. Hence the optimal entangled initial state and the effective nonlocal observable allows us to achieve both a higher phase sensitivity and a wider dynamic range.
To gain understanding of the physical meaning of the measurements and initial states emerging from the numerical optimization, we show their Wigner functions in Fig. 3. A formal definition of the Wigner distribution is provided in App. C. The three columns correspond, in consecutive order, to the (1, 0)-circuit (SSS interferometer), the optimal quantum interferometer of [31], and the (1, 3)-circuit. The chosen prior width δφ ≈ 0.7 is indicated in Fig. 2 by the vertical dashed line. The first row of panels 3(a-c) shows 3D views of the generalized Bloch sphere with Wigner functions of the measurement operators shown in shades of red and blue for J y , the optimal observable, and U † De J z U De operators, respectively. A contour of constant color corresponds roughly to a certain measurement outcome which is obtained with the probability given by the overlap of the contour with the Wigner function of a quantum state. The states are shown in 3(a-f) with the gray outlined areas.
Panel 3(a) shows clearly the non-optimality of the SSS interferometer with a measurement of the spin projection J y . Optimization of the SSS results in a moderate level of squeezing (gray ellipse squeezed along the y-axis). More squeezing would produce stronger anti-squeezing along z-axis leading to overlap with more contours of the J y Wigner function, thus increasing the variance of the measurement results for nonzero φ [76,86]. Another limitation of the SSS interferometer, illustrated in panels (d) and (g), is the reduced dynamic range in the interval −π/2 and π/2. Panels (d) and (g) show that states rotated by the phase angle φ = 2π/3 > π/2 have the same measurement statistics as states rotated by φ = π/3. Thus, phases outside the [−π/2, π/2] interval can not be reliably estimated.
The optimal quantum interferometer is explained in the central column of Fig. 3. Here panel (b) shows that the initial state is squeezed significantly stronger than in the SSS interferometer. This is possible because the corresponding optimal measurement is very similar to the phase operator of Pegg and Barnett [87], which has eigenstates with well defined phases (see Sec. II D below for detailed comparison). One can see that the color contours of the optimal measurement Wigner function in panels (b) and (e) are aligned with the meridians and thus overlap favorably with the strongly squeezed initial state rotated by a wide range of phase angles φ. Strikingly, the OQI can effectively use the full 2π dynamic range, as illustrated in panels (e) and (g).
Finally, the (1, 3)-interferometer, presented in the third column of Fig. 3, exhibits properties similar to the OQI. Interestingly, the initial state in this case is not a con-ventional squeezed state, as shown in panel (c), but a slightly twisted one. This, however, does not impair the performance of the interferometer as the effective measurement is also twisted such that it matches the initial state rotated by a wide range of phase angles. This peculiarity is a consequence of the restricted gate set available for the variational optimization in a realistic system. It is remarkable that the low depth (1, 3)-circuit already provides an excellent approximation for the OQI.
The extended dynamic range of the variationally optimized interferometer is explored in Fig. 4. Panels (a) and (b) show, respectively, the estimator expectation valuē and the estimator mean squared error (8) as functions of the actual phase φ for an interferometer optimized for prior width of δφ ≈ 0.7 (indicated with vertical dashed lines).
The estimator expectation value of the (0, 0)-and (1, 0)circuit (CSS and SSS interferometer) is given by a sine function [purple and orange line in panel (a)], thus, it can unambiguously map the estimated phase to the actual phase in the range between −π/2 and π/2. However, the useful dynamic range of the interferometer is even narrower as shown by the estimator error in panel (b). The estimator error of the SSS state is suppressed below the CSS benchmark line only for phases between, roughly, −π/4 and π/4. The (1, 1)-interferometer [blue line in (a) and (b)] starts to exploit the entangled measurement and achieves a bit wider linear regime ofφ est in (a) and a wider region of suppressed estimator error in (b). Although the minimum error of (1, 1)-circuit is larger than that of (1, 0)circuit, it still has superior overall sensitivity as phases in the tails of the prior distribution are better resolved.
Finally, more complex decoding operations employed by the (1, 3)-and (2, 5)-circuit (red and green lines) allow to approach the performance of the optimal interferometer (black dotted lines). The linear regime ofφ est extends almost to the full 2π range, and the estimator error is well suppressed for phases deeply within the tails of the prior.

D. Comparison between variational and phase operator based interferometers
From a theory perspective it is interesting to compare the performance of the variationally optimized interferometer and the interferometer based on covariant measurement [33,34]. Here covariant measurements represent the class of measurements optimal for phase estimation with no a priori knowledge and phase-shift symmetry, i.e. assuming a prior distribution P(φ) = (2π) −1 and a 2π-periodic cost function, as opposed to the MSE (1).
In the case of clocks and magnetometry, the free evolution encoding the phase φ is the collective spin rotation, e −iφJz . The corresponding covariant measurement optimal for estimation of the rotation angle φ can be represented by the von Neumann measurement [88] with phase operatorΦ [87], which we define in App. D.
In order to evaluate the performance of the phase operator based interferometer (POI), we minimize the cost function (2) forΦ as the observable and the normal prior P δφ (φ). To this end, we use the optimal Bayesian estimator known as the Minimum Mean Squared Error (MMSE) estimator [3] and find the corresponding optimal initial state |ψΦ (see App. D for details). This results in the optimal posterior width ∆φ POI as discussed in Sec. II C for the variationally optimized interferometer.
To compare different interferometers we consider their performance at the optimal prior width with respect to the OQI performance and define the ratio: The χ value corresponds to the ratio of minima of an interferometer and the OQI curves in Fig. 2. The OQI corresponds to χ = 1. Figure 5 shows the χ − 1 value for variationally optimized andΦ based interferometers for various system sizes up to N = 512. The figure highlights sub-optimality of the POI (blue points) for the task of phase estimation with non-periodic cost function, as is relevant for frequency estimation in, e.g., optical clocks. For small systems, N 16, the POI is up to ∼ 10% less efficient than the OQI and the variational (1, 3)-and (2, 5)-interferometers (green and red points, respectively). (1, 3)-circuit outperforms POI for systems of up to N ∼ 40 atoms, whereas (2, 5)-circuit is better for up to N ∼ 100 atoms. In the limit of large number of atoms, N ≫ 1, the POI approaches the OQI performance. Empirical fitting indicates convergence rate χ POI − 1 ∼ N −0.77 , as N increases. On the other hand, the variationally optimized interferometers diverge from OQI linearly with N .

E. Variational Optimization in Presence of Imperfections and Noise
Variational optimization can be extended to include imperfections and decoherence. This optimization can also be carried out on the physical quantum sensor. This is particularly beneficial when the experimental characterization of imperfections and noise is incomplete.
There are various sources of imperfections and decoherence, which are relevant in our context. First, there are control errors in implementing variational quantum gates. These include offsets of control parameters and Hamiltonian design errors. The latter are deviations of the physically realized vs. the ideal Hamiltonian, e.g. in the implementation of one-axis twisting interaction. However, if these (unknown) control or design errors are static, i.e. do not fluctuate between experimental runs, a variational algorithm performed on the device will still optimize, and thus compensate in the best possible way for these errors in U En and U De , i.e. find the best gate decomposition for given building blocks. In addition, there will be decoherence due to fluctuations of control parameters, or coupling to an environment as in spontaneous emission or dephasing.
To incorporate the latter we need to extend the formalism to density matrices instead of the previously discussed pure states. Below we illustrate this by an optimization of the Ramsey interferometer in the presence of single atom dephasing noise during the Ramsey interrogation time T , as one example of experimentally relevant decoherence. Local dephasing noise is described by the Lindbladian Thus the density matrix after the Ramsey interrogation time, can be expressed in terms of the dimensionless phase φ accumulated during the Ramsey interrogation time T and the effective exposure to the dephasing noise γT with dephasing rate γ. Here ρ θ = U En (θ) |ψ 0 ψ 0 | U † En (θ), where we used that the dephasing Lindbladian and the free evolution of the clock supercommute. The particle permutation symmetry of the Lindbladian enables us to simulate systems at a cubic cost in N [89,90]. The conditional probability, required to determine the BMSE in Eq. (10) therefore reads Figure 6 shows that the optimized ∆φ/δφ increases as the noise increases, as expected. For a small γT /δφ = 0.01 the variational (1, 3)-interferometer is close to optimal without noise. Remarkably for all ratios γT /δφ 1, the minimum of the (1, 3) interferometer remains well below the uncorrelated (0, 0)-and the SSS (1, 0)-interferometers. This ordering of the respective global minimum is independent of N , whereas for γT /δφ = 10 none of the entangling sequences improve significantly compared to SQL [91].

F. Towards the Heisenberg limit
The variationally optimized interferometer with lowdepth quantum circuits found within the Bayesian framework quickly approaches the accuracy of the optimal Ramsey interferometer. We will now discuss our results from the perspective of reaching the Heisenberg limit (HL).
The HL is a lower bound on the accuracy of an interferometer imposed by quantum mechanics. For an N -atom interferometer the HL and SQL are traditionally written as which must be understood in context of the quantum Fisher information [67,69,92] and quantum Cramér-Rao bound [33,93] (implying δφ → 0). In contrast, in the present work we have adopted a Bayesian approach, which includes optimizing for a finite dynamic range δφ.
To evaluate the performance of our quantum variational results for a given circuit depth in comparison with HL, we will adopt below van Trees inequality [94,95] as a bound for the BMSE. In brief, for any given conditional probability distribution p(m|φ) the Cramér-Rao inequality provides a bound on the variance of an unbiased (φ est = φ) For pure states, i.e. in the absence of decoherence, F φ ≤ N 2 in correspondence to the HL above. We emphasize that the Cramér-Rao inequality seeks to identify optimal unbiased estimators, which can in general be achieved only locally in φ, i.e. in a small neighborhood of a given phase, and not for a finite dynamic range as is the goal in our Bayesian approach. In the Bayesian framework, a bound on the BMSE is imposed by van Trees' inequality, Here, the first term in the denominator is the Fisher information (17) averaged over the prior distribution, The second term is the Fisher information of the prior distribution, representing the prior knowledge. To isolate the measurement contribution from the prior knowledge, we define an effective measurement variance (∆φ M ) 2 via and obtain reminiscent of the Cramér-Rao inequality (16). In case of a normal prior distribution (3) we have I = (δφ) −2 , and the effective measurement variance (19) reads In Fig. 7 we plot (∆φ M ) × N , the measurement error scaled to the atom number, for the (2, 5)-variational interferometer (solid lines) as a function of the prior width δφ for a range of atom numbers N . In addition, we  indicate the HL and the π-corrected HL (see below) as dotted lines and show results for a GHZ interferometer with spin x-parity measurement [70] (dashed lines). In the case of the GHZ interferometer with a normal prior we have showing that the GHZ interferometer attains the HL uncertainty ∆φ M → 1/N for a given prior width δφ only for atom numbers N 1/δφ. This fact is illustrated in Fig. 7 by the dashed lines which diverge from the HL for smaller and smaller δφ as N grows. In contrast, the variational interferometer (solid lines) is of the order of the π-corrected HL [34,35,37,96], ∆φ M → π/N , for a wide range of prior widths δφ as N increases. Intuitively, the emergence of π-corrected HL can be understood as follows. The optimal N atom quantum interferometer can be described as a von Neumann measurement in the particle permutation symmetric subspace [31,34]. Thus, there are N +1 possible measurement outcomes to distinguish at most N + 1 phase values in the interval [−π, π]. The corresponding estimation error for evenly spread estimates reads ∆φ ∼ (1/2) 2π/(N + 1) → π/N . For large δφ the solid lines in Fig. 7 exhibit strong deviations from the asymptotic π-corrected HL behavior. The cusps are explained by phase slips outside the interval [−π, π] which lead to a squared estimation error of 4π 2 . For a normal prior distribution, the performance of an interferometer limited by the π-corrected HL including the phase slips is given by Results of this section are obtained in absence of decoherence.

G. Finite range interactions
Our previous discussion assumed infinite range interactions as entangling quantum resource, while e.g. neutral atoms stored in tweezer arrays feature finite range interactions. The variational optimization of the BMSE can be directly generalized to finite range interactions, which we illustrate by optimizing a sensor based on Rydberg dressing resources [97,98] , as is realized in alkaline earth tweezer clocks [46][47][48]. The effective interaction Hamiltonian we use for the optimization reads where r k represents the position of particle k. The interaction strength at short distances V 0 and interaction radius R C depend on the Rydberg level and the dressing laser used to let the particles interact [99].
Ref. [66] presented a study of variationally optimized spin-squeezed input states, and we refer to this work for the elementary gates we employ as building blocks for variationally optimizing entangling and decoding operations. In analogy to Eqs. (6) and (7), we write the entangler and decoder, effectively replacing the T x,z by D x,z . In a similar way we can rewrite Eq. (9) to account for dynamics in full 2 N -dimensional Hilbert space. Figure 8(a) shows the optimized ∆φ/δφ for a 4 × 4 square array for R C = a with a the lattice constant. We find variational solutions approximating the OQI, similarly to the OAT interactions in Fig. 2. In contrast to the infinite range OAT interaction we are not able to exactly reproduce the optimal GHZ-state interferometer at δφ < 1/N . Nonetheless, at any prior distribution width significant improvement beyond the uncorrelated interferometer is achieved and in particular around global minimum of the optimal interferometer (vertical dashed line), the decoder-enhanced circuits clearly surpass sensitivity of entangled input states only.
In Fig. 8(b) we further study the dependence on the scaled interaction radius R C /a for a fixed prior distribution width δφ corresponding to the minima of variational and optimal interferometer curves in Fig. 8(a) (vertical dashed line). We see that even in the limit of an effective nearest neighbor interaction R C = a a clear improvement beyond the classical sensitivity limit is possible. As the interaction radius increases, the root BMSE of the variationally optimized interferometer decreases, ultimately reproducing the results of infinite-range interactions in the limit R C /a → ∞.
Theoretical treatment of the variational interferometry with finite range interactions involves solution of a quantum many-body problem. This, in general, is an exponentially hard problem representing the regime where variational optimization on the quantum sensor as a physical device provides a (relevant) quantum advantage, beyond the capabilities of classical computation.

III. APPLICATION TO ATOMIC CLOCKS
Atomic clocks realized with neutral atoms in optical trap arrays or trapped ions provide us with natural entanglement resources to implement variationally optimized Ramsey interferometry. Below we provide a study of a variationally optimized clock assuming as quantum resources global spin rotations and OAT, as realized, for example, with trapped ions as Mølmer-Sørensen gate, or in cavity setups with neutral atoms. This discussion is readily extended to other platforms and resources.
Optical atomic clocks operate by locking the fluctuating laser frequency ω L (t) to an atomic transition frequency ω A [1]. To this end, an atomic interferometer is used to repeatedly measure the phase φ k = t k +T t k dt[ω L (t) − ω A ] accumulated during interrogation time T at the k-th cycle of clock operation, i.e. k = 1, 2, . . .. After each cycle, the measurement outcome m k providing the phase estimate φ est (m k ) is used to infer an estimated frequency deviation φ est (m k )/T . In combination with previous measurement results this is used to correct the laser frequency fluctuations via a feedback loop yielding the corrected frequency of the clock ω(t). For further details on the actual clock operation we refer to App. G, where we also describe our numerical simulations of optical atomic clocks. We emphasize the importance of finite dynamic range in phase estimation in identifying the optimal clock operation, as provided by the Bayesian approach of Sec. II.
The relevant quantity characterizing the long-term clock instability is the Allan deviation σ y (τ ) for fluctuations of fractional frequency deviations y ≡ [ω(t) − ω A ]/ω A , averaged over time τ T [1]. To connect the Bayesian posterior phase variance of the optimized interferometer (10) of Sec. II, we follow the approach of [78] to obtain predictions for the clock instability in the limit of large averaging time τ . Our predictions are supported by numerical simulations of the closed servo-loop of the optical atomic clocks.
In the following we assume that interrogation cycles can be performed without dead times (Dick effect). This can be achieved using interleaved interrogation of two ensembles [100]. For interrogation of a single ensemble, Dick noise may pose limitations for interaction-enhanced protocols especially for larger ensembles, as was analyzed for squeezed states in Ref. [79]. In App. F and App. H we characterize in more detail the requirements regarding dead time for the class of variational protocols developed here.

A. Prediction of clock instability in the Bayesian framework
As shown in [78], the Allan deviation can be well approximated by means of the effective measurement uncertainty ∆φ M which isolates the measurement contribution from the prior knowledge, as in Eq. (19). Assuming no dark times between interrogation cycles, the Allan deviation reads Here τ /T is the number of cycles of clock operation and ∆φ M (T ) ≡ [(∆φ T ) −2 − (δφ T ) −2 ] −1/2 is the effective measurement uncertainty of one cycle. The posterior width ∆φ T is found according to (10) assuming a prior width δφ T = (b α T ) α/2 corresponding to laser noise dominated spreading of the phase distribution within one interrogation cycle. The labels α = 1, 2, 3 specify temporal correlations in the phase noise of the laser and correspond to atomic clocks with a white-, flicker-, or random-walkfrequency-noise-limited laser, respectively. The laser noise bandwidth b α and the exponent α are related to the power spectral density S L (f ) ∝ f 1−α of the free running laser (see App. A). Representative examples for σ y (τ ) when using variationally optimized protocols are shown in Fig. 9. The solid lines result from numerical simulations of the full feedback loop of an atomic clock in which an integrating servo corrects out frequency fluctuations over the course of multiple cycles, see App. G for details. For the simulations we assume the atoms as ideal frequency references without any systematic shift of ω A . In atomic clocks the simulated Allan deviations presented in Fig. 9 are larger at small averaging times τ /T ∼ 1, due to the delayed feedback, before reducing as σ y (τ ) ∝ τ −1/2 at long averaging times τ /T 1 when all correlated laser noise is corrected out. To determine long-term stability the Allan deviation is measured experimentally for a time τ long enough that clock instability  has reached this asymptotic scaling. Therefore, we introduce and consider below a dimensionless prefactor for the asymptotic scaling which gives the Allan deviation in units of ω −1 A (b α /τ ) 1/2 , as shown by the dashed lines in Fig. 9. In the following, we use Eq. (25) to re-evaluate the performance of the optimized interferometers presented in Fig. 2 as the achievable long-term clock instability σ at an averaging time τ . In comparison to the framework of Sec. II the BMSE is replaced by the Allan deviation and the prior width by the interrogation time T . We note that the scaling of the Allan deviation with respect to T is more intricate than the one of the BMSE with the prior width: On the one hand, a large interrogation time means good accuracy in frequency estimation, but on the other hand, it also broadens the prior distribution and therefore degrades the phase estimation. Figure 10(a,b) shows the achievable long-term clock instability σ as a function of the interrogation time T for clocks made of N = 64 atoms and the flicker-noiselimited laser. The purple line (in both panels) represents performance of the conventional clock exploiting Ramsey interferometer with CSS as input, collective spin projection measurement, and a linear estimator, given by the circuit (0, 0). Thus, the shaded area above the purple line roughly defines the performance achievable by classical clocks. In the case of CSS based classical clocks the cost function (10) can be analytically minimized [78] yielding the dimensionless Allan deviation

B. Results of the clock optimization
where ν ≡ (δφ T ) 2 . The expression (26) has two important limits. For small interrogation times and, consequently, small prior widths the performance of the clock is limited by the quantum projection noise of the uncorrelated atoms as σ SQL = (N b α T ) −1/2 . The SQL limited clock instability σ SQL (dashed purple line) decreases as the interrogation time grows. For large interrogation times, b α T ∼ 1, however, the laser noise becomes dominant and generates accumulated phase values exceeding the dynamic range of the atomic interferometer, thus, leading to the laser coherence time limit (CTL) [79] of the clock σ CSS . Between these two limits there exists an optimal interrogation time delivering the minimum Allan deviation σ opt ≡ min T σ which defines the optimal clock performance.
The black dotted line in Fig. 10(a,b) shows the instability of the optimal quantum clock (OQC), σ OQC , exploiting single-shot protocols with the optimal interferometer. The gray shaded region below the black dotted curve is inaccessible to any N -particle clock not using entanglement between different clock cycles for initial state preparations and/or measurements. The laser CTL for the optimal clock in the asymptotic limit of large N can be estimated from Eq. (2) by assuming zero phase estimation error within the [−π, π] interval and (φ) = 4π 2 outside of the interval due to the phase slip The green dotted line in panel (a) shows the laser CTL for the optimal clock, σ OQC CTL . The optimal clock instability at shorter interrogation times demonstrates two distinct scalings corresponding to the two Heisenberg limits discussed in Sec. II F. At very short times, (b α T ) α/2 N −1 , the GHZ state based clock (red line) becomes optimal approaching the instability limit given by the conventional HL, σ HL = N −1 (b α T ) −1/2 (red dashed line). Larger interrogation times correspond to wider prior phase distributions hence the π-corrected HL becomes the limiting factor, σ πHL = πN −1 (b α T ) −1/2 (green dashed line). The optimal quantum clock instability in the limit of large number of atoms, N → ∞, is fundamentally restricted by the interplay between the σ πHL and σ OQC CTL as we will discuss below.
The instabilities of clocks based on variationally optimized interferometers employing quantum circuits of various complexities are shown in Fig. 10(b) with solid color lines. In particular, the orange line corresponds to the SSS based clock, given by the circuit (1, 0). As the circuits depth grows, the enhanced dynamic range of the variational interferometer shifts the laser CTL towards larger interrogation times which in combination with suppressed shot noise reduces the clock instability. The figure shows that variational clocks of growing complexity quickly outperform the SSS clock and approach the optimal quantum clock instability. Beyond the model predictions this improvement is also observed in simulations of a full clock operation using variationally optimized protocols, as shown by the markers in Fig. 10(b). Deviations between theory and numerical results can arise due to a number of different effects. For one, the onset of fringehops for b 2 T ∼ 1 is not included explicitly in the models. Especially for small N a sudden loss of stability, resulting from fringe-hops, can occur before reaching the CTL due to stronger, non-Gaussian measurement noise [78,79]. In contrast, for clocks with larger N and increasing complexity it is expected that the onset of fringe-hops and the minimum of CTL coincide. Another source of discrepancy is the assumption of a laser noise dominated prior width δφ T = (b α T ) α/2 . Propagation of the measurement uncertainty and delay within the feedback control can lead to a broadening of the true phase distribution. Especially protocols which are highly optimized to a particular prior width may thus not achieve their predicted stability in the simulations, e.g. around b 2 T ≈ 0.02 in Fig. 10(b).
Nevertheless, good agreement between the numerically determined instability and the theory prediction is found around the overall optimal protocols.
In Fig. 11 we study optimal instability of the variational clocks σ opt (corresponds to minima in Fig. 10) as a function of the atomic ensemble size N . The CSS clock is represented by the purple line which scales asymptotically as σ CSS opt ∝ N −(3α−1)/(6α) . The scaling is a bit slower than the conventional SQL limit ∝ N −1/2 due to the laser CTL which reduces the optimal interrogation time as N grows. Any classical clock using one-shot protocols with collective spin measurements belongs to the shaded purple region above the CSS clock line. The N -scaling of the optimal quantum clock is shown with the black dotted line for system sizes up to N = 64.
For larger system sizes we show the asymptotic behavior (black dashed line) obtained by combining the noise contributions of the π-corrected HL and the laser CTL, σ asym ≡ min T [σ 2 πHL + (σ OQC CTL ) 2 ] 1/2 . Similar to the classical clock scaling, the laser CTL prevents the optimal quantum clock (OQC) from achieving the Heisenberg scaling ∝ N −1 , instead, leading to a logarithmic correction in the large N limit as found in [101,102]. The present approach allows obtaining tighter bounds on the asymptotic scaling for general α (see App. E). In particular, for the flicker-noise-limited laser, α = 2, the OQC instability scales as with z ≡ 32N 4 /π and the corresponding optimal interrogation time scaling as T OQC opt πb −1 2 ln(z ln z) −1/2 . The gray shaded area below the dashed and dotted black lines is inaccessible to quantum clocks without entangled clock cycles. Finally, the variationally optimized clocks of various circuit complexities are shown with solid color lines and demonstrate scalings approaching the optimal quantum clock as the circuit depth increases.
We have also studied performance of the variationally optimized clocks experiencing individual atomic dephasing during the interrogation period T . Similar to the results of Sec. II E, the optimized clocks perform well for decoherence rates small compared to the laser noise bandwidth, γ/b α 1. For stronger noise, γ/b α 1, the optimized clock instability approaches the one of the classical clock, as expected. We also checked the performance of optimized clocks for other types of laser noise α = 1, 3, and found no significant changes to the results presented above.
In summary, atomic clocks based on variational quantum interferometers with low-depth circuits can approach the performance of the optimal quantum clock in singleshot protocols. The variationally optimized clocks can be readily complemented with more sophisticated interrogation schemes [103,104], eventually also approaching the ultimate quantum bound on the Allan deviation [105,106].

IV. OUTLOOK AND CONCLUSIONS
In this work we have studied optimal Ramsey interferometry for phase estimation with entangled N -atom ensembles, and application of these optimal protocols to atomic clocks. We have considered a Bayesian approach to quantum interferometry, and have defined optimality via a cost function, which in the present study is the BMSE for a given prior distribution or, in the context of atomic clocks, the Allan deviation for a given Ramsey time. The key feature of the present work is that optimization is performed within the family of operational quantum resources provided by a particular programmable quantum sensor platform. Thus identifying the optimal quantum sensor is recast as a variational quantum optimization where the entangling circuits generating the optimal input state, and the decoding circuits implementing the optimal generalized measurement are variationally approximated with the given resource up to a certain circuit depth. We have presented two model studies: in our first model, we considered one-axis twisting as quantum resource; our second model uses finite range interactions as entangling operations. Our examples demonstrate that already low-depth circuits provide excellent approximations for optimal quantum interferometry. We emphasize that the familiar discussions of interferometry with spinsqueezing and GHZ states are included as special cases. Furthermore, advanced measurement strategies including adaptive measurement and quantum phase estimation are not advantageous for the present problem, as a von Neumann measurement has been proven optimal.
Given advances in building small atomic scale quantum computers, or programmable quantum simulators which can act also as quantum sensors, the variational approach to optimal quantum sensing provides a viable route to entanglement enhanced quantum measurements with existing experimental entangling, possibly non-universal resources, and optimizing in presence of noise. Indeed trapped ions with Mølmer-Sørensen entangling gates, and optical arrays interacting via Rydberg finite range interactions or cavity setups provide the necessary ingredients for implementing such variational protocols, and quantum sensors. While first generation experiments might demonstrate optimal Ramsey interferometry for a specified dynamic range of the phase, and optimization of quantum circuits 'on the quantum sensor' for various circuit depths (Sec. II), the present work also promises application of variational quantum sensing on existing quantum sensors, in particular atomic clocks (Sec. III). The guiding principle behind the present work of identifying for a sensing task the optimal sensing protocol given the quantum resources provided by a particular sensor and sensor platform, is, of course, general and generic, and applies beyond Ramsey interferometry, and beyond the BMSE as cost function.
As an outlook, we emphasize that the search for optimal sensing can also be run directly as a quantum-classical feedback loop on the physical quantum sensor. This offers the intriguing possibility of optimizing with given quantum resources and in presence of imperfections of the actual device, which might include control errors and noise. Further studies are needed to explore best optimization strategies of the cost function on the classical side of the optimization loop given the limited measurement budget on the programmable quantum sensor. This applies to both the initial global parameter search, supported by theoretical modeling, and small iterative readjustments of optimal operation points due to slow drifts of the quantum sensor.
Optimization on the (physical) quantum sensor can also be performed in the regime of large particle numbers N , which might be inaccessible to classical computations, i.e. in the regime of quantum advantage. Hybrid classicalquantum algorithms have been discussed previously as variational quantum eigensolvers for quantum chemistry and quantum simulation, where 'lowest energy' plays the role of the cost function which is evaluated on the quantum device. In contrast, in variational quantum sensing we optimize quantum circuits in view of an 'optimal measurement' cost function, and it is the (potentially large scale) entanglement represented by the variational manyparticle wavefunction in N -atom quantum memory, which provides the quantum resource and gain for the quantum measurement.
Note added. After submission of the present manuscript, Ref. [107] reported an experimental implementation of variationally optimized Ramsey interferometry in a systems of up to N = 26 trapped ions, in one-to-one correspondence to the present theoretical work. This includes demonstration of quantum enhancement in metrology beyond squeezing through low-depth, variational quantum circuits, and on-device quantum-classical feedback optimization to 'self-calibrate' the variational parameters. In both cases it is found that variational circuits outperform classical and direct spin squeezing strategies under realistic noise and imperfections. To present results in Sec. III in dimensionless units, we follow [78] and define an effective bandwidthb via where σ L is the Allan deviation of the uncorrected reference laser. For a laser that is mainly limited by a single power spectral density component, i.e S L (f ) = h 1−α f 1−α one can unambiguously express the bandwidth in terms of the prefactors h 1−α in the power spectral density and the respective Allan deviation [108], so that Numerical simulation of the clock feedback loop [78] reveal that the dimensionless time b α T is related to the prior distribution width of a stabilized clock by the relation (δφ) 2 = (b α T ) α , where b α = χ(α) 1/αb α is a rescaled bandwidth, differing fromb α only by an empirically determined prefactor χ ≈ 1, 1.8, 2 for α = 1, 2, 3. For a laser spectrum containing all three contributions Eq. (A1) can still be used to determine an effective bandwidth, and servo loop simulations of the clock can reveal the modified time dependence of the prior distribution width enabling one to extend the clock model to realistic laser noise parameters.
Appendix B: Spin x-parity in entangling and decoding circuits We consider global rotations R µ , OAT interactions T µ (see Sec. II B) and finite range dressing interactions D µ (see Sec. II G) with µ = x, y, x as resources for the variational optimization. Within is this set of resources we are able to ensure an anti-symmetric estimator by imposing invariance under the spin x-parity P x on the Entangler and Decoder, i.e. P x U En R y (−π/2)P x = U En R y (−π/2) and P x U De P x = U De under the spin x-parity P x = R x (π/2), since this implies where we use that P x J x P x = J x , P x J y,z P x = −J y,z , P † x = P x and P x R y (π/2) |ψ 0 = R y (π/2) |ψ 0 . The most general entangling and decoding sequences satisfying these constraints are used in Eq. (6),(7) and displayed in Fig. 1.  [109].
To obtain the Wigner distribution, the operator is expanded in terms of spherical tensors where j k j −m q m denotes the Wigner 3j symbol. O can be represented in the spherical tensor basis where c k,q = Tr OT k,q . Replacing T k,q in this representation by spherical harmonics Y k,q (θ, φ), one arrives at the Wigner distribution, as a quasi-probability distribution on a generalized Bloch sphere.
The Wigner function can be used to calculate the expectation value by integrating the overlap of the respective Wigner functions over the generalized Bloch sphere. This implies that we can interpret contours of the measurement distribution with the different eigenvalues of the measurement operator while the amplitude of the state distribution indicates how much the state overlaps with the respective projection of the measurement projection.
Appendix D: Numerical optimization of the phase-operator based interferometer Here we define the phase operator and describe an iterative optimization procedure allowing us to minimize the cost function (2) for a given observable using the Minimal Mean Squared Error (MMSE) estimator [3]. The phase operatorΦ reads [87,88]: where J z |m = m |m .
Our goal is to minimize the cost function Eq. (2) for the observableΦ and the MMSE estimator by finding the optimal initial state |ψΦ . The MMSE estimator reads [3]: where the conditional probability is p(φ|s) ∝ p(s|φ)P(φ) with p(s|φ) = | s|e −iφJz |ψ in | 2 and the observable eigenstate |s defined in Eq. (D3). The optimization is performed iteratively. Initially we start with s = 0 eigenstate ofΦ as the input state |ψ (0) in = |s = 0 , which is a good approximation for a state highly sensitive to phases around φ = 0. The state defines the corresponding MMSE estimator φ MMSE est(0) (s) as given by Eq. (D4). In the next iteration we find the state |ψ (1) in minimizing the cost function (2) for the given φ MMSE est(0) (s) estimator by solving a corresponding eigenproblem, as described in [31]. The iterative procedure converges quickly yielding the optimal initial state for the POI |ψ (k) in → k→∞ |ψΦ which, in turn, defines the optimal estimator via Eq. (D4) and the corresponding posterior width ∆φ POI . This result is used in Sec. II D.
Appendix E: N -scaling of the optimal quantum clock instability Here we derive asymptotic scaling of the optimal interrogation time and the corresponding minimal instability of the optimal quantum clock. As discussed in Sec. III, the instability of clocks exploiting single-shot protocols is fundamentally limited by the measurement shot noise given by the π-corrected HL for short interrogation times T , and the laser CTL for large T . For the dimensionless Allan variance we write where s ≡ π − 2 α b α T is the dimensionless Ramsey time. The goal is to minimize Eq. (E1) with respect to s in the limit of large number of atoms, N → ∞. The derivative with respect to s reads and, using a self-consistent assumption for optimal time s * 1, results in the following equation for s * , Here we used the error function asymptotic 1 − erf(x) → e −x 2 /( √ πx) for x → ∞. Taking the logarithm of the expression (E2) (s * , α, and N are positive) we obtain an equation for w ≡ s −α * , w − ln w = ln z, with z ≡ 8α 2 N 4 /π. For z > e, the solution can be written as the infinitely nested logarithm, w(z) = ln(z ln(z ln(z(ln . . .) . . .))), and can be checked by direct substitution. Using the w(z) function we can express the optimal Ramsey time for N 1 as follows Finally, we substitute the optimal Ramsey time into Eq. (E1) We use Eqs. (E3) and (E4) and keep only the first two logarithms in the definition of w(z) to obtain expressions for the optimal interrogation time and minimal instability of the optimal quantum clocks in Sec. III for α = 2.
Appendix F: Finite dead time in the atomic clock protocol Here we discuss upper limits to the dead times of atomic clocks, which are required to reach the variationally optimized stability presented in Sec. III. When each interrogation cycle of duration T C = T D + T is composed of a dead time T D > 0, and Ramsey free evolution time T , the stability is reduced compared to the ideal case at T D = 0 discussed in the main text.
Let us consider S L (f ) = h −1 f −1 as the power spectral density of the free running laser. In addition, we assume that the protocols are sensitive to phase shifts during T only and that all entangling and decoding operations are included in the dead time where we assume no sensitivity. Given these assumptions, the instability contribution of the Dick effect is [110] with χ given in App. A and the duty cycle d = T /T C . In addition, the instability predicted in the Bayesian framework, Eq. (24), becomes with σ as defined in Eq. (25). In the following we want to estimate below which level of dead time the combined instability σ y (τ ) = σ 2 Bay (τ ) + σ 2 Dick (τ ) is no longer dominated by the contribution of the Dick effect. The minimal required duty cycle d min where the value for σ 2 Dick (τ ) at optimal Ramsey time b 2 T opt dives below the lowest variational instability is sin 2 (πnd) π 2 n 3 ≤ σ 2 opt .
(F3) From d min one can directly infer the maximum fraction R = T D,max /T C = 1 − T opt /T C = 1 − d min of dead time in the clock cycle, where T C = T opt + T D,max . In the limit R 1 it can be shown that − ln(R)R 2 /(1 − R) 2 ∝ (b 2 T opt )σ 2 opt , so it is expected that for N 1 this ratio will eventually follow a similar scaling as σ 2 opt . The exact relation is shown in Fig. 12. It is worth noting that R 1 is still recommended for small ensemble sizes, even though this condition is not required based on d min , to prevent unnecessarily increasing the clock instability. A more complete model for the influence of dead time and the Dick effect requires to include the full spectral density S L (f ) of the laser and evaluating the sensitivity function during the entangling and decoding dynamics. In order to see how well σ [Eq. (25)] reflects an achievable instability we perform numerical simulations of all essential parts involved in the closed feedback loop of an optical atomic clock when operating with the variationally optimized Ramsey protocols.
Building up the simulations proceeds as follows: (i) The free-running laser is simulated. Given a particular spectral density S L (f ) = h 1−α f 1−α and the Ramsey time T we generate a sequence of random numbers y k = 1 T t k +T t k dt[ω L (t) − ω A ]/ω A which gives the average frequency fluctuations of the laser without any feedback in each cycle k. Correlations between different cycles, required when α = 1, can e.g. be obtained in the time domain by implementingȳ k as a random walk or a sum of multiple damped random walks [78].
(ii) To stabilise the laser frequency for long averaging times τ T a feedback correction is applied to the laser frequency at the end of each cycle. In the simulations, the estimated frequency deviationȳ est,k = m k /(2πω A T ∂ φm (φ) |φ=0 ) obtained from measurement result m k at t k is multiplied by a gain factor 0 < g ≤ 1 and subtracted from the true laser frequency. This integrating servo corrects frequency errors over ∼ 1/g cycles and is sufficient to achieve a robust stabilization at τ /T 1/g for flicker noise limited lasers [79]. However, to simulate the quantum probabilities p(m|φ k ) at t k the phase φ k = ω A Tȳ k based on the actual laser noiseȳ k is needed. Thus, later measurements are affected not only by the noise of the free-running laser but also by the measurement results and corrections from earlier cycles. To implement this efficiently, the simulation runs sequentially: At the beginning the phase φ 1 is calculated for the first cycle only. Then the probabilities p(m|φ 1 ) with this particular phase are calculated and a single measurement result m 1 is sampled according to this distribution. The estimator y est,1 is calculated and the servo corrects the laser frequency so thatȳ 2 =ȳ 2 − gȳ est,1 is the actual noise in the second cycle. This procedure is repeated in each cycle with the corrected frequencies, meaning e.g. φ 2 = ω A Tȳ 2 .
(iii) The clock stability is evaluated, based on the simulated sequence of stabilized frequency deviationsȳ k . The overlapping Allan deviation σ y (τ = nT ) is calculated numerically from averages over n cycles. Statistical averaging is performed over many intervals of length n in a single run with n tot n cycles and then averaging again over multiple runs. Finally, the long term instability is extracted by fitting the prefactor to the asymptotic scaling σ y (τ ) ∝ τ 1/2 reached typically after n ∼ 10 4 cycles in simulations of n tot = 2 × 10 6 cycles.
To compare numerical results to theory predictions, as in Fig. 10(b), the values for T and h 1−α in the simulations are matched to reproduce the same laser induced prior width (δφ) 2 = (b α T ) α .

Appendix H: Cumulative interaction angle
A relevant question regarding the Dick effect is the time it takes to perform the entangling and decoding sequence. The slowest time scale on a quantum simulator is usually the interaction strength. Results presented in Fig. 2, 10 were obtained for interaction angles ≤ π/2. From a practical point of view, however, it might be beneficial to consider smaller interaction angles.
Here we show that, close to the respective minima in Fig. 2, 10, the displayed results of the variationally optimized interferometers can be well approximated by quantum circuits with small cumulative interaction angles θ OAT = nEn k=1 θ . In Fig. 13 we constrain each interaction angle to be positive and smaller than a threshold that decreases with The cumulative angle of all one axis twisting gates Tx,y required to obtain the dimensionless Allan deviations displayed above. The vertical dashed line indicated the interaction angle of π/2 required to prepare a GHZ-state. the depth of the circuit. In addition we require that the cumulative interaction angle θ OAT is always smaller or equal than π/2, the interaction angle required to prepare a GHZ-state. Similarly to the OAT squeezing [73], the variational sequences can also work with a cumulative interaction that decrease rapidly with N while the resulting Allan deviation remains a good approximation of the unconstrained optimization in Fig. 11.