Optimizing one-axis twists for variational Bayesian quantum metrology

Quantum metrology and sensing seek advantage in estimating an unknown parameter of some quantum state or channel, using entanglement such as spin squeezing produced by one-axis twists or other quantum resources. In particular, qubit phase estimation, or rotation sensing, appears as a ubiquitous problem with applications to electric field sensing, magnetometry, atomic clocks, and gyroscopes. By adopting the Bayesian formalism to the phase estimation problem to account for limited initial knowledge about the value of the phase, we formulate variational metrology and treat the state preparation (or encoding) and measurement (or decoding) procedures as parameterized quantum circuits. It is important to understand how effective various parametrized protocols are as well as how robust they are to the effects of complex noise such as spatially correlated noise. First, we propose a new family of parametrized encoding and decoding protocols called arbitrary-axis twist ansatzes, and show that it can lead to a substantial reduction in the number of one-axis twists needed to achieve a target estimation error. Furthermore, we demonstrate that the estimation error associated with these strategies decreases with system size in a faster manner than classical (or no-twists) protocols, even in the less-explored regimes where the prior information is limited. Last, using a polynomial-size tensor network algorithm, we numerically analyze practical variational metrology beyond the symmetric subspace of a collective spin, and find that quantum advantage persists for the arbitrary-axis twist ansatzes with a few one-axis twists and smaller total twisting angles for practically relevant noise levels.

Quantum metrology and sensing seek advantage in estimating an unknown parameter of some quantum state or channel, using entanglement such as spin squeezing produced by one-axis twists or other quantum resources.In particular, qubit phase estimation, or rotation sensing, appears as a ubiquitous problem with applications to electric field sensing, magnetometry, atomic clocks, and gyroscopes.By adopting the Bayesian formalism to the phase estimation problem to account for limited initial knowledge about the value of the phase, we formulate variational metrology and treat the state preparation (or encoding) and measurement (or decoding) procedures as parameterized quantum circuits.It is important to understand how effective various parametrized protocols are as well as how robust they are to the effects of complex noise such as spatially correlated noise.First, we propose a new family of parametrized encoding and decoding protocols called arbitraryaxis twist ansatzes, and show that it can lead to a substantial reduction in the number of one-axis twists needed to achieve a target estimation error.Furthermore, we demonstrate that the estimation error associated with these strategies decreases with system size in a faster manner than classical (or no-twists) protocols, even in the less-explored regimes where the prior information is limited.Last, using a polynomial-size tensor network algorithm, we numerically analyze practical variational metrology beyond the symmetric subspace of a collective spin, and find that quantum advantage persists for the arbitrary-axis twist ansatzes with a few one-axis twists and smaller total twisting angles for practically relevant noise levels.

I. INTRODUCTION
Sensors that utilize quantum mechanical effects to increase sensing precision and accuracy are expected to be among the first quantum information technologies to achieve broad application [1].The study of the abilities of such sensors is referred to as quantum sensing or quantum metrology.When the system being used as a sensor is a collection of qubits or spin-1/2 particles a quintessential problem is qubit phase estimation, or rotation sensing [1,2].Many applications can be understood as instances of this problem including magnetometry with neutral atoms [3], atomic clock stabilization with trapped ions [4] or with neutral atoms [5,6], nuclear magnetic resonance gyroscopes [7], Rydberg atom electric field sensors [8], and many other examples.This paradigm may even be relevant to quantum algorithms [9].
It has been known for a long time that using entangled many-body states can in principle lead to sensing precision that would not have been possible with an unentangled state of the same system [1,[10][11][12][13][14][15].Typically, the available advantage is quadratic in system size, leading to the famous Heisenberg limit [12].A useful type of entanglement for rotation sensing with spin-1/2 particles is spin-squeezing which can be generated, for example, via one-axis twisting type interactions [16,17].These interactions have been engineered in Bose-Einstein condensates [18], cavity QED setups [19], trapped ions [20], and approximately in Rydberg atom arrays [21,22].When one-axis twisting is used in only the state preparation phase the available advantage scales with system size to the power of 5/3 [17].If one-axis twists are used in both the state preparation and measurement phases, as in twist-untwist protocols [23][24][25][26][27][28][29][30], the available advantage scales quadratically.These protocols have been realized experimentally for example in [29].
First, in order to optimize the Bayesian mean squared error, we propose a new family of parameterized circuits, called arbitrary-axis twist ansatzes, where the entangling capacity of one-axis twists is effectively optimized with FIG.1: A schematic representation of a general quantum metrology experiment.Time advances from left to right.
the help of global rotations about arbitrary axes.We find that performance comparable to that of previous approaches can be found with a reduced number of oneaxis twists, especially in the few twist regime.Second, we study whether these schemes remain effective in the presence of complex noise.To this end, we utilize a tensor network simulation technique which takes the permutation invariance of the noiseless evolution as a starting point to exactly simulate the protocol in the presence of correlated dephasing noise during the free evolution.We also study the effect of circuit level noise associated with the one-axis twists.The organization of the paper is as follows.In Sec.II, we review the necessary background.In Sec.III, we describe our arbitrary-axis twist ansatzes and evaluate their effectiveness.Finally, in Sec.IV, we study the robustness of our ansatzes to various types of noise.
The goal of single parameter quantum metrology [1, 10-15, 74, 75] is to optimally estimate a parameter φ associated with some quantum channel Λ φ .Since we are interested in optimal schemes for estimating the parameter, we must specify an objective function by which different schemes may be compared and a feasible set that describes schemes we are willing to consider.The exact form of the objective function is typically dictated by prior knowledge about the value the parameter may take.One method to properly account for prior knowledge (or lack thereof) about the value of φ is to treat φ as a random variable and pursue a Bayesian statistics approach to the problem.Now knowledge about φ before the experiment is preformed can be captured by the prior distribution for this random variable p(φ).A useful objective function is then given by the Bayesian mean squared error (BMSE) where the {ζ} are measurement outcomes, φ est (ζ) is an estimator for the value of φ, and p(ζ|φ) is the probability of measurement outcome ζ conditioned on the value of φ.This quantity is also referred to as the averaged estimation error.Such an approach is often called a global approach since it evaluates performance based on estimation error for all values of φ.In contexts where Λ φ [•] = Λ φ+2π [•], as we will consider below, this integral is sometimes taken to run for 0 to 2π instead.Whether or not this choice makes sense depends on the physical origin of the channel Λ φ [•].For example, a spin- 1  2 particle exposed to a strong magnetic field for a fixed amount of time may acquire a relative phase between its spin-up and spin-down states that is equal to 2πn + θ for some large integer n.This gives the same state as would a relative phase θ which may arise from a much weaker magnetic field but the underlying physical situation is clearly different.Further, when δφ is not too large the probability that |φ| > 2π is small and the two approaches should give similar results.In this paper, the integrals in all BMSEs run from −∞ to ∞.This is in contrast to approaches based on the quantum Fisher information [12,75] which tells us about the possible variance of the estimator at some particular parameter value φ 0 .To gain some understanding of the relationship between this averaged estimation error and the Fisher information, note that the BMSE can be written in terms of the variance and bias as where the variance and bias are defined by and ⟨•⟩ denotes the average over measurement outcomes.So this roughly indicates that the Fisher information approach should be appropriate when an estimator can be found that is unbiased or has small bias for all possible values of the parameter φ.Throughout this paper, we will take the BMSE as the objective function and we will take the prior to be a Gaussian distribution with mean zero and standard deviation δφ.
The basic structure of a quantum metrology experiment is outlined in Fig. 1.The first step is the preparation of some initial fiducial state ρ fid .Throughout this paper, we take this state to be the spin coherent state ρ fid = |+⟩⟨+| ⊗N .An "encoding" channel, at least partially controlled by the experimenter, is then applied to this state to produce a probe state ρ probe = Λ En [ρ fid ].By "partially controlled" we mean, for example, that the intended probe encoding may consist of unitary dynamics but the quantum channel that is actually realized may be a more general completely positive trace-preserving (CPTP) map due to experimental limitations.This situation is considered in Sec.IV.The probe state then undergoes a "free evolution" according to the channel Λ φ [•] that is not controlled by the experimenter and depends on the parameter to be estimated.The resulting state now generally depends on the parameter ρ φ = Λ φ [ρ probe ].We will consider channels that reduce to Λ φ [•] = e −iφJz • e iφJz in the limit that noise is negligible.After the encoding of the parameter φ, the experimenter can apply a "decoding" channel Λ De [•] to produce the state ρφ = Λ De [ρ φ ].At this point, the state is measured according to some positive operator-valued measure (POVM) with measurement operators {E ζ } giving outcome ζ with probability We always take the measurement operators associated with this measurement to be projectors onto the space of states with Hamming weight w, i.e.P w = x s.t.|x|=w |x⟩⟨x| where x ∈ {0, 1} N and |x| is the Hamming weight of x, i.e. the number of entries in the bit string x that are equal to one.This constitutes a measurement of the observable J z .Note that these operators respect a permutation symmetry and can be implemented experimentally without the ability to detect each particle independently.
The final step in this type of experiment is to process the measurement outcome into an estimate φ est (w) of the phase φ.We take this estimator to be φ est (w) = a 2 (N − 2w) where a is a proportionality constant.The average of this estimator over measurement outcomes is proportional to the expectation value of J z in the state ρφ , where we have used the fact that since each qubit in |0⟩ increases the Z-angular momentum by 1 2 and qubit in |1⟩ decreases it by 1 2 .A permutationinvariant measurement operator such as this is known to be optimal in the absence of noise for this type of free evolution [31,33].
The optimal value of the proportionality constant a can be found by first expanding the quadratic form in Eq. ( 1) and performing the averages over the measurement outcomes and the prior distribution to obtain where • avg denotes the average with respect to the prior p(φ).Optimizing this expression with respect to a gives the optimal value of the proportionality constant This value is fixed by the encoding and decoding procedures and the free evolution.Notably, the value of a opt depends on any noise that may be present.While the resulting estimator is not optimal over all estimators, it is simple to compute and the use of a linear estimator eases comparison to prior work [34,69] where consideration was also restricted to linear estimators.The optimal value of the BMSE over all possible probe states, measurements, and estimators can be found via an iterative optimization procedure [33,76,77].In this optimization procedure, first the probe state is fixed and the measurement is optimized for that state.Then the measurement is fixed and the state is optimized for that measurement.This is then repeated until the procedure converges.Since in the absence of noise the optimal strategy is permutation-invariant [31], this optimization can be restricted to the N + 1 dimensional permutationinvariant subspace.Unfortunately, however, the output of this optimization does not provide a procedure for realizing this optimal performance with any particular set of resources.

B. Practical resources for quantum metrology
We now turn the the question of what should constitute the feasible set of strategies.In many experimental setups the available resources include global rotations and one-axis twists where n is a 3 dimensional unit vector and J = J x e x + J y e y + J z e z [17,18,21,22].Here e µ is the unit vector directed along the µ-axis.We will frequently use the FIG. 2: Illustration of the general form of (a) encoding and (b) decoding layers used in the parity symmetric ansatzes [34,57].In this diagram, operations further to the left occur first.
short hand notations R µ (θ) ≡ R eµ (θ) and T µ (θ) ≡ T eµ (θ) for global rotations and one-axis twists along the x, y, or z -axes respectively.Measurements of the total angular momentum of the ensemble along some axis are also commonly possible.Accordingly, we will take the feasible set of strategies to be defined by the set of probe states that can be prepared from |+⟩ ⊗N by a specified sequence of global rotations and one-axis twists and the set of POVMs that can be implemented via a (possibly different) specified sequence of global rotations and one-axis twists ending with a global π 2 x -rotation R x ( π 2 ) and followed by a measurement of the z -component of the total angular momentum.Note that all of the entanglement used in these schemes comes from the one-axis twists.These are the resources used in twist-untwist protocols [23][24][25][26][27][28][29][30].In these protocols, the encoding operation consists of a one-axis twist about the z -axis by 1 √ N followed by a rotation about the x -axis by π 2 .The decoding is then the inverse of the encoding operation followed by a final rotation about the x -axis by π 2 .

C. Variational metrology and parity symmetric ansatzes
The basic idea of quantum variational algorithms is to prepare, on the quantum device, a classically parameterized state by applying a series of unitaries that depend on classical parameters to some fiducial quantum state following which some efficient measurement is preformed on the resulting state.The outcome of this measurement, or a series of such measurements, is then used to update the values of the classical parameters.The sequence of unitaries used here is referred to as a parameterized circuit or an ansatz.We will replace both the encoding channel and the decoding channel described above with appropriate parameterized circuits and then numerically find the optimal values for the classical parameters.
We are interested primarily in studying the perfor- FIG. 3: Illustration of the general form of the unitaries that are appended to the (a) encoding and (b) decoding circuits to increase the number of one-axis twists once at least one is in use in the arbitrary-axis twist ansatzes.
In this diagram, time proceeds from left to right.
mance of two sets of parameterized circuits.The first set was introduced originally by a pioneering work of Kaubruegger et al. [34,57].This family of ansatzes is described in terms of a number of encoding layers L En and a number of decoding layers L De so that the total encoding and decoding circuits can be written . Each layer is composed of two oneaxis twists as defined in Eq. ( 10) and a global rotation as defined in Eq. ( 9).The encoding and decoding layers respectively have the form as illustrated in Fig. 2.After the decoding unitary is applied, an R x π 2 rotation is performed followed by a measurement of the z -component of total angular momentum as described in Eq. ( 5).We call these the parity symmetric ansatzes since all gates used in the circuit independently commute with the X-parity operator P X = j X j .This property implies that the resulting estimator will be anti-symmetric in φ and that the probe state's average angular momentum will be directed along the x -axis.The second property is important because spin squeezed states oriented along the x -axis are likely to preform well.The member of this family of ansatzes with L En encoding layers and L De decoding layers will be referred to as the PAR 2LEn 2LDe ansatz since 2L En oneaxis twists are used in the encoding and 2L De twists are used in the decoding.The parity symmetric ansatzes were found to achieve nearly the optimal estimation performance allowed by quantum mechanics [33] when the number of layers is sufficiently large, see for reference Fig. 2 in Ref. [34].

III. ARBITRARY-AXIS TWIST ANSATZES A. Definition and motivation
We propose a new family of parameterized circuits for this problem which we refer to as the arbitrary-axis twist ansatzes.The encoding and decoding unitaries are alternations of arbitrary global rotations and one-axis twists about the z -axis.The unitaries always begin and end with an arbitrary rotation so that they effectively implement a series of one-axis twists and global rotations about arbitrary axes followed by an additional rotation.We will label these ansatzes by the number of one-axis twists used in the encoding and the number of one-axis twists used in the decoding.The arbitrary-axis twist ansatz using n En one-axis twists in the encoding and n De one-axis twists in the decoding will be referred to as AAT nEn nDe .For example, AAT 1  1 can be thought of as a family of generalized twistuntwist protocols.The encoding and decoding unitaries used in AAT 1  1 are 1 )T z (θ 1 ), ( 13) 1 ).( 14) As for the parity symmetric ansatzes, we also include a final R x ( π 2 ) rotation at the end of the decoding immediately prior to the J z measurement.Each additional one-axis twists requires three new parameters.In particular, adding an additional encoding or decoding one-axis twist means appending the following unitaries to the end (beginning) of the encoding (decoding) circuit, The form of these appended unitaries is illustrated in Fig. 3. Altogether AAT nEn nDe contains 4 + 3(n En + n De ) circuit parameters.
We note that using sequential rotations about only 2 axes, rather than the standard 3 rotations by Euler angles, is sufficient to make the global rotations arbitrary.To understand why, notice that we can write a one-axis twist flanked by two different arbitrary global rotations as So the number of rotations can be reduced by one.This can be repeated for each set of three rotations.Since the fiducial state is taken to be |+⟩ a parameter can be removed from the first rotation by using the Euler decomposition R z (γ)R y (β)R x (α) and dropping the x -rotation.Similarly, since the final measurement is effectively a measurement of J y , a parameter can be removed from the last global rotation by using the Euler decomposition R y (γ)R x (β)R z (α) and dropping the last rotation.Notably, while it utilizes only the one-axis twists about the fixed z -axis, AAT nEn nDe is the most general ansatz composed of one-axis twists and global rotations with n En twists used in the encoding and n De used in the decoding.This is because the rotations can effectively change the axis of the one-axis twists, so that this family of ansatzes is universal to generate the set of unitaries which are composed of a sequence of one-axis twists about arbitrary axes interspersed with any global rotations.Due to the generality of the arbitrary-axis twist ansatzes, AAT nEn nDe will preform at least as well as PAR nEn nDe .However, PAR 2nEn 2nDe has 3(n En + n De ) parameters while AAT 2nEn 2nDe has 4 + 6(n En + n De ) parameters.Thus, comparisons between the members of these two families of ansatzes with a similar number of parameters represent a comparison between the power of increased freedom in rotations and increased entanglement.

B. Noiseless performance
To analyze the performance of these ansatzes we begin by numerically optimizing the circuit parameters for several of the low depth arbitrary-axis twist ansatzes over a wide range of values of the prior variance δφ.While the simulation of the quantum evolution is efficient due to the permutation invariance in the ideal setting (and lowbond dimension matrix product state representations in the noisy setting), the classical optimization of the circuit parameters can be a major limitation as the number of circuit parameters is increased.Accordingly, we consider a system size of N = 30 throughout this section.See App.A for details on how the numerical optimization was preformed.In Fig. 4a, we plot the Bayesian mean squared error we obtain vs. the prior standard deviation.In particular, we compare AAT 0 1 , AAT 1 0 , AAT 1 1 , AAT 1 2 , and AAT 2  1 .For reference, we also include curves corresponding to the best entanglement free (i.e.classical) approach and the optimal approach allowed by quantum mechanics [33].We observe a first dramatic improvement in performance when moving from the classical approach to AAT 1 0 .A second large improvement occurs when moving from AAT 1 0 to AAT 1 1 .Finally we notice another significant improvement when moving from AAT 1  1 to AAT 1 2 .This is indicates that, once there is a single encoding twist, the most promising place to look for improvement is in adding twists to the decoding.This also makes sense in light of previous observations that the optimal probe states are well approximated by spin squeezed states in this case [33].
We now focus on the regime where the largest im-provement over the classical approach is possible, i.e. 0.5 ≲ δφ ≲ 0.9.In this region, we examine how increasing the depth of the decoding circuit effects the BMSE.A comparison for AAT 1 1 , AAT 1 3 , AAT 1 4 , and AAT 1 5 is shown in Fig. 4b.We observe that as the circuit depth is increased the optimized estimators approach the optimal value of the BMSE.However, we also note that increasing the circuit depth appears to provide diminishing returns as the optimal is approached.While this is to some extent inevitable due to the existence of a lower bound, we observe strikingly little improvement in the transition from AAT 1  4 to AAT 1  5 .Due to these diminishing returns it is reasonable to consider circuits with only a few one-axis twists in at least some cases.
We next examine the origin of the improvement of AAT 1  2 over AAT 1 1 .To do this we plot, in Fig. 4c, the variance of the optimized estimators at δφ ≈ 0.74 over a range of values of φ for AAT 1  1 , AAT 2 1 , and AAT 1 2 .We note that neither AAT 1 2 nor AAT 2 1 exhibit dramatic improvement over AAT 1  1 .In Fig. 4d, we plot the bias squared of the optimized estimators over measurement outcomes at the same value of δφ and same range of φ for AAT 1  1 , AAT 2 1 , and AAT 1 2 .We observe that the estimator associated with AAT 1  2 displays a larger region of low bias than the other two ansatzes.This is in line with previous observations that increasing the depth of the decoding circuit primarily serves to decrease bias [69].It is also not surprising that a decrease in bias should be the most effective way to obtain a smaller BMSE since we are examining the case where the prior variance is fairly large.

C. System size dependence and scaling
Next, we examine how the performance of our ansatz compares to that of the optimal and classical strategies as the system size is increased with an emphasis on scaling.The quantum van Trees [34,42,[78][79][80], or Bayesian quantum Cramér-Rao, inequality states that where F Q is the quantum Fisher information averaged over the prior distribution and F P is the Fisher information of the prior distribution.In the absence of noise, the quantum Fisher information is independent of the value of φ and thus this averaged Fisher information satisfies The Fisher information of a Gaussian distribution is given by Motivated by this, we simplify the curve fitting process by considering a shifted version of the BMSE defined by FIG. 5: Plots of the shifted Bayesian mean squared error ∆ φ2 vs. system size N for prior standard deviations of (a) δφ = 10 −3 and (b) δφ = 0.7.The markers on the solid curves correspond to the performance of the strategies the dashed lines are the fitted curves.In (a) the optimal, AAT 1 2 , twist-untwist (TuT) strategies are all fit the curves with exponents ν ≈ 2.0, the AAT 1 0 is fit to a curve with ν ≈ 1.7, and the classical strategy is fit to a curve with ν ≈ 1.0.In (b) the optimal strategy is again fit to a curve with ν ≈ 2.0, the AAT 1  1 strategy is fit to a curve with ν ≈ 0.73, and the classical strategy is fit to a curve with ν ≈ 0.45.
The fitting is further simplified by subtracting out the excess mean-squared error associated with phase slips, i.e. times when the ϕ < π or ϕ > π.When this happens the mean-squared error is approximately 4π 2 .Further, the probability of this occurring is 1 In order to study the scaling, we then fit ∆ φ to a curve of the form where α and ν are fitting parameters.When performing the fitting we exclude points where optimization becomes stuck in a local minimum.
The results are shown in Fig. 5 for prior standard deviations of (a) δφ = 10 −3 and (b) δφ = 0.7.The markers on the solid line indicate the performance of the strategy at a particular system size and the dashed lines indicate the fittings.In Fig. 5 (a), where the prior standard deviation is small, the BMSE is dominated by variance and the optimal and classical curves respectively exhibit the familiar Heisenberg ν ≈ 2 and standard quantum limit ν ≈ 1 scaling.Our AAT 1 0 shows a scaling in good agreement with that expected of spin squeezed states ν ≈ 1.7 ≈ 5 3 [17].Our AAT 1 1 ansatz exhibits Heisenberg scaling for large enough system sizes.For reference, we also plot the performance of the twist-untwist (TuT) protocol described in [25].The twist-untwist protocol exhibits comparable but slightly worse performance than our AAT 1  1 ansatz which makes sense given that the AAT 1  1 ansatzes are generalized twist-untwist protocols.
In Fig. 5 (b), where the prior standard deviation is much larger, we find that the exponents of the optimal strategy continues to exhibit Heisenberg scaling as expected from the π-corrected Heisenberg limit [38].However, while the optimal scaling is independent of the prior standard deviation, the variational strategies only achieve the optimal performance at large prior standard deviation if a large number of layers are used.For this reason, we see a decrease in the scaling exponent when the prior standard deviation is increased but the number of layers is kept small.The scaling of the classical strategy decreases to ν ≈ 0.45.This decrease may be due to the restriction that ⟨φ est ⟩ ∝ ⟨J z ⟩.Our AAT 1 1 ansatz exhibits an intermediate scaling with ν ≈ 0.73.From these observations we infer that even our low depth arbitrary-axis twist ansatzes can achieve better scaling than the corresponding classical strategies over a wide range of values of prior standard deviation.

D. Comparison to the previously considered parity symmetric ansatzes
Finally, we compare the performance of the optimized arbitrary-axis twist estimators to that of the optimized parity symmetric estimators with low and medium depth decoding circuits introduced in [34,57,69].The results of this comparison are shown in Fig. 6.We find that AAT 1  1 achieves a smaller BMSE than PAR 2 2 despite using only half as many one-axis twists.We also observe that AAT 1  4 performs comparably to PAR 2 6 despite the former using only 5 one-axis twists and the latter using 8.We also have some evidence that the optimization of the arbitrary-axis twist ansatzes seems to be eased for numerical optimization.In particular, we find that arbitrary-FIG.6: Plot of the root Bayesian mean squared error ∆φ/δφ vs. prior variance δφ for the optimized ansatzes PAR 2  2 , PAR 2 6 , AAT 1 1 , and AAT 1 4 in the 0.5 ≲ δφ ≲ 0.9 regime for N = 30.AAT 1  1 preforms better than PAR 2 2 and AAT 1 4 preforms about as well as PAR 2 6 .We note that the rightmost PAR 2  6 point is trapped in a local minima in this numerical setting, but can be resolved by a different initialization, for instance.See App.A and C for details on how the optimization was preformed and values of the Bayesian mean squared error that can be obtained with this ansatz and prior variance via other optimization methods.
axis twist ansatzes may be less likely to fail due to local minima compared to the parity symmetric ansatzes with a similar number of parameters.See App.C for more details about the comparison of the performance of these two families of ansatzes and App.B for information about the total twisting times associated with the arbitrary-axis twist strategies.

IV. TENSOR NETWORK SIMULATION FOR NOISY QUANTUM METROLOGY
As the effect of single qubit dephasing noise during the free evolution in variational quantum metrology has been studied previously in both the Bayesian [34] and Fisher information settings [59,62,64,65,71,73], here we focus on the effect of more complex noise such as correlated dephasing models.We also study the effect of circuit level noise during the encoding and decoding circuits.

A. Correlated dephasing
In realistic scenarios, the signal φ to be detected may not be constant over all qubits.We consider the case where the signal φ j at each site can be broken up as φ j = φ + r j where φ is a constant piece at each site and r j is a Gaussian random variable with mean zero.In particular, we take the spatial correlation functions associated with this random variable to be r j r k ≡ C j,k , where • indicates average over the Gaussian distribution that the r j are drawn from and we will assume, as a first nontrivial example, that C j,k = 0 if |j − k| > 1.The physical content of this is that we allow for correlations but only between nearest neighbor sites in 1D.Such an evolution can be thought of as a global z -rotation followed by a correlated dephasing channel.The effective noise channel is then where in the last line we explicitly preformed the multivariate Gaussian integration.Such a noise model may be relevant to cavity QED based experiments in which the spins are arranged in a quasi one-dimensional lattice [19,29,30,81,82].Up until this point, all dynamics have been restricted to the symmetric subspace, i.e. the N 2 ( N 2 + 1) eigenspace of J 2 x + J 2 y + J 2 z .This subspace has dimension N + 1 allowing for efficient simulation.This is a result of the initial states, unitary dynamics, and measurements respecting a permutation symmetry.However, this noise model breaks the permutation symmetry.It has been shown by Chabuda et al. in [77] that this noise model nevertheless has an efficient tensor network representation, and a tensor network contraction was utilized to solve recursively for the optimal Fisher information associated with this noisy free evolution.On the other hand, we propose to compile symmetric subspace techniques to tensor network constructions for permutationinvariant states and one-axis twists and combine them with the above construction of correlated dephasing, so that the correlated noise effects during the free evolution of metrology protocols are simulated efficiently.

B. Matrix product states and operators for permutation-invariant states and one-axis twists
We now review some basic facts about matrix product states (MPSs) and operators (MPOs) [83][84][85][86] and efficient representations of permutation-invariant states and one-axis twists.An MPS representation of a state |ψ⟩ is a representation of the form where the {A (0) [j], A (1) [j]} are matrices, except when j = 1 or j = N in which case they are respectively row and column vectors.The label x j is called the physical index.
The largest dimension of any of the matrices A (s) [j] is referred to as the bond dimension of the representation.Any state of N qubits can be put into an MPS form with a bond dimension of 2 ⌈ N 2 ⌉ .A natural extension of the MPS construction is the MPO.An MPO representation of an operator O is a representation of the form The labels x L j and x R j are the called physical indices and the matrix indices are referred to as virtual indices.All operators on N qubits can be put into the form of an MPO with bond dimension 4 ⌈ N 2 ⌉ .An MPS representation for the state that results from acting an operator with a known MPO representation on a state with a known MPS representation can be determined by contracting the physical indices of the MPO associated with x R with the physical indices of the MPS.If the initial state MPS representation had bond dimension d s with tensors composed of {A (xj ) [j]} and the MPO had bond dimension d o with tensors composed of This allows for the simulation of time evolution by contracting an MPO representation of the time evolution operator with an MPS representation of some initial state.
If the bond dimension of the state remains polynomial at all times, then expectation values of operators that have polynomial bond dimension MPO representations can be computed with polynomial cost.An MPS representation with bond dimension ⌈ N 2 ⌉ + 1 can always be constructed for a permutation-invariant state as follows.Any permutation-invariant state of N qubits can be written as where the coefficients c |x| depend only on the Hamming weight of x.The matrices used in the matrix product state representation at a site j < ⌈ N +1 2 ⌉ have dimension j × (j + 1) and elements given by (1) The matrices associated with site N − j + 1 with j > ⌈ N +1 2 ⌉ are just the transpose of those associated with site j.The remaining site j = ⌈ N +1 2 ⌉ has matrices Analogously, all permutation-invariant operators have a MPO representation with bond dimension O(N 3 ) as described in App.D. We now turn to constructing MPOs for one-axis twists.We focus on the one-axis twist about the z -axis but similar results hold for any operator that is permutationinvariant and can be diagonalized by single qubit operations.A one-axis twist about the z -axis can be written as The key observation is that, as was the case for the permutation-invariant state, the coefficients depend only on the Hamming weight.The MPO matrices for a site j ̸ = ⌈ N +1 2 ⌉ are then given by,

S [j], and
2 ⌉ the matrices are given by and The other unitary element of our ansatzes, the global rotations, can be given a bond dimension one representation.For example, the matrices at each site associated with an x -rotation are where µ, ν ∈ {0, 1}.Accordingly, when used in time evolution, the global rotations do not cause the bond dimension of the state to increase.This is a result of the fact that the global rotations do not involve interactions between spins.Finally, we will need MPO constructions for the operators J z and J 2 z since we must evaluate expectation values of these operators.The operator J z can be given a bond dimension two representation.At sites away from the boundary j ̸ = 1, N , the matrices are given by At j = 1 and j = N , the matrices are respectively δ µ,ν .The operator J 2 z can be given a bond dimension three representation.At sites away from the boundary it has matrices given by At the boundary sites, the matrices are given by A (µ,ν) FIG. 7: Reduction of the root Bayesian mean squared error ∆φ/δφ for the optimized AAT 1  1 ansatz under a correlated dephasing channel of Eq. ( 24) during a free evolution, when N = 30 and δφ ≈ 0.74.The correlated noise is characterized in terms of the variance of r j at each site c 1 and the covariances at adjacent sites c 2 .The curves of constant ∆φ/δφ indicate that correlated dephasing is not much worse than uncorrelated and anti-correlated dephasing is more favorable than uncorrelated.

C. Simulation algorithm
We use these constructions as a basis for simulations of metrology experiments in the presence of noise that breaks the permutation symmetry.If 1D geometrically local noise, for example the correlated dephasing described above, occurs during the free evolution, we introduce a simulation strategy that makes use of both symmetric subspace simulation and matrix product constructions.
Initially, the first three stages in Fig. 1 are simulated in the symmetric subspace so that an (N + 1) × (N + 1) symmetric subspace representation of ρ probe is obtained.Then this is converted to an MPO representation using the construction for permutation-invariant states described in the last section.
Second, if the only entangling resources used are oneaxis twists and the number of one-axis twists is sufficiently small, the decoding, i.e. stages 6 and 7 in Fig. 1, can be dealt with in two ways.The first method is to map each one-axis to an MPO as described in the last section.
The MPOs for all one-axis twists and global rotations used in the decoding can then be contracted to obtain an MPO for the time evolution operator associated with the decoding.Alternatively, if the number of one-axis twists is large enough that the first procedure would lead to an intractable bond dimension or entangling resources other than one-axis twists are used, the decoding unitary may be compiled into a single permutation-invariant unitary as described in App.E. If the depth of the circuit is ℓ, then the cost of this compilation is O(ℓN 7 ) in time.Note that it is not sufficient to work only in the symmetric subspace when compiling the decoding unitary as it will be acting on a state that lies outside of the symmetric subspace.The resulting permutation-invariant unitary can then be mapped to an MPO with bond dimension O(N 3 ) via a construction analogous to the one for permutationinvariant states as noted above and in App.D. The first approach has the advantage of allowing the compiling step to be skipped, while the advantage of the second approach is that by construction the bond dimension of the resulting MPO is guaranteed to be polynomial in system size.In this paper, we utilize the former approach in which the MPOs for individual one-axis twists are independently constructed.
Next, an MPO is constructed for the superoperator associated with the free evolution, i.e. stages 4 and 5 in Fig. 1.The correlated dephasing model described above in Eq. 24 has an MPO representation with bond dimension 2 [77].Since this noise commutes with the unitary part of the free evolution we consider and the unitary part is a global rotation, the entire free evolution superoperator can be given a bond dimension 3 representation.
Finally, the MPOs obtained in these three steps can then be contracted to obtain an MPO of the state immediately before measurement.If the number of one-axis twists used in the decoding does not scale with system size and the MPO for the free evolution superoperator has polynomial bond dimension, either decoding simulation strategy gives an algorithm with cost polynomial in system size.If the number of gates used in the decoding scales polynomially with system size then the compiling strategy continues to give a polynomial cost algorithm.

D. Robustness to correlated noise
We now study the robustness of the strategies found to be optimal in Sec.III to various types of noise.First, in the presence of spatially correlated dephasing as described in Sec.IV A, we analyze how the performance of the solutions found to be optimal in the absence of noise decays.In particular, we take the correlation functions to be so that c 1 is the variance at each site and c 2 is the covariance between adjacent sites.These results were obtained using our tensor network simulation scheme.Due to the polynomial bond dimensions of the involved states and operators described in Sec.IV B there is no truncation of singular values and the results are exact to numerical precision.The effect of this type of noise on the optimal AAT 1 1 noiseless strategy found at δφ ≈ 0.74 for different values of c 1 and c 2 is shown in Fig. 7.Note that while we use the circuit parameters from the noiseless optimization, the proportionality constant a in the estimator is changed to be the optimal one given the noise channel  , and the classically optimal strategy under noisy one-axis twists vs. the noise strength when N = 30 and δφ ≈ 0.74.In (a) and (c) the noise model is dephasing noise with noise strength p and in (b) and (d) the noise model is amplitude damping noise with noise strength γ.The ansatzes with deeper decoding circuits are advantageous only when the noise strength p or γ is sufficiently small.defined by Eq. ( 24).We do not observe a dramatic drop off in performance in the presence of correlated noise.This is indicated by the relatively constant value of the Bayesian mean squared error along curves of constant c 1 + c 2 when c 2 > 0. We also find that the Bayesian mean squared error at constant values of c 1 decreases as c 2 is made more negative.In other words, performance is improved as the noise becomes anti-correlated.This can be understood as resulting from the underlying estimator being proportional to ⟨J z ⟩ when averaged over measurement outcomes.Intuitively, as the estimator is the average of a separate estimator for each qubit, we may imagine some cancellation of the errors if the errors in these estimators are anti-correlated.

E. Circuit level noise
Finally, we study the robustness of the strategies found to be optimal in Sec.III to circuit level noise, namely noises during encoding and decoding circuits.First, we consider the effect of single-qubit dephasing, on each qubit after each one-axis twist, as a model of a noisy one-axis twist.We refer to p as the noise strength.We again employ the tensor network based simulation and again the proportionality constant a used in the estimator is the optimal one given the noise.The results are shown in Fig. 8a.We observe that AAT 1 1 and AAT 1 2 outperform the classical strategy up to a noise strength of at least p = 0.1.Second, notice that AAT 1 2 performs better than AAT 1  1 over the same range.This is not a property shared by the ansatzes with deeper decoding circuits.The deeper circuit ansatzes are all outperformed by AAT 1  2 starting at some noise strength between 0.0025 and 0.0047.AAT 1  3 is outperformed by AAT 1 1 at p ≈ 0.03 and by the classical strategy at p ≈ 0.079.The ansatzes with deeper decoding circuits are outperformed by AAT 1 1 beginning between a noise strength of p ≈ 0.0085 and p ≈ 0.012 and they are outperformed by the classical strategy beginning between at noise strength between p ≈ 0.035 and p ≈ 0.05.In Fig. 8c, the robustness of the parity symmetric ansatzes to dephasing noise is displayed.The behavior is qualitatively similar.We note that AAT and PAR 2  6 for noise strengths greater than about 0.00091.In particular, for noise strengths greater than 0.00091 all of the considered parity symmetric ansatzes are outperforms by atleast one of AAT 1  1 or AAT 1 2 .All of the parity symmetric strategies are outperformed by the classical strategy for error rates greater than p ≈ 0.027 due to the increased number of one-axis twists.
Next, we consider the effect of single qubit amplitude damping.This could arise due to atomic spontaneous emission.The noise channel is given by where γ is the noise strength for this model and σ Again we consider the situation in which each qubit passes through this channel after each oneaxis twist.As for the last noise model, the estimator coefficient used is the optimal one given the noise.The results are shown in Fig. 8b.For this noise model AAT 1 1 still outperforms the classically optimal strategy up to at least γ = 0.1.All deeper ansatzes are outperformed by AAT 1 1 at some noise strength between 0.0095 and 0.028 and by the classical strategy at some noise strength between 0.044 and 0.089.The robustness of the optimized parity symmetric strategies to the same noise is shown in Fig. 8d.Again, the results are qualitatively similar but we note that AAT 1  1 always outperforms PAR 2 2 and AAT 1  2 outperforms PAR 2 4 and PAR 2 6 for noise strengths greater than 0.0047.Above this noise strength all parity symmetric ansatzes are outperformed by atleast one of AAT 1  1 or AAT 1 2 .All of the parity symmetric strategies are outperformed by the entanglement free strategy for values of γ greater than ≈ 0.0056.
The first point of this analysis is to give some indication of when it is worth increasing the circuit depth in a particular device as which circuit preforms the best can be dependent on the noise level.The second is to indicate that the presence of noise during the encoding and decoding can rapidly eliminate the benefit of increased decoding depth.This is another sense in which going to deeper ansatzes provides diminishing returns.

V. CONCLUSION
We have advanced several aspects of Bayesian, or global, variational quantum metrology.We introduced and analyzed a new family of parameterized circuits which we call arbitrary-axis twist ansatzes.These are the most general ansatzes composed of one-axis twists and global rotations with a fixed number of one-axis twists in the encoding and decoding circuits.We found that a reduction in the number of one-axis twists needed for performance comparable to that of a previously considered variational approach based on parity symmetric ansatzes is available.This is especially meaningful in the shallow depth regime where the reduction can be by as much as half.The significance of this finding is complemented by another finding that the usefulness of deeper circuits is harmed by the presence of noise in the encoding and decoding circuits.In the example shown in Fig. 8, we found that for dephasing noise strengths greater than p = 0.00091 or amplitude damping noise strengths greater than γ = 0.0047, all parity symmetric ansatzes considered are outperformed by AAT 1  2 .At larger noise strengths, they are also all outperformed by AAT 1  1 .In addition to being the ansatzes utilizing the fewest one-axis twists, AAT 1  1 and AAT 1 2 are usually also among the ansatzes with the smallest total twisting angles (see App. B).In fact, for the example in Fig. 8 they are the two ansatzes with the smallest total twisting angle.They also start outperforming the other AAT ansatzes at similar noise strengths.Minimizing the number of entangling resources is also useful from a classical simulation perspective.For example, we simulated the effect of spatially correlated dephasing noise during the free evolution using a tensor network algorithm.The cost of this algorithm was reduced due to the small number of entangling one-axis twist gates in the ansatz of interest.
Our findings first show the rather important contributions of non-entangling resources to variational metrology especially in the shallow depth regime.This could be a useful fact in the design of future ansatzes.Second, we found that variational metrology can still perform well in the presence of a small amount of spatially correlated noise.Finally, we also studied the robustness of the ideal strategies to circuit level noise, and found that for deep decoding circuits the performance falls off rapidly with the strength of the noise associated with the one-axis twists.This suggests that in the absence of some form of quantum error correction, variational metrology circuits should be kept shallow.We numerically search for the optimal values of the circuit parameters for a given value of δφ for both the arbitrary-axis twist and parity symmetric ansatzes.We first perform the optimization for AAT 1  1 and PAR 2 2 .Then the circuit parameters found via this optimization are used as the initialization for one level deeper circuits, e.g.AAT 1  2 .The parameters found via this optimization are then used as the starting point for the next level deeper optimization and so on.We refer to this initialization strategy as sequential initialization.
The optimization of each ansatz begins with a Nelder-Mead search [87][88][89].For an optimization over n variables Nelder-Mead constructs an n + 1 point simplex.Roughly speaking, at each step of the optimization it attempts to replace the point in the simplex with the largest objective value with a better point.If a better point cannot be found, this is interpreted as indicating the presence of a valley and the simplex is shrunk.The optimization is stopped when the difference between the largest and smallest value currently in the simplex differ by less than some value ϵ 1 .
We then take the output parameters from the Nelder-Mead search and use them as the initialization parameters for a sequential quadratic programming optimization [90,91].At each iteration, this algorithm finds the minimum of a quadratic approximation to the objective function at the current point.While this approach can provide better performance than linear approximations, it does require access to second derivatives.The stopping criteria is that the minimum objective function value between consecutive iterations differ by less than ϵ 2 .We then take the output circuit parameters from that optimization and use them to initialize an additional Nelder-Mead optimization.This process is repeated until consecutive optimizations agree to within ϵ 3 .
While the values of ϵ 1 , ϵ 2 , and ϵ 3 could in principle be allowed to differ, we fix them all to the same value ϵ = ϵ 1 = ϵ 2 = ϵ 3 and take that value to be ϵ = 10 −13 .The optimization is carried out using the software package NLopt [92] with automatic differentiation provided by Zygote [93].We also note that the numerical integration over φ to estimate the Bayesian mean squared error is preformed using Hermite-Gauss quadrature with 500 integration points during the optimization and 25 integration points in the tensor network simulations.The 25 integration points is close to the amount used in a recent proof of principle experiment [69].The total algorithm is illustrated by the flow chart in Fig. 9.The Hermite-Gauss quadrature is preformed using FastGaussQuadrature package [94] and the tensor network simulations use the ITensors package [95,96].FIG.9: Illustration of the algorithm used to find the optimal strategies.First, the expectation values of J z and J 2 z are estimated at a sufficient number of different values of φ to estimate ∆φ.Then the estimate of ∆φ/δφ and possibly estimates of the derivatives of this function are used to choose new values of the circuit parameters θ to consider.To place in context the method of resource counting utilized in this paper, we briefly comment on the total twisting angle θ twist tot , which is given by the sum of the magnitudes of the angles associated with all one-axis twists used in the ansatz, as a potential alternative to the number of one-axis twists.Although the total twisting angle is a seemingly simple way to quantify the cost of the variational strategies for many experimental setups, in this appendix, we will explain why the number of one-axis twists is indeed both a more suitable and more tractable resource to count.We also note that the strategies considered here do not exhibit large total twisting angles in practice, in fact they are usually less than π 8 .First, in Table.I, we separately display the total twisting angle associated with the encoding circuit θ twist En and the total twisting angle associated with the decoding circuit θ twist De for N = 30 and δφ ≈ 0.74 (our most commonly studied prior standard deviation in the main text).The data suggests that these should be treated as two separate objects rather than two parts of θ twist tot .For example, when the data from this table is fit to a curve of the form  the decoding once a single twist is used in the encoding.In Fig. 10, we plot the total twisting angle vs. decoding depth for the optimized AAT ansatzes with N = 30, n En = 1, and various values of δφ.Usually, AAT ansatzes with a larger number twists result in a larger total twisting angle than AAT ansatzes with fewer twists.In the cases where this is not true we often observe little improvement due to the additional twists.These observations suggest that the number of one-axis twists used by the strategy is indeed a reasonable proxy for the total twisting angle for the AAT ansatzes.
Second, the optimization where the number of twists used in the encoding and in the decoding are separately allowed to vary but the total twisting angle is fixed is possible but could lead to unusual solutions if the number of twists is truly unbounded.In particular, such an approach could result in pathological solutions where the number of twists used becomes very large but the angle of each twist becomes very small.Such a strategy may be challenging to implement in some experimental platforms, for example in ion traps [69].Additionally, it is still guaranteed that at a fixed total twisting angle the AAT ansatzes will outperform the PAR ansatzes as the later are a special case of the former.We also note that fixing the number of twists does provide an upper bound on the total twisting angle used for that optimization.In this sense, the number of twists can be viewed as a coarsegrained measure to capture much of the physics of the dependence of the BMSE on the magnitude of the twisting angles.For completeness, we mention the total twisting angle for the PAR ansatzes for δφ ≈ 0.74.The total twisting angles associated with PAR 2 2 and PAR 2 4 are both about 0.153, and the total angle associated with PAR 2  6 is about 0.194.These total twisting angles are larger than that of AAT 1  2 but smaller than that of AAT 4 .However, the total twisting angle varies for other values of δφ.In particular, the total twisting angle of AAT 1  4 has a range between 0.15 and 0.26 as seen in Fig. 10, so that it can be comparable to that of PAR 2  6 .In particular, for some values of δφ, AAT 1  4 exhibits a smaller total twisting angle than PAR 2  6 .
FIG. 10: Plot of the total twisting angle of the optimal AAT 1 nDe strategies vs. the decoding depth for various values of δφ.Note that there is a clear relation between the two quantities and in the case that δφ ≈ 0.74 the relationship is nearly linear for the depths considered here.

Appendix C: Comparison of arbitrary-axis twist and parity symmetric ansatzes
We note that the result of the optimization, and even which ansatzes outperform others, can depend significantly on the specifics of the hyperparameters used.To study this phenomena we perform the optimization of PAR 2  6 and AAT 1 3 for different sets of hyperparameters.In particular, the hyperparameters we adjust are ϵ, whether the sequential initialization is used or if all circuit parameters are initialized to zero, and the number of points used to approximate the integral over φ.The results of all these optimizations are shown in Fig. 11.We chose these two ansatzes as examples for three reasons.First, they are fairly deep circuits.Second, PAR 2  6 and AAT 1   3   have a similar number of circuit parameters with 12 and 16 respectively.Third, they are an example where which ansatz achieves a smaller BMSE depends on the hyperparameters used.We find that the optimizations are liable to get stuck in local minimia whenever the hyperparameters are relaxed from the values described above.This sensitivity to hyperparameters has the potential to be an issue for experimental on device optimization since it necessitates examining more values of φ for the integration and using more repetitions to increase the estimation precision of ∆φ/δφ.We also note that we have found that the sequential initialization strategy described above is able to at least partially mitigate these issues.We have also heuristically observed that the arbitraryaxis twist ansatzes may be somewhat less susceptible to these issues than the parity symmetric ansatzes.On an optimistic note, most optimizations found reasonably good values for ∆φ/δφ even if they do not find the global minimum.This likely acceptable for some applications. .The blue crosses are the results for initializing all circuit parameters to zero, setting ϵ = 10 −8 , and using only 25 points in the numerical integration.The orange circles are the same as the blue crosses but with 500 points used in the numerical integration.The green triangles are the same as the orange circles but with ϵ = 10 −13 .The red diamonds use ϵ = 10 −8 but sequential initialization.Finally, the purple triangles are the result of using the hyperparameters described in App. A. As they are usually the best, the numerical setting of the purple triangles are displayed throughout the main text.
It has been known for a long time that permutation invariant states of qubits can be represented with bond dimension that scales linearly with system size [83].Here we give a concrete construction that works for any state of qudits of any dimension d in the symmetric subspace and has maximum bond dimension O(N d−1 ).That is, the resulting MPS has bond dimension that is polynomial in the system size.This construction can be extended to MPOs for operators via the Choi-Jamio lkowski isomorphism.
In this construction, the tensor at a site 1 is described by 1×d matrices of the form, (A (s) [1]) 1,m = δ s,m−1 .This vector can be thought of as keeping track of the type vector of some given basis ket in the sense that there is one possible vector with a single entry equal to one and the rest zero here for each possible type vector of a one site system.The remaining tensors associated with sites on the left half of the chain, i.e. those sites with j < ⌈ N +1 2 ⌉ will serve to update this vector with new information about the global type as we move from left to right in that j k=1 A (s k ) [k] should be a 1 × j+d−1 d−1 vector in which each entry is associated with one of the possible type vectors of a j site system.Accordingly, the right dimension of A (s) [j] is j+d−1 d−1 = O(j d−1 ).The sorting of the entries of this vector is in order of increasing values of ||t|| 1 = j t j .For types with the same value of ||t|| 1 , the entries are sorted first in order of decreasing t 1 .For types with the same value of t 1 they are sorted in order of decreasing t 2 and so on.
If s = 0, then (A (0) [j]) µ,ν = δ µ,ν .When s ̸ = 0, the tensors are constructed recursively as (A (s) for some choice of {t n } satisfying n t n = j − 1.The tensors for the right half of the chain j > ⌈ N +1 2 ⌉ are constructed in the same way but with the role of column and row indices exchanged, i.e. they are the transpose.
This leaves a single site unaccounted for at j = ⌈ N +1 2 ⌉.All the coefficients that are needed to describe this operator can be placed in the tensor at this site.The µν matrix element of A (s) [⌈ N +1 2 ⌉] has the value c t if with t (L) + t (R) + ês = t where ês is the type vector with all elements equal to zero except for element s which is equal to one and we use the convention that n m = 0 if m < 0. The result of this construction is an exact MPS or MPO representation of the state or operator with bond dimension ⌈ N +1 Here, we show that a polynomial depth circuit of permutation invariant unitaries can be compiled into a single permutation invariant unitary at a cost polynomial in system size.In App.D, we saw that a permutation invariant operator is specified by only N +3 N = O(N 3 ) coefficients, one for each possible length three type vector for N sites.Thus, given two permutation invariant operators U and V with coefficients a t and b t ′ respectively, our task is to find the coefficients c t ′′ of their product U V .We will prime the elements of the type vectors in
Appendix B: Resource counting as measured by the number of one-axis twists vs. total twisting angle

FIG. 11 :
FIG.11: Plot of the root Bayesian mean squared error ∆φ/δφ vs. prior variance δφ resulting from the Nelder-Mead/sequential quadratic programming approach described in the text for various hyperparameters for (a) PAR2  6 and (b) AAT 1 3 .The blue crosses are the results for initializing all circuit parameters to zero, setting ϵ = 10 −8 , and using only 25 points in the numerical integration.The orange circles are the same as the blue crosses but with 500 points used in the numerical integration.The green triangles are the same as the orange circles but with ϵ = 10 −13 .The red diamonds use ϵ = 10 −8 but sequential initialization.Finally, the purple triangles are the result of using the hyperparameters described in App. A. As they are usually the best, the numerical setting of the purple triangles are displayed throughout the main text.

constraint d− 1 j=1 1 d− 1 =
t j ≤ N .We again choose a representative element from each equivalence class as|x(t)⟩ = |0⟩ ⊗N − j tj ⊗ |1⟩ ⊗t1 ⊗ • • • ⊗ |d − 1⟩ ⊗t d−1 .It is convenient to define t 0 ≡ N − j t j to be the number of qudits in state |0⟩.Then permutation invariant qudit states |ψ⟩ can be written as|ψ⟩ = t s.t.j tj ≤N c t t 0 !j t j !σ∈S N P σ |x(t)⟩.(D3)The number of independent coefficients here is equal to the number of distinct type vectors, or the number of multisets of cardinality N with elements drawn from an underlying set of cardinality d, i.e.N +d−O(N d−1 ).This framework can also include permutation invariant operators via the Choi-Jamio lkowski isomorphism.Under this isomorphism, an operator O = s,s ′ c s,s ′ |s⟩⟨s ′ | on qudits of dimension d is associated with a state of qudits of dimension d 2 according toO = s,s ′ c s,s ′ |s⟩⟨s ′ | → |O⟩ = s,s ′ c s,s ′ |(s 1 , s ′ 1 )(s 2 , s ′ 2 ) • • • (s N , s ′ N [j]) = [A (s) [j − 1], C s j ]where the matrix C s j has right dimension j+d−2 d−2 and all of its elements are either zero or one.The non-zero elements of the matrix (C s j ) µ,ν are those whose components satisfy

2 ⌉+d−1 d− 1 =+1 2 ⌉+d 2 −1 d 2 − 1 =
O(N d−1 ) for pure states and⌈ N O(N d 2 −1 ) for operators.Appendix E: Permutation invariant compilation then the MPS representation for the final state has bond dimension d s d o with De with A ≈ 0.044 and B ≈ 0.00041.This is closely related to the observation in the main text that most of the improvement comes from adding twists to

TABLE I :
This table contains the values of θ twist En , θ twist De , and θ twist tot for n En = 1, N = 30, δφ ≈ 0.74, and various values of n De .The values are reported here to three significant figures.
⊗N −t1⊗ |1⟩ ⊗t1 so that the first N − t 1 qubits are in state |0⟩ and the rest are in state |1⟩.A permutation invariant state |ψ⟩ is then one of the form This framework can be naturally extended to states of N qudits.We denote a standard basis for qudits of dimension d by {|0⟩, |1⟩, . . ., |d − 1⟩}.A standard basis state for N qudits of dimension d then has the form ⊗ j |s j ⟩ ≡ |s 1 s 2 • • • s N ⟩ where each |s j ⟩ ∈ {|0⟩, |1⟩, . . ., |d − 1⟩}.Now the equivalence classes defined by the action of the permutation group are labeled by a type vector of the form t = (t 1 , t 2 , . . ., t d−1 ) subject to the