Path integral contour deformations for observables in $SU(N)$ gauge theory

Path integral contour deformations have been shown to mitigate sign and signal-to-noise problems associated with phase fluctuations in lattice field theories. We define a family of contour deformations applicable to $SU(N)$ lattice gauge theory that can reduce sign and signal-to-noise problems associated with complex actions and complex observables. For observables, these contours can be used to define deformed observables with identical expectation value but different variance. As a proof-of-principle, we apply machine learning techniques to optimize the deformed observables associated with Wilson loops in two dimensional $SU(2)$ and $SU(3)$ gauge theory. We study loops consisting of up to 64 plaquettes and achieve variance reduction of up to 4 orders of magnitude.


I. INTRODUCTION
In order to test the Standard Model and search for new physics in experiments involving hadrons and nuclei, precision calculations of Standard Model observables are required. To achieve this, Monte Carlo (MC) calculations of lattice-regularized path integrals for quantum chromodynamics (QCD) have been used to make precise predictions for many phenomenologically relevant quantities in the meson and nucleon sectors; for recent reviews see Refs. [1][2][3][4][5]. Lattice QCD calculations using MC methods are performed in Euclidean spacetime where the action S(U ) is typically a real function of the discretized gauge field U x,µ ∈ SU (3) and an ensemble of gauge fields can be generated with probability distribution proportional to e −S(U ) . Predictions for the expectation values of observables O are then obtained by averaging the results for O(U ) obtained for each gauge field configuration.
For correlation functions describing nucleons, nuclei, and most mesons, O(U ) is complex and includes a gaugefield-dependent phase [6,7]. Phase fluctuations grow more rapid with increasing Euclidean time separation and lead to a "signal-to-noise (StN) problem" in which the StN for a fixed size statistical ensemble decreases exponentially with increasing time separation [6][7][8][9][10][11][12]. Alternatively, multi-nucleon systems can be probed by including non-zero quark chemical potential in the action. In this case, e −S(U ) is not real and positive definite and cannot be interpreted as a probability distribution and the theory is said to have a "sign problem." If e − Re S(U ) is used to define a probability distribution in this case, e −i Im S(U ) leads to rapid phase fluctuations of path integrands and the appearance of a StN problem that is exponential in the spacetime volume [13][14][15][16][17][18][19].
A generic method for taming sign and StN problems in path integrals has recently emerged. This method involves deforming the manifold of integration of the path integral into a complexified field space. If the path integrand is a holomorphic function of the field variables, then a multi-dimensional version of Cauchy's integral theorem ensures that expectation values of the corre-sponding observable is unchanged by the manifold deformation. Manifold deformation may, however, change the values of observables on individual field configurations and therefore modify the severity of phase fluctuations and associated sign/StN problems. Several methods for finding useful manifolds have been proposed and successfully applied in lattice field theories as well as nonrelativistic quantum mechanical theories relevant for condensed matter physics . For a recent review see Ref. [47]. Although most applications have focused on improving sign problems in theories with complex actions, an analogous method for improving StN problems for complex observables in theories with real actions was introduced in Ref. [48].
The focus of this work is on addressing sign/StN problems in QCD-like theories; in this setting, path integral contour deformations have so far been restricted to solving sign problems in simple contexts. Thermodynamic observables at non-zero quark chemical potential have been computed in (0 + 1)D QCD, a theory with a single link variable, using Lefschetz thimble methods [49] and the generalized thimble method [50], while preliminary results using neural-network manifolds were obtained in [51]. In higher dimensions, Lefschetz thimbles were used to analyze finite-density observables for one-and two-site systems in the heavy-dense limit in Ref. [52], and this study predicted that the number of relevant Lefschetz thimbles would grow exponentially with the number of lattice sites for larger systems. The task of computing noisy observables in non-Abelian lattice gauge theories for larger spacetime volumes and higher spacetime dimensions has remained challenging. This paper introduces a simple yet expressive family of complexified manifolds for taming sign/StN problems in SU (N ) gauge theory using path integral contour deformations. The Jacobians required for calculations using this family of deformations are shown to be triangular matrices whose determinants can be computed with a cost that scales linearly with the spacetime volume. This family of manifolds can be applied to all theories involving SU (N ) gauge fields, including gauge theories coupled to matter fields, although their practical utility for im-proving sign/StN problems is only explored here for pure gauge theory.
The deformed observable method introduced in Ref. [48] relates path integrals over deformed contours to path integrals written in terms of modified observables on undeformed contours, enabling improvement in the StN of observables without the need to modify MC sampling. We apply the method here to calculations of Wilson loops in SU (2) and SU (3) gauge theory, in which Wilson loops are known to have an exponentially severe StN problem and have been used to study other StN improvement methods [53]. Calculations are performed in (1 + 1)D as a proof of concept, as it is possible to compare with exact StN results derived analytically and to use specialized approaches for efficient Monte Carlo ensemble generation for (1 + 1)D gauge theories. Results are obtained for a range of Wilson loop areas and lattice spacings including areas of up to 64 lattice units at the finest lattice spacing. The variances of Wilson loops with largest areas are reduced by factors of 10 3 -10 4 , demonstrating that deformed observables can dramatically improve StN problems in SU (N ) lattice gauge theory. The linear scaling with spacetime volume of these contour deformations suggests that it should be computationally feasible to explore the application of analogous contour deformations to (3 + 1)D lattice gauge theory in future work.
The remainder of this paper is organized as follows. Sec. II describes our approach to contour deformations for SU (N ) variables, including a family of complex manifolds for integration over sets of SU (N ) variables, and reviews the deformed observables method introduced in Ref. [48]. Sec. III presents analytical results for expectation values and variances of observables in (1 + 1)D SU (N ) lattice gauge theory. Results for MC calculations of deformed observables for Wilson loops are presented for SU (2) gauge theory in Sec. IV and for SU (3) gauge theory in Sec. V. A summary of results and consideration of future work is found in Sec. VI.

II. GENERAL FORMALISM
Cauchy's integral theorem implies that the contour of a complex line integral can be deformed without changing the integral value if the integrand is holomorphic in the intervening region and the endpoints are held fixed. 1 When multidimensional integration is performed, the full theorem can be generalized if the integral is describable as iterated complex line integrals or by a technical extension to the full multivariate setting [54]. For the purpose of contour deformations, however, only a weaker form of the theorem (equivalent to Stokes' theorem) is 1 For periodic functions this condition on the endpoints can be relaxed, as discussed in Sec. II A. required. Specifically, a contour deformation from manifold M A to M B leaves the integral value unchanged if M A ∪M B bounds a region in which the integrand is holomorphic; see Ref. [47] for a simple proof. To implement such contour deformations and confirm holomorphy of an integrand throughout the relevant region of configuration space, a coordinate parameterization is useful. We discuss such parameterizations and contour deformation for SU (N ) groups and SU (N ) gauge theory in the following sections.
A generic integral over group-valued variable U ∈ SU (N ) can be written as where the Haar measure dU is defined to be the unique measure that satisfies d(V U ) = d(U V ) = dU for V ∈ SU (N ). We choose the conventional normalization condition dU = 1. Using the coordinates introduced above, the integral can immediately be cast as integration over a sub-domain of R N 2 −1 , (2) where h(Ω) is the Jacobian factor associated with the change of measure from dU to dΩ = j dφ j k dθ k , and U (Ω) is the group element at the manifold coordinate Ω. The specific forms of h(Ω) and U (Ω) for the groups SU (2) and SU (3) are presented in Sec. IV A and V A. From Eq. (2), it is clear that each θ k can be deformed into the complex plane holding the endpoints 0 and π/2 fixed. Each φ j can be deformed under the weaker constraint that the endpoints remain identified, as shown in Fig. 1. This can be seen by noting that the end-points of the shifted contour can be connected to the original endpoints using a pair of segments parallel to the imaginary axis; these segments differ by a real 2π shift and a change of orientation so that integrals of periodic functions along these segments exactly cancel. This approach to deforming periodic variables with identified endpoints has previously been applied to path integrals involving U (1) variables [48,[56][57][58]. If the integrand h(Ω)f (U (Ω)) is a holomorphic function of all components of Ω, the value of the integral is unchanged by the deformations described above. We can define a deformed integration contour by the maps φ j = φ j (Ω) ∈ C and θ i = θ i (Ω) ∈ C; for conciseness the collective set of deformed coordinates can be written as a function of original coordinates Ω ≡ ( φ 1 , . . . , φ J , θ 1 , . . . , θ K ) = Ω(Ω). The value of the integral is unchanged by this deformation, Above, U ≡ U ( Ω) is the deformed group element, J(Ω) ≡ det ∂ Ωα ∂Ω β is the (complex) Jacobian factor relating the measure dΩ of the base contour to the measure d Ω of the deformed contour, and the Jacobian factor relating the Haar measure dU between the original and deformed contours is given by For any concrete map U (Ω), the deformed group element is given by simply applying the map to the complexified coordinate Ω. Any real Lie group has a unique complexification and the space in which U lives is well understood. In particular, the complexification of SU (N ) is SL(N, C) [59]. This is easy to see on intuitive grounds, as SU (N ) matrices are specified by unit-norm eigenvalues with determinant 1 and the effect of complexifying the group is to allow arbitrary non-zero eigenvalues while preserving the determinant constraint, resulting in the group SL(N, C).
Though the deformation is defined in terms of the manifold coordinates, we can see in the last line of Eq. (3) that this new integral can be written independently of the coordinates, using the modified integrand J(U )f ( U (U )). This form suggests a coordinate-independent definition of a holomorphic integrand and contour deformation. Such a general approach is beyond the scope of this work, but we note that a deformation can be applied in any coordinate system so long as the integrand can be shown to be holomorphic in some coordinate system. The angular coordinates for SU (N ) are sufficient to show that all components of the matrix representation of U are holomorphic (i.e. the Wirtinger derivatives ∂U (Ω)/∂Ω i are zero [54]). The components of U −1 are holomorphic whenever U is invertible, as can be seen by starting with the definition of the inverse, U −1 U = 1, applying Wirtinger derivatives to both sides, ∂(U −1 U )/Ω i = 0, and using holomorphy of U to obtain ∂U −1 /∂Ω i = 0.
The general construction so far is valid for collections of SU (N ) group variables, and is therefore applicable to SU (N ) lattice gauge theory. Standard pure-gauge actions for SU (N ) lattice gauge theory can be written as holomorphic functions of the variables U and their inverses, if we replace instances of U † with U −1 , and are thus holomorphic throughout the SL(N, C) domain of each complexified group variable. This fact has previously been recognized in complex Langevin approaches to extensive sign problems in lattice gauge theory [60][61][62]. Similarly, (unrooted) fermionic determinants can be written as polynomials in components of gauge links and are therefore holomorphic [36], and many observables can be analyzed in the same way.

B. Vertical contour deformations
We define a family of deformed manifolds for an SU (N ) variable in terms of the angular parameterization by where f (Ω; λ, χ) ∈ R N 2 −1 . This family of vertical deformations inspired by Refs. [56,63] is not fully general as it only includes contour deformations describable by vertically shifting each point purely in the imaginary direction, and in particular it does not include any deformations with Re Ω(Ω) = Re Ω(Ω ) for some Ω = Ω . The Jacobian of any deformation in this family is straightforward to compute as where α, β = 1, . . . , N 2 − 1 index the angular parameters Ω = (φ 1 , . . . , φ J , θ 1 , . . . , θ K ). The function f can further be expanded in terms of Fourier modes, where I ≡ (n 1 . . . n a , m 1 . . . m b ), and sin(2θ j m j ), (8) which provides a complete basis for vertical contour deformations in the limit Λ → ∞. Including successively more Fourier modes by increasing the Fourier cutoff Λ systematically improves the flexibility of the function f . In our applications to SU (N ) gauge theories in (1 + 1)D below, Fourier cutoffs Λ ∈ {0, 1, 2} are explored. It is noteworthy that the sum over azimuthal Fourier modes includes the constant mode as well as phase offsets χ i because azimuthal angles can be deformed without fixing their endpoints as discussed above. These constant modes are essential for the sign/StN problem reduction achieved in (1 + 1)D examples below.

C. Path integral deformations for noisy observables
We next review the deformed observable approach presented in Ref [48], in which contour deformations of lattice field theory path integrals are used to define modified observables with improved noise properties and unchanged expectation value. We focus here on SU (N ) lattice gauge theory path integrals, which are highdimensional integrals over a collection of group-valued degrees of freedom U x,µ ∈ SU (N ). Here, x specifies a site on the discrete spacetime lattice and µ ∈ {1, . . . , N d } is any of the N d spacetime directions on the lattice. The integrals under study take the form where Z = DU e −S(U ) (10) and the action S(U ) ∈ R is a function of all gauge links. Details on the construction of lattice gauge theory are presented in Sec. III. It is possible to deform the integration contour of an SU (N ) lattice gauge theory path integral by individually deforming each group-valued variable U x,µ using the formalism presented above. In principle the deformed link, U x,µ , could be a function of all other links on the lattice. However, evaluating the Jacobian factor arising from such an arbitrary deformation would require O(V 3 ) operations, where V is the number of sites of the lattice. For state-of-the-art lattice field theory calculations, this is intractable. A similar obstacle is encountered in the application of normalizing flows to sampling probability distributions in image or lattice data analysis, see e.g. Ref. [64] for a review, where it is avoided by explicitly restricting to triangular Jacobians for which the determinant factor is efficiently calculable from the diagonal elements. In this work, we similarly restrict to triangular Jacobians by allowing deformations of each variable to depend only on previous variables in a canonical ordering, described in detail for our particular studies of SU (2) and SU (3) in the following sections. Exploring other options is an interesting possibility for future work; for example, transformations built from a composition of multiple triangular transformations may allow more general spacetime dependence without significant increase in cost, whereas other alternatives based on convolutions may scale supra-linearly as the volume is increased.
Given a deformed manifold M with tractable Jacobian factor, a deformed integral can be constructed to compute the same expectation value defined by Eq. (9), The above equality holds if the original manifold can be deformed to M without encountering any nonholomorphic regions of the integrand, i.e. if the integrand is holomorphic in the region bounded by the union of M and the original manifold. The abstract manifold M can be specified by a map U (U ), which then gives a concrete prescription for computing Eq. (11), In contrast to Eq. (9), the path integral in Eq. (12) involves a generically complex-valued action, S( U (U )) ∈ C, and a different observable J(U )O( U (U )) written in terms of the (efficiently computed) Jacobian factor J(U ). Cauchy's theorem implies that these modifications conspire to cancel and result in an identical expectation value.
It is useful to further rewrite Eq. (12) as a path integral with respect to the original action, In this rewriting, it is clear that the deformed path integral is still accessible by performing Monte Carlo sampling with respect to the original action S(U ) and estimating the sample mean of Q(U ). These methods can therefore be applied at the measurement step, after an ensemble of field configurations has been sampled. In essence, the deformed path integral defines a new observable Q(U ) that has provably identical expectation value to the original observable O(U ). Notably, this new observable generically has very different structure from O, as it may be non-local and depends on the structure of the action. The variance of Q(U ) can, however, be vastly different from the variance of O(U ). For most observables, samples of O(U ) are complex-valued, and the variance of the real and imaginary components are given by These are not generically identical to the variance of Re Q and Im Q, The final term is unchanged because O = Q , but the first two terms in both lines of Eq. (15) are not generically equal to the corresponding terms in Eq. (14). Explicitly, those terms are given by Extra factors of S(U ) persist in both expressions in Eq. (16), and a non-holomorphic absolute value appears for |Q 2 | , preventing identification with the terms in Eq. (14). It can thus be fruitful to look for a modified observable Q for which the terms in Eq. (16) are minimized and the statistical noise is less than that of the original observable.
Given an explicit parameterization of the deformed contour, standard gradient-based optimization methods can be applied to find the parameters that minimize the terms in Eq. (16). Since the parameters only affect the observable itself (the sampling weight is always e −S(U ) ), the gradient of the variance with respect to the parameters can be written as an expectation value under the original Monte Carlo sampling. In this work, minimization of Eq. (16) is performed using stochastic gradient descent over the deformed contour parameters, which approximately converges to a local minimum of the variance under mild assumptions [65][66][67][68], and starting at the original manifold ensures that the result improves relative to (or at worst is equivalent to) the original variance.
A potential obstacle to the deformed observable method is that some contour deformations could lead to a severe overlap problem between probability distributions proportional to e −S(U ) and e −Re[S( U (U ))] that could make estimates of deformed observable variances from finite Monte Carlo ensembles unreliable. The possibility of constructing deformed observables with severe overlap problems can be mitigated, however, by constructing deformed observables that minimize the variance of Q since large fluctuations of e −S( U (U ))+S(U ) will tend to increase the variance of Q. Overlap problems could still arise due to overfitting to the sample variance of a Monte Carlo ensemble that does not sample the field configurations associated with large fluctuations of e −S( U (U ))+S(U ) ; practical strategies to avoid such overfitting problems are discussed in Sec. IV C below.
The terms to minimize in Eq. (16) are specific to a particular observable O. There is no reason to expect a single manifold deformation to be optimal for all possible observables, but one does expect similar observables to be highly correlated, and thus to receive similar variance improvements from the same deformation. This suggests two useful practical improvements: 1. Optimal manifolds can be found for a few representative observables and can be reused for other similar observables.
2. When optimizing manifold parameters, the optimal parameters for a similar observable can be used to initialize the search.
In our studies of SU (2) and SU (3) lattice gauge theory in Sec. IV and Sec. V, the initialization approach was found to significantly reduce the number of steps required for optimization.

D. Rewriting observables before deformation
It is often the case that the expectation value of an observable O has multiple equivalent path integral representations, for example when a theory possesses a gauge or global symmetry that modifies O but leaves the action and expectation values of observables invariant. In addition to the freedom of choosing the parameterization of contour deformations for a given path integral discussed above, the application of contour deformations to observables includes the freedom of choosing which path integral representation to deform.
Simple examples of this freedom arise in twodimensional U (1) Euclidean lattice gauge theory. For instance, path integral representations of Wilson loops with unit area in this theory are proportional to which is an integral representation of the modified Bessel function of the first kind I n (β), with n = 1, written in terms of the field variable φ. The integrand appearing can be interpreted as a product of an observable O = e iφ and an e −S factor for the Euclidean action S = −β cos φ. This theory has a charge conjugation symmetry that acts on the integrand of Eq. (17) by φ → −φ, which leaves the action invariant but modifies the observable e iφ → e −iφ . An equally valid integral representation of I 1 (β) is obtained by averaging the observable and its charge conjugate as The choice of integrand in Eq. (17) or Eq. (18) is irrelevant for Monte Carlo calculations using the integration contours shown because the Monte Carlo estimator used in the first case is Re e iφ = cos φ. However, the ability of path integral contour deformations to reduce the variance of a Monte Carlo calculation does depend on the representation. In particular, the variance of a Monte Carlo evaluation of Eq. (17) can be significantly reduced by contour deformations while the variance of a Monte Carlo evaluation of Eq. (18) cannot, as discussed below and shown in Fig. 2.
Ref. [48] demonstrated that the variance of twodimensional U (1) Wilson loops can be significantly reduced using contour deformations of the representation given in Eq. (17); we review the analytical derivation here. Denote averaging of O(φ) with respect to the path integral of the theory with action −β cos φ by where the modified Bessel function with rank n = 0 is used to normalize the distribution. In general, the modified Bessel functions of the first kind are given by 2π e inφ e β cos φ and will be used throughout the following derivation. Further denoting the corresponding variance of O given β by Var β [O], the variance of Re e iφ = cos φ can be computed to be The constant vertical deformation leads to a deformed observable satisfying Q e β = e iφ β . Using Re Q e as a Monte Carlo estimator instead of Re e iφ results in variance where For positive β, Var β [Re Q e ] generically has a minimum at some strictly positive f that defines the optimal contour for reducing the variance of Q e . Using deformed observables associated with this optimal contour rather than the original contour leads to variance reduction whose significance is larger than an order of magnitude for β 4 as shown in Fig. 2. Larger variance reductions can be obtained for multi-dimensional generalizations of Eq. (17) associated with larger U (1) Wilson loops [48]. Applying the same constant vertical deformation to the representation in Eq. (18) leads to an alternative deformed observable satisfying Q c β = e iφ β . The real part of this alternative deformed observable has variance This expression is symmetric under f → −f and its gradient with respect to f therefore vanishes at f = 0, which can be verified to be the minimum of f . The deformed observable Q c associated with integrating along the real axis is thus the choice with lowest variance, while increasing |f | away from zero always leads to increased variance as illustrated for several choices of β in Fig. 2.
The qualitative features of these results can be understood from the behaviors of magnitude and phase fluctuations of deformed observables. The magnitude of e i φ is reduced for all φ by a constant vertical deformation with f > 0. To preserve the results of the holomorphic integral in Eq. (17) under this deformation, cancellations from phase fluctuations must correspondingly be less severe and sign/StN problems should therefore be improved. On the other hand, the magnitude of cos( φ) always satisfies | cos( φ)| ≥ | cos(φ)| and therefore can only be increased by applying a vertical contour deformation. This suggests that phase fluctuations must become more severe to preserve deformation-invariant integral results and that sign/StN problems will be worsened. This argument applies to non-constant vertical deformations and suggests that it is generically difficult to construct a contour deformation of Eq. (18) that leads to a deformed observable with variance comparable to the observable Q e obtained by deforming Eq. (17).
In the non-Abelian gauge theories that are focus of this work, the traces of Wilson loops define gauge invariant observables related to the potential energies of static quark configurations analogous to e iφ in U (1) gauge theory. The eigenvalues of Wilson loops in U (N ) or SU (N ) gauge theories can be expressed as e iφj , where φ j ∈ R for j = 1, . . . , N and for the case of SU (N ) the phases satisfy N j=1 φ j = 0 mod 2π. The trace of the Wilson loop is given by j e iφj . For SU (N ) gauge theory, the unit determinant condition therefore results in analogous obstacles to improving the variance of the trace of Wilson loops if the observable is not first rewritten using symmetries.
For the case of SU (2), there is a precise correspondence between Wilson loop traces and Eq. (18). The unit determinant condition requires the Wilson loop eigenvalues to be of the form e iφ and e −iφ and the trace appearing in both the observable and the Wilson action [69] (discussed below in Sec. IV) are proportional to cos(φ). It is therefore similarly impossible to improve the variance of a unit area SU (2) Wilson loop trace using constant vertical deformations, and the previous analysis suggests it is difficult to make significant variance improvements to traces of SU (2) Wilson loops of general area using vertical contour deformations. However, this analysis also indicates that e iφ could provide a suitable starting point for defining deformed observables. The gauge invariant component of the Wilson loop eigenvalue e iφ is cos(φ) making this a suitable rewriting of the observable that does not affect the expectation value but enables variance improvements by contour deformation. 3 In the case of SU (N ) with N ≥ 3, complex eigenvalues are not guaranteed to come in complex conjugate pairs, but the unit determinant condition still guarantees that any vertical deformation of the eigenvalue phases will increase the magnitude of at least one e iφj phase factor. In the sum over eigenvalues defining the trace of a Wil-son loop, the largest eigenvalue magnitude will set the typical observable magnitude on generic gauge fields in the integration domain. While it is possible for stronger cancellation between eigenvalues to occur on particular gauge fields, this larger typical magnitude throughout the majority of the domain of integration suggests that cancellations from phase fluctuations with similar severity to those of the original observable will therefore be required for such deformed observables to achieve identical expectation values, and significant sign/StN problems may be expected. If one instead chooses the non-gauge-invariant integrand e iφ1 to measure the same expectation value, the determinant constraint does not prevent exponentially decreasing the magnitude throughout the integration domain using contour deformations. For example, one can choose the shift which has the effect of reducing the observable magnitude by e −f on every gauge field in the integration domain.
Rewriting Wilson loop observables based on eigenvalues requires diagonalization, and a more practical alternative is to use the (1, 1) component of the (matrixvalued) Wilson loop as another non-gauge-invariant function with the same expectation value as the trace divided by N . The phase of this (or any other) single color component of the Wilson loop is not constrained by the unit determinant condition and therefore one expects that a suitable parameterization can be found in which vertical deformations can be applied to the phase of the (1, 1) component of the Wilson loop analogously to e iφ . Such parameterizations are given for SU (2) in Sec. IV and for SU (3) in Sec. V following Ref. [55] and are used as starting points for defining deformed observables with reduced variance in calculations of Wilson loop expectation values. An alternative parameterization of SU (2) in which the real part of the (1, 1) component of the Wilson loop is expressed as cos(α/2) is explored in Appendix B; as expected, vertical contour deformations do not improve the variance of unit area Wilson loops and less (though still significant) variance reduction is found for larger area Wilson loops with this alternative parameterization.

III. NOISE PROBLEMS IN SU (N ) LATTICE GAUGE THEORY
A simple setting for analyzing SU (N ) lattice gauge theory is obtained by considering (1 + 1)D Euclidean spacetime with open boundary conditions. In this spacetime geometry, much like in (3 + 1) dimensions, the theory features confinement of static test charges and an exponentially severe StN problem associated with static quark correlation functions, which can be identified with Wilson loops [69]. Numerical calculations of Wilson loop expectation values can be performed at much lower computational cost in (1 + 1)D than (3 + 1)D, facilitating a first exploration of path integral contour deformations applied to non-Abelian gauge theory observables on nontrivial lattices. Analytic results for (1 + 1)D observables such as Wilson loops are also known [70,71] and can be used to verify the correctness of numerical results. These results are extended to analytic results for the variances of (1 + 1)D Wilson loops below, which are then used in Secs. IV-V to verify the correctness and study the effectiveness of contour deformations applied to Wilson loops. In particular, analytic results can be used to determine the StN gains obtained by using deformed observables even when the corresponding undeformed observables are too noisy to be determined reliably.
A. SU (N ) lattice gauge theory in (1 + 1)D Lattice gauge theory in (1 + 1)D is defined on a set V of Euclidean spacetime points x arranged in a discrete two-dimensional lattice, with vectors1 and2 giving the displacement in lattice units between neighboring lattice sites along the two Euclidean spacetime axes. The discretized gauge field is represented by group-valued variables on each link of the lattice, with U x,µ denoting the variable associated with link (x, x+μ). The physical content of the theory is encoded in the (discretized) action. We consider the Wilson action for SU (N ) lattice gauge theory [69], given for a (1 + 1)D Euclidean spacetime volume by where g is the bare gauge coupling and each plaquette P x ∈ SU (N ) is defined as Writing the action and plaquettes using inversion rather than Hermitian conjugation allows the relevant integrands to be interpreted in the following sections as holomorphic functions of integration variables throughout the complexified domain. For SU (N ) elements these operations are equivalent, but analytically continuing the action to SL(N, C) requires the use of the inverse [60][61][62]. Expectation values of operators O(U ) in the lattice regularized theory are defined by specializing Eq. (9)-(10) to the particular case of SU (N ) lattice gauge theory where the Euclidean partition function Z is defined by and DU = x,µ dU x,µ in terms of the Haar measure dU x,µ of SU (N ).
With open boundary conditions in (1 + 1)D, the partition function defined by this action factorizes into a product of independent integrals over each P x . To exploit this factorization in (1 + 1)D, a gauge fixing prescription can be applied in which U x,2 = 1 for all x and U x,1 = 1 for sites with x 2 = 0 (a maximal tree gauge). In this gauge, which can be easily inverted to obtain The variables P x are therefore in one-to-one correspondence with the remaining non-gauge-fixed U x,1 . The Haar measure is invariant under this change of variables, and the path integral defining the partition function factorizes as where V ⊂ V is the subset of lattice points with unconstrained U x,1 in this gauge (those for which x 2 = 0) and z is the contribution to the partition function from a single plaquette, The calculations of z and similar single-variable SU (N ) integrals are presented in Appendix A. Wilson loops are defined by the matrix-valued quantity where x,µ∈∂A U x,µ represents an ordered product of links along the boundary ∂A of the two-dimensional region A with area A. The expectation value of the gaugeinvariant observable 1 N tr (W A ) probes the interaction between a pair of static quarks if the region A is taken to be rectangular. Inserting Eq. (33) into Eq. (36) gives 4 Using linearity of expectation values and factorization of path integrals analogous to Eq. (34), the expectation values of Wilson loops can be related to products of (matrixvalued) single-variable expectation values, 4 For simplicity we restrict to rectangular Wilson loops with one corner at the origin.
Each single-variable expectation value is given by P ab x = χ 1 δ ab , allowing the traced Wilson loop to be written as a product of scalars, where we have introduced the single-variable normalized expectation value of the group character function χ 1 (P ) = tr(P ), whose value is computed in Appendix A.
Eq. (39) implies that Wilson loop expectation values follow area law scaling, tr(W A )/N ∼ e −σA , and SU (N ) gauge theory in (1 + 1)D confines for all values of the coupling, with a separation-independent force between static test charges given by the string tension Although χ 1 is in general given by a convergent infinite series in Eq. (A6), in the case of SU (2) a simpler form can be found in terms of modified Bessel functions, which goes to zero as g 2 → 0. This observation can be generalized to all SU (N ) groups, and the lattice-units string tension goes to zero while the static quark correlation length grows to infinity in the limit of g 2 → 0 in all cases. We can consider this to be the naive continuum limit of the theory, though the correlation lengths of dynamical quantities such as plaquettes or localized Wilson loops remain finite by the factorization of the path integral. When investigating the approach to the continuum in Sec. IV and V, we should decrease the coupling while fixing the dimensionless quantity σV , where V is the total number of plaquettes; the particular choices of couplings and V used in our numerical studies are reported in Table I. Results are plotted versus σA when comparing quantities at fixed physical separation is important.

B. Noise and sign problems in the Wilson loop
Although the expectation value tr(W A )/N is real, the integrand tr(W A )/N has fluctuating signs (for N = 2) or fluctuating complex phases (for N ≥ 3) across the domain of integration. These fluctuations result in a sign/StN problem for this observable. The sample mean The expectation values in the first and second terms in the variance can be factorized analogously to the Wilson loop expectation value, and are shown in Appendix A to be in terms of the single-site integrals χ 1,−1 , χ 2 , and χ 1,1 , defined in Eqs. (A9) and (A13). In total, the variance is where c is a constant. The fact that χ r < 1 for nontrivial irreps r (assuming that g 2 is finite) [72] implies that c > 0 and therefore that the variance is asymptotically constant as A → ∞, The signal-to-noise ratio for n samples can be written exactly in terms of Eqs. (43), (44), and (39), but for this analysis it is sufficient to identify the asymptotic behavior from Eqs. (45) and (39), giving which degrades exponentially with area A. For the estimator Re tr(W A )/N , the analysis above shows that this can only be counteracted by exponentially increasing the number of samples n. Eqs. (44) and (45) also make clear that the leading asymptotic behavior of the variance is due to the typical magnitude-squared of the observable, | tr(W A ) 2 |/N 2 , which remains O(1) for all areas. Cancellations due to phase fluctuations are required to reproduce the exponentially small Wilson loop expectation values predicted for large areas, confirming that the StN problem can be related to a sign problem in the Wilson loop observable.
Attributing the StN problem to O(1) magnitudes for individual samples of the Wilson loop observable at all areas also inspires our deformations of the Wilson loop observable in the following sections. The quantity | tr(W A ) 2 |/N 2 can be written as an integral of a nonholomorphic integrand which will generically be modified by contour deformations of the path integral. If we choose contour deformations that reduce the average magnitude of the observable, this quantity, and thus the leading term of the variance, will be reduced. The observable mean is unchanged and the StN ratio will thus increase under such a deformation.
For SU (2), the single-site integrals can be evaluated straightforwardly (see Appendix A) and the SU (2) variance is where σ SU (2) is given in Eq. (42). The StN of SU (2) Wilson loops in (1 + 1)D can therefore be explicitly calculated, There are many possible parameterizations of the SU (2) group manifold, any of which can be used to define valid path integral contour deformations. We argue above that it is advantageous to consider a single component of the Wilson loop, taken without loss of generality to be W 11 A , as the observable whose path integral contour is deformed in order to calculate W 11 A = tr(W A )/N . Contour deformations that reduce the magnitude of W 11 A in generic gauge field configurations while preserving W 11 A can be expected to reduce phase fluctuations and therefore the variance of W 11 A . The angular parameterization of each plaquette P x ∈ SU (2) is useful for this purpose, and is explicitly defined by following the generalized SU (N ) angular parameterization given in Ref. [55]. The azimuthal angles satisfy φ 1 x , φ 2 x ∈ [0, 2π], with endpoints identified, while the angle θ x spans the finite interval [0, π/2]. The normalized Haar measure can be written in these coordinates as We begin by considering the effects of simple deformations using these parameters. In the simplest case of a region A with area A = 1, the Wilson loop consists of a single plaquette, W 11 A = P 11 x , where the loop starts and ends at site x. The magnitude of W 11 A can be reduced by e −λ by deforming φ 1 x → φ 1 x + iλ analogously to the approach described for e iφ integrals above. In the case of A = 2, the Wilson loop can be written in terms of the product of two plaquettes, W 11 A = (P x P x ) 11 . In the angular parameterization, the Wilson loop is a sum of two terms The first term involves products of diagonal entries whose magnitude can be reduced by e −λ by taking φ 1 x + iλ and the second term involves offdiagonal components whose magnitude can be reduced analogously by taking φ 2 For A > 2, it can be seen similarly that shifting φ 1 for all x leads to suppression of the magnitudes of all terms appearing in W 11 A .
A family of contour deformations capable of expressing these constant imaginary shifts to the phases of all elements of P x can therefore be expected to reduce phase fluctuations and the variance of W 11 A . Such a family of contour deformations is parameterized below as a subset of the vertical deformation expanded in a Fourier series in Eqs. (5)- (8).
An alternative parameterization of SU (2) is explored in Appendix B, in which it is found that imaginary shifts along these lines are more difficult to express and orders of magnitude less variance reduction is achieved when applying the same optimization methods. This exploration suggests that a choice of parameterization that allows the observable to be expressed in the form e iφ is important for variance reduction in observables afflicted with a sign problem.
It is also possible to directly parameterize the gauge field U x,µ ∈ SU (2) using Eq. (50). This alternative parameterization may be useful in more than two spacetime dimensions, where Gauss' Law constraints imply that not all plaquettes are independent and a path integral change of variables from U x,µ to P µν x cannot be performed straightforwardly.

B. Fourier deformation basis
In our study of SU (2) gauge theory, we optimize over a family of vertical contour deformations expressed in terms of a Fourier series truncated above a specific cutoff mode. To avoid a costly Jacobian calculation, each plaquette variable P x is deformed conditioned on plaquettes P y at sites earlier in the product defining W A in Eq. (37), which we write as y ≤ x. This family of deformations is given bỹ y , φ 2 y ; κ xy , λ xy , η xy , χ xy , ζ xy ), y , φ 2 y ; κ xy , λ xy , η xy , χ xy , ζ xy ), in terms of parameters κ xy , λ xy , η xy , χ xy , and ζ xy . The functions f θ , f φ 1 , and f φ 2 compute the shift in the imaginary direction of the angular parameters of P x conditioned on P y , and their decomposition in terms of Fourier modes is detailed below. For this ordered dependence on previous sites, the Jacobian determinant factorizes into a product of block determinants per lattice site where The structure of the deformation in Eq. (53) therefore bypasses the need for expensive Jacobian determinant calculations involving matrices whose rank grows with the spacetime volume and is inspired by analogous methods to simplify Jacobian determinant calculations in normal-izing flows [64]. Note that an absolute value is not taken over the determinant in Eq. (55).
The vertical deformation in Eq. (53) can be expanded in a multi-parameter Fourier series as where Λ is a hyperparameter that sets the maximum Fourier mode to include and controls the total number of free parameters. As the zero modes have trivial y dependence, we have collected them in Eq. (53) into the yindependent terms κ x;φ 1 0 and κ x;φ 2 0 . The included Fourier terms are defined to satisfy the constraints θ x (0) = 0, θ x (π/2) = π/2, φ 1 , which together ensure that the endpoints of both the zenith and azimuthal integration domains are appropriately handled as described in Sec. II A. The derivatives needed for the Jacobian in Eqs. (54)-(55) can be calculated straightforwardly by differentiating Eq. (56). The additional factor describing the change in the Haar measure needed to compute the Jacobian of the group measure is given in these coordinates as Combining the results of Eq. (50) and Eqs. (53)-(57), the expectation value of any holomorphic observable O({P x }) is equal to the expectation value of the deformed observable where If the plaquettes are sampled in the matrix representation for Monte Carlo evaluation, computing the observable Q in Eq. (58) requires converting to the angular representation before deforming and evaluating. This conversion is given by and can be done when evaluating the observable Q({P x }). Though these functions are not entire, the conversion used here does not determine whether the integrand itself is a holomorphic function of these angular parameters.

C. Optimization procedure
This contour deformation expanded in a Fourier series provides a means of exploring deformed observables with potentially reduced variance. It is shown above that simple deformations within this family are possible to construct by hand and are already sufficient to reduce the average magnitude of Wilson loop observables. However, these deformations are restricted to zero modes of the Fourier expansion and rely on construction based on intuition. To maximize the variance reduction, we explore numerical optimization of the deformation parameters κ xy , λ xy , η xy , χ xy , and ζ xy as discussed in Sec. II C. We are interested in Re W 11 A , for which the terms of Eq. (43) that can be modified by contour deformation are where Q A is the deformed observable associated with the W 11 A . The first term in Eq. (61) is manifestly non-holomorphic due to the absolute value over a complex-valued observable, while the second term includes squared reweighting factors of the original and deformed action which prevent identification as a deformation of (W 11 A ) 2 . These terms together define the loss function L that we aim to minimize as a function of the deformation parameters. This loss function is written as an expectation value in terms of sampling from the original Monte Carlo weights e −S({Px}) , and its gradient can similarly be written as an expectation value, The term ∇ Re Q A can be expanded using the explicit form of Q A given in Eq. (58), as well as the forms of the observable W 11 A and the action S in terms of {P x } in Sec. III A. For this study, the gradient ∇ Re Q A was computed explicitly and cross-checked using automatic differentiation available in the JAX library [73]. Eq. (62) can be stochastically estimated using an (undeformed) ensemble of n configurations {P i x }, i ∈ [1, . . . , n], drawn proportional to the weight e −S({Px}) , In all experiments below, we used the Adam optimizer [74] to iteratively update parameters based on these stochastic gradient estimates. Each gradient estimate was computed using 1/100th of the generated ensemble; empirically, this small subset of the data was sufficient to learn useful manifold parameters with significant variance reduction. The optimizer was configured with default hyperparameters, except for a dynamically scheduled step size. Stochastic noise on gradient estimates and large optimizer step size can either slow convergence or result in unstable training dynamics, while step sizes that are too small waste computation as parameters fail to move quickly along precisely estimated gradients. We thus used a dynamic schedule that reduced the step size over time. In particular, our step size schedule started with an initial step size s 0 and then permanently reduced the step size by a factor of F (i.e. s i+1 = s i /F ) each time the loss function failed to improve over a window of W steps. The schedule halted training after the step size was reduced N r times. We used the parameters F = 10, W = 50, N r = 2, and s 0 = 10 −2 for both SU (2) and SU (3) gauge theory.
In preliminary investigations, we found that naive manifold optimization resulted in overtraining, i.e. overfitting parameters to the specific training data available [75][76][77]. In the context of manifold optimization, this corresponds to minimizing a finite-ensemble variance estimator rather than minimizing the true variance of Re Q A . In practice this produced deformed observables with higher variance when measured on a different ensemble than the training set.
To mitigate overtraining in the final results, we reserved a "test set" of configurations, independent of the training data, on which the loss function was periodically measured [78]; the reserved test set of configurations was chosen to have the same size as the training set. The step size schedule was configured to use loss measurements based on this test set, ensuring that training was slowed and halted before significantly overfitting. We further used a mini-batching technique, in which a set of m ≤ n configurations are resampled from the training set to estimate Eq. (63), as a means of avoiding overtraining [68]. The mini-batch size was chosen to be equal to the size of the full training set (i.e. m = n), thus mini-batch evaluation in this context was just a resampling operation, giving variation in gradient estimates over multiple evaluations. The fluctuations in these gradient estimates are on the order of the uncertainty of the variance estimator, preventing overfitting below this uncertainty. For each observable W 11 A we also found it important to restrict to deforming only the plaquettes within the region A. Though including extra degrees of freedom cannot make the optimal variance any higher (the optimization steps could always leave those plaquettes undeformed), in practice we found that including such degrees of freedom allowed the deformed manifold to rapidly overtrain, resulting in worse performance overall. Appendix C details further possible approaches to avoiding overfitting and overlap problems using a regularizing term added to the loss function. These approaches were found to be unnecessary for our final results.
Finally, making a good choice of initial manifold parameters yielded practical improvement in training time and quality. On one hand, initializing the manifold parameters to zero ensures that gradient descent starts from the original manifold, and the optimization procedure should find a local minimum with variance no higher than the original manifold (up to noise from stochastic gradient estimates). However, correlations in sign and magnitude fluctuations of observables with similar structure, such as Wilson loops W 11 A and W 11 A with overlapping areas A and A , suggest that the variance reduction from contour deformations will be correlated as well. Though the optimal manifold for one observable will not generically be optimal for the other, it can serve as a better starting point than the original manifold. In our study of Wilson loops, we initialized manifold parameters for each Wilson loop of area A using the optimal parameters for the Wilson loop of area A−1, when the Fourier cutoff Λ = 0, or using the optimal parameters for the Wilson loop of area A and cutoff Λ − 1, when Λ = 0. Figure 3 shows the improvement in optimization time and quality using this method for manifold deformations with Λ = 0. While this approach sacrifices the guarantee that the local minimum obtained corresponds to a deformed observable with variance no higher than the original observable (in the limit of infinitely precise gradient estimates), in practice we find that this property is not violated. We also note that this property can be easily checked after optimization, and if the variance were found to increase with respect to the original observable training could instead be started from the original manifold to recover the guarantee. The loss curves are averaged over blocks of 25 iterations for clarity. In each case, the training is halted based on the plateau criterion described in the main text, resulting in traces of different lengths. For Wilson loops with larger areas, the final loss value is substantially lower when initialized from optimal manifold parameters at a A − 1, despite using nearly four times fewer training iterations.

D. Monte Carlo calculations
We investigated the practical performance of these contour deformations on the three sets of SU (2) parameters detailed in Table I. These parameters correspond to variation in the string tension by a factor of 4. At each choice of parameters, n = 32000 configurations were generated, exploiting the factorization of the path integral to draw samples of each plaquette P x independently from the marginal distribution, Plaquette samples were generated using Hybrid Monte Carlo [79] with parameters tuned to maintain autocorrelation times of order 1-2, 5 and these individual plaquettes were then arranged into lattices consisting of V sites each. A random shuffle was applied to the collection of plaquettes prior to this assignment to avoid spurious spatial correlations, ensuring an asymptotically vanishing bias in the limit of infinite ensemble size. On each ensemble, we performed a study of deformations with Fourier cutoff Λ fixed to zero. For all three choices of parameters, training on a subset of 320 configurations was sufficient to converge to manifolds with variance reduction up to four orders of magnitude when comparing the largest deformed observables to the original Wilson loop observables. An additional subset of 320 configurations was reserved as the test set during optimization, and the remaining 31360 configurations were used to measure results. Measurements of Q A were found to be consistent with the exact results given in Sec. III A and with the Monte Carlo results for W 11 A in the region where the estimates of W 11 A were reliable. The variance of Re W 11 A is dominated by (Re W 11 A ) 2 , which is positivedefinite for all gauge field configurations. It was therefore possible to precisely measure the variance without a sign/StN problem, and the expected agreement with analytical results was reproduced. In particular, O(1) scaling at large area can be clearly seen. In contrast, we found that the variance of Q A for manifolds optimized as above falls exponentially at large area, giving exponential improvements in the signal-to-noise. These results are presented for all three ensembles in Fig. 4.
Improvements from contour deformation were also found to be similar for Wilson loops with fixed physical area σA across the three values of the lattice spacing. Fig. 5 compares the improvement in the Wilson loop variance versus the dimensionless scale σA across all three ensembles, and the three curves can be seen to nearly collapse at small areas, though there are differences of roughly a factor of 4 at the largest areas. Despite this variation, the variance of the Wilson loop with largest area is reduced by more than 10 3 even for the finest ensemble. Analogous results were observed for (1 + 1)D U (1) gauge theory in Ref. [48].
For Λ = 0, there are few enough parameters that it is possible to investigate the optimal parameters found by the optimization procedure.  corresponding to a positive imaginary shift applied to φ 1 x , and a decreasing κ x;φ 2 0 , corresponding to a positive imaginary shift applied to each difference φ 2 x −φ 2 x+1 . Only these relative differences between neighboring φ 2 x have an effect on the value of Q A , thus the overall shift on the collection of κ x;φ 2 0 is irrelevant. We further optimized manifold parameters using Fourier cutoffs Λ = 1, 2 on the ensemble with coupling g = 0.71 to investigate gains from including higher Fourier modes. Including Fourier modes larger than the constant term enables more complex deformations of each angular parameter, and introduces possible de- pendence on plaquettes at sites y ≤ x when deforming P x . Despite this increased expressivity, these additional parameters did not provide significantly larger StN improvements compared to using only constant terms, as shown in Fig. 7. In some cases, the optimized manifold with larger cutoff resulted in higher variance (lower variance ratio) than the optimized manifold with cutoff Λ = 0. The manifolds with larger cutoffs include all possible manifolds with smaller cutoffs, thus this is necessarily a training effect, likely due to noisier gradients and less stable training dynamics. We did not pursue noise reduction and alternative approaches to training (such as iteratively including higher Λ) as these manifolds with higher cutoffs did not produce significant improvements at any value of the area.

V. SU (3) PATH INTEGRAL CONTOUR DEFORMATIONS
We further investigated the ability of contour deformations to reduce the variance of Wilson loops in SU (3) gauge theory in (1 + 1)D with open boundary conditions. This setting is identical to the previous section, except for the use of SU (3) rather than SU (2) gauge field variables. Suitable parameterizations for contour deformations of SU (3) gauge fields are discussed below.

A. Gauge field parameterization and contour deformation
For the SU (3) gauge group, we use the angular parameterization constructed in Ref. [55]. The components of a single plaquette P x ∈ SU (3) are parameterized as P 11 x = cos θ 1 x cos θ 2 x e iφ 1 x , in terms of the three zenith angles 0 ≤ θ 1 x , θ 2 x , θ 3 x ≤ π/2 and the five azimuthal angles 0 ≤ φ 1 x , . . . , φ 5 x ≤ 2π for each plaquette. We collect these angles into a variable Ω x = (θ 1 x , .., θ 3 x , φ 1 x , ..., φ 5 x ) for ease of notation. The Haar measure is related to the measure of Ω by [55] To compute deformed observables from Monte Carlo samples in the matrix representation, an inverse map of (65) is needed and for example can be specified by An SU (3) field configuration in (1 + 1)D with open boundary conditions is defined by a collection of angular variables Ω x associated all plaquettes P x on the lattice. The a th component of the deformed angles at site x is denoted by ( Ω x ) a , and for vertical deformations expanded in Fourier modes is specified bỹ Deformed observables analogous to Eq. (58) can be constructed for the SU (3) case using this parameterization and ratios of the deformed and undeformed Haar measure factors obtained from Eq. (66).

B. Results
Practical performance of these deformations was investigated by optimizing Wilson loop variance using the three sets of SU (3) parameters detailed in Table I as in the SU (2) case. The couplings were tuned to match the string tensions used for SU (2) gauge theory and correspond to lattice spacings varying by a factor of 4. For each choice of parameters, an ensemble of n = 32000 configurations was generated using the factorized HMC method discussed in Sec. IV D. Fig. 8 shows variance reduction for Wilson loops of all possible sizes for the three lattice spacings studied. At the largest loop areas, we found variance reduction of greater than three orders of magnitude. Across all three ensembles, analytical results for the Wilson loop expectation values and variances were reproduced by the undeformed Monte Carlo estimates. The expectation value of the deformed observable is consistent with the analytical and original Monte Carlo results, while the variance of the deformed observable exponentially decreases with increasing σA.  ance at the two coarser lattice spacings (g = 0.72 and g = 0.53), and a small, yet significant, decrease in the variance improvement achieved at the finest lattice spacing (g = 0.38). Despite this, the variance was reduced by three orders of magnitude at the largest area on the finest lattice by using an optimized deformed observable, and at all three couplings variance improvements are consistent with exponential in the physical loop area. The number of parameters to be optimized grows with the volume in lattice units, and the analogous results observed for SU (2) gauge theory suggest that the results at finer lattice spacings could be partially explained by increased difficulty in training the larger number of parameters. The Λ = 0 manifold parameterization involves few enough parameters that it is possible to investigate the structure of the optimal parameters similarly to the case of SU (2) gauge theory. As shown in Fig. 10 The magnitude of this quantity can be reduced by shift- loop, the (1, 1) component is given by The magnitude of the first, second, and fourth terms are reduced by shifting φ 1 x → φ 1 x +iλ and φ 1 x → φ 1 x +iλ with λ > 0. The magnitude of the second and third terms can be reduced by shifting φ 3 x + iδ with δ > 0; this is also consistent with a positive imaginary shift of iδ in φ 3 x − φ 4 x and φ 4 x − φ 3 x , reducing the magnitude of the fourth and fifth terms. These deformations result in reduced magnitude and correspondingly lower variance. Deformations with these qualitative features are reproduced in the optimized manifolds found for Λ = 0. Finally, we note that φ 2 x appears in the exponent with opposite signs in the second and third terms, and similarly for φ 5 x in the fourth and fifth terms, so there is no constant vertical deformation of these terms that will reduce the overall magnitude.
Analogously to the case of SU (2) gauge theory, no significant StN improvements were obtained by including higher Fourier modes. Fig. 11 directly compares optimized manifolds for Λ ≤ 2 at the coarsest lattice spacing, showing that the variance improvement was unchanged by including higher Fourier modes. Constant vertical deformations therefore appear to saturate the StN benefits achieved by general Fourier series parameterizations for deformed Wilson loops in (1+1)D. More complicated parameterizations of contour deformations may prove more useful in (3+1)D; however, numerical studies of deformed observables in (3 + 1)D lattice gauge theory are left to future work.

VI. CONCLUSIONS
In this work, we have defined a family of complex manifolds for path integral deformations in SU (N ) lattice field theories. The manifolds introduced here are described in terms of an angular parameterization of each SU (N ) variable, with dependence between variables at differing spacetime sites restricted to enforce a triangular Jacobian. We find that choosing a parameterization in which the observable of interest can be written as O = e iθ X is a useful practical choice, allowing constant shift deformations in the parameter θ to make substantial progress in reducing noise. Choosing a spacetime dependence of the deformation that gives rise to a triangular Jacobian is key to ensuring the deformed integral can be computed efficiently, as the Jacobian determinant can then be evaluated with cost linear in the number of spacetime lattice sites.
This manifold parameterization can be combined with the method of deformed observables introduced in Ref. [48] to reduce noise in observables. This approach is applicable when the action is real and the Boltzmann weight e −S can be treated as a probability measure. We stress that this method of deformed observables does not require changing the Monte Carlo sampling, despite being based on an analysis of contour deformation of the entire path integral, and can be thought of as an approach to analytically relate observables with identical expectation values and different variance. Keeping the Monte Carlo weights unchanged allows manifold optimization using estimates of the deformed observable variance computed with respect to a fixed Monte Carlo ensemble. There is a tradeoff between the cost of optimizing manifold parameters and the statistical precision gained. In practice, we find that initializing manifold parameters from optimal parameters for similar observables significantly reduces the associated cost.
This method was shown in Sec. IV and V to improve the variance of Wilson loop observables in (1+1)D SU (2) and SU (3) lattice gauge theory by orders of magnitude. For the original Wilson loop observables, the signal-tonoise ratio decreases exponentially with area. The deformed observables mitigate this StN problem, and in particular we find that the improvements are consistent with an exponential in the physical Wilson loop area, with the most significant reduction in noise for Wilson loops of the largest area. The improvement in variance was empirically found to be similar at three different lattice spacings, though less of an improvement was seen at the finest lattice spacing for SU (3); the achieved improvements in the continuum limit and for other theories is thus an interesting subject of future investigation. However, we stress that making any gains at finite lattice spacing is still a significant step forward due to the convenience of the method: optimizing a deformation on a fixed ensemble quickly gives new observables that encode the same physical content while having significantly reduced noise.
In demonstrating the method, we focused here on a particular deformation of the angular parameters based on a Fourier series expansion and shift in the imaginary direction only. Writing the observable phase fluctuations in terms of these periodic angular parameters that can be shifted by a constant in the imaginary direction led to deformed observables with significant StN improvement. The surprising result that zero-mode terms alone significantly reduce noise, with neither dependence between plaquettes at different spacetime sites nor dependence on the values of the angular parameters themselves, suggests that the majority of the StN problem in these (1 + 1)D theories arises from independent local fluctuations of SU (N ) angular parameters.
Complications are expected in higher dimensions, as Gauss' Law implies that plaquettes at differing spacetime locations and orientations must satisfy many inde-pendent constraints. Deformations thus cannot independently address fluctuations in each plaquette included in a Wilson loop, or more generally in each fundamental degree of freedom included in an observable. It is therefore an interesting line of future work to determine how best to incorporate this spacetime dependence in higher dimensional applications of path integral contour deformations. The approach employed here is one of many possible approaches to creating expressive transformations with efficiently computable Jacobians. This issue has been explored in some depth in normalizing flows for sampling in many contexts, including image generation and ensemble generation for lattice field theory; see Refs. [64,83] for recent reviews. Similar techniques may prove more fruitful in future applications of path integral contour deformations to observables of phenomenological interest in (3 + 1)D lattice gauge theories. The calculation of z for SU (N ) can be performed analogously to the U (N ) case considered in Refs. [70,71] with further details for the SU (N ) case given in Ref. [72]. In the eigenbasis of P , the Haar measure is given by where the δ-function enforces the unit determinant condition of SU (N ). Using the Fourier series representation of the δ-function for the compact variable N K=1 θ K , this can be expressed as The product of e iθ I − e iθ J factors can be expressed in terms of the determinant of a Vandermonde matrix [70,71]. The SU (N ) Haar measure is given by a sum of r dr χr 1   TABLE II. Properties of group representations: the dimension dr and the character χr. We have followed the normalizations in Table 14 of Ref. [72].
similar determinants, and z can be expressed as [72] where the entries of the matrix Z q are given by where I n (x) is a modified Bessel function. For example, in the SU (2) case z is given explicitly by A simpler but equivalent form z SU (2) = g 2 2 I 1 (4/g 2 ) can also be derived using the parameterization introduced in Section IV. From this, we can derive an expression for χ 1 from taking derivatives of z Similar to tr(W A )/N , we can factorize tr(W A ) 2 /N 2 and tr(W A ) 2 /N 2 into integrals of character functions over single-elements of SU (N ) using an identity [84] dΩ From Table II, we recognize that tr(W A ) 2 /N 2 is related to χ 1,−1 (P x ). Thus, applying Eq. (A7) we can de-rive that Since tr(W A ) 2 is not a character χ r , attempting to factorize it generates new terms. Specifically, in addition to tr(W A ) 2 one finds tr(W 2 A ) for A ⊂ A. Instead, a basis involving χ 2 (P ) and χ 1,−1 (P ) can be constructed from linear combinations of these traces and satisfies The character expectation values can be obtained for general SU (N ) gauge groups by numerically evaluating the integrals in Eq. (A13). For SU (2), it is straightforward to analytically evaluate the character expection values appearing in Eq. (A13). Not all representations of SU (2) are independent using the enumeration in Table II, and in particular χ 2 (P ) = χ 1,−1 (P ) and χ 1,1 (P ) = χ 0 (P ) = 1. After removing the redundant characters, the remain integrals χ j (P ) can be solved using expression of the characters in terms of angles [72] χ j = sin([j + 1/2]α) sin(α/2) (A16) where j = 0, 1 2 , 1 · · · index the unique characters of SU (2) and r = {1, −1} equals j = 1. With these, one has 1 z DP 1 d j χ j (P ) e As an alternative to the parameterization presented in Sec. IV A, plaquettes P x ∈ SU (2) can be represented as The su(2) unit vectorn x can be further parameterized aŝ n x = (cos φ x sin θ x , sin φ x sin θ x , cos θ x ), and a general SU (2) group element P x can be parameterized in terms of the three angles 0 ≤ α x < 2π, 0 ≤ φ x < 2π, 0 ≤ θ x < π. (B3) The Haar measure is given in these coordinates as The inverse map needed to obtain these angular parameters for an SU (2) matrix P x is given by α x = 2 arccos 1 2 P 11 x + P 22 x θ x = arccos P 11 x − P 22 x 2i sin(α x /2) φ x = 1 2 arg P 21 x P 12 x . (B5) As with Eq. (60), these are not entire functions of P x but this is not an obstacle for contour deformation because it is only the parameterization given by Eqs. (B1) and (B2) that determines whether path integrands can be interpreted as holomorphic functions of the angles {α x , θ x , φ x } associated with P x . Deformed observables starting with the (1, 1) component of SU (2) Wilson loops can be defined using this parameterization. A family of vertical deformations for {α x , θ x , φ x } can be defined analogously to the deformation described in Sec. IV B. Since α x and θ x have fixed (non-identified) integration contour endpoints, a constant vertical deformation can only be applied to φ x . For A = 1 in particular, Tr(P x ) = cos(α x /2), and the trace is independent of the only constant vertical deformation that can be applied. Neither this constant vertical deformation nor non-constant vertical deformations corresponding to Fourier basis cutoffs Λ = 1, 2 lead to statistically significant variance reduction with A = 1. As shown in Fig. 12, for A > 1 deformed observable results using this parameterization with Λ = 0 do lead to significant variance reduction when compared to undeformed contour results. However, orders of magnitude less variance reduction is obtained for large area Wilson loops using optimized deformed observables with this parameterization when compared to results using the parameterization explored in Sec. IV A. The fact that the parameterization in Eq. (B1) leads to less variance reduction than the parameterization in Sec. IV A can be intuitively explained by the inability of constant vertical deformations to decrease the magnitudes of the (1, 1) components of (products of) SU (2) matrices using the parameterization Eq. (B1). The significance of the difference between the results demonstrates the utility of rewriting observables before deformation in achieving practical StN improvements, as discussed in Sec. II D.
Appendix C: Regularization terms to avoid overtraining and overlap problems When ReS is significantly different from S, we can encounter an overlap problem for training and evaluation; both processes involve factors of e − ReS+S that can have very large magnitude fluctuations in this situation. To mitigate this problem, it is helpful to include regularization terms in the loss function L. These terms may bias the exact loss minimum away from the optimal, but allow closer convergence to that optimal solution given finite statistics estimates of L. The strength of these terms is controlled by a small parameter .
We discuss two possible terms here. First, an L2 regularizer [85] may be used, which simply ensures the parameters controlling the deformation all remain close to zero. Generically labeling those parameters as λ i , this loss term can be written In the limit of → ∞, the parameters λ i are forced to zero and the optimization procedure must remain at the original manifold. A smaller choice of mildly biases the optimization procedure towards the original manifold, such that the loss function and gradients remain feasible to estimate with finite statistics. An alternate approach is to directly penalize distance between ReS and S using a regularization term, This term is minimized when S = ReS, providing a bias towards remaining close to the original manifold. Though written as a path integral, this quantity can be estimated using the original samples, much like the main loss function and gradients. Both of these regularizer terms were explored, however no severe overlap problem was observed during training