Finite-Density Monte Carlo Calculations on Sign-Optimized Manifolds

We present a general technique for addressing sign problems that arise in Monte Carlo simulations of field theories. This method deforms the domain of the path integral to a manifold in complex field space that maximizes the average sign (therefore reducing the sign problem) within a parameterized family of manifolds. We presents results for the $1+1$ dimensional Thirring model with Wilson fermions on lattice sizes up to $40\times 10$. This method reaches higher $\mu$ then previous techniques while substantially decreasing the computational time required.


I. INTRODUCTION
Monte Carlo methods are critical to the study of field-theoretical and many-body systems. In particular, they are the only general-purpose approach to address strongly interacting field theories. The basic idea of all Monte Carlo methods is simple: observables are formulated as path integrals which, on a discretized spacetime, become high dimensional integrals. Those are then estimated stochastically by importance sampling. Importance sampling relies on interpreting part of the integrand (typically the exponential of the action) as a probability, which makes sense only if this term is real and non-negative. Unfortunately, many theories, even when formulated in imaginary time (Euclidean space), have a negative or even complex integrand. This so-called "sign problem" is a major roadblock to the understanding of some of the most important systems in physics. Many systems at finite density (including QCD at finite baryon density) and non-relativistic systems lacking some special symmetry between fermion species (as in the Hubbard model away from half-filling or on non bi-partite lattices) suffer from sign problems. Also, some real time observables in thermal equilibrium as well as truly non-equilibrium phenomena are not amenable to imaginary time calculations and have a particularly severe sign problem that renders most Monte Carlo methods a non-starter. A simple, albeit not very effective, way of dealing with the sign problem is to choose a manifestly positive part of the integrand as the statistical weight while moving the part with the fluctuating sign/phase to the observable to be measured. This "reweighting" is effective to the extent that the average sign, that is, the average of the fluctuating sign on the ensemble defined by the positive measure, is not too small. However, in theories with sign problems, the average sign typically decreases exponentially with the volume and the inverse temperature of the system. Many techniques have been proposed in the past to ameliorate the sign problem. Among them are the complex Langevin method [1], the density of states method [2], canonical methods [3,4], reweighting methods [5], series expansion in the chemical potential [6], fermion bags [7], and analytic continuation from imaginary chemical potentials [8]. Each one has its successes and pitfalls. It is fair to say, however, that the sign problems of field theories remain largely unsolved.
More recently, the "thimble" method was proposed [9,10]. The main idea is to complexify the domain of the path integral. Instead of integrating over real values of the fields, one deforms the manifold of integration from R N ⊂ C N to some other N -dimensional manifold, M ⊂ C N . A multidimensional generalization of Cauchy's theorem of complex analysis guarantees, under some conditions on M, that the integral over M and R N of any holomorphic integrand is the same. This allows one to compute expectation values of observables O for which Oe −S is holomorphic, if M is properly chosen. The key to these methods is that the average sign e −iS I is an integral of a non-holomorphic function, and therefore dependent upon the integration manifold, whereas the physical expectation values do not. The manifold M was originally suggested to be the combination of thimbles, multidimensional generalizations of the steepest descent/constant phase path familiar from complex analysis. This method and its associated algorithmic problems were pursued by several groups [11][12][13][14][15][16][17][18][19][20][21][22][23][24]. Relevant analytical work, closely connected to the "resurgent transseries" (for a recent review see [25]) was also pursued in [26][27][28][29][30]. Experience with actual simulations made evident some problems with the thimble approach. The first is that thimbles are complicated manifolds that have to be found "on the fly" by the algorithm and the lack of a local characterization of thimbles makes this computationally expensive.
Second, theories where more than one thimble contribute to the path integral significantly [27] are particularly difficult to sample [16,20].
This led to some modifications of the method. In the generalized thimble method [19,22,23] the manifold of integration M is chosen to be the deformation of R N by the holomorphic flow defined by the action. If R N is deformed by the flow by an infinite amount of flow time, M approaches the right combination of thimbles equivalent to the original integration domain. If the flow is stopped at some finite flow time, M is close, but not identical, to the sum of appropriate thimbles. It is, however, a legitimate manifold of integration in the sense that it gives exactly the same result as the original manifold R N . The advantage of the manifold M over the thimbles is that 1) less flow corresponds to smaller computational cost and 2) M can be algorithmically constructed during the simulation by solving the flow equations while finding the thimbles and determining which ones contribute to the integral is a difficult task in all but the simplest field theories. This is not to say that the generalized method does not have its own problems. Large flow times can improve the sign problem but generate multimodal distributions difficult to sample. Shorter flowing times avoid the multimodality but improve the sign problem less, so the flow time has to be carefully chosen and, in fact, there is no guarantee that a "middle ground" flowing time can be found. (Multimodality can also be dealt with by more sophisticated sampling algorithms [24,31].) In addition, the computation of the Jacobian arising from parameterizing M by the initial point of the flow in R N is expensive. The proposal presented in this paper drastically reduces the cost of the Jacobian. At the same time, it provides more flexibility in the choice of M while systematically improving the sign problem.
One step towards speeding up the costly calculations involved in the generalized thimble method was given in [21]. A feed-forward neural network was trained to interpolate points in M obtained by the more expensive holomorphic flow. The neural net was then used to quickly generate more points in M. In the present paper we go one step further and completely bypass the need to generate points by flowing. Instead, we seek to flow directly toward a manifold of maximum average sign, albeit in a restricted family of manifolds M λ which are parameterized by a finite number of parameters λ. A similar proposal based on maximizing the approximate average sign was pursued in [32,33]. Our method produces a manifold M that can be sampled as rapidly as R N via : where a pointφ in M is parametrized by a point φ in R N . We present in Sec. II how the algorithm can be implemented, with special emphasis on the gradient ascent method we use to obtain the local maximum value of average sign. Further, it is shown that the derivative of the sign problem with respect to λ can be efficiently calculated despite a potentially small sign. The method of determining an optimal manifold for integration, as well as the procedure for integrating along that manifold, is detailed in Sec. II. In Sec. III, we define the physical model we study with this algorithm, the Thirring model. In Sec. IV we present our results, and conclusions are summarized in Sec. V.

II. THE METHOD
We start by specifying a family M λ of submanifolds of C N , parameterized by λ. The choice of this family is guided by the ease of computation of the Jacobian and some experience acquired with the generalized thimble method. We then proceed to maximize the average sign among this family of manifolds using a simple gradient ascent technique. On a manifold of integration M λ , the average sign is where φ are the fields in the theory and S eff ≡ S − ln det J is the effective action. On this manifold, we compute a vector proportional to the gradient of the magnitude of the average sign, and then proceed to change λ by a small amount along this vector.
Here η is the learning rate, determining how large each step along the computed gradient should be. We initialize M λ0 to be R N . After a large number of steps, and if the learning rate η is small enough, we should arrive at a (local) maximum of the average sign. Critically, the computation of the direction of the gradient has no sign problem.
We now show how to compute the direction of the gradient. The numerator of Eq. (2), being the integral of a holomorphic function e −S along M, does not depend on λ. In contrast, since the integral of e − Re S eff cannot be written as an integral of a holomorphic function, the denominator will vary with λ. The gradient of the magnitude | σ λ | with respect to the manifold parameters λ, then, is given by From this, we see that the gradient factorizes into two pieces: the average sign on M λ , and an expectation value of an operator on that manifold. The second factor is an expectation value with respect to e − Re S eff , and therefore is sign-problem free; the first is a scalar which does not affect the direction. This allows us to compute (up to that overall scalar) the gradient on a manifold reliably by a short Monte Carlo simulation. For a gradient ascent method, an overall magnitude like | σ λ | (even varying with λ) can be safely neglected: it does not change the direction the gradient points in λ-space. This allows our method to be efficient even when the average sign is statistically indistinguishable from zero. Therefore, at each step s of the gradient ascent, we update the manifold parameters λ s according to In principle, one might use a more efficient stochastic gradient ascent algorithm, such as Adam [34], to both speed up the calculation and avoid finding suboptimal local maximum. For this work, we found naïve gradient ascent converges adequately swiftly, and the stochastic nature of the Monte Carlo simulation used to compute the gradient helped to explore parameter space.
In a gradient ascent method, the parameter η must chosen to be small enough to avoid overshooting a maximum, but not much smaller, otherwise it will oscillate around the maximum but never converge. For our purposes, there is one additional practical consideration restricting the size of η. In calculating the expectation value of Eq. (5), we would like to avoid needing to completely re-thermalize the Markov chain after every gradient ascent step. To this end, we set the step size η to be sufficiently small that S[φ(φ)], for any fixed φ, changes only slowly with s. The value of φ at the end of one Monte Carlo run can then be used to seed the next run on the new manifold, minimizing the necessary thermalization time.
It should be stressed that lack of care in this process, or in any other detail of the sign maximization process, may reduce the average sign of the manifold ultimately found and increase the computational time, but does not affect the correctness of physical observables on that manifold. The "real-plane" integral is calculated as an integral over compact variables, that is, an integral over T N = (S 1 ) N . The manifold M λ is a submanifold of the complexified N -torus (S 1 × R) N . Cauchy's integral theorem guarantees that, provided the domain of integration is compact (as T N is), the integral over M λ will equal that over T N if the manifold T N is continuously deformable to M λ . For our purposes, this is guaranteed by making the family M λ be continuous in the parameters λ, and letting M λ = T N .
The determinant of the Jacobian J -which must be computed during a Monte Carlo on M -is a potentially expensive operation, with a cost approximately cubic in the number of degrees of freedom. To avoid this, we will chose an ansatz family M λ for which the Jacobian is diagonal. In particular, we writeφ , which is the most general ansatz possible satisfying our constraints. Relaxing this constraint to a non-diagonal Jacobian should improve the sign problem by allowing nonlocal correlations in the imaginary components of φ, but this will come at computational expense and will be left to future work.

III. THIRRING MODEL
In order to make the ideas more concrete we will phrase our discussion in terms of a specific field theory model, the 1 + 1D massive Thirring model with Wilson fermions. The lattice action is given by with where ψ is a two-component Dirac spinor with the flavor indices a taking values from 1, . . . , N F , g is the coupling, µ the fermion chemical potential and κ = 1/(2m + 4), where m is the bare mass of the fermions. Standard universality arguments applied to this asymptotically free theory indicate that, in the continuum limit, this action is equivalent to the continuum action The four-fermion interaction is generated when the bosonic A µ (x) auxiliary field is integrated over. The Thirring model was chosen since other similar methods have been applied to it, thus it serves as a useful benchmark for our method.
The integration over the fermion fields results in the action In this work we take N F = 2. For finite chemical potential µ = 0, the determinant det D(A) is not strictly real, and we must address a sign problem. In applying the method described in Sec. II we enforce three additional constraints on f , all coming from symmetries of the action Eq. (8). The action is 2π-periodic in the fields A 0 , A 1 and an even function; therefore we require the same of f . Finally, the action is invariant under translations of the lattice. The lattice degrees of freedom are divided into timelike links A 0 and spatial links A 1 . Translational invariance of the f i implies that the form of f i can depend only on whether the index i refers to an A 0 field or an A 1 field.
Consistent with these demands, we use a simple two-parameter family with f 0 (φ) = λ 0 + λ 1 cos φ and f 1 (φ) = 0, so that the manifold M λ is defined byÃ As discussed above, Cauchy's theorem guarantees that expectation values computed on M λ are equal to those computed on T N , provided that one manifold may be continuously deformed to the other. To see that this is so, note that M 0 = T N , and thatÃ is a continuous function of λ. One might consider using a larger class of manifolds. We have investigated including a cos 2 A 0 term inÃ 0 (A 0 , A 1 ) and a cos A 1 in theÃ 1 (A 0 , A 1 ), but in all cases found negligible improvement in the average sign computed on the resulting manifolds.
Once the manifold has been selected by a suitably long gradient ascent, we perform a Monte Carlo calculation to determine observables of interest via Eq. (1). The imaginary part of both the action and the log of the Jacobian determinant must be included in the reweighting. Since we chose a manifold of integration for which the Jacobian is diagonal, the Monte Carlo sampling proceeds as quickly as it would for a standard Metropolis running on R N . There are no constraints on the observables computed, aside from the requirement that Oe −S be holomorphic.

IV. RESULTS
We choose bare parameters g and m of the action so that the renormalized particle masses lie below the lattice cutoff scale. We measure two particle masses -a fermion mass am f and a boson mass am b -by fitting the large-time We perform calculations on two lattice sizes: N t × N x = 20 × 10 and 40 × 10. The maximization of the sign average is done using a step size η = 10 −4 . This step size was determined by starting with a large η, where the optimization process exhibited oscillatory behavior, and then reducing it until the process becomes smooth. We only tuned it on the most demanding ensemble, the 40 × 10 ensemble with the largest chemical potential, and used the same value for all other ensembles. The optimization process is stopped when the λ parameters converge, that is when the gradient in Eq. (5) becomes too small. The manifold parameters determined by the optimization procedure are shown in Fig. 1. optimization going forward: perform the gradient ascent at a small number of values of µ, and interpolate to determine the manifold of integration for all other desired chemical potentials. Another option is to use as a starting point for the optimization process the values determined for a "nearby" ensemble, one with similar chemical potential. The fact that the interpolated values of λ 0 , λ 1 are approximate and not strictly optimal affects only the efficiency of the algorithm, not its correctness. For more elaborate families of manifolds, with more parameters, the gradient ascent phase becomes more time-consuming. This optimization could be computationally expensive in such cases, but optimizations along the lines suggested above are likely to be available. We note that the discontinuities in Fig. 1 are due to an early exit from the optimization loop. We decided to keep these parameters to show that this discontinuity is not reflected in the observables, as a further check of the method.
With the parameters determined above, we performed a Monte-Carlo calculation generating of the order of two to ten thousand independent configurations (except for a few points discussed below). The average sign and measurements of average fermion density (per flavor) n for 0 < µ/m f < 4 on a 20×10 lattice are shown in Fig. 2. The real plane (R N ) calculations are shown in black; data points for which the average sign could not be distinguished from 0 at 2σ (indicating that no measured observable will be meaningful) are grayed out. Calculations on the tangent plane of the dominant Lefschetz thimble A 0 (x) + iA are shown in red, and those of the machine-learned learnifold L T from Ref. [21] in blue. Finally, we present calculations done on the sign-optimized manifold M S in green. We see that the sign-optimized manifold finds an average sign problem as good or better than the learnifold does, with the added benefits of being computationally faster and simpler to implement. These improvements allow for us to compute the density with reduced uncertainty and even reach higher values of µ/m f . As a further check of our results, we show the result for non-interacting fermions (with the same renormalized mass) as a dotted line. Similarly, results for a lattice size of 40 × 10 are shown in Fig. 3. On this larger lattice, the relative performance of the sign-optimized manifold is moderately improved. For µ/m f > 1.83, neither the real plane calculation, nor the learnifold, could resolve the sign problem. The sign-optimized manifold has sufficiently large average signs to allow us to measure the density up to µ/m f = 2.50. Furthermore, other methods had problems computing the density near µ/m f ≈ 1.00, while using the sign-optimized manifold we compute the density at this point easily.
At both lattice sizes, we demonstrate that the sign-optimized manifold method is capable of reproducing the "Silver Blaze" phenomenon [35]: the µ-independence of observables below the threshold chemical potential µ ≈ m f .
To estimate the speedup given by the optimized manifold over a naive calculation on R N , we performed two tests. First, 72, 000 decorrelated measurements at µ/m f = 3.33 on a 20 × 10 lattice. This number of measurements is not enough to resolve the sign average from zero, so we obtain only a lower bound on the speedup attributable to using M S . We find that the real plane has an average sign of 0.002 ± 0.003, whereas the M S has an average sign of 0.086 ± 0.007, which is larger by at least a factor of 16. The number of measurements required to obtain a fixed precision is proportional σ −2 , therefore this corresponds to a speedup greater than 250. The second test computed 10, 000 measurements on R N at µ/m f = 1.00 on a 40 × 10 lattice (this value is of interest because it corresponds to the first particle threshold). The real plane was found to have an average sign of 0.005 ± 0.005, but an average sign of 0.155 ± 0.007 on the optimized manifold, which is larger by at least a factor of 15, giving a speedup of 225.
The speedup given by this algorithm over the learnifold procedure is more difficult to estimate; however, the learnifold procedure requires evolving the holomorphic flow equations many times to achieve at best the same average sign as M S . According to Ref. [21], generating the training set and training the neural network took 114 CPU-hours for a 20 × 10 lattice at µ/m f = 3.83. The algorithm described in this paper replaces that step with a gradient ascent routine, which took approximately 24 CPU-hours, and is amenable to further optimization.

V. DISCUSSION AND PROSPECTS
We have exhibited an efficient method for reducing the sign problem of the finite density Thirring model in 1+1 dimensions. Our method works with a pre-determined family of manifolds, seeking the manifold in that family which has the largest average sign. Once such a manifold has been found, a standard Metropolis calculation, with reweighting, is performed on that manifold. Using this method, we have increased the µ/m f range that can reliably computed. It is important to stress that comparisons with other methods of dealing with the sign problem must take into account that the computational cost of the method presented here has both a fixed cost (independent of the number of measurements made) and a variable one (that is proportional to the number of measurements). The variable cost compares very favorably with other methods, especially the generalized thimble method. Therefore, the best way to apply the method is to determine the parameters of M roughly so the average sign is distinguishable from zero but not necessarily particularly close to one. Then, a high number of measurements can be made cheaply to reduce the error bars.
This method is closely related to previous approaches based on the complexification of lattice degrees of freedom, but works without evolving a differential equation to determine the manifold of integration. This makes it faster as long as the parameters defining the manifold can be determined quickly. The method has the drawback that it requires the construction of a model-specific family of manifolds, so physical insight is required. Nevertheless, given such an ansatz, the method is very advantageous. This suggests that theoretical effort should be put into generating ansatze applicable to more interesting physical theories, like gauge theories and real-time (Minkowski space) calculations of other models.