Alternating Descent Method for Gauge Cooling of Complex Langevin Simulations

We study the gauge cooling technique for the complex Langevin method applied to computations in lattice quantum chromodynamics. We propose a new solver of the minimization problem that optimizes the gauge; the solver is parameter-free in each iteration and shows better performance than the classical gradient descent method, especially when the lattice size is large. Two numerical tests are carried out to demonstrate the effectiveness of the new algorithm.


Introduction
Lattice QCD is the standard nonperturbative tool for quantum chromodynamics (QCD). The link variables U_{x,µ} ∈ SU(3) stand for the gluon field between the lattice points x and x + µ̂, where x = (t, x₁, x₂, x₃) ∈ ℕ⁴ and µ̂ is the unit vector in the µ-th direction. Using {U} to represent the collection of all link variables U_{x,µ}, one can compute the expectation value of any observable O by

⟨O⟩ = (1/Z) ∫ O({U}) e^{−S({U})} d{U},

where the partition function Z is given by

Z = ∫ e^{−S({U})} d{U}.

At finite chemical potential the action S is complex, so that e^{−S} can no longer be interpreted as a probability density; this is the notorious numerical sign problem. Several methods have been proposed to mitigate the numerical sign problem, including analytical methods such as analytical continuation and series expansion [11,4], the Lefschetz thimble method [10,9], and the complex Langevin method (CLM) [15,14,16,22]. The CLM is a straightforward generalization of the real Langevin method, which extends the dynamics on the SU(3) field to dynamics on the SL(3, ℂ) field. Despite its formal legitimacy [2], the results may diverge or even converge to wrong values [20,24], and this often occurs when the complex Langevin dynamics makes large excursions away from the unitary field. Recently, by a detailed analysis of the boundary terms, the underlying mechanism of the wrong convergence was uncovered in [24], and this result is further developed in [25] to find corrections of the numerical values.
Nevertheless, the most straightforward way to stabilize the dynamics is to pull the field closer to SU(3), which can be done by gauge cooling [26] or dynamical stabilization [5]. The gauge cooling method utilizes the redundant degrees of freedom in the gauge field, and it does not introduce any bias to the expectation values. This method has been formally justified in [20,18], and it has been successfully applied to a number of problems [27,19]. Basically, the gauge cooling method requires choosing an optimal gauge for the complexified field, which minimizes the distance between the current field and SU(3). In one dimension, it was shown in [8] that the problem can be solved analytically, since the field is essentially equivalent to the one-link model. While the analytical solution is unavailable in the multi-dimensional case, the minimization problem is usually solved by the gradient descent method, for which the step length has to be chosen carefully to achieve a satisfactory convergence rate. Some techniques for choosing the step length have been discussed in [1,6].
In this paper, we would like to focus on the gauge cooling problem and propose a more efficient method to solve the optimization problem, which does not require delicate selection of the step length. The rest of this paper is organized as follows. In Section 2, we give a brief introduction to the complex Langevin method and the gauge cooling technique. Our alternating descent method is presented in Section 3. Section 4 includes two numerical experiments to test the new method. Finally, some concluding remarks are given in Section 5.

Complex Langevin method and gauge cooling
In this section, we provide a brief review of the complex Langevin method and the method of gauge cooling. To be more general, we consider the SU(n) gauge theory with {U} ∈ [SU(n)]^{4N₀N₁N₂N₃}, where N₀ is the number of lattice points in the time direction and N₁, N₂, N₃ are the numbers of lattice points in the spatial directions. Define

G = { x = (x₀, x₁, x₂, x₃) ∈ ℕ⁴ : 0 ≤ x_µ < N_µ for µ = 0, 1, 2, 3 };

then {U} can be represented by {U_{x,µ} : x ∈ G; µ = 0, 1, 2, 3}. We assume periodic boundary conditions for {U}.
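In code, this lattice bookkeeping might look as follows (a sketch with our hypothetical helper names `G` and `neighbor`; the periodic boundary condition is realized by the modulo operation):

```python
from itertools import product

# The index set G for an N0 x N1 x N2 x N3 lattice and the periodic
# neighbor map x -> x + mu^ (unit shift in direction mu).
N = (4, 4, 4, 4)  # (N0, N1, N2, N3)
G = list(product(*(range(Nmu) for Nmu in N)))

def neighbor(x, mu):
    """The site x shifted by one unit in direction mu, wrapping periodically."""
    y = list(x)
    y[mu] = (y[mu] + 1) % N[mu]
    return tuple(y)
```

Each link variable U_{x,µ} is then naturally indexed by a pair (x, µ) with x ∈ G and µ ∈ {0, 1, 2, 3}.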
Let w_{a,x,µ} for a = 1, …, n²−1, x ∈ G and µ = 0, 1, 2, 3 be independent Brownian motions. The complex Langevin method is described by a complex stochastic process:

(1) dU_{x,µ} = i Σ_{a=1}^{n²−1} λ_a ( −D_{a,x,µ} S({U}) dt + dw_{a,x,µ} ) U_{x,µ},

where λ_a, a = 1, …, n²−1, are the infinitesimal generators of the SU(n) group satisfying

λ_a† = λ_a,  tr λ_a = 0,  tr(λ_a λ_b) = 2δ_{ab},

and D_{a,x,µ} denotes the left Lie derivative operator defined as

D_{a,x,µ} f({U}) = lim_{ε→0} [ f({U^ε}) − f({U}) ] / ε,

with {U^ε} being the set of the matrices U^ε_{y,ν} = exp(iε δ_{xy} δ_{µν} λ_a) U_{y,ν} for y ∈ G and ν = 0, 1, 2, 3. Here the field {U^ε} is the perturbation of {U} that replaces only the single link U_{x,µ} by exp(iελ_a) U_{x,µ}. In practice, we apply the Euler–Maruyama discretization of the equation (1) for the time evolution:

(2) U^{(k+1)}_{x,µ} = exp[ i Σ_{a=1}^{n²−1} λ_a ( −D_{a,x,µ} S({U^{(k)}}) ∆t + √∆t η^{(k)}_{a,x,µ} ) ] U^{(k)}_{x,µ},

where ∆t is the time step and each η^{(k)}_{a,x,µ} is normally distributed with mean 0 and variance 2. The complex Langevin method may diverge due to insufficient decay of the probability density function at infinity, which is often caused by excessive excursions away from the unitary field. The gauge cooling method uses the redundant degrees of freedom in the gauge theory to restrict such excursions. Specifically, the observable O and the complex action S are invariant under the gauge transformation defined as

(3) Ũ_{x,µ} = V_x^{−1} U_{x,µ} V_{x+µ̂},

where V_x ∈ SL(n, ℂ) for any x ∈ G, and the matrices V_x also satisfy the periodic boundary conditions. Define {Ũ} as the collection of all the Ũ_{x,µ}. The gauge cooling method applies such a transformation after every time step. By choosing V_x appropriately, the drift away from the unitary field can be mitigated. The distance between {U} and [SU(n)]^{4N₀N₁N₂N₃} can be measured by the norm

F({U}) = Σ_{x∈G} Σ_{µ=0}^{3} tr( U_{x,µ} U_{x,µ}† ),

which attains its minimum 4n N₀N₁N₂N₃ exactly when every link is unitary, so that gauge cooling amounts to the minimization problem

(4) argmin { F({Ũ}) : V_x ∈ SL(n, ℂ), x ∈ G }.

In [26], this problem is solved iteratively by the gradient descent method:

(5) U^{(k+1)}_{x,µ} = (V^{(k)}_x)^{−1} U^{(k)}_{x,µ} V^{(k)}_{x+µ̂},  V^{(k)}_x = exp( −s Σ_{a=1}^{n²−1} f^{(k)}_{a,x} λ_a ),

where f^{(k)}_{a,x} is the derivative of F({Ũ}) with respect to the a-th transformation parameter at the site x, and s stands for the step length. Since the optimization problem has to be solved at every time step, an accurate line search method may be too expensive for gauge cooling. The simplest method is to choose s proportional to ∆t [26], while it is unclear how to select a suitable coefficient.
In [1], the authors introduced the method of adaptive gauge cooling, which sets s to be larger when the current solution is farther away from the minimum point, and controls the divergence by reducing s when the cooling drift is large. In [6], the authors determine s numerically by applying Brent's method [7] and imposing an upper bound to avoid instability.
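As a concrete illustration of the discretization (2), the following sketch evolves a single link by Euler–Maruyama steps. It is our toy setup, not the paper's: an SU(2) one-link model with a hypothetical complex action S(U) = −β tr U in place of the lattice QCD action, and the Lie derivative evaluated by a central finite difference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pauli matrices as the generators lambda_a of SU(2) (the paper uses SU(3)
# with the Gell-Mann matrices; SU(2) keeps the sketch short).
lam = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]

def expm(A):
    """Matrix exponential via eigendecomposition (fine for generic small A)."""
    w, V = np.linalg.eig(A)
    return V @ np.diag(np.exp(w)) @ np.linalg.inv(V)

def S(U, beta=1.0 + 0.5j):
    """Toy complex one-link action (our stand-in for the lattice action)."""
    return -beta * np.trace(U)

def lie_derivative(f, U, a, eps=1e-6):
    """Left Lie derivative D_a f(U): d/d(eps) of f(exp(i eps lam_a) U) at 0."""
    fp = f(expm(1j * eps * lam[a]) @ U)
    fm = f(expm(-1j * eps * lam[a]) @ U)
    return (fp - fm) / (2 * eps)

def euler_maruyama_step(U, dt=1e-3):
    """One step of (2): U <- exp(i sum_a lam_a (-D_a S dt + sqrt(dt) eta_a)) U,
    with real noise eta_a of mean 0 and variance 2."""
    A = np.zeros((2, 2), dtype=complex)
    for a in range(3):
        drift = -lie_derivative(S, U, a)
        eta = rng.normal(0.0, np.sqrt(2.0))
        A += 1j * lam[a] * (drift * dt + np.sqrt(dt) * eta)
    return expm(A) @ U

U = np.eye(2, dtype=complex)
for _ in range(200):
    U = euler_maruyama_step(U)
```

Since the generators are traceless, each step preserves det U = 1, so the link wanders in SL(2, ℂ) rather than SU(2); in the full method, a gauge cooling transformation would be applied after each such step.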

Alternating descent method for gauge cooling
To avoid the delicate selection of the step length, we introduce in this section a new numerical method for gauge cooling, which is free of parameters in each iteration. The general idea of this method is to decompose the variables V_x into two subsets, and these two subsets of variables are optimized in turn. Below we refer to this method as the alternating descent method.
To begin with, we define two index sets:

G_e = { x ∈ G : x₀ + x₁ + x₂ + x₃ is even },  G_o = { x ∈ G : x₀ + x₁ + x₂ + x₃ is odd }.

In order to be consistent with the periodic boundary condition, we require that N₀, N₁, N₂ and N₃ be even numbers. Each iteration in the alternating descent method then contains two steps:
1) fix V_x = I for all x ∈ G_o and minimize F({Ũ}) over V_x, x ∈ G_e;
2) fix V_x = I for all x ∈ G_e and minimize F({Ũ}) over V_x, x ∈ G_o.
The operation in each step finds the optimal gauge transformation with the transformation variables fixed to the identity on the odd (or even) lattice points. To implement the above algorithm, it remains only to find the optimal matrices V_x in each step. Due to the periodic boundary condition, the algorithms for the two steps are nearly identical, and below we focus only on the algorithm for x ∈ G_e. For simplicity, we omit the iteration index, and thus the optimization problem to be solved is

(6) argmin { F({Ũ}) : V_x ∈ SL(n, ℂ), x ∈ G_e; V_x = I, x ∈ G_o }.

Note that the objective function is exactly the value of F(·) for the link variables after the gauge transformation. Since every link connects an even site and an odd site, the optimization of every V_x in this problem can be fully decoupled, meaning that we can solve the minimization problem

(7) argmin_{V_x ∈ SL(n, ℂ)} [ tr( V_x^{−1} P_x V_x^{−†} ) + tr( V_x† Q_x V_x ) ],  P_x = Σ_{µ=0}^{3} U_{x,µ} U_{x,µ}†,  Q_x = Σ_{µ=0}^{3} U_{x−µ̂,µ}† U_{x−µ̂,µ},

for all x ∈ G_e independently, which makes it feasible to solve (6) exactly. Now we fix x ∈ G_e and try to solve (7). Since only the links attached to x are affected, we may parameterize

(8) V_x = exp( Σ_{a=1}^{n²−1} V_{a,x} λ_a ),

where all V_{a,x} are real numbers, as in the gradient descent method. By straightforward calculation, setting the derivatives of the objective function with respect to the V_{a,x} to zero yields

(9) V_x V_x† Q_x V_x V_x† = P_x + α_x V_x V_x†,

where the real number α_x plays the role of a Lagrange multiplier for the constraint det(V_x V_x†) = 1. Now we further simplify the notation by omitting the variable x, so that the equation (9) can be rewritten as

(10) V V† Q V V† = P + α V V†.

If α is given, this equation is the algebraic Riccati equation [17]. The standard way to solve this equation is to first solve the eigenvalue problem

(11) [ −(α/2) I , Q ; P , (α/2) I ] [ X ; Y ] = [ X ; Y ] Λ,

where X, Y, Λ ∈ ℂ^{n×n} and Λ is a diagonal matrix whose diagonal elements all have positive real parts. Then the solution can be written as V V† = Y X^{−1}. Here Λ gives only a half of the eigenvalues of the first matrix in (11), and the other half of the eigenvalues have negative real parts. The solution contains the parameter α, which can be further determined by det(V V†) = 1.
In our case, we can simplify the problem by expanding (11):

(12) Q Y = X ( Λ + (α/2) I ),  P X = Y ( Λ − (α/2) I ).

Eliminating Y from these two equations gives

(13) Q P X = X ( Λ² − (α²/4) I ).

Since P and Q are Hermitian positive definite matrices, their product QP is diagonalizable and all its eigenvalues are real and positive. Thus X can be obtained by solving the smaller eigenvalue problem (13). Here we assume that det X = 1. Suppose ξ₁, …, ξₙ are the eigenvalues of QP. Then

Λ = diag( √(ξ₁ + α²/4), √(ξ₂ + α²/4), …, √(ξₙ + α²/4) ),

and the first equation in (12) gives

(14) V V† = Y X^{−1} = Q^{−1} X ( Λ + (α/2) I ) X^{−1}.
Taking the determinant of (14) and using det(V V†) = 1, we obtain the equation for α:

(15) Π_{i=1}^{n} ( √(ξ_i + α²/4) + α/2 ) = det Q.

Note that this equation determines a unique solution, due to the strict monotonicity of the left-hand side with respect to α. Thus we obtain the following algorithm to solve V:
1) Eigendecompose the matrix QP to get the eigenvalues ξ₁, ξ₂, …, ξₙ and the eigenvector matrix X with det(X) = 1.
2) Solve the equation (15) to find the value of α.
3) Compute V V† by (14).
4) Find V by taking the matrix square root of (14).
The above algorithm solves the optimization problem (7) exactly, and thus provides the solution of (6).
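The four steps can be sketched numerically as follows. This is our illustrative implementation (the function name `site_gauge` is hypothetical); it assumes the site problem of minimizing tr(W Q) + tr(W^{−1} P) over Hermitian positive definite W = V V† with det W = 1, for Hermitian positive definite P and Q.

```python
import numpy as np

def site_gauge(P, Q, tol=1e-12):
    """Minimize tr(W Q) + tr(W^{-1} P) over Hermitian W > 0 with det W = 1,
    where W = V V^dagger, via the Riccati equation W Q W = P + alpha W."""
    # Step 1: eigendecomposition of QP (real positive spectrum for P, Q > 0).
    xi, X = np.linalg.eig(Q @ P)
    xi = xi.real
    detQ = np.linalg.det(Q).real

    # Step 2: solve prod_i (sqrt(xi_i + alpha^2/4) + alpha/2) = det Q;
    # the left-hand side is strictly increasing in alpha, so bisect.
    def lhs(alpha):
        return np.prod(np.sqrt(xi + alpha ** 2 / 4) + alpha / 2)
    lo, hi = -1.0, 1.0
    while lhs(lo) > detQ:
        lo *= 2
    while lhs(hi) < detQ:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lhs(mid) < detQ else (lo, mid)
    alpha = 0.5 * (lo + hi)

    # Step 3: W = V V^dagger = Q^{-1} X diag(sqrt(xi + alpha^2/4) + alpha/2) X^{-1}.
    lam = np.sqrt(xi + alpha ** 2 / 4) + alpha / 2
    W = np.linalg.solve(Q, X @ np.diag(lam) @ np.linalg.inv(X))
    W = 0.5 * (W + W.conj().T)   # clean up round-off: W is Hermitian

    # Step 4: V = principal square root of the Hermitian matrix W.
    w, E = np.linalg.eigh(W)
    V = E @ np.diag(np.sqrt(w)) @ E.conj().T
    return V, W, alpha
```

The bisection in step 2 exploits the monotonicity noted above, and its cost is negligible compared with the eigendecompositions; on positive definite inputs the returned W satisfies the Riccati equation and never increases the site objective relative to V = I.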
This method turns out to be similar to the coordinate descent method, which updates one variable at a time. The coordinate descent method has wide applications in large-scale optimization problems [21], especially when the subproblem for each variable can be solved exactly [13,28]. In the following section, we will show by numerical tests that such a method is also highly efficient in the gauge cooling problem.
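As a minimal illustration of this analogy (a toy quadratic of ours, unrelated to the lattice setting), alternating exact minimization over two scalar variables converges to the joint minimizer:

```python
def exact_coordinate_descent(steps=50):
    """Minimize f(x, y) = x^2 + x*y + y^2 - 3*y by alternately solving each
    one-dimensional subproblem in closed form."""
    x = y = 0.0
    for _ in range(steps):
        x = -y / 2.0         # argmin over x with y fixed: df/dx = 2x + y = 0
        y = (3.0 - x) / 2.0  # argmin over y with x fixed: df/dy = x + 2y - 3 = 0
    return x, y
```

Each update is the closed-form argmin of one variable with the other fixed, mirroring the exact site solves of the alternating descent method.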

Numerical Examples
We will now apply the alternating descent method to two numerical examples to demonstrate its efficiency.
In both examples, the Euler–Maruyama method (2) is applied to solve the complex Langevin dynamics. Our method will be compared with the gradient descent method, and when the alternating descent method is used, only one iteration is applied after each time step. The two examples to be shown below are, respectively, a one-dimensional Polyakov loop model and a four-dimensional model for heavy quark QCD.

4.1. Polyakov loop model. For the one-dimensional Polyakov loop model, we denote the link variables by U_k, k = 1, 2, …, N. Following [26], we define the action S by

S({U}) = −tr( β₁ U₁ ⋯ U_N + β₂ U_N^{−1} ⋯ U₁^{−1} ),

and choose β₁ = β + κe^{µ} and β₂ = β + κe^{−µ} with β = 2, κ = 0.1 and µ = 1. The model is applied to the SU(3) theory, so that λ_a, a = 1, …, 8, are the Gell-Mann matrices. For the one-dimensional model, the parameter α_x in (9) always equals zero, since both V_x^{−1} P_x V_x^{−†} and V_x† Q_x V_x are Hermitian, positive definite, and have determinant 1.
In our numerical experiments, we choose the Langevin time step to be ∆t = 2 × 10⁻⁵ and the simulation ends at t = 10. Note that in the one-dimensional case, the exact solution of the optimization problem (4) can be obtained analytically in terms of a diagonal matrix Λ whose diagonal entries are the eigenvalues of U_N U₁ ⋯ U_{N−1}; for details, we refer the readers to a recent work [8]. Thus, the following four methods will be tested in this example:
• Complex Langevin with no gauge cooling.
• Complex Langevin with gauge cooling implemented by the gradient descent method [26].
• Complex Langevin with gauge cooling implemented by the alternating descent method introduced in Section 3.
• Complex Langevin with the optimal gauge cooling given by the analytical solution [8].
For the gradient descent method, the link variables are updated by (5), where the step length s is set as s = ∆t, and the iteration is applied three times after every Langevin step. For the alternating descent method, only one iteration per time step is applied, as mentioned at the beginning of this section. From the figures, we can observe, without surprise, that if no gauge cooling is applied, the computations are forced to terminate for N = 8, 16, 32 due to overflow, and even for N = 4, the norm of the links increases too quickly. For N = 4 and 8, the performance of the gradient descent method is excellent due to its good agreement with the optimal gauge cooling. But the disparity between these two methods increases gradually as N gets larger, as is obvious for N = 16 and 32. On the contrary, in all cases, the alternating descent method performs almost as well as the optimal gauge cooling. In general, the alternating descent method is always competitive, and when the number of lattice points is large, it can significantly outperform the gradient descent method. To demonstrate the efficiency of the method, we have also listed in Table 1 the computational time in our experiments, including both the total computational time and the time spent only on gauge cooling; as before, three iterations of the gradient descent method are applied at every time step, and only one iteration of the alternating descent method. On average, the computational time per iteration is similar for the two methods, and the alternating descent method is faster because it requires fewer iterations.
By comparing the two columns for each method, it can be seen that gauge cooling takes a significant portion of the total computational time, so that a better optimization algorithm can effectively reduce the computational cost. It is also illustrated in Fig. 1 that for the gradient descent method applied to N = 32, if we increase the number of iterations to 20, the dynamics can be better stabilized, although the value of ∆F is still about one order of magnitude larger than that of the alternating descent method. However, this also results in a significant increase of the computational cost. By tuning the parameters, we find that choosing the step length s = 100∆t and keeping the number of iterations at 3 provides a better balance between the computational cost and the efficacy of gauge cooling. Even so, the alternating descent method is still superior in this case.
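For comparison, a gradient-descent cooling iteration can be sketched as follows. This is our SU(2) toy chain, not the paper's setup: the gradient of the distance F is taken by central finite differences as a stand-in for the analytic cooling force of [26], and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pauli matrices as generators (the paper uses SU(3); SU(2) keeps it short).
lam = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]

def expm(A):
    w, V = np.linalg.eig(A)
    return V @ np.diag(np.exp(w)) @ np.linalg.inv(V)

def F(U):
    """Distance of a periodic chain of SL(2,C) links from the unitary field."""
    return sum(np.trace(u @ u.conj().T).real for u in U)

def cooled(U, c):
    """Gauge transform U_x -> V_x^{-1} U_x V_{x+1}, V_x = exp(sum_a c[x,a] lam_a)."""
    n = len(U)
    V = [expm(sum(c[x, a] * lam[a] for a in range(3))) for x in range(n)]
    return [np.linalg.inv(V[x]) @ U[x] @ V[(x + 1) % n] for x in range(n)]

def gradient_step(U, s=1e-3, eps=1e-6):
    """One gradient-descent cooling iteration with step length s."""
    n = len(U)
    g = np.zeros((n, 3))
    for x in range(n):
        for a in range(3):
            c = np.zeros((n, 3))
            c[x, a] = eps
            fp = F(cooled(U, c))
            c[x, a] = -eps
            fm = F(cooled(U, c))
            g[x, a] = (fp - fm) / (2 * eps)  # central finite difference
    return cooled(U, -s * g)

def random_links(n=4):
    """Unitary links times a fixed Hermitian factor, i.e. mildly non-unitary."""
    U = []
    for _ in range(n):
        Z = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
        Qm, _ = np.linalg.qr(Z)
        Qm = Qm / np.linalg.det(Qm) ** 0.5   # det = 1
        U.append(expm(0.2 * lam[0]) @ Qm)    # Hermitian factor, det = 1
    return U
```

Each iteration decreases F for a sufficiently small step length s, but the decrease per iteration depends delicately on s, which is exactly the tuning burden the alternating descent method removes.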
To verify the validity of these numerical methods, we calculated the expectation values of the observables O_k({U}) = tr[(U₁ U₂ ⋯ U_N)^k] for k = ±1, ±2, ±3. Samples are taken every 50 steps from t = 1 until t = 10. The results are listed in Table 2, and since the exact value of ⟨O_k⟩ is real, the imaginary part is omitted. The exact values of the expectations can be obtained by Weyl's integration theorem. For N = 4, although the expectation values can be computed without gauge cooling, the results are obviously erroneous. For N = 4, 8 and 16, all the gauge cooling methods give reasonable approximations of the exact values. When N = 32, the data for the gradient descent method provided in Table 2 are computed using s = ∆t with 3 iterations per step, and they show large errors, especially when |k| is large.

To demonstrate the efficiency of the alternating descent method more clearly, we remove the complex Langevin dynamics and consider solving the optimization problem (4) directly. The link variables {U} are set to be a random complexified gauge transformation applied to a random SU(3) field, so that one solution of the problem is the field before the gauge transformation, for which ∆F = 0. Below we compare the performance of the alternating descent method and the gradient descent method with various step lengths s. The numbers of link variables are chosen to be 4, 32, 256 and 1024, and the results are plotted in Fig. 2. Note that the gradient descent method may diverge when the step length is too large, and in our numerical test, for every N we have tried to maximize the step length s while maintaining the stability of the iterative process. In all test cases, the alternating descent method shows its superiority in the cooling effect. In particular, it can efficiently bring down the value of ∆F by two to three orders of magnitude in the first few steps, and this effect is independent of the number of link variables.
Such a property is highly desired in the simulation of lattice QCD, for it allows us to reduce the computational cost of gauge cooling.
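The benchmark setup described above can be reproduced in a few lines (our sketch on a one-dimensional SU(3) chain; `random_SU` and `random_SL_factor` are hypothetical helpers): since the unitary field minimizes the distance, the transformed field can only increase it.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_SU(n):
    """Random special unitary matrix: QR of a complex Gaussian, phases fixed,
    determinant normalized to 1."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Qm, R = np.linalg.qr(Z)
    Qm = Qm @ np.diag(np.diag(R) / np.abs(np.diag(R)))
    return Qm / np.linalg.det(Qm) ** (1.0 / n)

def expm(A):
    w, V = np.linalg.eig(A)
    return V @ np.diag(np.exp(w)) @ np.linalg.inv(V)

def random_SL_factor(n, scale=0.3):
    """Non-unitary SL(n,C) matrix exp(scale * H) with H traceless Hermitian."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = 0.5 * (Z + Z.conj().T)
    H = H - np.trace(H) / n * np.eye(n)  # traceless => det exp(H) = 1
    return expm(scale * H)

def F(U):
    return sum(np.trace(u @ u.conj().T).real for u in U)

# A random SU(3) chain, then a random complexified gauge transform of it:
# the unitary field is a known minimizer, so Delta F = F(Ut) - F(U) >= 0.
N, n = 8, 3
U = [random_SU(n) for _ in range(N)]
V = [random_SL_factor(n) for _ in range(N)]
Ut = [np.linalg.inv(V[k]) @ U[k] @ V[(k + 1) % N] for k in range(N)]
```

Feeding such a field `Ut` to any gauge cooling iteration and monitoring how quickly the distance returns toward its minimum reproduces the comparison of Fig. 2 in spirit.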

4.2. Heavy quark QCD. In this example, we consider QCD at finite chemical potential as described in [3]. Denoting any x ∈ G by x = (t, x), the fermion determinant is given by

det M({U}) = Π_{x} det( 1 + C e^{µN₀} P_x )² det( 1 + C e^{−µN₀} P_x^{−1} )²,  C = (2κ)^{N₀},

where P_x = Π_{t=0}^{N₀−1} U_{(t,x),0} denotes the Polyakov loop and κ is the hopping parameter. Define the plaquette U_{x,µν} = U_{x,µ} U_{x+µ̂,ν} U^{−1}_{x+ν̂,µ} U^{−1}_{x,ν}. We can then write the action as

S({U}) = −(β/6) Σ_{x∈G} Σ_{0≤µ<ν≤3} [ tr U_{x,µν} + tr U^{−1}_{x,µν} ] − ln det M({U}).

We refer the readers to [3] for the Lie derivatives of this action. The same example has also been computed using the complex Langevin method in [23,26,12]. In our simulation, we choose the chemical potential µ ranging from 0 to 4.0, with β = 3.0 and β = 5.0, to compute the expectation values of several observables. The value of κ is set to be 0.12. The lattice used in our numerical tests is N₀ = N₁ = N₂ = N₃ = 4, and the Langevin time step is ∆t = 2 × 10⁻⁵. Again, at every Langevin step, we perform three iterations of the gradient descent method and only one iteration of the alternating descent method. Since the lattice size in each direction is small, we expect the gradient descent method and the alternating descent method to have similar efficiency, while the alternating descent method still holds the advantage of being parameter-free in each iteration. Moreover, on average, the gradient descent method spends 8.22 ms on gauge cooling, while the alternating descent method spends only 3.15 ms.
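The gauge invariance that underlies both the action and gauge cooling can be checked directly on the plaquette. The sketch below is ours, using a two-dimensional toy lattice instead of the four-dimensional one: the trace of the plaquette is unchanged by any gauge transformation.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_SU(n):
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Qm, R = np.linalg.qr(Z)
    Qm = Qm @ np.diag(np.diag(R) / np.abs(np.diag(R)))
    return Qm / np.linalg.det(Qm) ** (1.0 / n)

# A tiny 2D periodic lattice with SU(3) links U[x][y][mu], mu in {0, 1}.
L, n = 2, 3
U = [[[random_SU(n) for mu in range(2)] for y in range(L)] for x in range(L)]

def shift(x, y, mu):
    """Periodic neighbor in direction mu."""
    return ((x + 1) % L, y) if mu == 0 else (x, (y + 1) % L)

def plaquette(U, x, y):
    """U_{x,01} = U_{x,0} U_{x+0^,1} U_{x+1^,0}^{-1} U_{x,1}^{-1}."""
    x0, y0 = shift(x, y, 0)
    x1, y1 = shift(x, y, 1)
    return (U[x][y][0] @ U[x0][y0][1]
            @ np.linalg.inv(U[x1][y1][0]) @ np.linalg.inv(U[x][y][1]))

def gauge_transform(U, V):
    """Apply U_{x,mu} -> V_x^{-1} U_{x,mu} V_{x+mu^}."""
    W = [[[None, None] for y in range(L)] for x in range(L)]
    for x in range(L):
        for y in range(L):
            for mu in range(2):
                xs, ys = shift(x, y, mu)
                W[x][y][mu] = np.linalg.inv(V[x][y]) @ U[x][y][mu] @ V[xs][ys]
    return W

V = [[random_SU(n) for y in range(L)] for x in range(L)]
Ut = gauge_transform(U, V)
```

Under the transformation the plaquette conjugates to V_x^{−1} U_{x,µν} V_x, so its trace, and hence the gauge part of the action, is invariant; this is the redundancy that gauge cooling exploits.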
For β = 3.0, we take samples every 50 steps from t = 2 to t = 10 for all values of µ. For β = 5.0, the Langevin dynamics becomes less stable for large µ; we therefore run 8 parallel complex Langevin processes and, for each of them, take one sample every 50 steps from t = 2 to t = 4, so that in total the same number of samples as in the case β = 3.0 is collected to compute the observables.
The observables obtained by both methods are plotted in Fig. 3 and Fig. 4. For small µ, these results can be cross-checked with the analytical results in [23], and the bell-shaped curves agree with the results computed in [26]. As expected, the results of both methods agree well with each other, and it is worth mentioning that the gradient descent method applies three iterations to solve the optimization problem after every time step, while the alternating descent method applies only one. Fig. 5 shows the evolution of ∆F. Our one-dimensional examples in Section 4.1 suggest similar behavior of the two methods for small lattices, and this is generally confirmed in our numerical tests. However, in the case β = 5.0 and µ = 1.5, the gradient descent method shows several sharp peaks of the norm, indicating occasional failures of gauge cooling. While the details remain to be studied, this indicates the better stability of the alternating descent method. We also expect better performance of the alternating descent method when the lattice size gets larger.

Conclusion
We have introduced a new gauge cooling technique called the alternating descent method, which effectively finds the gauge transformation that pulls the complexified gauge field closer to the unitary field. The key properties of this method include:
• No parameter needs to be chosen in each iteration.
• It is easy to implement since the cooling step is implemented site by site.
• The objective function decreases rapidly, especially in the first few steps.
• The performance of the method is stable for various numbers of link variables.
Our numerical tests show that when the number of link variables is small, its performance is comparable to the gradient descent method with carefully chosen step length; and when the lattice size gets larger, the alternating descent method gets more superior. The underlying mechanism of the alternating descent method needs to be further studied, and we expect more applications to be carried out in our future work.