Application of the path optimization method to a discrete spin system

The path optimization method, which is proposed to control the sign problem in quantum field theories with continuous degrees of freedom by machine learning, is applied to a spin model with discrete degrees of freedom. The path optimization method is applied by replacing the spins with dynamical variables via the Hubbard-Stratonovich transformation, and the sum with the integral. The one-dimensional (Lenz-)Ising model with a complex coupling constant is used as a laboratory for the sign problem in the spin model. The average phase factor is enhanced by the path optimization method, indicating that the method can weaken the sign problem. Our result reproduces the analytic values with controlled statistical errors.


I. INTRODUCTION
To understand the non-perturbative properties of quantum field theories and spin models, the Monte Carlo (MC) method plays an important and crucial role.In the MC calculation, expectation values are evaluated with the Boltzmann weight.However, the Boltzmann weight is a complex value in some cases even if the partition function is still real.This problem is called the sign problem.A typical example is quantum chromodynamics with a finite quark chemical potential (µ), reviewed in Refs.[1,2].Another example of a discrete spin system is the Hubbard model away from the half-filling [3].
The sign problem can be milder by the path optimization method or the sign-optimized manifold [4][5][6], which has a close relationship with the Lefschetz thimble method [7].Both are the so-called complexified dynamical variable approaches based on Cauchy's integral theorem, which ensures the independence of the expectation value via modification of the integral path as long as the integrand is an entire function with no contribution at infinity.If we have an integral representation of the partition function, such as quantum field theories with continuous degrees of freedom, a sign-problemreduced integral path could be on the complexified dynamical variable plane.The path optimization method utilizes machine learning to determine the optimized integration path.The path optimization works well for several models: a simple Gaussian model [4], the 1 + 1 dimensional complex λφ 4 theory [8], the Polyakov-loop extended Nambu-Jona-Lasinio model [9,10], the 1+1 and 2+1 dimensional Thirring model [5,11], the 0 + 1 dimensional Bose gas [6], the 0 + 1 dimensional QCD [12], the two-dimensional U(1) gauge theory with complexified coupling constant [13][14][15], the 2+1 dimensional XY model [16].It is also employed for error reduction of observables [17,18].The recent progress of the complexified dynamical variable approach is reviewed in Ref. [19].
For the spin models, on the other hand, we have a sum in the partition function, instead of the integral.We cannot directly apply the complexified dynamical variable approach.A solution is the Hubbard-Stratonovich transformation.It converts the sum to the integral using the auxiliary field.We demonstrate it in the onedimensional classical (Lenz-)Ising model with a complex coupling constant.Since we have the analytic result, we can judge the correctness of results by the path optimization method.Another famous example is the Hubbard model away from half-filling [3].In this case, the complexified dynamical variable approach is feasible; for example, see the review [20].
In this paper, we apply the path optimization method [4,8] to the Ising model with the complex coupling constant.The dynamical variables are replaced by the Hubbard-Stratonovich transformation as in Refs.[21,22].We also introduce parallel tempering to the path optimization method [13] toward control of the global sign problem, as first applied in the Lefschetz thimble method [23,24].
This paper is organized as follows.In Sec.II, we explain the formulation of the Ising model with the complex coupling constant, the Hubbard-Stratonovich transformation, and the path optimization method.The numerical setup and results are shown in Sec.III and Sec.IV, respectively.Section V is devoted to a summary.

II. FORMULATION
We employ the one-dimensional classical Ising model with a complex coupling constant as a laboratory to investigate the sign problem in spin models.The sign problem is induced by the imaginary part of the external field.We first explain the integral representation of the Ising model through the Hubbard-Stratonovich transformation.We then explain the application of the path optimization method to the model.

A. Ising model with complex coupling constant
The Hamiltonian of the classical one-dimensional (Lenz-)Ising model with an external magnetic field [25,26] is given by where J is a coupling constant for the nearest-neighbor spins, h is strength of the external magnetic field, and ; this is the one-dimensional Ising chain.We impose the periodic boundary condition, σ 0 = σ N and σ N +1 = σ 1 .The Hamiltonian H can be represented in matrix style as where K is the symmetric connectivity matrix, s is the spin matrix defined as s = (σ

and
. This representation can be also applied to higher-dimensional systems by using a suitably constructed symmetric connectivity matrix.The coefficient 1/2 in Eq. ( 2) is introduced to avoid double counting of the nearest-neighbor interaction when we make K symmetric.
The sign problem arises from the imaginary part of J.The realistic Ising model does not have such an imaginary part, but is sometimes introduced for analysis of the Lee-Yang zeros [27] or the Fisher zeros [28].Such an imaginary part also naturally arises when we consider the QCD-like Potts model [29][30][31], as discussed in Appendix A The partition function of the Ising model is where the sum takes over all possible states.The inverse temperature β can be absorbed into H by replacing J ′ = βJ and h ′ = βh.

B. Hubbard-Stratonovich transformation
With the expression (2), we can use the Hubbard-Stratonovich transformation as in Ref. [22]; where the normalization constant N is defined by It should be noted that the eigenvalue of K must be positive for the Hubbard-Stratonovich transformation.We thus put a constant shift for K as where I is the unit matrix and the constant C takes the same sign as that of J.The C-independence of the physical result is confirmed in Ref. [22].If we set C > n where n is the maximum number of nearest neighbors of one site, K is positive definite; n = 2 for the one-dimensional Ising model.The final form of the partition function becomes where N ′ includes a contribution of N and C, which is irrelevant in the evaluation of the expectation values.We can consider H ′ as the effective Hamiltonian in molecular dynamics.The expectation value of magnetization for the single spin is obtained as The analytic result of the magnetization [26] is known as where λ ± are the eigenvalues of the transfer matrix of the model,

C. Path optimization method
The path optimization method [4][5][6] is proposed as a complex dynamical variable approach for the path integral formulation to control the sign problem via machine learning.Although the path optimization method does not need initial teacher data, the effectiveness of the modified path can be automatically evaluated in the learning part.
In the path optimization method, we first complexify the dynamical variable v ∈ R N as where v R , v I ∈ R N .This procedure means modification of the integral path on the complexified dynamical variable plane.There are several ways to express the modified integral path minimizing the sign problem.We use the representation constructed by the neural network; the input is v = v R , and the output is v I .The actual procedure is as follows: The output layer is where L is the total number of layers.The hidden layer is composed of where Weight w and bias b are the parameters of the neural network optimized by the back-propagation method with the appropriate cost function.The activation function is the hyperbolic tangent, f (•) = tanh(•).In this work, we basically employ the following cost function where J (v R ) is Jacobian, S(v ′ ) represents the action composed of v ′ .θ 0 means the phase of the partition function.Since we do not know the exact value of θ 0 , we estimate it iteratively in the learning process.We can employ other cost functions; we introduce a penalty term as shown later.
The actual procedure is as follows: 1. Generate configurations on the original path 2. Update the neural network parameters using the generated configurations 3. Regenerate configurations on the modified path 4. Repeat 2 and 3 to obtain a converged result Since the Boltzmann weight is still complex even on the modified path, phase reweighting is required for the probability: where O represents an observable such as magnetization.
The left-hand side in Eq. ( 18) is the correct expectation value of O, while • • • pq is the phase-quenched expectation value, and Z pq is the partition function with the corresponding Boltzmann weight.The denominator of Eq. ( 18) is the so-called average phase factor (APF).If APF is exactly 1, the sign problem completely disappears.The sign problem becomes serious when the APF approaches 0. Note that the path optimization method and other sign-optimized manifold approaches usually require a Jacobian calculation to modify the integral path, which requires a high numerical cost, O(N 3 ).In this work, we consider the simple model, and thus do not introduce the reduction technique of the Jacobian calculation, but we need it for more complicated models and theories.One of the possible ways is that we completely neglect the Jacobian calculation in the learning part; this is a most drastic reduction technique because the Jacobian is completely neglected except in the evaluation part of the expectation values.In Ref. [15], such a drastic approximation is shown to work at least in the 1 + 1 dimensional U (1) gauge theory.Another treatment of the reduction of the Jacobian calculation, for example, is discussed by using the affine coupling layer in Ref. [32].No Jacobian calculation is required in the configuration generation of the worldvolume Hybrid Monte Carlo method by use of the flow equations [33,34].

D. Parallel tempering
Since the path optimization method makes the phases of the Boltzmann weight in the partition function localize, we may encounter the global sign problem even if it seems to be absent on the original integral path.The global sign problem arises if there are some relevant contributions on the integral path which are separated by the energy barrier in the molecular dynamics.To treat the global sign problem, we consider the parallel tempering method [35][36][37], as adopted to the Lefshetz thimble method [23] and the path optimization method [13].
In this study, we introduce replicas as follows, instead of varying the temperature.We modify the integral path using the path optimization method, and we have Eq.(11).We then make replicas as where r = 1, • • • , N r and N r means the total number of replicas.The region between the original path and the modified path is divided into N r slices as replicas.
The exchange probability between the rth-replica and (r + 1)th-replica is set as where E. Improvements We introduce the following three improvements to the path optimization method.Improvements are in part based on knowledge obtained in the machine learning community.
First, we add the penalty term to the cost function (16), similar to the L 2 normalization, where λ is the strength of the term.The penalty term prohibits too large separations of the integral path from the original path in the training.Second, we mix the previous and regenerated configurations as 50 : 50 in the training part to make the training speed moderate; we only use the regenerated configurations in the evaluation of the expectation value.If the regenerated configurations are significantly changed compared with the previous configurations, it may violate the stability of training.
Finally, we introduce the scheduler, Exponen-tialLR [38], to ensure stable training.The scheduler decreases the learning rate as the training progresses, and thus the change of the parameters in the neural network becomes mild.If the model approaches good minima, a decrease in the learning rate leads to better determination of the parameters.

III. NUMERICAL SETUP
We consider N = 4 spins in the one-dimensional Ising model.Our numerical codes are implemented in the framework of PyTorch [39].For evaluation of the expectation values, we generate N conf = 1000 configurations after thermalization by HMC.The trajectory length is 1 with a step size of 0.2.For the number of replicas, we employ N r = 10.The statistical error is estimated using the Jackknife method with bin size 50.Measurements are performed at each 100 trajectories.We set Re J = 1.0 and h = 0.1 ∈ R. The shift value C in K in (6) is 2 + 10 −5 .
In the training part, we use the batch training [40] with the batch size 32.The number of hidden layers is L = 2 and each layer contains 64 units.We employ AdamW [41] as an optimizer.We set the strength of the penalty term λ = 1.0.The decay rate on the scheduler is γ = 0.9.In the following, we show the results with the three improvements explained in Sec.II E. The results are evaluated after the 30th training.If no significant improvement is achieved in the early stage of training, we reset the initial values of the neural network.
After finishing the training, we regenerate the configurations and estimate APF and the magnetization.

IV. NUMERICAL RESULTS
Figure 1 shows the real and imaginary parts of APF for T = 0.8, 1.0 and 1.2 with Im J = 0.5 for each learning step, where each learning step contains batch training and MC update.The results in the figures are obtained without improvements explained in Sec.II E. In learning steps, the training is almost stable with large APF, but sometimes shows a sudden drop; see Appendix B for the distribution of the phase of the Boltzmann weight in the training.This problem may be solved with a large number of replicas because the bias of sampling in HMC is relaxed.We may also need a deeper neural network or a network based on physical knowledge of the model and/or theory to enhance the expressive power of the neural network.We will keep them in our future work.
Figure 2 shows the real and imaginary parts of APF for T = 0.8, 1.0, and 1.2 with Im J = 0.5 for each learning step.Comparison of Fig. 1 with Fig. 2 suggests that training becomes more stable than that without the im- provements.Enhancement of APF is also observed in Fig. 3, which shows the real part of magnetization and the real and imaginary parts of APF at fixed T = 1.0 with Im J = 0.25 ∼ 1.0 for each learning step.
Figure 4 shows the magnetization on the original and modified paths.Here, we consider Im J = 0.25 ∼ 1.0 with T = 0.8, 1.0 and 1.2.On the original path, the statistical errors are large due to the small APF, at least in the present number of configurations.In some regions, the error becomes very small, but the results do not reproduce the analytic result; this indicates that the HMC on the original path does not sample all relevant configurations.On the modified path constructed by the path optimization method with some improvements, we can see that the statistical errors are well reduced.

V. SUMMARY
In this paper, we have applied the path optimization method [4][5][6] to the (Lenz-)Ising model with a complex coupling constant, which is prepared as a laboratory to investigate the sign problem in spin models with the discretized degrees of freedom.The sum of spins is transformed into an integral using the Hubbard-Stratonovich transformation [21,22], which allows us to modify the in- tegral path on the real dynamical variable plane to that on the complex dynamical variable plane.We found that the path optimization method works in the spin model, at least in the Ising-type model.The average phase factor is enhanced on the modified integral path compared to that on the original integral path with improvements: the parallel tempering, the penalty term in the cost function, the mixed configurations in the training part, and the scheduler.On the original path, the statistical error of the magnetization can be huge, or can be underestimated even with 1000 configurations indicating lack of all relevant contributions in sampling, due to the sign problem.On the modified path by the path optimization, the expectation value of the magnetization reproduces the exact result with a well-reduced statistical error.
It should be noted that the same procedure can work also in the gauge theory case if we can rewrite the sum for spins in the path integral.However, we should be careful with the gauge symmetry because it is hard to enhance the average phase factor without adequate treatment of the gauge symmetry.We may need the suitable gauge fixing [13], the gauge-invariant input [14], or the gauge-covariant network [15,42] for the path optimization method.
Since the one-dimensional Ising model does not have a phase transition, it is interesting to apply the present method to the spin model which shows a phase transition, such as the higher dimensional Ising model and also the Potts model.While we only use machine learning to represent the integral path, we can also use it to accelerate the sampling of configurations near the phase transition point [43].We will report on these issues elsewhere.energy becomes where s is a complex vector with 2N -components consisting of Φ x and Φx , A is the 2N × 2N symmetric matrix, and where N is the number of sites.We impose the periodic boundary condition for this model.
To keep the first term in Eq. (A5) real, we sum up a possible combination of Φ and Φ.The partition function is then given by where the sum takes over all possible states of the Potts spins and β is absorbed into H.Since the expression (A5) is similar to that of the Ising model , we can use the same formulation.Therefore, we can use the hybrid Monte Carlo method for the QCD-like Potts model if the effective Hamiltonian is real.The Hamiltonian becomes complex at finite µ, which causes the sign problem.It should be noted that the expectation value of the energy must be positive.
Since degrees of freedom in the present model can be expressed by continuous dynamical variables, the path optimization method can be applied to the QCD-like Potts model.

FIG. 1 .
FIG.1.The magnetization and APF with Im J = 0.5 at T = 0.8, 1.0 and 1.2.Here, we do not introduce three improvements.The left panel shows the real part of the magnetization and the right panel shows APF.The circle and square symbols in the right panel are the results of the real and imaginary parts of APF, respectively.

FIG. 2 .
FIG. 2. The magnetization and APF with Im J = 0.5 at T = 0.8, 1.0 and 1.2.The three improvements are included in the training.The left panel shows the real part of the magnetization and the right panel shows APF.The circle and square symbols in the right panel are the results of the real and imaginary parts of APF, respectively.

FIG. 3 .
FIG. 3. The magnetization and APF with Im J = 0.25, 0.5, 0.75 and 1.0 at T = 1.0.The three improvements are included in the training.The left panel shows the real part of the magnetization and the right panel shows APF.The dotted line in the left panel denotes the analytic result.The circle and square symbols in the right panel are the results of the real and imaginary parts of APF, respectively.

FIG. 4 .
FIG. 4. The magnetization with Im J = 0.25 ∼ 1.0 at T = 0.8, 1 and 1.2.The three improvements are included in the training.The left (right) panel is the result on the original (modified) path.

FIG. 5 .
FIG. 5.The top (bottom) panel shows the histogram for the phase of the Boltzmann weight (un-reweighted magnetization) with Im J = 0.5 at T = 1.0 for the 0th, 10th and 20th learning steps from the left to the right panel without the improvements.