Self-Learning Monte Carlo Method: Continuous-Time Algorithm

The recently introduced self-learning Monte Carlo (SLMC) method is a general-purpose numerical method that speeds up Monte Carlo simulations by training an effective model to propose uncorrelated configurations in the Markov chain. We implement this method in the framework of the continuous-time Monte Carlo method with auxiliary field for quantum impurity models. We introduce and train a diagram generating function (DGF) to model the probability distribution of auxiliary-field configurations in continuous imaginary time, at all orders of the diagrammatic expansion. By using the DGF to propose global moves in configuration space, we show that the self-learning continuous-time Monte Carlo method can significantly reduce the computational complexity of the simulation.

Quantum Monte Carlo (QMC) is an unbiased numerical method for studying quantum many-body systems. A standard QMC scheme for interacting fermion systems is the determinantal QMC method [1][2][3][4]. This method uses (1) the Hubbard-Stratonovich transformation to decompose the two-body fermion interaction, and (2) the Suzuki-Trotter decomposition of the partition function to discretize the imaginary-time interval into a large number of time slices. Monte Carlo sampling is performed in the space of auxiliary Hubbard-Stratonovich fields. Recently, a continuous-time modification of the fermionic QMC algorithm was developed [5][6][7][8]. In this algorithm, the partition function is expanded in powers of the interaction, and the Monte Carlo simulation is performed by stochastically sampling the diagrammatic expansion of the interaction terms. Both the number and the positions of interaction terms on the imaginary-time interval change constantly during the simulation. For both determinantal and continuous-time QMC methods, computing the weight of each configuration requires integrating out the fermions. This is very time-consuming and in practice limits the size of fermion systems in QMC studies.
Recently, we introduced a new general method, dubbed self-learning Monte Carlo (SLMC), which speeds up the MC simulation by designing and training a model to propose efficient global updates [9][10][11]. The philosophy behind SLMC is "first learn, then earn". In the learning stage, trial simulations are performed to generate a large set of configurations and their weights. These data are then used to train an effective model H_eff, whose Boltzmann weight e^{-βH_eff} fits the probability distribution of the original problem. Next, in the actual simulation, H_eff is used as a guide to propose highly efficient global moves in configuration space. Importantly, the acceptance probability of such a global update is set by the detailed balance condition of the original Hamiltonian. This ensures that the MC simulation is statistically exact.
The SLMC method is ideally suited for QMC simulations of fermion systems. In the determinantal QMC method, the weight of an auxiliary-field configuration φ(x) is computed by integrating out the fermions, which is numerically expensive. In contrast, the effective model H_eff[φ(x)] is an explicit functional of φ(x), and its Boltzmann weight can be computed quickly. Therefore, the SLMC method has a far lower computational cost than the original method, leading to a dramatic speedup, as we demonstrated in previous works [10].
In this work, we extend SLMC to continuous-time quantum Monte Carlo algorithms for fermion systems. Based on theoretical analysis and numerical study, we demonstrate that our continuous-time SLMC reduces the computational complexity of the simulation in the low-temperature or strong-coupling regime, where the autocorrelation time of the standard method becomes large. The key ingredient of our method is an effective model for the diagrammatic interaction expansion in continuous time, which we term the "diagram generating function" (DGF). The form of the DGF is constrained by the symmetry of the Hamiltonian under study. The parameters of the DGF are trained and optimized in the learning stage of SLMC.
As an example, we implement SLMC to simulate the single-impurity Anderson model [12], using the continuous-time auxiliary-field (CT-AUX) method [7,8,13,14]. The DGF for this model is found to take a remarkably simple form, and it reproduces with very high accuracy the exact distribution of auxiliary fields in continuous time, to all orders of the diagrammatic expansion. We demonstrate the speedup of SLMC in comparison to the standard CT-AUX, and find that the acceleration ratio increases with the average expansion order.
The paper is organized as follows: we first briefly review the CT-AUX algorithm for the Anderson impurity model, after which we give a detailed introduction to the self-learning CT-AUX algorithm and discuss the physical ideas behind the DGF. Then we show the performance of the new algorithm on the Anderson model. Finally, we analyze the complexity of the algorithm. The technical details are given in the Supplemental Material [15].
While this work was being performed, a related work [16] also extending SLMC [9,10,17] to the continuous-time domain appeared. Unlike ours, that work uses an interaction expansion without auxiliary field, and it does not analyze the computational complexity of continuous-time SLMC to demonstrate its speedup.
CT-AUX Method The Hamiltonian of the single-impurity Anderson model is written as the combination of a free-fermion part and an interaction part [8],

H = H_0 + H_1,
H_0 = -μ Σ_σ n_σ + Σ_{p,σ} ε_p a†_{p,σ} a_{p,σ} + V Σ_{p,σ} (c†_σ a_{p,σ} + h.c.) + K/β,
H_1 = U [n_↑ n_↓ - (n_↑ + n_↓)/2] - K/β,

where σ = ↑, ↓; c†_σ and a†_{p,σ} are the fermion creation operators for an impurity electron with spin σ and for a bath electron with spin σ and momentum p, respectively; n_σ = c†_σ c_σ is the fermion number operator; and β = 1/T is the inverse temperature. K is an arbitrarily chosen parameter that controls the coupling strength of the auxiliary field and the average expansion order, as we will see below.
In the CT-AUX method, the density-density interaction in H_1 is decoupled by an auxiliary Ising field s as

H_1 = -(K/2β) Σ_{s=±1} e^{γ s (n_↑ - n_↓)},   (5)

where γ is the coupling strength between the fermion density and the auxiliary field, determined by cosh(γ) ≡ 1 + (βU)/(2K). The partition function is expanded as

Z/Z_0 = Σ_{n=0}^∞ Σ_{s_1,…,s_n} ∫_0^β dτ_1 ⋯ ∫_{τ_{n-1}}^β dτ_n (K/2β)^n Π_σ det [N_σ({s_i, τ_i})]^{-1}.   (6)

Here [N_σ({s_i, τ_i})]^{-1} = e^{Γ_σ} - G^{{τ_i}}_{0σ} (e^{Γ_σ} - 1), with Γ_σ = diag(γσs_1, …, γσs_n), (G^{{τ_i}}_{0σ})_{ij} = g_σ(τ_i - τ_j) for i ≠ j, and (G^{{τ_i}}_{0σ})_{ii} = g_σ(0^+). g_σ(τ) > 0 is the free-fermion Green's function at the impurity site. The configuration space for the MC sampling is hence the collection of all possible auxiliary-spin configurations on the imaginary-time interval, at all possible expansion orders n = 0, 1, …
The corresponding weight w_c is given by Eq. (6). Then a random walk c_1 → c_2 → c_3 → ⋯ in configuration space is implemented, usually by inserting/removing random spins at random imaginary times.
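The insertion/removal random walk can be sketched as follows. This is a minimal illustration rather than the full algorithm: the fermion-determinant ratio, the expensive part of a real CT-AUX step, is abstracted into a caller-supplied `log_det_ratio` function (a name of our own choosing), and only the bookkeeping and the standard acceptance ratios K/(n+1) and n/K are shown.

```python
import math
import random

def ctaux_local_update(config, K, log_det_ratio, rng, beta=1.0):
    """One local CT-AUX-style move: insert or remove an auxiliary spin.

    `config` is a list of (tau, s) vertices. `log_det_ratio(rest, v)` stands in
    for the log of the fermion-determinant ratio for adding vertex `v` to the
    configuration `rest`; in the real algorithm this is the costly part.
    """
    n = len(config)
    if rng.random() < 0.5:
        # insertion: uniform imaginary time in [0, beta), random Ising spin
        v = (rng.uniform(0.0, beta), rng.choice((-1, 1)))
        # Metropolis acceptance: min{1, K/(n+1) * determinant ratio}
        if rng.random() < min(1.0, K / (n + 1) * math.exp(log_det_ratio(config, v))):
            config.append(v)
    elif n > 0:
        # removal: pick one of the n existing vertices at random
        i = rng.randrange(n)
        rest = config[:i] + config[i + 1:]
        # acceptance: min{1, (n/K) / determinant ratio}
        if rng.random() < min(1.0, n / K * math.exp(-log_det_ratio(rest, config[i]))):
            config.pop(i)
```

In the free limit (determinant ratio equal to one) the stationary distribution of the expansion order is Poisson with mean K, which gives a quick sanity check of the acceptance ratios.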
Self-learning CT-AUX In this section, we describe the self-learning continuous-time auxiliary-field method. Like other SLMC methods, it consists of two parts: (1) learn an effective model, or DGF, that approximates the probability distribution of auxiliary spins on the imaginary-time interval, {{τ_1, s_1}, …, {τ_n, s_n}}, and (2) propose a global move by executing a sequence of local updates in the effective model [10].
Since the number of auxiliary spins changes constantly with the expansion order n in the sampling process, one may expect that reproducing the entire probability distribution at all orders requires a highly sophisticated model with a huge number of parameters. On the contrary, we introduce a DGF of a remarkably simple form which fits the probability distribution very accurately:

w^eff_n({s_i, τ_i}) = (K/2β)^n exp[-f(n) - (1/n) H^eff_n],   (7)

H^eff_n = Σ_{i<j} [J(τ_i - τ_j) s_i s_j + L(τ_i - τ_j)].   (8)

Several features of H^eff_n deserve attention: (i) The DGF serves as an approximation to Z_n in the weak-coupling expansion, as indicated in Eq. (7); its functional form could be obtained exactly if one could integrate out the fermion degrees of freedom exactly. This is indeed what is done in the original CT-AUX algorithm. Here in SLMC, the DGF is instead constructed by series expansion and symmetry analysis. To be specific, Eq. (8) contains the spin-spin interactions satisfying the spin-flip symmetry s_i → -s_i, up to two-body terms. Since the performance of the DGF is already good enough at this stage, we did not include fourth-order interactions proportional to s_i s_j s_k s_l.
(ii) The interaction terms J(τ) and L(τ) are in principle allowed to be different functions of τ at different expansion orders n, which would result in vastly more parameters. Here this predicament is avoided by using the same functions at all expansion orders.
(iii) The expansion-order-dependent factor 1/n in Eq. (7) is crucial. It can be justified by considering the atomic limit V = 0, where the interaction term H_1(τ) ≡ H_1 in Eq. (5) becomes independent of τ_i, and hence Z_n ∝ Tr(H_1^n). For large n, Tr(H_1^n) ≃ ε_0^n is dominated by the contribution from the largest eigenvalue ε_0, hence ln(Z_n/Z_0) increases linearly with n. On the other hand, H^eff_n in Eq. (8) includes a summation of ∼n² pairwise interactions at pairs of imaginary-time instances (τ_i, τ_j). Therefore we must include the factor 1/n to match the two results.
As we will show later, this simple DGF performs remarkably well.
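For concreteness, the DGF log-weight of Eqs. (7) and (8) can be evaluated directly. The sketch below assumes the two-body form given above, with caller-supplied fitted functions J, L and an order-dependent offset f (the function names are ours):

```python
import numpy as np

def dgf_log_weight(taus, spins, J, L, f, K, beta):
    """log w_eff = n ln(K/2beta) - f(n) - (1/n) sum_{i<j} [J(t_i - t_j) s_i s_j + L(t_i - t_j)].

    J, L, f are the fitted model functions (assumed, per the form in the text);
    taus and spins are 1-D numpy arrays of equal length n.
    """
    n = len(taus)
    log_w = n * np.log(K / (2.0 * beta))
    if n == 0:
        return log_w
    pair_sum = 0.0
    if n >= 2:
        iu = np.triu_indices(n, k=1)                  # all pairs i < j
        dt = (taus[:, None] - taus[None, :])[iu]
        ss = (spins[:, None] * spins[None, :])[iu]
        pair_sum = float((J(dt) * ss + L(dt)).sum())
    return log_w - f(n) - pair_sum / n
```

Because no determinant appears, this evaluation costs O(n²) at worst, and O(n) per local update when the pairwise sum is cached (see the Supplemental Material).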
The training procedure goes as follows. Given a set of configurations {c_i} and their weights taken from the Markov chain of the original CT-AUX simulation, we parametrize the functions in the DGF by Chebyshev polynomials,

J(τ) = Σ_{k=0}^{m_{c,J}} a_k T_k(x(τ)),  L(τ) = Σ_{k=0}^{m_{c,L}} b_k T_k(x(τ)),  x(τ) = 2|τ|/β - 1,

and similarly for f with truncation order m_{c,f}. Here m_{c,J}, m_{c,L}, and m_{c,f} are the truncation orders of the respective functions. The rationale behind the choice of basis functions is that the Chebyshev polynomial is close to the minimax polynomial minimizing the maximum error of the approximation. In other words, the Chebyshev polynomials approximate the original function uniformly [23]. In the simulation, we always increase the truncation order until the results converge. The total number of training parameters is thus m_c ≡ m_{c,J} + m_{c,L} + m_{c,f} + 3 (the summations start from 0) [24]. Since the DGF H^eff_n is a linear function of these parameters, they can be trained simply with a linear regression [25]. We have also exploited an iterative training procedure to improve the efficiency [9], whereby Monte Carlo configurations and weights generated by the self-learning algorithm are used as training data to further train the DGF. This procedure can be iterated until the DGF reproduces the exact probability distribution sufficiently well. We note that training the effective model can be regarded as supervised learning in the broader context of machine learning, which recently has found many fruitful applications in physics [26][27][28][29][30][31][32][33][34][35][36][37][38].
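The linear-regression step can be sketched as follows. This is a toy reconstruction under stated assumptions: the regression features are the quantities multiplying each Chebyshev coefficient of J and L in the pairwise sum (the offset f is omitted for brevity), and the "exact" log-weights in the test are synthesized from known coefficients so that the fit can be verified.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def dgf_features(taus, spins, beta, m_J, m_L):
    """Regression features of one configuration: the factor multiplying each
    Chebyshev coefficient of J (spin-spin channel) and L (spin-blind channel)
    in (1/n) sum_{i<j} [J(t_ij) s_i s_j + L(t_ij)]."""
    n = len(taus)
    feats = np.zeros(m_J + m_L + 2)
    if n >= 2:
        iu = np.triu_indices(n, k=1)
        x = 2.0 * np.abs(taus[:, None] - taus[None, :])[iu] / beta - 1.0  # map to [-1, 1]
        ss = (spins[:, None] * spins[None, :])[iu]
        for k in range(m_J + 1):
            feats[k] = (chebval(x, np.eye(m_J + 1)[k]) * ss).sum() / n
        for k in range(m_L + 1):
            feats[m_J + 1 + k] = chebval(x, np.eye(m_L + 1)[k]).sum() / n
    return feats

def train_dgf(configs, h_eff_targets, beta, m_J, m_L):
    """Least-squares fit of the Chebyshev coefficients (plain linear regression).

    h_eff_targets are the per-configuration values n*ln(K/2beta) - ln w_c."""
    X = np.array([dgf_features(t, s, beta, m_J, m_L) for t, s in configs])
    coef, *_ = np.linalg.lstsq(X, np.asarray(h_eff_targets), rcond=None)
    return coef
```

Because the DGF is linear in the coefficients, a single least-squares solve replaces any iterative optimization; the iterative refinement described in the text only regenerates the training set, not the fitting procedure.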
After completing the training process, we use the trained DGF to propose highly efficient global moves on the Markov chain in actual simulations. Here we adopt the general procedure of cumulative update introduced in Ref. [10]. Fig. 1 illustrates how self-learning CT-AUX proposes global moves, in comparison with the original CT-AUX method. Starting from a configuration c_i, we perform a sufficiently large number (denoted by M_eff) of local updates by inserting/removing random spins at random imaginary times based on the weights of the DGF, until reaching a configuration c_j that is sufficiently uncorrelated with c_i. The global move c_i → c_j is then proposed, and its acceptance rate p is calculated from the exact weight of the original model, p = min{1, (w_{c_j} w^eff_{c_i})/(w_{c_i} w^eff_{c_j})}, where w_{c_i} and w^eff_{c_i} are the weights of configuration c_i computed from the original model Eq. (5) and the effective model Eq. (7), respectively. As shown previously [10], this cumulative update procedure fulfills the ergodicity condition and obeys the detailed balance principle. Since computing the weight of the DGF is much faster than computing the fermion determinant in the original method, our method significantly reduces the computational cost of the simulation. A detailed discussion of the choice of the cumulative update length M_eff and of the computational complexity of the self-learning CT-AUX method is presented in the last section of this work.
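The cumulative update can be sketched generically. In this minimal version, `log_w`, `log_w_eff`, and the DGF-guided local move are caller-supplied stand-ins (the names are ours), and the only requirement on the local move is that it obey detailed balance with respect to the effective weight:

```python
import math
import random

def cumulative_update(config, log_w, log_w_eff, local_update_eff, M_eff, rng):
    """One self-learning global move (sketch).

    log_w: exact log-weight (the expensive determinant in the real CT-AUX);
    log_w_eff: DGF log-weight; local_update_eff(c, rng): one in-place local
    move obeying detailed balance with respect to log_w_eff.
    """
    proposal = list(config)                 # start from the current configuration
    for _ in range(M_eff):                  # cheap random walk guided by the DGF
        local_update_eff(proposal, rng)
    # accept with p = min{1, w(c_j) w_eff(c_i) / (w(c_i) w_eff(c_j))}
    log_p = (log_w(proposal) - log_w(config)) - (log_w_eff(proposal) - log_w_eff(config))
    if rng.random() < math.exp(min(0.0, log_p)):
        return proposal, True
    return config, False
```

The exact weight is evaluated only once per global move, so its cost is amortized over M_eff cheap local moves; a reversible proposal chain guarantees the acceptance rule above regardless of the value of M_eff.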
Performance on Anderson Model Now we are ready to show the performance of self-learning CT-AUX on the single-impurity Anderson model. We consider a bath with a semicircular density of states, ρ_0(ε) = (2/(πD))√(1 - (ε/D)²), and set the half bandwidth D = 1 as the energy unit. The chemical potential is set to μ = U/2 to maintain half filling.
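As a quick check of conventions, the semicircular density of states used here integrates to one over the band, which fixes the prefactor 2/(πD):

```python
import numpy as np

def rho0(eps, D=1.0):
    """Semicircular bath density of states, rho0(eps) = (2/(pi D)) sqrt(1 - (eps/D)^2)."""
    return 2.0 / (np.pi * D) * np.sqrt(np.clip(1.0 - (eps / D) ** 2, 0.0, None))

# trapezoidal normalization check over the band [-D, D]
eps = np.linspace(-1.0, 1.0, 200001)
y = rho0(eps)
norm = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(eps)))
```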
In the simulation, we use 5 × 10^4 configurations generated by the original CT-AUX to train the DGF. To quantitatively measure the goodness of fit, we evaluate the quantity R² ∈ [0, 1], which was introduced as the "score" of the self-learning Monte Carlo method in general [10]. Here, we find that the DGF for the Anderson impurity model (with U = 5, β = 10, V = 1, and K = 1) has a score of R² = 99.9%. Thanks to the success of our DGF, a global move proposed by cumulative update between two uncorrelated configurations has a very high average acceptance rate, around 0.68.
To demonstrate the speedup of the self-learning CT-AUX method, we compute the autocorrelation function of the auxiliary-spin polarization, defined by m ≡ (1/n) Σ_{i=1}^n s_i. Fig. 4 shows the autocorrelation time of the original CT-AUX method, defined in terms of the number of local updates. It is clear that the autocorrelation time increases rapidly with β and U, rendering the algorithm inefficient at low temperature and in the strong-coupling regime. In contrast, the performance of the self-learning CT-AUX method is shown in Fig. 5. The autocorrelation function decays rapidly with the number of global moves proposed by the DGF. This is because (1) a single global move is the cumulative outcome of M_eff local updates, where M_eff is taken to be so large that the proposed configuration is sufficiently uncorrelated from the current one, and (2) the average acceptance rate for such global moves is high enough, greater than 0.6 for all the data points in Fig. 5. The inset shows the U dependence of the autocorrelation time t_0, which is estimated from the initial slope of the autocorrelation function ⟨m(t)m(t + Δt)⟩ ∼ e^{-Δt/t_0}. It is worth noting that with increasing M_eff, the autocorrelation time of our self-learning algorithm saturates to a small value even for very large U.
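The autocorrelation analysis described here can be reproduced with a short estimator. The AR(1) test series below is our own synthetic stand-in for the measured m(t); its exact autocorrelation time is known, so the initial-slope estimate can be checked against it.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Normalized autocorrelation function <m(t) m(t + dt)> of a time series."""
    m = np.asarray(series, dtype=float)
    m = m - m.mean()
    var = float((m * m).mean())
    return np.array([float((m[: len(m) - k] * m[k:]).mean()) / var
                     for k in range(max_lag)])

def initial_slope_time(acf):
    """t0 from the initial exponential decay acf(dt) ~ exp(-dt/t0), using lag 1."""
    return -1.0 / np.log(acf[1])
```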
Computational Complexity Finally, we discuss the actual calculation cost of the self-learning CT-AUX method. Fig. 1 shows schematically the Markov chains used to obtain two uncorrelated configurations. Roughly speaking, self-learning CT-AUX is faster than the original CT-AUX because the computational cost of each local move in the Markov chain is smaller than that in CT-AUX. A detailed analysis is given as follows. In order to compare the two methods on an equal footing, we consider the cost to obtain an uncorrelated configuration from a given one. In this way, the two methods give the same error bar for the measured observables. The cost of inserting/removing a vertex with the use of fast updates is O(⟨n⟩²) in the original CT-AUX simulation [8]. ⟨n⟩ is the average expansion order, which determines the size of the matrix N_σ({s_i, τ_i}). To obtain an uncorrelated configuration, τ_ori such local updates are needed. (This is in fact the definition of the autocorrelation time in the original method.) Thus, the total cost is O(⟨n⟩² τ_ori). On the other hand, the cost of inserting/removing a vertex in the effective model is O(m_c ⟨n⟩). Recall that m_c is the number of training parameters in the DGF. The scaling with ⟨n⟩ is different from that in the original CT-AUX because the weight of the DGF is computed directly, without calculating the fermion determinant. The number of local updates using the DGF, M_eff, should be of order τ_ori in order to obtain an uncorrelated configuration. In addition, one more calculation of the determinant is needed to decide the weight of the proposed global move, whose computational cost is O(⟨n⟩³). Since the global move is not always accepted, there is an additional factor τ_SL, the autocorrelation time measured in Fig. 5. Thus the total calculation cost of the self-learning algorithm is O((⟨n⟩³ + m_c ⟨n⟩ τ_ori) τ_SL). Since ⟨n⟩ ∼ βU [8] and the autocorrelation time τ_ori is approximately proportional to U⁴β³, as shown in Fig. 4, the second term in the bracket dominates. This is indeed the case shown in the inset of Fig. 5. In fact, in our computation ⟨n⟩ is less than 30, while τ_ori can be as large as order 10⁶. In this way, the actual speedup ratio t_s is expressed by

t_s = O(⟨n⟩² τ_ori) / O((⟨n⟩³ + m_c ⟨n⟩ τ_ori) τ_SL) ≃ ⟨n⟩ / (m_c τ_SL).

As long as the DGF is good enough, τ_SL is O(1). Since m_c hardly scales with U and β, the self-learning CT-AUX method is generally faster than the original CT-AUX, especially in the low-temperature and strong-coupling regime where ⟨n⟩ ∼ βU is large.

Conclusion We developed the continuous-time version of SLMC with auxiliary field, which trains an effective model (DGF) to propose new uncorrelated configurations in the Markov chain with a high acceptance rate. The DGF for the Anderson impurity model is found to take a remarkably simple form, and it reproduces very well the exact distribution of auxiliary fields in continuous time to all orders of the diagrammatic expansion. Our method reduces the computational complexity of the simulation in the low-temperature or strong-coupling regime, where the autocorrelation time of the standard method becomes large.
Our self-learning CT-AUX method has many potential applications. It can be used as an impurity solver for dynamical mean-field theory, and it is ideal for studying systems near a critical point [39][40][41], where standard methods suffer from severe critical slowing down. Our method can also be generalized straightforwardly to fermion lattice models.

SELF-LEARNING UPDATES
The probability of moving from a configuration c_i to a configuration c_j can be split into the probability of proposing the move and the probability of accepting it, p(c_i → c_j) = p_prop(c_i → c_j) p_acc(c_i → c_j). Then the detailed balance principle implies

p_acc(c_i → c_j)/p_acc(c_j → c_i) = [w_{c_j} p_prop(c_j → c_i)] / [w_{c_i} p_prop(c_i → c_j)].   (S1)

In the self-learning CT-AUX, the new configuration c_j is proposed based on the effective weight w^eff_{c_j}, so the proposal probabilities satisfy p_prop(c_j → c_i)/p_prop(c_i → c_j) = w^eff_{c_i}/w^eff_{c_j}. Combined with Eq. (S1), we obtain the desired acceptance rate. This result can be understood intuitively in the limit where the effective weight w^eff_c equals the original weight w_c for all configurations. Then it is as if we were performing the MC update on exactly the original model, so the "global update" from configuration c_i to configuration c_j should always be accepted, i.e., p_acc(c_i → c_j)/p_acc(c_j → c_i) = w^eff_{c_i} w_{c_j}/(w^eff_{c_j} w_{c_i}) = 1.
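Eq. (S1) and the resulting acceptance rate can be verified numerically on a toy discrete model. The three states and their weights below are invented purely for this check, and the proposal draws directly from the equilibrated effective distribution, mimicking a long DGF-guided walk:

```python
import numpy as np

# Three toy "configurations" with exact weights w and effective weights w_eff
# (both invented for this check).
w = np.array([1.0, 2.0, 4.0])
w_eff = np.array([1.2, 1.8, 4.5])
pi = w / w.sum()                 # target distribution of the original model
pi_eff = w_eff / w_eff.sum()     # distribution sampled by the effective model

k = len(w)
P = np.zeros((k, k))             # transition matrix of the global update
for i in range(k):
    for j in range(k):
        if i != j:
            # propose j from the equilibrated effective model, then accept
            # with p = min{1, w_j w_eff_i / (w_i w_eff_j)} as in the main text
            P[i, j] = pi_eff[j] * min(1.0, w[j] * w_eff[i] / (w[i] * w_eff[j]))
    P[i, i] = 1.0 - P[i].sum()   # rejection keeps the chain in place

flow = pi[:, None] * P           # probability flux pi_i P_ij
```

The chain satisfies detailed balance with respect to the exact distribution π even though the proposals come from the (imperfect) effective model, which is the content of Eq. (S1).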

COMPARISON BETWEEN THE ORIGINAL AND EFFECTIVE WEIGHTS
To show the efficiency of the trained DGF, we plot the original and effective weights w_{c_i} and w^eff_{c_i}. We set m_c = 12, V = 1, β = 10, and K = 1. The configurations are generated by the Markov process in the original CT-AUX simulation. The expansion order n changes during the simulation. As shown in Fig. S1, one can clearly see that the weights of the two methods are quite similar.

LOCAL UPDATES IN SL-CTAUX
We show that the calculation cost of the local updates is O(⟨n⟩). Consider a configuration with expansion order n and the insertion of a vertex with auxiliary spin s at imaginary time τ. The weight w^eff_{n+1} is expressed as w^eff_{n+1} = (K/2β)^{n+1} exp{-f(n+1) - [1/(n+1)] (H^eff_n + ΔH)}, where ΔH = Σ_{i=1}^n [J(τ - τ_i) s s_i + L(τ - τ_i)] contains the n new pairwise terms. If the pairwise sum H^eff_n is stored and updated along the Markov chain, evaluating the ratio w^eff_{n+1}/w^eff_n therefore requires only O(⟨n⟩) operations.
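The O(⟨n⟩) bookkeeping can be sketched by caching the pairwise sum. In this sketch, J and L are caller-supplied fitted functions assumed even in τ, and the order-dependent offset f is dropped for brevity:

```python
import numpy as np

class DGFWeight:
    """DGF log-weight with an incrementally maintained pairwise sum (sketch)."""

    def __init__(self, J, L, K, beta):
        self.J, self.L, self.K, self.beta = J, L, K, beta
        self.taus, self.spins = [], []
        self.pair_sum = 0.0   # sum_{i<j} [J(t_i - t_j) s_i s_j + L(t_i - t_j)]

    def insert(self, tau, s):
        # O(n): only the n couplings to the new vertex are evaluated; the
        # 1/n -> 1/(n+1) rescaling reuses the cached sum in log_weight().
        for t, sp in zip(self.taus, self.spins):
            self.pair_sum += self.J(tau - t) * s * sp + self.L(tau - t)
        self.taus.append(tau)
        self.spins.append(s)

    def log_weight(self):
        n = len(self.taus)
        h_eff = self.pair_sum / n if n else 0.0
        return n * np.log(self.K / (2.0 * self.beta)) - h_eff
```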

FIG. 1. (Color online) Schematic figure of the Markov chains in the original and self-learning continuous-time Monte Carlo methods to obtain an uncorrelated configuration. ⟨n⟩ denotes the average expansion order, which determines the size of the matrix N_σ({s_i, τ_i}) and hence the complexity of the simulation. See the last section of the paper for a detailed discussion.

FIG. 3. (Color online) For 5 × 10^4 independent configurations on the Markov chain of the original CT-AUX, the histogram shows the distribution of the difference ln W_i - ln W^eff_i. The upper-left and upper-right insets show the distributions of ln W_i and ln W^eff_i, respectively. Here U = 5, V = 1, β = 10, and K = 1.

FIG. 4. (Color online) (Left panel) U dependence of the autocorrelation time of the original CT-AUX with β = 10. (Right panel) β dependence of the autocorrelation time with U = 2. The other parameters are V = 1 and K = 1. In both figures, the unit of time is one local update (inserting/removing an auxiliary spin) in the original CT-AUX method.
FIG. 5. (Color online) Autocorrelation function of the auxiliary-spin magnetization for a system with β = 10, V = 1, and K = 1. The unit of time is defined in the main text. (Inset) U dependence of the autocorrelation time in the self-learning CT-AUX. We set the number of local updates on the DGF to M_eff = 2 × 10^3 (U = 1, 2, 3, 4) and M_eff = 5 × 10^4 (U = 5, 6).
FIG. S1. (Color online) Comparison between the original and effective weights with m_c = 12. We set V = 1, β = 10, and K = 1. The configurations are generated by the Markov process in the original CT-AUX simulation. The expansion order n changes during the simulation.