On the Infinite Variance Problem in Fermion Models

Monte Carlo calculations of fermionic systems with continuous auxiliary fields frequently suffer from a diverging variance for fermionic observables. If the simulations have an infinite variance problem, one cannot estimate observables reliably even with an arbitrarily large number of samples. In this paper, we explore a method to solve this problem using sampling based on the distribution of a system with an extra time-slice. The necessary reweighting factor is computed both perturbatively and through a secondary Monte Carlo. We show that the Monte Carlo reweighting coupled to the use of an unbiased estimator of the reweighting factor leads to a method that eliminates the infinite variance problem at a very small extra cost. We compute the double occupancy in the Hubbard model at half-filling to demonstrate the method and compare the results to well-established results obtained by other methods.

I. INTRODUCTION
Lattice fermion models are ubiquitous in many areas of physics, from condensed matter to lattice QCD. Frequently, no analytic methods are available to study them, so stochastic methods are used instead. But fermions pose a number of special challenges to Monte Carlo calculations, among them the infinite variance problem we address in this paper. The Hubbard model is an example of a lattice fermion model and has attracted much attention, both because it exhibits a rich phase structure as the temperature and doping are varied (including antiferromagnetic, superconducting, and non-Fermi liquid regions) and because it is believed to be intimately connected to the physics of cuprate high-T_c superconductors.
A common strategy for simulating the Hubbard model is to introduce an auxiliary bosonic variable via the Hubbard-Stratonovich transformation, which allows for a Monte Carlo simulation over those variables (instead of the original fermionic ones) to be carried out. Although discrete auxiliary variables are most commonly used, there are at least two reasons for using continuous variables instead. First, away from half-filling, the action of the auxiliary field becomes complex and the model develops a sign problem, making a Monte Carlo simulation nearly impossible. The methods used to bypass the sign problem, such as the complex Langevin method, are based on an analytic continuation of the auxiliary variables [1][2][3][4][5] (see [6] for a review) and therefore require the use of continuous variables. Second, the very successful hybrid Monte Carlo algorithm, which speeds up Monte Carlo calculations in other models, also requires continuous auxiliary fields [7][8][9]. However, the desirability of continuous auxiliary variables is tempered by the fact that their use typically leads to large (or even infinite) variance for interesting observables, again making Monte Carlo calculations extremely difficult. A manifestation of this problem also occurs in lattice QCD for discretizations that encounter "exceptional" gauge configurations [10]. Note that while for QCD this problem was detected in the context of quenched simulations and attributed to neglecting the determinant in the sampling weight, similar divergences crop up in dynamical simulations [11] and lead to instabilities in the HMC algorithm. The solution for these instabilities [12][13][14] is similar to the solution we explore here: use sampling with a positive weight that has no zeros in the field space, together with a reweighting. A number of methods have been proposed to deal with the infinite variance problem [15][16][17][18].
The origin of the problem is that some auxiliary field configurations with small statistical weights contribute significantly to observables. A way of dealing with this issue, proposed in [15], is to sample the configurations according to a modified distribution so that small statistical weights and large contributions to the observables do not occur for the same configurations. The proposal was made in the context of auxiliary-field Quantum Monte Carlo with noncompact auxiliary fields at zero temperature, and consists in a reweighting with respect to a probability density involving one more time-slice than the system being considered.
In this work, we adapt the extra time-slice method to be suitable for standard Monte Carlo simulations of Euclidean time path integrals over compact auxiliary fields, and show how it is implemented for two discretizations of the action of the system. The implementation proposed in [15], however, requires that one compute fermion contractions of exp(−ϵH) to obtain the reweighting factor, where H is the Hamiltonian operator of the system under investigation. Two options present themselves: compute several terms in an expansion in ϵ, or estimate it nonperturbatively using Monte Carlo by writing it as an integral with respect to a known probability distribution. Since the first option introduces new O(ϵ^k) errors, and quickly becomes difficult to calculate at orders k > 2, the second option is preferable. The latter option amounts to a stochastic estimate of 1/F, where F is a quantity estimated by Monte Carlo, but the naive estimator 1/F̄ of this quantity is known to be biased [19]. We therefore implement an unbiased estimator of the reweighting factor and show the improvement obtained by this method.
In what follows, we first define our conventions for the testbed model used in our investigation, namely, the Hubbard model at half-filling. We then review the origins of the infinite variance problem and display its symptoms in a Monte Carlo simulation. The extra time-slice method is then defined, together with the two approaches to computing the requisite reweighting factor. Lastly, we describe the unbiased estimation of the reweighting factor.

II. MODEL
We denote an electron creation operator at site x by ψ†_{a,x}, where a = ↑, ↓ is the spin. The grand canonical Hamiltonian of the Hubbard model on a square lattice is then given by

H = −κ Σ_{⟨x,y⟩,a} ψ†_{a,x} ψ_{a,y} + U Σ_x (n_{↑,x} − 1/2)(n_{↓,x} − 1/2) − µ Σ_x (n_{↑,x} + n_{↓,x} − 1),  (1)

where n_{a,x} = ψ†_{a,x} ψ_{a,x}. Here, we have used parameters which yield half-filling at µ = 0. The summation in the first term is over sites ⟨x, y⟩ that are nearest neighbors. On a lattice with an even number of sites, we define ψ_{1,x} ≡ ψ_{↑,x}, ψ_{2,x} ≡ (−1)^x ψ†_{↓,x}, so that, in terms of the new variables, the Hamiltonian becomes

H = −κ Σ_{⟨x,y⟩,a} ψ†_{a,x} ψ_{a,y} − U Σ_x (n_{1,x} − 1/2)(n_{2,x} − 1/2) − µ Σ_x (n_{1,x} − n_{2,x}),  (2)

up to a constant shift. Using a Hubbard-Stratonovich transformation, the expectation value of an observable O in a thermal ensemble at temperature T has a representation as an integral over an auxiliary bosonic field defined on the same square lattice:

⟨O⟩ = (1/Z) ∫ [dϕ] e^{−S_0(ϕ)} det M(ϕ) O(ϕ),  Z = ∫ [dϕ] e^{−S_0(ϕ)} det M(ϕ),  (3)

where S_0(ϕ) is the auxiliary field action. As described in the appendix, one can derive this using a Trotterization of the partition function. The determinant det M(ϕ) = det M_1(ϕ) det M_2(ϕ) is a product of the fermion matrices for each spin, which are given by

M_a(ϕ) = 1 + B_a(ϕ_N) · · · B_a(ϕ_1),

where ϕ_t = {ϕ_{t,x}} denotes the field on time-slice t, and the spatial matrices B_a(ϕ_t) are built from e^{H_2}, with (H_2)_{x,y} = κϵ δ_{⟨x,y⟩} + ε_a µϵ δ_{x,y}. Here, δ_{⟨x,y⟩} is the nearest-neighbor hopping matrix, and ε_1 = +1, ε_2 = −1. The parameter β is related to ϵU through the Hubbard-Stratonovich transformation. This path integral representation, derived in the appendix, is accurate up to terms of order ϵ². Another representation, accurate only up to order ϵ, is also discussed there.
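To make this structure concrete, the following toy sketch builds the single-slice matrices for a two-site lattice. The explicit form B_a(ϕ_t) = e^{H_2} diag(e^{i ε_a ϕ_{t,x}}) is an assumption made for illustration only (the paper's precise discretization may differ), but it exhibits the two features used below: det M_a(ϕ) = det(1 + B_a(ϕ_N) · · · B_a(ϕ_1)), and at half-filling (µ = 0) the two spin determinants are complex conjugates, so the sampling weight det M = det M_1 det M_2 is non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters; mu = 0 corresponds to half-filling.
kappa, mu, eps, N, V = 1.0, 0.0, 0.1, 4, 2
hop = np.array([[0.0, 1.0], [1.0, 0.0]])   # nearest-neighbor matrix delta_<x,y> for 2 sites

def B(phi_t, eps_a):
    """Assumed single-slice matrix: exp(H_2) * diag(exp(i * eps_a * phi))."""
    H2 = kappa * eps * hop + eps_a * mu * eps * np.eye(V)
    w, U = np.linalg.eigh(H2)              # H2 is symmetric: exponentiate via eigendecomposition
    expH2 = (U * np.exp(w)) @ U.T
    return expH2 @ np.diag(np.exp(1j * eps_a * phi_t))

phi = rng.uniform(-np.pi, np.pi, size=(N, V))

def detM(eps_a):
    P = np.eye(V, dtype=complex)
    for t in range(N):
        P = B(phi[t], eps_a) @ P           # builds B(phi_N) ... B(phi_1)
    return np.linalg.det(np.eye(V) + P)

d1, d2 = detM(+1), detM(-1)
# At mu = 0 the spin-2 matrices are the complex conjugates of the spin-1 ones,
# so det M = d1 * d2 = |d1|^2 >= 0: the sampling weight is non-negative.
print(d1 * d2)
```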

III. INFINITE VARIANCE PROBLEM
Consider the expectation value in Eq. (3). For a fermionic observable, O(ϕ) det M(ϕ) arises from the integration of a polynomial in Grassmann variables, resulting in a polynomial in the matrix elements of M(ϕ), and is therefore non-singular for a finite system. This remains true even for so-called "exceptional configurations" with det M(ϕ) = 0. It is also typically the case that O(ϕ) det M(ϕ) does not vanish on these configurations. For nearly-exceptional configurations with small but nonvanishing determinant, M^{−1} exists, so O(ϕ) ∼ 1/det M(ϕ) can be large. While the contribution of such configurations to the average value of the observable is finite, their contribution to the variance σ²_O = ⟨O²⟩ − ⟨O⟩² is unbounded. In fact, the integrand in the numerator of

⟨O²⟩ = (1/Z) ∫ [dϕ] e^{−S_0(ϕ)} det M(ϕ) O²(ϕ)

is proportional to the divergent quantity

[O(ϕ) det M(ϕ)]² e^{−S_0(ϕ)} / det M(ϕ) ∼ 1/det M(ϕ).

And because the polynomial O det M typically does not vanish at exceptional configurations, their presence will cause the variance to diverge.
In the Monte Carlo simulation, one computes a stochastic estimator Ō of ⟨O⟩, together with an error estimate given by the standard deviation of the mean,

err_O = σ_O / √n,

where n is the number of statistically independent ϕ configurations generated in the Markov chain. When the variance σ²_O is finite, the error estimate err_O decreases as 1/√n. If the field is importance-sampled according to the distribution p(ϕ) ∼ exp[−S_0(ϕ)] det M(ϕ), the exceptional configurations with vanishing det M(ϕ) are never accepted, yet nearby configurations can be accepted, and these contribute arbitrarily large values of O²(ϕ) det M(ϕ), leading to the non-convergence of the variance estimator as n is increased [15,17]. The effect of the infinite variance in a Monte Carlo calculation is exemplified by the upper curve in Fig. 1. Sudden "glitches", caused by the sampling of a nearly-exceptional configuration, abruptly change the average value of the observable (the double occupancy D in this case) and lead to the non-convergence of the estimated error. In Fig. 2 we show that the large values of D are correlated with small eigenvalues of the fermion matrix M_a(ϕ).
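A one-dimensional caricature (our own toy example, not taken from the paper) shows the mechanism: a sampling weight that vanishes linearly at an "exceptional" point x = 0, together with an observable that behaves like 1/x there, yields a finite mean but an infinite variance.

```python
import numpy as np

# The weight p(x) ~ |x| exp(-x^2/2) plays the role of exp(-S_0) det M: it vanishes
# linearly at the "exceptional configuration" x = 0. The observable O(x) = 1/x
# plays the role of a fermionic observable ~ 1/det M. The product O(x) p(x) is
# bounded, so <O> is finite, but O(x)^2 p(x) ~ 1/|x| is not integrable near x = 0.
rng = np.random.default_rng(1)

# |x| has density x exp(-x^2/2) (a Rayleigh distribution); the sign is symmetric.
n = 200_000
x = rng.rayleigh(size=n) * rng.choice([-1.0, 1.0], size=n)
O = 1.0 / x
print(O.mean())  # a finite estimate; <O> = 0 by symmetry

# Deterministic symptom of the infinite variance: the truncated second-moment
# integral 2 * int_cut^inf exp(-x^2/2)/x dx grows like log(1/cut).
def second_moment(cut, m=4000):
    xs = np.geomspace(cut, 10.0, m)
    ys = np.exp(-xs**2 / 2) / xs
    return 2.0 * float(np.sum(0.5 * (ys[1:] + ys[:-1]) * (xs[1:] - xs[:-1])))

print(second_moment(1e-2), second_moment(1e-4))  # grows by ~ 2*log(100) ≈ 9.2
```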
It would be advantageous to sample the auxiliary fields according to a different statistical distribution such that i) the variance with respect to the new distribution is finite, and ii) the probability of the nearly-exceptional configurations is not so small, so they are sampled more frequently while, at the same time, being reweighted down so that their contribution is correctly taken into account. One such method is discussed in the next section.

A. Extra time-slice reweighting
To eliminate the infinite variance problem in our simulation, we follow the method proposed by Shi et al. [15]. The basic idea is to sample the auxiliary field according to a modified distribution and reweight the contribution of each configuration appropriately. We choose the probability distribution of an identical system with one extra time-slice to define the distribution we sample from. The rationale is that, for a given auxiliary field, this probability distribution marginalized over the fields on the extra time-slice is similar to the original probability density, but has no zeros.
Formally, we multiply and divide the integrand of the partition function by a function F(ϕ) defined by

F(ϕ) = ∫ dϕ* e^{−S_0(ϕ*)} det M_{N+1}(ϕ, ϕ*),

so that

Z = ∫ [dϕ] e^{−S_0(ϕ)} det M_N(ϕ) = ∫ [dϕ][dϕ*] e^{−S_0(ϕ,ϕ*)} det M_{N+1}(ϕ, ϕ*) R(ϕ),

where R(ϕ) = det M_N(ϕ)/F(ϕ) is the reweighting factor, S_0(ϕ*) is the bosonic action of the extra time-slice, and S_0(ϕ, ϕ*) = S_0(ϕ) + S_0(ϕ*) is the total auxiliary field action over N + 1 time-slices. For insertions of an observable O(ϕ), we similarly find

∫ [dϕ] e^{−S_0(ϕ)} det M_N(ϕ) O(ϕ) = ∫ [dϕ][dϕ*] e^{−S_0(ϕ,ϕ*)} det M_{N+1}(ϕ, ϕ*) O(ϕ) R(ϕ).

Observables may then be written as

⟨O⟩ = ⟨O R⟩_{N+1} / ⟨R⟩_{N+1},  (14)

where the symbol ⟨· · ·⟩_{N+1} stands for the average with respect to the probability distribution of a system with an extra time-slice:

p_{N+1}(ϕ, ϕ*) ∝ e^{−S_0(ϕ,ϕ*)} det M_{N+1}(ϕ, ϕ*).

At half-filling, the two spin determinants in det M_{N+1} are complex conjugates of each other, and therefore det M_{N+1} is greater than or equal to zero. The function F(ϕ) is then strictly positive, since it is an integral of a nonnegative quantity that cannot be uniformly zero, and because det M is bounded above on a finite lattice, F(ϕ) is also non-singular. This makes the reweighting factor R(ϕ) strictly positive.
The infinite-variance problem is solved by this reweighting prescription because, if ϕ_0 is an exceptional configuration where det M_N(ϕ_0) vanishes, then any positive power of O(ϕ_0)R(ϕ_0) will be nonsingular, since the determinant det M_N(ϕ) cancels in their product. In particular, the integrand of the variance of OR is (OR)² p_{N+1}, which is non-singular. The computation of F(ϕ) can proceed in different ways. As described in Appendix A 2, F can be approximately written as a Grassmann integral of the observable exp(−ϵH) in the presence of a background field ϕ. Perhaps the simplest way to compute it is then to expand in ϵ as exp(−ϵH) = 1 − ϵH + . . ., compute the fermion contraction of H and of the higher order terms, and truncate at some order. As we describe in Section V, we find empirically that the O(ϵ²) error introduced by this truncation is numerically very large. Instead, we will perform a nonperturbative Monte Carlo estimate of the reweighting factor by noticing that F(ϕ) can be computed as an expectation value with respect to the probability distribution g(ϕ*) = exp[−S_0(ϕ*)]/Z. Note that the normalization constant Z = ∫ dϕ* exp[−S_0(ϕ*)] can be evaluated analytically, but it can also be neglected, since it drops out of the ratio in Eq. (14). This "sub-Monte Carlo" is relatively inexpensive: first, the distribution g(ϕ*) factorizes site-by-site, so standard methods can be used to sample the one-dimensional probability distribution efficiently; second, the matrices appearing in M_{N+1}(ϕ, ϕ*) have ϕ as a fixed background, so the matrix product B_a(ϕ_N) · · · B_a(ϕ_1) need only be computed once in the evaluation of F(ϕ).
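As an illustration of such a sub-Monte Carlo, the sketch below estimates a hypothetical one-site F as an average over samples drawn from a factorized weight and compares the result with direct quadrature. The integrand f and the weight g here are stand-ins for Z det M_{N+1}(ϕ, ϕ*) and exp[−S_0(ϕ*)]; all names and parameter values are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)

# One-site toy: f is non-negative, mimicking det M_{N+1} at half-filling,
# and the extra-slice weight g(phi*) ~ exp(beta * cos(phi*)) factorizes site by site.
beta, b = 0.8, 0.9   # illustrative values only

def f(ps):
    return np.abs(1.0 + b * np.exp(1j * ps))**2

def sample_g(n):
    """Sample phi* ~ exp(beta * cos(phi*)) on [-pi, pi) by rejection."""
    out = np.empty(0)
    while out.size < n:
        cand = rng.uniform(-np.pi, np.pi, size=2 * n)
        accept = rng.uniform(size=cand.size) < np.exp(beta * (np.cos(cand) - 1.0))
        out = np.concatenate([out, cand[accept]])
    return out[:n]

# Sub-Monte Carlo estimate of F = <f>_g.
F_mc = f(sample_g(200_000)).mean()

# Direct quadrature for comparison.
trap = lambda y, x: float(np.sum(0.5 * (y[1:] + y[:-1]) * (x[1:] - x[:-1])))
grid = np.linspace(-np.pi, np.pi, 20_001)
w = np.exp(beta * np.cos(grid))
F_quad = trap(w * f(grid), grid) / trap(w, grid)
print(F_mc, F_quad)  # agree within the sub-Monte Carlo statistical error
```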

B. Unbiased estimator for 1/F
A subtlety arises in the Monte Carlo evaluation of 1/F(ϕ). The Monte Carlo estimator for F(ϕ) is computed by taking the mean over n* auxiliary variables ϕ*_i:

F̄(ϕ) = (1/n*) Σ_{i=1}^{n*} Z det M_{N+1}(ϕ, ϕ*_i).

F̄(ϕ) is an unbiased estimator of F(ϕ), but 1/F̄(ϕ) is not an unbiased estimator of 1/F(ϕ). Let us denote A_i = Z det M_{N+1}(ϕ, ϕ*_i) to simplify notation. We can compute the bias of the estimator 1/Ā from an expansion in Ā − ⟨A⟩,

1/Ā = (1/⟨A⟩) [1 − (Ā − ⟨A⟩)/⟨A⟩ + (Ā − ⟨A⟩)²/⟨A⟩² − · · ·].

Upon taking the expectation value, the second term vanishes but the remaining terms do not, which gives us a perturbative expansion for the bias; the leading contribution is σ²_A/(n⟨A⟩³). In a Monte Carlo simulation, such a bias is only relevant if it is larger than the statistical error. Since the bias falls like 1/n to leading order, while the error falls as 1/√n, in general one can control the bias by increasing the sample size. However, if generating a large ensemble is expensive, a more elegant solution is to construct an unbiased estimator instead.
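The 1/n fall-off of the bias is easy to check numerically in a toy setting: for A uniform on [0.5, 1.5] (a hypothetical choice made only for this illustration) one has ⟨A⟩ = 1 and σ²_A = 1/12, so the leading bias of 1/Ā is 1/(12n).

```python
import numpy as np

rng = np.random.default_rng(3)

# Measure the bias of 1/A-bar over many independent replicas of the sample mean.
def mean_inverse(n, replicas=400_000):
    A = rng.uniform(0.5, 1.5, size=(replicas, n))
    return (1.0 / A.mean(axis=1)).mean()

bias5 = mean_inverse(5) - 1.0    # leading-order prediction: 1/(12*5)  = 1/60
bias20 = mean_inverse(20) - 1.0  # leading-order prediction: 1/(12*20) = 1/240
print(bias5, bias20)             # roughly a factor of 4 apart, as the 1/n law predicts
```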
We adopt the unbiased estimator proposed in [19] for the reciprocal 1/⟨A⟩, valid for positive samples A_i > 0. The estimator is defined by

Ê = (w/q_n) ∏_{i=1}^{n} (1 − w A_i).

Here n is a non-negative integer-valued random variable distributed according to a geometric distribution with success probability p; the probability for outcome n is q_n = (1 − p)^n p. To calculate this estimator we have to evaluate n uncorrelated random samples A_i. The average number of random variables required to compute the estimator once is ⟨n⟩ = (1 − p)/p ≈ 1/p. The estimator is unbiased for any value of p and any value of w < 2/⟨A⟩. For a given ϕ we tune the values of w and p to minimize the variance of the estimator, using the following method [19]: we take k samples of A_i and compute the means Ā and A̅². We then choose w and p in terms of these means and of A_max, the largest of the A_i, following the prescription of [19]. The total number of samples needed to get one estimate is then k + ⟨n⟩ ≈ k + 1/p. In our simulations, it turns out that in most cases w = 1/(kĀ) and then p ≈ 1/k, so the total cost for one evaluation is 2k, on average.
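The sketch below implements one standard "Russian roulette" construction of an unbiased reciprocal estimator: draw n from a geometric distribution and return (w/q_n) ∏_{i≤n}(1 − w A_i). We assume this captures the essence of the estimator of Ref. [19]; the exact definition and the tuning of w and p used there may differ, and the values of w and p here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy samples with <A> = 1, so the target is 1/<A> = 1.
draw_A = lambda size: rng.uniform(0.5, 1.5, size=size)

def reciprocal_estimate(w=0.5, p=0.1):
    """One draw of the Russian-roulette estimator of 1/<A>:
    n ~ geometric(p) with support {0, 1, 2, ...}, q_n = (1-p)^n p,
    return (w / q_n) * prod_{i=1..n} (1 - w * A_i)."""
    n = rng.geometric(p) - 1          # numpy's geometric starts at 1, so shift to 0
    q_n = (1.0 - p) ** n * p
    return w * np.prod(1.0 - w * draw_A(n)) / q_n   # empty product is 1

est = np.array([reciprocal_estimate() for _ in range(200_000)])
print(est.mean())  # consistent with 1/<A> = 1 within its statistical error
```

Unbiasedness follows from the geometric series w Σ_k (1 − w⟨A⟩)^k = 1/⟨A⟩, which converges precisely when 0 < w < 2/⟨A⟩; a finite variance additionally requires ⟨(1 − wA)²⟩ < 1 − p, which is what the tuning of w and p controls in this construction.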

V. RESULTS
To test this method, we compute the double occupancy for the Hubbard model at half-filling in a small volume. The infinite-variance problem is apparent in Fig. 1. [Figure caption: here we use the average value n_sub ≈ 2k, which includes the cost of the k estimates required to compute w and p in Eq. (21); the gray band is the extrapolation for the biased estimator at n_sub → ∞, including its error bar.] With the reweighting turned on, the "glitches" are gone or, at least, drastically reduced. Even more evident is the fact that the variance of the sampled mean approaches a constant instead of diverging as the number of samples is increased. This difference in behavior between the calculation with and without reweighting was consistently observed over a range of Hubbard model parameters and for different variations of the reweighting procedure (which are described next).
In the left pane of Fig. 3 we show the result of the ϵ → 0 extrapolation using the "improved" action (accurate to O(ϵ²)) but computing the reweighting factor R(ϕ) in two ways: using the sub-Monte Carlo estimation and using an expansion exp(−ϵH) ≈ 1 − ϵH that introduces ϵ² corrections. The sub-Monte Carlo method has a mild ϵ-dependence; this may be expected from the fact that the improved action is correct to order ϵ² and that no further approximation is made in computing R, but we also observe that the coefficient of ϵ² is small. Meanwhile, the exp(−ϵH) expansion, while parametrically of the same order as the sub-Monte Carlo method, introduces a notable systematic error, shifting the ϵ² coefficient substantially. In the right pane of Fig. 3, the corresponding results computed from the "conventional" action described in Appendix A 1 are displayed. This action is accurate to order ϵ, and this is reflected in the data; the systematic error in ϵ is large. We note that it is likely that higher order expansions of exp(−ϵH) would show better continuum limit behavior than the ϵ² truncation. However, the calculation of R quickly becomes impractical due to the number of contractions needed to compute higher powers of H.
This indicates that if one wants to keep the advantages of using the improved action and, at the same time, circumvent the infinite variance problem using the extra time-slice reweighting, the best strategy is to use the sub-Monte Carlo estimator, as described in the previous section.
The usefulness of our method hinges on being able to estimate the reweighting factor with a moderate number of sub-Monte Carlo samplings. In Fig. 4 we show the bias of the double occupancy as a function of the (average) number of sub-Monte Carlo samplings required. We compare the unbiased estimator discussed above with a biased one that uses 1/F̄ as an estimator for 1/⟨F⟩. The results clearly show that the unbiased estimator is superior to the biased one: the error bars are comparable and the bias vanishes. We note that the stochastic errors on the final results are a combination of the configuration-to-configuration variance of the observable and the variance of the estimator. As we reduce the variance of the estimator, which comes at increasing cost, the errors converge asymptotically to the value controlled by the configuration-to-configuration fluctuations. The number of sub-Monte Carlo samplings required to reach this asymptotic regime turns out to be very modest and, since the reweighting is done only on the configurations used for measurements, the overhead of using our method is minimal.

VI. DISCUSSION
Using the Hubbard model as a testbed, we studied the infinite variance problem in a fermionic system and suggested a new way to improve the extra time-slice method proposed in Ref. [15]. We established that computing the reweighting factor as an expansion in powers of the time step ϵ, as advocated in Ref. [15], leads to large finite-ϵ corrections, at least when the expansion is truncated at leading order. We proposed instead to compute the reweighting factor using a sub-Monte Carlo calculation and pointed out that using an unbiased estimator allows us to use very large values of ϵ.
To assess the effectiveness of our method, we calculated the double occupancy of the Hubbard model. We used two different discretizations, one correct up to O(ϵ) and another, improved one, correct up to O(ϵ²). In order to take full advantage of the convergence properties of the improved action, it is essential that the reweighting be done non-perturbatively, using an unbiased estimator.
We note that for efficient simulations, we need to use the improved action. As shown in Fig. 3, one can use larger values of ϵ with the improved action. Larger values of ϵ translate into a smaller number of time-slices, with a resulting lowering of computational costs. In fact, the bottleneck in Monte Carlo simulations of fermionic theories is the calculation of the fermion determinant. This scales in general as O(V³N³) for direct evaluations, and as O(V³N) when using the reduction formula as in eq. (A9), where V is the volume of the spatial lattice and N is the number of time-slices. Additionally, autocorrelation is generally increased at smaller values of ϵ, which also favors simulations with large values of ϵ. On the other hand, the improved-action simulations exposed the infinite variance problem, which seems to be more prominent than with the unimproved action.
The success of the extra time-slice method requires that F(ϕ) be strictly greater than zero, which holds for the Hubbard model at half-filling. Away from half-filling, a sign problem develops which spoils this property, and a new definition of F(ϕ) will be required to avoid the variance problem. However, it may be possible to apply a modified extra time-slice method in that scenario; we will explore this in future work.
Lastly, we note that the fact that 1/Ā is a biased estimator for 1/⟨A⟩ also implies that, in traditional applications of reweighting (for example, in simulations with a sign problem), the estimate of 1/⟨R⟩ (see the denominator in eq. (14)) is biased. In fact, if the same ensemble is used to compute the numerator and the denominator, the overall estimator for any observable will have a bias different from that of 1/R̄ alone, and therefore care should be taken to define unbiased estimators for observables in reweighted theories, especially when the ensemble size is not large.
where [dϕ] = ∏_{t,x} dϕ_{t,x}, S_0 is the auxiliary field action, eq. (4), and the fermion matrix M_a(ϕ) has identity blocks on the diagonal and blocks −b_t B_a(ϕ_t) connecting neighboring time-slices. Here, b_t = −1 for t = N − 1 and 1 otherwise, due to antiperiodicity, and δ_{⟨x,y⟩} = Σ_{µ̂} δ_{x,y+µ̂} is the hopping matrix. Using an identity valid for an arbitrary set of matrices {A_t} [22], the determinant of this block matrix reduces to that of a single spatial matrix, det M_a(ϕ) = det(1 + B_a(ϕ_N) · · · B_a(ϕ_1)). The error in using this action is O(ϵ) because, in taking the product of eq. (A2) for all t, the subleading term is O(N ϵ²) = T^{−1} O(ϵ), T being the temperature.
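The reduction identity can be checked numerically with random matrices. The sketch below assembles the space-time block matrix (identity blocks on the diagonal, −b_t A_t blocks linking adjacent slices, with an extra sign on the block that wraps around the time direction) and compares its determinant with that of the single-slice product; the index conventions are one common choice and may not match the paper's exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
N, V = 5, 3                          # time slices and spatial sites (illustrative)
A = rng.normal(size=(N, V, V))       # arbitrary real matrices A_t

# Block matrix: identity on the diagonal, -b_t A_t linking slice t to slice t-1,
# with the antiperiodic sign b = -1 on the wrapping block (here placed in row 0;
# where this sign sits depends on the indexing convention).
M = np.zeros((N * V, N * V))
for t in range(N):
    M[t*V:(t+1)*V, t*V:(t+1)*V] = np.eye(V)
    tp = (t - 1) % N
    b = -1.0 if t == 0 else 1.0
    M[t*V:(t+1)*V, tp*V:(tp+1)*V] = -b * A[t]

# Reduction identity: det M = det(1 + A_{N-1} ... A_1 A_0)
P = np.eye(V)
for t in range(N):
    P = A[t] @ P
lhs, rhs = np.linalg.det(M), np.linalg.det(np.eye(V) + P)
print(lhs, rhs)   # the two agree up to floating-point roundoff
```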
The quantity in the trace can be evaluated using the properties of Grassmann coherent states, where M_a^{−1}(ϕ) is the matrix inverse of eq. (A20). Thus F(ϕ) = G_1(ϕ) + O(ϵ²). Higher orders may be computed, but the number of fermion contractions grows exponentially.