Population Control Bias and Importance Sampling in Full Configuration Interaction Quantum Monte Carlo

Population control is an essential component of any projector Monte Carlo algorithm. This control mechanism usually introduces a bias in the sampled quantities that is inversely proportional to the population size. In this paper, we investigate the population control bias in the full configuration interaction quantum Monte Carlo method. We identify the precise origin of this bias and quantify it in general. We show that it has different effects on different estimators and that the shift estimator is particularly susceptible. We derive a re-weighting technique, similar to the one used in diffusion Monte Carlo, for correcting this bias and apply it to the shift estimator. We also show that by using importance sampling, the bias can be reduced substantially. We demonstrate the necessity and the effectiveness of applying these techniques for sign-problem-free systems where this bias is especially notable. Specifically, we show results for large one-dimensional Hubbard models and the two-dimensional Heisenberg model, where corrected FCIQMC results are comparable to the other high-accuracy results.


I. INTRODUCTION
Projector Monte Carlo algorithms, including Green's function Monte Carlo (GFMC) [1], diffusion Monte Carlo (DMC) [2], and full configuration interaction quantum Monte Carlo (FCIQMC) [3], have become indispensable tools in extracting the ground state properties of various quantum systems. These methods employ a stochastic version of the power method; a method that starts from an initial wavefunction with a non-zero overlap with the ground state and filters out higher excited states by a repeated application of a suitable imaginary-time propagator of the Schrödinger equation. As the system size grows, its Hilbert space grows exponentially, and a stochastic representation of the wavefunction becomes a necessity. The wavefunction is then sparsely represented by a set of walkers that are evolved according to the propagator. Their expectation value on any configuration reflects the wavefunction value on that configuration.
The main challenge faced by these algorithms, compared to traditional stochastic methods, is that their propagators do not form column-stochastic matrices in general [4]. On the one hand, their elements can be either positive or negative. On the other hand, the sum of the absolute values for each column is not normalized. The first issue implies that walkers should be signed and often leads to a sign problem. The latter issue can be dealt with using a branching process where walkers need to be added or removed dynamically to reflect the changes in the L 1 -norm of the wavefunction. However, the branching creates a practical problem because there is a chance that all walkers will die or their storage exceeds the available computer memory. Moreover, the computational ef- * k.ghanem@fkf.mpg.de † a.alavi@fkf.mpg.de fort per sample and the statistical accuracy are directly proportional to the total number of walkers, so minimizing the fluctuations in the number of walkers makes running times more predictable and markedly enhances the utilization of available computational resources. Consequently, different projector Monte Carlo methods apply some population control mechanism that keeps the number of walkers around a set target throughout the simulation. It has long been known that this population control introduces a bias in DMC calculations [5] and other Green's function methods [6] and that the bias scales inversely with the number of walkers [4]. Different remedies have been proposed to overcome this bias, including an extrapolation in the number of walkers, a modification of the wavefunction renormalization procedure, and deriving new unbiased estimators by reweighting with the discarded scale factors [7]. The population control bias has often been neglected in FCIQMC because it is usually much smaller than the statistical error bars or other biases [8]. Indeed, for any system with a fermionic sign problem, the original full FCIQMC algorithm needs a minimum number of walkers, below which the sign-incoherent noise dominates the simulation and the dynamics of the walkers become unstable. This number of walkers is typically large enough that the population bias does not exceed the statistical error bars. Moreover, in practice, the initiator approximation is often used instead of the original FCIQMC algorithm. This approximation constrains the dynamics of the walkers, which prevents the sign-incoherent noise from propagating and destabilizing the simulation [9]. The constraint allows stable simulations using any number of walkers but introduces a bias that also scales down with the number of walkers. The resulting initiator bias is much larger than the population control bias, and reducing it requires increasing the population. Therefore, the population bias has been masked in these applications by the initiator bias and did not pose a practical concern. In contrast to these systems, there are some sign-problemfree ones, where no minimum number of walkers is required for getting stable FCIQMC simulation, and it is a matter of obtaining more samples to reduce the statistical error bars. For such systems, the population control bias is the only systematic bias, making its study most convenient in these cases. Consequently, in this paper, we study the population control bias in FCIQMC, particularly in its application to sign-problem-free systems, and in conjunction with importance sampling.
The paper is structured as follows: We first review the FCIQMC method, its sign problem, energy estimators, and its population control schemes. We show that population control leads to a biasing source term in the master equation, which equals the covariance between the shift parameter and the wavefunction. We also show that this covariance term scales inversely with the number of walkers and use it to relate the biases of different energy estimators. Ref. [8] borrowed a reweighting procedure developed originally for DMC and showed numerically that it effectively corrects the projected energy estimates. We provide a derivation of this reweighting procedure within FCIQMC and adapt it for correcting the shift estimator. We then examine the effect of the importance sampling on reducing the bias. Finally, we demonstrate the effectiveness of the correction technique and the importance sampling by applying them to the one-dimensional Hubbard model and the two-dimensional Heisenberg model with large lattice sizes.

II. BACKGROUND
A. FCIQMC FCIQMC projects the ground state of a Hamiltonian H from an initial state |ψ 0 by evolving it according to the imaginary-time Schrödinger equation where S is some scalar shift of the diagonal elements. The formal solution of this equation can be written in terms of the eigenstates of the Hamiltonian |φ n and their energies E n as following If the shift S is lower than the first excited state energy, then contributions from all excited states decay exponentially leaving only the ground state wavefunction in the long time limit The wavefunction in FCIQMC is represented by a set of walkers that occupy the Hilbert space spanned by Slater determinants, which are constructed from a finite basis of single-particle orbitals. The walker population is updated stochastically at each time step such its average change respects the time-discretized Schrödinger equation i.e.
where ∆τ is the discretized time-step, N i := D i |ψ is the number of walkers on determinant |D i , and H ij := D i |Ĥ|D j are the Hamiltonian matrix elements in this determinant basis. In the long time limit, the ground state wavefunction can then be estimated by averaging the walkers over many time steps The master equation (4) is implemented using three steps of walker dynamics [3]: • Spawning: Each walker on determinant |D i randomly spawns another walker on a connected determinant |D j with probability P i→j . The spawned walker is then accepted with probability and it is given a sign opposite to the sign of H ij N i . When P spawn > 1, one spawn is created stochastically with a probability that equals the fraction part, while the integral part determines the number of additional deterministically-created spawns.
• Death/Clone: Each walker dies or gets cloned with a probability that equals the absolute value of The walker is killed when this value is positive and duplicated when it is negative.
• Annihilation: Walkers of opposite sign residing on the same determinant are cancelled and removed from the simulation. As discussed next, this step is crucial to ensure convergence to the ground state. Without it, the solution would quickly get contaminated by noise that masks the physical ground state.

B. Sign Problem
In the absence of annihilation, the in-phase combination of the positive and negative walkers corresponds to a non-physical solution that grows faster than the desired out-of-phase solution [10]. The annihilation reduces the growth rate of the in-phase solution, and the reduction is proportional to the total number of walkers. As a result, a minimum number of walkers, known as the annihilation plateau, is needed in order for the physical solution to outgrow the in-phase one and for the correct sign structure to emerge. This minimum number is typically less than the full size of the Hilbert space but can be a substantial fraction of it.
The initiator approximation overcomes this issue by modifying the dynamics of the walkers [9]. Only walkers residing on determinants with a population above a certain threshold are allowed to spawn onto empty determinants. This prevents lowly-populated determinants (the non-initiators) with fluctuating signs from propagating the sign-incoherent signal further. This constrained dynamics allows FCIQMC to converge at an arbitrarily small number of walkers. The cost is introducing a bias that can be systematically improved by increasing the number of walkers. The initiator bias can also be significantly reduced using the recently-proposed adaptive shift method [11,12]. In this method, the life time of a non-initiator determinant is boosted by reducing its shift to account for the missing back-spawns from its underpopulated surrounding.
The annihilation plateau is the manifestation of the sign problem in FCIQMC. There are a few model systems, however, which are free from this sign problem and can converge at any number of walkers without the initiator constraint. For these systems, all walkers spawned onto a specific determinant have the same sign. One obvious way this can happen is when all the non-diagonal elements of the Hamiltonian have a negative sign (socalled stoquasticity); thus, all walkers are of the same sign. More generally, a Hamiltonian is sign-problem-free when we can classify the determinants into two disjoint sets such that the matrix elements between members of the same set are negative (or zero), while the matrix elements between members of different sets are positive (or zero). In this case, the walkers of the two sets have opposite signs, but they never mix throughout the simulation. This general case can always be recast as the earlier one-sign case by using a different sign convention for the determinants in one of the two sets, which would make all the non-diagonal matrix elements negative. Examples of sign-problem-free systems include the one-dimensional Hubbard model with open boundary conditions, the one-dimensional Hubbard model with periodic boundary conditions and specific combinations of up-and down-electrons, and the two-dimensional Heisenberg model on a bipartite lattice. All of these systems are introduced and discussed in Section VI.

C. Projected Energy
Two kinds of energy estimators are used in FCIQMC. One is the energy shift discussed in the next section. Another is the so-called projected energy obtained by pro-jecting on some trial wavefunction |ψ T : As the average wavefunction converges to the true ground state, its projected energy converges to true ground-state energy. To compute this estimate, one does not need to know the average wavefunction explicitly. Instead, the numerator and denominator are computed out of the instantaneous wavefunctions during the simulation and then averaged separately. The ratio of the averaged quantities gives an estimate of the projected energy of the average wavefunction: where L is the number of FCIQMC samples |ψ(τ i ) taken after an appropriate thermalization period. For single-reference molecular systems, the trial wavefunction is typically chosen to be the reference determinant with the highest population (usually the Hartree-Fock determinant). For multi-reference molecular systems, the simulation is run for some period before a trial space is constructed out of the most populated determinants. The ground state in the trial space is then calculated and used as a trial wavefunction for a more reliable projected energy estimator. However, for some model systems, like the ones considered in this paper, the wavefunction spreads across the whole Hilbert space such that the overlap with a restricted subspace is very small, and the energy estimates mentioned above are very noisy. For these systems, we introduce a particular projected energy estimator, which is useful when they are free of the sign problem. Let us denote by |±1 , the uniform trial wavefunction that spans the whole Hilbert space equally and has the same sign structure as the ground state wavefunction the corresponding projected energy estimator then reads We call this the uniform projected energy. Calculating this estimator requires knowing the sign structure of the ground state a priori, which is feasible in sign-problemfree cases. Assuming in these cases a sign convention that renders all walkers positive, the uniform projected energy takes the following simple form where N is the total number of walkers. This estimator will be relevant later when deriving a correction of the population control bias.

D. Population Control
From Eq. (3), it is clear that unless the shift S equals the ground state energy E 0 , the overall amplitude of the wavefunction, and thus the total number of walkers N , would decay/grow exponentially. To maintain a stable number of walkers in the long-run, the shift is updated periodically as following: where γ is a damping factor, and A is the number of time steps between successive updates of the shift. The intuition behind this formula is simple. When the population increases, the shift is lowered, which decreases the population in subsequent steps and vice versa. The logarithm is used because the change in the number of walkers is exponentially proportional to the energy/shift difference. This population control scheme is the most commonly used one in FCIQMC. To get the number of walkers around a set target, the shift is held constant at the beginning of the simulation, during which the number of walkers increases exponentially. Then once it reaches its desired value, it is stabilized using Eq. (13). After an equilibration period, the average number of walkers becomes a constant, and thus the average value of the shift can be used as an independent energy estimator.
Ref. [13] has recently proposed an improved scheme that avoids potential overshoots in the number of walkers. The new scheme actively targets the desired number of walkers in its update formula, evading the need for the initial constant-shift mode. Other population control schemes can also be devised where the amplitude of the wavefunction is controlled using its overlap with a trial wavefunction ψ T |ψ instead of the total number of walkers N . This can be achieved either using a formula similar to Eq. (13) or by setting the shift equal to the corresponding projected energy estimator E T at each time step. The latter keeps the average value of the overlap Such intermediate normalization schemes have the advantage of reducing the statistical noise in the corresponding projected energy estimates.

III. POPULATION CONTROL BIAS
Population control introduces a bias into the stochastic solution of FCIQMC because of the dependence of the shift on the wavefunction itself and the resulting feedback into the solution. In the following, we will spell out this dependence explicitly and gain insights into the associated bias.
At each time step of FCIQMC, the wavefunction is updated stochastically such that the average change in the wavefunction obeys the Schrödinger equation (1) The master equation of the average wavefunction itself, however, is obtained by taking the average of both sides [14] − Controlling the population by updating the shift necessarily implies that the shift and the wavefunction are correlated, and thus the average product S(τ ) |ψ(τ ) cannot be replaced by the product of the averages, but there is a non-vanishing covariance term As a result, the average wavefunction does not satisfy the Schrödinger equation exactly, but it instead evolves according to the following equation, which has an additional source term This additional term is the origin of the population control bias in FCIQMC. We can go one step further and express the above equation in terms of the average walker populations where we defined the relative bias of population N i as From this, one can view the population bias as a modification of the diagonal elements of the Hamiltonian and so the biased wavefunction as the ground state of this modified Hamiltonian rather than the desired original Hamiltonian. Note that in anticipation of the discussion next subsection, we defined B i in a way that explicitly highlights the dependence of the bias on the average number of walkers N .
As an illustrative example, we show in Fig. 1 the energy estimates for the one-dimensional 14-site Hubbard model with U/t = 4 and periodic boundary conditions. The model is small enough that we can calculate its exact ground state energy E 0 with exact diagonalization. A quick recap of the Hubbard model is presented in Section VI A. We performed FCIQMC calculations targeting an increasing number of walkers from N = 100 to The shift is updated every 10 iterations with damping parameter γ = 0.1. The error bars are calculated using the blocking method of Ref. [15]. From this plot, we see that both the projected energy and the shift estimators have a bias that scales down the average number of walkers as 1/N . We also note that the shift estimator has a noticeably higher bias than the projected energy at any specific number of walkers. The scaling behavior is discussed in the next subsection, while the discrepancy between the different estimators is discussed subsequently.

A. Scaling with the Number of Walkers
To show that the population control bias scales as 1/N , we need to show that the relative bias B i is independent of the average number of walkers. Our starting point is the shift update formula (13) relating the shift to the total number of walkers. We will assume that the shift is updated continuously to simplify the analysis. Nevertheless, the conclusion regarding scaling still holds in the case of a periodic update and is discussed in Appendix A.
By rearranging the terms in the update formula, one can see that the quantity stays constant for all times τ . Therefore, its covariance with any other statistical quantity vanishes, which allows us to express the covariance between the shift and the population of determinant |D i as follows: where in the second line, we expanded the logarithm of N around its mean value N . Substituting back in the relative bias definition (20), we get When there is no sign problem or the number of walkers is above the annihilation plateau, the mean populations of all determinants scale linearly with the total number of walkers. Moreover, since the spawning and death/cloning events in FCIQMC are scaled Bernoulli random variables, their variances are proportional to their mean values. Therefore, the covariances between the populations of different determinants also scale linearly with the number of walkers. As a result, the numerator and denominator of the above ratio have the same scaling, such that the relative bias B i is mostly independent of the total number of walkers.

B. Different Biases for Different Estimators
In Fig. 1, we saw that the shift estimator has a higher bias than the projected energy estimator at any specific number of walkers. We observed this systematic deviation of the two estimators for different systems and was also reported in Ref. [8]. To understand it, let us consider Eq. (19) in the ideal case scenario where all determinants have the same relative bias B i = B. In that special case, the wavefunction itself and all its associated energy estimators would not be biased, but the average shift would still have the following positive bias: This suggests that the shift, as an energy estimator, is particularly affected by the population bias. We can generally quantify how much the average shift is biased compared to any other projected energy estimator. By projecting the biased master equation (18) on the corresponding trial wavefunction, we get the time evolution of its overlap with the solution At equilibrium, the left-hand side vanishes and the equation gives a direct relation between the average shift and the projected energy estimate This difference in the bias of the two estimators can be easily estimated in FCIQMC using samples of the overlap ψ T |ψ . For example, for sign-problem-free cases, the difference between the average shift and the uniform projected energy estimate can be expressed in terms of the average number of walkers and its covariance with the shift In Fig. 2, we show that this equation is indeed satisfied by the earlier results of the Hubbard model. This relation provides a simple way of partially correcting the bias of the average shift in cases where calculations of projected energies are not practically feasible. To remove the bias completely, we need to correct the wavefunction itself, which we address in the next section. It is worth noting that Fig. 2 demonstrates that the covariance Cov(S, N ) is mostly independent of N , which is consistent with the same behavior of Cov(S, N i ) concluded from the previous section. The dependence of the covariance terms on other simulation parameters is discussed in Appendix B.

IV. BIAS CORRECTION
The intuitive idea behind eliminating population control bias is noting that the role of updating the shift is merely scaling the wavefunction. The problem is that the scale factors are correlated with the sampled wavefunctions. Nevertheless, if we keep track of these factors and use them to scale the wavefunctions back, we can undo the effects of the shift updates. As a result, these rescaled wavefunctions would then satisfy the Schrödinger equation with a constant shift and thus without bias.
In the following subsection, we show mathematically how this is achieved by a reweighting of FCIQMC samples. The reweighting technique can be used to correct the estimates of any wavefunction-related quantity, but it cannot be applied directly to the shift average. So we show next how to fix the shift estimator properly. Finally, we discuss some technical details of the reweighting technique.

A. Correcting the Wavefunction
In FCIQMC, the time derivative in Schrödinger equation is discretized using Euler's explicit method, leading to the linear propagator However, to make the derivation and results more streamlined, we assume that FCIQMC is using the following exponential propagator instead of its linear approximation This replacement of the linear propagator by the exponential one does not affect the validity of the results. The reason is that the next step requires decomposing the propagator into the product of two propagators. For the linear propagator, this decomposition introduces a timestep error of the order O(∆τ 2 ), which is the same order of error introduced by using the exponential propagator instead.
Let us now decompose the propagator into one with a hypothetical constant shift C and a remaining pure scaling term Substituting back in Eq. (30), we get the following equation, which is (approximately) satisfied at every iteration of FCIQMC Denoting X C (τ ) := exp ∆τ C − S(τ ) and taking the average of both sides we see that the average of X C (τ ) |ψ(τ + ∆τ ) is the unbiased evolution of the average of |ψ(τ ) using the constant shift C .
By repeating the above analysis for an arbitrary number of iterations n, we get the following general relation where the weight factor is now the product of subsequent scaling factors This expression shows that by performing enough iterations and using a large-enough value of n, we can project out the exact ground state |φ 0 without any population control bias by reweighing the wavefunction at iteration τ with weight W C,n (τ ) In Fig. 3, we plot these weights for the 14-site Hubbard model using different values of n. The value n can be thought of as the order of correction applied to the wavefunction. Weights of order n reflect how the amplitude of the wavefunction would change if the population control had been suspended for a period of n∆τ . For large values of n, the weights fluctuate over several orders of magnitude, which effectively reduces the number of samples in any averaged quantity. This puts a practical limit on the chosen value of n, as we will see later.

B. Correcting Energy Estimators
Correcting the projected energy estimators using the wavefunction weights is straightforward. One has to reweight the terms appropriately in both the numerator and the denominator of Eq. (9) To correct the shift, however, the reweighting cannot be applied directly. We need first to relate the shift to the corrected wavefunction. Being able to correct the shift is valuable because, for some systems, obtaining reliable projected energy estimates is computationally expensive while the shift is readily available.  Evolving the exact ground state wavefunction |φ 0 using the Schrödinger equation with fixed shift C leads to an exponential decay of its norm with rate If we project the above equation on some trial state |ψ T and rearrange the terms we get what is known as the growth estimator of the ground state energy We can approximate both the numerator and denominator using equation (34) and a large enough value of n Substituting back in Eq. (39), and replacing the mean values by averages over L iterations, we get This expression is valid for any value of C below the first excitation energy and any trial wavefunction ψ T with nonzero overlap with the ground state. The specific choices may only influence the statistical uncertainty of the estimator.
Incidentally, by setting C to the average shift S, the second term gives us the necessary correction to remove the population control bias of the average shift. In fact, using the average shift as C is optimal in the sense that its value minimizes the squared distance to all sampled shift values and thus minimizes the fluctuations of the scale factors X C . As was the case for project energies, the choice of the trial wavefunction depends on the system. It is reasonable in molecular systems to use the reference determinant or the ground state of a trial space constructed out of the most populated determinants. In sign-problem-free systems, using the uniform trial wavefunction of Eq. (10) the overlap ψ T |ψ(τ ) is just the total number of walkers N (τ ) In Fig. 4, we plot the results of applying this reweighting technique to the 14-sties Hubbard model. The reweighting eliminates the population control bias in both the shift and projected energy estimates, with the corrected values falling spot on the exact energy. In this plot, we used n = 2560 terms in the weight factors.

C. Correction Order
The number of correction terms n included in the weight factors W C,n affects both the accuracy and precision of the corrected energy estimates. Ideally, we want to include as many terms as possible because more terms imply longer projection times, Eq. (34), thus further reduction of the population control bias (better precision). On the other hand, including more terms implies longer times without population control and larger the fluctuations of the weight factors, Fig. 3, thus larger error bars (less accuracy). In Fig. 5, we plot the corrected shift estimates against the correction order n. As the correction order increases, the population control bias decays exponentially while the statistical error bars get larger. After some point, around n = 3000, increasing the correction order has diminishing returns for the bias but still increases the error bars. The optimal value of n is the one for which the bias is just below the statistical uncer-   tainty. Knowing this value would not be possible without knowing the exact result. As a heuristic, we suggest choosing n as the lowest value, after which the energy estimate goes up (assuming the bias is positive). When a set of calculations with different numbers of walkers is available, a better choice can be made. Choose the value for which the corrected energy estimates from different number walkers agree within the error bars. For example, this is the value n = 2560 for the result of the 14-site Hubbard shown in Fig. 6.
Finally, it is worth pointing out that when comparing the corrected estimates of different calculations using different time steps ∆τ , the relevant quantity is the total imaginary time T := n∆τ , which should be long enough to project out the bias and make it smaller than the statistical error bars. Consequently, it is desirable to choose the time step as large as possible. On the other hand, the correction formula has a local truncation error of the order O(∆τ 2 ) due to the replacement of the linear propagator by the exponential one. The global truncation error thus scales as O(∆τ ) for a fixed projection time T . In practice, however, we did not observe this error in our calculations even when the time-step was close to the maximum allowed value 2/(E max − E 0 ), where E max is the maximum eigenvalue of the Hamiltonian.

V. ROLE OF IMPORTANCE SAMPLING
Since the population control bias is proportional to the covariance between the shift and the wavefunction, reducing the variance of the shift or the wavefunction should help reduce the bias. A well-known technique for reducing the variance of quantum Monte Carlo estimates is importance sampling [16,17]. Given a guiding wavefunction |ψ G , importance sampling corresponds to solving an auxiliary problem with the modified Hamiltoniañ The wavefunctions of the original and auxiliary problem are then related byÑ Using a guiding wavefunction that is close to the ground state, reduces the fluctuation in the number of walkers and the shift. The reason is that a walker on a determinate |D j contributes on average 1 + ∆τ ( iH ij − S) walkers to the next iteration. As the guiding wavefunction becomes closer to the ground state, the variation in the column-sum iH ij gets smaller, reducing the variation in the number of walkers and its dependence on the details of the sampled wavefunction. From a different perspective, the propagator becomes closer to a columnstochastic matrix alleviating the need for population control. Importance sampling has also been discussed in the context of Density Matrix Quantum Monte Carlo [18].
As an illustrative example, take the limiting case where the guiding wavefunction is the exact ground state. Then the column-sum of the modified Hamiltonian matrix elements becomes a constant and thus all walkers have the same reproduction rate 1 + ∆τ (E 0 − S) . Moreover, the uniform projected energy E ± equals the grounds state energy E 0 regardless of the sampled wavefunction. Therefore, the population control bias is given entirely by Eq. (28) which only depends on the covariance of the shift with the number of walkers rather than with the details of the sampled wavefunction.
In practice, the guiding wavefunction does not necessarily need to be very close to the ground state. Using even the simplest forms of guiding wavefunctions can have a noticeable impact on the population control bias. In this paper, we employ the following Gutzwiller-like wavefunction [19], where energetically-unfavorable determinants are exponentially suppressed  Figure 7. Comparison of the energy estimates of the 6 × 6 Heisenberg model with and without importance sampling (labeled "Guided" and "Original", respectively). The importance sampling uses a Gutzwiller wavefunction with parameter g = 3. and g is a non-negative parameter to be optimized later. Implementing importance sampling using this guiding wavefunction is straightforward in FCIQMC. Spawns from determinant |D i to determinant |D j have to be scaled by the factor

Energy per Site
Computing this factor incurs only a small computational overhead, and for local lattice models, it can be evaluated in a way that is independent of the system size. In Fig. 7, we show the effect of using this guiding wavefunction on the 6 × 6 Heisenberg model with periodic boundary conditions (see Section VI B). The plot demonstrates how markedly-effective the importance sampling is in reducing not only the statistical error bars but also the population control bias of both the shift and the projected energy estimators. The guiding wavefunction parameters can be optimized by minimizing the variational energy as in variational Monte Carlo or by solving an eigenvalue equation in a projected subspace as done in coupled-cluster meth-ods. Besides these two options, we found that a straightforward bootstrapping method works very well for the sake of minimizing population control bias. We first optimize the parameters using low-accuracy but cheap FCIQMC energy estimates and later use these optimal parameters to obtain a highly-accurate energy estimate employing a much larger number of walkers. For example, in the earlier case of the Heisenberg model, the parameter g of Gutzwiller wavefunction was chosen by running different FCIQMC calculation using 1000 walkers and different values of g and then selecting the one giving the lowest energy estimates as shown in Fig. 8.

VI. APPLICATIONS
To demonstrate the capabilities of the presented improvements of the FCIQMC algorithm, we apply it to sign-problem-free lattice models with large lattice sizes. The calculations were performed using the efficient NECI implementation [20]. In these calculations, we employed importance sampling using the Gutzwiller-like wavefunction of Eq. (47) and optimized its parameter g as described earlier. Since calculating the uniform projected energy gets more expensive for larger grid sizes, we relied solely on the shift as an energy estimator. To account for any remaining population control bias, we used the reweighting formula of Eq. (43) to correct the shift average.

A. Hubbard Model
The Hubbard Hamiltonian on a general lattice readŝ where ij indicates summation over all neighboring sites, c † i,σ (ĉ i,σ ) is the fermionic creation (annihilation) operator of an electron of spin σ at site i andn i,σ =ĉ † i,σĉ i,σ is the corresponding number operator, t is the hopping amplitude between nearest neighbors and U is on-site Coulomb repulsion. This simple model of interacting electrons on a lattice has been under study for over 50 years and is used as a model for understanding the physics of hightemperature superconductors [21][22][23][24]. Interestingly, the one-dimensional case can be solved in the thermodynamic limit by employing Bethe ansatz, which reduces the problem to a set of coupled algebraic equations, Lieb-Wu equations, which can be solved analytically in the thermodynamic limit [25,26].
Using open boundary conditions, the 1D Hubbard model does not have a sign problem in FCIQMC. This can be seen by ordering the orbitals by their spin then by their lattice site. Since in real space, it is only kinetic energy terms that cause transitions between two Slater determinants (by hopping single electrons between nearest-neighbor sites), and the sign of these matrix elements are negative, and in addition, since (in 1D and open boundary conditions) there is no possibility for an electron to return to its original lattice site via alternative routes, the problem is sign-problem-free. With periodic boundary conditions, electrons hopping across the boundary can, in general, lead to positive matrix elements due to a possible negative Fermi-sign from the paths that wrap around the boundary and return to the original Slater determinant. Nevertheless, if the number of spin-up electrons and the number of spin-down electrons are both odd, the number of required commutations will always be even, and thus no Fermi-sign arises. This means that specific off-half-filling periodic models can also be studied with the FCIQMC method without any sign problem.
In the following, we look at some paradigmatic long 1D Hubbard systems. We benchmark our results against DMRG results obtained using the BLOCK code [27]. DMRG is a deterministic method that uses the so-called matrix product state (MPS) to represent the groundstate wavefunction. The MPS is iteratively optimized in sweeps through the sites in a predefined order, which is trivially chosen in 1D systems. The method benefits from the low-entanglement of the ground states that is especially present in 1D problems. The computational resources required to solve a system with DMRG up to a given accuracy is mainly determined by the bond dimension M . DMRG also benefits from the locality, which is broken by introducing periodic boundary conditions.
In Table I, we show FCIQMC and DMRG results of the 102-site Hubbard chain for U/t = 4 and 8 at half-filling as well as the 150-site Hubbard chain at U/t = 8. These results are also compared with the analytical result for the thermodynamic limit obtained with the Bethe ansatz. The on-site interaction parameter U/t = 8 is chosen as an example of an intermediate coupling regime. In both the real-space and the reciprocal-space basis representation, the wavefunction is highly spread-out; a single Slater determinant is a bad approximation in both cases, thus making it a difficult system to treat. For U/t = 4, using a reciprocal-space basis would normally be more suitable as the wavefunction would be more compact in this case. However, we stick to the real-space basis because FCIQMC benefits from the absence of a sign problem significantly more than compactness. For the 102-site systems, we also calculate both with open boundary conditions (OBC) and periodic boundary conditions (PBC). The DMRG computational effort for 1D systems with OBC typically scales as M 3 , whereas it scales as M 5 for PBC. There are, however, similar MPS-based algorithms available that improve the scaling of PBC calculations compared to traditional DMRG [28,29]. In contrast, the computational cost of FCIQMC is the same for open and periodic boundary conditions as long as the system contains an odd number of spin-up and spin-down electrons such that it remains strictly sign-problem-free. While there is still a small bias for the guided but uncorrected   Table II and for  the DMRG calculations in Table III, respectively. "FCIQMC -Original" implies no bias correction or importance sampling. "FCIQMC -Guided" implies the use of importance sampling with the Gutzwiller-like factor. "FCIQMC -Corrected" implies the population control bias correction on top of the importance sampling. Chain lengths of ∞ indicate calculations in the thermodynamic limit.
result, the corrected FCIQMC result agrees well with DMRG within statistical error bars. The larger statistical error bars for the U/t = 4 compared to the U/t = 8 for comparable wall clock time are due to the fact that the real-space basis is less suitable for lower U/t. Additionally, we also look at the 102-site system with four holes (with total spin projection M s = 0) at U/t = 8. As there are more low-lying excited states, it takes more iterations for FCIQMC to project out the ground state. This results in slightly larger error bars compared to the equivalent half-filled system. Systems off half-filling are also more difficult to treat with DMRG, and thus higher M values are required to get a converged result. Still, there is good agreement between the DMRG benchmark result and the corrected FCIQMC result.
In general, the FCIQMC results can be continuously improved by adding more CPU time which, is typically the bottleneck to reduce statistical error bars. However, the required memory to store the stochastic representation of the FCIQMC wavefunction at any time during the simulation is typically much lower than the corresponding MPS in DMRG, even though an MPS is already a very efficient way of expressing the ground-state wavefunction of a 1D system. The basic memory require-ments of an FCIQMC calculation are the list of the instantaneously occupied determinants in a binary format (with the length of this binary representation depending linearly on the number of sites) and the instantaneous walker population on these determinants. Additionally, there is the need to store the hash table for the main walker list (requiring a hash table with two integers per entry and a list containing empty indices, both roughly scaling with the total number of determinants) and the spawning array (in a conservative estimation requiring up to 1/10 of the main walker list). Thus, in total, one needs to store 64-bit integers, where is the number of sites and N max dets is the maximum number of occupied determinants throughout the simulation. Based on this, we can approximate the memory usage: e. g., the periodic 102-site system at U/t = 8 was calculated with 3 × 10 7 walkers. Since the wavefunction is highly spread out, we can assume that the walkers also occupy roughly 3 × 10 7 determinants. According to Eq. (50), this then requires 2.52×10 8

System
Sites N/10 6 g n Gutzwiller is obtained by the projection scheme presented in [30]. Optimization of g using the scheme presented in Fig

B. Heisenberg Model
When the Coulomb repulsion U in the Hubbard model is much larger than the hopping parameter t, double occupancies become energetically unfavorable. One can then project the Hubbard Hamiltonian to the space of states with no double occupations and derive an effective Hamiltonian describing the low-energy excitations. At half-filling, these excitations are spin excitations, and the effective Hamiltonian is the spin-1/2 antiferromagnetic Heisenberg model with a coupling constant J = 4t 2 /U : where S i = (S x i , S y i , S z i ) is the spin operator on site i with The relation between the two Hamiltonians can be derived using second-order perturbation theory in the t/U parameter, which shows that neighboring electrons with opposite spins can benefit from virtual hopping forming double occupancies. The Heisenberg model is a basic and important model for studying magnetic materials and their phase transitions. Like the Hubbard model, it can be solved in the one-dimensional case using the Bethe ansatz [31,32]. In two dimensions, the model has been solved numerically to a high-accuracy for different lattice sizes using different methods including Green's function Monte Carlo (GFMC) [33][34][35], stochastic series expansion (SSE) [36], and world line loop/cluster methods (WLQMC) [37][38][39][40]. The model has also been used as a benchmark for recent variational ansatzes such as projected entangled pair states (PEPS) [41,42], neural quantum states (NQS) [43], and neural autoregressive quantum states (NAQS) [44].
In FCIQMC, the Heisenberg model is sign-problemfree on a bipartite lattice [10]. Walkers of different signs do not mix on such lattices, although all spawns have opposite signs to their parents. To see why, we split the lattice into two interlacing sub-lattices, where each site on one sub-lattice is surrounded only by the sites of the other. Then we can classify determinants into two groups, A and B, depending on whether a determinant has an even or an odd number of spin-up electrons on one of these sub-lattices. Applying the Hamiltonian to a determinant increases the number of spin-up electrons on one of the sub-lattices by one and decreases it on the other. Therefore, the Hamiltonian connects members of A only to other members of B and vice versa, so walkers of opposite signs never mix.
For periodic boundary conditions, we see that FCIQMC results agree extremely well with SSE and is more accurate than GFMC. The specific variant of GFMC used for obtaining these results is close in spirit to FCIQMC and formulated similarly as a projective Monte Carlo method in the Hilbert space [35]. The two methods differ mainly in the details of their walker dynamics and population control. As pointed out in Ref. [36], the discrepancy between SSE (and thus also FCIQMC) results and GFMC results is likely to be due to a remaining population control bias in the latter. For open boundary conditions, FCIQMC provides the same level of accuracy as the benchmark results of WLQMC. These two QMC methods provide, as expected, much lower energies than the other variational methods: PEPS and NAQS. The results confirm our approach's effectiveness in eliminating the population control bias and demonstrate the potential of FCIQMC in producing accurate results for large spin systems. Nonetheless, we note that the population control bias and the error bars of FCIQMC get larger as the lattice size increases. We attribute this behavior to two factors: the shortcomings of our simple guiding wavefunction and the system's closing gap in the thermodynamic limit.
One obvious drawback of the guiding wavefunction (47), in this case, is breaking the rotational symmetry of the Hamiltonian. To see this, remember that for the Heisenberg model, we use the eigenfunctions of the S z i operators as a single-particle basis and thus the diagonal Hamiltonian elements are only a function of the z-component of the spin However, the Heisenberg Hamiltonian of Eq. (52) is isotropic and thus the guiding wavefunction does not possess the necessary rotational invariance of the true ground state. The discrepancy between the guiding wavefunction and the exact ground state evidently becomes more critical as the systems size increases. It thus lessens the effectiveness of importance sampling in reducing the population control bias and the statistical error bars of the guided results. Moreover, it is has been shown that the gap between the ground state and the first excited state of the Heisenberg model scales inversely with the lattice size [45,46], which implies longer projection times. As a result, more correction terms need to be included in the reweighting procedure to remove the same amount of bias, which increases the statistical error bars of the corrected results further. While the closing gap is an inherent property of the system under study, the guiding wavefunction is under our control and can be improved to refine the results further and accelerate convergence.    A promising research direction would be employing one of the variational anastzes as a guiding wavefunction in FCIQMC. Particularly interesting are neural quantum states that provide compact representation and allow fast evaluations of the guiding wavefunction, a necessary property for its practical use in FCIQMC.

VII. SUMMARY
In this paper, we related the population control bias of FCIQMC directly to the covariance between the sampled wavefunction and the shift. We used this relation to explain why the bias scales inversely with the average number of walkers and quantify the resulting discrepancy between the different energy estimators. We then derived a post-processing reweighting procedure that practically eliminates the bias from different energy estimates, including the average shift. We also showed how importance sampling could dramatically reduce the bias and reduce the required number of correction terms and the associated error bars. Both techniques have only a small computational cost and make FCIQMC results practically exact (within statistical error bars) in the absence of a sign problem. Our approach's effectiveness makes FCIQMC competitive with other high-accuracy methods in its application to large sign-problem-free systems. Yet, its value is not restricted to these systems but extends to a broader range of systems with sign problems, whose study is the topic of an upcoming publication. Although N i and N above correspond to different iterations, their covariance still scales linearly with the number of walker as in Eq. (24). The main difference is that we expect this covariance to be weaker because the correlation between quantities at different iterations gets smaller when they are further apart.

Appendix B: Covariance of the Shift and the Number of Walkers
The covariance between the shift and the number of walkers can be related to the variance in the number of walkers as following: where Eq. (23) is used as an approximation. It is important to note here that Cov (N, S) does not scale with parameters γ, A, ∆τ and N as suggested by the above formula. The reason is that Var (N ) implicitly depends on those parameters and scales in the opposite direction. For example, by using half the value of γ, the variance of the number of walkers will almost double due to the looser control of the shift. The behavior becomes more evident by rewriting the covariance in terms of the shift variance Cov (N, S) ≈ − A∆τ γ Var (S) N , which leads to the exact opposite explicit dependence on these parameters. This argument can be extended to the covariance between the shift and the individual populations N i and explains why we could not establish a clear relation between the population control bias and the parameters γ, A or ∆τ .