Exploiting the hopping parameter expansion in the hybrid Monte Carlo (HMC) simulation of lattice QCD with two degenerate flavours of Wilson fermions

We show how the hopping parameter expansion at order $\kappa^2$ and $\kappa^4$ can be exploited in the simulation of lattice QCD with two flavours of degenerate Wilson fermions. A natural extension of this idea is a"UV-filtering"by using rooted polynomials. These approaches can be easily combined with, for example, mass preconditioning. First numerical tests are performed for the Wilson gauge action at $\beta=5.6$ and $\kappa=0.156$ and $0.1575$.


I. INTRODUCTION
Lattice QCD simulations are our primary tool to obtain non-perturbative results from QCD. To this end, a fair share of CPU time on the largest supercomputers that are available today is used. Still we would not mind, getting more accurate results from such simulations.
Hence any algorithmic progress is highly desirable. Here I shall address the generation of the gauge field.
In order to fix the notation, let us briefly recall the definition of lattice QCD. It is (1) The fermion fields ψ,ψ appear in bilinear form, and therefore can be integrated out exactly in the partition function Z. It remains an integral over the gauge field only: where M f [U] is the fermion matrix and the product runs over the flavours of the quarks.
In the literature different types of fermion actions are discussed. In the following we shall consider two degenerate flavours of Wilson fermions. The fermion matrix is given by where is the hopping matrix and the hopping parameter κ is a real number. The Wilson plaquette action is given by whereμ is a unit vector in µ-direction. For a more detailed discussion see for example the textbooks and review articles [1][2][3][4].
For lattice sizes that are needed to extract continuum physics from the simulation, it is by far too expensive to evaluate the determinant of the fermion matrix exactly. Therefore, following the proposal of Weingarten and Petcher [5], in the case of two degenerate flavours, one introduces auxiliary degrees of freedom, so called pseudo-fermions: where where φ is a vector with complex components. Hence the action, as a function of the gauge field and the pseudo-fermion fields, is given by Still the pseudo-fermion action is non-local and the evaluation requires the solution of a system of linear equations. The non-locality is in particular a problem for local algorithms that are used to simulate the pure gauge action. The hybrid Monte Carlo (HMC) algorithm [6] is better adapted to this situation, since all gauge degrees of freedom evolve simultaneously.
To this end, an artificial Hamiltonian is introduced: where the antihermitian momenta Π x,µ are conjugate to the gauge field U x,µ . They are auxiliary variables that are solely introduced for algorithmic reason. Their scalar product is defined as Here we follow the convention of, for example, ref. [7]. Note that in the literature often the factor 2 is omitted in the definition of the scalar product, see e.g. ref. [8]. Note that this leads to a relative factor √ 2 in the fictitious Monte Carlo time τ that is introduced below.
A discussion of this point is given in ref. [9], below eq. (3.2). The momenta and the gauge field evolve according the equations of motion where τ is the fictitious Monte Carlo time and the force F fulfills (ω, F ) = δ ω S(U) for infinitesimal variations of the gauge field δ ω U x,µ = ω x,µ U x,µ . Here we consider the so called φ-algorithm [10], where the pseudo-fermions stay fixed during the evolution of the gauge field and the momenta.
The equations of motion (11) can not be integrated exactly. Therefore a numerical integration scheme with a finite step-size is used. This leads to an integration error. The idea of the HMC-algorithm [6] is that this error can be corrected for by a Metropolis accept/reject step.
One update cycle (or trajectory) of the HMC is composed of the following three steps: • Perform a heat-bath for both the conjugate momenta Π and the pseudo-fermion field φ. In the case of the pseudo-fermion field one generates a field η with a Gaussian distribution P (η) ∝ exp(−|η| 2 ) and then Evaluate the Hamiltonian and save the initial gauge configuration U.
• Keeping φ fixed, we evolve the gauge field U and the conjugate momenta Π according to the classical equations of motion for the fictitious time τ . Since this can not be done exactly, a numerical integration scheme with the finite step-size δτ is used. At the end of the integration we have the fields U ′ , Π ′ , and φ ′ = φ. For a detailed discussion of the integration scheme see below.
• Accept U ′ as the new gauge field with the probability where else we keep U.
In order to fulfill detailed balance, the numerical integration scheme has to be area preserving and reversible. Reversible means that changing the sign of the momenta at the end of the integration time, we run back exactly to the initial gauge field U. Such integration schemes are called symplectic integrators. Let us introduce a short hand for finite update steps by δτ : Itegrators are build from these basic steps. Here we consider the second order Omelyan integrator [11], where we get for λ = 1/6 the scheme proposed in ref. [12], which is also discussed for example in ref. [13]. A trajectory of length τ is given by T m O with τ = m δτ . Taking λ = 1/2 the expression (18) simplifies to the well know leapfrog scheme: In our simulations, we use both the leapfrog and the Omelyan scheme with λ = 1/6. Sexton and Weingarten proposed a multilevel integration scheme [12]. Each level is associated with a term in the action. For example, in eq. (8), we can associate the gauge action with level i = 0 and the pseudo-fermion action with i = 1. For each level a time step δτ i = 2m i−1 δτ i−1 is defined. The scheme can be iteratively defined: and Note that for the leapfrog scheme, we use the convention δτ i = m i−1 δτ i−1 , which is more natural in this case. For a nice discussion of this scheme see for example section 2.2 of ref. [14]. The scheme can be generalized even further. The parameter λ might depend on the level i. Or me might use a fourth order scheme at low levels and a second order scheme at higher levels. An important property of symplectic integrators is that they preserve a so called shadow Hamiltonian. Here we will not delve into this discussion but refer the reader to refs. [13,15] and references therein.
Applying the HMC algorithm to the pseudo-fermion action (7), two problems are encountered: Going to lighter quark masses, sending κ to κ c , the condition number of the fermion matrix increases. As a result, for iterative solvers like the Conjugate Gradient (CG) or the Biconjugate gradient stabilized method (BiCGstab) [16,17], the number of iterations needed to solve the system of linear equations is increasing. The second problem is less obvious. It turns out that, in order to keep the acceptance rate fixed, the step-size of the integration scheme has to be reduced with decreasing quark mass. At the Lattice 2001 in Berlin the situation was referred to as "Berlin wall". At the time, it seemed impossible to reach sufficiently small masses, to reliably extrapolate, by using chiral perturbation theory, to the physical mass of the pion.
The situation considerably improved by the advent of better solvers, for example [18,19], and by replacing the pseudo-fermion action (7) by better alternatives. Note that the representation of the fermion determinant by pseudo-fermions is not unique. In [20] a large number of pseudo-fermion fields were introduced, allowing to express the fermion determinant in terms of a local pseudo-fermion action. This approach did not outperform the HMC algorithm in the end. It turned out that the large number of fields implicate that only small steps can be performed in the update. An alternative approach to local updating, which also did not outperform the HMC, is discussed in ref. [21]. See also [22] and references therein.
Based on this experience, alternatives to eq. (7), to be used in HMC simulations, were proposed. These are primarily mass preconditioning [23,24], domain-decomposition [7,25], and rooting [26]. The basic idea behind these approaches is to split the fermion matrix M into (several) factors, and introduce a separate pseudo-fermion field for each of the factors.
By using a suitable factorisation, the stochastic estimate of the fermion determinant becomes less noisy, allowing for a larger step-size in the integration scheme. A second potential advantage is that different parts of the pseudo-fermion action can be put on different timescales of the integration scheme [12]. In the ideal case, the numerically most expensive parts can be put on large time scales.
In the case of a finite step updating scheme [21], the multiboson (MB) algorithm [20] and the polynomial hybrid Monte Carlo (PHMC) algorithm [27][28][29], it has been shown that the updating scheme becomes more efficient by incorporating the hopping parameter expansion [30][31][32]. The hopping parameter expansion, taken at a low order, is used as UV-filter for the pseudo-fermion action. Here we demonstrate how this can efficiently be done for the HMC algorithm applied to two degenerate flavours. Compared with the simulation using the pseudo-fermion action (7) we get a speed-up of a factor of two or three, depending on the order of the hopping parameter expansion. In large scale simulations, this idea can be combined with mass preconditioning [23] and might lead to a speed-up of the order of 20%.
Furthermore we give a preliminary discussion of UV-filtering by using rooted polynomials.
The motivation is similar to ref. [26] and could also be seen as a natural extension of the UV-filtering by using the hopping parameter expansion.
The outline of the paper is the following. In the next section we discuss in detail how the hopping parameter expansion is used as UV-filter. Then we discuss how this idea can be naturally extended by using polynomial approximations of the rooted inverse of the fermion matrix. We briefly summarize results on the acceptance rate, the variance of ∆H and the forces that are given in the literature. Then in section IV we discuss our numerical results.
First we study the effect of UV-filtering by using the hopping parameter expansion up to the orders κ 2 and κ 4 . Then we present our still very preliminary results on rooted polynomials.
Finally we give a summary and an outlook.

II. INCORPORATING THE HOPPING PARAMETER EXPANSION INTO THE HYBRID MONTE CARLO SIMULATION
In the case of two degenerate flavours, the fermion determinant can be expressed as where one expands For small values of n, TrH n can be evaluated analytically. In the case of Wilson fermions, terms with odd values of n do not contribute. Furthermore n = 2 also does not contribute.
The leading non-vanishing contribution TrH 4 amounts to a plaquette term. This can be written as a shift of the parameter β. In the case of two degenerate Wilson fermions one gets ∆β = 96κ 4 . For n = 6 we get contributions from three different Wilson loops. With increasing n, the number of Wilson loops that contribute, rapidly increase and things become intractable. For a more detailed discussion see sect. III of ref. [33]. In the case of clover-improved Wilson fermions the situation is worse. There is already a non-vanishing contribution for n = 2, see eq. (2.8) of ref. [32]. Already n = 4 was not considered in ref.
In the simulation we consider a modified gauge actioñ where k is the order, up to which TrH n in terms of Wilson loops is tractable in the simulation.
In ref. [21] we discussed preconditioning by using the hopping parameter expansion in the context of a finite step updating scheme. To this end the value of the pseudo-fermion action has to be evaluated. Following eq. (8) of ref. [21] a modified fermion matrix is introduced and correspondinglyS The idea is thatS P F fluctuates less than S P F and hence allows for a larger step-size in the HMC-simulation. In ref. [21] we evaluatedM −1 φ by using the series expansion ofM −1 in κH. Also in the case of the MB algorithm [30,31] and the PHMC algorithm [32] it is natural to representM by using a polynomial in M.
Here we discuss an alternative representation that is more suitable for the HMC algorithm applied to two degenerate fermion flavours. In particular, we expressM −1 essentially in terms of M −1 to make use of iterative solvers to computeM −1 φ. For simplicity, let us first discuss the case k = 1. The series expansion of the inverse ofM in κH is given bỹ Since all coefficients of the expansion of M −1 are equal to one, we can easily evaluate the coefficients a n = n i=0 Hence we can writeM where α = exp(−1) and Since b n rapidly converges to 0, the sum ∞ n=0 b n κ n H n (31) can be truncated at a low order n max . For larger values of k we get a similar result, where α = exp(− k n=1 1/n). The coefficients b n can be evaluated by using an algebra program like Maple or Mathematica. Note that the coefficients in eq. (24) are tunable parameters of the algorithm. Previous experience [21,32] however shows that taking the values given by the hopping parameter expansion is a good choice. Here we will not further discuss this question. Now let us discuss how the HMC algorithm can be implemented forM −1 . The crucial question is how the forces can be computed. Here we have to put together the results obtained for the HMC algorithm and the PHMC algorithm. The variation of the pseudofermion action S P F with respect to the gauge field can be computed as where The variation of the polynomial has been worked out in ref. [28,29]. We follow the implementation of ref. [29,32] using Horners scheme. Here we need the variation ofS P F = |M −1 φ| 2 Note that we are free to take n t < n max , since the truncation error introduced is corrected for in the accept/reject step, where the summation is performed up to n max . We get where where In order to compute the variation for the polynomial efficiently, n t vectors have to be precomputed, following Horners scheme: and then recursively down to Then we compute recursively and The variation of the pseudo-fermion action can be written as In the following we refer to exploiting the hopping parameter expansion up to order κ k as κ k -filtering.

A. Rooted polynomials
In our simulations we make use of the hopping parameter expansion up to κ 4 . It is practically impossible to push the hopping parameter expansion to higher order. Therefore, with a similar motivation as ref. [26], where the rational HMC is considered, we propose to use rooted polynomials as UV-filters. Also note that For a discussion see section II. B. of ref. [21]. This means that for sufficiently large N, we can approximate the hopping parameter expansion by using low order polynomials that approximate M −1/N . Let us define M 0 =M , eq. (25), and then recursively up to some maximal j max , where where n j > n j−1 . The remainder can be written as where b n and α are computed by using an algebra program.
The construction proposed here contains both the noise reduction by rooting as proposed in [21,26] as well as a hierarchical splitting similar to mass preconditioning. Note that a hierarchical splitting, in the framework of the PHMC, was already discussed in refs. [34][35][36].
In particular, aiming at the application to a single flavour, one would like to investigate how well M −1/N jmax+1 can be approximated by a rational approximation. Also it might be feasible to compute detM jmax+1 , without using a noisy estimator, since likely only a few smallest eigenvalues of M contribute. In this case, it might be sufficient to compute detM jmax+1 in the accept/reject step only.
In our numerical tests we have used j max = 2 and N 1 = N 2 = N for simplicity. The general framework contains a large number of free parameters that is hard to tune without having a theoretical understanding of the dependence of the acceptance rate on these parameters. Ref. [13] and possible extensions might be helpful to this end.
In the case of the pseudo-fermion action (7) it is simple to perform a heat-bath update, eq. (12), of the pseudo-fermions at the beginning of the trajectory. The fermion matrix M has to be applied to a vector with a Gaussian distribution. In the case of the rooted polynomials the numerical costs are considerable larger, since W j has to be represented by a high order polynomial in M or equivalently H. In our preliminary study, we implemented the heat-bath update of the pseudo-fermions associated with W j in the straight forward way.
A more efficient solution is provided by ref. [37], where only a good approximation of W j is needed to update the pseudo-fermions.

B. Even/odd preconditioning
In all our numerical tests, we started from the even/odd preconditioned fermion matrix where e and o denote the collection of even and odd sites, respectively. Note that detM oo = detM and the condition number of M oo is reduced compared with M. In the discussion of the algorithm above, essentially κH has to be replaced by κ 2 H oe H eo . Note that indices in section IV below, refer to powers of κ 2 H oe H eo . Note that in ref. [29] it is explicitly spelled out, how the PHMC algorithm can be implemented for even/odd preconditioned clover-improved Wilson fermions.

III. THE ACCEPTANCE RATE AND FORCES
Typically the step-size of the HMC is tuned such that the acceptance rate 0.8 P acc 0.9. The optimal value depends on the integration scheme that is used. Also the occurrence of spikes might require to decrease the step-size δτ . Spikes mean that occasionally ∆H ≫ 1 appears in the simulation. Here we have encountered this phenomenon when using the second order Omelyan integrator.
The acceptance rate can be determined by simply counting the accepted configuration.
The statistical error is reduced by sampling min See eq. (3.1) of ref. [13] and references therein. In our simulations, as long as no spikes occur, eq. (50) turned out to be valid to good precision.
The HMC simulation using improved pseudo-fermion actions [7,24,26,[34][35][36] requires to tune a number of parameters. Therefore it is highly desirable to know how the acceptance rate, or equivalently Var(∆H), depends on these parameters. A step in this direction is taken by ref. [13], where the variances of the forces associated with the different parts of the action are related to Var(∆H). For the second order Omelyan scheme with λ = 1/6 the authors of ref. [13] find, see their eq. (3.4), Note that for λ = 1/6 also other terms than the forces appear at the order δτ 4 . For a more general result see ref. [38]. A main ingredient in the derivation of eq. (51) is the fact that a symplectic integrator conserves a shadow Hamiltonian. The deviation of the shadow Hamiltonian from the true Hamiltonian can be computed as a power series in the step-size δτ . Furthermore, it is assumed that the forces due to different pieces of the action are not correlated.

IV. NUMERICAL RESULTS
The study is performed on three servers with two CPUs with 10 cores each, that were immediately available to us. For programming convenience no highly optimized code was used. As solver, we have used the BiCG-stab [16,17] algorithm. Here we did not experiment much with the stopping criterion, but did run the solver essentially up to machine precision.
We simulate comparatively small lattices at β = 5. 6. In particular we have tested κ 2 -and A rather detailed study at this value of β is presented in ref. [39]. Based on the Sommer scale r 0 [40], the authors of ref. [39] find that for β = 5.6, κ = 0.156 on a 16 3 × 32 lattice a = 0.09796(56) fm. For the same parameters they find m P S = 0.9002(69) GeV for the mass of the lightest pseudo-scalar particle. For β = 5.6, κ = 0.1575 on a 16 3 × 32 lattice they obtain a = 0.0839(11) fm and m P S = 0.6524(86) GeV. This means that the masses are still quite large compared with the mass of the pion m π 0 ≈ 135 MeV. Note that a number of algorithmic studies were performed at β = 5.6, the values of κ and lattice sizes that were studied in ref. [39]. See for example [7,14,34].

A. Exploiting the hopping parameter expansion
In this set of simulations, we tested the efficiency of κ k -filtering. To this end, we simulated the system with the pseudo-fermion action (7) and the modified pseudo-fermion action (26) up to κ 2 and κ 4 . We simulated by using the leapfrog as well as the second order Omelyan integrator at λ = 1/6. In both cases, we used two time scales. On the coarse time step we put the pseudo-fermion action and on the fine one the gauge action. The time step of the gauge action was chosen to be such that further decreasing it, virtually does not increase the acceptance rate. Next we have to decide how to truncate eq. (31). In the extended runs that we performed first, we set ad hoc n t = 7 and n max = 19 for κ 2 . Note Later we carefully checked the dependence of the acceptance rate on n t . Furthermore we demonstrate that the value of n t has no influence on the reversibility.

Extended runs
We performed a few extended runs. This way we checked for spikes in ∆H and tried to estimate autocorrelation times. Throughout we used trajectories of the length τ = √ 2, corresponding to τ = 1 in the convention of, for example, ref. [8].
A first set of runs was performed by using the leapfrog integration scheme. We performed preliminary simulations to find the step-size δτ that gives P acc ≈ 0.8. In table I we summarize the results of our extended runs. The plaquette value is P = 1 3Np p ReTrU p , where the sum runs over all plaquettes on the lattice and U p denotes the ordered product of the gauge variables around the plaquette p and N p is the number of plaquettes. Since the effort required for the evaluation of the polynomial (31) is small compared with that for the iterative solver, the performance gain achieved by the κ k -filtering is essentially given by the ratio of the step numbers m. This means that even in the case of κ 2 -filtering that is still achievable in the case of clover-improvement [32], we see a gain of a factor of two. Next we redid the exercise by using the second order Omelyan integrator. In order to get an acceptance rate of ≈ 80%, we find from preliminary simulations that m = 18 and 8 for the order 0 and 2 are needed, respectively. Hence the performance gain is even a bit larger than in the case of the leapfrog TABLE I. Extended runs using the leapfrog algorithm for a 12 3 × 24 lattice at β = 5.6 and κ = 0.156. We study the effect of κ n -filtering. stat gives the number of trajectories, the number of coarse time steps m, and the expectation value of the plaquette P . The acceptance rate is given by P acc = min [1, exp(−∆H)] . In all three cases, the estimate of P acc obtained from the variance Var(∆H), by using eq. (50) is consistent with the result given in column 5. integrator. Performing longer runs, spikes in ∆H appeared. Therefore we do not further discuss these runs. It is known that the second order Omelyan integrator is more susceptible to this problem than the leapfrog. The problem can be cured by reducing the step-size. In the case of κ 4 -filtering we could not find an m that gives an acceptance rate of ≈ 80%. For m = 6, the acceptance rate is considerably larger and for m = 5 it is smaller. We decided to perform a longer run for m = 6. From 24540 trajectories we get P = 0.56991(2), P acc = 0.8830 (15), and Var(∆H) = 0.0886 (15). In this run no spikes appear. We find that the direct determination of P acc and the result obtained from eq. (50) are consistent. From this run we get the estimates τ int,P = √ 2 × 9.3(1.7) and τ int,iter = √ 2 × 24.8(4.5) for the integrated autocorrelation times of the plaquette and the iteration number of the solver, respectively. Given the relatively low accuracy of the autocorrelation time, we are not able to decide whether the UV-filtering has an influence on the autocorrelation time.

The forces
As it is argued in ref. [13], the acceptance rate can be inferred from the variance of the forces Var(|F | 2 ). Computing Var(|F | 2 ) for κ 4 -filtering, we get essentially consistent results from the run with the leapfrog and the second order Omelyan integrator. We conclude Var(|F P F | 2 ) = 57500(1000), where the error is only a rough estimate. In the case of the runs without filtering and κ 2 -filtering, using the leapfrog integration scheme, we get Var(|F P F | 2 ) = 11400000(200000) and 344000(4000), respectively. The runs with the second order Omelyan scheme contain spikes in ∆H. These spikes can also be seen in F P F . As a result, Var(|F P F | 2 ) is by far larger than for the runs with the leapfrog. Excluding the spikes by hand, Var(|F P F | 2 ) is much reduced, and very roughly consistent with what we find in the runs with the leapfrog integrator. Following eq. (51), keeping Var(|F P F | 2 ) δτ 4 fixed, should result in a fixed acceptance rate. Indeed, (11400000/344000) 1/4 ≈ 2.4 and (11400000/58000) 1/4 ≈ 3.7 are roughly consistent with the speed-ups that we have observed directly.
For the gauge action, we get from the runs with the leapfrog and the second order Omelyan scheme for both κ 2 -filtering and no filtering consistent results that can be summarized as Var(|F G | 2 ) = 28800000(400000). In the case of κ 4 -filtering, due to the larger value of β inS G , we get the larger value Var(|F G | 2 ) = 30000000(400000). We checked that, also according to eq. (51), our choices of δτ G are small enough, not to influence the acceptance rate markedly.

Truncation of the series and reversibility
In principle we can relax the accuracy of the calculation of the force to the point, where the acceptance rate is markedly affected. However it turned out that, using iterative solvers, the reversibility of the integration is increasingly violated with decreasing accuracy of the solution. With exact numerics, reversibility would be given at any precision of the solver.
However we work with double precision numbers, and rounding errors occur. Furthermore, iterative solvers approach the solution in a chaotic way. Hence, if we stop the solver at a moderate precision, deviations caused by rounding errors are blown up. This phenomenon does not occur when we evaluate a series with fixed coefficients. Therefore the truncation at the order n t < n max of the sum (31) can be chosen such that the acceptance rate is reduced by little compared with larger values of n t . We checked this reasoning for κ 4 -filtering and the second order Omelyan scheme at λ = 1/6. To this end, we selected ten configurations, which were separated by 400 trajectories each from our extended run. For each of these configurations, we started a trajectory using the same parameters as for our extended run.
At the end of the trajectory the momenta are reversed and the trajectory is run backwards, resulting in the configuration U ′ . We compute For n t = 5, 6, and 15 and running the solver essentially up to machine precision, we get ∆ ≈ 6.3 × 10 −21 for all three choices. Instead, keeping n t = 15 fixed and relaxing the stopping criterion of the BiCG-stab, ∆ is clearly increasing.

The acceptance rate as a function of n t
For both κ 2 -and κ 4 -filtering, we performed runs with different values of n t . We used the second order Omelyan scheme with λ = 1/6 throughout. In all cases the trajectories have the length τ = √ 2. As expected, we find that with increasing n t the acceptance rate rapidly reaches a plateau value.
For κ 2 -filtering, we first performed runs with m = 8. Similar to the extended run, spikes appeared. Therefore we redid the runs with m = 10, where we did not encounter this problem for n t > 3. The results are summarized in table II. The acceptance rate as well as Var(∆H) rapidly approach a plateau, which is reached at the level of our numerical precision for n t = 5. For n t > 3, the estimate of P acc obtained from the variance Var(∆H), by using eq. (50) is consistent with the direct measurement.
Our results for κ 4 -filtering are summarized in table III. Also here the acceptance rate as well as Var(∆H) rapidly reach a plateau value. At the level of our accuracy this happens for n t = 7. As expected, this value is larger than for κ 2 -filtering.
We conclude that the choice of n t is uncritical. Using a few short runs we can locate the point, where the acceptance rate as a function of n t levels off. In the production run we then add a small safety margin.

Scaling with the lattice size and κ
To get an idea how the performance scales with the hopping parameter κ, we performed two short runs at κ = 0.1575 on a 16 3 × 32 lattice by using κ 4 -filtering. In both cases the length of a trajectory is τ = √ 2. We started the simulations with a configuration taken from the runs discussed in the section below. In the first simulation we used the leapfrog Omelyan scheme at λ = 1/6. We give the acceptance rate and Var(∆H) as a function of the maximal summation index n t . Throughout we use m = 10 and the length of the trajectory is In the run for n t = 3 spikes occurred. After removing them by hand we get Var(∆H) = 0.88 (5).

B. Runs with rooted polynomials
We performed a few runs with the rooted polynomial action. We simulated a 16 3 × 32 lattice at β = 5.6 and κ = 0.1575. All runs are characterized by j max = 2. In all cases we use for simplicity the leapfrog scheme with different time scales. Throughout we use the trajectory length τ = √ 2.

Without hopping parameter expansion
In this first set of runs we simulated without making use of the hopping parameter expansion. The polynomials are characterized by n 1 = 8 and n 2 = 32 and rooting with N = 2, 3, 4, 6, 8, and 16. In figure 1 we show the coefficient b n of eq. (47) for N = 2, 3, and 4. For n > n 2 = 32, b n is oscillating, with a decreasing amplitude. As it can be seen from the figure, the decay is exponential in n. The decay becomes faster with increasing N. With increasing N, the decay rate converges to a finite limit. In table IV we summarize the basic  parameters of the simulations and give the acceptance rate P acc and Var(∆H). We have taken m such that P acc ≈ 0.8. The parameters m 2 , m 1 , and m 0 are chosen ad hoc and are likely larger than the optimal values. Note that error bars might be underestimated, since the lengths of the runs are relatively short. It is reassuring that our estimates of P are consistent with the result given in table I of [39].
In table V we summarize the results obtained for the variances of the forces. As one might expect, Var(|F G | 2 ) does not depend on N. Furthermore, comparing with the runs for the 12 3 × 24 lattice of the previous section, we see that Var(|F G | 2 ) is roughly proportional to the volume of the lattice. In the case of the rooted pseudo-fermion action we find that Var(|F P F,1 | 2 ) and Var(|F P F,2 | 2 ) are decreasing with increasing N. In the limit N → ∞, a finite value, corresponding to the hopping parameter expansion should be reached. Here, it seems that we are still far away from this limit. Going from N = 8 to 16, Var(|F P F,1 | 2 ) and Var(|F P F,2 | 2 ) are reduced by roughly a factor of four. Following eq. (51), this should allow to increase the corresponding step-size by a factor of √ 2. Since the numerical effort for evaluating S P F increases by a factor of two, the algorithm becomes less efficient. In order to compare the numerical costs, we define the cost index c = n j N Var(|F P F,1 | 2 ) 1/4 , where the exponent 1/4 is motivated by eq. (51). Our results are summarized in table VI. In the case of S P F,1 we see a small increase from N = 2 to 4. For S P F,2 the cost index is very similar for N = 2, 3, and 4. On the other hand, Var(|F P F,3 | 2 ) is clearly decreasing going from N = 2 to 4. The costs related with S P F,3 depend on the solver that is used. Here we made no effort to find the optimal solver. Therefore we refrain from quoting a performance index for S P F,3 .
Anyway, it seems likely that the optimal overall performance is reached for N > 2.

Employing κ 4 -filtering
UV-filtering by using the hopping parameter expansion can by easily implemented in the PHMC-algorithm [32]. Here we perform a preliminary study, employing κ 4 -filtering. We consider polynomials characterized by n 1 = 16 and n 2 = 42 and N = 8. The remainder is characterized by α = 0.01390254... . Note that in the limit N → ∞ one gets α = 0.01321050... .

V. CONCLUSION AND OUTLOOK
We discuss how the hopping parameter expansion can be used as an efficient UV-filter in the HMC simulation of lattice QCD with two degenerate fermion flavours. We have carefully tested the idea for the Wilson gauge action and Wilson fermions at β = 5.6 and κ = 0.156 and the relatively small lattice size 12 3 × 24. Compared with the pseudo-fermion action (7) we find a speed-up of a factor of two and three, using κ 2 -and κ 4 -filtering, respectively. The latter result is confirmed by short runs performed for a 16 3 × 32 lattice and κ = 0.1575.
In large scale simulations the idea can be combined with mass preconditioning or domain decompositioning. In the case of mass preconditioning one might be able to skip the term in the action that corresponds to the most heavy mass. In the case of domain decompositioning one applies the idea to the fermion matrix that is restricted to the domains. The speed-up achieved this way might be of the order of 20%.
A natural extension of applying the hopping parameter expansion as UV-filter is the use of rooted polynomials. This idea is related with the rooting proposed in ref. [26] as well as the idea of hierarchically factorised polynomials [35,36]. Here our results are still preliminary, and both a better theoretical understanding as well as further numerical experiments are needed.

This work was supported by the Deutsche Forschungsgemeinschaft under the grant No
HA 3150/4-1.