Multigrid for Chiral Lattice Fermions: Domain Wall

Critical slowing down for the Krylov Dirac solver presents a major obstacle to further advances in lattice field theory as it approaches the continuum solution. We propose a new multi-grid approach for chiral fermions, applicable to both the 5-d domain wall or 4-d Overlap operator. The central idea is to directly coarsen the 4-d Wilson kernel, giving an effective domain wall or overlap operator on each level. We provide here an explicit construction for the Shamir domain wall formulation with numerical tests for the 2-d Schwinger prototype, demonstrating near ideal multi-grid scaling. The framework is designed for a natural extension to 4-d lattice QCD chiral fermions, such as the M\"obius, Zolotarev or Borici domain wall discretizations or directly to a rational expansion of the 4-d Overlap operator. For the Shamir operator, the effective overlap operator is isolated by the use of a Pauli-Villars preconditioner in the spirit of the K\"ahler-Dirac spectral map used in a recent staggered MG algorithm [1].


Introduction
Increasingly powerful computers and better theoretical insights continue to improve the predictive power of lattice quantum field theories, most spectacularly for lattice quantum chromodynamics (LQCD) [2]. However, with larger lattice volumes and finer lattice spacing exposing multiple scales, the lattice Dirac solver becomes increasingly ill-conditioned threatening further progress. The cause is well known: as the fermion mass approaches zero the Dirac operator becomes singular due to the exact chiral symmetry of the Dirac equation at zero mass, causing critical slowing down [3]. The algorithmic solution to this problem for lattice QCD was recognized 30 years ago [4]: the fine grid representation for the linear solver should be coupled to multiple scales on coarser grids in the spirit of Wilson's real space renormalization group and implemented as a recursive multi-grid (MG) preconditioner [5]. Early investigations in the 1990's introduced a gauge-invariant productive MG algorithm [6,7] with encouraging results for the Dirac operator in the presence of weak (or smooth) background gauge fields near the continuum. However, in practice lattice sizes at that time were too small and the gauge fields were too rough to achieve useful improvements.
It was not until the development of adaptive geometric MG methods [8,9] that a fully recursive MG algorithm, capable of projecting strong background chromodynamics fields onto coarser scales, was found for the Wilson Dirac discretization. However there are two other discretizations, referred to as staggered [10] and domain wall [11] fermions, that are used extensively in high energy applications that more faithfully represent chiral symmetry on the lattice. The extension of adaptive geometric MG to these discretizations has proven to be more difficult, perhaps related to the improved lattice chiral symmetry. There has been progress on a two-level MG algorithm for domain wall fermions [12,13,14] and a non-Galerkin algorithm for the closely related overlap operator [15,16]. Recently an adaptive geometric multigrid algorithm for staggered fermions was discovered based on a novel pre-conditioner inspired by the Kähler-Dirac spin structure [17,18].
Here we propose a new approach to the domain wall discretization which leverages, at least on a heuristic level, features developed from both the Wilson and staggered MG methods. We hope that a comparison of methods will lead to new optimizations across the full set of discretizations. The design strategy of our domain wall MG algorithm consists of trying to separate the 4-d physical subspace of low modes found in the effective 4-d overlap operator from the larger 5-d domain wall vector space. This procedure is conveniently enumerated in 3 steps: i. Approximate Pauli-Villars preconditioning of the domain wall operator [19]. ii. Wilson kernel MG projection on the domain wall and Pauli-Villars factors [9]. iii. Truncated projection/prolongation restricted to the domain wall boundary.
The salient features of each step are: i.) The exact Pauli-Villars inverse D −1 P V , which is a perfect map from the DW spectrum onto overlap, is well approximated by the application of the Pauli-Villars adjoint, D † P V . ii.) A Galerkin MG coarsening is applied to the 4-d Wilson kernel on each extra-dimensional slice separately for both the domain wall and Pauli-Villars factors. The null space projection is formulated entirely from the 4-d Wilson kernel and does not scale with the size of the extra dimension. iii.) Finally, within the multigrid cycle, the residual coarsening and error interpolation is restricted to the domain wall, which in turn allows the extent of the extra dimension of the coarse-level operator to be reduced.
We again follow the successful development strategy for the Wilson [8] and staggered [1] MG algorithms by using the two-flavor lattice Schwinger model [20,21] as a prototype for exploration and testing. The reader is referred to Fig. 3.1 and the accompanying Table 2 for a concise summary of the performance of our domain wall algorithm for the 2-d Schwinger model. In Sec. 2, the underlying motivation and formalism is given. For simplicity, the discussion is restricted to the Shamir domain wall operator [22]. This is followed in Sec. 3 by the details of the implementation and benchmarks for our prototype 2-d Schwinger model. Care is taken to present the formalism in a dimension-agnostic form to accommodate extensions from 2-d to 4-d gauge theories. In Sec. 5 we conclude by noting that our core developments not only apply to the 4-d Shamir formulation presented here but also to the Möbius [23,24], Borici [25,26], and Zolotarev [27,28] formulations, as well as directly to the overlap operator approximation to the sign function [15,29,30,31].

Domain Wall Formalism
All lattice discretizations of the Dirac operator seek to rapidly approach the continuum Dirac operator, Dψ(x) = γ µ (∂ µ − iA µ (x))ψ(x) + mψ(x) , (2.1) as the lattice spacing vanishes. The continuum operator is a first derivative, an anti-Hermitian operator, thus the spectrum is imaginary indefinite except for a small real shift for m > 0. It obeys an exact chiral symmetry at zero mass (m = 0). The Wilson discretization,

2)
introduces an anti-Hermitian "naïve" first difference and adds a Hermitian second-difference (or so-called Wilson term) to lift the doublers to the cut-off scale π/a at the expense of explicitly violating lattice chiral symmetry at O(a) in lattice spacing. The Wilson lattice operator then requires fine tuning of the bare quark in order to restore chiral symmetry in the continuum limit.
The chiral overlap [15,32] and domain wall [22] fermions, beyond their remarkable physical properties, have the feature that the Wilson kernel can be re-purposed. In the domain wall (DW) approach, the Wilson kernel is present in an extra dimension separating 4-d domain walls by a lattice of length L s . Suppressing the four-dimensional indices, the DW operator is given by where P ± = 1 2 (1 ± γ 5 ). The indices s, s = 1, · · · L s label 4-d blocks in the extra fifth dimension (or d+1 dimension). The bulk mass, M 5 < −1, is tachyonic. The physical bare mass parameter is encoded by the boundary parameter m.
In the limit of L s → ∞ an exact lattice chiral symmetry appears up to an explicit fermion mass gap given by The result is that propagators between the domain walls are described by the effective 4-d overlap operator proposed by Neuberger [15,33] with the deformed chiral algebra of the Ginsparg-Wilson identity [34], at zero quark mass. The explicit spectral map from the domain wall to the overlap operator will be presented in Sec. 2.2, following closely the notation in [19] which we will refer to as BNO. This spectral map between domain wall and overlap operators plays a central role in our DW MG algorithm.

Standard Approaches and Shortcomings
The domain wall operator encodes chiral symmetry in a subtle and indirect fashion. The full spectrum of the domain wall operator in Eq. 2.3, illustrated in Fig. 2.1 for two dimensions, does not have the expected small eigenvalues of the continuum as you approach the chiral limit, but instead has O(L s ) small eigenvalues. This can easily be seen in the free limit (U = 1) which at zero momentum has the spectrum, λ n = m 1/Ls e i(2n + 1)π/L s for n = 0, 1, ..., L s − 1 . (2.6) In the exact chiral limit, first taking L S → ∞ followed by m → 0, this operator has no zero modes. Instead they form a unit circle around the origin in the complex plane. This feature persists when gauge fields are turned on as illustrated in Fig. 2.1 for two dimensions with Abelian gauge fields.
The DW operator features further issues. Unlike with staggered or Wilson fermions, the DW operator is dramatically non-normal ([D DW (m), D † DW (m)] = 0), even in the exact free field. The spectrum does not satisfy the half-plane eigenvalue condition with positive real values (Re[λ] > 0). These two properties defeat reliable convergence properties of traditional Krylov solvers. For example, consider a normal indefinite matrix, whose spectrum fits in a circle of radius r centered at a complex point c ∈ C. For GMRES methods, one can show the relative residual on iteration n of a GMRES method is bounded by |r/c| n [35].
A standard method to solve the domain wall linear system is to replace with the normal system, This system has multiple benefits but also complications.
The normal operator encodes a single low mode with, in the free-field limit, eigenvalues O(m 2 ). The low modes are "bound" to the domain wall, as can be visually inspected by looking at the profile of the eigenvectors (the singular vectors of the domain wall operator) in the bulk dimension. Both of these properties form a stark contrast with D DW ; the normal operator transforms the physics of the domain wall operator into as a single chiral fermion below the cut-off.
From a numerical standpoint, the normal operator is Hermitian positive definite (HPD) and can be solved efficiently by traditional Krylov methods, e.g., Conjugate Gradient. Further, it is amenable to deflation with eigenpairs generated via an efficient Lanczos process. Also as a Hermitian positive definite matrix, the solver for this normal operator can be implemented as a traditional MG algorithm [12,13]. However, a numerical implementation of the coarsened normal operator, a distance-two stencil, requires a non-trivial increase in computation and communication relative to a distance-one stencil [36,37,38,39]. This owes in large part to a far more complicated gather pattern due to around-the-corner terms, which the original fine level original normal operator avoids by being the product of two distance-one operators.
One solution to this issue is to recognize that this normal operator is the square of a distance-one "γ 5 " Hermitian operator, where Γ 5 = γ 5 R is a product of γ 5 and the reflection in the extra dimension by R ss = δ (s+s −1)%Ls,0 . This operator has an indefinite real spectrum similar to the imaginary spectra in the continuum and staggered operator on the lattice. Appealingly, the operator also has a single chiral mode with eigenvalues of O(m). However, this operator itself has its own frustrations: while it can be coarsened, it develops spurious small eigenvalues similar to with a naïve approach to staggered fermions, which was shown in [1] to harm a fully recursive algorithm.
The Γ 5 D DW operator leads itself to a clean interpretation of spurious low modes via a free-field analysis. The low modes of Γ 5 D DW do not include support on the bulk; as such, a "coarsening" inspired by the low modes eliminate it. In this case, the coarsened operator can be shown to be γ 5 D naïve with well understood extra low modes. This is actually a foretelling of a salient feature of our algorithm: the bulk dimension cannot be trivially eliminated. As a last concern, this approach does not generalize beyond Shamir domain wall fermions: the fully general Möbius formulation has a Wilson kernel inverse as part of its definition of Γ 5 . Our new approach seeks to avoid these difficulties base on the spectral map form the domain wall and to the overlap representations in BNO, combined with methods borrowed from prior multigrid algorithms for Wilson [8] and staggered [1] discretizations.

Spectral Map from Domain Wall to Overlap
One may view the Pauli-Villars operator, D P V ≡ D DW (1) as a left preconditioner of the domain wall operator in the linear system (2.10) We will show that the Pauli-Villars operator is an ideal, albeit expensive, preconditioner and that even a simple approximation to the inverse dramatically accelerates convergence. This is accomplished via the generalized eigenmode problem, D DW (m)Ψ λ = λD P V Ψ λ , which separates the low chiral generalized eigenvectors (λ 0) bound to the walls at s = 1 and s = L s , from the high bulk modes at the cut-off: λ = O(π/a).
To see this explicitly, it is convenient as in BNO to first move both walls to s = 1 by introducing a cyclic permutation of the negative chiral modes at s = L s to s = 1 by with P ± = 1 2 (1 ± γ 5 ). This defines a unitary transformation of our domain wall operators, Following the derivation via LDU transformation in BNO, the preconditioned matrix in this permuted chiral basis, is mapped into a block diagonal form [19] in the extra dimension. This remarkable identity is the central observation for our preconditioned multigrid algorithm. The effective overlap operator block, K DW 1,1 (m) = D ov (m), has all the non-trivial low eigenvalues. The additional extra heavy modes are mapped exactly to unit eigenvalues (or in physical units at the 1/a), irrespective of L s , lattice spacing (here scaled to a = 1) and gauge interactions.
Parenthetically, we should acknowledge that this general mechanism to isolate the low domain wall modes in an effective overlap block is well known and it is the fundamental insight to chiral lattice fermions [32]. In particular the Monte Carlo sampling of the path integral must divide by the Pauli-Villars determent to give a finite determinant ratio, (1)] in the continuum. As explained by Kaplan and Schmaltz in [40] in an elegant exposition based on a kinematic super symmetry cancellation between the bulk fermion and the bosonic Pauli-Villars pseudofermions, broken only by domain wall boundary, to give rise to boundary chiral modes via the Callan-Harvey descent relations [41]. The block structure of K DW (m) lends itself to a structured block-inverse given by The identification of the overlap propagator G 11 ≡ D −1 ov (m) agrees with the practice of computing DW propagators by solving the linear system, D DW (m)G = D P V b, from an arbitrary source on the wall b ∼ δ 1,s . The heavy modes (0, 0, · · · , 1, · · · , 0) are static. On the other hand, the non-zero elements in the first column show that the chiral modes bleed exponentially into the interior by factors, ∆ s+1 = T ∆ s = T s /(1 + T L ) in terms of the transfer matrix: in the Shamir implementation. In the limit L s → ∞ this becomes the exact sign function: We can trace the sparse block structure to the mass dependence on the dyadic structure at the boundaries relating the Pauli-Villars operator to the domain wall operator, After applying the Sherman-Morrison-Woodbury formula [42], and again considering the chiral basis, V T → V T P = 1 0 0 · · · 0 , we see a clear projection onto the first column in K DW s,s (m), reproducing the sparse structure in Eq. 2.14.
Free-Field Limit: The analysis of the free-field (U = 1) limit for the domain wall and Pauli-Villars operators gives valuable insight and guidance to our MG construction, particularly when examining the low spectra well below the UV cut-off scale, π/a. Qualitative and even quantitative features survive the introduction of the gauge fields generated by lattice Monte Carlo methods.
We begin by transforming the free Wilson kernel in Eq. 2.3 to momentum space, where we introduce the expression on the right to emphasize the well known feature of circular arcs in the complex spectrum of the Wilson operator. This gaping separates the doubler modes from the continuum modes as evident in Fig. 2.4 even with non-zero gauge fields turned on. The lattice spacing a has been introduced to identify physical low modes (|p| π/a) relative to UV cut-off: O(π/a). The low spectrum, λ ± m + ±i p 2 + ap 2 /2, is the continuum Dirac spectrum plus the O(ap 2 ) Wilson term.
This Fourier analysis had a straightforward generalization to the Pauli-Villars operator, D P V , as the boundary conditions are antiperiodic in the fifth dimension, (2.21) The boundary conditions restrict the bulk momentum p 5 to half integer modes: p 5 = π(2n + 1)/L s . After setting the mass to its free-field tachyonic value M 5 = −1, the low momentum expansion again has the familiar Wilson form (iγ µ p µ + (a/2)p 2 ) with first "eye" displaced to a circle centered at λ = 1. Not surprisingly the free domain wall operator, D DW (m), which differs by a fermion-mass dependence on the boundary (reducing to Dirichlet boundary conditions for m = 0), has a qualitatively similar spectra as seen in Fig Turning to the normal equation, we see a dramatic difference. While the domain wall normal operator has chiral modes at m = 0, as we show in Appendix A, the normal equation for the Pauli-Villars operator is positive definite with a large gap from zero with singular values of order the cut-off: The first correction is O(a 2 p 4 ) in physical units. Indeed more generally for M 5 = −1 one can prove for any finite L s in the free limit that D † P V D P V is bounded from below by 1, i.e., at the lattice cut-off scale: 1/a 2 . This feature suggest the usefulness of our approximate preconditioning, which avoids the expensive need to invert the Pauli-Villars operator. Even after gauge fields are included, we note in Fig. 2.5 this approximation conforms well at small eigenvalues including the O(ap 2 ) for the parabolic curvature in the complex plane.
To explore this further, we summarize results from Appendix A, comparing the low momentum spectra for the exact overlap map, D −1 P V D DW , with the approximate map, D † P V D DW . At m = 0, the low spectrum for D −1 P V D DW is given by λ 0 ± = ±i p 2 + ap 2 + O(a 2 ). By using the shift identity, λ 0 → λ = am + (1 − am)λ 0 as implicit in Eq. A.6, the low spectrum for non-zero mass is which after a rescaling, The general expansion in powers of momentum p 2 has a rather remarkable independence on L s . As noted in Appendix A, a direct evaluation of the free overlap kernel in Eq. A.6 for any finite L s ≥ 2 results in a series in Finally we compare the low mass spectra to our approximate preconditioned opera- relative to the exact preconditioner in Eq. 2.24. The only difference to quadratic order occurs at dimension 6 with a contribution O(a 2 mp 2 ), helping to explain why a Wilson-like spectra in the overlap sector is preserved in Fig. 2.34, including the parabolic curvature to O(ap 2 ).
Similarity with the Kähler-Dirac Preconditioned Staggered Operator: It is interesting to compare the Wilson and the overlap spectra with the preconditioned spectrum for the staggered MG algorithm in Ref. [1]. The staggered lattice operator has the unique property, shared by the continuum, of being an exactly anti-Hermitian operator plus a constant mass shift as illustrated for m = 0 by the vertical (red) spectra in Fig. 2.3. Both the staggered and continuum operators are normal operators.
At first these similarities between the staggered operator and the continuum may seem to be ideal for multigrid, but this turned out to be major obstacle to extending the Galerkin projection method used successfully for the Wilson MG algorithm [16] to the staggered operator. The solution found in Ref. [1] was to first precondition by dividing the anti-Hermitian staggered operator by the spin-taste Kähler-Dirac block, deforming the spectrum into one resembling the overlap spectrum or the first "eye" of Wilson spectra in Fig. 2.3.
More specifically, this required writing the staggered operator as an "even/odd" 2 d block operator (i.e., 2 2 squares in 2d, 2 4 hypercubes in 4d) decomposed as a sum of blocklocal terms "B" and block-hopping terms "C" as described in Eq. 2.11 of [1]. Each of Eigenvalues, 32 2 , free, m = 0

Staggered
Kähler-Dirac Prec these terms are separately anti-Hermitian operators with an indefinite spectra. In this formalism, the preconditioned operator is simply the block Jacobi preconditioned form, B −1 D stag . This maps the spectrum onto an exactly unitary circle in the free case, as shown in Fig. 2.3, resembling the exact overlap operator at L s = ∞.
The structural similarity to the the Pauli-Villars preconditioner is striking. In this case, we could have also started with a pair of anti-Hermitian indefinite operators, "iΓ 5 D P V " for the Pauli-Villars operator and "iΓ 5 D DW (m)" for the domain wall operator. We can now formulate the Pauli-Villars preconditioned domain wall operator as where we've made explicit that this preconditioning takes an imaginary indefinite spectrum to a unitary circle, identical in form to Fig.2.2. Beyond the practical consequence of this observation, it is intriguing to ask why this is the case. It may hint of a unifying principle for our multigrid algorithm common to all three major fermion discretizations in the chiral limit, which is worthy of additional investigation.

Outline of our Three Step Multigrid Implementation
Pauli-Villars Preconditioning: For the first step we need to consider the ideal Pauli-Villars preconditioner. It is worth re-emphasizing the challenge and importance of preconditioning the domain wall operator. The domain wall operator on a d + 1 dimensional lattice increases the number of eigenvalues from N ev = 2 d/2 × N c × L d by a factor of L s . The Pauli-Villars inverse spectra transform is an ideal preconditioner, putting the "bulk" N ev × (L s − 1) eigenvalues exactly at the cut-off 1/a in physical units.
Due to the Pauli-Villars operator having a maximally indefinite spectrum, it is most optimally solved via the normal operator. Although the Pauli-Villars normal operator is extremely well conditioned with a positive real spectrum starting at 1/a 2 , its use as a preconditioner is still prohibitively expensive. Instead we consider an approximate Pauli-Villars preconditioner, Eigenvalues, 24 2 , β = 10.0, m = 0.05, L s = 8 as motivated by low-order expansion of the Pauli-Villars normal operator given in Eq. 2.22. This approximate operator importantly preserves the property that the spectrum is confined to the right complex half-plane, with −π/2 < θ < π/2 for all eigenvalues λ = r exp[iθ]. This is proven in Appendix B based on the positive definite spectra of the normal operator factor and the right-half plane spectrum of D −1 P V D DW . This does imply Krylov solvers such as BiCGStab can be directly applied to this approximate operator.
While Fig. 2.5 is consistent with this property, the qualitatively strong match between the low eigenvalues of the two operators suggests we can make a much stronger statement. Indeed the two operators are nearly identical, with deviations confined to larger eigenvalues in the approach to the cut-off scale π/a. This is again motivated by the free field limit where we prove for M 5 = −1 29) and as a result the spectrum of the approximation is valid up to O(p 4 ) corrections. Indeed in the free theory, the additive operator to 1 (or 1/a 2 in physical units) is positive definite for all momenta. Further details on this can be found in Appendix A.
We note that a point of future investigation could be approximating (D P V D † P V ) −1 by a low-order polynomial in the normal operator as opposed to truncating it to 1. Given the success of the truncation to 1, it is unclear if higher order polynomials would be worth the additional computational burden.
Wilson Kernel MG Projection: In the second step, we introduce a coarsening projection using the familiar Galerkin projecting developed for Wilson MG acting independently on the Wilson kernel for each of the s = 1, 2, · · · L s slices, Here color and spin indices are implicit, and on the right we have further followed the convention of Wilson MG by suppressing the indices of the d-dimensional space-time lattice. The projection preserves γ 5 so that γ 5 P = Pσ 3 and P † (1 ± γ 5 )P = 1 ± σ 3 . Of course this factorization also applies to the Pauli-Villars term and to generalized domain wall formulations such as Möbius, Zolotarev etc. However this factorization does not apply to non-linear functional of the kernel such as the preconditioned product where P ± = 1 2 (1 ± σ 3 ). This coarsened operator has identical algebraic structure as original fine-level Domain wall operator. So when applying a coarse Pauli-Villars preconditioner D −1 P V , the exact same algebraic manipulations for the BNOfactorization carry over. Again the bulk modes are moved to the cut-off and "chiral" modes are confined to the boundary to form an effective coarse D ov (m) operator. This recursive construction is the key element for our DW multigrid construction.
Truncated projection/prolongation: Lastly, in the third step, we define a convention for residual coarsening and error correction prolongation. One straightforward approach would be to coarsen and prolongate across all L s slices. Instead, we find it is possible and in fact advantageous to only restrict and prolong the boundary contribution for the residual coarsening and the error correction, respectively. We assert this convention here and use it in general going forward, however we quantitatively study only acting on the boundary as opposed to the entire bulk in Sec. 4.1.
In the convention where the boundary is at the s = 1 slice, we define the projection of residual and prolongation of the error on the boundary by r s = P † r 1 for s = 1 0 for s > 1 and e s = P e 1 for s = 1 0 for s > 1 . (2.35) The s > 1 elements are restored efficiently by the outer solver and as a smoother correction. This is inspired by the Sec 3.2 Overlap to bulk Domain Wall reconstruction procedure in BNO [19]. It may be surprising, but this is still effective despite the fact we are using D † P V in place of D −1 P V . We discuss this further below, but in brief, this can be motivated as reasonable again by equivalence, D † P V ∼ D −1 P V , up to the quadratic order in momenta in the free theory.
All of these elements come with a variety of potential parameters for optimization on each level. For example since we are only transferring the residual and error correction on a single slice we are also able to reduce the extra dimension on the coarse levels. We will denote L s for the first coarsening as L s , on the second coarsening L s , and so on. We will use a reduced L s and L s going forward, however we quantitatively study varying the coarser L s in Sec. 4.2.
Summary: The full MG algorithm for the linear system D DW x = b, formulated as an extension to a K-cycle (i.e., a multigrid cycle where each coarse solve is wrapped in a Krylov solver) is as follows: • Left precondition the system by D † P V , giving the new linear system • Perform an MG-preconditioned iterative solve (via GCR, FGMRES, etc.) using the operator D † P V D DW .
-Relax on the current residual with D † P V D DW , known as the pre-smoother. • Repeat until the desired tolerance on ||b − D DW x||.
Needless to say, there are several knobs to tune, even as far as MG algorithms go. We have explored a few of these parameters in a preliminary form in the 2-d two-flavor Schwinger model and have left unexplored further until testing for 4-d domain wall methods discussed briefly in the conclusion.  The behavior of our MG algorithm in the approach to the continuum limit at constant physics and physical volume. We see an improvement in convergence as we approach the continuum limit, noting that stable convergence depends on an inexpensive deflation of the coarsest level. We chose to deflate 128 eigenvectors on the coarsest level.

Numerical Tests with the 2-d Schwinger Model
We now turn to testing our domain wall MG algorithm on the two-flavor Schwinger model [20,21,43]. As with the cases of Wilson [8] and staggered MG [1], the Schwinger model is a useful framework for the development and testing of algorithms for QCD [44]. As a low-dimensional prototype model it has the advantage of enabling the rapid exploration of a wide variety of alternative features in a serial laptop code. This can be used to demonstrate validity of an MG algorithm and guide the subsequent application to QCD and software tuning at scale on modern GPU accelerated systems. The importance of this two-step approach can not be over emphasized.
For our investigations in the interacting case, we have fixed M 5 = −1.05 relative to the correct free-field value M 5 = −1 and L s = 16 as a representative large value. Our current performance of the DW MG algorithm outlined above is illustrated in Fig. 3.1 and the accompanying Table 2. The one exception to these parameters is our study of the continuum limit, where we explore the addition of deflation on the coarsest level.
Before describing the details, it is apparent that our basic algorithm vastly improves scaling in the approach to the continuum and chiral limit, nearly eliminating critical slowing down. Our study of additional deflation in the continuum limit, given in Table 1, suggests that the lack of perfect scaling shown in Fig. 3.1 can be improved with deflation. This is supported by ongoing investigations of deflation of the coarsest level with both twisted mass and HISQ fermions in 4-d QCD and inspiration from [14].

Algorithmic Details and Analysis
A benefit of our MG algorithm is its setup is the same as the setup for the traditional γ 5 -preserving MG algorithm for Wilson fermions. In Eq. 2.34 we see that the coarsened domain wall operator takes a Galerkin-projected Wilson operator as a kernel. For this reason we expect the setup to be roughly the same cost as one five-dimensional domain wall solve. We have tuned the Wilson operator to the critical mass for near-null vector generation. Near-null vectors are generated by relaxing on the homogeneous normal system with a Gaussian-distributed initial guess followed by a chiral doubling via 1 2 (1 + P ± ) to preserve a σ 3 -Hermiticity on the coarser levels [1,8].
For the solve, the main structure of the K-cycle is unchanged relative to our previous work. Fixed parameters related to the setup and the K-cycle (target tolerance on each level, etc) are described in Table 2   Inspired by the work of [14], we tested adding a deflation step to the coarsest level, resulting in a four-level algorithm. This is important for addressing critical slowing down on the coarsest level, as has been also seen in 4-d studies for Wilson and staggered fermions. For conciseness of presentation we do not explore this past the results given in Table 1.
To study the viability of our MG algorithm, we consider sweeps in input fermion mass m at both fixed β = 6.0, varying the 2-d lattice volume and at fixed volume 256 2 , varying the bare coupling β. An ideal MG algorithm shifts critical slowing down to the coarsest level, corresponding to mass-independent behavior on the finest and intermediate levels.
Further, it should be insensitive to the volume at constant physics. We are interested in an algorithm that works for all reasonable values of β, however in practice it only needs to works for large β approaching the continuum at fixed physical correlation lengths well below the UV cut-off.

Elimination of critical slowing down
In Fig. 3.2 we consider the number of fine domain wall operator applications as a function of the input fermion mass, which is proportional to the number of outer GCR iterations. On the left, we see that for fixed β critical slowing down has been largely eliminated, albeit there is still a small increase at vanishing mass. On the right, we see that for a fixed volume, the MG algorithm shows an improved fermion mass independence in the approach to the continuum limit. The algorithm is unsuccessful for our coarsest β = 3.0, however this corresponds to a gauge correlation length of l σ ≈ 2.4, which is pushing into an unphysical regime. Similar effects have been noticed in staggered MG [1]. As noted in Table 1, preliminary investigations of deflation on the coarsest level lead to a further reduction in iteration count and, by extension, operator application count on the fine level, though not a complete elimination of the increase as a function of decreasing mass.
Intermediate level: In Fig. 3.3 we consider the average number of GCR iterations on the intermediate level as a function of the input fermion mass. We see that the average iteration count is roughly independent of the coupling β and the volume, which is encouraging. There is a weak mass dependence to the iteration count, however we note that the growth in iteration count appears to be weaker than power law. This is a significant improvement over the power-law dependence that is traditional of critical slowing down. Preliminary investigations of deflation on the coarsest level lead to a reduction in iteration count on the intermediate level, though not a complete elimination of the increase as a function of decreasing mass.
Coarsest level: In the previous two paragraphs we have demonstrated the elimination of critical slowing down from the finest and the intermediate level. This is because critical slowing down has been shifted to the coarsest level. In Fig. 3.4 we consider the average number of CGNR iterations on the coarsest level as a function of the input fermion mass. In contrast to plots for the fine and coarse levels, here we present this data on a log-log plot to examine power behavior. In both the left and right panels, we see behavior consistent with power-law divergence of the iteration count independent of volume and β, showing that critical slowing down has been successfully shifted to the coarsest level. As has been seen in studies with twisted clover and HISQ fermions in 4-d, this final critical slowing down can be efficiently eliminated by deflation.
Comparison with direct solve: In the previous paragraphs we have demonstrated a MG algorithm which shifts critical slowing down from the finest level down to the coarsest level. In Fig. 3.1, we can see a stark contrast in behavior between our MG algorithm and CGNR directly on the domain wall operator. While the number of fine domain wall operator applications scales with only weak mass dependence in the case of our MG algorithm, there is a strong power-law dependence present for CGNR. This is the critical slowing down which has shifted to the coarsest level in our MG algorithm.
We also considered applying the BiCGStab-l Krylov solver directly to D † P V D DW . We can do this because, in contrast to D DW in isolation, D † P V D DW obeys the half-plane condition as proven in Appendix B. While this approach unsurprisingly still demonstrates critical slowing down, there is a marked reduction in fine operator applications. This could be of immediate use for computing domain wall propagators for four-dimensional QCD.

Discussion
The MG algorithm described above, while generally successful, does introduce several components that are worth understanding better and very likely can lead to further improve performance. Even if we were to use the full effective overlap operator, D −1 P V D DW , the assumption that we need only prolong and restrict the boundary mode when going between levels is non-trivial. That this continues to be adequate with our approximation, D † P V D DW is even more surprising. Also the benefit of reducing L s on the coarsened levels begs a better understanding. Here is our initial effort to explore these issues.
As we did in our prior paper on the staggered multigrid [1], we begin by studying the spectrum of our approximate operator and its "coarsened" version to glean some intuition. While we find a strong overlap of physical low eigenvalues between the fine and the coarsened operator, we also find that the coarsened operator includes additional spurious small eigenvalues. Unlike the naïve formulation of MG for staggered fermions, these modes do not appear to undermine the success of the algorithm. To probe this phenomena, we consider the local colinearity and the oblique projector as monitors for the quality of our MG preconditioner. We see that transferring only the boundary component of the fine residual and the coarse error correction is essential for the success of our multigrid algorithm, and present a physical argument for why this "cures" the problem introduced by the spurious small eigenvalues. In the free field limit presented in detail in Appendix A, we note that the low momentum modes to order O(p 2 ) are fixed for any L s ≥ 2, which maybe an indication of the underlying mechanism. We investigated reducing L s on the coarser levels, finding that this reduction leads to an improved algorithm relative to using the fine L s on the coarsened levels.

Boundary-Only Transfer Operator
In Fig. 4.1, we see that the spectrum of the coarsened operator D † P V D DW relative to the spectra for the fine operator D † P V D DW introduces a large number of nearly real low modes. The problem resembles the spurious small eigenvalues that plagued the direct application of Galerkin projection to the staggered operator prior to the Kähler-Dirac preconditioning.
In this instance we posit that the saving grace for the domain wall operator is that the low modes of D −1 P V D DW are bound to the chiral walls, while higher modes bleed more dominantly into the bulk as suggested by Eq. 2.15. Based on this we posit that projecting only the boundary modes between levels acts as a filter against the bulk spurious modes.
In Sec. 2.3 we noted two ways to formulate the transfer operator between levels. The method we utilize across this paper is only prolonging and restricting the boundary mode as given in Eq.  As a quantitative approach to this study we follow the investigations of the staggered MG paper [1]. Consider the normalized right eigenpairs of the fine operator, (λ, v λ ). Given these we inspect both the local colinearity, a measure of preserving the low eigenspace in the least-squares sense, defined by and the oblique projector, defined by which quantifies the reduction or enhancement of a given error component for a magnitude less than or greater than one. While an error enhancement is not inherently a problem, a very large error enhancement requires a prohibitively expensive compensation at the smoother step.  Table 2.
In Fig. 4.2, we consider the local colinearity and the oblique projector for a given configuration. On the left-hand side, we see that the local colinearity is smaller in magnitude when prolongating and restricting the entire boundary and bulk compared with just transferring the boundary. This is not surprising as the boundary-only transfer operator by construction has a smaller span than the full transfer operator. On the other hand, the oblique projector using the boundary-only transfer operator is generally smaller in magnitude than the full transfer operator. This is a good indicator that the coarsening prescription given in the previous section combined with a transfer operator acting only on the boundary modes leads to an improved MG algorithm.

Reduction of Coarse level L s
An additional benefit of using a transfer operator which only acts on the boundaries is it gives us the flexibility to tune L s between levels. We will investigate this by two avenues. First, we will consider the behavior of the local colinearity and the oblique objector for two different values of the coarse L s . Next, we will perform an explicit test of domain wall MG for a large value of the outer L s , with a range of fixed smaller L s values on all coarser levels.
We illustrate the oblique projector on the left-hand side of Fig. 4.3. We see that a reduced coarse L s = 4 behaves at least as well as maintaining a constant L s = 8. In addition,  for larger values of the fine eigenvalue (i.e., higher momentum and bulk modes), the error enhancement is further suppressed. This may be related to the spectrum of the reduced L s operator on the right-hand of Fig. 4.3. We see that this operator does suffer from fewer spurious small eigenvalues, leading to a reduced risk of error enhancement. Following this investigation of the oblique projector and the spectrum on a small configuration, we have studied the performance of a solve on a 128 2 configuration with a representative β = 6.0. Unlike our previous studies, we have chosen a large L s = 32 for the outermost solve. For simplicity we chose a fixed reduced L s for both the intermediate and coarsest level.
We see in Fig. 4.4 that reducing the L s between the fine and the intermediate level leads to a perfectly well behaved preconditioner. This is impressive for two reasons. One, a reduction in L s does lead to an enhancement in chiral symmetry breaking, suggesting that the intermediate and coarsest levels would not accurately capture the low modes of the fine level. This does not appear to be a problem. Two, a reduced L s leads to enhanced stability for very small masses. There also may be a side benefit of fewer spurious small modes for the coarsest operator with reduced L s .
We are encouraged that reducing L s on the intermediate and coarsest level leads to an improved algorithm, and it is what informed the formulation studied in Sec. 3. Of course, there is a wider parameter space we could explore in this study: varying the length of the extra dimension separately for each level, tuning M 5 , tuning m to approximately constant m + m res , deflating the coarsest level to avoid a large iteration count. In light of our success simply reducing L s and making no other changes, we defer such in depth  investigations to a study in four dimensions.

Conclusion
We have presented a new approach to formulating an MG solver for domain wall fermions. This is a critical step to realizing the full benefit of this chiral formulation which is theoretically superior but more computationally demanding than the Wilson and staggered discretizations. For clarity of exposition our formalism was restricted to the Shamir version of domain wall and for easy of development and testing restricted to domain wall fermions for the 2-d two-flavor Schwinger model. Neither of these choices are fundamental. These results convince us that this is a solution to a long sought fully recursive MG algorithm for domain wall fermions that can eliminate critical slowing down approaching the continuum limit for small fermion masses.
As was the case with prototyping MG solvers the for Wilson [8] and staggered [1] discretizations, the next step is to develop software and optimize performance for the Dirac solver for lattice QCD. We anticipate a range of algorithmic embellishments and software methods to optimize such an algorithm for use on exascale-trajectory machines in both the weak-and strong-scaling regimes.
One salient feature is important to emphasize. The projection and prolongation only requires finding the near null space of the 4-d Wilson kernel, saving computational overhead and memory occupancy relative to the naïve cost of the 5-d domain wall operator. The fact that there is no expansion of the null space relative to Wilson MG due to the heavy flavors in the extra dimension is very good news. We foresee that this approach will be effective even for workflows that require few solves per gauge field, e.g., Hybrid Monte Carlo.
This points to another benefit. The basic method presented here for the Shamir implementation applies equally well to Möbius, Zolotarev, Borici, etc. formalism via the domain wall/Pauli-Villars factorization in BNO. The only requirement is that each factor is a linear functional of the Wilson kernel. As a consequence it should also be straight forward to generalize our algorithm directly to the overlap operator itself. To appreciate the full landscape consider the large class of chiral fermion methods presented by Edwards, Joo, Kennedy, Orginos, and Wenger in Ref. [31]. In all of these the sign function sign [H] in the overlap operator Clearly there is a larger landscape of MG algorithms for chiral fermion operators to explore. Optimizations will depend on the specific applications and target architectures. We anticipate that alternative implementations of the MG solver for domain wall fermions in four dimensions [12,13] may contribute to further optimizations. We leave the detailed study of these generalizations and optimizations for MG for domain wall and overlap chiral fermions to future investigations.

A Low Momentum Expansions for the DW Fermions
The free theory, setting U = 1, can be expanded and diagonalized in momentum space for finite L s , giving valuable guidance to our MG construction. First consider our approximation, D † we note that the first term by in isolation gives a circle of eigenvalues in the complex plane. Moreover, as is apparent in Fig. 2.1, this basic pattern persists even with non-trivial gauge fields. In the free-field limit, we can further diagonalize in space-time Fourier modes, p µ , giving where the summation includes γ 5 and p 5 . The free normal operator is Indeed more generally for M 5 = −1 one can prove that D † P V D P V is bounded below by 1, i.e., at the lattice cutoff scale 1/a 2 . With non-trivial gauge fields the approximation D † P V ≈ D −1 P V holds qualitatively particularly when M 5 is appropriately tuned.
The next step is to compare the low eigenspectra of the preconditioned operator D −1 P V D DW (m) and the approximation D † P V D DW (m). From Fig. 2.5 we note the near exact coincidence of small eigenvalues even in the presence of gauge fields. We can elucidate this in the free-field limit.
We cannot simultaneously diagonalize D P V and D DW (m) in the bulk dimension due to the difference in boundary conditions which complicates the analysis. In this case we take advantage of low-momentum perturbation theory in the space-time dimensions. Given D DW (p µ , m) ss , we expand both it and the Pauli-Villars factor at low d-momenta, take the appropriate products and solve the characteristic polynomial for the eigenvalues. The pairing of low modes gives a complex square root singularity in the complex plane, leading to the circular structure of the low spectrum as evident in Fig. 2.5. The results from this approach are given in the text in Eq. 2.24 and Eq. 2.25. Comparing the exact vs the approximate low spectra we see they are identical up to a six-dimensional operator O(a 2 mp 2 ).
We additionally note that the low momentum expansion of the effective overlap operator, which up to rescaling by 1/(1 − m) gives λ ± ∼ m q + p 2 ± i p 2 in agreement with Eq. 2.24 in the text up to a dimension-three scaling operator of O(mp 2 ). Higher-order expansions in 2-d and 4-d effective overlap operators are easily shown in Mathematica to follow this expansion invariance up to O(p Ls ). We believe the fact that the quadratic form is independent of L s ≥ 2 even for our approximate D † P V D DW (m) effective overlap operator is at the core the effectiveness of using small L s for MG iterations on the coarse level.
Finally we note that this expansion is almost certainly a divergent asymptotic series and as such does not by itself lend itself to fix the coefficients in improved higher-order polynomial approximation [D P V D † P V ] −1 c 0 + c 1 D P V D † P V + c 2 (D P V D † P V ) 2 · · · (A.10) to [D P V D † P V ] −1 in Eq. 2.27 beyond the zeroth order. We see this first-order equivalence is given by c i = (1, 0, 0, 0 · · · ). We can take thus further by expanding [D P V D † P V ] −1 = 1/(1 + A P V ) in A P V = D P V D † P V − 1 to show that each term in Eq. 2.27 corrects the approximation by another two powers of p. For example just the second-order expansion, c i = (2, −1, 0, 0, · · · ), giving the equivalence D −1 P V = D † P V (2 − D P V D † P V ) + O(p 4 µ ). This polynomial on its own cannot be used in our MG algorithm because 2 − D P V D † P V is not positive definite so coefficient in a polynomial truncation must be weighted appropriately over the entire spectrum.
B Proof that the spectrum has a positive real part In the full interacting case with gauge fields we consider our approximation D † P V D DW as D † P V D P V D −1 P V D DW and note from Eq. 2.14 that D −1 P V D DW has the eigenspectra of the finite-L s overlap kernel, Equivalently this is just a statement of the unitarity bound on matrix elements. By extension, the spectrum of D ov (m) lives inside a circle of radius 1−m 2 centered at ( 1+m 2 , 0) in the complex plane. Thus both D ov (m) and D −1 P V D DW have positive definite real part for m > 0.
Let us now consider the eigenvalue problem for the operator D † P V D DW for any right eigenvector |λ , where we have generally written the eigenvalue λ as re iθ . We wish to prove the right-half plane condition θ ∈ (−π/2, π/2). We note we can re-write the left-hand side of the above system as D † P V D P V D −1 P V D DW |λ = re iθ |λ . Since D −1 P V D DW satisfies the right half-plane condition and D † P V D P V is a Hermitian positive operator, this proves the right half-plane condition θ ∈ (−π/2, π/2) for m > 0.