Multigrid for Staggered Lattice Fermions

Critical slowing down in Krylov methods for the Dirac operator presents a major obstacle to further advances in lattice field theory as it approaches the continuum solution. Here we formulate a multi-grid algorithm for the Kogut-Susskind (or staggered) fermion discretization which has proven difficult relative to Wilson multigrid due to its first-order anti-Hermitian structure. The solution is to introduce a novel spectral transformation by the K\"ahler-Dirac spin structure prior to the Galerkin projection. We present numerical results for the two-dimensional, two-flavor Schwinger model, however, the general formalism is agnostic to dimension and is directly applicable to four-dimensional lattice QCD.


Introduction
Increasingly powerful computers and better theoretical insights continue to improve the predictive power of lattice quantum field theories, most spectacularly for lattice quantum chromodynamics (LQCD) [1]. However, with larger lattice volumes and finer lattice spacing, exposing multiple scales, the lattice Dirac linear system becomes increasing illconditioned threatening further progress. The cause is well known: as the fermion mass approaches zero, the Dirac operator becomes singular, due to the exact chiral symmetry of the Dirac equation at zero mass, causing critical slowing down [2]. The algorithmic solution to this problem for lattice QCD was recognized 25 years ago. The fine-grid representation for the linear solver should be coupled to multiple scales on coarser grids in the spirit of Wilson's real space renormalization group and implemented as a recursive multigrid (MG) pre-conditioner [3]. Early investigations in the 1990s introduced a gaugeinvariant projective MG algorithm [4,5] with encouraging results for the Dirac operator in the presence of weak (or smooth) background gauge fields near the continuum. However, in practice lattice sizes at that time were too small and the gauge fields were too rough to achieve useful improvements.
Not until the development of adaptive geometric MG methods [6,7], was a fully recursive MG algorithm found for the Wilson-Dirac discretization, which was able to transfer the strong background chromodynamics fields onto coarser scales and eliminate the illconditioning of the Dirac kernel in the chiral limit. In spite of this achievement for the Wilson-Dirac and closely related twisted mass formulation [8,9], these are not the only important Dirac discretizations in common use in lattice field theory. Three other discretizations used extensively in high energy applications, which more faithfully represent chiral symmetry on the lattice, are referred to as the domain wall [10], overlap [11], and staggered [12] fermions. The application of adaptive geometric MG to these discretizations has proven to be more difficult, perhaps related to the improved lattice chiral symmetry. A two-level MG solver for domain wall fermions has been implemented [13,14] which shows some promise, and a non-Galerkin algorithm has been implemented for overlap fermions [15], but there has been no success at formulating a MG staggered algorithm. Moreover, since staggered lattice ensembles are now the largest available for LQCD, requiring O(10 5 ) iterations for good convergence, improving staggered solvers is a critical issue. Here we introduce a novel solver with the Kähler-Dirac spin structure [16,17] that allows, at last, the construction of an effective multi-level adaptive geometric MG algorithm for staggered fermions.
The staggered fermion is a remarkable discretization [12,18] which closely resembles the continuum Dirac linear operator, The lattice discretization replaces the derivative by a gauge-covariant central difference, resulting in a sparse matrix operator on a hypercubic lattice with the background gauge fields U (x, y) represented by highly oscillatory SU (3) matrices on each link x, y of the lattice. The γ µ matrices are replaced by a single staggered ±1-sign: η µ = (−1) ν<µ x ν . Similar staggered lattice realizations of Dirac fermions have proven valuable not only for lattice QCD investigations, but also for a variety of physical systems such as graphene in condensed matter [19], supersymmetry [20], and strongly interacting conformal fixed points of possible interest for beyond the standard model (BSM) physics in the Higgs sector [21,22,23,24,25,26,27,28,29].
Unlike the Wilson and domain wall methods, the staggered discretization preserves the exact anti-Hermiticity of the continuum Dirac operator up to real mass shift. In this sense it represents the most primitive (or even fundamental) discretization. It has no explicit spin matrices (γ µ ), so the Dirac spin structure only emerges in the continuum limit. Each 2 d lattice sub-block in four dimensions reassembles into four Dirac flavors (or tastes), the content of a single Kähler-Dirac fermion [30]. This is the structure that our MG algorithm exploits: dividing out the 2 d Kähler-Dirac spin structure transforms the spectrum into a near "circle" in the complex plane as illustrated in Fig. 2.3. The striking similarity of the resultant spectrum to the Wilson and overlap spectra is, we believe, essential to the success of our staggered MG algorithm.
In LQCD applications with staggered fermions, the system D(U, m) ij ψ j = b i is typically solved via Krylov methods on the Schur decomposed even/odd operator (or, equivalently, the red/black operator). Because the preconditioned operator is Hermitian positive definite, the system can be solved by the conjugate gradient (CG) algorithm. This method has proven robust, and there are some well established methods to fend off critical slowing down, such as EigCG [31] eigenvalue deflation or block Krylov solvers [32,33,34,35]. Block solvers do not remove critical slowing down, and deflation methods scale poorly with the volume in terms of the number of eigenvectors need to remove critical slowing down. As explained in our earlier report [36], an adaptive geometric MG algorithm for the staggered normal operator can be easily formulated which removes critical slowing down. However this comes with a heavy overhead. A Galerkin coarsening of the normal equation introduces next-to-nearest neighbor (or corner) terms, resulting in a 2d + 2d(d − 1) site coarse operator stencil; in four dimensions increasing the off-diagonal terms from 8 to 32 terms. This becomes prohibitively expensive in terms communication pressure in parallel strong scaling MG solvers [37,38,39,40].
The solution to this problem is to develop an MG algorithm directly on the staggered operator. In the interest of algorithm development, we consider a two-dimensional model system as opposed to the full four-dimensional QCD. The two-dimensional staggered fermion, coupled to an Abelian gauge theory, U (x, x + µ) = exp[iθ µ (x)] is the two-flavor Schwinger model in the continuum limit [41,42]. This is a fully non-perturbative quantum field theory which is an ideal analogue to four-dimensional QCD. Like QCD it exhibits confinement with a zero mass triplet of "pion-like" bound states in the chiral (zero mass) limit, and instantons that present a topological mechanism which breaks chiral symmetry dynamically in the flavor singlet channel [43]. As such, this has proven to be a reliable test framework [6] prior to a full implementation for four-dimensional QCD. The reader is referred to an extensive literature to understand the physical features that guide our construction in two dimensions and the natural generalization to four dimensions. The lattice Schwinger model has the action Introducing the lattice spacing a, the bare mass (m) and the gauge coupling (g) are given by dimensionless parameters, m 0 = am and β = 1/(a 2 g 2 ) respectively. There are two important physical length scales determined by these parameters: (1) The fundamental gauge correlation length (or string length) measured by the Wilson loop area law is l σ = a √ 2β. (2) The fundamental fermion length scale measured by the "pion" Compton wave length is l Mπ = 1 Mπ ≈ 0.5a(am) −2/3 β 1/6 [42]. To approach the continuum both must be large relative to the lattice spacing. As an analogue to QCD, we should also approach the chiral regime with l Mπ /l σ 1. To control finite volume L d and finite lattice spacing a errors, the four length scales should obey the constraint: L l Mπ l σ a.
This two-dimensional theory has been carefully selected because of its remarkable similarity to four-dimensional QCD both in terms of the underlying physics and the formal mathematical structure. Although at present our numerical tests are restricted to two dimensions, the entire formal structure is applicable to higher dimensions. The numerical analysis of a four-dimensional algorithm for lattice QCD is under development in QUDA [44,45,46], an efficient GPU framework for LQCD applications. Results will be presented in a subsequent publication.
The organization of the paper is as follows. In Sec. 2 we give the mathematical framework of the staggered Dirac operator essential to our subsequent MG formulations. In Sec. 3 we consider a Galerkin projection of the original operator and explain why it fails as a MG preconditioner. We then constrast it with the coarse projection of our new Kähler-Dirac preconditioned operator. In Sec. 4 we present in detail the construction of the staggered MG algorithm, followed by detailed numerical tests for the two-dimensional Schwinger model. In Sec. 5 we discuss some alternatives to our current implementation, which may be useful in the application of our staggered MG algorithm to four-dimensional LQCD and other staggered lattice simulations. For example, a method for exactly preserving complex conjugate eigenpairing and numerical tests thereof is presented in Sec. 5 and in Appendix A, respectively.

Mathematical Preliminaries of Staggered Fermions
The geometric structure of the staggered Dirac operator Eq. 1.2 and its relationship to the low-lying eigenspectrum is important for our analysis. Many of its features are inherited directly from the discretization of the continuum action, (2.1) The naïve fermion discretization uses a central difference approximation for the first derivative, which causes the so-called "doubling" (or aliasing) problem [47]. In the continuum, a single naïve fermion gives 2 d Dirac fermions: 16 four-component spinors in four dimensions and 4 two-component spinors in two dimensions. The staggered construction reduces this multiplicity by spin diagonalizing the Dirac structure, then dropping all but one of the 2 d/2 copies. Explicitly, this spin diagonalization, is achieved by the unitary field redefinition, with Ω x = γ x 1 1 γ x 2 2 · · · γ x d d . Dropping all but one copy, with the replacement results in a partial solution of the doubling problem by reducing the fermion content to 2 d/2 staggered Dirac fermions: 4 in four dimensions and 2 in two dimensions. It is convenient to write the staggered operator succinctly as The staggered operator has a few special properties not shared by other fermion discretizations. The staggered operator is anti-Hermitian up to a mass shift and is normal: [D(U, m), D † (U, m)] = 0, just like the continuum operator. This is in contrast to the Wilson discretization, with its Hermitian second order Wilson (stabilization) term that decouples doublers but makes D W non-normal in the interacting case. The Wilson term also explicitly breaks chiral symmetry. On the other hand, the staggered operator retains a single exact chiral symmetry in the interacting case, with (x) being the generator of the chiral symmetry. These good chiral properties give (x)D + D (x) = 2m (x) → 0 as m → 0. The chiral projectors, 1 2 (1 ± (x)), partition the lattice into even and odd sub-lattices, Furthermore, D features an (x) Hermiticity, analogous to γ 5 Hermiticity of the continuum Dirac operator.
The normal equations for the staggered operator are diagonal, The Schur-preconditioned system takes on a similar structure and is also Hermitian positive definite. For the free problem (i.e., unit gauge fields, U (x, x + µ) = 1), there is an exact cancellation of all next-to-nearest neighbor "around-the-corner" terms in the normal operator. This is a result of the η µ phases preserving a key property of the Dirac algebra when taking the product of ηs around a plaquette, The result is a set of 2 d decoupled Laplace operators on a lattice with spacing 2a illustrated in Fig. 2.1. In this sense, the free staggered operator is truly the "square root" of the Laplace operator, similar to the continuum Dirac operator. We can immediately write down the eigenvalues of the free staggered operator, given by where the p µ = 2n µ π/L for integers n µ ∈ [−L/4 + 1, L/4] due to the shift-by-two translational invariance. The eigenvalues are imaginary (up to a real mass shift) and come in complex conjugate pairs. When an interacting gauge field is turned on, the "around-thecorner" terms no longer vanish, leaving the two decoupled components in Eq. 2.8. These The normal operator applied on an odd site "o". All contributions to even sites "x" cancel due to D being normal. Links in black (solid) and red (dashed) correspond to ±1, respectively, due to the contributions of η µ and the anti-Hermiticity of D. In the free field, it's clear that corner terms cancel.
next-to-nearest neighbor terms are the standard so called clover term, resulting in an irrelevant, in the Wilsonian sense, spin gauge interaction (σ µν F µν ) in the continuum. The spectrum cannot be found analytically, but (x) Hermiticity symmetry ensures that the eigenvalues still appear in exact complex conjugate pairs.

Kähler-Dirac Preconditioning
We now consider the spectral transformation which is essential to the staggered MG algorithm presented in Sec. 3 and tested numerically in Sec. 4. Here we will show that when the staggered operator is right-preconditioned by the 2 d Kähler-Dirac blocks, the spectrum on the resultant 2a blocked lattice is dramatically different. In the free case, we prove that this transformation gives an exactly circular spectrum in the complex plane, similar to the overlap lattice Dirac discretization [11]. The inclusion of gauge fields and/or the three link Naïk term [48] are relatively small modifications of this basic circular structure.
The argument proceeds as follows. The staggered operator is composed of blocks containing 2 d sites, corresponding to 2 d degrees of freedom that in the continuum limit are recombined into a multiplet of Dirac fermions [49,50]. It is straightforward to see that the decomposition of the staggered operator in 2 d blocks of sites partitions the lattice, as illustrated in red in Fig. 2.2, into independent 2 d blocks B containing a plaquette of links.
We will refer to these as Kähler-Dirac blocks. We also include the local mass term into this B block. The nearest-neighbor terms between the B blocks contibute to a block hopping term C, which is unitarily equivalent, up to the mass shift, to the block-local contributions in B. B and C only share sites at the corner of squares in two dimensions, cubes in three dimensions, and hypercubes in four dimensions. This is a dual decomposition: half of the links on the original lattice contribute to B, and half contribute to C, as represented in Fig. 2.2. We denote this partition between hopping terms within and across blocks as We remark that we can interchange this dual description by shifting the coordinates x i → x i + 1, where 1 is a vector of ones. We now construct the right-block-Jacobi or Kähler-Dirac preconditioned operator as This is a remarkably different operator with which we develop our MG algorithm.
To characterize these differences, we will first consider the free case. After rescaling and multiplying by (x), the generator of the exact staggered chiral symmetry, both terms are separately Hermitian, traceless, and unitary. More concretely, we define B = B (x)/ √ d + m 2 and C = C (x)/ √ d and note: as a trivial consequence of the perfect cancellation of the corner terms for Eq. 2.9. These properties imply that B and C have equal numbers of ±1 eigenvalues, and further that the product C B is a unitary matrix U . (The addition of a Naïk term does not changeB, but it does contribute to C.) With this observation, our free Kähler-Dirac staggered operator A is given by (2.14) The eigenvalues of DB −1 lie on a circle centered at 1 as illustrated in Fig. 2.3. The radius of the circle is ρ = d d+m 2 . In the massless limit, the radius is exactly 1. This leads to an identical structure to the overlap operator, under the mapping (γ 5 , γ 5 ) → ( C, B). Both C and B are algebraically similar to γ 5 and γ 5 , being Hermitian and unitary with an equal number of ±1 eigenvalues. Adding a mass term to the overlap operator similarly rescales the unitary portion of the spectrum, introducing a mass gap. For comparison, in the right panel of Fig. 2.3, we show the free spectrum of the massless two-dimensional Wilson operator, the two-dimensional overlap operator and our new two-dimensional Kähler-Dirac preconditioned operator. Very similar figures apply to four dimensions, except the Wilson spectrum now has four arcs in the positive real direction. Finally, we note that if we add a Naïk term to the original staggered operator, the right preconditioning perturbs the unitarity of the spectrum but preserves the qualitative geometric features. A comparison of the Kähler-Dirac preconditioned operator to the original staggered operator is given in the left panel of Fig. 2.3. We compare the massive spectrum against all other fermion discretizations in Fig. 2.4.
The Kähler-Dirac operator no loner admits a simple "γ 5 " Hermiticity condition. However, it does obey a modified asymmetric γ L/R 5 condition, which is essential  for our discussions in Sec. 3. The key observation is to note that we can change the "convention" that A is given by a right block preconditioning of D to a left block preconditioning via the transformation (x)A † (x) = B −1 D. We can rearrange this identity and note and likewise we can take advantage of the (x) Hermiticity of D to note ). This is a generalization of the idea of γ 5 Hermiticity: now, γ L 5 γ R 5 = 1 and A † = γ L 5 Aγ R 5 . Also, just as is the case for the Wilson and staggered operator, these properties are enough to show that A features complex conjugate eigenpairs. Assume A|λ R = λ|λ R , where the superscript R . We can take the Hermitian conjugate of each side of the equation. Next, we can right-multiply by γ L 5 and take advantage of γ Hermiticity. This gives us λ R |γ L 5 A = λ R |γ L 5 λ * , that is, A also has an eigenvalue λ * with a left eigenvector λ * L | ≡ λ R |γ L 5 . This same exercise can be trivially repeated for left eigenvectors using γ R 5 to the same end.

Free Spectrum after Kähler-Dirac Preconditioning
For a detailed analysis of the spectrum, we introduce the flavor representation of the staggered operator [49,51,50], which is unitarily equivalent to a lattice Kähler-Dirac fermion in the free field [17,52]. Here each submatrix B is expressed in terms of the spintaste gamma matrices which enumerate the components of a single continuum Kähler-Dirac fermion [30]. Its action is where q(X) is the Kähler-Dirac field containing 2 d degrees of freedom, X is the Kähler-Dirac block index, b = 2a = 2 is the lattice spacing between Kähler-Dirac blocks, and the finite difference operators are defined as In the language of staggered fermions, the γ µ matrices generate the spin algebra, while the matrices τ µ = γ † µ generate the so-called taste algebra. It should be noted that if these lattice fermions are gauged on the lattice with twice the lattice spacing b = 2a, the resulting lattice theory of interacting Dirac-Kähler fermions [16,53] is no longer equivalent to the interacting staggered fermion and, of note, can generate a dynamical mass term. Likewise, on a continuum Riemann manifold, a Kähler-Dirac fermion admits a different gravitational gauging than Dirac fermions [54].
Our decomposition of D = B + C is now partitioning Eq. 2.19 into local and nearest neighbor contributions. The local block B is given by in the massless case. The inverse is given by The transformation D → A = DB −1 gives the kernel  trivially diagonalized, and the imaginary part is prescribed by recalling the shifted unitary structure of the spectrum. This gives This spectrum is visualized on the left panel of Fig. 2.3. The spectrum can be written as 1]. We note again that, up to a scaling, the low spectrum is similar to the Wilson, overlap, and staggered spectrum.

Non-zero Mass Term
The spectrum undergoes a minor change when the original staggered operator is massive. The local block now becomes µ γ 5 ⊗ τ µ τ 5 + m, and the preconditioned spectrum becomes which parameterizes the arc of a circle 1 − ρe iθ centered at (1, 0) with radius ρ = d d+m 2 . The arc is bounded to the range cos(θ) = ρd −1 Naïk Term Many modern LQCD simulations add a next-to-next-to-nearest neighbor improvement term known as a Naïk [48] term. Two common realizations of this improvement, equivalent in the free-field limit, are AsqTad [55] and HISQ [56] fermions. The free operator [52] is given by The improved action admits the spectrum (2.28) The effect of the Naïk term on the Kähler-Dirac action is to modify the nearest neighbor term and add a next-to-nearest neighbor term. The 2 d Kähler-Dirac block B N aik = 9 16 B is unchanged up to a trivial rescaling. The new contributions are confined to C, which is no longer unitary: C † N aik C N aik = I. Likewise, the spectrum is no longer a shifted unitary spectrum. Indeed, in two dimensions, the massless free spectrum is given by where S n = 1 2 µ=x,y cos(np µ ) and x = −1/48 9/16 , the ratio of the improvement coefficients in the Naïk-improved action. The improved spectrum is shown on the left panel of Fig. 2.3, with the low modes emphasized in Fig. 2.4. We again make the critical observation that the spectrum is qualitatively similar to the original Kähler-Dirac spectrum, Wilson spectrum, and overlap spectrum.
Interacting Staggered Fermons in Kähler-Dirac Form: We are ultimately interested in performing this right-block-Jacobi preconditioning on the interacting staggered operator, not the free operator. Procedurally, this is done by first gauging the staggered operator, and then performing the same unitary blocking transformation between the staggered form and the Kähler-Dirac form. The local block no longer has a simple structure because of gauge links [17]. In two dimensions, the Kähler-Dirac block B attached to a unit corner at 2 n on the original staggered lattice is given by  Like the free case, the block B is still anti-Hermitian plus a massive shift. However, unlike the free case, the interacting B and C are not unitary, and as such the product CB −1 does not have a unitary spectrum. Nonetheless, the spectrum is still approximately circular and centered at 1, as can be seen later in Fig. 3.5. This is a desirable property for matrix preconditioning in general [57,58], and is essential for a successful MG algorithm. Importantly, this operator still maintains the γ L/R 5 Hermiticity defined in Eq. 2.17 and 2.18. This is true because the proofs of γ L/R 5 Hermiticity solely depend on the (x) Hermiticity of D, which holds in the interacting case. By extension, the proofs of complex conjugate eigenpairing still hold. These comments carry over as appropriate when a Näik term is also included.

Multigrid Coarse Operator
In forming the Galerkin projection of the staggered operator, we follow the methods of previous successful formulations of MG for the Wilson-Dirac discretization for LQCD [6]. Near-null vectors, or vectors which predominantly span the low-right eigenspace of the Wilson operator, are constructed by relaxing on the homogeneous equation with random initial guess as is discussed in detail in Sec. 4. Later in this section we will also consider exact low eigenvectors programmatically as near-null vectors. The resulting near-null vectors are chirally doubled and block-orthonormalized to construct the rows of the restrictor matrix R, which aggregates fine degrees of freedom to a single site on the coarse lattice, and the prolongator matrix P , which maps coarse degrees of freedom back to the fine lattice. Unless otherwise noted, R = P † . For the staggered operator, this implies that coarsening preserves the anti-Hermitian plus mass-shift structure. Block orthonormalization implies P † P = I. The prologator and restrictor can be used to define the coarse operator, (3.1) The hat notation refers to an operator one level coarser than the "unhatted" operator.
We will begin by reviewing the Wilson formulation, largely to establish notation. We will extend this formulation to the staggered operator and show why this method fails to produce an effective recursive algorithm in this case. We will last repeat this formulation for the Kähler-Dirac preconditioned operator and show that, in contrast to the original staggered case, this method succeeds.

Review of Wilson Dirac Coarse Operator
We begin by a basic restatement of the procedure for the adaptive geometric MG developed for the Wilson operator in QCD [5,6]. It is important to first note that the Wilson operator does obey γ 5 Hermiticity, that is, Hermiticity is sufficient to prove that eigenvalues of D W come in complex conjugate pairs as the limiting case of γ L 5 = γ R 5 , as discussed in Sec. 2.1. Returning to MG, n 1 vec near-null vectors are generated, where the "1" refers to coarsening the finest level, as discussed in Sec. 4. A key next step is chiral doubling: every near-null vector |ψ i is "doubled," giving 1 2 (1 ± γ 5 ) |ψ i . For this reason, on the coarse operator, each coarse site has 2n 1 vec internal degrees of freedom (dof), or, alternatively, a dense structure of n 1 vec "coarse color" dof times two "chirality" dof. A successful implementation of Wilson MG critically depends on the preservation of chirality.
After performing a chiral doubling of the near-null vectors, we pack the doubled vectors into the prolongator and again define R = P † . The tilde convention here is an indication that we have not (yet) block orthonormalized the 2n 1 vec vectors on each block. The chiral doubling implies γ 5 P = P σ 3 , where σ 3 = diag[1, · · · , 1, −1, · · · , −1] is a block Pauli matrix, or alternatively, the traditional σ 3 acting on the coarse chirality dof. It is easy to see that P † D W P is "σ 3 " Hermitian: The essential property γ 5 P = P σ 3 is unchanged after we perform the last step, block orthonormalizing P to get P , because we performed our chiral doubling with a bona fide projector. The top chiral components and the bottom chiral components are already trivially orthonormal. This gives the final essential properties γ 5 P = P σ 3 , and D W = P † D W P is σ 3 Hermitian. This methodology can be trivially extended to a recursive coarsening.

Failure of Galerkin Projection of Staggered Operator
The prescription for (recursively) generating a coarse refinement of the Wilson operator D W fails when naïvely translated to the staggered operator D with the only change being the replacement of γ 5 with (x), as noted by Eq. 2.6. While the iterative inversion of the even/odd preconditioned system exhibits critical slowing down, it does converge. However this attempt at a Galerkin MG on the staggered operator D stalls completely at large volumes as illustrated in Fig. 3.1. We need to understand the cause of the failure of the Galerkin projection D = P † DP as a preconditioner. A MG algorithm may fail because the coarse operator does not accurately reproduce the low eigenspace of the fine operator, or because the coarse error "correction" is ineffective. We will study each of these properties for the staggered operator to attempt to understand the issue.
As a spectral preconditioner, we expect the coarse operator to approximately preserve the low eigenmodes of the fine operator. In Fig. 3.2 we address this issue by comparing our failed staggered MG spectra with the successful Wilson MG spectra in Fig. 3 Table 1.
due to (x) Hermiticity on the fine level and coarse levels. The other four columns give the spectrum for a recursively-coarsened operator, constructing the prolongator/restrictor from exact low eigenvectors (left side) and near-null vectors (right side), where the nearnull vectors are again generated as discussed in Sec. 4. Filled shapes correspond to the first coarse level. Hollow shapes correspond to the operator from a recursive coarsening. The horizontal black lines trace the low modes of the fine operator across the coarsened operators. While these physical low modes are well preserved in all cases, there many additional, spurious low eigenvalues in the coarse spectrum.
These spurious eigenvalues have a simple origin. Consider the normalized eigenvector of the coarse operatorD|λ =λ|λ . We note thatλ = λ |D|λ = λ |P † D P |λ . If  Table 1. with complex conjugate eigenvalues. This eigenvector may have nothing to do with the low modes of D.
This would be less of an issue if higher modes were gapped along the real axis. This is true of the Wilson operator, as can be seen for a representative case in Fig. 3.3 * . For the fine operator, whose eigenvalues are given by red squares, high modes are gapped along the real axis. For the coarse operator, whose eigenvalues are blue triangles, low modes are well preserved. Higher modes "collapse" towards the complex origin but are still well gapped along the real axis. This could be why MG on the Wilson operator does not break down, and may identically predict success for the Kähler-Dirac preconditioned operator. * In the interacting case, the Wilson operator is no longer normal, and our convex hull proof breaks down. It appears that it is still sufficiently true, perhaps because free Wilson operator is exactly normal. In a perturbative sense, the interacting Wilson operator is then "approximately" normal. Local Co-linearity versus the Oblique Projector The Galerkin MG scheme involves two different projection operators: • The projection operator, P = P R, from the fine space into a coarse subspace. Using the right eigenvectors as a basis for the fine vector space, V = {|v λ , 0 < |λ| ≤ |λ max |}, our goal is for eigenvectors with small (near-null) eigenvalues, |λ|/|λ max | < ε, to be approximately represented within the span of the coarse subspace, V = PV , in a least-squares sense. • The oblique (or Petrov-Galerkin) projector, P ob = 1 − P (RDP ) −1 RD, that defines space of error components that are returned to the fine level with a complete solve in the coarse subspace. To not overburden the smoother this should at least not unduly amplify large eigenvectors, |λ|/|λ max | > ε.
Both are true projectors dividing the fine vector space V into disjoint subspaces, P(1−P) = 0 and P ob (1 − P ob ) = 0, though they do not define the same subspaces. The orthogonality, P ob P = 0 is one-sided since PP ob = 0.   Table 1. Both panels are sorted by increasing magnitude of the eigenvalues.
Let us see how well the staggered MG handles these two requirements. In our construction, R = P † , so the coarse space projector is Hermitian. The statement of preserving the low eigenspace in the least-squares sense can be formulated as sufficiently minimizing for small eigenvalues of fine operator D. Since we generate our coarse space by geometric aggregation, this can be thought of as the local co-linearity of near-null vectors with low eigenmodes. In the top left panel of Fig. 3.4, we see that starting either with a blockorthonormalized basis of near-vectors or of low eigenvectors results in a good converage of the low spectrum. This is typical of MG methods. At the bottom left this is extended to the next coarsest level with similar result. This has important implications for eigenvector compression methods [59].
However, this is not sufficient for a successful coarse correction in a MG algorithm. The coarse correction should address the low modes of the fine operator without introducing large errors in the high mode subspace. The error after solving the coarse level is updated as e ← e − P (RDP ) −1 RDe = P ob e. This is quantified by the magnitude of each eigenvector acted on by the Petrov-Galerkin or the so-called oblique projector , (3.5) The oblique projection of the coarse error (Pe) is zero: P ob P = [1−P (RDP ) −1 RD]P R = 0. However, the oblique projection is not Hermitian so this does not imply the error in the orthogonal complement space (e − P Re) vanishes. This is illustrated on the right side of Fig. 3

.4.
A magnitude less than or greater than one corresponds to a reduction or enhancement of the complementary error component, respectively. A successful coarse operator should strongly reduce the error component for low eigenmodes. In the context of MG, the enhancement from higher modes is addressed by the smoother. A larger enhancement requires a more expensive smoother, otherwise the solve stalls. In the top right panel of Fig. 3.4, we see that, for high modes, there is a large error enhancement. This is worse for a prolongator generated from near-null vectors than one generated from eigenvectors. In the lower right panel, we see the situation is even worse for a three-level algorithm. In all cases, an aggressive smoother is needed, increasingly so at coarser levels. This is why we saw the MG algorithm fail. Now we turn to the same analysis for the Galerkin construction of the Kähler-Dirac preconditioned operator, which has in contrast minimal error enhancement, evident in Fig. 3.6.

Coarse Kähler-Dirac Staggered operator: A
We will coarsen the Kähler-Dirac preconditioned operator similarly to how we coarsened the staggered operator, still using 1 2 (1 ± (x)) (unitarily rotated into the flavor basis) as a chiral projector on the near-null vectors. We will denote the method of coarsening the Kähler-Dirac preconditioned operator using 1 2 (1 ± (x)) as chiral projectors. Again, we will use R = P † . In Sec. 5, we will discuss an asymmetric coarsening where R = P † . While not being of merit in two dimensions, it may be an interesting point of investigation in fourdimensional QCD. In this section we will consider the spectrum, co-linearity, and oblique  Table 1.
projector for a symmetric coarsening. Looking forward, in Sec. 4, we will demonstrate that symmetric coarsening produces a well behaved and robust recursive algorithm independent of the volume and the mass for physically relevant values of β.
As we described previously, the Wilson operator has a well behaved spectrum for MG as the high modes are well gapped along the real axis. This is also true for the Kähler-Dirac preconditioned operator in Fig. 3.5. As we discussed at the end of Sec. 2.1, the interacting spectrum is no longer a perfect circle in the complex plane. This does not undermine the qualitative benefits of the spectrum. Additionally, in the interacting case, a mass term still gaps the spectrum. In the right panel of Fig. 3.5, where we zoom in on the origin of the complex plane, we see that low modes are well preserved under our coarsening prescription, and there are no spurious modes near the complex origin. Eigenvalues of the coarse operator do not come in exact complex conjugate pairs, a consequence of using (x) as the chiral projector. This is inescapable because, in general, γ L/R 5 does not define a good projector. The eigenvalues are approximately paired, which is consistent with a general preservation of the low spectrum. This may also be consistent with 1 2 (1 ± (x)) becoming equivalent to 1 2 1 ± γ L/R 5 , up to a unitary transformation, in the continuum limit, and as such preserving complex conjugate eigenpairs.
A careful study of the right panel of Fig. 3.5 shows that both the original operator and its coarsening feature eigenvalues with negative real part, that is, lying in the left-half plane. We refer to these eigenvalues as exceptional eigenvalues, borrowing the language from Wilson-clover fermion literature [60]. The existence of modes in the left-half plane invalidate proofs which bound the convergence of Krylov solvers [61]. We will see in Sec. 4   Table 1.
that, because error components in these exceptional modes are well solved by the coarse error correction, a recursive MG algorithm can successfully address this problem. As we will see in Sec. 4, this stabilizes the MG solve, independent of mass and volume, and is consistent with the success of MG for the Wilson operator beyond the critical mass.
Local Co-linearity versus the Oblique Projector The overall failure of MG for the staggered operator stemmed from the large error enhancement to the high modes from the coarse correction. A predictor of success for MG on the Kähler-Dirac preconditioned operator would be a significant reduction of this enhancement. We would also still need to see strong local co-linearity and a significant coarse error correction on low modes. In the left and right panels of Fig. 3.6, we consider the local co-linearity and oblique projector, respectively, of the Kähler-Dirac preconditioned operator on a representative configuration. We explore using both near-null vectors and right eigenvectors to define the prolongator P and restrictor R = P † .
On the left, we see that local co-linearity of low modes of the Kähler-Dirac operator is well maintained, similar to the original staggered operator. The benefit of coarsening the Kähler-Dirac preconditioned operator as opposed to the original staggered operator is most clearly noted by the action of the oblique projector as displayed on the right panel of Fig. 3.6. The oblique projector reduces the error component on the fine level for roughly the lowest 15% of the spectrum. Above this threshold, the error component is enhanced, but only minimally.  Table 1.

MG Algorithm Numerical Results
The convergence rate of our new MG algorithm on the Kähler-Dirac preconditioned operator, illustrated in Fig. 4.1, is a dramatic improvement relative to the failed MG algorithm applied to the original staggered operator in Fig. 3.1. The only methodological difference is coarsening the Kähler-Dirac preconditioned operator instead of the original staggered operator. Moreover, as we scan in the quark mass as shown in Fig. 4.2, we see that our formulation has eliminated ill-conditioning due to critical slowing down: unlike using CG on the even/odd preconditioned system, an MG solve takes a roughly constant number of outer iterations as the chiral limit is approached.
Let us now describe in detail the new algorithm and the numerical analysis for MG applied to the Kähler-Dirac preconditioned staggered operator. The parameters we choose are summarized in Table 1. First, we consider a two-level algorithm. We construct a right near-null vector ψ by relaxing on the homogeneous normal system AA † ψ = 0, using  Gaussian distributed random vectors ψ 0 as the initial guess † . In practice, this is performed in multiple steps.
• We convert the homogeneous system to the residual system AA † e = r ≡ −AA † ψ 0 .
• We relax on the residual system using CG to a relative tolerance of 10 −4 or a maximum number of 250 iterations.
• We reconstruct the near-null vector ψ = ψ 0 + e, where e is the result of relaxation. † We remark that A † A generally works just as well. We have also explored relaxing on A directly using BiCGstab and BiCGstab(l), l = 6 [62], which in practice works well at small volumes but degrades for larger volumes. The use of the normal operator may be why we can effectively capture exceptional eigenvalues. This is performed n 1 vec times, and then we globally orthonormalize the full set of nearnull vectors. We subsequently chirally double the near-null vectors using 1 2 (1 ± (x)), and form the second-level operator A = P † AP from the block-orthonormalized chirally-doubled null vectors. The coarse correction follows three steps. (1) Relax on the current residual, a process known as the pre-smoother, (2) approximately solve the second-level system: [RAP ] Re = Rr (or, equivalently, approximately solve Aê =r), giving the prolonged error correction e = Pê, and (3) post-smooth on the error accumulated from steps 1 and 2. In step 2 we use a Krylov solver, and as such the MG preconditioner is not stationary. For this reason, we use the restarted generalized conjugate residual (GCR) [63] as a flexible outer solver, forming a K-cycle. We use a global MR for our pre-and post-smoother. The specific details of these steps are given in Table 1. In practice, we iterate on the even/odd preconditioned system on the fine level, with the prescription where we coarsen assuming the odd contributions are all zero, and we also ignore the odd contributions in the prolonged error. This technique proved successful for the Wilson operator [64].
A two-level algorithm does not fully eliminate critical slowing down, it just shifts it to the second level. We address this by generalizing to a recursive algorithm, where we perform a still coarser correction to the system in step (2) of the above description.
We generate a third level, A , similar to how we generate the second level: we generate near-null vectors with A A † , chirally double the near-null vectors using 1 2 (1 ± σ 3 ), and subsequently form a third level.
This clearly generalizes to still coarser levels. For our numerical experiments in Sec. 4, we only study a three-level algorithm. Unlike on the fine level, the Krylov solve we perform on the intermediate level is an iteration directly on A, as we found this was more stable in practice. We approximately solve the coarsest level via CG on the normal error. Due to the exceptional eigenvalues which propagate to coarser levels, as noted in Fig. 3.5, numerical experiments with Krylov solvers acting on A were in general not successful. This was either due to stability reasons (using BiCGstab(l) [62]) or due to cost (using GCR). We believe using the normal operator is of critical importance.

Results
A successful, recursive MG algorithm will shift critical slowing down to the coarsest level. In the context of the Schwinger model, and four-dimensional QCD, this means we want consistent convergence independent of mass and volume. We are also interested in the MG algorithm being successful in all physically interesting regimes. In the case of our  Table 1: The parameters we use for our K-cycle. For consistency, we use the same setup parameters throughout the procedures described in this paper.  target problems, this means we need to study the behavior with the bare coupling β.
The continuum limit is taking β → ∞ at constant physics, where the relevant region is l Mπ > l σ . When β is too small, close to the cutoff scale, we are no longer studying relevant physics. A breakdown of MG for very small β is acceptable. The values of β studied, 3.0, 6.0, and 10.0, correspond to l σ ≈ 2.4, 3.5, and 4.5, respectively. The lowest value of β is becoming rather unphysical.
Elimination of critical slowing down: fine level The indication of a successful twolevel algorithm is the elimination of critical slowing down for the fine operator A, that is, constant iterations with respect to the mass and volume per each β. In Fig. 4.3, we present the number of applications of the fine operator A between the GCR algorithm and the MG preconditioner, which is proportional to the number of iterations for the outer GCR solve. On the left we consider the case of fixed physical β = 6.0 at varying volume. The number of A applications is roughly constant, independent of the volume and mass in the chiral limit. In the right panel, we consider our largest volume, 256 2 , fixed for three different values of β. We see that at β = 10.0 and 6.0, critical slowing down has been essentially eliminated as a function of mass. At β = 3.0, where we are probing somewhat cutoff scale physics, the number of iterations appears to not be diverging with power law behavior, and as such critical slowing down has still been eliminated.
Elimination of critical slowing down: intermediate level A successful recursive algorithm eliminates critical slowing down at each level. Thus, we consider the average We remark that in a highly optimized and tuned implementation, it is important that we use a K-cycle at the second level. In such an implementation, the maximum number of iterations on the coarsest level may be capped to some reasonable amount. This would cause the number of iterations on the intermediate level to increase. Since in a K-cycle the second level is solved to a fixed residual as opposed to a fixed number of iterations, the number of iterations at the finest level remains stable. Critical slowing down: coarsest level The previous two paragraphs demonstrate an elimination of critical slowing down from finer levels. Thus, there should be critical slowing down on the efficiently solvable coarsest level. In Fig. 4.5 we consider the average number of iterations for the coarsest solve via CGNE. In contrast to the previous two figures, these plots are on a log-log scale instead of a log-linear scale. In the left and right panels, we consider constant β = 6.0 and a constant volume of 256 2 , respectively. The number of iterations is divergent with power law behavior § . Critical slowing down has been shifted to the coarsest level.
Comparison with a direct solve In looking at the outermost level, the intermediate level, and the coarsest level in a three-level solve, we see that we have formulated a MG algorithm which shifts critical slowing down to the coarsest level. Furthermore, the solve is stable: in Fig. 4.1, we saw that a MG-GCR solve converges smoothly at our most chiral point for β = 10. There are large reductions in the relative residual on each iteration. On the other hand, the traditional solve with CG on the even/odd operator, despite converging successfully, converges very slowly, an indication of critical slowing down.
This behavior persists independent of mass. In Fig. 4.2, we trace the number of iterations away from the chiral limit, seeing that it is roughly constant. Critical slowing down has been eliminated. On the other hand, the number of iterations for a solve with the even/odd operator diverges with mass with power-law behavior. This is exactly the critical behavior that's been shifted to the coarsest level in Fig. 4.5. The benefit of our MG algorithm is drastic.

Continuum Limit
It should be emphasized that our fixed prescription is effective in the most relevant regime: towards the continuum, where the lattice spacing vanishes  relative to fixed physics, and in the chiral limit, where l Mπ diverges relative to l σ .
For the two-dimensional Schwinger model, taking the continuum limit at constant physics corresponds to simultaneously doubling the length scale of the fine volume, halving the mass, and quadrupling β. In Table 2, we consider the use of MG while taking the continuum limit from two base configurations. First, we consider a base configuration of 64 2 at m = 0.01 and β = 3.0, where we have discussed earlier that a MG algorithm is successful. On two successive refinements towards the continuum limit, we see that there is a reduction in the number of outer applications of A and in the average number of iterations on the intermediate level. In tandem, the average number of iterations on the coarsest level increases: there is more critical slowing down to shift to the coarsest level, which is to be expected, towards the continuum limit. Our MG algorithm performs better as the continuum limit is taken. Next, we consider a base configuration of 64 2 at m = 0.004 and β = 0.75, an unphysically coarse configuration. In this case, a MG algorithm fails to converge. Again, on progressive refinements, the MG algorithm becomes convergent and becomes better behaved as the continuum limit is taken.

Preserving Complex Conjugate Pairs
A possible, if not necessary, generalization for four-dimensional QCD or other staggered fermion problems could be the exact preservation of complex conjugate pairs upon coarsening. Indeed it is possible to develop a prolongator P and a restrictor R = P † , abandoning chiral doubling with projectors, which preserves complex conjugate eigenpairs after coarsening the Kähler-Dirac preconditioned operator, or any operator satisfying γ L/R 5 Hermiticity with γ L 5 γ R 5 = I. The resulting formalism gives what we will call an asymmetric coarsening with σ L/R 1 Hermiticity on the coarse level.
We consider a set of left and right vectors, ψ i | and |ψ i , respectively, which can generally be arbitrary and unequal. We perform a chiral doubling which gives These prolongators and restrictors obey P σ 1 = γ R 5 R † and σ 1 R = P † γ L 5 . This is sufficient to prove RA P is σ 1 Hermitian. The next step is to block bi-orthonormalize R and P , enforcing RP = I, by-products of which give us σ L/R 1 . As a clarifying tangent, we will consider the case γ L 5 = γ R 5 and |ψ i = |ψ i , that is, R = P † . This is true, for example, for the Kähler-Dirac preconditioned operator in the free field limit, or when considering the Wilson operator in general. The critical observation in this case is to recall that the process of (block) orthonormalization via a Gram-Schmidt is equivalent to a thin-QR decomposition. We define the block-dense matrix M of block dimension (coarse dof) × (coarse dof) as P † P = M = Σ † Σ, (5.2) where in the last step we have performed a Cholesky decomposition. We can rearrange Eq. 5.2 as By definition, P ≡ P Σ −1 is block orthonormal. With the definition σ Σ 1 ≡ Σσ 1 Σ −1 , we have γ 5 P = P σ Σ 1 , and P † AP is σ Σ 1 Hermitian.
We return to the (block) bi-orthonormalization of R and P . The above procedure generalizes to a "thin-LU" decomposition. Eq. 5.2 generalizes to R P = M = LU, (5.4) where in the last step we have performed an LU decomposition. We can rearrange Eq. 5.4 as R and P are block bi-orthonormal. We can show A ≡ RAP admits a σ L/R 1 Hermiticity condition via defining and noting The pair σ L/R 1 obeys σ L 1 σ R 1 = I, as can be verified by explicit calculation, requiring the critical and subtle observation thatRP is σ 1 Hermitian itself.
We emphasize that this construction is fully generic, whether or not |ψ i = |ψ i . We defer a discussion of numerical experiments with preserving complex conjugate eigenpairs to appendix A. Our deference to an appendix reflects our observations that, in two dimensions, (recursively) preserving eigenpairing actually leads to a less effective, and sometimes unstable algorithm. This method, or a further development thereof, may bear some fruit in four dimensions.
We make the additional remark that we can now make the algorithmic choice to rightblock-Jacobi precondition A, analogous to the transformation we made to the staggered operator in the Kähler-Dirac form in the first place, and continue to preserve complex conjugate eigenpairs if we coarsen again. Let us denote A = B + C, where B is the block-local contribution. The resulting right-block-preconditioned operator AB −1 obeys a σ rbj,L/R 1 Hermiticity condition with σ rbj,L 1 = σ L 1 B −1 and σ rbj,R 1 = Bσ R 1 . This recursive rightblock-Jacobi preconditioning did not lead to an effective algorithm in two dimensions.

Exact Preservation of Eigenvectors
In the case of, for example, the Wilson operator, chiral doubling with 1 2 (1 ± γ 5 ) preserves complex conjugate eigenpairs. We can choose the vectors |ψ i ≡ |ψ i to be right eigenvectors |λ +,R i with eigenvalues λ + i , where the + denotes that the eigenvalue has positive real part.
The coarse operator P † D W P exactly preserves the eigenvalue λ + i , and |λ +,R i is exactly preserved on the coarse subspace, that is, P P † |λ +,R i = |λ +,R i . However, even though chiral doubling guarantees the eigenvalue λ − i is also preserved by the coarse operator, it is not because |λ −,R i is exactly preserved by the coarse subspace, that is, We can use asymmetric coarsening to preserve |λ −,R i . We can choose |ψ i = |λ +,R i and ψ i | = λ +,L i |, then chirally double using Eq. 5.1 and subsequently block bi-orthonormalize the P and R. This operator preserves the eigenvalues λ ± i , and additionally P R|λ ±,R i = |λ ±,R i and λ ±,L i |P R = λ ±,L i |.

Conclusion
The first successful MG algorithm in LQCD was constructed for the Wilson discretization of the Dirac operator nearly a decade ago [6,7]. This advance relied on, at the time, the novel approach in LQCD to adaptively discover the near-null space and geometrically project onto coarse lattices. Remarkably, with the exception of the similar twisted-mass discretization, the basic method has not been easily generalized to two important methods: staggered and domain wall fermions, each of which feature improved chiral symmetry. A more fundamental understanding of MG methods in LQCD is clearly lacking. Here, we have taken a step towards this. For the staggered operator, we identified the spectral feature that was responsible for the failure of a straightforward generalization of Wilson MG and have overcome this problem by preconditioning by the Kähler-Dirac (spin-flavor) block structure. We demonstrate that this has a dramatic effect on the spectrum: in the singular, zero mass limit, the pure imaginary spectrum of the anti-Hermitian operator maps to a unitary circle of the form seen in the overlap operator.
The success of the resultant MG algorithm for this Kähler-Dirac preconditioned operator has been demonstrated numerically for the two-dimensional Schwinger model. Both the theoretical framework and the phenomenological features naturally generalize to the case of four-dimensional QCD. On this basis, we are optimistic that our staggered multgrid algorithm will have similar success in this application. Numerical tests for this conjecture are underway by extending the high performance MG framework of the QUDA library to coarsen staggered-like operators. These tests will be made on the largest available lattices to explore the scaling of the algorithm over a range similar to the two-dimensional tests presented here.
We have also made an effort to explore a range of projection methods that are capable of exactly preserving the complex conjugate pairs of eigenvalues present in the Kähler-Dirac preconditioned operator. We hope our emphasis on spectral analysis and transformations will provide some flexibility in adapting our algorithm not only to four-dimensional QCD but also to similar Dirac discretizations found in BSM theories [2], supersymmetric Yang-Mills theory [20], and quantum critical behavior in condensed matter [19].

A Studies of Preserving Complex Conjugate Eigenpairs
In Sec. 5, we developed a formalism to exactly preserve complex conjugate eigenpairs for a coarsened Kähler-Dirac preconditioned operator. This used an asymmetric coarsening which gave a σ L/R 1 on the coarse level. This formulation is largely successful, however, it can suffer from anomalously large real eigenvalues in the negative half plane, destabilizing the MG preconditioned solve, in cases where the symmetric coarsening proceeded without issues. If these stability issues can be addressed, it may lead to a better algorithm in two dimensions and four dimensions. As appropriate, this will be the topic of a future publication. This appendix will follow the structure of Sec. 3.3, where we study the spectrum, local co-linearity, and oblique projector of the asymmetrically coarsened operator in the case where a recursive algorithm is successful. We will then scan the iteration counts as a function of mass, similar to in Sec. 4.1, and identify cases where the algorithm breaks down. Last, we will investigate one of these cases.
In Sec. 3.3 we considered a representative spectrum of the Kähler-Dirac preconditioned operator and a symmetric coarsening. In the case of asymmetric coarsening, we again expect the low modes to be preserved well, but additionally come in complex con- jugate pairs. This is exactly the case in Fig. A.1, where we overlay the spectrum of the asymmetric coarse operator. We also see a "feature" of σ L/R 1 Hermiticity: there are pairs of purely real eigenvalues.
In the case of the Wilson or overlap operator, pairs of purely real eigenvalues have a significant physical interpretation. The smaller real eigenvalue corresponds to a physical chiral mode via the lattice index theorem [65], which thus needs to be well captured by a MG algorithm. The paired large real eigenvalue is merely a quirk of being on a finite lattice, and thus lives as an isolated large eigenvalue near the cutoff. On the other hand, the pairs of real eigenvalues for the coarsened Kähler-Dirac operator do not have an obvious physical intuition, just as the naïve staggered fermion operator does not trivially correspond to an index theorem [43]. These purely real eigenvalues are a symptom of unstable solves at Returning to stable solves, we consider the local co-linearity and the oblique projector under an asymmetric coarsening. These are overlaid on the data for a symmetric coarsening in Fig. A.2. An asymmetric coarsening is roughly comparable to quality to a symmetric coarsening, indicative of a successful MG algorithm. ¶ As a next task, we consider MG preconditioned solves with the asymmetric coarsened operator. We will only present a subset of the cases considered in Sec. 4.1 and instead focus on the cases where the solve is unstable: large volumes. The number of fine operator applications and average intermediate applications are presented in Fig. A.3. In the cases where a data point is marked by a "×", the solve failed. The failures are largely confined to smaller masses, but not with a discernable pattern; indeed, for β = 6.0, the lowest masses had stable solves! We present the spectrum of the asymmetric coarsened operator, where an MG solve with an asymmetric coarsened operator fails, in the left panel of Fig. A.4, where we see there are now large, real eigenvalues far in the right plane and also in the left plane. There ¶ In general, the local-colinearity is not bounded by 1 when R = P † . This is because (1 − P R) is not a normal operator. Thus, for a normalized vector v, v † 1 − R † P † (1 − P R) v isn't bounded by 1. This can be realized by the bi-orthonormal basis p 1 = (1/2, 1/2, 1/2, 1/2), p 2 = (1/2, 1/2, 1/2, −3/2), r 1 = (1/2, 1/2, 1/2, 1/2), r 2 = (1/2, −1/2, 1/2, −1/2). is also a large negative real eigenvalue at approximately -26.75. These pathological real eigenvalues are not part of the low subspace and are therefore not well captured by our MG algorithm. However, in the right panel, we see that the low spectrum is still well behaved. It is a point of future research to see if these anomalously large, real eigenvalues can be addressed.