Nucleon Mass with Highly Improved Staggered Quarks

We present the first computation in a program of lattice-QCD baryon physics using staggered fermions for sea and valence quarks. For this initial study, we present a calculation of the nucleon mass, obtaining $964\pm16$ MeV with all sources of statistical and systematic errors controlled and accounted for. This result is the most precise determination to date of the nucleon mass from first principles. We use the highly-improved staggered quark action, which is computationally efficient. Three gluon ensembles are employed, which have approximate lattice spacings $a=0.09$ fm, $0.12$ fm, and $0.15$ fm, each with equal-mass $u$/$d$, $s$, and $c$ quarks in the sea. Further, all ensembles have the light valence and sea $u$/$d$ quarks tuned to reproduce the physical pion mass, avoiding complications from chiral extrapolations or nonunitarity. Our work opens a new avenue for precise calculations of baryon properties, which are both feasible and relevant to experiments in particle and nuclear physics.


I. INTRODUCTION
Lattice-QCD calculations have entered a precision era, with total uncertainties below one percent for some simple properties of mesons and Standard Model parameters that can be determined from them [1][2][3][4][5][6]. It is important for interpreting experiments in nuclear and particle physics to extend such precision calculations to nucleon properties. For example, various nucleon expectation values (with no momentum transfer) are needed for precision nucleon beta decay (scalar and tensor charges), direct dark matter detection (sigma terms), and highenergy scattering (moments of parton distribution functions) [7,8]. With nonzero momentum transfer, there are form factors pertinent to lepton-nucleon scattering [9]. In particular, lattice-QCD calculations of vectorcurrent form factors can be compared to measurements in electron-nucleon scattering, while very similar calculations of (nucleon) axial-current form factors are needed as inputs to the analysis of neutrino-nucleus scattering.
To carry out a lattice QCD calculation one must first choose a discretization for the quarks and gluons. Because of the doubling problem of lattice fermion fields, the quarks are the more complicated consideration. The precise meson-sector calculations referred to above employ the "highly improved staggered quark" (HISQ) action [10]. Not only are the discretization effects small (by design and, it turns out, in practice [11][12][13]), but also the MILC collaboration has generated two dozen * yin01@uchicago.edu † ameyer@quark.phy.bnl.gov ‡ chughes@fnal.gov § ask@fnal.gov ¶ simone@fnal.gov * * astrel@fnal.gov ensembles of SU(3) gauge fields with 2+1+1 flavors of sea quarks (where "2" implies the up and down quarks are chosen to have equal mass, and the strange-and charm-quark masses are tuned close to their physical value). The MILC HISQ ensembles [3,14] have four lattice spacings (a ≈ 0.15, 0.12, 0.09, 0.06 fm) with pion masses near 135 MeV, 210 MeV, and 300 MeV, a fifth (a ≈ 0.042 fm) at 135 and 300 MeV, plus a sixth (a ≈ 0.03 fm) at 300 MeV only. It is worth investigating how useful these ensembles are for nucleon physics. Here we present our first step in this direction: a calculation of the nucleon mass employing the HISQ action for the valence quarks and using the MILC HISQ ensembles with physical pion masses, which have lattice spacings ranging from a ≈ 0.15, 0.12, and 0.09 fm. The valence masses are chosen equal to the equal-mass light pair in the sea. An advantage of using only the physical-pion ensembles is that we do not need to extrapolate unphysical-pion-mass data to the physical limit.
These ensembles have already been used for nucleon physics [15][16][17][18][19][20] using different fermion formulations for the valence quarks instead of HISQ. As a consequence, there are violations of unitarity expected to be of the order a 2 . It is worthwhile to explore a setup which does not have this complication.
The challenge for an all-staggered calculation stems from the remaining doubling: one staggered fermion field yields four Dirac fermions. The quantum number labeling the four species is known as "taste". In the continuum, infinite-volume limit, SU(4) taste and SO(4) spacetime symmetries are expected to become separately exact. At nonzero lattice spacing, however, the taste-rotation symmetry group (of the transfer matrix) is a finite group lying in a diagonal subgroup of SU(4) × SO(4) [21][22][23]. Consequently, it is complicated to construct staggered-baryon creation and annihilation operators [22], especially when isospin and strangeness are incorporated [24]. These complications need to be confronted only once for each correlation function, after which one can study whether the all-HISQ formulation is promising for simple quantities (e.g., masses and form factors). If successful, increasingly more complicated quantities can be determined. Here, we start with the nucleon mass.
This paper is organized as follows: in Sec. II the construction of the staggered baryon irreducible interpolating operators is discussed, in Sec. III the simulation details are given, and in Section IV our different fitting methodologies are described. Specific details about fitting the staggered nucleon two-point correlators is given in Sec. V, and these results are combined with all sources of systematic errors in Sec. VI to produce a final estimate of the nucleon mass. We then discuss our conclusions in Sec. VII. Appendices A, B, and C spell out the grouptheoretic construction of staggered baryon operators in detail.

II. BARYONS BUILT WITH STAGGERED FERMIONS
Here we outline the construction of our baryon operators using staggered quarks, and refer the reader to Appendices A, B, and C for more technical details.
With staggered fermions [25], the doubling problem is partly solved. With the simplest, most naive discretization, one finds a set of "doubling symmetries" that, in the end, imply that a single lattice fermion field corresponds to 16 Dirac fermions in the continuum limit. A subset of doubling symmetries can be simultaneously diagonalized [26,27], leaving four identical, decoupled onecomponent fields. Three of these four copies can simply be removed, leaving four tastes instead of 16. This procedure retains one exact axial symmetry, which is a nonsinglet with respect to taste. This remnant is enough to ensure several important consequences of chiral symmetry: if the bare mass vanishes, the pion mass vanishes; renormalization constants related by chiral symmetry are equal, etc.
On the other hand, the translational and doubling symmetries are not separate after this diagonalization. The lattice action is invariant under a composition of the two known as "shifts", which multiply the fermion field with a phase that depends on the originating site and direction of the translation.

A. Staggered Baryon Quantum Numbers
The diagonalization of the doubling symmetries leads to an intimate relationship between the spin-taste quantum numbers of a hadron creation/annihilation operator and the spatial distribution of the constituent quark fields. For computing masses and matrix elements, the relevant symmetry group is the subgroup of lattice trans- formations restricted to a single timeslice. This group is called the geometric timeslice group [22], and denoted "GTS". For a meson bilinear, operators that transform irreducibly under GTS can be constructed by fixing both the relative displacement between the quark and antiquark in the bilinear, in combination with fixing the relative signs between the bilinear from one lattice site to its neighbors. Rotation symmetries interchange the staggered phases identified with the sites, and shift symmetries induce phase changes in meson operators. For baryons, we need the fermionic irreducible representations (irreps), which are much more complicated. GTS has three fermionic irreducible representations, labeled 8, 8 , and 16, which are simply the dimensions of the irreps. The staggered quark field with zero momentum transforms irreducibly under the 8, where the 8 elements of the representation map onto the 8 vertices of a spatial unit cube. Shift symmetries and rotations interchange quark fields at the unit cube sites, and also change the phases of the sites relative to one another.
The characters of the 8 irrep are similar to those of the fundamental 8 irrep, except that the π/2 rotations have the opposite sign. The 16 irrep splits into two disjoint sets of eight elements; shifts and rotations about the z axis map an element of a subset into another element of the same subset, while rotations about the x and y axes map elements to a linear combination of elements from both sets. The elements of the 16-irrep at each unit cube site are comprised of linear combinations of terms that appear to transform as a 3-vector under rotations. Upon closer inspection, only two of these linear combinations are independent. We conventionally construct the 16-irrep elements such that the corresponding baryon operators at the origin are eigenstates under a π/2 z-axis rotation with eigenvalue ±1. The remaining 16-irrep elements can be obtained by applying shifts to take these two rotation-eigenstate operators at the origin to each of the remaining unit cube sites.
The mapping of the continuum spin representations to the lattice octahedral representations is given in Table I. There is a one-to-one correspondence between the 8, 8 , and 16 representations of the GTS group to the conventionally named G 1 , G 2 , and H representations of the double cover of the cubic rotation group [28]. This mapping is a consequence of the reduction of the continuum taste symmetry group SU(4) to the Clifford group [23,29]. Here, Q 8 is the order-8 quaternion group and D 4 is the order-8 dihedral group. Both Q 8 and D 4 only have one fermionic representation, and in both cases this fermionic representation is 2dimensional. The method of induced representations tells us that these fermionic representations are the only representations that can appear when lifting a fermionic representation from the octahedral group to the GTS group. In this way, the only modification of the G 1 /G 2 /H representations when including the taste symmetry is to increase the dimension of these representations from 2/2/4 to 8/8 /16.
In phenomenological models, the structure of baryonic wave functions is typically understood by embedding the SU(2) spin and SU(3) flavor groups into an SU(6) symmetry group. As baryonic wave functions need to be overall antisymmetric, and the color component is antisymmetric, it is necessary to isolate the representations of SU(6) that are symmetric combinations of spin and flavor.
This embedding procedure may also be performed for the staggered fermions by combining the SU(4) taste symmetry group with the aforementioned spin and flavor groups. These three groups are thus embedded into SU(24) [24]. The representations of SU(24) are decomposed back into SU(2) × SU(3) × SU(4) in order to understand the symmetry structure of the resulting operators. This procedure yields the usual baryon octet and decuplet (paired with symmetric taste representations) as well as several other representations that are mixedsymmetric or antisymmetric in taste.
Notably, it is possible to construct baryon operators that have nonsymmetric taste. However, these operators will also have nonsymmetric spin and flavor wave functions that, when combined with the anti-symmetric color component, produce an antisymmetric baryon wave function. Such operators have no close analog in the physical world without taste. For example, in the physical world, symmetrizing over spin and isospin only allows for combinations of spin 1/2 (3/2) with isospin 1/2 (3/2), which correspond to the nucleon (∆). With the additional taste symmetry, it is also possible to build operators nontrivial symmetrization over tastes that create states that have isospin 3/2 with spin 1/2, and vice versa. Further, Bailey [24] shows that in the continuum limit each of these additional states lies in a multiplet that is related by a SU(12) flavor-taste symmetry transformation to the physical nucleon or ∆ states. Consequently, these taste nonsymmetric (iso)spin (3/2) 1/2 representations must give identical physics in the continuum limit to the physical nucleon/∆. We refer to these states as "N -like" or "∆-like".
In each irrep of GTS, multiple taste partners of the same baryon can contribute, which we call multiplicity of tastes, e.g., three N -like tastes lie in the 8. The expected multiplicity of the lowest-lying multiplet (i.e., not orbital or radial excitations) of N -like and ∆-like states are given in Table II. Excited multiplets are expected to have the same taste multiplicities if they share the same particle content and J P quantum numbers. In the numerical work presented below, we only use the 16 irrep of the isospin 3/2 constructions of Table II, because the 16 irrep only has a single N -like state, whereas the 8 has three and the 8 has none.

B. Interpolating Operator Construction
Here we give an overview of our interpolating operator construction and refer the reader to Ref. [24] and Appendix C for more technical details. When constructing staggered baryon interpolating operators, it is easier to work on one timeslice with combinations of quark fields which are defined by their displacement from the origin modulo 2. There are eight sets of these quark combinations which can be labeled by a unit cube corner A with A ∈ {0, 1} and ∈ {1, 2, 3}. We refer to these objects as "corner walls", and write them as The sum over " x@ A" is defined by summing x over all sites on a timeslice that are displaced modulo 2 from the origin by the vector A: for some general function f ( x). Here, N s is the (even) number of sites in a spatial direction.
Interpolating operators with nontrivial taste quantum numbers have two or more quarks at different spatial sites. To make these operators gauge invariant, parallel transporters must be inserted to connect the quarks at different spatial sites. Our parallel transporters are defined as (with color indices suppressed) which have an equal sum of links in both the positive and negative directions away from a lattice site, ensuring the operators have simple transformations under discrete rotations.
The parallel transporters from perpendicular directions are chained together to build operators that obey the full set of spin-taste symmetries allowed by the staggered lattice symmetry group. The dressing with par-allel transports may be denoted with a vector B, with B ∈ {0, 1} and ∈ {1, 2, 3}, indicating the displacement away from the starting site. We write these eight parallel transport dressings with the condensed notation, To construct gauge invariant operators, the quark fields are dressed with these parallel transporters: where a and b are color indices. A baryon operator, B, is a quark trilinear that needs to be overall antisymmetric when considering color, flavor, spin, and taste. Because color is antisymmetric, B must be symmetric under simultaneous interchange of the flavor and spin-taste of any two quarks, namely, where the flavor indices i, j, k, and the spin-taste unitcube indices A, B, C are for the three quarks in the baryon. The remaining unit-cube index D is discussed below.
As mentioned above, we restrict ourselves to the isospin I = 3/2 representations of a baryon in this paper. As the I = 3/2 irrep built from three I = 1/2 irreps is completely symmetric, in the following we drop the flavor indices for clarity. Then baryon operators are constructed as Here, the index D is the site where all parallel transporters meet. Next, the baryon operators must be symmetrized over the flavor and spin-taste unit cube site indices. The unit-cube sites of the quarks are symmetrized via To build interpolating operators that transform in a desired irreducible representation of GTS, one must define appropriate tensors and contract them with the symmetrized baryon operators S D, A B C in Eq. (2.8). For the fermionic irreps of the GTS symmetry group, these tensors can be written as O R s D, A B C , where R is an index distinguishing different irreducible representations and spatial combinations, and s is an additional index for the 16 irrep. It trivially takes one value for the two 8-dimensional irreps, and in the 16-dimensional irrep s = ±1. The spin-taste unit cube index D serves as the component of the irrep. In Appendix C, we give explicit formulas for the O R s D, A B C that we use in this paper. Finally, the baryon interpolating operator that transforms within a definite GTS irrep is where D is not summed over. Here, D and s D denote the representation index for irrep 8, 8 , or 16. The antibaryon operator is a similarly symmetrized trilinear, but with antiquarks rather than quarks. Further, we use the conjugate of the construction in Eq. (2.1), without parallel transporters, rather than Eq. (2.5). We denote this object as S after replacingχ A, B ( x) →χ A (with no B or x dependence). These operators are (2.10) We retain the spatial dependence in B R s D , but not in BR s D , because in Sec. III we use them as sink and source, respectively.
As described in Appendix C, operators in each irrep can be obtained from distinct "classes" of the three quark fields. Here, "class" is shorthand [22,24] for distinct spatial distributions of the three quark fields within the TABLE III. Details of the gauge-field ensembles used in this study. β is the gauge coupling; a (fm) is the lattice spacing in a mass-independent fp4s scheme [3]; am l , ams, and amc are the sea quark masses; Ns × NT gives the spatial and temporal extent of the lattices; and n cfg is the number of configurations used for each ensemble.
Set β a (fm) am l ams amc Ns × NT n cfg unit cube. As operators with identical quantum numbers, these classes of operators all excite the same N -like or ∆like states. 1 As described in Appendix C, in this work we use the four classes of the I = 3/2, GTS 16 irrep, which are labeled as class 2, 3, 4, and 6 [22,24]. With any of these operators at the source or sink, we consequently have a 4 × 4 matrix correlation function.

III. SIMULATION DETAILS
We use gauge field configurations generated by the MILC collaboration [3,14]. For the gauge fields, they employed the one-loop tadpole-improved Lüscher-Weisz gauge action improved through O(α s a 2 ) [30] and included 2 + 1 + 1 flavors in the sea, the up and down quarks (with equal mass m l ), the strange quark (m s ), and the charm quark (m c ). For the sea quarks, MILC employed the HISQ action [10], also improved to O(α s a 2 ) by removing one-loop taste-changing processes. For valence quarks, we use the HISQ action with the same bare masses as their sea counterparts. This choice introduces no additional unitarity violations from a mixed action [31]. Further, the remnant chiral symmetry ensures there are no unwanted near-zero modes in the propagators at nonzero quark mass [32].
To enable a continuum extrapolation, we choose three ensembles, with lattice spacings in the range a ≈ 0.09-0.15 fm. Details of these ensembles are given in Table III. The spatial volumes of the lattices are large enough to ensure single particle finite volume effects are exponentially small [33]. Each ensemble has m l tuned to reproduce the physical pion mass. They differ from those listed in Refs. [3,14] by retuning the light sea-quark masses to reproduce the pion masses more accurately. The retuning does not alter the lattice spacing values, which are determined from the mass-independent scheme.
Although MILC has generated many ensembles with unphysically large m l , we do not use those ensembles here. In this way, we can circumvent using baryon chiral perturbation theory (or some other physically motivated function) to guide the unphysical data to the physical value. There is some evidence [34] that high-order functional forms are necessary. To gain control over the chiral extrapolation it might, in this case, be more costly, because numerous ensembles could be needed.
To compute baryon masses, we construct the two-point correlation function where the source is defined in Eq. (2.10) and the sink is defined in Eq. (2.7). In order to increase statistics for ensembles 1 and 2, we choose two well separated timeslices for t = 0. The locations of these two timeslices are chosen randomly for each configuration. Successive configurations generated within each ensemble are expected to be correlated. These autocorrelations were studied in Ref. [14] and were, however, found not to be appreciable, so that these configurations can be treated as statistically independent. Even so, we reduce the autocorrelations by blocking two consecutive configurations to obtain each sample. By expressing hadron correlators in terms of quark fields and then using Wick's theorem in the Feynman path integral, one can write the two-point correlation functions in Eq. for the Green function G ac A (x, t), where / D ab xy is the kernel of the HISQ action [10].
A straightforward way to construct the full set of correlation functions would require 64 different quark propagators, one for every source corresponding to the paralleltransported field, χ a A, B ( x) in Eq. (2.5). To reduce the number of propagators, we have fixed the gauge fields to Coulomb gauge. Then the links connecting the quark fields in the source interpolating operators are no longer necessary. Instead, only the eight propagators with the corner-wall sources specified in Eq. (2.1) must be computed. It turns out that the gauge fixing improves the signal significantly. Without gauge fixing, it would be necessary to introduce the parallel transporters within each unit cube. Contributions from different cubes would average to zero, albeit introducing some gauge noise. With gauge fixing, however, every part of the corner walls is linked to the others, providing a helpful volume factor in the signal.
The parallel transporters at the sink are applied after all quark propagators have been calculated. The only nonzero correlation functions are those where the quantum numbers are conserved, e.g., where R andR belong to the same irreducible representation. The correlation functions are also nonzero when all unit-cube sites D are summed without any staggered phase factor, as reflected in Eq. (3.1), which increases statistics eightfold.

IV. FITTING METHODOLOGIES
After generating data for the two-point correlation function, Eq. (3.1), the next step is to extract the baryon masses. In this section, we discuss general aspects of the problem; in Sec. V, we apply these considerations to the data at hand.
Inserting a complete set of eigenstates of the QCD Hamiltonian into Eq. (3.1) yields (for the 16 irrep) the spectral decomposition where a (r1) and b (r2) are the source and sink overlap amplitudes, M is the mass of the corresponding state, t is the propagation time, and T = N T a is the time extent of the lattice. The superscripts r 1 and r 2 indicate the classes of the source and sink operators. For clarity, we separate the two-point correlator into C N (t), C ∆ , C r (t), and C − (t), which are, respectively, the terms containing only contributions from the ground state nucleon, the lowest three ∆-like states, all other positive parity excited states, and all negative parity states, respectively. Due to the antiperiodic boundary conditions of the lattice, two-point correlator data at time t and T − t should converge to the same result (up to a sign) at infinite statistics. To remove this redundancy, it is convenient to average the correlator data around T /2, substituting 2) and then fitting in t only up to T /2.
Although an infinite number of states contribute to the spectral decomposition, the exponential suppression in Eqs. (4.1) of excited states effectively reduces the sums to a small number of states. Even so, a few still contribute to the correlator after a few time steps. Thus, one obstacle in extracting accurate information about a particular state is correctly disentangling its contribution from the other states' contributions.
In this work, we are only interested in the ground state nucleon, so the contamination just mentioned comes from excited states. We treat all excited states, for both positive and negative parities, as nuisance parameters. There are three kinds of excited states: 1) the three tastes of ∆-like states in C (r1,r2) ∆ , 2) the other parity-even excited states in C (r1,r2) r , and 3) the negative parity partners in C (r1,r2) − . Concerning 1), the nucleon and ∆ baryons have distinct quantum numbers, so in other fermion formulations the ∆ baryon does not contribute to their nucleon correlation functions [15]. Here, however, the 16 irrep contains both the N -like and ∆-like states, as shown in Table II. From experiment [35], the ∆ mass (≈ 1232 MeV) is closer to the nucleon mass (≈ 940 MeV) than any other J P = 1/2 + or 3/2 + excitations, so it is the most important excited state in staggered-baryon correlator data. Further, the 16 irrep contains three tastes of ∆-like states, which are separated by a splitting of order α s a 2 .
Regarding 2), after extracting the N -like and ∆-like states from the correlator data, the next state in the positive-parity spectrum is the N π state in a P wave and, thus, energy near 1250 MeV on our ensembles. The next single-particle state in the spectrum is expected to be the so-called Roper resonance N (1440), which has J P = 1/2 + . Extracting these excited states is, however, unlikely within our statistical precision. Still, we include every state as a nuisance parameter when fitting correlator data to avoid any excited-state contamination.
Last, concerning 3), all our correlation functions show oscillatory behavior as a consequence of the negativeparity states. The lowest single-particle contribution to this channel is expected to be the N (1520) with J P = 3/2 − . The lowest finite-volume two-body state in this channel is expected to be the N π in an S wave with zero momentum and, thus, energy around 1080 MeV. We expect finite-volume corrections from scattering between the two states [36], with potentially large effects near the N (1535) resonance. In the meson sector, extracting twobody eigenstates from correlation functions built from single particle interpolating operators has not been possible [37]. With staggered baryons, even though we only use single-baryon interpolating operators, we may still be able to resolve the lowest negative-parity two-body eigenstates, because the next-lowest state has an appreciable splitting. Moreover, the operator constructions for the 16 irrep classes are somewhat nonlocal, being spread out over the whole unit cube. In fact, evidence of negative parity N π states has been found using only three-quark operators with Wilson fermions [38].
These contributions all must be dealt with carefully in order to extract nucleon physics. Because we compute four different classes of staggered-baryon operators, as described in Sec. II B, we obtain a 4×4 matrix correlation function. Further, we adopt two distinct fitting strategies to ensure that we have removed excited-state contamination reliably. In Sec. IV A, we apply multi-state Bayesian curve fitting [39] to the matrix correlation function, using all information in the spectral decomposition, Eq. (4.1). In Sec. IV B, we solve the generalized eigenvalue problem (GEVP) [40][41][42][43], adapted to correlators with oscillating states [44]. This is particularly suited to staggered baryons because the Golterman-Smit-Bailey construction naturally provides several distinct operator classes. We have found that the two analyses yield consistent results for the nucleon mass.

A. Bayesian Approach
In the Bayesian fitting approach, we simultaneously fit multiple different correlators using the corrfitter and related packages [45][46][47]. We do not fit correlator data built from a class 3 interpolating operator, which we have empirically observed to have a poor signal-to-noise ratio due to a smaller overlap with the lowest N -like state. As such, we fit a 3 × 3 matrix of correlation functions built from the r i = 2, 4, and 6 operator classes residing at either the source or sink.
Within the Bayesian methodology, every fit parameter is assigned a prior distribution. The fit function can incorporate, in principle, an arbitrarily large number of states. Any state insufficiently constrained by the data will return a posterior distribution identical to the prior, and so will have negligible effect upon the fit results. With such an approach, in contrast to plateau fitting (of the effective mass), we can include as many states as needed to successfully fit the correlation functions at small t without compromising fit quality. As baryon correlation functions suffer from an exponential signal-tonoise degradation at large times, fitting to small t increases the amount of available data, leading to a more precise nucleon mass.
Suitably wide priors have to be chosen for each fit parameter, based on available information. In practice, one has knowledge only about the first few excited-state mass splittings, while one has very little knowledge about the remaining spectrum or overlap amplitudes. For the nucleon fit parameters, we want to assume no significant prior knowledge, so the prior width is chosen wide enough to leave them effectively unconstrained. As is standard, we shall demonstrate that our fitted nucleon mass is stable against reasonable variations of the prior widths. To ensure the correct ordering of states, we choose a lognormal distribution for the mass differences between adjacent states and a normal distribution for all other priors, as implemented in corrfitter [45].
We have observed that fits including all three ∆-like tastes are unstable under a variation of fit choices, such as the number of states in the fit function and prior choices. This is caused by the presence of three nearly degenerate ∆-like states in the spectrum, which have masses separated by O(α s a 2 ) taste splittings. Resolving three states with such small splittings is not possible within our statistical errors, and including these three states in a fit function gives rise to a flat direction in the χ 2 landscape.
Knowing the cause of the issue, it is easy to overcome it by removing the flat directions systematically. For simplicity, we can examine a single correlation function. Let δm taste denote the typical taste splitting between the ∆like masses. The taste splittings between HISQ pions are O(α s a 2 ) [10], the largest of which is between the tastescalar pion and Goldstone pion and is around 200 MeV on our ensembles [14]. We order the three tastes-split states as M ∆1 < M ∆2 < M ∆3 and take M ∆3 − M ∆1 ∼ δm taste as the largest taste splitting.
To marginalize the ∆-like contribution, we replace C ∆ (t) in Eq. (4.1) with a functional form containing two exponentials instead of three, where the backward propagating terms and operator class index are suppressed for clarity. One can safely use C ∆ (t) in place of C ∆ (t) for times t such that To explore the systematic error introduced by C ∆ (t), one can Taylor expand both C ∆ (t) and C ∆ (t) around M ∆ 1 to find  Table IV. Parameters for C ∆ (t) are calculated using Eq. (4.6)-(4.8). The t fit within the white range are those used in the later analysis. The correlator data used to generate the crosses are our most statistically precise, coming from the a ≈ 0.12 fm ensemble with a class-2 operator at both source and sink.
Treating a ∆i , b ∆i , and M ∆i as fixed, we can then solve for A ∆ i and M ∆ i order-by-order. There are three free parameters in C ∆ (t) and we can solve for these analytically to This analysis (which has been used successfully before [48]) shows that the two-and three-state fits are equivalent if the statistical error is larger than O((δm taste t) 2 ). In Fig. 1, we compare our statistical precision to the quantity defined in Eq. (4.4) for various δm taste . The data are from the a ≈ 0.12 fm ensemble with the class 2 operator residing at both the source and sink. This correlator has the smallest statistical error of all our data, giving it the best opportunity to resolve the various tastesplit states. As can be seen, the systematic error introduced by using C ∆ (t) is smaller than our statistical precision, and so we are unable to distinguish between C ∆ (t) and C ∆ (t). If we do try to fit to C ∆ (t), we observe the expected flat direction in the ∆ parameters.
In summary, the final function that we perform a Bayesian fit to is (4.9) We have found empirically that C r (t) and C − (t) contribute very little to the correlator during our fit range. Note that one should be careful about assigning physical meaning to the ∆ states, because the corresponding fit parameters at best become physical in the continuum limit where taste symmetry is restored, so that all ∆ and (presumably) ∆ masses converge to the same value.

B. GEVP Approach
The construction of staggered baryons naturally gives rise to different classes of interpolating operators, which can form a basis for the GEVP. With this GEVP approach we can extract the mass of the lowest N -like state.
In our analysis of the 16 irrep correlator data we have a 3 × 3 matrix C(t) ≡ C (r1,r2) (t), where r 1 , r 2 ∈ {2, 4, 6} denotes the class of the operator at the source or sink. The GEVP is defined via where v L i (t, t 0 ) and v R i (t, t 0 ) are the generalized left and right eigenvectors. They share the same set of generalized eigenvalues λ i (t, t 0 ). Here, i ∈ {1, 2, 3} as we have a rank three correlation matrix.
For staggered fermions, λ i (t, t 0 ) has the form [44] lim (4.12) where M i is the mass of i-th particle in the spectrum, δM i accounts for contamination from oscillating states, and A i , B i are coefficients which satisfy A i + B i ≈ 1. In the asymptotic large-time limit each of the λ i would couple to just one state. In practice, due to the signal-to-noise problem, it is not possible to take this limit and in finitetime computations some excited state contributions are present in the eigenvalues. However, this contamination can be controlled by varying t 0 and t, as shown below.
Based on the arguments in Sec. IV A, we can marginalize a ∆-like state and constrain the effective ∆ contributions. As a consequence, we expect the N -like ground state and the two effective ∆ states to be the three distinguishable states from the asymptotic time GEVP with a 3×3 basis. t 0 is chosen to suppress excited state contributions to the eigenvalues, which in this work come from the higher nonoscillating states and all of the oscillating states. The traditional approach employed to suppress excited state contamination in the GEVP is to choose a large enough t 0 in either Eq. (4.10) or (4.11). Below we show control over excited contributions by varying the choice of t 0 in Sec. V B, where we observe such excited states have a negligible impact on the quality of fits.
The presence of the staggered phase (−1) t/a in Eq. (4.12) can be mitigated via a combination that becomes an exact cancellation in the asymptotic large-time limit. To achieve such an affect, we take the symmetrized combination which cancels most of the opposite-parity contributions. A similar expression holds for the left eigenvectors v L i (t). At large t and t 0 , λ i (t, t 0 ) → λ i (t, t 0 ), but for intermediate t and t 0 λ i (t, t 0 ) is much better behaved due to the explicit cancellation of oscillations. In the scenario with a 1 × 1 correlator matrix, this approach is equivalent to the smoothed effective mass [44,49].
After performing the GEVP analysis and determining the eigenvalues, we perform unconstrained plateau fits to where C i and δM i are fit parameters used to account for any residual excited-state contribution, and τ = t − t 0 .
In the subsequent analysis, we fix τ /a = 2 and vary t 0 .
In this way we can obtain the nucleon mass M N = M 1 .

V. FITTING NUCLEON CORRELATOR DATA
In this section, we present specific details about fitting the 16 irrep two-point correlators. The N -like mass on each ensemble is determined as a fit parameter from this process. The Bayesian and GEVP analyses are discussed in turn.

A. Bayesian Analysis
Table V summarizes the priors used in the Bayesian fits for the three ensembles that we utilize. Detailed prior and posterior values are given in Table XI of Appendix D. For each two-point correlator, t min is fixed in physical units across all three ensembles to be ≈ 0.6 fm and t max is chosen to be the timeslice where the signal-to-noise exceeds 5%.
We choose large prior widths for the excited states in the region of the ∆ mass, because finite-volume states could receive significant corrections from avoided level crossings with N π-like scattering states, amongst others. The mass difference between the lowest N -like state and the ∆ 1 state is chosen so that M ∆ 1 ≈ 1230 MeV, which is the physical ∆ resonance mass. We choose a prior width of 100 MeV. To roughly match the observed taste-split masses in the staggered meson sector, the mass difference between ∆ 1 and ∆ 2 states is given a prior of 150 ± 50 MeV, 100 ± 50 MeV and 50 ± 50 MeV on the 0.15 fm, 0.12 fm and 0.09 fm ensembles. The other mass differences, for both even and odd parity states, are 400 ± 200 MeV.

Fit Analysis
For brevity, we focus on the ensemble with a ≈ 0.12 fm, with the corresponding information for a ≈ 0.15, 0.09 fm in Appendix D. We plot the raw correlator data overlaid with the fit result in Fig. 2, showing the results for the correlator matrix element, C (ij) , where i, j = 2, 4, 6 corresponds to the different source and sink operators. The fractional residues in the bottom panels of these figures are defined by frac. res. ≡ data − nominal fit data (5.1) where the correlations between the data and the nominal fit are not included and the exhibited error bars are statistical.
To further examine the quality-of-fit of the nominal fits, we show effective masses in Fig. 3. Again, we plot results for each correlator matrix element C (ij) . The effective mass is defined by where C(t) is the two-point correlator at timeslice t, and τ = 2 is chosen to reduce the effects of oscillations. For some two-point correlators (solid blue circles), effective mass plateaus are not visible due to the excited state contributions combined with the oscillating terms (cf., Eq. (4.1)). Traditional plateau fits to these effective masses will clearly fail. After carefully fitting away the excited states with the Bayesian method and subtracting the excited-state contributions from the raw data, a much   better plateau (solid orange circles) appears across the fit range. There is also consistency between plateaus for different choices of source and sink operators. Without excited states, the solid orange circles would only contain a single exponential contribution from the ground state, and produce a flat plateau. Consequently, we have successfully eliminated most of the excited states contamination. The final posterior estimates of the N -like masses are done with 1000 bootstrap samples for each ensemble. They are listed in the Bayesian fit column of Table VII.

Systematic Checks
We have plotted several stability plots in Figs. 4-5 to demonstrate control over various types of systematic effects when extracting the N -like mass. Excited state contamination, whether from the ∆-like states or the first negative parity state, will manifest as a variation of the ground state N -like mass posterior as we change the value of t min /a. Figure 4 shows the stability of our Bayesian results under such a variation. Note that as t min is increased, there are fewer data points included in the fit and as such the errors increase. We observe that the ground state N -like mass is stable under a change of t min for all three ensembles. This indicates that excited state contamination is under control in these Bayesian fits.
To estimate any small residual excited state contamination, let t nom /a be the nominal value of t min /a, which is shown as the red diamonds in Fig. 4. We choose t sys ≈ 0.15 fm to be a fixed physical length across all ensembles and vary t min = t nom −t sys to produce the posterior values shown as black squares in Fig. 4. The central value difference between the red diamonds and black squares is an estimate of the fitting systematic in the extracted ground state posterior. We expect the source of this difference to be due to any small residual excited state contamination and/or the choice of fitting parameters. Although these two errors are not independent, we conservatively combine the statistical error and the above fitting systematic estimate in quadrature to obtain the total N -like mass posterior width.
We also study how the ground-state mass posterior changes as a function of the number of states included in the Bayesian fit function, as shown in Fig. 5. Based on the stability of the ground state posterior, our nominal fit contains four even and four odd parity states, which we denote by 4E+4O. As can be seen, including too many higher excited states and, thus, many more poorly constrained priors, can cause noticeable changes in the lowenergy posteriors. This behavior could be avoided if we had some prior knowledge of the overlap factors, which could then be used to impose prior widths of natural size.
Finally, the variation of posteriors from changing t max /a is not significant, as shown in Fig. 6. The breakdown of the final uncertainties can be found in Table VII. 3. Negative-parity states As shown in Eqs. (4.1), our correlation functions contain contributions from negative-parity states. After combining positive and negative times via Eq. (4.2), this contribution comes with a characteristic (−1) t/a in the time evolution. The lowest-lying single-particle negativeparity state should be the N (1520), while the lowest-lying two-particle threshold should consist of S-wave N π states with energy around M N + M π = 1100 MeV. With staggered quarks, N π states spread out over several levels, corresponding to different tastes of pions. Experience from the meson sector and studies of nucleon correlators in chiral perturbation theory [50] provide no reason to expect that N π states contribute enough to single-particlecorrelator data to be determined reliably. That said, our operators differ from those in other formulations, being spread over a unit cube. Particularly in light of the results of Ref. [38], we should keep an open mind.
We have studied these states with the Bayesian methodology. The default prior for the lowest-lying negative-parity energy is 1400(200) MeV (cf., Table V), which yields a posterior centered around 1250(50) MeV for the 0.15 fm and 0.12 fm ensembles (cf., aM −,1 in Table XI). The 0.09 fm ensemble does not exhibit this behavior, instead returning a posterior of 1400(50) in agreement with the prior. We have tried further Bayesian fits on the coarser ensembles with the prior centered within the range 1250-1500 MeV and a similar width. Such fits always return a posterior centered around 1250 MeV, and the same holds for any prior with significant probability at 1250 MeV. Note also that even though the GEVP should not be expected to isolate any state besides the positive parity nucleon and two ∆ s, it also returns 1400 ± 40 MeV in the negative-parity channel. To pull the posterior away from 1250 MeV, it is necessary to choose a prior with center separated from 1250 MeV by at least a few multiples of the prior width. In such cases, it is possible to obtain a posterior centered somewhere in the range 1250-1500 MeV.
These findings suggest that the data contain some information about N π states, but it is too weak to pull a "bad prior" back to the expected threshold energy. It is certainly more likely that the 1250 MeV signal consists of multiparticle states than the N (1520). While this study is interesting, a definitive work would require multi-body interpolating operators.
In the context of our determination of the nucleon mass, this study of the negative-parity channel is relevant for the simple reason that the N -like posterior is stable under the changes mentioned here.

B. GEVP Analysis
We have also performed a GEVP analysis in order to extract both λ 1 (t, t 0 ) and λ 1 (t, t 0 ). From the solutions of the GEVP in Eqs. (4.10) and (4.13), we use Eqs. (4.12) and (4.14) respectively to fit the ground state nucleon mass. Table VI summarizes the fitting parameters used in the GEVP analysis. To compare consistently between both fitting strategies, we impose that t 0 + t min in the λ 1 fits is equal to t − t 0 + t min in the λ 1 fits.
For the λ 1 (t, t 0 ) analyses, we fix t 0 /a = 3, 5, 5 for the 0.15 fm, 0.12 fm, and 0.09 fm ensembles to minimize the effects of oscillating states. In Fig. 7, we have plotted the results for λ 1 (t, t 0 = 5) overlaid with the fitted value for the 0.12 fm ensemble. Similar plots for the other ensembles are shown in Fig. 17 in Appendix D.
Next, in the left column of Fig. 8, we show three different definitions of the effective masses for the λ 1 (t, t 0 ) data just described. The first effective mass (solid blue circles) is the usual effective masses as defined in (5.2) using λ 1 (t, t 0 ) instead of C(t). The second effective mass (solid orange squares) is obtained by subtracting the central values of the excited states from the nominal fit to λ 1 (t, t 0 ) and then using Eq. (5.2). This third effective mass (black diamonds) is obtained by reducing the oscillations in the effective masses via the smoothed effective mass [44,49] M eff (t) = 1 4 (2M eff (t) + M eff (t − 1) + M eff (t + 1)) .  We can see that M eff (t) shows no sign of oscillations and agrees well with both Bayesian estimates and the direct fitting to λ 1 (t, t 0 ), giving confidence in the reliability of the extracted nucleon mass.
The results are shown in the right column of Fig. 8. Without any post-processing, the plateaus of λ 1 are clearly identifiable, with no sign of oscillations. Thus, with no issues associated with negative-parity states, we simply perform unconstrained fits to Eq. (4.13) and extract the N -like mass.
We estimate the fitting systematics of both λ 1 and λ 1 using the same procedures as in Sec. V A 2. These fitting systematics are listed in Table VII. All aspects are qualitatively similar to those in Sec. V A 2. Still, to emphasize this point, in Fig. 9 we show the stability of the GEVP extracted nucleon mass as a function of t min /a. Table VII lists the N -like mass estimates for all ensembles from the three analyses. The extracted N -like mass values from the three Bayesian and GEVP analyses all agree within their (uncorrelated) 1σ uncertainties. As the smoothed effective mass with the GEVP removes opposite parity contamination and shows a visible plateau, we take the N -like mass values from λ 1 (t, t 0 ) for use in a continuum extrapolation and all further discussion. Note that since all three analyses agree with one another, the value of the continuum nucleon mass does not depend on which fitting methodology we use.

VI. NUCLEON MASS DETERMINATION
In the previous section we have extracted the N -like masses from physical-mass ensembles at three lattice spacings. In this section, we use these values in order to extract a physical value of the nucleon mass that can be compared to experiment.

A. Sources of systematic error
In our calculation, the sources of systematic uncertainty include excited-state contamination, a slightly unphysical quark mass on one ensemble, finite-volume effects, isospin-breaking effects, and scale-setting errors. Errors arising from excited-state contamination have already been addressed, estimated, and controlled in Sec. V.
As mentioned in Sec. III, all three of our ensembles have nearly-physical pion masses. As such, we avoid the potentially large chiral extrapolation errors (compounded by using a slowly converging chiral fit function [34]) and do not need to include an error from a chiral extrapolation. Nevertheless, the mistuned light-quark mass on the 0.09 fm ensemble is an important effect. Although the other two ensembles have negligible mistuning, the task at hand is to combine the three results. The taste-Goldstone pion on the 0.09 fm ensemble has a mass of 128.3(7) MeV, as mentioned in Sec. III, which is about 7 MeV smaller than the taste-Goldstone pion mass from the other two ensembles. In Refs. [51,52], a wide range of pion masses is studied, and it is observed that (within uncertainties) M N ≈ 800 MeV + M π over a wide range of pion mass and with different actions. Since our mistuning is small, this observation suggests that the nucleon mass on the 0.09 fm ensemble is approximately 7 MeV too small. We therefore take account of the 0.09 fm ensemble mistuning by applying a shift of +7(7) MeV before performing the continuum extrapolation.
Single-particle finite-volume errors are exponentially small as a function of M π L [33]. For our a ≈ 0.15 fm, 0.12 fm, and 0.09 fm ensembles, the corresponding values of M π L are 3.4, 4.0, and 3.7 respectively. The lattice data of Ref. [53] supports a model based on a resummation of the Lüscher formula [54], but the statistical errors on the nucleon masses are too large-around 5%-to be conclusive. Applying this model with our ensemble parameters, only the a = 0.15 fm nucleon mass would receive an appreciable correction, namely −5 MeV. However, applicability of this model is still unclear, as Ref. [55] observes no change in the nucleon mass with a variation of M π L between 3.4-6.7, with ∼ 1% statistical and fitting uncertainties. Moreover, Ref. [20] has ensembles which have M π L ranging from 3.3-5.5 and finds a positive 4 MeV shift between M π L = 3.3 and 4, beyond which any further change is negligible. Due to this conflicting information, even about the sign of the correction, we apply a 0(5) MeV error on the a = 0.15 fm nucleon mass arising from finite-volume corrections. Our final result is insensitive to this finite-volume correction and we leave an in-depth study of potential finite-volume corrections of the nucleon mass on the MILC HISQ ensembles to a future investigation.
Our lattice simulation is isospin symmetric, i.e., the up-and down-quark masses have the same value and quantum electrodynamics is omitted. Both of these ef- The green band is the result of the Bayesian fit described in Sec. V A 1. The right column shows the data for λ1 (defined in Eq. (4.13)) and the corresponding plateau fits (brown bands) to − ln λ1/τ for τ /a ≡ (t − t0)/a = 2 using Eq. (4.14).
fects effects give rise to the proton-neutron mass difference, which is less than 1 MeV. As 1 MeV is small compared to our statistical error, we apply no additional uncertainty from these effects.

B. Continuum extrapolation
Using the ensemble-by-ensemble nucleon masses given in Sec. V B, we can include all sources of systematic error and perform a continuum extrapolation to produce a nucleon mass which can be compared to experiment. To do so, we perform a Bayesian fit to the functional form    and choose an order-one prior for o 4 = 0(1). The list of priors and posteriors are given in Table VIII, and the continuum extrapolation is shown in Fig. 10. where "stat", "fit", "a", "FV", and "mis" represent the statistical, fitting, scale-setting, finite-volume, and the 0.09 fm ensemble quark mistuning errors contribution to the final continuum nucleon mass uncertainties. With no prior constraint on M N,phy , we find the posterior M N,phy = 966(8) stat and the same systematic uncertainties as in Eq. (6.2). Our determination of the nucleon mass is 1.6σ above the experimental value, which arises due to the high nucleon mass on the 0.09 fm ensemble. This behavior can be clearly seen in Fig. 10. Either higher statistics at a ≈ 0.09 fm or additional ensembles with smaller lattice spacings could be employed to see whether this is a statistical fluctuation. Such work is planned.
It is interesting to see what happens if we do not apply a +7(7) MeV correction on this ensemble for the quarkmistuning. If instead we apply a 0(7) MeV correction, the final value of our nucleon mass is 955 (16) MeV which is within the 1σ error of our final result in Eq. (6.3), as expected. Even though this result is closer to experiment, we do not prefer it, because the size and direction of the shift is on solid footing. The only robust way to reconcile this issue is to generated an ensemble with a more precisely tuned light-quark mass.

C. Comparison With Other Studies
The average of the proton and neutron masses found in experiment [35] is 939 MeV, and the uncertainties on these masses are about the level of one part per million. In this work, we determine a nucleon mass of 964(16) MeV. Although this work is the first to determine the nucleon mass from first principles using staggered baryons, other first-principles results exist in the literature. Those which quote a full error budget, and hence are comparable to the present work in scope, can be found in Refs. [20,53]. 2 Reference [53] uses a tree-level O(a 2 )-improved Symanzik gauge action, (2 + 1) tree-level improved Wilson fermions, and includes 20 different ensembles covering three lattice spacings (0.13 fm, 0.09 fm, 0.07 fm), 4-5 different light quark masses (giving pion masses ranging from approximately 190-650 MeV), three ensembles with different physical volumes, and eight ensembles with different strange quark masses. Reference [53] gives two determinations of the nucleon mass: 936(25)(22) MeV and 953(29)(19) MeV, where the first error is statistical and second is systematics. Here, the two values differ by the quantity used to set the scale: the first uses the Ξ baryon mass and the second uses the Ω. Reference [20] uses 11 ensembles generated by the MILC collaboration (general details of which are in Sec. III). The ensembles have (2 + 1 + 1)-HISQ sea-quarks with pion masses ranging from 128-320 MeV and four lattice spacings covering 0.06 fm, 0.09 fm, 0.12 fm, and 0.15 fm. Their valence quarks have the improved Wilson-clover action. With a combined chiral-continuum-finite-volume ansatz for the systematic extrapolation, they find a nucleon mass of 976 (20) MeV. As such, our result is the most precise first-principles determination of the nucleon mass in the literature, and is relatively low cost calculation. A comparison of the results from the three collaborations is shown in Fig. 11.

VII. DISCUSSION AND CONCLUSIONS
In this work, we have extracted a precise and accurate value for the nucleon mass, including a full error budget, using lattice QCD with the HISQ action for both valence and sea quarks. We find M N = 964 ± 16 MeV [Eq. (6.3)]. Some notable details of our simulations are three lattice spacings ranging from a = 0.09-0.15 fm, all of which are tuned to a nearly-physical pion mass. All ensembles have u, d, s and c quarks in the sea. We employ three different fitting methodologies: multistate constrained Bayesian curve fitting and two versions of the generalized eigen-value problem approach. Within each approach, we verify stability under variation of the fitting range, the numbers of states, and other choices. The superb consistency between the results of these fitting procedures demonstrates their robustness and accuracy.
Our results suggest a promising outlook for staggered baryon lattice QCD. As can be seen in Eq. (6.2), our dominant error arises from the light-quark-mass mistuning on the 0.09 fm ensemble, compounded by the continuum extrapolation. The most direct method to reduce this error would be to generate an ensemble with a better tuned light-quark mass. Alternatively, an ensemble with slightly heavier light quarks would allow retuning via interpolation. Further, with three ensembles a 1σ statistical fluctuation on one of them is not unlikely. As can be seen in Fig. 10, the 0.09 fm ensemble seems to exhibit such a fluctuation. Another data point at smaller lattice spacing would help alleviate effects from both the mistuning and this potential fluctuation.
After the error from mistuning, the next largest error comes from statistics. Reducing the statistical error is possible by adding additional configurations, or adopting techniques such as the background field method [57], lowmode averaging, or the truncated solver method [20,58]. Another way to reduce the statistical error is to compute the matrix correlation functions for the 8 and 8 irreps. In the continuum limit, where taste symmetry is restored, all N -like masses should tend to the same point. Thus, the final result could be improved by combining the information from all three baryon irreps and enforcing a common continuum limit. Finally, one could introduce more sophisticated smeared interpolating operators. We have carried out initial tests with stride-two staggered smearing functions and find them to be promising.
Staggered-baryon methodology can be straightforwardly applied to compute further baryon properties.
The Ω baryon mass is especially interesting for scale setting in lattice QCD [59]. It is long-lived and composed of three strange quarks, so the quark propagators are computationally cheaper than those for light quarks, and its two-point correlation function has a better signal-to-noise ratio. Moreover, the Ω baryon mass is known unambiguously from experiment, unlike the pion decay constant, which relies on determinations of |V ud | from nuclear beta decay. Robust and precise scale setting is, of course, crucial as total error budgets for lattice QCD fall below 1%, which is not only feasible but, in the case of hadronic-vacuum-polarization contribution to the muon g − 2, necessary [60]. We have started such work on the ensembles used in Table III), i.e., neglecting QED, which is especially important with a charged particle.
This work also paves the way for all-staggered computations of three-point baryon correlation functions. Now that we have identified N -like states via both the GEVP and multistate Bayesian curve fitting, we can have confidence that the extracted N -like matrix elements do indeed correspond to the physical nucleon. Especially important for neutrino scattering experiments, for example, is the nucleon axial form factor. The first step in such a program is to calculate the axial charge, g A , which is just the form factor at zero-momentum transfer. Because it is precisely known from neutrino beta decay, g A serves as a calibration point for lattice QCD. Indeed, we consider the nucleon mass presented here more important as a prerequisite for future all-staggered calculations of nucleon matrix elements than as a test of (lattice) QCD.

Appendix A: Irreducible Representations of GTS
In this Appendix, we are concerned with the geometric symmetries of staggered fermions, in order to classify physical baryon states. The properties of physical states under charge conjugation, baryon number, and the exact axial symmetry are straightforward. Complications arising from the interplay of flavor and taste are deferred to Appendix B.
The emergence of four Dirac fermions in the continuum limit stems from the group theory of the shifts, which translate fields by a single lattice site and multiply fermion fields by a convention-dependent sign factor such that where F is fermion number, and µ and ν denote directions of translations. The sign factor for S µ depends on x ν , ν = µ, but does not depend on x µ . Therefore, T µ ≡ S 2 µ is a normal translation by two lattice sites for all fields. It is convenient (and permissible, because the translations commute with the shifts) to remove the translation part of the shifts by introducing Ξ µ ≡ S µ T −1/2 µ , where T −1/2 µ is a formal square root of T −1 µ in any representation of the symmetry group. Nowadays one calls the Ξ µ the "taste" generators (reserving "shift" for S µ and "flavor" for flavor). They satisfy Eq. (A1) and Ξ 2 µ = 1; thus, they generate the Clifford group Γ 4 .
On physical states, T 4 is the (two-timeslice [27,61]) time-evolution operator, also known as the transfer matrix. The classification of states in Hilbert space hinges on the symmetries that commute with T 4 . These are the spatial translations, all four shifts, and (assuming the same extent in all three spatial directions) the rotationreflection symmetries of the cube. Thus, on a (2N ) 3 spatial lattice with (anti)periodic boundary conditions, the geometric symmetry group of the staggered-fermion transfer matrix is [21][22][23] where the first two factors are the groups generated by the (two-site) spatial translations T i and the tastes Ξ µ . W 3 is the cubic rotation-inversion symmetry group, generated by π/2 rotations in the ij plane, R ij , and spatial inversion, I S . The last product is semidirect, because Earlier work [22,23] has shown that the problem of finding irreducible representations can be simplified by grouping the generators judiciously. In particular, the spatial taste generators can be chosen to be Ξ 123 ≡ Ξ 1 Ξ 2 Ξ 3 and any two Ξ ij ≡ Ξ i Ξ j . Further, the combination P ≡ Ξ 4 I S commutes with the taste generators as well as with rotations, so in the continuum limit it is the analog of parity [22,23]. It is convenient to use I S as a generator and leave parity until the end; then [29], where Z 2 (P ) = {1, P }, Q 8 is the quaternion group of order 8, D 4 is the dihedral group (also of order 8), and SW 3 ⊂ SO (3) is the cubic rotation group. The generators of these groups are listed in Table IX. 3 The In this paper, we are concerned with the trivial representation of the translation group, namely, zero 3momentum. We note in passing, however, that the group theory at nonzero momentum is simpler if the taste generators insensitive to rotations are factored as in Eq. (A3).
At zero momentum, we are left with the "geometric rest-frame group" [22] where the 768-element "geometric timeslice group" [22] Equation (A5) is equivalent to an isomorphism given by Kilcup and Sharpe [23], since Q 8 SW 3 is isomorphic to SW 4 , the rotation group of the four-dimensional hypercube. Baryon states transform under the "fermionic" representations of GTS, namely those that preserve the minus sign in Eq. (A1). Both Q 8 and D 4 have one such irrep, which is two-dimensional and can be expressed as Pauli matrices. We denote them σ and B, respectively. Similarly, fermions obtain a minus sign under 2π rotations, which is possible with representations of the double cover of SW 3 , SW 3 ⊂ SU (2). As shown in Table I, there are three of these [28], G 1 , H = G 1 ⊗ E, and G 2 = G 1 ⊗ A 2 , where E and A 2 are, respectively, the two-dimensional and nontrivial one-dimensional irreps of SW 3 . The fermionic irreps of GTS are then the tensor products (labeled by their dimension, following Ref. [23]) From the matrix form of the tensor product, one sees that σ ⊗ B automatically identifies (−1, −1) ∈ Q 8 × D 4 with (1, 1). For completeness, we discuss the bosonic representations of GTS, which correspond to even F in Eq. (A1) and no sign for 2π rotations. Because of the Z 2 divisor in Eq. (A3), these arise from the bosonic representations of all three factors. D 4 has four 1-dimensional bosonic representations, A I S Ξ123 , in which ±Ξ 123 and I S can each be represented by ±1. Consequently, for every bosonic irrep of (Q 8 SW 3 ), four irreps of GTS are induced. These induced representations are just the tensor products with A I S Ξ123 and, thus, have the same dimension as their (Q 8 SW 3 ) counterparts.
To fully classify representations of (Q 8 SW 3 ), it is easiest to first consider representations of Q 8 and then use the Wigner little-group method to induce the representations of the full group. 4 Q 8 has four one-dimensional bosonic irreps, which are the trivial representation and three sign representations in which two of Ξ 23 , Ξ 31 , and Ξ 12 have character −1 (and the third +1). The trivial representation is in its own orbit, and the latter three for another orbit. These orbits arise from the way the group elements transform into each other under conjugation with the rotations: Physically, the nontrivial bosonic representations act as a 3-vector under rotations. The vector's direction follows from the signs representing the Ξ ij . The orbits and their little groups, L ⊂ SW 3 , are shown in Table X. Note that the little group D 4 for the nontrivial 1-dimensional Q 8 irreps is generated by, 5 e.g., R 23 and R 2 12 for the irrep in which the character χ(Ξ 23 ) = 1 (and χ(Ξ 12 ) = χ(Ξ 31 ) = −1). From this construction, one sees that (Q 8 × SW 3 ) ∼ = SW 4 has 10 bosonic irreps and three TABLE X. Structure of GTS irreps γ. The first column shows the orbits of the Q8 irreps under SW3. The little group L ⊂ SW3 and its irreps are given in the next two columns. (As discussed in the text, the fermionic irrep requires the double cover SW3.) The fourth column gives the irrep of the D4 factor in Eq. (A3). The next-to-last column gives the dimension dim γ of the induced irreps of GTS. The last column gives the number of resulting GTS irreps: in all, 40 bosonic and 3 fermionic. fermionic irreps. The final step is simple, because GTS is a direct product of (Q 8 × SW 3 ) ∼ = SW 4 with D 4 , but modded out by a Z 2 , requiring bosonic (fermionic) irreps to be tensored with bosonic (fermionic) irreps. Thus, GTS has 40 bosonic irreps and three fermionic irreps.

Appendix B: Staggered Lattice Baryon Irrep Identification
The useful strategy to build baryon operators starts with embedding SU(2) spin and SU(3) flavor into an SU(6) spin-flavor group. As baryons must obey Fermi statistics, the overall baryon wavefunction must be antisymmetric. The antisymmetrization is completely captured by SU(3) color, so the only needed representations of SU(6) spin-flavor are those that are overall symmetric. Decomposition of these symmetric SU(6) representations back into SU(2) S × SU(3) F pairs the symmetric (mixedsymmetric) representations of SU(2) S with the symmetric (mixed-symmetric) representations of SU(3) F , giving the usual octet and decuplet of physical baryons.
When including the continuum taste symmetry, this can be extended to include the SU(4) taste symmetry. Golterman and Smit [22] pursued this strategy without considering flavor, and Bailey [24] generalized it to include SU(3) flavor. Here, we summarize the main steps.
Thus, SU(2) S , SU(3) F and SU(4) T are embedded into SU (24), applying the symmetrization to combinations of spin, flavor, and taste. In addition to the usual baryon decuplet and octet, which lie in the symmetric SU(4) irreps, further states appear in mixed and asymmetric taste representations combined with mixed and asymmetric spin-flavor representations. These states have no real-world equivalent, but the SU(24) embedding demonstrates that they have the same masses and matrix elements as the physical baryons.
At nonzero lattice spacing, the spin-flavor-taste representations break down into direct sums of GTS irreps. There are two important consequences. First, states within a continuum-limit multiplet split into smaller multiplets that differ at order a 2 (or α s a 2 for tree-level improved actions). Second, because there are so few GTS irreps, various multiplets can mix, again at order a 2 (or α s a 2 ). Excitations of the GTS irreps must, in general, be matched up as the continuum limit is approached to identify higher-spin baryons, as is familiar elsewhere in spectroscopy [63].
In the following, the SU(N ) representations are denoted by a number and a subscript, where the number is the dimension of the representation and the subscript refers to the symmetrization of the representation indices, and can either be symmetric (S), mixed-symmetric (M ), or antisymmetric (A). Subgroups also often carry a subscript for spin (S), flavor (F ), or taste (T ). Note that the restriction of a large SU(N ) to smaller SU(N ) subgroups need not be unique. In all of the following, we use the pattern for SU(N ) → SU(N 1 ) × SU(N 2 ), N = N 1 N 2 , in which an SU(N 1 ) × SU(N 2 ) matrix is the Kronecker product of an SU(N 1 ) matrix and an SU(N 2 ) matrix. Thus, this decomposition always starts with and yields only defining representations, i.e., N → N 1 ⊗ N 2 .
Quarks transform under the defining 24-dimensional representation of the SU(24) embedding group. The symmetric combination of three fundamental quarks is the representation denoted 2600 S . The first step is to separate out the SU(2) spin group, which yields where we abbreviate, for example, 4 S ⊗ 364 S by (4 S , 364 S ). In Eq. (B1), 4 S (2 M ) is the usual symmetric (mixed-symmetric) spin 3 2 ( 1 2 ) construction for the baryon decuplet (octet). Now, however, we have larger multiplets of SU(12) F T . Because "flavor" and "taste" are both names for quark species, 6 the SU(12) F T symmetry remains, even when these representations are decomposed into SU(3) F × SU(4) T . As a consequence, any representation that is formed by decomposing the (4 S , 364 S ) representation can be identified with a baryon from the physical decuplet, and similarly any representations found by decomposing the (2 M , 572 M ) irrep can be identified with baryons from the physical octet. It is convenient to refer to states in these representations decuplet-like and octet-like, respectively, as a reminder of the differences with and similarities to the physical decuplet and octet.
Next, the flavor and taste symmetries are separated from each other. The decomposition of the 572 M representation gives The physical octet is in the taste-symmetric (8 M , 20 S ) representation. The other representations are all a consequence of nontrivial taste symmetry. As discussed in the main text, they should not be discarded: they are, in fact, useful, in a way similar to the utility of taste-nonsinglet pions. Similarly, the decomposition of the 364 S representation yields The ( where the ellipses denote representations with nonzero strangeness. The 4 S and 2 M are the isospin 3 2 and 1 2 representations, respectively. At this point, we have the group-theoretic ingredients to specify the operators for baryon states labeled by (S, F, T ). To understand the decomposition of these representations into GTS, it is convenient to carry out the decomposition in several steps. Each of the subgroups Q 8 , SW 3 , and D 4 that build GTS has a faithful fermionic representation generated by Pauli matrices. As such, identification of each of the subgroups with SU(2) is useful. 7 One of these factors comes directly from the SU(2) S spin in the decomposition of Eq. (B1), while the other two are found by decomposing the SU(4) taste factor into SU(2) Q8 × [(SU(2) × Z 4 )/Z 2 ] D4 . Under this decomposition, we have the following three representations to consider For brevity, we do not introduce a label for the Z 4 quantum number. The nontrivial element is ±i (±1) for the fermionic (bosonic) representations, with the sign being removed via modding out by Z 2 .
To mimic GTS, the quaternion factor SU(2) Q8 should be in a semidirect product with something corresponding to the lattice rotation group, which we denote SU(2) SW3 . In the continuum limit, however, spin and taste commute, namely SU(2) S × SU(4) T ; cf. Eqs. (B1) and (B2). It is possible to arrive and the desired structure by noting where SO(4) ⊃ SW 4 of Eq. (A6). If the generators of SU(2) Q8 and SU(2) S are τ and Σ, respectively, then the generators of SU(2) SW3 are σ ≡ Σ + τ . Although τ and Σ commute, one finds the desired behavior of the tastes under lattice rotations: [σ i , τ j ] = 2iε ijk τ k .
In summary, to mimic GTS with SU(2) groups, (B7) the last Z 2 is the same as the Z 2 factor in Eq. (A5). The remainder of this section focuses on decomposing the various SU(2) group factors down into their discrete lattice subgroups, being mindful that SU(2) Q8 × SU(2) S in Eq. (B7) is isomorphic to SU(2) Q8 SU(2) SW3 , as shown in Eq. (B6). In this way we derive the full map from SU (24), where it is easiest to construct operators obeying Fermi statistics, to the GTS symmetry of lattice gauge theory.
The decompositions of all irreps to this point have resulted in just two SU(2) representations: 2 M and 4 S . It is important to keep track of which subgroup factor each representation belongs to. In the interest of clarity, the subduction of these subgroup factors are listed for each subgroup factor individually, with a guide for assembling the individual subduction patterns into GTS irreps. The end result of this assembly yields the subduction of Eq. (B7), summarized in Eq. (B14).
The [(SU(2) × Z 4 )/Z 2 ] D4 factor is separated from the other products by a direct product, and so may be considered independently. Subduction to D 4 can yield only one fermionic representation, and so all fermionic representations must subduce to multiples of this irrep. This gives Thus, when 4 S of [(SU(2) × Z 4 )/Z 2 ] D4 appears in Eq. (B8), the irreps subduced from the other groups appear twice in the subduction of GTS. Eq. (B7) demonstrates that the taste and spin representations may be considered separately in the continuum. However, he presence of the semidirect product in Eq. (A5), Q 8 SW 3 , means that the lattice generators of rotations mix up the discrete taste transformations. It is instructive to trace the subduction from the continuum SU(2) Q8 SU(2) SW3 down to the discrete subgroup Q 8 SW 3 to see how the spin and taste degrees of freedom become mixed up by the discretization.
The first step in the subduction is to identify how the direct product is converted to a semidirect product in Eq. (B6). The semidirect product is enforced by replacing Σ with σ, to arrive at the rotation group SU(2) SW3 that acts on both spin and taste. Since the two groups in Eq. (B6) are isomorphic, their irreps must be in a one-to-one correspondence. In addition, to preserve the representations' dimensions, the semidirect product cannot mix different irreps, and the little group is nothing but the entire SU(2) SW3 group in all cases. This means that the mapping between irreps of the direct product and the semidirect product is trivial, where for clarity below, we omit the second subscript when referring to the semidirect product.
To understand the decomposition of the semidirect product group, it is not sufficient to break the two SU(2) subgroups into Q 8 and SW 3 , respectively, but the separate breakings provide a useful ingredient. The Q 8 group factor has only one fermionic irrep, σ, and so all fermionic irreps of SU(2) Q8 must break into copies of that irrep, The SW 3 factor has three fermionic irreps, and the decomposition of the relevant SU(2) SW3 irreps is simple, The key point is that the semidirect product also induces a rotation of the copies of the σ irreps into each other. This additional transformation acts as an irrep of the SW 3 rotation group, and is combined as a tensor product with the irrep resulting from the direct decomposition of the SU(2) SW3 factor. We can write this as an additional irrep factor belonging to the SW 3 group instead of as an uninformative multiplicative factor on the number of irreps, eight different D components are related by a shift symmetry, we fix D and only give a single operator within this set. The other seven operators within this set can be generated with the nontrivial shift symmetry transformations.
Eqs. (C1)−(C8) gives all the operators we use that are unrelated by a shift symmetry, e.g., one choice for each value of s and C. Quarks on the site xz conventionally appear with an extra minus sign so they respect the shift and rotation symmetry operations. The operators are written with only the positive directions, but symmetrization over all combinations of positive and negative directions is implied. These operators are also shown diagrammatically in Fig. 12. We have found empirically that the class 3 operators O 16,3 are much noisier than the others and so we have excluded them from the analysis. In this Appendix, we provide additional data for the other ensembles that are not given in the main text.

ACKNOWLEDGMENTS
We are grateful to the MILC collaboration for the use of the source code adapted to generate the correlators in this study and for permission to use their 2+1+1-flavor gauge-field ensembles. Computations for this work were carried out on facilities of the USQCD Collaboration, which are funded by the Office of Science of the U. S.