Nucleon isovector charges and twist-2 matrix elements with $N_f=2+1$ dynamical Wilson quarks

We present results from a lattice QCD study of nucleon matrix elements at zero momentum transfer for local and twist-2 isovector operator insertions. Computations are performed on gauge ensembles with non-perturbatively improved $N_f=2+1$ Wilson fermions, covering four values of the lattice spacing and pion masses down to $M_\pi \approx 200\,\mathrm{MeV}$. Several source-sink separations (typically $\sim 1.0\,\mathrm{fm}$ to $\sim 1.5\,\mathrm{fm}$) allow us to assess excited-state contamination. Results on individual ensembles are obtained from simultaneous two-state fits across all observables and all available source-sink separations with the energy gap as a common fit parameter. Physical results are obtained from a combined chiral, continuum and finite-size extrapolation. For the nucleon isovector axial, scalar and tensor charges we find physical values of $g_A^{u-d} = 1.242(25)_\mathrm{stat}(\genfrac{}{}{0pt}{2}{+00}{-31})_\mathrm{sys}$, $g_S^{u-d} = 1.13 (11)_\mathrm{stat}(\genfrac{}{}{0pt}{2}{+07}{-06})_\mathrm{sys}$ and $g_T^{u-d} = 0.965(38)_\mathrm{stat}(\genfrac{}{}{0pt}{2}{+13}{-41})_\mathrm{sys}$, respectively, where individual systematic errors in each direction from the chiral, continuum and finite-size extrapolation have been added in quadrature. Our final results for the isovector average quark momentum fraction and the isovector helicity and transversity moments are given by $\langle x\rangle_{u-d} = 0.180(25)_\mathrm{stat}(\genfrac{}{}{0pt}{2}{+14}{-06})_\mathrm{sys}$, $\langle x\rangle_{\Delta u-\Delta d} = 0.221(25)_\mathrm{stat}(\genfrac{}{}{0pt}{2}{+10}{-00})_\mathrm{sys}$ and $\langle x\rangle_{\delta u-\delta d} = 0.212(32)_\mathrm{stat}(\genfrac{}{}{0pt}{2}{+16}{-10})_\mathrm{sys}$, respectively.


Introduction
Nucleon matrix elements carry information on the internal structure and properties of nucleons, which can be related to a large variety of physical processes. Using calculations within the framework of lattice QCD, these matrix elements can be studied from first principles. Considering local, isovector operator insertions and zero momentum transfer, the corresponding matrix elements give access to isovector nucleon charges. These can be obtained from lattice QCD without the need to consider contributions from quark-disconnected diagrams, which are computationally particularly difficult.
Unlike the axial charge, the scalar and tensor charges, which can contribute to the β decay of the nucleon through non-standard couplings outside the Standard Model (SM) [19] and are important for interpreting the results of dark matter searches [20], are much less well-determined from phenomenology. Therefore, in this case lattice QCD can provide crucial input to searches for Beyond the Standard Model Physics (BSM) physics. The tensor charge also enters in searches for BSM sources of CP -violation as it governs the contribution of quark electric dipole moments to the neutron electric dipole moment [21]. Future experimental results will likely improve the precision of phenomenological determinations of the tensor charge [22], which should allow for future tests of predictions from lattice QCD.
Beyond nucleon matrix elements of local operators, there are observables related to higher-twist operators, such as parton distribution functions (PDFs). In particular, the average quark momentum fraction of the nucleon is of considerable phenomenological interest, as it contributes in the gauge-invariant decomposition of the nucleon spin given by Ji's sum rule [23]. For twist-2 operator insertions, as required for e.g. the isovector average quark momentum fraction and the second moments of helicity and transversity PDFs, again lattice QCD can be used to compute the relevant matrix elements, which are typically less well determined than the ones related to local operators. Even higher moments of PDFs would involve also higher-twist operators rendering lattice calculations infeasible due to operator mixing and further decreasing signal-to-noise ratios.
Excited-state contamination is one of the dominant sources of systematic uncertainty in contemporary lattice QCD calculations of nucleon matrix elements [24][25][26][27][28]. This is caused by an exponentially decreasing signal-to-noise ratio making sufficiently large Euclidean time separations to suppress such unwanted contributions from excited states unaffordable in terms of computational cost. Several approaches have been used in the past in an attempt to tame these effects. They mostly rely on either explicitly fitting excited states in two-and three-point functions [11,29] for a given nucleon matrix element, or on performing a summation over the operator insertion [12,30,31] to achieve additional suppression of excited-state contaminations. Here we investigate an approach to simultaneously fit data for nucleon charges and second moments of PDFs at multiple source-sink separations with a common, fitted energy gap. While nucleon charges and moments of PDFs are typically studied separately on the lattice, we find that such a combined analysis has several advantages. First of all, the spectrum and thus the energy gaps depend only on the quantum numbers of the interpolating operators chosen for the nucleon state, but not on the operator insertion itself. Therefore, fitting a common energy gap allows us to fully exploit correlations between different matrix elements. We find that such simultaneous fits are much more stable compared to fitting single observables with an energy gap left free. Moreover, assuming sufficient statistics, the convergence of the fitted gap towards its theoretical expectation can be tracked as a function of the fit range in our approach, and no additional assumption is required with respect to the energy gap.
In this paper we present physical results for the isovector axial, scalar and tensor charge of the nucleon, ID β T /a L/a M π / MeV M π L M N / GeV N HP N LP twist-2 t sep / fm H102 3. 40 Table 1: Overview of ensembles used in this study. The error on the pion and nucleon masses include the error from the scale setting. N HP and N LP denote to the number of high-precision (HP) and low-precision (LP) measurements on each value of t sep , respectively. The column labelled "twist-2" indicates whether twist-2 operator insertions are avilable on a given ensemble. The statistics for the two-point function is always the same as for the three-point functions.
its average quark momentum fraction, and the isovector moments for the helicity and transversity PDFs from isovector twist-2 operator insertions. Some preliminary results have been published in Refs. [32,33]. Since we consider only isovector quantities, there are no contributions from quark-disconnected diagrams. However, we are planning to add isoscalar observables in a future publication; a first account of related work can be found in Refs. [34,35].
This paper is organized as follows: In section 2 we present the setup for our lattice calculations, including an overview of the ensembles, operators and matrix elements, technical details on the calculation of two-point and three-point functions as well as a discussion of the renormalization required to obtain physical results. Section 3 deals with the methods we employ to ensure ground-state dominance in the desired nucleon matrix elements, which is required for the extraction of physical observables from lattice data. Physical results from chiral, continuum and finite size (CCF) extrapolations are discussed in section 4, and some concluding remarks are contained in section 5. Additional technical details related to renormalization have been moved to an appendix.

Lattice setup 2.1 Ensembles
Calculations have been performed on eleven gauge ensembles provided by the Coordinated Lattice Simulations (CLS) initiative [36]. These ensembles have been generated with N f = 2+1 flavors of non-perturbatively O(a)improved dynamical Wilson fermions and the tree-level Symanzik gauge action. A twisted-mass regulator has been introduced in the simulations to suppress exceptional configurations [37] and open boundary conditions in time direction are employed to alleviate the issue of long autocorrelations in the topological charge [38]. For further details on the simulations we refer to Ref. [36].
An overview of the ensembles used in the present study is shown in Tab Table 2: Values of the lattice spacing a and t 0 /a 2 for each value of β used in this study. Values are taken from Ref. [40]. The first error is statistical, second one systematic.
are chosen such that M π L 4, with the exception of the S201 ensemble which has been included to enable a direct test of finite-size effects. Values for the pion mass have been (re-)measured for most ensembles on the same set of gauge configurations that has been used in the calculation of nucleon matrix elements, hence they may slightly differ from the values originally published in Ref. [36]. The only exception are ensembles H102 and H105 at the coarsest lattice spacing, for which we employ the values from Ref. [36]. However, the precision of the values on the pion mass is in any case not yet relevant to the present study.
In Table 2 we list the values of the lattice spacing, corresponding to the four values of β in Tab. 1, together with values for the gradient flow scale t 0 /a 2 introduced in Ref. [39]. All results in Table 2 are taken from Ref. [40] and we refer to this publication for further details on the scale-setting procedure. In order to set the scale in our study the physical value of t 0 is required, which has also been determined in Ref. [40] 8t 0,phys = 0.415(4) stat (2) sys fm , through the physical quantity f πK = 2 3 (f K + 1 2 f π ) employing Particle Data Group values for the pion and kaon decay constant f π = 130.4(2) MeV and f K = 156.2(7) MeV [41].

Operators and matrix elements
In this study we aim at computing isovector axial, scalar and tensor charges that are related to the following local dimension-three operators Additionally, we are interested in forward matrix elements of twist-2, dimension-four operators where {...} indicates symmetrization over indices with subtraction of the trace and [...] denotes anti-symmetrization.
Dirac matrices are labelled by γ µ,5 , Throughout this study we will work in Euclidean spacetime. Besides, we introduce a compact notation for which the matrix element of a given operator insertion O X µ1...µn with X ∈ {A, S, T, vD, aD, tD} and n Lorentz indices reads where u(p i , s i ),ū(p f , s f ) denote Dirac spinors with initial (final) state momentum p i (p f ) and spin s i (s f ). W X µ1...µn (Q 2 ) on the right-hand side is an operator-dependent form factor decomposition. For example, for the axial vector current one has where G A (Q 2 ), G P (Q 2 ) are the axial and induced pseudoscalar form factor, Q µ = (iE f −iE i , q) is the Euclidean four-momentum transfer with q = p f − p i and M N the nucleon mass. For further details on the relevant form factor decompositions for generalized parton distribution functions (GPDFs) we refer to Ref. [42].
Obtaining nucleon matrix elements in lattice QCD requires the computation of spin-projected two-and three-point functions as depicted in Fig. 1 In case of the three-point functions we employ polarization in the z-direction, i.e. we project with Γ z = Γ 0 (1 + iγ 5 γ 3 ), while for the two-point functions Γ 0 = 1 2 (1 + γ 0 ) is used, effectively averaging over all three spatial polarizations. For the two-point function we find that the latter yields a slightly better signal-to-noise ratio for e.g. the resulting nucleon masses, than using Γ z . The proton interpolating field is given in position space by where C is the charge conjugation matrix, and we have introduced Gaussian-smeared quark fields The values for the parameters κ G and N have been chosen to correspond to a smearing radius of ∼ 0.5 fm for each value of β. Furthermore, we apply spatial APE-smearing [43] to the gauge links entering the threedimensional Laplacian ∆, to improve the ground state projection for the relevant matrix elements and to gain additional noise reduction.
For the following discussion we define the source-sink separation t sep = t f −t i and introduce the shorthand t = t op − t i . W.l.o.g. we will assume that the source time is zero, i.e. t i = 0, corresponding to a index shift in the actual calculation. Moreover, we demand that the final state is produced at rest, i.e. p f = 0, q = − p i . In momentum space the two-and three-point functions in Eqs. (6,7) can then be written as Extracting the physical matrix elements requires the cancelation of unknown overlap factors in the three-point function, which in the case of zero momentum transfer Q 2 = 0 can be achieved by forming the ratio In the limit of large Euclidean time separations t and t sep − t the ratio turns into a plateau as it becomes dominated by the ground state, i.e. lim t→∞ lim (tsep−t)→∞ R X µ1...µn ( 0, t, t sep ) = const (13) For the local operators in Eqs.
(2) one obtains the following, asymptotic relations at large Euclidean times for the isovector axial-, scalar-and tensor charges The decompositions for the isovector combinations of the dimension-four operators in Eqs. (3) lead to where in the GPDF notation of Ref. [42] we have defined the isovector average quark momentum fraction In the actual calculation we always average over all contributing, numerically non-identical index permutations.

Computation of two-and three-point functions
Apart from the actual generation of the gauge ensembles, the computationally most expensive part of this study is the calculation of two-and especially three-point functions in Eqs. (10,11). Therefore, we employ the truncated solver method [44][45][46] on most ensembles to reduce the cost of the required inversions. The method is based on the idea of using a (relatively) large number of low-precision N LP inversions to obtain a statistically precise estimate of the actual observable and only a small number N HP of high-precision measurements to correct for the resulting bias in the final expectation value For the ensembles in Table 1 we typically observe a factor ∼ 2 to 3 improvement in computer time compared to using only exact solves. The total numbers of low-and high-precision inversions for each ensembles can be found in Tab. 1.
For the computation of three-point functions we perform sequential inversions through the sink with the final state produced at rest. Depending on the value of the lattice spacing and the available statistics, we compute three-point functions for at least three and up to five values of t sep . This allows us to check the dependence on the source-sink separation, which is instrumental in dealing with excited-state contamination. The values of t sep in physical units are shown in Table 1. Note that we do not include values of t sep smaller than 1 fm. For the initial (forward) propagator we use point sources distributed on a single timeslice in the center bulk of the lattice. Typically, the actual position of the source timeslice t i (before performing the index shift t i → 0) on a given ensemble is chosen such that t min sep = T −2t i holds for the smallest, available value of the source-sink separation t min sep . Since the ensembles used in this study have been generated with open boundary conditions, this choice guarantees that all operators remain sufficiently far away from the boundaries in time, hence preventing further contamination due to boundary effects. Finally, two-point functions are generally computed on the same source timeslice t i and with the same statistics as the three-point functions.

Renormalization
Unlike hadron masses which are renormalization group invariants, matrix elements as given in Eq. (4) typically require renormalization. To this end we have performed the non-perturbative renormalization for the relevant operators using the Rome-Southampton method [47] at each lattice spacing except for the finest one. The reason for this is that at lattice spacings of a 0.05 fm topological charge freezing is expected to become a severe issue, hence simulations with periodic boundary conditions as required by the Rome-Southampton method are not feasible in such a setup. Our results are summarized in the top portion of Tables 3 and 4 for local and twist-2 operators, respectively. They are all given in the MS scheme at a scale of µ = 2 GeV and we have included results for both irreducible representations for the twist-2 operators. For further details of our renormalization procedure and associated notation, we refer to appendix A.
For the required values of the renormalization constants at our finest lattice spacing at β = 3.7, we have resorted to extrapolations, introducing a further source of uncertainty. The numerical results from this procedure are summarized in the bottom lines of Tables 3 and 4 for local and twist-2 operators, respectively. The errors for the extrapolated values have been scaled by a factor of ten to account for the systematic uncertainty of this procedure.
Even allowing for a generous error margin on the extrapolated Z-factors may not entirely disperse all doubts concerning the reliability of this procedure; however, in the case of the axial-vector matrix element we have performed a trorough cross-check using the results for Z A determined from the chirally-rotated Schrödinger functional [48], which are available for all four values of β used in our study. This offers the possibility for cross-checking the validity of the extrapolation that we applied in the case of β = 3.7 from the perspective of the final, (combined) continuum extrapolation. Moreover, this alternative renormalization method allows for a more consistent O(a) improvement in the case of g u−d A . We will validate our extrapolation for Z A by a detailed comparison with Schrödinger-functional results in section 4.2.

The values of Z SF
A used in this study have been collected in Table 5 together with results for the improvement coefficient b A taken from Ref. [49] as well as values of κ crit determined in Ref. [50]. The final, O(a)-improved renormalization factors are then dependent on the bare coupling constant g 0 as well as on the quark mass m q = 1 2a ( 1 κq − 1 κcrit ) where q = l, s and the average quark massm = 1 3 (2m l + m s ) The last term depends on an additional improvement coefficientb A for which results have not been published for all four values of β. However, it is formally of O(g 4 0 ) and hence likely to be suppressed. Moreover, it has been found in Ref. [49] at the coarsest lattice spacing for ensemble H102 that the value ofb A is indeed compatible with zero, albeit with large statistical errors. Therefore, we will drop this term from our analysis.

Ground-state dominance
It is a well-established fact that nucleon structure calculations in lattice QCD are hampered by excited-state contamination [16]. This is caused by a signal-to-noise problem preventing the use of sufficiently large sourcesink separations in the calculation of nucleon three-point functions. Therefore, in practice it is not feasible     Table 5: Axial vector renormalization factors Z SF A from the Schrödinger functional method as given in Ref. [48], improvement coefficients b A from Ref. [49] and values for κ crit as determined in Ref. [50]. In the notation of Ref. [48] we choose Z l A,sub from the L 1 constant line of physics for Z SF A . Statistical and systematic errors have been added in quadrature.
to directly extract a reliable ground-state plateau value from lattice data for the ratio in Eq. (12). We have investigated several approaches to deal with excited states and extract the final observables.

Multi-state fits
Our main approach to tackle excited-state contamination in nucleon structure calculations are multi-state fits to lattice data for the ratio in Eq. (12). Inserting complete sets of states in the two-and three-point functions in Eqs. (10,11) their spectral representation can be parameterized as in terms of observable-independent energies E k ( q) and observable-dependent factors a k ( p), A X,kl µ1...µn ( q) containing amplitudes and further kinematical expressions. The exact form of the latter will not be relevant for our purposes in this section. Moreover, suppressing all indices related to the operator insertion by introducing the shorthand A kl ( q) = A X,kl µ1...µn ( q) and defining the three-point function in Eq. (23) can be rewritten as where we have introduced the energy gaps ∆ kl ( q) = E l ( q) − E k ( q). Assuming zero-momentum transfer q = 0 as required in our actual calculation and suppressing all occurrences of zero momenta in the notation the expression is further simplified to become where we made use of the fact thatÃ kl ( 0) =Ã lk ( 0) for the current insertions we consider. Keeping only terms involving the lowest gap ∆ = ∆ 01 , one arrives at the following expression for the ratio in Eq. (12) where we definedĀ The first term on the r.h.s. is then a (linear combination of) form factor(s) at zero momentum transfer depending on the operator insertion X and the spin-projection in the original three-point function, e.g. for X = A and µ = 3 for our choice of projectors one finds thatĀ A,00

3
gives the axial charge. The expression in Eq. (27) represents our final fit model, which has already been applied in a previous analysis of lattice data with N f = 2 dynamical quark flavors in Ref. [9]. In principle, it is possible to fit the model in Eq. (27) leaving the gap as a free parameter; however, this requires very precise data and leads to rather large errors on the estimate for the corresponding observables. Still, from a theoretical point of view it is desirable to apply such fits without additional assumptions, in contrast to Ref. [9], where the gap was fixed to ∆ 0 = 2M π on each ensemble. Therefore, we choose a more sophisticated approach, fitting the model in Eq. (27) with a single free gap ∆ to all observables and for all available values of t sep simultaneously. This is possible because the gaps are only related to the interpolating operators which are chosen the same for all the nucleon matrix elements in this study. These simultaneous fits yield much more stable fits compared to fitting a free gap to a single observable only. In fact, we find that they often outperform simple two-state fits with a fixed gap with respect to the resulting error onĀ X,00 µ1...µn (0) as correlations in the data are more thoroughly exploited.
Since the fit form in Eq. (27) is symmetric in t around t sep /2 we explicitly symmetrize the data before fitting, which leaves a fit range of t ∈ [t fit , t sep /2] at each value of t sep . Furthermore, we restrict ourselves to a consistent set of source-sink separations for each value of β as listed in Table 6. As a result, we drop the largest available source-sink separation from the fit in a few cases. The data at these additional, largest source-sink separations are typically very noisy and do not affect the final results much within errors, and dropping them entirely can lead to more stable fits. This is in particular so because the problem size is reduced, and hence the estimate of the inverse covariance matrix becomes more reliable.
When selecting time intervals for the simultaneous fits, some care is required to ensure that the fitted gap is stable under variation of the fit interval, since the excitation spectrum is very dense. In the actual fits, we demand M π t fit ≥ 0.4 on all our ensembles, which we found to be a reasonable compromise between the statistical precision and the suppression of further excited states. On some of the ensembles, however, due to the high statistical precision achieved, it is necessary to be more restrictive and leave out further data points. The final choices of t fit /a can be found in Table 6 together with the resulting correlated χ 2 /dof and p-values, as well as the renormalized results for the individual observables on each ensemble. 1 Demanding at least a consistent lower bound on t fit in units of M π is motivated by the expectation that the lowest gap for our ensembles will typically be close to 2M π . On ensembles with large enough statistics it is actually possible to track the convergence of the gap as a function of t fit , which allows us to further corroborate the choice of t fit in these cases. This is illustrated in Fig. 2 for two of our ensembles (C101, N203). Clearly, in both cases the value of ∆ approaches 2M π within errors, which are increasing with M π t fit . Keeping in mind that we are not actually interested in a precise determination of the value of the gap itself, we generally choose the fit range such that the gap has converged within statistical errors (at least on ensembles for which it can be sufficiently tracked) while statistical precision still allows for a stable fit and a meaningful extraction of the final observable.
The goal of our simultaneous multi-state fits is to suppress the residual excited-state contamination to a level which is no larger than the statistical precision. These fits can be systematically improved by 1. increasing statistics while choosing more restrictive bounds on t fit , 2. adding further terms in the fit corresponding to additional terms in Eq. (26), provided one has sufficient statistics to retain a stable fit, 3. including further observables, which we found to stabilize the fits and reduce the resulting error.
Besides, it is possible to use similar fits beyond the case of zero momentum transfer, by removing the assumption of symmetric plateaux. A fit model analogous to in Eq. 27 can be derived for this case, although it will contain additional amplitudes and gaps due to the momentum transfer. Finally, we remark that from a theoretical point of view these simultaneous fits also supersede earlier attempts using a fixed gap as used in Ref. [9] with statistically much less precise data.

Summation method
Ignoring all but the very first term for the ratio on the r.h.s of Eq. (27), corresponds to a constant fit to the ratio data, which is also known as plateau method. Although convergence for the plateau method could in principle be tested considering several source-sink separations, this is in practice not feasible due to the exponential decrease of the signal-to-noise ratio for the nucleon at increasing time separations. At any rate, for the available values of t sep 1.5 fm one would not expect convergence. However, instead of explicitly   Table 6: Parameters including correlated χ 2 /dof and p-values and renormalized results for all six observables from simultaneous fits on each ensemble. Note that the set of source-sink separations that has been used in the fits differs in a few cases from the full list of available data given in Table 1; see discussion in text.  fitting excited-state terms for the ratio in Eq. (27) as discussed in the previous section, it is also possible to achieve additional suppression of excited states by appropriate summation over the operator insertion in time. This so-called summation method was first introduced in Ref. [30]. Here we consider the version with explicit summation of the ratio R X µ1...µn ( 0, t, t sep ) over timeslices t [16,51]  Restricting ourselves to the terms present in Eq. (27), the constant c X µ1...µn and the coefficient f X µ1...µn both receive contributions proportional toĀ X,01 µ1...µn related to transition matrix elements involving the ground state and the first excited state. In order to avoid contributions from contact terms, one (two) timeslices at both ends are excluded from the sum for local (twist-2) operators, i.e. t ex = 1 for X ∈ {A, S, T } and t ex = 2 for X ∈ {vD, aD, tD}. The desired ground-state matrix elementĀ X,00 µ1...µn can be obtained from a linear fit to the lattice data for the l.h.s. using several values of t sep . Clearly, the leading correction ∼ e −∆tsep on the r.h.s. of the above expression is then more strongly suppressed by the larger time extent t sep compared to the leading correction ∼ e −∆t in the case of a naive plateau fit to the data for the ratio itself. In the left column of Fig. 3, examples of fits for g u−d A , g u−d T and x u−d are shown for the N203 ensemble.
In our current setup, such summation method fits are dominated by the smallest source-sink separations, which exhibit the smallest statistical errors. Again, this is a consequence of the aforementioned signal-to-noise problem. Moreover, the efficacy of the summation method is restricted by the total number of different values of t sep and the fact that data at consecutive source-sink separations tend to be strongly correlated. Typically, these issues lead to larger statistical errors for the summation method compared to the plateau method or multi-state fits. Therefore, we consider the summation method only as a cross-check rather than a stand-alone method to obtain final numbers.
Besides, we observe deviations from the linear behavior in Eq. (30) on ensembles with large statistics and including five values of t sep . While hardly visible by eye, the values in the left column of Fig. 3 exhibit non-linear curvature and lie systematically below the fitted result for t sep > 20a. Still, on most ensemble our data are well described by the linear fits, albeit within the rapidly increasing errors at larger values of t sep . For our current setup, the summation method works best for the statistically precise axial and tensor charges. In principle, the results from the summation method might still depend on the source-sink separations used, however, it is not possible to systematically test this effect with the available number of source-sink separations and effective statistics by e.g. leaving out the smallest source-sink separation.
In Table 7, we have included the results from the summation method for all six observables on each ensemble. Overall we find rather good agreement with the results from the simultaneous fits in Table 6, although the errors for the summation method are significantly larger for the local operator insertions. An example for this is shown in the right column of Fig. 3 where we have plotted the lattice data together with results from the summation method and a simultaneous fit for selected observables on ensemble N203. Note that for the summation method we have always used all available values of t sep . For the scalar charge and the twist-2 operator insertions we observe some fluctuations when comparing the two methods. In particular, the summation method fails completely for g u−d S on H105 yielding a negative value, which is clearly due to insufficient statistics. The simultaneous fits still give a reasonable result in this case as they exploit correlations between the different matrix elements.
In general, there is no obvious global trend in any of the observed deviations between summation method and simultaneous fits. However, it appears that there is typically a larger spread in the results from the summation method. This is still true even for the twist-2 operator insertions for which the relative statistical precision is more similar to that of the simultaneous fits than in the case of local operator insertions. However, this behavior is more or less expected because the summation method only uses data for a given observable while the two-state fits are stabilized by fitting all matrix elements simultaneously. This is another important reason why the simultaneous multi-state fits are our preferred method to deal with excited-state contamination.

Chiral, continuum and finite-size extrapolation
Obtaining the final, physical results requires a combined chiral, continuum and finite-size extrapolation to account for unphysical quark masses and the fact that lattice simulations are performed at finite values of the lattice spacing and at finite volume. To this end we have tested several fit ansätze guided by chiral perturbation theory. For any given quantity Q(M π , a, L), the fit models used in this study are derived from the following expression, by an appropriate selection of non-zero fit parameters A, B, C, D and E. We will label fit models by their corresponding combination of non-zero fit parameters, e.g. "ABD". The first term on the r.h.s. represents the observable in the SU (2) F -chiral, continuum and infinite-volume limit, while the second and third term describe the leading chiral behavior. In the case of the axial charge, the coefficient C g A of the term containing the chiral logarithm is known analytically [52,53], The leading continuum behavior is observable dependent, i.e. by default we have n(Q) = 1 for unimproved observables, while in case of the axial and the scalar charge we assume n(g A ) = n(g S ) = 2 since additional counterterms at O(a) do not contribute to the corresponding operators at zero momentum transfer. The last term on the r.h.s of Eq. (31) describes the leading finite-size behavior; see Ref. [54].
As regards the term containing the chiral logarithm, we find that it does not describe our data at all. In the case of the axial charge, we have tested both possible choices, i.e. including the analytically known coefficient in Eq. (32) and leaving it as a free parameter of the fit for model ABCDE. Using the analytical expression we arrive at an implausibly small value of g u−d A = 1.143(21) stat . Besides, we observe a large cancellation between the chiral logarithm and the term ∼ M 2 π for which the coefficient is otherwise compatible with zero. This seems to indicate that our data are not really sensitive to the chiral logarithm. Leaving the parameter free in the fit yields a more plausible result of g u−d A = 1.275(62) stat , however, with a much larger statistical error. Moreover, the fitted coefficient C g A comes out with the wrong sign compared to the analytical expectation in Eq. (32). This is similar to what has been found in an earlier, two-flavor study in Ref. [9]. As a result we do not include this term in our final fit model. We remark that excluding data with M π > 300 MeV does not remedy any of these issues: the corresponding results g u−d A = 1.178(35) stat and g u−d A = 1.31 (15) stat have larger statistical errors, but the qualitative features remain unchanged. Given that the applicability of baryonic ChPT in the mass range studied here is by no means established, we do not necessarily expect an ansatz incorporating Eq. (32) to be superior.

Test of finite-size effects for g u−d A
In the left column of Figure 4 we show the chiral and continuum behavior for g u−d A obtained from fitting model ABD, i.e. without including a finite-size term. The lattice data in the upper and lower panel have been corrected to vanishing lattice spacing and to physical light quark mass, respectively. The resulting behavior is very flat in both M 2 π and a 2 . Nevertheless, a significant spread in the data remains around the blue extrapolation bands. This is reflected by a prohibitively bad value of χ 2 /dof for this fit, i.e. χ 2 /dof ≈ 4.067. In particular, there is one outlier that lies far below all other data points. This data point belongs to ensemble S201 which is the only ensemble with M π L ≈ 3. Since it has been generated with the same input parameters as N200 apart from the spatial volume, we can perform an explicit finite-size test in this case. With respect to the continuum extrapolation shown in the lower panel we find respectively. This very significant difference can be attributed to finite-size effects. For the plots in the right column of Figure 4 we show the chiral and continuum behavior from fit model ABDE, i.e. including a finitesize term, which greatly reduces the scattering of results around the extrapolation bands. In fact, it entirely removes the spread in the values for S201 and N201 which now read respectively. We also find that the quality of the fit is greatly improved, resulting in χ 2 /dof ≈ 0.573. Moreover, the introduction of the additional fit parameter barely increases the statistical error on the final results. We remark that we find finite-size effects to be relevant for all observables. However, it is only for the statistically precise axial charge that we observe such a significant improvement in the resulting value of χ 2 /dof when switching from model ABD to ABDE. The finite-size extrapolation for g u−d A from model ABDE after taking the continuum limit and extrapolating to the physical pion mass is shown in Fig. 5.
For the final CCF extrapolation, we hence adopt model ABDE and perform the required fits using a bootstrap procedure with N s = 10000 samples. To this end, we apply resampling for the values of M π , the individual results for the observables as well as for all quantities that are only β-dependent such as renormalization factors, t 0 /a 2 and t 0,phys . The latter enters the analysis only to fix the physical value of M π in units of MeV. For the physical pion mass, we use the FLAG value in the isospin limit M π,phys = 134.8(3) MeV [55], reflecting the fact that we impose isospin symmetry and neglect electromagnetic effects in our simulations. The bootstrap procedure allows us to propagate all individual errors and accounts also for correlations introduced in the fit by β-dependent quantities such as renormalization factors and factors of t 0 /a 2 . In fact, t 0 /a 2 and unimproved renormalization factors are quark-mass independent and hence 100% correlated at any given β. Lattice data have been corrected to the physical value of the pion mass and the continuum limit using parameters from the fit. Therefore, the corrected data points are highly correlated.  from different CCF fits employing model ABDE and using data from simultaneous fits. In the column labeled renormalization the tag "RIMOM" refers to using renormalization factors from the Rome-Southampton method as in Table 3, while "SF imp." refers to using improved renormalization factors from the Schrödinger functional approach as defined in Eq. (21) and obtained from the data in Table 5. "SF" refers to using unimproved renormalization factors from the Schrödinger functional approach.
added in quadrature to the respective statistical errors before the resampling such that they are propagated into the final error estimate as well. Therefore, the resulting errors are not purely statistical, however, the effects of these systematic uncertainties are very small compared to the actual statistical errors on the final results.

Study of systematics related to renormalization
Another potential source of uncertainty concerns the renormalization factors at β = 3.7 determined via the Rome-Southampton method. As discussed in Sec.  Table 8. Red symbols denote results obtained using Z A from the Rome-Southampton method. Blue and violet symbols represent data obtained using Z A from the Schrödinger functional with mass-dependent counterterms included and excluded, respectively. Filled symbols are used for results obtained by fitting data at all four lattice spacings, while open symbols are used for results when excluding data at β = 3.7. Circles and boxes refer to fitting a lattice artifact O(a 2 ) and O(a), respectively.
is also expected to be the most sensitive one with respect to the aforementioned issues. Besides, for the axial vector current insertion, renormalization factors are available from the Schrödinger functional approach [48] for all four values of β including the mass-dependent factor in Eq. 21. This allows us to conduct an explicit consistency check in this case. A graphical overview of the ten variations can be found in Fig. 6.
The first six of these variations all assume that the leading lattice artifacts are of O(a 2 ) in the CCF fit model ABDE. They can be divided into three subgroups corresponding to the employed renormalization factors, i.e. the Rome-Southampton method and the Schrödinger functional, where the latter may include the mass-dependent factor or not. This allows us to test for the agreement of the two renormalization schemes and for possible deviations caused by ignoring mass-dependent counterterms in Z A . Within each of these three groups, we have two variations with and without including the data at the finest lattice spacing. For results using Z A from the Rome-Southampton method this serves as a cross-check that the extrapolation required for the renormalization factors at β = 3.7 is sound. With respect to the results renormalized via the Schrödinger functional method, we include this variation to be able to disentangle effects which arise when removing data for the finest lattice spacing from the continuum extrapolation and effects related to a potential issue with the extrapolation of Z A at β = 3.7. The last three variations shown in Fig. 6 assume that the leading lattice artifact in the CCF fit is of O(a) instead of O(a 2 ).
First, we find that the results using Z A from the Rome-Southampton method and the Schrödinger functional are in good agreement for the extrapolations linear in a 2 (variations 1 to 6). Moreover, leaving out the data at β = 3.7 has a very similar effect on g u−d A when either the Rome-Southampton method or the Schrödinger functional approach is applied for the renormalization. This leads to the conclusion that a systematic effect caused by the extrapolation to β = 3.7 for the Rome-Southampton method must indeed be very small.
A comparison of variations {3, 4} to {5, 6} reveals that the mass-dependent factor in Eq. 21 is completely negligible within the current statistical precision, i.e. both variations give practically identical results, demonstrating that residual discretization artifacts of O(a) are extremely small. This is also confirmed by the last three variations. While replacing the O(a 2 ) term by an O(a) term in the fit generally leads to somewhat larger continuum results, this behavior cannot be caused by the mass-dependent factor, since the shift is very similar in both cases, as can be inferred from variations 8 and 9.

CCF-related systematics and final results
In Fig. 7 we plot the chiral behavior for the three local isovector charges after taking the continuum limit and correcting for finite-size effects. The panels in the left column show the extrapolation band together with the original lattice data, which gives some indication for the size of continuum and finite-size corrections. For the plots in the right column the lattice data has been corrected for a → 0 and M π L → ∞ using the parameters obtained from the combined CCF fit. In general, the observed chiral behavior is very mild and the corresponding slope w.r.t. M 2 π is often found to be compatible with zero within errors. However, the corrections for leading lattice artifacts and finite-size corrections are typically non-negligible. A qualitatively similar picture is observed for the matrix elements of the twist-2 operator insertions in Fig. 8.
In order to estimate systematic effects in our CCF extrapolations we consider the following three, distinct variations of the fits for each observable: 1. Excluding data with M π,cut > 300 MeV to test the effect of neglecting higher order terms in the chiral extrapolation on our final results. Since the convergence properties of baryonic χPT in the regime of M π > 300 MeV are doubtful, such terms are potentially a major source of systematic errors and even more so at larger light quark masses.
2. Excluding data at the coarsest lattice spacing (β = 3.4) to test the convergence of the continuum extrapolation.
3. Excluding data with M π L < 4 from the CCF fits (ensembles S201 and H105) to test the stability of the finite-size extrapolation.
These cuts in the data are chosen such that enough lattice data points remain for a meaningful fit in all cases. Still, at least for the first two variations they result in significantly larger errors than a fit to the full data set. For each of the three variations we assign an additional systematic error to the final results for each observable, which is given by the difference of the result from the variation and the result using the full set of data. These systematic errors for the three variations are labelled "χ", "cont" and "FS", respectively. However, it should be kept in mind that these variations cannot be fully independent due to the simultaneous (and non-linear) fits. For example, removing the data at β = 3.4 simultaneously removes one of the two ensembles with the smallest pion mass (C101). Therefore, this variation affects not only the continuum extrapolation as intended but in addition may potentially alter the chiral extrapolation in a rather unfavorable way, i.e. removing data at the smallest available light quark masses. This is why we believe that these estimates of systematic errors are rather conservative. Nonetheless, we find that that they are typically of similar or smaller size than the statistical errors, indicating that the final extrapolations are not dominated by systematic effects at the current level of statistical precision.
Our final results for the local nucleon charges read while for the lowest moments of the parton distributions we obtain x ∆u−∆d = 0.221 (25) x δu−δd = 0.212 (32) The corresponding χ 2 /dof and p-values can be found in Table 9, where we have also included the values for the three variations that have been used to assign the systematic errors. In general, we observe that our data are well described by the fit model. Only for g u−d T we observe some tension, which might be related to the chiral extrapolation. This is the only case for which a cut in M π leads to a significant improvement of the fit. None of the other applied cuts have an effect on the fit quality, as can be seen from Table 9. However, we cannot exclude that the behavior observed for g u−d T is merely a fluctuation in our data. Therefore, we prefer to quote the final result from fitting the full set of data, which is consistent with the choice for the other observables.

Summary and discussion
We have computed isovector nucleon axial, scalar and tensor charges as well as the isovector average quark momentum fraction, helicity and transversity moments on a set of eleven gauge ensembles using N f = 2 + 1 flavors of non-perturbatively improved Wilson fermions. The ground-state contribution has been extracted from simultaneous fits with a common, fitted energy gap. Physical results were obtained using a simultaneous extrapolation to the physical pion mass and the continuum and infinite-volume limits.
Adding the (directed) systematic errors in quadrature, our final results can be summarized as g u−d −00 ) sys and x δu−δd = 0.212(32) stat ( +16 −10 ) sys . This is to be compared to the N f = 2 + 1 FLAG average [18] of g u−d A = 1.254 (16) (30), and the N f = 2 + 1 + 1 FLAG averages [18]  A noticeable feature of lattice determinations of g A is that the results from most collaborations are low compared to the experimental value. Looking at our combined chiral, continuum and infinite-volume extrapolation, we find that this may potentially be explained by a conspiracy of different correction terms, all of which tend to depress the lattice value: while the chiral extrapolation is fairly flat, both the continuum and the infinite-volume extrapolation yield large positive corrections to the measured values, which come on top of the positive correction from the removal of the leading excited-state contaminations. Given that all of these effects have the same sign, even small remnants of each could considerably depress the value extracted from lattice simulations.
There are a number of directions in which the present study can be extended: • It would be highly desirable to further increase statistics on existing ensembles in a future study. Since the data at the smallest source-sink separation is already extremely precise, the most effective way to achieve this would be to include additional measurements for the larger source-sink separations such that effective statistics are comparable for each source-sink separation. We expect such an increase in statistics to greatly improve the simultaneous fits. On the one hand, it will lead to a much better determination of the excited-state-to-excited-state term in Eq. (27), which will lead to even more stable fits and smaller statistical errors. On the other hand, it will allow us to further increase the value of t fit and possibly even to drop the smallest source-sink separation entirely, which should lead to an additional reduction of the systematic error arising from excited-state contamination.
• We also plan to add additional ensembles, including one with physical quark masses, in the near future. This should allow us to further reduce the uncertainty on the chiral extrapolation, and might help to remedy the issue with fitting the chiral logarithm in Eq. (31), particularly for g u−d A .
• We are also working on computing the contributions from disconnected quark loops in order to study the isoscalar counterparts of the isovector quantities considered here. This will also require the renormalization of the corresponding singlet operators, which may undergo mixing, adding a further level of complexity.
Finally, we plan to extend our analysis beyond the case of zero momentum transfer in order to study the isovector (and eventually the isoscalar) form factors of the nucleon. A study of the isovector electromagnetic and axial-vector form factors is currently under way.   Table 6, while in the right column the lattice data have been corrected for the continuum limit and finite-size extrapolation using the corresponding fit parameters. Therefore, the corrected data points in the right column are highly correlated within the same plot.    Table 10: The N f = 3 flavour ensembles with periodic boundary conditions used to determine the renormalization constants for this study. The ensembles labelled "rqcd.0XX" were made available by the RQCD collaboration as part of a joint NPR effort.

A Non-Perturbative Renormalization
In this appendix, we give further details of our renormalization procedure, which follows closely that presented for the case of the N f = 2 CLS ensembles in Ref. [59].
where the bare vertex function Λ O is derived from the bare Green's functions G O and S 0 via the amputation of its external legs, The bare Green's functions are measured using momentum sources [61] to compute position-momentum propagators S(y|p) = D −1 yx e ip·x , such that the bare propagator is given by the bare Green's function for a local bilinear operator O X µ1...µn (x) = u(x)Γ X µ1...µn d(x) by To reduce O(4) violation effects, we use only diagonal momenta of the form p = (µ, µ, µ, µ), where twisted boundary conditions ψ(x + L ν e ν ) = e iθν ψ(x) are employed to allow access to arbitrary momenta besides the Fourier modes.
To ensure that the vector and axial vector Ward identities are respected, we further replace [61,63] tr in the case of the vector and axial currents, X ∈ {V, A}.
In the case of the one-link operators, there are two inequivalent H(4) irreps in each case. For the vector and axial vector operators, there are a six-and a three-dimensional representation in each case [64], v 2,a (τ whereas for the tensor operator, there are two inequivalent eight-dimensional representations [64],  Figure 9: The subtraction functions D X (µ, a) for the operators O X considered in this study at a lattice spacing of a = 0.086 fm.
We can then define a subtracted renormalization constant and we expect the corresponding RGI renormalization constant Z RGI,sub X (a) to show only very mild lattice artifacts when considered as a function of µ.

A.2.2 Automated perturbation theory
Since the Feynman rules for lattice perturbation theory are quite complex and do not usually allow for an analytical evaluation of Feynman integrals, we employ the HiPPy/HPsrc packages [76,77], which separate the (complicated, action-dependent) Feynman rules from the (action-independent) Feynman diagrams: the diagrams are coded once and for all in an operator-and action-independent fashion using the HPsrc library of Fortran 95 modules; these generic diagrams can then be evaluated numerically for in principle arbitrary operators and lattice actions. The automated derivation of the action-and operator-dependent Feynman rules is performed in a separate step using the HiPPy library of Python modules, which takes a human-readable expression for an action or operator as input and outputs the corresponding Feynman rules in a machinereadable format suitable for use with HPsrc.
In this manner, we have been able to reuse much of the code written in the context of our study of non-perturbative renormalization for the N f = 2 CLS ensembles [59], even though the gluonic action used is different in the two-and three-flavor cases. , using the bare coupling g 0 , the boosted coupling g b , or the BLM coupling g BLM for the perturbative subtraction. It can be seen that the BLM coupling is most efficient in removing the lattice artifacts, which are otherwise very large.

A.2.3 Choice of coupling
To combine the perturbative and non-perturbative results, we need to make a choice for the coupling. The bare coupling g 2 0 = 6/β is well-known to give generally rather poor results. A widely-used alternative is the boosted coupling g 2 b = g 2 0 / P (g 0 ) , where P (g 0 ) is the (non-perturbatively determined) value of the average plaquette. Using the boosted coupling amounts to a partial resummation of higher-order terms in the perturbative expansion. To better control this resummation, the BLM coupling [78] g 2 BLM = 4πα V (q * ) can be used, where α V (q) is the coupling in the potential scheme defined by the expression for the static potential, and q * is a process-dependent typical momentum scale given by log(q 2 * ) = d 4 q f (q) log(q 2 ) d 4 q f (q) .
We find that using the BLM coupling is highly efficient in removing most of the lattice artifacts using oneloop lattice perturbation theory. In Fig. 10, we show a representative example, i.e. acomparison between the different couplings in the case of the tensor renormalization constant Z RGI T ; it can clearly be seen that the use of the BLM coupling leads to a nearly perfect subtraction of the (rather large) lattice artifacts and is vastly superior in efficiency to the use of either the bare or boosted couplings.

A.3.1 Final fits
To remove the residual µ-dependence of the subtracted RGI renormalization constants, we perform the fit Z RGI,sub X (a, µ) = Z RGI X (β) 1 + d X 1 g 8 MS (µ) + d X 2 (β) (aµ) 2 ∆Z MS X (µ)Z MS X,RI −MOM (µ), where the β-independent term with coefficient d 1 accounts for the use of three-loop continuum perturbation theory in converting from the RI'-MOM scheme, and the term with coefficient d 2 (β) accounts for the use of the perturbative subtraction leaving residual discretization artifacts.
To keep both higher-order perturbative effects and lattice artifacts small, the fit region should ideally satisfy Λ MS µ a −1 .
Since we cannot realistically fulfil both of those inequalities at the same time, we have chosen to take the lower end of the window at µ min = 3 GeV, but allow renormalization scales as large as µ max = 2.75a −1 in the fit, because we rely on the perturbative subtraction of the leading artifacts. An example of the resulting fits is shown in Fig. 11.

A.3.2 Fit variants
To explore possible sources of systematic error, we employ the following fit variants: • adding either a higher-order chiral termc(a, µ)(aM π ) 4 or a finite-volume term d(a, µ)e −MπL to the chiral extrapolation (61), • varying the value of aΛ MS within the uncertainties of Λ MS , and • narrowing the fit window by increasing the lower bound on the fit intervals to µ min = 4 GeV, or by decreasing the upper bounds on the fit intervals to µ max = 2.5a −1 .
Our final estimate of the systematic error is obtained conservatively by adding the spreads from all three variants in quadrature.
A.3.3 Extrapolation to β = 3.7 Since the RI'-MOM scheme is defined in terms of quantities at well-defined four-momenta, it requires a four-dimensional Fourier transform and thus implicitly relies on the gauge ensembles being generated with periodic boundary conditions in time. Due to the extreme critical slowing-down observed in quantities related to the global topology, the generation of sufficiently large and properly thermalized gauge ensembles with periodic boundary conditions at β = 3.7 is not feasible with currently existing computer resources, and the existing β = 3.7 ensembles with open boundary conditions are not suitable for use with RI'-MOM. While there are some proposals how to bypass this issue [79,80], for this study we will rely on an extrapolation of the measured renormalization constants to β = 3.7. Given three values of β at which we have data, we use a linear extrapolation in β to obtain the central value, but do not trust the errors from the fit to account for the full uncertainty. We therefore inflate them by an ad hoc factor of ten to cover the full range of uncertainty involved in the extrapolation.