Confronting lattice parton distributions with global QCD analysis

We present the first Monte Carlo based global QCD analysis of spin-averaged and spin-dependent parton distribution functions (PDFs) that includes nucleon isovector matrix elements in coordinate space from lattice QCD. We investigate the degree of universality of the extracted PDFs when the lattice and experimental data are treated under the same conditions within the Bayesian likelihood analysis. For the unpolarized sector, we find rather weak constraints from the current lattice data on the phenomenological PDFs, and difficulties in describing the lattice matrix elements at large spatial distances. In contrast, for the polarized PDFs we find good agreement between experiment and lattice data, with the latter providing significant constraints on the spin-dependent isovector quark and antiquark distributions.


I. INTRODUCTION
Recent years have witnessed remarkable advances in the extraction of light-cone parton distributions from first-principles lattice QCD calculations of nucleon matrix elements of nonlocal operators. The standard paradigm for determining parton distribution functions (PDFs) empirically has involved factorizing high-energy observables into calculable, processdependent hard-scattering cross sections and nonperturbative functions characterizing the intrinsic properties of the nucleon. On the other hand, lattice QCD allows the nonperturbative nucleon structure to be computed directly, offering a complementary path, along with the phenomenological efforts, to mapping out the structures in the nucleon that emerge from the quark and gluon degrees of freedom in QCD.
While PDFs cannot be calculated directly in lattice QCD, there are novel methods that give access to the x dependence of PDFs (for a recent review see Ref. [1]). One of the most notable approaches is that via parton quasi-distributions [2], which allow one to access PDFs via matrix elements of momentum-boosted hadrons coupled with nonlocal spatial operators. For finite, but large enough, momenta the Fourier transforms of these matrix elements are related to the light-cone PDFs by applying a matching procedure using Large Momentum Effective Theory (LaMET) [3]. In the quasi-PDF approach, the connection between the lattice data and the PDFs is not direct, however. Firstly, the relation involves a Fourier transform from lattice coordinate space into momentum space. More significantly, it requires a nontrivial perturbative matching, up to power corrections, to relate the light-cone correlation functions encoded in the definition of the PDFs with the off-light-cone correlation functions accessible in the lattice calculations.
The traditional method for reconstructing the x-dependence of light-cone PDFs from lattice data has involved performing a truncated discrete Fourier transform (DFT) of the calculated lattice matrix elements. However, the truncation of the Fourier series restricts the information that can be inferred about the PDFs, as only the region measured by the lattice data in coordinate space propagates into the extracted PDFs. This procedure invariably limits the possibility of directly incorporating additional information from experimental data, and ultimately the ability to confront the universality of the PDFs across lattice and experimental observables. Currently, various advanced reconstruction methods are being investigated [4].
An alternative approach to this problem is to Fourier transform the matching between quasi-PDFs and PDFs into coordinate space, where the hadronic matrix elements are calculated in lattice QCD. Such direct matching between the coordinate space matrix elements, which define the quasi-PDFs, and the light-cone PDFs was actually proved to be valid to all orders in QCD perturbation theory when the nonlocal spatial operators are sufficiently localized into a small distance [5][6][7]. This approach allows lattice data to be treated on an equal footing with the experimental observables, and fitted simultaneously within the same analysis.
In this paper, we adopt this approach to perform the first Monte Carlo (MC) based global QCD analysis of both spin-averaged and spin-dependent PDFs from unpolarized and polarized DIS, Drell-Yan, and lattice data [8]. For the analysis we employ the theoretical framework of the Jefferson Lab Angular Momentum (JAM) Collaboration [9][10][11], using MC Bayesian inference sampling methodology, parametrizing the PDFs at the input scale µ, and solving the evolution equations in Mellin space to relate different observables measured at different scales. An analysis for the spin-averaged PDFs using lattice data within the NNPDF framework can be found in Ref. [12].
While the interpretability of the lattice data in terms of PDFs is subject to many systematic uncertainties, including discretization effects, finite volume effects, higher order perturbative corrections in the matching kernel, and power corrections, the inferred PDFs from the DFT exhibit promising general trends consistent with phenomenological distributions, but also some unexpected features in the sea quark sector. In particular, thed −ū antiquark asymmetry at intermediate parton momentum fractions x ∼ 0.1 is found to have the opposite sign in the lattice simulations compared to that in phenomenological analyses.
Similarly, for the spin-dependent PDFs there are indications from some global QCD studies that the ∆ū − ∆d asymmetry may also be opposite in sign to that from the lattice DFT results, although the uncertainties here are considerably larger.
Although the lattice systematic effects still need to be fully understood, it is nevertheless important to explore these intriguing features by confronting the lattice results with experimental data in the framework of QCD global analysis. Such an analysis can aim to achieve three goals: firstly, test the feasibility of including lattice data with experimental data using continuous Fourier transform from PDFs to lattice matrix elements; secondly, establish the degree of universality of the PDFs across the data space; and finally, identify the kinematic regions where the lattice data are compatible with the experimental measurements.
In proceeding with this narrative, we begin in Sec. II by describing the theoretical framework for the lattice matrix elements and experimental cross sections to be used in the global analysis. In Sec. III we discuss the Bayesian methodology used in previous JAM global QCD analyses to make inferences on PDFs from the data. The numerical results from our global analysis are presented in Sec. IV, for both the spin-averaged and spin-dependent PDFs, where we assess the impact of the lattice data on the QCD analysis of the experimental cross sections, and contrast the PDFs with those extracted using the DFT method. We in addition compare the results of the first few moments of the unpolarized and helicity PDFs with and without the lattice data, and study the relative importance of different of coordinate space on the PDF determination. Finally, concluding remarks and suggestions for further research are summarized in Sec. V.

II. LATTICE METHODOLOGY
In this analysis, we consider the nucleon matrix elements of nonlocal operators, for a given Dirac structure Γ, defined by where |N (P 3 ) is the nucleon state with longitudinal momentum P 3 , and ψ q is the quark field operator for flavor q. The quark field operators, which have been shown to be multiplicatively renormalizable [13,14], are displaced along the third spatial component in coordinate space with length z, and connected by a straight Wilson line W 3 (z). The point-by-point multiplicative renormalization factor Z Γ is included to renormalize the field operators, as well as the power-law divergence associated with the Wilson line, using a prescription developed in Refs. [15,16]. The prescription is based on an RI-type scheme and is applied for each value of z separately, with Z Γ (z, µ) converted to the MS scheme and evolved to a scale µ = 2 GeV. Other prescriptions for the point-by-point multiplicative renormalization were also proposed in Refs. [17][18][19][20]. The Dirac structure Γ allows us to access spin-averaged distributions using Γ = γ 0 , or spin-dependent distributions by taking Γ = γ 3 γ 5 . In practice, the lattice data are available for the isovector flavor combination q = u − d, to which we will restrict our analysis.
The spin-averaged and spin-dependent quasi-PDFs are defined in terms of Fourier transforms of the renormalized matrix elements M q ≡ M q [γ 0 ] and M ∆q ≡ M q [γ 3 γ 5 ] , respectively, with respect to the length of the Wilson line, z, by The quasi-PDFs are related to the usual light-cone PDFs via a perturbative matching procedure in LaMET [2,3], where the light-cone PDFs f q and ∆f q have support in the range x ∈ [−1, 1]. The matching kernels C q and C ∆q are calculable in perturbative QCD, and in our analysis we use the 1-loop expression from Ref. [21], which ensures valence quark number conservation. To this order, for the nonsinglet combination u − d the spin-averaged and spin-dependent kernels are equivalent, C q = C ∆q . The conservation of valence quark number is achieved by a mod- where "· · · " indicates the neglected power corrections. The direct matching of the lattice QCD-calculable matrix elements to light-cone PDFs was formally proved to all orders in perturbation theory when z is sufficiently small, with power corrections proportional to O(z 2 Λ 2 QCD ) [5][6][7]. The relations in Eqs. (4) allow us to formally connect the lattice observables to the light-cone PDFs in the same form as the relations connecting experimental observables to the PDFs. For example, the experimental cross sections for inclusive unpolarized (σ DIS ) and polarized (∆σ DIS ) DIS and for Drell-Yan (DY) lepton-pair production (σ DY ) considered in this analysis can be written generically in terms of the spin-averaged f i and spin-dependent where the functions "H" are the process-dependent, short-distance partonic cross sections, computed up to next-to-leading order (NLO) in perturbative QCD, and here the flavor labels i run over quarks, antiquarks and gluons. The symbol "⊗" represents the convolution . In Eqs. (5) the variables x B and x F are the Bjorken and Feynman scaling variables, respectively, and Q 2 is the mass squared of the exchanged or produced virtual boson. The similarity of the relations between the lattice matrix elements and the PDFs in Eqs. (4) and between the experimental cross sections and PDFs in Eqs. (5) allows both types of "observables" to be analysed simultaneously within the same global QCD analysis.
While the flavor sums in Eqs. (4) and (5) run over all partons in the nucleon, the sensitivity of each of the observables to individual parton flavors is different. In the case of inclusive DIS, the data are mostly sensitive to the sums of quark and antiquark distributions, and the use of proton and deuterium data allows one to discriminate between the u-and d-quark flavors [22][23][24][25][26]. On the other hand, fixed target DY data from pp and pd collisions offer direct sensitivity to products of quark and antiquark distributions, and careful choice of kinematics can isolate the antiquark distributions in the proton, and in particular its flavor dependence [27,28].
For the polarized inclusive DIS case, the constraints from the current data on the spin-dependent PDFs are somewhat weaker, due to the lack kinematic coverage across x B and Q 2 , as well as the larger uncertainties on spin-dependent observables. In addition, the valence quark number and momentum sum rules that exist for the spin-averaged PDFs are absent for spin-dependent distributions, and only the moments of the isovector and SU(3) octet combinations of spin PDFs can be related with empirical axial vector charges. The use of flavor SU(3) symmetry in global analyses has been challenged, however, in recent studies of spin-dependent parton distributions [10].
To assess the constraints that the lattice data can place on the individual quark and antiquark distributions, we write the light-cone PDFs (which are defined for positive and negative parton momentum fraction x) in terms of quark and antiquark components defined with support in the range x ∈ [0, 1]. Using the cross symmetry properties of the spinaveraged and spin-dependent PDFs, we can formally write with each of the quark and antiquark PDFs q,q, ∆q and ∆q defined for 0 ≤ x ≤ 1. At leading order, quark PDFs and quasi-PDFs are identical, which implies that PDFs are given by the Fourier transform of the matrix elements of Eq. (1). Considering the real and imaginary parts of the matrix elements separately, one then finds at this order for the spin-averaged distributions the relations and for the spin-dependent case, where the higher order corrections are indicated explicitly. The real part of the matrix ele-ment for the unpolarized case is therefore sensitive only to the valence quark PDF, while the imaginary part contains additional information on the antiquark content. The opposite is true for the spin-dependent distributions, with the imaginary part determined by the polarized valence distributions, and the real part sensitive to the polarization of antiquarks. The sensitivity of the spin-averaged and spin-dependent sea quark distributions is thus maximal for the imaginary and real parts of their corresponding matrix elements, respectively, and minimal for the conjugate counterparts.

III. BAYESIAN FRAMEWORK
Our Bayesian inference for the PDFs is based on the MC methods developed by the JAM Collaboration [9][10][11], which involve parametrizing the PDFs at an input scale µ 0 (chosen to be the charm quark mass, µ 0 = m c ) and solving the DGLAP evolution equations to evaluate the various observable considered in the analysis at different scales. The confidence region for the PDF parameters is characterized by the Bayesian posterior distribution where a represent a vector of the PDF shape parameters, and the likelihood function is chosen to be Gaussian in the χ 2 , The prior function π(a) is flat across the parameter space, and set to zero in regions that give unphysical PDFs.
The MC strategy to sample the posterior distribution utilizes the data resampling approach. It consist in carrying out a large number of χ 2 minimizations until statistical convergence is achieved. Each minimization consists of distorting the original data with Gaussian noise, within the quoted uncertainties, and selecting random initial guesses across the region of parameter space allowed by the prior as the starting points for the minimization.
The procedure allows one to scan thoroughly the parameter space, resulting in ensembles of parameters are that distributed according to the posterior distribution.
For the parametrization of the PDFs at the input scale µ 0 we use the traditional form in the literature, given by a generic template shape where the normalization N (a) = 1 0 dz z a 1 +1 (1 − z) a 2 is defined to be second moment of the distribution in order to decouple the factor a 0 from the exponents a 1 and a 2 , and to guarantee a finite normalization for the PDFs. Using the template shape Eq. (11) where e, i label different data sets and data points from different experiments as well as the lattice data, d e,i are the measured observables, and t e,i are the corresponding theoretical values that depend on the shape parameters a. The quantity α i denotes the uncorrelated uncertainties added in quadrature, while β k,i are the k-th source of point-by-point correlated systematic uncertainties. The parameters r e,k are nuisance parameters that allow distortion of the theory values additively within the systematic uncertainties β k,i , and we add a Gaussian penalty as a regulator for these parameters to avoid over-fitting. While these parameters can be fitted along with the shape parameters a, they can also be optimized by solving the relation ∂χ 2 (a, data)/∂r e,k = 0. Finally, the theoretical values are modified multiplicatively by the parameter N e , which is included as part of the fitting parameters with a Gaussian penalty of width δN e as a regulator.
The resulting parametrization for the spin-averaged PDFs involves 18 shape parameters and 9 normalization parameters for the experimental data, making a total of 27 free param-eters to be inferred using the posterior distribution. Similarly, for the spin-dependent PDFs we have 18 shape parameters and 13 data normalizations, for a total of 31 free parameters.
Due to the high dimensionality of the parameter space for the MC sampling of the posterior distribution, we use the multi-step strategy developed in Ref. [11]. This allows one to bootstrap efficiently into the region of parameters that have higher likelihood values by performing a sequence of steps to build the MC ensemble of parameters. Specifically, we start from flat priors (within the hyperbox) as guess parameters to be optimized by a subset of the data sets. At each new step, the resulting parameters from the previous step are used as starting parameters for the next iteration, in which the data space is enlarged to include an additional data set. The sequence is terminated once all the data sets have been included. This strategy allows one to optimize the parameter ranges efficiently, which would otherwise make the scan of the high dimensional parameter space very inefficient due to the computational cost of evaluating the χ 2 function.
Once the ensemble of parameters {a} is obtained from the MC sampling, we can estimate expectation values and variances for any PDF or observable O computed from the PDFs, With these analysis tools and theoretical framework outlined in Sec. II, we can proceed to the numerical analysis of the experimental and lattice data in the next section.

IV. NUMERICAL ANALYSIS
We begin the discussion of our numerical results with a brief overview of the experimental and lattice data sets used in the our simultaneous global fit. Following this, we present the results of the analysis for the isovector combinations of the unpolarized and polarized PDFs, and assess the impact of the lattice data on the global PDF extractions.
The lattice data used in this analysis were taken from the simulations by the ETM Collaboration in Ref. [8], referred to as "ETMC19." The calculation was performed using Ref. [21]. The excited state contributions were found to be suppressed for the results with t s = 12a, and the data were compatible, within uncertainties, with those from the other analysis methods considered. Three values of the longitudinal momentum P 3 were explored, P 3 = 0.83, 1.11 and 1.38 GeV; however, to control statistical uncertainties in this analysis we use only the results at the largest momentum, P 3 = 1.38 GeV, which were obtained with 72,990 measurements. Finally, nucleon mass corrections were taken into account using the results of Ref. [48].
The lattice data are typically given within a symmetric range of z/a, between −15 and +15. Strictly speaking, the data points at negative z are fully correlated with those at positive z due the symmetry properties of the matrix elements. While one can argue that only half of the data points represents the actual information collected by the lattice simulations, in our analysis we include the full spectrum from negative to positive z values in order to impose the symmetry property of the matrix elements as an additional constraint for the Bayesian inference.
For a meaningful estimate of the impact of the lattice data on the global PDFs fits, we perform two separate likelihood analyses, with and without lattice data, for both unpolarized and polarized PDFs. In principle, an analysis with lattice data alone could also be performed.
However, in practice we find that the PDFs extracted using from lattice data only are not well A summary of the fit results for the spin-averaged PDF analysis is presented in Table I, where the χ 2 values for each data set are given, along with the total, for the experimentonly and experiment + lattice scenarios. The fit to the 2,930 inclusive DIS and Drell-Yan experimental data points gives a χ 2 per datum of ≈ 1.3. This is a little higher than in recent start-of-the-art global PDF analyses [11,[49][50][51][52][53], mostly because of the very simple functional form chosen in Eq. (11). However, for the purposes of our current analysis, which is to assess the impact of the lattice data on global QCD fits rather than to provide a new set of precision PDFs, the fit is acceptable.   effect of including the lattice data is to increase the total χ 2 by over ≈ 800 units, leading to an overall χ 2 per datum of 1.6 for the combined experiment + lattice fit. The current lattice data therefore do not have enough constraining power, relative to the precision of the experimental data, to significantly affect, or improve, the overall global fit of the unpolarized PDFs.
The fit results are illustrated graphically in Fig. 1, where we compare the real and imaginary parts of the matrix element M u−d from the lattice simulations [21], as a function of z/a, with the global fits to the experimental data only and with the combined experiment + lattice analysis results. For the real part of the matrix element, which is sensitive to the isovector valence quark combination u v − d v , we find relatively good agreement in the small-|z| region, which reflects the consistency between the lattice and experimental data for the valence number sum rules. It is also related to the fact that the matching in Eq. (4) is valid when z is sufficiently small, whereas the agreement deteriorates at large |z|, where the fitted theory prefers a faster suppression with increasing |z|.
The imaginary part of the isovector matrix element, which is sensitive to the antiquark asymmetryū −d, has an even stronger tension with the experimental data, as seen in the comparison of the lattice values in Fig. 1 with the experimental and combined fits. The discrepancy is dramatically illustrated in the comparison of the isovector combinations of fitted PDFs in Fig. 1 with the distributions extracted from the direct DFT approach. For the net u − d distribution, one observes a moderate compatibility of the PDFs with and without lattice data in the valence quark region, but a striking difference in the very high-x region compared with the direct DTF results. For the isovector antiquark distributionū −d, inclusion of the lattice data does not change the negative sign of the asymmetry at x 0.2, which is driven largely by the E866 Drell-Yan data [27,28], preventing a faster decay of the matrix elements in the large-|z| region. This discrepancy observed in the x-dependent PDFs has been discussed in Refs. [8,21].

C. Polarized PDFs
The results for the spin-dependent PDF analysis, in which 651 polarized inclusive DIS data points are fitted together with an additional 61 lattice data points, are shown in Table II.
The total χ 2 per datum for the experiment-only fit is 1.1, and actually decreases slightly The level of agreement between the lattice and experimental data can be seen in Fig. 2 for the spin-dependent coordinate space matrix element M ∆u−∆d . For the real part of the matrix element, which is sensitive to the isovector combination ∆u + ∆ū − ∆d − ∆d, we find excellent agreement in the small-|z| region, indicating the consistency of the lattice triplet axial charge with experiment. The agreement deteriorates at larger values of |z|, where the tails of the matrix element increasingly deviate away from zero, but are still consistent within the large uncertainties with the fit to experimental data.
The predictions for the imaginary part, which is sensitive to the spin-dependent isovector valence combination ∆u v − ∆d v , have the largest discrepancies at intermediate values of z/a  if one uses only the experimental data. On the other hand, the combined experiment + lattice data analysis allows an optimal solution to be found that simultaneously describes both types of data sets. The fact that the lattice data are able to change the posterior distribution indicates that already the existing lattice data can provide significant constraints on spin-dependent PDFs beyond that from the existing experimental measurements.
The impact of the lattice data on the spin-dependent PDF uncertainties is illustrated in Fig. 2, where the uncertainties on the PDFs fitted to experimental data alone are significantly reduced when supplemented with the lattice matrix element data. Also shown in Fig. 2 are the PDFs inferred via the DTF method, which are found to be in relatively good agreement with the PDFs extracted using the continuum approach within current uncertainties. In contrast to the spin-averaged case in Fig. 1, however, we find that for the spin-dependent PDFs the lattice data do not suggest any large ∆ū − ∆d asymmetry, as suggested by the data on the longitudinal single-spin asymmetry for W ± production from RHIC [54]. More accurate lattice data and better matching coefficients could help improve the determination of this asymmetry.

D. Moments
To further test the quality of the agreement between the lattice and experimental data, we consider moments of PDFs, which are less sensitive to point-to-point statistical fluctuations of the PDFs in momentum space. The N -th moments of the spin-averaged (q) and spindependent (∆q) PDFs in Eqs. (6a) and (6b) are defined by respectively. In Fig. 3 the first five moments are compared relative to the moments extracted from experimental data alone. For the spin-averaged case, a clear pattern of disagreement is observed when the experimental are fitted together with the lattice matrix elements. Although the lattice-constrained results differ from the experiment-only fits by ∼10%-15%, because of the small uncertainties on the data this represents a discrepancy of up to ≈ 4σ. x(∆u − ∆d) For the spin-dependent case, on the other hand, the moments are generally in very good agreement when extracted with or without the lattice data, and indicate a significant reduction of uncertainties when the lattice data are included in the analysis.
Assuming that the issues that generate the disagreement between lattice and experimental data for the spin-averaged case are resolved, a relevant question to ask is whether the current precision of the lattice calculations can have an impact on the spin-averaged PDFs. To answer this question, we carry out a pseudo-data analysis using as "lattice" data the central predictions from the PDFs extracted from experimental data alone, but keeping the actual uncertainties quoted by the lattice calculations. As can be seen from Fig. 3, the impact of the current precision of the lattice data on spin-averaged PDFs is marginal. In contrast, the reduction of the uncertainties on the moments for the spin-dependent case is quite significant.
Focusing on the spin-dependent PDFs, we also explore the importance of the large-|z| lattice matrix elements, which generally have larger statistical errors than the low-|z| data.
The larger uncertainties stem from the renormalization factor growing exponentially with |z| due to power-law divergences, and greater sensitivity to power corrections and lattice systematic uncertainties. To address this question we perform several additional fits using subsets of the lattice data with cuts on the maximum values of |z| used in the analysis.
The results in Fig. 4 generally indicate stability of the spin-dependent PDFs with respect to varying the cut between z max /a = 15, which includes all of the computed lattice data, and z max /a = 8, which excludes most of the tails in which the errors on the matrix elements grow large. The extracted spin-dependent PDFs from each of the regions considered are within the estimated errors, and the ratios of the moments for z max /a = 8 and 12 to the full results with z max /a = 15 are unity within the quoted uncertainties.

V. CONCLUSIONS
In this paper we have performed the first combined MC-based global QCD analysis of spin-averaged and spin-dependent PDFs, using both experimental data and matrix elements computed from first principles in lattice QCD. Utilizing the recent calculations of the isovector matrix elements from Ref. [8], we combined these in our likelihood analysis on the same footing as the experimental data using the continuous Fourier transform approach relating matrix elements with the PDFs.
For the spin-averaged PDFs, we found significant tension between the lattice and experimental data, with the precision of the latter dominating the inference on the PDFs.
The precision of the unpolarized lattice matrix elements will need to improve significantly in future for the lattice data to be competitive with experiment in constraining PDFs. In contrast, our analysis of the spin-dependent PDFs shows promising agreement between the lattice and experimental data, with the current precision of the polarized matrix elements providing significant constraints on PDFs compared with the precision of existing experimental data. In particular, we find that the existing lattice results do not indicate a large ∆ū−∆d asymmetry, as suggested in recent data on W production in single-spin asymmetries observed at RHIC [54].
Since the direct matching between lattice calculable matrix elements and PDFs in Eq. (4) has been shown to be valid to all orders in QCD perturbation theory when z is sufficiently small, in the future more precise data from lattice configurations with smaller lattice spacing will be needed. This will allow for a larger range of hadron momentum P 3 and values of z, which will provide better constraints on the x dependence of PDFs, especially in the spin-dependent sector. Our analysis demonstrates the feasibility of including lattice data in standard QCD global analysis, and paves the way for future precision studies of PDFs that can treat information from experimental data and lattice QCD on equal footing.