Self-consistent determination of proton and nuclear PDFs at the Electron Ion Collider

We quantify the impact of unpolarized lepton-proton and lepton-nucleus inclusive deep-inelastic scattering (DIS) cross section measurements from the future Electron-Ion Collider (EIC) on the proton and nuclear parton distribution functions (PDFs). To this purpose we include neutral- and charged-current DIS pseudodata in a self-consistent set of proton and nuclear global PDF determinations based on the NNPDF methodology. We demonstrate that the EIC measurements will reduce the uncertainty of the light quark PDFs of the proton at large values of the momentum fraction $x$, and, more significantly, of the quark and gluon PDFs of heavy nuclei, especially at small and large $x$. We illustrate the implications of the improved precision of nuclear PDFs for the interaction of ultra-high energy cosmic neutrinos with matter.

Introduction -The construction of an Electron-Ion Collider (EIC) [1,2] has been recently approved by the United States Department of Energy at Brookhaven National Laboratory, and could record the first scattering events as early as 2030. By colliding (polarized) electron or positron beams with proton or ion beams for a range of center-of-mass energies, the EIC will perform key measurements to investigate quantum chromodynamics (QCD) at the intensity frontier. These measurements will be fundamental to understand how partons are distributed in position and momentum spaces within a proton, how the proton spin originates from the spin and the dynamics of partons, how the nuclear medium modifies partonic interactions, and whether gluons saturate within heavy nuclei.
In this paper we focus on one important class of EIC measurements, namely inclusive cross sections for unpolarized lepton-proton and lepton-nucleus deep-inelastic scattering (DIS). In particular we study how such data could improve the determination of the unpolarized proton and nuclear parton distribution functions (PDFs) [3] by incorporating suitable pseudodata in a self-consistent set of PDF fits based on the NNPDF methodology (see Ref. [4] and references therein for a comprehensive description). The unique ability of an EIC to measure inclusive DIS cross sections consistently for the proton and a wide range of nuclei will be exploited also to update the proton PDFs used as a boundary condition in the nuclear PDF fit. This feature distinguishes our analysis from previous studies [5,6], and may be extended to a simultaneous determination of proton and nuclear PDFs in the future. The results presented in this work integrate those contained in Sects. 7.1.1 and 7.3.3 of the upcoming EIC Yellow Report [7]. They systematically account for the impact of projected inclusive DIS measurements at an EIC on the unpolarized proton PDFs for the first time (for projected semi-inclusive DIS measurements see Ref. [8]), and supersede a previous NNPDF analysis of the impact of EIC measurements on nuclear PDFs [6]. Similar studies for polarized PDFs have been performed elsewhere [9][10][11][12], including in the NNPDF framework [13].
The structure of this paper is as follows. We first describe how EIC pseudodata are generated. We then study how they affect the proton and nuclear PDFs once they are fitted. Lastly, we illustrate how an updated determination of nuclear PDFs can affect QCD at the cosmic frontier, in particular predictions for the interactions of highly-energetic neutrinos with matter as they propagate through Earth towards large-volume detectors.
Pseudodata generation -In this analysis we use the same pseudodata as in the EIC Yellow Report [7], see in particular Sect. 8.1. In the case of lepton-proton DIS, they consist of several sets of data points corresponding to either the neutral-current (NC) or the charged-current (CC) DIS reduced cross sections, $\sigma_{\rm NC}$ and $\sigma_{\rm CC}$, respectively; see, e.g., Eqs. (7) and (10) in Ref. [14] for their definition. Both electron and positron beams are considered, for various forecast energies of the lepton and proton beams. In the case of lepton-nucleus DIS, the pseudodata correspond only to NC DIS cross sections; see, e.g., the discussion in Sect. 2.1 of Ref. [6] for their definition. Both electron and positron beams are considered in conjunction with a deuteron beam; only an electron beam is instead considered for other ions, namely $^{4}$He, $^{12}$C, $^{40}$Ca, $^{64}$Cu, and $^{197}$Au. A momentum transfer $Q^2 > 1$ GeV$^2$, a squared invariant mass of the system $W^2 > 10$ GeV$^2$, and a fractional energy of the virtual particle exchanged in the process $0.01 \le y \le 0.95$ are assumed in all of the above cases, consistently with the detector requirements outlined in Sect. 8.1 of Ref. [7].
The pseudodata distribution is assumed to be multi-Gaussian, as in the case of real data. It is therefore uniquely identified by a vector of mean values $\mu$ and a covariance matrix $\Sigma$, for which the following assumptions are made. The mean values correspond to the theoretical expectations $t$ of the DIS cross sections obtained with a true underlying set of PDFs, and smeared by normal random numbers $r$ sampled from the covariance matrix such that $\mu = t + r\Sigma$. Specifically we use a recent variant [15] of the NNPDF3.1 determination [16], and the nNNPDF2.0 determination [17], for proton and nuclear PDFs, respectively. The covariance matrix is made up of three components, which correspond to a statistical uncertainty, an additive uncorrelated systematic uncertainty, and a multiplicative correlated systematic uncertainty. The statistical uncertainty is determined by assuming an integrated luminosity $\mathcal{L}$ of 100 fb$^{-1}$ for electron-proton NC and CC DIS, and of 10 fb$^{-1}$ in all other cases. The systematic uncertainties are instead determined with the djangoh event generator [18], which contains the Monte Carlo program heracles [19] interfaced to lepto [20]. These pieces of software collectively allow for an account of one-loop electroweak radiative corrections and radiative scattering. The Lund string fragmentation model, as implemented in pythia/jetset (see, e.g., Ref. [21] and references therein), is used to obtain the complete hadronic final state. The non-perturbative proton and nuclear PDF input is made available to djangoh by means of numeric tables corresponding to the relevant NC and CC DIS structure functions, which were generated with apfel [22] in the format of lhapdf [23] grids. The optimal binning of the pseudodata is determined accordingly.
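As an illustration of the procedure described above, the following minimal Python sketch builds a covariance matrix from the three components and draws one set of smeared pseudodata. It is not the code used in the analysis. The function names, the toy input numbers, and the choice to express both systematic components as fractions of the theory value are assumptions made here. The smearing is implemented by drawing Gaussian noise with covariance $\Sigma$ through a Cholesky factor, which is one possible reading of $\mu = t + r\Sigma$.

```python
import math
import random


def build_covariance(theory, stat, sigma_u, sigma_c):
    """Toy covariance: statistical and uncorrelated systematic terms on the
    diagonal, plus a fully correlated (normalization-like) term everywhere.
    sigma_u and sigma_c are relative uncertainties (assumed constant)."""
    n = len(theory)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cov[i][j] = (sigma_c * theory[i]) * (sigma_c * theory[j])
        cov[i][i] += stat[i] ** 2 + (sigma_u * theory[i]) ** 2
    return cov


def cholesky(cov):
    """Lower-triangular factor low such that low * low^T = cov."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - partial)
            else:
                low[i][j] = (cov[i][j] - partial) / low[j][j]
    return low


def smear(theory, cov, rng):
    """Return theory values shifted by Gaussian noise with covariance cov."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in theory]
    return [
        t + sum(low[i][k] * z[k] for k in range(i + 1))
        for i, t in enumerate(theory)
    ]


# Hypothetical inputs: three cross-section values with absolute statistical
# errors, and the optimistic NC relative systematics quoted in the text.
theory = [1.0, 0.8, 0.5]
stat = [0.01, 0.01, 0.02]
cov = build_covariance(theory, stat, sigma_u=0.015, sigma_c=0.025)
pseudodata = smear(theory, cov, random.Random(1))
```

In this sketch the correlated term fills the off-diagonal entries, so a single normalization shift moves all points of a set coherently, while the statistical and uncorrelated terms only widen the diagonal.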
The complete set of pseudodata considered in this work is summarized in Table 1. For each pseudodata set, we indicate the corresponding DIS process, the number of data points $n_{\rm dat}$ before (after) applying kinematic cuts (see below), the energy of the lepton and of the proton or ion beams $E$ and $E_p$, the center-of-mass energy $\sqrt{s}$, the luminosity $\mathcal{L}$, and the relative uncorrelated and correlated systematic uncertainties (in percentage) $\sigma_u$ and $\sigma_c$. Two different scenarios, called optimistic and pessimistic henceforth, are considered, which differ in the number of data points and in the size of the projected systematic uncertainties. In the case of NC cross sections, the uncorrelated uncertainty was estimated to be 1.5% (2.3%) in the optimistic (pessimistic) scenario. These uncertainties originated from a 1% uncertainty on the radiative corrections, and a 1% (2%) uncertainty due to detector effects. The normalization uncertainty was set to 2.5% (4.3%). This included a 1% uncertainty on the integrated luminosity and a 2% (4%) uncertainty due to detector effects. In the case of CC cross sections, an uncorrelated uncertainty of 2% was used in both the optimistic and pessimistic scenarios, while a normalization uncertainty of 2.3% (5.8%) was used in the optimistic (pessimistic) scenario. This uncertainty includes contributions from luminosity, radiative corrections and simulation errors. Estimating systematic uncertainties for an accelerator and a detector which have not yet been constructed is particularly challenging. The percentages given in Table 1 [...] performed using the EIC Handbook detector and the current EIC detector matrix [24]. Relative systematic uncertainties are estimated to be independent from the values of $x$ and $Q^2$, in contrast to statistical uncertainties. For NC pseudodata (with $\mathcal{L} = 100$ fb$^{-1}$), systematic uncertainties are significantly larger than statistical uncertainties in much of the probed kinematic phase space, see e.g. Figs. 7.1 and 7.67 in [7]. Conversely, for CC pseudodata, systematic uncertainties are comparable to statistical uncertainties for most of the measured kinematic space. The kinematic coverage of the EIC pseudodata in the $(x, Q^2)$ plane is displayed in Fig. 1 for the optimistic scenario. Pseudodata for lepton-proton and lepton-deuteron are separated from pseudodata for electron-ion collisions via different panels. The approximate coverage of currently available inclusive DIS measurements is shown as a shaded area. Dashed lines correspond to the kinematic cuts used in the PDF fits described below.

Table 1. The EIC pseudodata sets considered in this work. For each of them we indicate the corresponding DIS process, the number of data points $n_{\rm dat}$ in the optimistic/pessimistic scenarios before (after) kinematic cuts, the energy of the lepton and of the proton or ion beams $E$ and $E_p$, the center-of-mass energy $\sqrt{s}$, the integrated luminosity $\mathcal{L}$, and the relative uncorrelated and correlated systematic uncertainties (in percentage) $\sigma_u$ and $\sigma_c$ in the optimistic/pessimistic scenarios.

Figure 1. The expected kinematic coverage in the $(x, Q^2)$ plane of the EIC pseudodata for lepton-proton or lepton-deuteron (left) and lepton-nucleus (right panel) collisions, see Table 1. Shaded areas indicate the approximate kinematic coverage of the available inclusive DIS measurements. The dashed lines denote the kinematic cuts used in the PDF fits, $Q^2 \ge 3.5$ GeV$^2$ and $W^2 \ge 12.5$ GeV$^2$.

Table 2.
Fit ID | Description
NNPDF3.1+EIC (optimistic) | Same as the base fit of [15] augmented with the $e^\pm p$ (CC and NC) and $e^\pm d$ (NC) EIC pseudodata sets for the optimistic scenario.
NNPDF3.1 pch+EIC (optimistic) | Same as the proton baseline fit of [17] augmented with the $e^\pm p$ (CC and NC) pseudodata sets for the optimistic scenario.
nNNPDF2.0+EIC (optimistic) | Same as the nuclear fit of [17] augmented with the $e^- A$ (NC) pseudodata sets (with $A$ = $^{2}$d, $^{4}$He, $^{12}$C, $^{40}$Ca, $^{64}$Cu and $^{197}$Au) for the optimistic scenario.
nNNPDF2.0+EIC (pessimistic) | Same as nNNPDF2.0+EIC (optimistic), but with EIC pseudodata sets for the pessimistic scenario.

From Fig. 1 one can see the relevance of the EIC for the determination of nuclear PDFs. In this case, the EIC measurements extend the kinematic reach of DIS by more than one order of magnitude in both $x$ and $Q^2$.
In the case of proton PDFs, instead, the EIC measurements mostly overlap with those already available, in particular from HERA, except for a slightly larger extension at very high x and Q 2 .
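The kinematic reach discussed above follows from the standard DIS relation $Q^2 = x\,y\,s$ (masses neglected) together with the inelasticity range $0.01 \le y \le 0.95$ assumed for the pseudodata. The short Python sketch below evaluates the smallest accessible $x$ at a given $Q^2$. The beam energies used in the example are illustrative values chosen here, not numbers quoted in the text, and the function names are hypothetical.

```python
import math


def sqrt_s(e_lepton, e_hadron):
    """Collider center-of-mass energy in GeV, neglecting beam-particle
    masses: s = 4 * E_lepton * E_hadron (E_hadron per nucleon for ions)."""
    return math.sqrt(4.0 * e_lepton * e_hadron)


def x_min(q2, s, y_max=0.95):
    """Smallest x reachable at a given Q^2 (GeV^2), from Q^2 = x * y * s."""
    return q2 / (y_max * s)


# Illustrative beam configuration (an assumption): 18 GeV leptons on
# 275 GeV protons.
root_s = sqrt_s(18.0, 275.0)
lowest_x = x_min(q2=1.0, s=root_s ** 2)
```

Lowering the hadron beam energy per nucleon, as happens for heavy ions, reduces $s$ and therefore raises the smallest accessible $x$ at fixed $Q^2$.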
Fitting procedure -We include the pseudodata in the series of fits summarized in Table 2. All these fits use the NNPDF methodology. Because nuclear PDFs are correlated with proton PDFs (the former should reduce to the latter in the limit $A \to 1$, where $A$ is the nucleon number), and because the EIC measurements of Table 1 will affect both, we determine them sequentially. First, we focus on the proton PDFs, and perform the NNPDF3.1+EIC optimistic and pessimistic fits. These are a rerun of the base fit of Ref. [15], which is now augmented with the $e^\pm p$ (CC and NC) and $e^\pm d$ (NC) EIC pseudodata sets for the optimistic and pessimistic scenarios. As in Refs. [15,16], they are all made of $N_{\rm rep} = 100$ Monte Carlo replicas. After kinematic cuts, the fits include a total of 5264 (5172) data points in the optimistic (pessimistic) scenario, out of which 1286 (1194) are EIC pseudodata and 3978 are real data (see Ref. [15] for details). Kinematic cuts are the same as in Refs. [15,16], specifically $Q^2 > 3.5$ GeV$^2$ and $W^2 > 12.5$ GeV$^2$. These cuts, which serve the purpose of removing a kinematic region in which potentially large higher-twist and nuclear effects may spoil the accuracy of the PDF analysis, are more restrictive than those used to generate the pseudodata. This fact is however not contradictory, and reproduces what customarily happens with real data, when different kinematic cuts are used in the experimental analysis and in a fit. These fits are accurate to next-to-next-to-leading order (NNLO) in perturbative QCD, they utilize the FONLL scheme [25][26][27] to treat heavy quarks, and they include a parametrization of the charm PDF on the same footing as the lighter quark PDFs.
In comparison to the original NNPDF3.1 fits [16], a bug affecting the computation of theoretical predictions for charged-current DIS cross sections has been corrected, the positivity of the $F_2^c$ structure function has been enforced, and NNLO massive corrections [28,29] have been included in the computation of neutrino-DIS structure functions.
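The kinematic cuts applied in the fits can be sketched as follows. The squared invariant mass is computed from the standard relation $W^2 = m_p^2 + Q^2 (1-x)/x$. The function names and the toy list of $(x, Q^2)$ points are hypothetical and serve only to show how the cuts $Q^2 > 3.5$ GeV$^2$ and $W^2 > 12.5$ GeV$^2$ act on a data set.

```python
M_PROTON = 0.938  # proton mass in GeV


def w2(x, q2, mass=M_PROTON):
    """Squared invariant mass of the hadronic final state, in GeV^2."""
    return mass ** 2 + q2 * (1.0 - x) / x


def passes_fit_cuts(x, q2, q2_min=3.5, w2_min=12.5):
    """True if the point survives the cuts used in the fits."""
    return q2 > q2_min and w2(x, q2) > w2_min


# Toy (x, Q^2 in GeV^2) points, chosen here for illustration only.
points = [(0.001, 2.0), (0.1, 10.0), (0.8, 20.0), (0.5, 100.0)]
kept = [p for p in points if passes_fit_cuts(*p)]
```

The first point is removed by the $Q^2$ cut and the third by the $W^2$ cut, which is the mechanism by which points at large $x$ and moderate $Q^2$ are excluded.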
We then focus on nuclear PDFs, and perform the NNPDF3.1 pch+EIC and nNNPDF2.0+EIC optimistic and pessimistic fits. These are a rerun of the proton and nuclear baseline determinations of Ref. [17], augmented respectively with the $e^\pm p$ (CC and NC) and the $e^- A$ (NC), $A$ = d, $^{4}$He, $^{12}$C, $^{40}$Ca, $^{64}$Cu, and $^{197}$Au, pseudodata sets for the optimistic and pessimistic scenarios. As in Ref. [17], the proton (nuclear) fits are made of $N_{\rm rep} = 100$ ($N_{\rm rep} = 250$) Monte Carlo replicas. After kinematic cuts, the NNPDF3.1 pch+EIC fits include a total of 4147 (4055) data points in the optimistic (pessimistic) scenario, out of which 846 (754) are EIC pseudodata and 3301 are real data (see Ref. [17] for details). The nuclear fits include a total of 3007 data points, out of which 1540 are EIC pseudodata and 1467 are real data. Kinematic cuts are the same as above, and are in turn equivalent to those used in Refs. [16,17]. These fits are accurate to next-to-leading order (NLO) in perturbative QCD, and assume that charm is generated perturbatively, consistent with Ref. [17].
Although the proton and nuclear PDF fits are performed independently, they remain as consistent as possible. Most importantly, the unique feature of an EIC to measure DIS cross sections with a comparable accuracy and precision for a wide range of nuclei and for the proton is key to inform the fit of nuclear PDFs as much as possible. Not only do the measurements on nuclear targets enter the fit directly, but also the measurements on a proton target are first used to update the necessary baseline proton PDF determination. This feature distinguishes our work from previous similar studies [5,6], where only the effect of measurements on nuclear targets was taken into account in the determination of nuclear PDFs. A simultaneous determination of proton and nuclear PDFs might eventually become advisable at an EIC, should the measurements be sufficiently precise to make an independent determination less reliable.
We also note that the pseudodata sets for a deuteron target are alternatively included in the fit of proton PDFs or in the fit of nuclear PDFs. To avoid double counting, they are not included in the fit of proton PDFs used as baseline for the fit of nuclear PDFs. This choice follows the common practice to include fixed-target DIS data on deuteron targets in fits of proton PDFs, as done, e.g., in NNPDF3.1 and in the variant fit used here to generate the pseudodata, because they are essential to achieve a good quark flavour separation. The EIC pseudodata sets for a deuteron target are then treated, in the proton PDF fits performed here, similarly to the fixed-target DIS data already included in NNPDF3.1. Specifically we assume that nuclear corrections are negligible, and therefore we do not include them. This assumption could be overcome by means of a simultaneous fit of proton and nuclear PDFs, or by means of the iterative procedure proposed in Ref. [30], whereby proton and deuteron PDFs are determined by subsequently including the uncertainties of each in the other. Either approach goes beyond the scope of this work, as it would have little applicability in the context of pseudodata.
Results -We now turn to discuss the results of the fits collected in Table 2. As expected, the goodness of each fit measured by the $\chi^2$ per number of data points is comparable to that of the fits used to generate the pseudodata. The description of each data set remains unaltered within statistical fluctuations, and the $\chi^2$ per number of data points for each of the new EIC pseudodata sets is of order one, as it should be by construction. In the following we therefore exclusively discuss how the EIC pseudodata affect PDF uncertainties. In Fig. 2 we show the relative uncertainty of the proton PDFs in the NNPDF3.1 fit variant used to generate the pseudodata, and in the NNPDF3.1+EIC fits, both for the optimistic and pessimistic scenarios. In each case, uncertainties correspond to one standard deviation, and are computed as a function of $x$ at $Q^2 = 100$ GeV$^2$. Only the subset of flavors (or flavor combinations) that are the most affected by the EIC pseudodata are shown: $u$, $d/u$, $s$ and $g$. Fig. 2 allows us to draw two conclusions. First, the impact of the EIC pseudodata is localized in the large-$x$ region, as expected from their kinematic reach (see Fig. 1). This impact is significant in the case of the $u$ PDF, for which PDF uncertainties could be reduced by up to a factor of two for $x \simeq 0.7$. The impact is otherwise moderate for the $d/u$ PDF ratio (for which it amounts to an uncertainty reduction of about one third for $0.5 \lesssim x \lesssim 0.6$) and for the $s$ PDF (for which it amounts to an uncertainty reduction of about one fourth for $0.3 \lesssim x \lesssim 0.6$). The relative uncertainty of the gluon PDF, and of other PDFs not shown in Fig. 2, remains unaffected. These features rely on the unique ability of the EIC to perform precise DIS measurements at large $x$ and large $Q^2$: their theoretical interpretation remains particularly clean, as any non-perturbative large-$x$ contamination due, e.g., to higher-twist effects, is suppressed.
This possibility distinguishes the EIC from HERA, which had a similar reach at high $Q^2$ but a more limited access at large $x$, and from fixed-target experiments (including the recent JLab-12 upgrade [31]), which can access the high-$x$ region only at small $Q^2$. Secondly, the impact of the EIC pseudodata does not seem to depend on the scenario considered: the reduction of PDF uncertainties remains comparable irrespective of whether optimistic or pessimistic pseudodata projections are included in the fits. Because the two scenarios only differ in systematic uncertainties, we conclude that it may be sufficient to control these to the level of precision forecast in the pessimistic scenario. A similar behavior is observed for the NNPDF3.1 pch fits, which are therefore not displayed. In Fig. 3 we show the relative uncertainty of the nuclear PDFs in the nNNPDF2.0 fit used to generate the pseudodata, and in the nNNPDF2.0+EIC fits, both in the pessimistic and in the optimistic scenarios. Uncertainties correspond to one standard deviation, and are computed as a function of $x$ at $Q^2 = 100$ GeV$^2$. Results are displayed for the ions with the lowest and highest atomic mass, $^{4}$He and $^{197}$Au, and for an intermediate atomic mass ion, $^{64}$Cu, and only for the PDF flavors that are the most affected by the EIC pseudodata: $u$, $d$, $s$ and $g$.
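The relative uncertainties compared above are one-standard-deviation spreads over the Monte Carlo replica ensemble, divided by the ensemble mean. A minimal Python sketch of that computation at a single $(x, Q^2)$ point is given below. The replica values are invented for illustration, the function names are hypothetical, and the use of the sample standard deviation (rather than the population one) is a choice made here.

```python
import statistics


def relative_uncertainty(replica_values):
    """One-sigma spread over replicas divided by the absolute mean."""
    mean = statistics.fmean(replica_values)
    return statistics.stdev(replica_values) / abs(mean)


def reduction_factor(baseline_replicas, updated_replicas):
    """Ratio of baseline to updated relative uncertainty (>1 means the
    updated fit is more precise)."""
    return relative_uncertainty(baseline_replicas) / relative_uncertainty(
        updated_replicas
    )


# Invented replica values of a PDF at one (x, Q^2) point.
baseline = [0.8, 1.0, 1.2]
with_eic = [0.9, 1.0, 1.1]
factor = reduction_factor(baseline, with_eic)
```

With these toy numbers the spread halves while the mean is unchanged, so the reduction factor is two.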
From Fig. 3 we observe a reduction of nuclear PDF uncertainties, due to EIC pseudodata, that varies with the nucleus, the $x$ region considered, and the PDF. Overall, the heavier the nucleus, the larger the reduction of PDF uncertainties. This is a consequence of the fact that nuclear PDFs are customarily parametrized as continuous functions of the nucleon number $A$: nuclear PDFs for $^{4}$He, which differ from the proton PDF boundary by a small correction, are better constrained than nuclear PDFs for $^{197}$Au because proton data are more abundant than data for nuclei. In this respect, the EIC will allow one to perform a comparatively accurate scan of the kinematic space for each nucleus individually, and, as shown in Fig. 3, to determine the PDFs of all ions with a similar precision. The reduction of PDF uncertainties is localized in the small-$x$ region, where little or no data are currently available (see Fig. 1), and in the large-$x$ region, where nuclear PDFs benefit from the increased precision of the baseline proton PDFs. In the case of the gluon PDF, the reduction of uncertainties is seen for the whole range in $x$. This is a consequence of the extended data coverage in $Q^2$, which allows one to constrain the gluon PDF even further via perturbative evolution. As observed in the case of proton PDFs, the fits obtained upon inclusion of the EIC pseudodata do not significantly differ whether the optimistic or the pessimistic scenarios are considered, except for very small values of $x$. In this case the optimistic scenario leads to a more marked reduction of PDF uncertainties, especially for the total PDF combinations $u^+$ and $d^+$. This feature is mainly driven by the smaller systematic uncertainties that affect the NC pseudodata in the optimistic scenario (about 3%) with respect to the pessimistic one (about 5%), see Table 1. That is aligned with the fact that the statistical uncertainties are comparable between the two scenarios.
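The quoted total NC systematic uncertainties of about 3% and 5% can be cross-checked against the per-component values given earlier (1.5% and 2.5% in the optimistic scenario, 2.3% and 4.3% in the pessimistic one), under the assumption that the uncorrelated and correlated components are combined in quadrature. The text does not state the combination rule, so this Python sketch is only a consistency check.

```python
import math


def total_systematic(sigma_u, sigma_c):
    """Quadrature sum of uncorrelated and correlated relative systematics
    (in percent). The quadrature rule is an assumption."""
    return math.hypot(sigma_u, sigma_c)


optimistic = total_systematic(1.5, 2.5)
pessimistic = total_systematic(2.3, 4.3)
```

Under this assumption the two totals come out close to 2.9% and 4.9%, in line with the approximate figures quoted in the text.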
Implications for neutrino astrophysics -The reduction of PDF uncertainties due to EIC pseudodata, in particular for nuclear PDFs, may have important phenomenological implications, not only at the intensity frontier, e.g., to characterize gluon saturation at small $x$, but also at the energy frontier, e.g., for searches of new physics that require a precise knowledge of PDFs at high $x$, and at the cosmic frontier, e.g., in the detection of highly energetic neutrinos from astrophysical sources. We conclude our paper by focusing on this last aspect. Specifically it was shown in Ref. [32] that the dominant source of uncertainty in the theoretical predictions for the cross section of neutrino-matter interactions is represented by nuclear effects. The corresponding NC and CC inclusive DIS cross sections may differ significantly depending on whether they are computed for neutrino-nucleon or neutrino-nucleus interactions. The uncertainty is larger in the latter case, because nuclear PDFs are not as precise as proton PDFs, and is such that it encompasses the difference in central values. We revisit this statement in light of the precise nNNPDF2.0+EIC fits.
In Fig. 4 we show the CC (left) and NC (right) neutrino-nucleus inclusive DIS cross sections, with their one-sigma PDF uncertainties, as a function of the neutrino energy $E_\nu$. Moreover, in Fig. 5 we show the transmission coefficient $T$ for muonic neutrinos, defined as the ratio of the flux arriving at the detector volume, $\Phi$, to the incoming neutrino flux, $\Phi_0$ (see Eq. (3.1) and the ensuing discussion in Ref. [32] for details); $T$ is displayed for two values of the nadir angle $\theta$ as a function of the neutrino energy $E_\nu$. In both cases, we compare predictions obtained with the calculation presented in Refs. [32,33] and implemented in hedis [34]. For a proton target the prediction is made with the proton PDF set determined in Ref. [35], a variant of the NNPDF3.1 PDF set in which small-$x$ resummation effects [36] and additional constraints from D-meson production measurements in proton-proton collisions at 5, 7 and 13 TeV [37][38][39] have been included. This prediction is labeled HEDIS-BGR in Figs. 4-5. For a nuclear target ($A = 31$ is adopted as in Ref. [32]), the prediction is made alternatively with the nNNPDF2.0 and the nNNPDF2.0+EIC (optimistic) PDFs. The corresponding predictions are labeled HEDIS-nBGR [nNNPDF2.0] and HEDIS-nBGR [nNNPDF2.0 (EIC)] in Figs. 4-5. Predictions are all normalized to the central value of the proton result. In comparison to nNNPDF2.0, the effect of the EIC pseudodata is seen to reduce the uncertainty of the prediction for a nuclear target by roughly a factor of two for $E_\nu \simeq 10^6$ GeV. The reduced uncertainty no longer encompasses the difference between predictions obtained on a proton or on a nuclear target, except in the case of an attenuation rate computed with a large nadir angle. Furthermore, this reduction extends to much larger neutrino energy ($E_\nu \gtrsim 10^7$ GeV), beyond the EIC-sensitive $x$-region of the PDFs.
We believe this to be partly due to DGLAP evolution and sum rules that smoothen the low-x PDF behavior, but also potentially a consequence of the factorisation approximation used to account for nuclear corrections in the ultra high-energy cross-sections highlighted by Eqs. (5.2, 5.3) in Ref. [32].
Figure 4. [...] Predictions correspond to the computation of [32] with the proton PDF of [35], and with the nNNPDF2.0 and nNNPDF2.0+EIC nuclear PDFs. They are all normalized to the central value of the proton results. See text for details.

Figure 5. The transmission coefficient $T$ for muonic neutrinos as a function of the neutrino energy $E_\nu$ and for two values of the nadir angle $\theta$. Predictions correspond to the computation of [32] with the proton PDF of [35], and with the nNNPDF2.0 and nNNPDF2.0+EIC nuclear PDFs. They are all normalized to the central value of the proton results. See text for details.

Summary -In this paper we have quantified the impact that unpolarized lepton-proton and lepton-nucleus inclusive DIS cross section measurements at the future EIC will have on the unpolarized proton and nuclear PDFs. In particular, we have extended the NNPDF3.1 and nNNPDF2.0 global analyses by including suitable NC and CC DIS pseudodata corresponding to a variety of nuclei and center-of-mass energies. Two different scenarios, optimistic and pessimistic, have been considered for the projected systematic uncertainties of the pseudodata. We have found that the EIC could reduce the uncertainty of the light quark PDFs of the proton at large $x$, and, more significantly, the quark and gluon PDF uncertainties for nuclei in a wide range of atomic mass $A$ values both at small and large $x$. In general the size of this reduction turns out to be similar for both the optimistic and pessimistic scenarios. We therefore conclude that it may be sufficient to control experimental uncertainties to the level of precision forecast in the latter scenario. Lastly, we have illustrated how theoretical predictions obtained with nuclear PDFs constrained by EIC data will improve the modelling of the interactions of ultra-high energy cosmic neutrinos with matter. In particular we have demonstrated that nuclear PDF uncertainties may no longer encompass the difference between predictions obtained on a proton and on a nuclear target.
This fact highlights the increasing importance of carefully accounting for nuclear PDF effects in high-energy neutrino astrophysics.
Further phenomenological implications could be investigated in the future, for instance whether a simultaneous determination of proton and nuclear PDFs can improve the constraints provided by the EIC data in comparison to the self-consistent strategy adopted in this paper, or the extent to which semi-inclusive DIS (SIDIS) data can further improve both proton and nuclear PDF determinations. The PDF sets discussed in this work are available in the LHAPDF format [23] from the NNPDF website: http://nnpdf.mi.infn.it/for-users/nnnpdf2-0eic/