Impact of inclusive hadron production data on nuclear gluon PDFs

A precise knowledge of nuclear parton distribution functions (nPDFs) is -- among other things -- important for the unambiguous interpretation of hard process data taken in pA and AA collisions at the Relativistic Heavy Ion Collider (RHIC) and the Large Hadron Collider (LHC). The available fixed target data for deep inelastic scattering (DIS) and Drell-Yan (DY) lepton pair production mainly constrain the light quark distributions. It is hence crucial to include more and more collider data in global analyses of nPDFs in order to better pin down the different parton flavors, in particular the gluon distribution at small x. To help constrain the nuclear gluon PDF, we extend the nCTEQ15 analysis by including single inclusive hadron (SIH) production data from RHIC (PHENIX and STAR) and LHC (ALICE). In addition to the DIS, DY and SIH data sets, we will also include LHC W/Z production data. As the SIH calculation is dependent on hadronic fragmentation functions (FFs), we use a variety of FFs available in the literature to properly estimate this source of uncertainty. We study the impact of these data on the PDFs, and compare with both the nCTEQ15 and nCTEQ15WZ sets. The calculations are performed using a new implementation of the nCTEQ code (nCTEQ++) including a modified version of INCNLO which allows faster calculations using pre-computed grids. The extension of the nCTEQ15 analysis to include the SIH data represents an important step toward the next generation of PDFs.

Parton distribution functions (PDFs) are fundamental quantities required to calculate predictions for any process involving hadrons in the initial state. The QCD parton model has been used successfully to make predictions for a variety of experiments at SLAC, HERA, TeVatron, RHIC and LHC. This theoretical framework will also be essential for both the physics program of the EIC, and proposed future experiments such as the FCC. While precise constraints have been imposed on the proton PDFs, for the case of nuclear PDFs (nPDFs), there is still much room for improvement of the uncertainties .
The gluon PDFs are particularly problematic because the cross sections for the deep inelastic scattering (DIS) and the Drell-Yan (DY) processes, which represent the bulk of the precision data in nPDF fits like nCTEQ15 [3], are not directly sensitive to the gluon PDF at leading order.
While many different microscopic models for nuclear effects on PDFs exist, no unambiguous picture has yet emerged for either the shadowing region [23][24][25][26], antishadowing region [26][27][28], or the EMC effect [26,[29][30][31][32][33]. A particularly promising unified approach is provided by the Color Glass Condensate [34,35]. On the other hand, unbiased fits to the experimental data provide important global constraints on these theoretical ideas and are an indispensable ingredient for many current and future experimental (i.e. at LHC, but also RHIC and EIC) and theoretical analyses (e.g., for the very successful Statistical Hadronization Model describing the freeze-out of the QGP [36]). This is the approach we take in the following. Note that there are currently ongoing studies at the LHC of medium, i.e. final state effects also in small systems created in pA and even pp collisions [37,38]. In our analysis below, we will demonstrate that our results are largely independent of the final state hadron fragmentation and thus that our interpretation of the nuclear effects as modifications of a cold initial state is currently totally consistent with the available experimental data.

A. The gluon PDF
Single Inclusive Hadron (SIH) production data has the potential to yield new constraints on the gluon PDF because the gluon contributes a significant part to the overall cross section of this process. The importance of the gluon contribution can be seen in Fig. 1, which shows the fractional contribution to the process p+Pb → π 0 +X as a function of the transverse momentum p T for the various subprocesses initiated by gluons, up, down, and strange partons inside a lead nucleus. In particular, the red shaded area shows the fraction where a parton from the proton interacts with a gluon from the lead nucleus to produce a neutral pion. The gluon contribution dominates in the low to mid p T region at a center of  Figure 2 shows the relative contributions to the cross section of p+Pb → π 0 +X of each parton's fragmentation function (FF). For instance, the red area shows the contribution from processes where the initial scattering event produces a gluon which then fragments into a neutral pion. These contributions are very similar to those of the PDF flavors ( Fig. 1), but with slightly larger contributions from the antiquarks. Both figures are computed with nCTEQ15WZ PDFs [39] and DSS FFs [40], but there are no qualitative differences when other nPDFs or FFs are used.
In this investigation we will study single inclusive hadron production in proton-lead and deuterium-gold collisions. The focus will be to incorporate this process into the global analysis, including the dependence of the fragmentation function, and to determine the resulting impact on the nuclear gluon PDF. The remainder of this section provides an overview of the nCTEQ framework and the available data sets. In Sec. II we investigate the fragmentation function dependence, along with other theory considerations like the scale dependence. In Sec. III we present the fits obtained using the SIH data, and compare with the theoretical predictions. The main conclusions are summarized in Sec. IV.

B. The nCTEQ++ framework
The nCTEQ project expands upon the foundation of the proton PDF global fitting analysis by including the nuclear dimension. In early proton PDF analyses (e.g., Ref. [41]), the nuclear data was used to calculate correction factors which were then applied to the proton PDF fit without any uncertainties. In contrast, the nCTEQ framework enables full communication between nuclear and proton data, which means that observed tensions between data sets can be investigated through the lens of nuclear corrections.
The details of the nCTEQ15 nPDFs are presented in Ref. [3]. The current analysis, along with the other recent nCTEQ analyses, such as nCTEQ15WZ [39] and nCTEQ15HIX [42], are performed with a new C++ based code nCTEQ++. This allows us to easily interface external programs such as HOPPET [43], APPLgrid [44], and INCNLO [45]. In particular, we work at leading twist and next-to-leading order (NLO) of QCD for both the PDF and FF evolution equations as well as the hard scattering coefficients.
For the fits in this investigation, we use the same 19 parameters as for the nCTEQ15WZ set. These 19 parameters include the 16 free parameters of the nCTEQ15 analysis, with an additional 3 open parameters for the strange distribution.
Recall that for the nCTEQ15 set, the strange PDF was constrained by the relation s =s = (κ/2)(ū+d) at the initial scale Q 0 = 1.3 GeV so that it had the same form as the other sea quarks.
Our PDFs are parameterized at the initial scale Q 0 = 1.3 GeV as (1) and the nuclear A dependence is encoded in the coefficients as , a g 1 , a g 4 , a g 5 , b g 0 , b g 1 , b g 4 , b g 5 , a s+s 0 , a s+s 1 , a s+s 2 }.
To obtain the cross section for single inclusive hadron production, the PDFs of the two initial state particles are convoluted with the cross section of the partonic subprocess and the final state fragmentation function: where h is the produced light hadron and a sum over all possible subprocesses ab → c + X is understood. The twist-2 factorization formula has an error which is suppressed by a power of the ratio Λ/Q where Λ is a hadronic scale and Q the hard scale of the process (for example the p T of the light hadron). This factorization formula is the result of a rigorous factorization theorem (see [46,47] and references therein) originally devised for pp collisions. It is supposed to hold true also for pA collisions; however, the error (higher twist terms) is possibly enhanced by the nuclear A and one has to assess phenomenologically which minimum value for the hard scale is necessary for the twist-2 factorization formula to be a good approximation. 1 A detailed overview is given in Ref. [50]. Performing all convolutions for each data point in each iteration of a fit is however too computationally expensive. A solution is to perform the convolution of the proton PDF (or deuteron PDF in case of RHIC data) and the pion FF ahead of time and store the results to a grid; thus, the cross section evaluation can be reduced to a single convolution during the fitting process. In order to perform the corresponding calculations and produce such grids we have modified the INCNLO [45] program. The obtained grids have been validated to reproduce the full calculation within a margin significantly smaller than the data uncertainty.

C. SIH data sets
In this analysis we include the same deep inelastic scattering (DIS) and Drell-Yan (DY) lepton pair production data as in the nCTEQ15 analysis. The original nCTEQ15 analysis used also the RHIC pion data allowing to provide constraints on the gluon PDF. We now extend this analysis to include additional hadrons from the RHIC data, as well as new data from recent ALICE analyses. We will study four types of hadrons: neutral pions, charged pions, charged kaons and eta mesons. The charged mesons always appear as the average of their positively and negatively charged version, which is also how the neutral pions are calculated for their fragmentation functions. The data is taken at center of mass energies per nucleon of 200 GeV (PHENIX, STAR), 5020 GeV and 8160 GeV (ALICE). Table I gives an overview of the available data sets, while Fig. 3 shows the p T distribution of the available data points for each set.
As in other types of experiments, kinematic cuts are applied to remove data that cannot be adequately described by the theory. For example, in the very low p T region, the SIH process becomes non-perturbative, so we will impose a lower p T cut on the data. Additional restrictions may come from the FFs D h i (z, Q 2 ) required to compute the cross sections. The available sets of FFs are typically only reliable for momentum fractions z min < z < 1 with some minimal z min ∼ 0.1. This issue will be further discussed in Sec. III A.
All the single inclusive hadron production data is given in terms of ratios All fits are also repeated including the W ± and Z production data from our recent nCTEQ15WZ [3] analysis because this data has a significant impact on the gluon PDF. Since the impact of SIH production on the high-x region is negligible, we do not include a further comparison with the nCTEQ15HIX [42] analysis.
Since all SIH data sets have considerable normalization uncertainties, we need to account for those to avoid data sets pulling the fit in unphysical directions due to their normalization errors. This is done by including a normalization factor for each set as a parameter in the fit. By adding a χ 2 penalty that increases as normalizations stray further from unity we ensure that they are kept at reasonable values. This is done according to the prescription outlined in the Appendix A to avoid the D'Agostini bias [58].  π0, π ± , K ± AKK [63] 2008 π0, π ± , K ± NNFF [64] 2017 π0, π ± , K ± JAM20 [65] 2021 π0, π ± , K ± DSS14 [40] 2014 π0, π ± DSS17 [66] 2017 K ± AESSS [67] 2011 η

II. FRAGMENTATION FUNCTIONS
In this analysis we investigate a total of ten different fragmentation functions (FFs), as listed in Table II. We will give a brief overview of their properties, and then compare them both in terms of predictions for protonproton and proton-nucleus collisions.

A. Available fragmentation functions
In a manner complementary to the PDFs, the FFs describe the hadronization of a partonic constituent into a final-state hadron. Both the PDFs and FFs are nonperturbative objects, and hence must be obtained by fitting to data. The pioneering fits of FFs used only single-inclusive hadron production in electron-positron annihilation. In more recent analyses, groups have added data from semi-inclusive deep inelastic lepton-nucleon scattering and other processes to improve the accuracy and kinematic range of their fits [68,69].
A selection of FFs for various mesons 2 is shown in Table II. The fragmentation function sets listed in bold provide uncertainties. We will henceforth denote the combination of DSS14 for pions and DSS17 for kaons simply as "DSS." Note that HKNS16 [70] exists as an updated version of HKNS07, but no code is available to use these updated fragmentation functions.
Additional fragmentation functions exist for other final states like protons, antiprotons and unidentified charged hadrons (SGK18 [71], NNFF1.1h [72]). Some 6 of the aforementioned FFs (AKK, BKK, HKNS, KKP, KRETZER) also include those, but we exclude those from the analysis due to the comparatively large uncertainties both on the data and the fragmentation functions. There have also been studies on the effect of the nuclear medium on the fragmentation [73,74], but we exclude the fragmentation functions obtained there from our analysis in order to avoid double counting of the shared data points. Also, any possible medium modifications of the FFs are small compared to the uncertainties of the FFs.

B. Comparison in proton-proton collisions
Before we examine the dAu and pPb cases, let us first look at the p + p → h + X baseline process to help understand the limitations of the theory prediction due to the uncertainties of the FFs. Figure 4 shows a comparison of predictions from various FF sets with data taken by the PHENIX and ALICE experiments at √ s N N = 200 GeV and 7000 GeV respectively. The nCTEQ15 proton PDF is used in these calculations. The fragmentation functions displayed are BKK, KKP, DSS, NNFF and JAM20. The KRETZER and HKNS FFS are not shown as their more strict kinematic restrictions preclude comparison with the ALICE data. At 200 GeV all fragmentation functions are able to describe the data for p T ≥ 3 GeV with a satisfactory χ 2 /N dof < 1 if one allows for a normalization shift. Below p T of 3 GeV, all the curves display a significant upward slope in Fig. 4 which points to a qualitative disagreement. There is also a slight upwards slope towards higher p T for all fragmentation functions, but it is well within the data uncertainties, considering the allowed normalization shift.
At ALICE energies, the data can be well described by BKK, KKP, DSS and NNFF down to p T values of 3 GeV if a normalization is introduced. The JAM20 result is also relatively constant across the p T range, but it begins to decrease slightly for lower p T values in the range of p T ≈ 5 GeV and below. 3 Again, the theory predictions increasingly overshoot the data the further one goes below 3 GeV. Since this effect is not fragmentation function dependent, it is also independent of the produced final state. 3 In principle, for the computation of Fig. 4 the FFs should be combined with their matching PDFs, i.e., JAM20 FFs with JAM20 PDFs, and DSS FFs with MSTW2008 [77] PDFs. Since our focus is the impact on the nuclear PDFs however, we use our proton baseline instead.

C. Scale uncertainties
The prediction for the SIH production cross section depends on three scale choices: the initial state factorization scale µ i , the final state factorization scale µ f and the renormalization scale µ r . Frequently, they are taken to be µ i = µ f = µ r = c p T , where c is a constant that is commonly chosen as either 1 /2 or 1, but there is no unambiguous prescription for their choice. Figure 5 shows the prediction for pion production at 200 GeV and 7 TeV with each scale varied independently between the two common choices, c = { 1 /2, 1}. The case where all scales are equal to 1 /2 p T (bold, grey) gives the best description of the 200 GeV data; additionally, this is also the only scale choice that yields χ 2 /N dof < 1 for the 7 TeV data with p T > 3 GeV, if the normalization is chosen freely. Therefore, in the following comparisons we make the choice c = 1 /2 going forward. This means that we need to freeze the initial state factorization scale (µ i ) to the initial scale of our PDF evolution (Q 0 = 1.3 GeV) whenever c p T < Q 0 , i.e., for p T ≤ 2.6 GeV: Otherwise we would have to interpolate the PDFs to scales below the initial scale which is technically challenging.
Note that the 200 GeV and 7 TeV data sets shown in Fig. 5 are actually included in the fit of the DSS FFs, where they impose a cut of p T ≥ 5 GeV in their analysis which uses a scale choice of µ i = µ f = µ r = p T (blue curve).

D. Comparison in dAu and pPb collisions
We now examine the impact of the different FFs on the nuclear ratios R pPb and R dAu for pion, kaon, and eta production. Figure 6 compares all the data sets with predictions using the nCTEQ15WZ nuclear PDFs for each set of FFs. Data taken at p T < 1ĠeV is not shown as the twist-2 formula for the theory is certainly not valid in that region. We also display the uncertainty band for the DSS FFs to gauge the spread of the various FFs as this represents a typical FF uncertainty.
We observe that the predictions with BKK, KKP and DSS show very close agreement. The most notable difference between BKK and KKP is seen in the charged kaon production, where KKP lies a bit lower for high p T values. NNFF and JAM20 also agree very well with each other across all data sets, and the only instance where they lie outside of the uncertainty given by DSS is for the high-p T ALICE pion data. Since the data uncertainties in this region is quite large, this should not have any significant impact on our fits. In the kinematic region where AKK allows predictions, they also agree with the previous FFs. The KRETZER FFs show some Theory / Data FIG. 4. Comparison of predictions made with different fragmentation functions for p + p → π 0 + X. The calculations are done using nCTEQ15 proton PDFs. Both panels show data for neutral pions, with PHENIX data [75] in the upper and ALICE data [76] in the lower one.
qualitative differences in the region just above the cut, but lie within the uncertainty of DSS. For HKNS the disagreement is slightly larger but still well below the data uncertainties. The predictions made with AESSS agree well with the eta meson production data, but since AESSS is the only available fragmentation function for eta mesons no comparisons can be made.
We calculate fragmentation function uncertainties from the DSS FFs (see below) for each data point and add these as a systematic uncertainty in our fit. Although these uncertainties also depend on the PDF, this dependence is very weak and can be neglected. Note also that the predictions are already quite close to the data values. While this suggests that the data will not significantly change the central value of the PDFs, the data may well reduce their uncertainties.

E. Uncertainties of fragmentation functions
We now consider the FF uncertainties in further detail. Four of the available FF sets include uncertainties; HKNS and DSS provide their uncertainties in terms of Hessian eigenvectors, while JAM20 and NNFF provide Monte Carlo replicas. We show the uncertainties for NNFF and JAM20 in Figs. 16 and 17 of Appendix B. The HKNS FFs yield uncertainties that are larger than the data uncertainty for p T values below 10 GeV; hence, they will not help constrain the PDFs in this kinematic region. The NNFF fragmentation functions yield slightly larger uncertainties than those of DSS shown in Fig. 6. This may be due, in part, to the use of a parameterization-free neural network instead of a "traditional" parameterization, and a slightly   smaller data set.
Lastly, the uncertainties of the JAM20 fragmentation are so small across the kinematic region with p T >1 GeV that they can be neglected when compared with the data uncertainty.
It is important to note that the displayed bands do not reflect the full uncertainty of the theory prediction, but rather represent a lower bound for the following reasons: Firstly, the theory predictions for low p T points may depend on fragmentation functions extrapolated beyond their fitted kinematic region and the accuracy of the Hessian method outside of the region where data exists is heavily dependent on the specific parameterization of the FF. Even more important are low p T corrections. As we move to lower values of p T , perturbation theory begins to break down as contributions from non-perturbative sources increase. This can also make it difficult to disentangle initial from final state effects in hadron production processes. Medium suppression due to energy loss can be observed not only in AA, but even in pA and pp collisions [78][79][80]. These higher twist effects, however, are suppressed by powers of the hard scale p T . Thus, for our predictions in the lower p T range, we may reach the transition region between stable perturbative predictions and unreliable non-perturbative predictions. These factors are the reason why we need to impose cuts on low p T values to ensure reliable predictions.

F. Fragmentation kinematics
Finally, it is interesting to investigate the correspondence of the p T value of the data to the  z region of the fragmentation function. Figure 7 shows the contribution of different z regions to the total p+Pb → π 0 +X cross section calculated using nCTEQ15WZ PDFs and DSS fragmentation functions at 200 GeV and 5 TeV. We can see that the z < 0.2 region does not have a substantial contribution to the cross section at 200 GeV, and even the region z < 0.4 hardly contributes above p T > 3.0 GeV.
In the lower panel we see that the higher energy (5 TeV) shifts the z bands towards lower z values, with the z < 0.2 region still contributing a non-negligible amount even at p T ≈ 10 GeV. The z < 0.1 region starts contributing below p T = 4 GeV, but stays below 10%. Since most fragmentation functions include data at least down to z = 0.05 in their fits, this eliminates concerns about FF extrapolation having any significant impact on our results.

A. Data selection
Before performing the fits, we need to decide which data sets to include, and which kinematic cuts to impose. Firstly, we choose not to include the eta meson data in the current analysis as we only have a single FF without known uncertainties; but, we will examine this data set in Sec. III G.
To make sure that we can accurately describe the proton baseline as presented in Fig. 4, we cut all data with p T < 3 GeV. This is a more restrictive cut than in nCTEQ15(WZ) and EPPS16 [4] which both used RHIC neutral pion data with p T values down to 1.7 GeV. Our p T cut is also sufficient to ensure that even at the highest √ s N N of the ALICE data, the fragmentation functions are used only within their well constrained region. This cut leaves us with 77 (out of 174) ALICE and 32 (out of 77) RHIC data points.
To account for the fragmentation function uncertainties, we take our error estimate from the DSS eigenvectors and add them in quadrature with the systematic uncertainties of the data.

B. Main PDF fits
We now use the single inclusive hadron production data to extend both the nCTEQ15 and nCTEQ15WZ fits. For comparison, we will produce two baseline fits. We produce one baseline with the BKK fragmentation functions as this was the set used in the previous nCTEQ15 and nCTEQ15WZ analyses. We produce also a second baseline with the DSS fragmentation functions as these come from a more recent analysis and include uncertainties. We will then compare these fits with other available fragmentation functions in Sec. III D.
A short summary of the properties of the main fits are given in the following.
• Cuts are applied below p T = 3 GeV for all data sets.
• Eta mesons are not included in the current fits; we examine this data later in Sec. III G.
• PHENIX charged hadrons are excluded by our p T cut.
• Normalizations of all SIH data sets are fitted according to the prescription given in the Appendix A.  • Fits are performed first with data uncertainties alone, and again with uncertainties from the DSS fragmentation functions added as a systematic uncertainty to the data.
• Except for those items specified above, all other inputs to the fit are kept equal to the baseline fit.
The resulting fits are shown in Figs. 8 and 9 for the nCTEQ15 and nCTEQ15WZ baseline, respectively. The plots show the baseline fit in black, the fit with regular data uncertainties in red, and the fit with DSS uncertainties added to the data in green. We focus only on the lead PDF since the new data is taken on lead and gold, which is similarly heavy.
Examining Fig. 8, the most obvious change between the baseline (black) and the new fits (red, green) is the change in the gluon which is enhanced at x < 0.05 and suppressed at 0.05 < x < 0.3. The central values of the other flavors also exhibit some slight changes as they are of course coupled to the gluons via the DGLAP evolution. The uncertainties for up, down and strange flavors are larger in the new fits than in the baseline due to the newly opened strange parameters. This is not unexpected and was also seen in the recent nCTEQ15WZ analysis where the same strange parameters were opened up. The inclusion of the DSS uncertainty does not cause any significant change in the central value but does result in an increased PDF error band which is most noticeable at small x, especially for the strange PDF. Somewhat surprisingly, the region x ∼ 0.1 sees a slight decrease in uncertainties, likely caused by slight shifts in the Hessian basis' eigenvector directions.
In Fig. 9, the same fits are shown with W/Z data included. Here, we see that the new fits for nCTEQ15WZ+SIH are generally (with the exception of the strange PDFs) more similar to the baseline fit than was the case for the nCTEQ15SIH fits shown in Fig. 8. For the gluon, we see a similar behavior as in Fig. 8, but slightly less pronounced due to the additional constraints from the W/Z data. A somewhat surprising feature of this fit is the enhancement of the strange quark at low x. As Fig. 1 shows no particular strange sensitivity of the SIH data, presumably this enhanced strange PDF is being driven, in part, by the influence of the W/Z data. While the resulting strange PDF in the new fits is substantially larger than the baseline at low x values, it is important to recall that the LHC heavy ion data primarily constrains the region x > ∼ 0.01. Including the DSS uncertainty in this case causes no visible difference in central values, but yields slightly larger uncertainty bands on the gluon. The shifted eigenvector basis results in slightly decreased strange quark uncertainties in the low-x region.

C. Quality of the fits
To judge the quality of the fits, we first take a detailed look at the resulting χ 2 values. Figure 10 shows the χ 2 /N dof for each of the fitted data sets of the two main fits, nCTEQ15SIH and nCTEQ15WZ+SIH. We see that the DIS and DY data sets are still well described by the new PDF, and generally satisfy χ 2 /N dof < 1, with one exception. 4 The W/Z data also remains well described when including the SIH data, with the exception of data set 6215 (ATLAS Run I, Z production); this behavior was also observed in the nCTEQ15WZ analysis.
More quantitative insights regarding the fit results can be obtained from Table IV, which shows a breakdown of the χ 2 /N dof by experiment type and by data set. Note that there is a small difference for the STAR and PHENIX pion results reported here (with p T > 3 GeV) and in the nCTEQ15WZ analysis which used a p T > 1.7 GeV cut.
Beginning with the nCTEQ15 fit, we see that the DIS and DY data are well described. In contrast, the W/Z and SIH data (which were not fitted) yield large χ 2 /N dof values. Adding SIH data to the fit (nCTEQ15SIH) significantly improves the SIH data from χ 2 /N dof = 1.23 to 0.38, as well as the W/Z data from χ 2 /N dof = 3.74 to 2.32. There is also a slight improvement in the DY data, with a marginal increase in the DIS χ 2 .
In a similar manner, the nCTEQ15WZ fit yields good χ 2 values for the DIS, DY and W/Z data, but the fit to the SIH data is not optimal. Including the SIH data in the fit we find an improvement from χ 2 /N dof = 0.81 to 0.41. This results in marginal shifts for the DIS and DY data, but does increase the W/Z data from χ 2 /N dof = 0.90 to 1.02. However, the total χ 2 /N dof for the combined fit nCTEQ15WZ+SIH is χ 2 /N dof = 0.85 as compared to the nCTEQ15WZ with 0.90.
Adding either inclusive hadron data or colorless weak boson production data also improves the fit for the other data set when compared to the nCTEQ15 baseline. This supports the consistency of our interpretation of the nuclear modification in the inclusive hadron data as cold nuclear effects. A less ambiguous signal for medium effects would therefore certainly be correlations of several particles as observed e.g., in the production of two hadrons ( [82,83]). Table III shows the fitted normalizations of the SIH data sets. All the resulting normalization parameters are consistent with unity within the normalization uncertainty. Therefore, no significant normalization penalties are applied.   4001  4002  4003  4004  4101  4102  4201  5101  5102  5103  5104  5105  5106  5107  5108  5111  5112  5113  5114  5115  5116  5119  5120  5121  5122  5123  5124  5125  5126  5127  5129  5131  5132  5135  5136  5137  5138  5139  5140  5141  5143  5156  5157  5158  5159  5160  5201  5202  5203  5204  5205  5206  6211  6213  6215  6231  6232  6233  6234  6235  6251  6253 Table I, and the other processes are listed in Ref. [39].

D. Comparisons with other FFs
To investigate the influence of the choice of fragmentation function on the quality of our fit to the SIH data, in Table V we compute the χ 2 /N dof for the collection of fragmentation functions listed in Table II using the parameters from our nCTEQ15WZ+SIH fit. Additionally, we show the result using DSS both with and without the added systematics arising from the fragmentation function uncertainties. For these two DSS results, it is clear that including the additional uncertainties yields a lower χ 2 value. The results shown with the other fragmentation functions are computed using the modified data including fragmentation function uncertainties.
In Table V we find the results from KKP and BKK are quite comparable to the DSS result (with modified data), and the NNFF is just slightly higher. Nevertheless, all these results are below the DSS result with unmodified data, and this suggests that the inclusion of the extra uncertainties taken from the DSS error bands provides our fit with a reasonable estimate of the impact of the fragmentation function choice. The JAM20 fragmentation functions yield a higher χ 2 /N dof than the others, and this reflects the observations noted in Sec. II B and Fig. 4 which displayed the comparisons with IV. We present the χ 2 /N dof for the individual SIH data sets, the individual processes DIS, DY, SIH, WZ, and the total. The shown χ 2 is the sum of regular χ 2 and normalization penalty. Excluded processes are shown in parentheses. Note that both nCTEQ15 AND nCTEQ15WZ included the neutral pions from STAR and PHENIX. χ 2 /N dof for selected experiments and processes STAR PHENIX ALICE DIS DY WZ SIH Total π 0 π ± π 0 5 TeV π 0 5 TeV π ± 5 TeV K ± 8 TeV π 0 nCTEQ15 0. 13  the p+p → π 0 +X data.

E. Comparison of data and theory
We now present a detailed comparison between our new fits and the SIH data in Fig. 11 which displays the nuclear ratios R AA as a function of p T . Although our fits imposed a 3 GeV p T cut on the data, we extrapolate to lower p T values in the shaded regions of the figure.
Examining the nCTEQ15 PDF curves in Fig. 11, we notice these have a significant positive slope for most of the data sets as compared to the other fits. This observation suggests that as we add more data sets, the final predictions exhibit a reduced slope, and if we focus on the fitted region with p T > 3 GeV, the curves approach unity within approximately 10%.
Comparing the nCTEQ15SIH fit and its baseline (nCTEQ15), we see a considerable shift in the R AA nuclear ratio when the SIH data is included.
In contrast, the nCTEQ15WZ+SIH fit and its baseline (nCTEQ15WZ) show only subtle differences, aside from the normalization, which is not surprising given that Table IV indicated that the W/Z data pulls in the same direction as the SIH data. Finally, the nCTEQ15SIH and nCTEQ15WZ+SIH fits are quite comparable, certainly given the uncertainty of the data.

F. Correlation between data and PDFs
By looking at the PDFs alone we cannot judge the impact of each individual new data set on the fit. Therefore we make use of two further methods to study how each data set impacts the gluon specifically. The first quantity we want to analyze is the cosine of the correlation angle between two observables X and Y , as used in Refs. [12,84]: where the index of each sum runs over the 19 eigenvector directions. Another useful quantity is the effective ∆χ 2 eff as introduced in Ref. [3]. In contrast to the cosine of the correlation angle, ∆χ 2 eff is more sensitive to the number of data points and error size of the experiments because the normalization  does not cancel these factors out. For an experiment E j and an observable X, it is defined as: To investigate the impact of individual experimental data sets E j on the gluon PDF g(x, Q), we look at the cosine of the correlation angle cos(φ[g(x, Q), χ 2 (E j )]) and the effective χ 2 difference ∆χ 2 eff [g(x, Q), E j ]. Since neither of these quantities display a strong Q dependence, we show them only for the value of Q=10 GeV in Figs. 12 and 13. We also limit ourselves to the gluon in lead, as the focus of the SIH data is on the heavy elements; the results for gold are similar to lead.  12.
In Fig. 12 we see how the 5 TeV SIH ALICE data sets (π 0 , π ± , K ± ) display a strong anti-correlation (cos φ ∼ −0.9) with the low x gluon (x ∼ 10 −3 ) that is not seen in any of the remaining data, including the 8 TeV ALICE neutral pions. This observation suggests that the 5 TeV SIH ALICE data has significant impact on the resulting gluon in the small x region. Interestingly, the correlation angle of the STAR and PHENIX neutral pion data are quite similar to each other, and in the region x ∼ 5×10 −2 they also exhibit a strong anti-correlation (cos φ ∼ −0.9), which then becomes strong and positive (cos φ ∼ +0.8) for larger x. The 8 TeV ALICE neutral pion data show a correlation behavior similar to the NMC96 SnC data set (which is the dominant DIS set due to its large size and Q coverage), and somewhat opposite to the STAR and PHENIX neutral pion data. Examining the larger x region (x > 0.1), the influence of the various data sets is more mixed with with the STAR and PHENIX π 0 data yielding a large positive correlation and ATLAS 8 TeV π 0 and STAR π ± yielding a large negative correlation, with the result that the high x gluon remains mostly unchanged in Fig. 9.  Turning to the χ 2 eff in Fig. 13, we can see that the CMS Run II W ± and NMC96 SnC data remain the main forces determining the gluon, with the ALICE neutral pion and NMC95re CaD data sets also providing constraints.
Among the SIH data sets, the 8 TeV neutral pion data has the largest χ 2 eff , followed by the 5 TeV neutral pion data. However, they generally do not reach values as high as the previously mentioned DIS and WZ production data. It is unfortunate that we must impose the p T > 3 GeV cut on the SIH data due to limitations of our perturbative theoretical calculations; this removes a large amount of precision SIH data from our analysis. Improved theoretical techniques such as resummation may allow us to extend our analysis to smaller p T values in the future so that a larger amount of the SIH data can be included in the PDF determination.

G. Impact of the eta meson
We now investigate the impact of including the eta meson data. We will compute our fit with the DSS  fragmentation functions for the pions and kaons, and use the AESSS fragmentation functions for the eta mesons. The AESSS FFs do not provide any uncertainties, so we will not include any for the eta meson data. Using the same p T ≥ 3 cut as before for the pions and kaons, the eta meson data now provides an additional 18 data points from RHIC and 19 from ALICE. The fits including the eta meson data are shown in Figs. 14 and 15, and are compared with the baseline fit (nCTEQ15WZ) in black, the corresponding main fit (nCTEQ15SIH) in green, and the fit with eta in red (nCTEQ15SIH+eta).
Examining the results of Figs. 14, the impact of the eta meson data yields a slight upwards shift of the gluon in the low x region, and a downward modification of the strange quark both at small x and larger x ∼ 0.2. The uncertainty of the gluon shrinks by a small amount, the strange uncertainty is reduced in the low x region while it increases at medium-x, and the other flavors show a slight increase at very small x.
Examining the results of the second case based on the nCTEQ15WZ fit as shown in Figs. 15, we observe that the central values of the nCTEQ15WZ+SIH fit with and without the eta meson data are virtually identical. Regarding the PDF uncertainties, the error bands of gluon and down-quark are reduced only by a negligible margin, while the strange quark uncertainty grows very slightly. Since the uncertainty of the eta fragmentation function is expected to be larger than that of pions and kaons, the net effect of including the eta meson data into the fit would most likely be inconsequential if these additional uncertainties were included in the analysis.

IV. CONCLUSION
Using the nCTEQ++ framework, we incorporated new data on single inclusive hadron (SIH) production from ALICE and RHIC into our PDF analysis. We investigated the choice of scales and fragmentation functions (including their uncertainties), and identified a p T region where reliable perturbative predictions can be applied to help constrain the PDFs.
We obtained a good χ 2 /N dof for all the data sets, and found that the new SIH meson data had a noticeable impact on the gluon PDF at low to medium x. Compared to the nCTEQ15WZ PDF set, the gluon flattens out in the region around x = 0.05, and the uncertainties in this region shrink.
The necessary p T > 3 GeV cut limits our ability to constrain the PDFs in the low x region. If it were possible to expand the theoretical predictions to lower p T values with improved calculation techniques, then we could use the very precise ALICE data in this region to further improve the determination of the gluon PDF.
Nevertheless, even with the current limitations on the kinematics, the SIH data provide useful constraints on the nuclear gluon distribution which is still one of the least constrained nPDFs. As such we believe the presented analysis and the obtained PDFs provide an important step on the way to more precise knowledge of the nuclear structure. Also on the practical level the reduced gluon uncertainties are important for many applications. The PDFs of the nCTEQ15WZ+SIH fit for a selection of nuclei will be provided through the LHAPDF website, and those of other fits can be obtained upon request. We use the χ 2 prescription given in Ref. [58] to fit the normalizations of the SIH and W/Z production data. For a data set D with N data points and S correlated systematic errors, the χ 2 of the data set is given by: where σ norm is the normalization uncertainty and T i is the theoretical prediction for point i. The last term of Eq. (A1) is called the normalization penalty, and it vanishes when the fitted normalization is equal to unity. The penalty is scaled by the normalization uncertainty σ norm , which is around 0.03 for W/Z production and ranges from 0.034 to 0.17 for SIH production. The covariance matrix C ij is defined as: where σ i is the total uncorrelated uncertainty (added in quadrature) for data point i, andσ iα is the correlated systematic uncertainty for data point i from source α. We use the analytical formula for the inverse of the correlation matrix found in Ref. [85] to obtain: and

Appendix B: Uncertainties of other FFs
We compare the data with our theoretical predictions with nCTEQ15WZ PDFs using the uncertainties taken from the NNFF and JAM20 fragmentation functions in Figs. 16 and 17, respectively.
The NNFF fragmentation functions yield slightly larger uncertainties than those of DSS shown in Fig. 6. This may be due, in part, to the use of a parameterization-free neural network instead of a "traditional" parameterization, and a slightly smaller data set. The uncertainties of the JAM20 fragmentation are so small across the kinematic region with p T >1 GeV that they can be neglected when compared with the data uncertainty.