Neural Network QCD analysis of charged hadron Fragmentation Functions in the presence of SIDIS data

In this paper, we present a QCD analysis to extract the Fragmentation Functions (FFs) of unidentified light charged hadron entitled as SHK22.h from high-energy lepton-lepton annihilation and lepton-hadron scattering data sets. This analysis includes the data from all available single inclusive electron-positron annihilation (SIA) processes and semi-inclusive deep-inelastic scattering (SIDIS) measurements for the unidentified light charged hadron productions. The SIDIS data which has been measured by the COMPASS experiment could allow the flavor dependence of the FFs to be well constrained. We exploit the analytic derivative of the Neural Network (NN) for fitting of FFs at next-to-leading-order (NLO) accuracy in the perturbative QCD (pQCD). The Monte Carlo method is implied for all sources of experimental uncertainties and the Parton distribution functions (PDFs) as well. Very good agreements are achieved between the SHK22.h FFs set and the most recent QCD fits available in literature, namely JAM20 and NNFF1.1h. In addition, we discuss the impact arising from the inclusion of SIDIS data on the extracted light-charged hadron FFs. The global QCD resulting at NLO for charged hadron FFs provides valuable insights for applications in present and future high-energy measurement of charged hadron final state processes.


I. INTRODUCTION
In perturbative quantum chromodynamic (pQCD), the hard-scattering processes in which a hadron is observed in the final state, include an integral part in which to be called as Fragmentation Functions (FFs) in the theoretical framework. They are process-independent and universal quantity, and they show a non-perturbative transition of a parton into a hadron. FFs depend on the fraction of the longitudinal momentum of the parton taken by the hadron and the scale of energy [1]. FFs have a critical role in the current experimental programs at Jefferson Lab, Future Electron-Ion Collider (EIC) [2,3], Future Circular Collider (FCC) [4,5] and LHC, and this aspect of their role leads to the main motivation for studying the collinear FFs in several phenomenological studies.
Since FFs are non-perturbative quantities, they need to be determined from a QCD analysis of the corresponding experimental data sets. The core experimental data sets are the single-inclusive electron-positron annihilation (SIA) from several collaborations and at different range of center of mass energy from 10.5 GeV up to the M Z [6][7][8][9][10][11][12][13]. In order to disentangle all the different flavors of FFs for quark and anti-quark, in addition to the SIA data sample, one needs to taken into account some other observable. Hence, the determination of FFs in the global QCD analyses also include the data on semi-inclusive deep-inelastic scattering (SIDIS) processes [14][15][16][17] and the single-inclusive hadron production in proton-proton collisions [18][19][20][21][22].
There are two methods to calculate the FFs of unidentified light charged hadron. In the first method, for every flavor, they can be calculated as a sum of the FF sets of all identified light charged hadrons produced in the fragmentation of the given parton. Alternatively, the FFs of unidentified charged hadrons are implemented independently from a QCD analysis included the unidentified charged hadron experimental data directly.
Before discussing our analysis, we will first review the FF sets of unidentified charged hadrons which have been recently calculated. In our recent analysis entitled SGKS20 [27], we implemented the FFs of unidentified charged hadron up to next-to-next-to-leading order (NNLO) by taking advantage of the first method and determined in a simultaneous fit the FF sets for pion, kaon, proton, and the residual light charged hadrons. All the available SIA data for pion, kaon, and proton and unidentified charged hadrons production have been considered in this analysis.
Another recent analyses for light charged hadron has been done by JAM Collaboration [25] up to the next-toleading-order (NLO). Their analysis includes all available SIDIS and SIA data for pion, kaon, and unidentified charged hadrons to calculate the FFs of pion, kaon, and charged hadrons. In addition, they have used data from inclusive DIS and Drell-Yan lepton-pair production to calculate the PDFs simultaneously with the FFs. Accounting for unidentified charged hadrons, they have used the first method and add a fitted residual correction to the sum.
The other analysis of unidentified charged-hadron FFs has been presented by NNPDF Collaboration [30]. They have utilized the second method to calculate the FFs up to the NLO accuracy. The proton-proton data for unidentified charge hadron production has been added by means of Bayesian reweighting to the analysis based only on SIA data sets. They have tried to complement their analysis of Ref. [34] with the measurements of the charged hadron spectra in pp collisions. Their study demonstrated that the inclusion of pp data in a FF fit could provide a stringent constraint on the gluon distribution FF.
Another analysis of unidentified charged hadron, based on the second method, have been done by SGK18 in which the FFs of charged hadrons have been done up to NNLO by including all the unidentified charged hadrons from SIA experimental data sets [29].
The main aim of this paper is to revisit our previous QCD analysis in Ref. [29] to implement a global QCD analysis for FFs of charged hadrons by adding the SIDIS data sets to the data sample, and applying the Neural Network (NN) technique. In this analysis, the hadron FFs are fitted directly from all the experimental data for unidentified light charged hadrons production from SIA and SIDIS processes. Our main goal in this study is the inclusion of COMPASS SIDIS experimental data [14] as the only data set for the charged hadron production from SIDIS process.
In recent years, Machine Learning (ML) has spread through all subjects of particle physics, specially collider physics. One of the encouraging areas of application of such methods is improving our knowledge of non-perturbative quantities of nucleon such as PDFs and FFs [23,28,35]. In the light of this fact, we decided to use such method based on the artificial Neural Networks (NNs) to extract the light charge hadron FFs form QCD analysis of the corresponding data sets. The modern optimization techniques are utilized in this project to minimize the bias of FFs parametrization by taking advantage of the Neural Network and also the Monte Carlo sampling method as a proper statistical treatment of experimental data uncertainties to obtain the probability density distribution from the data. For this purpose, we use the publicly available code called MontBlanc in this analysis which can be obtained from [36].
This code is devoted to the extraction of collinear distributions of fragmentation functions. The code is an open-source package that provides a framework for the determination of the FFs, for many different kinds of analyses in QCD. So far, it has been developed to determine the FFs of the pion from experimental data for SIA and SIDIS data sets [23], and in our most recent study to determine the fragmentation functions of Ξ − /Ξ + [37]. MontBlanc can analyze the SIA data up to NNLO and the SIDIS data up to NLO in perturbation theory. The framework in this code is combination of the Monte Carlo method to map the uncertainty distributions of FFs and Neural Networks to parameterize the FFs.
The structure of the paper is as follows: In Sec. II, we review the theoretical formalism for the inclusive hadron production in electron-positron annihilation and semiinclusive deep-inelastic scattering (SIDIS) process, and the time-like evolution equation. The parametrization of FFs in terms of Neural Network are also discussed in detail in this section. Sec. III includes our fitting methodology. We also illustrate the Monte Carlo methodology adopted in our analysis to calculate the uncertainties of FFs and the optimal fit. This section also summarizes the SIA and SIDIS experimental data sets analyzed in this study, and the possible tensions between the data sets also examined. The main results of SHK22.h are presented in Sec. IV. This section includes the SHK22.h fit quality and the numerical results for the differential cross sections, and detailed comparison of theory predictions with the analyzed experimental data sets. We present the SHK22.h light charged hadron FFs and detailed comparisons with other results available in the literature, namely JAM20 and NNFF1.1h FFs. We also discuss in this section the impact arising from the inclusion of SIDIS data on the extracted light-charged hadron FFs. Finally, we summarize our conclusions in Sec. V and outline possible future developments.

II. THEORETICAL SETUP
In this section, we present the theoretical backgrounds of the standard collinear factorization and discuss the perturbative and non-perturbative parts of the crosssections measurements in the SIA and multiplicities in SIDIS processes. Then, we present the time-like evolution equations and the splitting functions used for the FFs. Finally, we discuss the parametrization of the FFs in terms of the Neural Networks in the presence of the SIDIS data. The Neural Network architecture for all the fitted FFs and the Monte Carlo uncertainty propagation also will be discussed in this section.

A. SIA and SIDIS factorization
In the standard collinear factorization, we separate the QCD cross sections into the perturbative partonic hard factors which are convoluted with the non-perturbative partonic or hadronic distribution functions.
In the present analysis, we consider the SIA process which is given by, and the semi-inclusive charged hadrons production in the lepton-nucleon deep inelastic scattering which can be written as, According to the collinear factorization theorem, the cross-sections for the two processes mentioned above can be written as, where theσ indicates to the process dependent perturbative partonic cross-section. The parton distribution function (PDF) and FF are non-perturbation functions.
The details of the computation of the SIA crosssections are provided in some studies available in the literature, and we refer the reader to the Refs. [28,29] for a clear review.
The basic cross-sections for the charged hadron production with four-momentum p h in deep inelastic scattering of a lepton with momentum l from a nucleon with momentum p can be written as, which are functions of Bjorken scaling variable x = Q 2 2p.q , the charged hadrons fragmentation scaling variable z h = p h .p q.p , energy transfer or inelasticity y = Q 2 xs , and the four momentum transfer squared of the virtual photon Q 2 = −q 2 . In this equation, the α indicates the finestructure constant.
The structure functions F h 1 and F h L in Eq. 4 are the relevant inclusive DIS structure functions in which at NLO accuracy are given by, The convolution symbol ⊗ in equations above is defined as, In Eqs. 5 and 6, the PDFs inside the nucleon are denoted by q,q and g, and D h q , Dq, and D g denote the FFs. The hard scattering coefficient functions C 1,L ij related to the F h 1 and F h L structure functions admit the usual perturbative expansion. Currently, these coefficient functions are known up to O(α s ), i.e., NLO and can be found for example in Refs. [38,39]. Although the coefficient functions for structure function in SIA process is known up to the O(α 2 s ), i.e., NNLO, the full set of NNLO accuracy corrections are not known for the SIDIS process, and the structure functions only known up to NLO. Hence, our QCD calculations in the perturbative part are limited to the O(α s ), i.e. NLO, accuracy.
It should be noted here that we determine the charged hadron FFs in the Zero-Mass Variable-Flavor-Number Scheme (ZM-VFNS) in which all the active flavors are considered to be massless. However, the masses of heavy quarks require to be introduced during the subschemes to determine the number of active flavors based on the heavy-quark thresholds. In this analysis, the charm and bottom masses are considered to be fixed at m c = 1.51 GeV and m b = 4.92 GeV, respectively.
As a final point, we should highlight here that we have used the proton PDF set NNPDF31 [35] at NLO accuracy to calculate the cross-section in the SIDIS process. The SIDIS data included in our QCD fit, has been measured by the COMPASS Collaboration in which the muon beam collides with the Lithium ( 6 LiD) target. In our present study, we focus on the analysis with the proton PDFs without considering the nuclear corrections. We plan to revisit this analysis in the near future to study the impact of such nuclear effect along with the target mass corrections (TMC) and hadron mass corrections.

B. pQCD and time-like evolution
The integrated FFs are universal and process independent quantity in the sense that D h (z, Q) is the same in processes like e + e − annihilation, SIDIS, and hadronic collisions. FFs depend on an additional parameter called the renormalization scale µ because of the QCD dynamics. Based on the pQCD approach, the structure of the evolution equations for the unpolarized integrated FFs generally is given by, where P ji is the matrix for the time-like splitting functions and have a perturbative expansion of the form, The NLO time-like splitting functions P ji have been computed in Refs. [40,41]. Usually the evolution equation is decomposed into a singlet sector comprising the gluon and the sum of all quark and antiquark FFs, and the non-singlet sector for quark-antiquark and flavor differences.
The range of applicability for the FFs is limited to the medium-to-large range of z value. There are two reasons for such limitation. First, the strong singular behavior in the time-like splitting functions when z → 0. Second, the produced hadrons in the final state are considered to be massless. In supplementary to this, the time-like splitting functions have a logarithmic piece ln 2 z/z in the NLO part which leads to negative FFs for z 1 in the influence of the Q 2 evolution and it leads to unphysical, negative cross sections. In addition, at small z, the finite mass corrections become more important. Therefore, in this global QCD analysis we limit ourselves to the kinematic regions in which mass corrections and the singularity of small-z behavior in the evolution kernels are negligible, as discussed in Sec. III B.

C. Neural Network and flavour decomposition
As we discussed in Ref. [29], inclusive SIA data allow for the determination of only the summed quark and antiquark FFs by including the total inclusive, light-, charmand bottom-quark tagged cross sections It has been shown that adding the SIDIS data sets to the data sample could provide a direct constraint on the individual q andq FFs for light quarks. We adopt the following parametrization basis for h + , Taking into account the SIDIS data in the QCD fit, the above combinations of quark FFs D h ± u + and D h ± d + +s + in Eq. 10 are considered to be decomposed. The heavy distributions are assumed to be symmetric. It reads, Hence, by adding SIDIS COMPASS data, the number of independent distributions increases to the seven. We observed that under these flavor combinations of quark FFs for hadron production, the best fit quality and accuracy can be achieved. In order to choose the best parametrization basis, we study different scenarios. However, in the most general case, we disentangle all light flavors. The data sets included can not constrain well enough all 6 light, 2 heavy quarks, and gluon FFs. In another case, we assume symmetry just between and a separate parametrization are implied for the D h + d and D h + s . In particular, we find this assumption leads to a deterioration of the quality of FFs and the fit, and then we omitted such assumptions.
We finally note that the COMPASS measurements have been reported for both positive (h + ) and negative (h − ) charged hadron separately. We consider a relation based on charge conjugation between h + and h − as follow, Hence, one can obtain the h − FFs in terms of the h + FFs.
The h + FFs for every parton flavor in terms of a Neural Network is defined at the initial scale of Q 0 = 5 GeV, which is given by, Here N i (z, θ) denotes the output of the Neural Network and θ stands for the (internal) parameters of the Neural Network. Note that the result of Neural Network at z = 1 is subtracted in order to satisfy the requirement that FFs should vanish at this point. Also, the result is squared to make sure that FFs always stay positive. A few remarks on the construction of Neural Networks are in order. First, in this analysis we use a simple yet efficient 1 Neural Network structure which has only one hidden layer. Second, we choose to use 20 neurons (nodes) in the hidden layer, admittedly this number is somewhat arbitrary, smaller number of nodes may result in an equally accurate fit. In an effort to examine the effect of choosing an alternative NN architecture on the obtained results, we performed another analysis with different configurations, i.e. {1-9-9-7} architecture. Our examination shows that the results are basically unchanged, which is in agreement with other studies available in the litrature [42]. We believe that this stability despite changing the NN shows that our FFs are not driven by hyperparameters such as the number of hidden layers or the number of nodes in each layer but by input experimental data. Third, number of replicas in this analysis is 200, regardless, requirements of replica method can be achieved by smaller number of replicas, for example 100 [23].

III. FITTING PROCEDURE
In this section, we first discuss the minimization strategy and the uncertainty propagation estimation using the Monte Carlo method. Then we illustrate a comprehensive set of measurements of the charged hadron production in electron-positron annihilation SIA, and the lepton-nucleon SIDIS processes as well. We also discuss the kinematic cuts on the experimental data in which a description in the framework of pQCD can be expected to work well. Finally, we comment on the tension between COMPASS data with some of other SIA data sets analyzed in SHK22.h study.

A. Minimization and uncertainty propagation method
It goes without saying that any measurement in highenergy particle physics has an uncertainty associated with it, this introduces the problem of understanding the effect that this produces on other quantities referred to as uncertainty propagation. Two widely used methods to propagate the experimental uncertainties to the FFs or observables are; one the Hessian method and second the Monte Carlo(MC) or replica method. The Monte Carlo approach is nowadays widely used in various QCD analyses [35,[44][45][46][47]. This method estimates the parameters posterior probability distribution by performing a number of fits. Every fit is independent and performed on a pseudo data set (a replica) resulting in an optimal set. The results of all fits performed then learn the probability distribution which defines both the central value(mean of the probability distribution) and the uncertainties of FFs (standard deviation of the probability distribution). In order to properly into account the uncertainty of the PDFs used in SIDIS observables, we use the similar method as developed and adopted in Ref. [23] to ensure that the PDF uncertainty is propagated into FFs. In SIDIS calculations each time a different proton replica of NNPDF3.1 is chosen at random from NNPDF31_nlo_pch_as_0118 set.
In order to perform the QCD analysis, one mainly applies the maximum log-likelihood method which in turn reduces to minimum χ 2 under usual assumptions. In this case, the problem of finding the optimal parameters of a parametric form or optimal Neural Network parameters is equivalent to minimizing the χ 2 function at hand. There are a few ways to minimize a χ 2 function in a QCD fit which is based on Neural Network; one that readily comes to mind is explicit differentiation and calculation of global minimum directly, this approach is inefficient and sometimes straight impossible. It is therefore natural to look for numerical methods such as genetic algorithm used by NNPDF for PDFs [48], stochastic gradient descent methods used by nNNPDF for nuclear PDFs [49], and trustregion methods as provided by Ceres Solver [50] and utilized by MAPFF [23] and SHKS22 [37]. In this analysis, we adopt the later method which is implemented in the MontBlanc package [36]. The χ 2 function that is subject to minimization is defined as follows, In the equation above, the x (k) is the k-th replica, T(θ (k) ) is the theoretical prediction for the k-th replica based on the parameters of Neural Network (θ (k) ), and C is the covariance matrix of the data which contains all information on the uncertainties and correlations. In view of the fact that Neural Networks by construction are redundant i.e. number of parameters is typically much bigger than that of a functional form parametrization. For this reason the χ 2 is by convention normalized to N dat , number of data points.

B. Data sets selection
In the present analysis, we make use of all available experimental data on the charged hadron production in SIA and SIDIS processes to determine the unidentified lightcharge hadron FFs. In our previous analyses [27,29,51], we have included all available SIA experimental data sets to determine the FFs of charged hadron production. The SIA measurements are reported as a sum of the observables for the positive and negative charged hadron production. However, the observables in SIDIS process are separated into positive and negative charged hadron production. The kinematic coverage in the (z, Q) plane of the SIA and SIDIS data sets analyzed in SHK22.h analysis are shown in Fig. 1. The data points for SIA are shown as blue, the SIDIS data points are shown as green; and the gray points are excluded by kinematic cuts as discussed in the text. They data sets contain all analyzed flavoruntagged and tagged measurements which are reported by different experiments. These data sets include the TASSO [8]  As one can see from Table I, there are several different measured observables available for these data sets. The experimental collaborations have reported total inclusive and light-, charm-, and bottom-tagged cross sections. Determination of the separate FFs for light and heavy quark flavors is provided by the light and heavy flavor tagged measurements. For a detailed discussion of the SIA data sets, we refer the reader to our previous study on the light-charged hadrons FFs. [29].
One of our main aim and motivations in this analysis is to revisit our previous QCD analysis [29] by including the available SIDIS data to the SIA data sample. The COMPASS Collaboration has measured the multiplicities of the charged hadrons produced in semi-inclusive scattering. They have used a 160 GeV muon beam and a target ( 6 LiD). COMPASS has measured the differential multiplicity for positive and negative charged hadrons separately, which is given by.
where the numerator is given by the differential SIDIS cross section for charged hadron production and the denominator is given by the differential inclusive DIS cross section. The cross sections at leading-order (LO) can be expressed in terms of PDFs q(x, Q 2 ) and FFs D h q (z, Q 2 ), .
The kinematic cuts have been imposed on the photon virtuality Q 2 > 1 (GeV/c) 2 , on the Bjorken scaling variable 0.004 < x < 0.4, on the scale variable in the final state 0.2 < z < 0.85 and on the inelasticity 0.1 < y < 0.7.
Although the cleanest way to access the FFs for hadron production in the final state is electron-positron annihilation process and also the FFs are the only nonperturbative objects in the observables, SIA has several limitations which can be addressed by SIDIS process. On one hand, while the extraction of flavor-separated FFs is difficult in a QCD analysis based on the SIA only, the data from SIDIS experiments are crucial to getting direct constrain on the separation of quark and anti-quark FFs. On the other hand, the range of the center-of-mass energy at which FFs are probed in SIA covers from Q = 10 GeV to the Q = M Z . However, SIDIS data cover lower scales of energy, from Q ∼ 1 GeV to the Q ∼ 6 GeV.
Considering the discussions presented in Sec. II B, we apply kinematic cuts on the experimental data for which a description in the framework of pQCD and time-like evolution can be expected to work well. Hence, we exclude the range of very small values of z from SIA data sets. For the SIA data points, we use the kinematic cuts on z as 0.02 ≤ z ≤ 0.9 for data at a center-of-mass energy of M Z , and 0.075 ≤ z ≤ 0.9 for other data points. As a matter of fact, at low-energy scale Q, higher order purterbative corrections are necessary to have acceptable theory predictions. So the perturbative QCD corrections up to NLO accuracy are unreliable for low Q. Hence, we exclude the range of very small values of Q from SIDIS data sets. For the COMPASS SIDIS, we implemented cuts of Q > 2 GeV.
Finally, we include in total N dat = 684 data points in our analysis after kinematic cuts in which include the N dat = 314 data points for SIDIS, and N dat = 370 for the SIA.
C. Compatibility of TASSO 35 GeV In this section, we comment on the possible tensions between the SIA data sets. As one can see from data sets reported in Table. I, one of the sources of SIA data sets is the TASSO 35 GeV which we do not include in the list of the data sets. Our detailed study on the individual χ 2 shows that there is a tension between this data set and the COMPASS data sets. We first perform the calculation of the light-charged hadron FFs using the SIA experimental data only, and we achieve an acceptable description for the TASSO dataset at 35 GeV with the individual χ 2 per data point of 1.56. However, when the COMPASS SIDIS data are added to the QCD fit, one can not obtain an optimal description of the TASSO 35 GeV data set in the fit, and a large χ 2 per data point is achieved for it, χ 2 /N dat = 8.37. The origin of this behavior, can be related to a tension between the TASSO 35 GeV and the COMPASS data.
We should note here that the drop of matching between the theoretical predictions and all other TASSO experimental data are also examined, and have been seen for all other scales of energy 14, 22, 35, and 44 GeV. As a matter of fact, the extracted values of χ 2 per data points for other data points are also worsened, which in order of 2.3 for the TASSO 14 GeV, 1.9 for the TASSO 22 GeV, 4.5 the for TASSO 44 GeV, and 5.3 for the TASSO 35 GeV after the inclusion of the COMPASS data. Since the matching drop for the TASSO measurements at 14, 22 and 44 GeV are milder than those of the 35 GeV one, and the extracted ranges of χ 2 per data point are seem to be acceptable for them, we decided to exclude only the TASSO 35 GeV measurements in the fit. Notwithstanding, we have performed a separate analysis which included the TASSO35 dataset and noticed that the central value of the distributions does not affect, whereas the uncertainty estimates are now larger at small z values. We presume that this large uncertainty band is an overestimate and not truthful. This finding also indicates to the tension between TASSO35 and COMPASS datasets.

IV. SHK22.H NUMERICAL RESULTS
In this section, we present the main results and findings for the determination of the FFs of light charged hadron, called SHK22.h, in which most of available and updated SIA and SIDIS measurements are added to the data sample and analyzed up to the NLO accuracy in perturbative QCD.
We first present the fit quality and discuss in details the term of both the individual and the total datasets included in our analysis. Then we present the data and theory comparison, both for the SIA and SIDIS data sets analyzed in SHK22.h.
We also illustrate the resulting light charged hadron FFs and their uncertainties, for all parton species, focusing on the comparison of the extracted NLO FFs with the publicly available JAM20 and NNFF1.1h analyses. We discuss the interplay between the SIA and SIDIS experimental data, and the stability of the light charged hadron FFs upon inclusion of SIDIS data sets.

A. Fit quality
In Table. I we report the value of the χ 2 per data point, χ 2 /N dat , for the individual data sets for both SIA and SIDIS included in the SHK22.h analysis. This table also includes the number of data points that pass the kinematic cuts. The values of the χ 2 per data point for the total datasets also shown as well.
Considering the numbers presented in this table, a few remarks for the individual and total datasets are in order. As can be seen, the global χ 2 per data point in our fit, equal to 1.080, indicates, in general, a very good description of the entire data sets. Remarkably, a comparable fit quality is observed for both the SIA and SIDIS data sets separately.
Concerning the fit quality of the individual SIDIS experiments, we see that for the h − production at COM-PASS, we obtain a better χ 2 per data point with respect to the COMPASS h + .
A closer look to the χ 2 per data point presented in this table reveals that acceptable descriptions are achieved almost for all of the individual data sets analyzed in the SHK22.h fit, with two main exceptions.
First, for some data sets reported in the table, the χ 2 /N dat value is still large: this specifically happens for the h ± light charged hadron production in TASSO 14 GeV, TASSO 44 GeV and OPAL total inclusive.
From this table we also observe that the χ 2 /N dat value for the DELPHI uds and OPAL bottomh ± is anomalously small. This finding was already observed and reported in some previous FFs analyses which is likely to be due to the overestimate of the uncorrelated systematic uncertainty. We refer the reader to the Refs. [45,[52][53][54] for more details.

B. Theory and data comparison
In order to assess, it would be instructive to look at the comparison between the data and the NLO theory predictions obtained with the SHK22.h light charged hadron FFs for all SIA and some selected SIDIS data sets.
We start with detailed comparisons with the SIA data analyzed in this work. For all results presented in SHK22.h, the upper panels represent the absolute distributions while the lower ones display the ratio to the experimental central values analyzed in SHK22.h. In Fig. 2  we compare the NLO theory predictions with the inclusive data sets form TASSO 14, TASSO 22 and TASSO 44 GeV Collaborations. The same comparison also are shown for the TPC data. As one can see, overall, satisfying agreements are achieved, however, with exception at high z. Our theory predictions do not satisfy the high-z TASSO data, which is the origin of the slightly highχ 2 value reported in Table. I for these data sets. This specific feature is particularly pronounced for TASSO 44 data, and more moderate, but still significant, for TASSO 14 and 22. For the TPC data, some deviation can bee seen for small value of z, but with the χ 2 value reported in Table. I, the description of TPC data seems to be still convincing.
In Fig. 3, we present detailed comparisons of the NLO theory predictions with the total inclusive ALEPH, DEL-PHI, OPAL and SLD data. Comparisons with the udstagged data from DELPHI, OPAL and SLD data also presented as well. Consistently with the χ 2 values reported in Table. I, the description of these data sets is desirable, with one exception for the OPAL inclusive data. Some small deviation for large-z data can be seen from the plots presented in Fig. 3. Another important finding emerges for the comparison is that, the experimental data points for the inclusive measurements by OPAL Collaboration fluctuate at high-z around the theoretical predictions by an amount that is seems to be typically larger than the calculated uncertainties. This should explain the poor χ 2 values reported in Table. I for this specific data sets.
We now turn to the comparisons of our NLO theory predictions with the charm and bottom-tagged data from OPAL, SLD, DELPHI and SLD. The corresponding plots are shown in Fig. 4. Once again, the goodness of the χ 2 values reported in Table. I is reflected in a general good description of the charm and bottom-tagged data.
Figs. 5 and 6 presents the data/theory comparison for some of the COMPASS multiplicities data sets for the h + and h − , respectively. Each panel shows a distribution as a function of z corresponding to a bin in x and y. As above, the lower panels display the ratio to the experimental central values. A remarkable feature of the distributions shown in these figures is the very nice agreements between the light charged hardon COMPASS data and the SHK22.h NLO theory prediction.
While the agreements for the COMPASS h − data is noticeable for all range of z and for all bin in x and y, some differences between the COMPASS h + data and theory predictions can be seen, more specifically for the small value of z in which the theoretical predictions overshoot the data. This is also consistent with the poor χ 2 for COMPASS h + data reported in Table. I.
As a short summary, in general, an overall good agreement between the analyzed data sets and the NLO theoretical predictions is achieved for all experiments, consistent with the total values of χ 2 reported in this section. Remarkably, the SHK22.h NLO theoretical predictions and both SIA and SIDIS data are in reasonable agreement also in the small and large-z values with exception of few data sets that we discussed above.

C. The SHK22.h light-charged hadron FFs Set
We are now in a position to discuss the SHK22.h lightcharged hadron FFs sets. In order to study in details the extracted FFs sets, we compare our best-fit results to other recent counterparts available in the literature, the JAM20 [25] and NNFF1.1h [30] analyses.
We display the light-charged hadron FFs parameterized in SHK22.h fits, and their uncertainties in Fig. 7. We present the 7 hadronic species at Q=5 GeV which . The upper panel of each plot presents the absolute distributions, while the the lower panels display the ratio to the SHK22.h. It is to be noted that NNFF1.1h only extracted the gluon, c, b quark, and flavor singlet combination in their analysis. However, the authors have given instructions to disentangle the up and down contributions in Appendix A of [28] and also produced the related LHAPDF format grids, and the plots that are presented here use such prescriptions.
Concerning the shapes of the light-charged hadron FFs, there are a number of interesting similarities and differences between these three different sets as can be seen from the comparisons presented in Fig. 7.
We start with the bottom quark FF zD h + b + (z, Q). As one can see from Fig. 7, in term of the central distribution, these three sets are in very good agreement, except for the high-z region. The uncertainty of the zD h + b + (z, Q) FFs deserves a separate comment. The uncertainty for the NNFF1.1h is relatively large over all region of z, while for the JAM20 is very narrow. For SHK22.h, the calculated uncertainty is slightly large for the large value of z due to the lack of data in this region. Although, it still remained smaller than NNFF1.1h over all range of z. The same findings are hold for the charm-quark FF zD h + c + (z, Q) with the exception that the central value of SHK22.h and NNFF1.1h is smaller than those of the JAM20 for low value of z; z < 2 × 10 −1 . An interesting difference can be seen for the zD h + u (z, Q) FF between SHK22.h and NNFF1.1h for medium to large value of z in which the NNFF1.1h result is larger than SHK22.h, while for the case of zD h + u (z, Q) they are in good agreement. Moderate differences on up-quark FF are observed for the SHK22.h and JAM20 for almost all region of z while SHK22.h and NNFF1.1h FFs remain consistent.
The most pronounced differences both in shape and uncertainty bands are observed for the gluon zD h + g (z, Q), down-quark zD h + d (z, Q) and zD h + d (z, Q) FFs. For the zD h + d (z, Q) and zD h + d (z, Q) FFs, the differences in shape and uncertainty bands among the three FF sets are more marked for large z rather than the small region of z. As one can see, a fair agreement for the central value is observed only in the region of z < 2 × 10 −1 . For medium to large z region, the SHK22.h zD h + d (z, Q) FF is larger than NNFF1.1h and smaller than the JAM20. For the case of zD h + d (z, Q), our results are larger than those of the two others in most of the z range. In term of uncertainty bands, we obtained a larger error bands in respect to the JAM20 analysis, which is expected, considering their functional parametrization. For the NNFF1.1h, the central value of zD h + d (z, Q) and zD h + d (z, Q) tend to zero for large value of z, z > 0.6, with much wider error bands for all region of z.
The central value and uncertainty bands of the gluon FFs zD h + g (z, Q), deserve separate comments. As one can see from Fig. 7, there are noticeable differences both in term of central values and uncertainty bands between these three different sets.
The JAM20 analysis includes all available SIDIS and  SIA, and the inclusive DIS and Drell-Yan lepton-pair production as well to calculate the PDFs simultaneously with the FFs. The analysis by NNPDF Collaboration included the proton-proton data for unidentified charge hadron production by means of Bayesian reweighting to the analysis based only on the SIA data sets. Typically, the uncertainties of the NNFF1.1h FFs are much larger than our results and JAM20 at small and large values of z. The un-certainty band for our result is wider than JAM20 over the small value of z, and smaller for the high-z region. The smallness of the uncertainties for the JAM20 analysis are discussed in details in Ref. [24]. We should stress here that the smaller uncertainty for all FFs presented in Fig. 7 reflect the more restrictive functional form used in the JAM20 analysis to parameterize their FFs at the input scale.  As we previously mentioned, the SIA cross-sections are less sensitive to the gluon FF. As a consequence, one would expect that, in the presence of SIA data only, the gluon FF will be determined with larger uncertainties than other quark FFs. Hence, in SHK22.h study, the SIDIS data added to the data sample to provide stronger constraint for the gluon density. In the next subsection, we present our study on the effect of SIDIS data on the extracted FFs.  In this section, we discuss the impact of SIDIS data sets on the extracted FFs. To this end, we compare the main results of the SHK22.h global QCD analysis which include the SIA and SIDIS experimental data with the analysis in which based on the SIA measurements only.
According to Eq. 10 and Eq. 11, the flavor decompositions in the parametrization of FFs are not the same for the analysis with SIA data only, and the global analysis with both SIA and SIDIS data sets. Then, in order to investigate the impact of including SIDIS data in the SHK22.h FFs analysis, the comparison of FFs for different flavors have been shown in Fig. 8 at Q=5 GeV.
From the comparisons presented in Fg. 8, one observes that the inclusion of SIDIS COMPASS data sets affects both the central values and the uncertainty bands of the extracted FFs.
Such differences are more pronounced for the gluon FF zD h + g in terms of both the central value and for the error bands. As one can see from the comparison between the SIA and SIDIS for the gluon FF, the inclusion of the SIDIS data leads to an enhancement of the zD h + g distribution for small value of z; z < 0.2 in comparison with a fit to the SIA data. We also see that at the large value of z, the SIA+SIDIS and SIA fits are in good agreement. For the medium value of z, the gluon distribution of the global SIA+SIDIS fit get suppressed with respect to the SIA fit. One can also see from Fig. 8 that the gluon FF uncertainty for all range of z is reduced. However, gluon FF enter the description of SIA and SIDIS at the same order of perturbation theory, so they are not different regarding the sensitivity to the FF of gluon. In order to constrain the gluon FF one needs to include protonproton data in analysis which is sensitive to gluon FF already at LO [30]. Therefore, we believe that the reduction in the uncertainty of gluon FF is because of significant increase in the statistics of the included experimental information from SIDIS observables.
For other FFs, we find in general a reasonable agreement between SIA and SIDIS fits, but also with important differences. As one can see, SIA and SIDIS fits are in good agreement for the central value of zD h + c + and zD h + b + FFs for all range of z. Reductions on the uncertainty of SIA+SIDIS also can be seen in respect to the SIA for both FFs. For the case of zD h + u + FF, the SIA fit are larger than SIA+SIDIS for the range of z down to z ∼ 0.1 and smaller elsewhere, however similar size of error bands are obtained for both SIA and SIA+SIDIS fits. For the case of zD h + d + +s + , a smaller uncertainty band is obtained and the central value for the SIA+SIDIS fit is larger than SIA over the whole range of z. Generally speaking, we find that the inclusion of the SIDIS data could affect the central value of the extracted FFs and leads to significant reductions of the uncertainty, and more specifically for the gluon FF.
We finally note that the value of total χ 2 per data point increases from 0.8 in SIA data only fit to the 1.079 in the global analysis of SIA+SIDIS data. As we mentioned before in section III C, the increasing of the value of the total χ 2 is related to the tension between TASSO and the COMPASS experimental data sets. As we reported, the values of χ 2 per data points for the TASSO data, more specifically TASSO 35 GeV, sets significantly increases after the inclusion the COMPASS data to the data sample.

V. SUMMARY AND CONCLUSIONS
In summary, we have presented a new global QCD analysis of light-charged hadron FFs, SHK22.h, by introducing several new features and some methodological improvements. On the methodological front, we have used the Machine Learning framework to extract the SHK22.h FFs sets, along with the Monte Carlo uncertainty analysis. This well-established fitting methodology is specifically designed to provide a faithful representation of the experimental uncertainties. This methodology is also useful to minimize any bias related to the parametrization of the light-charged hadron FFs and to the minimization procedure as well.
In terms of the input data sets, in addition to the comprehensive set of high-energy lepton-lepton annihilation (SIA), we have added the lepton-hadron scattering (SIDIS) data sets to our data sample. We have shown that SIDIS data sets have significant effect on the FFs, and more specifically on the gluon FFs and the reduction of its uncertainty. The tension among some of the data sets included in our analysis also studied and discussed in details.
The detailed comparisons to the existing light-charged hadron FFs sets (NNFF1.1h and JAM20) fully demonstrate a reasonable agreement within the FFs error bands. Although, some discrepancies in flavor dependence were observed, more specifically for the gluon and down-quark FFs. The resulting NLO theory predictions for the SIA and SIDIS cross-sections show very good agreement with the corresponding analyzed experimental data sets, as confirmed by the reported total χ 2 per data point.
Based on our findings in this study, one can conclude that adding the SIDIS data in the light-charged hadron study could lead to a much better level of precision of the extracted FFs.
In terms of future work, it would be interesting to revisit this analysis and study in detail the light-charged hadron FFs analysis described here considering the nuclear corrections in which we expect that it could affects the resulting FFs and their uncertainty, and could improve the description of the SIDIS data as well. Exploring the implications of such correction is left for the future work.
The parametrizations of the SHK22.h light-charged hadron FFs presented in this paper are available in the standard LHAPDF format [55], and can be obtained from the authors upon request.