Probing Parton distribution functions at large x via Drell-Yan Forward-Backward Asymmetry

The forward-backward asymmetry of the Drell-Yan process in dilepton decays at high invariant masses can be used to probe the parton distribution functions at large x. The behavior of three modern PDF sets (CT18NNLO, MSHT20, and NNPDF4.0) is compared, and the PDFs are updated under various scenarios via ePump using proton-proton collision pseudo-data generated at sqrt(s) = 13 TeV with 3000 fb^{-1} of integrated luminosity.


I. INTRODUCTION
Many of the future high-luminosity precision measurements at the Large Hadron Collider (LHC) will be limited by modeling and theoretical uncertainties. Prominent among these are uncertainties related to knowledge of parton distribution functions (PDFs). The kinematics of the wide range of measurements and searches at the LHC span parton x values from very low x, where gluon and antiquark densities are relevant, to high x, where valence quark densities dominate. An intriguing approach to further limiting PDF uncertainties might be to create targeted "boutique" PDF sets designed for specific purposes, rather than rely on global, all-purpose PDF sets. Judicious kinematic selection of LHC data as strategic, supplementary inputs to PDF fitting might lead to legitimate reductions in PDF uncertainties from such special-purpose PDF sets. Neutral-current Drell-Yan (NCDY) backgrounds constitute a trial of this idea: high-mass resonant and non-resonant signatures are standard targets of new physics searches, but these searches are now compromised by PDF uncertainties and will be more so in the future. The Standard Model (SM) backgrounds, which must be modeled very well, are experimentally uncomplicated and amenable to high experimental precision. NCDY studies therefore seem a good way to explore this idea of specialized PDFs.
We tested that approach in a previous study [1] and demonstrated how measurements of NCDY below $m_{\ell\ell}$ = 1 TeV bring significant improvement to high-mass PDF uncertainties. That paper studied the triple differential cross-section in the variables $m_{\ell\ell}$, $\cos\theta^*$, and $\eta_{\ell\ell}$ and simulated the sensitivity of modified PDFs in high-mass searches for new physics. In a similar vein, this paper looks specifically at the behavior of various PDF sets in the description of the forward-backward asymmetry ($A_{FB}$) at high invariant masses. The ePump package [2] is then used to understand the effect of using these updated PDF nominal values and uncertainties if nature behaved according to the other PDFs considered. We note that others have begun exploring ways to improve NCDY PDF uncertainties at high invariant mass as well [3][4][5][6][7][8][9].
The following equation describes the relationship between initial-state parton momentum fractions and kinematic variables in hadron collisions:
$$x_{1,2} = \frac{Q}{\sqrt{s}}\, e^{\pm y}, \tag{1}$$
where $x_1$, $x_2$ are the momentum fractions of partons in the incoming beams, $Q$ is the energy scale, $\sqrt{s}$ is the collision energy, and $y$ is the rapidity of the system. According to Equation (1), high mass events directly correspond to the high x region in PDFs. In modern PDFs, the valence quarks are well known for x ∼ 0.3, while the sea quark PDFs, especially the strangeness PDFs, are hardly known for x > 0.3. However, for x > 0.7 even the valence PDFs are not well determined. Kinematic distributions for the high mass Drell-Yan process, especially the forward-backward asymmetry ($A_{FB}$), directly provide valence parton information in that large x region. Therefore, $A_{FB}$ measurements in the high mass region at the LHC can provide information on large x PDFs that is complementary to the science program at the Electron-Ion Collider (EIC) [10].
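As a rough numerical illustration of how dilepton mass maps onto parton x, the sketch below evaluates the standard leading-order relation $x_{1,2} = (Q/\sqrt{s})\,e^{\pm y}$ for a few masses at $\sqrt{s}$ = 13 TeV. The function names and the choice of example masses and rapidities are illustrative assumptions, not part of the analysis chain used in this study.

```python
import math


def momentum_fractions(q_gev, sqrt_s_gev, rapidity):
    """Return (x1, x2) from x_{1,2} = (Q / sqrt(s)) * exp(+-y)."""
    sqrt_tau = q_gev / sqrt_s_gev
    x1 = sqrt_tau * math.exp(rapidity)
    x2 = sqrt_tau * math.exp(-rapidity)
    return x1, x2


def max_rapidity(q_gev, sqrt_s_gev):
    """Largest |y| for which both momentum fractions stay at or below 1."""
    return math.log(sqrt_s_gev / q_gev)


if __name__ == "__main__":
    sqrt_s = 13000.0  # GeV, as in this study
    # Example masses and rapidities are arbitrary illustration points.
    for mass in (91.2, 1000.0, 3000.0, 5000.0):
        for y in (0.0, 0.5):
            if y > max_rapidity(mass, sqrt_s):
                continue  # kinematically forbidden point
            x1, x2 = momentum_fractions(mass, sqrt_s, y)
            print(f"m = {mass:7.1f} GeV, y = {y:3.1f}: x1 = {x1:.4f}, x2 = {x2:.4f}")
```

At central rapidity both partons carry $x = Q/\sqrt{s}$, so a dilepton mass of a few TeV already corresponds to x of a few tenths, and any nonzero rapidity pushes one of the two partons to still larger x.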
In this study, three modern PDF sets are compared: CT18NNLO [11], MSHT20 [12], and NNPDF4.0 [13], by generating proton-proton (pp) collision pseudo-data corresponding to $\sqrt{s}$ = 13 TeV and an integrated luminosity of 3000 fb$^{-1}$. These pseudo-data are used as input to the ePump package to update either CT18NNLO or MSHT20 as the underlying theory input. By updating the PDFs, both the central value of the PDF and the 68% confidence level (C.L.) variation uncertainty (for CT18NNLO, this uncertainty describes the variation of the underlying eigenvectors) will change. This study also carefully considers the role of the tolerance used in ePump when updating PDF sets with new data, and discusses how to set these parameters appropriately to obtain a realistic result.

II. FORWARD-BACKWARD ASYMMETRY ($A_{FB}$) AND DILUTION FACTOR
$A_{FB}$ is defined as
$$A_{FB} = \frac{N_F - N_B}{N_F + N_B}, \tag{2}$$
where $N_F$ and $N_B$ are the number of forward and backward events, respectively. In the NCDY process, the scattering angle $\theta$ is defined by the momentum direction of the outgoing fermion $f_j$ relative to the momentum direction of the incoming fermion $f_i$. The sign of $\cos\theta$ is used to define forward ($\cos\theta > 0$) and backward ($\cos\theta < 0$) events. At hadron colliders, because the momentum direction of the incoming quarks is unknown, forward and backward events are defined in the Collins-Soper (CS) rest frame [14], where the polar and azimuthal angles are defined relative to the two hadron beam directions. The $z$ axis is defined in the rest frame of the DY pair, bisecting the angle between the incoming hadron momentum and the negative of the other hadron momentum.
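The counting definition above, $A_{FB} = (N_F - N_B)/(N_F + N_B)$ with forward and backward events tagged by the sign of the scattering-angle cosine, can be sketched in a few lines. The function names and the small list of cosine values are hypothetical and serve only to illustrate the bookkeeping.

```python
def count_forward_backward(cos_theta_values):
    """Count forward (cos theta > 0) and backward (cos theta < 0) events."""
    n_forward = sum(1 for c in cos_theta_values if c > 0)
    n_backward = sum(1 for c in cos_theta_values if c < 0)
    return n_forward, n_backward


def forward_backward_asymmetry(n_forward, n_backward):
    """A_FB = (N_F - N_B) / (N_F + N_B)."""
    total = n_forward + n_backward
    if total <= 0:
        raise ValueError("need at least one forward or backward event")
    return (n_forward - n_backward) / total


if __name__ == "__main__":
    # Hypothetical per-event cos(theta) values, for illustration only.
    sample = [0.9, 0.1, -0.3, 0.5, -0.8]
    n_f, n_b = count_forward_backward(sample)
    print(n_f, n_b, forward_backward_asymmetry(n_f, n_b))
```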
The cosine of the polar angle $\theta^*$ between the momentum direction of the outgoing lepton $\ell^-$ and the $\hat{z}$ axis in the CS frame is defined as the scattering angle of the DY pair at hadron colliders. It can be calculated directly from the laboratory-frame lepton quantities by
$$\cos\theta^*_{CS} = c\,\frac{2\,(p_1^+ p_2^- - p_1^- p_2^+)}{m_{\ell\ell}\sqrt{m_{\ell\ell}^2 + p_{T,\ell\ell}^2}}, \tag{3}$$
where the scalar factor $c$ (either 1 or −1) is defined for the Tevatron and the LHC, respectively, as
$$c = \begin{cases} 1, & \text{for the Tevatron (a proton-antiproton collider)} \\ \vec{p}_{Z,\ell\ell}/|\vec{p}_{Z,\ell\ell}|, & \text{for the LHC (a proton-proton collider).} \end{cases} \tag{4}$$
Thus, the sign of the $z$ axis is defined by the proton beam direction for the Tevatron, and by the boost direction of the lepton pair along the $z$ axis in the laboratory frame, on an event-by-event basis, for the LHC. The variables $p_{Z,\ell\ell}$, $m_{\ell\ell}$, and $p_{T,\ell\ell}$ denote the longitudinal momentum, invariant mass, and transverse momentum of the dilepton system, respectively, and
$$p_i^\pm = \frac{1}{\sqrt{2}}\,(E_i \pm p_{Z,i}), \tag{5}$$
where the lepton (anti-lepton) energy and longitudinal momentum are $E_1$ and $p_{Z,1}$ ($E_2$ and $p_{Z,2}$), respectively. DY events are therefore defined as forward ($\cos\theta^*_{CS} > 0$) or backward ($\cos\theta^*_{CS} < 0$) according to the direction of the outgoing lepton in this frame of reference. In the case of the LHC, one can define another frame in which the $z$ axis is oriented along the quark direction of motion. We use $\cos\theta^*_q$ to denote this case, and define the coefficient $c$ as
$$c = \vec{p}_{Z,q}/|\vec{p}_{Z,q}|, \tag{6}$$
where $\vec{p}_{Z,q}$ is the longitudinal momentum of the incoming quark. We then use $\cos\theta^*_h$ to denote the case in which the lepton pair momentum defines the $z$ axis. One can easily obtain the relationship between $\cos\theta^*_h$ and $\cos\theta^*_q$:
$$\cos\theta^*_h = \begin{cases} \cos\theta^*_q, & \text{for } E(q) > E(\bar{q}) \\ -\cos\theta^*_q, & \text{for } E(q) < E(\bar{q}). \end{cases} \tag{7}$$
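A minimal sketch of the laboratory-frame calculation described above, assuming the standard Collins-Soper expression written with the light-cone components $p^\pm = (E \pm p_Z)/\sqrt{2}$ and the proton-proton convention in which the sign factor follows the longitudinal boost of the lepton pair. The function name and the tuple layout `(E, px, py, pz)` are choices made for this illustration.

```python
import math


def cos_theta_cs(lepton, antilepton):
    """Collins-Soper cos(theta*) for a pp collider.

    Each argument is a lab-frame four-momentum (E, px, py, pz) in GeV.
    The sign factor c is taken from the dilepton longitudinal momentum.
    """
    e1, px1, py1, pz1 = lepton
    e2, px2, py2, pz2 = antilepton

    # Light-cone components p^{+-} = (E +- pz) / sqrt(2)
    root2 = math.sqrt(2.0)
    p1_plus, p1_minus = (e1 + pz1) / root2, (e1 - pz1) / root2
    p2_plus, p2_minus = (e2 + pz2) / root2, (e2 - pz2) / root2

    # Dilepton system kinematics
    e_ll, px_ll = e1 + e2, px1 + px2
    py_ll, pz_ll = py1 + py2, pz1 + pz2
    m2 = e_ll**2 - px_ll**2 - py_ll**2 - pz_ll**2
    if m2 <= 0.0:
        raise ValueError("dilepton invariant mass must be positive")
    pt2 = px_ll**2 + py_ll**2

    # c = +1 or -1; a pair with exactly zero pz is assigned +1 here,
    # which is an arbitrary choice for that measure-zero case.
    c = math.copysign(1.0, pz_ll)

    numerator = 2.0 * (p1_plus * p2_minus - p1_minus * p2_plus)
    return c * numerator / (math.sqrt(m2) * math.sqrt(m2 + pt2))


if __name__ == "__main__":
    # Lepton along +z, pair boosted along +z: a forward event.
    print(cos_theta_cs((100.0, 0.0, 0.0, 100.0), (50.0, 0.0, 0.0, -50.0)))
```

Because the sign factor is fixed event by event from the pair boost, swapping which beam supplied the quark leaves the result unchanged, which is what makes the observable usable at a symmetric pp collider.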
As introduced in previous studies [15][16][17], the dilution factor ($D$) quantifies the probability that the energy of the antiquark is larger than the energy of the quark. When the quark carries the higher energy, which has a probability of $(1 - D)$, the forward and backward assignments in the two frames agree. When the antiquark carries the higher energy, which has a probability of $D$, the forward and backward assignments in the two frames are interchanged. The number of forward and backward events $N^h_F$ and $N^h_B$ defined by $\cos\theta^*_h$ can therefore be written as
$$N^h_F = (1 - D)\,N^q_F + D\,N^q_B, \qquad N^h_B = (1 - D)\,N^q_B + D\,N^q_F, \tag{8}$$
where $N^q_F$ and $N^q_B$ represent the number of forward and backward events defined by $\cos\theta^*_q$. As a result, the relationship between the $A_{FB}$ defined by $\cos\theta^*_h$ and the $A_{FB}$ defined by $\cos\theta^*_q$ can be roughly written as
$$A^h_{FB} = (1 - 2D)\,A^q_{FB}, \tag{9}$$
where $A^h_{FB}$ and $A^q_{FB}$ are the asymmetries defined by $\cos\theta^*_h$ and $\cos\theta^*_q$, respectively. Separating the quark flavors, Equation (9) can be written precisely as
$$A^h_{FB} = \sum_f \frac{N_f}{N}\,(1 - 2D_f)\,A^f_{FB}, \tag{10}$$
where $f$ is the index of the flavors that couple directly to the Z boson. Here $f$ runs over five flavors: uū, dd̄, ss̄, cc̄, and bb̄. $N$ and $N_f$ represent the total number of events and the number of events with a specific flavor, respectively. $D_f$ represents the dilution factor in the process with a certain flavor, and $A^f_{FB}$ is the flavor-dependent asymmetry defined by $\cos\theta^*_q$. The up-type and down-type flavors coupled to the Z boson have different $A_{FB}$ values; a more detailed discussion can be found in the $A_{FB}$ factorization study [17]. The dilution factor $D$ defined in Equation (9) can be roughly treated as an average over all the flavor combinations.
Fig. 1 shows the $A_{FB}$ and dilution factor as a function of dilepton mass for all flavor combinations ($D$), only uū contributions ($D_u$), and only dd̄ contributions ($D_d$), for the CT18 PDF. The calculation is done using the MCFM program [18] at NLO, interfaced to APPLgrid [19]. The s, c, and b quark contributions are about the same as those of their antiquarks in the CT18 PDFs, so the dilution factors $D_s$, $D_c$, and $D_b$ will be 0.5, and the $A_{FB}$ defined by $\cos\theta^*_h$ for those processes will be zero. For CT18NNLO, $s(x, Q_0)$ is assumed to be identical to $\bar{s}(x, Q_0)$ at $Q_0$ = 1.3 GeV. When the PDFs are evolved to a higher energy scale $Q$, $s(x, Q)$ and $\bar{s}(x, Q)$ can differ slightly at NNLO and beyond. However, even when $s(x, Q_0)$ is allowed to differ from $\bar{s}(x, Q_0)$, such as in CT18As [20], the current constraint on the strangeness asymmetry is such that the dilution factor originating from the ss̄ production channel will also be small. As shown in Fig. 1 (right), the dilution factor of the uū process is smaller than that of the dd̄ process, while the $A^h_{FB}$ of the uū process is larger than that of the dd̄ process.
To specify what kind of flavor and x region can be constrained, we write the dilution factor in parton language. According to the definition of the dilution factor, the leading-order equivalent in parton language can be written as
$$D_f(m_{\ell\ell}) = \frac{\int_0^{y_{\max}} dy\; \bar{q}_f(x_1, Q)\, q_f(x_2, Q)}{\int_0^{y_{\max}} dy\, \left[\, q_f(x_1, Q)\, \bar{q}_f(x_2, Q) + \bar{q}_f(x_1, Q)\, q_f(x_2, Q) \,\right]}, \tag{11}$$
with
$$x_{1,2} = \frac{m_{\ell\ell}}{\sqrt{s}}\, e^{\pm y}, \qquad Q = m_{\ell\ell}, \qquad y_{\max} = \ln\frac{\sqrt{s}}{m_{\ell\ell}},$$
so that $x_1 \geq x_2$ and the numerator collects the configurations in which the antiquark carries the larger momentum fraction. From this one can easily see that the dilution factor is sensitive to the relative difference between quarks and antiquarks. Fig. 3 shows the comparison of the ū/u and d̄/d quark PDFs for the CT18 PDF set. This separation between ū and u, and between d̄ and d, suggests that $A_{FB}$ in the high invariant mass region can provide sensitive information about the relative difference between quarks and antiquarks, namely the valence information in the large x region, which is not currently constrained by existing measurements at the LHC.
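The bookkeeping behind the dilution factor can be checked numerically. The sketch below assumes only the relations stated in the text: a fraction $(1 - D)$ of events keeps its forward/backward label when moving from the quark-direction frame to the boost-direction frame and a fraction $D$ swaps it, which gives $A^h_{FB} = (1 - 2D)\,A^q_{FB}$ and, summed over flavors, $A^h_{FB} = \sum_f (N_f/N)(1 - 2D_f)A^f_{FB}$. All function names and numerical inputs are hypothetical illustration values, not results from MCFM or any PDF set.

```python
def counts_in_boost_frame(n_q_forward, n_q_backward, dilution):
    """Mix quark-frame counts: a fraction D of events swaps F <-> B."""
    n_h_forward = (1.0 - dilution) * n_q_forward + dilution * n_q_backward
    n_h_backward = (1.0 - dilution) * n_q_backward + dilution * n_q_forward
    return n_h_forward, n_h_backward


def asymmetry(n_forward, n_backward):
    """A_FB = (N_F - N_B) / (N_F + N_B)."""
    return (n_forward - n_backward) / (n_forward + n_backward)


def diluted_asymmetry(afb_quark_frame, dilution):
    """Single-flavor relation A^h_FB = (1 - 2 D) A^q_FB."""
    return (1.0 - 2.0 * dilution) * afb_quark_frame


def flavor_summed_asymmetry(flavors):
    """Flavor sum of (N_f / N) (1 - 2 D_f) A^f_FB.

    `flavors` is a list of (n_events, dilution, afb_quark_frame) tuples.
    """
    total = sum(n for n, _, _ in flavors)
    return sum(n / total * (1.0 - 2.0 * d) * a for n, d, a in flavors)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the formulas.
    n_hf, n_hb = counts_in_boost_frame(700.0, 300.0, 0.2)
    print(asymmetry(n_hf, n_hb), diluted_asymmetry(0.4, 0.2))
    # A flavor with D = 0.5 contributes no asymmetry; D > 0.5 flips the sign.
    print(diluted_asymmetry(0.4, 0.5), diluted_asymmetry(0.4, 0.6))
    print(flavor_summed_asymmetry([(600.0, 0.2, 0.6), (300.0, 0.3, 0.6), (100.0, 0.5, 0.6)]))
```

The first printed pair agrees because the count-level mixing and the asymmetry-level factor $(1 - 2D)$ are two forms of the same statement.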

III. PDF-UPDATING USING $A_{FB}$ IN THE HIGH MASS REGION
In this section, we quantitatively show the impact of using $A_{FB}$ in the high invariant mass region to update the PDFs under study and their associated uncertainties, using the ePump package. ePump is meant to be used as a tool to approximate an update to a given PDF set and its PDF uncertainties in response to new kinds of input data (in this study, $A_{FB}$). Should an ePump exercise suggest that these new kinds of data might inform the central PDF and/or reduce PDF uncertainties, then these new data types should be considered as additions to a complete global fit. In this study, pseudo-data were generated for $A_{FB}$ in the high invariant mass region using ResBos [21,22] at N$^3$LL+NNLO in QCD at $\sqrt{s}$ = 13 TeV. Each PDF under study was used to generate the theory template required by ePump to perform the update. Electroweak parameters can affect $A_{FB}$ distributions, but a previous study [15] shows that this mainly affects the Z peak region, so we keep the electroweak parameters the same between pseudo-data and theory templates. In order to study the impact of various possible datasets, CT18, MSHT20, and NNPDF4.0 were each used to generate pseudo-data, which were subsequently used to update the nominal PDF under study.
For the pseudo-data, samples were generated corresponding to an integrated luminosity of 3000 fb$^{-1}$ in order to study the impact of future data at the high-luminosity LHC. No kinematic cuts are imposed in this study. $A_{FB}$ in the invariant mass region from 500 to 5000 GeV was used to perform the PDF updating (using 25 mass bins of varying size to preserve good statistical precision).
As a starting point, Fig. 4 shows the central value and uncertainty of CT18, MSHT20, and NNPDF4.0, as well as their ratio to CT18, for ū/u and d̄/d.
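To see why variable-width mass bins are needed to keep the statistical precision under control, the following sketch estimates the binomial uncertainty on $A_{FB}$, $\sigma = \sqrt{(1 - A_{FB}^2)/N}$, for an expected yield $N = \sigma_{\rm bin} \times L$, and draws a Gaussian-smeared pseudo-measurement per bin. This is an illustration under stated assumptions, not the ResBos-based procedure used in this study: the bin edges, cross-sections, and asymmetry values in `TOY_BINS` are invented placeholders, and Gaussian smearing is a simplification.

```python
import math
import random

LUMI_FB = 3000.0  # integrated luminosity in fb^-1, as in this study


def afb_stat_uncertainty(afb, n_events):
    """Binomial uncertainty on A_FB for a fixed total number of events."""
    return math.sqrt((1.0 - afb**2) / n_events)


def make_pseudo_data(bins, lumi_fb=LUMI_FB, seed=1):
    """Return (m_low, m_high, n_expected, afb_observed, sigma) per bin.

    `bins` holds (m_low_gev, m_high_gev, xsec_fb, afb_true) tuples.
    """
    rng = random.Random(seed)
    results = []
    for m_low, m_high, xsec_fb, afb_true in bins:
        n_expected = xsec_fb * lumi_fb
        sigma = afb_stat_uncertainty(afb_true, n_expected)
        # Gaussian smearing around the true value (a simplification).
        afb_observed = rng.gauss(afb_true, sigma)
        results.append((m_low, m_high, n_expected, afb_observed, sigma))
    return results


# Placeholder inputs for illustration only; not predictions of any PDF set.
TOY_BINS = [
    (500.0, 600.0, 30.0, 0.60),
    (1000.0, 1200.0, 1.0, 0.61),
    (3000.0, 5000.0, 1.0e-3, 0.62),
]

if __name__ == "__main__":
    for m_low, m_high, n_exp, afb_obs, sigma in make_pseudo_data(TOY_BINS):
        print(f"[{m_low:.0f}, {m_high:.0f}] GeV: N = {n_exp:.0f}, "
              f"A_FB = {afb_obs:.3f} +- {sigma:.3f}")
```

With steeply falling yields, the uncertainty in the highest-mass bins grows quickly, which is why wider bins are used there.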

A. PDF-updating for CT18
First, the CT18 PDF was updated using pseudo-data generated by CT18 itself, and the results are shown for the flavor combinations ū/u and d̄/d. These combinations are chosen because, according to the definition of the dilution factor in Equation (11), they are the flavor combinations to which $A_{FB}$ data are most sensitive. Fig. 5 shows the ratio of CT18 and its uncertainties before and after updating. Since CT18 was used for both the pseudo-data and the theory templates, the central value of the PDF set does not change, as expected. The error band after the update shows only a slight reduction of a few percent compared to the original. This is due to the relatively small number of events in the high invariant mass region. A similar study [5], which used xFitter instead of ePump to update CT18 using $A_{FB}$ from the Z peak region, found a relative improvement in the PDF uncertainty at x = 10$^{-4}$ of ∼63%. We also calculated the result of limiting our inputs to the Z peak region instead of the high mass region, and found an improvement in the PDF uncertainty of ∼53% for the same x.

B. PDF-updating for MSHT20
In this section, another set of PDF updating results is presented, using MSHT20 as the nominal PDF set. The pseudo-data are kept the same as in the previous part of this study. The difference in this section is that the theory templates are generated using the MSHT20 PDF instead of CT18. First, the nonperturbative shape parameters in MSHT20 differ from those in CT18, which leads to different behavior for ū/u and d̄/d and makes it interesting to repeat the study using MSHT20 as the nominal PDF. Second, since the tolerance used in MSHT20 is about 10, which is smaller than that of CT18 by a factor of 3 (the average tolerance for CT18 is about 30 for 68% C.L.), the impact of new data on MSHT20 could be much stronger. A more in-depth discussion of the effect of using different tolerances is given in Appendix A. However, as shown in Fig. 7, the PDF uncertainty of $A_{FB}$ for MSHT20 is much smaller than for CT18. The updating procedure also reflects this smaller prior uncertainty, which, when coupled with the smaller tolerance, leads to a relative change similar to that of the CT18 case.
Fig. 8 shows the PDF updating results when using the pseudo-data generated by CT18, MSHT20, and NNPDF4.0 to update MSHT20.

IV. PDF COMPARISONS
Figure 9 shows the MCFM [18] calculation at NLO for the $A_{FB}$ spectrum and the dilution factor as a function of dilepton invariant mass in the high invariant mass region with the CT18, MSHT20, and NNPDF4.0 PDFs. Both Figures 4 and 9 suggest different behavior for NNPDF4.0 as compared with that of CT18 and MSHT20, and the dilution factor can explain the drop-off in $A_{FB}$ for NNPDF4.0 at high invariant masses compared to CT18 and MSHT20. Figure 9 (right) also shows that the dilution factor for NNPDF4.0 grows as a function of invariant mass.
According to Equation (9), when the dilution factor $D$ is larger than 0.5, a negative $A^h_{FB}$ will be observed, and that appears to be the case for NNPDF4.0 in the very high invariant mass region (> 5000 GeV). A naive toy model has been discussed [23] in order to explain the negative $A_{FB}$ in very high invariant mass regions.
Comparing the $A_{FB}$ and the dilution factor described by CT18 and MSHT20 also reveals some differences in the methodology of the PDF global analysis, with the adoption of different datasets, higher-order theoretical calculations, and different choices of the non-perturbative parameterization forms of various parton PDFs.

V. CONCLUSION
This study presents the result of using the kinematic information from the forward-backward asymmetry in the Drell-Yan process at high invariant mass to update various PDF sets (CT18NNLO, MSHT20, and NNPDF4.0). Pseudo-data were generated for pp collisions at $\sqrt{s}$ = 13 TeV, for an integrated luminosity of 3000 fb$^{-1}$, using the separate PDFs, and input into ePump to update the PDF central value and 68% C.L. variation uncertainties for CT18NNLO and MSHT20. NNPDF4.0 was found to behave differently in $A_{FB}$ versus dilepton invariant mass compared to the other PDFs considered. This is related to the dilution factor, and as such the parts of the PDF most sensitive to these effects are the ratios ū/u and d̄/d. A dataset of 3000 fb$^{-1}$ appears to be sufficient to differentiate among the PDFs, such that if CT18NNLO or MSHT20 were assumed as the underlying theory, but nature behaved closer to NNPDF4.0, the former PDF sets would be updated by the new input data accordingly. While this new information from $A_{FB}$ at high invariant mass is effective at shifting the nominal prediction of PDFs, it was also shown that the effect on the PDF variation uncertainty is fairly modest.
One could also argue against using high mass data in global PDF fits for fear of somehow absorbing new physics into Standard Model (SM) PDFs. However, even low mass data could hide new physics in the tails of some distributions given enough data, and at some point SM-only PDFs would not be able to describe such a departure from the SM. Furthermore, given the different PDF behaviors observed in this study, the community should be aware that some PDFs (such as NNPDF4.0) might predict distributions of $A_{FB}$ versus dilepton invariant mass that are similar to those of many non-resonant new physics models (predicting a high-mass drop-off in $A_{FB}$). It should be noted that the choice of tolerance when using ePump is important, as it causes the new input data to have a stronger (for lower tolerances) or weaker (for higher tolerances) effect on the resulting PDF update. Choosing a tolerance that accurately reflects the new input data is vital to ensure a realistic update.
Further studies are planned specifically for high-mass Drell-Yan searches, to explore the notion that "boutique" PDFs created for specific uses might be developed using specified measurables to help constrain SM backgrounds in regions where, without more precise PDF modeling, new physics might lie hidden. Finally, we note that when real data become available, a full global fit has to be carried out to explore the need for new non-perturbative functional forms of the PDFs at the initial $Q_0$ scale to better describe the data, especially when the data provide new constraints on PDFs at very large or very small x.

APPENDIX A
Using T = 1, as done in the default xFitter profiling [24], gives a much stronger impact in the update compared to T = 10, using the same pseudo-data input. Much more detailed discussions can be found in Appendix F of [11].
For CT18, dynamical tolerances are used in the global QCD analysis. The effective tolerance parameter is about 100 for 90% C.L., or equivalently 37 for 68% C.L. If the tolerance parameter is set to 1 for the 68% C.L., it effectively means that a weight of 37 is given to the new data set. This will dramatically overestimate the impact of the new data set.
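The effect of the tolerance can be made concrete with a one-parameter toy of Hessian updating. The sketch assumes a single eigenvector direction $z$, an observable that responds linearly, $X(z) = X_0 + \Delta X\, z$, and a total $\chi^2(z) = T^2 z^2 + [(X_E - X(z))/\sigma]^2$, where the tolerance parameter enters as $T^2$; minimizing gives the shifted central value, and the curvature gives the reduced uncertainty. It also converts a 90% C.L. tolerance parameter to 68% C.L. by dividing by $1.645^2$, which reproduces the quoted 100 to 37. This is not the ePump interface, and all names and numerical inputs are hypothetical.

```python
import math

Z_90 = 1.645  # Gaussian factor relating 90% C.L. to 68% C.L. intervals


def tolerance_68_from_90(tolerance_sq_90):
    """Convert a 90% C.L. tolerance parameter (T^2) to 68% C.L."""
    return tolerance_sq_90 / Z_90**2


def update_one_eigenvector(x_theory, delta_x, x_data, sigma_data, tolerance_sq):
    """One-parameter Hessian update.

    Minimizes chi2(z) = T^2 z^2 + ((x_data - x_theory - delta_x z) / sigma)^2
    and returns (new central value, new uncertainty) of the observable.
    """
    weight = delta_x**2 / (tolerance_sq * sigma_data**2)
    pull = delta_x * (x_data - x_theory) / (tolerance_sq * sigma_data**2)
    z_best = pull / (1.0 + weight)
    new_central = x_theory + delta_x * z_best
    new_delta = delta_x / math.sqrt(1.0 + weight)
    return new_central, new_delta


if __name__ == "__main__":
    t2_68 = tolerance_68_from_90(100.0)
    print(f"68% C.L. tolerance parameter: {t2_68:.1f}")
    # Hypothetical observable: prior 0.30 +- 0.02, datum 0.25 +- 0.02.
    for t2 in (t2_68, 1.0):
        central, delta = update_one_eigenvector(0.30, 0.02, 0.25, 0.02, t2)
        print(f"T^2 = {t2:5.1f}: updated value {central:.4f} +- {delta:.4f}")
```

With the prior uncertainty held fixed, lowering the tolerance parameter from 37 to 1 multiplies the relative weight of the new datum by 37, so the same pseudo-data pull the central value much further and shrink the uncertainty much more.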

Fig. 2 shows the parton luminosities for the uū, dd̄, ss̄, cc̄, and bb̄ processes. The uū process is dominant in the high invariant mass region. As a result, the $A^h_{FB}$ value of the all-flavor combination is closer to the $A^h_{FB}$ of the uū process.

FIG. 1. The $A_{FB}$ (left) and dilution factor (right) as a function of dilepton mass from MCFM at NLO accuracy in the high mass region ($M_{\ell\ell}$ > 3000 GeV), using the CT18NNLO PDF set. The red, blue, and green curves represent the all-flavor combination, the uū-only contribution, and the dd̄-only contribution, respectively.

FIG. 4. Central value and uncertainty of CT18, MSHT20, and NNPDF4.0 (top), and ratio (bottom) of the central value and uncertainty to the CT18NNLO central value, for the ū/u (left) and d̄/d (right) PDFs.

FIG. 5. PDF update of CT18 for ū/u (left) and d̄/d (right) using $A_{FB}$ pseudo-data generated using CT18. The central value and uncertainty are compared to the CT18 central value.

FIG. 9. Comparison of the $A_{FB}$ (left) and dilution factor (right) with different PDF inputs from MCFM at NLO accuracy in the high mass region ($M_{\ell\ell}$ > 3000 GeV). The band represents the PDF uncertainty.

FIG. 10. PDF update for ū/u (left) and d̄/d (right) using $A_{FB}$ pseudo-data generated using NNPDF4.0. The central value and uncertainty are compared to the CT18 central value. The tolerance is set to 10 and 1, respectively.