Parton Density Uncertainties and the Determination of Electroweak Parameters at Hadron Colliders

We discuss the determination of electroweak parameters from hadron collider observables, focusing on the W-boson mass measurement. We revise the procedures adopted in the literature to include in the experimental analysis the uncertainty due to our imperfect knowledge of the proton structure. We show how treating the uncertainty of the proton parton density functions (PDFs) as a source of systematic error leads to the automatic inclusion in the fit of the bin-bin correlation of the kinematic distributions with respect to PDF variations. In the case of the determination of $m_W$ from the charged-lepton transverse momentum distribution, we observe that the inclusion of this correlation factor yields a strong reduction of the PDF uncertainty, given sufficiently good control over all the other error sources. This improvement depends on a systematic accounting of the features of the QCD-based PDF model, and it is achieved by relying only on the information available in current PDF sets. While a realistic quantitative estimate requires taking into account the details of the experimental systematics, we argue that, in the long run, the proton PDF uncertainty will not be a bottleneck for precision measurements.

Introduction.-The values of the W-boson mass $m_W$ and of the squared sine of the leptonic effective weak mixing angle $\sin^2\theta^{l}_{\rm eff}$ are very precise predictions in the electroweak (EW) sector of the standard model (SM) and allow stringent tests at the level of the quantum corrections. The measurements of these two parameters at the Tevatron and at the LHC indicate [1][2][3][4][5][6] the imperfect knowledge of the proton structure as one of the main sources of systematic uncertainty of theoretical origin. This uncertainty affects the computation of the templates used in the fit of the kinematic distributions, and eventually the determination of the EW parameters.
The proton collinear parton distribution functions (PDFs) suffer from different uncertainties of experimental as well as theoretical origins. The impact of the error of the data from which the PDFs are extracted is represented by sets of functions, Hessian eigenvectors, or Monte Carlo replicas that span in a statistically significant way the functional space of all possible parameterizations. The propagation of the experimental error in the prediction of any observable is achieved by simply repeating the evaluation of the latter several times, with all the available members of the PDF set; the mean and standard deviation are eventually computed, with the latter being the propagation of the experimental error to the observable under study. All the members of a PDF set share some common theoretical features, like the fact that they all obey the perturbative QCD (PQCD) evolution equations and sum rules, and are thus correlated with each other. While this correlation is automatically included in the propagation of the experimental PDF error to the prediction of any observable, the determination of a parameter extracted from the simultaneous fit of several observables requires a careful discussion.
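The replica-based propagation described above can be illustrated with a minimal sketch. The function name `propagate_replicas`, the toy numbers, and the use of the unbiased estimator (`ddof=1`) are assumptions made for illustration only; the prescription of a given PDF set may differ in such details.

```python
import numpy as np

def propagate_replicas(observable_per_replica):
    """Mean and standard deviation of an observable over PDF replicas.

    observable_per_replica: array of shape (n_rep, ...) holding the
    observable evaluated once per replica (replica index on axis 0).
    """
    values = np.asarray(observable_per_replica, dtype=float)
    mean = values.mean(axis=0)
    # ddof=1 (unbiased estimator) is an assumption of this sketch.
    std = values.std(axis=0, ddof=1)
    return mean, std

# Toy input: 100 hypothetical replicas of a 5-bin distribution.
rng = np.random.default_rng(0)
toy = 10.0 + rng.normal(scale=0.3, size=(100, 5))
central, pdf_error = propagate_replicas(toy)
```

Each entry of `pdf_error` is the spread of one bin over the replicas; correlations between bins are not visible at this level, which is the point taken up in the following sections.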
In the case of the W-boson mass determination, the role of the PDFs has been discussed in the articles presenting the experimental results, for those PDF sets used in the analyses, whereas a more general comparison of different parametrizations has been presented in Refs. [7,8]; in all cases, the common outcome is that an uncertainty $\Delta_{\rm PDF} m_W$ at the 10 (20) MeV level is expected in the lepton-pair transverse mass (lepton transverse momentum) case, with the precise value depending on several details of the analyses and on which parametrizations are included in the study. All these studies considered the fit of a kinematic distribution by combining the information of different bins, weighted by their statistical and systematic errors, but neglecting any bin-bin correlation with respect to PDF variations. In Ref. [9], the dependence of the uncertainty on the rapidity range included in the acceptance region was exploited to quantify the benefit given by an $m_W$ measurement at LHCb to the final combination of all the available results, in terms of a reduced PDF uncertainty. The possibility of a systematic extraction of very precise information about the Drell-Yan parton-parton luminosities has been studied in Refs. [10][11][12][13][14], aiming at a better modeling of the initial state and at a consequent reduction of the PDF error on $m_W$. The impact of measurements at different colliders and energies has been scrutinized in Ref. [15].
We plan to revise the propagation of the PDF uncertainties of experimental origin in the determination of a parameter obtained via the fit of a kinematic observable.
Bin-bin correlation and template fit definitions.-At hadron colliders, $m_W$ is determined in the charged-current (CC) Drell-Yan (DY) process from the measurement of observables such as the charged-lepton transverse momentum distribution $d\sigma/dp_\perp^l$ and the lepton-pair transverse mass distribution, for which the Jacobian peak enhances the sensitivity to the position of the pole of the W propagator. The finite rapidity acceptance of the detector and other kinematical constraints induce a sensitivity of the shape of these observables, defined in the transverse plane, to the initial-state proton collinear PDFs.
The PDF uncertainty, represented by a set of replicas with $N_{\rm rep}$ members, affects the normalization but also the shape of the observables. Different bins of the same distribution are correlated with respect to a PDF replica variation, as can be seen in Fig. 1, because of kinematic constraints and due to the theoretical framework shared by all the replicas. In Fig. 1, the sudden and strong change of sign of the $d\sigma/dp_\perp^l$ self-correlation is quite evident across the Jacobian peak at $p_\perp^l \sim 40$ GeV; the self-correlation of the $d\sigma/dx_1$ distribution also signals the existence of two partonic $x$ ranges, below and above $x \sim 4\times10^{-3}$. The cross-correlations thus establish a link between the parton-parton luminosities (i.e., the source of the PDF uncertainty) and the $d\sigma/dp_\perp^l$ distribution (from which $m_W$ is determined) with a nontrivial underlying correlation pattern.
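A bin-bin correlation matrix of the kind discussed above can be estimated directly from the replicas. The sketch below uses a toy distribution whose replica-to-replica variation changes sign across a central point, mimicking qualitatively the sign flip across the Jacobian peak; the function name `bin_correlation` and all toy inputs are hypothetical.

```python
import numpy as np

def bin_correlation(replica_hist):
    """Pearson correlation between bins with respect to replica variations.

    replica_hist: array of shape (n_rep, n_bins), one histogram per replica.
    Returns an (n_bins, n_bins) correlation matrix.
    """
    return np.corrcoef(np.asarray(replica_hist, dtype=float), rowvar=False)

# Toy replicas: one common shape mode that is negative in the first two
# bins and positive in the last two, plus a small uncorrelated noise.
rng = np.random.default_rng(0)
shape_mode = np.array([-1.0, -0.5, 0.5, 1.0])
amplitudes = rng.normal(size=200)
noise = 0.01 * rng.normal(size=(200, 4))
toy_hist = 100.0 + amplitudes[:, None] * shape_mode + noise
corr = bin_correlation(toy_hist)
```

In this toy, bins on the same side of the sign change are strongly correlated, while bins on opposite sides are strongly anticorrelated.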
The determination of a Lagrangian parameter from a kinematic distribution via a template fit requires the choice of a Lagrangian density (in our case, the SM one) and of a tool that simulates the observables computed in that model in a well-defined setup. The simulation tool is fully specified by the choice of a proton PDF parametrization, whereas the parameter (e.g., $m_W$) is left free to vary when comparing to the experimental data. In this construction, the PDF replicas represent a one-parameter family of models to analyze the data.
The equivalence of the replicas in the proton description represents a source of theoretical systematic error when we try to determine $m_W$ from the fit of a kinematic distribution. We account for this systematic effect in the following $\chi^2$ definition:

$$\chi^2_k = \sum_{i\in{\rm bins}} \frac{\left(T_{0,k} + \sum_{r\in R}\alpha_r S_{r,k} - D^{\rm exp}\right)_i^2}{\sigma_i^2} + \sum_{r\in R}\alpha_r^2 , \tag{1}$$

where, in bin $i$ of the distribution, we have the following quantities: $T_{0,k}$ is our fitting model based on the average replica 0 of the PDF set $R$, computed with the $k$th W-boson mass hypothesis $m_{W,k}$; $D^{\rm exp}$ is the experimental value, and $\sigma_i$ is its statistical error; the differences $S_{r,k} = T_{r,k} - T_{0,k}$ are computed for each member $r$ of the PDF set and are treated as nuisances with fit parameters $\alpha_r$. The quadratic penalty factor $\sum_{r\in R}\alpha_r^2$ corresponds to having assumed a Gaussian penalty for the replicas with respect to the central replica of the set. Since the templates are in general affected by statistical Monte Carlo and experimental errors, these contributions should also be taken into account in the covariance matrix introduced below. By repeating the minimization of $\chi^2_k$ with respect to the $\alpha_r$ for different values of $m_{W,k}$, the minimum of the sequence labeled by $k$ selects the preferred $m_W$ value, and the $\Delta\chi^2 = 1$, 4, and 9 rules identify the one, two, and three standard deviation intervals due to the PDF uncertainty. For a given $m_{W,k}$, the minimum of the $\chi^2$ expression in Eq. (1) can be written [16]

$$\chi^2_{k,\min} = \sum_{i,j\in{\rm bins}} \left(T_{0,k} - D^{\rm exp}\right)_i \left(C^{-1}\right)_{ij} \left(T_{0,k} - D^{\rm exp}\right)_j , \tag{2}$$

with the bin-bin covariance matrix $C$ computed with respect to PDF variations and including the statistical and systematic error contributions [17].
$$C = \Sigma_{\rm PDF} + \Sigma_{\rm stat} + \Sigma_{\rm MC} + \Sigma_{\rm exp,syst} , \tag{3}$$

where $\Sigma_{\rm PDF}$ is the bin-bin covariance matrix with respect to PDF variations; $\Sigma_{\rm stat}$ is a diagonal matrix with the statistical variances on each bin of the distribution, estimated for a given integrated luminosity $L$; $\Sigma_{\rm MC}$ is the diagonal matrix of the squared Monte Carlo error of the templates; and $N_{\rm cov}$ is the number of PDF replicas used to compute the PDF covariance matrix [19]. We introduce in the full covariance matrix an additional term $\Sigma_{\rm exp,syst}$ to account for experimental systematics, although their faithful description depends on the details of each experiment. In Eq. (3), we approximate $\Sigma_{\rm exp,syst}$ by using the detector model of the Compact Muon Solenoid (CMS) experiment presented in Ref. [20]. We stress that in this Letter all the replicas are treated as equivalent; i.e., we do not anticipate the impact that future measurements may have in reducing the PDF uncertainty.
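The equivalence between the nuisance-parameter form of Eq. (1) and the covariance form of Eq. (2) can be checked numerically in a toy setup. The sketch below is illustrative and rests on stated assumptions: the PDF term of the covariance is taken as the unnormalized sum $\sum_r S_r S_r^{T}$, which is what a unit-weight penalty $\sum_r \alpha_r^2$ implies (a normalization by the number of replicas would correspond to a rescaling of the $\alpha_r$), and only the statistical diagonal term is added. The function names and toy numbers are hypothetical.

```python
import numpy as np

def chi2_nuisance(d, S, sigma2):
    """Minimum over alpha of
    sum_i (d + sum_r alpha_r S_r)_i^2 / sigma2_i + sum_r alpha_r^2.

    d: (n_bins,) residual T_0 - D; S: (n_rep, n_bins) shifts T_r - T_0;
    sigma2: (n_bins,) statistical variances.
    """
    W = 1.0 / sigma2
    A = S.T                                      # (n_bins, n_rep)
    lhs = A.T @ (W[:, None] * A) + np.eye(S.shape[0])
    rhs = -A.T @ (W * d)
    alpha = np.linalg.solve(lhs, rhs)            # normal equations
    resid = d + A @ alpha
    return np.sum(resid**2 * W) + np.sum(alpha**2)

def chi2_covariance(d, C):
    """Quadratic form d^T C^{-1} d."""
    return d @ np.linalg.solve(C, d)

# Toy inputs (hypothetical numbers).
rng = np.random.default_rng(0)
n_rep, n_bins = 20, 6
S = 0.1 * rng.normal(size=(n_rep, n_bins))
sigma2 = np.full(n_bins, 0.04)
d = 0.2 * rng.normal(size=n_bins)

# Assumption of this sketch: unnormalized PDF covariance plus stat term.
C = S.T @ S + np.diag(sigma2)
chi2_a = chi2_nuisance(d, S, sigma2)
chi2_b = chi2_covariance(d, C)
```

The two numbers agree to numerical precision, which follows from the Woodbury matrix identity. In practice the covariance form avoids fitting one nuisance parameter per replica.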
The approach that we are proposing to include the PDF uncertainty on an EW parameter has to be compared with what has been used in the past, e.g., in Refs. [7,8], where the analysis relied on the minimization of a $\chi^2$ defined as

$$\chi^2_{k,r,\text{no-cov}} = \sum_{i\in{\rm bins}} \frac{\left(T_{0,k} - D_r\right)_i^2}{\sigma_i^2} , \tag{6}$$

where $D_r$ denotes the pseudodata distribution computed with replica $r$, treating the contributions of different bins as independent and weighting them with their statistical error. The templates were generated with the central PDF replica 0 for different mass hypotheses $k$; the distributions, computed with $N_{\rm rep}$ different replicas, were treated as independent pseudodata; and the minimization was repeated separately for each of them. The resulting $N_{\rm rep}$ preferred $m_{W,r}$ values were eventually analyzed by computing the mean value and standard deviation and ignoring the associated values of $\chi^2_{k,r,\text{no-cov}}$; the standard deviation was taken as the estimate of the PDF uncertainty. A similar $\chi^2$ definition, including only diagonal contributions, has been used up to now by the experimental collaborations at the Tevatron and LHC.
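The earlier procedure described above (one diagonal fit per replica, then mean and standard deviation of the preferred masses) can be sketched as follows. Everything in the toy is an assumption made for illustration: the Gaussian-shaped "distribution" peaked at $m_W/2$, the way replicas are mimicked by a small random tilt, and the uniform statistical variance. It is not the setup of Refs. [7,8].

```python
import numpy as np

def best_mass_diagonal(templates, masses, pseudodata, sigma2):
    """Mass hypothesis minimizing a diagonal (no-covariance) chi2."""
    chi2 = np.sum((templates - pseudodata) ** 2 / sigma2, axis=1)
    return masses[np.argmin(chi2)]

def pdf_uncertainty_no_cov(templates, masses, replica_pseudodata, sigma2):
    """Fit each replica pseudodata set separately, then return the mean
    and standard deviation of the preferred masses."""
    fitted = np.array([best_mass_diagonal(templates, masses, d, sigma2)
                       for d in replica_pseudodata])
    return fitted.mean(), fitted.std(ddof=1)

# Toy model (hypothetical): a peak located at m_W / 2, 1 MeV mass grid.
p = np.linspace(30.0, 50.0, 41)
masses = 80.385 + 1e-3 * np.arange(-100, 101)

def toy_shape(m):
    return np.exp(-0.5 * ((p - 0.5 * m) / 3.0) ** 2)

templates = np.array([toy_shape(m) for m in masses])

# Replicas mimicked by a small random tilt of the central distribution.
rng = np.random.default_rng(1)
tilts = rng.normal(scale=0.01, size=50)
replica_pseudodata = toy_shape(80.385) * (1.0 + tilts[:, None] * (p - 40.0) / 40.0)
sigma2 = np.full(p.size, 1e-4)

mean_mass, pdf_error = pdf_uncertainty_no_cov(
    templates, masses, replica_pseudodata, sigma2)
```

Here `pdf_error` plays the role of the PDF uncertainty in the no-covariance prescription: each replica shifts the preferred mass, and no information on how the bins move together is used.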
Numerical results.-We perform all the simulations using the CC-DY event generator provided in the POWHEG-BOX [21,22], showered with PYTHIA 8.2 [23], setting $\sqrt{S} = 13$ TeV. We restrict ourselves to $W^+$ production without hindering the generality of our arguments. We apply the acceptance cut $|\eta_l| < 2.5$. We use for our analysis the PDF set NNPDF30_nlo_as_0118_1000 [24], featuring $N_{\rm rep} = 1000$ replicas.
In Eq. (2), the templates are computed using the replica 0 of the PDF set, scanning $m_W$ with a 1 MeV spacing in the interval $m_W \in [80.035, 80.735]$ GeV. We let the distribution computed with the central replica 0 of the PDF set, and with a fixed $m_{W,0} = 80.385$ GeV value, play the role of the experimental data $D^{\rm exp}$; this choice does not spoil the validity of the method and of the conclusion, and it offers a sanity check on the fit results. The covariance matrix is evaluated with the $N_{\rm rep}$ replicas. We checked that the dependence of the covariance matrix on $m_W$, in the interesting range of $\pm 20$ MeV around the central value of 80.385 GeV, is small; therefore we neglected it in the numerical analysis. The statistical error on the pseudodata is estimated by assuming two different luminosities: 1 and 300 fb$^{-1}$.
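The template scan and the $\Delta\chi^2 = 1$ rule can be sketched as follows, using the same mass grid as in the text (1 MeV spacing between 80.035 and 80.735 GeV). The toy distribution, the diagonal toy covariance, and the function names are assumptions made for illustration; in the analysis described here the covariance matrix would include the PDF term.

```python
import numpy as np

def chi2_scan(templates, data, cov):
    """Covariance-form chi2 for every mass hypothesis.

    templates: (n_mass, n_bins); data: (n_bins,); cov: (n_bins, n_bins).
    """
    resid = templates - data
    solved = np.linalg.solve(cov, resid.T)       # C^{-1} d for all hypotheses
    return np.einsum("kb,bk->k", resid, solved)

def delta_chi2_interval(masses, chi2, delta=1.0):
    """Preferred mass and half-width of the Delta chi2 <= delta region.
    Assumes a single minimum within the scanned range."""
    i_min = int(np.argmin(chi2))
    inside = masses[chi2 <= chi2[i_min] + delta]
    return masses[i_min], 0.5 * (inside.max() - inside.min())

# Mass grid of the text: 701 hypotheses, index 350 is 80.385 GeV.
masses = 80.035 + 1e-3 * np.arange(701)
p = np.linspace(30.0, 50.0, 41)

# Toy templates (hypothetical shape): a peak located at m_W / 2.
templates = np.exp(-0.5 * ((p[None, :] - 0.5 * masses[:, None]) / 3.0) ** 2)
data = templates[350]                            # central template as pseudodata
cov = 1e-4 * np.eye(p.size)                      # toy diagonal covariance

chi2 = chi2_scan(templates, data, cov)
best_mass, half_width = delta_chi2_interval(masses, chi2)
```

Using the central template as pseudodata reproduces the sanity check mentioned in the text: the scan must return the input mass with a vanishing minimum $\chi^2$.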
Since the value of the PDF uncertainty affecting the $m_W$ determination is sensitive to the fit window $[p_\perp^{\min}, p_\perp^{\max}]$, we perform a scan in the two values $p_\perp^{\min}$ and $p_\perp^{\max}$ and plot, for each point in this plane, the uncertainty value corresponding to the half-width of the $\Delta\chi^2 = 1$ interval. To present a comparison with the previous approaches, we perform an analysis using the prescription of Eq. (6), using 200 replicas (this time with a fixed $m_{W,0} = 80.385$ GeV value) as distinct pseudodata distributions; we generate the templates with the replica 0. In Fig. 2, we show the analysis of distributions normalized to the cross section integrated in the fitting interval. The results, consistent with those presented in Ref. [8] and labeled in all the figures by "Bozzi et al.," show a weak sensitivity to the upper limit of the fit window but a clear dependence on its lower limit.
In Figs. 3 and 4, we present the results based on Eq. (2), in the case of normalized distributions, assuming an experimental integrated luminosity $L_{\rm int} = 1$ and 300 fb$^{-1}$, respectively, as well as no template Monte Carlo error. Figure 5 also corresponds to 300 fb$^{-1}$, but we now include a Monte Carlo error extrapolated to a statistics of $10^{10}$ events [25]. The statistical error is dominant in Fig. 3, whereas it becomes negligible at high luminosity, revealing a strong reduction of the PDF uncertainty, down to the $O(1\ {\rm MeV})$ level. The Monte Carlo error of the templates has a visible impact, as shown in Fig. 5, and would become negligible in a sample with 200 billion events. We remark on the weak sensitivity of the results to the fit window. Finally, we show in Fig. 6 the results corresponding to 300 fb$^{-1}$ and a systematic error on the muon momentum reconstruction simulated via the model of Ref. [20]. The covariance matrix used in Fig. 5 is added to the one coming from the detector effects, estimated using 100 toys. The negative impact in the description of the peak region balances the improved control on the tails of the distribution, increasing the size of the total error. We have checked that a reduction by a factor of 10 of the Gaussian smearing of the lepton momentum would lead to uncertainties close to the ones shown in Fig. 4. Other sources of theoretical systematics, such as perturbative QCD or parton shower uncertainties, could become one of the limiting factors for the $m_W$ determination, and they will be considered in a future publication.
We observe that this approach strongly reduces the impact of PDF uncertainties because of the specific structure of the bin-bin PDF covariance matrix $\Sigma_{\rm PDF}$ of the $p_\perp^l$ distribution, with the presence of quite distinct blocks formed by the bins below and above the Jacobian peak [26]. The eigenvalue spectrum of $\Sigma_{\rm PDF}$ covers more than seven orders of magnitude between the largest and the smallest elements in absolute value. The broad range of the eigenvalues induces a very narrow shape of the $\chi^2$ distribution as a function of $m_W$, implying a strong penalty factor for all the templates that do not perfectly overlay their peak position with that of the data. The penalty applies to the differences in the tails of the $p_\perp^l$ distribution while, at the same time, an excellent sensitivity to $m_W$ at the 1 MeV level, given by the granularity of the templates, is preserved, as we explicitly verified as a sanity check of the approach. The important role played by $\Sigma_{\rm PDF}$ is partially smeared by the interplay between PDFs and statistical and systematic errors. Since $C = \Sigma_{\rm PDF} + \Sigma_{\rm stat} + \Sigma_{\rm MC} + \Sigma_{\rm exp,syst}$, at low luminosities or low template accuracy the statistical error has a nontrivial interplay with the PDF error, yielding larger uncertainties than the values obtained for each class of errors alone; at high luminosities, with highly accurate templates, we instead approach the limit $C \simeq \Sigma_{\rm PDF}$ and the corresponding strong uncertainty reduction. Similar comments apply to the inclusion of the experimental systematic errors.
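The span of an eigenvalue spectrum, of the kind quoted above for the PDF covariance matrix, can be measured with a short helper. This is only a sketch: the function name `eigen_span_orders` and the toy matrix are hypothetical, and for a covariance estimated from fewer replicas than bins the smallest eigenvalues would be dominated by numerical noise and should be interpreted with care.

```python
import numpy as np

def eigen_span_orders(cov):
    """Orders of magnitude between the largest and smallest nonzero
    eigenvalue (in absolute value) of a symmetric matrix."""
    eig = np.abs(np.linalg.eigvalsh(cov))
    eig = eig[eig > 0]                 # exact zeros carry no information
    return np.log10(eig.max() / eig.min())

# Toy symmetric matrix with a known, widely spread spectrum.
cov_toy = np.diag([1e-8, 1e-3, 1.0])
orders = eigen_span_orders(cov_toy)
```

A wide spectrum means that some combinations of bins are constrained far more tightly than others, which is what makes the covariance-form $\chi^2$ steep along those directions.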
The PDF uncertainty band of the $p_\perp^l$ distribution is given by a combination of perturbative and nonperturbative effects, which cannot be analytically separated; although the PQCD elements (Dokshitzer-Gribov-Lipatov-Altarelli-Parisi equations, QCD sum rules) in the proton description cannot be qualified as uncertainty sources, they nevertheless enter in the generation of the uncertainty band because of their entanglement with the data. The covariance matrix allows the effective encoding of a substantial piece of information of PQCD origin, which should not be qualified as uncertainty, and includes it in the fit.

FIG. 3. PDF error as a function of the fit window, expressed by its minimum and maximum $p_\perp^l$ values. The error is estimated from a fit of shape distributions, using a covariance matrix obtained by summing the PDF one with a statistical (diagonal) error on the pseudodata corresponding to 1 fb$^{-1}$.

FIG. 4. Same as in Fig. 3, but assuming $L_{\rm int} = 300$ fb$^{-1}$.

FIG. 5. Same as in Fig. 4, but also including a Monte Carlo error on the templates corresponding to $10^{10}$ events.
The description of the proton in terms of a QCD-inspired model and the representation of the uncertainty via Monte Carlo replicas are thus the two elements allowing the PDF uncertainty reduction. The discussion of the PDF sets representing the associated uncertainty via Hessian eigenvectors will be presented in a future publication.
In conclusion, we have studied the theoretical systematic error due to the PDF uncertainty, focusing on the determination of the W-boson mass from the DY $d\sigma/dp_\perp^l$ distribution. We included this systematic effect in the $\chi^2$ that we use in the data fitting, achieving the automatic inclusion of the bin-bin correlation with respect to PDF variations. We observe a drastic reduction of the PDF uncertainty on $m_W$, which we explain as a consequence of the strong kinematic correlation, of PQCD origin, of the bins above and below the Jacobian peak of the distribution. The interplay of the PDF with the statistical and experimental systematic errors yields nontrivial results when the statistics are limited and the systematics are not fully understood. We consider this approach promising in view of the reduction of one of the bottlenecks limiting, so far, the high-precision determination of $m_W$ at hadron colliders. The formulation of Eq. (2) is well suited for a direct and efficient inclusion of the PDF uncertainty in the analysis of the experimental data. The use of this information should not be limited to the fit of $m_W$, but it should also be part of the determination of any Lagrangian parameter derived in the analysis of LHC observables. The inclusion of further observables sensitive to the QCD model in a global fit, such as $p_\perp^Z$, and of the corresponding cross-correlations might provide additional benefits to the $m_W$ determination. However, to properly assess the impact of such a procedure, a detailed study beyond the scope of this Letter is needed.

E. B. is supported by the Paul Scherrer Institut. We thank L. Bianchini, S. Camarda, E. Manca, M. Mangano, G. Rolandi, and L. Silvestrini for useful discussions. We also thank the DESY IT department for making available to us the computational resources of the BIRD/NAF2 cluster, which have been extensively used for this work. A. V. is supported by the European Research Council under the European Union's Horizon 2020 research and innovation program (Grant Agreement No. 740006).