Study of $Z$ bosons produced in association with charm in the forward region

Events containing a $Z$ boson and a charm jet are studied for the first time in the forward region of proton-proton collisions. The data sample used corresponds to an integrated luminosity of $6 \, {\rm fb}^{-1}$ collected at a center-of-mass energy of 13 TeV with the LHCb detector. In events with a $Z$ boson and a jet, the fraction of charm jets is determined in intervals of $Z$-boson rapidity in the range $2.0<y(Z)<4.5$. A sizable enhancement is observed in the forward-most $y(Z)$ interval, which could be indicative of a valence-like intrinsic-charm component in the proton wave function.

The possibility that the proton wave function may contain a |uudcc component, referred to as intrinsic charm (IC), in addition to the charm content that arises due to perturbative gluon radiation, i.e. g → cc splitting, has been debated for decades (for a recent review, see Ref. [1]). The light front QCD calculations of Refs. [2,3], referred to as the BHPS model, predict that non-perturbative IC manifests as valence-like charm content in the parton distribution functions (PDFs) of the proton; whereas, if the c-quark content is entirely perturbative in nature, the charm PDF resembles that of the gluon and sharply decreases at large momentum fractions, x. (Charge conjugation is implied throughout this Letter, e.g., charm refers to both the c andc quarks.) Understanding the role that non-perturbative dynamics play inside the nucleon is a fundamental goal of nuclear physics [4][5][6][7][8][9][10][11][12][13][14][15]. Furthermore, the existence of IC would have many phenomenological consequences. For example, IC would alter both the rate and kinematics of c hadrons produced by cosmic-ray proton interactions in the atmosphere, which are an important source of background in studies of astrophysical neutrinos [16][17][18][19][20][21]. The cross sections of many processes at the LHC and other accelerators would also be affected [22][23][24][25][26][27][28][29][30][31][32].
Measurements of c-hadron production in deep inelastic scattering [33] and in fixedtarget experiments [34], where the typical momentum transfers were Q 10 GeV (natural units are used throughout this Letter), have been interpreted both as evidence for [35,36] and against [37] the percent-level IC content predicted by BHPS. Even though such experiments are in principle sensitive to valence-like c-quark content, interpreting these low-Q data is challenging since it requires careful theoretical treatment of nonperturbative hadronic and nuclear effects. Recent global PDF analyses, which also include measurements from the LHC, are inconclusive and can only exclude IC carrying more than a few percent of the momentum of the proton [38,39].
Reference [29] proposed probing IC by studying events containing a Z boson and a charm jet, Zc, in the forward region of proton-proton (pp) collisions at the LHC. The ratio of production cross sections R c j ≡ σ(Zc)/σ(Zj), where Zj refers to events containing a Z boson and any type of jet, was chosen because it is less sensitive than σ(Zc) to experimental and theoretical uncertainties. Since Zc production is inherently at large Q, above the electroweak scale, hadronic effects are small. A leading-order Zc production mechanism is gc → Zc scattering (see Fig. 1), where in the forward region one of the initial partons must have large x, hence Zc production probes the valence-like region (Fig. S4 of the Supplemental Material shows the x regions probed). Using next-to-leading-order (NLO) Standard Model (SM) calculations, Fig. 2 illustrates that a percent-level valence-like IC contribution would produce a clear enhancement in R c j for large (more forward) values of Z rapidity, y(Z); whereas only small effects are expected in the central region where all previous measurements of R c j were made [40,41].    [41] is used for y(Z) < 2; otherwise the fiducial region of this analysis is employed. The broadening of the error band that arises in the forward region, when allowing for IC, is due to the lack of sensitivity to valence-like IC from previous experiments. More details on these calculations are provided in the Supplemental Material [43]. The error bands shown for the first two predictions display the 68% confidence-level regions. Only the central value is shown for BHPS due to the charm PDF being fixed. Z bosons p T (µ) > 20 GeV, 2.0 < η(µ) < 4.5, 60 < m(µ + µ − ) < 120 GeV Jets 20 < p T (j) < 100 GeV, 2.2 < η(j) < 4.2 Charm jets p T (c hadron) > 5 GeV, ∆R(j, c hadron) < 0.5 Events ∆R(µ, j) > 0.5 This Letter presents the first measurement of R c j in the forward region of pp collisions. The data sample used corresponds to an integrated luminosity of 6 fb −1 collected at a center-of-mass energy of √ s = 13 TeV with the LHCb detector. The Z bosons are reconstructed using the Z → µ + µ − decay, where henceforth all Z/γ * → µ + µ − production in the mass range 60 < m(µ + µ − ) < 120 GeV is labeled Z → µ + µ − . The analysis is performed using jets clustered with the anti-k T algorithm [44] using a distance parameter R = 0.5. The fiducial region is defined in terms of the transverse momentum, p T , pseudorapidity, η, and azimuthal angle, φ, of the muon and jet momenta, and includes a requirement on ∆R(µ, j) ≡ ∆η(µ, j) 2 + ∆φ(µ, j) 2 to ensure that the muons and jets are well separated, which suppresses backgrounds from QCD multijet events and electroweak processes like W +jet production. Charm jets are the subset for which there is a promptly produced and weakly decaying c hadron within the jet. The fiducial region is defined in Table 1. If multiple jets satisfy these criteria, the one with the highest p T is selected. No requirement is placed on the maximum number of jets in the event.
The quantity R c j is measured in intervals of y(Z) as R c j = N (c-tag)/[ε(c-tag)N (j)], where N (c-tag) is the observed Zc yield, ε(c-tag) is the c-tagging efficiency, and N (j) is the total Zj yield. The integrated luminosity does not enter this expression because R c j involves a ratio of production cross sections. In addition, the muon and jet reconstruction efficiencies largely cancel in the ratio due to the similarity of the Z-boson and jet kinematics in Zc and Zj production. The c-tagging algorithm, which is described in detail in Ref. [45], looks for a displaced-vertex (DV) signature inside the jet cone that is indicative of the weak decay of a c hadron.
The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, described in detail in Refs. [46,47]. Simulated data samples are used to evaluate the detector response for jet reconstruction, including the c-tagging efficiency, and to validate the analysis. In the simulation, pp collisions are generated using Pythia [48] with a specific LHCb configuration [49]. Decays of unstable particles are described by EvtGen [50], in which final-state QED radiation is generated using Photos [51]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [52] as described in Ref. [53].
The online event selection is performed by a trigger [54,55] consisting of a hardware stage using information from the calorimeter and muon systems, followed by a software stage that performs a full event reconstruction. At the hardware stage, events are required to have a muon with p T (µ) > 6 GeV. In the software stage, the muon track is required to be of good quality and to have p T (µ) > 10 GeV. The offline selection builds Z → µ + µ − candidates from two oppositely charged muon tracks that must be in the fiducial region defined in Table 1 and consistent with originating directly from the same pp collision.
Jet reconstruction is performed offline by clustering charged and neutral particle-flow candidates [56] using the anti-k T clustering algorithm as implemented in FastJet [57]. Reconstructed jets with 15 < p T (j) < 100 GeV and 2.2 < η(j) < 4.2 are kept for further analysis. Jets with 15 < p T (j) < 20 GeV, which are outside of the fiducial region, are retained for use when unfolding the detector response. The η(j) requirement, which is included in the fiducial region and was first used in Refs. [58][59][60], ensures a nearly uniform c-tagging efficiency of about 24%, with minimal p T (j) or η(j) dependence. The fiducial requirement ∆R(µ, j) > 0.5 is applied to reconstructed jets. Finally, the highest-p T jet satisfying these criteria from the same pp collision as the Z boson is selected. After applying all requirements, 68 694 Zj candidates remain in the dataset.
The effects of the detector response on the measured jet momenta are accounted for using an unfolding procedure. This involves first determining the reconstructed Zc and Zj yields in intervals of [y(Z), p T (j)]. The non-Z background is neglected for both Zc and Zj candidates because it is less than 1% and largely cancels in the R c j ratio. The c-jet yields are determined using the DV-based tagging approach described in detail in the following paragraphs. Interval migration is accounted for by unfolding the p T (j) distributions of the Zc and Zj yields in each y(Z) interval independently using an iterative Bayesian procedure [61,62]. The Zc yields are then corrected for c-tagging inefficiency. Finally, the unfolded [y(Z), p T (j)] distributions are integrated for p T (j) > 20 GeV to obtain the Zc and Zj yields used to determine the R c j ratios. The analysis employs three y(Z) intervals with ranges 2.00-2.75, 2.75-3.50, and 3.50-4.50, and four p T (j) intervals ranging 15-20, 20-30, 30-50, and 50-100 GeV, where after unfolding the yields in the three highest p T (j) intervals are summed to obtain R c j . The signature of a c jet is the presence of a long-lived c hadron that carries a sizable fraction of the jet energy. The tagging of c jets is performed using DVs formed from the decays of such c hadrons. The choice of using DVs and not single-track or other non-DV-based jet properties, e.g. the number of particles in the jet, is driven by the need for a small misidentification probability of light-parton jets. Furthermore, the properties of DVs from c-hadron decays are known to be well modeled by simulation, which means that only small corrections using control samples are required. Since DVs can also be formed from the decays of b hadrons or due to artifacts of the reconstruction, the DV-tagged charm yields are obtained by fitting the distributions of DV features with good discrimination power between c, b, and light-parton jets.
The tracks used as inputs to the DV-tagger algorithm are required to have p T > 0.5 GeV and to be inconsistent with originating directly from a pp interaction point. A DV is associated to a jet when ∆R < 0.5 between the jet axis and the DV direction of flight, defined by the vector from the pp interaction point to the DV position. Requirements that reject strange-hadron decays and particles formed in interactions with material [63] are placed on the mass, m(DV), and momentum, p(DV), of the particles that form the DV, along with the DV position. In addition, only DVs with at most four tracks are used, since higher-multiplicity DVs are almost exclusively due to b-hadron decays. More details about the c-tagging algorithm are provided in Ref. [45].
Two DV properties are used to separate charm jets from beauty and lightparton jets: the number of tracks in the DV, N trk (DV), and the corrected mass, where θ is the angle between the momentum and the flight direction of the DV. The corrected mass, which is the minimum mass the long-lived hadron can have that is consistent with the flight direction, peaks near the typical c-hadron mass for c jets, and consequently provides excellent discrimination against other jet types. The DV track multiplicity provides additional discrimination against b jets, since b-hadron decays often produce many displaced tracks. These two distributions are fitted simultaneously to obtain the DV-tagged c-jet yields. The probability density functions, referred to as templates, for c, b, and light-parton jets are obtained from calibration data samples that are each highly enriched in a given jet flavor [45]. Figure 3 shows the m cor (DV) and N trk (DV) distributions for all DV-tagged candidates in the Zj data sample reconstructed in the fiducial region, along with the fit projections; such fits are performed in each [y(Z), p T (j)] interval to obtain the reconstructed Zc yields.
The effects of p T (j) interval migration are corrected for using the unfolding procedure. The detector response is studied using the p T -balance distribution p T (j)/p T (Z) for Zj candidates that are nearly back-to-back in the transverse plane, using the same technique as in Refs. [56,64]. Small adjustments are applied to the p T (j) scale and resolution in simulation to obtain the best agreement with data. In addition, for the Zc and Zj samples the p T (j) and p T (DV) distributions in simulation are adjusted to match those observed in data. The unfolding matrix for jets that contain a reconstructed DV is shown in Fig. 4, while the corresponding matrix for inclusive Zj production is provided in the Supplemental Material [43].
The dominant systematic uncertainty is due to limited knowledge of the c-tagging efficiency, which is measured in p T (j) intervals using data in Ref. [45] and briefly summarized here. Scale factors that correct for discrepancies between data and simulation are determined using a tag-and-probe approach on a dijet calibration sample. A stringent requirement is applied to the tag jet which enriches the probe-jet sample in charm content. The DV-tagged c-jet yield in the probe sample is obtained in the same way the Zc yield LHCb Figure 4: The detector-response matrix for c-tagged jets. The shading represents the intervalto-interval migration probabilities ranging from (white) 0 to (black) 1. Numerical labels are only shown for values greater than 1%. Jets with true (reconstructed) p T (j) in the 20-100 GeV region but for which the reconstructed (true) p T (j) is either below 15 GeV or above 100 GeV are included in the unfolding but not shown graphically.
is determined in this analysis, namely by fitting the m cor (DV) and N trk (DV) distributions for DV-tagged probe jets. The total number of c jets in the probe sample is obtained by fully reconstructing the D 0 → K − π + and D + → K − π + π + decays, obtaining the prompt-charm yields by fitting the D-meson mass and impact-parameter distributions, then correcting these yields for the detector response, decay branching fractions [65], and c-hadron fragmentation fractions [66]. The c-tagging efficiency is the ratio of the DV-tagged and total c-jet probe-sample yields. The scale factors that correct the c-tagging efficiency in simulation are determined to be 1.03 ± 0.06, 1.01 ± 0.08, and 1.09 ± 0.17 in the 20-30, 30-50, and 50-100 GeV p T (j) intervals, respectively, with corresponding c-tagging efficiencies of (23.9 ± 1.4)%, (24.4 ± 1.9)%, and (23.6 ± 4.1)%. These uncertainties, which include all statistical and systematic contributions, are propagated to the R c j results producing 6-7% relative uncertainties in each y(Z) interval.
Other sources of smaller systematic uncertainty are also considered. First, variations of the m cor (DV) and N trk (DV) templates are studied, which arise from using different strategies to model the backgrounds in the highly enriched calibration data samples. However, the shifts observed in the Zc yields largely cancel with the corresponding shifts seen in ε(c-tag). The residual differences of 3-4% in each y(Z) interval are assigned as systematic uncertainties. The ratio of the jet-reconstruction efficiency for c and inclusive jets is consistent with unity in all kinematic intervals in simulation, with a 1% systematic uncertainty assigned due to the limited sample sizes. Finally, the statistical precision of the back-to-back Zj sample used to determine the p T (j) scale and resolution is propagated through the unfolding procedure resulting in a 1% relative systematic uncertainty on R c j . The systematic uncertainties are summarized in Table 2. Figure 5 shows the measured R c j distribution in intervals of y(Z); the numerical results are provided in Table 3, and additional results are reported in the Supplemental Material [43]. The measured R c j values are compared to NLO SM calculations [29] based on Refs. [67][68][69][70][71][72][73], which are validated against additional predictions [70,71,74,75] and updated here to use more recent PDFs [38,39,42,76,77]. While Zc predictions at NNLO in QCD are not available, Zb predictions are [78], and similar methods should be applicable. The NNPDF analysis provides results where the charm PDF is allowed to vary, both in size and in shape [39]. The sizable uncertainties that arise in the forward region are due to the lack of sensitivity to valence-like IC from previous experiments. Reference [38] updated the CT14 analysis [79] to include the IC content predicted by BHPS [2,3], which results in the enhancement at forward y(Z) shown previously in Fig. 2. These predictions have smaller uncertainties because the size and shape of the IC contribution are fixed, i.e. the IC contribution is assumed to be known, hence does not contribute to the uncertainty on R c j . More details on the theory calculations, along with predictions based on other PDFs [80][81][82], are provided in the Supplemental Material [43].
The observed R c j values are consistent with both the no-IC and IC hypotheses in the first two y(Z) intervals; however, this is not the case in the forward-most interval where the ratio of the observed to no-IC-expected values is 1.85 ± 0.25. As illustrated in Fig. 2, this is precisely the y(Z) region where valence-like IC would cause a large  Figure 5: Measured R c j distribution (gray bands) for three intervals of forward Z rapidity, compared to NLO SM predictions [29] without IC [42], with the charm PDF shape allowed to vary (hence, permitting IC) [39,76], and with IC as predicted by BHPS with a mean momentum fraction of 1% [38]. The predictions are offset in each interval to improve visibility.  Fig. 5 shows that, after including the IC PDF shape predicted by BHPS with a mean momentum fraction of 1%, the theory predictions are consistent with the data. Incorporating these novel forward R c j results into a global analysis should strongly constrain the large-x charm PDF, both in size and in shape. While the large enhancement in the forward-most y(Z) interval is suggestive of valence-like IC, no definitive statements can be made until the R c j results are included in a global PDF analysis. In conclusion, events containing a Z boson and a charm jet are studied for the first time in the forward region of pp collisions. The data sample used corresponds to an integrated luminosity of 6 fb −1 collected at a center-of-mass energy of 13 TeV with the LHCb detector. The ratio R c j is measured in intervals of y(Z) and compared to NLO SM calculations. The observed spectrum exhibits a sizable enhancement at forward Z rapidities, consistent with the effect expected if the proton wave function contains the |uudcc component predicted by BHPS. However, conclusions about whether the proton contains valence-like intrinsic charm can only be drawn after incorporating these results into global PDF analyses.

Supplemental Material for LHCb-PAPER-2021-029
Study of Z bosons produced in association with charm in the forward region

Theory Predictions
The NLO SM calculations of Ref.
Hadronization was performed with the Pythia 8 event generator, while hadrons were decayed with the EvtGen package [50] interfaced with the Photos final state radiation generator [72]. No requirement is placed on the maximum number of jets in the event.
The results in Ref.
[29] were produced using the CT14 NNLO PDF set [79] for the matrix element, and the corresponding LO PDF set for the parton shower. Using the kinematics of the initiating partons, these results were then weighted in this analysis to produce predictions from more recent PDF sets. The associated systematic uncertainty for these predictions consists of PDF, scale, and strong-coupling uncertainty. Varying the charm mass leads to negligible changes in the R c j predictions. Full correlations were considered when evaluating these uncertainties. The PDF uncertainty is determined using the technique of Monte Carlo PDF replicas [73], while the scale uncertainty is determined from the envelope obtained by independently varying both the factorization and renormalization scales by up to a factor of two. The strong coupling uncertainty is evaluated by varying α s within experimental uncertainty. Both the scale and strong-coupling uncertainties largely cancel in the ratio, leaving the PDF as the dominant source of uncertainty.
These theory predictions are validated by comparing to the R c j results from the CMS collaboration at central y(Z) [41], where there is minimal sensitivity to IC. The CMS collaboration measured R c j to be (10.2 ± 0.9)%. Reference [41] also provides NLO SM predictions, (11.1 ± 0.3)% and (9.0 ± 0.9)%, obtained using the aMC@NLO generator interfaced with the Pythia 8 parton shower using FxFx matching and the fixed-order MCFM generator [74,75], respectively. Repeating the procedure used to obtain the theory predictions presented in this Letter, but in the CMS fiducial region, gives (9.6 ± 1.0)%, (9.5 ± 1.0)%, and (9.7 ± 1.0)% for the PDF4LHC15 no IC, NNPDF 3.0 IC allowed, and CT14 with BHPS PDFs, respectively. Therefore, the approach used here produces predictions that are consistent both with the CMS measurement and with the theory predictions provided in Ref. [41].
Finally, in Fig. 5 the CT14 with BHPS results are from the PDF set labeled as BHPS3 in Ref. [38], which is provided for the value x IC = 1%.

Additional Numerical Results
Numerical results are provided in Table 3. The statistical uncertainties are uncorrelated between y(Z) intervals, whereas the systematic uncertainties are approximately completely correlated. Since many of the systematic uncertainties are correlated between y(Z) intervals, numerical results are also provided for the ratios of R c j values between pairs of intervals in Table S1.   LHCb Figure S3: The detector-response matrix for inclusive Zj events. The shading represents the interval-to-interval migration probabilities ranging from (white) 0 to (black) 1. Numerical labels are only shown for values greater than 1%. Jets with true (reconstructed) p T (j) in the 20-100 GeV region but for which the reconstructed (true) p T (j) is either below 15 GeV or above 100 GeV are included in the unfolding but not shown graphically.   Figure S5: Measured R c j distribution (gray bands) for three intervals of forward Z rapidity, compared to NLO SM predictions [29] using various PDF sets. The predictions are offset in each interval to improve visibility. The left plot shows that the three PDF sets [76,77,79] on which the PDF4LHC15 [42] set is formed from all provide consistent predictions for R c j . The right plot shows that the ABM16 [80], JR14 [81], and HERAPDF 2.0 [82] PDF sets also provide qualitatively similar predictions, though the JR14 and HERAPDF 2.0 predictions are shifted to lower R c j values.