Unraveling the couplings of a Drell-Yan produced $Z'$ with heavy-flavor tagging

Despite no new physics so far at the LHC, a $Z'$ boson with $m_{Z'} \sim 100$ GeV could still emerge via Drell-Yan (DY) production, $q \bar q \to Z' \to \mu^+ \mu^-$, in the next few years. To unravel the nature of the $Z'$ coupling, we utilize the $c$- and $b$-tagging algorithms developed by ATLAS and CMS to investigate $cg \to c Z'$ at 14 TeV LHC. While light-jet contamination can be eliminated, mistagged $b$-jets cannot be rejected in any of the tagging schemes we adopt. On the other hand, for nonzero $bbZ'$ coupling, far superior $b$-tagging could discover the $bg \to b Z'$ process, where again light-jet mistag can be ruled out, but mistagged $c$-jets cannot yet be excluded. Provided that DY production is discovered soon enough, we find that a simultaneous search for $c g \to c Z'$ and $b g \to b Z'$ can conclusively discern the nature of $Z'$ couplings involved.


I. INTRODUCTION
A $Z'$ boson a few hundred GeV in mass could still emerge via the Drell-Yan (DY) process, $q\bar q \to Z' \to \mu^+\mu^-$, for $qqZ'$ couplings that are weaker than the analogous Standard Model (SM) couplings. Recent searches [1,2] set stringent bounds on the couplings of such a $Z'$ boson to $u$, $d$ and $s$ quarks, but the limits are much weaker for $c$ or $b$ quarks, hence discovery is possible within the next few years. One such scenario [3] involves a $Z'$ that couples to $c$ quarks, leading to DY production $c\bar c \to Z' \to \mu^+\mu^-$ at the LHC. The $cg \to cZ' \to c\mu^+\mu^-$ process then offers a unique probe of the flavor structure of the $Z'$ coupling, if the $c$-jet flavor can be identified. Recent developments in $c$-tagging algorithms at ATLAS and CMS [4-6], together with the excellent performance of $b$-tagging [5,7,8], offer such an opportunity. In this paper we discuss how these heavy-flavor taggers can probe the couplings of a $Z'$ after its discovery through the DY process.
We illustrate with the scenario of Ref. [3], where a $Z'$ couples relatively weakly to charm quarks and predominantly to muons. The DY process $pp \to Z' + X \to \mu^+\mu^- + X$ ($X$ being inclusive activity) could emerge in the next few years, and a $ccZ'$ coupling would imply the $cg \to cZ'$ process. We apply the $c$-tagging algorithms to investigate the discovery potential of $pp \to cZ' + X \to c\mu^+\mu^- + X$ (denoted as the $cZ'$ process, with the conjugate process implied) at the $\sqrt{s} = 14$ TeV LHC. The $c$-tagging algorithms of ATLAS [4,5] and CMS [6] discriminate $c$-jets from light-jets (jets originating from $u$, $d$, $s$ quarks and gluons) at the expense of $c$-tag efficiency, while the misidentification (or mistag) rates of $b$-jets as $c$-jets are relatively sizable. If the $Z'$ couples to light $q = u, d, s$ quarks, a potential $cZ'$ signal may arise from mistag (denoted as fake $cZ'$). As the $qqZ'$ coupling is constrained by the search for heavy resonances in the DY process [1], our analysis shows that certain $c$-tagging schemes can completely rule out fake $cZ'$ from light-jets. But these tagging schemes fail to rule out fake $cZ'$ from mistagged $b$-jets.
In case the $Z'$ couples instead to $b$ quarks ($bbZ'$ coupling), $pp \to bZ' + X \to b\mu^+\mu^- + X$ (the $bZ'$ process) would emerge after the discovery in DY. This process could be observed with the well-developed $b$-tagging algorithms [5,7,8], which provide excellent discrimination against light- and $c$-jets while maintaining high $b$-tagging efficiency. We find that the current limit on the $ccZ'$ coupling allows for fake $bZ'$ discovery at the LHC due to mistag of $c$-jets as $b$-jets. However, this fake $bZ'$ possibility could be ruled out if no $Z'$ emerges in the DY process with $\sim 250$ fb$^{-1}$ of data. We find that, if a $Z'$ is discovered via the DY process in the next few years, combining the $cZ'$ and $bZ'$ signatures with current limits from heavy resonance searches allows one to conclusively infer the nature of the $Z'$ couplings.
We finally consider a case where both $bbZ'$ and $ccZ'$ couplings are nonzero, and study the DY, $cZ'$ and $bZ'$ processes for a representative $Z'$ mass. We find that the coupling structure of such a scenario can also be disentangled, if combined with the current limit from the heavy resonance search in the DY process.
The paper is organized as follows. In Sec. II, we analyze the discovery potential of the DY process due to $qqZ'$ couplings. In Sec. III, we apply different $c$-tagging algorithms to the discovery potential of the $cZ'$ process and discuss fake sources. Sec. IV is dedicated to the $bZ'$ process, and to disentangling the $Z'$ coupling structure by combining with the results of Sec. III. The scenario with both $ccZ'$ and $bbZ'$ couplings is analyzed in Sec. V, and we summarize in Sec. VI. The analysis for the DY process is detailed in Appendix A, while normalized kinematic distributions for the signal and backgrounds of the $cZ'$ process are provided in Appendix B.

II. THE DRELL-YAN PROCESS
We take the effective couplings of Eq. (1), where $g'$ is the coupling of the $Z'$ to the muon, tauon and their neutrinos, and $g^R_{qq}$ is the right-handed (RH) $qqZ'$ coupling (induced by some underlying heavy particles [9]). The context is the effective model based on the gauged $L_\mu - L_\tau$ [10,11] symmetry, as discussed in Refs. [9,12]. For simplicity and to be more general, we set all flavor-violating couplings to zero and assume $g^R_{qq}$ to be real. The coupling $g'$ is taken to be much larger than $g^R_{qq}$, hence the $Z'$ couples more weakly to quarks, and its decay branching ratios can be approximated as in Eq. (2). The results in this paper can be scaled to any narrow $Z'$ that couples to quarks and muons by the relation of Eq. (3).

Searches for heavy dilepton resonances by ATLAS [1] and CMS [2] set stringent bounds on $\sigma(pp \to Z' + X)\cdot\mathcal{B}(Z' \to \mu^+\mu^-)$, hence on the $g^R_{qq}$ couplings. The ATLAS result is based on 36 fb$^{-1}$ of data, while the CMS result is for 13 fb$^{-1}$. We use the former [1] to extract 95% credibility level (CL) upper limits on the $g^R_{cc}$ and $g^R_{bb}$ couplings, shown as the purple shaded regions in Fig. 1. In doing so, we calculate $\sigma(pp \to Z' + X)$, where the dominant contribution is from $q\bar q \to Z'$ with subdominant contributions from $qg \to qZ'$ and $gg \to q\bar qZ'$ ($q = c$ or $b$), at leading order (LO) for fixed $m_{Z'}$ and $g^R_{qq}$ with MadGraph5_aMC@NLO [13] (referred to as MadGraph5_aMC from here on). We generate matrix elements (ME) with up to two additional jets in the final state$^1$ with the parton distribution function (PDF) set NN23LO1 [14], followed by PYTHIA 6.4 [15], adopting the MLM scheme [16] for ME and parton shower (PS) matching and merging. We then rescale the estimated cross section by $|g^R_{qq}|^2$ and extract the upper limit on $|g^R_{qq}|$ for each $m_{Z'}$ from the ATLAS result, assuming the branching ratios of Eq. (2). In Fig. 1, the $5\sigma$ discovery reach$^2$ is also given for 3000 fb$^{-1}$ of data at the High Luminosity LHC (HL-LHC). If the $Z'$ couples to $u$, $d$ or $s$ quarks, the limits on $g^R_{qq}$ would be much stronger due to the larger PDFs, i.e. probing much smaller $g^R_{uu}$, $g^R_{dd}$ or $g^R_{ss}$ couplings than $g^R_{cc}$ and $g^R_{bb}$. The details of the cut-based analysis and background processes are given in Appendix A. For the sake of a decent $S/B$ ratio, we restrict ourselves to $m_{Z'} \lesssim 700$ GeV.

In principle, the methodology of this paper can be applied to left-handed (LH) $qqZ'$ couplings $g^L_{qq}$, although there is some subtlety: the SU(2)$_L$ gauge symmetry relates the couplings of up- and down-type quarks nontrivially. For instance, a nonzero $g^L_{cc}$ is generally accompanied by a nonzero $g^L_{ss}$ and all possible down-type sector couplings, e.g., $g^L_{dd}$, $g^L_{bb}$ and $g^L_{bs}$, which are CKM-suppressed. Hence, one has to deal with multiple couplings simultaneously. This would complicate the analysis, and we defer it to future study.

FIG. 1. The $5\sigma$ discovery reach of the DY process $pp \to Z' + X \to \mu^+\mu^- + X$ at the 14 TeV LHC with 3000 fb$^{-1}$ of data, initiated by $ccZ'$ (left) and $bbZ'$ (right) couplings. The purple shaded regions are the 95% CL upper limits extracted from Ref. [1].
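The displayed equations of this section did not survive in the text, but their content can be sketched from the definitions given: a gauged $L_\mu - L_\tau$ coupling $g'$ to $\mu$, $\tau$ and their neutrinos, a right-handed quark coupling $g^R_{qq}$, and $g' \gg g^R_{qq}$. The forms below are a hedged reconstruction, not the authors' exact expressions; the overall signs, the index $\alpha$, and the precise form of the scaling relation are assumptions.

```latex
% Sketch only: signs, normalization and the form of the third line are assumptions.
\begin{align}
\mathcal{L} &\supset -g'\left(\bar\mu\gamma^\alpha\mu
   + \bar\nu_{\mu L}\gamma^\alpha\nu_{\mu L}
   - \bar\tau\gamma^\alpha\tau
   - \bar\nu_{\tau L}\gamma^\alpha\nu_{\tau L}\right) Z'_\alpha
   - g^R_{qq}\,\bar q_R\gamma^\alpha q_R\, Z'_\alpha , \\
\mathcal{B}(Z'\to\mu^+\mu^-) &\simeq \mathcal{B}(Z'\to\tau^+\tau^-)
   \simeq \mathcal{B}(Z'\to\nu\bar\nu) \simeq \frac{1}{3} , \\
\sigma(pp\to Z'+X)\cdot\mathcal{B}(Z'\to\mu^+\mu^-) &\propto
   |g^R_{qq}|^2\,\mathcal{B}(Z'\to\mu^+\mu^-) .
\end{align}
```

The second line follows from counting chiralities when lepton masses are negligible: the vector couplings to $\mu$ and $\tau$ each involve two chiralities, as do the two left-handed neutrino species taken together. The third line expresses that, for a narrow $Z'$, the rate depends on the quark coupling and the dimuon branching ratio only through this product.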

III. THE $cZ'$ PROCESS
Having discussed the discovery potential of the $ccZ'$ coupling through the DY process, we turn to $pp \to cZ' + X \to c\mu^+\mu^- + X$, i.e. the $cZ'$ process, which requires tagging of the $c$-jet. Thanks to recent developments in charm tagging by ATLAS [5] and CMS [6], it is now possible to study such a process; many phenomenological studies and discussions of charm tagging can already be found [17-25].

A. Searching for $cZ'$
Let us briefly discuss the present $c$-tagging algorithms. ATLAS [5] gives a range of $b$- and light-jet rejections$^3$ for a fixed value of $c$-tagging efficiency. These fixed $c$-tagging efficiencies are presented as curves (called "iso-efficiency curves") in the $b$- vs light-jet rejection plane. CMS [6] presents similar constant $c$-tagging efficiency curves in the $b$- and light-jet mistag efficiency plane. For the ATLAS iso-efficiency curves, $c$-tagging schemes with high light-jet rejection have low $b$-jet rejection rates, and vice versa. The CMS curves show similar behavior. The largest background for the $cZ'$ process is $Z/\gamma^*$ + light-jet. In order to reduce this background, we take two $c$-tagging working points (WP) with low light-jet mistag rate (i.e. high light-jet rejection) from the ATLAS analysis, which we call configuration 1 (Conf1) and configuration 2 (Conf2), given in the first two rows of Table I. On the other hand, CMS gives three $c$-tagging WPs called c-tagger L, M and T (abbreviated as ctagL, ctagM and ctagT in this paper), which we give in the last three rows of Table I. For both ATLAS and CMS, WPs with higher $b$-jet rejection could be taken at the cost of lower light-jet rejection for a fixed $c$-tagging efficiency, but we do not consider such cases in this study. Note that these $c$-tagging schemes show mild dependence on the transverse momentum ($p_T$) and pseudo-rapidity ($\eta$) of the jet. For simplicity, we take them to be constant in this study.

$^2$ Significance is defined by $S/\sqrt{B}$, where $S$ and $B$ denote the number of signal and background events, respectively.
$^3$ The mistag rate is defined as the complement of rejection rate.
To illustrate the discovery potential of the $cZ'$ process, we choose the benchmark values of mass and coupling $m_{Z'} = 150$ GeV, $g^R_{cc} = 0.005$, setting all other $g^R_{qq}$ couplings in Eq. (1) to zero. The $cZ'$ process suffers from several SM backgrounds. The dominant ones are $Z/\gamma^*$+jet, $t\bar t$ and $Wt$, with smaller contributions from $WW$, $WZ$, $ZZ$, $t\bar tZ$, $t\bar tW$ and $tWZ$. There also exist non-prompt and fake backgrounds such as $W$+jets and QCD multi-jets, which we do not consider, as these backgrounds are not properly modeled in simulation. Because tagging efficiencies and mistag rates differ by jet flavor, we separate the $Z/\gamma^*$+jet background into three categories: $Z/\gamma^*$ + $c$-jet, $b$-jet and light-jet.
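The reason for splitting $Z/\gamma^*$+jet by flavor can be made concrete with a small sketch: each flavor enters the $c$-tagged sample with its own tag (or mistag) rate. The rates and cross sections below are hypothetical placeholders, not the Table I or Table II values, and the function name is ours.

```python
# Hypothetical tag rates for one c-tagging working point (NOT the Table I values).
TAG_RATE = {"c": 0.20, "b": 0.10, "light": 0.005}


def tagged_yields(xsec_fb, tag_rate, lumi_fb_inv):
    """Expected c-tagged events per jet flavor: cross section x tag rate x luminosity."""
    return {
        flavor: xsec_fb[flavor] * tag_rate[flavor] * lumi_fb_inv
        for flavor in xsec_fb
    }


# Hypothetical post-selection Z/gamma* + jet cross sections in fb, split by jet flavor.
zjet_xsec_fb = {"c": 10.0, "b": 8.0, "light": 200.0}
yields = tagged_yields(zjet_xsec_fb, TAG_RATE, lumi_fb_inv=100.0)
```

With these placeholder numbers the light-jet component is still comparable to the genuine $c$-jet component despite a mistag rate of half a percent, which is why working points with high light-jet rejection are preferred here.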
Signal and background events are generated at LO for $pp$ collisions at $\sqrt{s} = 14$ TeV with the Monte Carlo event generator MadGraph5_aMC@NLO and the PDF set NN23LO1, interfaced to PYTHIA 6.4 for showering and hadronization. The event samples are finally fed into the fast detector simulator Delphes 3.4.0 [26] for inclusion of (CMS-based) detector effects. For ME and PS matching and merging we follow the MLM matching scheme. To take higher order corrections into account, the LO cross section of $Z$ + light-jet is normalized by a correction factor 1.83 [27] up to NNLO. For simplicity we assume the correction factors for the $Z$ + $c$-jet and $Z$ + $b$-jet backgrounds to be the same as for $Z$ + light-jet. The LO $t\bar t$ and $Wt$ cross sections are normalized to the NNLO+NNLL ones by factors 1.84 [28] and 1.35 [29], respectively. Furthermore, the LO cross sections of the $WW$, $WZ$ and $ZZ$ backgrounds are normalized to the NNLO QCD ones by factors 1.98 [30], 2.07 [31] and 1.74 [32], respectively. The NLO $K$ factors for the $t\bar tZ$ and $t\bar tW^-$ ($t\bar tW^+$) backgrounds are assumed to be 1.56 [33] and 1.35 (1.27) [34]. We do not include $K$ factors for the signal and the $tWZ$ background.
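The normalization above amounts to a per-process lookup. A minimal sketch follows, using the correction factors quoted in the text; the process labels and the LO cross sections in the usage lines are illustrative assumptions.

```python
# Correction factors quoted in the text; keys are our own labels.
K_FACTOR = {
    "Z+light-jet": 1.83, "Z+c-jet": 1.83, "Z+b-jet": 1.83,
    "tt": 1.84, "Wt": 1.35,
    "WW": 1.98, "WZ": 2.07, "ZZ": 1.74,
    "ttZ": 1.56, "ttW-": 1.35, "ttW+": 1.27,
}


def normalize(lo_xsec_fb):
    """Scale LO cross sections by their K factor; unlisted processes
    (the signal and tWZ) are left at LO, i.e. K = 1."""
    return {proc: xs * K_FACTOR.get(proc, 1.0) for proc, xs in lo_xsec_fb.items()}


# Hypothetical LO cross sections in fb, for illustration only.
normalized = normalize({"tt": 100.0, "tWZ": 2.0, "signal": 1.0})
```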
We follow Ref. [25] closely in our analysis for both signal and background. We select events with two oppositely charged muons and at least one jet. Normalized event distributions can be found in Appendix B for the transverse momenta of the two muons and the leading $c$-jet, and for the invariant mass of the $\mu^+\mu^-$ pair. We require the leading and subleading muons to have $p_T^{\mu_1} > 50$ GeV and $p_T^{\mu_2} > 40$ GeV, respectively. The transverse momentum of the leading jet in an event should satisfy $p_T^{j} > 45$ GeV. The separation between the two muons ($\Delta R_{\mu\mu}$) and the separation between any muon and the leading jet ($\Delta R_{\mu j}$) are required to be $> 0.4$. The pseudo-rapidities of both muons and the leading jet are required to satisfy $|\eta| < 2.5$. Jets are reconstructed using the anti-$k_T$ algorithm with radius parameter $R = 0.5$. To reduce the contribution from the $t\bar t$ and $Wt$ backgrounds, events with missing transverse energy $E_T^{\rm miss} > 40$ GeV are rejected. Finally, we impose an invariant-mass cut $|m_{\mu\mu} - m_{Z'}| < 15$ GeV on the two oppositely charged muons in an event. If an event contains more than one $m_{\mu\mu}$ combination, the combination closest to $m_{Z'}$ is selected. The impact of the selection cuts on the signal and backgrounds is given in Table II (based on ATLAS $c$-tagging) and Table III (based on CMS $c$-tagging).
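The selection above can be written as a single predicate. The sketch below is a simplification under stated assumptions: events are plain dictionaries (a hypothetical layout, not the Delphes output format), the two highest-$p_T$ muons are used instead of choosing the pair closest to $m_{Z'}$, and the dimuon invariant mass is passed in precomputed. Flavor tagging of the leading jet is not included.

```python
import math


def delta_r(a, b):
    """Angular separation in the eta-phi plane, with phi wrapped to [-pi, pi]."""
    dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)


def passes_selection(muons, jets, met, m_mumu, m_zprime):
    """Apply the kinematic cuts quoted in the text (energies in GeV)."""
    if len(muons) < 2 or not jets:
        return False
    mu1, mu2 = sorted(muons, key=lambda m: m["pt"], reverse=True)[:2]
    jet = max(jets, key=lambda j: j["pt"])
    if mu1["charge"] * mu2["charge"] >= 0:  # need opposite charges
        return False
    if mu1["pt"] <= 50.0 or mu2["pt"] <= 40.0 or jet["pt"] <= 45.0:
        return False
    if any(abs(obj["eta"]) >= 2.5 for obj in (mu1, mu2, jet)):
        return False
    if min(delta_r(mu1, mu2), delta_r(mu1, jet), delta_r(mu2, jet)) <= 0.4:
        return False
    if met > 40.0:  # suppresses ttbar and Wt
        return False
    return abs(m_mumu - m_zprime) < 15.0


# Hypothetical event for illustration.
mu_a = {"pt": 80.0, "eta": 0.5, "phi": 0.0, "charge": 1}
mu_b = {"pt": 60.0, "eta": -0.3, "phi": 2.5, "charge": -1}
jet_a = {"pt": 70.0, "eta": 1.0, "phi": -2.0}
accepted = passes_selection([mu_a, mu_b], [jet_a], met=20.0, m_mumu=152.0, m_zprime=150.0)
```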
The ATLAS Conf1 and Conf2 schemes may discover the $cZ'$ process with 930 fb$^{-1}$ and 1090 fb$^{-1}$ of integrated luminosity, respectively. The dominant background for Conf1 is $Z/\gamma^*$ + light-jet, while $Z/\gamma^*$ + $c$-jet constitutes the second largest background. This is distinctly different for Conf2, where $Z/\gamma^*$ + $c$-jet and $t\bar t$ provide the dominant and second largest contributions. A larger $c$-tagging efficiency makes Conf1 superior to Conf2 for discovery. Similarly, ctagL, ctagM and ctagT for CMS could discover the $cZ'$ process with 1150 fb$^{-1}$, 1550 fb$^{-1}$ and 2120 fb$^{-1}$ of integrated luminosity. The ctagL scheme requires roughly the same luminosity as ATLAS Conf2, although the $c$-tagging efficiencies and the $b$- and light-jet mistag rates are different: the larger $c$-tagging efficiency of ctagL is balanced by higher mistag rates for light- and $b$-jets. The smaller $c$-tagging efficiencies make the $cZ'$ process harder to discover for ctagM and ctagT. Following the same selection cuts,$^4$ we extend our analysis to $Z'$ masses up to 700 GeV. The discovery reaches for ATLAS Conf1 (orange dotted), Conf2 (orange solid), CMS ctagL (blue dot-dashed), ctagM (blue dotted) and ctagT (blue solid) with 3000 fb$^{-1}$ of data are given in Fig. 2.
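With significance defined as $S/\sqrt{B}$, the luminosity needed for a given significance follows directly from the post-selection signal and background cross sections: $S/\sqrt{B} = \sigma_S \sqrt{L/\sigma_B}$, so $L = n_\sigma^2\,\sigma_B/\sigma_S^2$. The sketch below encodes this; the cross sections in the usage line are hypothetical and are not the Table II or Table III entries.

```python
import math


def significance(sig_xsec_fb, bkg_xsec_fb, lumi_fb_inv):
    """S / sqrt(B) for cross sections in fb and luminosity in fb^-1."""
    return sig_xsec_fb * lumi_fb_inv / math.sqrt(bkg_xsec_fb * lumi_fb_inv)


def lumi_for_significance(sig_xsec_fb, bkg_xsec_fb, n_sigma=5.0):
    """Integrated luminosity (fb^-1) at which S / sqrt(B) reaches n_sigma."""
    return n_sigma**2 * bkg_xsec_fb / sig_xsec_fb**2


# Hypothetical post-selection cross sections, for illustration only.
needed = lumi_for_significance(sig_xsec_fb=0.1, bkg_xsec_fb=0.4)
```

Because the signal cross section scales as $|g^R_{qq}|^2$, the required luminosity at fixed background scales as $1/|g^R_{qq}|^4$, which is why modest changes in coupling shift the discovery luminosity substantially.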

B. Fake $cZ'$
A signal for the $cZ'$ process could arise from light- and $b$-jet mistags, which we display in Fig. 3 for the cases of $g^R_{uu}$ (left), $g^R_{ss}$ (middle) and $g^R_{bb}$ (right) couplings, for the LHC at $\sqrt{s} = 14$ TeV with 3000 fb$^{-1}$ of data. The purple shaded regions correspond to the 95% CL upper limits extracted from Ref. [1]. Let us take a closer look.
The fake $cZ'$ signals depend on the upper limits on the $qqZ'$ coupling and on the $c$-tagging scheme adopted. The extraction of upper limits involves the underlying DY process $q\bar q \to Z'$, which depends on the initial-state quark PDFs and is proportional to $|g^R_{qq}|^2$. On the other hand, fake $cZ'$ signals can originate from $qg \to qZ'$ and its conjugate process. Although also proportional to $|g^R_{qq}|^2$, these cross sections are suppressed by their $2 \to 2$ nature compared to the DY process, and depend on both gluon and quark PDFs. Due to their high light-jet rejection rates, the two $c$-tagging schemes Conf2 and ctagT can fully eliminate fake $cZ'$ from light-jets. That is, their $5\sigma$ contours lie in the excluded regions for both $g^R_{uu}$ and $g^R_{ss}$ couplings over the whole $Z'$ mass range studied, unlike Conf1, ctagL and ctagM, for which this holds only in some $m_{Z'}$ regions.

$^4$ Our study is for illustration, and we do not optimize the selection cuts for each $m_{Z'}$. We did, however, check the possible impact of such a cut optimization. The largest impact would be obtained by narrowing the invariant mass window $|m_{\mu\mu} - m_{Z'}| < 15$ GeV for a light $Z'$: we found that, for $m_{Z'} = 150$ GeV, a 5 GeV window enhances the signal significance by $\sim 30\%$ to $34\%$, depending on the $c$-tagging scheme. We found the effects of changing the $p_T$ cuts for the muons and the leading $c$-jet to be minor once we impose the $|m_{\mu\mu} - m_{Z'}|$ cut, which tends to select events with higher $p_T$ muons for a higher $Z'$ mass.
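Since the DY rate is proportional to the squared coupling, an experimental limit on $\sigma \cdot \mathcal{B}$ translates into a coupling limit by a square-root rescaling of a reference calculation. The sketch below illustrates this; the function and parameter names are ours, the default branching ratio of 1/3 is an assumption appropriate to a $Z'$ decaying only to $\mu$, $\tau$ and their neutrinos, and the numbers in the usage line are hypothetical.

```python
import math


def coupling_upper_limit(sigma_ref_fb, g_ref, sigma_times_br_limit_fb, br_mumu=1.0 / 3.0):
    """Upper limit on |g| given a reference cross section sigma_ref computed at g_ref.

    Uses sigma(g) = sigma_ref * (g / g_ref)**2 and requires
    sigma(g) * br_mumu <= sigma_times_br_limit.
    """
    return g_ref * math.sqrt(sigma_times_br_limit_fb / (sigma_ref_fb * br_mumu))


# Hypothetical inputs: 300 fb at g = 0.01, and an observed limit of 1 fb on sigma x B.
g_max = coupling_upper_limit(sigma_ref_fb=300.0, g_ref=0.01, sigma_times_br_limit_fb=1.0)
```

The same rescaling applies to the $qg \to qZ'$ cross sections, but with a smaller reference cross section, so a coupling allowed by the DY limit can still be too small to produce a $5\sigma$ fake signal.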
None of these schemes, however, shows promise in reducing fakes from $b$-jet misidentification, since all schemes have considerable $b$-jet mistag rates. This can be seen from the rightmost panel of Fig. 3. The high light-jet rejection and low $c$-tagging efficiency (chosen to reduce the dominant $Z/\gamma^*$ + light- and $Z/\gamma^*$ + $c$-jet backgrounds) make ctagT perform the worst. Conf2 has the same $c$-tagging efficiency and an even lower light-jet rejection, but its lower $b$-jet mistag rate makes it perform better than ctagT. Our choice of high light-jet, but moderate $b$-jet, rejection allows the possibility of fake $cZ'$ arising from the $bbZ'$ coupling. We thus turn to scrutinize this issue in the next section.

IV. THE $bZ'$ PROCESS

A. Searching for $bZ'$
If the discovery of a DY-produced $Z'$ is due to the $bbZ'$ coupling, it implies that $bg \to bZ' \to b\mu^+\mu^-$ (and its conjugate) could also be discovered at the LHC. To illustrate the potential for $pp \to bZ' + X \to b\mu^+\mu^- + X$ at the LHC, we adopt a similar strategy as before, and take the following benchmark for mass and coupling: $m_{Z'} = 150$ GeV, $g^R_{bb} = 0.005$. We follow the same cut-based analysis as in the previous section, except that the tagged jet is now a $b$-jet. We incorporate $p_T$- and $\eta$-dependent $b$-tagging efficiencies in Delphes. The rejection factor for light-jets is taken as 137 [39]. For simplicity, we assume the correction factors to the LO background cross sections generated by MadGraph5_aMC to be the same as in the previous section, and do not apply a $K$ factor to the signal. The signal and background cross sections after selection cuts are given in Table IV. The luminosity required to discover the 150 GeV $Z'$ is 1180 fb$^{-1}$. Our analysis is further extended up to $m_{Z'} = 700$ GeV, as shown in the left panel of Fig. 4. For simplicity we use the same selection cuts as for the $cZ'$ process to generate Fig. 4.

B. Fake $bZ'$
Mistagged light- or $c$-jets can also produce fake $bZ'$ signals at the LHC, but the $g^R_{qq}$ couplings ($q = u, d, s$) required to produce fake $bZ'$ at $5\sigma$ with 3000 fb$^{-1}$ are already disallowed by heavy resonance DY searches [1]. This attests to the excellent performance of $b$-tagging algorithms in reducing light-jet contributions. However, fake $bZ'$ can still arise from mistagged $c$-jets, except in two tiny mass windows around $m_{Z'} \sim 150$ and 300 GeV, as can be read from the right panel of Fig. 4 for the $5\sigma$ reach with 3000 fb$^{-1}$. We infer that, if no $Z'$ is observed via DY with a $\sim 250$ fb$^{-1}$ dataset, one can rule out the possibility of fake $bZ'$ from the $ccZ'$ coupling at the LHC.
Even if a $Z'$ is discovered via DY with a $\sim 100$ fb$^{-1}$ or smaller dataset, one can still eliminate the possibility of fake $bZ'$ from the $ccZ'$ coupling by combining the $bZ'$ and $cZ'$ searches. For instance, a 600 GeV $Z'$ with $g^R_{cc} = 0.02$, which can be discovered with 110 fb$^{-1}$ of data via the DY process, requires 1310 fb$^{-1}$ of data to give fake $bZ'$ signals at $5\sigma$; however, observing $cZ'$ does not take long after the discovery of the DY process (e.g., 160 fb$^{-1}$ for Conf2 and 350 fb$^{-1}$ for ctagT; see the left panel of Fig. 1, Fig. 2, and the right panel of Fig. 4). In general, fake $bZ'$ from the $ccZ'$ coupling, if observed, should be preceded by the discovery of $cZ'$ with a smaller dataset. A similar argument holds for fake $cZ'$ from the $bbZ'$ coupling: after discovery via DY induced by $bbZ'$, fake $cZ'$ can emerge, but it should be preceded by the discovery of $bZ'$ for all five $c$-tagging schemes (see the right panels of Fig. 1 and Fig. 3, and the left panel of Fig. 4). Therefore, the simultaneous search for $cZ'$ and $bZ'$ can reveal whether the coupling behind DY production is $ccZ'$ or $bbZ'$.
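The ordering argument can be illustrated with the luminosities quoted above for a 600 GeV $Z'$ with $g^R_{cc} = 0.02$. The sketch sorts the channels by the luminosity at which each reaches $5\sigma$ and checks that a genuine $cZ'$ signal precedes the fake $bZ'$ one; the dictionary keys and the helper name are our own labels.

```python
# 5-sigma luminosities (fb^-1) quoted in the text for m_Z' = 600 GeV, g^R_cc = 0.02.
lumi_fb_inv = {
    "DY": 110,
    "cZ' (Conf2)": 160,
    "cZ' (ctagT)": 350,
    "fake bZ'": 1310,
}

# Channels in the order in which they would reach 5 sigma as data accumulates.
discovery_order = sorted(lumi_fb_inv, key=lumi_fb_inv.get)


def genuine_precedes_fake(lumis, genuine_prefix, fake_key):
    """True if every channel starting with genuine_prefix needs less data than fake_key."""
    genuine = [v for k, v in lumis.items() if k.startswith(genuine_prefix)]
    return bool(genuine) and max(genuine) < lumis[fake_key]


charm_first = genuine_precedes_fake(lumi_fb_inv, "cZ'", "fake bZ'")
```

A $bZ'$ excess appearing without a prior $cZ'$ discovery would therefore point to a genuine $bbZ'$ coupling rather than to mistagged charm.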

V. PRESENCE OF BOTH $cZ'$ AND $bZ'$ PROCESSES
We have so far studied the discovery potential of the $cZ'$ and $bZ'$ processes with a nonzero $ccZ'$ or $bbZ'$ coupling exclusively. However, all of the $uuZ'$, $ddZ'$, $ssZ'$, $ccZ'$ and $bbZ'$ couplings could in principle coexist. If any of the first three couplings, involving light quarks, is nonzero, we might discover the $Z'$ in the DY process without subsequent discovery of the $cZ'$ and/or $bZ'$ processes; this situation can be easily discerned by using both $c$- and $b$-tagging algorithms.
A more interesting scenario is when both $ccZ'$ and $bbZ'$ couplings are nonzero, but all couplings to light quarks vanish. These couplings would give rise to both $cZ'$ and $bZ'$ processes, depending on their individual strengths. In order to investigate such a scenario, we take the following benchmark point: $m_{Z'} = 150$ GeV, $g^R_{cc} = 0.003$, $g^R_{bb} = 0.005$. These $g^R_{cc}$ and $g^R_{bb}$ values remain within their respective allowed regions, and $\sigma(pp \to Z' + X)\cdot\mathcal{B}(Z' \to \mu^+\mu^-)$ stays within the 95% CL upper limit set by ATLAS [1]. Larger $g^R_{cc}$ and $g^R_{bb}$ would be in tension with the $\sigma\cdot\mathcal{B}$ upper limit. This benchmark can be discovered in the DY process with just 210 fb$^{-1}$ of integrated luminosity, followed by a discovery in the $bZ'$ process with 870 fb$^{-1}$ of data, which is lower than the luminosity quoted for the case of $g^R_{bb} = 0.005$ alone in Sec. IV A. The $cZ'$ process would emerge later, at 2370 fb$^{-1}$ (Conf1), 2420 fb$^{-1}$ (Conf2), 2570 fb$^{-1}$ (ctagL), 2600 fb$^{-1}$ (ctagM) or 1740 fb$^{-1}$ (ctagT). The benchmark

VI. SUMMARY
We analyze the possibility of probing the coupling structure of a relatively weakly coupled $Z'$ via the $qg \to qZ'$ process, adopting the $c$- and $b$-tagging algorithms of ATLAS and CMS at the 14 TeV LHC. Such a resonance would appear first in the Drell-Yan process. Our study shows that, if a $Z'$ is discovered first via $pp \to Z' + X \to \mu^+\mu^- + X$ DY production, one could then discover the $cg \to cZ'$ and $bg \to bZ'$ processes at the HL-LHC. We illustrate with two different $c$-tagging schemes from ATLAS, chosen to optimally reduce the $Z$ + light-jet background while maintaining moderate $c$-tagging efficiencies. We also adopt three $c$-tagging working points from CMS in our analysis.
A $cZ'$ signal could arise from misidentification of light- or $b$-jets. Fake $cZ'$ from light-jet misidentification can be excluded by existing data, if one adopts the ATLAS Conf2 or CMS ctagT scheme. However, none of the $c$-tagging schemes can rule out the possibility of fake $cZ'$ from mistag of $b$-jets. In order to eliminate fake $cZ'$ from a finite $bbZ'$ coupling, we advocate the simultaneous study of the $cZ'$ and $bZ'$ processes. We find that a nonzero $bbZ'$ coupling would give genuine $bZ'$ and fake $cZ'$ signatures. Conversely, a nonzero $ccZ'$ coupling can give genuine $cZ'$ and fake $bZ'$, within the allowed region of the $ccZ'$ coupling. The latter possibility can be eliminated in the near future if no $Z'$ emerges in the DY process with $\sim 250$ fb$^{-1}$ of data. Our study is based on the current status of $c$-tagging algorithms. Any future improvement in $c$-tagging would only improve the analysis.
It would be interesting if both $ccZ'$ and $bbZ'$ couplings are nonzero. We illustrate with one such representative scenario, i.e. a 150 GeV $Z'$ with $g^R_{cc} = 0.003$ and $g^R_{bb} = 0.005$. We find that 210 fb$^{-1}$ of data is needed for DY discovery, which would be followed by discovery of the $bZ'$ process with 870 fb$^{-1}$, while the $cZ'$ process would emerge much later, with integrated luminosities ranging from $\sim 1740$ fb$^{-1}$ to 2600 fb$^{-1}$ depending on the $c$-tagging scheme. This scenario differs from the cases where either $g^R_{cc}$ or $g^R_{bb}$ vanishes. For example, when only $g^R_{cc} = 0.005$ is nonzero, DY discovery of a 150 GeV $Z'$ would be followed by discovery in the $cZ'$ process, with no subsequent $5\sigma$ signature of the fake $bZ'$ process even with the full HL-LHC data. However, if $g^R_{bb} = 0.005$ is the only nonzero coupling, the DY process would be followed by discovery of the $bZ'$ process. The highest attainable fake $cZ'$ signature in this scenario would be about $4.4\sigma$.
We have not included backgrounds associated with fake and non-prompt sources, systematic uncertainties, or QCD corrections for the signal, all of which would induce some uncertainty in our results. Furthermore, we have not included the uncertainties from scale dependence and PDFs, the latter being large for heavy quarks, in particular for the $b$ quark. The PDF uncertainties for $c$- or $b$-quark initiated processes are discussed in Refs. [40,41], while a detailed discussion of PDF choices and their uncertainties for Run 2 of the LHC can be found in Ref. [42]. All these effects would affect the extracted upper limits on the $g^R_{cc}$ and $g^R_{bb}$ couplings, as well as our estimated luminosities for discovery.
Our study illustrates that new resonances could still emerge at the LHC, and that large integrated luminosities can probe weaker couplings or unravel more detail. Given that our study was partly motivated by flavor "anomalies" [35-38], the flavor associated with $Z'$ production could shed more light on potential new physics indications from the flavor sector. Of course, one would certainly search for other $Z'$ decay modes, such as $Z' \to \tau^+\tau^-$ implied by Eq. (1).
Note Added. While revising the manuscript, we noticed that CMS released a new result [43] for the dilepton resonance search with 36 fb$^{-1}$ of data from the 13 TeV LHC. We checked the resulting 95% CL upper limits on the different $g^R_{qq}$ couplings, following the procedure for interpreting the CMS results discussed in Ref. [3], and found that the new CMS limits [43] are comparable to the ATLAS limits with 36 fb$^{-1}$ of data [1], except for $m_{Z'} \sim 500$ GeV, where CMS gives slightly stronger limits due to a sharp downward fluctuation in its observed data. We confirmed that the new CMS limits do not affect our conclusions.