Search for low mass vector resonances decaying into quark-antiquark pairs in proton-proton collisions at $\sqrt{s} =$ 13 TeV

A search for low mass narrow vector resonances decaying into quark-antiquark pairs is presented. The analysis is based on data collected in 2017 with the CMS detector at the LHC in proton-proton collisions at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 41.1 fb$^{-1}$. The results of this analysis are combined with those of an earlier analysis based on data collected at the same collision energy in 2016, corresponding to 35.9 fb$^{-1}$. Signal candidates will be recoiling against initial state radiation and are identified as energetic, large-radius jets with two pronged substructure. The invariant jet mass spectrum is probed for a potential narrow peaking signal over a smoothly falling background. No evidence for such resonances is observed within the mass range of 50-450 GeV. Upper limits at the 95% confidence level are set on the coupling of narrow resonances to quarks, as a function of the resonance mass. For masses between 50 and 300 GeV these are the most sensitive limits to date. This analysis extends the earlier search to a mass range of 300-450 GeV, which is probed for the first time with jet substructure techniques.


Introduction
Many extensions of the standard model (SM), including models with extra dimensions or with new gauge symmetries, amongst others, predict the existence of leptophobic vector or axial-vector mediators that couple to SM quarks (q) [1][2][3][4][5][6][7][8][9][10][11][12][13]. These particles would be observed as resonances in the dijet mass distribution. At the CERN LHC, searches for such particles have reached the TeV scale, placing limits on resonances with masses between 1.0 and 7.6 TeV [14,15]. Below 1 TeV, the sensitivity of these searches is limited by the large background rate from quantum chromodynamics (QCD) multijet events that saturates the hardware selection algorithm (trigger) bandwidth. Complementary techniques have been explored to overcome this limitation. For masses between 450 and 1000 GeV, limits on resonances have been set by trigger-level analyses that record only partial event information and perform searches in the dijet mass spectrum with lower trigger thresholds [15][16][17][18]. In order to extend searches to even lower resonance masses, this study looks for dijet resonances that would be produced with significant initial-state radiation (ISR). The presence of ISR ensures that the events have enough energy to satisfy the trigger requirement, either through the ISR jet or through the resonance itself. For low resonance masses, the decay products of the resonance are expected to be collimated into a single, large-radius jet. Previous searches have probed the mass regime between 10 and 300 GeV using this event signature [19][20][21][22]. An ATLAS search using events with a dijet system and a photon with high transverse momentum ($p_\mathrm{T}$) in the final state sets limits for masses above 225 GeV, probing the range between 225 and 450 GeV where the resonance decay products start to fall outside the large-radius cone [23].
This paper focuses on a search for narrow leptophobic vector resonances with masses below 450 GeV and a natural width small relative to the detector's mass resolution. We take a $\mathrm{Z}'$ model [24] as a proxy for such states. We consider a Lorentz-boosted event topology where the resonance recoils against significant ISR from quark/gluon radiation, increasing the momenta of the decay daughters and enabling more efficient triggering in the low resonance mass region. The resonance is reconstructed as a single, large-radius jet and is distinguished from the dominant QCD background using jet substructure. We extend previous searches to higher resonance masses by using a jet clustering algorithm with a larger distance parameter. Using wider jets enhances the acceptance at masses above 200 GeV, where the resonance decay products tend to have a larger angular separation. The data sample used in this paper was collected with the CMS detector in 2017 at $\sqrt{s} = 13$ TeV and corresponds to an integrated luminosity of 41.1 fb$^{-1}$. The reach of this search is further extended by statistically combining the results with those from a similar analysis [20] based on data collected by CMS at the same collision energy in 2016. The resulting search for new dijet resonances in boosted topologies is based on a total integrated luminosity of 77.0 fb$^{-1}$.
[...] based on the likelihood of the particle originating from the hard-scattering vertex. Further corrections are applied to simulated jet energies as a function of jet η and $p_\mathrm{T}$ to match the observed detector response [46,47]. The most energetic jet in the event is assumed to correspond to the $\mathrm{Z}' \to \mathrm{q\bar{q}}$ system, and is reconstructed as a single AK8 or CA15 jet. The AK8 jets provide better sensitivity for signal mass hypotheses below 175 GeV, while the CA15 jets provide better sensitivity above 175 GeV. This is because a heavier resonance with the same transverse momentum has a lower Lorentz boost, so a larger-radius jet is required to contain the $\mathrm{Z}'$ hadronization products.
Signal jets are identified using the soft-drop (SD) algorithm [48,49], the $p_\mathrm{T}$-invariant variable ρ [48,50], and a jet substructure variable, $N_2^1$ [51]. The SD algorithm with angular exponent β = 0 is applied to the jet to remove soft and wide-angle radiation, using a soft radiation fraction threshold of $z_\mathrm{cut} = 0.1$. The SD grooming algorithm reduces the mass of QCD background jets, which tends to be inflated by soft gluon radiation, while preserving the masses of merged $\mathrm{Z}'/\mathrm{Z} \to \mathrm{q\bar{q}}$ and $\mathrm{W} \to \mathrm{q\bar{q}'}$ jets. This algorithm is used for the offline analysis, while the jet-trimming algorithm [52] is used at trigger level, as explained below. The jet-trimming algorithm reclusters the jet constituents into $k_\mathrm{T}$ subjets [53] with R = 0.2, and discards any subjet with $p_\mathrm{T}/p_\mathrm{T}^{\mathrm{jet}} < 0.03$. The jet mass ($m_\mathrm{SD}$) is corrected by a factor derived in simulated W boson samples to ensure a $p_\mathrm{T}$- and η-independent jet mass distribution centered on the nominal boson mass. The dimensionless variable ρ, defined as $\rho = \ln(m_\mathrm{SD}^2/p_\mathrm{T}^2)$, is used to characterize the correlation between the jet $N_2^1$, jet mass, and jet $p_\mathrm{T}$. The observable $N_2^1$ is used to determine the consistency of a given jet with a two-pronged topology. It is constructed from the ratio of 3-point (${}_2e_3$) and 2-point (${}_1e_2$) generalized energy correlation functions ${}_v e_n$, which are based on the energies and $v$ pairwise angles among $n$ particles within a jet, as described in Ref. [51]. Jets originating from a two-pronged decay have a larger 2-point correlation than 3-point correlation, leading to a smaller value of $N_2^1$. Since this search probes a wide range of jet mass and jet $p_\mathrm{T}$, we decorrelate the $N_2^1$ variable from the jet mass and $p_\mathrm{T}$ following the procedure described in Refs. [19,20,50]. Without decorrelation, a selection based on $N_2^1$, or a similar variable, would distort the jet mass distribution as a function of the jet $p_\mathrm{T}$, making the search for a resonant peak difficult.
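To make the construction of the two-prong observable concrete, the following Python sketch evaluates $N_2^{\beta}$ with β = 1 from a list of jet constituents. It assumes the commonly quoted form of the generalized energy correlation functions, with momentum fractions $z_i = p_{\mathrm{T},i}/\sum_j p_{\mathrm{T},j}$ and pairwise distances ΔR in the (η, φ) plane; the function names and the `(pt, eta, phi)` tuple layout are illustrative assumptions and not part of the analysis software.

```python
import itertools
import math


def delta_r(a, b):
    """Angular distance between two constituents given as (pt, eta, phi)."""
    deta = a[1] - b[1]
    # Wrap the azimuthal difference into [-pi, pi).
    dphi = (a[2] - b[2] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(deta, dphi)


def n2(constituents, beta=1.0):
    """Illustrative N_2^beta = 2e3 / (1e2)^2 for one jet (assumed definition)."""
    pt_sum = sum(c[0] for c in constituents)
    z = [c[0] / pt_sum for c in constituents]
    idx = range(len(constituents))

    # 2-point function 1e2: one pairwise angle per particle pair.
    e2 = sum(
        z[i] * z[j] * delta_r(constituents[i], constituents[j]) ** beta
        for i, j in itertools.combinations(idx, 2)
    )
    if e2 == 0.0:
        return 0.0  # fewer than two separated particles: no substructure

    # 3-point function 2e3: smallest product of two pairwise angles per triplet.
    e3 = 0.0
    for i, j, k in itertools.combinations(idx, 3):
        rij = delta_r(constituents[i], constituents[j]) ** beta
        rik = delta_r(constituents[i], constituents[k]) ** beta
        rjk = delta_r(constituents[j], constituents[k]) ** beta
        e3 += z[i] * z[j] * z[k] * min(rij * rik, rij * rjk, rik * rjk)

    return e3 / e2 ** 2


# Two hard prongs plus one soft particle versus three equally hard prongs.
two_prong = [(100.0, 0.0, 0.0), (100.0, 0.4, 0.0), (1.0, 0.2, 0.35)]
three_prong = [(100.0, 0.0, 0.0), (100.0, 0.4, 0.0), (100.0, 0.2, 0.35)]
print(n2(two_prong), n2(three_prong))
```

In this toy configuration the two-prong-like jet yields a much smaller value than the three-prong jet, which is the behaviour exploited by the selection.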
The transformed variable, denoted as a designed decorrelated tagger (DDT), is defined as

$$N_2^{1,\mathrm{DDT}}(\rho, p_\mathrm{T}) = N_2^{1}(\rho, p_\mathrm{T}) - X_{(5\%)}(\rho, p_\mathrm{T}),$$

where $X_{(5\%)}$ is the 5th percentile of $N_2^1$ in simulated QCD multijet events, i.e. the value of $N_2^1$ that divides the multijet events into groups with 5 and 95% background efficiency, for each ρ and $p_\mathrm{T}$ bin. This ensures that the selection $N_2^{1,\mathrm{DDT}} < 0$, or equivalently $N_2^1 < X_{(5\%)}$, retains a constant 5% of simulated QCD multijet events, irrespective of ρ and $p_\mathrm{T}$. The 5% quantile choice maximizes the sensitivity to a $\mathrm{Z}'$ boson signal. The distributions of $X_{(5\%)}$ for the AK8 and CA15 jets are shown in Appendix A.
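The decorrelation step can be sketched in a few lines of Python. The example below builds a plain binned 5th-percentile map and subtracts it from the tagger value; it deliberately omits the smoothing of the map used in the analysis, and all function names, the binning, and the toy inputs are assumptions made only for illustration.

```python
import numpy as np


def build_x5_map(rho, pt, n2, rho_edges, pt_edges, quantile=5.0):
    """Binned percentile map of the tagger in (rho, pt); unsmoothed sketch."""
    x_map = np.full((len(rho_edges) - 1, len(pt_edges) - 1), np.nan)
    i_rho = np.digitize(rho, rho_edges) - 1
    i_pt = np.digitize(pt, pt_edges) - 1
    for i in range(x_map.shape[0]):
        for j in range(x_map.shape[1]):
            in_bin = (i_rho == i) & (i_pt == j)
            if in_bin.any():
                x_map[i, j] = np.percentile(n2[in_bin], quantile)
    return x_map


def n2_ddt(rho, pt, n2, x_map, rho_edges, pt_edges):
    """Tagger value minus the map value of the bin each jet falls into."""
    i_rho = np.clip(np.digitize(rho, rho_edges) - 1, 0, x_map.shape[0] - 1)
    i_pt = np.clip(np.digitize(pt, pt_edges) - 1, 0, x_map.shape[1] - 1)
    return n2 - x_map[i_rho, i_pt]


# Toy "multijet" sample in which the tagger drifts with rho (assumed shape).
rng = np.random.default_rng(1)
rho = rng.uniform(-5.5, -2.0, 20000)
pt = rng.uniform(525.0, 1500.0, 20000)
n2 = 0.3 + 0.02 * rho + rng.normal(0.0, 0.05, 20000)

rho_edges = np.linspace(-5.5, -2.0, 5)
pt_edges = np.array([525.0, 575.0, 625.0, 700.0, 800.0, 1500.0])
x_map = build_x5_map(rho, pt, n2, rho_edges, pt_edges)
ddt = n2_ddt(rho, pt, n2, x_map, rho_edges, pt_edges)
print(np.mean(ddt < 0))  # close to 0.05 by construction
```

Because the percentile is evaluated separately in each bin, the fraction of toy events with a negative transformed value stays near 5% regardless of how the raw tagger depends on ρ and $p_\mathrm{T}$.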
In order to fully exploit the differential variation of $N_2^1$ between adjacent bins of $p_\mathrm{T}$ and ρ and to reduce the dependence on the number of available events from simulation, we use a Gaussian kernel estimate to build the $X_{(5\%)}$ map. In contrast to the search performed using 2016 data [20], which used an ad hoc k-nearest-neighbor (kNN) approach [54] to smooth the $X_{(5\%)}$ distribution, this analysis is based on the detector resolutions of the $N_2^1$ and ρ distributions as a function of the jet $p_\mathrm{T}$. The $X_{(5\%)}$ distribution is derived from distributions of the jet $N_2^1$ and ρ at the generator level. These distributions are smeared to include detector effects, taking into account correlations between these variables. Each of these jet observables is multiplied by a random number drawn from a Gaussian distribution, such that the smeared jet matches the resolution obtained from fully simulated events. The advantage of this method over the kNN approach is that it allows better control of the smoothness of the transformation map while maintaining similar performance in terms of the amount of jet mass decorrelation.
Events are triggered using a combination of online signatures requiring minimum thresholds on $H_\mathrm{T}$ or on the AK8 jet $p_\mathrm{T}$. We also make use of a jet substructure trigger, which places a requirement on the trimmed jet mass [52], in addition to a minimum required $H_\mathrm{T}$ or $p_\mathrm{T}$. Trimming removes soft radiation remnants from the jet, which allows the $H_\mathrm{T}$ and jet $p_\mathrm{T}$ trigger thresholds to be lowered while maintaining a similar rate, and improves the signal acceptance.
The trigger efficiency with respect to the offline selection is measured as a function of the soft-drop jet mass in an independent single-muon data set. The efficiency does not reach 100% smoothly since the trimmed jet mass triggers were not available early in the 2017 data collection, corresponding to the first 4.8 fb$^{-1}$ of data recorded. This condition also motivates the use of a higher $p_\mathrm{T}$ threshold compared to that used for the 2016 data period ($p_\mathrm{T} > 500$ GeV). The trigger selection is greater than 95% efficient for events with at least one AK8 jet with $p_\mathrm{T} > 525$ GeV, or with at least one CA15 jet with $p_\mathrm{T} > 575$ GeV. Following this selection, the trigger efficiency for both AK8 and CA15 jets is shown in Fig. 1. At high jet masses, the trigger efficiency for the larger CA15 jet decreases slightly. This decrease is due to events in which the jet passes the CA15 jet selection but fails the trigger-level AK8 jet $p_\mathrm{T}$ and trimmed mass requirements.

Figure 1: High-level trigger efficiency as a function of the soft-drop jet mass ($m_\mathrm{SD}$) for AK8 jets with $p_\mathrm{T} > 525$ GeV (blue squares) and CA15 jets with $p_\mathrm{T} > 575$ GeV (red circles). The trigger selection is >95% efficient for 2017 data for both cone sizes and is applied to AK8 jets with masses between 50 and 275 GeV and CA15 jets with masses between 150 and 450 GeV. For jet masses above 200 GeV, the trigger efficiency for the larger CA15 jet decreases slightly. This is due to events for which a reconstructed jet passing the CA15 jet selection does not satisfy the AK8 jet selection at the trigger level.
Events are selected by requiring at least one AK8 jet with $p_\mathrm{T} > 525$ GeV or at least one CA15 jet with $p_\mathrm{T} > 575$ GeV, within |η| < 2.5. To reduce SM electroweak (EW) backgrounds, events are rejected if they contain isolated charged leptons with $p_\mathrm{T} > 10$ GeV and |η| < 2.5, 2.4, or 2.3 for electrons, muons [55,56], and tau leptons, respectively. For electrons or muons, the isolation criteria require that the pileup-corrected sum of the $p_\mathrm{T}$ of charged hadrons and neutral particles surrounding the lepton, divided by the lepton $p_\mathrm{T}$, be less than approximately 15 or 25%, respectively, depending on η [55,56]. Tau leptons, reconstructed by combining information from charged hadrons and $\pi^0$ candidates, are required to satisfy the loose working point of a multivariate-based identification discriminant that combines information on isolation and lifetime of the tau lepton [57].
For QCD events, the distribution of ρ is approximately independent of jet $p_\mathrm{T}$. To avoid departure from this invariance, only events with jets in the range −5.5 < ρ < −2.0 (−4.7 < ρ < −1.0) are considered for the AK8 (CA15) jets. As a result, the $m_\mathrm{SD}$ range under study depends on the jet $p_\mathrm{T}$. Nonperturbative effects are large at low masses and scale as $1/m_\mathrm{SD}$; this region is avoided by the lower bound on ρ. The upper bound is imposed to avoid instabilities that arise because the cone size of the jets is insufficient to provide complete containment at high masses [20].
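Because ρ is the logarithm of the squared ratio of the soft-drop mass to the jet transverse momentum, a window in ρ translates into a mass window that scales linearly with the jet transverse momentum, with mass equal to transverse momentum times exp(ρ/2). The short Python sketch below evaluates this relation; the function names are illustrative assumptions.

```python
import math


def rho(m_sd, pt):
    """Dimensionless scaling variable rho = ln(m_SD^2 / pT^2)."""
    return math.log(m_sd ** 2 / pt ** 2)


def mass_window(pt, rho_min, rho_max):
    """Soft-drop mass range (GeV) selected by rho_min < rho < rho_max at a given pT."""
    return pt * math.exp(rho_min / 2.0), pt * math.exp(rho_max / 2.0)


# AK8 window at the lowest and highest pT category boundaries.
for pt in (525.0, 1500.0):
    lo, hi = mass_window(pt, -5.5, -2.0)
    print(f"pT = {pt:6.0f} GeV: {lo:6.1f} < m_SD < {hi:6.1f} GeV")
```

For an AK8 jet at the 525 GeV threshold, the window −5.5 < ρ < −2.0 corresponds to roughly 34 to 193 GeV in soft-drop mass, and the window moves to higher masses as the jet transverse momentum increases.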
Finally, jets are required to have $N_2^{1,\mathrm{DDT}} < 0$. This selection rejects 95% of the multijet background independently of the jet mass and $p_\mathrm{T}$. Events failing this requirement, with $N_2^{1,\mathrm{DDT}} > 0$, are used in the background estimate from data described in the next section.

Background estimate
The background is dominated by QCD multijet events, with smaller contributions from W($\mathrm{q\bar{q}'}$)+jets, Z($\mathrm{q\bar{q}}$)+jets, and top quark processes. Backgrounds from other EW processes are found to be negligible.
The contributions from top quark pair ($\mathrm{t\bar{t}}$) and single top quark production are obtained from simulation. Scale factors correct the overall top quark background normalization and the $N_2^{1,\mathrm{DDT}}$ mistag efficiency for jets originating from top quark decays. These are computed from a dedicated $\mathrm{t\bar{t}}$-enriched control region in data, in which an isolated muon is required.
The dominant QCD multijet background, estimated from data, has a jet mass shape that depends on the jet $p_\mathrm{T}$. Because of the decorrelation of $N_2^{1,\mathrm{DDT}}$ from ρ and $p_\mathrm{T}$, the QCD jet mass distributions for events passing and failing the $N_2^{1,\mathrm{DDT}}$ selection exhibit the same smoothly falling shape. Thus, we can use the distribution of events failing the selection to constrain the distribution of QCD events passing the selection as

$$n^{\mathrm{QCD}}_{\mathrm{pass}} = R_{\mathrm{p/f}}(\rho, p_\mathrm{T})\, n^{\mathrm{QCD}}_{\mathrm{fail}},$$

where $n^{\mathrm{QCD}}_{\mathrm{pass}}$ and $n^{\mathrm{QCD}}_{\mathrm{fail}}$ are the numbers of passing and failing events in a given ($m_\mathrm{SD}$, $p_\mathrm{T}$) bin, and $R_{\mathrm{p/f}}$ is the "pass-to-fail ratio".
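The fraction of events, $p$, passing the $N_2^{1,\mathrm{DDT}}$ selection in simulated QCD multijet events is, by construction, 5% irrespective of ρ and $p_\mathrm{T}$. Therefore, the correction $R_{\mathrm{p/f}}$ is flat at $p = 5\%$ and $f = 95\%$ in the QCD background simulation. To account for residual differences between data and simulation, $R_{\mathrm{p/f}}$ is allowed to deviate from a constant. This deviation is modeled by parametrizing $R_{\mathrm{p/f}}$ as a function of ρ and $p_\mathrm{T}$ and expanding it in a Bernstein polynomial basis of the form

$$R_{\mathrm{p/f}}(\rho, p_\mathrm{T}) = \frac{p}{f} \sum_{k=0}^{n_\rho} \sum_{\ell=0}^{n_{p_\mathrm{T}}} a_{k\ell}\, b_{k,n_\rho}(\rho)\, b_{\ell,n_{p_\mathrm{T}}}(p_\mathrm{T}),$$

where $a_{k\ell}$ are the polynomial coefficients, and

$$b_{\nu,n}(x) = \binom{n}{\nu} x^{\nu} (1-x)^{n-\nu}$$

is a polynomial of degree $n$ in the Bernstein basis.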
The Bernstein basis is chosen over a standard polynomial because with the variable x bounded between 0 and 1 it is more stable numerically and the function is nonnegative.
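A minimal Python sketch of such a two-dimensional Bernstein expansion is given below. It assumes that ρ and the jet transverse momentum are mapped linearly onto the unit interval before the basis polynomials are evaluated, and that a single overall normalization (called `norm` here) multiplies the sum; the function names, argument layout, and example coefficients are illustrative assumptions rather than the analysis implementation.

```python
from math import comb


def bernstein(nu, n, x):
    """Bernstein basis polynomial b_{nu,n}(x) for x in [0, 1]."""
    return comb(n, nu) * x ** nu * (1.0 - x) ** (n - nu)


def to_unit_interval(value, low, high):
    """Linear map of [low, high] onto [0, 1] (assumed rescaling)."""
    return (value - low) / (high - low)


def pass_fail_ratio(rho, pt, coeffs, norm, rho_range, pt_range):
    """norm * sum_{k,l} a_kl * b_k(rho') * b_l(pt'), with coeffs[k][l] = a_kl."""
    n_rho = len(coeffs) - 1
    n_pt = len(coeffs[0]) - 1
    x = to_unit_interval(rho, *rho_range)
    y = to_unit_interval(pt, *pt_range)
    total = 0.0
    for k in range(n_rho + 1):
        for l in range(n_pt + 1):
            total += coeffs[k][l] * bernstein(k, n_rho, x) * bernstein(l, n_pt, y)
    return norm * total


# With every coefficient equal to one the expansion is flat, because the
# Bernstein basis polynomials of a given degree sum to one.
flat = [[1.0] * 4 for _ in range(6)]  # fifth order in rho, third order in pT
print(pass_fail_ratio(-3.0, 700.0, flat, 0.05 / 0.95, (-5.5, -2.0), (525.0, 1500.0)))
```

Two properties of the basis are visible in this sketch: unit coefficients reproduce a constant ratio, and nonnegative coefficients can never produce a negative ratio inside the fitted range.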
With the exception of $a_{00}$, which is fixed to unity by choice, the coefficients $a_{k\ell}$ and $p$ are unconstrained and determined together with the signal yield from a simultaneous fit to the data events passing and failing the $N_2^{1,\mathrm{DDT}}$ selection. The minimum number of coefficients needed to model the $R_{\mathrm{p/f}}$ shape is determined using a Fisher F-test on data [58]. The test is performed by iteratively comparing two parametrizations of $R_{\mathrm{p/f}}$, one with higher polynomial order than the other, and computing the expected change in the log likelihood, i.e. using the goodness of fit to build the F-statistic. To determine whether the polynomial order is sufficient, we compare the F-statistic observed in data to that computed from a set of simulated samples generated from the default fit model and fit with the higher-order polynomial using the background-only fit. If the higher-order parametrization provides a significantly better fit (p-value < 5%), we choose it as the new default. For the AK8 jets, the optimal parametrization is found to be third order in $p_\mathrm{T}$ and fifth order in ρ; for the CA15 jets, it is second order in ρ and fifth order in $p_\mathrm{T}$. The result is a slow variation of $R_{\mathrm{p/f}}$ over the $m_\mathrm{SD}$-$p_\mathrm{T}$ plane, with $p$ bounded between 4.5 and 6.5%. This allows one to estimate the background under a narrow signal resonance across the jet mass range under investigation. As an example, the parametric shape of $R_{\mathrm{p/f}}$ derived from data for the AK8 jet analysis is given in Appendix A as Fig. 7.
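The order-selection logic can be illustrated with a short Python sketch. It assumes one common form of the F-statistic for nested models, built from goodness-of-fit values (for example −2 ln L) of the lower- and higher-order fits, and takes the p-value as the fraction of pseudo-experiments with an F-statistic at least as large as the observed one. The exact statistic used in the analysis is not spelled out in the text, so the formula, the function names, and the numbers below are assumptions for illustration only.

```python
def f_statistic(gof_low, gof_high, npar_low, npar_high, n_bins):
    """F-statistic comparing a lower-order fit to a higher-order fit.

    gof_low / gof_high: goodness-of-fit values (e.g. -2 ln L) of the two fits.
    npar_low / npar_high: number of free parameters in each fit.
    n_bins: number of bins entering the fit.
    """
    numerator = (gof_low - gof_high) / (npar_high - npar_low)
    denominator = gof_high / (n_bins - npar_high)
    return numerator / denominator


def toy_p_value(f_observed, f_toys):
    """Fraction of pseudo-experiments with F at least as large as observed."""
    return sum(f >= f_observed for f in f_toys) / len(f_toys)


def prefer_higher_order(f_observed, f_toys, threshold=0.05):
    """Adopt the higher-order model only if the improvement is significant."""
    return toy_p_value(f_observed, f_toys) < threshold


# Hypothetical numbers: the extra parameters improve the fit only modestly.
f_obs = f_statistic(gof_low=104.0, gof_high=100.0, npar_low=3, npar_high=5, n_bins=105)
toys = [0.4, 1.1, 2.5, 0.9, 3.2, 1.7, 0.2, 2.1, 1.3, 0.6]
print(f_obs, toy_p_value(f_obs, toys), prefer_higher_order(f_obs, toys))
```

Repeating this comparison while increasing the polynomial order in one variable at a time, and stopping when the higher order is no longer preferred, yields the minimal parametrization.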
In order to validate the robustness of the fit and its associated systematic uncertainties, we perform a goodness-of-fit test and signal injection studies on background-only fits that estimate the possible bias on the background estimate due to the presence of a signal. We generate pseudo-experiments, with and without the injection of simulated signal, and then fit them with the signal-plus-background model, for different values of the $\mathrm{Z}'$ boson mass. No significant bias in the fitted signal strength is observed. As a further test of the robustness of the $R_{\mathrm{p/f}}$ fit, we split the subset of events failing the $N_2^{1,\mathrm{DDT}}$ selection into two smaller subsets mimicking the passing and failing selection in the data fit. The mimicked passing selection also rejects 95% of the QCD background events in the failing region. We repeat our background estimation procedure on this selection and use the coefficients $a_{k\ell}$ from this fit to generate pseudo-experiments. We then fit the data with the signal-plus-background model and find the biases in the fitted signal strength to be negligible.

Systematic uncertainties
The dominant uncertainty in this analysis is the uncertainty in the fit for $R_{\mathrm{p/f}}$, as described in Eq. 2 (1-3%), arising from the parameters $a_{k\ell}$, and the statistical uncertainty on the data in the [...]

The systematic uncertainties in the shapes and normalization of the W and Z boson backgrounds and the signal are correlated since they are affected by similar systematic effects. The uncertainties in the jet mass scale and resolution, and the $N_2^{1,\mathrm{DDT}}$ selection efficiency, are estimated using an independent sample of merged W boson jets in semileptonic $\mathrm{t\bar{t}}$ events in data. In this region, we require events to have an energetic muon with $p_\mathrm{T} > 100$ GeV, $p_\mathrm{T}^{\mathrm{miss}} > 80$ GeV, a high-$p_\mathrm{T}$ AK8 (CA15) jet with $p_\mathrm{T} > 200$ GeV, and an additional jet separated from the AK8 (CA15) jet by ΔR > 0.8 (1.5). The efficiency of the $N_2^{1,\mathrm{DDT}} < 0$ requirement is measured in simulation and data by fitting the W boson mass peak in the jet mass distribution for events passing and failing this requirement in the control region. This efficiency is used to correct the overall yields for resonant backgrounds obtained from simulation in the signal region and is measured to be 0.90 ± 0.09 (1.02 ± 0.06) for AK8 (CA15) jets. The jet mass resolution data-to-simulation scale factor is measured to be 1.1 ± 0.1 for both AK8 and CA15 jets. The jet mass scales in data and simulation are found to be consistent within 1%. The variation of the jet mass scale with jet $p_\mathrm{T}$ is studied using large-cone-size jets. At high momenta ($p_\mathrm{T} > 350$ GeV) the decay products of the top quark are contained in a single jet, and the $m_\mathrm{SD}$ distribution exhibits a top quark peak. By performing simultaneous fits to data and simulation of this peak binned in $p_\mathrm{T}$, a small (1%) variation in jet mass scale is observed and applied in the fit as an additional $p_\mathrm{T}$-dependent nuisance parameter.
These scale factors determine the initial shape and normalization of the jet mass distribution for the W, Z boson, and signal but they are further constrained in the fit to data because of the presence of the W and Z resonances in the jet mass distribution.
To account for potential deviations due to missing higher-order corrections, uncertainties are applied to the W and Z boson yields. These uncertainties increase with the jet $p_\mathrm{T}$ and are correlated per $p_\mathrm{T}$ bin. An additional systematic uncertainty is included to account for potential differences between the W and Z boson higher-order corrections (NLO EW W/Z decorrelation). The uncertainty associated with the modeling of the $\mathrm{Z}'$ boson $p_\mathrm{T}$ spectrum when considering extra jets in the generation is propagated to the overall normalization of the $\mathrm{Z}'$ signal. Finally, uncertainties associated with the jet energy resolution [46], trigger efficiency, variations in the amount of pileup, and the integrated luminosity determination [59] are also applied to the W, Z, and $\mathrm{Z}'$ boson signal yields.
A quantitative summary of the systematic effects considered for signal and W/Z boson background processes is given in Table 1.

Table 1: Summary of the systematic uncertainties for signal ($\mathrm{Z}'$) and W/Z boson background processes, for AK8 and CA15 jet reconstruction. The reported ranges denote a variation of the uncertainty across $p_\mathrm{T}$ bins, from 525 to 1500 GeV (AK8 jets) and from 575 to 1500 GeV (CA15 jets). The symbol denotes uncorrelated uncertainties for each $p_\mathrm{T}$ bin. For the uncertainties related to the jet mass scale and resolution, the reported percentage reflects a one standard deviation effect on the nominal jet mass shape. A long dash (-) indicates that the uncertainty does not apply.

Results
A binned maximum likelihood fit to the shape of the observed $m_\mathrm{SD}$ distribution is performed using the sum of the $\mathrm{Z}'$ signal, W, Z, $\mathrm{t\bar{t}}$, and QCD contributions. We search for a signal from a $\mathrm{Z}'$ resonance in the mass range from 50 to 450 GeV. Signal shapes are taken directly from simulation. The fit is performed simultaneously in the passing and failing regions of five (four) $p_\mathrm{T}$ categories for AK8 (CA15) jets, as well as in the passing and failing components of the $\mathrm{t\bar{t}}$-enriched control region. The boundaries of the $p_\mathrm{T}$ categories are: 525, 575, 625, 700, 800, and 1500 GeV for the AK8 jets and 575, 625, 700, 800, and 1500 GeV for the CA15 jets. The bin boundaries are chosen so that approximately the same number of events is used to constrain $R_{\mathrm{p/f}}$ in each $p_\mathrm{T}$ bin.
The number of observed events is consistent with the predicted background from SM processes. Figure 2 shows the $m_\mathrm{SD}$ distribution for data and measured background contributions for AK8 jets in each $p_\mathrm{T}$ category of the fit for a $\mathrm{Z}'$ mass hypothesis of 150 GeV. Figure 3 shows the distributions for CA15 jets in each category for a $\mathrm{Z}'$ mass hypothesis of 210 GeV. For AK8 jets, the W and Z boson contributions are clearly visible as a merged peak in the data, while for CA15 jets, due to the ρ selection and increased QCD background, the W/Z contributions are only visible in the lower $p_\mathrm{T}$ categories.
The results of the fit are used to set 95% confidence level (CL) upper limits on the $\mathrm{Z}'$ boson coupling to quarks $g_\mathrm{q}$, which is related to the $\mathrm{Z}'$ coupling convention of Ref. [24] by $g_\mathrm{q} = g_\mathrm{B}/6$. Upper limits are computed using the modified frequentist approach for confidence levels (CL$_\mathrm{s}$), taking the profile likelihood ratio as the test statistic [60, 61] in the asymptotic approximation [62]. Systematic uncertainties are incorporated as nuisance parameters and profiled over in the limit calculations, using log-normal priors for normalization uncertainties and Gaussian constraints for shape uncertainties. The dominant uncertainty on the $g_\mathrm{q}$ limit arises from the fit parameters of $R_{\mathrm{p/f}}$, followed by the theoretical uncertainties on the signal yield due to missing NLO QCD corrections.
Limits on $g_\mathrm{q}$ as a function of the $\mathrm{Z}'$ boson mass are shown in Fig. 4, using only data collected in 2017. Based on the expected sensitivity, the AK8 and CA15 jet selections are used for signal masses below and above 175 GeV, respectively. Coupling values above the solid curves are excluded at the 95% CL. The maximum local observed p-value corresponds to 2.9 standard deviations at a $\mathrm{Z}'(\mathrm{q\bar{q}})$ mass of 200 GeV. The largest downward fluctuation in the limits occurs at a $\mathrm{Z}'(\mathrm{q\bar{q}})$ mass of 60 GeV, corresponding to a local significance of −3 standard deviations. A loss of sensitivity of 20%, relative to the results set by the previous search [20], is observed, due to the higher $p_\mathrm{T}$ threshold determined by the trigger turn-on for the 2017 data set.
We summarize the results of this paper in the mass vs. coupling plane in Fig. 5. For masses between 50 and 220 GeV, the most restrictive limits for this search are obtained from the statistical combination of the upper limits set by the 2016 and 2017 data sets using AK8 jets. The limits correspond to a total integrated luminosity of 77.0 fb$^{-1}$ and are the most sensitive to date. For higher masses, between 220 and 450 GeV, the most stringent limits come from the analysis of 2017 data using CA15 jets, corresponding to an integrated luminosity of 41.1 fb$^{-1}$. In the mass range between 220 and 300 GeV these limits are also the most sensitive to date. For comparison, less sensitive limits set by the AK8 jet analysis in the range from 220 to 300 GeV, using the combined data sets recorded in 2016 and 2017, are presented in Fig. 8 of Appendix A. The sensitivity is driven by the multijet background uncertainty on the parametric fit of $R_{\mathrm{p/f}}$, which is modeled with different polynomial orders for the 2016 and 2017 data sets. A local excess in the observed limit over the expected limit, corresponding to 2.9 standard deviations, was ob[...]

Summary
A search for a narrow vector resonance ($\mathrm{Z}'$) decaying into a quark-antiquark pair and reconstructed as a single jet, with the topology of a resonance recoiling against initial-state radiation, has been presented. The analysis uses a data set of proton-proton collisions at $\sqrt{s} = 13$ TeV collected in 2017 at the LHC, corresponding to an integrated luminosity of 41.1 fb$^{-1}$. The results are statistically combined with those obtained with data collected in 2016 to achieve more sensitive exclusion limits with a total integrated luminosity of 77.0 fb$^{-1}$. Jet substructure techniques are employed to identify a jet containing a $\mathrm{Z}'$ boson candidate over a smoothly falling jet mass distribution in data. No significant excess above the standard model prediction is observed. Upper limits at 95% confidence level are set on the $\mathrm{Z}'$ boson coupling to quarks, $g_\mathrm{q}$, as a function of the $\mathrm{Z}'$ boson mass. Coupling values of $g_\mathrm{q} > 0.4$ are excluded over the signal mass range from 50 to 450 GeV, with the most stringent constraints set for masses below 250 GeV, where coupling values of $g_\mathrm{q} > 0.2$ are excluded. For masses between 50 and 300 GeV these are the most sensitive limits to date. The results obtained for masses from 300 to 450 GeV represent the first direct limits to be published in this range for a leptophobic $\mathrm{Z}'$ signal reconstructed as a single large-radius jet.

[57] CMS Collaboration, "Performance of reconstruction and identification of τ leptons decaying to hadrons and $\nu_\tau$ in pp collisions at $\sqrt{s} = 13$ TeV", JINST 13 (2018) P10005, doi:10.1088/1748-0221/13/10/P10005, arXiv:1809.02816.
[60] T. Junk, "Confidence level computation for combining searches with small statistics", Nucl.

[...] variable for AK8 jets (right) and CA15 jets (left), corresponding to the 5% quantile of the $N_2^1$ distribution in simulated multijet events. The distributions are shown as a function of the jet ρ and $p_\mathrm{T}$. The $N_2^1$ variable is mostly insensitive to the jet ρ and $p_\mathrm{T}$ in the kinematic phase space considered for this analysis: −5.5 < ρ < −2.0 (AK8 jets) and −4.7 < ρ < −1.0 (CA15 jets). The distributions of $X_{(5\%)}$ are used to take into account residual correlations in simulation by applying a decorrelation procedure that yields the $N_2^{1,\mathrm{DDT}}$ variable. In order to ensure smoothness of the transformation, we simulate particle-level QCD multijet events and smear them using a parametric detector response derived for the $N_2^1$ variable as a function of ρ and $p_\mathrm{T}$. This method overcomes the limited event count in simulated samples by generating $10^4$ times the original number of events available in the multijet simulation.

[...] is constructed so that, for simulated multijet events, $R_{\mathrm{p/f}}$ is constant at $p = 5\%$ and $f = 95\%$ (blue). To account for residual differences between data and simulation, $R_{\mathrm{p/f}}$ is extracted by performing a two-dimensional fit to data in (ρ, $p_\mathrm{T}$) space (orange). The $R_{\mathrm{p/f}}$ shown is derived for AK8 jets using 41.1 fb$^{-1}$ of data collected in 2017 and corresponds to a polynomial in the Bernstein basis of third order in $p_\mathrm{T}$ and fifth order in ρ.