Search for narrow H$\gamma$ resonances in proton-proton collisions at $\sqrt{s} =$ 13 TeV

A search for heavy, narrow resonances decaying to a Higgs boson and a photon (H$\gamma$) has been performed in proton-proton collision data at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$ collected with the CMS detector at the LHC in 2016. Events containing a photon and a Lorentz-boosted hadronically decaying Higgs boson that is reconstructed as a single, large-radius jet are considered, and the $\gamma$+jet invariant mass spectrum is analyzed for the presence of narrow resonances. To increase the sensitivity of the search, events are categorized depending on whether the large-radius jet can be identified as a result of the merging of two jets originating from b quarks. Results in both categories are found to agree with the predictions of the standard model. Upper limits on the production of H$\gamma$ resonances are set as a function of the resonance mass in the range of 720-3250 GeV, representing the most stringent constraints to date.

In this Letter, we extend the above program by reporting on a search for narrow resonances decaying into a Higgs boson and a photon (Hγ). The existence of such resonances has been recently predicted [38], using a Z′ → Hγ decay at one-loop level as an example, which we chose as the signal benchmark in the present analysis. Here, Z′ is a U(1) spin-1 boson, similar to the SM Z boson. For the decay of a sufficiently massive Z′ boson via this channel, the Higgs boson would be produced with a significant Lorentz boost. If the Higgs boson decays into a pair of b quarks, then its final state will be reconstructed as a single large-radius jet with a characteristic substructure that can be exploited to distinguish those jets from jets originating from gluons or light-flavor quarks. The background to this process is dominated by SM γ+jet production, and the application of "b-tagging" techniques to identify jets originating from b quarks ("b quark jets" or "b jets") can reduce the backgrounds by nearly two orders of magnitude. The remaining backgrounds come from nonresonant $\mathrm{b\bar{b}}\gamma$ production, from $\mathrm{q\bar{q}}\gamma$ production with the light-flavor quarks incorrectly identified as b quarks, and from multijet production with light-flavor jets mistagged as b quark jets in addition to a jet misreconstructed as a photon. Very recently, when the work reported here was nearing completion, the ATLAS Collaboration reported on a first search in this channel [39].
The analysis reported in this Letter is performed with data corresponding to an integrated luminosity of 35.9 fb$^{-1}$ recorded with the CMS detector at the LHC in proton-proton (pp) collisions at a center-of-mass energy of 13 TeV during 2016. We focus on Z′ masses above ≈700 GeV, where the Higgs boson is produced with a sufficient Lorentz boost to be efficiently reconstructed as a single, large-radius jet. The analysis uses control samples in data to predict the shape of the background and to optimize the analysis. The background is obtained via a fit to data with a functional form determined using these control samples, as well as simulation, and a signal is searched for as a narrow resonance on top of a smoothly falling background. To increase the sensitivity of the search, events are categorized depending on whether the large-radius jet can be identified as a result of the merging of two jets originating from b quarks. This categorization uses an advanced b tagging technique developed specifically for searches involving Lorentz-boosted Higgs bosons decaying into $\mathrm{b\bar{b}}$ pairs [40].
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. Events of interest are selected using a two-tiered trigger system [41]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 µs. The second level, referred to as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to less than 1 kHz before data storage. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [42].
Online, events are selected using a logical OR of two triggers: one requiring a photon candidate with a transverse momentum ($p_{\mathrm{T}}$) greater than 175 GeV, and the other requiring a photon with $p_{\mathrm{T}}$ > 165 GeV that, in addition, must have a ratio of the energy deposits in the HCAL and ECAL below 0.1.
Offline, events are reconstructed using a particle-flow (PF) algorithm [43], which aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. The energy of photons is directly obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as measured by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The momentum of muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for zero-suppression effects and for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energy.
The events must contain at least one reconstructed primary vertex [44] with at least four associated tracks, with transverse (longitudinal) coordinates required to be within 2 (24) cm of the nominal collision point. The reconstructed vertex with the largest value of summed physics-object $p_{\mathrm{T}}^{2}$ is taken to be the primary pp interaction vertex. The physics objects are the jets, clustered using the infrared and collinear safe anti-$k_{\mathrm{T}}$ algorithm [45,46] with the tracks assigned to the vertex as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the $p_{\mathrm{T}}$ of those jets.
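The vertex requirements above can be illustrated with a minimal sketch. The `Vertex` container and its field names are hypothetical, and applying the quality requirements before picking the vertex with the largest summed $p_{\mathrm{T}}^{2}$ is an assumption about the ordering, not a description of the CMS software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vertex:
    """Hypothetical container for a reconstructed vertex (illustration only)."""
    n_tracks: int            # number of tracks associated with the vertex
    rho_cm: float            # transverse distance from the nominal collision point (cm)
    z_cm: float              # longitudinal distance from the nominal collision point (cm)
    object_pts: List[float]  # pT (GeV) of the physics objects built from its tracks


def is_good_vertex(v: Vertex) -> bool:
    # At least four tracks, within 2 cm transversely and 24 cm longitudinally.
    return v.n_tracks >= 4 and abs(v.rho_cm) < 2.0 and abs(v.z_cm) < 24.0


def summed_pt2(v: Vertex) -> float:
    # Sum of squared transverse momenta of the associated physics objects.
    return sum(pt * pt for pt in v.object_pts)


def choose_primary_vertex(vertices: List[Vertex]) -> Optional[Vertex]:
    # Assumption: quality cuts are applied first, then the largest sum of pT^2 wins.
    good = [v for v in vertices if is_good_vertex(v)]
    if not good:
        return None
    return max(good, key=summed_pt2)
```

An event for which `choose_primary_vertex` returns `None` would be rejected in this sketch.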
For each event, hadronic jets are clustered from PF candidates using the anti-$k_{\mathrm{T}}$ algorithm, with a distance parameter of 0.8 ("large-radius jet," or J), as implemented in the FASTJET package [46]. The jet momentum is determined as the vectorial sum of all particle momenta in the jet, and is found from simulation to be within 5 to 10% of the true momentum over the whole $p_{\mathrm{T}}$ spectrum and detector acceptance. Additional pp interactions within the same or nearby bunch crossings (pileup) can contribute additional tracks and calorimetric energy depositions to the jet momentum. To mitigate this effect, the pileup-per-particle identification (PUPPI) algorithm [47] is applied to the PF candidates in the event prior to jet clustering. The correction applied to the PF candidates is a rescaling of the candidate's four-momentum based on a variable that estimates the probability that it originated from pileup or from the primary vertex. Jet energy corrections are derived from simulation to bring the measured response of jets to that of particle-level jets on average. In situ measurements of the momentum balance in dijet, multijet, γ+jet, and leptonically decaying Z+jet events are used to account for any residual differences in jet energy scale in data and simulation [48]. The jet energy resolution amounts typically to 15% at 10 GeV, 8% at 100 GeV, and 4% at 1000 GeV. Additional selection criteria are applied to each jet to remove jets potentially dominated by anomalous contributions from various subdetector components or reconstruction failures [49].
Photons are required to pass identification via a multivariate analysis (MVA) classifier [50]. The inputs to the MVA include shower shape variables, isolation sums computed from PF candidates in a cone of radius ΔR = 0.3 in the η-φ plane, centered on the photon candidate, and variables that account for the dependencies of the shower shape and isolation variables on pileup. In addition, a conversion-safe electron veto [50] is applied. Photon candidates are required to pass an MVA working point that corresponds to a typical photon reconstruction and identification efficiency of 90% in the photon $p_{\mathrm{T}}$ range used in the analysis.
At the preselection level, events are required to contain at least one photon candidate with $p_{\mathrm{T}}^{\gamma}$ > 200 GeV and $|\eta^{\gamma}|$ < 2.4, excluding the transition region between the ECAL barrel and endcap, where the reconstruction is not optimal. At least one large-radius jet with $p_{\mathrm{T}}^{\mathrm{J}}$ > 250 GeV and $|\eta^{\mathrm{J}}|$ < 2.6 is also required. The photon and jet must be separated by ΔR(γ, J) > 1.1, so as to ensure no overlap between the jet and the photon isolation cone. The triggers are found to be more than 98% efficient for the signal events passing the preselection. This slight trigger inefficiency is observed only at the low end of the Jγ invariant mass spectrum, below 750 GeV, and the events in data are weighted by the inverse of the trigger efficiency in this range to account for this effect.
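As an illustration of the preselection logic, the following minimal sketch applies the photon, jet, and separation requirements to objects given as (pT, η, φ) tuples. The function names are hypothetical, and the transition-region boundaries in `ECAL_GAP` are an assumption made for the example (the text does not quote them at this point).

```python
import math

# Assumed |eta| boundaries of the ECAL barrel-endcap transition region
# (illustrative values, not quoted in the preselection description).
ECAL_GAP = (1.44, 1.57)


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation in the eta-phi plane, with phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def photon_ok(pt, eta):
    abs_eta = abs(eta)
    in_gap = ECAL_GAP[0] < abs_eta < ECAL_GAP[1]
    return pt > 200.0 and abs_eta < 2.4 and not in_gap


def jet_ok(pt, eta):
    return pt > 250.0 and abs(eta) < 2.6


def passes_preselection(photon, jet):
    """photon and jet are (pt [GeV], eta, phi) tuples."""
    g_pt, g_eta, g_phi = photon
    j_pt, j_eta, j_phi = jet
    return (
        photon_ok(g_pt, g_eta)
        and jet_ok(j_pt, j_eta)
        and delta_r(g_eta, g_phi, j_eta, j_phi) > 1.1
    )
```

The φ wrapping matters for back-to-back topologies, where the photon and jet sit on opposite sides of the φ = ±π boundary.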
In addition, to get a more precise estimate of the Higgs boson jet invariant mass, a jet grooming algorithm known as "soft drop" (SD) [51] is applied. The SD algorithm, which is an extension of a modified mass drop tagger [52], reclusters the jet, starting from its original constituents using the Cambridge-Aachen algorithm [53], and removes soft, wide-angle components of the jet at each step in the clustering process. The following SD grooming parameters are used: $z_{\mathrm{cut}}$ = 0.1 and β = 0, where $z_{\mathrm{cut}}$ specifies the trailing subjet $p_{\mathrm{T}}$ relative to the whole jet $p_{\mathrm{T}}$ at which the jet declustering into subjet pairs is stopped. The parameter β imposes additional angular requirements on the jet declustering; for β = 0 these requirements are neglected. The groomed jet mass ($m_{\mathrm{J}}^{\mathrm{SD}}$) is computed from the sum of the four-momenta of the remaining constituents, which are corrected with the same factor as has already been used in the generic jet reconstruction described above.
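The stopping criterion can be sketched as follows, using the standard soft-drop condition min(pT1, pT2)/(pT1 + pT2) > z_cut (ΔR12/R0)^β. This is a simplified illustration, not the FASTJET implementation: the function names are hypothetical, and the declustering history is represented by a precomputed list of splittings along the harder branch rather than by an actual Cambridge-Aachen reclustering.

```python
def soft_drop_condition(pt1, pt2, delta_r12, z_cut=0.1, beta=0.0, r0=0.8):
    """True if the splitting passes soft drop, i.e. the declustering stops here."""
    z = min(pt1, pt2) / (pt1 + pt2)
    # For beta = 0 the angular factor equals 1 and only the momentum sharing matters.
    return z > z_cut * (delta_r12 / r0) ** beta


def first_passing_splitting(splittings, **params):
    """Walk the declustering history from the widest splitting inward.

    `splittings` is a list of (pt_subjet_a, pt_subjet_b, delta_r) tuples ordered
    from the last clustering step inward along the harder branch. Returns the
    index of the first splitting that satisfies the condition, or None if the
    jet is groomed away entirely.
    """
    for index, (pt_a, pt_b, dr) in enumerate(splittings):
        if soft_drop_condition(pt_a, pt_b, dr, **params):
            return index
    return None
```

With β = 0, a soft wide-angle emission carrying 4% of the momentum is dropped, while a balanced two-prong splitting, as expected for a boosted H → $\mathrm{b\bar{b}}$ decay, is kept.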
To take advantage of the dominant Higgs boson decay mode, H → $\mathrm{b\bar{b}}$, we further classify the events according to the output of the dedicated double b tagging (DBT) algorithm [40], which attempts to identify a two-prong substructure within a large-radius jet, as well as the likelihood that the two subjets originate from b quarks. For the latter, several associated variables are used in an MVA tagging algorithm. These variables include the significance of the impact parameters of the tracks relative to the primary vertex, the number and masses of secondary vertices, and a variable that characterizes the system of two secondary vertices, taking into account kinematic properties that help to distinguish $\mathrm{b\bar{b}}$ pairs produced in massive-particle decays from those originating from gluon splitting. We define two event categories using the "tight" working point of the DBT algorithm [40], which corresponds to a 36% tagging efficiency and a ≈1% mistag probability per event. This working point was found to give the greatest signal sensitivity at lower signal masses, below ≈2000 GeV, as it provides powerful background rejection. The events with the leading jet passing the tight working point of the DBT algorithm are assigned to the "b-tagged" category, while the rest are classified as "untagged". The untagged category allows the search to maintain optimal sensitivity to high-mass resonances (above ≈2000 GeV), for which the background is small and at the same time the DBT efficiency deteriorates compared to that at low masses, as the tracks originating from the secondary vertices become more collimated and harder to resolve.
For each category, two regions in preselected data are defined: the search region (SR), in which the invariant mass of the leading jet is required to be 110 < $m_{\mathrm{J}}^{\mathrm{SD}}$ < 140 GeV, centered on the nominal Higgs boson mass, and the sideband (SB) region, which requires 100 < $m_{\mathrm{J}}^{\mathrm{SD}}$ < 110 GeV. The SB region is chosen because it reproduces the shape of the distributions of dominant backgrounds in the SR for the kinematic variables used in the analysis, as seen in data and also confirmed by the Monte Carlo (MC) simulation. Consequently, the SB region can be used to predict the background shape in the SR, and therefore to optimize the analysis without reliance on background simulation. The lower boundary of the SB region is chosen so as to avoid contamination from W and Z dijet decays, and the upper boundary is chosen far enough from the nominal Higgs boson mass to minimize possible signal contamination. Since the mass of a large-radius jet is loosely correlated with its $p_{\mathrm{T}}$, we find that with the preselection kinematic requirements, an adequate description of the background shape in the SR is achieved via the SB region for Jγ masses above 720 GeV. Therefore, we search for narrow Hγ resonances in the 720-3250 GeV range, with the upper bound chosen where we expect to see no events in data in either of the two categories.
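The category and region assignment can be summarized in a short sketch. The function names are hypothetical, the DBT decision is taken as a precomputed boolean, and the treatment of events exactly on a mass boundary (half-open intervals) is an assumption, since the text quotes strict inequalities.

```python
from typing import Optional, Tuple


def mass_region(m_sd: float) -> Optional[str]:
    """Assign the search region (SR) or sideband (SB) from the groomed jet mass in GeV."""
    if 110.0 <= m_sd < 140.0:
        return "SR"
    if 100.0 <= m_sd < 110.0:
        return "SB"
    return None  # outside both regions: the event is not used


def categorize(m_sd: float, passes_tight_dbt: bool) -> Optional[Tuple[str, str]]:
    """Return (category, region) for the leading large-radius jet, or None."""
    region = mass_region(m_sd)
    if region is None:
        return None
    category = "b-tagged" if passes_tight_dbt else "untagged"
    return category, region
```

For example, `categorize(125.0, True)` gives `("b-tagged", "SR")`, while a jet with a groomed mass of 90 GeV is assigned to neither region.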
Simulated Z′ signal samples are generated based on the benchmark model of Ref. [38], as implemented in the MADGRAPH5_aMC@NLO 2.3.3 [54] MC generator. Signals are generated at leading order (LO) in the mass range of 650-3250 GeV, in steps of 100 GeV (650-850 GeV), 150 GeV (850-2050 GeV), or 400 GeV (2050-3250 GeV). The intrinsic width of the signal is chosen to be negligibly small compared to the experimental resolution. The signal shape is parametrized with the Crystal Ball function [55], which provides an adequate description in the entire mass range considered. For other values of the mass, the signal shape is smoothly interpolated between the Crystal Ball function parameter values derived for the simulated mass points [56]. The interpolation procedure was shown to reproduce the correct signal shape for a number of specific simulated mass points.
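For reference, a minimal sketch of an unnormalized Crystal Ball shape is given below: a Gaussian core joined continuously to a power-law tail. Placing the tail on the low-mass side and interpolating the parameters linearly between neighboring mass points are assumptions made for illustration; the interpolation actually used is that of Ref. [56], and all parameter values in the usage note are hypothetical.

```python
import math


def crystal_ball(x, mean, sigma, alpha, n):
    """Unnormalized Crystal Ball shape with a power-law tail on the low side.

    Requires sigma > 0, alpha != 0, and n > 0.
    """
    t = (x - mean) / sigma
    a = abs(alpha)
    if t > -a:
        # Gaussian core
        return math.exp(-0.5 * t * t)
    # Power-law tail, matched to the core in value and slope at t = -a
    coeff = (n / a) ** n * math.exp(-0.5 * a * a)
    return coeff * (n / a - a - t) ** (-n)


def interpolate_params(mass, m_lo, p_lo, m_hi, p_hi):
    """Linear interpolation of shape parameters between two simulated mass points.

    p_lo and p_hi are dicts with the same keys (assumed parametrization).
    """
    frac = (mass - m_lo) / (m_hi - m_lo)
    return {key: p_lo[key] + frac * (p_hi[key] - p_lo[key]) for key in p_lo}
```

A call such as `interpolate_params(1075.0, 1000.0, {"sigma": 30.0}, 1150.0, {"sigma": 36.0})` returns a width halfway between the two hypothetical inputs, which can then be passed to `crystal_ball`.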
While the background estimates come from data, we use simulated background samples to check that the kinematic variables used in the analysis optimization are adequately described by the data in the SB region. The following simulated background samples are used: multijets, γ+jets, W+jets, and Z/γ*+jets, all generated at LO with MADGRAPH5_aMC@NLO. All the background samples are normalized using next-to-leading order cross sections and the integrated luminosity of the data sample.
The NNPDF3.0 [57] parton distribution functions (PDFs) are used for all simulated samples. The fragmentation and hadronization are described with PYTHIA version 8.205 [58] using the underlying event tune CUETP8M1 [59]. The CMS detector response is simulated using GEANT4 [60]. All simulated samples include the effects of pileup by superimposing simulated minimum bias collisions on the hard scattering events, with their multiplicity matching that observed in data.
Scale factors are applied to simulated samples to remove discrepancies between various efficiencies in simulation compared to those in data. These scale factors range from 0.85 to 0.91 [40] for the DBT efficiency, and are ≈0.99 [50] for the photon identification efficiency.
Further requirements on the kinematic variables of selected events are applied in order to ensure optimal sensitivity for signal models with resonance masses between 720 and 3250 GeV. These selection criteria are chosen in an iterative fashion, whereby the requirements on all variables except one (the "variable of interest") are held constant, while the requirement on the variable of interest is varied to optimize the signal sensitivity. The background estimate for the optimization studies comes from the SB region, normalized to the overall number of events in the SR. The optimization is performed separately for the b-tagged and untagged categories. Both the signal significance and the signal limit optimization were checked, and in most of the cases the optimal points are the same. In the few cases where the two optima are different, the change in the signal significance when using the best limit point is marginal. Several kinematic variables were considered, including the N-subjettiness variables [61], characterizing the substructure of the jet. As a result, the following additional requirements were chosen on top of the preselection for both b-tagged and untagged categories: the photon must be found in the ECAL barrel (|η| < 1.44), the leading jet must have |η| < 2.2, ensuring that the core of the jet is within the tracker coverage, and the ratio of the photon $p_{\mathrm{T}}$ to the mass of the Jγ system, $p_{\mathrm{T}}^{\gamma}/m_{\mathrm{J}\gamma}$, must exceed 0.35. The latter requirement suppresses photons from the γ+jet background, which tend to be more forward than the signal photons. Selection on other variables, including the jet substructure ones, does not improve the sensitivity of the analysis appreciably. The final selections are summarized in Table 1.
Shown in Fig. 1 are the products of signal acceptance and efficiency versus generated signal mass for the two analysis categories, evaluated for each of the simulated signal samples, and a fit function used to interpolate between the generated mass points. In the b-tagged category, the product of the overall acceptance and the reconstruction, trigger, and full selection efficiency for signal events increases from about 3% at low signal masses to a peak of 6% near a signal invariant mass of 1500 GeV, and decreases thereafter to about 5% for high signal masses. The observed behavior at high masses is due to the degradation in the DBT efficiency caused by the decreasing angular separation between the two subjets corresponding to the H → $\mathrm{b\bar{b}}$ decay products [40]. For the untagged category, the corresponding product of acceptance and efficiency increases from about 7% at low signal masses to around 16% for high signal masses. The main factors that impact the acceptance are the 57% SM Higgs boson branching fraction to $\mathrm{b\bar{b}}$ and the fact that about 35% of the signal large-radius jets fail the 110 < $m_{\mathrm{J}}^{\mathrm{SD}}$ < 140 GeV SR requirement.
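The additional kinematic requirements of the final selection (barrel photon, central jet, and photon pT relative to the Jγ mass) can be sketched as follows. The helper names are hypothetical; the Jγ invariant mass is computed from four-vectors built from (pT, η, φ, mass), with the photon taken as massless.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Return (E, px, py, pz) in GeV from pT, pseudorapidity, azimuth, and mass."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(a, b):
    """Invariant mass of the sum of two four-vectors."""
    energy, px, py, pz = (x + y for x, y in zip(a, b))
    m2 = energy * energy - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


def passes_final_selection(photon, jet):
    """photon = (pt, eta, phi); jet = (pt, eta, phi, mass). Units: GeV."""
    g_pt, g_eta, g_phi = photon
    j_pt, j_eta, j_phi, j_mass = jet
    m_jgamma = invariant_mass(
        four_vector(g_pt, g_eta, g_phi, 0.0),
        four_vector(j_pt, j_eta, j_phi, j_mass),
    )
    if m_jgamma <= 0.0:
        return False
    return abs(g_eta) < 1.44 and abs(j_eta) < 2.2 and g_pt / m_jgamma > 0.35
```

In a back-to-back central configuration the ratio is close to 0.5, whereas a large pseudorapidity gap between the photon and the jet raises the Jγ mass at fixed photon pT and lowers the ratio, which is how the requirement disfavors forward topologies.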
After the final selection, the background shape in each category is modeled by fitting a smooth, monotonically falling function to the Jγ invariant mass spectrum in the SR. A variety of functional forms are considered for the background fit, based on the SB region data and on simulated background samples. For every function, a goodness-of-fit (GOF) test known as the method of saturated models [62] is performed in the SR. The nominal background fit function is then chosen as the one with the best GOF with the minimal number of parameters. The selection of the nominal fit function is performed independently for the b-tagged and untagged categories. In both categories, the following function was found to give the best GOF to the Jγ invariant mass spectrum in the SR:

[Equation (1)]

where $p_i$, $i = 0, 1, 2$, are the free parameters of the fit.
To verify that no systematic bias arises from the choice of the background fit function, a number of tests are performed. An alternative fit function that performed well in the GOF test is taken as the probability distribution function for the background Jγ invariant mass spectrum and is used to generate a large number of pseudo-data spectra, with or without signal injection. The spectra are then fit to the sum of the chosen background template given by Eq. (1) and a signal with the mass and normalization allowed to float in the fit. The signal significance is extracted from each pseudo-data set and the distributions of the pull of the signal yield are constructed, where the pull is defined as the difference between the injected and extracted signal normalizations, divided by the statistical uncertainty in the extracted signal normalization from the fit. We observe that the distributions of the pulls are consistent with a Gaussian function with zero mean and a standard deviation of unity, and thus conclude that any systematic bias from the background fitting procedure is negligible compared to the statistical uncertainties in the fit. We therefore use the latter as the only uncertainties associated with the background estimate.
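The pull definition above translates directly into code. The sketch below only covers the bookkeeping step: it assumes the fits to the pseudo-data have already been performed and that each returns an extracted signal normalization and its statistical uncertainty. The function names and the numbers in the usage note are hypothetical.

```python
import statistics


def pull(injected, extracted, sigma):
    """Pull of the signal yield: (injected - extracted) / statistical uncertainty."""
    return (injected - extracted) / sigma


def pull_summary(injected, fit_results):
    """Mean and sample standard deviation of the pulls.

    `fit_results` is a list of (extracted_normalization, uncertainty) pairs,
    one per pseudo-data spectrum (at least two are needed for the width).
    """
    pulls = [pull(injected, extracted, sigma) for extracted, sigma in fit_results]
    return statistics.mean(pulls), statistics.stdev(pulls)
```

For instance, `pull_summary(10.0, [(8.0, 2.0), (10.0, 2.0), (12.0, 2.0)])` yields a mean of 0 and a width of 1. An unbiased fit procedure with correctly estimated uncertainties is expected to give values compatible with these over a large ensemble.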
Several systematic uncertainties are taken into account in the signal extraction procedure. These uncertainties stem from effects that may lead to an imperfect estimate of the signal rate and shape, including experimental uncertainties in the integrated luminosity (2.5%) [64], jet energy scale and resolution (2.0%) [48,65], photon energy scale and resolution (0.1-2.3%, depending on the Jγ mass [22]), pileup (1.0%), groomed jet mass scale (5.0%), and various identification efficiencies (4.0%, dominated by the DBT efficiency uncertainty [40]). We also include an uncertainty in the signal acceptance due to the PDF choice (2.0%), based on the PDF4LHC recommendations [66] using the NNPDF3.0 replicas [57]. Since the correction for the trigger inefficiency in data never exceeds 2%, the uncertainty due to this correction is always much smaller than the statistical uncertainty in the data and is therefore ignored.
The Jγ invariant mass spectra for both b-tagged and untagged categories, together with the background-only fit, as well as expected signal shapes for several signal masses, are shown in Fig. 2. The background fit function is shown with the 68% and 95% confidence level (CL) uncertainty bands obtained from the fit. Results in both categories are found to agree with the background-only hypothesis and do not exhibit any significant resonance-like structures. We set upper limits on the production cross section of narrow spin-1 resonances using the modified frequentist $\mathrm{CL}_{\mathrm{s}}$ criterion [67-69], with a likelihood ratio used as a test statistic, and uncertainties incorporated as nuisance parameters with log-normal priors.
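The modified frequentist criterion can be illustrated with a minimal sketch, assuming the two tail probabilities of the test statistic have already been evaluated: `p_sb`, the probability under the signal-plus-background hypothesis of an outcome at least as background-like as observed, and `p_b`, the background-only p-value. The function names are hypothetical, and the evaluation of the test-statistic distributions themselves is not shown.

```python
def cls_value(p_sb, p_b):
    """CLs = CL_{s+b} / CL_b, with CL_{s+b} = p_sb and CL_b = 1 - p_b."""
    cl_b = 1.0 - p_b
    if cl_b <= 0.0:
        raise ValueError("CL_b must be positive")
    return p_sb / cl_b


def is_excluded(p_sb, p_b, confidence_level=0.95):
    """A signal hypothesis is excluded at the given CL if CLs < 1 - CL."""
    return cls_value(p_sb, p_b) < 1.0 - confidence_level
```

Dividing by CL_b protects against excluding signals to which the search has little sensitivity: with `p_sb = 0.04` and `p_b = 0.5`, the signal-plus-background p-value alone is below 0.05, but the CLs value of 0.08 does not lead to exclusion at 95% CL.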
Shown in Fig. 3 are 95% CL upper limits on the product of the signal cross section and the branching fraction to Hγ for the b-tagged, untagged, and the statistical combination of the two categories. These limits are the most stringent to date in the entire mass range studied and are the only available limits for masses below 1000 GeV and above 3000 GeV. The significant improvement in the sensitivity compared to the very recent ATLAS limits [39] in the 1000-3000 GeV range results from the application of the more efficient DBT algorithm, at low masses, and from the use of the untagged event category, at high masses.
In summary, a search for heavy, narrow resonances decaying to a Higgs boson and a photon (Hγ) has been performed in proton-proton collision data at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$ collected with the CMS detector at the LHC in 2016. Events containing a photon and a Lorentz-boosted, hadronically decaying Higgs boson that is reconstructed as a single, large-radius jet are considered, and the γ+jet invariant mass spectrum is analyzed for the presence of narrow resonances. To increase the sensitivity of the search, events are categorized depending on whether the large-radius jet can be identified as a result of the merging of two jets originating from b quarks. The backgrounds, dominated by standard model γ+jet production, are estimated directly from data, without reliance on simulation. Results in both categories are found to agree with the predictions of the standard model. Upper limits on the production cross section of Hγ resonances ranging from 25 to 0.4 fb are set as a function of the resonance mass in the range of 720-3250 GeV. These are the most stringent constraints on narrow, spin-1 Hγ resonances to date in the entire mass range, and the first limits available below 1000 GeV and above 3000 GeV.

Figure 1: Signal acceptance times efficiency ($A\epsilon$) after the final selection, shown for the b-tagged and untagged categories.

Figure 2: The observed Jγ invariant mass spectra in the signal region, shown along with the background fit and a few selected signals, for the b-tagged (left) and untagged (right) categories. Signal samples are plotted with arbitrary normalizations and are shown for illustration purposes. The green and yellow bands correspond to the one and two standard deviation uncertainties in the background-only fit. For bins with a low number of data entries, the error bars correspond to the Garwood confidence intervals [63]. Shown in the lower panels are the differences between the number of events in data and the nominal background prediction from the fit, divided by the combined statistical uncertainty in the data and the background fit. The error bars correspond to the statistical uncertainty in the data alone.

Figure 3: Upper limits at 95% CL on the product of the signal cross section and the branching fraction to Hγ for the b-tagged channel (left), the untagged channel (middle), and the statistical combination of the two channels (right). The background-only hypothesis is consistent with the observed limits within two standard deviations.

Table 1: Final selection requirements. Events are categorized as b-tagged or untagged based on the DBT algorithm.