Measurements of $\mathrm{t\bar{t}}$H production and the CP structure of the Yukawa interaction between the Higgs boson and top quark in the diphoton decay channel

The first observation of the $\mathrm{t\bar{t}}$H process in a single Higgs boson decay channel with the full reconstruction of the final state (H$\to\gamma\gamma$) is presented, with a significance of 6.6 standard deviations ($\sigma$). The CP structure of Higgs boson couplings to fermions is measured, resulting in an exclusion of the pure CP-odd structure of the top Yukawa coupling at 3.2$\sigma$. The measurements are based on a sample of proton-proton collisions at a center-of-mass energy $\sqrt{s}=$ 13 TeV collected by the CMS detector at the LHC, corresponding to an integrated luminosity of 137 fb$^{-1}$. The cross section times branching fraction of the $\mathrm{t\bar{t}}$H process is measured to be $\sigma_{\mathrm{t\bar{t}H}}\mathcal{B}_{\gamma\gamma}=$ 1.56$^{+0.34}_{-0.32}$ fb, which is compatible with the standard model prediction of 1.13$^{+0.08}_{-0.11}$ fb. The fractional contribution of the CP-odd component is measured to be $f^{\mathrm{Htt}}_\mathrm{CP}=$ 0.00 $\pm$ 0.33.

metric photon transverse energy ($E_\mathrm{T}$) thresholds of 30 and 18 (22) GeV for the data collected during 2016 (2017/2018). The trigger efficiency is >95% and is measured as a function of $E_\mathrm{T}$, $\eta$, and $R_9$ of the photons using an alternative trigger, where $R_9$ is the energy sum of the 3 × 3 crystals centered on the most energetic crystal in the cluster divided by the energy of the photon.
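The $R_9$ definition above can be illustrated with a minimal sketch. The function name `r9`, the toy 5 × 5 energy grid, and the choice of using the grid sum as the photon energy are assumptions made for illustration only; they are not taken from the CMS reconstruction software.

```python
import numpy as np

def r9(crystal_energies, photon_energy):
    """Energy in the 3x3 block around the hottest crystal, divided by the photon energy."""
    e = np.asarray(crystal_energies, dtype=float)
    # Locate the most energetic crystal in the cluster grid.
    i, j = np.unravel_index(np.argmax(e), e.shape)
    # Sum the 3x3 window centered on it, clipping at the grid edges.
    window = e[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
    return window.sum() / photon_energy

# Toy 5x5 cluster in GeV; the values are illustrative only.
cluster = np.array([
    [0.0, 0.1, 0.2, 0.1, 0.0],
    [0.1, 1.0, 2.0, 1.0, 0.1],
    [0.2, 2.0, 40.0, 2.0, 0.2],
    [0.1, 1.0, 2.0, 1.0, 0.1],
    [0.0, 0.1, 0.2, 0.1, 0.0],
])
# Assumption: the photon energy is approximated by the sum over the toy grid.
toy_r9 = r9(cluster, photon_energy=cluster.sum())
```

In this toy, a value of `toy_r9` close to 1 corresponds to a narrow shower contained mostly in the central 3 × 3 block.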
H candidates are built from pairs of photon candidates, which are reconstructed from energy clusters in the ECAL not linked to charged-particle tracks (with the exception of converted photons). The photon energies are corrected for the containment of electromagnetic showers in the clustered crystals and the energy losses of converted photons with a multivariate regression technique based on simulation [44]. The ECAL energy scale in data is corrected using $\mathrm{Z}\to\mathrm{e^+e^-}$ simulated events smeared to reproduce the energy resolution measured in data. The offline diphoton selection criteria are similar to, but more stringent than, those used in the trigger [44].
Photons are further required to satisfy a loose identification (photon ID [44]) criterion based on a boosted decision tree (BDT) classifier trained to separate photons from jets. The shower shape and isolation variables (inputs to the photon ID) in simulation are corrected with a chained quantile regression method [45] based on studies of $\mathrm{Z}\to\mathrm{e^+e^-}$ events. Each variable is corrected with a separately trained BDT, taking the photon kinematics, per-event energy density, and the previously corrected features as inputs, to ensure that correlations between the inputs are preserved and closer to those in data. This method improves the modeling of the photon ID BDT discriminant in MC simulation with respect to the previous CMS H$\to\gamma\gamma$ results [44].
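The core idea behind a quantile-based correction can be sketched in a much reduced form: each simulated value is moved to the data value at the same quantile. The sketch below is an unconditional, single-variable quantile mapping with empirical distributions. It is a deliberate simplification and not the chained quantile regression of Ref. [45]: it omits the BDT-based conditioning on photon kinematics and energy density and the chaining over previously corrected variables. The function name `quantile_map` is hypothetical.

```python
import numpy as np

def quantile_map(mc_values, data_values):
    """Map each MC value to the data value at the same empirical quantile.

    Simplified stand-in: no conditioning on other variables and no chaining.
    """
    mc = np.asarray(mc_values, dtype=float)
    data = np.asarray(data_values, dtype=float)
    mc_sorted = np.sort(mc)
    # Empirical CDF value of each MC entry within the MC sample, in (0, 1].
    q = np.searchsorted(mc_sorted, mc, side="right") / mc.size
    # Evaluate the empirical data quantile function at those CDF values.
    return np.quantile(data, q)
```

Because the mapping is monotonic, the ordering of the simulated values is preserved while their distribution is morphed toward the one observed in data.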
After the preselection described above, we require $100 < m_{\gamma\gamma} < 180$ GeV and $p_\mathrm{T}/m_{\gamma\gamma} > 1/3$ and $1/4$ for the leading (in $p_\mathrm{T}$) and subleading photons, respectively, and then divide events into two channels. The leptonic channel is aimed at selecting events where at least one top quark decays leptonically, and demands the presence of ≥1 jet with $p_\mathrm{T} > 25$ GeV and $|\eta| < 2.4$, and ≥1 isolated e ($\mu$) with $p_\mathrm{T} > 10$ (5) GeV and $|\eta| < 2.4$. The hadronic channel targets $\mathrm{t\bar{t}}$ hadronic decays by requiring at least three jets, at least one b-tagged jet, and no isolated leptons (e/$\mu$).
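The selection logic of this paragraph can be written as a short sketch. The function names are hypothetical, and the object counts are assumed to be computed upstream from jets and isolated leptons that already satisfy the quoted $p_\mathrm{T}$ and $|\eta|$ requirements. Giving the leptonic channel priority over the hadronic one follows from the hadronic lepton veto.

```python
def passes_diphoton_selection(m_gg, pt_lead, pt_sublead):
    """Diphoton mass window and scaled-pT cuts quoted in the text (GeV units)."""
    return (
        100.0 < m_gg < 180.0
        and pt_lead / m_gg > 1.0 / 3.0
        and pt_sublead / m_gg > 1.0 / 4.0
    )

def assign_channel(n_jets, n_bjets, n_leptons):
    """Return 'leptonic', 'hadronic', or None for events failing both channels.

    Assumption: the counts refer to objects already passing the kinematic
    and isolation requirements.
    """
    if n_leptons >= 1 and n_jets >= 1:
        return "leptonic"
    if n_leptons == 0 and n_jets >= 3 and n_bjets >= 1:
        return "hadronic"
    return None
```

For example, an event with $m_{\gamma\gamma} = 125$ GeV and photon $p_\mathrm{T}$ values of 50 and 35 GeV passes the diphoton selection, while lowering the leading photon to 40 GeV fails the $p_\mathrm{T}/m_{\gamma\gamma} > 1/3$ requirement.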
A dedicated BDT-bkg discriminant is employed in each channel to distinguish between $\mathrm{t\bar{t}}$H and background events. These BDTs are trained with the XGBOOST [46] framework on signal and background MC samples, with one exception as noted below. The background MC samples include $\gamma$ + jets, $\gamma\gamma$ + jets, $\mathrm{t\bar{t}}$ + jets, $\mathrm{t\bar{t}}$ + $\gamma$, $\mathrm{t\bar{t}}$ + $\gamma\gamma$, Z + $\gamma$, and W + $\gamma$ processes, as well as a variety of other rarer backgrounds. Non-$\mathrm{t\bar{t}}$H production modes of H are also treated as background. The dominant background in the hadronic channel consists of $\gamma$ + jets events, where one jet is misidentified as a photon. To improve the performance of the hadronic BDT-bkg, the $\gamma$ + jets background is modeled from a large sample of data events with one photon candidate failing the photon ID requirement; these are almost exclusively multijet and $\gamma$ + jets events. For each such event, the photon ID value of the misidentified jet is replaced by a value drawn from the MC distribution of photon ID values of misidentified jets passing the photon ID requirement. These events, appropriately weighted, are then used in the hadronic BDT-bkg training instead of the $\gamma$ + jets MC sample.
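The replacement of the photon ID value in the data-driven $\gamma$ + jets description can be sketched as a resampling step. This is a minimal illustration under stated assumptions: the function name, the argument names, and the threshold `id_cut` are hypothetical; the MC distribution is represented by its empirical sample; and the event weighting mentioned in the text is not modeled.

```python
import numpy as np

def resample_failed_photon_id(n_events, mc_fake_ids, id_cut, rng):
    """Draw replacement photon ID values for data events with a failing candidate.

    Values are sampled with replacement from the MC photon ID values of
    misidentified jets that pass the (hypothetical) threshold `id_cut`.
    """
    mc_fake_ids = np.asarray(mc_fake_ids, dtype=float)
    passing = mc_fake_ids[mc_fake_ids > id_cut]
    if passing.size == 0:
        raise ValueError("no MC misidentified jets pass the photon ID requirement")
    return rng.choice(passing, size=n_events, replace=True)
```

After this step, every event in the control sample carries a photon ID value consistent with the signal-region requirement, so it can be used in place of the $\gamma$ + jets MC sample during training.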
Input features of BDT-bkg include kinematic properties of jets, leptons, photons, and diphotons (but not $m_{\gamma\gamma}$), jet and lepton multiplicity, b-tagging scores of jets, and $p_\mathrm{T}^\mathrm{miss}$. The inclusion of b-tagging scores reduces the non-$\mathrm{t\bar{t}}$ background; further, jets and leptons in $\mathrm{t\bar{t}}$H events tend to have higher $p_\mathrm{T}$ and smaller $|\eta|$ than in background events. The BDT-bkg also uses the output of the photon ID BDT and the outputs of other machine learning (ML) algorithms, described below, as input features. One such ML algorithm is a top quark tagger BDT (top tagger) [47], which distinguishes events with top quarks decaying into three jets from events that do not contain top quarks. We also use long short-term memory [48] based DNNs trained to separate $\mathrm{t\bar{t}}$H from the dominant backgrounds in a signal-enriched phase space: $\gamma\gamma$ + jets and $\mathrm{t\bar{t}}$ + $\gamma\gamma$ (hadronic channel); and $\mathrm{t\bar{t}}$ + $\gamma\gamma$ (leptonic channel). In addition to the features that are used in BDT-bkg, the DNNs exploit low-level information including the full four-vectors of each jet and lepton and the jet flavor scores [28]. The four-vectors allow for a more effective use of event kinematics, and the jet flavor scores allow the differentiation of the origins of hadronic jets between $\mathrm{t\bar{t}}$H and $\gamma\gamma$ + jets ($\mathrm{t\bar{t}}$ + $\gamma\gamma$) events. The DNNs are used as additional inputs to the BDT-bkg, rather than in place of it, because they provide superior performance over BDTs when many training instances are available (DNNs trained with fewer instances, as is the case for other background samples, suffer from severe overfitting and suboptimal performance). The modeling of the input features has been validated by comparing data and MC distributions for events passing the preselection in both channels. The BDT-bkg score has been validated by comparing the distributions in data and MC both in the $m_{\gamma\gamma}$ sidebands, satisfying either $100 < m_{\gamma\gamma} < 120$ GeV or $130 < m_{\gamma\gamma} < 180$ GeV (as in Fig. 1), and in dedicated control regions that target $\mathrm{t\bar{t}}$ + Z events.

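Events are either rejected or further divided into eight (four) categories to maximize expected significance (sensitivity to CP structure of the Htt amplitude), according to their BDT-bkg output as shown in Fig. 1 and Table 1. We perform a simultaneous binned maximum likelihood fit to the $m_{\gamma\gamma}$ distributions in the eight categories to extract the product of the $\mathrm{t\bar{t}}$H cross section and H$\to\gamma\gamma$ branching fraction ($\sigma_{\mathrm{t\bar{t}H}}\mathcal{B}_{\gamma\gamma}$) and the signal strength $\mu_{\mathrm{t\bar{t}H}}$, defined as the ratio of the measured $\sigma_{\mathrm{t\bar{t}H}}\mathcal{B}_{\gamma\gamma}$ to its SM expectation. In the fit, all other H production modes are constrained.
The CP structure of the Htt amplitude is parameterized as
$$\mathcal{A}(\mathrm{Htt}) = -\frac{m_\mathrm{t}}{v}\,\bar{\psi}_\mathrm{t}\left(\kappa_\mathrm{t} + i\tilde{\kappa}_\mathrm{t}\gamma_5\right)\psi_\mathrm{t},$$
where $\bar{\psi}_\mathrm{t}$ and $\psi_\mathrm{t}$ are the Dirac spinors, $m_\mathrm{t}$ is the top quark mass, $v$ is the SM H field vacuum expectation value, and $\kappa_\mathrm{t}$ and $\tilde{\kappa}_\mathrm{t}$ are the CP-even and CP-odd Yukawa couplings. In the SM, $\kappa_\mathrm{t} = 1$ and $\tilde{\kappa}_\mathrm{t} = 0$. We measure the CP structure with
$$f^{\mathrm{Htt}}_\mathrm{CP} = \frac{|\tilde{\kappa}_\mathrm{t}|^2}{|\kappa_\mathrm{t}|^2 + |\tilde{\kappa}_\mathrm{t}|^2}\,\mathrm{sign}(\tilde{\kappa}_\mathrm{t}/\kappa_\mathrm{t}).$$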
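As a numerical illustration of the $f^{\mathrm{Htt}}_\mathrm{CP}$ definition above, the following sketch evaluates it for real-valued couplings. The function name `f_cp` is hypothetical. Two choices are assumptions and not statements from the text: the couplings are treated as real, and the value is returned with a positive sign when either coupling vanishes, since the sign of the ratio is then undefined.

```python
import math

def f_cp(kappa, kappa_tilde):
    """CP-odd fraction |k~|^2 / (|k|^2 + |k~|^2), signed by sign(k~ / k).

    Assumptions: real couplings; positive sign when either coupling is zero.
    """
    denom = kappa ** 2 + kappa_tilde ** 2
    if denom == 0:
        raise ValueError("at least one coupling must be nonzero")
    magnitude = kappa_tilde ** 2 / denom
    if kappa == 0 or kappa_tilde == 0:
        return magnitude
    return math.copysign(magnitude, kappa_tilde / kappa)
```

With this definition, the SM point ($\kappa_\mathrm{t} = 1$, $\tilde{\kappa}_\mathrm{t} = 0$) gives 0, a pure CP-odd coupling gives 1, and equal couplings of opposite sign give $-0.5$.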
When the cross sections of the CP-even and CP-odd contributions are equal, f
It has been shown in Ref. [22] that an optimal analysis of the CP structure in the $\mathrm{t\bar{t}}$H process can be performed with two observables, $D_{0^-}$ and $D_\mathrm{CP}$. They can be obtained from the full kinematic information using either matrix element or ML techniques, with the same sensitivity [57].
In this study, we use a BDT to obtain $D_{0^-}$ and do not include $D_\mathrm{CP}$ since it requires tagging the flavor of light jets. As a consequence, it is not possible to measure the relative sign, or phase, of the $\kappa_\mathrm{t}$ and $\tilde{\kappa}_\mathrm{t}$ couplings. Nonetheless, this sign is incorporated into the f
We train a BDT to distinguish CP-even and CP-odd contributions. The observables used in the training include the kinematic variables of the first six jets (in $p_\mathrm{T}$) and the diphoton system (but not $m_{\gamma\gamma}$), the b-tagging scores of jets, and, in the leptonic channel, the lepton multiplicity and the kinematic variables of the leading lepton. The output of the BDT is the $D_{0^-}$ observable. Simulation shows that $D_{0^-}$ has negligible correlation with the BDT-bkg discriminant. The events selected for the signal strength measurements are split into 12 categories: two channels (leptonic or hadronic), two BDT-bkg categories shown in Fig. 1, and three $D_{0^-}$ bins, as shown in Fig. 3.
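The 2 × 2 × 3 category structure described above can be sketched as an index calculation. The function name, the flat 0 to 11 numbering, and the edge arguments are hypothetical; the actual BDT-bkg and $D_{0^-}$ boundaries are those shown in Figs. 1 and 3 and are not reproduced here.

```python
import bisect

CHANNELS = ("leptonic", "hadronic")

def cp_category(channel, bdt_bkg, d0minus, bdt_edges, d0_edges):
    """Return a category index in 0..11, or None if the event is rejected.

    bdt_edges: two increasing BDT-bkg thresholds (hypothetical values); events
               below the first are rejected, the rest fall in two categories.
    d0_edges:  two increasing D0- thresholds (hypothetical values) defining
               three bins.
    """
    if len(bdt_edges) != 2 or len(d0_edges) != 2:
        raise ValueError("expected two BDT-bkg edges and two D0- edges")
    bdt_cat = bisect.bisect_right(bdt_edges, bdt_bkg) - 1  # -1 means rejected
    if bdt_cat < 0:
        return None
    d0_bin = bisect.bisect_right(d0_edges, d0minus)  # 0, 1, or 2
    return CHANNELS.index(channel) * 6 + bdt_cat * 3 + d0_bin
```

Passing the edges as arguments lets each channel use its own boundaries, which is how a channel-dependent categorization would be expressed in this sketch.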
A simultaneous fit to the $m_{\gamma\gamma}$ distribution is performed using the 12 categories to measure $f^{\mathrm{Htt}}_\mathrm{CP}$. The $\mu_{\mathrm{t\bar{t}H}}$ parameter is left unconstrained. An additional systematic uncertainty is introduced to cover possible small differences in the modeling of the distributions between the JHUGEN generator, used to vary the CP structure of the $\mathrm{t\bar{t}}$H coupling, and the MADGRAPH5_aMC@NLO generator, used to model SM distributions. However, statistical uncertainties dominate the measurement of $f^{\mathrm{Htt}}_\mathrm{CP}$. In addition to the $\mathrm{t\bar{t}}$H process, we parameterize the tH production with the $\mu_{\mathrm{t\bar{t}H}}$ and f
The fit results are shown in Fig. 3 and are obtained using the profile likelihood method as $f^{\mathrm{Htt}}_\mathrm{CP} = 0.00 \pm 0.33$, with the constraint $|f^{\mathrm{Htt}}_\mathrm{CP}| < 0.67$ at 95% confidence level (CL). The coverage was determined with pseudo-datasets and found to agree with that expected in the asymptotic limit [59]. The pure pseudoscalar model of CP structure of the Htt coupling ( f

[48] S. Hochreiter and J. Schmidhuber, "Long short-term memory", Neur. Comp. 9 (1997) 1735, doi:10.1162/neco.1997.9.8.1735.