CP tests of Higgs couplings in $t\bar{t}h$ semileptonic events at the LHC

The CP nature of the Higgs coupling to top quarks is addressed in this paper, in single charged lepton final states of $t\bar{t}h$ events produced in proton-proton collisions at the LHC. Pure scalar ($h=H$) and pseudo-scalar ($h=A$) Higgs boson signal events, generated with MadGraph5_aMC@NLO, are fully reconstructed using a kinematic fit. Angular distributions of the decay products, as well as CP-sensitive asymmetries, are exploited to gain sensitivity to possible pseudo-scalar components of the Higgs boson and to reduce the contribution from the dominant irreducible background, $t\bar{t}b\bar{b}$. Significant differences are found between the pure CP-even and CP-odd signal hypotheses, as well as with respect to the Standard Model background, in particular the $t\bar{t}b\bar{b}$ contribution. Such differences survive the event reconstruction, allowing optimal observables to be defined for extracting the Higgs coupling parameters from a global fit. A dedicated analysis is applied to efficiently identify signal events and reject as much of the expected Standard Model background as possible. The results obtained are compared with a similar analysis in the dilepton channel. We show that the single lepton channel is more promising overall and can be used in combination with the dilepton channel to study the CP nature of the Higgs coupling to top quarks.


I. INTRODUCTION
A new particle with mass around 125 GeV, consistent with the Higgs boson predicted by the Standard Model (SM), was discovered by ATLAS [1] and CMS [2] at the LHC. This discovery is of fundamental importance for the electroweak symmetry breaking mechanism [3], which allows elementary particles to acquire mass. The properties of the Higgs boson (mass, spin, parity, etc.), and in particular its couplings, have been extensively studied ever since. Even though the SM predictions for the Higgs boson are in remarkable agreement with experimental results [4][5][6], the SM cannot be the ultimate theory. It does not explain the baryon asymmetry in the Universe, which may require additional sources of CP violation. The SM also fails to provide a viable dark matter candidate. Extensions with multiple Higgs doublets [7] can provide new sources of CP violation in the Higgs sector. These may have an impact on the Higgs Yukawa couplings to fermions, by adding a new CP-odd component to the SM coupling.
Although a pure CP-odd case was already excluded at 99.98% C.L. in γ, Z, and W interactions with Higgs bosons [5,6], mixing between a CP-even and CP-odd component is allowed by experimental data. Moreover, the couplings of the Higgs boson to fermions might show a different nature in which large CP-odd couplings are still allowed and may be different among the different flavors of fermions [8]. Of particular relevance is the measurement of the Higgs couplings in associated production with top quarks. Since the Higgs Yukawa coupling to the top quark is expected to be close to unity [9], much larger than the other Yukawa couplings, its impact on the vacuum stability is expected to be more important. The study of this process allows a direct measurement of the vertex and, in particular, provides sensitivity to the CP nature of the Higgs couplings to top quarks [10]. The precise evaluation of the total production cross section to the highest order in QCD is of crucial importance to study this rare process at the LHC. The impact of soft gluon resummation to next-to-next-to-leading logarithmic (NNLL) accuracy is already available [11,12].
In this paper, the associated production of a Higgs boson with a pair of top quarks ($t\bar{t}h$) is studied at the LHC for a center-of-mass energy of 13 TeV. While semileptonic decays of the $t\bar{t}$ system are searched for ($t\bar{t} \to bW^{+}\bar{b}W^{-} \to b\bar{b}q\bar{q}'\ell^{\pm}\nu_{\ell}$), the Higgs boson is expected to decay through the SM dominant decay mode ($H \to b\bar{b}$). The single lepton final-state topology is characterized by the presence of an isolated charged lepton and missing transverse energy ($E_T^{\mathrm{miss}}$) from the undetected neutrino.
The observation of the associated production of top quarks with a Higgs boson was recently announced independently by the CMS and the ATLAS collaborations, using a combination of analyses, resulting in significances in excess of five standard deviations [13,14]. These analyses employed LHC data collected at 7, 8, and 13 TeV and are sensitive to $t\bar{t}h$ final states, with the Higgs boson decaying to $b\bar{b}$, $WW^{*}$, $\tau^{+}\tau^{-}$, $\gamma\gamma$, and $ZZ^{*}$. These observations open the field to the type of measurements proposed in the current paper.
The analysis of $t\bar{t}h$ ($h \to b\bar{b}$) final states at the LHC is particularly challenging due to the low expected cross section and the large $t\bar{t}$ + jets background. For the Higgs decay channel considered in this paper, $h \to b\bar{b}$, the $t\bar{t}b\bar{b}$ background is of particular importance. We search for deviations from the SM by comparing the kinematics of $t\bar{t}h$ signals with SM-like couplings ($h = H$ and $J^{CP} = 0^{+}$) with $t\bar{t}h$ signals with pure pseudoscalar Higgs bosons ($h = A$ and $J^{CP} = 0^{-}$). The most general Lagrangian that accounts for contributions from CP-even and CP-odd components of the couplings is defined as
$$\mathcal{L} = y_t\,\bar{t}\,(\cos\alpha + i\gamma_5 \sin\alpha)\,t\,h, \qquad (1)$$
where $y_t$ is the SM Higgs boson Yukawa coupling and $\alpha$ represents a CP phase. While the SM interaction is recovered by fixing $|\cos\alpha| = 1$, the pure pseudoscalar is obtained by setting $\cos\alpha = 0$.
We consider several CP-sensitive angular distributions introduced in the literature [15][16][17]. These are evaluated for the first time in semileptonic final states of $t\bar{t}h$ decays in this paper. Full event kinematic reconstruction is applied to reconstruct the 4-momenta of all massive particles ($t$, $\bar{t}$, $h$, $W^{+}$, and $W^{-}$) as well as the undetected neutrino. We show that, even after showering, detector simulation, event selection, and full kinematic reconstruction, the distributions of several angular variables are largely preserved. Moreover, background discrimination in this channel can be enhanced using these angular variables. Although results are consistent with what was observed in an analysis of the dileptonic $t\bar{t}h$ channel [17,18], the signal sensitivity of the semileptonic analysis reported here is larger than the dileptonic one. As in the dileptonic analysis [18], all the mixed states of CP-even and CP-odd couplings gave results between the ones obtained with $\cos\alpha = 0$ and $\cos\alpha = 1$; only these two extreme cases are considered in the present analysis.
This paper is organized as follows. Following this Introduction, Sec. II describes the event generation, simulation, and event selection. In Sec. III, we analyze the angular variables, as reconstructed in the $t\bar{t}h$ semileptonic channel, and in Sec. IV, the results are discussed. The conclusions are summarized in Sec. V.

A. Monte Carlo generation
The generation of the $t\bar{t}h$ signals (both scalar and pseudoscalar) and of the dominant $t\bar{t}b\bar{b}$ background was performed, at next-to-leading order (NLO) in QCD, with MADGRAPH5_AMC@NLO [19]. The NNPDF2.3 sets for the parton distributions of the nucleon (PDF) [20] were used. While the default model (SM) was used for the CP-even SM Higgs boson signal ($h = H$), the generation of the pure CP-odd pseudoscalar signal ($h = A$) used the HC_NLO_X0 model [21]. The MADGRAPH5_AMC@NLO event-generator NLO cross sections were assumed for all three of these samples. For the case of the $t\bar{t}H$ signal, this is in agreement with a recent calculation at approximate next-to-next-to-leading-order (NNLO) accuracy including threshold resummation of soft gluon emission in the soft-collinear effective theory framework [11]. The best current knowledge of the $t\bar{t}H$ cross section comes from NNLL calculations [12], which show only small differences with respect to the NLO cross section but a precision improved by roughly a factor of 2. In addition to the dominant $t\bar{t}b\bar{b}$ background, other sources of SM backgrounds were also considered. These included $t\bar{t}$ + jets (where "jets" stands for up to three additional jets from the hadronization of $c$- or light-flavored quarks), $t\bar{t}V$ + jets (where $V = W^{\pm}, Z$, with up to one additional jet), single top quark production ($s$-channel, $t$-channel, and $Wt$ associated production), diboson ($W^{+}W^{-}$, $ZZ$, $W^{\pm}Z$ + jets with up to three additional jets), $W^{\pm}$ + jets (with up to four additional jets), and $Wb\bar{b}$ + jets (with up to two additional jets). These backgrounds were also generated with MADGRAPH5_AMC@NLO but at leading order (LO) in QCD. For the generation at LO, we used the MLM [22] matching scheme to merge the collinear/soft radiation of the parton final states with hard parton configurations, where large-angle emissions lead to the presence of extra jets in events.
The cross section of the $t\bar{t}$ + jets background was normalized to the NNLO in QCD with NNLL resummation of soft gluon terms [20,[23][24][25][26]. The electroweak single top quark production cross section was scaled to the approximate NNLO theoretical calculation [27,28]. Although full NNLO calculations exist [29] for single top quark production, the approximate cross section was used instead and rescaled to the exact top quark mass used in the generation according to the prescription given in Ref. [30]. The same prescription was applied to the $t\bar{t}$ + jets background. For all the other SM background processes, the MADGRAPH5_AMC@NLO event-generator cross sections were used. The generation was performed at a center-of-mass energy of 13 TeV, at the LHC, with dynamic renormalization and factorization scales set to the sum of the transverse masses of all final-state partons. The masses of the top quark ($m_t$), the $W$ boson ($m_W$), and Higgs bosons (for both scalar, $m_H$, and pseudoscalar, $m_A$) were set to 173, 80.4, and 125 GeV, respectively. To preserve full spin correlations, MADSPIN [31] was used to decay heavy particles. Parton showering and hadronization were performed by PYTHIA6 [32].
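As an illustration of the dynamic scale choice mentioned above, the following minimal sketch sums the transverse masses, $m_T = \sqrt{m^2 + p_T^2}$, of a set of final-state partons. The function names and the example momenta are hypothetical; they are not taken from the generator configuration used in this paper.

```python
import math

def transverse_mass(mass, px, py):
    """Transverse mass m_T = sqrt(m^2 + p_T^2) of one parton (all in GeV)."""
    return math.sqrt(mass**2 + px**2 + py**2)

def dynamic_scale(partons):
    """Sum of the transverse masses of the given partons.

    `partons` is a list of (mass, px, py) tuples in GeV.
    """
    return sum(transverse_mass(m, px, py) for m, px, py in partons)

# Hypothetical ttH-like configuration: two top quarks and a Higgs boson.
example_partons = [
    (173.0, 100.0, 0.0),
    (173.0, -60.0, 30.0),
    (125.0, -40.0, -30.0),
]
scale = dynamic_scale(example_partons)  # in GeV
```

By construction the scale is never smaller than the sum of the parton masses, and it grows with the transverse activity of the event.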

B. Event simulation and reconstruction
Following generation and parton showering, events were passed through a fast simulation of a typical LHC detector, using DELPHES [33]. This provides more realistic experimental conditions for the reconstruction of charged leptons, jets, and missing transverse energy (and missing momentum). The efficiencies and resolutions of the default detector subsystems are parametrized as a function of transverse momentum, $p_T$, and pseudorapidity, $\eta$, for the different types of particles (details may be found in Ref. [33]). FASTJET [34] was used for jet reconstruction using the anti-$k_t$ algorithm [35] with radial parameter $R$ set to 0.4. The efficiency ($\epsilon_b$) for tagging jets originating from the hadronization of $b$ quarks, i.e., $b$ tagging, depends on their transverse momentum ($p_T$, given in GeV) in the region where $p_T \geq 10$ GeV and $|\eta| \leq 2.5$. The efficiency is set to zero outside this region. A separate mistag probability is applied to light and $c$ jets identified as fake $b$ jets.

The analysis of simulated events was performed with MADANALYSIS5 [36] in the expert mode [37]. To build the angular distributions, kinematic properties of signal events need to be fully reconstructed. This was accomplished by KLFITTER [38] with the likelihood-based reconstruction method. KLFITTER uses transfer functions, $W_k(E_k^{\mathrm{meas}}|E_k^{\mathrm{parton}})$, to reconstruct particle energies ($E_k^{\mathrm{parton}}$) using their measured values ($E_k^{\mathrm{meas}}$) after detector simulation, together with the knowledge of experimental resolutions. These are considered for jets ($k = j$) and charged leptons ($k = \ell$). We implemented a dedicated parametrization of the jet transfer functions, which depend on their energy and pseudorapidity. We also applied the transfer function $W_{\mathrm{miss}}(E^{\mathrm{meas}}_{\mathrm{miss},x(y)}|E^{\mathrm{parton}}_{\nu,x(y)})$ to reconstruct the $x(y)$ component of the neutrino transverse energy, $E^{\mathrm{parton}}_{\nu,x(y)}$, from the measured $x(y)$ missing transverse energy component, $E^{\mathrm{meas}}_{\mathrm{miss},x(y)}$.
To make sure the transfer functions were appropriate for semileptonic $t\bar{t}h$ final states, we applied a preselection by requiring events to have at least six jets and one charged lepton. These cuts were applied in the definition of the transfer functions themselves. Once all were defined, the likelihood was built using several Breit-Wigner probability density functions, $B(m_{x_1,x_2,\ldots}|m_X, \Gamma_X)$, to evaluate the probability of reconstructing the invariant mass ($m_{x_1,x_2,\ldots}$) of the system $x_1, x_2, \ldots$, consistent with a particle of mass $m_X$ and width $\Gamma_X$ ($X = t$, $W$, and $h$). We tried all possible permutations of all measured jets, in order to find the best matching between the jets and (i) the two $c$- or light-flavored quarks ($q_1$, $q_2$) from the hadronically decaying $W$ boson, (ii) the two $b$ quarks from the decay of the Higgs boson ($b_h$, $\bar{b}_h$), and (iii) the two $b$ quarks, one from the fully hadronic ($b_{\mathrm{had}}$) and the other from the semileptonic ($b_{\mathrm{lep}}$) decays of the top quarks present in the events. The lepton ($\ell$) and undetected neutrino ($\nu$) from the leptonically decaying $W$ boson, together with the $b_{\mathrm{lep}}$ candidate in each permutation, were used to reconstruct the top quark that decayed semileptonically. The neutrino $p_z$ reconstruction was accomplished by considering the $x$ ($y$) components of the missing transverse energy to be the $x$ ($y$) components of the neutrino's momentum, constrained by $m_W^2 = (p_\ell + p_\nu)^2$. If, for any permutation, two solutions were found, the one that maximized the likelihood function was selected. When no solution was found for a particular permutation, the neutrino $p_z$ was fixed to zero. From the long list of all possible permutations and solutions, the one chosen as the best candidate for the full kinematic reconstruction of the event was the one with the largest value of the likelihood function.
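The neutrino $p_z$ step described above can be sketched as follows, assuming the $W$-mass constraint $m_W^2 = (p_\ell + p_\nu)^2$ with a massless charged lepton and neutrino, which leads to a quadratic equation in $p_z$. This is a minimal illustration only: the function name is hypothetical, it is not part of the KLFITTER interface, and the choice between the two solutions (made with the likelihood in the text) is left to the caller.

```python
import math

M_W = 80.4  # GeV, W boson mass quoted in the text

def neutrino_pz(lepton, met_x, met_y, m_w=M_W):
    """Return the real solutions for the neutrino p_z (empty list if none).

    `lepton` is the (px, py, pz) of the charged lepton in GeV, treated as
    massless; (met_x, met_y) is taken as the neutrino transverse momentum.
    The lepton must have nonzero transverse momentum.
    """
    lx, ly, lz = lepton
    pt2_lep = lx * lx + ly * ly
    e_lep = math.sqrt(pt2_lep + lz * lz)
    pt2_nu = met_x * met_x + met_y * met_y
    # m_W^2 = 2 (E_l E_nu - p_l . p_nu)  =>  E_l E_nu = mu + p_lz p_nuz
    mu = 0.5 * m_w * m_w + lx * met_x + ly * met_y
    disc = mu * mu - pt2_lep * pt2_nu
    if disc < 0.0:
        return []  # no real solution; the text then fixes p_z to zero
    root = e_lep * math.sqrt(disc)
    return sorted({(mu * lz + root) / pt2_lep, (mu * lz - root) / pt2_lep})
```

A negative discriminant corresponds to a lepton-neutrino transverse mass above $m_W$, which can occur because of detector resolution on the missing transverse energy.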
The partons' 4-momenta were reconstructed from the objects of that particular permutation, and in order to accommodate the corrections from the transfer functions, their energy was changed to that obtained in the kinematic fit. The momentum components were also rescaled accordingly.

C. Event selection
Following the preselection and kinematic reconstruction, additional selection criteria were applied to events, defining what we call the "final selection." Charged leptons and $b$-tagged jets were required to have $p_T \geq 20$ GeV and $|\eta| \leq 2.5$. For non-$b$-tagged jets, the $\eta$ acceptance was extended to $|\eta| \leq 4.5$. Only events with $E_T^{\mathrm{miss}} > 20$ GeV were used, since the final-state topology involves one undetected neutrino. Furthermore, only topologies with six to eight jets, three or four of which were $b$ tagged, were considered in the event analysis. We checked that these topologies have the largest matching efficiency ($\epsilon_{\mathrm{match}} = 30.1\%$), defined as the fraction of events for which all objects from the chosen permutation were within $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} < 0.4$ of the corresponding partons at generator level. At this stage, all selected events were reconstructed by KLFITTER. In Table I, the expected cross sections (in fb) are shown, at preselection and final selection levels, for semileptonic final states of $t\bar{t}h$ signals and SM backgrounds. The $t\bar{t}A$ pseudoscalar signal was scaled to the $t\bar{t}H$ scalar cross section for illustration purposes only. Figure 1 shows the reconstructed transverse momenta of the Higgs boson (top left) and top quark (bottom left) against the corresponding values at the generator level. Equivalent results for the $t\bar{t}h$ dileptonic final state, published in Ref. [18], are presented on the right-hand side for comparison.
The reconstruction using KLFITTER performs better than the one used in the dileptonic channel, avoiding, for instance, the asymmetries seen in the transverse momentum of the Higgs boson (top left). This is relevant because the shape of the $p_T(h)$ distribution is particularly sensitive to the CP nature of the Higgs boson Yukawa couplings to top quarks. Although the kinematic fit could be further improved, no optimization of the reconstruction was attempted.
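The matching criterion used for $\epsilon_{\mathrm{match}}$ can be sketched as below. The sketch assumes that each event is given as two equally ordered lists of $(\eta, \phi)$ pairs, one for the reconstructed objects of the chosen permutation and one for the corresponding generator-level partons; this data layout and the function names are hypothetical.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def is_matched(reco, truth, max_dr=0.4):
    """True if every reconstructed object lies within max_dr of its parton."""
    return all(delta_r(r[0], r[1], t[0], t[1]) < max_dr
               for r, t in zip(reco, truth))

def matching_efficiency(events, max_dr=0.4):
    """Fraction of (reco, truth) events in which all objects are matched."""
    matched = sum(1 for reco, truth in events if is_matched(reco, truth, max_dr))
    return matched / len(events)
```

Wrapping $\Delta\phi$ matters for objects on either side of $\phi = \pm\pi$, which would otherwise appear far apart.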

III. ANGULAR OBSERVABLES
For the reconstruction of the angular distributions, we use the spin helicity formalism and, generically, define $\theta^{X}_{Y}$ as the angle between the momentum direction of the $Y$ particle (or system), measured in the rest frame of $X$, and the direction of $X$ in the rest frame of its parent particle [17]. As particles follow successive decays starting from the $t\bar{t}h$ center-of-mass system ($X = t\bar{t}h$) until all intermediate particles have decayed, $X$ defined above includes three-, two-, and single-particle systems. The $t\bar{t}h$ system momentum direction is measured with respect to the laboratory frame. In case of ambiguity in describing the angular distributions, the exact definition for the angles is specified in the text. In performing the boosts, two different prescriptions can be used for the decays: 1) the direct approach, in which the laboratory 4-momenta of particles were used for $X$ and $Y$, or 2) the sequential approach, in which the 4-momenta of particles $X$ and $Y$ were boosted through all intermediate center-of-mass systems. The two prescriptions lead to different distributions because successive Lorentz boosts do not commute (the Lorentz group is non-Abelian). In Figs. 2 and 3, we show $\theta^{t\bar{t}h}_{t}$ (the angle between the momentum direction of the top quark, in the $t\bar{t}h$ system, and the $t\bar{t}h$ direction, in the lab frame) versus $\theta^{h}_{b_h}$ (the angle between the momentum of the $b$ quark from the Higgs boson, in the Higgs boson frame, and the Higgs boson momentum in the $\bar{t}h$ frame). Distributions are shown at generator level in Fig. 2 and after event selection and kinematic reconstruction in Fig. 3. While the left (right) distributions are for $t\bar{t}h$ semileptonic (dileptonic) decays, the top (bottom) ones are for the scalar $h = H$ (pseudoscalar $h = A$) $t\bar{t}h$ signal. The dileptonic results are only shown for comparison. The pattern differences observed between the scalar and pseudoscalar signal distributions are quite noticeable, even after event selection and kinematic reconstruction.
This behavior is particularly visible in $t\bar{t}h$ semileptonic decays after kinematic reconstruction (Fig. 3). While events tend to be more uniformly distributed in the plot for the case of the scalar couplings (Fig. 3, top), the pseudoscalar case tends to concentrate events in two extreme regions (Fig. 3, bottom). Given the good performance of the kinematic reconstruction in semileptonic decays of $t\bar{t}h$, we study the angular distributions and corresponding asymmetries defined in Ref. [17], together with the $b_4$ variable of Ref. [15]. Table II shows the asymmetry values for the different scalar and pseudoscalar $t\bar{t}h$ signals, together with the ones expected for the dominant SM background, $t\bar{t}b\bar{b}$. These were calculated after event selection and full kinematic reconstruction. The uncertainties on the simulated asymmetries resulting from the finite size of the simulated sample are below $10^{-2}$. For an integrated luminosity of 100 fb$^{-1}$, the statistical uncertainties on the asymmetry measurements are expected to be below $4 \times 10^{-2}$.
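To make the asymmetry and its statistical uncertainty concrete, the sketch below assumes a forward-backward type definition, $A = [N(x > x_c) - N(x < x_c)] / [N(x > x_c) + N(x < x_c)]$, for an angular observable $x$ and a cut value $x_c$, together with the binomial uncertainty $\sqrt{(1 - A^2)/N}$. Both the exact form of the definition and the default $x_c = 0$ are assumptions made here for illustration; the definitions actually used are those of Ref. [17].

```python
import math

def asymmetry(values, x_c=0.0):
    """A = (N(x > x_c) - N(x < x_c)) / (N(x > x_c) + N(x < x_c)).

    Events with x exactly equal to x_c are not counted.
    """
    n_above = sum(1 for x in values if x > x_c)
    n_below = sum(1 for x in values if x < x_c)
    total = n_above + n_below
    if total == 0:
        raise ValueError("no events on either side of x_c")
    return (n_above - n_below) / total

def asymmetry_stat_error(a, n_events):
    """Binomial statistical uncertainty on an asymmetry a from n_events."""
    return math.sqrt((1.0 - a * a) / n_events)
```

Under this binomial approximation the uncertainty scales as $1/\sqrt{N}$, so it decreases with integrated luminosity as $1/\sqrt{L}$.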
In Fig. 4, some of the corresponding angular distributions are shown. While the direct prescription was applied to boost the lepton ($\ell^{-}$) to the Higgs boson ($h$) system in the top-left distribution, $\cos(\theta^{\bar{t}h}_{h})\cos(\theta^{h}_{\ell^{-}})$, the sequential prescription was used in the top-right distribution, $\sin(\theta^{t\bar{t}h}_{h})\sin(\theta^{\bar{t}}_{\bar{b}_{\bar{t}}})$, to boost the $\bar{b}$ to its parent top quark system ($\bar{t}$). For both middle plots, $\sin(\theta^{t\bar{t}h}_{h})\cos(\theta^{\bar{t}}_{b_h})$ and $\sin(\theta^{t\bar{t}h}_{t})\sin(\theta^{h}_{b_h})$ on the left and right, respectively, the sequential prescription was used to boost the $b$ quark from the Higgs boson decay to the $\bar{t}$ and $h$ center-of-mass systems, respectively. Finally, in the bottom plots, the angular distributions of $\sin(\theta^{t\bar{t}h}_{h})\sin(\theta^{t\bar{t}}_{\bar{t}})$ and $b_4$ [15] are shown on the left and right, respectively. Clear differences among the shapes of both $t\bar{t}h$ signals and also with respect to the dominant background, $t\bar{t}b\bar{b}$, are visible even after event selection and full kinematic reconstruction.
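The difference between the direct and sequential prescriptions can be illustrated with a short sketch. It boosts a $b$ quark to the Higgs boson rest frame either directly from the laboratory frame or through the intermediate $t\bar{t}h$ rest frame, and then measures its angle with respect to the Higgs boson direction in the $t\bar{t}h$ frame. The choice of reference system for this example, the function names, and the 4-momenta are all hypothetical; 4-vectors are written as $(E, p_x, p_y, p_z)$ in GeV.

```python
import math

def on_shell(mass, px, py, pz):
    """Build an on-shell 4-vector (E, px, py, pz) from mass and momentum."""
    return (math.sqrt(mass**2 + px**2 + py**2 + pz**2), px, py, pz)

def add(*vectors):
    """Component-wise sum of 4-vectors."""
    return tuple(sum(components) for components in zip(*vectors))

def boost_to_rest_frame(p, frame):
    """Boost 4-vector p into the rest frame of the massive 4-vector `frame`."""
    e, px, py, pz = p
    bx, by, bz = frame[1] / frame[0], frame[2] / frame[0], frame[3] / frame[0]
    b2 = bx * bx + by * by + bz * bz
    if b2 == 0.0:
        return p
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = bx * px + by * py + bz * pz
    k = (gamma - 1.0) * bp / b2 - gamma * e
    return (gamma * (e - bp), px + k * bx, py + k * by, pz + k * bz)

def cos_angle(a, b):
    """Cosine of the angle between the 3-momenta of two 4-vectors."""
    dot = a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
    return dot / (math.sqrt(a[1]**2 + a[2]**2 + a[3]**2)
                  * math.sqrt(b[1]**2 + b[2]**2 + b[3]**2))

def b_in_higgs_frame(t, tbar, h, b, sequential):
    """Return (b in the h rest frame, h in the tth rest frame)."""
    tth = add(t, tbar, h)
    h_in_tth = boost_to_rest_frame(h, tth)
    if sequential:
        # lab -> tth rest frame -> h rest frame
        b_in_h = boost_to_rest_frame(boost_to_rest_frame(b, tth), h_in_tth)
    else:
        # lab -> h rest frame in a single step, using laboratory 4-momenta
        b_in_h = boost_to_rest_frame(b, h)
    return b_in_h, h_in_tth

# Hypothetical laboratory-frame momenta.
t = on_shell(173.0, 100.0, 50.0, 200.0)
tbar = on_shell(173.0, -80.0, -20.0, 150.0)
h = on_shell(125.0, 30.0, -60.0, 100.0)
b = on_shell(0.0, 40.0, -10.0, 60.0)

cos_direct = cos_angle(*b_in_higgs_frame(t, tbar, h, b, sequential=False))
cos_sequential = cos_angle(*b_in_higgs_frame(t, tbar, h, b, sequential=True))
```

In both prescriptions the $b$ quark ends up with the same energy in the Higgs boson rest frame, since that energy is the invariant $p_b \cdot p_h / m_h$. Its direction, however, differs by a rotation whenever the two boosts are not collinear, so the two cosines are in general not equal.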
The angular distributions can be grouped in two different categories: i) those that exhibit similar behavior between the scalar and pseudoscalar signals, both being different from the backgrounds, and ii) those that are different among signals. While the first set (which may include distributions like the ones shown in Fig. 4, top right or bottom left) is appropriate for measurements of total $t\bar{t}h$ production rates at the LHC that do not show strong shape dependence on the type of coupling, the second set (which may include distributions like the ones in Fig. 4, middle right or bottom right) provides sensitivity to probe the CP nature of the Higgs boson Yukawa couplings to top quarks. Other observables previously proposed [15,31,39] have also been investigated. We have found that, for the semileptonic decays of $t\bar{t}h$ events, and after selection and kinematic reconstruction, they do not have the same sensitivity as the $b_4$ variable. For illustration purposes, we show in Fig. 5 (top) the expected number of events for the different SM backgrounds and the SM Higgs signal, after event selection and kinematic reconstruction, for a luminosity of 100 fb$^{-1}$ at the LHC. Two angular distributions are shown: $x_Y = \sin(\theta^{t\bar{t}h}_{h})\cos(\theta^{\bar{t}}_{b_h})$ (left) and $x_Y = \sin(\theta^{t\bar{t}h}_{h})\sin(\theta^{\bar{t}}_{\bar{b}_{\bar{t}}})$ (right). For completeness, we also show a fake data distribution obtained by randomly sampling the expected SM signal and background distributions to mimic the intended integrated luminosity. The $t\bar{t}$ + jets background in Fig. 5 (top) includes the contributions from light and $c$ jets, which, as can be seen in Table I, is a significant background after the final event selection applied in this paper. Restricting the selection to four $b$-tagged jets, the signal significance can increase at the expense of some statistical loss, and the background composition changes to a more $t\bar{t}b\bar{b}$-dominated sample [40]. This is the main reason why signal angular distributions are shown against the $t\bar{t}b\bar{b}$ background.
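One simple way to produce a fake data distribution of the kind described above is to draw, in each bin, a Poisson-distributed count around the expected signal-plus-background yield. The sketch below does this with the standard library only. Whether the sampling in this paper was done bin by bin in this way is an assumption, and the function names and yields are hypothetical.

```python
import random

def poisson(rng, mean):
    """Draw a Poisson count by counting unit-rate arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def fake_data(signal, background, seed=1):
    """Poisson-fluctuate the expected yield (signal + background) per bin."""
    rng = random.Random(seed)
    return [poisson(rng, s + b) for s, b in zip(signal, background)]

# Hypothetical expected yields per bin for a given integrated luminosity.
signal = [2.0, 3.5, 4.0, 2.5]
background = [40.0, 55.0, 48.0, 30.0]
toy = fake_data(signal, background, seed=3)
```

Fixing the seed makes the toy distribution reproducible, which is convenient when the same fake data set is overlaid on several plots.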

IV. RESULTS
Expected limits at 95% C.L. for $\sigma \times BR(h \to b\bar{b})$ and for the signal strength $\mu$, in the background-only hypothesis, were obtained using ROOT's TLimit [41] implementation of the modified frequentist likelihood method (CL$_s$) [42,43]. A test statistic was defined and computed for $10^{5}$ pseudoexperiments in the hypotheses of signal plus background and background only. The statistical fluctuations of the pseudoexperiments were generated with Poisson distributions. All statistical uncertainties of the expected backgrounds and signal efficiencies were taken into account in deriving the confidence level for a given signal hypothesis. The limits were calculated for the angular distribution $\sin(\theta^{t\bar{t}h}_{h})\sin(\theta^{\bar{t}}_{\bar{b}_{\bar{t}}})$ and the $b_4$ variable. We checked that other angular distributions gave similar results. Scalar ($h = H$) and pseudoscalar ($h = A$) signals were used, corresponding to values of the CP phase set to $|\cos\alpha| = \{0, 1\}$ [see Eq. (1)]. Figure 5 shows the limits obtained for the angular distribution $\sin(\theta^{t\bar{t}h}_{h})\sin(\theta^{\bar{t}}_{\bar{b}_{\bar{t}}})$ (middle) and $b_4$ (bottom) on $\sigma \times BR(h \to b\bar{b})$ (left) and on the signal strength $\mu$ (right). The limits were set for integrated luminosities of 100, 300, and 3000 fb$^{-1}$. Sensitivity to the SM $t\bar{t}H$ production with $\mu = 1$ should be attained shortly after 100 fb$^{-1}$ of total integrated luminosity has been collected, using the angular distributions in this channel alone. The expected confidence level for the exclusion of an overall contribution to data of a pure pseudoscalar signal ($A$) against the SM Higgs hypothesis ($H$) was found to be 85.5%, 96.9%, and 100.0% for 100, 300, and 3000 fb$^{-1}$, respectively. The results obtained in the semileptonic channel are almost a factor 2 better than the ones presented for the dileptonic channel in Ref. [17]. Combining both channels should allow one to decrease the luminosity needed to probe the structure of Higgs boson couplings to the top quarks. This study, however, is outside the scope of this paper.

FIG. 4. Angular distributions, including the $b_4$ variable [15]. These are shown after event selection and full kinematic reconstruction. The light blue line represents the $t\bar{t}h$ SM signal ($h = H$ and CP $= +1$), and the dark blue line corresponds to the pure pseudoscalar distribution $t\bar{t}h$ ($h = A$ and CP $= -1$). The filled region corresponds to the dominant $t\bar{t}b\bar{b}$ background.

FIG. 5. (Top) Distributions of $x_Y = \sin(\theta^{t\bar{t}h}_{h})\cos(\theta^{\bar{t}}_{b_h})$ (left) and $x_Y = \sin(\theta^{t\bar{t}h}_{h})\sin(\theta^{\bar{t}}_{\bar{b}_{\bar{t}}})$ (right) after final event selection and kinematic reconstruction at 13 TeV for 100 fb$^{-1}$, with the contributions from the full SM background and fake data. (Middle) Expected limits at 95% C.L. in the background-only hypothesis, for $|\cos\alpha| = 0, 1$. Limits on $\sigma \times BR(h \to b\bar{b})$ (left) and $\mu$ (right) obtained with the $\sin(\theta^{t\bar{t}h}_{h})\sin(\theta^{\bar{t}}_{\bar{b}_{\bar{t}}})$ distribution for integrated luminosities of 100, 300, and 3000 fb$^{-1}$ are shown. The lines correspond to the median, while the narrower (wider) bands correspond to the $1\sigma$ ($2\sigma$) intervals. (Bottom) The same limits as presented for the middle plots, but here for the $b_4$ distribution.
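The logic of an expected CL$_s$ value in the background-only hypothesis can be sketched with a binned likelihood ratio and Poisson pseudoexperiments. This is a simplified standalone illustration and not the TLimit implementation: it uses the test statistic $Q = -2\ln(L_{s+b}/L_b)$, ignores systematic and template statistical uncertainties, takes the median background-only outcome as the "observed" value, and uses hypothetical function names and yields.

```python
import math
import random
import statistics

def poisson(rng, mean):
    """Draw a Poisson count by counting unit-rate arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def q_value(n, s, b):
    """Q = -2 ln(L_{s+b} / L_b) for counts n; requires b_i > 0 in every bin."""
    return 2.0 * sum(si - ni * math.log1p(si / bi)
                     for ni, si, bi in zip(n, s, b))

def expected_cls(s, b, mu=1.0, n_toys=1000, seed=7):
    """Median expected CL_s = CL_{s+b} / CL_b under background only."""
    rng = random.Random(seed)
    s_mu = [mu * si for si in s]
    s_plus_b = [si + bi for si, bi in zip(s_mu, b)]
    q_b = [q_value([poisson(rng, m) for m in b], s_mu, b)
           for _ in range(n_toys)]
    q_sb = [q_value([poisson(rng, m) for m in s_plus_b], s_mu, b)
            for _ in range(n_toys)]
    q_med = statistics.median(q_b)  # median background-only outcome
    # Larger Q is more background-like; count toys at least that extreme.
    cl_b = sum(1 for q in q_b if q >= q_med) / n_toys
    cl_sb = sum(1 for q in q_sb if q >= q_med) / n_toys
    return cl_sb / cl_b
```

A signal strength $\mu$ is excluded at 95% C.L. when CL$_s < 0.05$, so an expected upper limit on $\mu$ can be obtained by scanning `mu` until `expected_cls` crosses 0.05. The number of pseudoexperiments used here is far smaller than the $10^{5}$ quoted in the text and only serves to keep the example fast.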

V. CONCLUSIONS
In this paper, we study the experimental sensitivity to the CP nature of the Higgs boson Yukawa couplings to top quarks, which can be obtained through the use of several angular observables using $t\bar{t}h$ (with $h = H, A$) events produced at the LHC. Several benchmarks for integrated luminosities were used, i.e., 100, 300, and 3000 fb$^{-1}$. Semileptonic final states from $t\bar{t}h$ decays were fully reconstructed by a kinematic fit performed with KLFITTER. We show that, even after event selection and full kinematic reconstruction, the shape of the new angular distributions and asymmetries is largely preserved and can be used to discriminate between the different types of signals (scalar vs pseudoscalar) and the dominant irreducible SM background, $t\bar{t}b\bar{b}$. As the spin information is largely preserved, the angular distributions were used to determine expected limits at 95% C.L. on $\sigma \times BR(h \to b\bar{b})$ and signal strength $\mu$. The performance obtained from the use of angular variables is compared with that of other observables commonly discussed in the literature, yielding at least the same sensitivity to the nature of the top quark Yukawa coupling, if not better. All results presented in this paper were obtained using the semileptonic final states of $t\bar{t}h$ events alone, which were found to be significantly better (around a factor 2) than the ones obtained in the dileptonic channel. Thus, searches for a CP-odd component in the coupling of the Higgs boson to top quarks can be expected to improve when combining the information from both decay channels using angular observables.