Kinematic corrections and reconstruction methods for neutral Higgs boson decay to $b\bar{b}$ in 2HDM type-I at future lepton colliders

In this paper, an approach for neutral Higgs bosons search is described based on 2HDM type-I at electron-positron linear colliders operating at $ \sqrt{s}=1$ TeV. The beam is assumed to be unpolarized and fast detector simulation is included. The signal process produces a fully hadronic final state through $ e^{+}e^{-}\rightarrow AH\rightarrow b\bar{b}b\bar{b} $ where both CP-even and CP-odd Higgs bosons ($H$ and $A$) are assumed to decay to a pair of $b$-jets. Several benchmark scenarios are introduced as the baseline for the analysis taking $ m_{H/A}$ in the range 150--300 GeV. In order to avoid Higgs boson conversion $A \to ZH$, Higgs boson masses are chosen with $m_A-m_H<m_Z$. It is shown that with a proper kinematic correction applied on final state $b$-jet four momenta, true combinations of $b$-jets can be found for simultaneous reconstruction of both Higgs bosons through $b\bar{b}$ invariant mass calculation. Results show that observable signals can be achieved with statistical significance exceeding $5\sigma$ well before the target integrated luminosity of 8 $ab^{-1}$.


Introduction
The Standard Model of particle physics (SM) is one of the most precisely tested theories which has been verified in various experiments. After the discovery of the electroweak gauge bosons, there has been extensive search for the missing key element of the standard model, i.e., the Higgs boson, h SM [1][2][3][4][5][6]. The result of these searches is the observation of a new boson at the Large Hadron Collider (LHC) by the two collaborations ATLAS and CMS [7,8].
The properties of the observed boson show reasonable compatibility with SM predictions as verified at LHC [9-16]. However, these measurements still allow possible extensions to the Higgs sector such as the two Higgs doublet model [17][18][19] which is the basis for several beyond SM scenarios such as supersymmetry [20][21][22].
The Two Higgs Doublet model (2HDM) has attracted attention even as a standalone model without necessarily embedding it in a supersymmetric theory. Since there are two Higgs boson doublets with complex fields, a total number of five Higgs bosons are predicted including the lightest Higgs boson h (the SM-like Higgs boson), the heavy neutral CP-even(CP-odd) Higgs  The overall conclusion from these searches is that the upper left region of the parameter space in m A vs m H plane defined by m A − m H > m Z is excluded up to m H 300 GeV. The area of the excluded region, however, depends on the type of 2HDM.
Before going to the analysis details, the theoretical framework of the analysis is described and the working points in the parameter space are introduced. These points represent example analysis scenarios which target the unexplored region of the parameter space by LHC.

Two Higgs Doublet Model
The two SU(2) doublets introduced in 2HDM contain complex fields resulting in eight degrees of freedom, three of which are eaten by the electroweak gauge bosons to receive their masses. Therefore five degrees of freedom remain leading to five physical Higgs bosons which are denoted as h, H, A and H ± . The neutral Higgs masses are assumed to be in the same order as listed above, i.e., h is the lightest Higgs boson which is considered to be SM-like and the other two neutral Higgs bosons are heavier. Each doublet has its own vacuum expectation value or "vev" (v 1 and v 2 ). They are related to the SM-like v = 246 GeV through v 1 = v cos β and v 2 = v sin β resulting in the ratio tan β = v 2 /v 1 which is the free parameter of the model.
In addition to β parameter, there is also rotation angle, denoted by α, which diagonalizes the masssquared matrix of the neutral Higgs bosons. The two free parameters α and β determine the Higgsfermion couplings in various 2HDM types.
The neutral scalar Higgs couplings with gauge bosons also depend on these parameters through sin(β − α) (h-gauge) or cos(β − α) (H-gauge) and therefore the lightest Higgs boson of the model acquires the same couplings with gauge bosons as those of h SM if sin(β − α) = 1 which is the alignment limit [27]. With this requirement, H/A-fermion couplings will solely depend on tan β or cot β and h-fermion couplings coincide their SM values which are m f /v with m f being the fermion mass [28][29][30][31]. It should be noted that the alignment limit is naturally realized in the decoupling regime where the other Higgs bosons are decoupled by assuming that they are much heavier than the electroweak scale v [32]. However, the Higgs boson masses under study in this work, are of O(v). Therefore the chosen scenario is the alignment limit without decoupling.
The Yukawa Lagrangian for Higgs-fermion interactions can be written in this form: According to Tab. 1, the heavy Higgs couplings acquire additional type dependent factor ρ f which can be used to distinguish the model type as well as the Higgs boson decay properties [33,34]. I  II  III  In Tab. 1, u(d) and denote the up(down)-type quarks and leptons. Type-III is also called "Flipped " and Type-IV is called "lepton-specific".

2HDM Types
We have recently performed various studies of different 2HDM types at future lepton colliders. The main focus has been on Higgs boson pair production, through e + e − → HA.
In type-I, H → bb has been shown to be the most promising decay channel with A → bb [35] or A → ZH with possible leptonic or hadronic decay of the Z boson [36][37][38]. The four b-jet final state through H/A → bb has also shown discovery potential in the flipped type or type-III [39]. In the lepton-specific type or type-IV, the leptonic decay channels, i.e., H/A → τ τ or µµ, result in observable signals in parts of the parameter space as reported in [40][41][42].
In this work, the same Higgs boson pair production in the four b-jet final state is considered as the signal. If m H/A < 2m t , the Higgs boson decay to top quark pair is kinematically forbidden. Given all Higgs-fermion couplings proportional to cot β, the branching ratio of Higgs decay to fermions becomes independent of cot β as the common factors from partial decay rates and the total width cancel out. The remaining key parameter is thus the fermion mass resulting in dominant Higgs decay to b-jet pair.
It will be shown that using kinematic correction applied on final state four-momenta, a dramatic improvement of the results is obtained compared to those reported in [35]. Moreover two possible approaches for the simultaneous reconstruction of the Higgs bosons are introduced and compared.

Signal and background processes
The signal process is assumed to be the Higgs boson pair production producing four b-jet final state in the framework of type-I 2HDM, i.e., e + e − → AH → bbbb. Since Higgs-fermion couplings are proportional to the fermion mass and the common cot β factor cancels out in branching ratio calculations at tree level, both Higgs bosons predominantly decay to bb which is the heaviest accessible fermion pair, provided that the Higgs boson mass is below the top quark pair production threshold.
The linear collider is assumed to be e + e − collider operating at center-of-mass energy of √ s = 1 TeV which is realized at the upgrade phase of ILC with target integrated luminosity of 8 ab −1 [43,44].
For illustrative purposes, several benchmark scenarios are introduced and the analysis focuses on the selected points in the Higgs boson mass parameter space as shown in Fig. 1. The masses of the CP-even (H) and CP-odd (A) Higgs bosons are chosen to be in the region between the two dashed lines shown in Fig. 1.
The analysis strategy is different from what is adopted by LHC experiments (ATLAS and CMS). They take pp → A + X as the signal followed by . This requirement limits their explorable region in m A vs m H plane which is shown with the upper dashed line in Fig. 1. In the current analysis, the Higgs boson fermionic decay, i.e., H/A → bb is adopted for analysis. Therefore it is possible to explore regions in parameter space which are inaccessible by the current LHC analyses, i.e., those with m A − m H < m Z . These points are well outside the excluded region of type-I 2HDM reported by LHC [23]. Table 2 shows parameter values for the four benchmark points BP1-BP4. The tan β parameter is set to 10 and sin(β − α) is required to be 1 for all scenarios. The Higgs potential mass parameter m 2 12 is determined by searching for values which satisfy the theoretical requirements of potential stability [45], unitarity [46][47][48] and perturbativity. Therefore, for each benchmark point, a range of allowed m 2 12 values is obtained as shown in Tab. 2.
All calculations related to the parameter values and theoretical requirements as well as the Higgs boson branching ratio of decays are performed with the help of 2HDMC 1.8.0 [49][50][51]. Agreement with experimental results is confirmed by embedding HiggsBounds 5.9.0 [52][53][54][55][56] and HiggsSignal 2.6.0 [57][58][59] in 2HDMC 1.8.0 where the selected benchmark points are checked to be not in the excluded regions reported by LHC and TeVatron experiments.
Since, the signal final state consists of four b-jets, any SM process with the same final state should be regarded as the background. Moreover, detector effects, b-tagging fake rate and the final state radiation can also be a source of background. Therefore, Drell- Yan Z/γ * (single neutral gauge boson), ZZ (gauge boson pair) and tt (top quark pair) are the main background production processes. The so called overlay hadronic background from photon interactions, i.e., γγ → hadrons is not simulated in the analysis. However we follow the same approach as adopted by the CLIC collaboration reported in [61] by adding the jet momentum smearing to account for the effect of the hadronic overlay on the jet reconstruction. The jet smearing at 1.5 TeV collisions proposed in [61] assumes 1% and 5% relative smearing applied to the jet momentum with |η| < 0.76 and |η| ≥ 0.76 respectively. We perform a rough tuning of the above values to 0.7% and 3% in the corresponding pseudorapidity bins for 1 TeV collisions.

Analysis strategy for Higgs boson reconstruction
In this section, the software setup for event generation, cross section calculation, detector response simulation and analysis is presented in detail including package versions which are all the current latest versions.
The b-tagging algorithm and kinematic corrections applied on b-jets based on full four momentum conservation are described in the next sub-sections. The analysis details are then presented where two approaches for finding the true combinations of final state objects are described and the best approach is 1987 − 2243 3720 − 3975 5948 − 6203 8671 − 8926 tan β 10 sin(β − α) 1 Table 2 The selected benchmark points of the analysis and parameter values. adopted.

Analysis software setup
The signal and background generation is performed with the use of WHIZARD 3.0.0-β [62,63] including the beam spectrum and initial state radiation (ISR). The beam spectrum file is taken from the official package repository [64]. For the detector coordinate system we use the azimuthal angle φ and pseudorapidity defined as η = − ln tan(θ/2) where θ is the polar angle with respect to the beam axis.
The detector simulation output is stored in ROOT files which are created using ROOT 6.22/08 [71] and serve as the datasets for the final numerical and graphical analysis.

Cross sections
The signal and background cross sections are obtained using both PYTHIA and WHIZARD. The signal process is defined in WHIZARD using built-in models THDM or MSSM (both models give identical cross sections) and the Higgs boson masses are set in the SINDARIN command files. Using PYTHIA for signal production requires mass spectrum files in LHA format [60] which are generated using 2HDMC. Tables 3  and 4 show the signal and background cross sections respectively with values from the two generators. Final results are normalized to WHIZARD cross sections. The Z ( * ) /γ * has been generated in the fully hadronic final state and is slightly higher than the corresponding result from PYTHIA. The ZZ background includes only Z boson pair production. The WHIZARD cross sections include beam spectrum and ISR.   Table 4 Background cross sections from WHIZARD (denoted by "W") and PYTHIA ("P").

b-tagging
Since there are four b-jets in the signal final state, the b-tagging algorithm is used based on MC truth matching using ILCgen card which contains p T , η dependent selection efficiencies for b-jets, c-jets and light jets (u, d, s). There are three benchmark scenarios for the average b-tagging efficiency: 80%, 70% and 50%.
Every signal or background event is required to contain exactly four b-jets. Therefore, the event selection starts with selecting events with exactly four jets requiring all of them to pass the b-tagging.
The kinematic requirement for the jet selection is E T > 10 GeV (soft jet veto) and |η| < 2 (central jet selection). Figure 2 shows the jet multiplicity in signal and background events and provides the reason for excluding events with more than four jets which are dominated by tt. Figure 3 shows the b-jet multiplicities in three b-tagging scenarios. Although choosing tighter b-tagging scenario reduces the four b-jet selection efficiency in signal events from 70% to roughly 50%, the contribution of tt events in the four b-jet bin is the most important point due to the very high cross section of this process. However, as is seen in Fig. 3, the contribution of this background in the four 4-jet bin is suppressed very well by choosing the third b-tagging scenario with average efficiency of 50%.

Kinematic correction
When an event containing four b-jets is selected, a kinematic correction is applied on the b-jets four momenta according to what is expected from momentum and energy conservation.
The energy conservation relies on the fact that at lepton colliders, the beam energy is known within the uncertainty arised from the ISR and beamstrahlung. At hadron colliders, the situation is more complicated due to the fact that the effective center of mass energy varies event by event due to the parton distribution functions. The beam spectrum and ISR certainly affect the correction procedure in this analysis, however, as can be seen from the final results, a reasonable performance is observed even including such effects.
It should be noted that the beam crossing angle can also affect the kinematic correction performance which implies that there is no total momentum component in any direction. The effect of beam crossing angle can easily be activated in WHIZARD. However, since it is not yet implemented in DELPHES, we did not apply it for the current analysis.
The set of four equations representing fourmomentum conservation includes four correction factors which are named c i with i ∈ 1 − 4 assigned to the four b-jets in the event: Therefore all four momentum components of each bjet receive the same correction factor without changing the b-jet flight direction. In order to avoid negative energy values, all correction factors are required to be positive. The set of Eq. 2 consists of four linear equations with four unknowns. The momentum components of the four b-jets make the 4 × 4 coefficient matrix which has a non-zero determinant due to the random nature of the momentum components in events. Therefore, a unique non-trivial solution for the four coefficient factors is expected.
The energy correction may change the order of the b-jets in the list as they are sorted according to descending energies by default. Therefore, the energy sorting algorithm is applied again after the correction and the four b-jets are selected for the rest of the analysis.
Tables 5 and 6 present the pre-selection efficiencies of the jet reconstruction and four jet selection, btagging (having four b-tagged jets in the event) and the positive correction factor requirement in signal and background processes.
According to Tab. 5, the correction efficiencies increase with increasing the Higgs boson masses. In other words, the correction performance is better for harder jets from heavier Higgs bosons and reaches 86% for BP4.
The single Z/γ * production is expected to have a low four jet selection efficiency. However, as seen from Tab. 6, The ZZ background is also suppressed very well by the four jet selection which is due to the performance of the kinematic cuts applied on the jets. The main kinematic difference between the signal and ZZ events is due to the pseudorapidity distributions which are shown in Fig. 4.
As seen from Fig. 4, the jets from ZZ events tend to proceed through the forward/backward region resulting in central jet selection efficiency (|η| < 2) of about 66%. Therefore, all four jets appear in |η| < 2 region with probability of ∼ 18%. The cut on the jet transverse energy (E jet × sin θ) further reduces the selection efficiency to 11% which appears in Tab. 6. The b-tagging requirement suppresses this background further, which, followed by the correction efficiency, results in preselection efficiency of 10 −3 . Figure 5 shows correction factor distributions in signal events (BP1) with average values of 1.16, 1.25, 1.21 and 1.26. While the mean values of the distributions are close to unity, their widths (RMS values) are 0.6, 1.1, 1.45, 1.98 for c 1 to c 4 respectively and again shows better performance of the correction for harder jets.
In order to verify the correction efficiency, using Fig. 5 as the example, the four correction factors c 1 to c 4 are found to be positive with efficiencies of 98, 96, 92 and 85% respectively. Since all factors are required to be positive, the quoted efficiencies are multiplied to yield a total efficiency of ∼ 70% which is what we obtain in the event analysis.
The effect of the kinematic correction on the b-jet pair invariant mass is shown in Fig. 6 using BP2 as the example. Details of the b-jet pair selection are presented in the next sections.
The correction factor sensitivity to the reconstructed jet energies is verified by estimating the uncertainty of the correction factors due to the jet energy smearing.
In order to do so, 1% additional smearing is applied on the jet four-momentum components on event-by-event basis and the distributions of relative errors of the correction factors are obtained. Results are shown in Fig. 7 and can be regarded as the correction factor smearing due to 1% uncertainty in the jet energies. The average uncertainties are 9, 11, 14 and 18% for c 1 to c 4 respectively.
The above estimates are one of the sources of the total uncertainty in the final distributions. However, a detailed study of different sources of uncertainties and their influence on the final distributions is beyond the scope of the current analysis and can be performed in a more detailed analysis based on full simulation of the detector.    Figure 7 The correction factor uncertainty due to 1% smearing on the jet energies.

b-jet pair selection based on their energies
Finding the true b-jet combination for Higgs boson reconstruction relies on two approaches. In the  first approach, we note that decay products which move closer to the beam axis in the Higgs boson rest frame, acquire the highest and lowest energies when the Lorentz boost is applied to transform them from the Higgs boson frame to the laboratory frame. The other decay products belong to the latter Higgs boson whose decay occurs at a larger angle with respect to the beam axis. Therefore having sorted the four b-jets in terms of their energies, b 1 and b 4 are expected to be the decay products of one Higgs boson and b 2 and b 3 from the other.
Since the decay products of the Higgs bosons fly at random angles with respect to the beam in each event, two possible scenarios may occur: Some events choose the former scenario and the others choose the latter. Therefore, the b-jet pair invariant mass distribution dramatically shows both Higgs boson signals even if the distribution is obtained using only b 1 b 4 or b 2 b 3 . If the Higgs boson masses are different enough to distinguish their signals, two separated peaks are observed, otherwise only one peak is observed. Figure 8 shows an example of the signal distributions (BP2) with two pairings, i.e., b 1 b 2 and b 2 b 3 . The latter pairing results in slightly better distribution due to using more central b-jets.

b-jet pair selection based on their spatial distance
In the alternative approach, the two b-jet pairs are selected requiring minimum ∆R between each pair. In order to do so, sum of the two ∆R values are required to be minimum for the selected pairs. The idea is based on the fact that, in general, b-jets from the decay of a particle, are expected to proceed at smaller ∆R values compared to two randomly se- lected b-jets, each one belonging to a different particle. Figure 9 shows an example of a signal event (BP1) in four b-jet final state in the detector using DELPHES Display module [67]. state in the detector. Jets from each Higgs boson tend to be collinear. The visible non-zero total momentum along the beam axis is due to the ISR+beamstrahlung as verified by inspecting the WHIZARD event file which was used for the simulation.
The above requirement turns out to perform the bjet pairing very similar to the previous scenario. As an example, in 97% of the BP1 events, the selected pairs are b 1 b 4 and b 2 b 3 and ∆R(b 1 b 4 ) + ∆R(b 2 b 3 ) is minimum among other possible combinations.
Therefore, for the final event selection, the two approaches described in sub-sections 3.5 and 3.6 are combined by requiring min(∆R(b i b j ) + ∆R(b k b l )) and then demanding i = 1, j = 4, k = 2 and l = 3.
The performance of this requirement depends on the Higgs boson masses and their momenta and decreases when m H +m A reaches the kinematic threshold √ s. In such cases, the two Higgs bosons are almost created at rest with their decay products flying back-to-back at the maximum ∆R in the laboratory frame. However, we keep this requirement to suppress the large Z/γ * background, which otherwise, extends to the signal region. Figure 10 compares the two pairing scenarios with min(∆R) applied. The two distributions are in general better than those shown in Fig. 8 Figure 11 shows the bb invariant mass distributions in signal (BP2) and background events with b-tagging efficiencies of 80 and 70%. The tt contribution is sizable in these two b-tagging scenarios, however, this background is suppressed well by the tight b-tagging scenario with average efficiency of 50%. Therefore the final results are shown in Fig.  12 based on the tight b-tagging scenario and normalized to integrated luminosity of 1 ab −1 which is enough for signal observation and can be easily used to re-scale the results to any other integrated luminosity.

Mass window optimization and final statistics
The final statistics can be obtained by applying a mass window cut which is optimized to achieve the maximum signal significance. Results are presented in terms of an optimized interval (window) in the distribution and the number of signal and background events in the mass window are counted for signal significance evaluation.  Figure 12 The b2b3 pair invariant mass distributions in signal and background events at √ s = 1 TeV. The min(∆R) requirement has been applied as described in sub-section 3.6. Table 7 shows the results of the adopted scenario presented in sub-section 3.6, including the min∆R cut efficiency, the mass window, total signal selec-tion efficiency, number of signal (S) and background (B) events, signal to background ratio and the signal significance (defined as S/ √ B), for each benchmark point, at √ s = 1 TeV and integrated luminosity of 1 ab −1 . The corresponding signal significance values using the first scenario (using b 2 b 3 without min(∆R) cut) are 109, 14, 31, 9.3.
The quoted results presented in Tab. 7 show that all selected benchmark points are observable, however, these results can easily be normalized to any other integrated luminosity.
The signal cross section is independent of tan β as the Z.H.A vertex coupling is sin(β − α) (which was set to unity in the current analysis). The Higgs boson decays with the above assumption depend on cot β and below the kinematic threshold of Higgs boson decay to tt, BR(H/A → bb) is dominant and independent of tan β at tree level. Therefore, results obtained in this analysis hold for other tan β values which are not yet excluded by experiments.

Conclusions
Possibility of observing CP-even and CP-odd neutral Higgs bosons, H and A, was studied at a lepton collider operating at √ s = 1 TeV. The theoretical framework was chosen to be the type-I 2HDM with sin(β − α) = 1 and tan β = 10. The Higgs boson decay to b-jet pair, H/A → bb, was analyzed using a fast detector simulation and two different approaches for the signal observation were presented including a kinematic correction based on the four momentum conservation. It was illustrated that a clear signal can be observed on top of the background in the bb invariant mass distribution and the signal significance exceeds 5σ in all benchmark points at integrated luminosity of 1 ab −1 . The current analysis contains improvements in several aspects, i.e., the use of beam spectrum in event generation, more dedicated ILC detector card (ILCgen), application of several b-tagging scenarios and using updated software related to theoretical and experimental considerations. Compared to the previous results reported in [35] the observed signal distributions are much sharper and well located above the assumed Higgs boson masses, the total background is more suppressed and the signal to background ratio and the signal significance also show reasonable enhancements and more points proved to be explorable with the current analysis setup. [10] G. Aad Table 7 Selection efficiencies and the signal significance in different benchmark points based on the scenario described in sub-section 3.6. The min(∆R) cut means that indices which minimize ∆R(bibj) + ∆R(b k b l ) are i = 1, j = 4, k = 2 and l = 3. [17] Branco, G.C., Spontaneous CP nonconservation and natural flavor conservation: a minimal model, Phys. Rev. D22, 2901 (1980).