Measuring the trilinear Higgs boson self--coupling at the 100 TeV hadron collider via multivariate analysis

We perform a multivariate analysis of Higgs-pair production via the decay channel $HH \to b\bar b \gamma\gamma$ at the future 100 TeV $pp$ collider to determine the trilinear Higgs self--coupling (THSC) $\lambda_{3H}$, which takes the value of 1 in the standard model. We consider all known background processes. For the signal we adopt the most recent event generator, {\tt POWHEG-BOX-V2}, to exploit the NLO distributions in the Toolkit for Multivariate Data Analysis (TMVA). Through a Boosted Decision Tree (BDT) analysis trained for $\lambda_{3H}=1$, the signal-to-background ratio improves from about $1/10$ in the conventional cut-and-count approach to about $1$, and the significance can reach up to $20.5$ with a luminosity of 3 ab$^{-1}$. In addition, by implementing a likelihood fitting of the signal-plus-background $M_{\gamma\gamma b b}$ distribution with optimized bin sizes, the THSC can be determined with a precision of 7.5\% at 68\% CL with 3 ab$^{-1}$.


I. INTRODUCTION
Since the discovery of the 125 GeV Higgs boson in 2012 at the LHC [1], we have been looking for a clear signal, or even a hint, of new physics beyond the Standard Model (SM), but without much success. Moreover, after the completion of Runs I and II at the LHC, it turns out that the 125 GeV Higgs boson is best described as the SM Higgs boson [2], although there is an upward trend in the overall signal strength [3]. In this situation, one of the most solid avenues to explore for new physics is to measure the Higgs potential, which could be significantly different from that of the SM.
The current limits on the THSC, expressed in terms of $\lambda_{3H}$, which takes the value of 1 in the SM, are $-5.0 < \lambda_{3H} < 12$ from ATLAS [15] and $-11.8 < \lambda_{3H} < 18.8$ from CMS [16] at 95% confidence level (CL). At the high-luminosity option of the LHC running at 14 TeV (HL-LHC) with an integrated luminosity of 3 ab$^{-1}$, a combined ATLAS and CMS projection of the 68% CL interval is $0.57 < \lambda_{3H} < 1.5$ without including systematic uncertainties [17].
On the other hand, the International Linear Collider (ILC) operated at 1 TeV can reach a precision of 10% at 68% CL with an integrated luminosity of 8 ab$^{-1}$ [18,19].
In this work, we perform a multivariate analysis of Higgs-pair production in the $HH \to b\bar b\gamma\gamma$ channel at the 100 TeV hadron collider. In our previous work, based on the conventional cut-and-count analysis, it was shown that the THSC can be measured with about 20% accuracy at the SM value with a luminosity of 3 ab$^{-1}$ [20]. In this Letter, using the BDT method and closely following Ref. [21], we show that the THSC can be measured with a precision of 7.5% at 68% CL at the 100 TeV hadron collider assuming a luminosity of 3 ab$^{-1}$, which is superior to the accuracy expected at the 1 TeV ILC even with 8 ab$^{-1}$.

II. EVENT GENERATION AND TMVA ANALYSIS
The Higgs bosons in the signal event samples are generated on-shell with zero width by POWHEG-BOX-V2 [22,23], with the damping factor hdamp set to the default value of 250 to limit the amount of hard radiation. This code provides NLO distributions matched to a parton shower, taking into account the full top-quark mass dependence. The signal cross section at NNLO in QCD is calculated according to $\sigma^{\rm NNLO}(\lambda_{3H}) = K^{\rm NNLO/NLO}_{\rm SM}\,\sigma^{\rm NLO}(\lambda_{3H})$, using $\sigma^{\rm NLO}(\lambda_{3H})$ from POWHEG-BOX-V2 and $K^{\rm NNLO/NLO}_{\rm SM} = 1.067$ [24] in the FT approximation, in which the full top-quark mass dependence is considered only in the real radiation, while the Born-improved Higgs Effective Field Theory is used in the virtual part. The MadSpin code [25] is then used to decay one Higgs boson into two bottom quarks and the other into two photons.
For the generation and simulation of backgrounds, we closely follow Ref. [20], except for the use of the post-LHC PDF set CT14LO [26] for the non-resonant backgrounds. Furthermore, for the two main non-resonant backgrounds, $b\bar b\gamma\gamma$ and $c\bar c\gamma\gamma$, we use the merged cross sections and distributions obtained by MLM matching [27,28] with xqcut and $Q_{\rm cut}$ set to 20 GeV and 30 GeV, respectively. For the remaining non-resonant backgrounds, we use the cross sections and distributions obtained by applying the generator-level cuts adopted in Refs. [9,13], which might provide a more reliable and conservative estimate of the non-resonant backgrounds containing light jets [20].
For parton showering and hadronization, PYTHIA8 [29] is used both for signal and backgrounds. Finally, fast-detector simulation and analysis are performed using Delphes3 [30] with the Delphes-FCC template.
All the signal and background processes are summarized in Table I, together with the corresponding event generator, the cross section times the branching ratio, the order in QCD, and the Parton Distribution Function (PDF) used.
A multivariate analysis is performed using TMVA [31] with ROOT v6.18 [32]. After applying the sequence of event selections in Table II, we choose 8 kinematic variables as TMVA inputs. The choice of the two photons or two $b$ quarks entering these variables is made as in Ref. [21], to which we also refer for the details of our TMVA setup and analysis. We choose the BDT for our analysis since the BDT-related methods show higher performance, with better signal efficiency and stronger background rejection.
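To illustrate what a boosted decision tree does with such input variables, the following is a minimal sketch of discrete AdaBoost with one-cut decision stumps in plain Python. It is a toy stand-in and not the TMVA configuration of Ref. [21]: the two Gaussian input variables, the sample sizes, the number of trees, and all function names (`train_stump`, `train_adaboost`, `bdt_response`) are assumptions made for illustration only.

```python
import math
import random


def train_stump(X, y, w):
    """Find the single-variable cut with the smallest weighted error.

    Returns (weighted_error, feature_index, threshold, sign), where the stump
    predicts `sign` for x[feature] > threshold and `-sign` otherwise.
    """
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[j] > thr else -sign) != yi)
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best


def train_adaboost(X, y, n_trees=10):
    """Discrete AdaBoost; labels y are +1 (signal) or -1 (background)."""
    n = len(X)
    w = [1.0 / n] * n
    trees = []
    for _ in range(n_trees):
        err, j, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) for perfect stumps
        alpha = 0.5 * math.log((1.0 - err) / err)
        trees.append((alpha, j, thr, sign))
        # Increase the weight of misclassified events, then renormalise.
        w = [wi * math.exp(-alpha * yi * (sign if xi[j] > thr else -sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return trees


def bdt_response(trees, x):
    """Weighted vote of all stumps, normalised to the range [-1, 1]."""
    norm = sum(alpha for alpha, _, _, _ in trees)
    return sum(alpha * (sign if x[j] > thr else -sign)
               for alpha, j, thr, sign in trees) / norm


# Hypothetical toy sample: two variables, 50 "signal" and 50 "background" events.
rng = random.Random(1)
X = ([[rng.gauss(1.0, 0.4), rng.gauss(1.0, 0.4)] for _ in range(50)]
     + [[rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)] for _ in range(50)])
y = [1] * 50 + [-1] * 50
trees = train_adaboost(X, y, n_trees=10)
accuracy = sum((bdt_response(trees, x) > 0) == (t > 0) for x, t in zip(X, y)) / len(X)
```

The normalised vote plays the role of the BDT response on which a cut is later placed; a realistic analysis would use deeper trees, separate training and test samples, and the weighted simulated events.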

III. RESULTS
In the left panel of Fig. 1, we show the BDT responses obtained using the BDT trained for $\lambda_{3H} = 1$, which is dubbed BDT$_{\rm SM}$. After validating the BDT distributions, we find that the significance is maximized when the BDT response is cut at 0.216, at which the signal and background efficiencies are 0.48 and $1.58 \times 10^{-4}$, respectively. We denote by vertical lines the positions of the optimal cut on the BDT response which maximizes the significance.
TABLE II (recovered fragment): di-photon trigger condition, $\geq 2$ isolated photons with $P_T > 30$ GeV and $|\eta| < 5$; events are required to contain $\leq 5$ jets with $P_T > 40$ GeV within $|\eta| < 5$.
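The choice of the optimal cut on the BDT response can be sketched as a simple scan. The snippet below is a sketch under stated assumptions: the significance definition (the common Poisson form $Z=\sqrt{2[(s+b)\ln(1+s/b)-s]}$) is not spelled out in this section, and the BDT scores, the pre-cut yields, and the function names are hypothetical toy inputs, not values from the analysis.

```python
import math


def significance(s, b):
    """Poisson significance Z = sqrt(2[(s+b) ln(1+s/b) - s]).

    Returns 0 when no signal or no background survives, so that cuts beyond
    the reach of the simulated background sample are never selected.
    """
    if s <= 0 or b <= 0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def scan_cut(sig_scores, bkg_scores, n_sig, n_bkg, cuts):
    """Return (cut, Z, eff_s, eff_b) for the cut that gives the largest Z.

    sig_scores, bkg_scores: BDT responses of simulated events.
    n_sig, n_bkg: expected yields before the BDT cut.
    """
    best = (None, -1.0, 0.0, 0.0)
    for cut in cuts:
        eff_s = sum(x > cut for x in sig_scores) / len(sig_scores)
        eff_b = sum(x > cut for x in bkg_scores) / len(bkg_scores)
        z = significance(n_sig * eff_s, n_bkg * eff_b)
        if z > best[1]:
            best = (cut, z, eff_s, eff_b)
    return best


# Hypothetical toy inputs, for illustration only.
sig_scores = [-0.10, 0.05, 0.15, 0.22, 0.25, 0.30, 0.35, 0.40]
bkg_scores = [-0.50, -0.45, -0.40, -0.35, -0.30, -0.20, -0.10, 0.00, 0.10, 0.24]
cuts = [i / 100.0 for i in range(-50, 40)]
best_cut, best_z, eff_s, eff_b = scan_cut(sig_scores, bkg_scores, 100.0, 10000.0, cuts)
```

Returning zero when no simulated background survives is a design choice of this sketch: it keeps the scan from favouring cuts where the background estimate is limited by Monte Carlo statistics rather than by physics.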
In Table III, we present the expected number of signal and background events at the 100 TeV hadron collider assuming 3 ab$^{-1}$, using BDT$_{\rm SM}$ with the BDT response cut of 0.216.
We show four representative values of $\lambda_{3H}$ for the signal, and the backgrounds are separated into three categories. For comparison, we also show the results obtained using the cut-and-count analysis [20]. In the last column, we additionally present the effective luminosity (Eff. Lumi.) for each of the signal and background samples. For the $t\bar t$ and $t\bar t\gamma$ backgrounds, the first (second) number is the effective luminosity when the two top quarks decay fully (semi-)leptonically. We find about 550 signal and 550 background events for $\lambda_{3H} = 1$. Compared to the results of the cut-and-count analysis [20], the number of signal events decreases by only 19% while the number of background events decreases by almost 90%, resulting in an increase in significance from 8.44 to 20.50. Note that the composition of the backgrounds changes drastically with the use of the BDT. In the cut-and-count analysis, the non-resonant background is about two times larger than the single-Higgs associated background, whereas in the BDT analysis the single-Higgs associated background is more than four times larger than the non-resonant one and the $t\bar t$ associated background becomes negligible. Note that we generate a relatively smaller number of events for the $c\bar c\gamma\gamma$, $c\bar c j\gamma$, and $b\bar b jj$ backgrounds, since we observe that they quickly decrease when the BDT response cut approaches the point $Z_{\rm max}$ of 0.216. Specifically, the $b\bar b jj$ background vanishes for a BDT response cut larger than 0.2. Otherwise, we generate a sufficient number of events considering the assumed luminosity of 3 ab$^{-1}$.
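As a rough cross-check of the quoted significance, one may assume the common Poisson definition of $Z$ for $s$ signal and $b$ background events (the definition is not restated in this section, so this is an assumption) and insert the rounded yields $s \simeq b \simeq 550$:

```latex
\begin{align}
Z &= \sqrt{2\left[(s+b)\ln\!\left(1+\frac{s}{b}\right) - s\right]}, \\
Z\,\big|_{s=b=550} &= \sqrt{2\cdot 550\,\left(2\ln 2 - 1\right)} \approx 20.6,
\qquad \frac{s}{b} = 1 .
\end{align}
```

Under this assumption the rounded yields reproduce the quoted value of 20.50 to within the rounding of the event numbers.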
First, we try to determine the THSC from the total number of events. As shown in the left panel of Fig. 2, we find that the THSC can be measured with about 11% accuracy at the SM value, which is about two times better than the result based on the conventional cut-and-count analysis [20]. However, there is a second solution around $\lambda_{3H} = 6.5$. To lift this two-fold ambiguity, we implement a likelihood fitting of the signal-plus-background $M_{\gamma\gamma bb}$ distribution and find that the second solution is ruled out at more than $8\sigma$; see the right panel of Fig. 2.
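The origin of the second solution, and the way a shape fit removes it, can be sketched with a toy model. Because the Higgs-pair amplitude is linear in $\lambda_{3H}$, the expected yield in any bin is a quadratic function of $\lambda_{3H}$, so a fixed total count generally admits two solutions. The snippet below is a sketch only: the three-bin signal templates, the background counts, the use of Asimov data at $\lambda_{3H}=1$, the neglect of systematic uncertainties, and all names are hypothetical assumptions, not the binning or yields of the analysis.

```python
import math


def quad_coeffs(y0, y1, y2):
    """Coefficients (a, b, c) of y = a + b*lam + c*lam**2 from yields at lam = 0, 1, 2."""
    c = (y2 - 2.0 * y1 + y0) / 2.0
    b = y1 - y0 - c
    return y0, b, c


def expected(lam, sig_templates, bkg):
    """Expected signal-plus-background count in each mass bin at coupling lam."""
    out = []
    for (y0, y1, y2), n_bkg in zip(sig_templates, bkg):
        a, b, c = quad_coeffs(y0, y1, y2)
        out.append(a + b * lam + c * lam * lam + n_bkg)
    return out


def q(lam, data, sig_templates, bkg):
    """Binned Poisson likelihood ratio: -2 ln [L(lam) / L(saturated)]."""
    total = 0.0
    for d, n in zip(data, expected(lam, sig_templates, bkg)):
        total += 2.0 * (n - d + (d * math.log(d / n) if d > 0 else 0.0))
    return total


# Hypothetical signal yields per mass bin at lam = 0, 1, 2, and background per bin.
SIG = [(300.0, 150.0, 60.0), (400.0, 250.0, 140.0), (200.0, 150.0, 110.0)]
BKG = [250.0, 200.0, 100.0]
data = expected(1.0, SIG, BKG)  # Asimov data generated at the SM value

# In this toy the total yield is 900 - 405*lam + 55*lam**2 + background,
# which equals the lam = 1 total again at lam = 70/11.
lam2 = 70.0 / 11.0

# 68% CL interval around lam = 1 from q <= 1 (one fitted parameter).
scan = [i / 1000.0 for i in range(0, 3001)]
inside = [lam for lam in scan if q(lam, data, SIG, BKG) <= 1.0]
interval_68 = (min(inside), max(inside))
```

In this toy, `sum(expected(lam2, SIG, BKG))` equals `sum(data)`, so a pure counting measurement cannot tell the two couplings apart, while `q(lam2, data, SIG, BKG)` is large because the mass shape at the second solution differs from the one at $\lambda_{3H}=1$.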
To improve the sensitivity to the THSC around the SM value and to tame the statistical fluctuations due to the limited size of the MC samples, we repeat the likelihood fitting of the $M_{\gamma\gamma bb}$ distribution, optimizing the bin size between 1/20 GeV and 1/60 GeV. Finally, we find that the THSC can be determined with a precision of 7.5% at 68% CL, as shown in the left panel of Fig. 3. In the right panel of Fig. 3 [...]
Before we end this section, in Table IV, we show the relative importance of the variables that we employed in this BDT analysis. We observe that the two most important variables are $\Delta R_{bb}$ and $\Delta R_{\gamma\gamma}$, which is consistent with our previous cut-and-count analysis [20].
[...] the THSC $\lambda_{3H}$ that one can expect at the 100 TeV $pp$ collider with an integrated luminosity of 3 ab$^{-1}$. With TMVA one can improve the signal-to-background ratio for $\lambda_{3H} = 1$ to $1:1$, compared with the ratio $1:10$ obtained in the conventional cut-and-count approach. Furthermore, the significance of such a signal rises to about 20.5.
Beyond determining the THSC from the total number of events, one can also improve the sensitivity and lift the two-fold degeneracy by implementing a likelihood fitting of the signal-plus-background $M_{\gamma\gamma bb}$ distribution with optimized bin sizes. The THSC can be determined with a precision of 7.5% at 68% CL with 3 ab$^{-1}$, which is indeed better than the precision expected at the ILC running at 1 TeV with 8 ab$^{-1}$. Extrapolating our result conservatively, we expect that one can achieve a precision better than $\sim 2$% with 30 ab$^{-1}$.