Combined analysis of double Higgs production via gluon fusion at the HL-LHC in the effective field theory approach

We perform a combined analysis of double Higgs production via gluon fusion in the $b\bar{b} \gamma\gamma$ and $b\bar{b}\tau^+\tau^-$ decay channels at the High-Luminosity LHC (HL-LHC). To validate our analysis, we reproduce the ATLAS result for the $b\bar{b} \gamma\gamma$ process including all contributions from fakes. For the $b\bar{b}\tau^+\tau^-$ decay channel, we perform an analysis similar to the CMS one. As an improvement, we also perform a multivariate analysis employing the boosted decision tree algorithm. We then derive 68% probability contours on the anomalous Higgs couplings in the effective field theory (EFT) approach for the various analyses. We find that the $b\bar{b}\tau^+\tau^-$ process outperforms the $b\bar{b}\gamma\gamma$ one for the measurement of energy-growing operators, while adding the $b\bar{b}\tau^+\tau^-$ process is least beneficial for improving the precision of the Higgs self-coupling (which is mainly set by the $b\bar{b}\gamma\gamma$ process). We illustrate that double Higgs production alone can be comparable to the single Higgs process in constraining the modification of the top Yukawa coupling in the positive direction. Focusing on the Higgs self-coupling as a special interest, we derive its precision as a function of various improvable parameters, such as tag and mistag rates of tau leptons and heavy flavor jets, photon identification, diphoton mass resolution, and jet energy resolution, to take into account future phenomenological studies. As an optimistic benchmark scenario, we illustrate that the 68% and 95% probability intervals of the Higgs self-coupling, $\lambda_3/\lambda_{3}^{SM}$, at the HL-LHC can reach $[0.2,\, 2.3]$ and $[-0.1,\, 3.5] \cup [4.0,\, 6.5]$, respectively, where the correlation among the EFT coefficients is taken into account.


I. INTRODUCTION
The precision measurements of the interactions of the recently discovered Higgs boson [1,2] with the fermions and gauge bosons at the Large Hadron Collider (LHC) indicate that we live in a special theory that stays weakly coupled up to very high energy scales, namely the Standard Model (SM) [3][4][5][6]. On the other hand, the story of the pure Higgs sector is less optimistic. The global picture of the Higgs potential in the bottom-up approach is currently unavailable 1 , and we may have to wait a long time until we achieve high precision on the Higgs self-coupling, according to recent studies at various future colliders [9][10][11][12][13]. Without a precision measurement of the Higgs self-coupling, we are unlikely to complete our understanding of the origin of electroweak symmetry breaking and the thermal history of the Higgs potential through the evolution of the Universe. It is not hard to imagine a new physics scenario that manifests itself only via a modification of the Higgs self-coupling.
For instance, a large deviation of the Higgs self-coupling from the SM prediction can be linked to the baryon-antibaryon asymmetry based on a strong first-order electroweak phase transition [14][15][16].
In this work, we revisit double Higgs production via gluon fusion, known as the most prominent process 2 to access modifications of the Higgs self-coupling. We target the HL-LHC as it will likely be the earliest future collider to provide us with a meaningful precision on the Higgs self-coupling. We explore three directions to exploit the benefit of the HL-LHC: i) we combine two decay channels, bbγγ and bbτ+τ−, of double Higgs production in the effective field theory (EFT) approach, just as combining various decay channels is beneficial in single Higgs coupling measurements; ii) we apply a multivariate technique employing the boosted decision tree (BDT) algorithm to our analysis as a way to enhance the significance; iii) we parametrize the precision of the Higgs self-coupling as a function of various improvable variables, such as tag and mistag rates of tau jets and heavy flavor jets, photon identification efficiency, and invariant mass resolutions, etc., to take into account future phenomenological studies.
As no evidence of new physics has been seen despite the increasing reach in the new physics scale, the viewpoint of the SM as an effective field theory, enabled by the large mass gap between the electroweak scale and the new physics scale, makes ever more sense as a way to parametrize the effects of new physics. In the EFT approach, the new physics effects are encoded in the coefficients of the effective Lagrangian. The precision measurements of those coefficients will constrain the possible structure of the underlying new physics from which the EFT originated. When the new physics is not far from the cutoff scale of the EFT, it may manifest itself as a large deviation of an EFT coefficient. The total amplitude of the double Higgs production process in the EFT approach contains many diagrams due to new types of interactions in the EFT Lagrangian (see Fig. 1). The amplitude is parametrized in terms of five anomalous couplings in the nonlinear basis, namely the top Yukawa coupling, the Higgs self-coupling, and contact interactions between two Higgs bosons and two top quarks, a Higgs boson and two gluons, and two Higgs bosons and two gluons. From the viewpoint of the EFT, setting all the EFT coefficients to their SM values except λ_3 (the Higgs self-coupling), as commonly done in the literature, may not be justified unless the selection of only λ_3 is associated with a symmetry or a hidden fine-tuning is involved. To take into account the correlation between the Higgs self-coupling and the other anomalous couplings, we keep all the EFT coefficients in our analysis.
1 The constraints on the Higgs self-coupling extracted from the LHC data at √s = 8 and √s = 13 TeV are found to be weak [7,8]. 2 There have also been attempts to constrain the Higgs self-coupling via the single Higgs process [17][18][19][20] (see [21] for a related discussion), the Vhh (V = W, Z) process [22], and double Higgs production in vector boson fusion [23].
We will perform the marginalization over the other parameters when we derive the precision of the Higgs self-coupling, our special interest.
Previous studies of double Higgs production at the HL-LHC have been performed in various final states including bbγγ [11,[24][25][26][27][28][29][30][31][32][33], bbτ+τ− [31,34,35], bbW+W− [36,37] and bbbb [38][39][40]. Among those, despite its lower signal rate, the bbγγ (BR 0.264%) decay channel is the cleanest and the most thoroughly studied in the literature. At first glance, switching to the bbτ+τ− (BR 7.31%) decay channel may look promising, as the cross section increases by a factor of 27.7 compared to the bbγγ decay channel. However, the signal rate is penalized by a series of factors, the first one being the sub-branching ratios of the τ+τ− system, namely 42.3% and 45.5% for the fully hadronic and semileptonic final states, respectively. The signal rate is further penalized by the tau tagging efficiency. Finally, reconstructing the h → τ+τ− system, for instance against Z → τ+τ−, is challenging due to the irreducible loss of information via invisible neutrinos [41]. As a result, the performance of the bbτ+τ− channel does not look better than that of the bbγγ channel [31]. The bbτ+τ− channel may have the potential to be at best comparable to the bbγγ one. While most previous studies of double Higgs production at the HL-LHC focused only on the variation of λ_3, the EFT approach was initiated in [42] and [11] (see [28,29] for related studies) for the bbγγ decay channel and in [43] for the bbτ+τ− decay channel 3 . In our work, we for the first time perform the combined analysis of the two decay channels, bbγγ and bbτ+τ−, in the EFT approach. We show that the bbτ+τ− process outperforms the bbγγ process for the measurement of energy-growing operators, while adding the bbτ+τ− process is least beneficial for improving the precision of the Higgs self-coupling.
3 The result in [43] should be considered optimistic due to an optimistic tau reconstruction efficiency and a smaller set of backgrounds. For instance, truth tau leptons with a 70% reconstruction efficiency were assumed with a negligible fake rate, and only three irreducible backgrounds, tt, hZ, and ZZ, were included.
A multivariate analysis seems the right method beyond a cut-based analysis to improve the performance when kinematic variables are correlated in a complicated way, since it can efficiently identify a signal region in a multidimensional parameter space. Some studies are found in [30,39]. We utilize the boosted decision tree (BDT) algorithm to estimate its impact on the sensitivity to the EFT coefficients 4 . We find that the multivariate analysis improves the significance of the SM from 1.3 to 2.1 at the HL-LHC for the bbγγ decay channel and from 0.9 (0.6) to 1.4 (0.8) for the fully hadronic (semileptonic) bbτ+τ− decay channel. We observe that the benefit of the multivariate analysis becomes more evident for the EFT coefficients.

Improving the photon identification and the tagging efficiencies of τ jets and heavy flavor jets (such as b and c jets), as well as suppressing the various fake rates, j → γ, j → b, and c → b, are key to enhancing the significance. Also, high-quality invariant mass resolutions of m_γγ, m_bb, and m_ττ are incontrovertible ingredients to disentangle a signal from the backgrounds. Since those improvements can be made via various independent phenomenological studies, it would be informative to express the precision as a function of those improvable factors to predict our capability in the future. A similar approach has been considered in the study at the 100 TeV pp collider [13].

This paper is organized as follows. In Section II, we discuss the basis of anomalous Higgs couplings and the parametrization of the cross section in the EFT approach. In Section III A, we perform the analysis of the bbγγ decay channel. We first validate our cut-based analysis by reproducing the ATLAS result at the HL-LHC [32] (see also [44]). Then we carry out a simple multivariate analysis using the BDT technique. In Section III B, we perform a similar exercise for the bbτ+τ− decay channel. Here, we follow a similar CMS cut-based analysis [31].
In Section IV, we discuss the sensitivity to the EFT coefficients obtained by either the individual or the combined analysis. We present the constraint on the Higgs self-coupling taking into account the correlation among the EFT coefficients. We also discuss the impact of the individual improvable factors on the precision of the Higgs self-coupling. Finally, we present our projected precision on the Higgs self-coupling based on an optimistic benchmark scenario at the HL-LHC. In Section V, we summarize our results and draw our conclusions. In Appendix A, we provide the details of our background simulation.

A. Parametrizations of Higgs boson couplings
Embedding the Higgs boson into an effective field theory approach is straightforward once a guiding principle determining the interactions of the Higgs boson with the other SM fields is specified. In the SM, the physical Higgs boson is part of a doublet of the SU(2)_L × U(1)_Y gauge group. This linear representation of the Higgs doublet is well supported by the observations at the LHC, which point toward the SM prediction. A new physics effect is allowed to appear only in the vicinity of the SM, as set by the experimental uncertainty. The corrections to the SM are encoded in the coefficients of higher-dimensional operators O_i organized by their dimensions, where Λ is the cutoff scale at which new physics kicks in. Operators with an odd dimension are neglected by assuming the accidental global symmetries of the SM, such as lepton number. The leading contribution comes from the dimension-six operators in this framework. We list only the dimension-six operators which participate in the double Higgs production process; they are written, in the SILH basis [45], with v = 1/(√2 G_F)^{1/2} = 246 GeV and m_h = 125 GeV. These dimension-six operators are formed by adding two additional powers of the Higgs doublet, H†H, to the renormalizable terms in the SM while keeping the same number of derivatives. An additional power of the Higgs doublet in the SILH basis is accompanied by the factor g*/m* ≡ 1/f, where the new physics is characterized by one coupling g* and one scale m*, the latter associated with the scale of new states. The SILH power counting gives the estimates for the coefficients, where λ is a weak spurion coupling suppressing c_g.
The typical size of the coefficients varies depending on the assumption about the structure of the UV completion. For instance, dropping the assumption that the Higgs boson is a pseudo-Goldstone boson in the SILH power counting removes the suppressions in Eq. (2) associated with the breaking of the shift symmetry. In this modified power counting, the degeneracy in the size of the coefficients in Eq. (3) is broken, and a parametric separation between coefficients can be achieved [11,21], where λ_4 is the quartic coupling in the SM Higgs potential. The parametrization in the basis where the Higgs boson belongs to the linear representation in Eq. (2) is suitable for small coefficients, as required for the EFT expansion to make sense. In the situation where the coefficients can be substantially large, the parametrization in the nonlinear basis, in terms of the custodial-singlet physical Higgs boson h, becomes more suitable. It is called a nonlinear basis in the sense that the SU(2)_L × U(1)_Y gauge group is nonlinearly realized. The Higgs boson h in the nonlinear basis is not necessarily part of an SU(2)_L doublet but can be more generic. The following five parameters in the nonlinear basis are relevant for double Higgs production, where we set the top quark mass to m_t = 173 GeV. As is evident in Eq. (5), the coefficients of terms with one Higgs boson and with two Higgs bosons are not necessarily related in the nonlinear basis, unlike in the linear basis. The coefficients in the nonlinear basis can be matched to those in the linear basis by resumming over all powers of H and expanding the operators in terms of the physical Higgs h. At the level of dimension-six operators, the relations are given in Eq. (6), where α_2 ≡ g²/4π.
The O_H operator in the linear basis (the first term of Eq. (2)) leads to a universal modification of processes involving the Higgs boson via the wave function renormalization of the Higgs boson, and it is strongly constrained by the h−Z coupling measurement.
A sizable deviation of the Higgs self-coupling in the nonlinear basis is then translated into a sizable value of c_6 according to the relation in Eq. (6). The validity of the EFT with a large c_6 is an important issue that has to be considered. As indicated in Eq. (6), the modification of the top Yukawa coupling, c_u, can be measured via the three types of amplitudes in Fig. 1 involving the c_t and c_2t couplings. It will be interesting to know how well c_u can be constrained by the double Higgs production process alone compared to the sensitivity extracted from a single Higgs process such as tth. We will illustrate in Section IV A that combining various decay channels of the double Higgs process can be comparable to the single Higgs process in constraining the modification of the top Yukawa coupling in the positive direction. An important characteristic of the double Higgs production process with the set of coefficients in Eq. (5) is that the amplitudes in Fig. 1 exhibit different scaling behaviors with the scale of the process, √ŝ, in the high energy limit. The one-loop amplitude due to the tthh interaction with c_2t has a high-energy behavior scaling like A_nl ∼ c_2t log²(m_t²/ŝ). The amplitude involving c_t, by contrast, does not exhibit such energy growth. Since the Higgs self-coupling c_3 (or c_6) appears only in the amplitudes with an s-channel Higgs propagator, which carry a suppression factor of m_h²/ŝ, its sensitivity relies on the threshold region, where the size of the backgrounds is largest. The modification of the top Yukawa coupling c_u might gain a better sensitivity via the amplitude A_nl, which grows with the energy. This energy-dependent behavior suggests exploiting an exclusive analysis utilizing the differential distribution of m_hh ≡ √ŝ to enhance the sensitivity to the couplings and break the degeneracy among the various coefficients.
This exercise has been done at the 14 TeV and 100 TeV pp colliders for the bbγγ decay channel [11,12] and at 14 TeV for the bbτ+τ− decay channel [43].

B. Cross section of double Higgs production
The square of the summed amplitudes in Fig. 1 contains various terms. We parametrize the cross section as a function of the coefficients in the nonlinear basis as in [11,21]. We choose the same m_hh bin size and division for the bbγγ channel as in [11] (see also [21]). In this way, the fit coefficients of the cross section for the six m_hh bins in [11] can be recycled. The cross section fit for the inclusive analysis can be constructed from the exclusive ones.
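Schematically, since the gg → hh amplitude is linear in five coupling structures (c_t², c_2t, c_t c_3, c_g c_3, c_2g), the binned cross section of the elided fit formula (Eq. (7)) is a quadratic form with 15 independent fit coefficients per bin. The following Python sketch illustrates this structure; the fit matrix below is entirely hypothetical, and real entries would have to be extracted from simulated samples as in [11]:

```python
# Sketch of an Eq. (7)-style parametrization: the gg -> hh amplitude is linear
# in the five coupling structures a = (c_t^2, c_2t, c_t*c_3, c_g*c_3, c_2g),
# so each m_hh bin's cross section is a quadratic form a^T M a with 15
# independent fit coefficients. The matrix M below is HYPOTHETICAL, for
# illustration only; real entries must be fitted to simulated events.

def coupling_structures(c_t, c_2t, c_3, c_g, c_2g):
    """Amplitude-level coupling structures entering gg -> hh."""
    return [c_t**2, c_2t, c_t * c_3, c_g * c_3, c_2g]

def xsec_ratio(couplings, M):
    """sigma / sigma_SM for one m_hh bin, given a symmetric 5x5 fit matrix M,
    normalized so that the SM point (c_t = c_3 = 1, others 0) returns 1."""
    a = coupling_structures(*couplings)
    sigma = sum(M[i][j] * a[i] * a[j] for i in range(5) for j in range(5))
    a_sm = coupling_structures(1.0, 0.0, 1.0, 0.0, 0.0)
    sigma_sm = sum(M[i][j] * a_sm[i] * a_sm[j] for i in range(5) for j in range(5))
    return sigma / sigma_sm

# Hypothetical symmetric fit matrix for a single m_hh bin.
M = [[ 2.0,  0.5, -1.0,  0.3,  0.2],
     [ 0.5,  1.5, -0.4,  0.1,  0.1],
     [-1.0, -0.4,  0.6, -0.2, -0.1],
     [ 0.3,  0.1, -0.2,  0.8,  0.3],
     [ 0.2,  0.1, -0.1,  0.3,  0.9]]
```

In this form, the inclusive fit is simply the bin-by-bin sum of such quadratic forms, which is why the inclusive parametrization can be built from the exclusive ones.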
A similar type of exclusive analysis in the bbτ+τ− decay channel is more challenging.
Since the decay of the τ+τ− system always involves invisible neutrinos, it is difficult to extract the exact scale of the process, √ŝ (= m_hh). Instead, one can at most reconstruct a transverse-mass-type variable of the bbτ+τ− system, denoted as m_hh^vis, out of the available information in the final state 5 . It is then technically difficult to obtain a set of fit coefficients of Eq. (7) with great accuracy for many m_hh^vis bins by the trick used in [11] (using the events at the hadron level necessarily loses statistics due to the low efficiency, reducing the accuracy of the fit coefficients). In this work, we therefore perform only the inclusive analysis of the bbτ+τ− channel, without using the m_hh^vis distribution. More importantly, √ŝ is the scale that controls the validity of the EFT. Only events below the cutoff scale, √ŝ < Λ, should be included in the analysis to derive the sensitivity to the EFT coefficients. Imposing a cut on any type of transverse mass, such as m_hh^vis, will inevitably introduce contamination from events above the cutoff scale when the correlation between √ŝ and m_hh^vis is poor [46]. At the HL-LHC, the number of SM events beyond the TeV scale after the analysis cuts is extremely tiny 6 .
Although the signal rate away from the SM point might be enhanced for operators growing with the energy, we expect that this issue does not cause a severe problem as long as the cutoff Λ is not assumed to be at the sub-TeV scale. 5 A sophisticated matrix element method (or something analogous to it) may reproduce the overall invariant mass distribution of the bbτ+τ− system that matches the parton-level m_hh distribution. However, the one-to-one matching between the two quantities on an event-by-event basis is still difficult. 6 See Table II for the differential distribution of m_hh in the bbγγ decay channel. The situation will be similar for the bbτ+τ− decay channel as its overall signal rate is similar to that of bbγγ.

III. DOUBLE HIGGS PRODUCTION AT HL-LHC
The signal events were generated using our internal C++ code linked to QCDLoop [47], which evaluates the one-loop diagrams in Fig. 1 with anomalous couplings (see Eq. (5)). The leading-order (LO) SM cross section is found to be 16.2 fb at √s = 14 TeV using the CTEQ6L1 PDFs (LO PDF with LO α_s), setting the factorization and renormalization scales to Q = m_hh, and m_h = 125 GeV, m_t = 173 GeV. We then rescale the signal cross section by the next-to-next-to-leading order (NNLO) k-factor of 2.27 [48] 7 , as done in [11]. The hard-scattering events have been showered and hadronized with PYTHIA v8.219 [56].
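The normalization above is a simple rescaling, and the "about 7% smaller" statement of footnote 7 can be checked directly from the numbers quoted in the text:

```python
# Cross-check of the signal normalization quoted in the text: the LO cross
# section is rescaled by the NNLO k-factor, and the result is compared with
# the NNLL+NNLO value of 39.64 fb quoted in footnote 7.
sigma_lo = 16.2      # fb, LO gg -> hh at sqrt(s) = 14 TeV
k_nnlo = 2.27        # NNLO k-factor from [48]
sigma_used = sigma_lo * k_nnlo          # ~36.8 fb used in the analysis

sigma_nnll = 39.64   # fb, NNLL matched to NNLO with finite-m_t effects
deficit = 1.0 - sigma_used / sigma_nnll # ~0.07, i.e. about 7% smaller
```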
All background samples were generated with MadGraph5_aMC@NLO v2.3.3 [57,58] with the default factorization and renormalization scales, interfaced with PYTHIA v6.4 for parton showering and hadronization. The full description of the background generation can be found in Appendix A.
We include some of the detector effects based on the ATLAS detector performance [59]. We smear the momenta and energies of the reconstructed jets, photons and leptons depending on their energies 8 . On the other hand, we do not include multiple interactions and pile-up in our simulation. Throughout our paper, the significance is defined in terms of N_S (N_B), the number of signal (total background) events, assuming 3 ab−1 at the HL-LHC.
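The explicit significance formula is elided in our copy of the text; a common choice consistent with the description in terms of N_S and N_B is N_S/√(N_S + N_B), which we use for illustration below (the event yields here are made up, not taken from the paper):

```python
import math

def significance(n_s, n_b):
    """Toy significance N_S / sqrt(N_S + N_B). The exact formula used in the
    paper is elided; this common choice is an assumption for illustration."""
    return n_s / math.sqrt(n_s + n_b)

# Illustrative (made-up) event yields for 3 / ab:
z = significance(8.0, 30.0)
```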
A. bbγγ decay channel

The events first go through the photon/lepton isolation criteria. A photon (lepton) is declared to be isolated if it satisfies p_T^Σ/p_T(γ) < 0.1 (p_T^Σ/p_T(l) < 0.15), where p_T^Σ is the sum of the transverse momenta of the final state particles within a ∆R = 0.4 isolation cone. We include only the isolated photons (leptons) with p_T(γ) > 25 GeV (p_T(l) > 20 GeV) in the |η(γ)| < 2.5 (|η(l)| < 2.4) region. Events with more than two isolated photons are vetoed.
The remaining particles are clustered by the FastJet [60] implementation of the anti-k_T algorithm [61] with a distance parameter of R = 0.4. Events with at least two jets which pass the minimum cuts p_T(j) > 25 GeV and |η(j)| < 2.5 are considered. Jets are further classified into three categories based on the heavy flavors that match them. Our heavy flavor tagging algorithm runs iteratively over the jets to search for b hadrons or c hadrons inside them. If a b hadron (c hadron) is found inside, the jet is classified as a b jet (c jet). The remaining unmatched jets are called light jets.
7 While the k-factor in [48] was obtained in the heavy top mass limit (see [49] for the first NLO calculation), the NNLL result matched to the NNLO cross section at 14 TeV, including the finite top quark mass effects at NLO, has been calculated to be 39.64 fb with uncertainties of +4.4% −6.0% from the QCD scale, and ±2.1% and ±2.2% from the PDF4LHC15_nnlo_mc parton distribution function and α_s, respectively [50][51][52]. The cross section used in our analysis is about 7% smaller (conservative) but within the uncertainty. The dependence of the NLO and NNLO k-factors on the EFT coefficients of double Higgs production at √s = 14 TeV has been studied in [53][54][55]. 8 For smearing jets, we take the noise (N), stochastic (S), and constant (C) parameters to be 13.15, 0.74, and 0.05, respectively [59]. Since photons and muons can be remarkably well reconstructed, we use the effective resolutions of σ/E = 0.02 and σ/E = 0.05 E/1000 for photons and muons, respectively.
We only consider events with at least four reconstructed objects, i.e. N_b + N_c + N_light-jet + N_γ + N_l ≥ 4. The isolated photons and reconstructed jets are iteratively paired into two photon candidates and two b-jet candidates for the cut-based analysis. Here, a photon candidate can be an isolated photon, a lepton faking a photon, or a jet faking a photon.
Similarly, a b-jet candidate can be a b jet, a c jet faking a b jet, or a light jet faking a b jet. We then reconstruct the two invariant masses m_γγ and m_bb out of the two photon candidates and two b-jet candidates.
Efficient b-tagging remains essential for obtaining a higher signal efficiency and lowering background contamination. While a realistic b-tagging method incorporates the full η and p_T dependence, we apply a flat b-tag rate of ε_b = 0.7 and mistag rates for a c jet (light jet) to be misidentified as a b jet of ε_c→b = 0.3 (ε_j→b = 0.015). The effects of varying the mistag rates will be discussed in Section IV B. The fake rate for a jet passing the photon identification is parametrized following ATLAS [62]. The fake rate for a lepton passing the photon identification is set to ε_l→γ = 0.02 (0.05) in the barrel |η| < 1.37 (end-cap calorimeter 1.37 < |η| < 2.37) region, based on the estimate by ATLAS [32] for the HL-LHC. Finally, the photon reconstruction efficiency is implemented based on the parametrization by ATLAS [62].
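With the flat rates quoted above (ε_b = 0.7, ε_c→b = 0.3, ε_j→b = 0.015), the event-level probability of collecting two b-tagged candidates follows from a simple convolution of per-jet tag decisions. A minimal sketch, assuming the per-jet decisions are independent:

```python
# Event-level probability of collecting at least two b-tagged jets, given the
# flat (mis)tag rates quoted in the text: eps_b = 0.7, eps_{c->b} = 0.3,
# eps_{j->b} = 0.015. Per-jet tag decisions are assumed independent.
RATES = {"b": 0.7, "c": 0.3, "j": 0.015}

def n_tag_distribution(flavors):
    """Probability distribution of the number of b-tagged jets,
    built by convolving per-jet Bernoulli outcomes."""
    dist = [1.0]  # P(0 tags) before considering any jet
    for f in flavors:
        eps = RATES[f]
        new = [0.0] * (len(dist) + 1)
        for n, p in enumerate(dist):
            new[n] += p * (1.0 - eps)   # this jet not tagged
            new[n + 1] += p * eps       # this jet tagged
        dist = new
    return dist

def prob_double_btag(flavors):
    """P(>= 2 b-tags) for an event with the given hadron-level jet flavors."""
    dist = n_tag_distribution(flavors)
    return 1.0 - dist[0] - (dist[1] if len(dist) > 1 else 0.0)
```

For example, a genuine bb pair passes the double b-tag with probability 0.7² = 0.49, while a bc pair passes with only 0.7 × 0.3 = 0.21, which is the arithmetic behind the background suppression discussed later.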

Cut-based analysis
We validate our cut-based analysis by reproducing the ATLAS result at the HL-LHC [32]; the results are summarized in Table I. The cuts of the ATLAS analysis [32] are listed below, where p_T^>(b) (p_T^<(b)) is the transverse momentum of the harder (softer) b-jet candidate.
η denotes the pseudorapidity, and ∆R is the distance in the η-φ (pseudorapidity-azimuthal angle) plane between two objects. For ∆R(γ, b), the minimum distance between the photon and b-jet candidates is taken among the four combinations. The m_γγ and m_bb (p_T(γγ) and p_T(bb)) are the invariant masses (transverse momenta) of the two-photon and two-b-jet systems.
N_jet is the number of jets which satisfy p_T(j) > 25 GeV and |η(j)| < 2.5.
The background samples were categorized in terms of jet flavors at the hadron level as in [32]. Among the backgrounds in Table I, the bbγγ (ccγγ) category includes the events with at least two b-jet candidates (c-jet candidates). The jjγγ category in Table I is the collection of events with fewer than two b- or c-jet candidates, which includes jjγγ, cjγγ, bjγγ, and bcγγ at the hadron level. We find that the dominant contribution in the jjγγ category comes from bcγγ, followed by bjγγ and cjγγ. The dominant contribution to the ttγ background is caused by leptons faking photons. As is evident in Table I, the signal rate and the backgrounds are consistent with the ATLAS simulation. The significance of the SM for the bbγγ decay channel is estimated to be 1.3, which is consistent with the ATLAS simulation. The details of the background simulation are given in Appendix A.
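The hadron-level categorization described above can be sketched as follows; giving the b category priority when an event has two b-jet candidates and two c-jet candidates is our assumption, since the text does not specify the ordering:

```python
# Sketch of the hadron-level background categorization described in the text
# ("yy" stands for the diphoton system): events with >= 2 b-jet candidates go
# to "bbyy", events with >= 2 c-jet candidates to "ccyy", everything else to
# "jjyy". Giving the b category priority when both hold is our assumption.
def categorize(n_b_cand, n_c_cand):
    if n_b_cand >= 2:
        return "bbyy"
    if n_c_cand >= 2:
        return "ccyy"
    return "jjyy"
```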
In addition to the analysis with the ATLAS cuts, we also provide a column for the analysis with the cuts of [11], with a few differences. In the analysis of [11], the fakes j, l → γ were not included in the background estimation, and the c-jet candidate was not considered separately. Instead, the c-flavor jet was included in the definition of the light jet, and a universal mistag rate, ε_j→b = 0.01, was applied. The photon reconstruction efficiency in [11] was assumed to be 80%. The numbers in the fourth column of Table I were obtained by imposing the same set of cuts as in [11], but with the tag/mistag rates and photon reconstruction efficiency of ATLAS [32], along with the separate treatment of the c-flavor jet. Consequently, the total background is significantly larger than the 37.1 events in [11], whereas the signal is found to be degraded 9 .

Multivariate Analysis
In this section, we carry out a more sophisticated analysis, namely a multivariate analysis, to improve the performance compared to the cut-based analysis. We proceed by employing the BDT algorithm with the help of the TMVA toolkit [63] in the ROOT framework [64]. The cut-based analysis cuts out the phase space by applying a series of windows (bounded by either both boundaries or one side). This approach might be able to reach the maximal efficiency via an optimization when the variables are not correlated.
9 The final signal rate (8.1) is smaller than the 12.8 events in [11]. We find that the discrepancy is caused by two differences: the p_T-dependent photon reconstruction efficiency in Eq. (9) and the jet smearing that we applied [59] (further jet smearing, in addition to that caused by the parton shower and hadronization, was not applied in [11]).
[Table I caption fragment: ... [32], by our analysis with ATLAS cuts (as the validation) and with the cuts of [11], and by MVA (multivariate analysis). The tt(≥ 1 lepton) and bbjj backgrounds are not displayed in the table due to their small sizes.]
When a signal region has a more complicated boundary due to correlations among the variables, the cut-based analysis may not be the best option. The MVA technique is one way to achieve a better performance, since it can efficiently identify the signal region in the multidimensional phase space 10 .
We prepare training samples for the BDT analysis for each background. The samples are required to pass the following cuts: The variables entering the BDT analysis, in addition to those in Eq. (11), are p_T(γγ), the transverse momentum of the hardest extra jet (if it exists) which is not assigned as a b jet or c jet, ∆R(γ_i, b_j), the distance between the ith photon and jth b jet, and E_T^miss.
10 Identifying the signal region accurately requires large statistics, and this approach will be difficult to make data-driven in a process with a low signal rate. An alternative approach may be to identify the backgrounds better in a control region (where enough statistics is guaranteed) and extrapolate them to the signal region. By rejecting the extrapolated backgrounds in the signal region, one may be able to extract the signal rate. We implicitly assume that the final performance of either approach will be similar.

As the significance as a function of the BDT cut minimum in Fig. 2 shows roughly a plateau in the vicinity of the maximal significance, it is not straightforward to choose a unique reference point in the BDT analysis unambiguously. In our BDT analysis for the bbγγ decay channel, we choose the BDT minimum of 0.90 as a reference point to proceed with deriving the sensitivities to the various anomalous Higgs couplings in later sections. Increasing the BDT minimum can further reduce the total backgrounds at only a small loss of the signal rate. In this situation, the significance becomes more robust against systematic uncertainties but, at the same time, more vulnerable to the larger statistical error caused by the limitation of the background simulation. While one may ideally select different BDT minima at different beyond-the-SM (BSM) points to take into account the varying kinematics point-by-point in the BSM phase space, we simply apply the universal BDT minimum chosen at the SM point to all the other BSM points. Using the reference BDT minimum, the significance is improved to 2.1.
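The working-point scan described above can be sketched in a few lines: scan the cut on the classifier score and track the resulting significance. The score samples below are toy numbers, and N_S/√(N_S + N_B) is an assumed significance definition (the exact formula is elided in our copy):

```python
import math

# Sketch of a BDT working-point scan: given per-event classifier scores with
# event weights for signal and background, scan the cut value and track the
# significance N_S / sqrt(N_S + N_B) (an assumed definition; the paper's exact
# formula is elided). All score samples below are toy numbers.
def scan_bdt_cut(sig, bkg, cuts):
    """sig, bkg: lists of (score, weight). Returns (best_cut, best_z)."""
    best_cut, best_z = None, -1.0
    for cut in cuts:
        n_s = sum(w for s, w in sig if s > cut)
        n_b = sum(w for s, w in bkg if s > cut)
        if n_s + n_b <= 0.0:
            continue
        z = n_s / math.sqrt(n_s + n_b)
        if z > best_z:
            best_cut, best_z = cut, z
    return best_cut, best_z

# Toy samples: signal peaks at high score, background at low score.
sig = [(0.95, 1.0)] * 8 + [(0.50, 1.0)] * 2
bkg = [(0.20, 1.0)] * 90 + [(0.92, 1.0)] * 5
cuts = [i / 100.0 for i in range(100)]
best_cut, best_z = scan_bdt_cut(sig, bkg, cuts)
```

The plateau behavior mentioned in the text shows up here too: once the cut is past the bulk of the background scores, nearby cut values give essentially the same significance.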
B. bbτ+τ− decay channel

We include the two leading final states, τ_h τ_h and τ_µ τ_h, in our analysis.
The typical signature of the leptonic tau decay is simply an isolated lepton, as the tau decays to a lepton and two neutrinos, τ → l + ν_τ + ν_l. The hadronic tau decay, on the other hand, can be identified by a dedicated tau tagger, which typically achieves a sub-percent-level misidentification rate for a 50−60% tagging efficiency [65]. We take a somewhat semi-realistic approach to the τ-jet identification. Just like a nominal b-tagging algorithm searches for b hadrons inside a jet, our τ-tagging works in a similar way by searching for a truth-level τ parton inside an R = 0.4 anti-k_T jet. The τ-jet candidate is then weighted by an appropriate tag rate. In this study, we take a 50% tagging efficiency and a 0.48% misidentification rate as central values. The impact of different tau-tagging performances will be examined in Section IV B.
The event preselection of the bbτ+τ− final state starts from the lepton isolation, which uses the same criterion as in the previous section. Only events without an isolated lepton are selected for the fully hadronic bbτ+τ− final state, whereas exactly one isolated lepton is required for the semileptonic final state. The remaining particles are clustered into anti-k_T jets with a distance parameter of R = 0.4, and only the jets with p_T(j) > 30 GeV and |η(j)| < 2.4 are considered. As in the bbγγ case, our heavy flavor/tau tagging algorithm runs iteratively over the jets to classify them as b jets, c jets, light jets, or τ jets, which are then iteratively paired into two τ-jet candidates (one τ-jet candidate and a lepton) and two b-jet candidates for the fully hadronic (semileptonic) bbτ+τ− final state. Since the decay products of both leptonic and hadronic taus include invisible neutrinos, the missing transverse momentum is defined such that it balances the p_T sum of all visible final state particles, including those from the bb system.
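The missing-transverse-momentum definition above is simply the negative vector sum of the visible transverse momenta, which can be sketched as:

```python
# Minimal sketch of the missing transverse momentum defined in the text:
# it balances the vector-summed pT of all visible final-state particles
# (including those from the bb system).
def missing_pt(visible):
    """visible: list of (px, py) pairs. Returns (missing px, missing py)."""
    px = -sum(p[0] for p in visible)
    py = -sum(p[1] for p in visible)
    return (px, py)
```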

Cut-based analysis
In our cut-based analysis, we follow a strategy similar to CMS [31], except that the reconstruction of the h → τ+τ− resonance is done differently. We impose the following set of cuts for the fully hadronic bbτ+τ− final state, where p_T^>(τ) (p_T^<(τ)) is the transverse momentum of the harder (softer) τ-jet candidate, and the following for the semileptonic bbτ+τ− final state, where τ_h (τ_l) denotes the hadronic (leptonic) tau and the p_T(τ) cut is imposed on both the hadronic and leptonic taus.
After two tau candidates are identified, we reconstruct the resonant h → τ+τ− system. A difficulty arises due to the neutrinos in the decay chain, which cause an irreducible loss of information. A well-established method used by the experimental collaborations to reconstruct the h → τ+τ− resonance utilizes a likelihood technique (see [31] for the CMS τ+τ− reconstruction). Realizing this sophisticated approach in our study is beyond the scope of our work. Instead, we adopt a rather simple prescription utilizing a transverse-mass type variable proposed in [41], called m_ττ^Higgs-bound, which is defined as

  m_ττ^Higgs-bound = min_{q_1, q_2} √(H_µ H^µ) ,   (14)

where H^µ = p_1^µ + p_2^µ + q_1^µ + q_2^µ is the total momentum of the τ+τ− system. The invariant mass of H^µ is minimized over all possible assignments of the invisible four-momenta, q_1^µ and q_2^µ, subject to the constraints in Eqs. (16) and (17). Here p_1^µ and p_2^µ are the visible four-momenta of the two tau candidates. The p^µ_{1,2} and q^µ_{1,2} are required to satisfy the internal mass constraints,

  (p_i + q_i)^2 = m_τ^2 ,  i = 1, 2 ,   (16)

while the transverse momenta of q_1^µ and q_2^µ are subject to the experimental constraint,

  q_{1T} + q_{2T} = p̸_T .   (17)

The minimization in Eq. (14) returns the m_ττ^reco value if a solution to the constraints in Eqs. (16) and (17) exists. As was pointed out in [41], a solution exists for almost all events in the parton-level analysis with the truth neutrinos, whereas the failure rate increases significantly when the method is applied to hadron-level events. We find that only ∼ 25% of the events succeed in finding a solution in the hadron-level analysis, and the rate is sensitive to the missing transverse momentum measurement. Another cause of the failure to find a solution is the extra neutrinos from the bb system via the semileptonic decays of the b hadrons.
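The constrained minimization defining m_ττ^Higgs-bound can be sketched numerically with a generic SLSQP solver; this is a toy stand-in (massless invisible systems, illustrative event content and starting point), not the dedicated implementation of [41]:

```python
import numpy as np
from scipy.optimize import minimize

M_TAU = 1.777  # GeV; treated later in the text as a tunable internal mass

def minv2(p):
    """Squared invariant mass of a four-vector (E, px, py, pz)."""
    E, px, py, pz = p
    return E**2 - px**2 - py**2 - pz**2

def higgs_bound_mass(p1, p2, met, m_int=M_TAU):
    """Minimize the tau-tau invariant mass over massless invisible momenta
    q1, q2, subject to (p_i + q_i)^2 = m_int^2 and q1T + q2T = met."""
    def unpack(x):
        # free parameters: q1 = (q1x, q1y, q1z), q2z; q2T fixed by the MET
        q1 = np.array([np.sqrt(x[0]**2 + x[1]**2 + x[2]**2), x[0], x[1], x[2]])
        q2x, q2y = met[0] - x[0], met[1] - x[1]
        q2 = np.array([np.sqrt(q2x**2 + q2y**2 + x[3]**2), q2x, q2y, x[3]])
        return q1, q2
    def objective(x):
        q1, q2 = unpack(x)
        return minv2(p1 + p2 + q1 + q2)
    cons = [
        {"type": "eq", "fun": lambda x: minv2(p1 + unpack(x)[0]) - m_int**2},
        {"type": "eq", "fun": lambda x: minv2(p2 + unpack(x)[1]) - m_int**2},
    ]
    x0 = np.array([met[0] / 2, met[1] / 2, 0.0, 0.0])
    res = minimize(objective, x0, constraints=cons, method="SLSQP")
    # the minimization "fails" (returns None) when no solution is found
    return np.sqrt(max(res.fun, 0.0)) if res.success else None

# Toy event built to satisfy the constraints (illustrative numbers).
p1 = np.array([10.0, 10.0, 0.0, 0.0])
p2 = np.array([10.0, -10.0, 0.0, 0.0])
met = np.array([3.135, 1.1])
m = higgs_bound_mass(p1, p2, met)
```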
It was also pointed out in [41] that a solution exists if and only if the following condition is satisfied:

  M_T2 ≤ m_τ ,   (18)

where the stransverse mass M_T2 [66-68] is defined as

  M_T2 = min_{q_{1T} + q_{2T} = p̸_T} [ max { M_T(p_{1T}, q_{1T}), M_T(p_{2T}, q_{2T}) } ] ,

where M_T is the transverse mass, q_{1T, 2T} are the hypothesized invisible transverse momenta, and m_inv is the mass of the invisible particle entering M_T. We have confirmed that the events failing to find a solution indeed do not satisfy the condition in Eq. (18). While the parton-level events with the truth neutrinos nicely show the end-point behavior bounded by m_τ ≈ 1.78 GeV (the on-shell mass of the tau lepton), the M_T2 distribution of the hadron-level events severely overshoots m_τ ≈ 1.78 GeV.
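The stransverse mass can be evaluated by a brute-force scan over the splittings of the missing transverse momentum; this coarse grid version is only a sketch, not the dedicated algorithms of Refs. [66-68]:

```python
import numpy as np

def transverse_mass(pT_vis, m_vis, qT, m_inv=0.0):
    """M_T^2 = m_vis^2 + m_inv^2 + 2 (E_T^vis E_T^inv - pT_vis . qT)."""
    ET_vis = np.sqrt(m_vis**2 + pT_vis @ pT_vis)
    ET_inv = np.sqrt(m_inv**2 + qT @ qT)
    return np.sqrt(max(m_vis**2 + m_inv**2
                       + 2.0 * (ET_vis * ET_inv - pT_vis @ qT), 0.0))

def mt2(p1T, m1, p2T, m2, met, m_inv=0.0, steps=61, r=3.0):
    """Minimize the larger of the two transverse masses over all grid
    splittings q1T + q2T = met (coarse numerical stand-in for M_T2)."""
    best = np.inf
    scale = max(np.linalg.norm(met), 1.0) * r
    for qx in np.linspace(-scale, scale, steps):
        for qy in np.linspace(-scale, scale, steps):
            q1 = np.array([qx, qy])
            q2 = met - q1
            val = max(transverse_mass(p1T, m1, q1, m_inv),
                      transverse_mass(p2T, m2, q2, m_inv))
            best = min(best, val)
    return best

# Balanced massless back-to-back system with no MET: M_T2 vanishes.
mt2_balanced = mt2(np.array([10.0, 0.0]), 0.0,
                   np.array([-10.0, 0.0]), 0.0, np.array([0.0, 0.0]))
```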
[Fig. 3: The significance as a function of m_τ for the bbτ+τ− process against all the backgrounds (left) and for h → τ+τ− against Z → τ+τ− (right). The significance was maximized for each m_τ value by adjusting a cut on m_ττ^reco, while the other cuts remain the same.]

Motivated by our observation regarding the inequality in Eq. (18), we take the internal mass m_τ as an independent handle that controls the success rate for the solution, instead of fixing it to the on-shell tau mass. We vary m_τ to retain more events that find a solution. In our analysis, we choose the m_τ value that maximizes the significance. To this end, we estimate the significance as a function of m_τ against all the backgrounds to the bbτ+τ− decay channel of the double Higgs production. It turns out that the significance is optimized around m_τ ∼ 10 GeV (see the left panel of Fig. 3). To gauge the importance of separating the signal from the Z + jets background, we also estimate the significance for pure h → τ+τ− samples against Z → τ+τ− events in a separate simulation 11. We find that the significance is maximized around 6−8 GeV (see the right panel of Fig. 3), which is close to the value (∼ 10 GeV) obtained using the samples of the bbτ+τ− decay channel. Throughout our analysis of the bbτ+τ− decay channel, we set m_τ = 10 GeV, with which we find that almost all events are retained 12.
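The scan over the internal mass can be sketched with a toy model in which an event is retained only if M_T2 ≤ m_τ, cf. Eq. (18); the MT2 distributions and normalizations below are purely illustrative stand-ins, not our simulated samples:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy MT2 values (GeV): illustrative stand-ins for hadron-level distributions,
# which overshoot the on-shell tau mass of ~1.78 GeV.
mt2_sig = rng.exponential(5.0, size=5000)    # "signal" bbtautau
mt2_bkg = rng.exponential(8.0, size=50000)   # e.g. a Z+jets-like background
w_sig, w_bkg = 10.0 / 5000, 4000.0 / 50000   # toy event-weight normalizations

def significance(m_int):
    """Events find a solution (are retained) only if MT2 <= m_int."""
    s = w_sig * np.sum(mt2_sig <= m_int)
    b = w_bkg * np.sum(mt2_bkg <= m_int)
    return s / np.sqrt(b) if b > 0 else 0.0

masses = np.arange(1.0, 30.0, 1.0)
zs = np.array([significance(m) for m in masses])
best_m = float(masses[int(np.argmax(zs))])  # interior maximum of the scan
```

In this toy the optimum sits at an intermediate m_int because the signal retention saturates faster than the background retention, mirroring the interior maximum seen in Fig. 3.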
The reconstructed invariant mass m_ττ^reco is shown in Fig. 4 for both the fully hadronic and semileptonic modes of the τ+τ− system. The shape of m_ττ^reco for both decay modes appears roughly symmetric around a central value lower than 125 GeV (a property of the transverse mass). Unlike the likelihood method used by the experimental collaborations, where the mass resolution for the fully hadronic case is better than for the semileptonic case 13, we instead see in Fig. 4 that the semileptonic case has a slightly better mass resolution than the fully hadronic case. This observation is an artifact of our τ-tagging algorithm. Since our τ-tagging algorithm searches for a truth τ lepton within the isolation cone around a jet, instead of requiring no hadronic activity within the annulus formed by the smaller and bigger cones around the hardest track as done in traditional τ-tagging, the resulting τ-jet candidate is necessarily more contaminated than one passing the more stringent criterion. However, we suspect that this does not change our result.

11 The pure h → τ+τ− (Z → τ+τ−) samples were generated via the Zh process with the Z-boson (Higgs) decay switched off, and both samples were further processed through the parton shower and hadronization. The τ-jet candidates and leptons were smeared based on the description in Sec. III. The significance was maximized for each m_τ value by adjusting a cut on m_ττ^reco, while the other cuts remain the same (similarly for the left panel of Fig. 3).
12 In the likelihood method used by CMS [69], all events are retained by reconstructing the invisible momenta from the neutrinos, taking into account the finite resolution of the missing transverse momentum.
13 For instance, the SVFIT algorithm in CMS achieves a relative m_ττ resolution of about 10%, 15%, and […].
The reconstruction of the invariant mass of the bb system, on the other hand, takes a similar approach to the case of bbγγ. We restrict the events to those in the following mass windows: 100 < m_ττ^reco < 125 GeV (semileptonic) for the τ+τ− system, and 95 < m_bb^reco < 135 GeV. Since tt is a dominant background, we introduce another M_T2 variable whose distribution is expected to be bounded from above around the top mass [68] (see [70] for a related discussion). This M_T2 is computed using the two b-jet candidates and the two visible τ candidates, with a minimization over the two possible b-τ pairings,

  M_T2^(bτ) = min_{q_{1T} + q_{2T} = p̸_T} [ max { M_T(p(b_1, τ_1), q_{1T}), M_T(p(b_2, τ_2), q_{2T}) } ] ,

where p(b, τ) denotes the total four-vector of a b jet and a visible τ candidate, namely p(b, τ) = p(b) + p(τ), p̸_T is the total missing transverse momentum, and m_inv = 0.
The distributions of M_T2 for the fully hadronic and the semileptonic modes are shown in Fig. 5. While the tt distribution is bounded from above around the top mass, this does not apply to the signal topology, in which the M_T2 distribution can be widely spread out; therefore, we impose a cut on the M_T2 variable to remove the tt-dominated region. The results obtained by our cut-based analyses at the HL-LHC, assuming data of 3 ab−1, are presented in Table III, and they look roughly consistent with the CMS analysis [31] 14.
The significance of the SM signal for the fully hadronic (semileptonic) decay channel is estimated to be 0.9 (0.6), which is very close to the 0.89 (0.55) of the CMS analysis [31]. Even though CMS has employed an MVA method for the semileptonic decay channel, we find that the MVA method improves the significance by only ∼ 40% for the semileptonic channel compared to the cut-based analysis (see Table III).

Multivariate Analysis
We employ the BDT algorithm for the multivariate analysis in the bbτ+τ− channel. First, we prepare training samples for each background. The events are required to pass the following cuts, where τ denotes either a hadronic or a leptonic tau. The variables used in our BDT analysis, in addition to those in Eq. (25), are ∆R(b, b), ∆R(τ, τ), ∆R(b_i, τ_j) (i, j = 1, 2), p_T(bb), p_T(ττ), m_hh^reco, p_T(hh), y(hh), N_jet, E_T^miss, and m_eff, where the effective mass is defined by m_eff = √[(p(τ_1) + p(τ_2) + p̸_T)^2], with p(τ_1) and p(τ_2) the visible four-momenta of the two τ candidates, as described before. The details of our BDT analysis can be found in Appendix B.

14 The validation of our analysis by reproducing the CMS result is not straightforward, as our treatment of the τ+τ− system differs from the CMS one. Nevertheless, we obtain a similar result.
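The BDT classification itself can be sketched with a minimal boosted-decision-stump (AdaBoost-style) implementation; the two toy input variables merely stand in for the kinematic variables listed above, and all numbers are illustrative:

```python
import numpy as np

def train_bdt(X, y, n_rounds=30):
    """AdaBoost with depth-1 trees (decision stumps); labels y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            for thr in np.quantile(X[:, j], np.linspace(0.05, 0.95, 19)):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)      # stump weight
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)             # reweight misclassified events
        w /= w.sum()
        stumps.append((alpha, j, thr, sign))
    return stumps

def bdt_score(stumps, X):
    return sum(a * s * np.where(X[:, j] > t, 1, -1) for a, j, t, s in stumps)

# Toy "signal" vs "background" in two kinematic-like variables.
rng = np.random.default_rng(3)
Xs = rng.normal([125.0, 2.0], [15.0, 0.8], size=(2000, 2))
Xb = rng.normal([90.0, 3.0], [40.0, 1.2], size=(2000, 2))
X = np.vstack([Xs, Xb])
y = np.concatenate([np.ones(2000), -np.ones(2000)]).astype(int)
stumps = train_bdt(X, y)
acc = np.mean(np.sign(bdt_score(stumps, X)) == y)
```

A cut on the BDT score then plays the role of the "BDT cut minimum" discussed below.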
The significance as a function of the signal rate is illustrated in Fig. 6, which shows that the improvement of the significance is more pronounced for the fully hadronic channel, whereas the effect is mild for the semileptonic channel. The BDT cut minima are chosen such that the signal rates are 10 and 7.9 for the fully hadronic and the semileptonic channels, respectively, where the significances are almost maximal. We use these points as reference points in the BDT analysis for the bbτ+τ− channel. Using the reference BDT cut minima, the significances reach 1.5 and 0.8 for the fully hadronic and the semileptonic channels, respectively.

IV. COMBINED ANALYSIS
In this section, we combine our results, obtained through the analyses in Section III, for the two decay channels of the double Higgs production, namely bbγγ and bbτ+τ−. For the latter, we include the semileptonic and fully hadronic decay modes of the τ+τ− system. We perform the combined analysis of both the cut-based (with the ATLAS cuts as the default choice unless specified) and multivariate types to extract the sensitivity to the coefficients of the effective Lagrangian. To this end, we exploit the exclusive (and inclusive) analysis for the bbγγ decay channel and the two inclusive analyses for the bbτ+τ− decay channel. The types of plots presented are intended to be similar to those in [11] to make the comparison clear. We also demonstrate several ways for further improvements, focusing for simplicity on the sensitivity to the Higgs self-coupling. Throughout our work, we use the Bayesian statistical method to derive the sensitivity.

Firstly, we consider the sensitivity to the EFT coefficients in the nonlinear basis, where the EFT coefficients are not related to each other. In Fig. 7, we illustrate the 68% probability contours of the likelihoods in the (c_2t, c_3) and (c_2t, c_2g) planes for the bbγγ and bbτ+τ− channels using the results obtained by our cut-based analysis. The other EFT coefficients not displayed in Fig. 7 were set to their SM values. The impact of the marginalization on the contours will be discussed later. In making the contours of the combined analysis, we take the exclusive analysis of the bbγγ channel as the default. We show the contours of the inclusive analysis of the bbγγ only for the purpose of illustration.
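A 68% probability (highest-posterior-density) contour can be extracted from a likelihood evaluated on a grid as follows; the toy anticorrelated Gaussian likelihood and the flat priors are illustrative assumptions, not our actual fit:

```python
import numpy as np

def hpd_threshold(like, level=0.68):
    """Likelihood value above which the normalized posterior encloses
    `level` of the total probability (highest posterior density)."""
    flat = np.sort(like.ravel())[::-1]
    cum = np.cumsum(flat) / flat.sum()
    return flat[np.searchsorted(cum, level)]

# Toy likelihood in the (c2t, c3) plane with a strong anticorrelation
# (rho = -0.9), qualitatively mimicking the contours of Fig. 7.
c2t = np.linspace(-1.0, 1.0, 201)
c3 = np.linspace(-4.0, 8.0, 201)
C2T, C3 = np.meshgrid(c2t, c3)
x, ycoord = C2T / 0.2, (C3 - 1.0) / 2.0
chi2 = (x**2 + ycoord**2 + 2 * 0.9 * x * ycoord) / (1 - 0.9**2)
like = np.exp(-0.5 * chi2)

thr = hpd_threshold(like)
inside = like >= thr                       # the 68% probability region
frac = like[inside].sum() / like.sum()     # should be close to 0.68
```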

A. Sensitivity on anomalous Higgs couplings in an EFT Lagrangian
The two plots in Fig. 7 reveal interesting aspects of combining various channels.
The bbτ+τ− channel shows a strong anticorrelation in the (c_2t, c_3) plane similar to that of the bbγγ, but the combined sensitivity is mainly determined by the bbγγ due to its much stronger sensitivity.
This indicates that combining the two channels is the least beneficial for the determination of the Higgs self-coupling. On the contrary, the bbτ+τ− channel outperforms the bbγγ in the (c_2t, c_2g) plane, and the sensitivity is mainly set by the bbτ+τ−, except for a positive deviation of c_2t. Although the orientation of the bbτ+τ− contour appears orthogonal to that of the bbγγ, almost the entire region of the bbτ+τ− lies inside the contour of the bbγγ, and its benefit is mainly for the determination of c_2g. The 68% probability contours of the likelihoods in the same (c_2t, c_3) and (c_2t, c_2g) planes using the more sophisticated multivariate analysis are presented in Fig. 8. As was expected from Tables I and III, the multivariate analysis improves the significance by up to a factor of 2 for each decay channel, and consequently, the contours noticeably shrink in both planes of Fig. 8 in such a way that the overlap between the two decay channels is reduced. The combined analysis breaks a significant amount of the degeneracy. As is evident in Table I, the significance of the SM from our newly done cut-based analysis with the cuts in [11] is worse than the result in [11] after including a larger set of backgrounds (with a larger mistag rate for the c-flavor jets), applying jet smearing, a p_T-dependent photon reconstruction efficiency, etc. We show the 68% probability contours of the likelihoods of the bbγγ (exclusive analysis) in Fig. 9 using our cut-based analysis with the aforementioned modifications. For the purpose of comparison, we also show the 68% probability contours of the likelihoods for the cut-based analysis with the ATLAS cuts and for the multivariate analysis. For simplicity, we have not performed the marginalization over all other parameters in Fig. 9, where we expect only a minor effect from it. We see in Fig. 9 that the sensitivity extracted from our newly done cut-based analysis is worse compared to [11], but the lost sensitivity is recovered by the multivariate analysis. As a result, the region constrained by the multivariate analysis looks similar to the region in [11].
We repeat a similar exercise for the coefficients of the effective Lagrangian in the linear basis. The set of coefficients considered in this study includes c̄_H, c̄_u, c̄_6, c̄_g, and c̄_d, while the remaining parameters, such as c̄_γ, are set to their SM values. Among those coefficients, c̄_H, c̄_u, c̄_g, and c̄_d are also constrained by the single Higgs data. We take the ATLAS projection of the single Higgs processes at the HL-LHC using 3 ab−1 [71]. So far, we have not performed the marginalization over other parameters. Since the marginalization over multiple variables is time consuming, we will examine the effect of the marginalization on the sensitivity only for limited cases in the linear basis.
In Fig. 10, we show the 68% probability contours of the likelihoods of the double Higgs production process (with and without folding in the single Higgs processes) in the (c̄_u, c̄_6) plane using our cut-based analysis with the ATLAS cuts. While the combined analysis in the left panel of Fig. 10 combines only the exclusive analysis of the bbγγ and the inclusive one of the bbτ+τ−, the single Higgs processes are folded into the combined analysis in the right panel of Fig. 10. Fig. 10 indicates that adding the bbτ+τ− channel is the least beneficial in the (c̄_u, c̄_6) plane, as we have already observed a similar property in the nonlinear basis.
Comparing the two plots in Fig. 10 indicates that the single Higgs processes, mainly tth, are effective in constraining c̄_u, namely the deviation of the up-type Yukawa coupling. We performed the marginalization for the combined analysis in the right panel of Fig. 10 over c̄_g 15 with the priors from the single Higgs data (shown as the solid red line). The effect of the marginalization is a broadening of the sensitivity to c̄_u. We perform a similar exercise using the result of the multivariate analysis. The situation is illustrated in Fig. 11, where we observe a couple of changes. The allowed region noticeably shrinks with the result obtained by the multivariate analysis. Especially, as is evident in the right panel of Fig. 11, the 68% probability contour of the combined analysis with all other parameters set to their SM values is split into two islands in the (c̄_u, c̄_6) plane. However, when the marginalization over c̄_g is performed, the two previously separated islands merge with a considerable change of shape. After examining the shapes of the likelihoods, we find that the height of the second peak away from the SM point in the (c̄_u, c̄_6) plane is significantly reduced after the marginalization, and at the same time, the height of the first peak around the SM point as well as the middle region between the two peaks is enhanced. A similar broadening effect of c̄_u after the marginalization is observed in this case.

15 We have also performed the marginalization over c̄_g, c̄_H, and c̄_d. We find that the effect of the marginalization over c̄_g dominates over the other two parameters, which indicates that marginalizing over c̄_H and c̄_d causes only a negligible effect. In what follows, c̄_H and c̄_d will be set to their SM values, namely c̄_H = c̄_d = 0, to save computation time, and the marginalization will be performed only over c̄_g (or c̄_g and c̄_u when deriving the 1D likelihood of c̄_6).
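Numerically, the marginalization amounts to integrating (summing) the gridded posterior over the nuisance coefficient; the sketch below uses an illustrative correlated toy likelihood and flat priors to show how marginalizing over c̄_g broadens the c̄_u direction relative to fixing c̄_g to its SM value:

```python
import numpy as np

# Toy 3D posterior on a (cu, c6, cg) grid; shape and correlations illustrative.
cu = np.linspace(-0.4, 0.4, 81)
c6 = np.linspace(-4.0, 8.0, 121)
cg = np.linspace(-0.02, 0.02, 41)
CU, C6, CG = np.meshgrid(cu, c6, cg, indexing="ij")
# cg shifts the preferred cu, mimicking the cu-cg correlation in the text.
like = np.exp(-0.5 * (((CU - 30 * CG) / 0.05)**2
                      + ((C6 - 1.0) / 2.0)**2
                      + (CG / 0.008)**2))

post_marg = like.sum(axis=2)                     # marginalized over cg
post_fixed = like[:, :, np.argmin(np.abs(cg))]   # cg fixed to the SM value 0

def width_68(x, p):
    """Half-width of the central 68% interval of a 1D posterior."""
    p = p / p.sum()
    cum = np.cumsum(p)
    lo = x[np.searchsorted(cum, 0.16)]
    hi = x[np.searchsorted(cum, 0.84)]
    return 0.5 * (hi - lo)

w_marg = width_68(cu, post_marg.sum(axis=1))     # broadened by the correlation
w_fix = width_68(cu, post_fixed.sum(axis=1))
```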
We move on to the constraint in the (c̄_u, c̄_g) plane. Here, we illustrate our result only for the multivariate analysis. The sensitivity to c̄_u and c̄_g extracted by the double Higgs production process alone is illustrated in the left panel of Fig. 12, which shows the strong benefit of the bbτ+τ− channel in constraining c̄_g. Regarding c̄_u, the bbγγ decay channel looks better for constraining a negative deviation, while a positive deviation is better constrained by the bbτ+τ− process. Consequently, we see that the improvement on the positive deviation of c̄_u from adding the bbτ+τ− decay channel makes the double Higgs production by itself comparable to the single Higgs process in constraining the positive deviation of the Yukawa coupling, as is seen in Fig. 13, where the sensitivities extracted by the double Higgs production alone and by the single Higgs processes are illustrated 16.

[Fig. 13: The right panel of Fig. 12 is zoomed in. The line/color coding is the same as in the right panel of Fig. 12.]

The shaded regions (both the orange and light blue colors) in Fig. 13 indicate the strong correlation between the c̄_u and c̄_g coefficients, which means that the marginalization over c̄_g can significantly affect the precision on c̄_u and vice versa (for example, see the right panels of Figs. 10 and 11). Another benefit of the double Higgs production is seen in the right panel of Fig. 12, where one of the two islands of the single Higgs fit away from the SM point is disfavored by the double Higgs production process.

Finally, we derive the 68% and 95% probability intervals on the Higgs cubic coupling, namely c̄_6, from the marginalized likelihood (over c̄_u and c̄_g) as a function of c̄_6. The result is given in Table IV 17. When c̄_H = 0 is assumed, the sensitivity to c̄_6 can be directly translated into that of c_3 in the nonlinear basis via the relation c_3 = 1 + c̄_6 (see Eq. (6)). As is shown in Fig. 14, the effect of the exclusive analysis for the bbγγ is to break the degeneracy between the two degenerate maxima of the likelihood. A comparison between the two plots in Fig. 14 demonstrates the impact of the multivariate analysis on the precision of c̄_6. As is evident in Fig. 14, the middle region between the two maxima is reduced in the multivariate analysis, and the peak around the SM point becomes more pronounced. The plots in Fig. 15 illustrate the impact of the marginalization on the precision of c̄_6. The solid (dashed) blue lines in the right panel of Fig. 15 are the likelihoods of the inclusive bbτ+τ− alone with (without) the marginalization, and they indicate that the marginalization also breaks the degeneracy between the two maxima in favor of the peak around the SM point. Overall, the net effect of the marginalization in the exclusive bbγγ and combined analyses is a small degradation of the peak around the SM point as well as of the middle region for the combined analysis. The effect is less pronounced for the cut-based analysis (see the left panel of Fig. 15). A similar observation was discussed in [11], where a strong correlation between the precision of the Yukawa coupling and the precision of c̄_6 at a 100 TeV pp collider is also discussed (see [21] as well for a related discussion).

[Table IV: The allowed region of c̄_6 for the exclusive bbγγ, inclusive bbτ+τ−, and combined channels. The 68% and 95% probability intervals are extracted from the marginalized likelihoods (over c̄_g and c̄_u) for the cut-based and the multivariate analyses. The interval on c̄_6 can be translated to that of c_3 via the relation c_3 = 1 + c̄_6, assuming c̄_H = 0.]

16 While we have not marginalized over other EFT coefficients in Fig. 13 (and Fig. 12), see [11] for the effects of the marginalization.
17 Our 68% probability interval extracted from the exclusive bbγγ (or combined) analysis employing the multivariate analysis is close to what has been reported in [11], namely c̄_6 = [−1.0, 1.8] ∪ [3.5, 5.1]. However, as was shown in Table I, the newly done cut-based analysis with the modifications mentioned below (but with the cuts in [11]) loses sensitivity, and the lost sensitivity is apparently recovered by the multivariate analysis. On the other hand, our result appears more pessimistic compared to what was reported in [21]. Note that the one-sigma sensitivity in [21] was derived by a different statistical treatment, namely reading off the interval corresponding to ∆χ² = 1, where µ_i is the signal strength as a function of c_6 for the i-th process. When a likelihood distribution is highly non-Gaussian, as is the case for the Higgs self-coupling at the HL-LHC, the Bayesian method used in our analysis does not necessarily give a similar result to that from ∆χ² = 1. We find that the absence of the second interval in [21] is also due to the different statistical treatment; for instance, applying the ∆χ² = 1 method to the same likelihood of the combined analysis using the multivariate technique in Table IV […].
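For a bimodal posterior such as the c̄_6 likelihood, the 68% and 95% probability regions are in general unions of disconnected intervals; a sketch of their extraction from a gridded 1D likelihood (the double-peaked toy posterior and flat prior are illustrative):

```python
import numpy as np

def hpd_intervals(x, p, level):
    """Highest-posterior-density region of a (possibly multimodal) 1D
    posterior, returned as a list of disconnected intervals."""
    p = p / p.sum()
    order = np.argsort(p)[::-1]
    n_keep = np.searchsorted(np.cumsum(p[order]), level) + 1
    mask = np.zeros_like(p, dtype=bool)
    mask[order[:n_keep]] = True
    intervals, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = x[i]
        if not m and start is not None:
            intervals.append((start, x[i - 1]))
            start = None
    if start is not None:
        intervals.append((start, x[-1]))
    return intervals

# Toy bimodal posterior in c6, mimicking the two peaks of the likelihood.
c6 = np.linspace(-4.0, 10.0, 1401)
post = (np.exp(-0.5 * (c6 / 1.0)**2)
        + 0.6 * np.exp(-0.5 * ((c6 - 5.0) / 1.0)**2))
iv68 = hpd_intervals(c6, post, 0.68)   # two disconnected intervals
iv95 = hpd_intervals(c6, post, 0.95)
```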

B. On the effect of future phenomenological studies
By the time the HL-LHC starts operating, there will be significant improvements in many factors that are beneficial to the performance of the double Higgs production via various independent phenomenological studies. The most important factors will be the improved τ-, c-, and b-tagging, along with the reduced mistag rates ε_{j→τ}, ε_{j→b}, and ε_{c→b}, and the improved photon identification. We vary only c̄_6 (or equivalently c_3 = 1 + c̄_6) while setting the other EFT coefficients to their SM values. Also, for simplicity, we take the combined analysis of the inclusive analyses of the bbγγ and bbτ+τ− decay channels in Figs. 16 and 17. The first interval of c̄_6 around the SM point among the two peaks of the likelihood (for instance, see Fig. 14) is selected for illustration.
We do this exercise for the purpose of illustration. The dependence of the positive deviation of c̄_6 on ε_b and ε_τ is illustrated in Fig. 16.
The signal rate scales like ∼ ε_b² and ∼ ε_τ² for the bbγγ and bbτ+τ− decay channels, respectively, which means that the significance scales with one less power of the tag rate, namely ∼ ε_b and ∼ ε_τ. On the other hand, the dependence on the fake rates, which can widely change the size of the backgrounds, is shown in Fig. 17. The improvements are less pronounced compared to those from ε_b and ε_τ. The resolution of m_γγ is another important factor for the improvement of the performance. As the nonresonant backgrounds have a featureless m_γγ distribution, unlike the sharp peak of the signal, the gain in the signal-background discrimination can be significant.
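This scaling can be made explicit in a toy significance estimate; the normalizations s0 and b0 and the reference efficiency are illustrative assumptions, not our simulated yields:

```python
import numpy as np

def significance(eps_b, s0=8.0, b0=40.0, eps0=0.7):
    """Toy scaling: both the signal and the dominant double-b-tagged
    background scale as eps_b**2, so Z = S/sqrt(B) scales linearly
    with eps_b (s0, b0, eps0 are illustrative placeholders)."""
    s = s0 * (eps_b / eps0) ** 2
    b = b0 * (eps_b / eps0) ** 2
    return s / np.sqrt(b)

# A 20% better b-tagging efficiency gives a 20% better significance.
z1, z2 = significance(0.7), significance(0.84)
```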
In Fig. 18, we show the m_γγ distribution from ATLAS [32], CMS [8], and our simulation. The standard deviation of the m_γγ distribution, SD(m_γγ), in our simulation is roughly 1.74 GeV, similar to the ATLAS simulation. The standard deviation in the recent CMS simulation is about 25% smaller (SD(m_γγ) = 1.3 GeV) than ATLAS and ours. At the OPT-HL-LHC (see Eq. (26)), we adopt this CMS value. The net effect of the smaller width is a 25% relative reduction of the bbγγ, ccγγ, bbjγ, jjγγ, and ttγ backgrounds, because the m_γγ distribution of the background processes is almost flat in the range 120 < m_γγ < 130 GeV.
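The quoted 25% reduction follows directly from a flat background and a selection window proportional to SD(m_γγ); a minimal arithmetic sketch (the ±2·SD window choice is an illustrative assumption):

```python
# For a background flat in m_aa over 120-130 GeV, the yield inside a
# window of +/- n_sd * SD around 125 GeV scales linearly with SD(m_aa).
sd_ours, sd_cms = 1.74, 1.3  # GeV, numbers from the text
bkg_density = 1.0            # events/GeV, arbitrary flat normalization

def bkg_in_window(sd, n_sd=2.0, density=bkg_density):
    return density * 2.0 * n_sd * sd

reduction = 1.0 - bkg_in_window(sd_cms) / bkg_in_window(sd_ours)  # ~0.25
```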
Based on the previous exercise and the recent progress on the performance of the b- and τ-tagging algorithms [65, 72], we select the following benchmark scenario as an optimistic HL-LHC, where we still include only muons as leptonic taus.
The likelihoods of c̄_6 at the HL-LHC (for the purpose of comparison) and the OPT-HL-LHC are illustrated in Fig. 19. Our estimate of the precision on c̄_6 using the multivariate analysis is reported in Table V, where we used the same BDT cut minima as at the HL-LHC. We extract the numbers in Table V taking into account the marginalization over c̄_u and c̄_g with the priors from the single Higgs data. As is evident in Table V, the second of the 68% probability intervals at the OPT-HL-LHC for the combined analysis has disappeared, and the previous 95% probability interval is split into two intervals. If one focuses on the first interval, a meaningful O(1) determination of the Higgs cubic coupling is possible even at the 95% probability level.
In the OPT-HL-LHC, we also included the jet energy resolution as an improvable factor that can affect the performance. We smeared the jet momenta according to the parametrization by ATLAS [73]. The fractional jet energy resolution is described by three parameters, namely the noise (N(µ)), stochastic (S), and constant (C) terms, as

  σ_E / E = √( N(µ)²/E² + S²/E + C² ) .

For the purpose of illustration, we use the combined analysis of the inclusive analyses of the bbγγ and bbτ+τ− decay channels, and we do not perform the marginalization over other EFT coefficients in Fig. 20. We find that the improved jet energy resolution is especially beneficial to the fully hadronic bbτ+τ− decay channel. For instance, it improves the discrimination of the signal, h → τ+τ−, against the Z → τ+τ− of Z + jets in the τ+τ− system, as well as the resolution of the bb system, as is evident in Fig. 21.

A comment on Table V is in order. The behavior there seems to be a characteristic of the likelihood at the HL-LHC, namely that it is highly non-Gaussian with two peaks. When only the inclusive analyses of the bbγγ and bbτ+τ− decay channels are combined, the second peak of the likelihood of c̄_6 away from the SM point becomes much more pronounced than in the case using the exclusive analysis of the bbγγ (as in Table V). This implies that the relative probability, or the area, of the second (first) peak increases (decreases), and this makes the 68% probability interval of the first peak narrower. Skipping the marginalization also makes the peaks narrower than in the marginalized case.
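The jet-energy smearing with the N, S, C parametrization can be sketched as follows; the parameter values here are illustrative placeholders, not the ATLAS fit of [73]:

```python
import numpy as np

def jet_energy_resolution(E, N=4.0, S=0.75, C=0.03):
    """Fractional resolution sigma_E/E = sqrt(N^2/E^2 + S^2/E + C^2),
    with noise N (GeV), stochastic S (GeV^1/2), and constant C terms.
    These parameter values are illustrative assumptions."""
    return np.sqrt((N / E) ** 2 + S ** 2 / E + C ** 2)

def smear_jet_energy(E, rng):
    """Gaussian smearing of the jet energy with the parametrized resolution."""
    return E * (1.0 + jet_energy_resolution(E) * rng.standard_normal())

# The spread of smeared energies reproduces the input resolution.
rng = np.random.default_rng(11)
E = 100.0
smeared = np.array([smear_jet_energy(E, rng) for _ in range(50000)])
rel_spread = smeared.std() / E
```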

V. SUMMARY
In this work, we have performed a combined analysis of the double Higgs production in the bbγγ and bbτ+τ− decay channels at the HL-LHC in the EFT approach. We have validated our cut-based analysis of the bbγγ decay channel by reproducing the ATLAS result. We have also provided a cut-based analysis following the cuts in [11] (with the modifications described in Section III A) for the purpose of comparison. For the bbτ+τ− decay channel, we have obtained a result similar to that of CMS. While the CMS analysis utilized the maximum likelihood fit method, called the SVFIT algorithm, we have implemented m_ττ^Higgs-bound [41] in our analysis to reconstruct the invariant mass of the τ+τ− system. We have shown that increasing the m_τ value (as an independent variable) with respect to the truth tau-lepton mass improves the signal efficiency, resulting in a better significance. We have explored the multivariate analysis employing the BDT technique, and we found that the BDT technique improves the significance of the SM by roughly a factor of 2 in both the bbγγ and bbτ+τ− decay channels.
Regarding the EFT approach, we have demonstrated that the different decay channels can constrain different regions of the parameter space. For instance, we have shown that the bbτ+τ− channel plays an important role in constraining the energy-growing higher-dimensional operators, while it is the least beneficial for improving the precision of the Higgs self-coupling, where the sensitivity is dominantly determined by the bbγγ channel. We have also illustrated that the double Higgs production can be more efficient than the single Higgs process in constraining a positive deviation of the top Yukawa coupling.
The performance of the double Higgs production in the bbγγ and bbτ + τ − decay channels relies on the efficient identification of tau leptons and heavy flavor jets against the fake rates as well as the performance of the calorimeters. These factors can be improved in the future throughout numerous independent phenomenological studies. In this work, we have illustrated the dependence of the precision of the Higgs self-coupling on the improvable factors, namely the tag and mistag rates of tau jets and heavy flavor jets, photon identification efficiency, and invariant mass resolution.
As a benchmark scenario, we have considered a situation at the HL-LHC with a set of improved parameters, which we call the optimistic HL-LHC or OPT-HL-LHC. We have shown that the second 68% probability interval can be removed at the OPT-HL-LHC, while the 95% probability interval is split into two intervals, the first of which, around the SM point, gives a meaningful O(1) determination of the Higgs cubic coupling. For instance, given the choice of parameters at the OPT-HL-LHC, the 68% and 95% probability intervals of […].

[Table VI: The matching is performed in the four-flavor scheme. The σ · BR denotes the signal rate before applying k-factors. In the jjγγ background, we required at least two leading jets with p_T(j) > 25 GeV (higher than the matching scale).]
[…] the four-flavor scheme. In both processes, we imposed a cut on the invariant mass of the two photons, 110 GeV < m_γγ < 140 GeV, at the generation level for better statistics. For the bbγγ background, we also imposed a cut on the invariant mass of the bb system, 60 GeV < m_bb < 300 GeV, at the generation level. In the four-flavor scheme, the contribution from the ccγγ process is included in the jjγγ process. It has been shown in [11], via the NLO estimate performed with MadGraph5_aMC@NLO, that the k-factor ∼ 2 of the bbγγ background mainly originates from real emission. The resonant backgrounds, Z(bb)h(γγ) and bbh(γγ), are matched to allow an additional jet at the ME 21. Whereas the resonant tth(γγ) background is generated without matching, its tree-level cross section is normalized by the NLO k-factor [76] (k-factor = 1.435). The ATLAS analysis [32] has shown two more non-negligible backgrounds, the bbjγ and ttγ backgrounds. Both samples are similarly matched up to an additional jet at the ME 22.
For the bbjγ background, we restrict the events to the window, 60 GeV < m bb < 300 GeV, at the generation to improve statistics. The b quarks (photons) in all backgrounds are required to satisfy p T (b, γ) > 20 GeV and |η(b)| < 3 (|η(γ)| < 2.5) at the generation. The simulation details are summarized in Table VI.
In the validation of our simulation against the ATLAS result in Table I, we still need a few more steps in the background estimation. In our simulation of the γγ + jets samples (done via two processes, bbγγ and jjγγ, at the ME) matched in the four-flavor scheme, the contributions to bjγγ and bcγγ at the hadron level are underestimated. This issue is partly related to the fact that, in the matching procedure in the four-flavor scheme, the proton does not include the b quark as an initial parton, and the gluon splitting into a bb pair, g → bb, is prohibited in the parton shower for the consistency of the four-flavor scheme. While the jjγγ category in Table I includes jjγγ, cjγγ, bjγγ, and bcγγ at the hadron level, the contributions from jjγγ and cjγγ at the hadron level are not big enough to agree with the ATLAS estimate. For clarification, we made a separate set of γγ + jets samples matched in the five-flavor scheme and estimated the sizes of bjγγ and bcγγ at the hadron level (along with a good agreement of jjγγ and cjγγ with those obtained in the four-flavor scheme). It turns out that the dominant contributions to the jjγγ category in Table I come from bcγγ, followed by bjγγ and cjγγ, at the hadron level.

21 The cross sections of the Z(bb)h(γγ) and bbh(γγ) backgrounds are estimated at NLO in [32]. In our simulation, the NLO effect is partially included via the matching.
22 As far as we can tell, the ttγ sample in [32] has not been matched.
Since we chose the matching in the four-flavor scheme with a separate simulation of the bbγγ process at the ME to achieve better statistics of the heavy-flavor jets, we adopt a simple (somewhat ad hoc) trick to take into account the contributions to bcγγ and bjγγ at the hadron level. We estimate the conversion probability of j → b (not to be confused with the mistag rate) by comparing matched (g → bb forbidden) and unmatched (g → bb allowed) jjγγ samples. We find that the conversion probability is 0.9% and 1.3% for the default and Perugia-2012 [77] parton shower tunings, respectively. Although there is some parton shower dependence, we use P_{j→b} = 1% in our analysis.

Background simulation of bbτ + τ − decay channel
The relevant backgrounds to the bbτ + τ − decay channel include Z + jets, tt, tth, ttV (with V = W ± , Z), hZ(bbτ + τ − ), tW , and V V . Among them, the Z + jets background is generated via two processes, Zbb and Zjj, at the ME to obtain sufficient statistics of b-enriched events. For the Zbb background, we imposed a cut on the invariant mass of the bb system at generation, namely 60 GeV < m bb < 300 GeV. All samples except for tth are matched up to one additional j at the ME using the k T -jet MLM matching, in either the four- or five-flavor scheme, to partially take into account NLO effects. The four-flavor scheme was chosen for the matching of the Z + jets, tt, ttV , and hZ processes. The tree-level cross section of the tth background was rescaled to the NLO value (with a k-factor of 1.435).
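The k-factor rescaling mentioned above is simple multiplicative bookkeeping; a minimal sketch (only the tth value of 1.435 is from the text, the rest is placeholder):

```python
# Minimal sketch of k-factor bookkeeping: tree-level (LO) cross sections
# are rescaled to NLO by a multiplicative, process-dependent k-factor.
K_FACTORS = {"tth": 1.435}  # value from the text; others default to 1

def nlo_xsec_fb(lo_xsec_fb, process):
    """Rescale a LO cross section (in fb) to NLO using the process k-factor."""
    return lo_xsec_fb * K_FACTORS.get(process, 1.0)

print(nlo_xsec_fb(100.0, "tth"))  # -> 143.5
```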
On the other hand, the single-top background with a W boson, tW , is matched up to one additional j in the five-flavor scheme, excluding diagrams that overlap with the tt process to avoid double counting [78].
Similarly, the diboson background V V is matched allowing an additional j in the five-flavor scheme, excluding diagrams that overlap with the tW process. The tt process is the dominant background for the semileptonic τ + τ − channel. We include only the leptonic decays of both tops for better statistics, i.e., tt → (bW + )(bW − ) → (b l + ν)(b l − ν̄), where l = e, µ, τ . For the hZ background, both h and Z were forced to decay into either two b quarks or two τ 's to improve statistics. The simulation details are summarized in Table VII.

[Table VII: Summary of the backgrounds to the bbτ + τ − decay channel of the double Higgs production process, listing for each process the ME, the matching scheme, the xqcut/Qcut (GeV), σ · BR (fb), and the number of generated events. The xqcut and Qcut set the matching scale in the k T -jet MLM matching. The matching is performed in the four-flavor scheme unless specified explicitly. The σ · BR denotes the signal rate before applying k-factors. In the Zjj background, we required at least two leading jets with p T (j) > 25 GeV (higher than the matching scale).]

Appendix B: Multivariate Analysis
In this section, we provide some details of our BDT analyses discussed in Sections III A 2 and III B 2. A few observations help us understand the situation better. The improvements from the BDT analyses in our bbγγ and bbτ + τ − channels are marginal (less than a factor of 2), which implies that the performances of the cut-and-count analyses are not bad. The set of cuts in our cut-and-count analyses is not based on a sophisticated optimization, and in principle, an optimization over the multiparameter space could lead to a smaller discrepancy between the BDT and an optimized cut-and-count analysis. In that situation, the remaining discrepancy would be purely due to the nonredundant discrimination provided by the BDT analysis. We suspect that the improvement from the BDT analysis is the accumulated effect of a series of small improvements in the variables due to more efficient signal isolation, as will be partly explained below. We have chosen a BDT cut minimum of 0.90 in our BDT analysis.

As is evident in Fig. 22, the significance improves as more variables are added. In V4 → V3, the enhanced significance is mainly due to p T (γγ) and p T (bb). The jet multiplicity N jet 23 in V3 → V2 appears to be an efficient discriminator; for instance, it can efficiently suppress the backgrounds with large jet multiplicity such as ttγ and ttH. The additional variables in V1, which complete the full set of variables used in our BDT analysis, only slightly improve the significance: they could be removed without affecting the result. It is interesting to note that p T (γ) and p T (b) can be made redundant or can be traded for other variables.
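A BDT-cut scan of this type can be sketched as follows; this is a toy illustration with synthetic Gaussian samples standing in for the simulated events, and none of the numbers or variable choices below are from the analysis:

```python
# Toy sketch of a BDT-cut significance scan (not the analysis code).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def significance(s, b):
    """Approximate significance s / sqrt(s + b)."""
    return s / np.sqrt(s + b)

rng = np.random.default_rng(0)
n = 5000
X = np.vstack([rng.normal(1.0, 1.0, (n, 4)),    # toy "signal" in 4 variables
               rng.normal(0.0, 1.0, (n, 4))])   # toy "background"
y = np.concatenate([np.ones(n), np.zeros(n)])

bdt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                 random_state=0)
bdt.fit(X, y)

# Map the classifier output to [-1, 1] (TMVA-like) and scan the cut minimum.
score = 2.0 * bdt.predict_proba(X)[:, 1] - 1.0
for cut in (0.0, 0.5, 0.9):
    s = 0.01 * np.sum((score > cut) & (y == 1))  # toy per-event weights
    b = 0.10 * np.sum((score > cut) & (y == 0))
    print(f"cut > {cut:+.2f}: Z = {significance(s, b):.2f}")
```

Tightening the cut trades background rejection against signal efficiency, and the cut minimum (0.90 in the analysis above) is chosen from such a scan.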
23 While the jet multiplicity is an efficient variable to suppress the backgrounds with large multiplicity, the simulation of higher jet multiplicities is nontrivial in some processes, including the signal: the signal simulation is at leading order in real emission (although it includes the virtual correction), and the γγ + jets samples were matched up to three jets. Although we expect that the current result would be similar to one obtained with a more rigorous simulation, its effect on the current analysis should, in principle, be taken with a grain of salt.

The variables in V5 are similar to a subset of those used in the CMS analysis. The ∆R(τ i , b j ) in V1 includes the three combinations other than the minimal one in V4. As is evident in Fig. 23, the ∆R and p T cuts on the bb and τ + τ − systems in V5 → V4 give a nonredundant improvement of the significance. This is partly because p T (τ ) and p T (b) were not included in V5. While the variables in V4 already form a minimal set for the semileptonic bbτ + τ − channel, the hadronic bbτ + τ − channel continuously gains a series of improvements in V4 → V1 (for the given choice of the BDT cut, which corresponds to 10 signal events).
Similarly to the bbγγ case, p T (τ ) and p T (b) can be made redundant or can be traded for other variables.
To make our BDT analysis stable against overtraining, we have generated sufficiently large Monte Carlo samples to guarantee smooth distributions of the variables used in the BDT analysis. As seen in Figs. 22 and 23, the significance curves as a function of the BDT cut minimum, or of the number of signal events, are smooth. We also performed the following check for overtraining: the data set is randomly split into two halves, namely S 1 and S 2 . A trained BDT function is obtained with S 1 (S 2 ), and it is applied to the other sample,