Search for the Standard Model Higgs boson produced in association with top quarks and decaying into a $b\bar{b}$ pair in $pp$ collisions at $\sqrt{s}$ = 13 TeV with the ATLAS detector

A search for the Standard Model Higgs boson produced in association with a top-quark pair, $t\bar{t}H$, is presented. The analysis uses 36.1 fb$^{-1}$ of $pp$ collision data at $\sqrt{s}$ = 13 TeV, collected with the ATLAS detector at the Large Hadron Collider in 2015 and 2016. The search targets the $H \to b\bar{b}$ decay mode. The selected events contain either one or two electrons or muons from the top-quark decays, and are then categorized according to the number of jets and how likely these are to contain $b$-hadrons. Multivariate techniques are used to discriminate between signal and background events, the latter being dominated by $t\bar{t}$ + jets production. For a Higgs boson mass of 125 GeV, the ratio of the measured $t\bar{t}H$ signal cross-section to the Standard Model expectation is found to be $\mu = 0.84^{+0.64}_{-0.61}$. A value of $\mu$ greater than 2.0 is excluded at 95% confidence level while the expected upper limit is $\mu<1.2$ in the absence of a $t\bar{t}H$ signal.

with the ATLAS detector at the Large Hadron Collider in 2015 and 2016. The search targets the H → bb decay mode. The selected events contain either one or two electrons or muons from the top-quark decays, and are then categorized according to the number of jets and how likely these are to contain b-hadrons. Multivariate techniques are used to discriminate between signal and background events, the latter being dominated by tt + jets production. For a Higgs boson mass of 125 GeV, the ratio of the measured ttH signal cross-section to the Standard Model expectation is found to be µ = 0.84 +0. 64 −0.61 . A value of µ greater than 2.0 is excluded at 95% confidence level while the expected upper limit is µ < 1.2 in the absence of a ttH signal.

Introduction
After the discovery of the Higgs boson [1][2][3] in 2012 by the ATLAS [4] and CMS [5] collaborations, attention has turned to more detailed measurements of its properties and couplings as a means of testing the predictions of the Standard Model (SM) [6][7][8]. In particular, the coupling to the top quark, the heaviest particle in the SM, could be very sensitive to effects of physics beyond the SM (BSM) [9]. Assuming that no BSM particle couples to the Higgs boson, the ATLAS and CMS experiments measured a value of the top-quark's Yukawa coupling equal to 0.87 ± 0.15 times the SM prediction by combining [10] their respective Higgs-boson measurements from the Run 1 dataset collected at center-of-mass energies of 7 TeV and 8 TeV at the Large Hadron Collider (LHC). This measurement relies largely on the gluon-gluon fusion production mode and on the decay mode to photons, which both depend on loop contributions with a top quark. If no assumption is made about the particle content of such loop contributions, then the top-quark coupling is only determined through tree-level processes, and a value of 1.4 ± 0.2 times the SM prediction is obtained.
Higgs-boson production in association with a pair of top quarks, ttH, is the most favorable production mode for a direct measurement of the top-quark's Yukawa coupling [11][12][13][14]. Although this production mode only contributes around 1% of the total Higgs-boson production cross-section [15], the top quarks in the final state offer a distinctive signature and allow many Higgs-boson decay modes to be accessed. Of these, the decay to two b-quarks is predicted to have a branching fraction of about 58% [15], the largest Higgs-boson decay mode. This decay mode is sensitive to the b-quark's Yukawa coupling, the second largest in the SM. In order to select events at the trigger level and reduce the backgrounds, the analysis targets events in which one or both top quarks decay semi-leptonically, producing an electron or a muon. 1 The main experimental challenges for this channel are the low combined efficiency to reconstruct and identify all final-state particles, the combinatorial ambiguity from the many jets containing b-hadrons in the final state which makes it difficult to reconstruct the Higgs boson, and the large backgrounds from the production of tt + jets especially when the associated jets stem from bor c-quarks. Some representative Feynman diagrams for the ttH signal are shown in Figure 1, together with the dominant tt + bb background. The ATLAS collaboration searched for ttH production with Higgs-boson decays to bb at √ s = 8 TeV, using tt decays with at least one lepton [16] or no leptons [17]. A combined signal strength µ = σ/σ SM of 1.4 ± 1.0 was measured. The CMS collaboration searched for the same process at √ s = 7 TeV and √ s = 8 TeV using tt decays with a single-lepton or dilepton in the final state, obtaining a signal strength of 0.7 ± 1.9 [18]. These results were combined with each other, and with results for Higgs boson decay to vector bosons, to τ-leptons or to photons [18][19][20], resulting in an observed (expected) significance of 4.4 (2.0) standard deviations for ttH production [10]. The measured signal strength is 2.3 +0. 7 −0.6 .
In this article, a search for ttH production with 36.1 fb −1 of pp collision data at √ s = 13 TeV is presented.
The analysis targets Higgs-boson decays to b-quarks, but all the decay modes are considered and may contribute to the signal. Events with either one or two leptons are taken into account, and exclusive analysis categories are defined according to the number of leptons, the number of jets, and the value of a b-tagging discriminant which provides a measure of how likely a jet is to contain a b-hadron. In the single-lepton channel, a specific category, referred to as 'boosted' in the following, is designed to select events containing a Higgs boson and with at least one of the two top quarks produced at high transverse momentum. In the analysis categories with the largest signal contributions, multivariate discriminants are used to classify events as more or less signal-like. The signal-rich categories are analyzed together with the signal-depleted ones in a combined profile likelihood fit that simultaneously determines the event yields for the signal and for the most important background components, while constraining the overall background model within the assigned systematic uncertainties. The combination of the results presented in this article with the results from other analyses targeting ttH production with different final states is reported in Ref. [21].
The article is organized as follows. The ATLAS detector is described in Section 2. Section 3 summarizes the selection criteria applied to events and physics objects. The signal and background modeling are presented in Section 4. Section 5 describes the event categorization while Section 6 presents the multivariate analysis techniques. The systematic uncertainties are summarized in Section 7. Section 8 presents the results and Section 9 gives the conclusions.

ATLAS detector
The ATLAS detector [22] at the LHC covers nearly the entire solid angle2 around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid magnet producing a 2 T axial magnetic field, electromagnetic and hadronic calorimeters, and an external muon spectrometer (MS) incorporating three large toroid magnet assemblies. The inner detector (ID) consists of a highgranularity silicon pixel detector and a silicon microstrip tracker, together providing precision tracking in the pseudorapidity range |η| < 2.5, complemented by a straw-tube transition radiation tracker providing tracking and electron identification information for |η| < 2.0. A new innermost silicon pixel layer, the insertable B-layer [23] (IBL), was added to the detector between Run 1 and Run 2. The IBL improves the ability to identify displaced vertices and thereby significantly improves the b-tagging performance [24]. The electromagnetic sampling calorimeter uses lead or copper as the absorber material and liquid argon (LAr) as the active medium, and is divided into barrel (|η| < 1.475), endcap (1.375 < |η| < 3.2) and forward (3.1 < |η| < 4.9) regions. Hadron calorimetry is also based on the sampling technique and covers |η| < 4.9, with either scintillator tiles or LAr as the active medium and with steel, copper or tungsten as the absorber material. The muon spectrometer measures the deflection of muons with |η| < 2.7 using multiple layers of high-precision tracking chambers located in a toroidal field. The field integral of the toroids ranges between 2.0 and 6.0 Tm across most of the detector. The muon spectrometer is also instrumented with separate trigger chambers covering |η| < 2.4. A two-level trigger system [25], using custom hardware followed by a software-based level, is used to reduce the trigger rate to an average of around one kHz for offline storage. 2 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the center of the detector and the z-axis coinciding with the axis of the beam pipe. The x-axis points from the IP to the center of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r,φ) are used in the transverse plane, φ being the azimuthal angle around the beam pipe. The pseudorapidity is defined in terms of the polar angle θ as η = − ln tan(θ/2). Unless stated otherwise, angular distance is measured in units of ∆R ≡ (∆η) 2 + (∆φ) 2 .

Event selection
Events are selected from pp collisions at √ s = 13 TeV recorded by the ATLAS detector in 2015 and 2016.
Only events for which all relevant subsystems were operational are considered. Events are required to have at least one vertex with two or more tracks with transverse momentum p T > 0.4 GeV. The vertex with the largest sum of the squares of the transverse momenta of associated tracks is taken as the primary vertex. The event reconstruction is affected by multiple pp collisions in a single bunch crossing and by collisions in neighboring bunch crossings, referred to as 'pileup'. The number of interactions per bunch crossing in this dataset ranges from about 8 to 45 interactions. The dataset corresponds to an integrated luminosity of 3.2 ± 0.1 fb −1 recorded in 2015 and 32.9 ± 0.7 fb −1 recorded in 2016, for a total of 36.1 ± 0.8 fb −1 [26].
Events in both the single-lepton and dilepton channels were recorded using single-lepton triggers. Events are required to fire triggers with either low lepton p T thresholds and a lepton isolation requirement, or with higher thresholds but with a looser identification criterion and without any isolation requirement. The lowest p T threshold used for muons is 20 (26) GeV in 2015 (2016), while for electrons the threshold is 24 (26) GeV.
Electrons are reconstructed from energy deposits (clusters) in the electromagnetic calorimeter matched to tracks reconstructed in the ID [27,28] and are required to have p T > 10 GeV and |η| < 2.47. Candidates in the calorimeter barrel-endcap transition region (1.37 < |η| < 1.52) are excluded. Electrons must satisfy the loose identification criterion described in Ref.
[28], based on a likelihood discriminant combining observables related to the shower shape in the calorimeter and to the track matching the electromagnetic cluster. Muons are reconstructed from either track segments or full tracks in the MS which are matched to tracks in the ID [29]. Tracks are then re-fitted using information from both detector systems. Muons are required to have p T > 10 GeV and |η| < 2.5. To reduce the contribution of leptons from hadronic decays (non-prompt leptons), both electrons and muons must satisfy isolation criteria based on information from both the tracker and the calorimeter. The loose lepton isolation working point [28, 29] is used. Finally, lepton tracks must match the primary vertex of the event: the longitudinal impact parameter IP z is required to satisfy |IP z | < 0.5 mm, while the transverse impact parameter significance, |IP rφ |/σ IP rφ , must be less than 5 for electrons and 3 for muons.
Jets are reconstructed from three-dimensional topological energy clusters [30] in the calorimeter using the anti-k t jet algorithm [31] implemented in the FastJet package [32] with a radius parameter of 0.4. Each topological cluster is calibrated to the electromagnetic scale response prior to jet reconstruction. The reconstructed jets are then calibrated to the jet energy scale derived from simulation and in situ corrections based on 13 TeV data [33]. After energy calibration, jets are required to have p T > 25 GeV and |η| < 2.5. Quality criteria are imposed to identify jets arising from non-collision sources or detector noise, and any event containing such a jet is removed [34]. Finally, to reduce the effect of pileup, an additional requirement is made using an algorithm that matches jets with p T < 60 GeV and |η| < 2.4 to tracks with p T > 0.4 GeV to identify jets consistent with the primary vertex. This algorithm is known as jet vertex tagger [35], referred to as JVT in the remainder of this article.
Jets are tagged as containing b-hadrons through a multivariate b-tagging algorithm (MV2c10) that combines information from an impact-parameter-based algorithm, from the explicit reconstruction of an inclusive secondary vertex and from a multi-vertex fitter that attempts to reconstruct the bto c-hadron decay chain [36,37]. This algorithm is optimized to efficiently select jets containing b-hadrons (b-jets) and separate them from jets containing c-hadrons (c-jets), jets containing hadronically decaying τ-leptons (τ-jets) and from other jets (light jets). Four working points are defined by different MV2c10 discriminant output thresholds and are referred to in the following as loose, medium, tight and very tight. The efficiency for b-jets with p T > 20 GeV in simulated tt events to pass the different working points are 85%, 77%, 70% and 60%, respectively, corresponding to rejection factors3 of c-jets in the range 3-35 and of light jets in the range 30-1500. A b-tagging discriminant value is assigned to each jet according to the tightest working point it satisfies, ranging from 1 for a jet that does not satisfy any of the b-tagging criteria defined by the considered working points up to 5 for jets satisfying the very tight criteria. This b-tagging discriminant is used to categorize selected events as discussed in Section 5 and as an input to multivariate analysis techniques described in Section 6.
Hadronically decaying τ leptons (τ had ) are distinguished from jets using the track multiplicity and a multivariate discriminant based on the track collimation, further jet substructure, and kinematic information [38]. These τ had candidates are required to have p T > 25 GeV, |η| < 2.5 and pass the Medium τ-identification working point.
To avoid counting a single detector response as more than one lepton or jet, an overlap removal procedure is adopted. To prevent double-counting of electron energy deposits as jets, the closest jet within ∆R y = (∆y) 2 + (∆φ) 2 = 0.2 of a selected electron is removed.4 If the nearest jet surviving that selection is within ∆R y = 0.4 of the electron, the electron is discarded. Muons are removed if they are separated from the nearest jet by ∆R y < 0.4, which reduces the background from heavy-flavor decays inside jets. However, if this jet has fewer than three associated tracks, the muon is kept and the jet is removed instead; this avoids an inefficiency for high-energy muons undergoing significant energy loss in the calorimeter. A τ had candidate is rejected if it is separated by ∆R y < 0.2 from any selected electron or muon.
The missing transverse momentum in the event is defined as the negative vector sum of the p T of all the selected electrons, muons and jets described above, with an extra term added to account for energy in the event which is not associated with any of these. This extra term, referred to as the 'soft term' in the following, is calculated from ID tracks matched to the primary vertex to make it resilient to pileup contamination [39,40]. The missing transverse momentum is not used for event selection but it is included in the inputs to the multivariate discriminants that are built in the most sensitive analysis categories.
For the boosted category, the selected jets are used as inputs for further jet reclustering [41] through an anti-k t algorithm with a radius parameter of R = 1.0, resulting in a collection of large-R jets. Large-R jets with a reconstructed invariant mass lower than 50 GeV are removed. The resulting large-R jets are used to identify top quarks and Higgs bosons in signal events when these have high transverse momenta (boosted) and decay into collimated hadronic final states. Boosted Higgs-boson candidates are required to have p T > 200 GeV and contain at least two constituent jets, among which at least two are b-tagged at the loose must have exactly two leptons with opposite electric charge. The subleading lepton p T must be above 15 GeV in the ee channel or above 10 GeV in the eµ and µµ channels. In the ee and µµ channels, the dilepton invariant mass must be above 15 GeV and outside of the Z-boson mass window 83-99 GeV. To maintain orthogonality with other ttH search channels [21], dilepton events are vetoed if they contain one or more τ had candidates. Events enter the single-lepton channel if they contain exactly one lepton with p T > 27 GeV and no other selected leptons with p T > 10 GeV. In the single-lepton channel, events are removed if they contain two or more τ had candidates.
To improve the purity in events passing the above selection, selected leptons are further required to satisfy additional identification and isolation criteria, otherwise the corresponding events are removed. For electrons, the tight identification criterion based on a likelihood discriminant [28] is used, while for muons the medium identification criterion [29] is used. Both the electrons and muons are required to satisfy the Gradient isolation criteria [28,29], which become more stringent as the p T of the leptons considered drops.
Finally, events in the dilepton channel must have at least three jets, of which at least two must be b-tagged at the medium working point. Single-lepton events containing at least one boosted Higgs-boson candidate, at least one boosted top-quark candidate and at least one additional jet b-tagged at the loose working point enter the boosted category. Events that do not enter the boosted category and have at least five jets, with at least two of them b-tagged at the very tight working point or three of them b-tagged at the medium working point, are classified as 'resolved' single-lepton events. The fraction of simulated ttH(H → bb) events passing the dilepton event selection is 2.5%. These fractions are 8.7% for the resolved single-lepton channel and 0.1% for the boosted category.

Signal and background modeling
This section describes the simulation and data-driven techniques used to model the ttH signal and the background processes, to train the multivariate discriminants and to define the templates for the signal extraction fit. In this analysis, most Monte Carlo (MC) samples were produced using the full ATLAS detector simulation [42] based on G 4 [43]. A faster simulation, where the full G 4 simulation of the calorimeter response is replaced by a detailed parameterization of the shower shapes [44], was adopted for some of the samples used to estimate modeling systematic uncertainties. To simulate the effects of pileup, additional interactions were generated using P 8.186 [45] and overlaid onto the simulated hard-scatter event. Simulated events are reweighted to match the pileup conditions observed in the data. All simulated events are processed through the same reconstruction algorithms and analysis chain as the data. In the simulation, the top-quark mass is assumed to be m t = 172.5 GeV. Decays of band c-hadrons were performed by E G v1.2.0 [46], except in samples simulated by the S event generator.

Signal modeling
The ttH signal process was modeled using M G 5_aMC@NLO [47] (referred to in the following as MG5_aMC@NLO) version 2.3.2 for the matrix element (ME) calculation at next-to-leading-order (NLO) accuracy in quantum chromodynamics (QCD), interfaced to the P 8.210 parton shower (PS) and hadronization model using the A14 set of tuned parameters [48]. The NNPDF3.0NLO parton distribution function (PDF) set [49] was used, and the factorization and renormalization scales were set to µ F = µ R = H T /2, with H T defined as the scalar sum of the transverse masses p 2 T + m 2 of all final-state particles. The top quarks were decayed using M S [50], preserving all spin correlations. The Higgsboson mass was set to 125 GeV and all decay modes were considered. The ttH cross-section of 507 +35 −50 fb was computed [15,[51][52][53][54][55] at NLO accuracy in QCD and includes NLO electroweak corrections. The branching fractions were calculated using HDECAY [15,56].

tt + jets background
The nominal sample used to model the tt background was generated using the P -B v2 NLO event generator [57][58][59][60], referred to as P in the remainder of this article, with the NNPDF3.0NLO PDF set. The h damp parameter, which controls the transverse momentum of the first gluon emission beyond the Born configuration, was set to 1.5 times the top-quark mass [61]. The parton shower and the hadronization were modeled by P 8.210 with the A14 set of tuned parameters. The renormalization and factorization scales were set to the transverse mass of the top quark, defined as m T,t = m 2 t + p 2 T,t , where p T,t is the transverse momentum of the top quark in the tt center-of-mass reference frame. The sample is normalized using the predicted cross-section of 832 +46 −51 pb, calculated with the Top++2.0 program [62] at next-to-next-to-leading order (NNLO) in perturbative QCD including resummation of next-to-next-to-leading logarithmic (NNLL) soft gluon terms [63][64][65][66]. Alternative tt samples used to derive systematic uncertainties are described in Section 7.
The tt + jets background is categorized according to the flavor of additional jets in the event, using the same procedure as described in Ref. [16]. Generator-level particle jets are reconstructed from stable particles (mean lifetime τ > 3 × 10 −11 seconds) using the anti-k t algorithm with a radius parameter R = 0.4, and are required to have p T > 15 GeV and |η| < 2.5. This categorization employs a jet flavor-labeling procedure that is more refined than the one described in Section 3. The flavor of a jet is determined by counting the number of bor c-hadrons within ∆R < 0.4 of the jet axis. Jets matched to exactly one b-hadron, with p T above 5 GeV, are labeled single-b-jets, while those matched to two or more b-hadrons are labeled B-jets (with no p T requirement on the second hadron); single-c-and C-jets are defined analogously, only considering jets not already defined as single-b-or B-jets. Events that have at least one single-b-or B-jet, not counting heavy-flavor jets from top-quark or W-boson decays, are labeled as tt + ≥1b; those with no single-b-or B-jet but at least one single-c-or C-jet are labeled as tt + ≥1c. Finally, events not containing any heavy-flavor jets aside from those from top-quark or W-boson decays are labeled as tt + light. This classification is used to define the background categories in the likelihood fit. A finer classification is then used to assign correction factors and estimate uncertainties: events with exactly two single-b-jets are labeled as tt + bb, those with only one single-b-jet are labeled as tt + b, and those with only one B-jet are labeled as tt + B, the rest of the tt + ≥1b events being labeled as tt + ≥3b. Events with additional b-jets entirely originating from multi-parton interactions (MPI) or b-jets from final-state radiation (FSR), i.e. originating from gluon radiation from the top-quark decay products, are considered separately in the tt + b (MPI/FSR) subcategory. Background events from tt containing extra c-jets are divided analogously.
To model the dominant tt + ≥1b background with the highest available precision, the relative contributions of the different subcategories, tt + ≥3b, tt + bb, tt + B and tt + b, in the P +P 8 sample described above are scaled to match those predicted by an NLO ttbb sample including parton showering and hadronization [67], generated with S +O L [68,69]. The sample was produced with S version 2.1.1 and the CT10 four-flavor (4F) scheme PDF set [70,71]. The renormalization scale for this sample was set to the CMMPS value, µ CMMPS = i=t,t,b,b E 1/4 T,i [67], while the factorization scale was set to H T /2 = 1 2 i=t,t,b,b E T,i . The resummation scale µ Q , which sets an upper bound for the hardness of the parton-shower emissions, was also set to H T /2. This sample, referred to as 'S 4F' in the remainder of this article, employs a description of the kinematics of the two additional b-jets with NLO precision in QCD, taking into account the b-quark mass, and is therefore the most precise MC prediction for the tt + ≥1b process available at present. Topologies that are not included in this NLO calculation but are labeled as tt + ≥1b, i.e. events in the tt + b (MPI/FSR) subcategory, are not scaled. Figure 2 shows the predicted fractions for each of the tt + ≥1b subcategories, with the P +P 8 inclusive tt sample compared to the tt + bb S 4F sample. The tt + b (MPI/FSR) subcategory is not present in the tt + bb S 4F sample and accounts for 10% of the events in the P +P 8 tt + ≥1b sample.   Figure 2: The relative predicted fractions of the tt + b, tt + bb, tt + B and tt + ≥3b subcategories before any event selection. The prediction from the inclusive P +P 8 sample is compared to the four-flavor ttbb calculation from S 4F, with its uncertainties (from a combination of the sources discussed in Section 7) shown as the shaded area. The fractions are normalized to the sum of the four contributions shown here, without considering the tt + b (MPI/FSR) subcategory as part of the total.

Other backgrounds
Samples of ttW and tt Z (ttV) events were generated with an NLO matrix element using MG5_aMC@NLO interfaced to P 8.210 with the NNPDF3.0NLO PDF and the A14 parameter set.
Samples of Wt and s-channel single-top-quark backgrounds were generated with P -B v1 at NLO accuracy using the CT10 PDF set. Overlap between the tt and Wt final states was handled using the 'diagram removal' scheme [72]. The t-channel single-top-quark events were generated using the P -B v1 event generator at NLO accuracy with the four-flavor PDF set CT10 4F. For this process, the top quarks were decayed using M S . All single-top-quark samples were interfaced to P 6.428 [73] with the Perugia 2012 set of tuned parameters [74]. The single-top-quark Wt, tand s-channel samples are normalized using the approximate NNLO theoretical cross-sections [75][76][77].
Samples of W/Z production in association with jets were generated using S 2.2.1. The matrix elements were calculated for up to two partons at NLO and four partons at leading order (LO) using C [78] and O L , and merged with the S parton shower [79] using the ME+PS@NLO prescription [80]. The NNPDF3.0NNLO PDF set was used in conjunction with dedicated parton-shower tuning. The W/Z + jet events are normalized using the NNLO cross-sections [81]. For Z + jet events, the normalization of the heavy-flavor component is corrected by a factor 1.3, extracted from dedicated control regions in data, defined by requiring two opposite-charge same-flavor leptons (e + e − or µ + µ − ) with an invariant mass, m , inside the Z-boson mass window 83-99 GeV. The diboson + jet samples were generated using S 2.1.1 as described in Ref. [82].
Higgs-boson production in association with a single top quark is rare in the SM, but is included in the analysis and treated as background. Samples of single top quarks produced in association with a W boson and with a Higgs boson, tW H, were produced with MG5_aMC@NLO interfaced to H ++ [83] with the CTEQ6L1 PDF set. Samples of single top quarks plus Higgs boson plus jets, tHqb, were produced at LO with MG5_aMC@NLO interfaced to P 8, using the CT10 4F scheme PDF set. The other Higgs-boson production modes were found to be negligible and are not considered. Four-top production (tttt) as well as ttWW events were generated with MG5_aMC@NLO with LO accuracy and interfaced with P 8. Events from t Z production were also generated with MG5_aMC@NLO with LO accuracy, but interfaced with P 6. The process t ZW was also generated with MG5_aMC@NLO interfaced with P 8, but with NLO accuracy.
In the single-lepton channel, the background from events with a jet or a photon misidentified as a lepton (hereafter referred to as fake lepton) or non-prompt lepton is estimated directly from data using a matrix method [84]. A data sample enhanced in fake and non-prompt leptons is selected by removing the lepton isolation requirements and, for electrons, loosening the identification criteria. Next, the efficiency for these 'loose' leptons to satisfy the nominal selection ('tight') criteria is measured in data, separately for real prompt leptons and for fake or non-prompt leptons. For real prompt leptons the efficiency is measured in Z-boson events, while for fake and non-prompt leptons it is estimated from events with low missing transverse momentum and low values of the reconstructed leptonic W-boson transverse mass.5 With this information, the number of fake or non-prompt leptons satisfying the tight criteria can be calculated by inverting the matrix defined by the two equations: where N l (N t ) is the number of events observed in data passing the loose (tight) lepton selection, N l r (N l f ) is the number of events with a real prompt (fake or non-prompt) lepton in the loose lepton sample, and ε r (ε f ) is the efficiency for these events to pass the tight lepton selection. By generalizing the resulting formula to extract ε f N l f , a weight is assigned to each event selected in the loose lepton data sample, providing a prediction for both the yields and the kinematic distribution shapes for the fake and nonprompt lepton background. In the three most sensitive single-lepton signal regions, SR and SR 5j 1 (see Section 5), the contribution from events with a fake or non-prompt lepton is found to be very 5 The reconstructed leptonic W-boson transverse mass is defined as 2p small, consistent with zero, and is neglected. In the dilepton channel, this background is estimated from simulation and is normalized to data in a control region with two same-sign leptons.
All background samples described in this section, apart from the ttV samples, are referred to as 'non-tt' and grouped together in the figures and tables. The contribution to the total background prediction from non-tt varies between 4% and 15% depending on the considered signal or control region, as can be seen in Appendix A.

Event categorization
After the selection, the data sample is dominated by background from tt events. In order to take advantage of the higher jet and b-jet multiplicities of the ttH signal process, events are classified into non-overlapping analysis categories based on the total number of jets, as well as the number of b-tagged jets at the four working points. Events in the boosted single-lepton category are not further categorized due to the small number of selected events in this category. Events in the dilepton (resolved single-lepton) channel are first classified according to whether the number of jets is exactly three (five) or at least four (six). These events are then further subdivided into analysis categories, depending on the number of jets tagged at the four b-tagging working points, or, equivalently, on the values of the b-tagging discriminant for the jets. The b-tagging requirements are optimized in order to obtain categories enriched in one of the relevant sample components: ttH plus tt + bb, tt + b, tt + ≥1c and tt + light. The analysis categories where ttH and tt + bb are enhanced relative to the other backgrounds are referred to as 'signal regions'; in these, multivariate techniques are used to further separate the ttH signal from the background events. The remaining analysis categories are referred to as 'control regions'; no attempt is made to separate the signal from the background in these analysis categories, but they provide stringent constraints on backgrounds and systematic uncertainties in a combined fit with the signal regions.
In the dilepton channel, three signal regions are defined, with different levels of purity for the ttH and tt + bb components. The signal region with the highest ttH signal purity, referred to as SR ≥4j 1 , is defined by requiring at least four jets of which three are b-tagged at the very tight working point and another one is b-tagged at the tight working point. The other two signal regions, SR ≥4j 2 and SR ≥4j 3 , are defined with looser b-tagging requirements. The remaining dilepton events with at least four jets are divided into two control regions, one enriched in tt + light, CR ≥4j tt+light , and one in tt + ≥1c, CR ≥4j tt+≥1c . Dilepton events with three jets are split into two control regions, CR 3j tt+light and CR 3j tt+≥1b , enriched in tt + light and tt + ≥1b, respectively. The detailed definition of the signal and control regions for the dilepton channel is presented in Figure 3.
In the single-lepton channel, five signal regions are formed from events passing the resolved selection, three requiring at least six jets, and the other two requiring exactly five jets. They are referred to as SR  Figure 4. Figure 3: Definition of the (a) three-jet and (b) four-jet signal and control regions in the dilepton channel, as a function of the b-tagging discriminant defined in Section 3. The vertical axis shows the values of the b-tagging discriminant for the first two jets, while the horizontal axis shows these values for (a) the third jet or (b) the third and fourth jets. The jets are ordered according to their value of the b-tagging discriminant in descending order. Figures 5 and 6 show, respectively, the fraction of the different background components as well as the ttH signal purity for each of the signal and control regions in the dilepton and single-lepton channels. The H → bb decay represents 89% of the ttH signal events in the signal regions of the dilepton channel, 96% in the signal regions of the resolved single-lepton channel and 86% in the boosted signal region. Figure 4: Definition of the (a) five-jet and (b) six-jet signal and control regions in the single-lepton resolved channel, as a function of the b-tagging discriminant defined in Section 3. The vertical axis shows the values of the b-tagging discriminant for the first two jets, while the horizontal axis shows these values for the third and fourth jets. The jets are ordered according to their value of the b-tagging discriminant in descending order.

Multivariate analysis techniques
In each of the signal regions, a boosted decision tree (BDT) is exploited to discriminate between the ttH signal and the backgrounds. This BDT is referred to as the 'classification BDT' in the following. The distributions of the classification BDTs in the signal regions are used as the final discriminants for the profile likelihood fit described in Section 8. In the control regions, the overall event yield is used as input to the fit, except in those enriched in tt + ≥1c in the single-lepton channel, CR 5j tt+≥1c and CR ≥6j tt+≥1c ; in ATLAS = 13 TeV s Dilepton these two control regions, the distribution of the scalar sum of the p T of the jets, H had T , is used to further control the tt + ≥1c background.
The final state of the ttH(H → bb) process is composed of many jets stemming from the Higgs-boson and top-quark decay products, as well as from additional radiation. Many combinations of these jets are possible when reconstructing the Higgs-boson and top-quark candidates to explore their properties and the signal event topology. To enhance the signal separation, three intermediate multivariate techniques are implemented prior to the classification BDT: (a) the 'reconstruction BDT' used to select the best combination of jet-parton assignments in each event and to build the Higgs-boson and top-quark candidates, (b) a likelihood discriminant (LHD) method that combines the signal and background probabilities of all possible combinations in each event, (c) a matrix element method (MEM) that exploits the full matrix element calculation to separate the signal from the background. The outputs of the three intermediate multivariate methods are used as input variables to the classification BDT in one or more of the signal regions. The properties of the Higgs-boson and top-quark candidates from the reconstruction BDT are used to define additional input variables to the classification BDT. Although the intermediate techniques  exploit similar information, they make use of this information from different perspectives and based on different assumptions, so that their combination further improves the separation power of the classification BDT. Details of the implementation of these multivariate techniques are described in Sections 6.1-6.4.

Classification BDT
The classification BDT is trained to separate the signal from the tt background on a sample that is statistically independent of the sample used for the evaluation. The toolkit for multivariate analysis (TMVA) [85] is used to train both this and the reconstruction BDT. The classification BDT is built by combining several input variables that exploit the different kinematics of signal and background events, as well as the b-tagging information. General kinematic variables, such as invariant masses and angular separations of pairs of reconstructed jets and leptons, are combined with outputs of the intermediate multivariate discriminants and the b-tagging discriminants of the selected jets. In the case of the boosted single-lepton signal region, kinematic variables are built from the properties of the large-R jets and their jet constituents. The input variables to the classification BDT in each of the signal regions are listed in Appendix B. The input variables are selected to maximize the performance of the classification BDT; however, only variables with good modeling of data by simulation are considered. The output of the reconstruction BDT, the LHD and the MEM represent the most powerful variables in the classification BDT.

Reconstruction BDT
The reconstruction BDT is employed in all dilepton and resolved single-lepton signal regions. It is trained to match reconstructed jets to the partons emitted from top-quark and Higgs-boson decays. For this purpose, W-boson, top-quark and Higgs-boson candidates are built from combinations of jets and leptons.
The b-tagging information is used to discard combinations containing jet-parton assignments inconsistent with the correct parton candidate flavor.
In the single-lepton channel, leptonically decaying W-boson candidates are assembled from the lepton four-momentum (p ) and the neutrino four-momentum (p ν ); the latter is built from the missing transverse momentum, its z component being inferred by solving the equation m 2 W = (p + p ν ) 2 , where m W represents the W-boson mass. Both solutions of this quadratic equation are used in separate combinations. If no real solutions exist, the discriminant of the quadratic equation is set to zero, giving a unique solution. The hadronically decaying W-boson and the Higgs-boson candidates are each formed from a pair of jets. The top-quark candidates are formed from one W-boson candidate and one jet. The top-quark candidate containing the hadronically (leptonically) decaying W boson is referred to as the hadronically (leptonically) decaying top-quark candidate. In the single-lepton signal regions with exactly five selected jets, more than 70% of the events do not contain both jets from the hadronically decaying W boson. Therefore, the hadronically decaying top-quark candidate is assembled from two jets, one of which is b-tagged. In the dilepton channel, no attempt to build leptonically decaying W-boson candidates is made and the top-quark candidates are formed by one lepton and one jet.
Simulated ttH events are used to iterate over all allowed combinations. The reconstruction BDT is trained to distinguish between correct and incorrect jet assignments, using invariant masses and angular separations in addition to other kinematic variables as inputs. In each event a specific combination of jetparton assignments, corresponding to the best BDT output, is chosen in order to compute kinematic and topological information of the top-quark and Higgs-boson candidates to be input to the classification BDT. However, although the best possible reconstruction performance can be obtained by including information related to the Higgs boson, such as the candidate Higgs-boson invariant mass, in the reconstruction BDT, this biases the background distributions of these Higgs-boson-related observables in the chosen jet-parton assignment towards the signal expectation, reducing their ability to separate signal from background. For this reason, two versions of the reconstruction BDT are used, one with and one without the Higgs-boson information and the resulting jet-parton assignments from one, the other or both are considered when computing input variables for the classification BDT, as detailed in Appendix B.
The Higgs boson is correctly reconstructed in 48% (32%) of the selected ttH events in the single-lepton channel SR ≥6j 1 using the reconstruction BDT with (without) information about the Higgs-boson kinematics included. For the dilepton channel, the corresponding reconstruction efficiencies are 49% (32%) in SR ≥4j 1 . The reconstruction techniques are not needed in the signal region SR boosted , as the Higgs-boson and the top-quark candidates are chosen as the selected large-R jets described in Section 3. The large-R jet selected as a Higgs-boson candidate contains two b-tagged jets stemming from the decay of a Higgs boson in 47% of the selected ttH events.

Likelihood discriminant
In the resolved single-lepton signal regions, the output from a likelihood discriminant is included as an additional input variable for the classification BDT. The LHD is computed analogously to Ref. [86] as a product of one-dimensional probability density functions, pdfs, for the signal and the background hypotheses. The pdfs are built for various invariant masses and angular distributions from reconstructed jets and leptons and from the missing transverse momentum, in a similar way to those used in the reconstruction BDT.
Two background hypotheses are considered, corresponding to the production of tt + ≥ 2 b-jets and tt + exactly one b-jet, respectively. The likelihoods for both hypotheses are averaged, weighted by their relative fractions in simulated tt + jets events. In a significant fraction of both the ttH and tt simulated events with at least six selected jets, only one jet stemming from the hadronically decaying W boson is selected. An additional hypothesis, for both the signal and the background, is considered to account for this topology. In events with exactly five selected jets, variables including the hadronically decaying top-quark candidate are built similarly to those for the reconstruction BDT.
The probabilities p sig and p bkg , for signal and background hypotheses, respectively, are obtained as the product of the pdfs for the different kinematic distributions, averaged among all possible jet-parton matching combinations. Combinations are weighted using the b-tagging information to suppress the impact from parton-jet assignments that are inconsistent with the correct parton candidates flavor. For each event, the discriminant is defined as the ratio of the probability p sig to the sum of p sig and p bkg , and added as an input variable to the classification BDT. As opposed to the reconstruction BDT method, the LHD method takes advantage of all possible combinations in the event, but it does not fully account for correlations between variables in one combination, as it uses a product of one-dimensional pdfs.

Matrix element method
A discriminant (MEM D1 ) based on the MEM is computed following a method similar to the one described in Ref. [16] and is included as another input to the classification BDT. The MEM consumes a significant amount of computation time and thus is implemented only in the most sensitive single-lepton signal region, SR ≥6j 1 . The degree to which each event is consistent with the signal and background hypotheses is expressed via signal and background likelihoods, referred to as L S and L B , respectively. These are computed using matrix element calculations at the parton level rather than using simulated MC samples as for the LHD method. The matrix element evaluation is performed with MG5_aMC@NLO at the LO accuracy. The ttH(H → bb) process is used as a signal hypothesis, while tt + bb is used as a background hypothesis. To reduce the computation time, only diagrams representing gluon-induced processes are considered. The parton distribution functions are modeled with the CT10 PDF set, interfaced via the LHAPDF package [87]. Transfer functions, that map the detector quantities to the parton level quantities, are derived from a tt sample generated with P +P 6 and validated with the nominal P +P 8 tt sample. The directions in η and φ of all visible final-state objects are assumed to be well measured, and their transfer functions are thus represented by δ-functions. The neutrino momentum is constrained by imposing transverse momentum conservation in each event, while its p z is integrated over. The integration is performed using VEGAS [88], following the implementation described in Ref. [89]. As in the reconstruction BDT, b-tagging information is used to reduce the number of jet-parton assignments considered in the calculation. The discriminating variable, MEM D1 , is defined as the difference between the logarithms of the signal and background likelihoods: MEM D1 = log 10 (L S ) − log 10 (L B ).

Systematic uncertainties
Many sources of systematic uncertainty affect the search, including those related to the luminosity, the reconstruction and identification of leptons and jets, and the theory modeling of signal and background processes. Different uncertainties may affect only the overall normalization of the samples, or also the shapes of the distributions used to categorize the events and to build the final discriminants. All the sources of experimental uncertainty considered, with the exception of the uncertainty in the luminosity, affect both the normalizations and the shapes of distributions in all the simulated samples. Uncertainties related to modeling of the signal and the backgrounds affect both the normalizations and the shapes of the distributions for the processes involved, with the exception of cross-section and normalization uncertainties that affect only the normalization of the considered sample. Nonetheless, the normalization uncertainties modify the relative fractions of the different samples leading to a shape uncertainty in the distribution of the final discriminant for the total prediction in the different analysis categories.
A single independent nuisance parameter is assigned to each source of systematic uncertainty, as described in Section 8. Some of the systematic uncertainties, in particular most of the experimental uncertainties, are decomposed into several independent sources, as specified in the following. Each individual source then has a correlated effect across all the channels, analysis categories, signal and background samples. For modeling uncertainties, especially tt modeling, additional nuisance parameters are included to split some uncertainties into several sources independently affecting different subcomponents of a particular process.

Experimental uncertainties
The uncertainty of the combined 2015+2016 integrated luminosity is 2.1%. It is derived, following a methodology similar to that detailed in Ref.
[26], from a calibration of the luminosity scale using x-y beam-separation scans performed in August 2015 and May 2016. A variation in the pileup reweighting of MC events is included to cover the uncertainty in the ratio of the predicted and measured inelastic cross-sections in the fiducial volume defined by M X > 13 GeV where M X is the mass of the hadronic system [90].
The jet energy scale and its uncertainty are derived by combining information from test-beam data, LHC collision data and simulation [33]. The uncertainties from these measurements are factorized into eight independent sources. Additional uncertainties are considered, related to jet flavor, pileup corrections, η dependence, and high-p T jets, yielding a total of 20 independent sources. Although the uncertainties are not large, totaling 1%-6% per jet (depending on the jet p T ), the effects are amplified by the large number of jets in the final state. Uncertainties in the jet energy resolution and in the efficiency to pass the JVT requirement that is meant to remove jets from pileup are also considered. The jet energy resolution is divided into two independent components.
The efficiency to correctly tag b-jets is measured in data using dileptonic tt events. The mis-tag rate for c-jets is also measured in tt events, identifying hadronic decays of W bosons including c-jets [91], while for light jets it is measured in multi-jet events using jets containing secondary vertices and tracks with impact parameters consistent with a negative lifetime [36]. The b-tagging efficiencies and mis-tag rates are first extracted for each of the four working points used in the analysis as a function of jet kinematics, and then combined into a calibration of the b-tagging discriminant distribution, with corresponding uncertainties that correctly describe correlations across multiple working points. The uncertainty associated with the b-tagging efficiency, whose size ranges between 2% and 10% depending on the working point and on the jet p T , is factorized into 30 independent sources. The size of the uncertainties associated with the mis-tag rates is 5%-20% for c-jets depending on the working point and on the jet p T , and 10%-50% for light jets depending on the working point and on the jet p T and η. These uncertainties are factorized into 15 (80) independent sources for c-jets (light jets). Jets from τ had candidates are treated as c-jets for the mis-tag rate corrections and systematic uncertainties. An additional source of systematic uncertainty is considered on the extrapolation between c-jets and these τ-jets.
Uncertainties associated with leptons arise from the trigger, reconstruction, identification, and isolation efficiencies, as well as the lepton momentum scale and resolution. These are measured in data using leptons in Z → + − , J/ψ → + − and W → eν events [28,29]. Uncertainties of these measurements account for a total of 24 independent sources, but have only a small impact on the result.
All uncertainties in energy scales or resolutions are propagated to the missing transverse momentum. Additional uncertainties in the scale and resolution of the soft term are considered, for a total of three additional sources of systematic uncertainty.

Modeling uncertainties
The predicted ttH signal cross-section uncertainty is +5.8% −9.2% (scale) ± 3.6%(PDF), the first component representing the QCD scale uncertainty and the second the PDF+α S uncertainty [15,[51][52][53][54][55]. These two components are treated as uncorrelated in the fit. The effect of QCD scale and PDF variations on the shape of the distributions considered in this analysis is found to be negligible. Uncertainties in the Higgs-boson branching fractions are also considered; these amount to 2.2% for the bb decay mode [15]. An additional uncertainty associated with the choice of parton shower and hadronization model is derived by comparing the nominal prediction from MG5_aMC@NLO+P 8 to the one from MG5_aMC@NLO interfaced to H ++.
The systematic uncertainties affecting the modeling of the tt +jets background are summarized in Table 1. An uncertainty of ±6% is assumed for the inclusive tt NNLO+NNLL production cross-section [62], including effects from varying the factorization and renormalization scales, the PDF, α S , and the topquark mass. The tt + ≥1b, tt + ≥1c and tt + light processes are affected by different types of uncertainties: tt + light has additional diagrams and profits from relatively precise measurements in data; tt + ≥1b and tt + ≥1c can have similar or different diagrams depending on the flavor scheme used for the PDF, and the mass differences between cand b-quarks contribute to additional differences between these two processes. For these reasons, all uncertainties in tt + jets background modeling, except the uncertainty in the inclusive cross-section, are assigned independent nuisance parameters for the tt + ≥1b, tt + ≥1c and tt + light processes. The normalizations of tt + ≥1b and tt + ≥1c are allowed to float freely in the fit. Systematic uncertainties in the shapes are extracted from the comparison between the nominal sample and various alternative samples. For all these uncertainties, alternative samples are reweighted in such a way that they have the same fractions of tt + ≥1c and tt + ≥1b as the nominal sample. In the case of the tt + ≥1b background, separate uncertainties are applied to the relative normalization of the tt + ≥1b subcomponents as described later. Therefore, for all the alternative samples used to derive uncertainties that are not specifically associated with these fractions, the relative contributions of the tt + ≥1b subcategories are scaled to match the predictions of S 4F, in the same way as for the nominal sample. This scaling is not applied to the tt + b (MPI/FSR) subcategory, as explained in Section 4.
Uncertainties associated with the choice of tt inclusive NLO event generator as well as the choice of parton shower and hadronization model are derived by comparing the prediction from P +P 8 with the S predictions (hence varying simultaneously the NLO event generator and the parton shower and hadronization model) and with the predictions from P interfaced with H 7 [92] (varying just the parton shower and hadronization model). The former alternative sample was generated using S Table 1: Summary of the sources of systematic uncertainty for tt + jets modeling. The systematic uncertainties listed in the second section of the table are evaluated in such a way as to have no impact on the relative fractions of tt + ≥1b, tt + ≥1c and tt + light events, as well as on the relative fractions of the tt + b, tt + bb, tt + B and tt + ≥3b subcategories, which are all kept at their nominal values. The systematic uncertainties listed in the third section of the table affect only the fractions of the various tt + ≥1b subcategories. The last column of the table indicates the tt category to which a systematic uncertainty is assigned. In the case where all three categories (tt + light, tt + ≥1c and tt + ≥1b) are involved (marked with 'all'), the last column also specifies whether the uncertainty is considered as correlated or uncorrelated across them. PDF set was used and both the renormalization and factorization scales were set to 0.5 × (m 2 T,t + m 2 T,t ). This sample is referred to as 'S 5F' in the remainder of this article, which should not be confused with the S 4F sample defined in Section 4. The comparison with the latter alternative sample is considered as an independent source of uncertainty, related to the parton shower and hadronization model choice. This sample was generated with the same settings for P as the nominal tt sample in terms of h damp , PDF and renormalization and factorization scales, but it was interfaced with H 7 version 7.0.1, with the H7-UE-MMHT set of tuned parameters for the underlying event. Additionally, the uncertainty in the modeling of initial-and final-state radiation (ISR / FSR) is assessed with two alternative P +P 8 samples [93]. One sample with the amount of radiation increased has the renormalization and factorization scales decreased by a factor of two, the h damp parameter doubled, and uses the Var3c upward variation of the A14 parameter set. A second sample with the amount of radiation decreased has the scales increased by a factor of two and uses the Var3c downward variation of the A14 set. The uncertainties described in this paragraph correspond to three independent sources for each of the tt + light, tt + ≥1c and tt + ≥1b components.
For the background from tt + ≥1c, there is little guidance from theory or experiment to determine whether the nominal approach of using charm jets produced primarily in the parton shower is more or less accurate than a prediction with tt + cc calculated at NLO in the matrix element. For this reason, an NLO prediction with tt + cc in the matrix element, including massive c-quarks and therefore using the 3F scheme for the PDFs, is produced with MG5_aMC@NLO interfaced to H ++, as described in Ref. [94]. The difference between this sample and an inclusive tt sample produced with the same event generator and a 5F scheme PDF set, in which the tt + ≥1c process originates through the parton shower only, is taken as an additional uncertainty in the tt + ≥1c prediction. This uncertainty is related to the choice between the tt + cc ME calculation and the prediction from the inclusive tt production with c-jets via parton shower and is applied as one additional independent source to the tt + ≥1c background.
For the tt + ≥1b process, the difference between the predictions from P +P 8 and S 4F is considered as one additional source of uncertainty. This uncertainty accounts for the difference between the description of the tt + ≥1b process by the NLO tt inclusive MC sample with a 5F scheme and a description at NLO of tt + bb in the ME with a 4F scheme. This uncertainty is not applied to the tt + b (MPI/FSR) subcategory since it is not included in the 4F calculation.
The uncertainties described above do not affect the relative fractions of the tt + b, tt + bb, tt + B and tt + ≥3b subcomponents as these fractions are fixed to the prediction of S 4F. The uncertainties in these fractions in S 4F are assessed separately and are divided into seven independent sources. Three of these sources are evaluated by varying the renormalization scale up and down by a factor of two, changing the functional form of the resummation scale to µ CMMPS , and adopting a global scale choice, µ Q = µ R = µ F = µ CMMPS . Additionally, two alternative PDF sets, MSTW2008NLO [95] and NNPDF2.3NLO, are considered, as well as an alternative shower recoil scheme and an alternative set of tuned parameters for the underlying event. These sources of uncertainty contribute to the uncertainty band shown in Figure 2 for the S 4F prediction. Given the large difference between the 4F prediction and the various 5F predictions for the tt + ≥3b process, which is not covered by the uncertainties described above, this sub-process is given an extra 50% normalization uncertainty.
The relative fraction of the tt + b (MPI/FSR) subcategory is not fixed in the alternative samples used to derive the systematic uncertainties related to the choice of NLO event generator, parton shower and hadronization model and to ISR/FSR. These sources already incorporate variations related to the fraction and shape of the tt + b (MPI/FSR) subcategory. In addition, a 50% normalization uncertainty is assumed for the contribution from MPI, based on studies of different underlying event sets of tuned parameters.
In total, thirteen independent sources of modeling uncertainties are assigned to the tt + ≥1b component, four to the tt + ≥1c component and three to the tt + light component in addition to the one source that corresponds to the inclusive tt production cross-section uncertainty.
An uncertainty of 40% is assumed for the W + jets cross-section, with an additional 30% normalization uncertainty used for W + heavy-flavor jets, taken as uncorrelated between events with two and more than two heavy-flavor jets. These uncertainties are based on variations of the factorization and renormalization scales and of the matching parameters in the S simulation. An uncertainty of 35% is then applied to the Z + jets normalization, uncorrelated across jet bins, to account for both the variations of the scales and matching parameters in S simulation and the uncertainty in the extraction from data of the correction factor for the heavy-flavor component.

An uncertainty of +5%
−4% is considered for each of the three single-top production mode cross-sections [75][76][77]. For the Wt and t-channel production modes, uncertainties associated with the choice of parton shower and hadronization model and with initial-and final-state radiation are evaluated according to a set of alternative samples analogous to those used for the tt process: the nominal prediction is compared with samples generated with P interfaced with H ++ and with alternative P -B v1 + P 6 samples with factorization and renormalization scale variations and appropriate variations of the Perugia 2012 set of tuned parameters. The uncertainty in the amount of interference between Wt and tt production at NLO [72] is assessed by comparing the default 'diagram removal' scheme to the alternative 'diagram subtraction' scheme.
A 50% normalization uncertainty in the diboson background is assumed, which includes uncertainties in the inclusive cross-section and additional jet production [82]. The uncertainty of the ttV NLO crosssection prediction is 15% [96], split into PDF and scale uncertainties as for ttH. An additional ttV modeling uncertainty, related to the choice of event generator, parton shower and hadronization model, is assessed by comparing the nominal sample with alternative ones generated with S . Uncertainties in ttV production are all treated as uncorrelated between tt Z and ttW. A total 50% normalization uncertainty is considered for the tttt background. The small backgrounds from t Z, ttWW, tH jb and WtH are each assigned two cross-section uncertainties, split into PDF and scale uncertainties, while tW Z is assigned one cross-section uncertainty that accounts for both the scale and PDF effects.
Finally, a 50% uncertainty is assigned to the overall estimated yield of non-prompt lepton events in the single-lepton channel, taken as uncorrelated between electron-plus-jet and muon-plus-jet events, between boosted and resolved analysis categories, and between the resolved analysis categories with exactly five jets and those with six or more jets. In the dilepton channel, the non-prompt lepton background is assigned a 25% uncertainty, correlated across lepton flavors and all analysis categories.

Results
The distributions of the discriminants from each of the analysis categories are combined in a profile likelihood fit to test for the presence of a signal, while simultaneously determining the normalization and constraining the differential distributions of the most important background components. As described in Section 6, in the signal regions, the output of the classification BDT is used as the discriminant while only the total event yield is used in the control regions, with the exception of CR 5j tt+≥1c and CR ≥6j tt+≥1c , where the H had T distribution is used. No distinction is made in the fit between signal and control regions, other than a different choice of discriminant variables. The binning of the classification BDT is optimized to maximize the analysis sensitivity while keeping the total MC statistical uncertainty in each bin to a level adjusted to avoid biases due to fluctuations in the predicted number of events.
The likelihood function, L(µ, θ), is constructed as a product of Poisson probability terms over all bins in each distribution. The Poisson probability depends on the predicted number of events in each bin, which in turn is a function of the signal-strength parameter µ = σ/σ SM and θ, where θ is the set of nuisance parameters that encode the effects of systematic uncertainties, and of the two free floating normalization factors k(tt + ≥1b) and k(tt + ≥1c) for the tt + ≥1b and tt + ≥1c backgrounds, respectively. The nuisance parameters are implemented in the likelihood function as Gaussian, log-normal or Poisson priors, with the exception of the normalization factors k(tt + ≥1b) and k(tt + ≥1c), for which no prior knowledge from theory or subsidiary measurements is assumed and hence which are only constrained by the profile likelihood fit to the data. The statistical uncertainty of the prediction, that incorporates the statistical uncertainty of the MC events and of the data-driven fake and non-prompt lepton estimate, is included in the likelihood in the form of additional nuisance parameters, one for each of the included bins. The test statistic t µ is defined as the profile likelihood ratio: t µ = −2 ln(L(µ,θ µ )/L(μ,θ)), whereμ andθ are the values of the parameters which maximize the likelihood function, andθ µ are the values of the nuisance parameters which maximize the likelihood function for a given value of µ. This test statistic is used to measure the probability that the observed data is compatible with the background-only hypothesis, and to perform statistical inferences about µ, such as upper limits using the CL s method [97][98][99]. The uncertainty of the best-fit value of the signal strength,μ, is obtained varying t µ by one unit. Figure 7 shows the observed event yield compared to the prediction in each control and signal region, both before the fit to data ('pre-fit') and after the fit to data ('post-fit'), performed in all the analysis categories in the two channels and with the signal-plus-background hypothesis. For the pre-fit prediction, the normalization factors for the tt + ≥1b and tt + ≥1c processes are set to 1, which corresponds to considering the prediction from P + P 8 for the fraction of each of these components relative to the total tt prediction. Figure 8 shows the H had T distributions in the tt + ≥1c-enriched control regions of the single-lepton channel, while Figures 9, 10 and 11 show the distributions of the classification BDTs in the dilepton and single-lepton signal regions, both before and after the fit. All these distributions are reasonably well modeled pre-fit within the assigned uncertainties. The level of agreement is improved post-fit due to the nuisance parameters being adjusted by the fit. In particular, the best-fit values of k(tt + ≥1b) and k(tt + ≥1c) are 1.24 ± 0.10 and 1.63 ± 0.23, respectively. The uncertainties in these measured normalization factors do not include the theory uncertainty of the corresponding tt + ≥1b and tt + ≥1c cross-sections. The post-fit uncertainty is also significantly reduced, as a result of the nuisance-parameter constraints and the correlations generated by the fit.
In addition to the distributions that are given as input to the fit, all the distributions of the input variables to the classification BDTs in the signal regions are checked post-fit, and no significant deviations of the predictions from data are found. Figure 12 shows the data compared to the post-fit prediction for three of these distributions, namely the Higgs-boson candidate mass distributions in the most sensitive signal regions in the dilepton channel and the single-lepton resolved channels as well as in the single-lepton boosted signal region. The probability of obtaining a discrepancy between these two signal-strength parameters equal to or larger than the one observed is 19%. Figure 13 shows the comparison between the combined µ and the two independent signal-strength parameters from the combined fit, with their uncertainties split into the statistical and systematic components. The statistical uncertainty is obtained by redoing the fit to data after fixing all the nuisance parameters to their post-fit values, with the exception of the free normalization factors in the fit: k(tt + ≥1c), k(tt + ≥1b) and µ. The total systematic uncertainty is obtained from the subtraction in quadrature of the statistical uncertainty from the total uncertainty. The statistical uncertainty contributes significantly less than the systematic component to the overall uncertainty of the measurement. When fitting the dilepton and single-lepton data separately, the observed signal strengths are 0.11 +1.36 −1.41 and 0.67 +0.71 −0.69 , respectively. These two signal-strength values are both lower than the combined measured µ due to the large correlations in the systematic uncertainties of the background prediction between the two channels.
The contributions from the different sources of uncertainty in the combined fit to µ are reported in Table 2. The total systematic uncertainty is dominated by the uncertainties in the modeling of the tt + ≥1b background, the second-largest source being the limited number of events in the simulated     Figure 7: Comparison of predicted and observed event yields in each of the control and signal regions, in the dilepton channel (a) before and (b) after the fit to the data, and in the single-lepton channel (c) before and (d) after the fit to the data. The ttH signal is shown both as a filled red area stacked on the backgrounds and separately for visibility as a dashed red line, normalized to the SM cross-section before the fit and to the fitted µ after the fit. The hatched area corresponds to the fitted uncertainty in the total prediction. The pre-fit plots do not include an uncertainty for the tt + ≥1b or tt + ≥1c normalization. samples, followed by the uncertainties in the b-tagging efficiency, the jet energy scale and resolution, and the signal process modeling. The 20 nuisance parameters describing the independent sources of systematic uncertainty with the largest contribution to the total uncertainty of the measured signal strength are reported in Figure 14, ranked by decreasing contribution. For each of these nuisance parameters, the best-fit value and the post-fit uncertainty are shown. The uncertainty coming from the comparison between the S 5F and the nominal prediction for the tt + ≥1b process, related to the choice of the NLO event generator for this background component, has the largest impact on the signal strength, followed by three uncertainties also related to the modeling of the tt + ≥1b background. Systematic uncertainties related to the ttH signal modeling, the modeling of the tt + ≥1c and tt + light backgrounds, and to experimental sources such as b-tagging, jet energy scale and resolution, also appear in Figure 14; however, their contributions are significantly smaller than the ones from the tt + ≥1b background. The Total unc.
Total unc.
(d) Figure 8: Comparison between data and prediction for the H had T distributions in the single-lepton tt + ≥1c-enriched control regions (a, c) before, and (b, d) after the combined dilepton and single-lepton fit to the data. Despite its small contribution in these control regions, the ttH signal prediction is shown stacked at the top of the background prediction, normalized to the SM cross-section before the fit and to the fitted µ after the fit. The pre-fit plots do not include an uncertainty for the tt + ≥1b or tt + ≥1c normalization. total uncertainty of the signal strength is reduced by 5% if the fit is performed excluding the systematic uncertainties not shown in this figure.       Figure 9: Comparison between data and prediction for the BDT discriminant in the dilepton signal regions (a, c, e) before, and (b, d, f) after the combined dilepton and single-lepton fit to the data. The ttH signal yield (solid red) is normalized to the SM cross-section before the fit and to the fitted µ after the fit. The dashed line shows the ttH signal distribution normalized to the total background prediction. The pre-fit plots do not include an uncertainty for the tt + ≥1b or tt + ≥1c normalization.       Figure 10: Comparison between data and prediction for the BDT discriminant in the single-lepton channel five-jet and boosted signal regions (a, c, e) before, and (b, d, f) after the combined dilepton and single-lepton fit to the data. The ttH signal yield (solid red) is normalized to the SM cross-section before the fit and to the fitted µ after the fit. The dashed line shows the ttH signal distribution normalized to the total background prediction. The pre-fit plots do not include an uncertainty for the tt + ≥1b or tt + ≥1c normalization.       Figure 11: Comparison between data and prediction for the BDT discriminant in the single-lepton channel six-jet signal regions (a, c, e) before, and (b, d, f) after the combined dilepton and single-lepton fit to the data. The ttH signal yield (solid red) is normalized to the SM cross-section before the fit and to the fitted µ after the fit. The dashed line shows the ttH signal distribution normalized to the total background prediction. The pre-fit plots do not include an uncertainty for the tt + ≥1b or tt + ≥1c normalization.

SR Pre-Fit
(c) Figure 12: Comparison between data and prediction for the Higgs-boson candidate mass from the reconstruction BDT trained without variables involving the Higgs-boson candidate (a) in the dilepton SR ≥4j 1 and (b) in the single-lepton SR ≥6j 1 , and (c) for the boosted Higgs-boson candidate in SR boosted , after the combined dilepton and singlelepton fit to the data. The ttH signal yield (solid red) is normalized to the fitted µ after the fit. The dashed red line shows the ttH signal distribution normalized to the total background yield. The dashed black line shows the pre-fit total background prediction.
The theoretical predictions for the tt + ≥1b process suffer from large uncertainties as reflected in the size of the difference between alternative simulated samples used to model this background. The corresponding systematic uncertainties are therefore large and are a crucial limiting factor for this search. The choice of nuisance parameters for systematic uncertainties related to the tt + ≥1b background is studied carefully to ensure sufficient flexibility in the fit to correct for possible mis-modeling of this background and avoid any bias in the measured signal strength. In total, 13 independent nuisance parameters are assigned to tt + ≥1b background modeling uncertainties. The capability of the fit to correct for mis-modeling effects, beyond the ones present in the distributions used in the fit, is confirmed by comparing the predictions of all input variables of the classification BDT obtained post-fit to data. As mentioned before, no significant deviations of the predictions from data are found and the agreement is improved post-fit. Alternative approaches to model the tt + ≥1b background, to define the associated uncertainties and to correlate them are also tested, and the corresponding results are found to be compatible with the nominal result.
To further validate the robustness of the fit, a pseudo-data set was built from simulated events by replacing the nominal tt background by an alternative sample that is not used in the definition of any uncertainty. This alternative sample was generated with P +P 6 and is similar to the sample used for the ttH(H → bb) analysis [16] in Run 1 of the LHC. The fit to this pseudo-data sample did not reveal any bias in the signal extraction. Figure 14 shows that some nuisance parameters are shifted in the fit from their nominal values. To understand the origin of these shifts, the corresponding nuisance parameters are switched to be uncorrelated between analysis categories and samples and the fit is repeated. These shifts are found to correct mainly the predictions of the tt background to the observed data in various regions. Similar shifts are observed when a background-only fit is performed after removing the bins with the most significant signal contributions. Moreover, the variations induced in the signal strength by these shifts are quantified by fixing the corresponding nuisance parameters to their pre-fit values, repeating the fit, and comparing the obtained µ-value with the one from the nominal fit. These variations were found to be smaller than the uncertainty in the signal strength. Independent signal-strength values extracted from different sets of analysis categories and from the two channels are also found to be compatible. Figure 14 also shows that the uncertainties corresponding to some nuisance parameters are reduced by the fit. When performing the profile likelihood fit, nuisance parameters associated with uncertainties affecting the discriminant distributions by variations that would result in large deviations from data are significantly constrained. The capability of the fit to constrain systematic uncertainties is validated on the pseudo-data sample described above, and on the pseudo-data sample produced from the nominal predictions, the Asimov dataset [97].
An excess of events over the expected SM background is found with an observed (expected) significance of 1.4 (1.6) standard deviations. A signal strength larger than 2.0 is excluded at the 95% confidence level, as shown in Figure 15. The expected significance and exclusion limits are calculated using the background estimate after the fit to the data. Figure 16 shows the event yield in data compared to the post-fit prediction for all events entering the analysis selection, grouped and ordered by the signal-to-background ratio of the corresponding final-discriminant bins. The predictions are shown for both the fit with the backgroundonly hypothesis and with the signal-plus-background hypothesis, where the signal is scaled to either the measured µ or the value of the upper limit on µ. tot ( stat syst ) Figure 13: Summary of the signal-strength measurements in the individual channels and for the combination. All the numbers are obtained from a simultaneous fit in the two channels, but the measurements in the two channels separately are obtained keeping the signal strengths uncorrelated, while all the nuisance parameters are kept correlated across channels.
Nuis. Param. Pull ATLAS -1 = 13 TeV, 36.1 fb s Figure 14: Ranking of the nuisance parameters included in the fit according to their impact on the measured signal strength µ. Only the 20 most highly ranked parameters are shown. Nuisance parameters corresponding to MC statistical uncertainties are not included here. The empty blue rectangles correspond to the pre-fit impact on µ and the filled blue ones to the post-fit impact on µ, both referring to the upper scale. The impact of each nuisance parameter, ∆µ, is computed by comparing the nominal best-fit value of µ with the result of the fit when fixing the considered nuisance parameter to its best-fit value,θ, shifted by its pre-fit (post-fit) uncertainties ±∆θ (±∆θ). The black points show the pulls of the nuisance parameters relative to their nominal values, θ 0 . These pulls and their relative post-fit errors, ∆θ/∆θ, refer to the scale on the bottom axis. The parameter k(tt + ≥1b) refers to the floating normalization of the tt + ≥1b background, for which the pre-fit impact on µ is not defined, and for which both θ 0 and ∆θ are set to 1. For experimental uncertainties that are decomposed into several independent sources, NP I and NP II correspond to the first and second nuisance parameters, ordered by their impact on µ, respectively. Table 2: Breakdown of the contributions to the uncertainties in µ. The line 'background-model statistical uncertainty' refers to the statistical uncertainties in the MC events and in the data-driven determination of the non-prompt and fake lepton background component in the single-lepton channel. The contribution of the different sources of uncertainty is evaluated after the fit described in Section 8. The total statistical uncertainty is evaluated, as described in the text, by fixing all the nuisance parameters in the fit except for the free-floating normalization factors for the tt + ≥1b and tt + ≥1c background components. The contribution from the uncertainty in the normalization of both tt + ≥1b and tt + ≥1c is then included in the quoted total statistical uncertainty rather than in the systematic uncertainty component. The statistical uncertainty evaluated after also fixing the normalization of tt + ≥1b and tt + ≥1c is then indicated as 'intrinsic statistical uncertainty'. The other quoted numbers are obtained by repeating the fit after having fixed a certain set of nuisance parameters corresponding to a group of systematic uncertainty sources, and subtracting in quadrature the resulting total uncertainty of µ from the uncertainty from the full fit. The same procedure is followed for quoting the individual effects of the tt + ≥1b and the tt + ≥1c normalization. The total uncertainty is different from the sum in quadrature of the different components due to correlations between nuisance parameters built by the fit.  (   Figure 15: Summary of the 95% confidence level (CL) upper limits on σ(ttH) relative to the SM prediction in the individual channels and for the combination. The observed limits are shown, together with the expected limits both in the background-only hypothesis (dotted black lines) and in the SM hypothesis (dotted red lines). In the case of the expected limits in the background-only hypothesis, one-and two-standard-deviation uncertainty bands are also shown. The limits for the two individual channels are derived consistently with Figure 13, both extracted from the profile likelihood including the data in both channels, but with independent signal strengths in the two channels.  The signal is then shown normalized to the best-fit value and to the value excluded at the 95% CL, in both cases summed to the background prediction from the fit. The lower frame reports for each bin the pull (residual divided by its uncertainty) of the data relative to the background prediction from the fit. These data pulls are compared to the pulls of the signal-plus-background prediction from the fit, assuming a signal strength equal to the best-fit value (solid red line) and equal to the exclusion limit (dashed orange line). The background and its pull are also shown after the fit to data assuming zero signal contribution (dashed black line, obscured by solid line in the upper frame). The first bin includes the underflow. A value higher than 2.0 is excluded at the 95% confidence level, compared to an expected exclusion limit of 1.2 in the absence of signal. The measurement uncertainty is presently dominated by systematic uncertainties, and more specifically by the uncertainty in the theoretical knowledge of the tt + ≥1b production process. An improved understanding of this background will be important for future efforts to observe the ttH(H → bb) process.

Appendix A Yield tables
The predicted event yields in each of the analysis categories, broken down into the different signal and background contributions and compared to the observed yields in data, are reported in Tables 3, 4 and 5. Both the pre-fit and post-fit predictions are shown, where post-fit refers to the combined fit to the dilepton and single-lepton channels with the signal-plus-background hypothesis, reported in Section 8. The total uncertainties of each of the signal and background components, and of the total prediction are also reported. Table 3: Event yields in the dilepton channel (top) control regions and (bottom) signal regions. Post-fit yields are after the combined fit in all channels to data. The uncertainties are the sum in quadrature of statistical and systematic uncertainties in the yields. In the post-fit case, these uncertainties are computed taking into account correlations among nuisance parameters and among the normalization of different processes. The uncertainty in the tt + ≥1b and tt + ≥1c normalization is not defined pre-fit and therefore only included in the post-fit uncertainties; the reported prefit uncertainties on the tt + ≥1b and tt + ≥1c components arise only from acceptance effects. For the ttH signal, the pre-fit yield values correspond to the theoretical prediction and corresponding uncertainties, while the post-fit yield and uncertainties correspond to those in the signal-strength measurement.

B Input variables to the classification BDTs
In this appendix, the full list of variables used as inputs to the classification BDT, described in Section 6, in each of the signal regions is reported. Variables are listed separately in Table 6 for the dilepton channel, in Table 7 for the resolved single-lepton channel and in Table 8 for the boosted category. Variables are grouped according to the type of information that is exploited. The variables from the reconstruction BDT exploit the chosen jet-parton assignments described in Section 6.2. The b-tagging discriminant assigned to each jet is defined in Section 3. The most powerful variables in the classification BDT are the reconstruction BDT output, the LHD (Section 6.3) and the MEM D1 (Section 6.4). The large-R jets used to build the Higgs-boson and top-quark candidates in the boosted category are defined in Section 3.
Some kinematic and topological variables are built considering only b-tagged-jets in the event. The b-tagging requirements for these jets are optimized separately for each variable in each region to improve the classification BDT performance. In the resolved single-lepton channel, b-tagged-jets are defined as the four jets with the largest value of the b-tagging discriminant. If two jets have the same b-tagging discriminant value, they are ordered by decreasing jet p T value. In the dilepton channel, the b-tagging requirements depend on the signal region: in SR  Table 6: Variables used in the classification BDTs in the dilepton signal regions. For variables from the reconstruction BDT, those with a * are from the BDT using Higgs-boson information, those with no * are from the BDT without Higgs-boson information while for those with a ** both versions are used. These two versions of the reconstruction BDT are described in Section 6.2.

Variable
Definition SR  Sum of b-tagging discriminants of jets from best Higgs candidate from the reconstruction BDT -- Table 7: Input variables to the classification BDTs in the single-lepton signal regions. For variables from the reconstruction BDT, those with a * are from the BDT using Higgs-boson information, those with no * are from the BDT without Higgs-boson information. These two versions of the reconstruction BDT are described in Section 6.2. The MEM D1 variable is only used in SR ≥6j 1 , while variables based on the b-tagging discriminant are not used in this region.

Variable
Definition SR     12 Top-quark candidate first splitting scale [101] Variables from b-tagging w b-tag Sum of b-tagging discriminants of all b-jets w add b-tag /w b-tag Ratio of sum of b-tagging discriminants of additional b-jets to all b-jets

The ATLAS Collaboration
States of America