Search for the standard model Higgs boson produced in association with top quarks and decaying into a bb̄ pair in pp collisions at √s = 13 TeV with the ATLAS detector

A search for the standard model Higgs boson produced in association with a top-quark pair, tt̄H, is presented. The analysis uses 36.1 fb⁻¹ of pp collision data at √s = 13 TeV collected with the ATLAS detector at the Large Hadron Collider in 2015 and 2016. The search targets the H → bb̄ decay mode. The selected events contain either one or two electrons or muons from the top-quark decays, and are then categorized according to the number of jets and how likely these are to contain b-hadrons. Multivariate techniques are used to discriminate between signal and background events, the latter being dominated by tt̄ + jets production. For a Higgs boson mass of 125 GeV, the ratio of the measured tt̄H signal cross-section to the standard model expectation is found to be μ = 0.84^{+0.64}_{−0.61}. A value of μ greater than 2.0 is excluded at 95% confidence level (C.L.) while the expected upper limit is μ < 1.2 in the absence of a tt̄H signal.
DOI: 10.1103/PhysRevD.97.072016

I. INTRODUCTION
After the discovery of the Higgs boson [1-3] in 2012 by the ATLAS [4] and CMS [5] Collaborations, attention has turned to more detailed measurements of its properties and couplings as a means of testing the predictions of the standard model (SM) [6-8]. In particular, the coupling to the top quark, the heaviest particle in the SM, could be very sensitive to effects of physics beyond the SM (BSM) [9]. Assuming that no BSM particle couples to the Higgs boson, the ATLAS and CMS experiments measured a value of the top-quark's Yukawa coupling equal to 0.87 ± 0.15 times the SM prediction by combining [10] their respective Higgs-boson measurements from the Run 1 dataset collected at center-of-mass energies of 7 and 8 TeV at the Large Hadron Collider (LHC). This measurement relies largely on the gluon-gluon fusion production mode and on the decay mode to photons, which both depend on loop contributions with a top quark. If no assumption is made about the particle content of such loop contributions, then the top-quark coupling is only determined through tree-level processes, and a value of 1.4 ± 0.2 times the SM prediction is obtained.
Higgs-boson production in association with a pair of top quarks, tt̄H, is the most favorable production mode for a direct measurement of the top-quark's Yukawa coupling [11-14]. Although this production mode only contributes around 1% of the total Higgs-boson production cross section [15], the top quarks in the final state offer a distinctive signature and allow many Higgs-boson decay modes to be accessed. Of these, the decay to two b-quarks is predicted to have a branching fraction of about 58% [15], the largest Higgs-boson decay mode. This decay mode is sensitive to the b-quark's Yukawa coupling, the second largest in the SM. In order to select events at the trigger level and reduce the backgrounds, the analysis targets events in which one or both top quarks decay semileptonically, producing an electron or a muon. The main experimental challenges for this channel are the low combined efficiency to reconstruct and identify all final-state particles, the combinatorial ambiguity from the many jets containing b-hadrons in the final state, which makes it difficult to reconstruct the Higgs boson, and the large backgrounds from the production of tt̄ + jets, especially when the associated jets stem from b- or c-quarks. Some representative Feynman diagrams for the tt̄H signal are shown in Fig. 1, together with the dominant tt̄ + bb̄ background.
The ATLAS Collaboration searched for tt̄H production with Higgs-boson decays to bb̄ at √s = 8 TeV, using tt̄ decays with at least one lepton [16] or no leptons [17]. A combined signal strength μ = σ/σ_SM of 1.4 ± 1.0 was measured. The CMS Collaboration searched for the same process at √s = 7 TeV and √s = 8 TeV using tt̄ decays with a single-lepton or dilepton in the final state, obtaining a signal strength of 0.7 ± 1.9 [18]. These results were combined with each other, and with results for Higgs boson decay to vector bosons, to τ-leptons or to photons [18-20], resulting in an observed (expected) significance of 4.4 (2.0) standard deviations for tt̄H production [10]. The measured signal strength is 2.3^{+0.7}_{−0.6}.
In this article, a search for tt̄H production with 36.1 fb⁻¹ of pp collision data at √s = 13 TeV is presented. The analysis targets Higgs-boson decays to b-quarks, but all the decay modes are considered and may contribute to the signal. Events with either one or two leptons are taken into account, and exclusive analysis categories are defined according to the number of leptons, the number of jets, and the value of a b-tagging discriminant which provides a measure of how likely a jet is to contain a b-hadron. In the single-lepton channel, a specific category, referred to as 'boosted' in the following, is designed to select events containing a Higgs boson and with at least one of the two top quarks produced at high transverse momentum. In the analysis categories with the largest signal contributions, multivariate discriminants are used to classify events as more or less signal-like. The signal-rich categories are analyzed together with the signal-depleted ones in a combined profile likelihood fit that simultaneously determines the event yields for the signal and for the most important background components, while constraining the overall background model within the assigned systematic uncertainties.
The combination of the results presented in this article with the results from other analyses targeting tt̄H production with different final states is reported in Ref. [21]. The article is organized as follows. The ATLAS detector is described in Sec. II. Section III summarizes the selection criteria applied to events and physics objects. The signal and background modeling are presented in Sec. IV. Section V describes the event categorization while Sec. VI presents the multivariate analysis techniques. The systematic uncertainties are summarized in Sec. VII. Section VIII presents the results and Sec. IX gives the conclusions.

II. ATLAS DETECTOR
The ATLAS detector [22] at the LHC covers nearly the entire solid angle² around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid magnet producing a 2 T axial magnetic field, electromagnetic and hadronic calorimeters, and an external muon spectrometer (MS) incorporating three large toroid magnet assemblies. The inner detector (ID) consists of a high-granularity silicon pixel detector and a silicon microstrip tracker, together providing precision tracking in the pseudorapidity range |η| < 2.5, complemented by a straw-tube transition radiation tracker providing tracking and electron identification information for |η| < 2.0. A new innermost silicon pixel layer, the insertable B-layer [23] (IBL), was added to the detector between Run 1 and Run 2. The IBL improves the ability to identify displaced vertices and thereby significantly improves the b-tagging performance [24]. The electromagnetic sampling calorimeter uses lead or copper as the absorber material and liquid argon (LAr) as the active medium, and is divided into barrel (|η| < 1.475), endcap (1.375 < |η| < 3.2) and forward (3.1 < |η| < 4.9) regions. Hadron calorimetry is also based on the sampling technique and covers |η| < 4.9, with either scintillator tiles or LAr as the active medium and with steel, copper or tungsten as the absorber material. The muon spectrometer measures the deflection of muons with |η| < 2.7 using multiple layers of high-precision tracking chambers located in a toroidal field. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. The muon spectrometer is also instrumented with separate trigger chambers covering |η| < 2.4. A two-level trigger system [25], using custom hardware followed by a software-based level, is used to reduce the trigger rate to an average of around 1 kHz for offline storage.
² ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the center of the detector and the z-axis coinciding with the axis of the beam pipe. The x-axis points from the IP to the center of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r, ϕ) are used in the transverse plane, ϕ being the azimuthal angle around the beam pipe. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Unless stated otherwise, angular distance is measured in units of ΔR ≡ √((Δη)² + (Δϕ)²).
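The coordinate conventions used throughout the article (η = −ln tan(θ/2) and ΔR = √((Δη)² + (Δϕ)²)) can be illustrated with a short Python sketch. The function names are hypothetical, and the wrapping of Δϕ into (−π, π] is a common convention assumed here rather than something stated in the text.

```python
import math

def pseudorapidity(theta):
    """eta = -ln tan(theta/2), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi] (assumed convention)."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance Delta R = sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))
```

A particle emitted perpendicular to the beam (θ = π/2) has η = 0, and two objects at ϕ = 3.0 and ϕ = −3.0 are close in azimuth once the wrap-around is taken into account.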

III. EVENT SELECTION
Events are selected from pp collisions at √s = 13 TeV recorded by the ATLAS detector in 2015 and 2016. Only events for which all relevant subsystems were operational are considered. Events are required to have at least one vertex with two or more tracks with transverse momentum p_T > 0.4 GeV. The vertex with the largest sum of the squares of the transverse momenta of associated tracks is taken as the primary vertex. The event reconstruction is affected by multiple pp collisions in a single bunch crossing and by collisions in neighboring bunch crossings, referred to as "pileup." The number of interactions per bunch crossing in this data set ranges from about 8 to 45. The data set corresponds to an integrated luminosity of 3.2 ± 0.1 fb⁻¹ recorded in 2015 and 32.9 ± 0.7 fb⁻¹ recorded in 2016, for a total of 36.1 ± 0.8 fb⁻¹ [26]. Events in both the single-lepton and dilepton channels were recorded using single-lepton triggers. Events are required to fire triggers with either low lepton p_T thresholds and a lepton isolation requirement, or with higher thresholds but with a looser identification criterion and without any isolation requirement. The lowest p_T threshold used for muons is 20 (26) GeV in 2015 (2016), while for electrons the threshold is 24 (26) GeV.
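The primary-vertex choice described above can be sketched as follows. This is a minimal illustration, not the ATLAS reconstruction code: the input format (one list of track p_T values per vertex) and the function name are hypothetical, and it is assumed that only tracks above the 0.4 GeV threshold enter the sum of squared transverse momenta.

```python
def select_primary_vertex(vertices, min_track_pt=0.4, min_tracks=2):
    """Return the index of the primary vertex, or None if no vertex qualifies.

    vertices: list of lists, each holding the track pT values (GeV) of one vertex.
    """
    best_index, best_sum = None, -1.0
    for i, track_pts in enumerate(vertices):
        # Assumption: only tracks above threshold are counted and summed.
        good = [pt for pt in track_pts if pt > min_track_pt]
        if len(good) < min_tracks:
            continue
        sum_pt2 = sum(pt * pt for pt in good)
        if sum_pt2 > best_sum:
            best_index, best_sum = i, sum_pt2
    return best_index
```

An event for which the function returns None would fail the vertex requirement.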
Electrons are reconstructed from energy deposits (clusters) in the electromagnetic calorimeter matched to tracks reconstructed in the ID [27,28] and are required to have p_T > 10 GeV and |η| < 2.47. Candidates in the calorimeter barrel-endcap transition region (1.37 < |η| < 1.52) are excluded. Electrons must satisfy the loose identification criterion described in Ref. [28], based on a likelihood discriminant combining observables related to the shower shape in the calorimeter and to the track matching the electromagnetic cluster.
Muons are reconstructed from either track segments or full tracks in the MS which are matched to tracks in the ID [29]. Tracks are then re-fitted using information from both detector systems. Muons are required to have p_T > 10 GeV and |η| < 2.5.
To reduce the contribution of leptons from hadronic decays (non-prompt leptons), both electrons and muons must satisfy isolation criteria based on information from both the tracker and the calorimeter. The loose lepton isolation working point [28,29] is used. Finally, lepton tracks must match the primary vertex of the event: the longitudinal impact parameter IP_z is required to satisfy |IP_z| < 0.5 mm, while the transverse impact parameter significance, |IP_rϕ|/σ_{IP_rϕ}, must be less than 5 for electrons and 3 for muons.
Jets are reconstructed from three-dimensional topological energy clusters [30] in the calorimeter using the anti-k_t jet algorithm [31] implemented in the FASTJET package [32] with a radius parameter of 0.4. Each topological cluster is calibrated to the electromagnetic scale response prior to jet reconstruction. The reconstructed jets are then calibrated to the jet energy scale derived from simulation and in situ corrections based on 13 TeV data [33]. After energy calibration, jets are required to have p_T > 25 GeV and |η| < 2.5. Quality criteria are imposed to identify jets arising from noncollision sources or detector noise, and any event containing such a jet is removed [34]. Finally, to reduce the effect of pileup, an additional requirement is made using an algorithm that matches jets with p_T < 60 GeV and |η| < 2.4 to tracks with p_T > 0.4 GeV to identify jets consistent with the primary vertex. This algorithm is known as jet vertex tagger [35], referred to as JVT in the remainder of this article.
Jets are tagged as containing b-hadrons through a multivariate b-tagging algorithm (MV2c10) that combines information from an impact-parameter-based algorithm, from the explicit reconstruction of an inclusive secondary vertex and from a multi-vertex fitter that attempts to reconstruct the b- to c-hadron decay chain [36,37]. This algorithm is optimized to efficiently select jets containing b-hadrons (b-jets) and separate them from jets containing c-hadrons (c-jets), jets containing hadronically decaying τ-leptons (τ-jets) and from other jets (light jets). Four working points are defined by different MV2c10 discriminant output thresholds and are referred to in the following as loose, medium, tight and very tight. The efficiencies for b-jets with p_T > 20 GeV in simulated tt̄ events to pass the different working points are 85%, 77%, 70% and 60%, respectively, corresponding to rejection factors³ of c-jets in the range 3-35 and of light jets in the range 30-1500. A b-tagging discriminant value is assigned to each jet according to the tightest working point it satisfies, ranging from 1 for a jet that does not satisfy any of the b-tagging criteria defined by the considered working points up to 5 for jets satisfying the very tight criteria. This b-tagging discriminant is used to categorize selected events as discussed in Sec. V and as an input to multivariate analysis techniques described in Sec. VI.
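The mapping from the tightest satisfied working point to the integer b-tagging discriminant can be sketched as below. The MV2c10 thresholds are not given in the text, so the values in PLACEHOLDER_THRESHOLDS are purely hypothetical and serve only to make the example runnable. The intermediate values 2, 3 and 4 for the loose, medium and tight working points are inferred from the stated range of 1 to 5.

```python
WORKING_POINTS = ("loose", "medium", "tight", "very tight")

# Hypothetical, monotonically increasing thresholds; NOT the calibrated ATLAS values.
PLACEHOLDER_THRESHOLDS = {"loose": 0.1, "medium": 0.6, "tight": 0.8, "very tight": 0.9}

def btag_discriminant(score, thresholds=PLACEHOLDER_THRESHOLDS):
    """Return 1 (untagged) to 5 (very tight) for a given tagger output score."""
    value = 1
    # Thresholds increase with tightness, so the last one passed is the tightest.
    for level, name in enumerate(WORKING_POINTS, start=2):
        if score >= thresholds[name]:  # assumption: a jet passes if score >= threshold
            value = level
    return value
```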
Hadronically decaying τ leptons (τ_had) are distinguished from jets using the track multiplicity and a multivariate discriminant based on the track collimation, further jet
³ The rejection factor is defined as the inverse of the efficiency to pass a given b-tagging working point.
Events are required to have at least one reconstructed lepton with p_T > 27 GeV matching a lepton with the same flavor reconstructed by the trigger algorithm within ΔR < 0.1. Events in the dilepton channel must have exactly two leptons with opposite electric charge. The subleading lepton p_T must be above 15 GeV in the ee channel or above 10 GeV in the eμ and μμ channels. In the ee and μμ channels, the dilepton invariant mass must be above 15 GeV and outside of the Z-boson mass window of 83-99 GeV. To maintain orthogonality with other tt̄H search channels [21], dilepton events are vetoed if they contain one or more τ_had candidates. Events enter the single-lepton channel if they contain exactly one lepton with p_T > 27 GeV and no other selected leptons with p_T > 10 GeV. In the single-lepton channel, events are removed if they contain two or more τ_had candidates.
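As an illustration of the dilepton lepton requirements, the following sketch encodes the charge, p_T, invariant-mass and τ_had conditions, using the 83-99 GeV Z-boson mass window quoted in Sec. IV. The input format and function name are hypothetical, trigger matching is omitted, and the treatment of values exactly on a boundary is an assumption.

```python
def passes_dilepton_selection(leptons, m_ll, n_tau_had):
    """leptons: list of (flavor, charge, pt) with flavor 'e' or 'mu' and pt in GeV.

    m_ll: dilepton invariant mass in GeV; n_tau_had: number of tau_had candidates.
    """
    if len(leptons) != 2 or n_tau_had >= 1:
        return False
    lead, sub = sorted(leptons, key=lambda lep: lep[2], reverse=True)
    if lead[1] * sub[1] >= 0:          # require opposite electric charge
        return False
    if lead[2] <= 27.0:                # leading-lepton threshold
        return False
    same_flavor = lead[0] == sub[0]
    is_ee = same_flavor and lead[0] == "e"
    if sub[2] <= (15.0 if is_ee else 10.0):
        return False
    # Low-mass and Z-window vetoes apply only to the ee and mumu channels.
    if same_flavor and (m_ll <= 15.0 or 83.0 <= m_ll <= 99.0):
        return False
    return True
```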
To improve the purity in events passing the above selection, selected leptons are further required to satisfy additional identification and isolation criteria; otherwise the corresponding events are removed. For electrons, the tight identification criterion based on a likelihood discriminant [28] is used, while for muons the medium identification criterion [29] is used. Both the electrons and muons are required to satisfy the Gradient isolation criteria [28,29], which become more stringent as the p_T of the leptons considered drops.
Finally, events in the dilepton channel must have at least three jets, of which at least two must be b-tagged at the medium working point. Single-lepton events containing at least one boosted Higgs-boson candidate, at least one boosted top-quark candidate and at least one additional jet b-tagged at the loose working point enter the boosted category. Events that do not enter the boosted category and have at least five jets, with at least two of them b-tagged at the very tight working point or three of them b-tagged at the medium working point, are classified as "resolved" single-lepton events. The fraction of simulated tt̄H(H → bb̄) events passing the dilepton event selection is 2.5%. The corresponding fractions are 8.7% for the resolved single-lepton channel and 0.1% for the boosted category.
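The jet and b-tagging requirements of the dilepton and resolved single-lepton channels can be written compactly in terms of the per-jet b-tagging discriminant of Sec. III (1 = untagged up to 5 = very tight). In the sketch below the function names are hypothetical, and the encoding medium = 3 is inferred from the ordering of the four working points. The boosted category is not modeled.

```python
def count_tagged(jet_btag_values, min_value):
    """Number of jets whose discriminant value is at least min_value."""
    return sum(1 for v in jet_btag_values if v >= min_value)

def passes_dilepton_jets(jet_btag_values):
    """At least three jets, at least two tagged at the medium working point."""
    return len(jet_btag_values) >= 3 and count_tagged(jet_btag_values, 3) >= 2

def passes_resolved_single_lepton_jets(jet_btag_values):
    """At least five jets, with >= 2 very tight tags or >= 3 medium tags."""
    n_very_tight = count_tagged(jet_btag_values, 5)
    n_medium = count_tagged(jet_btag_values, 3)
    return len(jet_btag_values) >= 5 and (n_very_tight >= 2 or n_medium >= 3)
```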

IV. SIGNAL AND BACKGROUND MODELING
This section describes the simulation and data-driven techniques used to model the tt̄H signal and the background processes, to train the multivariate discriminants and to define the templates for the signal extraction fit. In this analysis, most Monte Carlo (MC) samples were produced using the full ATLAS detector simulation [42] based on GEANT4 [43]. A faster simulation, where the full GEANT4 simulation of the calorimeter response is replaced by a detailed parameterization of the shower shapes [44], was adopted for some of the samples used to estimate modeling systematic uncertainties. To simulate the effects of pileup, additional interactions were generated using PYTHIA 8.186 [45] and overlaid onto the simulated hard-scatter event. Simulated events are reweighted to match the pileup conditions observed in the data. All simulated events are processed through the same reconstruction algorithms and analysis chain as the data. In the simulation, the top-quark mass is assumed to be m_t = 172.5 GeV. Decays of b- and c-hadrons were performed by EVTGEN v1.2.0 [46], except in samples simulated by the SHERPA event generator.
⁴ The rapidity is defined as y = (1/2) ln[(E + p_z)/(E − p_z)], where E is the energy and p_z is the longitudinal component of the momentum along the beam pipe.

A. Signal modeling
The tt̄H signal process was modeled using MADGRAPH5_aMC@NLO [47] (referred to in the following as MG5_aMC@NLO) version 2.3.2 for the matrix element (ME) calculation at next-to-leading-order (NLO) accuracy in quantum chromodynamics (QCD), interfaced to the PYTHIA 8.210 parton shower (PS) and hadronization model using the A14 set of tuned parameters [48]. The NNPDF3.0NLO parton distribution function (PDF) set [49] was used, and the factorization and renormalization scales were set to μ_F = μ_R = H_T/2, with H_T defined as the scalar sum of the transverse masses √(p_T² + m²) of all final-state particles. The top quarks were decayed using MADSPIN [50], preserving all spin correlations. The Higgs-boson mass was set to 125 GeV and all decay modes were considered. The tt̄H cross section of 507^{+35}_{−50} fb was computed [15,51-55] at NLO accuracy in QCD and includes NLO electroweak corrections. The branching fractions were calculated using HDECAY [15,56].
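The scale choice μ_F = μ_R = H_T/2 amounts to a simple sum over final-state particles. The sketch below evaluates it for a list of (p_T, mass) pairs; the function names and input format are hypothetical and are not taken from MG5_aMC@NLO.

```python
import math

def transverse_mass(pt, mass):
    """Transverse mass sqrt(pT^2 + m^2), all quantities in GeV."""
    return math.sqrt(pt * pt + mass * mass)

def scale_ht_over_2(particles):
    """H_T / 2 for an iterable of (pt, mass) pairs in GeV."""
    return 0.5 * sum(transverse_mass(pt, m) for pt, m in particles)
```

For example, a particle with p_T = 3 GeV and m = 4 GeV contributes a transverse mass of 5 GeV to H_T.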

B. tt̄ + jets background
The nominal sample used to model the tt̄ background was generated using the POWHEG-BOX v2 NLO event generator [57-60], referred to as POWHEG in the remainder of this article, with the NNPDF3.0NLO PDF set. The h_damp parameter, which controls the transverse momentum of the first gluon emission beyond the Born configuration, was set to 1.5 times the top-quark mass [61]. The parton shower and the hadronization were modeled by PYTHIA 8.210 with the A14 set of tuned parameters. The renormalization and factorization scales were set to the transverse mass of the top quark, defined as m_{T,t} = √(m_t² + p_{T,t}²), where p_{T,t} is the transverse momentum of the top quark in the tt̄ center-of-mass reference frame. The sample is normalized using the predicted cross-section of 832^{+46}_{−51} pb, calculated with the Top++2.0 program [62] at next-to-next-to-leading order (NNLO) in perturbative QCD including resummation of next-to-next-to-leading logarithmic (NNLL) soft gluon terms [63-66]. Alternative tt̄ samples used to derive systematic uncertainties are described in Sec. VII.
The tt̄ + jets background is categorized according to the flavor of additional jets in the event, using the same procedure as described in Ref. [16]. Generator-level particle jets are reconstructed from stable particles (mean lifetime τ > 3 × 10⁻¹¹ seconds) using the anti-k_t algorithm with a radius parameter R = 0.4, and are required to have p_T > 15 GeV and |η| < 2.5. This categorization employs a jet flavor-labeling procedure that is more refined than the one described in Sec. III. The flavor of a jet is determined by counting the number of b- or c-hadrons within ΔR < 0.4 of the jet axis. Jets matched to exactly one b-hadron, with p_T above 5 GeV, are labeled single-b-jets, while those matched to two or more b-hadrons are labeled B-jets (with no p_T requirement on the second hadron); single-c- and C-jets are defined analogously, only considering jets not already defined as single-b- or B-jets. Events that have at least one single-b- or B-jet, not counting heavy-flavor jets from top-quark or W-boson decays, are labeled as tt̄ + ≥1b; those with no single-b- or B-jet but at least one single-c- or C-jet are labeled as tt̄ + ≥1c. Finally, events not containing any heavy-flavor jets aside from those from top-quark or W-boson decays are labeled as tt̄ + light. This classification is used to define the background categories in the likelihood fit.
A finer classification is then used to assign correction factors and estimate uncertainties: events with exactly two single-b-jets are labeled as tt̄ + bb̄, those with only one single-b-jet are labeled as tt̄ + b, and those with only one B-jet are labeled as tt̄ + B, the rest of the tt̄ + ≥1b events being labeled as tt̄ + ≥3b. Events with additional b-jets entirely originating from multiparton interactions (MPI) or b-jets from final-state radiation (FSR), i.e. originating from gluon radiation from the top-quark decay products, are considered separately in the tt̄ + b (MPI/FSR) subcategory. Background events from tt̄ containing extra c-jets are divided analogously.
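The flavor classification can be summarized in a short sketch. The input is assumed to be, for each additional particle jet (that is, excluding jets from top-quark or W-boson decays), the number of matched b- and c-hadrons that satisfy the matching requirements. The function names and label strings are hypothetical. Two interpretations are assumed: "exactly two single-b-jets", "only one single-b-jet" and "only one B-jet" are each taken to mean that no other additional b-flavored jet is present, and the separate MPI/FSR subcategory is not modeled.

```python
def jet_flavor(n_b_hadrons, n_c_hadrons):
    """Label a particle jet from its matched heavy-flavor hadron counts."""
    if n_b_hadrons >= 2:
        return "B"
    if n_b_hadrons == 1:
        return "b"
    if n_c_hadrons >= 2:
        return "C"
    if n_c_hadrons == 1:
        return "c"
    return "light"

def classify_ttbar_event(additional_jets):
    """Return (coarse_label, fine_label) for a list of (n_b, n_c) tuples."""
    flavors = [jet_flavor(nb, nc) for nb, nc in additional_jets]
    n_b, n_B = flavors.count("b"), flavors.count("B")
    if n_b + n_B > 0:
        if n_b == 2 and n_B == 0:
            fine = "tt+bb"
        elif n_b == 1 and n_B == 0:
            fine = "tt+b"
        elif n_b == 0 and n_B == 1:
            fine = "tt+B"
        else:
            fine = "tt+>=3b"
        return "tt+>=1b", fine
    if "c" in flavors or "C" in flavors:
        return "tt+>=1c", None  # the analogous c-jet subdivision is omitted here
    return "tt+light", None
```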
To model the dominant tt̄ + ≥1b background with the highest available precision, the relative contributions of the different subcategories, tt̄ + ≥3b, tt̄ + bb̄, tt̄ + B and tt̄ + b, in the POWHEG+PYTHIA 8 sample described above are scaled to match those predicted by an NLO tt̄bb̄ sample including parton showering and hadronization [67], generated with SHERPA+OPENLOOPS [68,69]. The sample was produced with SHERPA version 2.1.1 and the CT10 four-flavor (4F) scheme PDF set [70,71]. The renormalization scale for this sample was set to the CMMPS value [67], while the factorization scale was set to H_T/2. The resummation scale μ_Q, which sets an upper bound for the hardness of the parton-shower emissions, was also set to H_T/2. This sample, referred to as "SHERPA4F" in the remainder of this article, employs a description of the kinematics of the two additional b-jets with NLO precision in QCD, taking into account the b-quark mass, and is therefore the most precise MC prediction for the tt̄ + ≥1b process available at present. Topologies that are not included in this NLO calculation but are labeled as tt̄ + ≥1b, i.e. events in the tt̄ + b (MPI/FSR) subcategory, are not scaled. Figure 2 shows the predicted fractions for each of the tt̄ + ≥1b subcategories, with the POWHEG+PYTHIA 8 inclusive tt̄ sample compared to the tt̄ + bb̄ SHERPA4F sample. The tt̄ + b (MPI/FSR) subcategory is not present in the tt̄ + bb̄ SHERPA4F sample and accounts for 10% of the events in the POWHEG+PYTHIA 8 tt̄ + ≥1b sample.
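One way to realize such a rescaling is to assign each subcategory a weight equal to the ratio of its target fraction to its nominal fraction. The sketch below does this under the assumption that the summed yield of the scaled subcategories is preserved, which the text does not state explicitly. The function name is hypothetical, and the numbers in the usage example are illustrative only and are not taken from Fig. 2.

```python
def subcategory_weights(nominal_yields, target_fractions):
    """Weights that move the nominal relative fractions onto the target ones.

    nominal_yields: yields per subcategory in the nominal sample (may contain
        extra keys, e.g. the MPI/FSR subcategory, which are left unscaled).
    target_fractions: relative fractions predicted by the reference sample.
    """
    total = sum(nominal_yields[name] for name in target_fractions)
    target_total = sum(target_fractions.values())
    weights = {}
    for name, fraction in target_fractions.items():
        nominal_fraction = nominal_yields[name] / total
        weights[name] = (fraction / target_total) / nominal_fraction
    return weights

# Illustrative numbers only (not from the article).
nominal = {"tt+b": 50.0, "tt+bb": 30.0, "tt+B": 15.0, "tt+>=3b": 5.0,
           "tt+b(MPI/FSR)": 10.0}
target = {"tt+b": 0.40, "tt+bb": 0.35, "tt+B": 0.15, "tt+>=3b": 0.10}
weights = subcategory_weights(nominal, target)
```

With these inputs the tt+b weight is 0.8 and the tt+≥3b weight is 2.0, while the MPI/FSR subcategory receives no weight and is therefore left unchanged.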

C. Other backgrounds
Samples of tt̄W and tt̄Z (tt̄V) events were generated with an NLO matrix element using MG5_aMC@NLO interfaced to PYTHIA 8.210 with the NNPDF3.0NLO PDF and the A14 parameter set.
Samples of Wt and s-channel single-top-quark backgrounds were generated with POWHEG-BOX v1 at NLO accuracy using the CT10 PDF set. Overlap between the tt̄ and Wt final states was handled using the "diagram removal" scheme [72]. The t-channel single-top-quark events were generated using the POWHEG-BOX v1 event generator at NLO accuracy with the four-flavor PDF set CT10 4F. For this process, the top quarks were decayed using MADSPIN. All single-top-quark samples were interfaced to PYTHIA 6.428 [73] with the Perugia 2012 set of tuned parameters [74]. The single-top-quark Wt, t- and s-channel samples are normalized using the approximate NNLO theoretical cross-sections [75-77].
Samples of W/Z production in association with jets were generated using SHERPA 2.2.1. The matrix elements were calculated for up to two partons at NLO and four partons at leading order (LO) using COMIX [78] and OPENLOOPS, and merged with the SHERPA parton shower [79] using the ME+PS@NLO prescription [80]. The NNPDF3.0NNLO PDF set was used in conjunction with dedicated parton-shower tuning. The W/Z + jet events are normalized using the NNLO cross sections [81]. For Z + jet events, the normalization of the heavy-flavor component is corrected by a factor 1.3, extracted from dedicated control regions in data, defined by requiring two opposite-charge same-flavor leptons (e⁺e⁻ or μ⁺μ⁻) with an invariant mass, m_ll, inside the Z-boson mass window 83-99 GeV. The diboson + jet samples were generated using SHERPA 2.1.1 as described in Ref. [82].
Higgs-boson production in association with a single top quark is rare in the SM, but is included in the analysis and treated as background. Samples of single top quarks produced in association with a W boson and with a Higgs boson, tWH, were produced with MG5_aMC@NLO interfaced to HERWIG++ [83] with the CTEQ6L1 PDF set. Samples of single top quarks plus Higgs boson plus jets, tHqb, were produced at LO with MG5_aMC@NLO interfaced to PYTHIA 8, using the CT10 4F scheme PDF set. The other Higgs-boson production modes were found to be negligible and are not considered. Four-top production (tt̄tt̄) as well as tt̄WW events were generated with MG5_aMC@NLO with LO accuracy and interfaced with PYTHIA 8. Events from tZ production were also generated with MG5_aMC@NLO with LO accuracy, but interfaced with PYTHIA 6. The process tZW was also generated with MG5_aMC@NLO interfaced with PYTHIA 8, but with NLO accuracy.
In the single-lepton channel, the background from events with a jet or a photon misidentified as a lepton (hereafter referred to as fake lepton) or non-prompt lepton is estimated directly from data using a matrix method [84]. A data sample enhanced in fake and non-prompt leptons is selected by removing the lepton isolation requirements and, for electrons, loosening the identification criteria. Next, the efficiency for these "loose" leptons to satisfy the nominal selection ("tight") criteria is measured in data, separately for real prompt leptons and for fake or non-prompt leptons. For real prompt leptons the efficiency is measured in Z-boson events, while for fake and non-prompt leptons it is estimated from events with low missing transverse momentum and low values of the reconstructed leptonic W-boson transverse mass. With this information, the number of fake or non-prompt leptons satisfying the tight criteria can be calculated by inverting the matrix defined by the two equations
N^l = N^l_r + N^l_f,
N^t = ε_r N^l_r + ε_f N^l_f,
where N^l (N^t) is the number of events observed in data passing the loose (tight) lepton selection, N^l_r (N^l_f) is the number of events with a real prompt (fake or non-prompt) lepton in the loose lepton sample, and ε_r (ε_f) is the efficiency for these events to pass the tight lepton selection. By generalizing the resulting formula to extract ε_f N^l_f, a weight is assigned to each event selected in the loose lepton data sample, providing a prediction for both the yields and the kinematic distribution shapes for the fake and non-prompt lepton background. In the three most sensitive single-lepton signal regions, SR_1^{≥6j}, SR_2^{≥6j} and SR_1^{5j} (see Sec. V), the contribution from events with a fake or non-prompt lepton is found to be very small, consistent with zero, and is neglected. In the dilepton channel, this background is estimated from simulation and is normalized to data in a control region with two same-sign leptons.
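The matrix method relates the loose and tight yields through N^l = N^l_r + N^l_f and N^t = ε_r N^l_r + ε_f N^l_f. Solving this system for the fake contribution to the tight sample gives ε_f N^l_f = ε_f (ε_r N^l − N^t)/(ε_r − ε_f). The sketch below implements this solution together with the per-event weight that reproduces it when summed over the loose sample. The weight formula is the commonly used form and is assumed here, since the text does not spell it out, and the function names are hypothetical.

```python
def fake_tight_yield(n_loose, n_tight, eff_real, eff_fake):
    """Estimated number of fake/non-prompt events in the tight sample."""
    n_fake_loose = (eff_real * n_loose - n_tight) / (eff_real - eff_fake)
    return eff_fake * n_fake_loose

def event_weight(passes_tight, eff_real, eff_fake):
    """Per-event weight for an event in the loose sample (assumed standard form).

    Summing these weights over all loose events reproduces fake_tight_yield.
    """
    factor = eff_fake / (eff_real - eff_fake)
    if passes_tight:
        return factor * (eff_real - 1.0)
    return factor * eff_real
```

As a closure check with made-up inputs, a loose sample built from 800 real and 200 fake leptons with ε_r = 0.9 and ε_f = 0.2 gives N^l = 1000 and N^t = 760, and both functions return a fake tight yield of 40 events.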
All background samples described in this section, apart from the tt̄V samples, are referred to as 'non-tt̄' and grouped together in the figures and tables. The contribution to the total background prediction from non-tt̄ varies between 4% and 15% depending on the considered signal or control region, as can be seen in Appendix A.

V. EVENT CATEGORIZATION
After the selection, the data sample is dominated by background from tt̄ events. In order to take advantage of the higher jet and b-jet multiplicities of the tt̄H signal process, events are classified into nonoverlapping analysis categories based on the total number of jets, as well as the number of b-tagged jets at the four working points. Events in the boosted single-lepton category are not further categorized due to the small number of selected events in this category. Events in the dilepton (resolved single-lepton) channel are first classified according to whether the number of jets is exactly three (five) or at least four (six). These events are then further subdivided into analysis categories, depending on the number of jets tagged at the four b-tagging working points, or, equivalently, on the values of the b-tagging discriminant for the jets. The b-tagging requirements are optimized in order to obtain categories enriched in one of the relevant sample components: tt̄H plus tt̄ + bb̄, tt̄ + b, tt̄ + ≥1c and tt̄ + light. The analysis categories where tt̄H and tt̄ + bb̄ are enhanced relative to the other backgrounds are referred to as "signal regions"; in these, multivariate techniques are used to further separate the tt̄H signal from the background events. The remaining analysis categories are referred to as "control regions"; no attempt is made to separate the signal from the background in these analysis categories, but they provide stringent constraints on backgrounds and systematic uncertainties in a combined fit with the signal regions.
In the dilepton channel, three signal regions are defined, with different levels of purity for the tt̄H and tt̄ + bb̄ components. The signal region with the highest tt̄H signal purity, referred to as SR_1^{≥4j}, is defined by requiring at least four jets of which three are b-tagged at the very tight working point and another one is b-tagged at the tight working point. The other two signal regions, SR_2^{≥4j} and SR_3^{≥4j}, are defined with looser b-tagging requirements. The remaining dilepton events with at least four jets are divided into two control regions, one enriched in tt̄ + light, CR_{tt̄+light}^{≥4j}, and one in tt̄ + ≥1c, CR_{tt̄+≥1c}^{≥4j}. Dilepton events with three jets are split into two control regions, CR_{tt̄+light}^{3j} and CR_{tt̄+≥1b}^{3j}, enriched in tt̄ + light and tt̄ + ≥1b, respectively. The detailed definition of the signal and control regions for the dilepton channel is presented in Fig. 3.
In the single-lepton channel, five signal regions are formed from events passing the resolved selection, three requiring at least six jets, and the other two requiring exactly five jets. They are referred to as SR_1^{≥6j}, SR_2^{≥6j}, SR_3^{≥6j}, SR_1^{5j} and SR_2^{5j}. The two purest signal regions, SR_1^{≥6j} and SR_1^{5j}, require four b-tagged jets at the very tight working point, while looser requirements are applied in the other signal regions. Events passing the boosted single-lepton selection form a sixth signal region, SR^{boosted}. The remaining events with at least six jets are then categorized into three control regions enriched in tt̄ + light, tt̄ + ≥1c and tt̄ + b, referred to as CR_{tt̄+light}^{≥6j}, CR_{tt̄+≥1c}^{≥6j} and CR_{tt̄+b}^{≥6j}, respectively. Analogously, the remaining events with exactly five jets are categorized into three further control regions, referred to as CR_{tt̄+light}^{5j}, CR_{tt̄+≥1c}^{5j} and CR_{tt̄+b}^{5j}. The detailed definition of the signal and control regions for the resolved single-lepton channel is presented in Fig. 4.

The predicted event yields in each of the analysis categories, broken down into the different signal and background contributions, are reported in Appendix A. Figures 5 and 6 show, respectively, the fraction of the different background components as well as the tt̄H signal purity for each of the signal and control regions in the dilepton and single-lepton channels. The H → bb̄ decay represents 89% of the tt̄H signal events in the signal regions of the dilepton channel, 96% in the signal regions of the resolved single-lepton channel and 86% in the boosted signal region.

FIG. 5. Fractional contributions of the various backgrounds to the total background prediction in each analysis category (a) in the dilepton channel and (b) in the single-lepton channel. The predictions for the various background contributions are obtained through the simulation and the data-driven estimates described in Sec. IV. The tt̄ background is divided as described in Sec. IV.

VI. MULTIVARIATE ANALYSIS TECHNIQUES
In each of the signal regions, a boosted decision tree (BDT) is exploited to discriminate between the tt̄H signal and the backgrounds. This BDT is referred to as the "classification BDT" in the following. The distributions of the classification BDTs in the signal regions are used as the final discriminants for the profile likelihood fit described in Sec. VIII. In the control regions, the overall event yield is used as input to the fit, except in those enriched in tt̄ + ≥1c in the single-lepton channel, $\mathrm{CR}^{5j}_{t\bar{t}+\geq 1c}$ and $\mathrm{CR}^{\geq 6j}_{t\bar{t}+\geq 1c}$; in these two control regions, the distribution of the scalar sum of the $p_T$ of the jets, $H_T^{\mathrm{had}}$, is used to further control the tt̄ + ≥1c background. The final state of the tt̄H(H → bb̄) process is composed of many jets stemming from the Higgs-boson and top-quark decay products, as well as from additional radiation. Many combinations of these jets are possible when reconstructing the Higgs-boson and top-quark candidates to explore their properties and the signal event topology. To enhance the signal separation, three intermediate multivariate techniques are implemented prior to the classification BDT: (a) the "reconstruction BDT" used to select the best combination of jet-parton assignments in each event and to build the Higgs-boson and top-quark candidates, (b) a likelihood discriminant (LHD) method that combines the signal and background probabilities of all possible combinations in each event, (c) a matrix element method (MEM) that exploits the full matrix element calculation to separate the signal from the background. The outputs of the three intermediate multivariate methods are used as input variables to the classification BDT in one or more of the signal regions. The properties of the Higgs-boson and top-quark candidates from the reconstruction BDT are used to define additional input variables to the classification BDT. 
Although the intermediate techniques exploit similar information, they make use of this information from different perspectives and based on different assumptions, so that their combination further improves the separation power of the classification BDT. Details of the implementation of these multivariate techniques are described in Secs. VI A-VI D.

A. Classification BDT
The classification BDT is trained to separate the signal from the tt̄ background on a sample that is statistically independent of the sample used for the evaluation. The toolkit for multivariate analysis (TMVA) [85] is used to train both this and the reconstruction BDT. The classification BDT is built by combining several input variables that exploit the different kinematics of signal and background events, as well as the b-tagging information. General kinematic variables, such as invariant masses and angular separations of pairs of reconstructed jets and leptons, are combined with outputs of the intermediate multivariate discriminants and the b-tagging discriminants of the selected jets. In the case of the boosted single-lepton signal region, kinematic variables are built from the properties of the large-R jets and their jet constituents. The input variables to the classification BDT in each of the signal regions are listed in Appendix B. The input variables are selected to maximize the performance of the classification BDT; however, only variables with good modeling of data by simulation are considered. The outputs of the reconstruction BDT, the LHD and the MEM are the most powerful variables in the classification BDT.

B. Reconstruction BDT
The reconstruction BDT is employed in all dilepton and resolved single-lepton signal regions. It is trained to match reconstructed jets to the partons emitted from top-quark and Higgs-boson decays. For this purpose, W-boson, top-quark and Higgs-boson candidates are built from combinations of jets and leptons. The b-tagging information is used to discard combinations containing jet-parton assignments inconsistent with the correct parton candidate flavor.
In the single-lepton channel, leptonically decaying W-boson candidates are assembled from the lepton four-momentum ($p_\ell$) and the neutrino four-momentum ($p_\nu$); the latter is built from the missing transverse momentum, its z component being inferred by solving the equation $m_W^2 = (p_\ell + p_\nu)^2$, where $m_W$ represents the W-boson mass. Both solutions of this quadratic equation are used in separate combinations. If no real solutions exist, the discriminant of the quadratic equation is set to zero, giving a unique solution. The hadronically decaying W-boson and the Higgs-boson candidates are each formed from a pair of jets. The top-quark candidates are formed from one W-boson candidate and one jet. The top-quark candidate containing the hadronically (leptonically) decaying W boson is referred to as the hadronically (leptonically) decaying top-quark candidate. In the single-lepton signal regions with exactly five selected jets, more than 70% of the events do not contain both jets from the hadronically decaying W boson. Therefore, the hadronically decaying top-quark candidate is assembled from two jets, one of which is b-tagged. In the dilepton channel, no attempt to build leptonically decaying W-boson candidates is made and the top-quark candidates are formed by one lepton and one jet.
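The neutrino $p_z$ determination can be illustrated with a short sketch. It assumes the W-mass constraint $m_W^2 = (p_\ell + p_\nu)^2$ with a massless lepton and neutrino, identifies the neutrino transverse momentum with the missing transverse momentum, and uses $m_W = 80.4$ GeV; the function name and argument layout are hypothetical. Under these assumptions the constraint reduces to a quadratic in $p_{z,\nu}$ with solutions $p_{z,\nu} = [a\,p_{z,\ell} \pm E_\ell \sqrt{a^2 - p_{T,\ell}^2 p_{T,\nu}^2}]/p_{T,\ell}^2$, where $a = m_W^2/2 + \vec{p}_{T,\ell} \cdot \vec{p}_{T,\nu}$.

```python
import math

M_W = 80.4  # GeV; assumed value of the W-boson mass for this sketch


def neutrino_pz(lep_px, lep_py, lep_pz, met_x, met_y, m_w=M_W):
    """Return the neutrino p_z solutions (sorted) from the W-mass constraint.

    Lepton and neutrino are treated as massless. If the quadratic has no real
    solution, its discriminant is set to zero, which yields a single solution.
    """
    lep_pt2 = lep_px ** 2 + lep_py ** 2
    lep_e = math.sqrt(lep_pt2 + lep_pz ** 2)
    met2 = met_x ** 2 + met_y ** 2
    a = 0.5 * m_w ** 2 + lep_px * met_x + lep_py * met_y
    disc = a * a - lep_pt2 * met2
    if disc < 0.0:
        disc = 0.0  # no real solution: clamp the discriminant to zero
    root = lep_e * math.sqrt(disc)
    # A set removes the duplicate when both roots coincide.
    solutions = {(a * lep_pz + root) / lep_pt2, (a * lep_pz - root) / lep_pt2}
    return sorted(solutions)
```

Each returned value would correspond to one separate combination; when the discriminant is clamped the list contains a single entry.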
Simulated tt̄H events are used to iterate over all allowed combinations. The reconstruction BDT is trained to distinguish between correct and incorrect jet assignments, using invariant masses and angular separations in addition to other kinematic variables as inputs. In each event a specific combination of jet-parton assignments, corresponding to the best BDT output, is chosen in order to compute kinematic and topological information of the top-quark and Higgs-boson candidates to be input to the classification BDT. However, although the best possible reconstruction performance can be obtained by including information related to the Higgs boson, such as the candidate Higgs-boson invariant mass, in the reconstruction BDT, this biases the background distributions of these Higgs-boson-related observables in the chosen jet-parton assignment towards the signal expectation, reducing their ability to separate signal from background. For this reason, two versions of the reconstruction BDT are used, one with and one without the Higgs-boson information, and the resulting jet-parton assignments from one, the other or both are considered when computing input variables for the classification BDT, as detailed in Appendix B.
The Higgs boson is correctly reconstructed in 48% (32%) of the selected tt̄H events in the single-lepton channel $\mathrm{SR}^{\geq 6j}_{1}$ using the reconstruction BDT with (without) information about the Higgs-boson kinematics included. For the dilepton channel, the corresponding reconstruction efficiencies are 49% (32%) in $\mathrm{SR}^{\geq 4j}_{1}$. The reconstruction techniques are not needed in the signal region $\mathrm{SR}^{\mathrm{boosted}}$, as the Higgs-boson and the top-quark candidates are chosen as the selected large-R jets described in Sec. III. The large-R jet selected as a Higgs-boson candidate contains two b-tagged jets stemming from the decay of a Higgs boson in 47% of the selected tt̄H events.

C. Likelihood discriminant
In the resolved single-lepton signal regions, the output from a likelihood discriminant is included as an additional input variable for the classification BDT. The LHD is computed analogously to Ref. [86] as a product of one-dimensional probability density functions (pdfs) for the signal and the background hypotheses. The pdfs are built for various invariant masses and angular distributions from reconstructed jets and leptons and from the missing transverse momentum, in a similar way to those used in the reconstruction BDT.
Two background hypotheses are considered, corresponding to the production of tt̄ + ≥2 b-jets and tt̄ + exactly one b-jet, respectively. The likelihoods for both hypotheses are averaged, weighted by their relative fractions in simulated tt̄ + jets events. In a significant fraction of both the tt̄H and tt̄ simulated events with at least six selected jets, only one jet stemming from the hadronically decaying W boson is selected. An additional hypothesis, for both the signal and the background, is considered to account for this topology. In events with exactly five selected jets, variables including the hadronically decaying top-quark candidate are built similarly to those for the reconstruction BDT.
The probabilities $p^{\mathrm{sig}}$ and $p^{\mathrm{bkg}}$, for signal and background hypotheses, respectively, are obtained as the product of the pdfs for the different kinematic distributions, averaged among all possible jet-parton matching combinations. Combinations are weighted using the b-tagging information to suppress the impact from parton-jet assignments that are inconsistent with the correct parton candidate flavor. For each event, the discriminant is defined as the ratio of the probability $p^{\mathrm{sig}}$ to the sum of $p^{\mathrm{sig}}$ and $p^{\mathrm{bkg}}$, and added as an input variable to the classification BDT. As opposed to the reconstruction BDT method, the LHD method takes advantage of all possible combinations in the event, but it does not fully account for correlations between variables in one combination, as it uses a product of one-dimensional pdfs.
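A minimal sketch of the LHD construction is given below, assuming each jet-parton combination carries a b-tagging weight and a dictionary of observable values, and that each one-dimensional pdf is available as a callable. All function names, the data layout, and the convention of returning 0.5 when both probabilities vanish are hypothetical choices made here, not details taken from the analysis.

```python
import math


def hypothesis_probability(combinations, pdfs):
    """Weighted average over combinations of the product of 1D pdf values.

    combinations: list of (weight, {observable_name: value}) pairs, where the
                  weight stands in for the b-tagging-based combination weight.
    pdfs:         {observable_name: callable returning the pdf value}.
    """
    total_weight = sum(w for w, _ in combinations)
    if total_weight <= 0.0:
        return 0.0
    acc = 0.0
    for weight, obs in combinations:
        acc += weight * math.prod(pdfs[name](obs[name]) for name in pdfs)
    return acc / total_weight


def average_background(p_ge2b, p_1b, frac_ge2b):
    """Average the two background hypotheses, weighted by their relative fractions."""
    return frac_ge2b * p_ge2b + (1.0 - frac_ge2b) * p_1b


def lhd(p_sig, p_bkg):
    """Discriminant p_sig / (p_sig + p_bkg)."""
    denom = p_sig + p_bkg
    # Returning 0.5 for a vanishing denominator is an arbitrary convention of this sketch.
    return p_sig / denom if denom > 0.0 else 0.5
```

Because the per-combination term is a plain product of one-dimensional pdfs, correlations between observables within a combination are ignored, which is the limitation noted in the text.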

D. Matrix element method
A discriminant ($\mathrm{MEM}_{D1}$) based on the MEM is computed following a method similar to the one described in Ref. [16] and is included as another input to the classification BDT. The MEM consumes a significant amount of computation time and thus is implemented only in the most sensitive single-lepton signal region, $\mathrm{SR}^{\geq 6j}_{1}$. The degree to which each event is consistent with the signal and background hypotheses is expressed via signal and background likelihoods, referred to as $L_S$ and $L_B$, respectively. These are computed using matrix element calculations at the parton level rather than using simulated MC samples as for the LHD method. The matrix element evaluation is performed with MG5_aMC@NLO at the LO accuracy. The tt̄H(H → bb̄) process is used as a signal hypothesis, while tt̄ + bb̄ is used as a background hypothesis. To reduce the computation time, only diagrams representing gluon-induced processes are considered. The parton distribution functions are modeled with the CT10 PDF set, interfaced via the LHAPDF package [87]. Transfer functions, which map the detector quantities to the parton-level quantities, are derived from a tt̄ sample generated with POWHEG+PYTHIA 6 and validated with the nominal POWHEG+PYTHIA 8 tt̄ sample. The directions in η and ϕ of all visible final-state objects are assumed to be well measured, and their transfer functions are thus represented by δ-functions. The neutrino momentum is constrained by imposing transverse momentum conservation in each event, while its $p_z$ is integrated over. The integration is performed using VEGAS [88], following the implementation described in Ref. [89]. As in the reconstruction BDT, b-tagging information is used to reduce the number of jet-parton assignments considered in the calculation. The discriminating variable, $\mathrm{MEM}_{D1}$, is defined as the difference between the logarithms of the signal and background likelihoods: $\mathrm{MEM}_{D1} = \log_{10}(L_S) - \log_{10}(L_B)$.
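The final step of the MEM, forming the discriminant from the two likelihoods, is simple enough to sketch directly. The function name is hypothetical, and the likelihood values are taken as given; the matrix element integration itself is not reproduced here.

```python
import math


def mem_d1(l_sig, l_bkg):
    """Difference of base-10 logarithms of the signal and background likelihoods.

    Both likelihoods must be strictly positive.
    """
    if l_sig <= 0.0 or l_bkg <= 0.0:
        raise ValueError("likelihoods must be strictly positive")
    return math.log10(l_sig) - math.log10(l_bkg)
```

Taking the logarithms separately, rather than the logarithm of the ratio, avoids forming a quotient of two very small numbers; positive values indicate an event that is more signal-like under the two hypotheses.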

VII. SYSTEMATIC UNCERTAINTIES
Many sources of systematic uncertainty affect the search, including those related to the luminosity, the reconstruction and identification of leptons and jets, and the theory modeling of signal and background processes. Different uncertainties may affect only the overall normalization of the samples, or also the shapes of the distributions used to categorize the events and to build the final discriminants. All the sources of experimental uncertainty considered, with the exception of the uncertainty in the luminosity, affect both the normalizations and the shapes of distributions in all the simulated samples. Uncertainties related to modeling of the signal and the backgrounds affect both the normalizations and the shapes of the distributions for the processes involved, with the exception of cross section and normalization uncertainties that affect only the normalization of the considered sample. Nonetheless, the normalization uncertainties modify the relative fractions of the different samples leading to a shape uncertainty in the distribution of the final discriminant for the total prediction in the different analysis categories.
A single independent nuisance parameter is assigned to each source of systematic uncertainty, as described in Sec. VIII. Some of the systematic uncertainties, in particular most of the experimental uncertainties, are decomposed into several independent sources, as specified in the following. Each individual source then has a correlated effect across all the channels, analysis categories, signal and background samples. For modeling uncertainties, especially tt̄ modeling, additional nuisance parameters are included to split some uncertainties into several sources independently affecting different subcomponents of a particular process.

A. Experimental uncertainties
The uncertainty of the combined 2015 + 2016 integrated luminosity is 2.1%. It is derived, following a methodology similar to that detailed in Ref. [26], from a calibration of the luminosity scale using x-y beam-separation scans performed in August 2015 and May 2016. A variation in the pileup reweighting of MC events is included to cover the uncertainty in the ratio of the predicted and measured inelastic cross-sections in the fiducial volume defined by $M_X > 13$ GeV, where $M_X$ is the mass of the hadronic system [90].
The jet energy scale and its uncertainty are derived by combining information from test-beam data, LHC collision data and simulation [33]. The uncertainties from these measurements are factorized into eight independent sources. Additional uncertainties are considered, related to jet flavor, pileup corrections, η dependence, and high-$p_T$ jets, yielding a total of 20 independent sources. Although the uncertainties are not large, totaling 1%-6% per jet (depending on the jet $p_T$), the effects are amplified by the large number of jets in the final state. Uncertainties in the jet energy resolution and in the efficiency to pass the JVT requirement that is meant to remove jets from pileup are also considered. The jet energy resolution is divided into two independent components.
The efficiency to correctly tag b-jets is measured in data using dileptonic tt̄ events. The mis-tag rate for c-jets is also measured in tt̄ events, identifying hadronic decays of W bosons including c-jets [91], while for light jets it is measured in multijet events using jets containing secondary vertices and tracks with impact parameters consistent with a negative lifetime [36]. The b-tagging efficiencies and mis-tag rates are first extracted for each of the four working points used in the analysis as a function of jet kinematics, and then combined into a calibration of the b-tagging discriminant distribution, with corresponding uncertainties that correctly describe correlations across multiple working points. The uncertainty associated with the b-tagging efficiency, whose size ranges between 2% and 10% depending on the working point and on the jet $p_T$, is factorized into 30 independent sources. The size of the uncertainties associated with the mis-tag rates is 5%-20% for c-jets depending on the working point and on the jet $p_T$, and 10%-50% for light jets depending on the working point and on the jet $p_T$ and η. These uncertainties are factorized into 15 (80) independent sources for c-jets (light jets). Jets from $\tau_{\mathrm{had}}$ candidates are treated as c-jets for the mis-tag rate corrections and systematic uncertainties. An additional source of systematic uncertainty is considered on the extrapolation between c-jets and these τ-jets.
Uncertainties associated with leptons arise from the trigger, reconstruction, identification, and isolation efficiencies, as well as the lepton momentum scale and resolution. These are measured in data using leptons in $Z \to \ell^+\ell^-$, $J/\psi \to \ell^+\ell^-$ and $W \to e\nu$ events [28,29]. Uncertainties of these measurements account for a total of 24 independent sources, but have only a small impact on the result.
All uncertainties in energy scales or resolutions are propagated to the missing transverse momentum. Additional uncertainties in the scale and resolution of the soft term are considered, for a total of three additional sources of systematic uncertainty.

B. Modeling uncertainties
The predicted tt̄H signal cross-section uncertainty is $^{+5.8\%}_{-9.2\%}$ (scale) ± 3.6% (PDF), the first component representing the QCD scale uncertainty and the second the PDF + $\alpha_S$ uncertainty [15,51-55]. These two components are treated as uncorrelated in the fit. The effect of QCD scale and PDF variations on the shape of the distributions considered in this analysis is found to be negligible. Uncertainties in the Higgs-boson branching fractions are also considered; these amount to 2.2% for the bb̄ decay mode [15]. An additional uncertainty associated with the choice of parton shower and hadronization model is derived by comparing the nominal prediction from MG5_aMC@NLO+PYTHIA 8 to the one from MG5_aMC@NLO interfaced to HERWIG++.
TABLE I. Summary of the sources of systematic uncertainty for tt̄ + jets modeling. The systematic uncertainties listed in the second section of the table are evaluated in such a way as to have no impact on the relative fractions of tt̄ + ≥1b, tt̄ + ≥1c and tt̄ + light events, as well as on the relative fractions of the tt̄ + b, tt̄ + bb̄, tt̄ + B and tt̄ + ≥3b subcategories, which are all kept at their nominal values. The systematic uncertainties listed in the third section of the table affect only the fractions of the various tt̄ + ≥1b subcategories. The last column of the table indicates the tt̄ category to which a systematic uncertainty is assigned. In the case where all three categories (tt̄ + light, tt̄ + ≥1c and tt̄ + ≥1b) are involved (marked with "all"), the last column also specifies whether the uncertainty is considered as correlated or uncorrelated across them.

Systematic source | Description | tt̄ categories
tt̄ cross-section | Up or down by 6% | All, correlated
k(tt̄ + ≥1c) | Free […] | […]
[…] | Alternative set of tuned parameters for the underlying event | tt̄ + ≥1b
tt̄ + ≥1b MPI | Up or down by 50% | tt̄ + ≥1b
tt̄ + ≥3b normalization | Up or down by 50% | tt̄ + ≥1b

The systematic uncertainties affecting the modeling of the tt̄ + jets background are summarized in Table I. An uncertainty of ±6% is assumed for the inclusive tt̄ NNLO + NNLL production cross section [62], including effects from varying the factorization and renormalization scales, the PDF, $\alpha_S$, and the top-quark mass. The tt̄ + ≥1b, tt̄ + ≥1c and tt̄ + light processes are affected by different types of uncertainties: tt̄ + light has additional diagrams and profits from relatively precise measurements in data; tt̄ + ≥1b and tt̄ + ≥1c can have similar or different diagrams depending on the flavor scheme used for the PDF, and the mass differences between c- and b-quarks contribute to additional differences between these two processes. For these reasons, all uncertainties in tt̄ + jets background modeling, except the uncertainty in the inclusive cross-section, are assigned independent nuisance parameters for the tt̄ + ≥1b, tt̄ + ≥1c and tt̄ + light processes. The normalizations of tt̄ + ≥1b and tt̄ + ≥1c are allowed to float freely in the fit. Systematic uncertainties in the shapes are extracted from the comparison between the nominal sample and various alternative samples. For all these uncertainties, alternative samples are reweighted in such a way that they have the same fractions of tt̄ + ≥1c and tt̄ + ≥1b as the nominal sample. In the case of the tt̄ + ≥1b background, separate uncertainties are applied to the relative normalization of the tt̄ + ≥1b subcomponents as described later. Therefore, for all the alternative samples used to derive uncertainties that are not specifically associated with these fractions, the relative contributions of the tt̄ + ≥1b subcategories are scaled to match the predictions of SHERPA4F, in the same way as for the nominal sample. This scaling is not applied to the tt̄ + b (MPI/FSR) subcategory, as explained in Sec. IV. Uncertainties associated with the choice of tt̄ inclusive NLO event generator as well as the choice of parton shower and hadronization model are derived by comparing the prediction from POWHEG+PYTHIA 8 with the SHERPA predictions (hence varying simultaneously the NLO event generator and the parton shower and hadronization model) and with the predictions from POWHEG interfaced with HERWIG 7 [92] (varying just the parton shower and hadronization model). 
The former alternative sample was generated using SHERPA version 2.2.1 with the ME+PS@NLO setup, interfaced with OPENLOOPS, providing NLO accuracy for up to one additional parton and LO accuracy for up to four additional partons. The NNPDF3.0NNLO PDF set was used and both the renormalization and factorization scales were set to $\sqrt{0.5 \times (m_{T,t}^2 + m_{T,\bar{t}}^2)}$. This sample is referred to as 'SHERPA5F' in the remainder of this article, which should not be confused with the SHERPA4F sample defined in Sec. IV. The comparison with the latter alternative sample is considered as an independent source of uncertainty, related to the parton shower and hadronization model choice. This sample was generated with the same settings for POWHEG as the nominal tt̄ sample in terms of $h_{\mathrm{damp}}$, PDF and renormalization and factorization scales, but it was interfaced with HERWIG 7 version 7.0.1, with the H7-UE-MMHT set of tuned parameters for the underlying event. Additionally, the uncertainty in the modeling of initial- and final-state radiation (ISR/FSR) is assessed with two alternative POWHEG+PYTHIA 8 samples [93]. One sample with the amount of radiation increased has the renormalization and factorization scales decreased by a factor of two, the $h_{\mathrm{damp}}$ parameter doubled, and uses the Var3c upward variation of the A14 parameter set. A second sample with the amount of radiation decreased has the scales increased by a factor of two and uses the Var3c downward variation of the A14 set. The uncertainties described in this paragraph correspond to three independent sources for each of the tt̄ + light, tt̄ + ≥1c and tt̄ + ≥1b components.
For the background from tt̄ + ≥1c, there is little guidance from theory or experiment to determine whether the nominal approach of using charm jets produced primarily in the parton shower is more or less accurate than a prediction with tt̄ + cc̄ calculated at NLO in the matrix element. For this reason, an NLO prediction with tt̄ + cc̄ in the matrix element, including massive c-quarks and therefore using the 3F scheme for the PDFs, is produced with MG5_aMC@NLO interfaced to HERWIG++, as described in Ref. [94]. The difference between this sample and an inclusive tt̄ sample produced with the same event generator and a 5F scheme PDF set, in which the tt̄ + ≥1c process originates through the parton shower only, is taken as an additional uncertainty in the tt̄ + ≥1c prediction. This uncertainty is related to the choice between the tt̄ + cc̄ ME calculation and the prediction from the inclusive tt̄ production with c-jets via parton shower and is applied as one additional independent source to the tt̄ + ≥1c background.
For the tt̄ + ≥1b process, the difference between the predictions from POWHEG+PYTHIA 8 and SHERPA4F is considered as one additional source of uncertainty. This uncertainty accounts for the difference between the description of the tt̄ + ≥1b process by the NLO tt̄ inclusive MC sample with a 5F scheme and a description at NLO of tt̄ + bb̄ in the ME with a 4F scheme. This uncertainty is not applied to the tt̄ + b (MPI/FSR) subcategory since it is not included in the 4F calculation.
The uncertainties described above do not affect the relative fractions of the tt̄ + b, tt̄ + bb̄, tt̄ + B and tt̄ + ≥3b subcomponents as these fractions are fixed to the prediction of SHERPA4F. The uncertainties in these fractions in SHERPA4F are assessed separately and are divided into seven independent sources. Three of these sources are evaluated by varying the renormalization scale up and down by a factor of two, changing the functional form of the resummation scale to $\mu_{\mathrm{CMMPS}}$, and adopting a global scale choice, $\mu_Q = \mu_R = \mu_F = \mu_{\mathrm{CMMPS}}$. Additionally, two alternative PDF sets, MSTW2008NLO [95] and NNPDF2.3NLO, are considered, as well as an alternative shower recoil scheme and an alternative set of tuned parameters for the underlying event. These sources of uncertainty contribute to the uncertainty band shown in Fig. 2 for the SHERPA4F prediction. Given the large difference between the 4F prediction and the various 5F predictions for the tt̄ + ≥3b process, which is not covered by the uncertainties described above, this subprocess is given an extra 50% normalization uncertainty.
The relative fraction of the tt̄ + b (MPI/FSR) subcategory is not fixed in the alternative samples used to derive the systematic uncertainties related to the choice of NLO event generator, parton shower and hadronization model and to ISR/FSR. These sources already incorporate variations related to the fraction and shape of the tt̄ + b (MPI/FSR) subcategory. In addition, a 50% normalization uncertainty is assumed for the contribution from MPI, based on studies of different underlying event sets of tuned parameters.
In total, thirteen independent sources of modeling uncertainties are assigned to the tt̄ + ≥1b component, four to the tt̄ + ≥1c component and three to the tt̄ + light component in addition to the one source that corresponds to the inclusive tt̄ production cross-section uncertainty.
An uncertainty of 40% is assumed for the W + jets cross section, with an additional 30% normalization uncertainty used for W + heavy-flavor jets, taken as uncorrelated between events with two and more than two heavy-flavor jets. These uncertainties are based on variations of the factorization and renormalization scales and of the matching parameters in the SHERPA simulation. An uncertainty of 35% is then applied to the Z + jets normalization, uncorrelated across jet bins, to account for both the variations of the scales and matching parameters in the SHERPA simulation and the uncertainty in the extraction from data of the correction factor for the heavy-flavor component.
An uncertainty of $^{+5\%}_{-4\%}$ is considered for each of the three single-top production mode cross sections [75-77]. For the Wt and t-channel production modes, uncertainties associated with the choice of parton shower and hadronization model and with initial- and final-state radiation are evaluated according to a set of alternative samples analogous to those used for the tt̄ process: the nominal prediction is compared with samples generated with POWHEG interfaced with HERWIG++ and with alternative POWHEG-BOX v1+PYTHIA 6 samples with factorization and renormalization scale variations and appropriate variations of the Perugia 2012 set of tuned parameters. The uncertainty in the amount of interference between Wt and tt̄ production at NLO [72] is assessed by comparing the default "diagram removal" scheme to the alternative "diagram subtraction" scheme.
A 50% normalization uncertainty in the diboson background is assumed, which includes uncertainties in the inclusive cross-section and additional jet production [82]. The uncertainty of the tt̄V NLO cross-section prediction is 15% [96], split into PDF and scale uncertainties as for tt̄H. An additional tt̄V modeling uncertainty, related to the choice of event generator, parton shower and hadronization model, is assessed by comparing the nominal sample with alternative ones generated with SHERPA. Uncertainties in tt̄V production are all treated as uncorrelated between tt̄Z and tt̄W. A total 50% normalization uncertainty is considered for the tt̄tt̄ background. The small backgrounds from tZ, tt̄WW, tHjb and WtH are each assigned two cross-section uncertainties, split into PDF and scale uncertainties, while tWZ is assigned one cross-section uncertainty that accounts for both the scale and PDF effects.
Finally, a 50% uncertainty is assigned to the overall estimated yield of nonprompt lepton events in the single-lepton channel, taken as uncorrelated between electron-plus-jet and muon-plus-jet events, between boosted and resolved analysis categories, and between the resolved analysis categories with exactly five jets and those with six or more jets. In the dilepton channel, the nonprompt lepton background is assigned a 25% uncertainty, correlated across lepton flavors and all analysis categories.

VIII. RESULTS
The distributions of the discriminants from each of the analysis categories are combined in a profile likelihood fit to test for the presence of a signal, while simultaneously determining the normalization and constraining the differential distributions of the most important background components. As described in Sec. VI, in the signal regions, the output of the classification BDT is used as the discriminant while only the total event yield is used in the control regions, with the exception of $\mathrm{CR}^{5j}_{t\bar{t}+\geq 1c}$ and $\mathrm{CR}^{\geq 6j}_{t\bar{t}+\geq 1c}$, where the $H_T^{\mathrm{had}}$ distribution is used. No distinction is made in the fit between signal and control regions, other than a different choice of discriminant variables. The binning of the classification BDT is optimized to maximize the analysis sensitivity while keeping the total MC statistical uncertainty in each bin to a level adjusted to avoid biases due to fluctuations in the predicted number of events.
The likelihood function, $L(\mu, \theta)$, is constructed as a product of Poisson probability terms over all bins in each distribution. The Poisson probability depends on the predicted number of events in each bin, which in turn is a function of the signal-strength parameter $\mu = \sigma/\sigma_{\mathrm{SM}}$ and θ, where θ is the set of nuisance parameters that encode the effects of systematic uncertainties, and of the two free-floating normalization factors k(tt̄ + ≥1b) and k(tt̄ + ≥1c) for the tt̄ + ≥1b and tt̄ + ≥1c backgrounds, respectively. The nuisance parameters are implemented in the likelihood function as Gaussian, log-normal or Poisson priors, with the exception of the normalization factors k(tt̄ + ≥1b) and k(tt̄ + ≥1c), for which no prior knowledge from theory or subsidiary measurements is assumed and which are hence constrained only by the profile likelihood fit to the data. The statistical uncertainty of the prediction, which incorporates the statistical uncertainty of the MC events and of the data-driven fake and nonprompt lepton estimate, is included in the likelihood in the form of additional nuisance parameters, one for each of the included bins. The test statistic $t_\mu$ is defined as the profile likelihood ratio: $t_\mu = -2 \ln\left(L(\mu, \hat{\hat{\theta}}_\mu)/L(\hat{\mu}, \hat{\theta})\right)$, where $\hat{\mu}$ and $\hat{\theta}$ are the values of the parameters which maximize the likelihood function, and $\hat{\hat{\theta}}_\mu$ are the values of the nuisance parameters which maximize the likelihood function for a given value of μ. This test statistic is used to measure the probability that the observed data are compatible with the background-only hypothesis, and to perform statistical inferences about μ, such as upper limits using the CL$_s$ method [97-99]. The uncertainty of the best-fit value of the signal strength, $\hat{\mu}$, is obtained by varying $t_\mu$ by one unit. 
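The structure of the profile likelihood ratio can be illustrated with a deliberately reduced toy model. The sketch below assumes a binned Poisson likelihood with only two parameters: the signal strength μ and a single free-floating background normalization k with no prior, standing in for the unconstrained normalization factors. The constrained nuisance parameters, the per-bin statistical terms and the actual fitting software of the analysis are not represented; all function names, parameter ranges and the ternary-search minimizer are choices made for this illustration.

```python
import math


def nll(mu, k, sig, bkg, obs):
    """Negative log-likelihood of independent Poisson bins (constant ln n! dropped)."""
    total = 0.0
    for s, b, n in zip(sig, bkg, obs):
        lam = mu * s + k * b  # expected yield in this bin
        if lam <= 0.0:
            return math.inf
        total += lam - n * math.log(lam)
    return total


def minimize_1d(f, lo, hi, iters=80):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def profiled_nll(mu, sig, bkg, obs, k_range=(0.1, 5.0)):
    """Minimize the NLL over k at fixed mu; returns (minimum NLL, profiled k)."""
    k_hat = minimize_1d(lambda k: nll(mu, k, sig, bkg, obs), *k_range)
    return nll(mu, k_hat, sig, bkg, obs), k_hat


def fit(sig, bkg, obs, mu_range=(0.0, 5.0)):
    """Global fit: returns (mu_hat, k_hat, minimum NLL)."""
    mu_hat = minimize_1d(lambda m: profiled_nll(m, sig, bkg, obs)[0], *mu_range)
    nll_min, k_hat = profiled_nll(mu_hat, sig, bkg, obs)
    return mu_hat, k_hat, nll_min


def t_mu(mu, sig, bkg, obs):
    """Profile likelihood ratio test statistic for a tested value of mu."""
    _, _, nll_min = fit(sig, bkg, obs)
    return 2.0 * (profiled_nll(mu, sig, bkg, obs)[0] - nll_min)
```

In this toy, the interval on the fitted signal strength would be read off as the range of μ for which `t_mu` stays below one. The search ranges restrict μ to non-negative values purely to keep the expected yields positive in the sketch.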
Figure 7 shows the observed event yield compared to the prediction in each control and signal region, both before the fit to data ('pre-fit') and after the fit to data ('post-fit'), performed in all the analysis categories in the two channels and with the signal-plus-background hypothesis. For the pre-fit prediction, the normalization factors for the tt̄ + ≥1b and tt̄ + ≥1c processes are set to 1, which corresponds to considering the prediction from POWHEG+PYTHIA 8 for the fraction of each of these components relative to the total tt̄ prediction. Figure 8 shows the $H_{\mathrm{T}}^{\mathrm{had}}$ distributions in the tt̄ + ≥1c-enriched control regions of the single-lepton channel, while Figs. 9, 10 and 11 show the distributions of the classification BDTs in the dilepton and single-lepton signal regions, both before and after the fit. All these distributions are reasonably well modeled prefit within the assigned uncertainties. The level of agreement is improved postfit due to the nuisance parameters being adjusted by the fit. In particular, the best-fit values of k(tt̄ + ≥1b) and k(tt̄ + ≥1c) are 1.24 ± 0.10 and 1.63 ± 0.23, respectively. The uncertainties in these measured normalization factors do not include the theory uncertainty of the corresponding tt̄ + ≥1b and tt̄ + ≥1c cross sections. The postfit uncertainty is also significantly reduced, as a result of the nuisance-parameter constraints and the correlations generated by the fit.

FIG. 7. Comparison of predicted and observed event yields in each of the control and signal regions, in the dilepton channel (a) before and (b) after the fit to the data, and in the single-lepton channel (c) before and (d) after the fit to the data. The tt̄H signal is shown both as a filled red area stacked on the backgrounds and separately for visibility as a dashed red line, normalized to the SM cross-section before the fit and to the fitted μ after the fit. The hatched area corresponds to the fitted uncertainty in the total prediction. The pre-fit plots do not include an uncertainty for the tt̄ + ≥1b or tt̄ + ≥1c normalization.
In addition to the distributions that are given as input to the fit, all the distributions of the input variables to the classification BDTs in the signal regions are checked postfit, and no significant deviations of the predictions from data are found. Figure 12 shows the data compared to the postfit prediction for three of these distributions, namely the Higgs-boson candidate mass distributions in the most sensitive signal regions in the dilepton channel and the single-lepton resolved channels as well as in the single-lepton boosted signal region.
The best-fit μ value is $\mu = 0.84^{+0.64}_{-0.61}$, determined by the combined fit in all signal and control regions in the two channels. The expected uncertainty of the signal strength is identical to the measured one. An alternative combined fit is also performed in which the dilepton and single-lepton channels are assigned two independent signal strengths. The corresponding fitted values of μ are $-0.24^{+1.02}_{-1.05}$ in the dilepton channel and $0.95^{+0.65}_{-0.62}$ in the single-lepton channel. The probability of obtaining a discrepancy between these two signal-strength parameters equal to or larger than the one observed is 19%.

FIG. 8. Comparison between data and prediction for the $H_{\mathrm{T}}^{\mathrm{had}}$ distributions in the single-lepton tt̄ + ≥1c-enriched control regions (a, c) before, and (b, d) after the combined dilepton and single-lepton fit to the data. Despite its small contribution in these control regions, the tt̄H signal prediction is shown stacked at the top of the background prediction, normalized to the SM cross-section before the fit and to the fitted μ after the fit. The prefit plots do not include an uncertainty for the tt̄ + ≥1b or tt̄ + ≥1c normalization.

FIG. 9. Comparison between data and prediction for the BDT discriminant in the dilepton signal regions (a, c, e) before, and (b, d, f) after the combined dilepton and single-lepton fit to the data. The tt̄H signal yield (solid red) is normalized to the SM cross-section before the fit and to the fitted μ after the fit. The dashed line shows the tt̄H signal distribution normalized to the total background prediction. The prefit plots do not include an uncertainty for the tt̄ + ≥1b or tt̄ + ≥1c normalization.

FIG. 10. Comparison between data and prediction for the BDT discriminant in the single-lepton channel five-jet and boosted signal regions (a, c, e) before, and (b, d, f) after the combined dilepton and single-lepton fit to the data. The tt̄H signal yield (solid red) is normalized to the SM cross section before the fit and to the fitted μ after the fit. The dashed line shows the tt̄H signal distribution normalized to the total background prediction. The pre-fit plots do not include an uncertainty for the tt̄ + ≥1b or tt̄ + ≥1c normalization.

FIG. 11. Comparison between data and prediction for the BDT discriminant in the single-lepton channel six-jet signal regions (a, c, e) before, and (b, d, f) after the combined dilepton and single-lepton fit to the data. The tt̄H signal yield (solid red) is normalized to the SM cross-section before the fit and to the fitted μ after the fit. The dashed line shows the tt̄H signal distribution normalized to the total background prediction. The prefit plots do not include an uncertainty for the tt̄ + ≥1b or tt̄ + ≥1c normalization.

Figure 13 shows the comparison between the combined μ and the two independent signal-strength parameters from the combined fit, with their uncertainties split into the statistical and systematic components. The statistical uncertainty is obtained by redoing the fit to data after fixing all the nuisance parameters to their post-fit values, with the exception of the free normalization factors in the fit: k(tt̄ + ≥1c), k(tt̄ + ≥1b) and μ. The total systematic uncertainty is obtained from the subtraction in quadrature of the statistical uncertainty from the total uncertainty. The statistical uncertainty contributes significantly less than the systematic component to the overall uncertainty of the measurement. When fitting the dilepton and single-lepton data separately, the observed signal strengths are $0.11^{+1.36}_{-1.41}$ and $0.67^{+0.71}_{-0.69}$, respectively. These two signal-strength values are both lower than the combined measured μ due to the large correlations in the systematic uncertainties of the background prediction between the two channels.
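The subtraction in quadrature used to separate the systematic component from the total uncertainty is simple arithmetic, sketched below. The total uncertainties are those quoted for the combined μ; the statistical uncertainty of 0.30 is a hypothetical placeholder chosen only to show the calculation, and the function name is an assumption of this sketch.

```python
import math


def subtract_in_quadrature(total, component):
    """Return sqrt(total^2 - component^2), the remainder after removing one component."""
    if component > total:
        raise ValueError("component cannot exceed the total uncertainty")
    return math.sqrt(total ** 2 - component ** 2)


# Total uncertainties on the combined mu quoted in the text (upper, lower).
total_up, total_down = 0.64, 0.61
# Hypothetical statistical uncertainty, used only to illustrate the arithmetic.
stat_up, stat_down = 0.30, 0.30

syst_up = subtract_in_quadrature(total_up, stat_up)
syst_down = subtract_in_quadrature(total_down, stat_down)
print(f"systematic component: +{syst_up:.2f} / -{syst_down:.2f}")
```

The same operation, applied after fixing one group of nuisance parameters at a time, gives the per-group contributions of the kind listed in an uncertainty breakdown table.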

FIG. 12. Comparison between data and prediction for the Higgs-boson candidate mass from the reconstruction BDT trained without variables involving the Higgs-boson candidate (a) in the dilepton $\mathrm{SR}^{\geq 4j}_{1}$ and (b) in the single-lepton $\mathrm{SR}^{\geq 6j}_{1}$, and (c) for the boosted Higgs-boson candidate in $\mathrm{SR}_{\mathrm{boosted}}$, after the combined dilepton and single-lepton fit to the data. The tt̄H signal yield (solid red) is normalized to the fitted μ after the fit. The dashed red line shows the tt̄H signal distribution normalized to the total background yield. The dashed black line shows the prefit total background prediction.

The contributions from the different sources of uncertainty in the combined fit to μ are reported in Table II. The total systematic uncertainty is dominated by the uncertainties in the modeling of the tt̄ + ≥1b background, the second-largest source being the limited number of events in the simulated samples, followed by the uncertainties in the b-tagging efficiency, the jet energy scale and resolution, and the signal process modeling. The 20 nuisance parameters describing the independent sources of systematic uncertainty with the largest contribution to the total uncertainty of the measured signal strength are reported in Fig. 14, ranked by decreasing contribution. For each of these nuisance parameters, the best-fit value and the postfit uncertainty are shown. The uncertainty coming from the comparison between the SHERPA5F and the nominal prediction for the tt̄ + ≥1b process, related to the choice of the NLO event generator for this background component, has the largest impact on the signal strength, followed by three uncertainties also related to the modeling of the tt̄ + ≥1b background. Systematic uncertainties related to the tt̄H signal modeling, the modeling of the tt̄ + ≥1c and tt̄ + light backgrounds, and to experimental sources such as b-tagging, jet energy scale and resolution, also appear in Fig. 14; however, their contributions are significantly smaller than the ones from the tt̄ + ≥1b background. The total uncertainty of the signal strength is reduced by 5% if the fit is performed excluding the systematic uncertainties not shown in this figure.
The theoretical predictions for the tt̄ + ≥1b process suffer from large uncertainties as reflected in the size of the difference between alternative simulated samples used to model this background. The corresponding systematic uncertainties are therefore large and are a crucial limiting factor for this search. The choice of nuisance parameters for systematic uncertainties related to the tt̄ + ≥1b background is studied carefully to ensure sufficient flexibility in the fit to correct for possible mis-modeling of this background and avoid any bias in the measured signal strength. In total, 13 independent nuisance parameters are assigned to tt̄ + ≥1b background modeling uncertainties. The capability of the fit to correct for mis-modeling effects, beyond the ones present in the distributions used in the fit, is confirmed by comparing the predictions of all input variables of the classification BDT obtained post-fit to data. As mentioned before, no significant deviations of the predictions from data are found and the agreement is improved postfit. Alternative approaches to model the tt̄ + ≥1b background, to define the associated uncertainties and to correlate them are also tested, and the corresponding results are found to be compatible with the nominal result.
To further validate the robustness of the fit, a pseudodata set was built from simulated events by replacing the nominal tt̄ background by an alternative sample that is not used in the definition of any uncertainty. This alternative sample was generated with POWHEG+PYTHIA 6 and is similar to the sample used for the tt̄H(H → bb̄) analysis [16] in Run 1 of the LHC. The fit to this pseudodata sample did not reveal any bias in the signal extraction.

Figure 14 shows that some nuisance parameters are shifted in the fit from their nominal values. To understand the origin of these shifts, the corresponding nuisance parameters are switched to be uncorrelated between analysis categories and samples and the fit is repeated. These shifts are found to correct mainly the predictions of the tt̄ background to the observed data in various regions. Similar shifts are observed when a background-only fit is performed after removing the bins with the most significant signal contributions. Moreover, the variations induced in the signal strength by these shifts are quantified by fixing the corresponding nuisance parameters to their pre-fit values, repeating the fit, and comparing the obtained μ value with the one from the nominal fit. These variations were found to be smaller than the uncertainty in the signal strength. Independent signal-strength values extracted from different sets of analysis categories and from the two channels are also found to be compatible. Figure 14 also shows that the uncertainties corresponding to some nuisance parameters are reduced by the fit. When performing the profile likelihood fit, nuisance parameters associated with uncertainties affecting the discriminant distributions by variations that would result in large deviations from data are significantly constrained. The capability of the fit to constrain systematic uncertainties is validated on the pseudodata sample described above, and on the pseudodata sample produced from the nominal predictions, the Asimov data set [97].

FIG. 13. Summary of the signal-strength measurements in the individual channels and for the combination. All the numbers are obtained from a simultaneous fit in the two channels, but the measurements in the two channels separately are obtained keeping the signal strengths uncorrelated, while all the nuisance parameters are kept correlated across channels.

TABLE II. Breakdown of the contributions to the uncertainties in μ. The line "background-model statistical uncertainty" refers to the statistical uncertainties in the MC events and in the data-driven determination of the nonprompt and fake lepton background component in the single-lepton channel. The contribution of the different sources of uncertainty is evaluated after the fit described in Sec. VIII. The total statistical uncertainty is evaluated, as described in the text, by fixing all the nuisance parameters in the fit except for the free-floating normalization factors for the tt̄ + ≥1b and tt̄ + ≥1c background components. The contribution from the uncertainty in the normalization of both tt̄ + ≥1b and tt̄ + ≥1c is then included in the quoted total statistical uncertainty rather than in the systematic uncertainty component. The statistical uncertainty evaluated after also fixing the normalization of tt̄ + ≥1b and tt̄ + ≥1c is then indicated as "intrinsic statistical uncertainty." The other quoted numbers are obtained by repeating the fit after having fixed a certain set of nuisance parameters corresponding to a group of systematic uncertainty sources, and subtracting in quadrature the resulting total uncertainty of μ from the uncertainty from the full fit. The same procedure is followed for quoting the individual effects of the tt̄ + ≥1b and the tt̄ + ≥1c normalization. The total uncertainty is different from the sum in quadrature of the different components due to correlations between nuisance parameters built by the fit.

An excess of events over the expected SM background is found with an observed (expected) significance of 1.4 (1.6) standard deviations. A signal strength larger than 2.0 is excluded at the 95% C.L., as shown in Fig. 15. The expected significance and exclusion limits are calculated using the background estimate after the fit to the data.

FIG. 14. Only the 20 most highly ranked parameters are shown. Nuisance parameters corresponding to MC statistical uncertainties are not included here. The empty blue rectangles correspond to the prefit impact on μ and the filled blue ones to the postfit impact on μ, both referring to the upper scale. The impact of each nuisance parameter, Δμ, is computed by comparing the nominal best-fit value of μ with the result of the fit when fixing the considered nuisance parameter to its best-fit value, $\hat{\theta}$, shifted by its prefit (postfit) uncertainties $\pm\Delta\theta$ ($\pm\Delta\hat{\theta}$). The black points show the pulls of the nuisance parameters relative to their nominal values, $\theta_0$. These pulls and their relative post-fit errors, $\Delta\hat{\theta}/\Delta\theta$, refer to the scale on the bottom axis. The parameter k(tt̄ + ≥1b) refers to the floating normalization of the tt̄ + ≥1b background, for which the pre-fit impact on μ is not defined, and for which both $\theta_0$ and Δθ are set to 1. For experimental uncertainties that are decomposed into several independent sources, NP I and NP II correspond to the first and second nuisance parameters, ordered by their impact on μ, respectively.

FIG. 15. In the case of the expected limits in the background-only hypothesis, one- and two-standard-deviation uncertainty bands are also shown. The limits for the two individual channels are derived consistently with Fig. 13, both extracted from the profile likelihood including the data in both channels, but with independent signal strengths in the two channels.
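The 95% C.L. exclusion quoted above relies on the CL_s method applied to the full profile likelihood. As a much simpler stand-in, the sketch below applies the CL_s ratio to a single-bin counting experiment with a known background and no systematic uncertainties. The function names and the choice of a counting experiment are assumptions of this sketch and do not reproduce the analysis procedure.

```python
import math


def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def cls(n_obs, signal, background):
    """CL_s = CL_{s+b} / CL_b for a single-bin counting experiment."""
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b


def upper_limit(n_obs, background, alpha=0.05, s_max=100.0):
    """Signal yield at which CL_s drops to alpha, found by bisection.

    CL_s decreases monotonically with the signal yield, so the crossing is unique.
    """
    lo, hi = 0.0, s_max
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, background) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical counting experiment: 12 events observed over an expected background of 10.
limit = upper_limit(n_obs=12, background=10.0)
print(f"95% C.L. upper limit on the signal yield: {limit:.2f}")
```

Dividing by CL_b protects against excluding signals to which the experiment has little sensitivity: with zero observed events the limit is ln(20), about 3 events, whatever the expected background.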
Figure 16 shows the event yield in data compared to the post-fit prediction for all events entering the analysis selection, grouped and ordered by the signal-to-background ratio of the corresponding final-discriminant bins. The predictions are shown for both the fit with the background-only hypothesis and with the signal-plus-background hypothesis, where the signal is scaled to either the measured μ or the value of the upper limit on μ.
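The regrouping of final-discriminant bins by signal-to-background ratio can be sketched as follows. The input bins, the bin edges, the use of a base-10 logarithm, and the treatment of overflow are assumptions of this example; only the underflow convention (added to the first bin) is taken from the figure description.

```python
import math


def regroup_by_log_sb(bins, edges):
    """Sum signal, background and data over input bins in each log10(S/B) interval.

    Values below the first edge go into the first output bin (underflow);
    values above the last edge go into the last one (an assumption of this sketch).
    The expected signal in every input bin must be positive.
    """
    n_out = len(edges) - 1
    grouped = [{"signal": 0.0, "background": 0.0, "data": 0} for _ in range(n_out)]
    for signal, background, data in bins:
        x = math.log10(signal / background)
        index = 0
        while index < n_out - 1 and x >= edges[index + 1]:
            index += 1
        grouped[index]["signal"] += signal
        grouped[index]["background"] += background
        grouped[index]["data"] += data
    return grouped


# Hypothetical discriminant bins: (expected signal, expected background, observed data).
example_bins = [
    (1.0, 1000.0, 990),
    (2.0, 100.0, 105),
    (10.0, 50.0, 52),
    (0.1, 1000.0, 1003),  # log10(S/B) = -4, below the first edge
]
example_edges = [-3.0, -2.0, -1.0, 0.0]
grouped = regroup_by_log_sb(example_bins, example_edges)
for low, content in zip(example_edges, grouped):
    print(low, content)
```

Ordering bins this way collects the most signal-like bins of every category at the high end of a single axis, which makes a small excess easier to see than in the individual discriminant distributions.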

IX. CONCLUSION
A search for the associated production of the standard model Higgs boson with a pair of top quarks is presented, based on 36.1 fb⁻¹ of pp collision data at √s = 13 TeV, collected with the ATLAS detector at the Large Hadron Collider in 2015 and 2016. The search focuses on decays of the Higgs boson to bb̄ and decays of the top-quark pair to a final state containing one or two leptons. Multivariate techniques are used to discriminate between signal and background events, the latter being dominated by tt̄ + jets production. The observed data are consistent with both the background-only hypothesis and with the standard model tt̄H prediction. A 1.4σ excess above the expected background is observed, while an excess of 1.6σ is expected in the presence of a standard model Higgs boson. The signal strength is measured to be $0.84^{+0.64}_{-0.61}$, consistent with the expectation from the standard model. A value higher than 2.0 is excluded at the 95% C.L., compared to an expected exclusion limit of 1.2 in the absence of signal. The measurement uncertainty is presently dominated by systematic uncertainties, and more specifically by the uncertainty in the theoretical knowledge of the tt̄ + ≥1b production process. An improved understanding of this background will be important for future efforts to observe the tt̄H(H → bb̄) process.

ACKNOWLEDGMENTS
We thank CERN for the very successful operation of the LHC, as well as the support staff from our institutions without whom ATLAS could not be operated efficiently. We acknowledge the support of ANPCyT … PIC (Spain), ASGC (Taiwan), RAL (UK) and BNL (USA), the Tier-2 facilities worldwide and large non-WLCG resource providers. Major contributors of computing resources are listed in Ref. [100].

FIG. 16. Postfit yields of signal (S) and total background (B) as a function of log(S/B), compared to data. Final-discriminant bins in all dilepton and single-lepton analysis categories are combined into bins of log(S/B), with the signal normalized to the SM prediction used for the computation of log(S/B). The signal is then shown normalized to the best-fit value and to the value excluded at the 95% C.L., in both cases summed to the background prediction from the fit. The lower frame reports for each bin the pull (residual divided by its uncertainty) of the data relative to the background prediction from the fit. These data pulls are compared to the pulls of the signal-plus-background prediction from the fit, assuming a signal strength equal to the best-fit value (solid red line) and equal to the exclusion limit (dashed orange line). The background and its pull are also shown after the fit to data assuming zero signal contribution (dashed black line, obscured by solid line in the upper frame). The first bin includes the underflow.

APPENDIX A: YIELD TABLES
The predicted event yields in each of the analysis categories, broken down into the different signal and background contributions and compared to the observed yields in data, are reported in Tables III, IV and V. Both the prefit and postfit predictions are shown, where postfit refers to the combined fit to the dilepton and single-lepton channels with the signal-plus-background hypothesis, reported in Sec. VIII. The total uncertainties of each of the signal and background components, and of the total prediction are also reported.

TABLE III. Event yields in the dilepton channel (top) control regions and (bottom) signal regions. Postfit yields are after the combined fit in all channels to data. The uncertainties are the sum in quadrature of statistical and systematic uncertainties in the yields. In the postfit case, these uncertainties are computed taking into account correlations among nuisance parameters and among the normalization of different processes. The uncertainty in the tt̄ + ≥1b and tt̄ + ≥1c normalization is not defined prefit and therefore only included in the postfit uncertainties; the reported prefit uncertainties on the tt̄ + ≥1b and tt̄ + ≥1c components arise only from acceptance effects. For the tt̄H signal, the prefit yield values correspond to the theoretical prediction and corresponding uncertainties, while the postfit yield and uncertainties correspond to those in the signal-strength measurement.

In this appendix, the full list of variables used as inputs to the classification BDT, described in Sec. VI, in each of the signal regions is reported. Variables are listed separately in Table VI for the dilepton channel, in Table VII for the resolved single-lepton channel and in Table VIII for the boosted category. Variables are grouped according to the type of information that is exploited. The variables from the reconstruction BDT exploit the chosen jet-parton assignments described in Sec. VI B. The b-tagging discriminant assigned to each jet is defined in Sec. III. The most powerful variables in the classification BDT are the reconstruction BDT output, the LHD (Sec. VI C) and the $\mathrm{MEM}_{D1}$ (Sec. VI D). The large-R jets used to build the Higgs-boson and top-quark candidates in the boosted category are defined in Sec. III. Some kinematic and topological variables are built considering only b-tagged jets in the event. The b-tagging requirements for these jets are optimized separately for each variable in each region to improve the classification BDT performance. In the resolved single-lepton channel, b-tagged jets are defined as the four jets with the largest value of the b-tagging discriminant. If two jets have the same b-tagging discriminant value, they are ordered by decreasing jet $p_{\mathrm{T}}$ value. In the dilepton channel, the b-tagging requirements depend on the signal region: in $\mathrm{SR}^{\geq 4j}_{1}$ the tight working point is used, in $\mathrm{SR}^{\geq 4j}_{3}$ the very tight working point is used and in $\mathrm{SR}^{\geq 4j}_{2}$ the loose working point is used with the exception of $N^{\mathrm{Higgs}\,30}_{bb}$, which uses the medium working point, and $\mathrm{Aplanarity}_{b\text{-jet}}$, which uses the tight working point.
The loose working point is used in the boosted signal region.

TABLE VI. Variables used in the classification BDTs in the dilepton signal regions. For variables from the reconstruction BDT, those with a * are from the BDT using Higgs-boson information, those with no * are from the BDT without Higgs-boson information while for those with a ** both versions are used. These two versions of the reconstruction BDT are described in Sec. VI B.
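The rule used in the resolved single-lepton channel to define b-tagged jets (the four jets with the largest b-tagging discriminant, with ties ordered by decreasing jet transverse momentum) amounts to a two-key sort. The sketch below illustrates it; the dictionary keys `btag` and `pt`, the function name and the example jets are hypothetical and not taken from the analysis software.

```python
def select_b_tagged_jets(jets, n_jets=4):
    """Return the n_jets jets with the largest b-tagging discriminant.

    Jets with equal discriminant values are ordered by decreasing pT.
    """
    ranked = sorted(jets, key=lambda jet: (-jet["btag"], -jet["pt"]))
    return ranked[:n_jets]


# Hypothetical event with six jets (pT in GeV, discriminant in arbitrary units).
example_jets = [
    {"name": "j1", "pt": 120.0, "btag": 0.30},
    {"name": "j2", "pt": 45.0, "btag": 0.95},
    {"name": "j3", "pt": 80.0, "btag": 0.95},
    {"name": "j4", "pt": 60.0, "btag": 0.70},
    {"name": "j5", "pt": 33.0, "btag": 0.10},
    {"name": "j6", "pt": 28.0, "btag": 0.70},
]
selected = [jet["name"] for jet in select_b_tagged_jets(example_jets)]
print(selected)
```

Ties are not a corner case when the discriminant is stored as a small number of calibrated working-point bins, which is why the tie-breaking rule is spelled out.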

Variable                                                  Definition

Variables from jet reclustering
$\Delta R_{H,t}$                                          ΔR between the Higgs-boson and top-quark candidates
$\Delta R_{t,b^{\mathrm{add}}}$                           ΔR between the top-quark candidate and additional b-jet
$\Delta R_{H,b^{\mathrm{add}}}$                           ΔR between the Higgs-boson candidate and additional b-jet
$\Delta R_{H,\ell}$                                       ΔR between the Higgs-boson candidate and lepton
$m_{\mathrm{Higgs\ candidate}}$                           Higgs-boson candidate mass
$\sqrt{d_{12}}$                                           Top-quark candidate first splitting scale [102]

Variables from b-tagging
$w_{b\text{-tag}}$                                        Sum of b-tagging discriminants of all b-jets
$w^{\mathrm{add}}_{b\text{-tag}}/w_{b\text{-tag}}$        Ratio of sum of b-tagging discriminants of additional b-jets to all b-jets

Higgs boson produced in association with top quarks and decaying into bb̄ in pp collisions at √s = 8 TeV with the ATLAS detector, Eur. Phys. J. C 75, 349 (2015