Measurement of the inclusive $t\bar{t}$ production cross section in the lepton+jets channel in $pp$ collisions at $\sqrt{s}$= 7 TeV with the ATLAS detector using support vector machines

A measurement of the top quark pair-production cross section in the lepton+jets decay channel is presented. It is based on 4.6 fb$^{-1}$ of $\sqrt{s} = 7$ TeV $pp$ collision data collected during 2011 by the ATLAS experiment at the CERN Large Hadron Collider. A three-class, multidimensional event classifier based on support vector machines is used to differentiate $t\bar{t}$ events from backgrounds. The $t\bar{t}$ production cross section is found to be $\sigma_{t\bar{t}}=168.5 \pm 0.7$(stat.)$^{+6.2}_{-5.9}$(syst.)$^{+3.4}_{-3.2}$(lumi.) pb. The result is consistent with the Standard Model prediction based on QCD calculations at next-to-next-to-leading order.


Introduction
In the Standard Model of particle physics [1][2][3], the top quark ($t$) and the bottom quark ($b$) belong to a doublet representation of the weak-isospin SU(2). The top quark is the most massive of the known elementary particles. Because its mass is close to the electroweak symmetry breaking scale, it may play a fundamental role in the mechanism of breaking of the SU(2) symmetry of the electroweak interaction. Top quark production is also the dominant background in many analyses looking for physics beyond the Standard Model at high mass scales at the Large Hadron Collider (LHC), and a good understanding of top quark production is a necessary step in many "new physics" searches.
The cross section is one of the simplest observables that can be measured in the $t\bar{t}$ system. It allows one to make important comparisons with theoretical predictions available at next-to-next-to-leading order in perturbative QCD, including soft-gluon resummation at next-to-next-to-leading-log order (NNLO+NNLL); see Ref. [4] and references therein. For $pp$ collisions at a center-of-mass energy of $\sqrt{s} = 7$ TeV, the predicted $t\bar{t}$ production cross section is $\sigma_{t\bar{t}}^{\mathrm{NNLO+NNLL}} = 177^{+10}_{-11}$ pb. This theoretical value was calculated with the Top++ 2.0 program [5], including soft-gluon resummation, assuming a top quark mass of 172.5 GeV, and using the PDF4LHC [6] procedures.
According to the Standard Model, top quarks from $pp$ collisions at the LHC are produced mostly via the strong interaction as $t\bar{t}$ pairs, with each top quark decaying into a $W$ boson and a $b$-quark nearly 100% of the time. The $t\bar{t}$ events in which one of the $W$ bosons decays into a quark pair and the other into a charged lepton and a neutrino are classified as "lepton+jets", as such events contain an electron, muon, or $\tau$-lepton, a neutrino, and typically four hadronic jets (two of which originate from the $b$-quark and $\bar{b}$-quark).
In this paper, a measurement of the top quark pair-production cross section at $\sqrt{s} = 7$ TeV using events with a single charged lepton (electron or muon) and jets in the final state is presented. The previously published result from the ATLAS Collaboration for the lepton+jets channel [7] uses 35 pb$^{-1}$ of data and obtains a precision of 12%. The most precise CMS $t\bar{t}$ cross-section measurement in the same channel [8] has a precision of 7%. In the dilepton channel, the best ATLAS result [9] achieves 3.5% precision, while CMS [10] reaches 3.6%. These ATLAS and CMS dilepton results have been combined [11], resulting in an uncertainty of 2.6% in $\sigma_{t\bar{t}}$.
The analysis presented in this paper is based on the full dataset collected with the ATLAS detector at the LHC in 2011, corresponding to an integrated luminosity of 4.6 fb$^{-1}$, and attains statistical and systematic uncertainties that are significantly lower than in previous ATLAS measurements in this final state. In an extension of the usual application of binary multivariate classifiers, this analysis uses a large number of variables to train three different support vector machines (SVMs). The three SVMs are used to define a three-dimensional space in which a multi-class event discriminator is constructed to identify the $t\bar{t}$ events through a simultaneous profile likelihood fit in four independent regions of this space.

ATLAS detector
The ATLAS detector is described in Ref. [12]. It is a multipurpose particle detector with forward-backward symmetry and a cylindrical geometry.¹ The inner tracking detectors are surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer with a magnetic field generated by three superconducting toroidal magnets of eight coils each. The inner-detector system (ID), in combination with the 2 T magnetic field from the solenoid, provides precision momentum measurements for charged particles within the pseudorapidity range $|\eta| < 2.5$. Moving radially outwards, it consists of a silicon pixel detector, a silicon microstrip detector, and a straw-tube tracker that also provides transition radiation measurements for electron identification. The calorimeter system covers the pseudorapidity range $|\eta| < 4.9$. A high-granularity liquid-argon (LAr) sampling calorimeter with lead absorber measures electromagnetic showers within $|\eta| < 3.2$. In the region matched to the ID, $|\eta| < 2.5$, the innermost layer has fine segmentation in $\eta$ to improve the resolution of the shower position and direction measurements. Hadronic showers are measured by an iron/plastic-scintillator tile calorimeter in the central region, $|\eta| < 1.7$, and by a LAr calorimeter in the end cap region, $1.5 < |\eta| < 3.2$. In the forward region, measurements of both electromagnetic and hadronic showers are provided by a LAr calorimeter covering the pseudorapidity range $3.1 < |\eta| < 4.9$. The muon spectrometer is instrumented with separate trigger and high-precision tracking chambers. It provides muon identification for charged-particle tracks within $|\eta| < 2.7$. The combination of all ATLAS detector systems provides charged-particle measurement along with lepton and photon measurement and identification in the pseudorapidity range $|\eta| < 2.5$. Jets are reconstructed over the full range covered by the calorimeters, $|\eta| < 4.9$.
A three-level trigger system [13] is used to select interesting events. The first-level (L1) trigger is implemented in hardware and uses a subset of detector information to reduce the event rate to a design value of at most 75 kHz. This is followed by two software-based trigger levels which together reduce the event rate to about 200 Hz.
An extensive software suite [14] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Object definitions

Electrons
Electron candidates are selected using the offline identification with tight requirements [15] within a fiducial region with transverse momentum $p_{\mathrm{T}} > 25$ GeV and $|\eta| < 2.47$, excluding the calorimeter transition region $1.37 < |\eta| < 1.52$. They are subjected to several other strict criteria, including requirements on track quality, impact parameter, calorimeter shower shape, and track-cluster matching. The electron candidates are also required to be isolated. The transverse energy ($E_{\mathrm{T}}$) deposited in the calorimeter in a cone of size $\Delta R = 0.2$ around the electron is calculated. Additionally, the scalar sum of the $p_{\mathrm{T}}$ of tracks in a cone of size $\Delta R = 0.3$ is determined. Both of these quantities have selection requirements that depend on the $\eta$ and $E_{\mathrm{T}}$ of the electron candidate, and which ensure 90% efficiency for electrons from $W$ boson or $Z$ boson decays [16]. Finally, electrons lying within $\Delta R = 0.4$ of a selected jet are discarded to reject leptons from heavy-flavor decays.

¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the center of the detector and the $z$-axis along the beam pipe. The $x$-axis points from the IP to the center of the LHC ring, and the $y$-axis points upwards. Cylindrical coordinates $(r, \phi)$ are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis. The pseudorapidity is defined in terms of the polar angle $\theta$ as $\eta = -\ln\tan(\theta/2)$, and the distance $\Delta R$ in $\eta$-$\phi$ space is defined as $\Delta R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$.

Muons
Muon candidates reconstructed from tracks in both the muon spectrometer and the ID are selected with the MuID algorithm [17]. Only candidates satisfying $p_{\mathrm{T}} > 20$ GeV and $|\eta| < 2.5$ are selected. Muon candidates are required to have a sufficient number of hits in the ID. The impact parameter with respect to the primary vertex in the longitudinal direction along the beam axis is required to satisfy $|z_0| < 2$ mm. The tight muon candidates used in this analysis are required to be isolated. The sum of the calorimeter transverse energy within $\Delta R = 0.2$ of a muon is required to be below 4 GeV, and the sum of the $p_{\mathrm{T}}$ of all the tracks within $\Delta R = 0.3$ (excluding the muon track) must be below 2.5 GeV. The efficiency of this combined isolation requirement varies between 95% and 97%, depending on the data-taking period. In order to reduce the background from muons produced by heavy-flavor decays inside jets, muons are required to be separated by $\Delta R > 0.4$ from the nearest selected jet.

Jets
Jets are reconstructed from topological clusters [18] formed from energy deposits in the calorimeters using the anti-$k_t$ algorithm [19,20] with a radius parameter of 0.4. Clusters are calibrated using the local cluster weighting (LCW), which differentiates between the energy deposits arising from electromagnetic and hadronic showers [21]. The jet reconstruction is done at the electromagnetic scale, and then a scale factor is applied in order to obtain the jet energy at the hadronic scale. The jet energy scale (JES) corrections account for the calorimeter response to the true jet energy by using "truth jets" from simulation. The truth jets are formed through the application of the anti-$k_t$ algorithm to stable particles, with the exception of final-state muons and neutrinos. Jet calibration includes both the LCW and JES calibrations. In addition, the jets are corrected for distortions due to multiple $pp$ collisions per bunch crossing (pileup) using a method which estimates the pileup activity on an event-by-event basis, as well as the sensitivity of a given jet to pileup. With this method [21], a contribution to the jet transverse momentum equal to the product of the jet area in the $\eta$-$\phi$ plane and the average transverse energy density in a given event is subtracted. The effects of additional collisions in either the same bunch crossing or those adjacent in time are taken into account using corrections which depend on the average number of interactions per bunch crossing and the number of primary vertices. To avoid double counting of jets and electrons (which are reconstructed by independent algorithms), jets within $\Delta R = 0.2$ of a reconstructed electron are removed. For this analysis, only jets in the central region of the detector, $|\eta| < 2.5$, and with transverse momentum $p_{\mathrm{T}} > 25$ GeV are considered.
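The area-based pileup subtraction described above can be sketched as follows (an illustrative fragment, not the ATLAS implementation; the jet area and the event's transverse-energy density are taken as given inputs):

```python
def pileup_corrected_pt(jet_pt, jet_area, rho):
    """Subtract the pileup contribution from a jet's transverse momentum:
    p_T -> p_T - rho * A, where A is the jet's area in the eta-phi plane
    and rho is the event's average transverse-energy density."""
    return jet_pt - rho * jet_area

# A 30 GeV jet with area 0.5 in an event with rho = 8 GeV per unit area
print(pileup_corrected_pt(30.0, 0.5, 8.0))  # 26.0
```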

Identification of b-jets
The identification of "$b$-jets" (jets arising from the decay of $B$-hadrons) is performed with the MV1 algorithm [22], which combines the outputs of three different tagging algorithms into a multivariate discriminant. Jets are defined to be "$b$-tagged" if the MV1 discriminant value is larger than a threshold (operating point) corresponding to 70% efficiency for identifying $b$-quark jets in simulated $t\bar{t}$ events. Approximately 20% of jets originating from charm quarks are identified as $b$-jets, while light-flavor jets are mistagged as $b$-jets at the 1% level.

Missing transverse momentum
The missing transverse momentum is calculated [23] as the negative of the vector sum, in the transverse plane, of calorimeter cell energies within $|\eta| < 4.9$, after all corrections are applied to the associated physics objects (including jets, electrons, and muons). A correction for significant energy deposits not associated with high-$p_{\mathrm{T}}$ physics objects is also included. The magnitude of the missing transverse momentum vector is denoted by $E_{\mathrm{T}}^{\mathrm{miss}}$, while its direction in the transverse plane is either denoted by an azimuthal angle $\phi$ or inferred through its vector components $E_{x}^{\mathrm{miss}}$ and $E_{y}^{\mathrm{miss}}$.

Event selection
This analysis considers the single-lepton decay channel of the $t\bar{t}$ pair. The selected events are required to have exactly one lepton (either an electron or a muon), a large amount of $E_{\mathrm{T}}^{\mathrm{miss}}$, and three or more hadronic jets. The number of $b$-tagged jets in an event must be two or fewer. Events must have at least one primary vertex with five or more tracks with $p_{\mathrm{T}} > 150$ MeV. If there is more than one primary vertex, the one with the largest $\sum p_{\mathrm{T}}^2$ of associated tracks is chosen. Events were collected using single-lepton triggers, and each lepton candidate must be matched to an appropriate lepton trigger. In the muon channel, events are selected with a single-muon trigger with a $p_{\mathrm{T}}$ threshold of 18 GeV. For the electron channel, a single-electron trigger with $p_{\mathrm{T}} > 20$ GeV is required; this threshold is increased to 22 GeV during high-instantaneous-luminosity periods. Three or more jets with $p_{\mathrm{T}}$ greater than 25 GeV are required in each event. A large amount of $E_{\mathrm{T}}^{\mathrm{miss}}$ is required to select events containing a neutrino. For electron events the $E_{\mathrm{T}}^{\mathrm{miss}}$ must be greater than 30 GeV, while for muon events the $E_{\mathrm{T}}^{\mathrm{miss}}$ is required to be greater than 20 GeV.
In order to reduce the background due to multi-jet production containing misidentified or nonprompt leptons, an additional selection requirement is imposed. Typically, in an event arising from this background, the missing transverse momentum vector points in the same direction in the transverse plane as the charged-lepton candidate. Therefore, electron candidate events must satisfy $m_{\mathrm{T}}(W) > 30$ GeV, while muon candidate events must have $m_{\mathrm{T}}(W) + E_{\mathrm{T}}^{\mathrm{miss}} > 60$ GeV. Here, $m_{\mathrm{T}}(W)$ is defined as $m_{\mathrm{T}}(W) = \sqrt{2\, p_{\mathrm{T}}^{\ell}\, E_{\mathrm{T}}^{\mathrm{miss}} \left(1 - \cos\Delta\phi\right)}$, where $\Delta\phi$ is the difference in $\phi$ between the direction of the charged-lepton transverse momentum, $p_{\mathrm{T}}^{\ell}$, and the missing transverse momentum vector.
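As a minimal numerical sketch (illustrative only), the transverse mass defined above can be computed as:

```python
import math

def mt_w(lep_pt, met, dphi):
    """m_T(W) = sqrt(2 * p_T(lepton) * E_T^miss * (1 - cos(delta phi)))."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))

# A 40 GeV lepton back-to-back with 40 GeV of missing transverse momentum
# gives m_T(W) = 80 GeV, the kinematic endpoint near the W boson mass.
print(mt_w(40.0, 40.0, math.pi))  # 80.0
```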

Data samples
The data sample used in this analysis comes from the $\sqrt{s} = 7$ TeV $pp$ collisions collected during LHC Run 1 in 2011 and was recorded during stable beam conditions with all relevant ATLAS subdetector systems operational. It corresponds to an integrated luminosity of 4.6 fb$^{-1}$, with an uncertainty of 1.8% [24].

Signal and background modeling
Except for the background due to multi-jet production leading to misidentified leptons (which is estimated from the data), all signal and background samples are modeled using Monte Carlo (MC) simulated events in conjunction with factors to correct the simulations to data where required.

Multi-jet background / fake leptons
The background from multi-jet events, in which a jet is misidentified as a muon or electron, or in which a nonprompt lepton within a jet passes the tight lepton selection requirements, is sizable because of the large multi-jet production cross section. Events from these two sources of multi-jet background are referred to as fake-lepton events. This background is estimated using the so-called "matrix method", which is based on the measurement of the selection efficiencies of leptons using data event samples satisfying relaxed identification criteria [25,26]. Loose electrons are electrons satisfying the baseline selection criteria in which the requirements on particle identification using transition radiation measurements and on the energy-to-momentum ratio ($E/p$) are eased, and no isolation requirement is imposed. For loose muons, isolation is not required, but all other selection criteria are applied. The matrix method is based on the measurement of the efficiencies for real and fake leptons in the loose lepton selection to pass the tight selection criteria. The real-lepton efficiencies are measured in $Z \to \ell\ell$ data samples, while the fake-lepton efficiencies are determined in data control regions with selection requirements designed to enhance the multi-jet content (one lepton, at least one jet, and only a small amount of $E_{\mathrm{T}}^{\mathrm{miss}}$). These efficiencies depend on both the lepton kinematics and the event characteristics. To account for this, event weights are computed from the efficiencies parametrized as a function of a number of observables, and are then used to reweight the sample of data events with lepton candidates that satisfy the loose but not the tight selection criteria. The sums of these weights provide estimates of the multi-jet background.
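A simplified single-bin version of the matrix method can be sketched as follows (illustrative only; the analysis uses per-event weights parametrized in several observables rather than a single yield-level solve):

```python
def fake_leptons_in_tight(n_loose, n_tight, eff_real, eff_fake):
    """Solve the 2x2 system
        N_loose = N_real + N_fake
        N_tight = eff_real * N_real + eff_fake * N_fake
    for the number of fake-lepton events contributing to the tight sample."""
    n_fake_loose = (eff_real * n_loose - n_tight) / (eff_real - eff_fake)
    return eff_fake * n_fake_loose

# 1000 loose / 550 tight events with eff_real = 0.9 and eff_fake = 0.2:
# 500 loose fakes, of which 20% pass the tight selection
print(round(fake_leptons_in_tight(1000, 550, 0.9, 0.2), 6))  # 100.0
```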

Monte Carlo samples
The samples used in this analysis were obtained from a simulation chain consisting of an event generator interfaced to a parton shower and hadronization model, the outputs of which were passed through a simulation of the ATLAS detector and trigger system [27] and then reconstructed with the same algorithms as the data. The ATLAS detector response was modeled with the ATLAS full simulation (FS) based on Geant4 [28]. For $t\bar{t}$ samples used to evaluate the signal modeling systematic uncertainties, the ATLAS fast simulation AtlasFast-II (AF) [27,29] was used to model the response of the detector. The $t\bar{t}$ signal was simulated using the NLO Powheg-hvq (patch 4) matrix element generator [30] interfaced with Pythia 6.425 [31], using parameter values set according to the C variant of the Perugia 2011 tune [32] to model the underlying event and parton shower. The NLO CT10 [33] parton distribution function (PDF) set was used for the NLO matrix element part, and the LO PDF set CTEQ6L1 [34] was used with Pythia. The top quark mass was fixed at 172.5 GeV. This sample is referred to as Powheg+Pythia 6. In order to evaluate the dependence on the choice of parton shower and fragmentation models, additional samples of $t\bar{t}$ events were created. The MC@NLO+Herwig sample was created with the NLO MC@NLO [35] generator interfaced with Herwig [36] using the LO AUET2 tune [37]. The Powheg+Herwig sample was created with Powheg interfaced to Herwig using the LO AUET2 tune.
The largest backgrounds to the $t\bar{t}$ events in the selected sample are from $W$+jets and $Z$+jets production. These were simulated with the LO event generator Alpgen 2.13 [38] with the LO PDF set CTEQ6L1, and interfaced with Herwig 6.52. Alpgen calculates matrix elements (ME) for final states with up to six partons. The MLM [39] matching procedure was used to remove the overlaps between ME and parton shower products in samples with $N$ and $N+1$ final-state partons. In addition to the inclusive parton-flavor processes, separate ME samples of $W+b\bar{b}$+jets, $W+c\bar{c}$+jets, $W+c$+jets, and $Z+b\bar{b}$+jets were generated. The double counting of $b$- and $c$-quarks in $W/Z$+jets that occurs between the shower of the inclusive samples and the ME of heavy-flavor production was eliminated using an overlap-removal algorithm based on parton-to-jet $\Delta R$ matching [40]. The $W$+jets and $Z$+jets event samples are referred to as Alpgen+Herwig.
The single-top backgrounds were simulated at NLO using the MC@NLO generator with the NLO PDF set CTEQ6.6 [34], interfaced with Herwig, except for the $t$-channel samples, which were modeled with the LO AcerMC 3.8 generator [41] interfaced with Pythia. Dibosons ($WW$, $WZ$, and $ZZ$) were generated with Herwig using the LO PDF set CTEQ6L1. All samples generated with Herwig for the parton shower evolution and hadronization used Jimmy 4.31 [42] for the underlying-event model.
The effects of pileup were modeled by overlaying simulated minimum-bias events on the hard-scattering events. The Monte Carlo events were then reweighted such that the distribution of the number of interactions per bunch crossing, $\langle\mu\rangle$, matched the shape and observed average of $\langle\mu\rangle = 9.1$ in the 2011 data.

Signal and background classes
The most challenging backgrounds (i.e., those which most resemble $t\bar{t}$) are single-top and $W/Z+b\bar{b}$+jets. Therefore, the $t\bar{t}$ cross-section measurement is expected to be affected most by the modeling of these backgrounds. To improve discrimination between the $t\bar{t}$ signal and the different types of background, this analysis separates the background events into two classes and treats them independently. The "Heavy" background class includes the Monte Carlo samples for single-top, $W+b\bar{b}$+jets, and $Z+b\bar{b}$+jets. All other types of background, including fake leptons, are assigned to the group designated as the "Light" class. Table 1 summarizes the composition of the classes and lists the datasets which are used to model them.
Table 1: Class definitions and compositions. In the process column, "lf" denotes any parton that is not a $b$-quark. The other columns show the source of the events and the fractional contribution to the given class.

The expected numbers of signal and background events in the selected sample are presented in Table 2. The uncertainties shown include theoretical uncertainties in the production cross sections of the processes [4,43-47]. The $W/Z$+jets and diboson uncertainties include a contribution derived from an event-yield comparison with Sherpa [48] Monte Carlo samples. The uncertainty in the number of events with fake leptons is estimated to be 20% for muons and 50% for electrons [49,50]. The observed number of events in data is in good agreement with the prediction.

Analysis method
The SVM is a binary learning algorithm [51]. For any two classes of events, the signed distance from a hyperplane that separates the events is the SVM discriminant. For the analysis presented in this paper, a system of three support vector machines is used to create a three-dimensional multi-class event classifier to distinguish signal events from two classes of background (i.e., Light and Heavy). For events from any dataset, the distances from the three hyperplanes, trained to distinguish between Signal vs. Light (SvL), Signal vs. Heavy (SvH), and Light vs. Heavy (LvH), are treated as the coordinates of points in a 3D decision space. The resulting templates of the prediction model are used in a binned likelihood fit to the analogous 3D distribution of the data events.
The SVM was chosen as the binary classifier because it is linear, it has firm mathematical foundations, and it offers a simple geometrical interpretation. Because the SVM method provides the solution to a straightforward convex optimization problem, the minimum it finds is a global one. The stopping point at the training stage is well defined, which makes the method robust against overtraining. The method also works well in problems involving a large number of observables.

The SVM discriminant
Each event is described by $N$ observables (i.e., features) and can be represented as a point, $\vec{z}$, in an $N$-dimensional feature space. A linear binary classifier finds a hyperplane of dimension $N-1$ that separates the two classes. Once the separating hyperplane is found, its reconstruction only requires the vectors that lie closest to the plane. These are the support vectors, from which the method derives its name. If the two classes to be discriminated are not linearly separable in the original feature space, this $N$-dimensional space can be mapped, $\vec{z} \to \vec{\phi}(\vec{z})$, into a higher-dimensional space in which the problem is linearly separable. Detailed knowledge of this mapping is not required when it is known how to calculate the inner product of the mapped vectors [52,53]. The distribution of classes in the mapped space can be probed directly by analyzing the multidimensional space which takes as its mathematical basis the SVM solutions for different class pairings.
The soft-margin SVMs [54] used in this analysis are constructed using a variant of the sequential minimal optimization algorithm [55] that includes the improvements suggested in Ref. [56].
In the case of a three-class problem like the one considered in this analysis, three different SVM classifiers are trained. Each SVM has the form
$$d_j(\vec{z}) = \langle w_j | \vec{\phi}(\vec{z})\rangle + b_j, \qquad |w_j\rangle = \sum_i \lambda_i\, y_i\, |\vec{\phi}(\vec{v}_i)\rangle,$$
which is the generalized equation of a plane. The $j$th SVM (with $j \in \{1, 2, 3\}$) has a normal vector given by $|w_j\rangle$ and a constant offset $b_j$ from the origin. The vectors $\vec{v}_i$ are the support vectors from training (all other training vectors have $\lambda_i = 0$), the $y_i$ are their "truth" values ($\pm 1$), and the $\lambda_i$ are parameters found in the training process along with $b_j$. Hence, $|w_j\rangle$ is a linear combination of training vectors mapped by $\vec{\phi}$ to an alternative vector space. The bra-ket notation here serves as a reminder that these vectors belong to this mapped space. The inner product of two vectors in the mapped space, given their non-mapped vectors $\vec{x}_1$ and $\vec{x}_2$, is determined via the kernel function $K(\vec{x}_1, \vec{x}_2) = \langle \vec{\phi}(\vec{x}_1) | \vec{\phi}(\vec{x}_2)\rangle$. The SVMs in this analysis use the Gaussian kernel
$$K(\vec{x}_1, \vec{x}_2) = \exp\!\left(-\frac{|\vec{x}_1 - \vec{x}_2|^2}{2\sigma^2}\right).$$
The width $\sigma$ is an input parameter of the training process, along with an additional positive constant $C$ which limits the range of the $\lambda$'s and is necessary for soft-margin SVMs.
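The discriminant and kernel above can be evaluated directly. The sketch below (illustrative, with made-up support vectors and weights) uses the kernel trick to compute the SVM output without ever forming the mapped vectors:

```python
import math

def gaussian_kernel(x1, x2, sigma):
    """K(x1, x2) = exp(-|x1 - x2|^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def svm_discriminant(z, support_vectors, lambdas, ys, b, sigma=1.2):
    """d(z) = sum_i lambda_i * y_i * K(v_i, z) + b; the sign of d(z)
    gives the predicted class, and |d(z)| the distance-like coordinate."""
    return sum(lam * y * gaussian_kernel(v, z, sigma)
               for v, lam, y in zip(support_vectors, lambdas, ys)) + b
```

With a single support vector equal to the test point, the kernel is 1 and the output reduces to $\lambda y + b$, a convenient sanity check.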
In order to construct an orthonormal basis from the three trained SVMs, the Gram-Schmidt procedure [57] is applied to their $|w_j\rangle$ vectors:
$$|e_1\rangle = \frac{|w_1\rangle}{\| w_1 \|}, \qquad |\tilde{e}_j\rangle = |w_j\rangle - \sum_{k=1}^{j-1} \langle e_k | w_j\rangle\, |e_k\rangle, \qquad |e_j\rangle = \frac{|\tilde{e}_j\rangle}{\| \tilde{e}_j \|} \quad (j = 2, 3).$$
Using this basis, 3-tuples $(X, Y, Z)$ for a decision space are created:
$$X(\vec{z}) = \langle e_1 | \vec{\phi}(\vec{z})\rangle, \qquad Y(\vec{z}) = \langle e_2 | \vec{\phi}(\vec{z})\rangle, \qquad Z(\vec{z}) = \langle e_3 | \vec{\phi}(\vec{z})\rangle.$$
In this way, an input vector $\vec{z}$ describing an event has coordinates in the $XYZ$ space given by calculating $(X(\vec{z}), Y(\vec{z}), Z(\vec{z}))$. It is these new coordinates in the decision space which are used to describe all events, and this is the space in which the 3D templates of the likelihood function are created.
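In a plain (non-kernelized) setting, the construction of the orthonormal basis and the decision-space coordinates looks like this (a sketch; in the kernelized analysis the inner products are evaluated via the kernel function rather than with explicit vectors):

```python
import numpy as np

def gram_schmidt(ws):
    """Orthonormalize the SVM normal vectors w_1, w_2, w_3."""
    basis = []
    for w in ws:
        v = np.array(w, dtype=float)
        for e in basis:
            v -= np.dot(v, e) * e      # remove components along earlier axes
        basis.append(v / np.linalg.norm(v))
    return basis

def decision_coordinates(z, ws):
    """(X, Y, Z): projections of an event vector onto the orthonormal basis."""
    return tuple(float(np.dot(z, e)) for e in gram_schmidt(ws))

# With these three normals, the basis reduces to the Cartesian axes
print(decision_coordinates([1.0, 2.0, 3.0],
                           [[1, 0, 0], [1, 1, 0], [0, 0, 2]]))  # (1.0, 2.0, 3.0)
```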

Physics observables
In this analysis, 21 physics observables are used to distinguish $t\bar{t}$ events from background events (see Table 3). Twenty are kinematic variables, and one encodes the $b$-tagging information of the event. These include the electron or muon momentum, the number of jets in the event, the magnitude and direction of the missing transverse momentum vector, sums of the jet momentum components, the first five Fox-Wolfram moments (FWM), $H_{\mathrm{T}}$, the two largest eigenvalues of the normalized momentum tensor, and the mass of the lepton+jets system ($m_{\ell j}$). $H_{\mathrm{T}}$ is the scalar sum of $E_{\mathrm{T}}^{\mathrm{miss}}$, the electron or muon $p_{\mathrm{T}}$, and the $p_{\mathrm{T}}$ of all jets passing the selection requirements.
Fox-Wolfram moments [58] were originally introduced for $e^+e^-$ colliders. The FWMs correspond to a decomposition of the event's phase space into Fourier modes on the surface of a sphere. They were modified for use at the Tevatron and the LHC to characterize the complex shapes of final states at hadron colliders [59,60]. They form a set of event-shape variables, and the $l$th FWM ($H_l$) is defined in the following way:
$$H_l = \frac{4\pi}{2l+1} \sum_{m=-l}^{l} \left| \sum_i \frac{E_{\mathrm{T},i}}{E_{\mathrm{T}}(\mathrm{total})}\, Y_l^m(\theta_i, \phi_i) \right|^2 .$$
The $Y_l^m$'s are the spherical harmonics, $i$ runs over all selected jets in the event, and $E_{\mathrm{T}}(\mathrm{total})$ represents the sum of the transverse energy from selected jets. The angles $\theta_i$ and $\phi_i$ indicate the direction of the $i$th jet. This analysis makes use of $H_1$ through $H_5$.
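By the spherical-harmonic addition theorem, the sum over $m$ collapses to Legendre polynomials in the opening angles between object pairs, which gives a compact way to evaluate the moments. The sketch below uses this equivalent pairwise form (illustrative only; normalization conventions vary between references):

```python
import numpy as np

def fox_wolfram(l, et, theta, phi):
    """l-th transverse-energy-weighted Fox-Wolfram moment, computed as
    H_l = sum_{i,j} w_i w_j P_l(cos Omega_ij), with w_i = E_T,i / E_T(total)
    and Omega_ij the opening angle between objects i and j."""
    w = np.asarray(et, dtype=float) / np.sum(et)
    coeffs = np.zeros(l + 1)
    coeffs[l] = 1.0                      # selects the Legendre polynomial P_l
    h = 0.0
    for i in range(len(w)):
        for j in range(len(w)):
            cos_omega = (np.cos(theta[i]) * np.cos(theta[j])
                         + np.sin(theta[i]) * np.sin(theta[j])
                         * np.cos(phi[i] - phi[j]))
            h += w[i] * w[j] * np.polynomial.legendre.legval(cos_omega, coeffs)
    return float(h)

# H_0 is 1 by construction, independent of the event topology
print(round(fox_wolfram(0, [30.0, 20.0], [0.5, 2.0], [0.1, 1.5]), 10))  # 1.0
```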
The normalized momentum tensor uses the $E_{\mathrm{T}}^{\mathrm{miss}}$ and the momenta of the lepton and up to five jets, and has the following form:
$$M_{ij} = \frac{\sum_a p_i^{(a)} p_j^{(a)}}{\sum_a |\vec{p}^{(a)}|^2} .$$
Here $a$ runs over the selected objects, and $i$ and $j$ run over the $x$, $y$, and $z$ components of momentum (for $E_{\mathrm{T}}^{\mathrm{miss}}$, only the $x$ and $y$ components are nonzero). The two largest eigenvalues of this "p-tensor" are used as SVM inputs.
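The eigenvalue computation is straightforward with a linear-algebra library (a sketch with illustrative input momenta):

```python
import numpy as np

def ptensor_eigenvalues(momenta):
    """Eigenvalues of the normalized momentum tensor
    M_ij = sum_a p_i^(a) p_j^(a) / sum_a |p^(a)|^2,
    sorted in decreasing order; they are non-negative and sum to 1."""
    p = np.asarray(momenta, dtype=float)      # one row per object: (px, py, pz)
    m = p.T @ p / np.sum(p ** 2)
    return np.sort(np.linalg.eigvalsh(m))[::-1]

# Two orthogonal unit momenta in the transverse plane
print(ptensor_eigenvalues([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```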
Because the lepton+jets decays are rotationally invariant in $\phi$, some variables are calculated with respect to the lepton direction in the plane transverse to the beam. Hence, for the momenta of jets, $p_{\parallel}$ and $p_{\perp}$ denote the components which are parallel and perpendicular to the direction of the lepton in the transverse plane. Similarly, the $\phi(E_{\mathrm{T}}^{\mathrm{miss}})$ variable is the angle between the transverse momentum of the lepton and the missing transverse momentum vector, and $p_{\parallel}$ for the lepton corresponds to its entire transverse momentum.
The SVMs treat each variable as one of the coordinates of a point in a 21-dimensional space. The algorithm requires that each variable fall roughly in the same numeric range so that all features contribute a similar weight when evaluating the distance from the separating hyperplane. The variables which have values outside the range $(-1, +1)$ are transformed such that they approximately meet this requirement. All input variables and the values that were used to scale them are listed in Table 3.

SVM training
The SVMs are trained to separate three classes of events: the Light and Heavy backgrounds and the $t\bar{t}$ signal. In order to train the SVMs, the Monte Carlo simulation samples and a data sample representing the multi-jet background are split into two subsamples. For training purposes, events from each class are randomly selected from those passing the selection requirements. The remaining events are used to test how well the trained SVMs perform, and only these remaining events are utilized in the subsequent analysis.
The training process aims to find the set of support vectors that forms the optimal decision plane in the mapped space induced by the kernel function. As described in Section 7.1, there are two free parameters that need to be specified when training: the $\sigma$ parameter of the Gaussian kernel and $C$, the positive constant which constrains the $\lambda$'s in the solution. A search grid over the values of these parameters was implemented, and the performance of a given training was evaluated based upon the area under the resulting receiver operating characteristic (ROC) curve created with the events not used in the training. As a result of this study, the values $\sigma = 1.2$ and $C = 2.0$ were chosen. The $t\bar{t}$ Signal class and the two background classes, Light and Heavy, each used 8,000 events for training, which is a small fraction of the total available events. Increasing the number of training events was not found to improve performance.
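This kind of hyperparameter scan can be illustrated with a generic SVM implementation (a sketch on toy data, not the analysis code; note that scikit-learn's RBF kernel is parametrized by $\gamma = 1/(2\sigma^2)$):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy two-class sample standing in for signal and background training events
x = np.vstack([rng.normal(0.0, 1.0, size=(200, 4)),
               rng.normal(1.0, 1.0, size=(200, 4))])
y = np.hstack([np.zeros(200), np.ones(200)])

# Scan (sigma, C), scoring each grid point by the area under the ROC curve
sigmas = [0.6, 1.2, 2.4]
grid = {"gamma": [1.0 / (2.0 * s ** 2) for s in sigmas], "C": [0.5, 2.0, 8.0]}
search = GridSearchCV(SVC(kernel="rbf"), grid, scoring="roc_auc", cv=3)
search.fit(x, y)
print(search.best_params_, round(search.best_score_, 3))
```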
It was also verified that the trained SVMs were not overtrained (i.e., that their discriminant distributions generalize well from the training set to the full class dataset).

Class templates
Different physics processes can be distinguished by their distinct distributions in the $XYZ$ decision space. These distributions are obtained by applying the Gram-Schmidt procedure to each event's SVM output values. Histogramming the different physics processes in the resulting 3D decision space creates probability distribution functions (i.e., templates) that can then be used in the likelihood fit. In order to minimize the potential effect of small fluctuations in the modeling, a small set of wide bins was constructed. The full 3D $XYZ$ decision space was organized into four quadrants by dividing the space at $Y = 0$ and $Z = -0.01$. Each quadrant was then further divided into bins along the $X$ axis. The quadrants are designated YZ, Yz, yZ, and yz, where the capitalization of the letters indicates where the quadrant is located (e.g., Yz means $Y > 0.00$ and $Z \leq -0.01$). The division points for $X$ were chosen to keep a minimum of approximately 1,000 events in each bin while preserving the shapes of the distributions. It is these four binned distributions from the quadrants that are used for the final profile likelihood fit.
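The quadrant assignment amounts to two threshold comparisons (an illustrative sketch):

```python
def quadrant(y, z, y_cut=0.0, z_cut=-0.01):
    """Label an event's (Y, Z) decision-space position: a capital letter
    means the coordinate is above its cut, lowercase means at or below."""
    return ("Y" if y > y_cut else "y") + ("Z" if z > z_cut else "z")

print(quadrant(0.3, -0.5), quadrant(-0.1, 0.2))  # Yz yZ
```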

Cross-section measurement
A binned profile likelihood function is used in a fit to determine the $t\bar{t}$ cross section from the data. In the likelihood fit, four templates are used: $t\bar{t}$, $W/Z$, single-top, and fake lepton. In evaluating the systematic uncertainties, particularly with respect to the modeling of $t\bar{t}$, it was observed that a large uncertainty arises from the similarity between the final states of $t\bar{t}$ events and single-top background events. To alleviate this effect, the single-top backgrounds, which arise from electroweak processes rather than the strong interactions responsible for the production of $t\bar{t}$ pairs, are combined into a single template normalized to their predicted cross sections. The $W/Z+b\bar{b}$+jets, the light-flavor $W/Z$+jets, and the diboson backgrounds are combined into a $W/Z$ template. The normalizations of the $t\bar{t}$, $W/Z$, and fake-lepton templates are free parameters of the fit.
The grouping of physics processes used when constructing the templates for the fit can differ from the class definitions used for training. At the training stage, the events are arranged in order to create SVMs that can distinguish between the $t\bar{t}$ signal, the Light backgrounds, and the Heavy backgrounds. After training, each physics process can be reassigned to a template. The chosen allocation of the physics processes to four templates ($t\bar{t}$, $W/Z$, single-top, and fake lepton) results in smaller expected uncertainties in the $t\bar{t}$ cross section.
The likelihood function uses the templates (projections onto the X axis from each of the quadrants) that have been built in the XYZ decision space. Each template has an associated strength parameter θ in the likelihood. The maximum value of the likelihood is obtained in determining the central values of the θ parameters. The systematic uncertainties of the fit results are also included in the likelihood as nuisance parameters (NPs, or α's) with Gaussian constraints. Each template is a function of the nuisance parameters in the likelihood, which is then able to capture the effects due to each source of systematic uncertainty.
The likelihood of an unknown sample for an n_T template problem is defined as

L(θ, α) = ∏_i P(E_i, o_i) × ∏_j G(α_j, σ_j).

Here G(α_j, σ_j) is the Gaussian constraint for the j-th NP, α_j, with the corresponding uncertainty σ_j; and P(E_i, o_i) is the Poisson probability mass function for the i-th bin given the observed number of events, o_i, and the expected number of events, E_i:

P(E_i, o_i) = E_i^{o_i} e^{−E_i} / o_i! ,   with   E_i = ∑_{j=1}^{n_T} θ_j N(α)_j T_j(i, α).

The templates are constructed such that T_j(i, α) gives the fractional number of events in template T_j's i-th bin. Consequently, the sum over all bins of a given template is equal to 1. The N(α)_j are defined to be the total number of events expected from the j-th template assuming an integrated luminosity L. To calculate this value, the following sum over all modeled processes belonging to the j-th template is computed:

N(α)_j = L ∑_{k ∈ j} σ_k ϵ_k.

The σ_k and ϵ_k are the cross section and acceptance for the k-th physics process. They are derived from MC simulation. For the multi-jet background, N_j is taken from the fake-lepton estimate.
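As a numerical illustration, a minimal sketch of this likelihood (dropping the nuisance parameters and their Gaussian constraints, with invented template contents and yields) can be written as:

```python
import math

def neg_log_likelihood(thetas, yields, templates, observed):
    """-ln L for an n_T-template fit, nuisance parameters omitted.

    thetas[j]    : strength parameter theta_j for template j
    yields[j]    : expected total events N_j for template j
    templates[j] : fractional bin contents T_j (each sums to 1)
    observed[i]  : observed count o_i in bin i
    """
    nll = 0.0
    for i, o in enumerate(observed):
        # expected events in bin i: E_i = sum_j theta_j * N_j * T_j(i)
        E = sum(t * n * T[i] for t, n, T in zip(thetas, yields, templates))
        # Poisson term E^o * exp(-E) / o!, accumulated in log form
        nll -= o * math.log(E) - E - math.lgamma(o + 1)
    return nll

# The likelihood is maximal (nll minimal) when the prediction matches data:
best = neg_log_likelihood([1.0], [100.0], [[0.5, 0.5]], [50, 50])
off  = neg_log_likelihood([1.5], [100.0], [[0.5, 0.5]], [50, 50])
assert best < off
```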
A maximum-likelihood fit is performed to extract the values of the θ and α parameters. The 1σ uncertainty for a given parameter is taken to be the change in the value of that parameter which causes ln L to decrease by 0.5 away from ln L_0, when ln L is maximized with respect to all other free parameters and where ln L_0 is the global maximum. All θ and α parameters have both their +1σ and −1σ uncertainties determined in this way.
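The Δ ln L = 0.5 prescription can be illustrated with a one-parameter Poisson likelihood; the example count and scan step are arbitrary choices, not values from the analysis:

```python
import math

def lnL(mu, n):
    """Poisson log-likelihood for expectation mu given observed count n."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def one_sigma_interval(n, step=1e-3):
    """Scan away from the maximum (at mu = n) until ln L drops by 0.5."""
    target = lnL(n, n) - 0.5
    lo = float(n)
    while lnL(lo, n) > target:
        lo -= step
    hi = float(n)
    while lnL(hi, n) > target:
        hi += step
    return lo, hi

lo, hi = one_sigma_interval(100)
# Close to the Gaussian +-sqrt(n) = +-10 expectation, slightly asymmetric:
assert 89.5 < lo < 90.5 and 109.5 < hi < 110.5
```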
The θ for the t t class, multiplied by the assumed cross section, gives the measured value of the t t cross section. Similarly, the uncertainty in θ_tt from the fit, multiplied by the assumed cross section, gives the uncertainty in the σ_tt measurement.

Systematic uncertainties
All systematic uncertainties were evaluated using the profile likelihood fit. The systematic effects are incorporated into the templates, and each template is associated with appropriate nuisance parameters (α's) in the likelihood. A nuisance parameter that takes a value of 0 in the fit keeps the nominal template, while a value of +1 or −1 changes the template to look like the +1σ/−1σ effect. Templates at intermediate values of the α's are linearly interpolated. A Gaussian constraint is also applied to each α in order to propagate its controlled uncertainty when the data have no preference for that systematic effect. The profile likelihood fit then provides a simultaneous measurement of the θ and α parameters. In this way, the systematic effects are converted into a statistical framework that properly takes into account correlations and which can potentially lower the uncertainties in the measurement.
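The bin-by-bin linear interpolation between the nominal and ±1σ templates might look like the following sketch; the function name and template values are illustrative:

```python
def interpolated_template(nominal, up, down, alpha):
    """Bin-by-bin linear interpolation of a template in one NP.

    alpha = 0 keeps the nominal template; alpha = +1 (-1) reproduces the
    +1sigma (-1sigma) variation; intermediate values interpolate linearly.
    """
    out = []
    for n, u, d in zip(nominal, up, down):
        out.append(n + alpha * (u - n) if alpha >= 0 else n + alpha * (n - d))
    return out

nominal, up, down = [10.0, 20.0], [12.0, 18.0], [9.0, 21.0]
assert interpolated_template(nominal, up, down, 0.0) == nominal
assert interpolated_template(nominal, up, down, 1.0) == up
assert interpolated_template(nominal, up, down, -1.0) == down
assert interpolated_template(nominal, up, down, 0.5) == [11.0, 19.0]
```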
The individual effects of various sources of systematic uncertainty are displayed in Table 4. They are obtained by leaving groupings of nuisance parameters out of the fit, and calculating the square of each effect as the difference of the squares of the total error and the residual error.
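This subtraction in quadrature amounts to the following; the numerical values are illustrative, not results from Table 4:

```python
import math

def grouped_impact(total_err, residual_err):
    """Uncertainty attributed to a group of NPs, from the total error of the
    full fit and the residual error of a fit with that group removed:
    sigma_group = sqrt(sigma_total^2 - sigma_residual^2)."""
    return math.sqrt(total_err**2 - residual_err**2)

# e.g., total error 5.0 pb, residual error 3.0 pb -> group contributes 4.0 pb
assert grouped_impact(5.0, 3.0) == 4.0
```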

Object modeling
Systematic uncertainties in the lepton selection arise from uncertainties in lepton identification, reconstruction, and triggering. These are evaluated by applying tag-and-probe methods to Z → ℓℓ events [16].
Uncertainties due to the energy scale and resolution are also considered for electrons and muons. These effects are evaluated by assigning each of them a separate nuisance parameter in the likelihood so as to allow the error source to be shifted both upwards and downwards by its uncertainty. The resulting systematic effects are summarized in Table 4 as Leptons.
For jets, the main source of uncertainty is the jet energy scale (JES). The JES and its uncertainty are evaluated using a combination of test-beam data, LHC collision data, and simulation [21]. As a result of the in situ analyses for the calibration of the full 2011 dataset, the correlations between various JES uncertainty components are encoded in 21 subcomponents. These include statistical and method uncertainties, detector uncertainties, modeling and theory uncertainties, mixed detector and modeling uncertainties, and pileup. The JES uncertainty is evaluated by assigning a separate NP to each of these 21 JES subcomponents. The jet energy resolution is separated by process (t t, single-top, and W/Z+jets) and is assigned three corresponding NPs. These extra degrees of freedom allow differences in the kinematics and the prevalence of b-quark, light-quark, and gluon jets in these processes to be better represented in the profile likelihood. The resulting uncertainties in σ_tt from these sources are indicated in Table 4 as Jets.
The jet-flavor-dependent efficiencies of the b-tagging algorithm are calibrated using dijet events and dilepton t t events from data. Differences in the b-tagging efficiency as well as in the c-jet and light-jet mistag rates between data and simulation are parametrized using correction factors, which are functions of p_T and η [22]. The b-tag systematic uncertainties were evaluated by constructing nine NPs that correspond to unique bins in jet p_T, as the uncertainties at low and high jet p_T should be largely uncorrelated. Single NPs were used for the c-tag and Mistag systematic uncertainties. These systematic effects appear in Table 4 as three uncertainties labeled b-tag, c-tag, and Mistag.
During the variation of nuisance parameters related to jets and leptons, the E_T^miss is recalculated in accordance with the changes caused by those systematic effects. In this way, the jet and lepton uncertainties are propagated to the E_T^miss. However, the E_T^miss uncertainty due to calorimeter cells not assigned to any other physics object is evaluated individually. Also, an additional 6.6% uncertainty due to pileup is applied to the E_T^miss. Both of these are given separate NPs in the profile likelihood, and they are listed in Table 4 under Missing transverse momentum.

Modeling of t t events
Systematic uncertainties due to the choice of t t MC generator are evaluated by taking the full difference between Powheg+Herwig (AF) and MC@NLO+Herwig (AF). The systematic uncertainty due to the choice of parton shower model is taken as the full difference between Powheg+Herwig (AF) and Powheg+Pythia 6 (AF). These are listed as Generator and Shower/hadronization in Table 4, respectively.
The systematic error due to uncertainties in the modeling of initial- and final-state radiation (ISR/FSR) is evaluated using Alpgen interfaced to Pythia 6. Monte Carlo samples were created in which the parameter that controls the amount of ISR/FSR in Alpgen was either halved or doubled. Half of the spread between the Alpgen samples with raised and lowered ISR/FSR parameter values is taken as the systematic error.
The uncertainty due to renormalization and factorization scales is evaluated with two modified samples, generated with MC@NLO interfaced with Herwig, in which the parameters controlling the renormalization and factorization scales, introduced to cure the ultraviolet and infrared divergences in matrix-element calculations, are simultaneously either halved or doubled. The full difference between the two samples is taken as the Renormalization/factorization error.
Each of the major t t modeling systematic uncertainties (Generator, Shower/hadronization, ISR/FSR, and Renormalization/factorization) is given a shape NP in each quadrant. The uncertainty in the normalization of events is assigned two NPs. One of these is used to track the migration of events between quadrants, and it mirrors the change in the relative normalization of the quadrants as observed when comparing the nominal and systematically shifted samples. The second NP is taken as an overall normalization error which corresponds to the normalization difference seen for the full event selection (where all four quadrants are combined). Therefore, each of the t t modeling uncertainties mentioned above has six NPs (four for shape, and two for normalization).
The underlying-event modeling error is evaluated by comparing two different t t MC event samples produced with varied parameters in Powheg+Pythia 6. One was generated with the Perugia 2011 central tune, and the other with Perugia 2011 mpiHI [32]. Both of these samples use the P2011 CTEQ5L Pythia tune, and not the P2011C CTEQ6L1 tune, which applies to the nominal t t MC sample. Their full difference is used as the measurement uncertainty for the underlying event, using a single NP in the profile likelihood.
All particles in the final state from the LHC pp collisions must be color singlets. Different schemes for the color reconnection (CR) of the beam remnant and other outgoing hard collision objects are examined. The t t cross-section uncertainty due to this effect is estimated by comparing two different t t MC samples produced with Powheg+Pythia 6. A reference sample was obtained using the Perugia 2011 central tune. The other sample was generated with Perugia 2011 noCR [32], and has modified color reconnection parameters. Both samples use the P2011 CTEQ5L Pythia tune. The full difference between these two samples is taken as the CR uncertainty, using a single NP.
To estimate the uncertainty due to the choice of parton distribution function, the CT10 PDF set parametrization is examined using its 26 upwards and downwards systematic variations. Each of the 26 CT10 eigenvector components is assigned a separate NP in the profile likelihood.

Background modeling
To estimate the error due to the shape of the W/Z+jets backgrounds and to assess the effect of any mismodeling, background samples are reweighted to match data for each of the following variables, taken one at a time, in a signal-depleted control region: lepton E, ϕ(E_T^miss), m_ℓj, jets p_∥, and jets p_⊥. This control region was defined as events matching the nominal selection, but containing exactly three jets, none of which are b-tagged. The reweighting functions were applied only to the W/Z+jets samples (leaving t t, single-top, and fake lepton untouched). Five NPs in the profile likelihood implement these functions such that the NPs turn the reweighting effects on and off, each according to the differences seen for these five variables in the data. In Table 4, these effects appear under the heading W/Z reweighting. For the single-top shape, AcerMC samples with raised and lowered ISR/FSR parameter values are compared, and a single NP is assigned to this systematic uncertainty.
The effects due to the uncertainty of the single-top, W+jets, and Z+jets cross sections are investigated by varying these cross sections within their theoretical errors. For the W+jets background, a 4% uncertainty is applied to the inclusive W boson cross section, with an additional 24% uncertainty added in quadrature at each ascending jet multiplicity [39,61]. This method is also applied to the Z+jets cross sections. To evaluate the systematic uncertainty in the t t cross section due to the theoretical uncertainties in the single-top cross section, the single-top cross section is varied in accordance with the theoretical results taken from Refs. [45-47]. The relative normalization within the W/Z+jets MC sample is varied by raising and lowering the corresponding nominal relative yields of each jet multiplicity by their respective errors. Similarly, the relative normalization between fake-electron and fake-muon events is varied by raising and lowering their nominal predictions by their errors. The resulting effects are evaluated using appropriate NPs added in the profile likelihood fit. This uncertainty is quoted as W/Z and fakes relative normalization in Table 4. Also, the uncertainty due to variations of the W+jets heavy-flavor fraction is included via three NPs in the profile likelihood. These NPs place an additional 25% uncertainty on each of the assumed W+b b+jets, W+cc+jets, and W+c+jets cross sections. Table 4 summarizes these in the row labeled Heavy-flavor fraction.
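Our reading of the W+jets normalization prescription (a 4% inclusive uncertainty, with 24% added in quadrature per step in jet multiplicity) can be expressed as the following sketch; the function name is ours:

```python
import math

def wjets_relative_uncertainty(n_jets, inclusive=0.04, per_jet=0.24):
    """Relative cross-section uncertainty for the W + n_jets bin, combining
    the 4% inclusive uncertainty with 24% per jet, in quadrature."""
    return math.sqrt(inclusive**2 + n_jets * per_jet**2)

# 0 jets: just the inclusive 4%; each additional jet adds 24% in quadrature
assert abs(wjets_relative_uncertainty(0) - 0.04) < 1e-12
assert abs(wjets_relative_uncertainty(1) - math.sqrt(0.04**2 + 0.24**2)) < 1e-12
```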

Template statistics / luminosity
For the profile likelihood fit, an additional fit parameter is introduced for each bin. These parameters represent the Poisson fluctuation of the predicted number of events in each bin, as estimated from the size of the Monte Carlo samples. The error propagated to θ_tt from these additional parameters is then an appropriate representation of the MC statistical error.
The integrated luminosity measurement has an uncertainty of 1.8% [24], and therefore each physics process is assigned an uncertainty of this magnitude. This systematic error is controlled by a single nuisance parameter in the likelihood.
The total measurement uncertainty, including individual groups of contributions, is listed in Table 4. The largest uncertainties are due to the lepton selection and luminosity, followed by the uncertainties due to JES, b-tagging, ISR/FSR, and other t t modeling.

Beam energy
The LHC beam energy during the 2012 √s = 8 TeV pp run was measured to be within 0.1% of the nominal value of 4 TeV per beam, using the revolution frequency difference of protons and lead ions during p+Pb runs in early 2013 combined with the magnetic model errors [62]. A similar uncertainty in the beam energy is applicable to the 2011 LHC run. The approach used in Ref. [63] was therefore applied to the measurement using the √s = 7 TeV dataset. The uncertainty in the t t theoretical cross section due to this energy difference was calculated to be 0.27%, using the Top++ 2.0 program [5] and assuming that the relative change of the t t cross section for a 0.1% change in √s is as predicted by the NNLO + NNLL calculation. It is negligible compared to other sources of systematic uncertainty.

Table 4: Summary table of the measurement uncertainties. Because the profile likelihood fit accounts for correlations, the total error is not simply the components added in quadrature. Individual effects were obtained by leaving groupings of NPs out of the fit, and calculating the square of each effect as the difference of the squares of the total error and the residual error.
This result includes all systematic uncertainties as evaluated with the profile likelihood fit, with the statistical and luminosity errors listed separately.
Figure 2 shows a comparison between the observed and fitted numbers of events in each of the quadrants. A correlated χ² test was used to check that there is good agreement between the data and the fit results within the combined statistical and systematic error bands.
Comparison plots between data and the fit prediction are shown for a few selected input variables in Figure 3 for all events. Analogous comparisons in signal-rich and background-rich regions of the XYZ space are shown in Figures 4 and 5, respectively. The signal-rich region is defined by X > 0, Y > 0, and Z < 0, while the background-rich region lies in the opposite octant of XYZ space, and has X < 0, Y < 0, and Z > 0. The X dimension corresponds to the Signal vs. Light decision hyperplane, while the Y and Z dimensions are linear combinations of the other SVM hyperplanes and are the directions orthogonal to X. Based on a correlated χ² test, the data and the fit agree well within the combined statistical and systematic error bands for all 21 variables.
The measured t t cross section is in good agreement with the theoretical prediction based on the NNLO + NNLL calculations of σ_tt^{NNLO+NNLL} = 177^{+5}_{−6}(scale) ± 9(PDF+α_s) pb = 177^{+10}_{−11} pb for pp collisions at a center-of-mass energy of √s = 7 TeV and a top quark mass of 172.5 GeV [4].
The ATLAS measurement of the t t cross section at 7 TeV in the dilepton channel [9] is σ_tt = 182.9 ± 3.1(stat.) ± 4.2(syst.) ± 3.6(lumi.) pb. Depending upon the assumptions made for the systematic uncertainty correlations between these two measurements, the significance of their discrepancy was found to be in the

Top quark mass dependence
The result of the profile likelihood fit depends on the assumed mass of the top quark through differences in t t acceptance owing to lepton kinematics, and also from minor variations in the shape of the discriminant. The analysis in this paper assumes a top quark mass of m_ref = 172.5 GeV. The 2014 average of Tevatron and LHC Run 1 measurements of the top quark mass [64] gives a value of m_t = 173.34 ± 0.27(stat.) ± 0.71(syst.) GeV. The current ATLAS average from 2019 [65] yields m_t = 172.69 ± 0.25(stat.) ± 0.41(syst.) GeV.
The dependence of the t t cross section on the mass of the top quark was determined through alternative profile likelihood fits that assume different top quark masses. Monte Carlo samples for both t t and single-top with top quark mass values of 165.0, 170.0, 172.5, 175.0, 177.5, and 180.0 GeV were employed to measure the t t cross section assuming each of these masses. These measurements were then fitted to obtain the t t cross section's top quark mass dependence. When constrained to go through this measurement's central value at 172.5 GeV, the best-fit second-order polynomial for the t t cross section as a function of ∆m_t = m_t − m_ref is σ_tt(∆m_t) = 0.016 · ∆m_t² − 0.75 · ∆m_t + 168.5 pb, with ∆m_t in GeV.
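The quoted polynomial can be evaluated directly; for example:

```python
def sigma_tt(m_top, m_ref=172.5):
    """ttbar cross section (pb) vs assumed top quark mass (GeV), from the
    quoted best-fit polynomial constrained to 168.5 pb at m_ref = 172.5 GeV."""
    dm = m_top - m_ref
    return 0.016 * dm**2 - 0.75 * dm + 168.5

assert sigma_tt(172.5) == 168.5
# The measured cross section falls with increasing assumed top quark mass:
assert sigma_tt(175.0) < sigma_tt(172.5) < sigma_tt(170.0)
```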

Summary
A measurement of the top quark pair-production cross section in the lepton+jets channel was performed with the ATLAS experiment at the LHC, using a multivariate technique based on support vector machines.
The measurement was obtained with a three-class, multidimensional event classifier. It is based on 4.6 fb⁻¹ of data collected during 2011 in pp collisions at √s = 7 TeV. The t t cross section is found to be σ_tt = 168.5 ± 0.7(stat.) ^{+6.2}_{−5.9}(syst.) ^{+3.4}_{−3.2}(lumi.) pb, which has a relative uncertainty of ^{+4.2}_{−4.0}%. This measurement is consistent with the Standard Model prediction based on QCD calculations at next-to-next-to-leading order.
France; DFG and AvH Foundation, Germany; Herakleitos, Thales and Aristeia programmes co-financed by EU-ESF and the Greek NSRF, Greece; BSF-NSF and MINERVA, Israel; Norwegian Financial Mechanism 2014-2021, Norway; NCN and NAWA, Poland; La Caixa Banking Foundation, CERCA Programme Generalitat de Catalunya and PROMETEO and GenT Programmes Generalitat Valenciana, Spain; Göran Gustafssons Stiftelse, Sweden; The Royal Society and Leverhulme Trust, United Kingdom.

Appendix: Fit visualization
The fit to the data can be difficult to visualize because of the 3D nature of the decision space. A series of 2D and 1D projections of the 3D space have been created in order to better illustrate its characteristics.
Projections are provided in the XY, XZ, and YZ planes to give a qualitative comparison of the fit results with the data in XYZ space, and also to help visualize the definitions of the signal-rich and background-rich regions employed in Figures 4 and 5. The 2D projections of the 3D decision space use the standard fit results in XYZ space, but project the constant binning obtained with cubes of edge length 0.008 in that space. On the right side of Figure 6, the 2D projection plots show the fractional relative difference between the number of events predicted by the fit and the observed data. The 2D plots on the left side of Figure 6 show the class composition of each bin when projected onto the XY, XZ, and YZ planes. The rectangle representing each 2D bin is colored from top to bottom such that the colors that fill it are in the same ratio as the predictions for each class in that bin. Contour lines, which each represent the concentration of events in the plane, are drawn on top of the colored bins. Contours exist for each of the following fractional values of the maximal 2D bin height: 0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.90, and 0.95.
The 1D projections of the 3D decision space onto the X, Y, and Z axes are shown in Figure 7 for the results of the fit to the data. For visualization purposes, the full 3D decision space is projected onto each of these axes. These projections have the systematic error bands superimposed, and also include the ratio of data to fit prediction.

Figure 7: The 1D projections of the data and expected events after the likelihood fit. X, Y, and Z form an orthogonal basis in the decision space defined by the three trained SVMs. X corresponds to the Signal vs. Light plane. Y and Z are the directions orthogonal to X found with the remaining Signal vs. Heavy and Light vs. Heavy SVMs. The projections use the fit parameters obtained with the profile likelihood fit to four quadrants, but project a constant binning obtained with cubes of edge length 0.008 in XYZ space. In each of these plots, the prediction is broken down by the templates used in the fit. The first and last bins contain events found outside the range of the horizontal axis.

Figure 1:
Figure 1(a) shows a contour at a fixed value of the template function from each of the classes. This highlights the different regions in 3D decision space where the different event types congregate. As an alternative way of illustrating these distributions, Figure 1(b) shows a sampling of 10,000 events from each class. Also shown are the three trained decision planes, which serve to demonstrate the nonorthogonal nature of the basis defined by the SVMs. The SvL plane at X = 0 is seen to separate Signal from Light as it extends downwards in Z. Similarly, the SvH and LvH planes separate their training classes. The multiple band structure in the templates arises because of training with the number of b-tags, which is a strong discriminant and is discrete. These template bands represent groupings of events with 0, 1, or 2 b-tags.

Figure 2:
Figure 2: The fitted yields of signal and background processes compared with data, shown in four YZ quadrants divided along the X axis, as used in the fit. They are labeled quad1YZ, quad2Yz, quad3yZ, and quad4yz (the boundary letters are appended for easy reference). The lower panel shows the ratio of data to fit prediction. The shaded regions correspond to the statistical and systematic uncertainties. The first and last bins also contain any events found outside the range of the horizontal axis.

Figure 3:
Figure 3: The data distributions of six selected input variables are shown with their post-fit predictions in the selected sample. The predicted signal fraction is 24.8%. Shown are (a) the number of jets, (b) the number of b-tagged jets, (c) H_T, (d) the 4th Fox-Wolfram moment, (e) E_T^miss, and (f) the mass of the lepton and jets. Data are shown with the overlaid dots. The predicted events are shown for each of the templates used in modeling the data. The statistical and systematic error bands are given by the shaded regions. The first and last bins contain events found outside the range of the horizontal axis.

Figure 4:
Figure 4: The data distributions of six selected input variables are shown with their post-fit predictions in the selected sample for the signal-rich region with X > 0, Y > 0, and Z < 0. The predicted signal fraction in this region is 79.3%. Shown are (a) the number of jets, (b) the number of b-tagged jets, (c) H_T, (d) the 4th Fox-Wolfram moment, (e) E_T^miss, and (f) the mass of the lepton and jets. Data are shown with the overlaid dots. The predicted events are shown for each of the templates used in modeling the data. The statistical and systematic error bands are given by the shaded regions. The first and last bins contain events found outside the range of the horizontal axis.

Figure 5:
Figure 5: The data distributions of six selected input variables are shown with their post-fit predictions in the selected sample for the background-rich region with X < 0, Y < 0, and Z > 0. The predicted signal fraction in this region is 3.1%. Shown are (a) the number of jets, (b) the number of b-tagged jets, (c) H_T, (d) the 4th Fox-Wolfram moment, (e) E_T^miss, and (f) the mass of the lepton and jets. Data are shown with the overlaid dots. The predicted events are shown for each of the templates used in modeling the data. The statistical and systematic error bands are given by the shaded regions. The first and last bins contain events found outside the range of the horizontal axis.

Figure 6:
Figure 6: (left) These 2D projections show the composition of each bin according to the template fit results. Bins are shaded in the same ratio as the fit prediction. The contours drawn on top of the bins represent the overall concentration of events. Contours are provided at 5%, 10%, 20%, 40%, 60%, 80%, 90%, and 95% of the maximal 2D bin height. (right) The 2D bins are shaded in accordance with the fractional difference between the observed data and the number of events predicted by the fit. The contours shown are the same as in the left column and provide a reference.

Table 2:
The observed and expected numbers of events in the selected sample are shown. The first two columns list the contributions by physics process, and the two rightmost columns present events by class. The Heavy class includes the W/Z+b b+jets and single-top processes, while the Light class includes all other backgrounds. Predicted values are rounded with respect to their individual uncertainties.

Table 3:
List of the 21 variables used as input to the SVMs. The variables were divided by the given values to make them all of similar magnitude.